Informatica Velocity


Velocity v9

Phases & Roles

© 2011 Informatica Corporation. All rights reserved.

Velocity v9
Phase 1: Manage


Phase 1: Manage
1 Manage
  1.1 Define Project
    1.1.1 Establish Business Project Scope
    1.1.2 Build Business Case
    1.1.3 Assess Centralized Resources
  1.2 Plan and Manage Project
    1.2.1 Establish Project Roles
    1.2.2 Develop Project Estimate
    1.2.3 Develop Project Plan
    1.2.4 Manage Project
  1.3 Perform Project Close


Phase 1: Manage
Description
Managing the development of a data integration solution requires extensive planning. A well-defined, comprehensive plan provides the foundation from which to build a project solution. The goal of this phase is to address the key elements required for a solid project foundation. These elements include:

Scope - Clearly defined business objectives. The measurable, business-relevant outcomes expected from the project should be established early in the development effort. Then, an estimate of the expected Return on Investment (ROI) can be developed to gauge the level of investment and anticipated return. The business objectives should also spell out a complete inventory of business processes to facilitate a collective understanding of these processes among project team members.

Planning/Managing - The project plan should detail the project scope as well as its objectives, required work efforts, risks, and assumptions. A thorough, comprehensive scope can be used to develop a work breakdown structure (WBS) and establish project roles for summary task assignments. The plan should also spell out the change and control process that will be used for the project.

Project Close/Wrap-Up - At the end of each project, the final step is to obtain project closure. Part of this closure is to ensure the completeness of the effort and obtain sign-off for the project. Additionally, a project evaluation will help in retaining lessons learned and assessing the success of the overall effort.

Prerequisites
None

Roles

Business Project Manager (Primary)
Data Integration Developer (Secondary)
Data Quality Developer (Secondary)
Data Transformation Developer (Secondary)
Presentation Layer Developer (Secondary)
Production Supervisor (Approve)
Project Sponsor (Primary)
Quality Assurance Manager (Approve)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 18:53


Phase 1: Manage
Task 1.1 Define Project
Description
This task entails constructing the business context for the project, defining in business terms the purpose and scope of the project as well as the value to the business (i.e., the business case).

Prerequisites
None

Roles

Business Analyst (Primary) Business Project Manager (Primary) Project Sponsor (Primary)

Considerations
There are no technical considerations during this task; in fact, any discussion of implementation specifics should be avoided at this time. The focus here is on defining the project deliverable in business terms with no regard for technical feasibility. Any discussion of technologies is likely to sidetrack the strategic thinking needed to develop the project objectives.

Best Practices
None

Sample Deliverables
Project Definition

Last updated: 01-Feb-07 18:43


Phase 1: Manage
Subtask 1.1.1 Establish Business Project Scope
Description
In many ways the potential for success of the development effort for a data integration solution correlates directly to the clarity and focus of its business scope. If the business purpose is unclear or the boundaries of the business objectives are poorly defined, there is a much higher risk of failure or, at least, of a less-than-direct path to limited success.

Prerequisites
None

Roles

Business Analyst (Primary) Business Project Manager (Review Only) Project Sponsor (Primary)

Considerations
The primary consideration in developing the Business Project Scope is balancing the high-priority needs of the key beneficiaries with the need to provide results within the near-term. The Project Manager and Business Analysts need to determine the key business needs and determine the feasibility of meeting those needs to establish a scope that provides value, typically within a 60 to 120 day time-frame.

TIP: As a general rule, involve as many project beneficiaries as possible in the needs assessment and goal definition. A "forum" type of meeting may be the most efficient way to gather the necessary information since it minimizes the amount of time involved in individual interviews and often encourages useful dialog among the participants. However, it is often difficult to gather all of the project beneficiaries and the project sponsor together for any single meeting, so you may have to arrange multiple meetings and summarize the input for the various participants.

Best Practices
Defining and Prioritizing Requirements

Sample Deliverables
Project Charter

Last updated: 24-Jun-10 14:18


Phase 1: Manage
Subtask 1.1.2 Build Business Case
Description
Building support and funding for a data integration solution nearly always requires convincing executive IT management of its value to the business. The best way to do this, where possible, is to build a business case that calculates the project's estimated return on investment (ROI). ROI modeling is valuable because it:

Supplies a fundamental cost-justification framework for evaluating a data integration project.
Mandates advance planning among all appropriate parties, including IT team members, business users, and executive management.
Helps organizations clarify and agree on the benefits they expect and, in that process, helps them set realistic expectations for the data integration solution or the data quality initiative.

In addition to traditional ROI modeling on data integration initiatives, quantitative and qualitative ROI assessments should also include assessments of data quality. Poor data quality costs organizations vast sums in lost revenues. Defective data leads to breakdowns in the supply chain, poor business decisions, and inferior customer relationship management. Moreover, poor-quality data can lead to failures in compliance with industry regulations and even to outright project failure at the IT level.

It is vital to acknowledge data quality issues at an early stage in the project. Consider a data integration project that is planned and resourced meticulously but is undertaken on a dataset where the data is of poorer quality than anyone realized. This can lead to the classic code-load-explode scenario, in which the data breaks down in the target system due to a poor understanding of the data and metadata. Worse, a data integration project can succeed from an IT perspective but deliver little if any business value if the data within the system is faulty. For example, a CRM system containing a dataset with a large quantity of redundant or inaccurate records is likely to be of little value to the business. Often an organization does not realize it has data quality issues until it is too late. For this reason, data quality should be a consideration in ROI modeling for all data integration projects from the beginning.

For more details on how to quantify business value and associated data integration project cost, please see Assessing the Business Case.

Prerequisites
1.1.1 Establish Business Project Scope

Roles

Business Project Manager (Secondary)

Considerations
The Business Case must focus on business value and, as much as possible, quantify that value. The business beneficiaries are primarily responsible for assessing the project benefits, while technical considerations drive the cost assessments. These two assessments - benefits and costs - form the basis for determining overall ROI to the business.

Building the Business Case
Step 1 - Business Benefits


When creating your ROI model, it is best to start by looking at the expected business benefit of implementing the data integration solution. Common business imperatives include:

Improving decision-making and ensuring regulatory compliance.
Modernizing the business to reduce costs.
Merging with and acquiring other organizations.
Increasing business profitability.
Outsourcing non-core business functions to be able to focus on your company's core value proposition.


Each of these business imperatives requires support via substantial IT initiatives. Common IT initiatives include:

Business intelligence initiatives.
Retirement of legacy systems.
Application consolidation initiatives.
Establishment of data hubs for customer, supplier, and/or product data.
Business process outsourcing (BPO) and/or Software as a Service (SaaS).

For these IT initiatives to be successful, you must be able to integrate data from a variety of disparate systems. The form of those data integration projects may vary. You may have a:

Data Warehousing project, which enables new business insight, usually through business intelligence.
Data Migration project, where data sources are moved to enable a new application or system.
Data Consolidation project, where certain data sources or applications are retired in favor of another.
Master Data Management project, where multiple data sources come together to form a more complex, master view of the data.
Data Synchronization project, where data between two source systems needs to stay perfectly consistent to enable different applications or systems.
B2B Data Transformation project, where data from external partners is transformed to internal formats for processing by internal systems, and responses are transformed back to partner-appropriate formats.
Data Quality project, where the goals are to cleanse data and to correct errors such as duplicates, missing information, mistyped information, and other data deficiencies.

Once you have traced your data integration project back to its origins in the business imperatives, it is important to estimate the value derived from the project. You can estimate the value by asking questions such as:

What is the business goal of this project? Is it relevant?
What are the business metrics or key performance indicators associated with this goal? How will the business measure the success of this initiative?
How does data accessibility affect the business initiative? Does having access to all of your data improve the business initiative?
How does data availability affect the business initiative? Does having data available when it's needed improve the business initiative?
How does data quality affect the business initiative? Does having good data quality improve the business initiative? Conversely, what is the potential negative impact of poor data quality on the business initiative?
How does data auditability affect the business? Does having an audit trail of your data improve the business initiative from a compliance perspective?
How does data security affect the business? Does ensuring secure data improve the business initiative?

After answering these questions, you will be able to equate business value, as a monetary figure, with the data integration project. Remember to estimate the business value not only over the first year after implementation, but also over the course of time; most business cases and associated ROI models factor in expected business value for at least three years. If you are still struggling to estimate the business value of the data integration initiative, see the table below, which outlines common business value categories and how they relate to various data integration initiatives:

INCREASE REVENUE

New Customer Acquisition
Explanation: Lower the costs of acquiring new customers.
Typical Metrics: cost per new customer acquisition; cost per lead; # new customers acquired/month per sales rep or per office/store.
Data Integration Examples: marketing analytics; integration of third-party data (from credit bureaus, directory services, salesforce.com, etc.); single view of customer across all products and channels.

Cross-Sell / Up-Sell
Explanation: Increase penetration and sales within existing customers.
Typical Metrics: % cross-sell rate; # products/customer; % share of wallet; customer lifetime value.
Data Integration Examples: marketing analytics and customer segmentation; customer lifetime value analysis.

Sales and Channel Management
Explanation: Increase sales productivity and improve visibility into demand.
Typical Metrics: sales per rep or per employee; close rate; revenue per transaction.
Data Integration Examples: sales/agent productivity dashboard; sales and demand analytics; customer master data integration; demand chain synchronization.

New Product / Service Delivery
Explanation: Accelerate new product/service introductions and improve the "hit rate" of new offerings.
Typical Metrics: # new products launched/year; new product/service launch time; new product/service adoption rate.
Data Integration Examples: data sharing across design, development, production and marketing/sales teams; data sharing with third parties, e.g., contract manufacturers, channels, marketing agencies.

Pricing / Promotions
Explanation: Set pricing and promotions to stimulate demand while improving margins.
Typical Metrics: margins; profitability per segment; cost-per-impression; cost-per-action.
Data Integration Examples: cross-geography/cross-channel pricing visibility; differential pricing analysis and tracking; promotions effectiveness analysis.

LOWER COSTS

Supply Chain Management
Explanation: Lower procurement costs, increase supply chain visibility, and improve inventory management.
Typical Metrics: purchasing discounts; inventory turns; quote-to-cash cycle time; demand forecast accuracy.
Data Integration Examples: product master data integration; demand analysis; cross-supplier purchasing history; cross-enterprise inventory rollup.

Production & Service Delivery
Explanation: Lower the costs to manufacture products and/or deliver services.
Typical Metrics: production cycle times; cost per unit (product); cost per transaction (service); straight-through-processing rate.
Data Integration Examples: scheduling and production synchronization.

Logistics & Distribution
Explanation: Lower distribution costs and improve visibility into the distribution chain.
Typical Metrics: distribution costs per unit; average delivery times; delivery date reliability.
Data Integration Examples: integration with third-party logistics management and distribution partners.

Invoicing, Collections and Fraud Prevention
Explanation: Improve invoicing and collections efficiency, and detect/prevent fraud.
Typical Metrics: # invoicing errors; DSO (days sales outstanding); % uncollectible; % fraudulent transactions.
Data Integration Examples: invoicing/collections reconciliation; fraud detection.

Financial Management
Explanation: Streamline financial management and reporting.
Typical Metrics: end-of-quarter days to close; financial reporting efficiency; asset utilization rates.
Data Integration Examples: financial data warehouse/reporting; financial reconciliation; asset management/tracking.

MANAGE RISK

Compliance Risk (e.g., SEC/SOX/Basel II/PCI)
Explanation: Prevent compliance outages to avoid investigations, penalties, and negative impact on brand.
Typical Metrics: # negative audit/inspection findings; probability of compliance lapse; cost of compliance lapses (fines, recovery costs, lost business); audit/oversight costs.
Data Integration Examples: financial reporting; compliance monitoring and reporting.

Financial/Asset Risk Management
Explanation: Improve risk management of key assets, including financial, commodity, energy or capital assets.
Typical Metrics: errors and omissions; probability of loss; expected loss; safeguard and control costs.
Data Integration Examples: risk management data warehouse; reference data integration; scenario analysis; corporate performance management.

Business Continuity / Disaster Recovery Risk
Explanation: Reduce downtime and lost business, prevent loss of key data, and lower recovery costs.
Typical Metrics: mean time between failure (MTBF); mean time to recover (MTTR); recovery time objective (RTO); recovery point objective (RPO - data loss).
Data Integration Examples: resiliency and automatic failover/recovery for all data integration processes.
Step 2 - Calculating the Costs


Now that you have estimated the monetary business value of the data integration project in Step 1, you will need to calculate the costs associated with that project in Step 2. In most cases, the data integration project is inevitable: one way or another, the business initiative is going to be accomplished, so it is best to compare two alternative cost scenarios. One scenario would be implementing that data integration with tools from Informatica, while the other scenario would be implementing the data integration project without Informatica's toolset. Some examples of benchmarks to support the case for Informatica lowering the total cost of ownership (TCO) on data integration and data quality projects are outlined below:

Benchmarks from Industry Analysts, Consultants, and Authors

Forrester Research, "The Total Economic Impact of Deploying Informatica PowerCenter", 2004. The average savings of using a data integration/ETL tool vs. hand coding:
31% in development costs
32% in operations costs
32% in maintenance costs
35% in overall project life-cycle costs

Gartner, "Integration Competency Center: Where Are Companies Today?", 2005. The top-performing third of Integration Competency Centers (ICCs) will save an average of:
30% in data interface development time and costs
20% in maintenance costs
The top-performing third of ICCs will achieve 25% reuse of integration components.

Larry English, Improving Data Warehouse and Business Information Quality, Wiley Computer Publishing, 1999:
"The business costs of non-quality data, including irrecoverable costs, rework of products and services, workarounds, and lost and missed revenue may be as high as 10 to 25 percent of revenue or total budget of an organization."
"Invalid data values in the typical customer database averages around 15 to 20 percent. Actual data errors, even though the values may be valid, may be 25 to 30 percent or more in those same databases."
"Large organizations often have data redundantly stored 10 times or more."

Ponemon Institute - study of costs incurred by 14 companies that had security breaches affecting between 1,500 and 900,000 consumer records:
Total costs to recover from a breach averaged $14 million per company, or $140 per lost customer record.
Direct costs for incremental, out-of-pocket, unbudgeted spending averaged $5 million per company, or $50 per lost customer record, for outside legal counsel, mail notification letters, calls to individual customers, increased call center costs, and discounted product offers.
Indirect costs for lost employee productivity averaged $1.5 million per company, or $15 per customer record.
Opportunity costs covering loss of existing customers and increased difficulty in recruiting new customers averaged $7.5 million per company, or $75 per lost customer record.
Overall customer loss averaged 2.6 percent of all customers and ranged as high as 11 percent.
In addition to lowering the cost of implementing a data integration solution, Informatica adds value to the ROI model by mitigating risk in the data integration project. To quantify the value of risk mitigation, consider the cost of project overrun and the associated likelihood of overrun when using Informatica vs. when you don't use Informatica for your data integration project. An example analysis of risk mitigation value is below:
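As a minimal sketch of how such a risk-mitigation value might be quantified, the following calculation uses purely hypothetical overrun percentages, probabilities, and budget figures; none of the numbers or names below come from the Velocity material and should be replaced with your own estimates.

```python
# Hypothetical, illustrative figures only: expected cost of a project overrun
# with and without a data integration toolset, used to estimate the value
# of risk mitigation for an ROI model.

def expected_overrun_cost(project_budget, overrun_pct, overrun_probability):
    """Expected value of a budget/schedule overrun."""
    return project_budget * overrun_pct * overrun_probability

budget = 2_000_000  # total project budget (assumed)

# Assumed scenario parameters -- replace with estimates from your own project.
hand_coded = expected_overrun_cost(budget, overrun_pct=0.40, overrun_probability=0.50)
with_toolset = expected_overrun_cost(budget, overrun_pct=0.20, overrun_probability=0.25)

risk_mitigation_value = hand_coded - with_toolset
print(f"Expected overrun (hand-coded):   ${hand_coded:,.0f}")
print(f"Expected overrun (with toolset): ${with_toolset:,.0f}")
print(f"Risk mitigation value:           ${risk_mitigation_value:,.0f}")
```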


Step 3 - Putting it all Together


Once you have calculated the three-year business/IT benefits and the three-year costs of using PowerCenter vs. not using PowerCenter, put all of this information into a format that is easy to read for IT and line-of-business executive management. The following is a sample summary of an ROI model:
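As a minimal sketch of the arithmetic behind such a summary: the snippet below compares cumulative three-year benefits against the costs of the two scenarios. All yearly values and scenario figures are hypothetical placeholders standing in for the estimates gathered in Steps 1 and 2, not a real ROI model.

```python
# Hypothetical three-year ROI summary: cumulative business benefits vs. the
# cost of each implementation scenario (all figures are placeholders).

benefits = [1_500_000, 2_000_000, 2_500_000]              # estimated business value, years 1-3
costs = {
    "with PowerCenter":    [900_000, 400_000, 400_000],   # tooling plus smaller dev/maintenance effort
    "without PowerCenter": [1_200_000, 800_000, 800_000], # hand-coded development and maintenance
}

total_benefit = sum(benefits)
for scenario, yearly_costs in costs.items():
    total_cost = sum(yearly_costs)
    net_benefit = total_benefit - total_cost
    roi = net_benefit / total_cost  # simple ROI; a fuller model might discount cash flows to NPV
    print(f"{scenario}: cost ${total_cost:,}, net benefit ${net_benefit:,}, ROI {roi:.0%}")
```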


For data migration projects it is frequently necessary to prove that using Informatica technology for the data migration effort has benefits over traditional means. To prove the value, three areas should be considered:

1. Informatica software can reduce the overall project timeline by accelerating migration development efforts.
2. Informatica-delivered migrations have lower risk due to ease of maintenance, less development effort, higher quality of data, and better project management tooling provided by the metadata-driven solution.
3. Lineage reports are available showing how the data was manipulated by the data migration process and by whom.

Best Practices
Assessing the Business Case
Developing the Business Case

Sample Deliverables

None

Last updated: 24-Jun-10 17:41


Phase 1: Manage
Subtask 1.1.3 Assess Centralized Resources
Description
The pre-existence of any centralized resources, such as an Integration Competency Center (ICC), has an obvious impact on the tasks to be undertaken in a data integration project. The objective in Velocity is not to replicate the material that is available elsewhere on the set-up and operation of an ICC (http://www.informatica.com/solutions/icc/default.htm). However, there are points in the development cycle where the availability of some degree of centralized resources has a material effect on the Velocity Work Breakdown Structure (WBS); some tasks are altered, some may no longer be required, and it is even possible that some new tasks will be created.

If an ICC does not already exist, this subtask is finished, since there are no centralized resources to assess and all the tasks in the Velocity WBS are the responsibility of the development team. If an ICC does exist, it is necessary to assess the extent and nature of the resources available in order to demarcate the responsibilities between the ICC and project teams. Typically, the ICC acquires responsibility for some or all of the data integration infrastructure (essentially the Non-Functional Requirements) and the project teams are liberated to focus on the functional requirements. The precise division of labor is obviously dependent on the degree of centralization and the associated ICC model that has been adopted. In the task descriptions that follow, an ICC section is included under the Considerations heading where alternative or supplementary activity is required if an ICC is in place.

Prerequisites
None

Roles

Business Project Manager (Primary)

Considerations
It is the responsibility of the project manager to review the Velocity WBS in light of the services provided by the ICC. The responsibility for each subtask should be established.

Best Practices
Selecting the Right ICC Model
Planning the ICC Implementation

Sample Deliverables
None

Last updated: 01-Feb-07 18:43


Phase 1: Manage
Task 1.2 Plan and Manage Project
Description
This task incorporates the initial project planning and management activities as well as project management activities that occur throughout the project lifecycle. It includes the initial structure of the project team and the project work steps based on the business objectives and the project scope, and the continuing management of expectations through status reporting, issue tracking and change management.

Prerequisites
None

Roles

Business Project Manager (Primary)
Data Integration Developer (Secondary)
Data Quality Developer (Secondary)
Presentation Layer Developer (Secondary)
Project Sponsor (Approve)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
In general, project management activities involve reconciling trade-offs between business requests as to functionality and timing, and technical feasibility and budget considerations. This often means balancing sensitivity to project goals and concerns ("being a good listener") on the one hand with maintaining a firm grasp of what is feasible ("telling the truth") on the other. The tools of the trade, apart from strong people skills (especially interpersonal communication skills), are detailed documentation and frequent review of the status of the project effort against plan, of unresolved issues, and of the risks regarding enlargement of scope ("change management"). Successful project management is predicated on regular communication of these project aspects with the project manager, and with other management and project personnel.

For data migration projects there is often a project management office (PMO) in place. The PMO is typically found in high-dollar, high-profile projects, such as implementing a new ERP system, that often cost in the millions of dollars. It is important to identify the roles and gain the PMO's understanding of how these roles are needed and will intersect with the broader system implementation. More specifically, these roles will have responsibility beyond the data migration, so the resource requirements for the data migration must be understood and guaranteed as part of the larger effort overseen by the PMO.

For B2B projects, technical considerations typically play an important role. The format of data received from partners (and of replies sent to partners) is a key consideration in overall business operations and has a direct impact on the planning and scoping of changes. Informatica recommends having the Technical Architect directly involved throughout the process.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:13


Phase 1: Manage
Subtask 1.2.1 Establish Project Roles
Description
This subtask involves defining the roles/skill sets that will be required to complete the project. This is a precursor to building the project team and making resource assignments to specific tasks.

Prerequisites
None

Roles

Business Project Manager (Primary) Project Sponsor (Approve) Technical Project Manager (Primary)

Considerations
The Business Project Scope established in 1.1.1 Establish Business Project Scope provides a primary indication of the required roles and skill sets. The following types of questions are useful discussion topics and help to validate the initial indicators:

What are the main tasks/activities of the project and what skills/roles are needed to accomplish them?
How complex or broad in scope are these tasks? This can indicate the level of skills needed.
What responsibilities will fall to the company resources and which are off-loaded to a consultant?
Who (i.e., company resource or consultant) will provide the project management?
Who will have primary responsibility for infrastructure requirements? ...for data architecture? ...for documentation? ...for testing? ...for deployment/training/support?
How much development and testing will be involved?

This is a definitional activity and very distinct from the later assignment of resources. These roles should be defined as generally as possible rather than attempting to match a requirement with a resource at hand.

After the project scope and required roles have been defined, there is often pressure to combine roles due to limited funding or availability of resources. Some roles inherently provide a healthy balance with one another, and if one person fills both of these roles, project quality may suffer. The classic conflict is between development roles and highly procedural or operational roles. For example, a QA Manager or Test Manager or Lead should not be the same person as the Project Manager or one of the development team. The QA Manager is responsible for determining the criteria for acceptance of project quality and managing quality-related procedures. These responsibilities directly conflict with the developer's need to meet a tight development schedule. For similar reasons, development personnel are not ideal choices for filling such operational roles as Metadata Manager, DBA, Network Administrator, Repository Administrator, or Production Supervisor. Those roles require operational diligence and adherence to procedure as opposed to ad hoc development. When development roles are mixed with operational roles, the resulting shortcuts often lead to quality problems in production systems.

TIP: Involve the Project Sponsor. Before defining any roles, be sure that the Project Sponsor is in agreement as to the project scope and major activities, as well as the level of involvement expected from company personnel and consultant personnel. If this agreement has not been explicitly accomplished, review the project scope with the Project Sponsor to resolve any remaining questions. In defining the necessary roles, be sure to provide the Sponsor with a full description of all roles, indicating which will rely on company personnel and which will use consultant personnel. This sets clear expectations for company involvement and indicates whether there is a need to fill additional roles with consultant personnel if the company does not have personnel available in accordance with the project timing.


The Role Descriptions in Roles provide typical role definitions. The Project Role Matrix can serve as a starting point for completing the project-specific roles matrix.

Best Practices
None

Sample Deliverables
Project Definition
Project Role Matrix
Work Breakdown Structure

Last updated: 01-Feb-07 18:43


Phase 1: Manage
Subtask 1.2.2 Develop Project Estimate
Description
Once the overall project scope and roles have been defined, details on project execution must be developed. These details should answer the questions of what must be done, who will do it, how long it will take, and how much it will cost. The objective of this subtask is to develop a complete WBS and, subsequently, a solid project estimate. Two important documents required for project execution are:

The Work Breakdown Structure (WBS), which can be viewed as a list of tasks that must be completed to achieve the desired project results. (See Developing a Work Breakdown Structure (WBS) for more details.)
The Project Estimate, which, at this time, focuses solely on development costs without consideration for hardware and software liabilities.

Estimating a project is never an easy task, and it often becomes more difficult as project visibility increases and there is an increasing demand for an "exact estimate". It is important to understand that estimates are never exact. However, estimates are useful for providing a close approximation of the level of effort required by the project. Factors such as project complexity, team skills, and external dependencies always have an impact on the actual effort required.

The accuracy of an estimate largely depends on the experience of the estimator (or estimators). For example, an experienced traveller who frequently travels the route between his/her home or office and the airport can easily provide an accurate estimate of the time required for the trip. When the same traveller is asked to estimate travel time to or from an unfamiliar airport, however, the estimation process becomes much more complex, requiring consideration of numerous factors such as distance to the airport, means of transportation, speed of available transportation, time of day that the travel will occur, expected weather conditions, and so on. The traveller can arrive at a valid overall estimate by assigning time estimates to each factor, then summing the whole. The resulting estimate, however, is not likely to be nearly as accurate as one based on knowledge gained through experience. The same holds true for estimating the time and resources required to complete development of a data integration solution project.
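To make the summing-of-factors idea concrete, here is a minimal bottom-up sketch in which per-task WBS estimates are totalled and padded with a contingency factor. The task names, hours, and contingency percentage are illustrative assumptions, not figures from the Velocity material.

```python
# Bottom-up estimate: sum per-task effort from the WBS, then add a contingency
# buffer for complexity, team skills, and external dependencies (all values hypothetical).

wbs_estimates_hours = {
    "source analysis":       80,
    "mapping development":  240,
    "unit testing":         120,
    "system testing":       100,
    "deployment/handover":   40,
}

contingency = 0.20  # assumed 20% buffer for risk and unknowns

base_hours = sum(wbs_estimates_hours.values())
total_hours = base_hours * (1 + contingency)
print(f"Base estimate: {base_hours} hours; with contingency: {total_hours:.0f} hours")
```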

Prerequisites
None

Roles

Business Project Manager (Primary)
Data Integration Developer (Secondary)
Data Quality Developer (Secondary)
Data Transformation Developer (Secondary)
Presentation Layer Developer (Secondary)
Project Sponsor (Approve)
Technical Architect (Secondary)
Technical Project Manager (Secondary)

Considerations
An accurate estimate depends greatly on a complete and accurate Work Breakdown Structure. Having the entire project team review the WBS when it is near completion helps to ensure that it includes all necessary project tasks. Project deadlines often slip because some tasks are overlooked and, therefore, not included in the initial estimates.


Sample Data Requirements for B2B Projects


For B2B projects (and non-B2B projects that have significant unstructured or semi-structured data transformation requirements), the actual creation and subsequent QA of transformations relies on having sufficient samples of input and output data, as well as specifications for data formats. When estimating for projects that use Informatica's B2B Data Transformation, estimates should include sufficient time to allow for the collection and assembly of sample data, for any cleansing of sample data that is required (for example, to conform to HIPAA or financial privacy regulations), and for any data analysis or metadata discovery to be performed on the sample data. By their nature, B2B data transformations cannot be fully authored (or in some cases cannot proceed) without adequate sample data, both as input to transformations and for comparison purposes during the quality assurance process.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:17


Phase 1: Manage
Subtask 1.2.3 Develop Project Plan
Description
In this subtask, the Project Manager develops a schedule for the project using the agreed-upon business project scope to determine the major tasks that need to be accomplished and estimates of the amount of effort and resources required.

Prerequisites
None

Roles

Business Project Manager (Primary) Project Sponsor (Approve) Technical Project Manager (Secondary)

Considerations
The initial project plan is based on agreements-to-date with the Project Sponsor regarding project scope, estimation of effort, roles, project timelines, and any understanding of requirements. Updates to the plan (as described in Developing and Maintaining the Project Plan) are typically based on changes to scope, approach, or priorities, or simply on more precise determinations of effort and of start and/or completion dates as the project unfolds.

In some cases, later phases of the project, like System Test (or "alpha"), Beta Test, and Deployment, are represented in the initial plan as a single set of activities and will be more fully defined as the project progresses. Major activities (e.g., System Test, Deployment) typically involve their own full-fledged planning processes once the technical design is completed. At that time, additional activities may be added to the project plan to allow for more detailed tracking of those project activities.

Perhaps the most significant message here is that an up-to-date plan is critical for satisfactory management of the project and for timely completion of its tasks. Keeping the plan updated as events occur and as client understanding, needs, and expectations change requires an on-going effort. The sooner the plan is updated and the changes are communicated to the Project Sponsor and/or company management, the less likely that expectations will be frustrated to a problematic level.

Best Practices
Data Migration Velocity Approach
Developing a Work Breakdown Structure (WBS)
Developing and Maintaining the Project Plan

Sample Deliverables
Project Roadmap
Work Breakdown Structure

Last updated: 24-Jun-10 15:04


Phase 1: Manage
Subtask 1.2.4 Manage Project
Description
In the broadest sense, project management begins before the project starts and continues until its completion, and perhaps beyond. The management effort includes:

Managing the project beneficiary relationship(s), expectations, and involvement
Managing the project team, its make-up, involvement, priorities, activities, and schedule
Managing all project issues as they arise, whether technical, logistical, procedural, or personal

In a more specific sense, project management involves being constantly aware of, or preparing for, anything that needs to be accomplished or dealt with to further the project objectives, and making sure that someone accepts responsibility for such occurrences and delivers in a timely fashion. Project management begins with pre-engagement preparation and includes:

Project Kick-off, including the initial project scope, project organization, and project plan
Project Status and reviews of the plan and scope
Project Content Reviews, including business requirements reviews and technical reviews
Change Management as scope changes are proposed, including changes to staffing or priorities
Issues Management
Project Acceptance and Close

Prerequisites
None

Roles

Business Project Manager (Primary) Project Sponsor (Review Only) Technical Project Manager (Primary)

Considerations
In all management activities and actions, the Project Manager must balance the needs and expectations of the Project Sponsor and project beneficiaries with the needs, limitations, and morale of the project team. Limitations and specific needs of the team must be communicated clearly and early to the Project Sponsor and/or company management to mitigate unwarranted expectations and avoid an escalation of expectation-frustration that can have a dire effect on the project outcome. Issues that affect the ability to deliver in any sense, and potential changes to scope, must be brought to the Project Sponsor's attention as soon as possible and managed to satisfactory resolution.

In addition to "expectation management", project management includes quality assurance for the project deliverables. This involves soliciting specific requirements, with subsequent review of deliverables that include, in addition to the data integration solution itself, documentation, user interfaces, knowledge-transfer materials, and testing procedures.

Best Practices
Data Migration Project Challenges
Managing the Project Lifecycle

Sample Deliverables
Issues Tracking


Project Review Meeting Agenda
Project Status Report
Scope Change Assessment

Last updated: 24-Jun-10 15:05


Phase 1: Manage
Task 1.3 Perform Project Close
Description
This is a summary task that entails closing out the project and creating project wrap-up documentation. Each project should end with an explicit closure procedure. This process should include Sponsor acknowledgement that the project is complete and the end product meets expectations. A Project Close Report should be completed at the conclusion of the effort, along with a final status report. The project close documentation should highlight project accomplishments, lessons learned, justifications for tasks expected but not completed, and any recommendations for future work on the end product. This task should also generate a reconciliation document, reconciling project time/budget estimates with actual time and cost expenditures. As mentioned earlier in this chapter, experience is an important tool for succeeding in future efforts. Building upon the experience of a project team and publishing this information will help future teams succeed in similar efforts.

Prerequisites
None

Roles

Business Project Manager (Primary)
Production Supervisor (Approve)
Project Sponsor (Approve)
Quality Assurance Manager (Approve)
Technical Project Manager (Approve)

Considerations
None

Best Practices
None

Sample Deliverables
Project Close Report

Last updated: 01-Feb-07 18:43


Velocity v9
Phase 2: Analyze


Phase 2: Analyze
2 Analyze
  2.1 Define Business Drivers, Objectives and Goals
  2.2 Define Business Requirements
    2.2.1 Define Business Rules and Definitions
    2.2.2 Establish Data Stewardship
  2.3 Define Business Scope
    2.3.1 Identify Source Data Systems
    2.3.2 Determine Sourcing Feasibility
    2.3.3 Determine Target Requirements
    2.3.4 Determine Business Process Data Flows
    2.3.5 Build Roadmap for Incremental Delivery
  2.4 Define Functional Requirements
  2.5 Define Metadata Requirements
    2.5.1 Establish Inventory of Technical Metadata
    2.5.2 Review Metadata Sourcing Requirements
    2.5.3 Assess Technical Strategies and Policies
  2.6 Determine Technical Readiness
  2.7 Determine Regulatory Requirements
  2.8 Perform Data Quality Audit
    2.8.1 Perform Data Quality Analysis of Source Data
    2.8.2 Report Analysis Results to the Business


Phase 2: Analyze
Description
Increasingly, organizations demand faster, better, and cheaper delivery of data integration and business intelligence solutions. Many development failures and project cancellations can be traced to an absence of adequate upfront planning and scope definition. Inadequately defined or prioritized objectives and project requirements foster scenarios in which project scope becomes a moving target: requirements change late in the game, requiring repeated rework of design or even development tasks.

The purpose of the Analyze Phase is to build a solid foundation for project scope through a deliberate determination of the business drivers, requirements, and priorities that will form the basis of the project design and development. Once the business case for a data integration or business intelligence solution is accepted and key stakeholders are identified, the process of detailing and prioritizing objectives and requirements can begin - with the ultimate goal of defining project scope and, if appropriate, a roadmap for major project stages.

Prerequisites
None

Roles

Application Specialist (Primary)
Business Analyst (Primary)
Business Project Manager (Primary)
Data Architect (Primary)
Data Integration Developer (Primary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Primary)
Database Administrator (DBA) (Primary)
Legal Expert (Primary)
Metadata Manager (Primary)
Project Sponsor (Secondary)
Security Manager (Primary)
System Administrator (Primary)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
Functional and technical requirements must focus on the business goals and objectives of the stakeholders, and must be based on commonly agreed-upon definitions of business information. The initial business requirements are then compared to feasibility studies of the source systems to help the prioritization process that will result in a project roadmap and rough timeline. This sets the stage for incremental delivery of the requirements so that some important needs are met as soon as possible, thereby providing value to the business even though there may be a much longer timeline to complete the entire project. In addition, during this phase it can be valuable to identify the available technical metadata as a way to accelerate the design and improve its quality. A successful Analyze Phase can serve as a foundation for a successful project.


Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:43


Phase 2: Analyze
Task 2.1 Define Business Drivers, Objectives and Goals
Description
In many ways, the potential for success of any data integration/business intelligence solution correlates directly to the clarity and focus of its business scope. If the business objectives are vague, there is a much higher risk of failure or, at least, of a less-than-direct path to limited success.

Business Drivers
The business drivers explain why the solution is needed and why it is being recommended at a particular time, by identifying the specific business problems, issues, or increased business value that the project is likely to resolve or deliver. Business drivers may include background information necessary to understand the problems and/or needs. There should be clear links between the project's business drivers and the company's underlying business strategies.

Business Objectives
Objectives are concrete statements describing what the project is trying to achieve. Objectives should be explicitly defined so that they can be evaluated at the conclusion of a project to determine if they were achieved. Objectives written for a goal statement are nothing more than a deconstruction of the goal statement into a set of necessary and sufficient objective statements. That is, every objective must be accomplished to reach the goal, and no objective is superfluous. Objectives are important because they establish a consensus between the project sponsor and the project beneficiaries regarding the project outcome. The specific deliverables of an IT project, for instance, may or may not make sense to the project sponsor. However, the business objectives should be written so they are understandable by all of the project stakeholders.

Business Goals
Goal statements provide the overall context for what the project is trying to accomplish. They should align with the company's stated business goals and strategies. Project context is established in a goal statement by stating the project's object of study, its purpose, its quality focus, and its viewpoint. A well-defined goal should reference the project's business benefits in terms of cost, time, and/or quality.

Because goals are high-level statements, it may take more than one project to achieve a stated goal. If the goal's achievement can be measured, it is probably defined at too low a level and may actually be an objective. If the goal is not achievable through any combination of projects, it is probably too abstract and may be a vision statement.

Every project should have at least one goal. It is the agreement between the company and the project sponsor about what is going to be accomplished by the project. The goal provides focus and serves as the compass for determining if the project outcomes are appropriate. In the project management life cycle, the goal is bound by a number of objective statements. These objective statements clarify the fuzzy boundary of the goal statement. Taken as a pair, the goal and objectives statements define the project. They are the foundation for project planning and scope definition.

Prerequisites
None

Roles

Business Project Manager (Review Only) Project Sponsor (Review Only)

Considerations
Business Drivers


The business drivers must be defined using business language. Identify how the project is going to resolve or address specific business problems. Key components when identifying business drivers include:

Describe facts, figures, and other pertinent background information to support the existence of a problem.
Explain how the project resolves or helps to resolve the problem, in terms familiar to the business.
Show any links to business goals, strategies, and principles.

Large projects often have significant business and technical requirements that drive the project's development. Consider explaining the origins of the significant requirements as a way of explaining why the project is needed.

Business Objectives
Before the project starts, define and agree on the project objectives and the business goals they support. The deliverables of the project are created based on the objectives - not the other way around. A meeting among all major stakeholders is the best way to create the objectives and gain a consensus on them at the same time. This type of meeting encourages discussion among participants and minimizes the amount of time involved in defining business objectives and goals. It may not be possible to gather all the project beneficiaries and the project sponsor together at the same time, so multiple meetings may have to be arranged and the results summarized.

While goal statements are designed to be vague, a well-worded objective is Specific, Measurable, Attainable/Achievable, Realistic, and Time-bound (SMART):

Specific: An objective should address a specific target or accomplishment.
Measurable: Establish a metric that indicates that an objective has been met.
Attainable: If an objective cannot be achieved, then it's probably a goal.
Realistic: Limit objectives to what can realistically be done with available resources.
Time-bound: Achieve objectives within a specified time frame.

At a minimum, make sure each objective contains four parts, as follows:

An outcome - describe what the project will accomplish.
A time frame - the expected completion date of the project.
A measure - metric(s) that will measure success of the project.
An action - how to meet the objective.

The business objectives should take into account the results of any data quality investigations carried out before or during the project. If the project source data quality is low, then the project's ability to achieve its objectives may be compromised. If the project has specific data-related objectives, such as regulatory compliance objectives, then a high degree of data quality may be an objective in its own right. For this reason, data quality investigations (such as a Data Quality Audit) should be carried out as early as is feasible in the project life-cycle. See 2.8 Perform Data Quality Audit.

Generally speaking, the number of objectives comes down to how much business investment is going to be made in pursuit of the project's goals. High-investment projects generally have many objectives. Low-investment projects must be more modest in the objectives they pursue. There is considerable discretion in how granular a project manager may get in defining objectives. High-level objectives generally need a more detailed explanation and often lead to more definition in the project's deliverables. Lower-level, detailed objectives tend to require less descriptive narrative and deconstruct into fewer deliverables. Regardless of the number of objectives identified, priority should be established by ranking the objectives with their respective impacts, costs, and risks.

Business Goals
The goal statement must also be written in business language so that anyone who reads it can understand it without further explanation. The goal statement should:

Be short and to the point.
Provide overall context for what the project is trying to accomplish.
Be aligned to business goals in terms of cost, time, and quality.

Smaller projects generally have a single goal. Larger projects may have more than one goal, which should also be prioritized. Since the goal statement is meant to be succinct, regardless of the number of goals a project has, the goal statement should always be brief and to the point.

Best Practices
None

Sample Deliverables
None

Last updated: 18-May-08 17:36


Phase 2: Analyze
Task 2.2 Define Business Requirements
Description
A data integration/business intelligence solution development project typically originates from a company's need to provide management and/or customers with business analytics, or to provide business application integration. As with any technical engagement, the first task is to determine clear and focused business requirements to drive the technology implementation. This requires determining what information is critical to support the project objectives, and how that information relates to important strategic and operational business processes. Project success will be based on clearly identifying and accurately resolving these informational needs with the proper timing. The goal of this task is to ensure the participation and consensus of the project sponsor and key beneficiaries during the discovery and prioritization of these information requirements.

Prerequisites
None

Roles

Business Project Manager (Primary)
Data Quality Developer (Secondary)
Data Steward/Data Quality Steward (Primary)
Legal Expert (Approve)
Metadata Manager (Primary)
Project Sponsor (Approve)

Considerations
In a data warehouse/business intelligence project, there can be strategic or tactical requirements.

Strategic Requirements
Customer management is typically interested in strategic questions that often span a significant timeframe. For example, "How has the turnover of product x increased over the last year?" or "What is the revenue of area a in January of this year as compared to last year?" Answers to strategic questions provide company executives with the information required to build on the company's strengths and/or to eliminate weaknesses. Strategic requirements are typically implemented through a data warehouse type of project with appropriate visualization tools.

Tactical Requirements
Tactical requirements serve the day-to-day business. Operational-level employees want solutions that enable them to manage their on-going work and solve immediate problems. For instance, a distributor running a fleet of trucks has an unavailable driver on a particular day. They would want to answer questions such as, "How can the delivery schedule be altered in order to meet the delivery time of the highest priority customer?" Answers to these questions are valid and pertinent for only a short period of time in comparison to the strategic requirements. Tactical requirements are often implemented via operational data integration.

Best Practices
None


Sample Deliverables
None

Last updated: 02-May-08 12:05


Phase 2: Analyze
Subtask 2.2.1 Define Business Rules and Definitions
Description
A business rule is a compact and simple statement that represents some important aspect of a business process or policy. By capturing the rules of the business - the logic that governs its operation - systems can be created that are fully aligned with the needs of the organization. Business rules stem from the knowledge of business personnel and constrain some aspect of the business. From a technical perspective, a business rule expresses specific constraints on the creation, updating, and removal of persistent data in an information system. For example, a new bank account cannot be created unless the customer has provided adequate proof of identification and address.

Prerequisites
None

Roles

Data Quality Developer (Secondary)
Data Steward/Data Quality Steward (Primary)
Legal Expert (Approve)
Metadata Manager (Primary)
Security Manager (Approve)

Considerations
Formulating business rules is an iterative process, often stemming from statements of policy in an organization. Rules are expressed in natural language. The following guidelines follow best practices and provide practical instructions on how to formulate business rules:

Start with a well-defined and agreed-upon set of unambiguous definitions captured in a definitions repository. Re-use existing definitions if available.
Use meaningful and precise verbs to connect the definitions captured above.
Use standard expressions to constrain business rules, such as "must", "must not", "only if", "no more than", etc. For example, the total commission paid to broker ABC can be no more than xy% of the total revenue received for the sale of widgets.
Use standard expressions for derivation business rules, such as "is calculated from", "is summed from", etc. For example, "the departmental commission paid is calculated as the total commission multiplied by the departmental rollup rate."

The aim is to define atomic business rules, that is, rules that cannot be decomposed further. Each atomic business rule is a specific, formal statement of a single term, fact, derivation, or constraint on the business. The components of business rules, once formulated, provide direct inputs to a subsequent conceptual data modeling and analysis phase. In this approach, definitions and connections can eventually be mapped onto a data model, and constraints and derivations can be mapped onto a set of rules that are enforced in the data model.
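To show how atomic rules such as the two examples above might eventually be enforced against data, here is a minimal sketch; the record field names and the stand-in threshold for "xy%" are assumptions for illustration, not definitions from the Velocity material.

```python
# Illustrative only: expressing the constraint and derivation rules above as
# executable checks on a record (field names and the xy% threshold are assumed).

MAX_COMMISSION_RATE = 0.10  # stand-in for the "no more than xy%" constraint

def commission_constraint_ok(record):
    """Constraint rule: commission paid to broker ABC must be no more than
    MAX_COMMISSION_RATE of total widget revenue."""
    return record["commission_paid"] <= MAX_COMMISSION_RATE * record["widget_revenue"]

def departmental_commission(record):
    """Derivation rule: departmental commission is calculated as total
    commission multiplied by the departmental rollup rate."""
    return record["total_commission"] * record["departmental_rollup_rate"]

sample = {
    "commission_paid": 9_000,
    "widget_revenue": 100_000,
    "total_commission": 9_000,
    "departmental_rollup_rate": 0.35,
}
print(commission_constraint_ok(sample))  # True (9,000 <= 10% of 100,000)
print(departmental_commission(sample))   # 3150.0
```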

Best Practices
None

Sample Deliverables
Business Requirements Specification


Last updated: 01-Feb-07 18:43


Phase 2: Analyze
Subtask 2.2.2 Establish Data Stewardship
Description
Data stewardship is about keeping the business community involved and focused on the goals of the project being undertaken. This subtask outlines the roles and responsibilities that key personnel can assume within the framework of an overall stewardship program. This participation should be regarded as ongoing because stewardship activities need to be performed at all stages of a project lifecycle and continue through the operational phase.

Prerequisites
None

Roles

Business Analyst (Secondary) Business Project Manager (Primary) Data Steward/Data Quality Steward (Secondary) Project Sponsor (Approve)

Considerations
A useful mix of personnel to staff a stewardship committee may include:
- An executive sponsor
- A business steward
- A technical steward
- A data steward

Executive Sponsor
- Chair of the data stewardship committee
- Ultimate point of arbitration
- Liaison to management for setting and reporting objectives
- Should be recruited from project sponsors or management

Technical Steward
- Member of the data stewardship committee
- Liaison with the technical community
- Reference point for technical-related issues and arbitration
- Should be recruited from the technical community with a good knowledge of the business and operational processes

Business Steward
- Member of the data stewardship committee
- Liaison with business users
- Reference point for business-related issues and arbitration
- Should be recruited from the business community

Data Steward
- Member of the data stewardship committee
- Balances data and quality targets set by the business with IT/project parameters
- Responsible for all issues relating to the data, including defining and maintaining business and technical rules and liaising with the business and technical communities
- Reference point for arbitration where data is put to different uses by separate groups of users whose requirements have to be reconciled

The mix of personnel for a particular activity should be adequate to provide expertise in each of the major business areas that will be undertaken in the project. The success of the stewardship function relies on the early establishment and distribution of standardized documentation and procedures. These should be distributed to all of the team members working on stewardship activities. The data stewardship committee should be involved in the following activities:
- Arbitration
- Sanity checking
- Preparation of metadata
- Support

Arbitration
Arbitration means resolving data contention issues, deciding which is the best data to use, and determining how this data should best be transformed and interpreted so that it remains meaningful and consistent. This is particularly important during the phases where ambiguity needs to be resolved, for example, when conformed dimensions and standardized facts are being formulated by the analysis teams.

Sanity Checking
There is a role for the data stewardship committee to check the results and ensure that the transformation rules and processes have been applied correctly. This is a key verification task and is particularly important in evaluating prototypes developed in the Analyze Phase, during testing, and after the project goes live.

Preparation of Metadata
The data stewardship committee should be actively involved in the preparation and verification of technical and business metadata. Specific tasks are:
- Determining the structure and contents of the metadata
- Determining how the metadata is to be collected
- Determining where the metadata is to reside
- Determining who is likely to use the metadata
- Determining what business benefits are provided
- Determining how the metadata is to be acquired
Depending on the tools used to determine the metadata (for example, the PowerCenter Profiling option or Informatica Data Explorer), the Data Steward may take a lead role in this activity.
Business metadata - The purpose of maintaining this type of information is to clarify context, aid understanding, and provide business users with the ability to perform high-level searches for information. Business metadata is used to answer questions such as: "How does this division of the enterprise calculate revenue?"
Technical metadata - The purpose of maintaining this type of information is for impact analysis, auditing, and source-target analysis. Technical metadata is used to perform analysis such as: "What would be the impact of changing the length of a field from 20 to 30 characters, and what systems would be affected?"
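As an illustration of the kind of impact-analysis question technical metadata answers, the hedged sketch below walks a small, made-up lineage dictionary to find downstream objects affected by a column change. A real implementation would query a metadata repository such as Metadata Manager rather than a hard-coded structure; the column and object names are invented.

# Hypothetical lineage: each source column maps to the downstream objects that consume it.
lineage = {
    "CRM.CUSTOMER.LAST_NAME": ["stg_customer mapping", "dim_customer table", "Churn report"],
    "CRM.CUSTOMER.POSTCODE": ["stg_customer mapping", "Regional sales report"],
}

def impact_of_change(column, lineage_map):
    """Return every downstream object affected by a change to the given column."""
    return lineage_map.get(column, [])

# What would be affected by widening CRM.CUSTOMER.LAST_NAME from 20 to 30 characters?
for obj in impact_of_change("CRM.CUSTOMER.LAST_NAME", lineage):
    print(obj)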

Support
The data stewardship committee should be involved in the inception and preparation of training for the user community by answering questions about the data and the tools available to perform analytics. During the Analyze Phase, the team provides input to induction training programs prepared for system users when the project goes live. Such programs should include, for example, technical information about how to query the system and semantic information about the data that is retrieved.

New Functionality
The data stewardship committee needs to assess any major additions to functionality. The assessment should consider return on investment, priority, and scalability in terms of new hardware/software requirements. There may be a need to perform this activity during the Analyze Phase if functionality that was initially overlooked is to be included in the scope of the project. After the project has gone live, this activity is of key importance because new functionality needs to be assessed for ongoing development.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 17:55

Phase 2: Analyze
Task 2.3 Define Business Scope
Description
The business scope forms the boundary that defines where the project begins and ends. Throughout project discussions about the business requirements and objectives, it may appear that everyone views the project scope in the same way. In practice, however, there is commonly confusion about what falls inside the boundary of a specific project and what does not. Developing a detailed project scope and socializing it with the project team, sponsors, and key stakeholders is critical.

Prerequisites
None

Roles

Data Architect (Primary) Data Integration Developer (Primary) Data Quality Developer (Primary) Data Steward/Data Quality Steward (Secondary) Metadata Manager (Primary) Project Sponsor (Secondary) Technical Architect (Primary) Technical Project Manager (Primary)

Considerations
The primary consideration in developing the business scope is balancing the high-priority needs of the key beneficiaries with the need to provide results in the near term. The Project Manager and Business Analysts need to determine the key business needs and the feasibility of meeting those needs in order to establish a scope that provides value, typically within a 60- to 120-day time-frame. Quick WINS (Ways to Implement New Solutions) are accomplishments achieved in a relatively short time, without great expense, and with a positive outcome; they can be included in the business scope.

TIP As a general rule, involve as many project beneficiaries as possible in the needs assessment and goal definition. A "forum" type of meeting may be the most efficient way to gather the necessary information since it minimizes the amount of time spent in individual interviews and often encourages useful dialog among the participants. However, it is often difficult to gather all of the project beneficiaries and the project sponsor together for any single meeting, so you may have to arrange multiple meetings and summarize the input for the various participants. A common mistake made by project teams is to define the project scope only in general terms. This lack of definition causes managers and key beneficiaries throughout the company to make assumptions about whether their own processes or systems fall inside or outside the scope of the project. Later, after significant work has been completed by the project team, some managers are surprised to learn that their assumptions were not correct, resulting in problems for the project team. Other project teams report problems with "scope creep" as their project gradually takes on more and more work. The safest rule is: the more detail, the better, along with details regarding which related elements are not within scope or will be delayed to a later effort.

Best Practices
None

Sample Deliverables
None

Last updated: 18-May-08 17:35

Phase 2: Analyze
Subtask 2.3.1 Identify Source Data Systems
Description
Before beginning any work with the data, it is necessary to determine precisely what data is required to support the data integration solution. In addition, the developers must also determine which source systems house the data, where the data resides in those systems, and how the data is accessed. In this subtask, the development project team needs to validate the initial list of source systems and source formats and obtain documentation from the source system owners describing the source system schemas. For relational systems, the documentation should include Entity-Relationship diagrams (E-R diagrams) and data dictionaries, if available. For file-based data sources (e.g., unstructured, semi-structured, and complex XML), documentation may also include data format specifications, both internal and public (in the case of open data format standards), and any deviations from public standards. The development team needs to carefully review the source system documentation to ensure that it is complete (i.e., specifies data owners and dependencies) and current. The team also needs to ensure that the data is fully accessible to the developers and analysts who are building the data integration solution.

Prerequisites
None

Roles

Business Analyst (Primary) Data Architect (Primary) Data Integration Developer (Primary) Data Quality Developer (Primary) Data Transformation Developer (Primary)

Considerations
In determining the source systems for data elements, it is important to request copies of the source system data to serve as samples for further analysis. This is a requirement in 2.8.1 Perform Data Quality Analysis of Source Data, but is also important at this stage of development. As data volumes in the production environment are often large, it is advisable to request a subset of the data for evaluation purposes. However, requesting too small a subset can be dangerous in that it fails to provide a complete picture of the data and may hide quality issues that truly exist. Another important element of the source system analysis is to determine the life expectancy of the source system itself. Try to determine if the source system is likely to be replaced or phased out in the foreseeable future. As companies merge, or technologies and processes improve, many companies upgrade or replace their systems. This can present challenges to the team, as the primary knowledge of those systems may be replaced as well. Understanding the life expectancy of the source system plays a crucial part in the design process. For example, assume you are building a customer data warehouse for a small bank. The primary source of customer data is a system called Shucks, and you will be building a staging area in the warehouse to act as a landing area for all of the source data. After your project starts, you discover that the bank is being bought out by a larger bank and that Shucks will be replaced within three months by the larger bank's source of customer data: a system called Grins. Instead of having to redesign the entire data warehouse to handle the new source system, it may be possible to design a generic staging area that could fit any customer source system, rather than building a staging area based on one specific source system. Assuming that the bulk of the processing occurs after the data has landed in the staging area, you can minimize the impact of replacing source systems by designing a generic staging area that essentially allows you to plug in the new source system. Designing this type of staging area, however, takes a large amount of planning and adds time to the schedule, but it is well worth the effort because the warehouse is then able to handle source system changes. For Data Migration, the source systems that are in scope should be understood at the start of the project. During the Analyze Phase, these systems should be confirmed and communicated to all key stakeholders. If there is a disconnect about which systems are in and out of scope, it is important to document and analyze the impact. Identifying new source systems may dramatically increase the amount of resources needed on the project and require re-planning. Make a point to over-communicate which systems are in scope.
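One hedged way to picture the "generic staging area" idea above is to land any customer source as source-tagged name/value pairs rather than a schema tied to one system. The function, table, and field names below are illustrative assumptions, not part of the Velocity material.

from datetime import datetime, timezone

# Hypothetical generic staging record: source-agnostic name/value pairs keyed by
# source system, source entity, and business key, so a new source (e.g., "Grins"
# replacing "Shucks") can be plugged in without redesigning the staging schema.
def to_staging_rows(source_system, entity, business_key, record):
    loaded_at = datetime.now(timezone.utc)
    return [
        {
            "source_system": source_system,
            "source_entity": entity,
            "business_key": business_key,
            "attribute_name": name,
            "attribute_value": str(value),
            "loaded_at": loaded_at,
        }
        for name, value in record.items()
    ]

rows = to_staging_rows("Shucks", "CUSTOMER", "C-1001",
                       {"last_name": "Smith", "postcode": "94025"})
print(rows[0])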

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:28

Phase 2: Analyze
Subtask 2.3.2 Determine Sourcing Feasibility
Description
Before beginning to work with the data, it is necessary to determine precisely what data is required to support the data integration solution. In addition, the developers must determine:
- what source systems house the data.
- where the data resides in the source systems.
- how the data is accessed.
Take care to focus only on data that is within the scope of the requirements. Involvement of the business community is important in order to prioritize the business data needs based upon how effectively the data supports the users' top priority business problems. Determining sourcing feasibility is a two-stage process, requiring:
- A thorough, high-level understanding of the candidate source systems.
- A detailed analysis of the data sources within these source systems.

Prerequisites
None

Roles

Application Specialist (Primary) Business Analyst (Primary) Data Architect (Primary) Data Quality Developer (Primary) Metadata Manager (Primary)

Considerations
In determining the source systems for data elements, it is important to request copies of the source system data to serve as samples for further analysis. Because data volumes in the production environment are often large, it is advisable to request a subset of the data for evaluation purposes. However, requesting too small a subset can be dangerous in that it fails to provide a complete picture of the data and may hide any quality issues that exist. Particular care needs to be taken when archived historical data (e.g., data archived on tapes) or syndicated data sets (i.e., externally provided data such as market research) is required as a source to the data integration application. Additional resources and procedures may be required to sample and analyze these data sources.

Candidate Source System Analysis


A list of business data sources should have been prepared during the business requirements phase. This list typically identifies 20 or more types of data that are required to support the data integration solution and may include, for example, sales forecasts, customer demographic data, product information (e.g., categories and classifiers), and financial information (e.g., revenues, commissions, and budgets). The candidate source systems (i.e., where the required data can be found) can be identified based on this list. There may be a single source or multiple sources for the required data. Types of source include:

- Operational sources - The systems an organization uses to run its business. These may be any combination of ERP and legacy operational systems.
- Strategic sources - The data may be sourced from existing strategic decision support systems; for example, executive information systems.
- External sources - Any information source provided to the organization by an external entity, such as Nielsen marketing data or Dun & Bradstreet.
The following checklist can help to evaluate the suitability of data sources, which can be particularly important for resolving contention amongst the various sources:
- Appropriateness of the data source with respect to the underlying business functions.
- Source system platform.
- Unique characteristics of the data source system.
- Source system boundaries with respect to the scope of the project being undertaken.
- The accuracy of the data from the data source.
- The timeliness of the data from the data source.
- The availability of the data from the data source.
- Current and future deployment of the source data system.
- Access licensing requirements and limitations.
Consider, for example, a low-latency data integration application that requires credit checks to be performed on customers seeking a loan. In this case, the relevant source systems may be:
- A call center that captures the initial transactional request and passes this information in real time to a data integration application.
- An external system against which a credibility check needs to be performed by the data integration application (i.e., to determine a credit rating).
- An internal data warehouse accessed by the data integration application to validate and complement the information.
Timeliness, reliability, accuracy of data, and a single source for reference data may be key factors influencing the selection of the source systems. Note that projects typically under-estimate problems in these areas. Many projects run into difficulty because poor data quality, both at the high (metadata) and low (record) levels, impacts the ability to perform transform and load operations. An appreciation of the underlying technical feasibility may also impact the choice of data sources and should be within the scope of the high-level analysis being undertaken. This activity is about compiling information about the "as is" and "as will be" technological landscape that affects the characteristics of the data source systems and their impact on the data integration solution. Factors to consider in this survey are:
- Current and future organizational standards.
- Infrastructure.
- Services.
- Networks.
- Hardware, software, and operational limitations.
- Best practices.
- Migration strategies.
- External data sources.
- Security criteria.
For B2B solutions, solutions with significant file-based data sources, and other solutions with complex data transformation requirements, it is also necessary to assess data sizes, volumes, and the frequency of data updates with respect to the ability to parse and transform the data, and the implications these will have on hardware and software requirements.

A high-level analysis should also allow for the early identification of risks associated with the planned development, for example:
- If a source system is likely to be replaced or phased out in the foreseeable future. As companies merge, or technologies and processes improve, many companies upgrade or replace their systems. This can present challenges to the team, as the primary knowledge of those systems may be replaced as well. Understanding the life expectancy of the source system plays a crucial part in the design process.
- If the scope of the development is larger than anticipated.
- If the data quality is determined to be inadequate in one or more respects.
Completion of this high-level analysis should reveal a number of feasible source systems, as well as the points of contact and owners of the source systems. Further, it should produce a checklist identifying any issues about inaccuracies of the data content, gaps in the data, and any changes in the data structures over time.

Data Quality
The next step in determining source feasibility is to perform a detailed analysis of the data sources, both in structure and in content, and to create an accurate model of the source data systems. Understanding data sources requires the participation of a data source expert/Data Quality Developer and a Business Analyst to clarify the relevance, technical content, and business meaning of the source data. A complete set of technical documentation and application source code should be available for this step. Documentation should include Entity-Relationship diagrams (E-R diagrams) for the source systems; these diagrams then serve as the blueprints for extracting data from the source systems. It is important not to rely solely on the technical documentation to obtain accurate descriptions of the source data, since this documentation may be out of date and inaccurate. Data profiling is a useful technique for determining the structure and integrity of the data sources, particularly when used in conjunction with the technical documentation. The data profiling process involves analyzing the source data, taking an inventory of available data elements, and checking the format of those data elements. It is important to work with the actual source data, either the complete dataset or a representative subset, depending on the data volume. Using sample data derived from the actual source systems is essential for identifying data quality issues and for determining if the data meets the business requirements of the organization. The output of the data profiling effort is a survey, whose recipients include the data stewardship committee, which documents:
- Inconsistencies of data structures with respect to the documentation.
- Gaps in data.
- Invalid data.
- Missing data.
- Missing documentation.
- Inconsistencies of data with respect to the business rules.
- Inconsistencies in standards and style.
- An assessment of whether the source data is in a suitable condition for extraction.
- Re-engineering requirements to correct content errors.
Bear in mind that the issue of data quality can cleave in two directions: discovering the structure and metadata characteristics of the source data, and analyzing the low-level quality of the data in terms of record accuracy, duplication, and other metrics. In-depth structural and metadata profiling of the data sources can be conducted through Informatica Data Explorer. Low-level, per-record data quality issues must also be uncovered and, where necessary, corrected or flagged for correction at this stage in the project. See 2.8 Perform Data Quality Audit for more information on required data quality and data analysis steps.
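A hedged sketch of the per-column checks a profiling pass typically produces (row counts, null rates, distinct values, basic format checks) follows; the sample data and column names are invented for illustration, and a real effort would use a profiling tool such as Informatica Data Explorer or the PowerCenter Profiling option rather than hand-written code.

import re

# Hypothetical sample rows extracted from a source system.
rows = [
    {"customer_id": "1001", "postcode": "94025", "email": "a@example.com"},
    {"customer_id": "1002", "postcode": None,    "email": "not-an-email"},
    {"customer_id": "1002", "postcode": "10001", "email": None},
]

def profile(rows, column, pattern=None):
    """Summarize one column: row count, nulls, distinct values, optional format check."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v not in (None, "")]
    result = {
        "column": column,
        "rows": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
    }
    if pattern:
        result["format_violations"] = sum(
            1 for v in non_null if not re.fullmatch(pattern, v))
    return result

print(profile(rows, "customer_id"))                        # duplicates appear as distinct_count < rows
print(profile(rows, "email", pattern=r"[^@\s]+@[^@\s]+"))  # crude email format check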

Determine Source Availability


The next step is to determine when all source systems are likely to be available for data extraction. This is necessary in order to determine realistic start and end times for the load window. The developers need to work closely with the source system administrators during this step because the administrators can provide specific information about the hours of operation for their systems.

The Source Availability Matrix lists all the sources that are being used for data extraction and specifies the systems' downtimes during a 24-hour period. This matrix should contain details of the availability of the systems on different days of the week, including weekends and holidays. For Data Migration projects, access to data is not normally a problem given the premise of the solution. Typically, data migration projects have high-level sponsorship and whatever is needed is provided. However, for smaller-impact projects it is important that direct access is provided to all systems that are in scope. If direct access is not available, timelines should be increased and risk items should be added to the project. Historically, most projects without direct access run over time due to the limited availability of key resources to provide extracted data. If this can be avoided by providing direct access, it should be.
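The hedged sketch below shows one way a Source Availability Matrix might be represented so that a realistic shared load window can be computed; the systems, days, and hours are invented examples and not a prescribed format.

# Hypothetical availability matrix: for each source, the hours (24h clock) during
# which the system is available for extraction on a given type of day.
availability = {
    "Shucks (CRM)": {"weekday": (22, 24), "saturday": (20, 24), "sunday": (0, 24)},
    "GL (Finance)": {"weekday": (21, 24), "saturday": (0, 24),  "sunday": (0, 24)},
}

def common_window(matrix, day):
    """Latest start and earliest end across all sources for the given day."""
    starts, ends = zip(*(hours[day] for hours in matrix.values()))
    start, end = max(starts), min(ends)
    return (start, end) if start < end else None

print(common_window(availability, "weekday"))   # (22, 24) -> a two-hour shared window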

Determine File Transformation Constraints


For solutions with complex data transformation requirements, the final step is to determine the feasibility of transforming the data to target formats and any implications this will have on the eventual system design. Very large flat file formats often require splitting processes to be introduced into the design in order to break the data into manageably sized chunks for subsequent processing. This requires identification of appropriate boundaries for splitting and may require additional steps to convert the data into formats that are suitable for splitting. For example, large PDF-based data sources may require conversion into some other format, such as XML, before the data can be split.
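A hedged sketch of the kind of splitting step described above: breaking a very large delimited file into manageable chunks on a record boundary. The file name, chunk size, and header handling are assumptions for illustration; format conversion (e.g., PDF to XML) would be handled by a separate tool before a step like this.

def split_flat_file(path, records_per_chunk=1_000_000, header=True):
    """Split a large delimited file into sequentially numbered chunk files,
    breaking only on record (line) boundaries and repeating the header row."""
    with open(path, "r", encoding="utf-8") as src:
        header_line = src.readline() if header else ""
        chunk, count = 0, 0
        out = open(f"{path}.part{chunk:04d}", "w", encoding="utf-8")
        out.write(header_line)
        for line in src:
            if count == records_per_chunk:
                out.close()
                chunk, count = chunk + 1, 0
                out = open(f"{path}.part{chunk:04d}", "w", encoding="utf-8")
                out.write(header_line)
            out.write(line)
            count += 1
        out.close()

# Usage with a hypothetical extract file:
# split_flat_file("claims_extract.txt", records_per_chunk=500_000)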

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:37

Phase 2: Analyze
Subtask 2.3.3 Determine Target Requirements
Description
This subtask provides detailed business requirements that lead to design of the target data structures for a data integration project. For Operational Data Integration projects, this may involve identifying a subject area or transaction set within an existing operational schema or a new data store. For Data Warehousing / Business Intelligence projects, this typically involves putting some structure to the informational requirements. The preceding business requirements tasks (see Prerequisites) provide a high-level assessment of the organization's business initiative and provide business definitions for the information desired. Note that if the project involves enterprise-wide data integration, it is important that the requirements process involve representatives from all interested departments and that those parties reach a semantic consensus early in the process.

Prerequisites
None

Roles

Application Specialist (Secondary) Business Analyst (Primary) Data Architect (Primary) Data Steward/Data Quality Steward (Secondary) Data Transformation Developer (Secondary) Metadata Manager (Primary) Technical Architect (Primary)

Considerations
Operational Data Integration


For an operational data integration project, requirements should be based on existing or defined business processes. However, for data warehousing projects, strategic information needs must be explored to determine the metrics and dimensions desired.

Metrics
Metrics should indicate an actionable business measurement. An example for a consultancy might be: "Compare the utilization rate of consultants for period x, segmented by industry, for each of the major geographies, as compared to the prior period." Often a mix of financial (e.g., budget targets) and operational (e.g., trends in customer satisfaction) key performance metrics is required to achieve a balanced measure of organizational performance. The key performance metrics may be directly sourced from an existing operational system or may require integration of data from various systems. Market analytics may indicate a requirement for metrics to be compared to external industry performance criteria. The key performance metrics should be agreed upon through a consensus of the business users to provide common and meaningful definitions. This facilitates the design of processes to treat source metrics that may arrive in a variety of formats from various source systems.
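To make the consultancy example concrete, the hedged sketch below computes a utilization rate per industry and geography and compares it to the prior period. The periods, segments, and hours are invented figures; the point is only that the metric definition (billable hours divided by available hours) and its segmentation must be agreed on before any such calculation is automated.

from collections import defaultdict

# Hypothetical bookings: (period, industry, geography, billable_hours, available_hours)
bookings = [
    ("2011-Q1", "Retail",  "EMEA", 3400, 4000),
    ("2011-Q1", "Finance", "AMER", 3800, 4000),
    ("2011-Q2", "Retail",  "EMEA", 3600, 4000),
    ("2011-Q2", "Finance", "AMER", 3500, 4000),
]

def utilization_by_segment(rows, period):
    """Utilization rate = billable hours / available hours, per industry and geography."""
    billable, available = defaultdict(float), defaultdict(float)
    for p, industry, geo, bill, avail in rows:
        if p == period:
            billable[(industry, geo)] += bill
            available[(industry, geo)] += avail
    return {seg: billable[seg] / available[seg] for seg in billable}

current = utilization_by_segment(bookings, "2011-Q2")
prior = utilization_by_segment(bookings, "2011-Q1")
for seg in current:
    print(seg, round(current[seg] - prior.get(seg, 0.0), 3))  # change vs prior period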

Dimensions

The key to determining dimension requirements is to formulate a business-oriented description of the segmentation requirements for each of the desired metrics. This may involve an iterative process of interaction with the business community during requirements gathering sessions, paying attention to words such as "by" and "where". For example, a Pay-TV operator may be interested in monitoring the effectiveness of a new campaign geared at enrolling new subscribers. In a simple case, the number of new subscribers would be an essential metric; it may, however, be important to the business community to perform an analysis based on the dimensions (e.g., by demography, by group, or by time). A technical consideration at this stage is to understand whether the dimensions are likely to be rapidly changing or slowly changing, since this can affect the structure of an eventual data model built from this analysis. Rapidly-changing dimensions are those whose values may change frequently over their lifecycle (e.g., a customer attribute that changes many times a year), as opposed to a slowly-changing dimension such as an organization that may only change when a reorganization occurs. It is also important at this stage to determine as many likely summarization levels of a dimension as possible. For example, time may have a hierarchical structure comprising year, quarter, month, and day, while geography may be broken down into Major Region, Area, Subregion, etc. It is also important to clarify the lowest level of detail that is required for reporting. The metric and dimension requirements should be prioritized according to perceived business value to aid in the discussion of project scope in case there are choices to make regarding what to include or exclude.
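To make the rapidly- versus slowly-changing distinction concrete, the hedged sketch below shows the common Type 2 approach of expiring a dimension row and inserting a new version when a tracked attribute changes. The attribute and key names are assumptions, and other change-tracking strategies exist; the more often an attribute changes, the more rows an approach like this generates, which is why change frequency matters to the data model.

from datetime import date

# Hypothetical Type 2 dimension rows for one customer.
dim_customer = [
    {"customer_key": 1, "customer_id": "C-1001", "segment": "Bronze",
     "effective_from": date(2010, 1, 1), "effective_to": None, "is_current": True},
]

def apply_scd2(dim_rows, customer_id, new_segment, change_date, next_key):
    """Expire the current row and insert a new version if the tracked attribute changed."""
    current = next(r for r in dim_rows
                   if r["customer_id"] == customer_id and r["is_current"])
    if current["segment"] == new_segment:
        return dim_rows                      # no change, nothing to version
    current["effective_to"], current["is_current"] = change_date, False
    dim_rows.append({"customer_key": next_key, "customer_id": customer_id,
                     "segment": new_segment, "effective_from": change_date,
                     "effective_to": None, "is_current": True})
    return dim_rows

apply_scd2(dim_customer, "C-1001", "Gold", date(2011, 6, 1), next_key=2)
print(len(dim_customer))  # 2 rows: history preserved, current row flagged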

Data Migration Projects


Data migration projects should be exclusively driven by the target system needs, not by what is available in the source systems. Therefore, it is recommended to identify the target system needs early in the Analyze Phase and focus the analysis activities on those objects.

B2B Projects
For B2B projects, and non-B2B projects that have significant flat file based data targets, consideration needs to be given to the target data to be generated. Considerations include:
- What are the target file and data formats?
- What sizes of target files need to be supported? Will they require recombination of multiple intermediate data formats?
- Are there applicable intermediate or target canonical formats that can be created or leveraged?
- What XML schemas are needed to support the generation of the target formats?
- Do target formats conform to well-known proprietary or open data format standards?
- Does target data generation need to be accomplished within specific time or other performance-related thresholds?
- How are errors, both in data received and in overall B2B operation, communicated back to the internal operations staff and to external trading partners?
- What mechanisms are used to send data back to external partners?
- What applicable middleware, communications, and enterprise application software is used in the overall B2B operation? What data transformation implications does the choice of middleware and infrastructure software impose?
- How is the overall B2B interaction governed? What process flows are involved in the system and how are they managed (for example, via B2B Data Exchange, external BPM software, etc.)?
- Are there machine-readable specifications that can be leveraged, directly or with modification, to support specification-driven creation of data transformation scripts?
- Is sample data available for testing and verification of any data transformation scripts created?
At a higher level, the number and complexity of data sources, the number and complexity of data targets, and the number and complexity of intermediate data formats and schemas determine the overall scope of the data transformation and integration aspects of B2B data integration projects as a whole.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:46

Phase 2: Analyze
Subtask 2.3.4 Determine Business Process Data Flows
Description
For many real-time, file-oriented data integration solutions (such as B2B systems), the production of data outputs corresponds to a business process that defines how data flows through the system, how it is to be routed, and what data is produced in response to different circumstances. These can include:
- How data is to be received into the system.
- What specific business process flows are executed for different types of input data or for different values of inputs.
- How errors are handled and what is sent back to the submitter in the event of errors.
- How data is routed to backend or legacy systems.
- How data responses from legacy or backend systems are routed back to submitters or other parties.
- How data interacts with human workflow systems.
- How different trading partners are handled.
- Auditing, logging, and monitoring requirements.
In this subtask, the development project team needs to analyze how the initial source data is routed to backend systems and what responses are produced for normal success, failure, and system errors. Documentation should include UML collaboration, activity, or sequence diagrams, BPML specifications, or other design specifications for how data flows through the system. These documents then serve as the blueprints for defining business process flow logic, B2B Data Exchange partner management solutions, and other related subsystems.

Prerequisites
None

Roles

Business Analyst (Primary) Data Architect (Primary) Data Integration Developer (Primary) Data Transformation Developer (Primary) Technical Architect (Primary)

Considerations
In many data integration solutions the data flows are internal to the process. For B2B systems, the business process is externalized in the sense that trading partners (i.e., external parties who submit and receive data) have knowledge and expectations about what data outputs are sent in response to specific inputs. They also have knowledge about the circumstances under which errors, acknowledgements, and data responses are generated (and how they should be dealt with). Knowledge of these business processes may be embodied in solutions that exist in the trading partners' automated systems. In essence, the business process flows are part of the public interface to the system. Externalized business process flows occur frequently in B2B systems that utilize standards such as EDI-X12, EDIFACT, HIPAA transactions, HL7, and other open data transaction standards. For example, if a trading partner submits a HIPAA 837 message requesting payment for a health care claim, the submitter may expect HIPAA 997 messages in response to errors in their submission. They can also expect that a corresponding HIPAA 835 message will be generated to notify them of how the claim has been adjudicated.
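The sketch below is a hypothetical illustration, not a real HIPAA implementation, of the routing logic behind the 837/997/835 pattern described above: submissions with structural problems trigger an acknowledgement/error response to the submitter, while valid claims are forwarded to a backend adjudication system, which later produces the remittance response. The message fields and backend name are invented.

def route_inbound(message):
    """Hypothetical routing of an inbound B2B transaction to responses and backends."""
    if message["type"] != "837":
        return {"action": "reject", "response_to_submitter": "997 (unsupported transaction)"}
    if not message.get("claim_id") or not message.get("subscriber_id"):
        return {"action": "reject", "response_to_submitter": "997 (structural errors)"}
    return {"action": "forward",
            "backend": "claims adjudication system",
            "expected_response_to_submitter": "835 (after adjudication)"}

print(route_inbound({"type": "837", "claim_id": "CLM-1", "subscriber_id": "S-9"}))
print(route_inbound({"type": "837", "claim_id": None, "subscriber_id": "S-9"}))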

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 17:59

Phase 2: Analyze
Subtask 2.3.5 Build Roadmap for Incremental Delivery
Description
Data integration projects, whether data warehousing or operational data integration, are often large-scale, long-term projects. This can also be the case with analytics visualization projects or metadata reporting/management projects. Any complex project should be considered a candidate for incremental delivery. Under this strategy, the comprehensive objectives of the project are broken up into prioritized deliverables, each of which can be completed within approximately three months. This provides near-term deliverables that give early value to the business (which can be helpful in funding discussions) and, conversely, is an important avenue for early end-user feedback that may enable the development team to avoid major problems. This feedback may point out misconceptions or other design flaws which, if undetected, could cause costly rework later on. The roadmap, then, provides the project stakeholders with a rough timeline for completion of their entire objective, but also communicates the timing of these incremental sub-projects based on their prioritization. Below is an example of a timeline for a Sales and Finance data warehouse with the increments roughly spaced each quarter. Each increment builds on the completion of the prior increment, but each delivers clear value in itself.

Q1 Yr 1 - Implement Data Warehouse Architecture
Q2 Yr 1 - Revenue Analytics Complete
Q3 Yr 1 - Bookings, Billings, Backlog
Q4 Yr 1 - GL Analytics
Q1 Yr 2 - COGS Analysis

Prerequisites
None

Roles

Business Project Manager (Primary) Data Architect (Primary) Data Steward/Data Quality Steward (Secondary) Project Sponsor (Secondary) Technical Architect (Primary) Technical Project Manager (Primary)

Considerations
The roadmap is the culmination of business requirements analysis and prioritization. The business requirements are reviewed for logical subprojects (increments), the source analysis is reviewed to assess feasibility, and business priorities are used to set the sequence of the increments, factoring in feasibility and the interoperability or dependencies of the increments. The objective is to start with increments that are highly feasible, have no dependencies, and provide significant value to the business. One or two of these quick-hit increments are important to build end-user confidence and patience, as the later, more complex increments may be harder to deliver. It is critical to gain the buy-in of the main project stakeholders regarding priorities and agreement on the roadmap sequence. Advantages of incremental delivery include:

- Customer value is delivered earlier; the business sees an early start to its ROI.
- Early increments elicit feedback and sometimes clearer requirements that will be valuable in designing the later increments.
- There is much lower risk of overall project failure because of the plan for early, attainable successes.
- It is highly likely that even if all of the long-term objectives are not achieved (they may prove infeasible or lose favor with the business), the project still provides the value of the increments that are completed.
- Because the early increments reflect high-priority business needs, they may attract more visibility and have greater perceived value than the project as a whole.
Disadvantages can include:
- There is always some extra effort involved in managing the release of multiple increments. However, there is less risk of costly rework effort due to misunderstood (or changing) requirements because of early feedback from end-users.
- There may be schema redesign or other rework necessary after initial increments because of unforeseen requirements or interdependencies.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 17:20

Phase 2: Analyze
Task 2.4 Define Functional Requirements
Description
For any project to be ultimately successful, it must meet the business objectives in a way that the end users find easy to use and satisfactory in addressing their needs. A functional requirements document is necessary to ensure that the project team understands these needs in detail and is capable of proceeding with a system design based upon the end-user needs. The business drivers and goals provide a high-level view of these needs and serve as the starting point for the detailed functional requirements document. Business rules and data definitions further clarify specific business requirements and are very important in developing detailed functional requirements and, ultimately, the design itself.

Prerequisites
None

Roles

Business Project Manager (Review Only)

Considerations
Different types of projects require different functional requirements analysis processes. For example, an understanding of how key business users will use analytics reporting should drive the functional requirements for a business analytics project, while the requirements for data migration or operational data integration projects should be based on an analysis of the target transactions they are expected to support and what the receiving system needs in order to process the incoming data. Requirements for metadata management projects involve reviewing IT requirements for reporting and managing project metadata, surveying the corporate information technology landscape to determine potential sources of metadata, and interviewing potential users to determine reporting needs and preferences.

Business Analytics Projects


Developers need to understand the end users' expectations and preferences in terms of analytic reporting in order to determine the functional requirements for data warehousing or analytics projects. This understanding helps to determine the details regarding what data to provide, with what frequency, at what level of summarization and periodicity, with what special calculations, and so forth. The analysis may include studying existing reporting, interviewing current information providers (i.e., those currently developing reports and analyses for Finance and other departments), and even reviewing mock-ups and usage scenarios with key end users.

Data Migration
Functional requirements analysis for data migration projects involves a thorough understanding of the target transactions within the receiving system(s) and how the systems will process the incoming data for those transactions. The business requirements should indicate frequency of load for migration systems that will be run in parallel for a period of time (i.e., repeatedly).

Operational Data Integration


These projects are similar to data migration projects in terms of the need to understand the target transactions and how the data will be processed to accommodate them. The processing may involve multiple load steps, each with a different purpose, some operational and perhaps some for reporting. There may also be real-time requirements for some, and there may be a need for interfaces with queue-based messaging systems where EAI-type integration between operational databases is involved, or where master data management requirements exist.

Data Integration Projects


For all data integration projects (i.e., all of the above), developers also need to review the source analysis with the DBAs to determine the functional requirements of the source extraction processes.

B2B Projects
For B2B projects and flat file/XML-based data integration projects, the data formats required for trading partners to interact with the system, the mechanisms for trading partners and operators to determine the success and failure of transformations, and the internal interactions with legacy systems and other applications all form part of the requirements of the system. These, in turn, may impose additional user interface and/or EAI-type integration requirements. For large B2B projects, overall business process management will typically form part of the overall system, which may impose requirements around the use of partner management software such as B2B Data Exchange and/or business process management software. Often B2B systems have real-time requirements and involve the use of interfaces with queue-based messaging systems, web services, and other application integration technologies. While these are technical, rather than business, requirements, for Business Process Outsourcing and other types of B2B interaction, technical considerations often form a core component of the business operation.

Building the Specifications


For each distinct set of functional requirements, the Functional Requirements Specifications template can provide a valuable guide for determining the system constraints, inputs, outputs, and dependencies. For projects using a phased approach, priorities need to be assigned to functions based on the business needs, dependencies for those functions, and general development efficiency. Prioritization will determine in what phase certain functionality is delivered.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:51

Phase 2: Analyze
Task 2.5 Define Metadata Requirements
Description
Metadata is often articulated as data about data. It is the collection of information that further describes the data used in the data integration project. Examples of metadata include:
- Definition of the data element
- Business names of the element
- System abbreviations for that element
- The data type (string, date, decimal, etc.)
- Size of the element
- Source location
In terms of flat file and XML sources, metadata can include open and proprietary data standards and an organization's interpretations of those standards. In addition to the previous examples, flat file metadata can include:
- Standards documents governing the layout and semantics of data formats and interchanges.
- Companion or interpretation guides governing an organization's interpretation of data in a particular standard.
- Specifications of transformations between data formats.
- COBOL copybook definitions for flat file data to be passed to legacy or backend systems.
All of these pieces of metadata are of interest to various members of the metadata community; some are of interest only to certain technical staff members, while other pieces may be very useful for business people attempting to navigate through the enterprise data warehouse or across and through various business/subject area-oriented data marts. That is, metadata can provide answers to typical business questions such as:
- What does a particular piece of data mean (i.e., its definition in business terminology)?
- What is the time scale for some number?
- How is some particular metric calculated?
- Who is the data owner?
Metadata also provides answers to technical questions:
- What does this mapping do (i.e., source-to-target dependency)?
- How will a change over here affect things over there (i.e., impact analysis)?
- Where are the bottlenecks (i.e., in reports or mappings)?
- How current is my information?
- What is the load history for a particular object?
- Which reports are being accessed most frequently and by whom?
The components of a metadata requirements document include:
- Decision on how metadata will be used in the organization
- Assignment of data ownership
- Decision on who should use what metadata, and why, and how
- Determination of business and source system definitions and names
- Determination of metadata sources (i.e., modeling tools, databases, ETL, BI, OLAP, XML schemas, etc.)
- Determination of training requirements
- Determination of the quality of the metadata sources (i.e., absolute, relative, historical, etc.)
- Determination of methods to consolidate metadata from multiple sources
- Identification of where metadata will be stored (e.g., central, distributed, or both)
- Evaluation of the metadata products and their capabilities (i.e., repository-based, CASE dictionary, warehouse manager, etc.)
- Determination of responsibility for: capturing the metadata; establishing standards and procedures; maintaining and securing the metadata; and proper use, quality control, and update procedures
- Establishment of metadata standards and procedures
- Definition of naming standards (i.e., abbreviations, class words, code values, etc.)
- Creation of a metadata committee
- Determination of whether the metadata storage will be active or passive
- Determination of the physical requirements of the metadata storage
- Determination and monitoring of measures to establish the use and effectiveness of the metadata
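A hedged sketch of how the element-level metadata listed above (definition, business name, abbreviation, type, size, source, owner) might be captured, together with a trivial naming-standard check. The field names and the approved-abbreviation rule are assumptions for illustration, not a prescribed metadata model.

from dataclasses import dataclass

APPROVED_ABBREVIATIONS = {"CUST", "ADDR", "AMT"}   # hypothetical naming standard

@dataclass
class MetadataElement:
    business_name: str
    system_abbreviation: str
    definition: str
    data_type: str
    size: int
    source_location: str
    data_owner: str

    def naming_violations(self):
        """Flag abbreviations that are not in the approved standards list."""
        return [part for part in self.system_abbreviation.split("_")
                if part not in APPROVED_ABBREVIATIONS]

elem = MetadataElement("Customer Address", "CUST_ADDR",
                       "Primary mailing address of the customer", "string", 120,
                       "CRM.CUSTOMER.ADDRESS_1", "Data Steward - Sales")
print(elem.naming_violations())   # [] -> conforms to the (illustrative) standard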

Prerequisites
None

Roles

Application Specialist (Review Only) Business Analyst (Primary) Business Project Manager (Primary) Data Architect (Primary) Data Steward/Data Quality Steward (Primary) Database Administrator (DBA) (Primary) Metadata Manager (Primary) System Administrator (Primary)

Considerations
One of the primary objectives of this subtask is to attain broad consensus among all key business beneficiaries regarding metadata business requirements and priorities; it is therefore critical to obtain as much participation as possible in this process.

B2B Projects
For B2B and flat file oriented data integration projects, metadata is often defined in less structured forms than data dictionaries or other traditional means of managing metadata. The process of designing the system may include the need to determine and document the metadata consumed and produced by legacy and third-party systems. In some cases, applicable metadata may need to be mined from sample operational data or from unstructured and semi-structured system documentation. For B2B projects, getting adequate sample source and target data can become a critical part of defining the metadata requirements.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 20:05

Phase 2: Analyze
Subtask 2.5.1 Establish Inventory of Technical Metadata
Description
Organizations undertaking new initiatives require access to consistent and reliable data resources. Confidence in the underlying information assets and an understanding of how those assets relate to one another can provide valuable leverage in the strategic decision-making process. As organizations grow through mergers and consolidations, systems that generate data become isolated resources unless they are properly integrated. Integrating these data assets and turning them into key components of the decision-making process requires significant effort. Metadata is required for a number of purposes:
- Provide a data dictionary
- Assist with change management and impact analysis
- Provide a system of record (lineage)
- Facilitate data auditing to comply with regulatory requirements
- Provide a basis on which formal data cleansing can be conducted
- Identify potential choices of canonical data formats
- Facilitate definition of data mappings
An inventory of sources (i.e., repositories) is necessary in order to understand the availability and coverage of metadata, the ease of accessing and collating what is available, and any potential gaps in metadata provisioning. The inventory is also the basis on which the development of metadata collation and reporting can be planned. In particular, if Metadata Manager is used, there may be a need to develop custom resources to access certain metadata repositories, which can require significant effort. A metadata inventory provides the basis on which informed estimates and project plans can be prepared.

Prerequisites
None

Roles

Application Specialist (Review Only) Business Analyst (Primary) Data Steward/Data Quality Steward (Primary) Metadata Manager (Primary) Technical Architect (Primary)

Considerations
The first part of the process is to establish a Metadata Inventory that lists all metadata sources. This investigation will establish:
- The (generally recognized) name of each source.
- The type of metadata (usually the product maintaining it) and the format in which it is kept (e.g., database type and version).
- The priority assigned to investigation.
- Cross-references to other documents (e.g., design or modeling documents).
- The type of reporting expected from the metadata.
- The availability of an XConnect (assuming Metadata Manager is used) to access the repository and collate the metadata.

The second part of the process is to investigate in detail those metadata repositories or sources that will be required to meet the next phase of requirements. This investigation will establish:
- Ownership and stewardship of the metadata (responsibilities of the owners and stewards are usually pre-defined by an organization and are not part of preparing the metadata inventory).
- Existence of a metadata model (one will need to be developed if it does not exist, usually by the Business Analysts and System Specialist).
- System and business definitions of the metadata items.
- Frequency and methods of updating the repository.
- Extent of any update history.
- Those elements required for reporting/analysis purposes.
- The quality of the metadata sources (quality can be measured qualitatively by a questionnaire issued to users, but may be better measured against metrics that either exist within the organization or are proposed as part of developing this inventory).
- The development effort involved in developing a method of accessing/extracting metadata (for Metadata Manager, a custom XConnect) if none already exists. Ideally, the estimates should be in man-days by skill and include a list of prerequisites and dependencies.
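A hedged sketch of how the inventory entries described above could be recorded and then filtered by investigation priority. The sources, effort figures, and XConnect flags are invented examples; a spreadsheet or the Metadata Inventory sample deliverable would serve the same purpose.

# Hypothetical metadata inventory entries capturing the attributes listed above.
inventory = [
    {"source": "Sales data model", "type": "ERwin model", "format": "XML export",
     "priority": 1, "prebuilt_xconnect": True,  "custom_effort_days": 0},
    {"source": "Legacy claims COBOL copybooks", "type": "Copybook files", "format": "text",
     "priority": 2, "prebuilt_xconnect": False, "custom_effort_days": 15},
]

def next_phase(entries, max_priority=1):
    """Sources to investigate in detail for the next phase of requirements."""
    return [e for e in entries if e["priority"] <= max_priority]

for entry in next_phase(inventory):
    print(entry["source"], "- custom effort (days):", entry["custom_effort_days"])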

B2B Projects
For B2B and flat file oriented data integration projects, metadata is often defined and maintained in non-database-oriented forms such as XML schemas or data format specifications (and specifications as to how standards should be interpreted). Metadata may need to be mined from sample data, legacy systems, and/or mapping specifications. Metadata repositories may take the form of document repositories using document management or source control technologies. In B2B systems, the responsibility for tracking metadata may shift to members of the technical architecture team, as traditional database design, planning, and maintenance may play a lesser role in these systems.

Best Practices
None

Sample Deliverables
Metadata Inventory

Last updated: 20-May-08 20:12

Phase 2: Analyze
Subtask 2.5.2 Review Metadata Sourcing Requirements
Description
This subtask collects business requirements about the metadata that is expected to be stored and analyzed. These requirements are determined by the reporting needs for metadata, as well as the details of each metadata source and the ability to extract and load this information into the respective metadata repository or metadata warehouse. Having a thorough understanding of these requirements is a must for a smooth and timely implementation of any metadata analysis solution.

Prerequisites
None

Roles

Business Project Manager (Review Only) Metadata Manager (Primary) System Administrator (Review Only)

Considerations
Determine Metadata Reporting Requirements


Metadata reporting requirements should drive the specific metadata to be collected, as well as the implementation of tools to collect, store, and display it. The need to expose metadata to developers is quite different from the need to expose metadata to operations personnel, and different again from the need to expose metadata to business users. Each of these pieces of the metadata picture requires different information and can be stored in and handled by different metadata repositories. Developers typically require metadata that helps determine how source information maps to a target, as well as information that can help with impact analysis in the case of a change to a source or target structure or to transformation logic. If there are data quality routines to be implemented, source metadata can also help to determine the best method for such implementation, as well as specific expectations regarding the quality of the source data. Operations personnel generally require metadata regarding the data integration processes or business intelligence reporting, or both. This information is helpful in determining issues or problems with delivering information to the final end user, with regard to items such as the expected source data sizes versus the actual sizes processed; the time to run specific processes and whether load windows are being met; the number of end users running specific reports; the time of day reports are being run and when the load on the system is highest; etc. This metadata allows operations to address issues as they arise. When reviewing metadata, business users want to know how the data was generated (and related) and what manipulation, if any, was performed to produce it. The information examined ranges from specific reference metadata (i.e., ontologies and taxonomies) to the transformations and/or calculations that were used to create the final report values.

Sources of Metadata
After the initial reporting requirements are developed, the location and accessibility of the metadata must be considered. Some sources of metadata exist only in documentation, can be considered home grown by the systems that are used to perform specific tasks on the data, or exist as knowledge gained through the course of working with the data. If it is important to include this information in a metadata repository or warehouse, note that there is not likely to be an automated method of extracting and loading this type of metadata. In the best case, a custom process can be created to load this metadata; in the worst case, this information needs to be entered manually. Various other, more formalized sources of metadata usually have automated methods for loading to a metadata repository or warehouse. This includes information that is held in data modeling tools, data integration platforms, database management systems, and business intelligence tools. It is important to note that most sources of metadata that can be loaded in an automated fashion contain mechanisms for holding some custom or unstructured metadata, such as description fields. This may obviate the need for creating custom methods of loading metadata or manually entering the same metadata in various locations.

Metadata Storage and Loading


For each of the types of metadata reporting requirements mentioned, and for each of the various metadata sources, some methods of storage fit better than others and affect how the metadata can be sourced. Metadata for developers and operations personnel can generally be found and stored in the repositories of the software used to accomplish the tasks, such as the PowerCenter repository or the business intelligence software repository. These software packages usually include sufficient reporting capability to meet this type of reporting need. At the same time, most of these metadata repositories include locations for manually entering metadata, as well as for automatically importing metadata from various sources. Specifically, when using the PowerCenter repository as a metadata hub, there are various locations where description fields can be used to hold unstructured, more descriptive metadata. Mechanisms such as metadata extensions also allow for user-defined fields of metadata. In terms of automated loading, PowerCenter can import definitions from data modeling tools using Metadata Exchange. Metadata from various sources, targets, and other objects can also be imported natively through the connections PowerCenter makes to these systems, including database management systems, ERP systems via PowerConnects, and XML schema definitions.
In general, however, if robust reporting is required, or reporting across multiple software metadata repositories, a metadata warehouse platform such as Informatica Metadata Manager may be more appropriate. Metadata requirements for business users usually call for a platform that can integrate metadata from various sources and provide relatively robust reporting, which individual software metadata repositories usually lack. In these cases, a platform like Metadata Manager is optimal. When using Metadata Manager, custom XConnects need to be created for any metadata source that does not already have a pre-built loading interface, or for any source where the pre-built interface does not extract all of the required metadata. (For details about developing a custom XConnect, refer to the Informatica Metadata Manager 8.5.1 Custom Metadata Integration Guide.) Metadata Manager contains various XConnect interfaces for data modeling tools, the PowerCenter data integration platform, database management systems, and business intelligence tools. (For a specific list, refer to the Metadata Manager Administrator Guide.)
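As a simple illustration of putting such description fields to work, the following Python sketch scans an exported object list and reports objects whose description metadata was never populated. The export layout (a CSV with object_type, object_name, and description columns) is an assumption made for illustration only; it is not a PowerCenter or Metadata Manager interface.

import csv

def undocumented_objects(export_path):
    """List exported repository objects whose description field is empty.

    The CSV layout (object_type, object_name, description) is an assumed,
    illustrative export format, not a product-defined interface.
    """
    missing = []
    with open(export_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if not (row.get("description") or "").strip():
                missing.append((row.get("object_type"), row.get("object_name")))
    return missing

if __name__ == "__main__":
    # Hypothetical export file name.
    for obj_type, name in undocumented_objects("repository_objects.csv"):
        print("no description metadata:", obj_type, name)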

Metadata Analysis and Reports


The specific types of analysis and reports must also be considered with regard to exactly what metadata needs to be sourced. For metadata repositories like PowerCenter, the available analysis is very specific, and little information beyond what is normally sourced into the repository is available for reporting. With a metadata warehouse platform such as Metadata Manager, more comprehensive reporting can be created. At a high level, the following analysis is possible with Metadata Manager:
Metadata browsing
Metadata searching
Where-used analysis
Lineage analysis
Packaged reports
Metadata Manager also provides more specific metadata analysis to help analyze source repository metadata, including:
Business intelligence reports - to analyze a business intelligence system, such as report information, user activity, and how long it takes to run reports.
Data integration reports - to analyze data integration operations, such as reports that identify data integration problems, and to analyze data integration processes.
Database management reports - to explore database objects, such as schemas, structures, methods, triggers, and indexes, and the relationships among them.
Metamodel reports - to analyze how metadata is classified for each repository. (For more information about metamodels, refer to the Metadata Manager Administrator Guide.)
ODS reports - to analyze data in particular metadata repositories.
It is possible that, even with a metadata warehouse platform like Metadata Manager, some analysis requirements cannot be fulfilled by the above-mentioned features and out-of-the-box reports. Analysis should be performed to identify any gaps and to determine whether any customization or design can be done within Metadata Manager to resolve them. Bear in mind that Informatica Data Explorer (IDE) also provides a range of source data and metadata profiling and source-to-target mapping capabilities.

Best Practices
None

Sample Deliverables
None

Last updated: 09-May-08 13:52


Phase 2: Analyze
Subtask 2.5.3 Assess Technical Strategies and Policies
Description
Every IT organization operates using an established set of corporate strategies and related development policies. Understanding and detailing these approaches may require discussions ranging from Sarbanes-Oxley compliance to specific supported hardware and software considerations. The goal of this subtask is to detail and assess the impact of these policies as they relate to the current project effort.

Prerequisites
None

Roles

Application Specialist (Review Only) Business Project Manager (Primary) Data Architect (Primary) Database Administrator (DBA) (Primary) System Administrator (Primary)

Considerations
Assessing the impact of an enterprise's IT policies may involve a wide range of discussions covering an equally wide range of business and development areas. The following types of questions should be considered when beginning this effort.

Overall
Is there an overall IT Mission Statement? If so, what specific directives might affect the approach to this project effort?

Environment
What are the current hardware or software standards? For example, NT vs. UNIX vs. Linux? Oracle vs. SQL Server? SAP vs. PeopleSoft?
What, if any, data extraction and integration standards currently exist?
What source systems are currently utilized? For example, mainframe? flat file? relational database?
What, if any, regulatory requirements exist regarding access to and historical maintenance of the source data?
What, if any, load window restrictions exist regarding system and/or source data availability?
How many environments are used in a standard deployment? For example: 1) Development, 2) Test, 3) QA, 4) Pre-Production, 5) Production.
What is or will be the end-user presentation layer?

Project Team
What is a standard project team structure? For example, Project Sponsor, Business Analyst, Project Manager, Developer, etc.
Are dedicated support resources assigned? Or are they often shared among initiatives (e.g., DBAs)?
Is all development performed by full-time employees? Are contractors and/or offshore resources employed?

Project Lifecycle
What is a typical development lifecycle? What are standard milestones?
What criteria are typically applied to establish production readiness?


What change control mechanisms/procedures are in place? Are these controls strictly policy-based, or is specific change-control software in use?
What, if any, promotion/release standards are used?
What is the standard for production support?

Metadata and Supporting Documentation


What types of supporting documentation are typically required?
What, if any, is the current metadata strategy within the enterprise?
Resolving the answers to questions such as these enables greater accuracy in project planning, scoping, and staffing efforts. Additionally, the understanding gained from this assessment ensures that any new project effort will better align its approach with the established practices of the organization.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 18:11


Phase 2: Analyze
Task 2.6 Determine Technical Readiness
Description
The goal of this task is to determine the readiness of an IT organization with respect to its technical architecture, the implementation of that architecture, and the staffing required to support the technical solution. Conducting this analysis, through interviews with the existing IT team members (such as those noted in the Roles section), provides evidence as to whether the critical technologies and the associated support system are sufficiently mature that they do not present significant risk to the endeavor.

Prerequisites
None

Roles

Business Project Manager (Primary) Database Administrator (DBA) (Primary) System Administrator (Primary) Technical Architect (Primary)

Considerations
Carefully consider the following questions when evaluating the technical readiness of a given enterprise:

Has the architecture team been staffed and trained in the assessment of critical technologies?
Have all of the decisions been made regarding the various components of the infrastructure, including network, servers, and software?
Has a schedule been established for ordering, installing, and deploying the servers and network?
If the infrastructure is in place, what are its availability, capacity, scalability, and reliability?
Has the project team been fully staffed and trained, including but not limited to a Project Manager, Technical Architect, System Administrator, Developer(s), and DBA(s)? (See 1.2.1 Establish Project Roles.)
Are proven implementation practices and approaches in place to ensure a successful project? (See 2.5.3 Assess Technical Strategies and Policies.)
Has the Technical Architect evaluated and verified the Informatica PowerCenter Quickstart configuration requirements?
Has the repository database been installed and configured?
By answering questions such as these, developers can achieve a clearer picture of whether the organization is sufficiently ready to move forward with the project effort. This information also helps to develop a more accurate and reliable project plan.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:44


Phase 2: Analyze
Task 2.7 Determine Regulatory Requirements
Description
Many organizations must now comply with a range of regulatory requirements such as financial services regulation, data protection, Sarbanes-Oxley, retention of data for potential criminal investigations, and interchange of data between organizations. Some industries may also be required to complete specialized reports for government regulatory bodies. This can mean prescribed reporting, detailed auditing of data, and specific controls over actions on and processing of the data. These requirements differ from "normal" business requirements in that they are imposed by legislation and/or external bodies, and the penalties for not precisely meeting them can be severe. However, there is a "carrot and stick" element to regulatory compliance: regulatory requirements and industry standards can also present the business with an opportunity to improve its data processes and raise the quality of its data in key areas. Successful compliance (for example, with the Basel II Accord in the banking sector) brings the potential for more productive and profitable uses of data. As data is prepared for the later stages of a project, the project personnel must establish what government or industry standards the project data must adhere to and devise a plan to meet these standards. These steps include establishing a catalog of all reporting and auditing required, including any prescribed content, formats, processes, and controls. The definitions of content (e.g., inclusion/exclusion rules, timescales, units, etc.) and of any metrics or calculations are likely to be particularly important.

Prerequisites
None

Roles

Business Analyst (Primary) Business Project Manager (Review Only) Legal Expert (Primary)

Considerations
Areas where requirements arise include the following:
Sarbanes-Oxley regulations in the U.S. mean a proliferation of controls on processes and data. Developers need to work closely with an organization's Finance Department to ascertain exactly how Sarbanes-Oxley affects the project. There may be implications for how environments are set up and for controls on migration between environments (e.g., between Development, Test, and Production), as well as for sign-offs, specified verification, etc.
Another regulatory system applicable to financial companies is the Basel II Accord. While Basel II does not have the force of law, it is a de facto requirement within the international financial community.
Other industries are demanding adherence to new data standards, both communally, by coming together around common data models such as bar codes and RFID (radio frequency identification), and individually, as enterprises realize the benefits of synchronizing their data storage conventions with suppliers and customers. Such initiatives are sometimes gathered under the umbrella of Global Data Synchronization (GDS); the key benefit of GDS is that it is not a compliance chore but a positive and profitable initiative for a business.
If your project must comply with a government or industry regulation, or if the business simply insists on high standards for its data (for example, to establish a single version of the truth for items in the business chain), then you must increase your focus on data quality in the project. 2.8 Perform Data Quality Audit is dedicated to performing a Data Quality Audit that can provide the project stakeholders with a detailed picture of the strengths and weaknesses of the project data in key compliance areas such as accuracy, completeness, and duplication. For example, compliance with a request for data under Section 314 of the USA-PATRIOT Act is likely to be difficult for a business that finds it has large numbers of duplicate records, or records that contain empty fields or fields populated with default values. Such problems should be identified and addressed before the data is moved downstream in the project.


Regulatory requirements often demand the ability to clearly audit the processes affecting the data. This may require a metadata reporting system that can provide viewing and reporting of data lineage and where-used analysis. Such a system can also produce spin-off benefits for IT in terms of automated project documentation and impact analysis. Industry and regulatory standards for data interchange may also affect data model and ETL designs. HIPAA and HL7 compliance may dictate transaction definitions that affect healthcare-related projects, as may SWIFT or Basel II for finance-related data. Potentially there are now two areas to investigate in more detail: data and metadata.
Map the requirements back to the data and/or metadata required using a standard modeling approach.
Use data models and the metadata catalog to assess the availability and quality of the required data and metadata. Use the data models of the systems and data sources involved, along with the inventory of metadata.
Verify that the target data models meet the regulatory requirements.

Processes and Auditing Controls


It is important that data can be audited at every stage of processing where it is necessary. To this end, review any proposed processes and audit controls to verify that the regulatory requirements can be met and that any gaps are filled. Also, ensure that reporting requirements can be met, again filling any gaps. It is important to check that the format, content, and delivery mechanisms for all reports comply with the regulatory requirements.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 18:13


Phase 2: Analyze
Task 2.8 Perform Data Quality Audit
Description
Data quality is a key factor for several tasks and subtasks in the Analyze Phase. The quality of the proposed project source data, in terms of both its structure and content, is a key determinant of the specifics of the business scope and of the success of the project in general. For information on issues relating primarily to data structure, see subtask 2.3.2 Determine Sourcing Feasibility; this task focuses on the quality of the data content. Problems with the data content must be communicated to senior project personnel as soon as they are discovered. Poor data quality can impede the proper execution of later steps in the project, such as data transformation and load operations, and can also compromise the business's ability to generate a return on the project investment. This is compounded by the fact that most businesses underestimate the extent of their data quality problems. There is little point in performing a data warehouse, migration, or integration project if the underlying data is in bad shape. The Data Quality Audit is designed to analyze representative samples of the source data and discover their data quality characteristics so that these can be articulated to all relevant project personnel. The project leaders can then decide what actions, if any, are necessary to correct data quality issues and ensure that the successful completion of the project is not in jeopardy.

Prerequisites
None

Roles

Business Project Manager (Secondary) Data Quality Developer (Primary) Data Steward/Data Quality Steward (Primary) Technical Project Manager (Secondary)

Considerations
The Data Quality Audit can typically be conducted very quickly, but the actual time required is determined by the starting condition of the data and the success criteria defined at the beginning of the audit. The main steps are as follows:
Representative samples of source data from all main areas are provided to the Data Quality Developer.
The Data Quality Developer uses a data analysis tool to determine the quality of the data according to several criteria.
The Data Quality Developer generates summary reports on the data and distributes these to the relevant roles for discussion and next steps.
Two important aspects of the audit are (1) the data quality criteria used, and (2) the type of report generated.

Data Quality Criteria


You can define any number and type of criteria for your data quality. However, there are six standard criteria:
Accuracy is concerned with the general accuracy of the data in a dataset. It is often determined by comparing the dataset with a reliable reference source, for example, a dictionary file containing product reference data.
Completeness is concerned with missing data, that is, fields in the dataset that have been left empty or whose default values have been left unchanged. For example, many data input fields have a default date setting of 01/01/1900. If a record includes 01/01/1900 as a date of birth, it is highly likely that the field was never populated.
Conformity is concerned with data values of a similar type that have been entered in a confusing or unusable manner, for example, telephone numbers that include/omit area codes.
Consistency is concerned with the occurrence of disparate types of data records in a dataset created for a single data type (e.g., the combination of personal and business information in a dataset intended for business data only).
Integrity is concerned with the recognition of meaningful associations between records in a dataset. For example, a dataset may contain records for two or more family members in a household but without any means for the organization to recognize or use this information.
Duplication is concerned with data records that duplicate one another's information, that is, with identifying redundant records in the dataset or records with meaningful information in common. For example, a dataset may contain user-entered records for Batch No. 12345 and Batch 12345, where both records describe the same batch; or a dataset may contain several records with common surnames and street addresses, indicating that the records refer to a single household - this type of information is relevant to marketing personnel.
This list is not absolute; the characteristics above are sometimes described with other terminology, such as redundancy or timeliness. Every organization's data needs are different, and the prevalence and relative priority of data quality issues differ from one organization and one project to the next. Note that the accuracy factor differs from the other five factors in the following respect: whereas, for example, a pair of duplicate records may be visible to the naked eye, it can be difficult to tell simply by eyeballing whether a given data record is inaccurate. Accuracy can be determined by applying fuzzy logic to the data or by validating the records against a verified reference data set.
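Several of these criteria can be prototyped on a sample extract before any tooling decisions are made. The following Python sketch, offered as a minimal illustration only, measures completeness (empty fields and unchanged 01/01/1900 defaults, per the example above) and a simple duplication indicator; the file name, field names, and CSV layout are assumptions, not part of the Velocity methodology or Informatica Data Quality.

import csv
from collections import Counter

DEFAULT_DATE = "01/01/1900"  # the "never populated" default date described above (an assumption)

def audit_sample(path, key_fields=("surname", "street_address")):
    """Measure simple completeness and duplication indicators on a CSV sample."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    total = len(rows)

    # Completeness: count empty fields and unchanged default dates per column.
    empty_counts, default_dates = Counter(), Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or value.strip() == "":
                empty_counts[field] += 1
            elif value.strip() == DEFAULT_DATE:
                default_dates[field] += 1

    # Duplication: group records sharing the chosen key fields
    # (e.g., a common surname and street address may indicate one household).
    groups = Counter(
        tuple((row.get(f) or "").strip().lower() for f in key_fields) for row in rows
    )
    duplicates = sum(n for n in groups.values() if n > 1)

    return {
        "total_records": total,
        "empty_fields_by_column": dict(empty_counts),
        "default_dates_by_column": dict(default_dates),
        "records_sharing_key_fields": duplicates,
    }

if __name__ == "__main__":
    # Hypothetical sample extract provided to the Data Quality Developer.
    print(audit_sample("customer_sample.csv"))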

Best Practices
Developing the Data Quality Business Case

Sample Deliverables
None

Last updated: 21-Aug-07 14:06


Phase 2: Analyze
Subtask 2.8.1 Perform Data Quality Analysis of Source Data
Description
The data quality audit is a business rules-based approach that aims to help define project expectations through the use of data quality processes (or mappings) and data quality scorecards. It involves conducting a data analysis on the project data, or on a representative sample of the data, and producing an accurate and qualified summary of the data's quality. This subtask focuses on data quality analysis. The results are processed and presented to the business users in the next subtask, 2.8.2 Report Analysis Results to the Business.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Quality Developer (Primary) Data Steward/Data Quality Steward (Secondary)

Considerations
There are three key steps in the process:

1. Select Target Data


The main objective of this step is to meet with the data steward and business owners to identify the data sources to be analyzed. For each data source, the Data Quality Developer needs all available information on the data format, content, and structure, as well as input on known data quality issues. The result of this step is a list of the sources of data to be analyzed, along with the identification of all known issues. These define the initial scope of the audit. The following figure illustrates selecting target data from multiple sources.

2. Run Data Quality Analysis


This step identifies and quantifies data quality issues in the source data. Data quality analysis rules and mappings are configured in the Informatica Data Quality Analyst and Developer tools. The data quality rules and mappings should be configured in a manner that enables the production of scorecards in the next subtask. A scorecard is a graphical representation of the levels of data quality in the dataset. The rules and mappings designed at this stage identify cases of inconsistent, incomplete, or absent data values. Using Data Quality, the Data Quality Developer can identify all such data content issues. Data analysis provides detailed metrics to guide the next steps of the audit. For example:
For character data, analysis identifies all distinct values (such as code values) and their frequency distribution.
For numeric data, analysis provides statistics on the highest, lowest, average, and total values, as well as the number of positive values, negative values, zero/null values, and any non-numeric values.
For dates, analysis identifies the highest and lowest dates and the number of blank/null fields, as well as any invalid date values.
For consumer packaging data, analysis can detect issues such as bar codes with correct/incorrect numbers of digits.
The figure below shows a sample scorecard.
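The kinds of metrics listed above can also be sketched outside the Informatica tools when a quick, throwaway check of a sample file is useful. The Python sketch below is illustrative only; the sample file name, column names, and date format are assumptions, and the output is not an IDQ scorecard.

import csv
from datetime import datetime
from collections import Counter

def profile_character(values, top_n=10):
    """Distinct values and their frequency distribution for a character column."""
    freq = Counter(v.strip() for v in values if v and v.strip())
    return {"distinct_values": len(freq), "top_values": freq.most_common(top_n)}

def profile_numeric(values):
    """Min, max, average, total, and sign/null counts for a numeric column."""
    nums, non_numeric, nulls = [], 0, 0
    for v in values:
        if v is None or v.strip() == "":
            nulls += 1
            continue
        try:
            nums.append(float(v))
        except ValueError:
            non_numeric += 1
    return {
        "lowest": min(nums) if nums else None,
        "highest": max(nums) if nums else None,
        "average": sum(nums) / len(nums) if nums else None,
        "total": sum(nums),
        "positive": sum(1 for n in nums if n > 0),
        "negative": sum(1 for n in nums if n < 0),
        "zero_or_null": sum(1 for n in nums if n == 0) + nulls,
        "non_numeric": non_numeric,
    }

def profile_dates(values, fmt="%m/%d/%Y"):
    """Highest/lowest dates plus blank and invalid counts (date format is an assumption)."""
    dates, blanks, invalid = [], 0, 0
    for v in values:
        if v is None or v.strip() == "":
            blanks += 1
            continue
        try:
            dates.append(datetime.strptime(v.strip(), fmt))
        except ValueError:
            invalid += 1
    return {
        "lowest": min(dates).date().isoformat() if dates else None,
        "highest": max(dates).date().isoformat() if dates else None,
        "blank_or_null": blanks,
        "invalid": invalid,
    }

if __name__ == "__main__":
    # Hypothetical sample file and column names.
    with open("orders_sample.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    print(profile_character([r["status_code"] for r in rows]))
    print(profile_numeric([r["order_amount"] for r in rows]))
    print(profile_dates([r["order_date"] for r in rows]))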

3. Define Business Rules


The key objectives of this step are to identify issues in the areas of completeness, conformity, and consistency, to prioritize data quality issues, and to define customized data quality rules. These objectives involve:
Discussing the data quality analyses with business users to define completeness, conformity, and consistency rules for each data element.
Tuning and re-running the analysis mappings with these business rules.
For each data set, a set of base rules must be established to test the conformity of the attributes' data values against basic rule definitions. For example, if an attribute has a date type, then that attribute should only have date information stored. At a minimum, all the necessary fields must be tested against the base rule sets.
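As a minimal sketch of base-rule testing, the following Python code applies a handful of conformity rules (a date-typed attribute holds only dates, a postal code matches a pattern, and so on) and returns pass percentages per attribute. The attribute names, rule definitions, and reference values are assumptions for illustration; real rules would come from the business discussions described above and would normally be implemented as IDQ rules and mappings.

import re
from datetime import datetime

def _is_date(value, fmt="%Y-%m-%d"):
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False

# Base rules keyed by attribute name; the attributes and rules are illustrative
# assumptions, not a prescribed Velocity rule set.
BASE_RULES = {
    "birth_date":  lambda v: _is_date(v),                    # date-typed attribute holds only dates
    "postal_code": lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
    "country":     lambda v: v in {"US", "CA", "GB", "DE"},   # conformity to a reference list
    "quantity":    lambda v: v.isdigit(),
}

def apply_base_rules(records):
    """Return pass percentages per attribute, the raw material for a scorecard."""
    results = {}
    for field, rule in BASE_RULES.items():
        tested = [r[field] for r in records if field in r]
        passed = sum(1 for v in tested if v not in (None, "") and rule(v))
        results[field] = round(100.0 * passed / len(tested), 1) if tested else None
    return results

if __name__ == "__main__":
    sample = [
        {"birth_date": "1975-03-02", "postal_code": "10017", "country": "US", "quantity": "3"},
        {"birth_date": "02/03/1975", "postal_code": "ABC", "country": "USA", "quantity": "-1"},
    ]
    print(apply_base_rules(sample))  # e.g. {'birth_date': 50.0, 'postal_code': 50.0, ...}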

Best Practices
Data Profiling

Sample Deliverables
None

Last updated: 28-Oct-10 01:38


Phase 2: Analyze
Subtask 2.8.2 Report Analysis Results to the Business
Description
The steps outlined in subtask 2.8.1 lead to the preparation of the Data Quality Audit Report, which is delivered in this subtask. The Data Quality Audit Report highlights the state of the data analyzed in an easy-to-read, high-impact fashion. The report can include the following types of files:
Data quality scorecards - charts and graphs of data quality that can be pre-set to present and compare data quality across key fields and data types
Drill-down reports that permit reviewers to access the raw data underlying the summary information
Exception files
In this subtask, potential risk areas are identified and alternative solutions are evaluated. The Data Quality Audit concludes with a presentation of these findings to the business and project stakeholders and agreement on recommended next steps.

Prerequisites
None

Roles

Business Analyst (Secondary) Business Project Manager (Secondary) Data Quality Developer (Primary) Data Steward/Data Quality Steward (Primary) Technical Project Manager (Secondary)

Considerations
There are two key activities in this subtask: delivering the report, and framing a discussion for the business about what actions to take based on the report conclusions. Delivering the report involves formatting the analysis results from subtask 2.8.1 into a framework that can be easily understood by the business. This includes building data quality scorecards, preparing the data sources for the scorecards, and possibly creating audit summary documentation such as a Microsoft Word document or a PowerPoint slideshow. The data quality issues can then be evaluated, recommendations made, and project targets set.

Creating Scorecards
Informatica Data Quality (IDQ) is used to identify, measure, and categorize data quality issues according to business criteria. IDQ reports information in several formats, including database tables, CSV files, HTML files, and graphical displays. (Graphical displays, or scorecards, are linked to the underlying data so that viewers can move from high-level to low-level views of the data.) Part of the report creation process is agreeing on pass/fail scores for the data and assigning weights to the data's performance against different criteria. For example, the business may state that at least 98 percent of values in address data fields must be accurate, and weight the zip+four field as most important. Once the scorecards are defined, the data quality plans can be re-used to track data quality progress over time and throughout the organization. The data quality scorecard can also be presented through a dashboard framework, which adds value to the scorecard by grouping graphical information in business-intelligent ways.
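The weighting and pass/fail logic can be expressed very compactly. The sketch below, a minimal illustration rather than an IDQ scorecard configuration, uses the 98 percent threshold and the zip+four weighting from the example above; the individual field scores and weights are hypothetical.

# A minimal weighted-scorecard sketch. The field scores would come from data
# quality analysis; the weights and the 98 percent threshold follow the example
# in the text and are otherwise assumptions.
FIELD_SCORES = {            # percent of values judged accurate per field (hypothetical)
    "street": 97.2,
    "city": 99.1,
    "state": 99.8,
    "zip_plus_four": 94.5,
}
WEIGHTS = {"street": 1.0, "city": 1.0, "state": 1.0, "zip_plus_four": 2.0}  # zip+four most important
PASS_THRESHOLD = 98.0

def score_card(scores, weights, threshold):
    weighted = sum(scores[f] * weights[f] for f in scores) / sum(weights[f] for f in scores)
    per_field = {f: ("pass" if s >= threshold else "fail") for f, s in scores.items()}
    return {"weighted_score": round(weighted, 1),
            "overall": "pass" if weighted >= threshold else "fail",
            "per_field": per_field}

if __name__ == "__main__":
    print(score_card(FIELD_SCORES, WEIGHTS, PASS_THRESHOLD))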


A dashboard can present measurements in a traffic-light manner (color-coded green/amber/red) to provide quick visual cues as to the quality of, and actions needed for, the data.

Reviewing the Audit Results and Deciding the Next Step


By integrating the various data analysis results within the dashboard application, the stakeholders can review the current state of data quality and decide on appropriate actions within the project. The set of stakeholders should include one or more members of the data stewardship committee, the project manager, data experts, a Data Quality Developer, and representatives of the business. Together, these stakeholders can review the data quality audit conclusions and conduct a cost-benefit comparison of the desired data quality levels versus the impact on the project of the steps needed to achieve those levels. In some projects (for example, when the data must comply with government or industry regulations), the data quality levels are non-negotiable, and the project stakeholders must work to those regulations. In other cases, the business objectives may be achieved by data quality levels that are less than 100 percent. In all cases, the project data must attain minimum quality levels in order to pass through the project processes and be accepted by the target data source. For these reasons, it is necessary to discuss data quality as early as possible in project planning.

Ongoing Audits and Data Quality Monitoring


Conducting a data quality audit one time provides insight into the then-current state of the data, but does not reflect how project activity can change data quality over time. Tracking levels of data quality over time, as part of an ongoing monitoring process, provides a historical view of when and how much the quality of data has improved. The following figure illustrates how ongoing audits can chart progress in data quality.


As part of a statistical control process, data quality levels can be tracked on a periodic basis and charted to show if the measured levels of data quality reach and remain in an acceptable range, or whether some event has caused the measured level to fall below what is acceptable. Statistical control charts can help in notifying data stewards when an exception event impacts data quality and can help to identify the offending information process. Historical statistical tracking and charting capabilities are available within a data quality scorecard, and scorecards can be easily updated; once configured, the scorecard typically does not need to be re-created for successive data quality analyses.
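A minimal sketch of such a control check appears below, assuming a simple three-sigma rule around historical scores; the baseline values and periods are hypothetical, and a real implementation would use the scorecard's own tracking features.

from statistics import mean, stdev

def control_limits(history, sigmas=3.0):
    """Derive lower/upper control limits from historical data quality scores."""
    center = mean(history)
    spread = stdev(history)
    return center - sigmas * spread, center, center + sigmas * spread

def flag_exceptions(history, new_scores):
    """Flag periods whose measured quality falls outside the control limits."""
    lower, center, upper = control_limits(history)
    return [(period, score) for period, score in new_scores
            if score < lower or score > upper]

if __name__ == "__main__":
    # Hypothetical monthly completeness scores (percent) from earlier audits.
    baseline = [96.8, 97.1, 96.5, 97.0, 96.9, 97.2]
    latest = [("2011-07", 96.7), ("2011-08", 91.4)]  # August looks like an exception event
    print(flag_exceptions(baseline, latest))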

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 17:29


Velocity v9
Phase 3: Architect

2011 Informatica Corporation. All rights reserved.

Phase 3: Architect
3 Architect
3.1 Develop Solution Architecture
3.1.1 Define Technical Requirements
3.1.2 Develop Architecture Logical View
3.1.3 Develop Configuration Recommendations
3.1.4 Develop Architecture Physical View
3.1.5 Estimate Volume Requirements
3.2 Design Development Architecture
3.2.1 Develop Quality Assurance Strategy
3.2.2 Define Development Environments
3.2.3 Develop Change Control Procedures
3.2.4 Determine Metadata Strategy
3.2.5 Develop Change Management Process
3.2.6 Define Development Standards
3.3 Implement Technical Architecture
3.3.1 Procure Hardware and Software
3.3.2 Install/Configure Software


Phase 3: Architect
Description
During this phase of the project, the technical requirements are defined, the project infrastructure is developed, and the development standards and strategies are established. The conceptual architecture that forms the basis for determining capacity requirements and configuration recommendations is designed, and the environments and strategies for the entire development process are defined. The strategies include development standards, quality assurance, change control processes, and the metadata strategy. It is critical that the architecture decisions made during this phase are guided by an understanding of the business needs. As data integration architectures become more real-time and mission critical, good architecture decisions will ensure the success of the overall effort. This phase should culminate in the implementation of the hardware and software that will allow the Design Phase and the Build Phase of the project to begin. Proper execution during the Architect Phase is especially important for Data Migration and B2B Data Transformation projects; in the Architect Phase a series of key tasks are undertaken to accelerate development, ensure consistency, and expedite completion of the data migration or data transformation project.

Prerequisites
None

Roles

Business Analyst (Primary) Business Project Manager (Primary) Data Architect (Primary) Data Integration Developer (Secondary) Data Quality Developer (Primary) Data Warehouse Administrator (Review Only) Database Administrator (DBA) (Primary) Metadata Manager (Primary) Presentation Layer Developer (Secondary) Project Sponsor (Approve) Quality Assurance Manager (Primary) Repository Administrator (Primary) Security Manager (Secondary) System Administrator (Primary) Technical Architect (Primary) Technical Project Manager (Primary)

Considerations
None

Best Practices
None


Sample Deliverables
None

Last updated: 24-Oct-10 16:31


Phase 3: Architect
Task 3.1 Develop Solution Architecture
Description
The scope of solution architecture in a data integration or enterprise data warehouse project is quite broad and involves careful consideration of many disparate factors. Data integration solutions have grown in scope as well as in the amount of data they process. This necessitates careful consideration of architectural issues across a number of architectural domains. A well-designed solution architecture is crucial to any data integration effort, and can be the most influential, visible part of the whole effort. A robust solution architecture not only meets the business requirements but also exceeds the expectations of the business community. Given the continuous state of change that has become a trademark of information technology, it is prudent to have an architecture that is not only easy to implement and manage, but also flexible enough to accommodate future changes, easily extendable, reliable (with minimal or no downtime), and vastly scalable. This task approaches the development of the architecture as a series of stepwise refinements:
First, reviewing the requirements.
Then developing a logical model of the architecture for consideration.
Refining the logical model into a physical model.
Validating the physical model.
In addition, because the architecture must consider anticipated data volumes, it is necessary to develop a thorough set of estimates. The Technical Architect is responsible for ensuring that the proposed architecture can support the estimated volumes.

Prerequisites
None

Roles

Business Analyst (Primary) Data Architect (Primary) Data Quality Developer (Primary) Data Warehouse Administrator (Review Only) Database Administrator (DBA) (Primary) System Administrator (Primary) Technical Architect (Primary) Technical Project Manager (Review Only)

Considerations
A holistic view of architecture encompasses three realms: the development architecture, the execution architecture, and the operations architecture. These three areas of concern provide a framework for considering how any system is built, how it runs, and how it is operated. Although there may be some argument about whether an integration solution is a "system," it is clear that it has all the elements of a software system, including databases, executable programs, end users, maintenance releases, and so forth. All of these elements must be considered in the design and development of the enterprise solution. Each of these architectural areas involves specific responsibilities and concerns:
Development Architecture, which incorporates the technology standards, tools, techniques, and services required in the development of the enterprise solution. This may include many of the services described in the execution architecture, but also involves services that are unique to development environments, such as security mechanisms for controlling access to development objects, change control tools and procedures, and migration capabilities.
Execution Architecture, which includes the entire supporting infrastructure required to run an application or set of applications. In the context of an enterprise-wide integration solution, this includes client and server hardware, operating systems, database management systems, network infrastructure, and any other technology services employed in the runtime delivery of the solution.
Operations Architecture, which is a unified collection of technology services, tools, standards, and controls required to keep a business application production or development environment operating at the designed service level. This differs from the execution architecture in that its primary users are system administrators and production support personnel.
The specific activities that comprise this task focus primarily on the Execution Architecture. 3.2 Design Development Architecture focuses on the development architecture, and the Operate Phase discusses the important aspects of operating a data integration solution. Refer to the Operate Phase for more information on the operations architecture.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:44


Phase 3: Architect
Subtask 3.1.1 Define Technical Requirements
Description
In anticipation of architectural design and subsequent detailed technical design steps, the business requirements and functional requirements must be reviewed and a high-level specification of the technical requirements developed. The technical requirements will drive these design steps by clarifying what technologies will be employed and, from a high-level, how they will satisfy the business and functional requirements.

Prerequisites
None

Roles

Business Analyst (Primary) Data Quality Developer (Secondary) Technical Architect (Primary) Technical Project Manager (Review Only)

Considerations
The technical requirements should address, at least at a conceptual level, implementation specifications based on the findings to date (regarding data rules, source analysis, strategic decisions, etc.), such as:
Technical definitions of business rule derivations (including levels of summarization).
Definitions of source and target schema, at least at a logical/conceptual level.
Data acquisition and data flow requirements.
Data quality requirements (at least at a high level).
Data consolidation/integration requirements (at least at a high level).
Report delivery and access specifications.
Performance requirements (both back-end and presentation performance).
Security requirements and structures (access, domain, administration, etc.).
Connectivity specifications and constraints (especially limits of access to operational systems).
Specific technologies required (if requirements clearly indicate such).
For Data Migration projects, the technical requirements are fairly consistent and known. They will require processes to:
Populate the reference data structures
Acquire the data from source systems
Convert to target definitions
Load to the target application
Meet the necessary audit functionalities
The details will be covered in a data migration strategy.

Best Practices
None

Sample Deliverables
None


Last updated: 01-Feb-07 18:44


Phase 3: Architect
Subtask 3.1.2 Develop Architecture Logical View
Description
Much like a logical data model, a logical view of the architecture provides a high-level depiction of the various entities and relationships as an architectural blueprint of the entire data integration solution. The logical architecture helps people visualize the solution and shows how all the components work together. The major purposes of the logical view are:
To describe how the various solution elements work together (i.e., databases, ETL, reporting, and metadata).
To communicate the conceptual architecture to project participants in order to validate the architecture.
To serve as a blueprint for developing the more detailed physical view.
The logical diagram provides a road map of the enterprise initiative and an opportunity for the architects and project planners to define and describe, in some detail, the individual components. The logical view should show relationships in the data flow and among the functional components, indicating, for example, how local repositories relate to the global repository (if applicable). The logical view must take into consideration all of the source systems required to support the solution, the repositories that will contain the runtime metadata, and all known data marts and reports. This is a living architectural diagram, to be refined as the solution is implemented or grows. The logical view does not contain detailed physical information such as server names, IP addresses, and hardware specifications; these details are fleshed out in the development of the physical view.

Prerequisites
None

Roles

Data Architect (Secondary)

Considerations
The logical architecture should address reliability, availability, scalability, performance, usability, extensibility, interoperability, security, and QA. It should incorporate all of the high-level components of the information architecture, including but not limited to:
All relevant source systems
Informatica Domains, Informatica Nodes
Data Integration Services, Data Integration Repositories
Business Intelligence repositories
Metadata Management, Metadata Reporting
Real-time Messaging, Web Services, XML Server
Data Quality tools, Data Modeling tools
Data Archive tools
Target data structures, e.g., data warehouse, data marts, ODS
Web Application Servers
ROLAP engines, Portals, MOLAP cubes, Data Mining
For Data Migration projects, a key component is the documentation of the various utility database schemas. This will likely include legacy staging, pre-load staging, reference data, and audit database schemas. Additionally, database schemas for Informatica Data Quality and Informatica Data Explorer will also be included.


Best Practices
Designing Data Integration Architectures
Master Data Management Architecture with Informatica
PowerCenter Enterprise Grid Option

Sample Deliverables
None

Last updated: 07-Oct-09 15:08


Phase 3: Architect
Subtask 3.1.3 Develop Configuration Recommendations
Description
Using the Architecture Logical View as a guide, and considering any corporate standards or preferences, develop a set of recommendations for how to technically configure the analytic solution. These recommendations will serve as the basis for discussion with the appropriate parties, including project management, the Project Sponsor, system administrators, and potentially the user community. At this point, the recommendations of the Data Architect and Technical Architect should be very well formed, based on their understanding of the business requirements and the current and planned technical standards. The recommendations will be formally documented in the next subtask 3.1.4 Develop Architecture Physical View but are not documented at this stage since they are still considered open to debate. Discussions with interested constituents should focus on the recommended architecture, not on protracted debate over the business requirements. It is critical that the scope of the project be set - and agreed upon - prior to developing and documenting the technical configuration recommendations. Changes in the requirements at this point can have a definite impact on the project delivery date. (Refer back to the Manage Phase for a discussion of scope setting and control issues).

Prerequisites
None

Roles

Data Architect (Secondary) Technical Architect (Primary)

Considerations
The configuration recommendations must balance a number of factors in order to be adopted:
Technical solution - The recommended configuration must, of course, solve the technical challenges posed by the analytic solution. In particular, it must consider data capacity and volume throughput requirements.
Conformity - The recommended solution should work well within the context of the organization's existing infrastructure and conform to the organization's future infrastructure direction.
Cost - The incremental cost of the solution must fit within whatever budgetary parameters have been established by project management. In many cases, incremental costs can be reduced by leveraging existing available hardware resources and PowerCenter's server grid technology.
The primary areas to consider in developing the recommendations include, but are not necessarily limited to:
Server Hardware and Operating System - Many IT organizations mandate - or strongly encourage - the choice of server hardware and operating system to fit the corporate standards. Depending on the size and throughput requirements, the server may be UNIX, Linux, or NT-based. The technical architects should also recommend a 32-bit or 64-bit architecture based on the cost/benefit of each. It is advisable to consider the advantages of a 64-bit OS and PowerCenter, as this is likely to provide increased resources, enable faster processing speeds, and support the handling of larger numbers in the data. It is also important to ensure that the hardware is built for OLAP-style applications, which tend to be computationally intensive, as compared to OLTP systems, which require hyper-threading; this determination is important for ensuring good performance. Also make sure the RAM size is determined in accordance with the systems to be built; in many cases RAM disks can be used in place of RAM when increased RAM availability is an issue. This is especially important when the PowerCenter application creates huge cache files. Consult the Platform Availability Matrix at my.informatica.com for specifics on the applications under consideration for the project. Bear in mind that not all applications have the same level of availability on every platform; this is also true for database connectivity (see Database Management System below).
Disk Storage Systems - The architecture of the disk storage system should also be included in the architecture configuration. Some organizations leverage a Storage Area Network (SAN) to store all data, while other organizations opt for local storage. In any case, careful consideration should be given to disk array and striping configuration in order to optimize performance for the related systems (i.e., database, ETL, and BI).
Database Management System - Similar to organizational standards that mandate hardware or operating system choices, many organizations also mandate the choice of a database management system. In instances where a choice of DBMS is available, it is important to remember that PowerCenter and Data Analyzer support a vast array of DBMSs on a variety of platforms (refer to the PowerCenter Installation Guide and Data Analyzer Installation Guide for specifics). A DBMS that is supported by all components in the technical infrastructure (OS, ETL, and BI, to name a few) should be chosen.
PowerCenter Server - The PowerCenter server should, of course, be considered when developing the architecture recommendations. Considerations should include network traffic (between the repository server, PowerCenter server, database server, and client machines), the location of the PowerCenter repository database, and the physical storage that will contain the PowerCenter executables as well as source, target, and cache files.
Data Analyzer or other Business Intelligence Platforms - Whether using Data Analyzer or a different BI tool for analytics, the goal is to develop configuration recommendations that result in a high-performance application passing data efficiently between source system, ETL server, database tables, and BI end-user reports. For Web-based analytic tools such as Data Analyzer, also consider user requirements that may dictate a secure Web-server infrastructure to provide reporting access outside of the corporate firewall, enabling features such as reporting access from a mobile device. Typically, a secure Web-server infrastructure that utilizes a demilitarized zone (DMZ) will result in a different technical architecture configuration than an infrastructure that simply supports reporting from within the corporate firewall.

TIP Use the Architecture Logical View as a starting point for discussing the technical configuration recommendations. As drafts of the physical view are developed, they will be helpful for explaining the planned architecture.

Best Practices
Platform Sizing
PowerCenter Enterprise Grid Option

Sample Deliverables
None

Last updated: 24-Jun-10 14:12


Phase 3: Architect
Subtask 3.1.4 Develop Architecture Physical View
Description
The physical view of the architecture is a refinement of the logical view that takes into account the actual hardware and software resources necessary to build the architecture. Much like a physical data model, this view of the architecture depicts physical entities (i.e., servers, workstations, and networks) and their attributes (i.e., hardware model, operating system, server name, IP address). In addition, each entity should show the elements of the logical model supported by it. For example, a UNIX server may be serving as a PowerCenter server engine and a Data Analyzer server engine, and may also be running Oracle to store the associated PowerCenter repositories. The physical view is the summarized planning document for the architecture implementation. It is unlikely to explicitly show all of the technical information necessary to configure the system, but should provide enough information for domain experts to proceed with their specific responsibilities. In essence, this view is a common blueprint that the system's general contractor (i.e., the Technical Architect) can use to communicate with each of the subcontractors (i.e., UNIX Administrator, Mainframe Administrator, Network Administrator, Application Server Administrator, DBAs, etc.).

Prerequisites
None

Roles

Data Warehouse Administrator (Approve) Database Administrator (DBA) (Primary) System Administrator (Primary)

Considerations
None

Best Practices
Domain Configuration
PowerCenter Enterprise Grid Option

Sample Deliverables
None

Last updated: 07-Oct-09 15:11


Phase 3: Architect
Subtask 3.1.5 Estimate Volume Requirements
Description
Estimating the data volume and physical storage requirements of a data integration project is a critical step in the architecture planning process. This subtask represents a starting point for analyzing data volumes, but does not include a definitive discussion of capacity planning. Due to the varying complexity and data volumes associated with data integration solutions, it is crucial to review each technical area of the proposed solution with the appropriate experts (i.e., DBAs, Network Administrators, Server System Administrators, etc.).

Prerequisites
None

Roles

Data Architect (Primary) Data Quality Developer (Primary) Database Administrator (DBA) (Primary) System Administrator (Primary) Technical Architect (Secondary)

Considerations
Capacity planning and volume estimation should focus on several key areas that are likely to become system bottlenecks or to strain system capacity, specifically:

Disk Space Considerations


Database size is the most likely factor to affect disk space usage in the data integration solution. As the typical data integration solution does not alter the source systems, there is usually no need to consider their size. However, the target databases and any ODS or staging areas demand disk storage over and above the existing operational systems. A Database Sizing Model workbook is one effective means of estimating these sizes. During the Architect Phase only a rough volume estimate is required. After the Design Phase is completed, the database sizing model should be updated to reflect the data model and any changes to the known business requirements. The basic techniques for database sizing are well understood by experienced DBAs. Estimates of database size must factor in the following:
Determine the upper bound of the width of each table row. This can be affected by certain DBMS data types, so be sure to take into account each physical byte consumed; the documentation for the DBMS should specify storage requirements for all supported data types. After the physical data model has been developed, the row width can be calculated.
Depending on the type of table, the row count may be vastly different for a "young" warehouse than for one at "maturity". For example, if the database is designed to store three years of historical sales data, and there is an average daily volume of 5,000 sales, the table will contain 150,000 rows after the first month, but will have swelled to nearly 5.5 million rows at full maturity. Beyond the third year, there should be a process in place for archiving data off the table, thus limiting the size to 5.5 million rows.
Indexing can add a significant disk usage penalty to a database. Depending on the overall size of the indexed table and the size of the keys used in the index, an index may require 30 to 80 percent additional disk space. Again, the DBMS documentation should contain specifics about calculating index size.
Partitioning the physical target can greatly increase the efficiency and organization of the load process. However, it does increase the number of physical units to be maintained. Be sure to discuss the most intelligent structuring of the database partitions with the DBAs.
Using these basic factors, it is possible to construct a database sizing model (typically in spreadsheet form) that lists all database tables and indexes, their row widths, and estimated numbers of rows. Once the row number estimates have been validated, the estimating model should produce a fairly accurate estimate of database size. Note that the model provides an estimate of raw data size; be sure to consult the DBAs to understand how to factor in the physical storage characteristics relative to the DBMS being used, such as block parameter sizes. The estimating process also provides a good opportunity to validate the star schema data model. For example, fact tables should contain only composite keys and discrete facts. If a fact table is wider than 32-64 bytes, it may be wise to re-evaluate what is being stored. The width of the fact table is very important, since a warehouse can contain millions, tens of millions, or even hundreds of millions of fact records. The dimension tables, on the other hand, will typically be wider than the fact tables and may contain redundant data (e.g., names, addresses, etc.), but will have far fewer rows. As a result, the size of the dimension tables is rarely a major contributor to the overall target database size. Since there is the possibility of unstructured data being sourced, transformed, and stored, it is important to factor in any conversion in data size, either up or down, from source to target. It is important to remember that Business Intelligence (BI) tools may consume significant storage space, depending on the extent to which they pre-aggregate data and how that data is stored. Because this may be an important factor in the overall disk space requirements, be sure to consider storage techniques carefully during the BI platform selection process.
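The arithmetic behind such a sizing model is straightforward. The following Python sketch reuses the 5,000-sales-per-day, three-year example and the 30-to-80-percent index allowance from the text; the block/extent overhead factor and the dimension-table figures are assumptions to be confirmed with the DBAs.

def table_size_mb(row_bytes, row_count, index_overhead=0.5, dbms_overhead=0.2):
    """Rough raw-size estimate for one table.

    index_overhead: extra space for indexes (the text suggests 30 to 80 percent).
    dbms_overhead:  block/extent overhead factor; an assumption - confirm with the DBA.
    """
    raw = row_bytes * row_count
    return raw * (1 + index_overhead) * (1 + dbms_overhead) / (1024 * 1024)

if __name__ == "__main__":
    # The fact-table example from the text: 5,000 sales per day for 3 years.
    rows_at_maturity = 5000 * 365 * 3          # about 5.5 million rows
    fact_width = 48                            # bytes; within the suggested 32-64 byte range
    dim_width, dim_rows = 512, 100_000         # hypothetical wide but shallow dimension

    print("fact table  ~%.0f MB" % table_size_mb(fact_width, rows_at_maturity))
    print("dimension   ~%.0f MB" % table_size_mb(dim_width, dim_rows))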

TIP If you have determined that the star schema is the right model to use for the data integration solution, be sure that the DBAs who are responsible for the target data model understand its advantages. A DBA who is unfamiliar with the star schema may seek to normalize the data model in order to save space. Firmly resist this tendency to normalize.

Data Processing Volume


Data processing volume refers to the amount of data being processed by a given PowerCenter server within a specified timeframe. In most data integration implementations, a load window is allotted, representing clock time. This window is determined by the availability of the source systems for extracts and the end-user requirements for access to the target data sources. Maintenance jobs that run on a regular basis may further limit the length of the load window. As a result of the limited load window, the PowerCenter server engine must be able to perform its operations on all data in a given time period. The ability to do so is constrained by three factors:

Time it takes to extract the data (potentially including network transfer time, if the data is on a remote server)
Transformation time within PowerCenter
Load time (which is also potentially impacted by network latency)

The biggest factors affecting extract and load times, however, are related to database tuning. Refer to Performance Tuning Databases (Oracle) for suggestions on improving database performance. The throughput of the PowerCenter Server engine is typically the last option for improved performance. Refer to the Velocity Best Practice Tuning Sessions for Better Performance, which includes suggestions on tuning mappings and sessions to optimize performance. From an estimating standpoint, however, it is impossible to accurately project the throughput (in terms of rows per second) of a mapping, due to the high variability in mapping complexity, quantity and complexity of transformations, and the nature of the data being transformed. It is more accurate to estimate in terms of clock time to ensure processing completes within the given load window; a simple feasibility check is sketched below.

If the project includes steps dedicated to improving data quality (for example, as described in Task 4.6), then a related performance factor is the time taken to perform data matching (that is, record de-duplication) operations. Depending on the size of the dataset concerned, data matching operations in Informatica Data Quality can take several hours of processor time to complete. Data matching processes can be tuned and executed on remote machines on the network to significantly reduce record processing time. Refer to the Best Practice Effective Data Matching Techniques for more information.
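
As a simple illustration of the clock-time approach, the following sketch checks whether estimated extract, transform, load, and data matching durations fit within the allotted load window. The task names, durations, and eight-hour window are hypothetical placeholders, not measured values.

from datetime import timedelta

# Hypothetical clock-time estimates for the nightly run, in hours.
ESTIMATES = {
    "extract_orders": 1.5,
    "transform_and_load_orders": 3.0,
    "data_matching": 2.5,   # de-duplication can be a significant factor
}

LOAD_WINDOW = timedelta(hours=8)  # e.g., sources available 10 PM to 6 AM

total = timedelta(hours=sum(ESTIMATES.values()))
slack = LOAD_WINDOW - total
print(f"Estimated clock time {total} against a window of {LOAD_WINDOW}")
if slack.total_seconds() < 0:
    print(f"Over the window by {-slack}: re-architect, tune, or parallelize.")
else:
    print(f"Fits with {slack} of slack for growth and re-runs.")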

Network Throughput
Once the physical data row sizes and volumes have been estimated, it is possible to estimate the required network capacity. It is important to remember the network overhead associated with packet headers, as this can have an effect on the total volume of data being transmitted. The Technical Architect should work closely with a Network Administrator to examine network capacity between the different components involved in the solution. The initial estimate is likely to be rough, but should provide a sense of whether the existing capacity is sufficient and whether the solution should be architected differently (i.e., move source or target data prior to session execution, re-locate server engine(s), etc.). The Network Administrator can thoroughly analyze network throughput during system and/or performance testing and apply the appropriate tuning techniques. It is important to involve the network specialists early in the Architect Phase so that they are not surprised by additional network requirements when the system goes into production. A sample capacity calculation is sketched below.
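
The rough capacity check mentioned above can start from a calculation as simple as the one below. The data volume, link speed, 10 percent protocol overhead, and 50 percent usable share of the link are all assumptions to be replaced with figures agreed with the Network Administrator.

def transfer_hours(data_gb, link_mbps, protocol_overhead=0.10, usable_share=0.50):
    """Estimate the hours needed to move data_gb across a shared network link."""
    bits = data_gb * (1024 ** 3) * 8 * (1 + protocol_overhead)  # payload plus packet headers
    usable_bps = link_mbps * 1_000_000 * usable_share           # share of the link available
    return bits / usable_bps / 3600

# Example: 20 GB of nightly extracts over a shared 100 Mbps link.
print(f"~{transfer_hours(20, 100):.1f} hours of transfer time")

If the result does not fit comfortably inside the load window, that is an early signal to consider the co-location approach described in the tip that follows.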

TIP Informatica generally recommends having either the source or target database co-located with the PowerCenter Server engine because this can significantly reduce network traffic. If such co-location is not possible, it may be advisable to FTP data from a remote source machine to the PowerCenter Server as this is a very efficient way of transporting the data across the network.

Best Practices
Database Sizing

Sample Deliverables
None

Last updated: 17-Jun-10 14:44


Phase 3: Architect
Task 3.2 Design Development Architecture

Description
The Development Architecture is the collection of technology standards, tools, techniques, and services required to develop a solution. This task involves developing a testing approach, defining the development environments, and determining the metadata strategy. The benefits of defining the development architecture are realized later in the project, and include good communication and change controls as well as controllable migration procedures; ignoring proper controls is likely to lead to issues later in the project. Although the various subtasks that compose this task are described here in linear fashion, they all relate to one another, so it is important to approach the body of work in this task holistically and to consider the development architecture as a whole.

Prerequisites
None

Roles

Business Project Manager (Primary) Data Architect (Secondary) Data Integration Developer (Secondary) Database Administrator (DBA) (Primary) Metadata Manager (Primary) Presentation Layer Developer (Secondary) Project Sponsor (Review Only) Quality Assurance Manager (Primary) Repository Administrator (Primary) Security Manager (Secondary) System Administrator (Primary) Technical Architect (Primary) Technical Project Manager (Primary)

Considerations
The Development Architecture should be designed prior to the actual start of development, because many of the decisions made at the beginning of the project may have unforeseen implications once the development team has reached its full size. The design of the Development Architecture must consider numerous factors, including the development environment(s), naming standards, developer security, change control procedures, and more. The scope of a typical PowerCenter implementation, possibly covering more than one project, is much broader than a departmentally-scoped solution. It is important to consider this statement fully, because it has implications for the planned deployment of a solution, as well as the requisite planning associated with the development environment. The main difference is that a departmental data mart type project can be created with only two or three developers in a very short time period. By contrast, a full integration solution involving the creation of an ICC (Integration Competency Center), or an analytic solution that approaches enterprise scale, requires more of a "big team" approach. This is because many more organizational groups are involved, adherence to standards is much more important, and testing must be more rigorous, since the results will be visible to a larger audience. The following paragraphs outline some of the key differences between a departmental development effort and an enterprise effort.

With a small development team, the environment may be simplistic:

Communication between developers is easy; it may literally consist of shouting over a cubicle partition.
Only one or two repository folders may be necessary, since there is little risk of the developers "stepping on" each other's work.
Naming standards are not rigidly enforced.
Migration procedures are loose; development objects are moved into production without undue emphasis on impact analysis and change control procedures.
Developer security is ignored; typically, all developers use similar, often highly privileged, user IDs.

However, as the development team grows and the project becomes more complex, this simplified environment leads to serious development issues:

Developers accustomed to informal communication may not thoroughly inform the entire development team of important changes to shared objects.
Repository folders originally named to correspond to individual developers will not adequately support subject area- or release-based development groups.
Developers maintaining others' mappings are likely to spend unnecessary time and effort trying to decipher unfamiliar names.
Failure to understand the dependencies of shared objects leads to unknown impacts on the dependent objects.
The lack of rigor in testing and migrating objects into production leads to runtime bugs and errors in the warehouse loading process.
Sharing a single developer ID among multiple developers makes it impossible to determine which developer locked a development object, or who made the last change to an object. More importantly, failure to define secured development groups allows all developers to access all folders, leading to the possibility of untested changes being made in test environments.

These factors represent only a subset of the issues that may occur when the development architecture is haphazardly constructed, or "organically" grown. As is the case with the execution environment, a departmental data mart development effort can "get away with" minimal architectural planning. But any serious effort to develop an enterprise-scale analytic solution must be based on a well-planned architecture, including both the development and execution environments.

In Data Migration projects it is common to build out a set of reference data tables to support the effort. These often include tables to hold configuration details (valid values), cross-reference specifics, default values, data control structures, and table-driven parameter tables. These structures are a key component in the development of re-usable objects; a minimal sketch of a cross-reference structure follows.
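
As a concrete illustration of one such reference structure, the sketch below models a cross-reference lookup with a configured default value. The systems, codes, and default are hypothetical; in practice these structures live in database tables and are referenced from the mappings via lookups.

# Hypothetical cross-reference data: legacy source codes mapped to target codes.
XREF_COUNTRY = {
    ("LEGACY_CRM", "USA"): "US",
    ("LEGACY_CRM", "UK"):  "GB",
    ("LEGACY_ERP", "840"): "US",
}
DEFAULT_COUNTRY = "XX"  # table-driven default for unmapped values

def xref_country(source_system, source_code):
    """Return the target code for a source code, falling back to the default."""
    return XREF_COUNTRY.get((source_system, source_code), DEFAULT_COUNTRY)

assert xref_country("LEGACY_CRM", "UK") == "GB"
assert xref_country("LEGACY_ERP", "999") == "XX"  # unmapped codes get the default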

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:45


Phase 3: Architect
Subtask 3.2.1 Develop Quality Assurance Strategy

Description
Although actual testing starts with unit testing during the Build Phase, followed by the project's Test Phase, there is far more involved in producing a high-quality project. The QA Strategy includes the definition of key QA roles, key verification processes, and key QA assignments involved in detailing all of the validation procedures for the project.

Prerequisites
None

Roles

Quality Assurance Manager (Primary) Security Manager (Secondary) Test Manager (Primary)

Considerations
In determining which project steps will require verification, the QA Manager (or owner of the project's QA processes) should consider the business requirements and the project methodology. Although it may take a sales effort to win over management to a QA process that is highly involved throughout the project, the benefits can be proven historically in the success rates of projects and their ongoing maintenance costs. However, the trade-offs of cost vs. value will likely affect the scope of QA. Potential areas of verification to be considered for QA processes:

Formal business requirements reviews with key business stakeholders and sign-off
Formal technical requirements reviews with IT stakeholders and sign-off
Formal review of environments and architectures with key technical personnel
Peer reviews of logic designs
Peer walkthroughs of data integration logic (mappings, code, etc.)
Unit Testing: definition of procedures, review of test plans, formal sign-off for unit tests
Gatekeeping for migration out of the Development environment (into QA and/or Production)
Regression testing: definition of procedures, review of test plans, formal sign-off
System Tests: review of Test Plans, formal acceptance process
Defect Management: review of procedures, validation of resolution
User Acceptance Test: review of Test Plans, formal acceptance process
Documentation review
Training materials review
Review of Deployment Plan; sign-off for deployment completion

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:45


Phase 3: Architect
Subtask 3.2.2 Define Development Environments

Description
The development environment was relatively simple in the early days of computer system development, when a mainframe-based development project typically involved one or more isolated regions connected to one or more database instances. Distributed systems, such as federated data warehouses, involve much more complex development environments and many more "moving parts." The basic concept of isolating developers from testers, and both from the production system, is still critical to development success. However, relative to a centralized development effort, there are many more technical issues, hardware platforms, database instances, and specialized personnel to deal with. The task of defining the development environment is, therefore, extremely important and very difficult.

Because of the wide variance in corporate technical environments, standards, and objectives, there is no "optimal" development environment. Rather, there are key areas of consideration and decisions that must be made with respect to them. After the development environment has been defined, it is important to document its configuration, including (most importantly) the information the developers need to use the environments. For example, developers need to understand what systems they are logging into, what databases they are accessing, what repository (or repositories) they are accessing, and where sources and targets reside.

An important aspect of any development environment is to configure it as closely as possible to the test and production environments, given time and budget. This can significantly ease the development and integration efforts downstream and will ultimately save time and cost during the testing phases.

Prerequisites
3.1.1 Define Technical Requirements

Roles

Database Administrator (DBA) (Primary) Repository Administrator (Primary) System Administrator (Primary) Technical Architect (Primary) Technical Project Manager (Review Only)

Considerations
The development environment for any data integration solution must consider many of the same issues as a "traditional" development project. The major differences are that the development approach is "repository-centric" (as opposed to code-based), there are multiple sources and targets (unlike a typical system development project, which deals with a single database), and there are few (if any) hand-coded objects to build and maintain. In addition, because of the repository-based development approach, the development environment must consider all of the following key areas:

Repository Configuration. This involves critical decisions, such as whether to use local repositories, a global repository, or both, as well as determining an overall metadata strategy (see 3.2.4 Determine Metadata Strategy).
Folder structure. Within each repository, folders are used to group and organize work units or report objects. To be effective, the folder structure must consider the organization of the development team(s), as well as the change control/migration approach.
Developer security. Both PowerCenter and Data Analyzer have built-in security features that allow an administrative user (i.e., the Repository Administrator) to define the access rights of all other users to objects in the repository. The organization of security groups should be carefully planned and implemented prior to the start of development. As an additional option, LDAP can be used to help simplify the organization of users and permissions.

Repository Configuration

Informatica's data integration platform, PowerCenter, provides capabilities for integrating multiple heterogeneous sources and targets. The requirements of the development team should dictate to what extent the PowerCenter capabilities are exploited, if at all. In a simple data integration development effort, source data may be extracted from a single database or set of flat files, and then transformed and loaded into a single target database. More complex data integration development efforts involve multiple source and target systems. Some of these may include mainframe legacy systems as well as third-party ERP providers. Most data integration solutions currently being developed involve data from multiple sources, target multiple data marts, and include the participation of developers from multiple areas within the corporate organization. In order to develop a cohesive analytic solution, with shared concepts of the business entities, transformation rules, and end results, a PowerCenter-based development environment is required.

There are basically three ways to configure an Informatica-based data integration solution, although variations on these three options are certainly possible, particularly with the addition of PowerExchange products (i.e., PowerExchange for SAP NetWeaver, PowerExchange for PeopleSoft Enterprise) and Data Analyzer for front-end reporting. From a development environment standpoint, the following three configurations serve as the basis for determining how best to configure the environment for developers:

Standalone PowerCenter. In this configuration, there is a single repository that cannot be shared with any others within the enterprise. This type of repository is referred to as a local repository and is typically used for small, independent, departmental data marts. Many of the capabilities within PowerCenter are available, including developer security, folder structures, and shareable objects. The primary development restrictions are that the objects in the repository cannot be shared with other repositories, and this repository cannot access objects in any other repositories. Multiple developers, working on multiple projects, can still use this repository; folders can be configured to restrict access to specified developers (or groups); and a repository administrator with SuperUser authority can control production objects. This means that there would be an instance of the repository for development, testing, and production. Some companies can manage co-locating development and testing on one repository by segregating code through folder strategies.

PowerCenter Data Integration Hub with Networked Local Repositories. This configuration combines a centralized, shared global repository with one or more distributed local repositories. The strength of this solution is that multiple development groups can work semi-autonomously while sharing common development objects. In the production environment, this same configuration can be leveraged to distribute the server load across the PowerCenter server engines. This option can dramatically affect the definition of the development environment.

PowerCenter as a Data Integration Hub with a Data Analyzer Front-End to the Reporting Warehouse. This configuration provides an end-to-end suite of products that allows developers to build the entire data integration solution from data loads to end-user reporting.

PowerCenter Data Integration Hub with Networked Local Repositories


In this advanced repository configuration, the Technical Architect must pay careful attention to the sharing of development objects and the use of multiple repositories. Again, there is no single "correct" solution, only general guidelines for consideration. In most cases, the PowerCenter Global Repository becomes a development focal point. Departmental developers wishing to access enterprise definitions of sources, targets, and shareable objects connect to the Global Repository to do so. The layout of this repository, and its contents, must be thoroughly planned and executed. The Global Repository may include shareable folders containing:

Source definitions. Because many source systems may be shared, it is important to have a single "true" version of their schemas resident in the Global Repository.
Target definitions. The same logic applies as for source definitions.
Shareable objects. Shared objects should be created and maintained in a single place; the Global Repository is that place.

TIP It is very important to house all globally-shared database schemas in the Global Repository. Because most IT organizations prefer to maintain their database schemas in a CASE/data modeling tool, the procedures for updating the PowerCenter definitions of source/target schemas must include importing these schemas from tools such as ERwin. It is far easier to develop these procedures for a single (global) repository than for each of the (independent) local repositories that may be using the schemas.


Of course, even if the overall development environment includes a PowerCenter Data Integration Hub, there may still be non-shared sources, targets, and development objects. In these cases, it is perfectly acceptable to house the definitions within a local repository. If necessary, these objects may eventually be migrated into the shared Global Repository. It may also still make sense to do local development and unit testing in a local repository - even for shared objects, since they shouldn't be shared until they have been fully tested. During the Architect Phase and the Design Phase, the Technical Architect should work closely with Project Management and the development lead(s) to determine the appropriate repository placement of development objects in a PowerCenter-based environment. After the initial configuration is determined, the Technical Architect can limit his/her involvement in this area.

For example, any data quality steps taken with Informatica Data Quality (IDQ) applications (such as those implemented in 2.8 Perform Data Quality Audit or 5.3 Design and Build Data Quality Process) are performed using processes saved to a discrete IDQ repository. These processes (called plans in IDQ parlance) can be added to PowerCenter transformations and subsequently saved with those transformations in the PowerCenter repository. As indicated above, data quality plans can be designed and tested within an IDQ repository before deployment in PowerCenter. Moreover, depending on their purpose, plans may remain in an IDQ server repository, from which they can be distributed as needed across the enterprise, for the life of the project.

In addition to the sharing advantages provided by the PowerCenter Data Integration Hub approach, the global repository also serves as a centralized entry point for viewing all repositories linked to it via networked local repositories. This mechanism allows a global repository administrator to oversee multiple development projects without having to separately log in to each of the individual local repositories. This capability is useful for ensuring that individual project teams are adhering to enterprise standards and may also be used by centralized QA teams, where appropriate.

Folder Architecture Options and Alternatives


Repository folders provide development teams with a simple method for grouping and organizing work units. The process for creating and administering folders is quite simple, and is thoroughly explained in Informatica's product documentation. The main area for consideration is the determination of an appropriate folder structure within one or more repositories.

TIP If the migration approach adopted by the Technical Architect involves migrating from a development repository to another repository (test or production), it may make sense for the "target" repository to mirror the folder structure within the development repository. This simplifies the repository-to-repository migration procedures. Another possible approach is to assign the same names to corresponding database connections in both the "source" and "target" repositories. This is particularly useful when performing folder copies from one environment to another because it eliminates the need to change database connection settings after the folder copy has been completed.
The most commonly employed general approaches to folder structure are:

Folders by Subject (Target) Area. The Subject Area Division method provides a solid infrastructure for large data warehouse or data mart developments by organizing work by key business area. This strategy is particularly suitable for large projects populating numerous target tables. For example, folder names may be SALES, DISTRIBUTION, etc.

Folder Division by Environment. This method is easier to establish and maintain than Folders by Subject Area, but is suitable only for small development teams working with a minimal number of mappings. As each developer completes unit tests in his/her individual work folders, the mappings or objects are consolidated as they are migrated to test or QA. Migration to production is significantly simplified, with the maximum number of required folder copies limited to the number of environments. Eventually, however, the number of mappings in a single folder may become too large to easily maintain. Folder names may be DEV1, DEV2, DEV3, TEST, QA, etc.

Folder Division by Source Area. The Source Area Division method is attractive to some development teams, particularly if development is centralized around the source systems. In these situations, the promotion and deployment process can be quite complex depending on the load strategy. Folder names may be ERP, BILLING, etc.

In addition to these basic approaches, many PowerCenter development environments also include developer folders that are used as "sandboxes," allowing for unrestricted freedom in development and testing. Data Analyzer creates Personal Folders for each user name, which can be used as a sandbox area for report development and test. Once the developer has completed the initial development and unit testing within his/her own sandbox folder, he/she can migrate the results to the appropriate folder.


TIP PowerCenter does not support nested folder hierarchies, which creates a challenge for logically grouping development objects in different folders. A common technique for logically grouping folders is to use standardized naming conventions, typically prefixing folder names with a brief, unique identifier. For example, suppose three developers are working on the development of a Marketing department data mart. Concurrently, in the same repository, another group of developers is working on a Sales data mart. In order to allow each developer to work in his/her own folder, while logically grouping the folders together, they may be named SALES_DEV1, SALES_DEV2, SALES_DEV3, MRKT_DEV1, etc. Because the folders are arranged alphabetically, all of the SALES-related folders will sort together, as will the MRKT folders.

Finally, it is also important to consider the migration process in the design of the folder structures. The migration process depends largely on the folder structure that is established and the type of repository environment. In earlier versions of PowerCenter, the most efficient method to migrate an object was to perform a complete folder copy. This involves grouping mappings meaningfully within a folder, since all mappings within the folder migrate together. However, if individual objects need to be migrated, the migration process can become very cumbersome, since each object needs to be "manually" migrated. PowerCenter 7.x introduced the concept of team-based development and object versioning, which integrated a true version-control tool within PowerCenter. Objects can be treated as individual elements and can be checked out for development and checked in for testing. Objects can also be linked together to facilitate their deployment to downstream repositories. Data Analyzer 4.x uses the export and import of repository objects for the migration process among environments. Objects are exported and imported as individual pieces and cannot be linked together in a deployment group as they can in PowerCenter 7.x, or migrated as a complete folder as they can in earlier versions of PowerCenter.

Developer Security
The security features built into PowerCenter and Data Analyzer allow the development team to be grouped according to the functions and responsibilities of each member. One common, but risky, approach is to give all developers access to the default Administrator ID provided upon installation of the PowerCenter or Data Analyzer software. Many projects use this approach because it allows developers to begin developing mappings and sessions as soon as the software is installed. INFORMATICA STRONGLY DISCOURAGES THIS PRACTICE. The following paragraphs offer some recommendations for configuring security profiles for a development team.

PowerCenter's and Data Analyzer's security approach is similar to database security environments. PowerCenter's security management is performed through the Repository Manager, and Data Analyzer's security is performed through tasks on the Administrator tab. The internal security enables multi-user development through management of users, groups, privileges, and folders. Despite the similarities, PowerCenter user IDs are distinct from database user IDs, and they are created, managed, and maintained via administrative functions provided by the PowerCenter Repository Manager or Data Analyzer Administrator.

Although privileges can be assigned to users or groups, it is more common to assign privileges to groups only, and then add users to each group. This approach is simpler than assigning privileges on a user-by-user basis, since there are generally a few groups and many users, and any user can belong to more than one group. Every user must be assigned to at least one group. For companies that have the capability to do so, LDAP integration is an available option that can minimize the separate administration of usernames and passwords. If you use LDAP authentication for repository users, the repository maintains an association between repository user names and external login names. When you create a user, you can select the login name from the external directory. For additional information on PowerCenter and Data Analyzer security, including suggestions for configuring user privileges and folder-level privileges, see Configuring Security.

As development objects migrate closer to the production environment, security privileges should be tightened. For example, the testing group is typically granted Execute permissions in order to run mappings, but should not be given Write access to the mappings. When the testing team identifies necessary changes, it can communicate those changes (via a Change Request or bug report) to the development group, which fixes the error and re-migrates the result to the test area. The tightest security of all is reserved for promoting development objects into production. In some environments, no member of the development team is permitted to move anything into production. In these cases, a System Owner or other system representative outside the development group must be given the appropriate repository privileges to complete the migration process. The Technical Architect and Repository Administrator must understand these conditions while designing an appropriate security solution. A minimal sketch of group-based privilege resolution follows.
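
The group-based approach described above can be pictured as a simple resolution of user privileges through group membership. The sketch below is conceptual only; the group names and privilege names are illustrative, and the actual administration is performed in Repository Manager or on the Data Analyzer Administrator tab, not in code.

# Conceptual model: privileges attach to groups; users inherit them via membership.
GROUP_PRIVILEGES = {
    "DEVELOPERS": {"READ", "WRITE"},
    "TESTERS":    {"READ", "EXECUTE"},
    "OPERATORS":  {"EXECUTE"},
}

USER_GROUPS = {
    "asmith": {"DEVELOPERS"},
    "bjones": {"TESTERS", "OPERATORS"},  # a user can belong to more than one group
}

def privileges(user):
    """Union of the privileges of every group the user belongs to."""
    result = set()
    for group in USER_GROUPS.get(user, set()):
        result |= GROUP_PRIVILEGES.get(group, set())
    return result

print(sorted(privileges("bjones")))  # ['EXECUTE', 'READ']

Tightening security closer to production then amounts to removing Write-type privileges from the groups that operate in those environments, rather than adjusting individual users.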

Best Practices
Configuring Security

Sample Deliverables
None

Last updated: 19-Dec-07 16:54


Phase 3: Architect
Subtask 3.2.3 Develop Change Control Procedures

Description
Changes are inevitable during the initial development and maintenance stages of any project. Wherever and whenever the changes occur - in the logical and physical data models, extract programs, business rules, or deployment plans - they must be controlled. Change control procedures are the formal procedures to be followed when requesting a change to the developed system (such as sources, targets, mappings, mapplets, shared transformations, sessions, or batches for PowerCenter, and schemas, global variables, reports, or shared objects for Data Analyzer). The primary purpose of a change control process is to facilitate coordination among the various organizations involved in effecting the change (i.e., development, test, deployment, and operations). The change control process controls the timing, impact, and method by which development changes are migrated through the promotion hierarchy. However, the change control process must not be so cumbersome as to hinder speed of deployment. The procedures should be thorough and rigid, without imposing undue restrictions on the development team's goal of getting its solution into production in a timely manner.

This subtask addresses many of the factors influencing the design of the change control procedures. The procedures themselves should be a well-documented series of steps, describing what happens to a development object once it has been modified (or created) and unit tested by the developer. The change control procedures document should also provide background contextual information, including the configuration of the environment, repositories, and databases.

Prerequisites
None

Roles

Data Integration Developer (Secondary) Database Administrator (DBA) (Secondary) Presentation Layer Developer (Secondary) Quality Assurance Manager (Approve) Repository Administrator (Secondary) System Administrator (Secondary) Technical Project Manager (Primary)

Considerations
It is important to recognize that the change control procedures and the organization of the development environment are heavily dependent upon each other. It is impossible to thoroughly design one without considering the other. The following development environment factors influence the approach taken to change control:

Repository Configuration
Subtask 3.2.2 Define Development Environments discusses the two basic approaches to repository configuration. The first one, Standalone PowerCenter, is the simplest configuration in that it involves a single repository. If that single repository supports both development and production (although this is not generally advisable), then the change control process is fairly straightforward; migrations involve copying the relevant object from a development folder to a production folder, or performing a complete folder copy. However, because of the many advantages gained by isolating development from production environments, Informatica recommends physically separating repositories whenever technically and fiscally feasible. This decision complicates the change control procedures somewhat, but provides a more stable solution.

The general approach for migration is similar regardless of whether the environment is a single-repository or multiple-repository approach. In either case, logical groupings of development objects have been created, representing the various promotion levels within the promotion hierarchy (e.g., DEV, TEST, QA, PROD). In the single repository approach, the logical grouping is accomplished through the use of folders named accordingly. In the multiple repository approach, an entire repository may be used for one (or more) promotion levels. Whenever possible, the production repository should be independent of the others. A typical configuration would be a shared repository supporting both DEV and TEST, and a separate PROD repository.

If the object is a global object (reusable or not reusable), the change must be applied to the global repository. If the object is shared, the shortcuts referencing this object automatically reflect the change from any location in the global or local architecture; therefore, only the "original" object must be migrated. If the object is stored in both repositories (i.e., global and local), the change must be made in both repositories. Finally, if the object is only stored locally, the change is only implemented in the local repository.

TIP With a PowerCenter Data Integration Hub implementation, global repositories can register local repositories. This provides access to both repositories through one "console", simplifying the administrative tasks for completing change requests. In this case, the global Repository Administrator can perform all repository migration tasks.
Regardless of the repository configuration, however, the following questions must be considered in the change control procedures:

What PowerCenter or Data Analyzer objects does this change affect?
What other system objects are affected by the change?
What processes (migration/promotion, load) does this change impact?
What processes does the client have in place to handle and track changes?
Who else uses the data affected by the change, and are they involved in the change request?
How will this change be promoted to other environments in a timely manner?
What is the effort involved in making this change? Is there time in the project schedule for this change? Is there sufficient time to fully test the change?

Change Request Tracking Method


The change procedures must include a means for tracking change requests and their migration schedules, as well as a procedure for backing out changes, if necessary. The Change Request Form should include information about the nature of the change, the developer making the change, the timing of the request for migration, and enough technical information about the change that it can be reversed if necessary (a minimal sketch of such a record follows). There are a number of ways to back out a changed development object. It is important to note, however, that prior to PowerCenter 7.x, reversing a change to a single object in the repository is very tedious and error-prone, and should be considered a last resort. The time to plan for this occurrence, however, is during the implementation of the development environment, not after an incorrect change has been migrated into Production. Backing out a change in PowerCenter 7.x, however, is as simple as reverting to a previous version of the object(s).
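
As a sketch of the information such a form should carry, a minimal change request record might look like the following. The field names and sample values are illustrative; most teams capture this in a tracking tool or spreadsheet rather than in code.

from dataclasses import dataclass
from datetime import date

@dataclass
class ChangeRequest:
    request_id: str
    description: str            # nature of the change
    developer: str              # who made the change
    objects_affected: list      # mappings, sessions, schemas, reports, etc.
    requested_migration: date   # timing of the request for migration
    backout_notes: str          # enough technical detail to reverse the change
    status: str = "SUBMITTED"

cr = ChangeRequest(
    request_id="CR-0042",
    description="Add region code to the sales load target",
    developer="asmith",
    objects_affected=["m_load_sales", "s_m_load_sales", "SALES_FACT"],
    requested_migration=date(2011, 3, 15),
    backout_notes="Revert to the previously checked-in version of m_load_sales",
)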

Team Based Development, Tracking and Reverting to Previous Version


The team-based development option provides functionality in two areas: versioning and deployment. Other features, such as repository queries and labeling, are necessary to ensure optimal use of versioning and deployment. The following sections describe this functionality at a general level. For a more detailed explanation of any of the capabilities of the team-based development features of PowerCenter, refer to the appropriate sections of the PowerCenter documentation. While the functionality provided via team-based development is quite powerful, some ways of using it are more effective than others in achieving the expected goals. The activities of coordinating development in a team environment, tracking finished work that needs to be reviewed or migrated, managing migrations, and ensuring minimal errors can be quite complex. The process requires a combination of PowerCenter functionality and user process to implement effectively.

Data Migration Projects


For Data Migration projects, change control is critical for success. It is common for the target system to undergo continual changes during the life of the data migration project. These cause changes to specifications, which in turn cause a need to change the mappings, sessions, workflows, and scripts that make up the data migration project. Change control is important to allow project management to understand the scope of change and to limit the impact that process changes cause to related processes. For data migration, the key to change control is in the communication of changes to ensure that testing activities are integrated.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 18:48


Phase 3: Architect
Subtask 3.2.4 Determine Metadata Strategy

Description
Designing, implementing, and maintaining a solid metadata strategy is a key enabler of high-quality solutions. The federated architecture model of a PowerCenter-based global metadata repository provides the ability to share metadata that crosses departmental boundaries while allowing non-shared metadata to be maintained independently. A proper metadata strategy provides Data Integration Developers, End-User Application Developers, and End Users with the ability to create a common understanding of the data, where it came from, and what business rules have been applied to it. As such, the metadata may be as important as the data itself, because it provides context and credibility to the data being analyzed. The metadata strategy should describe where metadata will be obtained, where it will be stored, and how it will be accessed. After the strategy is developed, the Metadata Manager is responsible for documenting and distributing it to the development team and end-user community. This solution allows for the following capabilities:

Consolidation and cataloging of metadata from various source systems
Reporting on cataloged metadata
Lineage and where-used analysis
Operational reporting
Extensibility

The Business Intelligence metadata strategy can also assist in achieving the goals of data orientation by providing a focus for sharing the data assets of an organization. It can provide a map for managing the expanding requirements for reporting information that the business places upon the IT environment. The metadata strategy highlights the importance of a central data administration department for organizations that are concerned about data quality, integrity, and reuse. The components of a metadata strategy for Data Analyzer include:

Determining how metadata will be used in the organization (data stewardship, data ownership)
Determining who will use what metadata and why (business definitions and names, systems definitions and names)
Determining training requirements for power users as well as regular users

Prerequisites
None

Roles

Metadata Manager (Primary) Repository Administrator (Primary) Technical Project Manager (Approve)

Considerations
The metadata captured while building and deploying the analytic solution architecture should pertain to each of the system's points of integration - the areas where managing metadata provides benefit to IT and/or business users. The Metadata Manager should analyze each point of integration in order to answer the following questions:

What metadata needs to be captured?
Who are the users that will benefit from this metadata?
Why is it necessary to capture this metadata (i.e., what are the actual benefits of capturing it)?
Where is the metadata currently stored (i.e., its source) and where will it ultimately reside?
How will the repository be populated initially, maintained, and accessed?

It is important to centralize metadata management functions despite the potential "metadata bottleneck" that may be created during development. This consolidation is beneficial when a production system based on clean, reliable metadata is unveiled to the company. The following table expands the concept of the Who, What, Why, Where, and How approach to managing metadata:

Metadata Definition (What?): Source Structures
Users (Who?): Source system users & owners, Data Migration resources
Benefits (Why?): Allows users to see all structures and associated elements in the source system
Source of Metadata (Where?): Source operational system
Metadata Store (Where?): PowerCenter Repository
Population (How?): Source Analyzer, PowerPlugs, PowerCenter, Informatica Data Explorer
Maintenance (How?): Captured and loaded once, maintained as necessary
Access (How?): Repository Manager

Metadata Definition (What?): Warehouse (Target) Structures
Users (Who?): Target system users/analysts, DW Architects, Data Migration resources
Benefits (Why?): Allows users to see all structures and associated elements in the target system
Source of Metadata (Where?): Data warehouse, data marts
Metadata Store (Where?): Informatica Repository
Population (How?): Warehouse Designer, PowerPlugs, PowerCenter
Maintenance (How?): Captured and loaded once, maintained as necessary
Access (How?): Repository Manager

Metadata Definition (What?): Source-to-target mappings
Users (Who?): Data Migration resources, business analysts
Benefits (Why?): Simplifies the documentation process; allows for quicker, more efficient rework of mappings
Source of Metadata (Where?): PowerCenter
Metadata Store (Where?): Informatica Repository
Population (How?): PowerCenter Designer, Informatica Data Explorer
Maintenance (How?): Capture data changes
Access (How?): Repository Manager

Metadata Definition (What?): Reporting Tool
Users (Who?): Business analysts
Benefits (Why?): Allows users to see business names and definitions for query-building
Source of Metadata (Where?): Data Analyzer
Metadata Store (Where?): Informatica Repository
Access (How?): Reporting Tool

Note that the Informatica Data Explorer (IDE) application suite possesses a wide range of functional capabilities for data and metadata profiling and for source-to-target mapping. The Metadata Manager and Repository Administrator need to work together to determine how best to capture the metadata, always considering the following points:

Source structures. Are source data structures already captured or stored in a CASE/data modeling tool? Are they maintained consistently?
Target structures. Are target data structures already captured or stored in a CASE/data modeling tool? Is PowerCenter being used to create the target data structure? Where will the models be maintained?
Extract, Transform, and Load process. Assuming PowerCenter is being used for the ETL processing, the metadata will be created and maintained automatically within the PowerCenter repository. Also, remember that any ETL code developed outside of a PowerCenter mapping (i.e., in stored procedures or external procedures) will not have metadata associated with it.
Analytic applications. Several front-end analytic tools have the ability to import PowerCenter metadata. This can simplify the development and maintenance of the analytic solution.
Reporting tools. End users working with Data Analyzer may need access to the PowerCenter metadata in order to understand the business context of the data in the target database(s).
Operational metadata. PowerCenter automatically captures rich operational data when batches and sessions are executed. This metadata may be useful to operators and end users, and should be considered an important part of the analytic solution; a sketch of one way to report on it follows this list.
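
As an example of putting operational metadata to work, the sketch below reports recent session statistics read from the repository's MX views. The REP_SESS_LOG view and its column names, the pyodbc driver, and the DSN are all assumptions that should be verified against the MX view documentation and the connectivity available for the PowerCenter release in use.

import pyodbc  # assumes an ODBC DSN pointing at the repository database

# View and column names are assumptions; confirm against the MX view documentation.
QUERY = """
    SELECT subject_area, session_name, successful_rows, failed_rows, actual_start
    FROM rep_sess_log
    ORDER BY actual_start DESC
"""

def recent_session_runs(dsn="PC_REPO_DSN", limit=20):
    """Print basic run statistics for the most recent session executions."""
    with pyodbc.connect(f"DSN={dsn}") as conn:
        rows = conn.cursor().execute(QUERY).fetchmany(limit)
    for subject_area, session, ok_rows, failed_rows, started in rows:
        print(f"{subject_area}/{session}: {ok_rows} rows loaded, "
              f"{failed_rows} rejected, started {started}")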

Best Practices
None

Sample Deliverables
None

Last updated: 18-Oct-07 15:09


Phase 3: Architect
Subtask 3.2.5 Develop Change Management Process

Description
Change Management is the process for managing the implementation of changes to a project (i.e., data warehouse or data integration), including hardware, software, services, or related documentation. Its purpose is to minimize the disruption to services caused by change and to ensure that records of hardware, software, services, and documentation are kept up to date. The Change Management process enables the actual change to take place. Elements of the process include identifying the change, creating a request for change, impact assessment, approval, scheduling, and implementation.

Prerequisites
None

Roles

Business Project Manager (Primary) Project Sponsor (Review Only) Technical Project Manager (Primary)

Considerations

Identify Change


Change Management is necessary in any of the following situations:

A problem arises that requires a change that will affect more than one business user or a user group such as sales, marketing, etc.
A new requirement is identified as a result of advances in technology (e.g., a software upgrade) or a change in needs (for new functionality).
A change is required to fulfill a change in business strategy as identified by a business leader or developer.

Request for Change


A request for change should be completed for each proposed change, with a checklist of items to be considered and approved before implementing the change. The change procedures must include a means for tracking change requests and their migration schedules, as well as a procedure for backing out changes, if necessary. The Change Request Form should include information about the nature of the change, the developer making the change, the timing of the request for migration, and enough technical information about the change that it can be reversed if necessary.

Before implementing a change request in the PowerCenter environment, it is advisable to create an additional back-up repository. Using this back-up, the repository can be restored to a 'spare' repository database. After a successful restore, the original object can be retrieved via object copy. In addition, be sure to:

Track changes manually (electronic or paper change request form), so that an object can be changed back to its original form by referring to the change request form.
Create one to 'x' number of version folders, where 'x' is the number of versions back that repository information is maintained. If a change needs to be reversed, the object simply needs to be copied to the original development folder from this versioning folder. The number of 'versions' to maintain is at the discretion of the PowerCenter Administrator. Note, however, that this approach has the disadvantage of being very time consuming and may also greatly increase the size of the repository databases.

PowerCenter Versions 7.X and 8.X


The team-based development option provides functionality in two areas: versioning and deployment. Other features, such as repository queries and labeling, are required to ensure optimal use of versioning and deployment. The following sections describe this functionality at a general level. For a more detailed explanation of any of the capabilities of the team-based development features of PowerCenter, please refer to the appropriate sections of the PowerCenter documentation.

For clients using Data Analyzer for front-end reporting, certain considerations need to be addressed with the migration of objects:

Data Analyzer's repository database contains user profiles in addition to reporting objects. If users are synchronized from outside sources (such as an LDAP directory or via Data Analyzer's API), then a repository restore from one environment to another may delete user profiles (once the repository is linked to LDAP).
When reports containing references to dashboards are migrated, the dashboards also need to be migrated to reflect the link to the report.
In a clustered Data Analyzer configuration, certain objects that are migrated via XML imports may only be reflected on the node on which the import operation was performed. It may be necessary to stop and re-start the other nodes to refresh them with these changes.

Approval to Proceed
An initial review of the Change Request form should assess the cost and value of proceeding with the change. If sufficient information is not provided on the request form to enable the initial reviewer to thoroughly assess the change, he or she should return the request form to the originator for further details. The originator can then resubmit the change request with the requested information. The change request must be tracked through all stages of the change request process, with thorough documentation regarding approval or rejection and resubmission.

Plan and Prepare Change


Once approval to proceed has been granted, the originator may plan and prepare the change in earnest. The following sections of the request for change must be completed at this stage:

Full details of change - inform the Administrator; back up the repository and the database.
Impact on services and users - inform business users in advance about any anticipated outage.
Assessment of the risk of the change failing.
Fallback plan in case of failure - includes reverting to the old version using TBD (team-based development).
Date and time of change.
Migration / Promotion plan - Test-Dev and Dev-Prod.

Impact Analysis
The Change Control Process must include a formalized approach to completing impact analysis. Any implemented change has some planned downstream impact (e.g., the values on a report will change, additional data will be included, a new target file will be populated, etc.). The importance of the impact analysis process is in recognizing unforeseen downstream effects prior to implementing the change. In many cases, the impact is easy to define. For example, if a requested change is limited to changing the target of a particular session from a flat file to a table, the impact is obvious. However, most changes occur within mappings or within databases, and the hidden impacts can be worrisome. For example, if a business rule change is made, how will the end results of the mapping be affected? If a target table schema needs to be modified within the repository, the corresponding target database must also be changed, and it must be done in sync with the migration of the repository change.

An assessment must be completed to determine how a change request affects other objects in the analytic solution architecture. In many development projects, the initial analysis is performed and then communicated to all affected parties (e.g., Repository Administrator, DBAs, etc.) at a regularly scheduled meeting. This ensures that everyone who needs to be notified is, and that all approve the change request. For PowerCenter, the Repository Manager can be used to identify object interdependencies. An impact analysis must answer the following questions:

What PowerCenter or Data Analyzer objects does this change affect?
What other system objects are affected by the change?
What processes (i.e., migration/promotion, load) does this change impact?
What processes does the client have in place to handle and track changes?
Who else uses the data affected by the change, and are they involved in the change request?
How will this change be promoted to other environments in a timely manner?
What is the effort involved in making this change? Is there time in the project schedule for this change? Is there sufficient time to fully test the change?

Implementation
Following final approval and after relevant and timely communications have been issued, the change may be implemented in accordance with the plan and the scheduled date and time. After implementation, the change request form should indicate whether the change was successful or unsuccessful so as to maintain a clear record of the outcome of the request.

Change Control and Migration/Promotion Process


Identifying the most efficient method for applying change to all environments is essential. Within the PowerCenter and Data Analyzer environments, the types of objects to manage are:

Source definitions
Target definitions
Mappings and mapplets
Reusable transformations
Sessions
Batches
Reports
Schemas
Global variables
Dashboards
Schedules

In addition, there are objects outside of the Informatica architecture that are directly linked to these objects, so the appropriate procedures need to be established to ensure that all items are synchronized. When a change request is submitted, the following steps should occur (a sketch of the corresponding workflow states follows the list):
1. Perform impact analysis on the request. List all objects affected by the change, including development objects and databases.
2. Approve or reject the change or migration request. The Project Manager has authority to approve/reject change requests.
3. If approved, pass the request to the PowerCenter Administrator for processing.
4. Migrate the change to the test environment.
5. Test the requested change. If the change does not pass testing, the process will need to start over for this object.
6. Submit the promotion request for migration to QA and/or production environments.
7. If appropriate, the Project Manager approves the request.
8. The Repository Administrator promotes the object to the appropriate environments.
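
The promotion steps above amount to a small state machine for each change request. The sketch below only encodes the allowed transitions as an illustration; the state names mirror the numbered steps, and real tracking would normally live in the change management tool.

# Allowed state transitions for a change request, mirroring the steps above.
TRANSITIONS = {
    "SUBMITTED":           {"IMPACT_ANALYZED"},
    "IMPACT_ANALYZED":     {"APPROVED", "REJECTED"},
    "APPROVED":            {"MIGRATED_TO_TEST"},
    "MIGRATED_TO_TEST":    {"TEST_PASSED", "SUBMITTED"},  # failed tests restart the process
    "TEST_PASSED":         {"PROMOTION_REQUESTED"},
    "PROMOTION_REQUESTED": {"PROMOTED"},
}

def advance(current, new_state):
    """Move a change request to a new state, enforcing the allowed transitions."""
    if new_state not in TRANSITIONS.get(current, set()):
        raise ValueError(f"Illegal transition: {current} to {new_state}")
    return new_state

state = "SUBMITTED"
for step in ("IMPACT_ANALYZED", "APPROVED", "MIGRATED_TO_TEST",
             "TEST_PASSED", "PROMOTION_REQUESTED", "PROMOTED"):
    state = advance(state, step)
print(state)  # PROMOTED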

Best Practices
None

Sample Deliverables
None

Last updated: 24-Jun-10 17:09


Phase 3: Architect
Subtask 3.2.6 Define Development Standards

Description
To ensure consistency and facilitate easy maintenance after production go-live, it is important to define and agree on development standards before development work begins. The standards define the ground rules for the development team and can range from naming conventions to documentation standards to error-handling standards. Development work should adhere to these standards throughout the lifecycle, and new team members will be able to reference them to understand the requirements placed upon design and build activities.

Prerequisites
None

Roles

Technical Architect (Primary) Technical Project Manager (Primary)

Considerations
While it may be impossible to predict every pattern and every type of object used in a project during an early stage, it is important to define standards for the areas you are certain will be part of the project and to continually update these standards throughout the project as necessary. Standards should be published in a common area that is available to the full team of project developers including internal staff, consultants and off-shore developers. Expectations should be set early on that all developers should follow these standards as part of their development process. New team members should be required to review these documents before beginning any design or development work.

Naming Conventions
Naming conventions are important to development efforts: common methods and structures for naming make the development objects easy to understand once the solution is in production. This allows someone who did not build the original data integration routine to quickly understand the flow and objects and pinpoint areas where enhancements need to be made. Good naming conventions that are rigorously adhered to during development are key to reducing maintenance costs later in the project lifecycle. Since objects need to be named when they are created, there is little effort or expense in implementing standards as part of the project. If they are not defined early in the project, however, it becomes a costly exercise to implement standards later, as doing so requires the renaming of hundreds or thousands of objects.

Templates
Most projects have standard patterns across the list of data integration processes. It can be helpful to identify these patterns and create templates for development work. Some Informatica products allow for automated templates that can be used to generate development objects that follow the pattern. Either way, it is helpful to document and agree on a template for each known pattern. For example, in a data warehouse project, a specific flavor of slowly changing dimension may be implemented across the Enterprise Data Warehouse. An Informatica mapping template that captures the typical pattern of objects, flow, and settings, including error handling, can then be used by developers as a guide for building all data integration processes for dimensional data.

Adherence to Standards
Depending on the size and nature of the team, it can be difficult to ensure that all team members are following standards. As part of defining the standards, consider also defining ways to measure adherence to them. Methods could include any of the following:

Formal review of objects with development team members or standards leads
Formal production move process that reviews changed/new objects against required standards before migration to production
Automated repository queries to isolate and expose object naming and settings that do not meet development standards (a minimal scripted example appears below)

Again, the effort expended here will drive maintenance and support savings later, as it is easier to support hundreds of objects all developed with a common methodology than data integration loads with no particular standards or pattern. Often when there are production failures, the ability to quickly understand the data integration process and pinpoint the area causing an issue is critical. In most cases the person maintaining an object did not originally develop it, and without good naming and documentation standards it can take time to understand what was originally built. Minutes can translate into hours or even days of production downtime, costing the organization thousands of dollars.
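As an illustration of the automated-query approach noted in the list above, the following sketch scans an XML export of a development folder (e.g., produced with pmrep ObjectExport) for objects that violate agreed prefix conventions. The element names and the m_/s_/wf_ prefixes are assumptions for the example; substitute the export format and the conventions agreed for your project.

    # Sketch only: assumes objects were exported to XML and appear as MAPPING/SESSION/
    # WORKFLOW elements with a NAME attribute. The prefix rules are examples.
    import re
    import xml.etree.ElementTree as ET

    PREFIX_RULES = {
        "MAPPING": re.compile(r"^m_"),
        "SESSION": re.compile(r"^s_"),
        "WORKFLOW": re.compile(r"^wf_"),
    }

    def audit_export(xml_path):
        """Return (object_type, object_name) pairs that break the naming conventions."""
        violations = []
        tree = ET.parse(xml_path)
        for obj_type, rule in PREFIX_RULES.items():
            for element in tree.iter(obj_type):
                name = element.get("NAME", "")
                if not rule.match(name):
                    violations.append((obj_type, name))
        return violations

    for obj_type, name in audit_export("dev_folder_export.xml"):
        print("Naming violation:", obj_type, name)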

Integration Competency Center Standards


Organizations that have implemented Integration Competency Centers often have some level of development standards already in place. These should be used as applicable to the project. It may be necessary to review these standards to determine whether they need to be enhanced for the project, or whether there are patterns that will require additional coverage. For example, the Integration Competency Center may have defined naming conventions for objects, but this may be the first project in the organization to use the Informatica B2B Data Transformation toolset. In that case, there may be a need to work with the Integration Competency Center to update the naming convention standards to include the new objects created by this functionality.

Best Practices
Naming Conventions
Naming Conventions - Data Quality
Naming Conventions - B2B Data Transformation
Organizing and Maintaining Parameter Files & Variables

Sample Deliverables
None

Last updated: 24-Jun-10 17:54

Phase 3: Architect
Task 3.3 Implement Technical Architecture

Description
While it is crucial to design and implement a technical architecture as part of the data integration project development effort, most of the implementation work is beyond the scope of this document. Specifically, the acquisition and installation of hardware and system software is generally handled by internal resources and is accomplished by following pre-established procedures. This section touches on these topics, but is not meant to be a step-by-step guide to the acquisition and implementation process.

After determining an appropriate technical architecture for the solution (3.1 Develop Solution Architecture), the next step is to physically implement that architecture. This includes procuring and installing the hardware and software required to support the data integration processes.

Prerequisites
3.2 Design Development Architecture

Roles

Database Administrator (DBA) (Secondary) Project Sponsor (Approve) Repository Administrator (Primary) System Administrator (Primary) Technical Architect (Primary) Technical Project Manager (Primary)

Considerations
The project schedule should be the focus of the hardware and software implementation process. The entire procurement process, which may require a significant amount of time, must begin as soon as possible to keep the project moving forward. Delays in this step can cause serious delays to the project as a whole. There are, however, a number of proven methods for expediting the procurement and installation processes, as described in the related subtasks.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:45

Phase 3: Architect
Subtask 3.3.1 Procure Hardware and Software

Description
This is the first step in implementing the technical architecture. The procurement process varies widely among organizations, but is often based on a purchase request (i.e., Request for Purchase or RFP) generated by the Project Manager after the project architecture is planned and configuration recommendations are approved by IT management. An RFP is usually mandatory for procuring any new hardware or software. Although the forms vary widely among companies, an RFP typically lists what products need to be purchased, when they will be needed, and why they are necessary for the project. The document is then reviewed and approved by appropriate management and the organization's "buyer". It is critical to begin the procurement process well in advance of the start of development.

Prerequisites
3.2 Design Development Architecture

Roles

Database Administrator (DBA) (Secondary) Project Sponsor (Approve) Repository Administrator (Secondary) System Administrator (Secondary) Technical Architect (Primary) Technical Project Manager (Primary)

Considerations
Frequently, the Project Manager does not control purchasing new hardware and software. Approval must be received from another group or individual within the organization, often referred to as a "buyer". Even before product purchase decisions are finalized, it is a good idea to notify the buyer of necessary impending purchases, providing a brief overview of the types of products that are likely to be required and for what reasons. It may also be possible to begin the procurement process before all of the prerequisite steps are complete (see 2.2 Define Business Requirements, 3.1.2 Develop Architecture Logical View, and 3.1.3 Develop Configuration Recommendations). The Technical Architect should have a good idea of at least some of the software and hardware choices before a physical architecture and configuration recommendations are solidified. Finally, if development is ready to begin and the hardware procurement process is not yet complete, it may be worthwhile to get started on a temporary server with the intention of moving the work to the new server when it is available.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:45

Phase 3: Architect
Subtask 3.3.2 Install/Configure Software

Description
Installing, configuring, and deploying new hardware and software should not affect the progress of a data integration project. The entire development team depends on a properly configured technical environment. Incorrect installation or delays can have serious negative effects on the project schedule. Establishing and following a detailed installation plan can help avoid unnecessary delays in development. (See 3.1.2 Develop Architecture Logical View).

Prerequisites
3.2 Design Development Architecture

Roles

Database Administrator (DBA) (Primary) Repository Administrator (Primary) System Administrator (Primary) Technical Project Manager (Review Only)

Considerations
When installing and configuring hardware and software for a typical data warehousing project, the following Informatica software components should be considered:

PowerCenter Services - The PowerCenter services, including the repository, integration, log, and domain services, should be installed and configured on a server machine.
PowerCenter Client - The client tools for the PowerCenter engine must be installed and configured on the client machines for developers. The DataDirect ODBC drivers should also be installed on the client machines. The PowerCenter client tools allow a developer to interact with the repository through an easy-to-use GUI interface.
PowerCenter Reports - PowerCenter Reports (PCR) is a reporting tool that enables users to browse and analyze PowerCenter metadata, allowing users to view PowerCenter operational load statistics and perform impact analysis. PCR is based on Informatica Data Analyzer, running on an included JBOSS application server, to manage and distribute these reports via an internet browser interface.
PowerCenter Reports Client - The PCR client is a web-based, thin-client tool that uses Microsoft Internet Explorer 6 as the client. Additional client tool installation for the PCR is usually not necessary, although the proper version of Internet Explorer should be verified on client workstations.
Data Analyzer Server - The analytics server engine for Data Analyzer should be installed and configured on a server.
Data Analyzer Client - Data Analyzer is a web-based, thin-client tool that uses Microsoft Internet Explorer 6 as the client. Additional client tool installation for Data Analyzer is usually not necessary, although the proper version of Internet Explorer should be verified on the client machines of business users to ensure that minimum requirements are met.
PowerExchange - PowerExchange has components that must be installed on the source system, the PowerCenter server, and the client.

In addition to considering the Informatica software components that should be installed, the preferred database for the data integration project should be selected and installed, keeping these important database size considerations in mind:

PowerCenter Metadata Repository - Although you can create a PowerCenter metadata repository with a minimum of 100MB of database space, Informatica recommends allocating up to 150MB for PowerCenter repositories. Additional space should be added for versioned repositories. The database user should have privileges to create tables, views, and indexes.

Data Analyzer Metadata Repository - Although you can create a Data Analyzer repository with a minimum of 60MB of database space, Informatica recommends allocating up to 150MB for Data Analyzer repositories. The database user should have privileges to create tables, views, and indexes.
Metadata Manager Repository - Although you can create a Metadata Manager repository with a minimum of 550MB of database space, you may choose to allocate more space in order to plan for future growth. The database user should have privileges to create tables, views, and indexes.
Data Warehouse Database - Allow for ample space, planning for rapid growth.

PowerCenter Server Installation


The PowerCenter services need to be installed and configured, along with any necessary database connectivity drivers, such as native drivers or ODBC. Connectivity needs to be established among all the platforms before the Informatica applications can be used.

The recommended configuration for the PowerCenter environment is to install the PowerCenter services and the repository and target databases on the same multiprocessor machine. This approach minimizes network interference when the server is writing to the target database. Use this approach when available CPU and memory resources on the multiprocessor machine allow all software processes to operate efficiently without pegging the server. If available hardware dictates that the PowerCenter server be physically separated from the target database server, Informatica recommends placing a high-speed network connection between the two servers. Some organizations house the repository database on a separate database server if they are running OLAP servers and want to consolidate metadata repositories. Because the repository tables are typically very small in comparison to the data mart tables, and storage parameters are set at the database level, it may be advisable to keep the repository in a separate database.

For step-by-step instructions for installing the PowerCenter services, refer to the Informatica PowerCenter Installation Guide. The following list is intended to complement the installation guide when installing PowerCenter:

Network Protocol - TCP/IP and IPX/SPX are the supported protocols for communication between the PowerCenter services and the PowerCenter client tools. To improve repository performance, consider installing the Repository service on a machine with a fast network connection. To optimize performance, do not install the Repository service on a Primary Domain Controller (PDC) or a Backup Domain Controller (BDC).
Native Database Drivers - Native database drivers (or ODBC in some instances) are used by the server to connect to the source, target, and repository databases. Ensure that all appropriate database drivers, at the most recent patch levels, are installed on the PowerCenter server to access the source, target, and repository databases.
Operating System Patches - Prior to installing PowerCenter, refer to the PowerCenter Release Notes to ensure that all required patches have been applied to the operating system. This step is often overlooked and can result in operating system errors and/or failures when running the PowerCenter server.
Data Movement Mode - The DataMovementMode option is set in the PowerCenter Integration Service configuration and can be set to ASCII or Unicode. Unicode is an international character set standard that supports all major languages (including US, European, and Asian), as well as common technical symbols, and uses a fixed-width encoding of 16 bits for every character. ASCII is a single-byte code page that encodes character data with 7 bits. Although actual performance results depend on the nature of the application, if international code page support (i.e., Unicode) is not required, set the DataMovementMode to ASCII: the 7-bit storage of character data results in smaller cache sizes for string data and more efficient data movement.
Versioning - If versioning is enabled for a PowerCenter repository, developers can save multiple copies of any PowerCenter object to the repository. Although this feature provides developers with a seamless way to manage changes during the course of a project, it also results in larger metadata repositories. If versioning is enabled for a repository, Informatica recommends allocating a minimum of 500MB of space in the database for the PowerCenter repository.
Lightweight Directory Access Protocol (LDAP) - If you use PowerCenter default authentication, you create users and maintain passwords in the PowerCenter metadata repository using Repository Manager, and the Repository service verifies users against these user names and passwords. If you use Lightweight Directory Access Protocol (LDAP), the Repository service passes a user login to the external directory for authentication, allowing synchronization of PowerCenter user names and passwords with network/corporate user names and passwords. The repository maintains an association between repository user names and external login names; you must create the user name-login associations, but you do not maintain user passwords in the repository. Informatica provides a PowerCenter plug-in that you can use to interface between PowerCenter and an LDAP server. To install the plug-in, perform the following steps:

1. Configure the LDAP module connection information from the Administration Console.
2. Register the package with each repository that you want to use it with.
3. Set up users in each repository.

For more information on configuring LDAP authentication, refer to the Informatica PowerCenter Repository Guide.

PowerCenter Client Installation


The PowerCenter Client needs to be installed on all developer workstations, along with any necessary drivers, including database connectivity drivers such as ODBC. Before you begin the installation, verify that you have enough disk space for the PowerCenter Client. You must have 300MB of disk space to install the PowerCenter 8 Client tools. Also, make sure you have 30MB of temporary file space available for the PowerCenter Setup. When installing PowerCenter Client tools via a standard installation, choose to install the Client tools and ODBC components.

TIP You can install the PowerCenter Client tools in standard mode or silent mode. You may want to perform a silent installation if you need to install the PowerCenter Client on several machines on the network, or if you want to standardize the installation across all machines in the environment. When you perform a silent installation, the installation program uses information in a response file to locate the installation directory. You can also perform a silent installation for remote machines on the network.
When adding an ODBC data source name (DSN) to client workstations, it is a good idea to keep the DSN consistent among all workstations. Aside from eliminating the potential for confusion on individual developer machines, this is important when importing and exporting repository registries. The Repository Manager saves repository connection information in the registry. To simplify the process of setting up client systems, it is possible to export that information, and then import it for a new client. The registry references the data source names used in the exporting machine. If a registry is imported containing a DSN that does not exist on the client system, the connection will fail at runtime.
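One simple way to enforce DSN consistency is to script the check. The sketch below is Windows-only and assumes system DSNs are registered in the usual ODBC registry location; the "standard" DSN names shown are invented for the example and should be replaced with the names agreed for the project.

    # Sketch only (Windows): lists system DSNs from the usual ODBC registry location and
    # reports any of the project's standard DSN names (invented here) that are missing.
    import winreg

    REQUIRED_DSNS = {"DW_TARGET", "PC_REPOSITORY"}   # example project-standard DSN names

    def system_dsns():
        dsns = set()
        key_path = r"SOFTWARE\ODBC\ODBC.INI\ODBC Data Sources"
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            index = 0
            while True:
                try:
                    name, _value, _type = winreg.EnumValue(key, index)
                except OSError:          # no more values to enumerate
                    break
                dsns.add(name)
                index += 1
        return dsns

    missing = REQUIRED_DSNS - system_dsns()
    if missing:
        print("Missing standard DSNs on this workstation:", ", ".join(sorted(missing)))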

PowerCenter Reports Installation


PowerCenter Reports (PCR) replaces the PowerCenter Metadata Reporter. The reports are built on the Data Analyzer infrastructure, so Data Analyzer must be installed and configured along with the application server foundation software. Currently, PCR is shipped with the PowerCenter installation (both Standard and Advanced Editions).

The recommended configuration for the PCR environment is to place the PCR/Data Analyzer server, application server, and repository databases on the same multiprocessor machine. This approach minimizes network input/output as the PCR server reads from the PowerCenter repository database. Use this approach when available CPU and memory resources on the multiprocessor machine allow all software processes to operate efficiently without pegging the server. If available hardware dictates that the PCR server be physically separated from the PowerCenter repository database server, Informatica recommends placing a high-speed network connection between the two servers.

For step-by-step instructions for installing PowerCenter Reports, refer to the Informatica PowerCenter Installation Guide. The following considerations are intended to complement the installation guide when installing PCR:

Operating System Patch Levels - Prior to installing PCR, refer to the Data Analyzer Release Notes to ensure that all required patches have been applied to the operating system. This step is often overlooked and can result in operating system errors and/or failures if the correct patches are not applied.
Lightweight Directory Access Protocol (LDAP) - If you use default authentication, you create users and maintain passwords in the Data Analyzer metadata repository, and Data Analyzer verifies users against these user names and passwords. If you use Lightweight Directory Access Protocol (LDAP), Data Analyzer passes a user login to the external directory for authentication, allowing synchronization of Data Analyzer user names and passwords with network/corporate user names and passwords, as well as PowerCenter user names and passwords. The repository maintains an association between repository user names and external login names; you must create the user name-login associations, but you do not have to maintain user passwords in the repository. In order to enable LDAP, you must configure the IAS.properties and ldaprealm.properties files. For more information on configuring LDAP authentication, see the Data Analyzer Administration Guide.

PowerCenter Reports Client Installation


The PCR client is a web-based, thin-client tool that uses Microsoft Internet Explorer 6 as the client. The proper version of Internet Explorer should be verified on client machines, ensuring that Internet Explorer 6 is the default web browser, and the minimum system requirements should be validated. In order to use PCR, the client workstation should have at least a 300MHz processor and 128MB of RAM. Note that these are the minimum requirements for the PCR client; if other applications are running on the client workstation, additional CPU and memory are required. In most situations, users are likely to be multi-tasking across multiple applications, so this should be taken into consideration.

Certain interactive features in PCR require third-party plug-in software to work correctly. Users must download and install the plug-in software on their workstations before they can use these features. PCR uses the following third-party plug-in software:

Microsoft SOAP Toolkit - In PCR, you can export a report to an Excel file and refresh the data in Excel directly from the cached data in PCR or from data in the data warehouse through PCR. To use the data refresh feature, you must first install the Microsoft SOAP Toolkit. For information on downloading the Microsoft SOAP Toolkit, see Working with Reports in the Data Analyzer User Guide.
Adobe SVG Viewer - In PCR, you can display interactive report charts and chart indicators. You can click an interactive chart to drill into the report data, view details, and select sections of the chart. To view interactive charts, you must install Adobe SVG Viewer. For more information on downloading Adobe SVG Viewer, see Managing Account Information in the Data Analyzer User Guide.

Lastly, for PCR to display its application windows correctly, Informatica recommends disabling any pop-up blocking utility in your browser. If a pop-up blocker is running while you are working with PCR, the PCR windows may not display properly.

Data Analyzer Server Installation


The Data Analyzer Server needs to be installed and configured along with the application server foundation software. Currently, Data Analyzer is certified on the following application servers:

BEA WebLogic
IBM WebSphere
JBoss Application Server

Refer to the PowerCenter Installation Guide for the current list of supported application servers and exact version numbers.

TIP When installing IBM WebSphere Application Server, avoid using spaces in the installation directory path name for the application server, http server, or messaging server.
The recommended configuration for the Data Analyzer environment is to put the Data Analyzer Server, application server, repository, and data warehouse databases on the same multiprocessor machine. This approach minimizes network input/output as the Data Analyzer Server reads from the data warehouse database. Use this approach when available CPU and memory resources on the multiprocessor machine allow all software processes to operate efficiently without pegging the server. If available hardware dictates that the Data Analyzer Server be physically separated from the data warehouse database server, Informatica recommends placing a high-speed network connection between the two servers.

For step-by-step instructions for installing the Data Analyzer Server components, refer to the Informatica Data Analyzer Installation Guide. The following considerations are intended to complement the installation guide when installing Data Analyzer:

Operating System Patch Levels - Prior to installing Data Analyzer, refer to the Data Analyzer Release Notes to ensure that all required patches have been applied to the operating system. This step is often overlooked and can result in operating system errors and/or failures if the correct patches are not applied.
Lightweight Directory Access Protocol (LDAP) - If you use Data Analyzer default authentication, you create users and maintain passwords in the Data Analyzer metadata repository, and Data Analyzer verifies users against these user names and passwords. If you use Lightweight Directory Access Protocol (LDAP), Data Analyzer passes a user login to the external directory for authentication, allowing synchronization of Data Analyzer user names and passwords with network/corporate user names and passwords, as well as PowerCenter user names and passwords. The repository maintains an association between repository user names and external login names; you must create the user name-login associations, but you do not maintain user passwords in the repository. In order to enable LDAP, you must configure the IAS.properties and ldaprealm.properties files. For more information on configuring LDAP authentication, refer to the Informatica Data Analyzer Administrator Guide.

TIP After installing Data Analyzer on the JBoss application server, set the minimum pool size to 0 in the file <JBOSS_HOME>/server/informatica/deploy/hsqldb-ds.xml. This ensures that the managed connections in JBOSS will be configured properly. Without this setting it is possible that email alert messages will not be sent properly.

TIP Repository Preparation - Before you install Data Analyzer, be sure to clear the database transaction log for the repository database. If the transaction log is full or runs out of space when the Data Analyzer installation program creates the Data Analyzer repository, the installation program will fail.
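The JBoss tweak in the first tip is easy to forget, so a small post-install check can be useful. The sketch below is an assumption-laden illustration: the file path and the min-pool-size element name are based on a typical JBoss datasource descriptor and should be adjusted to match your installation.

    # Sketch only: verifies the datasource setting described in the tip above. The file
    # location and the <min-pool-size> element name are assumptions; adjust both as needed.
    import xml.etree.ElementTree as ET

    DS_FILE = "/opt/jboss/server/informatica/deploy/hsqldb-ds.xml"   # i.e. <JBOSS_HOME>/server/informatica/deploy/hsqldb-ds.xml

    tree = ET.parse(DS_FILE)
    for pool in tree.iter("min-pool-size"):
        if (pool.text or "").strip() != "0":
            print("Warning: min-pool-size is", pool.text, "- the tip above recommends 0")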

Data Analyzer Client Installation


The Data Analyzer Client is a web-based, thin-client tool that uses Microsoft Internet Explorer 6 as the client. The proper version of Internet Explorer should be verified on client machines, ensuring that Internet Explorer 6 is the default web browser, and the minimum system requirements should be validated. In order to use the Data Analyzer Client, the client workstation should have at least a 300MHz processor and 128MB of RAM. Note that these are the minimum requirements for the Data Analyzer Client; if other applications are running on the client workstation, additional CPU and memory are required. In most situations, users are likely to be multi-tasking across multiple applications, so this should be taken into consideration.

Certain interactive features in Data Analyzer require third-party plug-in software to work correctly. Users must download and install the plug-in software on their workstations before they can use these features. Data Analyzer uses the following third-party plug-in software:

Microsoft SOAP Toolkit - In Data Analyzer, you can export a report to an Excel file and refresh the data in Excel directly from the cached data in Data Analyzer or from data in the data warehouse through Data Analyzer. To use the data refresh feature, you must first install the Microsoft SOAP Toolkit. For information on downloading the Microsoft SOAP Toolkit, see Working with Reports in the Data Analyzer User Guide.
Adobe SVG Viewer - In Data Analyzer, you can display interactive report charts and chart indicators. You can click an interactive chart to drill into the report data, view details, and select sections of the chart. To view interactive charts, you must install Adobe SVG Viewer. For more information on downloading Adobe SVG Viewer, see Managing Account Information in the Data Analyzer User Guide.

Lastly, for Data Analyzer to display its application windows correctly, Informatica recommends disabling any pop-up blocking utility in your browser. If a pop-up blocker is running while you are working with Data Analyzer, the Data Analyzer windows may not display properly.

Metadata Manager Installation


Metadata Manager software can be installed after the development environment configuration has been completed and approved. The following high-level steps are involved in the Metadata Manager installation process.

Metadata Manager requires a web server and a Java 2 Enterprise Edition (J2EE)-compliant application server. Metadata Manager works with BEA WebLogic Server, IBM WebSphere Application Server, and JBoss Application Server. If you choose to use BEA WebLogic or IBM WebSphere, it must be installed prior to the Metadata Manager installation; the JBoss Application Server can be installed from the Metadata Manager installation process. Informatica recommends that a system administrator who is familiar with application and web servers, LDAP servers, and the J2EE platform install the required software. For complete information on the Metadata Manager installation process, refer to the PowerCenter Installation Guide.

1. Install BEA WebLogic Server or IBM WebSphere Application Server on the machine where you plan to install Metadata Manager. You must install the application server and other required software before you install Metadata Manager.
2. You can install Metadata Manager on a machine with a Windows or UNIX operating system. Metadata Manager includes the following installation components:

Metadata Manager
Limited edition of PowerCenter
Metadata Manager documentation in PDF format
Metadata Manager and Data Analyzer integrated online help
Configuration Console online help

Be sure to refer to the Metadata Manager Release Notes for information regarding the supported versions of each application. To install Metadata Manager for the first time, complete each of the following tasks in the order listed below:

1. Create database user accounts. Create one database user account for the Metadata Manager Warehouse and Metadata Manager Server repository and another for the Integration repository.
2. Install the application server. Install BEA WebLogic Server or IBM WebSphere Application Server.
3. Install PowerCenter 8. Install PowerCenter 8 to manage metadata extract and load tasks.
4. Install Metadata Manager. When installing Metadata Manager, provide the connection information for the database user accounts for the Integration repository and the Metadata Manager Warehouse and Metadata Manager Server repository. The Metadata Manager installation creates both repositories and installs other Metadata Manager components, such as the Configuration Console, documentation, and XConnects.
5. Optionally, run the pre-compile utility (for BEA WebLogic Server and IBM WebSphere). If you are using BEA WebLogic Server as your application server, optionally pre-compile the JSP scripts to display the Metadata Manager web pages faster when they are accessed for the first time.
6. Apply the product license. Apply the application server license, as well as the PowerCenter and Metadata Manager licenses.
7. Configure the PowerCenter Server. Assign the Integration repository to the PowerCenter Server to enable running of the prepackaged XConnect workflows. The workflow for each XConnect extracts metadata from the metadata source repository and loads it into the Metadata Manager Warehouse.

Note: For more information about installing Metadata Manager, see the Installing Metadata Manager chapter of the PowerCenter Installation Guide.

After the software has been installed and tested, the Metadata Manager Administrator can begin creating security groups, users, and the repositories. The following are some of the initial steps for the Metadata Manager Administrator once Metadata Manager is installed. For more information on any of these steps, refer to the Metadata Manager Administration Guide.

1. After completing the Metadata Manager installation, configure XConnects to extract metadata. Configure an XConnect for each source repository, and then load metadata from the source repositories into the Metadata Manager Warehouse.
2. Register or create each source repository in Metadata Manager. Adding a source repository also adds the corresponding XConnect for that repository in the Configuration Console.
3. Set up the Configuration Console. Verify the Integration repository, PowerCenter Server, and PowerCenter Repository Server connections in the Configuration Console. Also, specify the PowerCenter source files directory in the Configuration Console.
4. Set up and run the XConnect for each source repository using the Configuration Console.
5. To limit the tasks that users can perform and the type of source repository metadata objects that users can view and modify, set user privileges and object access permissions.

PowerExchange Installation

Before beginning the installation, take time to read the PowerExchange Installation Guide as well as the documentation for the specific PowerExchange products you have licensed and plan to install. Also take time to identify and notify the resources you will need to complete the installation. Depending on the specific product, you could need any or all of the following:

Database Administrator
PowerCenter Administrator
MVS Systems Administrator
UNIX Systems Administrator
Security Administrator
Network Administrator
Desktop (PC) Support

Installing the PowerExchange Listener on Source Systems


The process for installing PowerExchange on the source system varies greatly depending on the source system. Take care to read through the installation documentation prior to attempting the installation. The PowerExchange Installation Guide has step-by-step instructions for installing PowerExchange on all supported platforms.

Installing the PowerExchange Navigator on the PC


The Navigator allows you to create and edit data maps and tables. To install PowerExchange on the desktop (PC) for the first time, complete each of the following tasks in the order listed below:

1. Install the PowerExchange Navigator. Administrator access may be required to install the software.
2. Modify the dbmover.cfg file. Depending on your installation, modifications may not be required. Refer to the PowerExchange Reference Manual for information on the parameters in dbmover.cfg.

Installing PowerExchange Client for the PowerCenter Server


The PowerExchange client for the PowerCenter server allows PowerCenter to read data from PowerExchange data sources. The PowerCenter Administrator should perform the installation with the assistance of a server administrator. It is recommended that a separate user account be created to run the required processes. A PowerCenter Administrator needs to register the PowerExchange plug-in with the PowerCenter repository. Informatica recommends that the installation be performed in one environment and tested end-to-end (from data map creation to running workflows) before attempting to install the product in other environments.

Best Practices
Advanced Client Configuration Options
Advanced Server Configuration Options
B2B Data Transformation Installation (for Unix)
B2B Data Transformation Installation (for Windows)
Installing Data Analyzer
PowerExchange Installation (for AS/400)
PowerExchange Installation (for Mainframe)
Understanding and Setting UNIX Resources for PowerCenter Installations

Sample Deliverables
None

Last updated: 24-Jun-10 13:59


Velocity v9
Phase 4: Design

2011 Informatica Corporation. All rights reserved.

Phase 4: Design
4 Design
4.1 Develop Data Model(s)
4.1.1 Develop Enterprise Data Warehouse Model
4.1.2 Develop Data Mart Model(s)
4.2 Analyze Data Sources
4.2.1 Creating a Source to Target Data Store Matrix
4.2.2 Develop Source to Target Relationships
4.2.3 Determine Source Availability
4.3 Design Physical Database
4.3.1 Develop Physical Database Design
4.4 Design Presentation Layer
4.4.1 Design Presentation Layer Prototype
4.4.2 Present Prototype to Business Analysts
4.4.3 Develop Presentation Layout Design
4.4.4 Design ILM Seamless Access


Phase 4: Design
Description

The Design Phase lays the foundation for the upcoming Build Phase. In the Design Phase, all data models are developed, source systems are analyzed, and physical databases are designed. The presentation layer is designed and a prototype constructed. Each task, if done thoroughly, enables the data integration solution to perform properly and provides an infrastructure that allows for growth and change. Each task in the Design Phase provides the functional architecture for the development process using PowerCenter. The design of the target data store may include data warehouses and data marts, star schemas, web services, message queues, or custom databases to drive specific applications or effect a data migration. The Design Phase requires that several preparatory tasks be completed before beginning the development work of building and testing mappings, sessions, and workflows within PowerCenter.

Prerequisites
3 Architect

Roles

Application Specialist (Primary) Business Analyst (Primary) Data Architect (Primary) Data Integration Developer (Primary) Data Quality Developer (Primary) Database Administrator (DBA) (Primary) Presentation Layer Developer (Primary) System Administrator (Primary) Technical Project Manager (Review Only)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:45


Phase 4: Design
Task 4.1 Develop Data Model(s)

Description
A data integration/business intelligence project requires logical data models in order to begin the process of designing the target database structures that will support the solution architecture. The logical data model will, in turn, lead to the initial physical database design that will support the business requirements and be populated through data integration logic.

While this task and its subtasks focus on the data models for Enterprise Data Warehouses and Enterprise Data Marts, many types of data integration projects do not involve a data warehouse. Data migration or synchronization projects typically have existing transactional databases as sources and targets; in these cases, the data models may be reverse engineered directly from these databases. The same may be true of data consolidation projects if the target has the same structure as an existing operational database. Operational data integration projects, including data consolidation into new data structures, require a data model design, but typically one that is dictated by the functional processes. Regardless of the architecture chosen for the data integration solution, the data models for the target databases or data structures need to be developed in a logical and consistent fashion prior to development.

Depending on the structure and approach to data storage supporting the data integration solution, the data architecture may include an Enterprise Data Warehouse (EDW) and one or more data marts. In addition, many implementations also include an Operational Data Store (ODS), which may also be referred to as a dynamic data store (DDS) or staging area. Each of these data stores may exist independently of the others, and may reside on completely different database management systems (DBMSs) and hardware platforms. In any case, each of the database schemas comprising the overall solution will require a corresponding logical model.

An ODS may be needed when there are operational or reporting uses for the consolidated detail data, or to provide a staging area, for example, when there is a short time span to pull data from the source systems. It can act as a buffer between the EDW and the source applications. The data model for the ODS is typically in third-normal form and may be a virtual duplicate of the source systems' models. The ODS typically receives the data after some cleansing and integration, but with little or no summarization from the source systems; the ODS can then become the source for the EDW.

Major business intelligence projects require an EDW to house the data imported from many different source systems. The EDW represents an integrated, subject-oriented view of the corporate data comprised of relevant source system data. It is typically slightly summarized so that its information is relevant to management, as opposed to providing all the transaction details. In addition to its numerical details (i.e., atomic-level facts), it typically has derived calculations and subtotals. The EDW is not generally intended for direct access by end users for reporting purposes; for that we have the data marts. The EDW typically has a somewhat de-normalized structure to support reporting and analysis, as opposed to business transactions. Depending on size and usage, a variant of a star schema may be used.

Data marts (DMs) are effectively subsets of the EDW. Data marts are fed directly from the enterprise data warehouse, ensuring synchronization of business rules and snapshot times. The logical design structures are typically dimensional star or snowflake schemas. The structures of the data marts are driven by the requirements of particular business users and reporting tools. There may be additions and reductions to the logical data mart design depending on the requirements for the particular data mart. Historical data capture requirements may differ from those of the enterprise data warehouse. A subject-oriented data mart may be able to provide for more historical analysis, or alternatively may require none. Detailed requirements drive content, which in turn drives the logical design that becomes the foundation of the physical database design.

Two generic assumptions about business users also affect data mart design:

Business users prefer systems they easily understand.
Business users prefer systems that deliver results quickly.

These assumptions encourage the use of star and snowflake schemas in the solution design. These types of schemas represent business activities as a series of discrete, time-stamped events (or facts) with business-oriented names, such as orders or shipments. These facts contain foreign key "pointers" to one or more dimensions that place the fact into a business context, such as the fiscal quarter in which the shipment occurred, or the sales region responsible for the order. The use of business terminology throughout the star or snowflake schema is much more meaningful to the end user than the typical normalized, technology-centric data model.

During the modeling phase of a data integration project, it is important to consider all possible methods of obtaining a data model. Analyzing the cost benefits of build vs. buy may well reveal that it is more economical to buy a pre-built subject area model than to invest the time and money in building your own.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Architect (Primary) Data Quality Developer (Primary) Technical Project Manager (Review Only)

Considerations

Requirements
The question that should be asked before modeling begins is: Are the requirements sufficiently defined in at least one subject area that the data modeling tasks can begin? If the data modeling requires too much guesswork, at best, time will be wasted or, at worst, the Data Architect will design models that fail to support the business requirements. This question is particularly critical for designing logical data warehouse and data mart schemas. The EDW logical model is largely dependent on source system structures.

Conventions for Names and Data Types


Some internal standards need to be set at the beginning of the modeling process to define data types and names. It is extremely important for project team members to adhere to whatever conventions are chosen. If project team members deviate from the chosen conventions, the entire purpose is defeated. Conventions should be chosen for the prefix and suffix names of certain types of fields. For example, numeric surrogate keys in the data warehouse might use either seq or id as a suffix to easily identify the type of field to the developers. (See Naming Conventions for additional information.) Data modeling tools refer to common data types as domains. Domains are also hierarchical. For example, address can be of a string data type. Residential and business addresses are children of address. Establishing these data types at the beginning of the model development process is beneficial for consistency and timeliness in implementing the subsequent physical database design.

Metadata
A logical data model produces a significant amount of metadata and is likely to be a major focal point for metadata during the project. Metadata integration is a major up-front consideration if metadata is to be managed consistently and competently throughout the project. As metadata has to be delivered to numerous applications used in various stages of a data integration project, an integrated approach to metadata management is required. Informatica's Metadata Services products can be used to deliver metadata from other application repositories to the PowerCenter Repository and from PowerCenter to various business intelligence (BI) tools. Logical data models can be delivered to PowerCenter ready for data integration development. Additionally, metadata originating from these models can be delivered to end users through business intelligence tools. Many business intelligence vendors have tools that can access the PowerCenter Repository through the Metadata Services and Metadata Manager architectures.

Maintaining the Data Models


Data models are valuable documentation, both to the project and the business users. They should be stored in a repository in order to take advantage of PowerCenter's integrated metadata approach. Additionally, they should be regularly backed up to file after major changes. Versioning should take place regularly within the repository so that it is possible to roll back several versions of a data model, if necessary.

Once the backbone of a data model is in place, a change control procedure should be implemented to monitor any changes requested and record the implementation of those changes. Adhering to rigorous change control procedures helps to ensure that all impacts of a change are recognized prior to their implementation. To facilitate metadata analysis and to keep your documentation up-to-date, you may want to consider the metadata reporting capabilities in Metadata Manager to provide automatically updated lineage and impact analysis.

TIP To link logical model design to the requirements specifications, use either of these methods:

Option 1: Allocate one of the many entity or attribute description fields that data modeling tools provide to be the link between the elements of the logical design and the requirements documentation. Then, establish (and adhere to) a naming convention for the population of this field to identify the requirements that are met by the presence of a particular entity or attribute.

Option 2: Record the name of the entity and associated attribute in a spreadsheet or database with the requirements that they support.

Both options 1 and 2 allow for metadata integration. Option 1 is generally preferable because the links can be imported into the PowerCenter Repository through Metadata Exchange.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:01


Phase 4: Design
Subtask 4.1.1 Develop Enterprise Data Warehouse Model

Description
If the aim of the data integration project is to produce an Enterprise Data Warehouse (EDW), then the logical EDW model should encompass all of the sources that feed the warehouse. This model will be a slightly de-normalized structure used to replicate source data from operational systems; it should be neither a full star nor snowflake schema, nor a highly normalized copy of the source systems' structures. Some of the source structures are redesigned in the model to migrate non-relational sources to relational structures. In some cases, it may be appropriate to provide limited consolidation where common fields are present in various incoming data sources. In summary, the developed EDW logical model should be the sum of all the parts but should exclude detailed attribute information.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Architect (Primary) Technical Project Manager (Review Only)

Considerations

Analyzing Sources


Designing an Enterprise Data Warehouse (EDW) is particularly difficult because it is an accumulation of multiple sources. The Data Architect needs to identify and replicate all of the relevant source structures in the EDW data model. The PowerCenter Designer client includes the Source Analyzer and Warehouse Designer tools, which can be useful for this task. These tools can be used to analyze sources, convert them into target structures and then expand them into universal tables. Alternatively, dedicated modeling tools can be used. In PowerCenter Designer, incoming non-relational structures can be normalized by use of the Normalizer transformation object. Normalized targets defined using PowerCenter can then be created in a database and reverse-engineered into the data model, if desired.

Universal Tables
Universal tables provide some consolidation and commonality among sources. For example, different systems may use different codes for the gender of a customer. A universal table brings together the fields that cover the same business subject or business rule. Universal tables are also intended to be the sum of all parts. For example, a customer table in one source system may have only standard contact details while a second system may supply fields for mobile phones and email addresses, but not include a field for a fax number. A universal table should hold all of the contact fields from both systems (i.e., standard contact details plus fields for fax, mobile phones and email). Additionally, universal tables should ensure syntactic consistency such that fields from different source tables represent the same data items and possess the same data types.
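The field-union idea behind a universal table can be illustrated with a short sketch. The system names, field names, and values below are invented for the example; in practice the consolidation would be modeled in the EDW schema and populated by PowerCenter mappings.

    # Illustration only: system names, field names, and values are invented.
    CRM_CONTACT = {"customer_id": "C-1001", "phone": "020 7946 0001", "address": "1 High St"}
    BILLING_CONTACT = {"customer_id": "C-1001", "mobile": "07700 900123", "email": "jo@example.com"}

    # The universal table carries the union of the contact fields from both systems.
    UNIVERSAL_FIELDS = ["customer_id", "phone", "fax", "mobile", "email", "address"]

    def to_universal(*source_records):
        """Merge source records into one row with every universal column present."""
        merged = {}
        for record in source_records:
            merged.update(record)
        return {column: merged.get(column) for column in UNIVERSAL_FIELDS}

    print(to_universal(CRM_CONTACT, BILLING_CONTACT))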

Relationship Modeling
Logical modeling tools allow different types of relationships to be identified among various entities and attributes. There are two types of relationships: identifying and non-identifying. An identifying relationship is one in which a child attribute relies on the parent for its full identity. For example, in a bank, an account must have an account type for it to be fully understood. The relationships are reflected in the physical design that a modeling tool produces from the logical design. The tool attempts to enforce identifying relationships through database constraints. Non-identifying relationships are relationships in which the parent object is not required for its identity. A data modeling tool does not enforce non-identifying relationships through constraints when the logical model is used to generate a physical database.


Many-to-many relationships, one-to-one relationships, and many-to-one relationships can all be defined in logical models. The modeling tools hide the underlying complexities and show those objects as part of a physical database design. In addition, the modeling tools automatically create the lookup tables if the tool is used to generate the database schema.

Historical Considerations
Business requirements and refresh schedules should determine the amount and type of history that an EDW should hold. The logical history maintenance architecture should be common to all tables within the EDW. Capturing historical data usually involves taking snapshots of the database on a regular basis and adding the data to the existing content with time stamps. Alternatively, individual updates can be recorded, and the previously current records can be time-period stamped or versioned. It is also necessary to decide how far back the history should go.

Data Quality
Data can be verified for validity and accuracy as it comes into the EDW. The EDW can reasonably be expected to answer such questions as:

Is the post code or currency code valid?
Has a valid date been entered (e.g., the minimum age requirement for a driver's license)?
Does the data conform to standard formatting rules?

Additionally, data values can be evaluated against expected ranges. For example, dates of birth should fall in a reasonable range (not after the current date, and not before 1 Jan 1900). Values can also be validated against reference datasets. As well as using industry-standard references (e.g., ISO currency codes, ISO units of measure), it may be necessary to obtain or generate new reference data to perform all relevant data quality checks. (A small scripted sketch of such checks appears at the end of this section.)

The Data Architect should focus on the common factors in the business requirements as early as possible. Variations and dimensions specific to certain parts of the organization can be dealt with later in the design. More importantly, focusing on the commonalities early in the process also allows other tasks in the project cycle to proceed earlier. A project to develop an integrated solution architecture is likely to encounter such common business dimensions as organizational hierarchy, regional definitions, a number of calendars, and product dimensions, among others.

Subject areas also incorporate metrics. Metrics are measures that businesses use to quantify their performance. Performance measures include productivity, efficiency, client satisfaction, turnover, profit, and gross margin. Common business rules determine the formulae for the calculation of the metrics. The Data Architect may determine at this point that subject areas thought to be common are, in fact, not common across the entire organization. Various departments may use different rules to calculate their profit, commission payments, and customer values. These facts need to be identified and labeled in the logical model according to the part of the organization using the differing methods. There are two reasons for this:

Common semantics enable business users to know whether they are using the same organizational terminology as their colleagues.
Commonality ensures continuity between the measures a business currently takes from an operational system and the new ones that will be available in the data integration solution.

When the combination of dimensions and hierarchies is understood, they can be modeled. The Data Architect can use a star or snowflake structure to denormalize the data structures. Objectives such as trading ease of maintenance and minimal disk space storage against speed and usability determine whether a simple star or snowflake structure is preferable. One or two central tables should hold the facts. Variations in facts can be included in these tables along with common organizational facts. Variations in dimension may require additional dimensional tables.
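The sketch below illustrates the kinds of validity and range checks described at the start of this section. The sample reference set and date bounds are illustrative only; a real implementation would use the project's agreed reference data (for example, the full ISO currency list) or Informatica Data Quality rules.

    # Sketch only: the reference set and date bounds are examples; a real implementation
    # would use the project's agreed reference data or Informatica Data Quality rules.
    from datetime import date

    ISO_CURRENCIES = {"USD", "EUR", "GBP", "JPY"}        # sample of a reference dataset
    DOB_MIN, DOB_MAX = date(1900, 1, 1), date.today()    # "reasonable range" for dates of birth

    def check_row(row):
        """Return a list of data-quality findings for one incoming record."""
        findings = []
        if row.get("currency_code") not in ISO_CURRENCIES:
            findings.append("invalid currency code: %r" % row.get("currency_code"))
        dob = row.get("date_of_birth")
        if dob is None or not (DOB_MIN <= dob <= DOB_MAX):
            findings.append("date of birth out of range: %r" % dob)
        return findings

    print(check_row({"currency_code": "XXX", "date_of_birth": date(1899, 12, 31)}))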

TIP Determining Levels of Aggregation


In the EDW, there may be limited value in holding multiple levels of aggregation. If the data warehouse is feeding dependent data marts, it may be better to aggregate using the PowerCenter server to load the appropriate aggregate data to the data mart. If specific levels are required, they should be modeled in the fact tables at the center of the star schema.

Syndicated Data Sets


Syndicated data sets, such as weather records, should be held in the data warehouse. These external dimensions will then be available as a subset of the data warehouse. It should be assumed that the data set will be updated periodically and that the history will be kept for reference unless the business determines it is not necessary. If the historical data is needed, the syndicated data sets will need to be date-stamped.

Code Lookup Tables


A single code lookup table does not provide the same benefits in a data warehouse as it does on an OLTP system. The function of a single code lookup table is to provide central maintenance of codes and descriptions. This is not a benefit that can be achieved when populating a data warehouse, since data warehouses are potentially loaded from more than one source several times. Having a single database structure is likely to complicate matters in the future. A single code lookup table implies the use of a single surrogate key, and if problems occur in the load, they affect all code lookups - not just one. Separate codes would have to be loaded from their various sources and checked for existing records and updates. A single lookup table simply increases the amount of work mapping developers need to carry out to qualify the parts of the table they are concerned with for a particular mapping.

Individual lookup tables remove the single point of failure for code lookups and improve development time for mappings; however, they also involve more work for the Data Architect. The Data Architect may prefer to show a single object for codes on the diagrams. He/she should, however, ensure that regardless of how the code tables are modeled, they will be physically separable when the physical database implementation takes place.

Surrogate Keys
The use of surrogate keys in most dimensional models presents an additional obstacle that must be overcome in the solution design. It is important to determine a strategy to create, distribute, and maintain these keys as you plan your design. Any of the following strategies may be appropriate:

Informatica Generated Keys. The Sequence Generator transformation allows the creation of surrogate keys natively in Informatica mappings. There are options for reusability, setting key ranges, and continuous numbering between loads. The limitation of this strategy is that it cannot generate a number higher than 2^32; however, two billion is generally big enough for most dimensions.

External Table Based. PowerCenter can access an external code table during loads, using the Lookup transformation to obtain surrogate keys.

External Code Generated. Informatica can access a stored procedure or external .dll that contains a programmatic solution to generate surrogate keys. This is done using either the Stored Procedure transformation or the External Procedure transformation.

Triggers/Database Sequence. Create a trigger on the target table, calling it from either the Source Qualifier transformation or the Stored Procedure transformation, to perform the insert into the key field.
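The external-table-based strategy can be sketched in a few lines of Python. The key map and natural keys below are hypothetical, and in PowerCenter this would normally be a Lookup transformation combined with a Sequence Generator rather than custom code; the sketch only shows the logic being relied on.

import itertools

key_map = {}                    # natural key -> surrogate key (stands in for the code table)
next_key = itertools.count(1)   # stands in for a sequence generator

def surrogate_key(natural_key):
    """Return the existing surrogate key, or assign the next one for a new natural key."""
    if natural_key not in key_map:
        key_map[natural_key] = next(next_key)
    return key_map[natural_key]

for nk in ["CUST-001", "CUST-002", "CUST-001"]:
    print(nk, "->", surrogate_key(nk))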

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:03


Phase 4: Design
Subtask 4.1.2 Develop Data Mart Model(s) Description
The data mart's logical data model supports the final step in the integrated enterprise decision support architecture. These models should be easily identified with their source in the data warehouse and will provide the foundation for the physical design. In most modeling tools, the logical model can be used to automatically resolve and generate some of the physical design, such as the lookups used to resolve many-to-many relationships. If the data integration project was initiated for the right reasons, the aim of the data mart is to solve a specific business issue for its business sponsors. As a subset of the data warehouse, one data mart may focus on business customers while another may focus on residential services. The logical design must incorporate transformations supplying appropriate metrics and levels of aggregation for the business users. The metrics and aggregations must incorporate the dimensions that the data mart business users can use to study their metrics. The structure of the dimensions must be sufficiently simple to enable those users to quickly produce their own reports, if desired.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Architect (Primary) Technical Project Manager (Review Only)

Considerations
The subject area of the data mart should be the first consideration because it determines the facts that must be drawn from the Enterprise Data Warehouse into the business-oriented data mart. The data mart will then have dimensions that the business wants to model the facts against. The data mart may also drive an application. If so, the application has certain requirements that must also be considered. If any additional metrics are required, they should be placed in the data warehouse, but the need should not arise if sufficient analysis was completed in earlier development steps.

TIP Keep it Simple! If, as is generally the case, the data mart is going to be used primarily as a presentation layer by business users extracting data for analytic purposes, the mart should use as simple a design as possible.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:46


Phase 4: Design
Task 4.2 Analyze Data Sources Description
The goal of this task is to understand the various data sources that will be feeding the solution. Completing this task successfully increases the understanding needed to efficiently map data using PowerCenter. It is important to understand all of the data elements from a business perspective, including the data values and dependencies on other data elements. It is also important to understand where the data comes from, how the data is related, and how much data there is to deal with (i.e., volume estimates).

Prerequisites
None

Roles

Application Specialist (Primary) Business Analyst (Primary) Data Architect (Primary) Data Integration Developer (Primary) Database Administrator (DBA) (Secondary) System Administrator (Primary) Technical Project Manager (Review Only)

Considerations
None

Best Practices
Using Data Explorer for Data Discovery and Analysis

Sample Deliverables
None

Last updated: 01-Feb-07 18:46


Phase 4: Design
Subtask 4.2.1 Creating a Source to Target Data Store Matrix Description
The Source to Target Data Store Matrix is a consolidated chart that describes all of the known linkages from source data stores to target data stores. A data store can be a table, a flat file, or any other data end point. The matrix is used to understand when a single source store may be required to populate multiple targets. It assists in reducing the complexity of a system by identifying common components that can be reused across an implementation.

Prerequisites
None

Roles

Data Architect (Primary) Database Administrator (DBA) (Secondary) Technical Project Manager (Review Only)

Considerations
In some integration efforts the Source to Target Data Store Matrix may be straightforward enough that it is not needed (e.g., Source Customer table to EDW Customer Dimension). In such cases, it may not be worthwhile to create a matrix. However, in situations where the Source Customer table is used to populate a Master Customer table, an Active Customer table, a Customer Address table, and a Customer Contact table, the matrix can be a useful tool for understanding and managing these relationships. When the detailed Mapping and Shared Object inventory is created, the matrix can be used to look for ways to reduce the number of data integration processes by developing re-use. The matrix is typically created in an Excel document with the targets listed in columns and the sources listed in rows; check marks are entered where intersections occur. The matrix may need to be updated frequently throughout the project as new requirements and data sourcing needs are discovered.
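For teams that prefer to generate the matrix rather than maintain it entirely by hand, the sketch below builds a simple check-mark grid from a list of source-to-target links. The store names are hypothetical, and a spreadsheet remains a perfectly good medium for the same information.

# Hypothetical source-to-target links discovered during analysis.
links = [
    ("SRC.CUSTOMER", "EDW.CUSTOMER_MASTER"),
    ("SRC.CUSTOMER", "EDW.CUSTOMER_ADDRESS"),
    ("SRC.ORDERS",   "EDW.SALES_FACT"),
]

sources = sorted({s for s, _ in links})
targets = sorted({t for _, t in links})

# Sources in rows, targets in columns, with a check mark at each intersection.
col_width = max(len(t) for t in targets) + 2
print(" " * 16 + "".join(t.ljust(col_width) for t in targets))
for s in sources:
    cells = "".join(("X" if (s, t) in links else "").ljust(col_width) for t in targets)
    print(s.ljust(16) + cells)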

Best Practices
None

Sample Deliverables
Source To Target Data Store Matrix

Last updated: 01-Nov-10 23:57


Phase 4: Design
Subtask 4.2.2 Develop Source to Target Relationships Description
Determining the relationship between the sources and target field attributes is important to identify any rework or target redesign that may be required if specific data elements are not available. This step defines the relationships between the data elements and clearly illuminates possible data issues, such as incompatible data types or unavailable data elements. The Source To Target Data Store Matrix can be used as an input to understand the possible multiple data sources that are required to populate the target. From there, field level information is reviewed and a more detailed matrix is developed to understand and document the nuances of the fields involved.

Prerequisites
4.2.1 Creating a Source to Target Data Store Matrix

Roles

Application Specialist (Secondary) Business Analyst (Primary) Data Architect (Primary) Data Integration Developer (Primary) Technical Project Manager (Review Only)

Considerations
Creating the relationships between the sources and targets is a critical task in the design process. It is important to map all of the data elements from the source data to an appropriate counterpart in the target schema. Taking the necessary care in this effort should result in the following:

Identification of any data elements in the target schema that are not currently available from the source. This first step determines what data is not currently available from the source. When the source data is not available, the Data Architect may need to re-evaluate and redesign the target schema or determine where the necessary data can be acquired.

Identification of any data elements that can be removed from source records because they are not needed in the target. This step eliminates any data elements that are not required in the target. In many cases, unnecessary data is moved through the extraction process. Regardless of whether the data is coming from flat files or relational sources, it is best to eliminate as much unnecessary data as possible, as early in the process as possible.

Determination of the data flow required for moving the data from the source to the target. This can serve as a preliminary design specification for work to be performed during the Build Phase. Any data modifications or translations should be noted during this determination process as the source-to-target relationships are established.

Determination of the quality of the data in the source. This ensures that data in the target is of high quality and serves its purpose. All source data should be analyzed in a data quality application to assess its current data quality levels. During the Design Phase, data quality processes can be introduced to fix identified issues and/or enrich data using reference information. Data quality should also be incorporated as an ongoing process to be leveraged by the target data source.

The next step in this subtask produces a Target-Source Element Matrix, which provides a framework for matching the business requirements to the essential data elements and defining how the source and target elements are paired. The matrix lists each of the target tables from the data mart in the rows and descriptions of the source systems in the columns, capturing the following for each source:

Operational (transactional) system in the organization
Operational data store
External data provider
Operating system
DBMS
Data fields
Data descriptions
Data profiling/analysis results
Data quality operations, where applicable

One objective of the data integration solution is to provide an integrated view of key business data; therefore, for each target table one or more source systems must exist. The matrix should show all of the possible sources for this particular initiative. After the matrix is completed, the data elements must be checked for correctness and validated with both the Business Analyst(s) and the user community. The Project Manager is responsible for ensuring that these parties agree that the data relationships defined in the Target-Source Element Matrix are correct and meet the needs of the data integration solution. Prior to any mapping development work, the Project Manager should obtain sign-off from the Business Analysts and the user community.
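To make the field-level pairing concrete, the sketch below records a couple of hypothetical entries of the kind the Target-Source Element Matrix captures; the field names, types, and rules are illustrative and not taken from any real system. Flagging pairings that imply a data type change is a typical review item before sign-off.

# Hypothetical field-level pairings for a Target-Source Element Matrix.
field_mappings = [
    {"target": "CUSTOMER_DIM.BIRTH_DATE",  "source": "CRM.CUSTOMER.DOB",
     "source_type": "VARCHAR(10)", "target_type": "DATE",
     "rule": "convert MM/DD/YYYY string to DATE; reject dates before 1900-01-01"},
    {"target": "CUSTOMER_DIM.COUNTRY_CODE", "source": "CRM.CUSTOMER.COUNTRY",
     "source_type": "VARCHAR(40)", "target_type": "CHAR(2)",
     "rule": "standardize free-text country name to a 2-character code"},
]

# Flag pairings that imply a data type change.
for m in field_mappings:
    type_change = m["source_type"].split("(")[0] != m["target_type"].split("(")[0]
    print(f"{m['source']} -> {m['target']}; type change: {type_change}; rule: {m['rule']}")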

Undefined Data
In some cases the Data Architect cannot locate or access the data required to establish a rule defined by the Business Analyst. When this occurs, the Business Analyst may need to revalidate the particular rule or requirement to ensure that it meets the end-users' needs. If it does not, the Business Analyst and Data Architect must determine if there is another way to use the available data elements to enforce the rule. Enlisting the services of the System Administrator or another knowledgeable source system resource may be helpful. If no solution is found, or if the data meets requirements but is not available, the Project Manager should communicate with the end-user community and propose an alternative business rule. Choosing to eliminate data too early in the process due to inaccessibility, however, may cause problems further down the road.

The Project Manager should meet with the Business Analyst and the Data Architect to determine which rules or requirements can be changed and which must remain as originally defined. The Data Architect can propose data elements that can be safely dropped or changed without compromising the integrity of the user requirements. The Project Manager must then identify any risks inherent in eliminating or changing the data elements and decide which are acceptable to the project. Some of the potential risks involved in eliminating or changing data elements are:

Losing a critical piece of data required for a business rule that was not originally defined but is likely to be needed in the future. Such data loss may require a substantial amount of rework and can potentially affect project timelines.

Any change in data that needs to be incorporated in the Source or Target data models requires substantial time to rework and could significantly delay development. Such a change would also push back all defined tasks and require a change in the Project Plan.

Changes in the Source system model may drop secondary relationships that were not initially visible.

Source Changes after Initial Assessment


When a source changes after the initial assessment, the corresponding Target-Source Element Matrix must also change. The Data Architect needs to outline everything that has changed, including the data types, names, and definitions. Then, the various risks involved in changing or eliminating data elements must be re-evaluated. The Data Architect should also decide which risks are acceptable. Once again, the System Administrator may provide useful information about the reasons for any changes to the source system and their effect on data relationships.

Best Practices
None

Sample Deliverables
Target-Source Element Matrix


Last updated: 01-Nov-10 23:55


Phase 4: Design
Subtask 4.2.3 Determine Source Availability Description
The final step in the 4.2 Analyze Data Sources task is to determine when all source systems are likely to be available for data extraction. This is necessary in order to determine realistic start and end times for the load window. The developers need to work closely with the source system administrators during this step because the administrators can provide specific information about the hours of operations for their systems. The final deliverable in this subtask, the Source Availability Matrix, lists all the sources that are being used for data extraction and specifies the systems' downtimes during a 24-hour period. This matrix should contain details of the availability of the systems on different days of the week, including weekends and holidays.

Prerequisites
None

Roles

Data Integration Developer (Secondary) Database Administrator (DBA) (Secondary) System Administrator (Primary) Technical Project Manager (Review Only)

Considerations
The information generated in this step is crucial later in the development process for determining load windows and the availability of source data. In many multi-national companies, source systems are distributed globally and, therefore, may not be available for extraction concurrently. This can pose problems when trying to extract data with minimal (or no) disruption of users' day-to-day activities. Determining source availability goes a long way toward establishing when the load window for a regularly scheduled extraction can run. This information is also helpful for determining whether an Operational Data Store (ODS) is needed. Sometimes, the extraction times are so varied among the necessary source systems that an ODS or staging area is required purely for logistical reasons.
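The availability information can be reduced to a simple structure to test whether a common extraction window even exists. In the sketch below the sources and hours are hypothetical and expressed in a single reference time zone; an empty intersection is exactly the situation where an ODS or staging area becomes attractive.

# Hours (0-23) on a typical weekday when each hypothetical source may be extracted.
availability = {
    "EU_ERP":   set(range(20, 24)) | set(range(0, 5)),
    "US_CRM":   set(range(1, 7)),
    "APAC_POS": set(range(14, 20)) | set(range(0, 3)),
}

common_hours = set.intersection(*availability.values())
if common_hours:
    print("hours when all sources can be extracted together:", sorted(common_hours))
else:
    print("no common window - consider staggered extracts into an ODS or staging area")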

Best Practices
None

Sample Deliverables
None

Last updated: 27-Oct-10 20:02


Phase 4: Design
Task 4.3 Design Physical Database Description
The physical database design is derived from the logical models created in Task 4.1. Where the logical design details the relationships between logical entities in the system, the physical design considers the following physical aspects of the database:

How the tables are arranged, stored (i.e., on which devices), partitioned, and indexed
The detailed attributes of all database columns
The likely growth over the life of the database
How each schema will be created and archived
Hardware availability and configuration (e.g., availability of disk storage space, number of devices, and physical location of storage)

The physical design must reflect the end-user reporting requirements, organizing the data entities to allow a fast response to the expected business queries. Physical target schemas typically range from fully normalized (essentially OLTP structures) to snowflake and star schemas, and may contain both detail and aggregate information. The relevant end-user reporting tools, and the underlying RDBMS, may dictate following a particular database structure (e.g., multi-dimensional tools may arrange the data into data "cubes").

Prerequisites
None

Roles

Business Analyst (Secondary) Data Architect (Primary) Database Administrator (DBA) (Primary) System Administrator (Review Only) Technical Project Manager (Review Only)

Considerations
Although many factors influence the physical design of the data marts, end-user reporting needs are the primary driver. These needs determine the likely selection criteria, filters, selection sets, and measures that will be used for reporting. These elements may, in turn, suggest indexing or partitioning policies (i.e., to support the most frequent cross-references between data objects or tables and identify the most common table joins) and appropriate access rights, as well as indicate which elements are likely to grow or change most quickly. Long-term strategies regarding growth of a data warehouse, enhancements to its usability and functionality, or additional data marts may all point toward specific design decisions to support future load and/or reporting requirements.

In all cases, the physical database design is tempered by system-imposed limits such as the available disk sizes and numbers; the functionality of the operating system or RDBMS; the human resources available for design and creation of procedures, scripts, and DBA duties; and the volume, frequency, and speed of delivery of source data. These factors all help to determine the best-fit physical structure for the specific project.

A final consideration is how to implement the schema. Database design tools may generate and execute the necessary processes to create the physical tables, and the PowerCenter Metadata Exchange can interact with many common tools to pull target table definitions into the repository. However, automated scripts may still be necessary for dropping, truncating, and creating tables.

For Data Migration, the tables that are designed and created are normally either stage tables or reference tables. These tables are generated to simplify the migration process. The table definitions for the target application are almost always provided to the data migration team. These are typically delivered with a packaged application or already exist for the broader project implementation.

Best Practices
None

Sample Deliverables
Physical Data Model Review Agenda

Last updated: 01-Feb-07 18:46


Phase 4: Design
Subtask 4.3.1 Develop Physical Database Design Description
As with all design tasks, there are both enterprise and workgroup considerations in developing the physical database design. Optimally, the final design should balance the following factors:

Ease of end-user reporting from the target
Ensuring the maximum throughput and potential for parallel processing
Effective use of available system resources, disk space and devices
Minimizing DBA and systems administration overhead
Effective use of existing tools and procedures

Physical designs are required for target data marts, as well as any ODS/DDS schemas or other staging tables. The relevant end-user reporting tools, and the underlying RDBMS, may dictate following a particular database structure (e.g., multi-dimensional tools may arrange the data into data "cubes").

Prerequisites
None

Roles

Data Architect (Primary) Database Administrator (DBA) (Primary) System Administrator (Review Only) Technical Project Manager (Review Only)

Considerations
This task involves a number of major activities:

Configuring the RDBMS, which involves determining what database systems are available and identifying their strengths and weaknesses
Resolving hardware issues such as the size, location, and number of storage devices, networking links, and required interfaces
Determining distribution and accessibility requirements, such as 24x7 access and local or global access
Determining if existing tools are sufficient or, if not, selecting new ones
Determining back-up, recovery, and maintenance requirements (i.e., will the physical database design exceed the capabilities of the existing systems or make upgrades difficult?)

The logical target data models provide the basic structure of the physical design. The physical design provides a structure that enables the source data to be quickly extracted and loaded in the transformation process, and allows a fast response to end-user queries. Physical target schemas typically range from:

Fully normalized (essentially OLTP structures)
Denormalized relational structures (e.g., as above but with certain entities split or merged to simplify loading into them, or extracting from them to feed other databases)
Classic snowflake and star schemas, ordered as fact and dimension tables in standard RDBMS systems, optimized for end-user reporting
Aggregate versions of the above
Proprietary multi-dimensional structures, allowing very fast (but potentially less flexible and detailed) queries

The design must also reflect the end-user reporting requirements, organizing the data entities to provide answers to the expected business queries.

Preferred Strategy
A typical multi-tier strategy uses a mixture of physical structures:

Operational Data Store (ODS) design. This is usually closely related to the individual sources and is, therefore, relationally organized (like the source OLTP), or simply relational copies of source flat files. It is optimized for fast loading (to keep the connection to the source system as short as possible) with few or no indexes or constraints.

Data Warehouse design. Tied to subject areas, this may be based on a star schema (i.e., where significant end-user reporting may occur), or a more normalized relational structure (where the data warehouse acts purely as a feeder to several dependent data marts) to speed up extracts to the subsequent data marts.

Data Mart design. The usual source for complex business queries, this typically uses a star or snowflake schema, optimized for set-based reporting and cross-referenced against many, varied combinations of dimensional attributes. It may use multi-dimensional structures if a specific set of end-user reporting requirements can be identified.

TIP
The tiers of a multi-tier strategy each have a specific purpose, which strongly suggests the likely physical structure:

ODS - Staging from source should be designed to quickly move data from the operational system. The ODS structure should be very similar to the source, since no transformations are performed, and should have few indexes or constraints (which slow down loading).

The Data Warehouse design should be biased toward feeding subsequent data marts, and should be indexed to allow rapid feeds to the marts, along with a relational structure. At the same time, since the data warehouse functions as the enterprise-wide central point of reference, physical partitioning of larger tables allows it to be quickly loaded via parallel processes. Because data volumes are high, the data warehouse and ODS structures should be as physically close as possible to avoid network traffic.

Data Marts should be strongly biased toward reporting, most likely as star schemas or multi-dimensional cubes. The volumes will be smaller than the parent data warehouse, so the impact of indexes on loading is not as significant.

RDBMS Configuration
The physical database design is tempered by the functionality of the operating system and RDBMS. In an ideal world, all RDBMS systems might provide the same set of functions, level of configuration, and scalability. This is not the case, however; different vendors include different features in their systems, and new features are included with each new release. This may affect:

Physical partitioning. This is not available with all systems. A lack of physical partitioning may affect performance when loading data into growing tables. When it is available, partitioning allows faster parallel loading to a single table, as well as greater flexibility in table reorganizations, backup, and recovery.

Physical device management. Ideally, using many physical devices to store individual targets or partitions can speed loading, because several tables on a single device must use the same read-write heads when being updated in parallel. Of course, using multiple, separate devices may result in added administrative overhead and/or work for the DBA (i.e., to define additional pointers and create more complex backup instructions).

Limits to individual tables. Older systems may not allow tables to physically grow past a certain size. This may require amending an initial physical design to split up larger tables.

TIP
Using multiple physical devices to store whole tables allows faster parallel updates to them. If target tables are physically partitioned, the separate partitions can be stored on separate physical devices, allowing a further order of parallel loading. The downside is that extra initial and ongoing DBA and systems administration overhead is required to fully manage the partitions, although much of this can be automated using external scripts.

Tools
The relevant end-user reporting tools may dictate following a particular database structure, at least for the data mart and data warehouse designs. Although many popular business intelligence tools (e.g. Business Objects, MicroStrategy and others) can access a wide range of relational and denormalized structures, each generally works best with a particular type (e.g., long/thin vs. short/fat star schema designs). Multi-dimensional (MOLAP) tools often require specific (i.e., proprietary) structures to be used. These tools arrange the data logically into data "cubes", but physically use complex, proprietary systems for storage, indexing, and organization. Database design tools (ErWin, Designer 2000, PowerDesigner) may generate and execute the necessary processes to create the physical tables, but are also subject to their own features and functions.

Hardware Issues
Physical designs should be able to be implemented on the existing system (which can help to identify weaknesses in the physical infrastructure). The areas to consider are:

The size of storage available
The number of physical devices
The physical location of those devices (e.g., on the same box, on a closely connected box, via a fast network, via telephone lines)
The existing network connections, loading, and peaks in demand

Distribution and Accessibility


For a large system, the likely demands on the data mart should affect the physical design. Factors to consider include:

Will end-users require continuous (i.e., 24x7) access, or will a batch window be available to load new data? Each involves some issues: continuous access may require complex partitioning schemes and/or holding multiple copies of the data, while a batch window would allow indexes/constraints to be dropped before loading, resulting in significantly decreased load times.

Will different users require access to the same data, but in different forms (e.g., different levels of aggregation, or different sub-sets of the data)?

Will all end-users access the same physical data, or local copies of it (which need to be distributed in some way)? This issue affects the potential size of any data mart.

TIP
If the end-users require 24x7 access, and incoming volumes of source data are very large, it is possible with later releases of major RDBMS tools to load table-space and index partitions entirely separately, only swapping them into the reporting target at the end. This is not true for all databases, however, and, if available, needs to be incorporated into the actual load mechanisms.

Back Up, Recovery And Maintenance


Finally, since unanticipated downtime is likely to affect an organization's ability to plan, forecast, or even operate effectively, the physical structures must be designed with an eye on any existing limits to the general data management processes. Because the physical designs lead to real volumes of data, it is important to determine:

Will the designs fit into existing back-up processes?
Will they execute within the available timeframes and limits?
Will recovery processes allow end-users to quickly regain access to their reporting system?
Will the structures be easy to maintain (i.e., to change, reorganize, rebuild, or upgrade)?

TIP
Indexing frequently-used selection fields/columns can substantially speed up the response for end-user reporting, because the database engine optimizes its search pattern, rather than simply scanning all rows of the table, when appropriately indexed fields are used in a request. The more indexes that exist on the target, however, the slower the data loading into the target, since maintaining the indexes becomes an additional load on the database engine. Where an appropriate batch window is available for performing the data load, the indexes can be dropped before loading and then re-generated after the load. If no window is available, the strategy should be one of balancing the load and reporting needs by careful selection of which fields to index.

For Data Migration projects, it is rare that any tables will be designed for the source or target application. If tables are needed, they will most likely be staging tables or tables used to assist in transformation. It is common for the staging tables to mirror either the source system or the target system. It is encouraged to create two levels of staging, where a Legacy Stage mirrors the source system and a Pre-Load Stage mirrors the target system. Developers often take advantage of PowerCenter's table generation functionality in Designer for this purpose: to quickly generate the needed tables and subsequently reverse engineer the table definitions with a modeling tool after the fact.

For Data Archiving projects, the key exercise is determining which production tables are the archive candidates, given that they house the largest amount of historical transactions. Once those tables are identified and selected, a separate schema is designed that identically mirrors the production system schema. These archive tables will be made available to the original application via a seamless access layer so that when the archived data is required, it can be joined to the production data. It is important to document the location and purpose of these tables in the architecture and metadata documents, as they can be confused with the original production tables/schema.
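The drop-and-rebuild pattern described in the indexing TIP above can be automated in the load job. The Python sketch below only prints the statements it would run: the table, index names, and execute() stand-in are hypothetical, and the exact DDL varies by RDBMS.

# Hypothetical index inventory for a target table.
INDEXES = {
    "SALES_FACT": ["IX_SALES_FACT_DATE", "IX_SALES_FACT_PRODUCT"],
}

def execute(sql):
    # Stand-in for whatever database access layer the project actually uses.
    print("would run:", sql)

def load_with_index_rebuild(table, load_fn):
    """Drop indexes, run the bulk load, then recreate the indexes."""
    for ix in INDEXES.get(table, []):
        execute(f"DROP INDEX {ix}")                       # syntax varies by RDBMS
    load_fn(table)                                        # load runs without index maintenance
    for ix in INDEXES.get(table, []):
        execute(f"CREATE INDEX {ix} ON {table} (...)")    # placeholder column list

load_with_index_rebuild("SALES_FACT", lambda t: print("loading", t))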

Best Practices
None

Sample Deliverables
None

Last updated: 16-Oct-09 16:27


Phase 4: Design
Task 4.4 Design Presentation Layer Description
The objective of this task is to design a presentation layer for the end-user community. The developers will use the design that results from this task and its associated subtasks in the Build Phase to build the presentation layer (5.8 Build Presentation Layer). This task includes activities to develop a prototype, demonstrate it to users and get their feedback, and document the overall presentation layer. The purpose of any presentation layer is to provide an application that can transform operational data into relevant business information. An analytic solution helps end users formulate and support business decisions by providing this information in the form of context, summarization, and focus.

Note: Readers are reminded that this guide is intentionally analysis-neutral. This section describes some general considerations and deliverables for determining how to deliver information to the end user. This step may actually take place earlier in this phase, or occur in parallel with the data integration tasks.

The presentation layer application should be capable of handling a variety of analytical approaches, including the following:

Ad hoc reporting. Used in situations where users need extensive direct, interactive exploration of the data. The tool should enable users to formulate their own queries by directly manipulating relational tables and complex joins. Such tools must support: query formulation that includes multipass SQL, highlighting (alerts), semi-additive summations, and direct SQL entry; analysis and presentation capabilities like complex formatting, pivoting, charting and graphs, and user-changeable variables; and strong technical features such as thin-client web access with ease of use, metadata access, picklists, and seamless integration with other applications. This approach is suitable when users want to answer questions such as, "What were Product X revenues in the past quarter?"

Online Analytical Processing (OLAP). Arguably the most common approach and the one most often associated with analytic solution architectures. There are several types of OLAP (e.g., MOLAP, ROLAP, HOLAP, and DOLAP are all variants), each with its own characteristics. The tool selection process should highlight these distinguishing characteristics in the event that OLAP is deemed the appropriate approach for the organization. OLAP technologies provide multidimensional access to business information, allowing users to drill down, drill through, and drill across data. OLAP access is more discovery-oriented than ad hoc reporting.

Dashboard reporting (Push-Button). Dashboard reporting from the data warehouse effectively replaced the concept of EIS (executive information systems), largely because EIS could not contain sufficient data for true analysis. Nevertheless, the need for an executive-style front-end still exists, and dashboard reporting (sometimes referred to as Push-Button access) largely fills the need. Dashboard reporting emphasizes the summarization and presentation of information to the end user in a user-friendly and extremely graphical interface. Graphical presentation of the information attempts to highlight business trends or exceptional conditions.

Data Mining. An artificial intelligence-based technology that integrates large databases and proposes possible patterns or trends in the data. A commonly cited example is the telecommunications company that uses data mining to highlight potential fraud by comparing activity to the customer's previous calling patterns. The key distinction is data mining's ability to deliver trend analysis without specific requests by the end users.

Prerequisites
None


Roles

Business Analyst (Primary) Presentation Layer Developer (Primary)

Considerations
The presentation layer tool must:

Comply with established standards across the organization
Be compatible with the current and future technology infrastructures

The analysis tool does not necessarily have to be "one size fits all." Meeting the requirements of all end users may require mixing different approaches to end-user analysis. For example, if most users are likely to be satisfied with an OLAP tool while a group focusing on fraud detection requires data mining capabilities, the end-user analysis solution should include several tools, each satisfying the needs of the various user groups. The needs of the various users should be determined by the user requirements defined in 2.2 Define Business Requirements.

Best Practices
None

Sample Deliverables
Information Requirements Specification

Last updated: 01-Feb-07 18:46


Phase 4: Design
Subtask 4.4.1 Design Presentation Layer Prototype Description
The purpose of this subtask is to develop a prototype of the end-user presentation layer "application" for review by the business community (or its representatives). The result of this subtask is a working prototype for end-user review and investigation. PowerCenter can deliver a rough cut of the data to the target schema; then, Data Analyzer (or other business intelligence tools) can build reports relatively quickly, thereby allowing the end-user capability to evolve through multiple iterations of the design.

Prerequisites
None

Roles

Business Analyst (Primary) Presentation Layer Developer (Primary)

Considerations
It is important to use actual source data in the prototype. The closer the prototype is to what the end user will actually see upon final release, the more relevant the feedback. In this way, end users can see an initial interpretation of their needs and validate or expand upon certain requirements. Also consider the benefits of baselining the user requirements through a sign-off process. This makes it easier for the development team to focus on deliverables. A formal change control request process complements this approach. Baselining user requirements also allows accurate tracking of progress against the project plan and provides transparency to changes in the user requirements. This approach helps to ensure that the project plan remains close to schedule.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:46


Phase 4: Design
Subtask 4.4.2 Present Prototype to Business Analysts Description
The purpose of this subtask is to present the presentation layer prototype to business analysts and the end users. The result of this subtask will be a deliverable, the Prototype Feedback document, containing detailed results from the prototype presentation meeting or meetings. The Prototype Feedback document should contain such administrative information as date and time of the meeting, a list of participants, and a summary of what was presented. The bulk of the document should contain a list of participants' approval or rejection of various aspects of the prototype. The feedback should cover such issues as pre-defined reports, presentation of data, review of formulas for any derived attributes, dimensional hierarchies, and so forth. The prototype demonstration should focus on the capabilities of the end user analysis tool and highlight the differences between typical reporting environments and decision support architectures. This subtask also serves to educate the users about the capabilities of their new analysis tool. A thorough understanding of what the tool can provide enables the end users to refine their requirements to maximize the benefit of the new tool. Technologies such as OLAP, EIS and Data Mining often bring a new data analysis capability and approach to end users. In an ad hoc reporting paradigm, end users must precisely specify their queries. Multidimensional analysis allows for much more discovery and research, which follows a different paradigm. A prototype that uses familiar data to demonstrate these abilities helps to launch the education process while also improving the design. The demonstration of the prototype is also an opportunity to further refine the business requirements discovered in the requirements gathering subtask. The end users themselves can offer feedback and ensure that the method of data presentation and the actual data itself are correct.

Prerequisites
4.4.1 Design Presentation Layer Prototype

Roles

Business Analyst (Primary) Presentation Layer Developer (Primary)

Considerations
The Data Integration Developer needs to be an active participant in this subtask to ensure that the presentation layer is developed with a full understanding of the needs of the end users. Using actual source data in the development of the prototype gives the Data Integration Developer a knowledge base of what data is or is not available in the source systems and in what format that data is stored. Having all parties participate in this activity facilitates the process of working through any data issues that may be identified. As with the tool selection process, it is important here to assemble a group that represents the spectrum of end users across the organization, from business analysts to high-level managers. A cross-section of end users at various levels ensures an accurate representation of needs across the organization. Different job functions require different information and may also require various data access methods (i.e., ad hoc, OLAP, EIS, data mining). For example, information that is important to business analysts, such as metadata, may not be important to a high-level manager, and vice versa. The demonstration of the presentation layer tool prototype should not be a one-time activity; instead, it should be conducted at several points throughout design and development to facilitate and elicit end-user feedback. Involving the end users is vital to getting "buy-in" and ensuring that the system will meet their requirements. User involvement also helps build support for the presentation layer tool throughout the organization.

Best Practices
None

Sample Deliverables
Prototype Feedback

Last updated: 01-Feb-07 18:46


Phase 4: Design
Subtask 4.4.3 Develop Presentation Layout Design Description
The goal of any data integration, warehousing, or business intelligence project is to collect and transform data into meaningful information for use by the decision makers of a business. The next step, after prototyping the presentation layer and gaining approval from the Business Analysts, is to improve and finalize its design for use by the end users. A well-designed interface effectively communicates this information to the end user. If an interface is not designed intuitively, however, the end users may not be able to successfully leverage the information to their benefit. The principles are the same regardless of the type of application (e.g., customer relationship management reporting or a metadata reporting solution).

Prerequisites
4.4.1 Design Presentation Layer Prototype

Roles

Business Analyst (Secondary) Presentation Layer Developer (Primary)

Considerations

Types of Layouts


Each piece of information presented to the end user has its own level of importance. The significance and required level of detail of the information to be delivered determines whether to present it on a dashboard or in a report. For example, information that needs to be concise and answers a question such as "Has this measurement fallen below the critical threshold number?" qualifies as an Indicator on a dashboard. More critical information in this category, which needs to reach the end user without waiting for the user to log onto the system, should be implemented as an Alert. However, most information delivery requirements call for detailed reports, such as sales data for all regions or revenue by product category.

Dashboards
Data Analyzer dashboards contain all the critical information users need in one single interface. Data can be provided via Alerts, Indicators, or links to Favorite Reports and Shared Documents. Data Analyzer facilitates the design of an appealing presentation layout by providing predefined dashboard layouts. A clear understanding of what needs to be displayed, as well as how many different types of indicators and alerts are going to be put on the dashboard, is important in selecting an appropriate dashboard layout. Generally, each subset of data should be placed in a separate container. Detailed reports can be placed as links on the dashboards so that users can easily navigate to them.

Report Table Layout


Each report that you are going to build should have design features suited to the data to be displayed, to ensure that the report communicates its message effectively. To achieve this, be sure to understand the type of data that each report is going to display before choosing a report table layout. For example, a tabular layout would be appropriate for a sales revenue report that shows dollar amounts against only one dimension (e.g., product category), but a sectional layout would be more appropriate if the end users are interested in seeing the dollar amounts for each category of product in each district, one at a time. When developing either a dashboard or a report, be sure to consider the following points:

Who is your audience? You have to know who the intended recipient of the information is. The audience's requirements and preferences should drive your presentation style. Often there will be multiple audiences for the information you have to share. On many occasions, you will find that the same information will best serve its purpose if presented in two different styles to two different users. For example, you may have to create multiple dashboards in a single project and personalize each dashboard to a specific group of end users' needs.

What type of information do the users need and what are their expectations? Always remember that the users are looking for very specific pieces of information in the presentation layout. Most of the time, business users are not highly technically skilled personnel. They do not always have the time or required skills to navigate to various places and search for the specific metric or value that matters to them. Try to place yourself in the users' shoes and ask yourself questions such as what would be the most helpful way to display the information, or what the possible uses of the information might be. Additionally, the users' expectations will affect the way your information is presented to them. Some users may be interested in more indicators and charts, while others may want to see detailed reports. The more thoroughly you understand the users' expectations, the better you can design the presentation layout.

Why do they need it? Understanding this can help you to choose the right layout for each piece of information that you have to present. If users want granular information, they are likely to want a detailed report. However, if they just need quick glimpses of the data, indicators on a dashboard or emailed alerts are likely to be more appropriate.

When does the data need to be displayed? It is critical to know when important business processes occur. This can help drive the development and scheduling of reports (daily, weekly, monthly, etc.). It can also help to determine what type of indicators to develop, such as monthly or daily sales.

How should the data be displayed? A well-designed chart, graph, or indicator can convey critical information to the concerned users quickly and accurately. It is important to choose the right colors and backgrounds to catch the users' attention where it is needed most. A good example of this would be using a bright red color for all alerts, green for all good values, and so on.

Tip It is also important to determine if there are any enterprise standards set for the layout designs of the reports and dashboards, especially the color codes as given in the example above.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:10


Phase 4: Design
Subtask 4.4.4 Design ILM Seamless Access Description
The seamless access layer consists of one or two schemas in the Production ERP system or on the Online Archive system, depending upon the application. There is a documented solution for seamless access on Oracle E-Business Suite, PeopleSoft, Siebel, and Deltek Costpoint. The objects created for seamless access are the same regardless of the application, and that includes custom applications. They are created by running a built-in job and, with the exception of tuning, no manual intervention is required to create the objects. The design of the seamless access layer is specific to the application, and those specifics are documented separately for each of the supported packaged applications. For custom applications, the seamless access objects are created and viewing the data is a matter of querying the correct schema. For packaged applications, the data can be queried directly, but there is also a solution for accessing the data through the user interface of that specific application.

For PeopleSoft, a new URL is created that connects to a combined seamless access user, providing a view into the data as if no data had ever been relocated. When designing seamless access for PeopleSoft, a decision should be made on whether to also create an archive-only URL and, if so, the users who should be given the new URL(s) must be identified.

For Oracle E-Business Suite, the same URL is leveraged and new data groups and responsibilities are created to give access to archive-only and combined versions of the data by switching responsibilities. The design of seamless access in this case involves identifying the responsibilities for an archive-only and a combined version and deciding which users need to be assigned to those responsibilities. The data being relocated is old, inactive data, and most users should not need to have access to it.

For Siebel, a new URL is also created that connects to a combined seamless access user, giving a view into the data as if no data had ever been relocated. With Siebel, the design of seamless access involves deciding whether an archive-only URL should also be created and identifying the users who should be given the new URL(s).

For Deltek Costpoint, the application user must be DELTEK, so the seamless access schemas are created in the Online Archive database and the combined user is called DELTEK. A new URL is created that connects to the DELTEK schema on the Online Archive database to provide a combined view of the data. The design of seamless access with Deltek Costpoint involves identifying the users who should be given the new combined URL. Since multiple source instances are used in the Development and Testing phases of an archive project, the design of seamless access for Deltek Costpoint must take into consideration that DELTEK must be the name of the combined seamless access schema, requiring a separate Online Archive database for each source ERP instance.

Prerequisites
3.3 Implement Technical Architecture

Roles

Application Specialist (Primary) Business Analyst (Primary) Business Project Manager (Secondary) Data Architect (Primary) Database Administrator (DBA) (Secondary)

Considerations
When designing the seamless access layer it is important to keep in mind that the data being relocated by the archiving process is older inactive data. Not all users will need access to the relocated data and it will not be accessed in the same way as the current data in the ERP system. The data that has been relocated is in a read-only format and if any changes need to be made to the data it will have to be restored prior to making those changes.

Best Practices

Seamless Access Oracle E-Business Suite

Sample Deliverables
None

Last updated: 02-Nov-10 15:03


Velocity v9
Phase 5: Build

2011 Informatica Corporation. All rights reserved.

Phase 5: Build
5 Build
5.1 Launch Build Phase
5.1.1 Review Project Scope and Plan
5.1.2 Review Physical Model
5.1.3 Define Defect Tracking Process
5.2 Implement Physical Database
5.3 Design and Build Data Quality Process
5.3.1 Design Data Quality Technical Rules
5.3.2 Determine Dictionary and Reference Data Requirements
5.3.3 Design and Execute Data Enhancement Processes
5.3.4 Design Run-time and Real-time Processes for Operate Phase Execution
5.3.5 Develop Inventory of Data Quality Processes
5.3.6 Review and Package Data Transformation Specification Processes and Documents
5.4 Design and Develop Data Integration Processes
5.4.1 Design High Level Load Process
5.4.2 Develop Error Handling Strategy
5.4.3 Plan Restartability Process
5.4.4 Develop Inventory of Mappings & Reusable Objects
5.4.5 Design Individual Mappings & Reusable Objects
5.4.6 Build Mappings & Reusable Objects
5.4.7 Perform Unit Test
5.4.8 Conduct Peer Reviews
5.5 Design and Develop B2B Data Transformation Processes
5.5.1 Develop Inventory of B2B Data Transformation Processes
5.5.2 Develop B2B Error Handling and Validation Strategy
5.5.3 Design B2B Data Transformation Processes
5.5.4 Build Parsers, Serializers, Mappers and Streamers
5.5.5 Build B2B Transformation Process from Data Transformation Objects
5.5.6 Unit Test B2B Data Transformation Process
5.6 Design and Build Information Lifecycle Management Processes
5.6.1 Design ILM Entities
5.6.2 Build ILM Entities
5.6.3 Unit Test ILM Entities
5.7 Populate and Validate Database
5.7.1 Build Load Process
5.7.2 Perform Integrated ETL Testing
5.8 Build Presentation Layer
5.8.1 Develop Presentation Layer
5.8.2 Demonstrate Presentation Layer to Business Analysts
5.8.3 Build Seamless Access
5.8.4 Unit Test Seamless Access

5.8.5 Demonstrate Seamless Access to Business Users


Phase 5: Build
Description
The Build Phase uses the design work completed in the Architect Phase and the Design Phase as inputs to physically create the data integration solution including data quality and data transformation development efforts. At this point, the project scope, plan, and business requirements defined in the Manage Phase should be re-evaluated to ensure that the project can deliver the appropriate value at an appropriate time.

Prerequisites
None

Roles

Business Analyst (Primary) Business Project Manager (Secondary) Data Architect (Primary) Data Integration Developer (Primary) Data Quality Developer (Primary) Data Steward/Data Quality Steward (Secondary) Data Warehouse Administrator (Secondary) Database Administrator (DBA) (Primary) Presentation Layer Developer (Primary) Project Sponsor (Approve) Quality Assurance Manager (Primary) Repository Administrator (Secondary) System Administrator (Secondary) Technical Project Manager (Primary) Test Manager (Primary)

Considerations
PowerCenter serves as a complete data integration platform to move data from source to target databases, perform data transformations, and automate the extract, transform, and load (ETL) processes. As a project progresses from the Design Phase to the Build Phase, it is helpful to review the activities involved in each of these processes.

Extract - PowerCenter extracts data from a broad array of heterogeneous sources. Data can be accessed from sources including IBM mainframe and AS/400 systems, MQ Series, and TIBCO; ERP systems from SAP, PeopleSoft, and Siebel; relational databases; HIPAA sources; flat files; web log sources; and direct parsing of XML data files through DTDs or XML schemas. PowerCenter interfaces mask the complexities of the underlying DBMS for the developer, enabling the build process to focus on implementing the business logic of the solution.

Transform - The majority of the work in the Build Phase focuses on developing and testing data transformations. These transformations apply the business rules, cleanse the data, and enforce data consistency from disparate sources as data is moved from source to target.

Load - PowerCenter automates much of the load process. To increase performance and throughput, loads can be multi-threaded, pipelined, streamed (concurrent execution of the extract, transform, and load steps), or serviced by more than one server. In addition, DB2, Oracle, Sybase IQ, and Teradata external loaders can be used to increase performance. Data can be delivered to EAI queues for enterprise applications. Data loads can also take advantage of Open Database Connectivity (ODBC) or use native database drivers to optimize performance. Pushdown optimization can even allow some or all of the transformation work to occur in the target database itself (a conceptual sketch follows below).
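The difference between transforming rows in the integration engine and pushing the work into the database can be pictured with a small, generic sketch. The example below is not PowerCenter code; it uses Python's built-in sqlite3 module and hypothetical staging tables (src_orders, tgt_orders) purely to contrast the two styles.

import sqlite3

# Hypothetical staging tables; in a real project these live in the source and
# target databases configured for the session. Both approaches are shown
# independently here for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_id INTEGER, qty INTEGER, unit_price REAL)")
conn.execute("CREATE TABLE tgt_orders (order_id INTEGER, order_total REAL)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?)",
                 [(1, 2, 9.99), (2, 5, 3.50)])

# Engine-side approach: rows are extracted, transformed in the tool, and loaded back.
rows = conn.execute("SELECT order_id, qty, unit_price FROM src_orders").fetchall()
transformed = [(order_id, round(qty * unit_price, 2)) for order_id, qty, unit_price in rows]
conn.executemany("INSERT INTO tgt_orders VALUES (?, ?)", transformed)

# Pushdown-style approach: the same transformation expressed as SQL and executed
# entirely inside the database, avoiding the round trip through the engine.
conn.execute("""
    INSERT INTO tgt_orders (order_id, order_total)
    SELECT order_id, ROUND(qty * unit_price, 2) FROM src_orders
""")
conn.commit()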

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:46


Phase 5: Build
Task 5.1 Launch Build Phase
Description
In order to begin the Build phase, all analysis performed in previous phases of the project needs to be compiled, reviewed and disseminated to the members of the Build team. Attention should be given to project schedule, scope, and risk factors. The team should be provided with:

Project background
Business objectives for the overall solution effort
Project schedule, complete with key milestones, important deliverables, dependencies, and critical risk factors
Overview of the technical design including external dependencies
Mechanism for tracking scope changes, problem resolution, and other business issues

A series of meetings may be required to transfer the knowledge from the Design team to the Build team, ensuring that the appropriate staff is provided with relevant information. Some or all of the following types of meetings may be required to get development under way:

Kick-off meeting to introduce all parties and staff involved in the Build phase
Functional design review to discuss the purpose of the project and the benefits expected and review the project plan
Technical design review to discuss the source to target mappings, architecture design, and any other technical documentation

Information provided in these meetings should enable members of the data integration team to immediately begin development. As a result of these meetings, the integration team should have a clear understanding of the environment in which they are to work, including databases, operating systems, database/SQL tools available in the environment, file systems within the repository and file structures within the organization relating to the project, and all necessary user logons and passwords. The team should be provided with points of contact for all facets of the environment (e.g., DBA, UNIX\NT Administrator, PowerCenter Administrator, etc.). The team should also be aware of the appropriate problem escalation plan. When team members encounter design problems or technical problems, there must be an appropriate path for problem escalation. The Project Manager should establish a specific mechanism for problem escalation along with a problem tracking report.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Architect (Primary) Data Integration Developer (Secondary) Data Warehouse Administrator (Secondary) Database Administrator (DBA) (Secondary) Presentation Layer Developer (Primary) Quality Assurance Manager (Primary) Repository Administrator (Review Only) System Administrator (Review Only) Technical Project Manager (Primary) Test Manager (Primary)

Considerations
It is important to include all relevant parties in the launch activities. If all points of discussion cannot be resolved during the kickoff meeting, the key personnel in each area should be present so that follow-up sessions can be rescheduled quickly and the overall schedule is not affected. Because of the nature of the development process, there are often bottlenecks in the development flow. The Project Manager should be aware of risk factors that emanate from outside the project and should be able to anticipate where bottlenecks are likely to occur. The Project Manager also needs to be aware of the external factors that create project dependencies, and should avoid holding meetings prematurely when external dependencies have not been resolved. Holding meetings before these issues are resolved can result in significant downtime for the developers while they wait for their sources to be in place and finalized.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:46


Phase 5: Build
Subtask 5.1.1 Review Project Scope and Plan
Description
The Build team needs to understand the project's objectives, scope, and plan in order to prepare themselves for the Build Phase. There is often a tendency to waste time developing non-critical features or functions. The team should review the project plan and identify the critical success factors and key deliverables to avoid focusing on relatively unimportant tasks. This helps to ensure that the project stays on its original track and avoids much unnecessary effort. The team should be provided with:

Detailed descriptions of deliverables and timetables.
Dependencies that affect deliverables.
Critical success factors.
Risk assessments made by the design team.

With this information, the Build team should be able to enhance the project plan to navigate through the risk areas, dependencies, and tasks to reach its goal of developing an effective solution.

Prerequisites
None

Roles

Business Analyst (Review Only) Data Architect (Review Only) Data Integration Developer (Review Only) Data Warehouse Administrator (Review Only) Database Administrator (DBA) (Review Only) Presentation Layer Developer (Review Only) Quality Assurance Manager (Review Only) Technical Project Manager (Primary)

Considerations

With the Design Phase complete, this is the first opportunity for the team to review what it has learned during the Architect Phase and the Design Phase about the sources of data for the solution. It is also a good time to review and update the project plan, which was created before these findings, to incorporate the knowledge gained during the earlier phases. For example, the team may have learned that the source of data for marketing campaign programs is a spreadsheet that is not easily accessible by the network on which the data integration platform resides. In this case, the team may need to plan additional tasks and time to build a method for accessing the data. This is also an appropriate time to review data profiling and analysis results to ensure all data quality requirements have been taken into consideration. During the project scope and plan review, significant effort should be made to identify upcoming Build Phase risks and assess their potential impact on project schedule and/or cost. Because the design is complete, risk management at this point tends to be more tactical than strategic; however, the team leadership must be fully aware of key risk factors that remain. Team members are responsible for identifying the risk factors in their respective areas and notifying project management during the review process.

Best Practices
None


Sample Deliverables
Project Review Meeting Agenda

Last updated: 01-Feb-07 18:46


Phase 5: Build
Subtask 5.1.2 Review Physical Model
Description
The data integration team needs the physical model of the target database in order to begin analyzing the source to target mappings and develop the end user interface known as the presentation layer. The Data Architect can provide database specifics such as: what are the indexed columns, what partitions are available and how they are defined, and what type of data is stored in each table. The Data Warehouse Administrator can provide metadata information and other source data information, and the Data Integration Developer(s) needs to understand the entire physical model of both the source and target systems, as well as all the dimensions, aggregations, and transformations that will be needed to migrate the data from the source to the target.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Architect (Primary) Data Integration Developer (Secondary) Data Warehouse Administrator (Secondary) Database Administrator (DBA) (Secondary) Presentation Layer Developer (Primary) Quality Assurance Manager (Review Only) Repository Administrator (Review Only) Technical Project Manager (Review Only)

Considerations
Depending on how much up-front analysis was performed prior to the Build phase, the project team may find that the model for the target database does not correspond well with the source tables or files. This can lead to extremely complex and/or poorly performing mappings. For this reason, it is advisable to allow some flexibility in the design of the physical model to permit modifications to accommodate the sources. In addition, some end user products may not support some datatypes specific to a database. For example, Teradata's BYTEINT datatype is not supported by some end user reporting tools. As a result of the various kick-off and review meetings, the data integration team should have sufficient understanding of the database schemas to begin work on the Build-related tasks.

Best Practices
None

Sample Deliverables
Physical Data Model Review Agenda

Last updated: 01-Feb-07 18:46


Phase 5: Build
Subtask 5.1.3 Define Defect Tracking Process
Description
Since testing is designed to uncover defects, it is crucial to properly record the defects as they are identified, along with their resolution process. This requires a defect tracking system that may be entirely manual, based on shared documents such as spreadsheets, or automated using, say, a database with a web browser front-end. Whatever tool is chosen, sufficient details of the problem must be recorded to allow proper investigation of the root cause and then the tracking of the resolution process. The success of a defect tracking system depends on:

Formal test plans and schedules being in place, to ensure that defects are discovered, and that their resolutions can be retested.
Sufficient details being recorded to ensure that any problems reported are repeatable and can be properly investigated.

Prerequisites
None

Roles

Data Integration Developer (Review Only) Data Warehouse Administrator (Review Only) Database Administrator (DBA) (Review Only) Presentation Layer Developer (Review Only) Quality Assurance Manager (Primary) Repository Administrator (Review Only) System Administrator (Review Only) Technical Project Manager (Primary) Test Manager (Primary)

Considerations
The defect tracking process should encompass these steps:

Testers prepare Problem Reports to describe the defects identified.
The Test Manager reviews these reports and assigns priorities on an Urgent/High/Medium/Low basis (Urgent should only be used for problems that will prevent or severely delay further testing).
Urgent problems are immediately passed to the Project Manager for review/action.
Non-urgent problems are reviewed by the Test Manager and Project Manager on a regular basis (this can be daily at a critical development time, but is usually less frequent) to agree on priorities for all outstanding problems.
The Project Manager assigns problems for investigation according to the agreed-upon priorities.
The investigator attempts to determine the root cause of the defect and to define the changes needed to rectify it.
The Project Manager reviews the results of investigations and assigns rectification work to fixers according to priorities and effective use of resources.
The fixer makes the required changes and conducts unit testing. Regression testing is also typically conducted. The Project Manager may decide to group a number of fixes together to make effective use of resources.
The Project Manager and Test Manager review the test results at their next meeting and agree on closure, if appropriate.
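However the defect tracking system is implemented (spreadsheet, database, or packaged tool), each Problem Report should capture at least the details referenced in the process above. The sketch below is a minimal, hypothetical record structure in Python; the field names, status values, and the example session name are illustrative only and should be adapted to whatever tool the project selects.

from dataclasses import dataclass, field
from datetime import date
from typing import List

PRIORITIES = ("Urgent", "High", "Medium", "Low")
STATUSES = ("Open", "Under Investigation", "Fix Assigned", "Retest", "Closed")

@dataclass
class ProblemReport:
    defect_id: str                 # unique identifier assigned by the tester
    summary: str                   # short description of the observed defect
    steps_to_reproduce: str        # enough detail to make the problem repeatable
    reported_by: str
    reported_on: date
    priority: str = "Medium"       # assigned by the Test Manager
    status: str = "Open"
    assigned_to: str = ""          # investigator or fixer
    resolution_notes: str = ""
    history: List[str] = field(default_factory=list)

    def transition(self, new_status: str, note: str = "") -> None:
        """Record a status change so the resolution path can be audited."""
        if new_status not in STATUSES:
            raise ValueError(f"Unknown status: {new_status}")
        self.history.append(f"{self.status} -> {new_status}: {note}")
        self.status = new_status

# Example usage with hypothetical values.
report = ProblemReport("DEF-001", "Duplicate customer rows after load",
                       "Run the customer load session twice against the same source file",
                       "QA tester", date.today(), priority="High")
report.transition("Under Investigation", "Assigned for root-cause analysis")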

Best Practices
None

Sample Deliverables
Issues Tracking

Last updated: 01-Feb-07 18:46


Phase 5: Build
Task 5.2 Implement Physical Database
Description
Implementing the physical database is a critical task that must be performed efficiently to ensure a successful project. In many cases, correct database implementation can double or triple the performance of the data integration processes and presentation layer applications. Conversely, poor physical implementation generally has the greatest negative performance impact on a system. The information in this section is intended as an aid for individuals responsible for the long-term maintenance, performance, and support of the database(s) used in the solution. It should be particularly useful for programmers, Database Administrators, and System Administrators with an in-depth understanding of their database engine and Informatica product suite, as well as the operating system and network hardware.

Prerequisites
None

Roles

Data Architect (Secondary) Data Integration Developer (Review Only) Database Administrator (DBA) (Primary) Repository Administrator (Secondary) System Administrator (Secondary)

Considerations
Nearly everything is a trade-off in the physical database implementation. One example is trading off the flexibility of a completely 3rd Normal Form data schema for the improved performance of a 2nd Normal Form database. The DBA is responsible for determining which of the many available alternatives is the best implementation choice for the particular database. For this reason, it is critical for this individual to have a thorough understanding of the data, database, and desired use of the database by the end-user community prior to beginning the physical design and implementation processes. The DBA should be thoroughly familiar with the design of star schemas for Data Warehousing and Data Integration solutions, as well as standard 3rd Normal Form implementations for operational systems.

For data migration projects, this task often refers exclusively to the development of new tables in either a reference data schema or staging schemas. Developers are encouraged to leverage a reference data database which will hold reference data such as valid values, cross-reference tables, default values, exception handling details, and other tables necessary for successful completion of the data migration. Additionally, tables will get created in staging schemas. There should be little creation of tables in the source or target system due to the nature of the project. Therefore, most of the table development will be in the developer space rather than in the applications that are part of the data migration.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:46


Phase 5: Build
Task 5.3 Design and Build Data Quality Process
Description
Follow the steps in this task to design and build the data quality enhancement processes that can ensure that the project data meets the standards of data quality required for progress through the rest of the project. The processes designed in this task are based on the results of 2.8 Perform Data Quality Audit. Both the design and build components are captured in the Build Phase since much of this work is iterative: as intermediate builds of the data quality process are reviewed, the design is further expanded and enhanced.

Note: If the results of the Data Quality Audit indicate that the project data already meets all required levels of data quality, then you can skip this task. However, this is unlikely to occur.

Here again (as in subtask 2.3.1 Identify Source Data Systems) it is important to work as far as is practicable with the actual source data. Using data derived from the actual source systems - either the complete dataset or a subset - was essential in identifying quality issues during the Data Quality Audit and determining if the data meets the business requirements (i.e., if it answers the business questions identified in the Manage Phase). The data quality enhancement processes designed in the subtasks of this task must operate on as much of the project dataset(s) as deemed necessary, and possibly the entire dataset.

Data quality checks can be of two types: one covers the metadata characteristics of the data, and the other covers the quality of the data contents from a business perspective. In the case of complex ERP systems like SAP, where the implementation has a high degree of variation from the base product, a thorough data quality check should be performed to consider the customizations.

Prerequisites
None

Roles

Business Analyst (Primary) Business Project Manager (Secondary) Data Integration Developer (Secondary) Data Quality Developer (Primary) Data Steward/Data Quality Steward (Secondary) Technical Project Manager (Approve)

Considerations
Because the quality of the source system data has a major effect on the correctness of all downstream data, it is imperative to resolve as many of the data issues as possible, as early as possible. Making the necessary corrections at this stage eliminates many of the questions that may otherwise arise later during testing and validation. If the data is flawed, the development initiative faces a very real danger of failing. In addition, eliminating errors in the source data makes it far easier to determine the nature of any problems that may arise in the final data outputs. If data comes from different sources, it is mandatory to correct data for each source as well as for the integrated data. If data comes from a mainframe, it is necessary to use the proper access method to interpret the data correctly. Note, however, that Informatica Data Quality (IDQ) applications do not read data directly from the mainframe.

As indicated above, the issue of data quality covers far more than simply whether the source and target data definitions are compatible. From the business perspective, data quality processes seek to answer the following questions: what standard has the data achieved in areas that are important to the business, and what standards are required in these areas? There are six main areas of data quality performance: Accuracy, Completeness, Conformity, Consistency, Integrity, and Duplication. These are fully explained in task 2.8 Perform Data Quality Audit. The Data Quality Developer uses the results of the Data Quality Audit as the benchmark for the data quality enhancement steps that need to be applied in the current task.

Before beginning to design the data quality processes, the Data Quality Developer, Business Analyst, Project Sponsor, and other interested parties must meet to review the outcome of the Data Quality Audit and agree on the extent of remedial action needed for the project data. The first step is to agree on the business rules to be applied to the data. (See Subtask 5.3.1 Design Data Quality Technical Rules.) The tasks that follow are written from the perspective of Informatica Data Quality, Informatica's dedicated data quality application suite.
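Two of these dimensions, completeness and duplication, lend themselves to very simple illustration. The sketch below is a minimal example in plain Python against a tiny in-memory sample with hypothetical column names; it is not a substitute for the profiling and audit capabilities of the Informatica tools, which would run such checks against the full dataset or a representative subset.

# Measure completeness (non-null, non-default values) and duplication for a sample.
records = [
    {"cust_id": "C001", "name": "Ann Lee", "dob": "1975-03-02"},
    {"cust_id": "C002", "name": "",        "dob": "1982-11-19"},
    {"cust_id": "C001", "name": "Ann Lee", "dob": "1975-03-02"},  # duplicate row
]

DEFAULTS = {"", "N/A", "UNKNOWN"}

def completeness(rows, column):
    populated = sum(1 for r in rows if str(r.get(column, "")).strip() not in DEFAULTS)
    return populated / len(rows)

def duplication(rows, key_columns):
    keys = [tuple(r[c] for c in key_columns) for r in rows]
    return 1 - len(set(keys)) / len(keys)

for col in ("cust_id", "name", "dob"):
    print(f"{col}: {completeness(records, col):.0%} complete")
print(f"duplicate rate on cust_id: {duplication(records, ['cust_id']):.0%}")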

Best Practices
Data Cleansing

Sample Deliverables
None

Last updated: 01-Feb-07 18:46


Phase 5: Build
Subtask 5.3.1 Design Data Quality Technical Rules
Description
Business rules are a key driver of data enhancement processes. A business rule is a condition of the data that must be true if the data is to be valid and, in a larger sense, for a specific business objective to succeed. In many cases, poor data quality is directly related to the data's failure to satisfy a business rule. In this subtask, the Data Quality Developer and the Business Analyst, optionally joined by other personnel representing the business, establish the business rules to be applied to the data. An important factor in completing this task is proper documentation of the business rules.

Prerequisites
None

Roles

Business Analyst (Primary) Data Quality Developer (Primary) Data Steward/Data Quality Steward (Secondary)

Considerations
All areas of data quality can be affected by business rules, and business rules can be defined at high and low levels and at varying levels of complexity. Some business rules can be tested mathematically using simple processes, whereas others may require complex processes or reference data. For example, consider a financial institution that must store several types of information for account holders in order to comply with the Sarbanes-Oxley Act or the USA PATRIOT Act. It defines several business rules for its database data, including:

Field 1 through Field n must not be null or populated with default values.
The Date of Birth field must contain dates within certain ranges (e.g., to indicate that the account holder is between 18 and 100 years old).
All account holder addresses are considered valid by the postal service.

These three rules are equally easy to express, but they are implemented in different ways. All three rules can be checked in a straightforward manner using Informatica Data Quality (IDQ), although the third rule, concerning address validation, requires reference data verification. The decision to use external reference data is covered in subtask 5.3.2 Determine Dictionary and Reference Data Requirements. A conceptual sketch of the first two rules as executable checks appears at the end of this section.

When defining business rules, the Data Quality Developer must consider the following questions:

How to document the rules.
How to build the data quality processes to validate the rules.
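The sketch below expresses the first two example rules as plain Python checks, assuming illustrative field names (account_id, name, date_of_birth) and default-value markers; address validation is omitted because it depends on postal reference data. It is a conceptual illustration only, not IDQ transformation logic.

from datetime import date

DEFAULT_VALUES = {"", "N/A", "UNKNOWN", "9999-99-99"}

def rule_not_null_or_default(record, fields):
    """Rule 1: the listed fields must not be null or populated with defaults.
    Returns the list of fields that violate the rule."""
    return [f for f in fields
            if record.get(f) is None or str(record[f]).strip() in DEFAULT_VALUES]

def rule_dob_in_range(record, field="date_of_birth", min_age=18, max_age=100):
    """Rule 2: the account holder must be between min_age and max_age years old."""
    try:
        dob = date.fromisoformat(record[field])
    except (KeyError, ValueError):
        return False
    age = (date.today() - dob).days // 365
    return min_age <= age <= max_age

record = {"account_id": "A-100", "name": "J. Smith", "date_of_birth": "1990-06-15"}
print(rule_not_null_or_default(record, ["account_id", "name", "date_of_birth"]))  # []
print(rule_dob_in_range(record))  # True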

Documenting Business Rules


Documenting rules is essential as a means of tracking the implementation of the business requirements. When documenting business rules, the following information must be provided (a simple record layout is sketched after this list):

A unique ID for each rule. This can be as simple as an incremented number, or a project code assigned to each rule.
A rule name, captured to act as a quick reference and description.
A text description of the rule provided by the Business Analyst. This should cover the business intent of the rule and be as complete as possible; however, if the description becomes too lengthy or complex, it may be advisable to break it down into multiple rules.
The name of the data source containing the records affected by the rule.
The data headers or field names containing the values affected by the rule. The Data Quality Developer and the Business Analyst can refer back to the results of the Data Quality Audit to identify this information.
A technical definition of the rule. This will allow for better reconciliation of business intent and technical implementation earlier in the process.
Columns for the mapping name and the results of implementing the rule. The Data Quality Developer can provide this information later.

Note: In IDQ, a discrete data quality process is called a mapping. A data quality mapping has sources, targets, and analysis or enhancement algorithms, and is analogous to a PowerCenter mapping. It is important to understand that a data quality mapping can be added to a PowerCenter mapping as a mapplet. These data quality mapplets will run when the container PowerCenter mapping is run in a workflow.
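One way to keep this documentation structured and machine-readable is a simple rule register, sketched below in Python. The column names mirror the list above; the source name and example values are hypothetical, and the same columns could just as easily live in a spreadsheet or a reference table.

import csv, io

# One row per business rule.
RULE_COLUMNS = ["rule_id", "rule_name", "business_description", "data_source",
                "fields", "technical_definition", "mapping_name", "result"]

rules = [{
    "rule_id": "DQ-001",
    "rule_name": "Mandatory account fields",
    "business_description": "Core account holder fields must be populated.",
    "data_source": "CRM_ACCOUNTS",                      # hypothetical source name
    "fields": "ACCT_ID;NAME;DOB",
    "technical_definition": "Field IS NOT NULL AND Field NOT IN (default values)",
    "mapping_name": "",                                 # filled in later by the developer
    "result": "",
}]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=RULE_COLUMNS)
writer.writeheader()
writer.writerows(rules)
print(buffer.getvalue())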

Assigning Business Rules to Data Quality Mappings


When the Data Quality Developer and Business Analyst have agreed on the business rules to apply to the data, the Data Quality Developer must decide how to convert the rules into data quality mappings. (The Data Quality Developer need not create the mappings at this stage.) The Data Quality Developer may create a mapping for each rule, or may incorporate several rules into a single mapping. This decision is taken on a rule-by-rule basis. There is a trade-off between reusability, simplicity, and efficiency in mapping design. It may be more efficient to develop a single mapping that applies a number of rules, but the additional constraints or conditions may limit its reusability. Typically, a mapping handles more than one rule. One advantage of this course of action is that the Data Quality Developer does not need to define and maintain multiple instances of input and output data, covering small increments of data quality progress, where a single set of inputs and outputs can do the same job in a more sophisticated mapping. It's also worth considering whether the mapping will be run from within IDQ or added to a PowerCenter mapping for execution in a workflow. Additional steps will be required if the data quality mapping uses reference data stored in files or tables; that reference data will need to be migrated or made accessible when the data quality mapplet is imported.

Best Practices
Data Quality Mapping Rules for PowerCenter

Sample Deliverables
None

Last updated: 26-Oct-10 21:15


Phase 5: Build
Subtask 5.3.2 Determine Dictionary and Reference Data Requirements
Description
Many data quality mappings make use of reference data to validate and improve the quality of the input data. The main purposes of reference data are:

To validate the accuracy of the data in question. For example, in cases where input data is verified against tables of known-correct data.
To enrich data records with new data or enhance partially-correct data values. For example, in cases of address records that contain usable but incomplete postal information. (Typos can be identified and fixed; Plus-4 information can be added to zip codes.) A minimal validation and enrichment sketch follows this description.

When preparing to build data quality mappings, the Data Quality Developer must determine the types of dictionary and reference data sets that may be used in the data quality mappings, obtain approval to use third-party data, if necessary, and define a strategy for maintaining and distributing the reference data. An important factor in completing this task is the proper documentation of the required dictionary or reference data.
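The sketch below illustrates both purposes (validation against known-correct values and enrichment of partially-correct values) with a tiny, hand-built postal lookup in Python. The table contents and field names are hypothetical; production address validation would rely on the subscription address reference data described in the Considerations below.

# Hypothetical reference table keyed by 5-digit ZIP code.
ZIP_REFERENCE = {
    "10001": {"city": "NEW YORK", "state": "NY", "plus4": "0001"},
    "94105": {"city": "SAN FRANCISCO", "state": "CA", "plus4": "1804"},
}

def validate_and_enrich(address):
    """Return (is_valid, enriched_address) for a partially populated address."""
    zip5 = str(address.get("zip", ""))[:5]
    reference = ZIP_REFERENCE.get(zip5)
    if reference is None:
        return False, address                         # cannot be validated
    enriched = dict(address)
    enriched.setdefault("city", reference["city"])    # fill missing values only
    enriched.setdefault("state", reference["state"])
    enriched["zip"] = f"{zip5}-{reference['plus4']}"  # add Plus-4 information
    return True, enriched

print(validate_and_enrich({"street": "350 5th Ave", "zip": "10001"}))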

Prerequisites
None

Roles

Business Analyst (Secondary) Business Project Manager (Secondary) Data Quality Developer (Primary) Data Steward/Data Quality Steward (Secondary)

Considerations
Data quality mappings can make use of three different types of reference data.

General Reference Data - General reference tables are typically installed as data tables to the Model repository and staging database. The out-of-the-box reference tables can be updated and managed via the Informatica Analyst tool. The tables contain information on common business terms from multiple countries. The types of reference information include telephone area codes, postcode formats, names, ID number formats, occupations, and acronyms. In addition to the out-of-the-box reference data, the Analyst tool gives users the ability to create and manage custom reference data sets. These data sets can be manually built, imported, or created from Analyst tool profiling output.

Address Reference Data Files - Address reference data files contain validation information on postal addresses for a wide range of countries. The Address Validator transformation in Informatica Developer reads and applies this data. The Informatica Server Content installer must write the address reference data files to the file system of a machine that the Data Integration Service can access. This content is available from Informatica on a subscription basis. The contents of the address reference files cannot be edited.

Identity Population Files - Identity population files contain metadata on types of personal, household, and corporate identity. They also contain algorithms that apply the metadata to input data. The Match transformation and the Comparison transformation in the Informatica Developer tool use this data to parse potential identities from input fields. The Server Content installer writes the identity files to the file system of a machine that the Data Integration Service can access. These files are available from Informatica through separate licensing.

If the Data Quality Developer feels that subscription reference data files are necessary, he or she must inform the Project Manager or other business personnel as soon as possible, as this is likely to affect (1) the project budget and (2) the software architecture implementation.

Managing Reference Data Sets


Managing reference data within Informatica Data Quality is a fairly straightforward process, as the Informatica platform has built-in tools. Basic project-level reference data is managed in the Informatica Analyst tool. Reference tables can be changed through the Analyst UI, and rows can be added, deleted, and updated. Reference tables can be added or deleted through Analyst as well, as project needs dictate.

In addition to managing the reference data table content, permissions need to be managed as well. Project-level permissions can be set in Informatica Developer. Permissions can be set so that users of other projects cannot access the general reference data of another project (however, address and population reference data types are shared across all projects). Examples of reference sets requiring limited permissions can include cross-references of pay ranges to job titles or bonus packages (which might appear in the migration or management of an HR system, for example).

Whenever a component that consumes reference data is added to a mapping, document exactly what it is consuming and how: record the mapping name, the reference table name, and the component instance that uses the reference data. Make sure to pass the inventory of reference data to all other personnel who are going to use the mapping. This is important for long-term maintenance as well as progression through the development lifecycle. Reference data will need to be migrated through each stage of the development lifecycle. For example, it will need to be migrated from development to QA and from QA to production so that each mapping calling it will have access. Along with migrating the reference data from environment to environment, additional steps will be required when IDQ mapplets are integrated with PowerCenter. The reference data must be exported and moved to a place the PowerCenter Integration Service can access. The inventory of required reference data should be used to ensure nothing is missed at each stage of the development lifecycle migration process.

Data migration projects have additional reference data requirements, which include a need to determine the valid values for key code fields and to ensure that all input data aligns with these codes. It is recommended to build valid-value processes to perform this validation. It is also recommended to use a table-driven approach to populate hard-coded values, which then allows for easy changes if the specific hard-coded values change over time (a simple table-driven sketch follows below). Additionally, a large number of basic cross-references are also required for data migration projects. These data types are examples of reference data that should be planned for by using a specific approach to populate and maintain them with input from the business community. These needs can be met with a variety of Informatica products, but to expedite development, they must be addressed prior to building data integration processes.
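The table-driven idea recommended above can be sketched generically: valid values and cross-references are read from reference tables (represented here as Python dictionaries) rather than hard-coded into the transformation logic, so the values can change without code changes. The table contents and codes below are hypothetical.

# Reference data represented as simple lookups; in practice these would be
# managed as reference tables in the Analyst tool or a reference database.
VALID_COUNTRY_CODES = {"US", "CA", "GB", "DE"}

LEGACY_TO_TARGET_STATUS = {   # cross-reference between legacy and target codes
    "A": "ACTIVE",
    "I": "INACTIVE",
    "P": "PENDING",
}

def check_valid_value(value, valid_set, default=None):
    """Return the value if it is valid, otherwise the configured default."""
    return value if value in valid_set else default

def translate_code(legacy_code, xref, default="UNMAPPED"):
    """Translate a legacy code to the target system code via the cross-reference."""
    return xref.get(legacy_code, default)

row = {"country": "XX", "status": "A"}
row["country"] = check_valid_value(row["country"], VALID_COUNTRY_CODES, default="US")
row["status"] = translate_code(row["status"], LEGACY_TO_TARGET_STATUS)
print(row)   # {'country': 'US', 'status': 'ACTIVE'}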

Best Practices
Managing Internal and External Reference Data

Sample Deliverables
None

Last updated: 28-Oct-10 01:58


Phase 5: Build
Subtask 5.3.3 Design and Execute Data Enhancement Processes
Description
This subtask, along with subtask 5.3.4 Design Run-time and Real-time Processes for Operate Phase Execution, concerns the design and execution of the data quality mappings and mapplets that will prepare the project data for the Data Integration Design and Development in the Build Phase. While this subtask describes the creation and execution of mappings through Informatica Developer, subtask 5.3.4 focuses on the steps to deploy mappings and mapplets. However, there are several aspects to creating mappings primarily for deployment, and these are covered in 5.3.4. Users who are creating mappings should read both subtasks.

Note: Informatica provides a user interface, the Developer tool, where mappings can be designed, tested, and deployed to the Data Integration Services in the domain. Developer has an intuitive user interface; however, the mappings that users construct in Developer can grow in size and complexity. Developer, like all software applications, requires user training, and these subtasks are not a substitute for that training. Instead, they describe the rudiments of mapping construction, the elements required for various types of mappings, and the next steps to mapping deployment. Both subtasks assume that the Data Quality Developer will have received formal training.

Prerequisites
None

Roles

Data Quality Developer (Primary) Technical Project Manager (Approve)

Considerations
A data quality mapping is a discrete set of data analysis and/or enhancement operations with a data source and a data target. The design of a data quality mapping is quite similar to the design of a PowerCenter mapping. The data sources, targets, and analysis/enhancement transformations are represented on-screen by icons, much like the sources, targets, and transformations in a PowerCenter mapping. Sources, targets, and other transformations can be configured through a docked dialog box in a manner similar to PowerCenter. One difference between PowerCenter and Developer is that users cannot define workflows that contain serial data quality mappings, although this functionality can be replicated depending on the type of deployment selected. Data quality mappings can read source data from, and write data to, files and databases. Most delimited, flat, or fixed-width file types are usable, as are DB2, Oracle, and SQL Server databases and any database accessible via ODBC. Informatica Data Quality stores mapping data in its own Model repository. The following figure illustrates a standard data quality mapping.

This data quality mapping shows a data source reading from a flat file physical data object, operational transformations analyzing the data, and a data target that receives the data as mapping output. A mapping can have any number of operational transformations. Mappings can be designed to fulfill several data quality requirements, including data analysis, parsing, cleansing, standardization, enrichment, validation, matching, and consolidation. These are described in detail in the Best Practice Data Cleansing.

When designing data quality mappings, the questions to consider include:

What types of data quality rules are necessary to meet the needs of the project? The business should have already signed off on specific data quality goals as part of agreeing on the overall project objectives, and the Data Quality Audit should have indicated the areas where the project data requires improvement. For example, the audit may indicate that the project data contains a high percentage of duplicate records, and therefore matching and pre-match grouping mappings may be necessary.

What source data will be used for the mappings? This is related to the testing issue mentioned above. The final mappings that operate on the project data are likely to operate on the complete project dataset; in every case, the mappings will effect changes in the project data. Ideally, a complete clone of the project dataset should be available to the Data Quality Developer, so that the mappings can be designed and tested on a fully faithful version of the project data. At a minimum, a meaningful sample of the dataset should be replicated and made available for mapping design and test purposes.

Does it make sense to create fewer mappings that implement many rules, or more mappings that are rule specific? If the same data quality rule needs to be applied at multiple times or places, it may make sense to create these rules as individual mapplets or mappings, which will make them easier to reuse. If a series of rules only needs to be applied to a single data set at a single point in time, it will typically make sense to group these into a single mapping.

What test cycles are appropriate for the mappings? Testing and tuning mappings in Developer is a normal part of mapping development. The Data Quality Developer must be able to sign off on each mapping as valid and executable. In many cases, the Data Steward may be required to sign off on output or cleansing applied to the data.

Where will the mappings be deployed? A rule should be implemented roughly the same way regardless of the deployment type. There are a number of ways and places data quality mappings can be executed and deployed:

Run from Developer - The data quality mapping is run from the Developer tool. Runtime information, run characteristics, and physical data object connections are specified in Developer. Running mappings from Developer is common during testing and for ad hoc requests, but it is not frequently the final deployment location for production work.

Integrated into PowerCenter mappings - The most common deployment mechanism for IDQ mappings is in PowerCenter. Data quality mappings are exported from the Model repository and imported into the PowerCenter repository as mapplets. They are then placed in PowerCenter mappings and deployed, scheduled, and managed in the same manner as other PowerCenter objects.

Deployed to the Data Integration Service - Data quality mappings can be deployed directly to the Data Integration Service. When this is done, an application is created that users can query against directly.

Deployed to a network file system - Data quality mappings can be deployed to a network file system, typically for the purpose of checking them into a third-party version control system.
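The source-transformation-target structure described above can be pictured with a small, generic sketch. This is not Developer or PowerCenter code; it simply mimics the flow of records from a flat-file source through a chain of analysis and enhancement steps to a target, using hypothetical file names and field names.

import csv

def read_source(path):
    """Source: read records from a delimited flat file."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def standardize(records):
    """Transformation 1: trim and upper-case the name field."""
    for r in records:
        r["name"] = r.get("name", "").strip().upper()
        yield r

def flag_incomplete(records):
    """Transformation 2: flag records with missing mandatory values."""
    for r in records:
        r["dq_flag"] = "INCOMPLETE" if not r.get("email") else "OK"
        yield r

def write_target(records, path):
    """Target: write the enhanced records to an output file."""
    records = list(records)
    if not records:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)

# Chained like transformations in a mapping: source -> standardize -> flag -> target.
# "customers_in.csv" and "customers_out.csv" are hypothetical file names.
write_target(flag_incomplete(standardize(read_source("customers_in.csv"))),
             "customers_out.csv")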

Best Practices
Effective Data Standardizing Techniques
Effective Data Matching Techniques

Sample Deliverables
None
Last updated: 29-Oct-10 01:11


Phase 5: Build
Subtask 5.3.4 Design Run-time and Real-time Processes for Operate Phase Execution
Description
This subtask, along with subtask 5.3.3 Design and Execute Data Enhancement Processes, concerns the design and execution of the data quality mappings that prepare the project data for the Data Integration component of the Build Phase and possibly later phases. While subtask 5.3.3 describes the creation and execution of mappings through Informatica Developer, this subtask focuses on the steps to deploy data quality mappings and mapplets. There are several deployment options, and this document will focus on the most common, PowerCenter integration. Users who are creating mappings should read both subtasks. Common reasons IDQ mappings are deployed to PowerCenter include:

IDQ mappings deployed to PowerCenter can be managed and deployed in the same manner as other PowerCenter objects.
Processes can be scheduled to run on a recurring basis to maintain and report on data quality within the system.
IDQ mappings can be scheduled to run without manual intervention. Long-running processes do not require constant user observation and can be scheduled to run overnight.
Run books can be created, and series of objects can be organized and executed in a specific order through PowerCenter workflows.
PowerCenter offers additional options for writing data to targets, including bulk load options, which can improve performance.

Prerequisites
None

Roles

Business Analyst (Review Only) Data Quality Developer (Primary) Technical Project Manager (Review Only)

Considerations
There are a number of factors to consider when deploying data quality mappings.

Will the mapping be integrated into PowerCenter and deployed? This is one of the most common means of deploying data quality mappings. If data quality mappings are deployed in this manner, there are a number of factors to consider:

Are all of the required source and target connections already created in PowerCenter, or will they have to be created? When mappings are exported from IDQ and imported to PowerCenter as mapplets, their connection information is left behind. What were sources and targets in IDQ become input and output ports on the integrated mapplet in PowerCenter. Additional sources and connections may need to be added in PowerCenter.

Are there multiple data quality mappings to integrate? If there are multiple mappings to integrate, create a run book to ensure they are executed in the proper order or integrated with the correct mappings in PowerCenter. For example, you will need to ensure that a mapplet applying standardization rules is run before the data is fed to another for grouping and matching.

What reference data or files will the mapping use? An important consideration during deployment to PowerCenter or other environments is whether the data quality objects require any reference data.

Reference data tables or files - If the integrated data quality objects call reference tables or files, those data sets will need to be properly configured and located so that the PowerCenter Integration Services can access them. This will need to be done for every environment in which the objects are deployed.

Subscription content - If address validation transformations are used in integrated mappings, PowerCenter will need to be configured to point at the required reference data. This data will need to be visible from each environment the objects are deployed to.

IMO Populations - If the Identity Match Option is used, the required match populations will need to be accessible from each environment in which the object is deployed.

Will an alternate method of deployment be used? The other means of deployment include deploying data quality mappings directly to the Data Integration Service or to a network drive. These means of deployment are less frequently used. If project requirements dictate their usage, the documentation should be consulted during the design stage to ensure requirements are addressed and the objects can be successfully deployed.

Best Practices
Real-Time Matching Using PowerCenter

Sample Deliverables
None

Last updated: 28-Oct-10 16:30


Phase 5: Build
Subtask 5.3.5 Develop Inventory of Data Quality Processes
Description
When the Data Quality Developer has designed and tested the mappings to be used later in the project, he or she must then create an inventory of the mappings. This inventory should be as exhaustive as possible. Data quality mappings, once they achieve any size, can be hard for personnel other than the Data Quality Developer to read. Moreover, other project personnel and business users are likely to rely on the inventory to identify where the mapping functioned in the project.

Prerequisites
None

Roles

Data Quality Developer (Primary)

Considerations

For each mapping created for use in the project (or for use in the Operate Phase and post-project scenarios), the inventory document should answer a number of questions. The questions can be divided into two sections: one relating to the mapping's place and function relative to the project and its objectives, and the other relating to the mapping design itself. The questions below are a subset of those included in the sample deliverable document Data Quality Plan Design.

Project-related Questions
What is the name of the mapping?
What project is the mapping part of? Where does the mapping fit in the overall project? What particular aspect of the project does the mapping address?
What are the objectives of the mapping?
What issues, if any, apply to the mapping or its data?
What department or group uses the mapping output?
What are the predicted "before and after" states of the mapping data?
Does a manual resolution process for bad or duplicate records follow mapping execution?
Where is the mapping located (include machine details and folder location), and when was it executed?
Is the mapping version-controlled?
What are the creation/metadata details for the mapping?
What steps were taken or should be taken following mapping execution?

Mapping Design-related Questions

What are the specific data or business objectives of the mapping?
Who ran (or should run) the mapping, and when?
In what version of IDQ was the mapping designed?
What Informatica application will run the mapping, and on which applications will the mapping run?
Provide a screengrab of the mapping layout in the Developer user interface.
What data source(s) are used? Where is the source located? What are the format and origin of the database table? Is the source data an output from another IDQ mapping, and if so, which one?
Describe the activity of each component in the mapping. Component/transformation functionality can be described at a high level or low level, as appropriate.
What reference files, dictionaries, or tables are applied?
What business rules are defined? This question can refer to the documented business rules from subtask 5.3.1 Design Data Quality Technical Rules. Provide the logical statements, if appropriate.
What are the outputs for the instance, and how are they named? Where is the output written: report, database table, or file?
Are there exception files? If so, where are they written?
What is the next step in the project? Will the mapping(s) be re-used (e.g., in a runtime environment)?
Who receives the mapping output data, and what actions are they likely to take?
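One way to keep the answers to these questions consistent across mappings is a structured inventory entry, sketched below in Python. The structure and all example values are hypothetical; a real inventory might live in a spreadsheet, wiki, or metadata repository instead.

# Minimal inventory entry covering both the project-related and design-related questions.
inventory_entry = {
    "mapping_name": "m_dq_customer_standardize",      # hypothetical name
    "project": "Customer data warehouse",
    "objective": "Standardize and de-duplicate customer names and addresses",
    "known_issues": "Legacy region codes not yet cross-referenced",
    "consuming_group": "Marketing analytics",
    "before_after": "Duplicates expected to fall from ~8% to under 1%",
    "location": "Model repository project folder; run on the development Data Integration Service",
    "version_controlled": True,
    "sources": ["CRM_CUSTOMERS (Oracle)", "legacy_extract.csv"],
    "reference_data": ["name prefix dictionary", "country code table"],
    "business_rules": ["DQ-001", "DQ-007"],            # IDs from the rule inventory
    "outputs": ["stg_customers_clean (table)", "exceptions_customers.csv"],
    "next_step": "Integrate as a mapplet into a PowerCenter workflow",
}

def missing_fields(entry, required):
    """Simple completeness check before the inventory is handed over."""
    return [f for f in required if not entry.get(f)]

print(missing_fields(inventory_entry, ["mapping_name", "objective", "sources", "outputs"]))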

Best Practices
None

Sample Deliverables
None


Last updated: 26-Oct-10 20:49


Phase 5: Build
Subtask 5.3.6 Review and Package Data Transformation Specification Processes and Documents
Description
In this subtask, the Data Quality Developer collates all the documentation produced for the data quality operations thus far in the project and makes it available to the Project Manager, Project Sponsor, and Data Integration Developers; in short, to all personnel who need it. The Data Quality Developer must also ensure that the data quality plans themselves are stored in locations known to and usable by the Data Integration Developers.

Prerequisites
None

Roles

Data Integration Developer (Secondary) Data Quality Developer (Primary) Technical Project Manager (Review Only)

Considerations
After the Data Quality Developer verifies that all data quality-related materials produced in the project are complete, he or she should hand them all over to other interested parties in the project. The Data Quality Developer should either arrange a handover meeting with all relevant project roles or ask the Data Steward to arrange such a meeting. The Data Quality Developer should consider making a formal presentation at the meeting and should prepare for a Q&A session before the meeting ends. The presentation may constitute a PowerPoint slide show and may include dashboard reports from data quality plans. The presentation should cover the following areas:

Progress in treating the quality of the project data (before and after states of the data in the key data quality areas)
Success stories, lessons learned
Data quality targets: met or missed?
Recommended next steps for project data

Regarding data quality targets met or missed, the Data Quality Developer must be able to say whether the data operated on is now in a position to proceed through the rest of the project. If the Data Quality Developer believes that there are show stopper issues in the data quality, he or she must inform the business managers and provide an estimate of the work necessary to remedy the data issues. The business managers can then decide if the data can pass to the next stage of the project or if remedial action is appropriate. The materials that the Data Quality Developer must assemble include:

Inventory of data quality plans (prepared in subtask 5.3.5 Develop Inventory of Data Quality Processes).
Data Quality plan files (.pln or .xml files), or locations of the Data Quality repositories containing the plans.
Details of backup data quality plans. (All Data Quality repositories containing final plans should be backed up.)
Inventory of business rules used in the plans (prepared in subtask 5.3.1 Design Data Quality Technical Rules).
Inventory of dictionary and reference files used in the plans (prepared in subtask 5.3.2 Determine Dictionary and Reference Data Requirements).
Data Quality Audit results (prepared in task 2.8 Perform Data Quality Audit).
Summary of task 5.3 Design and Build Data Quality Process.


Best Practices
Build Data Audit/Balancing Processes

Sample Deliverables
Data Quality Mapping Design

Last updated: 02-Nov-10 02:05


Phase 5: Build
Task 5.4 Design and Develop Data Integration Processes
Description
A properly designed data integration process performs better and makes more efficient use of machine resources than a poorly designed process. This task includes the necessary steps for developing a comprehensive design plan for the data integration process, which incorporates high-level standards such as error-handling strategies, and overall load-processing strategies, as well as specific details and benefits of individual mappings. Many development delays and oversights are attributable to an incomplete or incorrect data integration process design, thus underscoring the importance of this task. When complete, this task should provide the development team with all of the detailed information necessary to construct the data integration processes with minimal interaction with the design team. This goal is somewhat unrealistic, however, because requirements are likely to change, design elements need further clarification, and some items are likely to be missed during the design process. Nevertheless, the goal of this task should be to capture and document as much detail as possible about the data integration processes prior to development.

Prerequisites
None

Roles

Business Analyst (Primary) Data Integration Developer (Primary) Data Warehouse Administrator (Secondary) Database Administrator (DBA) (Primary) Quality Assurance Manager (Primary) Technical Project Manager (Review Only)

Considerations
The PowerCenter platform provides facilities for developing and executing mappings for extraction, transformation, and load operations. These mappings determine the flow of data between sources and targets, including the business rules applied to the data before it reaches a target. Depending on the complexity of the transformations, moving data can be a simple matter of passing data straight from a data source through an expression transformation to a target, or may involve a series of detailed transformations that use complicated expressions to manipulate the data before it reaches the target. The data may also undergo data quality operations inside or outside PowerCenter mappings; note also that some business rules may be closely aligned with data quality issues. (Pre-emptive steps to define business rules and to avoid data errors may have been performed already as part of task 5.3 Design and Build Data Quality Process.)

It is important to capture design details at the physical level. Mapping specifications should address field sizes, transformation rules, methods for handling errors or unexpected results in the data, and so forth. This is the stage where business rules are transformed into actual physical specifications, avoiding the use of vague terms and moving any business terminology to a separate "business description" area. For example, a field that stores "Total Cost" should not have a formula that reads 'Calculate total customer cost.' Instead, the formula for 'Total Cost' should be documented as: Orders.Order_Qty * Item.Item_Price - Customer.Item_Discount where Order.Item_Num = Item.Item_Num and Order.Customer_Num = Customer.Customer_Num. (A conceptual implementation of this documented rule is sketched at the end of this section.)

Data Migration projects differ from typical data integration projects in that they should have an established process and templates for most processes that are developed. This is because development is accelerated and more time is spent on data quality and on driving out incomplete business rules than on traditional development. For migration projects, the data integration processes can be further subdivided into the following processes:

Develop Acquire Processes
Develop Convert Processes
Develop Migrate/Load Processes
Develop Audit Processes
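Referring back to the 'Total Cost' example above, a physical-level specification should be unambiguous enough to translate directly into an executable rule. The sketch below applies the documented formula to already-joined rows in plain Python; the table and column names come from the example in the text, and the sample values are hypothetical.

def total_cost(order, item, customer):
    """Total Cost = Orders.Order_Qty * Item.Item_Price - Customer.Item_Discount,
    applied after joining on Item_Num and Customer_Num."""
    assert order["Item_Num"] == item["Item_Num"]
    assert order["Customer_Num"] == customer["Customer_Num"]
    return order["Order_Qty"] * item["Item_Price"] - customer["Item_Discount"]

# Hypothetical sample rows representing the joined result set.
order    = {"Order_Num": 1, "Item_Num": 77, "Customer_Num": 5, "Order_Qty": 3}
item     = {"Item_Num": 77, "Item_Price": 19.99}
customer = {"Customer_Num": 5, "Item_Discount": 5.00}

print(total_cost(order, item, customer))   # 54.97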

Best Practices
Real-Time Integration with PowerCenter

Sample Deliverables
None

Last updated: 27-May-08 16:19


Phase 5: Build
Subtask 5.4.1 Design High Level Load Process
Description
Designing the high-level load process involves the factors that must be considered outside of the mapping itself. Determining load windows, availability of sources and targets, session scheduling, load dependencies, and session-level error handling are all examples of issues that developers should deal with in this task. Creating a solid load process is an important part of developing a sound data integration solution. This subtask incorporates three steps, all of which involve specific activities, considerations, and deliverables. The steps are:

1. Identify load requirements. In this step, members of the development team work together to determine the load window. The load window is the amount of time it will take to load an individual table or an entire data warehouse or data mart. To begin this step, the team must have a thorough understanding of the business requirements developed in task 1.1 Define Project. The team should also consider the differences between the requirements for initial and subsequent loading; tables may be loaded differently in the initial load than they will be subsequently. The load document generated in this step describes the rules that should be applied to the session or mapping in order to complete the loads successfully.

2. Determine dependencies. In this step, the Database Administrator works with the Data Warehouse Administrator and Data Integration Developer to identify and document the relationships and dependencies that exist between tables within the physical database. These relationships affect the way in which a warehouse is loaded. In addition, the developers should consider other environmental factors, such as database availability, network availability, and other processes that may be executing concurrently with the data integration processes.

3. Create initial and ongoing load plan. In this step, the Data Integration Developer and Business Analyst use information created in the two earlier steps to develop a load plan document; this lists the estimated run times for the batches and sessions required to populate the data warehouse and/or data marts.

Prerequisites
None

Roles

Business Analyst (Review Only) Data Integration Developer (Primary) Data Warehouse Administrator (Secondary) Database Administrator (DBA) (Primary) Quality Assurance Manager (Approve) Technical Project Manager (Review Only)

Considerations

Determining Load Requirements


The load window determined in step 1 of this subtask can be used by the Data Integration Developers as a performance target. Mappings should be tailored to ensure that their sessions run to successful completion within the constraints set by the load window requirements document. To assist with this goal, the Database Administrator, Data Warehouse Administrator, and Technical Architect are responsible for ensuring that their respective environments are tuned properly to allow for maximum throughput.

Subsequent loads of a table are often performed differently than the initial load, and the initial load may involve only a subset of the database operations used by subsequent loads. For example, if the primary focus of a mapping is an update of a dimension, the dimension table will be empty before the first load of the warehouse. Consequently, the first load will perform a large number of inserts, while subsequent loads may perform a smaller number of both insert and update operations. The development team should consider and document such situations and convey the different load requirements to the developer creating the mappings and to the operations personnel configuring the sessions.
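The insert-only versus insert/update (upsert) distinction can be sketched generically. The example below uses Python's sqlite3 module and a hypothetical customer dimension; it only illustrates the pattern, not the PowerCenter session settings that would normally implement it.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (cust_id TEXT PRIMARY KEY, name TEXT)")

def initial_load(rows):
    """First load: the dimension is empty, so every row is an insert."""
    conn.executemany("INSERT INTO dim_customer (cust_id, name) VALUES (?, ?)", rows)

def incremental_load(rows):
    """Subsequent loads: update existing keys, insert new ones."""
    for cust_id, name in rows:
        updated = conn.execute(
            "UPDATE dim_customer SET name = ? WHERE cust_id = ?", (name, cust_id)
        ).rowcount
        if updated == 0:
            conn.execute("INSERT INTO dim_customer (cust_id, name) VALUES (?, ?)",
                         (cust_id, name))

initial_load([("C1", "Ann"), ("C2", "Bob")])
incremental_load([("C2", "Robert"), ("C3", "Cara")])   # one update, one insert
print(conn.execute("SELECT * FROM dim_customer ORDER BY cust_id").fetchall())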

Identifying Dependencies
Foreign key (i.e., parent / child) relationships are the most common variable that should be considered in this step. When designing the load plan, the parent table must always be loaded before the child table, or integrity constraints (if applied) will be broken and the data load will fail. The Data Integration Developer is responsible for documenting these dependencies at a mapping level so that loads can be planned to coordinate with the existence of dependent relationships. The Developer should also consider and document other variables such as source and target database availability, network up/down time, and local server processes unrelated to PowerCenter when designing the load schedule.

TIP Load parent / child tables in the same mapping to speed development and reduce the number of sessions that must be managed. To load tables with parent / child relationships in the same mapping, use the constraint-based loading option at the session level. Use the target load plan option in PowerCenter Designer to ensure that the parent table is marked to be loaded first. The parent table keys will be loaded before an associated child foreign key is loaded into its table.
The load plans should be designed around the known availability of both source and target databases; it is particularly important to consider the availability of source systems, as these systems are typically beyond the operational control of the development team. Similarly, if sources or targets are located across a network, the development team should consult with the Network Administrator to discuss network capacity and availability in order to avoid poorly performing batches and sessions. Finally, although unrelated local processes executing on the server are not likely to cause a session to fail, they can severely decrease performance by keeping available processors and memory away from the PowerCenter server engine, thereby slowing throughput and possibly causing a load window to be missed.
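The parent-before-child rule and the documented table dependencies lend themselves to a simple ordering check. The following is a minimal sketch (not part of Velocity or PowerCenter) that derives a load order from parent/child pairs recorded in the dependency documentation; the table names are hypothetical.

```python
from collections import defaultdict, deque

def load_order(dependencies):
    """Return a load order in which every parent table precedes its children.

    dependencies: iterable of (parent, child) pairs taken from the
    dependency documentation produced in step 2 of this subtask.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    tables = set()
    for parent, child in dependencies:
        children[parent].append(child)
        indegree[child] += 1
        tables.update((parent, child))

    # Start with tables that have no parents; they can be loaded first.
    queue = deque(sorted(t for t in tables if indegree[t] == 0))
    order = []
    while queue:
        table = queue.popleft()
        order.append(table)
        for child in children[table]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if len(order) != len(tables):
        raise ValueError("Circular dependency detected; review the load plan")
    return order

# Hypothetical dimension/fact relationships documented by the DBA and developer.
print(load_order([("DIM_CUSTOMER", "FACT_SALES"), ("DIM_PRODUCT", "FACT_SALES")]))
```

A listing such as this can be carried directly into the load plan document as the recommended session sequence.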

Best Practices
Event Based Scheduling
Leveraging PowerCenter Concurrent Workflows
Performing Incremental Loads

Sample Deliverables
None

Last updated: 17-Jun-10 15:26


Phase 5: Build
Subtask 5.4.2 Develop Error Handling Strategy
Description
After the high-level load process is outlined and source files and tables are identified, a decision needs to be made regarding how the load process will account for data errors. The identification of a data error within a load process is driven by the standards of acceptable data quality. The identification of a process error is driven by the stability of the process itself. It is unreasonable to expect any source system to contain perfect data. It is also unreasonable to expect any automated load process to execute correctly 100 percent of the time. Errors can be triggered by any number of events or scenarios, including session failure, platform constraints, bad data, time constraints, mismatched control totals, dependencies, or server availability.

The challenge in implementing an error handling strategy is to design mappings and load routines robust enough to handle any or all possible scenarios or events that may trigger an error during the course of the load process. The degree of complexity of the error handling strategy varies from project to project, depending on such variables as source data, target system, business requirements, load volumes, load windows, platform stability, end-user environments, and reporting tools. The error handling development effort should include all the work that needs to be performed to correct errors in a reliable, timely, and automated manner.

Several types of tasks within the Workflow Manager are designed to assist in error handling. The following is a subset of these tasks:
- Command Task allows the user to specify one or more shell commands to run during the workflow.
- Control Task allows the user to stop, abort, or fail the top-level workflow or the parent workflow based on an input-link condition.
- Decision Task allows the user to enter a condition that determines the execution of the workflow. This task determines how the PowerCenter Integration Service executes a workflow.
- Event Task specifies the sequence of task execution in a workflow. The event is triggered based on the completion of the sequence of tasks.
- Timer Task allows the user to specify the period of time to wait before the Integration Service executes the next task in the workflow. The user can choose to either set a specific time and date to start the next task or wait a period of time after the start time of another task.
- Email Task allows the user to configure email to be sent to an administrator or business owner in the event that an error is encountered by a workflow task.

The Data Integration Developer is responsible for determining:
- What data gets rejected,
- Why the data is rejected,
- When the rejected rows are discovered and processed,
- How the mappings handle rejected data, and
- Where the rejected data is written.

Data integration developers should find an acceptable balance between the end users' needs for accurate and complete information and the cost of additional time and resources required to repair errors. The Data Integration Developer should consult closely with the Data Quality Developer in making these determinations, and include in the discussion the outputs from tasks 2.8 Perform Data Quality Audit and 5.3 Design and Build Data Quality Process.

Prerequisites
None


Roles

Data Integration Developer (Primary) Database Administrator (DBA) (Secondary) Quality Assurance Manager (Approve) Technical Project Manager (Review Only)

Considerations
Data Integration Developers should address the errors that commonly occur during the load process in order to develop an effective error handling strategy. These errors include:

Session Failure. If a PowerCenter session fails during the load process, the failure of the session itself needs to be recognized as an error in the load process. The error handling strategy commonly includes a mechanism for notifying the process owner that the session failed, whether it is a message to a pager from operations or a post-session email from a PowerCenter Integration Service. There are several approaches to handling session failures within the Workflow Manager, including custom-written recovery routines with pre- and post-session scripts, workflow variables (such as the pre-defined task-specific variables or user-defined variables), and event tasks (e.g., the event-raise and event-wait tasks), which can be used to start specific tasks in reaction to a failed task.

Data Rejected by Platform Constraints. A load process may reject certain data if the data itself does not comply with database and data type constraints. For instance:
- The database server will reject a row if the primary key field(s) of that row already exist in the target.
- A PowerCenter Integration Service will reject a row if a date/time field is sent to a character field without implicitly converting the data.
In both of these scenarios, the data is rejected regardless of whether or not it was accounted for in the code. Although the data is rejected without developer intervention, accounting for it remains a challenge. In the first scenario, the data ends up in a reject file on the PowerCenter server. In the second scenario, the row of data is simply skipped by the Data Transformation Manager (DTM) and is not written to the target or to any reject file. Both scenarios require post-load reconciliation of the rejected data. An error handling strategy should account for data that is rejected in this manner, either by parsing reject files or by balancing control totals.

"Bad" Data. Bad data can be defined as data that enters the load process from one or more source systems but is prevented from entering the target systems, which are typically staging areas, end-user environments, or reporting environments. This data can be rejected by the load process itself or designated as "bad" by the mapping logic created by developers. Some of the reasons that bad data may be encountered between the time it is extracted from the source systems and the time it is loaded to the target include:
- The data is simply incorrect.
- The data violates business rules.
- The data fails on foreign key validation.
- The data is converted improperly in a transformation.
The strategy that is implemented to handle these types of errors determines what data is available to the business as well as the accuracy of that data. This strategy can be developed with PowerCenter mappings, which flag records within the data flow for success or failure based on the data itself and the logic applied to that data. The records flagged for success are written to the target, while the records flagged for failure are written to a reject file or table for reconciliation.

Data Rejected by Time Constraints. Load windows are typically pre-defined before data is moved to the target system. A load window is the time that is allocated for a load process to complete (i.e., start to finish) based on data volumes, business hours, and user requirements. If a load process does not complete within the load window, notification and data that has not been committed to the target system must be incorporated in the error handling strategy. Notification can take place via operations, email, or page. Data that has not been loaded within the window can be written to staging areas or processed in recovery mode at a later time.

Irreconcilable Control Totals. One way to ensure that all data is being loaded properly is to compare control totals captured on each session. Control totals can be defined as detailed information about the data that is being loaded in a session. For example: how many records entered the job stream? How many records were written to target X? How many records were written to target Y? A post-session script can be launched to reconcile the total records read into the job stream with the total numbers written to the target(s); if the number in does not match the number out, there may have been an error somewhere in the load process (a minimal reconciliation sketch follows this list). To a degree, the PowerCenter session logs and repository tables store this type of information. Depending on the level of detail desired, some organizations run post-session reports against the repository tables and parse the log files. Others, wishing to capture more in-depth information about their loads, incorporate control totals in their mapping logic, spinning off check sums, row counts, and other calculations during the load process. These totals are then compared to figures generated by the source systems, triggering notification when the numbers do not match. The pmcmd command gettaskdetails provides information to assist in analyzing the loads. Issuing this command for a session task returns various data regarding a workflow, including the mapping name, session log file name, first error code and message, number of successful and failed rows from the source and target, and the number of transformation errors.

Job Dependencies. Sessions and workflows can be configured to run based on dependencies. For example, the start of a session can be dependent on the availability of a source file, or a batch may contain sessions that are dependent on each other's completion. If a session or batch fails at any point in the load process because of a dependency violation, the error handling strategy should catch the problem.
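As a simple illustration of the control-total balancing described above, the following sketch compares the rows read into the job stream with the rows written to each target and flags any difference for notification. It is a generic post-session check, not a PowerCenter API; the counts are assumed to have been collected elsewhere (for example, from session logs or repository reports), and it assumes each source row is routed to exactly one target or to the reject file.

```python
def reconcile_control_totals(rows_read, rows_written_by_target, rows_rejected=0):
    """Compare source row counts with target row counts for one session.

    rows_read: total records that entered the job stream.
    rows_written_by_target: dict of target name -> rows written.
    rows_rejected: rows intentionally routed to reject files or tables.
    Returns a list of discrepancy messages; an empty list means the totals balance.
    """
    issues = []
    total_written = sum(rows_written_by_target.values())
    if rows_read != total_written + rows_rejected:
        issues.append(
            f"Row count mismatch: read {rows_read}, "
            f"wrote {total_written}, rejected {rows_rejected}"
        )
    for target, count in rows_written_by_target.items():
        if count == 0:
            issues.append(f"No rows written to {target}; verify the load")
    return issues

# Hypothetical post-session check; counts would come from logs or repository reports.
problems = reconcile_control_totals(
    rows_read=10_000,
    rows_written_by_target={"FACT_SALES": 9_950, "RJCT_SALES": 0},
    rows_rejected=45,
)
for p in problems:
    print("NOTIFY:", p)
```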

The use of events allows the sequence of execution within a workflow to be specified; an event raised on completion of one set of tasks triggers the initiation of another. There are two event tasks that can be included in a workflow: event-raise and event-wait tasks. An event-wait task instructs the Integration Service to wait for a specific event to be raised before continuing with the workflow, while an event-raise task triggers an event at a particular point in a workflow. Events themselves can either be defined by the user or predefined (i.e., a file watch event) by PowerCenter.
Server Availability. If a node is unavailable at runtime, any sessions and workflows scheduled on it will not be run if it is the only resource configured within a domain. Similarly, if a PowerCenter Integration Service goes down during a load process, the sessions and workflows currently running on it will fail if it is the only service configured in the domain. Problems such as this are usually directly related to the stability of the server platform; network interrupts do happen, database servers do occasionally go down, and log/file space can inadvertently fill up. A thorough error handling strategy should assess and account for the probability of services not being available 100 percent of the time. This strategy may vary considerably depending on the PowerCenter configuration employed. For example, PowerCenter's High Availability options can be harnessed to eliminate many single points of failure within a domain and can help to ensure minimal service interruption.

Ensure Data Accuracy and Integrity


In addition to anticipating the common load problems, developers need to investigate potential data problems and the integrity of source data. One of the main goals of the load process is to ensure the accuracy of the data that is committed to the target systems. Because end users typically build reports from target systems and managers make decisions based on their content, the data in these systems must be sufficiently accurate to provide users with a level of confidence that the information they are viewing is correct. The accuracy of the data, before any logic is applied to it, is dependent on the source systems from which it is extracted. It is important, therefore, for developers to identify the source systems and thoroughly examine the data in them. Task 2.8 Perform Data Quality Audit is specifically designed to establish such knowledge about project data quality, and task 5.3 Design and Build Data Quality Process is designed specifically to eliminate data quality problems as far as possible before data enters the Build Phase of the project. In the absence of dedicated data quality steps such as these, one approach is to estimate, along with source owners and data stewards, how much of the data is still bad (vs. good) on a column-by-column basis, and then to determine which data can be fixed in either the source or the mappings, and which does not need to be fixed before it enters the target. However, the former approach is preferable because it (1) provides metrics to business and project personnel and (2) provides an effective means of addressing data quality problems.

Data integrity deals with the internal relationships of the data in the system and how those relationships are maintained (i.e., data in one table must match corresponding data in another table). When relationships cannot be maintained because of incorrect information entered from the source systems, the load process needs to determine if processing can continue or if the data should be rejected. Including lookups in a mapping is a good way of checking for data integrity. Lookup tables are used to match and validate data based upon key fields. The error handling process should account for the data that does not pass validation. Ideally, data integrity issues will not arise, since the data has already been processed in the steps described in task 4.6.

Determine Responsibility For Data Integrity/Business Data Errors


Since it is unrealistic to expect any source system to contain data that is 100 percent accurate, it is essential to assign the responsibilities for correcting data errors. Taking ownership of these responsibilities throughout the project is vital to correcting errors during the load process. Specifically, individuals should be held accountable for:
- Providing business information
- Understanding the data layout
- Data stewardship (understanding the meaning and content of data elements)
- Delivering accurate data

Part of the load process validates that the data conforms to known rules from the business. When these rules are not met by the source system data, the process should handle these exceptions in an appropriate manner. End users should either accept the consequences of permitting invalid data to enter the target system or they should choose to reject the invalid data. Both options involve complex issues for the business organization.

The individuals responsible for providing business information to the developers must be knowledgeable and experienced in both the internal operations of the organization and the common practices of the relevant industry. It is important to understand the data and functionality of the source systems as well as the goals of the target environment. If developers are not familiar with the business practices of the organization, it is practically impossible to make valid judgments about which data should be allowed in the target system and which data should be flagged for error handling. The primary purpose for developing an error handling strategy is to prevent data that inaccurately portrays the state of the business from entering the target system. Providers of business information play a key role in distinguishing good data from bad.

The individuals responsible for maintaining the physical data structures play an equally crucial role in designing the error handling strategy. These individuals should be thoroughly familiar with the format, layout, and structure of the data. After understanding the business requirements, developers must gather data content information from the individuals who have firsthand knowledge of how the data is laid out in the source systems and how it is to be presented in the target systems. This knowledge helps to determine which data should be allowed in the target system based on the physical nature of the data as opposed to the business purpose of the data.

Data stewards, or their equivalent, are responsible for the integrity of the data in and around the load process. They are also responsible for maintaining translation tables, codes, and consistent descriptions across source systems. Their presence is not always required, depending on the scope of the project, but if a data steward is designated, he or she will be relied upon to provide developers with insight into such things as valid values, standard codes, and accurate descriptions. This type of information, along with robust business knowledge and a degree of familiarity with the data architecture, gives the Build team the necessary level of confidence to implement an error handling strategy that can ensure the delivery of accurate data to the target system. Data stewards are also responsible for correcting the errors that occur during the load process in their field of expertise. If, for example, a new code is introduced from the source system that has no equivalent in a translation table, it should be flagged and presented to the data steward for review. The data steward can determine whether the code should be added to the translation table or whether it was correctly flagged as an error. The goal is to have the developers design the error handling process according to the information provided by the experts; the error handling process should recognize the errors and report them to the owners with the relevant expertise to fix them.

For Data Migration projects, it is important to develop a standard method to track data exceptions. Normally this tracking data is stored in a relational database with a corresponding set of exception reports. By developing this standardized strategy, all data cleansing and data correction development will be expedited because there is a predefined method of determining what exceptions have been raised and which data caused each exception.

Best Practices
Disaster Recovery Planning with PowerCenter HA Option
Error Handling Process
Error Handling Strategies - B2B Data Transformation
Error Handling Strategies - Data Warehousing
Error Handling Strategies - General
Error Handling Techniques - PowerCenter Mappings
Error Handling Techniques - PowerCenter Workflows and Data Analyzer

Sample Deliverables
None

Last updated: 24-Jun-10 12:59


Phase 5: Build
Subtask 5.4.3 Plan Restartability Process
Description
The process of updating a data warehouse with new data is sometimes described as "conducting a fire drill". This is because it often involves performing data updates within a tight timeframe, taking all or part of the data warehouse off-line while new data is loaded. While the update process is usually very predictable, it is possible for disruptions to occur, stopping the data load in mid-stream. To minimize the amount of time required for data updates and further ensure the quality of data loaded into the warehouse, the development team must anticipate and plan for potential disruptions to the loading process. The team must design the data integration platform so that the processes for loading data into the warehouse can be restarted efficiently in the event that they are stopped or disrupted.

Prerequisites
None

Roles

Data Integration Developer (Primary) Database Administrator (DBA) (Secondary) Quality Assurance Manager (Approve) Technical Project Manager (Review Only)

Considerations
Providing backup schemas for sources and staging areas for targets is one step toward improving the efficiency with which a stopped or failed data loading process can be restarted. Source data should not be changed prior to restarting a failed process, as this may cause the PowerCenter server to return missing or repeated values. A backup source schema allows the warehouse team to store a snapshot of source data, so that the failed process can be restarted using its original source. Similarly, providing a staging area for target data gives the team the flexibility of truncating target tables prior to restarting a failed process, if necessary. If flat file sources are being used, all sources should be date-stamped and stored until the loading processes using those sources have successfully completed. A script can be incorporated into the data update process to delete or move flat file sources only upon successful completion of the update, as in the sketch below.

A second step in planning for efficient restartability is to configure PowerCenter sessions so that they can be easily recovered. Sessions in workflows manage the process of loading data into a data warehouse.
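The following is a minimal sketch of such a script. The directory paths, file pattern, and the success flag are placeholders to be replaced by project standards; the point is simply that source files are only archived after the update reports success, so a failed load can be rerun against the original files.

```python
import shutil
from datetime import date
from pathlib import Path

def archive_sources_on_success(load_succeeded,
                               source_dir="/data/inbound",
                               archive_dir="/data/archive"):
    """Move flat-file sources to a date-stamped archive folder, but only
    when the update process reports success. On failure the files are left
    in place so the failed process can be restarted against its original source."""
    if not load_succeeded:
        print("Load failed; leaving source files in place for restart")
        return
    target = Path(archive_dir) / date.today().isoformat()
    target.mkdir(parents=True, exist_ok=True)
    for f in Path(source_dir).glob("*.dat"):
        shutil.move(str(f), str(target / f.name))
        print(f"Archived {f.name}")

# Example usage (paths are hypothetical):
# archive_sources_on_success(load_succeeded=True)
```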

TIP You can configure the links between sessions to only trigger downstream sessions upon success status. Also, PowerCenter versions 6 and above have the ability to configure a Workflow to Suspend on Error. This places the workflow in a state of suspension, so that the environmental problem can be assessed and fixed, while the workflow can be resumed from the point of suspension. Follow these steps to identify and create points of recovery within a workflow:
1. Identify the major recovery points in the workflow. For example, suppose a workflow has tasks A, B, C, D, and E that run in sequence. If a failure occurs at task A, you can restart the task and the workflow will automatically recover. Since session A is able to recover by merely restarting it, it is not a major recovery point. On the other hand, if task B fails, it will impact data integrity or subsequent runs; this means that task B is a major recovery point. All tasks that may impact data integrity or subsequent runs should be recovery points.
2. Identify the strategy for recovery:
- Build restartability into the mapping. If data extraction from the source is datetime-driven, create a delete path within the mapping and run the workflow in suspend mode.
- When configuring sessions, if multiple sessions are to be run, arrange the sessions sequentially within a workflow. This is particularly important if mappings in later sessions are dependent on data created by mappings in earlier sessions.
- Include transaction controls in mappings.
- Create a copy of the workflow and create a session-level override and start-from date where recovery is required. One option is to delete records from the target and restart the process. In some cases a special mapping that has a filter on the source may be required; this filter should be based on the recovery date or other relevant criteria (a sketch of this approach follows the list).
3. Use the high availability feature in PowerCenter.
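One way to picture the recovery-date approach from step 2 is to persist a watermark for the last successful run and, on recovery, remove and re-extract only the rows loaded after that point. The sketch below is illustrative only: the table, the LOAD_TS audit column, and the use of SQLite are assumptions for the example, and in PowerCenter the equivalent filter would typically be driven by a mapping parameter or variable rather than a script.

```python
import sqlite3

def restart_from_watermark(conn, watermark):
    """Delete target rows loaded after the last successful run, then
    return the filter value to use when re-extracting from the source.
    `conn` is any DB-API connection; LOAD_TS is a hypothetical audit column."""
    cur = conn.cursor()
    cur.execute("DELETE FROM FACT_SALES WHERE LOAD_TS > ?", (watermark,))
    conn.commit()
    print(f"Removed {cur.rowcount} partially loaded rows")
    # The same watermark becomes the 'start-from' date for re-extraction.
    return watermark

# Hypothetical usage with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FACT_SALES (ID INTEGER, LOAD_TS TEXT)")
conn.execute("INSERT INTO FACT_SALES VALUES (1, '2011-01-01'), (2, '2011-01-03')")
start_from = restart_from_watermark(conn, "2011-01-02")
print("Re-extract source rows with change date >", start_from)
```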

Other Ways to Design Restartability


PowerCenter Workflow Manager provides the ability to use post-session emails or to create email tasks to send notification to designated recipients informing them about a session run. Consider the following practices:
- Configure sessions so that an email is sent to the Workflow Operator when a session or workflow fails. This allows the operator to respond to the failed session as soon as possible.
- On the session property screen, configure the session to stop if errors occur in pre-session scripts. If the session stops, review and revise scripts as necessary.
- Determine whether or not a session really needs to be run in bulk mode. Successful recovery of a bulk-load session is not guaranteed, as bulk loading bypasses the database log. While running a session in bulk mode can increase session performance, it may be easier to recover a large, normal-load session than to truncate targets and re-run a bulk-loaded session.
- If a session stops because it has reached a designated number of non-fatal errors (such as Reader, Writer, or DTM errors), consider increasing the number of non-fatal errors allowed, or de-selecting the "Stop On" option in the session property screen. Always examine log files when a session stops, and research and resolve the potential reasons for the stop.

Data Migration projects often need to migrate significant volumes of data, so restart processing should be considered in the Architect Phase and throughout the Design Phase and Build Phase. In many cases a full refresh is the best course of action. However, if large amounts of data need to be loaded, the final load processes should include a restart processing design, which should be prototyped during the Architect Phase. This limits the amount of time lost if any large-volume load fails.

Best Practices
Disaster Recovery Planning with PowerCenter HA Option

Sample Deliverables
None

Last updated: 04-Dec-07 18:16


Phase 5: Build
Subtask 5.4.4 Develop Inventory of Mappings & Reusable Objects
Description
The next step in designing the data integration processes is breaking the development work into an inventory of components. These components then become the work tasks that are divided among developers and subsequently unit tested. Each of these components would help further refine the project plan by adding the next layer of detail for the tasks related to the development of the solution.

Prerequisites
None

Roles

Data Integration Developer (Primary)

Considerations
The smallest divisions of assignable work in PowerCenter are typically mappings and reusable objects. The Inventory of Reusable Objects and Inventory of Mappings created during this subtask are valuable high-level lists of development objects that need to be created for the project. Naturally, the lists will not be completely accurate at this point; they will be added to and subtracted from over the course of the project and should be continually updated as the project moves forward. Despite the ongoing changes, however, these lists are valuable tools, particularly from the perspective of the lead developer and project manager, because the objects on these lists can be assigned to individual developers and their progress tracked over the course of the project.

A common mistake is to assume that a source-to-target mapping document equates to a single mapping. This is often not the case; to load any one target table it may easily take more than one mapping to perform all of the tasks needed to correctly populate the table. Assume the case of loading a data warehouse dimension table for which you have one source-to-target matrix document. You might then generate a:
- Source to Staging Area Mapping (incremental)
- Data Cleansing and Rationalization Mapping
- Staging to Warehouse Update/Insert Mapping
- Primary Key Extract (full extract of primary keys used in the delete mapping)
- Logical Delete Mapping (mark dimension records as deleted if they no longer appear in the source)

It is important to break down the work to this level of detail because, from the list above, you can see how a single source-to-target matrix may generate five separate mappings that could each be developed by different developers. From a project planning perspective, it is then useful to track each of these five mappings separately for status and completion. Also included in the mapping inventory are the special-purpose mappings that are involved in the end-to-end process but not specifically defined by the business requirements and source-to-target matrixes. These include audit mappings, aggregate mappings, mapping generation mappings, templates, and other objects that will need to be developed during the build phase.

For reusable objects, it is important to keep a holistic view of the project in mind when determining which objects are reusable and which are custom built. Sometimes an object that would seem sharable across any mapping making use of it may need different versions depending on its purpose. Having a list of the common objects that are being developed across the project allows individual developers to better plan their mapping-level development efforts. By knowing that a particular mapping is going to utilize four reusable objects, they can focus on the work unique to that particular mapping and not duplicate the functionality of the four reusable objects. This is another area where Metadata Manager can become very useful for developers who want to do where-used analysis for objects. As a result of the processes and tools implemented during the project, developers can achieve the communication and coordination needed to improve productivity.
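A mapping inventory does not require sophisticated tooling; a structured list that can be assigned and tracked is enough. The sketch below shows one possible record layout for the five mappings in the dimension example above. The field names, mapping names, and status values are illustrative assumptions, not a Velocity standard.

```python
from dataclasses import dataclass

@dataclass
class MappingInventoryItem:
    name: str          # following project naming standards
    purpose: str       # e.g., incremental extract, cleansing, update/insert
    target: str
    developer: str
    status: str        # e.g., Not Started, In Development, Unit Tested

inventory = [
    MappingInventoryItem("m_CUST_src_to_stg", "Source to staging (incremental)",
                         "STG_CUSTOMER", "TBD", "Not Started"),
    MappingInventoryItem("m_CUST_cleanse", "Data cleansing and rationalization",
                         "STG_CUSTOMER_CLN", "TBD", "Not Started"),
    MappingInventoryItem("m_CUST_stg_to_dim", "Staging to warehouse update/insert",
                         "DIM_CUSTOMER", "TBD", "Not Started"),
    MappingInventoryItem("m_CUST_pk_extract", "Primary key extract for delete processing",
                         "PK_CUSTOMER", "TBD", "Not Started"),
    MappingInventoryItem("m_CUST_logical_delete", "Mark dimension rows deleted when absent from source",
                         "DIM_CUSTOMER", "TBD", "Not Started"),
]

for item in inventory:
    print(f"{item.name:25} {item.status:15} {item.purpose}")
```

Whether the inventory lives in a spreadsheet, a database table, or a simple structure such as this matters less than keeping it continually updated as objects are added, assigned, and completed.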

Best Practices
Working with Pre-Built Plans in Data Cleanse and Match

Sample Deliverables
Mapping Inventory

Last updated: 01-Feb-07 18:47


Phase 5: Build
Subtask 5.4.5 Design Individual Mappings & Reusable Objects
Description
After the Inventory of Mappings and Inventory of Reusable Objects is created, the next step is to provide detailed design for each object on each list. The detailed design should incorporate sufficient detail to enable developers to complete the task of developing and unit testing the reusable objects and mappings. These details include specific physical information, down to the table, field, and datatype level, as well as error processing and any other information requirements identified.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Integration Developer (Primary)

Considerations
A detailed design must be completed for each of the items identified in the Inventory of Mappings and Inventory of Reusable Objects. Developers use the documents created in subtask 5.4.4 Develop Inventory of Mappings & Reusable Objects to construct the mappings and reusable objects, as well as any other required processes.

Reusable Objects
Three key items should be documented for the design of reusable objects: inputs, outputs, and the transformations or expressions in between. Developers who have a clear understanding of what reusable objects are available are likely to create better mappings that are easy to maintain. For the project, consider creating a shared folder for common objects like sources, targets, and transformations. When you want to use these objects, you can create shortcuts that point to the object. Document the process and the available objects in the shared folder. In a multi-developer environment, assign a developer the task of keeping the objects organized in the folder, and updating sources and targets when appropriate. It is crucial to document reusable objects, particularly in a multi-developer environment. For example, if one developer creates a mapplet that calculates tax rate, the other developers must understand the mapplet in order to use it properly. Without documentation, developers have to browse through the mapplet objects to try to determine what the mapplet is doing. This is time consuming and often overlooks vital components of the mapplet. Documenting reusable objects provides a comprehensive overview of the workings of relevant objects and helps developers determine if an object is applicable in a specific situation.

Mappings
Before designing a mapping, it is important to have a clear picture of the end-to-end processes that the data will flow through. Then, design a high-level view of the mapping and document a picture of the process within the mapping, using a textual description to explain exactly what the mapping is supposed to accomplish and the methods or steps it follows to accomplish its goal.

After the high-level flow has been established, it is important to document pre-mapping logic. Special joins for the source, filters, or conditional logic should be made clear upfront. The data being extracted from the source system dictates how the developer implements the mapping. Next, document the details at the field level, listing each of the target fields and the source field(s) that are used to create the target field. Document any expression that may take place in order to generate the target field (e.g., a sum of a field, a multiplication of two fields, a comparison of two fields, etc.). Whatever the rules, be sure to document them and remember to keep them at a physical level. The designer may have to do some investigation at this point for business rules as well. For example, the business rules may say, "For active customers, calculate a late fee rate". The designer of the mapping must determine that, on a physical level, that translates to 'for customers with an ACTIVE_FLAG of "1", multiply the DAYS_LATE field by the LATE_DAY_RATE field'.

Document any other information about the mapping that is likely to be helpful in developing it. Helpful information may, for example, include source and target database connection information, lookups and how to match data in the lookup tables, data cleansing needed at a field level, potential data issues at a field level, any known issues with particular fields, pre- or post-mapping processing requirements, and any information about specific error handling for the mapping. The completed mapping design should then be reviewed with one or more team members for completeness and adherence to the business requirements. The design document should be updated if the business rules change or if more information is gathered during the build process. The mapping and reusable object detailed designs are a crucial input for building the data integration processes, and can also be useful for system and unit testing. The specific details used to build an object are useful for developing the expected results to be used in system testing.

For Data Migrations, the mappings are often very similar for some of the stages, such as populating the reference data structures, acquiring data from the source, loading the target, and auditing the loading process. In these cases, it is likely that a detailed template is documented for these mapping types. For mapping-specific alterations, such as converting data from source to target format, individual mapping designs may be created. This strategy reduces the sheer documentation required for the project, while still providing sufficient detail to develop the solution.
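To make the physical-level translation concrete, the late-fee rule above might be captured in the design document as a simple conditional expression. The sketch below uses Python purely for illustration; in a PowerCenter Expression transformation the same logic would typically be written as an IIF expression. The field names are taken from the example in the text, and the sample values are invented.

```python
def late_fee(row):
    """Physical translation of the business rule 'for active customers,
    calculate a late fee rate': ACTIVE_FLAG of "1" marks an active customer."""
    if row["ACTIVE_FLAG"] == "1":
        return row["DAYS_LATE"] * row["LATE_DAY_RATE"]
    return 0.0

# Example rows; values are illustrative only.
print(late_fee({"ACTIVE_FLAG": "1", "DAYS_LATE": 12, "LATE_DAY_RATE": 1.50}))  # 18.0
print(late_fee({"ACTIVE_FLAG": "0", "DAYS_LATE": 12, "LATE_DAY_RATE": 1.50}))  # 0.0
```

Recording the rule at this level of precision in the design document removes ambiguity for the developer and gives the tester an exact expected result.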

Best Practices
Key Management in Data Warehousing Solutions
Mapping Design
Mapping Templates

Sample Deliverables
None

Last updated: 17-Jun-10 15:18


Phase 5: Build
Subtask 5.4.6 Build Mappings & Reusable Objects
Description
With the analysis and design steps complete, the next priority is to put everything together and build the data integration processes, including the mappings and reusable objects. Reusable objects can be very useful in the mapping building process. By this point, most reusable objects should have been identified, although the need for additional objects may become apparent during the development work. Commonly-used objects should be put into a shared folder to allow for code reuse via shortcuts.

The mapping building process also requires adherence to naming standards, which should be defined prior to beginning this step. Developing, and consistently using, naming standards helps to ensure clarity and readability for the original developer and reviewers, as well as for the maintenance team that inherits the mappings after development is complete. In addition to building the mappings, this subtask involves updating the design documents to reflect any changes or additions found necessary to the original design. Accurate, thorough documentation helps to ensure good knowledge transfer and is critical to project success.

Once the mapping is completed, a session must be created for the mapping in Workflow Manager. A unit testing session can be created initially to test that the mapping logic is executing as designed. To identify and troubleshoot problems in more detail, the debug feature may be leveraged; this feature is useful for looking at the data as it flows through each transformation. Once the initial session testing proves satisfactory, pre- and post-session processes and session parameters should be incorporated and tested (if needed), so that the session and all of its processes are ready for unit testing.

Prerequisites
None

Roles

Data Integration Developer (Primary) Database Administrator (DBA) (Secondary)

Considerations
Although documentation for building the mapping already exists in the design document, it is extremely important to document the sources, targets, and transformations in the mapping at this point to help end users understand the flow of the mapping and ensure effective knowledge transfer. Importing the sources and targets is the first step in building a mapping. Although the targets and sources are determined during the Design Phase, the keys, fields, and definitions should be verified in this subtask to ensure that they correspond with the design documents.

TIP When data modeling or database design tools (e.g., CA ERwin, Oracle Designer/2000, or Sybase PowerDesigner) are used in the design phase, Informatica PowerPlugs can be helpful for extracting the data structure definitions of sources and targets. Metadata Exchange for Data Models extracts table, column, index, and relationship definitions, as well as descriptions, from a data model. This can save significant time because the PowerPlugs also import documentation and help users to understand the source and target structures in the mapping. For more information about Metadata Exchange for Data Models (PowerPlugs), refer to Informatica's web site (www.informatica.com) or the Metadata Exchange for Data Models manuals.
The design documents may specify that data can be obtained from numerous sources, including DB/2, Informix, SQL Server, Oracle, Sybase, ASCII/EBCDIC flat files (including OCCURS and REDEFINES), Enterprise Resource Planning (ERP) applications, and mainframes via PowerExchange data access products. The design documents may also define the use of target schema and specify numerous ways of creating the target schema. Specifically, target schema may be created:


- From scratch.
- From a default schema that is then modified as desired.
- With the help of the Cubes and Dimensions wizard (for multidimensional data models).
- By reverse-engineering the target from the database.
- With Metadata Exchange for Data Models.

TIP When creating sources and targets in PowerCenter Designer, be sure to include a description of the source/target in the object's comment section, and follow the appropriate naming standards identified in the design documentation (for additional information on source and target objects, refer to the PowerCenter User Guide).
Reusable objects are useful when standardized logic is going to be used in multiple mappings. A single reusable object is referred to as a mapplet. Mapplets represent a set of transformations and are constructed in the Mapplet Designer, much like creating a "normal" mapping. When mapplets are used in a mapping, they encapsulate logic into a single transformation object, making the flow of a mapping easier to understand. However, because mapplets hide their underlying logic, it is particularly important to carefully document their purpose and function.

Other types of reusable objects, such as reusable transformations, can also be very useful in mapping. When reusable transformations are used with mapplets, they facilitate overall mapping maintenance. Reusable transformations can be built in either of two ways:
- If the design specifies that a transformation should be reusable, it can be created in the Transformation Developer, which automatically creates reusable transformations.
- If shared logic is not identified until it is needed in more than one mapping, transformations created in the Mapping Designer can be designated as reusable in the Edit Transformation dialog box. Informatica recommends using this method with care, however, because after a transformation is changed to reusable, the change cannot be undone.

Changes to a reusable transformation are reflected immediately in all mappings that employ the transformation. When all the transformations are complete, everything must be linked together (as specified in the design documentation) and arrangements made to begin unit testing.

Best Practices
Data Connectivity using PowerCenter Connect for BW Integration Server
Data Connectivity using PowerExchange for SAP NetWeaver
Data Connectivity using PowerExchange for WebSphere MQ
Data Connectivity using PowerExchange for Web Services
Integrating Data Quality Mappings with PowerCenter
Development FAQs
Using Parameters, Variables and Parameter Files
Using Shortcut Keys in PowerCenter Designer
Working with JAVA Transformation Object
Mapping Auto-Generation
Mapping SDK

Sample Deliverables
None


Last updated: 30-Oct-10 18:43


Phase 5: Build
Subtask 5.4.7 Perform Unit Test
Description
The success of the solution rests largely on the integrity of the data available for analysis. If the data proves to be flawed, the solution initiative is in danger of failure. Complete and thorough unit testing is, therefore, essential to the success of this type of project. Within the presentation layer, there is always a risk of performing less than adequate unit testing. This is due primarily to the iterative nature of development and the ease with which a prototype can be deployed. Experienced developers are, however, quick to point out that data integration solutions and the presentation layers should be subject to more rigorous testing than transactional systems. To underscore this point, consider which poses a greater threat to an organization: sending a supplier an erroneous purchase order or providing a corporate vice president with flawed information about that supplier's ranking relative to other strategic suppliers?

Prerequisites
None

Roles

Business Analyst (Review Only) Data Integration Developer (Primary)

Considerations
Successful unit testing examines any inconsistencies in the transformation logic and ensures correct implementation of the error handling strategy. The first step in unit testing is to build a test plan (see Unit Test Plan). The test plan should briefly discuss the coding inherent in each transformation of a mapping and elaborate on the tests that are to be conducted. These tests should be based upon the business rules defined in the design specifications rather than on the specific code being tested. If unit tests are based only upon the code logic, they run the risk of missing inconsistencies between the actual code and the business rules defined during the Design Phase.

If the transformation types include data quality transformations (that is, transformations designed on the Data Quality Integration transformation that links to Informatica Data Quality (IDQ) software), then the data quality processes (or plans) defined in IDQ are also candidates for unit testing. Good practice holds that all data quality plans that are going to be used on project data, whether as part of a PowerCenter transformation or as a discrete process, should be tested before formal use on such data. Consider establishing a discrete unit test stage for data quality plans.

Test data should be available from the initial loads of the system. Depending on volumes, a sample of the initial load may be appropriate for development and unit testing purposes. It is important to use actual data in testing, since test data does not necessarily cover all of the anomalies that are possible with true data, and creating test data can be very time consuming. However, depending upon the quality of the actual data used, it may be necessary to create test data in order to test any exception, error, and/or value threshold logic that may not be triggered by actual data. While it is possible to analyze test data without tools, there are many good tools available for creating and manipulating test data. Some are useful in editing data in a flat file, and most offer some improvement in productivity. A detailed test script is essential for unit testing; the test script indicates the transformation logic being tested by each test record and should contain an expected result for each record.

TIP Session log tracing can be set in a mapping's transformation level, in a session's "Mapping" tab, or in a session's "Config Object" tab. For testing, it is generally good practice to override logging in a session's "Mapping" tab transformation properties. For instance, if you are testing the logic performed in a Lookup transformation, create a test session and only activate verbose data logging on the appropriate Lookup. This focuses the log file on the unit test at hand.


If you change the tracing level in the mapping itself, you will have to go back and modify the mapping after the testing has been completed. If you override tracing in a session's "Config Object" tab properties, this will affect all transformation objects in the mapping and potentially create a significantly larger session log to parse. It is also advisable to activate the test load option in the PowerCenter session properties and indicate the number of test records that are to be sourced; this ensures that the session does not write data to the target tables. After running a test session, analyze and document the actual results compared to the expected results outlined in the test script. Running the mapping in the Debugger also allows you to view the target data without the session writing data to the target tables, and the ability to change the data running through the mapping while in debug mode is extremely valuable because it allows you to test all conditions and logic as you step through the mapping, thereby ensuring appropriate results.
The first session should load test data into empty targets. After checking for errors from the initial load, a second run of test data should occur if the business requirements demand periodic updates to the target database. A thorough unit test should uncover any transformation flaws and document the adjustments needed to meet the data integration solution's business requirements.
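Because the test script pairs each test record with an expected result, the comparison itself can be automated. The following is a minimal sketch of comparing actual target rows against the expected results and recording mismatches for the defect log; the business key and row structures are hypothetical.

```python
def compare_results(expected_rows, actual_rows, key="CUSTOMER_ID"):
    """Compare expected and actual target rows keyed on a business key.
    Returns a list of defect descriptions suitable for the defect log."""
    defects = []
    actual_by_key = {row[key]: row for row in actual_rows}
    for exp in expected_rows:
        act = actual_by_key.get(exp[key])
        if act is None:
            defects.append(f"{key}={exp[key]}: expected row missing from target")
        elif act != exp:
            defects.append(f"{key}={exp[key]}: expected {exp}, got {act}")
    return defects

# Illustrative test script entries versus rows fetched from the test target.
expected = [{"CUSTOMER_ID": 1, "STATUS": "ACTIVE"}, {"CUSTOMER_ID": 2, "STATUS": "CLOSED"}]
actual = [{"CUSTOMER_ID": 1, "STATUS": "ACTIVE"}]
for defect in compare_results(expected, actual):
    print("DEFECT:", defect)
```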

Best Practices
None

Sample Deliverables
Defect Log
Defect Report
Test Condition Results

Last updated: 01-Feb-07 18:47


Phase 5: Build
Subtask 5.4.8 Conduct Peer Reviews
Description
Peer review is a powerful technique for uncovering and resolving issues that otherwise would be discovered much later in the development process (i.e., during testing) when the cost of fixing is likely to be much higher. The main types of object that can be subject to formal peer review are: documents, code, and configurations.

Prerequisites
None

Roles

Data Integration Developer (Secondary) Quality Assurance Manager (Primary)

Considerations
The peer review process encompasses several steps, which vary depending on the object (i.e., document, code, etc.) being reviewed. In general, the process should include these steps:
- When an author confirms that an object has reached a suitable stage for review, he or she communicates this to the Quality Assurance Manager, who then schedules a review meeting. The number of reviewers at the meeting depends on the type of review, but should be limited to the minimally acceptable number. For example, the review meeting for a design document may include the business analyst who specified the requirements, a design authority, and one or two technical experts in the particular design aspects. It is a good practice to select reviewers with a direct interest in the deliverable; for example, the DBA should be involved in reviewing the logical data model to ensure that he/she has sufficient information to conduct the physical design.
- If possible, the appropriate documents, code, and review checklist should be distributed prior to the review meeting to allow preparation.
- The author or Quality Assurance Manager should lead the meeting to ensure that it is structured and stays on point. The meeting should not be allowed to become bogged down in resolving defects, but should reach consensus on rating the object using a High/Medium/Low scale.
- During the meeting, reviewers should look at the object point-by-point and note any defects found in the Defect Log. Trivial items such as spelling or formatting errors should not be recorded in the log (to avoid clutter).
- If the number and impact of defects is small, the Quality Assurance Manager may decide to conduct an informal mini-review after the defects are corrected to ensure that all problems have been appropriately rectified. If the initial review meeting identifies a significant amount of required rework, the Quality Assurance Manager should schedule another review meeting with the same review team to ensure that all defects are corrected.

There are two main factors to consider when rating the impact of defects discovered during peer review: the effect on functionality and the saving in rework time. If a defect would result in a significant functional deficiency, or a large amount of rework later in the project, it should be rated as 'high impact'.

Metrics can be used to help in tracking the value of the review meetings. The cost of formal peer reviews is the man-time spent on meeting preparation, the review meeting itself, and the subsequent rework; this can be recorded in man-days. The benefit of such reviews is the potential time saved. Although this can be estimated when the defect is originally noted, such estimates are unlikely to be reliable. It may be better to assign a notional benefit, say, two hours for a low-impact defect, one day for a medium-impact defect, and two days for a high-impact defect. Adding up the benefit in man-days allows a direct comparison with cost (a small calculation sketch follows). If no net benefit is obtained from the peer reviews, the Quality Assurance Manager should investigate a less intensive review regime, which can be implemented across the project or, more likely, in specific areas of the project.
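The notional-benefit approach reduces to a simple calculation. The sketch below totals review cost against the notional benefit of the defects found, using the example values from the text (two hours for low impact, one day for medium, two days for high, with an eight-hour working day assumed); the figures are illustrative, not a prescribed standard.

```python
# Notional benefit per defect impact, in man-days (8-hour day assumed).
BENEFIT_DAYS = {"low": 0.25, "medium": 1.0, "high": 2.0}

def review_net_benefit(prep_days, meeting_days, rework_days, defects):
    """Return (cost, benefit, net) in man-days for one peer review.
    `defects` is a list of impact ratings such as ["high", "low", "medium"]."""
    cost = prep_days + meeting_days + rework_days
    benefit = sum(BENEFIT_DAYS[d.lower()] for d in defects)
    return cost, benefit, benefit - cost

# Hypothetical review of a design document.
cost, benefit, net = review_net_benefit(
    prep_days=1.0, meeting_days=0.5, rework_days=1.0,
    defects=["high", "medium", "low", "low"],
)
print(f"Cost {cost} man-days, benefit {benefit} man-days, net {net:+.2f}")
```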

Best Practices


None

Sample Deliverables
None

Last updated: 01-Feb-07 18:47


Phase 5: Build


Task 5.5 Design and Develop B2B Data Transformation Processes
Description
A properly designed and organized Data Transformation project performs better and makes better use of system resources. This task and its subtasks provide the steps that are necessary to create a proper design plan, including the use of inventory management as well as an error handling strategy. Project delays and rewrites occur when the project plan is incomplete or non-existent. The artifacts captured during these tasks can be reused for future projects and also allow for quick insight into what an application is attempting to do and with which component(s). Once information is captured for the design plan, the development team should be able to continue through implementation with little interaction with the design team. However, there will always be areas that need clarification and/or new requirements that need to be included in the original design.

Prerequisites
None

Roles

Data Transformation Developer (Primary) Quality Assurance Manager (Review Only) Technical Architect (Primary) Technical Project Manager (Secondary)

Considerations
The Data Transformation product provides two different environments: the Data Transformation Studio and the Data Transformation Server. The Data Transformation Studio gives the developer the ability to code and execute components. These components can be debugged and tuned before being pushed to the Data Transformation Server. The Data Transformation Server resides on the server where the main execution and processing of data is accomplished. It is important to capture any alteration to the Studio engine settings so that it is migrated over to the Server configuration script.

When creating the design specification, it is important to keep the business logic directly tied to the area where the work is being described. Do not create a separate business logic section to be referenced; this greatly slows down development and does not clearly let the developer(s) know when alterations and/or additions are made to the specification.

Best Practices
None

Sample Deliverables
None

Last updated: 02-Nov-10 00:11


Phase 5: Build
Subtask 5.5.1 Develop Inventory of B2B Data Transformation Processes
Description
Required components that need to be built as part of a Data Transformation project inventory include Streamers, Parsers, Mappers, and Serializers. In order to create any-to-any transformations, components may (and should) be chained. A Parser normally covers any format to XML. A Mapper covers one representation of XML to another, and a Serializer covers XML to any end-format. Streamers are normally used on the front-end to break down large files, although they can be used on the back-end as well. Streamers are available for textual or XML structures. The number of components and tgp files needed to properly build the project depends upon the source data and its complexity. In the event that custom services are required by the use case, they are built during this phase.

Note: A component may mean a Streamer, Parser, Mapper, or Serializer. More detailed design information for each of these components is addressed in 5.5.3 Design B2B Data Transformation Processes. This subtask identifies which components will be required to address the process identified.

Prerequisites
None

Roles

Data Transformation Developer (Primary) Technical Architect (Primary)

Considerations
When building the inventory for a Data Transformation project, the following questions should be answered to determine not just what components are required, but how many:
How many components will be required, based on the logical steps in transforming from source to target formats?
Will any of those components require supporting components or sub-components? If a process is very complex, or the handling of a data element requires a large number of steps to properly handle the conversion, it may require one or many extra components. These components add complexity as well as failure points to the process. The code for a sub-component should be kept small.
Is reusability within the transformation process possible? Can this component be used again in other components or in other projects? If so, consider making the process a separate project and placing it in the global directory so it can be utilized by other projects.
Will memory structures be required to hold data (i.e., lookups)? These require extra XSD structures and increase the amount of memory the project requires as a whole.
Is the source data over 50MB (i.e., is a Streamer needed)? Large data files consume enormous amounts of system memory and can delay the DT process from running while the file is loaded into memory. A Streamer allows DT to concurrently process large data files while keeping memory utilization low, although this does increase CPU utilization (see the sizing sketch after this list).
Does DT have a library for any of the data formats being processed? While any format can be processed by Data Transformation, the product offers a number of out-of-the-box libraries for the more common industry-standard formats (e.g., SWIFT, NACHA, HIPAA, EDI, etc.). Any of these libraries can be altered if needed, as some implementations alter the standard to conform to business requirements.
What formats are involved throughout the end-to-end transformation process? Knowing the data formats of both the source and the target allows for a better inventory development process. If the target format cannot be identified before development in, at a minimum, a to-be state, then enough detail should be captured to at least provide some direction with a wire frame. Not doing so can create many hours of recoding and unit testing.
Do any of the source structures need to be custom pre-processed by a transformer? If the source format needs a pre-processor run before it can be properly processed by the Data Transformation Engine, it will need to be reviewed against the size of the source data. A pre-processor cannot be executed until the source file or chunked Streamer segment is loaded into memory; the DT Engine then has to process the source data and reload it. These steps can be memory intensive.
Is there a dependency on another transformation service? If a DT process is dependent on another Transformation Service, it should be documented and placed where it can be utilized by other developers, as well as published to the DT Server.
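As a quick aid for the 50MB guideline in the list above, the following Python sketch flags candidate source files that may warrant a Streamer. The threshold constant and file names are illustrative assumptions, not product settings.

```python
# Illustrative helper (not part of any Informatica tooling): flags sources that
# exceed the ~50MB guideline discussed above, where a Streamer should be considered.
import os

STREAMER_THRESHOLD_BYTES = 50 * 1024 * 1024  # ~50MB guideline

def needs_streamer(path: str) -> bool:
    return os.path.getsize(path) > STREAMER_THRESHOLD_BYTES

# Example: evaluate candidate source files for the project inventory.
for source in ("orders_2011.edi", "claims_837.txt"):   # hypothetical file names
    if os.path.exists(source):
        verdict = "Streamer recommended" if needs_streamer(source) else "in-memory parse OK"
        print(source, "->", verdict)
```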

Best Practices
Naming Conventions - B2B Data Transformation

Sample Deliverables
Data Transformation Project Inventory

Last updated: 02-Nov-10 00:53


Phase 5: Build
Subtask 5.5.2 Develop B2B Error Handling and Validation Strategy
Description
Once the high-level design document has been completed and the component inventory is established, it is time to determine how errors in the project will be handled. Data Transformation has two categories of errors: Data Errors and Business Logic Errors. Data Errors occur when issues arise with the actual data being processed. The source data can, and usually does at some point, have issues; these can come from users not entering data properly, data transfer problems and many other possible sources. Business Logic Errors are always predefined and, in the case of migrations, are usually already set at project inception. Business Logic Errors are the hardest to test for and to build error handling strategies for. Data validation is typically performed against XSDs or industry standards such as HIPAA, SWIFT, EDI, etc. This allows DT to validate a complete message, specific pre-defined parts or any custom-defined content.
Data Errors can be caused by:
Invalid data types
Length issues
Null fields
Formatting issues
Business Logic Errors are typically:
Defined in the design documentation
Due to inaccurate mapping specifications
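The distinction between the two error categories can be illustrated with a small Python sketch. The record fields and rules below are hypothetical examples, not requirements from any particular project.

```python
# Hypothetical illustration of the two error categories described above.
import re

def data_errors(record: dict) -> list:
    """Data Errors: problems with the data itself (types, lengths, nulls, formats)."""
    errors = []
    if not record.get("account_id"):
        errors.append("null field: account_id")
    try:
        float(record.get("amount", "0"))
    except ValueError:
        errors.append("invalid data type: amount is not numeric")
    if len(record.get("currency", "")) != 3:
        errors.append("length issue: currency must be 3 characters")
    if record.get("date") and not re.match(r"\d{4}-\d{2}-\d{2}$", record["date"]):
        errors.append("formatting issue: date must be YYYY-MM-DD")
    return errors

def business_logic_errors(record: dict) -> list:
    """Business Logic Errors: predefined rules from the design documentation (this rule is made up)."""
    errors = []
    if record.get("currency") == "USD" and float(record.get("amount", "0")) > 1_000_000:
        errors.append("business rule: USD amounts above 1M require an approval flag")
    return errors

record = {"account_id": "A-100", "amount": "12.50", "currency": "US", "date": "2011-11-02"}
print(data_errors(record))
print(business_logic_errors(record))
```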

Prerequisites
None

Roles

Quality Assurance Manager (Review Only) Technical Architect (Primary) Technical Project Manager (Secondary)

Considerations
Data Transformation provides the developer with two different mechanisms for handling errors:
Custom Error Handling Functions allow the developer to build reusable code that can evaluate the rule, or even the data, about to be processed. Custom functions are needed most for Business Logic Errors, as those rules are usually unique to the given process or department.
Built-In Attributes help with the processing of the raw data and give the developer the following two abilities:
on_fail - The on_fail attribute gives the developer the ability to write issues directly to the Data Transformation log. This allows for more detail than a generic message written in the log. For larger source data, a CustomLog can be used, allowing a custom Serializer to create a specific response.


Validators - Validators allow the developer to define very specific rules on the data. The DT developer can also create custom notifications to be used in the project.

Data Transformation provides pre-packaged extensive validation mechanisms for some of the industry standards. Two of the more commonly used ones are for HIPAA and SWIFT. The HIPAA validation engine is an add-on to Data Transformation that provides HIPAA validation levels 1-7 and is capable of producing the respective HIPAA validation response as well as an HTML report. SWIFT validation is built into the SWIFT transformation script, making Data Transformation SWIFT Certified. It is capable of validating structure, data and business rules.
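As a language-neutral illustration of the on_fail and custom-notification pattern described above (not actual TGP syntax), the following Python sketch routes detailed failure messages to a custom log instead of relying on a generic error message:

```python
# Conceptual illustration only: mirrors the intent of on_fail / CustomLog described
# above, not their actual syntax in B2B Data Transformation.
import logging

logging.basicConfig(filename="dt_custom.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

def parse_amount(raw: str, on_fail=None):
    """Parse a numeric field; invoke the on_fail hook with a detailed message on error."""
    try:
        return float(raw)
    except ValueError:
        if on_fail:
            on_fail("amount field '%s' is not numeric; substituting None" % raw)
        return None

value = parse_amount("12,50", on_fail=logging.warning)   # writes a detailed entry to dt_custom.log
```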

Best Practices
None

Sample Deliverables
None

Last updated: 02-Nov-10 01:19


Phase 5: Build
Subtask 5.5.3 Design B2B Data Transformation Processes
Description
Once the items in 5.5.1 Develop Inventory of B2B Data Transformation Processes and 5.5.2 Develop B2B Error Handling and Validation Strategy are complete, the overall design process needs to be created to show the developers how all of these components should be placed together. The Design process should have the project broken into four main sections.

Custom Components
Some projects require the creation of custom components utilizing the DT API library. These components need to be documented and detailed with the exact specifications that they need to be built to. Once a custom component is created the document needs to provide a mechanism for the distribution of the component to both developers and the server.

Source Processing
Processing of the source data is one of the most important aspects of a Data Transformation project. If the inventory contains a Streamer, it must be placed first within the project. How will the data be processed in the Streamer? Are there considerations to be taken into account to make sure data elements are grouped together? Once the Streamer details have been derived (or it has been determined that a Streamer is not needed), it should be determined whether any pre-processing of the source data is required. Some binary data (especially data coming from a mainframe) may require conversion from its original state into one that can be easily processed by the DTM Engine. In the case of structured and semi-structured data, the source data would normally be processed into its equivalent XML representation (i.e., EDI to EDI XML).
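A common example of such pre-processing is converting a mainframe EBCDIC extract into readable text before parsing. The sketch below is a hedged illustration: the code page (cp037) and file names are assumptions, and the real conversion must match the source system's encoding.

```python
# Hypothetical pre-processing step: decode an EBCDIC (code page 037) extract to text
# before it is handed to the parsing components. File names are placeholders.
import os

def preprocess_mainframe_extract(in_path: str, out_path: str) -> None:
    with open(in_path, "rb") as src, open(out_path, "w", encoding="ascii", errors="replace") as dst:
        dst.write(src.read().decode("cp037"))   # EBCDIC (US/Canada) -> str

if os.path.exists("positions.ebcdic"):
    preprocess_mainframe_extract("positions.ebcdic", "positions.txt")
```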

Logic Processing
The logic processing for a project is where most of the time will be spent in any given development cycle (as well as during unit testing). This section, together with the 5.5.2 Data Transformation Error Handling Strategy, should detail how the data should be processed in a normal flow, as well as what to do with data that does not follow the normal flow. It is also in this area that the 5.5.1 Data Transformation Inventory will show which components need to be placed and how they should be interconnected.

Target Processing
Once the loading and logic processing of the data has been successfully accomplished, the last step is to detail how the final data structure will be delivered. If a pre-built library is used to serialize the data into a predetermined structure, then the only required step is the final write out to the file system or calling application. Otherwise it will be necessary to detail which component(s) will be needed to create that final structure.

Prerequisites
5.5.1 Develop Inventory of B2B Data Transformation Processes 5.5.2 Develop B2B Error Handling and Validation Strategy

Roles

Technical Architect (Primary)

Considerations
Since not every component can be foreseen in advance, if a business rule or chunk of data is presumed to be complex or poses some challenge in the processing of the data, it is a good practice to put in a placeholder for a sub-component to handle these issues. This allows a developer to make a final decision at build time.

Best Practices
None


Sample Deliverables
None

Last updated: 02-Nov-10 01:48


Phase 5: Build
Subtask 5.5.4 Build Parsers, Serializers, Mappers and Streamers
Description
Once the design process has been completed, the next step is to build the individual components, and there is a natural order in which to build them. Streamers and Parsers will usually be the first components built, as they tend to be the primary source components for any given project. The source processing components should be built and tested to verify that the ability to consume and properly break apart the source data meets the specifications in the design document. This step should also flush out any source XSD structures that need to be addressed or created. Proper consumption of the source data should be considered one of the most important aspects of any project; if the data is not properly parsed, all further steps will reflect and magnify the errors introduced in this step. If the source data is XML, skip this step and go directly to the creation of the Mapper. Parsers can implement business logic on the incoming source data while it is being parsed, but note that this only works if a single pass can handle all the business logic required for this component. Data Transformation Mappers handle the manipulation of XML documents very well and with great efficiency; the only requirement for this component to work correctly is that the source and target XSDs are available and accurate. Serializers give the developer the ability to structure incoming XML into whatever structure is required for a non-XML target format. Serializers can be used with Error Handling Notification and as such may need to be built along with the other components during the development phase. XML must always be the source for this component.

Prerequisites
5.5.3 Design B2B Data Transformation Processes

Roles

Data Transformation Developer (Primary) Technical Architect (Review Only)

Considerations
The Data Transformation product comes with a very extensive list of libraries that can be used in both the Parser and Serializer build phases. These libraries can greatly accelerate the development process and provide some assurance in that they have been used many times over and therefore offer a solid code base. Meta-level programming can be used with DT to create a framework that supports a factory process for creating many Data Transformation projects, whether unique in nature or very similar, based upon the metadata provided. Meta-level programming is a good option for migration strategies where existing applications can be sourced, or where a rich level of metadata can be provided that states source/target structures and connections as well as any business logic (see the sketch below). Components should be built independently of the project as a whole, unless a sub-component depends upon a parent component to pass it a pre-processed data segment. Separating the component out and utilizing the example_source to test it allows for finer-grained testing by the developer to verify the rules required for this sub-process in the project.
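The metadata-driven ("meta level") approach mentioned above can be sketched in a few lines of Python. The CSV columns, component choices and rule names below are hypothetical; the point is only that project definitions are generated from metadata rows rather than hand-built one at a time.

```python
# Illustrative factory sketch: generate project definitions from metadata rows.
import csv
import io

metadata_csv = io.StringIO(
    "project,source_format,target_format,business_rule\n"
    "claims_837,EDI-837,XML,drop_zero_amount_lines\n"
    "payments_820,EDI-820,CSV,require_remit_id\n"
)

def generate_project(row: dict) -> dict:
    components = ["Parser", "Mapper"]
    if row["target_format"] != "XML":
        components.append("Serializer")          # non-XML targets need a Serializer
    return {"name": row["project"], "components": components, "rules": [row["business_rule"]]}

for row in csv.DictReader(metadata_csv):
    print(generate_project(row))
```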

Best Practices
Implementing Mappers Implementing Parsers Implementing Streamers

Sample Deliverables
None


Last updated: 02-Nov-10 02:57


Phase 5: Build
Subtask 5.5.5 Build B2B Transformation Process from Data Transformation Objects
Description
After the different components have been created in subtask 5.5.4 Build Parsers, Serializers, Mappers and Streamers, it is possible to assemble them into a Data Transformation project. The assembly of the components should follow a source-to-target route. One of the biggest challenges in assembling the components is maintaining expected results for all of them: expectations for both inputs and outputs should be clearly detailed. If expected results are not made clear to the developer, an unexpected result could impact the final assembly and require recoding of the component(s). The Data Transformation Inventory document should provide a good road map as to which components rely on others. Those that have dependencies should first be assembled as Units of Work. Each Unit of Work can then be inserted into the final project, and the glue code can be written to assemble them all together. Small amounts of this glue code will be required in the final assembly, as not every component will plug easily into the next. There may also be a need to place Error Handling Strategies into this glue code.

Prerequisites
5.5.1 Develop Inventory of B2B Data Transformation Processes 5.5.3 Design B2B Data Transformation Processes 5.5.4 Build Parsers, Serializers, Mappers and Streamers

Roles

Data Transformation Developer (Primary)

Considerations
When chaining the different components together across TGP files, memory utilization and the use of the out-of-the-box TGPs should be taken into consideration. Whenever possible, components should make use of in-scope, memory-preloaded data rather than writing data into data holders for use later on. Furthermore, when using the out-of-the-box TGPs, refrain from making changes to those files and components unless there is no other choice. If Streamers are used, consider whether the Streamer and the rest of the components need to be separated into different Units of Work. Multiple TGP files should be used when developing the project: one TGP file should be used for the direct execution of the project, while other TGP files should store the sub-components that support those used in the main execution flow. Since each component can be executed separately or as a Unit of Work, consider building these as separate Units of Work and unit testing them. This helps ensure that each previous section communicates as designed as the assembly moves forward. This does not mean that unit testing must be completed in this step; it merely verifies that each section is properly passing the data along.

Best Practices
None

Sample Deliverables
None

Last updated: 02-Nov-10 13:55


Phase 5: Build
Subtask 5.5.6 Unit Test B2B Data Transformation Process
Description
After the successful creation and validation of the Data Transformation project, a proper unit test should be performed to ensure that basic run-time functionality and transformation logic work as intended. For the given data, any sub-components that can be validated should be validated as well. In this subtask, output structure and formatting should also be examined for correctness. Since this is not a complete end-to-end test, only the major communication points in the project should be evaluated. This subtask does not expect full validation of the overall solution; that testing will be done during Phase 6.

Prerequisites
None

Roles

Data Transformation Developer (Primary) Quality Assurance Manager (Review Only)

Considerations
A successful unit test will expose any inconsistencies in the transformation logic, as well as any issues with the error handling strategy. For the test plan, each built component should have an independent example source that it can execute against, whether on the file system or as text assigned to the component. Since a single Data Transformation project may consist of many different components, it is important that a unit test plan be created for each of them. The test plans should be based upon the business rules defined in the design specification, not on the code in the project. Test data should be included that covers both the initial load and daily processing. The test data should include not just the source, but the expected output for the exact source data being provided. With both the source and target test data available, the unit testing can be scripted or automated using tools, allowing the testing phase to complete in a shorter timeframe.
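With matched source and expected-output files in place, the comparison step is easy to automate. The sketch below is a minimal, hypothetical harness: run_component() is a placeholder for however the project is actually executed (command line, service call, etc.), and the paths are made up.

```python
# Minimal, hypothetical unit-test harness comparing actual output to expected output.
import filecmp

def run_component(component: str, source: str, output: str) -> None:
    # Placeholder: invoke the component through whatever interface the project
    # exposes and write its result to `output`.
    raise NotImplementedError("wire this to the project's execution mechanism")

def unit_test(component: str, source: str, expected: str, output: str) -> bool:
    run_component(component, source, output)
    # Byte-for-byte comparison of the actual output against the expected output file.
    return filecmp.cmp(output, expected, shallow=False)

# Example usage for one component's test case (all paths are hypothetical):
# passed = unit_test("p_claims_parser", "tests/claim1.txt",
#                    "tests/claim1_expected.xml", "out/claim1.xml")
```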

Best Practices
None

Sample Deliverables
Defect Log Defect Report Test Condition Results

Last updated: 02-Nov-10 14:08


Phase 5: Build


Task 5.6 Design and Build Information Lifecycle Management Processes
Description
The design and build of the Information Lifecycle Management processes depends upon the type of ILM project. If the project is an application retirement project, retirement entities are created to retire the data and data discovery entities are created to access it. With a data archive for performance project, data archive entities are created; if the application is part of a Siebel, Deltek Costpoint, PeopleSoft, or Oracle E-Business Suite ERP system, accelerators may be available to speed up the process. Regardless of the project type, the source system is identified and a connection definition to that source is created. In an application retirement project the target connection definition is a supported storage system or a file system. In an archive for performance project, while the target connection definition can also be a supported storage system or a file system, it is usually another database. Once the connection definitions have been created inside the Data Archive product, the Enterprise Data Manager is used to create or modify the metadata required to perform the archive or retirement. In the case of a database-to-database archive for performance, the solution provides seamless access to the relocated data through the ERP system user interface. The specifics for each ERP system differ, but the results are the same: access to the relocated inactive data through a familiar interface for the business users.

Prerequisites
3.3 Implement Technical Architecture

Roles

Business Analyst (Primary) Data Architect (Primary) Metadata Manager (Secondary) Technical Architect (Primary) Test Engineer (Secondary) Test Manager (Secondary)

Considerations
When designing and building the ILM processes, keep them as simple as possible. If multiple custom entities are required for a specific module to handle different business practices in different parts of the world, or even different regions of the country or departments within a company, try to create the smallest number required to get the job done. The more entities that exist for a specific module, the more difficult it is to ensure the correct one is being used. Once in production, unused custom entities should be removed from the system to avoid confusion.

Best Practices
Using Parallelism - ILM Archive

Sample Deliverables
None

Last updated: 02-Nov-10 15:15


Phase 5: Build
Subtask 5.6.1 Design ILM Entities
Description
The design of ILM entities depends upon the type of entity being created and the project type. Data archive entities are created for data archive projects, while retirement and data discovery entities are created for retirement projects.
Designing the entities required to archive data from a production environment to another database (to improve the performance of the production system) is an iterative process that takes place in a recent copy of the production ERP database. If the ERP system is Siebel, Deltek Costpoint, PeopleSoft or Oracle E-Business Suite, canned metadata known as accelerators already exists for many applications within those systems, and the effort required is reduced compared to a custom application or system where accelerators do not exist. The accelerators already contain the groupings of tables and the business rules most commonly used. Where an accelerator exists, the process is just a matter of identifying any other tables or custom tables, if any, that should be added to the canned accelerator, and validating whether the existing business rules need to be added to or modified to take business process changes over time into consideration. For example, an existing business rule that checks for functionality that wasn't being used may need to be disabled for the data created before the functionality was put into place. Once the earlier data is relocated, the rule would need to be re-enabled for data created after the functionality was put into place. The process begins by generating candidates and examining the exceptions to canned business rules with the business users. The Accelerator Reference guide is used to examine the exceptions for possible changes, and as changes are made they are tested to validate that expected results are achieved. Changes are never made to the standard entities, so the first step is to make a copy of any entities that need to be modified in any way. Using the copy, make any modifications necessary to each of the entities and then test those changes.
The metadata or entities used to retire an application are always created from scratch. When an application is retired, all the tables associated with that application are retired, and designing the retirement entity is a matter of deciding how to group the tables for retirement. If the application has a small number of tables (<100), it can be retired using a single entity. However, an application with a large number of tables (>1000) may need to be broken into several entities. The tables to be retired are mined through the Enterprise Data Manager, and then all the tables are added to an entity with a 1=1 condition indicating that all of the data from the table should be retired, which is always the case in a retirement project. Deciding how to break up the tables into separate entities is a matter of making sure each table is part of only one entity and that every table exists in an entity.
The design of a data discovery entity begins with a report or query that will need to be run against the retired data. For every report or query that will be run, a data discovery entity needs to be created. The entity consists of the tables referenced in the WHERE clause(s) of the report or query, and the main transactional table in the query is defined as the driving table for the entity. If constraints exist on the tables, they will have already been loaded when the tables were mined. If no constraints exist, they will need to be added manually using the Enterprise Data Manager for all the join conditions in the report or query.
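To make the data discovery design concrete, the following sketch captures the kind of information such an entity carries for one retired-data report. The query, table names and entity fields are hypothetical illustrations, not Data Archive metadata structures.

```python
# Hypothetical illustration of what a data discovery entity captures for one report.
report_query = """
    SELECT h.order_no, h.order_date, l.line_amount
    FROM   order_headers h
    JOIN   order_lines   l ON l.order_id = h.order_id
    WHERE  h.order_date < :cutoff_date
"""

data_discovery_entity = {
    "name": "RETIRED_ORDERS_BY_DATE",
    "driving_table": "order_headers",            # main transactional table in the query
    "tables": ["order_headers", "order_lines"],  # tables referenced by the report
    "join_conditions": [                         # added manually in EDM if constraints were not mined
        ("order_lines.order_id", "order_headers.order_id"),
    ],
}

print(data_discovery_entity["driving_table"])
```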

Prerequisites
3.3 Implement Technical Architecture

Roles

Business Analyst (Primary) Data Architect (Primary) Metadata Manager (Secondary)

Considerations
When designing the ILM entities, the considerations depend upon the project type and the entity type. In a retirement project business rules don't apply, so the considerations center on including all the tables while not having any table in more than one retirement entity. If Data Discovery is not used, no data discovery entities will be required; if it is used, all the queries will have to be identified in order to create the entities. If constraints exist in the database, they will be mined when the tables are mined for the data discovery entities. If they do not exist, or some are missing, they will have to be added manually through the EDM. In data archive projects the main consideration is the business rules. Business practices and data change over time, and that must be considered when creating the data archive entities.

Best Practices
None

Sample Deliverables
None

Last updated: 02-Nov-10 15:32


Phase 5: Build
Subtask 5.6.2 Build ILM Entities
Description
The Enterprise Data Manager (EDM) is used to build the entities, regardless of the project or entity type, using the information learned during the 5.6.1 Design ILM Entities subtask. For data archive entities, the business rule changes, the timing of those changes, and the list of additional tables (if any) that need to be added can now be applied to create the final entities. Changes are never made to the standard entities, so the first step is to make a copy of any entities that need to be modified in any way. The names for the custom entities should be descriptive, making it easy to choose the correct entity for each archive project. Retirement entities are likewise built from what was learned during the design subtask; the final entities that will be used to retire the data can now be created. A naming convention for the entity names should be established and followed, and a new product family version should be created for each application to be retired. Finally, using the information gathered for data discovery entities, a data discovery entity needs to be created for each report or query identified. These entities are created using the Enterprise Data Manager and each one must include a driving table.

Prerequisites
5.6.1 Design ILM Entities

Roles

Technical Architect (Primary)

Considerations
When building the entities, a naming convention should be established and followed to reduce confusion over what each entity is used for and what it contains. For data discovery entities, make sure the name includes a notation indicating that it is a data discovery entity so that it isn't accidentally chosen for a retirement project. The same applies to the data archive and data retirement entities: they are all different, and one type will not work for another purpose. That is not to say there will be an error in the process, but the result will not be what is desired.

Best Practices
Application ILM log4j Settings Using Parallelism - ILM Archive

Sample Deliverables
None

Last updated: 02-Nov-10 22:17


Phase 5: Build
Subtask 5.6.3 Unit Test ILM Entities
Description
Once the entities for the specific project type have been built, they need to be tested. This is usually an iterative process in which, as issues are revealed during testing, designs are changed and entities are modified and re-tested. Testing the archive entities involves both checking the business rules and identifying any areas where performance tuning will be necessary. The business rules need to be examined and the exceptions investigated to confirm that no active transactions are being archived as a result of any of the rule changes. The entities should be tested with a volume of data equal to the largest amount of data that will be archived for a particular entity in a single cycle, and all the steps should be timed. When long-running steps are identified, they should be targeted for potential tuning to improve performance. The process of examining the business rules and performance tuning should be performed for all entities in scope for the project. Testing the retirement entities is a matter of retiring the data using those entities and resolving issues with data type conversions as they come up. The retirement entities usually do not require modification based on what is learned during testing; the testing process is not about identifying changes to the retirement entities themselves, but rather about identifying data issues in the application to be retired. Testing the data discovery entities is a matter of retiring the data using the retirement entities and then testing each of the queries or reports that the data discovery entities are based on. The data discovery entities usually do not require modification either; the purpose is to identify areas where the data type conversions from the retirement process have made modifications to the queries, reports or even the data discovery entities necessary.

Prerequisites
5.6.2 Build ILM Entities

Roles

Test Engineer (Primary) Test Manager (Primary)

Considerations
Considerations for this subtask depend upon the project type and entity type, but at a basic level, testing is meant to root out possible problems. For a data discovery entity, testing will reveal whether any constraints are missing from the metadata behind the entity. In retirement, testing will identify issues with data type conversions and should also confirm that all tables are included in one (and only one) retirement entity. When testing data archive entities, business rules are verified and performance problems in the archiving process are identified and tuned.

Best Practices
None

Sample Deliverables
None

Last updated: 02-Nov-10 16:05


Phase 5: Build
Task 5.7 Populate and Validate Database
Description
This task bridges the gap between unit testing and system testing. After unit testing is complete, the sessions for each mapping must be ordered so as to properly execute the complete data migration from source to target. Creating workflows containing sessions and other tasks with the proper execution order does this. By incorporating link conditions and/or decision tasks into workflows, the execution order of each session or task is very flexible. Additionally, event raises and event waits can be incorporated to further develop dependencies. The tasks within the workflows should be organized so as to achieve an optimum load in terms of data quality and efficiency. When this task is completed, the development team should have a completely organized loading model that it can use to perform a system test. The objective here is to eliminate any possible errors in the system test that relate directly to the load process. The final product of this task - the completed workflow(s) - is not static, however. Since the volume of data used in production may differ significantly from the volume used for testing, it may be necessary to move sessions and workflows around to improve performance.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Integration Developer (Primary) Technical Project Manager (Review Only) Test Manager (Approve)

Considerations
At a minimum, this task requires a single instance of the target database(s). Also, while data may not be required for initial testing, the structure of the tables must be identical to those in the operational database(s). Additionally, consider putting all mappings to be tested in a single folder. This will allow them to be executed in the same workflows and reordered to assess optimum performance.

Best Practices
None

Sample Deliverables
None

Last updated: 29-Oct-10 12:54


Phase 5: Build
Subtask 5.7.1 Build Load Process
Description
Proper organization of the load process is essential for achieving two primary load goals:
Maintaining dependencies among sessions, worklets and workflows
Minimizing the load window
Maintaining dependencies between sessions, worklets and workflows is critical for correct data loading; lack of dependency control results in incorrect or missing data. Minimizing the load window is not always as important; its priority depends primarily on load volumes, hardware, and the available load time.

Prerequisites
None

Roles

Business Analyst (Review Only) Data Integration Developer (Primary) Technical Project Manager (Review Only)

Considerations
The load development process involves the following five steps:
1. Clearly define and document all dependencies
2. Analyze and document the load volume
3. Analyze the processing resources available
4. Develop operational requirements such as notifications, external processes and timing
5. Develop tasks, worklets and workflows based on the results

If the volume of data is sufficiently low for the available hardware to handle, you may consider volume analysis optional, developing the load process solely on the dependency analysis. Also, if the hardware is not adequate to run the sessions concurrently, you will need to prioritize them. The highest priority within a group is usually assigned to sessions with the most child dependencies. Another possible component to add to the load process is sending e-mail. Three e-mail options are available for notification during the load process:
Post-session e-mails can be sent after a session completes successfully or when it fails
E-mail tasks can be placed in workflows before or after an event or series of events
E-mails can be sent when workflows are suspended
When the integrated load process is complete, it should be subjected to a unit test. This is true even if all of the individual components have already been unit tested. The larger volumes associated with an actual operational run would be likely to hamper validation of the overall process. With unit test data, the staff members who perform unit testing should be able to easily identify major errors when the system is placed in operation.

Analyzing Load Volumes


The Load Dependency Analysis should list all sessions, in order of their dependency, together with any other events (Informatica or other), on which the sessions depend. The analysis must clearly document the dependency relationships between each session and/or event, the algorithm or logic needed to test the dependency conditions during execution, and the impact of any possible dependency test results (e.g., do not run a session, fail a session, fail a parent or worklet, etc.).


The load dependency documentation might, for example, follow this format:
The first set of sessions or events listed in the analysis (Group A) would be those with no dependencies.
The second set listed (Group B) would be those with a dependency on one or more sessions or events in the first set (Group A). Against each session in this list, the following information would be included:
Dependency relationships (e.g., Succeed, Fail, Completed by (time), etc.)
Action (e.g., do not run, fail parent)
Notification (e.g., e-mail)
The third set (Group C) would be those with a dependency on one or more sessions or events in the second set (Group B). Against each session in this list, similar dependency information would be included.
The listing continues in the document until all sessions are included. (A small sketch of deriving these groups automatically follows below.)
The Load Volume Analysis should list all the sources, source row counts and row widths expected for each session. This should include the sources for all lookup transformations, in addition to the extract sources, as the amount of data read to initialize a lookup cache can materially affect the initialization and total execution time of a session. The Load Volume Analysis should also list sessions in descending order of processing time, estimated based on these factors (i.e., the number of rows extracted, the number of rows loaded, and the number and volume of lookups in the mappings).
For Data Migration projects, the final load processes are the set of load scripts, scheduling objects, or master workflows that will be executed for the data migration. It is important that developers develop with a load plan in mind so that these load procedures can be developed quickly, as they are often developed late in the project development cycle when time is in short supply. It is recommended to keep all load scripts/schedules/master workflows to a minimum, as the execution of each will become a line item on the migration punchlist.
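The Group A/B/C ordering above can be derived mechanically from a dependency map. The Python sketch below uses hypothetical session names and only illustrates the grouping logic; it is not a feature of any scheduler.

```python
# Illustrative sketch: derive Group A / Group B / Group C from session dependencies.
# session -> set of sessions (or events) it depends on; names are hypothetical.
dependencies = {
    "s_load_customers": set(),
    "s_load_products":  set(),
    "s_load_orders":    {"s_load_customers", "s_load_products"},
    "s_load_order_agg": {"s_load_orders"},
}

def dependency_groups(deps):
    groups, placed = [], set()
    while len(placed) < len(deps):
        # A session joins the next group once everything it depends on is already placed.
        level = [s for s, d in deps.items() if s not in placed and d <= placed]
        if not level:
            raise ValueError("circular dependency detected")
        groups.append(sorted(level))
        placed.update(level)
    return groups

for label, group in zip("ABCDEFGH", dependency_groups(dependencies)):
    print("Group %s: %s" % (label, ", ".join(group)))
```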

Best Practices
Data Integration Load Traceability Third Party Scheduler

Sample Deliverables
None

Last updated: 29-Oct-10 01:43


Phase 5: Build
Subtask 5.7.2 Perform Integrated ETL Testing
Description
The task of integration testing is to check that components in a software system or, one step up, software applications at the company level, interact without error. There are a number of strategies that can be employed for integration testing; two examples are:
Integration testing based on business processes. In this strategy, tests examine all the system components affected by a particular business process. For instance, one set of tests might cover the processing of a customer order, from acquisition and registration through to delivery and payment. Additional business processes are incorporated into the tests until all system components or applications have been sufficiently tested.
Integration testing based on test objectives. For example, a test objective might be the integration of system components that use a common interface. In this strategy, tests would be defined based on the interface.
These two strategies illustrate that the ETL process is merely part of the equation rather than the focus of it. It is still important to take note of the ETL load, to ensure that aspects such as performance and data quality are not adversely affected.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Integration Developer (Primary) Technical Project Manager (Review Only) Test Manager (Approve)

Considerations
Although this is a minor test from an ETL perspective, it is crucial for the ultimate goal of a successful process implementation. Primary proofing of the testing method involves matching the number of rows loaded to each individual table. It is a good practice to keep the Load Dependency Analysis and Load Volume Analysis in mind during this testing, particularly if the process identifies a problem in the load order; any deviations from those analyses are likely to cause errors in the loaded data. The final product of this subtask, the Final Load Process document, is the layout of workflows, worklets, and session tasks that will achieve an optimal load process. The Final Load Process document orders workflows, worklets, and session tasks in such a way as to maintain the required dependencies while minimizing the overall load window. This document will differ from that generated in the previous subtask, 5.7.1 Build Load Process, in that it represents the current actual result. However, this layout is still dynamic and may change as a result of ongoing performance testing.

TIP
The Integration Test Percentage (ITP) is a useful tool that indicates the percentage of the project's source code that has been unit and integration tested. The formula for ITP is:
ITP = 100% * Transformation Objects Unit Tested / Total Objects
As an example, this table shows the number of transformation objects for several mappings:
Mapping    Trans. Objects
M_ABC      15
M_DEF      3
M_GHI      24
M_JKL      7

If mapping M_ABC is the only one unit tested, the ITP is:
ITP = 100% * 15 / 49 = 30.61%

If mapping M_DEF is the only one unit tested, the ITP is:
ITP = 100% * 3 / 49 = 6.12%

If mappings M_GHI and M_JKL are unit tested, the ITP is:
ITP = 100% * (24 + 7) / 49 = 100% * 31 / 49 = 63.27%

And if all modules are unit tested, the ITP is:


ITP = 100% * 49 / 49 = 100%

The ITP metric provides a precise measurement as to how much unit and integration testing has been done. On actual projects, the definition of a unit can vary. A unit may be defined as an individual function, a group of functions, or an entire Computer Software Unit (which can be several thousand lines of code). The ITP metric is not based on the definition of a unit. Instead, the ITP metric is based on the actual number of transformation objects tested with respect to the total number of transformation objects defined in the project.
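The ITP calculation is simple enough to script; the snippet below just encodes the formula and the example counts from the table above.

```python
# Direct encoding of the ITP formula, using the example mapping counts above.
transformation_objects = {"M_ABC": 15, "M_DEF": 3, "M_GHI": 24, "M_JKL": 7}

def itp(tested_mappings, counts=transformation_objects):
    tested = sum(counts[m] for m in tested_mappings)
    return 100.0 * tested / sum(counts.values())

print(round(itp(["M_ABC"]), 2))                      # 30.61
print(round(itp(["M_GHI", "M_JKL"]), 2))             # 63.27
print(round(itp(list(transformation_objects)), 2))   # 100.0
```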

Best Practices
None

Sample Deliverables
None

Last updated: 29-Oct-10 12:57


Phase 5: Build
Task 5.8 Build Presentation Layer
Description
The objective of this task is to develop the end-user analysis, using the results from 4.4 Design Presentation Layer. The result of this task should be a final presentation layer application that satisfies the needs of the organization. While this task may run in parallel with the building of the data integration processes, data is needed to validate the results of any presentation layer queries. This task cannot, therefore, be completed before 5.4 Design and Develop Data Integration Processes and 5.7 Populate and Validate Database. The Build Presentation Layer task consists of two subtasks, which may need to be performed iteratively several times:
1. Developing the end-user presentation layer
2. Presenting the presentation layer to business analysts to elicit and incorporate their feedback
Throughout the Build Phase, the developers should refer to the deliverables produced during the Design Phase. These deliverables include a working prototype, end-user feedback, the metadata design framework and, most importantly, the Presentation Layer Design document, which is the final result of the Design Phase and incorporates all efforts completed during that phase. This document provides the necessary specifications for building the front-end application for the user community. This task incorporates both development and unit testing. Test data will be available from the initial loads of the target system. Depending on volumes, a sample of the initial load may be appropriate for development and unit testing purposes. This sample data set can be used to assist in building the presentation layer and validating reporting results, without the added effort of fabricating test data.

Prerequisites
None

Roles

Business Analyst (Primary) Presentation Layer Developer (Primary) Project Sponsor (Approve) Technical Project Manager (Review Only)

Considerations
The development of the presentation layer includes developing interfaces and predefined reports to provide end users with access to the data. It is important that data be available to validate the accuracy of the development effort. Having end users available to review the work-in-progress is an advantage, enabling developers to incorporate changes or additions early in the review cycle.

Best Practices
None

Sample Deliverables
None

Last updated: 29-Oct-10 12:48


Phase 5: Build
Subtask 5.8.1 Develop Presentation Layer
Description
By the time you get to this subtask, all the design work should be complete, making this subtask relatively simple. Now is the time to put everything together and build the actual objects such as reports, alerts and indicators. During the build, it is important to follow any naming standards that may have been defined during the design stage, in addition to the standards set on layouts, formats, etc. Also, keep detailed documentation of these objects during the build activity. This will ensure proper knowledge transfer and ease of maintenance in addition to improving the readability for everyone. After an object is built, thorough testing should be performed to ensure that the data presented by the object is accurate and the object is meeting the performance that is expected. The principles for this subtask also apply to metadata solutions providing metadata to end users.

Prerequisites
None

Roles

Presentation Layer Developer (Primary)

Considerations
During the Build task, it is good practice to verify and review all the design options and to be sure to have a clear picture of what the goal is. Keep in mind that you have to create a report no matter what the final form of the information delivery is. In other words, the indicators and alerts are derived off a report and hence your first task is to create a report. The following considerations should be taken into account while building any piece of information delivery:

Step 1: What measurements do I want to display?


The measurements, which are called metrics in the BI terminology, are perhaps the most important part of the report. Begin the build task by selecting your metrics, unless you are creating an Attribute-only Report. Add all the metrics that you want to see on the report and arrange them in the required order. You can add a prompt to the report if you want to make it more generic over, for example, time periods or product categories. Optionally, you can choose a Time Key that you want to use as well for each metric.

Step 2: What parameters should I include?


The metrics are always measured against a set of predefined parameters. Select these parameters, which are called attributes in BI terminology, and add them to the report (unless you are creating a Metric-only report). You can add Prompts and Time Keys for the attributes too, just like the metrics.
TIP
Create a query for metrics and attributes. This will help in finding specific metrics or attributes much faster than manually searching through a pool of hundreds of metrics and attributes.
Time setting preferences can vary widely from one user's requirements to another's. One group of users may be interested only in the current data, while another group may want to compare trends and patterns over a period of time. It is important to thoroughly analyze the end users' requirements and expectations prior to adding the Time Settings to reports.

Step 3: What are my data limiting criteria for this report?


Now that you have selected all the data elements that you need in the report, it is time to make sure that you are delivering only the relevant data set to the end users. Make sure to use the right Filters and Ranking criteria to accomplish this in the report. Consider using Filtersets instead of just Filters so that important criteria limiting the data sets can be standardized over a project or department, for example.

Step 4: How should I format the report?


Presenting the information to the end user in an appealing format is as important as presenting the right data. A good portion of the formatting should be decided during the Design Phase. However, you can consider the following points while formatting the reports:
Table report type: The data in the report can be arranged in one of three table types: tabular, cross-tabular, or sectional. Select the one that suits the report best.
Data sort order: Arrange the data so that the pattern makes it easy to find whatever piece of information the reader is interested in.
Chart or graph: A picture is worth a thousand words. A chart or graph can be very useful when you are trying to make a comparison between two or more time periods, regions, product categories, etc.

Step 5: How do I deliver the information?


Once the report is ready, you should think about how it should be delivered. In doing so, be sure to address the following points:
Where should the report reside? Select a folder that is best suited to the data that the report contains. If the report is shared by more than one group of users, you may want to save it in a shared folder.
Who should get the report, and when and how should they get it? Make sure that proper security options are implemented for each report. There may be sensitive and confidential data that you want to ensure is not accessible by unauthorized users.
When should the report be refreshed? You can choose to run the report on demand or schedule it to be refreshed automatically at regular intervals. Ad-hoc reports that are of interest to a smaller set of individuals are usually run on demand. However, the bulk of the reports that are viewed regularly by different business users need to be scheduled to refresh periodically. The refresh interval should typically consider the period for which the business users are likely to consider the data current, as well as the frequency of data change in the data warehouse. Occasionally, there will be a requirement to see the data in the report as soon as the data changes in the data warehouse (and data in the warehouse may change very frequently). You can handle situations like this by having the report refresh in real time.
Special requirements: Consider any special requirements a report may have at this time, such as whether the report needs to be broadcast to users, or whether there is a need to export the data in the report to an external format. Based on these requirements, you can make minor changes to the report as necessary.

Packing More Power into the Information


Adding certain features to the report can make it more useful for everybody. Consider the following for each report that you build:
Title of the report: The title of the report should reflect what the report contents are meant to convey. Rarely, it may be difficult to name a report accurately if the same report is viewed from two different perspectives by two different sets of users. You may consider making a copy of the report and naming the two instances to suit each set of users.
Analytic workflows: Analytic workflows make the information analysis process as a whole more robust. Add the report to one or more analytic workflows so that the user can get additional questions answered in the context of a particular report's data.
Drill paths: Check to make sure that the required drill paths are set up. If you don't find a drill path that you think would be useful for this report, you may have to contact the Administrator and have it set up for you.
Highlighters: It may also be a good idea to use highlighters to make critical pieces of information more conspicuous in the report.
Comments and description: Comments and descriptions make the reports more easily readable, as well as helping when searching for a report.


Keywords: It is not uncommon to have numerous reports pertaining to the same business area residing in the same location. Including keywords in the report setup will assist users in searching for a report more easily.

Indicator Considerations
After the base report is complete, you can build indicators on top of that report. First, you will need to determine and select the type of indicator that best suits the primary purpose. You can use chart, table or gauge indicators. Remember that there are several types of chart indicators as well as several different gauge indicators to choose from. To help decide what types of indicators to use, consider the following:
Do you want to display information on one specific metric?

Gauge indicators allow you to monitor a single metric and display whether or not that indicator is within an acceptable range. For example, you can create a gauge indicator to monitor the revenue metric value for each division of your company. When you create a gauge indicator, you have to determine and specify three ranges (low, medium, and high) for the metric value. Additionally, you have to decide how the gauge should be displayed: circular, flat, or digital.
Do you want to display information on multiple metrics?

If you want to display information for one or more attributes or multiple metrics, you can create either chart or table indicators. If you choose chart indicators, you have more than a dozen different types of charts to choose from (standard bar, stacked line, pie, etc.). However, if you'd like to see a subset of an actual report, including sum calculations, in a table view, choose a table indicator.

Alert Considerations
Alerts are created when something important is occurring, such as falling revenue or record-breaking sales. When creating alerts, consider the following:
What are the important business occurrences?

These answers will come from discussions with your users. Once you find out what is important to the users, you can define the Alert rules.
Who should receive the alert?

It is important that the alert is delivered to the appropriate audience. An alert may go on a business unit's dashboard or a personal dashboard.
How should the alert be delivered?

Once the appropriate alert receiver is identified, you must determine the proper delivery device. If the user doesn't log into Power Analyzer on a daily basis, perhaps an e-mail should be sent; if the alert is critical, a page could be sent. Furthermore, make sure that the required delivery device (i.e., e-mail, phone, fax, or pager) has been registered in the BI tool.

Testing and Performance


Thorough testing needs to be performed on the report/indicator/alert after it is built to ensure that you are presenting accurate and desired information. Make sure that the individual rows, as well as aggregate values, have accurate numbers and are reported against the correct attributes. Always keep the performance of the reports in mind. If a report takes too long to generate data, you need to identify what is causing the bottleneck and eliminate or reduce it. The following points are worth remembering:
Complex queries, especially against dozens of tables, can make a well-designed data warehouse look inefficient.
Multi-pass SQL is supported by Data Analyzer.
Indexing is important, even in simple star schemas.


TIP If a report is taking a long time to get data from the source system, you can view the query it generates. Copy the query and evaluate it by running utilities such as Explain Plan in Oracle to make sure it is optimized.
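If the source is an Oracle database, the TIP above can be scripted. The sketch below is one possible approach using the cx_Oracle driver; the connection string and query are placeholders, and the driver choice itself is an assumption rather than part of the methodology.

```python
# Hedged sketch: run Explain Plan on a slow report query (placeholders throughout).
import cx_Oracle

report_sql = "SELECT region, SUM(revenue) FROM sales_fact GROUP BY region"

with cx_Oracle.connect("analyst/secret@dwh") as connection:
    cursor = connection.cursor()
    cursor.execute("EXPLAIN PLAN FOR " + report_sql)
    cursor.execute("SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)")
    for (line,) in cursor:
        print(line)
```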

Best Practices
Data Analyzer Security

Sample Deliverables
None

Last updated: 29-Oct-10 12:49


Phase 5: Build
Subtask 5.8.2 Demonstrate Presentation Layer to Business Analysts
Description
After the initial development effort, the development team should present the presentation layer to the Business Analysts to elicit and incorporate their feedback. When educating the end users about the front-end tool, whether a business intelligence tool or an application, it is important to focus on the capabilities of the tool and the differences between typical reporting environments and solution architectures. When end users thoroughly understand the capabilities of the front end that they will use, they can offer more relevant feedback.

Prerequisites
None

Roles

Business Analyst (Primary) Presentation Layer Developer (Primary) Project Sponsor (Approve) Technical Project Manager (Review Only)

Considerations
Demonstrating the presentation layer to the business analysts should be an iterative process that continues throughout the Build Phase. This approach helps the developers to gather and incorporate valuable user feedback and enables the end users to validate or clarify the interpretation of their requirements prior to the release of the end product, thereby ensuring that the end result meets the business requirements. The Project Manager must play an active role in the process of accepting and prioritizing end user requests. While the initial release of the presentation layer should satisfy user requirements, in an iterative approach, some of the additional requests may be implemented in future releases to avoid delaying the initial release. The Project Manager needs to work closely with the developers and analysts to prioritize the requests based upon the availability of source data to support the end users' requests and the level of effort necessary to incorporate the changes into the initial (or current) release. In addition, the Project Manager must communicate regularly with the end users to set realistic expectations and establish a process for evaluating and prioritizing feedback. This type of communication helps to avoid end-user dissatisfaction, particularly when some requests are not included in the initial release. The Project Manager also needs to clearly communicate release schedules and future development plans, including specifics about the availability of new features or capabilities, to the end-user community.

Best Practices
None

Sample Deliverables
None

Last updated: 29-Oct-10 12:51


Phase 5: Build
Subtask 5.8.3 Build Seamless Access
Description
Based on what was learned in the 4.4.4 Design ILM Seamless Access subtask, the seamless access layer can now be created. It is assumed that at this point the Archive Source and Archive Target connections have already been defined. Regardless of the source ERP application, the first step is to run a job to create the history tables in the online archive database. This job creates all the tables that exist in any entity for the product family defined for the source instance. Once the tables have been created, the job to create the seamless access schema objects can be run. It is assumed that the seamless access users have already been created; for everything except Deltek Costpoint, the seamless access users are created in the source instance. The job creates synonyms to most objects, and what are known as base views for the tables that exist in the online archive database (often referred to as the managed tables). For any views that reference a managed table in the FROM clause, the synonym is dropped and what is known as a dependent view is created. If this is a custom application, nothing more needs to be done. However, if it is one of the supported canned applications, the application-specific setup instructions can now be followed to finish the seamless access layer setup.
When implementing on Oracle E-Business Suite, the two schemas owning the seamless access objects are registered with the application. Data groups are then created, with the user for all but one application set to the seamless access schema. Using the list of responsibilities created during the 4.4.4 Design Seamless Access subtask, an archive-only and a combined version is created for each, using the archive-only or combined data group, and those responsibilities are then given to the users requiring access to the older data.
In a PeopleSoft implementation, a new PIA is created and a PeopleSoft-specific script modifies some of the seamless access objects. If both an archive-only and a combined version are required, two separate PIAs need to be created and the script needs to be run twice. For this reason, using only a combined version is the most common practice with PeopleSoft. The new PIA connects to a TNS alias of the source database and uses the combined user instead of SYSADM. The users identified as needing access are given the URL to the new PIA.
Siebel is similar to PeopleSoft in that a new front end is created that points to the combined seamless access user. A script specific to Siebel is run to modify some of the seamless access objects in the combined schema, and there is a new URL to give out to the users that require access. Usually, if the shared type of front end is used in Siebel, a combined version is all that is used. There is also a client/server application that can be installed on each client machine with Siebel that can be leveraged by seamless access, in which case no new front end is required. The client/server application is configured to connect as either the archive-only or the combined seamless access user, and as long as the Siebel-specific script has been run for both schemas, an archive-only and a combined version can be created on the individual PCs of users that need access to the archived data.
Deltek Costpoint is the only ERP application that requires the connection from the front-end application or client/server application to be a schema called DELTEK.
For this reason the combined seamless access user must exist on a database other than the source database and either a client/server or new front-end can be used in the same way as is done for Siebel. The schema is created on the target and since the schema must be called DELTEK an archive only version is never created even though it is technically possible. Any query tool can be used to query the data in the source and target using the seamless access schemas. The seamless access objects provide a method to look at the data that has been archived and the data that has not been archived using the same queries that have been used prior to archiving the data by running them as one of the seamless access schemas as long as no owners are hard coded into the queries. If the schema names are hard coded in the existing queries then the references just need to be updated in archive-only and combined versions of the queries.
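As an illustration of the pattern described above, the sketch below builds the kind of DDL a seamless access schema typically contains: a synonym for an object that is never archived, and a view over a managed table that combines live and archived rows. All schema and object names are hypothetical, and the actual DDL generated by the seamless access job may differ; this is a conceptual sketch only.

```python
# Conceptual sketch only: the general shape of seamless access objects.
# Schema and object names (APPS, HIST, COMBINED, XX_INVOICES, FND_LOOKUPS)
# are assumptions, not what the ILM job actually generates.

SOURCE = "APPS"        # live ERP schema
HISTORY = "HIST"       # online archive (history) schema
COMBINED = "COMBINED"  # combined seamless access schema

managed_table = "XX_INVOICES"     # a table that is archived (managed)
unmanaged_object = "FND_LOOKUPS"  # an object that is never archived

# Unmanaged objects are simply exposed through synonyms.
synonym_ddl = (
    f"CREATE OR REPLACE SYNONYM {COMBINED}.{unmanaged_object} "
    f"FOR {SOURCE}.{unmanaged_object}"
)

# A managed table is exposed through a view so that existing queries, run as
# the combined user, see live rows and archived rows together.
combined_view_ddl = (
    f"CREATE OR REPLACE VIEW {COMBINED}.{managed_table} AS\n"
    f"  SELECT * FROM {SOURCE}.{managed_table}\n"
    f"  UNION ALL\n"
    f"  SELECT * FROM {HISTORY}.{managed_table}"
)

print(synonym_ddl)
print(combined_view_ddl)
```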

Prerequisites
4.4.4 Design ILM Seamless Access

Roles

Application Specialist (Primary)

Business Analyst (Primary) Business Project Manager (Secondary) Data Architect (Primary) Database Administrator (DBA) (Secondary)

Considerations
When creating the seamless access objects, make sure to test the database link and verify that it points to the history public user and not the actual history user, to enforce the read-only nature of the history data. The objects being created will vary depending on the source ERP application, but they will be the same types of objects in every case. Once the creation job has completed, run counts of the objects and make sure the counts are what would be expected for each of the different application types.

If the source application is Oracle E-Business Suite, do not take a shortcut when creating the data groups by leaving out some of the applications. Many standard built-in reports do not belong to the application expected, and debugging one will take longer than adding all the applications to begin with. Also, if they are not all added when the data group is created, they will need to be added one at a time later.

When building the Deltek Costpoint seamless access objects, make sure the combined user is DELTEK and exists on the target. In PeopleSoft, the seamless access schemas must be eight characters or less, as must the TNS alias.
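One way to run the object counts mentioned above is to query the Oracle data dictionary for the seamless access schema and compare the results against the counts expected for the application type. The sketch below uses the cx_Oracle driver; the connection details, schema name, and expected counts are assumptions for illustration only.

```python
# Count seamless access objects by type and compare to expected values.
# Connection details, schema name, and EXPECTED counts are hypothetical.
import cx_Oracle

EXPECTED = {"SYNONYM": 4200, "VIEW": 310}  # replace with counts for your application type

conn = cx_Oracle.connect("checker", "secret", "source_db")
cur = conn.cursor()
cur.execute(
    """SELECT object_type, COUNT(*)
         FROM all_objects
        WHERE owner = :owner
        GROUP BY object_type""",
    owner="COMBINED",  # the combined seamless access schema
)
actual = dict(cur.fetchall())
for obj_type, expected_count in EXPECTED.items():
    status = "OK" if actual.get(obj_type, 0) >= expected_count else "CHECK"
    print(obj_type, actual.get(obj_type, 0), "expected ~", expected_count, status)
```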

Best Practices
Seamless Access Oracle E-Business Suite

Sample Deliverables
None

Last updated: 02-Nov-10 16:27

Phase 5: Build
Subtask 5.8.4 Unit Test Seamless Access Description
Testing the seamless access layer requires functional knowledge of how the inactive data that has been relocated will be accessed by the business users. Testing should include all reports, screens, and forms that will be used to access the data. Note that the seamless access schemas are read-only; insert, update, and delete functionality is disabled. During testing, identify all processes that do not perform acceptably and tune as required. The seamless access layer should also be tested functionally to confirm that all data is returned as expected.

A testing plan should be created and followed to make sure that everything is covered. If older data will never be accessed through a specific report, screen, or form, it does not need to be tested; on the other hand, if a report, screen, or form will be used to access the data, it should be tested and verified. Identifying what does and does not need to be tested is half the battle, and it is important to try to identify everything, since production is never a good place to test.

To begin the testing process for seamless access, data needs to be in the online archive database. Functional testing can begin before the volume of data in the online archive database is large, but for performance testing the online archive database should contain data equal to the initial retention policy, indices should be created on the history tables, and the history schema should then be analyzed. There is a standalone job to create the indices on the history tables based on the indices that exist on the corresponding source tables. There is no job for analyzing the history schema; when it is analyzed, ensure that both tables and indices are analyzed.
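Because there is no packaged job for analyzing the history schema, the statistics gathering is typically scripted by hand. A hedged sketch is shown below using cx_Oracle and the standard Oracle DBMS_STATS package with CASCADE enabled so that both the history tables and their newly created indices are analyzed; the schema name and connection details are assumptions.

```python
# Gather optimizer statistics on the history (online archive) schema before
# performance testing. Schema name and connection details are hypothetical.
import cx_Oracle

conn = cx_Oracle.connect("dba_user", "secret", "history_db")
cur = conn.cursor()
cur.execute(
    """BEGIN
         -- CASCADE => TRUE analyzes the indices as well as the tables
         DBMS_STATS.GATHER_SCHEMA_STATS(ownname => :owner, cascade => TRUE);
       END;""",
    owner="HIST",
)
```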

Prerequisites
5.8.3 Build Seamless Access

Roles

Test Engineer (Primary) Test Manager (Primary)

Considerations
In PeopleSoft, the seamless access layer is tested for functionality and to identify performance issues, as with all other ERP applications. However, PeopleSoft is unique in that part of the testing is looking for functionality that is disabled by default but should be enabled, or functionality that is enabled but should be disabled. Every organization needs the PeopleSoft combined PIA to behave somewhat differently, so the default behavior is usually changed, and testing is where those changes are identified.

In Oracle E-Business Suite, the main consideration, aside from what is common to all the ERP applications, is hard-coded schema names in custom reports, views, packages, and other objects. During testing, look for instances where the wrong data is returned by a form or report; if the data groups have been set up properly, the most likely cause is a hard-coded schema name somewhere.

Best Practices
None

Sample Deliverables
None

Last updated: 02-Nov-10 16:41

INFORMATICA CONFIDENTIAL

PHASE 5: BUILD

80 of 82

Phase 5: Build
Subtask 5.8.5 Demonstrate Seamless Access to Business Users Description
Demonstrating seamless access to the business users requires assistance from a business user to determine what needs to be demonstrated. The archive-only and combined access solutions should both be demonstrated, if they both exist. In the case of Deltek Costpoint there will not be an archive-only solution, and for Siebel and PeopleSoft, even though an archive-only view can be in place, usually only a combined version exists. Oracle applications, on the other hand, will almost always have both an archive-only and a combined view of the data in place.

A demonstration of seamless access always turns into a demonstration of the data archive product as well. This is the first time the business users will see firsthand how seamless access works, and they also need to see the data show up in one version of the user interface and not in the other. When the data is archived, they can see that it is no longer available from the standard view but is available through seamless access. It is important to show that the data can be restored, using seamless access to verify that the restore actually takes place. The average business user will not care how the data is archived or restored, but will want to see that it happens and that the data can be accessed or restored if required.

Begin the demonstration by using the standard user interface to view some data that has not been archived but will qualify to be archived. Show that the data is also available from the combined user interface. In the case of Oracle Applications, show that the data is not available in the archive-only responsibility. Then archive the data and show that it is no longer available in the standard user interface but is available in the combined version. Show that data that has not been archived is also available in the combined user interface. If the demonstration is for Oracle Applications, show that the archived data is available in the archive-only responsibility and that data that has not yet been archived is not. Finally, use the data archive product to restore the transaction, or the cycle the transaction was archived in, and again query it from the different user interfaces to show that it is available where expected and not seen where it should not be seen.

Prerequisites
5.8.3 Build Seamless Access

Roles

Application Specialist (Primary) Business Analyst (Primary) End User (Review Only) Project Sponsor (Review Only) Training Coordinator (Secondary) User Acceptance Test Lead (Primary)

Considerations
Plan for the demonstration in advance; do not wait for the actual live demonstration to identify the transaction or cycle to restore, or the timing cannot be predicted. Choose a small amount of data for archive and restore, and do not run row count reports, so that the process completes quickly. Plan for discussion while the process is running and know how long each part will take, including the time to query the data. A full demonstration and presentation should run for one to one and a half hours; any longer and the audience's interest may be lost.

Best Practices
None

Sample Deliverables
None

Last updated: 02-Nov-10 20:33

Velocity v9
Phase 6: Test

2011 Informatica Corporation. All rights reserved.

Phase 6: Test
6 Test 6.1 Define Overall Test Strategy 6.1.1 Define Test Data Strategy 6.1.2 Define Unit Test Plan 6.1.3 Define System Test Plan 6.1.4 Define User Acceptance Test Plan 6.1.5 Define Test Scenarios 6.1.6 Build/Maintain Test Source Data Set 6.2 Prepare for Testing Process 6.2.1 Prepare Environments 6.2.2 Prepare Defect Management Processes 6.3 Execute System Test 6.3.1 Prepare for System Test 6.3.2 Execute Complete System Test 6.3.3 Perform Data Validation 6.3.4 Conduct Disaster Recovery Testing 6.3.5 Conduct Volume Testing 6.4 Conduct User Acceptance Testing 6.5 Tune System Performance 6.5.1 Benchmark 6.5.2 Identify Areas for Improvement 6.5.3 Tune Data Integration Performance 6.5.4 Tune Reporting Performance

Phase 6: Test
Description
The diligence with which you pursue the Test Phase of your project will inevitably determine its acceptance by its end users, and therefore its success against its business objectives. During the Test Phase you must validate that the system accomplishes everything that the project objectives and requirements specified and that all the resulting data and reports are accurate. Testing is also a critical preparation against any eventuality that could impact the project, whether that be radical changes to data volumes, disasters that disrupt service for the system in some way, or spikes in concurrent usage.

The Test Phase includes the full design of your testing plans and infrastructure as well as two categories of comprehensive system-wide verification procedures: the System Test and the User Acceptance Test (UAT). The System Test is conducted after all elements of the system have been integrated into the test environment. It includes a number of detailed, technically-oriented verifications that are managed as processes by the technical team, with primarily technical criteria for acceptance. UAT is a detailed, user-oriented set of verifications with user acceptance as the objective. It is typically managed by end users with participation from the technical team.

No test can be considered complete until there is verification that it has accomplished the agreed-upon Acceptance Criteria. Because of the natural tension that exists between completing the preset project timeline and completing the Acceptance Criteria (which may take longer than expected), the Test Phase schedule is often owned by a QA Manager or Project Sponsor rather than the Project Manager.

Velocity includes, as a final step in the Test Phase, activities related to tuning system performance. Satisfactory performance and system responsiveness can be a critical element of user acceptance.

Prerequisites
None

Roles

Business Analyst (Primary) Data Integration Developer (Primary) Data Warehouse Administrator (Primary) Database Administrator (DBA) (Primary) End User (Primary) Network Administrator (Primary) Presentation Layer Developer (Primary) Project Sponsor (Review Only) Quality Assurance Manager (Primary) Repository Administrator (Primary) System Administrator (Primary) System Operator (Primary) Technical Project Manager (Secondary) Test Manager (Primary) User Acceptance Test Lead (Primary)

Considerations

To ensure the Test Phase is successful, it must be preceded by diligent planning and preparation. Early on, project leadership and project sponsors should establish test strategies and begin building plans for System Test and UAT. Velocity recommends that this planning process begin, at the latest, during the Design Phase, and that it include descriptions of timelines, participation, test tools, guidelines, and scenarios, as well as detailed Acceptance Criteria.

The Test Phase includes the development of test plans and procedures. It is intended to overlap with the Build Phase, which includes the individual design reviews and unit test procedures. It is difficult to determine the final testing strategy until detailed design and build decisions have been made in the Build Phase. Thus, from a planning perspective, some tasks and subtasks in the Test Phase are expected to overlap with those in the Build Phase and possibly the Design Phase.

The Test Phase includes other important activities in addition to testing. Any defects or deficiencies discovered must be categorized (severity, criticality, priority), recorded, and weighed against the Acceptance Criteria (AC). The technical team should repair them within the guidelines of the AC, and the results must be retested, with the inclusion of satisfactory regression testing. This process presupposes some type of Defect Tracking System; Velocity recommends that this be developed during the Build Phase.

Although formal user acceptance signals the completion of the Test Phase, some of its activities will be revisited, perhaps many times, throughout the operation of the system. Performance tuning is recommended as a recurrent process. As data volume grows and the profile of the data changes, performance and responsiveness may degrade. You may want to plan for regular periods of benchmarking and tuning, rather than waiting to react to end-user complaints. By its nature, software development is not always perfect, so some repair and retest should be expected. The Defect Tracking System must be maintained to record defects and enhancements for as long as the system is supported and used. Test scenarios, regression test procedures, and other testing aids must also be retained for this purpose.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test
Task 6.1 Define Overall Test Strategy Description
The purpose of testing is to verify that the software has been developed according to the requirements and design specifications. Although the major testing actually occurs at the end of the Build Phase, determining the amount and types of testing to be performed should occur early in the development lifecycle. This enables project management to allocate adequate time and resources to this activity, and enables the project to build the appropriate testing infrastructure prior to the beginning of the testing phase. Thus, while all of the testing-related activities have been consolidated in the Test Phase, these activities often begin as early as the Design Phase. The detailed object-level testing plans are continually updated and modified as the development process continues, since any change to development work is likely to create a new scenario to test.

Planning should include the following components:
- Resource requirements and schedule
- Construction and maintenance of the test data
- Preparation of test materials
- Preparation of test environments
- Preparation of the methods and control procedures for each of the major tests

Typically, there are three levels of testing:
- Unit (performed by the Developer): Testing of each individual function. For example, with data integration this includes testing individual mappings, UNIX scripts, stored procedures, or other external programs. Ideally, the developer tests all error conditions and logic branches within the code.
- System or Integration (performed by the System Test Team): Testing performed to review the system as a whole as well as its points of integration. Testing may include, but is not limited to, data integrity, reliability, and performance.
- User Acceptance (performed by the User Acceptance Testing Team): As most data integration solutions do not directly touch end users, User Acceptance Testing should focus on the front-end applications and reports, rather than the load processes themselves.

Prerequisites
None

Roles

Business Analyst (Primary) Data Integration Developer (Primary) End User (Primary) Presentation Layer Developer (Primary) Quality Assurance Manager (Approve) Technical Project Manager (Approve)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test
Subtask 6.1.1 Define Test Data Strategy Description
Ideally, actual data from the production environment will be available for testing so that tests can cover the full range of possible values and states in the data. However, the full set of production data is often not available. Additionally, there is sometimes a risk of sensitive information migrating from production to less-controlled environments (i.e., test); in some circumstances, this may even be illegal. There is also the chicken-and-egg problem of requiring the load of production source data in order to test the load of production source data. Therefore, it is important to understand that with any set of data used for testing, there is no guarantee that all possible exception cases and value ranges will occur in the subset of the data used.

If generated data is used, the main challenge is to ensure that it accurately reflects the production environment. Theoretically, generated data can be made representative and engineered to test all of the project functionality. While the actual record counts in generated tables are likely to differ from production environments, the ratios between tables should be maintained; for example, if there is a one-to-ten ratio between products and customers in the live environment, care should be taken to retain this same ratio in the test environment.

The deliverable from this subtask is a description and schedule of how test data will be derived, stored, and migrated to testing environments. Adequate test data is important for proper unit testing and is critical for satisfactory system and user acceptance tests.
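The sketch below illustrates the ratio point above: generated test data that keeps the one-to-ten products-to-customers ratio even though the absolute volumes are far smaller than production. The entity names, volumes, and fields are hypothetical.

```python
# Generate a small test data set that preserves a production ratio.
import random

N_PRODUCTS = 50
CUSTOMERS_PER_PRODUCT = 10  # mirror the observed production ratio

products = [{"product_id": i, "name": f"Product {i}"} for i in range(1, N_PRODUCTS + 1)]
customers = [
    {"customer_id": c, "preferred_product_id": random.randint(1, N_PRODUCTS)}
    for c in range(1, N_PRODUCTS * CUSTOMERS_PER_PRODUCT + 1)
]

assert len(customers) == len(products) * CUSTOMERS_PER_PRODUCT
print(len(products), "products,", len(customers), "customers")
```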

Prerequisites
None

Roles

Business Analyst (Primary) Data Integration Developer (Secondary) End User (Primary) Presentation Layer Developer (Primary) Quality Assurance Manager (Approve) Technical Project Manager (Approve) Test Manager (Primary)

Considerations
In stable environments, there is less of a premium on flexible maintenance of test data structures; the overhead of developing software to load test data may not be justified. In dynamic environments (i.e., where source and/or target data structures are not finalized), the availability of a data movement tool such as PowerCenter greatly expands the range of options for test data storage and movement.

Usually, data for testing purposes is stored in the same structure as the source in the data flow. However, it is also possible to store test data in a format that is geared toward ease of maintenance and to use PowerCenter to transfer the data to the source system format. So if the source is a database with a constantly changing structure, it may be easier to store test data in XML or CSV formats where it can easily be maintained with a text editor. The PowerCenter mappings that load the test data from this source can use techniques to insulate (to some degree) the logic from schema changes by including pass-through transformations after source qualifiers and before targets.

For Data Migration, the test data strategy should focus on how much source data to use rather than how to manufacture test data. It is strongly recommended that the data used for testing be real production data, but most likely of less volume than the production system. By using real production data, the final testing will be more meaningful and will increase the level of confidence from the business community, thus making go/no-go decisions easier.

Best Practices
Data Masking Implementation

Sample Deliverables
Critical Test Parameters

Last updated: 16-Mar-09 15:45

Phase 6: Test
Subtask 6.1.2 Define Unit Test Plan Description
Any distinct unit of development must be adequately tested by the developer before it is designated ready for system test and for integration with the rest of the project elements. This includes any element of the project that can, in any way, be tested on its own. Rather than conducting unit testing in a haphazard fashion with no means of certifying satisfactory completion, all unit testing should be measured against a specified unit test plan and its completion criteria.

Unit test plans are based on the individual business and functional requirements and the detailed design for mappings, reports, or their components. The unit test plans should include specification of inputs, tests to verify, and expected outputs and results. The unit test is the best opportunity to discover any misinterpretation of the design as well as errors of development logic. The creation of the unit test plan should be a collaborative effort by the designer and the developer, and must be validated by the designer as meeting the business and functional requirements and design criteria. The designer should begin with a test scenario or test data descriptions and include checklists for the required functionality; the developer may add technical tests and make sure all logic paths are covered.

The unit test plan consists of:
- Identification section: unit name, version number, date of build or change, developer, and other identification information.
- References to all applicable requirements and design documents.
- References to all applicable data quality processes (e.g., data analysis, cleansing, standardization, enrichment).
- Specification of the test environment (e.g., system requirements, database/schema to be used).
- Short description of test scenarios and/or types of test runs.
- Per test run:
  - Purpose (what features/functionality are being verified).
  - Prerequisites.
  - Definition of test inputs.
  - References to test data or load-files to be used.
  - Test script (step-by-step guide to executing the test).
  - Specification (checklist) of the expected outputs, messages, error handling results, data output, etc.
  - Comments and findings.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Integration Developer (Primary) Presentation Layer Developer (Primary) Quality Assurance Manager (Review Only)

Considerations
Reference to design documents should contain the name and location of any related requirements documents, high-level and detailed design, mock-ups, workflows, and other applicable documents. Specification of the test environment should include such details as which reference or conversion tables must be used to translate the source data for the appropriate target (e.g., for conversion of postal codes, key translation, or other code translations). It should also include specification of any infrastructure elements or tools to be used in conjunction with the tests. The description of test runs should include the functional coverage and any dependencies between test runs.

Prerequisites should include whatever is needed to create the correct environment for the test to take place: any dependencies the test has on completion of other logic or test runs, availability of reference data, adequate space in the database or file system, and so forth. The input files or tables must be specified with their locations. These data must be maintained in a secure place to make repeatable tests possible.

Specifying the expected output is the main part of the test plan. It specifies in detail any output records and fields, and any functional or operational results through each step of the test run. The script should cover all of the potential logic paths and include all code translations and other transformations that are part of the unit. Comparing the output produced by the test run with this specification provides the verification that the build satisfies the design. The test script specifies all the steps needed to create the correct environment for the test, to complete the actual test run itself, and to analyze the results. Analysis can be done by hand or by using compare scripts.

The Comments and Findings section is where all errors and unexpected results found in the test run should be logged. In addition, errors in the test plan itself can be logged here as well. It is up to QA Management and/or the QA Strategy to determine whether to use a more advanced error tracking system for unit testing or to wait until system test. Some sites demand a more advanced error logging system (e.g., ClearCase) where errors can be logged along with an indication of their severity and impact, as well as information about who is assigned to resolve the problem.

One or more test runs can be specified in a single unit test plan. For example, one run may be an initial load against an empty target, with subsequent runs covering incremental loads against existing data, or tests with empty input, duplicate input records or files, and empty reports. Test data must contain a mix of correct and incorrect data. Correct data can be expected to result in the specified output; incorrect data may have results according to the defined error-handling strategy, such as creating error records or aborting the process. Examples of incorrect data (illustrated in the sketch at the end of this section) are:
- Value errors: a value is not in the acceptable domain, or a mandatory field is empty.
- Syntax errors: incorrect date format, incorrect postal code format, or non-numeric data in numeric fields.
- Semantic errors: two values are individually correct, but cannot exist in the same record.

Note that the error handling strategy should account for any data quality operations built into the project. Note also that some PowerCenter transformations can make use of data quality processes, or plans, developed in Informatica Data Quality (IDQ) applications. Data quality plan instructions can be loaded into a Data Quality Integration transformation (the transformation is added to PowerCenter via a plug-in). Data quality plans should be tested using IDQ applications before they are added to PowerCenter transformations; the results of these tests will feed as prerequisites into the main unit test plan. The tests for data quality processes should follow the same guidelines as outlined in this document. A PowerCenter mapping should be validated once the Data Quality Integration transformation has been added to it and configured with a data quality plan.

Every difference between the expected output and the actual test output should be logged in the Comments and Findings section, along with information about the severity and impact on the test process. The unit test can proceed after analysis and error correction. The unit test is complete when all test runs are successfully completed and the findings are resolved and retested. At that point, the unit can be handed over to the next test phase.
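The sketch below illustrates the mix of correct and incorrect unit test data described above, with one record for each error class (value, syntax, semantic). Field names, domains, and rules are hypothetical; a real test set would be derived from the design documents and the defined error-handling strategy.

```python
# Illustrative unit-test input mixing correct and incorrect records.
from datetime import datetime

VALID_STATUSES = {"OPEN", "CLOSED"}

test_records = [
    {"id": 1, "status": "OPEN",   "order_date": "2023-01-15", "ship_date": "2023-01-20"},  # correct
    {"id": 2, "status": "PENDNG", "order_date": "2023-02-01", "ship_date": "2023-02-05"},  # value error
    {"id": 3, "status": "OPEN",   "order_date": "15/01/2023", "ship_date": "2023-01-20"},  # syntax error
    {"id": 4, "status": "CLOSED", "order_date": "2023-03-10", "ship_date": "2023-03-01"},  # semantic error
]

def classify(rec):
    """Return the expected error-handling outcome for a record."""
    if rec["status"] not in VALID_STATUSES:
        return "value error"
    try:
        order = datetime.strptime(rec["order_date"], "%Y-%m-%d")
        ship = datetime.strptime(rec["ship_date"], "%Y-%m-%d")
    except ValueError:
        return "syntax error"
    if ship < order:
        return "semantic error"  # both dates valid, but cannot coexist in one record
    return "ok"

for rec in test_records:
    print(rec["id"], classify(rec))
```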

Best Practices
Testing Data Quality Mappings

Sample Deliverables
Test Case List Unit Test Plan

Last updated: 26-Oct-10 20:20

Phase 6: Test
Subtask 6.1.3 Define System Test Plan Description
System Test (sometimes known as Integration Test) is crucial for ensuring that the system operates reliably as a fully integrated system and functions according to the business requirements and technical design. Success rests largely on business users' confidence in the integrity of the data. If the system has flaws that impede its functions, the data may also be flawed, or users may perceive it as flawed, which results in a loss of confidence in the system. If the system does not provide adequate performance and responsiveness, the users may abandon it (especially if it is a reporting system) because it does not meet their perceived needs. As with the other testing processes, it is very important to begin planning for System Test early in the project to make sure that all necessary resources are scheduled and prepared ahead of time.

Prerequisites
None

Roles

Quality Assurance Manager (Review Only) Test Manager (Primary)

Considerations
Since the system test addresses multiple areas and test types, creation of the test plan should involve several specialists. The System Test Manager is then responsible for compiling their inputs into one consistent system test plan. All individuals participating in executing the test plan must agree on the relevant performance indicators that are required to determine if project goals and objectives are being met. The performance indicators must be documented, reviewed, and signed off on by all participating team members. Performance indicators are placed in the context of Test Cases, Test Levels, and Test Types, so that the test team can easily measure and monitor their evaluation criteria.

Test Cases
The test case (i.e., unit of work to be tested) must be sufficiently specific to track and improve data quality and performance.

Test Levels
Each test case is categorized as occurring on a specific level or levels. This helps to clearly define the actual extent of testing expected within a given test case. Test levels may include one or more of the following:
- System Level. Covers all "end to end" integration testing, and involves the complete validation of total system functionality and reliability through all system entry points and exit points. Typically, this test level is the highest, and the last level of testing to be completed.
- Support System Level. Involves verifying the ability of existing support systems and infrastructure to accommodate new systems or the proposed expansion of existing systems. For example, this level of testing may determine the effect on overall business operations of a potential increase in network traffic due to an expanded system user base.
- Internal Interface Level. Covers all testing that involves internal system data flow. For example, this level of testing may validate the ability of PowerCenter to successfully connect to a particular data target and load data.
- External Interface Level. Covers all testing that involves external data sources. For example, this level of testing may collect data from diverse business systems into a data warehouse.
- Hardware Component Level. Covers all testing that involves verifying the function and reliability of specific hardware components. For example, this level of testing may validate a back-up power system by removing the primary power source. This level of testing typically occurs during the development cycle.
- Software Process Level. Covers all testing that involves verifying the function and reliability of specific software applications. This level of testing typically occurs during the development cycle.
- Data Unit Level. Covers all testing that involves verifying the function and reliability of specific data items and structures. This typically occurs during the development cycle, in which data types and structures are defined and tested based on the application design constraints and requirements.

Test Types
The Data Integration Developer generates a list of the required test types based on the desired level of testing. The defined test types determine what kinds of tests must be performed to satisfy a given test case. Test types that may be required include:
- Critical Technical Parameters (CTPs). A worksheet of specific CTPs is established, based on the identified test types. Each CTP defines specific functional units that are tested. This should include any specific data items, components, or functional parts.
- Test Condition Requirements (TCRs). Test Condition Requirement scripts are developed to satisfy all identified CTPs. These TCRs are assigned a numeric designation and include the test objective, a list of any prerequisites, test steps, actual results, expected results, tester ID, the current date, and the current iteration of the test. All TCRs are included with each Test Case Description (TCD).
- Test Execution and Progression. A detailed description of general control procedures for executing a test, such as special conditions and the process for returning a TCR to a developer in the event that it fails. This description is typically provided with each TCD.
- Test Schedule. A specific test schedule that is defined within each TCD, based upon the project plan, and maintained using MS Project or a comparable tool. The overall test schedule for the project is available in the TCD Test Schedule Summary, which identifies the testing start and end dates for each TCD.

As part of 6.3 Execute System Test, other specific tests should be planned for: 6.3.3 Perform Data Validation, 6.3.4 Conduct Disaster Recovery Testing, and 6.3.5 Conduct Volume Testing.

The system test plan should include:
- System name, version number, and list of components
- References to design documents such as high-level designs, workflow designs, database model and reference, hardware descriptions, etc.
- Specification of the test environment
- Overview of the test runs (coverage, interdependencies)
- Per test run:
  - Type and purpose of the test run (coverage, results, etc.)
  - Prerequisites (e.g., accurate results from other test runs, availability of reference data, space in the database or file system, availability of monitoring tools, etc.)
  - Definition of test input
  - References to test data or load-files to be used (note: data must be stored in a secure place to permit repeatable tests)
  - Specification of the expected output and system behaviour (including record counts, error records expected, expected runtime, etc.)
  - Specification of expected and maximum acceptable runtime
  - Step-by-step guide to execute the test (including environment preparation, results recording, analysis steps, etc.)
- Defect tracking process and tools
- Description of the structure for meetings to discuss progress, issues, and defect management during the test

The system test plan consists of one or more test runs, each of which must be described in detail. The interaction between the test runs must also be specified. After each run, the System Test Manager can decide, depending on the defect count and severity, whether the system test can proceed with subsequent test runs or whether errors must be corrected and the previous run repeated. Every difference between the expected output and the actual test output should be recorded and entered into the defect tracking system with a description of the severity and impact on the test process. These errors and the general progress of the system test should be discussed in a weekly or bi-weekly progress meeting.
At this meeting, participants review the progress of the system test, any problems identified, and assignments to resolve or avoid them. The meeting should be directed by the System Test Manager and attended by the testers and other necessary specialists such as designers, developers, systems engineers, and database administrators. After assignment of the findings, the specialists can take the necessary actions to resolve the problems. After each solution is approved and implemented, the system test can proceed. When all tests have run successfully and all defects are resolved and retested, the system test plan will have been completed.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:38

Phase 6: Test
Subtask 6.1.4 Define User Acceptance Test Plan Description
User Acceptance Testing (often known as UAT) is essential for gaining approval, acceptance, and project sign-off. It is the end-user community that needs to carry out the testing and identify relevant issues for fixing. Resources for the testing will include the physical environment setup as well as the allocation of staff from the user community to perform the testing. As with system testing, planning for User Acceptance Testing should begin early in the project to ensure the necessary resources are scheduled and ready. In addition, the user acceptance criteria will need to be distilled from the requirements and existing gold-standard reports. These criteria need to be documented and agreed by all parties so as to avoid delays through scope creep.

Prerequisites
None

Roles

Business Analyst (Secondary) End User (Primary) Quality Assurance Manager (Approve) Test Manager (Primary)

Considerations
The plan should be built around the acceptance criteria, with test scripts of actions that users will need to carry out to achieve certain results; for example, instructions to run particular workflows and reports within which the users can then examine the data. The author of the plan needs to bear in mind that the testers from the user community may not be technically minded. Indeed, one possible benefit of having non-technical users involved is that they will provide insight into the time and effort required for adoption and training when the completed data integration project is deployed.

In addition to test scripts for execution, additional criteria for acceptance need to be defined (a simple data quality tolerance check is sketched at the end of this section):
- Performance, required response time, and usability
- Data quality tolerances
- Validation procedures for verifying data quality
- Tolerable bugs, based on the defect management processes

In Data Migration projects, user acceptance testing is even more user-focused than in other data integration efforts. This testing usually takes two forms: traditional UAT and day-in-the-life testing. During these two phases, business users work through the system, executing their normal daily routine and driving out issues and inconsistencies. It is very important that the data migration team works closely with the business testers both to provide appropriate data for these tests and to capture feedback to improve the data as soon as possible. This UAT activity is the best way to find out if the data is correct and if the data migration was completed successfully.
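As a simple illustration of the data quality tolerance criterion listed above, the check below accepts a load only if the share of rows failing validation stays within an agreed tolerance. The tolerance value and row counts are hypothetical; the actual thresholds must come from the documented acceptance criteria.

```python
# Pass/fail check for a data quality tolerance acceptance criterion.
def within_tolerance(total_rows: int, failed_rows: int, tolerance_pct: float) -> bool:
    """Accept if the percentage of rows failing validation is within tolerance."""
    failure_rate = 100.0 * failed_rows / total_rows if total_rows else 0.0
    return failure_rate <= tolerance_pct

# Example: the UAT criterion allows at most 0.5% of customer rows to fail validation.
print(within_tolerance(total_rows=250_000, failed_rows=900, tolerance_pct=0.5))    # True  (0.36%)
print(within_tolerance(total_rows=250_000, failed_rows=2_000, tolerance_pct=0.5))  # False (0.80%)
```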

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test
Subtask 6.1.5 Define Test Scenarios Description
Test scenarios provide the context, the story line, for much of the test procedures, whether Unit Test, System Test, or UAT. How can you know that the software solution you are developing will work within its ultimate business usage? A scenario provides the business case for testing specific functionality, enabling testers to simulate carrying out the related business activity and then measure the results against expectations. For this reason, design of the scenarios is a critical activity and one that may involve significant effort in order to provide coverage for all the functionality that needs testing. The test scenario forms the basis for the development of test scripts and checklists, the source data definitions, and other details of specific test runs.

Prerequisites
None

Roles

Business Analyst (Secondary) End User (Primary) Quality Assurance Manager (Approve) Test Manager (Primary)

Considerations
Test scenarios must be based on the functional and technical requirements, dividing them into specific functions that can be treated in a single test process. Test scenarios may include:
- The purpose/objective of the test (the functionality being tested) described in end-user terms.
- Description of the business, functional, or technical context for the test.
- Description of the types of technologies, development objects, and/or data that should be included.
- Any known dependencies on other elements of the existing or new systems.

Typical attributes of test scenarios:
- They should be designed to represent both typical and unusual situations.
- They should include use of valid data as well as invalid or missing data.

Test engineers may define their own unit test cases. Business cases and test scenarios for System and Integration Tests are developed by the test team with the assistance of developers and end users.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test
Subtask 6.1.6 Build/Maintain Test Source Data Set Description
This subtask deals with the procedures and considerations for actually creating, storing, and maintaining the test data. The procedures for any given project are, of course, specific to its requirements and environments, but are also opportunistic. For some projects, there will exist a comprehensive set of data or at least a good start in that direction, while for other projects, the test data may need to be created from scratch. In addition to test data that allows full functional testing (i.e., functional test data), there is also a need for adequate data for volume tests (i.e., volume test data). The following paragraphs discuss each of these data types.

Functional Test Data


Creating a source data set to test the functionality of the transformation software should be the responsibility of a specialized team largely consisting of business-aware application experts. Business application skills are necessary to ensure that the test data not only reflects the eventual production environment but that it is also engineered to trigger all the functionality specified for the application. Technical skills in whatever storage format is selected are also required to facilitate data entry and/or movement. Volume is not a requirement of the functional test data set; indeed, too much data is undesirable since the time taken to load it needlessly delays the functional test. In a data integration project, while functional test data for the application sources is indispensable, the case for a predefined data set for the targets should also be considered. If available, such a data set makes it possible to develop an automated test procedure to compare the actual result set to a predicted result set (making the necessary adjustments to generated data, such as surrogate keys, timestamps, etc.). This has additional value in that the definition of a target data set in itself serves as a sort of design audit.

Volume Test Data


The main objective of the volume test data set is to ensure that the project satisfies any Service Level Agreements that are in place and generally meets performance expectations in the live environment. Once again, PowerCenter can be used to generate volumes of data and to modify sensitive live information in order to preserve confidentiality. There are a number of techniques to generate multiple output rows from a single source row (a simple sketch of this row-multiplication idea follows below), such as:
- Cartesian join in the source qualifier
- Normalizer transformation
- Union transformation
- Java transformation

If possible, the volume test data set should also be available to developers for unit testing in order to identify problems as soon as possible.
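The listed PowerCenter techniques all multiply source rows to reach the required volume. The sketch below shows the same idea in plain Python for illustration only: each source row is expanded into several copies with unique surrogate keys so that value distributions stay realistic. The multiplier and field names are assumptions.

```python
# Expand a small source data set into a larger volume test data set.
def expand_rows(source_rows, multiplier=100):
    """Yield `multiplier` copies of each source row with unique surrogate keys."""
    next_key = 1
    for row in source_rows:
        for _ in range(multiplier):
            copy = dict(row)
            copy["row_id"] = next_key
            next_key += 1
            yield copy

sample = [{"customer": "A", "amount": 10.0}, {"customer": "B", "amount": 25.0}]
volume_rows = list(expand_rows(sample, multiplier=3))
print(len(volume_rows))  # 6 rows generated from 2 source rows
```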

Maintenance
In addition to the initial acquisition or generation of test data, you will need a protected location for its storage and procedures for migrating it to test environments in such a fashion that the original data set is preserved (for the next test sequence). In addition, you are likely to need procedures that will enable you to rebuild or rework the test data, as required.

Prerequisites
None

Roles

Business Analyst (Primary) Data Integration Developer (Primary)

Considerations
Creating the source and target data sets and conducting automated testing are non-trivial tasks, and are therefore often dismissed as impractical. This is partly the result of a failure to appreciate the role that PowerCenter can play in the execution of the test strategy. At some point in the test process, it is going to be necessary to compile a schedule of expected results from a given starting point. Using PowerCenter to make this information available and to compare it with the actual results from the execution of the workflows can greatly facilitate the process.

Data Migration projects should have little need for generating test data. It is strongly recommended that all data migration integration and system tests use actual production data. Therefore, effort spent generating test data on a data migration project should be very limited.
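A minimal sketch of the automated comparison described above is shown below: the actual rows produced by a test load are checked against a predicted result set, ignoring generated columns such as surrogate keys and timestamps. Column names are hypothetical; in practice the comparison could equally be implemented as a PowerCenter mapping or compare scripts.

```python
# Compare an actual result set against a predicted one, ignoring generated columns.
GENERATED_COLUMNS = {"surrogate_key", "load_timestamp"}

def comparable(rows):
    """Strip generated columns and return a sortable form of each row."""
    return sorted(
        tuple(sorted((k, v) for k, v in row.items() if k not in GENERATED_COLUMNS))
        for row in rows
    )

def compare_result_sets(expected_rows, actual_rows):
    exp, act = comparable(expected_rows), comparable(actual_rows)
    return {
        "row_count_matches": len(exp) == len(act),
        "missing_rows": [r for r in exp if r not in act],
        "unexpected_rows": [r for r in act if r not in exp],
    }

expected = [{"customer": "A", "amount": 10.0, "surrogate_key": 1}]
actual = [{"customer": "A", "amount": 10.0, "surrogate_key": 991, "load_timestamp": "2011-01-01"}]
print(compare_result_sets(expected, actual))
```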

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:40

Phase 6: Test
Task 6.2 Prepare for Testing Process Description
This is the first major task of the Test Phase: general preparations for System Test and UAT. This includes preparing environments, ramping up defect management procedures, and generally making sure that the test plans and all their elements are prepared and that all participants have been notified of the upcoming testing processes.

Prerequisites
None

Roles

Data Integration Developer (Secondary) Database Administrator (DBA) (Primary) Presentation Layer Developer (Secondary) Quality Assurance Manager (Primary) Repository Administrator (Primary) System Administrator (Primary) Test Manager (Primary)

Considerations
Prior to beginning this task, collect and review the documentation generated by the previous tasks and subtasks, including the test strategy, system test plan, and UAT plan. Verify that all required test data has been prepared and that the defect tracking system is operational. Ensure that all unit test certification procedures are being followed.

Based on the system test plan and UAT plan:
- Collect all relevant requirements, functional and internal design specifications, end-user documentation, and any other related documents.
- Develop the test procedures and documents for testers to follow from these.
- Verify that all expected participants have been notified of the applicable test schedule.
- Review the upcoming test processes with the Project Sponsor to ensure that they are consistent with the organization's existing QA culture (i.e., in terms of testing scope, approaches, and methods).
- Review the test environment requirements (e.g., hardware, software, communications, etc.) to ensure that everything is in place and ready.
- Review testware requirements (e.g., coverage analyzers, test tracking, problem/bug tracking, etc.) to ensure that everything is ready for the upcoming tests.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test
Subtask 6.2.1 Prepare Environments Description
It is important to prepare the test environments in advance of System Test, with the following objectives:
- To emulate, to the extent possible, the Production environment.
- To provide test environments that enable full integration of the system, and isolation from development.
- To provide secure environments that support the test procedures and appropriate access.
- To allow System Tests and UAT to proceed without delays and without system disruptions.

Prerequisites
None

Roles

Data Integration Developer (Secondary) Database Administrator (DBA) (Primary) Presentation Layer Developer (Secondary) Repository Administrator (Primary) System Administrator (Primary) Test Manager (Primary)

Considerations Plans
A formal test plan needs to be prepared by the Project Manager in conjunction with the Test Manager. This plan should cover responsibilities, tasks, time-scales, resources, training, and success criteria. It is vital that all resources, including off-project support staff, are made available for the entire testing period. Test scripts need to be prepared, together with a definition of the data required to execute the scripts. The Test Manager is responsible for preparing these items, but is likely to delegate a large part of the work.

A formal definition of the required environment also needs to be prepared, including all necessary hardware components (i.e., server and client), software components (i.e., operating system, database, data movement, testing tools, application tools, custom application components, etc., including versions), security and access rights, and networking. Establishing security and isolation is critical for preventing any unauthorized or unplanned migration of development objects into the test environments. The test environment administrator(s) must have specific verifications, procedures, and timing for any migrations, and sufficient controls to enforce them.

Review the test plans and scenarios to determine the technical requirements for the test environments. Volume tests and disaster/recovery tests may require special system preparations. The System Test environment may evolve into the UAT environment, depending on requirements and stability.

Processes
Where possible, all processes should be supported by the use of appropriate tools. Some of the key processes related to the preparation of the environments include:
- Training testers: a series of briefings and/or training sessions should be made available. This may be any combination of formal presentations, formal training courses, computer-based tutorials, or self-study sessions.
- Recording test results: the results of each test must be recorded and cross-referenced to the defect reporting process.
- Reporting and resolution of defects (see 5.1.3 Define Defect Tracking Process): a process for recording defects, prioritizing their resolution, and tracking the resolution process.
- Overall test management: a process for tracking the effectiveness of UAT and the likely effort and timescale remaining.

Data
The data required for testing can be derived from the test cases defined in the scripts. This should enable a full dataset to be defined, ensuring that all possible cases are tested. 'Live data' is usually not sufficient because it does not cover all the cases the system should handle, and may require some sampling to keep the data volumes at realistic levels. It is, of course, possible to use modified live data, adding the additional cases or modifying the live data to create the required cases. The process of creating the test data needs to be defined. Some automated approach to creating all or the majority of the data is best. There is often a need to process data through a system where some form of OLTP is involved. In this case, it must be possible to roll-back to a base-state of data to allow reapplication of the transaction data as would be achieved by restoring from back-up. Where multiple data repositories are involved, it is important to define how these datasets relate. It is also important that the data is consistent across all the repositories and that it can be restored to a known state (or states) as and when required.

Environment
A properly set-up environment is critical to the success of UAT. This covers:
- Server(s): must be available for the required duration and have sufficient disk space and processing power for the anticipated workload.
- Client workstations: must be available and sufficiently powerful to run the required client tools.
- Server and client software: all necessary software (OS, database, ETL, test tools, data quality tools, connectivity, etc.) should normally be installed at the version used in development, with databases created as required.
- Networking: all required LAN and WAN connectivity must be set up and firewalls configured to allow appropriate access. Bandwidth must be available for any particularly large data transmissions.
- Databases: all necessary schemas must be created and populated, with an appropriate backup/restore strategy in place and access rights defined and implemented.
- Application software: correct versions should be migrated from development.

For Data Migration, the system test environment should not be limited to the Informatica environment, but should also include all source systems, target systems, reference data, staging databases, and file systems. The system tests will be a simulation of the production systems, so the entire process should execute as it would in a production environment.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:43

Phase 6: Test
Subtask 6.2.2 Prepare Defect Management Processes Description
The key measure of software quality is, of course, the number of defects (a defect is anything that produces results other than the expected results based on the software design specification). Therefore it is essential for software projects to have a systematic approach to detecting and resolving defects early in the development life cycle.

Prerequisites
None

Roles

Quality Assurance Manager (Primary) Test Manager (Primary)

Considerations
Personal and peer reviews are primary sources of early defect detection. Unit testing, system testing, and UAT are other key sources; however, in these later project stages, defect detection is a much more resource-intensive activity. Worse yet, change requests and trouble reports are evidence of defects that have made their way to the end users.

There are two major components of successful defect management: defect prevention and defect detection. A good defect management process should enable developers both to lower the number of defects that are introduced and to remove defects early in the life cycle, prior to testing. Defect management begins with the design of the initial QA strategy and a good, detailed test strategy. These should clearly define methods for reviewing system requirements and design, and spell out guidelines for testing processes, tracking defects, and managing each type of test. In addition, many QA strategies include specific checklists that act as gatekeepers to authorize satisfactory completion of tests, especially during unit and system testing.

To support early defect resolution, you must have a defect tracking system that is readily accessible to developers and includes the following (a minimal example record structure is sketched below):
- The ability to identify and type the defect, with details of its behaviour
- A means of recording the timing of the defect discovery, resolution, and retest
- A complete description of the resolution
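A minimal sketch of the defect record implied by these requirements is shown below; the field choices are illustrative, and any commercial or in-house defect tracking system that captures the same information would satisfy the requirement.

```python
# Minimal illustrative defect record: identification/typing, timing, resolution.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Defect:
    defect_id: str
    defect_type: str              # e.g., data, logic, performance, environment
    severity: str                 # e.g., critical, major, minor
    behaviour: str                # observed vs. expected behaviour
    discovered_on: date
    resolved_on: Optional[date] = None
    retested_on: Optional[date] = None
    resolution: str = ""          # complete description of the fix

defect = Defect(
    defect_id="DEF-042",
    defect_type="data",
    severity="major",
    behaviour="Null postal codes loaded where standardized codes were expected",
    discovered_on=date(2011, 3, 1),
)
print(defect)
```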

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test
Task 6.3 Execute System Test Description
System Test (sometimes known as Integration Test) is crucial for ensuring that the system operates reliably and according to the business requirements and technical design. Success rests largely on business users' confidence in the integrity of the data. If the system has flaws that impede its function, the data may also be flawed, or users may perceive it as flawed - which results in a loss of confidence in the system. If the system does not provide adequate performance and responsiveness, the users may abandon it (especially if it is a reporting system) because it does not meet their perceived needs.

System testing follows unit testing, providing the first tests of the fully integrated system, and offers an opportunity to clarify users' performance expectations and establish realistic goals that can be used to measure actual operation after the system is placed in production. It also offers a good opportunity to refine the data volume estimates that were originally generated in the Architect Phase. This is useful for determining whether existing or planned hardware will be sufficient to meet the demands on the system.

This task incorporates five steps:
1. 6.3.1 Prepare for System Test, in which the test team determines how to test the system from end to end to ensure a successful load, and plans the environments, participants, tools, and timelines for the test.
2. 6.3.2 Execute Complete System Test, in which the data integration team works with the Database Administrator to run the system tests planned in the prior subtask. It is crucial to also involve end users in the planning and review of system tests.
3. 6.3.3 Perform Data Validation, in which the QA Manager and QA team ensure that the system is capable of delivering complete, valid data to the business users.
4. 6.3.4 Conduct Disaster Recovery Testing, in which the system's robustness and recovery in case of disasters such as network or server failure is tested.
5. 6.3.5 Conduct Volume Testing, in which the system's capability to handle large volumes is tested.

Prerequisites
None

Roles

Business Analyst (Primary) Data Integration Developer (Primary) Database Administrator (DBA) (Primary) End User (Primary) Network Administrator (Secondary) Presentation Layer Developer (Secondary) Project Sponsor (Review Only) Quality Assurance Manager (Review Only) Repository Administrator (Secondary) System Administrator (Primary) Technical Project Manager (Review Only) Test Manager (Primary)

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

23 of 49

Considerations
All involved individuals and departments should review and approve the test plans, test procedures, and test results prior to beginning this subtask. It is important to thoroughly document the system testing procedure, describing the testing strategy, acceptance criteria, scripts, and results. This information can be invaluable later on, when the system is in operation and may not be meeting performance expectations or delivering the results that users want - or expect.
For Data Migration projects, system tests are important because they are essentially dress rehearsals for the final migration. These tests should be executed with production-level controls and be tracked and improved upon from system test cycle to system test cycle. In data migration projects, these system tests are often referred to as mock runs or trial cutovers.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

24 of 49

Phase 6: Test
Subtask 6.3.1 Prepare for System Test Description
System test preparation consists primarily of creating the environment(s) required for testing the application and staging the system integration. System Test is the first opportunity, following comprehensive unit testing, to fully integrate all the elements of the system, and to test the system by emulating how it will be used in production. For this reason, the environment should be as similar as possible to the production environment in its hardware, software, communications, and any support tools.

Prerequisites
None

Roles

Data Integration Developer (Secondary) Database Administrator (DBA) (Secondary) System Administrator (Secondary) Test Manager (Primary)

Considerations
The preparations for System Test often take much more effort than expected, so they should be preceded by a detailed integration plan that describes how all of the system elements will be physically integrated within the System Test environment. The integration plan should be specific to your environment, but some of the general steps are likely to be the same. The following steps are common in most integration plans:
- Migration of Informatica development folders to the system test environment. These folders may also include shared folders and/or shortcut folders that may have been added or modified during the development process. In versioned repositories, deployment groups may be used for this purpose.
- Often, flat files or parameter files reside on the development environment's server and need to be copied to the appropriate directories on the system test environment server (a simple pre-test check for this is sketched after this list).
- Data consistency in the system test environment is crucial. In order to emulate the production environment, the data being sourced and targeted should be as close as possible to production data in terms of data quality and size.
- The data model of the system test environment should be very similar to the model that is going to be implemented in production. Columns, constraints, or indices often change throughout development, so it is important to system test the data model before going into production.
- Synchronization of incremental logic is key when doing system testing. In order to emulate the production environment, the variables or parameters used for incremental logic need to match the values in the system test environment database(s). If the variables or parameters don't match, they can cause missing data or unusual amounts of data being sourced.
For Data Migration projects, the system test should not just involve running Informatica workflows; it should also include data setup, migrating code, executing data and process validation, and post-process auditing. The system test set-up should be part of the system test, not a pre-system test step.
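As a small illustration of the file-consistency step above, a pre-test check can confirm that every parameter file and flat file the workflows reference actually exists in the expected system test directories. This is a minimal sketch, assuming the expected paths are kept in a plain-text manifest; the manifest format and paths are illustrative.

    import os
    import sys

    def check_files(manifest_path: str) -> int:
        """Report any files from the manifest that are missing on the system test server."""
        missing = []
        with open(manifest_path) as manifest:
            for line in manifest:
                path = line.strip()
                if path and not os.path.isfile(path):
                    missing.append(path)
        for path in missing:
            print(f"MISSING: {path}")
        return len(missing)

    if __name__ == "__main__":
        # e.g., python check_files.py systest_file_manifest.txt  (illustrative manifest name)
        sys.exit(1 if check_files(sys.argv[1]) else 0)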

Best Practices
None

Sample Deliverables
System Test Plan

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

25 of 49

Last updated: 01-Feb-07 18:48

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

26 of 49

Phase 6: Test
Subtask 6.3.2 Execute Complete System Test Description
System testing offers an opportunity to establish performance expectations and verify that the system works as designed, as well as to refine the data volume estimates generated in the Architect Phase . This subtask involves a number of guidelines for running the complete system test and resolving or escalating any issues that may arise during testing.

Prerequisites
None

Roles

Business Analyst (Secondary) Data Integration Developer (Secondary) Database Administrator (DBA) (Review Only) Network Administrator (Review Only) Presentation Layer Developer (Secondary) Quality Assurance Manager (Review Only) Repository Administrator (Review Only) System Administrator (Review Only) Technical Project Manager (Review Only) Test Manager (Primary)

Considerations System Test Plan


A system test plan needs to include prerequisites for entering the system test phase, criteria for successfully exiting it, and defect classifications. In addition, all test conditions, expected results, and test data need to be available prior to system test.

Load Routines
Ensure that the system test plan includes all types of load that may be encountered during the normal operation of the system. For example, a new data warehouse (or a new instance of a data warehouse) may include a one-off initial load step. There may also be weekly, monthly, or ad-hoc processes beyond the normal incremental load routines. System testing is a cyclical process. The project team should plan to execute multiple iterations of the most common load routines within the timeframe allowed for system testing. Applications should be run in the order specified in the test plan.

Scheduling
An understanding of dependent predecessors is crucial for the execution of end-to-end testing, as is the schedule for the testing run. Scheduling, which is the responsibility of the testing team, is generally facilitated through an application such as the PowerCenter Workflow Manager module and/or a third-party scheduling tool. Use the pmcmd command line syntax when running PowerCenter tasks and workflows with a third-party scheduler (a sketch of such a wrapper follows). Third-party scheduling tools can create dependencies between PowerCenter tasks and jobs that cannot be run on PowerCenter itself. The tools in PowerCenter and/or a third-party scheduling tool can also be used to detect long-running sessions/tasks and alert the system test team via email. This helps to identify issues early and manage the system test timeframe effectively.
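For example, a third-party scheduler typically launches PowerCenter workflows through a thin wrapper around pmcmd. The sketch below assumes pmcmd is on the path and that connection details are passed in by the scheduler; the option list mirrors common pmcmd startworkflow usage and should be confirmed against the installed PowerCenter version.

    import subprocess
    import sys

    def run_workflow(service, domain, user, password, folder, workflow):
        """Start a PowerCenter workflow via pmcmd and wait for it to finish.
        The options shown follow common pmcmd startworkflow syntax; verify them
        against your PowerCenter version before use."""
        cmd = [
            "pmcmd", "startworkflow",
            "-sv", service,
            "-d", domain,
            "-u", user,
            "-p", password,
            "-f", folder,
            "-wait",
            workflow,
        ]
        result = subprocess.run(cmd)
        return result.returncode   # non-zero lets the scheduler raise an alert

    if __name__ == "__main__":
        sys.exit(run_workflow(*sys.argv[1:7]))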

System Test Results


The team executing the system test plan is responsible for tracking the expected and actual results of each session and task run. Commercial software tools are available for logging test cases and storing test results. The details of each PowerCenter session run can be found in the Workflow Monitor. To see the results:
1. Right-click the session in the Workflow Monitor and choose Properties.
2. Click the Transformation Statistics tab in the Properties dialog box.
Session statistics are also available in the PowerCenter repository view REP_SESS_LOG, or through Metadata Reporter.
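Where the repository database is accessible, session statistics can also be pulled in bulk from REP_SESS_LOG rather than read session by session in the Workflow Monitor. The sketch below is illustrative only; the column names and the generic DB-API connection are assumptions and should be checked against the repository view definitions for the installed PowerCenter version.

    # Hypothetical pull of recent session statistics from the repository.
    # Column names (SUBJECT_AREA, SESSION_NAME, SUCCESSFUL_ROWS, FAILED_ROWS,
    # ACTUAL_START) are assumptions -- confirm them in the REP_SESS_LOG view,
    # and adjust the bind-parameter style to your database driver.
    QUERY = """
    SELECT SUBJECT_AREA, SESSION_NAME, SUCCESSFUL_ROWS, FAILED_ROWS, ACTUAL_START
    FROM REP_SESS_LOG
    WHERE ACTUAL_START >= :start_date
    ORDER BY ACTUAL_START
    """

    def fetch_session_stats(connection, start_date):
        """Return session statistics captured since start_date for the test log."""
        cursor = connection.cursor()
        cursor.execute(QUERY, {"start_date": start_date})
        return cursor.fetchall()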

Resolution of Coding Defects


The testing team must document the specific statistical results of each run and communicate those results back to the project development team. If the results do not meet the criteria listed in the test case, or if any process fails during testing, the test team should immediately generate a change request. The change request is assigned to the developer(s) responsible for completing system modifications. In the case of a PowerCenter session failure, the test team should seek the advice of the appropriate developer and business analyst before continuing with any other dependent tests. Ideally, all defects will be captured, fixed, and successfully retested within the system testing timeframe. In reality, this is unlikely to happen. If outstanding defects are still apparent at the end of the system testing period, the project team needs to decide how to proceed. If the system test plan contains successful system test completion criteria, those criteria must be fulfilled. Defect levels must meet established criteria for completion of the system test cycle. Defects should be judged by their number and by their impact. Ultimately, the project team is responsible for ensuring that the tests adhere to the system test plan and the test cases within it (developed in Subtask 6.3.1 Prepare for System Test). The project team must review and sign off on the results of the tests.
For Data Migration projects, because they are usually part of a larger implementation, the system test should be integrated with the larger project system test. The results of this test should be reviewed, improved upon, and communicated to the project manager or project management office (PMO). It is common for these types of projects to have three or four full system tests, otherwise known as mock runs or trial cutovers.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:46

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

28 of 49

Phase 6: Test
Subtask 6.3.3 Perform Data Validation Description
The purpose of data validation is to ensure that source data is populated as per specification. The team responsible for completing the end-to-end test plan should be in a position to utilize the results detailed in the testing documentation (e.g., TCR, CTPs, TCD, and TCRs). Test team members should review and analyze the test results to determine if project and business expectations are being met. If the team concludes that the expectations are being met, it can sign-off on the end-to-end testing process. If expectations are not met, the testing team should perform a gap analysis on the differences between the test results and the project and business expectations. The gap analysis should list the errors and requirements not met so that a Data Integration Developer can be assigned to investigate the issue. The analysis should also include data from initial runs in production. The Data Integration Developer should assess the resources and time required to modify the data integration environment to achieve the required test results. The Project Sponsor and Project Manager should then finalize the approach for incorporating the modifications, which may include obtaining additional funding or resources, limiting the scope of the modifications, or re-defining the business requirements to minimize modifications.

Prerequisites
None

Roles

Business Analyst (Primary) Data Integration Developer (Review Only) Presentation Layer Developer (Secondary) Project Sponsor (Review Only) Quality Assurance Manager (Review Only) Technical Project Manager (Review Only) Test Manager (Primary)

Considerations
Before performing data validation, it is important to consider these issues:
Job Run Validation. A very high-level validation can be performed using dashboards or custom reports in Informatica Data Explorer. The session logs and the Workflow Monitor can be used to check whether the job has completed successfully. If relational database error logging is chosen, the error tables can be checked for any transformation and session errors. The Data Integration Developer needs to resolve the errors identified in the error tables. The Integration Service generates the following tables to help you track row errors:
- PMERR_DATA. Stores data and metadata about a transformation row error and its corresponding source row.
- PMERR_MSG. Stores metadata about an error and the error message.
- PMERR_SESS. Stores metadata about the session.
- PMERR_TRANS. Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error occurs.
Involvement. The test team, the QA team, and, ultimately, the end-user community are all jointly responsible for ensuring the accuracy of the data. At the conclusion of system testing, all must sign off to indicate their acceptance of the data quality.

Access To Front-End for Reviewing Results. The test team should have access to reports and/or a front-end tool to help review the results of each testing run. Before testing begins, the team should determine how results are to be reviewed and reported, what tool(s) are to be used, and how the results are to be validated. The test team should also have access to current business reports produced in legacy and current operational systems. The current reports can be compared to those produced from data in the new system to determine that requirements are satisfied and that the new reports are accurate.
The Data Validation task has enormous scope and is a significant phase in any project cycle. Data validation can be either manual or automated.
- Manual. This technique involves manually validating target data against the source and also ensuring that all the transformations have been correctly applied. Manual validation may be valid for a limited set of data or for master data.
- Automated. This technique involves using various techniques and/or tools to validate data and ensure, at the end of the cycle, that all the requirements are met (a minimal row-count comparison is sketched after this list).
The following tools are very useful for data validation:
- File Diff. This utility is generally available with any testing tool and is very useful if the source(s) and target(s) are files. Otherwise, the result sets from the source and/or target systems can be saved as flat files and compared using file diff utilities.
- Data Analysis Using IDQ. The testing team can use Informatica Data Quality (IDQ) Data Analysis plans to assess the level of data quality needs. Plans can be built to identify problems with data conformity and consistency. Once the data is analyzed, scorecards can be used to generate a high-level view of the data quality. Using the results from data analysis and scorecards, new test cases can be added and new test data can be created for the testing cycle.
- Using DataProfiler In Data Validation. Full data validation can be one of the most time-consuming elements of the testing process. During the System Test phase of the data integration project, you can use data profiling technology to validate the data loaded to the target database. Data profiling allows the project team to test the requirements and assumptions that were the basis for the Design Phase and Build Phase of the project, facilitating such tests as: business rule validations; domain validations; row counts and distinct value counts; and aggregation accuracy. Throughout testing, it is advisable to re-profile the source data. This provides information on any source data changes that may have taken place since the Design Phase. Additionally, it can be used to verify the makeup and diversity of any data sets extracted or created for the purposes of testing. This is particularly relevant in environments where production source data was not available during design. When development data is used to develop the business rules for the mappings, surprises commonly occur when production data finally becomes available.
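One common automated check is a simple source-to-target row-count comparison run at the end of each test cycle. The following is a minimal sketch using a generic DB-API connection; the helper names and the example table names are illustrative.

    def _row_count(conn, table):
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")   # table names come from a reviewed list
        return cur.fetchone()[0]

    def compare_counts(source_conn, target_conn, table_pairs):
        """Compare source and target row counts and return any mismatches.
        table_pairs is a list of (source_table, target_table) tuples, e.g.
        [("STG_CUSTOMER", "DIM_CUSTOMER")] -- illustrative names."""
        mismatches = []
        for src, tgt in table_pairs:
            src_count = _row_count(source_conn, src)
            tgt_count = _row_count(target_conn, tgt)
            if src_count != tgt_count:
                mismatches.append((src, tgt, src_count, tgt_count))
        return mismatches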

Defect Management:
The defects encountered during data validation should be organized using either a simple tool like an Excel (or comparable) spreadsheet or a more advanced tool. Advanced tools may have facilities for defect assignment, defect status changes, and/or a section for defect explanation. The Data Integration Developer and the testing team must ensure that all defects are identified and corrected before changing the defect status.
For Data Migration projects, it is important to identify a set of processes and procedures to be executed to simplify the validation process. These processes and procedures should be built into the Punch List and should focus on reliability and efficiency. For large-scale data migration projects, it is important to realize the scale of validation. A set of tools must be developed to enable the business validation personnel to quickly and accurately validate that the data migration was complete. Additionally, it is important that the run book includes steps to verify that all technical steps were completed successfully. PowerCenter Metadata Reporter should be leveraged and documented in the punch list steps, and detailed records of all interaction points should be included in operational procedures.

Best Practices
None

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

30 of 49

Sample Deliverables
None

Last updated: 15-Feb-07 19:48

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

31 of 49

Phase 6: Test
Subtask 6.3.4 Conduct Disaster Recovery Testing Description
Disaster testing is crucial for proving the resilience of the system to the business sponsors and IT support teams, and for ensuring that staff roles and responsibilities are understood if a disaster occurs.

Prerequisites
None

Roles

Database Administrator (DBA) (Primary) End User (Primary) Network Administrator (Secondary) Quality Assurance Manager (Review Only) Repository Administrator (Secondary) System Administrator (Primary) Test Manager (Primary)

Considerations
Prior to disaster testing, disaster tolerance and system architecture need to be considered. These factors should already have been assessed during earlier phases of the project. The first step is to try to quantify the risk factors that could cause a system to fail and evaluate how long the business could cope without the system should it fail. These determinations should allow you to judge the disaster tolerance capabilities of the system. Secondly, consider the system architecture. A well-designed system will minimize the risk of disaster. If a disaster occurs, the system should allow a smooth and timely recovery.

Disaster Tolerance
Disaster tolerance is the ability to successfully recover applications and data after a disaster within an acceptable time period. A disaster is an event that unexpectedly disrupts service availability, corrupts data, or destroys data. Disasters may be triggered by natural phenomena, malicious acts of sabotage against the organization, or terrorist activity against society in general. The need for a disaster tolerant system depends on the risk of disaster and how long the business can afford applications to be out of action. The location and geographical proximity of data centers plus the nature of the business affect risk. The vulnerability of the business to disaster depends upon the importance of the system to the business as a whole and the nature of a system. Service level agreements (SLA) for the availability of a system dictate the need for disaster testing. For example, a real-time message-based transaction processing application that has to be operational 24/7 needs to be recovered faster than a management information system with a less stringent SLA.

System Architecture
Disaster testing is strongly influenced by the system architecture. A system can be designed with a clustered architecture to reduce the impact of disaster. For example, a user acceptance system and a production system can run in a clustered environment. If the production server fails, the user acceptance machine can take over. As an extra precaution, replication technology can be used to protect critical data. PowerCenter server grid technology is beneficial when designing and implementing a disaster tolerant system. Normally, server grids are used to balance loads and improve performance on resource-intensive tasks, but they can help reduce disaster

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

32 of 49

recovery time too. Sessions in a workflow can be configured to run on any available server that is registered to the grid. The servers in the grid must be able to create and maintain a connection to each other across the network. If a server unexpectedly shuts down while it is running a session, then the workflow can be set to fail. This depends on the session settings specified and whether the server is configured as a master or worker server. Although the failed workflow has to be manually recovered if one of the servers unexpectedly shuts down, other servers in the grid should be available to rerun it, unless a catastrophic network failure occurs.
The guideline is to aim to avoid single points of failure in a system where possible. Clustering and server grid solutions alleviate single points of failure. Be aware that single physical points of failure are often hardware and network related. Be sure to have backup facilities and spare components available, for example, auxiliary generators, spare network cards, cooling systems; even a torch in case the lights go out! Perhaps the greatest risk to a system is human error. Businesses need to provide proper training for all staff involved in maintaining and supporting the system. Also be sure to provide documentation and procedures to cope with common support issues. Remember, a single mis-typed command or clumsy action can bring down a whole system.

Disaster Test Planning


After disaster tolerance and system architecture have been considered, you can begin to prepare the disaster test plan. Allow sufficient time to prepare the plan. Disaster testing requires a significant commitment in terms of staff and financial resources. Therefore, the test plan and activities should be precise, relevant, and achievable. The test plan identifies the overall test objectives; consider what the test goals are and whether they are worthwhile for the allocated time and resources. Furthermore, the plan explains the test scope, establishes the criteria for measuring success, specifies any prerequisites and logistical requirements (e.g., the test environment), includes test scripts, and clarifies roles and responsibilities.

Test Scope
Test scope identifies the exact systems and functions to be tested. There may not be time to test for every possible disaster scenario. If so, the scope should list and explain why certain functions or scenarios cannot be tested. Focus on the stress points for each particular application when deciding on the test scope. For example, in a typical data warehouse it is quite easy to recover data during the extract phase (i.e., when data is being extracted from a legacy system based on date/time criteria). It may be more difficult to recover from a downstream data warehouse or data mart load process, however. Be sure to enlist the help of application developers and system architects to identify the stress points in the overall system.

Establish Success Criteria


In theory, success criteria can be measured in several ways. Success can mean identifying a weakness in the system highlighted in the test cycle or successfully executing a series of scripts to recover critical processes that were impacted by the disaster test case. Use SLAs to help establish quantifiable measures of success. SLAs should already exist specifically for disaster recovery criteria. In general, if the disaster testing results meet or beat the SLA standards, then the exercise can be considered a success.

Environment and Logistical Requirements


Logistical requirements include schedules, materials, and premises, as well as hardware and software needs. Try to prepare a dedicated environment for disaster testing. As new applications are created and improved, they should be tested in the isolated disaster-testing environment. It is important to regularly test for disaster tolerance, particularly if new hardware and/or software components are introduced to the system being tested. Make sure that the testing environment is kept up to date with code and infrastructure changes that are being applied in the normal system testing environment(s). The test schedule is important because it explains what will happen and when. For example, if the electricity supply is going to be turned off or the plug pulled on a particular server, it must be scheduled and communicated to all concerned parties.

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

33 of 49

Test Scripts
The disaster test plan should include test scripts, detailing the actions and activities required to actually conduct the technical tests. These scripts can be simple or complex, and can be used to provide instructions to test participants. The test scripts should be prepared by the business analysts and application developers.

Staff Roles and Responsibilities


Encourage the organization's IT security team to participate in a disaster testing exercise. They can assist in simulating an attack on the database, identifying vulnerable access points on a network, and fine-tuning the test plan. Involve business representatives as well as IT testing staff in the disaster testing exercise. IT testing staff can focus on technical recovery of the system. Business users can identify the key areas for recovery and prepare backup strategies and procedures in case system downtime exceeds normal expectations. Ensure that the test plan is approved by the appropriate staff members and business groups.

Executing Disaster Tests


Disaster test execution should expose any flaws in the system architecture or in the test plan itself. The testing team should be able to run the tests based on the information within the test plan and the instructions in the test scripts. Any deficiencies in this area need to be addressed because a good test plan forms the basis of an overall disaster recovery strategy for the system. The test team is responsible for capturing and logging test results. It needs to communicate any issues in a timely manner to the application developers, business analysts, end-users, and system architects. It is advisable to involve other business and IT departmental staff in the testing where possible, not just the department members who planned the test. If other staff can understand the plan and successfully recover the system by following it, then the impact of a real disaster is reduced.

Data Migration Projects


While data migration projects do not require a full-blown disaster recovery solution, it is recommended to establish a disaster recovery plan. Typically this is a simple document identifying emergency procedures to follow if something were to happen to any of the major pieces of infrastructure. Additionally, a back-out plan should be in place in the event the migration must stop midstream during the final implementation weekend.

Conclusion and Postscript


Disaster testing is a critical aspect of the overall system testing strategy. If conducted properly, disaster testing provides valuable feedback and lessons that will prove important if a real disaster strikes.

Postscript: Backing Up PowerCenter Components


Apply safeguards to protect important PowerCenter components, even if disaster tolerance is not considered a high priority by the business. Be sure to back up the production repository every day. The backup takes two forms: a database backup of the repository schema organized by the DBA, and a backup using the pmrep syntax that can be called from a script. It is also advisable to back up the pmserver.cfg, pmrepserver.cfg, and odbc.ini files.
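A nightly repository backup can be scripted along the following lines. This is a sketch only: it assumes pmrep is on the path, and the connect and backup options shown follow common pmrep usage and should be verified against the pmrep reference for the installed version.

    import datetime
    import subprocess

    def backup_repository(repo, domain, user, password, backup_dir):
        """Connect to the repository with pmrep and write a dated backup file.
        Option names follow common pmrep usage; confirm them for your version."""
        stamp = datetime.date.today().strftime("%Y%m%d")
        backup_file = f"{backup_dir}/{repo}_{stamp}.rep"
        subprocess.run(["pmrep", "connect", "-r", repo, "-d", domain,
                        "-n", user, "-x", password], check=True)
        subprocess.run(["pmrep", "backup", "-o", backup_file], check=True)
        return backup_file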

Best Practices
Disaster Recovery Planning with PowerCenter HA Option PowerCenter Enterprise Grid Option

Sample Deliverables
None

Last updated: 06-Dec-07 14:56

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

34 of 49

Phase 6: Test
Subtask 6.3.5 Conduct Volume Testing Description
Basic volume testing seeks to verify that the system can cope with anticipated production data levels. Taken to extremes, volume testing seeks to find the physical and logical limits of a system; this is also known as stress testing. Stress and volume testing seek to determine when and if system behavior changes as the load increases. A volume testing exercise is similar to a disaster testing exercise. The test scenarios encountered may never happen in the production environment. However, a well-planned and conducted test exercise provides invaluable reassurance to the business and IT communities regarding the stability and resilience of the system.

Prerequisites
None

Roles

Data Integration Developer (Secondary) Database Administrator (DBA) (Primary) Network Administrator (Secondary) System Administrator (Secondary) Test Manager (Primary)

Considerations Understand Service Level Agreements


Before starting the volume test exercise, consider the Service Level Agreements (SLA) for the particular system. The SLA should set measures for system availability and projected temporal growth in the amount of data being stored by the system. The SLAs are the benchmark to measure the volume test results against.

Estimate Projected Data Volumes Over Time and Consider Peak Load Periods
Enlist the help of the DBAs and Business Analysts to estimate the growth in projected data volume across the lifetime of the system. Remember to make allowances for any data archiving strategy that exists in the system. Data archiving helps to reduce the volume of data in the actual core production system, although of course, the net volume of data will increase over time. Use the projected data volumes to provide benchmarks for testing.
Organizations often experience higher than normal periods of activity at predictable times. For example, a retailer or credit card supplier may experience peak activity during weekends or holiday periods. A bank may have month or year-end processes and statements to produce. Volume testing exercises should aim to simulate throughput at peak periods as well as normal periods. Stress testing goes beyond the peak period data volumes in order to find the limits of the system.
A task such as duplicate record identification (known as data matching in Informatica Data Quality parlance) can place significant demands on system resources. Informatica Data Quality (IDQ) can perform millions or billions of comparison operations in a matching process. The time available for the completion of a matching process can have a big impact on the perception that the plan is running correctly. Bear in mind that, for these reasons, data matching operations are often scheduled for off-peak periods. Data matching is also a processor-intensive activity: the speed of the processor has a significant impact on how fast a matching process completes. If the project includes data quality operations, consult with a Data Quality Developer when estimating data volumes over time and peak load periods.

Volume Test Planning


Volume test planning is similar in many ways to disaster test planning. See 6.3.4 Conduct Disaster Recovery Testing for details on disaster test planning guidelines.

However, there are some volume-test specific issues to consider during the planning stage:
Obtaining Volume Test Data and Data Scrambling
The test team responsible for completing the end-to-end test plan should ensure that the volume(s) of test data accurately reflect the production business environment. Obtaining adequate volumes of data for testing in a non-production environment can be time-consuming and logistically difficult, so remember to make allowances in the test plan for this. Some organizations choose to copy data from the production environment into the test system. Security protocol needs to be maintained if data is copied from a production environment, since the data is likely to need to be scrambled. Some of the popular RDBMS products contain built-in scrambling packages; third-party scrambling solutions are also available. Contact the DBA and the IT security manager for guidance on the data scrambling protocol of the department or organization. For new applications, production data probably does not exist. Some commercially-available software products can generate large volumes of data. Alternatively, one of the developers may be able to build a customized suite of programs to artificially generate data (a minimal generator is sketched after these notes).
Hardware and Network Requirements and Test Timing
Remember to consider the hardware and network characteristics when conducting volume testing. Do they match the production environment? Be sure to make allowances for the test results if there is a shortfall in processing capacity or network limitations on the test environment. Volume testing may involve ensuring that testing occurs at an appropriate time of day and day of week, taking into account any other applications that may negatively affect the database and/or network resources.
Increasing Data Volumes
Volume testing cycles need to include normal expected volumes of data and some exceptionally high volumes of data. Incorporate peak period loads into the volume testing schedules. If stress tests are being carried out, data volume needs to be increased even further. Additional pressure can be applied to the system, for example, by adding a high number of database users or temporarily bringing down a server. Any particular stress test cases need to be logged in the test plan and the test schedules.
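Where production data does not exist, a small custom generator is often enough to produce realistic volumes for volume and stress cycles. A minimal sketch follows; the column layout and value ranges are purely illustrative and would normally be driven by the actual source definitions.

    import csv
    import random
    import string

    def generate_test_file(path, rows):
        """Write an artificial flat-file source with a simple, illustrative layout."""
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["CUSTOMER_ID", "NAME", "REGION", "AMOUNT"])  # illustrative columns
            for i in range(1, rows + 1):
                name = "".join(random.choices(string.ascii_uppercase, k=8))
                writer.writerow([i, name, random.choice(["N", "S", "E", "W"]),
                                 round(random.uniform(1, 10000), 2)])

    # e.g., generate 10 million rows for a peak-period cycle (path is illustrative)
    # generate_test_file("/data/systest/customers_peak.csv", 10_000_000)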

Volume and Stress Test Execution


Volume Test Results Logging
The volume testing team is responsible for capturing volume test results. Be sure to capture performance statistics for PowerCenter tasks, database throughput, server performance, and network efficiency. PowerCenter Metadata Reporter provides an excellent method of logging PowerCenter session performance over time. Run the Metadata Reporter for each test cycle to capture session and workflow lapse time. The results can be displayed in Data Analyzer dashboards or exported to other media (e.g., PDF files). The views in the PowerCenter Repository can also be queried directly with SQL statements. In addition, collaborate with the network and server administrators on capturing additional statistics, such as those related to CPU usage, data transfer efficiency, and writing to disk. The types of statistics to capture depend on the operating system in use. If jobs and tasks are being run through a scheduling tool, use the features within the scheduling tool to capture lapse time data. Alternatively, use shell scripts or batch file scripts to retrieve time and process data from the operating system (a simple per-cycle logging wrapper is sketched below).
System Limits, Scalability, and Bottlenecks
If the system has been well-designed and built, the applications are more likely to perform in a predictable manner as data volumes increase. This is known as scalability and is a very desirable trait in any software system. Eventually, however, the limits of the system are likely to be exposed as data volumes reach a critical mass and other stresses are introduced into the system. Physical or user-defined limits may be reached on particular parameters. For example, exceeding the maximum file size supported on an operating system constitutes a physical limit. Alternatively, breaching sort space parameters by running a database SQL query probably constitutes a limit that has been defined by the DBA. Bottlenecks are likely to appear in the load processes before such limits are exceeded. For example, a SQL query called in a PowerCenter session may experience a sudden drop in performance when data volumes reach a threshold figure. The DBA and application developer need to investigate any sudden drop in the performance of a particular query. Volume and stress testing is intended to gradually increase the data load in order to expose weaknesses in the system as a whole.
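Where a scheduling tool is not available to capture lapse times, a simple wrapper can record the start time, elapsed time, and data volume of each test cycle. This is a minimal stdlib sketch; the CSV layout and the idea of passing the load command as an argument are assumptions.

    import csv
    import subprocess
    import time
    from datetime import datetime

    def run_and_log(cycle_name, row_volume, command, log_path="volume_test_log.csv"):
        """Run one volume test cycle and append its timings to a CSV log.
        command is the list of arguments for whatever script starts the load."""
        start = datetime.now()
        t0 = time.monotonic()
        result = subprocess.run(command)
        elapsed = time.monotonic() - t0
        with open(log_path, "a", newline="") as log:
            csv.writer(log).writerow([cycle_name, row_volume, start.isoformat(),
                                      round(elapsed, 1), result.returncode])
        return elapsed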

Conclusion
Volume and stress testing are important aspects of the overall system testing strategy. The test results provide important information that can be used to resolve issues before they occur in the live system. However, be aware that it is not possible to test all scenarios that may cause the system to crash. A sound system architecture and well-built software applications can help prevent sudden catastrophic errors.

Best Practices
None

Sample Deliverables
None

Last updated: 18-Oct-07 15:11

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

37 of 49

Phase 6: Test
Task 6.4 Conduct User Acceptance Testing Description
User Acceptance Testing (UAT) is arguably the most important step in the project and is crucial to verifying that the system meets the users' requirements. Being focused on business usage, it relates to the business requirements rather than to testing all the details of the technical specification. As such, UAT is considered black-box testing (i.e., performed without knowledge of all the underlying logic) that focuses on the deliverables to the end user, primarily through the presentation layer. UAT is the responsibility of the user community in terms of organization, staffing, and final acceptance, but much of the preparation will have been undertaken by IT staff working to a plan agreed with the users. The function of user acceptance testing is to obtain final functional approval from the user community for the solution to be deployed into production. As such, every effort must be made to replicate the production conditions.

Prerequisites
None

Roles

End User (Primary) Test Manager (Primary) User Acceptance Test Lead (Primary)

Considerations Plans
By this time, User Acceptance Criteria should have been precisely defined by the user community, as well, of course, as the specific business objectives and requirements for the project. UAT acceptance criteria should include:
- tolerable bug levels, based on the defect management procedures
- report validation procedures (data audit, etc.), including gold standard reports to use for validation
- data quality tolerances that must be met
- validation procedures that will be used for comparison to existing systems (especially for validation of data migration/synchronization projects or operational integration)
- required performance tolerances, including response time and usability
As the testers may not have a technical background, the plan should include detailed procedures for testers to follow. The success of UAT depends on having certain critical items in place:
- A formal testing plan supported by detailed test scripts
- A properly configured environment, including the required test data (ideally a copy of the real, production environment and data)
- Adequately experienced test team members from the end user community
- Technical support personnel to support the testing team and to evaluate and remedy problems and defects discovered

Staffing the User Acceptance Testing


It is important that the user acceptance testers and their management are thoroughly committed to the new system and to ensuring its success. There needs to be communication with the user community so that they are informed of the project's progress and able to identify appropriate members of staff to make available to carry out the testing. These participants will become the users best equipped to adopt the new system and so should be considered super-users who may participate in user training thereafter.

Best Practices
None

Sample Deliverables

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

38 of 49

None

Last updated: 16-Feb-07 14:07

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

39 of 49

Phase 6: Test
Task 6.5 Tune System Performance Description
Tuning a system can, in some cases, provide orders of magnitude performance gains. However, tuning is not something that should just be performed after the system is in production; rather, it is a concept of continual analysis and optimization. More importantly, tuning is a philosophy. The concept of performance must permeate all stages of development, testing, and deployment. Decisions made during the development process can seriously impact performance and no level of production tuning can compensate for an inefficient design that must be redeveloped. The information in this section is intended for use by Data Integration Developers, Data Quality Developers, Database Administrators, and System Administrators, but should be useful for anyone responsible for the long-term maintenance, performance, and support of PowerCenter Sessions, Data Quality Plans, PowerExchange Connectivity and Data Analyzer Reports.

Prerequisites
None

Roles

Data Integration Developer (Primary) Data Warehouse Administrator (Primary) Database Administrator (DBA) (Primary) Network Administrator (Primary) Presentation Layer Developer (Primary) Quality Assurance Manager (Review Only) Repository Administrator (Primary) System Administrator (Primary) System Operator (Primary) Technical Project Manager (Review Only) Test Manager (Primary)

Considerations
Tuning the data integration environment involves more than simply tuning PowerCenter or any other Informatica product. True system performance analysis requires looking at all areas of the environment to determine opportunities for better performance from relational database systems, file systems, network bandwidth, and even hardware. The tuning effort requires benchmarking, followed by small incremental tuning changes to the environment, then re-executing the benchmarked data integration processes to determine the effect of the tuning changes.
Often, tuning efforts mistakenly focus on PowerCenter as the only point of concern when there may be other areas causing the bottleneck and needing attention. If you are sourcing data from a relational database, for example, your data integration loads can never be faster than the source database can provide data. If the source database is poorly indexed, poorly implemented, or underpowered, no amount of downstream tuning in PowerCenter, hardware, network, file systems, etc. can fix the problem of slow source data access. Throughout the tuning process, the entire end-to-end process must be considered and measured. The unit of work being baselined may be a single PowerCenter session, for example, but it is always necessary to consider the end-to-end process of that session in the tuning efforts.
Another important consideration of system tuning is the availability of an ongoing means to monitor system performance. While it is certainly important to focus on a specific area, tune, and deploy to production to gain benefit, continuously monitoring the performance of the system may reveal areas that show degradation over time, and sometimes even immediate, extreme degradation for one reason or another. Quick identification of these areas allows proactive tuning and adjustments before the problems become catastrophic. A good monitoring system may involve a variety of technologies to provide a full view of the environment.
Note: The PowerCenter Administrator's Guide provides extensive information on performance tuning and is an excellent reference source on this topic.
For Data Migration projects, performance is often an important consideration. If a data migration project is the result of the implementation of a new package application or operational system, downtime is usually required. Because this downtime may prevent the business from operating, the scheduled outage window must be as short as possible. Therefore, performance tuning is often addressed between system tests.

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

40 of 49

Best Practices
Recommended Performance Tuning Procedures

Sample Deliverables
None

Last updated: 24-Jun-10 13:55

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

41 of 49

Phase 6: Test
Subtask 6.5.1 Benchmark Description
Benchmarking is the process of running sessions or reports and collecting run statistics to set a baseline for comparison. The benchmark can be used as the standard for comparison after the session or report is tuned for performance. When determining a benchmark, the two key statistics to record are: session duration from start to finish, and rows-per-second throughput.

Prerequisites
None

Roles

Data Integration Developer (Primary) Data Warehouse Administrator (Primary) Database Administrator (DBA) (Primary) Network Administrator (Primary) Presentation Layer Developer (Primary) Repository Administrator (Primary) System Administrator (Primary) Test Manager (Primary)

Considerations
Since the goal of this task is to improve the performance of the entire system, it is important to choose a variety of mappings to benchmark. Having a variety of mappings ensures that optimizing one session does not adversely affect the performance of another session. It is important to work with the same exact data set each time you run a session for benchmarking and performance tuning. For example, if you run 1,000 rows for the benchmark, it is important to run the exact same rows for future performance tuning tests. After choosing a set of mappings, create a set of new sessions that use the default settings. Run these sessions when no other processes are running in the background.

Tip Tracking Results One way to track benchmarking results is to create a reference spreadsheet. This should define the number of rows processed for each source and target, the session start time, end time, time to complete, and rows per second throughput. Track two values for rows per second throughput: rows per second as calculated by PowerCenter (from transformation statistics in the session properties), and the average rows processed per second (based on total time duration divided by the number of rows loaded).
If it is not possible to run the session without background processes, schedule the session to run daily at a time when there are not many processes running on the server. Be sure that the session runs at the same time each day or night for benchmarking, and that it runs at the same time for future tests. Track the performance results in a spreadsheet over a period of days or for several runs. After the statistics are gathered, compile the average of the results in a new spreadsheet. Once the average results are calculated, identify the sessions that have the lowest throughput or that miss their load window; these sessions are the first candidates for performance tuning (a small summarization sketch follows). When the benchmark is complete, the sessions should be tuned for performance. It should be possible to identify potential areas for improvement by considering the machine, network, database, and PowerCenter session and server process. Data Analyzer benchmarking should focus on the time taken to run the source query, generate the report, and display it in the user's browser.
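The averages and throughput figures described above can be derived mechanically once the run statistics are recorded. A minimal sketch follows, assuming the benchmark spreadsheet has been exported to CSV with illustrative column names.

    import csv

    def summarize_benchmarks(csv_path):
        """Average duration and rows-per-second per session across benchmark runs.
        Assumes a CSV with illustrative columns: session, rows_loaded, seconds."""
        totals = {}
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                s = totals.setdefault(row["session"], {"runs": 0, "rows": 0, "secs": 0.0})
                s["runs"] += 1
                s["rows"] += int(row["rows_loaded"])
                s["secs"] += float(row["seconds"])
        summary = []
        for session, s in totals.items():
            avg_rps = s["rows"] / s["secs"] if s["secs"] else 0.0
            summary.append((avg_rps, session, s["secs"] / s["runs"]))
        for avg_rps, session, avg_secs in sorted(summary):   # lowest throughput first
            print(f"{session}: {avg_secs:.1f}s average, {avg_rps:.0f} rows/sec")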

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

43 of 49

Phase 6: Test
Subtask 6.5.2 Identify Areas for Improvement Description
The goal of this subtask is to identify areas for improvement, based on the performance benchmarks established in Subtask 6.5.1 Benchmark .

Prerequisites
None

Roles

Data Integration Developer (Primary) Data Warehouse Administrator (Primary) Database Administrator (DBA) (Primary) Network Administrator (Primary) Presentation Layer Developer (Primary) Repository Administrator (Primary) System Administrator (Primary) Test Manager (Primary)

Considerations
After performance benchmarks are established (in 6.5.1 Benchmark), careful analysis of the results can reveal areas that may be improved through tuning. It is important to consider all possible areas for improvement, including:
- Machine. This applies regardless of whether the system is UNIX- or NT-based.
- Network. An often-overlooked facet of system performance, network optimization can have a major effect on overall system performance. For example, if the process of moving or FTPing files from a remote server takes four hours and the PowerCenter session takes four minutes, then optimizing and tuning the network may help to shorten the overall process of data movement, session processing, and backup. Key considerations for network performance include the network card and its settings, the network protocol employed, available bandwidth, packet size settings, etc.
- Database. Database tuning is, in itself, an art form and is largely dependent on the DBA's skill, finesse, and in-depth understanding of the database engine. A major consideration in tuning databases is in defining throughput versus response time. It is important to understand that analytic solutions define their performance in response time, while many OLTP systems measure their performance in throughput, and most DBAs are schooled in OLTP performance tuning rather than response time tuning. Each of the three functional areas of database tuning (i.e., memory, disk I/O, and processing) must be addressed for optimal performance, or one of the other areas will suffer.
- PowerCenter. Most systems need to tune the PowerCenter session and server process in order to achieve an acceptable level of performance. Tuning the server daemon process and individual sessions can increase performance by a factor of 2 or 3, or more. These goals can be achieved by decreasing the number of network hops between the server and the databases, and by eliminating paging of memory on the server running the PowerCenter sessions.
- Data Analyzer. It is possible that tuning may be required for source queries and the reports themselves if the time taken to generate the report on screen takes too long.
The actual tuning process can begin after the areas for improvement have been identified and documented.
For data migration projects, other considerations must be included in the performance tuning activities. Many ERP applications have two-step processes where the data is loaded through simulated on-line processes. More specifically, an API will be executed that replicates in a batch scenario the way that the on-line entry works, executing all edits. In such a case, performance will not be the same as in a scenario where a relational database is being populated. The best approach to performance tuning is to set the expectation that all data errors should be identified and corrected in the ETL layer prior to the load to the target application. This approach can improve performance by as much as 80%.

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

44 of 49

Best Practices
Determining Bottlenecks

Sample Deliverables
None

Last updated: 24-Jun-10 13:46

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

45 of 49

Phase 6: Test
Subtask 6.5.3 Tune Data Integration Performance Description
The goal of this subtask is to implement system changes to improve overall system performance, based on the areas for improvement that were identified and documented in Subtask 6.5.2 Identify Areas for Improvement .

Prerequisites
None

Roles

Data Integration Developer (Primary) Database Administrator (DBA) (Primary) Network Administrator (Primary) Quality Assurance Manager (Review Only) Repository Administrator (Primary) System Operator (Primary) Technical Project Manager (Review Only) Test Manager (Primary)

Considerations
Performance tuning should include the following steps:
1. Run a session and monitor the server to determine if the system is paging memory or if the CPU load is too high for the number of available processors. If the system is paging, correcting the system to prevent paging (e.g., increasing the physical memory available on the machine) can greatly improve performance.
2. Re-run the session and monitor the performance details, watching the buffer input and outputs for the sources and targets.
3. Tune the source system and target system based on the performance details. Once the source and target are optimized, re-run the PowerCenter session or Data Analyzer report to determine the impact of the changes.
4. Only after the server, source, and target have been tuned to their peak performance should the mapping and session be analyzed for tuning. This is because, in most cases, the mapping is driven by business rules. Since the purpose of most mappings is to enforce the business rules, and the business rules are usually dictated by the business unit in concert with the end-user community, it is rare that the mapping itself can be greatly tuned. Points to look for in tuning mappings are: filtering unwanted data early, cached lookups, aggregators that can be eliminated by programming finesse, and using sorted input on certain active transformations. For more details on tuning mappings and sessions, refer to the Best Practices.
5. After the tuning achieves a desired level of performance, the DTM (data transformation manager) process should be the slowest portion of the session details. This indicates that the source data is arriving quickly, the target is inserting the data quickly, and the actual application of the business rules is the slowest portion. This is the optimal desired performance. Only minor tuning of the session can be conducted at this point and usually has only a minimal effect.
6. Finally, re-run the benchmark sessions, comparing the new performance with the old performance. In some cases, optimizing one or two sessions to run quickly can have a disastrous effect on another mapping, and care should be taken to ensure that this does not occur.

Best Practices
Performance Tuning Databases (Oracle) Performance Tuning Databases (SQL Server) Performance Tuning Databases (Teradata) Performance Tuning in a Real-Time Environment Performance Tuning UNIX Systems Performance Tuning Windows 2000/2003 Systems Pushdown Optimization Session and Data Partitioning Tuning Mappings for Better Performance Tuning Sessions for Better Performance Tuning SQL Overrides and Environment for Better Performance Using Metadata Manager Console to Tune the XConnects

Sample Deliverables
None

Last updated: 24-Jun-10 14:14

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

47 of 49

Phase 6: Test
Subtask 6.5.4 Tune Reporting Performance Description
The goal of this subtask is to identify areas where changes can be made to improve the performance of Data Analyzer reports.

Prerequisites
None

Roles

Database Administrator (DBA) (Primary) Network Administrator (Secondary) Presentation Layer Developer (Primary) Quality Assurance Manager (Review Only) Repository Administrator (Primary) System Administrator (Primary) Technical Project Manager (Review Only) Test Manager (Primary)

Considerations Database Performance


1. Generate the SQL for each report and explain this SQL in the database to determine if the most efficient access paths are being used. Tune the database hosting the data warehouse and add indexes on the key tables. Take care in adding indexes, since indexes affect ETL load times.
2. Analyze SQL requests made against the database to identify common patterns in user queries. If you find that many users are running aggregations against detail tables, consider creating an aggregate table in the database and performing the aggregations via ETL processing (see the sketch after this list). This will save time when the user runs the report, as the data will already be aggregated.
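As an illustration of the second point, an aggregate that many report queries repeat can be pre-computed during the load. The table and column names below are illustrative, and in practice the logic would live in a PowerCenter mapping or a pushdown-optimized session rather than an ad hoc script.

    # Illustrative aggregate build executed through a generic DB-API connection.
    BUILD_DAILY_SALES_AGG = """
    INSERT INTO AGG_SALES_DAILY (SALE_DATE, REGION_ID, TOTAL_AMOUNT, ORDER_COUNT)
    SELECT SALE_DATE, REGION_ID, SUM(AMOUNT), COUNT(*)
    FROM FACT_SALES_DETAIL
    GROUP BY SALE_DATE, REGION_ID
    """

    def refresh_daily_sales_agg(connection):
        """Rebuild the illustrative daily sales aggregate (full refresh for simplicity)."""
        cur = connection.cursor()
        cur.execute("DELETE FROM AGG_SALES_DAILY")
        cur.execute(BUILD_DAILY_SALES_AGG)
        connection.commit()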

Data Analyzer Performance


1. Within Data Analyzer, use filters within reports as much as possible; try to restrict the volume of data returned as much as possible. Also try to architect reports to start out with a high-level query, then provide analytic workflows to drill down to more detail. Data Analyzer report rendering performance is directly related to the number of rows returned from the database.
2. If the data within the report does not get updated frequently, make the report a cached report. If the data is being updated frequently, make the report a dynamic report.
3. Try to avoid sectional reports as much as possible, since they take more time to render.
4. Schedule reports to run during off-peak hours. Reports run in batches can use considerable resources; therefore, such reports should be run when there is the least use on the system, subject to other dependencies.

Application Server Performance


1. Fine-tune the application server Java Virtual Machine (JVM) to correspond with the recommendations in the Best Practice on Data Analyzer Configuration and Performance Tuning. This should significantly enhance Data Analyzer's reporting performance.

INFORMATICA CONFIDENTIAL

PHASE 6: TEST

48 of 49

2. Ensure that the application server has sufficient CPU and memory to handle the expected user load. Strawman estimates for CPU and memory are as follows (see the sizing sketch after this list):
- 1 CPU per 50 users
- 1-2 GB RAM per CPU
3. You may need additional memory if a large number of reports are cached. You may need additional CPUs if a large number of reports are on-demand.
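Using the strawman ratios above, a first-cut sizing estimate can be computed directly; treat the result as a starting point to be validated against the observed load.

    def size_application_server(concurrent_users, ram_per_cpu_gb=2):
        """First-cut Data Analyzer application server sizing from the strawman ratios
        above: 1 CPU per 50 users and 1-2 GB RAM per CPU."""
        cpus = max(1, -(-concurrent_users // 50))   # ceiling division
        ram_gb = cpus * ram_per_cpu_gb
        return cpus, ram_gb

    # Example: 180 concurrent users -> (4 CPUs, 8 GB RAM)
    print(size_application_server(180))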

Best Practices
Tuning and Configuring Data Analyzer and Data Analyzer Reports

Sample Deliverables
None

Last updated: 24-Jun-10 13:57


Velocity v9
Phase 7: Deploy

2011 Informatica Corporation. All rights reserved.

Phase 7: Deploy
7 Deploy
  7.1 Plan Deployment
    7.1.1 Plan User Training
    7.1.2 Plan Metadata Documentation and Rollout
    7.1.3 Plan User Documentation Rollout
    7.1.4 Develop Punch List
    7.1.5 Develop Communication Plan
    7.1.6 Develop Run Book
  7.2 Deploy Solution
    7.2.1 Train Users
    7.2.2 Migrate Development to Production
    7.2.3 Package Documentation


Phase 7: Deploy
Description
Upon completion of the Build Phase (when both development and testing are finished), the data integration solution is ready to be installed in a production environment and submitted to the ultimate test as a viable solution that meets the users' requirements. The deployment strategy developed during the Architect Phase is now put into action. During the Build Phase, components are created that may require special initialization steps and procedures. For the production deployment, checklists and procedures are developed to ensure that crucial steps are not missed in the production cutover. To the end user, this is where the fruits of the project are exposed and end-user acceptance begins.

Up to this point, developers have been developing data cleansing, data transformations, load processes, reports, and dashboards in one or more development environments. But whether a project team is developing the back-end processes for a legacy migration project or the front-end presentation layer for a metadata management system, deploying a data integration solution is the final step in the development process. Metadata, which is the cornerstone of any data integration solution, should play an integral role in the documentation and training rollout to users. Not only is metadata critical to the current data integration effort, but it will be integral to planned metadata management projects down the road.

After the solution is actually deployed, it must be maintained to ensure stability and scalability. All data integration solutions must be designed to support change as user requirements and the needs of the business change. As data volumes grow and user interest increases, organizations face many hurdles such as software upgrades, additional functionality requests, and regular maintenance. Use the Deploy Phase as a guide to deploying an on-time, scalable, and maintainable data integration solution that provides business value to the user community.

Prerequisites
None

Roles

Business Analyst (Primary) Business Project Manager (Primary) Data Architect (Secondary) Data Quality Developer (Primary) Data Warehouse Administrator (Primary) Database Administrator (DBA) (Primary) End User (Secondary) Metadata Manager (Primary) Presentation Layer Developer (Primary) Project Sponsor (Approve) Quality Assurance Manager (Approve) Repository Administrator (Primary) System Administrator (Primary) Technical Architect (Secondary) Technical Project Manager (Review Only)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49


Phase 7: Deploy
Task 7.1 Plan Deployment
Description
The success or failure associated with deployment often determines how users and management perceive the completed data integration solution. The steps involved in planning and implementing deployment are, therefore, critical to project success. This task addresses three key areas of deployment planning:
Training
Metadata documentation
User documentation

Prerequisites
None

Roles

Application Specialist (Secondary) Business Analyst (Review Only) Data Integration Developer (Secondary) Database Administrator (DBA) (Primary) End User (Secondary) Metadata Manager (Primary) Project Sponsor (Primary) Quality Assurance Manager (Review Only) System Administrator (Secondary) Technical Project Manager (Secondary)

Considerations

Although training and documentation are considered part of the Deploy Phase, both activities need to start early in the development effort and continue throughout the project lifecycle. Neither can be planned nor implemented effectively without the following:
Thorough understanding of the business requirements that the data integration solution is intended to address
In-depth knowledge of the system features and functions and their ability to meet business users' needs
Understanding of the target users, including how, when, and why they will be using the system
Companies that have training and documentation groups in place should include representatives of these groups in the project development team. Companies that do not have such groups need to assign project team resources to these tasks, ensuring effective knowledge transfer throughout the development effort. Everyone involved in the system design and build should understand the need for good documentation and make it a part of his or her everyday activities. This "in-process" documentation then serves as the foundation for the training curriculum and user documentation that is generated during the Deploy Phase.
Although most companies have training programs and facilities in place, it is sometimes necessary to create these facilities to provide training on the data integration solution. If this is the case, the decision to create a training program must be made as early in the project lifecycle as possible, and the project plan must specify the necessary resources and development time. Creating a new training program is a double-edged sword: it can be quite time-consuming and costly, especially if additional personnel and/or physical facilities are required, but it also gives project management the opportunity to tailor a training program specifically for users of the solution rather than "fitting" the training needs into an existing program.

Project management also needs to determine policies and procedures for documenting and automating metadata reporting early in the deployment process rather than making reporting decisions on the fly. Finally, it is important to recognize the need to revise the end-user documentation and training curriculum over the course of the project lifecycle as the system and user requirements change. Documentation and training should both be developed with an eye toward flexibility and future change.
For data migration projects it is very important that the operations team has the tools and processes to allow for a mass deployment of large amounts of code at one time, in a consistent manner. Capabilities should include:
The ability to migrate code efficiently with little effort
The ability to report what was deployed
The ability to roll back changes if necessary
This is why team-based development is normally a part of any data migration project.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49


Phase 7: Deploy
Subtask 7.1.1 Plan User Training
Description
Companies often misjudge the level of effort and resources required to plan, create, and successfully implement a user training program. In some cases, such as legacy migration initiatives, very little training may be required on the data integration component of the project. However, in most cases, multiple training programs are required in order to address a wide assortment of user types and needs. For example, when deploying a metadata management system, it may be necessary to train administrative users, presentation layer users, and business users separately. When deploying a data conversion project, on the other hand, it may only be necessary to train administrative users. Note also that users of data quality applications such as Informatica Data Quality or Informatica Data Explorer will require training, and that these products may be of interest to personnel at several layers of the organization.

The project plan should include sufficient time and resources for implementing the training program - from defining the system users and their needs, to developing class schedules geared toward training as many users as possible, efficiently and effectively, with minimal disruption of everyday activities.

In developing a training curriculum, it is important to understand that there is seldom a "one size fits all" solution. The first step in planning user training is identifying the system users and understanding both their needs and their existing level of expertise. It is generally best to focus the curriculum on the needs of "average" users who will be trained prior to system deployment, then consider the specialized needs of high-end (i.e., expert) users and novice users who may be completely unfamiliar with decision-support capabilities. The needs of these specialized users can be addressed most effectively in follow-up classes.

Planning user training also entails ensuring the availability of appropriate facilities. Ideally, training should take place on a system that is separate from the development and production environments. In most cases, this system mirrors the production environment, but is populated with only a small subset of data. If a separate system is not available, training can use either a development or production platform, but this arrangement raises the possibility of affecting either the development efforts or the production data. In any case, if sensitive production data is used in a training database, ensure that appropriate security measures are in place to prevent unauthorized users in training from accessing confidential data.

Prerequisites
None

Roles

End User (Secondary)

Considerations
Successful training begins with careful planning. Training content and duration must correspond with end-user requirements. A well-designed and well-planned training program is a "must have" for a data integration solution to be considered successfully deployed.
Business users often do not need to understand the back-end processes and mechanisms inherent in a data integration solution, but they do need to understand the access tools, the presentation layer, and the underlying data content to use it effectively. Thus, training should focus on these aspects, simplifying the necessary information as much as possible and organizing it to match the users' requirements. Training for business users usually focuses on three areas:
The presentation layer
Data content
Application
While the presentation layer is often the primary focus of training, data content and application training are also important to business users. Many companies overlook the importance of training users on the data content and application, providing only data access tool training. In this case, users often fail to understand the full capabilities of the data integration system and the company is unlikely to achieve optimal value from the system.
Careful curriculum preparation includes developing clear, attractive training materials, including good graphics and well-documented exercise materials that encourage users to practice using the system features and functions. Laboratory materials can make or break a training program by encouraging users to try using the system on their own. Training materials that contain obvious errors or poorly documented procedures actually discourage users from trying to use the system, as does a poorly-designed presentation layer. If users do not gain confidence using the system during training, they are unlikely to use the data integration solution on a regular basis in their everyday activities.
The training curriculum should include a post-training evaluation process that provides users with an opportunity to critique the training program, identifying both its strengths and weaknesses and making recommendations for future or follow-up training classes. The evaluation should address the effectiveness of both the course and the trainer because both are crucial to the success of a training program.
As an example, the curriculum for a two-day training class on a data integration solution might look something like this:

2-Day Data Integration Solution Training Class Curriculum

Day 1                                                                  Duration
Introduction and orientation                                           1 hour
High-level description & conceptualization tour of the
data integration architecture                                          2 hours
Lunch                                                                  1 hour
Data content training                                                  3 hours
Introduction to the presentation layer                                 1 hour

Day 2
Introduction to the application                                        2 hours
Introduction to metadata                                               1 hour
Lunch                                                                  2 hours
Integrated application & presentation layer laboratory                 1 hour

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49


Phase 7: Deploy
Subtask 7.1.2 Plan Metadata Documentation and Rollout
Description
Whether a data integration project is being implemented as a single-use effort such as a legacy migration project, or as a longer-term initiative such as data synchronization (e.g., Single View of Customer), metadata documentation is critical to the overall success of the project. Metadata is the information map for any data integration effort. Proper use and enforcement of metadata standards will, for example, help ensure that future audit requirements are met, and that business users have the ability to learn exactly how their data is migrated, transformed, and stored throughout various systems. When metadata management systems are built, thorough metadata documentation provides end users with an even clearer picture of the potentially vast impact of seemingly minor changes in data structures.

This subtask uses the example of a PowerCenter development environment to discuss the importance of documenting metadata. However, it is important to remember that metadata documentation is just as important for metadata management and presentation-layer development efforts. On the front-end, the PowerCenter development environment is graphical, easy-to-understand, and intuitive. On the back-end, it is possible to capture each step of the data integration process in the metadata, using manual and automatic entries into the metadata repository. Manual entries may include descriptions and business names, for example; automatic entries are produced while importing a source or saving a mapping.

Because every aspect of design can potentially be captured in the PowerCenter repository, careful planning is required early in the development process to properly capture the desired metadata. Although it is not always easy to capture important metadata, every effort must be expended to satisfy this component of business documentation requirements.

Prerequisites
None

Roles

Database Administrator (DBA) (Primary) Metadata Manager (Primary) System Administrator (Secondary) Technical Project Manager (Review Only)

Considerations
During this subtask, it is important to decide what metadata to capture, how to access it, and when to place change control check points in the process to maintain all the changes in the metadata. The decision about which kinds of metadata to capture is driven by business requirements and project timelines. While it may be beneficial for a developer to enter detailed descriptions of each column, expression, variable, and so forth, it would also be very time-consuming. The decision, therefore, should be based on how much metadata is actually required by the systems that use metadata.
From the developer's perspective, PowerCenter provides the ability to enter descriptive information for all repository objects, sources, targets, and transformations. Moreover, column-level descriptions of the columns in a table, as well as all information about column size and scale, datatypes, and primary keys, are stored in the repository. This enables business users to maintain information on the actual business name and description of a field on a particular table. This ability helps users in a number of ways: for example, it eliminates confusion about which columns should be used for a calculation. 'C_Year' and 'F_Year' might be column names on a table, but 'Calendar Year' and 'Fiscal Year' are more useful to business users trying to calculate market share for the company's fiscal year.
Informatica does not recommend accessing the repository tables directly, even for select access, because the repository structure can change with any product release. Informatica provides several methods of gaining access to this data:

The PowerCenter Metadata Reporter (PCMR) provides Web-based access to the PowerCenter repository. With PCMR, developers and administrators can perform both operational and impact analysis on their data integration projects.
Informatica continues to provide the MX Views, a set of views that are installed with the PowerCenter repository. The MX Views are meant to provide query-level access to repository metadata.
MX2 is a set of encapsulated objects that can communicate with the metadata repository through a standard interface. These MX2 objects offer developers an advanced object-based API for accessing and manipulating the PowerCenter repository from a variety of programming languages.
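Because direct repository table access is discouraged, ad hoc metadata reporting is safest through the MX Views. The sketch below is a hedged example of reading mapping-level metadata over a read-only connection; the view and column names shown (for example REP_ALL_MAPPINGS) vary between PowerCenter versions, so verify them against the MX Views documentation for the release in use.

# Hedged example: read mapping metadata through the MX Views rather than the
# underlying repository tables. View and column names (REP_ALL_MAPPINGS,
# MAPPING_NAME, SUBJECT_AREA, MAPPING_LAST_SAVED) should be verified against
# the MX Views documentation for the PowerCenter version in use.
import oracledb  # assumed repository database driver; any DB-API driver is similar

def list_mappings(dsn, user, password, folder):
    sql = (
        "SELECT mapping_name, mapping_last_saved "
        "FROM rep_all_mappings WHERE subject_area = :folder"
    )
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        cur = conn.cursor()
        cur.execute(sql, folder=folder)
        return cur.fetchall()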

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49


Phase 7: Deploy
Subtask 7.1.3 Plan User Documentation Rollout
Description
Good system and user documentation is invaluable for a number of data integration system users, such as:
New data integration or presentation layer developers
Enterprise architects trying to develop a clear picture of how systems, data, and metadata are connected throughout an organization
Management users who are learning to navigate reports and dashboards
Business users trying to pull together analytical information for an executive report
A well-documented project can save development and production team members both time and effort in getting the new system into production and new employees up to speed. User documentation usually consists of two sets: one geared toward ad-hoc users, providing details about the data integration architecture and configuration; and another geared toward "push button" users, focusing on understanding the data and providing details on how and where they can find information within the system. This increasingly includes documentation on how to use and/or access metadata.

Prerequisites
None

Roles

Business Analyst (Review Only) Quality Assurance Manager (Review Only)

Considerations
Good documentation cannot be implemented in a haphazard manner. It requires careful planning and frequent review to ensure that it meets users' needs and is easily accessible to everyone who needs it. In addition, it should incorporate a feedback mechanism that encourages users to evaluate it and recommend changes or additions.
To improve users' ability to access the content effectively and to increase their understanding of it, many companies create resource groups within the business organization. Group members attend detailed training sessions and work with the documentation and training specialists to develop materials that are geared toward the needs of typical, or frequent, system users like themselves. Such groups have two benefits: they help to ensure that training and documentation materials are on-target for the needs of the users, and they serve as in-house experts on the data integration architecture, reducing users' reliance on the central support organization.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49


Phase 7: Deploy
Subtask 7.1.4 Develop Punch List
Description
The first production run of any data integration effort requires that a variety of tasks be executed, often in a specific order. In the case of a data warehouse, some tasks executed for the first production run are never run again, such as loading code tables and performing a full historical load. Making sure these steps are not missed is key to the success of the go-live. In the case of a data migration effort, the first production run is the only production run, and again the list of tasks may be long and require a rigid sequence for success. These steps are often rehearsed in a set of mock runs before production execution. Missing any one step may jeopardize the success of the data migration.
In order to ensure that tasks are not missed or run out of sequence, a punch list is created to be followed on the production run. The punch list is typically an Excel spreadsheet with a list of tasks that will be executed for each individual mock production run and the final go-live first production run. The punch list includes the execution of scripts, workflows, validation steps, and any other task that must be executed. The punch list typically includes two worksheets:
1. The first worksheet should include:
   a. Task Name
   b. Task Description (one sentence)
   c. Task Assignee
   d. Estimated Duration
   e. Time of Completion
   f. Who Completed Task
   g. Validation Steps
   h. Validation Notes
   i. Other Notes
2. The second worksheet should include:
   a. Name of each person working on the migration
   b. Cell phone number for each person on the migration
   c. Home phone number for each person on the migration
   d. Team they are on
   e. Reporting Manager*
*All Reporting Managers should be listed on this worksheet
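The punch list template itself is straightforward to generate. The sketch below uses the third-party openpyxl package to create the two worksheets with the columns listed above; the package choice and file name are illustrative, and many teams simply maintain the spreadsheet by hand.

# Illustrative sketch: generate a two-worksheet punch list template.
# Requires the third-party openpyxl package; the file name is arbitrary.
from openpyxl import Workbook

TASK_COLUMNS = [
    "Task Name", "Task Description", "Task Assignee", "Estimated Duration",
    "Time of Completion", "Who Completed Task", "Validation Steps",
    "Validation Notes", "Other Notes",
]
CONTACT_COLUMNS = ["Name", "Cell Phone", "Home Phone", "Team", "Reporting Manager"]

def create_punch_list_template(path="punch_list.xlsx"):
    wb = Workbook()
    tasks = wb.active
    tasks.title = "Tasks"
    tasks.append(TASK_COLUMNS)            # header row for the task worksheet
    contacts = wb.create_sheet("Contacts")
    contacts.append(CONTACT_COLUMNS)      # header row for the contact worksheet
    wb.save(path)

create_punch_list_template()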

Prerequisites
None

Roles

Application Specialist (Secondary) Data Integration Developer (Secondary) Database Administrator (DBA) (Secondary) Production Supervisor (Primary) Project Sponsor (Review Only) System Administrator (Secondary) Technical Project Manager (Secondary)

Considerations


The key objective here is to plan ahead. Most organizations have well-established and documented system support procedures in place. The support procedures for the solution should fit into these existing procedures, deviating only where absolutely necessary - and then, only with the prior knowledge and approval of the Project Manager and Production Supervisor. Any such deviations should be determined and documented as early as possible in the development effort, preferably before the system actually goes live. Be sure to thoroughly document specific procedures and contact information for problem escalation, especially if the procedures or contacts differ from the existing problem escalation plan.
Data migration projects differ from traditional data integration projects as they normally have one critical go-live weekend. This means there is only one chance to execute successfully without disastrous consequences. The punch list will be initially created for the first trial cutover or mock run and then modified for each additional trial cutover. It is typical to have at least four trial cutovers before the final go-live. The punch list will be a critical input to the run book and to the final go-live. It will be discussed with the entire project team and will be a communication tool to demonstrate progress during the final go-live weekend.

Best Practices
None

Sample Deliverables
Punch List

Last updated: 01-Feb-07 18:49


Phase 7: Deploy
Subtask 7.1.5 Develop Communication Plan
Description
A communication plan should be developed that details the communications and coordination for the production rollout of the data integration solution. The plan should specify where key communication information will be stored, who will be communicated with, and how much communication will be provided. This information initially resides in a stand-alone document, but upon project management approval it is added to the run book.
A comprehensive communication plan helps ensure that all required people in the organization are ready for the production deployment. Since many of them may be outside the immediate data integration project team, it cannot be assumed that everyone is always up to date on the production go-live planning and timing. For example, you may need to communicate with DBAs, IT infrastructure, web support teams, and other system owners that may have assigned tasks and monitoring activities during the first production run. The communication plan ensures proper and timely communication across the organization so that there are no surprises when the production run is initiated.

Prerequisites
7.1.4 Develop Punch List

Roles

Application Specialist (Secondary) Data Integration Developer (Secondary) Database Administrator (DBA) (Secondary) Production Supervisor (Primary) Project Sponsor (Review Only) System Administrator (Secondary) Technical Project Manager (Secondary)

Considerations
The communication plan should do more than list contacts. It must include the steps to take if a specific person on the plan is unresponsive, escalation procedures, and emergency communication protocols (i.e., how the entire core project team would communicate in a dire emergency). Since many go-live events occur over weekends, it is also important to record not only business contact information but also weekend contact information, such as cell phone or pager numbers, in case a key contact needs to be reached on a non-business day.

Best Practices
None

Sample Deliverables
Data Migration Communication Plan

Last updated: 01-Feb-07 18:49


Phase 7: Deploy
Subtask 7.1.6 Develop Run Book
Description
The Run Book contains detailed descriptions of the tasks from the punch list that will be used for the first production run. It details the tasks more explicitly for the individual mock runs and the final go-live production run. Typically, the punch list is created for the first trial cutover or mock run, and the run book is developed during the first and second trial cutovers and completed by the start of the final production go-live.

Prerequisites
7.1.4 Develop Punch List
7.1.5 Develop Communication Plan

Roles

Application Specialist (Secondary) Data Integration Developer (Secondary) Database Administrator (DBA) (Secondary) Production Supervisor (Primary) Project Sponsor (Review Only) System Administrator (Secondary) Technical Project Manager (Secondary)

Considerations
One of the biggest challenges in completing a run book (as with completing an operations manual) is providing an adequate level of detail. It is important to find a balance between providing too much information, which makes the run book unwieldy and unlikely to be used, and providing too little detail, which could jeopardize the successful execution of the tasks. For data migration projects this is even more imperative, since there is normally only one critical go-live event; this is the one chance to have a successful production go-live without negatively impacting the operational systems that depend on the migrated data. The run book is developed and refined over the trial cutovers and should contain all the information necessary to ensure a successful migration. Go/no-go procedure information should also be included in the run book. The run book for a data migration project eliminates the need for the operations manual that is present for most other data integration solutions.

Best Practices
None

Sample Deliverables
Data Migration Run Book

Last updated: 01-Feb-07 18:50


Phase 7: Deploy
Task 7.2 Deploy Solution
Description
Successfully deploying a data integration solution involves managing the migration from development through production, training end users, and providing clear and consistent documentation. These are all critical factors in determining the success (or failure) of an implementation effort. Before the deployment tasks are undertaken, however, it is necessary to determine the organization's level of preparedness for the deployment and to thoroughly plan end-user training materials and documentation. If all prerequisites are not satisfactorily completed, it may be advisable to delay the migration, training, and delivery of finalized documentation rather than hurrying through these tasks solely to meet a predetermined target delivery date.
For data migration projects it is important to understand that some packaged applications, such as SAP, have their own deployment strategies. The deployment strategy for the Informatica processes should take this into account and, when applicable, align with those strategies.

Prerequisites
None

Roles

Business Analyst (Primary) Business Project Manager (Primary) Data Architect (Secondary) Data Integration Developer (Primary) Data Warehouse Administrator (Primary) Database Administrator (DBA) (Primary) Presentation Layer Developer (Primary) Production Supervisor (Approve) Quality Assurance Manager (Approve) Repository Administrator (Primary) System Administrator (Primary) Technical Architect (Secondary) Technical Project Manager (Approve)

Considerations
None

Best Practices
Application ILM log4j Settings

Sample Deliverables
None

Last updated: 02-Nov-10 22:20


Phase 7: Deploy
Subtask 7.2.1 Train Users
Description
Before training can begin, company management must work with the development team to review the training curricula to ensure that they meet the needs of the various application users. First, however, management and the development team need to understand just who the users are and how they are likely to use the application. Application users may include individuals who have reporting needs and need to understand the presentation layer; operational users who need to review the content being delivered by a data conversion system; administrative users managing the sourcing and delivery of metadata across the enterprise; production operations personnel responsible for day-to-day operations and maintenance; and more.
After the training curricula are planned and users are scheduled to attend classes appropriate to their needs, a training environment must be prepared for the training sessions. This involves ensuring that a laboratory environment is set up properly for multiple concurrent users, and that data is clean and available to that environment. If the presentation layer is not ready or the data appears incomplete or inaccurate, users may lose interest in the application and choose not to use it for their regular business tasks. This lack of interest can result in an underutilized resource critical to business success. It is also important to prevent untrained users from accessing the system; otherwise, the support staff is likely to be overburdened and spend a significant amount of time providing on-the-job training to uneducated users.

Prerequisites
None

Roles

Business Analyst (Primary) Business Project Manager (Primary) Data Integration Developer (Primary) Data Warehouse Administrator (Secondary) Presentation Layer Developer (Primary) Technical Project Manager (Review Only)

Considerations
It is important to consider the many and varied roles of all application users when planning user training. The user roles should be defined up-front to ensure that everyone who needs training receives it. If the roles are not defined up-front, some key users may not be properly trained, resulting in a less-than-optimal hand-off to the user departments. For example, in addition to training obvious users such as the operational staff, it may be important to consider users such as DBAs, data modelers, and metadata managers, at least from a high-level perspective, and ensure that they receive appropriate training.
The training curriculum should educate users about the data content as well as the effective use of the data integration system. While correct and effective use of the system is important, a thorough understanding of the data content helps to ensure that training moves along smoothly without interruption for ad-hoc questions about the meaning or significance of the data itself.
Additionally, it is important to remember that no one training curriculum can address all needs of all users. The basic training class should be geared toward the average user, with follow-up classes scheduled for those users needing training on the application's advanced features. It is also wise to schedule follow-up training for data and tool issues that are likely to arise after the deployment is complete and the end users have had time to work with the tools and data. This type of training can be held in informal "question and answer" sessions rather than formal classes.
Finally, be sure that training objectives are clearly communicated between company management and the development team to ensure complete satisfaction with the training deliverable. If the training needs of the various user groups vary widely, it may be necessary to obtain additional training staff or services from a vendor or consulting firm.

Best Practices
None

Sample Deliverables
Training Evaluation

Last updated: 01-Feb-07 18:50


Phase 7: Deploy
Subtask 7.2.2 Migrate Development to Production
Description
To successfully migrate PowerCenter or Data Analyzer from one environment to another (from development to production, for example), a number of tasks must be completed. These tasks fall into three phases:
Pre-deployment phase
Deployment phase
Post-deployment phase
Each phase is detailed in the Considerations section. While there are multiple tasks to perform in the deployment process, the actual migration phase consists of moving objects from one environment to another. A migration can include the following objects:
PowerCenter - mappings, sessions, workflows, scripts, parameter files, stored procedures, etc.
Data Analyzer - schemas, reports, dashboards, schedules, global variables.
PowerExchange/CDC - datamaps and registrations.
Data Quality - plans and dictionaries.

Prerequisites
None

Roles

Data Warehouse Administrator (Primary) Database Administrator (DBA) (Primary) Production Supervisor (Approve) Quality Assurance Manager (Approve) Repository Administrator (Primary) System Administrator (Primary) Technical Project Manager (Approve)

Considerations
The tasks below should be completed before, during, and after the migration to ensure a successful deployment. Failure to complete one or more of these tasks can result in an incomplete or incorrect deployment.
Pre-deployment tasks:
Ensure all objects have been successfully migrated and tested in the Quality Assurance environment.
Ensure the Production environment is compliant with specifications and is ready to receive the deployment.
Obtain sign-off from the deployment team and project teams to deploy to the Production environment.
Obtain sign-off from the business units to migrate to the Production environment.
Deployment tasks:
Verify the consistency of the connection object names across environments to ensure that the connections are being made to the production sources/targets. If not, manually change the connections for each incorrect session to source and target the production environment.
Determine the method of migration (i.e., folder copy or deployment group) to use. If you are going to use the folder copy method, make sure the shared folders are copied before the non-shared folders. If you are going to use the deployment group method, make sure all the objects to be migrated are checked in, and refresh the deployment group once this is done.
Data Analyzer objects that reference new tables require that schemas be migrated before the reports. Make sure the new tables are associated with the proper data source and that the data connectors are connected to the new schemas.
Synchronize the deployment window with the maintenance window to minimize the impact on end users. If the deployment window is longer than the regular maintenance window, it may be necessary to coordinate with the business unit to minimize the impact on the end users.
Post-deployment tasks:
Communicate with the management team members on all aspects of the migration (i.e., problems encountered, solutions, tips and tricks, etc.).
Finalize and deliver the documentation.
Obtain final user and project sponsor acceptance.
Finally, when deployment is complete, develop a project close document to evaluate the overall effectiveness of the project (i.e., successes, recommended improvements, lessons learned, etc.).
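The first deployment check above, verifying that connection object names are consistent across environments, lends itself to a simple script. The sketch below assumes the connection names have already been exported to plain text files (one name per line) from each repository; that file format, and the file names, are assumptions made purely for illustration.

# Hedged sketch: compare connection object names exported from two environments
# (one name per line in each file). The export format is an assumption; the
# point is simply to surface mismatches before go-live.
def load_names(path):
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def compare_connections(qa_file, prod_file):
    qa, prod = load_names(qa_file), load_names(prod_file)
    return {
        "missing_in_prod": sorted(qa - prod),   # sessions pointing at these will fail
        "extra_in_prod": sorted(prod - qa),
    }

if __name__ == "__main__":
    print(compare_connections("qa_connections.txt", "prod_connections.txt"))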

Best Practices
Deployment Groups
Migration Procedures - PowerCenter
Using PowerCenter Labels
Migration Procedures - PowerExchange
Deploying Data Analyzer Objects

Sample Deliverables
Project Close Report

Last updated: 01-Feb-07 18:50


Phase 7: Deploy
Subtask 7.2.3 Package Documentation
Description
The final tasks in deploying the new application are:
Gathering all of the various documents that have been created during the life of the project
Updating and/or revising them as necessary
Distributing them to the departments and individuals that will need them to use or supervise use of the application
By this point, management should have reviewed and approved all of the documentation. Documentation types and content vary widely among projects, depending on the type of engagement, expectations, scope of project, and so forth. Some typical deliverables include all of those listed in the Sample Deliverables section.

Prerequisites
None

Roles

Business Analyst (Approve) Business Project Manager (Primary) Data Architect (Secondary) Data Integration Developer (Primary) Data Warehouse Administrator (Secondary) Database Administrator (DBA) (Secondary) Presentation Layer Developer (Primary) Production Supervisor (Approve) Technical Architect (Primary) Technical Project Manager (Review Only)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50


Velocity v9
Phase 8: Operate

2011 Informatica Corporation. All rights reserved.

Phase 8: Operate
8 Operate
  8.1 Define Production Support Procedures
    8.1.1 Develop Operations Manual
  8.2 Operate Solution
    8.2.1 Execute First Production Run
    8.2.2 Monitor Load Volume
    8.2.3 Monitor Load Processes
    8.2.4 Track Change Control Requests
    8.2.5 Monitor Usage
    8.2.6 Monitor Data Quality
  8.3 Maintain and Upgrade Environment
    8.3.1 Maintain Repository
    8.3.2 Upgrade Software


Phase 8: Operate
Description
The Operate Phase is the final step in the development of a data integration solution. This phase is sometimes referred to as production support. During its day-to-day operations the system continually faces new challenges such as increased data volumes, hardware and software upgrades, and network or other physical constraints. The goal of this phase is to keep the system operating smoothly by anticipating these challenges before they occur and planning for their resolution. Planning is probably the most important task in the Operate Phase. Often, the project team plans the system's development and deployment, but does not allow adequate time to plan and execute the turnover to day-to-day operations. Many companies have dedicated production support staff with both the necessary tools for system monitoring and a standard escalation process. This team requires only the appropriate system documentation and lead time to be ready to provide support. Thus, it is imperative for the project team to acknowledge this support capability by providing ample time to create, test, and turn over the deliverables discussed throughout this phase.

Prerequisites
None

Roles

Business Project Manager (Primary) Data Integration Developer (Secondary) Data Steward/Data Quality Steward (Primary) Data Warehouse Administrator (Secondary) Database Administrator (DBA) (Primary) Presentation Layer Developer (Secondary) Repository Administrator (Primary) System Administrator (Primary) System Operator (Primary) Technical Project Manager (Review Only)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50


Phase 8: Operate
Task 8.1 Define Production Support Procedures
Description
In this task, the project team produces an Operations Manual, which tells system operators how to run the system on a day-to-day basis. The manual should include information on how to restart failed processes and who to contact in the event of a failure. In addition, this task should produce guidelines for performing system upgrades and other necessary changes to the system throughout the project's lifetime. Note that this task must occur prior to the system actually going live. The production support procedures should be clear to system operators even before the system is in production, because any production issues that are going to arise will probably do so very shortly after the system goes live.

Prerequisites
None

Roles

Data Integration Developer (Secondary) Production Supervisor (Primary) System Operator (Review Only)

Considerations
The watchword here is: Plan Ahead. Most organizations have well-established and documented system support procedures in place. The support procedures for the solution should fit into these existing procedures, deviating only where absolutely necessary - and then, only with the prior knowledge and approval of the Project Manager and Production Supervisor. Any such deviations should be determined and documented as early as possible in the development effort, preferably before the system actually goes live. Be sure to thoroughly document specific procedures and contact information for problem escalation, especially if the procedures or contacts differ from the existing problem escalation plan.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50


Phase 8: Operate
Subtask 8.1.1 Develop Operations Manual
Description
After the system is deployed, the Operations Manual is likely to be the most frequently used document in the operations environment. The system operators - the individuals who monitor the system on a day-to-day basis - use this manual to determine how to run the various pieces of the implemented solution. In addition, the manual provides the operators with error processing information, as well as reprocessing steps in the event of a system failure.
The Operations Manual should contain a high-level overview of the system in order to familiarize the operations staff with new concepts, along with the specific details necessary to successfully execute day-to-day operations. For data visualization, the Operations Manual should contain high-level explanations of reports, dashboards, and shared objects in order to familiarize the operations staff with those concepts.
For a data integration/migration/consolidation solution, the manual should provide operators with the necessary information to perform the following tasks:
Run workflows, worklets, tasks, and any external code
Recover and restart workflows
Notify the appropriate second-tier support personnel in the event of a serious system malfunction
Record the appropriate monitoring data during and after workflow execution (i.e., load times, data volumes, etc.)
For a data visualization or metadata reporting solution, the manual should include details on the following:
Run reports and schedules
Rerun scheduled reports
Source, target, database, web server, and application server information
Notify the appropriate second-tier support personnel in the event of a serious system malfunction
Record the appropriate monitoring data (i.e., report run times, frequency, data volumes, etc.)
Operations manuals for all projects should provide information for performing the following tasks:
Start servers
Stop servers
Notify the appropriate second-tier support personnel in the event of a serious system malfunction
Test the health of the reporting and/or data integration environment (i.e., check DB connections to the repositories, source and target databases/files and real-time feeds; check CPU and memory usage on the PowerCenter and Data Analyzer servers)
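For the workflow execution items above, operations teams often wrap the PowerCenter pmcmd command line in a small script so that start times, durations, and return codes are recorded consistently. The sketch below is a hedged example: the pmcmd arguments shown follow the commonly documented form of startworkflow and should be verified against the command reference for the installed PowerCenter version, and the logging is deliberately minimal.

# Hedged operational sketch: start a workflow via pmcmd and record the outcome.
# The pmcmd argument form shown is the commonly documented startworkflow usage;
# verify it against the command reference for the installed PowerCenter release.
import subprocess
import time

def run_workflow(service, domain, user, password, folder, workflow):
    cmd = [
        "pmcmd", "startworkflow",
        "-sv", service, "-d", domain,
        "-u", user, "-p", password,
        "-f", folder, "-wait", workflow,
    ]
    start = time.time()
    result = subprocess.run(cmd, capture_output=True, text=True)
    # A non-zero return code should trigger the escalation steps in this manual.
    return {
        "workflow": workflow,
        "return_code": result.returncode,
        "duration_sec": round(time.time() - start, 1),
    }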

Prerequisites
None

Roles

Data Integration Developer (Secondary) Production Supervisor (Primary) System Operator (Review Only)

Considerations

A draft version of the Operations Manual can be started during the Build Phase as the developers document the individual components. Documents such as mapping specifications, report specifications, and unit and integration testing plans contain a great deal of information that can be transferred into the Operations Manual. Bear in mind that data quality processes are executed earlier, during the Design Phase, although the Data Quality Developer and Data Integration Developer will be available during the Build Phase to agree on any data quality measures (such as ongoing run-time data quality process deployment) that need to be added to the Operations Manual.

The Operations Manual serves as the handbook for the production support team. Therefore, it is imperative that it be accurate and kept up-to-date. For example, an Operations Manual typically contains names and phone numbers for on-call support personnel. Keeping this information consolidated in a central place in the document makes it easier to maintain. Restart and recovery procedures should be thoroughly tested and documented, and the processing window should be calculated and published. Escalation procedures should be thoroughly discussed and distributed so that members of the development and operations staff are fully familiar with them. In addition, the manual should include information on any manual procedures that may be required, along with step-by-step instructions for implementing the procedures. This attention to detail helps to ensure a smooth transition into the Operate Phase. Although it is important, the Operations Manual is not meant to replace user manuals and other support documentation. Rather, it is intended to provide system operators with a consolidated source of documentation to help them support the system. The Operations Manual also does not replace proper training on PowerCenter, Data Analyzer, and supporting products.

Best Practices
None

Sample Deliverables
Operations Manual

Last updated: 12-Jul-10 15:44


Phase 8: Operate
Task 8.2 Operate Solution
Description
After the data integration solution has been built and deployed, the job of running it begins. For a data migration or consolidation solution, the system must be monitored to ensure that data is being loaded into the database. A data visualization or metadata reporting solution should be monitored to ensure that the system is accessible to the end users. The goal of this task is to ensure that the necessary processes are in place to facilitate the monitoring of and the reporting on the system's daily processes.

Prerequisites
None

Roles

Business Project Manager (Primary) Data Steward/Data Quality Steward (Primary) Data Warehouse Administrator (Secondary) Database Administrator (DBA) (Primary) Presentation Layer Developer (Secondary) Project Sponsor (Primary) Repository Administrator (Review Only) System Administrator (Primary) System Operator (Primary) Technical Project Manager (Review Only)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50


Phase 8: Operate
Subtask 8.2.1 Execute First Production Run
Description
Once a data integration solution is fully developed, tested, and signed off for production, it is time to execute the first run in the production environment. The first run is key to a successful deployment. While it is often similar to the ongoing load process, it can be distinctly different: there are often specific one-time setup tasks that need to be executed on the first run that will not be part of the regular daily data integration process. In most cases the first production run is a high-profile set of activities that must be executed, documented, and improved upon for all future production runs. This run should leverage a punch list and should execute a set of tested workflows or scripts (not manual steps such as executing a specific SQL statement for setup). It is important that the first run is executed successfully with limited manual interaction; any manual steps should be closely monitored, controlled, documented, and communicated. This first run should be executed following the punch list, which should be revisited upon completion of the execution.

Prerequisites
6.3.2 Execute Complete System Test
7.2.2 Migrate Development to Production

Roles

Database Administrator (DBA) (Primary) Production Supervisor (Primary) System Administrator (Primary) System Operator (Primary) Technical Project Manager (Review Only)

Considerations
For some projects (such as a data migration effort), the first production run is the production system: it will not run beyond the first production run, since a data migration by its nature requires a single movement of the production data. Further, the set of tasks that make up the production run may never be executed again; any future runs will be part of an execution that addresses a specific data problem, not the entire batch.
For data warehouses, the first production run often includes loading historical data as well as initial loads of code tables and dimension tables. The load process may run much longer than a typical ongoing load due to the extra volume of data and the different criteria used to pick up the historical data. There may be extra data validation and verification at the end of the first production run to ensure that the system is properly initialized and ready for ongoing loads. It is important to plan and execute the first load properly, as the subsequent periodic refreshes of the data warehouse (daily, hourly, real time) depend on the setup and success of the first production run.

Best Practices
None

Sample Deliverables
Data Migration Run Book
Operations Manual
Punch List


Last updated: 01-Feb-07 18:50


Phase 8: Operate
Subtask 8.2.2 Monitor Load Volume
Description
Increasing data volume is a challenge throughout the life of a data integration solution. As the data migration or consolidation system matures and new data sources are introduced, the amount of data processed and loaded into the database continues to grow. Similarly, as a data visualization or metadata management system matures, the amount of data processed and presented increases. One of the operations team's greatest tasks is to monitor the data volume processed by the system to determine any trends that are developing. If generated correctly, the data volume estimates used by the Technical Architect and the development team in building the architecture should ensure that the architecture is capable of growing to meet ever-changing business requirements. By continuously monitoring volumes, however, the development and operations teams can act proactively as data volumes increase. Monitoring affords team members the time necessary to determine how best to accommodate the increased volumes.

Prerequisites
None

Roles

Production Supervisor (Secondary) System Operator (Primary)

Considerations
Installing PowerCenter Reporting (Data Analyzer with the Repository and Administrative reports) can help monitor load volumes. The Session Run Details report can be configured to provide the following:
Successful rows sourced
Successful rows written
Failed rows sourced
Failed rows written
Session duration
The Session Run Details report can also be configured to display data over ranges of time for trending. This information provides the project team with both a measure of the increased volume over time and an understanding of the increased volume's impact on the data load window. Dashboards and alerts can be set to monitor loads on an ongoing basis, alerting data integration administrators if load times exceed specified thresholds. By customizing the standard reports, data integration support staff can create any variety of monitoring levels -- from individual projects to full daily load processing statistics -- across all projects.
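Where reporting directly against the repository's MX Views is preferred over the packaged reports, a load-volume trend similar to the Session Run Details report can be assembled as sketched below. The REP_SESS_LOG view and the SUCCESSFUL_ROWS, FAILED_ROWS, and ACTUAL_START columns are commonly documented names that vary by PowerCenter version, and the date arithmetic shown is Oracle-style; treat all of these as assumptions to verify.

# Hedged sketch: trend rows loaded per day from the REP_SESS_LOG MX view.
# View and column names, and the Oracle-style date functions, should be
# verified against the repository documentation for the version in use.
import oracledb  # assumed repository database driver

TREND_SQL = """
SELECT TRUNC(actual_start)   AS load_date,
       SUM(successful_rows)  AS rows_loaded,
       SUM(failed_rows)      AS rows_failed
FROM   rep_sess_log
WHERE  actual_start >= SYSDATE - 30
GROUP  BY TRUNC(actual_start)
ORDER  BY load_date
"""

def load_volume_trend(dsn, user, password):
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        cur = conn.cursor()
        cur.execute(TREND_SQL)
        return cur.fetchall()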

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50


Phase 8: Operate
Subtask 8.2.3 Monitor Load Processes
Description
After the data integration solution is deployed, the system operators begin the task of monitoring the daily processes. For data migration and consolidation solutions, this includes monitoring the processes that load the database. For presentation layers and metadata management reporting solutions, this includes monitoring the processes that create the end-user reports. This monitoring is necessary to ensure that the system is operating at peak efficiency. It is important to ensure that any processes that stop, are delayed, or simply fail to run are noticed and appropriate steps are taken. It is important to recognize in data migration and consolidation solutions that the processing time may increase as the system matures, new data sources are used, and existing sources mature. For data visualization and metadata management reporting solutions, it is important to note that processing time can increase as the system matures, more users access the system and reports are run more frequently. If the processes are not monitored, they may cause problems as the daily load processing begins to overlap the system's user availability. Therefore, the system operator needs to monitor and report on processing times as well as data volumes.

Prerequisites
None

Roles

Presentation Layer Developer (Secondary) System Operator (Primary)

Considerations
Data Analyzer with the Repository and Administration Reports installed can provide information about session run details, average loading times, and server load trends by day. Administrative and operational dashboards can display all of the vital metrics that need to be monitored. They can also provide the project management team with a high-level understanding of the health of the analytic support system.
Large installations may already have monitoring software in place that can be adapted to monitor the load processes of the analytic solution. This software typically includes both visual monitors for the System Operator's client desktop and electronic alerts that can be programmed to contact various project team members.
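Where packaged dashboards or enterprise monitoring tools are not yet in place, a minimal alert along the lines described above can be scripted. The sketch below assumes session durations are already being collected (for example, from the load-volume trend shown earlier) and that an SMTP relay is reachable; the threshold, addresses, and host name are placeholders.

# Minimal alerting sketch: notify the operations team when a load process
# exceeds its expected window. The threshold, addresses, and SMTP host are
# placeholders; most sites would route this through existing monitoring tools.
import smtplib
from email.message import EmailMessage

def alert_if_slow(session_name, duration_min, threshold_min, smtp_host="mailhost"):
    if duration_min <= threshold_min:
        return False
    msg = EmailMessage()
    msg["Subject"] = f"Load warning: {session_name} ran {duration_min} min"
    msg["From"] = "etl-monitor@example.com"
    msg["To"] = "ops-team@example.com"
    msg.set_content(
        f"{session_name} exceeded its {threshold_min}-minute window; "
        "see the Operations Manual for escalation steps."
    )
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
    return True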

Best Practices
Causes and Analysis of UNIX Core Files
Load Validation
Running Sessions in Recovery Mode

Sample Deliverables
None

Last updated: 24-Jun-10 14:05


Phase 8: Operate
Subtask 8.2.4 Track Change Control Requests
Description
The process of tracking change control requests is integral to the Operate Phase. It is here that any production issues are documented and resolved. The change control process allows the project team to prioritize the problems and create schedules for their resolution and eventual promotion into the production environment.

Prerequisites
None

Roles

Business Project Manager (Primary) Project Sponsor (Primary)

Considerations

Ideally, a change control process was implemented during the Architect Phase, enabling the developers to follow a well-established process during the Operate Phase. Many companies rely on a Configuration Control Board to prioritize and approve work for the various maintenance releases.
The Change Control Procedure document, created in conjunction with the Change Control Procedures in the Architect Phase, should describe precisely how the project team is going to identify and resolve problems that come to light during system development or operation. Most companies use a Change Request Form to kick off the Change Control procedure. These forms should include the following:
The individual or department requesting the change
A clear description of the change requested
The problem or issue that the requested change addresses
The priority level of the change requested
The expected release date
An estimation of the development time
The impact of the requested change on project(s) in development, if any
A Resolutions section to be filled in after the Change Request is resolved, specifying whether the change was implemented, in what release, and by whom
This type of change control documentation can be invaluable if questions subsequently arise as to why a system operates the way that it does, or why it doesn't function like an earlier version.
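Where change requests are tracked in a lightweight tool or spreadsheet export rather than a full workflow system, the fields listed above map naturally onto a small record structure. The sketch below is only an illustration of that mapping; the field names and types are an assumption, not a prescribed schema.

# Illustrative record structure mirroring the Change Request Form fields above.
# Field names and types are an example only, not a prescribed schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeRequest:
    requested_by: str                    # individual or department requesting the change
    description: str                     # clear description of the change requested
    problem_addressed: str               # problem or issue the change addresses
    priority: str                        # e.g., high / medium / low
    expected_release: str                # expected release date or version
    estimated_dev_days: float            # estimation of the development time
    impact_on_in_flight_work: str = ""   # impact on project(s) in development, if any
    resolution: Optional[str] = None     # filled in after the request is resolved
    implemented_in_release: Optional[str] = None
    implemented_by: Optional[str] = None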

Best Practices
None

Sample Deliverables
Change Request Form

Last updated: 01-Feb-07 18:50


Phase 8: Operate
Subtask 8.2.5 Monitor Usage Description
One of the most important aspects of the Operate Phase is monitoring how and when the organization's end users use the data integration solution. This subtask enables the project team to gauge what information is the most useful, how often it is retrieved, and what type of user generally requests it. All of this information can then be used to gauge the system's return on investment and to plan future enhancements. Monitoring the use of the presentation layer during User Acceptance Testing can indicate bottlenecks. When the project is complete, Operations continues to monitor the tasks to maintain system performance. The monitoring results can be used to plan for changes in hardware and/or network facilities to support increased requests to the presentation layer. For example, new requirements may be determined by the number of users requesting a particular report or by requests for more or different information in the report. These requirements may trigger changes in hardware capabilities and/or network bandwidth.
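
A minimal usage summary can be produced from whatever access log the reporting tool exposes. The sketch below assumes a hypothetical CSV export with timestamp, user_id, and report_name columns; adjust it to the actual log format available.

    import csv
    from collections import Counter

    # Hypothetical access log exported from the reporting tool; the file name and
    # columns are assumptions for illustration.
    ACCESS_LOG = "report_access_log.csv"

    def summarize_usage(path, top_n=10):
        by_report, by_user = Counter(), Counter()
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                by_report[row["report_name"]] += 1
                by_user[row["user_id"]] += 1
        print("Most requested reports:")
        for report, hits in by_report.most_common(top_n):
            print(f"  {report}: {hits} requests")
        print("Most active users:")
        for user, hits in by_user.most_common(top_n):
            print(f"  {user}: {hits} requests")

    if __name__ == "__main__":
        summarize_usage(ACCESS_LOG)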

Prerequisites
None

Roles

Business Project Manager (Primary) Data Warehouse Administrator (Secondary) Database Administrator (DBA) (Primary) Production Supervisor (Primary) Project Sponsor (Review Only) Repository Administrator (Review Only) System Administrator (Approve) System Operator (Review Only)

Considerations
Most business organizations have tools in place to monitor the use of their production systems. Some end-user reporting tools have built-in reports for such purposes. The project team should review the available tools, as well as software that may be bundled with the RDBMS, and determine which tools best suit the project's monitoring needs. Informatica provides tools and metadata sources that address the need for monitoring information from the presentation layer, as well as metadata on the processes used to supply the presentation layer with data. This information can be extracted with Informatica tools to provide a complete view of presentation-layer usage.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50


Phase 8: Operate
Subtask 8.2.6 Monitor Data Quality Description
This subtask is concerned with data quality processes that may have been scoped into the project for late-project or post-project use. Such processes are an optional deliverable for most projects. However, there is a strong argument for building data quality initiatives that will outlast the project into the project plan: ongoing monitoring should itself be considered a key deliverable, because it provides a means to ensure that previously identified data quality issues do not recur. For new data entering the system, monitoring provides a means to ensure that new feeds do not compromise the integrity of the existing data. Moreover, the processes created for the Data Quality Audit task in the Analyze Phase may still be suitable for application to the data in the Operate Phase, either as-is or with a reasonable amount of tuning. Three types of data quality process are relevant in this context:
Processes that can be scheduled to monitor data quality on an ongoing basis.
Processes that can address or repair any data quality issues discovered.
Processes that can run at the point of data entry to prevent bad data from entering the system.
This subtask is concerned with agreeing on a strategy to use any or all such processes to validate the continuing quality of the business data and to safeguard against lapses in data quality in the future.

Prerequisites
None

Roles

Data Steward/Data Quality Steward (Primary) Production Supervisor (Secondary)

Considerations
Ongoing data quality initiatives bring the data quality process full circle. This subtask is the logical conclusion to a process that began with the Data Quality Audit in the Analyze Phase and the creation of data quality processes (called plans in Informatica Data Quality terminology) in the Design Phase. The plans created during and after the Operate Phase are likely to be runtime or real-time plans. A runtime plan is one that can be scheduled for automated, regular execution (e.g., nightly or weekly). A real-time plan is one that can accept a live data feed, for example from a third-party application, and write output data back to a live application. Real-time plans are useful in data entry scenarios; they can capture data problems at the point of keyboard entry, before the data is saved to the system. A real-time plan can check data entries, pass them if accurate, cleanse them of error, or reject them as unusable. Runtime plans can be used to monitor the data stored in the system; these plans can be run during periods of relative inactivity (e.g., weekends). For example, the Data Quality Developer may design a plan to identify duplicate records in the system, and the Developer or the system administrator can schedule the plan to run overnight (a minimal scheduling wrapper is sketched below). Any duplication issues found can then be addressed manually or by other data quality plans. The Data Quality Developer must discuss the importance of ongoing data quality management with the business early in the project, so that the business can decide what data quality management steps to take within the project or outside of it. The Data Quality Developer must also consider the impact that ongoing data quality initiatives are likely to have on the business systems. Should the data quality plans be deployed to several locations or centralized? Will the reference data be updated at regular intervals, and by whom? Can plan resource files be moved easily across the enterprise? Once the project resources are unwound, these matters require a committed strategy from the business. The results, however (clean, complete, compliant data), are well worth it.
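
The scheduling wrapper referenced above might look like the following sketch, intended to be triggered by cron or an enterprise scheduler during a quiet window. The plan-execution command is a placeholder; substitute whatever invocation the data quality environment actually provides.

    import logging
    import subprocess

    # The command below is a placeholder: substitute the invocation your data quality
    # environment uses to execute a saved runtime plan (for example, a vendor command
    # line utility or a wrapper script supplied by the DQ Developer).
    RUN_PLAN_CMD = ["run_dq_plan", "--plan", "customer_dedupe_monitor"]
    LOG_FILE = "dq_monitor.log"

    logging.basicConfig(filename=LOG_FILE, level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")

    def run_nightly_plan():
        """Intended to be triggered by cron or an enterprise scheduler during a quiet window."""
        logging.info("Starting scheduled data quality plan")
        result = subprocess.run(RUN_PLAN_CMD, capture_output=True, text=True)
        if result.returncode == 0:
            logging.info("Plan completed successfully")
        else:
            logging.error("Plan failed (rc=%s): %s", result.returncode, result.stderr.strip())
        return result.returncode

    if __name__ == "__main__":
        raise SystemExit(run_nightly_plan())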


Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50


Phase 8: Operate
Task 8.3 Maintain and Upgrade Environment Description
The goal in this task is to develop and implement an upgrade procedure to facilitate upgrading the hardware, software, and/or network components that support the overall analytic solution. This plan should enable both the development and operations staff to plan for and execute system upgrades in an efficient, timely manner, with as little impact on the system's end users as possible. The deployed system incorporates multiple components, many of which are likely to undergo upgrades during the system's lifetime. Ideally, upgrading system components should be treated as a system change and, as such, should use many of the techniques discussed in 8.2.4 Track Change Control Requests. After these changes are prioritized and authorized by the Project Manager, an upgrade plan should be developed and executed. This plan should include the tasks necessary to perform the upgrades as well as the tasks necessary to update system documentation and the Operations Manual, when appropriate.

Prerequisites
None

Roles

Database Administrator (DBA) (Primary) Repository Administrator (Primary) System Administrator (Secondary)

Considerations

Once the Build Phase has been completed, the development and operations staff should begin determining how upgrades should be carried out. The team should consider all aspects of the system's architecture, including any software and hardware being used. Special attention should be paid to software release schedules, hardware limitations, network limitations, and vendor release support schedules. This information gives the team an idea of how often and when various upgrades are likely to be required. When combined with knowledge of the data load windows, it allows the operations team to schedule upgrades without adversely affecting the end users. Upgrading the Informatica software has some special implications. The software upgrade often requires a repository upgrade as well, so the operations team should factor in the time required to back up the repository along with the time to perform the upgrade itself. In addition, the development staff should be involved in order to ensure that all current sessions run as designed after the upgrade occurs.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50


Phase 8: Operate
Subtask 8.3.1 Maintain Repository Description
A key operational aspect of maintaining PowerCenter repositories involves creating and implementing backup policies. These backups become invaluable if some catastrophic event occurs that requires the repository to be restored. Another key operational aspect is monitoring the size and growth of the repository databases, since daily use of these applications adds metadata to the repositories. The Administration Console manages Repository Services and repository content, including backup and restoration. The following repository-related functions can be performed through the Administration Console:
Enable or disable a Repository Service or service process.
Alter the operating mode of a Repository Service.
Create and delete repository content.
Back up, copy, restore, or delete a repository.
Promote a local repository to a global repository.
Register and unregister a local repository.
Manage user connections and locks.
Send repository notification messages.
Manage repository plug-ins.
Upgrade a repository and its Repository Service.
Additional information about upgrades is available in the "Upgrading PowerCenter" chapter of the PowerCenter Installation and Configuration Guide.

Prerequisites
None

Roles

Database Administrator (DBA) (Secondary) Repository Administrator (Primary) System Administrator (Secondary)

Considerations

Enabling and Disabling the Repository Service
A service process starts on a designated node when a Repository Service is enabled. PowerCenter's High Availability (HA) feature enables a service to fail over to another node if the original node becomes unavailable. Administrative duties can be performed through the Administration Console only when the Repository Service is enabled.

Exclusive Mode
The Repository Service runs in normal or exclusive mode. Running the Repository Service in exclusive mode allows only one user to access the repository, through the Administration Console or the pmrep command line program. It is advisable to set the Repository Service to exclusive mode when performing administrative tasks that update the repository configuration, such as deleting repository content, enabling version control, promoting a repository, registering plug-ins, or upgrading the repository. Running in exclusive mode requires full privileges and permissions on the Repository Service. Before switching to exclusive mode, notify users of the intended change and verify that all users have disconnected. The Repository Service must be stopped and restarted to complete the mode switch.

Repository Backup
Although the PowerCenter database tables may be included in the database administration backup procedures, PowerCenter repository backup procedures and schedules are established to prevent data loss due to hardware, software, or user mishaps. The Repository Service provides backup processing for repositories through the Administration Console or the pmrep command line program. The Repository Service backup function saves repository objects, connection information, and code page information in a file stored on the server in the backup location. PowerCenter backup scheduling should account for how frequently the repository changes. Because development repositories typically change more frequently than production repositories, it may be desirable to back up the development repository nightly during heavy development efforts. Production repositories, on the other hand, may only need backup processing after development promotions are registered. Preserve the backup date as part of the backup file name and, as new backups are created, delete the older ones.

TIP A simple approach to automating PowerCenter repository backups is to use the pmrep command line program. Commands can be packaged and scheduled so that backups occur on a desired schedule without manual intervention. The backup file name should minimally include repository name and backup date (yyyymmdd).
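
A minimal wrapper for such a scheduled backup is sketched below, assuming pmrep is on the path and the password is supplied through an environment variable; verify the exact pmrep options against the command reference for the PowerCenter version in use. The repository, domain, and path values are placeholders.

    import subprocess
    from datetime import date

    # Connection values are placeholders; supply your own repository, domain, and
    # credentials. Check your version's pmrep reference for the exact option names
    # (e.g., -X to read the password from an environment variable instead of -x).
    REPOSITORY = "DEV_REPO"
    DOMAIN = "Domain_Dev"
    USER = "Administrator"
    PASSWORD_ENV_VAR = "REPO_PASSWD"   # password exported in the environment
    BACKUP_DIR = "/infa/backups"

    def backup_repository():
        # File name includes repository name and backup date (yyyymmdd), per the tip above.
        backup_file = f"{BACKUP_DIR}/{REPOSITORY}_{date.today():%Y%m%d}.rep"
        connect = ["pmrep", "connect", "-r", REPOSITORY, "-d", DOMAIN,
                   "-n", USER, "-X", PASSWORD_ENV_VAR]
        backup = ["pmrep", "backup", "-o", backup_file]
        for cmd in (connect, backup):
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                raise RuntimeError(f"{' '.join(cmd[:2])} failed: {result.stdout}{result.stderr}")
        return backup_file

    if __name__ == "__main__":
        print("Backup written to", backup_repository())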
A repository backup file is invaluable for reference when, as occasionally happens, questions arise as to the integrity of the repository or users encounter problems using it. A backup file enables technical support staff to validate repository integrity to, for example, eliminate the repository as a source of user problems. In addition, if the development or production repository is corrupted, the backup repository can be used to recover quickly.

TIP Keep in mind that you cannot restore a single folder or mapping from a repository backup. If, for example, a single important mapping is deleted by accident, you need to obtain a temporary database space from the DBA in order to restore the backup to a temporary repository DB. With the PowerCenter client tools, copy the lost metadata, and then remove the temporary repository from the database and the cache. If the developers need this service often, it may be prudent to keep the temporary database around all the time and copy over the development repository to the backup repository on a daily basis in addition to backing up to a file. Only the DBA should have access to the backup repository and requests should be made through him/her.

Repository Performance
Repositories may grow in size due to the execution of workflows, especially in large projects. As the repository grows, response may become slower. Consider these techniques to maintain repository performance:
Delete old session and workflow log information. Write a simple SQL script to delete old log information. Assuming that repository backups are taken on a consistent basis, old log information can always be retrieved from a repository backup if necessary.
Perform defragmentation. Much like any other database, repository databases should undergo periodic "housecleaning" through statistics updates and defragmentation. Work with the DBAs to schedule this as a regular job.

Audit Trail
The SecurityAuditTrail configuration option in the Repository Service properties in the Administration Console allows changes to repository users, groups, privileges, and permissions to be tracked. Enabling the audit trail causes the Repository Service to record security changes in the Repository Service log. Logged security changes include changes to an object's owner, owner's group, or folder permissions; changes to another user's password; user maintenance; group maintenance; global object permissions; and privileges.

Best Practices
Disaster Recovery Planning with PowerCenter HA Option Managing Repository Size Repository Administration Updating Repository Statistics


Sample Deliverables
None

Last updated: 24-Jun-10 14:07


Phase 8: Operate
Subtask 8.3.2 Upgrade Software Description
Upgrading the application software of a data integration solution to a new release is a continuous operations task, as new releases are offered periodically by every software vendor in the form of major releases or hotfixes. New software releases offer expanded functionality, new capabilities, consolidation of toolsets, and fixes to existing functionality that can benefit the data integration environment and future integration work. However, an upgrade can be a disruptive event, since project work may halt while the upgrade process is in progress. Given that data integration environments often contain a host of different applications, including Informatica software, database systems, operating systems, hardware configurations, EAI tools, BI tools, and other related technologies, an upgrade to any one of these technologies may require an upgrade to any number of other software programs for the full system to function properly. System architects and administrators must continually evaluate new software offerings across the various products in their data integration environment and balance the desire to upgrade against the impact of an upgrade. Software upgrades require a continuous assessment and planning process for both major releases and hotfix releases. A regular schedule should be defined on which new releases are evaluated for functionality and need in the environment. Once approved, upgrades must be coordinated with ongoing development work, ongoing testing cycles, and ongoing production data integration. Appropriate project planning and coordination of software upgrades allow a data integration environment to stay current on its technology stack with minimal disruption to production data integration efforts and development projects.

Tip Some consideration should be given to the various tools deployed in the integration environment and the business need to stay current with the latest release. It may be necessary to break the tools out into separate environments based on this need.

Tip Because of the requirement for continuous assessment and planning, it is considered a best practice to assign a technical project manager to oversee the process of conducting an upgrade. This provides a single point person to coordinate and communicate the status of the upgrade process and its interdependencies.

Prerequisites
None

Roles

Database Administrator (DBA) (Secondary) Repository Administrator (Primary) System Administrator (Secondary) Technical Project Manager (Secondary)

Considerations
When faced with a new software release, the first consideration is to decide whether the upgrade is appropriate for the data integration environment. Consider the following questions when making a decision to upgrade:
What new functionality and features could be leveraged inside the environment?
Are there any bug fixes or refinements that address current issues being faced?
What is the remaining time on the support lifecycle for the Informatica applications?
What is the remaining time on the support lifecycle for associated data integration applications?
How does the upgrade of associated data integration applications affect Informatica?
How much disruption can the development environment and its users tolerate?
Will new training be needed, and how does this affect development cycles?
Are there any additional software packages that may be introduced into the environment, and how will they interact with the data integration environment?

Planning for the Upgrade


After the initial considerations have been weighed and a decision to upgrade the software has been made, a deeper level of planning needs to occur. All planning activities lead to the development of a comprehensive project plan. The planning exercise should address the following areas:

Deep Scope Analysis


The initial considerations will not have addressed the level of detail needed to properly plan the upgrade of the environment. A deep scope analysis, especially for major release upgrades, provides the level of detail needed to frame the project plan and the interdependencies that exist inside the data integration environment. The areas to be included in the deep scope analysis are:
Resource Analysis
Requirements Analysis
Timeline Analysis
Business Impact Analysis
Testing Analysis

Identify Upgrade Requirements


When executing a software upgrade, various requirements need to be fulfilled: requirements from the business units affected by the upgrade, technical expertise that will need to be called on to assist, corporate policies concerning architecture that must be followed, and the requirements of the data integration software itself. Determining these requirements up front helps ensure that no surprises develop prior to the go/no-go meeting for sign-off on the production upgrade. The requirement categories that must be considered are described below.

Resource Requirements
Identifying the proper personnel to act as dedicated leads in their respective fields for the upgrade is key; it eliminates time wasted hunting down a resource when an issue arises. These resources must be made aware of the upgrade process and what is expected of them. During this phase it should also be determined whether each resource has the bandwidth to meet the timelines set forth in the timeline analysis. Every environment differs in the roles and responsibilities it possesses, but the typical resources directly or indirectly associated with the upgrade are:
ETL Resources
Server Administration Resources
Database Administration Resources
Networking Resources
Testing Team Resources
External Technical Resources

Tip New releases of software often include new features and functionality that may require some level of training for resources. Proper planning of the necessary training can ensure that employees are trained ahead of the upgrade so that productivity does not suffer once the new software is in place. Because it is impossible to properly estimate and plan the upgrade effort without knowledge of the new features and potential environment changes, best practice dictates training a core set of architects and system administrators early in the upgrade process so they can assist in the upgrade planning.

Tip A new software release likely includes new and expanded features that may create a need to alter current data integration processes. During the upgrade process, existing processes may be altered to incorporate and implement the new features. Time is required to make and test these changes as well. Reviewing the new features and assessing their impact on the upgrade process is a key pre-planning step.


Software Requirements
Determining the requirements for all associated software applications in the data integration environment can prove challenging. Organizations often try to maximize the opportunity presented by an upgrade by upgrading multiple software components in the environment at the same time. All of the applications (internal and external) must be reviewed to consider their needs and determine the cross-dependencies, at both the version level and the hardware level. Doing this helps narrow down the possible configurations under which the software will be able to run; a determination of whether a given configuration is supported by the enterprise must then occur. Informatica provides a product availability matrix (PAM) that shows the cross-dependencies for the various Informatica tools and the operating systems, hardware, repository RDBMSs, and sources/targets supported by each version and hotfix. Informatica also provides product upgrade paths that show the prior releases supported by the upgrade process and the major caveats associated with each upgrade path. Upgrade path information can usually be found in one of the following areas:
Installation Guides
Upgrade Guides
Online Support Site
Provided Help Documentation

Tip The PAMs provide the high-level version information that is supported. For UNIX or Linux environments, if a particular package version is required, the installation and upgrade guides usually contain this information and can be leveraged for a finer grain of detail.

Hardware Requirements
New versions of software have requirements for the level of hardware needed to run in an optimal fashion; some applications run only on certain chipsets or with certain processor types. An exercise similar to the one conducted for the software requirements, in which the available configurations are narrowed down, needs to occur. Performance of the existing data integration environment is always a factor in determining the need for new or additional hardware. When making a decision on which hardware to select, be cognizant of the performance level of the existing hardware and any growth factors that will be introduced in the following years (a rough growth projection of the kind sketched below can help frame these estimates). This helps in sizing the following areas:
Additional servers for horizontal scaling (may affect licensing)
More or faster CPUs (may affect licensing)
Additional RAM to support increased loads or volume of data (may affect licensing)
Disks to support increased I/O requirements or volume of data
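
The growth projection mentioned above can be as simple as the following sketch; the starting figures and growth rates are illustrative only and should be replaced with measured values.

    # Simple compound-growth projection to frame hardware sizing discussions.
    # All starting figures and growth rates below are illustrative assumptions.
    current_size_gb = 800          # current warehouse size
    annual_data_growth = 0.35      # 35% more data per year
    current_users = 120
    annual_user_growth = 0.20      # 20% more active users per year
    planning_horizon_years = 3

    for year in range(1, planning_horizon_years + 1):
        projected_gb = current_size_gb * (1 + annual_data_growth) ** year
        projected_users = current_users * (1 + annual_user_growth) ** year
        print(f"Year {year}: ~{projected_gb:,.0f} GB of data, ~{projected_users:.0f} users")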

Tip In environments with production systems it is advisable to copy the production environment to a sandbox instance. The sandbox environment should be as close to an exact copy of production specs or future production specs as possible, including production data. A software upgrade is then performed on the sandbox instance and data integration processes run on both the current production and the sandbox instance for a period of time. In this way results can be compared over time to ensure that no unforeseen differences occur in the new software version. If differences do occur they can be investigated, resolved and accounted for in the final upgrade plan.
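
One way to run such a comparison is sketched below, assuming a DB-API driver (pyodbc here) and placeholder DSNs pointing at the production and sandbox targets; row counts are used for brevity, and checksums or column aggregates can be added in the same way.

    import pyodbc  # any DB-API 2.0 driver appropriate to your database will do

    # DSNs and the table list are placeholders; point them at the current
    # production targets and the upgraded sandbox copies.
    PROD_DSN = "DSN=prod_dw;UID=report_user;PWD=***"
    SANDBOX_DSN = "DSN=sandbox_dw;UID=report_user;PWD=***"
    TABLES = ["dim_customer", "dim_product", "fact_sales"]

    def row_counts(dsn, tables):
        counts = {}
        conn = pyodbc.connect(dsn)
        try:
            cur = conn.cursor()
            for table in tables:
                cur.execute(f"SELECT COUNT(*) FROM {table}")
                counts[table] = cur.fetchone()[0]
        finally:
            conn.close()
        return counts

    def compare_environments():
        prod = row_counts(PROD_DSN, TABLES)
        sandbox = row_counts(SANDBOX_DSN, TABLES)
        for table in TABLES:
            status = "OK" if prod[table] == sandbox[table] else "MISMATCH"
            print(f"{table}: prod={prod[table]} sandbox={sandbox[table]} [{status}]")

    if __name__ == "__main__":
        compare_environments()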

Infrastructure Requirements
As IT infrastructures become more advanced and compliance audits become the norm, any infrastructure requirements an enterprise may have must also be taken into consideration. These may include integration of all software with a corporate LDAP directory for user login management. There may be certain security requirements that need to be met for web-enabled applications or connectivity encryption levels. There may be a requirement for a disaster recovery environment, as well as for making sure the data integration environment is highly available.

Testing Requirements
During the upgrade of any software offering, the testing phase is considered as important as the upgrade of the software itself. Often more than 60 percent of the total upgrade time is devoted to testing the data integration environment with the new software release. Ensuring that data continues to flow correctly, software versions are compatible, and new features do not cause unexpected results requires detailed testing. Defining the exact requirements for testing helps determine the specific test cases that are needed as well as the level of effort required to meet the requirements. Testing requirements should be determined for several categories (a simple harness for organizing such checks is sketched below):
Operability (Smoke Tests)
Same Data
Application Security
Application Performance
Third-Party Integration
Disaster Recovery
High Availability Failover
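
The harness sketched below shows one way to organize the operability (smoke test) category as a repeatable script; each check is a stand-in to be replaced with a real probe such as a service ping, a trivial end-to-end workflow run, or a sample report request.

    from typing import Callable, Dict

    def target_database_reachable() -> bool:
        return True   # placeholder: e.g., open a connection and run SELECT 1

    def sample_workflow_completes() -> bool:
        return True   # placeholder: e.g., trigger a trivial workflow and poll its status

    def sample_report_renders() -> bool:
        return True   # placeholder: e.g., request a known report and check the response

    SMOKE_CHECKS: Dict[str, Callable[[], bool]] = {
        "target database reachable": target_database_reachable,
        "sample workflow completes": sample_workflow_completes,
        "sample report renders": sample_report_renders,
    }

    def run_smoke_tests() -> bool:
        all_passed = True
        for name, check in SMOKE_CHECKS.items():
            try:
                passed = check()
            except Exception as exc:      # a crashing check counts as a failure
                passed, name = False, f"{name} ({exc})"
            print(f"[{'PASS' if passed else 'FAIL'}] {name}")
            all_passed = all_passed and passed
        return all_passed

    if __name__ == "__main__":
        raise SystemExit(0 if run_smoke_tests() else 1)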

Additional Requirements
While it is impossible to list every possible requirement, the team should seek out any additional requirements not covered above. Having all requirements documented rounds out the upgrade documentation and prevents surprises after go-live.

Upgrade Execution
Once a comprehensive plan for the upgrade is in place, the time comes to perform the actual upgrade on the development, test and production environments. The Installation Guides for each of the Informatica products, Upgrade Guides and online help provide instructions on upgrading and the step-by-step process for applying the new version of the software. A well-planned upgrade process is key to ensuring success during the transition from the current version to a new version with minimal disruption to the development and production environments. A smooth upgrade process enables data integration teams to take advantage of the latest technologies and advances in data integration.

Best Practices
Upgrading Metadata Manager Upgrading Data Analyzer Upgrading PowerCenter Upgrading PowerExchange

Sample Deliverables
None

Last updated: 31-Oct-10 02:35


Velocity v9
Roles

2011 Informatica Corporation. All rights reserved.

Roles
Velocity Roles and Responsibilities Application Specialist Business Analyst Business Project Manager Data Architect Data Integration Developer Data Quality Developer Data Steward/Data Quality Steward Data Transformation Developer Data Warehouse Administrator Database Administrator (DBA) End User Integration Competency Center Director Legal Expert Metadata Manager Network Administrator PowerCenter Domain Administrator Presentation Layer Developer Production Supervisor Project Sponsor Quality Assurance Manager Repository Administrator Security Manager System Administrator System Operator Technical Architect Technical Project Manager Test Engineer Test Manager Training Coordinator User Acceptance Test Lead


Velocity Roles and Responsibilities


The following pages describe the roles used throughout this Guide, along with the responsibilities typically associated with each. Please note that the concept of a role is distinct from that of an employee or full-time equivalent (FTE). A role encapsulates a set of responsibilities that may be fulfilled by a single person in a part-time or full-time capacity, or may be accomplished by a number of people working together. The Velocity Guide refers to roles with an implicit assumption that there is a corresponding person in that role. For example, a task description may discuss the involvement of "the DBA" on a particular project; however, there may be one or more DBAs, or a person whose part-time responsibility is database administration. In addition, note that there is no assumption of staffing level for each role -- that is, a small project may have one individual filling the roles of Data Integration Developer, Data Architect, and Database Administrator, while large projects may have multiple individuals assigned to each role. In cases where multiple people represent a given role, the singular role name is used, and project planners can specify the actual allocation of work among all relevant parties. For example, the methodology always refers to the Technical Architect when, in fact, there may be a team of two or more people developing the Technical Architecture for a very large development effort.

Data Integration Project - Sample Organization Chart

Last updated: 20-May-08 18:51


Application Specialist
Successful data integration projects are built on a foundation of thorough understanding of the source and target applications. The Application Specialist is responsible for providing detailed information on data models, metadata, audit controls and processing controls to Business Analysts, Technical Architects and others regarding the source and/or target system. This role is normally filled by someone from a technical background who is able to query/analyze the data hands-on. The person filling this role should have a good business understanding of how the data is generated and maintained and good relationships with the Data Steward and the users of the data.

Reports to:
Technical Project Manager

Responsibilities:
Authority on application system data and process models Advises on known and anticipated data quality issues Supports the construction of representative test data sets

Qualifications/Certifications
Possesses excellent communication skills, both written and verbal Must be able to work effectively with both business and technical stakeholders Works independently with minimal supervision

Recommended Training
Informatica Data Explorer

Last updated: 09-Apr-07 15:38


Business Analyst
The primary role of the Business Analyst (sometimes known as the Functional Analyst) is to represent the interests of the business in the development of the data integration solution. The secondary role is to function as an interpreter for business and technical staff, translating concepts and terminology and generally bridging gaps in understanding. Under normal circumstances, someone from the business community fills this role, since deep knowledge of the business requirement is indispensable. Ideally, familiarity with the technology and the development life-cycle allows the individual to function as the communications channel between technical and business users.

Reports to:
Business Project Manager

Responsibilities:
Ensures that the delivered solution fulfills the needs of the business (should be involved in decisions related to the business requirements) Assists in determining the data integration system project scope, time and required resources Provides support and analysis of data collection, mapping, aggregation and balancing functions Performs requirements analysis, documentation, testing, ad-hoc reporting, user support and project leadership Produces detailed business process flows, functional requirements specifications and data models and communicates these requirements to the design and build teams Conducts cost/benefit assessments of the functionality requested by end-users Prioritizes and balances competing priorities Plans and authors the user documentation set

Qualifications/Certifications
Possesses excellent communication skills, both written and verbal Must be able to work effectively with both business and technical stakeholders Works independently with minimal supervision Has knowledge of the tools and technologies used in the data integration solution Holds certification in industry vertical knowledge (if applicable)

Recommended Training
Interview/workshop techniques Project Management Data Analysis Structured analysis UML or other business design methodology Data Warehouse Development

Last updated: 09-Apr-07 15:20


Business Project Manager


The Business Project Manager has overall responsibility for the delivery of the data integration solution. As such, the Business Project Manager works with the project sponsor, technical project manager, user community, and development team to strike an appropriate balance of business needs, resource availability, project scope, schedule, and budget to deliver specified requirements and meet customer satisfaction.

Reports to:
Project Sponsor

Responsibilities:
Develops and manages the project work plan Manages project scope, time-line and budget Resolves budget issues Works with the Technical Project Manager to procure and assign the appropriate resources for the project Communicates project progress to Project Sponsor(s) Is responsible for ensuring delivery on commitments and ensuring that the delivered solution fulfills the needs of the business Performs requirements analysis, documentation, ad-hoc reporting and project leadership

Qualifications/Certifications
Translates strategies into deliverables Prioritizes and balances competing priorities Possesses excellent communication skills, both written and verbal Results oriented team player Must be able to work effectively with both business and technical stakeholders Works independently with minimal supervision Has knowledge of the tools and technologies used in the data integration solution Holds certification in industry vertical knowledge (if applicable)

Recommended Training
Project Management

Last updated: 06-Apr-07 17:55


Data Architect
The Data Architect is responsible for the delivery of a robust scalable data architecture that meets the business goals of the organization. The Data Architect develops the logical data models, and documents the models in Entity-Relationship Diagrams (ERD). The Data Architect must work with the Business Analysts and Data Integration Developers to translate the business requirements into a logical model. The logical model is captured in the ERD, which then feeds the work of the Database Administrator, who designs and implements the physical database. Depending on the specific structure of the development organization, the Data Architect may also be considered a Data Warehouse Architect, in cooperation with the Technical Architect. This role involves developing the overall Data Warehouse logical architecture, specifically the configuration of the data warehouse, data marts, and an operational data store or staging area if necessary. The physical implementation of the architecture is the responsibility of the Database Administrator.

Reports to:
Technical Project Manager

Responsibilities:
Designs an information strategy that maximizes the value of data as an enterprise asset Maintains logical/physical data models Coordinates the metadata associated with the application Develops technical design documents Develops and communicates data standards Maintains Data Quality metrics Plans architectures and infrastructures in support of data management processes and procedures Supports the build out of the Data Warehouse, Data Marts and operational data store Effectively communicates with other technology and product team members

Qualifications/Certifications
Strong understanding of data integration concepts Understanding of multiple data architectures that can support a Data Warehouse Ability to translate functional requirements into technical design specifications Ability to develop technical design documents and test case documents Experience in optimizing data loads and data transformations Industry vertical experience is essential Project Solution experience is desired Has had some exposure to Project Management Has worked with Modeling Packages Has experience with at least one RDBMS Strong Business Analysis and problem solving skills Familiarity with Enterprise Architecture Structures (Zachman/TOGAF)

Recommended Training
Modeling Packages Data Warehouse Development

Last updated: 01-Feb-07 18:51


Data Integration Developer


The Data Integration Developer is responsible for the design, build, and deployment of the project's data integration component. A typical data integration effort usually involves multiple Data Integration Developers developing the Informatica mappings, executing sessions, and validating the results.

Reports to:
Technical Project Manager

Responsibilities:
Uses the Informatica Data Integration platform to extract, transform, and load data Develops Informatica mapping designs Develops Data Integration Workflows and load processes Ensures adherence to locally defined standards for all developed components Performs data analysis for both Source and Target tables/columns Provides technical documentation of Source and Target mappings Supports the development and design of the internal data integration framework Participates in design and development reviews Works with system owners to resolve source data issues and refine transformation rules Ensures performance metrics are met and tracked Writes and maintains unit tests Conducts QA Reviews Performs production migrations

Qualifications/Certifications
Understands data integration processes and how to tune for performance Has SQL experience Possesses excellent communications skills Has the ability to develop work plans and follow through on assignments with minimal guidance Has Informatica Data Integration Platform experience Is an Informatica Certified Designer Has RDBMS experience Has the ability to work with business and system owners to obtain requirements and manage expectations

Recommended Training
Data Modeling PowerCenter Level I & II Developer PowerCenter - Performance Tuning PowerCenter - Team Based Development PowerCenter - Advanced Mapping Techniques PowerCenter - Advanced Workflow Techniques PowerCenter - XML Support PowerCenter - Data Profiling PowerExchange

Last updated: 01-Feb-07 18:51


Data Quality Developer


The Data Quality Developer (DQ Developer) is responsible for designing, testing, deploying, and documenting the project's data quality procedures and their outputs. The DQ Developer provides the Data Integration Developer with all relevant outputs and results from the data quality procedures, including any ongoing procedures that will run in the Operate phase or after project-end. The DQ Developer must provide the Business Analyst with the summary results of data quality analysis as needed during the project. The DQ Developer must also document at a functional level how the procedures work within the data quality applications. The primary tasks associated with this role are to use Informatica Data Quality and Informatica Data Explorer to profile the project source data, define or confirm the definition of the metadata, cleanse and accuracy-check the project data, check for duplicate or redundant records, and provide the Data Integration Developer with concrete proposals on how to proceed with the ETL processes.

Reports to:
Technical Project Manager

Responsibilities:
Profile source data and determine all source data and metadata characteristics Design and execute Data Quality Audit Present profiling/audit results, in summary and in detail, to the business analyst, the project manager, and the data steward Assist the business analyst/project manager/data steward in defining or modifying the project plan based on these results Assist the Data Integration Developer in designing source-to-target mappings Design and execute the data quality plans that will cleanse, de-duplicate, and otherwise prepare the project data for the Build phase Test Data Quality plans for accuracy and completeness Assist in deploying plans that will run in a scheduled or batch environment Document all plans in detail and hand-over documentation to the customer Assist in any other areas relating to the use of data quality processes, such as unit testing

Qualifications/Certifications
Has knowledge of the tools and technologies used in the data quality solution Results oriented team player Possesses excellent communication skills, both written and verbal Must be able to work effectively with both business and technical stakeholders

Recommended Training
Data Quality Workbench I & II Data Explorer Level I PowerCenter Level I Developer Basic RDBMS Training Data Warehouse Development

Last updated: 15-Feb-07 17:34


Data Steward/Data Quality Steward


The Data Steward owns the data and associated business and technical rules on behalf of the Project Sponsor. This role has responsibility for defining and maintaining business and technical rules, liaising with the business and technical communities, and resolving issues relating to the data. The Data Steward will be the primary contact for all questions relating to the data, its use, processing and quality. In essence, this role formalizes the accountability for the management of organizational data. Typically the Data Steward is a key member of a Data Stewardship Committee put into place by the Project Sponsor. This committee will include business users and technical staff such as Application Experts. There is often an arbitration element to the role where data is put to different uses by separate groups of users whose requirements have to be reconciled.

Reports to:
Business Project Manager

Responsibilities:
Records the business use for defined data Identifies opportunities to share and re-use data Decides upon the target data quality metrics Monitors the progress towards, and tuning of, data quality target metrics Oversees data quality strategy and remedial measures Participates in the enforcement of data quality standards Enters, maintains and verifies data changes Ensures the quality, completeness and accuracy of data definitions Communicates concerns, issues and problems with data to the individuals that can influence change Researches and resolves data issues

Qualifications/Certifications
Possesses strong analytical and problem solving skills Has experience in managing data standardization in a large organization, including setting and executing strategy Previous industry vertical experience is essential Possesses excellent communication skills, both written and verbal Exhibits effective negotiating skills Displays meticulous attention to detail Must be able to work effectively with both business and technical stakeholders Works independently with minimal supervision Project solution experience is desirable

Recommended Training
Data Quality Workbench Level I Data Explorer Level I

Last updated: 15-Feb-07 17:34


Data Transformation Developer


The Data Transformation Developer is responsible for the design, build, and deployment of a project's data transformation components. A typical data integration or B2B-oriented solution could have multiple Data Transformation Developers involved in developing the Informatica B2B Data Exchange transformation components (e.g., parsers, serializers, mappers, and data splitters), executing the transformation components, and validating the results.

Reports to:
Technical Project Manager

Responsibilities:
Uses the Informatica Data Transformation technologies platform to parse, map, serialize and split data in unstructured, semi-structured and complex XML formats Develops Informatica data transformation designs Works with Data Integration developers to develop Data Integration Workflows and load processes Ensures adherence to proprietary and open data format standards for all developed components Performs data analysis for source and target data Provides technical documentation of transformation components Supports the development and design of the internal data integration framework Participates in design and development reviews Works with system owners to resolve data transformation issues and to refine transformation components Ensures that performance metrics are met and tracked Authors and maintains unit tests Conducts QA Reviews Performs production migrations of data transformation services May create external data transformation components using Java, .Net, XSLT or other technologies

Qualifications/Certifications
Solid understanding of XML and XML Schema technologies Understands data transformation (e.g., parsing, mapping and serialization processes) Possesses excellent communications skills Ability to develop work plans and follow through on assignments with minimal guidance Informatica B2B Data Exchange platform experience Knowledge of appropriate data formats as required (e.g., EDI, HIPAA, SWIFT, etc.) Data transformation experience Ability to work with business and system owners to obtain requirements and manage expectations

Recommended Training
B2B Data Exchange Basic and intermediate training Data Modeling PowerCenter Level I & II Developer PowerCenter Integrating complex data exchange PowerCenter XML Support XML and XML schema design (general)

Last updated: 01-Nov-10 16:30


Data Warehouse Administrator


The scope of the Data Warehouse Administrator role is similar to that of the DBA. A typical data integration solution, however, involves more than a single target database, and the Data Warehouse Administrator is responsible for coordinating the many facets of the solution, including operational considerations of the data warehouse, security, job scheduling and submission, and resolution of production failures.

Reports to:
Technical Project Manager

Responsibilities:
Monitors and supports the Enterprise Data Warehouse environment Manages the data extraction, transformation, movement, loading, cleansing and updating processes into the DW environment Maintains the DW repository Implements database security Sets standards and procedures for the DW environment Implements technology improvements Works to resolve technical issues Contributes to technical and system architectural planning Tests and implements new technical solutions

Qualifications/Certifications
Experience in supporting Data Warehouse environments Familiarity with database, integration and presentation technology Experience in developing and supporting real-time and batch-driven data movements Solid understanding of relational database models and dimensional data models Strategic planning and system analysis Able to work effectively with both business and technical stakeholders Works independently with minimal supervision

Recommended Training
DBMS Administration Data Warehouse Development PowerCenter Administrator Level I & II PowerCenter Security and Migration PowerCenter Metadata Manager

Last updated: 01-Nov-10 16:30


Database Administrator (DBA)


The Database Administrator (DBA) in a Data Integration Solution is typically responsible for translating the logical model (i.e., the ERD) into a physical model for implementation in the chosen DBMS, implementing the model, developing volume and capacity estimates, performance tuning, and general administration of the DBMS. In many cases, the project DBA also has useful knowledge of existing source database systems. In most cases, a DBA's skills are tied to a particular DBMS, such as Oracle or Sybase. As a result, an analytic solution with heterogeneous sources/targets may require the involvement of several DBAs. The Project Manager and Data Warehouse Administrator are responsible for ensuring that the DBAs are working in concert toward a common solution.

Reports to:
Technical Project Manager

Responsibilities:
Plans, implements and supports enterprise databases Establishes and maintains database security and integrity controls Delivers database services while managing to policies, procedures and standards Tests and implements new technical solutions Monitors and supports the database infrastructure (including clients) Develops volume and capacity estimates Proposes and implements enhancements to improve performance and reliability Provides operational support of databases, including backup and recovery Develops programs to migrate data between systems Works to resolve technical issues Contributes to technical and system architectural planning Supports data integration developers in troubleshooting performance issues Collaborates with other Departments (i.e., Network Administrators) to identify and resolve performance issues

Qualifications/Certifications
Experience in database administration, backup and recovery Expertise in database configuration and tuning Appreciation of DI tool-set and associated tools Experience in developing and supporting ETL real-time and batch processes Strategic planning and system analysis Strong analytical and communication skills Able to work effectively with both business and technical stakeholders Ability to work independently with minimal supervision

Recommended Training
DBMS Administration

Last updated: 01-Nov-10 16:30


End User
The End User is the ultimate "consumer" of the data in the data warehouse and/or data marts. As such, the end user represents a key customer constituent (management is another), and must therefore be heavily involved in the development of a data integration solution. Specifically, a representative of the End User community must be involved in gathering and clarifying the business requirements, developing the solution and User Acceptance Testing (if applicable).

Reports to:
Business Project Manager

Responsibilities:
Gathers and clarifies business requirements Reviews technical design proposals Participates in User Acceptance testing Provides feedback on the user experience

Qualifications/Certifications
Strong understanding of the business' processes Good communication skills

Recommended Training
Data Analyzer - Quickstart Data Analyzer - Report Development

Last updated: 01-Nov-10 16:30


Integration Competency Center Director


The Integration Competency Center Director (ICC Director) has overall responsibility for planning and managing an Integration Competency Center. The ICC Director works with cross-functional management teams, project teams, technology groups, business architects, external supply-chain partners, technology suppliers, and other stakeholders to plan and operate the ICC. The specific activities vary depending upon the ICC scope, life-cycle, organizational model, and maturity level.

Reports to:
C-level executive such as the CIO, CTO or COO but may also report to a business area executive.

Responsibilities:
Determines the scope and mission of the ICC and gains executive support Defines the organizational model for the ICC and its core operating principles Develops and manages the ICC work plan and service levels Manages ICC processes, operating rhythm and budget Resolves budget issues Works with the management teams to procure and assign the appropriate resources Communicates progress and exceptions to Project Sponsor(s) Ensures delivery on commitments and service level fulfillment Continually examines competitors and best-in-class performers to identify ways to enhance the integration service and its value to the enterprise. Partners with LOB teams to define and develop shared goals and expectations. Finds and develops common ground among a wide range of stakeholders Advises senior executives on how solutions will support short- and long-term strategic directions Contracts and sets clear expectations with internal customers about goals, roles, resources, costs, timing, etc. Improves the quality of the integration program by developing and coaching staff across the enterprise to build their individual and collective performance and capability to the standards that will meet the current and future needs of the business. Organizes, leads, and facilitates cross-entity, enterprise-wide redesign initiatives that encompass an end-to-end analysis and future state redesign requiring specialized knowledge or skills critical to the redesign effort Is accountable for planning, conducting, and directing the resolution of the most complex, strategic, corporate-wide business problems to be solved with automated systems

Qualifications and Certifications


Proven and sustained leadership skills Ability to work collaboratively with colleagues and staff to create a results driven organization Ability to recruit, hire, lead and motivate staff Possesses excellent communication skills, both written and verbal Must be able to work effectively with both business and technical stakeholders Translates strategies into deliverables Prioritizes and balances competing priorities Demonstrates deep/broad integration and functional skills Demonstrates a business perspective that is much broader than one function/group Ability to find and develop common ground among a wide range of stakeholders Able to uncover hidden growth opportunities within market/industry segments to create a competitive advantage Displays personal courage by taking a stand on controversial and challenging changes Engages others in strategic discussions to leverage their insights and create shared ownership of the outcomes

Recommended Training
Project Management Enterprise Architectures

Last updated: 01-Nov-10 16:30


Legal Expert
The Legal Expert provides a detailed understanding of one or more areas of external regulation to which the business is subject. The role includes responsibility for providing full details of the compliance reporting, auditing, and associated procedures with which the business must comply. It is normally filled by someone with a business background who has good contacts with the regulatory bodies involved and is capable of understanding and interpreting any legal requirements and advice that might be received.

Reports to:
Project Sponsor

Responsibilities:
Provides oversight and advice to preclude legal implications Clarifies regulatory compliance policies Provides details on auditing and reporting requirements Manages compliance interactions and negotiations

Qualifications/Certifications
Expertise in regulatory compliance requirements for the business Strong understanding of the business' processes Evaluative, analytical and interpretative skills

Recommended Training
N/A

Last updated: 01-Nov-10 16:30


Metadata Manager
The Metadata Manager's primary role is to serve as the central point of contact for all corporate metadata management. This role involves setting the company's metadata strategy, developing standards with the data administration group, determining metadata points of integration between disparate systems, and ensuring the ability to deliver metadata to business and technical users. The Metadata Manager is required to work across business and technical groups to ensure that consistent metadata standards are followed in all existing applications as well as in new development. The Metadata Manager also monitors PowerCenter repositories for accuracy and metadata consistency.

Reports to:
Business Project Manager

Responsibilities:
Formulates and implements the metadata strategy
Captures and integrates metadata from heterogeneous metadata sources
Implements and governs best practices relating to enterprise metadata management standards
Determines metadata points of integration between disparate systems
Ensures the ability to deliver metadata to business and technical users
Monitors development repositories for accuracy and metadata consistency
Identifies and profiles data sources to populate the metadata repository
Designs metadata repository models

Qualifications/Certifications
Business sector experience is essential
Experience in implementing and managing a repository environment
Experience in data modeling (relational and dimensional)
Experience in using repository tools
Solid knowledge of general data architecture concepts, standards and best practices
Strong analytical skills
Excellent communication skills, both written and verbal
Proven ability to work effectively with both business users and technical stakeholders

Recommended Training
DBMS Basics
Data Modeling
PowerCenter - Metadata Manager

Last updated: 01-Nov-10 16:30


Network Administrator
The Network Administrator is responsible for maintaining the organization's computer networks. This is relevant to the analytic solution development effort because the movement of data between source and target systems is likely to place a heavy burden on network resources. The Network Administrator must be aware of the data volumes and schedules involved to ensure that adequate network capacity is available, and that other network users are not significantly (negatively) impacted.
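
To illustrate the capacity concern described above, the sketch below estimates whether a nightly extract fits inside its batch window. This is a minimal sketch: the data volume, link speed, batch window, and the 70% effective-throughput factor are assumed example figures, not values taken from this methodology.

```python
# Illustrative capacity check: does a nightly extract fit in its batch window?
# All inputs are assumed example values, not Velocity-prescribed figures.

def transfer_hours(volume_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate transfer time in hours for a given volume and link speed.

    efficiency discounts the nominal link rate for protocol overhead and
    competing traffic (assumed at 70% here).
    """
    volume_megabits = volume_gb * 1024 * 8           # GB -> megabits
    effective_mbps = link_mbps * efficiency          # usable throughput
    return volume_megabits / effective_mbps / 3600   # seconds -> hours


if __name__ == "__main__":
    nightly_volume_gb = 250        # assumed nightly extract size
    link_mbps = 1000               # assumed 1 Gbps link
    batch_window_hours = 4         # assumed load window

    needed = transfer_hours(nightly_volume_gb, link_mbps)
    print(f"Estimated transfer time: {needed:.2f} h (window: {batch_window_hours} h)")
    if needed > batch_window_hours:
        print("Warning: nightly load does not fit the batch window.")
```

With the assumed figures the transfer takes well under an hour, leaving headroom in the four-hour window; the same arithmetic flags loads that would overrun the window and impact other network users.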

Reports to:
Information Technology Lead

Responsibilities:
Designs and implements complex local and wide-area networked environments
Manages a large, complex network or site and ensures that network capacity is available
Provides appropriate security for data transmission over networked resources
Establishes and recommends policies on systems and services usage
Provides 3rd level support for applications executing on a company network
Develops and communicates networking policies and standards
Plans network architectures and infrastructures in support of data management processes and procedures
Collaborates with other departments (e.g., DBAs) to identify and resolve performance issues
Effectively communicates with other technology and project team members

Qualifications/Certifications
Experience in enterprise network management
Solid understanding of networking/distributed computing environment technologies
Detailed knowledge of network and server operating systems
Experience in performance analysis and tuning to increase throughput and reliability
Expertise in routing principles and client/server computing
Strong Business Analysis and problem solving skills
Network certification in relevant operating system
Planning and implementation of preventative maintenance strategies

Recommended Training
Network Administration in relevant operating system

Last updated: 01-Nov-10 16:31


PowerCenter Domain Administrator


The PowerCenter Domain Administrator is responsible for administering the Informatica Data Integration environment. This involves the management and administration of all components in the PowerCenter domain. The PowerCenter Domain Administrator works closely with the Technical Architect and other project personnel during the Architect, Build and Deploy phases to plan, configure, support and maintain the desired PowerCenter configuration. The PowerCenter Domain Administrator is responsible for the domain security configuration, licensing, and the physical installation and location of the services and nodes that compose the domain.

Reports to:
Technical Project Manager

Responsibilities:
Manages the PowerCenter Domain, Nodes, Service Manager and Application Services
Develops disaster recovery and failover strategies for the Data Integration environment
Responsible for High Availability and PowerCenter Grid configuration
Creates new services and nodes as needed
Ensures proper configuration of the PowerCenter Domain components
Ensures proper application of the licensing files to nodes and services
Manages user and user group access to the domain components
Manages backup and recovery of the domain metadata and appropriate shared file directories
Monitors domain services and troubleshoots any errors (a minimal reachability sketch follows this list)
Applies software updates as required
Tests and implements new technical solutions
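
A minimal reachability probe, assuming nothing Informatica-specific, can support the service-monitoring duty above. The host names and port numbers below are placeholders for whatever gateway and worker nodes the domain actually runs; a real deployment would monitor through the Administrator tool or its command-line utilities rather than a raw socket check.

```python
# Generic TCP reachability probe for domain nodes/services.
# Host names and ports below are placeholders, not real Informatica defaults.
import socket

SERVICES = {
    "node01_gateway": ("node01.example.com", 6005),
    "node02_worker":  ("node02.example.com", 6005),
}

def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "UP" if is_reachable(host, port) else "UNREACHABLE"
        print(f"{name:<16} {host}:{port} {status}")
```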

Qualifications/Certifications
Informatica Certified Administrator
Experience in supporting Data Warehouse environments
Experience in developing and supporting ETL real-time and batch processes
Solid understanding of relational database models and dimensional data models

Recommended Training
PowerCenter Administrator Level I and Level II

Last updated: 01-Nov-10 16:31


Presentation Layer Developer


The Presentation Layer Developer is responsible for the design, build, and deployment of the presentation layer component of the data integration solution. This component provides the user interface to the data warehouses, data marts and other products of the data integration effort. As the interface is highly visible to the enterprise, a person in this role must work closely with end users to gain a full understanding of their needs. The Presentation Layer Developer designs the application, ensuring that the end-user requirements gathered during the requirements definition phase are accurately met by the final build of the application. In most cases, the developer works with front-end Business Intelligence tools, such as Cognos, Business Objects and others. To be most effective, the Presentation Layer Developer should be familiar with metadata concepts and the Data Warehouse/Data Mart data model.

Reports to:
Technical Project Manager

Responsibilities:
Collaborates with end users and other stakeholders to define detailed requirements
Designs business intelligence solutions that meet user requirements for accessing and analyzing data
Works with front-end business intelligence tools to design the reporting environment
Works with the DBA and Data Architect to optimize reporting performance
Develops supporting documentation for the application
Participates in the full testing cycle

Qualifications/Certifications
Solid understanding of metadata concepts and the Data Warehouse/Data Mart model
Aptitude with front-end business intelligence tools (i.e., Cognos, Business Objects, Informatica Data Analyzer)
Excellent problem solving and trouble-shooting skills
Solid interpersonal skills and ability to work with business and system owners to obtain requirements and manage expectations
Capable of expressing technical concepts in business terms

Recommended Training
Informatica Data Analyzer
Data Warehouse Development

Last updated: 01-Nov-10 16:31


Production Supervisor
The Production Supervisor has operational oversight for the production environment and the daily execution of workflows, sessions and other data integration processes. Responsibilities include, but are not limited to, training and supervising system operators, reviewing execution statistics, and managing the schedule for upgrades to the system and application software as well as the release of data integration processes.

Reports to:
Information Technology Lead

Responsibilities:
Manages the daily execution of workflows and sessions in the production environment
Trains and supervises the work of system operators
Reviews and audits execution logs and statistics and escalates issues appropriately
Schedules the release of new sessions or workflows
Schedules upgrades to the system and application software
Ensures that work instructions are followed
Monitors data integration processes for performance
Monitors data integration components to ensure appropriate storage and capacity for daily volumes

Qualifications/Certifications
Production supervisory experience
Effective leadership skills
Strong problem solving skills
Excellent organizational and follow-up skills

Recommended Training
PowerCenter Level I Developer
PowerCenter Team Based Development
PowerCenter Advanced Workflow Techniques
PowerCenter Security and Migration

Last updated: 01-Nov-10 16:31


Project Sponsor
The Project Sponsor is typically a member of the business community rather than an IT/IS resource. This is important because the lack of business sponsorship is often a contributing cause of systems implementation failure. The Project Sponsor often initiates the effort, serves as project champion, guides the Project Managers in understanding business priorities, and reports status of the implementation to executive leadership. Once an implementation is complete, the Project Sponsor may also serve as "chief evangelist", bringing word of the successful implementation to other areas within the organization.

Reports to:
Executive Leadership

Responsibilities:
Provides the business sponsorship for the project
Champions the project within the business
Initiates the project effort
Guides the Project Managers in understanding business requirements and priorities
Assists in determining the data integration system project scope, time, budget and required resources
Reports status of the implementation to executive leadership

Qualifications/Certifications
Has industry vertical knowledge

Recommended Training
N/A

Last updated: 01-Nov-10 16:31


Quality Assurance Manager


The Quality Assurance (QA) Manager ensures that the original intent of the business case is achieved in the actual implementation of the analytic solution. This involves leading the efforts to validate the integrity of the data throughout the data integration processes, and ensuring that the ultimate data target has been accurately derived from the source data. The QA Manager can be a member of the IT organization, but serves as a liaison to the business community (i.e., the Business Analysts and End Users). In situations where issues arise with regard to the quality of the solution, the QA Manager works with project management and the development team to resolve them. Depending upon the test approach taken by the project team, the QA Manager may also serve as the Test Manager.
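
As a minimal illustration of the source-to-target validation described above, the sketch below compares a row count and an amount checksum between an assumed source table and its target fact table. The in-memory SQLite tables and the orders/fact_orders names are hypothetical stand-ins for the project's real source system and data mart.

```python
# Minimal source-to-target reconciliation: row counts and a column checksum.
# The in-memory tables and column names are illustrative stand-ins for the
# project's real source system and target data mart.
import sqlite3

def reconcile(src, tgt, src_sql, tgt_sql, label):
    """Compare one aggregate between source and target; report the outcome."""
    src_val = src.execute(src_sql).fetchone()[0]
    tgt_val = tgt.execute(tgt_sql).fetchone()[0]
    ok = src_val == tgt_val
    print(f"{label}: source={src_val} target={tgt_val} {'OK' if ok else 'MISMATCH'}")
    return ok

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    tgt = sqlite3.connect(":memory:")
    src.executescript(
        "CREATE TABLE orders(order_id INT, order_amount REAL);"
        "INSERT INTO orders VALUES (1, 100.50), (2, 250.00);"
    )
    tgt.executescript(
        "CREATE TABLE fact_orders(order_id INT, order_amount REAL);"
        "INSERT INTO fact_orders VALUES (1, 100.50), (2, 250.00);"
    )
    checks = [
        ("row count", "SELECT COUNT(*) FROM orders",
                      "SELECT COUNT(*) FROM fact_orders"),
        ("amount checksum", "SELECT ROUND(SUM(order_amount), 2) FROM orders",
                            "SELECT ROUND(SUM(order_amount), 2) FROM fact_orders"),
    ]
    results = [reconcile(src, tgt, s, t, label) for label, s, t in checks]
    print("All checks passed" if all(results) else "Discrepancies found")
```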

Reports to:
Technical Project Manager

Responsibilities:
Leads the effort to validate the integrity of the data through the data integration processes
Ensures that the data contained in the data integration solution has been accurately derived from the source data
Develops and maintains quality assurance plans and test requirements documentation
Verifies compliance with commitments contained in quality plans
Works with the project management and development teams to resolve issues
Participates in the enforcement of data quality standards
Communicates concerns, issues and problems with data
Participates in the testing and post-production verification
Together with the Technical Lead and the Repository Administrator, articulates the development standards
Advises on the development methods to ensure that quality is built in
Designs the QA and standards enforcement strategy
Together with the Test Manager, coordinates the QA and Test strategies
Manages the implementation of the QA strategy

Qualifications/Certifications
Industry vertical knowledge
Solid understanding of the Software Development Life Cycle
Experience in quality assurance performance, auditing processes, best practices and procedures
Experience with automated testing tools
Knowledge of Data Warehouse and Data Integration enterprise environments
Able to work effectively with both business and technical stakeholders

Recommended Training
PowerCenter Level I Developer
Informatica Data Explorer
Informatica Data Quality Workbench
Project Management

Last updated: 01-Nov-10 16:31


Repository Administrator
The Repository Administrator is responsible for administering a PowerCenter or Data Analyzer Repository. This requires maintaining the organization and security of the objects contained in the repository. It entails developing and maintaining the folder and schema structures; managing users, groups and roles; maintaining global/local repository relationships; and handling backup and recovery. During the development effort, the Repository Administrator is responsible for coordinating migrations, maintaining database connections, establishing and promoting naming conventions and development standards, and developing back-up and restore procedures for the repositories. The Repository Administrator works closely with the Technical Architect and other project personnel during the Architect, Build and Deploy phases to plan, configure, support and maintain the desired PowerCenter and Data Analyzer configuration.

Reports to:
Technical Project Manager

Responsibilities:
Develops and maintains the repository folder structure
Manages user and user group access to objects in the repository
Manages PowerCenter global/local repository relationships and security levels
Coordinates the deployment of objects during the development effort
Establishes and promotes naming conventions and development standards (a minimal naming check sketch follows this list)
Develops back-up and restore procedures for the repository
Works to resolve technical issues
Contributes to technical and system architectural planning
Tests and implements new technical solutions
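
The naming check referenced in the list above might look like the following sketch. The m_/wf_/s_m_ prefixes are common PowerCenter-style conventions used here purely as assumptions; an actual project would substitute the standards its Repository Administrator publishes.

```python
# Illustrative naming-standard check for repository objects.
# The prefix rules and object names are assumed examples.
import re

NAMING_RULES = {
    "mapping":  re.compile(r"^m_[a-z0-9_]+$"),
    "workflow": re.compile(r"^wf_[a-z0-9_]+$"),
    "session":  re.compile(r"^s_m_[a-z0-9_]+$"),
}

def check_names(objects):
    """Yield (object_type, name, ok) for each object against its naming rule."""
    for obj_type, name in objects:
        rule = NAMING_RULES.get(obj_type)
        yield obj_type, name, bool(rule and rule.match(name))

if __name__ == "__main__":
    sample_objects = [
        ("mapping", "m_load_customer_dim"),
        ("workflow", "LoadOrders"),            # violates the assumed wf_ prefix rule
        ("session", "s_m_load_customer_dim"),
    ]
    for obj_type, name, ok in check_names(sample_objects):
        print(f"{obj_type:<8} {name:<24} {'OK' if ok else 'NON-COMPLIANT'}")
```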

Qualifications/Certifications
Informatica Certified Administrator
Experience in supporting Data Warehouse environments
Experience in developing and supporting ETL real-time and batch processes
Solid understanding of relational database models and dimensional data models

Recommended Training
PowerCenter Administrator Level I and Level II
Data Analyzer Introduction

Last updated: 01-Nov-10 16:31


Security Manager
The Security Manager is responsible for defining and ensuring adherence to security policies and standards within the Enterprise Information Technology (IT) environment. He/she works to develop security policies and guidelines in conjunction with application and process owners. The Security Manager is responsible for promoting organizational security awareness and advising management about security issues and potential threats, and may also carry out risk analysis activities. It is important that the Security Manager stay current on the latest security problems, risks and resolutions; interacting with partner companies and security organizations to share ideas on security issues is highly recommended.

Reports to:
Information Technology Lead

Responsibilities:
Develops enterprise security standards and policies
Makes recommendations to achieve operational and project goals in conjunction with security requirements and best practices
Monitors and assesses security vulnerabilities across technical disciplines
Acts as a subject matter expert for the technical security arena
Provides advisory services to direct reports, security team members and the business
Serves as an escalation point for security related issues and works to resolve security risks and to minimize potential threats
Researches and stays current on new and evolving technologies and associated security risks
Develops security tools and utilities that may be leveraged by the organization
Reviews security specific change requests and provides risk assessments for changes
Builds and maintains relationships with peer security professionals

Qualifications/Certifications
Experience in the IT security sector
Ability to lead and manage technical staff
Capable of designing and implementing state-of-the-art security services
Experience providing advisory services in the area of technology and security architecture
Broad understanding of computer and communication systems and networks and their interrelationships
Strong task and project management skills with the ability to manage parallel work streams
Good understanding of privacy and regulatory laws, their implications and mitigating measures
CISSP or related certification

Recommended Training
Security Management Practices
Security Architecture
Operations Security
Cryptography
Network and Internet Security
Disaster Recovery

Last updated: 01-Nov-10 16:31


System Administrator
The System Administrator is responsible for managing the servers and operating systems used by the data integration solution. System Administrators are generally responsible for server-side tasks, but may take on desktop tasks as well. Typical tasks may include setting up operating system level user accounts, installing server-side software, allocating and maintaining system disks and memory, and adding and configuring new workstations.

Reports to:
Information Technology Lead

Responsibilities:
Builds, operates and maintains servers and server infrastructure ensuring adherence to Service Level Agreements
Troubleshoots problems with networks, web services, mail services and all general aspects of an ASP solution
Monitors system logs and activity on servers and devices
Plans new server deployments and documents server builds
May lead or guide the work of other staff engaged in similar functions
Coordinates with development teams to schedule releases of software updates
Provides 3rd Level support for IT requests and Help Desk tickets
Participates in the development of server usage policies and standards
Collaborates with other departments (i.e., Network Administrator) to identify and resolve performance issues
Evaluates and tests new hardware and software for client/server applications

Qualifications/Certifications
Experience in enterprise server management
Solid understanding of client/server computing environment technologies
Detailed knowledge of server operating systems
Experience in performance analysis and tuning to increase throughput and reliability
Experience in managing NAS and SAN solutions
Strong problem solving skills
Administrator certification in relevant operating systems
Planning and implementation of preventative maintenance strategies

Recommended Training
Server Administration/Certification in relevant operating systems (i.e., Microsoft, UNIX, Linux)
PowerCenter Level I Administrator

Last updated: 01-Nov-10 16:32


System Operator
The System Operator is primarily responsible for monitoring workflows and sessions and other components of the data integration solution. In the event of an execution failure, the System Operator must be able to read workflow/session logs and/or any other associated log files. In addition, he/she should follow pre-defined procedures for addressing the problem, including re-initiating the failed job(s) and/or notifying the data integration team.
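
The failure-handling procedure described above can be illustrated with a small log-triage sketch. The log file name, the ERROR marker, and the list of transient causes treated as restartable are assumptions for illustration, not actual PowerCenter log conventions; in practice the pre-defined operating procedures would dictate these rules.

```python
# Illustrative session-log triage: restart or escalate on failure.
# The log path, markers, and keyword lists are assumed examples only.
from pathlib import Path

RESTARTABLE_HINTS = ("deadlock", "timeout", "connection reset")   # assumed transient causes

def triage(log_path: Path) -> str:
    """Return 'restart', 'escalate', or 'ok' based on error lines in the log."""
    errors = [line for line in log_path.read_text(errors="replace").splitlines()
              if "ERROR" in line.upper()]
    if not errors:
        return "ok"
    if any(hint in err.lower() for err in errors for hint in RESTARTABLE_HINTS):
        return "restart"
    return "escalate"

if __name__ == "__main__":
    sample = Path("session_s_m_load_orders.log")                  # assumed log file name
    sample.write_text("INFO start\nERROR connection reset by peer\nINFO end\n")
    action = triage(sample)
    print(f"Recommended action: {action}")   # -> 'restart' for this sample log
```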

Reports to:
Production Supervisor

Responsibilities:
Monitors system logs and activity on servers and devices
Monitors the execution of batch jobs and scheduled backup jobs
Follows pre-defined processes and procedures for addressing and escalating issues
Operates server and peripheral devices to run production requests and create reports
Collaborates with other departments (i.e., Server Administrator) to identify and resolve issues
Implements restart/recovery strategies

Qualifications/Certifications
Experience in operations environment of assigned area
General knowledge of client/server computing environment technologies
Understanding of server operating systems
Basic problem solving skills
Administrator experience in relevant operating systems is desirable
Familiarity with the PowerCenter Workflow Monitor

Recommended Training
Server Administration in relevant operating systems (i.e., Microsoft, UNIX, Linux)
PowerCenter Level I Developer

Last updated: 01-Nov-10 16:32


Technical Architect
The Technical Architect is responsible for the conceptualization, design, and implementation of a sound technical architecture, which includes both hardware and software components. The Architect interacts with the Project Management and design teams early in the development effort in order to understand the scope of the business problem and its solution. The Technical Architect must always consider both current (stated) requirements and future (unstated) directions. Having this perspective helps to ensure that the architecture can expand to correspond with the growth of the data integration solution. This is particularly critical given the highly iterative nature of data integration solution development.

Reports to:
Technical Project Manager

Responsibilities:
Develops the architectural design for a highly scalable, large volume enterprise solution
Performs high-level architectural planning, proof-of-concept and software design
Defines and implements standards, shared components and approaches
Functions as the Design Authority in technical design reviews
Contributes to development project estimates, scheduling and development reviews
Approves code reviews and technical deliverables
Assures architectural integrity
Maintains compliance with change control, SDLC and development standards
Develops and reviews implementation plans and contingency plans

Qualifications/Certifications
Software development expertise (previous development experience of the application type)
Deep understanding of all technical components of the application solution
Understanding of industry standard data integration architectures
Ability to translate functional requirements into technical design specifications
Ability to develop technical design documents
Strong Business Analysis and problem solving skills
Familiarity with Enterprise Architecture frameworks (Zachman, TOGAF) or equivalent
Experience and/or training in appropriate platforms for the project
Familiarity with appropriate modeling techniques, such as UML and ER modeling

Recommended Training
Operating Systems
DBMS
PowerCenter Developer and Administrator - Level I
PowerCenter New Features
Basic and advanced XML

Last updated: 01-Nov-10 16:32


Technical Project Manager


The Technical Project Manager has overall responsibility for managing the technical resources within a project. As such, he/she works with the project sponsor, business project manager and development team to assign the appropriate resources for a project within the scope, schedule, and budget and to ensure that project deliverables are met.

Reports to:
Project Sponsor or Business Project Manager

Responsibilities:
Defines and implements the methodology adopted for the project
Liaises with the Project Sponsor and Business Project Manager
Manages project resources within the project scope, time-line and budget
Ensures all business requirements are accurate
Communicates project progress to Project Sponsor(s)
Is responsible for ensuring delivery on commitments and ensuring that the delivered solution fulfills the needs of the business
Performs requirements analysis, documentation, ad-hoc reporting and resource leadership

Qualifications/Certifications
Translates strategies into deliverables
Prioritizes and balances competing priorities
Must be able to work effectively with both business and technical stakeholders
Has knowledge of the tools and technologies used in the data integration solution
Holds certification in industry vertical knowledge (if applicable)

Recommended Training
Project Management Techniques
PowerCenter Developer Level I
PowerCenter Administrator Level I
Data Analyzer Introduction

Last updated: 01-Nov-10 16:32


Test Engineer
The Test Engineer is responsible for the completion of test plans and their execution. During test planning, the Test Engineer works with the Test Manager/Quality Assurance Manager to finalize the test plans and to ensure that the requirements are testable. The Test Engineer is also responsible for complete test execution, including designing and implementing test scripts, test suites of test cases, and test data. The Test Engineer should be able to demonstrate knowledge of testing techniques and to provide feedback to developers. He/she uses the procedures defined in the test strategy to execute tests, to report results and the progress of test execution, and to escalate testing issues as appropriate.

Reports to:
Test Manager (or Quality Assurance Manager)

Responsibilities:
Provides input to the test plan and executes it
Carries out requested procedures to ensure that Data Integration systems and services meet organization standards and business requirements
Develops and maintains test plans, test requirements documentation, test cases and test scripts
Verifies compliance with commitments contained in the test plans
Escalates issues and works to resolve them
Participates in testing and post-production verification efforts
Executes test scripts, then documents the results and provides them to the Test Manager
Provides feedback to developers
Investigates and resolves test failures

Qualifications/Certifications
Solid understanding of the Software Development Life Cycle
Experience with automated testing tools
Strong knowledge of Data Warehouse and Data Integration enterprise environments
Experience in a quality assurance and testing environment
Experience in developing and executing test cases and in setting up complex test environments
Industry vertical knowledge

Recommended Training
PowerCenter Developer Level I & II
Data Analyzer Introduction
SQL Basics
Data Quality Workbench

Last updated: 01-Nov-10 16:32


Test Manager
The Test Manager is responsible for coordinating all aspects of test planning and execution. During test planning, the Test Manager becomes familiar with the business requirements in order to develop sufficient test coverage for all planned functionality. He/she also develops a test schedule that fits into the overall project plan. Typically, the Test Manager works with a development counterpart during test execution; the development manager schedules and oversees the completion of fixes for bugs found during testing.

The Test Manager is also responsible for the creation of the test data set. An integrated test data set is a valuable project resource in its own right; apart from its obvious role in testing, the test data set is very useful to the developers of integration and presentation components. In general, separate functional and volume test data sets will be required. In most cases, these should be derived from the production environment. It may also be necessary to manufacture a data set that triggers all the business rules and transformations specified for the application.

Finally, the Test Manager must continually advocate adherence to the test plans. Projects at risk of late completion often sacrifice testing, at the expense of a high-quality end result.
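
The manufactured data set mentioned above, one that deliberately triggers every specified rule, can be sketched as a small generator. The credit-limit and missing-country rules and the column names below are hypothetical examples, not rules taken from this document.

```python
# Illustrative test-data generator: one row per hypothetical business rule,
# plus boundary values. The rules and columns are assumed examples.
import csv

# Each entry names an assumed rule and the row designed to trigger it.
RULE_ROWS = [
    ("credit_limit_at_boundary",   {"cust_id": 1, "credit_limit": 10000.00, "country": "US"}),
    ("credit_limit_over_boundary", {"cust_id": 2, "credit_limit": 10000.01, "country": "US"}),
    ("missing_country_defaulting", {"cust_id": 3, "credit_limit": 500.00,   "country": ""}),
]

def write_test_set(path: str) -> None:
    """Write one CSV row per rule so every transformation branch is exercised."""
    fieldnames = ["test_rule", "cust_id", "credit_limit", "country"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for rule, row in RULE_ROWS:
            writer.writerow({"test_rule": rule, **row})

if __name__ == "__main__":
    write_test_set("functional_test_set.csv")
    print("Wrote functional_test_set.csv with", len(RULE_ROWS), "rule-driven rows")
```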

Reports to:
Technical Project Manager (or Quality Assurance Manager)

Responsibilities:
Coordinates all aspects of test planning and execution
Carries out procedures to ensure that Data Integration systems and services meet organization standards and business requirements
Develops and maintains test plans, test requirements documentation, test cases and test scripts
Develops and maintains test data sets
Verifies compliance with commitments contained in the test plans
Works with the project management and development teams to resolve issues
Communicates concerns, issues and problems with data
Leads testing and post-production verification efforts
Executes test scripts, then documents and publishes the results
Investigates and resolves test failures

Qualifications/Certifications
Solid understanding of the Software Development Life Cycle
Experience with automated testing tools
Strong knowledge of Data Warehouse and Data Integration enterprise environments
Experience in a quality assurance and testing environment
Experience in developing and executing test cases and in setting up complex test environments
Experience in classifying, tracking and verifying bug fixes
Industry vertical knowledge
Able to work effectively with both business and technical stakeholders
Project management

Recommended Training
PowerCenter Developer Level I
Data Analyzer Introduction
Data Explorer


Last updated: 01-Nov-10 16:32


Training Coordinator
The Training Coordinator is responsible for the design, development, and delivery of all requisite training materials. The deployment of a data integration solution can only be successful if the End Users fully understand the purpose of the solution, the data and metadata available to them, and the types of analysis they can perform using the application. The Training Coordinator will work with the Project Management Team, the development team, and the End Users to ensure that he/she fully understands the training needs, and will develop the appropriate training material and delivery approach. The Training Coordinator will also schedule and manage the delivery of the actual training material to the End Users.

Reports to:
Business Project Manager

Responsibilities:
Designs, develops and delivers training materials
Schedules and manages logistical aspects of training for end users
Performs training needs analysis in conjunction with the Project Manager, development team and end users
Interviews subject matter experts
Ensures delivery on training commitments

Qualifications/Certifications
Experience in the training field
Ability to create training materials in multiple formats (i.e., written, computer-based, instructor-led, etc.)
Possesses excellent communication skills, both written and verbal
Results oriented team player
Must be able to work effectively with both business and technical stakeholders
Has knowledge of the tools and technologies used in the data integration solution

Recommended Training
Training Needs Analysis
Data Analyzer Introduction
Data Analyzer Report Creation

Last updated: 01-Nov-10 16:32


User Acceptance Test Lead


The User Acceptance Test Lead is responsible for leading the final testing and gaining final approval from the business users. The User Acceptance Test Lead interacts with the End Users and the design team during the development effort to ensure the inclusion of all the user requirements within the original defined scope. He/she then validates that the deployed solution meets the final user requirements.

Reports to:
Business Project Manager

Responsibilities:
Gathers and clarifies business requirements
Interacts with the design team and end users during the development efforts to ensure inclusion of user requirements within the defined scope
Reviews technical design proposals
Schedules and leads the user acceptance test effort
Provides test script/case training to the user acceptance test team
Reports on test activities and results
Validates that the deployed solution meets the final user requirements

Qualifications/Certifications
Experience planning and executing user acceptance testing
Strong understanding of the business' processes
Knowledge of the project solution
Excellent communication skills

Recommended Training
N/A

Last updated: 01-Nov-10 16:32

