PowerCenter Developer I Lab Guide
Lab 2-1: Creating Source Definitions
Lab 2-2: Creating Target Definitions
Lab 2-3: Creating Mappings
Lab 2-1: Creating Source Definitions

You will create some Source definitions for use in later work.
Goals:
Use wizards to import Source definitions from a flat file and a relational database table
Preview the data in the Sources
Duration:
10 minutes
Instructions
Note: Throughout this and later exercises, xx will refer to the student number assigned to
you by your Instructor or the machine you are working on. For example, if you are
Student05, then ~Developerxx refers to the folder ~Developer05.
Step 1.
1)
2)
Figure 1:
3) Click the Designer icon to start it.
a) For Username, enter Devxx (xx is the number assigned by your instructor).
b)
c)
Click Connect.
Figure 2:
4)
Figure 3:
Step 2.
1)
2)
3)
In the Open Flat File dialogue, select customer_central.dat and click OK.
a)
4)
Note: This is because you are importing a comma-delimited file. You will select the field
delimiter on the next screen. Note that PowerCenter can also import files with fixed
field widths.
b)
Note: When a file has column names, as this file does, PowerCenter can import those as field
names.
c) Click Next.
Figure 4: Flat File Import Wizard - Step 1 of 3
5)
Note: While a number of standard delimiters are listed, you can define any character or set
of characters as the delimiter using the "Other" checkbox.
b)
Figure 5:
Note: This step sets up the fields in general. You will have the opportunity to adjust
individual fields in Step 3.
Note: "Use default text length" - check this to set a standard length for all fields with a text
data type. Leave it unchecked and PowerCenter derives the text field length from the
actual length of the data in the file.
Note: "Escape Character" is the character used in your file format if the delimiter character
may appear in a field. Consult documentation to learn more.
6)
Use the scrollbar to move to the right and select the City field.
b)
c)
Note: You will adjust this field because you know in the future you will import addresses
with Canadian postal codes, which contain alphanumeric characters.
e)
f)
g)
h)
Note: You will adjust this field because you know that in the target database the data will be stored as a Date. You could perform the conversion using PowerCenter's TO_DATE function, but it is simpler to use the implicit conversion functionality of the Source definition.
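For reference, the explicit form would be a TO_DATE expression along these lines (a sketch; the port name and format string are assumptions based on this lab's data):
TO_DATE(DATE_FLD, 'MM/DD/YYYY')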
i)
Figure 6: Flat File Import Wizard - Step 3 of 3
7) Click Finish.
Figure 7:
8)
Note: You may use this same Source definition to import data from multiple flat files. You
will change the name to refer generically to customers rather than the specific customer
data file.
a)
Double-click the green header bar at the top of the Source definition.
(i)
b)
c)
d)
e)
9)
To verify that the Source imported correctly, you will now preview the contents of
the flat file.
a)
Right-click the header bar of the Source definition and select Preview data.
b)
Figure 8:
(i)
(ii) In the Open Flat File dialogue, select customer_central.dat and click Open.
(iii) Click Open.
(iv) The Preview Data dialogue will display data from the flat file.
Figure 9:
Step 3.
1)
2)
3)
For ODBC data source, select OLTP (DataDirect 5.2 Oracle Wire Protocol).
b)
c)
d)
Figure 10:
4)
e)
In the Select tables pane, click the plus sign (+) beside SDBU to expand it.
f)
g)
Select DEALERSHIP.
h)
Click OK.
i)
b)
Select ODBC data source OLTP (DataDirect 5.2 Oracle Wire Protocol).
(ii) For Username, Owner name, and Password, enter SDBU. (Owner name
should populate automatically.)
Figure 11:
Figure 12:
Step 4.
1)
Note: Always save your work before closing the application or moving on to another task.
There is no automatic save in PowerCenter.
Note: You can also save by selecting Repository > Save from the menu.
Lab 2-2: Creating Target Definitions
Duration:
10 minutes
Instructions
Step 1. Define a Target
1) Determine what columns will be required.
a)
Figure 13:
2)
b)
c)
3)
b)
Figure 16:
c)
d)
Figure 17:
e)
f)
Figure 18:
Step 2.
1)
Figure 19:
2)
Double-click the header of the Customers Target definition to open the Edit Tables
dialogue and select the Columns tab.
a)
Figure 20: Columns Tab
Note that the Datatypes are "number" and "string," as is standard for flat file definitions.
3)
4)
Click the Rename button and change the target name to STG_CUSTOMERS.
5)
6)
Click Apply.
a)
Figure 21:
7)
Figure 22: Column Datatypes Changed
8) Note that the "number" and "string" datatypes have changed to the "number(p,s)" and "varchar" types appropriate to Oracle.
9)
Click OK.
Lab 2-3: Creating Mappings
Duration:
30 minutes
Instructions
Step 1. Create Shortcuts
Note: Best practices call for developers to build mappings from shortcuts to a common
folder, rather than defining Sources and Targets in the developers' own folders. This has
several advantages, of which the most significant is that it greatly eases migration of
mappings between PowerCenter environments (e.g., from Development to Test to
Production). Developers create sources and targets, and the Administrator copies them
to the Shortcut folder, where they can be used by all developers, and in migration.
In this lab, you will use shortcuts based on the Sources and Targets you created in Labs 2-1 and 2-2. The administrator has already copied these Sources and Targets. You will learn how to create shortcuts to objects in the shortcut folder.
Note: Best practices also call for data to be loaded directly from Sources into staging tables as
part of the ETL process. From these tables, data can be accessed for transformation and
loading without putting a further burden on the Source systems.
1)
Note: Do not double-click the name of the folder. This will connect you to the folder, and
you need to remain connected to your own ~Developerxx folder to create the shortcuts.
e)
Click once on the plus sign to the left of the subfolder named Sources.
f) Click once more on the plus sign to the left of the FlatFile subfolder. The Repository Navigator should now look like this:
Figure 23: Repository Navigator - Shortcut Folder
g)
Click the Customers flatfile Source definition and drag it into the Source
Analyzer workspace.
h)
Figure 24: Designer dialogue
Note: If the dialogue asks you whether to copy the source table, say No and try again. You
want to make a shortcut, not a copy.
i)
j)
Click Rename.
k)
l)
Note: The SC_ prefix is Velocity best practice for all shortcuts to objects in other folders.
m) In the Repository Navigator window, expand your ~Developerxx folder, then
the Sources sub-folder, then the FlatFile sub-subfolder.
n)
2)
3)
4)
5)
6)
Step 2.
1)
2)
In this step, you will place all required components into the Mapping Designer
workspace.
a)
In your ~Developerxx folder, expand the Sources subfolder, then the OLTP
sub-subfolder.
Figure 25:
b)
You will be prompted to name the new Mapping. Give it the name
m2_STG_DEALERSHIP_xx. (Do not type "xx" - use your student id
number!)
Note: Velocity best practice is for all Mappings to begin with the identifying prefix "m_".
(ii) Note that both a Source and a Source Qualifier transformation appear in
the Mapping Designer. If they did not, contact your instructor for help.
Figure 26:
c)
d)
3)
In this step, you will link the Source Qualifier to the Target.
Hint: This procedure may be easier if you rearrange the column widths in the Source Qualifier and Target so that you can see the full name of each port.
a)
b)
c)
Hover over DEALERSHIP_ID on the target and release the mouse button.
d)
A blue arrow, representing a link between the ports of the Source Qualifier and
the Target definition, appears.
e)
Repeat this process to link all ports of the Source Qualifier to the similarly-named ports in the Target.
f)
Figure 27:
4)
Save your work. In the Output Window, verify that the Mapping is valid. (If it is
not, and you cannot spot the error, ask your instructor for help.)
Step 3.
1)
2)
Drag the Source definition SC_Customers (the flat file definition, not the OLTP
definition) and the Target definition SC_STG_CUSTOMERS into the Mapping
Designer workspace.
a)
b)
The DATE port in the Source Qualifier does not have a same-named port
in the Target definition. Link it to the DATE_FLD port in the Target
definition.
Figure 28:
3)
Step 4.
In the remaining labs of this class, you will use shortcuts to the objects in
SC_DATA_STRUCTURES. It will be convenient to create those shortcuts now so they
will be available in later labs.
1)
2)
3)
4)
Note: The best practice is to change the names of each of these shortcuts to read "SC_" rather than "Shortcut_to_". However, doing so is time-consuming and dull.
The lab instructions from here on refer to these shortcuts using the "SC_" prefix, to reflect the best practice; if you do not change the names, simply substitute "Shortcut_to_".
Lab 3-1
You need to load the Customer and Dealership data into the Staging tables.
Goals:
Create and run Workflows that execute the Mappings you created in Lab 2-3
Duration:
45 minutes
Instructions
Step 1.
1)
Figure 29: Tools Toolbar
2)
Step 2. Create a Workflow
1)
2)
From the Workflow Manager menu, select Tools > Workflow Designer (NOT Workflow Manager).
3)
4)
Note: Velocity best practice is to prefix the name of a Workflow with "wkf_" and give it a
name describing what it does. In this case, we are loading dealership data into a staging
table.
b)
c)
Click OK.
Note that the Workflow is created with a Start task already present.
5)
a)
Click in the Workflow Designer workspace somewhere to the right of the Start
task.
b)
Figure 30:
c)
Figure 31:
Select Mapping
Note: The Velocity standard name for a Session task is s_ followed by the name of the
Mapping. The Workflow Designer automatically assigns this name to a Session task
when you add it to the Workflow.
6)
b)
c)
d)
e)
Figure 32: Tasks Linked
7)
Note: The Session task properties determine what files or database tables the Mapping reads
from and writes to. The Source and Target definitions in the Mapping define the fields
to be read or written, but do not directly determine where to read from or write to.
a)
b)
c)
On the right:
(i)
(ii) In the "Connections" section, click the dropdown arrow and select the
connection SDBU.
Figure 33: Source Properties
d)
e)
Step 3.
1)
In the "Tools" toolbar, click the "M" icon to start the Workflow Monitor
application.
2)
3)
4)
5)
Return to the Workflow Monitor. The status of your Workflow and Session are
now both "Running."
Figure 34:
6)
Step 4.
1)
2)
Figure 35:
3)
4)
5)
For ODBC Data Source, select STG (DataDirect 5.2 Oracle Wire Protocol).
b)
c)
d)
Click Connect.
The lower part of the Preview Data dialogue populates with the data you loaded into the STG_DEALERSHIP table.
Figure 36: Preview STG_DEALERSHIP data
Step 5.
1)
b)
c)
2)
b)
Click OK.
3)
4)
Figure 37:
Figure 38:
Lab 4-1
Move data from the Customer staging table to the ODS database
o Use an Expression transformation to reformat data
o Use a Filter transformation to pass only valid records
Duration:
60 minutes
Instructions
Step 1.
1)
2)
3)
4)
Figure 39: Mapping Created
Step 2.
In this step you will add a Filter transformation to the mapping to pass only records with
valid Customer IDs.
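As a sketch, a filter condition along these lines would keep only rows with a usable Customer ID (the exact condition for this lab is shown in the figures; CUSTOMER_NO > 0 is an assumption):
NOT ISNULL(CUSTOMER_NO) AND CUSTOMER_NO > 0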
1)
2)
Drag the following ports from the Source Qualifier to the Filter transformation:
DEALERSHIP_ID
CUSTOMER_NO
FIRSTNAME
LASTNAME
ZIP
GENDER
INCOME
AGE
DATE_FLD
Figure 40: Ports Connected
3)
4)
5)
b)
c)
Figure 41: Filter Expression
Click OK.
Figure 42: Mapping with Filter Transformation Added
Step 3.
In this step you will add an Expression transformation that will format Customer data
correctly for the ODS database.
1)
2)
3)
Drag the following ports from the Filter transformation to the Expression
transformation:
FIRSTNAME
LASTNAME
GENDER
INCOME
AGE
Edit the Expression transformation
a)
b)
c)
Set all the ports you dragged from the Filter to be input-only by unchecking the
"O" column.
d) Create new output ports named NAME, GENDER_CATEGORY, SENIOR_FLAG, and HIGH_INCOME_FLAG, with Length/Precision 40, 7, 10, and 10 respectively.
e)
Figure 43:
f) Use the Expression Editor to create an expression for the NAME port to concatenate the FIRSTNAME and LASTNAME fields, with a space in between:
FIRSTNAME || ' ' || LASTNAME
Note: More advanced data integration developers may recognize that the above expression
leaves something to be desired when dealing with less-than-ideal data, as would be
typical in these fields. Informatica has extensive data quality capabilities to recognize,
cleanse, and supplement name data. These capabilities are in the Data Quality product,
which is outside the scope of this class.
g) Create an expression for the GENDER_CATEGORY port that uses DECODE to convert the GENDER codes to descriptive values.
The DECODE function uses a mapping to replace the values in a field with other values. In
this case, if the field has a value of "M", then it is changed to "MALE." If the field has a
value of "F", it is changed to "FEMALE." Any other value will be replaced with "UNK"
(for "UNKNOWN"). DECODE is useful when there are a relatively small number of
enumerated values in a field. If there are a larger number of values to be remapped, a
Lookup transformation would be used. (We will cover Lookup transformations later in
this course.)
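For example, a DECODE matching the description above (a sketch; the incoming port is assumed to be GENDER):
DECODE(GENDER, 'M', 'MALE', 'F', 'FEMALE', 'UNK')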
h)
Create an expression for the SENIOR_FLAG port that sets the port value to 1
(Boolean TRUE) if the AGE is greater than 55:
IIF(AGE > 55, 1)
IIF - "Immediate If" - is a powerful function. When the condition (AGE > 55) evaluates to TRUE, the first value argument is assigned to the port. When the condition does not evaluate to TRUE, the second value argument is assigned. In this case, no second value is given, so the port is set to zero (0) when the condition evaluates to FALSE.
IIF expressions can be nested to handle multibranch logic.
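For example, a nested IIF (a sketch, not part of this lab):
IIF(AGE > 55, 'SENIOR', IIF(AGE > 30, 'MIDDLE', 'YOUNG'))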
i)
Create an expression for the HIGH_INCOME_FLAG port that sets the port to
1 (Boolean TRUE) if the INCOME is greater than 50000:
IIF(INCOME > 50000, 1)
j)
Figure 44:
4)
Click OK.
Step 4.
1)
From Transformation Name     From Port Name       To Transformation Name     To Port Name
fil_Valid_Customer_Number    DEALERSHIP_ID        SC_ODS_CUSTOMER            DEALERSHIP_ID
fil_Valid_Customer_Number    CUSTOMER_NO          SC_ODS_CUSTOMER            CUSTOMER_NO
fil_Valid_Customer_Number    ZIP                  SC_ODS_CUSTOMER            POSTAL_CODE
fil_Valid_Customer_Number    DATE_FLD             SC_ODS_CUSTOMER            CONTACT_DATE
exp_Format_Customers         NAME                 SC_ODS_CUSTOMER            NAME
exp_Format_Customers         GENDER_CATEGORY      SC_ODS_CUSTOMER            GENDER_CATEGORY
exp_Format_Customers         SENIOR_FLAG          SC_ODS_CUSTOMER            SENIOR_FLAG
exp_Format_Customers         HIGH_INCOME_FLAG     SC_ODS_CUSTOMER            HIGH_INCOME_FLAG
Figure 45: Target, Connected
2) Verify that the Mapping is valid and fix any problems that keep it from validating.
3)
Figure 46: Completed Mapping
Step 5.
1)
2)
3)
4)
b)
c)
d)
e)
5)
6)
7)
Figure 47:
Figure 48:
Note: The number of rows you see may differ from what is shown in the figure, depending
on whether you performed the extra credit exercise at the end of Lab 4.
a)
Why does the number of rows in the source not match those in the target?
Answers
5.7.a. Why does the number of rows in the source not match those in the target?
Some rows were removed by the Filter transformation, so those rows did not reach the target.
Lab 4-2
Instructions
WARNING: In this lab, do not save your work. While it is normally best practice to save
your work frequently while working in PowerCenter, in this case you will be making
changes to a Mapping that is already the way you want it. So don't save your work!
Step 1.
In a complex Mapping, it can be hard to see how the parts relate. How can you make this
better?
1)
Begin with the Mapping from Lab 4-1 (m4_ODS_Customers_xx) open in the PowerCenter Designer application.
2)
3)
Arrange All Iconic enables you to quickly see the relationships between the objects in a
Mapping.
Step 2. Autolink
1) Perform an "Arrange All" on the Mapping.
2)
Drag the cursor across the links between the Source definition and the Source
Qualifier to select them.
3)
4)
5)
Position the cursor over the Source, then click and drag to the Source Qualifier.
6)
Autolinking provides a quick way to connect the output ports in one transformation to the
input ports in another transformation.
Autolink by Name searches for ports with identical names and connects them.
Autolink by Position connects the first output port to the first input port, the second output port to the second input port, and so on.
7)
Step 3.
Suppose another developer has created a large, complex Mapping that is not working quite
right: some data is winding up in the wrong fields. And you have been asked to debug it.
How can you figure out where the data is coming from? Answer: By tracing the link
paths.
a)
b)
Expand the Filter transformation so you can see the related field there.
(i)
c)
Note that the links leading both into and out of it are red.
You can, by expanding the appropriate transformations, trace the lineage of the
Postal Code field all the way back to the ZIP field in the Source definition.
Selecting the link path enables you to easily trace the lineage of any field forward and
backward through a Mapping.
Step 4.
You have to change the datatype of a field in the Source. Do you really have to manually
adjust every port along its link path? No.
1)
2)
Change the name of the CUSTOMER_NO port to CUST_NO and its precision
from 5 to 10.
3)
Click OK.
4)
5)
b)
Click Preview.
c)
d)
e)
f)
Step 5. Moving Ports
Sometimes just rearranging the ports on a transformation will make the Mapping easier to
read.
1)
2)
3)
Single-click and hold the number next to the ZIP field. Note the square that appears
in the cursor.
4)
5)
Step 6.
2)
3)
4)
Click Create.
a)
5)
6)
Click Done.
7) The Filter you just created is already selected. Hold down the Shift key and click the Aggregator you created to select it, too.
8)
a) Note that the Designer dialogue tells you which transformations will be deleted.
b) Click Yes.
Step 7. Reverting to Saved
Sometimes you make a mistake that you can't easily undo and need to go back to where you were before. If you haven't saved, you can revert to the last saved version.
1)
2)
When asked whether to save the changes to your folder, click No.
3)
4)
5)
6)
Step 8. Scaling
You may not be able to see the whole Mapping in your workspace at full size, but you can scale the view to fit.
1)
2)
3)
In the Standard toolbar at the top of the window, click the Zoom dropbox and select 60.
4)
5)
6)
Step 9.
When editing several transformations, you don't have to close the Edit Transformations dialogue and reopen it repeatedly.
1)
2)
3)
4)
What happens?
Step 10.
You may find that you want to duplicate a set of transformations within a Mapping or a
Mapplet, preserving the dataflow between them. This technique may prove useful if you
know that you will need to use the logic contained in the transformations in other
Mappings or Mapplets.
1)
2)
Use your left mouse button to draw a rectangle that encloses the Filter and
Expression transformations. This will select these objects.
3)
4)
Note that both transformations have been copied onto the mapping, including the
dataflow between them. They have been renamed with a "1" on the end of their
names.
5)
6)
7)
8)
Disconnect from your folder but do not save the changes (revert to the previously
saved version).
Step 11.
By viewing object dependencies in the Designer, a user can learn which objects may be
affected by making changes to Source or Target definitions, Mappings, Mapplets, or
transformations. Direct and indirect dependencies are shown.
1)
2)
3)
4)
You will see the View Dependencies window, which will show every Mapping,
Session, and Workflow that uses or depends upon the SC_Customers Source, as
well as those that it uses or depends on.
5)
Note: The Save to File button on the View Dependencies window saves the dependency information as an HTML file (.htm) for later viewing.
6)
Answers
4.5.e. Was there a change made in the Filter? What was it?
Yes, the name and precision of the Customer Number port changed to match the changes in the Source
Qualifier.
4.5.f. Was there a change made in the Target definition? Why or why not?
No, the Source and Target definitions cannot be changed or edited in the Mapping Designer workspace.
They can only in the Source Analyzer and Target Designer workspaces.
9.4. What happens?
You now see and can work with the ports of the Filter transformation.
Lab 5-1
Instructions
Step 1.
1)
Note: PowerCenter has many options that customize the appearance and functionality of the
client applications. In this case, we want to turn off automatic creation of Source
Qualifiers so we can use a single SQ to create a homogeneous join of two Source
definitions.
2)
3)
b)
c)
d)
Figure 49:
e)
Click OK.
4)
5)
In the following steps, you will create a Source Qualifier to join the tables using the
common field DEALERSHIP_ID.
Tip: Note that the fields are of the same data type - if they were not, you could not join the
tables with a single Source Qualifier.
Performance Note: Extensive discussion can ensue when deciding whether it is better to
have the tables joined in the database or by PowerCenter. In general, when the tables
have primary keys and indexes, it is better to join them in the database.
When you are joining more than three tables, database optimizers may or may not devise a
plan that leverages keys and indexes to avoid unnecessary full table scans. If a database
SQL plan analysis indicates that the database is engaging in multiple full table scans,
consider using PowerCenter to join at least some of the relational tables together.
a)
b)
c)
In the "Select Sources for Source Qualifier Transformation" dialogue, make sure
that both SC_STG_DEALERSHIP and SC_STG_EMPLOYEES are selected,
then click OK.
d)
A single Source Qualifier will be created, with all fields from both sources
feeding into it.
Figure 50:
e)
f)
Rename it SQ_EMPLOYEE_DEALERSHIP.
g)
h)
In the User Defined Join field, click the bent arrow to open the SQL Editor and
edit the property.
Tip: Do not use the "Sql Query" field for the Join condition. This will cause the workflow to
fail.
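Note: As a sketch, the join condition entered in the User Defined Join field would look like this (the physical table names are assumed from the shortcut names SC_STG_DEALERSHIP and SC_STG_EMPLOYEES):
STG_DEALERSHIP.DEALERSHIP_ID = STG_EMPLOYEES.DEALERSHIP_ID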
i)
j)
k)
l)
Figure 51:
m) Click OK.
n)
o)
Hint: The mapping will not validate, as it does not yet have a target object. This is OK.
Step 2.
1)
2)
b)
Figure 52:
Note: Making a transformation reusable is not reversible. Once done, it cannot be revoked.
Note: Also note that the best practice says that reusable objects should be created in the
project shortcut folder (in this class, SC_DATA_STRUCTURES). If you, as a
developer, promote an object to be reusable, you should notify your tech lead so s/he can
move it to the appropriate shortcut folder. This enables the object to be properly
migrated to the Test and Production environments. Once the tech lead has done this,
you must modify your mapping to use the shortcut rather than the object from your
local folder.
c)
d)
e)
3)
4)
Figure 53: Transformation Developer
5)
b)
c)
d)
Note: Velocity best practice is to prefix re_ to the name of any reusable transformation.
6)
Step 3.
e)
f)
Click OK.
1)
2)
3)
Connect the ports FIRST_NAME and LAST_NAME from the Source Qualifier to
the ports FIRSTNAME and LASTNAME in the Expression transformation.
4)
Connect the port GENDER in the Source Qualifier to the port GENDER in the
Expression transformation.
5)
Connect the port AGE in the Source Qualifier to the port AGE in the Expression
transformation.
6)
7) Link the following ports to the Target SC_ODS_PERSONNEL:

Transformation            Port                     Port in SC_ODS_PERSONNEL
re_exp_Format_Persons     NAME                     NAME
re_exp_Format_Persons     GENDER_CATEGORY          GENDER_CATEGORY
re_exp_Format_Persons     SENIOR_FLAG              SENIOR_FLAG
re_exp_Format_Persons     EMPLOYEE_ID              EMPLOYEE_ID
re_exp_Format_Persons     DEALERSHIP_ID            DEALERSHIP_ID
re_exp_Format_Persons     ZIP_CODE                 POSTAL_CODE
re_exp_Format_Persons     HIRE_DATE                HIRE_DATE
re_exp_Format_Persons     POSITION_TYPE            POSITION_TYPE
SQ_Employee_Dealership    DEALERSHIP_MANAGER_ID    DEALERSHIP_MANAGER_ID
SQ_Employee_Dealership    DEALERSHIP_DESC          DEALERSHIP_DESCRIPTION
SQ_Employee_Dealership    DEALERSHIP_LOCATION      DEALERSHIP_LOCATION

8) Save your work. Verify that the Mapping is valid.
9)
Figure 54:
Step 4.
1)
2)
Drag the flatfile Source definition SC_Inventory and the relational Source
definition SC_STG_PRODUCT into the Mapping Designer workspace.
Since you will be joining a flatfile source to a relational source, you cannot use a
homogeneous join here. Therefore you will use a Joiner transformation.
b)
c)
d)
Figure 55:
Performance Note: The PowerCenter Joiner transformation is fast and uses RAM rather
than disk memory wherever possible. Optimizing the use of RAM can be important,
particularly when RAM space is limited. Therefore, the Master side of the Joiner should
be the one with the fewest duplicate keys and the fewest rows (provided this fits the logic
of the join). Also, joining sorted data allows more efficient use of RAM.
In this lab, we will adhere to best practice by using STG_PRODUCT as the Master side of
the Joiner. STG_PRODUCT has a much smaller number of rows than Inventory, and
no duplicate keys.
3)
4) Drag the following ports from SQ_SC_Inventory to the Joiner:
INVENTORY_ID
PRODUCT_ID
DEALERSHIP_ID
RECEIVED_DATE
QTY_ON_HAND
INVOICE_PRICE
TIME_KEY
MSRP
5) Drag the following ports from SQ_SC_STG_PRODUCT to the Joiner:
PRODUCT_ID
GROUP_ID
PRODUCT_DESC
GROUP_DESC
DIVISION_DESC
6) Double-click the Joiner transformation to edit it.
a)
Rename it jnr_Inventory_FF_STG_PRODUCT.
Note: the FF is for "Flat File." As a general rule, naming conventions should be as clear as
possible.
b)
c)
d)
e)
f)
g)
h)
i)
Figure 56:
j) Click OK.
Step 5.
In this step you will create a Filter transformation to remove products with no inventory.
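As a sketch, the filter condition might simply be (an assumption based on the QTY_ON_HAND port; the exact condition for this lab is shown in the steps below):
QTY_ON_HAND > 0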
1)
2)
3)
4)
5)
Use Autolink by Name to connect ports from the Filter transformation to the
Target.
6)
The data flow for products should now look like this:
Figure 57:
7) Arrange All Iconic. The entire Mapping should look like this:
Figure 58:
8)
Step 6.
1)
2)
Add a Session task using the mapping you just completed and link it to the Start
task.
3)
b)
c)
For SQ_SC_Inventory:
(i)
d)
Click OK.
4)
5)
6)
The Task Details and Source/Target Statistics for the completed Workflow should
look like this:
Figure 59:
Step 7.
1)
Figure 60:
2)
Figure 61:
Lab 6-1
Duration:
40 minutes
Instructions
Step 8.
1)
2)
Figure 62:
b)
Figure 63: Import STG_DATES
3)
Drag the port DATE_ID from the Source Qualifier and drop it on the Lookup
transformation to create a link.
4)
b)
c)
d)
e)
Figure 64:
f) In the Import Tables dialogue, log into the STGxx schema and select the STG_DATES table.
5)
Figure 65:
6)
7)
Step 9.
1)
2)
Drag the following ports from the Source Qualifier to the Expression
transformation:
REVENUE
COST
DELIVERY_CHARGES
SALES_QTY
DISCOUNT
HOLDBACK
REBATE
a)
Figure 66:
3)
Create a new port called v_MARGIN, with Datatype decimal 10.2 and port
type Variable.
Figure 68:
b)
Create another new port named GROSS_PROFIT with Datatype decimal 10.2
and set it to Output only.
(i)
c)
Create another new port named NET_PROFIT with Datatype decimal 10.2
and set it to Output only.
(i) Using the Expression Editor, set its formula to:
v_MARGIN - (DELIVERY_CHARGES + DISCOUNT + HOLDBACK + REBATE)
d)
Figure 69:
e) Click OK.
f)
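Note: As a sketch of how the remaining two ports could be defined (these two formulas are assumptions, not taken from the lab; only the NET_PROFIT formula above is given):
v_MARGIN: REVENUE - COST
GROSS_PROFIT: v_MARGIN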
Step 10.
1)
2)
3)
Figure 70:
Step 11. M6_Load_ODS_SALES
1)
2)
3)
4)
b)
Click OK.
6)
7)
Figure 71:
8)
Figure 72:
Lab 7-1
Use a Lookup transformation to add week and month information to the data rows
Split the data stream to feed two Aggregator transformations and write data out to two
separate tables
Use Sorter transformations to improve efficiency of the mapping
Duration:
75 minutes
Instructions
Step 1.
1)
2)
3)
Note: If you need help with this step, consult the instructions for Lab 6-1.
a)
b)
c)
Step 2. Create and Configure an Aggregator to Summarize Data by Month
1)
2)
From lkp_STG_DATES_WEEK_MONTH:
MONTH_DESC
b)
c)
Note: PowerCenter will automatically rename these ports with a numerical 1 at the end of
each port name to avoid having duplicate port names.
3)
Rename it agg_SUM_BY_MONTH.
b)
c)
d)
(ii) Change the port type from input only to output only.
(iii) Edit each of the expressions for the _MONTH_SUM ports to calculate a
sum. For example, the expression for REVENUE_MONTH_SUM should
be SUM(REVENUE).
e)
Figure 73:
4)
5)
Link the MONTH_DESC port and the MONTH_SUM ports to the appropriate
ports in the Target SC_ODS_SALES_BY_MONTH.
6)
7)
Figure 74:
Step 3. Create and Configure an Aggregator to Summarize Data by Week
1)
2)
3)
b)
b)
Rename it agg_SUM_BY_WEEK.
b)
c)
d)
4)
5)
Connect the WEEK_DESC and WEEK_SUM ports of the new Aggregator to their
counterparts in the target SC_ODS_SALES_BY_WEEK.
6)
7)
The Mapping should look like this when you Arrange All Iconic:
Figure 75:
Step 4. M7_Sales_Summaries
1)
2)
The Session should both read from and write to Relational connection ODSxx.
3)
4)
Be sure to set the Load type to Normal and the Truncate table option on (checked).
5)
Figure 76:
a)
Figure 77:
Figure 78:
Note: The basic functionality of the Mapping is complete. However, in the Production
environment, where there will be millions of records, the Aggregator transformations
may run very slowly.
By default, Aggregator transformations work by creating a "bucket" for each unique value in
the Group By port(s). If the number of unique values is large, a great deal of memory
may be dedicated to maintaining these "buckets," or the system may have to cache
buckets to disk. In either case this can have a performance impact.
To prevent this, you can sort the data prior to its reaching the Aggregator. If the data is
sorted on the Group By port, and the Aggregator transformation is "told" that this is the
case, then there is no need to maintain many "buckets," and performance is improved.
Step 5.
1)
2)
3)
From lkp_STG_DATES_WEEK_MONTH:
MONTH_DESC
b)
From SQ_SC_ODS_SALES:
REVENUE
COST
DELIVERY_CHARGE
SALES_QTY
DISCOUNT
HOLDBACK
REBATE
GROSS_PROFIT
NET_PROFIT
Rename it srt_MONTH_DESC.
b)
Figure 79:
d)
4)
5)
6)
Edit agg_SUM_BY_MONTH.
a)
b)
c)
7)
8)
b)
Alternatively, you can copy and edit the Sorter you have already created.
The completed Mapping, when you Arrange All Iconic, should look like this:
Figure 80:
9)
Step 6.
1)
Click OK.
Note: "Refresh Mapping" re-reads the mapping information for the Session. If substantial
changes have been made to the mapping that might cause it to become invalid, the
Workflow Manager marks it invalid just in case.
2)
3)
b)
But compare the run time for the Session to the first run.
Lab 8-1
Duration:
35 minutes
Instructions
Step 1.
1)
In the Designer application, make sure you are connected and open to your assigned
Devxx folder.
2)
3)
a)
b)
b)
c)
4)
Tip: Note that the Mapping validates properly. The validation process ensures that the
Mapping is technically valid, but it cannot test for errors in business logic.
5)
6)
Inspect the Mapping to get an overall idea of what kind of processing is being
done.
b)
You have been told only that there is an "error" in the data being written to the
target, without any further clarification as to the nature of the error.
Tip: Many Mapping errors can be found by carefully inspecting the Mapping, without using
the Debug Wizard. If the error cannot be quickly located in this manner, the Debug
Wizard can help you by showing the actual data passing through the transformation
ports. However, to use the Debug Wizard effectively, you need to understand the logic
of the Mapping.
Step 2.
1)
Figure 81:
Figure 82: Debugger Toolbar
Tip: If the Debugger Toolbar is not visible, it is possible that another toolbar has shifted it
off the screen. Rearrange the other toolbars until you can see it.
2)
3)
The first page of the Debug Wizard is informational. Please read it and press Next.
Tip: The Debug Wizard requires a valid Mapping and Session to run - it cannot help you
determine why a Mapping is invalid. The Output window of the Designer will show you
the reason(s) why a Mapping is invalid.
4)
Figure 83:
a)
b)
In the Session box, select the Create a debug session radio button.
c)
5)
Click Next.
The next page of the Wizard allows you to set connection properties, similar to
creating Sessions in the Workflow Manager application.
a)
b)
You will discard the debugger data in a later step, so this value will be
ignored.
c)
Figure 84:
d)
These panels enable you to set which transformations in the Mapping you wish to monitor
in this debugging session, and set Session configuration information, such as a parameter
file or which connections the variables $Source and $Target correspond to.
e)
Figure 85:
f)
6)
Click Finish.
Resize the Debugger Target Data Display and Debugger Data Display windows as needed.
A good guideline is to have them look something like this:
Figure 86: Debugger Windows
Step 3.
1)
Figure 87:
2)
3)
Note: The term "instance" here refers to an object in the Mapping. Thus, each
transformation is an "instance."
4)
Figure 88:
5)
Note that there is no data available as yet - the Instance window, with the Next
Instance button, shows data as it moves from transformation to transformation
through the Mapping.
6)
Note that one more row has been read, and the first row has been "pushed" to
the Expression transformation and the Target table.
7)
Click the Step to Instance button several more times (at least 13), watching how the
data flows from the Instance window to the Target Instance window. Compare the
results between the Target instance and Instance windows.
a)
What is the nature of the error in the data being written to the table?
b)
Tip: Note that the transformation properties are grayed-out. While you can view and copy
expressions, you cannot edit the Mapping or its components while the Debugger session
is running.
c)
Step 4.
Tip: Nonetheless, you CAN try new variations on an expression while the Debugger is running.
1)
Enter the Expression Editor for one of the output ports - preferably the one
that seems likely to be causing the problem.
b)
Select the text of the expression (even though it is grayed-out) and copy it to the
Windows clipboard by typing Ctrl+C.
c)
d)
e)
Paste the expression text you chose into the Expression Editor and press
Evaluate.
(i)
The Debugger will immediately evaluate the expression with the current
data in the ports.
(ii) You can make as many changes to the Expression here as you need.
(iii) Once you have a modified expression that you want to keep, copy it to the
Windows clipboard.
2)
3) Stop the Debugger by clicking the Stop button on the Debugger toolbar.
4)
Edit the Expression transformation and put your modified Expression in place by
pasting it into the Expression Editor.
5)
6)
Restart the Debugger and test to ensure that your fix worked.
Answers
3.7.a What is the nature of the error in the data being written to the table?
The month and date seem to be reversed. That is, the data comes in as January 1, January 2, etc., but is
being written as January 1, February 1, etc.
3.7.c. What is causing the error?
The Expression Editor is using a format of DD/MM/YYYY but the incoming data has a format of
MM/DD/YYYY.
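As a sketch, the corrected expression would use the format the answer identifies (the port name is an assumption):
TO_DATE(DATE_FLD, 'MM/DD/YYYY')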
Lab 9-1
Use a single Source definition to read two files and combine their data in a single data
stream.
Remove duplicate rows.
Create logic that:
o Rejects the record if the incoming CUSTOMER_ID is missing
o Inserts the record if the customer does not already exist in ODS_CUSTOMERS
o Updates the record if the customer already exists in ODS_CUSTOMERS
Duration:
90 minutes
Instructions
Step 1.
1)
2)
3)
4)
a)
b)
b)
Arrange All
5)
Figure 89:
Step 2.
1)
2)
Rename it uni_E_W_CUSTOMERS.
b)
Figure 90:
c)
3)
Click OK.
a)
Figure 91:
4)
Step 3.
1)
2)
Drag all the output ports from the Union transformation to it.
3)
Rename it srt_REMOVE_CUST_DUPS.
b)
Figure 92:
c)
d)
Figure 93:
e)
Click OK.
f)
Figure 94:
4)
Iconize the Source definitions, Source Qualifiers, and Union transformation, and
arrange the Mapping to give you space on the right.
Step 4. Create and Configure a Lookup on the ODS_CUSTOMERS target table
1)
2)
3)
Drag the port CUSTOMER_NO from the Sorter transformation to the Lookup
transformation.
Note: the rule of "active vs. passive" transformation objects applies here. The Sorter is an
active transformation. Therefore, it cannot be bypassed by bringing this port directly
through from the Union transformation to the Target.
4)
Rename it lkp_ODS_CUSTOMERS.
b)
c)
d)
Click OK.
5)
Figure 95:
Step 5.
1)
2)
Drag all the ports from the Sorter transformation to the Update Strategy
transformation.
3)
Drag the CUSTOMER_NO port from the Lookup transformation to the Update
Strategy transformation.
4)
Rename it to upd_UPDATE_ELSE_INSERT.
b)
c) Change the name of the CUSTOMER_NO port (the one coming from the Sorter) to CUSTOMER_NO_SOURCE.
d) Change the name of the CUSTOMER_NO1 port (the one coming from the Lookup) to CUSTOMER_NO_LOOKUP.
Figure 96:
e)
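Note: A sketch of an Update Strategy expression consistent with the lab goals (reject rows with a missing customer number, update existing customers, insert new ones), using the renamed ports:
IIF(ISNULL(CUSTOMER_NO_SOURCE), DD_REJECT,
IIF(ISNULL(CUSTOMER_NO_LOOKUP), DD_INSERT, DD_UPDATE))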
f)
Enter the above expression in the Expression Editor and click Validate.
(i)
Figure 97:
h)
5)
Use Autolink by Position to connect the ports from the Update Strategy
transformation to the Target definition.
6)
Figure 98:
Step 6. M9_Update_Customers_xx
1)
2)
3)
The Target and Lookup transformation will use the Relational connection ODSxx.
4)
5) DO NOT set the Truncate target table option. If you do, the existing data in the
table will be deleted and the update logic will not work properly.
Figure 99:
Note 1: The specific number of rows may vary depending on whether you did the Extra
Credit exercise in Lab 3-1.
Note 2: The number of rejected rows shown here does not reflect the number of rows
rejected by the Mapping. Rather, it shows that no errors were thrown by the database.
This is to be expected because the Mapping did not forward any rows with a null key
field to the database. To see the number of rows actually rejected by the Mapping, you
must consult the Session Log.
Lab 10-1
Duration:
60 minutes
Instructions
Step 1.
1)
2)
3)
4)
Step 2.
1)
2)
Note that there are bad entries in the Postal Code field, such as:
0.000
2112.
NULL (string)
3)
Not every record will require an attempt to repair a bad Postal Code.
a)
4)
Step 3.
1)
2)
3)
4)
b)
c)
d)
Click OK.
5)
Delete the link from the Source Qualifier to the Lookup transformation.
6)
Step 4.
1)
2)
Drag all the ports from the Source Qualifier into the Expression transformation.
3)
Rename it exp_FIND_LOCATION.
Create a new Output port named LOCATION with datatype STRING and
precision 20.
Enter the Expression Editor for the port LOCATION.
(i)
(ii) At the bottom of the list you will see a folder named "Lookups." Open this
folder and you will see the unconnected Lookup transformation you just
created.
(iii) Create the expression so that if the value of POSTAL_CODE is NULL, or
contains the string NULL, or is equal to 0.000, then look up the
POSTAL_CODE in the STG_DEALERSHIP table based on
DEALERSHIP_ID. Otherwise, return the value of POSTAL_CODE.
IIF( (POSTAL_CODE != '0.000') AND NOT ISNULL(POSTAL_CODE) AND (POSTAL_CODE != 'NULL'),
POSTAL_CODE,
:LKP.LKP_RETURN_LOCATION(DEALERSHIP_ID) )
(iv) Make sure your expression is valid, then click OK.
d)
Step 5. Create and Configure a Router Transformation to Classify Customers
1)
2)
Drag all ports from the Expression transformation to the Router transformation
except DEALERSHIP_ID.
3)
Rename it rtr_CLASSIFY_CUSTOMERS.
b)
c)
For the HIGH_VALUE group set the filter condition to send records to this
group when
the High Income flag is set to 1
OR
the value in the Location port is 19104, 10005, 90004, Newport Beach,
Scottsdale, or West Palm Beach
d)
For the SUBPRIME group set the condition to send records to this group when
the High Income flag is set to 0
AND
the value in the Location port is 55409, 98112, 75201, Indianapolis, or
Phoenix.
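As a sketch, the HIGH_VALUE group filter condition might read (assuming LOCATION is a string port, so values are quoted):
HIGH_INCOME_FLAG = 1 OR LOCATION = '19104' OR LOCATION = '10005' OR LOCATION = '90004' OR LOCATION = 'Newport Beach' OR LOCATION = 'Scottsdale' OR LOCATION = 'West Palm Beach'
and the SUBPRIME condition:
HIGH_INCOME_FLAG = 0 AND (LOCATION = '55409' OR LOCATION = '98112' OR LOCATION = '75201' OR LOCATION = 'Indianapolis' OR LOCATION = 'Phoenix')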
Note: Obviously these location choices are very simplistic. They are acceptable for
illustrating the use of the Router transformation in this exercise, however.
e)
Step 6.
1)
2)
3)
b)
Rename it SC_ODS_CUSTOMERS_UNCATEGORIZED.
Note: Even though the Target instance has been renamed, it will still write to the original
table name. You can verify this by looking at the Shortcut To fields.
c)
4)
5)
Step 7.
1)
Note: You will override the Relational Writer so that the Subprime and Uncategorized
customers are written to .csv files.
2)
b)
c)
d)
e)
Using the drop box, set the value of the Writers property to File Writer.
In the Properties window, scroll down to find the Output Filename attribute.
Change its value to CUSTOMERS_SUBPRIME_xx.csv
f)
3)
In the "Flat Files - Targets" dialogue, select Delimited and click Advanced.
In the "Delimited File Properties - Targets" dialogue, make sure that the
Column Delimiter is a comma (,) character.
c) Click OK.
d) Click OK.
4)
5)
6)
7)
Extra Credit
1. Extend the invalid POSTAL_CODE search to include fields that have a period character (.)
2. Redesign the mapping so that all of the POSTAL_CODE values are replaced with city names.
Lab 10-2
Duration:
45 minutes
Instructions
Step 1.
1)
2)
3)
4)
Note: This variable will be incremented and used to generate new employee IDs.
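Note: A sketch of how the variable can drive an Expression output port (the $$New_ID variable name appears later in this lab; SETVARIABLE is a standard PowerCenter function):
NEW_EMPLOYEE_ID: SETVARIABLE($$New_ID, $$New_ID + 1)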
Step 2.
In this step, you will make a non-reusable copy of the reusable transformation you created
earlier in the course, and edit it.
1)
2)
b)
Click and hold the mouse button, then hold down the CTRL key.
c)
d)
Notice that while you are doing this the Status Bar reads, "Make a non-reusable
copy of this transformation and add it to this mapping."
e)
3)
4)
b)
c)
b)
6)
7)
Link the following ports from the Source Qualifier to the Expression transformation:

From Port     To Port
FIRST_NAME    FIRSTNAME
LAST_NAME     LASTNAME
GENDER        GENDER
AGE           AGE
Link the following ports from the Expression transformation to the Target:

From Port           To Port
NAME                NAME
GENDER_CATEGORY     GENDER_CATEGORY
SENIOR_FLAG         SENIOR_FLAG
NEW_EMPLOYEE_ID     EMPLOYEE_ID
Step 3.
b)
c)
d)
Step 4.
1)
Set the Show up to field to 200, to ensure that all 109 rows are visible.
b)
c)
2)
3)
a)
Step 5.
Note that the value of $$New_ID is the same as the value of the last
EMPLOYEE_ID. It is ready for the next run of the workflow.
2)
3)
Viewing the Source/Target statistics, note that the Source file contained 5 rows that
were added to the Target.
4)
View the Persistent Values for the Session and verify that the number has
incremented by five.
5)
Preview the data in the Target and verify that five new employees have been added
with the appropriate Employee ID numbers.
Extra Credit
If the Mapping had a relational source, how could a similar technique be used to read the Source
incrementally, so that only new records would be read each time the Session was run?
Lab 11-1
Create a Mapplet
Duration:
40 minutes
Instructions
Step 1.
1)
2)
Name it m11_Sales_Summaries_xx.
3)
4)
Arrange All Iconic if the Mapping isn't already arranged that way.
Step 2. Create a Mapplet
1)
2)
3)
4)
5)
6)
7)
8)
9)
a)
Drag the SALE_DATE port from the Lookup transformation to the Mapplet
Input transformation.
b)
c)
Rename it in_Sales_Summaries.
b)
Hint: You can use your mouse and Ctrl+C (copy) and Ctrl+V (Paste) to speed the process.
c)
d)
Drag a Mapplet Output transformation to the Mapplet.
Rename it out_Sales_Summary_Weekly.
From agg_SUM_BY_WEEK, drag ports WEEK_DESC and
REVENUE_WEEK_SUM through NET_PROFIT_WEEK_SUM to the
Mapplet Output transformation.
Rename it out_Sales_Summary_Monthly.
From agg_SUM_BY_MONTH, drag ports MONTH_DESC and
REVENUE_MONTH_SUM through NET_PROFIT_MONTH_SUM to the
Mapplet Output transformation.
Step 3.
1)
2)
3)
4)
Manually link from the port SALE_DATE in the Source Qualifier to the port
SALE_DATE in the in_SALES_SUMMARIES section of the Mapplet.
Hint: You may want to stretch the Mapplet vertically to see as many ports as possible.
5)
You will use Autolink to connect the remaining input ports in the Mapplet.
a)
b)
c)
In the To Transformation box, if necessary expand the Mapplet, and select the
Input section in_Sales_Summaries.
d)
Click More.
e)
f)
Click OK.
g)
h)
Repeat the process with the suffix _MONTH to complete the links to the Input
section of the Mapplet.
6)
c)
Tip: This can work because the ports are in exactly the same order on both the Mapplet
output section and the Target. It is equivalent to Autolink by Position, but does not
automatically start with the first port on each transformation.
d)
7)
Lab 12-1
Create a more formal Workflow that prevents some types of bad data from getting into
the ODS_SALES table.
Assign Workflow variables to keep track of the number of times the Workflow has been
run.
Increment Workflow variables using an Assignment task.
Branch in a Workflow using link conditions and a Decision task to choose to run the
next Session or report an error.
Duration:
45 minutes
Instructions
Step 1.
1)
b)
a)
b)
c)
Using the Edit > Copy technique shown in an earlier lab, copy the reusable Session s_m_Load_STG_TRANSACTIONS from SC_DATA_STRUCTURES to your ~Developerxx folder.
d)
3)
c)
(ii) Edit the Session and change its name by adding the suffix _xx.
Step 2.
1)
2)
3)
4)
5)
6)
Step 3.
1)
2)
3)
4)
Add a link condition to ensure that the Assignment task executes only if the Session
task was successful.
a)
Select the pre-defined function "Status" and set the condition so that the status
must be SUCCEEDED. (See figure.)
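For example, if the Session is named s_m_Load_STG_TRANSACTIONS_xx, the link condition would read:
$s_m_Load_STG_TRANSACTIONS_xx.Status = SUCCEEDED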
5)
6)
Step 4.
1)
2)
3)
4)
Rename it dcn_RUN_WEEKLY.
In the Properties tab, create a Decision Name expression to see if this is the
seventh day of the Workflow week.
(i)
The Modulus function (MOD) divides two numbers and yields the
remainder.
Tip: The decision task evaluates an expression and returns a value of either TRUE or
FALSE. This value can be checked in a Link condition to determine the direction in
which the Workflow proceeds from the Decision task.
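For example (a sketch; $$WF_RUN_COUNT stands for whichever Workflow variable your Assignment task increments on each run):
MOD($$WF_RUN_COUNT, 7) = 0
A link leaving the Decision task can then test $dcn_RUN_WEEKLY.Condition = TRUE, or = FALSE for the branch that skips the weekly Session.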
5)
Step 5.
1)
2)
3)
Lab 12-1
a)
4)
The Session properties were set correctly in the Workflow where you first created
this Session.
Step 6.
1)
2)
3)
4)
Rename it eml_DAILY_LOAD_COMPLETE.
In the Properties tab, enter appropriate values for Email User Name, Email
Subject, and Email Text (see example below).
c)
Click OK.
d)
e)
Step 7.
You will need to run the Workflow seven times in order to test the weekly aggregate session.
1)
Review the Workflow results in the Gantt view of the Workflow Monitor. It
should appear similar to the figure below:
2)
3)
4)
Run the workflow six more times to simulate a week's normal runs.
5)
Extra Credit:
Modify the Workflow to fail if any of the Sessions in the Workflow fail.
Hint: You will need to use more than one Control task.
Hint: You can force a Session failure by changing to a Relational connection that references a
database schema that does not have the table in it. For example, change the target table to use
Relational connection OLTP.
Answers
7.5. After the last run, how is the Gantt chart different?
The second Session task is shown connected to the Decision task, and has a status of Succeeded.
Lab 12-2
Create a Workflow that loads the ODS_SALES table, then raises a User-Defined event.
Wait for the User-Defined event, then load the Sales Summaries tables.
Stop the workflow nicely if the Sales Summary tables load properly.
Create a third branch to the workflow that starts a 15-minute timer. If the time limit is
reached, then fail the workflow.
Set the workflow to run at a particular time.
Duration:
35 minutes
Instructions
Step 1.
1)
Step 2.
1)
2)
3)
Drag the Session s_m8_Load_ODS_Sales into the workflow and link the Start task
to it.
4)
Step 3.
1)
2)
3)
4)
Add a link condition to ensure that the Event Raise task executes only if the Session
task was successful.
a)
Double-click the pre-defined function "Status" and set the condition so that the
status must be SUCCEEDED. (See figure.)
5)
b)
In the Properties tab, set the User Defined Event to wait for. (See figure.)
6)
Step 4.
1)
2)
3)
Rename it ew_Load_ODS_SALES_IS_DONE.
In the Events tab, set a User-Defined event which the Event Wait task will wait
for before executing. See the figure.
4)
Step 5.
1)
2)
3)
a)
Add a link condition that checks whether the Event Wait task has completed
successfully. (See figure.)
Step 6.
1)
2)
3)
4)
Rename it ctl_Stop_Workflow_Nicely.
In the Properties tab, tell the Control task to stop the top-level workflow (see example below).
c)
Click OK.
Step 7.
1)
2)
3)
Rename it tmr_Wait_15_Minutes.
In the Timer tab, tell the Timer task to count 15 minutes from the time the parent workflow started (see example below).
c) Click OK.
Step 8.
1)
2)
3)
4)
Rename it ctl_Fail_Load_ODS_SALES_SUMMARIES.
In the Properties tab, tell the Control task to Fail the top-level workflow (see example below).
c) Click OK.
Step 9.
1)
2)
3)
Step 10.
1)
2)
3)
4)
5)
6)
Set the workflow to start a few minutes from now. For example, if it is 12:55AM,
set the workflow to start at 1:00AM. (see figure below)
Figure 137: Edit Scheduler window set to start the workflow a few minutes from now
7) Click OK.
8) Click OK.
9)
Step 11.
1)
The workflow will start, but not execute until the date and time set in the Scheduler.
Wait until it starts.
2)
Review the Workflow results in the Gantt view of the Workflow Monitor. It should
appear similar to the figure below:
Note that the first Control task stopped the workflow before the second one failed it.
The first Control task is needed so the second one doesn't execute after 15 minutes
every time the workflow is run.
Workshop 1
After the high-level flow has been established, document the details at the field level,
listing each of the Target fields and the Source field(s) used to create each Target field.
o Document any expression that may be needed to generate the Target field (e.g., a sum of a field, a multiplication of two fields, a comparison of two fields, etc.). Whatever the rules, be sure to document them at this point, and remember to keep it at a physical level.
The designer may have to do some investigation at this point for some business
rules. For example, the business rules may say "For active customers, calculate a
late fee rate." The designer of the Mapping must determine that, on a physical
level, this translates to "for customers with an ACTIVE_FLAG of 1, multiply the
DAYS_LATE field by the LATE_DAY_RATE field."
Create an inventory of Mappings and reusable objects. This list is a "work in progress"
and will have to be continually updated as the project moves forward.
o The administrator or lead developer should gather all the potential Sources, Targets, and reusable objects and place them in a folder accessible to all who may need access to them.
o These lists are valuable to everyone, but especially for the lead developer. These objects can be assigned to individual developers and progress tracked over the course of the project.
If a shared folder for Sources and Targets is not available, the developer will need to
obtain the Source and Target database schema owners, passwords, and connect strings.
With this information, ODBC connections can be created in the Designer tool to
allow access to the Source and Target definitions.
Reusable objects need to be properly documented to make it easier for other developers
to determine whether they can/should use them in their own development.
The Informatica Velocity methodology provides a matrix that assists in detailing the
relationships between Source fields and Target fields (Mapping Specifications.doc). It
also depicts fields that are derived from values in the Source and eventually linked to
ports in the Target.
Document any other information about the Mapping that is likely to be helpful in developing it. Helpful information may, for example, include Source and Target connection information, Lookups (and how to match data in the Lookup table), potential data issues at a field level, any known issues with particular fields, pre- or post-Mapping processing requirements, and any information about specific error handling requirements for the Mapping.
The completed Mapping design should then be reviewed with one or more team
members for completeness and adherence to the business requirements.
o In addition, the design document should be updated whenever the business rules change, or if more information is gathered during the build process.
Mapping Specifics
The following tips will make the Mapping development process more efficient. (Not in any
particular order.)
One of the first things to do is to bring all required Source and Target objects into the
Mapping.
Note, however, that all ports must be connected from the Source definition to the
Source Qualifier transformation.
Only needed fields should be projected from Source Qualifiers that originate with
Relational tables. The SQL that PowerCenter generates will include only the
needed fields, reducing computing resource requirements. In this case, only
connect from the Source Qualifier those fields that will be used subsequently.
Filter rows early and often. Only manipulate data that needs to be moved and
transformed. Reduce the number of non-essential records passed through the Mapping.
Decide if a Source Qualifier join will net the result needed, versus creating a Lookup to
retrieve desired results.
Make use of variables (local or global) to reduce the number of times functions will have
to be used.
Make use of variables, reusable transformations, and Mapplets as "reusable code." These
will leverage the work being done by others, promote standardization, and ease
maintenance tasks.
Use active transformations early in the process to reduce the number of records as early
in the Mapping as possible.
Utilize single-pass reads. Design Mappings to utilize one Source Qualifier to populate
multiple Targets.
When the Source is large, cache Lookup table columns for Lookup tables of 500,000 rows or less. The standard rule of thumb is not to cache tables over 500,000 rows on 32-bit platforms with limited RAM.
Operators are faster than functions (i.e., || is faster than the CONCAT function; see the example after these tips).
Use flat files. File read/writes are faster than database read/writes on the same
server. Fixed-width files are faster than delimited file processing.
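For example, the Lab 4-1 concatenation written with the operator:
FIRSTNAME || ' ' || LASTNAME
is preferred over the equivalent nested function calls, CONCAT(CONCAT(FIRSTNAME, ' '), LASTNAME).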
Workshop
Scenario:
Management wants the ability to analyze how certain promotions are performing. They
want to be able to gather the promotions by day for each dealership, for each product
sold.
Goals:
Duration:
120 minutes
Instructions
Sources and Targets
Sources: TRANSACTIONS and PRODUCT_COST
These relational tables contain sales transactions and Product cost data for seven days. They
are located in the SDBU schema. For the purpose of this mapping, we will read all the data
in these tables.
These tables can be joined on PRODUCT_ID and PRODUCT_CODE.
Figure 140: TRANSACTION table definition
Target: ODS_PROMOTIONS_DAILY
This is a relational table located in the ODSxx schema. After running the Mapping, it should
contain 1283 rows.
Mapping Details
In order to successfully create the mapping, you will need to know some additional details.
Management has decided that they don't need to keep track of the Manager Discount
and the Employee Discount (PROMO_ID 105 and 200), so these will need to be
excluded from the load.
The DATE_DESC can be obtained from the STG_DATES table by matching the
TRANSACTION table DATE_ID to the DATE_ID in STG_DATES.
REVENUE is derived by taking the value in the QUANTITY port times the
SELLING_PRICE and then subtracting the DISCOUNT, HOLDBACK and
REBATE.
Most of the discounts are valid but occasionally they may be higher than the
acceptable value of 17.25%. When this occurs you will need to obtain an acceptable
value based on the PROMO_ID. The acceptable value can be obtained from the
PROMOTIONS table by matching the PROMO_ID.
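As a sketch, the REVENUE rule above translates to the expression:
(QUANTITY * SELLING_PRICE) - DISCOUNT - HOLDBACK - REBATE
and the PROMO_ID exclusions could be a Filter condition such as PROMO_ID != 105 AND PROMO_ID != 200.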
Target Column     Source File     Source Column     Expression
Run Details
Your Task Details, Source/Target Statistics, and preview of the Target data should be similar
to the figures below.
Figure 143: Task Details of the Completed Run
Workshop 2
Create an inventory of Worklets and reusable tasks. This list is a "work in progress" list
and will have to be continually updated as the project moves forward.
The lists are valuable to everyone, but particularly for the lead developer.
Making an up-front decision to make all Session, Email and Command tasks
reusable will make this easier.
The administrator or lead developer should put together a list of database connections to
be used for Source and Target connection values.
Reusable tasks must be properly documented to make it easier for other developers to
determine whether they can or should use them in their own development.
If the volume of data is sufficiently low for the available hardware to handle, you may
consider volume analysis optional, developing the load process solely on the dependency
analysis.
Another possible component to add into the load process is sending email. Three email
options are available for notification during the load process:
Post-session emails can be sent after a Session completes successfully or when it fails.
Email tasks can be placed in Workflows before or after an event or series of events.
Document any other information about the Workflow that is likely to be helpful in developing it. Helpful information may, for example, include Source and Target database connection information; pre- or post-Workflow processing requirements; and any information about specific error handling for the Workflow.
Create a Load Dependency analysis. This should list all Sessions by dependency, along
with all other events (Informatica or other) they depend on.
Also, be sure to specify the dependency relationship between each Session or event,
the algorithm or logic needed to test the dependency during execution, and the
impact of any possible dependency test results (e.g., don't run a Session, fail a
Session, fail a parent or Worklet, etc.)
Create a Load Volume analysis. This should list all the Sources and row counts and row
widths expected for each Session.
If the hardware is not adequate to run the Sessions concurrently, you will need to
prioritize them. The highest priority within a group is usually assigned to Sessions
with the most child dependencies.
This should include all Lookup transformations in addition to the extract Sources.
The amount of data that is read to initialize a Lookup cache can materially affect the
initialization and execution time of a Session.
The completed Workflow design should then be reviewed with one or more team
members for completeness and adherence to the business requirements.
The design document should be updated whenever the business rules change, or if more
information is gathered during the build process.
Workflow Specifics
The following tips will make the Workflow development process more efficient (not in any
particular order).
When developing a sequential Workflow, use the Workflow Wizard to create Sessions in
sequence. You also have the option to create dependencies between Sessions.
Use a parameter file to define the values for parameters and variables used in a
Workflow, Worklet, Mapping, or Session. A parameter file can be created using a text
editor such as WordPad or Notepad. List the parameters or variables and their values in
the parameter file. The use of Parameter files is covered in the Level 2 Developer course.
Parameter files can contain the following types of parameters and variables:
Workflow variables
Worklet variables
Session parameters
Session parameters must be defined in a parameter file. Since Session parameters do not
have default values, when the Integration Service cannot locate the value of a Session
parameter in the parameter file, it fails to initialize the Session.
To include parameter or variable information for more than one Workflow, Worklet, or
Session in a single parameter file, create separate sections for each object within the
parameter file.
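Note: A minimal parameter file sketch (the folder, workflow, session, and variable names here are placeholders, not from the labs):
[Developerxx.WF:wkf_Load_Daily]
$$WF_RUN_COUNT=0
[Developerxx.WF:wkf_Load_Daily.ST:s_m_Load_STG_TRANSACTIONS]
$DBConnection_Source=OLTP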
You can create multiple parameter files for a single Workflow, Worklet, or Session and change the file these tasks use as necessary. To specify the parameter file the Integration Service uses with a Workflow, Worklet, or Session, do either of the following:
Enter the parameter file name and directory in the Workflow, Worklet, or Session
properties.
Start the Workflow, Worklet or Session using pmcmd and enter the parameter
filename and directory on the command line.
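Note: A sketch of the pmcmd form (the service, domain, user, folder, and file names are placeholders):
pmcmd startworkflow -sv Int_Svc -d Domain_Dev -u Devxx -p password -f Developerxx -paramfile /infa/params/daily.txt wkf_Load_Daily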
On hardware systems that are underutilized, you may be able to improve performance by
processing partitioned datasets in parallel in multiple threads of the same Session
instance running on the Integration Service node.
This allows the Integration Service to update your Target incrementally, rather than
forcing it to process the entire Source and recalculate the same calculations each
time you run the Session.
Loading directly into the Target is possible when the data is going to be bulk loaded.
Load into flat files and bulk load using an external loader.
From the Workflow Manager Tools menu, select Options and select the option to "Show full names of task." This will show the entire name of all tasks in the Workflow.
Workshop
Scenario:
Goals:
Duration:
120 minutes
Instructions
Mappings Required
This section contains a listing of the Mappings that will be used in the workflow:
m_Load_STG_PAYMENT_TYPE
m_Load_STG_Product
m_Load_STG_Dealership
m_Load_STG_PROMOTIONS
m_Load_STG_CUSTOMERS
m_Load_STG_TRANSACTIONS
m_Load_STG_EMPLOYEES
For your convenience, reusable Sessions have been created for these mappings. You can
COPY them from the SC_DATA_STRUCTURES folder to your folder. (One or more of
these Sessions may already be in your Sessions subfolder.) Remember to use the Repository
Manager to copy the sessions. If the copy wizard asks to resolve any conflicts, tell it to replace
old definitions with new ones.
The names of the sessions are:
s_m_Load_STG_PAYMENT_TYPE
s_m_Load_STG_PRODUCT
s_m_Load_STG_DEALERSHIP
s_m_Load_STG_PROMOTIONS
s_m_Load_STG_CUSTOMERS
s_m_Load_STG_TRANSACTIONS
s_m_Load_STG_EMPLOYEES
Workflow Details
1.
2.
No Session can begin until an indicator file shows up. The indicator file will be
named fileindxx.txt, and will be created by you using a text editor. You will need to
place this file in the directory indicated by the Instructor after you start the
Workflow. (If you are in a UNIX environment, you can skip this requirement.)
3.
In order to utilize the CPU in a more efficient manner, you will want to run some of
the Sessions sequentially and some of them concurrently.
a.
b.
c.
d.
e.
f.
4.
All Sessions truncate the Target tables and should be pointed to the correct
relational database connections.
5.
Final Point
More than one solution is possible. You will know that your solution has worked when all
the Sessions complete successfully.