BASIC Transformer Stage

Transformer stage in IBM DataStage.
Copyright
© Attribution Non-Commercial (BY-NC)

BASIC Transformer stage

http://pic.dhe.ibm.com/infocenter/iisinfsv/v8r7/advanced/print.jsp?topic=...

BASIC Transformer stage


Contents
1. BASIC Transformer stage: fast path
2. BASIC Transformer editor components
2.1. BASIC Transformer stage: Toolbar
2.2. BASIC Transformer stage: Link area
2.3. BASIC Transformer stage: Metadata area
2.4. BASIC Transformer stage: Shortcut menus
3. BASIC Transformer stage basic concepts
3.1. BASIC Transformer stage: Input link
3.2. BASIC Transformer stage: Output links
3.3. BASIC Transformer stage: Before-stage and after-stage routines
4. Editing BASIC transformer stages
4.1. Using drag-and-drop
4.2. Find and replace facilities
4.3. Select facilities
4.4. Creating and deleting columns
4.5. Moving columns within a link
4.6. Editing column meta data
4.7. Defining output column derivations
4.7.1. Column auto-match facility
4.8. Editing multiple derivations
4.8.1. Whole expression
4.8.2. Part of expression
4.9. Specifying before-stage and after-stage subroutines
4.10. Defining constraints and handling reject links
4.11. Specifying link order
4.12. Defining local stage variables
5. The InfoSphere DataStage expression editor
5.1. Expression format
5.2. Entering expressions
5.3. Completing variable names
5.4. Validating the expression
5.5. Exiting the expression editor
5.6. Configuring the expression editor
6. BASIC Transformer stage properties
6.1. BASIC Transformer stage: Stage page
6.1.1. BASIC Transformer stage: Advanced tab
6.2. BASIC Transformer stage: Input page
6.2.1. BASIC Transformer stage: Partitioning tab
6.3. BASIC Transformer stage: Output page
IBM InfoSphere DataStage, Version 8.7.0

BASIC Transformer stage


The BASIC Transformer stage is a processing stage. It appears under the processing category in the tool palette in the Transformer shortcut container. The BASIC Transformer stage is similar in appearance and function to the Transformer stage described in "Transformer stage". It gives access to BASIC transforms and functions (BASIC is the language supported by the server engine and available in server jobs). For a description of the BASIC functions available, see InfoSphere DataStage Server Job Developer Guide. BASIC Transformer stages can have a single input and any number of outputs. You can only use BASIC Transformer stages on SMP systems (not on MPP or cluster systems).

Note: If you encounter a problem when running a job containing a BASIC Transformer stage, you could try increasing the value of the DSIPC_OPEN_TIMEOUT environment variable in the Parallel Operator specific category of the environment variable dialog box in the DataStage Administrator (see InfoSphere DataStage Administrator Client Guide).

Release date: 2011-10-01 PDF version of this information: IBM InfoSphere DataStage and QualityStage Parallel Job Developer's Guide

BASIC Transformer stage: fast path


About this task
This section specifies the minimum steps to take to get a BASIC Transformer stage functioning. InfoSphere DataStage provides a versatile user interface, and there are many shortcuts to achieving a particular end. This section describes the basic method; you will learn where the shortcuts are as you become familiar with the product.

In the left pane:
- Ensure that you have column metadata defined.

In the right pane:
- Ensure that you have column metadata defined for each of the output links. The easiest way to do this is to drag columns across from the input link.
- Define the derivation for each of your output columns. You can leave this as a straight mapping from an input column, or explicitly define an expression to transform the data before it is output.
- Optionally specify a constraint for each output link. This is an expression that input rows must satisfy before they are output on a link. Rows that are not output on any of the links can be output on the otherwise link.
- Optionally specify one or more stage variables. This provides a method of defining expressions that can be reused in your output column derivations (stage variables are only visible within the stage).
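To make the relationship between stage variables and output column derivations concrete, here is a minimal Python model of one input row passing through the stage. It is not DataStage BASIC and not how the server engine is implemented. The function, variable, and column names are hypothetical, and the rule that stage variables are evaluated top to bottom before the output derivations is an assumption of this sketch.

```python
# Python model (not DataStage BASIC) of one row passing through a BASIC
# Transformer: stage variables are computed first, then each output column
# derivation is evaluated and may reuse them. All names are hypothetical.

def transform_row(in_row, stage_vars, derivations):
    """Evaluate stage variables, then output derivations, for one input row.

    stage_vars:  ordered list of (name, fn(in_row, svars)) pairs.
    derivations: dict mapping output column name -> fn(in_row, svars).
    """
    svars = {}
    # Assumed ordering: variables are evaluated in list order, so a later
    # stage variable can refer to one defined above it.
    for name, fn in stage_vars:
        svars[name] = fn(in_row, svars)
    # Stage variables stay local to this function, mirroring the fact that
    # they are only visible within the stage.
    return {col: fn(in_row, svars) for col, fn in derivations.items()}


stage_vars = [
    ("svFullName", lambda r, sv: f"{r['first']} {r['last']}".strip()),
    ("svUpper", lambda r, sv: sv["svFullName"].upper()),
]
derivations = {
    "id": lambda r, sv: r["id"],             # straight mapping from input
    "name": lambda r, sv: sv["svFullName"],  # reuses a stage variable
    "label": lambda r, sv: sv["svUpper"],    # reuses another stage variable
}

row = {"id": 7, "first": "Ada", "last": "Lovelace"}
print(transform_row(row, stage_vars, derivations))
```

The design point is that the full-name expression is written once, as a stage variable, and then referenced by two output derivations instead of being repeated in each.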

BASIC Transformer editor components


The BASIC Transformer Editor has the following components:
- Toolbar
- Link area
- Metadata area
- Shortcut menus

BASIC Transformer stage: Toolbar


The Transformer toolbar contains the following buttons (from left to right):
- Stage properties
- Constraints
- Show all
- Show/hide stage variables
- Cut
- Copy
- Paste
- Find/replace
- Load column definition
- Save column definition
- Column auto-match
- Input link execution order
- Output link execution order

BASIC Transformer stage: Link area


The top area displays links to and from the BASIC Transformer stage, showing their columns and the relationships between them. The link area is where all column definitions and stage variables are defined. The link area is divided into two panes; you can drag the splitter bar between them to resize the panes relative to one another. There is also a horizontal scroll bar, allowing you to scroll the view left or right. The left pane shows the input link; the right pane shows the output links. Output columns that have no derivation defined are shown in red. Within the Transformer Editor, only one link can be selected at any one time. When a link is selected, its title bar is highlighted, and arrowheads indicate any selected columns.

BASIC Transformer stage: Metadata area


The bottom area shows the column metadata for input and output links. Again, this area is divided into two panes: the left showing input link metadata and the right showing output link metadata. The metadata for each link is shown in a grid contained within a tabbed page. Click the tab to bring the required link to the front. That link is also selected in the link area. If you select a link in the link area, its metadata tab is brought to the front automatically. You can edit the grids to change the column metadata on any of the links. You can also add and delete metadata.

BASIC Transformer stage: Shortcut menus


The BASIC Transformer Editor shortcut menus are displayed by right-clicking the links in the links area. There are slightly different menus, depending on whether you right-click an input link, an output link, or a stage variable. The input link menu offers you operations on input columns, the output link menu offers you operations on output columns and their derivations, and the stage variable menu offers you operations on stage variables.

The shortcut menu enables you to:
- Open the Stage Properties dialog box in order to specify stage or link properties.
- Open the Constraints dialog box to specify a constraint (only available for output links).
- Open the Column Auto Match dialog box.
- Display the Find/Replace dialog box.
- Display the Select dialog box.
- Edit, validate, or clear a derivation or stage variable.
- Edit several derivations in one operation.
- Append a new column or stage variable to the selected link.
- Select all columns on a link.
- Insert or delete columns or stage variables.
- Cut, copy, and paste a column or a key expression or a derivation or stage variable.

If you display the menu from the links area background, you can:
- Open the Stage Properties dialog box in order to specify stage or link properties.
- Open the Constraints dialog box in order to specify a constraint for the selected output link.
- Open the Link Execution Order dialog box in order to specify the order in which links should be processed.
- Toggle between viewing link relations for all links, or for the selected link only.
- Toggle between displaying stage variables and hiding them.

Right-clicking in the metadata area of the Transformer Editor opens the standard grid editing shortcut menus.

BASIC Transformer stage basic concepts


When you first edit a BASIC Transformer stage, it is likely that you will have already defined what data is input to the stage on the input link. You will use the Transformer Editor to define the data that will be output by the stage and how it will be transformed. (You can define input data using the Transformer Editor if required.) This section explains some of the basic concepts of using a BASIC Transformer stage.

BASIC Transformer stage: Input link


The input data source is joined to the BASIC Transformer stage via the input link.

BASIC Transformer stage: Output links


You can have any number of output links from your Transformer stage. You might want to pass some data straight through the BASIC Transformer stage unaltered, but it's likely that you'll want to transform data from some input columns before outputting it from the BASIC Transformer stage.
You can specify such an operation by entering an expression or by selecting a transform to apply to the data. InfoSphere DataStage has many built-in transforms, or you can define your own custom transforms that are stored in the Repository and can be reused as required.

The source of an output link column is defined in that column's Derivation cell within the Transformer Editor. You can use the Expression Editor to enter expressions or transforms in this cell. You can also simply drag an input column to an output column's Derivation cell to pass the data straight through the BASIC Transformer stage.

In addition to specifying derivation details for individual output columns, you can also specify constraints that operate on entire output links. A constraint is a BASIC expression that specifies criteria that data must meet before it can be passed to the output link. You can also specify a reject link, which is an output link that carries all the data not output on other links, that is, rows that have not met the criteria.

Each output link is processed in turn. If the constraint expression evaluates to TRUE for an input row, the data row is output on that link. Conversely, if a constraint expression evaluates to FALSE for an input row, the data row is not output on that link. Constraint expressions on different links are independent. If you have more than one output link, an input row might result in a data row being output from some, none, or all of the output links.

For example, consider data that comes from a paint shop; it could include information about any number of different colors. If you want to separate the colors into different files, you would set up different constraints. You could output the information about green and blue paint on LinkA, red and yellow paint on LinkB, and black paint on LinkC. When an input row contains information about yellow paint, the LinkA constraint expression evaluates to FALSE and the row is not output on LinkA. However, the input data does satisfy the constraint criterion for LinkB, and the row is output on LinkB. If the input data contains information about white paint, this does not satisfy any constraint, so the data row is not output on LinkA, LinkB, or LinkC, but is output on the reject link. The reject link is used to route data to a table or file that is a "catch-all" for rows that are not output on any other link. The table or file containing these rejects is represented by another stage in the job design.
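The paint-shop routing can be sketched as a small Python model. This is an illustration of the rules stated above (independent constraints, each link processed in turn, reject link for rows output nowhere else), not DataStage BASIC; the function name `route`, the key "Reject", and the use of dictionary order to stand in for link execution order are assumptions of this sketch.

```python
# Python model (not DataStage BASIC) of constraint routing in a BASIC
# Transformer. Each output link has an independent constraint, so one input
# row can go to some, none, or all links. Rows that satisfy no constraint
# go to the reject link.

def route(rows, constraints):
    """Return {link_name: [rows]} plus a hypothetical 'Reject' entry.

    constraints: dict link_name -> predicate(row) -> bool. Links are
    checked in dict order, standing in for the link execution order.
    """
    out = {name: [] for name in constraints}
    out["Reject"] = []
    for row in rows:
        matched = False
        for name, predicate in constraints.items():
            if predicate(row):          # TRUE: row is output on this link
                out[name].append(row)
                matched = True
        if not matched:                 # output on no other link
            out["Reject"].append(row)
    return out


constraints = {
    "LinkA": lambda r: r["color"] in ("green", "blue"),
    "LinkB": lambda r: r["color"] in ("red", "yellow"),
    "LinkC": lambda r: r["color"] == "black",
}
rows = [{"color": c} for c in ("yellow", "blue", "white", "black")]
result = route(rows, constraints)
print({link: [r["color"] for r in rs] for link, rs in result.items()})
```

Because every constraint is evaluated for every row, overlapping constraints would copy a row to several links; only a row that matches none of them reaches the reject link.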

BASIC Transformer stage: Before-stage and after-stage routines


You can specify routines to be executed before or after the stage has processed the data. For example, you might use a before-stage routine to prepare the data before processing starts. You might use an after-stage routine to send an electronic message when the stage has finished.

Editing BASIC transformer stages


About this task
The Transformer Editor enables you to perform the following operations on a BASIC Transformer stage:
- Create new columns on a link
- Delete columns from within a link
- Move columns within a link
- Edit column metadata
- Define output column derivations
- Specify before- and after-stage subroutines
- Define link constraints and handle rejects
- Specify the order in which links are processed
- Define local stage variables

Using drag-and-drop
Many of the BASIC Transformer stage edits can be made simpler by using the Transformer Editor's drag-and-drop functionality.

About this task


You can drag columns from any link to any other link. Common uses are:
- Copying input columns to output links
- Moving columns within a link
- Copying derivations in output links

Procedure
1. Click the source cell to select it.
2. Click the selected cell again and, without releasing the mouse button, drag the mouse pointer to the desired location within the target link. An insert point appears on the target link to indicate where the new cell will go.
3. Release the mouse button to drop the selected cell.

Results
You can drag multiple columns or derivations. Use the standard Explorer keys when selecting the source column cells, then proceed as for a single cell. You can drag the full column set by dragging the link title. You can add a column to the end of an existing derivation by holding down the Ctrl key as you drag the column.

Find and replace facilities


About this task
If you are working on a complex job where several links, each containing several columns, go in and out of the BASIC Transformer stage, you can use the find/replace column facility to help locate a particular column or expression and change it. The find/replace facility enables you to:
- Find and replace a column name
- Find and replace expression text
- Find the next empty expression
- Find the next expression that contains an error

To use the find/replace facilities, do one of the following:
- Click the find/replace button on the toolbar
- Choose find/replace from the link shortcut menu
- Type Ctrl-F

The Find and Replace dialog box appears. It has three tabs:
- Expression Text. Allows you to locate the occurrence of a particular string within an expression, and replace it if required. You can search up or down, and choose to match case, match whole words, or neither. You can also choose to replace all occurrences of the string within an expression.
- Column Names. Allows you to find a particular column and rename it if required. You can search up or down, and choose to match case, match the whole word, or neither.
- Expression Types. Allows you to find the next empty expression or the next expression that contains an error. You can also press Ctrl-M to find the next empty expression or Ctrl-N to find the next erroneous expression.

Note: The find and replace results are shown in the color specified in Tools > Options.

Press F3 to repeat the last search you made without opening the Find and Replace dialog box.

Select facilities
If you are working on a complex job where several links, each containing several columns, go in and out of the Transformer stage, you can use the select column facility to select multiple columns. This facility is also available in the Mapping tabs of certain Parallel job stages.

About this task


The select facility enables you to:
- Select all columns/stage variables whose expressions contain text that matches the text specified.
- Select all columns/stage variables whose name contains the text specified (and, optionally, matches a specified type).
- Select all columns/stage variables with a certain data type.
- Select all columns with missing or invalid expressions.

To use the select facilities, choose Select from the link shortcut menu. The Select dialog box appears. It has three tabs:
- Expression Text. The Expression Text tab allows you to select all columns/stage variables whose expressions contain text that matches the text specified. The text specified is a simple text match, taking into account the Match case setting.
- Column Names. The Column Names tab allows you to select all columns/stage variables whose name contains the text specified. There is an additional Data Type drop-down list that limits the columns selected to those with that data type. You can use the Data Type drop-down list on its own to select all columns of a certain data type. For example, all string columns can be selected by leaving the text field blank and selecting String as the data type. The data types in the list are generic data types, and each of the column SQL data types belongs to one of these generic types.
- Expression Types. The Expression Types tab allows you to select all columns with either empty expressions or invalid expressions.
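The selection rules of the three tabs can be modelled as simple filters over a list of columns. The Python sketch below is illustrative only: the `Column` class, the function names, and the generic type labels are hypothetical, case-insensitive name matching is an assumption (the text only mentions Match case for Expression Text), and invalid-expression detection is not modelled because it would require a BASIC parser.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    generic_type: str  # hypothetical generic type label, e.g. "String"
    expression: str    # the column's derivation text


def select_by_expression_text(columns, text, match_case=False):
    """Expression Text tab: simple substring match on the derivation."""
    if match_case:
        return [c for c in columns if text in c.expression]
    return [c for c in columns if text.lower() in c.expression.lower()]


def select_by_name(columns, text="", data_type=None):
    """Column Names tab: name contains text, optionally of one data type.

    Case-insensitive name matching is an assumption of this sketch.
    An empty text matches every name, so data_type alone filters by type.
    """
    return [
        c for c in columns
        if text.lower() in c.name.lower()
        and (data_type is None or c.generic_type == data_type)
    ]


def select_empty_expressions(columns):
    """Expression Types tab, empty case only (invalid ones not modelled)."""
    return [c for c in columns if not c.expression.strip()]


cols = [
    Column("CustName", "String", "DSLink3.name"),
    Column("CustId", "Numeric", "DSLink3.id"),
    Column("City", "String", ""),
]
# Blank text plus a data type: the "select all string columns" example.
print([c.name for c in select_by_name(cols, data_type="String")])
```

The last call mirrors the example in the text: leaving the name text blank and choosing String selects every string column.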

Creating and deleting columns


About this task
You can create columns on links to the BASIC Transformer stage using any of the following methods:
- Select the link, then click the load column definition button in the toolbar to open the standard load columns dialog box.
- Use drag-and-drop or copy and paste functionality to create a new column by copying from an existing column on another link.
- Use the shortcut menus to create a new column definition.
- Edit the grids in the link's metadata tab to insert a new column.

When copying columns, a new column is created with the same metadata as the column it was copied from.

To delete a column from within the Transformer Editor, select the column you want to delete and click the cut button or choose Delete Column from the shortcut menu.

Moving columns within a link


About this task
You can move columns within a link using either drag-and-drop or cut and paste. Select the required column, then drag it to its new location, or cut it and paste it in its new location.

Editing column meta data


About this task
You can edit column metadata from within the grid in the bottom of the Transformer Editor. Select the tab for the link metadata that you want to edit, then use the standard InfoSphere DataStage edit grid controls. The metadata shown does not include column derivations, since these are edited in the links area.

Defining output column derivations


You can define the derivation of output columns from within the Transformer Editor in five ways.

About this task


Choose one of the following ways to define the derivation of output columns from within the Transformer Editor:
- If you require a new output column to be derived directly from an input column, with no transformations performed, you can use drag-and-drop or copy and paste to copy an input column to an output link. The output columns will have the same names as the input columns from which they were derived.
- If the output column already exists, you can drag or copy an input column to the output column's Derivation field. This specifies that the column is derived directly from an input column, with no transformations performed.
- You can use the column auto-match facility to automatically set output columns to be derived from their matching input columns.
- You might need one output link column derivation to be the same as another output link column derivation. In this case you can use drag-and-drop or copy and paste to copy the derivation cell from one column to another.
- In many cases you will need to transform data before deriving an output column from it. For these purposes you can use the Expression Editor. To display the Expression Editor, double-click the required output link column Derivation cell. (You can also invoke the Expression Editor using the shortcut menu or the shortcut keys.)

If a derivation is displayed in red (or the color defined in Tools > Options), the Transformer Editor considers it incorrect. (In some cases this might simply mean that the derivation does not meet the strict usage pattern rules of the server engine, but it will actually function correctly.)

Once an output link column has a derivation defined that contains any input link columns, a relationship line is drawn between the input column and the output column. There can be multiple relationship lines into or out of a column.
You can choose whether to view the relationships for all links, or just the relationships for the selected links, using the Show all button in the toolbar.

Column auto-match facility


About this task
This time-saving feature allows you to automatically set columns on an output link to be derived from matching columns on an input link. Using this feature you can fill in all the output link derivations to route data from corresponding input columns, then go back and edit individual output link columns where you want a different derivation.

Procedure
1. Do one of the following:
   - Click the Auto-match button in the Transformer Editor toolbar.
   - Choose Auto-match from the input link header or output link header shortcut menu.
   The Column Auto-Match dialog box appears.
2. Choose the input link and output link that you want to match columns for from the drop-down lists.
3. Click Location match or Name match from the Match type area. If you choose Location match, this sets output column derivations to the input link columns in the equivalent positions. It starts with the first input link column going to the first output link column, and works its way down until there are no more input columns left.
4. Click OK to proceed with the auto-matching.

Note: Auto-matching does not take into account any data type incompatibility between matched columns; the derivations are set regardless.
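The two match types can be sketched in Python as follows. Location match follows the positional rule described above. The text does not describe Name match in detail, so treating it as an exact name comparison is an assumption, as are the function name `auto_match` and the "Link.Column" derivation format used here.

```python
def auto_match(in_link, input_cols, output_cols, match_type="location"):
    """Return {output column: derivation}, with derivations as 'Link.Column'.

    No data type checking is done, mirroring the note that auto-matching
    ignores type incompatibility between matched columns.
    """
    if match_type == "location":
        # First input -> first output, and so on; zip() stops when either
        # list runs out, so surplus columns are left unmatched.
        return {out: f"{in_link}.{inp}" for inp, out in zip(input_cols, output_cols)}
    if match_type == "name":
        # Assumption: an output column matches an input column with exactly
        # the same name; unmatched output columns get no derivation.
        names = set(input_cols)
        return {out: f"{in_link}.{out}" for out in output_cols if out in names}
    raise ValueError(f"unknown match type: {match_type!r}")


print(auto_match("DSLink3", ["id", "name", "city"], ["cust_id", "name"]))
print(auto_match("DSLink3", ["id", "name", "city"], ["cust_id", "name"], "name"))
```

Location match happily maps "id" to "cust_id" because only position matters, which is why it is worth checking the results (including data types) before accepting them.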


Editing multiple derivations


About this task
You can make edits across several output column or stage variable derivations by choosing Derivation Substitution... from the shortcut menu. This opens the Expression Substitution dialog box.

The Expression Substitution dialog box allows you to make the same change to the expressions of all the currently selected columns within a link. For example, if you wanted to add a call to the trim() function around all the string output column expressions in a link, you could do this in two steps. First, use the Select dialog box to select all the string output columns. Then use the Expression Substitution dialog box to apply a trim() call around each of the existing expression values in those selected columns.

You are offered a choice between Whole expression substitution and Part of expression substitution:
- Whole expression. With this option the whole existing expression for each column is replaced by the replacement value specified.
- Part of expression. With this option, only part of each selected expression is replaced rather than the whole expression. The part of the expression to be replaced is specified by a Regular Expression match.

Whole expression
With this option the whole existing expression for each column is replaced by the replacement value specified.

About this task


This replacement value can be a completely new value, but will usually be a value based on the original expression value. When specifying the replacement value, the existing value of the column's expression can be included in this new value by including "$1". This can be included any number of times.

For example, to add a trim() call around each expression of the currently selected column set, select the required columns and then use the following procedure.

Procedure
1. Select the Whole expression option.
2. Enter a replacement value of:
trim($1)

3. Click OK.

Results
Where a column's original expression was:
DSLink3.col1

This will be replaced by:


trim(DSLink3.col1)

This is applied to the expressions in each of the selected columns. If you need to include the literal text $1 in your expression, enter it as "$$1".
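The Whole expression rules can be modeled as a small string transformation. The following is a minimal Python sketch, assuming that every "$1" in the replacement value stands for the column's existing expression and that "$$1" yields the literal text "$1". The function names and the dictionary of derivations are hypothetical; this is not how the Designer client itself implements the dialog.

```python
def substitute_whole(original: str, replacement: str) -> str:
    """Return the new derivation for one column (assumed semantics)."""
    marker = "\x00"  # temporary stand-in, assumed never to occur in expressions
    # "$$1" means the literal text "$1", so protect it before expanding.
    protected = replacement.replace("$$1", marker)
    # Every remaining "$1" stands for the column's existing expression.
    expanded = protected.replace("$1", original)
    return expanded.replace(marker, "$1")


def apply_to_selection(derivations: dict, selected: list, replacement: str) -> dict:
    """Apply the same substitution to each selected column; leave the rest alone."""
    return {
        column: substitute_whole(expr, replacement) if column in selected else expr
        for column, expr in derivations.items()
    }


derivations = {"col1": "DSLink3.col1", "col2": "DSLink3.col2", "col3": "DSLink3.col3"}
print(apply_to_selection(derivations, ["col1", "col3"], "trim($1)"))
# {'col1': 'trim(DSLink3.col1)', 'col2': 'DSLink3.col2', 'col3': 'trim(DSLink3.col3)'}
```

Only the selected columns change, which mirrors the requirement to select the columns before opening the dialog.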


4.8.2. Part of expression
With this option, only part of each selected expression is replaced rather than the whole expression. The part of the expression to be replaced is specified by a Regular Expression match.

About this task


More than one part of an expression string might match the Regular Expression specified. If Replace all occurrences is checked, each match is updated with the replacement value specified. If it is not checked, only the first match is replaced.

When replacing part of an expression, the replacement value can include the part of the original expression being replaced. To do this, put round brackets around the Regular Expression specified; "$1" in the replacement value then represents the matched text. If the Regular Expression is not surrounded by round brackets, "$1" is simply the literal text "$1".

For complex Regular Expression usage, you can put round brackets around subsets of the Regular Expression text rather than around the whole text. In this case, the entire matched part of the original expression is still replaced, but "$1", "$2", and so on refer to the text matched by each bracketed part of the Regular Expression.

The following is an example of Part of expression replacement. Suppose a selected set of columns have derivations that use input columns from DSLink3. For example, two of these derivations could be:
DSLink3.OrderCount + 1
If (DSLink3.Total > 0) Then DSLink3.Total Else -1

You might want to protect the usage of these input columns from null values, and use a zero value instead of the null. Use the following procedure to do this.

Procedure
1. Select the columns you want to substitute expressions for.
2. Select the Part of expression option.
3. Specify a Regular Expression value of:
(DSLink3\.[a-z,A-Z,0-9]*)

4. Specify a replacement value of:
NullToZero($1)

5. Click OK to apply this to all the selected column derivations.

Results
From the examples above:
DSLink3.OrderCount + 1

would become:
NullToZero(DSLink3.OrderCount) + 1

and
If (DSLink3.Total > 0) Then DSLink3.Total Else -1

would become:


If (NullToZero(DSLink3.Total) > 0) Then DSLink3.Total Else -1

If the Replace all occurrences option is selected, the second expression will become:
If (NullToZero(DSLink3.Total) > 0) Then NullToZero(DSLink3.Total) Else -1

The replacement value can be any form of expression string. For example, in the case above, the replacement value could have been:
(If (StageVar1 > 50000) Then $1 Else ($1 + 100))

In the first case above, the expression


DSLink3.OrderCount + 1

would become:
(If (StageVar1 > 50000) Then DSLink3.OrderCount Else (DSLink3.OrderCount + 100)) + 1
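The Part of expression behavior can be approximated with an ordinary regular-expression engine. The sketch below uses Python's re module as a stand-in for whatever engine the Designer client uses, so dialect details may differ. It assumes that "$1", "$2", and so on refer to bracketed groups, that "$N" stays literal when there is no group N, and that leaving Replace all occurrences unchecked replaces only the first match. The function names are hypothetical. The pattern is the one from the procedure; note that in Python's dialect the commas inside [a-z,A-Z,0-9] are literal characters, so [a-zA-Z0-9] is the tighter class if commas should never match.

```python
import re


def _expand(template: str, match: re.Match) -> str:
    """Replace $1, $2, ... in the template with the text of the matching groups."""
    def group_ref(ref: re.Match) -> str:
        index = int(ref.group(1))
        if index <= match.re.groups:
            return match.group(index) or ""
        return ref.group(0)  # no such bracketed group: keep "$N" as plain text
    return re.sub(r"\$([1-9])", group_ref, template)


def substitute_part(expression: str, pattern: str, replacement: str,
                    replace_all: bool = False) -> str:
    """Replace the first match (or every match) of pattern in expression."""
    return re.sub(pattern, lambda m: _expand(replacement, m), expression,
                  count=0 if replace_all else 1)


PATTERN = r"(DSLink3\.[a-z,A-Z,0-9]*)"
expr = "If (DSLink3.Total > 0) Then DSLink3.Total Else -1"
print(substitute_part(expr, PATTERN, "NullToZero($1)"))
# If (NullToZero(DSLink3.Total) > 0) Then DSLink3.Total Else -1
print(substitute_part(expr, PATTERN, "NullToZero($1)", replace_all=True))
# If (NullToZero(DSLink3.Total) > 0) Then NullToZero(DSLink3.Total) Else -1
```

A callable replacement is used instead of Python's own "\1" template syntax so that the "$N" convention from the dialog can be reproduced directly, including the literal fallback when the pattern has no brackets.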


4.9. Specifying before-stage and after-stage subroutines


About this task
You can specify BASIC routines to be executed before or after the stage has processed the data. To specify a routine, click the stage properties button in the toolbar to open the Stage Properties dialog box. The General tab contains the following fields:
Before-stage subroutine and Input Value. Contain the name (and value) of a subroutine that is executed before the stage starts to process any data.
After-stage subroutine and Input Value. Contain the name (and value) of a subroutine that is executed after the stage has processed the data.

Choose a routine from the drop-down list box. This list box contains all the built routines defined as a Before/After Subroutine under the Routines branch in the Repository. Enter an appropriate value for the routine's input argument in the Input Value field.


If you choose a routine that is defined in the Repository but was edited and not compiled, a warning message reminds you to compile the routine when you close the Transformer stage dialog box.

If you installed or imported a job, the Before-stage subroutine or After-stage subroutine field might reference a routine that does not exist on your system. In this case, a warning message appears when you close the dialog box. You must install or import the "missing" routine or choose an alternative one to use.

A return code of 0 from the routine indicates success; any other code indicates failure and causes a fatal error when the job is run.


4.10. Defining constraints and handling reject links


You can define a constraint to limit the rows written to an output link. You can also specify reject links.

About this task


You can define limits for output data by specifying a constraint. Constraints are expressions, and you can specify a constraint for each output link from a Transformer stage. You can also specify that a particular link is to act as a reject link. Reject links output rows that have not been written on any other output links from the Transformer stage, either because they have failed constraints or because a write failure has occurred.

To define a constraint or specify an otherwise (reject) link, do one of the following:
Select an output link and click the constraints button.
Double-click the output link's constraint entry field.
Choose Constraints from the background or header shortcut menus.

A dialog box appears which allows you either to define constraints for any of the Transformer output links or to define a link as a reject link.

Define a constraint by entering an expression in the Constraint field for that link. Once you have done this, any constraints appear below the link's title bar in the Transformer Editor. The constraint expression is then checked against the row data at run time. If the data does not satisfy the constraint, the row is not written to that link. You can also define a link which catches the rows that have been rejected from a previous link.

A reject link can be defined by choosing Yes in the Reject Row field and setting the Constraint field as follows:

To catch rows which are rejected from a specific output link, set the Constraint field to linkname.REJECTED. This flag is set whenever a row is rejected on the linkname link, whether because the row fails to match a constraint on that output link or because a write operation on the target fails for that row. Such an otherwise link should occur after the output link whose rejects it is defined to catch.

To catch rows which caused write failures on an output link, set the Constraint field to linkname.REJECTEDCODE. The value of linkname.REJECTEDCODE is non-zero if the row was rejected due to a write failure, or 0 (DSE.NOERROR) if the row was rejected because the link constraint was not met. When editing the Constraint field, you can set return values for linkname.REJECTEDCODE by selecting from the Expression Editor Link Variables > Constants... menu options. These give a range of errors, but note that most write errors return DSE.WRITERROR.

To set a reject constraint which differentiates between a write failure and a constraint not being met, use a combination of the linkname.REJECTEDCODE and linkname.REJECTED flags. For example:
To catch rows which have failed to be written to an output link, set the Constraint field to linkname.REJECTEDCODE.
To catch rows which do not meet a constraint on an output link, set the Constraint field to linkname.REJECTEDCODE = DSE.NOERROR AND linkname.REJECTED.
To catch rows which have been rejected due to a constraint or a write error, set the Constraint field to linkname.REJECTED.

As a "catch all", the Constraint field can be left blank. This indicates that this otherwise link will catch all rows which have not been successfully written to any of the output links processed up to this point. Therefore, the otherwise link should be the last link in the defined processing order.

Any other Constraint can be defined. This results in the number of rows written to that link (that is, rows which satisfy the constraint) being recorded in the job log as "rejected rows".

Note: Due to the nature of the "catch all" case above, you should use only one reject link whose Constraint field is blank. To use multiple reject links, define them to use the linkname.REJECTED flag detailed in the first case above.
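The interaction between link order, constraints, and the REJECTED and REJECTEDCODE flags can be simulated for a single row. The following Python sketch is a simplified model under stated assumptions, not the engine's implementation: links are processed in the defined order, a blank constraint on a reject link catches the row only if no earlier link wrote it, and DSE_WRITERROR is given an arbitrary non-zero placeholder value because the real constant's value is not stated here. All link names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# 0 stands for DSE.NOERROR as stated in the text; the value used for
# DSE.WRITERROR is an arbitrary non-zero placeholder.
DSE_NOERROR = 0
DSE_WRITERROR = 1

Row = Dict[str, object]


@dataclass
class LinkState:
    rejected: bool = False            # models linkname.REJECTED
    rejected_code: int = DSE_NOERROR  # models linkname.REJECTEDCODE


def always_writes(row: Row) -> bool:
    return True


@dataclass
class OutputLink:
    name: str
    # Sees the row and the states of links processed so far; None = blank field.
    constraint: Optional[Callable[[Row, Dict[str, LinkState]], bool]] = None
    reject: bool = False                         # Reject Row = Yes
    write: Callable[[Row], bool] = always_writes  # simulated target write


def route_row(row: Row, links: List[OutputLink]) -> List[str]:
    """Return the names of the links a row is written to, in link order."""
    states: Dict[str, LinkState] = {}
    written_to: List[str] = []
    for link in links:
        if link.constraint is not None:
            selected = link.constraint(row, states)
        elif link.reject:
            # Blank "catch all": only rows not written by any earlier link.
            selected = not written_to
        else:
            selected = True  # ordinary link without a constraint takes every row
        state = LinkState()
        if not selected:
            state.rejected = True  # constraint not met; code stays DSE_NOERROR
        elif link.write(row):
            written_to.append(link.name)
        else:
            state.rejected = True  # write failure
            state.rejected_code = DSE_WRITERROR
        states[link.name] = state
    return written_to


links = [
    OutputLink("BigOrders", constraint=lambda r, s: r["Total"] > 100),
    OutputLink("BigOrdersConstraintRejects", reject=True,
               constraint=lambda r, s: s["BigOrders"].rejected_code == DSE_NOERROR
               and s["BigOrders"].rejected),
    OutputLink("CatchAll", reject=True),
]
print(route_row({"Total": 500}, links))  # ['BigOrders']
print(route_row({"Total": 5}, links))    # ['BigOrdersConstraintRejects']
```

If the write to BigOrders fails instead, REJECTEDCODE becomes non-zero, the constraint-only reject link skips the row, and the blank catch-all link receives it. Because each constraint can only see the states of earlier links, moving CatchAll ahead of BigOrders would change which rows it catches.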


4.11. Specifying link order


You can specify links to be in a particular order.


About this task


You can specify the order in which output links process a row. The initial order of the links is the order in which they are added to the stage.

Procedure
1. Do one of the following:
Click the output link execution order button on the Transformer Editor toolbar.
Choose output link reorder from the background shortcut menu.
Click the stage properties button in the Transformer toolbar or choose stage properties from the background shortcut menu, and click the Link Ordering tab on the Stage page.
The Link Ordering tab appears.
2. Use the arrow buttons to rearrange the list of links in the execution order required.
3. When you are happy with the order, click OK.

Note: Although the link ordering facilities mean that you can use a previous output column to derive a subsequent output column, this is not recommended, and you will receive a warning if you do so.


4.12. Defining local stage variables


You can declare a stage variable.

About this task


You can declare and use your own variables within a BASIC Transformer stage. Such variables are accessible only from the BASIC Transformer stage in which they are declared. They can be used as follows:
They can be assigned values by expressions.
They can be used in expressions which define an output column derivation.
Expressions evaluating a variable can include other variables or the variable being evaluated itself.

Any stage variables you declare are shown in a table in the right pane of the links area. The table looks similar to an output link. You can display or hide the table by clicking the Stage Variable button in the Transformer toolbar or choosing Stage Variable from the background shortcut menu.

Note: Stage variables are not shown in the output link meta data area at the bottom of the right pane.

The table lists the stage variables together with the expressions used to derive their values. Link lines join the stage variables with input columns used in the expressions. Links from the right side of the table link the variables to the output columns that use them.
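Because a stage variable's expression can refer to the variable itself, the value it reads must come from somewhere. The following Python sketch assumes the usual model: each variable starts from its initial value on the Variables tab, keeps its value from one row to the next, and the variables are evaluated in the order they are listed. The variable names (RunningTotal, IsNewCustomer, PrevCustId) and input columns are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# A derivation reads the current input row and the current stage variable values.
Derivation = Callable[[Row, Dict[str, Any]], Any]


def run_stage_variables(
    rows: List[Row], declarations: List[Tuple[str, Any, Derivation]]
) -> List[Dict[str, Any]]:
    """Evaluate stage variables row by row; return their values after each row."""
    values = {name: initial for name, initial, _ in declarations}
    history = []
    for row in rows:
        for name, _, derive in declarations:
            # A variable not yet re-derived for this row (itself, or one listed
            # later) still holds the previous row's value or its initial value.
            values[name] = derive(row, values)
        history.append(dict(values))
    return history


# Hypothetical variables: a running total and a change-detection flag.
declarations = [
    ("RunningTotal", 0, lambda r, v: v["RunningTotal"] + r["Amount"]),
    ("IsNewCustomer", 0, lambda r, v: 1 if r["CustId"] != v["PrevCustId"] else 0),
    ("PrevCustId", "", lambda r, v: r["CustId"]),
]
rows = [
    {"CustId": "A", "Amount": 10},
    {"CustId": "A", "Amount": 5},
    {"CustId": "B", "Amount": 1},
]
history = run_stage_variables(rows, declarations)
print([h["RunningTotal"] for h in history])   # [10, 15, 16]
print([h["IsNewCustomer"] for h in history])  # [1, 0, 1]
```

Listing PrevCustId after IsNewCustomer is what lets the comparison see the previous row's customer; reversing the two would make IsNewCustomer always 0 under this model.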

Procedure
1. Do one of the following:
Click the stage properties button in the Transformer toolbar.
Choose stage properties from the background shortcut menu.
The Transformer Stage Properties dialog box appears.
2. Click the Variables tab on the General page. The Variables tab contains a grid showing currently declared variables, their initial values, and an optional description. Use the standard grid controls to add new variables. Variable names must begin with an alphabetic character (a-z, A-Z) and can contain only alphanumeric characters (a-z, A-Z, 0-9). Ensure that the variable name is not a BASIC keyword.
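The naming rule above is simple enough to check before typing names into the grid. The following Python sketch is a hypothetical helper, not part of DataStage. Its keyword set is only an illustrative subset, taken from the keywords that appear in the expression grammar in the expression editor section; the full BASIC reserved-word list is longer. The comparison is case-insensitive because BASIC keywords can be written in any case.

```python
import re

# Illustrative subset only: keywords that appear in the expression grammar.
KEYWORD_SAMPLE = frozenset({
    "AND", "OR", "IF", "THEN", "ELSE", "MATCHES",
    "EQ", "NE", "GT", "GE", "LT", "LE",
})

# First character alphabetic, remaining characters alphanumeric (ASCII only).
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_stage_variable_name(name: str, keywords=KEYWORD_SAMPLE) -> bool:
    """Check the character rules, then reject names that are keywords."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    return name.upper() not in keywords


for candidate in ["StageVar1", "1StageVar", "Stage_Var", "then"]:
    print(candidate, is_valid_stage_variable_name(candidate))
# StageVar1 True
# 1StageVar False
# Stage_Var False
# then False
```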

Results
Variables entered in the Stage Properties dialog box appear in the Stage Variable table in the links pane. You can perform most of the same operations on a stage variable as on an output column (see Defining Output Column Derivations). A shortcut menu offers the same commands. You cannot, however, paste a stage variable as a new column, or a column as a new stage variable.


5. The InfoSphere DataStage expression editor


The InfoSphere DataStage Expression Editor helps you to enter correct expressions when you edit BASIC Transformer stages. The Expression Editor can:
Facilitate the entry of expression elements.
Complete the names of frequently used variables.
Validate variable names and the complete expression.

The Expression Editor can be opened from:
Output link Derivation cells
Stage variable Derivation cells
Constraint dialog box
Transform dialog box in the Designer


5.1. Expression format
The format of an expression is as follows:
KEY:
something_like_this is a token
something_in_italics is a terminal, that is, does not break down any further
| is a choice between tokens
[ ] encloses an optional part of the construction
"XXX" is a literal token (that is, use XXX not including the quotes)
=================================================
expression ::= function_call | variable_name | other_name | constant | unary_expression | binary_expression | if_then_else_expression | substring_expression | "(" expression ")"
function_call ::= function_name "(" [argument_list] ")"
argument_list ::= expression | expression "," argument_list
function_name ::= name of a built-in function | name of a user-defined function
variable_name ::= job_parameter_name | stage_variable_name | link_variable_name
other_name ::= name of a built-in macro, system variable, and so on
constant ::= numeric_constant | string_constant
numeric_constant ::= ["+" | "-"] digits ["." [digits]] ["E" | "e" ["+" | "-"] digits]
string_constant ::= "'" [characters] "'" | """ [characters] """ | "\" [characters] "\"
unary_expression ::= unary_operator expression
unary_operator ::= "+" | "-"
binary_expression ::= expression binary_operator expression
binary_operator ::= arithmetic_operator | concatenation_operator | matches_operator | relational_operator | logical_operator
arithmetic_operator ::= "+" | "-" | "*" | "/" | "^"
concatenation_operator ::= ":"
matches_operator ::= "MATCHES"
relational_operator ::= "=" | "EQ" | "<>" | "#" | "NE" | ">" | "GT" | ">=" | "=>" | "GE" | "<" | "LT" | "<=" | "=<" | "LE"
logical_operator ::= "AND" | "OR"
if_then_else_expression ::= "IF" expression "THEN" expression "ELSE" expression
substring_expression ::= expression "[" expression ["," expression] "]"
field_expression ::= expression "[" expression "," expression "," expression "]" /* That is, always 3 args */

Note: Keywords such as "AND", "IF", or "EQ" can be written in any case.
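The constant rules in the grammar are regular enough to check with regular expressions. The following Python sketch is a hypothetical helper, not part of DataStage. It reads the exponent part of numeric_constant as "E" or "e" followed by an optional sign and digits, accepts the three string delimiters listed in string_constant, and maps the relational operator synonyms from the grammar to one canonical keyword, upper-casing the input because keywords can be written in any case.

```python
import re

# Reading assumed here: exponent = ("E" | "e") [sign] digits.
NUMERIC_CONSTANT = re.compile(r"[+-]?[0-9]+(\.[0-9]*)?([Ee][+-]?[0-9]+)?")

# Three delimiter styles: single quote, double quote, or backslash.
SINGLE = r"'[^']*'"
DOUBLE = r'"[^"]*"'
BACKSLASH = r"\\[^\\]*\\"
STRING_CONSTANT = re.compile("|".join([SINGLE, DOUBLE, BACKSLASH]))

# Synonyms listed under relational_operator, mapped to one keyword each.
RELATIONAL_SYNONYMS = {
    "=": "EQ", "EQ": "EQ",
    "<>": "NE", "#": "NE", "NE": "NE",
    ">": "GT", "GT": "GT",
    ">=": "GE", "=>": "GE", "GE": "GE",
    "<": "LT", "LT": "LT",
    "<=": "LE", "=<": "LE", "LE": "LE",
}


def is_numeric_constant(text: str) -> bool:
    return NUMERIC_CONSTANT.fullmatch(text) is not None


def is_string_constant(text: str) -> bool:
    return STRING_CONSTANT.fullmatch(text) is not None


def canonical_relational(op: str) -> str:
    """Normalize a relational operator; raises KeyError if it is not one."""
    return RELATIONAL_SYNONYMS[op.upper()]


print(is_numeric_constant("-12.5E+3"), is_numeric_constant(".5"))  # True False
print(is_string_constant("\\abc\\"), canonical_relational("=<"))   # True LE
```

Under this reading ".5" is rejected because the rule requires at least one digit before the decimal point, while "12." is accepted because the digits after the point are optional.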


5.2. Entering expressions
About this task
Whenever the insertion point is in an expression box, you can use the Expression Editor to suggest the next element in your expression. Do this by right-clicking the box, or by clicking the Suggest button to the right of the box. This opens the Suggest Operand or Suggest Operator menu. Which menu appears depends on context, that is, whether you should be entering an operand or an operator as the next expression element. The selection offered on the Suggest Operand menu depends on whether you are defining key expressions, derivations and constraints, or a custom transform. The Suggest Operator menu is always the same.


5.3. Completing variable names


About this task
The Expression Editor stores variable names. When you enter a variable name you have used before, you can type the first few characters, then press F5. The Expression Editor completes the variable name for you.

If you enter the name of an input link followed by a period, for example, DailySales., the Expression Editor displays a list of the column names of that link. If you continue typing, the list selection changes to match what you type. You can also select a column name using the mouse. Enter a selected column name into the expression by pressing Tab or Enter. Press Esc to dismiss the list without selecting a column name.


5.4. Validating the expression


About this task
When you have entered an expression in the Transformer Editor, press Enter to validate it. The Expression Editor checks that the syntax is correct and that any variable names used are acceptable to the compiler. When using the Expression Editor to define a custom transform, click OK to validate the expression.

If there is an error, a message appears and the element causing the error is highlighted in the expression box. You can either correct the expression or close the Transformer Editor or Transform dialog box.

Within the Transformer Editor, invalid expressions are shown in red. (In some cases this might simply mean that the expression does not meet the strict usage pattern rules of the server engine, but will actually function correctly.)


5.5. Exiting the expression editor


There are a few ways in which you can exit the expression editor.

About this task


You can exit the Expression Editor in the following ways:
Press Esc (which discards changes).
Press Return (which accepts changes).
Click outside the Expression Editor box (which accepts changes).


5.6. Configuring the expression editor


About this task
You can resize the Expression Editor window by dragging. The next time you open the Expression Editor in the same context (for example, editing output columns) on the same client, it will have the same size. The Expression Editor is configured by editing the Designer options, which allows you to specify how helpful the Expression Editor is. For more information, see InfoSphere DataStage Designer Client Guide.


6. BASIC Transformer stage properties


The Transformer stage has a Properties dialog box which allows you to specify details about how the stage operates. This dialog box has three pages:
Stage page. This is used to specify general information about the stage.
Input page. This is where you specify details about the data input to the Transformer stage.
Output page. This is where you specify details about the output links from the Transformer stage.


6.1. BASIC Transformer stage: Stage page


The Stage page has four tabs:
General. Allows you to enter an optional description of the stage and specify a before-stage or after-stage subroutine.
Variables. Allows you to set up stage variables for use in the stage.
Link Ordering. Allows you to specify the order in which the output links will be processed.
Advanced. Allows you to specify how the stage executes.

The General tab is described in "Before-Stage and After-Stage Routines". The Variables tab is described in "Defining Local Stage Variables". The Link Ordering tab is described in "Specifying Link Order".


6.1.1. BASIC Transformer stage: Advanced tab


The Advanced tab is the same as the Advanced tab of the generic stage editor as described in "Advanced Tab". This tab allows you to specify the following:
Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode the data is processed by the available nodes as specified in the Configuration file, and by any node constraints specified on the Advanced tab. In sequential mode the data is processed by the conductor node.
Combinability mode. This is Auto by default, which allows InfoSphere DataStage to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
Preserve partitioning. This is set to Propagate by default, which sets or clears the partitioning in accordance with what the previous stage has set. You can also select Set or Clear. If you select Set, the stage requests that the next stage preserves the partitioning as is.
Node pool and resource constraints. Select this option to constrain parallel execution to the node pool or pools or resource pool or pools specified in the grid. The grid allows you to make choices from drop-down lists populated from the Configuration file.
Node map constraint. Select this option to constrain parallel execution to the nodes in a defined node map. You can define a node map by typing node numbers into the text box or by clicking the browse button to open the Available Nodes dialog box and selecting nodes from there. You are effectively defining a new node pool for this stage (in addition to any node pools defined in the Configuration file).


6.2. BASIC Transformer stage: Input page


The Input page allows you to specify details about data coming into the Transformer stage. The Transformer stage can have only one input link. The General tab allows you to specify an optional description of the input link. The Partitioning tab allows you to specify how incoming data is partitioned. This is the same as the Partitioning tab in the generic stage editor described in "Partitioning Tab".


6.2.1. BASIC Transformer stage: Partitioning tab


The Partitioning tab allows you to specify details about how the incoming data is partitioned or collected when input to the BASIC Transformer stage. It also allows you to specify that the data should be sorted on input.

By default the BASIC Transformer stage will attempt to preserve partitioning of incoming data, or use its own partitioning method according to what the previous stage in the job dictates. If the BASIC Transformer stage is operating in sequential mode, it will first collect the data using the default collection method before processing it. The Partitioning tab allows you to override this default behavior.

The exact operation of this tab depends on:
Whether the stage is set to execute in parallel or sequential mode.
Whether the preceding stage in the job is set to execute in parallel or sequential mode.

If the BASIC Transformer stage is set to execute in parallel, you can set a partitioning method by selecting from the Partitioning type drop-down list. This will override any current partitioning. If the BASIC Transformer stage is set to execute in sequential mode, but the preceding stage is executing in parallel, you can set a collection method from the Collector type drop-down list. This will override the default collection method.

The following partitioning methods are available:
(Auto). InfoSphere DataStage attempts to work out the best partitioning method depending on execution modes of current and preceding stages and how many nodes are specified in the Configuration file. This is the default method for the Transformer stage.
Entire. Each partition receives the entire data set.
Hash. The records are hashed into partitions based on the value of a key column or columns selected from the Available list.
Modulus. The records are partitioned using a modulus function on the key column selected from the Available list. This is commonly used to partition on tag fields.
Random. The records are partitioned randomly, based on the output of a random number generator.
Round Robin. The records are partitioned on a round robin basis as they enter the stage.
Same. Preserves the partitioning already in place.
DB2. Replicates the DB2 partitioning method of a specific DB2 table. Requires extra properties to be set. Access these properties by clicking the properties button.
Range. Divides a data set into approximately equal size partitions based on one or more partitioning keys. Range partitioning is often a preprocessing step to performing a total sort on a data set. Requires extra properties to be set. Access these properties by clicking the properties button.

The following collection methods are available:
(Auto). This is the default method for the Transformer stage. Normally, when you are using Auto mode, InfoSphere DataStage will eagerly read any row from any input partition as it becomes available.
Ordered. Reads all records from the first partition, then all records from the second partition, and so on.
Round Robin. Reads a record from the first input partition, then from the second partition, and so on. After reaching the last partition, the operator starts over.
Sort Merge. Reads records in an order based on one or more columns of the record. This requires you to select a collecting key column from the Available list.

The Partitioning tab also allows you to specify that data arriving on the input link should be sorted. The sort is always carried out within data partitions. If the stage is partitioning incoming data, the sort occurs after the partitioning. If the stage is collecting data, the sort occurs before the collection. The availability of sorting depends on the partitioning method chosen.

Select the check boxes as follows:
Perform Sort. Select this to specify that data coming in on the link should be sorted. Select the column or columns to sort on from the Available list.
Stable. Select this if you want to preserve previously sorted data sets. This is the default.
Unique. Select this to specify that, if multiple records have identical sorting key values, only one record is retained. If stable sort is also set, the first record is retained.

If NLS is enabled, an additional button opens a dialog box allowing you to select a locale specifying the collate convention for the sort.

You can also specify sort direction, case sensitivity, whether sorted as ASCII or EBCDIC, and whether null columns will appear first or last for each column. Where you are using a keyed partitioning method, you can also specify whether the column is used as a key for sorting, for partitioning, or for both. Select the column in the Selected list and right-click to invoke the shortcut menu.
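Several of the partitioning and collection methods described above are simple enough to sketch directly. The following Python functions are illustrative models under stated assumptions, not the engine's implementation: records are dictionaries, CRC-32 stands in for the hash function DataStage uses (which is not specified here), the modulus key is assumed to be an integer column, and sort merge assumes each partition is already sorted on the collecting key. All function names are hypothetical.

```python
import heapq
import zlib
from itertools import zip_longest
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def round_robin_partition(records: Sequence[Record], n: int) -> List[List[Record]]:
    """Deal records to partitions 0, 1, ..., n-1, 0, 1, ... as they arrive."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for i, record in enumerate(records):
        parts[i % n].append(record)
    return parts


def modulus_partition(records: Sequence[Record], key: str, n: int) -> List[List[Record]]:
    """Partition on an integer key column: partition = key value mod n."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for record in records:
        parts[record[key] % n].append(record)
    return parts


def hash_partition(records: Sequence[Record], keys: List[str], n: int) -> List[List[Record]]:
    """Records with equal key values always land in the same partition."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for record in records:
        key_bytes = repr(tuple(record[k] for k in keys)).encode("utf-8")
        parts[zlib.crc32(key_bytes) % n].append(record)  # CRC-32 as a stand-in
    return parts


def ordered_collect(parts: List[List[Any]]) -> List[Any]:
    """All of the first partition, then all of the second, and so on."""
    return [record for part in parts for record in part]


def round_robin_collect(parts: List[List[Any]]) -> List[Any]:
    """One record from each partition in turn, skipping exhausted partitions."""
    missing = object()
    collected: List[Any] = []
    for group in zip_longest(*parts, fillvalue=missing):
        collected.extend(r for r in group if r is not missing)
    return collected


def sort_merge_collect(parts: List[List[Record]], key: str) -> List[Record]:
    """Merge partitions that are each already sorted on the collecting key."""
    return list(heapq.merge(*parts, key=lambda r: r[key]))


records = [{"id": i} for i in range(6)]
print(round_robin_partition(records, 3))
# [[{'id': 0}, {'id': 3}], [{'id': 1}, {'id': 4}], [{'id': 2}, {'id': 5}]]
print(round_robin_collect([[1, 4], [2], [3, 5, 6]]))  # [1, 2, 3, 4, 5, 6]
```

The contrast between ordered_collect and round_robin_collect on the same partitions shows why the collection method matters when row order is significant downstream.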


6.3. BASIC Transformer stage: Output page


The Output page has a General tab which allows you to enter an optional description for each of the output links on the BASIC Transformer stage. The Advanced tab allows you to change the default buffering settings for the output links.
