SAP BO Data Services Transforms
Transforms:
Query
Case
Merge
Row_Generation
Key_Generation
Date_Generation
Effective_Date
Table_Comparison
Hierarchy_Flattening
History_Preserving
Pivot
Reverse Pivot
Map_Operation
Validation
SQL
XML_Map
Data_Transfer
Text Data Processing
1. Query Transform:
Query Transform is similar to a SQL SELECT statement.
It can perform the following operations-
Choose (filter) the data to extract from sources
Join data from multiple sources
Map columns from input to output schemas
Perform transformations and functions on the data
Add new columns, nested schemas, and function results to the
output schema
Assign primary keys to output columns
Different functions can be performed using the Query transform, such as
LOOKUP, AGGREGATE, CONVERSIONS, etc.
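As an illustration only (hypothetical tables and columns, not SQL generated by Data Services), the mappings, join, and filter defined in a Query transform correspond roughly to:
SELECT c."CUST_ID",
       UPPER(c."CUST_NAME") AS CUST_NAME,   -- column mapping with a function applied
       o."ORDER_AMT"        AS ORDER_AMT
FROM   "SRC"."CUSTOMER" c
JOIN   "SRC"."ORDERS"   o ON o."CUST_ID" = c."CUST_ID"   -- join of two sources
WHERE  o."ORDER_AMT" > 0;                                -- filter on the extracted data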
2. Merge Transform:
Merge Transform combines the rows from two or more sources into a
single target
The output schema is the same as the schemas of the input source objects
All the sources should have –
Same number of columns
Same data types of columns
Same column names
The transform does not strip out duplicate rows
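Conceptually, the Merge transform behaves like a SQL UNION ALL, so duplicates are kept; a sketch with hypothetical tables:
SELECT "CUST_ID", "CUST_NAME" FROM "SRC"."CUSTOMER_EU"
UNION ALL                      -- duplicate rows are not removed
SELECT "CUST_ID", "CUST_NAME" FROM "SRC"."CUSTOMER_US";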
3. Case Transform:
Case transform is used to route the input coming from the source to two or
more targets based on a given condition.
Data Inputs
Only one data flow source is allowed.
Data Outputs
The output of the Case transform is connected to another object
in the workspace by choosing a case label from a pop-up menu. Each label represents a case
expression (WHERE clause) created in the Case editor.
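As a rough analogy only (hypothetical labels and columns, not code generated by Data Services), each case label behaves like routing the same input through its own WHERE clause:
-- rows routed to the hypothetical "EU_TARGET" label
SELECT * FROM "SRC"."CUSTOMER" WHERE "REGION" = 'EU';
-- rows routed to the hypothetical "US_TARGET" label
SELECT * FROM "SRC"."CUSTOMER" WHERE "REGION" = 'US';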
4. Row Generation:
This transform doesn’t need an input
Generates a column filled with integer values starting at zero and incrementing by
one to the end value you specify.
You can set the starting row number as per your requirement
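On Oracle, for comparison, a similar sequence of integers could be produced with a query like the following (a sketch only; the end value 100 is hypothetical):
SELECT LEVEL - 1 AS DI_ROW_ID   -- integer values starting at zero
FROM   dual
CONNECT BY LEVEL <= 100;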
5. Effective_Date:
Generates an additional effective-to column based on the primary key's
effective date.
For this, the data input should have an effective date column.
Effective dates allow you to indicate changes to information over time. This
can be used to implement SCD Type 2.
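The effective-to derivation can be pictured with window-function SQL (a hypothetical sketch; the transform computes this internally and assigns a default end date to the latest record):
SELECT "CUST_ID",
       "EFF_DATE",
       LEAD("EFF_DATE") OVER (PARTITION BY "CUST_ID" ORDER BY "EFF_DATE") AS EFF_TO_DATE  -- next effective date closes the current version
FROM   "SRC"."CUSTOMER_HISTORY";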
6. Pivot Transform:
Pivot transform creates a new row for each value in the columns that we
identify as pivot columns.
It can rearrange the data into a simpler and more manageable form, with
all data in a single column, without losing the category information.
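For intuition only (a hypothetical SALES_WIDE table; not what Data Services generates), pivoting two quarterly columns into rows is comparable to:
SELECT "REGION", 'Q1' AS QUARTER, "Q1_SALES" AS SALES FROM "SRC"."SALES_WIDE"
UNION ALL
SELECT "REGION", 'Q2' AS QUARTER, "Q2_SALES" AS SALES FROM "SRC"."SALES_WIDE";
-- each pivot column becomes its own row, and the category (REGION) is kept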
7. Reverse Pivot Transform:
Reverse pivot transform creates a single row of data from several existing
rows
It allows us to combine data from several rows into a single row by
creating new columns
It can rearrange the data into a more searchable form without losing the
category information
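Conversely (again a hypothetical sketch, not generated code), the Reverse Pivot result resembles conditional aggregation in SQL:
SELECT "REGION",
       MAX(CASE WHEN "QUARTER" = 'Q1' THEN "SALES" END) AS Q1_SALES,
       MAX(CASE WHEN "QUARTER" = 'Q2' THEN "SALES" END) AS Q2_SALES
FROM   "SRC"."SALES_LONG"
GROUP BY "REGION";   -- several rows per region collapse into one row with new columns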
select e.empno, e.ename, e.sal, d.dname
from emp e, dept d
where d.loc in ('NEW YORK', 'DALLAS')
  and e.deptno = d.deptno
  and e.empno in (select e.empno
                  from emp e
                  where e.job in ('MANAGER', 'ANALYST') and e.comm is null)
order by d.loc asc;
SAVE the work.
Run the JOB.
Target table's data after the JOB run.
Date_Generation transform options:
Start Date: 2000.01.01
End Date: 2100.12.31
Increment: Daily
Bring the Query_Transform and map the columns as shown.
Map the columns as shown below:
DATE: Date_Generation.DI_GENERATED_DATE
WEEK: week_in_year(DI_GENERATED_DATE)
MONTH: month(DI_GENERATED_DATE)
QUARTER: quarter(Date_Generation.DI_GENERATED_DATE)
YEAR: year(DI_GENERATED_DATE)
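For reference, a comparable date dimension could be sketched in plain Oracle SQL (a hypothetical query, not what the job pushes down; the BODS functions above do the same work inside the data flow, and TO_CHAR(d, 'IW') uses ISO week numbering, which may differ from week_in_year):
SELECT d                            AS "DATE",
       TO_NUMBER(TO_CHAR(d, 'IW')) AS WEEK,
       EXTRACT(MONTH FROM d)       AS MONTH,
       TO_NUMBER(TO_CHAR(d, 'Q'))  AS QUARTER,
       EXTRACT(YEAR FROM d)        AS YEAR
FROM  (SELECT DATE '2000-01-01' + LEVEL - 1 AS d
       FROM dual
       CONNECT BY DATE '2000-01-01' + LEVEL - 1 <= DATE '2100-12-31');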
Basically, we use table comparison for target-based change data capture (CDC).
It is used to capture data that is present in the source but not in the target, and/or data that has
changed in the source compared to the target.
There are many methods available for source-based and target-based CDC.
Follow the link below (for Oracle DB) for the
same: https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm
1.1 Row-by-row Select
We can choose this option in the following cases:
1.1.1 Option 1 - Normal operations
Select this option to have the transform look up the target table using SQL every time it receives an
input row. This option is best if the target table is large compared to the number of rows the transform
will receive as input. Make sure the appropriate indexes exist on the lookup columns in the target
table for optimal performance.
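In this mode, each incoming row conceptually triggers a lookup of the form below (a sketch only, reusing the target table from the later auto-correct examples and assuming ACCOUNT_NO is the comparison key):
SELECT "ACCOUNT_NO", "STAT"
FROM   "DS_1"."TEST_AUTO_CORRECT_LOAD"      -- comparison (target) table
WHERE  "ACCOUNT_NO" = :incoming_account_no; -- executed once per input row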
1.1.2 Option 2 - When trailing blanks need to be considered during comparison
During comparison, if a field value in the source has trailing blanks, BODS will treat it as a
different value, i.e. BODS will not apply any trim function to that particular field.
Example:
Consider an account number '1234 ' (i.e. with a trailing space) in the source and '1234' in the target. In such a
case, BODS considers the two account numbers as different in row-by-row comparison, but in
cached comparison it considers both account numbers to be the same. The latter is explained
below.
1.2 Cached Comparison:
Select this option to load the comparison table into memory. In this case, queries to the comparison
table access memory rather than the actual table. However, the table must fit in the available memory.
This option is best when the table fits into memory and you are comparing the entire target table.
With the help of this option, BODS captures the target data into an internal cache, which is much
faster to access.
1.3 Sorted input:
Often the most efficient solution when dealing with large data sources, because DS reads the
comparison table only once. This option can only be selected when it is guaranteed that the incoming
data are sorted in exactly the same order as the primary key in the comparison table. In most cases
incoming data must be pre-sorted, e.g. using a Query transform with an Order-by (that may be
pushed down to the underlying database), to take advantage of this functionality.
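The pre-sort mentioned above corresponds to an ORDER BY on the comparison table's primary key in the preceding Query transform; as a sketch (assuming ACCOUNT_NO is the primary key and reusing the source table from the examples below):
SELECT "ACCOUNT_NO", "STAT"
FROM   "DS_2"."TABLE1"
ORDER BY "ACCOUNT_NO";   -- must match the primary-key order of the comparison table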
2) Auto-Correct Load option:
Auto-correct load is used to avoid loading duplicate data into the target table using SAP BODS.
Basically, auto-correct load is used to implement SCD Type 1 when we are not using the table comparison
feature of SAP BODS. There are many options available when using auto-correct load which can
help to optimize performance.
If you do not choose any auto-correct load option, BODS will generate a simple insert query.
The snapshot and the query generated by BODS are shown below:
Query: (BODS generates an insert query, so it may insert duplicate data.)
INSERT /*+ APPEND */ INTO "DS_1"."TEST_AUTO_CORRECT_LOAD"
( "ACCOUNT_NO" , "STAT" )
SELECT "TABLE1"."ACCOUNT_NO" ACCOUNT_NO , "TABLE1"."STAT" STAT
FROM "DS_2"."TABLE1" "TABLE1";
Many options are available with BODS auto-correct load. Some of them are explained in the sections
below:
2.1 Allow merge set to Yes & Ignore Columns with null set to No
While going for auto-correct load, if you set 'Allow merge' to Yes and 'Ignore columns with null'
to No, BODS will generate a MERGE query to maintain SCD Type 1; null values coming from the source are not ignored.
So by setting 'Allow merge' to Yes, BODS will insert the new rows coming from the source into the
target and update the existing rows in the target.
Snapshot for same is as below:
Query generated by BODS job is as follows:
MERGE INTO "DS_1"."TEST_AUTO_CORRECT_LOAD" s
USING
(SELECT "TABLE_1"."ACCOUNT_NO" ACCOUNT_NO , "TABLE_1"."STAT" STAT
FROM "PSEUDO_PROD"."TABLE_1" "TABLE_1"
) n
ON ((s.ACCOUNT_NO = n.ACCOUNT_NO))
WHEN MATCHED THEN
UPDATE SET s."STAT" = n.STAT
WHEN NOT MATCHED THEN
INSERT /*+ APPEND */ (s."ACCOUNT_NO", s."STAT" )
VALUES (n.ACCOUNT_NO , n.STAT)
Here, the query generated by BODS is Merge query which will insert the new rows coming from
source and update the existing rows present in target.
This query will be pushed down to database hence it will be an optimized one.
2.2 Allow merge set to Yes & Ignore Columns with null set to Yes
Query generated by BODS job is as follows:
MERGE INTO "DS_1"."TEST_AUTO_CORRECT_LOAD" s
USING
(SELECT "TABLE_1"."ACCOUNT_NO" ACCOUNT_NO , "TABLE_1"."STAT" STAT
FROM "PSEUDO_PROD"."TABLE_1" "TABLE_1"
) n
ON ((s.ACCOUNT_NO = n.ACCOUNT_NO))
WHEN MATCHED THEN
UPDATE SET s."STAT" = NVL(n.STAT,S."STAT")
WHEN NOT MATCHED THEN
INSERT /*+ APPEND */ (s."ACCOUNT_NO", s."STAT" )
VALUES (n.ACCOUNT_NO , n.STAT)
Snapshot for the same is as follows:
As seen in the snapshot above, BODS adds the NVL function so that a null value coming from the source does not overwrite the existing target value.
If you select ‘Allow Merge to No and Auto correct load to Yes’ then BODS will generate PL/SQL code
which will again be very helpful when considering performance.
Below is the code generated by BODS:
BEGIN
  DECLARE
    CURSOR s_cursor IS
      SELECT "TABLE_1"."ACCOUNT_NO" ACCOUNT_NO , "TABLE_1"."STAT" STAT
      FROM "PSEUDO_PROD"."TABLE_1" "TABLE_1";
    s_row s_cursor%ROWTYPE;
    CURSOR t_cursor(p_ACCOUNT_NO s_row.ACCOUNT_NO%TYPE) IS
      SELECT "ACCOUNT_NO" ACCOUNT_NO, "STAT" STAT, rowid
      FROM "DS_1"."TEST_AUTO_CORRECT_LOAD"
      WHERE (p_ACCOUNT_NO = "ACCOUNT_NO");
    t_row t_cursor%ROWTYPE;
    commit_count NUMBER;
  BEGIN
    commit_count := 0;
    :processed_row_count := 0;
    FOR r_reader IN
      (SELECT "TABLE_1"."ACCOUNT_NO" ACCOUNT_NO , "TABLE_1"."STAT" STAT
       FROM "PSEUDO_PROD"."TABLE_1" "TABLE_1"
      ) LOOP
      OPEN t_cursor(r_reader.ACCOUNT_NO);
      FETCH t_cursor INTO t_row;
      IF t_cursor%NOTFOUND THEN
        INSERT INTO "DS_1"."TEST_AUTO_CORRECT_LOAD"("ACCOUNT_NO", "STAT" )
        VALUES (r_reader.ACCOUNT_NO , r_reader.STAT);
        commit_count := commit_count + 1;
        :processed_row_count := :processed_row_count + 1;
      ELSE
        LOOP
          UPDATE "DS_1"."TEST_AUTO_CORRECT_LOAD" SET
            "STAT" = r_reader.STAT
          WHERE rowid = t_row.rowid;
          commit_count := commit_count + 1;
          :processed_row_count := :processed_row_count + SQL%ROWCOUNT;
          IF (commit_count = 1000) THEN
            COMMIT; commit_count := 0;
          END IF;
          FETCH t_cursor INTO t_row;
          EXIT WHEN t_cursor%NOTFOUND;
        END LOOP;
      END IF;
      CLOSE t_cursor;
      IF (commit_count = 0) THEN
        COMMIT; commit_count := 0;
      END IF;
    END LOOP;
    COMMIT;
  END;
END;
Snapshot for same is as below.
3) Array fetch & Rows per commit options
3.1 Array fetch size:
The array fetch feature lowers the number of database requests by "fetching" multiple rows (an array)
of data with each request. The number of rows to be fetched per request is entered in the Array fetch
size option on any source table editor or SQL transform editor. The default setting is 1000, which
means that with each database request, the software will automatically fetch 1000 rows of data from
the source database. The maximum array fetch size that can be specified is 5000 rows.
Suggestion while using array fetch size option:
The optimal number for Array fetch size depends on the size of your table rows (the
number and type of columns involved) as well as the network round-trip time involved in
the database requests and responses. If your computing environment is very powerful
(which means that the computers running the Job Server, related databases, and
connections are extremely fast), then try higher values for Array fetch size and test the
performance of your jobs to find the best setting.
3.2 Rows per commit:
‘Rows per commit’ specifies the transaction size in number of rows. If set to 1000, Data
Integrator sends a commit to the underlying database for every 1000 rows.
‘Rows per commit’ for regular loading defaults to 1000 rows. Setting the Rows per
commit value significantly affects job performance. Adjust the rows per commit value in
the target table editor's Options tab, noting the following rules:
Do not use negative numbers or other non-numeric characters.
If you enter nothing or 0, the text box will automatically display 1000.
If you enter a number larger than 5000, the text box automatically displays 5000.
It is recommended that you set rows per commit between 500 and 2000 for best performance. You
might also want to calculate a value. To do this, use the following formula:
max_IO_size/row size (in bytes)
For most platforms, max_IO_size is 64K. For Solaris, max_IO_size is 1024K.
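For example, assuming a max_IO_size of 64K (65,536 bytes) and an average row size of roughly 128 bytes, the formula gives 65,536 / 128 = 512 rows per commit, which falls within the recommended 500 to 2000 range.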
Note that even with a value greater than one set for Rows per commit, SAP Data Services will submit
data one row at a time if the following conditions exist:
You are loading into a database (this scenario does not apply to Oracle databases), and have a
column with a LONG datatype attribute.
You are using an overflow file where the transaction failed. However, once all the rows are loaded
successfully, the commit size reverts to the number you entered. In this case, depending on how often
a load error happens, performance might become worse than setting Rows per commit to 1.
Let us consider a scenario where different array fetch size and rows per commit values are chosen.
Let's say you set the array fetch size to 100 and rows per commit to 500, and the total
number of rows in the source is 800. Suppose the job terminates after processing 700 rows; then the job
will have committed only 500 rows to the target, because the rows per commit value is set to 500. The remaining
200 rows will not enter the target, even though they were already fetched by the job. This
leaves incomplete data in the target.
So, setting the array fetch size and rows per commit values is very important while designing a job.
As both tables are from different databases, BODS generates multiple SQL statements, as shown
in the snapshot below.
Also, a DB link is present between the two databases (the DB link is edwrep_to_pseudoprod). This link
helps us generate push-down SQL, thereby producing an optimized SQL statement.
In order to achieve this, the following advanced setting needs to be made in the Datastore in BODS.
Right-click on the DS_EDW_REP Datastore -> Edit -> Advanced -> scroll down to Linked Datastores.
Click on Linked Datastores and you will get the following window:
Choose the datastore that needs to be linked with the other datastore; in this case, it is DS_1.
Press OK. Provide the proper DB link name from the dropdown.
Press OK again. After saving the job, check the optimized SQL generated by BODS, which is as
follows.
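The pushed-down statement itself is shown in the snapshot; as a rough sketch of its shape (hypothetical, reusing the table names from the earlier examples), it becomes a single INSERT ... SELECT that reaches the remote table through the DB link:
INSERT INTO "DS_1"."TEST_AUTO_CORRECT_LOAD" ("ACCOUNT_NO", "STAT")
SELECT "TABLE_1"."ACCOUNT_NO", "TABLE_1"."STAT"
FROM   "PSEUDO_PROD"."TABLE_1"@edwrep_to_pseudoprod "TABLE_1";  -- remote source reached via the DB link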
Add the job servers into the predefined server group and click Apply.