DataStage Scenario Questions

The document outlines various data processing requirements and solutions using DataStage, SQL, and other methods. It includes tasks such as aggregating data, pivoting, removing duplicates, and managing file inputs and outputs. Each requirement is accompanied by a detailed step-by-step solution, showcasing how to implement the desired data transformations and manipulations.

Req-1# I have my input data as below #

input data:
eid|ename|sal
101|abc|200
102|xyz|300
103|lmn|500

Find the output data as below:

eid|ename|sal|total_Sal
101|abc|200|1000
102|xyz|300|1000
103|lmn|500|1000

In the Transformer stage, take 2 output links and create an additional common column on each link, as in the screenshot below.

In the Aggregator stage, provide the values as below.

How can we implement the same scenario in SQL?


select
a.eid,a.ename,a.sal,b.total_sal
from (select eid,ename,sal,1 as cc1 from agg_data) a
inner join (select 1 as cc1,sum(sal) total_sal from agg_data) b
on a.cc1=b.cc1;

(Or)

SELECT EID, ENAME, SAL, SUM(SAL) OVER() TOTAL_SAL FROM AGG_DATA;

# Req-2#
Input :
source,destination,km
hyd,bang,1000
delhi,chennai,1500
chennai,bang,600
bang,hyd,1000
bombay,pune,1000
bang,chennai,600
Output :
source,destination,km
hyd,bang,1000
delhi,chennai,1500
chennai,bang,600
bombay,pune,1000
Here the hyd-to-bang distance is 1000 km and another row holds bang-to-hyd with the same 1000 km, so one row of each such reversed pair needs to be deleted.
Solution:

Step1: Read the source data as is


Step2: Sort the data on the key columns using the Sort stage, as below.

Step3: Set the Transformer stage to run sequentially instead of in parallel.

Step4: Apply the logic as above, then compile and run the job.
Ref: https://datastageinfoguide.blogspot.com/2014/02/datastage-scenario-based-questionanswer.html
DATASTAGE SCENARIO BASED QUESTION - 2: CITY NAMES AND DISTANCE BETWEEN THEM: PROBLEM AND SOLUTION - Wings Of Technology

2nd method Scenario:

SOURCE,DESTINATION,DISTANCE
HYD,CHN,500
CHN,HYD,500
BANG,HYD,600
HYD,BANG,600
PUN,HYD,750
HYD,PUN,750
CHN,BANG,500
BANG,CHN,500

Expected output:

HYD,CHN,500
BANG,HYD,600
PUN,HYD,750
CHN,BANG,500

Step1: read the data from sequential file stage

Step2: Make sure the Transformer stage execution mode is set to Sequential instead of Parallel.

Step3: Create 2 stage variables to hold the temporary values; no initial value is needed, leave them empty.

Step4) Create the below stage variable logic

Sv1) If sv2 = DSLink21.DESTINATION :DSLink21.SOURCE Then 0 Else 1

Sv2) DSLink21.SOURCE : DSLink21.DESTINATION

Step5) use the sv1 filter in Transformer constraint level, Compile and run the job
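
The same de-duplication can be sketched in SQL as well (a minimal sketch, assuming a table named dist_data with columns source, destination, distance; LEAST/GREATEST normalise each reversed pair so both directions fall into one partition):

select source, destination, distance
from (select d.*,
             row_number() over (partition by least(source, destination),
                                             greatest(source, destination),
                                             distance
                                order by rownum) rn
        from dist_data d)
where rn = 1;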

Req-3# Achieve pivoting using the Transformer stage & the Pivot stage?


Input data:

city state name1 name2 name3


xyz fgh Sam Dean winchester

Output:

city state name


Xyz fgh sam
Xyz fgh Dean
Xyz fgh winchester

Ref: https://datastageinfoguide.blogspot.com/2013/10/transformer-looping-functions-for.html

Using Transformer stage:

Step1) Read the source data

Step2) In the Transformer stage, write the loop based on the number of columns that need to be appended:

Loop While = @ITERATION <=3

Loopvar = If @ITERATION=1 Then DSLink3.name1 Else If @ITERATION=2 Then DSLink3.name2 Else DSLink3.name3

Step3) compile and run

Using Oracle SQL Query:

/* can achieve by using Unpivot in Oracle */

create table pvt_data(city varchar(10), state varchar(10), name1 varchar(10), name2 varchar(10), name3 varchar(3));

insert into pvt_data values ('xyz','fgh','sam','dean','win');

commit;

select * from (select city, state, name1, name2, name3 from pvt_data)
unpivot (name for src_col in (name1, name2, name3));

(Or)

/* we can achieve using UNION ALL as well */

select city,state, name1 as name from pvt_data


union all
select city,state, name2 as name from pvt_data
union all
select city,state, name3 as name from pvt_data;

Req-4# Using Pivot Stage?

Step1) read the source data


step2) Select Pivot Type = Horizontal
step3) Go to Pivot Properties -> create a new column and, in its Derivation, provide the list of columns that need to be pivoted

Step4) Compile and run the job.

Req-5# Without a key column, pivot the output data into the required
format.

Input Data :

NAME
-----------
IBM
WEBSPHERE
DATASTAGE
IBM
INFOSPHERE
DATASTAGE

Required Output
NAME
----------
IBM WEBSPHERE DATASTAGE
IBM INFOSPHERE DATASTAGE

Solution

Step1) Read the Single column data

Step2) In the Transformer stage, create an additional column and give the logic in its derivation

Derivation:

If @INROWNUM <=3 then 1 Else (Floor( (@INROWNUM - 1) /3) + 1)

Step3) Take a Pivot Enterprise stage and change Pivot Type = Vertical

Step4) set the pivot options:

Step5) In the output tab, in mapping sub tab, map the required columns

Step6) compile and run the job

NAME
IBM WEBSPHERE DATASTAGE
IBM INFOSPHERE DATASTAGE
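
An equivalent SQL sketch (assuming Oracle, a single-column table name_data, and that rownum reflects the file order):

select listagg(name, ' ') within group (order by rn) as name
from (select name, rownum rn, ceil(rownum / 3) grp
        from name_data)
group by grp;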

Req-5# Convert the below data from columns to rows (using Pivot)
Input:

CUSTOMER_ID CUSTOMER_NAME JAN_EXPENSE FEB_EXPENSE MAR_EXPENSE
1 UMA 100 200 300
2 POOJITHA 200 300 400

Output Data:

CUSTOMER_ID CUSTOMER_NAME Q1EXPENSE

1 UMA 100

1 UMA 200

1 UMA 300

2 POOJITHA 200

2 POOJITHA 300

2 POOJITHA 400

Ref - https://datastageinfoguide.blogspot.com/2014/02/pivot-enterprise-stage-horizontal_11.html

This logic can be implemented using the Pivot Enterprise stage, and SQL queries can do it as well.

Practice it from your side.
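
As a hedged SQL sketch (assuming Oracle and a table named cust_expense with the columns shown above):

select customer_id, customer_name, q1expense
from cust_expense
unpivot (q1expense for expense_month in (jan_expense, feb_expense, mar_expense));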

Req-6# Scenario:
I have a source like
COL1
A
A
B
B
B
C

TARGET LIKE
COL1 COL2
A 1
A 2
B 1
B 2
B 3
C 1

HOW TO ACHIEVE THIS OUTPUT USING STAGE VARIABLE IN TRANSFORMER STAGE?

Solution)

Step1) Read the source data

Step2) In the Sort stage, enable Create Key Change Column = True. It will generate the
additional keyChange column in the output.

Step3) In Transform Stage, create a stage variable like

Sv1) If lnk_keyChangeCol.keyChange =1 Then 1 Else sv1 +1

Step4) Compile and run the job
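
In SQL the same group-wise numbering is a windowed ROW_NUMBER (table name src_data assumed):

select col1,
       row_number() over (partition by col1 order by col1) as col2
from src_data;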

Req-7#) We have a source which is a sequential file with header and footer.
How to remove the header and footer while reading this file using sequential
file stage of Datastage?
Sol: Run this command (e.g. in PuTTY, or via a before-job ExecSH subroutine): sed '1d;$d' file_name > new_file_name, then read the new file in the Sequential File stage.

Req-8#) Suppose a sequencer controls 4 jobs (job 1, job 2, job 3, job 4). Job 1 has 10,000 rows;
after the run only 5,000 rows have been loaded into the target table, the rest are not loaded,
and the job aborts. How can you sort out the problem?
Sol: Suppose the job sequencer synchronizes or controls 4 jobs and job 1 has a problem. In this
condition, go to the Director and check what type of problem the logs show: a data-type problem,
a warning message, a job failure, or a job abort. A job failure usually means a data-type problem
or a missing column action.

Then go to the Run window -> Tracing -> Performance, or in your target table -> General -> Action,
where there are two options:

(i) On Fail -- Commit, Continue

(ii) On Skip -- Commit, Continue

First check how many rows have already been loaded, then choose On Skip with Continue so the loaded
rows are skipped; for the remaining rows that were not loaded, choose On Fail with Continue. Run the
job again and it should finish with a success message.

Req-9#) I want to process 3 files sequentially, one by one. How can I do that? While processing, it should fetch the files automatically. (discussed at the end)

Ans: If the metadata for all the files is the same, then create a job having the file name as a parameter,
then use the same job in a routine and call the job with a different file name each time, or you can create
a sequencer to use the job.

Parameterize the file name.
Build the job using that parameter.
Build a job sequencer which will call this job and will accept the parameter for the file name.
Write a UNIX shell script which will call the job sequencer three times, passing a different file each time.

Req-10#Q) Runtime column propagation (RCP):


If RCP is enabled for a job, and specifically for those stages whose output connects to the
shared container input, then metadata will be propagated at run time, so there is no need to
map it at design time.
If RCP is disabled for the job, OSH has to perform an import and export every time the job runs,
and the processing time of the job also increases.

You then have to manually enter all the column descriptions in each stage. (RCP = Runtime
Column Propagation)

Req-11#) Question:scenario
Source:              Target:
Eno  Ename           Eno  Ename
1    a,b             1    a
2    c,d             2    c
3    e,f             3    e

Sol)

Step 1) Read the data


Step2) In the Transformer stage use the Field() function, e.g. Field(DSLink.Ename, ',', 1)

Step3) Compile and Run the job

Req-12#Question – scenario:
source has 2 fields like

COMPANY LOCATION
IBM HYD
TCS BAN
IBM CHE
HCL HYD
TCS CHE
IBM BAN
HCL BAN
HCL CHE

output:
COMPANY LOCATION COUNT
TCS BAN 2
TCS CHE 2
IBM CHE 3
IBM HYD 3
IBM BAN 3
HCL HYD 4
HCL BAN 4
HCL CHE 4

Step1) Read the source data

Step2) Take copy stage and take 2 output links send the data to each link.

Step3) In aggregator stage aggregate Based on Company and take the

Aggregation Type = Count rows

Step4) In the Lookup stage do the inner join: set Lookup Failure = Drop (inner-join behaviour)

Step5) Compile and run the job
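
In SQL the same result needs no second pass; a windowed COUNT does it in one query (table name company_data assumed):

select company, location,
       count(*) over (partition by company) as cnt
from company_data;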

Req-13#Scenario:
input is like this:
no,char
1,a
2,b
3,a
4,b
5,a
6,a
7,b
8,a

output:
no,char,Count
"1","a","1"
"6","a","2"
"5","a","3"
"8","a","4"
"3","a","5"
"2","b","1"
"7","b","2"
"4","b","3"

This scenario has already been implemented: using the stage-variable concept in the Transformer stage
we generate the sequence number within each group.
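
For reference, a SQL sketch of the same group-wise numbering (table name src assumed; the column is renamed char_col since CHAR is a reserved word, and the row order inside each group may differ from the listing above):

select no, char_col,
       row_number() over (partition by char_col order by no) as cnt
from src;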

Req-14#scenario:
Input is like this:
file1
10
20
10
10
20
30

Output is like:
file2 file3(duplicates)
10 10
20 10
30 20

Here, using the Sort stage, generate the key change column; based on that column, send the unique
records to one target and the duplicates to another target.
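
A SQL sketch of the same split (single-column table t1 assumed):

-- file2: one row per distinct value
select file1
from (select file1, row_number() over (partition by file1 order by rownum) rn from t1)
where rn = 1;

-- file3: the extra (duplicate) occurrences
select file1
from (select file1, row_number() over (partition by file1 order by rownum) rn from t1)
where rn > 1;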

Req-15#scenario:

Input is like:
file1
10
20
10
10
20
30
40
50

Output: multiple occurrences in one file and single occurrences in another file:
file2 file3
10 30
10 40
10 50
20
20

Solution

Step1) Read the data,

Step2) In Copy stage take 2 output links and separate them

Step3) in aggregator stage, do the count calculation.

Step4) In transformer stage

Step5) Do the inner join using Lookup Stage

Step6) Compile and run the job,

In sql:

select file1,count(*) from t1 group by file1 having count(*)=1;


select file1,count(*) from t1 group by file1 having count(*)>1;

Req-16#scenario:
Input is like this:
file1
10
20
10
10
20
30

Output is like:
file2 file3
10 30
20

This scenario is similar to the one above, but in the duplicates target, remove the duplicates again.

Req-17#scenario:
Input is like this:
file1
1
2
3
4
5
6
7
8
9
10

Output is like:
file2(odd) file3(even)
1 2
3 4
5 6
7 8
9 10

Step1) Read the source data

Step2) Just sort the data based on source key

Step3) In transformer stage use Mod function, and apply in constraint to separate to even
target and odd target

Mod(AsInteger(DSLink5.file1),2)=1 -- Odd records


Mod(AsInteger(DSLink5.file1),2)=0 -- Even records

Step4) Compile & Run
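
The same split in SQL (table t1 assumed):

select file1 from t1 where mod(file1, 2) = 1;  -- odd records
select file1 from t1 where mod(file1, 2) = 0;  -- even records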

Req-18#) How to calculate Sum(sal), Avg(sal), Min(sal), Max(sal) without


using Aggregator stage?

Req-19#) How to find the first sal and last sal in each dept without
using the Aggregator stage?
Sol: Sort the data by dept and compare with Transformer stage variables (or use SQL, as shown below).
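
If SQL is an option, a sketch against the usual EMP table:

-- Req-18: aggregates without any DataStage stage
select sum(sal), avg(sal), min(sal), max(sal) from emp;

-- Req-19: first and last sal per dept via window functions
select distinct deptno,
       first_value(sal) over (partition by deptno order by empno) as first_sal,
       last_value(sal)  over (partition by deptno order by empno
                              rows between unbounded preceding
                                       and unbounded following) as last_sal
from emp;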

Req-20#) How many ways are there to perform the remove-duplicates function
without using the Remove Duplicates stage? ***
- Using the Sort stage itself, 3 ways:
1. Set Allow Duplicates = False
2. Create Key Change Column = True (then filter on it)
3. Link sort with remove duplicates
- On the input link: select the 'Hash' partition type > enable 'Perform sort'
> enable the 'Unique' option
- Using the Transformer stage with stage variables:
1. Take 2 stage variables and compare them

Req-21#scenario:
The input is
Shirt|red|blue|green
Pant|pink|red|blue

Output should be,

Shirt:red
Shirt:blue
Shirt:green
pant:pink
pant:red
pant:blue

Sol)

In the transformer Stage:

Compile and Run the job
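
An equivalent SQL sketch (assuming a table garments(item, c1, c2, c3) holding the three colour columns):

select item || ':' || c1 as result from garments
union all
select item || ':' || c2 from garments
union all
select item || ':' || c3 from garments;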

Req-22#Scenario:
source
col1 col3
1 samsung
1 nokia
1 ercisson
2 iphone
2 motrolla
3 lava
3 blackberry
3 reliance

Expected Output
col1 col2 col3 col4
1 samsung nokia ercisson
2 iphone motrolla
3 lava blackberry reliance

Sol) design the job using Pivot and Transformer stage

Design Tip:

SeqFileStage --> PivotStage (Vertical Pivot) --> Target

Req-23#Scenario:
12)Consider the following employees data as source?
employee_id, salary
-------------------
10, 1000
20, 2000
30, 3000
40, 5000

Create a job to find the sum of salaries of all employees and this sum should repeat for all
the rows.

The output should look like as

employee_id, salary, salary_sum


-------------------------------
10, 1000, 11000
20, 2000, 11000
30, 3000, 11000
40, 5000, 11000

This kind of scenario has already been done in the above examples.

Req-24#Scenario:
I have two source tables/files, numbered 1 and 2.
In the target there are three output tables/files, numbered 3, 4 and 5.

The scenario is that,

to output 4 -> the records which are common to both 1 and 2 should go.

to output 3 -> the records which are only in 1 but not in 2 should go.

to output 5 -> the records which are only in 2 but not in 1 should go.

Req-25# Scenario:
sno,sname,mark1,mark2,mark3
1,rajesh,70,68,79
2,mamatha,39,45,78
3,anjali,67,39,78
4,pavani,89,56,45
5,indu,56,67,78

out put is
sno,sname,mark1,mark2,mark3,delimetercount
1,rajesh,70,68,79,4
2,mamatha,39,45,78,4
3,anjali,67,39,78,4
4,pavani,89,56,45,4
5,indu,56,67,78,4

Design Tip:

1) Read the data using sequential file stage in a single column with fixed width
2) In transformer stage use ‘Dcount’ function to find the delimiter counts and keep it in
different column

SequentialFileStage --> TransformerStage --> Target
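
In SQL the delimiter count can be sketched with REGEXP_COUNT (assuming the raw line lands in a single column named line of a staging table raw_lines):

select line, regexp_count(line, ',') as delimetercount
from raw_lines;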

Req-26#scenario:
sname total_vowels_count
Allen 2
Scott 1
Ward 1

Under Transformer Stage Description:

total_Vowels_Count = Count(DSLink3.last_name,"a") + Count(DSLink3.last_name,"e")
+ Count(DSLink3.last_name,"i") + Count(DSLink3.last_name,"o")
+ Count(DSLink3.last_name,"u")
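
The same count in SQL (Oracle assumed; a table emp_names is an assumed name, and LOWER() guards against mixed case):

select sname,
       length(sname) - length(translate(lower(sname), 'xaeiou', 'x')) as total_vowels_count
from emp_names;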

Req-28# Scenario:
1) Daily we get some huge data files; all the files' metadata is the same and we have to load
them into the target table. How can we load them?
Use File Pattern in the Sequential File stage.

2) One column has 10 records; at run time we have to send the 5th and 6th records
to the target. How can we send them?
This can be done using a UNIX command in the Sequential File stage filter option, e.g. sed -n '5,6p'.

How can we get 18 months of date data in the Transformer stage?


Use a Transformer stage after the input Sequential File stage and try this as a constraint in the
Transformer stage:

DaysSinceFromDate(CurrentDate(), DSLink3.date_18)<=548 OR
DaysSinceFromDate(CurrentDate(), DSLink3.date_18)<=546

where date_18 is the column holding the date which needs to be less than or equal to
18 months old; 548 is the number of days in 18 months, and for a period spanning a leap year it
is 546 (verify these numbers for your calendar rules).
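
The SQL equivalent avoids the day arithmetic entirely (table and column names assumed):

select *
from src_table
where date_col >= add_months(trunc(sysdate), -18);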

Diff b/w Compile and Validate?


The Compile option only checks all mandatory requirements such as link requirements, stage
options, and so on. It does not check whether the database connections are valid.

Validate is equivalent to running a job except for the extraction/loading of data. That is, the
Validate option tests database connectivity by making connections to the databases.

Req-29# Scenario:
In my i/p source I have N no. of records.

In the output I have 3 targets.

I want the o/p such that the 1st rec goes to the 1st target,
the 2nd rec goes to the 2nd target,
the 3rd rec goes to the 3rd target, and again
the 4th rec goes to the 1st target ............ like this.

Do this "without using partition techniques" - remember it.

source ---> trans ---> target
In the Transformer, use these conditions in the constraints:
mod(empno,3)=1
mod(empno,3)=2
mod(empno,3)=0

Req-30#) Scenario:
I am having i/p as
col A
a_b_c
x_F_I
DE_GH_IF

we have to make it as 3 columns like below


col1  col2  col3
a     b     c
x     F     I
DE    GH    IF

In the Transformer stage, write the Field() function to separate the columns, e.g. Field(colA, '_', 1), Field(colA, '_', 2), Field(colA, '_', 3).
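
In SQL the split can be sketched with REGEXP_SUBSTR (table src, column colA assumed):

select regexp_substr(colA, '[^_]+', 1, 1) as col1,
       regexp_substr(colA, '[^_]+', 1, 2) as col2,
       regexp_substr(colA, '[^_]+', 1, 3) as col3
from src;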

Req-31# Scenario 2:
The following is the existing job design, but the requirement changed: the header and trailer
datasets should be populated even if no detail records are present in the source file. The job
below does not do that.

Hence the above job was changed to meet the requirement:

A Row Generator was used with a Copy stage, giving a default value (zero) for the count column coming
from the Row Generator. If there are no detail records, the record count is picked from the Row Generator.

Req-32# How can we get the first 100 sample records from sequential file stage?
Use Read First Rows = 100

Req-33# I am reading a file with the Sequential File stage, so by default it reads the data
sequentially. If I need to run the job in parallel, which option can I select?

Enable Number of Readers Per Node, or set Read From Multiple Nodes = 2.

Req-34# From the below data where ever ‘abc’ is there just filter those records
from the input file using sequential file stage?

empno,ename,job,sal
5500,abc,xyz,2500
5502,lmn,aaa,2600
5504,abc,xyz,2700
5506,lmn,aaa,2800
5508,abc,xyz,2900
5510,lmn,aaa,3000
5512,abc,xyz,3100
5514,lmn,aaa,3200

Output:
empno,ename,job,sal
5500,abc,xyz,2500
5504,abc,xyz,2700
5508,abc,xyz,2900
5512,abc,xyz,3100

Solution: In the Sequential File stage set Filter = grep 'abc'

Req-35# I am getting the files from the source with header and footer, I wanted to
remove those header and footer and then need to load the data using sequential
file stage?

Source Data:

empdata
5500,abc,xyz,2500
5502,lmn,aaa,2600
5504,abc,xyz,2700
5506,lmn,aaa,2800
5508,abc,xyz,2900
5510,lmn,aaa,3000
5512,abc,xyz,3100
5514,lmn,aaa,3200
totalrecords:10

Output:
5500,abc,xyz,2500
5502,lmn,aaa,2600
5504,abc,xyz,2700
5506,lmn,aaa,2800
5508,abc,xyz,2900
5510,lmn,aaa,3000
5512,abc,xyz,3100
5514,lmn,aaa,3200

In the Sequential File stage, set the Filter option to: sed '1d;$d' (1d deletes the header line, $d deletes the footer line)

Req-36# How to handle the Fixed width flat file using sequential file stage?

Source:
5500abcxyz2500
5502lmnaaa2600
5504abcxyz2700
5506lmnaaa2800
5508abcxyz2900
5510lmnaaa3000

Target:
Empno,ename,job,sal
5500,abc,xyz,2500
5502,lmn,aaa,2600
5504,abc,xyz,2700
5506,lmn,aaa,2800
5508,abc,xyz,2900
5510,lmn,aaa,3000

Solution:

In the first sequential Files stage:

Compile and run the job
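
If the fixed-width line lands in the database as a single column, the split can also be sketched in SQL (staging table raw_fixed with column line is an assumed name; the offsets match the sample rows):

select substr(line, 1, 4)  as empno,
       substr(line, 5, 3)  as ename,
       substr(line, 8, 3)  as job,
       substr(line, 11, 4) as sal
from raw_fixed;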

Req-37# How to handle null-value records in the Sequential File stage?
In the Format tab set Null field value = '' (or a chosen marker), and in the Columns tab set
Nullable = Yes for all the columns.

How to replace the ^M character in the vi editor / with sed?

sed -e "s/^M//g" old_file_name > new_file_name (type ^M as Ctrl-V Ctrl-M)

Req-38#) Scenario: Get the next row's column value in the current row (LEAD SQL scenario)

Input file :
Sq,No
1,1000
2,2200
3,3030
4,5600

Output File :

Sq,No,No2
1,1000,2200
2,2200,3030
3,3030,5600
4,5600,NULL

Solution Design :

a) Job Design :

Below is the design which can achieve the output we need. Here we read the seq file as input,
then the data passes through a Sort and a Transformer stage to achieve the output.

b) Sort Stage Properties


In the Sort stage we sort the data based on column "Sq" in descending order.

c) Transformer Stage Properties

Here we took 2 stage variables, StageVar and StageVar1, evaluated in this order:
StageVar1 = StageVar      (the No value stored from the previous row)
StageVar = DSLink6.No     (the current row's No)
and the output column No2 derives from StageVar1.

d) Output file
Before capturing the data into the Sequential File stage, we sort the data again in ascending order to get
the output as needed.

In SQL:

select ename, job, sal, lead(sal) over (order by hiredate, sal) sal_next from emp1;

select segment, country, product, sale_date, gross_sales, discount,
       lead(gross_sales) over (order by sale_date, gross_sales) gross_next
from financial;

Req-38.1#) Scenario: Get the current row's column value in the next row (LAG SQL scenario)
Input file :
Sq,No
1,1000
2,2200
3,3030
4,5600

Output File :

Sq,No,No2
1,1000,null
2,2200,1000
3,3030,2200
4,5600,3030
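
In SQL (the LAG mirror of the LEAD query shown in Req-38; table name t1 assumed):

select sq, no, lag(no) over (order by sq) as no2
from t1;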

What is the difference between 'Validated OK' and 'Compiled' in
DataStage?
When you compile a job, it ensures that basic things like all the important stage parameters have been set,
mappings are correct, etc., and then it creates an executable job.

You validate a compiled job to make sure that all the connections are valid, all the job parameters are set,
and a valid output can be expected after running the job. It is like a dry run where you don't actually play
with the live data, but you are confident that things will work.

Req-40#) Scenario:
From the Below source, replace with NewWord where ever word contain abc
Source:
empno,ename,job,sal
1200,abc123,xyz,2500
1202,lmn789,abc,2600
1204,abc,xyz,2700
1206,lmn,aaa,2800
1208,abc,xyz,2900
1210,lmn,abc,3000
1212,abc,xyz,3100
1214,lmn,aaa,3200

Target:
empno,ename,job,sal
1200,NewWord123,xyz,2500
1202,lmn789,NewJob,2600
1204,NewWord,xyz,2700
1206,lmn,aaa,2800
1208,NewWord,xyz,2900
1210,lmn,NewJob,3000
1212,NewWord,xyz,3100
1214,lmn,aaa,3200

Use ‘Change’ Function in transformer stage

In the transformer stage:

Change(DSLink3.ename,'abc','NewWord')

It will replace where ‘abc’ is there with ‘NewWord’

Same functionality we can do it in UNIX & SQL as well Like below


Unix Command:

sed -i 's/old-text/new-text/g' input.txt

On Oracle:
SELECT JOB, REPLACE(JOB,'MANAGER','MNG') JOB1 FROM EMP;

Req-41#) I have a source file; in the output I want the first 8 lines to be empty, the 9th line to be the header,
and the remaining lines to be the source data. How to achieve this?

Job Design:

Step1) Read the source data using Sequential file stage


Step2) From the Transformer stage take 3 output links: one for the header, one for the input data, and one
for the empty rows, and provide the options as in the screenshot below.

For the header, use @INROWNUM = 1 in the constraint (it will emit only 1 record); for the empty rows, use a
constraint such as @INROWNUM <= 8 (one empty row per input row, capped at the 8 empty rows required).

Step3) Use the Column Export stage to export all the column values into a single column, as below

Step4) Finally Funnel all the input

Step5) Compile and Run the job

Req-42#) I have EMP data with a date; in the target I want to load the date with microseconds. How do you
do it?

the default format string for this function is:


"%yyyy-%mm-%dd %hh:%nn:%ss"
Here the default format string does not contain microseconds. If microseconds are
enabled, the format string ends with %ss.6 (where the 6 indicates the
number of decimal places for seconds).

To ensure the stage converts the timestamp string to the Timestamp datatype without
losing the microseconds, the conversion can be coded explicitly in the derivation.
For example, instead of setting the derivation to just the name of the timestamp string,
such as myTimeStamp, change it to:
StringToTimestamp(myTimestamp,"%yyyy-%mm-%dd %hh:%nn:%ss.6")

Req-43#) My input is abc#123. I want abc in one column and 123 in another.

Use the Field() function in the Transformer stage, e.g. Field(col, '#', 1) and Field(col, '#', 2).

Abc#GHJ - I want abc in one column and ghj in another?

This scenario is the same as the one above.

Req-44:
Input :
a
b
c
target:
a
bb
ccc
How can I get this? Could anyone help me to do this.....

SeqFileStage --> TransformerStage --> Target

Hint: Src --> Trns (use the function Str(inputcolumn, @INROWNUM)) --> Trg

Req-45: Get first Row and last row from source file?
My input has a unique column-id with the values

Src
-----
10
20
30
40
50
.....

how can i get first record in one o/p file, last record in another o/p file and rest of the records in 3rd
o/p file?

Sol)
Method 1:

In the Transformer, using constraints, we can achieve this:

1) Link --> @INROWNUM = 1
2) Link --> LastRow()
3) Link --> tick the Otherwise condition

Method 2:
src --> colgen (generate a dummy column with value 1) --> copy (3 outputs)
output 1 --> head
output 2 --> tail
head and tail --> funnel
3rd copy output and funnel --> join (full outer) --> filter (not null on the dummy column)
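
In SQL the three outputs can be sketched with one ranked query (table src_tab, column src assumed):

with t as (select src,
                  row_number() over (order by src) rn,
                  count(*) over () cnt
             from src_tab)
select src,
       case when rn = 1   then 'FIRST'
            when rn = cnt then 'LAST'
            else 'MIDDLE' end as bucket
from t;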

Req-46:

Add Header & Trailer


Example Input:

Atul
Neeraj
Anita
Amruta
Divya
Swapnil
Pramod
Vivek
Ashish
Amit
Santosh

Output

Employee Name
Atul
Neeraj
Anita
Amruta
Divya
Swapnil
Pramod
Vivek
Ashish
Amit
Santosh
Employee Count : 11

Solution & Job design:

Job Design:

Step1: Extract the source data from sequential file stage

Step2: From the Transformer stage give 2 output links

Step3:

Step4:

Step5: From the transformer stage take 3 output links, 1 is for header, 1 is for count, 1 is complete
data.

Step6: Using the Column Export stage, export all the source data into a single column.

Step7: Finally Combine all the data using funnel stage

Step8: Using the Sort stage, sort on the no field to get the header, data, and trailer into order

Step9 Finally load the data into target.

Note: As a similar job, read the source data from a database and create the header and footer records.

Req-47:

Listing projects, jobs, stages, links, and parameters (via the dsjob command-line utility):

For more details:

https://www.datagenx.net/2015/10/dsjob-managing-datastage-jobs-from_6.html

Req-48: Oracle Row_Number(), Rank(), Dense_Rank() & Cumulative Sum


example Input:

Data
Gama
Charlie
Alpha
Beta
Alpha
Charlie
Delta
Alpha

Output Expected:

Data ROW_NUMBER RANK DENSE_RANK


Alpha 1 1 1
Alpha 2 1 1
Alpha 3 1 1
Beta 4 4 2
Charlie 5 5 3
Charlie 6 5 3
Delta 7 7 4
Gama 8 8 5

Sol:

Job Order ---- Seq_File -> Transformer -> Tgt

1. First sort the 'Data' in Ascending order.


2. Create 5 stage variables as follows:
Curr_Val --> Data
Row_Num (initial value '0') --> Row_Num+1
Rank (initial value '1') --> If Curr_Val = Prev_Val Then Rank Else Row_Num
Dense_Rank (initial value '1') --> If Curr_Val = Prev_Val Then Dense_Rank Else
Dense_Rank+1
Prev_Val (initial value '0') --> Curr_Val
3. In final column metadata along with Data column, add three more columns of Integer type
with names ROW_NUMBER, RANK, DENSE_RANK.
4. Map stage variables, in the column derivations of these new columns as follows
Row_Num ---> ROW_NUMBER
Rank --> RANK
Dense_Rank --> DENSE_RANK

(Or)

You must respect the order of the variables.

Sort -> Transformer

Sort: sort Data in ascending order.

Transformer: Execution Mode = Sequential.

We have 4 stage variables, written here as variable = derivation:

StageVar2 = StageVar2 + 1                                               (row number)
StageVar3 = If StageVar1 = DSLink8.Data Then StageVar3 Else StageVar2   (rank)
StageVar  = If StageVar1 = DSLink8.Data Then StageVar Else StageVar + 1 (dense rank)
StageVar1 = DSLink8.Data                                                (previous value)

For the output:

DSLink8.Data --> Data
StageVar2 --> Row_Number
StageVar3 --> Rank
StageVar --> Dense_Rank

Step1: Read the data

Step2: sort the data based on key column

Step3: follow the step in transformer stage


Create stage variables with initial values from the stage properties:

variableName | InitialValue
----------------------------
svCurrVal    |
svRowNum     | 0
svRank       | 1
svDenseRank  | 0
svPrevVal    | 0

svCurrVal:Data
svRowNum:svRowNum +1
svRank : If svCurrVal = svPrevVal Then svRank Else svRowNum
svDenseRank: If svCurrVal = svPrevVal Then svDenseRank Else svDenseRank+1
svPrevVal:svCurrVal

Map each stage variable to the respective output column.

Step4: configure the target and run the job:

Note: Make sure the job runs on a single node or sequentially, or change the execution mode from parallel
to sequential in each stage.

For more details:


https://wingsoftechnology.com/dense_rank-analytic-function/

In Oracle SQL Or any other Database SQL we can use the queries like below

--Row_Number()   -- generates a sequence number even for equal values
--Rank()         -- assigns the same rank to equal values but skips the sequence afterwards
--Dense_Rank()   -- assigns the same rank to equal values and does not skip the sequence
--Cumulative sum -- running total: each row's salary added to the sum of the previous rows

Ex:Row_Number()
SELECT ENAME,JOB,SAL,DEPTNO,ROW_NUMBER() OVER(ORDER BY DEPTNO) RN FROM EMP;

Ex:RANK()
SELECT ENAME,JOB,SAL,DEPTNO,RANK() OVER(ORDER BY DEPTNO) RN FROM EMP;

Ex:DENSE_RANK()
SELECT ENAME,JOB,SAL,DEPTNO,DENSE_RANK() OVER(ORDER BY DEPTNO) RN FROM EMP;

cumulative sal:
SELECT ENAME,JOB,DEPTNO,SAL,SUM(SAL) OVER(ORDER BY SAL) t FROM EMP;

Cumulative sum:

Output:

EMPNO|ENAME|SAL|CUMULATIVE_SAL
7369|SMITH|800.00|800.00
7900|JAMES|950.00|1750.00
7521|WARD|1250.00|3000.00
7654|MARTIN|1250.00|4250.00
7934|MILLER|1300.00|5550.00
7844|TURNER|1500.00|7050.00
7499|ALLEN|1600.00|8650.00
7782|CLARK|2450.00|11100.00
7566|JONES|2975.00|14075.00
7902|FORD|3000.00|17075.00
7839|KING|5000.00|22075.00

Find the department-wise cumulative salary sum

In the Transformer stage (with the data sorted by dept) apply the below logic, stage variables in this order:

currval = DSLink5.DEPTNO
sv1 = If currval = prevval Then DSLink5.SAL + sv1 Else DSLink5.SAL
prevval = currval

Map sv1 to the output column. (An SQL equivalent follows.)
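
In SQL the department-wise running total is a windowed SUM:

select empno, ename, deptno, sal,
       sum(sal) over (partition by deptno order by sal
                      rows unbounded preceding) as dept_cum_sal
from emp;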

Req-49:

Given : FileA and FileB contains some data in single column.


Target : File_A : Which contains the data which is available in A but not in B
Target : File_B : Which contains the data which is available in B but not in A
Target : File_AB : Which contains the data which is available in A and B Both

FileA FileB
1 9
2 10
3 11
4 12
5 13
6 14
7 15
8 16
9 17
10 18
11 19
12 20
13 21
14 22
16 23

Output:

File_A  File_B  File_AB
1       17      9
2       18      10
3       19      11
4       20      12
5       21      13
6       22      14
7       23      15
8               16

Solution Design:

a) Job Design:

Below is the design which can achieve the output we need. Here we read the 2 seq files as
input, then the data passes through a Join and a Filter stage to achieve the output.

b) Join Stage Properties: For the required output, we have to join the both input file on "Col1". Here we
are using FULL OUTER JOIN.

FULL OUTER JOIN will generate the 2 columns LEFT n RIGHT. Map the both columns to Join Output

c) Filter Stage Properties: Now, in the Filter stage, we filter the data as per the requirement.
Here ....
FileA = Left and FileB = Right
So...
Data Only in FileA = where Right_Col1 = 0 or "" or NULL
Data Only in FileB = where Left_Col1 = 0 or "" or NULL
Data in both FileA and FileB = where (Right_Col1 <> 0 or "" or NULL) and (Left_Col1 <> 0 or "" or
NULL)

## Note :-
Left and Right Col will be = 0 or "" or NULL if-
NULL => Data will be NULL if the Source is DB. ( But here we are using seq file )
In case of Sequence File as Source
"" => Data will be "" if the column data type is VarChar.
0 => Data will be 0 if the column data type is Integer.

Now, we will map the columns to output.


For "Data available only in FileA" :- Assign Left to Output.

For "Data available only in FileB" :- Assign Right to Output.

For "Data available in Both " :- Assign Right or Left to Output.

Req-50: In how many ways can we call routines?

a) In the job properties there is an option to call the before- and after-job subroutines.

b) In a job sequence there is an activity called "Routine Activity"; from there also routines can
be called.

c) In the derivation part of the Transformer of a parallel job, "parallel routines" can be called.

d) In the derivation part of the Transformer of a server job, "server routines" can be called.

e) In server job stages, before- and after-job subroutines can also be called.


Req-52: Find the first row and last row from the source file.

Src
10
20
30
40
50

Output:
10
50

Design:

Note: make sure, job should run in sequential

Req-60:

INPUT.txt –

Generate a large volume of data using the Row Generator stage. Provide a list of pin codes so that they
repeat across multiple records.

ID,NAME,SALARY,ADDRESS,PINCODE
1,ABC,1000,WHITEFIELD,560066
2,XYZ,2000,MARTAHALLI,560075
2,XYZ,2300,MARTAHALLI,560075
5,FDG,1100,KUNDALAHALLI,560061
4,IOP,1200,MADIWALA,560061
3,UTR,1150,SILKBOARD,560066
6,SDA,1100,BTM LAYOUT,560075
8,MNB,2000,SINGSANDRA,560032
8,MNB,2300,ECITY,560032
10,WRE,3000,HOSUR,560011
11,ASD,2500,MAJESTIC,560066

OUTPUT_1.txt -> Note: none of the records that have duplicates should be considered for any
calculation; all of them should go to the error file.
PINCODE,NBR_OF_EMPL,HIGHEST_SAL,LOWEST_SAL,AVERAGE_SAL
560066,3,2500,1000,1550
560075,1,1100,1100,1100
560061,2,1200,1100,1150
560011,1,3000,3000,3000

OUTPUT_2_ERRORS.txt
ID,NAME,SALARY,ADDRESS,PINCODE,ERROR_DESCRIPTION
2,XYZ,2000,MARTAHALLI,560075,DIFFERENT SALARIES
2,XYZ,2300,MARTAHALLI,560075,DIFFERENT SALARIES
8,MNB,2000,SINGSANDRA,560032,DIFFERENT ADDRESS
8,MNB,2300,ECITY,560032,DIFFERENT ADDRESS
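
A SQL sketch of the same rule (table input_data is an assumed name; IDs with more than one row are excluded before aggregating):

with dup as (select id from input_data group by id having count(*) > 1)
select pincode,
       count(*)    as nbr_of_empl,
       max(salary) as highest_sal,
       min(salary) as lowest_sal,
       avg(salary) as average_sal
from input_data
where id not in (select id from dup)
group by pincode;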

Req-61:

Source:
DID ITEM
111 A
111 B
111 C
111 D
222 E
222 F
222 G
222 H

Target:
DID ITEM RESULT
111 A A
111 B A,B
111 C A,B,C
111 D A,B,C,D
222 E A,B,C,D,E
222 F A,B,C,D,E,F
222 G A,B,C,D,E,F,G
222 H A,B,C,D,E,F,G,H

Design:

Sort stage:

Transformer stage:
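
For comparison, a SQL sketch of the running concatenation (Oracle LISTAGG in a scalar subquery; it relies on the ITEM values being globally ordered, as in this sample, and the table name src is assumed):

select s.did, s.item,
       (select listagg(i.item, ',') within group (order by i.item)
          from src i
         where i.item <= s.item) as result
from src s
order by s.item;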

Req-62:

Src:
A
B
C
D
E

Trg:
col1,test_col1
A,A
B,BB
C,CCC
D,DDDD
E,EEEEE

Note: Run the job on a single node.
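
In SQL (Oracle assumed; rpad repeats the single-character value rownum times, so this relies on the input row order):

select col1, rpad(col1, rownum, col1) as test_col1
from src;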

63.) Find how many occurrences of a substring are repeated in an input string?
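
The screenshot for this one is missing; as a SQL sketch, REGEXP_COUNT counts how many times a substring occurs:

select regexp_count('banana in the bandana', 'an') as occurrences
from dual;  -- returns 4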

64) How to perform null handling in the Transformer stage?
Use the null handling functions.

Hint: use the null handling functions in the Transformer stage:

(NullToValue(), NullToEmpty(), NullToZero(), IsNull(), IsNotNull(), SetNull())

65) scenario

Source:
col1:
sac123#xysx456
lmn897_ijk@
rjk++xlmno11143

output1: Output2:
Numbers string_c
123456 sacxysx
897 lmnijk
11143 rjkxlmno

source --> Transformer stage --> target1 (Numbers)
                             --> target2 (String_C)
In the Transformer stage, use the Convert() function to strip the unwanted characters.
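
The same split in SQL (REGEXP_REPLACE strips everything outside the wanted character class; table src, column col1 assumed):

select regexp_replace(col1, '[^0-9]', '')    as numbers,
       regexp_replace(col1, '[^A-Za-z]', '') as string_c
from src;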

66) scenario
Src:
Country_name|countrycode
ind |101,102,103
US |897,912
China |616,658,692,217

outpu:
countryname countrycode
ind 101
ind 102
ind 103
US 897
US 912
China 616
China 658
China 692
China 217

Hint:
Source --> Transformer stage --> target

Step1: Extract the data from the source.

Step2: In the Transformer stage, set the Loop While condition to @ITERATION <= DCount(In.countrycode, ',')
and the loop-variable derivation to Field(In.countrycode, ',', @ITERATION).
The loop variable is re-evaluated on each iteration until the loop condition is no longer satisfied.
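
A SQL sketch of the same split using the classic Oracle CONNECT BY trick (table name src assumed):

select country_name,
       regexp_substr(countrycode, '[^,]+', 1, level) as countrycode
from src
connect by level <= regexp_count(countrycode, ',') + 1
       and prior country_name = country_name
       and prior sys_guid() is not null;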

67) Scenario: replace 'ind' with 'India' wherever it occurs


Src:
col1
ind is country, ind is developing
in ind movies are famous,visit ind, in ind they provide many offers

OutPut
col1
India is country, India is developing
in India movies are famous,visit India, in India they provide many offers
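
In SQL this is a plain REPLACE (table src, column col1 assumed; word-boundary handling, if needed, would take REGEXP_REPLACE instead):

select replace(col1, 'ind', 'India') as col1
from src;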

67) scenario:
Src:
Ename sal doj
Abc 100 12-12-2019
xyz 200 21-04-2018
lmn 300 15-06-2021

output:
Ename sal doj
Abc 100 12-12-2019 00:00:00
xyz 200 21-04-2018 00:00:00
lmn 300 15-06-2021 00:00:00

Convert the data into a timestamp using the Transformer stage.

Hint:
Source--- > Transformer stage -- > target

In the Transformer stage: concatenate the date with a time literal ('00:00:00'), then convert the string to a
timestamp with StringToTimestamp().
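
The SQL counterpart (Oracle TO_TIMESTAMP; the format DD-MM-YYYY is assumed from the sample, and emp_src is an assumed table name):

select ename, sal,
       to_timestamp(doj || ' 00:00:00', 'DD-MM-YYYY HH24:MI:SS') as doj_ts
from emp_src;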

68: SCD Implementations:
What is an SCD (Slowly Changing Dimension)?
Dimensions that change slowly over time, rather than on a regular, time-based schedule. In a data
warehouse there is a need to track changes in dimension attributes in order to report historical data.

There are many approaches to deal with SCDs. The most popular are:

- Type 1 - Overwriting the old value
- Type 2 - Creating a new additional record (it maintains history)
- Type 3 - Adding a new column (partial changes/history)
- Type 4 - Using a historical table
- Type 6 - Combining the approaches of types 1, 2, 3 (1+2+3=6)

Type 1 - Overwriting the old value. In this method no history of dimension changes is kept in the
database. The old dimension value is simply overwritten by the new one. This type is easy to maintain
and is often used for data whose changes are caused by processing corrections (e.g. removal of special
characters, correcting spelling errors).

Before the change:


Customer_ID Customer_Name Customer_Type
1 Ram Corporate

After the change:


Customer_ID Customer_Name Customer_Type
1 Ram Retail

Type 2 - Creating a new additional record and maintaining the record history. In this
methodology all history of dimension changes is kept in the database. You capture an attribute
change by adding a new row with a new surrogate key to the dimension table. Both the prior and
new rows contain as attributes the natural key (or other durable identifier). 'Effective date'
and 'current indicator' columns are also used in this method. There can be only one record with
the current indicator set to 'Y'. For the 'effective date' columns, i.e. start_date and end_date, the
end_date for the current record is usually set to the value 9999-12-31. Introducing changes to the
dimensional model in type 2 can be a very expensive database operation, so it is not recommended
in dimensions where a new attribute could be added in the future.

Before the change:


Customer_ID Customer_Name Customer_Type Start_Date End_Date Current_Flag
1 Cust_1 Platinum 22-07-2010 31-12-9999 Y

After the change:

Customer_ID Customer_Name Customer_Type Start_Date End_Date Current_Flag
1 Cust_1 Platinum 22-07-2010 17-05-2012 N
1 Cust_1 Gold 17-05-2012 04-12-2022 N
1 Cust_1 Silver 04-12-2022 31-12-9999 Y
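
Outside DataStage, the Type 2 close-and-insert pattern can also be sketched in SQL (a minimal sketch; dim_customer and the staging table stg_customer are assumed names):

-- close the current row when the tracked attribute changed
update dim_customer d
   set d.end_date = sysdate,
       d.current_flag = 'N'
 where d.current_flag = 'Y'
   and exists (select 1 from stg_customer s
                where s.customer_id = d.customer_id
                  and s.customer_type <> d.customer_type);

-- insert the new current version (and brand-new customers)
insert into dim_customer
       (customer_id, customer_name, customer_type, start_date, end_date, current_flag)
select s.customer_id, s.customer_name, s.customer_type,
       sysdate, date '9999-12-31', 'Y'
  from stg_customer s
 where not exists (select 1 from dim_customer d
                    where d.customer_id = s.customer_id
                      and d.customer_type = s.customer_type
                      and d.current_flag = 'Y');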

Type 3 - Adding a new column. In this type usually only the current and previous value of the
dimension is kept in the database. The new value is loaded into the 'current/new' column and the old
one into the 'old/previous' column. Generally speaking, the history is limited to the number of columns
created for storing historical data. This is the least commonly needed technique.

Before the change:


Customer_ID Customer_Name Current_Type Previous_Type
1 Cust_1 Proj3 Proj2

After the change:


Customer_ID Customer_Name Current_Type Previous_Type
1 Cust_1 Proj4 Proj3

Type 4 - Using a historical table. In this method a separate historical table is used to track all of the
dimension's attribute changes for each dimension. The 'main' dimension table keeps only the current
data, e.g. customer and customer_history tables.

Current table:
Customer_ID Customer_Name Customer_Type
1 Cust_1 Corporate

Historical table:
Customer_ID Customer_Name Customer_Type Start_Date End_Date
1 Cust_1 Retail 01-01-2010 21-07-2010
1 Cust_1 Other 22-07-2010 17-05-2012
1 Cust_1 Corporate 18-05-2012 31-12-9999

Type 6 - Combining the approaches of types 1, 2, 3 (1+2+3=6). In this type the dimension table has
additional columns such as:
- current_type - for keeping the current value of the attribute. All history records for a given item of
the attribute have the same current value.
- historical_type - for keeping the historical value of the attribute. All history records for a given
item of the attribute can have different values.
- start_date - for keeping the start date of the 'effective date' of the attribute's history.
- end_date - for keeping the end date of the 'effective date' of the attribute's history.
- current_flag - for keeping information about the most recent record.
In this method, to capture an attribute change we add a new record as in type 2. The current_type
information is overwritten with the new one as in type 1. We store the history in a
historical column as in type 3.

Customer_ID Customer_Name Current_Type Historical_Type Start_Date End_Date Current_Flag


1 Cust_1 Corporate Retail 01-01-2010 21-07-2010 N
2 Cust_1 Corporate Other 22-07-2010 17-05-2012 N
3 Cust_1 Corporate Corporate 18-05-2012 31-12-9999 Y

TYPE-1 Implementation in Data Stage:


Source Data:
empno,ename,location
1010,Srinu,Hyderabad
1011,Agens,Bengaluru
1012,Nazeem,Mysore
1013,Yaswanth,Hyderabad
1014,Poorna,vizag
1015,Meerna,Tirupathi

Step1: Load this data into target

Step2: In transformer stage:

Step3: In target, follow the below settings

Step4: See the data in target table in database

This is a full load from the source: if any data changes in the source, re-run the job and all the
previous values are overwritten. Here we cannot see the record transaction history.

TYPE-2 Implementation in Data Stage:
Job Design:

Step1: Design the job as per the above screen


Step2: In the reference, take only flag = 1 records
Step3: setup the key columns and key values in Change Capture Stage to capture the changes

In the mapping we can see that it generated an additional column to capture the changes.

Step4: The target will contain the change-code records.

Step4: Based on these change codes, update the end_date and flag (create a new job for this process)

Step4.1) Read the all the changes data from sequential file stage
Step4.2) In transformer stage do the below changes

Enable the keys to generate the update statement to update the end_Date and flag columns
4.3) Run the job and see the result in the target table

Above, the flag (Flag_ind = 0) and end_date columns are updated for the changed records.

Step5: Now Insert the New Records and updated record as new record in the target with new job

Step5.1: Read all the change data from source


Step5.2: In transformer stage set the below values

Step5.3: in the target set the options like below

Step5.4: Run the job and find the record transaction history in the target table

Finally, 1010 has 2 records: one is the old record and the other is the updated record.

Note: Like this, we can implement SCD using different stages:
- Change Data Capture
- SCD stage
- Lookup stage
- Join stage
- Compare stage / Difference stage

In real-time projects, mainly the CDC or SCD stages themselves are used.

Some useful links for a better understanding of the other stuff:
https://datastage4u.wordpress.com/category/datastage-best-practices/
http://datastageforleaener.blogspot.com/2015/06/change-capture-stage-in-datastage.html

69: Create a control table based on the input Excel files

We have an Excel sheet with valid & non-valid data like this; it needs to be loaded into the control table.
Excel data like:
Segment          | Country                  | Product   | Discount Band | Units Sold | M Price | Sale Price | Gross Sales
Government       | Canada                   | Carretera | None          | 1619       | 3       | 20         | 32370
Government       | Germany                  | Carretera | None          | 1321       | 3       | 20         | 26420
Midmarket        | France                   | Carretera | None          | 2178       | 3       | 15         | 32670
Midmarket        | Germany                  | Carretera | None          | 888        | 3       | 15         | 13320
Midmarket        | Mexico                   | Carretera | None          | 2470       | 3       | 15         | 37050
Government       | Germany                  | Carretera | None          | 1513       | 3       | 350        | 529550
Midmarket        | Germany                  | Montana   | None          | 921        | 5       | 15         | 13815
Channel Partners | Canada                   | Montana   | None          | 2518       | 5       | 12         | 30216
Government       | France                   | Montana   | None          | 1899       | 5       | 20         | 37980
Channel Partners | Germany                  | Montana   | None          | 1545       | 5       | 12         | 18540
Midmarket        | Mexico                   | Montana   | None          | 2470       | 5       | 15         | 37050
Enterprise       | Canada                   | Montana   | None          | 2666       | 5       | 125        | 333188
Small Business   | Mexico                   | Montana   | None          | 958        | 5       | 300        | 287400
Government       | Germany                  | Montana   | None          | 2146       | 5       | 7          | 15022
Enterprise       | Canada                   | Montana   | None          | 345        | 5       | 125        | 43125
Midmarket        | United States of America | Montana   | None          | 615        | 5       | 15         | 9225
Government       | Canada                   | Paseo     | None          | 292        | 10      | 20         | 5840
Midmarket        | Mexico                   | Paseo     | None          | 974        | 10      | 15         | 14610
Channel Partners | Canada                   | Paseo     | None          | 2518       | 10      | 12         | 30216
