DataStage-Only Scenario Questions
Req-1# Scenario:
Input data:
eid|ename|sal
101|abc|200
102|xyz|300
103|lmn|500
Output data:
eid|ename|sal|total_Sal
101|abc|200|1000
102|xyz|300|1000
103|lmn|500|1000
In the Transformer stage, create 2 output links and add a common column to both, as shown in the screenshot below.
In the Aggregator stage, provide the values as below.
(Or)
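In SQL, the same result can be sketched with a windowed sum (assuming a hypothetical table emp(eid, ename, sal)):

SELECT eid, ename, sal,
       SUM(sal) OVER () AS total_sal
FROM emp;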
Req-2#
Input :
source,destination,km
hyd,bang,1000
delhi,chennai,1500
chennai,bang,600
bang,hyd,1000
bombay,pune,1000
bang,chennai,600
Output :
source,destination,km
hyd,bang,1000
delhi,chennai,1500
chennai,bang,600
bombay,pune,1000
Here the hyd-to-bang distance is 1000 km, and another row has bang-to-hyd also at 1000 km, so rows like this (reverse duplicates) need to be deleted.
Solution:
Step4: apply the logic as above, then compile and run the job.
https://datastageinfoguide.blogspot.com/2014/02/datastage-scenario-based-questionanswer.html
DATASTAGE SCENARIO BASED QUESTION - 2: CITY NAMES AND DISTANCE BETWEEN THEM: PROBLEM AND SOLUTION - Wings Of Technology
SOURCE,DESTINATION,DISTANCE
HYD,CHN,500
CHN,HYD,500
BANG,HYD,600
HYD,BANG,600
PUN,HYD,750
HYD,PUN,750
CHN,BANG,500
BANG,CHN,500
Expected output:
HYD,CHN,500
BANG,HYD,600
PUN,HYD,750
CHN,BANG,500
Step1: read the data from the Sequential File stage.
Step2: make sure the Transformer stage execution mode is set to Sequential instead of Parallel.
Step3: create 2 stage variables to hold the temporary values; no initial value is needed, leave them empty.
Step5) use the sv1 filter at the Transformer constraint level, then compile and run the job.
Output:
Ref: https://datastageinfoguide.blogspot.com/2013/10/transformer-looping-functions-for.html
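Alternatively, a SQL sketch of the same dedup (assuming a hypothetical routes table with a load-order column seq; LEAST/GREATEST make the city pair order-independent):

SELECT source, destination, distance
FROM (
    SELECT r.*,
           ROW_NUMBER() OVER (
               PARTITION BY LEAST(source, destination),
                            GREATEST(source, destination)
               ORDER BY seq) AS rn
    FROM routes r
)
WHERE rn = 1;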
Step2) in the Transformer stage, write the looping logic based on the number of columns to append.
(Or)
Req-5# Pivot the output data into the required format without a key column.
Input Data :
NAME
-----------
IBM
WEBSPHERE
DATASTAGE
IBM
INFOSPHERE
DATASTAGE
Required Output
NAME
----------
IBM WEBSPHERE DATASTAGE
IBM INFOSPHERE DATASTAGE
Solution
Step2) In the Transformer stage, create an additional column and give the logic in the derivation.
Derivation:
Step3) take a Pivot Enterprise stage and set Pivot Type = Vertical.
Step5) In the Output tab, Mapping sub-tab, map the required columns.
NAME
IBM WEBSPHERE DATASTAGE
IBM INFOSPHERE DATASTAGE
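A SQL sketch of the same idea (assuming Oracle, a hypothetical single-column table names, and that every three consecutive rows form one output row):

SELECT LISTAGG(name, ' ') WITHIN GROUP (ORDER BY rn) AS name
FROM (SELECT name, ROWNUM AS rn, CEIL(ROWNUM / 3) AS grp FROM names)
GROUP BY grp;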
Req-5# Convert the below data from columns to rows (using Pivot).
Input:
Output Data:
1 UMA 100
1 UMA 200
1 UMA 300
2 POOJITHA 200
2 POOJITHA 300
2 POOJITHA 400
Ref - https://datastageinfoguide.blogspot.com/2014/02/pivot-enterprise-stage-horizontal_11.html
This logic can be implemented using the Pivot Enterprise stage, and also with SQL queries.
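For the SQL route, a sketch (assuming Oracle and a hypothetical wide table emp_wide(id, name, sal1, sal2, sal3)):

SELECT id, name, sal
FROM emp_wide
UNPIVOT (sal FOR sal_col IN (sal1, sal2, sal3))
ORDER BY id, sal;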
Req-6# Scenario:
I have a source like
COL1
A
A
B
B
B
C
TARGET LIKE
COL1 COL2
A 1
A 2
B 1
B 2
B 3
C 1
Solution)
Step2) in the Sort stage, set Create Key Change Column = True; it will generate an additional KeyChange column in the output.
Step4) Compile and run the job
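In SQL the equivalent is a partitioned row number (a sketch, assuming a hypothetical table src(col1)):

SELECT col1,
       ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1) AS col2
FROM src;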
Req-7#) We have a source which is a sequential file with header and footer. How do we remove the header and footer while reading this file using the Sequential File stage of DataStage?
Sol: run this command (e.g., in PuTTY): sed '1d;$d' file_name > new_file_name (run it as a before-job subroutine, then use the new file in the Sequential File stage).
Req-8#) Suppose a sequencer controls 4 jobs (job 1, job 2, job 3, job 4). Job 1 has 10,000 rows, but after the run only 5,000 rows were loaded into the target table and the job aborted. How can you sort out the problem?
Sol: if the sequencer controls 4 jobs and job 1 has a problem, first go to the Director and check the logs to see what kind of problem it is: a data type problem, a warning message, a job failure, or an abort. A job failure usually means a data type problem or a missing column action.
Then go to the Run window -> Tracing -> Performance, or in your target table -> General -> Action, where there are two options.
First check how much data is already loaded, then select the Skip option and continue; for the remaining data that was not loaded, select On Fail = Continue. Run the job again and you should get a success message.
Ans: if the metadata of all the files is the same, create a job with the file name as a parameter, then use that job in a routine and call it with different file names, or create a sequencer to reuse the job.
Otherwise you have to manually enter all the column descriptions in each stage. (RCP = Runtime Column Propagation.)
Req-11#) Question: scenario
Source: Target
Sol)
Req-12# Question - scenario:
The source has 2 fields:
COMPANY LOCATION
IBM HYD
TCS BAN
IBM CHE
HCL HYD
TCS CHE
IBM BAN
HCL BAN
HCL CHE
output:
COMPANY LOCATION COUNT
TCS BAN 2
TCS CHE 2
IBM CHE 3
IBM HYD 3
IBM BAN 3
HCL HYD 4
HCL BAN 4
HCL CHE 4
Step2) Take a Copy stage with 2 output links and send the data to each link.
Step4) In the Lookup stage, set Lookup Failure = Drop (this gives inner-join behaviour).
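In SQL the same output comes from a windowed count (a sketch, assuming a hypothetical table src(company, location)):

SELECT company, location,
       COUNT(*) OVER (PARTITION BY company) AS cnt
FROM src;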
Req-13#Scenario:
input is like this:
no,char
1,a
2,b
3,a
4,b
5,a
6,a
7,b
8,a
output:
no,char,Count
"1","a","1"
"6","a","2"
"5","a","3"
"8","a","4"
"3","a","5"
"2","b","1"
"7","b","2"
"4","b","3"
This scenario has already been implemented: using stage variables in the Transformer stage, we generate the sequence number within each group (see Req-6).
Req-14#scenario:
Input is like this:
file1
10
20
10
10
20
30
Output is like:
file2 file3(duplicates)
10 10
20 10
30 20
Here, using the Sort stage, generate the key-change column; based on that column, send the unique records to one target and the duplicates to another.
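A SQL sketch of the same split (assuming a hypothetical single-column table src(val)): rows with rn = 1 feed the unique target, rows with rn > 1 the duplicates target:

SELECT val,
       ROW_NUMBER() OVER (PARTITION BY val ORDER BY val) AS rn
FROM src;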
Req-15#scenario:
Input is like:
file1
10
20
10
10
20
30
40
50
Output is like this: multiple occurrences in one file and single occurrences in another file:
file2 file3
10 30
10 40
10 50
20
20
Solution
Step4) In transformer stage
In SQL:
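A sketch (assuming a hypothetical single-column table src(val)); the first query feeds file2 (multiple occurrences), the second file3 (single occurrences):

SELECT val FROM (SELECT val, COUNT(*) OVER (PARTITION BY val) AS c FROM src) WHERE c > 1;
SELECT val FROM (SELECT val, COUNT(*) OVER (PARTITION BY val) AS c FROM src) WHERE c = 1;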
Req-16#scenario:
Input is like this:
file1
10
20
10
10
20
30
Output is like:
file2 file3
10 30
20
This scenario is similar to the one above, but in the duplicates target the duplicates are removed again.
Req-17#scenario:
Input is like this:
file1
1
2
3
4
5
6
7
8
9
10
Output is like:
file2(odd) file3(even)
1 2
3 4
5 6
7 8
9 10
Step2) Just sort the data based on the source key.
Step3) In the Transformer stage use the Mod function in the constraints to separate the rows into the even target and the odd target.
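The same split expressed in SQL (a sketch, assuming a hypothetical table src(val)):

SELECT val FROM src WHERE MOD(val, 2) = 1;  -- odd target
SELECT val FROM src WHERE MOD(val, 2) = 0;  -- even target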
Req-19#) How to find the first and last salary in each dept without using the Aggregator stage?
Sol: sort by dept, then use a Transformer with stage variables (or a Remove Duplicates stage retaining the first/last row) to pick the first and last salary in each dept.
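In SQL this can be sketched with Oracle's KEEP clause (assuming the usual EMP table and taking 'first'/'last' to mean by empno order):

SELECT deptno,
       MIN(sal) KEEP (DENSE_RANK FIRST ORDER BY empno) AS first_sal,
       MAX(sal) KEEP (DENSE_RANK LAST ORDER BY empno) AS last_sal
FROM emp
GROUP BY deptno;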
Req-20#) How many ways are there to perform the remove-duplicates function without using the Remove Duplicates stage? ***
Using the Sort stage itself, 3 ways:
1. Set Allow Duplicates = False.
2. Set Create Key Change Column = True (then filter on it).
3. Link sort remove-duplicates: on the input link, select the 'Hash' partition type, enable 'Perform sort', and enable the 'Unique' option.
Using the Transformer stage, with stage variables:
1. Take 2 stage variables and compare the current and previous key values.
Req-21#scenario:
The input is
Shirt|red|blue|green
Pant|pink|red|blue
Output:
Shirt:red
Shirt:blue
Shirt:green
pant:pink
pant:red
pant:blue
Sol)
Req-22#Scenario:
source
col1 col3
1 samsung
1 nokia
1 ercisson
2 iphone
2 motrolla
3 lava
3 blackberry
3 reliance
Expected Output
col1 col2 col3 col4
1 samsung nokia ercisson
2 iphone motrolla
3 lava blackberry reliance
Design Tip:
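In SQL terms, a sketch of the same pivot (assuming Oracle, a hypothetical table src(col1, col3), and at most 3 items per group):

SELECT col1,
       MAX(CASE rn WHEN 1 THEN col3 END) AS col2,
       MAX(CASE rn WHEN 2 THEN col3 END) AS col3,
       MAX(CASE rn WHEN 3 THEN col3 END) AS col4
FROM (SELECT col1, col3,
             ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col3) AS rn
      FROM src)
GROUP BY col1;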
Req-23# Scenario:
Consider the following employee data as source:
employee_id, salary
-------------------
10, 1000
20, 2000
30, 3000
40, 5000
Create a job to find the sum of the salaries of all employees, with this sum repeated on every row.
This kind of scenario has already been covered in the examples above (see Req-1).
Req-24#Scenario:
I have two source tables/files, numbered 1 and 2.
In the target there are three output tables/files, numbered 3, 4 and 5.
To output 4 go the records common to both 1 and 2.
To output 3 go the records that are only in 1 but not in 2.
To output 5 go the records that are only in 2 but not in 1.
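In SQL terms this is a set-operation sketch (assuming hypothetical tables t1 and t2 with identical columns):

SELECT * FROM t1 MINUS SELECT * FROM t2;      -- output 3: only in 1
SELECT * FROM t1 INTERSECT SELECT * FROM t2;  -- output 4: common to both
SELECT * FROM t2 MINUS SELECT * FROM t1;      -- output 5: only in 2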
Req-25# Scenario:
sno,sname,mark1,mark2,mark3
1,rajesh,70,68,79
2,mamatha,39,45,78
3,anjali,67,39,78
4,pavani,89,56,45
5,indu,56,67,78
The output is:
sno,sname,mark1,mark2,mark3,delimitercount
1,rajesh,70,68,79,4
2,mamatha,39,45,78,4
3,anjali,67,39,78,4
4,pavani,89,56,45,4
5,indu,56,67,78,4
Design Tip:
1) Read the data using the Sequential File stage into a single fixed-width column.
2) In the Transformer stage use the DCount function to find the delimiter count and keep it in a separate column, as in the sketch below.
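For example (an assumption, taking the single input column name as rec), a derivation like DCount(DSLink3.rec, ',') - 1 gives the delimiter count, since DCount returns the number of delimited fields rather than the number of delimiters.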
Req-26#scenario:
sname total_vowels_count
Allen 2
Scott 1
Ward 1
total_Vowels_Count = Count(DSLink3.last_name,"a") + Count(DSLink3.last_name,"e") + Count(DSLink3.last_name,"i") + Count(DSLink3.last_name,"o") + Count(DSLink3.last_name,"u")
Req-28# Scenario:
1) We receive many huge files daily, all with the same metadata; how can we load them into the target table?
Use File Pattern in the Sequential File stage.
2) A column has 10 records; at run time we have to send the 5th and 6th records to the target. How can we send them?
Use a UNIX command in the Sequential File stage's Filter option (e.g., sed -n '5,6p').
DaysSinceFromDate(CurrentDate(), DSLink3.date_18)<=548 OR
DaysSinceFromDate(CurrentDate(), DSLink3.date_18)<=546
where date_18 is the column holding the date that must be less than or equal to 18 months old; 548 is the number of days in 18 months, and for a leap year it is 546 (verify these numbers).
Validate is equivalent to running a job except for the extraction/loading of data; that is, the validate option tests database connectivity by making connections to the databases.
Req-29# Scenario:
The input source has N records.
source ---> transformer ---> target
In the Transformer, use these constraints:
mod(empno,3)=1
mod(empno,3)=2
mod(empno,3)=0
Req-30#) Scenario:
My input is:
col A
a_b_c
x_F_I
DE_GH_IF
In the Transformer stage, use the Field function to separate the columns, for example:
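(Assuming the input column is named colA on link DSLink3:)
Field(DSLink3.colA, '_', 1)
Field(DSLink3.colA, '_', 2)
Field(DSLink3.colA, '_', 3)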
Req-31# Scenario 2:
The existing job design follows, but the requirement changed: the header and trailer datasets should be populated even if no detail records are present in the source file. The job below doesn't do that.
Hence the above job was changed to meet the requirement: a Row Generator was used with a Copy stage, with a default value (zero) for the count column coming from the Row Generator. If there are no detail records, the record count is picked from the Row Generator.
Req-32# How can we get the first 100 sample records from sequential file stage?
Use Read First Rows = 100
Req-33# I am reading a file with the Sequential File stage, so by default it reads the data sequentially. If I need the job to read in parallel, which options can I set?
Set Read From Multiple Nodes = Yes, or set Number of Readers Per Node greater than 1 (e.g., 2).
Req-34# From the data below, filter out only the records containing 'abc', using the Sequential File stage.
empno,ename,job,sal
5500,abc,xyz,2500
5502,lmn,aaa,2600
5504,abc,xyz,2700
5506,lmn,aaa,2800
5508,abc,xyz,2900
5510,lmn,aaa,3000
5512,abc,xyz,3100
5514,lmn,aaa,3200
Output:
empno,ename,job,sal
5500,abc,xyz,2500
5504,abc,xyz,2700
5508,abc,xyz,2900
5512,abc,xyz,3100
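One possible answer (the original solution is not shown): use the Sequential File stage's Filter option with a UNIX command such as grep 'abc', which keeps only the lines containing 'abc'.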
Req-35# I am getting files from the source with a header and footer. I want to remove the header and footer and then load the data using the Sequential File stage.
Source Data:
empdata
5500,abc,xyz,2500
5502,lmn,aaa,2600
5504,abc,xyz,2700
5506,lmn,aaa,2800
5508,abc,xyz,2900
5510,lmn,aaa,3000
5512,abc,xyz,3100
5514,lmn,aaa,3200
7710 lmn aaa 3000
7712 abc xyz 3100
7714 lmn aaa 3200
totalrecords:10
Output:
5500,abc,xyz,2500
5502,lmn,aaa,2600
5504,abc,xyz,2700
5506,lmn,aaa,2800
5508,abc,xyz,2900
5510,lmn,aaa,3000
5512,abc,xyz,3100
5514,lmn,aaa,3200
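Sol (not shown here; same idea as Req-7): strip them with sed '1d;$d' file_name > new_file_name before the job, or put that sed command in the Sequential File stage's Filter option.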
Req-36# How to handle a fixed-width flat file using the Sequential File stage?
Source:
5500abcxyz2500
5502lmnaaa2600
5504abcxyz2700
5506lmnaaa2800
5508abcxyz2900
5510lmnaaa3000
Target:
Empno,ename,job,sal
5500,abc,xyz,2500
5502,lmn,aaa,2600
5504,abc,xyz,2700
5506,lmn,aaa,2800
5508,abc,xyz,2900
5510,lmn,aaa,3000
Solution:
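A minimal sketch (assuming field widths of 4/3/3/4): in the Sequential File stage's Format tab set the field Delimiter to 'none', and in the Columns tab define empno Char(4), ename Char(3), job Char(3), sal Char(4); alternatively, read the whole record as one Char(14) column and split it with a Column Import stage.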
Compile and run the job
Req-37# How to handle records with null values in the Sequential File stage?
In the Format tab set Null field value = '' (or the required null marker), and in the Columns tab set Nullable = Yes for all the columns.
Req-38#) Scenario: Get the next column value in the current row (LEAD SQL scenario).
Input file :
Sq,No
1,1000
2,2200
3,3030
4,5600
Output File :
Sq,No,No2
1,1000,2200
2,2200,3030
3,3030,5600
4,5600,NULL
Solution Design:
a) Job Design:
Below is the design which achieves the required output. Here we read a sequential file as input, then pass the data through a Sort and a Transformer stage.
Here, we took 2 stage variable : StageVar, StageVar1
and their derivations are -
StageVar1 = StageVar
DSLink6.No = StageVar1
d) Output file:
Before writing the data to the Sequential File stage, we sort the data again in ascending order to get the required output.
In SQL:
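A sketch (assuming a hypothetical table src(sq, no)):

SELECT sq, no,
       LEAD(no) OVER (ORDER BY sq) AS no2
FROM src;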
Req-38.1#) Scenario: Get the current column value in the next row (LAG SQL scenario).
Input file :
Sq,No
1,1000
2,2200
3,3030
4,5600
Output File :
Sq,No,No2
1,1000,null
2,2200,1000
3,3030,2200
4,5600,3030
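The design mirrors Req-38, sorting ascending instead. In SQL (a sketch, with the same assumed src(sq, no) table):

SELECT sq, no,
       LAG(no) OVER (ORDER BY sq) AS no2
FROM src;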
What is the difference between 'validated OK' and 'compiled' in DataStage?
When you compile a job, it ensures that basic things like the important stage parameters and mappings are correct, and then it creates an executable job.
You validate a compiled job to make sure that all the connections are valid, all the job parameters are set, and a valid output can be expected after running the job. It is like a dry run where you don't actually touch the live data, but you gain confidence that things will work.
Req-40#) Scenario:
From the source below, replace words containing 'abc' with 'NewWord'.
Source:
empno,ename,job,sal
1200,abc123,xyz,2500
1202,lmn789,abc,2600
1204,abc,xyz,2700
1206,lmn,aaa,2800
1208,abc,xyz,2900
1210,lmn,abc,3000
1212,abc,xyz,3100
1214,lmn,aaa,3200
Target:
empno,ename,job,sal
1200,NewWord123,xyz,2500
1202,lmn789,NewJob,2600
1204,NewWord,xyz,2700
1206,lmn,aaa,2800
1208,NewWord,xyz,2900
1210,lmn,NewJob,3000
1212,NewWord,xyz,3100
1214,lmn,aaa,3200
Change(DSLink3.ename,'abc','NewWord'); similarly, Change(DSLink3.job,'abc','NewJob') for the job column.
On Oracle:
SELECT JOB, REPLACE(JOB,'MANAGER','MNG') JOB1 FROM EMP;
Req-41#) I have a source file; in the output, the first 8 lines should be empty, the 9th line should be the header, and the remaining lines the source data. How to achieve this?
Job Design:
Here, for the header, use @INROWNUM=1 at the constraint level (it reads only 1 record); to get the required empty rows, use a constraint like @INROWNUM<=4 (it gives only 4 empty rows).
Step3) use the Column Export stage to export all the column values into a single column, as below.
Req-42#) I have EMP data including a date; in the target I want to load the date with microseconds. How do you do it?
To ensure the stage converts the Timestamp string to the Timestamp data type without losing the microseconds, code the conversion explicitly in the derivation. For example, instead of setting the derivation to just the name of the Timestamp string, such as myTimeStamp, change it to:
StringToTimestamp(myTimestamp,"%yyyy-%mm-%dd %hh:%nn:%ss.6")
In transformer stage:
Req-44:
Input :
a
b
c
target:
a
bb
ccc
How can I get this output?
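One approach (an assumption, since no solution is given; run the job sequentially so @INROWNUM counts 1, 2, 3, ...): in a Transformer, use a derivation like Str(DSLink3.col1, @INROWNUM), which repeats the value once per row number (the same idea as Req-62 below).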
Req-45: Get first Row and last row from source file?
My input has a unique column-id with the values
Src
-----
10
20
30
40
50
.....
how can I get the first record in one output file, the last record in another output file, and the rest of the records in a 3rd output file?
Sol)
Method1:
Method2:
src --> colgen (generate a dummy column with value 1) --> copy (3 o/ps)
1st o/p --> head
2nd o/p --> tail
head and tail --> funnel
3rd copy and funnel --> join (full outer) --> filter (not null on dummy col)
Req-46:
Atul
Neeraj
Anita
Amruta
Divya
Swapnil
Pramod
Vivek
Ashish
Amit
Santosh
Output
Employee Name
Atul
Neeraj
Anita
Amruta
Divya
Swapnil
Pramod
Vivek
Ashish
Amit
Santosh
Employee Count : 11
Job Design:
Step2: from Transformer stage give 2 output links
Step3:
Step4:
Step5: From the transformer stage take 3 output links, 1 is for header, 1 is for count, 1 is complete
data.
Step6: using the Column Export stage, export all the source data into a single column.
Step8: using the Sort stage, sort on the 'no' field to put header, data and trailer in order.
Note: in a similar job, read the source data from a database and create the header and footer records.
Req-47:
https://www.datagenx.net/2015/10/dsjob-managing-datastage-jobs-from_6.html
Data
Gama
Charlie
Alpha
Beta
Alpha
Charlie
Delta
Alpha
Output Expected:
Sol:
(Or)
(derivation = stage variable)
StageVar2 + 1 = StageVar2
If StageVar1 = DSLink8.Data Then StageVar3 Else StageVar2 = StageVar3
StageVar3 = StageVar4
If StageVar1 = DSLink8.Data Then StageVar Else StageVar + 1 = StageVar
DSLink8.Data = StageVar1
variableName | InitialValue
----------------------------
svCurrVal    |
svRowNum     | 0
svRank       | 1
svDenseRank  | 0
svPrevVal    | 0
svCurrVal: Data
svRowNum: svRowNum + 1
svRank: If svCurrVal = svPrevVal Then svRank Else svRowNum
svDenseRank: If svCurrVal = svPrevVal Then svDenseRank Else svDenseRank + 1
svPrevVal: svCurrVal
Note: make sure the job runs on a single node or sequentially, or change the execution mode from parallel to sequential in each stage.
In Oracle SQL Or any other Database SQL we can use the queries like below
Ex:Row_Number()
SELECT ENAME,JOB,SAL,DEPTNO,ROW_NUMBER() OVER(ORDER BY DEPTNO) RN FROM EMP;
Ex:RANK()
SELECT ENAME,JOB,SAL,DEPTNO,RANK() OVER(ORDER BY DEPTNO) RN FROM EMP;
Ex:DENSE_RANK()
SELECT ENAME,JOB,SAL,DEPTNO,DENSE_RANK() OVER(ORDER BY DEPTNO) RN FROM EMP;
cumulative sal:
SELECT ENAME,JOB,DEPTNO,SAL,SUM(SAL) OVER(ORDER BY SAL) t FROM EMP;
Cumulative sum:
Output:
EMPNO|ENAME|SAL|cumulative_Sal
7369|SMITH| 00800.| 00800.00
7900|JAMES| 00950.| 01750.00
7521|WARD| 01250.| 03000.00
7654|MARTIN| 01250.| 04250.00
7934|MILLER| 01300.| 05550.00
7844|TURNER| 01500.| 07050.00
7499|ALLEN| 01600.| 08650.00
7782|CLARK| 02450.| 11100.00
7566|JONES| 02975.| 14075.00
7902|FORD| 03000.| 17075.00
7839|KING| 05000.| 22075.00
Find the department-wise running sum of salary (data sorted by dept):
currVal: DSLink5.DEPTNO
sv1: If currVal = prevVal Then DSLink5.SAL + sv1 Else DSLink5.SAL
prevVal: currVal
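The SQL counterpart (a sketch against the EMP table used in the examples above):

SELECT ENAME, JOB, DEPTNO, SAL,
       SUM(SAL) OVER (PARTITION BY DEPTNO ORDER BY SAL) AS dept_cum_sal
FROM EMP;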
Req-49:
FileA FileB
1 9
2 10
3 11
4 12
5 13
6 14
7 15
8 16
9 17
10 18
11 19
12 20
13 21
14 22
16 23
Output:
File_A File_B File_AB
1      17     9
2      18     10
3      19     11
4      20     12
5      21     13
6      22     14
7      23     15
8             16
Solution Design:
a) Job Design:
Below is the design which achieves the required output. Here we read 2 sequential files as input, then pass the data through a Join and a Filter stage.
b) Join Stage Properties: for the required output, we join both input files on "Col1", using a FULL OUTER JOIN.
FULL OUTER JOIN will generate the two columns, LEFT and RIGHT. Map both columns to the Join output.
c) Filter Stage Properties: now, in the Filter stage, we filter the data as per the requirement.
Here:
FileA = Left and FileB = Right
So:
Data only in FileA = where Right_Col1 = 0 or "" or NULL
Data only in FileB = where Left_Col1 = 0 or "" or NULL
Data in both FileA and FileB = where (Right_Col1 <> 0 or "" or NULL) and (Left_Col1 <> 0 or "" or NULL)
## Note :-
The Left and Right columns will be 0 or "" or NULL if:
NULL => the data will be NULL if the source is a DB (but here we are using a sequential file).
With a Sequential File as source:
"" => the data will be "" if the column data type is VarChar.
0 => the data will be 0 if the column data type is Integer.
For "Data available only in FileB": assign Right to the output.
a) In the job properties there is an option to call the before- and after-job subroutines.
b) In a job sequence there is an activity called "Routine Activity"; routines can also be called from there.
c) In the derivation part of a parallel job's Transformer, parallel routines can be called.
d) In the derivation part of a server job's Transformer, server routines can be called.
e) In server job stages, before- and after-job subroutines can also be called.
Req-52: find the first row and the last row from the source file?
Src
10
20
30
40
50
Output:
10
50
Design:
Note: make sure the job runs sequentially.
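A sketch of one possible design (the screenshot is missing): send the source through a Copy stage into a Head stage (keep 1 row) and a Tail stage (keep 1 row), then Funnel both links into the target, as in Req-45.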
Req-60:
INPUT.txt –
Generate lots of data using a Row Generator; provide a list of pin codes so that pin codes repeat across multiple records.
ID,NAME,SALARY,ADDRESS,PINCODE
1,ABC,1000,WHITEFIELD,560066
2,XYZ,2000,MARTAHALLI,560075
2,XYZ,2300,MARTAHALLI,560075
5,FDG,1100,KUNDALAHALLI,560061
4,IOP,1200,MADIWALA,560061
3,UTR,1150,SILKBOARD,560066
6,SDA,1100,BTM LAYOUT,560075
8,MNB,2000,SINGSANDRA,560032
8,MNB,2300,ECITY,560032
10,WRE,3000,HOSUR,560011
11,ASD,2500,MAJESTIC,560066
OUTPUT_1.txt -> Note: none of the records that have duplicates should be considered for any calculation; all of them should go to the error file.
PINCODE,NBR_OF_EMPL,HIGHEST_SAL,LOWEST_SAL,AVERAGE_SAL
560066,3,2500,1000,1550
560075,1,1100,1100,1100
560061,2,1200,1100,1150
560011,1,3000,3000,3000
OUTPUT_2_ERRORS.txt
ID,NAME,SALARY,ADDRESS,PINCODE,ERROR_DESCRIPTION
2,XYZ,2000,MARTAHALLI,560075,DIFFERENT SALARIES
2,XYZ,2300,MARTAHALLI,560075,DIFFERENT SALARIES
8,MNB,2000,SINGSANDRA,560032,DIFFERENT ADDRESS
8,MNB,2300,ECITY,560032,DIFFERENT ADDRESS
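A SQL sketch of the required aggregation (assuming a hypothetical table emp(id, name, salary, address, pincode); IDs that occur more than once are excluded from the calculation, matching the error-file rule above):

SELECT pincode,
       COUNT(*)    AS nbr_of_empl,
       MAX(salary) AS highest_sal,
       MIN(salary) AS lowest_sal,
       AVG(salary) AS average_sal
FROM emp
WHERE id NOT IN (SELECT id FROM emp GROUP BY id HAVING COUNT(*) > 1)
GROUP BY pincode;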
Req-61:
Source:
DID ITEM
111 A
111 B
111 C
111 D
222 E
222 F
222 G
222 H
Target:
DID ITEM RESULT
111 A A
111 B A,B
111 C A,B,C
111 D A,B,C,D
222 E A,B,C,D,E
222 F A,B,C,D,E,F
222 G A,B,C,D,E,F,G
222 H A,B,C,D,E,F,G,H
Design:
Sort stage:
Transformer stage:
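A sketch of the Transformer stage variable (an assumption, since the screenshot is missing; it requires sequential execution, and ':' is the DataStage concatenation operator):
svResult: If @INROWNUM = 1 Then DSLink3.ITEM Else svResult : ',' : DSLink3.ITEM
Derivation of RESULT: svResult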
Req-62:
Src:
A
B
C
D
E
Trg:
col1,test_col1
A,A
B,BB
C,CCC
D,DDDD
E,EEEEE
Note: Run the job in single Node
63.) Find how many times a value occurs (is repeated) in an input string?
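One option (assuming a known substring is being counted): the Transformer Count function used in Req-26, e.g. Count(DSLink3.str, 'India') returns the number of occurrences of 'India' in the input string.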
64) How to perform null handling in the Transformer stage?
Use the null-handling functions, e.g. IsNull(), NullToValue(), NullToEmpty().
65) scenario
Source:
col1:
sac123#xysx456
lmn897_ijk@
rjk++xlmno11143
output1: Output2:
Numbers string_c
123456 sacxysx
897 lmnijk
11143 rjkxlmno
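A SQL sketch (assuming Oracle and a hypothetical table src(col1)):

SELECT REGEXP_REPLACE(col1, '[^0-9]', '')    AS numbers,
       REGEXP_REPLACE(col1, '[^A-Za-z]', '') AS string_c
FROM src;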
66) scenario
Src:
Country_name|countrycode
ind |101,102,103
US |897,912
China |616,658,692,217
output:
countryname countrycode
ind 101
ind 102
ind 103
US 897
US 912
China 616
China 658
China 692
China 217
Hint:
Source --> Transformer stage --> target
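A sketch of the Transformer looping (an assumption, taking the delimited column as countrycode on link DSLink3):
Loop While: @ITERATION <= DCount(DSLink3.countrycode, ',')
Output derivation: Field(DSLink3.countrycode, ',', @ITERATION)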
Sample input (for the occurrence-count scenario in 63 above):
col1
India is country, India is developing
in India movies are famous, visit India, in India they provide many offers
67) scenario:
Src:
Ename sal doj
Abc 100 12-12-2019
xyz 200 21-04-2018
lmn 300 15-06-2021
output:
Ename sal doj
Abc 100 12-12-2019 00:00:00
xyz 200 21-04-2018 00:00:00
lmn 300 15-06-2021 00:00:00
Hint:
Source--- > Transformer stage -- > target
in the Transformer stage: concatenate the date with a time part, then convert the string to a timestamp.
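For example (assuming doj is read as a string in dd-mm-yyyy format):
StringToTimestamp(DSLink3.doj : ' 00:00:00', "%dd-%mm-%yyyy %hh:%nn:%ss")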
68: SCD Implementations:
what is SCD (Slowly Changing Dimension)?
Dimensions that change slowly over time, rather than on a regular, time-based schedule. In a data warehouse there is a need to track changes in dimension attributes in order to report historical data. There are many approaches to dealing with SCDs. The most popular are:
Type 1 - Overwriting the old value. In this method no history of dimension changes is kept in the database; the old dimension value is simply overwritten by the new one. This type is easy to maintain and is often used for data whose changes are caused by processing corrections (e.g. removal of special characters, correcting spelling errors).
Type 2 - Creating a new additional record and maintaining the record history. In this methodology all history of dimension changes is kept in the database. You capture an attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and new rows contain the natural key (or other durable identifier) as attributes. 'Effective date' and 'current indicator' columns are also used in this method; there can be only one record with the current indicator set to 'Y'. For the 'effective date' columns, i.e. start_date and end_date, the end_date of the current record is usually set to the value 9999-12-31. Introducing changes to the dimensional model in type 2 can be a very expensive database operation, so it is not recommended in dimensions where a new attribute could be added in the future.
Customer_ID Customer_Name Customer_Type Start_Date End_Date Current_Flag
1 Cust_1 Platinum 22-07-2010 17-05-2012 N
1 Cust_1 Gold 17-05-2012 04-12-2022 N
1 Cust_1 Silver 04-12-2022 31-12-9999 Y
Type 3 - Adding a new column. In this type usually only the current and previous values of the dimension are kept in the database. The new value is loaded into the 'current/new' column and the old one into the 'old/previous' column. Generally speaking, the history is limited to the number of columns created for storing historical data. This is the least commonly needed technique.
Type 4 - Using a historical table. In this method a separate historical table is used to track all of a dimension's attribute changes for each dimension. The 'main' dimension table keeps only the current data, e.g. customer and customer_history tables.
Current table:
Customer_ID Customer_Name Customer_Type
1 Cust_1 Corporate
Historical table:
Customer_ID Customer_Name Customer_Type Start_Date End_Date
1 Cust_1 Retail 01-01-2010 21-07-2010
1 Cust_1 Other 22-07-2010 17-05-2012
1 Cust_1 Corporate 18-05-2012 31-12-9999
Type 6 - Combines the approaches of types 1, 2 and 3 (1+2+3=6). In this type the dimension table has additional columns such as:
current_type - keeps the current value of the attribute. All history records for a given item of the attribute have the same current value.
historical_type - keeps the historical value of the attribute. All history records for a given item of the attribute can have different values.
start_date - keeps the start of the 'effective date' range of the attribute's history.
end_date - keeps the end of the 'effective date' range of the attribute's history.
current_flag - keeps information about the most recent record.
In this method, to capture an attribute change we add a new record as in type 2. The current_type information is overwritten with the new one as in type 1. We store the history in a historical column as in type 3.
Step3: In target, follow the below settings
This is a full load from the source; if any data changes at the source, re-running the job overrides all the previous values. Here we cannot see the record transaction history.
TYPE-2 Implementation in Data Stage:
Job Design:
In the mapping we can see that it generated an additional column to capture the changes.
Step4: based on these change codes, update the end_date and flag (create a new job for this process).
Step4.1) Read all the change data with the Sequential File stage.
Step4.2) In the Transformer stage make the changes below.
Enable the keys to generate the update statement that updates the end_date and flag columns.
4.3) Run the job and see the result in the target table.
Above, the flag_ind (zero) and end_date columns show the updated records.
Step5: now insert the new records, and the updated record as a new record, into the target with a new job.
Step5.3: in the target, set the options as below.
Step5.4: run the job and see the record transaction history in the target table.
Finally, key 1010 has 2 records: the old record and the updated record.
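In SQL terms the type 2 maintenance above boils down to expiring the current row and inserting the new version (a sketch, assuming a hypothetical dim_customer table shaped like the Customer example earlier; the values are illustrative):

UPDATE dim_customer
   SET end_date = SYSDATE, current_flag = 'N'
 WHERE customer_id = 1010 AND current_flag = 'Y';

INSERT INTO dim_customer (customer_id, customer_type, start_date, end_date, current_flag)
VALUES (1010, 'Gold', SYSDATE, DATE '9999-12-31', 'Y');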
Note: we can likewise implement SCD using different stages, such as:
Change Data Capture
SCD Stage
Lookup Stage
Join Stage
Compare stage/difference Stage
Some useful links for a better understanding of related topics:
https://datastage4u.wordpress.com/category/datastage-best-practices/
http://datastageforleaener.blogspot.com/2015/06/change-capture-stage-in-datastage.html
We have an Excel sheet with valid & non-valid data like this, which needs to be loaded into the control table.
The Excel data looks like:
Segment          | Country                  | Product   | Discount Band | Units Sold | M Price | Sale Price | Gross Sales
Government       | Canada                   | Carretera | None          | 1619       | 3       | 20         | 32370
Midmarket        | France                   | Carretera | None          | 2178       | 3       | 15         | 32670
Midmarket        | Germany                  | Carretera | None          | 888        | 3       | 15         | 13320
Midmarket        | Mexico                   | Carretera | None          | 2470       | 3       | 15         | 37050
Midmarket        | Germany                  | Montana   | None          | 921        | 5       | 15         | 13815
Channel Partners | Canada                   | Montana   | None          | 2518       | 5       | 12         | 30216
Government       | France                   | Montana   | None          | 1899       | 5       | 20         | 37980
Midmarket        | Mexico                   | Montana   | None          | 2470       | 5       | 15         | 37050
Enterprise       | Canada                   | Montana   | None          | 2666       | 5       | 125        | 333188
Small Business   | Mexico                   | Montana   | None          | 958        | 5       | 300        | 287400
Enterprise       | Canada                   | Montana   | None          | 345        | 5       | 125        | 43125
Midmarket        | United States of America | Montana   | None          | 615        | 5       | 15         | 9225
Government       | Canada                   | Paseo     | None          | 292        | 10      | 20         | 5840
Midmarket        | Mexico                   | Paseo     | None          | 974        | 10      | 15         | 14610
Channel Partners | Canada                   | Paseo     | None          | 2518       | 10      | 12         | 30216
75