DATASTAGE
source table
name
A
A
B
B
B
C
C
D
Soln:
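Seq ---> Transformer ---> Seq.
In the Transformer stage variable properties, create two stage variables,
s1 and s2. Stage variables are evaluated top to bottom, so s2 must be
listed above s1 in order to compare against the previous row's value:
s2 = If inputcolumn = s1 Then s2 + 1 Else 1
s1 = inputcolumn
Create a new output column named count and derive it from s2:
count = s2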
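The evaluation order of the two stage variables can be sketched outside DataStage. The following is only an illustrative Python simulation, not DataStage code; the function name running_counts is made up for this example, and it assumes the input is already sorted on the name column as in the source table.

```python
def running_counts(names):
    """Simulate stage variables s1 (previous value) and s2 (running count)."""
    s1 = None  # previous row's value; nothing seen yet
    s2 = 0     # running count within the current group
    out = []
    for name in names:
        # s2 is evaluated first, while s1 still holds the previous row's value
        s2 = s2 + 1 if name == s1 else 1
        s1 = name
        out.append((name, s2))  # count = s2
    return out


result = running_counts(["A", "A", "B", "B", "B", "C", "C", "D"])
for name, count in result:
    print(name, count)
```

Each row carries the running count for its name, so the last row of each group holds the total for that group.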
Q.
Without using stage variable how can we delete the duplicates using
Transformer?
Soln:
The prerequisite is that the data is partitioned by a key and sorted on it.
Then define the following in the Transformer, with S2 listed above S1:
S2 = If input.column = S1 Then 0 Else 1
S1 = input.column
Constraint: S2 = 1
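A small Python simulation may help show why the sort is a prerequisite. This is a sketch, not DataStage code; drop_adjacent_duplicates is a hypothetical name, and the logic only compares each row with the one immediately before it, exactly as S1 and S2 do.

```python
def drop_adjacent_duplicates(values):
    """Keep a row only when it differs from the previous row (S2 = 1)."""
    s1 = object()  # sentinel that never equals a real value
    kept = []
    for value in values:
        s2 = 0 if value == s1 else 1  # evaluated before s1 is updated
        s1 = value
        if s2 == 1:                   # the output link constraint
            kept.append(value)
    return kept


print(drop_adjacent_duplicates(["A", "A", "B", "B", "B", "C", "C", "D"]))
# Unsorted input: the second "A" is not adjacent to the first, so it survives.
print(drop_adjacent_duplicates(["A", "B", "A"]))
```

On unsorted input the repeated value is kept, which is why the data must be sorted (and, in a parallel job, partitioned on the same key) first.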
Q,
Soln:
Sourcefile --> Copy stage --> 1st link --> Remove Duplicates stage -->
outputfile1 with 10,20,30,40,50,60,70
Copy stage --> 2nd link --> Aggregator stage (creates the row count) -->
Filter stage
Filter stage --> Filter1 (count>1) --> outputfile2 with 10,20,30,40
Filter stage --> Filter2 (count=1) --> outputfile3 with 50,60,70
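The three outputs of this flow can be reproduced with a short Python sketch. The input list below is an assumption chosen to be consistent with the three output files listed above (the original input is not shown here), and split_by_count is a hypothetical helper name.

```python
from collections import Counter


def split_by_count(values):
    """Return (distinct values, values seen more than once, values seen once)."""
    counts = Counter(values)  # plays the role of the Aggregator row count
    distinct = sorted(counts)                                    # Remove Duplicates
    repeated = sorted(v for v, c in counts.items() if c > 1)     # Filter1: count > 1
    single = sorted(v for v, c in counts.items() if c == 1)      # Filter2: count = 1
    return distinct, repeated, single


# Assumed input: 10..40 occur twice, 50..70 occur once.
source = [10, 10, 20, 20, 30, 30, 40, 40, 50, 60, 70]
distinct, repeated, single = split_by_count(source)
print(distinct)
print(repeated)
print(single)
```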
Q.
Details:
Col
-----
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
Soln:
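Use the constraints below in the Transformer (xfm) to move the data into 3
different columns, one constraint per output link.
Set the Transformer to run sequentially, because in parallel mode
@INROWNUM is counted separately in each partition.
mod(@inrownum,3) = 1
mod(@inrownum,3) = 2
mod(@inrownum,3) = 0
This sends the first record to the first output link, the second record to
the second, and the third record to the third output link; the cycle then
repeats for the remaining records.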
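The effect of the three mod constraints can be checked with a minimal Python sketch. It is a simulation only; route_rows is a hypothetical name, and the row number starts at 1 to mirror @INROWNUM in a sequentially running Transformer.

```python
def route_rows(rows, n_links=3):
    """Distribute rows over n_links outputs using the row number modulo n_links."""
    links = [[] for _ in range(n_links)]
    for inrownum, row in enumerate(rows, start=1):
        remainder = inrownum % n_links
        # remainder 1 -> first link, 2 -> second link, 0 -> last link
        links[(remainder - 1) % n_links].append(row)
    return links


rows = ["C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "C10"]
link1, link2, link3 = route_rows(rows)
print(link1)
print(link2)
print(link3)
```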
Q.
I have a sequential file that contains data records along with a header
and a footer. How do I count the records in the file without counting the
header and footer, transform the records into the target, and then
re-attach the header and footer to the file whose records match the
header?
Soln:
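cat filename | sed '1d;$d' | wc -l
(filename is the sequential file; sed deletes the first and last lines
before wc counts the remaining records.)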
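The same idea, separating the header and footer, counting what is left, and putting them back after the transformation, can be sketched in Python. The sample lines and the upper-casing "transformation" are purely illustrative assumptions, and split_header_footer is a hypothetical helper.

```python
def split_header_footer(lines):
    """Return (header, data_records, footer) for one header line and one footer line."""
    if len(lines) < 2:
        raise ValueError("expected at least a header and a footer line")
    return lines[0], lines[1:-1], lines[-1]


# Hypothetical file content, for illustration only.
sample = ["HDR", "rec1", "rec2", "rec3", "TRL"]
header, records, footer = split_header_footer(sample)
print(len(records))  # data records only, header and footer excluded

# Stand-in transformation, then re-attach the header and footer.
restored = [header] + [r.upper() for r in records] + [footer]
print(restored)
```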
Q.
input
-------------
name | no
--------------------
Bose 1
Mani 2
Arun 3
Output
-------------
name | no
--------------------
Bose 1
Mani 2
Mani 2
Arun 3
Arun 3
Arun 3
Soln:
Seq ---> Transformer ---> Dataset
3) In the output link derivations:
DSlink.name ---> Name
@ITERATION ---> Required_NumRows
O/p:
Name No Required_NumRows
BOSE 1 1
MANI 2 2
MANI 2 2
ARUN 3 3
ARUN 3 3
ARUN 3 3
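The row multiplication itself can be sketched in Python. Because the earlier steps of the solution are not shown here, the loop condition used below (repeat while the iteration number is less than or equal to no) is an assumption, and replicate_rows is a hypothetical name.

```python
def replicate_rows(rows):
    """Emit each (name, no) row `no` times, like a Transformer loop over the row."""
    out = []
    for name, no in rows:
        # Assumed loop condition: iteration <= no, with iteration starting at 1
        for _iteration in range(1, no + 1):
            out.append((name, no))
    return out


source = [("Bose", 1), ("Mani", 2), ("Arun", 3)]
target = replicate_rows(source)
for name, no in target:
    print(name, no)
```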
Q,
Soln:
The Copy stage is built only for copying data, whereas the Transformer
stage is built for many kinds of functionality and is compiled into C++ code.
If you only need to copy data and rename columns or change data types, use
the Copy stage: a Transformer would call its C++ functions even for that
single operation and spend far more time than the work warrants, so the
Copy stage is the better choice here. If you need several functions
together, such as copying along with mathematical, logical, or
miscellaneous functions, the Transformer stage is the better fit because
it handles multiple tasks in one stage.
Q,
Soln:
For Join, Merge, and Remove Duplicates, the data on the links should be
hash partitioned and sorted on the specified key columns. For Lookup, the
primary link needs to be hash partitioned and sorted on the key, and the
reference link has to use the Entire partition method.
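Why hash partitioning on the key matters can be illustrated with a small Python sketch: rows with the same key always land in the same partition, so each partition can be joined or de-duplicated on its own. This is not how the DataStage engine is implemented; the CRC32 hash and the name hash_partition_and_sort are assumptions made for the example.

```python
import zlib


def hash_partition_and_sort(rows, key, n_partitions):
    """Assign rows to partitions by a hash of the key, then sort each partition."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        # Deterministic hash of the key value decides the partition
        index = zlib.crc32(str(row[key]).encode("utf-8")) % n_partitions
        partitions[index].append(row)
    for part in partitions:
        part.sort(key=lambda r: r[key])  # sort within each partition on the key
    return partitions


rows = [{"id": k, "val": i} for i, k in enumerate([3, 1, 2, 3, 1, 3])]
parts = hash_partition_and_sort(rows, "id", 2)
for number, part in enumerate(parts):
    print(number, [r["id"] for r in part])
```

With the Entire method, by contrast, every partition receives a full copy of the reference data, so any primary row can find its match locally.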
Q,
Soln:
1. Filter first, then extract; do not extract and then filter. Use SQL
instead of the table method when extracting. Say 1 million records come
from the input table, but the job has a filter condition (Acct_Type=S)
as per the business documents that leaves only a few records, say 100.
Putting that condition in the extraction SQL means only those records
enter the job.
2. Reduce the number of Transformer stages as far as possible.
3. Reduce the number of stage variables.
4. Use the Copy stage instead of a Transformer for simple operations like
:
Q.
Soln: