Datastage Interview Questions
5. Debug stages in PX
12. What errors have you experienced with DataStage?
13. What are the main differences between server jobs and parallel jobs in DataStage?
15. What is the difference between the Sequential File stage and the Data Set stage? When do you use each?
17. What is a phantom error in DataStage? How do you overcome it?
19. Explain the best approach to do an SCD Type 2 mapping in a parallel job.
20. How can we improve the performance of a job when handling huge amounts of data?
22. How do you implement routines in DataStage? Does anyone have any material on DataStage?
23. How will you determine the sequence of jobs to load into data warehouse?
28. How do you run a shell script within the scope of a DataStage job?
33. How to connect two stages which do not have any common columns between them?
34. In SAP R/3, how do you declare and pass parameters in a parallel job?
36. How do you fix the error "OCI has fetched truncated data" in DataStage
37. A batch normally runs in 5 minutes, but after 10 days its run time has grown to 10 minutes. What type of error is this, and how do you fix it?
38. Which partitioning method should be used for the Aggregator stage in parallel jobs?
39. What is the baseline for using partitioning or parallel execution in a DataStage job? For example, is it advised only for more than 2 million records?
41. What is the flow of loading data into fact & dimensional tables?
43. Aggregators – What does the warning “Hash table has grown to ‘xyz’ ….” mean?
The source has 10,000 records and the job failed after 5,000 records were loaded, so the job status is Aborted. Instead of removing the 5,000 records from the target, how can I resume the load?
46. What are the Orchestrate options in the Generic stage? What are the option names and values (for example, the name of an Orchestrate operator to call)? What Orchestrate operators are available in DataStage for the AIX environment?
51. What difficulties have you faced in using DataStage? Or, what are the constraints in using DataStage?
52. Have you ever been involved in upgrading DS versions, such as DS 5.x? If so, tell us some of the steps you have taken in doing so.
53. What are XML files, how do you read data from them, and which stage should be used?
56. What is the default cache size? How do you change the cache size if needed?
The default cache size is 256 MB. To increase it, open DataStage Administrator, select the Tunables tab, and specify the cache size there.
57. How do you pass the parameter to the job sequence if the job is running at night?
61. What are the important considerations when using the Join stage instead of lookups?
62. How do you implement a Type 2 slowly changing dimension in DataStage? Give an example.
64. What are Static Hash files and Dynamic Hash files?
65. What is the difference between Datastage Server jobs and Datastage Parallel jobs?
69. What order of execution does the Transformer follow internally, given a stage editor with input links on the left-hand side and output links on the right?
70. How will you call external function or subroutine from datastage?
74. How do you perform a 4-way Oracle inner join if there are 4 Oracle input files?
77. How do you handle date conversions in DataStage? How do you convert the mm/dd/yyyy format to yyyy-dd-mm?
We use a) the "Iconv" function (internal conversion) and b) the "Oconv" function (external conversion).
The function to convert the mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(Fieldname,"D/M
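A minimal Python sketch of the same mm/dd/yyyy to yyyy-dd-mm conversion, shown only to make the parse-then-format idea concrete. It is not DataStage BASIC: `datetime.strptime` plays the role of Iconv (text to an internal date) and `strftime` plays the role of Oconv (internal date to text). The function name `mdy_to_ydm` is hypothetical.

```python
from datetime import datetime


def mdy_to_ydm(value: str) -> str:
    """Convert a mm/dd/yyyy string to yyyy-dd-mm (hypothetical helper)."""
    # Like Iconv: parse the external text into a real date; invalid dates raise ValueError.
    parsed = datetime.strptime(value, "%m/%d/%Y")
    # Like Oconv: format the date back out in the target layout.
    return parsed.strftime("%Y-%d-%m")


if __name__ == "__main__":
    print(mdy_to_ydm("12/31/2023"))  # 2023-31-12
```

Parsing before formatting means an impossible date such as 02/30/2024 raises an error instead of being silently rearranged, which is the main advantage of a parse-and-format pair over substring manipulation.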
78. How do you execute datastage job from command line prompt?
Use the "dsjob" command as follows: dsjob -run -jobstatus projectname jobname
80. How do you install and configure DataStage EE on Sun Microsystems multiprocessor hardware running the Solaris 9 operating system?
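Scripts that call dsjob often assemble the command line programmatically. The Python sketch below only builds the argument list; `-run`, `-jobstatus` and `-param name=value` are dsjob options, while the function names, the `RunDate` parameter and the assumption that `dsjob` is on the PATH are illustrative. With `-jobstatus`, dsjob waits for the job to finish and returns an exit code derived from the job's status, so a caller should check the return code rather than assume success.

```python
import shlex
import subprocess


def build_dsjob_command(project, job, params=None, dsjob="dsjob"):
    """Build the argument list for: dsjob -run -jobstatus [-param name=value ...] project job."""
    cmd = [dsjob, "-run", "-jobstatus"]
    # Each job parameter is passed as its own "-param name=value" pair.
    for name, value in (params or {}).items():
        cmd += ["-param", f"{name}={value}"]
    cmd += [project, job]
    return cmd


def run_dsjob(project, job, params=None):
    """Run the job and return dsjob's exit code (requires dsjob to be installed)."""
    return subprocess.run(build_dsjob_command(project, job, params)).returncode


if __name__ == "__main__":
    # RunDate is a hypothetical job parameter used only for illustration.
    cmd = build_dsjob_command("projectname", "jobname", {"RunDate": "2024-01-31"})
    print(shlex.join(cmd))
```

Passing a list to `subprocess.run` (instead of a single shell string) avoids quoting problems when parameter values contain spaces.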
81. What are all the third party tools used in DataStage?
83. What is the difference between a routine, a transform, and a function?
85. How do you attach an .mtr (MapTrace) file to an email? The MapTrace is used to record all map execution errors.
86. Is it possible to calculate a hash total for an EBCDIC file and have the hash total stored as
EBCDIC using Datastage?
Currently, the total is converted to ASCII, even though the individual records are stored as EBCDIC.
87. If you're running 4-way parallel and you have 10 stages on the canvas, how many processes does DataStage create?
89. How will you pass parameters to the job schedule if the job is running at night? What happens if one job fails during the night?
91. How do you find duplicate records using the Transformer stage in the Server edition?
102. Does type of partitioning change for SMP and MPP systems?
103. what is the difference between RELEASE THE JOB and KILL THE JOB?
104. Can you convert a snowflake schema into a star schema?
108. Where can we use the Link Partitioner, Link Collector, and Inter-Process (IPC) stages: in server jobs or in parallel jobs? And is SMP parallel or server?
109. Where can you output data using the Peek Stage?
111. In which situations do we use the Runtime Column Propagation option?
113. 1) What is the difference between a hash file and a sequential file? What is modulus?
2) What are the iconv and oconv functions?
3) How can we join one Oracle source and a sequential file?
4) How can we implement Slowly Changing Dimensions in DataStage?
5) How can we implement a lookup in DataStage server jobs?
6) What are all the third-party tools used in DataStage?
7) What is the difference between a routine, a transform, and a function?
8) What are job parameters?
9) Plug-in?
10) How can we improv
117. Suppose you have a table "sample" with three columns:
sample:
Cola  Colb  Colc
1     10    100
2     20    200
3     30    300
Assume cola is the primary key. How will you fetch the record with the maximum cola value into the target system using the DataStage tool?
119. What is TX, and what is its use in DataStage? As far as I know, TX stands for Transformation Extender, but I don't know how it works or where it is used.
120. What is the difference between the Merge stage and the Lookup stage?
123. What is the difference between the Dynamic RDBMS stage and the Static RDBMS stage?
133. What are the different dimension tables in your project? Please explain with an example.
138. How do you distinguish the surrogate keys in different dimension tables? How can we assign them for different dimension tables?
140. What is the difference between sequential file and a dataset? When to use the copy stage?
142. What is a complex stage? In which situations do we use it?
146. What are the most important aspects that a beginner must consider when doing their first DS project?
151. What is a job sequence used for? What are batches? What is the difference between a job sequence and a batch?
155. What is the purpose of using keys, and what is the difference between surrogate keys and natural keys?
156. How do you read data from Excel (XL) files? My problem is that my data file has some commas in the data, but we are using | as the delimiter. How do I read the data? Explain with steps.
157. How can I schedule the cleaning of the file &PH& by dsjob?
158. Hot Fix for ODBC Stage for AS400 V5R4 in Data Stage 7.1
161. What is the meaning of the following?
1) If an input file has an excessive number of rows and can be split up, then use standard logic to run jobs in parallel.
2) Tuning should occur on a job-by-job basis.
3) Use the power of the DBMS.
162. Why is a hash file faster than a sequential file and the ODBC stage?
168. What is NLS in DataStage? How do we use NLS in DataStage, and what are its advantages? At installation time I did not choose the NLS option, and now I want to use it. What can I do: reinstall DataStage, or first uninstall and then install again?
172. What is the use of a hash file? Instead of a hash file, why can't we use a sequential file itself?
173. What is the Pivot stage? Why and for what purpose is it used?
177. What is the difference between static hash files and dynamic hash files?
179. What is the difference between reference link and straight link ?
180. What are the command line functions that import and export the DS jobs?
182. What is the difference between an operational data store (ODS) and a data warehouse?
183. I have a few questions:
1) What are the various processes that start when the DataStage engine starts?
2) What changes need to be made on the database side if I have to use the DB2 stage?
3) Is the DataStage engine responsible for compilation, execution, or both?
184. Could anyone please give the full details of the DataStage certification: the title of the certification, the fee for the test, where tutorials for the certification are available, who conducts the exam, and whether any training institute or person offers guidance? I would be very pleased if anyone could enlighten me about the above.
Suresh
186. What is Ad-Hoc access? What is the difference between Managed Query and Ad-Hoc
access?
188. How do we use the DataStage Director and its run-time engine to schedule running the solution, test and debug its components, and monitor the resulting executable versions on an ad hoc or scheduled basis?
189. What is the difference between the OCI stage and the ODBC stage?
191. How do you remove duplicates without using remove duplicate stage?
192. If we are using two sources that have the same metadata, how do we check whether the data in the two sources is the same? And if the data is not the same, I want to abort the job. How can we do this?
193. If a DataStage job aborts after say 1000 records, how to continue the job from 1000th
record after fixing the error?
194. For what purpose are .dsx files used in DataStage?
196. Give one real-time situation where the Link Partitioner stage is used.
200. What is the exact difference between the Join, Merge, and Lookup stages?
202. What are the new features of DataStage 7.1 compared with DataStage 6.1?
204. How can you know the number of records in a sequential file before running a server job?
205. Other than round robin, what algorithm is used in the Link Collector? Also explain how it works.
206. How do you drop the index before loading data into the target, and how do you rebuild it in DataStage?
208. What are the transaction size and array size in the OCI stage? How can these be used?
216. How do you check the consistency and integrity of the model and repository?
217. How can we call a routine in a DataStage job? Explain with steps.
220. If the size of a hash file exceeds 2 GB, what happens? Does it overwrite the current rows?
221. Where do we use the Link Partitioner in a DataStage job? Explain with an example.
222. How do I create a DataStage engine stop/start script? My idea is as follows:
#!/bin/bash
# user: dsadm; su - root; password (encrypted)
DSHOMEBIN=/Ascential/DataStage/home/dsadm/Ascential/DataStage/DSEngine/bin
# If client connections exist (check with: ps -ef | grep DataStage), kill them:
#   kill -9 <PID of client connection>
$DSHOMEBIN/uv -admin -stop > /dev/null
$DSHOMEBIN/uv -admin -start > /dev/null
# Verify the processes and check the connection, then:
echo "Started properly"
# Run it as dsadm.
230. Hi, can anyone explain what the DB2 UDB utilities are?
232. Will DataStage consider the second constraint in the Transformer once the first condition is satisfied (if link ordering is given)?
234. 1) How can you implement slowly changing dimensions in DataStage? Explain. 2) Can you join a flat file and a database in DataStage? How?
236. DataStage from staging to MDW is running at only 1 row per second! What can we do to remedy this?
237. What is the meaning of "Try to have the constraints in the 'Selection' criteria of the jobs itself. This will eliminate the unnecessary records even getting in before joins are made"?
238. * What are constraints and derivations?
* Explain the process of taking a backup in DataStage.
* What are the different types of lookups available in DataStage?
243. How do you implement Type 2 slowly changing dimensions in DataStage? Explain with an example.
244. How large would the database be in DataStage? What is the difference between in-process and inter-process?
248. What is the meaning of "file extender" in DataStage server jobs? Can we run a DataStage job from another job, and where is that file data stored? What is the file extender in DS jobs?
250. What is a merge, and how can it be done? Please explain with a simple example using 2 tables.
251. Is it possible to run parallel jobs in server jobs?
252. What enhancements were made in DataStage 7.5 compared with 7.0?
253. If I add a new environment variable in Windows, how can I access it in DataStage?
255. Is it possible to move data from an Oracle warehouse to an SAP warehouse using the DataStage tool?
258. How can I extract data from DB2 (on IBM iSeries) to the data warehouse via Datastage as
the ETL tool. I mean do I first need to use ODBC to create connectivity and use an adapter for
the extraction and transformation of data? Thanks so much if anybody could provide an answer.
263. Did you parameterize the job or hard-code the values in the jobs?
Always parameterized the job. The values come either from Job Properties or from a ‘Parameter Manager’, a third-party tool. There is no way you should hard-code parameters in your jobs. The o
265. What happens if the output of a hash file is connected to a Transformer? What error does it throw?
266. What is a merge, and how do you use it? Merge is nothing but a filter condition that has been used for filtering.
267. What will you do in a situation where somebody wants to send you a file to use as an input or reference, and then run the job?
271. What are the differences between DataStage 7.0 and 7.5 in server jobs?
272. How does the hash file do a lookup in server jobs? How does it compare the key values?
274. How is DataStage 4.0 functionally different from the current Enterprise Edition? What are the exact changes?
276. What utility do you use to schedule jobs on a UNIX server other than Ascential Director?
Use the crontab utility along with the DSExecute() function, with the proper parameters passed.
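The cron side of this can be sketched as a crontab entry that runs dsjob every night. The Python helper below only formats such an entry; the paths to dsjob, to the engine's dsenv environment file and to the log file are placeholders, and sourcing dsenv first is an assumption reflecting the common need to set up the DataStage environment because cron starts jobs with a minimal one. The DataStage BASIC side (calling DSExecute()) is not shown.

```python
import shlex


def nightly_dsjob_cron_entry(hour, minute, project, job,
                             dsenv="/path/to/DSEngine/dsenv",
                             dsjob="/path/to/DSEngine/bin/dsjob",
                             log="/tmp/dsjob_nightly.log"):
    """Format a daily crontab line that sources dsenv, then runs the job via dsjob.

    All three paths are placeholders, not real install locations.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    run = shlex.join([dsjob, "-run", "-jobstatus", project, job])
    # Cron fields: minute hour day-of-month month day-of-week, then the command.
    return (f"{minute} {hour} * * * "
            f". {shlex.quote(dsenv)} && {run} >> {shlex.quote(log)} 2>&1")


if __name__ == "__main__":
    print(nightly_dsjob_cron_entry(2, 30, "projectname", "jobname"))
```

Appending both stdout and stderr to a log file gives you something to inspect the next morning if a night run fails.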
277. How can I connect my DB2 database on AS400 to DataStage? Do I need to use ODBC first to open the database connectivity and then use an adapter just for connecting between the two? Thanks a lot for any replies.
278. What is OCI, and how is it used in ETL tools?
OCI stands for Oracle Call Interface, Oracle's native client API; DataStage's Oracle OCI stages (such as ORAOCI8/9) use it to read from and write to Oracle. Bulk loading of large volumes is handled separately by the ORABULK stage.
281. Hi! Can anyone tell me how to extract data from more than one heterogeneous source, for example a sequential file, Sybase, and Oracle, in a single job?
284. What are the OConv() and IConv() functions, and where are they used?
IConv(): converts a string to an internal storage format.
OConv(): converts an expression to an output format.
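To make "internal storage format" concrete for dates: DataStage's BASIC engine stores a date internally as a day number, with 31 December 1967 as day 0. The Python functions below are a rough emulation of that round trip for illustration only; they take Python `strptime`/`strftime` patterns rather than DataStage conversion codes, and the names `iconv_date` and `oconv_date` are hypothetical.

```python
from datetime import date, datetime, timedelta

# Day 0 of the internal date representation used by DataStage BASIC.
INTERNAL_EPOCH = date(1967, 12, 31)


def iconv_date(text, fmt="%m/%d/%Y"):
    """Emulate Iconv for dates: external text -> internal day number."""
    return (datetime.strptime(text, fmt).date() - INTERNAL_EPOCH).days


def oconv_date(day_number, fmt="%Y-%m-%d"):
    """Emulate Oconv for dates: internal day number -> external text."""
    return (INTERNAL_EPOCH + timedelta(days=day_number)).strftime(fmt)


if __name__ == "__main__":
    day = iconv_date("01/01/1968")
    print(day, oconv_date(day))  # 1 1968-01-01
```

Because the internal value is a plain integer, date arithmetic (adding days, comparing dates) works directly on it before converting back out.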
285. If data is partitioned in your job on key 1 and then you aggregate on key 2, what issues
could arise?
286. How can I specify a filter command for processing data while defining sequential file output
data?
287. There are three different types of user-created stages available for PX. What are they? Which would you use? What are the disadvantages of using each type?
292. Does the Enterprise Edition only add parallel processing for better performance? Are any stages/transformations available only in the Enterprise Edition?
293. What validations do you perform after creating jobs in the Designer? What are the different types of errors you faced during loading, and how did you solve them?
295. How do we use the NLS function in DataStage? What are the advantages of the NLS function, and where can we use it? Explain briefly.
300. Is the BibhudataStage Oracle plug-in better than the OCI plug-in that comes with DataStage? What are the BibhudataStage extra functions?
301. How do we do the automation of dsjobs?
302. What is troubleshooting in server jobs? What are the different kinds of errors encountered while running a job?
303. What are DataStage multi-byte and single-byte file conversions? How do we use those conversions in DataStage?
304. What other performance tuning have you done in your last project to improve slowly running jobs?
Staged the data coming from ODBC/OCI/DB2 UDB stages, or any database on the server, in hash/sequential files for optimum performance and also for data recovery in case a job aborts. Tuned the OCI stage for '
305. What are DataStage multi-byte and single-byte file conversions in mainframe jobs? What is UTF-8, and what is it used for?
307. What are routines, where and how are they written, and have you written any routines before?
Routines are stored in the Routines branch of the DataStage Repository, where you can create, view, or edit them. The following are the different types of routines: 1) Transform functions
309. Hi, what are the Repository tables in DataStage, and what are they used for?
310. I want to process 3 files sequentially, one by one. How can I do that? While processing, the job should fetch the files automatically.
311. Where does a DataStage UNIX script execute: on the client machine or on the server? If it executes on the server, how will it execute?
312. Please list the versions of the DataStage Parallel and Server editions and the years in which they were released.
320. Scenario-based question: suppose 4 jobs (job 1, job 2, job 3, job 4) are controlled by a sequencer. Job 1 has 10,000 rows, but after the run only 5,000 rows have been loaded into the target table, the rest are not loaded, and the job aborts. How can you sort out the problem?
Suppose the job sequencer synchronizes or controls 4 jobs but job 1 has a problem. In this case, go to the Director and check what type of problem is shown: a data type problem, a warning message, job
321. What is a batch program, and how can it be generated?
A batch program is generated at run time and maintained by DataStage itself, but you can easily change it based on your requirements (extraction, transformation, loading). Batch progr
323. How many jobs have you created in your last project?
100+ jobs every 6 months if you are in development, or 40 jobs every 6 months if you are in testing, although it need not be the same number for everybody.
324. What is the difference between DataStage developers and DataStage designers? What skills are required for each?
325. Could you please help me with a set of questions on Parallel Extender?
327. Suppose there are millions of records. Did you use OCI? If not, which stage do you prefer?
332. What is project life cycle and how do you implement it?
334. What are the most often used stages, or the stages you worked with in your last project?
A) Transformer, ORAOCI8/9, ODBC, Link-Partitioner, Link-Collector, Hash, Aggregator, Sort.
335. Have you ever been involved in upgrading DS versions, such as DS 5.x? If so, tell us some of the steps you have taken in doing so.
Yes. The following are some of the steps I have taken in doing so:
1) Definitely take a backup of the whole project(s) by exporting the project as a .dsx file.
2) See that you are using the same parent
337. If you have worked with DS 6.0 and later versions, what are the Link Partitioner and Link Collector used for?
Link Partitioner: used for partitioning the data.
Link Collector: used for collecting the partitioned data.
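As a rough model of what these two stages do, the Python sketch below deals rows out to N partitions in round-robin order and then collects them back by taking one row from each partition in turn. Round robin is only one possible method, and the function names are hypothetical; the point is that collecting in the same round-robin order restores the original row sequence.

```python
from itertools import zip_longest

# Marks the gaps left when partitions end up with unequal lengths.
_MISSING = object()


def round_robin_partition(rows, n):
    """Send row i to partition i % n, like a round-robin partitioner."""
    if n < 1:
        raise ValueError("n must be at least 1")
    partitions = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        partitions[i % n].append(row)
    return partitions


def round_robin_collect(partitions):
    """Take one row from each partition in turn until all are exhausted."""
    collected = []
    for group in zip_longest(*partitions, fillvalue=_MISSING):
        collected.extend(row for row in group if row is not _MISSING)
    return collected


if __name__ == "__main__":
    parts = round_robin_partition(range(10), 3)
    print(parts)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
    print(round_robin_collect(parts))
```

Round robin spreads rows evenly but ignores key values, so rows with the same key can land in different partitions; key-based partitioning is the usual choice when downstream processing groups by key.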
342. The above might raise another question: why do we have to load the dimension tables first, then the fact tables?
As we load the dimension tables, the (primary) keys are generated, and these keys are foreign keys in the fact tables.
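The ordering argument can be seen in a small sketch: the fact load has to translate each natural key into the surrogate key that the dimension load generated, so the dimension must exist first. The Python below is illustrative only; the column names `customer_id`, `customer_sk` and `amount` are hypothetical, and fact rows whose key has no dimension entry are rejected rather than loaded.

```python
def load_dimension(natural_keys, start=1):
    """Assign a new surrogate key to each natural key the first time it is seen."""
    dimension = {}
    next_key = start
    for natural_key in natural_keys:
        if natural_key not in dimension:
            dimension[natural_key] = next_key
            next_key += 1
    return dimension


def load_facts(fact_rows, customer_dim):
    """Swap each fact row's natural key for the dimension's surrogate key."""
    loaded, rejected = [], []
    for row in fact_rows:
        surrogate = customer_dim.get(row["customer_id"])
        if surrogate is None:
            rejected.append(row)  # no dimension row yet, so no foreign key to use
        else:
            loaded.append({"customer_sk": surrogate, "amount": row["amount"]})
    return loaded, rejected


if __name__ == "__main__":
    dim = load_dimension(["C1", "C2", "C1"])
    facts = [{"customer_id": "C2", "amount": 5}, {"customer_id": "C3", "amount": 7}]
    print(load_facts(facts, dim))
```

If the facts were loaded first, every row would end up in the rejected list because the surrogate keys would not exist yet.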
343. Tell me about one situation from your last project where you faced a problem. How did you solve it?
A. The jobs in which data was read directly from OCI stages were running extremely slowly. I had to stage the data before sending it to the Transformer to make the jobs run faster.
B. The job aborts
344. Does selecting 'Clear the table and Insert rows' in the ODBC stage send a TRUNCATE statement to the DB, or does it use some kind of DELETE logic?
There is no TRUNCATE on ODBC stages. The option is 'Clear the table...', and that issues a DELETE FROM statement. On an OCI stage such as Oracle, you have both Clear and Truncate options. They are radically di
345. How do you rename all of the jobs to support your new file-naming conventions?
Create an Excel spreadsheet with the new and old names. Export the whole project as a .dsx file. Write a Perl program that does a simple rename of the strings by looking them up in the Excel file.
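The same rename can be sketched in Python instead of Perl, with the spreadsheet saved as a two-column CSV (old name, new name) rather than read as an Excel file; both substitutions are assumptions for illustration, and the function names and sample .dsx text are made up. Matching whole names in a single regular-expression pass avoids renaming part of a longer job name and keeps swapped names (A to B, B to A) from being renamed twice. It assumes job names consist of letters, digits and underscores.

```python
import csv
import io
import re


def load_mapping(csv_text):
    """Parse 'old_name,new_name' rows into a dict, skipping blank or short rows."""
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[0].strip():
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def rename_jobs(dsx_text, mapping):
    """Replace every whole-word occurrence of an old job name with its new name."""
    if not mapping:
        return dsx_text
    # Longest names first so the alternation prefers the most specific match.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    # A single pass: each match is looked up once, so renames never chain.
    return pattern.sub(lambda m: mapping[m.group(1)], dsx_text)


if __name__ == "__main__":
    mapping = load_mapping("LoadCust,ld_customer\nLoadCustAddr,ld_customer_addr\n")
    sample = 'Identifier "LoadCust"\nIdentifier "LoadCustAddr"\n'
    print(rename_jobs(sample, mapping))
```

Because a job name can also appear as an ordinary string elsewhere in the export, review the differences in the rewritten .dsx before importing it.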