Question: Dimension Modeling Types Along With Their Significance
Answer:
Data Modelling is broadly classified into 2 types:
A) E-R Diagrams (Entity-Relationships).
B) Dimensional Modelling.
Question: What is a Surrogate Key?
Answer:
A Surrogate Key is a Primary Key for a Dimension table. The main advantage of using it is that it
is independent of the underlying database, i.e. the Surrogate Key is not affected by the changes
going on in the database.
Answer:
Data in a Database is
A) Detailed or Transactional
C) Current.
Question: What is the flow of loading data into fact & dimensional tables?
Answer:
Fact table - Table with a collection of Foreign Keys corresponding to the Primary Keys in the
Dimension tables.
Load - Data should first be loaded into the dimensional tables. Based on the primary key
values in the dimensional tables, data should then be loaded into the Fact table.
Answer:
Orchestrate itself is an ETL tool with extensive parallel processing capabilities, running on the
UNIX platform. DataStage used Orchestrate with DataStage XE (the beta version of 6.0) to
incorporate the parallel processing capabilities. Ascential then purchased Orchestrate and
integrated it with DataStage XE, releasing a new version, DataStage 6.0.
Answer:
A Primary Key is a combination of unique and not null. It can be a collection of key values,
called a composite primary key. A Partition Key is just a part of the Primary Key. There are
several methods of partitioning such as Hash, DB2, Random etc. While using Hash partitioning we
specify the Partition Key.
Answer:
Stage Variable - An intermediate processing variable that retains its value during the read and
does not pass the value on to the target column.
Question: What is the default cache size? How do you change the cache size if
needed?
Answer:
The default cache size is 256 MB. We can increase it by going into the DataStage Administrator,
selecting the Tunables tab, and specifying the cache size there.
Answer:
Used for look-ups. It is like a reference table. It is also used in place of ODBC/OCI tables for
better performance.
Answer:
i) Generic
ii) Specific
Question: What are Static Hash files and Dynamic Hash files?
Answer:
As the names themselves suggest what they mean. In general we use Type-30 dynamic hash
files. The data file has a default size of 2 GB, and the overflow file is used if the data
exceeds the 2 GB size.
Question: What is the difference between ODBC and Plug-in stages?
Answer:
ODBC: Can be used for a variety of databases.
Plug-in: Database specific (only one database).
Question: How do you execute a DataStage job from the command line prompt?
Answer:
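A typical way to do this is with the dsjob command-line interface that ships with the DataStage server engine. A minimal sketch, assuming $DSHOME/bin is on the PATH and using placeholder project/job names:

dsjob -run -jobstatus MyProject MyJob
# -jobstatus makes dsjob wait for the job to finish and return its status as the exit code

You can also check the result afterwards with dsjob -jobinfo MyProject MyJob.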
Question: What are the command line functions that import and export the DS
jobs?
Answer:
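The usual answer is the dsexport.exe and dsimport.exe executables installed with the DataStage client. Exact flag spellings vary between DataStage versions, so treat the following as an illustrative sketch with placeholder server, user and job names:

dsexport.exe /H=dsserver /U=dsadm /P=secret /JOB=MyJob MyProject C:\export\MyJob.dsx
dsimport.exe /H=dsserver /U=dsadm /P=secret MyProject C:\export\MyJob.dsx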
Question: How to run a Shell Script within the scope of a Data stage job?
Answer:
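One common approach in server jobs: in Job Properties, set the Before-job or After-job subroutine to ExecSH and supply the command as the input value (the script path below is a placeholder):

Before-job subroutine:  ExecSH
Input value:            /home/dsadm/scripts/prepare_files.sh

Within a job-control routine you can achieve the same effect with the DSExecute routine.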
Question: What are OConv () and Iconv () functions and where are they used?
Answer:
Iconv() converts a string to an internal storage format, and Oconv() converts an expression to an
output format. They are used in derivations and transforms, typically for date conversions.
Question: How do you convert a date field from mm/dd/yyyy format to yyyy-dd-mm format?
Answer:
We use
Oconv(Iconv(Fieldname,"D/MDY[2,2,4]"),"D-YDM[4,2,2]")
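As a worked example with a hypothetical field value (and assuming day 0 of the internal date format is 31 DEC 1967, as noted later in this document):

Oconv(Iconv("12/31/1999","D/MDY[2,2,4]"),"D-YDM[4,2,2]")   -> "1999-31-12"

where the inner Iconv yields the internal day number (11688) and the outer Oconv formats it as year-day-month.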
Question: What is the difference between the Link Partitioner and the Link Collector stages?
Answer:
Link Partitioner: It actually splits data into various partitions or data flows using various
Partition methods.
Link Collector: It collects the data coming from partitions, merges it into a single data flow,
and loads it to the target.
Question: Did you Parameterize the job or hard-coded the values in the jobs?
Answer:
Always parameterized the job. Either the values come from Job Properties or from a
'Parameter Manager' - a third-party tool. There is no way you should hard-code
parameters in your jobs. The most often parameterized variables in a job are: DB DSN name,
username, password, and the dates with respect to which the data is to be looked up.
Question: Have you ever been involved in updating the DS versions, like DS 5.X? If so, tell
us some of the steps you have taken in doing so.
Answer:
Yes.
• Definitely take a backup of the whole project(s) by exporting the project as a .dsx file.
• See that you are using the same parent folder for the new version as for your old jobs.
• After installing the new version, import the old project(s); you have to compile them
all again. You can use the 'Compile All' tool for this.
• Make sure that all your DB DSNs are created with the same names as the old ones. This step
is for moving DS from one machine to another.
• In case you are just upgrading your DB from Oracle 8i to Oracle 9i, there is a tool on the DS
CD that can do this for you.
• Do not stop the 6.0 server before the upgrade; the version 7.0 install process collects project
information during the upgrade.
Question: How did you handle reject data?
Answer:
Typically a Reject link is defined and the rejected data is loaded back into the data
warehouse. So a Reject link has to be defined for every output link from which you wish to collect
rejected data. Rejected data is typically bad data like duplicate primary keys or null rows
where data is expected.
Question: What are other performance tunings you have done in your last project to improve the
performance of slowly running jobs?
Answer:
• Staged the data coming from ODBC/OCI/DB2UDB stages or any database on the
server using Hash/Sequential files for optimum performance, and also for data recovery in
case a job aborts.
• Tuned the OCI stage's 'Array Size' and 'Rows per Transaction' numerical values for
faster inserts, updates and selects.
• Sorted the data as much as possible in the DB and reduced the use of DS-Sort for better
performance of jobs.
• Removed the data not used from the source as early as possible in the job.
• Worked with the DB admins to create appropriate indexes on tables for better
performance of DS queries.
• If an input file has an excessive number of rows and can be split up, then use standard
logic to run jobs in parallel.
• Before writing a routine or a transform, make sure that the functionality required is not
already available in one of the standard routines supplied in the sdk or ds utilities categories.
• Constraints are generally CPU intensive and take a significant amount of time to
process. This may be the case if the constraint calls routines or external macros, but if it is
inline code then the overhead will be minimal.
• Try to have the constraints in the 'Selection' criteria of the jobs itself. This will
eliminate the unnecessary records even getting in before joins are made.
• Try not to use a sort stage when you can use an ORDER BY clause in the database.
• Using a constraint to filter a record set is much slower than performing a SELECT …
WHERE….
• Make every attempt to use the bulk loader for your particular database; bulk loaders
are generally faster than using ODBC or OLE.
Question: Tell me one situation from your last project where you faced a problem and how
you solved it.
Answer:
1. The jobs in which data is read directly from OCI stages are running extremely slow. I
had to stage the data before sending to the transformer to make the jobs run faster.
2. The job aborts in the middle of loading some 500,000 rows. We had the option of either
cleaning/deleting the loaded data and then running the fixed job, or running the job again from
the row at which the job had aborted. To make sure the load was proper, we opted for the former.
Question: Give the OS of the Server and the OS of the Client of your most recent project.
Question: How did you connect to DB2 in your last project?
Answer:
Most of the time the data was sent to us in the form of flat files; the data is dumped and
sent to us. In some cases, where we needed to connect to DB2 for look-ups, for instance,
we used ODBC drivers to connect to DB2 (or DB2-UDB), depending on the situation
and availability. Certainly DB2-UDB is better in terms of performance, as you know the
native drivers are always better than ODBC drivers (e.g. the 'iSeries Access ODBC Driver').
Question: What are Routines and where/how are they written, and have you written any routines before?
Answer:
Routines are stored in the Routines branch of the DataStage Repository, where you can
create, view or edit them. The following are the types of routines:
1. Transform Functions
2. Before-After Job subroutines
3. Job Control Routines
Question: How did you handle an 'Aborted' sequencer?
Answer:
In almost all cases we have to delete the data inserted by this from the DB manually, fix
the job, and then run the job again.
Question: What is a Sequencer?
Answer:
Sequencers are job control programs that execute other jobs with preset job parameters.
Answer:
Functions like [] -> the sub-string function and ':' -> the concatenation operator.
Syntax: string [ start, length ]    string1 : string2
Question: What will you do in a situation where somebody wants to send you a file and
use that file as an input or a reference and then run the job?
Answer:
Under Windows: Use the 'WaitForFileActivity' activity under the Sequencers and then run the
job. Maybe you can schedule the sequencer around the time the file is expected to
arrive.
Under UNIX: Poll for the file. Once the file has arrived, start the job or sequencer depending
on the file.
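A minimal polling sketch under UNIX (file path, project and job names are placeholders):

#!/bin/sh
# Wait until the expected file arrives, then kick off the DataStage job.
FILE=/data/incoming/customers.dat
while [ ! -f "$FILE" ]
do
    sleep 60        # check once a minute
done
dsjob -run -jobstatus MyProject LoadCustomers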
Question: What is the utility you use to schedule the jobs on a UNIX server other than using
Ascential Director?
Answer:
Use the crontab utility, along with a shell script that invokes the dsjob command with the proper
parameters passed.
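A minimal sketch of such a schedule (all paths, project and job names are placeholders; the dsenv location is site-specific):

# crontab entry: run the nightly load sequence at 02:00 every day
0 2 * * * /home/dsadm/bin/nightly_load.sh >> /home/dsadm/logs/nightly_load.log 2>&1

# nightly_load.sh
#!/bin/sh
. /opt/datastage/DSEngine/dsenv          # source the DataStage environment (site-specific path)
$DSHOME/bin/dsjob -run -jobstatus MyProject NightlyLoadSeq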
Answer:
Yes. One of the most important requirements.
Question: How would you call an external Java function which is not supported by
DataStage?
Answer:
Starting from DS 6.0 we have the ability to call external Java functions using a Java
package from Ascential. In this case we can even use the command line to invoke the
Java function, write the return values from the Java program (if any) to a file, and use that
file as a source in a DataStage job.
Question: How will you determine the sequence of jobs to load into data warehouse?
Answer:
First we execute the jobs that load the data into the Dimension tables, then the Fact tables, then
the Aggregator jobs (if any).
Question: The above might raise another question: why do we have to load the
dimensional tables first, then the fact tables?
Answer:
As we load the dimensional tables, the (primary) keys are generated, and these keys are the
foreign keys in the fact tables.
Question: Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a
Truncate statement to the DB, or does it do some kind of Delete logic?
Answer:
There is no TRUNCATE on ODBC stages. 'Clear the table' issues a DELETE FROM
statement. On an OCI stage such as Oracle, you do have both Clear and Truncate
options. They are radically different in permissions (Truncate requires you to have alter
table permissions where Delete doesn't).
Question: How do you rename all of the jobs to support your new File-naming
conventions?
Answer:
Create an Excel spreadsheet with new and old names. Export the whole project as a dsx.
Write a Perl program, which can do a simple rename of the strings looking up the Excel
file. Then import the new dsx file probably into a new project for testing. Recompile all
jobs. Be cautious that the names of the jobs also have to be changed in your job control jobs
or Sequencer jobs, so you have to make the necessary changes to those Sequencers.
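As an illustration of the rename step only (job names here are hypothetical; in practice the Perl program loops over all old/new pairs read from the spreadsheet):

perl -pe 's/J_LoadCustomer/STG_LOAD_CUSTOMER/g' MyProject.dsx > MyProject_renamed.dsx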
Question: What is the difference between DataStage Developers and DataStage Designers? What
are the skills required for each?
Answer:
A DataStage developer is one who codes the jobs. A DataStage designer is one who designs
the jobs; I mean he deals with the blueprints and designs the jobs and the stages that are
required in developing the job.
Question: What are the main differences between Ascential DataStage and
Informatica PowerCenter?
Answer:
Chuck Kelley’s Answer: You are right; they have pretty much similar functionality.
However, what are the requirements for your ETL tool? Do you have large sequential
files (1 million rows, for example) that need to be compared every day versus yesterday?
If so, then ask how each vendor would do that. Think about what process they are going
to do. Are they requiring you to load yesterday’s file into a table and do lookups? If so,
RUN!! Are they doing a match/merge routine that knows how to process this in
sequential files? Then maybe they are the right one. It all depends on what you need the
ETL to do. If you are small enough in your data sets, then either would probably be OK.
Les Barbusinski’s Answer: Without getting into specifics, here are some differences you may
want to explore with each vendor:
• Does the tool use a relational or a proprietary database to store its metadata repository? If
proprietary, why?
• What add-ons are available for extracting data from industry-standard ERP, Accounting, and
CRM packages?
• Can the tool’s metadata be integrated with third-party data modeling and/or business
intelligence tools? If so, how and with which ones?
• How well does each tool handle complex transformations, and how much external
scripting is required?
Almost any ETL tool will look like any other on the surface. The trick is to find out
which one will work best in your environment. The best way I’ve found to make this
determination is to ascertain how successful each vendor’s clients have been using their
product. Especially clients who closely resemble your shop in terms of size, industry, in-house
skill sets, platforms, source systems, data volumes and transformation complexity.
Ask both vendors for a list of their customers with characteristics similar to your own that
have used their ETL product for at least a year. Then interview each client (preferably
several people at each site) with an eye toward identifying unexpected problems, benefits,
or quirkiness with the tool that have been encountered by that customer. Ultimately, ask
each customer – if they had it all to do over again – whether or not they’d choose the
same tool and why? You might be surprised at some of the answers.
Joyce Bischoff’s Answer: You should do a careful research job when selecting products.
You should first document your requirements, identify all possible products and evaluate
each product against the detailed requirements. There are numerous ETL products on the
market and it seems that you are looking at only two of them. If you are unfamiliar with
the many products available, you may refer to www.tdan.com, the Data Administration
Newsletter, for lists of available products.
features are stronger than the other product. Ask both vendors and compare the answers,
which may or may not be totally accurate. After you are very familiar with the products,
call their references and be sure to talk with technical people who are actually using the
product. You will not want the vendor to have a representative present when you speak
with someone at the reference site. It is also not a good idea to depend upon a high-level
manager at the reference site for a reliable opinion of the product. Managers may paint a
very rosy picture of any selected product so that they do not look like they selected an
inferior product.
Answer:
1. Transform of routine
a. Date Transformation
b. Upstring Transformation
3. XML transformation
Answer: A batch program is a program generated at run time and maintained by DataStage
itself, but you can easily change it on the basis of your requirement (Extraction,
Transformation, Loading). Batch programs are generated depending on your job's nature, either a
simple job or a sequencer job; you can see this program under the Job Control option.
Question: Suppose 4 jobs are controlled by a sequencer (job 1, job 2, job 3, job 4). If job 1
has 10,000 rows but after the run only 5,000 rows have been loaded into the target table, the
remaining rows are not loaded, and your job is going to be aborted. How can you resolve this?
Answer:
Suppose the job sequencer synchronizes or controls the 4 jobs, but job 1 has a problem. In this
condition you should go to the Director and check what type of problem is being shown: a data
type problem, a warning message, a job failure or a job abort. If the job failed it usually means a
data type or similar problem.
In your target table stage, under General -> Action, two options are relevant here:
first check how much data has already been loaded, then select the Skip option and
continue; for the remaining data that was not loaded, select 'On Fail, Continue'.
Answer:
In such a case OSH has to perform the import and export every time the job runs, and the
processing time is high.
Question: What is the difference between the Filter stage and the Switch stage?
Ans: There are two main differences, and probably some minor ones as well. The two
main differences are:
1) The Filter stage can send one input row to more than one output link. The Switch
stage cannot - the C switch construct has an implicit break in every case.
2) The Switch stage is limited to 128 output links; the Filter stage can have an unlimited
number of output links.
Question: How can I achieve constraint-based loading using DataStage 7.5? My target
tables have inter-dependencies, i.e. primary key / foreign key constraints. I want my
primary key tables to be loaded first and then my foreign key tables, and also the primary key
tables should be committed before the foreign key tables are executed. How can I go
about it?
Answer:
1) In the sequencer, call all the Primary Key table loading jobs first, followed by the Foreign
Key table jobs; when triggering the Foreign Key table load jobs, trigger them only when the
Primary Key load jobs have run successfully (e.g. with an 'OK' trigger).
2) To improve the performance of the job, you can disable all the constraints on the
tables and load them. Once loading is done, check the integrity of the data; raise the data
which does not meet it as exceptional data and cleanse it.
This is only a suggestion; normally, loading while the constraints are up will drastically
slow it down.
3) If you use Star schema modeling, when you create physical DB from the model, you
can delete all constraints and the referential integrity would be maintained in the ETL
process by referring all your dimension keys while loading fact tables. Once all
dimensional keys are assigned to a fact then dimension and fact can be loaded together.
Question: How do you merge two files in DS?
Ans: Either use the Copy command as a Before-job subroutine if the metadata of the 2 files
is the same, or create a job to concatenate the 2 files into one if the metadata is different.
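If the job runs on UNIX and both files share the same metadata, the copy can simply be a concatenation done in a Before-job subroutine (ExecSH); the paths below are placeholders:

cat /data/in/file1.dat /data/in/file2.dat > /data/in/merged.dat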
Question: How do you eliminate duplicate rows?
Ans: DataStage provides us with a stage called Remove Duplicates in the Enterprise Edition. Using
that stage we can eliminate the duplicates based on a key column.
Question: How do you pass the filename as a parameter to a job?
Ans: During job development we can create a parameter 'FILE_NAME', and its value can
be passed in while running the job.
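The parameter can then be supplied at run time, for example from the command line (project, job and path names are placeholders):

dsjob -run -jobstatus -param FILE_NAME=/data/in/customers_20050101.dat MyProject LoadCustomers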
Ans: In almost all cases we have to delete the data inserted by this from DB manually
and fix the job and then run the job again.
You can only export full projects from the command line.
You can find the export and import executables on the client machine, usually somewhere
under the DataStage client installation directory.
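For a full-project export from the command line, dscmdexport is the usual executable; flags vary slightly by version, so this is only a sketch with placeholder names:

dscmdexport /H=dsserver /U=dsadm /P=secret MyProject C:\export\MyProject.dsx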
Question: What is the difference between the Join stage and the Merge stage?
Answer:
JOIN: Performs join operations on two or more data sets input to the stage and then
outputs the resulting data set.
MERGE: Combines a sorted master data set with one or more sorted update data sets.
The columns from the records in the master and update data sets are merged so that the
output record contains all the columns from the master record plus any additional columns
from each update record.
A master record and an update record are merged only if both of them have the same
values for the merge key column(s) that we specify. Merge key columns are one or more
columns that exist in both the master and update records.
Answer:
Business advantages:
It is able to integrate data coming from all parts of the company;
We can collect data from different clients with it, and compare them;
It makes research into new business possibilities possible;
Technological advantages:
It offers the possibility for the organization of a complex business intelligence;
Easily implementable.
DataStage Manager is used to import & export the project and to view & edit the contents of
the repository.
DataStage Administrator is used for creating projects, deleting projects & setting the
environment variables.
DataStage Director is used to run the jobs, validate the jobs and schedule the jobs.
Server components
DS server: runs executable server jobs, under the control of the DS Director, that extract,
transform, and load data into a data warehouse.
DS Package Installer: a user interface used to install packaged DS jobs and plug-ins.
Repository or project: a central store that contains all the information required to build a
data mart or data warehouse.
3. I have some jobs; every month I want to automatically delete the log details. What are the
steps to do this?
4. I want to run multiple jobs in a single job. How can you handle this?
5. What is the difference between VSS and CVSS?
VSS is designed by Microsoft, but the disadvantage is that only one user can access it at a time;
other users have to wait until the first user completes the operation.
With CVSS, many users can access it concurrently. Compared to VSS, the cost of CVSS is high.
6. What is the difference between clear log file and clear status file?
Clear log --- we can clear the log details by using the DS Director. Under the Job menu the
Clear Log option is available. By using this option we can clear the log details of a
particular job.
Clear status file --- lets the user remove the status of the record associated with all stages of
that job.
7. I developed 1 job with 50 stages; at run time one stage is missing. How can you identify
which stage is missing?
By using the usage analysis tool, which is available in the DS Manager, we can find out which
stage is missing.
8. My job takes 30 minutes to run; I want to run the job in less than 30 minutes. What are the ways?
By using the performance tuning aspects which are available in DS, we can reduce the time.
Tuning aspects include enabling in-process/inter-process row buffering in the DS Administrator
and tuning the Array Size and Rows per Transaction values on OCI stages.
And also use the Link Partitioner & Link Collector stages in between passive stages.
9. What is the Pivot stage used for?
The Pivot stage is used for transposition purposes. Pivot is an active stage that maps sets of
columns in an input table to a single column in an output table.
10. If a job is locked by some user, how can you unlock that particular job in DS?
We can unlock the job by using the 'Cleanup Resources' option, which is available in the DS
Director. Otherwise we can find the PID (process id) and kill the process on the UNIX server.
11. What is a container? How many types of containers are available? Is it possible to
deconstruct a container?
A container is a group of stages and links. Containers enable you to simplify and
modularize your server job designs by replacing complex areas of the diagram with a single
container stage. There are two types of containers:
• Local containers. These are created within a job and are accessible by that job only.
• Shared containers. These are created separately and are stored in the Repository in the
same way that jobs are. Shared containers can be used by any job in the project.
To deconstruct a shared container, first you have to convert the shared container to a local
container, and then deconstruct the local container.
13. I am getting input value like X = Iconv(“31 DEC 1967”,”D”)? What is the X
value?
X value is Zero.
The Iconv function converts a string to an internal storage format. It takes 31 DEC 1967 as
the base date (day zero) of the internal date format, so X is zero.
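Similarly, Iconv("01 JAN 1968","D") returns 1, since 1 January 1968 is one day after the base date.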
14. What are unit testing, integration testing and system testing?
Unit testing: As for DS, a unit test will check for data type mismatches, the size of particular
data types, and column mismatches.
Integration testing: According to the dependencies, we put all the jobs integrated into one
sequence and test them together.
System testing: System testing is nothing but checking the performance tuning aspects in DS.
15. What are the command line functions that import and export the DS jobs?
16. How many hashing algorithms are available for static hash file and dynamic
hash file?
17. What happens when you have a job that links two passive stages together?
Obviously there is some process going on. Under the covers, DS inserts a cut-down
transformer stage between the passive stages, which just passes data straight from one stage
to the other.
18. What is the Nested Condition activity in a job sequence?
Nested Condition: allows you to further branch the execution of a sequence depending
on a condition.
19. I have three jobs A, B and C, which are dependent on each other. I want to run jobs A
& C daily and job B only on Sunday. How can you do it?
First you have to schedule jobs A & C to run Monday to Saturday in one sequence.
Next, take the three jobs according to their dependency in one more sequence and schedule that
sequence to run only on Sunday.