FAQ Infa Forum
Q4) What kind of challenges did you come across in your project?
A4) The main challenge was finalizing the requirements in such a way that the different stakeholders
came to a common agreement about the scope of and expectations from the project.
Q10) Did you follow any formal process or methodology for requirement gathering?
A10) We did not follow a strict SDLC approach as such, because requirement gathering is an iterative
process.
But after creating the detailed Requirement Specification documents, we used to take a user sign-off.
Q14) What are the different Data Warehousing methodologies that you are familiar with?
A14) In Data Warehousing, two methodologies are popular: Ralph Kimball's and Bill Inmon's.
We mainly followed Ralph Kimball's methodology in my last project.
In this methodology, we have a fact table in the middle, surrounded by dimension tables.
This is the basic STAR schema, which is the basic dimensional model.
In a Snowflake schema, we normalize one of the dimension tables.
Q17) What are test cases, or how did you do testing of Informatica mappings?
A17) Basically we take the SQL from the Source Qualifier and check the source / target data in Toad.
Then we spot-check the data for various conditions according to the mapping document and look for
any errors in the mappings.
For example, there may be a condition that if a customer account does not exist, then that record is
filtered out and written to a reject file.
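To illustrate that reject-file condition, here is a minimal Python sketch; the file name, field names and the set of valid accounts are hypothetical, and in the real mapping this logic lives in a Filter / Router transformation rather than in code:

```python
import csv

# Hypothetical set of customer accounts that exist in the target system.
valid_accounts = {"C01", "C02", "C03"}

def split_records(rows):
    """Route rows whose account exists to the load set, the rest to the reject set."""
    loaded, rejected = [], []
    for row in rows:
        if row["account_id"] in valid_accounts:
            loaded.append(row)
        else:
            rejected.append(row)
    return loaded, rejected

if __name__ == "__main__":
    source_rows = [
        {"account_id": "C01", "amount": "100.00"},
        {"account_id": "C99", "amount": "250.00"},  # no such account -> reject
    ]
    good, bad = split_records(source_rows)
    # Write the rejected records to a reject file, mirroring the mapping's rule.
    with open("rejects.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["account_id", "amount"])
        writer.writeheader()
        writer.writerows(bad)
    print(f"loaded={len(good)} rejected={len(bad)}")
```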
Q18) What are the other error handlings that you did in mappings?
A18) I mainly looked for non-numeric data in numeric fields, and for flat-file layouts that differed from
the expected layout. Also, dates from a flat file come in as strings and have to be converted.
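For instance, a rough Python sketch of those two checks; the field names and the MM/DD/YYYY date format are assumptions for illustration:

```python
from datetime import datetime

def clean_row(row):
    """Validate a flat-file row: numeric check on AMOUNT, string-to-date on TXN_DATE."""
    errors = []

    # Non-numeric data in a numeric field.
    try:
        row["AMOUNT"] = float(row["AMOUNT"])
    except ValueError:
        errors.append(f"AMOUNT is not numeric: {row['AMOUNT']!r}")

    # Dates arrive from the flat file as strings and must be converted.
    try:
        row["TXN_DATE"] = datetime.strptime(row["TXN_DATE"], "%m/%d/%Y").date()
    except ValueError:
        errors.append(f"TXN_DATE is not a valid date: {row['TXN_DATE']!r}")

    return row, errors

row, errs = clean_row({"AMOUNT": "12.5O", "TXN_DATE": "10/01/2007"})  # '12.5O' contains the letter O
print(errs)  # -> ["AMOUNT is not numeric: '12.5O'"]
```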
Q20) Give me an example of a tough situation that you came across in Informatica mappings
and how did you handle it?
A20) One of our colleagues had created a mapping that used a Joiner, and the mapping was taking a lot
of time to run; but the join was such that we could do it at the database level (in Oracle).
So I suggested and implemented that change, and it reduced the run time by 40%.
Q21) Tell me what are the various transformations that you have used?
A21) I have used Lookup, Joiner, Update Strategy, Aggregator, Sorter, etc.
For example, a Filter transformation can filter out records based on the condition defined in the
filter transformation.
Similarly, in an Aggregator transformation the number of output rows can be less than the number of
input rows, because after applying an aggregate function such as SUM we can end up with fewer records.
The PowerCenter Server queries the lookup source based on the lookup ports in the
transformation. It compares Lookup transformation port values to lookup source column
values based on the lookup condition.
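To make the Filter and Aggregator behaviour above concrete, here is a small Python analogy; the row layout and the "status" / "amount" fields are made up for illustration, since in PowerCenter this logic lives inside the transformations themselves:

```python
from collections import defaultdict

rows = [
    {"dept": "HR",    "status": "ACTIVE",   "amount": 100.0},
    {"dept": "HR",    "status": "INACTIVE", "amount": 999.0},
    {"dept": "SALES", "status": "ACTIVE",   "amount": 250.0},
    {"dept": "SALES", "status": "ACTIVE",   "amount": 150.0},
]

# Filter: drop rows that fail the condition, so output rows <= input rows.
active = [r for r in rows if r["status"] == "ACTIVE"]

# Aggregator: GROUP BY dept, SUM(amount) -- again fewer output rows than input rows.
totals = defaultdict(float)
for r in active:
    totals[r["dept"]] += r["amount"]

print(dict(totals))  # {'HR': 100.0, 'SALES': 400.0}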
Q25) Did you use an unconnected Lookup transformation? If yes, then explain.
A25) Yes. An unconnected Lookup receives an input value as the result of a :LKP expression in another
transformation. It is not connected to any other transformation. Instead, it has input ports,
output ports and a return port.
Condition values are stored in the index cache and output values in the data cache.
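Conceptually, calling an unconnected Lookup from an expression behaves like calling a function that returns a single value through its return port. A loose Python analogy, where the lookup data and port names are hypothetical:

```python
# Hypothetical lookup source: customer_id -> customer_name (the return port).
lookup_source = {"C01": "ABC Roofing", "C02": "Acme Corp"}

def lkp_customer_name(customer_id, default=None):
    """Stand-in for :LKP.lkp_customer(CUSTOMER_ID): one input, one returned value."""
    return lookup_source.get(customer_id, default)

# In an Expression transformation you might write something like
#   IIF(ISNULL(:LKP.lkp_customer(CUSTOMER_ID)), 'UNKNOWN', :LKP.lkp_customer(CUSTOMER_ID))
# The rough Python equivalent of that expression:
name = lkp_customer_name("C03") or "UNKNOWN"
print(name)  # UNKNOWN
```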
Q27) What happens if the Lookup table is larger than the Lookup Cache ?
A27) If the data does not fit in the memory cache, the PowerCenter Server stores the overflow values
in the cache files.
To avoid writing the overflow values to cache files, we can increase the default cache size.
When the session completes, the PowerCenter Server releases cache memory and deletes the cache files.
If you use a flat file lookup, the PowerCenter Server always caches the lookup source.
a) Persistent cache. You can save the lookup cache files and reuse them the next time the
PowerCenter Server processes a Lookup transformation configured to use the cache.
b) Recache from source. If the persistent cache is not synchronized with the lookup table,
you can configure the Lookup transformation to rebuild the lookup cache.
c) Static cache. You can configure a static, or read-only, cache for any lookup source.
By default, the PowerCenter Server creates a static cache. It caches the lookup file or table
and looks up values in the cache for each row that comes into the transformation.
When the lookup condition is true, the PowerCenter Server returns a value from the lookup
cache. The PowerCenter Server does not update the cache while it processes the Lookup
transformation.
d) Dynamic cache. If you want to cache the target table and insert new rows or update
existing rows in the cache and the target, you can create a Lookup transformation to
use a dynamic cache.
The PowerCenter Server dynamically inserts or updates data in the lookup cache and passes
data to the target table.
e) Shared cache. You can share the lookup cache between multiple transformations. You can
share an unnamed cache between transformations in the same mapping. You can share a
named cache between transformations in the same or different mappings.
In contrast to a Filter transformation, which tests for a single condition and drops the rows that do not
meet it, a Router transformation tests data for one or more conditions and gives you the
option to route rows of data that do not meet any of the conditions to a default output group.
When you design your data warehouse, you need to decide what type of information to store
in targets. As part of your target table design, you need to determine whether to maintain all
the historic data or just the most recent changes.
For example, you might have a target table, T_CUSTOMERS, that contains customer data.
When a customer address changes, you may want to save the original address in the table
instead of updating that portion of the customer row. In this case, you would create a new row
containing the updated address, and preserve the original row with the old customer address.
This illustrates how you might store historical information in a target table. However, if you
want the T_CUSTOMERS table to be a snapshot of current customer data, you would
update the existing customer row and lose the original address.
The model you choose determines how you handle changes to existing rows.
Note: You can also use the Custom transformation to flag rows for insert, delete, update, or reject.
The value of a mapping parameter does not change during a session, whereas the value stored in a
mapping variable can change.
For performance tuning, first we try to identify the source / target bottlenecks, meaning that first we
see what can be done so that the source data is retrieved as fast as possible.
We try to filter as much data in the SOURCE QUALIFIER as possible. If we have to use a Filter
transformation, then the filtering should be done as early in the mapping as possible.
If we are using an Aggregator transformation, then we can pass sorted input to the aggregator; ideally
we should sort on the ports on which the GROUP BY is being done.
Also there should be as few transformations as possible, and in the Source Qualifier we should bring in
only the ports that are being used.
For optimizing the TARGET, we can disable the constraints in the PRE-SESSION SQL and use BULK
LOADING.
If the TARGET table has any indexes, such as a primary key, or any other indexes / constraints, then BULK
loading will fail. So in order to utilize BULK loading, we need to disable the indexes (and rebuild them in
the post-session SQL), as in the sketch below.
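A hedged sketch of what such pre- and post-session SQL could look like, driven from Python with the cx_Oracle driver; the connection details, table, constraint and index names are placeholders, and in PowerCenter these statements would normally go straight into the session's pre-/post-session SQL properties:

```python
import cx_Oracle  # assumes the cx_Oracle driver is installed

# Placeholder connection details and object names.
conn = cx_Oracle.connect("etl_user", "etl_password", "dbhost/ORCLPDB1")
cur = conn.cursor()

PRE_SESSION_SQL = [
    # Disable constraints and mark indexes unusable so bulk loading can proceed.
    "ALTER TABLE t_customers DISABLE CONSTRAINT pk_t_customers",
    "ALTER INDEX ix_t_customers_name UNUSABLE",
]

POST_SESSION_SQL = [
    # Rebuild indexes and re-enable constraints after the bulk load.
    "ALTER INDEX ix_t_customers_name REBUILD",
    "ALTER TABLE t_customers ENABLE CONSTRAINT pk_t_customers",
]

for stmt in PRE_SESSION_SQL:
    cur.execute(stmt)

# ... the bulk-load session would run here ...

for stmt in POST_SESSION_SQL:
    cur.execute(stmt)

cur.close()
conn.close()
```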
In the case of an Aggregator transformation, we can use incremental aggregation, depending on the
requirements.
Also we need to capture all the source fields in an ERR_DATA table so that we can correct the erroneous
data fields and re-run the corrected data if needed.
Usually there is a separate mapping to handle such an error data file.
There are Push and Pull strategies, which determine how the data comes from the source systems
to the ETL server.
Push strategy: with this strategy, the source system pushes the data (or sends the data) to the ETL server.
Pull strategy: with this strategy, the ETL server pulls the data (or gets the data) from the source system.
Q20) How did you migrate from the Dev environment to the UAT / PROD environment?
A20) We can do a folder copy, or export the mapping in XML format and then import it into another
repository or folder.
Q) External scheduler?
A) With external schedulers, we used to run Informatica jobs (workflows) using the pmcmd command, in
parallel with some Oracle jobs such as stored procedures. There were various kinds of external schedulers
available in the market, like Autosys, Maestro and Control-M. So we can mix and match Informatica and
Oracle jobs using an external scheduler.
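For example, an external scheduler could invoke a workflow through pmcmd along these lines. This is only a rough Python sketch: the service, domain, user, folder and workflow names are placeholders, and the exact pmcmd flags depend on the PowerCenter version in use.

```python
import subprocess

# Placeholder values -- replace with the real Integration Service, domain,
# repository user, folder and workflow names for your environment.
cmd = [
    "pmcmd", "startworkflow",
    "-sv", "IS_DEV",        # Integration Service name
    "-d", "Domain_Dev",     # domain name
    "-u", "rep_user",       # repository user
    "-p", "rep_password",   # repository password
    "-f", "FIN_DW",         # folder name
    "-wait",                # block until the workflow finishes
    "wf_load_customers",    # workflow name
]

result = subprocess.run(cmd)
# pmcmd returns 0 on success; the scheduler can use this to sequence the Oracle jobs.
print("workflow succeeded" if result.returncode == 0 else "workflow failed")
```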
Type 1
With Type 1, the old record is simply overwritten with the new record, so no history is kept.
Type 2
For the older record, we update the expiration date to the current date minus 1 if the change
happened today.
Suppose on 1st Oct 2007 a small business's name changes from ABC Roofing to XYZ Roofing. If we want
to store the old name, we will store the data as below:
Surr Key   Dim Cust_Id (Natural Key)   Cust Name     Eff Date    Exp Date
========   =========================   ===========   =========   ==========
1          C01                         ABC Roofing   1/1/0001    09/30/2007
101        C01                         XYZ Roofing   10/1/2007   12/31/9999
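A minimal Python sketch of that Type 2 logic; the surrogate keys, column names and the 12/31/9999 "high date" follow the example above, while everything else is assumed for illustration:

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)

# Current dimension row for natural key C01 (as in the example above).
dimension = [
    {"surr_key": 1, "cust_id": "C01", "cust_name": "ABC Roofing",
     "eff_date": date(1, 1, 1), "exp_date": HIGH_DATE},
]
next_surr_key = 101  # pretend the surrogate-key sequence is currently at 101

def apply_type2_change(cust_id, new_name, change_date):
    """Expire the current row (exp date = change date - 1) and insert a new row."""
    global next_surr_key
    for row in dimension:
        if row["cust_id"] == cust_id and row["exp_date"] == HIGH_DATE:
            row["exp_date"] = change_date - timedelta(days=1)
    dimension.append({"surr_key": next_surr_key, "cust_id": cust_id,
                      "cust_name": new_name,
                      "eff_date": change_date, "exp_date": HIGH_DATE})
    next_surr_key += 1

apply_type2_change("C01", "XYZ Roofing", date(2007, 10, 1))
for row in dimension:
    print(row)
```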
1)A data warehouse is a relational database that is designed for query and analysis
rather than for transaction processing. It usually contains historical data derived
from transaction data, but it can include data from other sources. It separates
analysis workload from transaction workload and enables an organization to
consolidate data from several sources.
In addition to a relational database, a data warehouse environment includes an
extraction, transportation, transformation, and loading (ETL) solution, an online
analytical processing (OLAP) engine, client analysis tools, and other applications
that manage the process of gathering data and delivering it to business users.
A common way of introducing data warehousing is to refer to the characteristics of
a data warehouse as set forth by William Inmon:
Subject Oriented
Integrated
Nonvolatile
Time Variant
2) Surrogate Key
Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the
dimension tables' primary keys. They can use an Infa Sequence Generator, an Oracle sequence, or SQL
Server identity values for the surrogate key.
There are actually two cases where the need for a "dummy" dimension key arises:
1) the fact row has no relationship to the dimension (as in your example), and
2) the dimension key cannot be derived from the source system data.
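A small Python sketch of both ideas: assigning surrogate keys from a sequence, and falling back to a "dummy" dimension key when the fact row cannot be matched to the dimension. The -1 dummy key and all names are assumptions for illustration.

```python
import itertools

DUMMY_KEY = -1                      # assumed "unknown" dimension member
surrogate_seq = itertools.count(1)  # stands in for an Infa Sequence Generator / Oracle sequence

natural_to_surrogate = {}

def get_surrogate_key(natural_key):
    """Return the surrogate key for a natural key, assigning a new one if needed."""
    if natural_key not in natural_to_surrogate:
        natural_to_surrogate[natural_key] = next(surrogate_seq)
    return natural_to_surrogate[natural_key]

def fact_dimension_key(natural_key):
    """Facts with no usable natural key get the dummy dimension key."""
    if natural_key is None:
        return DUMMY_KEY
    return natural_to_surrogate.get(natural_key, DUMMY_KEY)

get_surrogate_key("C01")
get_surrogate_key("C02")
print(fact_dimension_key("C01"))  # 1
print(fact_dimension_key(None))   # -1  (fact row has no relationship to the dimension)
```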
3)Facts & Dimensions form the heart of a data warehouse. Facts are the metrics that business users would
use for making business decisions. Generally, facts are mere numbers. The facts cannot be used without
their dimensions. Dimensions are those attributes that qualify facts. They give structure to the facts.
Dimensions give different views of the facts. In our example of employee expenses, the employee
expense forms a fact. The Dimensions like department, employee, and location qualify it. This was
mentioned so as to give an idea of what facts are.
Facts are like skeletons of a body.
Skin forms the dimensions. The dimensions give structure to the facts.
The fact tables are normalized to the maximum extent,
whereas the dimension tables are de-normalized, since they grow very slowly.
SCD Type 2
Slowly changing dimension Type 2 is a model where the whole history is stored in the database. An
additional dimension record is created and the segmenting between the old record values and the new
(current) value is easy to extract and the history is clear.
The fields 'effective date' and 'current indicator' are very often used in that dimension and the fact table
usually stores dimension key and version number.
4)CRC Key
Cyclic redundancy check, or CRC, is a data encoding method (noncryptographic) originally developed for
detecting errors or corruption in data that has been transmitted over a data communications line.
During ETL processing for the dimension table, all relevant columns needed to determine change of
content from the source system (s) are combined and encoded through use of a CRC algorithm. The
encoded CRC value is stored in a column on the dimension table as operational meta data. During
subsequent ETL processing cycles, new source system(s) records have their relevant data content values
combined and encoded into CRC values during ETL processing. The source system CRC values are
compared against CRC values already computed for the same production/natural key on the dimension
table. If the production/natural key of an incoming source record is the same but the CRC values are
different, the record is processed as a new SCD record on the dimension table. The advantage here is that
CRCs are small, usually 16 or 32 bits in length, and easier to compare during ETL processing than the
contents of numerous data columns or large variable-length columns.
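A rough Python sketch of that comparison using a 32-bit CRC (zlib.crc32); the column list, delimiter and stored value are illustrative assumptions:

```python
import zlib

def row_crc(row, columns, delimiter="|"):
    """Combine the relevant dimension columns and encode them as a 32-bit CRC."""
    combined = delimiter.join(str(row.get(col, "")) for col in columns)
    return zlib.crc32(combined.encode("utf-8"))

TRACKED_COLUMNS = ["cust_name", "address", "phone"]

# CRC previously computed and stored on the dimension row for this natural key.
stored_crc = row_crc({"cust_name": "ABC Roofing", "address": "1 Main St", "phone": "555-0100"},
                     TRACKED_COLUMNS)

incoming = {"cust_name": "XYZ Roofing", "address": "1 Main St", "phone": "555-0100"}

# Same natural key, different CRC -> treat as a new SCD record on the dimension.
if row_crc(incoming, TRACKED_COLUMNS) != stored_crc:
    print("content changed: insert a new SCD row")
```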
5)Data partitioning, a new feature added to SQL Server 2005, provides a way to divide large tables and
indexes into smaller parts. By doing so, it makes the life of a database administrator easier when doing
backups, loading data, recovery and query processing.
Data partitioning improves the performance, reduces contention and increases availability of data.
Objects that may be partitioned are:
• Base tables
• Indexed views
A Stored Procedure transformation is an important tool for populating and maintaining databases.
Database administrators create stored procedures to automate time-consuming tasks that are too
complicated for standard SQL statements.
You might use stored procedures to do the following tasks:
Check the status of a target database before loading data into it.
Determine if enough space exists in a database.
Perform a specialized calculation.
Drop and recreate indexes.
Q) What are the types of data that pass between the Informatica server and a stored procedure?
A) The types of data are:
Input/Output parameters
Return values
Status code.
PERSISTENT CACHE -
If you want to save and reuse the cache files, you can configure the transformation to use a
persistent cache. Use a persistent cache when you know the lookup table does not change
between session runs.
The first time the Informatica Server runs a session using a persistent lookup cache, it saves the
cache files to disk instead of deleting them. The next time the Informatica Server runs the
session, it builds the memory cache from the cache files. If the lookup table changes
occasionally, you can override session properties to recache the lookup from the database.
NON-PERSISTENT CACHE -
By default, the Informatica Server uses a non-persistent cache when you enable caching in a
Lookup transformation. The Informatica Server deletes the cache files at the end of a session.
The next time you run the session, the Informatica Server builds the memory cache from the
database.
DYNAMIC CACHE -
You might want to configure the transformation to use a dynamic cache when the target table is
also the lookup table. When you use a dynamic cache, the Informatica Server updates the lookup
cache as it passes rows to the target.
The Informatica Server builds the cache when it processes the first lookup request. It queries the
cache based on the lookup condition for each row that passes into the transformation.
When the Informatica Server reads a row from the source, it updates the lookup cache by
performing one of the following actions: it inserts the row into the cache, updates the row in the cache,
or makes no change to the cache.
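Loosely, the dynamic cache behaves like an in-memory map kept in step with the target as rows flow through. A simplified Python sketch of that insert / update / no-change decision, with hypothetical keys and columns:

```python
# In-memory stand-in for the dynamic lookup cache, keyed on the lookup condition column.
cache = {"C01": {"cust_id": "C01", "cust_name": "ABC Roofing"}}

def process_row(row):
    """Insert the row into the cache, update it, or make no change (as listed above)."""
    key = row["cust_id"]
    if key not in cache:
        cache[key] = dict(row)
        return "insert"        # new row goes into the cache and on to the target
    if cache[key] != row:
        cache[key] = dict(row)
        return "update"        # changed row updates the cache and the target
    return "no change"         # unchanged row leaves the cache alone

print(process_row({"cust_id": "C02", "cust_name": "Acme Corp"}))    # insert
print(process_row({"cust_id": "C01", "cust_name": "XYZ Roofing"}))  # update
print(process_row({"cust_id": "C01", "cust_name": "XYZ Roofing"}))  # no change
```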
c. Used a parameter file (a .txt file stored in the server file system) with input values for the batch, e.g.:
[s_m_Map1]
$$ACC_YEAR=2003
$$ACC_PERIOD=12
[s_m_Map2]
$$ACC_YEAR=2003
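For reference, a tiny Python sketch that reads a parameter file in that format into per-session dictionaries; the file name is a placeholder, and PowerCenter itself parses the file when the session or batch runs:

```python
def read_parameter_file(path):
    """Parse [section] headers and $$NAME=value lines into a dict per session."""
    params = {}
    current = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith("[") and line.endswith("]"):
                current = line[1:-1]
                params[current] = {}
            elif "=" in line and current is not None:
                name, value = line.split("=", 1)
                params[current][name.strip()] = value.strip()
    return params

# e.g. read_parameter_file("batch_params.txt") ->
# {'s_m_Map1': {'$$ACC_YEAR': '2003', '$$ACC_PERIOD': '12'},
#  's_m_Map2': {'$$ACC_YEAR': '2003'}}
```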