BI Concepts
OBIEE
http://www.oracle.com/technology/documentation/bi_ee.html
http://download.oracle.com/docs/cd/E10415_01/doc/nav/portal_booklist.htm
http://zed.cisco.com/confluence/display/siebel/Home
http://zed.cisco.com/confluence/display/siebel/Enterprise+Architecture+BI+Standards
https://cisco.webex.com/ciscosales/lsr.php?AT=pb&SP=MC&rID=39544447&rKey=9C8D63F2C74ED9DA
http://informatica.techtiks.com/informatica_questions.html#RQ1
http://www.allinterview.com/showanswers/32477.html
http://www.1keydata.com/datawarehousing/glossary.html
http://www.forum9.com/
http://www.livestore.net/
http://www.kalaajkal.com/
Iteration – the solution is delivered in short iterations, with each cycle adding more business value
and implementing requested changes.
The 10 key principles of Agile software development, and how it fundamentally differs from the more
traditional waterfall approach to software development, are as follows:
1. Active user involvement is imperative
2. The team must be empowered to make decisions
3. Requirements evolve but the timescale is fixed
4. Capture requirements at a high level; lightweight & visual
5. Develop small, incremental releases and iterate
6. Focus on frequent delivery of products
7. Complete each feature before moving on to the next
8. Apply the 80/20 rule
9. Testing is integrated throughout the project lifecycle – test early and often
10. A collaborative & cooperative approach between all stakeholders is essential
Unix
http://www.sikh-history.com/computers/unix/commands.html#catcommand
cat file1
cat file1 file2 > all -----combines file1 and file2 into the file all (the file is created if it doesn't exist)
cat file1 >> file2 -----appends the contents of file1 to file2
• > redirects output from standard out (the screen) to a file, printer, or whatever you like.
• >> filename appends output to the end of a file called filename.
• < redirects input to a process or command.
The line below is the first line of the script:
#!/usr/bin/sh
or
#!/bin/ksh
It tells your shell which shell to use when executing the statements in your shell script.
ps -A -----lists all the processes running on the system.
Crontab command
The crontab command is used to schedule jobs. You must have permission from the Unix
administrator to run this command. Jobs are scheduled using five fields, as follows:
Minutes (0-59) Hour (0-23) Day of month (1-31) Month (1-12) Day of week (0-6) (0 is Sunday)
So, for example, to schedule a job that runs the script backup_jobs in the /usr/local/bin
directory at 22:25 on the 15th of the month and on Sundays (day 0), the entry in the
crontab file will be as below. * represents all values.
25 22 15 * 0 /usr/local/bin/backup_jobs
$ ls -l | grep '^d' -----lists only the directories (lines beginning with 'd').
Pipes:
The pipe symbol "|" is used to direct the output of one command to the input
of another.
cat filename Dump a file to the screen in ascii.
head filename Show the first few lines of a file.
head -n filename Show the first n lines of a file.
tail filename Show the last few lines of a file.
tail -n filename Show the last n lines of a file.
find . -name aaa.txt Finds all the files named aaa.txt in the current directory or
any subdirectory tree.
find / -name vimrc Find all the files named 'vimrc' anywhere on the system.
find /usr/local/games -name "*xpilot*"
Find all files whose names contain the string 'xpilot' which
exist within the '/usr/local/games' directory tree.
You can find out what shell you are using by the command:
echo $SHELL
Interactive History
A feature of bash and tcsh (and sometimes other shells): you can use
the up-arrow key to access your previous commands, edit
them, and re-execute them.
Opening a file
vi filename
Creating text
Edit modes: these keys put vi into an editing mode so you can type the text
of your document.
I Insert at beginning of current line
a Insert (append) after current cursor position
A Append to end of line
r Replace 1 character
R Replace mode
<ESC> Terminate insertion or overwrite mode
Deletion of text
x Delete the character at the cursor
dd Delete the current line
dw Delete from the cursor to the end of the current word
Business Intelligence refers to a set of methods and techniques that are used by
organizations for tactical and strategic decision making. It leverages methods and
technologies that focus on counts, statistics and business objectives to improve business
performance.
DW stores historical data.
In terms of design data warehouse and data mart are almost the same.
In general, a Data Warehouse is used at an enterprise level and a Data Mart is used at a
business division/department level.
Subject Oriented:
Data that gives information about a particular subject instead of about a company's
ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a
coherent whole.
Time-variant:
All data in the data warehouse is identified with a particular time period.
Non-volatile
Data is stable in a data warehouse. More data is added but data is never removed. This
enables management to gain a consistent picture of the business.
Informatica Transformations:
Target Definition: The Target Definition is used to logically represent a database table
or file in the Data Warehouse / Data Mart.
Aggregator: The Aggregator transformation is used to perform Aggregate calculations
on group basis.
Expression: The Expression transformation is used to perform arithmetic calculations
on a row-by-row basis, and is also used to convert strings to integers and to concatenate two
columns.
Filter: The Filter transformation is used to filter the data based on a single condition and
pass it through to the next transformation.
Router: The Router transformation is used to route the data based on multiple conditions
and pass it through to the next transformations.
It has three groups
1) Input group
2) User defined group
3) Default group
Joiner: The Joiner transformation is used to join two sources residing in different
databases or different locations like flat file and oracle sources or two relational tables
existing in different databases.
Source Qualifier: The Source Qualifier transformation is used to describe, in SQL, the
method by which data is to be retrieved from a source application system; it is also
used to join two relational sources residing in the same database.
What is Incremental Aggregation?
A. Whenever a session is created for a mapping Aggregate Transformation, the session
option for Incremental Aggregation can be enabled. When PowerCenter performs
incremental aggregation, it passes new source data through the mapping and uses
historical cache data to perform new aggregation calculations incrementally.
Connected Lookup vs. Unconnected Lookup:
• Connected: can be configured to use a dynamic cache. Unconnected: cannot be configured to use a dynamic cache.
• Connected: passes multiple output values to another transformation (link lookup/output ports to another transformation). Unconnected: passes one output value to another transformation (the lookup/output/return port passes the value to the transformation calling the :LKP expression).
Lookup Caches:
When configuring a lookup cache, you can specify any of the following options:
• Persistent cache
• Recache from lookup source
• Static cache
• Dynamic cache
• Shared cache
Dynamic cache: When you use a dynamic cache, the PowerCenter Server updates the
lookup cache as it passes rows to the target.
If you configure a Lookup transformation to use a dynamic cache, you can only use the
equality operator (=) in the lookup condition.
The NewLookupRow port will be enabled automatically.
NewLookupRow Value    Description
0    The PowerCenter Server does not update or insert the row in the cache.
1    The PowerCenter Server inserts the row into the cache.
2    The PowerCenter Server updates the row in the cache.
Static cache: It is a default cache; the PowerCenter Server doesn’t update the lookup
cache as it passes rows to the target.
Persistent cache: If the lookup table does not change between sessions, configure the
Lookup transformation to use a persistent lookup cache. The PowerCenter Server then
saves and reuses cache files from session to session, eliminating the time required to read
the lookup table.
Stored Procedure: The Stored Procedure transformation is used to execute externally
stored database procedures and functions. It is used to perform the database level
operations.
Sorter: The Sorter transformation is used to sort data in ascending or descending order
according to a specified sort key. You can also configure the Sorter transformation for
case-sensitive sorting, and specify whether the output rows should be distinct. The Sorter
transformation is an active transformation. It must be connected to the data flow.
Union Transformation:
The Union transformation is a multiple input group transformation that you can use to
merge data from multiple pipelines or pipeline branches into one pipeline branch. It
merges data from multiple sources similar to the UNION ALL SQL statement to combine
the results from two or more SQL statements. Similar to the UNION ALL statement, the
Union transformation does not remove duplicate rows. Input groups should have a similar
structure.
Update Strategy: The Update Strategy transformation is used to indicate the DML
statement.
We can implement update strategy in two levels:
1) Mapping level
2) Session level.
Session level properties will override the mapping level properties.
Mapplet:
Mapplet is a set of reusable transformations. We can use this mapplet in any mapping
within the Folder.
When you add transformations to a mapplet, keep the following restrictions in mind:
• The mapplet contains Input transformations and/or source definitions with at least
one port connected to a transformation in the mapplet.
• The mapplet contains at least one Output transformation with at least one port
connected to a transformation in the mapplet.
System Variables
$$$SessStartTime returns the initial system date value on the machine hosting the
Integration Service when the server initializes a session. $$$SessStartTime returns the
session start time as a string value. The format of the string depends on the database you
are using.
Advantages of Teradata:
1. Can store billions of rows.
2. Parallel processing makes Teradata faster than other RDBMSs.
3. Can be accessed by network-attached and channel-attached systems.
4. Supports the requirements of diverse clients.
5. Automatically detects and recovers from hardware failures.
6. Allows expansion without sacrificing performance.
Datawarehouse - Concepts
Beginners
2. What is a Data Mart?
A data mart is usually sponsored at the department level and developed with a specific
issue or subject in mind; a data mart is a data warehouse with a focused objective.
4. What do you mean by Dimension Attributes?
For example , attributes in a PRODUCT dimension can be product category, product type
etc.
Generally the dimension attributes are used in query filter conditions and to display other
related information about a dimension.
In terms of design data warehouse and data mart are almost the same.
In general a Data Warehouse is used on an enterprise level and a Data Marts is used on a
business division/department level.
ROLAP stands for Relational OLAP. Conceptually, data is organized in cubes with
dimensions.
Star schema is a data warehouse schema where there is only one "fact table" and many
denormalized dimension tables.
The fact table contains primary keys from all the dimension tables and other numeric
columns of additive, numeric facts.
In Data warehousing grain refers to the level of detail available in a given fact table as
well as to the level of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general, the grain
of the fact table is the grain of the star schema.
Unlike the star schema, the snowflake schema contains normalized dimension tables in a tree-like
structure with many nesting levels.
A surrogate key is a substitution for the natural primary key. It is a unique identifier or
number ( normally created by a database sequence generator ) for each record of a
dimension table that can be used for the primary key to the table.
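Surrogate keys are typically populated from a database sequence; a minimal sketch, assuming a hypothetical CUSTOMER_DIM table and illustrative column names:
CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;
INSERT INTO customer_dim (customer_key, customer_id, customer_name)
VALUES (customer_dim_seq.NEXTVAL, 'C1001', 'Christina');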
11. What Oracle tools are available to design and build a data warehouse/data mart?
Oracle Designer,
Oracle Express,
For example:
A SALES cube can have PROFIT and COMMISSION measures and TIME, ITEM and
REGION dimensions
ETL tools are used to pull data from a database, transform the data so that it is
compatible with a second database ( datawarehouse or datamart) and then load the data.
In a data warehouse paradigm, "aggregation" is one way of improving query
performance. An aggregate fact table is a new table created off of an existing fact table
by summing up facts for a set of associated dimensions. The grain of an aggregate fact table is
higher than that of the base fact table. Aggregate tables contain fewer rows, thus making queries run
faster.
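A minimal sketch of building such an aggregate fact table, assuming hypothetical SALES_FACT and DATE_DIM tables:
CREATE TABLE sales_fact_monthly AS
SELECT d.month_key,
       f.product_key,
       SUM(f.sales_amount) AS sales_amount,
       SUM(f.quantity)     AS quantity
FROM   sales_fact f, date_dim d
WHERE  f.date_key = d.date_key
GROUP BY d.month_key, f.product_key;
The aggregate holds one row per product per month instead of one per product per day, so queries at the monthly grain scan far fewer rows.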
When a non-key attribute identifies the value of another non-key attribute, the table
is said to contain a transitive dependency.
18. What are the tools in informatica?Why we are using that tools?
It also contains different rules to be applied on the data before the data get loaded to the
target.
A fact table that contains only primary keys from the dimension tables, and does not
contain any measures, is called a factless fact table.
A Method by which the designer can decide which path to choose when more than one
path is possible from one table to another in a Universe
Advanced
25. Who are the Data Stewards and what is their role?
Data Stewards are a group of experienced people who are responsible for planning,
defining business processes and setting directions. Data Stewards are familiar with the
organization's data quality, data issues and overall business processes.
To be able to drill down/drill across is the most basic requirement of an end user in a data
warehouse. Drilling down most directly addresses the natural end-user need to see more
detail in a result. Drill down should be as generic as possible because there is absolutely
no good way to predict a user's drill-down path.
27. What is the easiest way to build a corporate-specific time dimension?
Unlike most dimensions, the time dimension does not change. You can populate it once
and use it for years.
A real-time data warehouse (RTDW) is an analytic component of an enterprise-level data stream
that supports continuous, asynchronous, multi-point delivery of data.
In an RTDW, data moves straight from the source systems to decision makers without any
form of staging.
Slowly changing dimensions refers to the change in dimensional attributes over time.
There are three main techniques for handling slowly changing dimensions in a data
warehouse:
Type 1: Overwriting. No history is maintained. The new record replaces the original
record.
Type 2: Creating a new additional record. A new row is added for each change, so both the
original and the new values are preserved.
Type 3: Creating a current value field. The original record is modified to reflect the
change.
Each technique handles the problem differently. The designer chooses among these
techniques depending on the company's need to preserve an accurate history of the
dimensional changes.
31. What is TL9000?
TL 9000 is a quality management system standard for the telecommunications industry, built on ISO 9001.
Power Center 8.X Architecture.
Developer Changes: Java Transformation Added in 8.x
1) The difference between 8.1 and 8.5 is that pushdown optimization is available in mappings, which
gives more flexible performance tuning.
Pushdown optimization
A session option that allows you to push transformation logic to the source or target
database.
GRID
Effective in version 8.0, you create and configure a grid in the Administration Console.
You configure a grid to run on multiple nodes, and you configure one Integration Service
to run on the grid. The Integration Service runs processes on the nodes in the grid to
distribute workflows and sessions. In addition to running a workflow on a grid, you can
now run a session on a grid. When you run a session or workflow on a grid, one
service process runs on each available node in the grid.
The Integration Service starts one or more Integration Service processes to run and
monitor workflows. When we run a workflow, the ISP starts and locks the workflow,
runs the workflow tasks, and starts the process to run sessions. The functions of the
Integration Service Process are,
Locks and reads the workflow
Manages workflow scheduling, ie, maintains session dependency
Reads the workflow parameter file
Creates the workflow log
Runs workflow tasks and evaluates the conditional links
Starts the DTM process to run the session
Writes historical run information to the repository
Sends post-session emails
The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks
to a single node or across the nodes in a grid after performing a sequence of steps. Before
understanding these steps we have to know about Resources, Resource Provision
Thresholds, Dispatch mode and Service levels
Resources – we can configure the Integration Service to check the resources
available on each node and match them with the resources required to run the
task. For example, if a session uses an SAP source, the Load Balancer dispatches
the session only to nodes where the SAP client is installed
Three Resource Provision Thresholds: the maximum number of runnable
threads waiting for CPU resources on the node, called Maximum CPU Run Queue
Length; the maximum percentage of virtual memory allocated on the node
relative to the total physical memory size, called Maximum Memory %; and the
maximum number of running Session and Command tasks allowed for each
Integration Service process running on the node, called Maximum Processes
Three Dispatch modes – Round-Robin: The Load Balancer dispatches tasks to
available nodes in a round-robin fashion after checking the “Maximum Process”
threshold. Metric-based: Checks all the three resource provision thresholds and
dispatches tasks in round robin fashion. Adaptive: Checks all the three resource
provision thresholds and also ranks nodes according to current CPU availability
Service levels establish priority among tasks that are waiting to be dispatched;
the three components of service levels are Name, Dispatch Priority and Maximum
dispatch wait time. “Maximum dispatch wait time” is the amount of time a task
can wait in queue and this ensures no task waits forever
A. Dispatching tasks on a node
1. The Load Balancer checks different resource provision thresholds on the node
depending on the Dispatch mode set. If dispatching the task causes any threshold
to be exceeded, the Load Balancer places the task in the dispatch queue, and it
dispatches the task later
2. The Load Balancer dispatches all tasks to the node that runs the master
Integration Service process
B. Dispatching Tasks on a grid,
1. The Load Balancer verifies which nodes are currently running and enabled
2. The Load Balancer identifies nodes that have the PowerCenter resources required
by the tasks in the workflow
3. The Load Balancer verifies that the resource provision thresholds on each
candidate node are not exceeded. If dispatching the task causes a threshold to be
exceeded, the Load Balancer places the task in the dispatch queue, and it
dispatches the task later
4. The Load Balancer selects a node based on the dispatch mode
When the workflow reaches a session, the Integration Service Process starts the DTM
process. The DTM is the process associated with the session task. The DTM process
performs the following tasks:
Retrieves and validates session information from the repository.
Validates source and target code pages.
Verifies connection object permissions.
Performs pushdown optimization when the session is configured for pushdown
optimization.
Adds partitions to the session when the session is configured for dynamic
partitioning.
Expands the service process variables, session parameters, and mapping variables
and parameters.
Creates the session log.
Runs pre-session shell commands, stored procedures, and SQL.
Sends a request to start worker DTM processes on other nodes when the session is
configured to run on a grid.
Creates and runs mapping, reader, writer, and transformation threads to extract,
transform, and load data
Runs post-session stored procedures, SQL, and shell commands and sends post-
session email
The Integration Service (IS) starts the Integration Service Process (ISP).
DWH ARCHITECTURE
Granularity
Principle: create fact tables with the most granular data possible to support analysis of the
business process.
In Data warehousing grain refers to the level of detail available in a given fact table as
well as to the level of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general, the grain
of the fact table is the grain of the star schema.
Facts: facts must be consistent with the grain; all facts are at a uniform grain.
Dimensions: each dimension associated with the fact table must take on a single value for
each fact row.
What is DM (Dimensional Modeling)?
DM is a logical design technique that seeks to present the data in a standard, intuitive
framework that allows for high-performance access. It is inherently dimensional, and it
adheres to a discipline that uses the relational model with some important restrictions.
Every dimensional model is composed of one table with a multipart key, called the fact
table, and a set of smaller tables called dimension tables. Each dimension table has a
single-part primary key that corresponds exactly to one of the components of the
multipart key in the fact table.
What is Conformed Dimension?
Conformed Dimensions (CD): these dimensions are something that is built once in your
model and can be reused multiple times with different fact tables. For example, consider
a model containing multiple fact tables, representing different data marts. Now look for a
dimension that is common to these facts tables. In this example let’s consider that the
product dimension is common and hence can be reused by creating short cuts and joining
the different fact tables. Some examples are the time dimension, customer dimension, and
product dimension.
When you consolidate lots of small dimensions, instead of having hundreds of small
dimensions with few records in them cluttering your database with these mini
'identifier' tables, all records from all these small dimension tables are loaded into ONE
dimension table, and we call this dimension table a junk dimension table (since we are
storing all the junk in this one table). For example, a company might have a handful of
manufacturing plants, a handful of order types, and so forth, and we can consolidate
them in one dimension table called a junk dimension table.
An item that is in the fact table but is stripped of its description, because the
description belongs in a dimension table, is referred to as a degenerate dimension. Since it
looks like a dimension but really lives in the fact table and has been stripped of its
description, it is called a degenerate dimension. Coming to slowly changing
dimensions (SCD) and slowly growing dimensions (SGD): I would classify
them as more of an attribute of the dimensions themselves.
A Data Mart is a subset of data from a Data Warehouse. Data Marts are built for specific
user groups. They contain a subset of rows and columns that are of interest to the
particular audience. By providing decision makers with only a subset of the data from
the Data Warehouse, privacy, performance and clarity objectives can be attained.
A Fact Table in a dimensional model consists of one or more numeric facts of importance
to a business. Examples of facts are as follows:
Businesses have a need to monitor these "facts" closely and to sum them using different
"dimensions". For example, a business might find the following information useful:
1. the value of products sold by store, by product type and by day of week
2. the value of products sold by product and by channel
In addition to numeric facts, fact tables contain the "keys" of each of the dimensions that
relate to that fact (e.g. Customer Nbr, Product ID, Store Nbr). Details about the
dimensions (e.g customer name, customer address) are stored in the dimension table (i.e.
customer)
A set of level properties that describe a specific aspect of a business, used for analyzing
the factual measures
What is Factless Fact Table?
Factless fact table captures the many-to-many relationships between dimensions, but
contains no numeric or textual facts. They are often used to record events or coverage
information.
• Identifying product promotion events (to determine promoted products that didn’t
sell)
• Tracking student attendance or registration events
• Tracking insurance-related accident events
• Identifying building, facility, and equipment schedules for a hospital or
University"
Types of facts?
There are three types of facts:
• Additive: Additive facts are facts that can be summed up through all of the
dimensions in the fact table.
• Semi-Additive: Semi-additive facts are facts that can be summed up for some of
the dimensions in the fact table, but not the others.
• Non-Additive: Non-additive facts are facts that cannot be summed up for any of
the dimensions present in the fact table.
For example, you might have a session using a source that receives new data every day. You
can capture those incremental changes because you have added a filter condition to the mapping
that removes pre-existing data from the flow of data. You then enable incremental aggregation.
When the session runs with incremental aggregation enabled for the first time on March 1, you
use the entire source. This allows the PowerCenter Server to read and store the necessary
aggregate data. On March 2, when you run the session again, you filter out all the records except
those time-stamped March 2. The PowerCenter Server then processes only the new data and
updates the target accordingly.
You can capture new source data. Use incremental aggregation when you can capture new
source data each time you run the session. Use a Stored Procedure or Filter transformation to
process only new data.
Incremental changes do not significantly change the target. Use incremental aggregation
when the changes do not significantly change the target. If processing the incrementally changed
source alters more than half the existing target, the session may not benefit from using
incremental aggregation. In this case, drop the table and re-create the target with complete
source data.
Note: Do not use incremental aggregation if your mapping contains percentile or median
functions. The PowerCenter Server uses system memory to process Percentile and Median
functions in addition to the cache memory you configure in the session property sheet. As a
result, the PowerCenter Server does not store incremental aggregation values for Percentile and
Median functions in disk caches.
Normalization:
Some Oracle databases were modeled according to the rules of normalization that were
intended to eliminate redundancy.
Obviously, the rules of normalization are required to understand your relationships and
functional dependencies.
A row is in first normal form (1NF) if all underlying domains contain atomic values only.
• Does not have a composite primary key, meaning that the primary key cannot be
subdivided into separate logical entities.
• All the non-key columns are functionally dependent on the entire primary key.
• A row is in second normal form if, and only if, it is in first normal form and every non-key
attribute is fully dependent on the key.
• 2NF eliminates functional dependencies on a partial key by putting the fields in a
separate table from those that are dependent on the whole key. An example is resolving
many-to-many relationships using an intersecting entity (a sketch follows this section).
• A row is in third normal form if, and only if, it is in second normal form and attributes that
do not contribute to a description of the primary key are moved into a separate table. An
example is creating look-up tables.
Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. In his later writings Codd refers
to BCNF as 3NF. A row is in Boyce Codd normal form if, and only if, every determinant is a
candidate key. Most entities in 3NF are already in BCNF.
An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal
Form (3NF) and additionally:
• Has no multiple sets of multi-valued dependencies. In other words, 4NF states that no
entity can have more than a single one-to-many relationship
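As mentioned under 2NF above, a many-to-many relationship is resolved with an intersecting entity; a minimal sketch, using hypothetical STUDENT, COURSE and ENROLLMENT tables:
CREATE TABLE student (student_id NUMBER PRIMARY KEY, student_name VARCHAR2(60));
CREATE TABLE course  (course_id  NUMBER PRIMARY KEY, course_title VARCHAR2(60));
-- Intersection (associative) table: one row per student-course combination
CREATE TABLE enrollment (
  student_id  NUMBER REFERENCES student (student_id),
  course_id   NUMBER REFERENCES course (course_id),
  enrolled_on DATE,
  PRIMARY KEY (student_id, course_id)
);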
It is a perfectly valid question to ask why hints should be used. Oracle comes with an
optimizer that promises to optimize a query's execution plan. When this optimizer is
really doing a good job, no hints should be required at all.
Sometimes, however, the characteristics of the data in the database change rapidly,
so that the optimizer (or, more accurately, its statistics) is out of date. In this case, a hint
could help.
You should first get the explain plan of your SQL and determine what changes can be
done to make the code operate without using hints if possible. However, hints such as
ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can tame a wild
optimizer and give you optimal performance
The ANALYZE statement can be used to gather statistics for a specific table, index or
cluster. The statistics can be computed exactly, or estimated based on a specific number
of rows, or a percentage of rows:
ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;
By default Oracle 10g automatically gathers optimizer statistics using a scheduled job
called GATHER_STATS_JOB. By default this job runs within maintenance windows
between 10 P.M. to 6 A.M. week nights and all day on weekends. The job calls the
DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC internal procedure which
gathers statistics for tables with either empty or stale statistics, similar to the
DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO
option. The main difference is that the internal job prioritizes the work such that tables
most urgently requiring statistics updates are processed first.
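Statistics can also be gathered manually with the DBMS_STATS package, which is generally preferred over ANALYZE for optimizer statistics; a minimal sketch (the schema and table names are illustrative):
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMP', estimate_percent => 15, cascade => TRUE);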
Hint categories
Hints can be categorized as follows:
ORDERED- This hint forces tables to be joined in the order specified. If you know
table X has fewer rows, then ordering it first may speed execution in a join.
If an index cannot be used or created, we can go for the /*+ PARALLEL(table, 8) */ hint for SELECT and
UPDATE statements; for example, when the WHERE clause uses conditions such as LIKE, NOT IN, >, < or <>.
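A sketch of the hint syntax (the table, column and index names are illustrative):
SELECT /*+ INDEX(e emp_deptno_idx) */ e.empno, e.ename
FROM   emp e
WHERE  e.deptno = 10;

SELECT /*+ PARALLEL(emp, 8) */ COUNT(*) FROM emp;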
When an SQL statement is passed to the server the Cost Based Optimizer (CBO) uses
database statistics to create an execution plan which it uses to navigate through the
data. Once you've highlighted a problem query the first thing you should do is
EXPLAIN the statement to check the execution plan that the CBO has created.
This will often reveal that the query is not using the relevant indexes, or
indexes to support the query are missing. Interpretation of the execution plan is
beyond the scope of this article.
The explain plan process stores data in the PLAN_TABLE. This table can be located in
the current schema or a shared schema and is created in SQL*Plus as
follows:
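The PLAN_TABLE is typically created by running Oracle's utlxplan.sql script; a usage sketch (the query itself is illustrative):
@?/rdbms/admin/utlxplan.sql

EXPLAIN PLAN FOR
SELECT * FROM emp WHERE deptno = 10;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);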
Syntax for synonym:
CREATE [PUBLIC] SYNONYM synonym_name FOR schema.object_name;
If we want all the parts (irrespective of whether they are supplied by any supplier or not),
and all the suppliers (irrespective of whether they supply any part or not) listed in the
same result set, we have a problem. That's because the traditional outer join (using the '+'
operator) is unidirectional, and you can't put (+) on both sides in the join condition. The
following will result in an error:
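A sketch of the disallowed syntax, assuming hypothetical PARTS and SUPPLIERS tables:
SELECT p.part_id, s.supplier_id
FROM   parts p, suppliers s
WHERE  p.supplier_id (+) = s.supplier_id (+);
In such cases an ANSI FULL OUTER JOIN can be used instead of the (+) operator.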
We can keep aggregated data in a materialized view. We can schedule the
MV to refresh, but a table can't be refreshed this way. An MV can be created based on multiple
tables.
Materialized View?
When we are working with various databases running on different systems, we sometimes
need to fetch records from a remote location, and fetching data directly from the remote
location can be quite expensive in terms of resources. To minimize response time and to
increase throughput, we can create a copy of that data in the local database using the data
from the remote database. This duplicate copy is known as a materialized view, which may
be refreshed as per requirement using the refresh options available in Oracle, such as fast,
complete and force.
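A minimal creation sketch, reusing the classic EMP and DEPT sample tables (the MV name matches the refresh call that follows):
CREATE MATERIALIZED VIEW mv_complex
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT d.deptno, d.dname, SUM(e.sal) AS total_sal
FROM   emp e, dept d
WHERE  e.deptno = d.deptno
GROUP BY d.deptno, d.dname;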
DBMS_MVIEW.REFRESH('MV_COMPLEX', 'C');
By default, the Integration Service updates target tables based on key values. However,
you can override the default UPDATE statement for each target in a mapping. You might
want to update the target based on non-key columns.
Overriding the WHERE Clause
You can override the WHERE clause to include non-key columns. For example, you
might want to update records for employees named Mike Smith only. To do this, you edit
the WHERE clause as follows:
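A sketch of such an override (the T_SALES target and port names are illustrative):
UPDATE T_SALES
SET    DATE_SHIPPED = :TU.DATE_SHIPPED,
       TOTAL_SALES  = :TU.TOTAL_SALES
WHERE  EMP_NAME = :TU.EMP_NAME AND EMP_NAME = 'MIKE SMITH'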
If you modify the UPDATE portion of the statement, be sure to use :TU to specify
ports.
DELETE
The DELETE command is used to remove rows from a table. A WHERE clause can be
used to only remove some rows. If no WHERE condition is specified, all rows will be
removed. After performing a DELETE operation you need to COMMIT or ROLLBACK
the transaction to make the change permanent or to undo it.
TRUNCATE
TRUNCATE removes all rows from a table. The operation cannot be rolled back. As
such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.
DROP
The DROP command removes a table from the database. All the tables' rows, indexes
and privileges will also be removed. The operation cannot be rolled back.
ROWID
A globally unique identifier for a row in a database. It is created at the time the row is
inserted into a table, and destroyed when it is removed from a table. Its format is
'BBBBBBBB.RRRR.FFFF', where BBBBBBBB is the block number, RRRR is the
slot (row) number, and FFFF is a file number.
ROWNUM
For each row returned by a query, the ROWNUM pseudocolumn returns a number
indicating the order in which Oracle selects the row from a table or set of joined
rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.
You can use ROWNUM to limit the number of rows returned by a query, as in this
example:
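A simple sketch, reusing the EMP table referenced elsewhere in this document:
SELECT * FROM emp WHERE ROWNUM <= 10;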
Example: get department-wise max salary along with employee name and employee number.
Select deptno, ename, sal from emp a
where sal = (select max(sal) from emp
where deptno = a.deptno);
Another query (pivoting multiple address rows per employee into columns):
select
emp_id,
max(decode(rank_id,1,address)) as add1,
max(decode(rank_id,2,address)) as add2,
max(decode(rank_id,3,address))as add3
from
(select emp_id,address,rank() over (partition by emp_id order by
emp_id,address )rank_id from temp )
group by
emp_id
Also below is the logic for converting columns into Rows without
using Normalizer Transformation.
3) Use Aggregator transformation and check group by on port id only. As shown below:-
WTT.NAME WORK_TYPE,
(CASE
WHEN (PPC.CLASS_CODE = 'Subscription' AND L1.ATTRIBUTE_CATEGORY
IS NOT NULL)
THEN L1.ATTRIBUTE_CATEGORY
ELSE PTT.TASK_TYPE
END) TASK_TYPE,
PEI.DENOM_CURRENCY_CODE
Rank query:
Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order by sal
desc) r from EMP);
The DENSE_RANK function acts like the RANK function except that it assigns
consecutive ranks:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by
sal desc) r from emp);
Select empno, ename, sal,r from (select empno,ename,sal,dense_rank() over (order by sal
desc) r from emp) where r<=5;
Or
Select * from (select * from EMP order by sal desc) where rownum<=5;
2nd highest Sal:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by
sal desc) r from EMP) where r=2;
Top sal:
Select * from EMP where sal= (select max (sal) from EMP);
Select * from EMP where sal= (Select max (sal) from EMP where sal< (select max (sal)
from EMP));
Or
Select max (sal) from emp where sal < (select max (sal) from emp)
Delete from EMP where rowid not in (select max (rowid) from EMP group by deptno);
The WHERE clause cannot be used to restrict groups. You use the
HAVING clause to restrict groups.
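For example, a sketch using the EMP table:
SELECT deptno, COUNT(*)
FROM   emp
GROUP BY deptno
HAVING COUNT(*) > 3;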
delete from temp
where rowid not in (select max (rowid) from temp
group by empno);
Hierarchical queries
Starting at the root, walk from the top down, and eliminate employee Higgins in the result, but
process the child rows.
SELECT department_id, employee_id, last_name, job_id, salary
FROM employees
WHERE last_name != 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;
Block in PL/SQL:
The basic unit in PL/SQL is called a block, which is made up of three parts: a
declarative part, an executable part, and an exception-building part.
PL/SQL blocks can be compiled once and stored in executable form to increase
response time.
Store procedure a PL/SQL program that is stored in a database in compiled form .
PL/SQL stored procedure that is implicitly started when an INSERT, UPDATE or
DELETE statement is issued against an associated table is called a trigger.
A procedure or function is a schema object that logically groups a set of SQL and other
PL/SQL programming language statements together to perform a specific task.
A package is a group of related procedures and functions, together with the cursors and
variables they use,
Packages provide a method of encapsulating related procedures, functions, and associated
cursors and variables together as a unit in the database.
2. A procedure cannot be part of an expression, i.e. we can't call a procedure from an expression,
whereas we can call a function.
A function returns a value, and a function can be called in a SQL statement. No other
differences.
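A minimal sketch of a stored function called from SQL, reusing the classic EMP and DEPT tables (the function name is illustrative):
CREATE OR REPLACE FUNCTION get_dept_total (p_deptno IN NUMBER)
RETURN NUMBER
IS
  v_total NUMBER;
BEGIN
  SELECT SUM(sal) INTO v_total FROM emp WHERE deptno = p_deptno;
  RETURN v_total;
END;
/
SELECT deptno, get_dept_total(deptno) AS dept_total FROM dept;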
Indexes:
Bitmap indexes are most appropriate for columns having low distinct values—such as
GENDER, MARITAL_STATUS, and RELATION. This assumption is not
completely accurate, however. In reality, a bitmap index is always advisable for
systems in which data is not frequently updated by many concurrent systems. In
fact, as I'll demonstrate here, a bitmap index on a column with 100-percent unique
values (a column candidate for primary key) is as efficient as a B-tree index.
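A sketch of creating a bitmap index versus a B-tree index (the table, column and index names are illustrative):
CREATE BITMAP INDEX emp_gender_bix ON employees (gender);
CREATE INDEX emp_empno_idx ON employees (empno);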
You should create an index if:
• The table is large and most queries are expected to retrieve less than 2 to 4
percent of the rows
Datafiles Overview
Tablespaces Overview
Oracle stores data logically in tablespaces and physically in datafiles associated with the
corresponding tablespace.
A database is divided into one or more logical storage units called tablespaces.
Tablespaces are divided into logical units of storage called segments.
A control file contains information about the associated database that is required for
access by an instance, both at startup and during normal operation. Control file
information can be modified only by Oracle; no database administrator or user can edit a
control file.
...data. So, based on the requirements, if we can use incremental
aggregation then it will definitely improve performance; so
while developing a mapping, always keep this factor in mind too.
Data modeling
There are three levels of data modeling. They are conceptual, logical, and physical. This section
will explain the difference among the three, the order with which each one is created, and how to
go from one level to the other.
At this level, the data modeler attempts to identify the highest-level relationships among the
different entities.
At this level, the data modeler attempts to describe the data in as much detail as possible, without
regard to how they will be physically implemented in the database.
In data warehousing, it is common for the conceptual data model and the logical data model to be
combined into a single step (deliverable).
The steps for designing the logical data model are as follows:
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.
At this level, the data modeler will specify how the logical data model will be realized in the
database schema.
http://www.learndatamodeling.com/dm_standard.htm
The differences between a logical data model and a physical data model are shown
below.
Type 1 Slowly Changing Dimension
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original
information. In other words, no history is kept.
After Christina moved from Illinois to California, the new information replaces the original record, and
we have the following table:
Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no
need to keep track of the old information.
Disadvantages:
- All history is lost. By applying this methodology, it is not possible to trace back in
history. For example, in this case, the company would not be able to know that
Christina lived in Illinois before.
Usage:
Type 1 slowly changing dimension should be used when it is not necessary for the data
warehouse to keep track of historical changes.
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new
information. Therefore, both the original and the new record will be present. The new record
gets its own primary key.
Customer Key Name State
1001 Christina Illinois
After Christina moved from Illinois to California, we add the new information as a new row into the
table:
Advantages:
- This allows us to keep all historical information accurately.
Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of rows for the table
is very high to start with, storage and performance can become a concern.
Usage:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to
track historical changes.
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular
attribute of interest, one indicating the original value, and one indicating the current value. There
will also be a column that indicates when the current value becomes active.
To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
• Customer Key
• Name
• Original State
• Current State
• Effective Date
After Christina moved from Illinois to California, the original information gets updated, and we
have the following table (assuming the effective date of change is January 15, 2003):
Advantages:
- This does not increase the size of the table, since new information is updated.
Disadvantages:
- Type 3 will not be able to keep all history where an attribute is changed more than once. For
example, if Christina later moves to Texas on December 15, 2003, the California information will
be lost.
Usage:
Type 3 slowly changing dimension should only be used when it is necessary for the data
warehouse to track historical changes, and when such changes will only occur a finite number
of times.
Product (Type 2 example)
Product ID (PK) | Effective DateTime (PK) | Year | Product Name | Product Price | Expiry DateTime
1 | 01-01-2004 12.00AM | 2004 | Product1 | $150 | 12-31-2004 11.59PM
1 | 01-01-2005 12.00AM | 2005 | Product1 | $250 |
Product (Type 3 example)
Product ID (PK) | Current Year | Product Name | Current Product Price | Old Product Price | Old Year
1 | 2005 | Product1 | $250 | $150 | 2004
The problem with the Type 3 approach is that, over the years, if the product price continuously
changes, then the complete history may not be stored; only the latest change will be stored.
For example, in year 2006, if product1's price changes to $350, then we would not be able
to see the complete history of 2004 prices, since the old values would have been updated with
2005 product information.
Product
Product ID (PK) | Year | Product Name | Product Price | Old Product Price | Old Year
1 | 2006 | Product1 | $350 | $250 | 2005
Star Schemas
A star schema is a database design where there is one central table, the fact table, that
participates in many one-to-many relationships with dimension tables.
• the fact table contains measures: sales quantity, cost dollar amount, sales dollar
amount, gross profit dollar amount
• the dimensions are date, product, store, promotion
• the dimensions are said to describe the measurements appearing in the fact table
The star schema is the simplest data warehouse schema. It is called a star schema
because the diagram resembles a star, with points radiating from a center. The center of
the star consists of one or more fact tables and the points of the star are the dimension
tables, as shown in Figure 2-1.
The most natural way to model a data warehouse is as a star schema: only one join
establishes the relationship between the fact table and any one of the dimension tables.
A star schema optimizes performance by keeping queries simple and providing fast
response time. All the information about each level is stored in one row.
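A sketch of a typical star-schema query, assuming hypothetical SALES_FACT, DATE_DIM, PRODUCT_DIM and STORE_DIM tables; a single join connects the fact table to each dimension:
SELECT d.calendar_month, p.product_name, s.store_name,
       SUM(f.sales_dollar_amount) AS total_sales
FROM   sales_fact f, date_dim d, product_dim p, store_dim s
WHERE  f.date_key    = d.date_key
AND    f.product_key = p.product_key
AND    f.store_key   = s.store_key
GROUP BY d.calendar_month, p.product_name, s.store_name;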
Consider the Product dimension, and suppose we have the following attribute hierarchy:
Snowflake schema is a more complex data warehouse model than a star schema, and is a
type of star schema. It is called a snowflake schema because the diagram of the schema
resembles a snowflake.
Figure 17-3 Snowflake Schema
Below is the simple data model
1. ACW – Logical Design
[Diagram: ACW logical data model relating the staging and fact tables ACW_DF_FEES_STG, ACW_DF_FEES_F, ACW_PCBA_APPROVAL_STG, ACW_PCBA_APPROVAL_F, ACW_DF_APPROVAL_STG and ACW_DF_APPROVAL_F to the dimensions ACW_ORGANIZATION_D, ACW_PRODUCTS_D, ACW_USERS_D, ACW_PART_TO_PID_D, ACW_SUPPLY_CHANNEL_D and EDW_TIME_HIERARCHY, with surrogate keys (ORG_KEY, PRODUCT_KEY, USER_KEY, SUPPLY_CHANNEL_KEY, PART_TO_PID_KEY) and audit columns.]
2. ACW – Physical Design
[Diagram: ACW physical data model giving column names and data types (NUMBER, CHAR, VARCHAR2, DATE) for ACW_PRODUCTS_D, ACW_DF_APPROVAL_STG, ACW_DF_APPROVAL_F, ACW_PART_TO_PID_D and ACW_SUPPLY_CHANNEL_D.]
3. ACW – Data Flow Diagram
[Diagram: SJ and ODS source data and reference schemas (SJPROD, REFADM, MFGISRO, NRTREF on ESMPRD) flow into the ACW staging tables (ACW_DF_APPROVAL_STG, ACW_DF_FEES_STG), which load the fact tables (ACW_DF_APPROVAL_F, ACW_DF_FEES_F) and dimensions (ACW_SUPPLY_CHANNEL_D, ACW_ORGANIZATIONS_D, EDW_TIME_HIERARCHY_D) in ESMPRD - DWRPT, feeding the ACW BO Reports.]
Implementation for Incremental Load
Method -I
Logic in the mapping variable is
Logic in the SQ is
The logic in the expression to set the max value for the mapping variable is below
Logic in the update strategy is below
Method -II
Main mapping
Workflow Design
Parameter file
It is a text file; below is the format for a parameter file. We place this file on the Unix
box where the Informatica server is installed.
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_AP
O_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_AP
O_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_BELGIUM]
$DBConnection_Sourcet=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495
Transformation…
Mapping: Mappings are the highest-level object in the Informatica object hierarchy,
containing all objects necessary to support the movement of data.
Session: A session is a set of instructions that tells informatica Server how to move data
from sources to targets.
WorkFlow: A workflow is a set of instructions that tells Informatica Server how to
execute tasks such as sessions, email notifications. In a workflow multiple sessions can
be included to run in parallel or sequential manner.
Source Definition: The Source Definition is used to logically represent an application
database table.
Target Definition: The Target Definition is used to logically represent a database table
or file in the Data Warehouse / Data Mart.
Aggregator: The Aggregator transformation is used to perform calculations on additive
data. It can reduce performance.
Expression: The Expression transformation is used to evaluate, create, modify data or set
and create variables
Filter: The Filter transformation is used as a True/False gateway for passing data through
a given path in the mapping. It should be used as early as possible to reduce the amount of unwanted data passed on.
Joiner: The Joiner transformation is used to join two related heterogeneous data sources
residing in different physical locations or file systems
Lookup: The Lookup transformation is used to retrieve a value from database and apply
the retrieved values against the values passed in from another transformation.
Normalizer: The Normalizer transformation is used to transform structured data (such as
COBOL or flat files) into relational data
Rank: The Rank transformation is used to order data within certain data set so that only
the top or bottom n records are retrieved
Sequence Generator: The Sequence Generator transformation is used to generate
numeric key values in sequential order.
Source Qualifier: The Source Qualifier transformation is used to describe in SQL the
method by which data is to be retrieved from a source application system.
Stored Procedure: The Stored Procedure transformation is used to execute externally
stored database procedures and functions
Update Strategy: The Update Strategy transformation is used to indicate the DML
statement.
Input Transformation: Input transformations are used to create a logical interface to a
mapplet in order to allow data to pass into the mapplet.
Output Transformation: Output transformations are used to create a logical interface
from a mapplet in order to allow data to pass out of a mapplet.
Introduction
This document is intended to provide a uniform approach for developers in
building Informatica mappings and sessions.
Informatica Overview
Informatica is a powerful Extraction, Transformation, and Loading tool and has been
deployed at GE Medical Systems for data warehouse development in the Business
Intelligence Team. Informatica comes with the following clients to perform various tasks.
Informatica Architecture at GE Medical Systems
DEVELOPMENT ENVIRONMENT
Informatica Server: GEMSDW1 (3.231.200.74)
Source data: IFDEV
Development database: DWDEV
TESTING ENVIRONMENT
Informatica Server: GEMSDW1 (3.231.200.74)
Source data: IFDEV
Test database: DWTEST
PRODUCTION ENVIRONMENT
Informatica Server: GEMSDW2 (3.231.200.69)
Database: FIN2
To start development on any data mart you should have the following things set up by the
Informatica Load Administrator
Informatica Folder. The development team in consultation with the BI
Support Group can decide a three-letter code for the project, which would be
used to create the informatica folder as well as Unix directory structure.
Informatica Userids for the developers
Unix directory structure for the data mart.
A schema XXXLOAD on DWDEV database.
The best way to get the informatica set-up done is to put a request in the following
website.
http://uswaudom02medge.med.ge.com/GEDSS/prod/BIDevelopmentSupport.nsf/
Transformation Specifications
Before developing the mappings you need to prepare a specifications document for them.
A good template is placed in the templates folder
(\\3.231.100.33\GEDSS_All\QuickPlace_Home\Tools\Informatica\Installation_&_Development\Templates
). You can use your own template as long as it contains at least as much detail as this one.
While estimating the time required to develop mappings, the rule of thumb is as follows.
Simple mapping – 1 person-day
Medium-complexity mapping – 3 person-days
Complex mapping – 5 person-days
Usually the mapping for the fact table is the most complex and should be allotted as much
time for development as possible.
Data Loading from tables in Oracle Apps
When you try to import the source definition from a table in Oracle Apps using the Source
Analyzer in Designer, you might face problems, as Informatica cannot open up so many
schemas at the same time. The best way to import the source definition of a table in
Oracle Apps is to take the creation script of the table you want to import, create the table
in a test schema, and import the definition from there.
Commenting
Any experienced developer would agree that a good piece of code is not just a script that
runs efficiently and does what it is required to do, but also one that is commented
properly. In keeping with good coding practices, Informatica mappings, sessions and other
objects involved in the mappings need to be commented properly as well. This not only
helps the production support team debug the mappings if they throw errors in production,
but also stores the maximum amount of metadata in the Informatica repository, which
might be useful when we build a central metadata repository in the near future.
Each folder should have the Project name and Project leader name in the
comments box.
Each mapping should have a comment, which tells what the mapping does at a very
high level.
Each transformation should have a comment in the description box, which tells
the purpose of the transformation.
If the transformation is taking care of a business rule then that business rule
should be mentioned in the comment.
Each port should have its purpose documented in the description box.
Log files
A session log is created for each session that runs. The verbosity of the logs can be
tailored to specific performance or troubleshooting needs. By default, the session log
name is the name of the mapping with the .log extension. This should not normally be
overridden. The Session Wizard has two options for modifying the session name, by
appending either the ordinal (if saving multiple sessions is enabled) or the time (if saving
session by timestamp is enabled). Be aware that when saving session logs by timestamp,
Informatica does not perform any deletion or archiving of the session logs.
Whenever using the VERBOSE DATA option of Informatica logging, use a condition to
load just a few records rather than doing a full load; this conserves space on the Unix box.
Also remove the verbose option as soon as you are done with troubleshooting. Configure
your Informatica sessions to create the log files in the /ftp/oracle/xxx/logs/ directory and
the bad files in the /ftp/oracle/xxx/errors/ directory, where xxx stands for the three-letter
code of the data mart.
Triggering Sessions and Batches
The standard methodology to schedule informatica sessions and batches is through
Cronacle scripts.
Failure Notification
Once in production, your sessions and batches need to send a notification to the load
administrator when they fail. You can do this by calling the script
/i01/proc/admin/intimate_load_failure.sh xxx in the failure post-session commands. The
script intimate_load_failure.sh takes the three-letter data mart code as its argument.
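For example, for the BIO data mart used in the naming examples later in this document, the
failure post-session command would be along the lines of the line below (a sketch only;
confirm the exact path and session settings with the load administrator):

/i01/proc/admin/intimate_load_failure.sh BIO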
Naming Conventions and usage of Transformations
Quick Reference
Object Type Syntax
Folder XXX_<Data Mart Name>
Mapping m_fXY_ZZZ_<Target Table Name>_x.x
Session s_fXY_ZZZ_<Target Table Name>_x.x
Batch b_<Meaningful name representing the sessions inside>
Source Definition <Source Table Name>
Target Definition <Target Table Name>
Aggregator AGG_<Purpose>
Expression EXP_<Purpose>
Filter FLT_<Purpose>
Joiner JNR_<Names of Joined Tables>
Lookup LKP_<Lookup Table Name>
Normalizer Norm_<Source Name>
Rank RNK_<Purpose>
Router RTR_<Purpose>
Sequence Generator SEQ_<Target Column Name>
Source Qualifier SQ_<Source Table Name>
Stored Procedure STP_<Database Name>_<Procedure Name>
Update Strategy UPD_<Target Table Name>_xxx
Mapplet MPP_<Purpose>
Input Transformation INP_<Description of Data being funneled in>
Output Transformation OUT_<Description of Data being funneled out>
Database Connections XXX_<Database Name>_<Schema Name>
General Conventions
The name of the transformation should be as self-explanatory as possible regarding its
purpose in the mapping.
Wherever possible use short forms of long words.
Use change of case as word separators instead of underscores to conserve characters.
E.g. FLT_TransAmtGreaterThan0 instead of FLT_Trans_Amt_Greater_Than_0.
Preferably use all UPPER CASE letters in naming the transformations.
Folder
XXX stands for the three-letter code for that specific data mart.
Make sure you put in the project leader name and a brief note on what the data mart is all
about in the comments box of the folder.
Example
BIO_Biomed_Datamart
Mapping
A mapping is the Informatica object that contains a set of transformations, including the
source and target definitions. It looks like a pipeline.
m_fXY_ZZZ_<Target Table Name or Meaningful name>_x.x
f= frequency
d=daily,
w=weekly,
m=monthly,
h=hourly
X = L for Load or U for Update
Y = S for Stage or P for Production
ZZZ = Three-letter data mart code.
x.x = version no. eg. 1.1
Example
Name of a mapping which just inserts data into the stage table Contract_Coverages on a daily
basis.
m_dLS_BIO_Contract_Coverages_1.1
Session
A session is a set of instructions that tells informatica Server how to move data from
sources to targets.
s_fXY_ZZZ_<Target Table Name or Meaningful name>_x.x
f = frequency
d=daily,
w=weekly,
m=monthly,
h=hourly
X = L for Load or U for Update
Y = S for Stage or P for Production
ZZZ = Three-letter data mart code.
x.x = version no. eg. 1.1
Example
Name of a session which just inserts data into the stage table Contract_Coverages on a daily
basis.
s_dLS_BIO_Contract_Coverages_1.1
WorkFlow
A workflow is a set of instructions that tells Informatica Server how to execute tasks such
as sessions, email notifications and commands. In a workflow multiple sessions can be
included to run in parallel or sequential manner.
Wkf_<Meaningful name representing the sessions inside>
Example
Name of a workflow which contains sessions which run daily to load US sales data
wkf_US_SALES_DAILY_LOAD
Source Definition
The Source Definition is used to logically represent an application database table. The
Source Definition is associated to a single table or file and is created through the Import
Source process. It is usually the left most object in a mapping.
Naming convention is as follows.
<Source Table Name>
Source Definition could also be named as follows
<Source File Name>
In case there are multiple files then use the common part of all the files
OR
<Database name>_<Table Name>
OR
XY_<Db Name>_<Table Name>
XY = TB if it’s a table
FF if it’s a flat file
CB if it’s a COBOL file
Example
Name of source table from dwdev database
TB_DWDEV_TRANSACTION, FF_OPERATING_PLAN
Target Definition
The Target Definition is used to logically represent a database table or file in the Data
Warehouse / Data Mart. The Target Definition is associated to a single table and is
created in the Warehouse Designer. The target transformation is usually the right-most
transformation in a mapping.
Naming convention is as follows.
<Target Table Name>
Target Definition could also be named as follows
<Database name>_<Table Name>
OR
XY_<Db Name>_<Table Name>
XY = TB if it’s a table
FF if it’s a flat file
CB if it’s a COBOL file
Example
Name of target table in dwdev database
TB_DWDEV_TRANSACTION, FF_OPERATING_PLAN
Aggregator
The Aggregator transformation is used to perform aggregate calculations on a group basis.
AGG_<Purpose>
Example
Name of aggregator which sums the transaction amount
AGG_SUM_OF_TRANS_AMT
Name of aggregator which is used to find distinct records
AGG_DISTINCT_ORDERS
Expression
The Expression transformation is used to perform calculations on a row-by-row basis, for
example converting a string to an integer or concatenating two columns. Port naming
conventions are as follows:
1. Variable names should begin with the letters "v_" followed by the datatype and
name.
o Character data – v_char/v_vchar/v_vchar2/v_text
o Numeric data – v_num/v_float/v_dec/v_real
o Integer data – v_int/v_sint
o Date data – v_date
o Sequential data – v_seq
2. Manipulations of string should be indicated in the name of the new port. For
example, conc_CustomerName.
3. Manipulations of numeric data should be indicated in the name of the new port.
For example, sum_AllTaxes.
Naming convention of the transformation itself is as follows.
EXP_<Purpose>
Example
Name of expression which is used to trim columns
EXP_TRIM_COLS
Name of expression which is used to decode geography identifiers to geography descriptions
EXP_DECODE_GEOG_ID
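As an illustration of the port naming conventions above, an Expression transformation that
totals two tax columns and trims a name column might define ports like the following sketch
(the port and column names are hypothetical, not from an actual mapping):

-- variable port holding the intermediate calculation
v_num_AllTaxes = TAX_AMT + SURCHARGE_AMT
-- output port exposing the calculated total to downstream transformations
sum_AllTaxes = v_num_AllTaxes
-- output port holding the trimmed and concatenated customer name
conc_CustomerName = LTRIM(RTRIM(FIRST_NAME)) || ' ' || LTRIM(RTRIM(LAST_NAME))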
Filter
The Filter transformation is used as a True/False gateway for passing data through a
given path in the mapping. Filters are almost always used in tandem to provide a path for
both possibilities. The Filter transformation should be used as early as possible in a
mapping in order to preserve performance.
Naming convention is as follows.
FLT_<Purpose>
Filters could also be named as follows
FLT_<Column in Condition>
Example
Name of filter which filters out records which are already existing in the target table.
FLT_STOP_OLD_RECS
Name of filter which filters out records with geography identifiers less than zero
FLT_GEO_ID or FLT_GeoidGreaterThan0
Joiner
The Joiner transformation is used to join two related heterogeneous data sources residing
in different physical locations or file systems. One of the most common uses of the joiner
is to join data from a relational table to a flat file etc. The sources or tables joined should
be annotated in the Description field of the Transformation tab for the Joiner
transformation.
JNR_<Names of Joined Tables>
Example
Name of joiner which joins TRANSACTION and GEOGRAPHY table
JNR_TRANX
Lookup
The Lookup transformation is used to retrieve one or more values from a database and apply
the retrieved values against values passed in from another transformation. The existence
of the retrieved value(s) can then be used in other transformations to satisfy a condition.
Lookup transformations can be used in either a connected or unconnected state. Where
possible, the unconnected state should be used to enhance performance. However, it must
be noted that only one return value can be passed out of an unconnected lookup.
The ports needed for the Lookup should be suffixed with the letters “_in” for the input
ports and “_out” for the output ports. Port data types should not normally be modified in
a Lookup transformation, but instead should be modified in a prior transformation.
Often lookups fail and developers are left to wonder why. The datatype of a port
is absolutely essential in validating data through a lookup. For example, a
decimal(19,2) and a money datatype will not match.
When overriding the Lookup SQL, always ensure to put a valid ORDER BY (or
ORDER BY 1) statement in the SQL. This causes the database to perform the
ordering rather than the Informatica Server as it builds the cache.
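A minimal sketch of such an override for a lookup on the TRANSACTION table, assuming
hypothetical column names, is shown below. The trailing comment characters are commonly
used so that the ORDER BY generated by the Informatica Server is commented out rather
than duplicated:

SELECT TRANSACTION_ID, TRANSACTION_AMT
FROM TRANSACTION
ORDER BY TRANSACTION_ID --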
Naming convention is as follows.
LKP_<Lookup Table Name>
Example
Name of lookup transformation looking up on transaction table would be
LKP_TRANSACTION
Normalizer
The Normalizer transformation is used to transform structured data (such as COBOL or
flat files) into relational data. The Normalizer works by having the file header and detail
information identified by the developer in the transformation, and then looping through
the structured file according to the transformation definition.
Norm_<Source Name>
Example
Name of Normalizer normalizing data in OS_TRANS_DAILY file
Norm_OS_TRANS_DAILY
Rank
The Rank transformation is used to order data within certain data set so that only the top
or bottom n records are retrieved. For example, you can order Stores by Sales Quarterly
and then filter only the top 10 Store records. The reference to the business rule governing
the ranking should be annotated in the Description field of the Transformation tab for the
Rank transformation.
RNK_<Purpose>
Example
Name of Rank which picks top 10 Customers by Sales Amounts.
RNK_TopTenCustbySales
Router
The Router transformation is used to route rows into multiple output groups based on one
or more conditions, similar to a set of Filter transformations sharing the same input.
RTR_<Purpose>
Example
Name of router which routes data based on the value of the Geography Identifier
RTR_GeoidGreaterThan0
OR
RTR_GEO_ID
Sequence Generator
The Sequence Generator transformation is used to generate numeric key values in
sequential order. This is normally done to produce surrogate primary keys. It has been
observed that reusable sequence generators do not work as efficiently as stand-alone
sequence generators. To overcome this there are two options.
1) Use the procedure described in the appendix A of this document.
2) Use a trigger on the target table to populate the primary key automatically when a
record is inserted.
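A minimal sketch of option 2 for an Oracle target, assuming a hypothetical TRANSACTION
table with surrogate key column TRANSACTION_KEY (the object names are illustrative only):

-- database sequence that supplies the surrogate key values
CREATE SEQUENCE transaction_key_seq START WITH 1 INCREMENT BY 1;

-- trigger that populates the key automatically when a record is inserted
CREATE OR REPLACE TRIGGER transaction_pk_trg
BEFORE INSERT ON transaction
FOR EACH ROW
BEGIN
  SELECT transaction_key_seq.NEXTVAL INTO :new.transaction_key FROM dual;
END;
/

With this in place the mapping can leave the key column unconnected and the database
assigns it at insert time.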
Example
Name of sequence generator feeding primary key column to transaction table
SEQ_TRANSACTION
Source Qualifier
The Source Qualifier transformation is used to describe in SQL (or in the native script of
the DBMS platform, e.g. SQL for Oracle) the method by which data is to be retrieved
from a source application system. The Source Qualifier describes any joins, join types,
order or group clauses, and any filters of the data.
Care should be exercised in the use of filters in the Source Qualifier or in overriding the
default SQL or native script. The amount of data retrieved can be greatly affected by this
option, to the point that a mapping can become logically invalid. Use this option only
when it is known that the excluded data will not be needed in the mapping.
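As a sketch, a Source Qualifier such as SQ_TRANSACTION joining the hypothetical
TRANSACTION and GEOGRAPHY tables and filtering out non-positive geography identifiers
might use a SQL override similar to the following (table and column names are illustrative):

SELECT t.TRANSACTION_ID,
       t.TRANSACTION_AMT,
       g.GEOG_DESC
FROM TRANSACTION t, GEOGRAPHY g
WHERE t.GEOG_ID = g.GEOG_ID   -- join performed in the database, not in the mapping
  AND t.GEOG_ID > 0           -- filter as early as possible

Note that in a Source Qualifier override the selected columns must match the number and
order of the connected output ports.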
Naming convention is as follows.
SQ_<Source Table Name>
Example
Name of source qualifier of Transaction table
SQ_TRANSACTION
Stored Procedure
The Stored Procedure transformation is used to execute externally stored database
procedures and functions. The transformation can execute any required functionality,
from truncating a table to complex business logic. Avoid using stored procedures as far as
possible, as they make the mappings difficult to debug and also reduce the readability of
the code. Informatica does not have a LIKE operator, so you can use a stored procedure
that performs the LIKE test and returns a flag. Similarly, any operator or function that is
not available in Informatica but is available in the database server can be used via a small
stored procedure. You should resist the temptation to put all the logic in a stored
procedure.
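A minimal sketch of such a helper for an Oracle database, returning a flag the mapping can
branch on (the function name is hypothetical, not an existing object):

CREATE OR REPLACE FUNCTION like_flag (p_value IN VARCHAR2, p_pattern IN VARCHAR2)
RETURN NUMBER
IS
BEGIN
  -- return 1 when the value matches the pattern, 0 otherwise
  IF p_value LIKE p_pattern THEN
    RETURN 1;
  END IF;
  RETURN 0;
END like_flag;
/

The mapping would call this through a Stored Procedure transformation (for example
STP_DWDEV_Like_Flag) and test the returned flag in a Filter or Expression transformation.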
Naming convention is as follows.
STP_<Database Name>_<Procedure Name>
Example
Name of stored procedure to calculate commission in dwdev database
STP_DWDEV_Calc_Commission
Update Strategy
The Update Strategy transformation is used to indicate the type of data modification
(DML) that will occur to a table in the database. The transformation can provide
INSERT, UPDATE, or DELETE functionality to the data. As far as possible, do not use
the REJECT option of the Update Strategy, as Informatica writes details of the rejected
records to the session log and this may lead to a very large log file.
Naming convention is as follows.
UPD_<Target Table Name>_xxx
xxx = _ins for INSERT
_dlt for DELETE
_upd for UPDATE
_dyn – dynamic (the strategy type is decided by an algorithm inside the
update strategy transformation)
When using an Update Strategy transformation, do not leave the bare numeric
representation of the strategy in the expression. Instead, use the named constants
(numeric equivalents shown for reference):
DD_INSERT (0)
DD_UPDATE (1)
DD_DELETE (2)
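A typical update strategy expression written this way, assuming a hypothetical lookup
return port lkp_TRANSACTION_ID that is NULL for records not yet in the target, would be:

-- insert rows that do not yet exist in the target, update the rest
IIF(ISNULL(lkp_TRANSACTION_ID), DD_INSERT, DD_UPDATE)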
Example
Name of update strategy which updates TRANSACTION table
UPD_TRANSACTION_upd
Mapplet
Mapplets are a way of capturing complex transformation logic and storing the logic for
reuse. It may also be designed to pre-configure transformations that are redundant, thus
saving development time. Mapplets usually contain several transformations configured to
meet a specific transformation need.
In order for mapplets to be reusable, input and output transformation ports are required.
These ports provide a logical interface from a mapping to the mapplet. As with all
interface designs, mapplets require careful design to ensure their maximum efficiency
and reusability.
All transformations contained within the mapplet should be named in accordance with the
Transformation Naming Convention listed above. The exception is that if the target data
mart name is required, it should not be included unless the mapplet is specific to a single
data mart project. If the mapplet is specific to a data mart project, make sure it is
documented as such.
It is important to ensure the Description field of the mapplet is completed. Additionally,
the functionality and reuse of the mapplet should be defined as well.
MPP_<Purpose>
Example
Name of mapplet, which splits monthly estimates to weekly estimates
MPP_SPLIT_ESTIMATES
Database Connections
When creating database connections in Server Manager to access the source databases and
the target database, follow the naming convention below to avoid confusion and make
production migration easy.
XXX_<Database Name>_<Schema_Name> where XXX stands for the three-letter
datamart code.
Example
Database connection for cfdload schema on dwdev database for ORS datamart sessions would be
ORS_dwdev_cfdload
Version Control
The version control feature provided by Informatica is not mapping specific but folder
specific: you cannot version individual mappings separately, and you need to save the
whole contents of the folder as a different version whenever you want to change the
version of a single mapping. Hence we have proposed to do version control through
PVCS. You need to have the following structure set up in PVCS before you start
development.

PVCS
 |-- Project Name
      |-- Informatica_Folder_Name_1
      |-- Informatica_Folder_Name_2
           |-- Mappings
           |     |-- Mapping1
           |     |-- Mapping2
           |-- Sessions
                 |-- Session1
                 |-- Session2

You can start using PVCS right from day one of development. This way PVCS can serve
as the central repository for all scripts, including Informatica scripts, which will enable
the developers to access the production scripts at any time. The name of the mapping
should reflect the version number; refer to the naming conventions for mappings. The
first-cut mappings should be named with the suffix "_1.0"; whenever you want to make
changes, first make a copy of the mapping and change the suffix to "_1.2".
Testing Guidelines
Testing a New Data mart
Get a fresh schema XXXLOAD created in the DWTEST database. Test your mappings
by loading data into XXXLOAD schema on DWTEST database. You can do a partial
load or a full load depending on the amount of data you have. This schema would later
serve the purpose of testing out any changes you make to the data mart once it’s moved
into production. For testing the sessions and mappings you can use the template in the
templates folder. You can use your own improvised template if this one does not meet
your requirements.
Testing changes to a data mart in production
First develop the changes in the Informatica Development Repository and test them out
by loading data into XXXLOAD schema on DWDEV database. Next make sure the
schema XXXLOAD in DWTEST database has exactly the same structure and data as in
XXXLOAD on DWSTAGE. Now test out all your changes by loading data into
XXXLOAD schema on DWTEST database. After you are satisfied with the test results
you can move the changes to production. Make sure you follow the same process to move
your changes from DWDEV to DWTEST as you would follow to move the changes to
DWSTAGE.
Production Migration
You should first go through the BI change control procedure described in the
document at the following link
\\uswaufs03medge.med.ge.com\GEDSS_All\QuickPlace_Home\Processes\Change_Contr
ol
Performance Tuning
The goal of performance tuning is to optimize session performance by eliminating
performance bottlenecks. To tune the performance of a session, first you identify a
performance bottleneck, eliminate it, and then identify the next performance
bottleneck until you are satisfied with the session performance. You can use the test
load option to run sessions when you tune session performance.
The most common performance bottleneck occurs when the Informatica Server writes
to a target database. You can identify performance bottlenecks by the following
methods:
Running test sessions. You can configure a test session to read from a flat
file source or to write to a flat file target to identify source and target
bottlenecks.
Studying performance details. You can create a set of information called
performance details to identify session bottlenecks. Performance details
provide information such as buffer input and output efficiency.
Monitoring system performance. You can use system-monitoring tools to
view percent CPU usage, I/O waits, and paging to identify system bottlenecks.
Once you determine the location of a performance bottleneck, you can
eliminate the bottleneck by following these guidelines:
Eliminate source and target database bottlenecks. Have the database
administrator optimize database performance by optimizing the query,
increasing the database network packet size, or configuring index and key
constraints.
Eliminate mapping bottlenecks. Fine tune the pipeline logic and
transformation settings and options in mappings to eliminate mapping
bottlenecks.
Eliminate session bottlenecks. You can optimize the session strategy and use
performance details to help tune session configuration.
Eliminate system bottlenecks. Have the system administrator analyze
information from system monitoring tools and improve CPU and network
performance.
If you tune all the bottlenecks above, you can further optimize session performance
by partitioning the session. Adding partitions can improve performance by utilizing
more of the system hardware while processing the session.
Because determining the best way to improve performance can be complex, change
only one variable at a time, and time the session both before and after the change. If
session performance does not improve, you might want to return to your original
configurations.
For more information check out the Informatica Help from any of the three
informatica client tools.
Performance Tips
Suppose I have to load 40 lakh (4 million) records into the target table and the workflow
is taking about 10-11 hours to finish. I have already increased the cache size to 128 MB.
There are no Joiners, just Lookup and Expression transformations.
Ans:
Staging areas: if you use staging areas you force the Informatica server to perform
multiple data passes. Removing staging areas may improve session performance.
1. Filter as soon as possible (left-most in the mapping). Process only the data necessary
and eliminate as much unnecessary data as possible. Use the Source Qualifier to filter
data, since the Source Qualifier transformation limits the row set extracted from a
source while the Filter transformation only limits the row set sent to a target.
2. Cache lookups if the lookup table is under 500,000 rows and do NOT cache lookups for
tables over 500,000 rows.
3. If a value is used in multiple ports, calculate the value once (in a variable port) and
reuse the result instead of recalculating it for each port.
4. Avoid using Stored Procedures, and call them only once during the mapping if
possible.
5. Remember to turn off Verbose logging after you have finished debugging.
6. Use default values where possible instead of using IIF(ISNULL(X),,) in an
Expression port.
7. When overriding the Lookup SQL, always ensure to put a valid ORDER BY
statement in the SQL. This causes the database to perform the ordering rather than
the Informatica Server while building the cache.
8. Improve session performance by using sorted data with the Joiner transformation.
When the Joiner transformation is configured to use sorted data, the Informatica
Server improves performance by minimizing disk input and output.
9. Improve session performance by using sorted input with the Aggregator
transformation, since it reduces the amount of data cached during the session.
10. Improve session performance by using a limited number of connected input/output
or output ports to reduce the amount of data the Aggregator transformation stores
in the data cache.
11. Performing a join in the database is faster than performing the join in the session,
so use the Source Qualifier to perform the join where possible.
12. Designate the source with the smaller number of rows as the master source in Joiner
transformations, since this reduces the search time and also the cache size.
13. When using multiple conditions in a lookup, specify the conditions with the
equality operator first.
14. If the lookup table is on the same database as the source table, join the tables in the
Source Qualifier transformation instead of using a Lookup transformation, if
possible.
15. If the lookup table does not change between sessions, configure the Lookup
transformation to use a persistent lookup cache. The Informatica Server saves and
reuses cache files from session to session, eliminating the time required to read
the lookup table.
16. Use the :LKP reference qualifier in expressions only when calling unconnected
Lookup transformations.
17. The Informatica Server generates an ORDER BY statement for a cached lookup that
contains all lookup ports. By providing an override ORDER BY clause with
fewer columns, session performance can be improved.
18. Reduce the number of rows being cached by using the Lookup SQL Override
option to add a WHERE clause to the default SQL statement.
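Tying tips 17 and 18 together, a lookup SQL override that both restricts the cached rows
and orders by only the condition column might look like the sketch below (table and
column names are illustrative):

SELECT CUSTOMER_ID, CUSTOMER_STATUS
FROM CUSTOMER
WHERE CUSTOMER_STATUS = 'ACTIVE'   -- cache only the rows the mapping can actually match
ORDER BY CUSTOMER_ID --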
Interface Test Cases
For every step below: Test condition = Run the Informatica interface; Actual result = As expected; Pass/Fail = P; Tested by = Madhava.

SAP -> CMS Interfaces
1. CMS database down. Expected: interface sends an email notification and stops.
2. Check the number of records loaded in the table. Expected: count of records in the flat file and the table is the same.
3. Call the SP for getting a unique sequence number. Expected: the unique number is returned.
4. Run the interface even if no flat file is present on the SAP server. Expected: interface stops after finding no flat files.
5. Check for flat files when files are present on the SAP server. Expected: interface loads the data into CMS.
6. SAP host name changed in the SCP script. Expected: interface fails to SCP the files onto the SAP server and sends an error email.
7. SAP unix user changed in the SCP script. Expected: interface fails to SCP the files onto the SAP server and sends an error email.
8. CMS database down after files are retrieved from the SAP server. Expected: data is not loaded and files are sent to the errored directory.
9. Stored procedure throws an error. Expected: interface stops and sends an error email.
10. Check the value of DA_LD_NR in the control table. Expected: value of DA_LD_NR in the control table is the same as that loaded in the table for that interface.
11. Error while loading the data into CMS tables. Expected: interface is stopped, files are moved to the errored directory and an email notification is sent.
12. Error while updating the control table in CMS. Expected: interface sends an email notification and stops.

CMS -> SAP Interfaces
1. Value in the CMS control table is not set to "STAGED". Expected: interface does not generate any flat file.
2. Value in the CMS control table is set to "STAGED". Expected: interface will not generate a flat file.
3. SAP host name changed in the SCP script. Expected: interface fails to SCP the files onto the SAP server and sends an error email.
4. SAP unix user changed in the SCP script. Expected: interface fails to SCP the files onto the SAP server and sends an error email.
5. Value in the CMS control table is set to "STAGED". Expected: the status in the control table is updated to "TRANSFORMED".
6. Value in the CMS control table is set to "STAGED" and record status is "UNPROCESSED". Expected: the status of each record is updated to "UNPROCESSED".
7. File generated with no records. Expected: no files are sent to the SAP server.
8. CMS database down. Expected: interface sends an email notification and stops.
9. SCP of files failed. Expected: flat files are moved to the error directory and an email is sent.
10. SCP of files is successful. Expected: flat files are moved to the processed directory.
11. Check the number of records updated in the CMS table. Expected: count of records updated is the same as the count of records in the flat file.
12. Check the number of records present in the flat file. Expected: count of records present in the flat file is the same as the number of records with status "UNPROCESSED".

SLM -> CMS Interfaces
1. Value in the SLM control table is not set to "STAGED". Expected: interface does not generate any flat file.
2. Value in the SLM control table is set to "STAGED". Expected: interface will not generate a flat file.
3. CMS database down. Expected: interface sends an email notification and stops.
4. Check the number of records loaded in the table. Expected: count of records in the CMS and SLM tables is the same.
5. Call the SP for getting a unique sequence number. Expected: the unique number is returned.
6. Stored procedure throws an error. Expected: interface stops and sends an error email.
7. Check the value of DA_LD_NR in the control table. Expected: value of DA_LD_NR in the control table is the same as that loaded in the table for that interface.
8. Error during loading the data into the CMS table. Expected: interface sends an email notification and stops.
9. Error while updating the control table in CMS. Expected: interface sends an email notification and stops.
10. Error while retrieving data from the SLM database. Expected: interface sends an email notification and stops.
11. Error while updating the control table on the SLM database. Expected: interface sends an email notification and stops.
What is QA philosophy?
Quality Assurance involves the entire software development process: monitoring and
improving the process, making sure that any agreed-upon standards and procedures are
followed, and ensuring that problems are found and dealt with. It is oriented to
'prevention'.
What is 'Software Testing'?
Testing involves operation of a system or application under controlled conditions and
evaluating the results (eg, 'if the user is in interface A of the application while using
hardware B, and does C, then D should happen'). The controlled conditions should
include both normal and abnormal conditions. Testing should intentionally attempt to
make things go wrong to determine if things happen when they shouldn't or things don't
happen when they should. It is oriented to 'detection'. (See the Bookstore section's
'Software Testing' category for a list of useful books on Software Testing.)
Organizations vary considerably in how they assign responsibility for QA and testing.
Sometimes they are the combined responsibility of one group or individual. Also common
are project teams that include a mix of testers and developers who work closely together,
with overall QA processes monitored by project managers.
Why does software have bugs?
cost-effective methods of ensuring quality. Employees who are most skilled at
inspections are like the 'eldest brother' in the parable in 'Why is it often hard for
management to get serious about quality assurance?'.
What is software 'quality'?
Quality software is reasonably bug-free, delivered on time and within budget, meets
requirements and/or expectations, and is maintainable. However, quality is obviously a
subjective term. It will depend on who the 'customer' is and their overall influence in the
scheme of things. A wide-angle view of the 'customers' of a software development
project might include end-users, customer acceptance testers, customer contract officers,
customer management, the development organization's
management/accountants/testers/salespeople, future software maintenance engineers,
stockholders, magazine columnists, etc. Each type of 'customer' will have their own slant
on 'quality': the accounting department might define quality in terms of profits, while an
end-user might define quality as user-friendly and bug-free.
What is SEI? CMM? CMMI? ISO? IEEE? ANSI? Will it help?
predicted and effectively implemented when required.
Defect Life Cycle: when a defect is found by the tester, it is logged with NEW status. The
test lead then analyses the bug and assigns it to a developer (OPEN status). The developer
fixes the bug (FIX status). The tester then retests on the new build to check whether the
same error still occurs; if it does not, the defect is given CLOSED status. Defect life
cycle: NEW -> OPEN -> FIX -> CLOSED. A revalidation cycle means testing whether the
new version or build still has the same defect by executing the same test cases, much like
regression testing.
1. New bugs are submitted in the bug tracking system by QA.
When a bug is logged, our QA engineers include all relevant information for that bug,
such as:
• Date/time logged
• Language
• Operating System
• Bug Type – e.g. functional, UI, installation, translation
• Priority – Low/Medium/High/Urgent
The QA also analyses the error and describes, in a minimum number of steps how to
reproduce the problem for the benefit of the engineer. At this stage the bug is
labelled “Open”. Each issue must pass through at least four states:
Open: Opened by QA during testing
Pending: Fixed by Engineer but not verified as yet
Fixed: Fix Verified by QA
Closed: Fix re-verified before sign-off
QA Process Cycle:
The philosophy of Quality Assurance for software systems development is to ensure the
system meets or exceeds the agreed upon requirements of the end-users; thus creating a
high-quality, fully-functional and user-friendly application.
Phase V: QA Testing
1. Unit Testing
2. Functional Testing
3. System Integration Testing
4. Regression Testing
5. User Acceptance Testing
Unit testing: The testing, by development, of the application modules to verify each
unit (module) itself meets the accepted user requirements and design and development
standards.
Functional Testing: The testing of all the application’s modules individually to ensure
the modules, as released from development to QA, work together as designed and meet
the accepted user requirements and system standards
System Integration Testing: Testing of all of the application modules in the same
environment, database instance, network and inter-related applications, as it would
function in production. This includes security, volume and stress testing.
Regression Testing: This is the testing of each of the application’s system builds to
confirm that all aspects of a system remain functionally correct after program
modifications. Using automated regression testing tools is the preferred method.
User Acceptance Testing: The testing of the entire application by the end-users
ensuring the application functions as set forth in the system requirements documents and
that the system meets the business needs.