Bi Concepts

Business intelligence (BI) refers to the methods and technologies organizations use for tactical and strategic decision making: analyzing data to better understand customers, improve operations, and address business issues. A data warehouse stores integrated, historical data from various sources to support analysis and decision making, unlike operational databases. Informatica objects include mappings (sources and targets connected in a pipeline), sessions (which instruct data movement), and workflows (which automate tasks).


What is BI?

OBIEE

http://www.oracle.com/technology/documentation/bi_ee.html
http://download.oracle.com/docs/cd/E10415_01/doc/nav/portal_booklist.htm

http://zed.cisco.com/confluence/display/siebel/Home

http://zed.cisco.com/confluence/display/siebel/Enterprise+Architecture+BI+Standards

https://cisco.webex.com/ciscosales/lsr.php?AT=pb&SP=MC&rID=39544447&rKey=9C8D63F2C74ED9DA

http://informatica.techtiks.com/informatica_questions.html#RQ1

http://www.allinterview.com/showanswers/32477.html
http://www.1keydata.com/datawarehousing/glossary.html
http://www.forum9.com/
http://www.livestore.net/
http://www.kalaajkal.com/

BR100 - High-level requirements document

MD50 - Functional specification
MD70 - Technical specification. After approval of the MD70 we start development, followed by ERMO-TCA, TCB, TCC and FPR, SIT, UAT.
MD120 - Deployment process (AIM methodology)

Iteration – the solution is delivered in short iterations, with each cycle adding more business value
and implementing requested changes.

The 10 key principles of Agile software development, and how it fundamentally differs from a more
traditional waterfall approach, are as follows:

1. Active user involvement is imperative
2. The team must be empowered to make decisions
3. Requirements evolve but the timescale is fixed
4. Capture requirements at a high level; lightweight & visual
5. Develop small, incremental releases and iterate
6. Focus on frequent delivery of products
7. Complete each feature before moving on to the next
8. Apply the 80/20 rule
9. Testing is integrated throughout the project lifecycle – test early and often
10. A collaborative & cooperative approach between all stakeholders is essential

Unix

http://www.sikh-history.com/computers/unix/commands.html#catcommand

cat file1
cat file1 file2 > all ----- combines file1 and file2 into the file 'all' (it will create the file if it doesn't exist)
cat file1 >> file2 ----- appends file1 to file2

o > will redirect output from standard out (screen) to file or printer or
whatever you like.
o >> Filename will append at the end of a file called filename.
o < will redirect input to a process or command.
The line below is the first line of a script:

#!/usr/bin/sh

Or

#!/bin/ksh

What does #! /bin/sh mean in a shell script?


It tells the system which interpreter to use to run the script. As you know, the bash shell has
some specific features that other shells do not have, and vice versa; the same goes for Perl,
Python and other languages.

In short, it tells your shell which interpreter to use when executing the statements in your
shell script.

How to find all processes that are running:

ps -A

Crontab command.
The crontab command is used to schedule jobs. You must be given permission to run this
command by the Unix administrator. A schedule is given by five fields, as follows:

Minutes (0-59)  Hour (0-23)  Day of month (1-31)  Month (1-12)  Day of week (0-6) (0 is Sunday)

For example, to schedule a job that runs the script backup_jobs in /usr/local/bin at 22:25
on the 15th of the month (and on Sundays, day 0), the entry in the crontab file will be as
follows (* represents all values):

25 22 15 * 0 /usr/local/bin/backup_jobs

The * in the month field tells the system to run this every month.


The syntax is: crontab filename. Create a file with the scheduled jobs as above and then type
crontab filename; this will schedule the jobs.

The command below gives the total number of users logged in at this time:


who | wc -l
echo "are total number of people logged in at this time."

The command below will display only directories:

$ ls -l | grep '^d'

Pipes:

The pipe symbol "|" is used to direct the output of one command to the input
of another.

Moving, renaming, and copying files:

cp file1 file2 copy a file


mv file1 newname move or rename a file
mv file1 ~/AAA/ move file1 into sub-directory AAA in your home directory.
rm file1 [file2 ...] remove or delete a file

Viewing and editing files:

cat filename Dump a file to the screen in ascii.
head filename Show the first few lines of a file.
head -n filename Show the first n lines of a file.
tail filename Show the last few lines of a file.
tail -n filename Show the last n lines of a file.

Searching for files : The find command

find . -name aaa.txt Finds all the files named aaa.txt in the current directory or
any subdirectory tree.
find / -name vimrc Find all the files named 'vimrc' anywhere on the system.
find /usr/local/games -name "*xpilot*"
Find all files whose names contain the string 'xpilot' which
exist within the '/usr/local/games' directory tree.

You can find out what shell you are using by the command:

echo $SHELL

If a file exists, then send an email with it as an attachment:

if [[ -f $your_file ]]; then


uuencode $your_file $your_file|mailx -s "$your_file exists..." your_email_address
fi

Interactive History

A feature of bash and tcsh (and sometimes other shells): you can use the up-arrow key to
access your previous commands, edit them, and re-execute them.

Basics of the vi editor

Opening a file
vi filename

Creating text
Edit modes: These keys enter editing modes and type in the text
of your document.

i Insert before current cursor position

I Insert at beginning of current line
a Insert (append) after current cursor position
A Append to end of line
r Replace 1 character
R Replace mode
<ESC> Terminate insertion or overwrite mode

Deletion of text

x Delete single character


dd Delete current line and put in buffer

:w Write the current file.


:w new.file Write the file to the name 'new.file'.
:w! existing.file Overwrite an existing file with the file currently being edited.
:wq Write the file and quit.
:q Quit.
:q! Quit with no changes.

Business Intelligence refers to a set of methods and techniques that are used by
organizations for tactical and strategic decision making. It leverages methods and
technologies that focus on counts, statistics and business objectives to improve business
performance.

The objective of Business Intelligence is to better understand customers and improve


customer service, make the supply and distribution chain more efficient, and to identify
and address business problems and opportunities quickly.

A data warehouse is used for high-level data analysis. It is used for predictions, time-series
analysis, financial analysis, what-if simulations, etc.; basically, it is used for better
decision making.

OLTP is NOT used for analysis purposes. It is used for transaction and data processing;
it basically stores the day-to-day transactions that take place in an organization.
The main focus of OLTP is easy and fast input of data, while the main focus of a data
warehouse is easy retrieval of data.
OLTP does not store historical data (this is the reason it cannot be used for analysis);
a data warehouse stores historical data.

What is a Data Warehouse?

A Data Warehouse is a "Subject-Oriented, Integrated, Time-Variant, Nonvolatile collection of data in support of decision making".

In terms of design data warehouse and data mart are almost the same.

In general a Data Warehouse is used at the enterprise level and a Data Mart is used at a
business division/department level.

Subject Oriented:

Data that gives information about a particular subject instead of about a company's
ongoing operations.

Integrated:

Data that is gathered into the data warehouse from a variety of sources and merged into a
coherent whole.

Time-variant:

All data in the data warehouse is identified with a particular time period.

Non-volatile

Data is stable in a data warehouse. More data is added but data is never removed. This
enables management to gain a consistent picture of the business.

Informatica Transformations:

Mapping: A mapping is the Informatica object that contains a set of transformations,
including source and target, connected like a pipeline.
Session: A session is a set of instructions that tells the Informatica Server how to move data
from sources to targets.
Workflow: A workflow is a set of instructions that tells the Informatica Server how to
execute tasks such as sessions, email notifications and commands. A workflow can include
multiple sessions, run in parallel or sequentially.
Source Definition: The Source Definition is used to logically represent a database table or
flat file.

Target Definition: The Target Definition is used to logically represent a database table
or file in the Data Warehouse / Data Mart.
Aggregator: The Aggregator transformation is used to perform aggregate calculations on a
group basis.
Expression: The Expression transformation is used to perform arithmetic calculations on a
row-by-row basis; it is also used to convert strings to integers and to concatenate columns.
Filter: The Filter transformation is used to filter the data based on a single condition and
pass the remaining rows to the next transformation.
Router: The Router transformation is used to route the data based on multiple conditions
and pass rows to the next transformations.
It has three types of groups:
1) Input group
2) User-defined groups
3) Default group
Joiner: The Joiner transformation is used to join two sources residing in different
databases or different locations, such as a flat file and an Oracle source, or two relational
tables in different databases.
Source Qualifier: The Source Qualifier transformation is used to describe, in SQL, how
data is to be retrieved from a source application system; it is also used to join two
relational sources residing in the same database.
What is Incremental Aggregation?
A. Whenever a session is created for a mapping containing an Aggregator transformation, the
session option for Incremental Aggregation can be enabled. When PowerCenter performs
incremental aggregation, it passes new source data through the mapping and uses
historical cache data to perform new aggregation calculations incrementally.

Lookup: The Lookup transformation is used in a mapping to look up data in a flat file or a
relational table, view, or synonym.
Two types of lookups:
1) Connected
2) Unconnected
Connected vs. Unconnected Lookup:

• A connected lookup is connected to the pipeline and receives its input values from the pipeline. An unconnected lookup is not connected to the pipeline and receives its input values from the result of a :LKP expression in another transformation, via arguments.
• A connected lookup can be used only once in a mapping. An unconnected lookup can be used more than once within the mapping.
• A connected lookup can return multiple columns from the same row. An unconnected lookup designates one return port (R) and returns one column from each row.
• A connected lookup can be configured to use a dynamic cache; an unconnected lookup cannot.
• A connected lookup passes multiple output values to another transformation (link lookup/output ports to another transformation). An unconnected lookup passes one output value to the transformation calling the :LKP expression, through the lookup/output/return port.

Lookup Caches:

When configuring a lookup cache, you can specify any of the following options:

• Persistent cache
• Recache from lookup source
• Static cache
• Dynamic cache
• Shared cache

Dynamic cache: When you use a dynamic cache, the PowerCenter Server updates the
lookup cache as it passes rows to the target.
If you configure a Lookup transformation to use a dynamic cache, you can only use the
equality operator (=) in the lookup condition.
NewLookupRow Port will enable automatically.

NewLookupRow value and description:
0: The PowerCenter Server does not update or insert the row in the cache.
1: The PowerCenter Server inserts the row into the cache.
2: The PowerCenter Server updates the row in the cache.

Static cache: It is a default cache; the PowerCenter Server doesn’t update the lookup
cache as it passes rows to the target.
Persistent cache: If the lookup table does not change between sessions, configure the
Lookup transformation to use a persistent lookup cache. The PowerCenter Server then
saves and reuses cache files from session to session, eliminating the time required to read
the lookup table.

Normalizer: The Normalizer transformation is used to generate multiple records from a
single record and to transform structured data (such as COBOL or flat files) into relational
data.
Rank: The Rank transformation allows you to select only the top or bottom rank of data.
You can use a Rank transformation to return the largest or smallest numeric value in a
port or group.
The Designer automatically creates a RANKINDEX port for each Rank transformation.
Sequence Generator: The Sequence Generator transformation is used to generate
numeric key values in sequential order.

Stored Procedure: The Stored Procedure transformation is used to execute externally
stored database procedures and functions. It is used to perform the database level
operations.
Sorter: The Sorter transformation is used to sort data in ascending or descending order
according to a specified sort key. You can also configure the Sorter transformation for
case-sensitive sorting, and specify whether the output rows should be distinct. The Sorter
transformation is an active transformation. It must be connected to the data flow.

Union Transformation:
The Union transformation is a multiple input group transformation that you can use to
merge data from multiple pipelines or pipeline branches into one pipeline branch. It
merges data from multiple sources similar to the UNION ALL SQL statement to combine
the results from two or more SQL statements. Similar to the UNION ALL statement, the
Union transformation does not remove duplicate rows. Input groups should have a similar
structure.

Update Strategy: The Update Strategy transformation is used to indicate the DML
statement.
We can implement update strategy in two levels:
1) Mapping level
2) Session level.
Session level properties will override the mapping level properties.

Mapplet:

Mapplet is a set of reusable transformations. We can use this mapplet in any mapping
within the Folder.

A mapplet can be active or passive depending on the transformations in the mapplet.


Active mapplets contain one or more active transformations. Passive mapplets contain
only passive transformations.

When you add transformations to a mapplet, keep the following restrictions in mind:

• If you use a Sequence Generator transformation, you must use a reusable


Sequence Generator transformation.
• If you use a Stored Procedure transformation, you must configure the Stored
Procedure Type to be Normal.
• You cannot include the following objects in a mapplet:
o Normalizer transformations
o COBOL sources
o XML Source Qualifier transformations
o XML sources
o Target definitions
o Other mapplets

• The mapplet contains Input transformations and/or source definitions with at least
one port connected to a transformation in the mapplet.
• The mapplet contains at least one Output transformation with at least one port
connected to a transformation in the mapplet.

Input Transformation: Input transformations are used to create a logical interface to a


mapplet in order to allow data to pass into the mapplet.
Output Transformation: Output transformations are used to create a logical interface
from a mapplet in order to allow data to pass out of a mapplet.

System Variables

$$$SessStartTime returns the initial system date value on the machine hosting the
Integration Service when the server initializes a session. $$$SessStartTime returns the
session start time as a string value. The format of the string depends on the database you
are using.

Advantages of Teradata:
1. Can store billions of rows.
2. Parallel processing makes Teradata faster than other RDBMSs.
3. Can be accessed by network-attached and channel-attached systems.
4. Supports the requirements of diverse clients.
5. Automatically detects and recovers from hardware failures.
6. Allows expansion without sacrificing performance.

Datawarehouse - Concepts

Beginners

1. What is a Data Warehouse?

Data Warehouse is a "Subject-Oriented, Integrated, Time-Variant Nonvolatile


collection of data in support of decision making".

2. What is a DataMart?

A data mart is usually sponsored at the department level and developed with a specific
issue or subject in mind; a data mart is a data warehouse with a focused objective.

3. What is Data Mining?

Data Mining is an analytic process designed to explore hidden, consistent patterns,
trends and associations within data stored in a data warehouse or other large databases.

4. What do you mean by Dimension Attributes?

The Dimension Attributes are the various columns in a dimension table.

For example , attributes in a PRODUCT dimension can be product category, product type
etc.

Generally the dimension attributes are used in query filter conditions and to display other
related information about a dimension.

5. What is the difference between a data warehouse and a data mart?

In terms of design data warehouse and data mart are almost the same.

In general a Data Warehouse is used at the enterprise level and a Data Mart is used at a
business division/department level.

A data mart only contains data specific to particular subject areas.

6. What is the difference between OLAP, ROLAP, MOLAP and HOLAP?

ROLAP stands for Relational OLAP. Conceptually data is organized in cubes with
dimensions, but it is stored in relational tables.

MOLAP stands for Multidimensional OLAP. Conceptually and physically data is organized in
cubes with dimensions.

HOLAP stands for Hybrid OLAP; it is a combination of both worlds.

7. What is a star schema?

Star schema is a data warehouse schema where there is only one "fact table" and many
denormalized dimension tables.

The fact table contains the primary keys from all the dimension tables and numeric columns
of additive facts.
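As a rough illustration (the table and column names here are hypothetical, not taken from these notes), a simple star schema might be created like this:

-- Hypothetical star schema: one fact table plus denormalized dimension tables.
CREATE TABLE dim_product (
  product_key   NUMBER PRIMARY KEY,   -- surrogate key
  product_name  VARCHAR2(100),
  category      VARCHAR2(50)
);

CREATE TABLE dim_date (
  date_key      NUMBER PRIMARY KEY,
  calendar_date DATE,
  month_name    VARCHAR2(20)
);

CREATE TABLE fact_sales (
  product_key   NUMBER REFERENCES dim_product (product_key),
  date_key      NUMBER REFERENCES dim_date (date_key),
  sales_amount  NUMBER,               -- additive fact
  quantity_sold NUMBER                -- additive fact
);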

8. What does it mean by grain of the star schema?

In Data warehousing grain refers to the level of detail available in a given fact table as
well as to the level of detail provided by a star schema.

It is usually given as the number of records per key within the table. In general, the grain
of the fact table is the grain of the star schema.

9. What is a snowflake schema?

Unlike the star schema, a snowflake schema contains normalized dimension tables in a
tree-like structure with many nesting levels.

A snowflake schema is easier to maintain, but queries require more joins.

10. What is a surrogate key?

A surrogate key is a substitution for the natural primary key. It is a unique identifier or
number (normally created by a database sequence generator) for each record of a dimension
table, and it can be used as the primary key of the table.

A surrogate key is useful because natural keys may change.
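A minimal sketch of generating surrogate keys with an Oracle sequence (the table, sequence and data below are hypothetical):

-- Hypothetical dimension table with a surrogate key column.
CREATE TABLE dim_customer (
  customer_key  NUMBER PRIMARY KEY,   -- surrogate key
  customer_nbr  VARCHAR2(20),         -- natural key kept only as an attribute
  customer_name VARCHAR2(100)
);

CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;

-- Assign the surrogate key while loading the dimension row.
INSERT INTO dim_customer (customer_key, customer_nbr, customer_name)
VALUES (customer_dim_seq.NEXTVAL, 'C-1001', 'Mike Smith');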

11. What Oracle tools are available to design and build a data warehouse/data mart?

Data Warehouse Builder,

Oracle Designer,

Oracle Express,

Express Objects etc.

12. What is a Cube?

A multi-dimensional representation of data in which the cells contain measures (i.e. facts)
and the edges represent the dimensions by which the data can be sliced and diced.

For example:

A SALES cube can have PROFIT and COMMISSION measures and TIME, ITEM and
REGION dimensions

13. What does ETL stand for?

ETL stands for "Extract, Transform and Load".

ETL tools are used to pull data from a database, transform the data so that it is
compatible with a second database ( datawarehouse or datamart) and then load the data.

14. What is Aggregation?

In the data warehouse paradigm, "aggregation" is one way of improving query performance.
An aggregate fact table is a new table created from an existing fact table by summing up
facts for a set of associated dimensions. The grain of an aggregate fact table is higher
than that of the base fact table. Aggregate tables contain fewer rows, thus making queries
run faster.
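For illustration, an aggregate fact table can be built by summing the facts of a base fact table over a coarser grain; this sketch reuses the hypothetical fact_sales and dim_date tables from the star schema example above:

-- Monthly aggregate built from a daily-grain fact table.
CREATE TABLE fact_sales_monthly AS
SELECT d.month_name,
       f.product_key,
       SUM(f.sales_amount)  AS sales_amount,
       SUM(f.quantity_sold) AS quantity_sold
FROM   fact_sales f
JOIN   dim_date d ON d.date_key = f.date_key
GROUP  BY d.month_name, f.product_key;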

15. what is Business Intelligence?

Business Intelligence is a term introduced by Howard Dresner of Gartner Group in


1989. He described Business Intelligence as a set of concepts and methodologies to
improve decision making in business through use of facts and fact based systems.

16. What is transitive dependency?

When a non-key attribute identifies the value of another non-key attribute, the table is said
to contain a transitive dependency.
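For example (hypothetical tables), in the first design below DEPT_NAME is determined by DEPT_NO, which is itself a non-key column, so the table contains a transitive dependency; moving DEPT_NAME into its own table removes it:

-- Violates 3NF: dept_name depends on dept_no, a non-key column.
CREATE TABLE emp_bad (
  emp_no    NUMBER PRIMARY KEY,
  emp_name  VARCHAR2(50),
  dept_no   NUMBER,
  dept_name VARCHAR2(50)
);

-- 3NF: the transitively dependent attribute moves to its own table.
CREATE TABLE dept_lookup (
  dept_no   NUMBER PRIMARY KEY,
  dept_name VARCHAR2(50)
);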

17. What is the current version of Informatica?

The current version of Informatica is 8.6.

18. What are the tools in Informatica? Why do we use these tools?

PowerMart and PowerCenter are the popular tools.

PowerCenter is generally used in production environments.

PowerMart is generally used in development environments.

19. What is a transformation?

It is a function or object that processes or transforms the data.

The Designer tool provides a set of transformations for different data transformations.

Examples include the Filter, Source Qualifier, Expression, Aggregator and Joiner transformations.

20. What is a mapping?

It defines the flow of data from source to the target.

It also contains the different rules to be applied to the data before the data gets loaded
into the target.

21. what is fact less fact table?

A fact table that contains only the primary keys from the dimension tables and does not
contain any measures is called a factless fact table.

22. What is a Schema?

A graphical representation of the data structure; the first phase in the implementation of a
Universe.

23. What is A Context?

A Method by which the designer can decide which path to choose when more than one
path is possible from one table to another in a Universe

24. What is a BOMain key?

A file that contains the address of the repository's security domain.

Advanced

25. Who are the Data Stewards and whats their role?

Data Stewards are a group of experienced people who are responsible for planning,
defining business processes and setting direction. Data Stewards are familiar with the
organization's data quality, data issues and overall business processes.

26. What are the most important features of a data warehouse?

DRILL DOWN, DRILL ACROSS and TIME HANDLING

To be able to drill down/drill across is the most basic requirement of an end user in a data
warehouse. Drilling down most directly addresses the natural end-user need to see more
detail in a result. Drill down should be as generic as possible because there is absolutely
no good way to predict users' drill-down paths.

27. What is the easiest way to build a corporate-specific time dimension?

Unlike most dimensions, a time dimension does not change. You can populate it once
and use it for years.

So the easiest way is to use a spreadsheet.

28. What is a Real-Time Data Warehouse - RTDW?

A Real-Time Data Warehouse is an analytic component of an enterprise-level data stream
that supports continuous, asynchronous, multi-point delivery of data.

In an RTDW, data moves straight from the source systems to decision makers without any
form of staging.

29. What is Slowly Changing Dimension?

Slowly changing dimensions refers to the change in dimensional attributes over time.

An example of slowly changing dimension is a product dimension where attributes of a


given product change over time, due to change in component or ingredients or packaging
details.

There are three main techniques for handling slowly changing dimensions in a data
warehouse:

Type 1: Overwriting. No history is maintained; the new record replaces the original record.

Type 2: Creating another dimension record with time stamps.

Type 3: Creating a current-value field. The original record is modified to reflect the change.

Each technique handles the problem differently. The designer chooses among these
techniques depending on the company's need to preserve an accurate history of the
dimensional changes. A sketch of the Type 2 approach is shown below.
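A rough sketch of the Type 2 technique in SQL; the table, sequence, columns and values here are hypothetical, and real loads would normally be done through the ETL tool:

-- Hypothetical Type 2 product dimension.
CREATE TABLE dim_product_scd (
  product_key          NUMBER PRIMARY KEY,  -- surrogate key, new per version
  product_nbr          VARCHAR2(20),        -- natural key
  product_name         VARCHAR2(100),
  packaging            VARCHAR2(50),
  effective_start_date DATE,
  effective_end_date   DATE,
  current_flag         CHAR(1)
);

CREATE SEQUENCE product_scd_seq;

-- Step 1: close out the current version of the changed row.
UPDATE dim_product_scd
SET    effective_end_date = SYSDATE,
       current_flag       = 'N'
WHERE  product_nbr  = 'P-100'
AND    current_flag = 'Y';

-- Step 2: insert the new version with its own surrogate key and time stamps.
INSERT INTO dim_product_scd
  (product_key, product_nbr, product_name, packaging,
   effective_start_date, effective_end_date, current_flag)
VALUES
  (product_scd_seq.NEXTVAL, 'P-100', 'Widget', 'New box',
   SYSDATE, NULL, 'Y');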

30. What is a Conformed Dimension?

A conformed dimension is a dimension modelling technique promoted by Ralph Kimball.

A conformed dimension is a dimension that has a single meaning and content throughout a
data warehouse. A conformed dimension can be used in any star schema. For example, a
Time/Calendar dimension is normally used in all star schemas, so it can be designed once
and used with many fact tables across a data warehouse.

31. What is TL9000?

TL9000 is a quality measurement system, determined by the QuEST Forum to be vital


to an organization's success. It offers a telecommunications-specific set of requirements
based on ISO 9001; it defines the quality system requirements for design, development,
production, delivery, installation and maintenance of telecommunication products and
services.

Difference between 7.x and 8.x

Power Center 7.X Architecture (diagram not reproduced here).

Power Center 8.X Architecture (diagram not reproduced here).
Developer Changes: Java Transformation Added in 8.x

• For example, in PowerCenter:


• PowerCenter Server has become a service, the Integration Service
• No more Repository Server, but PowerCenter includes a Repository
Service
• Client applications are the same, but work on top of the new services
framework

Below are the differences between Informatica 7.1 and 8.1:

1) PowerCenter Connect for SAP NetWeaver BW option
2) SQL transformation is added
3) Service-oriented architecture
4) Grid concept is an additional feature
5) Random file names can be generated in the target
6) Command line programs: new infacmd and infasetup commands were added
7) Java transformation is an added feature
8) Concurrent cache creation and faster index building are additional features in the
Lookup transformation
9) Caches are automatic; you do not need to allocate them at the transformation level
10) Pushdown optimization techniques
11) We can append data to a flat file target

1) The difference between 8.1 and 8.5 is that pushdown optimization is available in the
mapping, which gives more flexible performance tuning.

Pushdown optimization

A session option that allows you to push transformation logic to the source or target
database.

GRID

Effective in version 8.0, you create and configure a grid in the Administration Console.
You configure a grid to run on multiple nodes, and you configure one Integration Service
to run on the grid. The Integration Service runs processes on the nodes in the grid to
distribute workflows and sessions. In addition to running a workflow on a grid, you can
now run a session on a grid. When you run a session or workflow on a grid, one
service process runs on each available node in the grid.

Integration Service (IS)


The key functions of IS are
 Interpretation of the workflow and mapping metadata from the repository.
 Execution of the instructions in the metadata
 Manages the data from source system to target system within the memory and
disk
The main three components of Integration Service which enable data movement are,
 Integration Service Process
 Load Balancer
 Data Transformation Manager
6.1 Integration Service Process (ISP)

The Integration Service starts one or more Integration Service processes to run and
monitor workflows. When we run a workflow, the ISP starts and locks the workflow,
runs the workflow tasks, and starts the process to run sessions. The functions of the
Integration Service Process are,
 Locks and reads the workflow
 Manages workflow scheduling, ie, maintains session dependency
 Reads the workflow parameter file
 Creates the workflow log
 Runs workflow tasks and evaluates the conditional links
 Starts the DTM process to run the session
 Writes historical run information to the repository
 Sends post-session emails

6.2 Load Balancer

The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks
to a single node or across the nodes in a grid after performing a sequence of steps. Before

understanding these steps we have to know about Resources, Resource Provision
Thresholds, Dispatch mode and Service levels
 Resources – we can configure the Integration Service to check the resources
available on each node and match them with the resources required to run the
task. For example, if a session uses an SAP source, the Load Balancer dispatches
the session only to nodes where the SAP client is installed
 Three Resource Provision Thresholds, The maximum number of runnable
threads waiting for CPU resources on the node called Maximum CPU Run Queue
Length. The maximum percentage of virtual memory allocated on the node
relative to the total physical memory size called Maximum Memory %. The
maximum number of running Session and Command tasks allowed for each
Integration Service process running on the node called Maximum Processes
 Three dispatch modes – Round-Robin: The Load Balancer dispatches tasks to
available nodes in a round-robin fashion after checking the “Maximum Process”
threshold. Metric-based: Checks all the three resource provision thresholds and
dispatches tasks in round robin fashion. Adaptive: Checks all the three resource
provision thresholds and also ranks nodes according to current CPU availability
 Service Levels establishes priority among tasks that are waiting to be dispatched,
the three components of service levels are Name, Dispatch Priority and Maximum
dispatch wait time. “Maximum dispatch wait time” is the amount of time a task
can wait in queue and this ensures no task waits forever
A. Dispatching tasks on a node
1. The Load Balancer checks different resource provision thresholds on the node
depending on the Dispatch mode set. If dispatching the task causes any threshold
to be exceeded, the Load Balancer places the task in the dispatch queue, and it
dispatches the task later
2. The Load Balancer dispatches all tasks to the node that runs the master
Integration Service process
B. Dispatching Tasks on a grid,
1. The Load Balancer verifies which nodes are currently running and enabled
2. The Load Balancer identifies nodes that have the PowerCenter resources required
by the tasks in the workflow
3. The Load Balancer verifies that the resource provision thresholds on each
candidate node are not exceeded. If dispatching the task causes a threshold to be
exceeded, the Load Balancer places the task in the dispatch queue, and it
dispatches the task later
4. The Load Balancer selects a node based on the dispatch mode

6.3 Data Transformation Manager (DTM) Process

When the workflow reaches a session, the Integration Service Process starts the DTM
process. The DTM is the process associated with the session task. The DTM process
performs the following tasks:
 Retrieves and validates session information from the repository.
 Validates source and target code pages.
 Verifies connection object permissions.

 Performs pushdown optimization when the session is configured for pushdown
optimization.
 Adds partitions to the session when the session is configured for dynamic
partitioning.
 Expands the service process variables, session parameters, and mapping variables
and parameters.
 Creates the session log.
 Runs pre-session shell commands, stored procedures, and SQL.
 Sends a request to start worker DTM processes on other nodes when the session is
configured to run on a grid.
 Creates and runs mapping, reader, writer, and transformation threads to extract,
transform, and load data
 Runs post-session stored procedures, SQL, and shell commands and sends post-
session email

After the session is complete, reports execution result to ISP

Pictorial Representation of Workflow execution:

A PowerCenter Client requests the IS to start a workflow.

The IS starts an ISP.

The ISP consults the Load Balancer to select a node.

The ISP starts the DTM on the node selected by the Load Balancer.

DWH ARCHITECTURE

Granularity

Principle: create fact tables with the most granular data possible to support analysis of the
business process.

In Data warehousing grain refers to the level of detail available in a given fact table as
well as to the level of detail provided by a star schema.

It is usually given as the number of records per key within the table. In general, the grain
of the fact table is the grain of the star schema.

Facts: facts must be consistent with the grain; all facts are at a uniform grain.

• Watch for facts of mixed granularity
• For example, total sales for a day vs. a monthly total

Dimensions: each dimension associated with the fact table must take on a single value for
each fact row.

• Each dimension attribute must take on one value
• Outriggers are the exception, not the rule

What is DM?

DM (dimensional modeling) is a logical design technique that seeks to present the data in a standard, intuitive
framework that allows for high-performance access. It is inherently dimensional, and it
adheres to a discipline that uses the relational model with some important restrictions.
Every dimensional model is composed of one table with a multipart key, called the fact
table, and a set of smaller tables called dimension tables. Each dimension table has a
single-part primary key that corresponds exactly to one of the components of the
multipart key in the fact table.

What is Conformed Dimension?

Conformed Dimensions (CD): these dimensions are something that is built once in your
model and can be reused multiple times with different fact tables. For example, consider
a model containing multiple fact tables, representing different data marts. Now look for a
dimension that is common to these facts tables. In this example let’s consider that the
product dimension is common and hence can be reused by creating short cuts and joining
the different fact tables. Some examples are the time dimension, customer dimension and
product dimension.

What is Junk Dimension?

A "junk" dimension is a collection of random transactional codes, flags and/or text


attributes that are unrelated to any particular dimension. The junk dimension is simply a
structure that provides a convenient place to store the junk attributes. A good example
would be a trade fact in a company that brokers equity trades.

When you consolidate lots of small dimensions, instead of having hundreds of small
dimension tables with only a few records each cluttering your database with these mini
‘identifier’ tables, all records from all these small dimension tables are loaded into one
dimension table, and we call this the junk dimension table (since we are storing all the
junk in this one table). For example, a company might have a handful of manufacturing
plants, a handful of order types, and so forth, and we can consolidate them in one
dimension table called the junk dimension table.

It’s a dimension table which is used to keep junk attributes

What is De Generated Dimension?

An item that is in the fact table but is stripped off of its description, because the
description belongs in dimension table, is referred to as Degenerated Dimension. Since it
looks like dimension, but is really in fact table and has been degenerated of its
description, hence is called degenerated dimension. Now coming to the slowly changing
dimensions (SCD) and Slowly Growing Dimensions (SGD): I would like to classify
them to be more of an attributes of dimensions its self.

Degenerated Dimension: a dimension which is located in fact table known as


Degenerated dimesion

What is Data Mart?

A Data Mart is a subset of data from a Data Warehouse. Data Marts are built for specific
user groups. They contain a subset of rows and columns that are of interest to the

particular audience. By providing decision makers with only a subset of the data from
the Data Warehouse, privacy, performance and clarity objectives can be attained.

What is Data Warehouse?

A Data Warehouse (DW) is simply an integrated consolidation of data from a variety of


sources that is specially designed to support strategic and tactical decision making. The
main objective of a Data Warehouse is to provide an integrated environment and coherent
picture of the business at a point in time.

What is Fact Table?

A Fact Table in a dimensional model consists of one or more numeric facts of importance
to a business. Examples of facts are as follows:

1. the number of products sold


2. the value of products sold
3. the number of products produced
4. the number of service calls received

Businesses have a need to monitor these "facts" closely and to sum them using different
"dimensions". For example, a business might find the following information useful:

1. the value of products sold this quarter versus last quarter


2. the value of products sold by store
3. the value of products sold by channel (e.g. phone, Internet, in-store shopping)
4. the value of products sold by product (e.g. blue widgets, red widgets)

Businesses will often need to sum facts by multiple dimensions:

1. the value of products sold store, by product type and by day of week
2. the value of products sold by product and by channel

In addition to numeric facts, fact table contain the "keys" of each of the dimensions that
related to that fact (e.g Customer Nbr, Product ID, Store Nbr). Details about the
dimensions (e.g customer name, customer address) are stored in the dimension table (i.e.
customer)

What is Fact and Dimension?

A "fact" is a numeric value that a business wishes to count or sum. A "dimension" is


essentially an entry point for getting at the facts. Dimensions are things of interest to the
business.

A set of level properties that describe a specific aspect of a business, used for analyzing
the factual measures

What is Factless Fact Table?

Factless fact table captures the many-to-many relationships between dimensions, but
contains no numeric or textual facts. They are often used to record events or coverage
information.

Common examples of factless fact tables include:

• Identifying product promotion events (to determine promoted products that didn’t
sell)
• Tracking student attendance or registration events
• Tracking insurance-related accident events
• Identifying building, facility, and equipment schedules for a hospital or
University"

Types of facts?
There are three types of facts:

• Additive: Additive facts are facts that can be summed up through all of the
dimensions in the fact table.
• Semi-Additive: Semi-additive facts are facts that can be summed up for some of
the dimensions in the fact table, but not the others.
• Non-Additive: Non-additive facts are facts that cannot be summed up for any of
the dimensions present in the fact table.

What is incremental aggregation?


When using incremental aggregation, you apply captured changes in the source to aggregate
calculations in a session. If the source changes only incrementally and you can capture changes,
you can configure the session to process only those changes. This allows the PowerCenter
Server to update your target incrementally, rather than forcing it to process the entire source and
recalculate the same data each time you run the session.

For example, you might have a session using a source that receives new data every day. You
can capture those incremental changes because you have added a filter condition to the mapping
that removes pre-existing data from the flow of data. You then enable incremental aggregation.

When the session runs with incremental aggregation enabled for the first time on March 1, you
use the entire source. This allows the PowerCenter Server to read and store the necessary
aggregate data. On March 2, when you run the session again, you filter out all the records except
those time-stamped March 2. The PowerCenter Server then processes only the new data and
updates the target accordingly.

Consider using incremental aggregation in the following circumstances:

You can capture new source data. Use incremental aggregation when you can capture new
source data each time you run the session. Use a Stored Procedure or Filter transformation to
process only new data.

Incremental changes do not significantly change the target. Use incremental aggregation
when the changes do not significantly change the target. If processing the incrementally changed
source alters more than half the existing target, the session may not benefit from using
incremental aggregation. In this case, drop the table and re-create the target with complete
source data.

Note: Do not use incremental aggregation if your mapping contains percentile or median
functions. The PowerCenter Server uses system memory to process Percentile and Median
functions in addition to the cache memory you configure in the session property sheet. As a
result, the PowerCenter Server does not store incremental aggregation values for Percentile and
Median functions in disk caches.

Normalization:

Some Oracle databases were modeled according to the rules of normalization that were
intended to eliminate redundancy.

Obviously, the rules of normalization are required to understand your relationships and
functional dependencies

First Normal Form:

A row is in first normal form (1NF) if all underlying domains contain atomic values only.

• Eliminate duplicative columns from the same table.


• Create separate tables for each group of related data and identify each row with a unique
column or set of columns (the primary key).

Second Normal Form:


An entity is in Second Normal Form (2NF) when it meets the requirement of being in First Normal
Form (1NF) and additionally:

• Does not have a composite primary key, meaning that the primary key cannot be subdivided
into separate logical entities.
• All the non-key columns are functionally dependent on the entire primary key.
• A row is in second normal form if, and only if, it is in first normal form and every non-key
attribute is fully dependent on the key.
• 2NF eliminates functional dependencies on a partial key by putting the fields in a
separate table from those that are dependent on the whole key. An example is resolving
many: many relationships using an intersecting entity.

Third Normal Form:


An entity is in Third Normal Form (3NF) when it meets the requirement of being in Second
Normal Form (2NF) and additionally:

• Functional dependencies on non-key fields are eliminated by putting them in a separate
table. At this level, all non-key fields are dependent on the primary key.
• A row is in third normal form if and only if it is in second normal form and attributes
that do not contribute to a description of the primary key are moved into a separate table.
An example is creating look-up tables.

Boyce-Codd Normal Form:

Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. In his later writings Codd refers
to BCNF as 3NF. A row is in Boyce Codd normal form if, and only if, every determinant is a
candidate key. Most entities in 3NF are already in BCNF.

Fourth Normal Form:

An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal
Form (3NF) and additionally:

• Has no multiple sets of multi-valued dependencies. In other words, 4NF states that no
entity can have more than a single one-to-many relationship

Why using hints

It is a perfect valid question to ask why hints should be used. Oracle comes with an
optimizer that promises to optimize a query's execution plan. When this optimizer is
really doing a good job, no hints should be required at all.

Sometimes, however, the characteristics of the data in the database change rapidly, so that
the optimizer (or, more accurately, its statistics) is out of date. In this case, a hint
could help.

You should first get the explain plan of your SQL and determine what changes can be
done to make the code operate without using hints if possible. However, hints such as
ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can tame a wild
optimizer and give you optimal performance

Table analyze and update Analyze Statement

The ANALYZE statement can be used to gather statistics for a specific table, index or
cluster. The statistics can be computed exactly, or estimated based on a specific number
of rows, or a percentage of rows:

ANALYZE TABLE employees COMPUTE STATISTICS;

ANALYZE INDEX employees_pk COMPUTE STATISTICS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 100 ROWS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;

EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');

EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15);

Automatic Optimizer Statistics Collection

By default Oracle 10g automatically gathers optimizer statistics using a scheduled job
called GATHER_STATS_JOB. By default this job runs within maintenance windows
between 10 P.M. to 6 A.M. week nights and all day on weekends. The job calls the
DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC internal procedure which
gathers statistics for tables with either empty or stale statistics, similar to the
DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO
option. The main difference is that the internal job prioritizes the work such that tables
most urgently requiring statistics updates are processed first.

Informatica Session Log shows busy percentage

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1]
****

Thread [READER_1_1_1] created for [the read stage] of partition point
[SQ_ACW_PCBA_APPROVAL_STG] has completed: Total Run Time = [7.193083] secs,
Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000]

Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point
[SQ_ACW_PCBA_APPROVAL_STG] has completed. The total run time was insufficient for any
meaningful statistics.

Thread [WRITER_1_*_1] created for [the write stage] of partition point
[ACW_PCBA_APPROVAL_F1, ACW_PCBA_APPROVAL_F] has completed: Total Run Time = [0.806521]
secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000]

Hint categories

Hints can be categorized as follows:

• Hints for optimization approaches and goals
• Hints for access paths
• Hints for query transformations
• Hints for join orders
• Hints for join operations
• Hints for parallel execution
• Additional hints

ORDERED- This hint forces tables to be joined in the order specified. If you know
table X has fewer rows, then ordering it first may speed execution in a join.

PARALLEL (table, instances)This specifies the operation is to be done in parallel.

If index is not able to create then will go for /*+ parallel(table, 8)*/-----For select and
update example---in where clase like st,not in ,>,< ,<> then we will use.

[NO]APPEND-This specifies that data is to be or not to be appended to the end of a


file rather than into existing free space. Use only with INSERT commands..
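A hedged example of how such hints appear in a query; emp and dept are the usual SCOTT demo tables and sales is a hypothetical large table:

-- Force the join order to start with the smaller DEPT table.
SELECT /*+ ORDERED */ e.ename, d.dname
FROM   dept d, emp e
WHERE  e.deptno = d.deptno;

-- Request a parallel full scan of a large table with 8 parallel slaves.
SELECT /*+ FULL(s) PARALLEL(s, 8) */ COUNT(*)
FROM   sales s;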

EXPLAIN PLAN Usage


Table analyze and update statisticDbms_job.get_table_stats (owner=’schemaname’,
table=table name)

When an SQL statement is passed to the server the Cost Based Optimizer (CBO) uses
database statistics to create an execution plan which it uses to navigate through the
data. Once you've highlighted a problem query the first thing you should do is
EXPLAIN the statement to check the execution plan that the CBO has created.
This will often reveal that the query is not using the relevant indexes, or
indexes to support the query are missing. Interpretation of the execution plan is
beyond the scope of this article.

The explain plan process stores data in the PLAN_TABLE. This table can be located in
the current schema or a shared schema and is created using SQL*Plus as follows:

SQL> CONN sys/password AS SYSDBA
Connected
SQL> @$ORACLE_HOME/rdbms/admin/utlxplan.sql
SQL> GRANT ALL ON sys.plan_table TO public;
SQL> CREATE PUBLIC SYNONYM plan_table FOR sys.plan_table;
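Once the plan table exists, a statement can be explained and the plan displayed, for example with the DBMS_XPLAN package (available from Oracle 9i onwards); the query below is only illustrative:

EXPLAIN PLAN FOR
SELECT * FROM employees WHERE department_id = 10;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);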

Syntax for a synonym:

CREATE OR REPLACE SYNONYM CTS_HZ_PARTIES FOR [email protected]

CREATE DATABASE LINK TS4EDW CONNECT TO CDW_ASA IDENTIFIED BY k0kroach USING 'TS4EDW'

Full Outer Join?

If we want all the parts (irrespective of whether they are supplied by any supplier or not),
and all the suppliers (irrespective of whether they supply any part or not) listed in the
same result set, we have a problem. That's because the traditional outer join (using the '+'
operator) is unidirectional, and you can't put (+) on both sides in the join condition. The
following will result in an error:

SQL> select p.part_id, s.supplier_name


2 from part p, supplier s
3 where p.supplier_id (+) = s.supplier_id (+);
where p.supplier_id (+) = s.supplier_id (+)
*
ERROR at line 3:
ORA-01468: a predicate may reference only one outer-joined table

Up through Oracle8i, Oracle programmers used a workaround to circumvent this limitation.
The workaround involves two outer join queries combined by a UNION operator, as in the
second example below. From Oracle9i onwards, the ANSI FULL OUTER JOIN syntax can be used
instead:

SELECT e.last_name, e.department_id, d.department_name
FROM employees e FULL OUTER JOIN departments d
  ON (e.department_id = d.department_id);

SQL> select p.part_id, s.supplier_name


2 from part p, supplier s
3 where p.supplier_id = s.supplier_id (+)
4 union
5 select p.part_id, s.supplier_name
6 from part p, supplier s
7 where p.supplier_id (+) = s.supplier_id;

An example of a CASE expression in a SELECT list:

WTT.NAME WORK_TYPE,
(CASE
   WHEN (PPC.CLASS_CODE = 'Subscription' AND L1.ATTRIBUTE_CATEGORY IS NOT NULL)
   THEN L1.ATTRIBUTE_CATEGORY
   ELSE PTT.TASK_TYPE
END) TASK_TYPE,
PEI.DENOM_CURRENCY_CODE

What’s the difference between View and Materialized View?

In a view we cannot perform DML commands, whereas it is possible in a materialized view.

A view has a logical existence, but a materialized view has a physical existence. Moreover,
a materialized view can be indexed, analyzed and so on; everything we can do with a table
can also be done with a materialized view.

We can keep aggregated data in a materialized view. We can schedule the MV to refresh, but
a table cannot be refreshed. An MV can be created based on multiple tables.

Materialized View?
When we work with various databases running on different systems, we sometimes need to
fetch records from a remote location, which can be quite expensive in terms of the
resources needed to fetch data directly from the remote location. To minimize response
time and increase throughput, we can create a copy of that data in the local database.
This duplicate copy is known as a materialized view, which may be refreshed as required
using the options available in Oracle, such as FAST, COMPLETE and FORCE.

CREATE MATERIALIZED VIEW EBIBDRO.HWMD_MTH_ALL_METRICS_CURR_VIEW


TABLESPACE EBIBDD
NOCACHE
LOGGING
NOCOMPRESS
NOPARALLEL
BUILD IMMEDIATE
REFRESH COMPLETE
START WITH sysdate
NEXT TRUNC(SYSDATE+1)+ 4/24
WITH PRIMARY KEY
AS
select * from HWMD_MTH_ALL_METRICS_CURR_VW;

Another Method to refresh:

DBMS_MVIEW.REFRESH('MV_COMPLEX', 'C');

Target Update Override

By default, the Integration Service updates target tables based on key values. However,
you can override the default UPDATE statement for each target in a mapping. You might
want to update the target based on non-key columns.

Overriding the WHERE Clause

You can override the WHERE clause to include non-key columns. For example, you
might want to update records for employees named Mike Smith only. To do this, you edit
the WHERE clause as follows:

UPDATE T_SALES SET DATE_SHIPPED = :TU.DATE_SHIPPED,
TOTAL_SALES = :TU.TOTAL_SALES
WHERE :TU.EMP_NAME = EMP_NAME
AND EMP_NAME = 'MIKE SMITH'

If you modify the UPDATE portion of the statement, be sure to use :TU to specify
ports.

What is the Difference between Delete ,Truncate and Drop?

DELETE

The DELETE command is used to remove rows from a table. A WHERE clause can be
used to only remove some rows. If no WHERE condition is specified, all rows will be
removed. After performing a DELETE operation you need to COMMIT or ROLLBACK
the transaction to make the change permanent or to undo it.

TRUNCATE

TRUNCATE removes all rows from a table. The operation cannot be rolled back. As
such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.

DROP

The DROP command removes a table from the database. All the tables' rows, indexes
and privileges will also be removed. The operation cannot be rolled back.
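For example (illustrative statements against a throwaway copy of the table):

DELETE FROM employees WHERE department_id = 10;  -- can still be rolled back
COMMIT;                                          -- makes the delete permanent

TRUNCATE TABLE employees;                        -- removes all rows, no rollback

DROP TABLE employees;                            -- removes the table itself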

Difference between Rowid and Rownum

ROWID
A globally unique identifier for a row in a database. It is created at the time the row is
inserted into a table, and destroyed when it is removed from a table. The format is
'BBBBBBBB.RRRR.FFFF', where BBBBBBBB is the block number, RRRR is the slot (row) number,
and FFFF is the file number.

ROWNUM

For each row returned by a query, the ROWNUM pseudocolumn returns a number

indicating the order in which Oracle selects the row from a table or set of joined
rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.

You can use ROWNUM to limit the number of rows returned by a query, as in this
example:

SELECT * FROM employees WHERE ROWNUM < 10;

What is the difference between a sub-query and a correlated sub-query?

A sub-query is executed once for the parent statement, whereas a correlated sub-query is
executed once for each row of the parent query.

Example:
Select deptno, ename, sal from emp a
where sal = (select max(sal) from emp where deptno = a.deptno);

Get dept wise max sal along with empname and emp no.

Select a.empname, a.empno, b.sal, b.deptno


From EMP a,
(Select max (sal) sal, deptno from EMP group by deptno) b
Where
a.sal=b.sal and
a.deptno=b.deptno

The query below transposes rows into columns.


select
emp_id,
max(decode(row_id,0,address))as address1,
max(decode(row_id,1,address)) as address2,
max(decode(row_id,2,address)) as address3
from (select emp_id,address,mod(rownum,3) row_id from temp order by emp_id )
group by emp_id

Another query:

select
emp_id,

max(decode(rank_id,1,address)) as add1,
max(decode(rank_id,2,address)) as add2,
max(decode(rank_id,3,address))as add3
from
(select emp_id,address,rank() over (partition by emp_id order by
emp_id,address )rank_id from temp )
group by
emp_id

Also, below is the logic for converting columns into rows without using the Normalizer
transformation:

1) The source will contain two columns, address and id.
2) Use a Sorter to arrange the rows in ascending order.
3) Then create an expression as shown in the screenshot (not reproduced here).
4) Use an Aggregator transformation and check group-by on port id only, as shown in the
screenshot (not reproduced here).

Rank query:

Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order by sal
desc) r from EMP);

Dense rank query:

The DENSE_RANK function works like the RANK function except that it assigns
consecutive ranks:

Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by
sal desc) r from emp);

Top 5 salaries by using rank:

Select empno, ename, sal,r from (select empno,ename,sal,dense_rank() over (order by sal
desc) r from emp) where r<=5;

Or

Select * from (select * from EMP order by sal desc) where rownum<=5;

2nd highest sal:

Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by
sal desc) r from EMP) where r=2;

Top sal:

Select * from EMP where sal= (select max (sal) from EMP);

Second highest sal

Select * from EMP where sal= (Select max (sal) from EMP where sal< (select max (sal)
from EMP));

Or
Select max (sal) from emp where sal < (select max (sal) from emp)

Remove duplicates in the table:

Delete from EMP where rowid not in (select max (rowid) from EMP group by deptno);

Get duplicate rows from the table:


Select deptno, count (*) from EMP group by deptno having count (*)>1;

SELECT column, group_function


FROM table
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY column];

The WHERE clause cannot be used to restrict groups. you use the
HAVING clause to restrict groups

Oracle set of statements:

DATA DEFINATION LANGUAGE :(DDL)


Create
Alter
Drop
DATA MANUPALATION LANGUAGE (DML)
Insert
Update
Delete
DATAQUARING LANGUAGE (DQL)
Select
DATA CONTROL LANGUAGE (DQL)
Grant
Revoke
TRANSACTION CONTROL LANGUAGE (TCL)
Commit
Rollback
Save point

There is a query for deleting your duplicate records:

delete from temp
where rowid not in (select max (rowid) from temp
                    group by empno);

How to find duplicate rows in the table


Select WIN_NR, CT_NR, SEQ_NR, count (*)
from intsmdm.V289U_SAP_CT_HA
group by WIN_NR, CT_NR, SEQ_NR
having count (*) > 1;

How to find second highest sal from the table


Select max (sal) from EMP where sal < (select max (sal) from emp)

If you want to have only duplicate records, then write the following query in the Source
Qualifier SQL Override:

Select distinct (deptno), deptname from dept_test a
where deptno in (select deptno from dept_test b
                 group by deptno
                 having count(1) > 1);

Hierarchical queries
Starting at the root, walk from the top down, and eliminate employee Higgins in the result, but
process the child rows.
SELECT department_id, employee_id, last_name, job_id, salary
FROM employees
WHERE last_name != 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;

Block in PL/SQL:

The basic unit in PL/SQL is called a block, which is made up of three parts: a
declarative part, an executable part, and an exception-handling part.

PL/SQL blocks can be compiled once and stored in executable form to increase
response time.

A stored procedure is a PL/SQL program that is stored in the database in compiled form.
A PL/SQL stored procedure that is implicitly started when an INSERT, UPDATE or
DELETE statement is issued against an associated table is called a trigger.

Difference between Procedure and Function?

A procedure or function is a schema object that logically groups a set of SQL and other
PL/SQL programming language statements together to perform a specific task.

A package is a group of related procedures and functions, together with the cursors and
variables they use,
Packages provide a method of encapsulating related procedures, functions, and associated
cursors and variables together as a unit in the database.

I would like to contribute whatever I know; according to me there are three main
differences between a procedure and a function.

1. We approach a procedure for some action; we approach a function for computing a value.

2. A procedure cannot be part of an expression, i.e. we cannot call a procedure from an
expression, whereas a function can be called there.

3. A function must return a value; a procedure may return none, one or many values
(through OUT parameters).

A function returns a value, and a function can be called in a SQL statement. No other
differences.

Indexes:

Bitmap indexes are most appropriate for columns having low distinct values—such as
GENDER, MARITAL_STATUS, and RELATION. This assumption is not
completely accurate, however. In reality, a bitmap index is always advisable for
systems in which data is not frequently updated by many concurrent systems. In
fact, as I'll demonstrate here, a bitmap index on a column with 100-percent unique
values (a column candidate for primary key) is as efficient as a B-tree index.

When to Create an Index

You should create an index if:

• A column contains a wide range of values

• A column contains a large number of null values

• One or more columns are frequently used together in a WHERE clause or a


join condition

• The table is large and most queries are expected to retrieve less than 2 to 4
percent of the rows

Datafiles Overview

A tablespace in an Oracle database consists of one or more physical datafiles. A datafile


can be associated with only one tablespace and only one database.

Tablespaces Overview

Oracle stores data logically in tablespaces and physically in datafiles associated with the
corresponding tablespace.

A database is divided into one or more logical storage units called tablespaces.
Tablespaces are divided into logical units of storage called segments.

Control File Contents

A control file contains information about the associated database that is required for
access by an instance, both at startup and during normal operation. Control file
information can be modified only by Oracle; no database administrator or user can edit a
control file.

What is incremental aggregation and how it is done?

Incremental loading is used to gradually synchronize the target data with the source data.

There are two techniques:

Refresh load – the existing target data is truncated and reloaded completely.
Incremental load – only the delta, i.e. the difference between source and target data, is
loaded at regular intervals. The timestamp of the previous delta load has to be maintained.

Incremental aggregation performs aggregation only on the incremented data, so, where the
requirements allow it, using incremental aggregation will definitely improve performance.
Keep this factor in mind while developing mappings.

Dimensional Model: A type of data modeling suited for data warehousing. In a
dimensional model, there are two types of tables: dimension tables and fact tables. A
dimension table records information on each dimension, and a fact table records all the
"facts", or measures.

Data modeling
There are three levels of data modeling. They are conceptual, logical, and physical. This section
will explain the difference among the three, the order with which each one is created, and how to
go from one level to the other.

Conceptual Data Model

Features of conceptual data model include:

• Includes the important entities and the relationships among them.


• No attribute is specified.
• No primary key is specified.

At this level, the data modeler attempts to identify the highest-level relationships among the
different entities.

Logical Data Model

Features of logical data model include:

• Includes all entities and relationships among them.


• All attributes for each entity are specified.
• The primary key for each entity specified.
• Foreign keys (keys identifying the relationship between different entities) are specified.
• Normalization occurs at this level.

At this level, the data modeler attempts to describe the data in as much detail as possible, without
regard to how they will be physically implemented in the database.

In data warehousing, it is common for the conceptual data model and the logical data model to be
combined into a single step (deliverable).

The steps for designing the logical data model are as follows:

1. Identify all entities.


2. Specify primary keys for all entities.

3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.

Physical Data Model

Features of physical data model include:

• Specification of all tables and columns.

• Foreign keys are used to identify relationships between tables.
• Denormalization may occur based on user requirements.
• Physical considerations may cause the physical data model to be quite different from the
logical data model.

At this level, the data modeler will specify how the logical data model will be realized in the
database schema.

The steps for physical data model design are as follows:

1. Convert entities into tables.


2. Convert relationships into foreign keys.
3. Convert attributes into columns.

http://www.learndatamodeling.com/dm_standard.htm

Modeling is an efficient and effective way to represent the organization’s needs; It


provides information in a graphical way to the members of an organization
to understand and communicate the business rules and processes. Business
Modeling and Data Modeling are the two important types of modeling.

The differences between a logical data model and physical data model is shown
below.

Logical vs Physical Data Modeling


Logical Data Model -> Physical Data Model
Represents business information and defines business rules -> Represents the physical implementation of the model in a database
Entity -> Table
Attribute -> Column
Primary Key -> Primary Key Constraint
Alternate Key -> Unique Constraint or Unique Index
Inversion Key Entry -> Non Unique Index
Rule -> Check Constraint, Default Value
Relationship -> Foreign Key
Definition -> Comment

Type 1 Slowly Changing Dimension

In Type 1 Slowly Changing Dimension, the new information simply overwrites the original
information. In other words, no history is kept.

In our example, recall we originally have the following table:

Customer Key Name State


1001 Christina Illinois

After Christina moved from Illinois to California, the new information replaces the new record, and
we have the following table:

Customer Key Name State


1001 Christina California

Advantages:

- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no
need to keep track of the old information.

Disadvantages:

- All history is lost. By applying this methodology, it is not possible to trace back in
history. For example, in this case, the company would not be able to know that
Christina lived in Illinois before.

Usage:

About 50% of the time.

When to use Type 1:

Type 1 slowly changing dimension should be used when it is not necessary for the data
warehouse to keep track of historical changes.
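
A minimal sketch of a Type 1 change in SQL (assuming the dimension table above is called
CUSTOMER_DIM):

UPDATE customer_dim
SET state = 'California'
WHERE customer_key = 1001;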

Type 2 Slowly Changing Dimension

In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new
information. Therefore, both the original and the new record will be present. The new record
gets its own primary key.

In our example, recall we originally have the following table:

Customer Key Name State
1001 Christina Illinois

After Christina moved from Illinois to California, we add the new information as a new row into the
table:

Customer Key Name State


1001 Christina Illinois
1005 Christina California

Advantages:

- This allows us to accurately keep all historical information.

Disadvantages:

- This will cause the size of the table to grow fast. In cases where the number of rows for the table
is very high to start with, storage and performance can become a concern.

- This necessarily complicates the ETL process.

Usage:

About 50% of the time.

When to use Type 2:

Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to
track historical changes.
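
A minimal sketch of a Type 2 change in SQL (assuming the same CUSTOMER_DIM table, with
1005 being the next surrogate key taken from a sequence):

INSERT INTO customer_dim (customer_key, name, state)
VALUES (1005, 'Christina', 'California');
-- the original row (customer_key 1001) is left untouched, preserving history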

Type 3 Slowly Changing Dimension

In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular
attribute of interest, one indicating the original value, and one indicating the current value. There
will also be a column that indicates when the current value becomes active.

In our example, recall we originally have the following table:

Customer Key Name State


1001 Christina Illinois

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:

• Customer Key
• Name
• Original State
• Current State
• Effective Date

After Christina moved from Illinois to California, the original information gets updated, and we
have the following table (assuming the effective date of change is January 15, 2003):

Customer Key Name Original State Current State Effective Date


1001 Christina Illinois California 15-JAN-2003

Advantages:

- This does not increase the size of the table, since new information is updated.

- This allows us to keep some part of history.

Disadvantages:

- Type 3 will not be able to keep all history where an attribute is changed more than once. For
example, if Christina later moves to Texas on December 15, 2003, the California information will
be lost.

Usage:

Type 3 is rarely used in actual practice.

When to use Type 3:

Type III slowly changing dimension should only be used when it is necessary for the data
warehouse to track historical changes, and when such changes will only occur for a finite number
of time.

Product

Product ID (PK) | Effective DateTime (PK) | Year | Product Name | Product Price | Expiry DateTime
1 | 01-01-2004 12.00AM | 2004 | Product1 | $150 | 12-31-2004 11.59PM
1 | 01-01-2005 12.00AM | 2005 | Product1 | $250 |

Type 3: Creating new fields.


In this Type 3, the latest update to the changed values can be seen. Example mentioned below
illustrates how to add new columns and keep track of the changes. From that, we are able to
see the current price and the previous price of the product, Product1.

Product

Product ID (PK) | Current Year | Product Name | Current Product Price | Old Product Price | Old Year
1 | 2005 | Product1 | $250 | $150 | 2004

The problem with the Type 3 approach is that, over the years, if the product price continuously
changes, then the complete history may not be stored; only the latest change will be stored.
For example, in year 2006, if Product1's price changes to $350, then we would not be able
to see the complete history of 2004 prices, since the old values would have been updated with
2005 product information.

Product

Product ID (PK) | Year | Product Name | Product Price | Old Product Price | Old Year
1 | 2006 | Product1 | $350 | $250 | 2005
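
A minimal sketch of this Type 3 change in SQL (assuming a PRODUCT_DIM table with the
columns PRODUCT_ID, PRODUCT_YEAR, PRODUCT_PRICE, OLD_PRODUCT_PRICE and OLD_YEAR):

UPDATE product_dim
SET old_product_price = product_price,
    old_year          = product_year,
    product_price     = 350,
    product_year      = 2006
WHERE product_id = 1;
-- the previous 'current' values slide into the OLD_ columns; anything older is lost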

Star Schemas

A star schema is a database design where there is one central table, the fact table, that
participates in many one-to-many relationships with dimension tables.

• the fact table contains measures: sales quantity, cost dollar amount, sales dollar
amount, gross profit dollar amount
• the dimensions are date, product, store, promotion
• the dimensions are said to describe the measurements appearing in the fact table

The star schema is the simplest data warehouse schema. It is called a star schema
because the diagram resembles a star, with points radiating from a center. The center of
the star consists of one or more fact tables and the points of the star are the dimension
tables, as shown in Figure 2-1.

Figure 2-1 Star Schema


The most natural way to model a data warehouse is as a star schema; only one join
establishes the relationship between the fact table and any one of the dimension tables.

A star schema optimizes performance by keeping queries simple and providing fast
response time. All the information about each level is stored in one row.
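
For example, a typical star-schema query joins the fact table to each dimension with a single
join (the table and column names here are illustrative only):

SELECT d.calendar_month, p.product_name, SUM(f.sales_amount)
FROM   sales_fact f, date_dim d, product_dim p
WHERE  f.date_key    = d.date_key
AND    f.product_key = p.product_key
GROUP BY d.calendar_month, p.product_name;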

Snow flake Schema

If a dimension is normalized, we say it is a snowflaked design.

Consider the Product dimension, and suppose we have the following attribute hierarchy:

SKU -> brand -> category -> department

• for a given SKU, there is one brand


• for a given brand, there is one category
• for a given category, there is one department

• a department has many brands


• a category has many brands
• a brand has many products

A snowflake schema is a star schema with normalized dimensions in a tree structure.

A snowflake schema is a more complex data warehouse model than a star schema, and is a
type of star schema. It is called a snowflake schema because the diagram of the schema
resembles a snowflake.

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the


dimension data has been grouped into multiple tables instead of one large table. For
example, a product dimension table in a star schema might be normalized into a
products table, a product_category table, and a product_manufacturer table in a
snowflake schema. While this saves space, it increases the number of dimension tables
and requires more foreign key joins. The result is more complex queries and reduced
query performance. Figure 17-3 presents a graphical representation of a snowflake
schema.

Figure 17-3 Snowflake Schema
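
With a snowflaked Product dimension, the same kind of question needs an extra join for each
level of the hierarchy (again, table and column names are illustrative only):

SELECT c.category_name, SUM(f.sales_amount)
FROM   sales_fact f, products p, product_category c
WHERE  f.product_key  = p.product_key
AND    p.category_key = c.category_key
GROUP BY c.category_name;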

Below is the simple data model

Below is the sq for project dim

1. ACW – Logical Design

[Entity-relationship diagram: the staging tables ACW_DF_FEES_STG, ACW_PCBA_APPROVAL_STG and
ACW_DF_APPROVAL_STG feed the fact tables ACW_DF_FEES_F, ACW_PCBA_APPROVAL_F and
ACW_DF_APPROVAL_F, which reference the dimensions ACW_ORGANIZATION_D, ACW_USERS_D,
ACW_PRODUCTS_D, ACW_PART_TO_PID_D, ACW_SUPPLY_CHANNEL_D and EDW_TIME_HIERARCHY
through surrogate keys.]
2. ACW – Physical Design

[Physical schema diagram for the same tables, showing the Oracle datatypes (NUMBER, CHAR,
VARCHAR2, DATE, FLOAT), the primary key columns and the audit columns D_CREATED_BY,
D_CREATION_DATE, D_LAST_UPDATED_BY and D_LAST_UPDATE_DATE on each table.]

3. ACW - Data Flow Diagram

[Data flow diagram: source data from SJPROD, ODSPROD and EDWPROD (EDW_TIME_HIERARCHY for
the time dimension) is extracted by the ETL into the ACW staging tables (ACW_PCBA_APPROVAL_STG,
ACW_DF_APPROVAL_STG, ACW_DF_FEES_STG) on ESMPRD, loaded into the fact tables
(ACW_PCBA_APPROVAL_F, ACW_DF_APPROVAL_F, ACW_DF_FEES_F) and the dimension tables
(ACW_SUPPLY_CHANNEL_D, ACW_ORGANIZATIONS_D, EDW_TIME_HIERARCHY_D, ACW_PART_TO_PID_D,
ACW_PRODUCTS_D, ACW_USERS_D), and is reported on through ACW BO Reports.]

Logic for getting header for the target flat file

Implementation for Incremental Load
Method -I

Logic in the mapping variable is

Logic in the SQ is
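
A typical SQL override in the Source Qualifier for this kind of incremental extract (a sketch,
assuming a mapping variable $$LAST_EXTRACT_DATE and an audit column LAST_UPDATE_DATE on the
source table):

SELECT *
FROM   sales_transaction
WHERE  last_update_date > TO_DATE('$$LAST_EXTRACT_DATE', 'MM/DD/YYYY HH24:MI:SS')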

The logic in the expression to set the max value for the mapping variable is below:
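
A sketch of such a port (assuming the variable $$LAST_EXTRACT_DATE is declared with Max
aggregation):

SETMAXVARIABLE($$LAST_EXTRACT_DATE, LAST_UPDATE_DATE)

This is evaluated for every row, so after the session completes the variable holds the highest
LAST_UPDATE_DATE processed, which the next run uses as its starting point.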

Logic in the update strategy is below
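
A sketch of the flagging expression typically used inside such an Update Strategy (assuming a
lookup on the target returns LKP_TRANSACTION_KEY as NULL for new rows):

IIF(ISNULL(LKP_TRANSACTION_KEY), DD_INSERT, DD_UPDATE)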

Method -II

Updating parameter File

Logic in the expression

Main mapping

Sql override in SQ Transformation

Workflow Design

Parameter file

It is a text file; below is the format for a parameter file. We place this file on the Unix
box where the Informatica server is installed.

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_AP
O_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_AP
O_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_BELGIUM]
$DBConnection_Sourcet=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495

Transformation…
Mapping: Mappings are the highest-level object in the Informatica object hierarchy,
containing all objects necessary to support the movement of data.
Session: A session is a set of instructions that tells informatica Server how to move data
from sources to targets.
WorkFlow: A workflow is a set of instructions that tells Informatica Server how to
execute tasks such as sessions, email notifications. In a workflow multiple sessions can
be included to run in parallel or sequential manner.
Source Definition: The Source Definition is used to logically represent an application
database table.
Target Definition: The Target Definition is used to logically represent a database table
or file in the Data Warehouse / Data Mart.
Aggregator: The Aggregator transformation is used to perform calculations on additive
data. It can reduce session performance because it caches and groups the data.
Expression: The Expression transformation is used to evaluate, create or modify data and
to set and create variables.
Filter: The Filter transformation is used as a True/False gateway for passing data through
a given path in the mapping. It should be used as early as possible to stop unwanted data
from passing further.
Joiner: The Joiner transformation is used to join two related heterogeneous data sources
residing in different physical locations or file systems
Lookup: The Lookup transformation is used to retrieve a value from database and apply
the retrieved values against the values passed in from another transformation.
Normalizer: The Normalizer transformation is used to transform structured data (such as
COBOL or flat files) into relational data
Rank: The Rank transformation is used to order data within certain data set so that only
the top or bottom n records are retrieved

Sequence Generator: The Sequence Generator transformation is used to generate
numeric key values in sequential order.
Source Qualifier: The Source Qualifier transformation is used to describe in SQL the
method by which data is to be retrieved from a source application system.
Stored Procedure: The Stored Procedure transformation is used to execute externally
stored database procedures and functions
Update Strategy: The Update Strategy transformation is used to indicate the DML
statement.
Input Transformation: Input transformations are used to create a logical interface to a
mapplet in order to allow data to pass into the mapplet.
Output Transformation: Output transformations are used to create a logical interface
from a mapplet in order to allow data to pass out of a mapplet.

Advantages of Teradata:
1. Can store billions of rows.
2. Parallel processing makes Teradata faster than other RDBMSs.
3. Can be accessed by network-attached and channel-attached systems.
4. Supports the requirements of diverse clients.
5. Automatically detects and recovers from hardware failures.
6. Allows expansion without sacrificing performance.

Introduction
This document is intended to provide a uniform approach for developers in
building Informatica mappings and sessions.

Informatica Overview
Informatica is a powerful Extraction, Transformation, and Loading tool and has been
deployed at GE Medical Systems for data warehouse development in the Business
Intelligence Team. Informatica comes with the following clients to perform various tasks.

 Designer – used to develop transformations/mappings


 Workflow Manager / Workflow Monitor replace the Server Manager - used to
create sessions / workflows/ worklets to run, schedule, and monitor mappings for
data movement
 Repository Manager – used to maintain folders, users, permissions, locks, and
repositories.
 Server – the “workhorse” of the domain. Informatica Server is the component
responsible for the actual work of moving data according to the mappings
developed and placed into operation. It contains several distinct parts such as the
Load Manager, Data Transformation Manager, Reader and Writer.
 Repository Server - Informatica client tools and Informatica Server connect to
the repository database over the network through the Repository Server.

Informatica Architecture at GE Medical Systems

DEVELOPMENT ENVIRONMENT
Informatica Server: GEMSDW1 (3.231.200.74)
Source data / development database: DWDEV
Informatica Development Repository: IFDEV (Americas Development Repository)

TESTING ENVIRONMENT
Informatica Server: GEMSDW1 (3.231.200.74)
Source data / test database: DWTEST
Informatica Development Repository: IFDEV (Americas Development Repository)

PRODUCTION ENVIRONMENT
Informatica Server: GEMSDW2 (3.231.200.69)
Source data / stage database: DWSTAGE
Informatica Production Repository: IFMAR (Americas Production Repository), with mirroring
Reporting database: FIN2

General Development Guidelines


The starting point of the development is the logical model created by the Data Architect.
This logical model forms the foundation for metadata, which will continuously be
maintained throughout the Data Warehouse Development Life Cycle (DWDLC). The
logical model is formed from the requirements of the project. At the completion of the
logical model, technical documentation is produced defining the sources, targets, requisite
business rule transformations, mappings and filters. This documentation serves as the basis
for the creation of the Extraction, Transformation and Loading tools to actually manipulate
the data from the application sources into the Data Warehouse/Data Mart.

To start development on any data mart you should have the following things set up by the
Informatica Load Administrator
 Informatica Folder. The development team in consultation with the BI
Support Group can decide a three-letter code for the project, which would be
used to create the informatica folder as well as Unix directory structure.
 Informatica Userids for the developers
 Unix directory structure for the data mart.
 A schema XXXLOAD on DWDEV database.

The best way to get the informatica set-up done is to put a request in the following
website.
http://uswaudom02medge.med.ge.com/GEDSS/prod/BIDevelopmentSupport.nsf/

Transformation Specifications
Before developing the mappings you need to prepare the specifications document for the
mappings you need to develop. A good template is placed in the templates folder
(\\3.231.100.33\GEDSS_All\QuickPlace_Home\Tools\Informatica\Installation_&_Development\Templates
). You can use your own template as long as it has as much detail or more than that which
is in this template.

While estimating the time required to develop mappings the thumb rule is as follows.
 Simple Mapping – 1 Person Day
 Medium Complexity Mapping – 3 Person Days
 Complex Mapping – 5 Person Days.
Usually the mapping for the fact table is most complex and should be allotted as much
time for development as possible.

Data Loading from Flat Files


It’s an accepted best practice to always load a flat file into a staging table before any
transformations are done on the data in the flat file.
Always use LTRIM, RTRIM functions on string columns before loading data into a stage
table.
You can also use UPPER function on string columns but before using it you need to
ensure that the data is not case sensitive (e.g. ABC is different from Abc)
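
For example, a cleanup port in an Expression transformation (port and column names are
illustrative) could be:

OUT_CUSTOMER_NAME = LTRIM(RTRIM(UPPER(CUSTOMER_NAME)))

Drop the UPPER call if the data is case sensitive.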
If you are loading data from a delimited file then make sure the delimiter is not a
character which could appear in the data itself. Avoid using comma-separated files. Tilde
(~) is a good delimiter to use.
We have a screen door program, which can check for common errors in flat files. You
should work with the BI support team to set up the screen door program to check the flat
files from which your data mart extracts data. For more information you can go to this
link \\3.231.100.33\GEDSS_All\QuickPlace_Home\Projects\DataQA

Data Loading from Database tables


Mappings which run on a regular basis should be designed in such a way that you query
just that data from the source table which has changed since the last time you extracted
data from the source table. If you are extracting data from more than one table from the
same database by joining them then you can have multiple source definitions and a single
source qualifier instead of having joiner transformation to join them as shown in the
figure below. You can put the join conditions in the source qualifier. If the tables exist in
different databases you can make use of synonyms for querying them from the same
database

[Figure: use a single Source Qualifier SQ1 joining T1, T2 and T3, instead of three Source
Qualifiers feeding a Joiner transformation.]
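
A sketch of the corresponding join in the single Source Qualifier's SQL override (table names
are illustrative only):

SELECT a.order_id, a.order_date, b.customer_name, c.region_name
FROM   orders a, customers b, regions c
WHERE  a.customer_id = b.customer_id
AND    b.region_id   = c.region_id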

Data Loading from tables in Oracle Apps
When you try to import the source definition from a table in Oracle Apps using source
analyser in designer you might face problems, as informatica cannot open up so many
schemas at the same time. The best way to import the source definition of a table in
Oracle Apps is to take the table creation script you want to import and create it in a test
schema and import the definition from there.

Commenting
Any experienced developer would agree to the point that a good piece of code is not just
a script, which runs efficiently and does what it is required to do, but also one that is
commented properly. So in keeping with good coding practices, informatica mappings,
sessions and other objects involved in the mappings need to be commented properly as
well. This not only helps in the production support team to debug the mappings in case
they throw errors while running in production but also this way we are storing the
maximum metadata in the informatica repository which might be useful when we build a
central metadata repository in the near future.
 Each folder should have the Project name and Project leader name in the
comments box.
 Each mapping should have a comment, which tells what the mapping does at very
high level
 Each transformation should have a comment in the description box, which tells
the purpose of the transformation.
 If the transformation is taking care of a business rule then that business rule
should be mentioned in the comment.
 Each port should have its purpose documented in the description box.

Log files
A session log is created for each session that runs. The verbosity of the logs can be
tailored to specific performance or troubleshooting needs. By default, the session log
name is the name of the mapping with the .log extension. This should not normally be
overridden. The Session Wizard has two options for modifying the session name, by
appending either the ordinal (if saving multiple sessions is enabled) or the time (if saving
session by timestamp is enabled). Be aware that when saving session logs by timestamp,
Informatica does not perform any deletion or archiving of the session logs.
Whenever using the VERBOSE DATA option of informatica logging use a condition to
load just a few records rather than doing a full load. This conserves the space on the Unix
Box. Also you should remove the verbose option as soon as you are done with the
troubleshooting. You should configure your informatica sessions to create the log files
at /ftp/oracle/xxx/logs/ directory and the bad files in /ftp/oracle/xxx/errors/ directory
where xxx stands for the three-letter code of the data mart.

Triggering Sessions and Batches
The standard methodology to schedule informatica sessions and batches is through
Cronacle scripts.

Failure Notification
Once in production your sessions and batches need to send out notification when then fail
to the load administrator. You can do this by calling the script
/i01/proc/admin/intimate_load_failure.sh xxx in the failure post session commands. The
script intimate_load_failure.sh takes the three letter data mart code as argument.

Naming Conventions and usage of Transformations
Quick Reference
Object Type Syntax
Folder XXX_<Data Mart Name>
Mapping m_fXY_ZZZ_<Target Table Name>_x.x
Session s_fXY_ZZZ_<Target Table Name>_x.x
Batch b_<Meaningful name representing the sessions inside>
Source Definition <Source Table Name>
Target Definition <Target Table Name>
Aggregator AGG_<Purpose>
Expression EXP_<Purpose>
Filter FLT_<Purpose>
Joiner JNR_<Names of Joined Tables>
Lookup LKP_<Lookup Table Name>
Normalizer Norm_<Source Name>
Rank RNK_<Purpose>
Router RTR_<Purpose>
Sequence Generator SEQ_<Target Column Name>
Source Qualifier SQ_<Source Table Name>
Stored Procedure STP_<Database Name>_<Procedure Name>
Update Strategy UPD_<Target Table Name>_xxx
Mapplet MPP_<Purpose>
Input Transformation INP_<Description of Data being funneled in>
Output Tranformation OUT_<Description of Data being funneled out>
Database Connections XXX_<Database Name>_<Schema Name>

General Conventions
The name of the transformation should be as self-explanatory as possible regarding its
purpose in the mapping.
Wherever possible use short forms of long words.
Use change of case as word separators instead of under scores to conserve characters.
E.g. FLT_TransAmtGreaterThan0 instead of FLT_Trans_Amt_Greater_Than_0.
Preferably use all UPPER CASE letters in naming the transformations.

Folder

XXX_<Data Mart Name>

XXX stands for the three-letter code for that specific data mart.
Make sure you put in the project leader name and a brief note on what the data mart is all
about in the comments box of the folder.

Example
BIO_Biomed_Datamart

Mapping
Mapping is the Informatica object which contains a set of transformations, including source
and target. It looks like a pipeline.
m_fXY_ZZZ_<Target Table Name or Meaningful name>_x.x
f= frequency
d=daily,
w=weekly,
m=monthly,
h=hourly
X = L for Load or U for Update
Y = S for Stage or P for Production
ZZZ = Three-letter data mart code.
x.x = version no. eg. 1.1

Example
Name of a mapping which just inserts data into the stage table Contract_Coverages on a daily
basis.
m_dLS_BIO_Contract_Coverages_1.1

Session
A session is a set of instructions that tells informatica Server how to move data from
sources to targets.
s_fXY_ZZZ_<Target Table Name or Meaningful name>_x.x
f = frequency
d=daily,
w=weekly,
m=monthly,
h=hourly
X = L for Load or U for Update
Y = S for Stage or P for Production
ZZZ = Three-letter data mart code.
x.x = version no. eg. 1.1

Example
Name of a session which just inserts data into the stage table Contract_Coverages on a daily
basis.
s_dLS_BIO_Contract_Coverages_1.1


WorkFlow

A workflow is a set of instructions that tells Informatica Server how to execute tasks such
as sessions, email notifications and commands. In a workflow multiple sessions can be
included to run in parallel or sequential manner.
Wkf_<Meaningful name representing the sessions inside>

Example
Name of a workflow which contains sessions which run daily to load US sales data
wkf_US_SALES_DAILY_LOAD

Source Definition
The Source Definition is used to logically represent an application database table. The
Source Definition is associated to a single table or file and is created through the Import
Source process. It is usually the left most object in a mapping.
Naming convention is as follows.
<Source Table Name>
Source Definition could also be named as follows
<Source File Name>
In case there are multiple files then use the common part of all the files
OR
<Database name>_<Table Name>
OR
XY_<Db Name>_<Table Name>
XY = TB if it’s a table
FF if it’s a flat file
CB if it’s a COBOL file
Example
Name of source table from dwdev database
TB_DWDEV_TRANSACTION, FF_OPERATING_PLAN

Target Definition
The Target Definition is used to logically represent a database table or file in the Data
Warehouse / Data Mart. The Target Definition is associated to a single table and is
created in the Warehouse Designer. The target transformation is usually the right most
transformation in a mapping
Naming convention is as follows.
<Target Table Name>
Target Definition could also be named as follows
<Database name>_<Table Name>
OR
XY_<Db Name>_<Table Name>
XY = TB if it’s a table
FF if it’s a flat file
CB if it’s a COBOL file
Example
Name of target table in dwdev database

TB_DWDEV_TRANSACTION, FF_OPERATING_PLAN

Aggregator
The Aggregator transformation is used to perform Aggregate calculations on group basis.

AGG_<Purpose>
Example
Name of aggregator which aggregates to transaction amount
AGG_SUM_OF_TRANS_AMT
Name of aggregator which is used to find distinct records
AGG_DISTINCT_ORDERS

Expression
The Expression transformation is used to perform calculations on a row-by-row basis, and
also to modify data (for example converting a string to an integer, or concatenating two
columns) and to set and create variables:
1. Variable names should begin with the letters “v_’ followed by the datatype and
name.
o Character data – v_char/v_vchar/v_vchar2/v_text
o Numeric data – v_num/v_float/v_dec/v_real
o Integer data – v_int/v_sint
o Date data – v_date
o Sequential data – v_seq
2. Manipulations of string should be indicated in the name of the new port. For
example, conc_CustomerName.
3. Manipulations of numeric data should be indicated in the name of the new port.
For example, sum_AllTaxes.
Naming convention of the transformation itself is as follows.
EXP_<Purpose>
Example
Name of expression which is used to trim columns
EXP_TRIM_COLS
Name of exression which is used to decode geography identifiers to geography descriptions
EXP_DECODE_GEOG_ID

Filter
The Filter transformation is used as a True/False gateway for passing data through a
given path in the mapping. Filters are almost always used in tandem to provide a path for
both possibilities. The Filter transformation should be used as early as possible in a
mapping in order to preserve performance.
Naming convention is as follows.
FLT_<Purpose>
Filters could also be named as follows
FLT_<Column in Condition>
Example
Name of filter which filters out records which are already existing in the target table.
FLT_STOP_OLD_RECS

Name of filter which filters out records with geography identifiers less than zero
FLT_GEO_ID or FLT_GeoidGreaterThan0

Joiner
The Joiner transformation is used to join two related heterogeneous data sources residing
in different physical locations or file systems. One of the most common uses of the joiner
is to join data from a relational table to a flat file etc. The sources or tables joined should
be annotated in the Description field of the Transformation tab for the Joiner
transformation.
JNR_<Names of Joined Tables>
Example
Name of joiner which joins TRANSACTION and GEOGRAPHY table
JNR_TRANX

Lookup
The Lookup transformation is used to retrieve a value(s) from database and apply the
retrieved value(s) against a value(s) passed in from another transformation. The existence
of the retrieved value(s) can then be used in other transformations to satisfy a condition.
Lookup transformations can be used in either a connected or unconnected state. Where
possible, the unconnected state should be used to enhance performance. However, it must
be noted that only one return value can be passed out of an unconnected lookup.
The ports needed for the Lookup should be suffixed with the letters “_in” for the input
ports and “_out” for the output ports. Port data types should not normally be modified in
a Lookup transformation, but instead should be modified in a prior transformation.
 Often lookups fail and developers are left to wonder why. The datatype of a port
is absolutely essential in validating data through a lookup. For example, a
decimal(19,2) and a money datatype will not match.
 When overriding the Lookup SQL, always ensure to put a valid Order By or
Order By 1 statement in the SQL. This will cause the database to perform the
ordering rather than the Informatica server as it builds the cache.
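A sketch of such an override (assuming a lookup on a CUSTOMER_D table); the trailing double
dash comments out the ORDER BY that Informatica appends to the override:

SELECT customer_key, customer_id FROM customer_d ORDER BY customer_id --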
Naming convention is as follows.
LKP_<Lookup Table Name>
Example
Name of lookup transformation looking up on transaction table would be
LKP_TRANSACTION

Normalizer
The Normalizer transformation is used to transform structured data (such as COBOL or
flat files) into relational data. The Normalizer works by having the file header and detail
information identified by the developer in the transformation, and then looping through
the structured file according to the transformation definition.
Norm_<Source Name>
Example
Name of Normalizer normalizing data in OS_TRANS_DAILY file
Norm_OS_TRANS_DAILY

Rank
The Rank transformation is used to order data within certain data set so that only the top
or bottom n records are retrieved. For example, you can order Stores by Sales Quarterly
and then filter only the top 10 Store records. The reference to the business rule governing
the ranking should be annotated in the Description field of the Transformation tab for the
Rank transformation.
RNK_<Purpose>

Example
Name of Rank which picks top 10 Customers by Sales Amounts.
RNK_TopTenCustbySales

Router
RTR_<Purpose>

Example
Name of which routes data based on the value of Geography Identifier
RTR_GeoidGreaterThan0
OR
RTR_GEO_ID

Sequence Generator
The Sequence Generator transformation is used to generate numeric key values in
sequential order. This is normally done to produce surrogate primary keys etc. It has been
observed that reusable sequence generators don’t work as efficiently as stand alone
sequence generators. To overcome this there are two options.
1) Use the procedure described in the appendix A of this document.
2) Use a trigger on the target table to populate the primary key automatically when a
record is inserted.
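
For option 2, a minimal trigger sketch (assuming a TRANSACTION table with surrogate key
TRANSACTION_KEY and an Oracle sequence SEQ_TRANSACTION):

CREATE OR REPLACE TRIGGER trg_transaction_pk
BEFORE INSERT ON transaction
FOR EACH ROW
BEGIN
  -- populate the surrogate key from the sequence when a record is inserted
  SELECT seq_transaction.NEXTVAL INTO :NEW.transaction_key FROM dual;
END;
/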

SEQ_<Target Column Name>

Example
Name of sequence generator feeding primary key column to transaction table
SEQ_TRANSACTION

Source Qualifier
The Source Qualifier transformation is used to describe in SQL (or in the native script of
the DBMS platform, e.g. SQL for Oracle) the method by which data is to be retrieved
from a source application system. The Source Qualifier describes any joins, join types,
order or group clauses, and any filters of the data.
Care should be exercised in the use of filters in the Source Qualifier or in overriding the
default SQL or native script. The amount of data can be greatly affected using this option,
such that a mapping can become invalid. Use this option only when it is known that the
data excluded will not be needed in the mapping.
Naming convention is as follows.

SQ_<Source Table Name>

Example
Name of source qualifier of Transaction table
SQ_TRANSACTION

Stored Procedure
The Stored Procedure transformation is used to execute externally stored database
procedures and functions. The transformation can execute any required functionality as
needed, from truncating a table to complex business logic. Avoid using stored procedures as
far as possible, as this makes the mappings difficult to debug and also reduces readability
of the code. Informatica doesn't have the LIKE operator, but you can use a stored
procedure which performs the LIKE test and sends out a flag. Similarly, any operator or
function which is not available in Informatica but is available in the database server can be
used by means of a small stored procedure. You should resist the temptation of putting
all the logic in a stored procedure.
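
A minimal sketch of such a helper (names are illustrative) that performs the LIKE test and
sends out a flag:

CREATE OR REPLACE FUNCTION fn_like_flag (p_value IN VARCHAR2, p_pattern IN VARCHAR2)
RETURN NUMBER
IS
BEGIN
  -- return 1 if the value matches the pattern, 0 otherwise
  IF p_value LIKE p_pattern THEN
    RETURN 1;
  ELSE
    RETURN 0;
  END IF;
END;
/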
Naming convention is as follows.
STP_<Database Name>_<Procedure Name>

Example
Name of stored procedure to calculate commission in dwdev database
STP_DWDEV_Calc_Commission

Update Strategy
The Update Strategy transformation is used to indicate the type of data modification
(DML) that will occur to a table in the database. The transformation can provide
INSERT, UPDATE, or DELETE functionality to the data. As far as possible don’t use
the REJECT option of Update Strategy as details of the rejected records are entered into
the log file by informatica and hence this may lead to the creation of a very big log file.
Naming convention is as follows.
UPD_<Target Table Name>_xxx
xxx = _ins for INSERT
_dlt for DELETE
_upd for UPDATE
_dyn – dynamic (the strategy type is decided by an algorithm inside the
update strategy transformation
When using an Update Strategy transformation, do not allow the numeric representation
of the strategy to remain in the expression. Instead, replace the numeric with the
corresponding constant:
DD_INSERT (0)
DD_UPDATE (1)
DD_DELETE (2)
DD_REJECT (3)

Example
Name of update strategy which updates TRANSACTION table
UPD_TRANSACTION_upd

Mapplet
Mapplets are a way of capturing complex transformation logic and storing the logic for
reuse. It may also be designed to pre-configure transformations that are redundant, thus
saving development time. Mapplets usually contain several transformations configured to
meet a specific transformation need.
In order for mapplets to be reusable, input and output transformation ports are required.
These ports provide a logical interface from a mapping to the mapplet. As with all
interface designs, mapplets require careful design to ensure their maximum efficiency
and reusability.
All transformations contained within the mapplet should be named in accordance with the
Transformation Naming Convention listed above. The exception is that if the target data
mart name is required, it should not be included unless the mapplet is specific to a single
data mart project. If the mapplet is specific to a data mart project, make sure it is
documented as such.
It is important to ensure the Description field of the mapplet is completed. Additionally,
the functionality and reuse of the mapplet should be defined as well.

MPP_<Purpose>

Example
Name of mapplet, which splits monthly estimates to weekly estimates
MPP_SPLIT_ESTIMATES

Input Tranformation ( Mapplet Only)


Input transformations are used to create a logical interface to a mapplet in order to allow
data to pass into the mapplet. This interface may represent the next step in a mapping
from the output port side of any transformation.
INP_<Description of Data being funnelled in>
Example
INP_APJournalEntries

Output Tranformation ( Mapplet Only)


Output transformations are used to create a logical interface from a mapplet in order to
allow data to pass out of a mapplet. This interface may represent the next step in a
mapping to the input port side of any transformation.
OUT_<Description of Data being funnelled out>
Example
OUT_ValidJournalEntries

Database Connections
When creating database connections in server manager to access source databases and
target database you should follow the following naming convention to avoid confusion
and make production migration easy.
XXX_<Database Name>_<Schema_Name> where XXX stands for the three-letter
datamart code.
Example

Database connection for cfdload schema on dwdev database for ORS datamart sessions would be
ORS_dwdev_cfdload

Version Control
The version control feature provided by Informatica is not mapping specific but folder
specific. You cannot version individual mappings separately. You need to save the whole
contents of the folder as a different version whenever you want to change the version of
a single mapping. Hence we have proposed to do version control through PVCS. You
need to have the following structure set up in PVCS before you start development.
PVCS
 |-- Project Name
      |-- Informatica_Folder_Name_1
      |-- Informatica_Folder_Name_2
           |-- Mappings
           |    |-- Mapping1
           |    |-- Mapping2
           |-- Sessions
                |-- Session1
                |-- Session2

You can start using PVCS right from day one when development begins. This way PVCS
can serve as the central repository for all scripts, including Informatica scripts, which will
enable the developers to access the production scripts anytime. The name of the mapping
should reflect the version number; refer to the naming conventions for mappings. The first
cut mappings should be named with the suffix "_1.0"; next, whenever you want to make
changes, you should first make a copy of the mapping and increment the version suffix.

Testing Guidelines
Testing a New Data mart
Get a fresh schema XXXLOAD created in the DWTEST database. Test your mappings
by loading data into XXXLOAD schema on DWTEST database. You can do a partial
load or a full load depending on the amount of data you have. This schema would later
serve the purpose of testing out any changes you make to the data mart once it’s moved
into production. For testing the sessions and mappings you can use the template in the
templates folder. You can use your own improvised template in case this doesn’t suffice
your requirement.

Testing changes to a data mart in production
First develop the changes in the Informatica Development Repository and test them out
by loading data into XXXLOAD schema on DWDEV database. Next make sure the
schema XXXLOAD in DWTEST database has exactly the same structure and data as in
XXXLOAD on DWSTAGE. Now test out all your changes by loading data into
XXXLOAD schema on DWTEST database. After you are satisfied with the test results
you can move the changes to production. Make sure you follow the same process to move
your changes from DWDEV to DWTEST as you would follow to move the changes to
DWSTAGE.

Production Migration
You should first go through the BI change control procedure described in the
document at the following link
\\uswaufs03medge.med.ge.com\GEDSS_All\QuickPlace_Home\Processes\Change_Contr
ol

Moving a New Data Mart To Production


The following documentation needs to be done before you can move the Informatica
scripts of a new data mart to production. You can find all the template at the
following link
\\3.231.100.33\GEDSS_All\QuickPlace_Home\Tools\Informatica\Installation_&_Development\Templates

1) Production Migration Document for Informatica (MD120). This is in addition to


the one you fill out for moving the Oracle Scripts of the new data mart to production.
2) Load Process Document. This document should explain through diagrams and text
the whole load process.
3) Some data marts might need more documents for adequately documenting the
loads and this can be discussed with the BI Support team on what all documents
need to be created.
You need to place all these documents on the file server also known as the Quick
place for the BI Support Team to review.

Moving a Change to Production


The following documentation needs to be completed before you can move a change to Informatica scripts that are already in production.
1) Production Migration Document for Informatica (MD120). This document needs to be updated with the change.
2) Change Document. This document needs to be filled out and placed in the QuickPlace under the support folder of your project name. Rename the change document file to CC_99999.doc, where 99999 stands for the global change id given by the change control application. Make sure you analyze the impact of the change you are making and document it in the change document. The impact could be changes in load time, load strategy, etc. If possible, attach an email from the user confirming that the user has validated the change in the test environment.

Performance Tuning
The goal of performance tuning is to optimize session performance by eliminating
performance bottlenecks. To tune the performance of a session, first you identify a
performance bottleneck, eliminate it, and then identify the next performance
bottleneck until you are satisfied with the session performance. You can use the test
load option to run sessions when you tune session performance.
The most common performance bottleneck occurs when the Informatica Server writes
to a target database. You can identify performance bottlenecks by the following
methods:
 Running test sessions. You can configure a test session to read from a flat
file source or to write to a flat file target to identify source and target
bottlenecks.
 Studying performance details. You can create a set of information called
performance details to identify session bottlenecks. Performance details
provide information such as buffer input and output efficiency.
 Monitoring system performance. You can use system-monitoring tools to
view percent CPU usage, I/O waits, and paging to identify system bottlenecks.
Once you determine the location of a performance bottleneck, you can eliminate it by following these guidelines:
 Eliminate source and target database bottlenecks. Have the database
administrator optimize database performance by optimizing the query,
increasing the database network packet size, or configuring index and key
constraints.
 Eliminate mapping bottlenecks. Fine tune the pipeline logic and
transformation settings and options in mappings to eliminate mapping
bottlenecks.
 Eliminate session bottlenecks. You can optimize the session strategy and use
performance details to help tune session configuration.
 Eliminate system bottlenecks. Have the system administrator analyze
information from system monitoring tools and improve CPU and network
performance.
If you tune all the bottlenecks above, you can further optimize session performance
by partitioning the session. Adding partitions can improve performance by utilizing
more of the system hardware while processing the session.
Because determining the best way to improve performance can be complex, change
only one variable at a time, and time the session both before and after the change. If
session performance does not improve, you might want to return to your original
configurations.
For more information check out the Informatica Help from any of the three
informatica client tools.

Performance Tips
Suppose I have to load 40 lakh (4 million) records into the target table and the workflow is taking about 10-11 hours to finish. I have already increased the cache size to 128 MB. There are no Joiner transformations, just Lookups and Expression transformations.

Ans:

(1) If the lookups have many records, try creating indexes on the columns used in the lookup condition, and try increasing the lookup cache. If this does not improve performance and the target has indexes, disable them in the target pre-load and re-enable them in the target post-load.

(2) Three things you can do:

1. Increase the commit interval (by default it is 10,000).
2. Use bulk mode instead of normal mode if your target does not have primary keys, or use pre- and post-session SQL to implement the same (depending on the business requirement).
3. Use key partitioning to load the data faster.

(3) If your target contains key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
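
As a minimal sketch of the pre-load/post-load approach above (the object names SALES_FACT, IDX_SALES_FACT_CUST and FK_SALES_CUST are hypothetical), the statements below would typically be placed in the session's target pre-SQL and post-SQL:

    -- Target pre-load SQL: remove index and constraint overhead before the bulk load.
    ALTER TABLE sales_fact DISABLE CONSTRAINT fk_sales_cust;
    DROP INDEX idx_sales_fact_cust;

    -- Target post-load SQL: restore them once the session has finished loading.
    CREATE INDEX idx_sales_fact_cust ON sales_fact (customer_id);
    ALTER TABLE sales_fact ENABLE CONSTRAINT fk_sales_cust;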

What is Performance tuning in Informatica


The aim of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica Server.

You can increase session performance as follows.

The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid unnecessary network connections.

Flat files: If your flat files are stored on a machine other than the Informatica Server, move those files to the machine on which the Informatica Server runs.

Relational data sources: Minimize the connections between sources, targets and the Informatica Server to improve session performance. Moving the target database onto the server system may improve session performance.

Staging areas: If you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance.

You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.

Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character.

If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
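
For example (illustrative only; the ORDERS table and its columns are hypothetical), if the Source Qualifier issues an aggregating query like the one below, an index on the grouped column may help the source database; confirm the benefit with an explain plan before creating it:

    -- Source query generated or overridden in the Source Qualifier:
    SELECT   customer_id,
             SUM(order_amount) AS total_amount
    FROM     orders
    GROUP BY customer_id
    ORDER BY customer_id;

    -- Candidate supporting index on the grouped/ordered column:
    CREATE INDEX idx_orders_customer ON orders (customer_id);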

We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager, choose the server configuration, and configure the database connections.

If your target contains key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.

Running parallel sessions by using concurrent batches also reduces the time needed to load the data, so concurrent batches may also increase session performance.

Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.

In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.

Avoid transformation errors to improve session performance.

If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.

If your session contains a Filter transformation, create that Filter transformation as near to the sources as possible, or use a filter condition in the Source Qualifier.
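
As a minimal sketch of pushing the filter into the Source Qualifier (table and column names are hypothetical), a source filter or SQL override such as the one below restricts the rows before they enter the pipeline:

    -- Hypothetical Source Qualifier SQL override: only open orders from the last 30 days
    -- reach the mapping, instead of being dropped later by a Filter transformation.
    SELECT order_id,
           customer_id,
           order_amount
    FROM   orders
    WHERE  order_status = 'OPEN'
    AND    order_date  >= TRUNC(SYSDATE) - 30;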

Aggregator, Rank and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option.

1. Filter as soon as possible (left most in mapping). Process only the data necessary
and eliminate as much extra unnecessary data as possible. Use Source Qualifier
to filter data since the Source Qualifier transformation limits the row set extracted
from a source while the Filter transformation limits the row set sent to a target.

2. Only pass data through an Expression Transformation if some type of manipulation is being done. If no manipulation is being done on a field, push the data to the farthest active transformation possible. Turn off output ports where the field isn’t passed on.

3. Cache lookups if source table is under 500,000 rows and DON’T cache for tables
over 500,000 rows.

4. Reduce the number of transformations. Don’t use an Expression Transformation


to collect fields. Don’t use an Update Transformation if only inserting. Insert
mode is the default.

5. If a value is used in multiple ports, calculate the value once (in a variable) and
reuse the result instead of recalculating it for multiple ports.

6. Reuse objects where possible.

7. Delete unused ports particularly in the Source Qualifier and Lookups.

8. Use Operators in expressions over the use of functions.

9. Avoid using Stored Procedures, and call them only once during the mapping if
possible.

10. Remember to turn off Verbose logging after you have finished debugging.

11. Use default values where possible instead of using IIF (ISNULL(X),,) in
Expression port.

12. When overriding the Lookup SQL, always put a valid ORDER BY clause in the SQL. This causes the database to perform the sort rather than the Informatica Server while building the cache (see the sketch after this list).

13. Improve session performance by using sorted data with the Joiner transformation.
When the Joiner transformation is configured to use sorted data, the Informatica
Server improves performance by minimizing disk input and output.

14. Improve session performance by using sorted input with the Aggregator
Transformation since it reduces the amount of data cached during the session.

15. Improve session performance by using limited number of connected input/output
or output ports to reduce the amount of data the Aggregator transformation stores
in the data cache.

16. Use a Filter transformation prior to an Aggregator transformation to reduce unnecessary aggregation.

17. Performing a join in the database is faster than performing the join in the session, so use the Source Qualifier to perform the join where possible (see the sketch after this list).

18. In Joiner transformations, define the source with the smaller number of rows as the master source, since this reduces the search time and the cache size.

19. When using multiple conditions in a lookup, specify the conditions with the equality operator first.

20. Improve session performance by caching small lookup tables.

21. If the lookup table is on the same database as the source table, instead of using a
Lookup transformation, join the tables in the Source Qualifier Transformation
itself if possible.

22. If the lookup table does not change between sessions, configure the Lookup
transformation to use a persistent lookup cache. The Informatica Server saves and
reuses cache files from session to session, eliminating the time required to read
the lookup table.

23. Use :LKP reference qualifier in expressions only when calling unconnected
Lookup Transformations.

24. Informatica Server generates an ORDER BY statement for a cached lookup that
contains all lookup ports. By providing an override ORDER BY clause with
fewer columns, session performance can be improved.

25. Eliminate unnecessary data type conversions from mappings.

26. Reduce the number of rows being cached by using the Lookup SQL Override
option to add a WHERE clause to the default SQL statement.
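
The two sketches below illustrate points 12, 17, 21, 24 and 26 above. All table, column and index names (ORDERS, CUSTOMER_DIM, ACTIVE_FLAG and so on) are hypothetical, so treat them as sketches of the technique rather than production SQL.

    -- Join performed in the Source Qualifier (user-defined join or SQL override) instead of
    -- a Joiner or Lookup in the session, when both tables live in the same database:
    SELECT o.order_id,
           o.order_amount,
           c.customer_name
    FROM   orders o
    JOIN   customer_dim c ON c.customer_id = o.customer_id;

    -- Lookup SQL override: the WHERE clause keeps only rows that can actually match,
    -- which shrinks the lookup cache, and the explicit ORDER BY lets the database do the
    -- sort. The trailing "--" is the usual way to comment out the ORDER BY that the
    -- Integration Service would otherwise append to the override.
    SELECT customer_id,
           customer_name,
           customer_type
    FROM   customer_dim
    WHERE  active_flag = 'Y'
    ORDER BY customer_id, customer_name, customer_type --

If you trim the ORDER BY to fewer columns (point 24), keep it aligned with the lookup ports defined in the transformation so the cache is still built in a usable order.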

UNIT TEST CASES TEMPLATE:

The template records, for every test case: Step #, Description, Test Conditions, Expected Results, Actual Results, Pass or Fail (P or F) and Tested By. For all of the cases below the test condition is “Run the Informatica Interface”, the actual result is “As expected”, the status is P (Pass) and the tester is Madhava, so only the step number, description and expected result are listed.

SAP-CMS Interfaces
1. CMS database down: Interface sends an email notification and stops.
2. Check the no. of records loaded in the table: Count of records in the flat file and the table is the same.
3. Call the SP for getting a unique sequence no.: Gets the unique number.
4. Run the interface even if no flat file is present on the SAP server: Interface will stop after finding no flat files.
5. Check for flat files when files are present on the SAP server: Interface will load the data into CMS.
6. SAP host name changed in the SCP script: Informatica Interface will fail to SCP the files onto the SAP server and send an error email.
7. SAP unix user changed in the SCP script: Informatica Interface will fail to SCP the files onto the SAP server and send an error email.
8. CMS database down after files are retrieved from the SAP server: Data is not loaded and files are sent to the errored directory.
9. Stored procedure throws an error: Informatica interface stops and sends an error email.
10. Check the value of DA_LD_NR in the control table: Value of DA_LD_NR in the control table is the same as that loaded in the table for that interface.
11. Error during load of the data into CMS tables: Interface is stopped, files are moved to the errored directory and an email notification is sent.
12. Error while updating the control table in CMS: Interface sends an email notification and stops.

CMS-SAP Interfaces
1. Value in CMS control table is not set to “STAGED”: Informatica Interface will not generate any flat file.
2. Value in CMS control table is set to “STAGED”: Informatica Interface will not generate a flat file.
3. SAP host name changed in the SCP script: Informatica Interface will fail to SCP the files onto the SAP server and send an error email.
4. SAP unix user changed in the SCP script: Informatica Interface will fail to SCP the files onto the SAP server and send an error email.
5. Value in CMS control table is set to “STAGED”: The status in the control table is updated to “TRANSFORMED”.
6. Value in CMS control table is set to “STAGED” and record status is “UNPROCESSED”: The status of each record is updated to “UNPROCESSED”.
7. File generated with no records: No files are sent to the SAP server.
8. CMS database down: Interface sends an email notification and stops.
9. SCP of files failed: Flat files are moved to the error directory and an email is sent.
10. SCP of files is successful: Flat files are moved to the processed directory.
11. Check the no. of records updated in the CMS table: Count of records updated is the same as the count of the records in the flat file.
12. Check the no. of records present in the flat file: Count of records present in the flat file is the same as the records with status “UNPROCESSED”.

SLM-CMS
1. Value in SLM control table is not set to “STAGED”: Informatica Interface will not generate any flat file.
2. Value in SLM control table is set to “STAGED”: Informatica Interface will not generate a flat file.
3. CMS database down: Interface sends an email notification and stops.
4. Check the no. of records loaded in the table: Count of records in the CMS and SLM tables is the same.
5. Call the SP for getting a unique sequence no.: Gets the unique number.
6. Stored procedure throws an error: Informatica interface stops and sends an error email.
7. Check the value of DA_LD_NR in the control table: Value of DA_LD_NR in the control table is the same as that loaded in the table for that interface.
8. Error during loading of the data into the CMS table: Interface sends an email notification and stops.
9. Error while updating the control table in CMS: Interface sends an email notification and stops.
10. Error while retrieving data from the SLM database: Interface sends an email notification and stops.
11. Error while updating the control table on the SLM database: Interface sends an email notification and stops.

What is QA philosophy?

The inherent philosophy of Quality Assurance for software systems development is to ensure the system meets or exceeds the agreed-upon requirements of the end-users, thus creating a high-quality, fully functional and user-friendly application.

What is 'Software Quality Assurance'?

Software QA involves the entire software development PROCESS - monitoring and improving the process, making sure that any agreed-upon standards and procedures are followed, and ensuring that problems are found and dealt with. It is oriented to 'prevention'.

What is 'Software Testing'?

Testing involves operating a system or application under controlled conditions and evaluating the results (e.g., 'if the user is in interface A of the application while using hardware B, and does C, then D should happen'). The controlled conditions should include both normal and abnormal conditions. Testing should intentionally attempt to make things go wrong to determine whether things happen when they shouldn't or don't happen when they should. It is oriented to 'detection'.

Organizations vary considerably in how they assign responsibility for QA and testing; sometimes they are the combined responsibility of one group or individual.

Why does software have bugs?

Miscommunication or no communication - as to the specifics of what an application should or shouldn't do (the application's requirements).
Software complexity - the complexity of current software applications can be difficult to comprehend for anyone without experience in modern-day software development. Multi-tiered applications, client-server and distributed applications, data communications, enormous relational databases, and the sheer size of applications have all contributed to the exponential growth in software/system complexity.
Programming errors - programmers, like anyone else, can make mistakes.
Changing requirements (whether documented or undocumented) - the end-user may not understand the effects of changes, or may understand and request them anyway.

What is verification? What is validation?

Verification typically involves reviews and meetings to evaluate documents, plans, code, requirements, and specifications. This can be done with checklists, issues lists, walkthroughs, and inspection meetings. Validation typically involves actual testing and takes place after verifications are completed. The term 'IV & V' refers to Independent Verification and Validation.

What is a 'walkthrough'?

A 'walkthrough' is an informal meeting for evaluation or informational purposes. Little or no preparation is usually required.

What's an 'inspection'?

An inspection is more formalized than a 'walkthrough', typically with 3-8 people including a moderator, reader, and a recorder to take notes. The subject of the inspection is typically a document such as a requirements spec or a test plan, and the purpose is to find problems and see what's missing, not to fix anything. Attendees should prepare for this type of meeting by reading through the document; most problems will be found during this preparation. The result of the inspection meeting should be a written report. Thorough preparation for inspections is difficult, painstaking work, but it is one of the most cost-effective methods of ensuring quality.

What is software 'quality'?

Quality software is reasonably bug-free, delivered on time and within budget, meets requirements and/or expectations, and is maintainable. However, quality is obviously a subjective term. It will depend on who the 'customer' is and their overall influence in the scheme of things. A wide-angle view of the 'customers' of a software development project might include end-users, customer acceptance testers, customer contract officers, customer management, the development organization's management/accountants/testers/salespeople, future software maintenance engineers, stockholders, magazine columnists, etc. Each type of 'customer' will have their own slant on 'quality' - the accounting department might define quality in terms of profits, while an end-user might define it in terms of ease of use and freedom from bugs.

What is SEI? CMM? CMMI? ISO? IEEE? ANSI? Will it help?

SEI = 'Software Engineering Institute' at Carnegie-Mellon University; initiated by the U.S. Defense Department to help improve software development processes.

CMM = 'Capability Maturity Model', now called the CMMI ('Capability Maturity Model Integration'), developed by the SEI. It's a model of 5 levels of process 'maturity' that determine effectiveness in delivering quality software. It is geared to large organizations such as large U.S. Defense Department contractors. However, many of the QA processes involved are appropriate to any organization, and if reasonably applied can be helpful. Organizations can receive CMMI ratings by undergoing assessments by qualified auditors.

Level 1 - characterized by chaos, periodic panics, and heroic efforts required by individuals to successfully complete projects. Few if any processes are in place; successes may not be repeatable.

Level 2 - software project tracking, requirements management, realistic planning, and configuration management processes are in place; successful practices can be repeated.

Level 3 - standard software development and maintenance processes are integrated throughout an organization; a Software Engineering Process Group is in place to oversee software processes, and training programs are used to ensure understanding and compliance.

Level 4 - metrics are used to track productivity, processes, and products. Project performance is predictable, and quality is consistently high.

Level 5 - the focus is on continuous process improvement. The impact of new processes and technologies can be predicted and effectively implemented when required.

What is the 'software life cycle'?

The life cycle begins when an application is first conceived and ends when it is no longer in use. It includes aspects such as initial concept, requirements analysis, functional design, internal design, documentation planning, test planning, coding, document preparation, integration, testing, maintenance, updates, retesting, phase-out, and other aspects.

Defect Life Cycle: when a defect is found by the tester, the bug is logged with NEW status. The Test Lead then analyses the bug and assigns it to a developer (OPEN status). The developer fixes the bug (FIX status). The tester then retests the new build to check whether the same error still occurs; if not, the bug is given CLOSED status. Defect Life Cycle: NEW -> OPEN -> FIX -> CLOSED. A revalidation cycle means testing whether the new version or new build has the same defect by executing the same test cases, much like regression testing.

Bug reporting and Tracking


Using the testing methodology listed above our QA engineers, translators and
language specialists log issues, or bugs, on our Online Bug Tracking System.
Localization issues which can be fixed by our engineers will be fixed accordingly.
Internationalization and source code issues will also be logged and reported to the
Client with suggestions on how to fix them. Bug Tracking process is as follows:

1. New Bugs are submitted in the Bug tracking system account by the QA.

When a bug is logged our QA engineers include all relevant information to that bug
such as:

• Date/time logged

• Language

• Operating System

• Bug Type – e.g. functional, UI, installation, translation

• Priority – Low/Medium/High/Urgent

• Possible Screenshot of Problem

The QA also analyses the error and describes, in a minimum number of steps, how to reproduce the problem for the benefit of the engineer. At this stage the bug is labelled “Open”. Each issue must pass through at least four states:
Open: Opened by QA during testing
Pending: Fixed by Engineer but not verified as yet
Fixed: Fix Verified by QA
Closed: Fix re-verified before sign-off

QA Process Cycle:

Software QA involves the entire software development PROCESS - monitoring and improving the process.

The philosophy of Quality Assurance for software systems development is to ensure the
system meets or exceeds the agreed upon requirements of the end-users; thus creating a
high-quality, fully-functional and user-friendly application.

Phase I: Requirements Gathering, Documentation and Agreement

Phase II: Establishing Project Standards

Phase III: Test Planning

Phase IV: Test Case Development

Phase V: QA Testing

Phase VI: User Acceptance Testing

Phase VII: System Validation

The QA Life Cycle consists of 5 types of testing regimens:

1. Unit Testing
2. Functional Testing
3. System Integration Testing
4. Regression Testing
5. User Acceptance Testing

Unit testing: The testing, by development, of the application modules to verify each
unit (module) itself meets the accepted user requirements and design and development
standards

Functional Testing: The testing of all the application’s modules individually to ensure
the modules, as released from development to QA, work together as designed and meet
the accepted user requirements and system standards

System Integration Testing: Testing of all of the application modules in the same
environment, database instance, network and inter-related applications, as it would
function in production. This includes security, volume and stress testing.

Regression Testing: This is the testing of each of the application’s system builds to
confirm that all aspects of a system remain functionally correct after program
modifications. Using automated regression testing tools is the preferred method.

User Acceptance Testing: The testing of the entire application by the end-users
ensuring the application functions as set forth in the system requirements documents and
that the system meets the business needs

