Informatica Best Practices
Overview
Chris Ward
Big Data Services CoE
Senior Teradata Consultant
Agenda
• Introductions
• BDS/DI COE & Partner Alliance
• Features
> Teradata TPT and the API
> Push Down Optimization
> Metadata Manager
> Hadoop & Informatica BDE
• Tips From the Expert
> Design Considerations
> Loader Utilities / PDO Design Patterns
> Performance Tips
> Administration Tips
> Informatica Versions
• Next Steps
• Resource Management
> Centralized Talent Base
> Training (Formal, OJT, Mentoring, Cross Training)
> Right person, Right place, Right time
• Load/Unload Protocols
> FastLoad Protocol – Bulk loading of empty tables
> MultiLoad Protocol – Bulk Insert, Update, Upsert, & Delete
> FastExport Protocol – Export data out of Teradata DB
> TPump – SQL application for continuous loading
Before API – Integration with Scripting
[Diagram: data sources (Oracle, flat files, etc.) feed Informatica PowerCenter, which writes data to a named pipe; the scripted Teradata load protocol functions read the pipe, load the data into the Teradata Database, and write a message and statistics file that PowerCenter reads back.]

[Diagram: with the API, PowerCenter passes data directly to the Teradata Parallel Transporter API and its Stream, Update, Export, and Load operators, which load the Teradata Database.]
Parallel Input Streams Using API
[Diagram: PowerCenter launches multiple instances that read and transform source data in parallel; through the TPT API, Teradata Parallel Transporter reads the parallel streams with multiple TPT Load instances and writes them to the Teradata Database.]
Why TPT API
• Scalable, increased performance
> Use the power of PowerCenter and Teradata parallel processing
> Multiple load stream instances from source through to the Teradata target
> No landing of data; buffers are passed in memory
> 64-bit, instead of the 32-bit TTU utilities
• Provides Tight Integration with PowerCenter
> ETL Integration is faster and easier
> PowerCenter has control over the entire load process
– Errors and statistics returned programmatically
– Checkpoints controlled by PowerCenter
• Read source
• Load to a stage table via TPT API
• Perform only basic transformations, such as data type checks
> i.e. where the field's data type is not a string:
– Load the raw value into a VARCHAR
– Test and convert it to the true data type, loading it into a separate field
[Diagram: Staging → Warehouse]
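A minimal SQL sketch of this staging pattern, using hypothetical table and column names; TRYCAST is assumed to be available (it exists only in newer Teradata releases), otherwise an equivalent CASE/validation expression can be substituted:

```sql
-- Hypothetical staging table: non-string source fields land as raw VARCHARs.
CREATE TABLE stg.sales_raw
(
    sale_id_raw  VARCHAR(20),
    sale_amt_raw VARCHAR(30),
    sale_dt_raw  VARCHAR(10)
);

-- Warehouse load: test and convert each raw value to its true data type,
-- keeping the original string in a separate column for error analysis.
-- TRYCAST returns NULL instead of failing the statement when a value
-- cannot be converted (assumes a Teradata release that supports it).
INSERT INTO dw.sales (sale_id, sale_amt, sale_dt, sale_amt_raw)
SELECT TRYCAST(sale_id_raw  AS INTEGER),
       TRYCAST(sale_amt_raw AS DECIMAL(18,2)),
       TRYCAST(sale_dt_raw  AS DATE),
       sale_amt_raw
FROM   stg.sales_raw;
```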
Note that not all transformation rules are supported with pushdown
optimization (ELT) in the current release of Informatica 8.x.
• Meet SLA
> Well-positioned to make data available on time
Informatica BDE Architecture
[Diagram: source data from databases, files, servers and mainframes, and JMS queues flows through a batch interface into Informatica BDE, which profiles, parses, cleanses, matches, and transforms the data on Hadoop (HIVE with HiveQL and UDFs, MapReduce, YARN, HDFS); the results are loaded into Teradata through the TPT API, TDCH, SQL-H, and JDBC/HDFS interfaces.]
Platform comparison (Informatica | Teradata | Hadoop):
• Transformation Strength: Row-by-Row | Set-based | Row-by-Row
• Main Data Storage Type: Transient, non-persistent, non-critical | High value data | Raw data or infrequently used
• Primary Usage: Data Integration Development | Reporting and Analytics | Data Landing Zone, Discovery Process, Scalable Real-time
• Reusable Mapplets
> For groups of transformations that are commonly used together, make
them reusable, e.g. data quality rules where data types are converted,
then a lookup is performed and the result returned in a standard format.
> Put them in a common folder.
• Source / Targets
> Use shortcuts and put them in a common folder
Indexes
– A mapping containing parallel lookups cannot be pushed to the database.
Design the mapping so that the lookups are sequential.
– A mapping containing unconnected lookups might perform slower under PDO
than the Integration Service's DTM. The cause: the SQL generated by the
Integration Service can be complex and slow, because unconnected lookups
are converted into outer joins. Compensate by connecting the lookups
whenever possible.
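A sketch of why this hurts, with hypothetical tables and columns: an unconnected lookup pushed into the database is typically rewritten as an outer join against the lookup table, and once several such joins stack up the generated SQL can be much slower than a cached lookup in the DTM.

```sql
-- Unconnected lookup on CUSTOMER by CUST_ID, as it might look after the
-- Integration Service converts it into an outer join for pushdown
-- (illustration only; table and column names are hypothetical).
SELECT o.order_id,
       o.cust_id,
       c.cust_name AS lkp_cust_name   -- value the unconnected lookup returned
FROM   stg.orders o
LEFT OUTER JOIN dw.customer c
       ON c.cust_id = o.cust_id;
```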
– Mappings containing an Aggregator downstream from a Sorter transformation
cannot utilize pushdown. Handle this by redesigning the mapping to achieve
full or source-side pushdown optimization: configure the Aggregator
transformation so that it does not use sorted input, and remove the Sorter
transformation.
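The Sorter can simply be removed because a set-based database does not need pre-sorted input to aggregate; once the Aggregator no longer uses sorted input, the branch can push down as an ordinary GROUP BY, roughly like this sketch (hypothetical table and columns):

```sql
-- With the Sorter removed and the Aggregator not using sorted input,
-- the aggregation pushes down as a plain GROUP BY; the database decides
-- internally whether to sort or hash.
SELECT cust_id,
       SUM(sale_amt) AS total_sale_amt,
       COUNT(*)      AS sale_cnt
FROM   stg.sales
GROUP  BY cust_id;
```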
Pushdown optimization is not possible for this mapping because variable
ports are not supported. Consider replacing
(NET_AMOUNT = AMOUNT – FEE, DOLLAR_AMT = NET_AMOUNT * RATE)
with
(DOLLAR_AMT = (AMOUNT – FEE) * RATE).
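Once the variable port is folded into a single output expression, the Integration Service can generate a pushable expression along these lines (hypothetical source table and columns):

```sql
-- DOLLAR_AMT computed in one expression, with no intermediate variable port,
-- so it can be pushed to the database as part of the generated SQL.
SELECT (amount - fee) * rate AS dollar_amt
FROM   stg.invoice_lines;
```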
1. PowerCenter expressions can be pushed down only if there is an equivalent database function.
2. Not all databases are supported; refer to the documentation for further details.
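For example, a PowerCenter IIF() expression can push down only because the database offers an equivalent construct; a sketch of the kind of translation involved (column names hypothetical):

```sql
-- PowerCenter expression:  IIF(sale_amt > 0, 'Y', 'N')
-- Equivalent SQL the database can execute, as a CASE expression:
SELECT CASE WHEN sale_amt > 0 THEN 'Y' ELSE 'N' END AS is_positive
FROM   stg.sales;
```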
• Use the right tool in the right place at the right time, e.g. a database is
designed to do filters, joins, sorts, and aggregations; use the appropriate
tool at the appropriate moment.
• Find your bottlenecks, make a single change at a time, and re-test. A change
can simply move the bottleneck elsewhere.
[Chart: Total Batch Time vs. Total Session Elapsed Time, plotted by run date.]
> Monitor all environments, i.e. Informatica, databases, etc. Use a common
reporting methodology so you can see what the performance was like across
all systems at a given point in time.
Installation (Informatica server and Teradata Database):
1. Install the Teradata Database
2. Install TTU on the Informatica server
3. Test TTU (TPT API load / ODBC)
4. Install Informatica PowerCenter / PowerExchange
5. Install the Informatica TPT API module
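A minimal smoke test for step 3 is simply to connect from the Informatica server through BTEQ or an ODBC client using the TTU drivers and run a trivial query; the query below is standard Teradata SQL and assumes nothing beyond a working logon:

```sql
-- Returns the session number, logged-on user, and server time; if this
-- comes back through BTEQ/ODBC on the Informatica server, the TTU client
-- stack is installed and can reach the Teradata Database.
SELECT SESSION              AS session_no,
       USER                 AS logon_user,
       CURRENT_TIMESTAMP(0) AS server_time;
```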