IICS
For Informatica PowerCenter we do not need internet connectivity, but for IICS we do, because the Secure Agent has to communicate with the services running on the cloud.
On-premise server --> the Informatica PowerCenter server runs on top of a Unix server; PowerCenter is maintained by the company that holds the license, and any PowerCenter issues are raised as vendor requests to Informatica.
Informatica Cloud --> the platform is maintained by the Informatica organization; IICS metadata is stored by Informatica.
The Informatica Cloud Secure Agent is a lightweight program that runs all tasks and enables
secure communication across the firewall between your organization and Informatica Intelligent
Cloud Services. When the Secure Agent runs a task, it connects to the Informatica Cloud
hosting facility to access task information. It connects directly and securely to sources and
targets, transfers data between them, orchestrates the flow of tasks, runs processes, and
performs any additional task requirement.
If the Secure Agent loses connectivity to Informatica Intelligent Cloud Services, it tries to
reestablish connectivity to continue the task. If it cannot reestablish connectivity, the task fails.
Dev/QA Secure Agents will be on-premise, on the same machine (server).
Informatica enterprise iPaaS includes multiple cloud data management products designed to
accelerate productivity and improve speed and scale
Before developing a mapping, first check in Administrator whether all services are up and running, mainly the Data Integration Server.
Add connections --> in IICS Administrator go to Connections, then New Connection, and add the connection details.
Here you need to mention the connection name, a description, and the type of connection (which platform the connector is for, e.g. MySQL, Oracle, DB2, etc.).
Runtime Environment: if your database is on your own machine, select the on-premise Secure Agent; otherwise select the Informatica Hosted Agent.
In Username give the database username, followed by the password in the next field.
The main thing is the code page --> UTF-8, or whatever matches the values present in the data.
New --> used for creating any Informatica asset.
My Jobs --> for checking which tasks are running, which have completed, and any other status.
If the file is present locally, use the local option; otherwise use FTP/SFTP and mention the path.
IMP
SEQUENCE GENERATOR
On the 2nd day it should continue from where yesterday's or the previous run left off --> 7, 8, 9, 10, 11.
So to store the values from the 1st day in memory we need to use a mapping task (in real time the practice is that each mapping is associated with a mapping task).
IICS Data Integration --> Bundles (used for the application integration part)
Interview Question
Yes, we can run them under Tasks --> PowerCenter Task (import a PowerCenter workflow so you can run it as a Cloud Data Integration task). The main condition, and the only disadvantage, is that the workflow should contain only one session, and you cannot edit anything in the imported task.
Another main thing is that characters such as $, @, ó need to be written to the target exactly as they are present in the source data files.
To make these characters fit in, say, a Name Varchar(20) column, you need to increase the precision and change the datatype, e.g. Varchar(20) --> Nvarchar(100),
because we do not know how much space these Unicode characters may take in the database.
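A minimal hedged sketch of such a change, assuming an Oracle target and an illustrative T_EMPLOY table (names are not from these notes):
-- Widen the column and switch to a Unicode datatype so multi-byte characters fit
ALTER TABLE t_employ MODIFY (name NVARCHAR2(100));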
Scheduling
Blackout period --> a blackout period prevents all scheduled tasks and linear taskflows from running.
The Repeats setting under the scheduling options gives more flexibility over the time, day, and hour at which the task runs.
Delimiter
If the data itself contains a comma, ask the source team to change the delimiter to something different, such as a double quote (") or a pipe (|).
Example of incoming data: instead of a comma we use another delimiter to avoid issues while loading the target:
101|Navin , kumar|Vallamdas|20
In Informatica PowerCenter we had the Debugger to view how the data is flowing and find errors; in IICS we have Preview Data instead, for the same purpose.
Normally we configure a failure email so that on failure the production support team is notified.
4.Advanced Options
Maximum number of log files --> 10 means the last 10 log files will be saved.
If any services are only partially running, the test connection will not succeed, so make sure all services are up and running.
SRC_FF_EMPLOY1.csv
SRC_FF_EMPLOY2.csv
D:\Downloads\SRC_FF_EMPLOY3.csv
Informatica reads the config.txt file and goes to the paths listed in it, even when a file sits at a different location than the config file.
For this, all three files mentioned in the config must have the same schema (the same flat-file structure).
To avoid maintaining it by hand, we can generate the config file dynamically using Unix bash pre-processing commands in the mapping task (these commands execute before the session starts); this must happen before the source starts reading the files listed in that config file.
If the column names are different, the field mapping has to be done manually.
This means certain columns will only hold values of a given length.
Src file as
But the only issue is that if you load the data and then select it in the database, you will notice it also contains spaces.
To avoid that we can use an Expression transformation between source and target and apply RTRIM and LTRIM on the columns you feel may have spaces.
Mapping source part
Select the object, click on the three dots, and you get the option Show Dependencies.
Using a dynamic file name means you get the file name according to the expression you mention, such as T_EMPLOYMAIN||TO_CHAR(SYSDATE,'MMDDYYYY')||'.csv' in this case.
Here we use TO_CHAR to avoid file-naming issues; otherwise it can throw a naming error.
When the source team sends data, the file will have a header and a footer; the footer carries the number of records in the file so that we can validate the file.
If you do not want a header and footer in the target and you do not give a command, they will not be created.
But if they are present in the source data and you want to remove them, you need to process them with a Unix command task.
Hierarchical Schema -->
A CSV file follows the relational model, with a delimiter separating the column values.
But XML and JSON files have a hierarchical format, with parent-child relationships.
1. Hierarchical Schema --> to define the structure (XML, JSON)
2. Hierarchy Parser --> to read data from JSON or XML and convert it to relational output
The Hierarchy Parser transformation converts hierarchical input into relational output. The transformation processes XML or JSON input from the upstream transformation and provides relational output to the downstream transformation.
To parse complex hierarchical structures, consider using the Structure Parser transformation for more comprehensive handling of hierarchical file inputs.
You can configure a hierarchical schema that defines the expected hierarchy of the output data from a sample file or schema file. The Hierarchy Parser transformation converts hierarchical input based on the hierarchical schema that you associate with the transformation. You can use an existing hierarchical schema or configure one.
Configure the field mapping to select which schema elements provide relational output.
Input file (the actual location of the file, similar to the config in a filelist) --> Hierarchy Parser --> relational model
Based on the hierarchical schema, the Hierarchy Parser converts the data into relational (RDBMS) form.
The Hierarchy Builder creates an XML file when given relational database input.
Step 1: Hierarchical schema creation --> New --> Components --> Hierarchical Schema
Step 2: Mapping
In the source you need to mention the details as below; mention input.txt as the source object.
As soon as we generate the output file at runtime, an extra output file is also generated; we can stop generating it by using scripts.
FOR JSON_DATA
With the sample file we do not even need to give values; we only need the structure for reference.
1. Intelligent Structure Model -- here we pass the Excel data for reference.
It is a visual representation for accessing the data present in the file.
2.Structure Parser --> The Structure Parser transformation transforms your input data into a
user-defined structured format based on an
intelligent structure model. You can use the Structure Parser transformation to analyze data
such as log files, clickstreams, XML or JSON files, Word tables, and other unstructured or
semi-structured formats.
ISM --> under Components
Display will have more dropdowns if more than one sheet is present in the Excel file.
Mapping
Filepath
Take the Structure Parser, and in it select the ISM model; only after selecting it does the option to connect the source to the Structure Parser get enabled.
In the Fields option of the Structure Parser you need to map the source filepath field to the filepath field of the Structure Parser.
For the source you need to give the config.txt, which holds the filepath of the .xlsx file.
Note: if you have more than one sheet in the Excel file, you need to create a corresponding number of targets in the mapping.
If there is a date column in the source, you need to add an Expression transformation between the Structure Parser and the target to make the date format compatible.
The filter condition here is preferred so that we filter the data at the source itself and performance is good. Example: you have 5 years of data in the source and you need to run ETL on only 1 year of data; in this case it is very helpful from a performance perspective.
In the Filter option we only mention the filter condition; Informatica converts/adds it into the full SQL query.
If we select any source table, we get Filter, Sort, and other options in the Source transformation.
If we go for a SQL override, the Filter and Sort options get disabled.
We can select Source Type as Query; it will fetch data exactly as mentioned in the query.
With Source Type as Query we do not need to mention any particular table, and we can write a join SQL query to be processed as the source itself.
We can create the target table at runtime; you need to select a connection so the table gets created in that database schema --> Informatica decides the datatypes of the target fields depending on the target connection type; if the target is Teradata or Snowflake, it adjusts the target datatypes to that platform.
The Filter option has two types:
Sort option
We can add sorting on more columns; when the first names are the same we can then sort by last name (multi-column sorting).
We can give Source Type as Multiple Objects, where more than one related table (object) is used as the source.
If the relationship between the objects (tables) is not defined, we need to create a custom relationship and add them.
FILTER TRANSFORMATION
The Filter transformation is used to filter out records anywhere in the pipeline.
SQ filter --> used to filter records at the source; if the source is a flat file we cannot use the filter feature of the Source transformation.
Active: the number of output records can differ from the input --> 50 records passed as input, 30 records at the output.
Passive: e.g. an expression trimming values --> 50 records passed as input and the same 50 records at the output.
Properties:
FALSE --> it blocks all records, for example when we want to check the connection in production and do not want to load any data into the target.
NOT ISNULL(Commission_pct) --> filter
INSTR(JOB_ID,'REP') --> filter
I have a flat file that has records for all countries, and I want to load only the India data to the target:
India
india
INDIA
S + SQ + Filter: lower(country) = 'india' AND salary > 5000
IN --> IN(department_id, 40, 50)
LIKE --> INSTR()
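A hedged sketch of the SQL such a source filter effectively boils down to (table and column names are assumed, not taken from these notes):
-- Filter pushed to the source so only the needed rows are read
SELECT *
FROM employees
WHERE LOWER(country) = 'india'
  AND salary > 5000;
-- IN() and INSTR() can be used the same way in the condition, e.g.
-- AND department_id IN (40, 50)
-- AND INSTR(job_id, 'REP') > 0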
NOTE:
If we get dates in different formats in one source file, we need to treat the column as a string and make the required changes to it,
or else standardize everything to one format and then proceed further.
Passive, Connected
You need to learn the SQL single-row functions, since those are what get used in Informatica Expression transformations.
When we need some intermediate logic in IICS we use a variable port; it is not passed as output, so if we want its value we need to use the variable port inside another output port.
We can add an output field and mention the expression with an IIF statement.
Scenario 1: source and target in the same database
The main thing is that after loading the data we need to check whether the data actually loaded correctly.
To do so:
Target query
MINUS
Source query
If the source and the target are both present in the same Oracle database, we can go with the queries above.
If the output of the query is empty (no rows), the data loaded without any error; this is a unit-testing check.
We need to create another table in the target without any transformation, i.e. 1-to-1 mappings, and then run the MINUS query to check whether the data load is correct.
If something is wrong, you will see rows in the output of the MINUS query.
Another way is to export the source data into Excel and the target data into Excel, and compare both to validate whether the data loaded correctly.
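A minimal hedged sketch of that validation, assuming a source table EMPLOYEES and a 1-to-1 target T_EMPLOYEES in the same Oracle schema:
-- Rows in the target that are missing or different in the source
SELECT employee_id, first_name, salary FROM t_employees
MINUS
SELECT employee_id, first_name, salary FROM employees;
-- Run the same check in the other direction as well
SELECT employee_id, first_name, salary FROM employees
MINUS
SELECT employee_id, first_name, salary FROM t_employees;
-- Both queries returning zero rows means the load matches.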
O_commission_pct checks for NULL; if NULL it replaces the value with 0, otherwise it passes the value as-is.
O_hiredate outputs whether the hire year was a leap year or not.
O_increment_salary first checks whether the salary is less than 10,000; if so, the salary is increased by 20%, i.e. salary * 0.2 is added; otherwise, when the previous condition is not satisfied, the salary is increased by 10%, i.e. salary * 0.1 is added.
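A hedged SQL equivalent of those three output-port expressions (the EMPLOYEES table and column names are assumed; the leap-year check follows the standard Gregorian rule):
SELECT
  NVL(commission_pct, 0) AS o_commission_pct,
  CASE WHEN MOD(EXTRACT(YEAR FROM hiredate), 4) = 0
            AND (MOD(EXTRACT(YEAR FROM hiredate), 100) <> 0
                 OR MOD(EXTRACT(YEAR FROM hiredate), 400) = 0)
       THEN 'LEAP YEAR' ELSE 'NOT LEAP YEAR' END AS o_hiredate_leap,
  -- increased by 20% means salary + salary*0.2, i.e. salary * 1.2
  CASE WHEN salary < 10000 THEN salary * 1.2 ELSE salary * 1.1 END AS o_increment_salary
FROM employees;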
EXPRESSION MACROS
If there are spaces in the values, you would be applying trim functions to the columns to remove the extra spaces.
In PowerCenter you need to create a separate output column for every column the logic is applied to,
but in IICS we can use the MACROS option.
Vertical
Horizontal
Vertical Macros
A vertical macro expands an expression vertically, which means it generates the same expression/condition over multiple incoming fields.
The macro is applied in a vertical manner, meaning we can apply the same logic to multiple columns.
For example, if we want to trim and replace the '$' in all columns, with a vertical macro we can write the logic once and apply it to multiple columns instead of writing it for each column separately.
INPUT MACRO FIELD --> where you mention all the columns that are input to the macro.
If you want to validate the expression, replacing the input macro field with an actual column name will let it validate.
RTRIM(LTRIM(REPLACECHR(0,FIRST_NAME,'$','')))
AT TARGET SIDE:
You need to create a parameter value, then create a mapping task and do the field mapping of the macro fields in the MCT.
You need to map the macro columns, which carry the suffix _out, to the target columns.
Horizontal Macros
Use a horizontal macro to generate a single complex expression that includes a set of incoming fields or a set of constants.
In a horizontal macro, a macro input field can represent a set of incoming fields or a set of constants.
In a horizontal macro, the expression represents calculations that you want to perform with incoming fields or constants.
A horizontal macro produces one result, so a transformation output field passes the result to the rest of the mapping. You configure the horizontal macro expression in the transformation output field.
The result of the expression passes to the downstream transformation with the default field rule. You do not need an additional field rule to include the result of a horizontal macro in the mapping.
To write the result of a horizontal macro to the target, connect the transformation output field to a target field in the Target transformation.
FLAG: %OPR_SUM[IIF(ISNULL(%in_PORT%),1,0)]%
At runtime the application expands the expression horizontally as follows, to include the fields that the macro input field represents:
IIF(ISNULL(First_Name),1,0) + IIF(ISNULL(Last_Name),1,0) + IIF(ISNULL(Phone_Number),1,0) + IIF(ISNULL(Job_ID),1,0)
SNOWFLAKE CONNECTION
Add-on connectors: search for the Snowflake connector --> start the free trial, then go to Connections --> New Connection and select the connector type as below.
The Joiner is used for heterogeneous sources, and whenever you need to join two data pipelines at any point of a mapping.
IF ONE OF THE TWO PIPELINES IS ACTIVE AND THE OTHER PASSIVE, YOU CAN JOIN THEM DIRECTLY,
BUT IF BOTH PIPELINES ARE ACTIVE, YOU NEED TO USE A SORTER TRANSFORMATION BEFORE THE JOINER AND TICK SORTED INPUT.
Source transformation --> for joining two objects of the same source; this can be applied at the source only.
Always take the bigger table (in number of columns and number of records) as the detail table.
Always take the smaller table as the master, so there is less cache to store.
It creates the cache from the smaller (master) table because that improves performance.
4 types of joins
Left circle --> detail table
2. Master Outer (left outer join) ==> all records from the detail and only the matching records from the master.
3. Detail Outer (right outer join) ==> all records from the master table and only the matching records from the detail.
The join condition is on a common column between the two sources; you need to have a common column.
A Joiner takes only 2 sources; if there are more sources you need additional Joiners.
If you need to filter out data, apply it in the source filter, as that reads less data from the source; and if it is the detail source, there is less cache to store.
mapping
IF YOU WANT TO REMOVE FULL-ROW DUPLICATES ANYWHERE IN THE PIPELINE, YOU CAN USE A SORTER.
ASCII VALUES
A-->Z (65-90)
a-->z(97-122)
You need to select which column to sort by and the direction of sorting, ascending or descending.
Case Sensitive means that if it is turned on:
Arun
Baba
arun
it sorts the uppercase characters first and then the lowercase ones.
By default NULL is treated as high on the Oracle DB side; if you need to change that you need to enable the corresponding option.
AGGREGATOR TRANSFORMATION
Min()
Max()
Sum()
Avg()
Count()
Group by
IN SQL
the statement above can only be produced by using GROUP BY on the non-aggregated columns,
BUT IN INFORMATICA
the Aggregator allows getting data for non-aggregated columns without applying a group by on them (see the SQL sketch below for the SQL-side rule).
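A hedged SQL sketch of that rule (the usual EMPLOYEES table is assumed):
-- Every non-aggregated column in the SELECT list must appear in the GROUP BY
SELECT department_id, SUM(salary) AS total_salary
FROM employees
GROUP BY department_id;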
Only when you are using GROUP BY do you need a SORTER BEFORE the AGGREGATOR TRANSFORMATION, and the column you put in the GROUP BY must be the same column used as the sort key in the Sorter transformation.
If you enable Sorted Input and you do not have a group by mentioned in the AGGREGATOR TRANSFORMATION, that is fine.
If there are many rows and you apply an aggregator SUM on salary (without a group by), it shows the last record with SUM(SALARY) in the salary column.
Sorter --> Aggregator gives better performance when the input has many records.
If you enable Sorted Input but do not actually provide sorted data, it throws the error below.
DATA CACHING --> when you want to apply SUM and other aggregations, the Aggregator stores all the data values and then computes the aggregate over the whole set of records.
ROUTER TRANSFORMATION
If you want to split one data pipeline into many data pipelines, you can go for the Router transformation.
It filters data with conditions and creates, by default, one group that holds the data not satisfied by any of the filter conditions mentioned.
Example: your source data has multiple modes of payment; you can use a Router transformation to split the data by payment mode, such as UPI, CARD, NETBANKING.
To split a single data pipeline you can use a Router (single to many --> Router),
and to merge multiple data pipelines into a single one you need a Union (many to single --> Union).
If the value of a column is 'Grocery' and you put this in a filter, it may sometimes have matching issues; instead we can apply UPPER to that value and check whether it matches.
RANK TRANSFORMATION
The Rank transformation takes time to complete because it needs to compare each record and rank it based on the values of those records.
Best example: you have 100 students in a college and you rank these students based on CGPA.
But if you want to know which year the students belong to, you group them by year and then apply the rank on students based on CGPA within each group.
NOTE: with Informatica you can only do RANK(); you cannot perform DENSE_RANK().
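A hedged SQL analogue of that difference (column names assumed):
-- RANK() leaves gaps after ties, which is what the Rank transformation behaves like;
-- DENSE_RANK() does not, and has no direct equivalent in the Rank transformation.
SELECT employee_id, department_id, salary,
       RANK()       OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS drnk
FROM employees;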
You need to select which column to rank by; Rank Order means lowest-to-highest value for Bottom and highest-to-lowest value for Top.
Number of rows --> with no group by, if you give 100 it will pass 100 records.
If you apply a group by on some column, for example department_id, and give Number of rows = 2, it passes only 2 rows per department_id; those may be the same rank or different ranks, e.g. 1,1 or 1,2.
You can parameterize the number of rows you need to rank: 100 rows, fewer, or more.
The example below uses Number of rows = 2, group by department_id, and rank by salary.
The part above covers the cache, how it is stored, and other details about it.
LOOKUP TRANSFORMATION
CONNECTED LOOKUP
A Lookup looks up another table to fetch records, similar to a join, but with a lookup you can also go for non-equi joins.
The source is Mart_data, which has product_id and transaction details such as price, quantity, and so on.
Connected: a connected lookup is part of the pipeline and is connected to other transformations.
Lookup SQL Override: we use this when we need results from the lookup table based on a few extra conditions, such as only the active records.
Lookup Source Filter: here you can mention the filter conditions to apply on the lookup table.
Informatica creates the lookup cache the very first time, so that on every subsequent row it does not go to the lookup table for the comparison; instead it checks the lookup cache it created.
But in IICS you must specify a prefix or suffix for the lookup field names;
if we give 'emp', for example, then all the fields will carry emp on the column name, e.g. emp_first_name.
In the image below, under Return Fields you can select which columns are needed and remove the unnecessary columns of the lookup table to avoid building a large cache.
Mapping
LEFT OUTER JOIN here means SOURCE LEFT OUTER JOIN LOOKUP TABLE.
:LKP tells Informatica that the call relates to an unconnected lookup transformation.
Create an output port only for the column whose data you need from the lookup table; for example, in the picture above we need department_name from the DEPARTMENTS lookup table, so mention department_name as the output port, and the same goes for location_id as an output port.
Reusability -- you create the unconnected lookup only once, but you can use it multiple times anywhere in that mapping.
Condition-based lookup -- say there is a table Product_data where we have product details such as name, description, and so on.
The source is Dmart_data, where you have product_description and product_id; out of around 1000 records in total, 100 records have product_description as a NULL value.
As the logic: IIF(ISNULL(prod_desc), :lkp.u_lookup(product_id), prod_desc)
An unconnected lookup will return only one port, so how do you get more than one port?
Select the return port and keep the precision high enough for the data to fit, as the data will be a concatenated value.
IN EXPRESSION
SUBSTR(v_expop, -4) extracts the data of the last 4 positions; that is why -4 is used.
You need to create output ports for the columns to be derived from the concatenation; for example, department_name carried the data of two columns, department_name and location_id, with | as the delimiter in between.
If the lookup table Products has 2 values / multiple values for 1201, then either select Return first row,
or
Return all lookup values (in this case the lookup is an active transformation).
So if you pass 100 rows you get 100 back, but with Return All it might give more rows than were passed.
1201 --> BANKING
1201 --> PURCHASING
When we select Return first row, it orders the rows in ascending order, so the first row is BANKING.
For flat-file data you cannot concatenate column data in the lookup, so you need two separate unconnected lookup transformations, each deriving one column.
Whenever we want dynamic files in the target we can go for the Transaction Control transformation.
SQL --> TCL --> COMMIT, ROLLBACK
Infa --> TC_COMMIT_BEFORE
TC_COMMIT_AFTER
TC_ROLLBACK_BEFORE
TC_ROLLBACK_AFTER
TC_CONTINUE_TRANSACTION
100 1
102 2
103 3
104 4
--> Commit all records and create a file for the records before this point.
107 2 --> Whenever it encounters a number other than 5 or 1, it does TC_Continue_Transaction.
108 3
109 4
111 1
112 2
113 3
114 4
115 5
Example:
ON sorting
Product_id
100 100
101 100
102 100
103 101
104 101
100 101
101 102
102 102
103 102
100 103
101 103
102 103
103 104
104 104
It creates one file for each product_id (100, 101, 102, 103, 104). We can go for Commit Before on a change in the column value: when the value changes, e.g. from 100 to 101, the 100 rows go into one file.
This separates the files according to the change in the product_id column value.
Mapping:
In the Transaction Control condition --> If Field Value Changes means that as soon as the value of the country_name column changes, for example India --> China, it executes the operation mentioned in the properties.
If we are getting data from mainframe VSAM files, the data will be denormalized, in level form.
To read this data we can use a Normalizer after the source (VSAM), with a relational database as the target.
The Normalizer has only 2 data types, string and number, so you need to convert any other data type into string/number and then back to the original data type using an Expression transformation.
If you need to continue incrementing: for example, yesterday you loaded 16 records and today you are loading around 15 and need to start from 17; you need to use a mapping task, as it holds the value in the sequence-generation case.
If we do not use a mapping task, it will simply start again from 1.
Mapping:
Normalizer transformation:
Normalized Fields is where you define which field needs to be transposed; GC_ID --> generated column ID, GK --> generated key.
Wherever we want a unique column value, like a surrogate key, we can use a Sequence Generator.
NEXTVAL --> used to generate the next value; you can pass this port to any table or transformation.
CURRVAL > NEXTVAL (CURRVAL is NEXTVAL plus the increment value).
Resetting the value of a sequence is easy to handle with the Sequence Generator transformation.
For example, you have generated 1 to 5 and you want to reset the value to 1 again after 5; for this kind of scenario we can use the Sequence Generator.
Use Shared Sequences --> if you have two mappings whose target is the same table, we can go for it. Both mappings will share the sequence numbers; m1 and m2 will have the same sequence s1.
Cycle --> should not be enabled if we are keeping the sequence generator column as a primary key.
Cycle means that once it reaches the end value it starts again from the initial value.
Cycle start value --> if Cycle is enabled, the value from which the cycle should restart.
Reset --> it resets the values with each new session run of the mapping.
Number of Cached Values --> the default of 0 means it will not store any sequence number values in Informatica memory.
But if we give a specific number such as 100, it keeps that many values in Informatica memory and assigns them to the data as the input rows are received.
If we have a large amount of data, say 1 million rows, assigning a sequence number to each row one at a time is time-consuming, so instead it generates a batch of sequence numbers and then assigns them.
Mapping-
TO FETCH ONLY 4TH RECORD
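The mapping above presumably does this with a Sequence Generator NEXTVAL plus a Filter (NEXTVAL = 4); a hedged SQL analogue of the same idea (table name assumed):
-- Number the rows, then keep only the 4th one
SELECT *
FROM (SELECT e.*, ROW_NUMBER() OVER (ORDER BY employee_id) AS rn FROM employees e)
WHERE rn = 4;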
SHARED SEQUENCE
In this concept, mapping m1 and mapping m2 have the same target table t, so both can share a sequence generator.
So if it has reserved values 1 to 1000 and the run needs 1250, it reserves another 1000 values, from 1001 to 2000.
JAVA TRANSFORMATION
For example, looping, or encryption and decryption, can be achieved easily with the Java transformation.
Active or Passive:
If active, the transformation can generate more than one output row for each input row.
If passive, the transformation generates one output row for each input row.
The default is Active.
SQL Transformation
Anywhere in the pipeline, if we need to use or call SQL statements, we can go for the SQL transformation.
1. Source transformation
4. SQL transformation
A sequence can be generated using the Sequence Generator transformation, or using the SQL transformation by calling the database sequence object.
Mapping:
SQL Transformation :
You get the option to enter the query or to load a saved query from a local .sql file.
In the image above we can find the query:
SELECT
DEPARTMENT_NAME,
LOCATION_ID
FROM
DEPARTMENTS
WHERE
DEPARTMENT_ID = ?DEPARTMENT_ID?;
Here ?...? means that at runtime it will take the value and proceed further.
In Output Fields you need to mention the columns of the table you are fetching; for example, in the SELECT query above we are fetching location_id and department_name, so mention those columns in Output Fields. SQLError shows the error that occurred in the DB, if needed.
Pass-through fields control whether all incoming fields are passed downstream or a few are excluded.
At the target, during field mapping, you can see all the column names: the output fields plus the pass-through fields of the SQL transformation.
Output: Oracle fetching department_name and location_id from the DEPARTMENTS table for the data available in the EMPLOYEE table, similar to a lookup.
Dynamic SQL File Loading
SOURCE:
Mapping:
SQL TRANSFORMATION
Create the output columns for the column names present in the SELECT query.
You can create the table beforehand and load data into it, or create the table at runtime to store the data.
o/p at database
A static SQL query runs the same query statement for each input row in the SQL transformation, but the data of the query can be changed for each input row using parameter binding in the SQL editor.
The string or column in a static SQL query is enclosed in question marks (?).
-----------------------------------------------------------------------------------------------------------------------
A dynamic SQL query can execute different query statements for each input row. The SQL query executed for each input row is changed using string variables in the query, which link to the input fields passed to the SQL transformation.
To configure a string variable in the query, identify an input field by name in the query and enclose the name in tilde characters (~). The query changes based on the value of the data in the field.
A portion of the SQL query (like table names) can be substituted with input fields from the source.
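A hedged sketch of the two styles (the field names here are illustrative, not from the notes):
-- Static SQL with parameter binding: the same statement runs for every row,
-- with ?DEPARTMENT_ID? replaced by each input row's value.
SELECT department_name, location_id
FROM departments
WHERE department_id = ?DEPARTMENT_ID?;
-- Dynamic SQL with a string variable: ~TABLE_NAME_FIELD~ is substituted from an
-- input field, so a different table can be queried per input row.
SELECT department_name, location_id
FROM ~TABLE_NAME_FIELD~;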
Source:
Mapping:
We have the SELECT query in the input file, so we need to give that column as the output fields.
Configure the target as usual, with runtime table creation or by loading into an already available table.
We do not have an Update Strategy transformation in IICS; instead we have the option in the Target transformation.
When you select Update at the target, it asks on which column basis you will update the table.
When you open the session log it shows the rows as updated, even in the case where it is just a normal insert into the table.
IF UPDATE -->
IT WILL UPDATE THE EXISTING RECORD WITH ANY VALUE CHANGE IN THE FIELDS
IF DELETE -->
Mapping:
First you take the source data and then look up the target data on the primary key or a particular field.
It checks whether the employee_id is already present; if present, it checks whether all the other columns match the source, using the lookup column values.
Then in the Router we decide how to make the different groups for insert and update.
If we mention just the flag column name as the condition, it considers it true-only, and only then passes rows to that group.
Another target is a normal insert, with the Truncate Target option disabled.
UNION TRANSFORMATION
For tables we cannot use a filelist; instead we can use the Union concept for tables.
We can also go for heterogeneous sources, like one file and one table, but both must have the same structure.
To change UNION ALL into UNION, add a Sorter (with Distinct) after the Union transformation.
Mostly we take the data from the source, use a Router to convert the single input into separate group outputs, apply various transformation logic to those outputs, and at the end combine all the separated group outputs into one using the Union transformation.
Mapping:
If you want to add another new group, click on the + symbol.
If you want to remove duplicate records, add a Sorter transformation, select any column for sorting, and in Advanced select the Distinct option.
In the statement above, OWNER is the schema name, and we applied LOWER to table_name to avoid case-sensitivity issues.
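A hedged guess at what such a statement looks like, assuming it queries the Oracle data dictionary (schema and table names are placeholders):
-- OWNER is the schema name; LOWER() avoids case-sensitivity issues on the comparison
SELECT table_name
FROM all_tables
WHERE owner = 'HR'
  AND LOWER(table_name) = 'employees';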
SIMPLE UPDATE
a. METHOD 1 - FLAG
b. METHOD 2 - VERSION
c. METHOD 3 - DATE
SCD TYPE 2
FLAG METHOD
Mapping
Another target, update_update, is for setting the flag to 0 for the old record.
And we need to look up only the target data that has flag 1, meaning we look up only the active records.
Create an Expression transformation for updating and inserting the data into the table.
Create the insert and update flag logic, and insert_active_flag and update_active_flag for the flag column of the target.
TASKS
Use the replication task to replicate data to a target. You might replicate data to back up the data or to perform offline reporting.
You can replicate data in Salesforce objects or database tables to databases or flat files. You can configure a task to replicate all rows of a source object each time the task runs, or to replicate only the rows that changed since the last time the task ran. You can use a replication task to reset the target tables and to create target tables.
Why is Load Type disabled when Oracle/SQL Server/MySQL is the source in DRS?
DRS always does a full load when a database is the source; the user cannot change the load type, as that is how the task is designed. To do an incremental load, columns like created date and modified date are necessary, and these columns may not be available in all tables in the database. Hence, by default, the task does only a full load.
The workaround is to use a data synchronization task and use $LastRunTime or any other appropriate data filter that can mimic the incremental load.
For example, you have data on a US server and you are reporting from an EU server, so you need to replicate the data from the US server to the EU server; that is offline reporting.
If we try to fetch data from another server location we may face latency issues; that is why we opt for a replication task to create a replica of the data.
Another example: you have a target table and you need to look up on it, but it is taking time, so we create a local replica of that table and perform the lookup against it.
If the target is not available, it is created at runtime and the data is loaded; if it is available, the data is loaded incrementally.
Under Load Type you get the option to delete records, which is enabled only if you have audit columns in your table: aud_cols -- create_date, update_date, update_user.
Mostly used for backup tables, as in the replication case.
3. Synchronization Task --
Use the synchronization task to synchronize data between a source and a target.
For example, you can read sales leads from your sales database and write them into Salesforce. You can also use expressions to transform the data according to your business logic, or use data filters to filter the data.
You can use the following source and target types in synchronization tasks:
Database
Flat file
Salesforce
In a synchronization task you can load one table at a time; you can combine multiple tables into one using a join SQL query.
The 2nd load is an incremental load, but unlike SCD it just updates all rows with the actual updated rows from the source.
You can add mapplets and edit datatypes in a synchronization task.
You have options to create an expression or use functions on any column when you click on the specific column name.
4. PowerCenter Task -- we can run PowerCenter mappings, sessions, and workflows in IICS, but it cannot have more than 1 session, 1 mapping, and 1 workflow; if it does, it will fail.
You need to mention, in the IICS PowerCenter task, all the parameters defined for PowerCenter.
This one is for loading data from source to target without creating a mapping, by mentioning details such as the source, pre-SQL, SQL override, post-SQL, a 2nd source (which acts as a join), target details, field mapping, and runtime options (scheduling).
Fields get auto-mapped.
6. Dynamic Mapping Task
Parameterization of the mapping; if we have multiple sources, they should belong to only one database.
Create a mapping that has only parameter values, with no direct values for the source and target tables.
Add values to that mapping by using the dynamic mapping task, where it acts as one mapping but with multiple data pipelines.
Parameterized mapping: for this, field mapping will not be enabled, so you need to make sure both target and source have matching columns for proper data loading.
Using the New Parameter option we can create parameters.
For this task you first need to create one dummy mapping that has all values parameterized.
In the Jobs section we can mention as many sources and targets as we want to process.
If two jobs have the same group, they run at the same time.
We also have mapplets.
Using a mapplet in a mapping:
TASKFLOWS (WORKFLOW IN PC)
Different taskflows:
1. In the first type you cannot add task steps; use it when you want different tasks to execute in a sequential flow.
You can change the order by changing the numbering of the respective tasks.
2. Taskflow
In this one we can use all the components, such as taskflows and task steps, for integration and work on them.
3. Parallel Tasks
Running different tasks at the same time, in parallel; the main thing is that you should not use the same target table while using parallel tasks.
Clicking the plus symbol adds another linking path for another parallel task.
A normal parallel task with the addition of a decision means that if the mentioned taskflow fails, succeeds, or meets any other condition, it proceeds with the action mentioned in the decision task.
The picture below shows a decision dependency on the mapping task: if it equals 1, meaning success, the success email notification task is sent,
and on the other path, path 2, the failure email notification task is sent.
You can also customize what details of the current run you need in the mail.
In this one we can add any task steps in the taskflow; use it when we want different tasks to execute in a sequential flow.
7.Single Task
TASK STEPS
1. Assignment
For assigning values you need a variable in which to store the value.
For example, if the status of both tasks is success, i.e. 1, then the variable gets the value Pass, otherwise Fail.
In the Decision step, mention the values based on the string.
2. Data Task
3. Notification - Email
You first need to create a File Listener component and mention that component under the taskflow File Watch task.
A file listener listens for files at a defined location.
A file event occurs when new files arrive in the monitored folder or the files in the monitored folder are updated or deleted.
In Schedule you set at which interval, and between which times, it should check for the file.
For the scenario above it works like this: as soon as we start running the taskflow it keeps searching for the file with the mentioned directory and filename, and if it finds it, it deletes it.
When source teams upload the files, they are told to upload an indicator file once all files are uploaded.
So the file listener listens for that file, deletes it, and the further process starts running.
Say we have one workflow that should start every day at 7:30 pm, but it has a dependency on files and on data in specific tables, and only then should the workflow start.
So we can use a file listener / file watch for that scenario to proceed further.
Decision
Wait
Parallel
Jump
By the time the data task gets executed, the command task will have successfully executed the command and created the master file, i.e. config.txt.
For example, here we have used a Windows batch command, but in a real-time project these would be Unix-based commands.
For Windows --> the file should have the extension .bat (batch file).
This command lists the files under the mentioned folder and its subfolders and creates a file with the mentioned name at the mentioned path.
The command below creates a file that contains 'ECHO is on' and therefore shows a size of 1 KB.
The command below creates a 0 KB file with no content inside it.
Touch file or indicator file concept: once taskflow 1 of the source team completes successfully, it generates a touch file, which acts as an indicator file for starting the next taskflow or the next process run.
While the source team is transferring the data or files over SFTP, we should not interrupt the process.
The source team can place a touch (indicator) file once the whole process is completed; the next team reads that indicator file, deletes or modifies it using the File Watch task, and proceeds further.
Scenario: if the task runs for more than 5 minutes, it should send a mail and it should fail,
so that the mail reaches the support team and the team takes appropriate action on the taskflow.
Each thread has a load summary mentioned; only if you have big data, such as 100k rows, is a load summary generated, with data such as busy % and idle time.
Such as:
Busy %: if the busy percentage is 100 or more, it means the thread is utilizing all resources for that respective operation, and we may be getting a timeout.
Idle Time: 0%
If the source or writer is an RDBMS, we need to run an explain plan on the query and improve performance: check whether it is going for a full table scan or another access path.
If the writer (target) has indexes, we should drop the indexes, load the data, and then enable/create the indexes again in a post-SQL.
SQL hints
Collect stats
1. Pushdown Optimization --> converting the transformations into SQL queries and pushing them down to the source/target/both; we cannot do PDO when we are dealing with flat files.
Note: for running PDO you need to create a mapping task.
a. To Source: it traverses from the source side and tries to convert the transformations into SQL queries, pushing them down to the source as SELECT statements.
b. To Target: it traverses from the target side and tries to convert the transformation logic into SQL queries, pushing them down to the target as INSERT or UPDATE statements.
In the mapping you can see three dots in the top right corner > Pushdown Optimization > Preview Pushdown.
Next, under Pushdown Options, you can select which type of pushdown optimization you want to do.
With the method above you get a preview of the query generated by the PDO method; in case it is not working you can go for the method below:
--> while creating the mapping task, after entering definition details such as the mapping name and environment,
--> next is Schedule; when you scroll down you get the Pushdown Optimization option, where you can select the type of PDO.
For example, you have a mapping without PDO applied: running it reads around 105 records and the target is loaded with 78 records, because a Filter transformation reduces the data.
But when the mapping uses PDO to source, it reads around 78 records and loads those 78 records into the target. This is how PDO can help performance. In the session log you will be able to see the SQL query the transformations were converted into.
Partitioning is available only at the source level, not at any transformation level.
Create mapping
If the range of the records' data changes, i.e. increases, it will not work.
If employee_id does not fall in the ranges mentioned above, it will not be considered.
Max: 64 partitions
If you want table-level metadata, you need to get it through a REST API call.
To get dependencies you go to the three dots of the mapping, where you get the Show Dependencies option.
Lookup Cache
2.Lookup caches
3.Reading Data
Cache means that whichever table we are using, that table's data is taken into our local memory (RAM) for fast retrieval; the lookup takes data from the cache instead of going to the lookup table.
Both caches vanish once the session or task is completed, whether it succeeds or fails.
Once the cache is created, it does not get changed during the run.
If the lookup table's values change during runtime, the static lookup cache will not be changed.
The static cache does not change while the integration service processes the lookup.
Remove all other fields except customer_key, customer_id, and md5_checksum so that the lookup cache does not get built to a large size.
The lookup advanced properties must be checked, and the lookup source filter needs to be kept as end date 12/31/9999 to get only the active records for comparison.
Route the data based on the NewLookupRow port values.
Use a dynamic lookup cache to keep the lookup cache synchronized with the target.
When you enable lookup caching, a mapping task builds the lookup cache when it processes the first lookup request. The cache can be static or dynamic; if the cache is dynamic, the task updates the cache based on the actions in the task, so if the task uses the lookup multiple times, downstream transformations can use the updated data.
You can use a dynamic cache with most sources, but you cannot use a dynamic cache with flat file or Salesforce lookups.
If we create a dynamic lookup, we get one extra column called NewLookupRow in the Lookup transformation.
The NewLookupRow port consists of values such as 0 (no change), 1 (insert into the cache), and 2 (update the cache).
With a dynamic lookup cache, it first inserts the data into the lookup cache:
1001-1
1002-1
1003-1
1004-1
1002-0
1005-1
1003-0
Based on the NewLookupRow port value it decides whether to update or insert the data.
3.PERSISTENT CACHE
If you want the cache to be used by many mappings, you can use this cache.
Normally the cache vanishes once the mapping task executes successfully or fails.
For this you use persistent cache enabled together with a named shared cache.
If you have multiple mappings under one taskflow, of which a few mappings use the same cache,
then it might take 5 minutes per mapping task to build the lookup cache, so around 20 minutes for 4 mappings.
Instead we can create a named cache in mapping 1 and have it used by the other mappings to avoid that time consumption.
We can enable re-cache, which means that if the data changes in between, it rebuilds the lookup cache to avoid loading wrong data again.
The very first time you run, you need to enable re-cache if the lookup data is being updated.
1. When the source has only 1000 records and the lookup table has 100k, we can go for the lookup table directly instead of creating a cache, and while querying the lookup table we can remove the ORDER BY clause to avoid spending more time on the lookup.
2. Pass a sorted input to the lookup cache so that it does not have to sort on its own, which takes a lot of time.
# Comments
$$a=10
$ --> system-defined
$PM
Connections, values, source query --> parameterization
IICS:
IN --> value passed into the mapping
Normal 1-to-1 load
You need to create the parameter first.
You need to mention the file that holds the parameterized values.
In the session log you can view how the parameter value is placed and fired as a SQL query to the DB.
You can create parameterization for connections also.
Variables
Their value changes each and every time; between today and tomorrow there might be a difference in values.
Project Related
Methodology - Waterfall, Agile
Effort estimation = ETA (how many resources are needed and how much time it will take)
6. Cert - Certification
7. Go live
8. Production support - on-call
L1, L2, L3 support
After go-live the developer needs to provide KB (knowledge-based) documentation or a knowledge transfer to the support team.
Waterfall vs Agile
Filesystem -- SFTP - WinSCP
We will not always be hitting the source for data; instead, we load the data once into an ODS (Operational Data Store), a central database that provides a snapshot of the latest data from multiple transactional systems for operational reporting.
What data is needed in the target and how the transformations need to be applied to the data:
usually the developer, or else the Business Analyst, develops the source-to-target mapping sheet (STTM file).
Some columns may have a direct mapping and some may have business rules and data cleansing (LTRIM and RTRIM).
Migration in real time
The PowerCenter task has the limitation that we cannot modify the mappings, it should be a direct mapping, and the workflow should have only one session.
Migration Factory
Internal utilities
The main source of the data is SFTP (the client or vendor has access to one of the environment's SFTP locations and places the file there).
Data masking is done for PII (personally identifiable information); encryption and decryption happen when accessing that data.
QA stage, where the data validation happens: all valid data is forwarded further, and rejected records/data are passed back to the source teams with a report.
The semantic layer is the last layer; it has materialized views or views, and the materialized views get refreshed.
Initial Load --> extracting the data from the source for the first time.
Historical Load -->
Delta Extract --> you have extracted the data the first time, but the data is changing daily.
Incremental Load --> the data gets inserted and updated; this never has a truncate-and-load.
Agile:
EPIC:
Jira-User Story-US230000128
Sprint Basis
For example, you started on Wednesday, 10th May, so on 20th May you need to do a demo or review of what was done in that sprint timeline.
1 product owner
1 scrum master
1 business analyst
Max 8 or 9 developers
Sprint stand-up meeting --> daily at 11 am / 3 pm / 9 pm -- discussion of the project progress
Meetings:
Sprint planning
Sprint demo
Sprint retrospective
If you are not able to complete any task, it is carried forward to the next sprint.
While working on the current sprint we may have, on alternate Mondays or any other day, planning for the next sprint.
If you are working in production support and you get the same issue multiple times, you create a problem ticket.
WinSCP is used for file transfer from one server to another server --> in the prod environment we will not have access to move the files.
If you have to load a large volume of data daily, you can follow intra-day loading instead of one load for the whole day.
Different Schedulers
IICS Scheduler-Native
UC4-Automic
Control-m
Apache Airflow
Autosys
There are various UC4 objects available to fulfil your scheduling requirements.
1. Jobs -- jobs are the basic building blocks in UC4. For each program that needs to run (for example an FTP transfer or a database load) a job must be created. A job contains all the information required to execute the program or script on the server and handle the output. When a job is created, it specifies the program location and the input and output parameters. Jobs are run both individually and as components of UC4 process flows.
Furthermore, a job can be a component of any number of process flows. If a job definition is changed, the change is applied to every process flow that includes it.
2. Job Plan --> jobs are combined to create process flows. Process flows are equivalent to job streams and run any number of jobs. Process flows include scheduling and exception-handling information. When jobs and process flows are added to a process flow, these objects are referred to as process flow components.
3. Events --> jobs/job plans can be triggered based on time or the existence of a file. There are two types of event objects.
3.a File event --> this is the file-watcher object, which senses for a file and, if found, triggers the actions.
4. Schedule --> the schedule is the parent of all objects. The objects to be scheduled need to be placed in the schedule object, where the frequency of the job/job plan can be added. The schedule loads every midnight and loads the objects to be triggered. A schedule runs for 24 hours a day and gets auto-reloaded at 00:00 midnight for the next day's execution.
A job plan contains mapping tasks, file transfer tasks (remote server to local server), archival jobs, and file watcher jobs.
In a JOB, under Variables & Prompts > Variables, you will have the job name, job type, and param location.
You create everything in the development environment; after that we need to deploy all the dependencies (mappings, mapping tasks, and so on) and then deploy the UC4 job components.
Advantages of UC4
Improves current data processes, such as automating processes that are currently manual.
Improves job handling across disparate systems, especially those that have specific output/input dependencies, sequences, etc.
Provides the ability to check and validate jobs and to notify on failure.
Extends services by providing the ability to securely move data from one system to another, which cannot easily be done today.
Allows for more informed and calculated decisions on maintenance schedules and impact analysis for scheduled jobs.
By default, if you make changes to a schema, Data Integration does not pick up the changes automatically. If you want Data Integration to refresh the data object schema every time the mapping task runs, you can enable dynamic schema handling.
A schema change includes one or more of the following changes to the data object:
Commit Point
A commit interval is the interval at which the Integration Service commits data to the target during a session.
The commit point can be a factor of the commit interval, the commit interval type, and the size of the buffer blocks.
The commit interval is the number of rows you want to use as a basis for the commit point.
The commit interval type is the type of rows that you want to use as the basis for the commit point.
1. Target-based
The Integration Service commits data based on the number of target rows and the key constraints on the target table.
The commit point also depends on the buffer block size, the commit interval, and the Integration Service configuration for writer timeout.
2. Source-based
The Integration Service commits data based on the number of source rows.
The commit point is the commit interval you configure in the session properties.
3. User-defined
The Integration Service commits data based on transactions defined in the mapping properties. You can also configure some commit and rollback options in the session properties.
Source-based and user-defined commit sessions have partitioning restrictions. If you configure a session with multiple partitions to use source-based or user-defined commit, you can choose pass-through partitioning at certain partition points in a pipeline.
Commit interval --> 10,000 by default; it commits after every 10,000 records. If you have more data you can make the commit interval larger as well.
Commit on end of file --> you have 5 files and after each file it commits on the target side; if you do not enable it and you get an error at the 5th file, none of the 5 files ends up loaded.
Recovery strategy --> (for production support) -> resume from the last checkpoint: it starts from where it failed last time.
Incremental Aggregation --> today you have aggregated some values and you got the final aggregated value;
so when it starts again tomorrow, it takes yesterday's aggregate value from the cache and aggregates it with today's values.
Informatica takes Oracle table names only up to 30 characters --> to go for more characters at the Informatica level you can use a long name for Oracle.
Target load order --> in the mapping you have Flow Run Order --> here we need to manually assign the order of data loading.
Constraint-based load ordering -- it automatically selects the order, i.e. which table to load first, to avoid constraint-violation issues.
Deadlock
If 2 jobs, i.e. 2 different mappings, load the same table at the same scheduled time, we get a write-intent lock; it says the table is being used by another process.
If you use deadlock retry in the session/mapping task --> when m1 is loading and m2 is also loading, it retries mapping m2 instead of failing, provided deadlock retry is enabled.
Stop on error --> mostly we keep it at 0, which means the task does not fail on row errors; it just rejects the error records and continues.
Tracing level
Verbose Initialization --> in addition to normal tracing, the Integration Service logs additional initialization details, the names of the index and data files used, and detailed transformation statistics.
Verbose Data --> in addition to verbose initialization, the Integration Service logs each row that passes into the mapping. It also notes where the Integration Service truncates string data to fit the precision of a column, and it provides detailed transformation statistics.
When you configure the tracing level to Verbose Data, the Integration Service writes row data for all rows in a block when it processes a transformation.
In production we cannot debug or preview data; we can only analyse and backtrack through the session log.
Transformation -- Active or Passive
Source -- Active
Expression -- Passive
Aggregator -- Active (used to perform calculations on the data such as sums, averages, counts, etc.)
Joiner -- Active (the number of rows in the Joiner output may not be equal to the number of rows in the Joiner input)
Rank -- Active (the Rank transformation has an output port by which it assigns a rank to the rows; our requirement is to load the top 3 salaried employees for each department)
Router -- Active
Union -- Active
With the STOP command on the session task, the Integration Service stops reading data from the source, although it continues processing the data already read and writing it to the targets. If the Integration Service cannot finish processing and committing data, we can issue the ABORT command. The ABORT command has a timeout period of 60 seconds.
Questions asked:
1. In the Sequence Generator there are 2 ports, CURVAL and NEXTVAL. Which one is larger / has a higher value? - CURRVAL is the larger value, because it is NEXTVAL plus the increment. You can optimize performance by connecting only the NEXTVAL port in a mapping.
2. Union transformation: does it perform a UNION or a UNION ALL? - It performs a UNION ALL. It is an active transformation: although we may have duplicate records, it is still not passive, because there is a change happening to the row numbers, which is why it is active. To get rid of duplicates, you can select Distinct through a Sorter transformation.
3. You have an unconnected lookup; in which scenario will you use only an unconnected lookup, where a connected one is not possible? - When you want to call a lookup based on a condition, such as: if dept=10, then call this lookup, otherwise call a different lookup.
4. What is the reason that sorted input improves the performance of the Aggregator? - When the data is sorted on, for example, deptno, Informatica knows a group is finished as soon as a new deptno appears, so it can release that group. Without sorted input, while reading data into the cache Informatica does not know whether it has seen the last record for dept 1, 2, or 3, so the cache size increases and performance decreases.
5. If you have 10 records in a table: SELECT * FROM table WHERE ROWNUM < 5, SELECT * FROM table WHERE ROWNUM = 5, SELECT * FROM table WHERE ROWNUM > 5 -- what will be the output? - ROWNUM < 5 gives you 4 records; ROWNUM = 5 and ROWNUM > 5 will not return any rows, because ROWNUM is assigned only as rows are returned, so the first candidate row always has ROWNUM = 1 and fails the condition.
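A hedged sketch of the usual Oracle workaround when you do want rows beyond a given position (table name assumed):
-- Materialize ROWNUM in an inline view, then filter on it in the outer query
SELECT *
FROM (SELECT t.*, ROWNUM AS rn FROM my_table t)
WHERE rn > 5;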
6. Is NULL treated as the highest or lowest value? - NULLs have the highest value (in Oracle they come last in an ascending sort). You have the option to specify NULLS FIRST/LAST in the ORDER BY clause.
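A small hedged illustration (the EMPLOYEES table is assumed):
-- By default (ascending) NULLs sort last in Oracle; override it explicitly if needed
SELECT employee_id, commission_pct
FROM employees
ORDER BY commission_pct NULLS FIRST;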
7. What are the limitations of pushdown optimization? - A variable port cannot be used. Normalizer, parameter files, and XML transformations will not work with it. It does not work when loading from a flat file to a database. The source and target must be the same database.
8. What are the restrictions of a mapplet? - Normalizer and XML transformations cannot be used. A Sequence Generator must be reusable. You also cannot have a target in a mapplet, so an update strategy cannot be used either.