Talend Scenario Based Interview Questions
1. Talend – Merge multiple files into single file with sorting operation.
2. Loading Fact Table Using Talend
3. ROWNUM Analytical Function in Talend
4. SCD-2 Implementations in Talend
5. Deployment strategies in Talend
6. Custom Header Footer in Talend
7. Data Masking Using Talend
8. How to use Shared DB Connection in Talend
9. Load all rows from source to target except last 5
10. Late Arriving Dimension Using Talend
11. Date Dimension Using Talend
12. Dynamic Column Ordering Of Source File Using Talend
13. Incremental Load Using Talend
14. Getting Files From FTP Server
15. Initializing Context At Run Time Using Popup
16. User Define Function In Talend
17. Calling DB Sequence From Talend
1. What is Talend?
2. What is the difference between the ETL and ELT components of Talend?
3. How do you deploy Talend projects?
4. What versions of Talend are available?
5. How do you implement versioning for Talend jobs?
6. What is the tMap component?
7. What is the difference between the tMap and tJoin components?
8. Which component is used to sort data?
9. How do you perform aggregate operations/functions on data in Talend?
10. What types of joins are supported by the tMap component?
11. How do you schedule a Talend job?
12. How do you run a Talend job as a web service?
13. How do you integrate SVN with Talend?
14. How do you run Talend jobs on a remote server?
15. How do you pass data from a parent job to child jobs through the tRunJob component?
16. How do you load context variables dynamically from a file/database?
17. How do you run Talend jobs in parallel?
18. What are context variables?
19. How do you export a Talend job?
1. Talend – Merge multiple files into single file with sorting operation.
Scenario: Merge multiple input sources into a single target, adding the file name as an extra column and applying a sorting operation across all the files in the flow.
The snapshot below shows the overall mapping.
As the source we use tFileList, which picks up all the source files from the specified directory; the tFileInput component then reads the files one by one. In the source file path, you have to specify the global variable that holds the path of the file currently being processed:
((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))
Later, tMap is used to add one extra column holding the name of the file; the file name is again retrieved through a global variable, in our case:
((String)globalMap.get("tFileList_1_CURRENT_FILE"))
After the first subjob completes, an OnSubjobOk trigger fires the rest of the work.
Now, tBufferOutput collects all the source file data at once, and tBufferInput later pulls back all the buffered data produced by tBufferOutput.
Finally, the tSortRow component sorts the combined data from the three source files using some column as a key, and a tFileOutput component produces the final output.
That's all; now you can create the job and simply run it.
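For readers who prefer to see the flow as plain code, here is a minimal Java sketch of what the job does (read every file in a folder, tag each line with its file name, then sort). The directory path C:/source_files, the comma-separated layout, and sorting on the first column are assumptions for illustration only, not part of the original job:

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class MergeAndSortDemo {
    public static void main(String[] args) throws IOException {
        List<String[]> rows = new ArrayList<>();
        // tFileList + tFileInput: read every file in the directory, one by one
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get("C:/source_files"))) {
            for (Path file : files) {
                for (String line : Files.readAllLines(file)) {
                    // tMap: add an extra column holding the current file name
                    rows.add(new String[] { line, file.getFileName().toString() });
                }
            }
        }
        // tSortRow: sort the merged data on the chosen key column
        rows.sort(Comparator.comparing(r -> r[0]));
        for (String[] r : rows) {
            System.out.println(r[0] + "," + r[1]); // tFileOutput: final output
        }
    }
}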
2. Loading Fact Table Using Talend
Now, in tMap, join all the dimensions with your source data using the keys, fetch the SURROGATE_KEY from each, and put all those surrogate keys into the fact table. In my join conditions I have used an inner join as the join model.
In the tMap component I have used some calculations to work out the discount percentage and the total value of the order; this may be anything, depending on your requirements.
Note: keep data type conversion in mind, otherwise it will give you a lot of trouble. In my case I converted the data types in the staging area itself.
3. ROWNUM Analytical Function in Talend
Analytical functions are quite useful in SQL and help us avoid extra coding. We encountered a scenario where we had to implement something similar to the ROW_NUMBER() OVER (PARTITION BY ...) analytical function in Talend.
We will give an overview of the Talend job we created to achieve this functionality.
Prerequisites:
Create a context variable and initialize it to 0.
Scenario: Implement the ROW_NUMBER() OVER (PARTITION BY ...) analytical functionality in Talend.
The image below gives an overview of the job; the components to focus on are tMemorizeRows and tJavaRow.
This configuration specifies that we memorize the ID column, and the row count specifies that we remember up to 2 rows flowing into the tMemorizeRows component.
The most important part of the job is configuring the tJavaRow component, where we specify the logic implementing this analytical function.
The Java code compares the previous and current row using the index into the tMemorizeRows arrays, as shown in the snapshot.
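As a hedged sketch, the tJavaRow logic could look like the following. It assumes tMemorizeRows_1 memorizes the ID column over 2 rows, that the context variable from the prerequisites is named rownum, and that the output schema has an extra rownum column; these names are assumptions, not taken from the original job:

// tMemorizeRows exposes the memorized column as an array in globalMap:
// index 0 is the row currently flowing, index 1 is the previous row.
String[] ids = (String[]) globalMap.get("tMemorizeRows_1_ID");
if (ids[1] == null || !ids[0].equals(ids[1])) {
    context.rownum = 1;                  // partition key changed: restart numbering
} else {
    context.rownum = context.rownum + 1; // same partition: increment the counter
}
output_row.ID = input_row.ID;
output_row.rownum = context.rownum;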
4. SCD-2 Implementations in Talend
CREATING STAGING TABLE
1) From tOracleInput_3, bring in the max surrogate key present in your dimension table using the query
"SELECT max(user_profile_sur_key) FROM ANKIT_KANSAL.user_profile"
USER_PROFILE IS OUR DIMENSION TABLE.
2) Create a context variable snumber of Integer type with a default value of 0, and in the next step assign the max surrogate key to this variable using a tJavaFlex component.
4) tOracleInput_4 pulls all your data from the source Oracle system. KEEP ALL THE SOURCE COLUMNS IN VARCHAR2 FORMAT ONLY.
5) Using the tMap component, calculate your dates; you can also separate your GOOD and BAD records depending on the business requirements, as follows:
* If the two data structures do not match (staging may hold the data in string format for that column), you can still achieve the join by converting the data type within tMap itself, e.g. Integer.parseInt(row1.USER_ID) at the join level.
CATCHING FRESH INSERTS:
On the right-hand side you can achieve this functionality by setting "Catch lookup inner join reject" to true.
IF A ROW ALREADY EXISTS, INSERT A NEW RECORD WITH THE FRESH VALUES AND UPDATE THE OLD RECORD WITH AN END DATE.
To update the existing record, assign the end date the current date.
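To make the expiry step concrete, here is a hedged sketch of the UPDATE that the update output effectively performs against the dimension; the column names end_date and user_id are assumptions based on this example, not confirmed by the original post:

UPDATE user_profile
   SET end_date = SYSDATE
 WHERE user_id  = <incoming user id>
   AND end_date IS NULL;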
6. Custom Header Footer in Talend
tFileInputDelimited_1 is used to read the input source file; just set the record limit to 1 while fetching, and in tJavaFlex_1, in the main body code section, write the header you want to populate in the destination file.
In my case I used:
row3.first="First Name";
row3.last="Last Name";
Please make sure that in your tFileOutputExcel_1 you uncheck the "Include header" option. This completes your first job.
Now, in your second job, read your input source file, process the data as per your requirements, and write the processed data to the same output file defined earlier; again uncheck the "Include header" option, and check the "Append existing file" and "Append existing sheet" options.
Last, in your third job, fetch the records with limit = 1 and select only one column in your schema definition; then, in the tJavaFlex component, use the following code to write the number of rows:
row5.first=((Integer)globalMap.get("tFileInputDelimited_3_NB_LINE")).toString()+" "+"rows";
* The file name must be the same in all three jobs, with the different options selected as described.
7. Data Masking Using Talend
Data masking is the process of obfuscating data fields to protect data that is classified as personally identifiable, personally sensitive, or commercially sensitive. The data must nevertheless remain usable for running valid test cycles, and it must still look real and appear consistent.
Production systems generally contain highly sensitive data, and you cannot take the risk of making it available to others without some changes; at the same time, realistic data of a similar shape is needed for testing and for judging application performance.
Considering the current social-networking boom, if user-related information became available without any security measures, it could lead to disastrous results. So, to overcome these problems, some masking/encryption strategy is required that fulfills these needs.
Now, implementing data masking using Talend.
Overall job descriptive image:
JOB DESCRIPTION:
Source file: contains all the valid data, which you do not want to hand over directly, given the known risks involved.
Replacement file: contains the data used to replace matching values from the source.
Take a tReplaceList component from the palette and define its properties as follows:
"Lookup search column" is the column containing the same data present in your source file column, and the corresponding "Lookup replacement column" contains the data used to replace the matches found in the source column.
In the Columns options, check the box for each column on which the search, and the simultaneous replacement, should be performed.
e.g.
Your source column contains:
Text(/*Col_name*/)
"Hello this is a text which is to be replaced"
Output generated:
HO this tis a msg haha tis to be wow
Finally, using a tSendMail component, the transformed file is sent to the destination with all the security measures applied.
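As a rough illustration of what tReplaceList does under the hood, here is a self-contained Java sketch of list-based replacement; the word pairs are hypothetical stand-ins for the replacement file, not the actual lookup data:

import java.util.LinkedHashMap;
import java.util.Map;

public class MaskingDemo {
    public static void main(String[] args) {
        // replacement dictionary (in the job this comes from the replacement file)
        Map<String, String> lookup = new LinkedHashMap<>();
        lookup.put("Hello", "HO");
        lookup.put("text", "msg");
        String source = "Hello this is a text which is to be replaced";
        // apply every lookup/replacement pair to the source column value
        for (Map.Entry<String, String> e : lookup.entrySet()) {
            source = source.replace(e.getKey(), e.getValue());
        }
        System.out.println(source); // masked output
    }
}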
8. How to use Shared DB Connection in Talend
Today we will discuss how to use or register a shared DB connection across your jobs. In real-time scenarios it is not recommended to create connections again and again; instead of creating new connections, you can reuse a registered connection throughout your subjobs.
For demonstration purposes we have created three jobs that share a single connection.
JOB DESCRIPTIONS:
1. shared_conn_demo: the job that holds our two child jobs (parent, child).
2. parent_db_conns: the primary job, in which the shared connection is registered for the subjobs.
3. child_job_conn: the third job, which uses the connection created in the parent job.
STEPS
1) parent_db_conns: create a connection as you normally would, then in the component properties check "Use or register a shared DB connection" and enter the name you want to give the shared connection.
NOTE: the name must be in double quotes ("").
2) shared_conn_demo: now just connect both jobs and run them. Use an OnSubjobOk trigger.
That's it: now you can share your connections throughout all your jobs and save on connections.
9. Load all rows from source to target except last 5
SCENARIO
Today we will discuss a situation in which we have to load all source records into the target except the last 5.
SOLUTION
We need a couple of components to meet this requirement, as shown in the snapshot below.
SEQUENCE CREATION
Using the above class and method you can generate a sequence in Talend.
LOGIC
Pull all the rows from the source, and in tMap create a sequence using the Numeric.sequence method. Then sort all the rows in descending order with tSortRow, keyed on the sequence generated in tMap, and create another sequence in a second tMap. As a result, the last rows from the source, which are now at the top because of the sort, are assigned a sequence starting from 1.
Finally, use a tFilter to restrict the rows you want to drop; in my case it is 5. Just use the filter and give a condition such as the one below.
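A hedged sketch of the tFilter advanced condition; seq2 (the second sequence, created after the descending sort) is an assumed column name from this walkthrough:

input_row.seq2 > 5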
Whatever number you use in the tFilter condition, that many trailing rows will be prevented from reaching the target.
10. Late Arriving Dimension Using Talend
In one of our previous posts we covered the concept and implementation of a late arriving dimension using Informatica. So here we will go straight to achieving the same functionality using the Talend DI tool.
COMPONENTS REQUIRED
• tOracleInput
• tOracleOutput
• tLogRow
• tJavaFlex
• tMap
Given below is the complete job description.
JOB DESCRIPTION:
1. Create a first job that brings the max surrogate key (using a SQL query) from the production dimension table and saves it in a context variable using a tJavaFlex component. In the main code section of tJavaFlex write:
context.prod_skey=row4.prod_skey;
2. Create a second job that runs after successful completion of the first job; here, the tOracleInput_1 component brings in all the source data that resides in your staging area.
3. Pull in two more Oracle input sources that act as lookups: one for the prod_dim table and the other for the cust_dim table.
4. Configure tMap as described in the figure below. Use the Left Outer Join model on the left-hand side for both lookups, i.e. for customer and product.
THIS JOB IMPLEMENTS THE LATE ARRIVING DIMENSION CONCEPT FOR PROD_DIM ONLY.
5. The top-right output in tMap is used to populate your normal fact table, but make sure to check that the prod_dim surrogate key is not null.
6. The second insert is used to load the fact rows whose record contains no surrogate key, the late arriving dimension case; here you just increment your context variable and populate the key using ++context.prod_skey.
7. Finally, insert a record into the product dimension: add one more output in tMap, named e.g. insert_prod, with the complementary check condition, and use the incremented context variable.
8. In my job I used tLogRow for logging purposes; in a real scenario you would use a tOracleOutput component to write to the DB.
9. For the prod_dim insert output in tOracleOutput, remember to use "Update or insert", so that when changes arrive later they directly update the database row, SCD1-style.
11. Date Dimension Using Talend
So, in this post we will demonstrate the same implementation using Talend Open Studio.
COMPONENTS TAKEN:
1) Oracle DB (tOracleConnection)
2) tMap (for data manipulation)
3) tConvertType (to change the data type)
4) tLogRow (to display the result)
COMPLETE JOB DESCRIPTION:
job_info
Now, in the tOracleInput component define two columns in the schema, both of Date type, and in the SQL editor write the query given below:
"select sysdate+level, trunc(sysdate+level,'q') from dual connect by level < 366"
The first column returns dates serially from the current date, and the second column gives the first date of the quarter for each of them.
So, this is how you can implement a basic date-time dimension. Beyond this, you can extend the implementation as per your requirements.
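For example, the same query can be extended with more date attributes. The extra column expressions below are a hedged sketch using standard Oracle to_char formats, and the aliases are assumptions:

"select sysdate+level                  as cal_date,
        trunc(sysdate+level,'q')      as quarter_start,
        to_char(sysdate+level,'yyyy') as cal_year,
        to_char(sysdate+level,'mm')   as cal_month,
        to_char(sysdate+level,'dy')   as day_name
 from dual connect by level < 366"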
12. Dynamic Column Ordering Of Source File Using Talend
In one of our last posts we demonstrated how to handle dynamic column ordering of a source file using Informatica, finally generating a fixed output. So, in this post we will show you how to implement the same functionality using Talend Open Studio.
OVERALL JOB DESC:
job desc
First of all, create the context variables that will help you build the complete job, as given in the screenshot.
The first variable, count, is used for a check that I will explain later in the post. Then create one context variable per column in your source file; I have three columns for this demo, so I created three variables: var1, var2, var3.
I have a comma-separated source file, which is brought into the job using a tFileInputDelimited component.
Remember to read the data from the first row onwards; do not skip the first row.
Now, in the tJavaRow component, build a schema as given in the attached snapshot.
Finally, use the code given below in the tJavaRow component to complete the job.
*****************************CODE BEGINS*************************************
//used to perform the checks only once, on the basis of the first row.
context.count=context.count+1;
if(context.count==1)
{
System.out.println("in first if");
//check which column position holds which field using the first (header) row coming
//from the source file, and record it in the matching context variable; the same
//logic is applied for all the columns involved.
if(input_row.field1.equals("ename"))
{
context.var1=1;
}
if(input_row.field2.equals("ename"))
{
context.var2=1;
System.out.println("ename var2");
}
if(input_row.field3.equals("ename"))
{
context.var3=1;
}
if(input_row.field1.equals("deptno"))
{
context.var1=2;
}
if(input_row.field2.equals("deptno"))
{
context.var2=2;
}
if(input_row.field3.equals("deptno"))
{
context.var3=2;
}
if(input_row.field1.equals("salary"))
{
context.var1=3;
}
if(input_row.field2.equals("salary"))
{
context.var2=3;
}
if(input_row.field3.equals("salary"))
{
context.var3=3;
}
}
//at this stage the mapping variables already know which position contains which
//data, so use them to forward each incoming field to the correct schema column.
if(context.var1.equals(1)){ output_row.ename = input_row.field1;}
if(context.var2.equals(1)){ output_row.ename = input_row.field2;}
if(context.var3.equals(1)){ output_row.ename = input_row.field3;}
if(context.var1.equals(2)){ output_row.deptno = input_row.field1;}
if(context.var2.equals(2)){ output_row.deptno = input_row.field2;}
if(context.var3.equals(2)){ output_row.deptno = input_row.field3;}
if(context.var1.equals(3)){ output_row.salary = input_row.field1;}
if(context.var2.equals(3)){ output_row.salary = input_row.field2;}
if(context.var3.equals(3)){ output_row.salary = input_row.field3;}
*Take all your source data as String data type, and later change the schema data
types using tConvertType.
******************************CODE ENDS***************************************
So, this is one of the ways you can handle dynamic column ordering of your source file.
13. Incremental Load Using Talend
Incremental load is a way of loading your data warehouse target tables such that only new or updated data from the source system affects the target system.
It is quite similar to reading a novel: on the first day you read some pages, and on the next day you start reading from the point where you left off, and so on.
So, today we will discuss one implementation technique for incremental loading using Talend Open Studio.
Overall job desc:
overall_structure
Execution steps:
1. Create two context variables. One holds the path of your parameter file (the file stores the last-run-date info); give it a default value, which you can later change through a properties file at deployment time. The other variable is used within the job to hold the last run date. Load the last-run context variable using tContextLoad.
2. Create a SQL query that hits your source database and pulls the data as per your needs. In my case the query has the condition tdate < sysdate and tdate > '"+context.Last_Load_Date+"'" (a full query sketch follows these steps).
Since in most cases reporting is done on data up to the previous day, and tdate is the column holding each record's insert/update date, we pull data as per the given condition.
Total Data Fetched = Data Greater than Last_Time_Run + Data Less than current_date.
3. Finally, update the parameter file that holds your previous run date with current_date-1:
"select 'Last_Load_Date;'|| to_char(sysdate-1,'dd-mon-yyyy') as prev_day from dual"
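Putting step 2 together, the tOracleInput query could look like the hedged sketch below; the table name sales_src is hypothetical, and the to_date format must match whatever was written to the parameter file:

"select * from sales_src
 where tdate < trunc(sysdate)
   and tdate > to_date('" + context.Last_Load_Date + "','dd-mon-yyyy')"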
15. Initializing Context At Run Time Using Popup
PROBLEM:
It often happens that your business logic remains the same but your parameters, such as file names, DB names, passwords etc., change frequently; to make things work you either have to change Default.properties or open Talend Studio and edit the values explicitly before running the job.
SOLUTION:
Talend provides a simple solution through the "Prompt" option on context variables: every time you run your job a popup window appears, and you just need to fill in your values.
3) Go to the Advanced settings tab of the Oracle output component and change the settings as defined in the screenshot.
tAggregateRow is a most useful component in Talend Open Studio. It works like SQL's GROUP BY feature: it receives a flow and aggregates it based on one or more columns. It is widely used to apply the common SQL aggregate functions such as min, max, and sum. After aggregation, tAggregateRow provides one output row per group.
Before jumping to the tAggregateRow component in Talend, let me explain aggregate functions in SQL.
An aggregate function always returns a single result per group of rows. It is generally used with a SELECT statement in SQL queries. For example, if we want the average of the maximum salary of each department in a given table, we can use this SQL query:
SELECT AVG (MAX (salary)) as AVG_SALARY FROM employees GROUP BY department_id;
AVG_SALARY
----------------
18929.33
The Talend Open Studio component (tAggregateRow) performs the same calculation and returns the same expected result. Some aggregate functions commonly used in Talend are min, max, avg, sum, first, last, list, and count.
Talend provides two components for the above data aggregation:
• tAggregateRow
• tAggregateSortedRow
The differences between tAggregateRow and tAggregateSortedRow are:
tAggregateRow
It does not sort the values on any key field; it simply returns the aggregated values.
All the standard SQL aggregate functions can be applied in this component.
Sorting the data, ascending or descending, can be achieved separately with tSortRow.
tAggregateSortedRow
This component sorts the data on the key field along with the aggregation.
It also provides an "Input rows count" setting, where we pass the number of rows we want to aggregate.
The screenshot below explains how to use tAggregateRow in Talend.
Here is the list of records available in my input file. Based on the first column, "Grappe", I need to aggregate this list and find:
• the list of all "Name of BTS" values for each group;
• the total number of "degree of BTS" values in each group.
Double-click the first component (the input file in the screen above) to set its properties and schema; please click Edit schema to add the schema of the input file.
Now set the properties for tAggregateRow: simply double-click it and configure it as per the screenshot above.
Select "Built-In" for the schema.
Under "Group by", provide the column name you want to group on.
Here I have applied two aggregate functions (count, list) to the input file and generated corresponding output columns.
Your job is now ready; after clicking the Run button in the toolbar, you get the output above.
In the output screen you can see the list of column values and their counts, but if you look closely, the data is not sorted.
If you want the result sorted, you can add tSortRow to the above job, or you can use tAggregateSortedRow instead.
Now let me create another subjob with tAggregateSortedRow and combine it into the main job. This will show you the differences between tAggregateRow and tAggregateSortedRow.
Follow the instructions according to the screenshot above. Double-click to locate your input file and add the schema by clicking the Edit schema button. Connect all three components via Main links.
Add tAggregateSortedRow from the component palette and set its properties similarly to tAggregateRow.
Under "Group by", add the input file column you need to group on.
For "Input row count", set the number of rows to be sorted; you can use your total row count here.
After setting the properties for tAggregateSortedRow, make it a subjob of the main tAggregateRow job above, so that the subjob only executes after the main job completes.
Now everything is in place. Execute the complete job and compare the two outputs.
You will find that the second subjob produces values sorted on the "Group by" field, thanks to tAggregateSortedRow.
2. How to execute multiple sub jobs in parallel in Talend
Sometimes in Talend we need to execute multiple subjobs in parallel. That means that while executing one subjob, Talend starts another subjob without interrupting the one already running. This is what is known as parallel execution in Talend; the same concept is known as multithreading in the Java language.
Before jumping to the main article, let me briefly describe the states of a Java thread:
1. New: an instance of the thread class has been created, but the thread has not been started.
2. Runnable: the thread is ready to run in the Java Virtual Machine (JVM), but the thread scheduler has not yet selected it as the running thread.
3. Running: the thread scheduler has selected the thread to run.
4. Blocked: the thread is still alive but currently not eligible to run.
5. Timed waiting: the thread is waiting for another thread, or for a timeout, for a specified time.
6. Terminated: the thread has exited.
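Just to ground those states, here is a minimal runnable Java sketch (illustration only, not Talend job code):

public class ThreadStateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> System.out.println("running"));
        System.out.println(t.getState()); // NEW - created but not started
        t.start();                        // RUNNABLE - ready to be scheduled
        t.join();                         // wait for the thread to finish
        System.out.println(t.getState()); // TERMINATED - it has exited
    }
}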
The theory above is only background, to show how Talend works in a multithreaded environment and how it handles the execution of multiple subjobs.
You need not worry about these technical details, though: simply follow the screenshots to enable multithreading, i.e. parallel execution of subjobs, in Talend Open Studio.
Note: make sure all your subjobs are independent of each other, so that parallel execution does not affect any other part of your job.
I simply used three components, tRowGenerator, tLogRow, and tJavaRow, connected via Main links.
tRowGenerator is used to generate random rows for testing purposes. I configured it to generate 5 dummy rows of three columns each for my first subjob.
Double-click tRowGenerator and click the (+) button to add column names to the schema; you can also assign each column's value under "Function", and set parameters for the function below the schema tab. Have a look at the screenshot.
Once you have defined your column names, lengths, and other parameters, make sure you enter a value for "Number of Rows for RowGenerator"; this number decides how many rows are generated. For this example I am generating only 5 rows. Click OK to finish.
Map it to a tLogRow to display all the generated records on screen. After that, connect it to a tJavaRow to display a custom message. I simply wrote:
System.out.println("Executing 1st Job");
Here is the screenshot.
Repeat the same subjob block three times to make three independent subjobs.
Make sure every tJavaRow writes a different message; for example, I used these 3 messages:
tJavaRow_1 System.out.println("Executing 1st Job");
tJavaRow_3 System.out.println("Executing 2nd Job");
tJavaRow_4 System.out.println("Executing 3rd Job");
Finally, my main job with three different subjobs is ready to execute.
Execute the job now and you will see that it runs sequentially: first job, then second job, then third job.
Now let me enable parallel execution for all three subjobs.
Go to the bottom-right corner of Talend Open Studio and click the "Parallel job execution" button.
Once you click this button, the Parallel Job Execution window opens, as below.
Threading is now applied to your job: this exercise will execute all three subjobs independently, based on the Java multithreading concept.
Save your job and click Run to execute it.
Here is the output after execution.
If you cannot see the "Parallel job execution" button in the bottom-right corner of your Talend Open Studio, go to Window in the menu bar, click Show view, then Talend, then select Job. After doing this, the button will appear in the bottom-right corner.
The tFileList component in Talend Open Studio is used for listing files and directories. You can iterate over it to get the list of all files and directories in the current folder as well as its subdirectories.
As usual, I am going to read configuration settings from an XML file, store them in global variables, and use them to connect to the desired folder.
Here is the content of my config file, "Config.xml", stored on my local drive:
<ServerConfiguration>
<!-- Input file configuration -->
<arg name="InputFilePath">C:/Config_Files/input_files</arg>
<arg name="LogPath">C:/logs</arg>
</ServerConfiguration>
tPrejob: it is always good practice to start the job from tPrejob. Once the tPrejob component is OK, I connect to tFileInputXML.
tFileInputXML: an input component used to map the XML file. Double-click tFileInputXML to set its properties.
Note: you can click "Edit schema" to add the columns available in your config file, and map them using XPath queries.
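As a hedged sketch, the tFileInputXML mapping for the Config.xml above could use XPath queries along these lines (the exact field names depend on your schema):

Loop XPath query: "/ServerConfiguration/arg"
name column XPath:  "@name"   (the name attribute of each arg element)
value column XPath: "."       (the text content of each arg element)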
tSetGlobalVar: used to store global values for the next components. Double-click tSetGlobalVar and provide the details.
Here you have just created two global variables. They are accessible throughout your entire Talend job via the predefined method globalMap.get().
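For example, a later component can read the stored values back like this (a sketch; the key names are assumed to match what you set in tSetGlobalVar):

String inputPath = (String) globalMap.get("InputFilePath");
String logPath   = (String) globalMap.get("LogPath");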
That finishes the configuration part of the job. Now let me jump to the file iteration part and list all the file and folder names.
File Iteration
1. tFileList
2. tIterateToFlow
3. tLogRow
tFileList: holds the list of all files and directories, and iterates over a set of files from a particular directory. This component comes with several useful properties, which you can set as per your needs:
Directory: (String)globalMap.get("InputPath")
File Type: 1. Both (list all files and all folders); 2. Directories (list only folders); 3. Files (list only files)
Include subdirectories: check this if you want to iterate into child directories.
Files: decide which types of file you want to iterate over; you can add more file masks by clicking the (+) button.
Order by: decide which files iterate first.
Order action: whether the files iterate in ascending or descending order.
For file operations, Talend provides some predefined variables for tFileList, for example the ones below.
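Inside the iteration these can be read from globalMap (a sketch; the key names follow the usual tFileList convention, with _1 being the component ID):

((String)globalMap.get("tFileList_1_CURRENT_FILE"))          // current file name
((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))      // current file's full path
((String)globalMap.get("tFileList_1_CURRENT_FILEDIRECTORY")) // current file's directory
((Integer)globalMap.get("tFileList_1_NB_FILE"))              // number of files iterated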
The tIterateToFlow component converts each iteration into a row of an input flow, so that the flow can be stored somewhere as a list.
In this example, tIterateToFlow receives each iteration, converts it into a flow, and we can write this flow to a file or store it in a log.
Double-click tIterateToFlow, then click Edit schema to add the column names.
You have now successfully designed the job. It will return all CSV files, since we used .csv as the file mask for the tFileList component above.
You can also download this example at the end of the article. If you have any issue running this job, don't hesitate to contact me; I shall try to help you.
Revision History
Version  Date        Description
1.0      04-10-2014  Initial Development
1.1      07-10-2014  Modification to Source and Target repository Schema
1.2      09-10-2014  Modification to transformation logic.
8. Provide a Sub Job title for every sub job to describe its purpose/objective.
9. Avoid hard-coding in Talend job components; use Talend context variables instead.
A context group allows you to use the same context variables in any number of jobs without having to create them and assign their values again. Imagine your project requires 20 context variables and there are 10 jobs that need them: without context groups it would be very tedious to create those context variables again and again in every job.
You can create different context groups for different families of variables. For example, you can have separate context groups for database parameters, SMTP params, SFTP params, etc.
Click on the links below to learn more about context variables and context groups:
1. Understand Context Variables Part 1 (context variables, context groups)
2. Understand Context Variables Part 2 (define context variables in the Repository, which can be made available to multiple jobs)
3. Understand Context Variables Part 3 (populate the values of context variables from a file, using tContextLoad)
4. How to Pass Context Variables to Child Jobs.
5. How to Pass Context Variables/Parameters through the Command Line.
11. Use a Talend.properties file to provide the values of context variables, via tContextLoad.
Always provide the values of context variables either through a database table or through a Talend.properties file. A sample of such a file is sketched below.
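The keys and values here are hypothetical, and the key;value layout matches the semicolon separator used with tContextLoad elsewhere in this post:

db_host;localhost
db_port;1521
db_user;etl_user
Last_Load_Date;01-jan-2014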
Click here to understand how to populate the values of context variables from a file using the tContextLoad component.
12. Create variables in tMap and use those variables to assign values to the target fields.
When a single expression is used multiple times, or the same mapping feeds multiple target fields, it is always good to create a variable in tMap and assign that variable to the target fields. This way the expression is evaluated only once instead of many times.
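As a hedged sketch (all names assumed), a tMap variable and its reuse could look like:

Var.total_price  = row1.qty * row1.unit_price   // expression defined once in the Var section
out1.total_price = Var.total_price              // reused by the first target field
out2.order_value = Var.total_price              // and again by a second target field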
Divide the Talend job into multiple sub jobs for easy maintainability: first create a sub job, test it, and then move on to the next sub job.
15. Always exit Talend Open Studio before shutting down your PC.
The Talend workspace can become corrupted if you shut down your machine before exiting Talend Open Studio, so always exit Talend first.