Difference Between Lookup Join and Merge Stage

The document discusses three DataStage stages - Lookup, Join, and Merge - that can join tables based on key columns. The Lookup stage is used for small reference datasets and validating rows. The Join stage is best for large tables, outer joins, and joining multiple tables. The Merge stage is used when multiple update and reject links are needed, such as combining a master dataset with one or more update datasets. The document then provides details and development examples for each stage.

Uploaded by

Jesse Kota

DataStage: Join vs Lookup vs Merge

DataStage has three processing stages that can join tables based on the values of key columns:
Lookup, Join and Merge. In this post, we discuss when to choose each stage, how the stages
differ, and a development reference for using each one.

Use the Lookup stage when:

• The reference dataset is small.

• You need to validate a row (if the lookup table has no entry matching the key’s values,
you can send the row to the reject link).

Use the Join stage when:

• Joining large tables.

• Doing outer joins (left, right, full outer).

• Joining multiple tables with the same keys.

Use the Merge stage when:

• Multiple update and reject links are needed (e.g. combining a master dataset with one
or more update datasets).

Let’s discuss each stage in detail.

Lookup Stage

Key Points

• The Lookup stage has a single input link, one or more reference links, a single output
link and a single reject link.
• It does not require the data on the input link or reference link to be sorted.
• The Lookup stage is an in-memory processing stage. A large lookup table will cause the job
to fail if the DataStage engine server runs out of memory.
• The key column names in the main and lookup tables do not need to be the same, because you
map them in the stage.
• Make sure to select the right Lookup Stage Conditions (see step 2 of the example).

Development Reference
In this example, we add employee information to the sales records by joining two tables on
the key column, Empl_Id.

(1) Map the key column and map the output in the Lookup stage.

(2) In Lookup Stage Conditions, specify the action to take when the lookup condition is not
met and when the lookup fails.

There are 4 options: Continue, Drop, Fail and Reject.

• Continue: When the lookup table does not contain a value that appears in the main table,
the stage assigns null values to the lookup table columns. In other words, this option works
like a left join.
• Drop: When the lookup table does not contain a value that appears in the main table, the
stage drops the row altogether. In other words, this option works like an inner join.
• Fail: When the lookup table does not contain a value that appears in the main table, the job
fails. This is the default option for the Lookup stage.
• Reject: When the lookup table does not contain a value that appears in the main table, the
stage sends the row to the reject link (as in this example).

(3) Make sure you have the correct link order.

(4) Input partitioning usually works with ‘Auto’.
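
The four Lookup Stage Conditions from step (2) can be sketched in plain Python. This is only an illustration of the row-level behaviour described above, not DataStage code: the function name `lookup`, the `on_failure` parameter and the sample rows are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def lookup(
    main_rows: List[Row],
    reference: Dict[object, Row],
    key: str,
    on_failure: str = "fail",
) -> Tuple[List[Row], List[Row]]:
    """Return (output_rows, reject_rows) for one pass over the main input.

    `reference` is an in-memory dict keyed by the lookup key value
    (hypothetical stand-in for the reference link).
    """
    # Collect the reference column names so "continue" can fill them with None.
    ref_columns = sorted({col for ref in reference.values() for col in ref})
    output: List[Row] = []
    rejects: List[Row] = []
    for row in main_rows:
        match: Optional[Row] = reference.get(row[key])
        if match is not None:
            output.append({**row, **match})
        elif on_failure == "continue":
            # Like a left join: keep the row, set the lookup columns to null.
            output.append({**row, **{col: None for col in ref_columns}})
        elif on_failure == "drop":
            # Like an inner join: discard the row.
            continue
        elif on_failure == "reject":
            # Send the unmatched row to the reject output.
            rejects.append(row)
        elif on_failure == "fail":
            # Abort the whole run, as the default option does.
            raise LookupError(f"no reference row for {key}={row[key]!r}")
        else:
            raise ValueError(f"unknown option: {on_failure}")
    return output, rejects


# Hypothetical sample data: Empl_Id 9 has no entry in the reference table.
sales = [{"Empl_Id": 1, "Units": 3}, {"Empl_Id": 9, "Units": 5}]
employees = {1: {"Name": "Ann"}}
matched, rejected = lookup(sales, employees, "Empl_Id", on_failure="reject")
```

With `on_failure="reject"`, the row for Empl_Id 9 ends up in `rejected` and only the matched row appears in `matched`.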

Join Stage

Key Points

• The key columns must have the same name in all tables.

• It can have multiple input links (as long as the tables share the same key columns) and a
single output link.
• The performance of Join can be improved by key-sorting the data on the input links (‘Auto’
partitioning mode is usually fine).
• If the reference dataset is small enough to fit in RAM, it is faster to use Lookup.
• There are four join options: inner join, left outer join, right outer join and full outer join.
• We need to make sure the input links are in the right order. This can be set from Stage ->
Link Ordering.
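
To make the four join options concrete, here is a minimal Python sketch of what each one returns for two small tables that share a key column name. It is an assumption-laden illustration only: the function `join`, its `how` parameter and the sample rows are hypothetical, and it uses a hash index for brevity rather than modelling how the Join stage processes sorted input.

```python
from typing import Dict, List

Row = Dict[str, object]


def join(left: List[Row], right: List[Row], key: str, how: str = "inner") -> List[Row]:
    """Join two lists of rows on a key column that has the same name in both."""
    if how not in {"inner", "left", "right", "full"}:
        raise ValueError(f"unknown join type: {how}")
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}

    # Index the right-hand rows by key value.
    right_by_key: Dict[object, List[Row]] = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)

    result: List[Row] = []
    matched_keys = set()
    for lrow in left:
        matches = right_by_key.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            for rrow in matches:
                result.append({**lrow, **rrow})
        elif how in {"left", "full"}:
            # Keep the unmatched left row; right-hand columns become null.
            result.append({**{col: None for col in right_cols}, **lrow})
    if how in {"right", "full"}:
        for rrow in right:
            if rrow[key] not in matched_keys:
                # Keep the unmatched right row; left-hand columns become null.
                result.append({**{col: None for col in left_cols}, **rrow})
    return result


# Hypothetical sample data: Empl_Id 2 exists only on the left, 3 only on the right.
sales = [{"Empl_Id": 1, "Units": 3}, {"Empl_Id": 2, "Units": 5}]
employees = [{"Empl_Id": 1, "Name": "Ann"}, {"Empl_Id": 3, "Name": "Bob"}]
full = join(sales, employees, "Empl_Id", how="full")
```

Which table counts as "left" and which as "right" is exactly what the link order decides, which is why the last key point matters for outer joins.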

Development Reference
In this example, we join the Employee and Products tables to Sales_Records on Empl_Id and
Product_Id, then calculate the revenue by multiplying the price column from Products by the
number of units sold.

(1) In each Join stage, make sure to choose the join key and the join type (inner, left outer,
right outer or full outer).

(2) Make sure the link order is correct.

(3) Partitioning can be left as ‘Auto’.

(4) Use a Transformer stage to calculate the revenue by multiplying Unit_Price by Units. Note
that the data type of Units is integer and that of Unit_Price is double, so set the data type
of Revenue to double.
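
The whole example, two joins followed by the revenue calculation, can be sketched in a few lines of Python. The column names Empl_Id, Product_Id, Units and Unit_Price come from the text; the function name `build_report`, the `Empl_Name` column, the inner-join behaviour and the sample values are assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def build_report(
    sales_records: List[Row],
    employees: Dict[int, Row],
    products: Dict[int, Row],
) -> List[Row]:
    """Join employees and products to each sale, then add a Revenue column."""
    report: List[Row] = []
    for sale in sales_records:
        employee = employees.get(sale["Empl_Id"])
        product = products.get(sale["Product_Id"])
        if employee is None or product is None:
            # Assumption: inner-join behaviour, unmatched sales are skipped.
            continue
        # Units is an integer and Unit_Price a double, so Revenue is a double.
        revenue = float(product["Unit_Price"]) * int(sale["Units"])
        report.append({**sale, **employee, **product, "Revenue": revenue})
    return report


# Hypothetical sample data.
sales_records = [{"Empl_Id": 1, "Product_Id": 10, "Units": 3}]
employees = {1: {"Empl_Name": "Ann"}}
products = {10: {"Unit_Price": 2.5}}
report = build_report(sales_records, employees, products)
```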

Merge Stage

Key Points

• The Merge stage has one master input link and any number of update input links, a single
output link, and the same number of reject output links as update input links.
• A master record and an update record are merged only if both of them have the same
values for the specified merge key. In other words, the Merge stage does not do range
lookups.
• To minimise memory requirements, we can use partitioning to ensure that rows with the same
key column values are located in the same partition and processed on the same node.
However, the ‘Auto’ option for partitioning usually works fine.
• As part of preprocessing, duplicate records need to be removed from the master. If there
is more than one update dataset, only the first matching record is used for the update.
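
The key points above can be illustrated with a small Python sketch: one master dataset, several update datasets, one reject list per update link, and an exact-match key. The function `merge`, the `unmatched_master_mode` parameter values and the sample rows are hypothetical, and two behaviours are assumptions rather than statements about the stage: columns already present on the master row are kept, and when an update dataset repeats a key only its first record is used.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def merge(
    master: List[Row],
    updates: List[List[Row]],
    key: str,
    unmatched_master_mode: str = "keep",
) -> Tuple[List[Row], List[List[Row]]]:
    """Return (output_rows, rejects), with one reject list per update dataset."""
    master_keys = {row[key] for row in master}
    indexes: List[Dict[object, Row]] = []
    rejects: List[List[Row]] = []
    for update in updates:
        index: Dict[object, Row] = {}
        link_rejects: List[Row] = []
        for row in update:
            if row[key] not in master_keys:
                # Update rows with no master record go to this link's rejects.
                link_rejects.append(row)
            else:
                # Assumption: the first update record per key wins.
                index.setdefault(row[key], row)
        indexes.append(index)
        rejects.append(link_rejects)

    output: List[Row] = []
    for row in master:
        merged = dict(row)
        matched = False
        for index in indexes:
            update_row = index.get(row[key])  # exact match only, no ranges
            if update_row is not None:
                matched = True
                for col, value in update_row.items():
                    # Assumption: existing master columns are not overwritten.
                    merged.setdefault(col, value)
        if matched or unmatched_master_mode == "keep":
            output.append(merged)
    return output, rejects


# Hypothetical sample data: Empl_Id 7 has no master record.
master = [{"Empl_Id": 1, "Units": 3}, {"Empl_Id": 2, "Units": 4}]
update_a = [{"Empl_Id": 1, "Name": "Ann"}, {"Empl_Id": 7, "Name": "Zed"}]
update_b = [{"Empl_Id": 1, "Dept": "Sales"}]
merged_rows, reject_lists = merge(master, [update_a, update_b], "Empl_Id")
```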

Development Reference

In this example, we update Master_Sales_Records with employee information from 2 reference
Employee tables.

(1) The Merge stage has only 3 options: Unmatched Master Mode, Warn On Reject Updates and
Warn On Unmatched Master. All the tables must have the same column names for the merge
keys.

(2) Configure input and output links. Map them to the right link order.

[Tables: Reference Datasets (Sales_Records, Employee) and Joined]

Difference Between Normal Lookup and Sparse Lookup

Normal Lookup:-

Normal Lookup reference data must be held in memory.

Normal Lookup may perform poorly if the reference data is huge, because all of the data has
to be loaded into memory.

Normal Lookup can have more than one reference link.

Normal Lookup can be used with any database.

Sparse Lookup:-

Sparse Lookup queries the database directly.

If the input stream is small relative to the reference data (a ratio of about 1:100 or more),
Sparse Lookup is the better choice.

Sparse Lookup can have only one reference link.

Sparse Lookup can be used only with Oracle and DB2.

Sparse Lookup sends an individual SQL statement for every incoming row, so its cost grows
with the number of input rows.

The Lookup Type option can be found in the Oracle and DB2 stages. The default is Normal.
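
The difference between the two lookup types can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a stand-in so the example runs anywhere; the table name `employee`, its columns and the function names are hypothetical and do not reflect the Oracle or DB2 stage configuration.

```python
import sqlite3
from typing import List, Optional


def make_reference_db() -> sqlite3.Connection:
    """Create a small in-memory reference table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (empl_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [(1, "Ann"), (2, "Bob"), (3, "Cy")],
    )
    return conn


def normal_lookup(conn: sqlite3.Connection, input_keys: List[int]) -> List[Optional[str]]:
    """Normal style: read the whole reference table into memory once."""
    cache = dict(conn.execute("SELECT empl_id, name FROM employee").fetchall())
    return [cache.get(key) for key in input_keys]


def sparse_lookup(conn: sqlite3.Connection, input_keys: List[int]) -> List[Optional[str]]:
    """Sparse style: send one SQL statement per incoming row."""
    results: List[Optional[str]] = []
    for key in input_keys:
        row = conn.execute(
            "SELECT name FROM employee WHERE empl_id = ?", (key,)
        ).fetchone()
        results.append(row[0] if row else None)
    return results


conn = make_reference_db()
names_normal = normal_lookup(conn, [2, 9])
names_sparse = sparse_lookup(conn, [2, 9])
```

Both functions return the same values; they differ in where the work happens. The normal version pays for loading every reference row up front, while the sparse version pays one round trip per input row, which is why it suits a small input stream against a very large reference table.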
