0% found this document useful (0 votes)
103 views13 pages

TD Join Strategies 1

The document discusses Teradata join strategies, including merge, nested, row hash, and product joins. It focuses on explaining the four different merge join strategies: 1. Using the primary index columns of both tables in the join condition 2. Joining on the primary index of one table and a non-primary indexed column of the other 3. Neither table joined on a primary index, so both tables are redistributed into spool 4. If one table is small, it may be duplicated across all AMPs (big table - small table join) Nested joins use a unique index to retrieve a single row and match it to potentially multiple rows in another table. Hash joins require one

Uploaded by

K Vinay Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
103 views13 pages

TD Join Strategies 1

The document discusses Teradata join strategies, including merge, nested, row hash, and product joins. It focuses on explaining the four different merge join strategies: 1. Using the primary index columns of both tables in the join condition 2. Joining on the primary index of one table and a non-primary indexed column of the other 3. Neither table joined on a primary index, so both tables are redistributed into spool 4. If one table is small, it may be duplicated across all AMPs (big table - small table join) Nested joins use a unique index to retrieve a single row and match it to potentially multiple rows in another table. Hash joins require one

Uploaded by

K Vinay Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 13

Teradata Joining Strategies

Teradata Join Strategies are utilized by the optimizer to choose the least cost plan and
better performance. The strategy will be chosen based on the available information to the
Optimizer like Table size, PI information, Stats Information. 

Teradata join strategies are following:


       Merge (Exclusion)
       Nested
       Row Hash
       Product (including Cartesian Product joins)

Merge Join Strategies


There are four different merge join strategies based on the redistribution. Below details
have all these information.

Merge Join Strategy 1


The 1st merge join will utilize the Primary Index on both tables in the join Condition.
The key here is that both of the Primary Index columns of each table are used in the
WHERE or ON clause in the join type.

The inner join above focuses on returning all rows when there is a match between the two
tables. The ON clause is extremely important because this join establishes the join
(equality) condition. Think of this as if you were repairing a deck and you required a certain
kind of wood in order to make sure you had a match. You would go down to the local home
improvement store to locate the right pieces of wood to repair the deck. A match would
definitely not be cedar if your deck required pressured pine.
To investigate further, each matching row is joined where Emp = Emp which is stated from
the ON Clause in the JOIN. If we analyze further, EMP is the Primary Index for both tables.
This first merge join type is extremely efficient because both columns in the ON clause are
the Primary Indexes of their respective tables. When this occurs, NO data has to be moved
into spool and the joins can be performed in what is called AMP LOCAL. Teradata can
perform this join with rapid speed. Remember that the less data that has to be moved to
complete a join the better the performance will be achieved. The end result of this join is
that the rows from each table are physically joined together to form one row.

Merge Join Strategy 2


The previous join strategy is the best join scenario because it focuses on utilizing the
Primary Index Column in both tables for the equality (ON) condition. The next best scenario
is when a join is performed on a Primary Index column of one table to a non-primary
indexed column of another table.

In this example, two tables are being joined based on the DEPT column. In the department
table, the Primary Index column is DEPT. As we know, this is a good match based on the
equality (ON) condition. However, the employee table has EMP as the Primary Index
column. Which table do you think will move based on this join?
Regardless of the equality condition, the primary objective is to bring the rows together
from each table on the same AMPs. There are several options that the Teradata Optimizer
could choose in order to complete this task. The first option is to duplicate the smaller table
on all AMPs. The second option could be to leave the department table that has an equality
condition match on the Primary Index Column stationary on the AMP. The next step would
be to move the rows from the Employee table into spool. This would be accomplished by
hashing (locating) the columns in the employee table, and then moving these rows into
spool to the appropriate AMPs where the department table rows reside. What has occurred
here is that the stationary table is already sorted by the hash code and the spool table will
be re-sorted by hash code in spool to the appropriate matching AMPs.
Clearly the second option is an excellent choice for the Teradata Optimizer because it
reduces the amount of resources necessary to complete the join and improve performance.

Merge Join Strategy 3


The third scenario is where neither table is being joined on the Primary Index of either
table. In this case Teradata will redistribute both tables into spool and sort them by hash
code. When we want to redistribute and sort by hash code we merely hash the non-primary
index columns and move them to the AMPs spool where they are sorted by hash. Once this
is accomplished, then the appropriate rows are together in spool on all the AMPs. It is now
join time!

In the previous example, the columns in the join equality are MgrEmp = MgrNo. The
Primary Index of the department table is DEPT and the Primary Index for the manager table
is LOC. In this case, both columns being utilized in this join equality are not part of the
Primary Index columns. So what strategy will Teradata take to resolve this join operation?
Basically rows from both tables will need to be rehashed and redistributed into SPOOL.
The reason is because neither columns selected in the ON Clause are the Primary Index of
the respective tables. Therefore, both tables are redistributed based on the ON clause
columns. 
The next step in this process is to redistributed the rows and locate them to the matching
AMPs. When this is completed, the rows from both tables will be located in two different
spools. Lastly, the rows in each spool will be joined together to bring back the matching
rows. This type of join strategy is extremely inefficient. It consumes a ton of resources and
time to manage and assemble this type of join.

Merge Join Strategy 4


The fourth merge join strategy is called the big table - small table join. If one of the tables
being joined is small, then Teradata may choose a strategy that will duplicate the smaller
table across all the AMPs. The key about this strategy is that regardless if the table is part
of the Primary Index Column or not Teradata could still choose to duplicate the table across
all the AMPs.

In this inner join above, the two tables involved in the join are the Employee table and the
Department table. The DEPT column is the join equality that is making the match between
the two tables. The DEPT column is the Primary Index Column in the Department table. The
Employee table has the EMP column as the Primary Index. The final analysis of this join is
that the Department table is small and makes a good candidate for this type of join
strategy.
In order to join these two tables together, the first step is to get the rows together on the
same AMP. In this case, since the Department table is small, Teradata will choose to
duplicate the entire Department table on each AMP into spool. Once this is completed, then
the next step is for the AMPs to join the base Employee rows with the Department rows.

`
Instead of redistributing the larger Employee table, which is not part of the Primary Index
Column in the equality (ON) condition, Teradata will choose a more efficient strategy. This
strategy would be to duplicate the smaller table across all the AMPs (Big Table -Small Table
Join). This merge join strategy will consume minimal resources, and allow for Teradata to
excel.

Nested Join
A nested join strategy is probably the most precise join available. This join is designed to
utilize a unique index type (Either Unique Primary Index or Unique Secondary Index) from
one of the tables in the join statement in order to retrieves a single row. It then matches
that row to one or more rows on the other table being used in the join.
From the example above, the nested join has the join equality (ON) condition based on
the DEPT column. The dept column is the Primary Index Column on the department table.
In addition, the dept column is the Secondary Index Column in the employee table. Based
on this information above, which rows will move?
Keep in mind that the nested join prides itself on being able to move a single row into spool
and then matching that row with another table that contains several matches. How is this
done? Analysis of this join statement indicates a new clause has been added to this join
statement. This is known as the WHERE option. When utilized, the WHERE option allows for
a single row to be retrieved from a table. In addition, a nested join will always use a unique
index to isolate that single record and then join that record to another table. The other table
may use an index or it may not. However, the best practice is to always use columns that
have indexes when doing joins. Teradata has superior knowledge on indexed columns and
can utilize this information to choose an aggressive strategy to complete a join. Utilization
of indexes in join statements will improve performance and utilize less resources as the
below diagram illustrates.
Since there is only one row in the department table that has a match for department =10,
which is based on the AND option in the join statement, the Teradata Optimizer will choose
a path to move the department table columns into spool and duplicate them across all the
AMP’s.
Once this is completed, then the matches will proceed with that single record (10 and
SALES) to the second table, which did not move from the base AMP. Nested Joins are great
in an OLTP Environment because of the usage of both Unique and Non-Unique Indexes. In
addition, a nested join can reduce the resources necessary to complete the join. Finally,
nested joins, which are similar to all the join strategies discussed, must have an equality
condition in the ON Clause (d.dept = e.dept).

Hash Join
The Hash Join is part of the Merge Join Family. Remember that the key to a Merge Join is
based on an equality condition such as E.Dept = D.Dept in the ON clause of the join
statement. A Hash Join can only take place if one or both of the tables on each AMP can fit
completely inside the AMP’s memory.
Hash Join Strategy

In this example, the Hash Join has a join equality (ON) condition based on the EMP and
MGREMP Columns. The key point here is the columns do not necessarily have to the same
name when doing a join operation. The columns names can be different but the row
information has to be similar in order for the match to work. Both the EMP and MGREMP
columns have the same type of information so therefore a join based on these column
names will be successful. In addition, EMP column is the Primary Index column on the
employee table. However, the MGREMP column is not an index column in the department
table. Based on this information above, which rows will move?
Remember that the key to determining a Hash Join is if the SMALLER TABLE can be held
completely in each AMP’s MEMORY.
The Hash Join process is where the smaller table is sorted by row hash and duplicated on
every AMP. The key here is that the smaller table is required to be held completely in each
AMP’s memory. Teradata will use the join column of the larger table in order to search for a
match. The row hash join is extremely efficient because it eliminates the sorting,
redistribution, and or copying of the larger table into spool. In addition, the rows that are
duplicated into the AMP’s memory yield increased performance because the rows never go
into spool. Rows that go into spool always have to involve disk activity. AMP memory does
not involve disk interaction, which automatically increases performance. Hash Joins and
Nested Joins are both Great in an OLTP Environment for these reasons.

Exclusion Join
All of the joins that we have reviewed up to this point were based on finding matching rows
based on a join equality condition. The returned rows from these types of joins compared
rows from both tables in a join and then returned rows that matched. In addition, these
joins were inclusive. When working with exclusion joins the thought process has to be
reversed. Exclusion Joins have one primary function. They exclude rows during a join. The
best example here is when I was out with my best friend and his two sons (Blake and
Zach). They were telling a story that sounded believable about their dog Freddie who
allegedly chased a neighbor’s cat down the street. When they completed the story, I asked
them if this was true…they said in unison “NOT”! Well the NOT statement works exactly the
same in Teradata. When you put a NOT in front of a statement it will give you the opposite
answer.

SELECT EMP, DEPT, NAME 


FROM 
EMPLOYEETABLE
WHERE DEPT=10 and 
EMP NOT IN (SELECT MGREMP from DEPTTABLE WHERE MGREMP IS NOT NULL )

As you can see in the above example and as has been discussed above this type of join
utilizes the NOT IN statement. Exclusion joins are used for finding rows that don’t have a
matching row in the other table. Queries with the NOT IN operator are the types of queries
that always result in exclusion joins. In this case, this query will find all the employees who
belong to department 10 who are NOT managers.
These joins will always involve a Full Table Scan because Teradata will need to compare
every record to eliminate rows that will need to be excluded. With this being stated, this
type of join can be resource intensive if the two tables in this comparison are large.
In addition, the biggest problem with the Exclusion Joins is when the NOT IN statement is
used. The reason for this is that NULLs are considered unknowns so the data returned in the
answer will be NULLs. There are two ways to correct this:
       Define NOT IN columns as NOT NULL on the CREATE TABLE.
       Add the “ AND WHERE Column IS NOT NULL” to the end of the JOIN as seen in the above
example.

Product Joins
What Makes Product Joins Different
Product Joins compare every row of one table to every row of another table. They are called
product joins because they are a product of the number of rows in table one multiplied by
the number of rows in table two. For example, if one table had five rows and the other table
had five rows then the Product Join would compare 5 x 5 or 25 rows with a potential of 25
rows coming back.

SELECT E.EMP,D.DEPT 
FROM EMPLOYEETABLE E,DEPTTABLE D
WHERE
EMP LIKE '_b%'
About 99% of the time, product joins are major mistakes. The recommendation is to avoid
these types of queries whenever possible. The reason is because all rows in both tables will
be compared. Remember, Teradata tables have the potential to contain millions of rows. If a
user accidentally writes a product join against two tables that have 1 million rows each. The
result set would return One Trillion Rows (1000000000000)! Needless to say this is a
mistake you want to make. So how do you avoid writing a product join?
To avoid a product join, check your syntax to ensure that the join is based on an EQUALITY
condition. In the join syntax example above, the equality statement reads “WHERE EMP Like
‘_b%’”. Because this clause is not based on a common domain condition between the two
tables (i.e., e.dept = d.dept), the result is a product join. Another cause of a product join is
when aliases are not used after being established. Finally check your join syntax to ensure
the WHERE clause is not missing.

Cartesian Product Join


Just as discussed with Product Joins above, a Cartesian Product Join is usually something
you want to avoid. If we decided to run this query in Michigan this would be call the big
“Mistake on the Lake”. A Cartesian Product Join will join every row in one table to every row
in another table. The only thing that decides the number of rows will be the total number of
rows from both tables. If one table had 5 rows and another had 10 rows then you will
always get 50 rows returned. Imagine this situation. if we have a table with 10 million rows
and another with 25 million rows and a Cartesian product join is written (by accident)
against these two tables. What will be the result? Well, based on the example above, you
will get back about 25 Trillion Rows (250000000000000)! This is definitely NOT the correct
answer this user would want. This is why spool space limitations are utilized.
SELECT E.EMP,D.DEPT 
FROM EMPLOYEETABLE E,DEPTTABLE D;
About 99% of the time, a Cartesian Product Join is a major problem. The recommendation is
avoid these types of queries whenever possible. The reason is because all rows in both
tables will be joined. So how do you avoid writing a Cartesian Product Join?
To avoid a Cartesian Product Join, check your syntax to ensure that the join is based on an
EQUALITY condition. In the join syntax example above, the WHERE clause is missing.
Because this clause is missing, a common domain condition between the two tables (i.e.,
e.dept = d.dept) does not exist, the result is a product join. Another cause of a product join
is when aliases are not used after being established.

Types of Teradata Joins


Teradata joins

When we join two or more tables on a column or set of columns, Joining takes place. This will result in
data resulting from matching records in both the tables. This Universal concept remains the same for all
the databases.

In Teradata, we have Optimizer (a very smart Interpreter), which determines type of join strategy to be
used based on user input taking performance factor in mind.

In Teradata, some of common join types are used like

- Inner join (can also be "self join" in some cases)

- Outer Join (Left, Right, Full)

- Cross join (Cartesian product join)

When User provides join query, optimizer will come up with join plans to perform joins. These Join
strategies include

- Merge Join

- Nested Join

- Hash Join

- Product join

- Exclusion Join
 Merge Join

--------------------

Merge join is a concept in which rows to be joined must be present in same AMP. If the rows to be joined
are not on the same AMP, Teradata will either redistribute the data or duplicate the data in spool to make
that happen based on row hash of the columns involved in the joins WHERE Clause.

            If two tables to be joined have same primary Index, then the records will be present in Same AMP
and Re-Distribution of records is not required.

There are four scenarios in which redistribution can happen for Merge Join

Case 1: If joining columns are on UPI = UPI, the records to be joined are present in Same AMP and
redistribution is not required. This is most efficient and fastest join strategy

Case 2: If joining columns are on UPI = Non Index column, the records in 2nd table has to be
redistributed on AMP's based on data corresponding to first table.

Case 3: If joining columns are on Non Index column = Non Index column , the both the tables are to be
redistributed so that matching data lies on same amp , so the join can happen on redistributed data.  This
strategy is time consuming since complete redistribution of both the tables takes across all the amps

Case 4: For join happening on Primary Index, If the Referenced table (second table in the join) is very
small, then this table is duplicated /copied on to every AMP.

 Nested Join

-------------------

Nested Join is one of the most precise join plans   suggested by Optimizer .Nested Join works on UPI/USI
used in Join statement and is used to retrieve the single row from first table . It then checks for one more
matching rows in second table based on being used in the join using an index (primary or secondary) and
returns the matching results.

Example:

Select EMP.Ename , DEP.Deptno, EMP.salary
from

EMPLOYEE EMP ,

DEPARTMENT DEP

Where EMP.Enum = DEP.Enum
and EMp.Enum= 2345;    -- this results in nested join

 Hash join

----------------

Hash join is one of the plans suggested by Optimizer based on joining conditions. We can say Hash Join
to be close relative of Merge based on its functionality. In case of merge join, joining would happen in
same amp.   In Hash Join, one or both tables which are on same amp are fit completely inside the AMP's
Memory   . Amp chooses to hold small tables in its memory for joins happening on ROW hash.

Advantages of Hash joins are

1. They are faster than Merge joins since the large table doesn’t need to be sorted.

2. Since the join happening b/w table in AMP memory and table in unsorted spool, it happens so quickly.

 Exclusion Join

-------------------------

These type of joins are suggested by optimizer when following are used in the queries

- NOT IN

- EXCEPT

- MINUS

- SET subtraction operations

Select EMP.Ename , DEP.Deptno, EMP.salary
from

EMPLOYEE EMP

WHERE EMP.Enum NOT IN

( Select Enum from

DEPARTMENT DEP

where Enum is NOT NULL );
Please make sure to add an additional WHERE filter “with <column> IS NOT NULL” since usage of
NULL in a NOT IN <column> list will return no results.

Exclusion join for following NOT In query has 3 scenarios

Case 1: matched data in "NOT IN" sub Query will disqualify that row

Case 2: Non-matched data in "NOT IN" sub Query will qualify that row

Case 3: Any Unknown result in "NOT IN" will disqualify that row - ('NULL' is a typical example of this
scenario). 

You might also like