Joins and Views - Week-5
Joins and Views - Week-5
Agenda
Venn Diagrams
ANSI-89 Joins
ANSI-92 Joins
INNER Joins
OUTER JOINS
LEFT OUTER
RIGHT OUTER
INVERSE
FULL
Ambiguous Fields
Views
never see them again, but this week we will revisit them from the context of database
joins.
In this diagram above we could have two tables with data in them, one table named A,
the other named B. The red area would indicate the area where data between the two
tables is related. In set operators this was known as A ∩ B (A intersect B). We will use
diagrams like this to visualize how the various joins work.
ANSI-89 Joins
ANSI-89 joins are still commonly used today by db developers whom learned it that way
and have not changed because they do not see a need to. In the real-world, there are
millions of databases in use that have been created over decades of time. This means
that there are a significant amount of existing databases that exist that use the older
model and they need to be maintained. Therefore, we are covering the topic just
enough such that you recognize it when you see it and can work on those databases.
However, all work in this course will be done using the current ANSI-92 method unless
otherwise stated.
ANSI-89 Joins have been called different things over the years, but the 2 most popular
names are "Simple Joins" or "Natural Joins". The basic syntax is:
SELECT <fieldlist>
As see above, the two table names are both included in the FROM clause, but it is very
important to note that a query has no knowledge of any existing relationships,
contraints etc. Anything that is required in the output, must be defined in the query. If
there is a relationship between the two tables, then it must be described in the
statement. The WHERE clause is where this is done by making an equality on the
primary key of the parent table and the foreign key of the child table. Meaning, that
BDD 200 JOINS and VIEWS
data will only be retrieved when rows in table 2 have an equivalent value in the key field
of table 1.
PLAYERS
1 John Smith
2 Bob Marley
3 Steven King
4 Jim Parsons
TEAMS
10 Hornets Yellow
11 Falcons Brown
12 Bloopers Red
In the tables above, you can see that there is a relationship between the two tables on
the teamID field. John and Bob play on the Hornets and Steven plays for the Falcons.
We also see the Jim is not yet on a team and the Bloopers do not have any players.
Let's write a natural join to obtain the players names and the team name for the teams
they play on.
-- this is obsolete code and should not be done unless specifically asked to do
ANSI89 joins
SELECT
BDD 200 JOINS and VIEWS
playerID,
teamName
-- this is obsolete code and should not be done unless specifically asked to do
ANSI89 joins
SELECT
playerID,
teamName
PLAYERID PLAYERNAME
PLAYERID PLAYERNAME
As you can see this output is WRONG, it says that each of the 4 players play one all 3
teams. These results are called the CROSS join results. This is done because the query
has no prior knowledge of the relationship and unless this link is described in the query,
it gets all possible combinations that could be true. Therefore, it is essential that a
WHERE clause be added to describe the equality.
-- this is obsolete code and should not be done unless specifically asked to do
ANSI89 joins
SELECT
playerID,
teamName
-- this is obsolete code and should not be done unless specifically asked to do
ANSI89 joins
SELECT
playerID,
teamName
PLAYERID PLAYERNAME
From this new query we now have the correct results. Note that Jim is not in the output,
as he is not on a team, and the Bloopers team is missing because they do not have any
players.
The natural join syntax gets messy when you now need to add conditions or filters in
the WHERE clause as now AND is required. If you have multiple filters, you then have
multiple ANDs and ORs which make it more confusing and mistakes are easily made.
Additionally, if we consider Order of Execution, remember that the FROM clause
executes before the WHERE clause. This results in the CROSS join results being obtained
first and then eliminating the results that don't match. This could result in memory
issues in very large queries and slower processing time compared to the new ANSI-92
standard.
ANSI-92 Joins
In 1992, a new standard format was introduced that would become the standard format
across all DBMS versions of SQL. This introduces the JOIN keyword along with ON and
USING. We are going to cover several different ways to retrieve data from the tables
using inner and outer join types.
OR
When the JOIN is specified with no prefix qualifier, the default join type is INNER JOIN.
The first version, using the ON keyword can always be used and it is best to know this
one as you can create almost any join query using this syntax. The USING syntax has a
few limitations, but also has some benefits.
USING can only be used if the keyfield names in both tables is exactly the same
(teamID in our example)
table aliases can not be used on the tables involved in the USING keyword
when Using is used, the field indicated is combined into a single field, meaning it
can no longer be ambiguous.
Inner Joins
Inner Joins return the results when the equality exactly matches. This may seem
obvious, because = means equals, but you will see with outer joins that we can obtain
results that don't satisify the equality. Inner Joins will only return rows where the key
value from table 1 exists in table 2 AND the key values in table 2 exist in table 1.
For example:
SELECT
playerID,
firstname,
lastname,
teamName
This statement has the exact same output as the last query using the Natural join with
the WHERE equality.
Outer Joins
BDD 200 JOINS and VIEWS
Outer Joins extend the Inner join by allowing us to retrieve results from one or both
tables, even if there is no matching record in the other table. This is a very powerful
feature that allows us to get information that there are no matches for instance.
There are 3 types of outer joins available to us in ANSI-92, LEFT, RIGHT and FULL. The
concept of Left and Right may be confusing at first, but it is actually really easy once you
get it.
table1 JOIN table2
Using the above join, table1 is considered the LEFT table because it is before
the JOIN keyword. This then means that table2 would be the RIGHT table because it is
located after the JOIN keyword. The reason we do not say left or right of the join is
because it can be written in different ways. For example:
table1
JOIN
table2
Even though this appears like table2 is left of the JOIN, it is still the right table because it
is after. For readability purposes, it is helpful in the tables and JOIN keyword are written
on one line.
A LEFT OUTER JOIN would return ALL the rows from the left table regardless if a matching
record exists in the right table, but will still only return rows from the right table if there
is an exact matching record in left table.
A RIGHT OUTER JOIN would return ALL the rows from the right table regardless if a
BDD 200 JOINS and VIEWS
matching record exists in the left table, but will still only return rows from the left table if
there is an exact matching record in right table.
Example
SELECT
playerID,
firstname,
lastname,
teamName
-- OR
SELECT
playerID,
firstname,
lastname,
teamName
These two statements above will return the exact same results, as we swapped the
tables and changed left to right.
1 John Smith
2 Bob Marley
3 Steven King
BDD 200 JOINS and VIEWS
4 Jim Parsons
In this result, you see that all the matching records are present, but in addition, player 4
is included even though there is no team that is linked to him. Notice that the value of
the teamname can not be retrieved, because there is no team that matches, and
therefore the value returned for teamname is NULL.
Example 2
-- OR
returns...
1 John Smith
2 Bob Marley
3 Steven King
In this result, you see that all the matching records are present, but in addition, the
Bloopers team is included even though there is no player linked to that team. Notice
that the value of the playerid can not be retrieved, because there is no player that
matches, and therefore the value returned for playerid, firstname or lastname is NULL.
BDD 200 JOINS and VIEWS
It is interesting to note, that although firstname and lastname would be NULL in this
case, the PlayerName column returns a blank and NOT a NULL. This is because
playername is a calculated field using a string function, therefore NULLs are eliminated
and a zero length string is returned.
Example: Create a query that lists the teams that do not yet have any players linked to
them:
by using the LEFT join, all rows from the teams table are retrieved, regardless if a
matching player exists, and only those players returned that are on a team are returned.
Knowing that the extra team rows returned can not retrieve player information and
therefore inserts NULL values, we can add an additional filter of WHERE playerID IS
NULL, eliminating the records that do match.
TEAMID TEAMNAME
12 Bloopers
Another Example: Create a query that lists all players not currently on a team!
BDD 200 JOINS and VIEWS
The right join ensures that all players are returned and the use of the teamid being null
gives us only those players whom are not on a team.
4 Jim Parsons
A third type of outer join combines the results from both left and right joins together. A
full join will return:
1 John Smith
2 Bob Marley
3 Steven King
4 Jim Parsons
4 Jim Parsons
PLAYERS
1 John Smith
2 Bob Marley
3 Steven King
4 Jim Parsons
TEAMS
10 Hornets Yellow
11 Falcons Brown
12 Bloopers Red
FIELDS
FIELDS
So now we need to ask the question, create a query that outputs the player names and
the address that they need to go to for practice. We now requried fields from 2
different tables that do not have a way to join them, there are no matching fields.
However, we know players play on teams and teams play at field locations, so if we use
all 3 tables, then we can get the answer.
In this sample, we first joined players and teams, and then joined the result to fields.
Note that the first join creates a new temporary table, that includes all fields from both
players and teams tables, and then we join the third table, fields, to the new temporary
table. It is NOT table 1 joins to table 2 and table 2 joins to table 3. The results are:
FIRSTNAME LASTNAME
Example: create a query that outputs the player names, team name and
the address that they need to go to for practice, but include all fields regardless of
teams associated with them.
In order to return all the fields, we need to do a right join on the join that includes fields.
Because we only wanted matching teams and players, the first join could simply be an
inner join.
It is important to note that if the order of the tables in the FROM clause was different,
for example fields was the firsts table, then the join types would have to chage to match
the order the tables were entered.
Example: create a query that outputs the player names, team name and
the address that they need to go to for practice, but include all players regardless of
teams associated with them.
In this case, the table that we needed all records returned from was the first table.
Therefore, the first join needed to be a left join to carry those records forward. Now the
BDD 200 JOINS and VIEWS
first join created a temp table in memory that contains all fields from both tables.
Therefore the player information now resides in the temp table which is joined to the
3rd, fields, table. Because we still need to keep allplayers, a second left join is required to
maintain that data. An inner or right join would have eliminated players without teams.
FIRSTNAME LASTNAME
because the teamid field is located in both the players and teams tables and the SELECT
clause asks for the teamid without specifying which one is requested. This brings up 2
questions:
To answer the first question, we use the table.field notation, and is easier typed using
abbreviated table aliases.
BDD 200 JOINS and VIEWS
-- OR using aliases
For INNER joins, the only results that can be displayed are where they related field
matches in both tables. Be definition the two values will be identical and therefore it
does not matter which field is used for inner joins.
When performing Left, Right and Full outer joins, it is possible that one version of the
field will have values in it and the other one could be a NULL value. The choice of which
field then depends on the question: where we wanted to show all records from the left
table, the field from the left table would likely be the right result to show. And if a right
join was used, it is likely the field from the right table should be used.
1 John Smith 10 10
2 Bob Marley 10 10
3 Steven King 11 11
In the above query and results, by showing both teamid's, it is clear which one you
would choose for different output requirements. If you needed the teamID for the team
without players, you would have to choose t.teamid.
BDD 200 JOINS and VIEWS
This question becomes more complicated when we are using joins to find NOT answers.
Let us return to the question: List the teams that do NOT have players.
Views
What is a view? A view is an object stored in the database that can be a named SELECT
statement. You are able to create a query and save it as an object in the database.
When a view is executed, it produces results in real-time (i.e. at the time of execution).
Views vs Tables
Tables are database objects that have structure, data rows and columns and most
importantly stores data. Views DO NOT store data and are nothing more than an SQL
statement stored as a string object. I repeat, a view is nothing more than a string, that
happens to be a SELECT statement most of the time. When a table is requested, it
simply retrieves the current data in the table. When a view is requested, it executes the
premade SQL statement and results in a real-time table of results. These results can be
the final results, or can be used as the source in further SQL statements.
Why Views?
Views are a very powerful tool in a database schema. One way to think of a view is as an
object orientated object that enhances the usability and flexibility of data retrieval.
The ability to create complex SELECT statements for recall at any time is huge
time saving tool, eliminating the need to recreate complex statements multiple
times. (for example, creating a query to answer the question, how much money
did we make yesterday is likely asked every day. Creating a view that can be run
each day makes daily routines extremely easy)
Views can be used to build base queries: to put tables back together through
complex joins and then use the view as the source of a new SELECT statement to
perform only the filtering and sorting.
BDD 200 JOINS and VIEWS
Views are NOT tables, but can be treated as if they were tables in SQL queries.
How?
Views are created by the DDL statement CREATE:
to view the results of the View's SELECT statement, use a SELECT statement with the
view as the data source:
Examples:
Creates a base view that can be used further to answer many of the questions we have
seen in examples above:
1 John Smith 10 1
2 Bob Marley 10 1
3 Steven King 11 1
Question: List all players on teams and what the team name is!
As you can see, these queries are much easier than the ones we created above in the
JOINS section. Although, it is important to note that the View still contains a join. In
fact, the viewname can be replaced by the actual SELECT statement in the view and it
would product the same results.
-- is the same as
SELECT * FROM
SELECT *
Modifying a View
It is often the case during development that a view needs to be changed. Although you
could DROP the view and the recreate it, this can result in a very brief time where the
view does not exist which would return a confusing error to incoming requests. There is
an alternative using the CREATE OR REPLACE or ALTER statement that does not drop the
view, but simply changes it, such that it still exists at all times. The general syntax is:
CREATE OR REPLACE VIEW viewname AS
SELECT .....
SELECT .....
Summary
This was a pretty big week, and probably one of the most important weeks. Joins are a
fundamental part of relational databases and SQL. Almost every query written in the
real worl will involve at least one join, therefore they are an essential part of SQL and
computer programming. Views are used every day for adding an eliment of object
oriented programming to SQL and database access.
Explain what a view is, how it is saved, and how it can be used to make working in
SQL more efficient
create and save views and use them in the right context in more complexe SQL
SELECT statements.
SUMMARY
Venn Diagrams
ANSI-89 Joins
ANSI-92 Joins
INNER Joins
OUTER JOINS
LEFT OUTER
RIGHT OUTER
INVERSE
FULL
Ambiguous Fields
Views
BDD 200 JOINS and VIEWS
INSERT ALL
INTO teams VALUES (10, 'Hornets', 'Yellow', 'Toronto')
INTO teams VALUES (11, 'Falcons', 'Brown', 'Barrie')
INTO teams VALUES (12, 'Bloopers', 'Red', 'Kitchener')
SELECT * FROM dual;
COMMIT;
INSERT ALL
INTO players VALUES (1, 'John', 'Smith', 10)
INTO players VALUES (2, 'Bob', 'Marley', 10)
INTO players VALUES (3, 'Steven', 'King', 11)
INTO players VALUES (4, 'Jim', 'Parsons', NULL)
SELECT * FROM dual;
COMMIT;
INSERT ALL
INTO fields VALUES ('Toronto', '22 Queen St. E.', 'Doug Ford')
INTO fields VALUES ('Barrie', '380 Bayview Ave', 'Richard Alexander')
INTO fields VALUES ('Kitchener', '220 Weber St. S', 'Marco Albano')
INTO fields VALUES ('Waterloo', '576 King St. N', 'Pauyl Caruso')
SELECT * FROM dual;
COMMIT;