Lecture 7 - code_notes_improved1

This document provides an overview of SQL JOINS, detailing how to combine data from multiple tables using INNER JOINS, OUTER JOINS (LEFT, RIGHT, FULL), and UNION. It explains the significance of primary and foreign keys in relational databases and illustrates how JOINS can enhance data organization and query efficiency. Additionally, it includes practical SQL examples to demonstrate the implementation of various types of JOINS and their applications.

Uploaded by

chhawchh
Copyright
© © All Rights Reserved
JOINS

In this section, we learn how to combine data from multiple tables.

Specifically, we learn about:

- INNER JOINS
- OUTER JOINS (LEFT, RIGHT, FULL)
- UNION

Introduction to JOINS
So far, we have worked with one table at a time. But the real power of
SQL comes from working with data across multiple tables at once. The
term relational database refers to the fact that the tables within it
relate to one another. They contain common identifiers (e.g. primary and
foreign keys) that allow information from multiple tables to be easily
combined. In this lesson, we’ll see how to leverage SQL to link tables
together using what are called JOINS.
To understand what JOINS are and why they’re helpful, let’s think
about Parch & Posey’s orders table. Looking back at this table, we
notice that none of the orders say the name of the client, which is a
very useful piece of information. Instead, the table refers to customers
by numerical values in the account_id column. We’ll need to join
with another table, in order to connect the orders data to the
customer names.
But why isn’t the customer’s name in the orders table in the first
place? There are several reasons why relational databases are like this
and are split into multiple tables. Let’s focus on two of the most
important ones:
1. Objects of different nature, such as orders and accounts, are
easier to organize if kept separate.

The accounts and orders tables store fundamentally different types
of objects. Parch & Posey probably has only one account per
customer, and they want it to be up to date with the latest
information. A customer’s data gets updated only rarely, for
example when the customer changes address or legal name. Orders,
on the other hand, are likely to stay the same once they are
entered, and new orders keep getting added every day, or even
every minute or second. A given customer might have multiple
orders, and rather than change any past order, Parch & Posey
might just add a new one. Because these objects behave differently
and are conceptually very different, it makes sense for them to
live in different tables.

2. This multi-table structure allows queries to execute more quickly.

Another reason accounts and orders might be stored separately
has to do with the speed at which databases need to use the data.
When you write a query, its execution speed depends on the
amount of data you’re asking the database to read, and the
number and type of calculations you’re asking it to do. Imagine a
world where the account names and addresses are added into the
orders table, just so that we have everything in one place. The
table would then have many additional columns: one each for
account name, website, street name, street number, zip code, etc.
Now suppose a customer changes their address. The address columns
would need to be updated retroactively for every single order of
that customer. This means X * #_orders modifications (X: number of
address columns, #_orders: number of orders by the specific
customer). By contrast, keeping account details in a separate
table requires changing only 1 record, the one that corresponds
to the account of that customer. The larger the data set, the more
this matters! With everything in one table, we would have to
retrieve more columns every time we read from or work with the
data, and we would need to update many more rows every time a
single thing changes for a customer.
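The update argument above can be checked on a toy example. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the course database; the orders_wide table, the addresses and the 1000 orders are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical one-table design: the address is repeated on every order.
cur.execute(
    "CREATE TABLE orders_wide "
    "(id INTEGER PRIMARY KEY, account_name TEXT, street TEXT, zip TEXT)"
)
cur.executemany(
    "INSERT INTO orders_wide VALUES (?, ?, ?, ?)",
    [(i, "Walmart", "1 Old Street", "00001") for i in range(1, 1001)],
)

# Two-table design: the address is stored once, orders only keep account_id.
cur.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, street TEXT, zip TEXT)"
)
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER)")
cur.execute("INSERT INTO accounts VALUES (1, 'Walmart', '1 Old Street', '00001')")
cur.executemany("INSERT INTO orders VALUES (?, 1)", [(i,) for i in range(1, 1001)])

# The customer moves: compare how many rows each design has to modify.
cur.execute(
    "UPDATE orders_wide SET street = '2 New Street', zip = '00002' "
    "WHERE account_name = 'Walmart'"
)
rows_touched_wide = cur.rowcount

cur.execute("UPDATE accounts SET street = '2 New Street', zip = '00002' WHERE id = 1")
rows_touched_split = cur.rowcount

print(rows_touched_wide, rows_touched_split)
```

With X = 2 address columns and 1000 orders, the one-table design rewrites 2 * 1000 cells, while the two-table design changes a single account row.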

Primary Keys and Foreign Keys

A primary key (PK) is a column whose values are unique for every row
of a particular table. In Parch and Posey, this is the first column in
each of our tables. Here, those columns are all called id, but that
doesn’t have to be the case. In most databases the primary key is the
first column of a table, because it is practical.
A foreign key (FK) is a column in one table that holds values of the
primary key of another table, so it can be used as a connector between
the data of the two tables.
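As a small illustration of what these two kinds of keys enforce, here is a sketch using Python's built-in sqlite3 module (an assumption; the course database may behave slightly differently). The two tables are cut-down, hypothetical versions of accounts and orders with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES accounts(id),  -- foreign key to accounts
        total INTEGER
    )
    """
)
conn.execute("INSERT INTO accounts VALUES (1001, 'Walmart')")
conn.execute("INSERT INTO orders VALUES (1, 1001, 169)")

errors = []
for statement in (
    "INSERT INTO accounts VALUES (1001, 'Duplicate id')",  # violates the primary key
    "INSERT INTO orders VALUES (2, 9999, 50)",  # account 9999 does not exist
):
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

print(errors)
```

Both statements are rejected: the first because a primary key value must be unique, the second because a foreign key value must point to an existing row of the other table.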

Take a look at the Entity Relationship Diagram of Parch and Posey that
is illustrated next, to familiarize yourselves with how primary and
foreign keys help connect different tables.
SQL JOINS
A JOIN clause is used to combine rows from two or more tables, based
on a related column between them. The rows are combined
horizontally, in the sense that two rows (one from each table), with
possibly different columns, are placed next to each other in one
common row that includes columns from both tables. This combination
happens only if the two rows have matching values in their
corresponding columns, i.e. columns that represent the same type of
entity. For example, if the account_id in one row of the orders table
is equal to the id in a row of the accounts table, the two rows can be
combined into one in the final result. Examples will help us
understand this better.

INNER JOINS
The INNER JOIN selects rows that have matching values in the
corresponding columns of both tables. A row from one table is combined
with a row from the other table only when the two match. Rows in either
table that have no match in the other table are not shown in the
output.
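To see the "only matching rows" behaviour on something small, here is a sketch with Python's built-in sqlite3 module and invented toy rows (the account names and totals are not from the Parch database). One account has no orders and one order points to a non-existent account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, 'Apple'), (3, 'Nike');
    INSERT INTO orders VALUES (10, 1, 100), (11, 1, 250), (12, 2, 75), (13, 99, 40);
    """
)

rows = conn.execute(
    """
    SELECT o.id, a.name, o.total
    FROM orders o
    INNER JOIN accounts a ON a.id = o.account_id
    ORDER BY o.id
    """
).fetchall()

for row in rows:
    print(row)
```

Nike (no orders) and order 13 (account 99 does not exist) have no match, so neither appears in the three rows of the result.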
--The simplest possible join combines all columns from both tables
SELECT * FROM ORDERS --Selects all columns from both tables
INNER JOIN accounts --Defines which table to join ORDERS with
on accounts.id=orders.account_id
--The ON clause defines which columns the data should be matched
--through; id and account_id both represent the id of the customer, so
--they are the corresponding columns between the two tables.

--We can select only a subset of the columns. With the following query
--we produce an output with all the orders, similar to the orders
--table, but with the addition of the client’s name for every order.
SELECT accounts.name, orders.* FROM ORDERS
INNER JOIN accounts
on accounts.id=orders.account_id

_________________________________________________________
SOME NAMING AMBIGUITY ERRORS!!: It’s important to specify the
table name in front of each column, because we get an ambiguity
error if the two tables have columns with the same name and we
don’t say which one we mean. Try the following query to see the
ambiguity error:
SELECT id, * FROM ORDERS
INNER JOIN accounts
on accounts.id=orders.account_id

If we run a JOIN query with *, meaning asking for all the columns, there
is no ambiguity error, but we may end up with two columns having
the same name and completely different content. Try the following
query to notice that.
select * from accounts a
join sales_reps r
on r.id=a.sales_rep_id
--An error shows up when we try to choose one of these two columns by
--name. The following query, for example, gives such an error.
--(The subquery is given the alias sub, because some database versions
--require every subquery in FROM to have an alias.)
select id from
(select * from accounts a
join sales_reps r
on r.id=a.sales_rep_id) sub
--Notice that here we also used our first SUBQUERY, i.e. a query
--inside another query!!

In such a case we may want to use aliases for the columns, or not select
both these columns if they are not both needed, so that there is no
ambiguity in the end.
_________________________________________________________
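The column-alias fix mentioned in the box can be sketched as follows. This uses Python's built-in sqlite3 module and made-up rows; the alias names account_id, account_name, rep_id and rep_name are illustrative choices, not names from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_reps (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, sales_rep_id INTEGER);
    INSERT INTO sales_reps VALUES (321500, 'Samuel Racine');
    INSERT INTO accounts VALUES (1001, 'Walmart', 321500);
    """
)

# SELECT * keeps both id columns and both name columns under the same names.
cur = conn.execute(
    "SELECT * FROM accounts a JOIN sales_reps r ON r.id = a.sales_rep_id"
)
star_columns = [col[0] for col in cur.description]

# Column aliases give every output column its own unambiguous name.
cur = conn.execute(
    """
    SELECT a.id AS account_id, a.name AS account_name,
           r.id AS rep_id, r.name AS rep_name
    FROM accounts a
    JOIN sales_reps r ON r.id = a.sales_rep_id
    """
)
aliased_columns = [col[0] for col in cur.description]

print(star_columns)
print(aliased_columns)
```

Once the columns are aliased, an outer query can refer to account_id or rep_id without any ambiguity.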

-- If we select a subset of the columns, we can list them in any order
-- we want; they don’t have to follow the same order as in the original
-- tables (this holds for all queries, not only for JOINS).
SELECT orders.id, accounts.name, orders.total,
orders.total_amt_usd
FROM orders INNER JOIN accounts
ON accounts.id = orders.account_id
-- We can give each table in our query an alias or a nickname, in order
-- to make the query shorter. Frequently an alias is conveniently chosen
-- to be the first letter of the table name, but in reality, it could be
-- anything.
SELECT o.id, a.name, o.total, o.total_amt_usd
FROM orders as o
INNER JOIN accounts as a
ON a.id = o.account_id

--Once we create the aliases, we have to use them, otherwise we will get
--an error. Try to run the following query to see the error.
SELECT o.id, a.name, o.total, o.total_amt_usd
FROM orders as o
INNER JOIN accounts as a
ON accounts.id = orders.account_id

-- The word “AS” is not necessary. It’s a stylistic choice: omitting AS
-- is faster to type, but some people find the code more legible when
-- including it, because the aliases are easier to spot by looking for
-- the “AS” keywords.
SELECT o.id, a.name, o.total, o.total_amt_usd
FROM orders o
INNER JOIN accounts a
ON a.id = o.account_id

--In queries with JOINS we can still use WHERE and LIMIT clauses
SELECT o.id, a.name, o.total, o.total_amt_usd
FROM orders o
INNER JOIN accounts a
ON a.id = o.account_id
WHERE a.name = 'Walmart'
LIMIT 10

--Alternatively, instead of a WHERE clause, we can filter the data
--through the ON clause. By changing WHERE to AND, we’re moving this
--logical statement to become part of the ON clause. Conceptually, this
--restricts the accounts table to the rows with account name “Walmart”
--as part of the join itself, rather than filtering the joined rows
--afterwards. For an INNER JOIN both versions return the same rows
--(apart from the LIMIT). Whether the ON version is actually faster
--depends on the database’s query optimizer, which often treats the
--two forms identically.
SELECT o.id, a.name, o.total, o.total_amt_usd
FROM orders o
INNER JOIN accounts a
ON a.id = o.account_id and a.name = 'Walmart'
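A quick way to convince ourselves that, for an INNER JOIN, filtering in WHERE and filtering in ON return the same rows is to run both on toy data. This is a sketch with Python's built-in sqlite3 module; the rows are invented and say nothing about relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, 'Apple');
    INSERT INTO orders VALUES (10, 1, 100), (11, 2, 75), (12, 1, 250);
    """
)

where_rows = conn.execute(
    """
    SELECT o.id, a.name, o.total
    FROM orders o
    INNER JOIN accounts a ON a.id = o.account_id
    WHERE a.name = 'Walmart'
    ORDER BY o.id
    """
).fetchall()

on_rows = conn.execute(
    """
    SELECT o.id, a.name, o.total
    FROM orders o
    INNER JOIN accounts a ON a.id = o.account_id AND a.name = 'Walmart'
    ORDER BY o.id
    """
).fetchall()

print(where_rows == on_rows, where_rows)
```

The equality holds for inner joins only; with outer joins the two placements give different results.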

--Notice that we actually don’t have to use the word INNER for our
--inner joins (a plain JOIN is an INNER JOIN by default)
SELECT o.id, a.name, o.total
FROM orders o
JOIN accounts a
ON a.id = o.account_id and a.name = 'Walmart'

-- We can run JOIN queries for more than two tables.
-- Example: Provide an output that shows the region and the name of the
-- sales rep of each account
select a.name account, s.name sales_rep, r.name
region from accounts a
join sales_reps s
on a.sales_rep_id=s.id
join region r
on s.region_id=r.id
--Remember, specifying the source table for each column is necessary
--when we have columns with the same name!!
--Another join with multiple tables
-- Example: Provide the name of the region for every order, as well as
-- the account name and the unit price they paid for the order.
SELECT o.id as order_id, r.name as region,
a.name account, o.total_amt_usd/o.total
unit_price
FROM region r
JOIN sales_reps s
ON s.region_id = r.id
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
where o.total>0
LIMIT 10;

OUTER JOINS
In the previous section, we talked about INNER JOINS which link two or
more tables to find all the rows that match, and only those. The OUTER
JOIN returns not only the rows that match on the criteria we specify,
but also the unmatched rows from either of the tables that we want to
join.
There are three types of OUTER JOINS: LEFT JOIN, RIGHT JOIN, and
FULL JOIN.
- LEFT and RIGHT OUTER JOINS
The LEFT JOIN keyword returns all records from the left table (table1),
and only the matched records from the right table (table2). The result
is NULL from the right side, if there is no match (refer to figure below,
2nd Venn diagram).
What’s the difference between the LEFT and RIGHT JOIN?
When we begin building a query using LEFT JOIN, the first table we
name in the FROM clause is the table considered on the left, and the
second table, in the JOIN clause, is considered on the right. For
example, if we want all the rows from the first table and only the
matching rows from the second table, we will use a LEFT OUTER JOIN.
Conversely, if we want all the rows from the second table and any
matching rows from the first table, we will use a RIGHT OUTER JOIN.
The word OUTER is not necessary in the queries. LEFT JOIN and RIGHT
JOIN are equivalent to LEFT OUTER JOIN and RIGHT OUTER JOIN
respectively.
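The left/right distinction can be tried on a toy example. The sketch below uses Python's built-in sqlite3 module with made-up rows. Because some SQLite versions do not support RIGHT JOIN, the "keep all rows of the second table" case is written as a LEFT JOIN with the two tables swapped, which returns the same rows as a RIGHT JOIN would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_reps (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    INSERT INTO region VALUES (1, 'Northeast'), (2, 'Midwest'), (6, 'South');
    INSERT INTO sales_reps VALUES
        (100, 'Samuel Racine', 1), (101, 'Moon Torian', 2), (102, 'Alberto Quin', 8);
    """
)

# region is the left table: every region is kept, matched or not.
all_regions = conn.execute(
    """
    SELECT r.name, sr.name
    FROM region r
    LEFT JOIN sales_reps sr ON sr.region_id = r.id
    ORDER BY r.id
    """
).fetchall()

# sales_reps is the left table: every rep is kept, matched or not.
# Same rows as: region r RIGHT JOIN sales_reps sr ON sr.region_id = r.id
all_reps = conn.execute(
    """
    SELECT r.name, sr.name
    FROM sales_reps sr
    LEFT JOIN region r ON sr.region_id = r.id
    ORDER BY sr.id
    """
).fetchall()

print(all_regions)
print(all_reps)
```

The unmatched region ('South') gets None (NULL) in the rep column, and the unmatched rep ('Alberto Quin', region 8) gets None in the region column.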

- FULL OUTER JOIN

The last type of join is a FULL OUTER JOIN. This will return the INNER
JOIN result set, as well as any unmatched rows from either of the
tables being joined. As with LEFT and RIGHT joins, the word OUTER is
not necessary: FULL JOIN is equivalent to FULL OUTER JOIN. The word
FULL, however, cannot be dropped; OUTER JOIN on its own is not valid
in standard SQL.
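To make "inner result plus unmatched rows from both sides" concrete, the sketch below builds the same rows a FULL JOIN would return out of two simpler pieces. This is an emulation, not the FULL JOIN keyword itself: it uses Python's built-in sqlite3 module, and it combines a LEFT JOIN with the unmatched rows of the other table through UNION ALL so that it also runs on SQLite versions without FULL JOIN. The rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_reps (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    INSERT INTO region VALUES (1, 'Northeast'), (6, 'South');
    INSERT INTO sales_reps VALUES (100, 'Samuel Racine', 1), (102, 'Alberto Quin', 8);
    """
)

full_rows = conn.execute(
    """
    -- all regions, with their reps where a match exists
    SELECT r.name, sr.name
    FROM region r
    LEFT JOIN sales_reps sr ON sr.region_id = r.id
    UNION ALL
    -- plus the reps that matched no region at all
    SELECT NULL, sr.name
    FROM sales_reps sr
    LEFT JOIN region r ON r.id = sr.region_id
    WHERE r.id IS NULL
    """
).fetchall()

print(full_rows)
```

The result contains the matched pair, the region without a rep, and the rep without a region, which is what region FULL JOIN sales_reps would return on this data.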

Let’s run some queries!! The existing tables don’t have any data
mismatches, so they cannot showcase the value of OUTER JOINS. For the
next examples, we therefore add some more rows to two of the tables
of the Parch database. Let’s assume that the company is considering
expanding, so it hired some new sales reps and also opened some new
sales regions. Let’s add this new data to our tables:
INSERT INTO sales_reps(id, name, region_id)
values(321991, 'Alina Shein', 5), (321992,
'Alberto Quin', 8);
INSERT INTO region(id, name)
values(5, 'International'), (6, 'South'), (7,
'North');

Now, let’s work with the new data:

-- Are there any sales representatives with no accounts assigned to
-- them yet?
--(The word OUTER is not necessary)
select sr.name, a.* from accounts a
right outer join sales_reps sr
on a.sales_rep_id =sr.id

-- We can actually isolate the sales representatives that are not
-- assigned yet to any accounts, using IS NULL, since these are the
-- unmatched sales_reps.
select sr.name, a.* from accounts a
right outer join sales_reps sr
on a.sales_rep_id =sr.id
where a.name is NULL

-- We want to see which new regions we have, so that we can assign
-- some reps to them
select sr.name, r.name as region from region r
left outer join sales_reps sr
on r.id = sr.region_id

-- We can actually isolate the regions that have no sales reps assigned
-- to them yet, using IS NULL, since these are the unmatched regions.
select sr.name, r.name as region from region r
left outer join sales_reps sr
on r.id = sr.region_id
where sr.name is null

-- We want to see which reps have no regions assigned (new reps).
-- This query shows all reps, whether they have regions assigned or
-- not!
select sr.name, r.name as region_name from
region r
right outer join sales_reps sr
on r.id = sr.region_id

-- We can actually isolate the reps that have no regions assigned to
-- them yet, using IS NULL, since these are the unmatched reps.
select sr.name, r.name as region_name from
region r
right outer join sales_reps sr
on r.id = sr.region_id
where r.name is null

-- We want to see both unassigned reps and regions, in order to be able
-- to match those with one another.
-- This query shows all reps and regions, whether they are assigned or
-- not!
select sr.name, r.name as region_name from
region r
full join sales_reps sr
on r.id = sr.region_id

-- Using IS NULL correctly, we can isolate the unassigned regions and
-- reps.
select sr.name, r.name as region_name from
region r
full join sales_reps sr
on r.id = sr.region_id
where r.name is null or sr.name is null

Work with more than 2 joins:

-- List all the regions together with their total number of sales_reps
-- and total number of orders.
select r.name, count(distinct sr.id) sales_reps,
count(o.id) num_orders from orders o
join accounts a
on o.account_id = a.id
right join sales_reps sr
on sr.id = a.sales_rep_id
right join region r
on r.id =sr.region_id
group by r.name

NOTE 1:
____________________________________________________
Pay attention to how the output changes when we replace
count(o.id) with count(*)!! Count(*) gives num_orders=1
even for regions that have no sales_reps assigned and hence no
orders whatsoever. This happens because count(*) counts all rows
of the joined result, even the ones for which o.id, which represents
the id of an order placed for that region, is NULL!!! The rows for which
o.id is NULL should be omitted from the final count, since a NULL
there means that the row contains no order. So in this case it is
important to exclude the rows with NULLs in the relevant column, which
is why using count(o.id) is more accurate than count(*).
select r.name, count(distinct sr.id) sales_reps,
count(*) num_orders from orders o
join accounts a
on o.account_id = a.id
right join sales_reps sr
on sr.id = a.sales_rep_id
right join region r
on r.id =sr.region_id
group by r.name

A simpler way to notice this is to compare the following two queries.
What is their difference? Count(*) counts the number of rows, even if
sr.id is empty (NULL), so the answer of the second query is wrong!

-- How many sales_reps does each region have?
select r.name, count(sr.id) from region r
left join sales_reps sr
on sr.region_id = r.id
group by r.name

Versus

-- How many sales_reps does each region have?


select r.name, count(*) from region r
left join sales_reps sr
on sr.region_id = r.id
group by r.name

Conversely, there are cases where we prefer to use
count(*) instead of count(column)! For example, if we want to
count all the accounts in the accounts table, count(*) is more
accurate than count(name), because if the name of some
accounts happens to be NULL (e.g. we didn’t have the names of some
of the accounts in our data), then count(name) will only count
the rows with a non-NULL name, which is a smaller number than all
the rows, which is what count(*) counts.
_______________________________________(end of Note 1)
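Both directions of Note 1 can be reproduced on a few invented rows. The sketch below uses Python's built-in sqlite3 module; the region, rep and account values are toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_reps (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO region VALUES (1, 'Northeast'), (6, 'South');
    INSERT INTO sales_reps VALUES (100, 'Samuel Racine', 1), (101, 'Moon Torian', 1);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, NULL);
    """
)

# After an outer join: COUNT(sr.id) skips the NULL-padded row, COUNT(*) does not.
reps_per_region = conn.execute(
    """
    SELECT r.name, COUNT(sr.id), COUNT(*)
    FROM region r
    LEFT JOIN sales_reps sr ON sr.region_id = r.id
    GROUP BY r.name
    ORDER BY r.name
    """
).fetchall()

# On a single table: COUNT(*) counts every row, COUNT(name) skips NULL names.
account_counts = conn.execute("SELECT COUNT(*), COUNT(name) FROM accounts").fetchone()

print(reps_per_region)
print(account_counts)
```

'South' has no reps, yet COUNT(*) reports 1 for it, while COUNT(sr.id) correctly reports 0. In the accounts table it is the other way round: COUNT(*) gives the true number of accounts.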

NOTE 2:
____________________________________________________

Finally, it is very useful to consult the ERD (Entity Relationship
Diagram) of the database, IF available, when we try to build queries
with multiple joins. Here, for example, the Parch ERD shown below
would have shown us that the only way to connect regions with the
orders placed in those regions is through the following joins
(represented by arrows):

orders <-> accounts <-> sales_reps <-> region

Keep in mind, though, that the ERD does not help in choosing the type
of join, since that:
- depends on the specific output we are expecting,
- depends on the order in which we state the different tables in the
query,
- does NOT depend on how tables show up in the ERD!!!!!

Also, ERDs are not always available to us, so we need to be
comfortable with building the right queries even without them.
___________________________________(end of Note 2)

-- Total revenue made by each sales_rep, keeping only the sales_reps
-- that brought in more than 500000 in revenue.
select sr.name, sum(o.total_amt_usd)
total_revenue from orders o
join accounts a
on a.id = o.account_id
right join sales_reps sr
on sr.id = a.sales_rep_id
group by sr.name
having sum(o.total_amt_usd)>500000
order by total_revenue

NOTE 3:
____________________________________________________
In order to build successful queries with multiple joins, run the query
after every join you add, to check that you are getting the output that
you expected. Especially when these queries need outer joins, make sure
that you select the correct outer join by looking at the intermediate
results after every join and checking that all the data you expected is
in the output. In the query above, for example, only if you use a RIGHT
JOIN when you join sales_reps to the accounts/orders result of the
previous join will you be able to keep all the sales representatives in
the result of that join; otherwise only the ones with accounts assigned
to them will show up, and the ones with no accounts will be eliminated
during the join!

______________________________________(end of Note 3)
Should we use filtering conditions in the ON clause instead of the
WHERE clause? When and why?
Let’s use some new data for this example. In your Parch database, build
the following table, a shorter version of the familiar sales_reps table:

CREATE TABLE sales_reps_short (
id integer,
name bpchar,
region_id integer
);
INSERT INTO sales_reps_short
VALUES (1,'Samuel Racine',1),
(2,'Sherlene Wetherington',2),
(3,'Earlie Schleusner',NULL),
(4,'Moon Torian',3),
(5,'Brandie Riva',4),
(6,'Brandie Riva',5);

-- A shorter version of the sales_reps table


select * from sales_reps_short;
-- The region table
select * from region;

-- Print all the regions, whether they have a sales rep or not
select r.id r_id, sr.name sales_rep from region r
left join sales_reps_short sr
on r.id = sr.region_id
;

-- Print only the regions from 4 and above, whether they have a sales
-- rep or not
select r.id r_id, sr.name sales_rep from region r
left join sales_reps_short sr
on r.id = sr.region_id
where r.id>3
;
/* Print only the regions from 4 and above, whether
they have a sales rep or not.

Let's see what happens when we move the filtering
condition from WHERE to ON:

Here we ACCIDENTALLY put the r.id>3 filtering condition
in the ON clause. The result is that, even though we
thought we were filtering out all the regions from
3 and below, we actually didn't! Those regions still
appear, only with NULL in the sales rep column, even
though we know from two queries above that they do have
a sales representative. So what is going on? When we put
the r.id>3 condition in the ON clause, the JOIN only
happens for rows in which r.id>3, i.e. the JOIN happens
according to the ON condition. The rest of the regions
still show up in the output, because this is a LEFT join
and all the rows of the left table must show up in the
output, but they are not joined to anything, because of
the r.id>3 condition, hence the NULL values in the
sales_rep column.

SO: the filtering condition in the ON clause dictates
whether a row will be joined with something from the
other table or not, and not whether it will appear in
the output or not. Instead, the type of join (INNER,
LEFT, RIGHT, FULL) is what dictates whether an unmatched
row will appear in the output at all. So when we use a
filtering condition in the ON clause of an OUTER join,
it doesn't work as a filter on the output rows of the
preserved table; it only decides whether those rows are
joined to something from the other table or get NULLs
in the joined columns. This is usually not what we
want, so it is better to avoid filtering conditions in
the ON clause when we have outer joins, because the
result is most likely not the intended one.
*/
select r.id r_id, sr.name sales_rep from region r
left join sales_reps_short sr
on r.id = sr.region_id and r.id>3
;

-- Now let's see what happens when we have an INNER JOIN.

select * from sales_reps_short;

select * from region;

-- Print only the regions that have a sales rep


select r.id r_id, sr.name sales_rep from region r
join sales_reps_short sr
on r.id = sr.region_id
;

-- Print only the regions from 4 and above that have a sales rep
select r.id r_id, sr.name sales_rep from region r
join sales_reps_short sr
on r.id = sr.region_id
where r.id>3
;

/* Print only the regions from 4 and above that have a
sales rep.

Let's see what happens when we move the filtering
condition from WHERE to ON:

The output is exactly the same as in the previous query.
This time we have an inner join, so rows that are not
joined do not appear in the output, because there is no
outer join to preserve them. The condition r.id>3 in the
ON clause, like before, dictates that any row with
region id <4 should not be joined and, since these rows
cannot be joined, they won't show up at all in the final
output!

With an inner join, then, the filtering condition can
safely go in the ON clause, because it gives us the
output that we want. Conceptually, the excluded rows
never take part in the join. Whether this also saves
computational time depends on the database's query
optimizer, which often treats the ON and WHERE versions
of an inner join identically.
*/
select r.id r_id, sr.name sales_rep from region r
join sales_reps_short sr
on r.id = sr.region_id and r.id>3
;

Takeaways:
- It is better to avoid filtering conditions in the ON clause when we
have outer joins, because the result is most likely not the intended
one.
- With inner joins, it is safe to move filtering conditions into the ON
clause, because the output is the same as with WHERE. Any saving in
computational time depends on the database’s query optimizer.
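The outer-join takeaway can be verified on a reduced version of the example. The sketch below uses Python's built-in sqlite3 module; the region table with ids 1 to 5 and the four reps are a toy subset chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE region (id INTEGER PRIMARY KEY);
    CREATE TABLE sales_reps_short (id INTEGER, name TEXT, region_id INTEGER);
    INSERT INTO region VALUES (1), (2), (3), (4), (5);
    INSERT INTO sales_reps_short VALUES
        (1, 'Samuel Racine', 1), (2, 'Sherlene Wetherington', 2),
        (4, 'Moon Torian', 3), (5, 'Brandie Riva', 4);
    """
)

# Filter in WHERE: regions 1 to 3 are removed from the output.
where_rows = conn.execute(
    """
    SELECT r.id, sr.name
    FROM region r
    LEFT JOIN sales_reps_short sr ON r.id = sr.region_id
    WHERE r.id > 3
    ORDER BY r.id
    """
).fetchall()

# Filter in ON: regions 1 to 3 stay in the output, but are joined to nothing.
on_rows = conn.execute(
    """
    SELECT r.id, sr.name
    FROM region r
    LEFT JOIN sales_reps_short sr ON r.id = sr.region_id AND r.id > 3
    ORDER BY r.id
    """
).fetchall()

print(where_rows)
print(on_rows)
```

Regions 1 to 3 do have reps, but in the ON version they show None, which is exactly the misleading output described above.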

To summarize:
Here are the different types of JOINs in SQL:
(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Returns ALL records from the left table, and only
the matched records from the right table, IF ANY. For the records of the
left table that have no match in the right table, empty records/NULL
values are added in the columns of the latter.
RIGHT (OUTER) JOIN: Returns ALL records from the right table, and
only the matched records from the left table, IF ANY. For the records of
the right table that have no match in the left table, empty records/NULL
values are added in the columns of the latter.
FULL (OUTER) JOIN: Returns all records from both tables, whether they
have a match or not between the two. For any records that have no
match, the columns of the table that does not have a match are
populated with empty records/cells/ NULL values.

The following figure shows a Venn diagram illustrating the result of
different types of JOINS.

Please, check out JOIN_concept.csv for more explanation with some toy
examples.

A last note…
It is important to be intentional about the type of JOIN that you need
to use…

Instead of, for example, always using a FULL JOIN and removing the
NULLs if needed, we should intentionally select the join that is
appropriate! If your data is very big, you don’t want to join all
the rows (i.e. a FULL join), needed or not, and then decide by eye
what join to use or what rows to remove. For very voluminous
data, trying to join everything may be so costly that the query
never finishes! So you have to learn how to choose the correct join
up front, rather than joining everything just because you can and
then deciding what to do next.

You also need to keep in mind that some data may give correct output
for the wrong join, just coincidentally, but the query would not be
correct for different data. For example, if we run the following query in
the old vs new Parch database (i.e. before and after we added the new
data in the Parch database), it would give the correct answer in the old
data, but a wrong answer in the new data.
Run these queries on both old and new data:

select sr.name sr_name, r.name region_name from
sales_reps sr
join region r
on r.id = sr.region_id ;

select sr.name sr_name, r.name region_name from
sales_reps sr
left join region r
on r.id = sr.region_id ;

In the old data that had no unassigned sales reps, both queries would
give the same answer.

In the new data, the second query will include all the sales reps,
whether they are assigned to a region or not, but the first query will
only include the assigned sales reps.
If the question we are asking is: find all sales reps and their regions
then, even though in the old data both queries would give the same
output, a correct output, the first query would be wrong, because if the
data was different, the first query would omit the sales reps that have
no assigned region!
So for a query with JOINS to be considered correct, it is not enough for
the output to look correct, the query should be able to give the
correct answer, even if the data was a bit different!

UNION
Union combines rows vertically. So, in a sense, it takes two tables (or
sets of rows) and places one underneath the other. The two tables or
sets of rows must have the same number of columns, with compatible
types in the same order, otherwise the union will fail! The column
names do not have to match; the result takes its column names from the
first query.

-- Parch and Posey bought a small company that had its own sales
-- representatives and wants to add them to its own. Let’s add a new
-- table reflecting the newly added reps.
create table sales_reps_new(id int, name bpchar,
region_id int);

insert into sales_reps_new(id, name, region_id)
values(1, 'Mailo Sales', 5), (2, 'Rania Feltz', null),
(3, 'Adaku Abara', 2), (4, 'Necole Victory', 1);

--Union concatenates vertically rows of the SAME structure (same number
--of columns, with compatible types in the same order)
select * from sales_reps
union
select * from sales_reps_new
order by id

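--Try to run the following queries and notice the error messages
--(here the two SELECTs return a different number of columns)
select name, region_id from sales_reps
union
select region_id from sales_reps_new
order by id

--(here the columns are in a different order, so their types don’t
--match)
select region_id, name from sales_reps
union
select name, region_id from sales_reps_new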

-- UNION
-- all distinct rows selected by either query
select name, region_id from sales_reps
union
select name, region_id from sales_reps_new
order by name;

-- UNION ALL
-- all rows selected by either query, duplicates included
select name, region_id from sales_reps
union all
select name, region_id from sales_reps_new
order by name;

-- INTERSECT
-- returns rows that are common in both queries
select name, region_id from sales_reps
intersect
select name, region_id from sales_reps_new
order by name

-- EXCEPT (ALSO KNOWN AS MINUS IN SOME DBMS)
-- returns only those rows that are returned by the first SELECT query
-- but not by the second one
select name, region_id from sales_reps
except
select name, region_id from sales_reps_new
order by name
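The four set operations can be compared side by side on two tiny tables. This sketch uses Python's built-in sqlite3 module with invented rows. The second table deliberately uses different column names (rep_name, area_id, both hypothetical) to show that only the number, order and types of the columns have to line up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_reps (name TEXT, region_id INTEGER);
    CREATE TABLE sales_reps_new (rep_name TEXT, area_id INTEGER);
    INSERT INTO sales_reps VALUES ('Samuel Racine', 1), ('Moon Torian', 3);
    INSERT INTO sales_reps_new VALUES ('Mailo Sales', 5), ('Moon Torian', 3);
    """
)

def run(set_operator):
    """Combine the two tables with the given set operator, sorted by name."""
    query = f"""
        SELECT name, region_id FROM sales_reps
        {set_operator}
        SELECT rep_name, area_id FROM sales_reps_new
        ORDER BY 1
    """
    return conn.execute(query).fetchall()

union_rows = run("UNION")          # distinct rows from either query
union_all_rows = run("UNION ALL")  # all rows, duplicates kept
intersect_rows = run("INTERSECT")  # rows returned by both queries
except_rows = run("EXCEPT")        # rows of the first query not in the second

for label, rows in [
    ("UNION", union_rows),
    ("UNION ALL", union_all_rows),
    ("INTERSECT", intersect_rows),
    ("EXCEPT", except_rows),
]:
    print(label, rows)
```

'Moon Torian' appears in both tables, so it shows up once under UNION, twice under UNION ALL, alone under INTERSECT, and not at all under EXCEPT.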
CORRECT ORDER AMONG DIFFERENT CLAUSES
ORDER IN QUERY MATTERS STILL!!!: The clauses of a query must be
written in a fixed order, which largely follows the order in which
they are processed behind the scenes (the main exception is SELECT,
which is written first but evaluated after HAVING). This is the correct
order in which we should state the different clauses in our queries:
SELECT ... (aggregation)... FROM ...
JOIN
ON
(we may have multiple JOIN ON combinations, if multiple joins are
needed)

WHERE
GROUP BY
HAVING
ORDER BY
LIMIT
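Here is one query that uses every clause of the list in the required order, followed by a query that breaks the order. It is a sketch with Python's built-in sqlite3 module and invented rows; the exact error message differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, 'Apple'), (3, 'Nike');
    INSERT INTO orders VALUES
        (10, 1, 100), (11, 1, 250), (12, 2, 75),
        (13, 2, 60), (14, 3, 500), (15, 3, 0);
    """
)

top_accounts = conn.execute(
    """
    SELECT a.name, SUM(o.total) AS total_units
    FROM orders o
    JOIN accounts a ON a.id = o.account_id
    WHERE o.total > 0
    GROUP BY a.name
    HAVING SUM(o.total) > 100
    ORDER BY total_units DESC
    LIMIT 2
    """
).fetchall()
print(top_accounts)

# Writing WHERE after GROUP BY breaks the required order and is rejected.
try:
    conn.execute(
        "SELECT account_id, SUM(total) FROM orders GROUP BY account_id WHERE total > 0"
    )
    misordered_error = None
except sqlite3.OperationalError as exc:
    misordered_error = str(exc)
print(misordered_error)
```

The first query returns the two accounts with the largest totals; the second never runs, because the database refuses the misplaced WHERE as a syntax error.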
