Chapter 13: Filter Results Using WHERE and HAVING
Section 13.1: Use BETWEEN to Filter Results
The following examples use the Item Sales and Customers sample databases.
This query will return all ItemSales records that have a quantity greater than or equal to 10 and less than or equal
to 17. The results will look like:
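This query will return all ItemSales records with a SaleDate that is greater than or equal to July 11, 2013 and less
than or equal to May 24, 2013. Note that in standard SQL, BETWEEN expects the lower bound first: as stated, the lower
bound (July 11) falls after the upper bound (May 24), so no row can satisfy both conditions.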
When comparing datetime values instead of dates, you may need to convert the datetime values into
date values, or add or subtract 24 hours, to get the correct results.
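The inclusive bounds and the datetime caveat above can be sketched as follows. This is a minimal illustration that assumes Python's built-in sqlite3 module and an invented three-row ItemSales table, not the actual sample database; SQLite stores the datetimes as text, and its date() function stands in for whatever conversion your DBMS offers.

```python
import sqlite3

# Hypothetical stand-in for the ItemSales sample table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ItemSales (Id INTEGER PRIMARY KEY, SaleDate TEXT, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO ItemSales (Id, SaleDate, Quantity) VALUES (?, ?, ?)",
    [
        (1, "2013-07-01 00:00:00", 10),
        (2, "2013-07-11 09:30:00", 17),
        (3, "2013-07-24 15:45:00", 18),
    ],
)

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# BETWEEN is inclusive on both ends: quantities 10 and 17 both match.
by_quantity = ids("SELECT Id FROM ItemSales WHERE Quantity BETWEEN 10 AND 17 ORDER BY Id")

# Comparing the raw datetime drops the sale made during the last day,
# because '2013-07-24 15:45:00' sorts after '2013-07-24'.
raw_datetime = ids(
    "SELECT Id FROM ItemSales "
    "WHERE SaleDate BETWEEN '2013-07-11' AND '2013-07-24' ORDER BY Id")

# Converting to a date first keeps the whole last day in range.
as_date = ids(
    "SELECT Id FROM ItemSales "
    "WHERE date(SaleDate) BETWEEN '2013-07-11' AND '2013-07-24' ORDER BY Id")
```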
This query will return all customers whose name alphabetically falls between the letters 'D' and 'L'. In this case,
Customer #1 and #3 will be returned. Customer #2, whose name begins with an 'M', will not be included.
Id FName LName
An aggregate function is a function where the values of multiple rows are grouped together as input on
certain criteria to form a single value of more significant meaning or measurement (Wikipedia).
This example uses the Car Table from the Example Databases.
This query will return the CustomerId and Number of Cars count of any customer who has more than one car. In
this case, the only customer who has more than one car is Customer #1.
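A minimal sketch of such a HAVING query, assuming Python's built-in sqlite3 module and an invented Cars table in which only Customer #1 owns more than one car (the real Car table has more columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced version of the Cars table: only the columns needed here.
conn.execute("CREATE TABLE Cars (Id INTEGER PRIMARY KEY, CustomerId INTEGER)")
conn.executemany("INSERT INTO Cars (Id, CustomerId) VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2), (4, 3)])

# GROUP BY builds one group per customer; HAVING then filters whole groups,
# which WHERE cannot do because WHERE runs before the aggregation.
owners = conn.execute(
    """
    SELECT CustomerId, COUNT(Id) AS "Number of Cars"
    FROM Cars
    GROUP BY CustomerId
    HAVING COUNT(Id) > 1
    """
).fetchall()
```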
This statement will return all Employee records where the value of the ManagerId column is NULL.
SELECT *
FROM Employees
WHERE ManagerId IS NOT NULL
This statement will return all Employee records where the value of the ManagerId is not NULL.
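The difference between IS NULL, IS NOT NULL and a comparison with = can be sketched like this. The example assumes Python's built-in sqlite3 module and a reduced, hypothetical Employees table with four rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced Employees table; only James has no manager.
conn.execute("CREATE TABLE Employees (Id INTEGER PRIMARY KEY, FName TEXT, ManagerId INTEGER)")
conn.executemany("INSERT INTO Employees (Id, FName, ManagerId) VALUES (?, ?, ?)",
                 [(1, "James", None), (2, "John", 1), (3, "Michael", 1), (4, "Johnathon", 2)])

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

is_null = ids("SELECT Id FROM Employees WHERE ManagerId IS NULL ORDER BY Id")
is_not_null = ids("SELECT Id FROM Employees WHERE ManagerId IS NOT NULL ORDER BY Id")

# A comparison with = never matches NULL: the result of "x = NULL" is unknown,
# and WHERE only keeps rows for which the condition is true.
equals_null = ids("SELECT Id FROM Employees WHERE ManagerId = NULL ORDER BY Id")
```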
This statement will return all the rows from the table Employees.
Using a WHERE at the end of your SELECT statement allows you to limit the returned rows to a condition. In this case,
where there is an exact match using the = sign:
Section 13.5: The WHERE clause only returns rows that match its criteria
Steam has a games under $10 section of their store page. Somewhere deep in the heart of their systems, there's
probably a query that looks something like:
SELECT *
FROM Items
WHERE Price < 10
AND
Will return:
OR
Will return:
SELECT *
FROM Cars
WHERE TotalCost IN (100, 200, 300)
This query will return Car #2 which costs 200 and Car #3 which costs 100. Note that this is equivalent to using
multiple clauses with OR, e.g.:
SELECT *
FROM Cars
WHERE TotalCost = 100 OR TotalCost = 200 OR TotalCost = 300
SELECT *
FROM Employees
WHERE FName LIKE 'John'
This query will only return Employee #2, whose first name matches 'John' exactly.
SELECT *
FROM Employees
WHERE FName like 'John%'
John% - will return any Employee whose name begins with 'John', followed by any number of characters
%John - will return any Employee whose name ends with 'John', preceded by any number of characters
%John% - will return any Employee whose name contains 'John' anywhere within the value
In this case, the query will return Employee #2 whose name is 'John' as well as Employee #4 whose name is
'Johnathon'.
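The three wildcard positions can be compared side by side. This is a sketch that assumes Python's built-in sqlite3 module and a hypothetical four-row Employees table; note that SQLite's LIKE ignores ASCII case by default, which does not affect these particular results but may differ from your DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced Employees table.
conn.execute("CREATE TABLE Employees (Id INTEGER PRIMARY KEY, FName TEXT)")
conn.executemany("INSERT INTO Employees (Id, FName) VALUES (?, ?)",
                 [(1, "James"), (2, "John"), (3, "Michael"), (4, "Johnathon")])

def matching_ids(pattern):
    """Return the Ids of employees whose first name matches the LIKE pattern."""
    rows = conn.execute(
        "SELECT Id FROM Employees WHERE FName LIKE ? ORDER BY Id", (pattern,))
    return [row[0] for row in rows]

exact = matching_ids("John")         # no wildcard: behaves like an exact match
starts_with = matching_ids("John%")  # 'John' followed by anything
ends_with = matching_ids("%John")    # anything followed by 'John'
contains = matching_ids("%John%")    # 'John' anywhere in the value
```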
To check for customers who have ordered both ProductID 2 and ProductID 3, HAVING can be used:
select customerId
from orders
where productID in (2,3)
group by customerId
having count(distinct productID) = 2
Return value:
customerId
1
The query selects only the records with the productIDs in question, and the HAVING clause then checks for groups
that contain both distinct productIDs rather than just one.
select customerId
from orders
group by customerId
having sum(case when productID = 2 then 1 else 0 end) > 0
and sum(case when productID = 3 then 1 else 0 end) > 0
This query selects only groups having at least one record with productID 2 and at least one with productID 3.
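Both formulations can be checked against the same data. The sketch below assumes Python's built-in sqlite3 module and an invented orders table; Customer #2 deliberately orders product 2 twice, to show why the first query counts distinct productIDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders: customer 1 bought both products, customer 2 bought
# product 2 twice, customer 3 bought only product 3.
conn.execute("CREATE TABLE orders (customerId INTEGER, productID INTEGER)")
conn.executemany("INSERT INTO orders (customerId, productID) VALUES (?, ?)",
                 [(1, 2), (1, 3), (1, 2), (2, 2), (2, 2), (3, 3)])

# Variant 1: pre-filter with WHERE, then require two distinct products per group.
count_distinct = [row[0] for row in conn.execute(
    """
    select customerId
    from orders
    where productID in (2, 3)
    group by customerId
    having count(distinct productID) = 2
    """)]

# Variant 2: no pre-filter; each conditional sum counts one product per group.
conditional_sum = [row[0] for row in conn.execute(
    """
    select customerId
    from orders
    group by customerId
    having sum(case when productID = 2 then 1 else 0 end) > 0
       and sum(case when productID = 3 then 1 else 0 end) > 0
    """)]
```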
Oracle:
SELECT Id,
Col1
FROM (SELECT Id,
Col1,
row_number() over (order by Id) RowNumber
FROM TableName)
WHERE RowNumber <= 20
SQL Server:
SELECT TOP 20 *
FROM dbo.[Sale]
MySQL:
SELECT Id,
Col1
FROM (SELECT Id,
Col1,
row_number() over (order by Id) RowNumber
FROM TableName)
WHERE RowNumber BETWEEN 21 AND 40
PostgreSQL; SQLite:
MySQL:
Oracle:
SELECT Id,
Col1
FROM (SELECT Id,
Col1,
row_number() over (order by Id) RowNumber
FROM TableName)
WHERE RowNumber > 20
PostgreSQL:
SQLite:
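For SQLite, the three cases in this section (take the first rows, skip some rows and then take some, skip some rows and take the rest) can be sketched with LIMIT and OFFSET. This is an illustration assuming Python's built-in sqlite3 module and a hypothetical 50-row TableName; a negative LIMIT is SQLite's way of saying "no upper bound" and is not portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with Ids 1..50.
conn.execute("CREATE TABLE TableName (Id INTEGER PRIMARY KEY, Col1 TEXT)")
conn.executemany("INSERT INTO TableName (Id, Col1) VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 51)])

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# An ORDER BY is needed for the row positions to be well defined.
first_20 = ids("SELECT Id FROM TableName ORDER BY Id LIMIT 20")
rows_21_to_40 = ids("SELECT Id FROM TableName ORDER BY Id LIMIT 20 OFFSET 20")
after_20 = ids("SELECT Id FROM TableName ORDER BY Id LIMIT -1 OFFSET 20")
```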
Example query:
Example result:
In the column type you see if an index was used. In the column possible_keys you see if the execution plan can choose
from different indexes, or if none exists. key tells you the index actually used. key_len shows you the size in bytes
for one index item. The lower this value is, the more index items fit into the same amount of memory and the faster
they can be processed. rows shows you the expected number of rows the query needs to scan; the lower, the better.
DESCRIBE tablename;
Example result:
Here you see the column names, followed by the column types. It shows whether NULL is allowed in the column and
whether the column uses an index. The default value is also displayed, along with any special behavior such as
auto_increment.
Id FirstName LastName
1 Ozgur Ozturk
2 Youssef Medi
3 Henry Tai
Order Table
Id CustomerId Amount
1 2 123.50
2 3 14.80
Get all customers with at least one order
SELECT * FROM Customer WHERE EXISTS (
-- ORDER is a reserved word, so the table name is quoted
SELECT * FROM "Order" WHERE "Order".CustomerId = Customer.Id
)
Result
Id FirstName LastName
2 Youssef Medi
3 Henry Tai
Get all customers with no order
SELECT * FROM Customer WHERE NOT EXISTS (
-- ORDER is a reserved word, so the table name is quoted
SELECT * FROM "Order" WHERE "Order".CustomerId = Customer.Id
)
Result
Id FirstName LastName
1 Ozgur Ozturk
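The two queries can be run against the Customer and Order rows listed above. This sketch assumes Python's built-in sqlite3 module; the table name Order is written in double quotes because ORDER is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.execute('CREATE TABLE "Order" (Id INTEGER PRIMARY KEY, CustomerId INTEGER, Amount REAL)')
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)",
                 [(1, "Ozgur", "Ozturk"), (2, "Youssef", "Medi"), (3, "Henry", "Tai")])
conn.executemany('INSERT INTO "Order" VALUES (?, ?, ?)',
                 [(1, 2, 123.50), (2, 3, 14.80)])

# EXISTS is true as soon as the correlated subquery finds one matching order.
with_orders = conn.execute(
    """
    SELECT * FROM Customer WHERE EXISTS (
        SELECT * FROM "Order" WHERE "Order".CustomerId = Customer.Id
    ) ORDER BY Id
    """).fetchall()

# NOT EXISTS keeps the customers for whom the subquery returns no row at all.
without_orders = conn.execute(
    """
    SELECT * FROM Customer WHERE NOT EXISTS (
        SELECT * FROM "Order" WHERE "Order".CustomerId = Customer.Id
    ) ORDER BY Id
    """).fetchall()
```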
Purpose
EXISTS, IN and JOIN can sometimes be used to get the same result; however, they are not equivalent:
A table may be joined to itself or to any other table. If information from more than two tables needs to be accessed,
multiple joins can be specified in a FROM clause.
In the below example, for each Employee in the example database Employees table, a record is returned containing
the employee's first name together with the corresponding first name of the employee's manager. Since managers
are also employees, the table is joined with itself:
SELECT
e.FName AS "Employee",
m.FName AS "Manager"
FROM
Employees e
JOIN
Employees m
ON e.ManagerId = m.Id
Employee Manager
John James
Michael James
Johnathon John
The first action is to create a Cartesian product of all records in the tables used in the FROM clause. In this case it's
the Employees table twice, so the intermediate table will look like this (I've removed any fields not used in this
example):
The next action is to only keep the records that meet the JOIN criteria, so any records where the aliased e table
ManagerId equals the aliased m table Id:
Then, each expression used within the SELECT clause is evaluated to return this table:
e.FName m.FName
John James
Michael James
Johnathon John
Finally, column names e.FName and m.FName are replaced by their alias column names, assigned with the AS
operator:
Employee Manager
John James
Michael James
Johnathon John
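The steps above can be reproduced on a small scale. The sketch assumes Python's built-in sqlite3 module and a reduced, hypothetical Employees table whose ManagerId values were chosen to match the result shown; a real engine is free to skip the full Cartesian product, so the count below only illustrates the logical model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced Employees table: James manages John and Michael,
# John manages Johnathon.
conn.execute("CREATE TABLE Employees (Id INTEGER PRIMARY KEY, FName TEXT, ManagerId INTEGER)")
conn.executemany("INSERT INTO Employees (Id, FName, ManagerId) VALUES (?, ?, ?)",
                 [(1, "James", None), (2, "John", 1), (3, "Michael", 1), (4, "Johnathon", 2)])

# Step 1 (logical model): every employee row paired with every employee row.
cartesian_size = conn.execute(
    "SELECT COUNT(*) FROM Employees e CROSS JOIN Employees m").fetchone()[0]

# Steps 2-4: keep the pairs that satisfy the join criteria, evaluate the
# SELECT expressions, and apply the column aliases.
cursor = conn.execute(
    """
    SELECT e.FName AS "Employee", m.FName AS "Manager"
    FROM Employees e
    JOIN Employees m ON e.ManagerId = m.Id
    ORDER BY e.Id
    """)
column_names = [col[0] for col in cursor.description]
pairs = cursor.fetchall()
```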
Note that (1,2) are unique to A, (3,4) are common, and (5,6) are unique to B.
Inner Join
An inner join using either of the equivalent queries gives the intersection of the two tables, i.e. the two rows they
have in common:
a | b
--+--
3 | 3
4 | 4
A left outer join will give all rows in A, plus any common rows in B:
a | b
--+-----
1 | null
2 | null
3 | 3
4 | 4
Similarly, a right outer join will give all rows in B, plus any common rows in A:
a | b
-----+----
3 | 3
4 | 4
null | 5
null | 6
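The inner, left and right results above can be reproduced as follows. This is a sketch assuming Python's built-in sqlite3 module, with tables A (values 1 to 4) and B (values 3 to 6) as described. Older SQLite versions lack RIGHT JOIN, so the right outer join is written here as a LEFT JOIN with the tables swapped, which yields the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a INTEGER)")
conn.execute("CREATE TABLE B (b INTEGER)")
conn.executemany("INSERT INTO A (a) VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO B (b) VALUES (?)", [(3,), (4,), (5,), (6,)])

# Inner join: only the rows the two tables have in common.
inner = conn.execute(
    "SELECT A.a, B.b FROM A INNER JOIN B ON A.a = B.b ORDER BY A.a").fetchall()

# Left outer join: every row of A, with NULL (None) where B has no match.
left = conn.execute(
    "SELECT A.a, B.b FROM A LEFT OUTER JOIN B ON A.a = B.b ORDER BY A.a").fetchall()

# Right outer join, emulated by swapping the tables of a left outer join.
right = conn.execute(
    "SELECT A.a, B.b FROM B LEFT OUTER JOIN A ON A.a = B.b ORDER BY B.b").fetchall()
```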
A full outer join will give you the union of A and B, i.e., all the rows in A and all the rows in B. If something in A
doesn't have a corresponding datum in B, then the B portion is null, and vice versa.
a | b
-----+-----
CREATE TABLE A (
X varchar(255) PRIMARY KEY
);
CREATE TABLE B (
Y varchar(255) PRIMARY KEY
);
Inner Join
X Y
------ -----
Lisa Lisa
Marco Marco
Phil Phil
Sometimes abbreviated to "left join". Combines left and right rows that match, and includes non-matching left
rows.
X Y
----- -----
Amy NULL
John NULL
Lisa Lisa
Marco Marco
Phil Phil
Sometimes abbreviated to "right join". Combines left and right rows that match, and includes non-matching right
rows.
X Y
----- -------
Lisa Lisa
Marco Marco
Phil Phil
NULL Tim
NULL Vincent
Sometimes abbreviated to "full join". Union of left and right outer join.
X Y
----- -------
Amy NULL
John NULL
Lisa Lisa
Marco Marco
Phil Phil
NULL Tim
NULL Vincent
X
-----
Lisa
Marco
Phil
Y
-----
Lisa
Marco
Phil
As you can see, there is no dedicated IN syntax for left vs. right semi join - we achieve the effect simply by switching
the table positions within SQL text.
X
----
Amy
John
WARNING: Be careful if you happen to be using NOT IN on a NULL-able column! More details here.
Y
-------
Tim
Vincent
As you can see, there is no dedicated NOT IN syntax for left vs. right anti semi join - we achieve the effect simply by
switching the table positions within SQL text.
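The semi join, the anti semi join and the NOT IN warning can be sketched together. The example assumes Python's built-in sqlite3 module and the names shown in the result tables; B_nullable is a hypothetical copy of B with one NULL added, there only to trigger the pitfall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (X TEXT)")
conn.execute("CREATE TABLE B (Y TEXT)")
conn.execute("CREATE TABLE B_nullable (Y TEXT)")  # hypothetical: B plus one NULL
a_rows = [("Amy",), ("John",), ("Lisa",), ("Marco",), ("Phil",)]
b_rows = [("Lisa",), ("Marco",), ("Phil",), ("Tim",), ("Vincent",)]
conn.executemany("INSERT INTO A (X) VALUES (?)", a_rows)
conn.executemany("INSERT INTO B (Y) VALUES (?)", b_rows)
conn.executemany("INSERT INTO B_nullable (Y) VALUES (?)", b_rows + [(None,)])

def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

left_semi = names("SELECT X FROM A WHERE X IN (SELECT Y FROM B) ORDER BY X")
left_anti = names("SELECT X FROM A WHERE X NOT IN (SELECT Y FROM B) ORDER BY X")

# With a NULL in the subquery, "X NOT IN (...)" is never true: for rows that
# match nothing the comparison with NULL is unknown, so no row is returned.
not_in_with_null = names(
    "SELECT X FROM A WHERE X NOT IN (SELECT Y FROM B_nullable) ORDER BY X")

# NOT EXISTS does not have this problem.
not_exists_with_null = names(
    "SELECT X FROM A WHERE NOT EXISTS "
    "(SELECT 1 FROM B_nullable WHERE B_nullable.Y = A.X) ORDER BY X")
```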
Cross Join
X Y
----- -------
Amy Lisa
John Lisa
Lisa Lisa
Marco Lisa
Phil Lisa
Amy Marco
John Marco
Lisa Marco
Marco Marco
Phil Marco
Amy Phil
John Phil
Lisa Phil
Marco Phil
Phil Phil
Amy Tim
Cross join is equivalent to an inner join with join condition which always matches, so the following query would
have returned the same result:
Self-Join
This simply denotes a table joining with itself. A self-join can be any of the join types discussed above. For example,
this is an inner self-join:
X X
---- -----
Amy John
Amy Lisa
Amy Marco
John Marco
Lisa Marco
Phil Marco
Amy Phil
The following example will select all departments and the first name of employees that work in that department.
Departments with no employees are still returned in the results, but will have NULL for the employee name:
Departments.Name Employees.FName
HR James
HR John
HR Johnathon
Sales Michael
Tech NULL
and
Id Name
1 HR
2 Sales
3 Tech
First a Cartesian product is created from the two tables giving an intermediate table.
The records that meet the join criteria (Departments.Id = Employees.DepartmentId) are highlighted in bold; these are
passed to the next stage of the query.
As this is a LEFT OUTER JOIN, all records are returned from the LEFT side of the join (Departments), while any
records on the RIGHT side are given a NULL marker if they do not match the join criteria. In the table below, this will
return Tech with NULL.
Finally each expression used within the SELECT clause is evaluated to return our final table:
Departments.Name Employees.FName
HR James
HR John
Sales Richard
Tech NULL
It is possible to get accidental cross joins which then return incorrect results, especially if you have a lot of
joins in the query.
If you intended a cross join, then it is not clear from the syntax (write out CROSS JOIN instead), and someone
is likely to change it during maintenance.
The following example will select employee's first names and the name of the departments they work for:
e.FName d.Name
James HR
John HR
Richard Sales
Which returns:
d.Name e.FName
HR James
HR John
HR Michael
HR Johnathon
Sales James
Sales John
Sales Michael
Sales Johnathon
Tech James
Tech John
Tech Michael
Tech Johnathon
It is recommended to write an explicit CROSS JOIN if you want to do a Cartesian join, to highlight that this is what
you want.
The basic idea is that a table-valued function (or inline subquery) gets applied for every row you join.
This makes it possible to, for example, only join the first matching entry in another table.
The difference between a normal and a lateral join lies in the fact that you can use a column that you previously
joined in the subquery that you "CROSS APPLY".
Syntax:
PostgreSQL 9.3+
SQL-Server:
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
/*
AND
(
(@in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(@in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
A FULL OUTER JOIN returns all rows from the left table, and all rows from the right table.
If there are rows in the left table that do not have matches in the right table, or if there are rows in right table that
do not have matches in the left table, then those rows will be listed, too.
Example 1 :
Example 2:
SELECT
COALESCE(T_Budget.Year, tYear.Year) AS RPT_BudgetInYear
,COALESCE(T_Budget.Value, 0.0) AS RPT_Value
FROM T_Budget
Note that if you're using soft-deletes, you'll have to check the soft-delete status again in the WHERE clause (because
FULL JOIN behaves somewhat like a UNION).
It's easy to overlook this, since you already put AP_SoftDeleteStatus = 1 in the join clause.
Also, if you are doing a FULL JOIN, you'll usually have to allow NULL in the WHERE-clause; forgetting to allow NULL
on a value will have the same effects as an INNER join, which is something you don't want if you're doing a FULL
JOIN.
Example:
SELECT
T_AccountPlan.AP_UID
,T_AccountPlan.AP_Code
,T_AccountPlan.AP_Lang_EN
,T_BudgetPositions.BUP_Budget
,T_BudgetPositions.BUP_UID
,T_BudgetPositions.BUP_Jahr
FROM T_BudgetPositions
WHERE (1=1)
AND (T_BudgetPositions.BUP_SoftDeleteStatus = 1 OR T_BudgetPositions.BUP_SoftDeleteStatus IS NULL)
AND (T_AccountPlan.AP_SoftDeleteStatus = 1 OR T_AccountPlan.AP_SoftDeleteStatus IS NULL)
UNION ALL
SELECT People.Name
FROM People
JOIN MyDescendants ON People.Name = MyDescendants.Parent
)
SELECT * FROM MyDescendants;
The following example will select employees' first names (FName) from the Employees table and the name of the
department they work for (Name) from the Departments table:
Employees.FName Departments.Name
James HR
John HR
Richard Sales
(These examples use the Employees and Customers tables from the Example Databases.)
Standard SQL
UPDATE
Employees
SET PhoneNumber =
(SELECT
c.PhoneNumber
FROM
Customers c
WHERE
c.FName = Employees.FName
AND c.LName = Employees.LName)
WHERE Employees.PhoneNumber IS NULL
SQL:2003
MERGE INTO
Employees e
USING
Customers c
ON
e.FName = c.Fname
AND e.LName = c.LName
AND e.PhoneNumber IS NULL
WHEN MATCHED THEN
UPDATE
SET PhoneNumber = c.PhoneNumber
SQL Server
UPDATE
Employees
SET
PhoneNumber = c.PhoneNumber
FROM
Employees e
INNER JOIN Customers c
ON e.FName = c.FName
AND e.LName = c.LName
WHERE
e.PhoneNumber IS NULL
UPDATE Cars
SET TotalCost = TotalCost + 100
WHERE Id = 3 or Id = 4
Update operations can include current values in the updated row. In this simple example the TotalCost is
incremented by 100 for two rows:
A column's new value may be derived from its previous value or from any other column's value in the same table or
a joined table.
UPDATE
Cars
SET
Status = 'READY'
WHERE
Id = 4
This statement will set the status of the row of 'Cars' with id 4 to "READY".
The WHERE clause contains a logical expression which is evaluated for each row. If a row fulfills the criteria, its
value is updated. Otherwise, the row remains unchanged.
UPDATE Cars
SET Status = 'READY'
This statement will set the 'status' column of all rows of the 'Cars' table to "READY" because it does not have a WHERE
clause to filter the set of rows.
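The effect of the WHERE clause on an UPDATE can be observed through the number of affected rows. This is a sketch assuming Python's built-in sqlite3 module and a reduced, hypothetical Cars table with invented values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced Cars table with invented rows.
conn.execute("CREATE TABLE Cars (Id INTEGER PRIMARY KEY, Status TEXT, TotalCost INTEGER)")
conn.executemany("INSERT INTO Cars (Id, Status, TotalCost) VALUES (?, ?, ?)",
                 [(1, "READY", 230), (2, "WAITING", 200),
                  (3, "WAITING", 100), (4, "WORKING", 1254)])

# The new value is derived from the current value; only rows 3 and 4 qualify.
incremented = conn.execute(
    "UPDATE Cars SET TotalCost = TotalCost + 100 WHERE Id = 3 OR Id = 4").rowcount

# A WHERE clause on the key touches a single row.
one_row = conn.execute("UPDATE Cars SET Status = 'READY' WHERE Id = 4").rowcount

# Without a WHERE clause every row in the table is updated.
all_rows = conn.execute("UPDATE Cars SET Status = 'READY'").rowcount

costs = dict(conn.execute("SELECT Id, TotalCost FROM Cars"))
statuses = {row[0] for row in conn.execute("SELECT Status FROM Cars")}
```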
This would create an empty database named myDatabase where you can create tables.
You can use any of the other features of a SELECT statement to modify the data before passing it to the new table.
The columns of the new table are automatically created according to the selected rows.
CREATE TABLE creates a new table in the database, followed by the table name, Employees
This is then followed by the list of column names and their properties, such as the ID
The column CityID of table Employees will reference the column CityID of table Cities. Below you can find
the syntax to do this.
Important: You cannot reference a table that does not exist in the database. Be sure to create the table Cities
first and the table Employees second. If you do it the other way around, it will throw an error.
SQL Server
RETURN @output
END
This example creates a function named FirstWord, that accepts a varchar parameter and returns another varchar
value.
BEGIN TRANSACTION
BEGIN TRY
INSERT INTO dbo.Sale(Price, SaleDate, Quantity)
VALUES (5.2, GETDATE(), 1)
INSERT INTO dbo.Sale(Price, SaleDate, Quantity)
VALUES (5.2, 'not a date', 1)
COMMIT TRANSACTION
END TRY
BEGIN CATCH
-- Roll back first: THROW re-raises the error, so nothing after it would run
ROLLBACK TRANSACTION;
THROW;
END CATCH
BEGIN TRANSACTION
BEGIN TRY
INSERT INTO dbo.Sale(Price, SaleDate, Quantity)
VALUES (5.2, GETDATE(), 1)
INSERT INTO dbo.Sale(Price, SaleDate, Quantity)
VALUES (5.2, GETDATE(), 1)
COMMIT TRANSACTION
END TRY
BEGIN CATCH
-- Roll back first: THROW re-raises the error, so nothing after it would run
ROLLBACK TRANSACTION;
THROW;
END CATCH
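The all-or-nothing behavior of the two transactions above can be sketched outside SQL Server as well. The example below assumes Python's built-in sqlite3 module and a hypothetical Sale table. Because SQLite does not validate date strings, the failing insert passes NULL to a NOT NULL column instead of 'not a date', and the connection's context manager plays the role of the TRY/CATCH block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Sale table; NOT NULL gives us a constraint that can fail.
conn.execute(
    "CREATE TABLE Sale (Price REAL NOT NULL, SaleDate TEXT NOT NULL, Quantity INTEGER NOT NULL)")

def insert_two_sales(second_sale_date):
    """Insert two sales in one transaction; return True if both were committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO Sale (Price, SaleDate, Quantity) VALUES (5.2, date('now'), 1)")
            conn.execute("INSERT INTO Sale (Price, SaleDate, Quantity) VALUES (5.2, ?, 1)",
                         (second_sale_date,))
        return True
    except sqlite3.IntegrityError:
        return False

def sale_count():
    return conn.execute("SELECT COUNT(*) FROM Sale").fetchone()[0]

failed = insert_two_sales(None)            # second insert violates NOT NULL
count_after_failure = sale_count()         # first insert was rolled back too

succeeded = insert_two_sales("2013-07-01")
count_after_success = sale_count()
```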
Let's say we want to extract the names of all the managers from our departments.
Using a UNION we can get all the employees from both HR and Finance departments, which hold the position of a
manager
SELECT
FirstName, LastName
FROM
HR_EMPLOYEES
WHERE
Position = 'manager'
UNION ALL
SELECT
FirstName, LastName
FROM
FINANCE_EMPLOYEES
WHERE
Position = 'manager'
The UNION statement removes duplicate rows from the query results. Since it is possible to have people with the
same name and position in both departments, we are using UNION ALL in order not to remove duplicates.
If you want to use an alias for each output column, you can just put them in the first select statement, as follows:
SELECT
FirstName as 'First Name', LastName as 'Last Name'
FROM
HR_EMPLOYEES
WHERE
Position = 'manager'
UNION ALL
SELECT
FirstName, LastName
FROM
FINANCE_EMPLOYEES
WHERE
Position = 'manager'
UNION joins 2 result sets while removing duplicates from the result set
UNION ALL joins 2 result sets without attempting to remove duplicates
One mistake many people make is to use a UNION when they do not need to have the duplicates removed.
The additional performance cost against large results sets can be very significant.
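The difference in results can be seen on a tiny data set. This sketch assumes Python's built-in sqlite3 module and invented employee rows, with one manager deliberately present in both tables; it shows the difference in output only, not the performance cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("HR_EMPLOYEES", "FINANCE_EMPLOYEES"):
    conn.execute(f"CREATE TABLE {table} (FirstName TEXT, LastName TEXT, Position TEXT)")
# Invented rows: James Smith is a manager in both departments.
conn.executemany("INSERT INTO HR_EMPLOYEES VALUES (?, ?, ?)",
                 [("James", "Smith", "manager"), ("John", "Johnson", "clerk")])
conn.executemany("INSERT INTO FINANCE_EMPLOYEES VALUES (?, ?, ?)",
                 [("James", "Smith", "manager"), ("Ana", "Lopez", "manager")])

query = """
    SELECT FirstName, LastName FROM HR_EMPLOYEES WHERE Position = 'manager'
    {operator}
    SELECT FirstName, LastName FROM FINANCE_EMPLOYEES WHERE Position = 'manager'
"""

# UNION ALL simply concatenates the two result sets.
union_all = sorted(conn.execute(query.format(operator="UNION ALL")).fetchall())

# UNION does the extra work of removing the duplicate row.
union = sorted(conn.execute(query.format(operator="UNION")).fetchall())
```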
Suppose you need to filter a table against 2 different attributes, and you have created separate non-clustered
indexes for each column. A UNION enables you to leverage both indexes while still preventing duplicates.
This simplifies your performance tuning since only simple indexes are needed to perform these queries optimally.
You may even be able to get by with quite a bit fewer non-clustered indexes improving overall write performance
against the source table as well.
Suppose you still need to filter a table against 2 attributes, but you do not need to filter duplicate records (either
because it doesn't matter or your data wouldn't produce any duplicates during the union due to your data model
design).
This is especially useful when creating views that combine data that is designed to be physically partitioned across
multiple tables (perhaps for performance reasons) but still needs to be rolled up into one result. Since the data is
already split, having the database engine remove duplicates adds no value and just adds processing time to the
queries.
The above statement would add two columns to the Employees table: StartingDate, which cannot be NULL and defaults to
the current date, and DateOfBirth, which can be NULL.
This will not only delete the information in that column, but will drop the column salary from table employees (the
column will no longer exist).
This will add a Primary key to the table Employees on the field ID. Including more than one column name in the
parentheses along with ID will create a Composite Primary Key. When adding more than one column, the column
names must be separated by commas.
This query will alter the column datatype of StartingDate and change it from simple date to datetime and set
default to current date.
This drops a constraint called DefaultSalary from the employees table definition.
Note: Ensure that constraints of the column are dropped before dropping a column.
This example will insert all Employees into the Customers table. Since the two tables have different fields and you
don't want to move all the fields over, you need to set which fields to insert into and which fields to select. The
corresponding field names don't need to be called the same thing, but they need to be the same data type. This
example assumes that the Id field has an Identity Specification set and will auto increment.
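An INSERT ... SELECT of this kind can be sketched as follows, assuming Python's built-in sqlite3 module and hypothetical, reduced Employees and Customers tables. In SQLite an INTEGER PRIMARY KEY column assigns itself a value when none is supplied, which stands in for the Identity Specification mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced tables; Employees has a Salary column Customers lacks.
conn.execute(
    "CREATE TABLE Employees (Id INTEGER PRIMARY KEY, FName TEXT, LName TEXT, "
    "PhoneNumber TEXT, Salary INTEGER)")
conn.execute(
    "CREATE TABLE Customers (Id INTEGER PRIMARY KEY, FName TEXT, LName TEXT, PhoneNumber TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
                 [(10, "James", "Smith", "1234567890", 1000),
                  (20, "John", "Johnson", "2468101214", 400)])

# Name the target columns and select matching source columns; Id is left out
# so that the Customers table generates it.
conn.execute(
    """
    INSERT INTO Customers (FName, LName, PhoneNumber)
    SELECT FName, LName, PhoneNumber
    FROM Employees
    ORDER BY Id
    """)

customers = conn.execute("SELECT Id, FName, LName, PhoneNumber FROM Customers ORDER BY Id").fetchall()
```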
If you have two tables that have exactly the same field names and just want to move all the records over you can
use:
This statement will insert a new row into the Customers table. Note that a value was not specified for the Id column,
as it will be added automatically. However, all other column values must be specified.
This statement will insert a new row into the Customers table. Data will only be inserted into the columns specified -
note that no value was provided for the PhoneNumber column. Note, however, that all columns marked as not null
must be included.
For inserting large quantities of data (bulk insert) at the same time, DBMS-specific features and recommendations
exist.
Note: The AND NOT EXISTS portion prevents updating records that haven't changed. Using the INTERSECT construct
allows nullable columns to be compared without special handling.
Now, we just discovered a new user named Joe and would like to take him into account. To achieve that, we need to
determine whether there is an existing row with his name, and if so, update it to increment count; on the other
hand, if there is no existing row, we should create it.
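Before looking at the DBMS-specific statements, the logic just described can be written out by hand as "update, and insert only if nothing was updated". This is a portable sketch, not the dedicated upsert syntax; it assumes Python's built-in sqlite3 module, and the table name users and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the name "users" and both column names are assumptions.
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, visit_count INTEGER NOT NULL)")
conn.execute("INSERT INTO users (name, visit_count) VALUES ('Bob', 5)")

def record_user(name):
    """Increment the counter for an existing name, or create the row with 1."""
    with conn:  # both statements run in one transaction
        cursor = conn.execute(
            "UPDATE users SET visit_count = visit_count + 1 WHERE name = ?", (name,))
        if cursor.rowcount == 0:  # no existing row was updated
            conn.execute("INSERT INTO users (name, visit_count) VALUES (?, 1)", (name,))

record_user("Joe")  # no row yet: inserted with a count of 1
record_user("Joe")  # row exists now: incremented to 2
record_user("Bob")  # existing row: 5 becomes 6
counts = dict(conn.execute("SELECT name, visit_count FROM users"))
```

A single-statement upsert, where the DBMS offers one, avoids the race between the UPDATE and the INSERT when several sessions write at once.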
MySQL uses the following syntax : insert … on duplicate key update …. In this case:
Now, we just discovered a new user named Joe and would like to take him into account. To achieve that, we need to
determine whether there is an existing row with his name, and if so, update it to increment count; on the other
hand, if there is no existing row, we should create it.
PostgreSQL uses the following syntax : insert … on conflict … do update …. In this case:
Create a Department table to hold information about departments. Then create an Employee table which holds
information about the employees. Please note, each employee belongs to a department, hence the Employee table
has referential integrity with the Department table.
The first query selects data from the Department table and uses CROSS APPLY to evaluate the Employee table for each
record of the Department table. The second query simply joins the Department table with the Employee table, and all
the matching records are produced.
SELECT *
FROM Department D
CROSS APPLY (
SELECT *
FROM Employee E
WHERE E.DepartmentID = D.DepartmentID
) A
GO
SELECT *
FROM Department D
INNER JOIN Employee E
ON D.DepartmentID = E.DepartmentID
If you look at the results they produced, it is the exact same result set. How does it differ from a JOIN, and how
does it help in writing more efficient queries?
The first query in Script #2 selects data from the Department table and uses OUTER APPLY to evaluate the Employee
table for each record of the Department table. For those rows for which there is no match in the Employee table,
the result contains NULL values, as you can see in rows 5 and 6. The second query simply uses a LEFT
OUTER JOIN between the Department table and the Employee table. As expected, the query returns all rows from the
Department table, even those for which there is no match in the Employee table.
SELECT *
FROM Department D
OUTER APPLY (
SELECT *
FROM Employee E
WHERE E.DepartmentID = D.DepartmentID
) A
GO
SELECT *
FROM Department D
LEFT OUTER JOIN Employee E
ON D.DepartmentID = E.DepartmentID
GO
Even though the above two queries return the same information, the execution plans will be a bit different. Cost-wise,
however, there will not be much difference.
Now comes the time to see where the APPLY operator is really required. In Script #3, I am creating a table-valued
function which accepts DepartmentID as its parameter and returns all the employees who belong to this
department. The next query selects data from Department table and uses CROSS APPLY to join with the function
So now if you are wondering whether a simple join can be used in place of the above queries, the answer is no. If you
replace CROSS/OUTER APPLY in the above queries with INNER JOIN/LEFT OUTER JOIN, specify an ON clause (something
like 1=1) and run the query, you will get the error "The multi-part identifier "D.DepartmentID" could not be bound."
This is because with JOINs the execution context of the outer query is different from the execution context of the
function (or a derived table), and you cannot bind a value/variable from the outer query to the function as a
parameter. Hence the APPLY operator is required for such queries.
See TRUNCATE documentation for details on how TRUNCATE performance can be better because it ignores triggers
and indexes and logs to just delete the data.
Let's assume we want to DELETE data from Source once it's loaded into Target.
Most common RDBMS implementations (e.g. MySQL, Oracle, PostgreSQL, Teradata) allow tables to be joined
during DELETE, allowing more complex comparisons in a compact syntax.
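Where a join inside DELETE is not available, a correlated EXISTS subquery gives the same effect. The sketch below shows that alternative rather than the join syntax itself; it assumes Python's built-in sqlite3 module, and the columns of the hypothetical Source and Target tables are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: rows 1 and 2 of Source have already been loaded into Target.
conn.execute("CREATE TABLE Source (ID INTEGER PRIMARY KEY, Payload TEXT)")
conn.execute("CREATE TABLE Target (ID INTEGER PRIMARY KEY, Payload TEXT)")
conn.executemany("INSERT INTO Source VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO Target VALUES (?, ?)", [(1, "a"), (2, "b")])

# Delete only the Source rows for which a Target row with the same ID exists.
deleted = conn.execute(
    """
    DELETE FROM Source
    WHERE EXISTS (SELECT 1 FROM Target WHERE Target.ID = Source.ID)
    """).rowcount

remaining = [row[0] for row in conn.execute("SELECT ID FROM Source ORDER BY ID")]
```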
Adding complexity to the original scenario, let's assume Aggregate is built from Target once a day and does not contain
the same ID but contains the same date. Let us also assume that we want to delete data from Source only after the
aggregate is populated for the day.
In PostgreSQL use: