SQL Recap
26 June 2023
11:46
Exercise 1 — Tasks
1. Find the title of each film
SELECT title FROM Movies;
Where clause
25 July 2023
12:14
Operator              Condition                                              Example
BETWEEN … AND …       Number is within range of two values (inclusive)       col_name BETWEEN 1.5 AND 10.5
NOT BETWEEN … AND …   Number is not within range of two values (inclusive)   col_name NOT BETWEEN 1 AND 10
NOT IN (…)            Number does not exist in a list                        col_name NOT IN (1, 3, 5)
Exercise 2 — Tasks
a. Find the movie with a row id of 6
SELECT * FROM movies
WHERE id = 6;
b. Find the movies released in the years between 2000 and 2010
SELECT * FROM movies
WHERE year BETWEEN 2000 AND 2010;
c. Find the movies not released in the years between 2000 and 2010
SELECT * FROM movies
WHERE year NOT BETWEEN 2000 AND 2010;
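To check the "inclusive" note on BETWEEN, here is a minimal sketch using Python's built-in sqlite3 module. The tiny movies table and its rows are made up for illustration and are not the real exercise data.

```python
import sqlite3

# Hypothetical miniature of the movies table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies (id, title, year) VALUES (?, ?, ?)",
    [(1, "Film A", 1999), (2, "Film B", 2000), (3, "Film C", 2005),
     (4, "Film D", 2010), (5, "Film E", 2011)],
)

# BETWEEN includes both endpoints, so 2000 and 2010 are matched.
inside = [row[0] for row in conn.execute(
    "SELECT year FROM movies WHERE year BETWEEN 2000 AND 2010 ORDER BY year")]

# NOT BETWEEN returns exactly the remaining rows.
outside = [row[0] for row in conn.execute(
    "SELECT year FROM movies WHERE year NOT BETWEEN 2000 AND 2010 ORDER BY year")]

# NOT IN excludes only the listed values.
not_listed = [row[0] for row in conn.execute(
    "SELECT year FROM movies WHERE year NOT IN (1999, 2011) ORDER BY year")]

print(inside, outside, not_listed)
```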
More constraints
When writing WHERE clauses with columns containing text data, SQL
supports a number of useful operators to do things like case-
insensitive string comparison and wildcard pattern matching.
Operator     Condition                                                            Example
=            Case sensitive exact string comparison (notice the single equals)   col_name = "abc"
NOT LIKE     Case insensitive exact string inequality comparison                  col_name NOT LIKE "ABCD"
_            Used anywhere in a string to match a single character               col_name LIKE "AN_"
             (only with LIKE or NOT LIKE)                                         (matches "AND", but not "AN")
NOT IN (…)   String does not exist in a list                                      col_name NOT IN ("D", "E", "F")
All strings must be quoted so that the query parser can distinguish words in
the string from SQL keywords.
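The difference between = and LIKE, and the two wildcards, can be tried out with a small sketch like the one below. It assumes SQLite (through Python's built-in sqlite3 module), where LIKE ignores ASCII case by default; other databases may treat case differently. The words table and the matches helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany("INSERT INTO words (word) VALUES (?)",
                 [("AND",), ("and",), ("AN",), ("ANTS",)])

def matches(condition):
    """Return the words satisfying the given WHERE condition, in insertion order."""
    query = "SELECT word FROM words WHERE " + condition + " ORDER BY rowid"
    return [row[0] for row in conn.execute(query)]

exact = matches("word = 'AND'")          # = compares case-sensitively
like_exact = matches("word LIKE 'and'")  # LIKE ignores ASCII case in SQLite
one_char = matches("word LIKE 'AN_'")    # _ stands for exactly one character
any_tail = matches("word LIKE 'AN%'")    # % stands for zero or more characters

print(exact, like_exact, one_char, any_tail)
```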
Exercise 3 — Tasks
a. Find all the Toy Story movies
SELECT * FROM movies
WHERE title LIKE "%toy story%";
c. Find all the movies (and director) not directed by John Lasseter
SELECT title, director FROM movies
WHERE director NOT LIKE "John Lasseter";
The LIMIT will reduce the number of rows to return, and the
optional OFFSET will specify where to begin counting the number
of rows from.
Select query with limited rows
SELECT column, another_column, …
FROM mytable
WHERE condition(s)
ORDER BY column ASC/DESC
LIMIT num_limit OFFSET num_offset;
LIMIT and OFFSET are generally applied last, after the other clauses of the query
(including ORDER BY) have been evaluated.
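A small sketch of that ordering, using Python's built-in sqlite3 module. The cities table and its values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities (city, population) VALUES (?, ?)",
    [("A", 500), ("B", 400), ("C", 300), ("D", 200), ("E", 100)],
)

# The rows are sorted first; OFFSET 2 then skips the two largest cities
# and LIMIT 2 keeps the next two.
page = [row[0] for row in conn.execute(
    "SELECT city FROM cities ORDER BY population DESC LIMIT 2 OFFSET 2")]

print(page)
```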
Exercise 4 — Tasks
a. List all directors of Pixar movies (alphabetically), without duplicates
SELECT DISTINCT director FROM movies
ORDER BY director ASC;
b. List the last four Pixar movies released (ordered from most recent to least)
SELECT title, year FROM movies
ORDER BY year DESC
LIMIT 4;
Review
25 July 2023
12:59
Review 1 — Tasks
1. List all the Canadian cities and their populations
SELECT city, population FROM north_american_cities
WHERE country = "Canada";
2. Order all the cities in the United States by their latitude from north to south
SELECT city, latitude FROM north_american_cities
WHERE country = "United States"
ORDER BY latitude DESC;
3. List all the cities west of Chicago, ordered from west to east. Seeing from the
table that Chicago Longitude is -87.629798
SELECT city, longitude FROM north_american_cities
WHERE longitude < -87.629798
ORDER BY longitude ASC;
5. List the third and fourth largest cities (by population) in the United States and
their population
SELECT city, population FROM north_american_cities
WHERE country = "United States"
ORDER BY population DESC
LIMIT 2 OFFSET 2;
INNER JOIN
25 July 2023
13:21
Entity data in the real world is often broken down into pieces and stored across multiple orthogonal
tables using a process known as normalization.
Database normalization is useful because it minimizes duplicate data in any single table, and allows
data in the different tables to grow independently of each other.
The INNER JOIN is a process that matches rows from the first table and the second table which
have the same key (as defined by the ON constraint) to create a result row with the combined
columns from both tables. After the tables are joined, the other clauses are then applied.
The Movie_id column in the Boxoffice table corresponds to the Id column in
the Movies table.
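The key matching can be seen in a minimal sketch like this one (Python's built-in sqlite3 module; the rows are made up). Note that a movie without a matching box office row does not appear in the result of an INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE boxoffice (movie_id INTEGER, domestic_sales INTEGER);
    INSERT INTO movies VALUES (1, 'Film A'), (2, 'Film B'), (3, 'Film C');
    -- Film C deliberately has no box office row.
    INSERT INTO boxoffice VALUES (1, 100), (2, 250);
""")

# Only rows whose keys match on both sides are combined into result rows.
joined = conn.execute("""
    SELECT m.title, b.domestic_sales
    FROM movies AS m
    INNER JOIN boxoffice AS b ON b.movie_id = m.id
    ORDER BY m.id
""").fetchall()

print(joined)
```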
Exercise 6 — Tasks
a. Find the domestic and international sales for each movie
SELECT Title, International_sales, Domestic_sales FROM Movies M
INNER JOIN Boxoffice B ON B.Movie_id = M.id;
b. Show the sales numbers for each movie that did better internationally rather than
domestically
SELECT Title, International_sales, Domestic_sales FROM Movies M
INNER JOIN Boxoffice B ON B.Movie_id = M.id
WHERE International_sales > Domestic_sales;
OUTER JOIN
25 July 2023
14:27
When joining table A to table B, a LEFT JOIN simply includes rows from A regardless of whether a
matching row is found in B. The RIGHT JOIN is the same, but reversed, keeping rows in B
regardless of whether a match is found in A. Finally, a FULL JOIN simply means that rows from
both tables are kept, regardless of whether a matching row exists in the other table.
When using any of these new joins, you will likely have to write additional logic to deal with NULLs
in the result and constraints.
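A sketch of a LEFT JOIN and the NULLs it produces, using Python's built-in sqlite3 module. The buildings and employees rows are invented. Only LEFT JOIN is shown, since RIGHT and FULL joins are not available in every database or version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buildings (building_name TEXT);
    CREATE TABLE employees (name TEXT, building TEXT);
    INSERT INTO buildings VALUES ('1e'), ('2w'), ('3n');
    INSERT INTO employees VALUES ('Ann', '1e'), ('Bob', '1e'), ('Cy', '2w');
""")

# Every building is kept; a building with no employees gets NULL (None in Python)
# in the employee columns.
rows = conn.execute("""
    SELECT b.building_name, e.name
    FROM buildings AS b
    LEFT JOIN employees AS e ON e.building = b.building_name
    ORDER BY b.building_name, e.name
""").fetchall()

# Testing for NULL afterwards finds the buildings without any employees.
empty = [row[0] for row in conn.execute("""
    SELECT b.building_name
    FROM buildings AS b
    LEFT JOIN employees AS e ON e.building = b.building_name
    WHERE e.name IS NULL
""")]

print(rows, empty)
```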
Exercise 7 — Tasks
a. Find the list of all buildings that have employees
SELECT DISTINCT Building_name FROM Buildings B
INNER JOIN Employees E ON B.Building_name = E.Building;
Exercise 8 — Tasks
1. Find the name and role of all employees who have not been assigned to a building
SELECT Name, Role FROM employees
WHERE Building IS NULL;
Expressions
26 July 2023
09:47
Expressions can use mathematical and string functions along with
basic arithmetic to transform values when the query is executed.
Example query with expressions
SELECT particle_speed / 2.0 AS half_particle_speed
FROM physics_data
WHERE ABS(particle_position) * 10.0 > 500;
Exercise 9 — Tasks
1. List all movies and their combined sales in millions of dollars
SELECT Title, (Domestic_sales + International_sales)/1000000 as Combined_sales
FROM Movies
INNER JOIN Boxoffice
ON Movies.id = Boxoffice.movie_id;
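One thing to watch when dividing by 1000000: in SQLite, dividing an integer by an integer truncates the result. The sketch below (Python's built-in sqlite3 module, made-up numbers) shows the difference; other databases may behave differently, so treat this as an assumption about SQLite only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer truncates in SQLite; a floating point divisor
# keeps the fractional part.
int_div, float_div = conn.execute(
    "SELECT 2500000 / 1000000, 2500000 / 1000000.0").fetchone()

print(int_div, float_div)
```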
Function                  Description
COUNT(*), COUNT(column)   A common function used to count the number of rows in the group if no
                          column name is specified. Otherwise, counts the number of rows in the group
                          with non-NULL values in the specified column.
MIN(column)               Finds the smallest numerical value in the specified column for all rows in the
                          group.
MAX(column)               Finds the largest numerical value in the specified column for all rows in the
                          group.
AVG(column)               Finds the average numerical value in the specified column for all rows in the
                          group.
SUM(column)               Finds the sum of all numerical values in the specified column for the rows in
                          the group.
The GROUP BY clause works by grouping rows that have the same
value in the column specified.
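A short sketch of COUNT(*) versus COUNT(column) and of GROUP BY, using Python's built-in sqlite3 module. The employees rows are hypothetical, with one NULL added on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, role TEXT, years_employed INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 'Artist', 4), ('Bob', 'Artist', 6),
        ('Cy', 'Engineer', 2), ('Di', 'Engineer', NULL);
""")

# COUNT(*) counts every row; COUNT(column) skips rows where the column is NULL.
total_rows, non_null_years = conn.execute(
    "SELECT COUNT(*), COUNT(years_employed) FROM employees").fetchone()

# GROUP BY produces one result row per distinct role; AVG also ignores NULLs.
per_role = conn.execute("""
    SELECT role, COUNT(*), AVG(years_employed)
    FROM employees
    GROUP BY role
    ORDER BY role
""").fetchall()

print(total_rows, non_null_years, per_role)
```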
Exercise 10 — Tasks
1. Find the longest time that an employee has been at the studio
SELECT MAX(Years_employed) FROM employees;
2. For each role, find the average number of years employed by employees in that
role
SELECT Role, AVG(Years_employed) FROM employees
GROUP BY Role;
Aggregates
26 July 2023
10:46
The GROUP BY clause is executed after the WHERE clause (which filters
the rows which are to be grouped).
An additional HAVING clause is used specifically with the GROUP
BY clause to allow us to filter grouped rows from the result set.
Select query with HAVING constraint
SELECT group_by_column, AGG_FUNC(column_expression) AS aggregate_result_alias, …
FROM mytable
WHERE condition
GROUP BY column
HAVING group_condition;
The HAVING clause constraints are written the same way as
the WHERE clause constraints, and are applied to the grouped rows.
If you aren't using the `GROUP BY` clause, a simple `WHERE` clause
will suffice.
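To see WHERE and HAVING acting at different stages, here is a minimal sketch with Python's built-in sqlite3 module and made-up employees rows: WHERE removes rows before grouping, HAVING removes whole groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, role TEXT, years_employed INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 'Artist', 4), ('Bob', 'Artist', 6), ('Cy', 'Engineer', 2),
        ('Di', 'Engineer', 1), ('Ed', 'Manager', 9);
""")

# WHERE drops Di (1 year) before grouping, leaving Artist: 2, Engineer: 1, Manager: 1.
# HAVING then keeps only the groups with at least two remaining rows.
rows = conn.execute("""
    SELECT role, COUNT(*) AS headcount
    FROM employees
    WHERE years_employed >= 2
    GROUP BY role
    HAVING COUNT(*) >= 2
    ORDER BY role
""").fetchall()

print(rows)
```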
Exercise 11 — Tasks
1. Find the number of Artists in the studio (without a HAVING clause)
SELECT COUNT(Name) AS Artist_count FROM employees
WHERE Role = "Artist";
Each query begins with finding the data that we need in a database,
and then filtering that data down into something that can be
processed and understood as quickly as possible.
Complete SELECT query
SELECT DISTINCT column, AGG_FUNC(column_or_expression), …
FROM mytable
JOIN another_table
ON mytable.column = another_table.column
WHERE constraint_expression
GROUP BY column
HAVING constraint_expression
ORDER BY column ASC/DESC
LIMIT count OFFSET count;
Exercise 12 — Tasks
1. Find the number of movies each director has directed
SELECT Director, COUNT(Title) FROM movies
GROUP BY Director;
2. Find the total domestic and international sales that can be attributed to each
director
SELECT Director, SUM(Domestic_sales + International_sales) AS Total_sales
FROM Movies M
INNER JOIN Boxoffice B
ON M.Id = B.Movie_Id
GROUP BY Director;
Inserting Rows
26 July 2023
11:40
Exercise 13 — Tasks
a. Add the studio's new production, Toy Story 4 to the list of movies (you can use
any director)
INSERT INTO movies VALUES (4, "Toy Story 4", "El Directore", 2015, 90);
b. Toy Story 4 has been released to critical acclaim! It had a rating of 8.7, and
made 340 million domestically and 270 million internationally. Add the
record to the BoxOffice table.
INSERT INTO boxoffice VALUES (4, 8.7, 340000000, 270000000);
Updating Rows
26 July 2023
11:55
The UPDATE statement requires you to specify exactly which table,
columns, and rows to update. In addition, the data you are updating
has to match the data type of the columns in the table schema.
Update statement with values
UPDATE mytable
SET column = value_or_expr,
other_column = another_value_or_expr,
…
WHERE condition;
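A sketch of an UPDATE that sets two columns on one row, using Python's built-in sqlite3 module. The table contents and the director names are invented. The cursor's rowcount is a handy check that the WHERE clause touched as many rows as expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, director TEXT, year INTEGER);
    INSERT INTO movies VALUES
        (1, 'Film A', 'Someone', 1998),
        (2, 'Film B', 'Someone', 2001);
""")

# SET can take a plain value or an expression based on the current row.
cur = conn.execute(
    "UPDATE movies SET director = ?, year = year + 1 WHERE id = ?",
    ("Someone Else", 2),
)
changed = cur.rowcount  # number of rows the WHERE clause matched
conn.commit()

rows = conn.execute("SELECT id, director, year FROM movies ORDER BY id").fetchall()
print(changed, rows)
```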
Exercise 14 — Tasks
1. The director for A Bug's Life is incorrect, it was actually directed by John
Lasseter
UPDATE Movies
SET Director = "John Lasseter"
WHERE Id = 2;
2. The year that Toy Story 2 was released is incorrect, it was actually released
in 1999
UPDATE Movies
SET Year = 1999
WHERE id = 3;
3. Both the title and director for Toy Story 8 are incorrect! The title should be "Toy
Story 3" and it was directed by Lee Unkrich
UPDATE movies
SET title = "Toy Story 3", director = "Lee Unkrich"
WHERE id = 11;
Deleting Rows
27 July 2023
14:23
When you need to delete data from a table in the database, you can
use a DELETE statement, which describes the table to act on, and the
rows of the table to delete through the WHERE clause.
Delete statement with condition
DELETE FROM mytable
WHERE condition;
If you decide to leave out the WHERE constraint, then all rows are
removed, which is a quick and easy way to clear out a table
completely (if intentional).
It is downright easy to irrevocably remove data, so always
read your DELETE statements twice and execute once.
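One way to "read twice" in practice is to count the matching rows first and to run the DELETE inside a transaction, so a mistake can still be rolled back before it is committed. The sketch below uses Python's built-in sqlite3 module with made-up rows; once committed, the deletion is permanent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    INSERT INTO movies VALUES
        (1, 'Film A', 1998), (2, 'Film B', 2004), (3, 'Film C', 2010);
""")
conn.commit()

def count_rows():
    """Number of rows currently visible in the movies table."""
    return conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]

# Forgetting the WHERE clause removes every row ...
conn.execute("DELETE FROM movies")
after_mistake = count_rows()

# ... but nothing has been committed yet, so it can still be undone.
conn.rollback()
after_rollback = count_rows()

# Preview how many rows the condition matches, then delete and commit.
to_delete = conn.execute(
    "SELECT COUNT(*) FROM movies WHERE year < 2005").fetchone()[0]
deleted = conn.execute("DELETE FROM movies WHERE year < 2005").rowcount
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT title FROM movies")]
print(after_mistake, after_rollback, to_delete, deleted, remaining)
```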
Exercise 15 — Tasks
1. This database is getting too big, let's remove all movies that were
released before 2005.
DELETE FROM movies
WHERE Year < 2005;
2. Andrew Stanton has also left the studio, so please remove all movies directed by
him.
DELETE FROM movies
WHERE Director = "Andrew Stanton";
Creating Tables
27 July 2023
14:28
Data type               Description
INTEGER, BOOLEAN        The integer datatypes can store whole integer values like the count of a
                        number or an age. In some implementations, the boolean value is just
                        represented as an integer value of just 0 or 1.
FLOAT, DOUBLE, REAL     The floating point datatypes can store more precise numerical data like
                        measurements or fractional values. Different types can be used depending
                        on the floating point precision required for that value.
CHARACTER(num_chars),   The text based datatypes can store strings and text in all sorts of locales. The
VARCHAR(num_chars),     distinction between the various types generally amounts to the underlying
TEXT                    efficiency of the database when working with these columns.
                        Both the CHARACTER and VARCHAR (variable character) types are specified
                        with the max number of characters that they can store (longer values may be
                        truncated), so can be more efficient to store and query with big tables.
DATE, DATETIME          SQL can also store date and time stamps to keep track of time series and
                        event data. They can be tricky to work with especially when manipulating
                        data across timezones.
BLOB                    Finally, SQL can store binary data in blobs right in the database. These values
                        are often opaque to the database, so you usually have to store them with
                        the right metadata to requery them.
Table constraints
Each column can have additional table constraints on it which limit
what values can be inserted into that column.
Constraint           Description
PRIMARY KEY          This means that the values in this column are unique, and each value can be used to
                     identify a single row in this table.
AUTOINCREMENT        For integer values, this means that the value is automatically filled in and
                     incremented with each row insertion. Not supported in all databases.
UNIQUE               This means that the values in this column have to be unique, so you can't insert
                     another row with the same value in this column as another row in the table. Differs
                     from the `PRIMARY KEY` in that it doesn't have to be a key for a row in the table.
NOT NULL             This means that the inserted value cannot be `NULL`.
CHECK (expression)   This allows you to run a more complex expression to test whether the values
                     inserted are valid. For example, you can check that values are positive, or greater
                     than a specific size, or start with a certain prefix, etc.
FOREIGN KEY          This is a consistency check which ensures that each value in this column corresponds
                     to another value in a column in another table.
                     For example, if there are two tables, one listing all Employees by ID, and another
                     listing their payroll information, the `FOREIGN KEY` can ensure that every row in the
                     payroll table corresponds to a valid employee in the master Employee list.
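The constraints above can be watched rejecting bad rows in a sketch like the following. It uses Python's built-in sqlite3 module; the employees and payroll schemas and the is_rejected helper are hypothetical. One SQLite-specific assumption: foreign keys are only enforced after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        email TEXT UNIQUE NOT NULL
    );
    CREATE TABLE payroll (
        employee_id INTEGER NOT NULL REFERENCES employees (id),
        salary INTEGER CHECK (salary > 0)
    );
    INSERT INTO employees VALUES (1, 'ann@example.com');
    INSERT INTO payroll VALUES (1, 50000);
""")

def is_rejected(sql):
    """Run one statement and report whether a constraint refused it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

results = {
    "unique": is_rejected("INSERT INTO employees VALUES (2, 'ann@example.com')"),
    "not_null": is_rejected("INSERT INTO employees VALUES (3, NULL)"),
    "check": is_rejected("INSERT INTO payroll VALUES (1, -5)"),
    "foreign_key": is_rejected("INSERT INTO payroll VALUES (99, 100)"),
    "valid_row": is_rejected("INSERT INTO payroll VALUES (1, 60000)"),
}
print(results)
```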
An example
Movies table schema
CREATE TABLE movies (
    id INTEGER PRIMARY KEY,
    title TEXT,
    director TEXT,
    year INTEGER,
    length_minutes INTEGER
);
Exercise 16 — Tasks
a. Create a new table named Database with the following columns:
– Name A string (text) describing the name of the database
– Version A number (floating point) of the latest version of this database
– Download_count An integer count of the number of times this database was
downloaded
This table has no constraints.
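No answer was recorded for this task, so the following is only one possible sketch, run through Python's built-in sqlite3 module. The table name is double-quoted as a precaution, since DATABASE is a keyword in some SQL dialects; that quoting is an assumption and not part of the task.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One possible answer: three columns, no constraints.
conn.execute("""
    CREATE TABLE "Database" (
        Name TEXT,
        Version FLOAT,
        Download_count INTEGER
    )
""")

# PRAGMA table_info lists each column; fields 1 and 2 are the name and declared type.
columns = [(row[1], row[2]) for row in conn.execute('PRAGMA table_info("Database")')]
print(columns)
```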
Altering Tables
27 July 2023
14:38
As your data changes over time, SQL provides a way for you to
update your corresponding tables and database schemas by using
the ALTER TABLE statement to add, remove, or modify columns and
table constraints.
Adding columns
You need to specify the data type of the column along with any
potential table constraints and default values to be applied to both
existing and new rows. In some databases like MySQL, you can even
specify where to insert the new column using
the FIRST or AFTER clauses.
Altering table to add new column(s)
ALTER TABLE mytable
ADD column DataType OptionalTableConstraint
    DEFAULT default_value;
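A sketch of how the default reaches both existing and new rows, using Python's built-in sqlite3 module. The rows and the language column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO movies VALUES (1, 'Film A');
""")

# Add a column with a default value after a row already exists.
conn.execute("ALTER TABLE movies ADD COLUMN language TEXT DEFAULT 'English'")

# A new row that omits the column gets the default; an explicit value overrides it.
conn.execute("INSERT INTO movies (id, title) VALUES (2, 'Film B')")
conn.execute("INSERT INTO movies (id, title, language) VALUES (3, 'Film C', 'French')")
conn.commit()

rows = conn.execute("SELECT id, title, language FROM movies ORDER BY id").fetchall()
print(rows)
```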
Removing columns
Dropping columns is as easy as specifying the column to drop.
However, some databases don't support this feature (SQLite only
added it in version 3.35.0). Instead, you may have to create a new
table and migrate the data over.
Altering table to remove column(s)
ALTER TABLE mytable
DROP column_to_be_deleted;
Exercise 17 — Tasks
1. Add a column named Aspect_ratio with a FLOAT data type to store the aspect-
ratio each movie was released in.
ALTER TABLE movies
ADD Aspect_ratio FLOAT;
2. Add another column named Language with a TEXT data type to store the
language that the movie was released in. Ensure that the default for this language
is English.
ALTER TABLE movies
ADD Language TEXT
DEFAULT "English";
Dropping Tables
27 July 2023
14:45
In some rare cases, you may want to remove an entire table
including all of its data and metadata, and to do so, you can use
the DROP TABLE statement, which differs from the DELETE statement
in that it also removes the table schema from the database entirely.
Drop table statement
DROP TABLE IF EXISTS mytable;
Like the CREATE TABLE statement, the database may throw an error if
the specified table does not exist, and to suppress that error, you
can use the IF EXISTS clause.
In addition, if you have another table that is dependent on columns
in the table you are removing (for example, with a FOREIGN
KEY dependency), then you will have to either update all dependent
tables first to remove the dependent rows or remove those tables
entirely.
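The effect of IF EXISTS can be seen in a small sketch (Python's built-in sqlite3 module; the scratch table is hypothetical). In SQLite, dropping a missing table raises an OperationalError unless IF EXISTS is given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scratch (id INTEGER)")
conn.execute("DROP TABLE scratch")

# Dropping the same table again fails because it no longer exists.
try:
    conn.execute("DROP TABLE scratch")
    second_drop_failed = False
except sqlite3.OperationalError:
    second_drop_failed = True

# With IF EXISTS the statement simply does nothing.
conn.execute("DROP TABLE IF EXISTS scratch")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(second_drop_failed, tables)
```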
Subqueries
27 July 2023
15:22
First, you would need to calculate the average revenue all the Associates are generating:
SELECT AVG(revenue_generated)
FROM sales_associates;
And then using that result, we can then compare the costs of each of the Associates
against that value. To use it as a subquery, we can just write it straight into
the WHERE clause of the query:
SELECT *
FROM sales_associates
WHERE salary >
    (SELECT AVG(revenue_generated)
     FROM sales_associates);
As the constraint is executed, each Associate's salary will be tested against the value
queried from the inner subquery.
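The two steps can be run end to end in a sketch like this one, using Python's built-in sqlite3 module. The associates and their numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_associates (name TEXT, salary INTEGER, revenue_generated INTEGER);
    INSERT INTO sales_associates VALUES
        ('Ann', 90, 60), ('Bob', 50, 100), ('Cy', 70, 80);
""")

# Step 1: the inner query on its own returns a single value.
avg_revenue = conn.execute(
    "SELECT AVG(revenue_generated) FROM sales_associates").fetchone()[0]

# Step 2: the same query, wrapped in parentheses, used inside the WHERE clause.
costly = [row[0] for row in conn.execute("""
    SELECT name
    FROM sales_associates
    WHERE salary >
        (SELECT AVG(revenue_generated)
         FROM sales_associates)
    ORDER BY name
""")]

print(avg_revenue, costly)
```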
A subquery can be referenced anywhere a normal table can be
referenced. Inside a FROM clause, you can JOIN subqueries with other
tables, inside a WHERE or HAVING constraint, you can test expressions
against the results of the subquery, and even in expressions in
the SELECT clause, which allow you to return data directly from the
subquery. They are generally executed in the same logical order as
the part of the query that they appear in.
Because subqueries can be nested, each subquery must be fully
enclosed in parentheses in order to establish proper hierarchy.
Subqueries can otherwise reference any tables in the database, and
make use of the constructs of a normal query (though some
implementations don't allow subqueries to use LIMIT or OFFSET).
Correlated subqueries
From <https://sqlbolt.com/topic/subqueries>
When working with multiple tables, the UNION and UNION ALL operators allow you to
append the results of one query to another assuming that they have the same column
count, order and data type. If you use the UNION without the ALL, duplicate rows
between the tables will be removed from the result.
Select query with set operators
SELECT column, another_column
FROM mytable
UNION / UNION ALL / INTERSECT / EXCEPT
SELECT other_column, yet_another_column
FROM another_table
ORDER BY column DESC LIMIT n;
The UNION happens before the ORDER BY and LIMIT. It's not common to use UNIONs,
but if you have data in different tables that can't be joined and processed, it can be an
alternative to making multiple queries on the database.
Similar to the UNION, the INTERSECT operator will ensure that only rows that are
identical in both result sets are returned, and the EXCEPT operator will ensure that only
rows in the first result set that aren't in the second are returned. This means that
the EXCEPT operator is query order-sensitive, like the LEFT JOIN and RIGHT JOIN.
Both INTERSECT and EXCEPT also discard duplicate rows after their respective
operations, though some databases also support INTERSECT ALL and EXCEPT ALL to
allow duplicates to be retained and returned.
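A compact sketch of the four set operators, using Python's built-in sqlite3 module. The tables a and b and the run helper are hypothetical. It also shows that swapping the two queries changes the result of EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def run(first, operator, second):
    """Combine two single-column tables with the given set operator."""
    query = f"SELECT n FROM {first} {operator} SELECT n FROM {second} ORDER BY n"
    return [row[0] for row in conn.execute(query)]

union = run("a", "UNION", "b")          # duplicates removed
union_all = run("a", "UNION ALL", "b")  # duplicates kept
intersect = run("a", "INTERSECT", "b")  # rows present in both results
a_except_b = run("a", "EXCEPT", "b")    # rows in a that are not in b
b_except_a = run("b", "EXCEPT", "a")    # order of the queries matters

print(union, union_all, intersect, a_except_b, b_except_a)
```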