Intermediate SQL Edited

This document outlines an intermediate SQL course led by Jasmin Ludolf, focusing on querying databases, specifically using PostgreSQL. Key topics include using COUNT and DISTINCT functions, understanding query execution order, debugging SQL errors, and adhering to SQL style best practices. The course also emphasizes the importance of filtering data using the WHERE clause and provides exercises to practice these concepts.

Intermediate SQL

1. Querying a database
Hello, my name is Jasmin Ludolf, and I'll be your instructor for this course on using SQL to turn raw
data into actionable insights. We'll build on our foundational knowledge of SQL, learn how to reveal
insights, and how to present our results clearly.
1.1. Course roadmap

• Querying databases
• Count and view specified records
• Understand query execution and style
• Filtering
• Aggregate functions
• Sorting and grouping
While SQL can be used to create and modify databases, the focus of this course will be querying
databases. Recall that a query is a request for data from a database. In this course, we'll look at how to
query a database using keywords that enable us to count and view all or a
specified number of records. We'll go over common SQL errors, style guidelines, and the order in
which our code will execute. We'll cover how to filter data using various techniques, how to use
aggregate functions, and finally, how to sort and group the results. We'll be using PostgreSQL
throughout.
1.2. Our films database

We will work with a films database containing four tables: films, reviews, people, and roles. Our
database schema, pictured here, shows the table names, field names, and data types.
1.3. COUNT()

• COUNT()
• Counts the number of records with a value in a field
• Use an alias for clarity
Here we go with our first new keyword. Let's say we wanted to count something from our people
table. The COUNT function lets us do this by returning the number of records with a value in a field.
For example, to count the number of birth dates present in the people table, we will use SELECT
COUNT(birthdate) AS count_birthdates FROM people. The result is 6152 birthdates. We've used the alias
count_birthdates for the field name in this example to make the results more readable.
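The query described above can be sketched against a tiny stand-in table. The table demo_people and its four rows are invented for illustration, so the count here is 3 rather than the 6152 from the course data.

```sql
-- Hypothetical stand-in for the course's people table (invented rows)
DROP TABLE IF EXISTS demo_people;
CREATE TEMP TABLE demo_people (name TEXT, birthdate DATE);
INSERT INTO demo_people VALUES
    ('Ana', '1970-05-01'),
    ('Ben', NULL),          -- no birthdate, so COUNT(birthdate) skips this record
    ('Cho', '1982-11-23'),
    ('Dev', '1970-05-01');

-- Count the records that have a value in birthdate
SELECT COUNT(birthdate) AS count_birthdates
FROM demo_people;  -- 3 with the sample rows above
```

The alias only renames the result column; it does not change the number returned.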
1.3.1. COUNT() multiple fields

If we want to count more than one field, we need to use COUNT multiple times. Here we are counting
both the number of names and birth dates present in the people table.
1.3.2. Using * with COUNT()

• COUNT(field_name) counts values in a field
• COUNT(*) counts records in a table
• * represents all fields
Using COUNT with a field name tells us how many values are in a field. However, if we want to
count the number of records in a table, we can call COUNT with an asterisk. For example, this code
gives the total number of records in the people table. The asterisk represents all fields. Passing the
asterisk to COUNT is a shortcut for counting the total number of records.
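To make the difference concrete, here is a minimal sketch that runs both forms of COUNT side by side. The table demo_people and its rows are assumptions made up for this example, not the course data.

```sql
-- Hypothetical stand-in for the people table (invented rows)
DROP TABLE IF EXISTS demo_people;
CREATE TEMP TABLE demo_people (name TEXT, birthdate DATE);
INSERT INTO demo_people VALUES
    ('Ana', '1970-05-01'),
    ('Ben', NULL),
    ('Cho', '1982-11-23'),
    ('Dev', NULL);

SELECT COUNT(*) AS total_records,           -- every record: 4
       COUNT(birthdate) AS count_birthdates -- records with a birthdate: 2
FROM demo_people;
```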
1.4. DISTINCT
• DISTINCT removes duplicates to return only unique values

• Which languages are in our films table?

Often, our results will include duplicates. We can use the DISTINCT keyword to select all the unique
values from a field. This might be useful if, for example, we're interested in knowing which languages
are represented in the films table. Adding DISTINCT to our query will remove all duplicates, as we
can see here.
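A small sketch of the DISTINCT query described above follows. The table demo_films and its four rows are hypothetical; the real films table is much larger.

```sql
-- Hypothetical stand-in for the films table (invented rows)
DROP TABLE IF EXISTS demo_films;
CREATE TEMP TABLE demo_films (title TEXT, language TEXT);
INSERT INTO demo_films VALUES
    ('Film A', 'English'),
    ('Film B', 'Spanish'),
    ('Film C', 'English'),
    ('Film D', 'Danish');

-- Each language appears once, however many films share it
SELECT DISTINCT language
FROM demo_films;  -- three rows: Danish, English, Spanish (row order is not guaranteed)
```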
1.5. COUNT() with DISTINCT
• Combine COUNT() with DISTINCT to count unique values

• COUNT() includes duplicates


• DISTINCT excludes duplicates
Combining COUNT with DISTINCT is also common to count the number of unique values in a field.
This query counts the number of distinct birth dates in the people table. Let's take a moment to
consider why this number is different from the birthdate count of 6152 we got before. Some people in
our table likely share the same birthday; COUNT would include all the duplicates while DISTINCT
counts all of the unique dates, no matter how many times they come up.
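The gap between the two counts can be reproduced on a small scale. In this sketch the table demo_people is an invented stand-in in which two people share a birthday.

```sql
-- Hypothetical stand-in for the people table (invented rows)
DROP TABLE IF EXISTS demo_people;
CREATE TEMP TABLE demo_people (name TEXT, birthdate DATE);
INSERT INTO demo_people VALUES
    ('Ana', '1970-05-01'),
    ('Ben', '1982-11-23'),
    ('Cho', '1970-05-01'),  -- same birthday as Ana
    ('Dev', NULL);

SELECT COUNT(birthdate) AS count_birthdates,                  -- 3: duplicates included
       COUNT(DISTINCT birthdate) AS count_distinct_birthdates -- 2: each date counted once
FROM demo_people;
```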

Exercise
Practice with COUNT()

As you've seen, COUNT(*) tells you how many records are in a table. However, if you want to count
the number of non-missing values in a particular field, you can call COUNT() on just that field.

Let's get some practice with COUNT()! You can look at the data in the tables throughout these
exercises by clicking on the table name in the console.

Instructions 1/3:
• Count the total number of records in the people table, aliasing the result as count_records.

-- Count the number of records in the people table
SELECT COUNT(*) AS count_records
FROM people;
Instructions 2/3:
• Count the number of records with a birthdate in the people table, aliasing the result as
count_birthdate.

-- Count the number of birthdates in the people table


SELECT COUNT(birthdate) AS count_birthdate
FROM people;

Instructions 3/3:
• Count the records for languages and countries in the films table; alias as count_languages and
count_countries.

-- Count the records for languages and countries represented in the films table
SELECT COUNT(language) AS count_languages, COUNT(country) AS count_countries
FROM films;
Exercise
SELECT DISTINCT
Often query results will include many duplicate values. You can use the DISTINCT keyword to select
the unique values from a field.

This might be useful if, for example, you're interested in knowing which languages are represented in
the films table. See if you can find out what countries are represented in this table with the following
exercises.

Instructions 1/2:
• Return the unique countries represented in the films table using DISTINCT.

-- Return the unique countries from the films table


SELECT DISTINCT country
FROM films;
Instructions 2/2:
• Return the number of unique countries represented in the films table, aliased as
count_distinct_countries.

-- Count the distinct countries from the films table


SELECT COUNT(DISTINCT country) AS count_distinct_countries
FROM films;

2. Query execution
Fantastic work on using COUNT and DISTINCT! Now that we've flexed our SQL muscle a bit, we'll
take a small step back and better understand how SQL code works.
2.1. Order of execution
• SQL is not processed in its written order
• LIMIT limits how many results we return

• Good to know processing order for debugging and aliasing


• Aliases are declared in the SELECT statement
Unlike many programming languages, SQL code is not processed in the order it is written. Suppose
we want to grab a coat from a closet: first, we need to know which closet contains the coats. This is
similar to the FROM statement, which is the first line to be processed. Before any data can be
selected, the table from which the data will be selected needs to be indicated. Next, our SELECTion is
made. Finally, the results are refined. Here we use the LIMIT keyword, which limits the results to a
specified number of records. In this case, we only want to return the first ten names from the people
table. Knowing the processing order is especially useful when debugging and when aliasing fields and tables.
Suppose we need to refer to an alias elsewhere in our code. That reference will only work if the
SELECT statement that declares the alias is processed before the part of the query that contains the
reference.
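The processing order described above can be annotated directly on a query. This is only a sketch: demo_people is an invented four-row table, so the limit is 2 here rather than the ten used in the lesson.

```sql
-- Hypothetical stand-in for the people table (invented rows)
DROP TABLE IF EXISTS demo_people;
CREATE TEMP TABLE demo_people (name TEXT);
INSERT INTO demo_people VALUES ('Ana'), ('Ben'), ('Cho'), ('Dev');

SELECT name       -- processed second: choose the field
FROM demo_people  -- processed first: choose the table
LIMIT 2;          -- processed last: keep only two records
```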
2.2. Debugging SQL

• Misspelling
• Incorrect capitalization
• Incorrect or missing punctuation
Before we begin working with more advanced queries, it's useful to know more about debugging SQL
code and how to read the error messages. Some messages are extremely helpful, pinpointing and even
suggesting a solution for the error, as this message does when we misspell the "name" field we'd like
to select. Other common errors may involve incorrect capitalization or punctuation.
2.3. Comma errors

• Look out for comma errors!


Other error messages are less helpful and require us to review our code more closely. Forgetting a
comma is a very common error. Let's say we've drafted this code to find all titles, country of origin,
and duration of films. The error message will alert us to the general location of the error using a caret
below the line of code, which in this case points to the "country" field name. We must examine the
code a little further, though, to discover the missing comma is between "country" and "duration".
2.4. Keyword errors

SQL displays a similar error message when a keyword is misspelled, but this time, the caret indicator
below the offending line is spot on.
2.5. Final note on errors

Most common errors:


• Misspelling
• Incorrect capitalization
• Incorrect or missing punctuation, especially commas

Learn by making mistakes


There are a few more SQL errors out there, but the three mentioned in this lesson will be the most
common ones we will encounter. Debugging is a major skill, and the best way to master this skill is to
make mistakes and learn from them.
Exercise
Debugging errors

Debugging is an essential skill for all coders, and it comes from making many mistakes and learning
from them.
In this exercise, you'll be given some buggy code that you'll need to fix.

Instructions 1/3:
• Debug and fix the SQL query provided.
-- Debug this code
SELECT certification
FROM films
LIMIT 5;
Instructions 2/3:
• Find the two errors in this code; the same error has been repeated twice.

-- Debug this code


SELECT film_id, imdb_score, num_votes
FROM reviews;

Instructions 3/3:
• Find the two bugs in this final query.
-- Debug this code
SELECT COUNT(birthdate) AS count_birthdays
FROM people;
3. SQL style
3.1. SQL formatting
• Formatting is not required
• But lack of formatting can cause issues

SQL is a generous language when it comes to formatting. New lines, capitalization, and indentation
are not required in SQL as they sometimes are in other programming languages. For example, the
code on this slide will run just fine, returning the first three titles, release years, and countries from the
films table. However, writing queries like this won't make us any friends in the SQL world because
the lack of formatting makes the code difficult to read, especially as queries become more complex.
3.2. Best practices

• Capitalize keywords
• Add new lines
Over time, SQL users have developed style standards that are generally accepted across industries.
This code returns the same results as the code on the previous slide, but it is much easier to read due to
the addition of capitalized keywords and new lines between them.
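The slide code is not reproduced in these notes, so the following is a sketch of what the two versions might look like. The table demo_films and its rows are invented for illustration.

```sql
-- Hypothetical stand-in for the films table (invented rows)
DROP TABLE IF EXISTS demo_films;
CREATE TEMP TABLE demo_films (title TEXT, release_year INTEGER, country TEXT);
INSERT INTO demo_films VALUES
    ('Film A', 1994, 'USA'),
    ('Film B', 1999, 'Japan'),
    ('Film C', 2003, 'France'),
    ('Film D', 2010, 'USA');

-- Unformatted: runs fine, but is hard to read
select title, release_year, country from demo_films limit 3;

-- Formatted: capitalized keywords, one clause per line, same result
SELECT title, release_year, country
FROM demo_films
LIMIT 3;
```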
3.3. Style guides

While keyword capitalization and new lines are standard practice, many of the finer details of SQL
style are not. For instance, some SQL users prefer to create a new line and indent each selected field
when a query selects multiple fields, as the query on this slide does.

Because of the different formatting styles, it's helpful to follow a SQL style guide, such as Holywell's,
which outlines a standard of best practices for indentation, capitalization, and naming conventions for
tables, fields, and aliases. Remember, though, that there is no single required formatting in SQL: the
guiding principle is writing clear and readable code.
3.4. Semicolon

• Best practice
• Easier to translate between SQL flavors
• Indicates the end of a query
Have you noticed the sample code we've been looking at throughout this lesson has a semicolon at the
end? Like capitalization and new lines, this semicolon is unnecessary in PostgreSQL; we could leave
it out of the query and still expect the same results with no errors. However, including a semicolon at
the end of the query is considered best practice for several reasons. First, some SQL flavors require it,
so it's a good habit to have. Including a semicolon in a PostgreSQL query means that the query is
more easily translated to another flavor if necessary. Additionally, like a period at the end of a
sentence, a semicolon at the end of a query indicates its end, which is helpful in a file containing
several queries.

3.5. Dealing with non-standard field names


• Release year instead of release_year
• Put non-standard field names in double-quotes

One last note on SQL style: while we can ensure our code is formatted beautifully, we don't have
control over other people's SQL style. One mistake people make when creating a table is including spaces in a
field name. To query that table, we'll need to enclose the field name in double-quotes to indicate that,
despite being two words, the name refers to just one field. For example, if a sloppy SQL coder had
named a field "release year", as two words separated by a space, we'd need to update the query we've
seen throughout this chapter to the one shown here.
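As a sketch of the double-quote rule, the block below creates a table whose field name contains a space and then queries it. The table demo_films and its rows are assumptions made up for this example.

```sql
-- Hypothetical table with a non-standard field name (invented rows)
DROP TABLE IF EXISTS demo_films;
CREATE TEMP TABLE demo_films (title TEXT, "release year" INTEGER, country TEXT);
INSERT INTO demo_films VALUES
    ('Film A', 1994, 'USA'),
    ('Film B', 1999, 'Japan'),
    ('Film C', 2003, 'France');

-- Double quotes mark "release year" as a single field name
SELECT title, "release year", country
FROM demo_films
LIMIT 3;
```

Note that single quotes would not work here: in PostgreSQL, 'release year' is a string value, not a field name.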
3.6. Why do we format?
• Easier collaboration
• Clean and readable
• Looks professional
• Easier to understand
• Easier to debug
Adhering to SQL style guides allows for easier collaboration between peers. Clean and
readable code is highly valued in the coding community and in professional settings, and it makes things easier
for anyone wanting to understand or debug our queries.

Exercise
Formatting

Readable code is highly valued in the coding community and professional settings. Without proper
formatting, code and results can be difficult to interpret. You'll often be working with other people that
need to understand your code or be able to explain your results, so having a solid formatting habit is
essential.

In this exercise, you'll correct poorly written code to better adhere to SQL style standards.

Instructions:
• Adjust the sample code so that it is in line with standard practices.
-- Rewrite this query
SELECT person_id, role
FROM roles
LIMIT 10;
4. Filtering numbers
4.1. WHERE

• WHERE filtering clause


To filter, we need to use a new clause called WHERE, which allows us to focus on only the data
relevant to our business questions. Going back to our coat analogy, we may want to select a coat from
the closet where the color is green. The WHERE clause can help us with that.
4.1.1. WHERE with comparison operators

We will focus on filtering numbers in this lesson. To do this, we will be using comparison operators
such as greater than. Here is an example of a query where we filtered to see only films released after
the year 1960 using the greater than operator.
Let's explore some of the other operators. We would use the less-than operator to see films released
before the year 1960.

We would use the less than or equal to operator to see films released during or before the year 1960.
If we want to see films released in a specific year, we can use equals.
Here is a final example that isn't as intuitive as the others. If we wanted to filter films to see all releases
EXCEPT those from the year 1960, we would combine the less than and greater than symbols as
shown here. This is the SQL standard symbol that means "not equal to".
• > Greater than or after
• < Less than or before
• = Equal to
• >= Greater than or equal to
• <= Less than or equal to
• <> Not equal to
Let's recap all the comparison operators we can use with WHERE to filter numbers. We have: greater
than (that also means after), less than (that also means before), equal to, greater than or equal to, less
than or equal to, and not equal to.
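A few of the operators recapped above can be tried on a small sample. The table demo_films and its three rows are hypothetical and chosen so that one film falls on each side of 1960.

```sql
-- Hypothetical stand-in for the films table (invented rows)
DROP TABLE IF EXISTS demo_films;
CREATE TEMP TABLE demo_films (title TEXT, release_year INTEGER);
INSERT INTO demo_films VALUES
    ('Film A', 1955),
    ('Film B', 1960),
    ('Film C', 1975);

-- Released after 1960: Film C
SELECT title FROM demo_films WHERE release_year > 1960;

-- Released during or before 1960: Film A, Film B
SELECT title FROM demo_films WHERE release_year <= 1960;

-- Every release except those from 1960: Film A, Film C
SELECT title FROM demo_films WHERE release_year <> 1960;
```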
4.1.2. WHERE with strings

• Use single-quotes around strings we want to filter


WHERE and the comparison operator, equals, can also be used with strings. In these cases, we will
have to use single quotation marks around the strings we want to filter. For example, here, we want to
filter titles where the country is Japan.
4.2. Order of execution

A final note on using WHERE. Similar to LIMIT, this clause comes after the FROM statement when
writing a query. If we use both WHERE and LIMIT, the written order will be SELECT, FROM,
WHERE, LIMIT; however, the order of execution will now be FROM, WHERE, SELECT, LIMIT.
Thinking about the coats in our closet, we go to the closet we want to get the coat from, find where the
green coats are, and select five of them.
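The written order and the order of execution can be compared on one query. This is a sketch on an invented table, demo_films, with a filter value chosen only for the example.

```sql
-- Hypothetical stand-in for the films table (invented rows)
DROP TABLE IF EXISTS demo_films;
CREATE TEMP TABLE demo_films (title TEXT, release_year INTEGER);
INSERT INTO demo_films VALUES
    ('Film A', 1955),
    ('Film B', 1965),
    ('Film C', 1975),
    ('Film D', 1985);

SELECT title               -- 3rd: choose the field to return
FROM demo_films            -- 1st: choose the table
WHERE release_year > 1960  -- 2nd: keep only the matching records
LIMIT 2;                   -- 4th: cap the number of records returned
```

Because WHERE is processed before SELECT, an alias declared in SELECT is not yet available inside WHERE in PostgreSQL.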

Exercise
Using WHERE with numbers

Filtering with WHERE allows you to analyze your data better. You may have a dataset that includes a
range of different movies, and you need to do a case study on the most notable films with the biggest
budgets. In this case, you'll want to filter your data to a specific budget range.

Now it's your turn to use the WHERE clause to filter numeric values!

Instructions 1/3:
• Select the film_id and imdb_score from the reviews table and filter on scores higher than 7.0.
-- Select film_id and imdb_score with an imdb_score over 7.0
SELECT film_id, imdb_score
FROM reviews
WHERE imdb_score > 7.0;
Instructions 2/3:
• Select the film_id and facebook_likes of the first ten records with less than 1000 likes from the
reviews table.
-- Select film_id and facebook_likes for ten records with less than 1000 likes
SELECT film_id, facebook_likes
FROM reviews
WHERE facebook_likes < 1000
LIMIT 10;
Instructions 3/3
• Count how many records have a num_votes of at least 100,000; use the alias films_over_100K_votes.
-- Count the records with at least 100,000 votes
SELECT COUNT(*) AS films_over_100K_votes
FROM reviews
WHERE num_votes >= 100000;

Exercise
Using WHERE with text

WHERE can also filter string values.

Imagine you are part of an organization that gives cinematography awards, and you have several international
categories. Before you confirm an award for every language listed in your dataset, it may be worth seeing if
there are enough films of a specific language to make it a fair competition. If there is only one movie or a
significant skew, it may be worth considering a different way of giving international awards.

Let's try this out!


Instructions:
• Select and count the language field using the alias count_spanish.
• Apply a filter to select only Spanish from the language field.
-- Count the Spanish-language films
SELECT COUNT(language) AS count_spanish
FROM films
WHERE language = 'Spanish';

5. Multiple criteria
Great work on filtering numbers! Our SQL skills are growing fast. Next up, we will look at how to
filter with multiple criteria.

Often, we will have more than one criterion we'd like to meet. Looking again at
our favorite coats, perhaps we want to narrow down our choices to coats
that are yellow and shorter in length.


• OR, AND, BETWEEN
We will be learning about three additional keywords that will allow us to enhance our filters when
using WHERE by adding multiple criteria. These are OR, AND, and BETWEEN. In the context of
our coats, we could look at coats where the color is yellow or the length is short, or we could filter for
coats where both criteria are true. We can also look for coats that have between one and five buttons.
5.1. OR operator

• Use OR when you need to satisfy at least one condition


The first keyword we will look at is the OR operator. OR is used when we want to filter multiple
criteria and only need to satisfy at least one condition. Perhaps we want to select green or purple coat
options as an example.
• Correct
• Invalid
In SQL, we combine OR with WHERE to achieve this type of filtering. Here is an example using the
films database. The query on the left returns all films released in either 1994 or 2000. Note that we
must specify the field for every OR condition, so the query on the right is invalid. That query hasn't
specified what field or operator should be associated with the year 2000.
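The correct and invalid forms described above can be sketched as follows. The table demo_films and its rows are invented; the invalid version is left commented out so the block still runs.

```sql
-- Hypothetical stand-in for the films table (invented rows)
DROP TABLE IF EXISTS demo_films;
CREATE TEMP TABLE demo_films (title TEXT, release_year INTEGER);
INSERT INTO demo_films VALUES
    ('Film A', 1994),
    ('Film B', 1997),
    ('Film C', 2000);

-- Correct: the field and operator are repeated for each OR condition
SELECT title
FROM demo_films
WHERE release_year = 1994
    OR release_year = 2000;  -- Film A, Film C

-- Invalid: the second condition names no field or operator
-- SELECT title
-- FROM demo_films
-- WHERE release_year = 1994 OR 2000;
```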
5.2. AND operator

• Use AND if we need to satisfy all criteria


• Correct:

• Invalid:
If we want to satisfy all criteria in our filter, we need to use AND with WHERE. For example, this
query gives us the titles of films released between 1994 and 2000. We need to specify the field name
separately for every AND condition as with OR.
5.3. AND, OR

• Filter films released in 1994 or 1995, and certified PG or R


Let's kick it up a notch. We now want to filter films released in 1994 OR 1995, AND with a
certification of either PG or R. Thankfully, we can combine AND and OR to answer this question. If a
query has multiple filtering conditions, we will need to enclose the individual clauses in parentheses to
ensure the correct execution order; otherwise, we may not get the expected results.
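Here is a minimal sketch of the combined filter, with parentheses around each group of OR conditions. The table demo_films and its four rows are assumptions chosen to show what the parentheses change.

```sql
-- Hypothetical stand-in for the films table (invented rows)
DROP TABLE IF EXISTS demo_films;
CREATE TEMP TABLE demo_films (title TEXT, release_year INTEGER, certification TEXT);
INSERT INTO demo_films VALUES
    ('Film A', 1994, 'PG'),
    ('Film B', 1995, 'R'),
    ('Film C', 1994, 'G'),
    ('Film D', 2001, 'R');

-- Released in 1994 or 1995, AND certified PG or R
SELECT title
FROM demo_films
WHERE (release_year = 1994 OR release_year = 1995)
    AND (certification = 'PG' OR certification = 'R');  -- Film A, Film B
```

Without the parentheses, AND is evaluated before OR, and on these sample rows the same conditions would return all four films.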
5.4. BETWEEN, AND

As we've learned, we can use this query to get titles of all films released in and between 1994 and
2000. Checking for ranges like this is very common, so in SQL the BETWEEN keyword provides a
valuable shorthand for filtering values within a specified range. This second query is equivalent to the
one on the left. It's important to remember that BETWEEN is inclusive, meaning the results contain
the beginning and end values.
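The equivalence of the two queries can be checked on a small sample. The table demo_films is an invented stand-in with one film just outside each end of the range.

```sql
-- Hypothetical stand-in for the films table (invented rows)
DROP TABLE IF EXISTS demo_films;
CREATE TEMP TABLE demo_films (title TEXT, release_year INTEGER);
INSERT INTO demo_films VALUES
    ('Film A', 1993),
    ('Film B', 1994),
    ('Film C', 1997),
    ('Film D', 2000),
    ('Film E', 2001);

-- Range written with two comparisons
SELECT title
FROM demo_films
WHERE release_year >= 1994
    AND release_year <= 2000;  -- Film B, Film C, Film D

-- Same range with BETWEEN; both end values are included
SELECT title
FROM demo_films
WHERE release_year BETWEEN 1994 AND 2000;  -- Film B, Film C, Film D
```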
5.5. BETWEEN, AND, OR
Like the WHERE clause, the BETWEEN clause can be used with multiple AND and OR operators,
so we can build up our queries and make them even more powerful! For example, we can get the titles
of all films released between 1994 and 2000 from the United Kingdom.

Exercise
Using AND

The following exercises combine AND and OR with the WHERE clause. Using these operators
together strengthens your queries and analyses of data.

You will apply these new skills now on the films table.

Instructions 1/3:
• Select the title and release_year for all German-language films released before 2000.
-- Select the title and release_year for all German-language films released before 2000
SELECT title, release_year
FROM films
WHERE release_year < 2000
AND language = 'German';
Instructions 2/3:
• Update the query from the previous step to show German-language films released after 2000
rather than before.

-- Update the query to see all German-language films released after 2000
SELECT title, release_year
FROM films
WHERE release_year > 2000
AND language = 'German';
Instructions 3/3
• Select all details for German-language films released after 2000 but before 2010 using only
WHERE and AND.
-- Select all records for German-language films released after 2000 and before 2010
SELECT *
FROM films
WHERE release_year > 2000
AND release_year < 2010
AND language = 'German';

Exercise
Using OR
This time you'll write a query to get the title and release_year of films released in 1990 or 1999, which
were in English or Spanish and took in more than $2,000,000 gross.

It looks like a lot, but you can build the query up one step at a time to get comfortable with the
underlying concept in each step. Let's go!

Instructions 1/3:
• Select the title and release_year for films released in 1990 or 1999 using only WHERE and
OR.

-- Find the title and year of films from 1990 or 1999
SELECT title, release_year
FROM films
WHERE release_year = 1990
OR release_year = 1999;
Instructions 2/3:
• Filter the records to only include English or Spanish-language films.
-- Add a filter to see only English or Spanish-language films
SELECT title, release_year
FROM films
WHERE (release_year = 1990 OR release_year = 1999)
AND (language = 'English' OR language = 'Spanish');

Instructions 3/3:
• Finally, restrict the query to only return films worth more than $2,000,000 gross.
-- Filter films with more than $2,000,000 gross
AND gross > 2000000;
Exercise
Using BETWEEN

Let's use BETWEEN with AND on the films database to get the title and release_year of all Spanish-
language films released between 1990 and 2000 (inclusive) with budgets over $100 million.

We have broken the problem into smaller steps so that you can build the query as you go along!

Instructions 1/4:
• Select the title and release_year of all films released between 1990 and 2000 (inclusive) using
BETWEEN.
-- Select the title and release_year for films released between 1990 and 2000
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000;
Instructions 2/4:
• Build on your previous query to select only films with a budget over $100 million.
-- Narrow down your query to films with budgets > $100 million
AND budget > 100000000;

Instructions 3/4:
• Now, restrict the query to only return Spanish-language films.
-- Restrict the query to only Spanish-language films
AND language = 'Spanish';

Instructions 4/4:
• Finally, amend the query to include all Spanish-language or French-language films with the
same criteria.
-- Amend the query to include Spanish or French-language films
AND (language = 'Spanish' OR language = 'French');
6. Filtering text
We're making excellent progress! We will now switch our focus away from filtering numbers to
filtering textual data.

• WHERE can also filter text

As we've briefly seen, we can use the WHERE clause to filter text data. However, so far, we've only
been able to filter by specifying the exact text we're interested in.

• Filter a pattern rather than specific text


• LIKE
• NOT LIKE
• IN

In the real world, we'll often want to search for a pattern rather than a specific text string. We'll be
introducing three more SQL keywords into our vocabulary to help us achieve this: LIKE, NOT LIKE,
and IN.
6.1. LIKE

• Used to search for a pattern in a field
• % matches zero, one, or many characters
• _ matches a single character
• Ev_ Mendes
In SQL, we can use the LIKE operator with a WHERE clause to search for a pattern in a field. We use
a wildcard as a placeholder for some other values to accomplish this. There are two wildcards with
LIKE, the percent, and the underscore. The percent wildcard will match zero, one, or many characters
in the text. For example, the query on the left matches people like Adel, Adelaide, and Aden. The
underscore wildcard will match a single character. For example, the query on the right matches only
three-letter names like Eve. We'd also see names like Eva if it were in our dataset. Eva Mendes,
however, would not be visible unless the search criteria looked like this.
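The two wildcards can be sketched with the names mentioned above. The table demo_people is invented, and the patterns 'Ade%' and 'Ev_' are inferred from the description since the slide queries are not reproduced in these notes.

```sql
-- Hypothetical stand-in for the people table (invented rows)
DROP TABLE IF EXISTS demo_people;
CREATE TEMP TABLE demo_people (name TEXT);
INSERT INTO demo_people VALUES
    ('Adel'), ('Adelaide'), ('Aden'), ('Eve'), ('Evan'), ('Eva Mendes');

-- % matches zero, one, or many characters: Adel, Adelaide, Aden
SELECT name FROM demo_people WHERE name LIKE 'Ade%';

-- _ matches exactly one character: Eve (Evan and Eva Mendes are too long)
SELECT name FROM demo_people WHERE name LIKE 'Ev_';

-- The rest of the name must be spelled out to match: Eva Mendes
SELECT name FROM demo_people WHERE name LIKE 'Ev_ Mendes';
```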
6.2. NOT LIKE

We can also use the NOT LIKE operator to find records that don't match the specified pattern. In this
query, we are finding records for people who do not have "A." (the letter A followed by a period) as
part of their first name. It's important to note that this operation is case-sensitive, so we must be
mindful of what we are querying.
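A minimal sketch of NOT LIKE and its case sensitivity follows. The table demo_people, its names, and the pattern 'A.%' are assumptions for illustration, since the slide query is not reproduced in these notes.

```sql
-- Hypothetical stand-in for the people table (invented rows)
DROP TABLE IF EXISTS demo_people;
CREATE TEMP TABLE demo_people (name TEXT);
INSERT INTO demo_people VALUES
    ('A. Example'),
    ('a. example'),
    ('Ben Example');

-- Excludes only names starting with an uppercase "A."
SELECT name
FROM demo_people
WHERE name NOT LIKE 'A.%';  -- a. example, Ben Example
```

The lowercase name is still returned because the pattern is matched case-sensitively in PostgreSQL.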
6.3. Wildcard position

We've reviewed one example of where to position each wildcard, but we can actually put them
anywhere and combine them! We can find values that start, end, or contain characters in any position,
as well as find records of a certain length. For example, this code on the left will find all people whose
name ends in r. The code on the right will find records where the third character is t.
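The two wildcard positions described above can be sketched like this. The table demo_people and its names are invented, and the patterns '%r' and '__t%' are inferred from the description.

```sql
-- Hypothetical stand-in for the people table (invented rows)
DROP TABLE IF EXISTS demo_people;
CREATE TEMP TABLE demo_people (name TEXT);
INSERT INTO demo_people VALUES
    ('Peter'), ('Oliver'), ('Kate'), ('Maria');

-- Names ending in r: Peter, Oliver
SELECT name FROM demo_people WHERE name LIKE '%r';

-- Names whose third character is t: Peter, Kate
SELECT name FROM demo_people WHERE name LIKE '__t%';
```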
6.4. WHERE, OR

What if we want to filter based on many conditions or a range of numbers? We could chain several
ORs to the WHERE clause based on what we know, but that can get messy. We can see an example
here where we select the film titles released in 1920, 1930, or 1940.
6.5. WHERE, IN
A helpful operator here is IN. The IN operator allows us to specify multiple values in a WHERE
clause, making it easier and quicker to set numerous OR conditions. Neat, right? So, the example
shown on the previous slide would simply become WHERE release_year IN (1920, 1930, 1940), where
the years are enclosed in parentheses.

Here is another example using a text field where we want to find the title WHERE the associated
country is either Germany or France.
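Both IN examples can be sketched on one small table. The table demo_films and its rows are hypothetical and exist only to make the block self-contained.

```sql
-- Hypothetical stand-in for the films table (invented rows)
DROP TABLE IF EXISTS demo_films;
CREATE TEMP TABLE demo_films (title TEXT, release_year INTEGER, country TEXT);
INSERT INTO demo_films VALUES
    ('Film A', 1920, 'Germany'),
    ('Film B', 1925, 'France'),
    ('Film C', 1940, 'USA'),
    ('Film D', 1930, 'France');

-- Replaces three chained OR conditions: Film A, Film C, Film D
SELECT title
FROM demo_films
WHERE release_year IN (1920, 1930, 1940);

-- IN with a text field; each value takes single quotes: Film A, Film B, Film D
SELECT title
FROM demo_films
WHERE country IN ('Germany', 'France');
```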
Exercise
LIKE and NOT LIKE

The LIKE and NOT LIKE operators can be used to find records that either match or do not match a
specified pattern, respectively. They can be coupled with the wildcards % and _. The % will match
zero, one, or many characters, and _ will match a single character.

This is useful when you want to filter text, but not to an exact word. Do the following exercises to gain
some practice with these keywords.

Instructions 1/3
• Select the names of all people whose names begin with 'B'.

-- Select the names that start with B
SELECT name
FROM people
WHERE name LIKE 'B%';
Instructions 2/3:
• Select the names of people whose names have 'r' as the second letter.

SELECT name
FROM people
-- Select the names that have r as the second letter
WHERE name LIKE '_r%';
Instructions 3/3:
• Select the names of people whose names don't start with 'A'.

SELECT name
FROM people
-- Select names that don't start with A
WHERE name NOT LIKE 'A%';
Exercise
WHERE IN
You now know you can query multiple conditions using the IN operator and a set of parentheses. It is
a valuable piece of code that helps us keep our queries clean and concise.

Try using the IN operator yourself!

Instructions 1/3:
• Select the title and release_year of all films released in 1990 or 2000 that were longer than two
hours.
-- Find the title and release_year for all films over two hours in length released in 1990 or 2000
SELECT title, release_year
FROM films
WHERE release_year IN (1990, 2000)
AND duration > 120;
Instructions 2/3:
• Select the title and language of all films in English, Spanish, or French using IN.

-- Find the title and language of all films in English, Spanish, or French
SELECT title, language
FROM films
WHERE language IN ('English', 'Spanish', 'French');

Instructions 3/3:
• Select the title, certification and language of all films certified NC-17 or R that are in English,
Italian, or Greek.

-- Find the title, certification, and language of all films certified NC-17 or R that are in English, Italian, or Greek
SELECT title, certification, language
FROM films
WHERE certification IN ('NC-17', 'R')
AND language IN ('English', 'Italian', 'Greek');
Exercise
Combining filtering and selecting
Time for a little challenge. So far, your SQL vocabulary from this course
includes COUNT(), DISTINCT, LIMIT, WHERE, OR, AND, BETWEEN, LIKE, NOT LIKE, and
IN. In this exercise, you will try to use some of these together. Writing more complex queries will be
standard for you as you become a qualified SQL programmer.

As this query will be a little more complicated than what you've seen so far, we've included a bit of
code to get you started. You will be using DISTINCT here too because, surprise, there are two movies
named 'Hamlet' in this dataset!

Follow the instructions to find out what 90's films we have in our dataset that would be suitable for
English-speaking teens.

Instructions:
• Count the unique titles from the films database and use the alias provided.
• Filter to include only movies with a release_year from 1990 to 1999, inclusive.
• Add another filter narrowing your query down to English-language films.
• Add a final filter to select only films with 'G', 'PG', 'PG-13' certifications.
-- Count the unique titles
SELECT COUNT(DISTINCT title) AS nineties_english_films_for_teens
FROM films
-- Filter release_year to between 1990 and 1999
WHERE release_year BETWEEN 1990 AND 1999
-- Filter to English-language films
AND language = 'English'
-- Narrow it down to G, PG, and PG-13 certifications
AND certification IN ('G', 'PG', 'PG-13');
7. Null values
7.1. Missing values
• COUNT(field_name) includes only non-missing values
• COUNT(*) includes missing values

Null
• Missing values:
o Human error
o Information not available
o Unknown
When we were learning how to use the COUNT keyword, we learned that we could include or
exclude missing values depending on whether or not we use the asterisk in our query. But what is
a missing or non-missing value? In SQL, NULL represents a missing or unknown value. Why is this
useful? In the real world, our databases will likely have empty fields either because of human error or
because the information is not available or is unknown. Knowing how to handle these fields is
essential as they can affect any analyses we do.
7.2. null

For example, we used COUNT all with an asterisk on the left. Suppose our goal is to analyze
posthumous success using data from the people table. We might make the wrong assumption that
because we have a field name called deathdate, this information is available for everyone. Half of
them are, in fact, NULL, as we can see on the right, so we would make an inaccurate judgment on
what the data means.
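To see how missing values change a count, here is a minimal, self-contained sketch. The three-row sample_people data and the aliases are hypothetical and only illustrate the idea; they are not taken from the course database.

```sql
-- Hypothetical three-row sample; two of the death dates are missing
WITH sample_people (name, deathdate) AS (
    VALUES ('Person A', DATE '1990-05-01'),
           ('Person B', NULL),
           ('Person C', NULL)
)
SELECT COUNT(*) AS total_rows,               -- counts every record: 3
       COUNT(deathdate) AS known_deathdates  -- skips NULLs: 1
FROM sample_people;
```

Comparing the two counts is a quick check on how complete a field really is before we analyze it.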
7.3. IS NULL

One quick way to see how much of our data is missing is by using IS NULL with the WHERE clause.
Here is an example where we have checked to see which names do not have a recorded birthdate in
our table.
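A query of the kind described here might look like the following sketch. It assumes the people table has name and birthdate fields, as used elsewhere in this course.

```sql
-- List the names of people with no recorded birthdate
SELECT name
FROM people
WHERE birthdate IS NULL;
```

Note that writing birthdate = NULL would not work: a comparison with NULL is never true, so that filter returns no rows. IS NULL is the operator to use.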
7.4. IS NOT NULL

On the left is an example of counting the missing birthdates in the people table. The count is 2245.
Sometimes, we'll want to filter out missing values, so we only get results that are not NULL. To do
this, we can use the IS NOT NULL operator. For example, this query on the right gives the count of
all people whose birth dates are not missing in the people table, giving us a new count of 6152.
7.5. COUNT() vs IS NOT NULL

There may be a question about the difference between using COUNT with a field name and using the
same COUNT with the added WHERE clause with IS NOT NULL. The answer is there is no
difference, as both will be counting non-missing values.
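The two approaches can be sketched side by side as follows. The alias count_birthdate is our own choice, and the people table with its birthdate field is assumed from the examples above.

```sql
-- Option 1: COUNT() on a field skips NULL values by itself
SELECT COUNT(birthdate) AS count_birthdate
FROM people;

-- Option 2: filter out the NULL values first, then count
SELECT COUNT(birthdate) AS count_birthdate
FROM people
WHERE birthdate IS NOT NULL;
```

Both queries should return the same number; the second simply makes the exclusion of missing values explicit.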

7.6. NULL put simply


• NULL values are missing values
• Very common
• Use IS NULL or IS NOT NULL to:
o Identify missing values
o Select missing values
o Exclude missing values
Before we wrap up this lesson, let's review what we've learned. NULL values are missing values, and
they are very common in the real world. It is good practice to know how many NULL values are in
our data by using the IS NULL or IS NOT NULL operator for filtering. These keywords will help to
identify, select, or exclude missing values. Don't worry; this will soon become second nature because
it is that common!
Exercise
Practice with NULLs
Well done. Now that you know what NULL means and what it's used for, it's time for some more
practice!

Let's explore the films table again to better understand what data you have.

Instructions 1/2:
• Select the title of every film that doesn't have a budget associated with it and use the alias
no_budget_info.

-- List all film titles with missing budgets
SELECT title AS no_budget_info
FROM films
WHERE budget IS NULL;
Instructions 2/2:
• Count the number of films with a language associated with them and use the alias
count_language_known.

-- Count the number of films we have language data for


SELECT COUNT(title) AS count_language_known
FROM films
WHERE language IS NOT NULL;

8. Summarizing data

• Aggregate functions return a single value

When analyzing data, we often want to understand the dataset as a whole in addition to looking at
individual records. One way to do this is to summarize the data using SQL's aggregate functions. An
aggregate function performs a calculation on several values and returns a single value.
8.1. Aggregate functions

We already know one aggregate function, COUNT()! We'll now learn four new aggregate functions,
allowing us to find the average, sum, minimum, and maximum of a specified field. Let's look at some
examples of how they work. These aggregate functions come after SELECT, exactly like COUNT().
This first query gives us the average value from the budget field of the films table. That's an average
of over 39 million per film in the films table. The SUM() function returns the result of adding the
values in the budget field. Here, the total budget of all the films is over 181 billion!

The MIN() function returns the lowest budget, and the MAX() function returns the highest budget.
The 2006 South Korean movie "The Host" had a budget of over 12 billion. That sounds huge, but our
data is in multiple currencies, so a true comparison would require currency exchange rates as well.
Note that we operate on the field (or column) with all of these aggregate functions, not the records (or
rows).
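The four new functions can be sketched in a single query, as below. The lesson runs them as separate queries; combining them here is our own shortcut, and the aliases are assumptions.

```sql
-- One row of summary values for the budget field
SELECT AVG(budget) AS avg_budget,    -- average budget
       SUM(budget) AS total_budget,  -- all budgets added together
       MIN(budget) AS min_budget,    -- lowest budget
       MAX(budget) AS max_budget     -- highest budget
FROM films;
```

Each function collapses the whole budget column into one value, so the result has a single row.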
8.2. Non-numerical data
Numerical fields only
• AVG()
• SUM()

Various data types


• COUNT()
• MIN()
• MAX()

Although these functions appear to be mathematical, we can use several of them with both numerical
and non-numerical fields. Average and sum are the two aggregate functions we can only use on
numerical fields since they require arithmetic. We can use count, minimum, and maximum with non-
numerical fields as we already saw with COUNT() in previous lessons.
MIN() <-> MAX()
Minimum <-> Maximum
Lowest <-> Highest
A <-> Z
1715 <-> 2022
0 <-> 100

COUNT() can provide a total of any non-missing, or not null, records in a field regardless of their
type. Similarly, minimum and maximum will give the record that is figuratively the lowest or highest.
Lowest can mean the letter A when dealing with strings or the earliest date when dealing with dates.
And, of course, with numbers, it is the highest or the lowest number.

Here are some examples of using these functions with non-numerical fields. We can select the
minimum and maximum country from the films table and see that our minimum country, the
one that comes first in the alphabet, is Afghanistan. In contrast, the maximum country, the one
that comes last in the alphabet, is West Germany. Our database does not contain any films made
in Zambia or Zimbabwe, but it does have at least one film made back when Germany was two
different countries!
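A sketch of the query behind this result, assuming the films table has a country field; the aliases are our own.

```sql
-- MIN() and MAX() on a text field compare values alphabetically
SELECT MIN(country) AS first_country,
       MAX(country) AS last_country
FROM films;
```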
8.3. Aliasing when summarizing

Notice how every query result has automatically been given the name of the function as its field
name; in this case, min. It's best practice to use an alias when summarizing data so that our results
are clear to anyone reading our code.
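As a small illustration, here is the same aggregate with and without an alias. The alias min_country is a hypothetical name chosen for this sketch.

```sql
-- Without an alias, PostgreSQL labels the result column "min"
SELECT MIN(country)
FROM films;

-- With an alias, the result column says what it contains
SELECT MIN(country) AS min_country
FROM films;
```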

Exercise
Practice with aggregate functions
Now let's try extracting summary information from a table using these new aggregate functions.
Summarizing is helpful in real life when extracting top-line details from your dataset. Perhaps you'd
like to know how old the oldest film in the films table is, what the most expensive film is, or how
many films you have listed.

Now it's your turn to get more insights about the films table!

Instructions 1/4:
• Use the SUM() function to calculate the total duration of all films and alias with
total_duration.

-- Query the sum of film durations


SELECT SUM(duration) AS total_duration
FROM films;
Instructions 2/4:
• Calculate the average duration of all films and alias with average_duration.

-- Calculate the average duration of all films
SELECT AVG(duration) AS average_duration
FROM films;

Instructions 3/4:
• Find the most recent release_year in the films table, aliasing as latest_year.

-- Find the latest release_year


SELECT MAX(release_year) AS latest_year
FROM films;

Instructions 4/4:
• Find the duration of the shortest film and use the alias shortest_film.

-- Find the duration of the shortest film


SELECT MIN(duration) AS shortest_film
FROM films;
9. Summarizing subsets
9.1. Using WHERE with aggregate functions

We can combine aggregate functions with the WHERE clause to gain further insights from our data.
That's because the WHERE clause executes before the SELECT statement. For example, to get the
average budget of movies made in 2010 or later, we would select the average of the budget field from
the films table where the release year is greater than or equal to 2010.
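Written out, the query described above might look like this sketch; the alias avg_budget is our own.

```sql
-- WHERE narrows the records first; AVG() then runs on what is left
SELECT AVG(budget) AS avg_budget
FROM films
WHERE release_year >= 2010;
```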

Here are a few more examples using the other functions. We find the total budget of movies made in
2010 using the SUM function; that's over 8.9 billion! Next, we get the smallest budget using the MIN
function, which is 65,000.
Here, we query the highest budget using the MAX function. 600 million feels like a lot again for a
movie budget, but this is in Indian Rupees for the movie "Kites". Finally, we query the count of the
number of budgets using the COUNT function, which gives us the total number of non-missing values
in the budget field, meaning there are 194 budgets recorded for the year 2010 in the films table.
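These examples can be sketched together in one query. The lesson shows them one at a time; the combined form and the aliases are assumptions made for brevity.

```sql
-- Summarize only the films released in 2010
SELECT SUM(budget)   AS total_budget,
       MIN(budget)   AS min_budget,
       MAX(budget)   AS max_budget,
       COUNT(budget) AS count_budget  -- non-missing budgets only
FROM films
WHERE release_year = 2010;
```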

9.2. ROUND()
• Round a number to a specified decimal

Now that we are doing all sorts of things with our numerical values, we'll likely want to clean up some
of the crazy decimals that might appear. In SQL, we can use ROUND() to round our number to a
specified decimal. There are two parameters for ROUND(): the number we want to round and the
decimal place we want to round to. Here we have re-calculated the same average budget as before, but
this time we have included ROUND() and specified we want to round to two decimal places because
we are dealing with currency.
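A minimal sketch of that rounded average follows. It assumes budget is stored as an integer or numeric type; in PostgreSQL, the two-argument form of ROUND() works on numeric values, which is what AVG() returns for such a column.

```sql
-- Average budget from 2010 onwards, rounded to two decimal places
SELECT ROUND(AVG(budget), 2) AS avg_budget
FROM films
WHERE release_year >= 2010;
```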
9.2.1. ROUND() to a whole number

The second parameter in our ROUND() function is optional, so we can leave it out if we want to
round to a whole number. We would get the same result if we passed zero as the second argument, as
it is the default when no number is given.
9.2.2. ROUND() using a negative parameter

• Numerical fields only

Here is a tricky one: we could also pass a negative number as the second parameter and still get a
result. Here, the function is rounding to the left of the decimal point instead of the right. Using
negative five as the decimal place parameter will cause the function to round to the hundred thousand
or five places to the left. ROUND() can only be used with numerical fields.
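The effect of the second parameter is easiest to see on a fixed number. The value 1234567.891 below is a made-up example, not a figure from the course data.

```sql
SELECT ROUND(1234567.891, 2)  AS two_decimals,       -- 1234567.89
       ROUND(1234567.891)     AS whole_number,       -- 1234568
       ROUND(1234567.891, 0)  AS explicit_zero,      -- 1234568
       ROUND(1234567.891, -5) AS hundred_thousands;  -- 1200000
```

Leaving the parameter out and passing zero give the same value, while negative five rounds to the nearest hundred thousand.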
Exercise
Combining aggregate functions with WHERE
When combining aggregate functions with WHERE, you get a powerful tool that allows you to get
more granular with your insights, for example, to get the total budget of movies made from the year
2010 onwards.

This combination is useful when you only want to summarize a subset of your data. In your film-
industry role, as an example, you may like to summarize each certification category to compare how
they each perform or if one certification has a higher average budget than another.

Let's see what insights you can gain about the financials in the dataset.

Instructions 1/4:
• Use SUM() to calculate the total gross for all films made in the year 2000 or later, and use the
alias total_gross.
-- Calculate the sum of gross from the year 2000 or later
SELECT SUM(gross) AS total_gross
FROM films
WHERE release_year >= 2000;

Instructions 2/4:
• Calculate the average amount grossed by all films whose titles start with the letter 'A' and alias
with avg_gross_A.

-- Calculate the average gross of films that start with A


SELECT AVG(gross) AS avg_gross_A
FROM films
WHERE title LIKE 'A%';
Instructions 3/4:
• Calculate the lowest gross film in 1994 and use the alias lowest_gross.

-- Calculate the lowest gross film in 1994


SELECT MIN(gross) AS lowest_gross
FROM films
WHERE release_year=1994;

Instructions 4/4:
• Calculate the highest gross film between 2000 and 2012, inclusive, and use the alias
highest_gross
-- Calculate the highest gross film released between 2000-2012
SELECT MAX(gross) AS highest_gross
FROM films
WHERE release_year BETWEEN 2000 AND 2012;
Exercise
Using ROUND()
Aggregate functions work great with numerical values; however, these results can sometimes get
unwieldy when dealing with long decimal values. Luckily, SQL provides you with
the ROUND() function to tame these long decimals.

If asked to give the average budget of your films, ten decimal places is not necessary. Instead, you can
round to two decimal places to create results that make more sense for currency.

Now you try!

Instructions:
• Calculate the average facebook_likes to one decimal place and assign to the alias,
avg_facebook_likes.
-- Round the average number of facebook_likes to one decimal place
SELECT ROUND(AVG(facebook_likes), 1) AS avg_facebook_likes
FROM reviews;
Exercise
ROUND() with a negative parameter
A useful thing you can do with ROUND() is have a negative number as the decimal place parameter.
This can come in handy if your manager only needs to know the average number
of facebook_likes to the hundreds since granularity below one hundred likes won't impact decision
making.

Social media plays a significant role in determining success. If a movie trailer is posted and barely
gets any likes, the movie itself may not be successful. Remember how 2020's "Sonic the Hedgehog"
movie got a revamp after the public saw the trailer?

Let's apply this to other parts of the dataset and see what the benchmark is for movie budgets so, in the
future, it's clear whether the film is above or below budget.

Instructions:
• Calculate the average budget from the films table, aliased as avg_budget_thousands, and
round to the nearest thousand.
-- Calculate the average budget rounded to the thousands
SELECT ROUND(AVG(budget),-3) AS avg_budget_thousands
FROM films;
10. Aliasing and arithmetic

We can perform basic arithmetic with symbols like plus, minus, multiply, and divide. Using
parentheses with arithmetic indicates to the processor when the calculation needs to execute. Here are
some basic examples of how we can use arithmetic in SQL. We can add, subtract, multiply, and
divide as follows. In these examples, the parentheses are not required as only one calculation takes
place; however, they provide more clarity to the code. But, the division gives a result of one; why is
that?

Similar to other programming languages, SQL assumes that we want to get an integer back if we
divide an integer by an integer. So be careful! When dividing, we can add decimal places to our
numbers if we want more precision. For example, SELECT four-point-zero divided by three-point-
zero gives us the result we would expect: 1-point-3 repeating.
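The arithmetic examples can be sketched with literal numbers, as below. Only the division by three is implied by the text; the other operands and the aliases are assumptions chosen to match it.

```sql
SELECT (4 + 3) AS addition,          -- 7
       (4 - 3) AS subtraction,       -- 1
       (4 * 3) AS multiplication,    -- 12
       (4 / 3) AS integer_division,  -- 1, because both operands are integers
       (4.0 / 3.0) AS decimal_division;  -- about 1.33
```

Making just one of the operands a decimal, as in 4.0 / 3, is also enough to get a decimal result.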
10.1. Aggregate functions vs. arithmetic

What's the difference between using aggregate functions and arithmetic? The key difference is that
aggregate functions, like SUM, perform their operations on the fields vertically while arithmetic adds
up the records horizontally.

10.2. Aliasing with arithmetic

Before we move on, let's run through an arithmetic example using our database. Here we have
selected the gross, how much the movie made, minus the budget, how much the movie cost, from our
films table. The result is the amount of profit. Notice that the query's result doesn't give us a defined
field name; PostgreSQL labels it ?column?. We should always use an alias when summarizing data
with aggregate functions and arithmetic.
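Here is a sketch of the profit calculation with an alias added, next to an aggregate for contrast. The aliases profit and total_gross are our own.

```sql
-- Arithmetic: one result per record, computed across fields in that row
SELECT title, (gross - budget) AS profit
FROM films;

-- Aggregate: one result for the whole field, computed down the column
SELECT SUM(gross) AS total_gross
FROM films;
```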
10.3. Aliasing with functions

As we progress and learn how to manipulate our data, it will be even more important to keep our field
names clear. For example, if we're using multiple MAX functions in one query, we'll have two fields
named max, which isn't very useful! This is a situation when it's especially important to alias like we
do here.
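A minimal sketch of aliasing two MAX functions in one query; the choice of budget and duration as the two fields is an assumption for illustration.

```sql
-- Without the aliases, both result columns would be named "max"
SELECT MAX(budget)   AS max_budget,
       MAX(duration) AS max_duration
FROM films;
```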
10.4. Order of execution

• Step 1: FROM
• Step 2: WHERE
• Step 3: SELECT (aliases are defined here)
• Step 4: LIMIT
• Aliases defined in the SELECT clause cannot be used in the WHERE clause due to order of
execution
Let's explore how using an alias fits into the SQL execution order. Here is a reminder of the order of
execution we know so far: SQL will process the FROM statement first, followed by the WHERE
clause, then the SELECT statement, and finally, LIMIT. When adding an alias for a field name in the
SELECT clause, we might assume we could use it later in our query with the WHERE clause.
Unfortunately, that is not possible; as we can see by the order of execution, the query would not have
created the alias yet, and our code would generate an error.
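The following sketch shows the problem and one way around it. The profit calculation is borrowed from earlier in this lesson; the failing query is left commented out so the block runs as written.

```sql
-- Fails: WHERE runs before SELECT, so the alias profit does not exist yet
-- SELECT title, (gross - budget) AS profit
-- FROM films
-- WHERE profit > 0;

-- Works: repeat the expression itself in the WHERE clause
SELECT title, (gross - budget) AS profit
FROM films
WHERE (gross - budget) > 0;
```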
Exercise
Aliasing with functions
Aliasing can be a lifesaver, especially as we start to do more complex SQL queries with multiple
criteria. Aliases help you keep your code clean and readable. For example, if you want to find
the MAX() value of several fields without aliasing, you'll end up with the result with several columns
called max and no idea which is which. You can fix this with aliasing.

Now, it's over to you to clean up the following queries.

Instructions 1/3:
• Select the title and duration in hours for all films and alias as duration_hours; since the current
durations are in minutes, you'll need to divide duration by 60.0.

-- Calculate the title and duration_hours from films


SELECT title, (duration/60.0) AS duration_hours
FROM films;

Instructions 2/3:
• Calculate the percentage of people who are no longer alive and alias the result as
percentage_dead.

-- Calculate the percentage of people who are no longer alive
SELECT COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead
FROM people;
Instructions 3/3:
• Find how many decades (period of ten years) the films table covers by using MIN() and
MAX(); alias as number_of_decades.

-- Find the number of decades in the films table


SELECT (MAX(release_year) - MIN(release_year)) / 10.0 AS number_of_decades
FROM films;
Exercise
Rounding results
You found some valuable insights in the previous exercise, but many of the results were
inconveniently long. We forgot to round! We won't make you redo them all; however, you'll update
the worst offender in this exercise.

Instructions:
• Update the query by adding ROUND() around the calculation and round to two decimal
places.
-- Round duration_hours to two decimal places
SELECT title, ROUND(duration / 60.0, 2) AS duration_hours
FROM films;
11. Sorting results

Sorting results means we want to put our data in a specific order. It's another way to make our data
easier to understand by quickly seeing it in a sequence. Let's say we wanted to extract our three
longest coats; if our closet were messy, it would take a long time to find. However, if we sorted our
closet by garment type and length, we could quickly grab them!
11.1. ORDER BY

In SQL, the ORDER BY keyword is used to sort results by one or more fields. When used on its own,
it is written after the FROM statement, as shown here. ORDER BY will sort in ascending order by
default. This can mean from smallest to biggest or from A to Z. In this case, we have one query
sorting the budget from smallest to biggest and a second query sorting the titles alphabetically. Our
database contains film titles that start with symbols and numbers; these come before the letter A.
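The two queries described here might look like the following sketch, assuming the title and budget fields of the films table.

```sql
-- Sort from the smallest budget to the biggest (ascending is the default)
SELECT title, budget
FROM films
ORDER BY budget;

-- Sort the titles alphabetically
SELECT title, budget
FROM films
ORDER BY title;
```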
11.1.1. ASCending

We could also add the ASC keyword to our query to clarify that we are sorting in ascending order.
The results are the same, and our code is more readable.
11.1.2. DESCending

We can use the DESC keyword to sort the results in descending order. This query gives us the film
titles sorted by budget from biggest to smallest. However, our data contains a lot of null values. We
can add a WHERE clause before ORDER BY to filter the budget field for only non-null values and
improve our results.
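A sketch of the descending sort with the null values filtered out first:

```sql
-- Biggest budgets first, ignoring films with no recorded budget
SELECT title, budget
FROM films
WHERE budget IS NOT NULL
ORDER BY budget DESC;
```

In PostgreSQL, null values are placed first in a descending sort by default, which is why the unfiltered version starts with a run of empty budgets.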

11.2. Sorting fields

Notice that we don't have to select the field we are sorting on. For example, here's a query where we
sort by release year and only look at the title. However, it is a good idea to include the field we are
sorting on in the SELECT statement for clarity.
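Both variants are sketched below for comparison.

```sql
-- Valid: sort by a field that is not selected
SELECT title
FROM films
ORDER BY release_year;

-- Clearer: include the sorting field in the result
SELECT release_year, title
FROM films
ORDER BY release_year;
```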
11.3. ORDER BY multiple fields

• ORDER BY field_one, field_two


• Think of field_two as a tie-breaker

ORDER BY can also be used to sort on multiple fields. It will sort by the first field specified, then sort
by the next, etc. To specify multiple fields, we separate the field names with a comma. The second
field we sort by can be thought of as a tie-breaker when the first field is not decisive in telling the
order. Here is an example. Let's say we wanted to find the best movie. In the first query, we are only
sorting the films by the number of Oscar wins and getting a tie. We can break that tie by adding
imdb_score as a second sorting field, so that among the films with the most wins, the one with the
highest imdb_score comes first.
11.4. Different orders

We can also select a different order for each field we are sorting. For example, here, we are sorting
birthdate in ascending order and name in descending order.
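A sketch of such a query, assuming the birthdate and name fields of the people table:

```sql
-- Oldest birthdates first; people sharing a birthdate are listed Z to A
SELECT birthdate, name
FROM people
ORDER BY birthdate, name DESC;
```

Each field carries its own direction: DESC applies only to name, while birthdate keeps the ascending default.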
11.5. Order of execution

ORDER BY falls towards the end of the order of execution we already know, coming in just before
LIMIT. The FROM statement will execute first, then WHERE, followed by SELECT, ORDER BY, and
finally, LIMIT.
Exercise
Sorting single fields
Now that you understand how ORDER BY works, you'll put it into practice. In this exercise, you'll
work on sorting single fields only. This can be helpful to extract quick insights such as the top-
grossing or top-scoring film.

The following exercises will help you gain further insights into the film database.

Instructions 1/2:
• Select the name of each person in the people table, sorted alphabetically.

-- Select name from people and sort alphabetically


SELECT name
FROM people
ORDER BY name;
Instructions 2/2:
• Select the title and duration for every film, from longest duration to shortest.

-- Select the title and duration from longest to shortest film
SELECT title, duration
FROM films
ORDER BY duration DESC;

Exercise
Sorting multiple fields
ORDER BY can also be used to sort on multiple fields. It will sort by the first field specified, then sort
by the next, and so on. As an example, you may want to sort the people data by age and keep the
names in alphabetical order.

Try using ORDER BY to sort multiple columns.

Instructions 1/2:
• Select the release_year, duration, and title of films ordered by their release year and duration, in that
order.
-- Select the release year, duration, and title sorted by release year and duration
SELECT release_year, duration, title
FROM films
ORDER BY release_year, duration;
Instructions 2/2:
• Select the certification, release_year, and title from films ordered first by certification
(alphabetically) and second by release year, starting with the most recent year.

-- Select the certification, release year, and title sorted by certification and release year
SELECT certification, release_year, title
FROM films
ORDER BY certification, release_year DESC;
12. Grouping data

In the real world, we'll often need to summarize data for a particular group of results. For example, we
might want to see the film data grouped by certification and make calculations on those groups, such
as the average duration for each certification.
12.1. GROUP BY single fields

SQL allows us to group with the GROUP BY clause. Here it is used in a query where we have
grouped by certification. GROUP BY is commonly used with aggregate functions to provide
summary statistics. An aggregate function is needed when we group by a single field, certification,
but select multiple fields, certification and title. This is because the aggregate function will reduce
the non-grouped field to one value per group, so that each group corresponds to one record only.
12.2. Error handling

SQL will return an error if we try to SELECT a field that is not in our GROUP BY clause. We'll need
to correct this by adding an aggregate function around title.
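The error and its fix can be sketched as follows. The failing query is commented out so that the block runs as written, and the alias title_count is our own.

```sql
-- Fails: title is neither in GROUP BY nor inside an aggregate function
-- SELECT certification, title
-- FROM films
-- GROUP BY certification;

-- Works: the aggregate reduces title to one value per certification
SELECT certification, COUNT(title) AS title_count
FROM films
GROUP BY certification;
```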
12.3. GROUP BY multiple fields

We can use GROUP BY on multiple fields similar to ORDER BY. The order in which we write the
fields affects how the data is grouped. The query here selects and groups certification and language
while aggregating the title. The result shows that we have five films that have missing values for both
certification and language, two films that are unrated and in Japanese, two films that are rated R and in
Norwegian, and so on.
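A sketch of the query described here, with title_count as an assumed alias:

```sql
-- One row per combination of certification and language
SELECT certification, language, COUNT(title) AS title_count
FROM films
GROUP BY certification, language;
```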

12.4. GROUP BY with ORDER BY

We can combine GROUP BY with ORDER BY to group our results, make a calculation, and then
order our results. For example, we can clean up one of our previous queries by sorting the results by
the title count in descending order. Here is that query without ORDER BY, and this is the same query
with ordering added. ORDER BY is always written after GROUP BY, and notice that we can refer
back to the alias within the query. That is because of the order of execution. It looks like movies rated
R are most common in our database.
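The version with ordering added might look like this sketch, again using title_count as an assumed alias.

```sql
-- Count titles per certification, most common certification first
SELECT certification, COUNT(title) AS title_count
FROM films
GROUP BY certification
ORDER BY title_count DESC;
```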
12.5. Order of execution

GROUP BY fits into our order after FROM (and after WHERE, when there is one) and before the
other clauses we have covered. Our updated queries will begin with FROM, followed by grouping,
selecting the data and creating the alias, sorting the results, and limiting them to the desired number.

Exercise
GROUP BY single fields

GROUP BY is a SQL keyword that allows you to group and summarize results with the additional
use of aggregate functions. For example, films can be grouped by the certification and language
before counting the film titles in each group. This allows you to see how many films had a particular
certification and language grouping.

In the following steps, you'll summarize other groups of films to learn more about the films in your
database.

Instructions 1/2:
• Select the release_year and count of films released in each year aliased as film_count.

-- Find the release_year and film_count of each year
SELECT release_year, COUNT(title) AS film_count
FROM films
GROUP BY release_year;
Instructions 2/2:
• Select the release_year and average duration aliased as avg_duration of all films, grouped by
release_year.
-- Find the release_year and average duration of films for each year
SELECT release_year, AVG(duration) AS avg_duration
FROM films
GROUP BY release_year;
Exercise
GROUP BY multiple fields
GROUP BY becomes more powerful when used across multiple fields or combined with ORDER BY
and LIMIT.

Perhaps you're interested in learning about budget changes throughout the years in individual
countries. You'll use grouping in this exercise to look at the maximum budget for each country in each
year there is data available.

Instructions:
• Select the release_year, country, and the maximum budget aliased as max_budget for each
year and each country; sort your results by release_year and country.
-- Find the release_year, country, and max_budget, then group and order by release_year and country
SELECT release_year, country, MAX(budget) AS max_budget
FROM films
GROUP BY release_year, country
ORDER BY release_year, country;
13. Filtering grouped data
That was excellent work. We've combined sorting and grouping; next, we will combine filtering with
grouping.

13.1. HAVING
In SQL, we can't filter aggregate functions with WHERE clauses. For example, this query attempting
to filter the title count is invalid. That means that if we want to filter based on the result of an
aggregate function, we need another way. Groups have their own special filtering word: HAVING.
For example, this query shows only those years in which more than ten films were released.
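Both the invalid attempt and the HAVING version are sketched below. The invalid query is commented out so that the block runs as written; the alias title_count is our own.

```sql
-- Invalid: aggregate functions are not allowed in WHERE
-- SELECT release_year, COUNT(title) AS title_count
-- FROM films
-- WHERE COUNT(title) > 10
-- GROUP BY release_year;

-- Valid: HAVING filters the groups after they have been formed
SELECT release_year, COUNT(title) AS title_count
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10;
```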
13.2. Order of execution

The reason why groups have their own keyword for filtering comes down to the order of execution.
We've written a query using many of the keywords we have covered here. This is their written order,
starting with SELECT, FROM films, WHERE the certification is G, PG, or PG-13, GROUP BY
certification, HAVING the title count be greater than 500, ORDER BY title count, and LIMIT to
three. In contrast, the order of execution is: FROM, WHERE, GROUP BY, HAVING, SELECT,
ORDER BY, and LIMIT. By reviewing this order, we can see WHERE is executed before GROUP
BY and before any aggregation occurs. This order is also why we cannot use the alias with HAVING,
but we can with ORDER BY.
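Putting the written order together gives a query like the sketch below. The alias title_count and the descending sort direction are assumptions, since the text does not spell them out.

```sql
-- Written order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
SELECT certification, COUNT(title) AS title_count
FROM films
WHERE certification IN ('G', 'PG', 'PG-13')
GROUP BY certification
HAVING COUNT(title) > 500   -- the alias is not available here yet
ORDER BY title_count DESC   -- the alias is available here
LIMIT 3;
```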
13.3. HAVING vs WHERE

• WHERE filters individual records, HAVING filters grouped records


• What films were released in the year 2000?
• In what years was the average film duration over two hours?

WHERE filters individual records while HAVING filters grouped records. We'll walk through two
business questions here to show how to translate them into the correct filter. The first question is
"What films were released in the year 2000?". This question does not indicate any sort of grouping. It
asks to see only the titles from a specific year and can therefore be written as SELECT title, FROM
films, WHERE release year equals 2000. The second question is, "In what years was the average film
duration over two hours?". Straight away, we can see this question has a few more layers. Let's break
down the question and query into smaller, easier-to-understand steps.
This question requires us to return information about years, so we select the release year from the
films table. Next, it asks for the average film duration, which tells us we need to place AVG(duration)
somewhere. Since we do not need to provide any additional information around the duration on its
own, it is unlikely we need to perform the aggregation within the SELECT clause, so we'll try the
HAVING clause instead. The last part of the question indicates we need to filter on the duration.
Since we can't filter aggregates with WHERE, this supports our theory about using HAVING! Finally,
we need to add a GROUP BY into our query since we have selected a column that has not been
aggregated. Recall the aggregate function will convert the duration values into one average value.
Going back to the start of our question, we're interested in knowing the average duration per year, so
we group it by release year. And there we have it!
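The two business questions translate into queries along these lines. The threshold of 120 assumes that duration is stored in minutes, as stated in an earlier exercise.

```sql
-- What films were released in the year 2000? (filter individual records)
SELECT title
FROM films
WHERE release_year = 2000;

-- In what years was the average film duration over two hours? (filter groups)
SELECT release_year
FROM films
GROUP BY release_year
HAVING AVG(duration) > 120;
```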
Exercise
Filter with HAVING
Your final keyword is HAVING. It works similarly to WHERE in that it is a filtering clause, with the
difference that HAVING filters grouped data.

Filtering grouped data can be especially handy when working with a large dataset. When working
with thousands or even millions of rows, HAVING will allow you to filter for just the groups of data
you want, such as release years in which the average film runs over two hours!

Practice using HAVING to find out which countries (or country) have the most varied film
certifications.

Instructions:
• Select country from the films table, and get the distinct count of certification aliased as
certification_count.
• Group the results by country.
• Filter the unique count of certifications to only results greater than 10.

-- Select the country and distinct count of certification as certification_count
SELECT country, COUNT(DISTINCT(certification)) AS certification_count
FROM films
-- Group by country
GROUP BY country
-- Filter results to countries with more than 10 different certifications
HAVING COUNT(DISTINCT(certification)) > 10;
Exercise
HAVING and sorting
Filtering and sorting go hand in hand and give you greater interpretability by ordering your results.

Let's see this magic at work by writing a query showing what countries have the highest average film
budgets.

Instructions:
• Select the country and the average budget as average_budget, rounded to two decimal places, from
films.
• Group the results by country.
• Filter the results to countries with an average budget of more than one billion (1000000000).
• Sort by descending order of the average_budget.
-- Select the country and average_budget from films
SELECT country, ROUND(AVG(budget), 2) AS average_budget
FROM films
-- Group by country
GROUP BY country
-- Filter to countries with an average_budget of more than one billion
HAVING AVG(budget) > 1000000000
-- Order by descending order of the aggregated budget
ORDER BY average_budget DESC;
Exercise
All together now
It's time to use much of what you've learned in one query! This is good preparation for using SQL in
the real world where you'll often be asked to write more complex queries since some of the basic
queries can be answered by playing around in spreadsheet applications.

In this exercise, you'll write a query that returns the average budget and gross earnings for films each
year after 1990 if the average budget is greater than 60 million.

This will be a big query, but you can handle it!

Instructions 1/4:
• Select the release_year for each film in the films table, filter for records released after 1990,
and group by release_year.
-- Select the release_year for films released after 1990 grouped by year
SELECT release_year
FROM films
WHERE release_year > 1990
GROUP BY release_year;
Instructions 2/4:
• Modify the query to include the average budget aliased as avg_budget and average gross
aliased as avg_gross for the results we have so far.

-- Modify the query to also list the average budget and average gross
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year;

Instructions 3/4:
• Modify the query once more so that only years with an average budget of greater than 60
million are included.

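-- Modify the query to see only years with an avg_budget of more than 60 million
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000;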
Instructions 4/4:
• Finally, order the results from the highest average gross and limit to one.
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000
-- Order the results from highest to lowest average gross and limit to one
ORDER BY avg_gross DESC
LIMIT 1;
