Advanced SQL Querying

With SQL Pocket Guide author Alice Zhao

*Copyright Maven Analytics, LLC


COURSE STRUCTURE

This is a hands-on, project-based course for students looking for a practical approach to learning advanced SQL querying techniques

Additional resources include:

• Downloadable PDF to serve as a helpful reference when you’re offline or on the go
• Quizzes & Assignments to test and reinforce key concepts, with step-by-step solutions
• Interactive demos to keep you engaged and apply your skills throughout the course


COURSE OUTLINE

1 SQL Basics Review: Review the big 6 clauses of a SQL query along with other commonly used keywords like LIMIT, DISTINCT, and more

2 Multi-Table Analysis: Review JOIN basics (INNER, LEFT, RIGHT, FULL OUTER) and introduce variations like self joins, CROSS JOINs, and more

3 Subqueries & CTEs: Learn how to write subqueries and Common Table Expressions and understand the best situations for using certain techniques

4 Window Functions: Introduce window functions to perform calculations across a set of rows and discuss various function options and applications

5 Functions by Data Type: Discover the many SQL functions that can be applied to fields of numeric, datetime, string, and NULL data types

6 Data Analysis Applications: Apply advanced querying techniques to common data analysis scenarios, including pivoting data, rolling calculations, and more


PREVIEW: FINAL PROJECT

THE SITUATION
You’ve just been hired as a Data Analyst Intern for Major League Baseball (MLB), which has recently gotten access to a large amount of historical player data

THE ASSIGNMENT
You have access to decades’ worth of data including player statistics like schools attended, salaries, teams played for, height and weight, and more
Your task is to use advanced SQL querying techniques to track how player statistics have changed over time and across different teams in the league

THE OBJECTIVES
1. What schools do MLB players attend?
2. How much do teams spend on player salaries?
3. What does each player’s career look like?
4. How do player attributes compare?

Data source: http://seanlahman.com


SETTING EXPECTATIONS

This course focuses on querying, not database management


• We’ll cover concepts commonly used by analysts & data scientists to write complex SELECT statements
• We will NOT cover concepts more often used for database management such as stored procedures, triggers,
user-defined functions (UDFs), etc. – we cover most of those in our Advanced MySQL DBA course!

This is an advanced course, so we’ll only quickly review the basics


• It is strongly recommended that you complete our MySQL Data Analysis course before taking this one

You’ll be learning common data analysis applications within SQL


• We’ll apply SQL code to use cases that can be done within other tools like Excel or Python, including
summarizing and pivoting, dealing with duplicate data, performing rolling calculations, and more

We’ll cover general SQL syntax, but the demos will be in MySQL
• The SQL concepts taught in this course will apply to any relational database management system (Oracle,
PostgreSQL, SQL Server, SQLite, etc.), so you are welcome to use the SQL editor of your choice



INSTALLATION & SETUP


In this section we’ll discuss where you can write SQL code, walk through the MySQL & MySQL Workbench installation process, and help you load the data for this course

TOPICS WE’LL COVER:
• Where to Write SQL Code
• Installing MySQL & MySQL Workbench
• Getting Started with MySQL Workbench
• Loading Data for This Course

GOALS FOR THIS SECTION:
• Pick a place to write SQL code
• Install MySQL and MySQL Workbench (optional)
• Create a MySQL Connection to be able to start writing SQL code (optional)
• Load the tables for this course into your SQL editor


WHERE CAN I WRITE SQL CODE?

There are many options when it comes to picking a place to write SQL code:

1 RDBMS (Relational Database Management System)
• Open source: MySQL, PostgreSQL, SQLite, etc.
• Proprietary: Oracle Database, Microsoft SQL Server, etc.

2 SQL Editor
• Open source: MySQL Workbench, pgAdmin, DBeaver, etc.
• Proprietary: SQL Server Management Studio, Toad, etc.

3 Within Another Language
• Python, R, Java, etc.

4 Online SQL Editors

PRO TIP: To quickly start writing SQL code with no installations, you can also write simple SQL queries within some websites – just do a search for “online SQL editor”
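As a minimal sketch of option 3 (writing SQL within another language), the snippet below runs a query from Python using its built-in sqlite3 module, so nothing needs to be installed. SQLite stands in for MySQL here, and the `students` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database: it disappears when the connection closes
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, used only for illustration
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, grade_level INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ana", 9), (2, "Ben", 10), (3, "Cam", 12)],
)

# The SQL is passed to the database as an ordinary string
rows = conn.execute(
    "SELECT name, grade_level FROM students WHERE grade_level >= 10 ORDER BY name"
).fetchall()
print(rows)
```

The same pattern should carry over to MySQL with a MySQL driver in place of sqlite3, though connection details will differ.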


If you’d like to see the exact same interface as the demos in this course, you’ll need to download & install two programs:

1 RDBMS: MySQL
2 SQL Editor: MySQL Workbench

These installs will take ~10 min to complete

If you’re already working with another SQL editor, feel free to use that one for this course


INSTALLING MYSQL (MAC)

1 Go to dev.mysql.com/downloads/mysql/ to download MySQL Community Server

2 Select the following menu options and download the DMG Archive version
• Version: 9.1.0 Innovation (latest version)
• Operating System: macOS
• OS Version: ARM if you’re on an M1 / M2 / M3 Mac, otherwise x86

3 No need to Login or Sign Up, just click “No thanks, just start my download”

4 Find the install file in your downloads, then double click to run the installer package

5 Click through each install step, leaving defaults unless you need customized settings
• NOTE: Make sure you store your root password somewhere, you’ll need this later!


INSTALLING MYSQL WORKBENCH (MAC)

1 Go to dev.mysql.com/downloads/workbench/ to download MySQL Workbench

2 Select the following menu options and download the DMG Archive version
• Operating System: macOS
• OS Version: ARM if you’re on an M1 / M2 / M3 Mac, otherwise x86

3 No need to Login or Sign Up, just click “No thanks, just start my download”

4 Find the install file in your downloads, then double click to open the installer package

5 Drag the MySQL Workbench icon into the Applications folder icon

6 Open the MySQL Workbench app from your Applications folder


INSTALLING MYSQL (PC)

1 Go to dev.mysql.com/downloads/mysql/ to download MySQL Community Server

2 Select the following menu options and download the MSI Installer version
• Version: 9.1.0 Innovation (latest version)
• Operating System: Microsoft Windows

3 No need to Login or Sign Up, just click “No thanks, just start my download”

4 Find the install file in your downloads, then double click to open the installer package

5 Click through each step, leaving defaults unless you need customized settings
• Choose Setup Type: Typical
• On the last step, make sure Run MySQL Configurator is checked and click Finish

6 In the MySQL Configurator pop up window, click through each step, leaving defaults unless you need customized settings
• NOTE: Make sure you store your root password somewhere, you’ll need this later!


INSTALLING MYSQL WORKBENCH (PC)

1 Go to dev.mysql.com/downloads/workbench/ to download MySQL Workbench

2 Select the following menu options and download the MSI Installer version
• Operating System: Microsoft Windows

3 No need to Login or Sign Up, just click “No thanks, just start my download”

4 Find the install file in your downloads, then double click to open the installer package

5 Click through each step, leaving defaults unless you need customized settings


GETTING STARTED WITH MYSQL WORKBENCH

1 Double click on the tile below “MySQL Connections” to connect MySQL Workbench to MySQL

2 Enter the password you chose during the installation

You’re ready to write SQL code!


MYSQL WORKBENCH INTERFACE (MAC VS. PC)

MySQL Workbench looks slightly different on Mac vs. PC, but everything you need is found in the same place
• While the course is recorded on a Mac, you should have no problem keeping up on a PC

[Screenshots: Mac Interface and PC Interface]


LOADING DATA FOR THIS COURSE

You have two options when it comes to loading data for this course:

1 SQL Scripts
• Callouts on the example script: “This code is MySQL specific” and “This code will work in any RDBMS”
• If you need to edit a SQL script, you can do so within your SQL editor or within a text editor, like Visual Studio Code

2 CSV Files
• You can load these into the SQL editor of your choice


SQL BASICS REVIEW


In this section we’ll quickly review the basics of SELECT statements so we’re on the same page going into advanced querying concepts

TOPICS WE’LL COVER:
• The Big 6
• Common SQL Keywords

GOALS FOR THIS SECTION:
• Review the “Big 6” SQL clauses
• Review common keywords used in SQL queries


THE BIG 6

The Big 6 are the foundational clauses used in SQL queries:

• SELECT: Column(s) to display
• FROM: Table(s) to pull data from
• WHERE: Criteria to filter the rows by
• GROUP BY: Column to group the rows by
• HAVING: Criteria to filter the grouped rows by
• ORDER BY: Column to sort values by

The clauses must always be written in this order (one way to remember the order is the mnemonic: Start Fridays With Grandma’s Homemade Oatmeal)

The only required clause in a SQL query is the SELECT clause
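To make the clause order concrete, here is a small sketch that uses all six clauses in one query. It runs against an in-memory SQLite database from Python (standing in for MySQL), and the `students` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, used only for illustration
conn.execute("CREATE TABLE students (name TEXT, grade_level INTEGER, gpa REAL)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    ("Ana", 9, 3.0), ("Ben", 9, 4.0), ("Cam", 10, 2.0),
    ("Dee", 10, 3.0), ("Eli", 11, 4.0), ("Fay", 11, 1.0),
])

rows = conn.execute("""
    SELECT   grade_level, AVG(gpa) AS avg_gpa   -- columns to display
    FROM     students                           -- table to pull data from
    WHERE    gpa >= 2.0                         -- filter rows before grouping
    GROUP BY grade_level                        -- one group per grade level
    HAVING   AVG(gpa) >= 3.0                    -- filter the grouped rows
    ORDER BY avg_gpa DESC                       -- sort the final output
""").fetchall()
print(rows)
```

Grade 10 is dropped by HAVING because its average is 2.5, while Fay's row is dropped earlier by WHERE, before any averages are calculated.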


COMMON SQL KEYWORDS

In addition to the Big 6, there are common SQL keywords used in queries

These are popular keywords found in the SELECT clause:
• DISTINCT returns unique values
• Aggregate functions like COUNT, SUM, AVG, MIN, MAX are used to make calculations
• AS renames a column or table to an alias
• Math operators include +, -, *, /, %


These are popular keywords found in the WHERE clause:
• Comparison operators include =, !=, <>, <, <=, >, >=
• Logical operators include AND, OR, and NOT
• Comparison keywords include IN, LIKE, BETWEEN … AND, IS NULL, and more
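The keywords above can be sketched in a few short queries. This example assumes a hypothetical `students` table and runs on an in-memory SQLite database from Python rather than on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical students table; Ben has no email on file (NULL)
conn.execute("CREATE TABLE students (name TEXT, grade_level INTEGER, email TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    ("Ana", 9, "ana@school.edu"), ("Ben", 10, None),
    ("Cam", 10, "cam@school.edu"), ("Dee", 12, "dee@mail.com"),
])

# DISTINCT returns each grade level once
distinct_grades = conn.execute(
    "SELECT DISTINCT grade_level FROM students ORDER BY grade_level").fetchall()

# COUNT with an alias, filtered with BETWEEN ... AND, AND, and LIKE
num_students = conn.execute("""
    SELECT COUNT(*) AS num_students
    FROM students
    WHERE grade_level BETWEEN 9 AND 10
      AND email LIKE '%@school.edu'
""").fetchone()[0]

# IS NULL and IN combined with OR
names = conn.execute("""
    SELECT name FROM students
    WHERE email IS NULL OR grade_level IN (11, 12)
    ORDER BY name
""").fetchall()
print(distinct_grades, num_students, names)
```

Note that Ben is not counted in the second query: comparing a NULL email with LIKE yields NULL rather than true, so the row is filtered out.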


These are other popular keywords:
• DESC stands for “descending”, while the default order is ASC (ascending)
• LIMIT specifies the number of rows in the output (TOP in SQL Server and FETCH FIRST in Oracle)


• CASE statements use the following syntax to do IF-ELSE logic within SQL:
CASE WHEN … THEN … WHEN … THEN … ELSE … END

In the pictured example, this creates a new “student_class” column based on the grade level for each student
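The slide's query is not reproduced in the text, so the following is only a sketch of what a “student_class” CASE statement might look like, combined with ORDER BY … DESC and LIMIT. The table, the rows, and the grade-to-class mapping are assumptions, and SQLite (via Python) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical students table, used only for illustration
conn.execute("CREATE TABLE students (name TEXT, grade_level INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [("Ana", 9), ("Ben", 10), ("Cam", 12), ("Dee", 11)])

rows = conn.execute("""
    SELECT name,
           grade_level,
           CASE WHEN grade_level = 9  THEN 'Freshman'
                WHEN grade_level = 10 THEN 'Sophomore'
                WHEN grade_level = 11 THEN 'Junior'
                ELSE 'Senior'                 -- assumed label for everything else
           END AS student_class
    FROM students
    ORDER BY grade_level DESC                 -- highest grade level first
    LIMIT 3                                   -- keep only the first three rows
""").fetchall()
print(rows)
```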


MULTI-TABLE ANALYSIS


In this section we’ll talk about combining multiple tables in a single SQL query, such as using JOINs to add new columns from related tables, and UNIONs to add new rows

TOPICS WE’LL COVER:
• Multi-Table Analysis
• JOIN Basics
• JOIN Variations
• UNION Basics

GOALS FOR THIS SECTION:
• Understand the difference between combining tables using a JOIN and a UNION
• Review the different types of JOINs
• Learn how and when to apply JOINs and UNIONs


WORKING WITH MULTIPLE TABLES

Simple queries will return data from a single table, but in practice it’s helpful to combine multiple tables to analyze data properly

There are two ways to combine multiple tables into a single table for analysis:
• JOIN adds related columns from one table to another, based on common columns
• UNION stacks the rows from multiple tables with the same column structure

In the pictured example, the continent and population columns were added using a JOIN, based on the matching country column, and the rows from the two tables with the same columns were stacked using a UNION


BASIC JOINS

Use JOIN to combine two tables based on common values in one or more columns
• The tables must have at least one column with matching values
• Basic JOIN options include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN

The annotated example query labels these parts: the left table and its alias, the join type, the right table and its alias, and the join condition (the column(s) in the left table and the column(s) in the right table to join by)


BASIC JOIN TYPES

These are the four basic JOIN types in SQL:

• INNER: Returns records that exist in BOTH tables, and excludes unmatched records from either table
• LEFT: Returns ALL records from the LEFT table, and any matching records from the RIGHT table
• RIGHT: Returns ALL records from the RIGHT table, and any matching records from the LEFT table
• FULL OUTER: Returns ALL records from BOTH tables, including non-matching records

INNER and LEFT are the most common. RIGHT is less often used in practice; switch the tables and use a LEFT JOIN instead

While INNER and LEFT JOINs are supported in all RDBMSs, RIGHT and FULL OUTER are not – for example, MySQL does not support FULL OUTER JOINs, and SQLite only added RIGHT and FULL OUTER JOINs in version 3.39 (2022)


Example row counts when joining a left table with n=5 rows to a right table with n=3 rows:
• INNER: n=1
• LEFT: n=5
• RIGHT: n=3
• FULL OUTER: n=7
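The row counts above can be reproduced with a small sketch. The two tables below are hypothetical (five rows on the left, three on the right, one id in common) and the queries run on in-memory SQLite from Python. To stay portable to systems without RIGHT or FULL OUTER JOIN, the RIGHT JOIN is written as a LEFT JOIN with the tables switched, and the FULL OUTER JOIN is emulated with a UNION of both outer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: 5 rows on the left, 3 on the right, and only id 5 in both
conn.executescript("""
CREATE TABLE left_table  (id INTEGER, left_val  TEXT);
CREATE TABLE right_table (id INTEGER, right_val TEXT);
INSERT INTO left_table  VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e');
INSERT INTO right_table VALUES (5,'x'), (6,'y'), (7,'z');
""")

select_cols = "SELECT l.id AS left_id, r.id AS right_id "
inner_join = select_cols + "FROM left_table l INNER JOIN right_table r ON l.id = r.id"
left_join = select_cols + "FROM left_table l LEFT JOIN right_table r ON l.id = r.id"
# RIGHT JOIN expressed as a LEFT JOIN with the tables switched
right_join = select_cols + "FROM right_table r LEFT JOIN left_table l ON l.id = r.id"
# FULL OUTER JOIN emulated by stacking both outer joins; UNION drops the duplicated match
full_outer = left_join + " UNION " + right_join

counts = {
    name: len(conn.execute(sql).fetchall())
    for name, sql in [("INNER", inner_join), ("LEFT", left_join),
                      ("RIGHT", right_join), ("FULL OUTER", full_outer)]
}
print(counts)
```

One caveat with the UNION emulation: it also collapses rows that are genuinely identical, so it is only a safe stand-in when the selected columns identify each row.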


ASSIGNMENT: BASIC JOINS

NEW MESSAGE
October 31, 2024
From: Mandy Smore (Analyst, Maven Candies)
Subject: Compare rows in tables

Hi there,

We’ve learned that there’s a discrepancy between our orders and products tables in the candy database.
Could you use your JOIN knowledge to figure out which products exist in one table, but not the other?
Thanks, and Happy Halloween!
Mandy


JOINING ON MULTIPLE COLUMNS

You can join tables on multiple columns by using “AND” in the join condition
• Use table aliases as a best practice when working with multiple tables
• The joined columns can have different names in each table, as long as their values match
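A minimal sketch of a join on two columns whose names differ between the tables. The `happiness` and `inflation` tables, their column names, and all values are made up for illustration, and the query runs on in-memory SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: the key columns are named differently in each one
conn.executescript("""
CREATE TABLE happiness (country TEXT, year INTEGER, score REAL);
CREATE TABLE inflation (country_name TEXT, yr INTEGER, rate REAL);
INSERT INTO happiness VALUES ('A', 2022, 6.0), ('A', 2023, 6.5), ('B', 2023, 4.5);
INSERT INTO inflation VALUES ('A', 2022, 2.0), ('A', 2023, 3.0), ('B', 2022, 5.0);
""")

rows = conn.execute("""
    SELECT h.country, h.year, h.score, i.rate
    FROM happiness h
         INNER JOIN inflation i
         ON  h.country = i.country_name   -- first join column
         AND h.year    = i.yr             -- second join column
    ORDER BY h.country, h.year
""").fetchall()
print(rows)
```

Country B drops out because its rows match on country but not on year, which is exactly what joining on both columns is meant to enforce.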


JOINING MULTIPLE TABLES

You can join more than two tables as long as you specify the columns that link the tables together

PRO TIP: The code for joining multiple tables gets messy very quickly, so it’s important to include ample spacing and consistent formatting for readability


SELF JOINS

A self join lets you join a table with itself, and typically involves two steps:
1. Combine a table with itself based on a matching column
2. Filter on the resulting rows based on some criteria

The slides walk through three examples:
• Identifying matching rows within a table
• Comparing values within rows in a table
• Displaying relationships within a table
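As an illustration of the “displaying relationships within a table” case, the sketch below joins a hypothetical `employees` table to itself to pair each employee with their manager, then filters the result. The table and names are assumptions, and the query runs on in-memory SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: manager_id points at another row of the same table
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "Ava", None), (2, "Bo", 1), (3, "Cy", 1), (4, "Di", 2),
])

rows = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
         INNER JOIN employees m         -- step 1: join the table with itself
         ON e.manager_id = m.id
    WHERE m.name = 'Ava'                -- step 2: filter the resulting rows
    ORDER BY e.name
""").fetchall()
print(rows)
```

The two aliases (`e` and `m`) are what let the same table play both roles in one query.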


ASSIGNMENT: SELF JOINS

NEW MESSAGE
November 1, 2024
From: Mandy Smore (Analyst, Maven Candies)
Subject: Compare product prices

Hi again,

Thanks for your help earlier!
Our marketing team wants to do some analysis to identify which of our products are similar in terms of price.
Could you write a query to determine which products are within 25 cents of each other in terms of unit price and return a list of all the candy pairs?
Thanks!
Mandy


CROSS JOIN

A cross join returns all combinations of rows within two or more tables

PRO TIP: Cross joins can produce very large outputs, so be careful using this on larger tables to avoid performance issues (in general, they are less common)
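A short sketch of why cross joins grow quickly: with two hypothetical tables of 2 and 3 rows, the output has 2 × 3 = 6 rows. The tables are made up and the query runs on in-memory SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookup tables, used only for illustration
conn.executescript("""
CREATE TABLE sizes  (size  TEXT);
CREATE TABLE colors (color TEXT);
INSERT INTO sizes  VALUES ('S'), ('M');
INSERT INTO colors VALUES ('red'), ('blue'), ('green');
""")

# No join condition: every size is paired with every color
rows = conn.execute("""
    SELECT s.size, c.color
    FROM sizes s
         CROSS JOIN colors c
    ORDER BY s.size, c.color
""").fetchall()
print(len(rows), rows)
```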


UNION & UNION ALL

Use a UNION to stack multiple tables or queries on top of one another
• UNION removes duplicate rows, while UNION ALL retains them

PRO TIP: If you know there are no duplicate rows in the two tables you’re combining, a UNION ALL will typically run faster than a UNION, since it skips the step of checking for duplicates
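The difference between the two can be sketched with two hypothetical single-column tables that share one value. The example runs on in-memory SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables with the same column structure; 'B' appears in both
conn.executescript("""
CREATE TABLE countries_2022 (country TEXT);
CREATE TABLE countries_2023 (country TEXT);
INSERT INTO countries_2022 VALUES ('A'), ('B');
INSERT INTO countries_2023 VALUES ('B'), ('C');
""")

# UNION removes the duplicate 'B'
union_rows = conn.execute("""
    SELECT country FROM countries_2022
    UNION
    SELECT country FROM countries_2023
    ORDER BY country
""").fetchall()

# UNION ALL keeps every row from both queries
union_all_rows = conn.execute("""
    SELECT country FROM countries_2022
    UNION ALL
    SELECT country FROM countries_2023
    ORDER BY country
""").fetchall()
print(union_rows, union_all_rows)
```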


KEY TAKEAWAYS

A JOIN combines data from two or more tables based on related column(s)
• Multiple JOINs can be written within the FROM clause of a single query aka SELECT statement

The main JOIN types are INNER, LEFT, RIGHT, and FULL OUTER
• INNER returns matches from both tables, LEFT includes everything from the left table, RIGHT includes
everything from the right table, and FULL OUTER returns all rows from both tables

Self joins and cross joins are additional JOIN options you can use
• Self joins are useful for side-by-side comparisons of rows within the same table
• CROSS JOINs return all combinations of rows within two or more tables, but are less commonly used

UNION and UNION ALL stack the results of two or more queries
• UNION removes duplicate rows and UNION ALL keeps them, which makes UNION ALL the faster option of the two


SUBQUERIES & CTES


In this section we’ll cover subqueries and common table expressions (CTEs), which are different ways of working with nested queries

TOPICS WE’LL COVER:
• Subqueries
• CTEs
• Technique Comparison

GOALS FOR THIS SECTION:
• Learn the syntax for writing subqueries and CTEs and identify the appropriate use cases for each
• Understand the difference between subqueries, CTEs, temporary tables, and views


SUBQUERY BASICS

A subquery is a query nested within a main query, and is typically used for solving a problem in multiple steps

EXAMPLE Return all countries that have an above average happiness score
• Step 1: Calculate the average happiness score
• Step 2: Return all rows with a happiness score greater than the first query result

The nested query is the subquery, and the query around it is the main, or outer, query. The subquery is run first, before the main query

Subqueries can occur in multiple places within a query:
• Calculations in the SELECT clause
• As part of a JOIN in the FROM clause
• Filtering in the WHERE and HAVING clauses

The example above is a subquery used as a filter in the WHERE clause
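The two steps of the example can be sketched as a single query with a subquery in the WHERE clause. The `happiness` table and its scores are hypothetical, and the query runs on in-memory SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scores; the average works out to 6.0
conn.execute("CREATE TABLE happiness (country TEXT, score REAL)")
conn.executemany("INSERT INTO happiness VALUES (?, ?)",
                 [("A", 7.0), ("B", 5.0), ("C", 6.0)])

rows = conn.execute("""
    SELECT country, score
    FROM happiness
    WHERE score > (SELECT AVG(score)      -- step 1: the subquery returns 6.0
                   FROM happiness)        -- step 2: the outer query filters on it
""").fetchall()
print(rows)
```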


SUBQUERIES IN THE SELECT CLAUSE

EXAMPLE Return the difference between each country’s happiness score and the average

This subquery lets you subtract the average happiness score from each row
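A sketch of a subquery used as a calculation in the SELECT clause, again with a hypothetical `happiness` table on in-memory SQLite. The column alias `diff_from_avg` is an assumed name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scores; the average works out to 6.0
conn.execute("CREATE TABLE happiness (country TEXT, score REAL)")
conn.executemany("INSERT INTO happiness VALUES (?, ?)",
                 [("A", 7.0), ("B", 5.0), ("C", 6.0)])

rows = conn.execute("""
    SELECT country,
           score,
           score - (SELECT AVG(score) FROM happiness) AS diff_from_avg
    FROM happiness
    ORDER BY country
""").fetchall()
print(rows)
```

Because the subquery returns a single value, it can be used anywhere a number could appear in the SELECT list.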


ASSIGNMENT: SUBQUERIES IN THE SELECT CLAUSE

NEW MESSAGE
November 4, 2024
From: Mandy Smore (Analyst, Maven Candies)
Subject: Most expensive products

Hello,
Our product team plans on evaluating our product prices later this week to see if any adjustments need to be made for next year.
Can you give me a list of our products from most to least expensive, along with how much each product differs from the average unit price?

Thanks!
Mandy


SUBQUERIES IN THE FROM CLAUSE

EXAMPLE Return each country’s happiness score for the year alongside the country’s average happiness score

• This subquery calculates the average happiness score by country, and is then joined with the main query
• Subqueries in the FROM clause need to have an alias

PRO TIP: Using subqueries within the JOIN clause can speed up queries, since it allows you to join smaller tables
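A minimal sketch of a subquery in the FROM clause, joined back to the main table. The `happiness` table, its values, and the alias `country_avg` are assumptions, and the query runs on in-memory SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical yearly scores: A averages 7.0 and B averages 4.5
conn.execute("CREATE TABLE happiness (country TEXT, year INTEGER, score REAL)")
conn.executemany("INSERT INTO happiness VALUES (?, ?, ?)", [
    ("A", 2022, 6.0), ("A", 2023, 8.0), ("B", 2022, 4.0), ("B", 2023, 5.0),
])

rows = conn.execute("""
    SELECT hs.country, hs.year, hs.score, country_avg.avg_score
    FROM happiness hs
         LEFT JOIN (SELECT country, AVG(score) AS avg_score
                    FROM happiness
                    GROUP BY country) AS country_avg   -- the subquery needs an alias
         ON hs.country = country_avg.country
    ORDER BY hs.country, hs.year
""").fetchall()
print(rows)
```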


PRO TIP: MULTIPLE SUBQUERIES

Queries can contain multiple subqueries as long as each one has a different alias

• The first subquery stacks the historical and current happiness scores for each country
• The second subquery calculates the average happiness score by country, and is then joined with the first subquery
• Wrapping that whole query in an outer query makes it a nested subquery, used to return years where the happiness score is a whole point greater than the country’s average score

PRO TIP: Interpreting nested subqueries can be overwhelming, so start from the inner subqueries and work your way out – or use CTEs instead (coming up shortly)


ASSIGNMENT: SUBQUERIES IN THE FROM CLAUSE

NEW MESSAGE
November 5, 2024
From: Mandy Smore (Analyst, Maven Candies)
Subject: Products in each factory

Hello,
Our inventory management team would like to review the products produced by each factory.
Can you give me a list of our factories, along with the names of the products they produce and the number of products they produce?
Thanks!
Mandy


SUBQUERIES IN THE WHERE & HAVING CLAUSES

EXAMPLE Return regions with above average happiness scores

This subquery filters the grouped regional data
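A sketch of a subquery in the HAVING clause: the rows are grouped by region first, and the subquery's overall average is then used to filter the groups. The table and values are hypothetical, and the query runs on in-memory SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scores: overall average 5.0, North averages 7.0, South averages 3.0
conn.execute("CREATE TABLE happiness (country TEXT, region TEXT, score REAL)")
conn.executemany("INSERT INTO happiness VALUES (?, ?, ?)", [
    ("A", "North", 8.0), ("B", "North", 6.0),
    ("C", "South", 4.0), ("D", "South", 2.0),
])

rows = conn.execute("""
    SELECT region, AVG(score) AS avg_score
    FROM happiness
    GROUP BY region
    HAVING AVG(score) > (SELECT AVG(score) FROM happiness)  -- filter grouped rows
""").fetchall()
print(rows)
```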


ANY VS ALL

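Keywords like ANY, ALL, and EXISTS can provide more specific filtering logic

EXAMPLE Return happiness scores that are greater than ANY / ALL of the current happiness scores

• > ANY is true when a value is greater than at least one of the values returned by the subquery
• > ALL is true only when a value is greater than every value returned by the subquery

Callouts on the pictured results: “Only 5 rows are returned” and “All rows are returned”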


EXISTS

EXAMPLE Only return happiness scores for countries that EXIST in the inflation rates table

This is known as a correlated subquery, since it references an outside table and cannot stand alone

Correlated subqueries are known to be slow, but some RDBMSs automatically optimize for this


PRO TIP: CORRELATED SUBQUERIES

Correlated subqueries can often be rewritten as INNER JOINs to run faster

EXAMPLE Only return happiness scores for countries that EXIST in the inflation rates table

The INNER JOIN version is arguably less readable, so it’s important to consider both readability and run time when selecting an approach
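The two approaches can be compared side by side in a small sketch. The `happiness` and `inflation` tables are hypothetical, with one inflation row per country, and the queries run on in-memory SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: country B has no row in the inflation table
conn.executescript("""
CREATE TABLE happiness (country TEXT, score REAL);
CREATE TABLE inflation (country TEXT, rate REAL);
INSERT INTO happiness VALUES ('A', 7.0), ('B', 5.0), ('C', 6.0);
INSERT INTO inflation VALUES ('A', 2.0), ('C', 3.0);
""")

# Correlated subquery: the inner query refers to h.country from the outer query
exists_rows = conn.execute("""
    SELECT h.country, h.score
    FROM happiness h
    WHERE EXISTS (SELECT 1 FROM inflation i WHERE i.country = h.country)
    ORDER BY h.country
""").fetchall()

# INNER JOIN rewrite of the same filter
join_rows = conn.execute("""
    SELECT h.country, h.score
    FROM happiness h
         INNER JOIN inflation i ON h.country = i.country
    ORDER BY h.country
""").fetchall()
print(exists_rows, join_rows)
```

The two results match here only because each country appears once in `inflation`. If it held several rows per country, the join would repeat the happiness rows while EXISTS would not, so the rewrite needs a DISTINCT or a deduplicated table in that case.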


ASSIGNMENT: SUBQUERIES IN THE WHERE CLAUSE

NEW MESSAGE
November 6, 2024
From: Mandy Smore (Analyst, Maven Candies)
Subject: Low unit price products

Hello,
Our Wicked Choccy’s factory has some extra bandwidth, and we’d like to see if there are any lower priced products that they can help produce going forward.
Can you help us identify products that have a unit price less than the unit price of all products from Wicked Choccy’s? Please include which factory is currently producing them as well.
Thanks!
Mandy


COMMON TABLE EXPRESSIONS

A common table expression (CTE) creates a named, temporary output that can be referenced within another query

EXAMPLE Return each country’s happiness score for the year alongside the country’s average happiness score

• The example defines a CTE named “country_hs”, which is then referenced in the main query
• CTEs need to start with the WITH keyword, followed by the alias, the AS keyword, and the query between parentheses
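The slide's code is not reproduced in the text, so this is only a sketch of how a CTE named `country_hs` might be defined and referenced. The `happiness` table and its values are hypothetical, and the query runs on in-memory SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical yearly scores: A averages 7.0 and B averages 4.5
conn.execute("CREATE TABLE happiness (country TEXT, year INTEGER, score REAL)")
conn.executemany("INSERT INTO happiness VALUES (?, ?, ?)", [
    ("A", 2022, 6.0), ("A", 2023, 8.0), ("B", 2022, 4.0), ("B", 2023, 5.0),
])

rows = conn.execute("""
    WITH country_hs AS (                      -- WITH, alias, AS, query in parentheses
        SELECT country, AVG(score) AS avg_score
        FROM happiness
        GROUP BY country
    )
    SELECT hs.country, hs.year, hs.score, country_hs.avg_score
    FROM happiness hs
         LEFT JOIN country_hs                 -- the CTE is referenced like a table
         ON hs.country = country_hs.country
    ORDER BY hs.country, hs.year
""").fetchall()
print(rows)
```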


Why use CTEs instead of subqueries?
• Readability: Complex queries with CTEs are much easier to read
• Reusability: CTEs can be referenced multiple times within a query
• Recursiveness: CTEs can handle recursive queries

Despite all this, you shouldn’t forget about subqueries:
• Most modern RDBMSs support CTEs, but not all of them do
• For simple queries, sometimes a subquery is readable enough and works just fine


READABILITY

EXAMPLE Return each country’s happiness score for the year alongside the country’s average happiness score

The slide shows the same query written as a subquery and as a CTE, side by side. Same outputs!


REUSABILITY

Unlike subqueries, CTEs can be referenced multiple times within a query, which helps avoid repeating code

EXAMPLE For each country, return countries from the same region with a lower happiness score in 2023

We’re performing a self join using the CTE!
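A sketch of the reusability point: the CTE below is written once and referenced twice in a self join. The table, the values, and the CTE name `hs` are assumptions, and the query runs on in-memory SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scores; the 2022 row should be ignored by the CTE's filter
conn.execute("CREATE TABLE happiness (country TEXT, region TEXT, year INTEGER, score REAL)")
conn.executemany("INSERT INTO happiness VALUES (?, ?, ?, ?)", [
    ("A", "North", 2023, 8.0), ("B", "North", 2023, 6.0),
    ("C", "South", 2023, 4.0), ("D", "South", 2023, 3.0),
    ("B", "North", 2022, 9.0),
])

rows = conn.execute("""
    WITH hs AS (
        SELECT country, region, score
        FROM happiness
        WHERE year = 2023
    )
    SELECT a.country, a.score, b.country AS lower_country, b.score AS lower_score
    FROM hs a
         INNER JOIN hs b                -- same CTE referenced a second time
         ON a.region = b.region
    WHERE b.score < a.score
    ORDER BY a.country, b.country
""").fetchall()
print(rows)
```

With subqueries, the 2023 filter would have to be written out twice, once for each side of the join.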


ASSIGNMENT: CTES

NEW MESSAGE
November 7, 2024
From: Mandy Smore (Analyst, Maven Candies)
Subject: Largest orders

Hello,
The sales director wants a list of our biggest orders. In addition to sending over a list of all the orders over $200, could you also tell him the number of orders over $200?
Thanks!
Mandy


SOLUTION: CTES

The ORDER BY clause in the solution’s CTE doesn’t affect the final output and can be removed to make the code run more efficiently

MULTIPLE CTES

You can use multiple CTEs in a query, and even combine them with subqueries

Note that we’re using a single WITH keyword to create two CTEs, and they
are separated by a comma
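As a rough sketch of the syntax, the query below defines two CTEs after a single WITH keyword, separated by a comma. The hs23 and hs24 names, the table, and the data are assumptions made for illustration, and the query is run in SQLite through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE happiness_scores (year INTEGER, country TEXT, happiness_score REAL);
    INSERT INTO happiness_scores VALUES
        (2023, 'Finland', 7.5), (2024, 'Finland', 7.0),
        (2023, 'Denmark', 7.0), (2024, 'Denmark', 7.5),
        (2023, 'Iceland', 6.5), (2024, 'Iceland', 7.0);
""")

rows = conn.execute("""
    WITH hs23 AS (
        SELECT country, happiness_score FROM happiness_scores WHERE year = 2023
    ),                                   -- comma, not a second WITH
    hs24 AS (
        SELECT country, happiness_score FROM happiness_scores WHERE year = 2024
    )
    SELECT hs23.country,
           hs23.happiness_score AS hs_2023,
           hs24.happiness_score AS hs_2024
    FROM hs23
    INNER JOIN hs24 ON hs23.country = hs24.country
    WHERE hs24.happiness_score > hs23.happiness_score
    ORDER BY hs23.country
""").fetchall()

print(rows)
```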


Our previous query with two CTEs is now a subquery!


ASSIGNMENT: MULTIPLE CTES

Results Preview
NEW MESSAGE
November 8, 2024

From: Mandy Smore (Analyst, Maven Candies)
Subject: RE: Products in each factory

Hi again –
Regarding my earlier message, could you rewrite your code
using CTEs instead of subqueries? Thanks!
---
Our inventory management team would like to review the
products produced by each factory.
Can you give me a list of our factories, along with the names of the
products they produce and the number of products they produce?


SOLUTION: MULTIPLE CTES

Solution Code


RECURSIVE CTEs

A recursive CTE is a query that references itself, which is useful for generating
sequences and working with hierarchical data

WITH RECURSIVE cte_name AS (
    SELECT ...
    UNION ALL
    SELECT ...
    FROM cte_name
)
SELECT * FROM cte_name;

• Recursive CTEs use the RECURSIVE keyword
• They have an anchor member (the first SELECT)
• UNION or UNION ALL are used as connectors
• And a recursive member that references the CTE (the second SELECT)
• The syntax is slightly different in each RDBMS


RECURSIVE CTEs

EXAMPLE Return daily stock prices, including dates with missing prices

Step 1: Generate a column of dates
• The recursive member generates the next date after the anchor
• A condition in the recursive member stops the recursion after reaching the end date
• Notice the missing dates in the stock prices table


Step 2: Join with the stock prices table
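The two steps can be sketched together as below. This is an illustration under assumptions: the stock_prices table, its columns, and the dates are made up, and the date arithmetic uses SQLite's date(dt, '+1 day') because the sketch runs through Python's sqlite3 module (in MySQL the equivalent step would typically use an INTERVAL expression).

```python
import sqlite3

# Hypothetical table with prices missing for 2024-11-02 and 2024-11-03
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_prices (price_date TEXT, price REAL);
    INSERT INTO stock_prices VALUES
        ('2024-11-01', 100.0), ('2024-11-04', 102.5), ('2024-11-05', 101.0);
""")

rows = conn.execute("""
    WITH RECURSIVE my_dates(dt) AS (
        SELECT '2024-11-01'              -- anchor member: the first date
        UNION ALL
        SELECT date(dt, '+1 day')        -- recursive member: the next date
        FROM my_dates
        WHERE dt < '2024-11-05'          -- stops the recursion at the end date
    )
    SELECT md.dt, sp.price
    FROM my_dates md
    LEFT JOIN stock_prices sp ON md.dt = sp.price_date
    ORDER BY md.dt
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps every generated date, so dates without a price show up with a NULL price (None in Python) instead of disappearing.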


RECURSIVE CTEs

EXAMPLE Return the reporting chain for each employee

• The anchor member sets the highest-ranking employee as the anchor
• The recursion stops when there is nothing left to join
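A reporting chain query along these lines might look like the sketch below. The employees table, its columns, and the names are hypothetical. The sketch runs in SQLite through Python's sqlite3 module and uses the || operator to build the chain; in MySQL you would likely use CONCAT() instead.

```python
import sqlite3

# Hypothetical employees table: manager_id is NULL for the top of the hierarchy
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, employee_name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ava', NULL), (2, 'Bob', 1), (3, 'Cara', 1), (4, 'Dan', 2);
""")

rows = conn.execute("""
    WITH RECURSIVE employee_hierarchy AS (
        -- anchor member: the highest-ranking employee (no manager)
        SELECT employee_id, employee_name, employee_name AS reporting_chain
        FROM employees
        WHERE manager_id IS NULL
        UNION ALL
        -- recursive member: employees whose manager is already in the hierarchy
        SELECT e.employee_id, e.employee_name,
               eh.reporting_chain || ' > ' || e.employee_name
        FROM employees e
        INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
    )
    SELECT employee_id, reporting_chain
    FROM employee_hierarchy
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```

Each pass of the recursive member adds one more level of reports; once a pass finds no new employees to join, the recursion ends.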


PRO TIP: Recursive CTEs aren’t very common, so instead of memorizing syntax, keep in mind the
general concepts of generating sequences and returning hierarchies


TEMPORARY TABLES & VIEWS

Temporary tables and views are other options for querying the results of a query
• Both subqueries and CTEs only exist for the duration of the query
• Temporary tables exist for a session and views continue to exist until modified or dropped


TECHNIQUE COMPARISON

Here’s how these techniques compare with each other:

Subqueries
• Description: A query nested within a query
• Persistence: Exists only during execution
• Permissions: Fewer permissions
• Summary: Simple queries. Great for simple queries that require multiple steps
• In practice: Start here if you need to query the results of a query

CTEs
• Description: Named temporary output
• Persistence: Exists only during execution
• Permissions: Fewer permissions
• Summary: Complex queries. Great for organizing complex queries
• In practice: To improve readability and reusability, or use recursion, consider rewriting complex subqueries as CTEs

Temporary Tables
• Description: Temporary data storage within a session
• Persistence: Exists only during a session
• Permissions: More permissions
• Summary: Multiple uses in a session. Great for referencing multiple times in a session
• In practice: If you find yourself referencing the same CTE, consider a temporary table

Views
• Description: Virtual table based on a query
• Persistence: Always available until modified or dropped
• Permissions: More permissions
• Summary: Use in multiple sessions. Great for accessing the results of complex queries
• In practice: For a more permanent solution, consider a view


KEY TAKEAWAYS

A subquery is a query nested within a main query
• In the SELECT clause, they can be used to make calculations
• In the FROM clause, they can be used to join query results and require an alias using AS
• In the WHERE or HAVING clause, they can be used to filter results (can use keywords like ANY, ALL and EXISTS)
• Avoid correlated subqueries by using INNER JOINs if possible

A common table expression (CTE) creates a named temporary output
• CTEs are more readable, can be referenced multiple times, and you can create multiple CTEs within a query
• Recursive CTEs are useful for generating values and working with hierarchical data, but are less common

Temporary tables & views are other options for using the results of a query
• Subqueries and CTEs exist only for the duration of a query, and require minimal permissions to create
• Temporary tables exist for the duration of a session and views exist indefinitely, but they often require
additional permissions to create and maintain


WINDOW FUNCTIONS

In this section, we’ll break down each component of a window function, introduce
common window functions, and preview some of their applications

TOPICS WE’LL COVER:
• Window Function Basics
• Window Functions
• Applications Preview

GOALS FOR THIS SECTION:
• Review the similarities and differences between aggregate and window functions
• Identify the individual components of a window function and understand what each of them does
• Learn about the numerous functions that are available when working with window functions
• Get a taste of the practical applications of window functions (more on this in the last section!)


WINDOW FUNCTION BASICS

Window functions are used to apply a function to a “window” of data
• Windows are essentially groups of rows of data

The window function applied here is the ROW_NUMBER() function, which returns a
series of numbers for each window


AGGREGATE VS WINDOW FUNCTIONS

How are window functions different from a GROUP BY?
• Aggregate functions collapse the rows in each group and apply a calculation
• Window functions leave the rows as they are and apply calculations by window

With GROUP BY, a calculation is made for each group, so there is one row per country
With a window function, a calculation is applied to each window and the original row granularity is kept
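The difference in row granularity can be seen in a small sketch. The table and data here are assumptions for illustration, and the queries run in SQLite (3.25 or later, which is needed for window functions) through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE happiness_scores (country TEXT, year INTEGER, happiness_score REAL);
    INSERT INTO happiness_scores VALUES
        ('Finland', 2022, 7.5), ('Finland', 2023, 7.0),
        ('Denmark', 2022, 6.5), ('Denmark', 2023, 7.0);
""")

# Aggregate function with GROUP BY: rows collapse to one per country
grouped_rows = conn.execute("""
    SELECT country, AVG(happiness_score) AS avg_hs
    FROM happiness_scores
    GROUP BY country
    ORDER BY country
""").fetchall()

# Same calculation as a window function: every original row is kept
window_rows = conn.execute("""
    SELECT country, year, happiness_score,
           AVG(happiness_score) OVER (PARTITION BY country) AS avg_hs
    FROM happiness_scores
    ORDER BY country, year
""").fetchall()

print(len(grouped_rows), "rows with GROUP BY")
print(len(window_rows), "rows with the window function")
```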


WINDOW FUNCTION COMPONENTS

Window functions consist of four main components:

• The function to apply to each window (required)
  Examples: ROW_NUMBER, FIRST_VALUE, LAG
• OVER: States that this is a window function (required)
• PARTITION BY: How you’re splitting the rows into windows (optional)
  Examples: one column, multiple columns, the entire table (if left blank)
• ORDER BY: How each window should be sorted before the function is applied
  (optional in MySQL, PostgreSQL, SQLite; required in Oracle, SQL Server)


ROW_NUMBER() OVER

This is the most basic window function:
• ROW_NUMBER() is a commonly used window function that numbers each row in a window
• OVER states that we are using a window function

Because we didn’t specify a window with PARTITION BY, the function is applied to the
entire table


PARTITION BY

A PARTITION BY allows you to define your windows
• You can partition by one or more columns

Note that the numbers go from 1 to 4 for each country, but they are
in a seemingly random order (more on this up next!)


ORDER BY

An ORDER BY allows you to specify the order of the rows within your windows
• You can order in ASC (default) or DESC order by one or more columns

These are now ordered by the happiness score for each country,
from lowest to highest


And now from highest to lowest
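Putting the components together, a query of this kind could be sketched as follows. The table, columns, and scores are assumptions, and the query runs in SQLite (3.25 or later) through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE happiness_scores (country TEXT, year INTEGER, happiness_score REAL);
    INSERT INTO happiness_scores VALUES
        ('Finland', 2022, 7.5), ('Finland', 2023, 7.0), ('Finland', 2024, 7.75),
        ('Denmark', 2022, 6.5), ('Denmark', 2023, 7.0);
""")

rows = conn.execute("""
    SELECT country, year, happiness_score,
           ROW_NUMBER() OVER (
               PARTITION BY country            -- one window per country
               ORDER BY happiness_score DESC   -- highest score gets row number 1
           ) AS row_num
    FROM happiness_scores
    ORDER BY country, row_num
""").fetchall()

for row in rows:
    print(row)
```

The numbering restarts at 1 for each country, and within each country it follows the happiness score from highest to lowest.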


ASSIGNMENT: WINDOW FUNCTION BASICS

Results Preview
NEW MESSAGE
November 12, 2024

From: Mandy Smore (Analyst, Maven Candies)
Subject: New transaction number column

Hello,

We currently have an orders report with customer, order and
transaction IDs, and we would like to add an additional
column that contains the transaction number for each
customer as well.

Could you help us do this using window functions?

Thanks!
Mandy


SOLUTION: WINDOW FUNCTION BASICS

Solution Code


WINDOW FUNCTIONS

There are many functions to choose from when writing window functions:

Category: Functions
• Row Numbering: ROW_NUMBER, RANK, DENSE_RANK
• Value Within a Window: FIRST_VALUE, LAST_VALUE, NTH_VALUE
• Value Relative to a Row: LEAD, LAG
• Aggregate Functions: SUM, AVG, COUNT, MIN, MAX
• Statistical Functions: NTILE, CUME_DIST, PERCENT_RANK

The top three function categories are the most common, and we’ll be covering these in
detail in this section of the course


ROW NUMBERING

There are three different ways of numbering rows within a window:
• ROW_NUMBER() gives every row a unique number
• RANK() accounts for ties (tied rows share a number, and the following numbers are skipped)
• DENSE_RANK() accounts for ties and leaves no missing numbers in between
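The three functions can be compared side by side on data with a tie. The quiz_scores table and its rows are hypothetical, and the query runs in SQLite (3.25 or later) through Python's sqlite3 module. ROW_NUMBER() is given an extra sort key (name) here only so that the tie is broken the same way on every run.

```python
import sqlite3

# Hypothetical table with a tie at 85
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quiz_scores (name TEXT, score INTEGER);
    INSERT INTO quiz_scores VALUES ('Al', 90), ('Bo', 85), ('Cy', 85), ('Di', 70);
""")

rows = conn.execute("""
    SELECT name, score,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
    FROM quiz_scores
    ORDER BY score DESC, name
""").fetchall()

for row in rows:
    print(row)
```

Bo and Cy tie at 85: ROW_NUMBER() still gives them 2 and 3, RANK() gives both 2 and then jumps to 4, and DENSE_RANK() gives both 2 and continues with 3.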


ASSIGNMENT: ROW NUMBERING

Results Preview
NEW MESSAGE
November 13, 2024

From: Mandy Smore (Analyst, Maven Candies)
Subject: Product rank

Hello,

Our product team would like to know which products are
most popular within each order.

Could you create a product rank field that returns a 1 for the
most popular product in an order, 2 for second most, and so
on? Please take a look at the results preview to get an idea of
what they’d like the ranking to look like.

Thanks!
Mandy


SOLUTION: ROW NUMBERING

Solution Code


VALUE WITHIN A WINDOW

There are three different ways of extracting a particular value within a window:
• FIRST_VALUE() extracts the first value in a window, in sequential row order
• LAST_VALUE() extracts the last value
• NTH_VALUE() extracts the value at a specified position

Note that the first value is repeated across all rows in a window


When extracting the second value with NTH_VALUE(), the first row of each window returns NULL because
the second value hasn’t been encountered yet when scanning through the rows in the window

SQL Server doesn’t support NTH_VALUE (you can use ROW_NUMBER instead)
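The repeated first value and the NULLs from NTH_VALUE() can both be seen in a short sketch. The baby_names table and its rows are assumptions, and the query runs in SQLite (3.25 or later) through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE baby_names (gender TEXT, name TEXT, babies INTEGER);
    INSERT INTO baby_names VALUES
        ('Female', 'Olivia', 100), ('Female', 'Emma', 90), ('Female', 'Ava', 80),
        ('Male', 'Liam', 110), ('Male', 'Noah', 95);
""")

rows = conn.execute("""
    SELECT gender, name, babies,
           FIRST_VALUE(name)  OVER (PARTITION BY gender ORDER BY babies DESC) AS top_name,
           NTH_VALUE(name, 2) OVER (PARTITION BY gender ORDER BY babies DESC) AS second_name
    FROM baby_names
    ORDER BY gender, babies DESC
""").fetchall()

for row in rows:
    print(row)
```

The top name repeats on every row of its window, while second_name is NULL (None in Python) on the first row of each window, where only one row has been scanned so far.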


VALUE WITHIN A WINDOW

EXAMPLE Return the top name for each gender

Extract the first value for each gender ordered by number of babies in a subquery
Then simply keep the top names!


ASSIGNMENT: VALUE WITHIN A WINDOW

Results Preview
NEW MESSAGE
November 14, 2024

From: Mandy Smore (Analyst, Maven Candies)
Subject: Second most popular product

Hello,

Could you specifically give me a list of the 2nd most popular
product within each order?

The sales team is going to try to see if they can bundle them
with some other products to increase units sold within each
order.

Thanks!
Mandy


SOLUTION: VALUE WITHIN A WINDOW

Solution Code

ORDER BY in subquery is not needed and can be omitted


VALUE RELATIVE TO A ROW

LEAD() and LAG() allow you to return the value from the next and previous row,
respectively, within each window

The first value is NULL because there is no previous value in the window


VALUE RELATIVE TO A ROW

EXAMPLE Calculate the difference in happiness scores over time, by country

Return the prior year’s happiness score for each country in a CTE
Then simply subtract them!
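One way to write this is sketched below, with LAG() inside a CTE and the subtraction in the main query. The table, column names, and scores are assumptions, and the query runs in SQLite (3.25 or later) through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE happiness_scores (country TEXT, year INTEGER, happiness_score REAL);
    INSERT INTO happiness_scores VALUES
        ('Finland', 2022, 7.5), ('Finland', 2023, 7.0), ('Finland', 2024, 7.75),
        ('Denmark', 2022, 6.5), ('Denmark', 2023, 7.0);
""")

rows = conn.execute("""
    WITH hs_prior AS (
        SELECT country, year, happiness_score,
               LAG(happiness_score) OVER (PARTITION BY country ORDER BY year) AS prior_hs
        FROM happiness_scores
    )
    SELECT country, year, happiness_score, prior_hs,
           happiness_score - prior_hs AS hs_change
    FROM hs_prior
    ORDER BY country, year
""").fetchall()

for row in rows:
    print(row)
```

The first year for each country has no prior row, so both prior_hs and hs_change come back as NULL.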


ASSIGNMENT: VALUE RELATIVE TO A ROW

Results Preview
NEW MESSAGE
November 15, 2024

From: Mandy Smore (Analyst, Maven Candies)
Subject: Change in orders over time

Hello,
We’d like to look into how orders have changed over time for
each customer.
Could you produce a table that contains info about each
customer and their orders, the number of units in each order,
and the change in units from order to order?
Thanks!
Mandy


SOLUTION: VALUE RELATIVE TO A ROW

Solution Code


STATISTICAL FUNCTIONS

NTILE() divides the rows in a window into a specified number of roughly equal groups (percentiles)

EXAMPLE View the top 25% of happiness scores for each region

Because we specified NTILE(4), the range of 100% of the rows in each window is divided
into 4 groups of 25%, with 1 representing the top percentile group, and 4 the bottom

NTILE is supported in SQLite 3.25 and later, the same release that introduced window functions


Return the percentiles in a CTE
Then simply filter for the top percentile (25%)

The number of results will differ by region because some regions have more countries,
so the top 25% calculation can contain more or fewer rows
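The CTE-then-filter approach might be sketched as follows. The regions, country labels, and scores are made up for illustration (region A has 8 countries, region B has 4), and the query runs in SQLite (3.25 or later) through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical data: 8 countries in region A, 4 countries in region B
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE happiness_scores (region TEXT, country TEXT, happiness_score REAL)")
sample = [("A", f"A{i}", 8.0 - 0.5 * i) for i in range(8)]
sample += [("B", f"B{i}", 6.0 - 0.5 * i) for i in range(4)]
conn.executemany("INSERT INTO happiness_scores VALUES (?, ?, ?)", sample)

rows = conn.execute("""
    WITH hs_pct AS (
        SELECT region, country, happiness_score,
               NTILE(4) OVER (PARTITION BY region
                              ORDER BY happiness_score DESC) AS hs_percentile
        FROM happiness_scores
    )
    SELECT region, country, happiness_score
    FROM hs_pct
    WHERE hs_percentile = 1          -- keep only the top 25% group
    ORDER BY region, happiness_score DESC
""").fetchall()

for row in rows:
    print(row)
```

Region A contributes two rows and region B only one, which is the "number of results will differ by region" effect described above.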


ASSIGNMENT: STATISTICAL FUNCTIONS

Results Preview
NEW MESSAGE
November 18, 2024

From: Mandy Smore (Analyst, Maven Candies)
Subject: Top 1% of customers

Hello,
The customer engagement team would like to create a
rewards program for our top 1% of customers.
Could you pull a list of the top 1% of customers in terms of
how much they’ve spent with us?
Thanks!
Mandy


SOLUTION: STATISTICAL FUNCTIONS

Solution Code

ORDER BY in CTE is not needed and can be omitted


PREVIEW: WINDOW FUNCTION APPLICATIONS

There are many practical applications of window functions, including calculating
moving averages, running totals, and more

Note that this uses an AVG() aggregate function as part of a window function
We’ve added some additional keywords at the end that specify rows within the window

In the Data Analysis Applications section of this course,
we’ll cover the following applications of window functions:
• Removing duplicate rows
• Min / max value filtering
• Rolling calculations
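A moving average of this kind can be sketched as below. The slide's own code is not reproduced here, so the ROWS BETWEEN frame clause is an assumption about which "additional keywords" it used. The table and data are also hypothetical, and the query runs in SQLite (3.25 or later) through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE happiness_scores (country TEXT, year INTEGER, happiness_score REAL);
    INSERT INTO happiness_scores VALUES
        ('Finland', 2020, 6.0), ('Finland', 2021, 7.0),
        ('Finland', 2022, 8.0), ('Finland', 2023, 9.0);
""")

rows = conn.execute("""
    SELECT year, happiness_score,
           AVG(happiness_score) OVER (
               PARTITION BY country
               ORDER BY year
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW   -- current row plus up to 2 before it
           ) AS three_year_avg
    FROM happiness_scores
    ORDER BY year
""").fetchall()

for row in rows:
    print(row)
```

For the first two years fewer than three rows are available, so the average is taken over whatever rows exist in the frame.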


KEY TAKEAWAYS

Window functions are used to apply a function across windows of data
• Windows refer to groups of rows in a table
• Aggregate functions collapse the rows in each group, but window functions leave the rows untouched

The general syntax is FUNCTION() OVER (PARTITION BY x ORDER BY y)
• OVER indicates that we’re writing a window function
• PARTITION BY states how we’d like to split up the rows into groups
• ORDER BY states how the rows within each window should be ordered before applying the function

The function portion of a window function is applied to each window
• You can number rows with ROW_NUMBER(), RANK(), and DENSE_RANK()
• You can identify values within a window with FIRST_VALUE(), LAST_VALUE() and NTH_VALUE()
• You can return values from relative rows with LEAD() and LAG()
• You can use statistical functions like NTILE() for making percentile calculations
• You can use aggregate functions like AVG() for making moving average calculations


FUNCTIONS BY DATA TYPE

In this section, we’ll go over commonly used functions by data type, including numeric,
datetime, string functions, and more

TOPICS WE’LL COVER:
• Function Basics
• Numeric Functions
• Datetime Functions
• String Functions
• NULL Functions

GOALS FOR THIS SECTION:
• Review three categories of SQL functions: aggregate, window, and general functions
• Recognize that most SQL functions will only work on specific data types
• Learn and apply commonly used SQL functions across different data types


FUNCTION BASICS
SQL functions take in zero or more inputs, apply a calculation or transformation,
and output a value
• You can recognize a function by the parentheses () that follow a keyword

PRO TIP: While SQL keywords and function names are case insensitive, it’s a best practice to capitalize
functions so they stand out, similar to clauses. If you’re using a SQL editor, they will automatically be
highlighted in a different color.


FUNCTION CATEGORIES
There are three function categories: aggregate, window, and general functions

Aggregate
• Applies a calculation to many rows of data and returns a single value
• Often used alongside a GROUP BY

Window
• Performs calculations across a window of rows

General
• Performs calculations on all individual values within a column


NUMERIC FUNCTIONS
Numeric functions can be applied to numeric columns (integer, decimal, etc.)

Math
• ABS: Absolute value
• SIGN: Returns -1, 0, or 1 depending on the sign of the number
• POWER: x to the power of y
• SQRT: Square root
• LOG: Log of y base x
• MOD (% in SQL Server): Remainder of x / y

Rounding
• ROUND: Rounds a number to n decimal places
• FLOOR: Rounds a number down (not supported in SQLite)
• CEIL (CEILING in SQL Server): Rounds a number up (not supported in SQLite)


EXAMPLE Applying a log transform to the population of each country

This is a nested function


PRO TIP: LEAST & GREATEST
LEAST() and GREATEST() are two lesser-known functions that apply a
calculation across a row instead of a column

We’ll discuss how to handle NULL values like these later in the section

LEAST() and GREATEST() are not supported in SQL Server versions before SQL Server 2022


PRO TIP: CAST & CONVERT
Sometimes you may need to cast columns to different data types, so that you
can utilize specific functions

Dividing by 1.0 turns an integer into a float (includes decimals)

The CAST / CONVERT functions only change the data type for the duration of the query,
not permanently

In some RDBMSs like SQL Server, you can use CONVERT instead of CAST
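The effect of casting can be checked with a few literal values. This sketch runs in SQLite through Python's sqlite3 module, and the values are arbitrary. Note that the integer division shown in the first column is SQLite's behavior (PostgreSQL and SQL Server behave similarly), while MySQL's / operator already returns a decimal result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

result = conn.execute("""
    SELECT 7 / 2,                      -- integer / integer: decimals are dropped in SQLite
           7 / 1.0 / 2,                -- dividing by 1.0 first turns the integer into a float
           CAST(7 AS REAL) / 2,        -- explicit cast to a floating-point type
           CAST('42' AS INTEGER) + 1   -- a string cast to an integer, then used in arithmetic
""").fetchone()

print(result)
```

The cast only applies to the value inside this query; nothing about a stored column's data type changes.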


ASSIGNMENT: NUMERIC FUNCTIONS

Results Preview
NEW MESSAGE
November 19, 2024

From: Mandy Smore (Analyst, Maven Candies)
Subject: Customer spend bins

Hello,
Our market research team is interested in seeing how many
customers have spent $0-$10 on our products, $10-$20, and
so on for every $10 range.
Could you generate this table for them?
Thanks!
Mandy


SOLUTION: NUMERIC FUNCTIONS

Solution Code


DATETIME FUNCTIONS
Datetime functions can be applied to datetime columns (date, time, etc.)

Current
• CURRENT_DATE: Current date
• CURRENT_TIMESTAMP: Current date and time

Extract
• YEAR, MONTH, etc.: Extract a specific portion of the datetime value
• DAYOFWEEK: Extract the day of the week from the datetime value

Difference
• DATEDIFF / AGE: Interval between two datetimes
• DATE_ADD / DATE_SUB: Add or subtract an interval to or from a datetime

Datetime and string functions vary widely by RDBMS. The functions in this section are
for MySQL, but you may need to look up the specific function in your RDBMS


For day of the week, 1 is Sunday, 2 is Monday, and so on


ASSIGNMENT: DATETIME FUNCTIONS

Results Preview
NEW MESSAGE
November 20, 2024

From: Mandy Smore (Analyst, Maven Candies)
Subject: Add shipping dates for Q2 orders

Hello,
The market research team wants to do a deep dive on the Q2
2024 orders data we currently have.
Can you pull that data for them?
In addition, they also requested that we include a ship_date
column for them that’s 2 days after the order_date.
Thanks for your help!
Mandy


SOLUTION: DATETIME FUNCTIONS

Solution Code


STRING FUNCTIONS
String functions can be applied to string columns (char, varchar, text, etc.)

Length
• LENGTH: Returns the number of characters in a string

Update
• TRIM: Removes leading or trailing spaces (or other characters)
• REPLACE: Replaces a substring with another

Case
• UPPER / LOWER: Converts all characters to uppercase or lowercase

Combine
• CONCAT: Combines multiple strings into one

Find Location
• INSTR: Returns the position of a substring within a string
• SUBSTR: Extracts part of a string of a specified length, starting from a given position

Find Pattern
• LIKE*: Performs a pattern match within a string, typically using wildcards
• REGEXP: Matches a string against a regular expression pattern

*This is a keyword, not a function


ASSIGNMENT: STRING FUNCTIONS

Results Preview
NEW MESSAGE
November 21, 2024

From: Mandy Smore (Analyst, Maven Candies)
Subject: Update product ID

Hello,
We’re updating our product_ids to include the factory name
and product name.
Here’s what we’re thinking it should look like.
Could you write the SQL code to produce this?
Thank you!
Mandy


SOLUTION: STRING FUNCTIONS

Solution Code

ORDER BY in CTE is not needed and can be omitted


STRING FUNCTIONS: PATTERN MATCHING
String functions can be applied to string columns (char, varchar, text, etc.)

Since this extracts the text before the first space, it doesn’t work for strings with a single word

This CASE statement returns the full event name if it can’t find a space in the text string
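The CASE-based fix could look something like the sketch below. The events table and its rows are hypothetical, and the query runs in SQLite through Python's sqlite3 module using INSTR and SUBSTR, which also exist in MySQL.

```python
import sqlite3

# Hypothetical events table: one of the names is a single word with no space
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id INTEGER, event_name TEXT);
    INSERT INTO events VALUES
        (1, 'Halloween Party'), (2, 'Thanksgiving'), (3, 'New Year Gala');
""")

rows = conn.execute("""
    SELECT event_name,
           CASE
               WHEN INSTR(event_name, ' ') = 0 THEN event_name   -- no space: keep the full name
               ELSE SUBSTR(event_name, 1, INSTR(event_name, ' ') - 1)
           END AS first_word
    FROM events
    ORDER BY event_id
""").fetchall()

for row in rows:
    print(row)
```

INSTR returns 0 when the substring is not found, which is what the CASE expression checks before extracting the text in front of the first space.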


LIKE is a keyword, not a function

“%” and “_” are wildcard characters
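The two wildcards behave differently: "%" matches any number of characters (including none), while "_" matches exactly one. A small sketch, using a hypothetical products table and run in SQLite through Python's sqlite3 module:

```python
import sqlite3

# Hypothetical products table, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_code TEXT, product_name TEXT);
    INSERT INTO products VALUES
        ('A1', 'Wonka Bar - Nutty Crunch'),
        ('A2', 'Wonka Gum'),
        ('B1', 'Fizzy Lifting Drinks'),
        ('AB12', 'Everlasting Gobstopper');
""")

def codes_where(condition):
    """Return the product codes matching a WHERE condition, sorted."""
    sql = f"SELECT product_code FROM products WHERE {condition} ORDER BY product_code"
    return [code for (code,) in conn.execute(sql)]

starts_with_wonka = codes_where("product_name LIKE 'Wonka%'")   # % = any number of characters
a_plus_one_char = codes_where("product_code LIKE 'A_'")         # _ = exactly one character
ends_with_2 = codes_where("product_code LIKE '%2'")

print(starts_with_wonka, a_plus_one_char, ends_with_2)
```

Note that 'AB12' does not match 'A_' because the underscore stands for a single character only.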


Regular expressions allow you to find patterns in your text – they are language agnostic, which means they work in
multiple languages like SQL and Python (they take extra processing power, so they aren’t recommended for large datasets)


ASSIGNMENT: PATTERN MATCHING

Results Preview
NEW MESSAGE
November 22, 2024

From: Mandy Smore (Analyst, Maven Candies)
Subject: Update product names

Hi again,
The marketing team has kicked off an initiative to simplify our
product names, starting with our Wonka Bars products.
Could you remove “Wonka Bar” from any products that
contain the term?
Thank you!
Mandy


SOLUTION: PATTERN MATCHING

Solution Code


NULL FUNCTIONS
NULL functions are used to replace NULL values with an alternative value

To do a simple IF-ELSE NULL check:
• Use the IFNULL function (NVL in Oracle, not supported in PostgreSQL)

To do more complex IF-ELSE NULL checks:
• Use the COALESCE function (supported in most modern RDBMS’s)

PRO TIP: COALESCE is the more flexible of the two: it lets you do multiple NULL checks and returns the first non-NULL value
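The difference between the two functions can be sketched as follows, run through Python's built-in sqlite3 module. The `contacts` table and its rows are hypothetical. IFNULL checks a single column, while COALESCE walks a list of candidates and returns the first one that is not NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, email TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Ann", "ann@example.com", None),
        ("Bob", None, "555-0100"),
        ("Cy", None, None),
    ],
)

rows = conn.execute("""
    SELECT name,
           IFNULL(email, 'no email')                  AS email_or_default,
           COALESCE(email, phone, 'no contact info')  AS best_contact
    FROM contacts
    ORDER BY name
""").fetchall()

for row in rows:
    print(row)
```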

ASSIGNMENT: NULL FUNCTIONS

NEW MESSAGE
November 25, 2024

From: Mandy Smore (Analyst, Maven Candies)


Subject: Fill in NULL values

Hello,
Sugar Shack and The Other Factory just added two new
products that don’t have divisions assigned to them.
For simplicity's sake, could you update those NULL values to
have a value of “Other”?
Here’s an extra challenge for you – instead of updating them to
“Other”, could you update them to be the same division as the
most common division within their respective factories?
Thanks!

SOLUTION: NULL FUNCTIONS
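One possible approach to both parts of the request, sketched under assumptions: the `products` table, its columns, and the sample rows are hypothetical, and window functions require SQLite 3.25 or later. COALESCE fills in "Other" for the simple version; for the challenge, two CTEs find the most common division per factory, which is then joined back in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, factory TEXT, division TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Choco Bar", "Sugar Shack", "Chocolate"),
        ("Fudge Bite", "Sugar Shack", "Chocolate"),
        ("Gummy Worms", "Sugar Shack", "Sugar"),
        ("Mystery Candy", "Sugar Shack", None),
        ("Mint Gum", "The Other Factory", "Gum"),
        ("Bubble Gum", "The Other Factory", "Gum"),
        ("New Treat", "The Other Factory", None),
    ],
)

rows = conn.execute("""
    WITH division_counts AS (
        -- how many products each division has within each factory
        SELECT factory, division, COUNT(*) AS num_products
        FROM products
        WHERE division IS NOT NULL
        GROUP BY factory, division
    ),
    top_division AS (
        -- rank divisions within each factory, most common first
        SELECT factory, division,
               ROW_NUMBER() OVER (PARTITION BY factory
                                  ORDER BY num_products DESC) AS rn
        FROM division_counts
    )
    SELECT p.product_name,
           COALESCE(p.division, 'Other')    AS division_other,
           COALESCE(p.division, t.division) AS division_top
    FROM products p
    LEFT JOIN top_division t
           ON p.factory = t.factory AND t.rn = 1
    ORDER BY p.product_name
""").fetchall()

filled = {name: (other, top) for name, other, top in rows}
print(filled["Mystery Candy"], filled["New Treat"])
```

If two divisions tie for most common within a factory, ROW_NUMBER picks one arbitrarily; the sample data avoids ties.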

ORDER BY in the CTE is not needed and can be omitted


KEY TAKEAWAYS

A function applies a calculation or transformation to rows of data


• An aggregate function applies a calculation to all rows and returns a single value (COUNT, SUM, etc.)
• A window function performs a calculation across a window of rows (OVER, PARTITION BY, etc.)
• A general function performs a calculation or transformation on all rows

Specific functions can be applied to specific data types


• If needed, you can CAST or CONVERT a field into a different data type to apply a particular function
• Common numeric functions include LOG, ROUND, etc.
• Common datetime functions include YEAR, DATEDIFF, etc.
• Common string functions include TRIM, REPLACE, REGEXP, etc.
• Common NULL functions include IFNULL, COALESCE, etc.


DATA ANALYSIS APPLICATIONS

Now that we’ve introduced a variety of advanced SQL concepts, we’ll use this section to go
over common data analysis applications that utilize the techniques we’ve learned

TOPICS WE’LL COVER:
• Duplicate Values
• Min / Max Value Filtering
• Pivoting
• Rolling Calculations
• Imputing NULL Values

GOALS FOR THIS SECTION:
• Learn to identify and handle duplicate values
• Apply min / max value filtering in various ways
• Pivot data using conditional aggregations
• Perform rolling calculations, including subtotals, cumulative sums, and moving averages
• Fill in NULL values with imputed values

DUPLICATE VALUES

Duplicate values can be present in various ways:
• Fully duplicate rows, where every column matches
• Rows with the same name in different regions, which could potentially be:
  1. Two employees with the same name, in different regions
  2. The same employee who transferred regions
• Rows for the same employee with different values, such as an employee who seems to have gotten a raise

To view duplicate values:
• Use a combination of GROUP BY, COUNT, and HAVING

To exclude duplicate values:
• Use DISTINCT to exclude fully duplicate rows
• Use window functions to exclude partially duplicate rows
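The three techniques above can be sketched as follows, run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The `employees` table and its rows are hypothetical, and keeping the highest salary per name is just one possible rule for choosing among partial duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, region TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ava", "East", 50000),
        ("Ava", "East", 50000),   # fully duplicate row
        ("Ben", "West", 60000),
        ("Cara", "East", 55000),
        ("Cara", "East", 58000),  # partial duplicate (different salary)
    ],
)

# View duplicates: GROUP BY every column, then keep groups with more than one row
duplicates = conn.execute("""
    SELECT name, region, salary, COUNT(*) AS num_rows
    FROM employees
    GROUP BY name, region, salary
    HAVING COUNT(*) > 1
""").fetchall()

# Exclude fully duplicate rows with DISTINCT
distinct_rows = conn.execute("""
    SELECT DISTINCT name, region, salary
    FROM employees
    ORDER BY name, salary
""").fetchall()

# Exclude partially duplicate rows: number the rows per name, keep the first
one_per_name = conn.execute("""
    WITH ranked AS (
        SELECT name, region, salary,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY salary DESC) AS rn
        FROM employees
    )
    SELECT name, region, salary
    FROM ranked
    WHERE rn = 1
    ORDER BY name
""").fetchall()

print(duplicates)
print(distinct_rows)
print(one_per_name)
```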

ASSIGNMENT: DUPLICATE VALUES

NEW MESSAGE
December 2, 2024

From: Stu Dious (Admin, Maven High School)


Subject: Remove duplicate students

Good morning!
We’ve learned that there’s a student who’s showing up
multiple times in our student records.
Can you generate a report of the students and their emails,
and exclude the duplicate student record?
Thank you!
Stu

SOLUTION: DUPLICATE VALUES
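One possible approach to Stu's request, sketched under assumptions: the `students` table, its columns, and the sample rows are hypothetical, and the duplicate record is assumed to share the student's name but carry a different id. A window function numbers each student's records so that only the first is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, student_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        (1, "Abby Johnson", "abby.johnson@example.com"),
        (2, "Bo Chen", "bo.chen@example.com"),
        (3, "Abby Johnson", "abby.johnson@example.com"),  # duplicate record
    ],
)

report = conn.execute("""
    WITH numbered AS (
        SELECT id, student_name, email,
               ROW_NUMBER() OVER (PARTITION BY student_name ORDER BY id) AS rn
        FROM students
    )
    SELECT id, student_name, email
    FROM numbered
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in report:
    print(row)
```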



MIN / MAX VALUE FILTERING

Min / max value filtering allows you to filter data based on the lowest or highest values within each group

EXAMPLE: Return the most recent sales amount for each sales rep

With a GROUP BY and MAX alone, only the date is returned, but we want the sales amount as well

There are two approaches you can take to include the sales amount:
• Use a GROUP BY with a JOIN
• Use a window function
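Both approaches can be sketched side by side, run through Python's built-in sqlite3 module (the window function version needs SQLite 3.25 or later). The `sales` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sales_rep TEXT, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Emma", "2024-05-01", 500),
        ("Emma", "2024-06-15", 700),
        ("Liam", "2024-04-20", 300),
        ("Liam", "2024-06-01", 250),
    ],
)

# Approach 1: GROUP BY finds each rep's latest date, JOIN brings back the amount
with_join = conn.execute("""
    SELECT s.sales_rep, s.sale_date, s.amount
    FROM sales s
    JOIN (SELECT sales_rep, MAX(sale_date) AS latest_date
          FROM sales
          GROUP BY sales_rep) m
      ON s.sales_rep = m.sales_rep AND s.sale_date = m.latest_date
    ORDER BY s.sales_rep
""").fetchall()

# Approach 2: number each rep's rows from newest to oldest, keep row 1
with_window = conn.execute("""
    WITH numbered AS (
        SELECT sales_rep, sale_date, amount,
               ROW_NUMBER() OVER (PARTITION BY sales_rep
                                  ORDER BY sale_date DESC) AS rn
        FROM sales
    )
    SELECT sales_rep, sale_date, amount
    FROM numbered
    WHERE rn = 1
    ORDER BY sales_rep
""").fetchall()

print(with_join)
print(with_window)
```

The two versions can differ when a rep has several sales on the latest date: the JOIN returns all of them, while ROW_NUMBER keeps exactly one.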

ASSIGNMENT: MIN / MAX VALUE FILTERING

NEW MESSAGE
December 3, 2024

From: Stu Dious (Admin, Maven High School)


Subject: Top grade for each student

Hi again,
Can you create a report of each student with their highest
grade for the semester, as well as which class it was in?
Thanks!
Stu

SOLUTION: MIN / MAX VALUE FILTERING
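One possible approach to Stu's request, sketched under assumptions: the `student_grades` table, its columns, and the sample rows are hypothetical. DENSE_RANK is used here so that a student whose top grade is tied across classes would show every tied class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_grades (student_name TEXT, class_name TEXT, final_grade INTEGER)"
)
conn.executemany(
    "INSERT INTO student_grades VALUES (?, ?, ?)",
    [
        ("Abby", "Math", 92),
        ("Abby", "History", 88),
        ("Bo", "Math", 75),
        ("Bo", "Art", 95),
    ],
)

top_grades = conn.execute("""
    WITH ranked AS (
        SELECT student_name, class_name, final_grade,
               DENSE_RANK() OVER (PARTITION BY student_name
                                  ORDER BY final_grade DESC) AS grade_rank
        FROM student_grades
    )
    SELECT student_name, class_name, final_grade
    FROM ranked
    WHERE grade_rank = 1
    ORDER BY student_name
""").fetchall()

for row in top_grades:
    print(row)
```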

ORDER BY in the CTE is not needed and can be omitted


PIVOTING

Pivoting lets you transform rows into columns to summarize your data
• This can be achieved using CASE statements
• PIVOT is available in some RDBMS’s like SQL Server and Oracle

EXAMPLE: Create a summary table by pivoting the crust type column in the pizza table
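The CASE-based pivot can be sketched as follows, run through Python's built-in sqlite3 module. The `pizzas` table, its columns, and the sample rows are hypothetical. Each crust type gets its own CASE expression, which becomes one column of the summary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pizzas (category TEXT, crust_type TEXT)")
conn.executemany(
    "INSERT INTO pizzas VALUES (?, ?)",
    [
        ("Cheese", "Thin"),
        ("Cheese", "Standard"),
        ("Veggie", "Thin"),
        ("Veggie", "Thin"),
        ("Meat", "Standard"),
    ],
)

# One row per category, one column per crust type
summary = conn.execute("""
    SELECT category,
           SUM(CASE WHEN crust_type = 'Standard' THEN 1 ELSE 0 END) AS standard_crust,
           SUM(CASE WHEN crust_type = 'Thin' THEN 1 ELSE 0 END)     AS thin_crust
    FROM pizzas
    GROUP BY category
    ORDER BY category
""").fetchall()

for row in summary:
    print(row)
```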

ASSIGNMENT: PIVOTING

NEW MESSAGE
December 4, 2024

From: Stu Dious (Admin, Maven High School)


Subject: Summary table

Hello,
Can you help us create a summary table that shows the
average grade for each department and grade level?
Thanks for all your help this week!
Stu

SOLUTION: PIVOTING
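One possible approach to Stu's request, sketched under assumptions: the `grades` table, its columns, the two grade levels, and the sample rows are hypothetical. Departments form the rows and each grade level becomes a column. A CASE without ELSE yields NULL, which AVG ignores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (department TEXT, grade_level INTEGER, final_grade INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [
        ("Math", 9, 80),
        ("Math", 9, 90),
        ("Math", 10, 70),
        ("Science", 9, 60),
        ("Science", 10, 85),
        ("Science", 10, 95),
    ],
)

summary = conn.execute("""
    SELECT department,
           AVG(CASE WHEN grade_level = 9  THEN final_grade END) AS avg_grade_9,
           AVG(CASE WHEN grade_level = 10 THEN final_grade END) AS avg_grade_10
    FROM grades
    GROUP BY department
    ORDER BY department
""").fetchall()

for row in summary:
    print(row)
```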



ROLLING CALCULATIONS

Rolling calculations, including subtotals, cumulative sums, and moving averages, allow you to perform calculations across rows of data

Subtotals
• Use the WITH ROLLUP keywords to calculate subtotals (MySQL syntax; not supported in SQLite)
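Because SQLite does not support WITH ROLLUP, the runnable sketch below emulates the subtotal rows with UNION ALL, which is a swapped-in technique rather than ROLLUP itself. The MySQL version is kept as an unexecuted string for comparison. The `orders` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# MySQL form of the query (not executed here, since SQLite lacks WITH ROLLUP)
mysql_rollup_query = """
    SELECT region, product, SUM(sales) AS total_sales
    FROM orders
    GROUP BY region, product WITH ROLLUP
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("East", "Gum", 10), ("East", "Bar", 20), ("West", "Bar", 5)],
)

# Emulation: detail rows, then one subtotal per region, then a grand total.
# NULL marks the subtotal level, as in ROLLUP output.
subtotals = conn.execute("""
    SELECT * FROM (
        SELECT region, product, SUM(sales) AS total_sales
        FROM orders GROUP BY region, product
        UNION ALL
        SELECT region, NULL, SUM(sales) FROM orders GROUP BY region
        UNION ALL
        SELECT NULL, NULL, SUM(sales) FROM orders
    )
    ORDER BY region IS NULL, region, product IS NULL, product
""").fetchall()

for row in subtotals:
    print(row)
```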

Cumulative sum
• Use SUM() as a window function to calculate the cumulative sum
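A minimal sketch of a cumulative sum, run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The `monthly_sales` table and its rows are hypothetical. With an ORDER BY inside OVER, SUM adds up every row from the start through the current row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("2024-01", 100), ("2024-02", 150), ("2024-03", 50)],
)

cumulative = conn.execute("""
    SELECT month,
           sales,
           SUM(sales) OVER (ORDER BY month) AS cumulative_sales
    FROM monthly_sales
    ORDER BY month
""").fetchall()

for row in cumulative:
    print(row)
```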

Moving average
• Use AVG() as a window function to calculate the moving average
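A minimal sketch of a three-month moving average, run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The `monthly_sales` table, its rows, and the three-row window are hypothetical choices. The ROWS BETWEEN clause limits the average to the current row and the two rows before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("2024-01", 100), ("2024-02", 200), ("2024-03", 300), ("2024-04", 400)],
)

moving_avg = conn.execute("""
    SELECT month,
           sales,
           AVG(sales) OVER (ORDER BY month
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS three_month_avg
    FROM monthly_sales
    ORDER BY month
""").fetchall()

for row in moving_avg:
    print(row)
```

The first rows average over fewer than three months because no earlier rows exist yet.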

ASSIGNMENT: ROLLING CALCULATIONS

NEW MESSAGE
December 5, 2024

From: Mandy Smore (Analyst, Maven Candies)


Subject: Monthly sales report

Hi again,

We have one final request for you.

Can you help us generate a report that shows the total sales
for each month, as well as the cumulative sum of sales and the
six-month moving average of sales?

Thanks for all your help over the past month!


Mandy

SOLUTION: ROLLING CALCULATIONS
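One possible approach to Mandy's request, sketched under assumptions: the `orders` table, its columns, and the sample rows are hypothetical, and dates are assumed to be stored as 'YYYY-MM-DD' text. A CTE totals sales per month, then two window functions add the cumulative sum and the six-month moving average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2024-01-05", 10), ("2024-01-20", 20),  # January has two orders
        ("2024-02-10", 60),
        ("2024-03-12", 90),
        ("2024-04-08", 120),
        ("2024-05-17", 150),
        ("2024-06-03", 180),
        ("2024-07-21", 210),
    ],
)

report = conn.execute("""
    WITH monthly AS (
        SELECT strftime('%Y-%m', order_date) AS month,
               SUM(sales) AS total_sales
        FROM orders
        GROUP BY strftime('%Y-%m', order_date)
    )
    SELECT month,
           total_sales,
           SUM(total_sales) OVER (ORDER BY month) AS cumulative_sales,
           AVG(total_sales) OVER (ORDER BY month
                                  ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS six_month_avg
    FROM monthly
    ORDER BY month
""").fetchall()

for row in report:
    print(row)
```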

ORDER BY in the CTE is not needed and can be omitted


DEMO: IMPUTING NULL VALUES

Imputing values means replacing NULL values in the data with other values

We’ll cover four different approaches to doing this in SQL:
1. Hard coded value (integer)
2. Average of a column (subquery)
3. Prior row’s value (window function)
4. Smoothed value (two window functions)

In this final demo, we’ll be writing a single query that contains techniques learned in every section of this course! It includes a JOIN, UNION, subquery, recursive CTE, window function, numeric function and NULL function.
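The four approaches listed above can be sketched in one query, run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). This is a simplified illustration, not the full demo query: the `stock_prices` table and its rows are hypothetical, and "smoothed" is taken here to mean the average of the prior and next rows' values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_prices (price_date TEXT, price REAL)")
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?)",
    [
        ("2024-11-01", 10.0),
        ("2024-11-02", None),  # missing value to impute
        ("2024-11-03", 14.0),
        ("2024-11-04", 18.0),
    ],
)

imputed = conn.execute("""
    SELECT price_date,
           price,
           -- 1. hard coded value
           COALESCE(price, 0) AS hard_coded,
           -- 2. average of the column (AVG ignores NULLs)
           COALESCE(price, (SELECT AVG(price) FROM stock_prices)) AS column_avg,
           -- 3. prior row's value
           COALESCE(price, LAG(price) OVER (ORDER BY price_date)) AS prior_value,
           -- 4. smoothed value: average of the prior and next rows
           COALESCE(price, (LAG(price) OVER (ORDER BY price_date)
                          + LEAD(price) OVER (ORDER BY price_date)) / 2) AS smoothed
    FROM stock_prices
    ORDER BY price_date
""").fetchall()

for row in imputed:
    print(row)
```

Approaches 3 and 4 assume the neighboring rows are themselves not NULL; consecutive gaps would need extra handling.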

KEY TAKEAWAYS

There are many ways to identify and handle duplicate values


• Use HAVING to view duplicate rows, and DISTINCT or window functions to exclude duplicate rows

Min / max value filtering allows you to filter data within each group
• This can be accomplished with a combination of GROUP BY and JOIN, or with a window function

Pivoting transforms row values into columns to summarize your data


• This can be accomplished by using CASE statements with aggregate functions, or PIVOT in some tools

Rolling calculations include subtotals, cumulative sums & moving averages


• This can be done using the WITH ROLLUP keywords or window functions with SUM() and AVG()

There are many options for imputing NULL values, or filling in missing data
• Options include using hard coded values, column aggregations, relative row values and more


FINAL PROJECT

THE SITUATION
You’ve just been hired as a Data Analyst Intern for Major League Baseball (MLB), which has recently gotten access to a large amount of historical player data

THE ASSIGNMENT
You have access to decades’ worth of data including player statistics like schools attended, salaries, teams played for, height and weight, and more
Your task is to use advanced SQL querying techniques to track how player statistics have changed over time and across different teams in the league

THE OBJECTIVES
1. What schools do MLB players attend?
2. How much do teams spend on player salaries?
3. What does each player’s career look like?
4. How do player attributes compare?

Data source: http://seanlahman.com
