@DataScience_ir - SQL CheatSheet
CONTENTS
SELECT
GROUP BY
AGGREGATE FUNCTIONS
ORDER BY
COMPUTATIONS
ROUNDING NUMBERS
TROUBLESHOOTING
JOIN
INSERT
UPDATE
DELETE
DATE AND TIME
INTERVALS
EXTRACTING PARTS OF DATES
GROUPING BY YEAR AND MONTH
CASE WHEN
GROUP BY EXTENSIONS
GROUPING SETS
CUBE
ROLLUP
COALESCE
COMMON TABLE EXPRESSIONS (CTE)
WINDOW FUNCTIONS
RANKING, RUNNING TOTAL
MOVING AVERAGE, DELTA
SQL
SQL, or Structured Query Language, is a language for talking to databases.
It lets you select specific data and build complex reports. Today, SQL is a
universal language of data, used in practically all technologies that
process data.
SELECT
Fetch the id and name columns from the product table:
SELECT id, name
FROM product;
Concatenate the name and the description to fetch the full description of
the products:
SELECT name || ' - ' || description
FROM product;
Fetch names of products that start with a 'P' or end with an 's':
SELECT name
FROM product
WHERE name LIKE 'P%' OR name LIKE '%s';
Fetch names of products that start with any letter followed by 'rain'
(like 'train' or 'grain'):
SELECT name
FROM product
WHERE name LIKE '_rain';
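The two LIKE wildcards can be tried out with a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database. The sample rows are hypothetical; only the product table and its name column come from the text. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, which other databases may not be.

```python
import sqlite3

# Hypothetical sample rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO product (name) VALUES (?)",
    [("Pot",), ("Jeans",), ("train",), ("grain",), ("terrain",), ("Laptop",)],
)

# % matches any run of characters, including an empty one.
p_or_s = [row[0] for row in conn.execute(
    "SELECT name FROM product "
    "WHERE name LIKE 'P%' OR name LIKE '%s' ORDER BY id"
)]

# _ matches exactly one character, so the longer 'terrain' does not qualify.
rain = [row[0] for row in conn.execute(
    "SELECT name FROM product WHERE name LIKE '_rain' ORDER BY id"
)]

print(p_or_s)
print(rain)
```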
GROUP BY
PRODUCT
name        category
Knife       Kitchen
Pot         Kitchen
Mixer       Kitchen
Jeans       Clothing
Sneakers    Clothing
Leggings    Clothing
Smart TV    Electronics
Laptop      Electronics

Grouped by category:
category     count
Kitchen      3
Clothing     3
Electronics  2
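The grouping illustrated by the sample table can be reproduced with a short sketch, assuming Python's built-in sqlite3 module as the database. The query text and the alias n are illustrative, since the cheat sheet shows only the input rows and the resulting counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, category TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [
    ("Knife", "Kitchen"), ("Pot", "Kitchen"), ("Mixer", "Kitchen"),
    ("Jeans", "Clothing"), ("Sneakers", "Clothing"), ("Leggings", "Clothing"),
    ("Smart TV", "Electronics"), ("Laptop", "Electronics"),
])

# GROUP BY collapses the rows of each category into one row;
# COUNT(*) then counts the rows that were collapsed.
counts = conn.execute(
    "SELECT category, COUNT(*) AS n FROM product "
    "GROUP BY category ORDER BY category"
).fetchall()

for category, n in counts:
    print(category, n)
```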
AGGREGATE FUNCTIONS
Count the number of products:
SELECT COUNT(*)
FROM product;
Find the average price of products for each category whose average is
above 3.0:
SELECT category, AVG(price)
FROM product
GROUP BY category
HAVING AVG(price) > 3.0;
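A minimal sketch of both aggregate queries, assuming Python's built-in sqlite3 module and hypothetical prices. It shows that HAVING filters whole groups after aggregation, whereas WHERE would filter individual rows before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, category TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("Knife", "Kitchen", 2.0), ("Pot", "Kitchen", 3.0),         # average 2.5
    ("Jeans", "Clothing", 4.0), ("Sneakers", "Clothing", 6.0),  # average 5.0
])

# COUNT(*) without GROUP BY treats the whole table as a single group.
total = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]

# HAVING keeps only the groups whose aggregate passes the condition.
above = conn.execute(
    "SELECT category, AVG(price) FROM product "
    "GROUP BY category HAVING AVG(price) > 3.0"
).fetchall()

print(total, above)
```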
ORDER BY
Fetch product names sorted by the price column in the default
ASCending order (the ASC keyword is optional, as the brackets indicate):
SELECT name
FROM product
ORDER BY price [ASC];
COMPUTATIONS
Use +, -, *, / to do basic math. To get the number of seconds in a week:
SELECT 60 * 60 * 24 * 7;
-- result: 604800
ROUNDING NUMBERS
Round a number to its nearest integer:
SELECT ROUND(1234.56789);
-- result: 1235
TROUBLESHOOTING
INTEGER DIVISION
In PostgreSQL and SQL Server, the / operator performs integer division for
integer arguments. If you do not see the decimal places you expect, you are
probably dividing one integer by another. Cast one operand to decimal:
123 / 2 -- result: 61
CAST(123 AS decimal) / 2 -- result: 61.5
DIVISION BY 0
To avoid this error, make sure the denominator is not 0. You may use the
NULLIF() function to replace 0 with a NULL, which results in a NULL for
the entire expression:
count / NULLIF(count_all, 0)
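Both troubleshooting patterns can be checked with a small sketch, assuming Python's built-in sqlite3 module. SQLite also performs integer division on integer operands. This sketch casts to REAL because SQLite has no fixed-point decimal type. SQLite itself yields NULL rather than an error when dividing by zero, so the NULLIF line only illustrates the pattern that protects stricter databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer operands: the fractional part is dropped.
int_div = conn.execute("SELECT 123 / 2").fetchone()[0]

# Casting one operand to a non-integer type keeps the fraction.
real_div = conn.execute("SELECT CAST(123 AS REAL) / 2").fetchone()[0]

# NULLIF(x, 0) turns a zero denominator into NULL, and any arithmetic
# with NULL is NULL (returned to Python as None).
safe_div = conn.execute("SELECT 5 / NULLIF(0, 0)").fetchone()[0]

print(int_div, real_div, safe_div)
```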
JOIN
JOIN is used to fetch data from multiple tables. To get the names of
products purchased in each order, use:
SELECT
orders.order_date,
product.name AS product,
amount
FROM orders
JOIN product
ON product.id = orders.product_id;
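A minimal sketch of the join above, assuming Python's built-in sqlite3 module. The table layouts and rows are hypothetical. The order that points at a missing product shows that a plain JOIN (an inner join) returns only rows with a match in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    order_date TEXT,
    product_id INTEGER,
    amount INTEGER
);
INSERT INTO product VALUES (1, 'Knife'), (2, 'Laptop'), (3, 'Pot');
INSERT INTO orders VALUES
    (10, '2023-07-04', 2, 1),
    (11, '2023-07-06', 1, 3),
    (12, '2023-07-07', 99, 2);  -- product 99 does not exist
""")

rows = conn.execute("""
SELECT
    orders.order_date,
    product.name AS product,
    amount
FROM orders
JOIN product
    ON product.id = orders.product_id
ORDER BY orders.order_date
""").fetchall()

for row in rows:
    print(row)
```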
INSERT
To insert data into a table, use the INSERT command:
INSERT INTO category
VALUES
(1, 'Home and Kitchen'),
(2, 'Clothing and Apparel');
You may specify the columns to which the data is added. The remaining
columns are filled with predefined default values or NULLs.
INSERT INTO category (name)
VALUES ('Electronics');
UPDATE
To update the data in a table, use the UPDATE command:
UPDATE category
SET
is_active = true,
name = 'Office'
WHERE name = 'Ofice';
DELETE
To delete data from a table, use the DELETE command:
DELETE FROM category
WHERE name IS NULL;
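The three data-modification commands can be run end to end in a short sketch, assuming Python's built-in sqlite3 module. The column definitions and rows are hypothetical, and 1 and 0 stand in for true and false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    "id INTEGER PRIMARY KEY, name TEXT, is_active INTEGER DEFAULT 0)"
)

# INSERT with all columns, then with a column list; omitted columns
# receive their default value (or NULL when there is none).
conn.execute("INSERT INTO category VALUES (1, 'Home and Kitchen', 1), (2, 'Ofice', 0)")
conn.execute("INSERT INTO category (name) VALUES ('Electronics')")
conn.execute("INSERT INTO category (id) VALUES (9)")  # name stays NULL

# UPDATE changes only the rows matched by WHERE.
conn.execute(
    "UPDATE category SET is_active = 1, name = 'Office' WHERE name = 'Ofice'"
)

# DELETE removes only the rows matched by WHERE.
deleted = conn.execute("DELETE FROM category WHERE name IS NULL").rowcount

rows = conn.execute(
    "SELECT id, name, is_active FROM category ORDER BY id"
).fetchall()
print(deleted, rows)
```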
DATE AND TIME
SORTING CHRONOLOGICALLY
Using ORDER BY on date and time columns sorts rows chronologically
from the oldest to the most recent:
SELECT order_date, product, quantity
FROM sales
ORDER BY order_date;
Use the DESCending order to sort from the most recent to the oldest:
SELECT order_date, product, quantity
FROM sales
ORDER BY order_date DESC;
Note: When selecting a period such as July 2023 with a condition like
order_date >= '2023-07-01' AND order_date < '2023-08-01', pay attention to
the end date. The upper bound '2023-08-01' is not included in the range. The
timestamp '2023-08-01' is actually the timestamp '2023-08-01 00:00:00.0'.
The comparison operator < ensures that the selection covers all timestamps
earlier than '2023-08-01 00:00:00.0', that is, all timestamps in July 2023,
even those just before midnight on August 1, 2023.
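The effect of the exclusive upper bound can be seen in a minimal sketch, assuming Python's built-in sqlite3 module. SQLite stores these timestamps as ISO 8601 text and compares them as strings, which gives the same ordering here. The rows and the July lower bound are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, product TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2023-06-30 23:59:59", "Pot", 1),
    ("2023-07-01 00:00:00", "Knife", 2),
    ("2023-07-31 23:59:59", "Laptop", 1),
    ("2023-08-01 00:00:00", "Jeans", 4),
])

# Half-open range: includes the start, excludes the end.
july = [row[0] for row in conn.execute("""
SELECT product FROM sales
WHERE order_date >= '2023-07-01' AND order_date < '2023-08-01'
ORDER BY order_date
""")]

# A closed range ending on '2023-07-31' stops at that day's midnight
# and silently drops the order placed late on July 31.
closed = [row[0] for row in conn.execute("""
SELECT product FROM sales
WHERE order_date BETWEEN '2023-07-01' AND '2023-07-31'
ORDER BY order_date
""")]

print(july, closed)
```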
INTERVALS
An interval measures the difference between two points in time. For
example, the interval between 2023-07-04 and 2023-07-06 is 2 days.
Find customers who placed the first order within a month from the
registration date:
SELECT id
FROM customers
WHERE first_order_date <=
registration_date + INTERVAL '1' month;
Note: In SQL Server, intervals are not implemented – use the DATEADD()
and DATEDIFF() functions.
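SQLite, like SQL Server, has no INTERVAL syntax, so the sketch below swaps in SQLite's date() function with a '+1 month' modifier to express "within a month from the registration date". It assumes Python's built-in sqlite3 module, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER, registration_date TEXT, first_order_date TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "2023-07-04", "2023-07-06"),  # ordered 2 days after registering
    (2, "2023-07-04", "2023-09-15"),  # ordered more than a month later
    (3, "2023-07-04", None),          # has not ordered yet
])

# date(x, '+1 month') plays the role of x + INTERVAL '1' month.
# A NULL first_order_date never satisfies the comparison.
within_month = [row[0] for row in conn.execute("""
SELECT id
FROM customers
WHERE first_order_date <= date(registration_date, '+1 month')
ORDER BY id
""")]

print(within_month)
```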
GROUPING BY YEAR AND MONTH
When aggregating data per month, you must group by both the year and the
month. EXTRACT(MONTH FROM order_date) only extracts the month number
(1, 2, ..., 12). To distinguish between months from different years, you
must also group by year.
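The difference between grouping by month alone and by year and month can be sketched as follows, assuming Python's built-in sqlite3 module. SQLite has no EXTRACT, so strftime('%Y', ...) and strftime('%m', ...) are swapped in for EXTRACT(YEAR ...) and EXTRACT(MONTH ...). The amount column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("2022-07-10", 100),
    ("2023-07-05", 40),
    ("2023-07-20", 60),
    ("2023-08-01", 25),
])

# Month only: July 2022 and July 2023 are merged into one group.
by_month = conn.execute("""
SELECT CAST(strftime('%m', order_date) AS INTEGER) AS order_month,
       SUM(amount)
FROM sales
GROUP BY order_month
ORDER BY order_month
""").fetchall()

# Year and month: each calendar month gets its own group.
by_year_month = conn.execute("""
SELECT CAST(strftime('%Y', order_date) AS INTEGER) AS order_year,
       CAST(strftime('%m', order_date) AS INTEGER) AS order_month,
       SUM(amount)
FROM sales
GROUP BY order_year, order_month
ORDER BY order_year, order_month
""").fetchall()

print(by_month)
print(by_year_month)
```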
CASE WHEN
CASE WHEN lets you pass conditions (as in the WHERE clause), evaluates
them in order, then returns the value for the first condition met.
SELECT
name,
CASE
WHEN price > 150 THEN 'Premium'
WHEN price > 100 THEN 'Mid-range'
ELSE 'Standard'
END AS price_category
FROM product;
Here, all products with prices above 150 get the Premium label, those with
prices above 100 and up to 150 get the Mid-range label, and the rest
receive the Standard label.
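A minimal sketch of the labelling query, assuming Python's built-in sqlite3 module and hypothetical prices chosen to sit on the boundaries. Because conditions are evaluated in order and > is strict, a price of exactly 150 falls through to the second branch and a price of exactly 100 falls through to ELSE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [
    ("Laptop", 900), ("Mixer", 150), ("Jeans", 101), ("Pot", 100),
])

labels = dict(conn.execute("""
SELECT
    name,
    CASE
        WHEN price > 150 THEN 'Premium'
        WHEN price > 100 THEN 'Mid-range'
        ELSE 'Standard'
    END AS price_category
FROM product
"""))

print(labels)
```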
Count the number of large orders for each customer using CASE WHEN
and SUM():
SELECT
customer_id,
SUM(
CASE WHEN quantity > 10
THEN 1 ELSE 0 END
) AS large_orders
FROM sales
GROUP BY customer_id;
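The conditional count can be verified with a short sketch, assuming Python's built-in sqlite3 module and hypothetical rows. The CASE expression turns each row into 1 or 0, and SUM() adds them up per customer, so a customer with no large orders still appears with 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    (1, 12), (1, 3), (1, 20),  # two orders above 10
    (2, 5), (2, 10),           # none above 10 (10 itself does not count)
])

large_orders = conn.execute("""
SELECT
    customer_id,
    SUM(
        CASE WHEN quantity > 10
        THEN 1 ELSE 0 END
    ) AS large_orders
FROM sales
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()

print(large_orders)
```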
GROUP BY EXTENSIONS
GROUPING SETS
GROUPING SETS lets you specify multiple sets of columns to group by in
one query.
SELECT region, product, COUNT(order_id)
FROM sales
GROUP BY
GROUPING SETS ((region, product), ());
CUBE
CUBE generates groupings for all possible subsets of the GROUP BY
columns.
SELECT region, product, COUNT(order_id)
FROM sales
GROUP BY CUBE (region, product);
ROLLUP
ROLLUP adds new levels of grouping for subtotals and grand totals.
SELECT region, product, COUNT(order_id)
FROM sales
GROUP BY ROLLUP (region, product);
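To see which rows the three extensions produce, the sketch below spells each one out as a UNION ALL of ordinary GROUP BY queries. This is a swapped-in technique, not the extensions themselves: it assumes Python's built-in sqlite3 module, and SQLite does not support GROUPING SETS, CUBE, or ROLLUP. The helper grouping_sets_sql and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, product TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "East", "Pot"), (2, "East", "Pot"),
    (3, "East", "Knife"), (4, "West", "Pot"),
])

COLUMNS = ("region", "product")

def grouping_sets_sql(sets):
    """Build one GROUP BY query per grouping set and glue them with UNION ALL.

    Columns that are not part of a grouping set are returned as NULL,
    which is how the real extensions mark subtotal rows.
    """
    parts = []
    for cols in sets:
        select = ", ".join(c if c in cols else f"NULL AS {c}" for c in COLUMNS)
        group = f" GROUP BY {', '.join(cols)}" if cols else ""
        parts.append(f"SELECT {select}, COUNT(order_id) FROM sales{group}")
    return " UNION ALL ".join(parts)

def run(sets):
    return conn.execute(grouping_sets_sql(sets)).fetchall()

# GROUPING SETS ((region, product), ()): detail rows plus a grand total.
grouping_sets = run([("region", "product"), ()])
# ROLLUP (region, product): detail rows, per-region subtotals, grand total.
rollup = run([("region", "product"), ("region",), ()])
# CUBE (region, product): every subset of the two columns.
cube = run([("region", "product"), ("region",), ("product",), ()])

print(len(grouping_sets), len(rollup), len(cube))
```

The per-product subtotals are the rows that CUBE returns and ROLLUP does not.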
COALESCE
COALESCE returns the first of its arguments that is not NULL, so it can
replace a NULL with a given value. It is often used to display labels with
GROUP BY extensions.
SELECT region,
COALESCE(product, 'All'),
COUNT(order_id)
FROM sales
GROUP BY ROLLUP (region, product);
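How COALESCE picks its result can be checked with a minimal sketch, assuming Python's built-in sqlite3 module. The literal arguments are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arguments are checked left to right; the first non-NULL one is returned.
label = conn.execute(
    "SELECT COALESCE(NULL, NULL, 'All', 'ignored')"
).fetchone()[0]

# A non-NULL first argument is returned unchanged.
kept = conn.execute("SELECT COALESCE('Pot', 'All')").fetchone()[0]

# If every argument is NULL, the result is NULL (None in Python).
nothing = conn.execute("SELECT COALESCE(NULL, NULL)").fetchone()[0]

print(label, kept, nothing)
```

In a ROLLUP result, this is what turns the NULL product of a subtotal row into the label 'All'. A product that is genuinely NULL in the data would receive the same label.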
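COMMON TABLE EXPRESSIONS (CTE)
A common table expression names a temporary result set with WITH so that
the main query can use it like a table. Find the average of the total profit
per product:
WITH total_product_sales AS (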
SELECT product, SUM(profit) AS total_profit
FROM sales
GROUP BY product
)
SELECT AVG(total_profit)
FROM total_product_sales;
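A minimal sketch of the WITH query above, assuming Python's built-in sqlite3 module and hypothetical rows. It contrasts the average of the per-product totals with the plain average over all rows, since the two answer different questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, profit INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("Pot", 10), ("Pot", 30),  # total 40
    ("Knife", 20),             # total 20
])

# Step 1 (inside WITH): total profit per product.
# Step 2 (main query): average of those totals.
avg_of_totals = conn.execute("""
WITH total_product_sales AS (
    SELECT product, SUM(profit) AS total_profit
    FROM sales
    GROUP BY product
)
SELECT AVG(total_profit)
FROM total_product_sales
""").fetchone()[0]

# For comparison: the average profit of a single sales row.
avg_per_row = conn.execute("SELECT AVG(profit) FROM sales").fetchone()[0]

print(avg_of_totals, avg_per_row)
```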
WINDOW FUNCTIONS
Window functions compute their results based on a sliding window frame,
a set of rows related to the current row. Unlike aggregate functions,
window functions do not collapse rows.
RANKING
Rank products by price:
SELECT RANK() OVER(ORDER BY price), name
FROM product;
RANKING FUNCTIONS
RANK – gives the same rank for tied values, leaves gaps.
DENSE_RANK – gives the same rank for tied values without gaps.
ROW_NUMBER – gives consecutive numbers without gaps.
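The three ranking functions can be compared side by side in a short sketch, assuming Python's built-in sqlite3 module with a bundled SQLite of version 3.25 or newer, the first release with window functions. The rows are hypothetical and include one tie on price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [
    ("Pot", 10), ("Knife", 20), ("Mixer", 20), ("Laptop", 30),
])

# name is added to the ROW_NUMBER ordering only to make the numbering
# of the tied rows deterministic.
ranking = conn.execute("""
SELECT
    name,
    RANK()       OVER (ORDER BY price),
    DENSE_RANK() OVER (ORDER BY price),
    ROW_NUMBER() OVER (ORDER BY price, name)
FROM product
ORDER BY price, name
""").fetchall()

for row in ranking:
    print(row)
```

Knife and Mixer tie on price: RANK skips to 4 for the next row, DENSE_RANK continues with 3, and ROW_NUMBER never repeats a number.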
RUNNING TOTAL
A running total is the cumulative sum of a given value and all preceding
values in a column.
SELECT date, amount,
SUM(amount) OVER(ORDER BY date)
AS running_total
FROM sales;
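A minimal sketch of the running total, assuming Python's built-in sqlite3 module with SQLite 3.25 or newer. The rows are hypothetical and inserted out of order to show that the ORDER BY inside OVER() determines what counts as "preceding".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("2023-07-03", 20), ("2023-07-01", 10), ("2023-07-02", 5),
])

running = conn.execute("""
SELECT date, amount,
    SUM(amount) OVER(ORDER BY date)
        AS running_total
FROM sales
ORDER BY date
""").fetchall()

for row in running:
    print(row)
```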
MOVING AVERAGE
A moving average (a.k.a. rolling average, running average) is a technique
for analyzing trends in time series data. It is the average of the current
value and a specified number of preceding values.
SELECT date, price,
AVG(price) OVER(
ORDER BY date
ROWS BETWEEN 2 PRECEDING
AND CURRENT ROW
) AS moving_average
FROM stock_prices;
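The three-row moving average can be checked with a short sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or newer and hypothetical prices. The first two rows have fewer than two preceding rows, so their averages cover only the rows available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_prices (date TEXT, price REAL)")
conn.executemany("INSERT INTO stock_prices VALUES (?, ?)", [
    ("2023-07-03", 10.0), ("2023-07-04", 20.0),
    ("2023-07-05", 30.0), ("2023-07-06", 40.0),
])

# The frame holds the current row and up to two rows before it.
moving = [row[0] for row in conn.execute("""
SELECT
    AVG(price) OVER(
        ORDER BY date
        ROWS BETWEEN 2 PRECEDING
        AND CURRENT ROW
    ) AS moving_average
FROM stock_prices
ORDER BY date
""")]

print(moving)
```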