Lecture 11 DMS

Uploaded by

dhkhang0803

12/5/2023

Data Science for Economics and Business

DATABASE MANAGEMENT SYSTEM

SQL for Data Analytics

Lecture 12: Aggregate functions and Window functions


Lecture Agenda

• Aggregate functions and Group By


• Window functions

SQL for Data Analytics

Section 1: Aggregate functions and Group By


Aggregate Functions - Syntax


An aggregate function performs a calculation on a set of values and returns a single value. Except for COUNT(*), aggregate functions ignore NULL values. They are often used with the GROUP BY clause of the SELECT statement.

● MIN() Syntax

-- Returns the smallest value of the selected column
SELECT MIN(column_name)
FROM table_name

● MAX() Syntax

-- Returns the largest value of the selected column
SELECT MAX(column_name)
FROM table_name

● Try:

SELECT MIN(SalesAmount) AS lowest_sales
, MAX(SalesAmount) AS highest_sales
FROM [FactInternetSales]

Aggregate Functions - Syntax

● COUNT() Syntax

-- Returns the number of rows that match the specified criteria
-- COUNT(*) counts all rows; COUNT(column_name) counts only non-NULL values
SELECT COUNT(column_name)
FROM table_name

● AVG() Syntax

-- Returns the average value of a numeric column
SELECT AVG(column_name)
FROM table_name

● SUM() Syntax

-- Returns the total sum of a numeric column
SELECT SUM(column_name)
FROM table_name
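To see how these five functions behave, including the NULL handling noted at the start of this section, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table only borrows the name FactInternetSales and its rows are made up; only standard aggregate behaviour is shown.

```python
import sqlite3

# Made-up rows in a toy table that only borrows the name FactInternetSales.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FactInternetSales (SalesOrderNumber TEXT, SalesAmount REAL)")
conn.executemany(
    "INSERT INTO FactInternetSales VALUES (?, ?)",
    [("SO1", 100.0), ("SO2", 250.0), ("SO3", None), ("SO4", 50.0)],
)

row = conn.execute(
    """
    SELECT MIN(SalesAmount)   AS lowest_sales,
           MAX(SalesAmount)   AS highest_sales,
           COUNT(*)           AS all_rows,        -- counts the NULL row too
           COUNT(SalesAmount) AS non_null_rows,   -- skips the NULL amount
           SUM(SalesAmount)   AS total_sales,
           AVG(SalesAmount)   AS avg_sales        -- divides by the 3 non-NULL values, not 4
    FROM FactInternetSales
    """
).fetchone()
print(row)
```

Because one amount is NULL, COUNT(*) and COUNT(SalesAmount) disagree, and AVG divides the sum by the three non-NULL values only.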


Aggregate Functions - GROUP BY Statement


● The GROUP BY clause is often used with aggregate functions such as COUNT(), MAX(), MIN(), SUM() and AVG() to group the result set by one or more columns, so that each aggregate returns one value per group.

SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1

● Try:
SELECT OrderDate
, MAX(SalesAmount) as highest_sales
, SUM(SalesAmount) as total_sales
FROM [FactInternetSales]
WHERE OrderDate >= '2011-01-01'
GROUP BY OrderDate
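The GROUP BY example can be reproduced on a tiny, made-up table, sketched here with Python's sqlite3 module as a stand-in for SQL Server. Dates are stored as ISO-8601 text (an assumption of this sketch), so the string comparison in WHERE orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FactInternetSales (OrderDate TEXT, SalesAmount REAL)")
conn.executemany(
    "INSERT INTO FactInternetSales VALUES (?, ?)",
    [
        ("2010-12-31", 999.0),   # removed by WHERE before any grouping happens
        ("2011-01-01", 100.0),
        ("2011-01-01", 300.0),
        ("2011-01-02", 50.0),
    ],
)

# One output row per OrderDate; MAX and SUM are computed inside each group.
rows = conn.execute(
    """
    SELECT OrderDate,
           MAX(SalesAmount) AS highest_sales,
           SUM(SalesAmount) AS total_sales
    FROM FactInternetSales
    WHERE OrderDate >= '2011-01-01'
    GROUP BY OrderDate
    ORDER BY OrderDate
    """
).fetchall()
for r in rows:
    print(r)
```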


Aggregate Functions - GROUP BY Practice

Exercise 1: Write a query that displays the number of orders placed by each customer in each year, using the FactInternetSales table.

Exercise 2: Write a query using the DimProduct and DimProductSubcategory tables to display the number of products in each subcategory (by its English subcategory name).

SQL Aggregate Functions - HAVING Clause


● The HAVING clause was added to SQL because the WHERE clause cannot be used with aggregate functions: WHERE filters individual rows before they are grouped, whereas HAVING filters the groups after aggregation.

SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1
HAVING aggregate_function(column2) condition

● Try:

SELECT MAX(SalesAmount) AS highest_sales
, SUM(SalesAmount) AS total_sales
, OrderDate
FROM [FactInternetSales]
WHERE OrderDate >= '2011-01-01'
GROUP BY OrderDate
HAVING SUM(SalesAmount) > 10000
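A small sketch of the difference between WHERE and HAVING, assuming Python's sqlite3 module and invented rows. The last statement deliberately puts SUM() in WHERE to show that the database rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FactInternetSales (OrderDate TEXT, SalesAmount REAL)")
conn.executemany(
    "INSERT INTO FactInternetSales VALUES (?, ?)",
    [("2010-12-31", 20000.0), ("2011-01-01", 8000.0),
     ("2011-01-01", 4000.0), ("2011-01-02", 9000.0)],
)

# WHERE drops the 2010 row first; HAVING then drops the 2011-01-02 group (9000).
kept = conn.execute(
    """
    SELECT OrderDate, SUM(SalesAmount) AS total_sales
    FROM FactInternetSales
    WHERE OrderDate >= '2011-01-01'
    GROUP BY OrderDate
    HAVING SUM(SalesAmount) > 10000
    """
).fetchall()
print(kept)

# An aggregate inside WHERE is rejected: no groups exist yet at that stage.
try:
    conn.execute("SELECT OrderDate FROM FactInternetSales WHERE SUM(SalesAmount) > 10000")
    where_failed = False
except sqlite3.Error as exc:
    where_failed = True
    print("error:", exc)
```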


SQL Aggregate Functions – HAVING Practice

Exercise: The company is about to run a loyalty scheme to retain customers whose total order value exceeds 5,000 USD in a year. From the FactInternetSales table, retrieve the list of qualifying customers and the corresponding year.


SQL for Data Analytics

Section 2: Window Functions


RANK() and ROW_NUMBER()

► ROW_NUMBER() numbers the output of a result set. More specifically, it returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
► ROW_NUMBER and RANK are similar. ROW_NUMBER numbers all rows sequentially (for example 1, 2, 3, 4, 5). RANK gives tied rows the same value and then skips ahead (for example 1, 2, 2, 4, 5).
► Syntax:

ROW_NUMBER()
OVER ( [ PARTITION BY value_expression , ... [ n ] ]
order_by_clause )

RANK() OVER (
[PARTITION BY partition_expression, ... ]
ORDER BY sort_expression [ASC | DESC], ...)

● First, the PARTITION BY clause divides the rows of the result set into partitions to which the function is applied. If it is omitted, the whole result set is treated as a single partition.
● Second, the ORDER BY clause specifies the logical order of the rows within each partition to which the function is applied.
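The tie behaviour described above can be checked on a toy table. This sketch assumes Python's sqlite3 module linked against SQLite 3.25 or later (the first release with window functions); the Sales table, its columns and its values are hypothetical. Rep is added as a tie-breaker in ROW_NUMBER so that the numbering is deterministic.

```python
import sqlite3

# Hypothetical data: bob and cat tie at 400 inside region "A".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Region TEXT, Rep TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [("A", "ann", 500), ("A", "bob", 400), ("A", "cat", 400),
     ("A", "dan", 300), ("A", "eve", 200), ("B", "fay", 700), ("B", "gus", 100)],
)

rows = conn.execute(
    """
    SELECT Region, Rep, Amount,
           -- numbering restarts at 1 in each Region; Rep breaks the tie
           ROW_NUMBER() OVER (PARTITION BY Region ORDER BY Amount DESC, Rep) AS row_num,
           -- tied amounts share a rank and the next rank is skipped
           RANK()       OVER (PARTITION BY Region ORDER BY Amount DESC)      AS rnk
    FROM Sales
    ORDER BY Region, row_num
    """
).fetchall()
for r in rows:
    print(r)
```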


RANK() and ROW_NUMBER()

SELECT TOP 100 ProductKey
, SUM(OrderQuantity) AS Total_OrderQuantity
, ROW_NUMBER() OVER (ORDER BY SUM(OrderQuantity) DESC) AS Row_Num
FROM FactResellerSales
GROUP BY ProductKey
ORDER BY Row_Num; -- without an outer ORDER BY, TOP may return any 100 rows
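The query above applies ROW_NUMBER after GROUP BY, so the window can order by the aggregate. A sketch of the same pattern, assuming Python's sqlite3 module (SQLite 3.25 or later) and a made-up FactResellerSales table; SQLite has no TOP, so LIMIT stands in for it, and the outer ORDER BY fixes the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FactResellerSales (ProductKey INTEGER, OrderQuantity INTEGER)")
conn.executemany(
    "INSERT INTO FactResellerSales VALUES (?, ?)",
    [(1, 5), (1, 7), (2, 20), (3, 1), (3, 2), (3, 3)],  # invented quantities
)

# GROUP BY runs first, then the window function numbers the grouped rows.
rows = conn.execute(
    """
    SELECT ProductKey,
           SUM(OrderQuantity) AS Total_OrderQuantity,
           ROW_NUMBER() OVER (ORDER BY SUM(OrderQuantity) DESC) AS RowNum
    FROM FactResellerSales
    GROUP BY ProductKey
    ORDER BY RowNum
    LIMIT 100          -- SQLite's counterpart of T-SQL's TOP 100
    """
).fetchall()
print(rows)
```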


RANK() - ROW_NUMBER() - Practice

Exercise 1: Find the 5 SalesOrderNumber values with the highest SalesAmount in the FactInternetSales table.

Exercise 2: Find the 5 SalesOrderNumber values with the highest SalesAmount in each month in the FactInternetSales table.


Window Functions


Example – LEAD & LAG Functions

SELECT
DateKey
, LEAD(DateKey) OVER (ORDER BY DateKey DESC) AS LEAD_Date
, LAG(DateKey) OVER (ORDER BY DateKey DESC) AS LAG_Date
FROM DimDate
ORDER BY DateKey DESC
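Because the window is ordered by DateKey DESC, LEAD reads the next row in that descending order (the earlier date) and LAG reads the previous row (the later date); rows at either end get NULL. A sketch with Python's sqlite3 module (SQLite 3.25 or later) and three invented DateKey values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DimDate (DateKey INTEGER)")
conn.executemany("INSERT INTO DimDate VALUES (?)", [(20110101,), (20110102,), (20110103,)])

rows = conn.execute(
    """
    SELECT DateKey,
           LEAD(DateKey) OVER (ORDER BY DateKey DESC) AS LEAD_Date,  -- following row
           LAG(DateKey)  OVER (ORDER BY DateKey DESC) AS LAG_Date    -- preceding row
    FROM DimDate
    ORDER BY DateKey DESC
    """
).fetchall()
for r in rows:
    print(r)  # None marks a missing neighbour (NULL in SQL)
```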


Example – Aggregate Functions over Windows

WITH Quota AS
(
SELECT CalendarYear
, CalendarQuarter
, EmployeeKey
, SUM(SalesAmountQuota) AS SalesAmountQuota
FROM FactSalesQuota
GROUP BY CalendarYear, CalendarQuarter, EmployeeKey
)

SELECT
CalendarYear AS Year
, CalendarQuarter AS Quarter
, EmployeeKey
, SalesAmountQuota AS SalesQuota
, LEAD(SalesAmountQuota, 1, 0) OVER (PARTITION BY EmployeeKey ORDER BY CalendarYear, CalendarQuarter) AS NextQuota
, SalesAmountQuota - LEAD(SalesAmountQuota, 1, 0)
OVER (PARTITION BY EmployeeKey ORDER BY CalendarYear, CalendarQuarter) AS Diff
, SUM(SalesAmountQuota) OVER (PARTITION BY CalendarYear, CalendarQuarter) AS Total_Quarter_Quota
FROM Quota
ORDER BY EmployeeKey, CalendarYear, CalendarQuarter;
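In the query above, LEAD(SalesAmountQuota, 1, 0) looks one quarter ahead within each employee and returns 0 when there is no next quarter, while SUM(...) OVER (PARTITION BY CalendarYear, CalendarQuarter) repeats the quarter total on every row without collapsing the rows. A sketch on hypothetical quota data, assuming Python's sqlite3 module (SQLite 3.25 or later); the Diff column is left out for brevity.

```python
import sqlite3

# Toy stand-in for the Quota CTE: two employees, two quarters, made-up numbers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Quota (CalendarYear INTEGER, CalendarQuarter INTEGER, "
    "EmployeeKey INTEGER, SalesAmountQuota REAL)"
)
conn.executemany(
    "INSERT INTO Quota VALUES (?, ?, ?, ?)",
    [(2011, 1, 10, 100.0), (2011, 2, 10, 150.0),
     (2011, 1, 20, 300.0), (2011, 2, 20, 200.0)],
)

rows = conn.execute(
    """
    SELECT EmployeeKey, CalendarYear, CalendarQuarter, SalesAmountQuota,
           -- next quarter's quota for the same employee; 0 when there is none
           LEAD(SalesAmountQuota, 1, 0) OVER (
               PARTITION BY EmployeeKey
               ORDER BY CalendarYear, CalendarQuarter) AS NextQuota,
           -- total across all employees for the same year and quarter
           SUM(SalesAmountQuota) OVER (
               PARTITION BY CalendarYear, CalendarQuarter) AS Total_Quarter_Quota
    FROM Quota
    ORDER BY EmployeeKey, CalendarYear, CalendarQuarter
    """
).fetchall()
for r in rows:
    print(r)
```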


SQL for Data Analytics

Section 2: Sorting Query Results


Sorting Results

Use ORDER BY to sort results by one or more columns


• Aliases created in SELECT clause are visible to ORDER BY
• You can order by columns in the source that are not included in the SELECT clause
• You can specify ASC or DESC (ASC is the default)

SELECT ProductCategoryID AS Category, [Name]
FROM SalesLT.Product
ORDER BY Category ASC, ListPrice DESC;
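A quick check of both rules (ordering by a SELECT alias, and by a column that is not selected), assuming Python's sqlite3 module and a made-up Product table; the SalesLT schema prefix is dropped because SQLite would read it as a database name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductCategoryID INTEGER, Name TEXT, ListPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(18, "Road Frame", 1400.0), (5, "Helmet", 35.0),
     (18, "Touring Frame", 1000.0), (5, "Gloves", 24.0)],  # invented products
)

rows = conn.execute(
    """
    SELECT ProductCategoryID AS Category, Name
    FROM Product
    ORDER BY Category ASC, ListPrice DESC   -- alias, then a column not in SELECT
    """
).fetchall()
print(rows)
```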


Limiting Sorted Results

Use TOP to limit the number or percentage of rows returned by a query

• Works with the ORDER BY clause to limit rows by sort order
• Added to the SELECT clause:
• SELECT TOP (N) [PERCENT] [WITH TIES]
• WITH TIES also returns any rows that tie with the last row returned, and requires ORDER BY

SELECT TOP 10 Name, ListPrice
FROM SalesLT.Product
ORDER BY ListPrice DESC;
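SQLite, used here through Python's sqlite3 module as a stand-in, has LIMIT instead of TOP and no WITH TIES. The sketch below contrasts a plain LIMIT with a RANK()-based filter that returns the same rows TOP (N) WITH TIES would; the table and prices are invented, and SQLite 3.25 or later is assumed for RANK().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, ListPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("A", 50.0), ("B", 40.0), ("C", 40.0), ("D", 30.0)],  # B and C tie
)

# Rough equivalent of: SELECT TOP 2 Name, ListPrice ... ORDER BY ListPrice DESC
top2 = conn.execute(
    "SELECT Name, ListPrice FROM Product ORDER BY ListPrice DESC, Name LIMIT 2"
).fetchall()

# No WITH TIES in SQLite: rank first, then keep every row ranked within the top 2.
top2_ties = conn.execute(
    """
    SELECT Name, ListPrice FROM (
        SELECT Name, ListPrice, RANK() OVER (ORDER BY ListPrice DESC) AS rnk
        FROM Product
    ) AS ranked
    WHERE rnk <= 2
    ORDER BY ListPrice DESC, Name
    """
).fetchall()
print(top2)
print(top2_ties)
```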


Paging Through Results

OFFSET-FETCH is an extension to the ORDER BY clause:

• Allows returning a requested range of rows
• Provides a mechanism for paging through results
• Specify the number of rows to skip and the number of rows to retrieve

SELECT ProductID, [Name], ListPrice
FROM SalesLT.Product
ORDER BY ListPrice DESC
OFFSET 0 ROWS -- Skip zero rows
FETCH NEXT 10 ROWS ONLY; -- Get the next 10
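Paging usually derives the offset from a page number: skip (page_number - 1) * page_size rows, then fetch page_size rows. A sketch assuming Python's sqlite3 module, where LIMIT ... OFFSET ... plays the role of OFFSET ... FETCH NEXT; fetch_page and the 25 generated products are hypothetical. A unique tie-breaker (ProductID) is added to the ORDER BY so rows cannot shift between pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER, Name TEXT, ListPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(i, f"Product {i}", float(i * 10)) for i in range(1, 26)],  # 25 made-up rows
)

def fetch_page(page_number, page_size=10):
    """Return one 1-based page of products, most expensive first (hypothetical helper)."""
    return conn.execute(
        """
        SELECT ProductID, Name, ListPrice
        FROM Product
        ORDER BY ListPrice DESC, ProductID   -- unique tie-breaker keeps pages stable
        LIMIT ? OFFSET ?
        """,
        (page_size, (page_number - 1) * page_size),
    ).fetchall()

for n in (1, 2, 3):
    print(n, [row[0] for row in fetch_page(n)])
```

The last page simply comes back shorter (5 rows here), which a caller can use to detect the end of the results.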


SQL for Data Analytics

Section 3: Filtering Query Results


Removing Duplicates

• SELECT ALL
• Default behavior includes duplicates

SELECT City, CountryRegion
FROM SalesLT.Address
ORDER BY CountryRegion, City;

City       CountryRegion
Aurora     Canada
Barrie     Canada
Brampton   Canada
Brossard   Canada
Brossard   Canada
Burnaby    Canada
Burnaby    Canada
Burnaby    Canada
Calgary    Canada
Calgary    Canada

• SELECT DISTINCT
• Removes duplicates

SELECT DISTINCT City, CountryRegion
FROM SalesLT.Address
ORDER BY CountryRegion, City;

City       CountryRegion
Aurora     Canada
Barrie     Canada
Brampton   Canada
Brossard   Canada
Burnaby    Canada
Calgary    Canada
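The effect of DISTINCT can be reproduced with a few invented rows, assuming Python's sqlite3 module and a stand-in Address table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Address (City TEXT, CountryRegion TEXT)")
conn.executemany(
    "INSERT INTO Address VALUES (?, ?)",
    [("Burnaby", "Canada"), ("Aurora", "Canada"), ("Burnaby", "Canada"),
     ("Calgary", "Canada"), ("Calgary", "Canada")],
)

# Default (ALL): every row comes back, duplicates included.
all_rows = conn.execute(
    "SELECT City, CountryRegion FROM Address ORDER BY CountryRegion, City"
).fetchall()

# DISTINCT: rows identical in every selected column collapse to one.
distinct_rows = conn.execute(
    "SELECT DISTINCT City, CountryRegion FROM Address ORDER BY CountryRegion, City"
).fetchall()
print(len(all_rows), distinct_rows)
```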


Filtering and Using Predicates

• WHERE Statement

SELECT ProductCategoryID AS Category, [Name]
FROM SalesLT.Product
WHERE ProductCategoryID = 18
AND ListPrice < 1000.00
ORDER BY Category, ListPrice DESC;


Filtering and Using Predicates


Group                  Operator        Description
Comparison operators   =               Equal to
                       <> or !=        Not equal to
                       >               Greater than
                       <               Less than
                       >=              Greater than or equal to
                       <=              Less than or equal to
Logical operators      AND             Returns records that meet all the conditions separated by AND in the WHERE clause.
                       OR              Returns records that meet any of the conditions separated by OR in the WHERE clause.
                       NOT             Returns records that do not satisfy the condition that NOT precedes.
SQL operators          [NOT] BETWEEN   Returns values [not] within a given range; both endpoints are included.
                       [NOT] IN        Specifies multiple values in a WHERE clause (a shorthand for multiple OR conditions).
                       IS [NOT] NULL   Returns records that have [non-]NULL values in the given fields.
                       [NOT] LIKE      Returns records that [do not] match a specified pattern in a column.
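A few of these operators are worth checking directly: BETWEEN includes both endpoints, and a NULL test must use IS NULL because = NULL never evaluates to true. A sketch assuming Python's sqlite3 module and a hypothetical Product table; the names helper pastes fixed, trusted WHERE clauses into the SQL and would need bound parameters for user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, Category INTEGER, Color TEXT)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [("Road Bike", 2, "Red"), ("Mountain Bike", 3, None),
     ("Road Helmet", 4, "Black"), ("Gloves", 5, "Black")],  # invented rows
)

def names(where_clause):
    """Return product names matching a fixed, trusted WHERE clause (hypothetical helper)."""
    sql = f"SELECT Name FROM Product WHERE {where_clause} ORDER BY Name"
    return [row[0] for row in conn.execute(sql)]

between = names("Category BETWEEN 2 AND 4")  # 2, 3 and 4: both ends included
in_list = names("Category IN (2, 4)")        # same as Category = 2 OR Category = 4
is_null = names("Color IS NULL")             # the correct NULL test
eq_null = names("Color = NULL")              # comparison with NULL is unknown: no rows
like = names("Name LIKE 'Road%'")            # % matches any run of characters
print(between, in_list, is_null, eq_null, like)
```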


Module Review
You write a Transact-SQL query to list the available sizes for products. Each individual size should be listed only once. Which
query should you use?
❑ SELECT Size FROM Production.Product;
❑ SELECT DISTINCT Size FROM Production.Product;
❑ SELECT ALL Size FROM Production.Product;

You must return the InvoiceNo and TotalDue columns from the Sales.Invoice table in decreasing order of TotalDue value.
Which query should you use?
❑ SELECT * FROM Sales.Invoice ORDER BY TotalDue, InvoiceNo;
❑ SELECT InvoiceNo, TotalDue FROM Sales.Invoice ORDER BY TotalDue DESC;
❑ SELECT TotalDue AS DESC, InvoiceNo FROM Sales.Invoice;

Complete this query to return only products that have a Category value of 2 or 4:
SELECT Name, Price FROM Production.Product
❑ ORDER BY Category;
❑ WHERE Category BETWEEN 2 AND 4;
❑ WHERE Category IN (2, 4);


SQL for Data Analytics

It’s time for your questions


THANK YOU!
