Part 4 - Grouping Data and Subqueries

- The document discusses SQL expressions and functions, grouping, and subqueries. - Grouping is used to divide data into sets and perform aggregate calculations on those sets using the GROUP BY clause. The GROUP BY clause calculates metrics for each unique value in the specified column(s). - The HAVING clause is used instead of the WHERE clause to filter groups, since it applies after data is grouped, while WHERE applies before grouping.

Uploaded by

chi

Database Concepts and Skills for Big Data
Le Duy Dung (Andrew)
SQL – Part 3
SQL Expressions and Functions
Grouping
❑ Grouping is a tool for dividing data into sets
❑ We can perform aggregate calculations on the resulting sets
❑ Groups are created using the GROUP BY clause
Group By
❑ Example: Group the vendors by their cities and return the total
number of Vendors in each city
• To calculate the number of vendors in one specific city we have:
SELECT COUNT(*) AS NumberOfVendors
FROM VENDORS
WHERE City = 'Seattle';

• To calculate the number of vendors in each city using GROUP BY we have:


SELECT City, COUNT(*) AS NumberOfVendors
FROM VENDORS
GROUP BY City;
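The two queries above can be tried side by side. The following is a minimal sketch using Python's built-in sqlite3 module; the VendorID and Name columns and all sample rows are made up for illustration and are not part of the original slides.

```python
import sqlite3

# In-memory database with made-up sample rows (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VENDORS (VendorID INTEGER PRIMARY KEY, Name TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO VENDORS (Name, City) VALUES (?, ?)",
    [("Acme", "Seattle"), ("Birch", "Seattle"), ("Cedar", "Portland"),
     ("Dune", "Boise"), ("Elm", "Portland"), ("Fir", "Seattle")],
)

# One specific city: the city has to be named in the WHERE clause.
seattle_count = conn.execute(
    "SELECT COUNT(*) AS NumberOfVendors FROM VENDORS WHERE City = 'Seattle'"
).fetchone()[0]

# Every city at once: GROUP BY finds the distinct City values itself.
per_city = dict(conn.execute(
    "SELECT City, COUNT(*) AS NumberOfVendors FROM VENDORS GROUP BY City"
).fetchall())

print(seattle_count)
print(per_city)
```

The first query returns a single number for the city we named, while the grouped query returns one row per city without any city appearing in the statement.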
Group By
❑ The GROUP BY clause calculates the number of vendors in each city.
❑ Using GROUP BY, we don’t need to specify each city. The statement will
find all cities in the City column.
❑ The GROUP BY clause can contain multiple columns
❑ Groups can be nested (a group inside a group) by listing several
columns; each distinct combination of values forms one group
❑ Many SQL implementations do not allow grouping on columns with large
variable-length data types, such as long text or memo fields (we will
cover data types later)
❑ The GROUP BY clause comes after any WHERE clause and before any
ORDER BY clause
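To illustrate grouping on several columns and the required clause order (WHERE, then GROUP BY, then ORDER BY), here is a small sketch using Python's sqlite3 module. The State and Active columns and the sample rows are assumptions added for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# State and Active are hypothetical columns added for this illustration.
conn.execute(
    "CREATE TABLE VENDORS (Name TEXT, State TEXT, City TEXT, Active INTEGER)"
)
conn.executemany(
    "INSERT INTO VENDORS VALUES (?, ?, ?, ?)",
    [("Acme", "WA", "Seattle", 1), ("Birch", "WA", "Seattle", 1),
     ("Cedar", "WA", "Spokane", 1), ("Dune", "OR", "Portland", 1),
     ("Elm", "OR", "Portland", 0), ("Fir", "OR", "Salem", 1)],
)

rows = conn.execute(
    """
    SELECT State, City, COUNT(*) AS NumberOfVendors
    FROM VENDORS
    WHERE Active = 1          -- row filter is written before GROUP BY
    GROUP BY State, City      -- one group per (State, City) combination
    ORDER BY State, City      -- sorting is written last
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row is one (State, City) pair with its count; the inactive vendor is dropped by WHERE before any group is formed.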
Group By
❑ To filter groups when using GROUP BY, use the HAVING clause
instead of the WHERE keyword.
❑ If we use a WHERE condition, it will be applied before grouping.
❑ In other words:
• The WHERE clause specifies which rows will be used to determine the groups.
• The HAVING clause specifies which groups will be used in the final results.
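The difference between the two clauses shows up when the same threshold is applied once to rows and once to groups. This is a sketch with Python's sqlite3 module; the SALE rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (CustomerID INTEGER, Total INTEGER)")
# Made-up sales: customer 1 has two medium sales, customer 2 one large sale.
conn.executemany(
    "INSERT INTO SALE VALUES (?, ?)",
    [(1, 600), (1, 600), (2, 1500), (3, 400), (3, 300)],
)

# WHERE filters rows first: only single sales above 1000 reach the groups.
where_result = conn.execute(
    """
    SELECT CustomerID, SUM(Total) AS TotalBought
    FROM SALE
    WHERE Total > 1000
    GROUP BY CustomerID
    ORDER BY CustomerID
    """
).fetchall()

# HAVING filters groups: the condition is checked against each group's sum.
having_result = conn.execute(
    """
    SELECT CustomerID, SUM(Total) AS TotalBought
    FROM SALE
    GROUP BY CustomerID
    HAVING SUM(Total) > 1000
    ORDER BY CustomerID
    """
).fetchall()

print(where_result)
print(having_result)
```

Customer 1 never has a single sale above 1000, so the WHERE version misses them, while the HAVING version keeps them because their grouped total is 1200.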
Group By and Having
❑ Example: Find all the cities with at least 2 vendors
• Instead of WHERE, use HAVING:
SELECT City, COUNT(*) AS NumOfVendor
FROM VENDORS
GROUP BY City
HAVING COUNT(*) > 1;

❑ WHERE does not work here because the filter is based on
the aggregated value, not on the values of individual rows
❑ In other words, WHERE filters before data is grouped; HAVING
filters after data is grouped.
Group By and Having
❑ Example: Find all customers whose purchases at the store total
more than $1000.
SELECT CustomerID, SUM(Total) AS TotalBought
FROM SALE
GROUP BY CustomerID
HAVING SUM(Total) > 1000;
Subqueries
❑ A subquery is a query embedded in another query
❑ Let’s consider an example:
❑ We want to recognize the best-performing salesperson among the
employees.
❑ We need to know the total sales of each employee
❑ Then we get the employee with the highest total.
Subqueries
We want to recognize the best-performing salesperson among
the employees.
SELECT TOP 1 EmployeeID, TotalSale
FROM (SELECT EmployeeID, SUM(Total) AS TotalSale
FROM SALE
GROUP BY EmployeeID) AS EmployeeSales
ORDER BY TotalSale DESC;

What if we want to retrieve the name of this best performing employee?
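One possible answer, sketched with Python's sqlite3 module, is to wrap the query in one more level of subquery. This assumes a hypothetical EMPLOYEE table with EmployeeID and Name columns, which the slides do not define, and uses made-up rows. SQLite has no TOP keyword, so LIMIT 1 stands in for TOP 1 here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EMPLOYEE is a hypothetical table; all rows are invented for illustration.
conn.executescript(
    """
    CREATE TABLE EMPLOYEE (EmployeeID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SALE (SaleID INTEGER PRIMARY KEY, EmployeeID INTEGER, Total INTEGER);
    INSERT INTO EMPLOYEE VALUES (1, 'Ana'), (2, 'Binh'), (3, 'Chau');
    INSERT INTO SALE (EmployeeID, Total)
        VALUES (1, 500), (2, 900), (1, 300), (3, 700), (2, 400);
    """
)

best_name = conn.execute(
    """
    SELECT Name
    FROM EMPLOYEE
    WHERE EmployeeID = (SELECT EmployeeID
                        FROM (SELECT EmployeeID, SUM(Total) AS TotalSale
                              FROM SALE
                              GROUP BY EmployeeID) AS EmployeeSales
                        ORDER BY TotalSale DESC
                        LIMIT 1)   -- SQLite's counterpart of TOP 1
    """
).fetchone()[0]

print(best_name)
```

The innermost query totals sales per employee, the middle query picks the EmployeeID with the highest total, and the outer query looks up that employee's name.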


Subqueries
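• Subqueries are processed starting from the innermost
query and working outward
• Each subquery returns values to the upper-level query
• A subquery compared against a column (for example, with IN or =)
generally must return a single column; a subquery in the FROM clause,
as in the example above, can return several.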
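The inside-out processing can be made visible by running the inner query on its own and then the full statement. This sketch uses Python's sqlite3 module; the CUSTOMER table, its Name column, and all rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CUSTOMER is a hypothetical table; all rows are invented for illustration.
conn.executescript(
    """
    CREATE TABLE CUSTOMER (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SALE (CustomerID INTEGER, Total INTEGER);
    INSERT INTO CUSTOMER VALUES (1, 'Lan'), (2, 'Minh'), (3, 'Nga');
    INSERT INTO SALE VALUES (1, 600), (1, 600), (2, 1500), (3, 400);
    """
)

inner_sql = """
    SELECT CustomerID
    FROM SALE
    GROUP BY CustomerID
    HAVING SUM(Total) > 1000
"""

# Step 1: the inner query alone yields a single column of CustomerID values.
inner_ids = sorted(row[0] for row in conn.execute(inner_sql))

# Step 2: the outer query uses those values through IN.
names = [row[0] for row in conn.execute(
    f"SELECT Name FROM CUSTOMER WHERE CustomerID IN ({inner_sql}) ORDER BY Name"
)]

print(inner_ids)
print(names)
```

The inner query returns exactly one column, which is what lets the outer query compare it against CustomerID with IN.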
QnA!