Database Concepts and Skills for Big Data
Le Duy Dung (Andrew)
SQL – Part 3
SQL Expressions and Functions
Grouping
❑ Grouping is a tool for dividing data into sets
❑ We can perform aggregate calculations on the resulting sets
❑ Groups are created using the GROUP BY clause
Group By
❑ Example: Group the vendors by their cities and return the total
number of vendors in each city
• To count the vendors in one specific city, we can filter with WHERE:
SELECT COUNT(*) AS NumberOfVendors
FROM VENDORS
WHERE City = 'Seattle';
• To count the vendors in every city at once, we use GROUP BY:
SELECT City, COUNT(*) AS NumberOfVendors
FROM VENDORS
GROUP BY City;
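The two queries above can be compared side by side. This is a minimal sketch, assuming Python's built-in `sqlite3` module, an in-memory database, and invented VENDORS rows (the vendor names are hypothetical). `ORDER BY City` is added only so the output order is predictable.

```python
import sqlite3

# Hypothetical sample data; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VENDORS (Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO VENDORS (Name, City) VALUES (?, ?)",
    [("Acme", "Seattle"), ("Bolt", "Seattle"), ("Crate", "Portland"),
     ("Dyna", "Boise"), ("Edge", "Portland"), ("Flux", "Seattle")],
)

# One specific city: WHERE keeps only the Seattle rows, then COUNT(*) counts them.
seattle_count = conn.execute(
    "SELECT COUNT(*) AS NumberOfVendors FROM VENDORS WHERE City = 'Seattle'"
).fetchone()[0]

# Every city at once: GROUP BY forms one group per distinct City value,
# and COUNT(*) is evaluated once per group. ORDER BY only makes the output stable.
per_city = conn.execute(
    "SELECT City, COUNT(*) AS NumberOfVendors "
    "FROM VENDORS GROUP BY City ORDER BY City"
).fetchall()

print(seattle_count)  # 3
print(per_city)       # [('Boise', 1), ('Portland', 2), ('Seattle', 3)]
```

The GROUP BY version returns the Seattle count too, alongside every other city, without naming any city in the query.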
❑ GROUP BY splits the rows into one group per city, and COUNT(*) is then calculated for each group, giving the number of vendors in each city.
❑ Using GROUP BY, we don’t need to specify each city: the statement finds every distinct value in the City column.
❑ The GROUP BY clause can contain multiple columns
❑ Listing several columns nests the groups (a group inside a group): rows are grouped by each combination of values
❑ Some SQL implementations do not allow grouping by a column with a large variable-length data type, such as a long text field (we will cover data types later)
❑ The GROUP BY clause comes after any WHERE clause and before any
ORDER BY clause
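Multi-column (nested) grouping and the clause order can both be seen in one query. This is a minimal sketch, assuming Python's `sqlite3` module and hypothetical VENDORS rows with an invented State column; the data deliberately contains two different cities named Portland.

```python
import sqlite3

# Hypothetical VENDORS rows with a State column; note the two different Portlands.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VENDORS (Name TEXT, State TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO VENDORS (Name, State, City) VALUES (?, ?, ?)",
    [("Acme", "WA", "Seattle"), ("Bolt", "WA", "Seattle"), ("Crate", "OR", "Portland"),
     ("Dyna", "ID", "Boise"), ("Edge", "OR", "Portland"), ("Flux", "WA", "Spokane"),
     ("Gear", "ME", "Portland")],
)

# Clause order: WHERE (filter rows), then GROUP BY (form groups), then ORDER BY (sort).
by_state_city = conn.execute(
    "SELECT State, City, COUNT(*) AS NumberOfVendors "
    "FROM VENDORS "
    "WHERE State <> 'ID' "
    "GROUP BY State, City "
    "ORDER BY State, City"
).fetchall()

# Grouping by City alone merges the Oregon and Maine Portlands into one group.
by_city_only = conn.execute(
    "SELECT City, COUNT(*) AS NumberOfVendors "
    "FROM VENDORS WHERE State <> 'ID' GROUP BY City ORDER BY City"
).fetchall()

print(by_state_city)  # [('ME', 'Portland', 1), ('OR', 'Portland', 2), ('WA', 'Seattle', 2), ('WA', 'Spokane', 1)]
print(by_city_only)   # [('Portland', 3), ('Seattle', 2), ('Spokane', 1)]
```

Each (State, City) combination becomes its own group, which is why the two Portlands are counted separately only in the first query.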
❑ To filter groups (rather than individual rows) after GROUP BY, use the
HAVING clause instead of WHERE.
❑ A WHERE condition is applied to the rows before grouping.
❑ In other words:
• The WHERE clause specifies which rows will be used to determine the groups.
• The HAVING clause specifies which groups will be used in the final results.
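The difference between the two filters shows up when both appear in one query. This is a minimal sketch, assuming Python's `sqlite3` module and hypothetical VENDORS rows; the Active column (1 = active vendor) is invented for this example.

```python
import sqlite3

# Hypothetical VENDORS rows; the Active flag (1 = active) is invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VENDORS (Name TEXT, City TEXT, Active INTEGER)")
conn.executemany(
    "INSERT INTO VENDORS (Name, City, Active) VALUES (?, ?, ?)",
    [("Acme", "Seattle", 1), ("Bolt", "Seattle", 1), ("Flux", "Seattle", 0),
     ("Crate", "Portland", 1), ("Edge", "Portland", 0),
     ("Dyna", "Boise", 1), ("Hive", "Boise", 1)],
)

# HAVING alone: groups are built from all rows, then small groups are dropped.
having_only = conn.execute(
    "SELECT City, COUNT(*) FROM VENDORS "
    "GROUP BY City HAVING COUNT(*) >= 2 ORDER BY City"
).fetchall()

# WHERE + HAVING: inactive rows are removed first, so they never reach any group;
# HAVING then filters the groups that remain.
where_then_having = conn.execute(
    "SELECT City, COUNT(*) FROM VENDORS "
    "WHERE Active = 1 "
    "GROUP BY City HAVING COUNT(*) >= 2 ORDER BY City"
).fetchall()

print(having_only)        # [('Boise', 2), ('Portland', 2), ('Seattle', 3)]
print(where_then_having)  # [('Boise', 2), ('Seattle', 2)]
```

Portland drops out of the second result because WHERE removed one of its two rows before the group was counted.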
Group By and Having
❑ Example: Find all the cities with at least 2 vendors
• Instead of WHERE, use HAVING:
SELECT City, COUNT(*) AS NumberOfVendors
FROM VENDORS
GROUP BY City
HAVING COUNT(*) >= 2;
❑ WHERE does not work here because the filter is based on an
aggregated value (the count for each city), not on the values in individual rows
❑ In other words, WHERE filters before data is grouped; HAVING
filters after data is grouped.
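Both behaviours can be checked directly. This is a minimal sketch, assuming Python's `sqlite3` module and invented VENDORS rows: the HAVING query returns the cities with at least two vendors, while moving the same condition into WHERE is rejected by SQLite. The exact error text is not shown because it differs between SQLite versions.

```python
import sqlite3

# Hypothetical VENDORS rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VENDORS (Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO VENDORS (Name, City) VALUES (?, ?)",
    [("Acme", "Seattle"), ("Bolt", "Seattle"), ("Crate", "Portland"),
     ("Dyna", "Boise"), ("Edge", "Portland"), ("Flux", "Seattle")],
)

# HAVING filters on the aggregated value after the groups exist.
at_least_two = conn.execute(
    "SELECT City, COUNT(*) AS NumberOfVendors FROM VENDORS "
    "GROUP BY City HAVING COUNT(*) >= 2 ORDER BY City"
).fetchall()

# The same condition in WHERE is rejected: WHERE runs on individual rows,
# before any group (and therefore any COUNT) exists.
try:
    conn.execute(
        "SELECT City, COUNT(*) FROM VENDORS WHERE COUNT(*) >= 2 GROUP BY City"
    )
    where_error = None
except sqlite3.Error as exc:  # SQLite reports a misuse of an aggregate function
    where_error = str(exc)

print(at_least_two)  # [('Portland', 2), ('Seattle', 3)]
print(where_error)   # wording depends on the SQLite version
```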
❑ Example: Find all customers who bought more than $1000 at the
store.
SELECT CustomerID, SUM(Total) AS TotalBought
FROM SALE
GROUP BY CustomerID
HAVING SUM(Total) > 1000;
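The customer query can be run against a few sample sales. This is a minimal sketch, assuming Python's `sqlite3` module and hypothetical SALE rows with whole-dollar totals; the SaleID column and all amounts are invented.

```python
import sqlite3

# Hypothetical SALE rows: (CustomerID, Total); amounts are whole dollars for simplicity.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALE (SaleID INTEGER PRIMARY KEY, CustomerID INTEGER, Total INTEGER)"
)
conn.executemany(
    "INSERT INTO SALE (CustomerID, Total) VALUES (?, ?)",
    [(1, 600), (1, 500), (2, 900), (3, 1000), (4, 300), (4, 400), (4, 450)],
)

# SUM(Total) is computed per customer; HAVING keeps only totals above 1000.
big_customers = conn.execute(
    "SELECT CustomerID, SUM(Total) AS TotalBought FROM SALE "
    "GROUP BY CustomerID HAVING SUM(Total) > 1000 ORDER BY CustomerID"
).fetchall()

print(big_customers)  # [(1, 1100), (4, 1150)]
```

Customer 3 spent exactly $1000 and is excluded because the condition uses a strict `>`; none of that customer's individual rows would have been caught by a row-level WHERE either way.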
Subqueries
❑ A subquery is a query embedded in another query
❑ Let’s consider an example:
❑ We want to recognize the best-performing salesperson among the
employees.
❑ First, we need the total sales of each employee
❑ Then, we get the employee with the highest total.
SELECT TOP 1 EmployeeID, TotalSale
FROM (SELECT EmployeeID, SUM(Total) AS TotalSale
FROM SALE
GROUP BY EmployeeID) AS EmployeeSales
ORDER BY TotalSale DESC;
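The subquery can be tried on sample data. This is a minimal sketch, assuming Python's `sqlite3` module and hypothetical SALE rows. SQLite does not support SQL Server's `TOP 1`, so `LIMIT 1` at the end of the query stands in for it; the derived-table alias `EmployeeSales` is an illustrative name.

```python
import sqlite3

# Hypothetical SALE rows: (EmployeeID, Total).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALE (SaleID INTEGER PRIMARY KEY, EmployeeID INTEGER, Total INTEGER)"
)
conn.executemany(
    "INSERT INTO SALE (EmployeeID, Total) VALUES (?, ?)",
    [(1, 500), (1, 700), (2, 2000), (3, 300), (3, 400)],
)

# Inner query: total sales per employee.
# Outer query: sort those totals and keep the first row.
# SQLite has no TOP, so LIMIT 1 stands in for SQL Server's TOP 1.
best = conn.execute(
    "SELECT EmployeeID, TotalSale "
    "FROM (SELECT EmployeeID, SUM(Total) AS TotalSale "
    "      FROM SALE GROUP BY EmployeeID) AS EmployeeSales "
    "ORDER BY TotalSale DESC LIMIT 1"
).fetchone()

print(best)  # (2, 2000)
```

If two employees tie for the highest total, either query returns only one of them.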
What if we want to retrieve the name of this best-performing employee?
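• Subqueries are processed starting from the innermost
query and working outward
• Each subquery returns its values to the enclosing (upper-level) query
• A subquery used in a WHERE clause (for example, with IN or =) can return only a single column; a subquery in the FROM clause, as in the example above, can return several columns.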
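One way to retrieve the best performer's name is a single-column subquery in WHERE. This is a minimal sketch, assuming Python's `sqlite3` module and a hypothetical EMPLOYEE(EmployeeID, Name) table that the slides do not define; SALE mirrors the table used above, and `LIMIT 1` again replaces `TOP 1`. It also shows a two-column subquery being rejected in a WHERE ... IN position.

```python
import sqlite3

# Hypothetical schema: EMPLOYEE(EmployeeID, Name) is assumed; SALE mirrors the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmployeeID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE SALE (SaleID INTEGER PRIMARY KEY, EmployeeID INTEGER, Total INTEGER)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [(1, "Ana"), (2, "Binh"), (3, "Chris")])
conn.executemany(
    "INSERT INTO SALE (EmployeeID, Total) VALUES (?, ?)",
    [(1, 500), (1, 700), (2, 2000), (3, 300), (3, 400)],
)

# The inner query runs first and hands a single EmployeeID (one column) to the outer query.
best_name = conn.execute(
    "SELECT Name FROM EMPLOYEE "
    "WHERE EmployeeID = (SELECT EmployeeID FROM SALE "
    "                    GROUP BY EmployeeID "
    "                    ORDER BY SUM(Total) DESC LIMIT 1)"
).fetchone()[0]

# A WHERE ... IN subquery that returns two columns is rejected.
try:
    conn.execute(
        "SELECT Name FROM EMPLOYEE "
        "WHERE EmployeeID IN (SELECT EmployeeID, SUM(Total) FROM SALE GROUP BY EmployeeID)"
    )
    two_column_error = None
except sqlite3.Error as exc:
    two_column_error = str(exc)

print(best_name)         # Binh
print(two_column_error)  # an error message; wording varies by SQLite version
```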
QnA!