Business Intelligence SQL - Global
• Module 1: SQL
• Basics of SQL
• 1 table
• Where & Order by
• SQL is a non-procedural language, i.e. you specify what you want instead of
how to obtain it. The database management system (DBMS) interprets the SQL
instructions itself and returns the results. SQL is supported by virtually
every relational DBMS, although each system has small dialect differences.
• You need to install Libre Office on your laptop. (MacOS, Windows or Linux)
• You should/can also go to w3schools.com – SQL, and try basic SQL exercises
if you are an SQL novice.
SELECT column_list
FROM table_list
[WHERE condition]
[GROUP BY column_list]
[HAVING condition]
[ORDER BY column_list]
• If we have only 1 table, we can omit the table name before the field
names:
SELECT Lastname, Emailaddress
FROM Dimcustomer
WHERE Maritalstatus = 'M';
Stephan Poelmans– Business Intell 10
Selection Conditions (continued)
• Comparison Operators
=                  equal to
<>                 different from
<                  lower than
>                  greater than
<=                 lower or equal to
>=                 greater or equal to
BETWEEN … AND …    between two values
IN (list)          equal to one of the values of the list
IS NULL            equal to the NULL (blank) value
• Example: Show all products with a catalog price (list price) between $30 and
$100
SELECT *
FROM Dimproduct
WHERE Listprice BETWEEN 30 AND 100
In MS Access:
SELECT *
FROM Dimcustomer
WHERE BirthDate in (#1/15/1950#,#1/15/1970#)
In LibreOffice:
SELECT *
FROM Dimcustomer
WHERE Birthdate IN ( '1950-01-15', '1970-01-15' )
SELECT *
FROM Dimemployee
WHERE phone IS NOT NULL
Negations in the Where clause
• By adding the NOT operator to a selection condition we obtain the negation of
the selection condition.
E.g.: Return the names of the employees who are not "Marketing Manager":
SELECT Firstname, Lastname
FROM DimEmployee
WHERE NOT Title = 'Marketing Manager'
The use of Wildcards & Like
• The LIKE operator is used in a WHERE clause to search for a specified pattern in
a column.
• There are two wildcards often used in conjunction with the LIKE operator:
• % - The percent sign represents zero, one, or multiple characters
• _ - The underscore represents a single character
• Note: MS Access uses an asterisk (*) instead of the percent sign (%), and a question
mark (?) instead of the underscore (_).
Examples:
Select *
From Dimcustomer
Where Lastname like 'An%'
Select *
From Dimcustomer
Where Lastname like '%an%'
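The wildcard patterns can be tried out without the course database. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a tiny made-up Dimcustomer table (the names are hypothetical). Note that SQLite ignores upper/lower case for ASCII letters in LIKE; other DBMSs may be stricter.

```python
import sqlite3

# Hypothetical mini version of Dimcustomer; the names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dimcustomer (Lastname TEXT)")
conn.executemany(
    "INSERT INTO Dimcustomer (Lastname) VALUES (?)",
    [("Anderson",), ("Andy",), ("Morgan",), ("Suarez",), ("Tang",)],
)

def lastnames_like(pattern):
    """Return the last names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT Lastname FROM Dimcustomer WHERE Lastname LIKE ? ORDER BY Lastname",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]

starts_with_an = lastnames_like("An%")  # 'An' followed by anything
contains_an = lastnames_like("%an%")    # 'an' anywhere (case ignored by SQLite)
an_plus_two = lastnames_like("An__")    # 'An' followed by exactly two characters
```

With these rows, "An%" keeps Anderson and Andy, while "An__" keeps only Andy because each underscore stands for exactly one character.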
• Sometimes a row must or may meet several selection conditions before it can
be selected.
• If a row must meet several selection conditions then they are linked together
with the AND operator. The composed selection condition is TRUE only if all
individual selection conditions are TRUE.
• If a row may meet several selection conditions then they are linked together
with the OR operator. The composed selection condition is TRUE if at least
one individual selection condition is TRUE.
• All female employees who are married:
SELECT *
FROM Dimemployee
WHERE Gender = 'F'
AND Maritalstatus = 'M'
• All employees who are female or married (so everyone except single males):
SELECT *
FROM Dimemployee
WHERE Gender = 'F'
OR Maritalstatus = 'M'
In MS Access:
SELECT *
FROM DimEmployee
WHERE Gender = 'M'
AND (Birthdate > #31-DEC-59# OR Title = "chief financial officer")
In LibreOffice:
SELECT *
FROM Dimemployee
WHERE Gender = 'M'
AND (Birthdate > '1959-12-31' OR Title = 'Chief Financial Officer');
So: the result contains only male employees, and each of them must also be
either CFO or born after 1959.
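The parentheses in the query above matter because AND binds more tightly than OR. The following is a minimal sketch of the difference, assuming Python's built-in sqlite3 module and a made-up four-row Dimemployee table (names, dates and titles are hypothetical).

```python
import sqlite3

# Hypothetical employees, chosen so that the two conditions give different results.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Dimemployee (Firstname TEXT, Gender TEXT, Birthdate TEXT, Title TEXT)"
)
conn.executemany(
    "INSERT INTO Dimemployee VALUES (?, ?, ?, ?)",
    [
        ("Ken", "M", "1955-03-01", "Chief Financial Officer"),
        ("Guy", "M", "1972-05-15", "Production Technician"),
        ("Rob", "M", "1950-01-01", "Tool Designer"),
        ("Laura", "F", "1976-01-31", "Chief Financial Officer"),
    ],
)

def firstnames(where_clause):
    """Run the selection condition and return the matching first names, sorted."""
    sql = "SELECT Firstname FROM Dimemployee WHERE " + where_clause + " ORDER BY Firstname"
    return [row[0] for row in conn.execute(sql)]

# Male AND (born after 1959 OR CFO)
with_parentheses = firstnames(
    "Gender = 'M' AND (Birthdate > '1959-12-31' OR Title = 'Chief Financial Officer')"
)
# Read by the DBMS as: (male AND born after 1959) OR CFO
without_parentheses = firstnames(
    "Gender = 'M' AND Birthdate > '1959-12-31' OR Title = 'Chief Financial Officer'"
)
```

Without the parentheses the female CFO also appears in the result, which is not what the question asked for.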
The Order By clause: Sorting the displayed rows
• The ORDER BY clause:
By adding an ORDER BY clause as the last clause of a query, we can show the
selected rows sorted by the values of a particular column.
For example: show all the employees ordered by their date of birth (oldest to
youngest)
SELECT *
FROM Dimemployee
ORDER BY Birthdate
To sort from youngest to oldest instead, add DESC (descending):
SELECT *
FROM Dimemployee
ORDER BY Birthdate DESC
Sorting the displayed rows (cont’d)
• Ordering on multiple columns
For example: show all employees, first the male employees, then the
females. Within each gender ('M' or 'F'), show the employees in order of age,
from oldest to youngest
SELECT *
FROM Dimemployee
ORDER BY gender DESC, birthdate
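To see how the two sort columns interact, here is a minimal sketch assuming Python's built-in sqlite3 module and four made-up employees (all values are hypothetical).

```python
import sqlite3

# Hypothetical employees: two men and two women with different birth dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dimemployee (Firstname TEXT, Gender TEXT, Birthdate TEXT)")
conn.executemany(
    "INSERT INTO Dimemployee VALUES (?, ?, ?)",
    [
        ("Ann", "F", "1980-02-02"),
        ("Bob", "M", "1975-06-01"),
        ("Cid", "M", "1960-01-01"),
        ("Dee", "F", "1965-09-09"),
    ],
)

# Gender DESC puts 'M' before 'F'; within each gender the earliest
# birth date (the oldest employee) comes first.
ordered_names = [
    row[0]
    for row in conn.execute(
        "SELECT Firstname FROM Dimemployee ORDER BY Gender DESC, Birthdate"
    )
]
```

The second column only decides the order among rows that have the same value in the first column.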
• Data spread over several tables can be combined by using "joins" between
tables within SQL instructions
• When multiple tables are mentioned in the FROM clause, those tables
are linked through a JOIN.
The JOIN of tables A and B contains the columns of A and the columns of B.
The Cartesian product of two tables includes in its result all of the
possible combinations of records between the two tables. This means that there
are no conditions for the join: each row (record) of Table A is linked to all
rows of Table B.
• Adding a join condition is as good as taking a Cartesian product and then
ensuring that only the correct records remain (i.e. the right customers with
the right sales).
FactInternetSales (FK) x DimCustomer (PK)
• Result: join with a join condition ... The value of the primary key needs to
match the value of the foreign key ...
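The difference between a Cartesian product and a join with a join condition can be sketched as follows, assuming Python's built-in sqlite3 module and two tiny made-up tables (the rows are hypothetical; only the table and column names follow the slides).

```python
import sqlite3

# Two hypothetical customers and three hypothetical sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimCustomer (Customerkey INTEGER PRIMARY KEY, Lastname TEXT);
    CREATE TABLE FactInternetSales (Customerkey INTEGER, Salesamount REAL);
    INSERT INTO DimCustomer VALUES (1, 'Yang'), (2, 'Suarez');
    INSERT INTO FactInternetSales VALUES (1, 10.0), (1, 20.0), (2, 5.0);
""")

# No join condition: every sale is combined with every customer (3 x 2 rows).
cartesian_count = conn.execute(
    "SELECT COUNT(*) FROM FactInternetSales, DimCustomer"
).fetchone()[0]

# Join condition: the foreign key in the fact table must match the primary key.
joined = conn.execute("""
    SELECT DimCustomer.Lastname, FactInternetSales.Salesamount
    FROM FactInternetSales, DimCustomer
    WHERE FactInternetSales.Customerkey = DimCustomer.Customerkey
    ORDER BY FactInternetSales.Salesamount
""").fetchall()
```

Only the three combinations in which the customer actually made the sale survive the join condition.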
Note: both options give the same results. This format is longer.
We advise using option 1 (previous slide) when writing SQL.
• Next to the join condition we can also add additional selection conditions in a
query.
• Example: Show the product (by name), the name of a customer and the
dates of his or her internet purchases, on the condition that the customer is
from France …
To solve this query we need the following tables:
1. FactInternetSales (the fact table)
2. DimCustomer (dimension table, name of customer)
3. DimGeography (dimension table, country of the customer)
4. DimProduct (dimension table, name of the product)
5. DimDate (date of Internet sale)
So 4 join conditions and an additional condition (the customer's country)
The SQL statement, using
DWAdventureworks.mdb (in Base, Libre Office):
SELECT Dimcustomer.Firstname, Dimcustomer.Lastname,
Dimproduct.Productname, Dimgeography.Countryregionname,
Dimdate.Datekey
• Choose an alias for your table names and use them throughout your query. An
alias is at least one character long, is not a number and contains no spaces.
• Why? This shortens your query considerably!
• So group by "city" (rows with the same city are put together in one group), and
within each group (each city), count the number of customerkeys.
Group By, explained with an example
• Suppose a manager wants to know the sales for each product. Let us assume
that s/he wants a list of products and their total sales. The sales amount can
be found in the fact table.
The first records of your result then look like this (in total you have 1000
facts):
Presenting a table with 1000 individual sales does not make much sense for the
manager. The business analyst needs to give a subtotal for each product:
26.97 for AWC Logo Cap, for instance. Hence the analyst groups by product name
and sums up the sales amounts.
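The grouping step can be sketched as follows, assuming Python's built-in sqlite3 module. The two tables are tiny stand-ins, the key column Productkey and the individual sales amounts are hypothetical, and the amounts were simply chosen so that the cap adds up to the 26.97 mentioned above.

```python
import sqlite3

# Hypothetical mini fact and dimension tables (5 facts instead of 1000).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dimproduct (Productkey INTEGER PRIMARY KEY, Productname TEXT);
    CREATE TABLE Factinternetsales (Productkey INTEGER, Salesamount REAL);
    INSERT INTO Dimproduct VALUES (1, 'AWC Logo Cap'), (2, 'Water Bottle');
    INSERT INTO Factinternetsales VALUES
        (1, 8.99), (1, 8.99), (1, 8.99), (2, 4.99), (2, 4.99);
""")

# One output row per product name; SUM adds up the sales inside each group.
totals = conn.execute("""
    SELECT p.Productname, ROUND(SUM(f.Salesamount), 2) AS Totalsales
    FROM Factinternetsales AS f, Dimproduct AS p
    WHERE f.Productkey = p.Productkey
    GROUP BY p.Productname
    ORDER BY p.Productname
""").fetchall()
totals_by_product = dict(totals)
```

Five individual sales collapse into two rows, one subtotal per product.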
• In this example: first group by city, then "within each city” group by sex.
• Note: in the Select clause we can also add a heading using AS (e.g. AS
COUNTofCustomerKey). The heading is optional; it should be one word.
For example: in Seattle, 1 woman and 2 men.
• So: each group (each city) presented in the result should include more than 2
customers
• Note that the count operation does not have to be in the select clause!
• When the same query has both a WHERE and a HAVING clause, the query
processor first applies the WHERE condition to the individual rows, then forms
the groups, and finally applies the HAVING condition to each group.
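The interplay of WHERE and HAVING can be sketched as follows, assuming Python's built-in sqlite3 module and eight made-up customers (cities and genders are hypothetical). The count is used only in the HAVING clause, not in the select clause.

```python
import sqlite3

# Hypothetical customers: Seattle has 3 women and 1 man, Paris 2 women and 2 men.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dimcustomer (Customerkey INTEGER, City TEXT, Gender TEXT)")
conn.executemany(
    "INSERT INTO Dimcustomer VALUES (?, ?, ?)",
    [
        (1, "Seattle", "F"), (2, "Seattle", "F"), (3, "Seattle", "F"), (4, "Seattle", "M"),
        (5, "Paris", "F"), (6, "Paris", "F"), (7, "Paris", "M"), (8, "Paris", "M"),
    ],
)

# HAVING alone: cities with more than 2 customers of any gender.
having_only = [
    row[0]
    for row in conn.execute(
        "SELECT City FROM Dimcustomer "
        "GROUP BY City HAVING COUNT(Customerkey) > 2 ORDER BY City"
    )
]

# WHERE removes the male rows before grouping, so HAVING counts women only.
where_and_having = [
    row[0]
    for row in conn.execute(
        "SELECT City FROM Dimcustomer WHERE Gender = 'F' "
        "GROUP BY City HAVING COUNT(Customerkey) > 2 ORDER BY City"
    )
]
```

Paris drops out of the second result because only two of its rows are left once the WHERE condition has been applied.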
1. Union Queries
2. ‘Top[number]’
3. Subqueries
• Union queries are used to combine data from multiple queries into one result.
• For example: Provide a list of names and telephone numbers of all employees
and all customers with an annual income greater than 25 000. (This does not
necessarily imply a link between customers and employees).
• SQL
select Firstname, Lastname, Phone
from Dimemployee
union
Select Firstname, Lastname, Phone
from Dimcustomer
where Yearlyincome > 25000
• You cannot simply combine any two SQL statements (queries) with "union"
• Conditions:
• Every query must have the same number of columns (fields)
• Corresponding columns must have compatible data types
• Incorrect Example:
• Select Firstname, Lastname, Birthdate
from Dimemployee
union
select Firstname, Lastname, Phone
from Dimcustomer
where Yearlyincome > 25000
• Incorrect Example:
• Select Totalchildren, Firstname, Lastname
from Dimcustomer
where Yearlyincome > 25000
union
select Firstname, Lastname, Phone
from Dimemployee
• Correct Example:
• Select Phone, Firstname, Lastname
from Dimcustomer
where Yearlyincome > 25000
union
select Firstname, Lastname, Phone
from Dimemployee
• The order of the fields is different, but because the data types are the same,
most database systems accept this query. The first query determines the
order in which the attributes are displayed, so here phone numbers and first
names end up in the same column.
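The behaviour of a union query can be sketched as follows, assuming Python's built-in sqlite3 module and a few made-up rows (names, phone numbers and incomes are hypothetical). SQLite does not enforce column data types, so the sketch only demonstrates the column-count condition and the removal of duplicate rows.

```python
import sqlite3

# Hypothetical rows; 'Guy Gilbert' appears both as employee and as customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dimemployee (Firstname TEXT, Lastname TEXT, Phone TEXT);
    CREATE TABLE Dimcustomer (Firstname TEXT, Lastname TEXT, Phone TEXT,
                              Yearlyincome INTEGER);
    INSERT INTO Dimemployee VALUES ('Guy', 'Gilbert', '555-0100');
    INSERT INTO Dimcustomer VALUES
        ('Guy', 'Gilbert', '555-0100', 30000),
        ('Jon', 'Yang', '555-0101', 90000),
        ('Eugene', 'Huang', '555-0102', 20000);
""")

# UNION merges both results and keeps identical rows only once.
combined = conn.execute("""
    SELECT Firstname, Lastname, Phone FROM Dimemployee
    UNION
    SELECT Firstname, Lastname, Phone FROM Dimcustomer WHERE Yearlyincome > 25000
    ORDER BY Lastname
""").fetchall()

# Two columns on the left, three on the right: the DBMS rejects the query.
try:
    conn.execute(
        "SELECT Firstname, Lastname FROM Dimemployee "
        "UNION SELECT Firstname, Lastname, Phone FROM Dimcustomer"
    )
    column_mismatch_rejected = False
except sqlite3.OperationalError:
    column_mismatch_rejected = True
```

The duplicate row is returned once, and the customer with an income of 20000 is filtered out by the WHERE clause of the second query.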
2. ‘Top[number]’
• Top[number]: e.g. Top 1, Top 3, ... This instruction can be added immediately after
the word "select".
• Note that the "top x" instruction only ensures that the first x records of a query are
displayed! The youngest, oldest or highest are not automatically chosen; sort the rows
first with ORDER BY.
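The need to sort before taking the first record can be sketched as follows, assuming Python's built-in sqlite3 module and three made-up purchases. SQLite has no TOP keyword, so this sketch swaps in its counterpart LIMIT 1 at the end of the query; in MS Access or MS SQL Server the same idea is written as SELECT TOP 1 ... ORDER BY Salesamount DESC.

```python
import sqlite3

# Hypothetical purchases; the customer keys and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Factinternet (Customerkey INTEGER, Salesamount REAL)")
conn.executemany(
    "INSERT INTO Factinternet VALUES (?, ?)",
    [(11000, 50.0), (11001, 900.0), (11002, 20.0)],
)

# Sort from high to low first, then keep only the first record.
highest = conn.execute(
    "SELECT Customerkey, Salesamount FROM Factinternet "
    "ORDER BY Salesamount DESC LIMIT 1"
).fetchone()
```

Without the ORDER BY clause the DBMS would be free to return any one of the three rows.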
• Then:
• How do you modify the query to select the customer with the lowest
purchase amount?
• For example:
SELECT Customerkey, Lastname FROM Dimcustomer
WHERE Customerkey IN
(SELECT Customerkey
FROM Factinternet
GROUP BY Customerkey
HAVING SUM(Salesamount) < 800)
• The same data can thus also be queried with an INNER JOIN. Why then use a
subquery? A subquery is often clearer than a JOIN and easier to formulate.
Some RDBMSs, however, execute a JOIN more quickly.
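That a subquery and an INNER JOIN can return the same rows is sketched below, assuming Python's built-in sqlite3 module and made-up customers and purchases (all values are hypothetical). The join formulation is one possible rewrite, not necessarily the one intended in the course.

```python
import sqlite3

# Hypothetical data: customer 1 bought 900 in total, customer 2 400, customer 3 799.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dimcustomer (Customerkey INTEGER PRIMARY KEY, Lastname TEXT);
    CREATE TABLE Factinternet (Customerkey INTEGER, Salesamount REAL);
    INSERT INTO Dimcustomer VALUES (1, 'Yang'), (2, 'Huang'), (3, 'Torres');
    INSERT INTO Factinternet VALUES (1, 500), (1, 400), (2, 300), (2, 100), (3, 799);
""")

# Subquery: first find the keys with total purchases below 800, then look them up.
with_subquery = conn.execute("""
    SELECT Customerkey, Lastname FROM Dimcustomer
    WHERE Customerkey IN
        (SELECT Customerkey
         FROM Factinternet
         GROUP BY Customerkey
         HAVING SUM(Salesamount) < 800)
    ORDER BY Customerkey
""").fetchall()

# Join: link both tables, group per customer and filter the groups.
with_join = conn.execute("""
    SELECT c.Customerkey, c.Lastname
    FROM Dimcustomer AS c INNER JOIN Factinternet AS f
        ON c.Customerkey = f.Customerkey
    GROUP BY c.Customerkey, c.Lastname
    HAVING SUM(f.Salesamount) < 800
    ORDER BY c.Customerkey
""").fetchall()
```

Both formulations return the same two customers; which one runs faster depends on the DBMS.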
• Examples:
• Which employees are older than Guy Gilbert?
So you compare employees with each other
• Which customers are located in the same town as the customer Joanna
Suarez?
So you compare with other customers
• The operator "IN" can be used if the second query yields one or more records.
• "IN" means: the defined field of a record from the top query must "appear in" one or more
records from the second query.
• For example:
SELECT Customerkey FROM Dimcustomer WHERE Customerkey IN
(SELECT Customerkey
FROM Factinternet
GROUP BY Customerkey
HAVING SUM(Salesamount) < 800)
• The second query returns the "customerkey" of every customer who bought for less than 800 in total.
• This query can be shorter, but then the town of customers will not be displayed. (You
only use “Geographykey”)
SELECT Firstname, Lastname
FROM Dimcustomer
WHERE Geographykey = (SELECT Geographykey
FROM Dimcustomer
WHERE Lastname = 'Suarez'
AND Firstname = 'Jacquelyn')
4. Rollup and Cube
• Rollup and Cube are two typical OLAP concepts (see below)
• Rollup and Cube can also be used as SQL statements. They calculate
subtotals.
• Neither instruction is available in MS Access, but they are in database
systems such as MS SQL Server and Oracle. MySQL offers ROLLUP but not CUBE.
SELECT Sort, Place, SUM(Amount) as amount
FROM Animal
GROUP BY Sort, Place
Sort     Place    amount
Cat      Miami    18
Cat      Naples   9
Dog      Miami    12
Dog      Naples   5
Dog      Tampa    14
Turtle   Naples   1
Turtle   Tampa    4
NULL     NULL     63
• Note: "rollup" and "cube" calculate subtotals as they are calculated in a Pivot
Table…
Result of the query “with cube”
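The subtotals that a rollup adds can be imitated in a DBMS without ROLLUP. The following is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite has no ROLLUP or CUBE) and assuming each line of the Animal result above is stored as one row. It emulates the rollup with UNION ALL; it is not the ROLLUP syntax itself.

```python
import sqlite3

# The Animal rows from the slide, assumed to be one stored row per line.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Animal (Sort TEXT, Place TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Animal VALUES (?, ?, ?)",
    [
        ("Cat", "Miami", 18), ("Cat", "Naples", 9),
        ("Dog", "Miami", 12), ("Dog", "Naples", 5), ("Dog", "Tampa", 14),
        ("Turtle", "Naples", 1), ("Turtle", "Tampa", 4),
    ],
)

# Detail rows, then one subtotal per Sort, then the grand total.
# NULL marks the column that has been "rolled up".
rows = conn.execute("""
    SELECT Sort, Place, SUM(Amount) FROM Animal GROUP BY Sort, Place
    UNION ALL
    SELECT Sort, NULL, SUM(Amount) FROM Animal GROUP BY Sort
    UNION ALL
    SELECT NULL, NULL, SUM(Amount) FROM Animal
""").fetchall()

subtotals = {
    sort: total for sort, place, total in rows
    if sort is not None and place is None
}
grand_total = [total for sort, place, total in rows if sort is None][0]
```

The grand total matches the NULL NULL 63 row of the slide; a cube would additionally contain one subtotal per Place.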