Business Intelligence SQL - Global


Business Intelligence

Prof. Dr. Stephan Poelmans

LIRIS: Research Centre for Information Systems Engineering
Faculty of Business and Economics

Office: ‘t Serclaes Building, A06.08

Stephan Poelmans– Business Intell


Planning

• Introduction to Business Intelligence

• Module 1: SQL
• Basics of SQL
• 1 table
• Where & Order by

• Joins & Group By & Having

• Subqueries and Union

• Note: We will use Base in Libre Office to do exercises

• Module 2: Data Mining (with Weka)


Architectural Perspective



SQL (Structured Query Language): Objective

• SQL is a database-software-independent language that allows the
user/designer to perform the following three operations:

• Create a relational database (both operational databases and data
warehouses)

• Load a relational database

• Query (question) a relational database



SQL (Structured Query Language): Objective
(continued)

• SQL consists of two parts:

• The instructions for creating the database structure (logical and physical
model) are called the Data Definition Language (DDL)

• The instructions to enter, retrieve and update data are called the Data
Manipulation Language (DML)

• In this chapter, we mainly use SQL as DML

• SQL is a non-procedural language, i.e. you specify what you want instead of
how to obtain it. The database management system (DBMS) interprets the SQL
instructions itself and shows the results. SQL is supported by virtually every
relational DBMS, although each system has its own dialect.

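The difference between DDL and DML can be tried out in a few lines. The sketch below is only an illustration: it assumes Python's built-in sqlite3 module as a stand-in for Base, and a simplified, hypothetical DimCustomer table with made-up rows.

```python
import sqlite3

# In-memory SQLite database: a stand-in for Base, used only for illustration.
conn = sqlite3.connect(":memory:")

# DDL: define the structure (hypothetical, simplified DimCustomer table).
conn.execute(
    "CREATE TABLE DimCustomer ("
    " CustomerKey INTEGER PRIMARY KEY,"
    " FirstName TEXT,"
    " TotalChildren INTEGER)"
)

# DML: load data (made-up rows) ...
conn.executemany(
    "INSERT INTO DimCustomer VALUES (?, ?, ?)",
    [(1, "Ann", 4), (2, "Bob", 1), (3, "Cleo", 5)],
)

# ... and query it. We state WHAT we want, not HOW to find it.
rows = conn.execute(
    "SELECT FirstName, TotalChildren FROM DimCustomer "
    "WHERE TotalChildren > 3 ORDER BY FirstName"
).fetchall()
print(rows)
```

The CREATE TABLE statement is DDL; the INSERT and SELECT statements are DML.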


What is required for this module and the SQL
Exercise sessions?

• You need to install LibreOffice on your laptop (macOS, Windows or Linux).

• In LibreOffice we will use Base as a relational DBMS (comparable to MS
Access)

• You should open AdventureDW_Micro1000Facts.odb (to be downloaded
from Toledo)
(It is a mini data warehouse, just for training purposes.)

• You need to understand the structure, the snowflake schema, of
AdventureDW_Micro1000Facts (see next slide).

• If you are an SQL novice, you can also go to w3schools.com (SQL) and try the
basic SQL exercises.



AdventureDW_Micro1000Facts.odb



General SQL Select Instruction (Statement)

SELECT [DISTINCT | ALL] {* | attribute (-expression) [AS new_name] [,...] }

FROM table [, ...]

[WHERE condition]

[GROUP BY column_list]

[HAVING condition]

[ORDER BY column_list]



The basic SELECT instruction: simple example

• We have a dimension table with customer data (DimCustomer).
We want a list with the first names of the customers with more than 3 children:

SELECT DimCustomer.Firstname, DimCustomer.TotalChildren
FROM DimCustomer
WHERE DimCustomer.TotalChildren > 3;



Selection Conditions
• To select certain records from a table, we use the WHERE clause in the SQL
statement.
• The WHERE clause always comes after the FROM clause.
• The WHERE condition is evaluated for each row and is either true or false;
only the rows for which it is true are selected.
• E.g. Display the family name and email address of all customers who are married.

SELECT Dimcustomer.Lastname, Dimcustomer.Emailaddress
FROM Dimcustomer
WHERE Dimcustomer.Maritalstatus = 'M';

• If we have only 1 table, we can omit the table name before the field
names:

SELECT Lastname, Emailaddress
FROM Dimcustomer
WHERE Maritalstatus = 'M';

Selection Conditions (continued)
• Comparison operators:
  =                  equal to
  <>                 not equal to
  <                  less than
  >                  greater than
  <=                 less than or equal to
  >=                 greater than or equal to
  BETWEEN … AND …    between two values (bounds included)
  IN (list)          equal to one of the values in the list
  IS NULL            the value is NULL (missing)
• Example: Show all products with a catalog price (list price) between $30 and
$100

SELECT *
FROM Dimproduct
WHERE Listprice BETWEEN 30 AND 100

* = all fields from a table

Result: a list of articles priced between $30 and $100, including 30 and 100!

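The behaviour of BETWEEN, IN and IS NULL can be checked on a tiny table. This is a sketch under assumptions: Python's sqlite3 module stands in for Base, the product rows are made up, and the helper `names` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of DimProduct with made-up rows.
conn.execute("CREATE TABLE DimProduct (ProductName TEXT, ListPrice REAL, Color TEXT)")
conn.executemany(
    "INSERT INTO DimProduct VALUES (?, ?, ?)",
    [
        ("Cap", 8.99, "Multi"),
        ("Gloves", 30.0, "Black"),
        ("Helmet", 34.99, "Red"),
        ("Jersey", 100.0, None),   # colour unknown -> NULL
        ("Bike", 782.99, "Black"),
    ],
)

def names(where):
    """Return the product names that satisfy the given WHERE condition."""
    sql = "SELECT ProductName FROM DimProduct WHERE " + where + " ORDER BY ProductName"
    return [row[0] for row in conn.execute(sql)]

between = names("ListPrice BETWEEN 30 AND 100")   # both bounds are included
in_list = names("Color IN ('Black', 'Red')")
is_null = names("Color IS NULL")
eq_null = names("Color = NULL")                   # never true: use IS NULL instead

print(between, in_list, is_null, eq_null)
```

Note that `Color = NULL` selects nothing, which is why the separate IS NULL operator exists.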


Examples
• Show all details of customers who were born on January 15, 1950 or on January 15, 1970.

In MS Access:
SELECT *
FROM Dimcustomer
WHERE BirthDate IN (#1/15/1950#, #1/15/1970#)

In LibreOffice:
SELECT *
FROM Dimcustomer
WHERE Birthdate IN ('1950-01-15', '1970-01-15')

• Show all employees whose phone number is unknown

SELECT *
FROM Dimemployee
WHERE phone IS NULL

Negations in the Where clause
• By adding the NOT operator to a selection condition we obtain the negation of
the selection condition.

E.g.: Return the names of the employees who are not "Marketing Manager"

SELECT Lastname, Firstname
FROM Dimemployee
WHERE NOT title = 'Marketing Manager'

• Other negations are: NOT BETWEEN … AND …
                       NOT IN (list)
                       IS NOT NULL
• Return the names of the employees whose phone number is known

SELECT Lastname, Firstname
FROM DimEmployee
WHERE phone IS NOT NULL

The use of Wildcards & Like
• The LIKE operator is used in a WHERE clause to search for a specified pattern in
a column.

• There are two wildcards often used in conjunction with the LIKE operator:
• % - The percent sign represents zero, one, or multiple characters

• _ - The underscore represents a single character

• Note: MS Access uses an asterisk (*) instead of the percent sign (%), and a question
mark (?) instead of the underscore (_).

Examples:

Last names that start with "An":

SELECT *
FROM Dimcustomer
WHERE Lastname LIKE 'An%'

Last names that contain "an":

SELECT *
FROM Dimcustomer
WHERE Lastname LIKE '%an%'

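The two wildcards can be compared side by side on a handful of names. A minimal sketch, assuming Python's sqlite3 module as a stand-in for Base; the last names and the helper `match` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DimCustomer (LastName TEXT)")
conn.executemany(
    "INSERT INTO DimCustomer VALUES (?)",
    [("Anderson",), ("Chan",), ("Lee",), ("Lie",), ("Morgan",)],
)

def match(pattern):
    """Return the last names that match the LIKE pattern, sorted by name."""
    cur = conn.execute(
        "SELECT LastName FROM DimCustomer WHERE LastName LIKE ? ORDER BY LastName",
        (pattern,),
    )
    return [row[0] for row in cur]

starts_with_an = match("An%")   # % = zero, one or multiple characters
contains_an = match("%an%")
one_character = match("L_e")    # _ = exactly one character

# Caveat: SQLite's LIKE ignores case for ASCII letters, so '%an%' also
# matches "Anderson". Other DBMSs may compare case-sensitively.
print(starts_with_an, contains_an, one_character)
```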


Composed selection conditions

• Sometimes a row must or may meet several selection conditions before it can
be selected.

• If a row must meet several selection conditions then they are linked together
with the AND operator. The composed selection condition is TRUE only if all
individual selection conditions are TRUE.

• If a row may meet any one of several selection conditions then they are linked together
with the OR operator. The composed selection condition is TRUE if at least
one individual selection condition is TRUE



Examples

• All female employees who are married:

SELECT *
FROM Dimemployee
WHERE Gender = 'F'
AND Maritalstatus = 'M'

• All female employees or employees who are married (so no single males):

SELECT *
FROM Dimemployee
WHERE Gender = 'F'
OR Maritalstatus = 'M'



Examples
• Show all male employees who are "Chief Financial Officer" (CFO) or who were
born after December 31, 1959.

In MS Access:

SELECT *
FROM DimEmployee
WHERE Gender = 'M'
AND (Birthdate > #12/31/1959# OR Title = 'Chief Financial Officer')

In LibreOffice:

SELECT *
FROM Dimemployee
WHERE Gender = 'M'
AND (Birthdate > '1959-12-31' OR Title = 'Chief Financial Officer');

So: the result only contains male employees, and each of them must also be either CFO or born
after 1959. The parentheses are needed because AND binds more strongly than OR.

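The effect of the parentheses in the CFO query can be made visible by running it with and without them. This is only a sketch: it assumes Python's sqlite3 module as a stand-in for Base, and the five employees are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimEmployee (FirstName TEXT, Gender TEXT, BirthDate TEXT, Title TEXT)"
)
conn.executemany(
    "INSERT INTO DimEmployee VALUES (?, ?, ?, ?)",
    [
        ("Ken", "M", "1955-03-01", "Chief Financial Officer"),
        ("Rob", "M", "1970-05-05", "Engineer"),
        ("Tom", "M", "1950-01-01", "Engineer"),
        ("Laura", "F", "1965-02-02", "Chief Financial Officer"),
        ("Gail", "F", "1975-01-01", "Engineer"),
    ],
)

def first_names(where):
    sql = "SELECT FirstName FROM DimEmployee WHERE " + where + " ORDER BY FirstName"
    return [row[0] for row in conn.execute(sql)]

# Intended meaning: male AND (born after 1959 OR CFO).
with_parentheses = first_names(
    "Gender = 'M' AND (BirthDate > '1959-12-31' OR Title = 'Chief Financial Officer')"
)

# Without parentheses AND is evaluated first:
# (male AND born after 1959) OR CFO, so a female CFO slips in.
without_parentheses = first_names(
    "Gender = 'M' AND BirthDate > '1959-12-31' OR Title = 'Chief Financial Officer'"
)

print(with_parentheses, without_parentheses)
```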
The Order By clause: Sorting the displayed rows
• The ORDER BY clause:
By adding an ORDER BY clause as the last clause of a query, we can show the
selected rows sorted on the values of a particular column.
For example: show all the employees ordered by their date of birth (oldest to
youngest)

SELECT *
FROM Dimemployee
ORDER BY Birthdate

• Sorting in descending order:
By default, data is sorted in ascending order: from small to large, from a to z, and from old to new. The sort
order can be reversed by adding DESC after the column name in the ORDER BY clause.
For example: Show all employees in order of age, from youngest to oldest

SELECT *
FROM Dimemployee
ORDER BY Birthdate DESC

Sorting the displayed rows (cont’d)
• Ordering on multiple columns

More than one column can be used to order the rows

For example: Show all employees. First show the male employees, then the
females. Within each gender ('M' or 'F'), show the employees in order of age, from
oldest to youngest

SELECT *
FROM Dimemployee
ORDER BY Gender DESC, Birthdate

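Sorting on two columns, one descending and one ascending, can be checked with four rows. A minimal sketch, assuming Python's sqlite3 module as a stand-in for Base and invented employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DimEmployee (FirstName TEXT, Gender TEXT, BirthDate TEXT)")
conn.executemany(
    "INSERT INTO DimEmployee VALUES (?, ?, ?)",
    [
        ("Amy", "F", "1980-01-01"),
        ("Bea", "F", "1960-01-01"),
        ("Carl", "M", "1990-01-01"),
        ("Dan", "M", "1970-01-01"),
    ],
)

# Gender DESC puts 'M' before 'F'; within each gender the rows are sorted
# on BirthDate ascending, i.e. from oldest to youngest.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT FirstName FROM DimEmployee ORDER BY Gender DESC, BirthDate"
    )
]
print(ordered)
```

DESC only applies to the column it follows; BirthDate keeps the default ascending order.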


Multiple tables in SQL (“JOINS”)

• A JOIN without selection conditions

In the previous examples we worked with a single table. The design of a
database, however, typically results in different tables linked with each other
(through keys). In a data warehouse, the link is typically the relationship
between fact tables and dimension tables

• To fulfill business requirements, users of a data warehouse usually need
information from both the fact and dimension tables

• This can be solved by using "joins" between tables within SQL instructions



Multiple tables in SQL (“JOINS”) (continued)

• When multiple tables are mentioned in the FROM clause, those tables
are linked through a JOIN.

The JOIN of tables A and B contains the columns of A and the columns of B.

• Without a join condition, a "join" in SQL is a Cartesian product

The Cartesian product of two tables includes in its result all
possible combinations of the records of the two tables: because there
are no conditions for the join, each row (record) of Table A is linked to all rows
of Table B



The Cartesian product (a "join") without conditions
FactInternetSales x DimCustomer

• Assume 2 tables: FactInternetSales and DimCustomer. Without further
conditions, the Cartesian product is the combination of all the records of the
two tables.

• Result: without "join" conditions ...

ProductKey  OrderDateKey  CustomerKey  CustomerKey  Firstname  LastName
310         20010701      21768        14747        Aidan      Wood
310         20010701      21768        21768        Cole       Watson
310         20010701      21768        28389        Rachael    Martinez
346         20010701      28389        14747        Aidan      Wood
346         20010701      28389        21768        Cole       Watson
346         20010701      28389        28389        Rachael    Martinez



The Cartesian product (join) with conditions

• In the previous example, all combinations were given. In a relational database,
tables are however linked in a defined way (foreign keys refer to primary
keys!). The relationship between a foreign and primary key indicates a one-to-many
(or one-to-one) relationship: one value of a primary key can occur
many times as the value of a foreign key

• In other words, instead of all possible combinations of Internet sales and
customers, the following question is more useful: "For each customer, give the web
orders that he/she has placed". This is clearly more useful than all customers
and all Internet sales combined ...



A Join between 2 tables

• This is equivalent to taking a Cartesian product and then keeping only
the correct records (i.e. the right customers with the right sales).

FactInternetSales (FK) x DimCustomer (PK)

• Result: join with a join condition ... The value of the primary key needs to
match the value of the foreign key ...

ProductKey  OrderDateKey  CustomerKey  CustomerKey  Firstname  LastName
310         20010701      21768        21768        Cole       Watson
346         20010701      28389        28389        Rachael    Martinez



The appropriate SQL statement: option 1
SELECT Factinternetsales.ProductKey, Factinternetsales.OrderDateKey,
Factinternetsales.CustomerKey, Dimcustomer.CustomerKey,
Dimcustomer.FirstName, Dimcustomer.LastName
FROM Factinternetsales, Dimcustomer
WHERE Dimcustomer.CustomerKey = Factinternetsales.CustomerKey

Note: if more than 1 table is involved in a query, put the table name
before each field, e.g. Dimcustomer.CustomerKey. This is required whenever a
field name occurs in more than one table.



The appropriate SQL statement: option 2

SELECT Factinternetsales.Productkey, Factinternetsales.Orderdatekey,
Factinternetsales.Customerkey, Dimcustomer.Customerkey,
Dimcustomer.Firstname, Dimcustomer.Lastname
FROM Factinternetsales INNER JOIN Dimcustomer
ON Dimcustomer.Customerkey = Factinternetsales.Customerkey

Note: both options give the same results. This format is longer.
We advise using option 1 (previous slide) when writing SQL.

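The Cartesian product and the two join notations can be reproduced with the two sales and three customers from the example tables above. The sketch assumes Python's sqlite3 module as a stand-in for Base; only the columns needed for the illustration are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FactInternetSales (ProductKey INTEGER, OrderDateKey INTEGER, CustomerKey INTEGER);
    CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    INSERT INTO FactInternetSales VALUES (310, 20010701, 21768), (346, 20010701, 28389);
    INSERT INTO DimCustomer VALUES
        (14747, 'Aidan', 'Wood'), (21768, 'Cole', 'Watson'), (28389, 'Rachael', 'Martinez');
    """
)

# No join condition: every sale is combined with every customer (2 x 3 rows).
cartesian_count = conn.execute(
    "SELECT COUNT(*) FROM FactInternetSales, DimCustomer"
).fetchone()[0]

# Option 1: join condition in the WHERE clause.
option1 = conn.execute(
    "SELECT FactInternetSales.ProductKey, DimCustomer.FirstName "
    "FROM FactInternetSales, DimCustomer "
    "WHERE DimCustomer.CustomerKey = FactInternetSales.CustomerKey "
    "ORDER BY FactInternetSales.ProductKey"
).fetchall()

# Option 2: the same join written with INNER JOIN ... ON.
option2 = conn.execute(
    "SELECT FactInternetSales.ProductKey, DimCustomer.FirstName "
    "FROM FactInternetSales INNER JOIN DimCustomer "
    "ON DimCustomer.CustomerKey = FactInternetSales.CustomerKey "
    "ORDER BY FactInternetSales.ProductKey"
).fetchall()

print(cartesian_count, option1, option2)
```

Both notations keep only the two combinations in which the foreign key matches the primary key.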


A Join between >2 tables
• If more than two tables are required, more "join conditions" are necessary.
• The number of join conditions is at least: number of tables - 1.

• Next to the join conditions we can also add additional selection conditions in a
query.

• Example: Show the product (by name), the name of the customer and the
dates of his/her internet purchases, on the condition that the customer is from
France …
To solve this query we need the following tables:
1. FactInternetSales (the fact table)
2. DimCustomer (dimension table, name of customer)
3. DimGeography (dimension table, country of the customer)
4. DimProduct (dimension table, name of the product)
5. DimDate (date of Internet sale)
So 4 join conditions and an additional condition (the customer's country)

The SQL statement, using
DWAdventureworks.mdb (in Base, Libre Office):
SELECT Dimcustomer.Firstname, Dimcustomer.Lastname,
Dimproduct.Productname, Dimgeography.Countryregionname,
Dimdate.Datekey
FROM Dimcustomer, Dimgeography, Factinternet, Dimdate, Dimproduct
WHERE Dimcustomer.Geographykey = Dimgeography.Geographykey
AND Factinternet.Customerkey = Dimcustomer.Customerkey
AND Factinternet.Orderdatekey = Dimdate.Datekey
AND Factinternet.Productkey = Dimproduct.Productkey
AND Dimgeography.Countryregionname = 'France'



Using Aliases to shorten your queries !

• Choose an alias for each table name and use it throughout your query. An alias has at
least one character, is not a number and contains no spaces.
• Why? This shortens your query considerably!

SELECT DC.Firstname, DC.Lastname, DP.Productname, DG.Countryregionname,
DD.Datekey
FROM Dimcustomer DC, Dimgeography DG, Factinternet F, Dimdate DD, Dimproduct DP
WHERE DC.Geographykey = DG.Geographykey
AND F.Customerkey = DC.Customerkey
AND F.Orderdatekey = DD.Datekey
AND F.Productkey = DP.Productkey
AND DG.Countryregionname = 'France'



"Subgroups" and Arithmetic Operations with SQL
• When using joins between tables, the result is often an extensive list of very
detailed rows (records). It is often necessary to split the result of a join into
"subgroups". A subgroup consists of one or more rows (records).

• Subgroups are defined in a query using a GROUP BY clause. Rows are grouped
on the basis of their value for one or more columns.

• A GROUP BY clause usually implies that a calculation is performed at the level
of each subgroup.
For example: how many customers live in each of the different cities?

SELECT Dimgeography.City, COUNT(Dimcustomer.Customerkey)
FROM Dimgeography, Dimcustomer
WHERE Dimgeography.Geographykey = Dimcustomer.Geographykey
GROUP BY Dimgeography.City;

• So group by "city" (rows with the same city are put together in one group), and within each
group (each city), count the number of customer keys.

Group By, explained with an example
• Suppose a manager wants to know the sales for each product. Let us assume
that s/he wants a list with products and their total sales. The sales amount can
be found in the fact table.

• You, as a business analyst, propose the following query:

SELECT P.Productname, F.Salesamount
FROM Dimproduct P, Factinternet F
WHERE P.Productkey = F.Productkey

The first records of your result then look like this (in total you have 1000
facts):

[Figure: first rows of the query result. AWC Logo Cap has been sold several
times, at the same price.]



Group By, explained with an example

SELECT P.Productname, F.Salesamount
FROM Dimproduct P, Factinternet F
WHERE P.Productkey = F.Productkey

Presenting a table with 1000 individual sales does not make much sense for the
manager. The business analyst needs to give a subtotal for each product: so
26.97 for AWC Logo Cap for instance. Hence the analyst groups by product name and
sums up the sales amounts.

SELECT P.Productname, SUM(F.Salesamount)
FROM Dimproduct P, Factinternet F
WHERE P.Productkey = F.Productkey
GROUP BY P.Productname

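The step from individual sales to one subtotal per product can be sketched as follows. Assumptions: Python's sqlite3 module stands in for Base, and the five sales rows are invented so that the cap adds up to the 26.97 mentioned above (three sales of 8.99); the second product is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Dimproduct (Productkey INTEGER PRIMARY KEY, Productname TEXT);
    CREATE TABLE Factinternet (Productkey INTEGER, Salesamount REAL);
    INSERT INTO Dimproduct VALUES (1, 'AWC Logo Cap'), (2, 'Water Bottle');
    INSERT INTO Factinternet VALUES (1, 8.99), (1, 8.99), (1, 8.99), (2, 4.99), (2, 4.99);
    """
)

# Without GROUP BY: one row per individual sale.
detail_count = len(
    conn.execute(
        "SELECT P.Productname, F.Salesamount "
        "FROM Dimproduct P, Factinternet F "
        "WHERE P.Productkey = F.Productkey"
    ).fetchall()
)

# With GROUP BY: one row per product, holding the sum of its sales.
totals = dict(
    conn.execute(
        "SELECT P.Productname, SUM(F.Salesamount) "
        "FROM Dimproduct P, Factinternet F "
        "WHERE P.Productkey = F.Productkey "
        "GROUP BY P.Productname"
    ).fetchall()
)

for name, total in sorted(totals.items()):
    print(name, round(total, 2))
```

The five detail rows collapse into two groups, one per product name.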


"Subgroups" and Arithmetic Operations with SQL

• For example: Show the number of customers by city and by gender

SELECT Dimgeography.City, Dimcustomer.Gender, COUNT(Dimcustomer.Customerkey) AS CountOfCustomerKey
FROM Dimgeography, Dimcustomer
WHERE Dimgeography.Geographykey = Dimcustomer.Geographykey
GROUP BY Dimgeography.City, Dimcustomer.Gender;

• The order of the fields in the GROUP BY clause matters for how you read the
grouping, not for the groups themselves: every distinct combination of city and
gender forms one group.

• In this example: first group by city, then "within each city" group by gender.

• Note: in the SELECT clause we can also add a column heading using AS (e.g. AS
CountOfCustomerKey). The heading is optional; it should be one
word.



Result

[Figure: query result with the number of customers per city and gender.]
For example, in Seattle: 1 woman and 2 men



The Having Clause
• Just as we can add selection criteria for rows in a WHERE clause, we can define
selection conditions for subgroups using a HAVING clause

• A HAVING clause ALWAYS implies a GROUP BY clause

• Example: In which cities do more than two customers live?

SELECT Dimgeography.City, COUNT(Dimcustomer.Customerkey)
FROM Dimgeography, Dimcustomer
WHERE Dimgeography.Geographykey = Dimcustomer.Geographykey
GROUP BY Dimgeography.City
HAVING COUNT(Dimcustomer.Customerkey) > 2

• So: each group (each city) presented in the result must include more than 2
customers

• Note that the COUNT operation used in the HAVING clause does not have to appear in the SELECT clause!



A Where and Having clause in 1 SQL statement

• When the same query has both a WHERE and a HAVING clause, the
query processor (conceptually) processes the query as follows:

1) The tables listed in the FROM clause are selected.
2) The selection condition(s) in the WHERE clause are applied to select
rows from these tables.
3) Subgroups are formed with the remaining rows as specified in the GROUP
BY clause.
4) The group functions specified in the SELECT and/or HAVING clause are
computed.
5) The conditions in the HAVING clause are used to select subgroups.
6) The results for the remaining subgroups are shown
(possibly in the order specified in the ORDER BY clause).

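The processing order above can be followed step by step on a small example. This is a sketch under assumptions: Python's sqlite3 module stands in for Base, the rows are invented, and the country is stored directly in a simplified customer table to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, FirstName TEXT, Country TEXT);
    CREATE TABLE Factinternet (CustomerKey INTEGER, ProductKey INTEGER);
    INSERT INTO DimCustomer VALUES
        (1, 'Ann', 'United States'), (2, 'Bob', 'United States'), (3, 'Chloe', 'France');
    INSERT INTO Factinternet VALUES (1, 310), (1, 346), (1, 477), (2, 310), (3, 346), (3, 477);
    """
)

# Steps 1-2: FROM and WHERE (join condition plus selection condition on rows).
FROM_WHERE = (
    " FROM DimCustomer C, Factinternet F"
    " WHERE C.CustomerKey = F.CustomerKey AND C.Country = 'United States'"
)
rows_after_where = conn.execute("SELECT COUNT(*)" + FROM_WHERE).fetchone()[0]

# Steps 3-4: form subgroups and compute the group function per subgroup.
groups = conn.execute(
    "SELECT C.FirstName, COUNT(F.ProductKey)" + FROM_WHERE
    + " GROUP BY C.FirstName ORDER BY C.FirstName"
).fetchall()

# Steps 5-6: HAVING keeps only the subgroups that satisfy the condition.
result = conn.execute(
    "SELECT C.FirstName, COUNT(F.ProductKey)" + FROM_WHERE
    + " GROUP BY C.FirstName HAVING COUNT(F.ProductKey) >= 2 ORDER BY C.FirstName"
).fetchall()

print(rows_after_where, groups, result)
```

Chloe is removed by the WHERE clause (a condition on rows), Bob by the HAVING clause (a condition on groups).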


An Example
For example: An overview of the total number of products per customer, on the condition
that at least 2 products were purchased (per customer) and that the customer comes from
the USA

SELECT Dimcustomer.Firstname, Dimcustomer.Lastname, Dimgeography.Countryregionname
FROM Dimgeography, Dimcustomer, Factinternet
WHERE Dimcustomer.Customerkey = Factinternet.Customerkey
AND Dimgeography.Geographykey = Dimcustomer.Geographykey
AND Dimgeography.Countryregionname = 'United States'
GROUP BY Dimcustomer.Firstname, Dimcustomer.Lastname, Dimgeography.Countryregionname
HAVING COUNT(Factinternet.Productkey) >= 2;

• Note that we don't need the table "DimProduct"!
The fact table "FactInternet" contains one line per product sold (see "ProductKey").
It is therefore sufficient to count the number of product keys!
• Note that COUNT(Factinternet.Productkey) is not mentioned in the SELECT clause. It could be added
there to show the number of products per customer, but it is not strictly required
• Note that Firstname, Lastname and Countryregionname are used in the SELECT clause, so all 3 need to
be used in the GROUP BY clause to make it a consistent instruction!



Some other pre-defined arithmetic functions
AVG(K): the average value of column K
COUNT(K): the number of (non-NULL) values in column K
MAX(K): the maximum value in column K
MIN(K): the minimum value in column K
SUM(K): the sum of the values in column K

These functions are part of standard SQL.

Additional functions (among others in Base):

Year(): returns the year of a date field
Now(): returns the current date and time



Examples
• Show the age of customers:

SELECT Year(Now())-Year(Birthdate) AS age
FROM Dimcustomer
ORDER BY Year(Now())-Year(Birthdate)

(This is the difference in calendar years; it overstates the age by one for
customers whose birthday is still to come this year.)

• Show the average price across all products

SELECT AVG(Listprice)
FROM Dimproduct
(result: 1 number!)

• Show the price of the most expensive product

SELECT MAX(Listprice)
FROM Dimproduct

• Show the birth dates of the oldest and youngest customer

SELECT MIN(Birthdate), MAX(Birthdate)
FROM Dimcustomer

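The age calculation with Year(Now())-Year(Birthdate) only subtracts calendar years. The sketch below contrasts it with an exact age. It is an illustration under assumptions: Python's sqlite3 module stands in for Base, SQLite's strftime replaces Year(), a fixed reference date replaces Now() so that the outcome is reproducible, and the two customers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DimCustomer (FirstName TEXT, BirthDate TEXT)")
conn.executemany(
    "INSERT INTO DimCustomer VALUES (?, ?)",
    [("Early", "1990-01-01"), ("Late", "1990-12-31")],
)

REFERENCE_DATE = "2024-06-15"  # fixed stand-in for Now()

rows = conn.execute(
    """
    SELECT FirstName,
           -- difference in calendar years, as in Year(Now())-Year(Birthdate)
           strftime('%Y', :ref) - strftime('%Y', BirthDate) AS year_diff,
           -- exact age: subtract 1 when the birthday is still to come this year
           (strftime('%Y', :ref) - strftime('%Y', BirthDate))
             - (strftime('%m-%d', :ref) < strftime('%m-%d', BirthDate)) AS exact_age
    FROM DimCustomer
    ORDER BY FirstName
    """,
    {"ref": REFERENCE_DATE},
).fetchall()
print(rows)
```

Both customers were born in 1990, so the year difference is 34 for both, but on the reference date "Late" is still 33.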


SQL: Additional Topics

1. Union Queries

2. ‘Top[number]’

3. Subqueries

4. ‘Rollup’ & ‘Cube’ clauses



1. Union Queries

• Union queries are used to combine data from multiple queries into one result.

• For example: Provide a list of names and telephone numbers of all employees
and all customers with an annual income greater than 25 000. (This does not
necessarily imply a link between customers and employees).

• SQL
select Firstname, Lastname, Phone
from Dimemployee
union
Select Firstname, Lastname, Phone
from Dimcustomer
where Yearlyincome > 25000



1. Union Queries

• You cannot simply combine any two SQL statements (queries) with UNION

• Conditions:
• Every query must have the same number of columns (fields)

• The data types of the corresponding fields must be the same (or at least compatible)

• Incorrect example (the third fields, Birthdate and Phone, have different data types):

SELECT Firstname, Lastname, Birthdate
FROM Dimemployee
UNION
SELECT Firstname, Lastname, Phone
FROM Dimcustomer
WHERE Yearlyincome > 25000

1. Union Queries
• Incorrect example (the first fields, Totalchildren and Firstname, have different data types):

SELECT Totalchildren, Firstname, Lastname
FROM Dimcustomer
WHERE Yearlyincome > 25000
UNION
SELECT Firstname, Lastname, Phone
FROM Dimemployee

• Accepted example:

SELECT Phone, Firstname, Lastname
FROM Dimcustomer
WHERE Yearlyincome > 25000
UNION
SELECT Firstname, Lastname, Phone
FROM Dimemployee

• The order of the fields is different, but because the data types are the same,
most database systems accept this query. The first query determines the
order (and the headings) of the columns. Fields are combined by position, not by
name, so in this result the first column mixes phone numbers and first names. List
the fields in the same order in both queries to obtain a meaningful result.

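That UNION combines fields by position rather than by name can be checked directly. A minimal sketch, assuming Python's sqlite3 module as a stand-in for Base; the phone numbers, the incomes and the low-income customer are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimEmployee (FirstName TEXT, LastName TEXT, Phone TEXT);
    CREATE TABLE DimCustomer (FirstName TEXT, LastName TEXT, Phone TEXT, YearlyIncome INTEGER);
    INSERT INTO DimEmployee VALUES ('Guy', 'Gilbert', '555-0001');
    INSERT INTO DimCustomer VALUES
        ('Jon', 'Yang', '555-0002', 90000), ('Lou', 'Small', '555-0003', 10000);
    """
)

# Fields in the same order in both queries: a meaningful phone list.
aligned = conn.execute(
    "SELECT FirstName, LastName, Phone FROM DimEmployee "
    "UNION "
    "SELECT FirstName, LastName, Phone FROM DimCustomer WHERE YearlyIncome > 25000 "
    "ORDER BY 1"
).fetchall()

# Fields in a different order: accepted (all text), but combined by position.
cur = conn.execute(
    "SELECT Phone, FirstName, LastName FROM DimCustomer WHERE YearlyIncome > 25000 "
    "UNION "
    "SELECT FirstName, LastName, Phone FROM DimEmployee "
    "ORDER BY 1"
)
first_column_name = cur.description[0][0]  # heading taken from the first query
misaligned = cur.fetchall()

print(aligned)
print(first_column_name, misaligned)
```

In the second result the column headed Phone contains a phone number for the customer and a first name for the employee.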
2. ‘Top[number]’
• Top[number]: e.g. TOP 1, TOP 3, ... This instruction can be added immediately after
the word SELECT. Example:

SELECT TOP 3 Firstname, Lastname, Birthdate
FROM Dimcustomer
ORDER BY Birthdate DESC;
=> Result: the 3 youngest customers

• Note that the "top x" instruction only ensures that the first x records of a query result are
displayed! The youngest, oldest or highest are not automatically chosen. Example:

SELECT TOP 1 DC.Firstname, DC.Lastname, SUM(F.Salesamount)
FROM Dimcustomer DC, Factinternet F
WHERE DC.Customerkey = F.Customerkey
GROUP BY DC.Firstname, DC.Lastname
ORDER BY DC.Firstname, DC.Lastname
=> Result: the customer Aaron Collins
• This is not necessarily the best customer! In this case TOP 1 takes the first customer
shown in the query result.



2. ‘Top[number]’

• How can you effectively select the best customer?
Simple: make sure that the query result is sorted on the sales total,
from high to low (DESC). TOP 1 will then select the best customer.

• Then:

SELECT TOP 1 DC.Firstname, DC.Lastname, SUM(F.Salesamount)
FROM Dimcustomer DC, Factinternet F
WHERE DC.Customerkey = F.Customerkey
GROUP BY DC.Firstname, DC.Lastname
ORDER BY SUM(F.Salesamount) DESC

• How do you modify the query to select the customer with the lowest
purchase amount?

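The dependence of "top 1" on the sort order can be reproduced as follows. Assumptions: Python's sqlite3 module stands in for Base, and because SQLite has no TOP keyword the sketch uses its LIMIT clause at the end of the query instead; the customer Zoe Best and all amounts are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Dimcustomer (Customerkey INTEGER PRIMARY KEY, Firstname TEXT, Lastname TEXT);
    CREATE TABLE Factinternet (Customerkey INTEGER, Salesamount REAL);
    INSERT INTO Dimcustomer VALUES (1, 'Aaron', 'Collins'), (2, 'Zoe', 'Best');
    INSERT INTO Factinternet VALUES (1, 10.0), (2, 500.0), (2, 250.0);
    """
)

BASE_QUERY = (
    "SELECT DC.Firstname, DC.Lastname, SUM(F.Salesamount) "
    "FROM Dimcustomer DC, Factinternet F "
    "WHERE DC.Customerkey = F.Customerkey "
    "GROUP BY DC.Firstname, DC.Lastname "
)

# Sorted by name: the first row is simply the first customer alphabetically.
first_by_name = conn.execute(
    BASE_QUERY + "ORDER BY DC.Firstname, DC.Lastname LIMIT 1"
).fetchone()

# Sorted by total sales, high to low: the first row is the best customer.
best_customer = conn.execute(
    BASE_QUERY + "ORDER BY SUM(F.Salesamount) DESC LIMIT 1"
).fetchone()

print(first_by_name, best_customer)
```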


3. Subqueries
• A subquery is a selection of data within another selection. In other words, we
use a query within a query

• For example:

SELECT Customerkey, Lastname
FROM Dimcustomer
WHERE Customerkey IN
  (SELECT Customerkey
   FROM Factinternet
   GROUP BY Customerkey
   HAVING SUM(Salesamount) < 800)

• This query selects the same customers as:

SELECT Dimcustomer.Customerkey, Dimcustomer.Lastname, SUM(Salesamount)
FROM Dimcustomer, Factinternet
WHERE Dimcustomer.Customerkey = Factinternet.Customerkey
GROUP BY Dimcustomer.Customerkey, Dimcustomer.Lastname
HAVING SUM(Salesamount) < 800



3. Subqueries

• The data that we want to select can thus also be queried with an INNER JOIN.
Why use a subquery then? A subquery is often clearer than a JOIN
and easier to formulate. Some RDBMSs, however, perform more quickly when using a
JOIN.

• Moreover, a subquery is the natural way to compare records within the same table

• Examples:
• Which employees are older than Guy Gilbert?
So you compare employees with each other

• Which customers are located in the same town as the customer Jacquelyn
Suarez?
So you compare customers with other customers



3. Subqueries
• Which customers are younger than Jon Yang? (In DWAdventureworks)

SELECT Firstname, Lastname, Year(Now())-Year(Birthdate)
FROM Dimcustomer
WHERE Year(Now())-Year(Birthdate) <
  (SELECT Year(Now())-Year(Birthdate)
   FROM Dimcustomer
   WHERE Firstname = 'Jon' AND Lastname = 'Yang')

• Which customers are located in the same town as Jacquelyn Suarez?

SELECT DC.Firstname, DC.Lastname
FROM Dimcustomer DC, Dimgeography DG
WHERE DC.Geographykey = DG.Geographykey
AND DG.City =
  (SELECT DG.City
   FROM Dimcustomer DC, Dimgeography DG
   WHERE DC.Geographykey = DG.Geographykey
   AND DC.Firstname = 'Jacquelyn' AND DC.Lastname = 'Suarez')



3. Subqueries: remark 1
• The use of "IN" and "=", ">", ">=", "<" and "<=" in subqueries:
• The operators "=", ">", ">=", "<", "<=" can only be used if the subquery
yields only one record!

• The operator "IN" can be used if the subquery yields one or more records.

• For example: Create the following query (DWAdventureworks):

SELECT Dimcustomer.Firstname, Dimcustomer.Lastname
FROM Dimcustomer
WHERE Geographykey =
  (SELECT Geographykey
   FROM Dimcustomer
   WHERE Dimcustomer.Lastname = 'Suarez'
   AND Dimcustomer.Firstname = 'Jacquelyn')

• "=" is possible only because the subquery yields only one geographykey (namely the
one of Jacquelyn Suarez, and on the condition that there is only one "Jacquelyn Suarez" in
the warehouse!). The word "IN" can also be used in this query.



3. Subqueries: remark 1
• Why is the following query not correct?

SELECT Dimcustomer.Firstname, Dimcustomer.Lastname
FROM Dimcustomer
WHERE Geographykey =
  (SELECT Geographykey
   FROM Dimcustomer
   WHERE Dimcustomer.Lastname LIKE 'S%')

(In MS Access the pattern is written as 's*'.)
• The subquery can return more than one geographykey, so "=" cannot be used. "IN" can be used instead.

• "IN" means: the defined field of a record from the top query must "appear in" one or more
records from the subquery.
• For example:

SELECT Customerkey FROM Dimcustomer WHERE Customerkey IN
  (SELECT Customerkey
   FROM Factinternet
   GROUP BY Customerkey
   HAVING SUM(Salesamount) < 800)

• The subquery returns the "customerkey" of all customers who bought for less than 800 ....

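The difference between "=" with a one-record subquery and "IN" with a subquery that returns several records can be sketched as follows. Assumptions: Python's sqlite3 module stands in for Base, and all customers except Jacquelyn Suarez are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Dimcustomer ("
    " Customerkey INTEGER PRIMARY KEY, Firstname TEXT, Lastname TEXT, Geographykey INTEGER)"
)
conn.executemany(
    "INSERT INTO Dimcustomer VALUES (?, ?, ?, ?)",
    [
        (1, "Jacquelyn", "Suarez", 10),
        (2, "Sam", "Smith", 20),
        (3, "Ian", "Jones", 10),
        (4, "Eva", "Brown", 20),
        (5, "Max", "Weber", 30),
    ],
)

# The subquery yields exactly one geographykey, so "=" is allowed.
same_place_as_suarez = [
    row[0]
    for row in conn.execute(
        "SELECT Firstname FROM Dimcustomer "
        "WHERE Geographykey = "
        "  (SELECT Geographykey FROM Dimcustomer "
        "   WHERE Lastname = 'Suarez' AND Firstname = 'Jacquelyn') "
        "ORDER BY Customerkey"
    )
]

# The subquery yields two geographykeys (Suarez and Smith), so IN is needed.
# With "=" many DBMSs report an error here; SQLite does not, it silently
# compares with the first record only, which gives an incomplete answer.
same_place_as_s_names = [
    row[0]
    for row in conn.execute(
        "SELECT Firstname FROM Dimcustomer "
        "WHERE Geographykey IN "
        "  (SELECT Geographykey FROM Dimcustomer WHERE Lastname LIKE 'S%') "
        "ORDER BY Customerkey"
    )
]

print(same_place_as_suarez, same_place_as_s_names)
```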


3. Subqueries: remark 2
• By only working with keys, subqueries can often be shortened.
• E.g.: Which customers are located in the same city as "Jacquelyn Suarez"?

SELECT DC.Firstname, DC.Lastname, DG.City
FROM Dimcustomer DC, Dimgeography DG
WHERE DC.Geographykey = DG.Geographykey
AND DG.City =
  (SELECT DG.City
   FROM Dimcustomer DC, Dimgeography DG
   WHERE DC.Geographykey = DG.Geographykey
   AND DC.Lastname = 'Suarez' AND DC.Firstname = 'Jacquelyn')

• This query can be shorter, but then the town of the customers will not be displayed. (You
only use "Geographykey")

SELECT Firstname, Lastname
FROM Dimcustomer
WHERE Geographykey =
  (SELECT Geographykey
   FROM Dimcustomer
   WHERE Lastname = 'Suarez'
   AND Firstname = 'Jacquelyn')

4. Rollup and Cube

• Rollup and Cube are two typical OLAP concepts (see below)

• Rollup and Cube can also be used in SQL statements. They calculate
subtotals.

• Neither instruction is provided in MS Access; they are available in database systems such as
MS SQL Server and Oracle (MySQL supports ROLLUP only). The exact syntax differs between systems.

• Some examples may clarify things



Rollup

SELECT Sort, Place, SUM(Amount) AS amount
FROM Animal
GROUP BY Sort, Place

Sort    Place   Amount
Cat     Miami   18
Cat     Naples  9
Dog     Miami   12
Dog     Naples  5
Dog     Tampa   14
Turtle  Naples  1
Turtle  Tampa   4

SELECT Sort, Place, SUM(Amount) AS amount
FROM Animal
GROUP BY Sort, Place
WITH ROLLUP;

Sort    Place   Amount
Cat     Miami   18
Cat     Naples  9
Cat     NULL    27
Dog     Miami   12
Dog     Naples  5
Dog     Tampa   14
Dog     NULL    31
Turtle  Naples  1
Turtle  Tampa   4
Turtle  NULL    5
NULL    NULL    63

So subtotals are calculated (total number of animals) per species (Sort).
In addition, the "Grand Total" (63 units) is shown.

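What ROLLUP computes can be made explicit by writing out the subtotal queries by hand. The sketch is an emulation, not the ROLLUP instruction itself: it assumes Python's sqlite3 module (SQLite has no ROLLUP), uses the Animal rows from the example above and glues three GROUP BY levels together with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Animal (Sort TEXT, Place TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Animal VALUES (?, ?, ?)",
    [
        ("Cat", "Miami", 18), ("Cat", "Naples", 9),
        ("Dog", "Miami", 12), ("Dog", "Naples", 5), ("Dog", "Tampa", 14),
        ("Turtle", "Naples", 1), ("Turtle", "Tampa", 4),
    ],
)

rows = conn.execute(
    """
    SELECT Sort, Place, SUM(Amount) FROM Animal GROUP BY Sort, Place  -- detail level
    UNION ALL
    SELECT Sort, NULL, SUM(Amount) FROM Animal GROUP BY Sort          -- subtotal per sort
    UNION ALL
    SELECT NULL, NULL, SUM(Amount) FROM Animal                        -- grand total
    """
).fetchall()

# Subtotal rows have a sort but no place; the grand total has neither.
subtotals = {sort: total for sort, place, total in rows if sort is not None and place is None}
grand_total = [total for sort, place, total in rows if sort is None and place is None]

print(subtotals, grand_total)
```

The result has 7 detail rows, 3 subtotal rows and 1 grand total, matching the table above.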


Cube

SELECT Sort, Place, SUM(Amount) AS amount
FROM Animal
GROUP BY Sort, Place
WITH CUBE;

Sort    Place   Amount
Cat     Miami   18
Cat     Naples  9
Cat     NULL    27
Dog     Miami   12
Dog     Naples  5
Dog     Tampa   14
Dog     NULL    31
Turtle  Naples  1
Turtle  Tampa   4
Turtle  NULL    5
NULL    NULL    63
NULL    Miami   30
NULL    Naples  15
NULL    Tampa   18

Similar to Rollup, but now a subtotal of animals is calculated for each
attribute of the GROUP BY clause ('sort' and 'place'). We thus have a
total of animals per sort and per living place.

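Compared with the rollup, the cube adds one more level of subtotals: the totals per place. The sketch below is again an emulation under assumptions: Python's sqlite3 module (SQLite has no CUBE), the Animal rows from the example above, and one GROUP BY query per subtotal level combined with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Animal (Sort TEXT, Place TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Animal VALUES (?, ?, ?)",
    [
        ("Cat", "Miami", 18), ("Cat", "Naples", 9),
        ("Dog", "Miami", 12), ("Dog", "Naples", 5), ("Dog", "Tampa", 14),
        ("Turtle", "Naples", 1), ("Turtle", "Tampa", 4),
    ],
)

rows = conn.execute(
    """
    SELECT Sort, Place, SUM(Amount) FROM Animal GROUP BY Sort, Place  -- detail level
    UNION ALL
    SELECT Sort, NULL, SUM(Amount) FROM Animal GROUP BY Sort          -- subtotal per sort
    UNION ALL
    SELECT NULL, Place, SUM(Amount) FROM Animal GROUP BY Place        -- subtotal per place
    UNION ALL
    SELECT NULL, NULL, SUM(Amount) FROM Animal                        -- grand total
    """
).fetchall()

# The extra level compared with the rollup: rows with a place but no sort.
per_place = {place: total for sort, place, total in rows if sort is None and place is not None}
per_sort = {sort: total for sort, place, total in rows if sort is not None and place is None}

print(per_place, per_sort)
```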


Rollup, Cube and Pivot Tables

• Note: "rollup" and "cube" calculate subtotals as they are calculated in a Pivot
Table…

[Figure: the result of the query "with cube" (columns Sort, Place, Number) next to
a pivot table in MS Excel ("Sum of Number", with row labels and column labels)
that shows the same subtotals.]