Database
Lesson 5. Structured Query Language, Part 2
Nguyễn Thị Oanh
[email protected]
Department of Computer Science
SoICT, HUST
Outline
Data Manipulation: SQL Retrieval Statement (Part 2)
1. Join operators
2. Subqueries: in the FROM clause and in the WHERE clause
3. Union, intersection and difference of queries
4. Aggregation operators
5. Grouping and aggregation in SQL, conditions in the HAVING clause
6. Controlling the output: duplicate elimination, ordering the result
Learning objective
• Write retrieval statements in SQL, from simple queries to complex ones
1. Example of a database schema
student(student_id, first_name, last_name, dob, gender, address, note, clazz_id)
clazz(clazz_id, name, lecturer_id, monitor_id)
subject(subject_id, name, credit, percentage_final_exam)
enrollment(student_id, subject_id, semester, midterm_score, final_score)
lecturer(lecturer_id, first_name, last_name, dob, gender, address, email)
teaching(subject_id, lecturer_id)
grade(code, from_score, to_score)
Client applications (in C#, Java, PHP, ...) send questions such as these to the DBMS:
• List of all female students?
• First name, last name and address of class monitors?
• List of students (id and full name) who have enrolled in subject 'Học máy' in semester 20172?
• List of students (id and full name) having CPA >= 3.2?
1. Example of a database schema
Question: list of students (id and full name) who have enrolled in subject 'Học máy' in semester 20172?
student
student_id | first_name | last_name | dob | gender | …
20160001 | Ngọc An | Bùi | 3/18/1987 | M | …
… | … | … | … | … | …
20160003 | Thu Hồng | Trần | 6/6/1987 | F | …
20160004 | Minh Anh | Nguyễn | 5/20/1987 | F | …
enrollment
student_id | subject_id | semester | midterm_score | final_score
20160001 | IT1110 | 20171 | 9 | 8.5
… | … | … | … | …
20160001 | IT4866 | 20172 | 7 | 9
20160002 | IT3080 | 20172 | 9 |
20160003 | IT4866 | 20172 | 7 | 6
subject
subject_id | name | credit | percentage_final_exam
IT1110 | Tin học đại cương | 4 | 60
… | … | … | …
IT4866 | Học máy | 2 | 70
1. Data Manipulation: SELECT operation
SELECT [ALL|DISTINCT]
    {* | {table_name.* | expr [alias]} | view_name.*}
    [, {table_name.* | expr [alias]}] ...
FROM table_name [alias] [, table_name [alias]] ...
[WHERE condition]
[GROUP BY expr [, expr] ...]
[HAVING condition]
[{UNION | UNION ALL | INTERSECT | MINUS}
    SELECT ...]
[ORDER BY {expr | position} [ASC|DESC]
    [, {expr | position} [ASC|DESC]] ...]
(MINUS is the Oracle name for the operator called EXCEPT in standard SQL.)
Data Manipulation: Advanced SELECT
• Join operators
• Subqueries: in the FROM clause and in the WHERE clause
• Aggregation operators
• Grouping and aggregation in SQL, conditions in the HAVING clause
• Controlling the output: duplicate elimination, ordering the result
1.1. Join operators
• Syntax:
SELECT t1.c1, t1.c2, …, t2.c1, t2.c2
FROM t1, t2
WHERE condition_expression
• Example:
student(student_id, first_name, last_name, dob, gender, address, note, clazz_id)
clazz(clazz_id, name, lecturer_id, monitor_id)
SELECT clazz.clazz_id, name, last_name, first_name
FROM clazz, student
WHERE student_id = monitor_id
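A minimal runnable sketch of this join, assuming Python's built-in sqlite3 module as a stand-in DBMS. The rows are hypothetical sample data modelled on the slides' clazz and student tables, reduced to the columns the query uses; the ORDER BY is added only so that the output order is deterministic.

```python
import sqlite3

# In-memory database with hypothetical sample rows (columns reduced to those used).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, first_name TEXT,
                      last_name TEXT, clazz_id TEXT);
CREATE TABLE clazz (clazz_id TEXT PRIMARY KEY, name TEXT,
                    lecturer_id TEXT, monitor_id TEXT);
INSERT INTO student VALUES ('20160003', 'Thu Hồng', 'Trần', '20162101'),
                           ('20160004', 'Minh Anh', 'Nguyễn', '20162101'),
                           ('20170001', 'Nhật Ánh', 'Nguyễn', '20172201');
INSERT INTO clazz VALUES ('20162101', 'CNTT1.01-K61', '02001', '20160003'),
                         ('20162102', 'CNTT1.02-K61', NULL, NULL),
                         ('20172201', 'CNTT2.01-K62', '02002', '20170001');
""")

# Both tables listed in FROM, join condition written in WHERE (as on the slide).
rows = conn.execute("""
    SELECT clazz.clazz_id, name, last_name, first_name
    FROM clazz, student
    WHERE student_id = monitor_id
    ORDER BY clazz.clazz_id
""").fetchall()
for row in rows:
    print(row)
```

Class 20162102 has no monitor, so no student tuple satisfies the join condition for it and it is absent from the result: an inner join only keeps matching pairs.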
1.1. Join operators: AS keyword in FROM clause
• Used to name tuple variables (aliases):
SELECT …
FROM <table_name> [AS] <variable_name>, …
[WHERE …]
o AS is optional
o <variable_name> can be used throughout the whole SQL statement
• Example:
SELECT c.clazz_id, name, s.last_name, s.first_name
FROM clazz AS c, student s
WHERE s.student_id = c.monitor_id
1.1. Join operators: Self-join
subject(subject_id, name, credit, percentage_final_exam)
Find all pairs of subject ids having the same name, where the credit of the first subject is less than the credit of the second one:
SELECT sj1.subject_id, sj2.subject_id
FROM subject sj1, subject sj2
WHERE sj1.name = sj2.name
AND sj1.credit < sj2.credit
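A small sketch of the self-join, assuming Python's sqlite3 module. The subject IT3091 is a hypothetical row invented so that two subjects share a name; it is not part of the slides' data.

```python
import sqlite3

# Hypothetical subjects: IT3091 is an invented second subject sharing IT3090's name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (subject_id TEXT PRIMARY KEY, name TEXT,
                      credit INTEGER, percentage_final_exam INTEGER);
INSERT INTO subject VALUES ('IT3090', 'Cơ sở dữ liệu', 3, 70),
                           ('IT3091', 'Cơ sở dữ liệu', 2, 70),
                           ('IT4866', 'Học máy', 2, 70);
""")

# Two aliases (sj1, sj2) range over the same table. The strict '<' never pairs
# a subject with itself and reports each qualifying pair in only one order.
pairs = conn.execute("""
    SELECT sj1.subject_id, sj2.subject_id
    FROM subject sj1, subject sj2
    WHERE sj1.name = sj2.name
      AND sj1.credit < sj2.credit
""").fetchall()
print(pairs)
```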
1.1. Join operators: Example
student(student_id, first_name, last_name, dob, gender, address, note, clazz_id)
subject(subject_id, name, credit, percentage_final_exam)
enrollment(student_id, subject_id, semester, midterm_score, final_score)
List the students who have enrolled in subjects in semester 20172. The list consists of the student's full name, the subject name and the subject credit:
SELECT last_name || ' ' || first_name AS fullname,
sj.name AS subjectname, credit
FROM student s, enrollment e, subject sj
WHERE s.student_id = e.student_id
AND sj.subject_id = e.subject_id
AND semester = '20172'
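A runnable sketch of the three-table join, assuming Python's sqlite3 module and a few hypothetical rows taken from the slides' sample tables. The semester is stored as text to match the literal '20172'; the ORDER BY is an addition for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE subject (subject_id TEXT PRIMARY KEY, name TEXT, credit INTEGER);
CREATE TABLE enrollment (student_id TEXT, subject_id TEXT, semester TEXT,
                         midterm_score REAL, final_score REAL);
INSERT INTO student VALUES ('20160001', 'Ngọc An', 'Bùi'),
                           ('20160003', 'Thu Hồng', 'Trần');
INSERT INTO subject VALUES ('IT1110', 'Tin học đại cương', 4),
                           ('IT4866', 'Học máy', 2);
INSERT INTO enrollment VALUES ('20160001', 'IT1110', '20171', 9, 8.5),
                              ('20160001', 'IT4866', '20172', 7, 9),
                              ('20160003', 'IT4866', '20172', 7, 6);
""")

# enrollment links each student to each subject; || concatenates strings.
rows = conn.execute("""
    SELECT last_name || ' ' || first_name AS fullname,
           sj.name AS subjectname, credit
    FROM student s, enrollment e, subject sj
    WHERE s.student_id = e.student_id
      AND sj.subject_id = e.subject_id
      AND semester = '20172'
    ORDER BY fullname
""").fetchall()
for row in rows:
    print(row)
```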
1.1. Join operators: Join types
• Cartesian product:
  • R CROSS JOIN S
• Theta join:
  • R [INNER] JOIN S ON <condition>
• Natural join (be careful: it implicitly joins on every pair of columns with the same name):
  • R NATURAL JOIN S
• Outer join:
  • R [LEFT|RIGHT|FULL] [OUTER] JOIN S ON <condition>
  • R NATURAL [LEFT|RIGHT|FULL] [OUTER] JOIN S
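1.1. Join operators: OUTER JOINS
• R [LEFT|RIGHT|FULL] OUTER JOIN S ON <condition>
• R NATURAL [LEFT|RIGHT|FULL] OUTER JOIN S
R
a | b | c
1 | An | 5
2 | Binh | 5
3 | Cuong | 7
S
a | c | d
1 | 5 | X
1 | 7 | Y
2 | 5 | Z
4 | 1 | Z
R FULL OUTER JOIN S ON (R.a = S.a)
R.a | b | R.c | S.a | S.c | d
1 | An | 5 | 1 | 5 | X
1 | An | 5 | 1 | 7 | Y
2 | Binh | 5 | 2 | 5 | Z
3 | Cuong | 7 | NULL | NULL | NULL
NULL | NULL | NULL | 4 | 1 | Z
R NATURAL LEFT OUTER JOIN S (matches on both common columns, a and c; in the actual SQL output each common column appears only once)
R.a | b | R.c | S.a | S.c | d
1 | An | 5 | 1 | 5 | X
2 | Binh | 5 | 2 | 5 | Z
3 | Cuong | 7 | NULL | NULL | NULL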
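The two results above can be reproduced with a sketch like the following, assuming Python's sqlite3 module. SQLite only gained RIGHT and FULL OUTER JOIN in version 3.39, so to stay portable across Python builds the full outer join is emulated here (a swapped-in technique, not the slide's syntax): a LEFT JOIN keeps every R row, and a UNION ALL adds the S rows that matched nothing in R. The natural join selects its columns explicitly because the merged common columns a and c appear only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (a INTEGER, b TEXT, c INTEGER);
CREATE TABLE S (a INTEGER, c INTEGER, d TEXT);
INSERT INTO R VALUES (1, 'An', 5), (2, 'Binh', 5), (3, 'Cuong', 7);
INSERT INTO S VALUES (1, 5, 'X'), (1, 7, 'Y'), (2, 5, 'Z'), (4, 1, 'Z');
""")

# FULL OUTER JOIN emulation: all R rows (LEFT JOIN) plus unmatched S rows.
# Wrapped in an outer SELECT so the ORDER BY may use expressions.
full = conn.execute("""
    SELECT * FROM (
        SELECT R.a AS r_a, R.b AS b, R.c AS r_c, S.a AS s_a, S.c AS s_c, S.d AS d
        FROM R LEFT JOIN S ON R.a = S.a
        UNION ALL
        SELECT R.a, R.b, R.c, S.a, S.c, S.d
        FROM S LEFT JOIN R ON R.a = S.a
        WHERE R.a IS NULL
    )
    ORDER BY r_a IS NULL, r_a, s_c
""").fetchall()

# NATURAL LEFT OUTER JOIN matches on every shared column name: a and c.
natural = conn.execute("""
    SELECT a, c, b, d FROM R NATURAL LEFT OUTER JOIN S ORDER BY a
""").fetchall()

for row in full:
    print(row)
print(natural)
```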
1.1. Join operators: OUTER JOIN Example
• List all classes with their monitors' names (first name and last name; NULL if the class does not yet have a monitor)
clazz
clazz_id | name | lecturer_id | monitor_id
20162101 | CNTT1.01-K61 | 02001 | 20160003
20162102 | CNTT1.02-K61 | |
20172201 | CNTT2.01-K62 | 02002 | 20170001
20172202 | CNTT2.02-K62 | |
student
student_id | first_name | last_name | … | clazz_id
20160003 | Thu Hồng | Trần | … | 20162101
20160004 | Minh Anh | Nguyễn | … | 20162101
… | … | … | … | …
SELECT c.clazz_id, name, last_name, first_name
FROM clazz c LEFT OUTER JOIN student
ON (student_id = monitor_id);
result
clazz_id | name | last_name | first_name
20172202 | CNTT2.02-K62 | NULL | NULL
20162102 | CNTT1.02-K61 | NULL | NULL
20162101 | CNTT1.01-K61 | Trần | Thu Hồng
20172201 | CNTT2.01-K62 | Nguyễn | Nhật Ánh
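A sketch of the same left outer join, assuming Python's sqlite3 module and sample rows copied from the tables above (student 20170001 comes from the slides' other student table). The ORDER BY is added for a deterministic order; SQL NULLs come back as Python None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE clazz (clazz_id TEXT PRIMARY KEY, name TEXT, monitor_id TEXT);
INSERT INTO student VALUES ('20160003', 'Thu Hồng', 'Trần'),
                           ('20160004', 'Minh Anh', 'Nguyễn'),
                           ('20170001', 'Nhật Ánh', 'Nguyễn');
INSERT INTO clazz VALUES ('20162101', 'CNTT1.01-K61', '20160003'),
                         ('20162102', 'CNTT1.02-K61', NULL),
                         ('20172201', 'CNTT2.01-K62', '20170001'),
                         ('20172202', 'CNTT2.02-K62', NULL);
""")

# Every clazz row is kept; classes without a matching monitor get NULL
# (None in Python) for the student columns.
rows = conn.execute("""
    SELECT c.clazz_id, name, last_name, first_name
    FROM clazz c LEFT OUTER JOIN student
         ON (student_id = monitor_id)
    ORDER BY c.clazz_id
""").fetchall()
for row in rows:
    print(row)
```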
1.2. Sub-queries
• A SELECT-FROM-WHERE statement can be used within a clause of another (outer) query. It can appear
  • within a WHERE clause
  • within a FROM clause
• A subquery creates an intermediate result
• In principle, there is no limit to the number of levels of nesting
• Objectives:
  • Check whether an element is in a set (IN, NOT IN)
  • Set comparison: >ALL, >=ALL, <ALL, <=ALL, =ALL, ANY (SOME)
  • Check whether a relation is empty or not (EXISTS, NOT EXISTS)
1.2. Sub-queries: Subquery returns a scalar value
• If a subquery produces a single value, we can use it as if it were a constant
SELECT *
FROM student
WHERE student_id = (SELECT monitor_id
FROM clazz
WHERE clazz_id = '20162101');
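A minimal sketch of a scalar subquery, assuming Python's sqlite3 module and hypothetical rows with reduced columns. One SQLite-specific caveat: if such a subquery returned several rows, standard SQL would raise an error, whereas SQLite silently uses the first row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE clazz (clazz_id TEXT PRIMARY KEY, name TEXT, monitor_id TEXT);
INSERT INTO student VALUES ('20160003', 'Thu Hồng', 'Trần'),
                           ('20160004', 'Minh Anh', 'Nguyễn');
INSERT INTO clazz VALUES ('20162101', 'CNTT1.01-K61', '20160003');
""")

# The inner query yields one value (the monitor's id), compared like a constant.
rows = conn.execute("""
    SELECT * FROM student
    WHERE student_id = (SELECT monitor_id FROM clazz
                        WHERE clazz_id = '20162101')
""").fetchall()
print(rows)
```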
1.2. Sub-queries: IN operator
• Syntax:
<tuple> [NOT ] IN <subquery>
• Example: First name, last name and address of class monitors?
student(student_id, first_name,last_name, dob, gender, address, note, clazz_id)
clazz(clazz_id, name, lecturer_id, monitor_id)
SELECT first_name, last_name, address
FROM student
WHERE student_id IN (SELECT monitor_id FROM clazz);
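A sketch of IN, and of a well-known NOT IN pitfall, assuming Python's sqlite3 module; the addresses and rows are hypothetical. Because class 20162102 has no monitor, the subquery contains a NULL, and then `x NOT IN (...)` is never TRUE for a value that is not in the list, so the naive query returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, first_name TEXT,
                      last_name TEXT, address TEXT);
CREATE TABLE clazz (clazz_id TEXT PRIMARY KEY, monitor_id TEXT);
INSERT INTO student VALUES ('20160003', 'Thu Hồng', 'Trần', 'Hà Nội'),
                           ('20160004', 'Minh Anh', 'Nguyễn', 'Nam Định'),
                           ('20170001', 'Nhật Ánh', 'Nguyễn', 'Hải Phòng');
INSERT INTO clazz VALUES ('20162101', '20160003'),
                         ('20162102', NULL),
                         ('20172201', '20170001');
""")

# Class monitors: student_id must appear in the set of monitor ids.
monitors = conn.execute("""
    SELECT first_name, last_name, address FROM student
    WHERE student_id IN (SELECT monitor_id FROM clazz)
    ORDER BY student_id
""").fetchall()

# Pitfall: the subquery result contains NULL, so NOT IN is never TRUE here.
naive = conn.execute("""
    SELECT student_id FROM student
    WHERE student_id NOT IN (SELECT monitor_id FROM clazz)
""").fetchall()

# Excluding NULLs inside the subquery gives the intended non-monitors.
non_monitors = conn.execute("""
    SELECT student_id FROM student
    WHERE student_id NOT IN (SELECT monitor_id FROM clazz
                             WHERE monitor_id IS NOT NULL)
""").fetchall()
print(monitors, naive, non_monitors)
```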
1.2. Sub-queries: EXISTS
• Syntax:
[NOT] EXISTS (<subquery>)
EXISTS (<subquery>): TRUE iff <subquery> result is not empty
• Example: which subjects have no lecturer?
teaching(subject_id, lecturer_id)
subject(subject_id, name, credit, percentage_final_exam)
SELECT * FROM subject s
WHERE NOT EXISTS (SELECT *
FROM teaching
WHERE subject_id = s.subject_id)
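A runnable sketch of the correlated NOT EXISTS query, assuming Python's sqlite3 module and hypothetical teaching rows. The unqualified subject_id inside the subquery resolves to teaching.subject_id, while s.subject_id refers to the current outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (subject_id TEXT PRIMARY KEY, name TEXT,
                      credit INTEGER, percentage_final_exam INTEGER);
CREATE TABLE teaching (subject_id TEXT, lecturer_id TEXT);
INSERT INTO subject VALUES ('IT1110', 'Tin học đại cương', 4, 60),
                           ('IT3090', 'Cơ sở dữ liệu', 3, 70),
                           ('IT4866', 'Học máy', 2, 70);
INSERT INTO teaching VALUES ('IT1110', '02001'), ('IT3090', '02002');
""")

# Correlated subquery: evaluated per outer row s; the row is kept only if
# no teaching tuple refers to that subject.
rows = conn.execute("""
    SELECT * FROM subject s
    WHERE NOT EXISTS (SELECT * FROM teaching
                      WHERE subject_id = s.subject_id)
""").fetchall()
print(rows)
```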
1.2. Sub-queries: ALL, ANY
• Syntax: <expression> <comparison_operator> ALL|ANY <subquery>
o <comparison_operator>: >, <, <=, >=, =, <>
o X >= ALL <subquery>: TRUE if no value in the <subquery> result is larger than X
o X = ANY <subquery>: TRUE if X equals at least one value in the <subquery> result
o X > ANY <subquery>: TRUE if X is greater than at least one value in the <subquery> result
• Example:
SELECT *
FROM subject
WHERE credit >= ALL (SELECT credit FROM subject);
1.2. Sub-queries: Example
subject
subject_id | name | credit | perc…
IT1110 | Tin học đại cương | 4 | 60
IT3080 | Mạng máy tính | 3 | 70
IT3090 | Cơ sở dữ liệu | 3 | 70
IT4857 | Thị giác máy tính | 3 | 60
IT4866 | Học máy | 2 | 70
SELECT *
FROM subject
WHERE credit > ANY (SELECT credit
FROM subject);
result
subject_id | name | credit | perc…
IT1110 | Tin học đại cương | 4 | 60
IT3080 | Mạng máy tính | 3 | 70
IT3090 | Cơ sở dữ liệu | 3 | 70
IT4857 | Thị giác máy tính | 3 | 60
SELECT *
FROM subject
WHERE credit >= ALL (SELECT credit FROM subject);
result
subject_id | name | credit | perc…
IT1110 | Tin học đại cương | 4 | 60
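SQLite does not implement the ALL/ANY comparison operators, so this sketch (assuming Python's sqlite3 module and the five subjects above) rewrites them with standard equivalents: `> ANY (...)` as "greater than the minimum", and `>= ALL (...)` as "no subject has more credits" via NOT EXISTS. These rewrites assume credit contains no NULLs; with NULLs, ALL and NOT EXISTS can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (subject_id TEXT PRIMARY KEY, name TEXT,
                      credit INTEGER, percentage_final_exam INTEGER);
INSERT INTO subject VALUES ('IT1110', 'Tin học đại cương', 4, 60),
                           ('IT3080', 'Mạng máy tính', 3, 70),
                           ('IT3090', 'Cơ sở dữ liệu', 3, 70),
                           ('IT4857', 'Thị giác máy tính', 3, 60),
                           ('IT4866', 'Học máy', 2, 70);
""")

# credit > ANY (SELECT credit FROM subject), rewritten: above the minimum.
any_ids = [r[0] for r in conn.execute("""
    SELECT subject_id FROM subject
    WHERE credit > (SELECT MIN(credit) FROM subject)
    ORDER BY subject_id
""")]

# credit >= ALL (SELECT credit FROM subject), rewritten: nothing is larger.
all_ids = [r[0] for r in conn.execute("""
    SELECT subject_id FROM subject s
    WHERE NOT EXISTS (SELECT * FROM subject s2 WHERE s2.credit > s.credit)
""")]
print(any_ids, all_ids)
```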
1.2. Sub-queries: Subquery in FROM Clause
• A subquery can be used as a relation in a FROM clause
• It must be given a tuple-variable alias
• Example: list of lecturers teaching the subject whose id is 'IT3090'
SELECT l.*
FROM lecturer l,
(SELECT lecturer_id
FROM teaching
WHERE subject_id = 'IT3090') lid
WHERE l.lecturer_id = lid.lecturer_id
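A sketch of a subquery in the FROM clause, assuming Python's sqlite3 module; the lecturer names and teaching rows are hypothetical. The derived table is given the alias lid, as standard SQL requires, so the outer query can refer to its column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lecturer (lecturer_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE teaching (subject_id TEXT, lecturer_id TEXT);
INSERT INTO lecturer VALUES ('02001', 'Lan', 'Phạm'),
                            ('02002', 'Minh', 'Lê'),
                            ('02003', 'Hà', 'Vũ');
INSERT INTO teaching VALUES ('IT3090', '02001'), ('IT3090', '02003'),
                            ('IT1110', '02002');
""")

# lid is an intermediate relation holding the ids of IT3090's lecturers.
rows = conn.execute("""
    SELECT l.*
    FROM lecturer l,
         (SELECT lecturer_id FROM teaching WHERE subject_id = 'IT3090') lid
    WHERE l.lecturer_id = lid.lecturer_id
    ORDER BY l.lecturer_id
""").fetchall()
print(rows)
```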
1.3. Union, Intersection and Difference of Queries
• <subquery_1> UNION <subquery_2>
• <subquery_1> INTERSECT <subquery_2>
• <subquery_1> EXCEPT <subquery_2>
• The two subqueries must be union-compatible (same number of columns, with compatible types)
• Example: list of subjects that no one has enrolled in?
SELECT * FROM subject
EXCEPT
SELECT s.*
FROM subject s NATURAL JOIN enrollment e;
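A sketch of the EXCEPT query, assuming Python's sqlite3 module and hypothetical enrollment rows. Here the natural join between subject and enrollment matches only on subject_id, their single common column; ORDER BY 1 (sort by the first result column) is added for a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (subject_id TEXT PRIMARY KEY, name TEXT,
                      credit INTEGER, percentage_final_exam INTEGER);
CREATE TABLE enrollment (student_id TEXT, subject_id TEXT, semester TEXT,
                         midterm_score REAL, final_score REAL);
INSERT INTO subject VALUES ('IT1110', 'Tin học đại cương', 4, 60),
                           ('IT3080', 'Mạng máy tính', 3, 70),
                           ('IT3090', 'Cơ sở dữ liệu', 3, 70),
                           ('IT4866', 'Học máy', 2, 70);
INSERT INTO enrollment VALUES ('20160001', 'IT1110', '20171', 9, 8.5),
                              ('20160002', 'IT1110', '20171', 6, 5),
                              ('20160001', 'IT4866', '20172', 7, 9);
""")

# All subjects minus those appearing in at least one enrollment.
rows = conn.execute("""
    SELECT * FROM subject
    EXCEPT
    SELECT s.* FROM subject s NATURAL JOIN enrollment e
    ORDER BY 1
""").fetchall()
print(rows)
```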
1.4. Aggregation Operators
• SUM, AVG, COUNT, MIN, MAX: applied to a column in a SELECT clause
• COUNT(*) counts the number of tuples
subject
subject_id | name | credit | perc…
IT1110 | Tin học đại cương | 4 | 60
IT3080 | Mạng máy tính | 3 | 70
IT3090 | Cơ sở dữ liệu | 3 | 70
IT4857 | Thị giác máy tính | 3 | 60
IT4866 | Học máy | 2 | 70
LI0001 | life's happy song | 5 |
LI0002 | %life's happy song 2 | 5 |
SELECT AVG(credit), MAX(credit)
FROM subject
WHERE subject_id LIKE 'IT%';
result
AVG | MAX
3.0 | 4
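A sketch of the aggregation above, assuming Python's sqlite3 module and the seven subject rows shown; the blank percentage cells of the LI rows are assumed to be NULL. Rows are inserted with parameter binding, which avoids quoting the apostrophes in the names.

```python
import sqlite3

SUBJECTS = [
    ("IT1110", "Tin học đại cương", 4, 60),
    ("IT3080", "Mạng máy tính", 3, 70),
    ("IT3090", "Cơ sở dữ liệu", 3, 70),
    ("IT4857", "Thị giác máy tính", 3, 60),
    ("IT4866", "Học máy", 2, 70),
    ("LI0001", "life's happy song", 5, None),      # None is stored as SQL NULL
    ("LI0002", "%life's happy song 2", 5, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subject (subject_id TEXT PRIMARY KEY, name TEXT, "
             "credit INTEGER, percentage_final_exam INTEGER)")
conn.executemany("INSERT INTO subject VALUES (?, ?, ?, ?)", SUBJECTS)

# Aggregates over the five IT subjects only: (4+3+3+3+2)/5 and the maximum.
avg_credit, max_credit = conn.execute(
    "SELECT AVG(credit), MAX(credit) FROM subject WHERE subject_id LIKE 'IT%'"
).fetchone()
# COUNT(*) counts every tuple of the table.
n_rows = conn.execute("SELECT COUNT(*) FROM subject").fetchone()[0]
print(avg_credit, max_credit, n_rows)
```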
1.4. Aggregation Operators: Functions
• Aggregate functions: MAX, MIN, SUM, AVG, COUNT
• Functions applied to individual tuples:
  • Mathematical functions: ABS, SQRT, LOG, EXP, SIGN, ROUND, ...
  • String functions: LEN, LEFT, RIGHT, MID, ...
  • Date/time functions: DATE, DAY, MONTH, YEAR, HOUR, MINUTE, ...
  • Format modification: FORMAT
• Remarks:
  • In general, common functions are similar across DBMSs
  • Some functions have different formats or names, especially for date, time and string data types, so consult the documentation of each DBMS
1.4. Aggregation Operators: Functions
• Example (the factor 1.0 forces non-integer division; stddev_pop computes the population standard deviation):
SELECT sjid, name, MIN(score), MAX(score), AVG(score), stddev_pop(score)
FROM (SELECT student_id sid, e.subject_id sjid, name,
             (midterm_score*(1-1.0*percentage_final_exam/100) +
              final_score*1.0*percentage_final_exam/100) score
      FROM enrollment e, subject sj
      WHERE sj.subject_id = e.subject_id) AS t
WHERE upper(sjid) LIKE 'IT%'
GROUP BY sjid, name;
result
sjid | name | min | max | avg | stddev
IT1110 | Tin học đại cương | 5.4 | 8.7 | 7.05 | 1.254
IT3080 | Mạng máy tính | | | |
IT3090 | Cơ sở dữ liệu | 8.1 | 8.1 | 8.1 | 0
IT4857 | Thị giác máy tính | 8.25 | 8.25 | 8.25 | 0
IT4866 | Học máy | 8.4 | 8.4 | 8.4 | 0
1.4. NULLs Ignored in Aggregation
• NULL values make no contribution to an aggregate
• If a column has no non-NULL values, the result is NULL
– Exception: COUNT of an empty set is 0
subject
subject_id | name | credit | percentage_final_exam
IT1110 | Tin học đại cương | 4 | 60
IT3080 | Mạng máy tính | 3 | 70
IT3090 | Cơ sở dữ liệu | 3 | 70
IT4857 | Thị giác máy tính | 3 | 60
IT4866 | Học máy | 2 | 70
LI0001 | life's happy song | 5 |
LI0002 | %life's happy song 2 | 5 |
SELECT AVG(percentage_final_exam)
FROM subject;
→ 66 = (60×2 + 70×3)/5
SELECT AVG(percentage_final_exam),
count(percentage_final_exam)
FROM subject
WHERE subject_id NOT LIKE 'IT%';
result
AVG | COUNT
NULL | 0
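A sketch of how NULLs behave in aggregates, assuming Python's sqlite3 module and the seven rows above, with the blank percentage cells taken to be NULL. AVG divides by the number of non-NULL values only, (60×2 + 70×3)/5 = 330/5 = 66, and COUNT(column) differs from COUNT(*) for the same reason.

```python
import sqlite3

SUBJECTS = [
    ("IT1110", "Tin học đại cương", 4, 60),
    ("IT3080", "Mạng máy tính", 3, 70),
    ("IT3090", "Cơ sở dữ liệu", 3, 70),
    ("IT4857", "Thị giác máy tính", 3, 60),
    ("IT4866", "Học máy", 2, 70),
    ("LI0001", "life's happy song", 5, None),      # assumed NULL percentage
    ("LI0002", "%life's happy song 2", 5, None),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subject (subject_id TEXT PRIMARY KEY, name TEXT, "
             "credit INTEGER, percentage_final_exam INTEGER)")
conn.executemany("INSERT INTO subject VALUES (?, ?, ?, ?)", SUBJECTS)

# NULLs are skipped: the average is taken over the 5 non-NULL values.
avg_all = conn.execute(
    "SELECT AVG(percentage_final_exam) FROM subject").fetchone()[0]

# Only NULLs remain after the WHERE: AVG gives NULL (None), COUNT gives 0.
non_it = conn.execute("""
    SELECT AVG(percentage_final_exam), COUNT(percentage_final_exam)
    FROM subject WHERE subject_id NOT LIKE 'IT%'
""").fetchone()

# COUNT(*) counts tuples; COUNT(column) counts non-NULL values.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(percentage_final_exam) FROM subject").fetchone()
print(avg_all, non_it, counts)
```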
1.5. Grouping results
• Syntax:
SELECT ...
FROM ...
[WHERE condition]
GROUP BY expr [,expr]...
• Example and operational semantics:
student
student_id | first_name | last_name | … | gender | … | clazz_id
20160001 | Ngọc An | Bùi | … | M | … | 20172201
20160002 | Anh | Hoàng | … | M | … | 20162101
20160003 | Thu Hồng | Trần | … | F | … | 20162101
20160004 | Minh Anh | Nguyễn | … | F | … | 20162101
20170001 | Nhật Ánh | Nguyễn | … | F | … | 20172201
SELECT clazz_id, count(student_id)
FROM student
WHERE gender = 'F'
GROUP BY clazz_id;
(1) FROM/WHERE keeps only the tuples with gender = 'F'; (2) GROUP BY partitions them by clazz_id; (3) SELECT produces one result tuple per group.
result
clazz_id | count
20162101 | 3
20172201 | 2
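1.5. Grouping results
• Each element of the SELECT list must be either:
– Aggregated, or
– An attribute on the GROUP BY list
• Invalid example: first_name is neither aggregated nor on the GROUP BY list (most DBMSs reject this query; a few, such as SQLite, accept it and return an arbitrary first_name from each group)
SELECT clazz_id, count(student_id), first_name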
FROM student
WHERE gender = 'F'
GROUP BY clazz_id;
1.5. Grouping results: HAVING
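• Syntax:
SELECT ...
FROM ...
[WHERE condition]
GROUP BY expr [,expr]...
HAVING <condition on group>
• Example:
SELECT clazz_id, count(student_id)
FROM student
WHERE gender = 'F'
GROUP BY clazz_id
HAVING count(student_id) > 2;
(1) FROM/WHERE filters tuples; (2) GROUP BY forms groups; (3) HAVING keeps only the groups with more than 2 students; (4) SELECT produces one tuple per remaining group.
result
clazz_id | count
20162101 | 3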
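A sketch of GROUP BY with and without HAVING, assuming Python's sqlite3 module. The first five rows are the student sample shown earlier; the last two are hypothetical additions so that the female counts per class match the slide's result (3 and 2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, first_name TEXT, "
             "last_name TEXT, gender TEXT, clazz_id TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    ("20160001", "Ngọc An", "Bùi", "M", "20172201"),
    ("20160002", "Anh", "Hoàng", "M", "20162101"),
    ("20160003", "Thu Hồng", "Trần", "F", "20162101"),
    ("20160004", "Minh Anh", "Nguyễn", "F", "20162101"),
    ("20170001", "Nhật Ánh", "Nguyễn", "F", "20172201"),
    # Two hypothetical extra rows so that the counts match the slide's result.
    ("20160005", "Lan Anh", "Đỗ", "F", "20162101"),
    ("20170002", "Thu Trang", "Lê", "F", "20172201"),
])

# WHERE filters tuples before grouping; one result tuple per clazz_id.
groups = conn.execute("""
    SELECT clazz_id, count(student_id) FROM student
    WHERE gender = 'F'
    GROUP BY clazz_id
    ORDER BY clazz_id
""").fetchall()

# HAVING filters whole groups after they are formed.
big_groups = conn.execute("""
    SELECT clazz_id, count(student_id) FROM student
    WHERE gender = 'F'
    GROUP BY clazz_id
    HAVING count(student_id) > 2
""").fetchall()
print(groups, big_groups)
```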
1.5. Grouping results: HAVING
• Requirements on HAVING conditions:
  • Anything goes in a subquery
  • Outside subqueries, they may refer to an attribute only if it is:
    • either a grouping attribute
    • or aggregated
SELECT subject_id, semester, count(student_id)
FROM enrollment
GROUP BY subject_id, semester
HAVING count(student_id) >= ALL
(SELECT count(student_id)
FROM enrollment
GROUP BY subject_id, semester)
1.5. Grouping results: HAVING
• Which subject, in which semester, has the most enrollments?
SELECT subject_id, semester, count(student_id)
FROM enrollment
GROUP BY subject_id, semester
HAVING count(student_id) >= ALL
(SELECT count(student_id)
FROM enrollment
GROUP BY subject_id, semester);
result of the grouping alone (without HAVING)
subject_id | semester | count
IT4857 | 20172 | 1
IT3090 | 20172 | 1
IT4866 | 20172 | 1
IT3080 | 20172 | 2
IT1110 | 20171 | 4
result
subject_id | semester | count
IT1110 | 20171 | 4
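A sketch of this query, assuming Python's sqlite3 module. The enrollment rows are hypothetical, generated only to reproduce the group sizes shown above, and the table is reduced to three columns. Since SQLite lacks `>= ALL`, the condition is rewritten as "greater than or equal to the maximum group count", which is equivalent here because counts are never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student_id TEXT, subject_id TEXT, semester TEXT)")

# Hypothetical enrollments reproducing the group sizes shown on the slide.
group_sizes = {("IT1110", "20171"): 4, ("IT3080", "20172"): 2,
               ("IT3090", "20172"): 1, ("IT4857", "20172"): 1,
               ("IT4866", "20172"): 1}
for (subject_id, semester), size in group_sizes.items():
    for i in range(size):
        conn.execute("INSERT INTO enrollment VALUES (?, ?, ?)",
                     (f"2016{i:04d}", subject_id, semester))

# ">= ALL (subquery)" rewritten as ">= the largest count over all groups".
rows = conn.execute("""
    SELECT subject_id, semester, count(student_id)
    FROM enrollment
    GROUP BY subject_id, semester
    HAVING count(student_id) >= (SELECT MAX(cnt) FROM
              (SELECT count(student_id) AS cnt FROM enrollment
               GROUP BY subject_id, semester))
""").fetchall()
print(rows)
```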
1.6. Controlling the output: Eliminating Duplicates
• Remove duplicate tuples: DISTINCT
SELECT DISTINCT student_id FROM enrollment;
• UNION | INTERSECT | EXCEPT: remove duplicate rows
• UNION ALL | INTERSECT ALL | EXCEPT ALL: do not remove duplicate rows
1.6. Controlling the output: Eliminating Duplicates in an Aggregation
• Use DISTINCT inside aggregation
SELECT count(*) a,
count(distinct percentage_final_exam) b,
AVG(credit) c,
AVG(distinct credit) d
FROM subject;
subject
subject_id | name | credit | percentage_final_exam
IT1110 | Tin học đại cương | 4 | 60
IT3080 | Mạng máy tính | 3 | 70
IT3090 | Cơ sở dữ liệu | 3 | 70
IT4857 | Thị giác máy tính | 3 | 60
IT4866 | Học máy | 2 | 70
LI0001 | life's happy song | 5 |
LI0002 | %life's happy song 2 | 5 |
result
a | b | c | d
7 | 2 | 3.57 | 3.5
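A sketch of DISTINCT inside aggregates, assuming Python's sqlite3 module and the seven subject rows above, with the blank percentage cells taken as NULL. On these rows only 60 and 70 occur as non-NULL percentages, so b is 2; c averages all seven credits (25/7 ≈ 3.57) and d averages the distinct credits 4, 3, 2 and 5 (14/4 = 3.5).

```python
import sqlite3

SUBJECTS = [
    ("IT1110", "Tin học đại cương", 4, 60),
    ("IT3080", "Mạng máy tính", 3, 70),
    ("IT3090", "Cơ sở dữ liệu", 3, 70),
    ("IT4857", "Thị giác máy tính", 3, 60),
    ("IT4866", "Học máy", 2, 70),
    ("LI0001", "life's happy song", 5, None),      # assumed NULL percentage
    ("LI0002", "%life's happy song 2", 5, None),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subject (subject_id TEXT PRIMARY KEY, name TEXT, "
             "credit INTEGER, percentage_final_exam INTEGER)")
conn.executemany("INSERT INTO subject VALUES (?, ?, ?, ?)", SUBJECTS)

# DISTINCT removes duplicate values before the aggregate is computed.
a, b, c, d = conn.execute("""
    SELECT count(*), count(DISTINCT percentage_final_exam),
           AVG(credit), AVG(DISTINCT credit)
    FROM subject
""").fetchone()
print(a, b, round(c, 2), d)
```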
1.6. Controlling the output: Ordering results
• Syntax and operational semantics:
SELECT ...
FROM ...
[WHERE condition]
[GROUP BY expr [,expr]... ]
[HAVING …]
ORDER BY {expr|position} [ASC|DESC]
    [, {expr|position} [ASC|DESC]] ...
1.6. Controlling the output: Ordering results
• Example:
SELECT subject_id, semester, count(student_id)
FROM enrollment
GROUP BY subject_id, semester
ORDER BY semester,
count(student_id) DESC, subject_id;
result without ORDER BY (arbitrary order)
subject_id | semester | count
IT4857 | 20172 | 1
IT3090 | 20172 | 1
IT4866 | 20172 | 1
IT3080 | 20172 | 2
IT1110 | 20171 | 4
result
subject_id | semester | count
IT1110 | 20171 | 4
IT3080 | 20172 | 2
IT3090 | 20172 | 1
IT4857 | 20172 | 1
IT4866 | 20172 | 1
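A sketch of multi-key ordering, assuming Python's sqlite3 module and hypothetical enrollment rows generated to reproduce the group sizes above. Rows are sorted by semester first; ties are broken by the count in descending order, then by subject_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student_id TEXT, subject_id TEXT, semester TEXT)")

# Hypothetical enrollments reproducing the group sizes shown on the slide.
group_sizes = {("IT4857", "20172"): 1, ("IT3090", "20172"): 1,
               ("IT4866", "20172"): 1, ("IT3080", "20172"): 2,
               ("IT1110", "20171"): 4}
for (subject_id, semester), size in group_sizes.items():
    for i in range(size):
        conn.execute("INSERT INTO enrollment VALUES (?, ?, ?)",
                     (f"2016{i:04d}", subject_id, semester))

# Sort keys are applied left to right: semester ASC, count DESC, subject_id ASC.
rows = conn.execute("""
    SELECT subject_id, semester, count(student_id)
    FROM enrollment
    GROUP BY subject_id, semester
    ORDER BY semester, count(student_id) DESC, subject_id
""").fetchall()
for row in rows:
    print(row)
```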
Remark
• Complex queries:
  • The clauses of an SQL statement are not interchangeable
  • A statement that executes successfully does not necessarily produce the correct result
  • A query that gives the correct result on the current data may still not be the correct query for the requirement
  • Be careful with "natural join"
Quiz 1.
Question: What does the following SQL statement return?
SELECT * FROM student WHERE (1=0);
A. An empty relation with the same structure as "student"
B. A relation with the same structure and data as "student"
C. The query raises an error
Answer: A
Feedback: The expression (1=0) is false, so no tuple of student satisfies the condition. The result relation contains no tuples.
Quiz 2.
Question: Must we always have join conditions if there is more than one relation in the FROM clause?
A. Yes
B. No
Answer: B
Feedback: No. Without a join condition the result is the cross join (Cartesian product) of the relations, but the product by itself is rarely a useful operation.
Quiz 3.
Question: Can we move the condition in the HAVING clause into the WHERE clause?
A. Sometimes yes
B. No, never
C. Yes, we can
Answer: A
Feedback: Conditions in the HAVING clause and in the WHERE clause do not have the same meaning. Conditions in the HAVING clause apply to groups as a whole; conditions in the WHERE clause apply to individual tuples.
- If a condition in the HAVING clause refers only to grouping attributes, it can be placed in the WHERE clause.
- If a condition in the HAVING clause refers to aggregated attributes, it cannot be moved to the WHERE clause.
Quiz 4.
Question: What does the following SQL statement return?
SELECT student_id FROM enrollment
WHERE subject_id = 'IT3090' AND subject_id = 'IT4859'
A. Empty relation
B. List of student_ids that have enrolled in both subjects IT3090 and IT4859
C. List of student_ids that have enrolled in at least one of the subjects IT3090 and IT4859
Answer: A
Feedback: A single tuple cannot have subject_id equal to both 'IT3090' and 'IT4859', so the condition in the WHERE clause is never true.
Summary
• Data manipulation (part 2)
  • Join operators
  • Subqueries: in the FROM clause and in the WHERE clause
  • Aggregation operators
  • Grouping and aggregation in SQL, conditions in the HAVING clause
  • Controlling the output: duplicate elimination, ordering the result
Thank you for your attention!