Lecture9 Fall

The document discusses SQL concepts like grouping, having, subqueries, and scalar subqueries. It provides examples of how to use these SQL features to solve different types of problems involving aggregation, filtering groups, and comparing values to query results.

Uploaded by Faruk Karagoz
Copyright © All Rights Reserved

CSE 412 Database Management

Lecture 9
SQL
Jia Zou
Arizona State University

Practice
Problem 15 (2 points): Please return the names of all parts that are supplied from both the nation with s_nationkey 6 and the nation with s_nationkey 12.

Main Ideas:
1. First, link each supplying relationship (PartSupp) to the nation key of its supplier by joining PartSupp with Supplier. We temporarily call the result PartSuppNationKey:
PartSupp(ps_partkey, ps_suppkey, …) ⨝ ps_suppkey = s_suppkey Supplier(s_suppkey, s_nationkey, …)
→ PartSuppNationKey(ps_partkey, ps_suppkey, s_nationkey, …)

2. The problem can then be solved by a self-join over PartSuppNationKey. Why? Each row of PartSuppNationKey records a single supplier, and therefore a single nation, for a part. To see that the same part is supplied from two nations, we must pair up two rows that refer to the same part, so we join two copies of the table, PSN1 and PSN2:
PartSuppNationKey (PSN1) ⨝ PSN1.ps_partkey = PSN2.ps_partkey PartSuppNationKey (PSN2)
For example, if PartSuppNationKey contains

ps_partkey  ps_suppkey  s_nationkey
123         5           6
123         81          12

then the self-join outputs

ps_partkey  PSN1.ps_suppkey  PSN1.s_nationkey  PSN2.ps_suppkey  PSN2.s_nationkey
123         5                6                 5                6
123         5                6                 81               12
123         81               12                5                6
123         81               12                81               12

3. A selection on the output,
σ PSN1.s_nationkey = 6 ∧ PSN2.s_nationkey = 12
keeps only the row (123, 5, 6, 81, 12) and solves the problem; joining the remaining ps_partkey values with Part (on p_partkey) then yields the part names (p_name).
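The three steps can be tried end to end with the following minimal sketch. It assumes Python's built-in sqlite3 module in place of the course's PostgreSQL database, creates only the columns it needs, and fills them with made-up rows: part 123 is supplied from nations 6 and 12, part 200 only from nation 6. The part names 'bolt' and 'nut' are invented.

```python
import sqlite3

# In-memory SQLite database standing in for the TPC-H tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Part(p_partkey INTEGER PRIMARY KEY, p_name TEXT);
CREATE TABLE Supplier(s_suppkey INTEGER PRIMARY KEY, s_nationkey INTEGER);
CREATE TABLE PartSupp(ps_partkey INTEGER, ps_suppkey INTEGER);
INSERT INTO Part VALUES (123, 'bolt'), (200, 'nut');
INSERT INTO Supplier VALUES (5, 6), (81, 12), (90, 6);
INSERT INTO PartSupp VALUES (123, 5), (123, 81), (200, 5), (200, 90);
""")

query = """
WITH PartSuppNationKey AS (            -- step 1: attach each supplier's nation
    SELECT ps_partkey, ps_suppkey, s_nationkey
    FROM PartSupp JOIN Supplier ON ps_suppkey = s_suppkey
)
SELECT DISTINCT p_name
FROM PartSuppNationKey AS PSN1
JOIN PartSuppNationKey AS PSN2
  ON PSN1.ps_partkey = PSN2.ps_partkey             -- step 2: self-join on the part
JOIN Part ON p_partkey = PSN1.ps_partkey           -- look up the part name
WHERE PSN1.s_nationkey = 6 AND PSN2.s_nationkey = 12   -- step 3: selection
"""
names = [row[0] for row in conn.execute(query)]
print(names)  # ['bolt']
```

Part 200 drops out because none of its PSN2 rows has nation 12. DISTINCT guards against duplicates when a part has several suppliers in the same nation.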

Review
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries

Grouping
• SELECT … FROM … WHERE …
GROUP BY list_of_columns

• Example: compute average popularity for each age


SELECT age, AVG(pop)
FROM User
GROUP BY age

Having
• Used to filter groups based on the group properties (e.g., aggregate
values, GROUP BY column values)
• SELECT … FROM … WHERE … GROUP BY …
HAVING condition;

The difference between HAVING and WHERE
• 1. Use HAVING to filter groups based on the group properties (e.g.,
aggregate values, GROUP BY column values)
• 2. Use WHERE to filter tuples based on any attributes in the table(s) in
the FROM clause
• 3. If the attribute to be filtered happens to be the GROUP BY attribute,
we can use both HAVING and WHERE
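A small sketch puts GROUP BY, HAVING, and WHERE side by side. It again uses sqlite3 as a stand-in DBMS, with a hypothetical User(uid, name, age, pop) table and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
INSERT INTO User VALUES
  (1, 'Bart', 10, 0.25), (2, 'Milhouse', 10, 0.75),
  (3, 'Lisa', 8, 0.5), (4, 'Homer', 36, 0.5), (5, 'Marge', 36, 1.0);
""")

# One output row per age group.
avg_by_age = conn.execute(
    "SELECT age, AVG(pop) FROM User GROUP BY age ORDER BY age").fetchall()

# HAVING filters whole groups on an aggregate; WHERE cannot do this,
# because AVG(pop) only exists once the groups have been formed.
popular_ages = conn.execute(
    "SELECT age FROM User GROUP BY age HAVING AVG(pop) > 0.6 ORDER BY age").fetchall()

# Filtering on the GROUP BY column itself works with either clause.
via_where = conn.execute(
    "SELECT age, COUNT(*) FROM User WHERE age > 9 GROUP BY age ORDER BY age").fetchall()
via_having = conn.execute(
    "SELECT age, COUNT(*) FROM User GROUP BY age HAVING age > 9 ORDER BY age").fetchall()

print(avg_by_age)    # [(8, 0.5), (10, 0.5), (36, 0.75)]
print(popular_ages)  # [(36,)]
print(via_where)     # [(10, 2), (36, 2)]
```

When both clauses can express the filter, WHERE discards rows before grouping, which typically means less work for the aggregation step.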

Table subqueries
• Use query result as a table
• In set operations, FROM clauses, SELECT list, WHERE clause etc.
• A way to “nest” queries
• Example: names of users who poked others but never got poked
SELECT DISTINCT name
FROM User, ((SELECT uid1 AS uid FROM Poke) EXCEPT (SELECT uid2 AS uid
FROM Poke)) AS T
WHERE User.uid = T.uid;
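The table subquery above can be run with the following sketch. It assumes sqlite3 and invented User/Poke rows, where Poke(uid1, uid2) means "uid1 poked uid2". SQLite does not accept parentheses around the operands of EXCEPT, so they are dropped. The derived table T still holds "users who poked" minus "users who were poked".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Poke(uid1 INTEGER, uid2 INTEGER);
INSERT INTO User VALUES (1, 'Bart'), (2, 'Lisa'), (3, 'Milhouse');
INSERT INTO Poke VALUES (1, 2), (2, 3), (3, 2);
""")

rows = conn.execute("""
SELECT DISTINCT name
FROM User, (SELECT uid1 AS uid FROM Poke        -- everyone who poked
            EXCEPT
            SELECT uid2 AS uid FROM Poke) AS T  -- minus everyone who got poked
WHERE User.uid = T.uid
""").fetchall()
print(rows)  # [('Bart',)]
```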

IN subqueries
• x IN (subquery) checks if x is in the result of subquery
• Example: users at the same age as (some) Bart
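The slide's query is not reproduced here. The following is one way to write this IN example, sketched with sqlite3 and invented rows. Two users are named Bart, which is why the example says "(some) Bart": the subquery can return several ages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO User VALUES
  (1, 'Bart', 10), (2, 'Milhouse', 10), (3, 'Lisa', 8),
  (4, 'Bart', 36), (5, 'Homer', 36), (6, 'Maggie', 1);
""")

# The subquery returns the age of every Bart (10 and 36 here);
# IN keeps users whose age equals any of those values.
rows = conn.execute("""
SELECT uid, name FROM User
WHERE age IN (SELECT age FROM User WHERE name = 'Bart')
ORDER BY uid
""").fetchall()
print(rows)  # [(1, 'Bart'), (2, 'Milhouse'), (4, 'Bart'), (5, 'Homer')]
```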

Exists subqueries
• EXISTS (subquery) checks whether the result of subquery is non-empty
• Example: users at the same age as (some) Bart

• This happens to be a correlated subquery: a subquery that references tuple variables in surrounding queries

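Here is the same question written with EXISTS, as a sketch on sqlite3 with invented rows. The alias b is local to the subquery. The alias u belongs to the outer query, which is what makes the subquery correlated. The result matches the IN version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO User VALUES
  (1, 'Bart', 10), (2, 'Milhouse', 10), (3, 'Lisa', 8),
  (4, 'Bart', 36), (5, 'Homer', 36), (6, 'Maggie', 1);
""")

# Conceptually, the subquery is re-evaluated for every outer row u.
rows = conn.execute("""
SELECT u.uid, u.name
FROM User AS u
WHERE EXISTS (SELECT * FROM User AS b
              WHERE b.name = 'Bart' AND b.age = u.age)
ORDER BY u.uid
""").fetchall()
print(rows)  # [(1, 'Bart'), (2, 'Milhouse'), (4, 'Bart'), (5, 'Homer')]
```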
More examples
• Which users are the most popular?
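The slide's answer is not shown. One common formulation: a user is most popular if no other user has a strictly higher pop. In PostgreSQL this can also be written WHERE pop >= ALL (SELECT pop FROM User). SQLite, used here as a stand-in with invented rows, has no ALL/SOME subquery comparisons, so the sketch uses NOT EXISTS instead. The two forms agree when pop has no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, pop REAL);
INSERT INTO User VALUES
  (1, 'Bart', 0.25), (2, 'Lisa', 0.9), (3, 'Homer', 0.9), (4, 'Maggie', 0.5);
""")

# Keep u only if no user v is strictly more popular than u.
most_popular = conn.execute("""
SELECT uid, name FROM User AS u
WHERE NOT EXISTS (SELECT * FROM User AS v WHERE v.pop > u.pop)
ORDER BY uid
""").fetchall()
print(most_popular)  # [(2, 'Lisa'), (3, 'Homer')]
```

Ties are kept: both users with pop 0.9 are returned.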

More examples
• Which users are not the least popular?
• SELECT *
FROM User
WHERE pop > SOME (SELECT pop FROM User);
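"Not the least popular" means "strictly more popular than at least one user", hence > SOME rather than >= SOME. The sketch below checks this on invented data, emulating SOME with EXISTS because SQLite (the stand-in DBMS here) lacks SOME/ALL comparisons. With >=, every user qualifies, since each user can be compared with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, pop REAL);
INSERT INTO User VALUES
  (1, 'Bart', 0.25), (2, 'Lisa', 0.9), (3, 'Homer', 0.9), (4, 'Maggie', 0.5);
""")

# pop > SOME (SELECT pop FROM User): greater than at least one pop value.
not_least = conn.execute("""
SELECT uid, name FROM User AS u
WHERE EXISTS (SELECT * FROM User AS v WHERE u.pop > v.pop)
ORDER BY uid
""").fetchall()

# The >= variant is satisfied by every row (v = u always works).
with_ge = conn.execute("""
SELECT uid FROM User AS u
WHERE EXISTS (SELECT * FROM User AS v WHERE u.pop >= v.pop)
""").fetchall()

print(not_least)     # [(2, 'Lisa'), (3, 'Homer'), (4, 'Maggie')]
print(len(with_ge))  # 4
```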

More examples
• Which part has the highest retail price? (Return the partkey)
• The first approach works, but it is slow… Is there a better approach?
• The second approach is super fast, at the millisecond level
• It uses what is called a scalar subquery!

25
Scalar subqueries
• A query that returns a single row with a single column (i.e., a single value) can be used as a value in WHERE, SELECT, etc.
• Example: users at the same age as Bart

• Runtime error if the subquery returns more than one row
• Under what condition will this error never occur?
• What if the subquery returns no rows?
• The result is treated as the special value NULL, and a comparison with NULL is never true, so no row passes it

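Below is a sketch of the scalar-subquery example and of the "no rows" case, run on sqlite3 with invented rows. One difference to keep in mind: PostgreSQL raises an error when a scalar subquery returns several rows, but SQLite silently uses the first row. For that reason each name is unique in this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO User VALUES (1, 'Bart', 10), (2, 'Milhouse', 10), (3, 'Lisa', 8);
""")

# The subquery must produce exactly one value: Bart's age.
same_age = conn.execute("""
SELECT uid, name FROM User
WHERE age = (SELECT age FROM User WHERE name = 'Bart')
ORDER BY uid
""").fetchall()

# An empty scalar subquery evaluates to NULL ...
empty_value = conn.execute(
    "SELECT (SELECT age FROM User WHERE name = 'Nobody')").fetchone()

# ... so "age = NULL" is UNKNOWN for every row and nothing is returned.
no_match = conn.execute("""
SELECT uid FROM User
WHERE age = (SELECT age FROM User WHERE name = 'Nobody')
""").fetchall()

print(same_age)     # [(1, 'Bart'), (2, 'Milhouse')]
print(empty_value)  # (None,)
print(no_match)     # []
```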
Scalar subqueries Example
• Return the names of customers whose account balance (c_acctbal) is above average (larger than the average account balance of all customers).

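One possible solution uses an uncorrelated scalar subquery, so the average only needs to be computed once. The sketch runs it on sqlite3 with a cut-down Customer table; the customer names and balances are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer(c_custkey INTEGER PRIMARY KEY, c_name TEXT, c_acctbal REAL);
INSERT INTO Customer VALUES
  (1, 'Customer#1', 100.0), (2, 'Customer#2', 500.0),
  (3, 'Customer#3', 900.0), (4, 'Customer#4', 300.0);
""")

# Average balance here is 1800 / 4 = 450.0.
above_avg = [name for (name,) in conn.execute("""
SELECT c_name FROM Customer
WHERE c_acctbal > (SELECT AVG(c_acctbal) FROM Customer)
ORDER BY c_custkey
""")]
print(above_avg)  # ['Customer#2', 'Customer#3']
```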
Scalar subqueries Example
• For each customer, return his/her name and the difference between his/her account balance and the average account balance of all customers.

Output columns: CName, AccountBal - AVGBal

Select in WITH
• The basic value of SELECT in WITH is to break down complicated
queries into simpler parts. An example is:

More Examples
• For each customer, return his/her name and the difference between his/her account balance and the average account balance of all customers.

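The following sketch writes this query two ways, with a scalar subquery in the SELECT list and with WITH, and checks that they agree. It assumes sqlite3 and invented Customer rows. The CTE name AvgBal and its column value are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer(c_custkey INTEGER PRIMARY KEY, c_name TEXT, c_acctbal REAL);
INSERT INTO Customer VALUES
  (1, 'Customer#1', 100.0), (2, 'Customer#2', 500.0),
  (3, 'Customer#3', 900.0), (4, 'Customer#4', 300.0);
""")

# Version 1: scalar subquery directly in the SELECT list.
scalar_version = conn.execute("""
SELECT c_name AS CName,
       c_acctbal - (SELECT AVG(c_acctbal) FROM Customer) AS diff
FROM Customer ORDER BY c_custkey
""").fetchall()

# Version 2: WITH names the one-row average table, which is then
# joined with every customer.
with_version = conn.execute("""
WITH AvgBal(value) AS (SELECT AVG(c_acctbal) FROM Customer)
SELECT c_name AS CName, c_acctbal - AvgBal.value AS diff
FROM Customer, AvgBal ORDER BY c_custkey
""").fetchall()

print(with_version)
# [('Customer#1', -350.0), ('Customer#2', 50.0), ('Customer#3', 450.0), ('Customer#4', -150.0)]
```

The WITH form pays off when the intermediate result is reused or when the subquery is long enough that naming it makes the main query easier to read.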
Today’s Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages

INSERT
• Insert one row
• INSERT INTO Member VALUES (789, 'dps');
• User 789 joins Dead Putting Society
• Insert the result of a query
• INSERT INTO Member
(SELECT uid, 'dps'
FROM User
WHERE uid NOT IN
(SELECT uid FROM Member WHERE gid = 'dps'));
• Everybody joins Dead Putting Society!
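Both INSERT forms can be exercised on sqlite3 with a hypothetical Member(uid, gid) table and invented users. The SELECT is not parenthesized here. As far as we know, SQLite reads "Member (" as the start of a column list and would reject the parenthesized form that PostgreSQL accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Member(uid INTEGER, gid TEXT);
INSERT INTO User VALUES (142, 'Bart'), (456, 'Lisa'), (789, 'Milhouse');
INSERT INTO Member VALUES (142, 'dps');
""")

# Insert one row: user 789 joins Dead Putting Society.
conn.execute("INSERT INTO Member VALUES (789, 'dps')")

# Insert the result of a query: everyone not yet in 'dps' joins.
conn.execute("""
INSERT INTO Member
SELECT uid, 'dps' FROM User
WHERE uid NOT IN (SELECT uid FROM Member WHERE gid = 'dps')
""")
conn.commit()

dps = [uid for (uid,) in conn.execute(
    "SELECT uid FROM Member WHERE gid = 'dps' ORDER BY uid")]
print(dps)  # [142, 456, 789]
```

The NOT IN filter is what prevents duplicate memberships for users 142 and 789.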

Update
• Example: User 142 changes name to “Barney”
• UPDATE User
SET name = 'Barney'
WHERE uid = 142;
• Example: We are all popular!
• UPDATE User
SET pop = (SELECT AVG(pop) FROM User);
• But won't updating every row cause the average pop to change?
➢ No: the subquery is always computed over the old table

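The "old table" rule can be observed with the following sketch, which assumes sqlite3 and invented pops of 0.25, 0.5, and 0.75 (average 0.5). If the average were recomputed after each row changed, the second row would get 0.5833… instead. Every row ending at 0.5 shows that the subquery saw the table as it was before the UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, pop REAL);
INSERT INTO User VALUES (142, 'Bart', 0.25), (456, 'Lisa', 0.5), (789, 'Milhouse', 0.75);
""")

conn.execute("UPDATE User SET name = 'Barney' WHERE uid = 142")
# The uncorrelated subquery is evaluated against the pre-update contents.
conn.execute("UPDATE User SET pop = (SELECT AVG(pop) FROM User)")
conn.commit()

rows = conn.execute("SELECT uid, name, pop FROM User ORDER BY uid").fetchall()
print(rows)  # [(142, 'Barney', 0.5), (456, 'Lisa', 0.5), (789, 'Milhouse', 0.5)]
```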
Delete
• Delete everything from a table
• DELETE FROM Member;
• Delete according to a WHERE condition
• Example: User 789 leaves Dead Putting Society
• DELETE FROM Member
WHERE uid = 789 AND gid = 'dps';
• Example: Users under age 18 must be removed from United Nuclear
Workers
• DELETE FROM Member
WHERE uid IN
(SELECT uid FROM User WHERE age < 18)
AND gid = 'nuk';
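Both DELETE examples can be run together on sqlite3. Users, ages, and memberships are invented; only members under 18 lose their 'nuk' membership, and their other groups are untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE Member(uid INTEGER, gid TEXT);
INSERT INTO User VALUES (1, 15), (2, 30), (3, 17), (789, 40);
INSERT INTO Member VALUES
  (1, 'nuk'), (2, 'nuk'), (3, 'nuk'), (1, 'dps'), (789, 'dps');
""")

# User 789 leaves Dead Putting Society.
conn.execute("DELETE FROM Member WHERE uid = 789 AND gid = 'dps'")
# Minors are removed from United Nuclear Workers only.
conn.execute("""
DELETE FROM Member
WHERE uid IN (SELECT uid FROM User WHERE age < 18)
  AND gid = 'nuk'
""")
conn.commit()

remaining = conn.execute("SELECT uid, gid FROM Member ORDER BY gid, uid").fetchall()
print(remaining)  # [(1, 'dps'), (2, 'nuk')]
```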

Example

Import from a CSV file or text file
• Much faster than executing a large batch of insertion statements

Review
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Modification Queries

More Practice
• Please return the names of all suppliers who supply the largest number of parts

• SubQuery 1: supplier_numParts counts the distinct parts supplied by each supplier
• SubQuery 2: maxNumParts finds the largest of those counts
• Final Query: return the suppliers whose count equals that maximum
WITH supplier_numParts(supplierName, numParts) AS
(SELECT s_name, COUNT(DISTINCT p_name) FROM Part, Supplier, PartSupp
WHERE p_partkey = ps_partkey AND s_suppkey = ps_suppkey
GROUP BY s_name),
maxNumParts(value) AS (SELECT MAX(numParts) FROM
supplier_numParts)
SELECT supplierName
FROM supplier_numParts, maxNumParts
WHERE numParts = maxNumParts.value;

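The final query can be checked on sqlite3 with a few invented rows in which two suppliers tie. The GROUP BY in SubQuery 1 matters. Without it, PostgreSQL rejects the query because s_name is neither grouped nor aggregated, while SQLite would silently collapse everything into one row. The CTEs are also separated by a comma.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Part(p_partkey INTEGER PRIMARY KEY, p_name TEXT);
CREATE TABLE Supplier(s_suppkey INTEGER PRIMARY KEY, s_name TEXT);
CREATE TABLE PartSupp(ps_partkey INTEGER, ps_suppkey INTEGER);
INSERT INTO Part VALUES (10, 'bolt'), (11, 'nut'), (12, 'gear');
INSERT INTO Supplier VALUES (1, 'S1'), (2, 'S2'), (3, 'S3');
-- S1 and S2 each supply two parts (a tie); S3 supplies one
INSERT INTO PartSupp VALUES (10, 1), (11, 1), (10, 2), (12, 2), (10, 3);
""")

query = """
WITH supplier_numParts(supplierName, numParts) AS
  (SELECT s_name, COUNT(DISTINCT p_name)
   FROM Part, Supplier, PartSupp
   WHERE p_partkey = ps_partkey AND s_suppkey = ps_suppkey
   GROUP BY s_name),
maxNumParts(value) AS
  (SELECT MAX(numParts) FROM supplier_numParts)
SELECT supplierName
FROM supplier_numParts, maxNumParts
WHERE numParts = maxNumParts.value
ORDER BY supplierName
"""
top = [name for (name,) in conn.execute(query)]
print(top)  # ['S1', 'S2']
```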
Intermediate and Advanced
SQL
Starting from Lecture 13 in Part 2 of the course

Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages
• Views

Views
• A view is like a “virtual” table
• Defined by a query, which describes how to compute the view contents on
the fly
• DBMS stores the view definition query instead of view contents
• Can be used in queries just like a regular table

Creating and dropping views
• Example: members of Jessica’s Circle
• CREATE VIEW JessicaCircle AS
SELECT * FROM User
WHERE uid IN (SELECT uid FROM Member
WHERE gid = 'jes');
• Tables used in defining a view are called “base tables”
• User and Member above
• To drop a view
• DROP VIEW JessicaCircle;

Using views in queries
• Example: find the average popularity of members in Jessica’s Circle
• SELECT AVG(pop)
FROM JessicaCircle;
• To process the query, replace the reference to the view by its definition
• SELECT AVG(pop)
FROM (SELECT * FROM User
WHERE uid IN
(SELECT uid FROM Member WHERE gid = 'jes'))
AS JessicaCircle;

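Creating, querying, and dropping the view can be tried on sqlite3 with invented rows. The sketch also checks that querying the view gives the same answer as substituting its definition by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, pop REAL);
CREATE TABLE Member(uid INTEGER, gid TEXT);
INSERT INTO User VALUES (1, 'Jessica', 0.25), (2, 'Bart', 0.75), (3, 'Homer', 1.0);
INSERT INTO Member VALUES (1, 'jes'), (2, 'jes'), (3, 'dps');
CREATE VIEW JessicaCircle AS
  SELECT * FROM User
  WHERE uid IN (SELECT uid FROM Member WHERE gid = 'jes');
""")

# Query the view like a regular table ...
via_view = conn.execute("SELECT AVG(pop) FROM JessicaCircle").fetchone()[0]

# ... which is equivalent to replacing the view by its definition.
expanded = conn.execute("""
SELECT AVG(pop)
FROM (SELECT * FROM User
      WHERE uid IN (SELECT uid FROM Member WHERE gid = 'jes')) AS JessicaCircle
""").fetchone()[0]

conn.execute("DROP VIEW JessicaCircle")
print(via_view, expanded)  # 0.5 0.5
```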
Why use views
• To hide data from users
• To hide complexity from users
• Logical data independence
• If applications deal with views, we can change the underlying schema without
affecting applications
• Recall physical data independence: change the physical organization of data
without affecting applications
• To provide a uniform interface for different implementations or
sources
➢ Real database applications use tons of views

Modifying views
• Does it even make sense, since views are virtual?
• It does make sense if we want users to really see views as tables
• Goal: modify the base tables such that the modification would appear
to have been accomplished on the view

A simple case
• CREATE VIEW UserPop AS
SELECT uid, pop FROM User;

DELETE FROM UserPop WHERE uid = 123;

translates to:

DELETE FROM User WHERE uid = 123;

An impossible case
• CREATE VIEW PopularUser AS
SELECT uid, pop FROM User
WHERE pop >= 0.8;

INSERT INTO PopularUser VALUES (987, 0.3);
• No matter what we do on User, the inserted row will not be in
PopularUser

A case with too many possibilities
• CREATE VIEW AveragePop(pop) AS
SELECT AVG(pop) FROM User;
• Note that you can rename columns in a view definition

UPDATE AveragePop SET pop = 0.5;


• Set everybody’s pop to 0.5?
• Adjust everybody’s pop by the same amount?
• Just lower Jessica’s pop?

SQL-92 updatable views
• More or less just single-table selection queries
• No join
• No aggregation
• No subqueries
• Arguably somewhat restrictive
• Still might get it wrong in some cases
• See the slide titled “An impossible case”
• Adding WITH CHECK OPTION to the end of the view definition makes the DBMS reject such modifications

Example

Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages
• Views
• Integrity Constraints

Primary Keys
• Single-column primary key:

• Multi-column primary key:
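The slide's own examples are not reproduced here. The following sketch declares both kinds of primary key on sqlite3, using hypothetical Student and Enrolled tables, and shows which inserts the DBMS refuses. A multi-column key only forbids repeating the whole (sid, cid) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- single-column primary key
CREATE TABLE Student(sid INTEGER PRIMARY KEY, name TEXT);
-- multi-column primary key: a student enrolls in a course at most once
CREATE TABLE Enrolled(sid INTEGER, cid TEXT, grade TEXT,
                      PRIMARY KEY (sid, cid));
INSERT INTO Student VALUES (1, 'Ann');
INSERT INTO Enrolled VALUES (1, 'CSE412', 'A');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_student = rejected("INSERT INTO Student VALUES (1, 'Bob')")            # same sid
dup_enroll = rejected("INSERT INTO Enrolled VALUES (1, 'CSE412', 'B')")    # same (sid, cid)
other_course = rejected("INSERT INTO Enrolled VALUES (1, 'CSE310', 'B')")  # new pair
print(dup_student, dup_enroll, other_course)  # True True False
```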

Foreign Key References
• Single-column reference:

• Multi-column reference:

Foreign Key References
• You can define what happens when the parent table is
modified:
• CASCADE (propagate the change: delete or update the referencing rows as well)
• NO ACTION
• SET NULL
• SET DEFAULT

Foreign Key References
• Delete/update the enrollment information when a student is changed:
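The slide's declaration is not shown. A plausible version declares the foreign key with ON DELETE CASCADE ON UPDATE CASCADE, sketched here on sqlite3 with hypothetical Student and Enrolled tables. SQLite leaves foreign-key enforcement off by default, so it is switched on first. A dangling reference is still rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.executescript("""
CREATE TABLE Student(sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Enrolled(
  sid INTEGER REFERENCES Student(sid) ON DELETE CASCADE ON UPDATE CASCADE,
  cid TEXT,
  PRIMARY KEY (sid, cid));
INSERT INTO Student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Enrolled VALUES (1, 'CSE412'), (2, 'CSE412'), (2, 'CSE310');
""")

conn.execute("DELETE FROM Student WHERE sid = 1")          # Ann's enrollment is deleted too
conn.execute("UPDATE Student SET sid = 20 WHERE sid = 2")  # Bob's rows follow the new sid
conn.commit()
rows = conn.execute("SELECT sid, cid FROM Enrolled ORDER BY cid").fetchall()

try:
    conn.execute("INSERT INTO Enrolled VALUES (99, 'CSE412')")  # no student 99
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

print(rows, dangling_rejected)  # [(20, 'CSE310'), (20, 'CSE412')] True
```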

Value Constraints
• Ensure one-and-only-one value exists:

• Make sure a value is not null:
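The slide's examples are not shown. Assuming they refer to UNIQUE and NOT NULL, the sketch below tries both on sqlite3 with a hypothetical Account table. One detail is easy to miss: UNIQUE treats NULLs as distinct (also PostgreSQL's default), so several rows may lack an email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Account(
  uid   INTEGER PRIMARY KEY,
  email TEXT UNIQUE,      -- no two rows may share a non-NULL email
  name  TEXT NOT NULL     -- every row must have a name
)""")
conn.execute("INSERT INTO Account VALUES (1, 'a@x.org', 'Ann')")

def rejected(sql):
    """Return True if the insert violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_email = rejected("INSERT INTO Account VALUES (2, 'a@x.org', 'Bob')")
null_name = rejected("INSERT INTO Account VALUES (3, 'c@x.org', NULL)")
two_nulls = (rejected("INSERT INTO Account VALUES (4, NULL, 'Dan')") or
             rejected("INSERT INTO Account VALUES (5, NULL, 'Eve')"))
print(dup_email, null_name, two_nulls)  # True True False
```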

Example

General assertion
• CREATE ASSERTION assertion_name
CHECK assertion_condition;
• assertion_condition is checked for each modification that could potentially
violate it
• Example: Member.uid references User.uid
• CREATE ASSERTION MemberUserRefIntegrity
CHECK (NOT EXISTS
(SELECT * FROM Member
WHERE uid NOT IN
(SELECT uid FROM User)));
➢ In SQL3, but not all (perhaps no) DBMSs support it

Tuple- and Attribute- based Checks
• Associated with a single table
• Only checked when a tuple/attribute is inserted/updated
• Reject if condition evaluates to FALSE
• TRUE and UNKNOWN are fine
• Examples:
• CREATE TABLE User(... age INTEGER CHECK(age IS NULL OR age > 0), ...);
• CREATE TABLE Member
(uid INTEGER NOT NULL,
CHECK(uid IN (SELECT uid FROM User)), ...);
• Is it a referential integrity constraint?
• Not quite; not checked when User is modified

Example

CREATE TABLE products (


product_no integer,
name text,
price numeric,
CHECK (price > 0),
discounted_price numeric,
CHECK (discounted_price > 0),
CHECK (price > discounted_price) );
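The products table can be exercised on sqlite3 to see the "reject only if FALSE" rule in action. In this sketch the CHECK clauses are grouped after the column definitions, the form SQLite's documented grammar uses. The product rows are invented. A NULL price makes two checks UNKNOWN, and the row is still accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE products (
    product_no integer,
    name text,
    price numeric,
    discounted_price numeric,
    CHECK (price > 0),
    CHECK (discounted_price > 0),
    CHECK (price > discounted_price)
)""")

def accepted(row):
    """Try an insert; a failed CHECK raises IntegrityError."""
    try:
        conn.execute("INSERT INTO products VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok = accepted((1, 'pen', 10, 8))         # all checks TRUE
bad = accepted((2, 'ink', 10, 12))       # price > discounted_price is FALSE
unknown = accepted((3, 'pad', None, 5))  # NULL price: two checks are UNKNOWN
print(ok, bad, unknown)  # True False True
```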

Integrity Constraints

https://www.postgresql.org/docs/12/ddl-constraints.html

Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages
• Views
• Integrity Constraints
• Indexes

Indexes
• An index is an auxiliary persistent data structure
• Search tree (e.g., B+-tree), lookup table (e.g., hash table), etc.
➢ More on indexes later in this course!
• An index on R.A can speed up accesses of the form
• R.A = value
• R.A > value (sometimes; depending on the index type)
• An index on (R.A_1, …, R.A_n) can speed up
• R.A_1 = value_1 ∧ … ∧ R.A_n = value_n
• (R.A_1, …, R.A_n) > (value_1, …, value_n) (again, depends on the index type)
➢ Ordering of index columns is important: is an index on (R.A, R.B) equivalent to one on (R.B, R.A)?
➢ How about an index on R.A plus another on R.B?

Examples of using indexes
• SELECT * FROM User WHERE name = 'Bart';
• Without an index on User.name: must scan the entire table if we store User as a flat
file of unordered rows
• With an index: go “directly” to rows with name = 'Bart'
• SELECT * FROM User, Member
WHERE User.uid = Member.uid
AND Member.gid = 'jes';
• With an index on Member.gid or (gid, uid): find relevant Member rows directly
• With an index on User.uid: for each relevant Member row, directly look up User rows
with matching uid
• Without it: for each Member row, scan the entire User table for matching uid
• Sorting could help

Creating and dropping indexes in SQL
CREATE [UNIQUE] INDEX indexname ON
tablename (columnname_1, …, columnname_n);
• With UNIQUE, the DBMS will also enforce that
(columnname_1, …, columnname_n) is a key of tablename
DROP INDEX indexname;

• Typically, the DBMS will automatically create indexes for PRIMARY KEY
and UNIQUE constraint declarations

Choosing indexes to create
• More indexes = better performance?
• Indexes take space
• Indexes need to be maintained when data is updated
• Indexes have one more level of indirection
➢ Optimal index selection depends on both the query and update workload
and the size of tables
• Automatic index selection is now featured in some commercial DBMSs

Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages
• Views
• Integrity Constraints
• Indexes
• Discretionary Access Control

GRANT Command
• GRANT privileges ON object TO users [WITH GRANT OPTION]
• The following privileges can be specified:
• SELECT: Can read all columns (including those added later via ALTER TABLE command).
• INSERT(col-name): Can insert tuples with non-null or non-default values in this column.
• INSERT (without a column list) means the same right with respect to all columns.
• UPDATE(col-name): similar to INSERT
• DELETE: Can delete tuples.
• REFERENCES(col-name): Can define foreign keys (in other tables) that refer to this column.
• Object can be a table or a view
• Users can be individual users or roles
• If a user has a privilege with the GRANT OPTION, he/she can pass the privilege on to other users (with or without passing on the GRANT OPTION).
• Only the owner can execute CREATE, ALTER, and DROP.

Revoke Command
• REVOKE privileges ON object FROM users [CASCADE]
• When a privilege is revoked from X and CASCADE is specified, it is also revoked from all users who got it solely from X.

Examples: GRANT and REVOKE of Privileges
• GRANT INSERT, SELECT ON Sailors TO Horatio
• Horatio can query Sailors or insert tuples into it.
• GRANT DELETE ON Sailors TO Yuppy WITH GRANT OPTION
• Yuppy can delete tuples, and also authorize others to do so.
• GRANT UPDATE (rating) ON Sailors TO Dustin
• Dustin can update (only) the rating field of Sailors tuples.
• GRANT SELECT ON ActiveSailors TO Guppy, Yuppy
• This does NOT allow the ‘uppies to query Sailors directly!
• REVOKE SELECT ON Sailors FROM Yuppy CASCADE;
• This will revoke the authorization for querying Sailors from Yuppy and all users who got this privilege solely from Yuppy

Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
• Nested Queries
• Data Modification Languages
• Views
• Integrity Constraints
• Indexes
• Discretionary Access Control
• Programming Interfaces

Working with SQL through an API
• E.g.: Python psycopg2, JDBC, ODBC (C/C++/VB)
• All based on the SQL/CLI (Call-Level Interface) standard
• The application program sends SQL commands to the DBMS at
runtime
• Responses/results are converted to objects in the application
program

• psycopg2: https://pypi.org/project/psycopg2/

Example API: Python psycopg2

More psycopg2 examples
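The slides' psycopg2 code is not reproduced here. Because psycopg2 and Python's built-in sqlite3 both follow the Python DB-API 2.0 (PEP 249), the same connect / cursor / execute / fetch / commit / close flow can be sketched with sqlite3 without a database server. The main visible difference is the placeholder style: sqlite3 uses ? where psycopg2 uses %s. The table and rows are invented.

```python
import sqlite3

# 1. Connect (with psycopg2 this would be psycopg2.connect(dbname=..., user=...)).
conn = sqlite3.connect(":memory:")
# 2. A cursor sends statements to the DBMS and holds their results.
cur = conn.cursor()
cur.execute("CREATE TABLE User(uid INTEGER PRIMARY KEY, name TEXT, pop REAL)")
# 3. Parameter values are passed separately from the SQL text.
cur.executemany("INSERT INTO User VALUES (?, ?, ?)",
                [(142, 'Bart', 0.9), (456, 'Lisa', 0.7), (857, 'Lisa', 0.4)])
conn.commit()
# 4. Result rows come back as Python tuples.
cur.execute("SELECT uid, pop FROM User WHERE name = ? ORDER BY uid", ('Lisa',))
lisas = cur.fetchall()
for uid, pop in lisas:
    print(uid, pop)
# 5. Release resources.
cur.close()
conn.close()
```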

Prepared statements: motivation

• Every time we send an SQL string to the DBMS, it must perform parsing,
semantic analysis, optimization, compilation, and finally execution
• A typical application issues many queries with a small number of patterns
(with different parameter values)
• Can we reduce this overhead?

Prepared statements: example

• The DBMS performs parsing, semantic analysis, optimization, and compilation only once, when it “prepares” the statement
• At execution time, the DBMS only needs to check parameter types and validate the compiled plan
• Most other APIs have better support for prepared statements than psycopg2
• E.g., they would provide a cur.prepare() method

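PostgreSQL itself exposes prepared statements in SQL through PREPARE name (types) AS … and EXECUTE name(args). The sketch below stays in Python and uses sqlite3 as a stand-in. There, executemany() compiles its statement once and only rebinds parameter values per tuple. The module also caches recently compiled statements, so repeating identical SQL text with new parameters should avoid re-parsing. The Poke rows are generated data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Poke(uid1 INTEGER, uid2 INTEGER)")

# One statement pattern, many parameter values: compiled once, executed 1000 times.
pokes = [(u, (u * 7) % 100) for u in range(1000)]
conn.executemany("INSERT INTO Poke VALUES (?, ?)", pokes)
conn.commit()

# Same SQL text, different parameter: a cached compiled statement can be reused.
counts = [conn.execute("SELECT COUNT(*) FROM Poke WHERE uid2 = ?", (v,)).fetchone()[0]
          for v in (0, 7, 14)]
total = conn.execute("SELECT COUNT(*) FROM Poke").fetchone()[0]
print(total, counts)  # 1000 [10, 10, 10]
```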
“Exploits of a mom”

• The school probably had something like:
cur.execute("SELECT * FROM Students " + \
            "WHERE (name = '" + name + "')")
where name is a string input by the user
• This is called an SQL injection attack

SQL comments
• https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS

SQL Injection

Guarding against SQL injection
• Escape certain characters in a user input string, to ensure that it
remains a single string
• E.g., ', which would terminate a string in SQL, must be replaced by '' (two
single quotes in a row) within the input string
• Luckily, most APIs provide ways to “sanitize” input automatically (if
you use them properly)
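• E.g., pass parameter values in psycopg2 through %s placeholders
Unsafe:
sql_query = "SELECT * FROM Students WHERE name = '{}'".format(user_input)
cur.execute(sql_query)
Safe:
sql_query = 'SELECT * FROM Students WHERE name = %s'
cur.execute(sql_query, (user_input,))
• Placeholders stand for values only, not for table or column names; to insert identifiers safely, psycopg2 provides the psycopg2.sql module (e.g., sql.Identifier)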
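The difference between string concatenation and placeholders can be demonstrated with the following sketch. It runs on sqlite3, whose placeholder is ? rather than psycopg2's %s, with an invented Students table. The attack string closes the quoted name and appends a condition that is always true, so the concatenated query leaks every row. The parameterized query treats the same input as an ordinary name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students(name TEXT, grade TEXT);
INSERT INTO Students VALUES ('Alice', 'A'), ('Bob', 'B');
""")

def find_unsafe(name):
    # The user input becomes part of the SQL text itself.
    return conn.execute(
        "SELECT * FROM Students WHERE (name = '" + name + "')").fetchall()

def find_safe(name):
    # The driver passes the input as a value, never as SQL text.
    return conn.execute(
        "SELECT * FROM Students WHERE (name = ?)", (name,)).fetchall()

attack = "x') OR ('1' = '1"
leaked = find_unsafe(attack)   # runs: WHERE (name = 'x') OR ('1' = '1')
blocked = find_safe(attack)    # looks for a student literally named x') OR ('1' = '1
print(len(leaked), blocked)    # 2 []
```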