Databases SQL
Slides by:
Joseph E. Gonzalez
[email protected]
Questions for Today
Ø What is a database management system?
Ø Why/how do we use them?
Ø Why do I have to do so many joins?
What is a database?
Defining Databases
Ø A database is an organized collection of data.
Database Management Systems
Ø Data storage
  Ø Provide reliable storage to survive system crashes and disk failures
  Ø Special data structures to improve performance
Ø Data management
  Ø Configure how data is logically organized and who has access
  Ø Ensure data consistency properties (e.g., positive bank account values)
Ø Facilitate access
  Ø Enable efficient access to the data
  Ø Support user-defined computation (queries) over data
How do you interact with a database?
What is the DBMS?
Ø Server
Ø Software
Ø A library
[Figure: a Python analysis sends a query to a DBMS server, which may run on your own machine, and receives a response.]
Query:
SELECT * FROM sales
WHERE price > 100.0
Response:
Date       Purchase ID  Name  Price
9/20/2012  1234         Sue   $200.00
8/21/2012  3453         Joe   $333.99
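The query/response exchange above can be sketched from Python. This is a minimal sketch, assuming an in-memory SQLite database (Python's built-in sqlite3 module) as a stand-in for the DBMS server; the column names purchase_id etc. and the third row are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the DBMS server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date TEXT, purchase_id INTEGER, name TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("9/20/2012", 1234, "Sue", 200.00),
        ("8/21/2012", 3453, "Joe", 333.99),
        ("7/04/2012", 9999, "Ann", 12.50),  # hypothetical row the filter drops
    ],
)

# The analysis code sends a query and receives the matching rows back.
rows = conn.execute("SELECT * FROM sales WHERE price > 100.0").fetchall()
for row in rows:
    print(row)
```

With a real server-based DBMS, only the connection step would change; the query text stays the same.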
[Figure: the DBMS server manages several databases (e.g., Cust. and Prod.). Web servers (serving web applications over HTTP), Python analysis, and visualization tools all connect to it.]
Often many systems will connect to a DBMS concurrently.
Why are databases drawn as “cans”?
[Figure: the “can” icon looks like the platters on a disk drive.]
Widely Used DBMS Technologies
Common DBMS Systems
https://db-engines.com/en/ranking
Relational Database Management System
[Figure: layered view of a relational DBMS. Top layer (the abstraction): relations (tables), e.g. (Name, Prod, Price), (sid, sname, rating, age), (bid, bname, color). Middle layer: optimized data structures. Bottom layer: optimized storage in pages (Page 1 to Page 4), each with a page header.]
Physical Data Independence:
Database management systems hide how data is stored from end user applications, so the storage can be optimized without changing applications.
Big Idea in Data Systems & Computer Science
Before the 1970s, databases were not routinely organized as tables.
Ted Codd and the Relational Model
Ø [1969] Relational model: a mathematical abstraction of a database as sets
Ø Independence of data from the physical properties of storage and representation
SQL: What, not How
SQL is a Declarative Language
Ø Declarative: “Say what you want, not how to get it.”
Ø Declarative Example: I want a table with columns “x” and “y” from tables “A” and “B” where the values in “y” are greater than 100.00.
Ø Imperative Example: For each record in table “A”, find the corresponding record in table “B”, then drop the records where “y” is less than or equal to 100, then return the “x” and “y” values.
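The two styles can be compared side by side. This is only a sketch: it assumes SQLite via Python's sqlite3 module, and the shared key column id (used to find the "corresponding" record) and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, x TEXT);
    CREATE TABLE B (id INTEGER, y REAL);
    INSERT INTO A VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO B VALUES (1, 50.0), (2, 150.0), (3, 100.0);
""")

# Declarative: state the result we want; the DBMS decides how to compute it.
declarative = conn.execute(
    "SELECT A.x, B.y FROM A, B WHERE A.id = B.id AND B.y > 100.0"
).fetchall()

# Imperative: spell out the loops and the filtering ourselves.
a_rows = conn.execute("SELECT id, x FROM A").fetchall()
b_rows = conn.execute("SELECT id, y FROM B").fetchall()
imperative = []
for a_id, x in a_rows:
    for b_id, y in b_rows:
        if a_id == b_id and not (y <= 100.0):  # drop records with y <= 100
            imperative.append((x, y))

print(declarative, imperative)
```

Both produce the same rows, but only the imperative version commits to a particular evaluation strategy (here, a nested loop).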
Two sublanguages of SQL
Ø DDL – Data Definition Language
  Ø Define and modify schema
Ø DML – Data Manipulation Language
  Ø For interacting with the data itself
SELECT * FROM THINGS;
CAPITALIZATION IS optional … BUT DATABASE PEOPLE PREFER TO YELL
Creating Tables & Populating Tables
http://sqlfiddle.com/#!17/80257
Creating Tables
CREATE TABLE students(
    name TEXT PRIMARY KEY,
    gpa REAL CHECK (gpa >= 0.0 and gpa <= 4.0),
    age INTEGER,
    dept TEXT,
    gender CHAR);
-- This is a comment.
Ø Imposing integrity constraints: PRIMARY KEY and CHECK
Ø Does the order matter? No
Note: This information is fictional and does not reflect the individuals listed in the table.
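To see the integrity constraints in action, here is a small sketch that runs the same CREATE TABLE statement. It assumes SQLite through Python's sqlite3 module rather than the PostgreSQL used in the linked fiddle, and the inserted rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students(
        name TEXT PRIMARY KEY,
        gpa REAL CHECK (gpa >= 0.0 AND gpa <= 4.0),
        age INTEGER,
        dept TEXT,
        gender CHAR)
""")
conn.execute("INSERT INTO students VALUES ('Ada', 3.9, 20, 'CS', 'F')")

rejected = []
bad_rows = [
    ("Bob", 5.0, 21, "EE", "M"),  # violates the CHECK constraint on gpa
    ("Ada", 3.0, 22, "CS", "F"),  # duplicates the primary key
]
for bad_row in bad_rows:
    try:
        conn.execute("INSERT INTO students VALUES (?, ?, ?, ?, ?)", bad_row)
    except sqlite3.IntegrityError as err:
        rejected.append(bad_row[0])
        print("rejected:", bad_row[0], "-", err)

row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```

The database itself refuses both rows, so application code cannot accidentally store an out-of-range GPA or a duplicate name.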
Common SQL Types (there are others...)
Ø CHAR(size): Fixed number of characters
Ø TEXT: Character strings of arbitrary length
Ø INTEGER & BIGINT: Integers of various sizes
Ø REAL & DOUBLE PRECISION: Floating point numbers
Ø DATE & DATETIME: Date and Date+Time formats (PostgreSQL calls the latter TIMESTAMP)
Ø Columns have names and types
Ø Specify Primary Key column(s)
Ø Specify Foreign Key relationships
Ø Semicolon at end of command
CREATE TABLE Sailors (
    sid INTEGER,
    sname CHAR(20),
    rating INTEGER,
    age REAL,
    PRIMARY KEY (sid));
Sailors
sid  sname  rating  age
1    Fred   7       22
2    Jim    2       39
3    Nancy  8       27
CREATE TABLE Boats (
    bid INTEGER,
    bname CHAR(20),
    color CHAR(10),
    PRIMARY KEY (bid));
Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
CREATE TABLE Reserves (
    sid INTEGER,
    bid INTEGER,
    day DATE,
    PRIMARY KEY (sid, bid, day),
    FOREIGN KEY (sid) REFERENCES Sailors,
    FOREIGN KEY (bid) REFERENCES Boats);
Reserves
sid  bid  day
1    102  9/12
2    102  9/13
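A foreign key means a reservation can only mention a sailor and a boat that exist. The sketch below tries this out under some assumptions: it uses SQLite via Python's sqlite3 module (where enforcement must be switched on with a PRAGMA), it names the referenced columns explicitly, and the rejected reservation for sailor 9 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign keys are only enforced when this is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER, sname CHAR(20), rating INTEGER,
                          age REAL, PRIMARY KEY (sid));
    CREATE TABLE Boats (bid INTEGER, bname CHAR(20), color CHAR(10),
                        PRIMARY KEY (bid));
    CREATE TABLE Reserves (
        sid INTEGER, bid INTEGER, day DATE,
        PRIMARY KEY (sid, bid, day),
        FOREIGN KEY (sid) REFERENCES Sailors(sid),
        FOREIGN KEY (bid) REFERENCES Boats(bid));
    INSERT INTO Sailors VALUES (1, 'Fred', 7, 22), (2, 'Jim', 2, 39),
                               (3, 'Nancy', 8, 27);
    INSERT INTO Boats VALUES (101, 'Nina', 'red'), (102, 'Pinta', 'blue'),
                             (103, 'Santa Maria', 'red');
    INSERT INTO Reserves VALUES (1, 102, '9/12'), (2, 102, '9/13');
""")

# Sailor 9 does not exist, so this reservation should be refused.
try:
    conn.execute("INSERT INTO Reserves VALUES (9, 102, '9/14')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

reservation_count = conn.execute("SELECT COUNT(*) FROM Reserves").fetchone()[0]
```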
Deleting and Modifying Records
Ø Records are deleted by specifying a condition:
DELETE FROM students
WHERE LOWER(name) = 'sergey brin';  -- LOWER is a string function
Ø Modifying records:
UPDATE students
SET gpa = 1.0 + gpa
WHERE dept = 'CS';
Dropping Tables
Ø To delete a table entirely from our database, we can use the “DROP TABLE” command.
Ø Example: DROP TABLE IF EXISTS tips;
Ø Be careful, there is no easy way to undo this command.
https://xkcd.com/327/
Querying Tables
SQL DML:
Basic Single-Table Queries
SELECT [DISTINCT] <column expression list>
FROM <list of tables>
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>]
[LIMIT <number of rows>];
Find the name and GPA for all CS
Students
Students Table
SELECT name, gpa
FROM students
WHERE dept = 'CS'
Combining predicates with OR and AND
Students Table
SELECT name, gpa
FROM students
WHERE dept = 'CS' AND gpa > 3.8
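A quick way to check how the combined predicate filters rows is to run it on a tiny table. This sketch assumes SQLite via Python's sqlite3 module; the student rows are invented for illustration, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (name TEXT PRIMARY KEY, gpa REAL, age INTEGER, "
    "dept TEXT, gender CHAR)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical data
        ("Ada", 3.9, 20, "CS", "F"),
        ("Ben", 3.5, 22, "CS", "M"),
        ("Cy", 3.85, 21, "CS", "M"),
        ("Eve", 3.95, 19, "EE", "F"),
    ],
)

# Both conditions must hold: CS students whose GPA exceeds 3.8.
top_cs = conn.execute(
    "SELECT name, gpa FROM students "
    "WHERE dept = 'CS' AND gpa > 3.8 ORDER BY name"
).fetchall()
print(top_cs)
```

Eve has the highest GPA but is filtered out because she is not in CS; Ben is in CS but falls below the GPA threshold.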
SELECT DISTINCT
SELECT DISTINCT dept
FROM students
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>] ;
SELECT DISTINCT dept, gender
FROM students
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>] ;
[Figure: Students table and query output]
Functions in the SELECT list
SELECT UPPER(name),
LOWER(dept) AS d,
gpa/4.0 AS gpa_ratio
FROM students
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>] ;
Aggregates
SELECT AVG(age)
FROM students
WHERE dept = 'CS'
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>] ;
Grouping Your Data
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT
GROUP BY
SELECT dept, AVG(age) AS avg_age
FROM students
[WHERE <predicate>]
GROUP BY dept
[HAVING <predicate>]
[ORDER BY <column list>] ;
Ø Partition table into groups with same GROUP BY column values
Ø Produce one aggregate result per group
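The partition-then-aggregate behaviour can be checked on a few rows. This is a sketch assuming SQLite via Python's sqlite3 module, with made-up student data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (name TEXT PRIMARY KEY, gpa REAL, age INTEGER, "
    "dept TEXT, gender CHAR)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical data
        ("Ada", 3.9, 20, "CS", "F"),
        ("Ben", 3.5, 22, "CS", "M"),
        ("Cy", 3.85, 21, "CS", "M"),
        ("Dee", 3.2, 23, "EE", "F"),
        ("Eve", 3.95, 19, "EE", "F"),
        ("Fay", 2.9, 24, "Math", "F"),
    ],
)

# One output row per distinct dept value, each carrying that group's average.
avg_age_by_dept = dict(
    conn.execute(
        "SELECT dept, AVG(age) AS avg_age FROM students GROUP BY dept"
    ).fetchall()
)
print(avg_age_by_dept)
```

Six input rows collapse into three output rows, one per department.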
What does the following Produce?
SELECT name, AVG(age) AS avg_age
FROM students
[WHERE <predicate>]
GROUP BY dept
[HAVING <predicate>]
[ORDER BY <column list>] ;
What if we wanted to only consider departments that have more than two students?
SELECT dept, AVG(age)
FROM students
WHERE COUNT(*) > 2  -- attempt: filter on the aggregate in WHERE
GROUP BY dept
[HAVING <predicate>]
[ORDER BY <column list>] ;
Ø Doesn’t work …
  Ø WHERE clause is applied before GROUP BY
  Ø You cannot have aggregation functions in the WHERE clause
HAVING
SELECT dept, AVG(gpa) AS avg_gpa,
       COUNT(*) AS cnt
FROM students
[WHERE <predicate>]
GROUP BY dept
HAVING COUNT(*) > 2
[ORDER BY <column list>] ;
[Figure: Students table and query output]
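The difference between filtering rows (WHERE) and filtering groups (HAVING) can be tried directly. This sketch assumes SQLite via Python's sqlite3 module and invented student rows; other systems report the WHERE error with different wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (name TEXT PRIMARY KEY, gpa REAL, age INTEGER, "
    "dept TEXT, gender CHAR)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical data: CS has 3 students, EE has 2, Math has 1
        ("Ada", 3.9, 20, "CS", "F"),
        ("Ben", 3.5, 22, "CS", "M"),
        ("Cy", 3.85, 21, "CS", "M"),
        ("Dee", 3.2, 23, "EE", "F"),
        ("Eve", 3.95, 19, "EE", "F"),
        ("Fay", 2.9, 24, "Math", "F"),
    ],
)

# An aggregate in WHERE is rejected: WHERE runs before groups exist.
try:
    conn.execute(
        "SELECT dept, AVG(age) FROM students WHERE COUNT(*) > 2 GROUP BY dept"
    )
    where_rejected = False
except sqlite3.OperationalError as err:
    where_rejected = True
    print("WHERE version failed:", err)

# HAVING filters whole groups after aggregation.
large_depts = conn.execute(
    "SELECT dept, AVG(gpa) AS avg_gpa, COUNT(*) AS cnt "
    "FROM students GROUP BY dept HAVING COUNT(*) > 2"
).fetchall()
print(large_depts)
```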
Recap: Grouping Your Data
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT
ORDER BY
SELECT name, gpa, age
FROM students
WHERE name > 'E'
[GROUP BY <column list>
[HAVING <predicate>] ]
ORDER BY gpa, name;
SELECT name, gpa, age
FROM students
WHERE name > 'E'
[GROUP BY <column list>
[HAVING <predicate>] ]
ORDER BY gpa DESC, name ASC;
[Figure: Students table and query output]
Ø Ascending order by default
Ø DESC flag for descending, ASC for ascending
Ø Can mix and match; rows are sorted lexicographically by the listed columns
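Lexicographic ordering means the second column only breaks ties in the first. A small sketch, assuming SQLite via Python's sqlite3 module and made-up rows with deliberately tied GPAs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT PRIMARY KEY, gpa REAL, age INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [  # hypothetical data with tied GPAs
        ("Ada", 3.9, 20),
        ("Eve", 3.5, 19),
        ("Fay", 3.9, 24),
        ("Gus", 3.5, 22),
        ("Hal", 3.9, 21),
    ],
)

ascending = [
    row[0]
    for row in conn.execute(
        "SELECT name, gpa, age FROM students WHERE name > 'E' ORDER BY gpa, name"
    )
]
mixed = [
    row[0]
    for row in conn.execute(
        "SELECT name, gpa, age FROM students WHERE name > 'E' "
        "ORDER BY gpa DESC, name ASC"
    )
]
print(ascending)
print(mixed)
```

Note that 'Ada' is excluded by the string comparison name > 'E', while 'Eve' passes because it sorts after 'E'.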
LIMIT
SELECT * FROM students LIMIT 5;
Ø Returns 5 tuples from the students table
Ø Which 5 tuples?
  Ø Arbitrary 5 -> could be a convenience sample
  Ø The ones in cache?
Sampling with LIMIT
SELECT * FROM students LIMIT 5;
Ø What does this return?
Ø Simple Random Sample?
Ø Probability Sample?
Test Your Understanding #1
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT
The students where the gender is female grouped by department and only
including departments having more than 2 female students and showing
the department name, average GPA and number of female students. The
results are ordered by the average GPA.
Test Your Understanding #2
SELECT ????
FROM tips
WHERE ????
GROUP BY ????
HAVING ????
ORDER BY ????
SELECT ????
FROM tips
WHERE day = 'Sun' OR day = 'Sat'
GROUP BY sex, smoker
HAVING ????
ORDER BY ????
Wouldn’t it be nice if there was just one big table with everything we need?
pname     category  price  qty  date     day   city   state  country
Galaxy 1  Phones    18     30   1/30/16  Wed.  Omaha  NE     USA
Galaxy 1  Phones    18     20   3/31/16  Thu.  Omaha  NE     USA
Galaxy 1  Phones    18     50   4/1/16   Fri.  Omaha  NE     USA
Galaxy 1  Phones    18     8    1/30/16  Wed.  Omaha  NE     USA
Ø Big table: many columns and rows
  Ø Substantial redundancy
  Ø Risk of inconsistencies
  Ø Difficult to update
Ø Could we organize the data more efficiently?
[Figure: the data split into separate tables, e.g. Locations (locid, city, state, country)]
How do we do analysis? Joins!
Joins!
Bringing tables together
Basic Joins
SELECT s.sid, s.sname, r.bid
FROM Sailors s, Reserves r
WHERE s.sid = r.sid
AND s.age > 20;
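To see what this join returns, the sketch below runs it on a few rows. It assumes SQLite via Python's sqlite3 module; sailor 71 and the reservation for boat 102 are hypothetical additions so that the age filter has something to remove, and ORDER BY is added only for a predictable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT,
                          rating INTEGER, age REAL);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO Sailors VALUES (22, 'dustin', 7, 45.0), (31, 'lubber', 8, 55.5),
                               (58, 'rusty', 10, 35.0), (71, 'zorba', 10, 16.0);
    INSERT INTO Reserves VALUES (22, 101, '10/10/96'), (58, 103, '11/12/96'),
                                (71, 102, '11/13/96');
""")

joined = conn.execute("""
    SELECT s.sid, s.sname, r.bid
    FROM Sailors s, Reserves r
    WHERE s.sid = r.sid      -- keep only pairs that refer to the same sailor
      AND s.age > 20         -- then filter on a Sailors column
    ORDER BY s.sid
""").fetchall()
print(joined)
```

Sailor 31 disappears because there is no matching reservation, and sailor 71 disappears because of the age predicate.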
The Outer-Product (×)
R1 × S1: Each row of R1 paired with each row of S1
R1:
sid  bid  day
22   101  10/10/96
58   103  11/12/96
S1:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
R1 × S1:
sid  bid  day       sid  sname   rating  age
22   101  10/10/96  22   dustin  7       45.0
22   101  10/10/96  31   lubber  8       55.5
22   101  10/10/96  58   rusty   10      35.0
58   103  11/12/96  22   dustin  7       45.0
58   103  11/12/96  31   lubber  8       55.5
58   103  11/12/96  58   rusty   10      35.0
How many rows in the result? |R1| * |S1| (here 2 * 3 = 6)
Sometimes also called the Cartesian Product.
A table can also be paired with itself:
SELECT *
FROM Sailors AS S1, Sailors AS S2
WHERE S1.age > S2.age
Result (S1 columns, then S2 columns):
sid  sname   rating  age   sid  sname   rating  age
22   dustin  7       45.0  58   rusty   10      35.0
31   lubber  8       55.5  22   dustin  7       45.0
31   lubber  8       55.5  58   rusty   10      35.0
http://sqlfiddle.com/#!17/53815/4
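The row count of the product and the self-join result can be verified mechanically. This is a sketch assuming SQLite via Python's sqlite3 module, using the R1 and S1 rows from the slide (here stored as tables named Reserves and Sailors).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
    INSERT INTO Reserves VALUES (22, 101, '10/10/96'), (58, 103, '11/12/96');
    INSERT INTO Sailors VALUES (22, 'dustin', 7, 45.0), (31, 'lubber', 8, 55.5),
                               (58, 'rusty', 10, 35.0);
""")

# No join condition: every Reserves row is paired with every Sailors row.
product_size = conn.execute(
    "SELECT COUNT(*) FROM Reserves, Sailors"
).fetchone()[0]

# Self-join: pair each sailor with every strictly younger sailor.
older_pairs = conn.execute("""
    SELECT S1.sname, S2.sname
    FROM Sailors AS S1, Sailors AS S2
    WHERE S1.age > S2.age
    ORDER BY S1.sname, S2.sname
""").fetchall()
print(product_size, older_pairs)
```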
Join Variants
SELECT (column_list)
FROM table_name
[INNER | {LEFT | RIGHT | FULL} [OUTER]] JOIN table_name
ON qualification_list
WHERE …
Ø INNER is default
  Ø Inner join is akin to what we have seen so far.
Ø The term OUTER is optional for Left, Right, and Full joins
  Ø For example: LEFT OUTER = LEFT
NULL in comparators
What entries are in the output of all these queries?
SELECT rating = NULL FROM sailors;
SELECT rating < NULL FROM sailors;
Ø All of these queries evaluate to null!
http://sqlfiddle.com/#!17/f35aa/6
Explicit NULL Checks
http://sqlfiddle.com/#!17/f35aa/4
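Since comparing with NULL never yields true, SQL provides IS NULL and IS NOT NULL for explicit checks. The sketch below assumes SQLite via Python's sqlite3 module and a small sailors table with one missing rating; it is not the query from the linked fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
    INSERT INTO sailors VALUES (1, 'Popeye', 10, 22), (2, 'OliveOyl', 11, 39),
                               (3, 'Garfield', 1, 27), (4, 'Bob', 5, 19),
                               (11, 'Jack Sparrow', NULL, 35);
""")

# "= NULL" evaluates to NULL for every row, so WHERE keeps nothing.
equals_null = conn.execute(
    "SELECT sname FROM sailors WHERE rating = NULL"
).fetchall()

# IS NULL / IS NOT NULL are the explicit checks.
missing = conn.execute(
    "SELECT sname FROM sailors WHERE rating IS NULL"
).fetchall()
present_count = conn.execute(
    "SELECT COUNT(*) FROM sailors WHERE rating IS NOT NULL"
).fetchone()[0]
print(equals_null, missing, present_count)
```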
Writing More Complex SQL Queries
Common Table Expressions
Allow for the creation of “temporary tables” to organize complex queries:
WITH
dept_sizes AS (
    SELECT dept, COUNT(*) AS size
    FROM students
    GROUP BY dept
),
female_gpa (dept, avg_gpa) AS (
    SELECT dept, AVG(gpa)
    FROM students
    WHERE gender = 'F'
    GROUP BY dept
)
SELECT dept_sizes.dept AS dept, size, avg_gpa
FROM dept_sizes, female_gpa
WHERE dept_sizes.dept = female_gpa.dept
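The common table expression query can be run end to end on a small table. This sketch assumes SQLite via Python's sqlite3 module and made-up student rows; an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (name TEXT PRIMARY KEY, gpa REAL, age INTEGER, "
    "dept TEXT, gender CHAR)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical data
        ("Ada", 3.9, 20, "CS", "F"),
        ("Ben", 3.5, 22, "CS", "M"),
        ("Cy", 3.85, 21, "CS", "M"),
        ("Dee", 3.2, 23, "EE", "F"),
        ("Eve", 3.95, 19, "EE", "F"),
        ("Fay", 2.9, 24, "Math", "F"),
    ],
)

cte_rows = conn.execute("""
    WITH
    dept_sizes AS (
        SELECT dept, COUNT(*) AS size
        FROM students
        GROUP BY dept
    ),
    female_gpa (dept, avg_gpa) AS (
        SELECT dept, AVG(gpa)
        FROM students
        WHERE gender = 'F'
        GROUP BY dept
    )
    SELECT dept_sizes.dept AS dept, size, avg_gpa
    FROM dept_sizes, female_gpa
    WHERE dept_sizes.dept = female_gpa.dept
    ORDER BY dept
""").fetchall()
print(cte_rows)
```

Each named subquery behaves like a temporary table that exists only for the duration of the final SELECT.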
Details on Joins
You need to know this, but we probably won’t cover it in class.
Left Join
Returns all matched rows, and preserves all unmatched rows from the table on the left of the join clause (use nulls in fields of non-matching tuples).
The following query returns all sailors and the bid of the boat in each of their reservations.
Note: If there is a sailor without a boat reservation, then the sailor is matched with the NULL bid.
SELECT s.sid, s.sname, r.bid
FROM Sailors2 s LEFT JOIN Reserves2 r
ON s.sid = r.sid;
Sailors2
sid  sname   rating  age
22   Dustin  7       45
31   Lubber  8       55.5
95   Bob     3       63.5
Reserves2
sid  bid  day
22   101  1996-10-10
95   103  1996-11-12
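Running the left join on these two tables shows the preserved, unmatched sailor. This is a sketch assuming SQLite via Python's sqlite3 module (NULL comes back as Python's None); ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors2 (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE Reserves2 (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO Sailors2 VALUES (22, 'Dustin', 7, 45), (31, 'Lubber', 8, 55.5),
                                (95, 'Bob', 3, 63.5);
    INSERT INTO Reserves2 VALUES (22, 101, '1996-10-10'), (95, 103, '1996-11-12');
""")

left_rows = conn.execute("""
    SELECT s.sid, s.sname, r.bid
    FROM Sailors2 s LEFT JOIN Reserves2 r
    ON s.sid = r.sid
    ORDER BY s.sid
""").fetchall()
print(left_rows)
```

Lubber has no reservation, so an inner join would drop that row; the left join keeps it with a NULL bid.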
SELECT r.sid, b.bid, b.bname
FROM Reserves2 r RIGHT JOIN Boats2 b
ON r.bid = b.bid;
Reserves2
sid  bid  day
22   101  1996-10-10
95   103  1996-11-12
Boats2
bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red
Result:
sid     bid  bname
22      101  Interlake
95      103  Clipper
(null)  104  Marine
(null)  102  Interlake
http://sqlfiddle.com/#!17/a7b2f/1
Full Outer Join
Ø Full Outer Join returns all (matched or unmatched) rows
from the tables on both sides of the join clause
SELECT r.sid, b.bid, b.bname
FROM Reserves3 r FULL JOIN Boats2 b
ON r.bid = b.bid
Reserves3
sid  bid  day
22   101  1996-10-10
95   103  1996-11-12
38   42   2010-08-21
Boats2
bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red
NULL in Boolean Logic
Three-valued logic (T = true, F = false, N = null):
Not  T  F  N
     F  T  N
And  T  F  N
T    T  F  N
F    F  F  F
N    N  F  N
Or   T  F  N
T    T  T  T
F    T  F  N
N    T  N  N
http://sqlfiddle.com/#!17/f35aa/2
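A few entries of the truth tables can be spot-checked by asking a database to evaluate them. This sketch assumes SQLite via Python's sqlite3 module, where true and false are written as 1 and 0 and NULL is returned as Python's None; evaluate is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expression):
    """Evaluate a constant SQL expression and return its single value."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]

results = {
    expression: evaluate(expression)
    for expression in [
        "NULL AND 0",   # N and F = F
        "NULL AND 1",   # N and T = N
        "NULL OR 1",    # N or T = T
        "NULL OR 0",    # N or F = N
        "NOT (NULL)",   # not N = N
        "NULL = NULL",  # comparison with NULL = N
    ]
}
for expression, value in results.items():
    print(expression, "->", value)
```

The pattern is that NULL stays NULL unless the other operand already decides the answer (false for AND, true for OR).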
NULL and Aggregation
sailors
sid  sname         rating  age
1    Popeye        10      22
2    OliveOyl      11      39
3    Garfield      1       27
4    Bob           5       19
11   Jack Sparrow  (null)  35
SELECT count(rating) FROM sailors;
Ø 4
SELECT sum(rating) FROM sailors;
Ø 27
SELECT avg(rating) FROM sailors;
Ø (10+11+1+5) / 4 = 6.75
SELECT count(*) FROM sailors;
Ø 5
http://bit.ly/ds100-sp18-null http://sqlfiddle.com/#!17/f35aa/7
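These four answers can be reproduced with a short script. It is a sketch that assumes SQLite via Python's sqlite3 module in place of the PostgreSQL fiddle; the table contents are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
    INSERT INTO sailors VALUES (1, 'Popeye', 10, 22), (2, 'OliveOyl', 11, 39),
                               (3, 'Garfield', 1, 27), (4, 'Bob', 5, 19),
                               (11, 'Jack Sparrow', NULL, 35);
""")

def scalar(query):
    """Run a query that returns one row with one column."""
    return conn.execute(query).fetchone()[0]

count_rating = scalar("SELECT count(rating) FROM sailors")  # skips the NULL
sum_rating = scalar("SELECT sum(rating) FROM sailors")      # skips the NULL
avg_rating = scalar("SELECT avg(rating) FROM sailors")      # 27 / 4, not 27 / 5
count_rows = scalar("SELECT count(*) FROM sailors")         # counts every row
print(count_rating, sum_rating, avg_rating, count_rows)
```

Note that avg divides by the number of non-NULL ratings, so it differs from sum(rating) / count(*).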
NULLs: Summary
Ø NULL op NULL is NULL
Ø WHERE NULL: do not send to output
Ø Boolean connectives: 3-valued logic
Ø Aggregates ignore NULL-valued inputs