Databases SQL

This document discusses databases and SQL. It begins with questions about database management systems and how they are used to store and manage organized collections of data. It then covers how SQL is used to query databases and expresses more complex queries. The document discusses interacting with database management systems through queries, applications, and web servers. It also covers widely used database technologies like relational database management systems and how they logically organize data in tables.

Uploaded by

komisanc6
Copyright
© © All Rights Reserved

Databases and SQL
Slides by: Joseph E. Gonzalez

Questions for Today
Ø What is a database management system?
Ø Why/how do we use them?
Ø Why do I have to do so many joins?

Ø How do we query databases? SQL


Ø What SQL should I already know (go review quickly)?

Ø How do I express more complex SQL queries?


Ø Common Table Expressions (CTEs)
Ø Probability Samples from a Database

What is a database?

Defining Databases
Ø A database is an organized collection of data.

Ø A database management system (DBMS) is a software system that stores, manages, and facilitates access to one or more databases.

Database Management Systems
Ø Data storage
Ø Provide reliable storage to survive system crashes and disk failures
Ø Special data-structures to improve performance

Ø Data management
Ø Configure how data is logically organized and who has access
Ø Ensure data consistency properties (e.g., positive bank account
values)

Ø Facilitate access
Ø Enable efficient access to the data
Ø Support user-defined computation (queries) over data

How do you interact with a
database?
What is the DBMS?
Ø Server
Ø Software
Ø A library

Answer: It can be all of these.


Interacting with a DBMS

[Diagram: a Python analysis sends a query to a DBMS server (possibly running on your own machine) and receives a table in response.]

Query
SELECT * FROM sales
WHERE price > 100.0

Response
Date Purchase ID Name Price
9/20/2012 1234 Sue $200.00
8/21/2012 3453 Joe $333.99

[Diagram: the DBMS server stores the Cust., Prod., and Sales tables. Clients shown: web servers (HTTP, web applications), Python analysis, and visualization.]

Often many systems will connect to a DBMS concurrently.

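The query/response exchange above can be tried from Python. This is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) as the DBMS, which is the "library" case from the earlier question. The column names and the third row are hypothetical, added only so the filter has something to drop.

```python
import sqlite3

# In-memory SQLite database: here the DBMS is a library inside the Python process.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date TEXT, purchase_id INTEGER, name TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("9/20/2012", 1234, "Sue", 200.00),
        ("8/21/2012", 3453, "Joe", 333.99),
        ("8/22/2012", 9999, "Ann", 12.50),  # hypothetical row, filtered out below
    ],
)

# The query from the slide; the response comes back as a list of row tuples.
rows = conn.execute("SELECT * FROM sales WHERE price > 100.0").fetchall()
for row in rows:
    print(row)
```

With a server DBMS such as Postgres the pattern is the same, but the connection would go over the network through a driver library.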
Why are databases drawn as “cans”?
[Image: platters on a disk drive, which look like a stack of cans.]

1956: IBM Model 350 RAMAC
First Commercial Disk Drive
5MB @ 1 ton
Why should I use a DBMS?
Why can’t I just have my CSV files?

Ø DBMSs organize many related sources of information


Ø DBMSs enforce guarantees on the data
Ø Can be used to prevent data anomalies
Ø Ensure safe concurrent operations on data

Ø DBMSs can be scalable


Ø Optimized to compute on data that does not fit in memory
Ø Parallel computation and optimized data structures

Ø DBMSs prevent data loss from software/hardware failures

Widely Used
DBMS Technologies

Common DBMS Systems
https://db-engines.com/en/ranking

Relational database management systems are widely used!

Relational Database Management Systems
Ø Relational databases are the traditional DBMS technology

Ø Logically organize data in relations (tables)


Sales relation:
Name Prod Price
Sue iPod $200.00
Joey Bike $333.99
Alice Car $999.00

Describes relationship: Name purchased Prod at Price.
Each row is a tuple; each column is an attribute.

How is data physically stored?
Relational Data Abstraction
[Diagram: inside the Database Management System, an abstraction layer separates Relations (Tables) from Optimized Data Structures (B+Trees) and Optimized Storage (Pages 1 to 6, with a page header).]

Relations (Tables) shown in the diagram:

Name Prod Price
Sue iPod $200.00
Joey Bike $333.99
Alice Car $999.00

sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0

bid bname color
101 Interlake blue
102 Interlake red
104 Marine red
103 Clipper green

Physical Data Independence:
Database management systems hide how data is stored from end user applications
-> System can optimize storage and computation without changing applications.

Big Idea in Data Structures, Data Systems & Computer Science

It wasn’t always like this …

In a time long ago …

Before the 1970s, databases were not routinely organized as tables.

Instead they exposed specialized data structures designed for specific applications.

Ted Codd and the Relational Model
Ø [1969] Relational model: a mathematical abstraction of a database as sets
Ø Independence of data from the physical properties of storage and representation

Ø [1972] Relational Algebra & Calculus: a collection of operations and a way of defining logical outcomes for data transformations
Ø Algebra: foundation of technologies like Pandas
Ø Calculus: the foundation of modern SQL

Edgar F. “Ted” Codd (1923 - 2003)
Turing Award 1981

SQL: What, not How

SQL is a Declarative Language
Ø Declarative: “Say what you want, not how to get it.”
Ø Declarative Example: I want a table with columns “x” and “y” from tables “A” and “B” where the values in “y” are greater than 100.00.

Ø Imperative Example: For each record in table “A” find the corresponding record in table “B” then drop the records where “y” is less than or equal to 100 then return the “x” and “y” values.

Ø Advantages of declarative programming


Ø Enable the system to find the best way to achieve the result.
Ø Often more compact and easier to learn for non-programmers

Ø Challenges of declarative programming


Ø System performance depends heavily on automatic optimization
Ø Limited language (not Turing complete)
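
To make the contrast concrete, here is a small sketch of both styles using Python's built-in sqlite3 module. The slide does not say how records in “A” and “B” correspond, so a shared `id` column is assumed, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a shared `id` column (an assumption) links A and B.
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, y REAL);
    INSERT INTO A VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO B VALUES (1, 50.0), (2, 150.0), (3, 100.0);
""")

# Declarative: describe the result and let the DBMS choose the plan.
declarative = conn.execute(
    "SELECT A.x, B.y FROM A, B WHERE A.id = B.id AND B.y > 100.0"
).fetchall()

# Imperative: spell out each step by hand.
b_by_id = dict(conn.execute("SELECT id, y FROM B").fetchall())
imperative = []
for a_id, x in conn.execute("SELECT id, x FROM A").fetchall():
    y = b_by_id.get(a_id)            # find the corresponding record in B
    if y is not None and y > 100.0:  # drop records where y <= 100
        imperative.append((x, y))

print(declarative, imperative)
```

Both versions produce the same rows; only the declarative one leaves the choice of lookup strategy to the system.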
Relational Terminology
Ø Database: Set of Relations (i.e., one or more tables)
Ø Relation (Table):
Ø Schema: description of columns, their types, and constraints
Ø Instance: data satisfying the schema
Ø Attribute (Column)
Ø Tuple (Record, Row)
Ø Schema of database is set of schemas of its relations

Two sublanguages of SQL
Ø DDL – Data Definition Language
Ø Define and modify schema
Ø DML – Data Manipulation Language
Ø For interacting with the data itself

SELECT * FROM THINGS;

CAPITALIZATION IS optional …
BUT DATABASE PEOPLE PREFER TO YELL
Creating Tables &
Populating Tables

http://sqlfiddle.com/#!17/80257

Creating Tables
CREATE TABLE students(
name TEXT PRIMARY KEY,
gpa REAL CHECK (gpa >= 0.0 and gpa <= 4.0),
age INTEGER,
dept TEXT,
gender CHAR);

Imposing Integrity
Constraints

Useful to ensure data quality…


Inserting Records into a Table
INSERT INTO students (name, gpa, age, dept, gender) -- column list is optional
VALUES
('Sergey Brin', 2.8, 45, 'CS', 'M'),
('Danah Boyd', 3.9, 40, 'CS', 'F'),
('Bill Gates', 1.0, 63, 'CS', 'M'),
('Hillary Mason', 4.0, 39, 'DATASCI', 'F'),
('Mike Olson', 3.7, 53, 'CS', 'M'),
('Mark Zuckerberg', 3.8, 34, 'CS', 'M'),
('Sheryl Sandberg', 3.6, 49, 'BUSINESS', 'F'),
('Susan Wojcicki', 3.8, 50, 'BUSINESS', 'F'),
('Marissa Mayer', 2.6, 43, 'BUSINESS', 'F')
;

-- This is a comment.

Ø Fields must be entered in order (record)
Ø Comma between records
Ø Must use the single quote (') for strings.
Ø Does the order matter? No

http://sqlfiddle.com/#!17/80257
Note: This information is fictional and does not reflect the individuals listed in the table.
Common SQL Types (there are others...)
Ø CHAR(size): Fixed number of characters
Ø TEXT: Arbitrary-length character strings
Ø INTEGER & BIGINT: Integers of various sizes
Ø REAL & DOUBLE PRECISION: Floating point numbers
Ø DATE & DATETIME: Date and Date+Time formats (Date+Time is called TIMESTAMP in Postgres)

See documentation for database system (e.g., Postgres)

CREATE TABLE Sailors (
sid INTEGER,
sname CHAR(20),
rating INTEGER,
age REAL,
PRIMARY KEY (sid));

CREATE TABLE Boats (
bid INTEGER,
bname CHAR(20),
color CHAR(10),
PRIMARY KEY (bid));

CREATE TABLE Reserves (
sid INTEGER,
bid INTEGER,
day DATE,
PRIMARY KEY (sid, bid, day),
FOREIGN KEY (sid) REFERENCES Sailors,
FOREIGN KEY (bid) REFERENCES Boats);

Ø Columns have names and types
Ø Specify Primary Key column(s)
Ø Specify Foreign Key relationships
Ø Semicolon at end of command

Sailors
sid sname rating age
1 Fred 7 22
2 Jim 2 39
3 Nancy 8 27

Boats
bid bname color
101 Nina red
102 Pinta blue
103 Santa Maria red

Reserves
sid bid day
1 102 9/12
2 102 9/13
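
As a rough illustration of what the key declarations buy you, the sketch below loads the three tables into SQLite via Python's sqlite3 module and tries to reserve a boat for a sailor who does not exist. The `PRAGMA foreign_keys = ON` line is SQLite-specific (SQLite leaves foreign key checks off by default); sid 99 is a hypothetical value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK checks are off by default
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER, sname CHAR(20), rating INTEGER, age REAL,
                          PRIMARY KEY (sid));
    CREATE TABLE Boats (bid INTEGER, bname CHAR(20), color CHAR(10),
                        PRIMARY KEY (bid));
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day DATE,
                           PRIMARY KEY (sid, bid, day),
                           FOREIGN KEY (sid) REFERENCES Sailors,
                           FOREIGN KEY (bid) REFERENCES Boats);
    INSERT INTO Sailors VALUES (1, 'Fred', 7, 22), (2, 'Jim', 2, 39), (3, 'Nancy', 8, 27);
    INSERT INTO Boats VALUES (101, 'Nina', 'red'), (102, 'Pinta', 'blue'),
                             (103, 'Santa Maria', 'red');
    INSERT INTO Reserves VALUES (1, 102, '9/12'), (2, 102, '9/13');
""")

# Sailor 99 does not exist, so the foreign key on Reserves.sid rejects the row.
try:
    conn.execute("INSERT INTO Reserves VALUES (99, 102, '9/14')")
    fk_rejected = False
except sqlite3.IntegrityError as err:
    fk_rejected = True
    print("rejected:", err)

n_reserves = conn.execute("SELECT COUNT(*) FROM Reserves").fetchone()[0]
print(n_reserves)
```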
Deleting and Modifying Records
Ø Records are deleted by specifying a condition:
DELETE FROM students
WHERE LOWER(name) = 'sergey brin'; -- LOWER is a string function

Ø Modifying records
UPDATE students
SET gpa = 1.0 + gpa
WHERE dept = 'CS';

Ø Note: There is no way to modify records by location

Deleting and Modifying Records
Ø What is wrong with the following?
UPDATE students
SET gpa = 1.0 + gpa
WHERE dept = 'CS';

CREATE TABLE students(
name TEXT PRIMARY KEY,
gpa FLOAT CHECK (gpa >= 0.0 and gpa <= 4.0),
age INTEGER,
dept TEXT,
gender CHAR);

Ø Update would violate Integrity Constraints: any CS student with a GPA above 3.0 would end up above 4.0.

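A quick way to see the constraint at work is to run the update against SQLite from Python. This sketch assumes SQLite's behavior (a failed CHECK raises `sqlite3.IntegrityError` and undoes the whole statement) and loads only three of the deck's student rows to stay short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students(
        name TEXT PRIMARY KEY,
        gpa REAL CHECK (gpa >= 0.0 AND gpa <= 4.0),
        age INTEGER, dept TEXT, gender CHAR)
""")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [("Sergey Brin", 2.8, 45, "CS", "M"),
     ("Danah Boyd", 3.9, 40, "CS", "F"),
     ("Marissa Mayer", 2.6, 43, "BUSINESS", "F")],
)
conn.commit()

# 3.9 + 1.0 = 4.9 breaks the CHECK, so the DBMS refuses the whole UPDATE.
try:
    conn.execute("UPDATE students SET gpa = 1.0 + gpa WHERE dept = 'CS'")
    update_rejected = False
except sqlite3.IntegrityError as err:
    update_rejected = True
    print("rejected:", err)

# No row was changed, including the one that would have stayed in range.
gpas = dict(conn.execute("SELECT name, gpa FROM students").fetchall())
print(gpas)
```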
Dropping Tables
Ø To delete a table entirely from our database, we can use the “DROP TABLE” command.
Ø Example: DROP TABLE IF EXISTS tips;
Ø Be careful, there is no easy way to undo this command.

https://xkcd.com/327/

Querying Tables

SQL DML:
Basic Single-Table Queries
SELECT [DISTINCT] <column expression list>
FROM <list of tables>
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>]
[LIMIT <number of rows>];

Ø Elements of the basic select statement


Ø [Square brackets] are optional expressions.
How to Read A Basic Query
Reading order: FROM -> WHERE -> SELECT

SELECT [DISTINCT] <column expression list>
FROM <list of tables>
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>]
[LIMIT <number of rows>];

Ø FROM: List of tables
Ø Take the OUTER PRODUCT
Ø WHERE: Predicate on which rows to include
Ø SELECT: Which columns and computed values to include

Find the name and GPA for all CS
Students
Students Table
SELECT name, gpa
FROM students
WHERE dept = 'CS'

Combining predicates with OR and AND
Students Table
SELECT name, gpa
FROM students
WHERE dept = 'CS' AND gpa > 3.8

Students Table

SELECT DISTINCT
SELECT DISTINCT dept
FROM students
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>] ;

Ø DISTINCT flag specifies removal of duplicates before output

Students Table

SELECT DISTINCT
SELECT DISTINCT dept, gender
FROM students
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>] ;

Output

Ø DISTINCT operates at the tuple level, not the column level

Functions in the SELECT list

SELECT UPPER(name),
LOWER(dept) AS d,
gpa/4.0 AS gpa_ratio
FROM students
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>] ;

Ø Can provide functions in selection list.

Ø AS lets us alias (name) our columns.

Students Table

Aggregates
Output
SELECT AVG(age)
FROM students
WHERE dept = 'CS'
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>] ;

Ø Before producing output, compute a summary statistic


Ø Aggregates include: SUM, COUNT, MAX, MIN, …

Ø Produces 1 row of output -> Still a table

Ø Note: can use DISTINCT inside the agg function
Ø SELECT COUNT(DISTINCT name) …
So far … (you know this already?)
Reading order: FROM -> WHERE -> SELECT

SELECT [DISTINCT] <column expression list>
FROM <list of tables>
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>]
[LIMIT <number of rows>];

Ø FROM: List of tables
Ø Take the OUTER PRODUCT
Ø WHERE: Predicate on which rows to include
Ø SELECT: Which columns and computed values to include

Grouping Your Data
Reading order: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT

SELECT [DISTINCT] <column expression list>
FROM <list of tables>
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>]
[LIMIT <number of rows>];

Ø GROUP BY: List of columns
Ø Cluster rows into groups
Ø One group for each unique combination of column values
Ø Final table will have one row per group

Ø HAVING: Predicate on which GROUPS to include
Ø e.g., COUNT(*) >= 10
Students Table

GROUP BY
SELECT dept, AVG(age) AS avg_age
FROM students
[WHERE <predicate>]
GROUP BY dept
[HAVING <predicate>]
[ORDER BY <column list>] ;
Ø Partition table into groups with same GROUP BY column values
Ø Produce one aggregate result per group

What does the following Produce?
SELECT name, AVG(age) AS avg_age
FROM students
[WHERE <predicate>]
GROUP BY dept
[HAVING <predicate>]
[ORDER BY <column list>] ;

Ø An error? (maybe?) (why?)


Ø What name should be used for each group?
Ø Depends on database (error in Postgres; accepted by SQLite, and by MySQL when ONLY_FULL_GROUP_BY is disabled)

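To see the permissive behavior, the sketch below runs the query in SQLite through Python's sqlite3 module. It loads a subset of the deck's student rows and adds `dept` to the SELECT list (a change from the slide's query) purely so the output shows which group each name was picked from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER, dept TEXT)")
students = [("Sergey Brin", 45, "CS"), ("Danah Boyd", 40, "CS"),
            ("Hillary Mason", 39, "DATASCI"), ("Sheryl Sandberg", 49, "BUSINESS"),
            ("Susan Wojcicki", 50, "BUSINESS")]
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", students)

# SQLite accepts the non-grouped column `name` and fills it in from
# some row of each group; which row is not something to rely on.
rows = conn.execute(
    "SELECT name, dept, AVG(age) AS avg_age FROM students GROUP BY dept"
).fetchall()

dept_of = {name: dept for name, _, dept in students}
for name, dept, avg_age in rows:
    print(dept, avg_age, "-> name shown:", name)
```

Postgres would reject the same statement, which is usually the safer behavior: there is no single right name for a department.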
What if we wanted to only consider
departments that have greater than two
students?
SELECT dept, AVG(age)
FROM students
[WHERE <predicate>]
GROUP BY dept
[HAVING <predicate>]
[ORDER BY <column list>] ;

What if we wanted to only consider
departments that have greater than two
students?
SELECT dept, AVG(age)
FROM students
WHERE COUNT(*) > 2
GROUP BY dept
[HAVING <predicate>]
[ORDER BY <column list>] ;

Ø Doesn’t work …
Ø WHERE clause is applied before GROUP BY
Ø You cannot have aggregation functions in the where clause

Students Table

HAVING
SELECT dept, AVG(gpa) as avg_gpa,
COUNT(*) as cnt
FROM students
[WHERE <predicate>]
GROUP BY dept
HAVING COUNT(*) > 2
[ORDER BY <column list>] ;

Ø The HAVING predicate is applied after grouping and aggregation
Ø Hence can contain anything that could go in the SELECT list

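The difference between the two clauses can be checked directly. This is a sketch using SQLite via Python's sqlite3 module, loaded with the student rows from the earlier INSERT slide (name, gpa, and dept only); the `ORDER BY dept` is added here just to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT PRIMARY KEY, gpa REAL, dept TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    ("Sergey Brin", 2.8, "CS"), ("Danah Boyd", 3.9, "CS"),
    ("Bill Gates", 1.0, "CS"), ("Hillary Mason", 4.0, "DATASCI"),
    ("Mike Olson", 3.7, "CS"), ("Mark Zuckerberg", 3.8, "CS"),
    ("Sheryl Sandberg", 3.6, "BUSINESS"), ("Susan Wojcicki", 3.8, "BUSINESS"),
    ("Marissa Mayer", 2.6, "BUSINESS"),
])

# An aggregate in WHERE is rejected: WHERE runs before any groups exist.
try:
    conn.execute(
        "SELECT dept, AVG(gpa) FROM students WHERE COUNT(*) > 2 GROUP BY dept"
    )
    where_failed = False
except sqlite3.OperationalError as err:
    where_failed = True
    print("WHERE version failed:", err)

# HAVING filters the groups after aggregation.
rows = conn.execute("""
    SELECT dept, AVG(gpa) AS avg_gpa, COUNT(*) AS cnt
    FROM students
    GROUP BY dept
    HAVING COUNT(*) > 2
    ORDER BY dept
""").fetchall()
print(rows)
```

DATASCI has a single student, so it is the only department the HAVING predicate removes.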
Recap: Grouping Your Data
Reading order: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT

SELECT [DISTINCT] <column expression list>
FROM <list of tables>
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>]
[LIMIT <number of rows>];

Ø ORDER BY: ordering of the final rows (groups)
Ø LIMIT: specifies the number of records that are finally returned

ORDER BY
SELECT name, gpa, age
FROM students
WHERE name > 'E'
[GROUP BY <column list>
[HAVING <predicate>] ]
ORDER BY gpa, name;

Ø ORDER BY clause specifies output to be sorted
Ø Lexicographic ordering

ORDER BY
SELECT name, gpa, age
FROM students
WHERE name > 'E'
[GROUP BY <column list>
[HAVING <predicate>] ]
ORDER BY gpa DESC, name ASC;

Ø Ascending order by default
Ø DESC flag for descending, ASC for ascending
Ø Can mix and match, lexicographically

LIMIT
SELECT * FROM students LIMIT 5;
Ø Returns 5 tuples from the students table
Ø Which 5 tuples?
Ø Arbitrary 5 -> could be a convenience sample
Ø the ones in cache?

Ø Why is LIMIT important?


Ø Queries may generate BIG DATA responses (e.g., billions of tuples)
Ø Be careful with your Jupyter notebooks
Ø browsers don’t like big tables

Sampling with LIMIT
SELECT * FROM students LIMIT 5;
Ø What does this return?
Ø Simple Random Sample?
Ø Probability Sample?

SELECT name FROM students ORDER BY name LIMIT 5;


Ø Is this a probability sample?
Ø How can we generate a Simple Random Sample?
SELECT name FROM students
ORDER BY RANDOM() LIMIT 5;

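A small sketch of the random-sample query, run in SQLite through Python's sqlite3 module with the student names from the earlier INSERT slide. `RANDOM()` is the function name in SQLite and Postgres; other systems may spell it differently.

```python
import sqlite3

names = ["Sergey Brin", "Danah Boyd", "Bill Gates", "Hillary Mason", "Mike Olson",
         "Mark Zuckerberg", "Sheryl Sandberg", "Susan Wojcicki", "Marissa Mayer"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO students VALUES (?)", [(n,) for n in names])

# Each row gets a random sort key, so every set of 5 rows is equally likely.
sample = [row[0] for row in conn.execute(
    "SELECT name FROM students ORDER BY RANDOM() LIMIT 5"
)]
print(sample)
```

Sorting the whole table by a random key is simple but touches every row, so it can be slow on very large tables.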
Summary of Reading SQL Queries
from where group by having select order by limit

SELECT [DISTINCT] <column expression list>


FROM <list of tables>
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>]
[LIMIT <number of rows>];

Try Queries Here
http://sqlfiddle.com/#!17/80257

Test Your Understanding #1
from where group by having select order by limit

SELECT dept, AVG(gpa) AS avg_gpa, COUNT(*) AS size


FROM students
WHERE gender = 'F'
GROUP BY dept
HAVING COUNT(*) > 2
ORDER BY avg_gpa DESC

What does this compute?

The students where the gender is female grouped by department and only
including departments having more than 2 female students and showing
the department name, average GPA and number of female students. The
results are ordered by the average GPA.

Test Your Understanding #2
SELECT ????
FROM tips
WHERE ????
GROUP BY ????
HAVING ????
ORDER BY ????

Suppose we want to compare smoker vs. non-smoker and female vs. male tips for weekend diners. Create a table ordered by percentage tip that gives the average tip for all four possibilities.

Filling in the clauses one at a time (WHERE, then GROUP BY, then SELECT, then ORDER BY):

SELECT sex, smoker, avg(tip/total_bill) as pct
FROM tips
WHERE day = 'Sun' OR day = 'Sat'
GROUP BY sex, smoker
ORDER BY pct

The HAVING clause stays unfilled on the slides; no group filter is needed for this question.

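To check the finished query end to end, the sketch below runs it in SQLite via Python's sqlite3 module. The rows are hypothetical stand-ins for the tips table (the real data is not shown on the slides), and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tips (total_bill REAL, tip REAL, sex TEXT, smoker TEXT, day TEXT)"
)
# Hypothetical rows; the Friday row must be excluded by the WHERE clause.
conn.executemany("INSERT INTO tips VALUES (?, ?, ?, ?, ?)", [
    (20.0, 3.0, "Female", "No", "Sun"),
    (10.0, 2.0, "Female", "No", "Sat"),
    (40.0, 4.0, "Male", "No", "Sun"),
    (25.0, 5.0, "Male", "Yes", "Sat"),
    (30.0, 3.6, "Female", "Yes", "Sat"),
    (50.0, 10.0, "Male", "Yes", "Fri"),
])

rows = conn.execute("""
    SELECT sex, smoker, avg(tip/total_bill) AS pct
    FROM tips
    WHERE day = 'Sun' OR day = 'Sat'
    GROUP BY sex, smoker
    ORDER BY pct
""").fetchall()
for row in rows:
    print(row)
```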
Joins?

Let’s first talk about real-world schemas

Wouldn’t it be nice if there was just one big table with everything we need?

pname category price qty date day city state country
Corn Food 25 25 3/30/16 Wed. Omaha NE USA
Corn Food 25 8 3/31/16 Thu. Omaha NE USA
Corn Food 25 15 4/1/16 Fri. Omaha NE USA
Galaxy 1 Phones 18 30 1/30/16 Wed. Omaha NE USA
Galaxy 1 Phones 18 20 3/31/16 Thu. Omaha NE USA
Galaxy 1 Phones 18 50 4/1/16 Fri. Omaha NE USA

Real World Schemas:

Sales
pid timeid locid sales
11 1 1 25
11 2 1 8
11 3 1 15
12 1 1 30
12 2 1 20
12 3 1 50
12 1 1 8
13 2 1 10
13 3 1 10
11 1 2 35
11 2 2 22

Time
timeid Date Day
1 3/30/16 Wed.
2 3/31/16 Thu.
3 4/1/16 Fri.

Locations
locid city state country
1 Omaha Nebraska USA
2 Seoul (none) Korea
5 Richmond Virginia USA

Products
pid pname category price
11 Corn Food 25
12 Galaxy 1 Phones 18
13 Peanuts Food 2

Managers
mid locid title name
1 1 CTO Deer
2 1 Dr. Park
3 5 Mr. Smith

So many tables!!!!! Why?
Example Sales Data
pname category price qty date day city state country
Corn Food 25 25 3/30/16 Wed. Omaha NE USA
Corn Food 25 8 3/31/16 Thu. Omaha NE USA
Corn Food 25 15 4/1/16 Fri. Omaha NE USA
Galaxy 1 Phones 18 30 1/30/16 Wed. Omaha NE USA
Galaxy 1 Phones 18 20 3/31/16 Thu. Omaha NE USA
Galaxy 1 Phones 18 50 4/1/16 Fri. Omaha NE USA
Galaxy 1 Phones 18 8 1/30/16 Wed. Omaha NE USA
Peanuts Food 2 45 3/31/16 Thu. Seoul (none) Korea

Ø Big table: many columns and rows
Ø Substantial redundancy
Ø Risk of inconsistencies
Ø Difficult to update
Ø Could we organize the data more efficiently?

Multidimensional Data Model

Sales Fact Table
pid timeid locid sales
11 1 1 25
11 2 1 8
11 3 1 15
12 1 1 30
12 2 1 20
12 3 1 50
12 1 1 8
13 2 1 10
13 3 1 10
11 1 2 35
11 2 2 22
11 3 2 10
12 1 2 26

Dimension Tables

Locations
locid city state country
1 Omaha Nebraska USA
2 Seoul (none) Korea
5 Richmond Virginia USA

Products
pid pname category price
11 Corn Food 25
12 Galaxy 1 Phones 18
13 Peanuts Food 2

Time
timeid Date Day
1 3/30/16 Wed.
2 3/31/16 Thu.
3 4/1/16 Fri.

Ø Normalized Data Representation
Ø Fact Table
Ø minimizes redundant info.
Ø Reduces data errors
Ø Dimensions
Ø easy to manage and summarize
Ø Rename: Galaxy1 -> Phablet

Connections between tables
[Diagram: the Sales Fact Table in the center, linked to the Products, Time, and Locations dimension tables. This looks like a star …]

Products: pid pname category price
Time: timeid Date Day
Sales Fact Table: pid timeid locid sales
Locations: locid city state country

How do we do analysis?
Joins!!!!!

Joins!
Bringing tables together

Basic Joins
SELECT s.sid, s.sname, r.bid
FROM Sailors s, Reserves r
WHERE s.sid = r.sid
AND s.age > 20;

Ø Can select from multiple tables.
Ø By ensuring sids are equal we get an inner join.
Ø Will discuss joins in more detail in the next lecture.
http://sqlfiddle.com/#!17/4215a/10

Join Queries
SELECT [DISTINCT] <column expression list>
FROM <table1 [AS t1], ... , tableN [AS tn]>
[WHERE <predicate>]
[GROUP BY <column list>
[HAVING <predicate>] ]
[ORDER BY <column list>];

1. FROM : compute outer product of tables.


2. WHERE : Check conditions, discard tuples that fail.
3. SELECT : Specify desired fields in output.

Ø Note: likely a terribly inefficient strategy!


Ø Query optimizer will find more efficient plans.

The Outer-Product (×)
R1 × S1: Each row of R1 paired with each row of S1

R1:
sid bid day
22 101 10/10/96
58 103 11/12/96

S1:
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0

R1 × S1:
sid bid day sid sname rating age
22 101 10/10/96 22 dustin 7 45.0
22 101 10/10/96 31 lubber 8 55.5
22 101 10/10/96 58 rusty 10 35.0
58 103 11/12/96 22 dustin 7 45.0
58 103 11/12/96 31 lubber 8 55.5
58 103 11/12/96 58 rusty 10 35.0

How many rows in the result? |R1| * |S1|

Sometimes also called Cartesian Product:
{1, 2} × {a, b, c} = {(1,a), (1,b), (1,c), (2,a), (2,b), (2,c)}

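The row count is easy to confirm. This sketch loads R1 and S1 from the slide into SQLite (via Python's sqlite3 module) and compares the plain product with the version filtered by a join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R1 (sid INTEGER, bid INTEGER, day TEXT);
    CREATE TABLE S1 (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
    INSERT INTO R1 VALUES (22, 101, '10/10/96'), (58, 103, '11/12/96');
    INSERT INTO S1 VALUES (22, 'dustin', 7, 45.0), (31, 'lubber', 8, 55.5),
                          (58, 'rusty', 10, 35.0);
""")

# No WHERE clause: every row of R1 is paired with every row of S1.
product = conn.execute("SELECT * FROM R1, S1").fetchall()

# Adding the equality predicate keeps only the matching pairs.
joined = conn.execute("SELECT * FROM R1, S1 WHERE R1.sid = S1.sid").fetchall()

print(len(product), len(joined))
```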
Return Sailors (S) and the dates of their Reservations (R)
SELECT S.sname, R.day
FROM Reserves AS R, Sailors AS S
WHERE S.sid = R.sid

R1 ⋈ S1 (⋈ is the symbol for join in relational algebra)

R:
sid bid day
22 101 10/10/96
58 103 11/12/96

S:
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0

[Table: the six rows of R × S. Only the two rows where S.sid = R.sid (sid 22, dustin, and sid 58, rusty) satisfy the join condition.]

http://sqlfiddle.com/#!17/53815/1140/0
About Range Variables
Ø Needed when ambiguity could arise.
Ø e.g., same table used multiple times in FROM (“self-join”)

SELECT *
FROM Sailors AS S1, Sailors AS S2
WHERE S1.age > S2.age

Sailors:
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0

Result (S1 columns, then S2 columns):
sid sname rating age sid sname rating age
22 dustin 7 45.0 58 rusty 10 35.0
31 lubber 8 55.5 22 dustin 7 45.0
31 lubber 8 55.5 58 rusty 10 35.0

http://sqlfiddle.com/#!17/53815/4
Join Variants
SELECT (column_list)
FROM table_name
[INNER | {LEFT |RIGHT | FULL } {OUTER}] JOIN table_name
ON qualification_list
WHERE …
Ø INNER is default
Ø Inner join is akin to what we have seen so far.
Ø The term Outer is optional for Left, Right, and Full joins
Ø For example: LEFT OUTER = LEFT

Inner/Natural Joins

Sailors
sid sname rating age
1 Fred 7 22
2 Jim 2 39
3 Nancy 8 27

Boats
bid bname color
101 Nina red
102 Pinta blue
103 Santa Maria red

Reserves
sid bid day
1 102 9/12
2 102 9/13

SELECT s.sid, s.sname, r.bid
FROM Sailors s, Reserves r
WHERE s.sid = r.sid
AND s.age > 20;

SELECT s.sid, s.sname, r.bid
FROM Sailors s INNER JOIN Reserves r
ON s.sid = r.sid
WHERE s.age > 20;

SELECT s.sid, s.sname, r.bid
FROM Sailors s NATURAL JOIN Reserves r
WHERE s.age > 20;

Ø “NATURAL” means equi-join for each pair of attributes with the same name
Ø All 3 are equivalent!
http://sqlfiddle.com/#!17/4215a/10
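The equivalence claim can be checked mechanically. This sketch runs all three forms in SQLite (via Python's sqlite3 module) on the Sailors and Reserves rows from the slide and compares the results as sorted lists, since none of the queries specifies an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO Sailors VALUES (1, 'Fred', 7, 22), (2, 'Jim', 2, 39), (3, 'Nancy', 8, 27);
    INSERT INTO Reserves VALUES (1, 102, '9/12'), (2, 102, '9/13');
""")

queries = {
    "where": """SELECT s.sid, s.sname, r.bid
                FROM Sailors s, Reserves r
                WHERE s.sid = r.sid AND s.age > 20""",
    "inner": """SELECT s.sid, s.sname, r.bid
                FROM Sailors s INNER JOIN Reserves r ON s.sid = r.sid
                WHERE s.age > 20""",
    "natural": """SELECT s.sid, s.sname, r.bid
                  FROM Sailors s NATURAL JOIN Reserves r
                  WHERE s.age > 20""",
}
results = {label: sorted(conn.execute(sql).fetchall())
           for label, sql in queries.items()}
print(results)
```

Nancy has no reservation, so she is absent from all three results. The natural join only matches the other two here because `sid` is the only column name the tables share.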
Brief Detour: Null Values
Ø Field values are sometimes unknown
Ø SQL provides a special value NULL for such situations.
Ø Every data type can be NULL

Ø The presence of null complicates many issues. E.g.:


Ø Selection predicates (WHERE)
Ø Aggregation

Ø But NULLs are common after outer joins

NULL in comparators
What entries are in the output of all these queries?

SELECT rating = NULL FROM sailors;
SELECT rating < NULL FROM sailors;
SELECT rating >= NULL FROM sailors;

All of these queries evaluate to null!

Even this one!
SELECT * FROM sailors WHERE rating = NULL;

Rule: (x op NULL) evaluates to … NULL!

http://sqlfiddle.com/#!17/f35aa/6

Explicit NULL Checks

Ø To check if a value is NULL you must use explicit NULL checks

SELECT * FROM sailors WHERE rating IS NULL;

SELECT * FROM sailors WHERE rating IS NOT NULL;

http://sqlfiddle.com/#!17/f35aa/4
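A short sketch of both behaviors in SQLite via Python's sqlite3 module, where SQL NULL appears as Python `None`. The sailors rows are the ones used on the deck's NULL and Aggregation slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO sailors VALUES (?, ?, ?, ?)", [
    (1, "Popeye", 10, 22), (2, "OliveOyl", 11, 39), (3, "Garfield", 1, 27),
    (4, "Bob", 5, 19), (11, "Jack Sparrow", None, 35),
])

# Comparing anything with NULL yields NULL, for every row.
comparisons = conn.execute("SELECT rating = NULL FROM sailors").fetchall()

# A NULL predicate never passes WHERE, so this returns no rows at all.
eq_null = conn.execute("SELECT * FROM sailors WHERE rating = NULL").fetchall()

# The explicit check finds the row with the missing rating.
is_null = [r[0] for r in conn.execute(
    "SELECT sname FROM sailors WHERE rating IS NULL")]

print(comparisons, eq_null, is_null)
```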
Writing More Complex
SQL Queries

Common Table Expressions
Allow for the creation of “temporary tables” to organize complex queries:

WITH
dept_sizes AS (
SELECT dept, COUNT(*) AS size
FROM students
GROUP BY dept
),
female_gpa (dept, avg_gpa) AS (
SELECT dept, AVG(gpa)
FROM students
WHERE gender = 'F'
GROUP BY dept
)
SELECT dept_sizes.dept AS dept, size, avg_gpa
FROM dept_sizes, female_gpa
WHERE dept_sizes.dept = female_gpa.dept

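The CTE query can be run as written against the student rows from the earlier INSERT slide. This sketch uses SQLite via Python's sqlite3 module; the final `ORDER BY` is an addition, included only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (name TEXT PRIMARY KEY, gpa REAL, age INTEGER, "
    "dept TEXT, gender CHAR)"
)
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?)", [
    ("Sergey Brin", 2.8, 45, "CS", "M"), ("Danah Boyd", 3.9, 40, "CS", "F"),
    ("Bill Gates", 1.0, 63, "CS", "M"), ("Hillary Mason", 4.0, 39, "DATASCI", "F"),
    ("Mike Olson", 3.7, 53, "CS", "M"), ("Mark Zuckerberg", 3.8, 34, "CS", "M"),
    ("Sheryl Sandberg", 3.6, 49, "BUSINESS", "F"),
    ("Susan Wojcicki", 3.8, 50, "BUSINESS", "F"),
    ("Marissa Mayer", 2.6, 43, "BUSINESS", "F"),
])

rows = conn.execute("""
    WITH
    dept_sizes AS (
        SELECT dept, COUNT(*) AS size FROM students GROUP BY dept
    ),
    female_gpa (dept, avg_gpa) AS (
        SELECT dept, AVG(gpa) FROM students WHERE gender = 'F' GROUP BY dept
    )
    SELECT dept_sizes.dept AS dept, size, avg_gpa
    FROM dept_sizes, female_gpa
    WHERE dept_sizes.dept = female_gpa.dept
    ORDER BY dept_sizes.dept  -- added for a stable output order
""").fetchall()
for row in rows:
    print(row)
```

Each named subquery behaves like a table for the rest of the statement only; nothing is stored in the database.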
Details on Joins
You need to know this but we probably won’t cover in class.

Left Join
Returns all matched rows, and preserves all unmatched rows from the
table on the left of the join clause
(use nulls in fields of non-matching tuples)

SELECT s.sid, s.sname, r.bid
FROM Sailors2 s LEFT JOIN Reserves2 r
ON s.sid = r.sid;

Returns all sailors & bid for boat in any of their reservations
Note: If there is a sailor without a boat reservation then the sailor is matched with the NULL bid.

SELECT s.sid, s.sname, r.bid
FROM Sailors2 s LEFT JOIN Reserves2 r
ON s.sid = r.sid;

Sailors2
sid sname rating age
22 Dustin 7 45
31 Lubber 8 55.5
95 Bob 3 63.5

Reserves2
sid bid day
22 101 1996-10-10
95 103 1996-11-12

Result:
sid sname bid
22 Dustin 101
95 Bob 103
31 Lubber (null)
http://sqlfiddle.com/#!17/54a88/2
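The same left join, run in SQLite via Python's sqlite3 module as a small check. The unmatched sailor comes back with `None` (SQL NULL) in the bid column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors2 (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE Reserves2 (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO Sailors2 VALUES (22, 'Dustin', 7, 45), (31, 'Lubber', 8, 55.5),
                                (95, 'Bob', 3, 63.5);
    INSERT INTO Reserves2 VALUES (22, 101, '1996-10-10'), (95, 103, '1996-11-12');
""")

# Every sailor is kept; sailors without a reservation get NULL for r.bid.
rows = conn.execute("""
    SELECT s.sid, s.sname, r.bid
    FROM Sailors2 s LEFT JOIN Reserves2 r
    ON s.sid = r.sid
""").fetchall()
for row in rows:
    print(row)
```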
Right Join
Ø Right join returns all matched rows, and preserves all unmatched
rows from the table on the right of the join clause

SELECT r.sid, b.bid, b.bname
FROM Reserves2 r RIGHT JOIN Boats2 b
ON r.bid = b.bid;

Ø Returns all boats & information on which ones are reserved.
Ø No match for b.bid? r.sid IS NULL!

SELECT r.sid, b.bid, b.bname
FROM Reserves2 r RIGHT JOIN Boats2 b
ON r.bid = b.bid;

Reserves2
sid bid day
22 101 1996-10-10
95 103 1996-11-12

Boats2
bid bname color
101 Interlake blue
102 Interlake red
103 Clipper green
104 Marine red

Result:
sid bid bname
22 101 Interlake
95 103 Clipper
(null) 104 Marine
(null) 102 Interlake
http://sqlfiddle.com/#!17/a7b2f/1
Full Outer Join
Ø Full Outer Join returns all (matched or unmatched) rows
from the tables on both sides of the join clause

SELECT r.sid, b.bid, b.bname
FROM Reserves2 r FULL JOIN Boats2 b
ON r.bid = b.bid
Ø If no boat for a sailor? -> b.bid IS NULL AND b.bname IS NULL!
Ø If no sailor for a boat? -> r.sid IS NULL!

SELECT r.sid, b.bid, b.bname
FROM Reserves3 r FULL JOIN Boats2 b
ON r.bid = b.bid

Reserves3
sid bid day
22 101 1996-10-10
95 103 1996-11-12
38 42 2010-08-21

Boats2
bid bname color
101 Interlake blue
102 Interlake red
103 Clipper green
104 Marine red

Result:
sid bid bname
22 101 Interlake
95 103 Clipper
38 (null) (null)
(null) 104 Marine
(null) 102 Interlake
http://sqlfiddle.com/#!17/e1f3a/3/0
Bonus Slides on Null Logic
These will not be on the exam but are good to know about.

NULL in Boolean Logic
Three-valued logic (T = true, F = false, N = NULL):

Not   T  F  N
      F  T  N

And   T  F  N
T     T  F  N
F     F  F  F
N     N  F  N

Or    T  F  N
T     T  T  T
F     T  F  N
N     T  N  N

SELECT * FROM sailors WHERE rating > 8 AND TRUE;

SELECT * FROM sailors WHERE rating > 8 OR TRUE;

SELECT * FROM sailors WHERE NOT (rating > 8);

http://sqlfiddle.com/#!17/f35aa/2
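A few entries of the truth tables can be evaluated directly. This sketch uses SQLite via Python's sqlite3 module, where true and false come back as 1 and 0 and NULL as `None`; the sailors rows are the ones from the deck's NULL and Aggregation slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO sailors VALUES (?, ?, ?, ?)", [
    (1, "Popeye", 10, 22), (2, "OliveOyl", 11, 39), (3, "Garfield", 1, 27),
    (4, "Bob", 5, 19), (11, "Jack Sparrow", None, 35),
])

# Selected truth-table entries: N AND F, N AND T, N OR T, N OR F, NOT N.
truth = conn.execute(
    "SELECT NULL AND 0, NULL AND 1, NULL OR 1, NULL OR 0, NOT NULL"
).fetchone()

# A row with a NULL rating fails both a predicate and its negation.
above = [r[0] for r in conn.execute(
    "SELECT sname FROM sailors WHERE rating > 8 ORDER BY sid")]
not_above = [r[0] for r in conn.execute(
    "SELECT sname FROM sailors WHERE NOT (rating > 8) ORDER BY sid")]

print(truth, above, not_above)
```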
NULL and Aggregation

sid sname rating age
1 Popeye 10 22
2 OliveOyl 11 39
3 Garfield 1 27
4 Bob 5 19
11 Jack Sparrow (null) 35

SELECT count(rating) FROM sailors;
Ø 4
SELECT sum(rating) FROM sailors;
Ø 27
SELECT avg(rating) FROM sailors;
Ø (10+11+1+5) / 4 = 6.75
SELECT count(*) FROM sailors;
Ø 5

http://bit.ly/ds100-sp18-null
http://sqlfiddle.com/#!17/f35aa/7
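The four aggregates can be reproduced with the sailors rows from the slide. This is a minimal sketch in SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO sailors VALUES (?, ?, ?, ?)", [
    (1, "Popeye", 10, 22), (2, "OliveOyl", 11, 39), (3, "Garfield", 1, 27),
    (4, "Bob", 5, 19), (11, "Jack Sparrow", None, 35),
])

# Aggregates over `rating` skip the NULL; count(*) counts rows, not values.
count_rating, sum_rating, avg_rating, count_all = conn.execute(
    "SELECT count(rating), sum(rating), avg(rating), count(*) FROM sailors"
).fetchone()

print(count_rating, sum_rating, avg_rating, count_all)
```

Note that the average divides by 4, not 5: the NULL rating is left out of both the sum and the count.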
NULLs: Summary
Ø x op NULL is NULL (so NULL op NULL is NULL too)
Ø WHERE NULL: do not send to output
Ø Boolean connectives: 3-valued logic
Ø Aggregates ignore NULL-valued inputs
