SQL
SQL Introduction
SQL (Structured Query Language) is the standard language for querying and manipulating data. Many standards are out there: ANSI SQL, SQL92 (a.k.a. SQL2), SQL99 (a.k.a. SQL3), ... Vendors support various subsets: watch for fun discussions in class!
SQL
Structured Query Language
SQL (Structured Query Language) is used to access and modify data stored in a database. The name can be pronounced letter by letter ("S-Q-L") or as "sequel". IBM first developed SQL in the 1970s, and it is now an ANSI/ISO standard. It is the standard language used by most relational database management systems (RDBMS), such as Oracle, Microsoft SQL Server, and Sybase.
SQL Commands:
SQL commands are instructions used to communicate with the database in order to perform specific tasks on data. They are used not only for searching the database but also for other functions: for example, you can create tables, add data to tables, modify data, drop tables, and set permissions for users. SQL commands are grouped into four major categories depending on their functionality:
o Data Definition Language (DDL)
o Data Manipulation Language (DML)
o Transaction Control Language (TCL)
o Data Control Language (DCL)
Data Definition Language (DDL)
These SQL commands are used for creating, modifying, and dropping the structure of database objects. The commands are CREATE, ALTER, DROP, RENAME, and TRUNCATE.
Data Manipulation Language (DML)
These SQL commands are used for storing, retrieving, modifying, and deleting data. These commands are SELECT, INSERT, UPDATE, and DELETE.
Transaction Control Language (TCL)
These SQL commands are used for managing changes affecting the data. These commands are COMMIT, ROLLBACK, and SAVEPOINT.
Data Control Language (DCL)
These SQL commands are used for providing security to database objects. These commands are GRANT and REVOKE.
SQL commands
A few of the SQL commands and clauses used in SQL programming:
SELECT statement, UPDATE statement, INSERT INTO statement, DELETE statement, WHERE clause, ORDER BY clause, GROUP BY clause, group functions, joins, views, indexes.
Tables in SQL
Table name: Product
Attribute names: PName, Price, Category, Manufacturer
Tuples or rows:
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi
Tables Explained
The schema of a table is the table name and its attributes: Product(PName, Price, Category, Manufacturer)
A key is an attribute whose values are unique; we underline a key. In Product(PName, Price, Category, Manufacturer) the key is PName.
Data Types in SQL
Atomic types:
Characters: CHAR(20), VARCHAR(50)
Numbers: INT, BIGINT, SMALLINT, FLOAT
Others: MONEY, DATE, TIME, TIMESTAMP, YEAR, etc.
Every attribute must have an atomic type.
Hence tables are flat. Why?
Tables Explained
A tuple = a record
Restriction: all attributes are of atomic type
A table = a set of tuples
Like a list, but it is unordered: no first(), no next(), no last().
SQL SELECT Statement
The SQL SELECT statement is used to query or retrieve data from a table in the database. A query may retrieve information from specified columns or from all of the columns in the table. To write a simple SELECT statement, you must specify the column name(s) and the table name. The whole query is called a SELECT statement.
SQL WHERE Clause
The WHERE clause is used when you want to retrieve specific information from a table and exclude irrelevant data. For example, when you want to see information about students in class 10 only, you do not need the information about students in other classes. Retrieving information about all the students would increase the processing time for the query.
SELECT column_list FROM table_name WHERE condition;
SQL Operators
There are two types of operators: comparison operators and logical operators. They are used mainly in the WHERE and HAVING clauses to filter the data to be selected. Comparison operators compare column data with specific values in a condition. There are three logical operators: AND, OR, and NOT.
Operators
Comparison Operators  Description
=                     equal to
<>, !=                is not equal to
<                     less than
>                     greater than
>=                    greater than or equal to
<=                    less than or equal to
Logical Operators     Description
OR                    For the row to be selected, at least one of the conditions must be true.
AND                   For a row to be selected, all the specified conditions must be true.
NOT                   For a row to be selected, the specified condition must be false.
SQL Comparison Keywords
There are other comparison keywords available in SQL that enhance the search capabilities of a query: LIKE, IN, BETWEEN...AND, and IS NULL.
Comparison Operators  Description
LIKE                  column value is similar to the specified character(s).
IN                    column value is equal to any one of a specified set of values.
BETWEEN...AND         column value is between two values, including the end values specified in the range.
IS NULL               column value does not exist.
SQL Comparison Keywords
Example:
SELECT first_name, last_name FROM student_details WHERE first_name LIKE 'S%';
first_name  last_name
----------  ---------
Stephen     Fleming
Shekar      Gowda
SQL Comparison Keywords
SELECT first_name, last_name, age FROM student_details WHERE age BETWEEN 10 AND 15;
SELECT first_name, last_name, subject FROM student_details WHERE subject IN ('Maths', 'Science');
SELECT first_name, last_name FROM student_details WHERE games IS NULL;
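The four comparison keywords can be tried end to end. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database; the column types and every sample row other than Stephen Fleming and Shekar Gowda are invented for illustration. SQLite's LIKE ignores ASCII case by default, which other systems may not do.

```python
import sqlite3

# In-memory database; the table contents are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_details (
    first_name TEXT, last_name TEXT, age INTEGER, subject TEXT, games TEXT
);
INSERT INTO student_details VALUES
    ('Stephen', 'Fleming', 10, 'Science',   'Cricket'),
    ('Shekar',  'Gowda',   18, 'Maths',     'Badminton'),
    ('Rahul',   'Sharma',  12, 'Science',   NULL),
    ('Priya',   'Chandra', 15, 'Economics', 'Chess');
""")

def names(sql):
    """Run a query and return its first column, sorted for stable output."""
    return sorted(row[0] for row in conn.execute(sql))

like_rows = names("SELECT first_name FROM student_details WHERE first_name LIKE 'S%'")
between_rows = names("SELECT first_name FROM student_details WHERE age BETWEEN 10 AND 15")
in_rows = names("SELECT first_name FROM student_details WHERE subject IN ('Maths', 'Science')")
null_rows = names("SELECT first_name FROM student_details WHERE games IS NULL")

print(like_rows, between_rows, in_rows, null_rows)
```

BETWEEN includes both end values, so the students aged exactly 10 and 15 are returned.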
SQL ORDER BY
The ORDER BY clause is used in a SELECT statement to sort results in either ascending or descending order. Oracle sorts query results in ascending order by default.
SELECT name, salary FROM employee ORDER BY salary;
name     salary
-------  ------
Soumya   20000
Ramesh   25000
Priya    30000
Hrithik  35000
Harsha   35000
SQL GROUP Functions
Group functions are built-in SQL functions that operate on groups of rows and return one value for the entire group. These functions are COUNT, MAX, MIN, AVG, and SUM. The DISTINCT keyword is not a group function itself, but it is often used alongside them to eliminate duplicate values.
SELECT DISTINCT dept FROM employee;
SELECT COUNT(*) FROM employee WHERE dept = 'Electronics';
SELECT MAX(salary) FROM employee;
SELECT SUM(salary) FROM employee;
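The group functions above can be checked on a small table. This is a sketch only, assuming Python's sqlite3 module; the names and salaries come from the ORDER BY example, while the dept values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
-- dept values are hypothetical; the source only lists names and salaries
INSERT INTO employee VALUES
    ('Soumya',  'Electronics', 20000),
    ('Ramesh',  'Electronics', 25000),
    ('Priya',   'Aeronautics', 30000),
    ('Hrithik', 'Electrical',  35000),
    ('Harsha',  'Aeronautics', 35000);
""")

def scalar(sql):
    """Return the single value produced by an aggregate query."""
    return conn.execute(sql).fetchone()[0]

depts = [r[0] for r in conn.execute("SELECT DISTINCT dept FROM employee ORDER BY dept")]
n_electronics = scalar("SELECT COUNT(*) FROM employee WHERE dept = 'Electronics'")
max_salary = scalar("SELECT MAX(salary) FROM employee")
total_salary = scalar("SELECT SUM(salary) FROM employee")

print(depts, n_electronics, max_salary, total_salary)
```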
SQL Query
Basic form (plus many, many more bells and whistles):
SELECT <attributes>
FROM <one or more relations>
WHERE <conditions>
Simple SQL Query
Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi
SELECT * FROM Product WHERE category='Gadgets'
Result ("selection"):
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
Simple SQL Query
Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi
SELECT PName, Price, Manufacturer FROM Product WHERE Price > 100
Result ("selection" and "projection"):
PName        Price    Manufacturer
SingleTouch  $149.99  Canon
MultiTouch   $203.99  Hitachi
Notation
Input schema: Product(PName, Price, Category, Manufacturer)
SELECT PName, Price, Manufacturer FROM Product WHERE Price > 100
Output schema: Answer(PName, Price, Manufacturer)
Details
Case insensitive:
Same: SELECT, Select, select
Same: Product, product
Different: 'Seattle', 'seattle'
Constants:
'abc' - yes
"abc" - no
The LIKE operator
SELECT * FROM Product WHERE PName LIKE '%gizmo%'
s LIKE p: pattern matching on strings
p may contain two special symbols:
% = any sequence of characters
_ = any single character
Eliminating Duplicates
SELECT DISTINCT category FROM Product
Category
Gadgets
Photography
Household
Compare to:
SELECT category FROM Product
Category
Gadgets
Gadgets
Photography
Household
Ordering the Results
SELECT pname, price, manufacturer FROM Product WHERE category='gizmo' AND price > 50 ORDER BY price, pname
Ties are broken by the second attribute on the ORDER BY list, etc.
Ordering is ascending, unless you specify the DESC keyword.
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi
What does each of these return?
SELECT DISTINCT category FROM Product ORDER BY category    ?
SELECT Category FROM Product ORDER BY PName    ?
SELECT DISTINCT category FROM Product ORDER BY PName    ?
Keys and Foreign Keys
Company (key: CName)
CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan
Product (foreign key: Manufacturer, which refers to Company.CName)
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi
Joins
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)
Find all products under $200 manufactured in Japan; return their names and prices.
Join between Product and Company:
SELECT PName, Price FROM Product, Company WHERE Manufacturer=CName AND Country='Japan' AND Price <= 200
Joins
Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi
Company
CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan
SELECT PName, Price FROM Product, Company WHERE Manufacturer=CName AND Country='Japan' AND Price <= 200
Result:
PName        Price
SingleTouch  $149.99
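The join can be reproduced with the two tables shown on this slide. This is a minimal sketch, assuming Python's sqlite3 module; the column types, the key declarations, and storing prices as plain numbers without the dollar sign are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (CName TEXT PRIMARY KEY, StockPrice INTEGER, Country TEXT);
CREATE TABLE Product (
    PName TEXT PRIMARY KEY, Price REAL, Category TEXT,
    Manufacturer TEXT REFERENCES Company(CName)  -- foreign key
);
INSERT INTO Company VALUES
    ('GizmoWorks', 25, 'USA'), ('Canon', 65, 'Japan'), ('Hitachi', 15, 'Japan');
INSERT INTO Product VALUES
    ('Gizmo',       19.99,  'Gadgets',     'GizmoWorks'),
    ('Powergizmo',  29.99,  'Gadgets',     'GizmoWorks'),
    ('SingleTouch', 149.99, 'Photography', 'Canon'),
    ('MultiTouch',  203.99, 'Household',   'Hitachi');
""")

# Join condition (Manufacturer = CName) plus two filters.
japan_under_200 = conn.execute("""
    SELECT PName, Price
    FROM Product, Company
    WHERE Manufacturer = CName AND Country = 'Japan' AND Price <= 200
""").fetchall()

print(japan_under_200)
```

MultiTouch is also made in Japan but is filtered out by the price condition.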
More Joins
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)
Find all Chinese companies that manufacture products both in the electronic and toy categories.
SELECT cname
FROM
WHERE
A Subtlety(Refinement) about Joins
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)
Find all countries that manufacture some product in the Gadgets category.
SELECT Country FROM Product, Company WHERE Manufacturer=CName AND Category='Gadgets'
Unexpected duplicates
A Subtlety about Joins
Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi
Company
CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan
SELECT Country FROM Product, Company WHERE Manufacturer=CName AND Category='Gadgets'
Country
??
??
What is the problem? What's the solution?
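One way to explore the question is to run the query and compare it with a DISTINCT variant. The sketch below assumes Python's sqlite3 module and the sample tables from the slide; only the columns the query needs are created, and DISTINCT is one possible fix rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (CName TEXT, Country TEXT);
CREATE TABLE Product (PName TEXT, Category TEXT, Manufacturer TEXT);
INSERT INTO Company VALUES
    ('GizmoWorks', 'USA'), ('Canon', 'Japan'), ('Hitachi', 'Japan');
INSERT INTO Product VALUES
    ('Gizmo',       'Gadgets',     'GizmoWorks'),
    ('Powergizmo',  'Gadgets',     'GizmoWorks'),
    ('SingleTouch', 'Photography', 'Canon'),
    ('MultiTouch',  'Household',   'Hitachi');
""")

JOIN = "FROM Product, Company WHERE Manufacturer = CName AND Category = 'Gadgets'"

# One output row per matching (product, company) pair, so USA appears twice.
with_duplicates = conn.execute("SELECT Country " + JOIN).fetchall()
# DISTINCT collapses the repeated rows.
without_duplicates = conn.execute("SELECT DISTINCT Country " + JOIN).fetchall()

print(with_duplicates, without_duplicates)
```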
Tuple Variables
Person(pname, address, worksfor)
Company(cname, address)
SELECT DISTINCT pname, address FROM Person, Company WHERE worksfor = cname
Which address?
SELECT DISTINCT Person.pname, Company.address FROM Person, Company WHERE Person.worksfor = Company.cname
SELECT DISTINCT x.pname, y.address FROM Person AS x, Company AS y WHERE x.worksfor = y.cname
Meaning (Semantics) of SQL Queries
SELECT a1, a2, ..., ak
FROM R1 AS x1, R2 AS x2, ..., Rn AS xn
WHERE Conditions

Answer = {}
for x1 in R1 do
  for x2 in R2 do
    ...
      for xn in Rn do
        if Conditions
          then Answer = Answer ∪ {(a1, ..., ak)}
return Answer
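The nested-loop semantics can be written out directly. The following is an illustrative sketch in plain Python, not how a real engine evaluates queries; the function name sfw and the tuple layouts are assumptions for this example. It collects results in a list so that duplicates are kept, as SQL does when DISTINCT is absent.

```python
from itertools import product

def sfw(relations, condition, select):
    """Evaluate SELECT-FROM-WHERE by brute-force nested loops.

    relations: list of lists of tuples (R1, ..., Rn)
    condition: function of n tuple variables (x1, ..., xn) returning bool
    select:    function of n tuple variables returning the output tuple
    """
    answer = []
    # itertools.product enumerates every combination (x1, ..., xn),
    # which is exactly what n nested for-loops would do.
    for combo in product(*relations):
        if condition(*combo):
            answer.append(select(*combo))
    return answer

# Tuples are (PName, Price, Category, Manufacturer) and (CName, StockPrice, Country).
products = [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
]
companies = [("GizmoWorks", 25, "USA"), ("Canon", 65, "Japan"), ("Hitachi", 15, "Japan")]

result = sfw(
    [products, companies],
    condition=lambda p, c: p[3] == c[0] and c[2] == "Japan" and p[1] <= 200,
    select=lambda p, c: (p[0], p[1]),
)
print(result)
```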
An Unintuitive Query
SELECT DISTINCT R.A FROM R, S, T WHERE R.A=S.A OR R.A=T.A
What does it compute?
Computes R ∩ (S ∪ T)
But what if S = ∅?
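The empty-S case follows from the nested-loop semantics: if S has no tuples, the loop over S never runs, so nothing is produced. A small sketch, assuming Python's sqlite3 module and made-up single-column tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (A INTEGER);
CREATE TABLE S (A INTEGER);
CREATE TABLE T (A INTEGER);
INSERT INTO R VALUES (1), (2);
INSERT INTO T VALUES (1);
-- S is deliberately left empty
""")

QUERY = "SELECT DISTINCT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A"

# R x S x T is empty when S is empty, even though 1 is in both R and T.
with_empty_s = conn.execute(QUERY).fetchall()

# With any tuple in S, the expected answer reappears.
conn.execute("INSERT INTO S VALUES (5)")
with_nonempty_s = conn.execute(QUERY).fetchall()

print(with_empty_s, with_nonempty_s)
```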
Subqueries Returning Relations
Company(name, city)
Product(pname, maker)
Purchase(id, product, buyer)
Return cities where one can find companies that manufacture products bought by Joe Blow:
SELECT Company.city FROM Company WHERE Company.name IN (SELECT Product.maker FROM Purchase, Product WHERE Product.pname=Purchase.product AND Purchase.buyer = 'Joe Blow');
Subqueries Returning Relations
Is it equivalent to this?
SELECT Company.city FROM Company, Product, Purchase WHERE Company.name=Product.maker AND Product.pname = Purchase.product AND Purchase.buyer = 'Joe Blow'
Beware of duplicates!
Removing Duplicates
SELECT DISTINCT Company.city FROM Company WHERE Company.name IN (SELECT Product.maker FROM Purchase, Product WHERE Product.pname=Purchase.product AND Purchase.buyer = 'Joe Blow');
SELECT DISTINCT Company.city FROM Company, Product, Purchase WHERE Company.name=Product.maker AND Product.pname = Purchase.product AND Purchase.buyer = 'Joe Blow'
Now they are equivalent
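The difference between the subquery and the join shows up as soon as one company makes two products bought by the same person. This is a sketch under that assumption, using Python's sqlite3 module; the company, city, and product names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (name TEXT, city TEXT);
CREATE TABLE Product (pname TEXT, maker TEXT);
CREATE TABLE Purchase (id INTEGER, product TEXT, buyer TEXT);
INSERT INTO Company VALUES ('Acme', 'Seattle');
INSERT INTO Product VALUES ('Widget', 'Acme'), ('Gadget', 'Acme');
INSERT INTO Purchase VALUES (1, 'Widget', 'Joe Blow'), (2, 'Gadget', 'Joe Blow');
""")

SUBQUERY = """
    {select} Company.city FROM Company
    WHERE Company.name IN (SELECT Product.maker FROM Purchase, Product
                           WHERE Product.pname = Purchase.product
                             AND Purchase.buyer = 'Joe Blow')
"""
JOIN = """
    {select} Company.city FROM Company, Product, Purchase
    WHERE Company.name = Product.maker
      AND Product.pname = Purchase.product
      AND Purchase.buyer = 'Joe Blow'
"""

def run(template, select):
    return conn.execute(template.format(select=select)).fetchall()

via_subquery = run(SUBQUERY, "SELECT")        # one row per company
via_join = run(JOIN, "SELECT")                # one row per matching purchase
via_join_distinct = run(JOIN, "SELECT DISTINCT")

print(via_subquery, via_join, via_join_distinct)
```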
Correlated Queries
Movie (title, year, director, length)
Find movies whose title appears more than once.
SELECT DISTINCT title FROM Movie AS x WHERE year <> ANY (SELECT year FROM Movie WHERE title = x.title);
The reference to x.title inside the subquery is the correlation.
Note: (1) the scope of variables; (2) this can still be expressed as a single SFW (SELECT-FROM-WHERE) query.
Complex Correlated Query
Product (pname, price, category, maker, year)
Find products (and their manufacturers) that are more expensive than all products made by the same manufacturer before 1972.
SELECT DISTINCT pname, maker FROM Product AS x WHERE price > ALL (SELECT price FROM Product AS y WHERE x.maker = y.maker AND y.year < 1972);
Very powerful ! Also much harder to optimize.
Aggregation
SELECT avg(price) FROM Product WHERE maker='Toyota'
SELECT count(*) FROM Product WHERE year > 1995
SQL supports several aggregation operations: sum, count, min, max, avg.
Except for count, all aggregations apply to a single attribute.
Aggregation: Count
COUNT applies to duplicates, unless otherwise stated:
SELECT Count(category) FROM Product WHERE year > 1995
This is the same as Count(*), provided category is never NULL. We probably want:
SELECT Count(DISTINCT category) FROM Product WHERE year > 1995
More Examples
Purchase(product, date, price, quantity)
What do they mean?
SELECT Sum(price * quantity) FROM Purchase
SELECT Sum(price * quantity) FROM Purchase WHERE product = 'bagel'
Simple Aggregations
Purchase
Product  Date   Price  Quantity
Bagel    10/21  1      20
Banana   10/3   0.5    10
Banana   10/10  1      10
Bagel    10/25  1.50   20
SELECT Sum(price * quantity) FROM Purchase WHERE product = 'Bagel'
Result: 50 (= 20+30)
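The arithmetic can be confirmed on the four Purchase rows. This is a sketch assuming Python's sqlite3 module, with dates stored as plain text exactly as written on the slide. It also shows that string constants are case sensitive: in SQLite's default comparison, 'bagel' matches no row, and SUM over no rows is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Purchase (product TEXT, date TEXT, price REAL, quantity INTEGER);
INSERT INTO Purchase VALUES
    ('Bagel',  '10/21', 1.00, 20),
    ('Banana', '10/3',  0.50, 10),
    ('Banana', '10/10', 1.00, 10),
    ('Bagel',  '10/25', 1.50, 20);
""")

def scalar(sql):
    return conn.execute(sql).fetchone()[0]

total_all = scalar("SELECT Sum(price * quantity) FROM Purchase")
total_bagel = scalar("SELECT Sum(price * quantity) FROM Purchase WHERE product = 'Bagel'")
# Lowercase constant: no row matches, so the sum is NULL (None in Python).
total_lowercase = scalar("SELECT Sum(price * quantity) FROM Purchase WHERE product = 'bagel'")

print(total_all, total_bagel, total_lowercase)
```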
Grouping and Aggregation
Purchase(product, date, price, quantity)
Find total sales after 10/1/2005 per product.
SELECT product, Sum(price*quantity) AS TotalSales
FROM Purchase
WHERE date > '10/1/2005'
GROUP BY product
Let's see what this means.
Grouping and Aggregation
1. Compute the FROM and WHERE clauses.
2. Group by the attributes in the GROUP BY clause.
3. Compute the SELECT clause: grouped attributes and aggregates.
1&2. FROM-WHERE-GROUP BY
Product  Date   Price  Quantity
Bagel    10/21  1      20
Bagel    10/25  1.50   20
Banana   10/3   0.5    10
Banana   10/10  1      10
3. SELECT
Product  TotalSales
Bagel    50
Banana   15
SELECT product, Sum(price*quantity) AS TotalSales
FROM Purchase
WHERE date > '10/1/2005'
GROUP BY product
GROUP BY vs. Nested Queries
SELECT product, Sum(price*quantity) AS TotalSales FROM Purchase WHERE date > '10/1/2005' GROUP BY product
SELECT DISTINCT x.product, (SELECT Sum(y.price*y.quantity) FROM Purchase y WHERE x.product = y.product AND y.date > '10/1/2005') AS TotalSales FROM Purchase x WHERE x.date > '10/1/2005'
Another Example
What does it mean?
SELECT product, sum(price * quantity) AS SumSales, max(quantity) AS MaxQuantity FROM Purchase GROUP BY product
HAVING Clause
Same query, except that we consider only products whose total quantity sold exceeds 30.
SELECT product, Sum(price * quantity) FROM Purchase WHERE date > '10/1/2005' GROUP BY product HAVING Sum(quantity) > 30
The HAVING clause contains conditions on aggregates.
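The effect of HAVING is easiest to see next to the plain GROUP BY result. This is a minimal sketch, assuming Python's sqlite3 module; it stores dates as ISO text (YYYY-MM-DD) so that string comparison orders them correctly, which is an assumption of this example and not the slide's date format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Purchase (product TEXT, date TEXT, price REAL, quantity INTEGER);
INSERT INTO Purchase VALUES
    ('Bagel',  '2005-10-21', 1.00, 20),
    ('Bagel',  '2005-10-25', 1.50, 20),
    ('Banana', '2005-10-03', 0.50, 10),
    ('Banana', '2005-10-10', 1.00, 10);
""")

# Steps 1-3: filter rows, group them, then compute one aggregate per group.
grouped = conn.execute("""
    SELECT product, Sum(price * quantity) AS TotalSales
    FROM Purchase
    WHERE date > '2005-10-01'
    GROUP BY product
    ORDER BY product
""").fetchall()

# HAVING filters whole groups after aggregation: Banana sold only 20 units.
having = conn.execute("""
    SELECT product, Sum(price * quantity) AS TotalSales
    FROM Purchase
    WHERE date > '2005-10-01'
    GROUP BY product
    HAVING Sum(quantity) > 30
    ORDER BY product
""").fetchall()

print(grouped, having)
```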
Advanced SQLizing
1. Getting around INTERSECT and EXCEPT
2. Quantifiers
3. Aggregation vs. subqueries
INTERSECT and EXCEPT: not supported by every system (e.g., older versions of SQL Server)
1. INTERSECT and EXCEPT:
(SELECT R.A, R.B FROM R) INTERSECT (SELECT S.A, S.B FROM S)
can be rewritten as:
SELECT R.A, R.B FROM R WHERE EXISTS (SELECT * FROM S WHERE R.A=S.A and R.B=S.B)
If R, S have no duplicates, then we can write this without subqueries (how?)
(SELECT R.A, R.B FROM R) EXCEPT (SELECT S.A, S.B FROM S)
can be rewritten as:
SELECT R.A, R.B FROM R WHERE NOT EXISTS (SELECT * FROM S WHERE R.A=S.A and R.B=S.B)
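The two rewritings can be compared against the built-in set operators on a system that has them. The sketch below assumes Python's sqlite3 module and invented contents for R and S (no duplicates, no NULLs); SQLite does not accept parentheses around the operands of INTERSECT and EXCEPT, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (A INTEGER, B TEXT);
CREATE TABLE S (A INTEGER, B TEXT);
INSERT INTO R VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO S VALUES (2, 'y'), (3, 'z'), (4, 'w');
""")

def rows(sql):
    """Return the result as a sorted list so the two forms can be compared."""
    return sorted(conn.execute(sql))

intersect_builtin = rows("SELECT R.A, R.B FROM R INTERSECT SELECT S.A, S.B FROM S")
intersect_exists = rows("""
    SELECT R.A, R.B FROM R
    WHERE EXISTS (SELECT * FROM S WHERE R.A = S.A AND R.B = S.B)
""")

except_builtin = rows("SELECT R.A, R.B FROM R EXCEPT SELECT S.A, S.B FROM S")
except_not_exists = rows("""
    SELECT R.A, R.B FROM R
    WHERE NOT EXISTS (SELECT * FROM S WHERE R.A = S.A AND R.B = S.B)
""")

print(intersect_exists, except_not_exists)
```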
2. Quantifiers
Product (pname, price, company)
Company (cname, city)
Find all companies that make some products with price < 100.
SELECT DISTINCT Company.cname FROM Company, Product WHERE Company.cname = Product.company and Product.price < 100
Existential: easy !
2. Quantifiers
Product (pname, price, company)
Company (cname, city)
Find all companies that make only products with price < 100.
same as: Find all companies s.t. all of their products have price < 100
Universal: hard !
2. Quantifiers
1. Find the other companies, i.e. those s.t. some product has price >= 100:
SELECT DISTINCT Company.cname FROM Company WHERE Company.cname IN (SELECT Product.company FROM Product WHERE Product.price >= 100)
2. Find all companies s.t. all their products have price < 100:
SELECT DISTINCT Company.cname FROM Company WHERE Company.cname NOT IN (SELECT Product.company FROM Product WHERE Product.price >= 100)
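The two-step construction can be tried on a few rows. This is a sketch assuming Python's sqlite3 module; the company and product names are hypothetical. A company with no products at all also qualifies, since "all of its products" is then vacuously true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, city TEXT);
CREATE TABLE Product (pname TEXT, price INTEGER, company TEXT);
INSERT INTO Company VALUES
    ('CheapCo', 'Lyon'), ('MixedCo', 'Oslo'), ('NoProducts', 'Rome');
INSERT INTO Product VALUES
    ('pen', 5, 'CheapCo'), ('cup', 20, 'CheapCo'),
    ('bag', 50, 'MixedCo'), ('tv', 900, 'MixedCo');
""")

# Step 1: companies with SOME product priced >= 100 (the ones to exclude).
others = conn.execute("""
    SELECT DISTINCT cname FROM Company
    WHERE cname IN (SELECT company FROM Product WHERE price >= 100)
    ORDER BY cname
""").fetchall()

# Step 2: negate the membership test to get the universal condition.
only_cheap = conn.execute("""
    SELECT DISTINCT cname FROM Company
    WHERE cname NOT IN (SELECT company FROM Product WHERE price >= 100)
    ORDER BY cname
""").fetchall()

print(others, only_cheap)
```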
3. Group-by v.s. Nested Query
Author(login, name)
Wrote(login, url)
Find authors who wrote at least 10 documents.
Attempt 1: with nested queries
SELECT DISTINCT Author.name FROM Author WHERE (SELECT count(Wrote.url) FROM Wrote WHERE Author.login=Wrote.login) >= 10
This is SQL by a novice.
3. Group-by v.s. Nested Query
Find all authors who wrote at least 10 documents.
Attempt 2: SQL style (with GROUP BY)
SELECT Author.name FROM Author, Wrote WHERE Author.login=Wrote.login GROUP BY Author.name HAVING count(wrote.url) >= 10
This is SQL by an expert.
No need for DISTINCT: it follows automatically from GROUP BY.
3. Group-by v.s. Nested Query
Author(login, name)
Wrote(login, url)
Mentions(url, word)
Find authors with a vocabulary of at least 10000 words:
SELECT Author.name FROM Author, Wrote, Mentions WHERE Author.login=Wrote.login AND Wrote.url=Mentions.url GROUP BY Author.name HAVING count(distinct Mentions.word) >= 10000
Two Examples
Store(sid, sname) Product(pid, pname, price, sid)
Find all stores that sell only products with price > 100
same as:
Find all stores s.t. all their products have price > 100
SELECT Store.sname FROM Store, Product WHERE Store.sid = Product.sid GROUP BY Store.sid, Store.sname HAVING 100 < min(Product.price)
Why both (Store.sid and Store.sname) in the GROUP BY?
Almost equivalent:
SELECT Store.sname FROM Store WHERE 100 < ALL (SELECT Product.price FROM Product WHERE Store.sid = Product.sid)
SELECT Store.sname FROM Store WHERE Store.sid NOT IN (SELECT Product.sid FROM Product WHERE Product.price <= 100)
Two Examples
Store(sid, sname) Product(pid, pname, price, sid)
For each store, find its most expensive product
Two Examples
This is easy but doesn't do what we want:
SELECT Store.sname, max(Product.price) FROM Store, Product WHERE Store.sid = Product.sid GROUP BY Store.sid, Store.sname
Better:
SELECT Store.sname, x.pname FROM Store, Product x WHERE Store.sid = x.sid and x.price >= ALL (SELECT y.price FROM Product y WHERE Store.sid = y.sid)
But this may return multiple product names per store.
Two Examples
Finally, choose one product arbitrarily if there are many with the highest price (here, the one whose name sorts last):
SELECT Store.sname, max(x.pname) FROM Store, Product x WHERE Store.sid = x.sid and x.price >= ALL (SELECT y.price FROM Product y WHERE Store.sid = y.sid) GROUP BY Store.sname
NULLS in SQL
Whenever we don't have a value, we can put a NULL. It can mean many things:
Value does not exist
Value exists but is unknown
Value not applicable
Etc.
The schema specifies for each attribute whether it can be null (nullable attribute) or not.
How does SQL cope with tables that have NULLs?
Null Values
If x = NULL then 4*(3-x)/7 is still NULL.
If x = NULL then x='Joe' is UNKNOWN.
In SQL there are three boolean values:
FALSE = 0
UNKNOWN = 0.5
TRUE = 1
Null Values
C1 AND C2 = min(C1, C2)
C1 OR C2 = max(C1, C2)
NOT C1 = 1 - C1
SELECT * FROM Person WHERE (age < 25) AND (height > 6 OR weight > 190)
Rule in SQL: include only tuples that yield TRUE.
E.g. age=20, height=NULL, weight=200: the condition is 1 AND (0.5 OR 1) = min(1, max(0.5, 1)) = 1, so the tuple is included.
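The min/max/1-minus rules translate directly into code. This is an illustrative model in plain Python, with None standing in for NULL; the helper names (and3, or3, not3, lt, gt) are made up for this sketch.

```python
TRUE, UNKNOWN, FALSE = 1.0, 0.5, 0.0

def and3(c1, c2):
    return min(c1, c2)

def or3(c1, c2):
    return max(c1, c2)

def not3(c1):
    return 1 - c1

def lt(x, y):
    """Three-valued x < y: any comparison involving NULL (None) is UNKNOWN."""
    if x is None or y is None:
        return UNKNOWN
    return TRUE if x < y else FALSE

def gt(x, y):
    return lt(y, x)

def where(age, height, weight):
    """(age < 25) AND (height > 6 OR weight > 190)"""
    return and3(lt(age, 25), or3(gt(height, 6), gt(weight, 190)))

# The slide's example: height is NULL, but weight > 190 makes the OR true.
included = where(age=20, height=None, weight=200)
# With a lower weight the OR is UNKNOWN, so the tuple is not selected.
excluded = where(age=20, height=None, weight=150)

print(included == TRUE, excluded == TRUE)
```

Only a result equal to TRUE selects the tuple; UNKNOWN is treated like FALSE at that final step.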
Null Values
Unexpected behavior:
SELECT * FROM Person WHERE age < 25 OR age >= 25
Some Persons are not included!
Null Values
Can test for NULL explicitly:
x IS NULL x IS NOT NULL
SELECT * FROM Person WHERE age < 25 OR age >= 25 OR age IS NULL
Now it includes all Persons
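The unexpected behavior is easy to reproduce. This is a small sketch assuming Python's sqlite3 module and a hypothetical Person table with one NULL age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (name TEXT, age INTEGER);
INSERT INTO Person VALUES ('Ann', 20), ('Bob', 40), ('Cy', NULL);
""")

def names(sql):
    return sorted(row[0] for row in conn.execute(sql))

# For Cy both comparisons are UNKNOWN, so the row is dropped.
without_null_test = names("SELECT name FROM Person WHERE age < 25 OR age >= 25")
# The explicit IS NULL test brings the row back.
with_null_test = names(
    "SELECT name FROM Person WHERE age < 25 OR age >= 25 OR age IS NULL"
)

print(without_null_test, with_null_test)
```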
Outerjoins
Explicit joins in SQL = "inner joins":
Product(name, category)
Purchase(prodName, store)
SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName
Same as:
SELECT Product.name, Purchase.store FROM Product, Purchase WHERE Product.name = Purchase.prodName
But Products that never sold will be lost!
Outerjoins
Left outer joins in SQL:
Product(name, category)
Purchase(prodName, store)
SELECT Product.name, Purchase.store FROM Product LEFT OUTER JOIN Purchase ON Product.name = Purchase.prodName
Product
Name      Category
Gizmo     gadget
Camera    Photo
OneClick  Photo
Purchase
ProdName  Store
Gizmo     Wiz
Camera    Ritz
Camera    Wiz
Result
Name      Store
Gizmo     Wiz
Camera    Ritz
Camera    Wiz
OneClick  NULL
Application
Compute, for each product, the total number of sales in September
Product(name, category)
Purchase(prodName, month, store)
SELECT Product.name, count(*) FROM Product, Purchase WHERE Product.name = Purchase.prodName and Purchase.month = 'September' GROUP BY Product.name
What's wrong?
Application
Compute, for each product, the total number of sales in September
Product(name, category)
Purchase(prodName, month, store)
SELECT Product.name, count(Purchase.prodName) FROM Product LEFT OUTER JOIN Purchase ON Product.name = Purchase.prodName and Purchase.month = 'September' GROUP BY Product.name
Now we also get the products that sold in 0 quantity. Note that we count Purchase.prodName rather than *: count(*) would report 1 for a product with no sales, because it counts the NULL-padded row.
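The three variants of the September count can be compared side by side. This is a sketch assuming Python's sqlite3 module; the August row and the month values are invented. It shows the inner join losing OneClick, and count(*) over the outer join reporting 1 for it because the NULL-padded row is still a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (name TEXT, category TEXT);
CREATE TABLE Purchase (prodName TEXT, month TEXT, store TEXT);
INSERT INTO Product VALUES
    ('Gizmo', 'gadget'), ('Camera', 'Photo'), ('OneClick', 'Photo');
INSERT INTO Purchase VALUES
    ('Gizmo',  'September', 'Wiz'),
    ('Gizmo',  'August',    'Wiz'),
    ('Camera', 'September', 'Ritz'),
    ('Camera', 'September', 'Wiz');
""")

inner = conn.execute("""
    SELECT Product.name, count(*) FROM Product, Purchase
    WHERE Product.name = Purchase.prodName AND Purchase.month = 'September'
    GROUP BY Product.name ORDER BY Product.name
""").fetchall()

OUTER = """
    SELECT Product.name, {agg} FROM Product LEFT OUTER JOIN Purchase
      ON Product.name = Purchase.prodName AND Purchase.month = 'September'
    GROUP BY Product.name ORDER BY Product.name
"""
outer_count_star = conn.execute(OUTER.format(agg="count(*)")).fetchall()
# Counting a Purchase column skips the NULLs of unmatched products.
outer_count_col = conn.execute(OUTER.format(agg="count(Purchase.prodName)")).fetchall()

print(inner, outer_count_star, outer_count_col)
```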
Outer Joins
Left outer join:
Include the left tuple even if there's no match
Right outer join:
Include the right tuple even if there's no match
Full outer join:
Include both the left and right tuples even if there's no match
Modifying the Database
Three kinds of modifications:
Insertions
Deletions
Updates
Sometimes they are all called updates
Insertions
General form: INSERT INTO R(A1, ..., An) VALUES (v1, ..., vn)
Example: insert a new purchase into the database:
INSERT INTO Purchase(buyer, seller, product, store) VALUES ('Joe', 'Fred', 'wakeup-clock-espresso-machine', 'The Sharper Image')
Missing attribute → NULL.
May drop attribute names if you give the values in order.
Insertions
INSERT INTO PRODUCT(name) SELECT DISTINCT Purchase.product FROM Purchase WHERE Purchase.date > '10/26/01'
The query replaces the VALUES keyword. Here we insert many tuples into PRODUCT.
Insertion: an Example
Product(name, listPrice, category)
Purchase(prodName, buyerName, price)
prodName is a foreign key referencing Product.name
Suppose the database got corrupted and we need to fix it:
Product
name   listPrice  category
gizmo  100        gadgets
Purchase
prodName  buyerName  price
camera    John       200
gizmo     Smith      80
camera    Smith      225
Task: insert in Product all prodNames from Purchase
Insertion: an Example
INSERT INTO Product(name)
SELECT DISTINCT prodName FROM Purchase WHERE prodName NOT IN (SELECT name FROM Product)
Resulting Product table:
name    listPrice  category
gizmo   100        Gadgets
camera  -          -
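The repair step can be run as written. This is a sketch assuming Python's sqlite3 module and the sample rows from the slide; the column types are assumptions. The attributes not listed in the INSERT are filled with NULL, shown as "-" in the table above and as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (name TEXT, listPrice INTEGER, category TEXT);
CREATE TABLE Purchase (prodName TEXT, buyerName TEXT, price INTEGER);
INSERT INTO Product VALUES ('gizmo', 100, 'gadgets');
INSERT INTO Purchase VALUES
    ('camera', 'John', 200), ('gizmo', 'Smith', 80), ('camera', 'Smith', 225);

-- The query takes the place of the VALUES keyword.
INSERT INTO Product(name)
    SELECT DISTINCT prodName FROM Purchase
    WHERE prodName NOT IN (SELECT name FROM Product);
""")

products = conn.execute(
    "SELECT name, listPrice, category FROM Product ORDER BY name"
).fetchall()
print(products)
```

DISTINCT matters here: camera appears twice in Purchase but is inserted only once.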
Insertion: an Example
INSERT INTO Product(name, listPrice)
SELECT DISTINCT prodName, price FROM Purchase WHERE prodName NOT IN (SELECT name FROM Product)
Resulting Product table:
name       listPrice  category
gizmo      100        Gadgets
camera     200        -
camera ??  225 ??     -
Depends on the implementation.
Deletions
Example:
DELETE FROM PURCHASE
WHERE seller = 'Joe' AND product = 'Brooklyn Bridge'
Factoid about SQL: there is no way to delete only a single occurrence of a tuple that appears twice in a relation.
Updates
Example:
UPDATE PRODUCT SET price = price/2 WHERE Product.name IN (SELECT product FROM Purchase WHERE Date = 'Oct, 25, 1999');