Database Cheatsheet
Chapter 1
A DBMS (database management system) is specialized software for efficiently managing large amounts of
mostly structured data. A DBMS provides
a data model, which specifies the logical structure of the data, called the schema
a high-level query language
an efficient persistent storage system
transaction management, where a transaction is an atomic unit of work
Data Model
...is a collection of conceptual tools for describing
data
relationships among various data
data semantics
consistency constraints
It provides users with an abstract view of the data. There are data models like
relational model
entity-relationship data model, which is mainly for database design
object-based data models
semi-structured data model
et cetera
Relational Model
Levels of Abstraction
Physical, logical, and view level
Data Independence
ability to make physical or logical level changes without affecting application programs
domain constraints
Chapter 2
Terms
A relation schema R(A1, ..., An) names a table R and its attributes A1, ..., An.
logical design of a database
A relation instance r(R) defined over schema R is a set of rows
a snapshot of the data in the database at a given instant in time
A tuple is an element of a relation, aka row in a table
Keys
A key K ⊆ R = {A1, ..., An}
K is a superkey if the values of K uniquely identify a tuple in every possible relation instance r(R)
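The superkey definition quantifies over every possible instance, so a single instance can only refute a candidate K, never prove it. A minimal Python sketch of that check, assuming a hypothetical student(ID, name, dept_name) instance stored as a list of dicts:

```python
def is_superkey_for(rows, key_attrs):
    """Return True if no two rows of this one instance agree on all key_attrs."""
    seen = set()
    for row in rows:
        key_value = tuple(row[a] for a in key_attrs)
        if key_value in seen:
            return False  # two rows share the same key value
        seen.add(key_value)
    return True

# Hypothetical instance, made up for illustration.
students = [
    {"ID": 1, "name": "Kim", "dept_name": "CS"},
    {"ID": 2, "name": "Lee", "dept_name": "CS"},
    {"ID": 3, "name": "Kim", "dept_name": "Math"},
]

print(is_superkey_for(students, ["ID"]))                 # ID is unique here
print(is_superkey_for(students, ["name"]))               # two rows named Kim
print(is_superkey_for(students, ["name", "dept_name"]))  # unique in this instance only
```

Note that {name, dept_name} passes on this instance, yet nothing stops a later instance from having two Kims in CS; that is why keys are declared on the schema rather than discovered from data.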
Relational Algebra
Basic Operators
selection σ
σ_p(r), where p is the selection predicate
relation consisting of the rows of r that satisfy the predicate
projection Π
Π_{A1,...,Ak}(r), where the Ai are attribute names
relation of k listed columns
since relations are sets, duplicate rows are removed
cartesian product ×
a tuple from each possible pair of tuples
If an attribute of same name exists in both relations, we need to distinguish them.
join r ⋈_θ s = σ_θ(r × s)
union ∪, difference −
relations must have the same arity, i.e., same number of attributes
...and the attribute domains must be compatible
rename ρ
ρ_x(E), where E is a relational algebra expression
The result of E has no name by default; ρ gives it the name x so that it can be referred to.
Can also rename the attributes, as ρ_{x(A1,...,Ak)}(E)
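Selection, projection, Cartesian product, union, difference, and rename are the fundamental operators of
relational algebra: none of them can be written in terms of the others. Join is the exception in this list,
since it is defined as a selection over a Cartesian product.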
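The basic operators can be sketched on plain Python sets. This is only an illustration, assuming a made-up representation (each tuple is a frozenset of (attribute, value) pairs) and hypothetical instructor/teaches data; union and difference are simply Python's | and - on such sets.

```python
def relation(rows):
    """A relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def select(pred, r):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in r if pred(dict(t))}

def project(attrs, r):
    """Projection: keep the listed attributes; the set drops duplicate rows."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in r}

def product(r, s):
    """Cartesian product; assumes the relations share no attribute names."""
    return {t | u for t in r for u in s}

def theta_join(theta, r, s):
    """Theta join, defined as a selection over the Cartesian product."""
    return select(theta, product(r, s))

# Hypothetical data; t_ID is already renamed so attribute names do not clash.
instructor = relation([{"ID": 1, "name": "Kim"}, {"ID": 2, "name": "Lee"}])
teaches = relation([{"t_ID": 1, "course_id": "DB"}, {"t_ID": 1, "course_id": "OS"}])

pairs = product(instructor, teaches)
joined = theta_join(lambda t: t["ID"] == t["t_ID"], instructor, teaches)
names = project(["name"], joined)
print(len(pairs), len(joined), len(names))
```

Kim teaches two courses, so the join has two tuples, but projecting onto name leaves one, because relations are sets.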
Extended Operators
duplicate-elimination δ
grouping γ
_{grouping attributes} γ_{aggregation} (r): grouping attributes as the left subscript, aggregate functions as the right subscript
outer join ⟕, ⟖, ⟗ (left, right, full)
assignment ←
A query can be written as a sequential program consisting of a series of assignments.
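Grouping and assignment can be sketched in Python as well. The helper below is hypothetical (not a library function) and handles a single aggregate; the sequence of variable assignments plays the role of the ← operator.

```python
from collections import defaultdict

def group_aggregate(rows, group_attrs, agg_attr, agg_fn):
    """Grouping: group rows by group_attrs, then apply agg_fn to agg_attr per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in group_attrs)].append(row[agg_attr])
    return {key: agg_fn(values) for key, values in groups.items()}

# Hypothetical instructor(name, dept_name, salary) instance.
instructor = [
    {"name": "Kim", "dept_name": "CS", "salary": 80000},
    {"name": "Lee", "dept_name": "CS", "salary": 90000},
    {"name": "Park", "dept_name": "Math", "salary": 60000},
]

# A query written as a series of assignments, one intermediate result per step.
avg_by_dept = group_aggregate(instructor, ["dept_name"], "salary",
                              lambda v: sum(v) / len(v))
well_paid = {dept: avg for dept, avg in avg_by_dept.items() if avg > 70000}
print(well_paid)
```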
Chapter 3
DDL in SQL
We have data types like:
char(n): fixed-length string
varchar(n): string with maximum length n
int: integer, machine-dependent
smallint: small integer, machine-dependent
numeric(p, d): aka decimal, fixed-point number with p digits in total, d of them after the decimal point
e.g. numeric(3,2) can store 3.24, but not 32.4
real, double precision: floating point, machine-dependent
We create tables like:
CREATE TABLE table_name (
attr0_name attr0_type,
attr1_name attr1_type NOT NULL,
...
attrn_name attrn_type,
PRIMARY KEY (attrp0_name, ..., attrpm_name),
FOREIGN KEY (attrf0_name, ..., attrfk_name) REFERENCES other_table
)
We alter tables like:
ALTER TABLE r
ADD attr_name attr_type
or:
ALTER TABLE r
DROP attr_name
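To try the DDL statements above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The department/instructor schema and the email column are made up for illustration, and SQLite is only a stand-in: it treats declared types loosely, so lengths such as VARCHAR(20) are not enforced there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE department (
        dept_name VARCHAR(20),
        building  VARCHAR(15),
        PRIMARY KEY (dept_name)
    );
    CREATE TABLE instructor (
        ID        CHAR(5),
        name      VARCHAR(20) NOT NULL,
        dept_name VARCHAR(20),
        salary    NUMERIC(8, 2),
        PRIMARY KEY (ID),
        FOREIGN KEY (dept_name) REFERENCES department
    );
    ALTER TABLE instructor ADD COLUMN email VARCHAR(40);
""")

# PRAGMA table_info lists one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(instructor)")]
print(columns)

# NOT NULL is enforced: a NULL name is rejected.
try:
    conn.execute("INSERT INTO instructor (ID, name) VALUES ('10101', NULL)")
    not_null_enforced = False
except sqlite3.IntegrityError:
    not_null_enforced = True
print(not_null_enforced)
```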
We drop tables like:
... ROBERT');
DROP TABLE students; --
I bet you are already familiar with SELECT FROM WHERE. Use the DISTINCT or ALL keyword to explicitly
eliminate or keep duplicate rows; ALL is the default.
String Operations
We have the LIKE operator: % matches zero or more of any characters, _ matches any single character, and
an escape character can be specified with the ESCAPE keyword. With its default collation, MySQL is not
case sensitive even for LIKE. We can also concatenate with the || operator (CONCAT() in MySQL, where || is
logical OR by default), convert case with UPPER() and LOWER(), and get the length or extract a substring
with LENGTH() (LEN() in SQL Server) and SUBSTRING(), etc.
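A quick way to experiment with these string operations is Python's sqlite3 module. This is a sketch on a hypothetical course table; SQLite spells the substring function SUBSTR and, like MySQL's default, compares ASCII letters case-insensitively in LIKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (title TEXT)")
conn.executemany("INSERT INTO course VALUES (?)", [
    ("Database Systems",), ("Data Mining",), ("100% Effort",), ("Algorithms",),
])

def titles(where):
    """Return the titles matching a WHERE condition, sorted by title."""
    sql = "SELECT title FROM course WHERE " + where + " ORDER BY title"
    return [t for (t,) in conn.execute(sql)]

prefix = titles("title LIKE 'Data%'")                    # % = zero or more characters
single = titles("title LIKE 'Dat_ Mining'")              # _ = exactly one character
literal_percent = titles("title LIKE '%!%%' ESCAPE '!'") # !% = a literal percent sign
lowercase = titles("title LIKE 'data%'")                 # case-insensitive in SQLite

funcs = conn.execute(
    "SELECT 'Data' || 'base', UPPER('sql'), LENGTH('SQL'), SUBSTR('Database', 1, 4)"
).fetchone()
print(prefix, single, literal_percent, funcs)
```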
Clauses
WHERE filters individual tuples by a predicate.
HAVING applies to each group, while WHERE applies to each tuple before forming groups.
ORDER BY orders the tuples of the result by one or more attributes. You can specify ASC (default)
or DESC per attribute, like:
ORDER BY dept_name ASC, gpa DESC
GROUP BY is a clause to be used with aggregate functions. Result can only contain aggregate values
and/or the grouping attribute(s).
WITH is a clause to define a temporary relation, called a common table expression (CTE), which is
available only to the query in which it is defined.
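The clauses above can be seen working together in one query. A minimal sketch with Python's sqlite3 module, assuming a hypothetical student(name, dept_name, gpa) table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, dept_name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("Ann", "CS", 3.9), ("Bob", "CS", 3.1), ("Cho", "Math", 3.5),
    ("Dan", "Math", 2.0), ("Eve", "History", 3.8),
])

busy_depts = conn.execute("""
    WITH good_standing AS (                     -- CTE, visible to this query only
        SELECT * FROM student WHERE gpa >= 3.0  -- WHERE filters tuples before grouping
    )
    SELECT dept_name, COUNT(*) AS n
    FROM good_standing
    GROUP BY dept_name                          -- one output row per department
    HAVING COUNT(*) >= 2                        -- HAVING filters whole groups
    ORDER BY dept_name ASC
""").fetchall()

ordered = [name for (name,) in conn.execute(
    "SELECT name FROM student ORDER BY dept_name ASC, gpa DESC")]
print(busy_depts, ordered)
```

Dan is removed by WHERE before any group is formed, so Math has only one student left and is then removed by HAVING.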
Clause Predicates
BETWEEN operator like:
WHERE gpa BETWEEN 2.7 AND 3.7
Row constructor, like:
SELECT name, course_id
FROM instructor, teaches
WHERE (instructor.ID, dept_name) = (teaches.ID, 'History')
...honestly not seeing much point here.
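For what it is worth, the row constructor is shorthand for a conjunction of equalities, and BETWEEN includes both endpoints. A small sketch with Python's sqlite3 module on made-up data, assuming the bundled SQLite is 3.15 or newer (the first version with row values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT);
    CREATE TABLE teaches (ID TEXT, course_id TEXT);
    INSERT INTO instructor VALUES ('1', 'Kim', 'History'), ('2', 'Lee', 'CS');
    INSERT INTO teaches VALUES ('1', 'HIS-351'), ('2', 'CS-101');
""")

row_form = conn.execute("""
    SELECT name, course_id
    FROM instructor, teaches
    WHERE (instructor.ID, dept_name) = (teaches.ID, 'History')
""").fetchall()

and_form = conn.execute("""
    SELECT name, course_id
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID AND dept_name = 'History'
""").fetchall()

# BETWEEN is inclusive on both ends; SQLite returns 1 for true.
inclusive = conn.execute(
    "SELECT 2.7 BETWEEN 2.7 AND 3.7, 3.7 BETWEEN 2.7 AND 3.7").fetchone()
print(row_form, and_form, inclusive)
```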
Set Operations
UNION, INTERSECT, and EXCEPT (set difference) eliminate duplicates by default; add the ALL keyword
to keep them.
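The duplicate handling is easy to check with Python's sqlite3 module. This sketch uses hypothetical fall/spring tables of course IDs; note that SQLite only offers UNION ALL, not INTERSECT ALL or EXCEPT ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fall (course_id TEXT);
    CREATE TABLE spring (course_id TEXT);
    INSERT INTO fall VALUES ('CS-101'), ('CS-101'), ('CS-347');
    INSERT INTO spring VALUES ('CS-101'), ('CS-315');
""")

def ids(set_op):
    """Combine fall and spring with the given set operator, sorted by course_id."""
    sql = ("SELECT course_id FROM fall " + set_op +
           " SELECT course_id FROM spring ORDER BY course_id")
    return [c for (c,) in conn.execute(sql)]

union = ids("UNION")          # duplicates eliminated
union_all = ids("UNION ALL")  # duplicates kept
intersect = ids("INTERSECT")
except_ = ids("EXCEPT")       # in fall but not in spring
print(union, union_all, intersect, except_)
```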
Subqueries
In the following SQL query:
SELECT a1, a2, ..., an
FROM r1, r2, ..., rm
WHERE p
Each ai can be replaced by a subquery that generates a single value, aka scalar subquery
Each ri can be replaced by any valid subquery
p can be replaced with an expression of the form "attribute <operation> (subquery)"
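The three positions for a subquery can be tried out as follows. A minimal sketch with Python's sqlite3 module, on a hypothetical instructor(name, dept_name, salary) table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [
    ("Kim", "CS", 100000), ("Lee", "CS", 80000), ("Park", "Math", 60000),
])

# Scalar subquery in SELECT: number of colleagues in the same department.
in_select = conn.execute("""
    SELECT name,
           (SELECT COUNT(*) FROM instructor AS i2
            WHERE i2.dept_name = i1.dept_name)
    FROM instructor AS i1
    ORDER BY name
""").fetchall()

# Subquery in FROM: it behaves like a relation.
in_from = conn.execute("""
    SELECT dept_name, avg_salary
    FROM (SELECT dept_name, AVG(salary) AS avg_salary
          FROM instructor GROUP BY dept_name)
    WHERE avg_salary > 70000
""").fetchall()

# Subquery in WHERE: attribute <operation> (subquery).
in_where = conn.execute("""
    SELECT name FROM instructor
    WHERE salary > (SELECT AVG(salary) FROM instructor)
""").fetchall()
print(in_select, in_from, in_where)
```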
We update tuples like:
UPDATE instructor
SET salary = salary * 1.05
WHERE ...;
UPDATE instructor
SET salary = CASE
WHEN salary <= 100000 THEN salary * 1.05
ELSE salary * 1.03
END;
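The CASE form gives every row exactly one raise in a single statement, whereas two separate UPDATEs could raise a row twice if the first raise pushes it over the threshold. A minimal sketch with Python's sqlite3 module and made-up salaries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, salary REAL)")
conn.executemany("INSERT INTO instructor VALUES (?, ?)", [
    ("Kim", 90000), ("Lee", 100000), ("Park", 120000),
])

conn.execute("""
    UPDATE instructor
    SET salary = CASE
        WHEN salary <= 100000 THEN salary * 1.05
        ELSE salary * 1.03
    END
""")

# Each (name, salary) row becomes one dictionary entry.
salaries = dict(conn.execute("SELECT name, salary FROM instructor"))
print(salaries)
```

Lee sits exactly on the threshold and gets the 5% raise only, even though the new salary is above 100000.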
Chapter 4
Join
Joins are either inner or left/right/full outer, depending on how unmatched (dangling) tuples are
treated.
The join condition is given with ON predicate, NATURAL, or USING (attrs).
Inner join is the default and drops unmatched tuples. Outer join, in addition to the result of the inner
join, keeps tuples that have no match, padding the missing attributes with NULL. There are
LEFT/RIGHT/FULL OUTER JOIN to keep unmatched tuples from the left/right/both operand relations.
For an inner join, JOIN ... ON predicate gives the same result as putting the predicate in WHERE. For an
outer join the two differ: ON decides which tuples match, while WHERE filters the result afterwards.
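The difference between the join flavors shows up as soon as one tuple has no match. A minimal sketch with Python's sqlite3 module on hypothetical data, where Lee teaches nothing; it sticks to LEFT OUTER JOIN because RIGHT and FULL need SQLite 3.39 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID INTEGER, name TEXT);
    CREATE TABLE teaches (ID INTEGER, course_id TEXT);
    INSERT INTO instructor VALUES (1, 'Kim'), (2, 'Lee');
    INSERT INTO teaches VALUES (1, 'DB');
""")

def q(sql):
    return conn.execute(sql).fetchall()

inner = q("SELECT name, course_id FROM instructor JOIN teaches USING (ID)")
natural = q("SELECT name, course_id FROM instructor NATURAL JOIN teaches")
left = q("""SELECT name, course_id
            FROM instructor LEFT OUTER JOIN teaches ON instructor.ID = teaches.ID
            ORDER BY name""")

# Same extra condition, once in ON and once in WHERE.
on_filter = q("""SELECT name, course_id
                 FROM instructor LEFT OUTER JOIN teaches
                   ON instructor.ID = teaches.ID AND course_id = 'OS'
                 ORDER BY name""")
where_filter = q("""SELECT name, course_id
                    FROM instructor LEFT OUTER JOIN teaches
                      ON instructor.ID = teaches.ID
                    WHERE course_id = 'OS'""")
print(inner, left, on_filter, where_filter)
```

With the condition in ON, nobody matches, so every instructor is kept with a NULL course; with the condition in WHERE, those padded rows are filtered out and nothing remains.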
View
A view is a virtual relation: the defining query is stored, not its result.
CREATE VIEW viewname AS (query);
But can we update the stored relation(s) through a view? Most SQL implementations allow updates only
on simple views, where
FROM: only one relation
Then there is the materialized view, a view that is physically stored. How do we keep it up to date?
periodic reconstruction: unacceptable for applications requiring up-to-date data
incremental maintenance: only recompute the parts that are affected by changes to the underlying base tables
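That an ordinary view is virtual can be observed directly: it reflects base-table changes without any refresh. A minimal sketch with Python's sqlite3 module, using a hypothetical faculty view; SQLite happens to be stricter than most systems and rejects writes even to simple views, and it has no built-in materialized views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary REAL)")
conn.execute("INSERT INTO instructor VALUES ('Kim', 'CS', 100000)")

# The view hides salary; only its defining query is stored.
conn.execute("CREATE VIEW faculty AS SELECT name, dept_name FROM instructor")

before = conn.execute("SELECT COUNT(*) FROM faculty").fetchone()[0]
conn.execute("INSERT INTO instructor VALUES ('Lee', 'Math', 80000)")
after = conn.execute("SELECT COUNT(*) FROM faculty").fetchone()[0]

# Writing through the view fails in SQLite (other systems may allow it).
try:
    conn.execute("INSERT INTO faculty VALUES ('Park', 'CS')")
    view_insert_rejected = False
except sqlite3.Error:
    view_insert_rejected = True
print(before, after, view_insert_rejected)
```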
Transaction
A: atomicity
C: consistency
I: isolation
D: durability
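Atomicity is the easiest of the four to demonstrate. A minimal sketch with Python's sqlite3 module, assuming a hypothetical account table and transfer helper: the connection used as a context manager commits on success and rolls back on an exception, so a failed transfer leaves no partial update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
    name TEXT PRIMARY KEY,
    balance INTEGER CHECK (balance >= 0)  -- consistency rule: no overdraft
)""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer(conn, "A", "B", 30)
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500)  # debit violates the CHECK constraint
after_failed = balances(conn)
print(ok, after_ok, failed, after_failed)
```

In the failed transfer the credit to B has already run when the debit is rejected; the rollback undoes it, so both balances stay as they were.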
Integrity Constraints
On a single relation: NOT NULL, PRIMARY KEY, UNIQUE (= superkey), CHECK (predicate)
Referential integrity: e.g., if a row in the instructor table has the value "Biology" for attribute deptName,
then there must exist a row in the department table whose deptName is "Biology".
Foreign key constraint: every value in R.A must appear in the primary key of S; A is then called a foreign
key of R referencing S. NULL is allowed in A unless it is declared NOT NULL.
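These constraints can be exercised with Python's sqlite3 module. A minimal sketch on a made-up department/instructor schema; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which is specific to that engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE department (deptName TEXT PRIMARY KEY);
    CREATE TABLE instructor (
        ID       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        deptName TEXT REFERENCES department (deptName),
        salary   REAL CHECK (salary > 0)
    );
    INSERT INTO department VALUES ('Biology');
""")

def accepted(sql):
    """Run one INSERT; return False if an integrity constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

ok_fk = accepted("INSERT INTO instructor VALUES (1, 'Kim', 'Biology', 90000)")
bad_fk = accepted("INSERT INTO instructor VALUES (2, 'Lee', 'Physics', 90000)")
null_fk = accepted("INSERT INTO instructor VALUES (3, 'Park', NULL, 90000)")
bad_check = accepted("INSERT INTO instructor VALUES (4, 'Choi', 'Biology', -1)")
bad_not_null = accepted("INSERT INTO instructor VALUES (5, NULL, 'Biology', 1)")
print(ok_fk, bad_fk, null_fk, bad_check, bad_not_null)
```

The row with a NULL deptName is accepted: a foreign key may be NULL unless the attribute is also declared NOT NULL.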