Day1.1 DBMS
Introduction
Database: A collection of related data. A database management system (DBMS) should support its:
– Definition
– Construction
– Manipulation
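The three activities map directly onto SQL statements. The sketch below is minimal and assumes Python's built-in sqlite3 module, because these notes name no particular system. It uses the Beers relation that appears later in the notes.

```python
import sqlite3

# In-memory database; the choice of SQLite is an assumption for illustration.
conn = sqlite3.connect(":memory:")

# Definition: declare the schema of the data.
conn.execute("CREATE TABLE Beers (name TEXT PRIMARY KEY NOT NULL, manf TEXT)")

# Construction: store the data.
conn.executemany(
    "INSERT INTO Beers (name, manf) VALUES (?, ?)",
    [("Winterbrew", "Pete's"), ("Bud Lite", "Anheuser-Busch")],
)

# Manipulation: query (and, in general, update) the stored data.
rows = conn.execute(
    "SELECT name FROM Beers WHERE manf = ?", ("Anheuser-Busch",)
).fetchall()
print(rows)  # [('Bud Lite',)]
```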
Databases Everyday
You may not notice it, but databases
are behind almost everything you do on
the Web.
Google searches.
Queries at Amazon, eBay, etc.
And More…
Databases often have unique
concurrency-control problems
Many activities (transactions) run against the
database at all times.
The DBMS must not confuse these actions; e.g., two
withdrawals from the same account must
each debit the account.
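The two-withdrawal requirement can be illustrated with a toy in-memory model. This is a simplified sketch, not how a DBMS is built. A single Python `threading.Lock` stands in for the system's concurrency control, and real systems use more elaborate mechanisms. The `Account` class and the amounts are hypothetical.

```python
import threading

class Account:
    """Hypothetical toy account; the lock plays the role of concurrency control."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount: int) -> None:
        # The read and the write happen as one unit, so neither debit is lost.
        with self._lock:
            current = self.balance
            self.balance = current - amount

account = Account(100)
threads = [threading.Thread(target=account.withdraw, args=(30,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(account.balance)  # 40: both withdrawals debited the account
```

If the lock were removed, both threads could read 100 before either one writes. The final balance would then be 70, and one withdrawal would effectively be lost.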
What is a Data Model?
1. Mathematical representation of data.
Examples: relational model = tables;
semistructured model = trees/graphs.
2. Operations on data.
3. Constraints.
A Relation is a Table
Relation name: Beers
Attributes (column headers): name, manf
Tuples (rows):
name        manf
Winterbrew  Pete’s
Bud Lite    Anheuser-Busch
Schemas
Relation schema = relation name and
attribute list.
Optionally: types of attributes.
Example: Beers(name, manf) or
Beers(name: string, manf: string)
Database = collection of relations.
Database schema = set of all relation
schemas in the database.
Why Relations?
Very simple model.
Often matches how we think about
data.
Abstract model that underlies SQL, the
most important database language
today.
Our Running Example
Beers(name, manf)
Bars(name, addr, license)
Drinkers(name, addr, phone)
Likes(drinker, beer)
Sells(bar, beer, price)
Frequents(drinker, bar)
Underline = key (no two tuples can have
the same values in all key attributes).
This is an excellent example of a constraint.
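A DBMS enforces a key constraint by rejecting any tuple that would duplicate an existing tuple's values in all key attributes. The sketch below assumes SQLite via Python's sqlite3 module. It also assumes that {bar, beer} is the key of Sells, meaning a bar charges one price for a given beer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: {bar, beer} is the key of Sells.
conn.execute(
    "CREATE TABLE Sells (bar TEXT NOT NULL, beer TEXT NOT NULL, price REAL, "
    "PRIMARY KEY (bar, beer))"
)
conn.execute("INSERT INTO Sells VALUES (?, ?, ?)", ("Joe's", "Bud", 2.50))
conn.execute("INSERT INTO Sells VALUES (?, ?, ?)", ("Joe's", "Miller", 2.75))  # new key value: accepted

try:
    # Same values in all key attributes as an existing tuple: rejected.
    conn.execute("INSERT INTO Sells VALUES (?, ?, ?)", ("Joe's", "Bud", 3.00))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM Sells").fetchone()[0]
print(rejected, count)  # True 2
```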
PRIMARY KEY vs. UNIQUE
1. There can be only one PRIMARY KEY
for a relation, but several UNIQUE
attributes.
2. No attribute of a PRIMARY KEY can
ever be NULL in any tuple. But
attributes declared UNIQUE may have
NULLs, and there may be several
tuples with NULL.
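Point 2 can be checked against a real engine. This is a sketch assuming SQLite through Python's sqlite3 module, with a hypothetical Drinkers table and hypothetical data. One caveat: SQLite departs from the SQL standard here. It does not imply NOT NULL for most PRIMARY KEY columns, so the sketch declares NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: name is the PRIMARY KEY, phone is only UNIQUE.
# NOT NULL is written out because SQLite does not imply it for PRIMARY KEY.
conn.execute(
    "CREATE TABLE Drinkers (name TEXT PRIMARY KEY NOT NULL, addr TEXT, phone TEXT UNIQUE)"
)

# Several tuples may have NULL in the UNIQUE attribute.
conn.execute("INSERT INTO Drinkers VALUES (?, ?, ?)", ("Ann", None, None))
conn.execute("INSERT INTO Drinkers VALUES (?, ?, ?)", ("Bob", None, None))

# NULL in the PRIMARY KEY attribute is rejected.
try:
    conn.execute("INSERT INTO Drinkers VALUES (?, ?, ?)", (None, None, "555-0100"))
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True

null_phones = conn.execute(
    "SELECT COUNT(*) FROM Drinkers WHERE phone IS NULL"
).fetchone()[0]
print(null_key_rejected, null_phones)  # True 2
```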
Entity-Relationship Model
E/R Diagrams
Weak Entity Sets
Converting E/R Diagrams to Relations
Purpose of E/R Model
Framework for E/R
• Design is a serious business.
• The “boss” knows they want a database, but
they don’t know what they want in it.
• Sketching the key components is an efficient
way to develop a working database.
Entity Sets
E/R Diagrams
• In an entity-relationship diagram:
– Entity set = rectangle.
– Attribute = oval, with a line to the rectangle
representing its entity set.
Example: the entity set Beers (a rectangle) with attributes name and manf (ovals, each linked to the rectangle).
Relationships
• A relationship connects two or more entity
sets.
• It is represented by a diamond, with lines to
each of the entity sets involved.
Example: Relationships
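Entity sets: Bars (name, addr, license), Beers (name, manf), and Drinkers (name, addr).
Relationships: Sells connects Bars and Beers; Frequents connects Drinkers and Bars (drinkers frequent some bars); Likes connects Drinkers and Beers (drinkers like some beers).
Note: license = beer, full, none.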
Relationship Set
• The current “value” of an entity set is the set
of entities that belong to it.
– Example: the set of all bars in our database.
• The “value” of a relationship is a relationship
set, a set of tuples with one component for
each related entity set.
Example: Relationship Set
• For the relationship Sells, we might have a
relationship set like:
Bar        Beer
Joe’s Bar  Bud
Joe’s Bar  Miller
Sue’s Bar  Bud
Sue’s Bar  Pete’s Ale
Sue’s Bar  Bud Lite
Multiway Relationships
• Sometimes, we need a relationship that
connects more than two entity sets.
• Suppose that drinkers will only drink certain
beers at certain bars.
– Our three binary relationships Likes, Sells, and
Frequents do not allow us to make this distinction.
– But a 3-way relationship would.
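The reason the binary relationships fall short can be made concrete. The sketch below uses hypothetical data: Ann likes Bud and Miller and frequents Joe's and Sue's, but she drinks Bud only at Joe's and Miller only at Sue's. Sells uses the four (bar, beer) pairs from the Sells table in these notes. Any reconstruction built from Likes, Frequents, and Sells alone also admits combinations that Ann never chose.

```python
from itertools import product

# Hypothetical 3-way facts: (drinker, beer, bar).
preferences = {("Ann", "Bud", "Joe's"), ("Ann", "Miller", "Sue's")}

likes = {("Ann", "Bud"), ("Ann", "Miller")}        # (drinker, beer)
frequents = {("Ann", "Joe's"), ("Ann", "Sue's")}   # (drinker, bar)
sells = {("Joe's", "Bud"), ("Joe's", "Miller"),    # (bar, beer)
         ("Sue's", "Bud"), ("Sue's", "Miller")}

# Every (drinker, beer, bar) combination consistent with the three binaries.
reconstructed = {
    (d, b, r)
    for (d, b), (d2, r) in product(likes, frequents)
    if d == d2 and (r, b) in sells
}

print(len(reconstructed))                   # 4
print(sorted(reconstructed - preferences))  # the two spurious combinations
```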
Example: 3-Way Relationship
The 3-way relationship Preferences connects Bars (name, addr), Beers (name, manf), and Drinkers (name, addr).
Relational Algebra
Basic Operations
What is an “Algebra”
Mathematical system consisting of:
Operands --- variables or values from
which new values can be constructed.
Operators --- symbols denoting procedures
that construct new values from given
values.
What is Relational Algebra?
An algebra whose operands are
relations or variables that represent
relations.
Operators are designed to do the most
common things that we need to do with
relations in a database.
The result is an algebra that can be used
as a query language for relations.
Core Relational Algebra
Union, intersection, and difference.
Usual set operations, but both operands
must have the same relation schema.
Selection: picking certain rows.
Projection: picking certain columns.
Products and joins: compositions of
relations.
Renaming of relations and attributes.
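The set operations map directly onto Python sets once the same-schema requirement is checked. The sketch below makes one assumption for illustration: a relation is a (schema, rows) pair, with the schema a tuple of attribute names and the rows a frozenset of tuples. The relations R and S are hypothetical.

```python
def _same_schema(r, s):
    """Union, intersection, and difference require identical relation schemas."""
    if r[0] != s[0]:
        raise ValueError(f"schemas differ: {r[0]} vs {s[0]}")

def union(r, s):
    _same_schema(r, s)
    return (r[0], r[1] | s[1])

def intersection(r, s):
    _same_schema(r, s)
    return (r[0], r[1] & s[1])

def difference(r, s):
    _same_schema(r, s)
    return (r[0], r[1] - s[1])

# Hypothetical relations over the schema (A, B).
R = (("A", "B"), frozenset({(1, 2), (3, 4)}))
S = (("A", "B"), frozenset({(3, 4), (5, 6)}))

print(sorted(union(R, S)[1]))         # [(1, 2), (3, 4), (5, 6)]
print(sorted(intersection(R, S)[1]))  # [(3, 4)]
print(sorted(difference(R, S)[1]))    # [(1, 2)]
```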
Selection
R1 := σC (R2)
C is a condition (as in “if” statements) that
refers to attributes of R2.
R1 is all those tuples of R2 that satisfy C.
Example: Selection
Relation Sells:
bar beer price
Joe’s Bud 2.50
Joe’s Miller 2.75
Sue’s Bud 2.50
Sue’s Miller 3.00
JoeMenu := σbar=“Joe’s”(Sells):
bar beer price
Joe’s Bud 2.50
Joe’s Miller 2.75
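Selection is a filter over tuples. In the sketch below, Sells is assumed to be a list of dictionaries that map attribute names to values, and the condition C is a Python predicate. The result reproduces JoeMenu.

```python
sells = [
    {"bar": "Joe's", "beer": "Bud", "price": 2.50},
    {"bar": "Joe's", "beer": "Miller", "price": 2.75},
    {"bar": "Sue's", "beer": "Bud", "price": 2.50},
    {"bar": "Sue's", "beer": "Miller", "price": 3.00},
]

def select(relation, condition):
    """sigma_C(R): keep exactly the tuples of R that satisfy condition C."""
    return [t for t in relation if condition(t)]

joe_menu = select(sells, lambda t: t["bar"] == "Joe's")
for t in joe_menu:
    print(t["bar"], t["beer"], t["price"])
```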
Projection
R1 := πL (R2)
L is a list of attributes from the schema of
R2.
R1 is constructed by looking at each tuple
of R2, extracting the attributes on list L, in
the order specified, and creating from
those components a tuple for R1.
Eliminate duplicate tuples, if any.
Example: Projection
Relation Sells:
bar beer price
Joe’s Bud 2.50
Joe’s Miller 2.75
Sue’s Bud 2.50
Sue’s Miller 3.00
Prices := πbeer,price(Sells):
beer price
Bud 2.50
Miller 2.75
Miller 3.00
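The projection steps translate line for line: extract the listed attributes in order, then eliminate duplicates. The sketch below uses the same assumed list-of-dictionaries representation. It keeps the first occurrence of each output tuple, so (Bud, 2.50) appears once even though both Joe's and Sue's charge that price.

```python
sells = [
    {"bar": "Joe's", "beer": "Bud", "price": 2.50},
    {"bar": "Joe's", "beer": "Miller", "price": 2.75},
    {"bar": "Sue's", "beer": "Bud", "price": 2.50},
    {"bar": "Sue's", "beer": "Miller", "price": 3.00},
]

def project(relation, attrs):
    """pi_L(R): keep the attributes on list L, in order, and drop duplicates."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:  # duplicate elimination
            seen.add(row)
            result.append(row)
    return result

prices = project(sells, ["beer", "price"])
print(prices)  # [('Bud', 2.5), ('Miller', 2.75), ('Miller', 3.0)]
```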
Extended Projection
Using the same πL operator, we allow
the list L to contain arbitrary
expressions involving attributes:
1. Arithmetic on attributes, e.g., A+B->C.
2. Duplicate occurrences of the same
attribute.
Example: Extended Projection
R:
A  B
1  2
3  4
πA+B->C,A,A (R):
C  A1  A2
3  1   1
7  3   3
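Extended projection differs from ordinary projection only in allowing each list item to be an expression. The sketch below assumes each item is a pair of an output attribute name and a Python function of the input tuple. The two copies of A are named A1 and A2, as in the example above.

```python
def extended_project(relation, items):
    """Extended pi_L(R): items is a list of (output attribute, function of a tuple).

    Duplicates are dropped, matching the set semantics of ordinary projection.
    """
    schema = tuple(name for name, _ in items)
    rows, seen = [], set()
    for t in relation:
        row = tuple(f(t) for _, f in items)
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return schema, rows

R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]

# pi_{A+B->C, A, A}(R)
schema, rows = extended_project(R, [
    ("C", lambda t: t["A"] + t["B"]),
    ("A1", lambda t: t["A"]),
    ("A2", lambda t: t["A"]),
])
print(schema, rows)  # ('C', 'A1', 'A2') [(3, 1, 1), (7, 3, 3)]
```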
Product
R3 := R1 Χ R2
Pair each tuple t1 of R1 with each tuple t2 of
R2.
Concatenation t1t2 is a tuple of R3.
Schema of R3 is the attributes of R1 and then
R2, in order.
But beware attribute A of the same name in
R1 and R2: use R1.A and R2.A.
Example: R3 := R1 Χ R2
R1:
A  B
1  2
3  4
R2:
B  C
5  6
7  8
9  10
R3:
A  R1.B  R2.B  C
1  2     5     6
1  2     7     8
1  2     9     10
3  4     5     6
3  4     7     8
3  4     9     10
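The product rule can be written out directly. The sketch below assumes each relation is given as a (schema, rows) pair together with its name. An attribute that appears in both schemas is qualified as R1.B and R2.B, which reproduces R3 above.

```python
def cartesian_product(r_name, r, s_name, s):
    """R x S: pair every tuple of R with every tuple of S.

    r and s are (schema, rows) pairs; a name shared by both schemas
    is qualified with its relation name in the result schema.
    """
    (r_schema, r_rows), (s_schema, s_rows) = r, s
    shared = set(r_schema) & set(s_schema)
    schema = ([f"{r_name}.{a}" if a in shared else a for a in r_schema]
              + [f"{s_name}.{a}" if a in shared else a for a in s_schema])
    rows = [t1 + t2 for t1 in r_rows for t2 in s_rows]  # concatenation t1t2
    return schema, rows

R1 = (["A", "B"], [(1, 2), (3, 4)])
R2 = (["B", "C"], [(5, 6), (7, 8), (9, 10)])
schema, rows = cartesian_product("R1", R1, "R2", R2)
print(schema)     # ['A', 'R1.B', 'R2.B', 'C']
print(len(rows))  # 6
```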
Theta-Join
R3 := R1 ⋈C R2
Take the product R1 Χ R2.
Then apply σC to the result.
As for σ, C can be any boolean-valued
condition.
Historic versions of this operator allowed
only A θ B, where θ is =, <, etc.; hence
the name “theta-join.”
Example: Theta Join
Relation Sells(bar, beer, price):
Joe’s  Bud     2.50
Joe’s  Miller  2.75
Sue’s  Bud     2.50
Sue’s  Coors   3.00
Relation Bars(name, addr):
Joe’s  Maple St.
Sue’s  River Rd.
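A theta-join is a product followed by a selection, and the sketch below computes it exactly that way on the two relations above. It assumes the join condition is Sells.bar = Bars.name, which the layout of the example suggests. It also assumes the operands share no attribute names, because merged dictionaries would otherwise collide. Both bar and name survive in each result tuple, since a theta-join does not merge columns.

```python
sells = [
    {"bar": "Joe's", "beer": "Bud", "price": 2.50},
    {"bar": "Joe's", "beer": "Miller", "price": 2.75},
    {"bar": "Sue's", "beer": "Bud", "price": 2.50},
    {"bar": "Sue's", "beer": "Coors", "price": 3.00},
]
bars = [
    {"name": "Joe's", "addr": "Maple St."},
    {"name": "Sue's", "addr": "River Rd."},
]

def theta_join(r, s, condition):
    """R join_C S = sigma_C(R x S); assumes R and S share no attribute names."""
    return [{**t1, **t2} for t1 in r for t2 in s if condition(t1, t2)]

# Assumed condition C: Sells.bar = Bars.name
bar_info = theta_join(sells, bars, lambda t1, t2: t1["bar"] == t2["name"])
for row in bar_info:
    print(row["bar"], row["beer"], row["price"], row["name"], row["addr"])
```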
Example: Natural Join
Relation Sells(bar, beer, price):
Joe’s  Bud     2.50
Joe’s  Miller  2.75
Sue’s  Bud     2.50
Sue’s  Coors   3.00
Relation Bars(bar, addr):
Joe’s  Maple St.
Sue’s  River Rd.
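Here Bars names its first attribute bar, the same as Sells, so a natural join can pair tuples that agree on that shared attribute and keep a single copy of it. The sketch below uses the standard definition of natural join, which these notes do not spell out. It assumes the same list-of-dictionaries representation.

```python
sells = [
    {"bar": "Joe's", "beer": "Bud", "price": 2.50},
    {"bar": "Joe's", "beer": "Miller", "price": 2.75},
    {"bar": "Sue's", "beer": "Bud", "price": 2.50},
    {"bar": "Sue's", "beer": "Coors", "price": 3.00},
]
bars = [
    {"bar": "Joe's", "addr": "Maple St."},
    {"bar": "Sue's", "addr": "River Rd."},
]

def natural_join(r, s):
    """R join S: pair tuples that agree on every shared attribute,
    keeping one copy of each shared attribute in the result."""
    result = []
    for t1 in r:
        for t2 in s:
            shared = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in shared):
                result.append({**t1, **t2})
    return result

bar_info = natural_join(sells, bars)
for row in bar_info:
    print(row["bar"], row["beer"], row["price"], row["addr"])
```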
Design Theory for Relational Databases
Functional Dependencies
Decompositions
Normal Forms
Functional Dependencies
• X -> Y is an assertion about a relation R that
whenever two tuples of R agree on all the attributes
of X, they must also agree on all attributes in set
Y.
– Say “X ->Y holds in R.”
– Convention: …, X, Y, Z represent sets of attributes; A, B,
C,… represent single attributes.
– Convention: no set formers in sets of attributes, just ABC,
rather than {A,B,C }.
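An FD is an assertion about every legal instance of R, so a single instance can refute it but never prove it. The sketch below checks whether an FD is violated by one instance. The Drinkers rows are hypothetical data.

```python
def fd_holds(rows, lhs, rhs):
    """True if every pair of tuples that agrees on lhs also agrees on rhs."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # same X-values, different Y-values
            return False
    return True

# Hypothetical Drinkers(name, addr, beersLiked, manf, favBeer) instance.
rows = [
    {"name": "Ann", "addr": "Maple St.", "beersLiked": "Bud",
     "manf": "Anheuser-Busch", "favBeer": "Bud"},
    {"name": "Ann", "addr": "Maple St.", "beersLiked": "Winterbrew",
     "manf": "Pete's", "favBeer": "Bud"},
    {"name": "Bob", "addr": "River Rd.", "beersLiked": "Bud",
     "manf": "Anheuser-Busch", "favBeer": "Bud"},
]

print(fd_holds(rows, ["name"], ["addr", "favBeer"]))  # True
print(fd_holds(rows, ["beersLiked"], ["manf"]))       # True
print(fd_holds(rows, ["name"], ["beersLiked"]))       # False
```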
Splitting Right Sides of FD’s
Example: FD’s
Drinkers(name, addr, beersLiked, manf, favBeer)
• Reasonable FD’s to assert:
1. name -> addr favBeer
Note this FD is the same as name -> addr and name ->
favBeer.
2. beersLiked -> manf
Example: Possible Data
Keys of Relations
• K is a superkey for relation R if K
functionally determines all of R.
• K is a key for R if K is a superkey, but no
proper subset of K is a superkey.
Example: Superkey
Drinkers(name, addr, beersLiked, manf,
favBeer)
• {name, beersLiked} is a superkey because
together these attributes determine all the
other attributes.
– name -> addr favBeer
– beersLiked -> manf
Example: Key
• {name, beersLiked} is a key because neither
{name} nor {beersLiked} is a superkey.
– name doesn’t -> manf; beersLiked doesn’t -> addr.
• There are no other keys, but lots of superkeys.
– Any superset of {name, beersLiked}.
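The superkey and key arguments above can be mechanized with the standard attribute-closure algorithm, which these slides do not cover. K is a superkey exactly when its closure under the FDs contains every attribute. The sketch below enumerates candidate sets by increasing size and skips supersets of keys already found, so every set it reports is minimal.

```python
from itertools import combinations

ATTRS = {"name", "addr", "beersLiked", "manf", "favBeer"}
FDS = [({"name"}, {"addr", "favBeer"}), ({"beersLiked"}, {"manf"})]

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(k, attrs, fds):
    return closure(k, fds) >= attrs

def keys(attrs, fds):
    """Minimal superkeys, found by increasing size."""
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            k = set(combo)
            if is_superkey(k, attrs, fds) and not any(f <= k for f in found):
                found.append(k)
    return found

print(is_superkey({"name"}, ATTRS, FDS))        # False: name doesn't -> manf
print(is_superkey({"beersLiked"}, ATTRS, FDS))  # False: beersLiked doesn't -> addr
print(keys(ATTRS, FDS))                         # one key: {name, beersLiked}
```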
Relational Schema Design
• Goal of relational schema design is to avoid
anomalies and redundancy.
– Update anomaly : one occurrence of a fact is
changed, but not all occurrences.
– Deletion anomaly : valid fact is lost when a tuple is
deleted.
Example of Bad Design
This Bad Design Also Exhibits Anomalies
Boyce-Codd Normal Form
• We say a relation R is in BCNF if whenever
X -> Y is a nontrivial FD that holds in R, X is a
superkey.
– Remember: nontrivial means Y is not contained
in X.
– Remember, a superkey is any superset of a key
(not necessarily a proper superset).
Example
Drinkers(name, addr, beersLiked, manf, favBeer)
FD’s: name->addr favBeer, beersLiked->manf
• Only key is {name, beersLiked}.
• In each FD, the left side is not a superkey.
• Any one of these FD’s shows Drinkers is not in
BCNF.
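The BCNF test can be checked mechanically for the given FDs. The check below flags every FD that is nontrivial and whose left side is not a superkey. It uses the attribute-closure algorithm, which is not covered on these slides. The sketch only examines the FDs as listed. That is enough to test R itself, but testing a decomposed relation would require projecting the FDs first.

```python
ATTRS = {"name", "addr", "beersLiked", "manf", "favBeer"}
FDS = [({"name"}, {"addr", "favBeer"}), ({"beersLiked"}, {"manf"})]

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Given FDs X -> Y that are nontrivial (Y not contained in X) with X not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= attrs]

violations = bcnf_violations(ATTRS, FDS)
print(len(violations))  # 2: each FD alone shows Drinkers is not in BCNF
```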
Third Normal Form -- Motivation
• There is one structure of FD’s that causes
trouble when we decompose.
• AB ->C and C ->B.
– Example: A = street address, B = city, C = zip
code.
• There are two keys, {A, B} and {A, C}.
• C ->B is a BCNF violation, so we must
decompose into AC, BC.
We Cannot Enforce FD’s
• The problem is that if we use AC and BC as
our database schema, we cannot enforce the
FD AB ->C by checking FD’s in these
decomposed relations.
• Example with A = street, B = city, and C = zip
on the next slide.
An Unenforceable FD
3NF Lets Us Avoid This Problem
• 3rd Normal Form (3NF) modifies the BCNF
condition so we do not have to decompose
in this problem situation.
• An attribute is prime if it is a member of any
key.
• X ->A violates 3NF if and only if X is not a
superkey, and also A is not prime.
Example: 3NF
• In our problem situation with FD’s AB ->C
and C ->B, we have keys AB and AC.
• Thus A, B, and C are each prime.
• Although C ->B violates BCNF, it does not
violate 3NF.
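The 3NF and BCNF tests can be run side by side on AB -> C and C -> B, the street/city/zip situation. The sketch below finds the keys with the attribute-closure algorithm, which these slides do not cover, and then treats every attribute of any key as prime. It applies the 3NF rule only to nontrivial FDs.

```python
from itertools import combinations

ATTRS = {"A", "B", "C"}  # A = street address, B = city, C = zip code
FDS = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(k, attrs, fds):
    return closure(k, fds) >= attrs

def keys(attrs, fds):
    """Minimal superkeys, found by increasing size."""
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            k = set(combo)
            if is_superkey(k, attrs, fds) and not any(f <= k for f in found):
                found.append(k)
    return found

def violations_3nf(attrs, fds):
    """X -> A violates 3NF iff X is not a superkey and A is not prime."""
    prime = set().union(*keys(attrs, fds))
    return [(lhs, a) for lhs, rhs in fds for a in sorted(rhs - lhs)
            if not is_superkey(lhs, attrs, fds) and a not in prime]

def violations_bcnf(attrs, fds):
    """Nontrivial X -> Y where X is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not is_superkey(lhs, attrs, fds)]

print(keys(ATTRS, FDS))             # keys AB and AC
print(violations_bcnf(ATTRS, FDS))  # C -> B violates BCNF
print(violations_3nf(ATTRS, FDS))   # []: every attribute is prime
```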
What 3NF and BCNF Give You
3NF and BCNF -- Continued
Why It Works
• Preserves dependencies: each FD from a
minimal basis is contained in a relation,
thus preserved.
• Lossless Join: use the chase to show that
the row for the relation that contains a key
can be made all-unsubscripted variables.
• 3NF: hard part – a property of minimal
bases.