
Day1.1 DBMS

SEBI DBMS notes

Uploaded by

Rishav Dhama

Module 1: Database Concepts

1
Introduction
Database: A collection of related data. It should support
– Definition
– Construction
– Manipulation

Database Management System: A collection of
programs that enable the users to create and
maintain a database.
Databases
It used to be about boring stuff:
employee records, bank records, etc.
Today, the field covers all the largest
sources of data, with many new ideas.
Web search.
Data mining.
Scientific and medical databases.
Integrating information.
3
More Interesting Stuff
Database programming centers around
limited programming languages.
Leads to very succinct programming, but
also to unique query-optimization problems

4
Databases Everyday
You may not notice it, but databases
are behind almost everything you do on
the Web.
Google searches.
Queries at Amazon, eBay, etc.

5
And More…
Databases often have unique
concurrency-control problems
Many activities (transactions) at the
database at all times.
Must not confuse actions, e.g., two
withdrawals from the same account must
each debit the account.
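
A minimal Python sketch of the two-withdrawal case, assuming a hypothetical in-memory Account class (not part of the slides). The lock makes each read-modify-write of the balance atomic, which is one simple way to ensure that both withdrawals are debited; real database systems use more elaborate concurrency control.

```python
import threading


class Account:
    """Hypothetical in-memory account, used only for illustration."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # Holding the lock makes the check and the debit one atomic step,
        # so two concurrent withdrawals cannot both read the same old balance.
        with self._lock:
            if self.balance < amount:
                return False
            self.balance -= amount
            return True


def run_two_withdrawals():
    """Run two concurrent withdrawals of 30 against a balance of 100."""
    account = Account(100)
    threads = [
        threading.Thread(target=account.withdraw, args=(30,)) for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance
```

With the lock in place, the final balance reflects both debits.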

6
What is a Data Model?
1. Mathematical representation of data.
Examples: relational model = tables;
semistructured model = trees/graphs.
2. Operations on data.
3. Constraints.

7
A Relation is a Table
Relation name: Beers
Attributes (column headers): name, manf
Tuples (rows):

name        manf
Winterbrew  Pete’s
Bud Lite    Anheuser-Busch
8
Schemas
Relation schema = relation name and
attribute list.
Optionally: types of attributes.
Example: Beers(name, manf) or
Beers(name: string, manf: string)
Database = collection of relations.
Database schema = set of all relation
schemas in the database.

9
Why Relations?
Very simple model.
Often matches how we think about
data.
Abstract model that underlies SQL, the
most important database language
today.

10
Our Running Example
Beers(name, manf)
Bars(name, addr, license)
Drinkers(name, addr, phone)
Likes(drinker, beer)
Sells(bar, beer, price)
Frequents(drinker, bar)
Underline = key (tuples cannot have
the same value in all key attributes).
Excellent example of a constraint.
11
PRIMARY KEY vs. UNIQUE
1. There can be only one PRIMARY KEY
for a relation, but several UNIQUE
attributes.
2. No attribute of a PRIMARY KEY can
ever be NULL in any tuple. But
attributes declared UNIQUE may have
NULL’s, and there may be several
tuples with NULL.
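
The two rules can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: declaring Drinkers.phone as UNIQUE is a hypothetical choice made for illustration, and NOT NULL is written out explicitly because SQLite, unlike the SQL standard, does not imply it for a non-INTEGER PRIMARY KEY column.

```python
import sqlite3


def demo_primary_key_vs_unique():
    conn = sqlite3.connect(":memory:")
    # Assumption: phone is declared UNIQUE purely for illustration.
    conn.execute(
        "CREATE TABLE Drinkers ("
        " name TEXT PRIMARY KEY NOT NULL,"
        " addr TEXT,"
        " phone TEXT UNIQUE)"
    )
    # Two tuples with NULL in the UNIQUE column are both accepted.
    conn.execute("INSERT INTO Drinkers VALUES ('Janeway', 'Voyager', NULL)")
    conn.execute("INSERT INTO Drinkers VALUES ('Spock', 'Enterprise', NULL)")

    rejected = []
    attempts = [
        ("null key", (None, "Defiant", "555-0100")),
        ("duplicate key", ("Janeway", "Defiant", "555-0101")),
    ]
    for label, row in attempts:
        try:
            conn.execute("INSERT INTO Drinkers VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            # The key constraint rejected this tuple.
            rejected.append(label)

    count = conn.execute("SELECT COUNT(*) FROM Drinkers").fetchone()[0]
    conn.close()
    return count, rejected


row_count, rejected_inserts = demo_primary_key_vs_unique()
```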
12
Entity-Relationship Model

E/R Diagrams
Weak Entity Sets
Converting E/R Diagrams to Relations

13
Purpose of E/R Model

• The E/R model allows us to sketch database schema designs.
– Includes some constraints, but not operations.
• Designs are pictures called entity-relationship diagrams.
• Later: convert E/R designs to relational DB
designs.

14
Framework for E/R
• Design is a serious business.
• The “boss” knows they want a database, but
they don’t know what they want in it.
• Sketching the key components is an efficient
way to develop a working database.

15
Entity Sets

• Entity = “thing” or object.


• Entity set = collection of similar entities.
– Similar to a class in object-oriented languages.
• Attribute = property of (the entities of) an
entity set.
– Attributes are simple values, e.g. integers or
character strings, not structs, sets, etc.

16
E/R Diagrams
• In an entity-relationship diagram:
– Entity set = rectangle.
– Attribute = oval, with a line to the rectangle
representing its entity set.

17
Example:
[Diagram: entity set Beers (rectangle) with attributes name and manf (ovals).]

• Entity set Beers has two attributes, name and manf (manufacturer).
• Each Beers entity has values for these two attributes,
e.g. (Bud, Anheuser-Busch)

18
Relationships
• A relationship connects two or more entity
sets.
• It is represented by a diamond, with lines to
each of the entity sets involved.

19
Example: Relationships
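[Diagram: entity sets Bars (attributes name, addr, license), Beers (name, manf), and Drinkers (name, addr), connected by the relationships Sells (Bars, Beers), Likes (Drinkers, Beers), and Frequents (Drinkers, Bars).]
• Bars sell some beers.
• Drinkers like some beers.
• Drinkers frequent some bars.
• Note: license = beer, full, none.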

20
Relationship Set
• The current “value” of an entity set is the set
of entities that belong to it.
– Example: the set of all bars in our database.
• The “value” of a relationship is a relationship
set, a set of tuples with one component for
each related entity set.

21
Example: Relationship Set
• For the relationship Sells, we might have a
relationship set like:

Bar Beer
Joe’s Bar Bud
Joe’s Bar Miller
Sue’s Bar Bud
Sue’s Bar Pete’s Ale
Sue’s Bar Bud Lite

22
Multiway Relationships
• Sometimes, we need a relationship that
connects more than two entity sets.
• Suppose that drinkers will only drink certain
beers at certain bars.
– Our three binary relationships Likes, Sells, and
Frequents do not allow us to make this distinction.
– But a 3-way relationship would.
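
A small Python sketch of why the binary relationships are not enough. The drinkers and preferences below are hypothetical sample data. Rebuilding (drinker, bar, beer) triples from Likes, Sells, and Frequents alone produces a combination that was never in the 3-way relationship set.

```python
# Hypothetical 3-way relationship set: (drinker, bar, beer).
preferences = {
    ("Ann", "Joe's Bar", "Bud"),
    ("Ann", "Sue's Bar", "Miller"),
    ("Bob", "Joe's Bar", "Miller"),
}

# The binary relationship sets implied by those preferences.
likes = {(drinker, beer) for drinker, _, beer in preferences}
frequents = {(drinker, bar) for drinker, bar, _ in preferences}
sells = {(bar, beer) for _, bar, beer in preferences}

# Every triple that is consistent with the three binary relationships.
rebuilt = {
    (drinker, bar, beer)
    for drinker, bar in frequents
    for drinker2, beer in likes
    if drinker == drinker2 and (bar, beer) in sells
}

# Triples the binary relationships cannot rule out, although they are
# not actual preferences.
spurious = rebuilt - preferences
```

Here Ann frequents Joe's Bar, likes Miller, and Joe's Bar sells Miller, so the binary view suggests a triple that the 3-way relationship does not contain.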

23
Example: 3-Way Relationship
[Diagram: the 3-way relationship Preferences connects Bars (name, addr, license), Beers (name, manf), and Drinkers (name, addr).]
24
Relational Algebra

Basic Operations

25
What is an “Algebra”
Mathematical system consisting of:
Operands --- variables or values from
which new values can be constructed.
Operators --- symbols denoting procedures
that construct new values from given
values.

26
What is Relational Algebra?
An algebra whose operands are
relations or variables that represent
relations.
Operators are designed to do the most
common things that we need to do with
relations in a database.
The result is an algebra that can be used
as a query language for relations.
27
Core Relational Algebra
Union, intersection, and difference.
Usual set operations, but both operands
must have the same relation schema.
Selection: picking certain rows.
Projection: picking certain columns.
Products and joins: compositions of
relations.
Renaming of relations and attributes.
28
Selection
R1 := σC (R2)
C is a condition (as in “if” statements) that
refers to attributes of R2.
R1 is all those tuples of R2 that satisfy C.

29
Example: Selection
Relation Sells:
bar beer price
Joe’s Bud 2.50
Joe’s Miller 2.75
Sue’s Bud 2.50
Sue’s Miller 3.00

JoeMenu := σbar=“Joe’s”(Sells):
bar beer price
Joe’s Bud 2.50
Joe’s Miller 2.75
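
The selection above can be sketched in Python, assuming a relation is represented as a list of dicts keyed by attribute name (a representation chosen here for illustration, as is the helper name select).

```python
sells = [
    {"bar": "Joe's", "beer": "Bud", "price": 2.50},
    {"bar": "Joe's", "beer": "Miller", "price": 2.75},
    {"bar": "Sue's", "beer": "Bud", "price": 2.50},
    {"bar": "Sue's", "beer": "Miller", "price": 3.00},
]


def select(relation, condition):
    """Selection: keep the tuples of `relation` that satisfy `condition`."""
    return [t for t in relation if condition(t)]


joe_menu = select(sells, lambda t: t["bar"] == "Joe's")
```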
30
Projection
R1 := πL (R2)
L is a list of attributes from the schema of
R2.
R1 is constructed by looking at each tuple
of R2, extracting the attributes on list L, in
the order specified, and creating from
those components a tuple for R1.
Eliminate duplicate tuples, if any.

31
Example: Projection
Relation Sells:
bar beer price
Joe’s Bud 2.50
Joe’s Miller 2.75
Sue’s Bud 2.50
Sue’s Miller 3.00

Prices := πbeer,price(Sells):
beer price
Bud 2.50
Miller 2.75
Miller 3.00
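
A Python sketch of the same projection, again assuming relations are lists of dicts (the helper name project is hypothetical). Duplicate elimination is the step that turns four Sells tuples into three Prices tuples.

```python
sells = [
    {"bar": "Joe's", "beer": "Bud", "price": 2.50},
    {"bar": "Joe's", "beer": "Miller", "price": 2.75},
    {"bar": "Sue's", "beer": "Bud", "price": 2.50},
    {"bar": "Sue's", "beer": "Miller", "price": 3.00},
]


def project(relation, attrs):
    """Projection: extract `attrs` in the given order, dropping duplicates."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:  # eliminate duplicate tuples
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result


prices = project(sells, ["beer", "price"])
```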
32
Extended Projection
Using the same πL operator, we allow
the list L to contain arbitrary
expressions involving attributes:
1. Arithmetic on attributes, e.g., A+B->C.
2. Duplicate occurrences of the same
attribute.

33
Example: Extended Projection
R = ( A  B )
      1  2
      3  4

πA+B->C,A,A (R) = ( C  A1  A2 )
                    3  1   1
                    7  3   3
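
A Python sketch of this extended projection, assuming each output column is given as a (name, expression) pair. The helper name extended_project and that representation are assumptions for illustration.

```python
r = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]


def extended_project(relation, columns):
    """`columns` is a list of (output_name, function_of_tuple) pairs."""
    rows = []
    for t in relation:
        row = {name: expr(t) for name, expr in columns}
        if row not in rows:  # eliminate duplicate tuples
            rows.append(row)
    return rows


result = extended_project(
    r,
    [
        ("C", lambda t: t["A"] + t["B"]),  # arithmetic: A+B->C
        ("A1", lambda t: t["A"]),          # first copy of A
        ("A2", lambda t: t["A"]),          # second copy of A
    ],
)
```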

34
Product
R3 := R1 × R2
Pair each tuple t1 of R1 with each tuple t2 of
R2.
Concatenation t1t2 is a tuple of R3.
Schema of R3 is the attributes of R1 and then
R2, in order.
But beware attribute A of the same name in
R1 and R2: use R1.A and R2.A.

35
Example: R3 := R1 × R2
R1( A, B )
    1  2
    3  4

R2( B, C )
    5  6
    7  8
    9  10

R3( A, R1.B, R2.B, C )
    1  2     5     6
    1  2     7     8
    1  2     9     10
    3  4     5     6
    3  4     7     8
    3  4     9     10
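
The product example can be reproduced in a few lines of Python, assuming tuples are plain Python tuples and the schema is tracked separately (an illustrative representation, not one given on the slides).

```python
from itertools import product as cartesian

r1 = [(1, 2), (3, 4)]            # R1(A, B)
r2 = [(5, 6), (7, 8), (9, 10)]   # R2(B, C)

# Pair each tuple of R1 with each tuple of R2 and concatenate them.
r3 = [t1 + t2 for t1, t2 in cartesian(r1, r2)]

# B occurs in both operands, so the result schema qualifies it.
r3_schema = ("A", "R1.B", "R2.B", "C")
```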

36
Theta-Join
R3 := R1 ⋈C R2
Take the product R1 × R2.
Then apply σC to the result.
As for σ, C can be any boolean-valued
condition.
Historic versions of this operator allowed
only A θ B, where θ is =, <, etc.; hence
the name “theta-join.”
37
Example: Theta Join
Sells( bar, beer, price )
Joe’s Bud 2.50
Joe’s Miller 2.75
Sue’s Bud 2.50
Sue’s Coors 3.00

Bars( name, addr )
Joe’s Maple St.
Sue’s River Rd.

BarInfo := Sells ⋈Sells.bar = Bars.name Bars


BarInfo( bar, beer, price, name, addr )
Joe’s Bud 2.50 Joe’s Maple St.
Joe’s Miller 2.75 Joe’s Maple St.
Sue’s Bud 2.50 Sue’s River Rd.
Sue’s Coors 3.00 Sue’s River Rd.
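
A Python sketch of this theta-join as "product, then selection", assuming tuples are plain Python tuples with positions matching the schemas above. The helper name theta_join is hypothetical.

```python
sells = [  # Sells(bar, beer, price)
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Coors", 3.00),
]
bars = [  # Bars(name, addr)
    ("Joe's", "Maple St."),
    ("Sue's", "River Rd."),
]


def theta_join(r1, r2, condition):
    """Pair every tuple of r1 with every tuple of r2, keep pairs satisfying C."""
    return [t1 + t2 for t1 in r1 for t2 in r2 if condition(t1, t2)]


# Condition C: Sells.bar = Bars.name (position 0 in both relations).
bar_info = theta_join(sells, bars, lambda s, b: s[0] == b[0])
```

Note that both bar and name survive in the result, as in BarInfo above; the theta-join does not drop either column.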
38
Natural Join
A useful join variant (natural join)
connects two relations by:
Equating attributes of the same name, and
Projecting out one copy of each pair of
equated attributes.
Denoted R3 := R1 ⋈ R2.

39
Example: Natural Join
Sells( bar, beer, price )
Joe’s Bud 2.50
Joe’s Miller 2.75
Sue’s Bud 2.50
Sue’s Coors 3.00

Bars( bar, addr )
Joe’s Maple St.
Sue’s River Rd.

BarInfo := Sells ⋈ Bars


Note: Bars.name has become Bars.bar to make the natural
join “work.”
BarInfo( bar, beer, price, addr )
Joe’s Bud 2.50 Maple St.
Joe’s Miller 2.75 Maple St.
Sue’s Bud 2.50 River Rd.
Sue’s Coors 3.00 River Rd.
40
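
A Python sketch of the natural join, assuming relations are non-empty lists of dicts keyed by attribute name (the helper name natural_join is hypothetical). Merging the two dicts keeps a single copy of each equated attribute.

```python
sells = [
    {"bar": "Joe's", "beer": "Bud", "price": 2.50},
    {"bar": "Joe's", "beer": "Miller", "price": 2.75},
    {"bar": "Sue's", "beer": "Bud", "price": 2.50},
    {"bar": "Sue's", "beer": "Coors", "price": 3.00},
]
bars = [
    {"bar": "Joe's", "addr": "Maple St."},
    {"bar": "Sue's", "addr": "River Rd."},
]


def natural_join(r1, r2):
    """Equate attributes of the same name and keep one copy of each."""
    if not r1 or not r2:
        return []
    shared = [a for a in r1[0] if a in r2[0]]
    result = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in shared):
                # Shared attributes have equal values, so merging keeps one copy.
                result.append({**t1, **t2})
    return result


bar_info = natural_join(sells, bars)
```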
Sequences of Assignments
Create temporary relation names.
Renaming can be implied by giving
relations a list of attributes.
Example: R3 := R1 ⋈C R2 can be
written:
R4 := R1 × R2
R3 := σC (R4)
41
Expressions in a Single Assignment
Example: the theta-join R3 := R1 ⋈C R2
can be written: R3 := σC (R1 × R2)
Precedence of relational operators:
1. [σ, π, ρ] (highest).
2. [×, ⋈].
3. ∩.
4. [∪, —]

42
Design Theory for Relational
Databases
Functional Dependencies
Decompositions
Normal Forms

43
Functional Dependencies
• X ->Y is an assertion about a relation R that
whenever two tuples of R agree on all the attributes
of X, then they must also agree on all attributes in set
Y.
– Say “X ->Y holds in R.”
– Convention: …, X, Y, Z represent sets of attributes; A, B,
C,… represent single attributes.
– Convention: no set formers in sets of attributes, just ABC,
rather than {A,B,C }.

44
Splitting Right Sides of FD’s

• X->A1A2…An holds for R exactly when each
of X->A1, X->A2,…, X->An holds for R.
• Example: A->BC is equivalent to A->B and
A->C.
• There is no splitting rule for left sides.
• We’ll generally express FD’s with singleton
right sides.

45
Example: FD’s
Drinkers(name, addr, beersLiked, manf, favBeer)
• Reasonable FD’s to assert:
1. name -> addr favBeer
Note this FD is the same as name -> addr and name ->
favBeer.
2. beersLiked -> manf

46
Example: Possible Data

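name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

Because name -> addr, both Janeway tuples have addr Voyager.
Because name -> favBeer, both Janeway tuples have favBeer WickedAle.
Because beersLiked -> manf, both Bud tuples have manf A.B.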
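
Whether a given instance satisfies an FD can be checked mechanically. The sketch below assumes relations are lists of dicts, and the helper name fd_holds is hypothetical. Note that a check like this can only refute an FD on the data at hand. An FD is an assertion about every legal instance of the relation, not just this one.

```python
drinkers = [
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "Bud",
     "manf": "A.B.", "favBeer": "WickedAle"},
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "WickedAle",
     "manf": "Pete's", "favBeer": "WickedAle"},
    {"name": "Spock", "addr": "Enterprise", "beersLiked": "Bud",
     "manf": "A.B.", "favBeer": "Bud"},
]


def fd_holds(relation, lhs, rhs):
    """True if no two tuples agree on `lhs` but differ on `rhs`."""
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


name_fd = fd_holds(drinkers, ["name"], ["addr", "favBeer"])
beer_fd = fd_holds(drinkers, ["beersLiked"], ["manf"])
addr_fd = fd_holds(drinkers, ["addr"], ["beersLiked"])  # not an asserted FD
```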

47
Keys of Relations
• K is a superkey for relation R if K
functionally determines all of R.
• K is a key for R if K is a superkey, but no
proper subset of K is a superkey.

48
Example: Superkey
Drinkers(name, addr, beersLiked, manf,
favBeer)
• {name, beersLiked} is a superkey because
together these attributes determine all the
other attributes.
– name -> addr favBeer
– beersLiked -> manf

49
Example: Key
• {name, beersLiked} is a key because neither
{name} nor {beersLiked} is a superkey.
– name doesn’t -> manf; beersLiked doesn’t -> addr.
• There are no other keys, but lots of superkeys.
– Any superset of {name, beersLiked}.
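
The superkey and key claims can be verified with attribute closure, a standard technique that is not shown on these slides. The sketch below is a brute-force illustration with hypothetical helper names, not an efficient key-finding algorithm.

```python
from itertools import combinations

ATTRS = {"name", "addr", "beersLiked", "manf", "favBeer"}
FDS = [
    ({"name"}, {"addr", "favBeer"}),
    ({"beersLiked"}, {"manf"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs):
    return closure(attrs, FDS) == ATTRS


def find_keys():
    """Try attribute sets from smallest to largest; keep minimal superkeys."""
    found = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(sorted(ATTRS), size):
            candidate = set(combo)
            contains_key = any(k <= candidate for k in found)
            if is_superkey(candidate) and not contains_key:
                found.append(candidate)
    return found


keys = find_keys()
```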

50
Relational Schema Design
• Goal of relational schema design is to avoid
anomalies and redundancy.
– Update anomaly : one occurrence of a fact is
changed, but not all occurrences.
– Deletion anomaly : valid fact is lost when a tuple is
deleted.

51
Example of Bad Design

Drinkers(name, addr, beersLiked, manf, favBeer)

name addr beersLiked manf favBeer


Janeway Voyager Bud A.B. WickedAle
Janeway ??? WickedAle Pete’s ???
Spock Enterprise Bud ??? Bud

Data is redundant, because each of the ???’s can be figured
out by using the FD’s name -> addr favBeer and
beersLiked -> manf.

52
This Bad Design Also
Exhibits Anomalies

name addr beersLiked manf favBeer


Janeway Voyager Bud A.B. WickedAle
Janeway Voyager WickedAle Pete’s WickedAle
Spock Enterprise Bud A.B. Bud

• Update anomaly: if Janeway is transferred to Intrepid,
will we remember to change each of her tuples?
• Deletion anomaly: If nobody likes Bud, we lose track
of the fact that Anheuser-Busch manufactures Bud.

53
Boyce-Codd Normal Form
• We say a relation R is in BCNF if whenever X ->Y
is a nontrivial FD that holds in R, X is a
superkey.
– Remember: nontrivial means Y is not contained
in X.
– Remember, a superkey is any superset of a key
(not necessarily a proper superset).

54
Example
Drinkers(name, addr, beersLiked, manf, favBeer)
FD’s: name->addr favBeer, beersLiked->manf
• Only key is {name, beersLiked}.
• In each FD, the left side is not a superkey.
• Any one of these FD’s shows Drinkers is not in
BCNF
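
A Python sketch of this check, assuming FD's are represented as (left side, right side) pairs of attribute sets and using attribute closure to test for a superkey. The helper names are hypothetical, and the function examines only the FD's listed for the relation.

```python
ATTRS = {"name", "addr", "beersLiked", "manf", "favBeer"}
FDS = [
    ({"name"}, {"addr", "favBeer"}),
    ({"beersLiked"}, {"manf"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Listed FD's that are nontrivial and whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        is_superkey = closure(lhs, fds) == set(all_attrs)
        if nontrivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


violations = bcnf_violations(ATTRS, FDS)
```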

55
Third Normal Form -- Motivation
• There is one structure of FD’s that causes
trouble when we decompose.
• AB ->C and C ->B.
– Example: A = street address, B = city, C = zip
code.
• There are two keys, {A,B } and {A,C }.
• C ->B is a BCNF violation, so we must
decompose into AC, BC.
56
We Cannot Enforce FD’s
• The problem is that if we use AC and BC as
our database schema, we cannot enforce the
FD AB ->C by checking FD’s in these
decomposed relations.
• Example with A = street, B = city, and C = zip
on the next slide.

57
An Unenforceable FD

street        zip
545 Tech Sq.  02138
545 Tech Sq.  02139

city       zip
Cambridge  02138
Cambridge  02139

Join tuples with equal zip codes.

street        city       zip
545 Tech Sq.  Cambridge  02138
545 Tech Sq.  Cambridge  02139

Although no FD’s were violated in the decomposed relations,
FD street city -> zip is violated by the database as a whole.
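
The same situation can be replayed in Python, assuming tuples are plain Python tuples and FD's are checked by column position (the helper name fd_holds is hypothetical). The FD zip -> city holds in the city-zip relation, yet the join violates street city -> zip.

```python
street_zip = [("545 Tech Sq.", "02138"), ("545 Tech Sq.", "02139")]
city_zip = [("Cambridge", "02138"), ("Cambridge", "02139")]


def fd_holds(rows, lhs, rhs):
    """`lhs` and `rhs` are lists of column positions."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# zip -> city is the only FD that can be checked inside city_zip.
zip_determines_city = fd_holds(city_zip, [1], [0])

# Join tuples with equal zip codes: (street, city, zip).
joined = [
    (street, city, z1)
    for street, z1 in street_zip
    for city, z2 in city_zip
    if z1 == z2
]

# street city -> zip, checked on the joined relation.
street_city_determines_zip = fd_holds(joined, [0, 1], [2])
```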

58
3NF Lets Us Avoid This Problem
• 3rd Normal Form (3NF) modifies the BCNF
condition so we do not have to decompose
in this problem situation.
• An attribute is prime if it is a member of any
key.
• X ->A violates 3NF if and only if X is not a
superkey, and also A is not prime.

59
Example: 3NF
• In our problem situation with FD’s AB ->C
and C ->B, we have keys AB and AC.
• Thus A, B, and C are each prime.
• Although C ->B violates BCNF, it does not
violate 3NF.
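
A Python sketch comparing the two tests on this example. The keys are taken as stated above rather than computed, FD's are (left side, right side) pairs of attribute sets, and the helper names are hypothetical.

```python
ATTRS = {"A", "B", "C"}
FDS = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
KEYS = [{"A", "B"}, {"A", "C"}]  # as stated in the example

# An attribute is prime if it is a member of any key.
PRIME = set().union(*KEYS)


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs):
    return closure(attrs, FDS) == ATTRS


def violates_bcnf(lhs, rhs):
    # Nontrivial FD whose left side is not a superkey.
    return not rhs <= lhs and not is_superkey(lhs)


def violates_3nf(lhs, rhs):
    # Left side is not a superkey and some right-side attribute is not prime.
    return not is_superkey(lhs) and any(a not in PRIME for a in rhs - lhs)


c_to_b_bcnf = violates_bcnf({"C"}, {"B"})
c_to_b_3nf = violates_3nf({"C"}, {"B"})
```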

60
What 3NF and BCNF Give You

• There are two important properties of a
decomposition:
1. Lossless Join : it should be possible to project the
original relations onto the decomposed schema,
and then reconstruct the original.
2. Dependency Preservation : it should be possible
to check in the projected relations whether all the
given FD’s are satisfied.

61
3NF and BCNF -- Continued

• We can get (1) with a BCNF decomposition.


• We can get both (1) and (2) with a 3NF
decomposition.
• But we can’t always get (1) and (2) with a BCNF
decomposition.
– street-city-zip is an example.

62
Why It Works
• Preserves dependencies: each FD from a
minimal basis is contained in a relation,
thus preserved.
• Lossless Join: use the chase to show that
the row for the relation that contains a key
can be made all-unsubscripted variables.
• 3NF: hard part – a property of minimal
bases.

63
