Database Theories Review
Topics covered in Database Systems
Introduction and Basic Concepts
Database Architecture
Relational Model and Algebra
Entity Relationship Diagram
Normalization
Structured Query Language
Transaction and Concurrency
Introduction and Basic Concepts
What you learnt
Basic terminology:
File Based Systems
Database Approach
Basic Terminology
Data:
Database: A collection of data that describes the activities of one or more
related organizations.
DBMS: Software designed to manage the creation and manipulation of, and access
to, the database.
File Systems
o Data stored as computer files
o Data processing specialist needed to:
- Create necessary computer file structures
- Write software that manages data within those structures
- Design application programs that produce reports based
on the file data
File Systems
Limitations of file-based systems
Separation and isolation of data
Duplication of data
Data dependence
Incompatible file formats
Fixed queries/proliferation of application programs
Database Approach
Advantages of a DBMS over file-based systems:
Control of data redundancy
Data consistency
More information from the same amount of data
Sharing of data
Improved data integrity
Improved security
Enforcement of standards
Improved maintenance through data independence
Increased concurrency
Database Architecture
What you Learnt
Three level ANSI-SPARC Architecture
Multilevel Database Architectures
Architecture
External Level
User’s view of the database
Conceptual level
Describes what data is stored in the
database and the relationships among the
data
Internal level
Physical representation of the database
(describes how data is stored)
The three-level architecture ensures data independence.
(Logical and physical independence)
Architecture
Overall description of the database is called the database schema
External Schema
Correspond to different views of the data
Conceptual Schema
Describes all the entities, attributes, and
relationships together with integrity
constraints
Internal Schema
Complete description of the internal
model (data fields, the indexes and
storage structures used)
There is only one conceptual schema and one internal schema per database.
Architecture
Multi-user Architecture
Teleprocessing
Single CPU (where all processing is done) and a number of terminals connected to the CPU
File server Architecture
Processing is distributed on the network, typically a LAN
File server holds the files required
Each workstation has a DBMS
Traditional Two-Tier Client–Server Architecture
Client: manages user interface and runs applications
Server: holds database and DBMS
Data Models
Data Model: A representation, often graphical, of
complex data structures
Relational Model
Relational Models
Relation: A relation is a table with columns
and rows.
Attribute: An attribute is a named column of
a relation. (field)
Tuple/ Record: A tuple is a row of a relation.
Degree: The degree of a relation is the
number of attributes it contains
Cardinality: The cardinality of a relation is
the number of rows (records) it contains
Relational Algebra
Selection Operator ( σpredicate(R) )
Picks certain rows
List all staff with a salary greater than 10,000
σ salary > 10000 (Staff)
List all female staff with salary greater than 10,000
σ sex = ‘F’ ∧ salary > 10000 (Staff)
Relational Algebra
Projection Operator
Picks certain columns
Produce a list of salaries for all staff, showing only the staffNo, fName, lName, and salary
details.
Π staffNo, fName, lName, salary (Staff)
Relational Algebra
Picking both columns and rows:
Produce the staffNo and first names of all staff earning a salary greater than 10,000
Π staffNo, fName ( σ salary > 10000 (Staff) )
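The selection and projection operators above can be sketched in a few lines of Python. This is only an illustration under assumptions: a relation is modelled as a list of dictionaries, the helper names `select` and `project` are hypothetical, and the sample Staff rows are made up.

```python
# Hypothetical sample rows; the attribute names follow the Staff examples above.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Lee", "sex": "F", "salary": 12000},
    {"staffNo": "S2", "fName": "Bob", "lName": "Kim", "sex": "M", "salary": 9000},
    {"staffNo": "S3", "fName": "Cal", "lName": "Roy", "sex": "M", "salary": 15000},
]


def select(relation, predicate):
    """Selection: keep the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named columns and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# staffNo and fName of all staff earning more than 10,000
high_paid = select(staff, lambda r: r["salary"] > 10000)
names = project(high_paid, ["staffNo", "fName"])

# Female staff earning more than 10,000
female_high_paid = select(staff, lambda r: r["sex"] == "F" and r["salary"] > 10000)
```

Applying the selection first and the projection second mirrors the nesting of the algebra expression: the inner operator runs first.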
Relational Algebra
Cartesian Product
Picks columns from more than one table
List the names and comments of all clients who have viewed a property for rent.
Π fName, propertyNo, comments ( σ Client.clientNo = Viewing.clientNo (Client × Viewing) )
Relational Algebra
Natural Join
Picks columns from more than one table
List the names and comments of all clients who have viewed a property for rent.
Π fName, propertyNo, comments (Client ⋈ Viewing)
Natural Join:
Enforces equality on all attributes with the same name
Eliminates one copy of duplicate attributes
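As a rough illustration of the two properties listed above, the sketch below implements a natural join over lists of dictionaries. The function name `natural_join` and the sample Client and Viewing rows are assumptions made for the example, not data from the course.

```python
from itertools import product

# Hypothetical sample data using the attribute names from the example query.
client = [
    {"clientNo": "C1", "fName": "Aline"},
    {"clientNo": "C2", "fName": "John"},
]
viewing = [
    {"clientNo": "C1", "propertyNo": "PA14", "comments": "too small"},
    {"clientNo": "C1", "propertyNo": "PG4", "comments": "no garden"},
]


def natural_join(r, s):
    """Pair rows that agree on every shared attribute name."""
    result = []
    for a, b in product(r, s):        # start from the Cartesian product
        common = a.keys() & b.keys()  # attributes with the same name
        if all(a[k] == b[k] for k in common):
            result.append({**a, **b})  # merged row keeps one copy of each shared attribute
    return result


joined = natural_join(client, viewing)
answer = [(row["fName"], row["propertyNo"], row["comments"]) for row in joined]
```

The loop makes the link to the Cartesian product explicit: a natural join is a product followed by an equality selection on the shared attributes and a projection that drops the duplicate columns.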
Relational Algebra
Theta Join ( R ⋈ predicate S )
Picks columns from more than one table
List all clients who have viewed property number PA14.
(Client ⋈ Client.clientNo = Viewing.clientNo ∧ propertyNo = ‘PA14’ Viewing)
When a theta join predicate contains only equality, the term Equijoin is used instead.
Relational Algebra
Core Operators in Relational Algebra
Select
Project
Cross Product
Union
Difference
Rename
Entity Relationship Diagram
What you Learnt
Database design process
Entities (vs Entity Sets)
Relationships
Multiplicity
Constraints : Keys, Referential, Participation
Database design process
Example fund transfer
Database Design Process
Requirements analysis (what will be stored, and accessed by whom)
Conceptual design (create ERDs)
Logical design (Represent designs in data model)
Schema refinement (Normalization)
Physical design (storage, indexing)
Application and security design (Security)
What is a relationship?
A relationship between entity sets P and C is a subset of all possible pairs of entities in P and C, with tuples
uniquely identified by P and C’s keys
Multiplicity of Relationships
One-to-one
Many-to-one
One-to-many
Many-to-many
[Figure: four mapping diagrams between the elements 1, 2, 3 and a, b, c, d, one for each kind of relationship listed above]
Multiplicity of Relationships
Each object of class c1 is related to at least m and at most n objects of c2
[Figure: classes C1 and C2 connected by an association A, labelled m…n at each end]
◎ 0…n At least 0 at most n
◎ 0…* No restrictions
From E/R diagrams to relational schema
Relational schema
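CREATE TABLE Purchased (
    ProductID CHAR(50),
    SSN CHAR(50),
    date DATE,
    -- the relationship is identified by the keys of both entity sets
    PRIMARY KEY (ProductID, SSN),
    -- each side must refer to an existing Product / Person
    FOREIGN KEY (ProductID) REFERENCES Product,
    FOREIGN KEY (SSN) REFERENCES Person
);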
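To try the schema out, the sketch below runs it in an in-memory SQLite database through Python's standard `sqlite3` module. The Product and Person tables, their `name` columns, and the sample rows are assumptions added so the foreign keys have something to reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
-- Assumed parent tables; only their key columns are implied by the slide.
CREATE TABLE Product (ProductID CHAR(50) PRIMARY KEY, name CHAR(50));
CREATE TABLE Person  (SSN CHAR(50) PRIMARY KEY, name CHAR(50));

CREATE TABLE Purchased (
    ProductID CHAR(50),
    SSN CHAR(50),
    date DATE,
    PRIMARY KEY (ProductID, SSN),
    FOREIGN KEY (ProductID) REFERENCES Product,
    FOREIGN KEY (SSN) REFERENCES Person
);

-- Hypothetical sample data.
INSERT INTO Product VALUES ('P1', 'Laptop');
INSERT INTO Person VALUES ('111', 'Mary');
INSERT INTO Purchased VALUES ('P1', '111', '2024-01-15');
""")

# Follow both foreign keys to list who purchased what, and when.
rows = conn.execute("""
    SELECT Person.name, Product.name, Purchased.date
    FROM Purchased
    JOIN Person ON Person.SSN = Purchased.SSN
    JOIN Product ON Product.ProductID = Purchased.ProductID
""").fetchall()
```

The many-to-many relationship becomes its own table whose primary key combines the keys of the two entity sets it connects.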
Constraints
• Entity Integrity Constraint
In a base relation, no attribute of a primary key can be null (null means unknown)
• Keys: Implicit constraints on uniqueness of entities
• Ex: An SSN uniquely identifies a person
• Single-value constraints:
• Ex: a person can have only one father
• Referential integrity constraints: Referenced entities must exist
• Ex: if you work for a company, it must exist in the database
• Other constraints:
• Ex: people’s ages are between 0 and 150
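The constraints listed above can be seen in action with a small SQLite sketch. The Company and Person tables, their columns, and the helper `rejected` are hypothetical names chosen for the illustration; the explicit NOT NULL on the key is there because SQLite does not otherwise forbid NULL in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for referential integrity in SQLite
conn.executescript("""
CREATE TABLE Company (name TEXT PRIMARY KEY NOT NULL);
CREATE TABLE Person (
    ssn TEXT PRIMARY KEY NOT NULL,              -- entity integrity + key constraint
    age INTEGER CHECK (age BETWEEN 0 AND 150),  -- other constraint
    company TEXT REFERENCES Company(name)       -- referential integrity
);
INSERT INTO Company VALUES ('Acme');
INSERT INTO Person VALUES ('111', 30, 'Acme');
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "null key": rejected("INSERT INTO Person VALUES (NULL, 40, 'Acme')"),
    "duplicate key": rejected("INSERT INTO Person VALUES ('111', 41, 'Acme')"),
    "missing company": rejected("INSERT INTO Person VALUES ('222', 42, 'Globex')"),
    "age out of range": rejected("INSERT INTO Person VALUES ('333', 200, 'Acme')"),
    "valid row": rejected("INSERT INTO Person VALUES ('444', 25, 'Acme')"),
}
```

Every invalid insert is refused by the DBMS itself, so the rules do not have to be re-implemented in each application program.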
Normalization
What you Learnt
Redundancy and data anomalies
Functional Dependencies
Normalization Process
Anomalies in data
A poorly designed database causes anomalies

Student    Course      Room
Joseph     ICS 3301    B01
Patricia   ICS 3301    B01
Mary       ICS 3301    B01
…          CS229       C12

If every course is in only one room, the table contains redundant information
If we update the room number for one tuple, we get inconsistent data = an update anomaly
If everyone drops the class, we lose what room the class is in! = a delete anomaly
We can’t reserve a room without students = an insert anomaly
Functional dependencies
Functional dependencies describe the relationships between attributes.
Def: Let A, B be sets of attributes
We write A → B, or say A functionally determines B, if for any
tuples t1 and t2:
t1[A] = t2[A] implies t1[B] = t2[B]
and we call A → B a functional dependency
A → B means that
“whenever two tuples agree on A then they
agree on B.”
Functional Dependencies and keys
Relation with no duplicates
Suppose A → all attributes (A functionally determines all
attributes); then A is a key.
A B C
a1 b1 c1
a1 b1 c2
For A to be a key, it should functionally determine all
attributes. Here the two tuples agree on A but differ on C, so A → C fails and A is not a key.
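The definition of a functional dependency translates almost directly into code. The sketch below checks whether a dependency holds in one given relation instance; the function name `holds` is hypothetical, and the rows are the A, B, C example above.

```python
def holds(relation, lhs, rhs):
    """Return True if lhs -> rhs holds in this relation instance.

    Whenever two tuples agree on the lhs attributes they must also
    agree on the rhs attributes.
    """
    seen = {}
    for t in relation:
        a = tuple(t[x] for x in lhs)
        b = tuple(t[x] for x in rhs)
        # Remember the first rhs value seen for each lhs value and
        # report a violation if a later tuple disagrees with it.
        if seen.setdefault(a, b) != b:
            return False
    return True


r = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c2"},
]

a_to_b = holds(r, ["A"], ["B"])              # the tuples agree on A and on B
a_to_c = holds(r, ["A"], ["C"])              # the tuples agree on A but not on C
a_is_key = holds(r, ["A"], ["A", "B", "C"])  # A must determine every attribute
```

A check like this can only refute a dependency from sample data. Whether a dependency holds in general is a statement about the meaning of the data, not about one instance.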
Closure of Attributes
Given a set of attributes A1, …, An and a set of FDs F:
the closure {A1, …, An}+ is the set of attributes B
such that {A1, …, An} → B
F = { {name} → {color},
      {category} → {department},
      {color, category} → {price} }
Example of closures:
{name}+ = {name, color}
{name, category}+ = {name, category, color, department, price}
{color}+ = {color}
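One common way to compute a closure is to keep applying the dependencies until no new attribute can be added. The sketch below follows that idea; the function name `closure` and the representation of F as a list of (left side, right side) pairs are choices made for the example.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined,
            # everything on the right is determined as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The set F from the example above.
F = [
    ({"name"}, {"color"}),
    ({"category"}, {"department"}),
    ({"color", "category"}, {"price"}),
]

name_closure = closure({"name"}, F)
name_category_closure = closure({"name", "category"}, F)
color_closure = closure({"color"}, F)
```

A set of attributes is a key candidate exactly when its closure contains every attribute of the relation, which is how the closure connects back to the definition of a key.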
Normalization
SQL
What you learnt
Data Definition Language (DDL)
Data Manipulation Language (DML)
Joins
SQL
Creating a database
Creating Tables
Insert, update, and delete data in tables
Select
Join Operations
Aggregate functions and grouping: SUM, GROUP BY
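The statement types listed above can be exercised together in one short script. This is a sketch using Python's `sqlite3` module; the Branch and Staff tables, their columns, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # creating a (temporary, in-memory) database
conn.executescript("""
-- DDL: creating tables
CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
CREATE TABLE Staff (
    staffNo TEXT PRIMARY KEY,
    fName TEXT,
    salary INTEGER,
    branchNo TEXT REFERENCES Branch(branchNo)
);

-- DML: insert, update, delete
INSERT INTO Branch VALUES ('B1', 'London'), ('B2', 'Glasgow');
INSERT INTO Staff VALUES
    ('S1', 'Ann', 12000, 'B1'),
    ('S2', 'Bob', 9000, 'B1'),
    ('S3', 'Cal', 15000, 'B2'),
    ('S4', 'Dee', 8000, 'B2');
UPDATE Staff SET salary = salary + 1000 WHERE staffNo = 'S2';
DELETE FROM Staff WHERE staffNo = 'S4';
""")

# Select with a join and an aggregate: total salary per city.
totals = conn.execute("""
    SELECT b.city, SUM(s.salary) AS total
    FROM Staff AS s
    JOIN Branch AS b ON s.branchNo = b.branchNo
    GROUP BY b.city
    ORDER BY b.city
""").fetchall()
```

SUM is the aggregate function; GROUP BY is the clause that decides which rows each sum is computed over.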
Transaction and Concurrency
What you Learnt
Transactions
Serializability
Concurrency Control Techniques
Properties of transactions
Atomic
State shows either all the effects of transaction, or none of them
Consistent
Transaction moves from a state where integrity holds, to another where integrity holds
Isolated
The effect of concurrent transactions is the same as if they ran one after another (i.e. it looks like
batch mode)
Durable
Once a transaction has committed, its effects remain in the database
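Atomicity and consistency can be observed with a small fund transfer in SQLite. In this sketch the `account` table, the CHECK rule that balances may not go negative, and the function `transfer` are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # assumed integrity rule
)
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 50), ("B", 200)])


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = transfer(conn, "B", "A", 100)   # both updates succeed
second = transfer(conn, "A", "B", 500)  # the debit would make A negative
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failed transfer the credit to B has already been applied when the debit from A is refused, yet the rollback removes it again, so the database shows none of the effects of that transaction.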
Motivation for Transactions
Grouping user actions (reads & writes) into transactions helps with two
goals:
Recovery & Durability: Keeping the DBMS data consistent and
durable in the face of crashes, aborts, system shutdowns, etc.
(Resilience to system failures)
Concurrency: Achieving better performance by parallelizing transactions
without creating anomalies
Concurrency control ensures that isolation and consistency are maintained
Recovery ensures atomicity and durability are maintained
Scheduling transactions
Comparing results of serial and non-serial schedules
Consider the previous examples of Transaction 1 and Transaction 2
Transaction 1: A += 100, B -= 100
Transaction 2: A *= 1.06, B *= 1.06
Starting account balances: A = 50, B = 200
Serial schedule (Transaction 1, Transaction 2)
Account balances after serial schedule: A = 159, B = 106
Interleaved schedule A
Account balances after interleaved schedule: A = 159, B = 106
Same result
Scheduling transactions
Comparing results of serial and non-serial schedules
Consider the previous examples of Transaction 1 and Transaction 2
Transaction 1: A += 100, B -= 100
Transaction 2: A *= 1.06, B *= 1.06
Starting account balances: A = 50, B = 200
Serial schedule (Transaction 1, Transaction 2)
Account balances after serial schedule: A = 159, B = 106
Interleaved schedule B
Account balances after interleaved schedule: A = 159, B = 112
Different result!!
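The balances for the serial and interleaved schedules can be reproduced with a short simulation. The order of steps inside each interleaved schedule is not stated in the text; the orders used below are inferred from the resulting balances and should be read as an assumption. `Decimal` is used so that the 6% multiplication is exact.

```python
from decimal import Decimal

RATE = Decimal("1.06")

# One entry per step: (transaction, account, operation).
T1_A = ("T1", "A", lambda x: x + 100)
T1_B = ("T1", "B", lambda x: x - 100)
T2_A = ("T2", "A", lambda x: x * RATE)
T2_B = ("T2", "B", lambda x: x * RATE)


def run(schedule):
    """Apply the steps in order to the starting balances A = 50, B = 200."""
    balances = {"A": Decimal(50), "B": Decimal(200)}
    for _txn, account, op in schedule:
        balances[account] = op(balances[account])
    return balances


serial = run([T1_A, T1_B, T2_A, T2_B])
interleaved_a = run([T1_A, T2_A, T1_B, T2_B])  # assumed order for schedule A
interleaved_b = run([T1_A, T2_A, T2_B, T1_B])  # assumed order for schedule B
```

Schedule A matches the serial result because, on each account, Transaction 1 still acts before Transaction 2. In schedule B the order is reversed on account B only, so the result matches neither serial order.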
Anomalies with interleaved execution
Several problems can occur when interleaved transactions conflict.
They include:
Unrepeatable read
Dirty Read/ Reading Uncommitted data
Lost update problem
Inconsistent analysis problem
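As a minimal illustration of one of these problems, the sketch below replays a lost update step by step in plain Python. The starting balance and the amounts are made up, and the two "transactions" are simply interleaved by hand rather than run concurrently.

```python
def lost_update_schedule():
    balance = 100              # shared data item (hypothetical value)
    t1_read = balance          # T1 reads 100
    t2_read = balance          # T2 reads 100, before T1 has written
    balance = t1_read + 50     # T1 writes 150
    balance = t2_read - 30     # T2 writes 70, overwriting T1's update
    return balance


def serial_schedule():
    balance = 100
    balance = balance + 50     # T1 runs to completion
    balance = balance - 30     # then T2 runs
    return balance


interleaved_result = lost_update_schedule()
serial_result = serial_schedule()
```

The interleaved run ends with a balance that no serial order of the two transactions could produce, because the deposit made by T1 has been lost.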
Concurrency Control Techniques
Two phase Locking
Timestamping
Lock Granularity