Database Management Systems All Weeks
By Anupratee Bharadwaj
Introduction to DBMS
Database Applications:
Banking: transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized recommendations
Manufacturing: production, inventory, orders, supply chain
HR: employee records, salaries, tax deductions
Physical Data Management/ Book Keeping: the process of managing data or records physically, using
physical ledgers and journals
Physical Data Independence: the ability to modify the physical schema without changing the logical
schema.
Instance: the actual content of the database at a particular point in time, analogous to the value of a variable
Data Definition Language:
Defines the database schema
Create, alter, drop objects in a database
Grant, revoke privileges and roles
A DDL compiler generates a set of table templates stored in a data dictionary (contains metadata,
database schema, integrity constraints, authorizations)
1. Storage manager:
Program module that provides the interface between the low-level data stored in the database
and the application programs and queries submitted to the system.
It is responsible for interaction with the OS file manager, efficient storing, retrieving and
updating of data.
2. Query Processing:
A translation of high-level queries into low-level expressions
Parsing and translation
Optimization
Evaluation
3. Transaction Management:
Transaction management component: ensures that the database remains in a consistent state
despite system failures
Concurrency control manager: controls the interaction among the concurrent transactions to
ensure the consistency of the database
Introduction to Relational Model
Attribute:
Piece of data that describes an entity, columns
Domain of the attribute: set of allowed values for each attribute
Attribute values need to be atomic (indivisible)
Null: special value of every domain that indicates that the value is unknown, can cause complications
in the definitions of many operations.
Keys (IMPORTANT):
Superkey: an attribute or set of attributes whose values uniquely identify each tuple in a
relation. Not all superkeys are candidate keys; in a relation the no. of superkeys is always greater
than or equal to the no. of candidate keys.
Candidate Key: a minimal superkey (no proper subset of it is also a superkey). Every candidate key is
a superkey. Candidate keys not chosen as the primary key are allowed to be null.
Primary Key: selected from the candidate keys, used to uniquely identify each tuple in a relation.
Secondary Key/ Alternate Key: all Candidate Keys not selected as a Primary Key
Surrogate Key/ Synthetic Key: a key that uniquely identifies an entity/ object in the database that is
not directly derived from the application data. Example: serial number
Foreign Key: used to establish relationships between two tables, value in one relation must appear in
another
Simple Key: consists of a single attribute
Composite Key: more than one attribute, each component not a simple key
Compound Key: more than one attribute, each component is a simple key
Relational Query Languages:
Procedural programming: tell the computer how to get the output
Declarative programming: describe what output is wanted without specifying how to compute it
Relation Operators:
Select Operation (σ): selection of rows (where clause)
Projection Operation (π): selection of columns (select clause)
Union Operation (∪): union of two relations (or), must have same number of columns, each column
must have the same data type
Difference (-): r – s, what’s there in r but not in s, must satisfy same conditions as union
Intersection (∩): common tuples (and), must satisfy same conditions as union [r∩s = r – (r – s)]
Cartesian product (×): cross-join which outputs all possible combinations
Natural Join (⋈): outputs pairs of rows from the two input relations that have the same value on all
attributes that have the same name.
Aggregation Operators:
SUM()
AVG()
MAX()
MIN()
Symbols:
∨ - or
∧ - and
¬ - not
Introduction to SQL
Select Clause:
List of attributes
distinct is used after select to eliminate duplicates
all after select is used to keep duplicates
* after the select clause denotes all attributes
Can use arithmetic expressions in select
Can rename attributes in the result with as (select ID, name, salary/12 as monthly_salary)
From Clause:
Lists the relations involved in the query
Cartesian product in relational algebra
Where Clause:
Specifies conditions that the result must satisfy
Selection predicate in relational algebra
Basic Operations:
String Operations: (%) matches any substring, represents zero, one, or many characters;
( _ ) matches any character, represents one single character; case sensitive
Examples:
‘Intro%’ matches any string beginning with “Intro”
‘%meow’ matches any string ending with “meow”
‘%Comp%’ matches any string containing “Comp” as a substring
‘_ _ _ ’ matches any string of exactly three characters
‘_ _ _ %’ matches any string of at least three characters
Ordering the Display of Tuples: order by, can specify desc or asc. Asc is the default. Can sort on
multiple attributes
Select top: specify the number of records to return
in allows for multiple values in a where clause, example: where dept_name in (‘CS’, ‘Bio’)
Set Operations:
All of these operations automatically eliminate duplicates, to keep duplicates use all after the operation
union (or)
intersect (and)
except (difference)
Null Values:
null signifies an unknown value or that a value doesn’t exist
is null/ is not null is used to check for null values (in where clause)
result of any arithmetic expression with null is null
any comparison with null returns unknown
logic using unknown:
1. OR: (unknown or true) = true
(unknown or false) = unknown
(unknown or unknown) = unknown
2. AND: (unknown and true) = unknown
(unknown and false) = false
(unknown and unknown) = unknown
3. NOT: (not unknown) = unknown
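A quick sketch of why this matters, against the instructor relation used throughout these notes:
rows whose salary is null satisfy neither branch of the predicate below, since both comparisons
evaluate to unknown, so they are silently dropped.
select name
from instructor
where salary > 90000 or salary <= 90000
To include such rows explicitly, add or salary is null to the predicate.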
Aggregate Functions:
These functions operate on the multiset of values of a column of a relation and return a value
All aggregate operations except count(*) ignore tuples with null values
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
group by: form groups
having: used after forming groups, like where clause for groups
Examples:
1. Select distinct:
From the classroom relation find the names of buildings that have classrooms with capacity less
than 100 (removing the duplicates).
select distinct building
from classroom
where capacity < 100
2. Where clause:
From the instructor and department relations, find the names of instructors who are in the Finance
department or whose department is housed in the Watson or Taylor building.
select i.name
from instructor i, department d
where d.dept_name = i.dept_name and
(d.dept_name = ‘Finance’ or d.building in (‘Watson’, ‘Taylor’))
4. String Operations:
From the course relation, find the titles of all courses whose course id has three alphabets indicating
the department.
select title
from course
where course_id like ‘_ _ _-%’
5. Order by:
From the student relation in the figure, obtain the list of all students in alphabetic order of
departments and within each department, in decreasing order of total credits.
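A sketch of the query, assuming the student relation names these attributes dept_name and tot_cred
as in the standard university schema:
select *
from student
order by dept_name asc, tot_cred desc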
6. In Operator:
From the teaches relation, find the IDs of all courses taught in the Fall or Spring of 2018.
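One way to write this with in, assuming teaches carries semester and year attributes (the union
version in the next example answers the same question):
select course_id
from teaches
where semester in ('Fall', 'Spring') and year = 2018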
7. Union
From the teaches relation, find the IDs of all courses taught in the Fall or Spring of 2018.
select course_id
from teaches
where semester=‘Fall’ and year=2018
union
select course_id
from teaches
where semester=‘Spring’ and year=2018
8. Intersect:
From the instructor relation, find the names of all instructors who taught in either the Computer
Science department or the Finance department and whose salary is < 80000
select name
from instructor
where dept_name in (‘Comp. Sci.’,‘Finance’)
intersect
select name
from instructor
where salary < 80000
9. Except:
From the instructor relation, find the names of all instructors who taught in either the Computer
Science department or the Finance department and whose salary is either ≥ 90000 or ≤ 70000
select name
from instructor
where dept_name in (‘Comp. Sci.’,‘Finance’)
except
select name
from instructor
where salary < 90000 and salary > 70000
10. Avg:
From the classroom relation, find the names and the average capacity of each building whose
average capacity is greater than 25.
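A sketch (the question says “names”; the grouping attribute here is the building name):
select building, avg(capacity) as avg_capacity
from classroom
group by building
having avg(capacity) > 25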
11. Min:
From the instructor relation, find the least salary drawn by any instructor among all the instructors
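A minimal sketch:
select min(salary)
from instructor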
12. Max:
From the student relation, find the maximum credits obtained by any student among all the students
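A sketch, assuming the credits column of student is named tot_cred:
select max(tot_cred)
from student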
13. Count:
From the section relation, find the number of courses run in each building
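A sketch, assuming section carries a building attribute:
select building, count(distinct course_id) as num_courses
from section
group by building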
14. Sum:
From the course relation, find the total credits offered by each department
select dept_name, sum(credits) as sum_credits
from course
group by dept_name
Intermediate SQL
Nested Subqueries:
A subquery is a select-from-where expression that is nested within another query
Can be used in select, from or where clause
Find the total number of (distinct) students who have taken course sections taught by the instructor
with ID 10101 (in)
select count (distinct ID)
from takes
where (course_id, sec_id, semester, year) in
(select course_id, sec_id, semester, year
from teaches
where teaches.ID = 10101)
Find names of instructors with salary greater than that of some (at least one) instructor in the Biology
department (some)
select name
from instructor
where salary > some (select salary
from instructor
where dept_name = ‘Biology’)
Find the names of all instructors whose salary is greater than the salary of all instructors in the
Biology department (all)
select name
from instructor
where salary > all (select salary
from instructor
where dept_name = ‘Biology’)
Find all courses taught in both the Fall 2009 semester and in the Spring 2010 semester (exists)
select course_id
from section s
where semester = ‘Fall’ and year = 2009 and exists
(select *
from section t
where semester = ‘Spring’ and year = 2010
and s.course_id = t.course_id)
Find all students who have taken all courses offered in the Biology department (not exists)
Cannot write this using all and its variants
select distinct s.ID, s.name
from student s
where not exists ((select course_id
from course
where dept_name = ‘Biology’)
except
(select t.course_id
from takes as t
where s.ID = t.ID))
Find all courses that were offered at most once in 2009 (unique)
The unique construct tests whether a subquery has any duplicate tuples in its result
It evaluates to true if a given subquery has no duplicates
select c.course_id
from course c
where unique (select s.course_id
from section s
where c.course_id = s.course_id and s.year = 2009)
The with clause provides a way of defining a temporary relation whose definition is only available to
the query in which the with clause occurs
Find all departments with the maximum budget
with max_budget(value) as
(select max(budget)
from department)
select department.dept_name
from department, max_budget
where department.budget = max_budget.value
Find all departments where the total salary is greater than the average of the total salary of all
departments
with dept_total(dept_name, value) as
(select dept_name, sum(salary)
from instructor
group by dept_name),
dept_total_avg(value) as
(select avg(value)
from dept_total)
select dept_name
from dept_total, dept_total_avg
where dept_total.value > dept_total_avg.value
Examples:
Delete all instructors
delete from instructor
Delete all instructors whose salary is less than the average salary of instructors
delete from instructor
where salary < (select avg(salary)
from instructor)
Problem: as we delete tuples from instructor, the average salary changes
The correct solution will use the with clause to compute and store the avg salary and then find and
delete all the tuples with salary less than avg
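A sketch of that solution (Postgres-style, where a with clause may precede delete; the temporary
relation is evaluated once, before any tuples are removed):
with avg_salary(value) as
(select avg(salary)
from instructor)
delete from instructor
where salary < (select value from avg_salary)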
Insertion:
Inserting new tuples into a relation
Format:
insert into r (A1, A2, ..., An)
values (v1, v2, ..., vn)
Examples:
Add a new tuple to course
insert into course
values (‘MEW-101’, ‘Cat Science’, ‘Cat. Sci.’, 4)
Add all instructors to the student relation with tot_cred set to 0
insert into student
select ID, name, dept_name, 0
from instructor
Updating:
Updating values in some tuples of a relation
Format:
update r
set A = expression
where P
Examples:
Increase salaries of instructors whose salary is over $100,000 by 3%, and all others by 5%
update instructor
set salary = salary * 1.03
where salary > 100000
update instructor
set salary = salary * 1.05
where salary <= 100000
Order is important: running the two updates in the opposite order would raise some salaries twice.
The same can be done in a single statement (where order no longer matters) using case:
update instructor
set salary = case
when salary <= 100000
then salary * 1.05
else salary * 1.03
end
Join Expressions:
Join operations take two relations and return as a result another relation
A join operation is a Cartesian product which requires that tuples in the two relations match (under
some condition).
It also specifies the attributes that are present in the result of the join
The join operations are typically used as subquery expressions in the from clause
Cross Join:
Returns the cartesian product of rows from tables in the join
no. of tuples returned = m * n (without condition)
Examples:
select *
from employee cross join department
select *
from employee, department
Inner Join:
Joins two tables on the basis of the column which is explicitly specified in the ON clause with a
given condition using a comparison operator
course inner join prereq on
course.course_id = prereq.course_id
If it is a natural join (joins two tables based on attributes with the same name and compatible
datatypes), the duplicate columns are skipped and no on clause is written:
course natural inner join prereq
Outer Join:
An extension of the join operation that avoids loss of information
Computes the join and then adds tuples from one relation that do not match tuples in the other
relation to the result of the join
Uses null values
Left Outer Join:
Includes all tuples from the left relation; tuples with no matching tuple in the right relation are
padded with null values
course natural left outer join prereq
Right Outer Join:
Includes all tuples from the right relation; tuples with no matching tuple in the left relation are
padded with null values
course natural right outer join prereq
Views:
Any relation that is not of the conceptual model but is made visible to a user as a “virtual relation” is
called a view.
A view provides a mechanism to hide certain data from the view of certain users
Doesn’t create a new relation, virtual, isn’t stored
A view can be used to create another view
Because a view is virtual, changes to the underlying relations are reflected the next time the view
is queried; a materialized view, in contrast, must be maintained by updating it when the underlying
data changes
Materializing a view - create a physical table containing all the tuples in the result of the query
defining the view
Format:
create view v as
<any legal SQL statement>
Example:
A view of instructors without their salary
create view faculty as
select ID, name, dept_name
from instructor
Integrity Constraints:
Guard against accidental damage to the database, by ensuring that authorized changes to the database do not
result in a loss of data consistency
not null: specified value can’t be null
unique(a1, a2..am): a1, a2, am form a candidate key
check(P): check (semester in (‘Fall’, ‘Winter’, ‘Spring’, ‘Summer’))
Referential Integrity:
ensures that a value that appears in one relation for a given set of attributes also appears for a certain
set of attributes in another relation.
Example: If ‘Biology’ is a department name appearing in one of the tuples in the instructor relation,
then there exists a tuple in the department relation for ‘Biology’.
The Cascading Referential Integrity Constraints in SQL Server are the foreign key constraints that
tell SQL Server to perform certain actions whenever a user attempts to delete or update a primary
key to which existing foreign keys point, as sketched below.
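A sketch of such a constraint in standard SQL (table and column names follow the university schema
used in these notes):
create table instructor (
ID varchar(5) primary key,
name varchar(20) not null,
dept_name varchar(20),
salary numeric(8,2),
foreign key (dept_name) references department
on delete cascade
on update cascade
)
With on delete cascade, deleting a department also deletes its instructors; set null and set default
are the other common actions.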
SQL Functions:
A function in SQL is a stored routine that computes a value from the provided inputs and returns it
Define a function that, given the name of a department, returns the count of the number of instructors
in that department
create function dept_count (dept_name varchar(20))
returns integer -- type of the value that is returned
begin
declare d_count integer;
select count (*) into d_count
from instructor
where instructor.dept_name = dept_count.dept_name;
return d_count; -- what to return
end
The function dept_count can be used to find the department names and budget of all departments with
more than 12 instructors:
select dept_name, budget
from department
where dept_count (dept_name) > 12
Procedures:
A procedure is a set of instructions which takes input and performs a certain task
The dept count function could instead be written as procedure
create procedure dept_count_proc (in dept_name varchar (20), out d_count integer)
begin
select count(*) into d_count
from instructor
where instructor.dept_name = dept_count_proc.dept_name
end
Language constructs for procedures and functions - while loop, repeat loop, for loop, if-then-else,
case
Differences between function and procedure: a function must return a value and can be used inside an
SQL expression, while a procedure need not return anything and is invoked with a call statement;
procedures support both in and out parameters, functions take only inputs
Triggers:
A trigger defines a set of actions that are performed in response to an insert, update, or delete
operation on a specified table.
When such an SQL operation is executed, the trigger is said to have been activated
Triggers are optional
Triggers are defined using the create trigger statement
Triggers can be used:
to enforce data integrity rules via referential constraints and check constraints
to cause updates to other tables, automatically generate or transform values for inserted or updated
rows
invoke functions to perform tasks such as issuing alerts
To design a trigger mechanism the following must be done:
specify the events (like update, insert, or delete) on which the trigger is to be executed
specify the time (BEFORE or AFTER) of execution
specify the actions to be taken when the trigger executes
1. BEFORE triggers
Run before an update, or insert
Values that are being updated or inserted can be modified before the database is actually modified.
You can use triggers that run before an update or insert to:
Check or modify values before they are actually updated or inserted in the database – useful if the
user view and the internal database format differ
Run other non-database operations coded in user-defined functions
3. AFTER triggers
Run after an update, insert, or delete
You can use triggers that run after an update or insert to:
Update data in other tables – useful for maintaining relationships between data or keeping an audit
trail
Check against other data in the table or in other tables – useful to ensure data integrity when
referential integrity constraints aren’t appropriate, or when table check constraints limit checking
to the current table only
Run non-database operations coded in user-defined functions – useful when issuing alerts or updating
information outside the database
Row level triggers are executed whenever a row is affected by the event on which the trigger is
defined, fires once for each row affected.
Statement level triggers perform a single action for all rows affected by a statement, instead of
executing a separate action for each affected row, fires once per triggering event.
Triggering event can be an insert, delete or update
Uses:
Logging changes to a history table
Auditing users and their actions against sensitive tables
Simple validation
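A sketch of the first use, logging salary changes to a history table (MySQL-style row-level trigger;
the instructor_history table is assumed for illustration):
create trigger log_salary_change
after update on instructor
for each row
insert into instructor_history
values (old.ID, old.salary, new.salary, current_timestamp)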
Relational Algebra:
Procedural and Algebra based
The operators take one or two relations as inputs and produce a new relation as a result
Six basic operators
select: σ
project: Π
union: ∪
set difference: −
Cartesian product: x
rename: ρ
Symbols:
∨ or
∧ and
¬ not
Examples:
σ dept_name=‘Physics’ (instructor) ≡ select * from instructor where dept_name = ‘Physics’
Π ID, name, salary (instructor) ≡ select ID, name, salary from instructor
select course_id
from section
where semester = ‘Fall’ and year = ‘2009’
union
select course_id
from section
where semester = ‘Spring’ and year = ‘2010’
select course_id
from section
where semester = ‘Fall’ and year = ‘2009’
except
select course_id
from section
where semester = ‘Spring’ and year = ‘2010’
Division Operator (÷):
Division is a derived operation and can be expressed in terms of other operations
Examples:
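With R and S the attribute sets of r and s, the standard construction of division from the basic
operators is:
r ÷ s = Π R−S (r) − Π R−S ((Π R−S (r) × s) − r)
Intuitively, it keeps the R−S values that are paired in r with every tuple of s (e.g., students who
have taken all courses listed in s).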
Tuple Relational Calculus:
Non-Procedural and Predicate Calculus based
Each query is of the form {t | P(t)}, where t represents the resulting tuples and P(t) is the predicate used to fetch t.
Predicate calculus formula:
Set of attributes and constants
Set of comparison operators: (e.g., <, ≤, =, ≠, >, ≥)
Set of connectives: and (∧), or (∨), not (¬)
Implication (⇒): x ⇒ y means if x is true, then y is true; x ⇒ y ≡ ¬x ∨ y
Set of quantifiers: ∃t ∈ r (Q(t)) ≡ “there exists” a tuple t in relation r such that predicate
Q(t) is true [existential quantifier]
∀t ∈ r (Q(t)) ≡ Q is true “for all” tuples t in relation r [universal quantifier]
Symbols:
⇒ implies
∃ there exists
∈ belonging to
≡ is defined as
∀ for all
∧ and
∨ or
¬ not
Examples:
1. Find out the names of all students who have taken the course name ‘DBMS’
2. Find out the names of all students and their rollNo who have taken the course name
‘DBMS’
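Hedged sketches of the two expressions, assuming relations student(rollNo, name), takes(rollNo,
courseId) and course(courseId, name):
1. { t | ∃s ∈ student ∃k ∈ takes ∃c ∈ course ( t[name] = s[name] ∧ s[rollNo] = k[rollNo] ∧
k[courseId] = c[courseId] ∧ c[name] = ‘DBMS’ ) }
2. { t | ∃s ∈ student ∃k ∈ takes ∃c ∈ course ( t[name] = s[name] ∧ t[rollNo] = s[rollNo] ∧
s[rollNo] = k[rollNo] ∧ k[courseId] = c[courseId] ∧ c[name] = ‘DBMS’ ) }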
RA vs TRC vs DRC: relational algebra is procedural, while tuple and domain relational calculus are
declarative; TRC variables range over tuples, DRC variables range over attribute values, and all
three are equivalent in expressive power
Entity- Relationship Model
The ER data model was developed to facilitate database design by allowing specification of an
enterprise schema that represents the overall logical structure of a database
The ER model is useful in mapping the meanings and interactions of real-world enterprises onto a
conceptual schema
ER Diagram for a University Enterprise:
Attributes:
An Attribute is a property associated with an entity / entity set. Based on the values of certain
attributes, an entity can be identified uniquely
Domain: Set of permitted values for each attribute
Attribute types:
Simple: cannot be further divided into sub attributes, eg: roll_no
Composite attributes: can be divided into sub attributes, eg: name [first_name, last_name]
Single-valued: can only take a single value, eg: ID (a student can only have one ID)
Multivalued attributes: can take many values, eg: email (a student can have multiple emails)
Derived attributes: computed from other attributes, eg: age
Complex attributes: those attributes, which can be formed by the nesting of composite and
multi-valued attributes, notation to express entity with complex attributes:
Entity: an object that exists and is distinguishable from other objects. An entity is represented by a set of
attributes
Relationship sets:
Relationship set: a mathematical relation between 2 or more entities, each taken from entity sets
Entity Sets:
Entity set: a set of entities of the same type that share the same properties
Primary key: a subset of the attributes form a primary key of the entity set; that is, uniquely
identifying each member of the set. Underline indicates primary key attributes.
Rectangles represent entity sets. Attributes are listed inside entity rectangle.
Strong Entity Sets:
A strong entity set has a primary key to uniquely identify all its entities
Primary key of a strong entity set is represented by underlining it
Cardinality Constraints:
Express the number of entities to which another entity can be associated via a relationship set
One-to-one: an instructor is associated with at most one student via advisor, and a student is
associated with at most one instructor via advisor
One-to-many: an instructor is associated with several (including 0) students via advisor, a student is
associated with at most one instructor via advisor
Many-to-one: an instructor is associated with at most one student via advisor, and a student is
associated with several (including 0) instructors via advisor
Many-to-many: An instructor is associated with several (possibly 0) students via advisor. A student
is associated with several (possibly 0) instructors via advisor
E-R Diagram Symbols:
Anomaly: inconsistencies that can arise due to data changes in a database with insertion, deletion, and
update. These problems occur in poorly planned, un-normalised databases where all the data is stored in one
table (a flat-file database).
Insertions Anomaly: when the insertion of a data record is not possible without adding some
additional unrelated data to the record
Deletion Anomaly: when deletion of a data record results in losing some unrelated information that
was stored as part of the record that was deleted from a table
Update Anomaly: when a data is changed, which could involve many records having to be changed,
leading to the possibility of some changes being made incorrectly
Lossy decomposition: reconstruction of the original relation is not possible after decomposition; information is lost.
Lossless-Join Decomposition: original relation can be reconstructed by using natural join on sub relations.
Armstrong’s Axioms:
Reflexivity (trivial property): if β ⊆ α, then α → β (e.g., AB → B)
Augmentation: if α → β, then γα → γβ
Transitivity: if α → β and β → γ, then α → γ
These axioms are Sound; given a set of functional dependencies F specified on a relation schema R,
any dependency that we can infer from F by using the primary rules of Armstrong axioms holds in
every relation state r of R that satisfies the dependencies in F
and Complete; using primary rules of Armstrong axioms repeatedly to infer dependencies until no
more dependencies can be inferred results in the complete set of all possible dependencies that can
be inferred from F called the Closure Set F+ for FDs F.
Closure set of FDs (F+): A set of all FDs implied by the original set of dependencies F
Example: F = {A → B, B → C}
F + = {A → B, B → C, A → C}
Closure of Attribute Sets (α+): the set of attributes that are functionally determined by α under F,
where α is a set of attributes
Example: R = (A, B, C, G, H, I) ; F = {A → B, A → C, CG → H, CG → I, B → H}
We need to find (AG)+
a) result = AG (trivial)
b) result = ABCG (A → C and A → B)
c) result = ABCGH (CG → H and CG ⊆ AGBC)
d) result = ABCGHI (CG → I and CG ⊆ AGBCH)
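The computation above is mechanical, so here is a small Python sketch of it (the example FDs are the
ones from the text):
def attribute_closure(attrs, fds):
    # attrs: set of attribute names; fds: list of (lhs, rhs) pairs of sets
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # if the left side is already determined, add the right side
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

fds = [({'A'}, {'B'}), ({'A'}, {'C'}), ({'C', 'G'}, {'H'}),
       ({'C', 'G'}, {'I'}), ({'B'}, {'H'})]
print(sorted(attribute_closure({'A', 'G'}, fds)))  # ['A', 'B', 'C', 'G', 'H', 'I']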
Canonical Cover/ Minimal Cover/ Irreducible Sets: a minimal set of FDs. Should satisfy the
following conditions:
Equivalent to F (F+ = Fc+)
No FD in Fc contains an extraneous attribute
Each left side of an FD in Fc is unique; that is, there are no two dependencies α1 → β1 and α2 → β2
in Fc such that α1 = α2
Normalisation:
Second Normal Form (2NF):
Should be in 1NF
A proper subset of a candidate key is not allowed to functionally determine a non-prime attribute
(no partial dependencies)
Third Normal Form (3NF):
Should be in 2NF
Should not contain any Transitive Dependency
Transitive Dependency: a non-prime attribute functionally determining another non-prime attribute;
if an indirect relationship causes a functional dependency, i.e. A → B and B → C are true, then
A → C is a transitive dependency
There is always a lossless-join, dependency-preserving decomposition into 3NF, which is why it’s the
most common normal form
Example:
Example of decomposition:
The functional dependencies for this relation schema are (will be given):
a) customer_id, employee_id → branch_name, type
b) employee_id → branch_name
c) customer_id, branch_name → employee_id
Boyce-Codd Normal Form (BCNF):
Should be in 3NF
For every non-trivial functional dependency (A → B), A must be a superkey (a candidate key suffices,
since every candidate key is a superkey). This is called the superkey restriction
BCNF eliminates all redundancies/anomalies that can be detected using functional dependencies
A lossless-join decomposition into BCNF is always possible, but it may not preserve all dependencies
Example of decomposition:
R = (A, B, C)
F = {A → B, B → C}
Key = {A}
R is not in BCNF (B → C but B is not superkey)
Decomposition:
R1 = (B, C)
R2 = (A, B)
Comparison of Normal Forms: each form strengthens the previous one; every 2NF relation is in 1NF,
every 3NF relation is in 2NF, every BCNF relation is in 3NF, and every 4NF relation is in BCNF
Multivalued Dependency:
Multivalued dependency occurs when two attributes in a table are independent of each other but,
both depend on a third attribute
A multivalued dependency consists of at least two attributes that are dependent on a third attribute
that's why it always requires at least three attributes
For a dependency A →→ B, if for a single value of A, multiple values of B exist (independently of
the remaining attributes), then the relation has a multivalued dependency
Example:
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of
each other
BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR
BIKE_MODEL multidetermined MANUF_YEAR and COLOR
We use multivalued dependencies in two ways:
To test relations to determine whether they are legal under a given set of functional and
multivalued dependencies
To specify constraints on the set of legal relations. We shall thus concern ourselves only with
relations that satisfy a given set of functional and multivalued dependencies
MVD Theory:
Fourth Normal Form (4NF):
Should be in BCNF
Should have no non-trivial multivalued dependency
A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if
for all multivalued dependencies in D + of the form A →→ B, where A ⊆ R and B ⊆ R, at least one
of the following hold:
A →→ B is trivial (that is, B ⊆ A or A ∪ B = R)
A is a superkey for schema R
Example:
The given student relation is in BCNF but not in 4NF because STU_ID multidetermined COURSE
and HOBBY, so to make the above table into 4NF, we can decompose it into two tables:
Algorithm to decompose to 4NF:
For all dependencies A →→ B in D +, check if A is a superkey by using attribute closure
If not, then
Choose a dependency in D+ that breaks the 4NF rules, say A →→ B
Create R1 = AB
Create R2 = (R – (B – A))
Repeat for R1, and R2
Example of decomposition:
Temporal Databases
Historical data: data collected about past events and circumstances pertaining to a particular subject,
include time-dependent/time-varying data
Examples: medical records, judicial records, share prices, exchange rates, interest rates, company
profits
Temporal data: have an associated time interval during which the data are valid
Temporal databases: provide a uniform and systematic way of dealing with historical data
Snapshot: the value of the data at a particular point in time
Temporal Relation is one where each tuple has associated time; either valid time or transaction time
or both associated with it:
Uni-Temporal Relations: has one axis of time, either Valid Time or Transaction Time
Bi-Temporal Relations: has both axes of time – Valid time and Transaction time. It includes
Valid Start Time, Valid End Time, Transaction Start Time, Transaction End Time
Example:
Advantages: the main advantage of bi-temporal relations is that they provide historical (valid
time) and rollback (transaction time) information. For example, you can answer a query on John’s
history, like: where did John live in the year 2001? The result of this query is obtained from the
valid time entries. The transaction time entries are what make the rollback information available
Disadvantages: more storage, complex query processing, complex maintenance including backup
and recovery
Application Programs and Architecture
Applications are used to manage internal systems of big organisations (ERP: Enterprise Resource
Planning)
Diversity and Unity are characteristics of application programs
Application Functionality:
Frontend or Presentation Layer / Tier: interacts with the user like display / view, input / output;
responsible for providing the graphical user interface (GUI)
Example: choose item, add to cart, checkout, pay, track order
Interfaces may be browser-based, mobile app, or custom
Middle or Application / Business Logic Layer / Tier: implements the functionality of the
application: links front and backend, differs based on the business logic, handles work flows,
Support functionality based on frontend interface.
Example: authentication, search / browse logic, pricing, cart management, payment handling
(gateway), order management (mail / SMS / internal actions), delivery management
Backend or Data Access Layer / Tier: manages persistent data, large volume, efficient access,
security
Example: User, Cart, Inventory, Order, Vendor databases
All applications have this structural requirement which is translated into the following architecture
Application Architecture:
Database architecture focuses on the design, development, implementation and maintenance of
computer programs that store and organize information for businesses, agencies and institutions
Business Logic Layer / Tier: returns a response to the controller; provides high level view of
data and actions on data, often using an object data model, hides details of data storage schema.
Provides abstractions of entities (create objects)
Has very complex business logic that can’t be implemented as part of the database.
Enforces business rules for carrying out actions
Supports workflows which define how a task involving multiple participants is to be carried out
Data Access Layer / Tier: interfaces between business logic layer and the underlying database,
provides mapping from object model of business layer to relational model of database. Relational
database design
Architecture Classification:
The design of a DBMS depends on its architecture. It can be centralized, decentralized or
hierarchical
The architecture of a DBMS can be seen as either single tier or multi-tier:
1-tier architecture: involves putting all of the required components for a software
application or technology on a single server or platform, the simplest and most direct way
2-tier architecture: based on client–server architecture; the application runs at the client and invokes database functionality at the server
3-tier architecture: Presentation, Logic and Data Access layers
n-tier architecture: an n-tier architecture distributes different components of the 3 tiers
between different servers and adds interfaces tiers for interactions and workload balancing
Web Fundamentals:
World Wide Web (www): The Web is a distributed information system based on hypertext
Hypertext: text with references (hyperlinks) to other text that a reader can access
HyperText Markup Language (HTML)
Uniform Resource Identifier (URI): identifier of doc
Uniform Resource Locator (URL): location of doc
Uniform Resource Name (URN): name of doc
HyperText Transfer Protocol (HTTP): used for communication with the Web server, connectionless
Session: time for which a virtual connection to a web page remains
Cookie: small piece of text containing identifying information, can be stored permanently or for a
limited time
Web browser: application software for accessing the www
Web server: software and underlying hardware that accepts requests via HTTP or its secure variant
HTTPS
Representational State Transfer (REST): allows use of a standard HTTP request to a URL to execute a
request and return data
JavaScript Object Notation (JSON)
Script: a list of (text) commands that are embedded in a web-page or in the server
Scripting language: the programming languages in which scripts are written
Client-Side Scripting: Client-side scripting is responsible for interaction within a web page. The
client-side scripts are firstly downloaded at the client-end and then interpreted and executed by the
browser; Javascript
Server-Side Scripting: Server-side scripting is responsible for the completion or carrying out a task
at the server-end and then sending the result to the client-end; Servlets, Java Server Pages (JSPs),
PHP
Embedded SQL: put SQL commands in some format inside native programming language,
embedded SQL works with C, C++, Java, COBOL, FORTRAN and Pascal
Connectionist Framework:
Connectionist Bridge Configurations: a bridge is a special kind of driver that uses another
driver-based technology. This driver translates source function-calls into target function-calls.
Programmers usually use such a bridge when they lack a source driver for some database but have
access to a target driver
a) Create connection:
psycopg2.connect(database="dbname", user="username", password="mypass", host="127.0.0.1",
port="5432")
b) Create cursor:
connection.cursor()
Steps:
Explanation:
we need to import sys, os and psycopg2 because they help us connect to the database
info about the database we need to connect to will be given in the question, store that info
open file using command open (‘filename.txt’, ‘r’), here filename is name.txt; and store in variable,
here I used file
read file using command file.read() and store in variable, here I used fname because the file contains
the student’s first name
after creating a connection and cursor, use cursor to execute select query
question says to write a Python program to print the roll number of the student whose first name is
given in a file named ‘name.txt’
in the select query we select roll_no from students where student_fname = %s [note: %s is a
placeholder; we use = here, unlike pattern matching, where we would use like]
the placeholder only accepts tuples so we have to mention the variable into which we read the file,
the full format of the query will then be;
(‘select roll_no from students where student_fname = %s’, (fname,))
store the results from the query before printing them, the first part of the tuple contains the roll_no so
we will only print that, hence row[0] is used while printing
close the cursor and connection, it is considered good practice to do so
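The program itself is not reproduced in these notes, so here is a sketch reconstructed from the
explanation above (the credentials, the students table, and the roll_no / student_fname column names
are the assumptions stated in the steps):
import sys, os, psycopg2  # modules that help us connect to the database

# info about the database (given in the question)
host, port = "127.0.0.1", "5432"
database, user, password = "university", "username", "mypass"

file = open('name.txt', 'r')  # open the file containing the first name
fname = file.read().strip()   # read the student's first name

conn = psycopg2.connect(database=database, user=user, password=password,
                        host=host, port=port)
cur = conn.cursor()

# %s is a placeholder; parameters must be passed as a tuple
cur.execute('select roll_no from students where student_fname = %s', (fname,))
rows = cur.fetchall()

for row in rows:
    print(row[0])  # the first field of each result tuple is the roll_no

cur.close()   # good practice: close the cursor and the connection
conn.close()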
Steps:
1. import relevant libraries (we need sys, os, psycopg2 and datetime) [line 1]
2. store info about database, will be given in question [line 3 to 7]
3. open and read file content [line 9&10]
4. create connection [line 12]
5. create cursor [line 13]
6. execute query [line 15]
7. store results [line 16]
8. print results [line 18 to 22]
9. close cursor [line 25]
10. close connection [line 26]
Explanation:
we need to import datetime because the question requires us to extract the year from date of birth
which is a datetime object
after the results have been stored, proceed to extract the year and check if its odd or even
row[2] stores the dob (row[0] stores name and row[1] stores department name); use
row[2].year to extract the year
format the string that needs to be printed using +
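The code for this program is also missing, so here is a sketch following the numbered steps (the
line numbers cited in the steps refer to the original program; the table layout and print format
below are assumptions):
import sys, os, psycopg2, datetime  # datetime, because dob is a datetime object

host, port = "127.0.0.1", "5432"
database, user, password = "university", "username", "mypass"

file = open('name.txt', 'r')
fname = file.read().strip()

conn = psycopg2.connect(database=database, user=user, password=password,
                        host=host, port=port)
cur = conn.cursor()

cur.execute('select student_name, dept_name, dob from students where student_fname = %s',
            (fname,))
rows = cur.fetchall()

for row in rows:
    year = row[2].year  # row[2] is the dob, a datetime.date
    parity = 'odd' if year % 2 == 1 else 'even'
    print(row[0] + ' (' + row[1] + ') was born in an ' + parity + ' year: ' + str(year))

cur.close()
conn.close()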
Primary Storage:
Cache: fastest and most costly form of storage, volatile, managed by the computer system hardware
Main memory: fast access, expensive, too small
Secondary Storage:
Flash memory: non-volatile, can support a limited number of write/erase cycles, reads are roughly
as fast as main memory, writes are slow (few microseconds), erase is slower
Magnetic Disk (IMPORTANT):
data is stored on spinning disk, and read/written magnetically
data must be moved from disk to main memory for access, and written back for storage -
much slower access than main memory
direct-access
non-volatile
large capacity
both surfaces can be read at the same time
tracks: surface of platter divided into circular tracks
sectors: each track is divided into sectors; a sector is the smallest unit of data read or
written; sector size typically 512 bytes; all sectors are the same size
The number of cylinders of a disk drive exactly equals the number of tracks on a single
surface in the drive
Disk Controller: interfaces between the computer system and the disk drive hardware;
accepts high-level commands to read or write a sector; initiates actions moving the disk arm
to the right track, reading or writing the data; computes and attaches checksums to each
sector to verify that data is read back correctly; ensures successful writing by reading back the
sector after writing it; performs remapping of bad sectors
Access Time: time from a read or write request issue to start of data transfer:
Seek Time: time to reposition the arm over the correct track
Rotational Latency: time for the sector to be accessed to appear under the head; on average,
rotational latency (in ms) = (1 / 2) x (1 / rotational speed in revolutions per second) x 1000
Total access time = seek time + rotational latency
Data-transfer Rate: the rate at which data can be retrieved from or stored to the disk
Mean Time To Failure (MTTF): avg. time the disk is expected to run continuously without
any failure
Example questions:
Question
Consider a magnetic disk with the following specifications. The magnetic disk consists of 16
platters, and has information recorded on both the surfaces of each platter. Each platter’s
surface is logically divided into 128 tracks, each of which is subdivided into 256 sectors. Find
the storage capacity of a track. (Given, sector size is 512 bytes.)
Answer
To find the storage capacity of a track, we need to know the number of sectors in a track and
the size of each sector.
We know that each platter has 128 tracks and there are 16 platters, so the total number of
tracks is:
128 tracks/platter x 16 platters = 2048 tracks
We also know that each track is logically divided into 256 sectors and each sector has a size
of 512 bytes. Therefore, the storage capacity of a track is:
256 sectors/track x 512 bytes/sector = 131,072 bytes/track
Therefore, the storage capacity of a track is 131,072 bytes or 128 kilobytes
Question:
Consider a magnetic disk with 8 platters, 2 surfaces/platter, 1024 tracks/surface, 2048
sectors/track, and 512 bytes/sector. The disk rotates with 6000 revolutions per minute and
seek time is 3ms. What is the capacity of the disk? What will be the rotational latency? What
is the minimum number of bits required for addressing all the sectors?
Answer:
Capacity –
To calculate the capacity of the disk, we first need to find the total number of sectors on the
disk, and then multiply that by the size of each sector.
The total number of sectors on the disk can be calculated as follows:
Number of platters = 8 Number of surfaces/platters = 2 Number of tracks/surfaces = 1024
Number of sectors/tracks = 2048
Total number of sectors = Number of platters x Number of surfaces/platter x Number of
tracks/surface x Number of sectors/track = 8 x 2 x 1024 x 2048 = 33,554,432 sectors
The size of each sector is given as 512 bytes. Therefore, the total capacity of the disk can be
calculated as:
Capacity of disk = Total number of sectors x Size of each sector = 33,554,432 sectors x 512
bytes/sector = 17,179,869,184 bytes or 16GB
Rotational Latency-
Rotational latency = (1/2) x (1/rotational speed) x 1000
where rotational speed is the speed of the disk in revolutions per second.
In this case, the rotational speed is given as 6000 revolutions per minute, or 100 revolutions
per second. Therefore, we can calculate the rotational latency as:
Rotational latency = (1/2) x (1/100) x 1000 = 5 milliseconds
Minimum number of bits –
The disk has 33,554,432 = 2^25 sectors in total, so 25 bits are required to address every sector
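These figures are easy to check mechanically; a small Python sketch using the numbers from the
question:
import math

platters, surfaces = 8, 2
tracks, sectors, sector_bytes = 1024, 2048, 512
rpm = 6000

total_sectors = platters * surfaces * tracks * sectors
capacity_bytes = total_sectors * sector_bytes

print(capacity_bytes // 2**30, "GB")                # 16 GB
print(0.5 * (1 / (rpm / 60)) * 1000, "ms latency")  # 5.0 ms
print(math.ceil(math.log2(total_sectors)), "bits")  # 25 bits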
Tertiary Storage:
Optical Disks: non-volatile; data is read optically from a spinning disk using a laser
Magnetic Tapes: hold large volumes of data and provide high transfer rates; tapes are cheap, but
cost of drives is very high; very slow access time in comparison to magnetic and optical disks;
limited to sequential access; used mainly for backup, for storage of infrequently used information,
and as an off-line medium for transferring information from one system to another
Cloud Storage:
Cloud storage is purchased from a third-party cloud vendor who owns and operates data storage
capacity and delivers it over the Internet in a pay-as-you-go model
These cloud storage vendors manage capacity, security and durability to make data accessible to
applications all around the world
Applications access cloud storage through traditional storage protocols or directly via an API
Other Storages:
Flash Drives: USB flash drives are removable and rewritable storage devices that, as the name
suggests, require a USB port for connection and utilize non-volatile flash memory technology
A Secure Digital (SD Card): a type of removable memory card used to read and write large quantities
of data
Flash Storage
Solid-State Drives (SSD): SSDs replace traditional mechanical hard disks by using flash-based
memory, which is significantly faster; SSDs speed up computers significantly due to their low read-
access times and fast throughput
File Structure
File Organisation:
A database is a collection of files; A file is a sequence of records; a record is a sequence of fields
A database file is partitioned into fixed-length storage units called blocks
Storage Access:
Database system seeks to minimize the number of block transfers between the disk and memory
Buffer: portion of main memory available to store copies of disk blocks
Buffer Manager: subsystem responsible for allocating buffer space in main memory
Most operating systems replace the block least recently used (LRU strategy)
Pinned block: memory block that is not allowed to be written back to disk
Toss-immediate strategy: frees the space occupied by a block as soon as the final tuple of that block
has been processed
Most recently used (MRU) strategy: system must pin the block currently being processed. After
the final tuple of that block has been processed, the block is unpinned, and it becomes the most
recently used block
Buffer managers also support forced output of blocks for the purpose of recovery
Basic Concepts:
Indexing mechanisms are used to speed up access to desired data
Search Key: attribute or set of attributes used to look up records in a file
An index file consists of records (called index entries) of the form (search-key, pointer)
Index files are typically much smaller than the original file
Ordered indices: search keys are stored in sorted order, update is costlier
Hash indices: search keys are distributed uniformly across buckets using a hash function
Index Evaluation Metrics: access types supported efficiently, access time, insertion time, deletion
time, space overhead
Ordered Indices:
Index entries are stored sorted on the search key value
Primary index/ clustering index: in a sequentially ordered file, the index whose search key
specifies the sequential order of the file; the search key of a primary index is usually but not
necessarily the primary key
Secondary index/ non-clustering index: an index whose search key specifies an order different
from the sequential order of the file, have to be dense
Here, since name is the search key that the file is sorted by (blue table) it is the primary index and the
phone number is the secondary index
Index-sequential file: ordered sequential file with a primary index
Dense index: index points to every search-key value in the file, very large in size
Sparse index: contains index records for only some search-key values, less space and less
maintenance overhead for insertions and deletions, generally slower than dense index for locating
records
B+ Tree Index Files:
Is a balanced search tree (not a binary tree: each node can have many children)
Follows a multi-level index format like a 2-3-4 Tree
Has the leaf nodes denoting actual data pointers
Ensures that all leaf nodes remain at the same height (like a 2-3-4 Tree)
Has the leaf nodes linked using a linked list
Can support random access as well as sequential access
Each node that is not a root or a leaf has between ⌈n/2⌉ and n children
A leaf node has between ⌈(n−1)/2⌉ and n − 1 values
p−1 is the max no. of keys in a node where p is the order; p is the max no. of children
Data pointers don’t need to be repeated in the intermediate (internal) nodes; they appear only in the
leaves
B-Tree:
B-tree allows search-key values to appear only once
3 order B tree means that nodes can have 2 data points and three children
In a B-tree of order m, a non-root node can have at least ⌈m/2⌉ child pointers and ⌈m/2⌉ - 1 keys
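The order p is usually derived from the block size: an internal node with p child pointers and p − 1
keys must fit in one block. A sketch under assumed sizes (block 512 bytes, key 10 bytes, pointer 6
bytes, matching the question below):
block, key, ptr = 512, 10, 6

# largest p such that p pointers and (p - 1) keys fit in one block
p = (block + key) // (ptr + key)
print(p)  # 32, since 32*6 + 31*10 = 502 <= 512 but 33*6 + 32*10 = 518 > 512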
Questions:
1. A telecom company has 2^16 customer records in a table T in their database. These records are
sorted in ascending order of the attribute customer_id, which is also the primary key of T. The data
file is stored in a disk with a block size of 512 bytes. Assume that, in each block, the records are
unspanned and are of fixed-length. Each record is of size 32 bytes, the size of the primary key field is
10 bytes and the size of the block pointer is 6 bytes. If a primary (sparse index with an index entry
for every block in the file) index is created on the data file, what is the minimum number of blocks
required for the index file?
Options: A. 64
B. 128
C. 256
D. 512
Answer:
Calculate the total size of the data file: Total size of data file = Number of customer records * Size of
each record = 2^16 * 32 bytes = 2,097,152 bytes
Calculate the number of blocks required to store the data file: Number of blocks for data file = Total
size of data file / Block size = 2,097,152 bytes / 512 bytes = 4096 blocks
Calculate the size of each index entry: Size of each index entry = Size of primary key field + Size of
block pointer = 10 bytes + 6 bytes = 16 bytes
Since primary index is a sparse index with an index entry for every block in the file, no. of entries is
the same as total no. of blocks which is 4096
Calculate the size of the index file: Size of index file = Number of index entries * Size of each index
entry = 4096 entries * 16 bytes = 65,536 bytes
Calculate the number of blocks required to store the index file: Number of blocks for index file =
Size of index file / Block size = 65,536 bytes / 512 bytes = 128 blocks
Transactions
A transaction is a unit of program execution that accesses and, possibly updates, various data items
Failures of various kinds, such as hardware failures and system crashes and concurrent execution
of multiple transactions are the two main issues to deal with in transactions
Example (the standard funds-transfer transaction, referenced below): transfer $50 from account A to
account B:
1. read(A); 2. A := A − 50; 3. write(A); 4. read(B); 5. B := B + 50; 6. write(B)
Required Properties of a Transaction: ACID (Important):
Atomicity: in the example, if the transaction fails after step 3 and before step 6, money will be “lost”
leading to an inconsistent database state
Atomicity guarantees that all of the commands that make up a transaction are treated as a
single unit and either succeed or fail together. So, the system should ensure that updates of a
partially executed transaction are not reflected in the database
All or nothing
Consistency: consistency guarantees that changes made within a transaction are consistent with
database constraints. This includes all rules, constraints, and triggers. If the data gets into an illegal
state, the whole transaction fails
Preserves database integrity
For example, there is a constraint that the balance should be a positive integer. If we try to
overdraw money, then the balance won’t meet the constraint. Because of that, the consistency
of the ACID transaction will be violated and the transaction will fail
Isolation: isolation ensures that all transactions run in an isolated environment. That enables running
transactions concurrently because transactions don’t interfere with each other
Execute as if they were run alone
Durability: durability guarantees that once a transaction has been committed, it will remain
committed even in the case of a system failure (like power outage or crash). Durability guarantees
that once the transaction completes and changes are written to the database, they are persisted. This
ensures that data within the system will persist even in the case of system failures like crashes or
power outages
Results are not lost by a failure
Transaction States:
Active: the initial state; the transaction stays in this state while it is executing
Partially committed: after the final statement has been executed
Failed: after the discovery that normal execution can no longer proceed
Aborted: after the transaction has been rolled back and the database restored to its state prior to the
start of the transaction. Two options after it has been aborted: restart the transaction (can be done
only if no internal logical error) or kill the transaction
Committed: after successful completion
Terminated: after it has been committed or aborted (killed)
Concurrent Executions:
Multiple transactions are allowed to run concurrently in the system; increased processor and disk
utilization; reduced average response time
Concurrency Control Schemes: mechanisms to achieve isolation
Schedule: A sequence of instructions that specify the chronological order in which instructions of
concurrent transactions are executed
Must consist of all instructions of those transactions
Must preserve the order in which the instructions appear in each individual transaction
Schedules 1 to 4: example schedules shown in the lecture slides (figures not reproduced here)
Serializability:
Serializability helps to ensure Isolation and Consistency of a schedule
Assumption: Each transaction preserves database consistency
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule
Conflicting operations: Read and Write are considered as conflicting operations, if they hold the
following conditions:
Both the operations are on the same data
Both the operations (Read and Write) belong to different transactions
At least one of the two operations is a write operation
Non-conflicting operations: Read and Write are considered as non-conflicting operations, if they
hold the following conditions:
Both the operations are on different data items, or both are read operations, or both belong to
the same transaction
Conflict Serializability:
Conflict equivalent: if a schedule S can be transformed into a schedule S’ by a series of swaps of
non-conflicting instructions, we say that S and S’ are conflict equivalent
Conflict Serializability: we say that a schedule S is conflict serializable if it is conflict equivalent to
a serial schedule
Example of a conflict serializable schedule:
Example of a schedule that is not conflict serializable:
Not all serializable schedules are conflict-serializable, but all conflict-serializable schedules
are serializable
Precedence Graph/ Serialization graph/ Conflict graph: a directed graph where the vertices are
the transactions (names); we draw an arc from Ti to Tj if the two transactions conflict, and Ti
accessed the data item on which the conflict arose earlier. A schedule is conflict serializable if
and only if its precedence graph is acyclic
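A small Python sketch of that test (a schedule is given as a list of (transaction, operation, item)
triples; the names are illustrative):
def conflict_serializable(schedule):
    # schedule: list of (txn, op, item) with op in {'R', 'W'}, in execution order
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # conflicting pair: same item, different txns, at least one write
            if x == y and ti != tj and 'W' in (op_i, op_j):
                edges.add((ti, tj))  # Ti accessed the item first: edge Ti -> Tj
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visiting, done = set(), set()
    def has_cycle(u):  # serializable iff the precedence graph is acyclic
        visiting.add(u)
        for v in graph.get(u, ()):
            if v in visiting or (v not in done and has_cycle(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False
    return not any(has_cycle(u) for u in list(graph) if u not in done)

# T1 finishes with A before T2 touches it: equivalent to the serial order T1, T2
print(conflict_serializable([('T1', 'R', 'A'), ('T1', 'W', 'A'),
                             ('T2', 'R', 'A'), ('T2', 'W', 'A')]))  # True
# T2 writes A between T1's read and write: edges T1 -> T2 and T2 -> T1, a cycle
print(conflict_serializable([('T1', 'R', 'A'), ('T2', 'W', 'A'),
                             ('T1', 'W', 'A')]))  # False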
View Serializability:
View Equivalent: Let S and S’ be two schedules with the same set of transactions. S and S’ are view
equivalent if the following three conditions are met, for each data item Q,
Initial Read: If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’
also transaction Ti must read the initial value of Q
Write-Read Pair: If in schedule S transaction Ti executes read(Q), and that value was
produced by transaction Tj (if any), then in schedule S’ also transaction Ti must read the
value of Q that was produced by the same write(Q) operation of transaction Tj
Final Write: The transaction (if any) that performs the final write(Q) operation in schedule S
must also perform the final write(Q) operation in schedule S’
View serializable: a schedule S is view serializable if it is view equivalent to a serial schedule
Every conflict serializable schedule is also view serializable
Below is a schedule which is view-serializable but not conflict serializable:
Blind write: performing a write operation without having performed a read operation
Every view serializable schedule that is not conflict serializable has blind writes
Recoverability:
Recoverable schedule: if a transaction Tj reads a data item previously written by a transaction Ti,
then Ti must commit before Tj commits
Cascadeless schedule: transactions read only items written by already-committed transactions, so the
failure of one transaction cannot trigger cascading rollbacks; every cascadeless schedule is
recoverable
The RELEASE SAVEPOINT command is used to remove a SAVEPOINT that you have created
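A sketch of savepoint usage inside a transaction (Postgres-style; table and names illustrative):
begin;
update accounts set balance = balance - 50 where acct_no = 'A-101';
savepoint after_debit;
update accounts set balance = balance + 50 where acct_no = 'A-215';
rollback to savepoint after_debit;  -- undo the credit, keep the debit
release savepoint after_debit;      -- the savepoint is no longer needed
commit;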
Concurrency Control
A database must provide a mechanism that will ensure that all possible schedules are both: conflict
serializable and recoverable (preferably, cascadeless)
Goal: To develop concurrency control protocols that will assure serializability
One way to ensure isolation is to require that data items be accessed in a mutually exclusive manner;
that is, while one transaction is accessing a data item, no other transaction can modify that data item
The most common method used to implement the locking requirement is to allow a transaction to access
a data item only if it is currently holding a lock on that item
Lock-Based Protocols:
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks
A lock is a mechanism to control concurrent access to a data item
Data items can be locked in two modes:
Exclusive (X) mode: data item can be both read as well as written; X-lock is requested using
lock-X instruction
Shared (S) mode: data item can only be read; S-lock is requested using lock-S instruction
A transaction can unlock a data item Q by the unlock(Q) Instruction
Lock requests are made to the concurrency-control manager by the programmer
Transaction can proceed only after request is granted
Lock-Compatibility Matrix: states whether a data item can be locked by two transactions at the same
time; shared is compatible with shared, while exclusive is compatible with neither shared nor
exclusive
Requesting for / Granting of a Lock: a transaction may be granted a lock on an item if the
requested lock is compatible with locks already held on the item by other transactions
Sharing a Lock: any number of transactions can hold shared locks on an item; but if any transaction
holds an exclusive lock on the item, no other transaction may hold any lock on it
Waiting for a Lock: if a lock cannot be granted, the requesting transaction is made to wait till all
incompatible locks held by other transactions have been released
Holding a Lock: a transaction must hold a lock on a data item as long as it accesses that item
Unlocking / Releasing a Lock: transaction Ti may unlock a data item that it had locked at some
earlier point; it is not necessarily desirable for a transaction to unlock a data item immediately after
its final access of that data item, since serializability may not be ensured
Implementation of Locking:
A lock manager can be implemented as a separate process to which transactions send lock and
unlock requests
The lock manager maintains a data-structure called a lock table to record granted locks and pending
requests
Deadlock handling:
Deadlock Prevention protocols ensure that the system will never enter into a deadlock state
Some prevention strategies:
Require that each transaction locks all its data items before it begins execution
(pre-declaration)
Impose partial ordering of all data items and require that a transaction can lock data items
only in the order specified by the partial order
Transaction Timestamp: Timestamp is a unique identifier created by the DBMS to identify the
relative starting time of a transaction. Timestamping is a method of concurrency control in which
each transaction is assigned a transaction timestamp
Following schemes use transaction timestamps for the sake of deadlock prevention alone:
wait-die scheme (non-pre-emptive): older transaction may wait for younger one to release
data item; younger transactions never wait for older ones, they are rolled back instead. A
transaction may die several times before acquiring needed data item
wound-wait scheme (pre-emptive): older transaction wounds (forces rollback) of younger
transaction instead of waiting for it; younger transactions may wait for older ones. May be
fewer rollbacks than wait-die scheme
Timeout-Based Schemes:
A transaction waits for a lock only for a specified amount of time. If the lock has not been
granted within that time, the transaction is rolled back and restarted
Thus, deadlocks are not possible
Simple to implement; but starvation is possible. Also, difficult to determine good value of the
timeout interval
The system is in a deadlock state if and only if the wait-for graph has a cycle
Timestamp-Based Protocols:
The protocol manages concurrent execution such that the time-stamps determine the serializability
order
In order to assure such behaviour, the protocol maintains for each data Q two timestamp values:
W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q)
successfully
R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q)
successfully
The timestamp ordering protocol ensures that any conflicting read and write operations are executed
in timestamp order
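The notes state the goal but not the checks themselves; the standard timestamp-ordering rules, as a
Python sketch (TS is the requesting transaction's timestamp; R_ts and W_ts are the per-item
timestamps defined above):
def tso_read(TS, item):
    # Ti wants to read Q: reject if a younger transaction has already written Q
    if TS < item['W_ts']:
        return 'rollback'                    # the value Ti needs was overwritten
    item['R_ts'] = max(item['R_ts'], TS)
    return 'read allowed'

def tso_write(TS, item):
    # Ti wants to write Q: reject if a younger transaction already read or wrote Q
    if TS < item['R_ts'] or TS < item['W_ts']:
        return 'rollback'
    item['W_ts'] = TS
    return 'write allowed'

q = {'R_ts': 0, 'W_ts': 0}
print(tso_write(5, q), tso_read(3, q))  # 'write allowed' then 'rollback' (3 < W_ts = 5)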
Timestamp protocol ensures freedom from deadlock as no transaction ever waits but the schedule
may not be cascade-free, and may not even be recoverable
Backup
Differential backup: backs up only the data that has changed since the last full backup
Recovery on any given day only needs the data from the full backup and the most recent
differential backup
Advantages: recoveries require fewer backup sets, better recovery options
Disadvantages: storage may exceed incremental backups, and can even reach the size of a full
backup if a full backup is not taken for a long time
Hot backup: refers to keeping a database up and running while the backup is performed
concurrently
Preferable whenever high availability is a requirement
Dynamic data and systems which run 24x7 need hot backups
Advantages: high availability of database, easier point-in-time recovery, best for dynamic and
modularized data
Disadvantages: may not be feasible when the data set is huge and monolithic, low fault
tolerance, high cost of maintenance and setup
Mainly used for Transaction Log Backup
RAID 0 (Striping):
RAID level-0 only uses data striping, no redundant information is maintained
Effective space utilization for a RAID Level-0 system is always 100 percent
best write performance
least costly
Reliability is very poor
RAID 1 (Mirroring):
RAID 1 employs mirroring, maintaining two identical copies of the data on two different
disks
most expensive solution
excellent fault tolerance
parallel reads
transfer rate for a single request is comparable to the transfer rate of a single disk
The effective space utilization is 50 percent
RAID 2 (Parity):
RAID 2 uses designated drive for parity
In RAID 2, the striping unit is a single bit
Hamming code
Effective space utilization = n / (n + 1) when one check disk serves n data disks
Recovery
Recovery is the process of restoring the database to its latest known consistent state after a system
failure occurs
A Database Log records all transactions in a sequence. Recovery using logs is quite popular in
databases. A typical log file contains information about transactions to execute, transaction states,
and modified values
Failure Classifications: transaction failure (logical and system errors), system crash, disk failure
Recovery algorithms have two parts
Actions taken during normal transaction processing to ensure enough information exists to
recover from failures
Actions taken after a failure to recover the database contents to a state that ensures atomicity,
consistency and durability
Storage Structure:
Volatile Storage: does not survive system crashes, examples: main memory, cache memory
Nonvolatile Storage: survives system crashes, examples: disk, tape, flash memory, non-volatile
(battery backed up) RAM; but may still fail, losing data
Stable Storage: a mythical form of storage that survives all failures, approximated by maintaining
multiple copies on distinct non-volatile media
System Buffer blocks are those blocks residing temporarily in main memory.
Physical blocks are the blocks residing on the disk
Undo of a log record < Ti, X, V1, V2 > writes the old value V1 to X; when undo of a
transaction is complete, a log record < Ti abort> is written out; the undo is used for
transaction rollback during normal operation
Redo of a log record < Ti, X, V1, V2 > writes the new value V2 to X; no logging done
The undo and redo operations are used during recovery from failure
Write a log record < checkpoint L > onto stable storage where L is a list of all transactions
active at the time of checkpoint
Any transactions that committed before the last checkpoint should be ignored
Any transactions that committed since the last checkpoint need to be redone
Any transaction that was running at the time of failure needs to be undone and restarted
In the slide example (figure not reproduced): T1 committed before the checkpoint, so it can be
ignored; T2 and T3 committed after the checkpoint and need to be redone; T4 was active at the time
of failure and needs to be undone and restarted
The deferred-modification scheme performs updates to buffer/disk only at the time of transaction
commit
A transaction is said to have committed when its commit log record is output to stable storage; all
previous log records of the transaction must have been output already
Writes performed by a transaction may still be in the buffer when the transaction commits, and may
be output later
Query Processing