Cse V Database Management Systems
UNIT-I
i) Database
Ans: A database is a collection of related data, where data means recorded facts. It is a
computer-based repository of logically coherent data to which some meaning is attached.
ii) Canned transaction:
Ans: These are transactions that are carefully programmed and tested in advance. Ex: a bank
teller checking account balances and posting withdrawals/deposits.
iii) Data model:
Ans: It is a collection of concepts that can be used to describe the conceptual/logical structure of a
database; it provides the necessary means to achieve this abstraction.
iv) Metadata:
Ans: It is data about data. It is stored in the system catalog, which contains a description of the
structure of each file, the type and storage format of each data item, and the various constraints on
the data.
v) Data base designer:
Ans: They are responsible for identifying the data to be stored and for choosing an appropriate
way to organize it. They also define views for different categories of users.
2. Insulation between Programs and Data and Data Abstraction - Called program-data
independence. Allows changing data storage structures and operations without having to
change the DBMS access programs. The structure of data files is stored in the DBMS
catalog separately from the access programs.
3. Data Abstraction: A data model is used to hide storage details and present the users with
a conceptual view of the database.
4. Support of Multiple Views of the Data - Each user may see a different view of the
database, which describes only the data of interest to that user. (fig 1.4)
5. Sharing of Data and Multi-user Transaction Processing - the DBMS must include
concurrency control software to ensure that the result of multi-user access is correct.
5 ) Define and explain the following terms with an example for each.
8 Marks (Dec /Jan 2013, Jun / July2014)
i) Snapshot: The data in the database at a particular moment in time is called a database
state or snapshot. A database state (also called an instance) changes every time data is inserted,
deleted, or modified.
ii) Intension: When we define a new database, we specify only the database schema to the DBMS.
The schema is sometimes called the intension of the database.
iii) Extension: When a new database is first defined, its current state is the empty state with
no data. As data is loaded and updated, each database state is called an extension of the
schema.
6) Briefly discuss the advantages of using the DBMS 8 Marks (Dec/Jan 2014)
4. Providing Storage Structures for Efficient Query Processing: The DBMS maintains
indexes (typically in the form of trees and/or hash tables) that are utilized to improve the
execution time of queries and updates. The choice of which indexes to create and
maintain is part of physical database design and tuning (see Chapter 16) and is the
responsibility of the DBA.
The query processing and optimization module is responsible for choosing an efficient
query execution plan for each query submitted to the system. (See Chapter 15.)
5. Providing Backup and Recovery: The subsystem having this responsibility ensures that
recovery is possible in the case of a system crash during execution of one or more
transactions.
6. Providing Multiple User Interfaces: For example, query languages for casual users,
programming language interfaces for application programmers, forms and/or command
codes for parametric users, menu-driven interfaces for stand-alone users.
7. Representing Complex Relationships Among Data: A DBMS should have the
capability to represent such relationships and to retrieve related data quickly.
8. Enforcing Integrity Constraints: Most database applications are such that the semantics
(i.e., meaning) of the data require that it satisfy certain restrictions in order to make sense.
Dept. of CSE, SJBIT Page 4
Data Base Management System 10CS54
Perhaps the most fundamental constraint on a data item is its data type, which specifies
the universe of values from which its value may be drawn. (E.g., a Grade field could
be defined to be of type Grade_Type, which, say, we have defined as including precisely
the values in the set { "A", "A-", "B+", ..., "F" }.)
Another kind of constraint is referential integrity, which says that if the database includes
an entity that refers to another one, the latter entity must exist in the database. For
example, if (R56547, CIL102) is a tuple in the Enrolled_In relation, indicating that a
student with ID R56547 is taking a course with ID CIL102, there must be a tuple in the
Student relation corresponding to a student with that ID.
9. Permitting Inferencing and Actions Via Rules: In a deductive database system, one
may specify declarative rules that allow the database to infer new data, e.g., to figure out
which students are on academic probation. Such capabilities would take the place of
application programs that would otherwise be used to ascertain such information.
Active database systems go one step further by allowing "active rules" that can be used to
initiate actions automatically.
7 ) Discuss the main characteristics of the database approach. How does it differ
from traditional file systems? 8 Marks (Jun / July 2013, Jun / July2014)
Characteristics of the Database Approach are
Self-Describing Nature of a Database System - it has a complete definition or description of the
database structure and constraints. This definition is stored in the system catalog, which contains
information such as the structure of each file, the type and storage format of each data item, and
various constraints on the data. The information stored in the system catalog is called
metadata, and it describes the structure of the primary database. This allows the DBMS software to
work with different databases. (Fig 1.1)
Insulation between Programs and Data and Data Abstraction - Called program-data
independence. Allows changing data storage structures and operations without having to change
the DBMS access programs. The structure of data files is stored in the DBMS catalog separately
from the access programs.
Data Abstraction: A data model is used to hide storage details and present the users with a
conceptual view of the database.
Support of Multiple Views of the Data - Each user may see a different view of the database,
which describes only the data of interest to that user. (fig 1.4)
Sharing of Data and Multiuser Transaction Processing - the DBMS must include concurrency
control software to ensure that the result of multiuser access is correct.
9. Explain the three-schema architecture. What is the logical data independence and physical
data independence? 8 or 10 marks (June/July 2015/Jan 2016)
One important characteristic of the database approach is the insulation of programs and
data (see Chapter 1). We can define two types of data independence:
Logical data independence - the capacity to change the conceptual schema without having to
change external schemas or application programs.
Physical data independence - the capacity to change the internal schema without having to
change the conceptual (or external) schemas. Changes to the internal schema may be needed
because some physical files had to be reorganized.
10) Define the database and briefly explain the implicit properties of the database?
11) Define the following with examples: 10 marks (Dec 14/Jan 15)
ii) Complex attributes: We refer to an attribute that involves some combination of
multi-valuedness and compositeness as a complex attribute.
iii) Data model: a collection of concepts that can be used to describe the
conceptual/logical structure of a database; it provides the necessary means to achieve this
abstraction.
iv) Schema: the description of the database, which is specified during design and is not
expected to change often.
v) Metadata (i.e., data about data): it is stored in the so-called system catalog.
UNIT-II
Entity-Relationship Model
1 . List the summary of the notations for ER diagrams. Include symbols used in
ER diagram and their meaning. 8 Marks (Jun / July2014/Jan 2016)
2. Explain how role names are assigned in case of recursive relationships. Illustrate this
concept with an example. 8 Marks (Jun / July 2013/Jan 2016)
A relationship can relate two entities of the same entity type; for example, a SUPERVISION
relationship type relates one EMPLOYEE (in the role of supervisee) to another EMPLOYEE
(in the role of supervisor). This is called a recursive relationship type. The role names
(supervisor, supervisee) distinguish the meaning of each participation of EMPLOYEE.
A weak entity type must participate in an identifying relationship type with an owner
or identifying entity type.
Entities are identified by the combination of:
A partial key of the weak entity type
The particular entity they are related to in the identifying entity type
A weak entity type usually has a partial key, which is the set of attributes that can
uniquely identify weak entities that are related to the same owner entity. For
example, Name of DEPENDENT is the partial key.
4. Define an entity and an attribute. Explain the different types of attributes that occur
in an ER diagram model, with an example. 8 Marks (Dec /Jan 2013, Dec/Jan 2014)
Our focus now is on the second phase, conceptual design, for which the Entity-Relationship
(ER) Model is a popular high-level conceptual data model.
In the ER model, the main concepts are entity, attribute, and relationship.
Entities and Attributes
Entity: An entity represents some "thing" (in the miniworld) that is of interest to us, i.e.,
about which we want to maintain some data. An entity could represent a physical object (e.g.,
house, person, automobile, widget) or a less tangible concept (e.g., company, job, academic
course).
Attribute: An entity is described by its attributes, which are properties characterizing it. Each
attribute has a value drawn from some domain (set of meaningful values).
Example: A PERSON entity might be described by Name, BirthDate, Sex, etc., attributes,
each having a particular value.
What distinguishes an entity from an attribute is that the latter is strictly for the purpose of
describing the former and is not, in and of itself, of interest to us. It is sometimes said that an
entity has an independent existence, whereas an attribute does not. In performing data
modeling, however, it is not always clear whether a particular concept deserves to be
classified as an entity or "only" as an attribute.
zero or more academic degrees, dependents, or (if the person is a male living in Utah)
spouses! How can we model this via attributes AcademicDegrees, Dependents, and Spouses?
One way is to allow such attributes to be multi-valued (perhaps set-valued is a better term),
which is to say that we assign to them a (possibly empty) set of values rather than a single
value. To distinguish a multi-valued attribute from a single-valued one, it is customary to
enclose the former within curly braces (which makes sense, as such an attribute has a value
that is a set, and curly braces are traditionally used to denote sets). Using the PERSON
example from above, we would depict its structure in text as
PERSON(SSN, Name, BirthDate(Month, Day, Year),
       { AcademicDegrees(School, Level, Year) }, { Dependents }, ...)
Here we have taken the liberty to assume that each academic degree is described by a school,
level (e.g., B.S., Ph.D.), and year. Thus, AcademicDegrees is not only multi-valued but also
composite. We refer to an attribute that involves some combination of multi-valuedness and
compositeness as a complex attribute.
A more complicated example of a complex attribute is AddressPhone in Figure 3.5 (page 65).
This attribute is for recording data regarding addresses and phone numbers of a business. The
structure of this attribute allows for the business to have several offices, each described by an
address and a set of phone numbers that ring into that office. Its structure is given by
{ AddressPhone( { Phone(AreaCode, Number) },
                Address(StrAddr(StrNum, StrName, AptNum), City, State, Zip) ) }
Stored vs. derived attribute: Perhaps independent and derivable would be better terms for
these (or non-redundant and redundant). In any case, a derived attribute is one whose value
can be calculated from the values of other attributes, and hence need not be stored. Example:
Age can be calculated from BirthDate, assuming that the current date is accessible.
The Null value: In some cases a particular entity might not have an applicable value for a
particular attribute. Or that value may be unknown. Or, in the case of a multi-valued attribute, the
appropriate value might be the empty set.
Example: The attribute DateOfDeath is not applicable to a living person and its correct value
may be unknown for some persons who have died.
In such cases, we use a special attribute value (non-value?), called null. There has been some
argument in the database literature about whether a different approach (such as having distinct
values for not applicable and unknown) would be superior.
5. Define the following with an example. 20 Marks (Jun/July 2013 & June/July 2015)
i) Weak Entity Types: An entity type that has no set of attributes that qualify as a key is called
weak. (Ones that do are strong.) An entity of a weak entity type is uniquely identified by
the specific entity to which it is related (by a so-called identifying relationship that relates
the weak entity type with its so-called identifying or owner entity type) in combination with
some set of its own attributes (called a partial key).
Example: A DEPENDENT entity is identified by its first name together with the EMPLOYEE
entity to which it is related via DEPENDS_ON. (Note that this wouldn't work for former
heavyweight boxing champion George Foreman's sons, as they all have the name "George"!)
Because an entity of a weak entity type cannot be identified otherwise, that type has a total
participation constraint (i.e., existence dependency) with respect to the identifying
relationship.
This should not be taken to mean that any entity type on which a total participation constraint
exists is weak. For example, DEPARTMENT has a total participation constraint with respect
to MANAGES, but it is not weak.
In an ER diagram, a weak entity type is depicted with a double rectangle and an identifying
relationship type is depicted with a double diamond.
ii) Participation constraint
• participation: specifies whether or not the existence of an entity depends upon its
being related to another entity via the relationship.
total participation (or existence dependency): To say that entity type A is constrained to
participate totally in relationship R is to say that if (at some moment in time) R's
instance set is { (a1, b1), (a2, b2), ... (am, bm) },
then (at that same moment) A's instance set must be { a1, a2, ..., am }. In
other words, there can be no member of A's instance set that does not participate in at least
one instance of R. According to our informal description of COMPANY, every employee
must be assigned to some department. That is, every employee instance must participate in at
least one instance of WORKS_FOR, which is to say that EMPLOYEE satisfies the total
participation constraint with respect to WORKS_FOR.
Also note that, in our COMPANY example, all relationship instances will be ordered
pairs, as each relationship associates an instance from one entity type with an instance of
another (or the same, in the case of SUPERVISES) entity type. Such relationships
are said to be binary, or to have degree two. Relationships with degree three (called
ternary) or more are also possible, although not as common.
SUPPLY (perhaps not the best choice for a name) has as instances ordered triples of
suppliers, parts, and projects, with the intent being that inclusion of the ordered triple
(s2, p4, j1), for example, indicates that supplier s2 supplied part p4 to project j1.
v) Recursive relationship
An exception to this rule occurs when the same entity type plays two (or more) roles in the
same relationship. (Such relationships are said to be recursive, which I find to be a
misleading use of that term. A better term might be self-referential.) For example, in each
instance of a SUPERVISES relationship set, one employee plays the role of supervisor and
the other plays the role of supervisee.
created) – For each binary 1:1 relationship type R in the ER schema, identify the relations
S and T that correspond to the entity types participating in R.
8) Define and explain Partial Key, with example? 4 Marks (Dec/Jan 2014, Jun / July2013)
An attribute is a Partial Key if a Key from a related entity type must be used in conjunction
with the attribute in question to uniquely identify instances of a corresponding entity set.
A relationship can relate two entities of the same entity type ; for example, a SUPERVISION
relationship type relates one EMPLOYEE (in the role of supervisee ) to another EMPLOYEE
(in the role of supervisor ). This is called a recursive relationship type. A relationship type can
have attributes; for example, HoursPerWeek of WORKS_ON; its value for each relationship
instance describes the number of hours per week that an EMPLOYEE works on a PROJECT.
Structural constraints on relationships: Cardinality ratio (of a binary relationship): 1:1, 1:N,
N:1, or M:N. Participation constraint (on each participating entity type): total (called
existence dependency ) or partial.
Alternative (min, max) notation for relationship structural constraints: Specified on each
participation of an entity type E in a relationship type R. Specifies that each entity e in E
participates in at least min and at most max relationship instances in R. Default(no
constraint): min=0, max=n.
10) Draw the ER diagram of a musician who performs for an album. Assume any four entities.
Indicate all keys, constraints, and assumptions that are made. 8 Marks (Jun / July2014)
11. Design an ER diagram for an insurance company. Assume suitable entity types like
CUSTOMER, AGENT, BRANCH, POLICY, PAYMENT and the relation between them.
10 marks (June/ July 2015) (Dec 14/Jan 15)
12. What are structural constraints on relationship types? Explain with an example?
EMPLOYEE participates partially. This is not to say that for all employees to be
managers is not allowed; it only says that it need not be the case that all employees are
managers.
13) What is a weak entity type? Explain the role of the partial key in the design of a weak entity type.
they all have the name "George"!) Because an entity of a weak entity type cannot be
identified otherwise, that type has a total participation constraint (i.e., existence
dependency) with respect to the identifying relationship. This should not be taken
to mean that any entity type on which a total participation constraint exists is weak. For
example, DEPARTMENT has a total participation constraint with respect to MANAGES,
but it is not weak. In an ER diagram, a weak entity type is depicted with a double
rectangle and an identifying relationship type is depicted with a double diamond.
UNIT-III
1) Define the following terms with an example for each. 8 Marks (Jun / July2014)
Super key: A set of attributes SK of R such that no two tuples in any valid relation
instance r(R) will have the same value for SK. That is, for any two distinct tuples t1 and t2
in a relation state r of R we have t1[SK] ≠ t2[SK] (the subset of
attributes SK is called a superkey of the relation schema R).
Domain: A set/universe of atomic values, where by “atomic” we mean simply that from
the point of view of the database each value in the domain is indivisible.
Tuple: A tuple is a mapping from attributes to values drawn from the respective domains
of those attributes. A tuple is intended to describe some entity in the miniworld.
Relational database schema: It is a set of schemas for its relations together with a set of
integrity constraints.
Entity integrity constraint: The entity integrity constraint states that no primary key
value can be null. This is because the primary key value is used to identify individual tuples
in a relation: having null values for the primary keys implies that we cannot identify some
tuples. Key constraints and entity constraints are specified on individual relations.
Domain constraint: Each attribute value must be NULL or drawn from the domain of that
attribute. Note that some DBMSs allow you to impose the NOT NULL constraint upon an
attribute, which is to say that the attribute may not have the value null.
Semantic integrity constraint: These are application specific restrictions that are
unlikely to be expressible in DDL.
Functional dependency constraint: It is a constraint between two sets of attributes of
a relation: X → Y holds if any two tuples that agree on X also agree on Y.
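The superkey and functional dependency definitions can be checked mechanically against one relation state. The following is a minimal sketch, reading a functional dependency X → Y as a constraint between two attribute sets of a single relation; the function names `fd_holds` and `is_superkey` and the sample tuples are assumptions made for illustration. A check on one state can only refute a constraint, because the definitions range over every valid state of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of tuples that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y-value seen for each X-value; a different one breaks X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_superkey(rows, attrs):
    """Return True if no two tuples in `rows` have the same value for `attrs`."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical relation state, used only for illustration.
emp = [
    {"SSN": 1, "Name": "Ann", "Dno": 5, "Dname": "Research"},
    {"SSN": 2, "Name": "Bob", "Dno": 5, "Dname": "Research"},
    {"SSN": 3, "Name": "Ann", "Dno": 4, "Dname": "Admin"},
]

print(is_superkey(emp, ["SSN"]))           # SSN values are all different
print(is_superkey(emp, ["Dno"]))           # two tuples share Dno = 5
print(fd_holds(emp, ["Dno"], ["Dname"]))   # each Dno has a single Dname
print(fd_holds(emp, ["Name"], ["Dno"]))    # "Ann" appears with two Dno values
```

In this state, SK is a superkey exactly when SK → (all attributes) holds, which is why the two checks are so similar.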
4) Briefly discuss the different types of update operations on a relational database. Show
an example of a violation of referential integrity in each of the update operations.
8 Marks (Dec /Jan 2013, Dec/Jan 2014)
For each of the update operations (Insert, Delete, and Update), we consider what kinds of
constraint violations may result from applying it and how we might choose to react.
Insert:
• domain constraint violation: some attribute value is not of correct domain
• entity integrity violation: key of new tuple is null
• key constraint violation: key of new tuple is same as existing one
• referential integrity violation: foreign key of new tuple refers to non-existent tuple
Ways of dealing with it: reject the attempt to insert, or give the user an opportunity to try
again with different attribute values.
Delete:
• referential integrity violation: a tuple referring to the deleted one exists.
Three options for dealing with it:
• Reject the deletion
• Attempt to cascade (or propagate) by deleting any referencing tuples (plus those that
reference them, etc., etc.)
• modify the foreign key attribute values in referencing tuples to null or to some valid
value referencing a different tuple
Update:
• Key constraint violation: primary key is changed so as to become same as another
tuple's
• referential integrity violation:
o foreign key is changed and new one refers to nonexistent tuple
o primary key is changed and now other tuples that had referred to this one
violate the constraint
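The referential integrity violations listed above can be reproduced with a small experiment. This is a sketch using Python's built-in sqlite3 module, under the assumption of a two-table Student / Enrolled_In schema; the column names, the student name, and the helper `violates` are hypothetical. SQLite rejects each violating statement here because no other referential action is declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Student (sid TEXT NOT NULL PRIMARY KEY, name TEXT);
    CREATE TABLE Enrolled_In (
        sid TEXT NOT NULL REFERENCES Student(sid),
        cid TEXT NOT NULL,
        PRIMARY KEY (sid, cid)
    );
    INSERT INTO Student VALUES ('R56547', 'Asha');
    INSERT INTO Enrolled_In VALUES ('R56547', 'CIL102');
""")


def violates(sql):
    """Run one statement; return True if it is rejected with an integrity error."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Insert: the foreign key of the new tuple refers to a non-existent student.
insert_bad = violates("INSERT INTO Enrolled_In VALUES ('R99999', 'CIL102')")
# Delete: a tuple referring to the deleted student still exists.
delete_bad = violates("DELETE FROM Student WHERE sid = 'R56547'")
# Update of a primary key that other tuples refer to.
update_pk_bad = violates("UPDATE Student SET sid = 'R00001' WHERE sid = 'R56547'")
# Update of a foreign key so that it refers to a non-existent student.
update_fk_bad = violates("UPDATE Enrolled_In SET sid = 'R99999' WHERE cid = 'CIL102'")
# A valid insert is accepted.
insert_ok = violates("INSERT INTO Enrolled_In VALUES ('R56547', 'CIL103')")

print(insert_bad, delete_bad, update_pk_bad, update_fk_bad, insert_ok)
```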
6) List the characteristics of a relation. Discuss any one. 4 Marks (Jun / July2014/June 2016)
Characteristics of Relation
• A tuple can be considered as a set of (<attribute>, <value>) pairs. Thus the following
two tuples are identical:
t1 = <(Name, B. Bayer),(SSN, 305-61-2435),(HPhone, 3731616),
(Address, 291 Blue Lane),(WPhone, null),(Age, 23),(GPA, 3.25)>
t2 = <(HPhone, 3731616),(WPhone, null),(Name, B. Bayer),(Age, 23),
(Address, 291 Blue Lane),(SSN, 305-61-2435),(GPA, 3.25)>
• Tuple ordering is not a part of a relation; that is, a relation that lists the same tuples
in a different order is identical to the original table.
inner join
The cartesian product example above combined each employee with each department. If
we only keep those lines where the dept attribute for the employee is equal to the dnr (the
department number) of the department, we get a nice list of the employees, and the
department.
• If we assume that these relational algebra expressions are executed, inside a relational
DBMS which uses relational algebra operations as its lower-level internal operations,
different relational algebra expressions can take very different time (and memory) to execute.
Consider the following schema
Sailor(Sal_id, Sal_name, rating, age)
Reserves(Sal_id, Boat_id, day)
Boats(Boat_id, Boat_name, color)
I. Find the names of sailors who have reserved all boats called Interlake.
SELECT s.Sal_name FROM Sailor s
WHERE NOT EXISTS (SELECT * FROM Boats b WHERE b.Boat_name = 'Interlake'
  AND NOT EXISTS (SELECT * FROM Reserves r
                  WHERE r.Sal_id = s.Sal_id AND r.Boat_id = b.Boat_id));
II. Find the Sal_id of each sailor with age over 20 who has not reserved a boat.
SELECT s.Sal_id FROM Sailor s
WHERE s.age > 20 AND NOT EXISTS (SELECT * FROM Reserves r WHERE r.Sal_id = s.Sal_id);
III. Find the names of sailors who have reserved at least two boats.
SELECT s.Sal_name FROM Sailor s, Reserves r WHERE r.Sal_id = s.Sal_id
GROUP BY s.Sal_id, s.Sal_name HAVING COUNT(DISTINCT r.Boat_id) >= 2;
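The three sailor queries can be tried out on a small data set. The sketch below uses Python's built-in sqlite3 module; it assumes underscore spellings of the column names (hyphens are not valid in unquoted SQL identifiers), reads query II as "has not reserved any boat", and uses invented sample rows. Query I is written as a relational division with a double NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailor   (Sal_id INTEGER PRIMARY KEY, Sal_name TEXT, rating INTEGER, age REAL);
    CREATE TABLE Boats    (Boat_id INTEGER PRIMARY KEY, Boat_name TEXT, color TEXT);
    CREATE TABLE Reserves (Sal_id INTEGER, Boat_id INTEGER, day TEXT);
    INSERT INTO Sailor VALUES (1, 'Dustin', 7, 45.0), (2, 'Lubber', 8, 55.5),
                              (3, 'Horatio', 9, 19.0), (4, 'Zorba', 10, 30.0);
    INSERT INTO Boats VALUES (101, 'Interlake', 'blue'), (102, 'Interlake', 'red'),
                             (103, 'Clipper', 'green');
    INSERT INTO Reserves VALUES (1, 101, '2016-01-10'), (1, 102, '2016-01-11'),
                                (2, 101, '2016-01-12'), (2, 103, '2016-01-13'),
                                (3, 103, '2016-01-14');
""")

# I. Sailors for whom no Interlake boat exists that they have not reserved.
all_interlake = [row[0] for row in conn.execute("""
    SELECT s.Sal_name FROM Sailor s
    WHERE NOT EXISTS (SELECT * FROM Boats b
                      WHERE b.Boat_name = 'Interlake'
                        AND NOT EXISTS (SELECT * FROM Reserves r
                                        WHERE r.Sal_id = s.Sal_id
                                          AND r.Boat_id = b.Boat_id))
""")]

# II. Sailors older than 20 with no reservation at all.
no_reservation = [row[0] for row in conn.execute("""
    SELECT s.Sal_id FROM Sailor s
    WHERE s.age > 20
      AND NOT EXISTS (SELECT * FROM Reserves r WHERE r.Sal_id = s.Sal_id)
""")]

# III. Sailors who reserved at least two different boats.
two_boats = [row[0] for row in conn.execute("""
    SELECT s.Sal_name FROM Sailor s, Reserves r
    WHERE r.Sal_id = s.Sal_id
    GROUP BY s.Sal_id, s.Sal_name
    HAVING COUNT(DISTINCT r.Boat_id) >= 2
    ORDER BY s.Sal_id
""")]

print(all_interlake, no_reservation, two_boats)
```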
8) Explain foreign key and its importance. Can a foreign key exist only for a single table?
Explain. 8 Marks (Jun / July 2013)(10 Marks Jan 2016)
A foreign key is a field (or fields) that points to the primary key of another table. The purpose
of the foreign key is to ensure referential integrity of the data: every non-null foreign key value
must match a primary key value that actually exists in the referenced table.
A foreign key can also refer to its own table; for example, SuperSSN in EMP refers to the SSN
of another EMP tuple.
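9) How can an Intersection operator be implemented using Union and Minus operators?
8 Marks (Jun/July 2013/June 2016)
R ∩ S = (R ∪ S) − ((R − S) ∪ (S − R)); using Minus alone, R ∩ S = R − (R − S).
The same result can also be obtained in SQL without INTERSECT by counting, for each Rollno,
the tables in which it appears:
SELECT Rollno FROM (
SELECT 1 AS dummy, Rollno FROM StageShow
UNION ALL
SELECT 2 AS dummy, Rollno FROM Sports
) X GROUP BY Rollno HAVING COUNT(DISTINCT dummy) = 2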
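For comparison, intersection can be built from set difference and union alone, using the identities R ∩ S = R − (R − S) and R ∩ S = (R ∪ S) − ((R − S) ∪ (S − R)). The sketch below checks both against the built-in INTERSECT on invented roll numbers, using Python's sqlite3 module. SQLite spells the Minus operator EXCEPT (Oracle uses MINUS), and the subqueries stand in for the parentheses of the algebra expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StageShow (Rollno INTEGER);
    CREATE TABLE Sports    (Rollno INTEGER);
    INSERT INTO StageShow VALUES (1), (2), (3), (4);
    INSERT INTO Sports    VALUES (3), (4), (5);
""")


def rollnos(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# R - (R - S)
via_minus = rollnos("""
    SELECT Rollno FROM StageShow
    EXCEPT
    SELECT Rollno FROM (SELECT Rollno FROM StageShow
                        EXCEPT
                        SELECT Rollno FROM Sports)
""")

# (R U S) - ((R - S) U (S - R))
via_union_minus = rollnos("""
    SELECT Rollno FROM (SELECT Rollno FROM StageShow
                        UNION
                        SELECT Rollno FROM Sports)
    EXCEPT
    SELECT Rollno FROM (
        SELECT Rollno FROM (SELECT Rollno FROM StageShow EXCEPT SELECT Rollno FROM Sports)
        UNION
        SELECT Rollno FROM (SELECT Rollno FROM Sports EXCEPT SELECT Rollno FROM StageShow)
    )
""")

builtin = rollnos("SELECT Rollno FROM StageShow INTERSECT SELECT Rollno FROM Sports")

print(via_minus, via_union_minus, builtin)
```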
10) Write queries in Relational Algebra. 8 Marks (Dec /Jan 2013/June 2016)
i) Retrieve the number of dependents for an employee "RAM".
ℱ COUNT Dname (σ Name='RAM' AND SSN=SSN2 (Employee × π SSN2, Dname (Dependents)))
iii) Retrieve the names of employees who work in the same department as "RAJ".
RajDept ← π dno (σ name='RAJ' (Employee))
Result ← π name (Employee ⋈ RajDept)
11. Briefly discuss how the different update operations on a relation deal with constraint
violations? 8 marks (June/July 2015)
Domain Constraints:
It specifies that each attribute in a relation must contain an atomic value only from the corresponding
domains. Domain constraint specifies the condition that we want to put on each instance of the
relation. So the values that appear in each column must be drawn from the domain associated with
that column.
Key Constraints:
This constraint states that the key attribute value in each tuple must be unique, i.e., no two tuples
contain the same value for the key attribute. This is because the value of the primary key is used to
identify the tuples in the relation.
Integrity Constraints:
There are two types of integrity constraints:
• Entity Integrity Constraint
• Referential Integrity Constraint
The entity integrity constraint states that no primary key value can be null. This is because the
primary key is used to identify individual tuples in the relation, so tuples with null values for the
primary key attributes could not be identified uniquely.
The referential integrity constraint states that a foreign key value must either be null or match the
primary key value of some tuple in the referenced relation.
UNIT-IV
SQL – 1
1 ) Given the schema
14 Marks (Jun/July 2013 & June/July 2015/Jan 2016)
EMP ( Fname, Lname, SSN, Bdate, Address, Sex, Salary, SuperSSN, Dno)
DEPT (Dname, Dnumber, MgrSSN, MGrstartdate)
DEPT-LOC (Dnumber, Dloc)
PROJECT (Pname, Pnumber, Ploc, Dnum)
WORKS-ON (ESSN, PNo, Hours)
Temp2 ← σ sex='F' (Temp)
Result ← π Lname, Fname (Temp2)
EXISTS is used to check whether the result of a correlated nested query is empty (contains
no tuples) or not.
Ex: Query : Retrieve the name of each employee who has a dependent with the same
first name as the employee.
Q: SELECT FNAME, LNAME
FROM EMPLOYEE
WHERE EXISTS (SELECT *
FROM DEPENDENT WHERE SSN=ESSN
AND FNAME=DEPENDENT_NAME)
The comparison operator IN compares a value v with a set (or multi-set) of values V, and
evaluates to TRUE if v is one of the elements in V
Ex: Retrieve the name of each employee who has a dependent with the same first name as the
employee.
Q: SELECT E.FNAME, E.LNAME
FROM EMPLOYEE AS E
WHERE E.SSN IN
(SELECT ESSN
FROM DEPENDENT
WHERE ESSN=E.SSN AND
E.FNAME=DEPENDENT_NAME)
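The EXISTS and IN formulations of this query can be compared on sample data. The following sketch uses Python's sqlite3 module with cut-down EMPLOYEE and DEPENDENT tables; the rows are invented for illustration and the alias D is an addition. Both forms are correlated nested queries and return the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE  (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY);
    CREATE TABLE DEPENDENT (ESSN TEXT, DEPENDENT_NAME TEXT);
    INSERT INTO EMPLOYEE VALUES ('John', 'Smith', '111'),
                                ('Alicia', 'Zelaya', '222'),
                                ('Ahmad', 'Jabbar', '333');
    INSERT INTO DEPENDENT VALUES ('111', 'John'), ('111', 'Alice'), ('222', 'Theodore');
""")

# EXISTS: true when the correlated subquery returns at least one tuple.
exists_rows = conn.execute("""
    SELECT E.FNAME, E.LNAME
    FROM EMPLOYEE AS E
    WHERE EXISTS (SELECT *
                  FROM DEPENDENT AS D
                  WHERE D.ESSN = E.SSN AND D.DEPENDENT_NAME = E.FNAME)
""").fetchall()

# IN: true when E.SSN is one of the ESSN values returned by the subquery.
in_rows = conn.execute("""
    SELECT E.FNAME, E.LNAME
    FROM EMPLOYEE AS E
    WHERE E.SSN IN (SELECT D.ESSN
                    FROM DEPENDENT AS D
                    WHERE D.DEPENDENT_NAME = E.FNAME)
""").fetchall()

print(exists_rows, in_rows)
```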
3) How does SQL inplement the entity integrity constraints of relational Data
Model? Explain with an example?
4 Marks (Dec/Jan 2014, Jun / July2013)(8 marks Dec 14/Jan 15)
In general, an update on a view defined on a single table without any aggregate functions can
be mapped to an update on the base table.
4) Consider the following two tables T1 and T2. Show the result of the following
operations:
T1 ⋈ T1.P=T2.A T2
T1 ⋈ T1.Q=T2.B T2
T1 ⋈ T1.P=T2.A T2
T1 ∪ T2
(Assume T1 and T2 are union compatible.)
Table T1          Table T2
P    Q    R       A    B    C
10   a    5       10   b    6
15   b    8       25   c    3
25   a    6       10   b    5
5. Explain with an example , the basic constraints that can be specified, when you create
a table in SQL. 4 Marks (Jun / July2014/June 2016)
Tuples in the referencing relation R1 have attributes FK (called foreign key attributes)
that reference the primary key attributes PK of the referenced relation R2. A tuple t1 in R1 is
said to reference a tuple t2 in R2 if
t1[FK] = t2[PK]
• Entity integrity constraint -The entity integrity constraint states that no primary key
value can be null. This is because the primary key value is used to identify individual tuples
in a relation: having null values for the primary keys implies that we cannot identify some
tuples. Key constraints and entity constraints are specified on individual relations.
• Foreign key is a set of attributes of one relation R1 whose values are required to
match values of some candidate key of some relation R2 .
For example, the value of attribute DNO in every EMPLOYEE tuple must match
the DNUMBER value of some tuple in the DEPARTMENT relation.
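The constraints described above map onto clauses of CREATE TABLE. The following is a minimal sketch using Python's sqlite3 module, assuming cut-down EMPLOYEE and DEPARTMENT tables; the CHECK condition, the sample rows, and the helper `rejected` are illustrative assumptions. NOT NULL is written explicitly on the primary key because SQLite, unlike the SQL standard, otherwise tolerates NULL in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE DEPARTMENT (
        DNUMBER INTEGER NOT NULL PRIMARY KEY,
        DNAME   TEXT NOT NULL UNIQUE                      -- candidate key
    );
    CREATE TABLE EMPLOYEE (
        SSN    TEXT NOT NULL PRIMARY KEY,                 -- entity integrity + key constraint
        FNAME  TEXT NOT NULL,
        SALARY REAL CHECK (SALARY > 0),                   -- restriction on attribute values
        DNO    INTEGER NOT NULL REFERENCES DEPARTMENT(DNUMBER)  -- referential integrity
    );
    INSERT INTO DEPARTMENT VALUES (1, 'Headquarters'), (5, 'Research');
    INSERT INTO EMPLOYEE VALUES ('123456789', 'John', 30000, 5);
""")


def rejected(sql):
    """Run one statement; return True if it is rejected with an integrity error."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


null_pk = rejected("INSERT INTO EMPLOYEE VALUES (NULL, 'Ann', 25000, 5)")
dup_pk = rejected("INSERT INTO EMPLOYEE VALUES ('123456789', 'Ann', 25000, 5)")
bad_check = rejected("INSERT INTO EMPLOYEE VALUES ('999', 'Ann', -5, 5)")
bad_fk = rejected("INSERT INTO EMPLOYEE VALUES ('999', 'Ann', 25000, 42)")
valid = rejected("INSERT INTO EMPLOYEE VALUES ('999', 'Ann', 25000, 5)")

print(null_pk, dup_pk, bad_check, bad_fk, valid)
```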
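6) Explain how the GROUP BY clause works. What is the difference between WHERE and
HAVING?
In many cases, we want to apply the aggregate functions to subgroups of
tuples in a relation. Each subgroup of tuples consists of the set of tuples that have the
same value for the grouping attribute(s). The WHERE clause filters individual tuples before the
groups are formed, whereas the HAVING clause filters whole groups after the aggregate
functions have been applied.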
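A short experiment shows where WHERE and HAVING act in a grouped query. This is a sketch using Python's sqlite3 module with an invented EMPLOYEE table; the salary threshold and the group-size condition are arbitrary choices for illustration. WHERE removes individual tuples before the groups are formed, and HAVING removes whole groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY REAL, DNO INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('1', 30000, 5), ('2', 40000, 5), ('3', 38000, 5),
        ('4', 25000, 4), ('5', 43000, 4), ('6', 55000, 1);
""")

query = """
    SELECT DNO, COUNT(*), AVG(SALARY)
    FROM EMPLOYEE
    WHERE SALARY > 28000      -- applied to each tuple, before grouping
    GROUP BY DNO
    HAVING COUNT(*) >= 2      -- applied to each group, after aggregation
    ORDER BY DNO
"""
result = conn.execute(query).fetchall()

# Without HAVING, every department that survives the WHERE filter is reported.
all_groups = conn.execute("""
    SELECT DNO, COUNT(*) FROM EMPLOYEE
    WHERE SALARY > 28000 GROUP BY DNO ORDER BY DNO
""").fetchall()

print(result)
print(all_groups)
```

Department 4 has two employees, but only one passes the WHERE filter, so its group has a count of 1 and is then dropped by HAVING.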
7) How does SQL implement the entity integrity constraints of relational Data
Model? Explain with an example? (8 Marks (Jun / July 2013/June 2016)
In general, an update on a view defined on a single table without any aggregate functions can
be mapped to an update on the base table.
More on Views
We can make the following observations:
• A view with a single defining table is updatable if the view contains the PK or a CK of the
base table
• Views defined on multiple tables using joins are generally not updatable
• Views defined using grouping/aggregate functions are not updatable
Specifying General Constraints
• Users can specify certain constraints such as semantic constraints
CREATE ASSERTION SALARY_CONSTRAINT
CHECK (NOT EXISTS (SELECT *
                   FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
                   WHERE E.SALARY > M.SALARY AND E.DNO = D.DNUMBER AND
                         D.MGRSSN = M.SSN))
8) Explain all possible options that are specified when referential integrity constraint is
violated using suitable example for all options? 8 Marks (Jun/July 2013, Jun / July2014)
An addition to the original standard allows specification of primary and candidate keys and
foreign keys as part of the create table command:
• primary key clause includes a list of attributes forming the primary key.
• foreign key clause includes a list of attributes forming the foreign key, and the name
of the relation referenced by the foreign key.
• When a referential integrity constraint is violated, the normal procedure is to reject the
action. But a foreign key clause in SQL-92 can specify steps to be taken to change the
tuples in the referencing relation to restore the constraint.
Example.
create table account
( ...
foreign key (bname) references branch
on delete cascade
on update cascade,
... )
If a delete of a tuple in branch results in the preceding referential integrity constraint
being violated, the delete is not rejected; instead the delete "cascades" to the account relation,
deleting the tuples that refer to the branch that was deleted. Updates are cascaded in the same
way: the referencing tuples are updated to the new value of the branch name.
• SQL-92 also allows the foreign key clause to specify actions other than cascade, such
as setting the referencing field to null, or to a default value, if the constraint is violated.
• If there is a chain of foreign key dependencies across multiple relations, a deletion or
update at one end of the chain can propagate across the entire chain. If a cascading update or
delete causes a constraint violation that cannot be handled by a further cascading operation,
the system aborts the transaction and all the changes caused by the transaction and its
cascading actions are undone. Given the complexity and arbitrary nature of the way
constraints in SQL behave with null values, it is best to ensure that all columns of unique
and foreign key specifications are declared to be non null.
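The cascade and set-null options can be observed directly. The sketch below uses Python's sqlite3 module; the branch and account tables follow the example above, while the loan table, the column acct_no, and all sample rows are assumptions added to show a second referential action.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE branch (bname TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE account (
        acct_no TEXT NOT NULL PRIMARY KEY,
        bname   TEXT REFERENCES branch(bname)
                ON DELETE CASCADE ON UPDATE CASCADE
    );
    CREATE TABLE loan (
        loan_no TEXT NOT NULL PRIMARY KEY,
        bname   TEXT REFERENCES branch(bname)
                ON DELETE SET NULL ON UPDATE CASCADE
    );
    INSERT INTO branch  VALUES ('Downtown'), ('Perryridge');
    INSERT INTO account VALUES ('A-101', 'Downtown'), ('A-102', 'Perryridge');
    INSERT INTO loan    VALUES ('L-17', 'Downtown');
""")

with conn:
    # The new branch name is propagated to the referencing account tuple.
    conn.execute("UPDATE branch SET bname = 'Uptown' WHERE bname = 'Perryridge'")
    # Deleting the branch removes its accounts and nulls the branch name of its loans.
    conn.execute("DELETE FROM branch WHERE bname = 'Downtown'")

accounts = conn.execute("SELECT acct_no, bname FROM account ORDER BY acct_no").fetchall()
loans = conn.execute("SELECT loan_no, bname FROM loan").fetchall()

print(accounts)
print(loans)
```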
10) List and explain the basic data types available for attributes in SQL and give examples.
5 marks (June/July 2015)
DECIMAL(p,s) Exact numerical, precision p, scale s. Example: decimal(5,2) is a number that has
3 digits before the decimal and 2 digits after the decimal
TIMESTAMP Stores year, month, day, hour, minute, and second values
Temp2 ← σ sex=’F’ (Temp)
Result ← π Lname, Fname (Temp2)
2. List ‘CSE’ department details.
Result ← σ Dname=’CSE’ (Department)
3. Retrieve the first name, last name and salary of all employees who work
in department number 50.
Dep-Emps ← σ Dno=’50’ (Emp)
Result ← π Fname, Lname, Salary (Dep-Emps)
4. Retrieve the name of the manager of each department.
UNIT-V
SQL – 2
Referential integrity can be violated if the value of any foreign key in t refers to a tuple that
does not exist in the referenced relation.
2. Write a note on aggregate functions in SQL and Views in SQL with examples.
4 Marks (Jun / July2014 & June/July 2015)
AGGREGATE FUNCTIONS (ℱ)
Functions such as SUM, COUNT, AVERAGE, MIN, and MAX are often applied to sets
of values or sets of tuples in database applications.
<grouping attributes> ℱ <function list> (R)
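As a rough illustration, the grouping-and-aggregation operation above corresponds to GROUP BY with aggregate functions in SQL. The sketch below uses Python's sqlite3 module; the EMPLOYEE columns and the three sample rows are assumptions made for this example, and SQL spells AVERAGE as AVG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, LNAME TEXT, SALARY INTEGER, DNO INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('1', 'Smith', 30000, 5),
        ('2', 'Wong', 40000, 5),
        ('3', 'Zelaya', 25000, 4);
""")

# Grouping attribute: DNO. Function list: COUNT, SUM, AVG, MIN, MAX on SALARY.
rows = conn.execute("""
    SELECT DNO, COUNT(*), SUM(SALARY), AVG(SALARY), MIN(SALARY), MAX(SALARY)
    FROM EMPLOYEE
    GROUP BY DNO
    ORDER BY DNO
""").fetchall()

# Without grouping attributes the functions apply to the whole relation.
overall = conn.execute("SELECT COUNT(*), MAX(SALARY) FROM EMPLOYEE").fetchone()
```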
SQL Views
In SQL, a view is a virtual table based on the result-set of an SQL statement.
A view contains rows and columns, just like a real table. The fields in a view are fields from one or more
real tables in the database.
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition
SQL CREATE VIEW Examples
3) How is a view created and dropped? What problems are associated with updating
of views? (Dec 11/Jan 12)(6 marks Dec 14/Jan 15)
A view refers to a single table that is derived from other tables
CREATE VIEW WORKS_ON1
AS SELECT FNAME, LNAME, PNAME, HOURS
FROM EMPLOYEE, PROJECT, WORKS_ON WHERE SSN=ESSN AND
PNO=PNUMBER
A view can be dropped as shown below
DROP VIEW WORKS_ON1
Updating of Views
Updating a view can be complicated and ambiguous. In general:
• An update on a view defined on a single table without any aggregate functions can be
mapped to an update on the base table.
• A view with a single defining table is updatable if the view attributes contain the primary
key (or a candidate key) of the base table.
• Views defined on multiple tables using joins are generally not updatable.
• Views defined using grouping and aggregate functions are not updatable.
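The following sketch creates, queries, and drops the WORKS_ON1 view with Python's sqlite3 module. The table layouts and the single sample row are assumptions. Note that SQLite is stricter than the rules above: it treats every view as read-only unless INSTEAD OF triggers are defined, so the attempted update on this join view is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, FNAME TEXT, LNAME TEXT);
    CREATE TABLE PROJECT (PNUMBER INTEGER PRIMARY KEY, PNAME TEXT);
    CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER, HOURS REAL);
    INSERT INTO EMPLOYEE VALUES ('123', 'John', 'Smith');
    INSERT INTO PROJECT VALUES (1, 'ProductX');
    INSERT INTO WORKS_ON VALUES ('123', 1, 32.5);

    CREATE VIEW WORKS_ON1 AS
        SELECT FNAME, LNAME, PNAME, HOURS
        FROM EMPLOYEE, PROJECT, WORKS_ON
        WHERE SSN = ESSN AND PNO = PNUMBER;
""")

# A view is queried exactly like a base table.
view_rows = conn.execute("SELECT * FROM WORKS_ON1").fetchall()

# Updating the join view is ambiguous (rename the project, or reassign the employee?).
try:
    conn.execute("UPDATE WORKS_ON1 SET PNAME = 'ProductY' WHERE LNAME = 'Smith'")
    update_error = None
except sqlite3.Error as exc:
    update_error = str(exc)

conn.execute("DROP VIEW WORKS_ON1")
remaining_views = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()
```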
4) What is embedded SQL? With an example explain how would you Connect to
a database, fetch records and display. Also explain the concept of stored procedure in
brief. (Dec 11/Jan 13/ Jan 2014) (10 marks Dec 14/ Jan 15)
The embedded SQL statement is distinguished from programming language statements
by prefixing it with a special character or command so that a preprocessor can extract
the SQL statements. In PL/I the keywords EXEC SQL precede any SQL
statement. In some implementations, SQL statements are passed as parameters in
procedure calls. We will use PASCAL as the host programming language, and a "$" sign
to identify SQL statements in the program. Within an embedded SQL command, we may
refer to program variables, which are prefixed by a "%" sign. The programmer should
declare program variables to match the data types of the database attributes that the
program will process. These program variables may or may not have names that are
identical to their corresponding attributes.
Example: Write a program segment (loop) that reads a social security number
and prints out some information from the corresponding EMPLOYEE tuple.
E1: LOOP := 'Y';
while LOOP = 'Y' do
begin
writeln('input social security number:');
readln(SOC_SEC_NUM);
$SELECT FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SALARY
INTO %E.FNAME, %E.MINIT, %E.LNAME, %E.SSN,
%E.BDATE, %E.ADDRESS, %E.SALARY
FROM EMPLOYEE
WHERE SSN=%SOC_SEC_NUM ;
writeln( E.FNAME, E.MINIT, E.LNAME, E.SSN,
E.BDATE, E.ADDRESS, E.SALARY);
writeln('more social security numbers (Y or N)? ');
readln(LOOP)
end;
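5) Explain the syntax of a SELECT statement in SQL. Write the SQL query for the
following relational algebra expression. (MAY/JULY 2013/ 2014) (4 Marks Dec 14/Jan 15)
πbdate,address(σFname=’John’ and Lname=’Smith’(Employee))
The SELECT Statement
The basic form is SELECT <attribute list> FROM <table list> WHERE <condition>.
The expression above corresponds to:
SELECT BDATE, ADDRESS
FROM EMPLOYEE
WHERE FNAME='John' AND LNAME='Smith'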
• need special iterators to loop over query results and manipulate individual values
Steps:
• Client program opens a connection to the database server
• Client program submits queries to and/or updates the database
• When database access is no longer needed, client program closes (terminates)
the connection
Embedded SQL
• Most SQL statements can be embedded in a general-purpose host programming
language such as COBOL, C, Java
• An embedded SQL statement is distinguished from the host language
statements by enclosing it between EXEC SQL (or EXEC SQL BEGIN) and a
matching END-EXEC or EXEC SQL END (or a semicolon)
• Syntax may vary with language
• Shared variables (used in both languages) usually prefixed with a colon (:) in SQL
9) How are Triggers and assertions defined in SQL?Explain (Dec 11/Jan 13/Jan 2016)
Constraints as Assertions
General constraints: constraints that do not fit in the basic SQL categories (presented in
Mechanism: CREATE ASSERTION
Components include:
• a constraint name,
• followed by CHECK,
• followed by a condition
Example: “The salary of an employee must not be greater than the salary of the manager of the
department that the employee works for”
CREATE ASSERTION SALARY_CONSTRAINT
CHECK (NOT EXISTS (SELECT *
FROM EMPLOYEE E, EMPLOYEE M,
DEPARTMENT D
WHERE E.SALARY > M.SALARY AND
E.DNO=D.NUMBER AND
D.MGRSSN=M.SSN))
SQL Triggers:
Triggers are expressed in a syntax similar to assertions and include the following:
• Event: such as an insert, delete, or update operation
• Condition: determines whether the action is executed
• Action: to be taken when the condition is satisfied
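Many systems do not implement CREATE ASSERTION, so the salary constraint is often approximated with a trigger. The sketch below does this in SQLite through Python's sqlite3 module. The trigger name SALARY_CHECK, the reduced table layouts, and the sample rows are assumptions; the trigger covers only updates of an employee's salary, not every way the constraint could be broken (for example, lowering the manager's salary).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (NUMBER INTEGER PRIMARY KEY, MGRSSN TEXT);
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY INTEGER, DNO INTEGER);
    INSERT INTO EMPLOYEE VALUES ('M1', 50000, 5), ('E1', 30000, 5);
    INSERT INTO DEPARTMENT VALUES (5, 'M1');

    -- Event: an update of SALARY on EMPLOYEE.
    CREATE TRIGGER SALARY_CHECK
    BEFORE UPDATE OF SALARY ON EMPLOYEE
    BEGIN
        -- Action: abort the statement ...
        SELECT RAISE(ABORT, 'salary exceeds manager salary')
        -- ... Condition: the new salary is above the department manager's salary.
        WHERE EXISTS (SELECT 1
                      FROM DEPARTMENT D, EMPLOYEE M
                      WHERE D.NUMBER = NEW.DNO
                        AND D.MGRSSN = M.SSN
                        AND M.SSN <> NEW.SSN
                        AND NEW.SALARY > M.SALARY);
    END;
""")

conn.execute("UPDATE EMPLOYEE SET SALARY = 40000 WHERE SSN = 'E1'")  # allowed

try:
    conn.execute("UPDATE EMPLOYEE SET SALARY = 60000 WHERE SSN = 'E1'")
    blocked = False
except sqlite3.DatabaseError:
    blocked = True  # RAISE(ABORT, ...) undoes the offending statement

salary = conn.execute("SELECT SALARY FROM EMPLOYEE WHERE SSN = 'E1'").fetchone()[0]
```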
10) Discuss the significance of assertion? Write an assertion to specify constraint such
that the salary of an employee must not be greater than the salary of the manager of the
department that employees works for? (DEC/JAN 2013/ JAN- 2014)
To make matters more confusing - a trigger could be used to enforce a check constraint and in some
DBs can take the place of an assertion (by allowing you to run code un-related to the table being
modified). A common mistake for beginners is to use a check constraint when a trigger is required or
a trigger when a check constraint is required.
Syntax: In the same code the programmer must use two programming styles and
must follow two different grammars. Similar concepts are denoted differently (for instance,
strings in C are written within “…”, in SQL – ‘…’) and different concepts are denoted
similarly (for instance, in C = denotes an assignment, in SQL – a comparison).
Binding phases and mechanisms: Query languages are based on late (run-time)
binding of all the names that occur in queries, while programming languages are based on
early (compile and linking time) binding. Thus, from the point of view of a program, queries
are simply strings of characters.
Name spaces and scope rules: Queries do not see names occurring in programs and
vice versa. Because eventually there must be some intersection of these name spaces (e.g. program
variables must parameterize queries), additional facilities, with their own syntax, semantics and
pragmatics, are required. These facilities are a burden on the size and legibility of the
program code. Moreover, in programming languages name scopes are organized
hierarchically and processed by special rules based on stacks. These rules are ignored by a
query language. This leads e.g. to problems with recursive procedure calls (a well-known
example concerns SQL cursors that severely reduce the possibility of recursion). Another
disadvantage of separated name spaces concerns automatic refactoring of programs, which
cannot be performed on queries.
Collections: Databases store collections (e.g. tables) which are processed by queries.
In programming languages collections are absent or severely limited. Hence collections
returned by queries have no direct counterparts in a programming language and must be
processed by special constructs with own syntax and semantics.
Null values: Databases and their query languages have specialized features for
storing and processing null values. Such features are absent in programming languages, thus
some substitutes must be introduced. For instance, if some value in a relational database can
be null, mapping it to a programming language requires two variables: one for storing
information about null and another one for storing the value.
12 ) Create View which will display the dname,no of employees working and total salary
of each department? (DEC/JAN 2013/July 2013/June 2016)
CREATE VIEW DEPT_INFO (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL)
AS SELECT DNAME, COUNT(*), SUM(SALARY)
FROM DEPARTMENT, EMPLOYEE
WHERE DNUMBER=DNO
GROUP BY DNAME
a) Embedded SQL
loop = 1;
while (loop) {
prompt ("Enter SSN: ", ssn);
EXEC SQL
select FNAME, LNAME, ADDRESS, SALARY
into :fname, :lname, :address, :salary
from EMPLOYEE where SSN = :ssn;
if (SQLCODE == 0) printf(fname, …);
else printf("SSN does not exist: ", ssn);
prompt("More SSN? (1=yes, 0=no): ", loop);
}
A cursor (iterator) is needed to process multiple tuples
FETCH commands move the cursor to the next tuple
CLOSE CURSOR indicates that the processing of query results has been completed
All these things are embedded in a language.
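For comparison, the same connect, query, fetch, and close steps can be sketched with a call-level interface instead of embedded SQL. The example below uses Python's sqlite3 module; the EMPLOYEE rows and the function names lookup and last_names_in_department are hypothetical. Returning None plays the role of a non-zero SQLCODE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # open a connection (a throwaway in-memory database)
conn.executescript("""
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, FNAME TEXT, LNAME TEXT,
                           ADDRESS TEXT, SALARY INTEGER, DNO INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('123456789', 'John', 'Smith', 'Houston', 30000, 5),
        ('333445555', 'Franklin', 'Wong', 'Houston', 40000, 5),
        ('999887777', 'Alicia', 'Zelaya', 'Spring', 25000, 4);
""")

def lookup(ssn):
    """Single-tuple query: returns one row, or None if the SSN does not exist."""
    cur = conn.execute(
        "SELECT FNAME, LNAME, ADDRESS, SALARY FROM EMPLOYEE WHERE SSN = ?", (ssn,))
    return cur.fetchone()

def last_names_in_department(dno):
    """Multi-tuple query processed one row at a time through a cursor."""
    cur = conn.cursor()
    cur.execute("SELECT LNAME FROM EMPLOYEE WHERE DNO = ? ORDER BY LNAME", (dno,))
    names = []
    while True:
        row = cur.fetchone()   # FETCH: advance the cursor to the next tuple
        if row is None:        # no more tuples in the query result
            break
        names.append(row[0])
    cur.close()                # CLOSE CURSOR: processing of the result is complete
    return names
```

The ? placeholders correspond to the shared variables (:ssn) of the embedded version.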
UNIT – 6
Database Design – 1
1 ) What is the need for normalization? Explain the first,second and third normal
forms with examples. (JUN/JULY 2013/Jan 2016)
First normal form is now considered to be part of the formal definition of a relation;
historically, it was defined to disallow multivalued attributes, composite attributes, and their
combinations. It states that the domains of attributes must include only atomic (simple,
indivisible) values and that the value of any attribute in a tuple must be a single value from
the domain of that attribute. Practical Rule: "Eliminate Repeating Groups," i.e., make a
separate table for each set of related attributes, and give each table a primary key.
Formal Definition: A relation is in first normal form (1NF) if and only if all underlying
simple domains contain atomic values only.
Practical Rule: "Eliminate Redundant Data," i.e., if an attribute depends on only part
of a composite (multi-attribute) key, remove it to a separate table.
Formal Definition: A relation is in second normal form (2NF) if and only if it is in 1NF
and every nonkey attribute is fully dependent on the primary key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on
the key.
2 ) Explain informal design guidelines for relation schemas. (Dec /Jan 12/Jan 2016)
Guideline 1: Design a relation schema so that it is easy to explain its meaning. Do not combine
attributes from multiple entity types and relationship types into a single relation.
Reducing redundant values in tuples: this saves storage space and avoids update anomalies.
3. Discuss insertion, deletion, and modification anomalies. Why are they bad? Illustrate
with an example. (Jan 2014)
• Insertion anomalies.
• Deletion anomalies.
• Modification anomalies.
Insertion Anomalies
To insert a new employee tuple into EMP_DEPT, we must include either the attribute values
for that department that the employee works for, or nulls. It's difficult to insert a new
department that has no employee as yet in the EMP_DEPT relation. The only way to do
this is to place null values in the attributes for employee. This causes a problem because
SSN is the primary key of EMP_DEPT, and each tuple is supposed to represent an employee
entity - not a department entity.
Deletion Anomalies
If we delete from EMP_DEPT an employee tuple that happens to represent the last
employee working for a particular department, the information concerning that department is
lost from the database.
Modification Anomalies
In EMP_DEPT, if we change the value of one of the attributes of a particular department- say
the manager of department 5- we must update the tuples of all employees who work
in that department.
Guideline 2: Design the base relation schemas so that no insertion, deletion, or
modification anomalies occur.
Reducing the null values in tuples.
Guideline 3: Avoid placing attributes in a base relation whose values are mostly
null. E.g., if only 10% of employees have offices, it is better to have a separate relation,
EMP_OFFICE, rather than an attribute OFFICE_NUMBER in EMPLOYEE.
Disallowing spurious tuples.
Spurious tuples - tuples that are not in the original relation but generated by natural join
of decomposed subrelations.
Example: decompose EMP_PROJ into EMP_LOCS and EMP_PROJ1.
Fig. 14.5a
Guideline 4: Design relation schemas so that they can be naturally JOINed on primary
keys or foreign keys in a way that guarantees no spurious tuples are generated.
4) Which normal form is based on the concept of transitive dependency? Explain with
an example the decomposition into 3NF (July2013 /Jan 2013/Jan 2016)
5) What is the need for normalization? Explain second normal form. ( Jan 2013/June
2016)
Second Normal Form (2NF)
Second normal form is based on the concept of full functional dependency. A functional
dependency X → Y is a full functional dependency if removal of any attribute A from X means
that the dependency does not hold any more. A relation schema is in 2NF if every nonprime
attribute in the relation is fully functionally dependent on the primary key of the relation. It can
also be restated as: a relation schema is in 2NF if no nonprime attribute in the relation is
partially dependent on any key of the relation.
Practical Rule: "Eliminate Redundant Data," i.e., if an attribute depends on only part
of a composite (multi-attribute) key, remove it to a separate table.
Formal Definition: A relation is in second normal form (2NF) if and only if it is in 1NF
and every nonkey attribute is fully dependent on the primary key.
6. Explain any Two informal quality measures employed for a relation schema Design?
Delete Anomaly: When a project is deleted, it will result in deleting all the employees
who work on that project. Alternately, if an employee is the sole employee on a
project, deleting that employee would result in deleting the corresponding project.
salesman and hence primary key is {car-no, salesman}; additional dependencies are:
Date-sold → discount and salesman-no → commission? (JAN -2014)
I) Yes this relation is in 1NF
C) Discuss the minimal sets of FD’S?
Every set of FDs has an equivalent minimal set
There can be several equivalent minimal sets
There is no simple algorithm for computing a minimal set of FDs that is equivalent to a set F
of FDs
To synthesize a set of relations, we assume that we start with a set of dependencies that is
a minimal set.
8 ) Suggest and explain three different techniques to achieve 1nf using suitable example?
(DEC/JAN 2013/June 2016)
First normal form is now considered to be part of the formal definition of a relation;
historically, it was defined to disallow multivalued attributes, composite attributes, and their
combinations. It states that the domains of attributes must include only atomic (simple,
indivisible) values and that the value of any attribute in a tuple must be a single value from
the domain of that attribute.
Practical Rule: "Eliminate Repeating Groups," i.e., make a separate table for each set of
related attributes, and give each table a primary key.
Formal Definition: A relation is in first normal form (1NF) if and only if all underlying
simple domains contain atomic values only.
A non-prime attribute of R is an attribute that does not belong to any candidate key of R. A
transitive dependency is a functional dependency in which X → Z (X determines Z) indirectly,
by virtue of X → Y and Y → Z (where it is not the case that Y → X).
A-X, the set difference between A and X, is a prime attribute (i.e., A-X is contained within a
candidate key).
c) Consider the Relation R and FD A->B,C->DF,AC->E,D->F?WHAT IS KEY AND Highest
normal form? if it is not in 3nf find decomposition that is lossless and dependency
preserving?
8 marks
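Ans: Assuming R has the attributes {A, B, C, D, E, F}, the key is AC: A and C appear on no
right-hand side (they have no incoming edges), so every key must contain them, and
(AC)+ = {A, B, C, D, E, F}.
Highest normal form: A → B and C → DF are partial dependencies on the key AC, so R is not
in 2NF; its highest normal form is 1NF.
Minimal cover: A → B, C → D, D → F, AC → E (C → F follows from C → D and D → F).
3NF decomposition: R1(A, B), R2(C, D), R3(D, F), R4(A, C, E). Every FD of the minimal cover
lies within one relation, so the decomposition is dependency preserving, and R4 contains the
key AC, so it is lossless.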
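The key of a relation can be checked mechanically by computing attribute closures. The following is a small sketch for the FDs of this question; the function names closure and candidate_keys are made up for the illustration, R is assumed to have exactly the attributes A to F, and the key search is brute force, so it suits only small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is as well.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is all_attrs (tried in order of size)."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(all_attrs):
                keys.append("".join(combo))
    return keys

R = "ABCDEF"  # assumed attribute set of R
FDS = [("A", "B"), ("C", "DF"), ("AC", "E"), ("D", "F")]

keys = candidate_keys(R, FDS)  # the only candidate key is AC
a_plus = closure("A", FDS)     # a proper subset of the key already determines B
```

Because A alone determines the nonprime attribute B, the partial dependency that rules out 2NF is visible directly in a_plus.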
UNIT – 7
Database Design -2
1. Explain multivalued dependency and fourth normal form 4NF with examples.
A relation is in fourth normal form (4NF) if it is in Boyce-Codd Normal Form and contains no
nontrivial MVDs other than those whose determinant is a superkey. Going from BCNF to 4NF involves
the removal of the MVD from the relation by placing the attribute(s) in a new relation along with
a copy of the determinant(s).
Note: The word loss in lossless refers to loss of information, not to loss of tuples. In fact,
for “loss of information” a better term is “addition of spurious information”
Algorithm 11.1: Testing for Lossless Join Property
Input: A universal relation R, a decomposition D = {R1, R2, ..., Rm} of R, and a set F of
functional dependencies.
1. Create an initial matrix S with one row i for each relation Ri in D, and one column j for
each attribute Aj in R.
2. Set S(i,j):=bij for all matrix entries. (* each bij is a distinct symbol associated with indices
(i,j) *).
3. For each row i representing relation schema Ri
{for each column j representing attribute Aj
{if (relation Ri includes attribute Aj) then set S(i,j):= aj;};};
(* each aj is a distinct symbol associated with index (j) *)
4. Repeat the following loop until a complete loop execution results in no changes to S
{for each functional dependency X → Y in F
{for all rows in S which have the same symbols in the columns corresponding to attributes
in X
{make the symbols in each column that correspond to an attribute in Y be the same in
all these rows as follows: If any of the rows has an “a” symbol for the column, set the other
rows to that same “a” symbol in the column. If no “a” symbol exists for the attribute in any
of the rows, choose one of the “b” symbols that appear in one of the rows for the attribute
and set the other rows to that same “b” symbol in the column ;};
};
};
5. If a row is made up entirely of “a” symbols, then the decomposition has the lossless
join property; otherwise it does not.
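The matrix test can be sketched in a few lines of Python. This is an illustrative implementation, not a reference one: the function name is_lossless is made up, the EMP_PROJ attributes and FDs are assumed from the usual textbook schema, and when two symbols are equated the sketch replaces them throughout the column (the standard chase step), which is slightly stronger than the wording of step 4.

```python
def is_lossless(attributes, decomposition, fds):
    """Matrix (chase) test for the lossless join property of a decomposition."""
    # Steps 1-3: "a" where relation i contains the attribute, a distinct "b" symbol otherwise.
    rows = [{attr: ("a" if attr in rel else f"b{i}_{attr}") for attr in attributes}
            for i, rel in enumerate(decomposition)]
    changed = True
    while changed:                       # step 4: repeat until a full pass changes nothing
        changed = False
        for lhs, rhs in fds:
            groups = {}
            for row in rows:             # rows that agree on every attribute of X
                groups.setdefault(tuple(row[attr] for attr in lhs), []).append(row)
            for group in groups.values():
                for attr in rhs:
                    symbols = {row[attr] for row in group}
                    if len(symbols) > 1:
                        target = "a" if "a" in symbols else min(symbols)
                        for row in rows:  # equate the symbols everywhere in the column
                            if row[attr] in symbols:
                                row[attr] = target
                        changed = True
    # Step 5: lossless if and only if some row consists entirely of "a" symbols.
    return any(all(value == "a" for value in row.values()) for row in rows)

# Assumed schema: EMP_PROJ(SSN, ENAME, PNUMBER, PNAME, PLOCATION, HOURS).
ATTRS = ("SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS")
FDS = [(("SSN",), ("ENAME",)),
       (("PNUMBER",), ("PNAME", "PLOCATION")),
       (("SSN", "PNUMBER"), ("HOURS",))]

# EMP_LOCS and EMP_PROJ1
case1 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
# EMP, PROJECT and WORKS_ON
case2 = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]

result_case1 = is_lossless(ATTRS, case1, FDS)
result_case2 = is_lossless(ATTRS, case2, FDS)
```

Under these assumptions the first decomposition fails the test and the second one passes it.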
Figure: Lossless (nonadditive) join test for n-ary decompositions.
(a) Case 1: Decomposition of EMP_PROJ into EMP_PROJ1 and EMP_LOCS fails test.
(b) A decomposition of EMP_PROJ that has the lossless join property.
(c) Case 2: Decomposition of EMP_PROJ into EMP, PROJECT, and WORKS_ON
satisfies test.
Figure: Fourth and fifth normal forms.
(a) The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
(b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS
and EMP_DEPENDENTS.
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
6) Which normal form is based on the concept of MVD? Explain the same with an example.
Fourth normal form (4NF) is based on multivalued dependencies. A relation is in 4NF if it is in
Boyce-Codd Normal Form and contains no nontrivial MVDs other than those whose determinant is
a superkey. Going from BCNF to 4NF involves the removal of the MVD from the relation by placing
the attribute(s) in a new relation along with a copy of the determinant(s).
A join dependency (JD), denoted by JD{R1, R2, …, Rn}, specified on relation schema R,
specifies a constraint on the states r of R. The constraint states that every legal state r of R
should have a lossless join decomposition into R1, R2, …, Rn; that is, for every such r we have
∗(πR1(r), πR2(r), …, πRn(r)) = r.
Lossless-join property refers to when we decompose a relation into two relations - we can
rejoin the resulting relations to produce the original relation. However, sometimes there is the
requirement to decompose a relation into more than two relations. Although rare, these cases
are managed by join dependency and 5NF.
The idea behind domain-key normal form (DKNF) is to specify (theoretically, at least) the
"ultimate normal form" that takes into account all possible types of dependencies and
constraints. A relation is said to be in DKNF if all constraints and dependencies that should
hold on the relation can be enforced simply by enforcing the domain constraints and key
constraints on the relation. However, because of the difficulty of including complex
constraints in a DKNF relation, its practical utility is limited, since it may be quite difficult to
specify general integrity constraints.
For example, consider a relation CAR(MAKE, VIN#) (where VIN# is the vehicle
identification number) and another relation MANUFACTURE(VIN#, COUNTRY) (where
COUNTRY is the country of manufacture). A general constraint may be of the following
form: "If the MAKE is either Toyota or Lexus, then the first character of the VIN# is a "J" if
the country of manufacture
is Japan; if the MAKE is Honda or Acura, the second character of the VIN# is a "J" if the
country of manufacture is Japan." There is no simplified way to represent such constraints
short of writing a procedure (or general assertions) to test them.
UNIT-VIII
Transaction Management
2. What is a schedule? Explain with example serial, non serial and conflict serializable
schedules. (JUN/JULY 2013/Jan 2016)
A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of
the transactions subject to the constraint that, for each transaction Ti that participates in
S, the operations of Ti in S must appear in the same order in which they occur in Ti. Note,
however, that operations from other transactions Tj can be interleaved with the operations
of Ti in S. For now, consider the order of operations in S to be a total ordering,
although it is possible theoretically to deal with schedules whose operations form partial
orders.
which we call Sb, can be written as follows, if we assume that transaction T1 aborted after
its read_item(Y) operation:
Two operations in a schedule are said to conflict if they satisfy all three of the
following conditions:
1. they belong to different transactions;
2. they access the same data item X;
3. at least one of the operations is a write_item(X).
For example, the operations w1(X) and w2(X) conflict. However, the operations r1(X) and
r2(X) do not conflict, since they are both read operations; the operations w2(X) and w1(Y) do
not conflict, because they operate on distinct data items X and Y; and the operations r1(X) and
w1(X) do not conflict, because they belong to the same transaction.
• A schedule S is serial if, for every transaction T all the operations of T are
executed consecutively in the schedule.
• A schedule S of n transactions is serializable if it is equivalent to some serial
schedule of the same n transactions.
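Conflict serializability is commonly tested with a precedence graph: draw an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, and check the graph for cycles. The sketch below is one way to write this down; the tuple encoding (transaction, "r" or "w", item) and the function names are assumptions made for this example.

```python
def conflicts(op1, op2):
    """Different transactions, same data item, and at least one write."""
    (t1, kind1, item1), (t2, kind2, item2) = op1, op2
    return t1 != t2 and item1 == item2 and "w" in (kind1, kind2)

def precedence_edges(schedule):
    """Edge (Ti, Tj) for every operation of Ti that conflicts with a later operation of Tj."""
    edges = set()
    for i, earlier in enumerate(schedule):
        for later in schedule[i + 1:]:
            if conflicts(earlier, later):
                edges.add((earlier[0], later[0]))
    return edges

def is_conflict_serializable(schedule):
    """True if the precedence graph has no cycle."""
    edges = precedence_edges(schedule)
    nodes = {op[0] for op in schedule}
    while nodes:
        # Transactions with no incoming edge from a remaining node can come first.
        free = {n for n in nodes
                if not any(dst == n and src in nodes for src, dst in edges)}
        if not free:
            return False  # every remaining transaction lies on or behind a cycle
        nodes -= free
    return True

serial = [(1, "r", "X"), (1, "w", "X"), (2, "r", "X"), (2, "w", "X")]
interleaved = [(1, "r", "X"), (2, "r", "X"), (1, "w", "X"), (2, "w", "X")]

serial_ok = is_conflict_serializable(serial)
interleaved_ok = is_conflict_serializable(interleaved)
```

In the interleaved schedule r2(X) precedes w1(X) and r1(X) precedes w2(X), so the graph contains both T2 → T1 and T1 → T2 and the schedule is not conflict serializable.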
When in-place update (immediate or deferred) is used then log is necessary for recovery and it
must be available to recovery manager. This is achieved by Write-Ahead Logging (WAL)
protocol. WAL states that
For Undo: Before a data item’s AFIM is flushed to the database disk (overwriting the BFIM) its
BFIM must be written to the log and the log must be saved on a stable store (log disk).
For Redo: Before a transaction executes its commit operation, all its AFIMs must be written to the
log and the log must be saved on a stable store.
COMMIT or ROLLBACK.
Characteristics specified by a SET TRANSACTION statement in SQL2:
Two lock modes: (a) shared (read) and (b) exclusive (write). Shared mode: shared lock (X).
More than one transaction can apply share lock on X for reading its value but no write lock can be
applied on X by any other transaction. Exclusive mode: Write lock (X). Only one write lock on X
can exist at any time and no shared lock can be applied by any other transaction on X.
Conflict matrix (Y = compatible, N = conflict)
        Read  Write
Read    Y     N
Write   N     N
If the condition in part (a) does not hold, then execute write_item(X) of T and set
write_TS(X) to TS(T).
For a read_item(X) operation: if write_TS(X) > TS(T), then a younger transaction has already
written to the data item, so abort and roll back T and reject the operation.
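The read and write checks of basic timestamp ordering can be sketched as two small functions. This is a simplified illustration under stated assumptions: the class Item and the convention of returning False for "abort and roll back T" are made up for the example, and the write rule rejects a write when a younger transaction has already read or written the item.

```python
class Item:
    """A data item X together with its read and write timestamps."""
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def write_item(item, ts):
    """T with timestamp ts wants to write X. Returns False if T must be aborted."""
    if item.read_ts > ts or item.write_ts > ts:
        return False              # a younger transaction already read or wrote X
    item.write_ts = ts
    return True

def read_item(item, ts):
    """T with timestamp ts wants to read X. Returns False if T must be aborted."""
    if item.write_ts > ts:
        return False              # a younger transaction already wrote X
    item.read_ts = max(item.read_ts, ts)
    return True

x = Item()
r1 = read_item(x, 10)    # accepted, read_TS(X) becomes 10
w1 = write_item(x, 5)    # rejected: the younger transaction 10 has already read X
w2 = write_item(x, 12)   # accepted, write_TS(X) becomes 12
r2 = read_item(x, 11)    # rejected: the younger transaction 12 has already written X
```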
5) Explain the problems that can occur when concurrent transactions are executed. Give
examples. (JULY 2013/ Jan 2014/July 2015)
Why Concurrency Control is needed:
The Lost Update Problem
This occurs when two transactions that access the same database items have their operations
interleaved in a way that makes the value of some database item incorrect.
The Temporary Update (or Dirty Read) Problem
This occurs when one transaction updates a database item and then the transaction fails for some
reason (see Section 17.1.4).
The updated item is accessed by another transaction before it is changed back to its original value.
The Incorrect Summary Problem
If one transaction is calculating an aggregate summary function on a number of records while
other transactions are updating some of these records, the aggregate function may calculate some
values before they are updated and others after they are updated
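The lost update problem can be reproduced with a tiny simulation. In this sketch, which uses made-up values, T1 subtracts 5 from X and T2 adds 10 to X, each working on a private copy between its read and its write; the step encoding and the function run are assumptions made for the example.

```python
def run(schedule, x):
    """Execute (transaction, action) steps on item X and return its final value."""
    db = {"X": x}
    local = {}                       # each transaction's private copy of X
    for txn, action in schedule:
        if action == "read":
            local[txn] = db["X"]
        elif action == "write":
            db["X"] = local[txn]
        else:                        # ("add", n): change the private copy only
            local[txn] += action[1]
    return db["X"]

serial = [("T1", "read"), ("T1", ("add", -5)), ("T1", "write"),
          ("T2", "read"), ("T2", ("add", 10)), ("T2", "write")]

# T2 reads X before T1 has written it, then overwrites T1's result.
interleaved = [("T1", "read"), ("T1", ("add", -5)), ("T2", "read"),
               ("T1", "write"), ("T2", ("add", 10)), ("T2", "write")]

serial_result = run(serial, 80)            # 80 - 5 + 10 = 85
interleaved_result = run(interleaved, 80)  # 90: the update by T1 is lost
```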
Locking is an operation which secures (a) permission to Read or (b) permission to Write a data
item for a transaction. Example: Lock (X). Data item X is locked in behalf of the requesting
transaction. Unlocking is an operation which removes these permissions from the data
item.
Example: Unlock (X). Data item X is made available to all other transactions.
Two lock modes: (a) shared (read) and (b) exclusive (write).
Shared mode: shared lock (X). More than one transaction can apply share lock on X for
reading its value but no write lock can be applied on X by any other transaction.
Exclusive mode: Write lock (X). Only one write lock on X can exist at any time and no
shared lock can be applied by any other transaction on X.
Two-Phase Locking Techniques: Essential components
Lock Manager: Managing locks on data items.
Lock table: The lock manager uses it to store the identity of the transaction locking a data item,
the data item, the lock mode and a pointer to the next data item locked. One simple way to implement
a lock table is through a linked list.
go to B
end;
The following code performs the unlock operation:
if LOCK (X) = “write-locked” then
begin LOCK (X) ← “unlocked”;
wake up one of the waiting transactions, if any
end
else if LOCK (X) = “read-locked” then
begin
no_of_reads (X) ← no_of_reads (X) - 1;
if no_of_reads (X) = 0 then
begin
LOCK (X) ← “unlocked”;
wake up one of the waiting transactions, if any
end
end;
Lock conversion
Lock upgrade: existing read lock to write lock
if Ti has a read-lock (X) and Tj has no read-lock (X) (i ≠ j) then
convert read-lock (X) to write-lock (X)
else
force Ti to wait until Tj unlocks X
Lock downgrade: existing write lock to read lock
Ti has a write-lock (X) (*no transaction can have any lock on X*)
convert write-lock (X) to read-lock (X)
Two Phases: (a) Locking (Growing) (b) Unlocking (Shrinking).
Locking (Growing) Phase: A transaction applies locks (read or write) on desired data
items one at a time.
Unlocking (Shrinking) Phase: A transaction unlocks its locked data items one at a time.
Requirement: For a transaction these two phases must be mutually exclusive, that is,
during the locking phase the unlocking phase must not start, and during the unlocking phase
the locking phase must not begin.
T1                 T2                 Result
read_lock (Y);     read_lock (X);     Initial values: X=20; Y=30
read_item (Y);     read_item (X);     Result of serial execution
unlock (Y);        unlock (X);        T1 followed by T2:
write_lock (X);    write_lock (Y);    X=50, Y=80.
read_item (X);     read_item (Y);     Result of serial execution
X:=X+Y;            Y:=X+Y;            T2 followed by T1:
write_item (X);    write_item (Y);    X=70, Y=50
unlock (X);        unlock (Y);
Two-Phase Locking Techniques: The algorithm
T1                 T2                 Result
read_lock (Y);                        X=50; Y=50
read_item (Y);                        Nonserializable because it
unlock (Y);                           violated two-phase policy.
                   read_lock (X);
                   read_item (X);
                   unlock (X);
                   write_lock (Y);
                   read_item (Y);
                   Y:=X+Y;
                   write_item (Y);
                   unlock (Y);
write_lock (X);
read_item (X);
X:=X+Y;
write_item (X);
unlock (X);
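Whether a single transaction obeys the two-phase rule can be checked by scanning its operations: once the first unlock has been seen, no further lock request may appear. The sketch below applies this to T1 from the schedules above and to a reordered variant; the function name follows_two_phase and the string encoding of the operations are assumptions for the illustration.

```python
def follows_two_phase(operations):
    """True if no lock request occurs after the first unlock (growing, then shrinking)."""
    shrinking = False
    for op in operations:
        name = op.split("(")[0].strip()
        if name == "unlock":
            shrinking = True          # the shrinking phase has started
        elif name in ("read_lock", "write_lock") and shrinking:
            return False              # a lock requested after an unlock violates 2PL
    return True

# T1 as written above: unlock(Y) comes before write_lock(X).
t1 = ["read_lock(Y)", "read_item(Y)", "unlock(Y)",
      "write_lock(X)", "read_item(X)", "write_item(X)", "unlock(X)"]

# The same work with all locks acquired before the first unlock.
t1_two_phase = ["read_lock(Y)", "read_item(Y)", "write_lock(X)", "unlock(Y)",
                "read_item(X)", "write_item(X)", "unlock(X)"]

t1_ok = follows_two_phase(t1)
t1_two_phase_ok = follows_two_phase(t1_two_phase)
```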
Two-Phase Locking Techniques: The algorithm
Conservative: Prevents deadlock by locking all desired data items before the transaction
begins execution.
Basic: Transaction locks data items incrementally. This may cause deadlock, which must be
dealt with.
Strict: A stricter version of the Basic algorithm where unlocking is performed after a
transaction terminates (commits, or aborts and is rolled back). This is the most commonly
used two-phase locking algorithm.
Serializability of Schedules
If no interleaving of operations is permitted, there are only two possible arrangements for
transactions T1 and T2:
1. Execute all the operations of T1 (in sequence) followed by all the operations of T2 (in
sequence).
2. Execute all the operations of T2 (in sequence) followed by all the operations of T1
(in sequence).
A schedule S is serial if, for every transaction T, all the operations of T are executed
consecutively in the schedule.
A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the
same n transactions.
a) 2PL Lock:
Locking is an operation which secures (a) permission to Read or (b) permission to Write a data
item for a transaction. Example: Lock (X). Data item X is locked in behalf of the requesting
transaction.
Unlocking is an operation which removes these permissions from the data item.
Example: Unlock (X). Data item X is made available to all other transactions.
Two lock modes: (a) shared (read) and (b) exclusive (write).
Shared mode: shared lock (X). More than one transaction can apply share lock on X for reading its
value but no write lock can be applied on X by any other transaction.
Exclusive mode: Write lock (X). Only one write lock on X can exist at any time and no shared
lock can be applied by any other transaction on X.
Lock table: The lock manager uses it to store the identity of the transaction locking a data item,
the data item, the lock mode and a pointer to the next data item locked. One simple way to
implement a lock table is through a linked list.
goto B
end;
Deadlocks:
T’1                T’2                Result
read_lock (Y);                        T’1 and T’2 follow the two-phase
read_item (Y);                        policy but they are deadlocked.
                   read_lock (X);
                   read_item (X);
write_lock (X);
(waits for X)      write_lock (Y);
                   (waits for Y)
Deadlock (T’1 and T’2)
Deadlock prevention
A transaction locks all data items it refers to before it begins execution. This way of
locking prevents deadlock since a transaction never waits for a data item. The conservative
two-phase locking uses this approach.
Deadlock detection and resolution
In this approach, deadlocks are allowed to happen. The scheduler maintains a wait-for
graph for detecting cycles. If a cycle exists, then one transaction involved in the cycle is
selected as the victim and rolled back.
A wait-for graph is created using the lock table. As soon as a transaction is blocked, it is
added to the graph. When a chain like: Ti waits for Tj waits for Tk waits for Ti or Tj
occurs, then this creates a cycle. One of the transactions in the cycle is selected and rolled
back.
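Cycle detection on a wait-for graph can be sketched with a depth-first search. This is a minimal sketch, not the scheduler's actual algorithm: the function name find_cycle and the dictionary representation of the graph (blocked transaction mapped to the set of transactions it waits for) are assumptions.

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    wait_for maps each blocked transaction to the set of transactions
    it is waiting for.
    """
    visited = set()

    def dfs(node, path):
        if node in path:
            # We came back to a transaction on the current chain: a cycle.
            return path[path.index(node):]
        if node in visited:
            return None  # already explored, no cycle through it
        visited.add(node)
        for nxt in wait_for.get(node, ()):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None
```

With the chain from the text, find_cycle({"Ti": {"Tj"}, "Tj": {"Tk"}, "Tk": {"Ti"}}) reports the three transactions, and any one of them can then be chosen as the victim.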
Deadlock avoidance
There are many variations of the two-phase locking algorithm. Some avoid deadlock by not
letting the cycle complete. That is, as soon as the algorithm discovers that
blocking a transaction is likely to create a cycle, it rolls back the transaction. Wound-
Wait and Wait-Die algorithms use timestamps to avoid deadlocks by rolling back a victim.
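The two timestamp schemes named above differ only in which transaction is rolled back when a lock request conflicts. The sketch below states the usual textbook rules, assuming that a smaller timestamp means an older transaction; the function names and the returned strings are illustrative only.

```python
def wait_die(ts_requester, ts_holder):
    """Wait-Die: an older requester waits; a younger requester dies."""
    if ts_requester < ts_holder:      # requester is older
        return "wait"
    return "abort requester"          # younger requester is rolled back


def wound_wait(ts_requester, ts_holder):
    """Wound-Wait: an older requester wounds the holder; a younger one waits."""
    if ts_requester < ts_holder:      # requester is older
        return "abort holder"         # the younger holder is rolled back
    return "wait"
```

In both schemes the rolled-back transaction is always the younger one, so a wait-for cycle can never close.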
Starvation
Starvation occurs when a particular transaction consistently waits or is restarted and never
gets a chance to proceed further. In deadlock resolution it is possible that the same
transaction may consistently be selected as the victim and rolled back. This limitation is
inherent in all priority-based scheduling mechanisms. In the Wound-Wait scheme a younger
transaction may always be wounded (aborted) by a long-running older transaction, which may
create starvation.
Shadow paging: The after image (AFIM) does not overwrite its before image (BFIM) but is
recorded at another place on the disk. Thus, at any time a data item has an AFIM and a BFIM
(the shadow copy of the data item) at two different places on the disk.
To manage access of data items by concurrent transactions two directories (current and
shadow) are used. The directory arrangement is illustrated below. Here a page is a data item.
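The two-directory arrangement can be sketched as follows. This is a simplified illustration under assumptions: the class ShadowPagedStore and its methods are hypothetical, the disk is modelled as a Python list of page slots, and commit and abort are reduced to copying one directory over the other.

```python
class ShadowPagedStore:
    """Toy store with a current and a shadow page directory."""

    def __init__(self, pages):
        self.disk = list(pages)                   # page contents by disk slot
        self.shadow = list(range(len(pages)))     # directory saved at txn start
        self.current = list(self.shadow)          # directory used by the txn

    def write(self, page_no, value):
        # The AFIM goes to a fresh slot; the BFIM slot is left untouched.
        self.disk.append(value)
        self.current[page_no] = len(self.disk) - 1

    def read(self, page_no):
        return self.disk[self.current[page_no]]

    def commit(self):
        # The current directory becomes the new shadow directory.
        self.shadow = list(self.current)

    def abort(self):
        # Discard the changes by going back to the shadow directory.
        self.current = list(self.shadow)
```

Because a write never touches the slot that the shadow directory points to, undoing a transaction only requires switching directories.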
A larger timestamp value indicates a more recent event or operation. The timestamp-based
algorithm applies the following rules.
1. Transaction T issues a write_item(X) operation:
a. If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then a younger transaction has already
read or written the data item, so abort and roll back T and reject the operation.
b. If the condition in part (a) does not exist, then execute write_item(X) of T and set
write_TS(X) to TS(T).
2. Transaction T issues a read_item(X) operation:
a. If write_TS(X) > TS(T), then a younger transaction has already written to the data item,
so abort and roll back T and reject the operation.
b. If write_TS(X) ≤ TS(T), then execute read_item(X) of T and set read_TS(X) to the larger
of TS(T) and the current read_TS(X).
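The read and write rules above translate almost directly into code. The following is a minimal sketch: the class Item and the functions to_write and to_read are hypothetical names, timestamps are plain integers, and a rejected operation returns False instead of actually rolling the transaction back.

```python
class Item:
    """A data item X with its value, read_TS(X) and write_TS(X)."""

    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0
        self.write_ts = 0


def to_write(ts, item, value):
    """write_item(X) by a transaction with timestamp ts; False means abort."""
    if item.read_ts > ts or item.write_ts > ts:
        return False                  # a younger transaction got there first
    item.value = value
    item.write_ts = ts
    return True


def to_read(ts, item):
    """read_item(X) by a transaction with timestamp ts; (False, None) means abort."""
    if item.write_ts > ts:
        return False, None            # a younger transaction already wrote X
    item.read_ts = max(item.read_ts, ts)
    return True, item.value
```

For instance, after a transaction with timestamp 5 reads X, a write by a transaction with timestamp 3 is rejected, while a write by a transaction with timestamp 7 succeeds.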
2. Repeating history during redo: ARIES will retrace all actions of the database
system prior to the crash to reconstruct the database state when the crash
occurred.
3. Logging changes during undo: It will prevent ARIES from repeating the
1. Analysis: This step identifies the dirty (updated) pages in the buffer and the set of
transactions active at the time of crash. The appropriate point in the log where
A log record is written for (a) data update, (b) transaction commit, (c) transaction abort, (d)
undo, and (e) transaction end. In the case of undo a compensating log record is written. A
unique LSN is associated with every log record. LSN increases monotonically and indicates
the disk address of the log record it is associated with. In addition, each data page stores the
LSN of the latest log record corresponding to a change for that page. A log record stores (a)
the previous LSN of that transaction, (b) the transaction ID, and (c) the type of log record.
1. Previous LSN of that transaction: It links the log records of each transaction. It acts
as a back pointer to the previous record of the same transaction.
2. Transaction ID
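The log record fields listed above, and the way the previous LSN chains together the records of one transaction, can be sketched as follows. This is an illustration under assumptions: the names LogRecord, append_record and records_of are hypothetical, and the LSN is a simple counter here, not a disk address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int                  # unique, monotonically increasing
    prev_lsn: Optional[int]   # previous record of the same transaction
    txn_id: str
    rec_type: str             # e.g. "update", "commit", "abort", "undo", "end"


def append_record(log, last_lsn, txn_id, rec_type):
    """Append a record and link it to the transaction's previous record."""
    rec = LogRecord(len(log) + 1, last_lsn.get(txn_id), txn_id, rec_type)
    log.append(rec)
    last_lsn[txn_id] = rec.lsn
    return rec


def records_of(log, last_lsn, txn_id):
    """Follow the back pointers of one transaction, newest record first."""
    by_lsn = {r.lsn: r for r in log}
    lsn = last_lsn.get(txn_id)
    chain = []
    while lsn is not None:
        chain.append(by_lsn[lsn])
        lsn = by_lsn[lsn].prev_lsn
    return chain
```

Walking the chain backwards in this way is what makes it possible to visit only the records of a single transaction without scanning the whole log.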
2. Writes an end_checkpoint record in the log. With this record the contents of
transaction table and dirty page table are appended to the end of the log.
3. Writes the LSN of the begin_checkpoint record to a special file. This special file is