Dbms Sanchit Sir Notes Compress
E-R DIAGRAM/MODEL
• Introduced in 1976 by Dr. Peter Chen, the E-R model is a non-technical design method that works at the conceptual level, based on the perception of the real world.
• It consists of collections of basic objects, called entities, of relationships among these entities, and of attributes which define their properties.
• E-R data model was developed to facilitate database design by allowing specification of
an enterprise schema that represents the overall logical structure of a database.
• E-R model is very useful in mapping the meanings and interactions of real-world
enterprises onto a conceptual schema.
• It is free from ambiguities and provides a standard and logical way of visualizing the
data.
• Since it is basically a diagrammatic representation, it is easy to understand even for a non-technical user.
E R MODEL BASIC
• ENTITY - An entity is a thing or an object in the real world that is distinguishable from other objects based on the values of the attributes it possesses.
• An entity may be concrete, such as a person or a book, or it may be abstract, such as a
course, a course offering, or a flight reservation.
Types of Entity-
➢ Tangible - Entities which physically exist in real world. E.g. - Car, Pen, locker
➢ Intangible - Entities which exist logically. E.g. – Account, video.
ENTITY SET - A collection of entities of the same type that share the same properties or attributes.
ATTRIBUTES
• Attributes are the units that define and describe the properties and characteristics of entities.
• Attributes are the descriptive properties possessed by each member of an entity set. For each attribute there is a set of permitted values, called its domain.
• In an ER diagram, attributes are represented by an ellipse or oval connected to a rectangle.
• In a relational model they are represented by independent columns, e.g. Instructor (ID, name, salary, dept_name)
Types of Attributes
• Single valued- Attributes having single value at any instance of time for an entity. E.g. –
Aadhar no, dob.
• Multivalued - Attributes which can have more than one value for an entity at same time.
E.g. - Phone no, email, address.
o A multivalued attribute is represented by a double ellipse in an ER diagram and by
an independent table in a relational model.
• Create a separate table for each multivalued attribute, taking the multivalued attribute and the primary key of the main table (as a foreign key) into the new table.
• Simple - Attributes which cannot be divided further into sub parts. E.g. Age
• Composite - Attributes which can be further divided into sub-parts, i.e. into simple attributes. A composite attribute is represented by an ellipse connected to other ellipses, and in a relational model by a separate column for each component attribute.
• Stored - Main attributes whose value is permanently stored in database. E.g.
date_of_birth
• Derived -The value of these types of attributes can be derived from values of other
Attributes. E.g. - Age attribute can be derived from date_of_birth and Date attribute.
• Descriptive attribute - An attribute of a relationship is called a descriptive attribute.
• An attribute takes a null value when an entity does not have a value for it. The null value
may indicate “not applicable”— that is, that the value does not exist for the entity.
• Null can also designate that an attribute value is unknown. An unknown value may be
either missing (the value does exist, but we do not have that information) or not known
(we do not know whether or not the value actually exists).
Relationship / Association
• A relationship is an association between two or more entities of the same or different entity sets.
• In an ER diagram we cannot represent an individual relationship, as it is an instance (i.e. data); we represent relationship types/sets.
• Note: - normally people use word relationship for relationship type so don’t get
confused.
• In an ER diagram it is represented by a diamond, while in relational model sometimes
through foreign key and other time by a separate table.
• Every relationship type has three components: name, degree, and structural constraints (cardinality ratios, participation).
• Unary Relationship - A single entity set participates in the relationship, i.e. two entities of the same entity set are related to each other. These are also called self-referential relationship sets. E.g. a member of a team may be the supervisor of another member of the team.
• Ternary Relationship - When three entities participate in a Relationship. E.g. The
University might need to record which teachers taught which subjects in which
courses.
Structural constraints (Cardinalities Ratios, Participation)
• An E-R enterprise schema may define certain constraints to which the contents of a
database must conform.
• Cardinality ratios express the number of entities to which another entity can be associated via a relationship set. The four possible categories are:
• One to One (1:1) Relationship.
• One to Many (1: M) Relationship.
• Many to One (M: 1) Relationship.
• Many to Many (M: N) Relationship.
• One to One (1:1) Relationship- An entity in A is associated with at most one entity in
B, and an entity in B is associated with at most one entity in A.
E.g.- The directed line from relationship set advisor to both entities set indicates that ‘an
instructor may advise at most one student, and a student may have at most one
advisor’.
• One to Many (1: M) Relationship - An entity in A is associated with any number (zero
or more) of entities in B. An entity in B, however, can be associated with at most one
entity in A.
E.g.- This indicates that an instructor may advise many students, but a student may have
at most one advisor.
• Many to One (M: 1) Relationship- An entity in A is associated with at most one entity
in B. An entity in B, however, can be associated with any number (zero or more) of
entities in A.
E.g.- This indicates that student may have many instructors but an instructor can advise
at most one student.
• Many to Many(M:N) Relationship- An entity in A is associated with any number (zero
or more) of entities in B, and an entity in B is associated with any number (zero or
more) of entities in A.
E.g.- This indicates a student may have many advisors and an instructor may advise many
students.
Q Match the following:
Codes:
a b c d
a iii iv ii i
b iv iii ii i
c ii iii iv i
d iii iv i ii
Participation constraints
• Participation constraint specifies whether the existence of an entity depends on its being
related to another entity via the relationship type.
• These constraints specify the minimum and maximum number of relationship instances
that each entity must/can participate in.
• Max cardinality - defines the maximum number of times an entity occurrence can participate in a relationship.
• Min cardinality - defines the minimum number of times an entity occurrence can participate in a relationship.
• PARTICIPATION CONSTRAINTS- it defines participations of entities of an entity type in a
relationship.
• Partial participation
• Total Participation
PARTIAL PARTICIPATION (min cardinality zero) - In partial participation only some entities of the entity set participate in the relationship set; that is, there exists at least one entity which does not participate in the relationship.
TOTAL PARTICIPATION (min cardinality at least one) - In total participation every entity of
an entity set participates in at least one relationship in Relationship set.
• Min/max cardinalities can be represented either by single/double lines or by (min, max) notation, which conveys more information.
Q What should be the condition for total participation of the entity in a relation?
a) Maximum cardinality should be one b) Minimum cardinality should be zero
c) Minimum cardinality should be one d) None of these
• A line may have an associated minimum and maximum cardinality, shown in the form
l..h, where l is the minimum and h the maximum cardinality.
• The line between advisor and student has a cardinality constraint of 1..1, meaning
the minimum and the maximum cardinality are both 1. That is, each student must
have exactly one advisor.
• The limit 0..∗ on the line between advisor and instructor indicates that an instructor
can have zero or more students. Thus, the relationship advisor is one-to-many from
instructor to student, and further the participation of student in advisor is total,
implying that a student must have an advisor.
Q Consider an Entity-Relationship (ER) model in which entity sets E1 and E2 are connected
by an m: n relationship R12, E1 and E3 are connected by a 1: n (1 on the side of E1 and n on
the side of E3) relationship R13. E1 has two single-valued attributes a11 and a12 of which
a11 is the key attribute. E2 has two single-valued attributes a21 and a22, of which a21 is the key
attribute. E3 has two single valued attributes a31 and a32 of which a31 is the key attribute.
The relationships do not have any attributes. If a relational model is derived from the above
ER model, then the minimum number of relations that would be generated if all the
relations are in 3NF is ___________. (GATE-2015) (2 Marks)
a) 2 b) 3 c) 4 d) 5
Answer: 4
Q Let M and N be two entities in an E-R diagram with simple single value attributes. R1 and
R2 are two relationship between M and N, where as
R1 is one-to-many and R2 is many-to-many.
The minimum number of tables required to represent M, N, R1 and R2 in the relational
model are _______. (NET-JAN-2017)
(1) 4 (2) 6 (3) 7 (4) 3
Ans: 4
Q Given the basic ER and relational models, which of the following is INCORRECT? (GATE-
2012) (1 Marks)
(A) An attribute of an entity can have more than one value
(B) An attribute of an entity can be composite
(C) In a row of a relational table, an attribute can have more than one value
(D) In a row of a relational table, an attribute can have exactly one value or a NULL value
Ans: c
Q Let E1 and E2 be two entities in an E/R diagram with simple single-valued attributes. R1
and R2 are two relationships between E1 and E2, where R1 is one-to-many and R2 is many-
to-many. R1 and R2 do not have any attributes of their own. What is the minimum number
of tables required to represent this situation in the relational model? (GATE-2005) (2
Marks)
a) 2 b) 3 c) 4 d) 5
Ans: b
Q Consider the entities ‘hotel room’, and ‘person’ with a many to many relationship
‘lodging’ as shown below:
If we wish to store information about the rent payment to be made by person (s) occupying
different hotel rooms, then this information should appear as an attribute of (GATE-2005)
(1 Marks)
(A) Person (B) Hotel Room (C) Lodging (D) None of these
Ans: c
Q Consider the following entity relationship diagram (ERD), where two entities E1 and E2
have a relation R of cardinality 1 : m.
The attributes of E1 are A11, A12 and A13 where A11 is the key attribute. The attributes of
E2 are A21, A22 and A23 where A21 is the key attribute and A23 is a multi-valued attribute.
Relation R does not have any attribute. A relational database containing minimum number
of tables with each table satisfying the requirements of the third normal form (3NF) is
designed from the above ERD. The number of tables in the database is (GATE-2004)(2
Marks)
(A) 2 (B) 3 (C) 5 (D) 4
Ans: b
Q Let E1 and E2 be two entities in E-R diagram with simple single valued attributes. R1and
R2 are two relationships between E1 and E2 where R1 is one - many and R2 is many - many.
R1 and R2 do not have any attribute of their own. How many minimum numbers of tables
are required to represent this situation in the Relational Model? (NET-DEC-2015)
(1) 4 (2) 3 (3) 2 (4) 1
Ans. 2
Q The minimum number of tables required to convert the following ER diagram to relation
model is _____________
Q The minimum number of tables required to convert the following ER diagram to
Relational model is ____________
STRONG AND WEAK ENTITY SET
• A key for an entity is a set of attributes that suffice to distinguish entities from each
other. The concepts of super key, candidate key, and primary key are applicable to
entity sets just as they are applicable to relation schemas.
• An entity set is called a strong entity set if it has a primary key; all the tuples in the set are distinguishable by that key. Convert every strong entity set into an independent table.
• An entity set that does not possess sufficient attributes to form a primary key is called a weak entity set. It contains discriminator attributes (a partial key) which carry partial information about the entity, but are not sufficient to identify each tuple uniquely. It is represented by a double rectangle.
• The discriminator of a weak entity set is also called the partial key of the entity set.
• The discriminator of a weak entity is underlined with a dashed, rather than a solid,
line.
• For a weak entity set to be meaningful (and convertible into a strong entity set), it must be associated with another entity set, called the identifying or owner entity set; i.e. the weak entity set is said to be existence dependent on the identifying entity set.
• The identifying entity set is said to own the weak entity set that it identifies. The primary key of the weak entity set is the union of the primary key of the identifying entity set and its discriminator attributes.
• The relationship associating the weak entity set with the identifying entity set is called
the identifying relationship (double diamonds).
• The identifying relationship is many to one from the weak entity set to identifying entity
set, and the participation of the weak entity set in relationship is always total.
• The identifying relationship set should not have any descriptive attributes, since any
such attributes can instead be associated with the weak entity set.
• A weak entity set may participate as owner in an identifying relationship with another
weak entity set.
• Convert every weak entity set into a table
Q For a weak entity set to be meaningful, it must be associated with another entity set in
combination with some of their attribute values, is called as: (NET-DEC-2015)
(1) Neighbor Set (2) Strong Entity Set
(3) Owner Entity Set (4) Weak Set
Ans. 3
Q Which of the following statements is FALSE about weak entity set? (NET-DEC-2015)
a) Weak entities can be deleted automatically when their strong entity is deleted.
b) Weak entity set avoids the data duplication and consequent possible inconsistencies
caused by duplicating the key of the strong entity.
c) A weak entity set has no primary keys unless attributes of the strong entity set on which
it depends are included
d) Tuples in a weak entity set are not partitioned according to their relationship with tuples
in a strong entity set.
Ans. D
Ans: 3
(GATE-2008) (2 Marks)
a) {M1, M2, M3, P1} b) {M1, P1, N1, N2}
c) {M1, P1, N1} d) {M1, P1}
Ans: a
Q The entity type on which the ................. type depends is called the identifying owner.
(NET-DEC-2004)
(A) Strong entity (B) Relationship
(C) Weak entity (D) E – R
Ans: c
Traps
• It is possible that even after all efforts some problems remain with an ER diagram. These problems are called traps. There are two types of traps in an ER diagram.
• FAN TRAP - If two or more 1:M relationships emerge from a single entity, there may be a fan trap.
• E.g. a single site contains many departments and employs many staff; however, which staff work in a particular department cannot be determined from this model.
• The fan trap is resolved by restructuring the original ER model to represent the correct
association.
• CHASM TRAP - If two directly related entities are connected through another (third) entity with partial participation, there is a chasm trap. Logic suggests the existence of a relationship between the entity sets, but the relationship does not exist between certain entity occurrences in the ER diagram.
• A model suggests the existence of a relationship between entity types, but the
pathway does not exist between certain entity occurrences. Following is the example
in different notations.
RELATIONAL DATABASE MANAGEMENT SYSTEM
• A relational database management system (RDBMS) is a database engine/system based on the relational model specified in 1970 by Edgar F. Codd, the father of modern relational database design.
• Most modern commercial and open-source database applications are relational in
nature. The most important relational database features include an ability to use tables
for data storage while maintaining and enforcing certain data relationships.
Student
Properties of Relational tables
• Each row is unique
• Each column has a unique name
• The sequence of rows is insignificant
• The sequence of columns is insignificant.
• Values are atomic
• Column values are of the same kind
• No two tables can have the same name in a relational schema.
• For all relations, the domain of each attribute must be atomic: its elements are indivisible (cannot be divided further), and all values of an attribute must come from the same domain.
Update Anomalies - Anomalies that cause redundant work to be done during insertion into and modification of a relation, and that may cause accidental loss of information during a deletion from a relation.
Purpose of Normalization
FUNCTIONAL DEPENDENCY
• A formal tool for analysis of relational schemas.
• In a relation R, if X ⊆ R and Y ⊆ R, then the attribute (or set of attributes) X functionally determines the attribute (or set of attributes) Y,
o iff each X value is associated with precisely one Y value;
o i.e., for all pairs of tuples t1 and t2 in R,
▪ if t1[X] = t2[X]
▪ then t1[Y] = t2[Y].
o X - Determinant (determines the Y value).
o Y - Dependent (depends on X).
o If K → R, then K is a super key of R.
Q Which of the following functional dependencies are satisfied by the instance? (GATE CS
2000)
X Y Z
1 4 2
1 5 3
2 6 3
3 2 2
(A) XY -> Z and Z -> Y (B) YZ -> X and Y -> Z
(C) YZ -> X and X -> Z (D) XZ -> Y and Y -> X
Answer: (B)
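Questions like the one above can be checked mechanically: an FD X → Y holds in an instance iff no two rows that agree on X differ on Y. Below is a small Python sketch (the table is the instance from the GATE CS 2000 question; `fd_holds` is a helper name of my own, not from the notes):

```python
# Check whether an FD X -> Y holds in a relation instance: every pair of
# tuples that agree on X must also agree on Y.

def fd_holds(rows, lhs, rhs):
    """rows: list of dicts; lhs, rhs: lists of attribute names."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # two tuples agree on X but differ on Y
        seen[x] = y
    return True

instance = [
    {"X": 1, "Y": 4, "Z": 2},
    {"X": 1, "Y": 5, "Z": 3},
    {"X": 2, "Y": 6, "Z": 3},
    {"X": 3, "Y": 2, "Z": 2},
]

print(fd_holds(instance, ["Y", "Z"], ["X"]))  # True:  YZ -> X holds
print(fd_holds(instance, ["Y"], ["Z"]))       # True:  Y -> Z holds, so (B) is satisfied
print(fd_holds(instance, ["Z"], ["Y"]))       # False: Z = 2 occurs with Y = 4 and Y = 2
```

Option (A) fails because Z → Y does not hold in this instance: Z = 2 appears with both Y = 4 and Y = 2.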
Which of the following dependencies are satisfied by the above relation instance?
a) A->B, BC->A b) C->B, CA->B c) B->C, AB->C d) A->C, BC->A
X Y Z
1 4 3
1 5 3
4 6 3
3 2 2
XZ>X
XY>Z
Z>Y
Y>Z
XZ>Y
Q Consider the following relation instance, which of the following dependency doesn’t hold
A B C
1 2 3
4 2 3
5 3 3
Q From the following instance of a relation scheme R (A, B, C), we can conclude that (CS-
2002)
A B C
1 1 1
1 1 0
2 3 2
2 3 2
Q From the following instance of the relation schema R (A, B, C) we can conclude that
d) None of these
ATTRIBUTES CLOSURE/CLOSURE ON ATTRIBUTE SET/ CLOSURE
SET OF ATTRIBUTES
• The attribute closure of an attribute set X under an FD set F is the set of attributes that can be functionally determined from X using F.
• Denoted by X+.
ARMSTRONG’S AXIOMS
a. Decomposition rule i. If X → Y and Z → W then {X, Z} → {Y, W}
b. Union rule ii. If X → Y and {Y, W} →Z then {X, W} → Z
c. Composition rule iii. If X → Y and X→ Z then X → {Y, Z}
d. Pseudo transitivity rule iv. If X → {Y, Z} then X → Y and X → Z
Codes:
a b c d
(A) iii ii iv i
(B) i iii iv ii
(C) ii i iii iv
(D) iv iii i ii
Ans: d
Q The decomposition rule is
a) XZ → YZ, X → Y b) X → Y, Y → Z | X → YZ
c) X → YZ | X → Y, X → Z d) X → Y, WY → Z | WX → Z
Ans: c
A→B
B→C
AB → D
A+ = {A, B, C, D}.
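The closure computation used just above can be sketched as a short fixed-point loop: keep applying every FD whose left side already lies inside the closure until nothing new is added (the FD set is the A → B, B → C, AB → D example):

```python
# Attribute closure: repeatedly apply every FD whose left-hand side is
# already contained in the closure, until the set stops growing.

def closure(attrs, fds):
    """attrs: iterable of attribute names; fds: list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("A", "B"), ("B", "C"), ("AB", "D")]
print(sorted(closure("A", fds)))  # ['A', 'B', 'C', 'D'], i.e. A+ = {A, B, C, D}
```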
Q Consider the relation X (P, Q, R, S, T, U) with the following set of functional dependencies
F={
{P, R} → {S, T},
{P, S, U} → {Q, R}
}
Which of the following is a trivial functional dependency in F+, where F+ is the closure of F?
a) {P,R} → {S,T} b) {P,R} → {R,T}
c) {P,S} → {S} d) {P,S,U} → {Q}
Q R(ABCDEFG)
A>B
BC>DE
AEG>G
(AC)+ =?
Q R(ABCDE)
A>BC
CD>E
B>D
E>A
(B)+ =
Q R(ABCDEF)
AB>C
BC>AD
D>E
CF>B
(AB)+ =
Q R(ABCDEFGH)
A>BC
CD>E
E>C
D>AEH
ABH>BD
DH>BC
(BCD)+ =
Q Let R = ABCDE is a relational scheme with functional dependency set F = {A → B, B → C,
AC → D}.
The attribute closures of A and E are (NET-DEC-2014)
(A) ABCD, φ (B) ABCD, E (C) Φ, φ (D) ABC, E
Ans: b
Q In a relation R (A, B, C, D), given F = {A → B, B → C, C → D}, check whether A → C is valid or not.
APPLICATION OF ATTRIBUTE CLOSURE
Equivalence of two FD sets -
F = {A → C, AC → D, E → AD, E → H}
G = {A → CD, E → AH}
Q R(VWXYZ)
F = {V → W, VW → X, Y → VX, Y → Z}
G = {V → W, V → X, Y → V, Y → Z}
F = {B → CD, AD → E, B → A}
G = {B → CDE, A → BC, AD → E}
F = {P → Q, Q → R, R → S}
G = {P → QR, R → S}
F = {A → B, B → C, C → A}
G = {A → BC, B → A, C → A}
F = {W → X, WX → Y, Z → WY, Z → V}
G = {W → XY, Z → WX}
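Equivalence of two FD sets F and G can be decided with attribute closures: F and G are equivalent iff every FD of G follows from F and vice versa. A sketch, using the first F/G pair of this section (F = {A→C, AC→D, E→AD, E→H}, G = {A→CD, E→AH}); `covers` is a helper name of my own:

```python
# Two FD sets are equivalent iff each one covers (implies) the other,
# which we test by computing attribute closures.

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every FD in g is implied by f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
G = [("A", "CD"), ("E", "AH")]

print(covers(F, G) and covers(G, F))  # True: F and G are equivalent
```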
To find the MINIMAL COVER /CANONICAL
COVER/IRREDUCIBLE SET
Minimal cover- It means to eliminate any kind of redundancy from a FD set.
Q R(ABCD)
A>B
C>B
D>ABC
AC>D
Q R(VWXYZ)
V>W
VW>X
Y>VX
Y>Z
Q Find the minimal cover for the FD set {AC → BD, A→C, B → C, D → C}
a) {A → B, A → C, B → C, A → D} b) {A →B, A → D, A → C, B →D}
c) {A → B, A → D, A → C, B → C} d) {A → B, A → D, D →C, B → C}
Q R(WXYZ)
X>W
WZ>XY
Y>WXZ
Q The following functional dependencies hold true for the relational schema {V, W, X, Y, Z}:
GATE (2017 SET 1)
V→W
VW → X
Y → VX
Y→Z
Which of the following is irreducible equivalent for this set of functional dependencies?
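The three canonical-cover steps (split right sides into single attributes, drop extraneous left-side attributes, drop redundant FDs) can be sketched as below; applied to the GATE 2017 set above it yields the irreducible set {V→W, V→X, Y→V, Y→Z}:

```python
# Canonical (minimal) cover: singleton right-hand sides, no extraneous
# left-hand-side attributes, no redundant FDs.

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: singleton right-hand sides.
    fds = [(frozenset(l), frozenset(a)) for l, r in fds for a in r]
    # Step 2: remove extraneous attributes from left sides.
    step2 = []
    for lhs, rhs in fds:
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, fds):
                lhs = lhs - {a}  # 'a' is extraneous: the rest still derives rhs
        step2.append((lhs, rhs))
    # Step 3: remove FDs the remaining set already implies.
    result = list(step2)
    for fd in step2:
        rest = [f for f in result if f != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return set(result)

F = [("V", "W"), ("VW", "X"), ("Y", "VX"), ("Y", "Z")]
cover = minimal_cover([(set(l), set(r)) for l, r in F])
for lhs, rhs in sorted(cover, key=lambda f: (sorted(f[0]), sorted(f[1]))):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# V -> W
# V -> X
# Y -> V
# Y -> Z
```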
To Find a candidate key of a Relation
• We must specify how tuples within a given relation are distinguished from other tuples: the values of one or more attributes of a tuple must uniquely identify it. So a key is a column (or set of columns) in a table used to uniquely identify each tuple in a relation.
• Various Keys used in database System are as follows-
Super key
• A set of attributes using which we can identify each tuple uniquely (i.e. which determines all the remaining attributes) is called a super key.
• Let X be a set of attributes in a Relation R, if X+ determines all attributes of R then X is
said to be Super key of R.
• Every table will have at least one super key.
• Biggest Super Key possible in a Relation is a Set comprising all attributes of a Relation.
• In a relation of ‘n’ attributes where every single attribute is a super key, there are 2^n − 1 super keys in total.
1. A → B
2. B → C
3. AB→ D
Q The maximum number of super keys for the relation schema R(E, F, G, H) with E as the
key is (Gate-2014) (1 Marks)
a) 5 b) 6 c) 7 d) 8
Ans: d
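The answer can be verified by brute force: with E the only candidate key of R(E, F, G, H), an attribute set is a super key iff it contains E, giving 2^(4−1) = 8 such sets:

```python
# Count super keys of R(E, F, G, H) when E is the key: a set is a super key
# iff it contains E, so there are 2^3 = 8 of them.
from itertools import combinations

attrs = ["E", "F", "G", "H"]
super_keys = [
    set(s)
    for n in range(1, len(attrs) + 1)
    for s in combinations(attrs, n)
    if "E" in s  # any superset of the key {E} is a super key
]
print(len(super_keys))  # 8
```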
Candidate key
• A super key no proper subset of which is itself a super key is called a candidate key, also called a MINIMAL SUPER KEY.
• There should be at least one candidate key with Not Null constraint.
• Prime attribute - Attributes that are members of candidate keys are called prime attributes.
Q (NET-JUNE-2019)
Primary key
• One of the candidate keys is selected by database administrator as a Primary Key.
• Primary Key attribute are not allowed to have Null values.
• Candidate keys which are not chosen as the primary key are called alternate keys.
Q Which one is correct w.r.t. RDBMS? (NET-JAN-2017)
(1) primary key ⊆ super key ⊆ candidate key
(2) primary key ⊆ candidate key ⊆ super key
(3) super key ⊆ candidate key ⊆ primary key
(4) super key ⊆ primary key ⊆ candidate key
Ans: b
Q Consider the following database table having A, B, C and D as its four attributes and four
possible candidate keys (I, II, III and IV) for this table: (NET-JULY-2016)
Foreign Keys
• A foreign key is a column or group of columns in a relational database table that refers to the primary key of the same table or of some other table, to represent a relationship.
• The concept of referential integrity is derived from foreign key theory.
Q Let R1(a, b, c) and R2(x, y, z) be two relations in which a is the foreign key of R1 that
refers to the primary key of R2. Consider following four options. (NET-JULY-2018)
(a) Insert into R1 (b) Insert into R2
(c) Delete from R1 (d) Delete from R2
Which of the following is correct about the referential integrity constraint with respect to
above?
(1) Operations (a) and (b) will cause violation.
(2) Operations (b) and (c) will cause violation.
(3) Operations (c) and (d) will cause violation.
(4) Operations (d) and (a) will cause violation.
Ans: 4
Q A many-to-one relationship exists between entity sets r1 and r2. How will it be
represented using functional dependencies if Pk(r) denotes the primary key attribute of
relation r ? (NET-JULY-2018)
(1) Pk(r1) → Pk(r2) (2) Pk(r2) → Pk(r1)
(3) Pk(r2) → Pk(r1) and Pk(r1) → Pk(r2) (4) Pk(r2) → Pk(r1) or Pk(r1) → Pk(r2)
Ans: a
Q The following table has two attributes A and C, where A is the primary key and C is the foreign key referencing A with on-delete cascade.
The set of all tuples that must be additionally deleted to preserve referential integrity when
the tuple (2,4) is deleted is: (GATE- 2005) (2 Marks)
a) (3,4) and (6,4) b) (5,2) and (7,2)
c) (5,2),(7,2) and (9,5) d) (3,4),(4,3) and (6,4)
Ans: c
Q Drop Table cannot be used to drop a Table referenced by __________ constraint. (NET-
JUNE-2015)
(a)Primary key (b)Sub key (c)Super key (d)Foreign key
(1)(a) (2)(a), (b) and (c (3)(d) (4)(a) and (d)
Ans.3
Q Consider the following table consisting of two attributes A and B, where ‘B’ is the foreign
key referring the candidate key ‘A’ with on – delete cascade option
A B
8 9
4 6
7 6
3 2
6 5
5 1
1 2
2 3
When we delete the tuple (3, 2), we need to delete a few tuples additionally in order to preserve referential integrity. The number of tuples remaining in the table after we delete (3, 2) and any additional tuples necessary is ______
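The cascade can be simulated directly (a sketch; tuples are (A, B) pairs from the table above, with B referencing A under on-delete cascade):

```python
# Simulate ON DELETE CASCADE where column B is a foreign key referencing
# column A of the same table: deleting a row removes its A value, which in
# turn deletes every row whose B refers to that A, and so on.

rows = {(8, 9), (4, 6), (7, 6), (3, 2), (6, 5), (5, 1), (1, 2), (2, 3)}

def cascade_delete(rows, start):
    rows = set(rows)
    pending = [start]
    while pending:
        row = pending.pop()
        if row in rows:
            rows.discard(row)
            a_value = row[0]
            # rows whose FK column B references the deleted A cascade next
            pending.extend(r for r in rows if r[1] == a_value)
    return rows

remaining = cascade_delete(rows, (3, 2))
print(remaining, len(remaining))  # {(8, 9)} 1
```

Starting from (3, 2), the cascade removes (2, 3), (1, 2), (5, 1), (6, 5), (4, 6) and (7, 6), leaving a single tuple.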
Q Consider the following tables T1 and T2.
In table T1, P is the primary key and Q is the foreign key referencing R in table T2 with on-
delete cascade and on-update cascade. In table T2, R is the primary key and S is the foreign
key referencing P in table T1 with on-delete set NULL and on-update cascade. In order to
delete record 〈3,8〉 from table T1, the number of additional records that need to be deleted
from table T1 is _________. (GATE- 2017) (1 Marks)
Composite key - a key composed of more than one column; sometimes also known as a concatenated key.
Secondary key - a key used to speed up search and retrieval; contrary to a primary key, a secondary key does not necessarily contain unique values.
Q R(ABCD)(AB, BD)
AB>CD
D>A
Q R(ABCDEF)(BF)
AB>C
C>D
B>AE
Q R(ABC)(AB, BC)
AB>C
C>A
Q R(ABCDEFGHIJ)(AB)
AB>C
A>DE
B>F
F>GH
D>IJ
Q R(ABCDEFGHIJ)(ABD)
AB>C
AD>GH
BD>EF
A>I
H>J
Q R(ABCDE)(CE)
CE>D
D>B
C>A
Q R(ABCDEFGH)(AE)
A>BC
ABE>CDGH
C>GD
D>G
E>F
Q R(ABCDE)(BC, CD)
BC>ADE
D>B
Q R(ABCDEF)(ABD, BCD)
AB>C
DC>AE
E>F
D>BE
E>F
F>A
Q R(VWXYZ)(WYZ)
Z>Y
Y>Z
X>YV
VW>X
Q R(ABCDEF)(ABC, ACD)
ABC>D
ABD>E
CD>F
CDF>B
BF>D
Q R(ABCDE)
A>BC
CD>E
B>D
E>A
Q R(ABCDEF)
A>BCDEF
BC>ADEF
DEF>ABC
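Exercises like the ones above can be checked by brute force: a candidate key is a minimal attribute set whose closure is the whole relation. A sketch, tested against the R(ABC)(AB, BC) exercise above (FDs AB → C and C → A); `candidate_keys` is a helper name of my own:

```python
# Brute-force candidate-key finder: enumerate attribute sets from smallest to
# largest, keep those whose closure covers all attributes, and discard any
# set that contains an already-found (smaller) key.
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    keys = []
    for n in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), n):
            if closure(combo, fds) == set(attrs):
                # minimality: no previously found key may be a subset
                if not any(set(k) <= set(combo) for k in keys):
                    keys.append(combo)
    return keys

print(candidate_keys("ABC", [("AB", "C"), ("C", "A")]))  # [('A', 'B'), ('B', 'C')]
```

This agrees with the listed answer (AB, BC): AB+ = {A, B, C} and BC+ = {B, C, A}, while no single attribute determines everything.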
Q (NET-DEC-2018)
(a) Entity integrity (i) enforces some specific business rule that do not fall
into entity or domain
(b) Domain integrity (ii) Rows can’t be deleted which are used by other records
(c) Referential integrity (iii) enforces valid entries for a column
(d) User defined integrity (iv) No duplicate rows in a table
Code:
a b c d
1 iii iv i ii
2 iv iii ii i
3 iv ii iii i
4 ii iii iv i
Ans: 2
Q Let pk(R) denotes primary key of relation R. A many-to-one relationship that exists
between two relations R1 and R2 can be expressed as follows: (NET-JAN-2017)
(1) pk(R2) → pk(R1) (2) pk(R1) → pk(R2)
(3) pk(R2) → R1 ∩ R2 (4) pk(R1) → R1 ∩ R2
Ans: 2
Q Find the candidate key(s) of the relation R(UVWXYZ) with FD set F = {UV →W, XW → Y,
U→ XZ, Y → U} (NET-DEC-2015)
a) XW b) UV, YV and WXV c) YUV, XV and WV d) None of these
Answer: (B)
Q Identify the minimal key for relational scheme R(A, B, C, D, E) with functional
dependencies F = {A → B, B → C, AC → D} (NET-DEC-2014)
(A) A (B) AE (C) BE (D) CE
Answer: (B)
For
(StudentName, Student Age) to be the key for this instance, the value X should not be equal
to______________ (GATE- 2014) (1 Marks)
Ans 19
Q Relation R has eight attributes ABCDEFGH. Fields of R contain only atomic values. F = {CH
-> G, A -> BC, B -> CFH, E -> A, F -> EG} is a set of functional dependencies (FDs) so that F+ is
exactly the set of FDs that hold for R. How many candidate keys does the relation R have?
(GATE- 2013) (2 Marks)
(A) 3 (B) 4 (C) 5 (D) 6
Answer: (B)
Q Consider a relational table with a single record for each registered student with the
following attributes.
1. Registration_Num: Unique registration number of each registered student
2. UID: Unique identity number, unique at the national level for each citizen
3. BankAccount_Num: Unique account number at the bank. A student can have multiple
accounts or join accounts. This attribute stores the primary account number.
4. Name: Name of the student
5. Hostel_Room: Room number of the hostel
Which one of the following option is INCORRECT? (GATE- 2011) (1 Marks)
A BankAccount_Num is candidate key
B Registration_Num can be a primary key
C UID is candidate key if all students are from the same country
D If S is a superkey such that S∩UID is NULL then S∪UID is also a superkey
Ans: a
c. Event control action model iii. Encryption
d. Data security iv. Trigger
Codes:
a b c d
a) iii ii i iv
b) ii i iv iii
c) iii iv i ii
d) i ii iii iv
Ans: b
Q _______ constraints ensure that a value that appears in one relation for a given set of
attributes also appears for a certain set of attributes in another relation. (NET-SEP-2013)
(A) Logical Integrity (B) Referential Integrity
(C) Domain Integrity (D) Data Integrity
Ans: b
Q The student marks should not be greater than 100. This is (NET-DEC-2013)
(A) Integrity constraint (B) Referential constraint
(C) Over-defined constraint (D) Feasible constraint
Ans: a
Q In RDBMS, the constraint that no key attribute (column) may be NULL is referred to
as: (NET-JULY-2016)
(1) Referential integrity (2) Multi-valued dependency
(3) Entity Integrity (4) Functional dependency
Ans. 3
Normalization
• Just as one paragraph should contain a single idea, one table must contain information about a single idea; otherwise we have to repeat one piece of information for another.
Q Relational database schema normalization is NOT for: (NET-AUG-2016)
(1) reducing the number of joins required to satisfy a query.
(2) eliminating uncontrolled redundancy of data stored in the database.
(3) eliminating number of anomalies that could otherwise occur with inserts and deletes.
(4) ensuring that functional dependencies are enforced.
Ans: a
Ans. 2
FIRST NORMAL FORM
• A relation (table) is said to be in first normal form iff each attribute in each cell has a single (atomic) value; i.e. a relation should not contain any multivalued or composite attributes.
• Other implications of first normal form
o Every row should be unique, that is no two rows should have the same values of
all the attributes.
o There must be a primary key.
o Every column should have a unique name
o Order of row and column is irrelevant
Customer
• This table is not in first normal form, as the column Telephone Number contains multiple values in a single cell.
Solution
• An apparent solution is to introduce more columns:
Customer
• An arbitrary and hence meaningless ordering has been introduced: why is 555-
861-2025 put into the Telephone Number1 column rather than the Telephone
Number2 column?
• There's no reason why customers could not have more than two telephone
numbers, so how many Telephone Number N columns should there be?
• It is not possible to search for a telephone number without searching an arbitrary
number of columns.
• Adding an extra telephone number may require the table to be reorganized by the
addition of a new column rather than just having a new row (tuple) added.
Designs that comply with 1NF
• To bring the model into the first normal form, we split the strings we used to hold
our telephone number information into "atomic" (i.e. indivisible) entities: single
phone numbers. And we ensure no row contains more than one phone number.
Customer
• Note that the "ID" is no longer unique in this solution with duplicated customers.
To uniquely identify a row, we need to use a combination of (ID, Telephone
Number). The value of the combination is unique although each column
separately contains repeated values. Being able to uniquely identify a row (tuple)
is a requirement of 1NF.
An alternative design uses two tables:
• Prime attribute: - An attribute is said to be prime if it is part of any candidate key.
• Non-prime attribute: - An attribute is said to be non-prime if it is not part of any candidate key.
E.g. R(ABCD)
AB>CD
Here candidate key is AB so, A and B are prime attribute, C and D are non-prime attributes.
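The prime/non-prime classification can be checked mechanically with an attribute-closure routine (a minimal sketch; FDs are written as (LHS, RHS) pairs of single-letter attribute strings):

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of FDs (lhs, rhs)."""
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# R(ABCD) with AB -> CD: AB+ covers the whole schema, so AB is a key
# and A, B are prime; C and D never appear in a key, so they are non-prime.
fds = [("AB", "CD")]
print(sorted(closure("AB", fds)))   # ['A', 'B', 'C', 'D']
print(sorted(closure("C", fds)))    # ['C']
```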
SECOND NORMAL FORM
Relation R is in 2NF if,
• R is in 1NF.
• R does not contain any partial dependency (that is, every non-key attribute is fully dependent upon every candidate key).
Q R(A, B, C) with B → C
R (A, B, C)
A B C
a 1 x
b 2 y
a 3 z
c 3 z
d 3 z
e 3 z
R1 (A, B)
A B
a 1
b 2
a 3
c 3
d 3
e 3
R2 (B, C)
B C
1 x
2 y
3 z
Here the candidate key of R is AB, and B → C is a partial dependency (C depends on B alone), so R is decomposed into R1(A, B) and R2(B, C), where C can be looked up from B alone.
• Every table with two attributes will always be in second normal form.
TRANSITIVE DEPENDENCY
A functional dependency from a non-prime attribute to a non-prime attribute is called a
transitive dependency.
E.g.- R(A, B, C, D) with A as a candidate key
A->B
B->C [ transitive dependency]
C->D [transitive dependency]
THIRD NORMAL FORM
Let R be a relational schema. It is said to be in 3NF if
• R is in 2NF
• It does not contain any transitive dependency
R (A, B, C)
A B C
A 1 P
B 2 Q
C 2 Q
D 2 Q
E 3 R
F 3 R
G 4 S
R1 (A, B)
A B
A 1
B 2
C 2
D 2
E 3
F 3
G 4
R2 (B, C)
B C
1 P
2 Q
3 R
4 S
Q Which normal form is considered as adequate for usual database design? (NET-JUNE-
2013)
(A) 2NF (B) 3NF (C) 4NF (D) 5NF
Ans: b
BCNF (BOYCE CODD NORMAL FORM)
A relational schema R is said to be in BCNF if, for every functional dependency α → β in R,
α is a super key of R.
E.g.- R (A, B, C, D)
Q (NET-JULY-2019)
Ans: 4
• If a relation R does not contain any non-trivial dependency, then R is in BCNF.
• A relation with two attributes is always in BCNF.
• If a relation schema R has only simple (single-attribute) candidate keys, then R is always in 2NF, but may or may not be in 3NF or BCNF.
• If a relation schema R consists of only prime attributes, then R is always in 3NF, but may or may not be in BCNF.
• If a relation schema R is in 3NF and has only simple candidate keys, then R is surely in BCNF.
Q R(ABCDEF) (A, BC, DEF) (BCNF)
A → BCDEF
BC → ADEF
DEF → ABC
Q R(ABCDEF) (C, D, AB, BE, BF) (3NF)
AB → C
C → D
D → BE
E → F
F → A
Q R(ABCDE) (AC) (1NF)
A → B
B → E
C → D
Q R(ABCDE) (AB) (1NF)
AB → C
B → D
D → E
Q R(ABCD) (AB) (1NF)
AB → C
B → D
Q R(ABCDEF) (BF) (1NF)
AB → C
C → D
B → AE
Q R(ABCDEFGHIJ) (AB) (1NF)
AB → C
A → DE
B → F
F → GH
D → IJ
Q R(ABCDEFGHIJ) (ABD) (1NF)
AB → C
AD → GH
BD → EF
A → I
H → J
Q R(ABCDE) (CE) (1NF)
CE → D
D → B
C → A
Q R(ABCDEFGH) (AE) (1NF)
A → BC
ABE → CDGH
C → GD
D → G
E → F
Q R(VWXYZ) (VW, XW) (1NF)
Z → Y
Y → Z
X → YV
VW → X
Q R(ABCDE) (ABD) (1NF)
BD → E
A → C
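These exercises all follow the same recipe: compute the candidate keys via attribute closure, then look for partial or transitive dependencies. A brute-force sketch that finds candidate keys (fine for exam-sized schemas; FDs as (LHS, RHS) attribute strings):

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of an attribute set under a list of FDs (lhs, rhs)."""
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if closure(combo, fds) == set(schema) and \
               not any(k <= set(combo) for k in keys):
                keys.append(set(combo))
    return keys

# Exercise above: R(ABCDE) with AB -> C, B -> D, D -> E.
schema, fds = set("ABCDE"), [("AB", "C"), ("B", "D"), ("D", "E")]
keys = candidate_keys(schema, fds)          # only candidate key is AB
prime = set().union(*keys)                  # prime attributes: A and B
# B -> D is a partial dependency (B is part of key AB), so highest NF is 1NF.
```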
Q Consider the following four relational schemas. For each schema, all non-trivial functional
dependencies are listed. The underlined attributes are the respective primary keys. (GATE-2020) (2 Marks)
Which one of the relational schemas above is in 3NF but not in BCNF?
(a) Schema 1 (b) Schema 2 (c) Schema 3 (d) Schema 4
Ans: b
Q A database of research articles in a journal uses the following schema. (GATE- 2016) (2
Marks)
(VOLUME, NUMBER, STARTPAGE, ENDPAGE, TITLE, YEAR, PRICE)
The primary key is (VOLUME, NUMBER, STARTPAGE, ENDPAGE) and the following
functional dependencies exist in the schema.
Q If every non-key attribute is functionally dependent on the primary key, then the relation
is in __________. (NET-NOV-2017)
(1) First normal form (2) Second normal form
(3) Third normal form (4) Fourth normal form
Ans: (2)
Q For a database relation R(a, b, c, d) where the domains of a, b, c and d include only
atomic values, and only the following functional dependencies and those that can be
inferred from them hold: a → c, b → d. The relation is in _________. (NET-JULY-2017)
(1) First normal form but not in second normal form
(2) Second normal form but not in third normal form
(3) Third normal form
(4) BCNF
Ans: a
Q For a database relation R(A, B, C, D) where the domains of A, B, C and D include only
atomic values, only the following functional dependencies and those that can be inferred
from them are: A → C, B → D. The relation R is in _______. (NET-JAN-2017)
(1) First normal form but not in second normal form.
(2) Both in first normal form as well as in second normal form.
(3) Second normal form but not in third normal form.
(4) Both in second normal form as well as in third normal form.
Ans: a
Q Consider a relation R (A, B, C, D, E, F, G, H), where each attribute is atomic, and following
functional dependencies exist. (NET-NOV-2017)
CH → G
A → BC
B → CFH
E→A
F → EG
The relation R is __________ .
(1) in 1NF but not in 2NF (2) in 2NF but not in 3NF
(3) in 3NF but not in BCNF (4) in BCNF
Ans: 1
Q The best normal form of relation scheme R(A, B, C, D) along with the set of functional
dependencies F = {AB → C, AB → D, C → A, D → B} is (NET-DEC-2014)
(A) Boyce-Codd Normal form (B) Third Normal form
(C) Second Normal form (D) First Normal form
Ans: b
Q Which of the following is TRUE? (GATE- 2012) (1 Marks)
a) Every relation in 3NF is also in BCNF
b) A relation R is in 3NF if every non-prime attribute of R is fully functionally dependent on
every key of R
c) Every relation in BCNF is also in 3NF
d) No relation can be in both BCNF and 3NF
Ans: c
c) Proportional to the size of F+
d) Indeterminate
Ans: a
(1) in 1NF, but not in 2NF (2) in 2NF, but not in 3NF
(3) in 3NF, but not in BCNF (4) in BCNF
Ans: 1
Q Consider the relation R (ABCDE) with the FD set F = {A → CE, B → D, AE → D}. Identify the
highest normal form satisfied by the relation R.
a) 1 NF b) 2 NF c) 3 NF d) BCNF
Assume that, in the supplier’s relation above, each supplier and each street within a city has
a unique name, and (sname, city) forms a candidate key. No other functional dependencies
are implied other than those implied by primary and candidate keys. Which one of the
following is TRUE about the above schema? (GATE- 2009) (1 Marks)
a) The schema is in BCNF b) The schema is in 3NF but not in BCNF
c) The schema is in 2NF but not in 3NF d) The schema is not in 2NF
Ans: a
Q Consider the following relational schemes for a library database
Book (Title, Author, Catalog_no, Publisher, Year, Price)
Collection (Title, Author, Catalog_no)
with the following functional dependencies:
I. Title Author → Catalog_no
II. Catalog_no → Title Author Publisher Year
III. Publisher Title Year → Price
Assume {Author, Title} is the key for both schemes. Which of the following statements is
true? (GATE- 2008) (1 Marks) (NET-JUNE-2014)
(A) Both Book and Collection are in BCNF
(B) Both Book and Collection are in 3NF only
(C) Book is in 2NF and Collection is in 3NF
(D) Both Book and Collection are in 2NF only
Answer: (C)
Q The relation scheme Student Performance (name, courseNo, rollNo, grade) has the
following functional dependencies:
name, courseNo → grade
rollNo, courseNo → grade
name → rollNo
rollNo → name
The highest normal form of this relation scheme is (GATE- 2004) (1 Marks)
a) 2 NF b) 3 NF c) BCNF d) 4 NF
Ans: b
(D) None of the above
Answer: (D)
Q The Relation Vendor Order (V_no, V_ord_no, V_name, Qty_sup, unit_price) is in 2NF
because (NET-JUNE-2015)
(1) Non_key attribute V_name is dependent on V_no which is part of composite key
(2) Non_key attribute V_name is dependent on Qty_sup
(3) Key attribute Qty_sup is dependent on primary_key unit price
(4) Key attribute V_ord_no is dependent on primary_key unit price
Ans. 1
Decomposition
2NF DECOMPOSITION
Q R(ABCD)
AB → D, B → C
Q R(ABCDE)
A → B, B → E, C → D
Q R(ABCDEFGHIJ) (ABD) (1NF)
AB → C, AD → GH, BD → EF, A → I, H → J
3NF DECOMPOSITION
Q R(ABCDE)
A → B, B → E, C → D
Q R(ABCDEFGHIJ)
AB → C, A → DE, B → F, F → GH, D → IJ
Q R(ABCDE)
AB → C, B → D, D → E
BCNF DECOMPOSITION
Q R(ABCDE) (AB, BC, DB, EB) (3NF); after BCNF decomposition: (BC), (EA), (DE), (CD)
AB → C, C → D, D → E, E → A
Q If the relation R(ABCDE) with FD set {AB → CDE, A → C, C → D} is converted into
BCNF, then the number of foreign keys in the resulting relations is ___________
FOURTH NORMAL FORM (4NF)
Multivalued Dependency-
• Multivalued dependencies are a consequence of first normal form (1NF) which disallows
an attribute in a tuple to have a set of values (Multiple values).
• Denoted by A →→ B; means for every value of A, there may exist more than one value
of B.
• If there is a functional dependency A → B, then there will also be a multivalued
dependency A →→ B.
• A trivial multivalued dependency X →→ Y is one where either Y is a subset of X,
or X and Y together form the whole set of attributes of the relation.
• E.g. let the constraint specified by MVD in relation EMP as
Ename ->-> Pname
Ename ->-> Dname
EMP
Ename Pname Dname
Smith X John
Smith Y Anna
Smith X Anna
Smith Y John
(Redundancy due to two independent multivalued dependencies in the same relation.)
EMP_PROJECTS
Ename Pname
Smith X
Smith Y
EMP_DEPENDENT
Ename Dname
Smith John
Smith Anna
NOTE: The above EMP schema is in BCNF as no functional dependency holds on EMP, but
still redundancy due to MVD.
Consider the following example:
Pizza Delivery Permutations
Restaurant Pizza Variety Delivery Area
A1 Pizza Thick Crust Springfield
A1 Pizza Thick Crust Shelbyville
A1 Pizza Thick Crust Capital City
A1 Pizza Stuffed Crust Springfield
A1 Pizza Stuffed Crust Shelbyville
A1 Pizza Stuffed Crust Capital City
Elite Pizza Thin Crust Capital City
Elite Pizza Stuffed Crust Capital City
Vincenzo's Pizza Thick Crust Springfield
Vincenzo's Pizza Thick Crust Shelbyville
Vincenzo's Pizza Thin Crust Springfield
Vincenzo's Pizza Thin Crust Shelbyville
• Each row indicates that a given restaurant can deliver a given variety. The table has no
non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}.
Therefore, it meets all normal forms up to BCNF.
• If we assume, however, that pizza varieties offered by a restaurant are not affected by
delivery area (i.e. a restaurant offers all pizza varieties it makes to all areas it supplies),
then it does not meet 4NF. The problem is that the table features two non-trivial
multivalued dependencies on the {Restaurant} attribute (which is not a super key). The
dependencies are:
o {Restaurant} →→ {Pizza Variety}
o {Restaurant} →→ {Delivery Area}
• These non-trivial multivalued dependencies on a non-superkey reflect the fact that the
varieties of pizza a restaurant offers are independent from the areas to which the
restaurant delivers. This state of affairs leads to redundancy in the table:
• for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza
starts producing Cheese Crust pizzas then we will need to add multiple rows, one for
each of A1 Pizza's delivery areas.
Varieties By Restaurant
Restaurant Pizza Variety
A1 Pizza Thick Crust
A1 Pizza Stuffed Crust
Elite Pizza Thin Crust
Elite Pizza Stuffed Crust
Vincenzo's Pizza Thick Crust
Vincenzo's Pizza Thin Crust
Delivery Areas By Restaurant
Restaurant Delivery Area
A1 Pizza Springfield
A1 Pizza Shelbyville
A1 Pizza Capital City
Elite Pizza Capital City
Vincenzo's Pizza Springfield
Vincenzo's Pizza Shelbyville
• If we have two or more multivalued independent attributes in the same relation schema,
we get into a problem of having to repeat every value of one of the attributes with every
value of the other attribute to keep the relation state consistent and to maintain the
independence among the attributes involved. This constraint is specified by a
multivalued dependency.
A relation R is in 4NF if:
• It is in BCNF
• There does not exist any non-trivial multivalued dependency
To achieve 4NF, each MVD is decomposed into a separate table, where it becomes a trivial MVD.
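The pizza example can be replayed to confirm that projecting out each MVD and re-joining loses nothing (a trimmed version of the A1 Pizza rows above):

```python
# Original BCNF relation with two independent MVDs on Restaurant.
rows = {
    ("A1 Pizza", "Thick Crust",   "Springfield"),
    ("A1 Pizza", "Thick Crust",   "Shelbyville"),
    ("A1 Pizza", "Thick Crust",   "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
}

varieties = {(r, v) for r, v, a in rows}   # Restaurant ->> Pizza Variety
areas     = {(r, a) for r, v, a in rows}   # Restaurant ->> Delivery Area

# Natural join on Restaurant rebuilds the original table exactly,
# so the 4NF decomposition is lossless.
joined = {(r, v, a) for r, v in varieties for r2, a in areas if r == r2}
print(joined == rows)   # True
```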
Q Match the following: (NET-DEC-2005)
(i) 5 NF (a) Transitive dependencies eliminated
(ii) 2 NF (b) Multivalued attribute removed
(iii) 3 NF (c) Contains no partial functional
dependencies
(iv) 4 NF (d) Contains no join dependency
(A) i-a, ii-c, iii-b, iv-d (B) i-d, ii-c, iii-a, iv-b
(C) i-d, ii-c, iii-b, iv-a (D) i-a, ii-b, iii-c, iv-d
Ans: b
Lossy/Lossless-Dependency Preserving Decomposition
• Because of normalization, a table is decomposed into two or more tables, but during
this decomposition we must ensure the satisfaction of some properties, the most
important of which is the lossless join property/decomposition.
• If we decompose a table r into two tables r1 and r2 because of normalization, then at
some later stage if we join (natural join) r1 and r2, we must get back the original
table r, without any extra or missing tuple; otherwise information is lost during
retrieval of the original relation. For e.g.
R (A, B, C)
A B C
1 a p
2 b q
3 a r
R1 (A, B) R2 (B, C)
A B B C
1 a a p
2 b b q
3 a a r
R (A, B, C)
A B C
1 a p
1 a r
2 b q
3 a p
3 a r
▪ Decomposition is lossy if R1 ⋈ R2 ⊃ R (the natural join of the projections always
contains every tuple of R, so the only possible error is the generation of extra, spurious tuples)
• Decomposition is lossless if R1 ⋈ R2 = R "The decomposition of relation R into R1 and R2
is lossless when the join of R1 and R2 yield the same relation as in R." which guarantees
that the spurious (extra or less) tuple generation problem does not occur with respect to
the relation schemas created after decomposition.
• This property is extremely critical and must be achieved at any cost.
A B C D E
A 122 1 W A
E 236 4 X B
A 199 1 Y C
B 213 2 Z D
How to check for lossless join decomposition of R into R1 and R2 using the FD set; the
following conditions must hold:
1.) R1 ∪ R2 = R
2.) R1 ∩ R2 ≠ φ
3.) R1 ∩ R2 → R1 or R1 ∩ R2 → R2
5 NF
A relational table R is said to be in 5th normal form if
a) It is in 4NF
b) It cannot be further non-loss decomposed (it contains no join dependency)
Dependency Preserving Decomposition
Let relation R be decomposed into Relations R1, R2, R3…………. RN with their respective
functional Dependencies set as F1, F2, F3…………. FN, then the Decomposition is Dependency
Preserving iff-
{F1 ∪ F2 ∪ F3 ∪ F4………. ∪ FN }+ = F+
Q R (A, B, C)
A → B, B → C, C → A
R1(A, B) AND R2(B, C)
LOSSLESS AND FD PRESERVING
Q R (A, B, C, D)
AB → CD, D → A
R1(A, D), R2(B, C, D)
LOSSLESS AND NOT FD PRESERVING
Q R (A, B, C, D)
A → B, B → C, C → D, D → A
R1(A, B), R2(B, C) AND R3(C, D)
LOSSLESS AND FD PRESERVING
E.g.1 R (A, B, C, D)
F={AB->C, C->A, C->D}
Given decomposition as R1(A, C, D) and R2 (B, C)
Solution-
1.) R1 ∪ R2 = R [True]
2.) R1 ∩ R2 = C ≠ φ
3.) C+ = {A, C, D}, so C → R1
Hence the given decomposition is LOSSLESS.
E.g. 2 R (A, B, C, D)
F={B->C, D->A}
Decomposition as –
R1(B, C) and R2(A, D)
Solution-
1.) R1 ∪ R2 =R [True]
2.) R1 ∩ R2 = φ [not satisfied]
Hence LOSSY Decomposition
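The two worked examples above can be checked with a short routine implementing the same test (the closure of the common attributes must cover one side of the decomposition):

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of FDs (lhs, rhs)."""
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless(r1, r2, fds):
    """Binary lossless-join test: the common attributes must functionally
    determine all of R1 or all of R2 (and must not be empty)."""
    common = set(r1) & set(r2)
    if not common:
        return False
    cc = closure(common, fds)
    return set(r1) <= cc or set(r2) <= cc

# E.g.1 above: R(ABCD), F = {AB->C, C->A, C->D}, R1(ACD), R2(BC)
print(lossless("ACD", "BC", [("AB", "C"), ("C", "A"), ("C", "D")]))  # True
# E.g.2 above: F = {B->C, D->A}, R1(BC) and R2(AD) share no attribute
print(lossless("BC", "AD", [("B", "C"), ("D", "A")]))                # False
```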
Q Consider a scheme R (A B C D E)
F → D
D → F
C → AD
AB → C
is decomposed into R1 (ABC) and R2 (CDE). Is the decomposition lossy?
Q Identify the lossy decomposition on the relation R(ABCD) with functional dependencies
A → B
B → C
C → D
a) (ABC) (BD) b) (AB) (BC) (CD) c) (AB)(CD) d) None of these
Q Consider the relation R (ABCD) with the FD set F = {A → BC, B → CD, C→ AD} which is
decomposed into set of tables D = {(AB), (BC), CD}. Which of the following is true about the
decomposition D?
a) It is lossless and dependency preserving
b) It is lossy but dependency preserving
c) It is lossless but dependencies are not preserved
d) It is neither lossless nor dependency preserving
Q R(A,B,C,D) is a relation. Which of the following does not have a lossless join, dependency
preserving BCNF decomposition? (Gate - 2001) (2 Marks)
(A) A->B, B->CD (B) A->B, B->C, C->D
(C) AB->C, C->AD (D) A ->BCD
Answer: (C)
I. Both Y and Z are in BCNF
II. Decomposition of X into Y and Z is dependency preserving and lossless
Which of the above statements is/are correct? (GATE- 2019) (1 Marks)
(a) I only (b) Neither I nor II
(c) II only (d) Both I and II
Ans: c
D2: The decomposition of the schema R(A, B, C, D, E) having AD → B, C → DE, B → AE and
AE → C, into R1 (A, B, D) and R2 (A, C, D, E) is lossless.
(1) Both D1 and D2 (2) Neither D1 nor D2
(3) Only D1 (4) Only D2
Ans. 4
Only D2 is true, because AD is a key and is present in both tables. D1 is not always true
because the FDs are not given; if we take B → A and C → A then it is a lossy decomposition,
because the common attributes do not contain a key of either table.
Q Consider the table R with attributes A, B and C. The functional dependencies that hold on
R are : A → B, C → AB. Which of the following statements is/are True? (NET-AUG-2016)
I. The decomposition of R into R1(C, A) and R2(A, B) is lossless.
II. The decomposition of R into R1(A, B) and R2(B, C) is lossy.
(1) Only I (2) Only II (3) Both I and II (4) Neither I nor II
Ans: c
Q The dependency preservation decomposition is a property to decompose database
schema D, in which each functional dependency X → Y specified in F, (NET-DEC-2010)
(A) appeared directly in one of the relation schemas Ri in the decomposed D.
(B) could be inferred from dependencies that appear in some Ri.
(C) both (A) and (B)
(D) None of these
Ans c
Q The relation schemas R1 and R2 form a Lossless join decomposition of R if and only if:
(NET-JUNE-2015)
(a) R1 ∩ R2 ↠ (R1 - R2) (b) R1 → R2
(c) R1 ∩ R2 ↠ (R2 - R1) (d) (R2 → R1) ∩ R2
(1) (a) and (b) happens (2) (a) and (d) happens
(3) (a) and (c) happens (4) (b) and (c) happens
Ans. 3
decompositions? (Assume that the closures of F and G are available).(Gate-2002) (2 Marks)
(A) Dependency-preservation (B) Lossless-join
(C) BCNF definition (D) 3NF definition
Answer: (B)
Q Select the 'False' statement from the following statements about Normal Forms: (NET-
JUNE-2015)
(1) Lossless preserving decomposition into 3NF is always possible
(2) Lossless preserving decomposition into BCNF is always possible
(3) Any Relation with two attributes is in BCNF
(4) BCNF is stronger than 3NF
Ans. 2
Q Which one of the following statements about normal forms is FALSE? (GATE-2005) (2
Marks)
(A) BCNF is stricter than 3NF
(B) Lossless, dependency-preserving decomposition into 3NF is always possible
(C) Lossless, dependency-preserving decomposition into BCNF is always possible
(D) Any relation with two attributes is in BCNF
Answer: (C)
A→ Key Attribute
B and C → Atomic Attributes
D and E → Multi – valued Attributes
F and G → Composite Attributes
Which of the following is the correct 1 NF decomposition for the above relation?
a) R1 (ABC), R2(AD), R3 (AE), R4(AF), R5 (AG)
b) R1 (AB), R2(BC), R3 (DE), R4(FG)
c) R1 (ABC), R2 (ADE), R3 (AFG)
d) R1 (ABC), R2 (DEFG)
Indexing
• Reason for indexing - for a large file containing a large number of records, which will
eventually occupy a large number of blocks, access becomes slow; in today's systems
we want a high-speed database.
• Theoretically, a relational database is derived from set theory, and in a set the order of
elements is irrelevant; the same holds for relations (tables).
• But in practical implementations we have to specify the order.
• A number of operations, like search, insertion and deletion, depend on the order in
which elements are stored in the tables.
File organization/ organization of records in a file
1) Ordered file organization: - All the records in the file are ordered on some search key
field. Here binary search is possible for e.g. dictionary.
2) Unordered file organization: - All the records are inserted usually in the end of the file so
not ordered according to any field, Because of this only linear search is possible.
Q Suppose we have ordered file with records stored r = 30,000 on a disk with Block Size B =
1024 B. File records are of fixed size and are unspanned with record length R = 100 B.
Suppose that ordering key field of file is 9 B long and a block pointer is 6 B long, Implement
primary indexing?
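A worked solution for the numbers in this question, using the standard formulas (the final access count assumes binary search over the index blocks, as developed later in this section):

```python
from math import ceil, log2

r, B, R  = 30_000, 1024, 100   # records, block size, record length (bytes)
key, ptr = 9, 6                # ordering key field and block pointer (bytes)

bfr_main   = B // R                      # 10 records per main-file block
blocks     = ceil(r / bfr_main)          # 3000 blocks in the main file
bfr_index  = B // (key + ptr)            # 68 index entries per block
idx_blocks = ceil(blocks / bfr_index)    # 45 blocks (sparse: one entry
                                         # per main-file block)

# Binary search over the index blocks, then one access into the main file.
accesses = ceil(log2(idx_blocks)) + 1
print(blocks, idx_blocks, accesses)      # 3000 45 7
```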
Important Points about Indexing
• Additional auxiliary access structures, called indexes, are a data-access technique to
efficiently retrieve records from database files based on some attributes on which the
indexing has been done. Indexing in database systems is similar to what we see in books.
• An index typically provides a secondary access path, which gives an alternative way to
access the records without affecting the physical placement of records in the main file.
• The index file is always ordered, irrespective of whether the main file is ordered or
unordered, so that we can take advantage of binary search.
• The index file always contains two columns: one, the attribute on which search will be
done, and the other, a block or record pointer.
• The size of the index file is much smaller than that of the main file, since an index record
contains only these two columns, the key (attribute on which searching is done) and a
block pointer (base address of the block of the main file which contains the record
holding that key), while the main file contains all the columns.
• Normally, apart from secondary indexing (a type of indexing), the number of records in the
index file <= the number of records in the main file.
• One index file is designed according to one attribute, meaning more than one index file
can be designed for a main file.
• Indexing gives the advantage of faster search time, but the space taken by the index file
is an overhead.
• Number of accesses required to reach the correct block of the main file is
⌈log2(number of blocks in the index file)⌉ + 1
• An index can be created on any field of a relation (primary key, non-key)
Q Data which improves the performance and accessibility of the database are called: (NET-
DEC-2015)
(1) Indexes (2) User Data
(3) Application Metadata (4) Data Dictionary
Ans. 1
Indexing can be classified on number of criteria’s one of them could be –
• Dense Index
• Sparse Index
DENSE INDEX: -
• In dense index, there is an entry in the index file for every search key value in the
main file. This makes searching faster but requires more space to store index records
itself.
• Note that it is not one entry per record, it is one per search key value. Sometimes the
number of records in the main file >= the number of search key values in the main file,
for example if a search key is repeated.
SPARSE INDEX:
• If an index entry is created only for some records of the main file, then it is called
sparse index.
• No. of index entries in the index file < No. of records in the main file.
• Note: - dense and sparse are not complementary to each other; sometimes an index can
be both dense and sparse.
Basic term used in Indexing
• BLOCKING FACTOR = No. of records per block = ⌊block size/record size⌋
• If the file is unordered, then the number of block accesses required to reach the correct
block which contains the desired record is O(n), where n is the number of blocks (linear search).
• If the file is ordered, then the number of block accesses required to reach the correct
block which contains the desired record is O(log2 n), where n is the number of blocks (binary search).
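For a concrete feel, using the 3000-block file from the earlier primary indexing question:

```python
from math import ceil, log2

n = 3000                  # blocks occupied by the file
linear = n                # unordered file: worst case scans every block
binary = ceil(log2(n))    # ordered file: binary search over the blocks
print(linear, binary)     # 3000 12
```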
TYPES OF INDEXING
• Single level index means we create index file for the main file, and then stop the
process.
• Multiple level index means, we further index the index file and keep repeating the
process until we get one block.
PRIMARY INDEXING
• Main file is always sorted according to primary key.
• Indexing is done on Primary Key, therefore called as primary indexing
• Index file have two columns, first primary key and second anchor pointer (base address
of block)
• It is an example of Sparse Indexing.
• Here first record (anchor record) of every block gets an entry in the index file
• No. of entries in the index file = No of blocks acquired by the main file.
CLUSTERED INDEXING
• Main file will be ordered on some non-key attributes
• No of entries in the index file = no of unique values of the attribute on which indexing is
done.
• It is the example of Sparse as well as dense indexing
Q A clustering index is defined on the fields which are of type (GATE-2008) (1 Marks)
1) non-key and ordering 2) non-key and non-ordering
3) key and ordering 4) key and non-ordering
ANSWER A
Q (NET-DEC-2018)
SECONDARY INDEXING
• Most common scenarios, suppose that we already have a primary indexing on primary
key, but there is frequent query on some other attributes, so we may decide to have
one more index file with some other attribute.
• The main file is not ordered according to the attribute on which the secondary indexing
is done.
• Secondary indexing can be done on a key or a non-key attribute.
• The number of entries in the index file is the same as the number of records in the main file.
• It is an example of dense indexing.
Q Suppose we have ordered file with records stored r = 30,000 on a disk with Block Size B =
1024 B. File records are of fixed size and are unspanned with record length R = 100 B.
Suppose that ordering key field of file is 9 B long and a block pointer is 6 B long, Implement
Secondary indexing?
Q Consider a file of 16384 records. Each record is 32 bytes long and its key field is of size 6
bytes. The file is ordered on a non-key field, and the file organization is unspanned. The file
is stored in a file system with block size 1024 bytes, and the size of a block pointer is 10
bytes. If the secondary index is built on the key field of the file, and a multi-level index
scheme is used to store the secondary index, the number of first-level and second-level
blocks in the multi-level index are respectively (GATE-2008) (1 Marks)
1) 8 and 0 2) 128 and 6 3) 256 and 4 4) 512 and 5
ANSWER C
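The GATE answer above can be verified directly (dense secondary index: one entry per record; each higher level indexes the blocks of the level below):

```python
from math import ceil

r, B  = 16_384, 1024   # records, block size (bytes)
entry = 6 + 10         # key field + block pointer (bytes)

bfr    = B // entry          # 64 index entries per block
first  = ceil(r / bfr)       # first-level blocks: 16384 / 64 = 256
second = ceil(first / bfr)   # second-level blocks: 256 / 64 = 4
print(first, second)         # 256 4
```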
Q A file is organized so that the ordering of data records is the same as or close to the
ordering of data entries in some index. Then that index is called (GATE-2015) (1 Marks)
(A) Dense (B) Sparse (C) Clustered (D) Unclustered
Answer: (C)
MULTILEVEL INDEXING
• A multi-level index helps in breaking down the index into several smaller indices, in order
to make the outermost level so small that it can be stored in a single disk block, which
can easily be accommodated anywhere in the main memory.
Reason to have B tree and B+ tree
• After studying indexing in detail, we understand that an index file is always sorted
and is searched frequently, and sometimes index files can be so large that we even
want to index the index file (multilevel index); therefore we must find the best
data structure to meet our requirements.
• There are a number of options among data structures, like array, stack, linked list, graph,
table etc., but we want a data structure which supports frequent insertion and deletion
but at the same time also provides fast search and gives us the advantage of having sorted data.
• If we look at the options, a tree seems the most appropriate, but every available kind of
tree has some problem, whether a simple tree, binary search tree or AVL tree; so we end
up designing a new data structure called the B-tree, specially designed for sorted index
files stored in databases.
• B tree and B+ tree also provide efficient search time, as the height of the structure is
very small and they are perfectly balanced.
B tree
• A B-tree of order m if non-empty is an m-way search tree in which.
o The root has at least two child nodes and at most m child nodes.
o The internal nodes except the root have at least ⌈m/2⌉ child nodes and at
most m child nodes.
o The number of keys in each internal node is one less than the number of child
nodes and these keys partition the subtrees of the nodes in a manner similar to
that of m-way search tree.
o All leaf nodes are on the same level.
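The node-capacity rules above can be written out for a given order (a small sketch; e.g. the order-3 trees used in the exercises below):

```python
from math import ceil

def btree_key_bounds(m):
    """Min/max keys per node for a B-tree of order m (at most m children)."""
    min_child = ceil(m / 2)             # non-root internal nodes
    return {
        "max_keys": m - 1,              # keys = children - 1
        "min_keys_internal": min_child - 1,
        "min_keys_root": 1,             # root has at least 2 children
    }

b = btree_key_bounds(3)
print(b["max_keys"], b["min_keys_internal"])   # 2 1
```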
Q Consider the following elements 5, 10, 12, 13, 14, 1, 2, 3, 4 insert them into an empty b-
tree of order = 3.
Q Consider the following elements 5, 10, 12, 13, 14, 1, 2, 4, 20, 18, 19, 17, 16, 15, 25, 23, 24,
22, 11, 30, 31, 28, 29 insert them into an empty b-tree of order = 3.
• A B-tree starts with a single root node (which is also a leaf node) at level 0 (zero). Once
the root node is full with p – 1 search key values and we attempt to insert another entry
in the tree, the root node splits into two nodes at level 1. Only the middle value is kept
in the root node, and the rest of the values are split evenly between the other two
nodes. When a non-roof node is full and a new entry is inserted into it, that node is split
into two nodes at the same level, and the middle entry is moved to the parent node
along with two pointers to the new split nodes. If the parent node is full, it is also split.
Splitting can propagate all the way to the root node, creating a new level if the root is
split.
Q Consider the Following B-tree of order m=6, delete the following nodes H, T, R, E, A, C, S
in sequence?
Deletion in B-TREE-
• If the deletion is from the leaf node and leaf node is satisfying the minimal condition
even after the deletion, then delete the value directly.
• If deletion from leaf node renders leaf node in minimal condition, then first search the
extra key in left sibling and then in the right sibling. Largest value from left sibling or
smallest value from right sibling is pushed into the root node and corresponding value
can be fetched from parent node to leaf node.
• If the deletion is to be from internal node, then first we check for the extra key in the
left and then in the right child. If we find one, we fetch the value in the required node.
And delete the key.
• If deletion of a value causes a node to be less than half full, it is combined with its
neighboring nodes, and this can also propagate all the way to the root. Hence, deletion
can reduce the number of tree levels. It has been shown by analysis and simulation that,
after numerous random insertions and deletions on a B-tree, the nodes are
approximately 69 percent full when the number of values in the tree stabilizes. This is
also true of B+ trees. If this happens, node splitting and combining will occur only rarely,
so insertion and deletion become quite efficient. If the number of values grows, the tree
will expand without a problem—although splitting of nodes may occur, so some
insertions will take more time.
Analysis
• B- TREE- In computer science, a B-tree is a self-balancing tree data structure that
maintains sorted data and allows searches, sequential access, insertions, and deletions
in logarithmic time.
• A search tree of order p is a tree such that each node contains at most p -1 search values
and p pointers in the order <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>, where q <= p. Each Pi is a
pointer to a child node (or a NULL pointer), and each Ki is a search value from some
ordered set of values. All search values are assumed to be unique. Two constraints must
hold at all times on the search tree:
o Within each node, K1 < K2 < ... < Kq-1.
o For all values X in the subtree pointed at by Pi, we have Ki-1 < X < Ki for 1 < i < q; X <
Ki for i = 1; and Ki-1 < X for i = q.
• We can use a search tree as a mechanism to search for records stored in a disk file. The
values in the tree can be the values of one of the fields of the file, called the search
field (which is the same as the index field if a multilevel index guides the search). Each key
value in the tree is associated with a pointer to the record in the data file having that
value.
• To guarantee that nodes are evenly distributed, so that the depth of the tree is
minimized for the given set of keys and that the tree does not get skewed with some
nodes being at very deep levels.
• To make the search speed uniform, so that the average time to find any random key is
roughly the same
• While minimizing the number of levels in the tree is one goal, another implicit goal is to
make sure that the index tree does not need too much restructuring as records are
inserted into and deleted from the main file. Thus, we want the nodes to be as full as
possible and do not want any nodes to be empty if there are too many deletions. Record
deletion may leave some nodes in the tree nearly empty, thus wasting storage space and
increasing the number of levels. The B-tree addresses both of these problems by specifying
additional constraints on the search tree.
• The B-tree has additional constraints that ensure that the tree is always balanced and that
the space wasted by deletion, if any, never becomes excessive. The algorithms for insertion
and deletion, though, become more complex in order to maintain these constraints.
Nonetheless, most insertions and deletions are simple processes; they become
complicated only under special circumstances—namely, whenever we attempt an
insertion into a node that is already full or a deletion from a node that makes it less than
half full. More formally, a B-tree of order p, when used as an access structure on a key
field to search for records in a data file, can be defined as follows:
• Each internal node in the B-tree is of the form
o <P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq–1, Prq–1>, Pq> where q <= p. Each Pi is a tree
pointer—a pointer to another node in the B-tree. Each Pri is a data pointer—a
pointer to the record whose search key field value is equal to Ki (or to the data file
block containing that record).
o Within each node, K1 < K2 < ... < Kq-1.
o For all search key field values X in the subtree pointed at by Pi (the ith subtree),
we have: Ki–1 < X < Ki for 1 < i < q; X < Ki for i = 1; and Ki–1 < X for i = q.
o Each node has at most p tree pointers.
o Each node, except the root and leaf nodes, has at least ⎡(p/2)⎤ tree pointers. The
root node has at least two tree pointers unless it is the only node in the tree.
o A node with q tree pointers, q <= p, has q – 1 search key field values (and hence has q
– 1 data pointers).
o All leaf nodes are at the same level. Leaf nodes have the same structure as internal
nodes except that all of their tree pointers Pi are NULL.
Conclusion: -
• Fewer nodes (blocks) are used and the height is optimized, so access is very fast.
• Sequential traversal of the keys is difficult, which means B-trees do not work well for
range-based queries on the database.
Q A B-Tree used as an index for a large database table has four levels including the root
node. If a new key is inserted in this index, then the maximum number of nodes that could
be newly created in the process are: (Gate-2005) (1 Marks)
(A) 5 (B) 4 (C) 3 (D) 2
Answer: (A)
B+ Tree
Q Consider the following elements 5, 10, 12, 13, 14, 1, 2, 3, 4. Insert them into an empty B+
tree of order = 3.
Q Consider the following elements 5, 10, 12, 13, 14, 1, 2, 4, 20, 18, 19, 17, 16, 15, 25, 23,
24, 22, 11, 30, 31, 28, 29. Insert them into an empty B+ tree of order = 5.
Insertion in B+ Tree-
• Start from root node and proceed towards leaf using the logic of binary search tree.
Value is inserted in the leaf.
• If an overflow occurs, pick the median and push it into the parent node; also keep a
copy of the median in the right (leaf) child, since in a B+ tree every key must also
appear at the leaf level.
Q Consider the following elements 5, 8, 1, 7, 3, 12, 9, 6. Insert them into an empty B+ tree of
order = 3, and then delete the following keys in sequence: 9, 8, 12.
Deletion in B+ Tree-
• If B+ tree entries are deleted at the leaf nodes, the target entry is searched for and
deleted.
• If the key is also present in an internal node, delete it there and replace it with the
entry from the left position (the in-order predecessor).
• If underflow occurs, redistribute entries from a sibling node.
• If redistribution from the left or right sibling is not possible, merge the node with a
sibling.
Analysis
• Most implementations of a dynamic multilevel index use a variation of the B-tree data
structure called a B+-tree. In a B-tree, every value of the search field appears once at
some level in the tree, along with a data pointer.
• In a B+-tree, data pointers are stored only at the leaf nodes of the tree; hence, the
structure of leaf nodes differs from the structure of internal nodes.
• The leaf nodes have an entry for every value of the search field, along with a data
pointer to the record (or to the block that contains this record) if the search field is a key
field.
• The leaf nodes of the B+-tree are usually linked to provide ordered access on the search
field to the records.
• Each internal node is of the form <P1, K1, P2, K2, ..., Pq – 1, Kq –1, Pq>
• Within each internal node, K1 < K2 < ... < Kq-1.
• For all search field values X in the subtree pointed at by Pi, we have Ki-1 < X <= Ki for 1
< i < q; X <= Ki for i = 1; and Ki-1 < X for i = q.
• Each internal node has at most p tree pointers.
• Each internal node, except the root, has at least ⎡(p/2)⎤ tree pointers. The root node has at
least two tree pointers if it is an internal node.
• An internal node with q pointers, q <= p, has q - 1 search field values.
• The structure of the leaf nodes of a B+-tree of order p is as follows:
• Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <Kq–1, Prq–1>, Pnext> where
q <= p, each Pri is a data pointer, and Pnext points to the next leaf node of the B+-tree.
• Within each leaf node, K1 <= K2 <= ... <= Kq–1, q <= p.
• Each Pri is a data pointer that points to the record whose search field value is Ki or to a
file block containing the record (or to a block of record pointers that point to records
whose search field value is Ki if the search field is not a key).
• Each leaf node has at least ⎡(p/2)⎤ values.
• All leaf nodes are at the same level.
The B+ tree is a modified B-tree with two major modifications that support sequential
search: data pointers are stored only at the leaf nodes, and the leaf nodes are linked together.
a) 2 b) 3 c) 4 d) 5
(GATE-2009) (2 Marks)
ANSWER d
Q (GATE-2015) (1 Marks)
Ans: 4
Q Consider a B+-tree in which the maximum number of keys in a node is 5. What is the
minimum number of keys in any non-root node? (GATE-2010) (1 Marks)
Answer 2
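The arithmetic behind this answer generalizes: at most 5 keys per node means order p = 6 pointers, and a non-root node needs at least ⌈p/2⌉ = 3 pointers, hence at least 2 keys. A minimal sketch (the function name is ours):

```python
import math

def min_keys_nonroot(max_keys):
    """Minimum keys in a non-root node of a B+/B-tree whose nodes
    hold at most max_keys keys (order p = max_keys + 1 pointers)."""
    p = max_keys + 1                 # order = max number of tree pointers
    return math.ceil(p / 2) - 1      # at least ceil(p/2) pointers -> one fewer key

print(min_keys_nonroot(5))  # GATE-2010: 2
```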
Q Which of the following is a key factor for preferring B+-trees to binary search trees for
indexing database relations? (Gate-2005) (1 Marks)
a) Database relations have a large number of records
b) Database relations are sorted on the primary key
c) B+-trees require less memory than binary search trees
d) Data transfer from disks is in blocks
Ans: d
Q Which one of the following statements is NOT correct about the B+ tree data structure
used for creating an index of a relational database table? (GATE-2019) (1 Marks)
(1) Each leaf node has a pointer to the next leaf node
(2) Non-leaf nodes have pointers to data records
(3) B+ Tree is a height-balanced tree
(4) Key values in each node are kept in sorted order
Ans: (2)
Q In a B+ tree, if the search-key value is 8 bytes long, the block size is 512 bytes and the
block pointer is 2 bytes, then the maximum order of the B+ tree is. (GATE-2017) (2 Marks)
Ans: 52
Q Consider B+ tree in which the search key is 12 bytes long, block size is 1024 bytes, record
pointer is 10 bytes long and block pointer is 8 bytes long. The maximum number of keys
that can be accommodated in each non-leaf node of the tree is (Gate-2015) (2 Marks)
Answer: 50
Q The order of a leaf node in a B+- tree is the maximum number of (value, data record
pointer) pairs it can hold. Given that the block size is 1K bytes, data record pointer is 7 bytes
long, the value field is 9 bytes long and a block pointer is 6 bytes long, what is the order of
the leaf node? (GATE-2007) (1 Marks)
a) 63 b) 64 c) 67 d) 68
ANSWER a
Q In a database file structure, the search key field is 9 bytes long, the block size is 512 bytes,
a record pointer is 7 bytes and a block pointer is 6 bytes. The largest possible order of a
non-leaf node in a B+ tree implementing this file structure is (Gate-2006) (2 Marks)
(A) 23 (B) 24 (C) 34 (D) 44
Answer: (C)
Q The order of an internal node in a B+ tree index is the maximum number of children it
can have. Suppose that a child pointer takes 6 bytes, the search field value takes 14 bytes,
and the block size is 512 bytes. What is the order of the internal node? (Gate-2004) (2
Marks)
(A) 24 (B) 25 (C) 26 (D) 27
Answer: (C)
Q A B+ -tree index is to be built on the Name attribute of the relation STUDENT. Assume
that all student names are of length 8 bytes, disk blocks are of size 512 bytes, and index
pointers are of size 4 bytes. Given this scenario, what would be the best choice of the
degree (i.e. the number of pointers per node) of the B+ -tree? (Gate-2002) (2 Marks)
(A) 16 (B) 42 (C) 43 (D) 44
Answer: (C)
Q Consider a table T in a relational database with a key field K. A B-tree of order p is used as
an access structure on K, where p denotes the maximum number of tree pointers in a B-
tree index node. Assume that K is 10 bytes long; disk block size is 512 bytes; each data
pointer PD is 8 bytes long and each block pointer PB is 5 bytes long. In order for each B-tree
node to fit in a single disk block, the maximum value of p is (Gate-2004) (2 Marks)
(A) 20 (B) 22 (C) 23 (D) 32
Answer: (C)
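All the block-fitting questions above follow one inequality: p tree pointers plus p−1 keys (plus p−1 data pointers for a B-tree node) must fit in one disk block. A sketch that reproduces the answers above (function names are ours, not a standard API):

```python
def max_order(block, key, block_ptr, data_ptr=0):
    """Largest p with p*block_ptr + (p-1)*(key + data_ptr) <= block."""
    p = 1
    while (p + 1) * block_ptr + p * (key + data_ptr) <= block:
        p += 1
    return p

def max_leaf_order(block, key, rec_ptr, next_ptr):
    """Largest n with n*(key + rec_ptr) + next_ptr <= block (B+ leaf)."""
    return (block - next_ptr) // (key + rec_ptr)

print(max_order(512, 8, 2))           # GATE-2017: 52
print(max_order(512, 9, 6))           # GATE-2006: 34
print(max_order(512, 14, 6))          # GATE-2004: 26
print(max_order(512, 8, 4))           # GATE-2002: 43
print(max_order(512, 10, 5, 8))       # GATE-2004 B-tree: 23
print(max_order(1024, 12, 8) - 1)     # GATE-2015: 50 keys in a non-leaf node
print(max_leaf_order(1024, 9, 7, 6))  # GATE-2007: 63
```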
Q Consider a join (relation algebra) between relations r(R) and s(S) using the nested loop
method. There are 3 buffers each of size equal to disk block size, out of which one buffer is
reserved for intermediate results. Assuming size(r(R)) < size(s(S)), the join will have fewer
disk block accesses if (GATE-2014) (2 Marks)
(1) relation r(R) is in the outer loop
(2) relation s(S) is in the outer loop
(3) join selection factor between r(R) and s(S) is more than 0.5
(4) join selection factor between r(R) and s(S) is less than 0.5
Ans: a
a b c d
a) i ii iv iii
b) ii i iv iii
c) i iii iv ii
d) ii iv i iii
Ans: D
Q A B-tree of order 4 is built from scratch by 10 successive insertions. What is the maximum
number of node splitting operations that may take place? (GATE-2008) (1 Marks)
Answer 5
Q In the indexed scheme of blocks to a file, the maximum possible size of the file depends
on: (NET-JUNE-2015)
(1) The number of blocks used for index and the size of index
(2) Size of Blocks and size of Address
(3) Size of index
(4) Size of block
Ans. 1
TRANSACTION
• Why we study transaction?
o In general computation (as in an operating system) we may have a partially
executed program, as the level of atomicity is the instruction, i.e. an instruction
either executes completely or not at all.
o From the DBMS point of view, however, a user performs a logical unit of work
(an operation) that must be atomic: either the operation executes completely or
not at all; there is no concept of partial execution. For example, consider
transaction T1, which transfers 100 units from account A to account B.
T1
Read(A)
A = A-100
Write(A)
Read(B)
B = B+100
Write(B)
o In this transaction, if a failure occurs after Read(B), the final state of the system
will be inconsistent: 100 units are debited from account A but not credited to
account B.
o Here consistency requires that (A + B) before the transaction equals (A + B) after it.
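The failure scenario can be simulated in a few lines (a toy sketch, not how a DBMS executes transactions): stopping the transfer midway breaks the invariant that A + B stays constant.

```python
def transfer(acc, fail_midway=False):
    """Transfer 100 units from A to B; optionally crash after the debit."""
    acc["A"] -= 100          # Read(A); A = A - 100; Write(A)
    if fail_midway:
        return acc           # failure before B is credited
    acc["B"] += 100          # Read(B); B = B + 100; Write(B)
    return acc

before = {"A": 500, "B": 500}
ok = transfer(dict(before))
bad = transfer(dict(before), fail_midway=True)
print(sum(ok.values()))   # 1000 -> consistent
print(sum(bad.values()))  # 900  -> inconsistent
```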
What is Transaction
o To remove this partial-execution problem, we raise the level of atomicity and bundle all
the instructions of a logical operation into a unit called a transaction.
o So formally, 'A transaction is a set of logically related instructions that perform a logical
unit of work'.
o As we are concerned only with the DBMS here, we will consider only two basic operations
on the database:
• READ (X) - Bring the database item X from disk (where the database stores data) into a
memory variable, also named X.
• WRITE (X) - Write the data item from memory variable X back to disk.
Desirable Properties of Transaction
o Since the smallest unit that has atomicity from the DBMS point of view is the transaction,
if we want our data to be consistent we must concentrate on the transaction rather than
on the database itself.
o Transactions should possess several properties, often called the ACID properties; to provide
integrity and consistency of the data in the database. The following are the ACID
properties:
o Atomicity - A transaction is an atomic unit of processing; it should either be
performed in its entirety or not performed at all. It is the responsibility of recovery
control manager / transaction control manager of DBMS to ensure atomicity
o Consistency - A transaction should be consistency preserving, meaning that if it is
completely executed from beginning to end without interference from other
transactions, it should take the database from one consistent state to another. The
definition of consistency may change from one system to another. The preservation
of consistency of database is the responsibility of programmers(users) or the DBMS
modules that enforces integrity constraints.
o Isolation - A transaction should appear as though it is being executed in isolation
from other transactions, even though many transactions are executing concurrently.
That is, the execution of a transaction should not be interfered with by any other
transactions executing concurrently. The isolation property of database is the
responsibility of concurrency control manager of database.
o Durability - The changes applied to the database by a committed transaction must
persist in the database. These changes must not be lost because of any failure. It is
the responsibility of recovery control manager of DBMS.
Transaction states
o ACTIVE - It is the initial state. Transaction remains in this state while it is executing
operations.
o PARTIALLY COMMITTED - After the final statement of a transaction has been executed, the
transaction is partially committed: it may still have to be aborted (due to a failure), since
the actual output may still reside temporarily in main memory and not yet on disk.
o FAILED - Entered on the discovery that the transaction can no longer proceed (because of
hardware/logical errors). Such a transaction must be rolled back.
o ABORTED - A transaction is in the aborted state when it has been rolled back and the
database has been restored to its state prior to the start of execution.
o COMMITTED - A transaction enters the committed state after successful completion and
the final update to the database.
Q If a transaction is in which of the following states can we guarantee that the database is in
a consistent state?
a) aborted b) committed c) both aborted & committed d) none
Q Match:
Column I Column II
1. Atomicity A. Recovery Manager
2. Durability B. Concurrency control manager
3. Isolation C. Programmer
4. Consistency
a) 1-a, 2–a, 3-b, 4–c b) 1-a, 2-b,3-b,4-c c) 1-a,2-a,3-b,4-b d) none of these
PROBLEMS DUE TO CONCURRENT EXECUTION OF TRANSACTION
Concurrent execution is necessary because-
• It leads to good database performance.
• Disk accesses are frequent and relatively slow.
• Overlapping I/O activity with CPU activity increases throughput and improves response time.
But interleaving of instructions between transactions may lead to problems that leave the
database inconsistent. It is possible that even though each individual transaction satisfies the
ACID properties, the final state of the system is still inconsistent.
Lost update problem / Write - Write problem
o If two write operations of different transactions act on the same data item, with no
read operation between them, then the second write overwrites the first.
T1 T2
Read(A)
Write(A)
Write(A)
Commit
Commit
Dirty read problem/ Read -Write problem
• In this problem, a transaction reads a data item updated by another, uncommitted
transaction; that transaction may later abort or fail.
• The reading transaction then ends up with incorrect results.
T1 T2
Read(A)
Write(A)
Read(A)
Commit
Abort
Q The problem that occurs when one transaction updates a database item and
then the transaction fails for some reason is ________. (NET-JUNE-2012)
(A) Temporary Select Problem (B) Temporary Modify Problem
(C) Dirty Read Problem (D) None
Ans: d
Unrepeatable read problem/phantom read problem
▪ When a transaction tries to read the value of a data item twice, and another transaction
updates the data item in between, the results of the first transaction's two read
operations will differ; this is called the non-repeatable read problem.
T1 T2
Read(A)
Read(A)
Write(A)
Read(A)
T1 T2
Read(A)
Delete(A)
Read(A)
Q The following schedule is suffering from
Q Which of the following scenarios may lead to an irrecoverable error in a database system?
(GATE – 2008) (1 Marks)
(A) A transaction writes a data item after it is read by an uncommitted transaction
(B) A transaction reads a data item after it is read by an uncommitted transaction
(C) A transaction reads a data item after it is written by a committed transaction
(D) A transaction reads a data item after it is written by an uncommitted transaction
Answer: (D)
Solution: Schedules
o When two or more transactions execute together or one after another, they can be
bundled into a higher unit of execution called a schedule. A schedule S of n transactions
T1, T2, ..., Tn is an ordering of the operations of those transactions. Operations from
different transactions can be interleaved in the schedule S.
o However, a schedule for a set of transactions must contain all the instructions of those
transactions, and for each transaction Ti that participates in schedule S, the operations
of Ti must appear in S in the same order in which they occur in Ti.
o Schedules can be of two types-
o Serial schedule - A serial schedule consists of a sequence of instructions belonging to
different transactions, where the instructions of each single transaction appear
together. Another transaction cannot start before the complete execution of the
previous one.
o For a set of n transactions, there exist n! different valid serial schedules. Every serial
schedule leaves the database in a consistent state, but system throughput is low.
o So the number of schedules for n different transactions T1, T2, T3, ..., Tn, where the
transactions contain n1, n2, n3, ..., nn instructions respectively, is
o {(n1 + n2 + n3 + ... + nn)!} / (n1! n2! n3! ... nn!)
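The formula can be checked numerically; for example, two transactions of 2 instructions each give 4!/(2!·2!) = 6 possible schedules (a sketch, function name ours):

```python
from math import factorial

def num_schedules(lengths):
    """(n1+...+nn)! / (n1!*...*nn!) -- total interleavings of the transactions."""
    total = factorial(sum(lengths))
    for n in lengths:
        total //= factorial(n)
    return total

print(num_schedules([2, 2]))  # 6
print(num_schedules([4, 4]))  # 70
```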
o Conclusion on schedules
o We have no direct method to prove that a schedule is consistent, but from the
above discussion a serial schedule is always consistent. So if we can somehow
show that a non-serial schedule has the same effect as some serial schedule, we
have a proof that this particular non-serial schedule is also consistent: "find those
schedules that are logically equivalent to serial schedules".
o For a concurrent schedule to result in a consistent state, it should be equivalent to a
serial schedule, i.e. it must be serializable.
(Figure: classification of schedules - serializability: conflict serializable, view serializable,
result equivalent; recoverability: recoverable, cascadeless, strict.)
SERIALIZABILITY
Conflicting instructions - Let I and J be two consecutive instructions belonging to two different
transactions Ti and Tj in a schedule S. The possible combinations of I and J are:
I= READ(Q), J=READ(Q) ->Non-conflicting
I= READ(Q), J=WRITE(Q) ->Conflicting
I= WRITE(Q), J=READ(Q) ->Conflicting
I= WRITE(Q), J=WRITE(Q) ->Conflicting
So, the instructions I and J are said to be conflicting, if they are operations by different
transactions on the same data item, and at least one of these instructions is a write operation.
T1 T2
R(B)
B=B+50
R(A)
A=A-50
R(B)
B=B+50
R(A)
A=A+10
CONFLICT SERIALIZABLE
• The schedules which are conflict equivalent to a serial schedule are called conflict
serializable schedules. If a schedule S can be transformed into a schedule S' by a series of
swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.
• A schedule S is conflict serializable, if it is conflict equivalent to a serial schedule.
Procedure for determining conflict serializability of a schedule
o It can be determined using PRECEDENCE GRAPH method:
o A precedence graph consists of a pair G = (V, E)
▪ V= set of vertices consisting of all the transactions participating in the schedule.
▪ E= set of edges consists of all edges Ti -> Tj, for which one of the following conditions
holds:
▪ Ti executes write(Q) before Tj executes read(Q)
▪ Ti executes read(Q) before Tj executes write(Q)
▪ Ti executes write(Q) before Tj executes write(Q)
o If an edge Ti -> Tj exists in the precedence graph, then in any serial schedule S’ equivalent to
S, Ti must appear before Tj.
o If the precedence graph for S has no cycle, then schedule S is conflict serializable;
otherwise it is not. Cycle detection can be done by standard cycle-detection algorithms;
one based on depth-first search takes O(n^2) time.
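The procedure above can be sketched directly: collect an edge Ti → Tj for every conflicting pair and test the graph for a cycle (the operation encoding and function names are ours):

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item), op in {'R','W'}, in execution order.
    Returns True iff the precedence graph is acyclic."""
    edges = set()
    for i, (ti, oi, x) in enumerate(schedule):
        for tj, oj, y in schedule[i + 1:]:
            if ti != tj and x == y and 'W' in (oi, oj):
                edges.add((ti, tj))           # edge Ti -> Tj for each conflict
    adj, nodes = {}, set()
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        nodes.update((u, v))
    color = dict.fromkeys(nodes, 0)           # 0=unvisited, 1=on stack, 2=done
    def dfs(u):
        color[u] = 1
        for v in adj.get(u, []):
            if color[v] == 1 or (color[v] == 0 and dfs(v)):
                return True                   # back edge -> cycle
        color[u] = 2
        return False
    return not any(color[u] == 0 and dfs(u) for u in nodes)

# GATE-2014 option (d): r2(x); w2(x); r3(x); r1(x); w1(x) -> serializable
s = [(2, 'R', 'x'), (2, 'W', 'x'), (3, 'R', 'x'), (1, 'R', 'x'), (1, 'W', 'x')]
print(conflict_serializable(s))  # True
```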
Q Consider the following schedule for transactions T1, T2 and T3: (GATE – 2010) (2 Marks)
Which one of the schedules below is the correct serialization of the above?
(A) T1->>T3->>T2 (B) T2->>T1->>T3 (C) T2->>T3->>T1 (D) T3->>T1->>T2
Answer: (A)
Q Consider two transactions T1 and T2 which form schedules S1, S2, S3 and S4 as follows: -
T1: R1 [A], W1 [A], W1 [B] T2: R2 [A], R2 [B], W2 [B]
S1: R1[A], R2 [A], R2[B], W1[A], W2[B], W1[B]
S2: R1[A], R2 [A], R2[B], W1[A], W1[B], W2 [B]
S3: R2[A], R1 [A], R2[B], W1[A], W1[B], W2[B]
S4: R1[A], W2 [A], R2[A], W1[B], R2[B], W2[B]
Which of the above schedules is conflicts serializable? (GATE - 2009) (2 Marks)
a) Only S1 b) Both S1 and S2 c) Both S1 and S4 d) Both S3 and S4
Q Consider the following four schedules due to three transactions (indicated by the subscript)
using read and write on a data item x, denoted by r(x) and w(x) respectively. Which one of
them is conflict serializable. (GATE - 2014) (2 Marks)
(a) r1 (x); r2 (x); w1 (x); r3 (x); w2 (x) (b) r2 (x);r1 (x);w2 (x);r3 (x);w1 (x)
(c) r3 (x);r2 (x);r1 (x);w2 (x);w1 (x) (d) r2 (x);w2 (x);r3 (x);r1 (x);w1 (x)
Ans: d
Q Consider the transactions T1, T2, and T3 and the schedules S1 and S2 given below. (GATE -
2006) (2 Marks)
T1: r1(X); r1(Z); w1(X); w1(Z)
T2: r2(Y); r2(Z); w2(Z)
T3: r3(Y); r3(X); w3(Y)
S1: r1(X); r3(Y); r3(X); r2(Y); r2(Z);w3(Y); w2(Z); r1(Z); w1(X); w1(Z)
S2: r1(X); r3(Y); r2(Y); r3(X); r1(Z);r2(Z); w3(Y); w1(X); w2(Z); w1(Z)
Which one of the following statements about the schedules is TRUE?
(A) Only S1 is conflict-serializable. (B) Only S2 is conflict-serializable.
(C) Both S1 and S2 are conflict-serializable. (D) Neither S1 nor S2 is conflict-serializable.
Answer: (A)
Q Consider three data items D1, D2 and D3 and the following execution schedule of
transactions T1, T2 and T3. In the diagram, R(D) and W(D) denote the actions reading and
writing the data item D respectively. (GATE – 2003) (2 Marks)
Which of the following statements is correct?
(A) The schedule is serializable as T2; T3; T1 (B) The schedule is serializable as T2; T1; T3
(C) The schedule is serializable as T3; T2; T1 (D) The schedule is not serializable
Answer: (D)
Q Consider following schedules involving two transactions : S 1 : r1(X); r1(Y); r2(X); r2(Y);
w2(Y); w1(X) S 2 : r1(X); r2(X); r2(Y); w2(Y); r1(Y); w1(X) Which of the following statement is
true ? (NET-JAN-2017)
(1) Both S1 and S2 are conflict serializable.
(2) S1 is conflict serializable and S2 is not conflict serializable.
(3) S1 is not conflict serializable and S2 is conflict serializable.
(4) Both S1 and S2 are not conflict serializable.
Ans: c
Q Consider the following four schedules due to three transactions (indicated by the subscript)
using read and write on a data item X, denoted by r(X) and w(X) respectively. Which one of
them is conflict serializable? (NET-NOV-2017)
S1: r1(X); r2(X); w1(X); r3(X); w2(X) S2: r2(X); r1(X); w2(X); r3(X); w1(X)
S3: r3(X); r2(X); r1(X); w2(X); w1(X) S4: r2(X); w2(X); r3(X); r1(X); w1(X)
(1) S1 (2) S2 (3) S3 (4) S4
Ans: d
Q Consider the following transactions with data items P and Q initialized to zero: (GATE-2012)
(2 Marks)
T1: read (P);
read (Q);
if P = 0 then Q: = Q + 1;
write (Q);
(C) A conflict serializable schedule
(D) A schedule for which a precedence graph cannot be drawn
Answer: (B)
Q Consider the following transaction involving two bank accounts x and y. (GATE - 2015) (1
Marks)
read(x);
x: = x – 50;
write(x);
read(y);
y: = y + 50;
write(y)
The constraint that the sum of the accounts x and y should remain constant is that of
(A) Atomicity (B) Consistency (C) Isolation (D) Durability
Answer: (B)
Q Consider the following schedule S of transactions T1, T2, T3, T4 (GATE - 2014) (2 Marks)
(D) S is neither conflict-serializable nor is it recoverable
Answer: (C)
Q Consider the following partial Schedule S involving two transactions T1 and T2. Only the
read and the write operations have been shown. The read operation on data item P is denoted
by read(P) and the write operation on data item P is denoted by write(P).
Suppose that the transaction T1 fails immediately after time instance 9. Which one of the
following statements is correct? (GATE-2015) (2 Marks)
(A) T2 must be aborted and then both T1 and T2 must be re-started to ensure transaction
atomicity
(B) Schedule S is non-recoverable and cannot ensure transaction atomicity
(C) Only T2 must be aborted and then re-started to ensure transaction atomicity
(D) Schedule S is recoverable and can ensure atomicity and nothing else needs to be done
Ans: b
Q Consider a simple checkpointing protocol and the following set of operations in the log.
(start, T4);
(write, T4, y, 2, 3);
(start, T1);
(commit, T4);
(write, T1, z, 5, 7);
(checkpoint);
(start, T2);
(write, T2, x, 1, 9);
(commit, T2);
(start, T3);
(write, T3, z, 7, 2);
If a crash happens now and the system tries to recover using both undo and redo operations,
what are the contents of the undo list and the redo list (GATE - 2015) (2 Marks)
(A) Undo: T3, T1; Redo: T2 (B) Undo: T3, T1; Redo: T2, T4
(C) Undo: none; Redo: T2, T4, T3; T1 (D) Undo: T3, T1, T4; Redo: T2
Answer: (A)
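Under the simple checkpointing protocol assumed here (work committed before the checkpoint is already on disk), the undo list holds the transactions still uncommitted at the crash, and the redo list holds the transactions committed after the checkpoint. A sketch reproducing answer (A); the record encoding is ours:

```python
def undo_redo(log):
    """log: list of tuples like ('start', T), ('write', T), ('commit', T),
    ('checkpoint',). Returns (undo_list, redo_list)."""
    committed, started, after_ckpt = set(), [], set()
    seen_ckpt = False
    for rec in log:
        kind, txn = rec[0], (rec[1] if len(rec) > 1 else None)
        if kind == 'checkpoint':
            seen_ckpt = True
        elif kind == 'start':
            started.append(txn)
        elif kind == 'commit':
            committed.add(txn)
            if seen_ckpt:
                after_ckpt.add(txn)       # must be redone after a crash
    undo = [t for t in reversed(started) if t not in committed]
    redo = [t for t in started if t in after_ckpt]
    return undo, redo

log = [('start', 'T4'), ('write', 'T4'), ('start', 'T1'), ('commit', 'T4'),
       ('write', 'T1'), ('checkpoint',), ('start', 'T2'), ('write', 'T2'),
       ('commit', 'T2'), ('start', 'T3'), ('write', 'T3')]
print(undo_redo(log))  # (['T3', 'T1'], ['T2'])
```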
VIEW SERIALIZABLE
• If a schedule is not conflict serializable, it can still be consistent, so we study a weaker
form of serializability called view serializability; even a schedule that is not view
serializable can still be consistent.
• If a schedule is conflict serializable then it will also be view serializable, so we must check
view serializability only if a schedule is not conflict serializable.
• If a schedule is not conflict serializable, it must contain at least one blind write to be
eligible for view serializability. That is, if a schedule is not conflict serializable and
contains no blind write, it can never be view serializable; if it is not conflict serializable
but does contain a blind write, it may or may not be view serializable.
• To test such a schedule, first tabulate all possible serial schedules, then check one by
one whether the given schedule is view equivalent to any of them. If yes, the schedule
is view serializable; otherwise it is not.
• Two schedules S and S' are view equivalent if they satisfy the following conditions –
o For each data item Q, if transaction Ti reads the initial value of Q in schedule S,
then Ti must also read the initial value of Q in schedule S'.
o If transaction Ti reads a value of data item Q in schedule S that was produced by
transaction Tj, then Ti must also read the value of Q produced by Tj in
schedule S'.
o For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must also perform the final write(Q) in schedule S'.
• Complexity-wise, deciding whether a schedule is view serializable is an NP-complete problem.
View Serializable
A schedule S is view serializable, if it is view equivalent to a serial schedule.
E.g.
• BLIND WRITES - In the above example, transactions T4 and T6 perform a write operation
on data item Q without first accessing (reading) it; such an update, made without
knowing the previous value of the data item, is called a blind update or BLIND WRITE.
• Every view serializable schedule that is not conflict serializable contains a BLIND WRITE.
Q Consider the following schedule ‘S’ with three transactions.
S: R1(B); R3(C); R1 (A); W2 (A); W1(A), W2 (B); W3 (A); W1 (B); W3 (B), W3 (C)
Which of the following is TRUE with respect to the above schedule?
a) It is conflict serializable with sequence [T1, T2 T3]
b) It is conflict serializable with sequence [T2, T1 T3]
c) It is view serializable but not conflict serializable
d) It is neither conflict serializable nor view serializable
Q The following schedule S is having 4 transactions and is executed concurrently. The order of
their operations is given below.
S: R1 (x); R2 (y); W2 (x); W3 (z); R4 (z); R3 (x); W3 (y); W1 (x); W2 (y) W3 (x); R4(x); W4 (y); commit
1; commit 2; commit 3; commit 4;
The schedule is
a) Conflict serializable with sequence [T1 T2 T3 T4]
b) Conflict serializable with sequence [T2 T1 T3 T4]
c) View serializable but not conflict serializable
d) Neither conflict serializable nor view serializable
RECOVERABILITY
RECOVERABLE SCHEDULE-
A schedule in which, for each pair of transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit or abort of Ti appears before the commit of Tj. Such a
schedule is called a recoverable schedule.
E.g. –
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1; is recoverable
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1; is not recoverable
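The definition can be checked mechanically for Sa and Sc (the operation encoding and function name are ours):

```python
def recoverable(schedule):
    """schedule: list of (txn, op, item); op in {'r','w','c','a'}; item may be None.
    Recoverable iff whenever Tj reads Ti's write, Ti commits/aborts before Tj commits."""
    writer = {}        # item -> last transaction that wrote it
    reads_from = {}    # Tj -> set of Ti that Tj has read from
    finished = set()   # committed or aborted transactions
    for txn, op, item in schedule:
        if op == 'w':
            writer[item] = txn
        elif op == 'r' and item in writer and writer[item] != txn:
            reads_from.setdefault(txn, set()).add(writer[item])
        elif op in ('c', 'a'):
            if op == 'c' and any(t not in finished for t in reads_from.get(txn, ())):
                return False   # Tj commits before some Ti it read from has finished
            finished.add(txn)
    return True

Sa = [(1, 'r', 'X'), (2, 'r', 'X'), (1, 'w', 'X'), (1, 'r', 'Y'), (2, 'w', 'X'),
      (2, 'c', None), (1, 'w', 'Y'), (1, 'c', None)]
Sc = [(1, 'r', 'X'), (1, 'w', 'X'), (2, 'r', 'X'), (1, 'r', 'Y'), (2, 'w', 'X'),
      (2, 'c', None), (1, 'a', None)]
print(recoverable(Sa), recoverable(Sc))  # True False
```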
CASCADING ROLLBACK
▪ Cascading rollback is the phenomenon in which a single transaction failure leads to a
series of transaction rollbacks.
▪ Even if a schedule is recoverable, the abort of one transaction may force many other
transactions to roll back.
▪ Cascading rollback is undesirable, since it leads to the undoing of a significant amount of work.
▪ Uncommitted reads are not allowed in a cascadeless schedule.
▪ E.g.
In the above schedule, T11 reads a data item written by T10, and T12 reads a data item
written by T11. So the abort of T10 may cause T11, and in turn T12, to roll back.
CASCADELESS SCHEDULE-
▪ To avoid cascading rollbacks, cascadeless schedules are used.
▪ A schedule in which, for each pair of transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit or abort of Ti appears before the read operation
of Tj. Such a schedule is called a cascadeless schedule.
STRICT SCHEDULE
A schedule in which, for each pair of transactions Ti and Tj, if Tj reads or writes a data item
previously written by Ti, then the commit or abort of Ti must appear before that read or
write operation of Tj.
Q Consider the following schedules:
S1: R1(x) W1(x)R1(y)R2(x)W2(x)C2, C1;
S2: R2(x) W2(x)R1(y)R1(x)W2(x)C2, C1;
Which of the following is true?
a) Both S1 and S2 are recoverable b) S1 is recoverable but S2 is not
c) S2 is recoverable but S1 is not d) Both schedule are not Recoverable
Q A Schedule that is not conflict serializable and contains at least one blind write then the
schedule is
a) Always view serializable b) Always non-serializable
c) May be serializable d) None of the above
c) Recoverable, cascade less and also strict d) Not recoverable
Q Consider the following two transactions:
T1: r1(X)w1(X)r1(Y)w1(Y)
T2: r2(Y)w2(Y)r2(Z)w2(Z)
where ri(V) denotes a read operation by transaction Ti on a variable V and wi(V) denotes a
write operation by transaction Ti on a variable V. The total number of conflict serializable
schedules that can be formed by T1 and T2 is ______(GATE-2017) (2 Marks)
Answer: 54
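The answer 54 can be verified by brute force: there are C(8,4) = 70 interleavings of the two transactions, and with only two transactions a schedule fails to be conflict serializable exactly when the precedence graph has edges in both directions. A sketch:

```python
from itertools import combinations

T1 = [('r', 'X'), ('w', 'X'), ('r', 'Y'), ('w', 'Y')]
T2 = [('r', 'Y'), ('w', 'Y'), ('r', 'Z'), ('w', 'Z')]

def serializable(schedule):
    """Two-transaction case: conflict serializable iff the precedence
    graph does not contain both edge T1->T2 and edge T2->T1."""
    e12 = e21 = False
    for i, (ti, oi, x) in enumerate(schedule):
        for tj, oj, y in schedule[i + 1:]:
            if ti != tj and x == y and 'w' in (oi, oj):
                if ti == 1:
                    e12 = True
                else:
                    e21 = True
    return not (e12 and e21)

count = 0
for pos in combinations(range(8), 4):     # slots taken by T1's operations
    sched, i1, i2 = [], iter(T1), iter(T2)
    for k in range(8):
        op, item = next(i1) if k in pos else next(i2)
        sched.append((1 if k in pos else 2, op, item))
    count += serializable(sched)
print(count)  # 54
```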
Q Suppose a database schedule S involves transactions T1, T2, .............,Tn. Consider the
precedence graph of S with vertices representing the transactions and edges representing the
conflicts. If S is serializable, which one of the following orderings of the vertices of the
precedence graph is guaranteed to yield a serial schedule? (NET-NOV-2017)
(1) Topological order (2) Depth - first order
(3) Breadth - first order (4) Ascending order of transaction indices
Ans: a
System log /transaction log
• To be able to recover from failures, the recovery subsystem of the database maintains a
transaction log to keep track of all operations of a transaction that affect database items,
as well as other information that may be needed in the recovery operation, up to the
commit/abort point of the transaction.
• Log records can be used to trace the steps of a transaction in the event of a failure, when
the transaction needs to be rolled back and all of its operations must be either redone
or undone.
• The log file is written to disk so that it survives hardware/logical failures.
• A log record for a transaction has the following syntax-
Q Usage of Pre-emption and Transaction Rollback prevents ______. (NET-SEP-2013)
(A) Unauthorized usage of data file (B) Deadlock situation
(C) Data manipulation (D) File pre-emption
Ans: b
Q (NET-DEC-2018)
Topological order in a precedence graph
• Pick a vertex V with in-degree zero in graph G, output it, and delete it from the graph.
• Repeat step 1 until the graph is empty.
• The order in which the vertices are deleted is the serializability order of the equivalent
serial schedule. The number of conflict-equivalent serial schedules equals the number of
topological orders possible in the given acyclic precedence graph.
T1->T2->T3->T4->T5
T1->T3->T2->T4->T5
T5->T1->T2->T3->T4
T5->T1->T3->T2->T4
T1->T5->T2->T3->T4
T1->T2->T5->T3->T4
T1->T2->T3->T5->T4
T1->T5->T3->T2->T4
T1->T3->T5->T2->T4
T1->T3->T2->T5->T4
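The enumeration above can be checked with a short brute-force script. The edge set below is an inferred, hypothetical precedence graph consistent with the listed orders (T1 before T2 and T3, both before T4, with T5 unrelated to the rest); it is an assumption for illustration, since the original figure is not shown:

```python
from itertools import permutations

def topological_orders(nodes, edges):
    """Enumerate all topological orders of a precedence graph.

    Brute force over permutations: fine for the handful of
    transactions that appear in these exercises.
    """
    orders = []
    for perm in permutations(nodes):
        pos = {t: i for i, t in enumerate(perm)}
        # Keep the permutation only if every edge u -> v is respected.
        if all(pos[u] < pos[v] for u, v in edges):
            orders.append(perm)
    return orders

# Hypothetical precedence graph consistent with the orders listed above.
edges = [("T1", "T2"), ("T1", "T3"), ("T2", "T4"), ("T3", "T4")]
orders = topological_orders(["T1", "T2", "T3", "T4", "T5"], edges)
print(len(orders))  # number of conflict-equivalent serial schedules
```

With this edge set the script reports 10 topological orders, each a conflict-equivalent serial schedule.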
Q (GATE - 2007) (2 Marks)
Q Which one of the following is NOT a part of the ACID properties of database transactions? (GATE- 2016) (1 Marks)
(a) Atomicity (b) Consistency
(c) Isolation (d) Deadlock-freedom
Ans: d
Q Consider the following database schedule with two transactions, T1 and T2.
S = r2(X); r1(X); r2(Y); w1(X); r1(Y); w2(X); a1; a2
where ri(Z) denotes a read operation by transaction Ti on a variable Z, wi(Z) denotes a write
operation by Ti on a variable Z and ai denotes an abort by transaction Ti.
Which one of the following statements about the above schedule is TRUE? (GATE- 2016) (1
Marks)
(a) S is non-recoverable (b) S is recoverable, but has a cascading abort
(c) S does not have a cascading abort (d) S is strict
Ans: c
(GATE – 2015) (1 Marks)
(a) Atomicity (b) Consistency (c) Isolation (d) Durability
Ans: b
Q Suppose a database schedule S involves transactions T1, ..., Tn. Construct the precedence
graph of S with vertices representing the transactions and edges representing the conflicts. If S
is serializable, which one of the following orderings of the vertices of the precedence graph is
guaranteed to yield a serial schedule? (GATE- 2016) (1 Marks)
(a) Topological order (b) Depth-first order
(c) Breadth-first order (d) Ascending order of transaction indices
Ans: a
Q In a certain database there are 2 transactions, one contains 5 instructions and another
contains 3 instructions. Then number of concurrent schedules possible is ______.
CONCURRENCY CONTROL
• We now understand how to check whether a given schedule will work correctly, i.e.
whether it will maintain the consistency of the database (conflict serializability, view
serializability, recoverability and cascadelessness).
• Next we study protocols that guarantee, by construction, that the generated schedules
ensure conflict serializability or the other properties.
• There are different approaches to ensure conflict serializability, which is the most
important property. First we must recall when two instructions conflict, because if we
prevent conflicts, the generated schedule will always be conflict serializable.
• Recall that two instructions conflict if and only if three things hold simultaneously:
o they belong to two different transactions,
o they access the same data item, and
o at least one of them is a write operation.
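The standard conflict test (different transactions, same data item, at least one write) can be sketched as a small predicate. The tuple encoding of an operation is an assumption for illustration:

```python
def conflicting(op1, op2):
    """Two schedule operations conflict iff they belong to different
    transactions, touch the same data item, and at least one writes.
    An operation is encoded as (transaction_id, action, item),
    with action in {"R", "W"}."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and "W" in (a1, a2)

print(conflicting((1, "W", "A"), (2, "R", "A")))  # True: W-R on the same item
print(conflicting((1, "R", "A"), (2, "R", "A")))  # False: two reads never conflict
```

Swapping any adjacent pair of non-conflicting operations yields a conflict-equivalent schedule, which is exactly the notion the protocols below preserve.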
• Now the question is how to enforce conflict serializability; there are two popular
approaches.
o Time stamping based method: - before entering the system, a specific order is
decided among the transactions, so in case of a clash we can decide which
one to allow and which to stop.
o Lock based method: - we ask a transaction to first lock a data item before
using it, so that no other transaction can use that data item at the same time,
removing any possibility of conflict.
▪ 2 phase locking
• Basic 2PL
• Conservative 2PL
• Rigorous 2PL
• Strict 2PL
▪ Graph based protocol
o Validation based protocol – when the majority of transactions are read-only,
the rate of conflicts among transactions may be low; thus many transactions,
if executed without the supervision of a concurrency control scheme, would
nevertheless leave the system in a consistent state.
TIME STAMP ORDERING PROTOCOL
• Basic idea of time stamping is to decide the order between the transaction before they
enter in the system using a stamp (time stamp), in case of any conflict during the
execution order can be decided using the time stamp.
• Let us understand how this protocol works; there are two ideas of timestamping, one
for transactions, and the other for data items.
• Time stamp with transaction
o With each transaction Ti in the system, we associate a unique fixed timestamp,
denoted by TS(Ti). This timestamp is assigned by the database system at the
time the transaction enters the system. If a transaction Ti has been assigned
timestamp TS(Ti) and a new transaction Tj enters the system, then always
TS(Ti) < TS(Tj).
o Two things are to be noted: first, the timestamp of a transaction remains fixed
throughout its execution; second, it is unique, meaning no two transactions can
have the same timestamp.
o We call it a time stamp rather than just a stamp because the value of the
system clock is used as the stamp. The advantage is that it will always be
unique, as time never repeats, and there is no need to reset and start over
with fresh values.
o The timestamps of the transactions also determine the serializability order.
Thus, if TS(Ti) < TS(Tj), then the system must ensure that the produced schedule
is equivalent to a serial schedule in which transaction Ti appears before
transaction Tj.
• Time stamp with data item
o In order to assure such scheme, the protocol maintains for each data item Q
two timestamp values:
o W-timestamp(Q) is the largest time-stamp of any transaction that executed
write(Q) successfully.
o R-timestamp(Q) is the largest time-stamp of any transaction that executed
read(Q) successfully.
o These timestamps are updated whenever a new read(Q) or write(Q)
instruction is executed.
• Suppose a transaction Ti requests a read(Q)
o If TS(Ti ) < W-timestamp(Q), then Ti needs to read a value of Q that was
already overwritten. Hence, the read operation is rejected, and Ti is rolled
back.
o If TS(Ti )≥ W-timestamp(Q), then the read operation is executed, and R-
timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
• Suppose that transaction Ti issues write(Q).
o If TS(Ti ) < R-timestamp(Q), then the value of Q that Ti is producing was
needed previously, and the system assumed that that value would never be
produced. Hence, the write operation is rejected, and Ti is rolled back.
o If TS(Ti ) < W-timestamp(Q), then Ti is attempting to write an obsolete value of
Q. Hence, this write operation is rejected, and Ti is rolled back.
o Otherwise, the write operation is executed, and W-timestamp(Q) is set to
TS(Ti ).
• If a transaction Ti is rolled back by the concurrency control scheme as a result of
either a read or a write operation, the system assigns it a new timestamp and restarts
it.
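The read and write checks above can be sketched as a small class. The dict-based timestamp tables with a default of 0 and the string return values are assumptions for illustration; restarting a rolled-back transaction with a new timestamp is left out for brevity:

```python
class TimestampOrdering:
    """Minimal sketch of the basic timestamp-ordering checks."""

    def __init__(self):
        self.r_ts = {}  # R-timestamp(Q) per data item
        self.w_ts = {}  # W-timestamp(Q) per data item

    def read(self, ts, q):
        if ts < self.w_ts.get(q, 0):
            return "rollback"  # Q was already overwritten by a younger txn
        self.r_ts[q] = max(self.r_ts.get(q, 0), ts)
        return "ok"

    def write(self, ts, q):
        if ts < self.r_ts.get(q, 0):   # a younger txn already read Q
            return "rollback"
        if ts < self.w_ts.get(q, 0):   # obsolete write
            return "rollback"
        self.w_ts[q] = max(self.w_ts.get(q, 0), ts)
        return "ok"

proto = TimestampOrdering()
print(proto.write(10, "Q"))  # ok: W-timestamp(Q) becomes 10
print(proto.read(5, "Q"))    # rollback: T(ts=5) must not see T(ts=10)'s write
```

The rollback of the older reader in the last line is exactly the starvation risk discussed in the conclusion below.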
Properties
• Conflict serializability: yes
• View serializability: yes
• Recoverability: no
• Cascadelessness: no
• Freedom from deadlock: yes
Conclusion: -
• It may cause starvation: if a sequence of conflicting short transactions causes
repeated restarting of a long transaction, then the long transaction may starve.
• It is relatively slow, as the conditions above must be checked before executing every
instruction. The timestamp ordering protocol ensures that every schedule it produces
is conflict serializable.
• This protocol can also be used to determine the serializability order (the order in which
the transactions must appear to execute) in advance.
THOMAS WRITE RULE
• Thomas write rule is an improvement on the timestamp ordering protocol; with a small
modification it can even generate schedules that are view serializable but not conflict
serializable, because it allows greater potential concurrency.
• It is a modified version of the timestamp-ordering protocol in which obsolete write
operations may be ignored under certain circumstances.
• The protocol rules for read operations remain unchanged, while for write operations
the Thomas write rule differs slightly from timestamp ordering.
• When Ti attempts to write data item Q, if TS(Ti ) < W-timestamp(Q), then Ti is
attempting to write an obsolete value of Q. Rather than rolling back Ti as the
timestamp ordering protocol would do, this write operation is simply ignored.
Otherwise the protocol is the same as the timestamp ordering protocol.
• This modification is valid because, for any transaction with TS(Ti ) < W-timestamp(Q),
the value written by that transaction would never have been read by any other
transaction performing read(Q); ignoring such an obsolete write is therefore acceptable.
• Thomas' write rule allows greater potential concurrency: it allows some view-serializable
schedules that are not conflict serializable.
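The modified write rule can be sketched as a single function. The dict-based timestamp tables with a default of 0 and the string return values are assumptions for illustration:

```python
def thomas_write(ts, q, r_ts, w_ts):
    """Write rule under Thomas' write rule: an obsolete write is silently
    ignored instead of forcing a rollback (the read rule is unchanged).
    r_ts / w_ts are dicts of R- / W-timestamps keyed by data item."""
    if ts < r_ts.get(q, 0):
        return "rollback"   # a younger transaction already read Q
    if ts < w_ts.get(q, 0):
        return "ignore"     # obsolete write: skip it, do not roll back
    w_ts[q] = ts
    return "ok"

# A write with ts=10 against W-timestamp(Q)=15 is obsolete:
# basic timestamp ordering would roll the transaction back,
# Thomas' rule just ignores the write.
print(thomas_write(10, "Q", {}, {"Q": 15}))  # ignore
```

The only divergence from the basic protocol is the `"ignore"` branch; every other case behaves identically.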
• E.g.- (Precedence graph over T3, T4 and T6 contains a cycle.)
• The above schedule is not allowed under timestamp ordering, as it is not conflict
serializable (its precedence graph contains a cycle), while it is allowed under the
Thomas write rule protocol, which ignores the write(Q) operation of T3, it being an
obsolete update to the value. The equivalent serial schedule is T3->T4->T6.
Q In a database system, unique timestamps are assigned to each transaction using Lamport's
logical clock. Let TS(T1) and TS(T2) be the timestamps of transactions T1 and T2 respectively.
T1 holds a lock on the resource R, and T2 has requested a conflicting lock on the same
resource R. The following algorithm is used to prevent deadlocks in the database system,
assuming that a killed transaction is restarted with the same timestamp:
if TS(T2) < TS(T1) then
T1 is killed
else T2 waits.
Assume any transaction that is not killed terminates eventually. Which of the following is
TRUE about the database system that uses the above algorithm to prevent deadlocks?
(GATE-2017) (2 Marks)
(a) The database system is both deadlock-free and starvation-free
(b) The database system is deadlock-free, but not starvation-free
(c) The database system is starvation-free, but not deadlock-free
(d) The database system is neither deadlock-free nor starvation-free
Ans: a
Lock Based Protocols
• To ensure isolation, we require that data items be accessed in a mutually exclusive
manner, i.e. while one transaction is accessing a data item, no other transaction can
modify that data item. Locking is the most fundamental approach to ensure this, and
lock-based protocols enforce it. The idea is to first obtain a lock on the desired data
item, then, if the lock is granted, perform the operation, and then unlock it.
• In general, we support two modes of lock in order to provide better concurrency.
• A data item can be locked in two modes-
• Shared mode
o If transaction Ti has obtained a shared-mode lock (denoted by S) on any data
item Q, then Ti can read, but cannot write Q, any other transaction can also
acquire a shared mode lock on the same data item(this is the reason we called
this shared mode).
• Exclusive mode
o If transaction Ti has obtained an exclusive-mode lock (denoted by X) on any data
item Q, then Ti can both read and write Q, any other transaction cannot acquire
either a shared or exclusive mode lock on the same data item. (this is the reason
we called this exclusive mode)
• Lock-Compatibility Matrix
        S      X
  S   true   false
  X   false  false
• Conclusion: shared is compatible only with shared, while exclusive is compatible with
neither shared nor exclusive.
• To access a data item, transaction Ti must first lock that item, if the data item is already
locked by another transaction in an incompatible mode, or some other transaction is
already waiting in non-compatible mode, then concurrency control manager will not
grant the lock until all incompatible locks held by other transactions have been released.
The lock is then granted.
• Pitfalls of lock-based protocols- Locking by itself does not ensure serializability, as
granting and releasing of locks follow no particular order and any transaction may
lock and unlock at any time. In the example below we can see that, even though the
transactions use locking, the schedule is neither conflict serializable nor free from
deadlock. Other problems also persist: non-recoverability or cascading rollbacks may
occur.
S
T1                      T2
LOCK-X(A)
READ(A)
WRITE(A)
UNLOCK(A)
                        LOCK-S(B)
                        READ(B)
                        UNLOCK(B)
LOCK-X(B)
READ(B)
WRITE(B)
UNLOCK(B)
                        LOCK-S(A)
                        READ(A)
                        UNLOCK(A)
• If we do not use locking, or if we unlock data items too soon after reading or writing
them, we may get inconsistent states, as there exists a possibility of dirty read. On the
other hand, if we do not unlock a data item before requesting a lock on another data
item, deadlocks may occur and concurrency will be poor.
• We shall require that each transaction in the system follow a set of rules, called a locking
protocol, indicating when a transaction may lock and unlock each of the data items for e.g.
2pl or graph based locking.
• Locking protocols restrict the number of possible schedules. The set of all such schedules
is a proper subset of all possible serializable schedules.
• We say that a schedule S is legal under a given locking protocol if S is a possible
schedule for a set of transactions that follows the rules of the locking protocol. We say
that a locking protocol ensures conflict serializability if and only if all legal schedules are
conflict serializable.
• When a transaction requests a lock on a data item in a particular mode, and no other
transaction has a lock on the same data item in a conflicting mode, the lock can be
granted. However, care must be taken to avoid the following scenario. Suppose a
transaction T2 has a shared-mode lock on a data item, and another transaction T1
requests an exclusive-mode lock on the data item. Clearly, T1 has to wait for T2 to
release the shared-mode lock. Meanwhile, a transaction T3 may request a shared-mode
lock on the same data item. The lock request is compatible with the lock granted to T2, so
T3 may be granted the shared-mode lock. At this point T2 may release the lock, but still
T1 has to wait for T3 to finish. But again, there may be a new transaction T4 that requests
a shared-mode lock on the same data item, and is granted the lock before T3 releases it.
In fact, it is possible that there is a sequence of transactions that each requests a
shared-mode lock on the data item, and each transaction releases the lock a short
while after it is granted, but T1 never gets the exclusive-mode lock on the data item. The
transaction T1 may never make progress, and is said to be starved.
• We can avoid starvation of transactions by granting locks in the following manner:
When a transaction Ti requests a lock on a data item Q in a particular mode M, the
concurrency-control manager grants the lock provided that:
o There is no other transaction holding a lock on Q in a mode that conflicts with M.
o There is no other transaction that is waiting for a lock on Q and that made its lock
request before Ti.
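The two grant conditions above can be sketched as a predicate. The compatibility table, the timestamp-ordered waiting queue, and the function shape are assumptions for illustration:

```python
def can_grant(request_mode, held_modes, waiting_queue, requester_ts):
    """Starvation-free grant test for a lock on data item Q.
    held_modes: lock modes currently held on Q.
    waiting_queue: (timestamp, mode) pairs of transactions already
    waiting on Q; smaller timestamp means an earlier request."""
    compatible = {("S", "S"): True, ("S", "X"): False,
                  ("X", "S"): False, ("X", "X"): False}
    # Condition 1: no conflicting lock is currently held on Q.
    if not all(compatible[(m, request_mode)] for m in held_modes):
        return False
    # Condition 2: no transaction made a still-pending request before us.
    if any(ts < requester_ts for ts, _ in waiting_queue):
        return False
    return True

# T3 asks for S while T2 holds S but the older T1 is waiting for X:
# the request is refused, which is exactly what keeps T1 from starving.
print(can_grant("S", ["S"], [(1, "X")], requester_ts=3))  # False
```

Without condition 2 the request would be granted (S is compatible with S), reproducing the starvation scenario described above.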
• a refinement of the basic two-phase locking protocol, in which lock conversions are
allowed. We shall provide a mechanism for upgrading a shared lock to an exclusive lock,
and downgrading an exclusive lock to a shared lock. We denote conversion from shared
to exclusive modes by upgrade, and from exclusive to shared by downgrade. Lock
conversion cannot be allowed arbitrarily. Rather, upgrading can take place in only the
growing phase, whereas downgrading can take place in only the shrinking phase.
Two phase locking protocol(2PL)
• The protocol ensures that each transaction issues lock and unlock requests in two
phases; note that each transaction is two-phased, not the schedule.
• Growing phase- A transaction may obtain locks, but not release any lock.
• Shrinking phase- A transaction may release locks, but may not obtain any new lock.
• Initially a transaction is in the growing phase and acquires locks as needed
(performing operations in between) until it reaches its lock point; once the transaction
releases a lock, it can issue no more lock requests, i.e. it enters the shrinking phase.
T1                      T2
LOCK-X(A)
READ(A)
WRITE(A)
                        LOCK-S(B)
                        READ(B)
LOCK-X(B)
READ(B)
WRITE(B)
                        LOCK-S(A)
                        READ(A)
UNLOCK(B)
UNLOCK(A)
                        UNLOCK(B)
                        UNLOCK(A)
Properties
2PL ensures conflict serializability, and the ordering of transactions by their lock points is
itself a serializability order of the schedule.
If a schedule is generated under the 2PL protocol then it is definitely conflict serializable.
But it is not necessary that every conflict-serializable schedule can be generated by 2PL.
The equivalent serial schedule is based on the order of lock points.
S2
T1                      T2
LOCK-X(A)
READ(A)
WRITE(A)
UNLOCK(A)
                        LOCK-S(A)
                        READ(A)
                        Commit
Commit
Q Which of the following concurrency control protocols ensure both conflict serializability
and freedom from deadlock? (GATE-2010) (2 Marks)
I. 2-phase locking II. Time-stamp ordering
(A) I only (B) II only (C) Both I and II (D) Neither I nor II
Answer: (B)
Q Which of the following concurrency protocol ensures both conflict serializability and
freedom from deadlock? (NET-JUNE-2015)
(a) 2 - phase Locking (b) Time stamp - ordering
(1) Both (a) and (b) (2) (a) only
(3) (b) only (4) Neither (a) nor (b)
Ans. 3
Q Consider the following two-phase locking protocol. Suppose a transaction T accesses (for
read or write operations), a certain set of objects{O1,...,Ok}. This is done in the following
manner:
Step1. T acquires exclusive locks to O1,...,Ok in increasing order of their addresses.
Step2. The required operations are performed.
Step3. All locks are released.
This protocol will (GATE- 2016) (1 Marks)
(a) guarantee serializability and deadlock-freedom
(b) guarantee neither serializability nor deadlock-freedom
(c) guarantee serializability but not deadlock-freedom
(d) guarantee deadlock-freedom but not serializability
Ans: a
Q Which of the following is correct? (NET-DEC-2014)
I. Two phase locking is an optimistic protocol.
II. Two phase locking is pessimistic protocol
III. Time stamping is an optimistic protocol.
IV. Time stamping is pessimistic protocol.
(A) I and III (B) II and IV (C) I and IV (D) II and III
Ans: b
Q Which of the following concurrency protocol ensures both conflict serializability and
freedom from deadlock: (NET-JUNE-2014)
I. 2-phase locking II. Time-stamp ordering
(A) Both I & II (B) II only (C) I only (D) Neither I nor II
Ans: b
Q Which of the following time stamp ordering protocol(s) allow(s) the following schedules?
T: W1(A); W2(A); W3(A); R2(A); R4(A);
Time stamps: T1 : 5, T2 : 10, T3 : 15; T4 : 20
a) Thomas write rule b) Basic time stamp
Variants of Two- Phase locking method
• Different variants of 2PL are used to additionally ensure properties like freedom from
deadlock, recoverability and cascadelessness.
• Conservative 2PL
• The idea is that there is no growing phase: the transaction starts directly from its lock
point, i.e. it must first acquire all the required locks, and only then can it start
execution. If all the locks are not available, the transaction must release the locks it
has acquired and wait.
o The shrinking phase works as usual, and the transaction can unlock any data
item at any time.
o We must know in advance which data items a transaction will require, so
that all of them can be locked up front.
• Properties
• Conflict serializable, view serializable, independence from deadlock
o Irrecoverable schedules and cascading rollbacks are still possible.
RIGOROUS 2PL
• Rigorous 2PL requires that all locks be held until the transaction commits.
• This protocol requires that locking be two phase and also that all the locks taken by a
transaction be held until that transaction commits.
• Hence there is effectively no shrinking phase during execution.
• Properties
• Conflict serializable, view serializable, recoverable and cascadeless
• Deadlocks are still possible.
STRICT 2PL
• Strict 2PL requires that all exclusive-mode locks taken by a transaction be held until
that transaction commits. This requirement ensures that any data written by an
uncommitted transaction is locked in exclusive mode until the transaction commits,
preventing any other transaction from reading it.
• This protocol requires that locking be two phase and also that exclusive-mode locks
taken by a transaction be held until that transaction commits.
• It is thus a relaxed form of rigorous 2PL: shared locks may be released before commit.
• It ensures serializability (the equivalent serial schedule order is based on the order of
lock points).
• It ensures strictness, and hence recoverability and cascadelessness.
• Deadlock is still possible in strict 2PL.
• In general, deadlocks are a necessary evil associated with locking, if we want to avoid
inconsistent states. Deadlocks are definitely preferable to inconsistent states, since they
can be handled by rolling back transactions, whereas inconsistent states may lead to real-
world problems that cannot be handled by the database system.
• Strict two-phase locking and rigorous two-phase locking (with lock conversions) are used
extensively in commercial database systems.
Q Which of the following statement is/are correct
a) Every conflict serializable schedule allowed under 2PL protocol is allowed by basic time
stamping protocol.
b) Every schedule allowed under basic time stamping protocol is allowed by Thomas-write
rule
c) Every schedule allowed under Thomas-write rule is allowed by basic time stamping
protocol
d) none
Graph based protocol
• if we wish to develop protocols that are not two phase, we need additional information
on how each transaction will access the database.
• There are various models that can give us the additional information, each differing in
the amount of information provided.
• The simplest model requires that we have prior knowledge about the order in which the
database items will be accessed.
• Given such information, it is possible to construct locking protocols that are not two
phases, but that, nevertheless, ensure conflict serializability.
• To acquire such prior knowledge, we impose a partial ordering → on the set D = {d1,
d2,..., dh} of all data items. If di → d j, then any transaction accessing both di and d j must
access di before accessing d j.
• This partial ordering may be the result of either the logical or the physical organization
of the data, or it may be imposed solely for the purpose of concurrency control.
• The partial ordering implies that the set D may now be viewed as a directed acyclic
graph, called a database graph.
• Here, for the sake of simplicity, we will follow two restrictions:
o We will study only graphs that are rooted trees.
o We will employ only exclusive locks.
Tree Protocol
• In the tree protocol, the only lock instruction allowed is lock-X. Each transaction Ti can
lock a data item at most once, and must observe the following rules:
• The first lock by Ti may be on any data item.
• Subsequently, a data item Q can be locked by Ti only if the parent of Q is currently
locked by Ti.
• Data items may be unlocked at any time.
• A data item that has been locked and unlocked by Ti cannot subsequently be relocked
by Ti.
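The rules above can be checked with a short sketch. The parent map, the function shape, and the choice to verify only rules 1, 2 and 4 (unlock timing is not modelled) are assumptions for illustration:

```python
def legal_under_tree_protocol(parent, lock_sequence):
    """Check one transaction's lock-X requests against the tree protocol.
    parent maps each node to its parent (root -> None);
    lock_sequence is the order in which the transaction locks items.
    Unlock timing is not modelled: only rules 1, 2 and 4 are checked."""
    locked = set()  # items this transaction has locked so far
    for i, q in enumerate(lock_sequence):
        if q in locked:
            return False  # rule 4: an item can never be relocked
        if i > 0 and parent.get(q) not in locked:
            return False  # rule 2: the parent must already be locked
        locked.add(q)
    return True

# Hypothetical database graph: A is the root, B and C its children,
# D a child of B.
tree = {"A": None, "B": "A", "C": "A", "D": "B"}
print(legal_under_tree_protocol(tree, ["B", "D"]))  # True: first lock anywhere
print(legal_under_tree_protocol(tree, ["B", "C"]))  # False: C's parent A unheld
```

The second call fails because after the first lock, every new item must be reached through a currently locked parent.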
Properties
• All schedules that are legal under the tree protocol are conflict serializable.
• The tree protocol ensures freedom from deadlock.
• The tree protocol does not ensure recoverability and cascadelessness.
• The tree-locking protocol has another advantage over the two-phase locking protocol in
that unlocking may occur earlier. Earlier unlocking may lead to shorter waiting times,
and to an increase in concurrency.
• A transaction may have to lock data items that it does not access. This additional locking
results in increased locking overhead, the possibility of additional waiting time, and a
potential decrease in concurrency.
• Without prior knowledge of what data items will need to be locked, transactions will
have to lock the root of the tree, and that can reduce concurrency greatly.
• there may be conflict-serializable schedules that cannot be obtained through the tree
protocol. Indeed, there are schedules possible under the two-phase locking protocol that
are not possible under the tree protocol, and vice versa.
• To ensure recoverability and cascadelessness, the protocol can be modified to not
permit release of exclusive locks until the end of the transaction. Holding exclusive locks
until the end of the transaction reduces concurrency.
Deadlock Handling
• There are two principal methods for dealing with the deadlock problem.
• We can use a deadlock prevention protocol to ensure that the system will never enter a
deadlock state, Prevention is commonly used if the probability that the system would
enter a deadlock state is relatively high
• Alternatively, we can allow the system to enter a deadlock state, and then try to recover
by using a deadlock detection and deadlock recovery scheme; this is more efficient
when deadlocks are rare.
• Both methods may result in transaction rollback. Note that a detection and
recovery scheme requires overhead that includes not only the run-time cost of
maintaining the necessary information and of executing the detection algorithm, but
also the potential losses inherent in recovery from a deadlock.
Deadlock Prevention
• One approach ensures that no hold-and-wait can occur; this simplest scheme requires
that each transaction lock all its data items before it begins execution: either all are
locked in one step or none are locked (e.g. conservative 2PL).
• Another approach ensures that no cyclic wait can occur by ordering the requests for
locks, i.e. by imposing an ordering on all data items and requiring that a transaction
lock data items only in a sequence consistent with that ordering (e.g. the tree protocol).
o it is often hard to predict, before the transaction begins, what data items need
to be locked;
o data-item utilization may be very low, since many of the data items may be
locked but unused for a long time.
• The other approach is closer to deadlock recovery, and performs transaction rollback
instead of waiting for a lock, whenever the wait could potentially result in a deadlock
i.e. to use preemption and transaction rollbacks.
• So when a transaction Tj requests a lock that transaction Ti holds, the lock granted to Ti
may be preempted by rolling back of Ti, and granting of the lock to Tj. To control the
preemption, we assign a unique timestamp, to each transaction when it begins. The
system uses these timestamps only to decide whether a transaction should wait or roll
back. Locking is still used for concurrency control. If a transaction is rolled back, it retains
its old timestamp when restarted. Two different deadlock-prevention schemes using
timestamps have been proposed:
o The wait–die scheme is a non-preemptive technique. When transaction Ti
requests a data item currently held by Tj , Ti is allowed to wait only if it has a
timestamp smaller than that of Tj (that is, Ti is older than Tj). Otherwise, Ti is
rolled back (dies).
o The wound–wait scheme is a preemptive technique. It is a counterpart to the
wait–die scheme. When transaction Ti requests a data item currently held by Tj ,
Ti is allowed to wait only if it has a timestamp larger than that of Tj (that is, Ti is
younger than Tj). Otherwise, Tj is rolled back (Tj is wounded by Ti).
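The two schemes above can be sketched as pure decision functions. Here a smaller timestamp means an older transaction; the return strings are assumptions for illustration:

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die (non-preemptive): an older requester waits,
    a younger requester dies (is rolled back)."""
    return "wait" if ts_requester < ts_holder else "rollback requester"

def wound_wait(ts_requester, ts_holder):
    """Wound-wait (preemptive): an older requester wounds (rolls back)
    the holder, a younger requester waits."""
    return "rollback holder" if ts_requester < ts_holder else "wait"

# Ti (ts=5, older) requests an item currently held by Tj (ts=9, younger):
print(wait_die(5, 9))    # wait
print(wound_wait(5, 9))  # rollback holder
```

In both schemes the older transaction is never rolled back, and since a restarted transaction keeps its old timestamp, it eventually becomes the oldest and cannot starve.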
• The major problem with both of these schemes is that unnecessary rollbacks may occur.
Another simple approach to deadlock prevention is based on lock timeouts. In this
approach, a transaction that has requested a lock waits for at most a specified amount
of time. If the lock has not been granted within that time, the transaction is said to time
out, and it rolls itself back and restarts. If there was in fact a deadlock, one or more
transactions involved in the deadlock will time out and roll back, allowing the others to
proceed. This scheme falls somewhere between deadlock prevention, where a deadlock
will never occur, and deadlock detection and recovery.
• The timeout scheme is particularly easy to implement, and works well if transactions
are short and if long waits are likely to be due to deadlocks. However, in general it is hard
to decide how long a transaction must wait before timing out. Too long a wait results in
unnecessary delays once a deadlock has occurred. Too short a wait results in transaction
rollback even when there is no deadlock, leading to wasted resources. Starvation is also
a possibility with this scheme. Hence, the timeout-based scheme has limited
applicability.
Deadlock Detection and Recovery
• If a system does not employ some protocol that ensures deadlock freedom, then a
detection and recovery scheme must be used. An algorithm that examines the state of
the system is invoked periodically to determine whether a deadlock has occurred. If one
has, then the system must attempt to recover from the deadlock. To do so, the system
must:
• Maintain information about the current allocation of data items to transaction, as well
as any outstanding data item requests.
• Provide an algorithm that uses this information to determine whether the system has
entered a deadlock state.
• Recover from the deadlock when the detection algorithm determines that a deadlock
exists.
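The detection step amounts to finding a cycle in a wait-for graph. The graph encoding as a dict of successor sets is an assumption for illustration:

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph {txn: set of txns it waits on}
    using an iterative DFS; a cycle means the system is deadlocked."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on stack / finished
    colour = {t: WHITE for t in wait_for}
    for start in wait_for:
        if colour[start] != WHITE:
            continue
        stack = [(start, iter(wait_for.get(start, ())))]
        colour[start] = GREY
        while stack:
            node, children = stack[-1]
            for child in children:
                if colour.get(child, WHITE) == GREY:
                    return True           # back edge: cycle found
                if colour.get(child, WHITE) == WHITE:
                    colour[child] = GREY
                    stack.append((child, iter(wait_for.get(child, ()))))
                    break
            else:
                colour[node] = BLACK
                stack.pop()
    return False

print(has_deadlock({"T1": {"T2"}, "T2": {"T3"}, "T3": set()}))   # False
print(has_deadlock({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}))  # True
```

Running this check periodically, on every lock request that must wait, or on a timer, is the policy decision the text describes.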
Recovery from Deadlock
• When a detection algorithm determines that a deadlock exists, the system must
recover from the deadlock. The most common solution is to roll back one or more
transactions to break the deadlock. Three actions need to be taken:
• Selection of a victim. Given a set of deadlocked transactions, we must determine which
transaction (or transactions) to roll back to break the deadlock. We should roll back
those transactions that will incur the minimum cost. Unfortunately, the term minimum
cost is not a precise one. Many factors may determine the cost of a rollback, including:
• How long the transaction has computed, and how much longer the transaction will
compute before it completes its designated task.
• How many data items the transaction has used.
• How many more data items the transaction needs for it to complete.
• How many transactions will be involved in the rollback.
• Rollback. Once we have decided that a particular transaction must be rolled back, we
must determine how far this transaction should be rolled back.
• The simplest solution is a total rollback: Abort the transaction and then restart it.
• However, it is more effective to roll back the transaction only as far as necessary to
break the deadlock. Such partial rollback requires the system to maintain additional
information about the state of all the running transactions. Specifically, the sequence of
lock requests/grants and updates performed by the transaction needs to be recorded.
The deadlock detection mechanism should decide which locks the selected transaction
needs to release in order to break the deadlock. The selected transaction must be rolled
back to the point where it obtained the first of these locks, undoing all actions it took
after that point. The recovery mechanism must be capable of performing such partial
rollbacks. Furthermore, the transactions must be capable of resuming execution after a
partial rollback.
• Starvation. In a system where the selection of victims is based primarily on cost factors, it
may happen that the same transaction is always picked as a victim. As a result, this
transaction never completes its designated task, thus there is starvation. We must
ensure that a transaction can be picked as a victim only a (small) finite number of times.
The most common solution is to include the number of rollbacks in the cost factor.
Multiple Granularity
In the concurrency-control schemes described thus far, we have used each individual data
item as the unit on which synchronization is performed.
There are circumstances, however, where it would be advantageous to group several data
items, and to treat them as one individual synchronization unit. For example, if a
transaction Ti needs to access the entire database, and a locking protocol is used, then Ti
must lock each item in the database. Clearly, executing these locks is time-consuming. It
would be better if Ti could issue a single lock request to lock the entire database. On the
other hand, if transaction Tj needs to access only a few data items, it should not be
required to lock the entire database, since otherwise concurrency is lost.
What is needed is a mechanism to allow the system to define multiple levels of granularity.
This is done by allowing data items to be of various sizes and defining a hierarchy of data
granularities, where the small granularities are nested within larger ones. Such a hierarchy
can be represented graphically as a tree. A non-leaf node of the multiple-granularity tree
represents the data associated with its descendants. In the tree protocol, each node is an
independent data item.
As an illustration, consider the tree of Figure, which consists of four levels of nodes. The
highest level represents the entire database. Below it are nodes of type area; the database
consists of exactly these areas. Each area in turn has nodes of type file as its children. Each
area contains exactly those files that are its child nodes. No file is in more than one area.
Finally, each file has nodes of type record. As before, the file consists of exactly those
records that are its child nodes, and no record can be present in more than one file.
Each node in the tree can be locked individually. As we did in the two-phase locking
protocol, we shall use shared and exclusive lock modes. When a transaction locks a node,
in either shared or exclusive mode, the transaction also has implicitly locked all the
descendants of that node in the same lock mode. For example, if transaction Ti gets an
explicit lock on file Fc of Figure, in exclusive mode, then it has an implicit lock in exclusive
mode on all the records belonging to that file. It does not need to lock the individual
records of Fc explicitly.
[Figure: a four-level granularity hierarchy — the entire database at the root; areas A1 and A2; files (such as Fb and Fc) within each area; and records rb1 … rbk, rc1 … rcm as leaves.]
Suppose now that transaction Tk wishes to lock the entire database. To do so, it simply must
lock the root of the hierarchy. Note, however, that Tk should not succeed in locking the root
node, since Ti is currently holding a lock on part of the tree (specifically, on file Fb). But how
does the system determine if the root node can be locked? One possibility is for it to search
the entire tree. This solution, however, defeats the whole purpose of the multiple-
granularity locking scheme. A more efficient way to gain this knowledge is to introduce a
new class of lock modes, called intention lock modes. If a node is locked in an intention
mode, explicit locking is done at a lower level of the tree (that is, at a finer granularity).
Intention locks are put on all the ancestors of a node before that node is locked explicitly.
Thus, a transaction does not need to search the entire tree to determine whether it can lock
a node successfully. A transaction wishing to lock a node—say, Q—must traverse a path in
the tree from the root to Q. While traversing the tree, the transaction locks the various
nodes in an intention mode.
There is an intention mode associated with shared mode, and there is one with exclusive
mode. If a node is locked in intention-shared (IS) mode, explicit locking is being done at a
lower level of the tree, but with only shared-mode locks. Similarly, if a node is locked in
intention-exclusive (IX) mode, then explicit locking is being done at a lower level, with
exclusive-mode or shared-mode locks. Finally, if a node is locked in shared and intention-
exclusive (SIX) mode , the subtree rooted by that node is locked explicitly in shared mode,
and that explicit locking is being done at a lower level with exclusive-mode locks.
Lock-mode compatibility matrix (true = compatible):
        IS      IX      S       SIX     X
IS      true    true    true    true    false
IX      true    true    false   false   false
S       true    false   true    false   false
SIX     true    false   false   false   false
X       false   false   false   false   false
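The compatibility matrix lends itself to a direct encoding. Below is a minimal Python sketch; the mode names and the `can_grant` helper are illustrative, not part of any real DBMS API:

```python
# Compatibility matrix for multiple-granularity lock modes.
# COMPAT[held][requested] is True if the requested mode can be
# granted while another transaction holds `held` on the same node.
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested, held_modes):
    """A lock request is grantable only if it is compatible with
    every mode currently held on the node by other transactions."""
    return all(COMPAT[h][requested] for h in held_modes)
```

For example, an IS request is compatible with an existing IX lock on the node, but an S request is not, since some descendant may already be locked exclusively.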
The multiple-granularity locking protocol uses these lock modes to ensure serializability. It
requires that a transaction Ti that attempts to lock a node Q must follow these rules:
1. Transaction Ti must observe the lock-compatibility matrix given above.
2. Transaction Ti must lock the root of the tree first, and it can lock it in any mode.
3. Transaction Ti can lock a node Q in S or IS mode only if Ti currently has the parent
of Q locked in either IX or IS mode.
4. Transaction Ti can lock a node Q in X, SIX, or IX mode only if Ti currently has the
parent of Q locked in either IX or SIX mode.
5. Transaction Ti can lock a node only if Ti has not previously unlocked any node (that
is, Ti is two phase).
6. Transaction Ti can unlock a node Q only if Ti currently has none of the children of Q
locked.
Observe that the multiple-granularity protocol requires that locks be acquired in top-
down (root-to-leaf) order, whereas locks must be released in bottom-up (leaf- to-root)
order. This protocol enhances concurrency and reduces lock overhead. It is particularly
useful in applications that include a mix of:
• Short transactions that access only a few data items.
• Long transactions that produce reports from an entire file or set of files.
Validation-Based Protocols
In cases where a majority of transactions are read-only transactions, the rate of conflicts
among transactions may be low. Thus, many of these transactions, if executed without the
supervision of a concurrency-control scheme, would nevertheless leave the system in a
consistent state.
The validation protocol requires that each transaction Ti executes in two or three different
phases in its lifetime, depending on whether it is a read-only or an update transaction. The
phases are, in order:
Read phase. During this phase, the system executes transaction Ti. It reads the values of the
various data items and stores them in variables local to Ti. It performs all write operations
on temporary local variables, without updates of the actual database.
Validation phase. The validation test (described below) is applied to transaction Ti. This
determines whether Ti is allowed to proceed to the write phase without causing a
violation of serializability. If a transaction fails the validation test, the system aborts the
transaction.
Write phase. If the validation test succeeds for transaction Ti, the temporary local variables
that hold the results of any write operations performed by Ti are copied to the database.
To perform the validation test, we need to know when the various phases of transactions
took place. We shall, therefore, associate three different timestamps with each transaction
Ti: Start(Ti), the time when Ti started its execution; Validation(Ti), the time when Ti
finished its read phase and started its validation phase; and Finish(Ti), the time when Ti
finished its write phase. The validation test for transaction Ti requires that, for each
transaction Tk with an earlier timestamp, one of the following two conditions must hold:
1. Finish(Tk) < Start(Ti). Since Tk completes its execution before Ti started, the
serializability order is indeed maintained.
2. The set of data items written by Tk does not intersect with the set of data items read
by Ti, and Tk completes its write phase before Ti starts its validation phase (Start(Ti) <
Finish(Tk) < Validation(Ti)). This condition ensures that the writes of Tk and Ti do not
overlap. Since the writes of Tk do not affect the read of Ti, and since Ti cannot affect
the read of Tk, the serializability order is indeed maintained.
The validation scheme automatically guards against cascading rollbacks, since the actual
writes take place only after the transaction issuing the write has committed. However,
there is a possibility of starvation of long transactions, due to a sequence of conflicting short
transactions that cause repeated restarts of the long transaction. To avoid starvation,
conflicting transactions must be temporarily blocked, to enable the long transaction to
finish.
This validation scheme is called the optimistic concurrency-control scheme since transactions
execute optimistically, assuming they will be able to finish execution and validate at the end.
In contrast, locking and timestamp ordering are pessimistic in that they force a wait or a
rollback whenever a conflict is detected, even though there is a chance that the schedule
may be conflict serializable.
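The two validation conditions described above can be sketched in Python. The transaction records and the `validates` helper below are illustrative assumptions: each transaction carries its Start, Validation and Finish timestamps plus its read and write sets.

```python
def validates(ti, committed):
    """Validation test for transaction `ti` against every already
    validated transaction tk with an earlier timestamp.  Each
    transaction is a dict with 'start', 'validation', 'finish',
    'read_set' and 'write_set' (a sketch, not a real DBMS API)."""
    for tk in committed:
        # Condition 1: tk finished before ti started, so the
        # serializability order is trivially maintained.
        if tk["finish"] < ti["start"]:
            continue
        # Condition 2: tk's writes do not touch ti's reads, and tk
        # finished its write phase before ti entered validation.
        if (not (tk["write_set"] & ti["read_set"])
                and tk["finish"] < ti["validation"]):
            continue
        return False          # neither condition holds -> abort ti
    return True               # ti may proceed to its write phase
```

If `validates` returns False, the optimistic scheme aborts and restarts the transaction rather than making it wait.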
When several transactions execute concurrently in the database, the consistency of data
may no longer be preserved. It is necessary for the system to control the interaction among
the concurrent transactions, and this control is achieved through one of a variety of
mechanisms called concurrency-control schemes.
To ensure serializability, we can use various concurrency-control schemes. All these schemes
either delay an operation or abort the transaction that issued the operation. The most
common ones are locking protocols, timestamp- ordering schemes, validation techniques,
and multiversion schemes.
• A locking protocol is a set of rules that state when a transaction may lock and
unlock each of the data items in the database.
• The two-phase locking protocol allows a transaction to lock a new data item
only if that transaction has not yet unlocked any data item. The protocol
ensures serializability, but not deadlock freedom. In the absence of
information concerning the manner in which data items are accessed, the two-
phase locking protocol is both necessary and sufficient for ensuring
serializability.
• The strict two-phase locking protocol permits release of exclusive locks only at
the end of transaction, in order to ensure recoverability and cascadelessness
of the resulting schedules. The rigorous two-phase locking protocol releases all locks
only at the end of the transaction.
• Graph-based locking protocols impose restrictions on the order in which items are
accessed, and can thereby ensure serializability without requiring the use of two-
phase locking, and can additionally ensure deadlock freedom.
• Various locking protocols do not guard against deadlocks. One way to prevent
deadlock is to use an ordering of data items, and to request locks in a sequence
consistent with the ordering.
• Another way to prevent deadlock is to use preemption and transaction rollbacks.
To control the preemption, we assign a unique timestamp to each transaction. The
system uses these timestamps to decide whether a transaction should wait or roll
back. If a transaction is rolled back, it retains its old timestamp when restarted. The
wound–wait scheme is a preemptive scheme.
• If deadlocks are not prevented, the system must deal with them by using a
deadlock detection and recovery scheme. To do so, the system constructs a wait-for
graph. A system is in a deadlock state if and only if the wait-for graph contains a
cycle. When the deadlock detection algorithm determines that a deadlock exists, the
system must recover from the deadlock. It does so by rolling back one or more
transactions to break the deadlock.
• There are circumstances where it would be advantageous to group several data
items, and to treat them as one aggregate data item for purposes of working,
resulting in multiple levels of granularity. We allow data items of various sizes, and
define a hierarchy of data items, where the small items are nested within larger ones.
Such a hierarchy can be represented graphically as a tree. Locks are acquired in root-
to-leaf order; they are released in leaf-to-root order. The protocol ensures
serializability, but not freedom from deadlock.
• A timestamp-ordering scheme ensures serializability by selecting an ordering in
advance between every pair of transactions. A unique fixed timestamp is
associated with each transaction in the system. The timestamps of the transactions
determine the serializability order. Thus, if the timestamp of transaction Ti is smaller
than the timestamp of transaction Tj , then the scheme ensures that the produced
schedule is equivalent to a serial schedule in which transaction Ti appears before
transaction Tj. It does so by rolling back a transaction whenever such an order is
violated.
• A validation scheme is an appropriate concurrency-control method in cases where a
majority of transactions are read-only transactions, and thus the rate of conflicts
among these transactions is low. A unique fixed timestamp is associated with each
transaction in the system. The serializability order is determined by the timestamp of
the transaction. A transaction in this scheme is never delayed. It must, however, pass
a validation test to complete. If it does not pass the validation test, the system rolls
it back to its initial state.
• A multiversion concurrency-control scheme is based on the creation of a new version
of a data item for each transaction that writes that item. When a read operation is
issued, the system selects one of the versions to be read. The concurrency-control
scheme ensures that the version to be read is selected in a manner that ensures
serializability, by using timestamps. A read operation always succeeds.
◦ In multiversion timestamp ordering, a write operation may result in the rollback of
the transaction.
◦ In multiversion two-phase locking, write operations may result in a lock wait or,
possibly, in deadlock.
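The wait-for-graph test mentioned in the deadlock-detection summary above amounts to a cycle check on a directed graph. A minimal Python sketch (the graph encoding and the function name are illustrative):

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph given as
    {transaction: set of transactions it is waiting for}.
    The system is in a deadlock state iff the graph has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on stack / done
    colour = {t: WHITE for t in wait_for}

    def visit(t):
        colour[t] = GREY
        for u in wait_for.get(t, ()):
            if colour.get(u, WHITE) == GREY:
                return True               # back edge -> cycle found
            if colour.get(u, WHITE) == WHITE and visit(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and visit(t) for t in wait_for)
```

Once a cycle is found, the system would pick a victim transaction on the cycle and roll it back, as described earlier.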
• Some applications trade serializability for performance by running at weaker levels of
consistency; cursor stability is a special case of degree-two consistency, and is
widely used.
• Concurrency control is a challenging task for transactions that span user
interactions. Applications often implement a scheme based on validation of writes
using version numbers stored in tuples; this scheme provides a weak level of
serializability, and can be implemented at the application level without
modifications to the database.
• Special concurrency-control techniques can be developed for special data
structures. Often, special techniques are applied in B+-trees to allow greater
concurrency. These techniques allow non-serializable access to the B+-tree, but they
ensure that the B+-tree structure is correct, and ensure that accesses to the database
itself are serializable.
Q If the interference among the transactions is less, which of the following concurrency
control techniques has less overhead in the execution?
a) Locking techniques b) Time stamping techniques
c) Validation technique d) None
Q Consider a directory having 3 files, each file has 1000 pages and each page has 100
records
In multiple granularity locking protocol, if a transaction reads records from page 200 to
page 700. What is the sequence of locks acquired?
a) IX on directory, S on file 1 b) IS on directory, S on file 1
c) IS on directory, X on file 1 d) IS on directory, S on file 2
Q __________ rules are used to limit the volume of log information that has to be handled and
processed in the event of system failure involving the loss of volatile information. (NET-
DEC-2014)
(A) Write-ahead log (B) Check-pointing
(C) Log buffer (D) Thomas
Ans: b
Q The basic variants of time-stamp based method of concurrency control are (NET-JUNE-
2011)
(A) Total time stamp-ordering (B) Partial time stamp ordering
(C) Multiversion Time stamp ordering (D) All of the above
Ans: d
Q Which of the following is the recovery management technique in DDBMS? (NET-JUNE-
2011)
(A) 2PC (Two Phase Commit) (B) Backup
(C) Immediate update (D) All of the above
Ans: d
Query Language
• After designing a database, that is, an ER diagram followed by conversion to the relational
model, followed by normalization and indexing, the next task is how to store, retrieve and
modify data in the database. Though here we will concentrate more on the retrieval part.
Query languages are used for this purpose.
• QUERY LANGUAGE
o A language using which a user requests some information from the database.
• Procedural Query Language
o Here the user instructs the system to perform a sequence of operations on the
database in order to compute the desired result.
o That is, the user specifies both what data is to be retrieved and how it is to be
retrieved. e.g. Relational Algebra.
• Non-Procedural Query Language
o In a non-procedural language, the user describes the desired information (only what
data is to be retrieved) without giving a specific procedure for obtaining that
information. e.g. Relational Calculus. Tuple relational calculus and domain relational
calculus are declarative query languages based on mathematical logic.
• Relational Algebra (procedural) and Relational Calculus (non-procedural) are mathematical
systems/query languages which are used to query the relational model. RA and RC are not
executed on any computer; they provide the fundamental mathematics on which SQL is
based.
• SQL (Structured Query Language) works on an RDBMS, and it includes elements of both
procedural and non-procedural query languages.
RELATIONAL ALGEBRA
• RA, like any other mathematical system, provides a number of operators; it uses relations
(tables) as operands and produces a new relation as its result.
• Every operator in RA accepts one or two relations/tables as input arguments and always
returns a single relation instance, without a name, as the result.
• By default it does not allow duplicates, as it is based on set theory. If the same query is
written in RA and SQL, the results may differ, since SQL permits duplicates.
• As it is pure mathematics there is no use of English keywords; operators are represented
using symbols.
• The relational algebra is a procedural query language.
• The fundamental operations in the relational algebra are select, project, union, set
difference, Cartesian product, and Rename.
• There are several other operations namely: set intersection, natural join, and assignment.
• The select, project, and rename operations are called unary operations, because they
operate on one relation.
• Union, Cartesian product and set difference operate on pairs of relations and are,
therefore, called binary operations.
• Relational algebra also provides the framework for query optimization.
• Relational schema - A relation schema R, denoted by R (A1, A2, ..., An), is made up of a
relation name R and a list of attributes, A1, A2, ..., An. Each attribute Ai is the name of a role
played by some domain D in the relation schema R. It is used to describe a relation.
• E.g. Schema representation of Table Student is as –
o STUDENT (NAME, ID, CITY, COUNTRY, HOBBY).
• Relational Instance - A relation together with its data at a particular instant of time.
Q If D1, D2, …... Dn are domains in a relational model, then the relation is a table, which is a
subset of (NET-JUNE-2013)
(A) D1 + D2 + ... + Dn (B) D1 × D2 × ... × Dn
(C) D1 ∪ D2 ∪ ... ∪ Dn (D) D1 – D2 – ... – Dn
Ans: b
OPERATORS USED IN RELATIONAL ALGEBRA
• BASIC / FUNDAMENTAL OPERATORS
Name Symbol
Select (σ)
Project (∏)
Union (∪)
Set difference (−)
Cross product (Χ)
Rename (ρ)
• DERIVED OPERATORS
The Project Operation (Vertical Selection)
● Main idea behind project operator is to select desired columns.
● The project operation is a unary operation that returns its argument relation, with certain
attributes left out.
● Projection is denoted by the uppercase Greek letter pi (Π).
● Πcolumn_name (table_name)
● We list those attributes that we wish to appear in the result as a subscript to Π, argument
relation follows in parentheses.
● The minimum number of columns selected can be 1; the maximum can be n − 1 (projecting
all n columns simply returns the relation itself).
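Modeling a relation as a pair (attribute names, set of tuples), the project operation can be sketched in Python. Using a set makes duplicate rows vanish automatically, matching the set-based semantics described above; the representation itself is an illustrative assumption.

```python
def project(columns, relation):
    """Π_columns(relation): keep only the named attributes.
    `relation` is (attr_names, set_of_tuples); duplicate rows
    disappear because the result is a set, as in pure RA."""
    attrs, rows = relation
    idx = [attrs.index(c) for c in columns]
    return columns, {tuple(row[i] for i in idx) for row in rows}
```

For instance, projecting only the name column of an instructor relation collapses instructors who share a name into a single tuple.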
Q Write a RELATIONAL ALGEBRA query to find the name of all customer having bank account?
Q Write a RELATIONAL ALGEBRA query to find each loan number along with loan amount?
Q Write a RELATIONAL ALGEBRA query to find the name of all customer without duplication
having bank account?
Q Write a RELATIONAL ALGEBRA query to find all the details of bank branches?
Q Suppose R1(A, B) and R2(C, D) are two relation schemas. Let r1 and r2 be the corresponding
relation instances. B is a foreign key that refers to C in R2. If data in r1 and r2 satisfy referential
integrity constraints, which of the following is ALWAYS TRUE? (Gate-2012) (2 Marks)
a) ∏B(r1) − ∏C(r2) = ∅ b) ∏C(r2) − ∏B(r1) = ∅
c) ∏B(r1) = ∏C(r2) d) ∏B(r1) − ∏C(r2) ≠ ∅
Ans: a
The Select Operation (Horizontal Selection)
● The select operation selects tuples that satisfy a given predicate/Condition p.
● Lowercase Greek letter sigma (σ) is used to denote selection.
● It is a unary operator.
● Eliminates only tuples/rows.
● σ condition (table_name)
● Predicate appears as a subscript to σ, the argument relation is in parentheses after the σ.
● Commutative in nature: σp1(σp2(r)) = σp2(σp1(r))
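Modeling a relation as a pair (attribute names, set of tuples), the select operation can be sketched in Python, and the commutativity property noted above can be checked directly (the representation and predicate style are illustrative):

```python
def select(predicate, relation):
    """σ_p(relation): keep only the tuples satisfying the predicate.
    The predicate receives each tuple as a dict {attribute: value}."""
    attrs, rows = relation
    keep = {r for r in rows if predicate(dict(zip(attrs, r)))}
    return attrs, keep
```

Because each selection only discards rows, applying two selections in either order yields the same relation.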
Q Write a RELATIONAL ALGEBRA query to find all account_no where balance is less than 1000?
Q Write a RELATIONAL ALGEBRA query to find branch name which is situated in Delhi and
having assets less than 1,00,000?
Q Write a RELATIONAL ALGEBRA query to find branch name and account_no which has
balance greater than equal to 1,000 but less than equal to 10,000?
Q Which of the following query transformations (i.e., replacing the l.h.s. expression by the
r.h.s. expression) is incorrect? R1 and R2 are relations. C1, C2 are selection conditions and A1,
A2 are attributes of R1. (Gate-1998) (2 Marks)
a) σ C1 (σ C2 R1)) → σ C2 (σ C1 (R1)) b) σ C1 (π A1 R1)) → πA1 (σ C1 (R1))
c) σ C1 (R1 ∪ R2) → σ C1 (R1) ∪ σ C1 (R2) d) πA1 (σ C1(R1)) → σ C1 (π A1 (R1))
Answer: (D)
D) if the selection condition is on attribute A2, then we cannot replace it by RHS as there will not be any
attribute A2 due to projection of A1 only.
The Union Operation
• It is a binary operation, denoted, as in set theory, by ∪.
• Written as, Expression1 ∪ Expression2, r ∪ s = {t | t ∈ r or t ∈ s}
• For a union operation r ∪ s to be valid, we require that two conditions hold:
o The relations r and s must be of the same arity; that is, they must have the same
number of attributes.
o The domains of the ith attribute of r and the ith attribute of s must be the same,
for all i.
• Mainly used to fetch data from different relations.
Q Write a RELATIONAL ALGEBRA query to find all the customer name who have a loan or an
account or both?
The Set-Difference Operation
● The set-difference operation, denoted by −, allows us to find tuples that are in one relation
but are not in another. It is a binary operator.
● The expression r − s produces a relation containing those tuples in r but not in s.
● For a set-difference operation r − s to be valid, we require that the relations r and s be of
the same arity, and that the domains of the ith attribute of r and the ith attribute of s be
the same, for all i.
● 0 <= |R − S| <= |R|
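Modeling relations as (attribute names, set of tuples), the union and set-difference operations above reduce to Python set operators once the arity requirement is checked (an illustrative sketch, not a real query engine):

```python
def union(r, s):
    """r ∪ s: valid only for relations of the same arity
    (with compatible domains, which this sketch does not check)."""
    assert len(r[0]) == len(s[0]), "union needs relations of equal arity"
    return r[0], r[1] | s[1]

def difference(r, s):
    """r − s: tuples in r but not in s; 0 <= |r − s| <= |r|."""
    assert len(r[0]) == len(s[0]), "difference needs equal arity"
    return r[0], r[1] - s[1]
```

With borrower and depositor name lists, `union` answers "loan or account or both" and `difference` answers "loan but no account".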
Q Write a RELATIONAL ALGEBRA query to find all the customer name who have a loan but do
not have an account?
The Cartesian-Product Operation
● The Cartesian-product operation, denoted by a cross (×), allows us to combine information
from any two relations.
● It is a binary operator; we write the Cartesian product of relations R1 and R2 as R1 × R2.
● Cartesian-product operation associates every tuple of R1 with every tuple of R2.
o R1 Χ R2 = {rs | r ∈ R1 and s ∈ R2}, contains one tuple <r, s> (concatenation of tuples r
and s) for each pair of tuples r ϵ R1, s ϵ R2.
● R1 Χ R2 returns a relational instance whose schema contains all the fields of R1 (in order as
they appear in R1) and all fields of R2 (in order as they appear in R2).
● If R1 has m tuples and R2 has n tuples the result will be having = m*n tuples.
● Same attribute name may appear in both R1 and R2, we need to devise a naming schema to
distinguish between these attributes.
R1 R2
A B B C
1 P Q X
2 Q R Y
3 R S Z
R1 × R2
A R1.B R2.B C
1 P Q X
1 P R Y
1 P S Z
2 Q Q X
2 Q R Y
2 Q S Z
3 R Q X
3 R R Y
3 R S Z
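The cross product, including the R1.B / R2.B renaming convention used in the table above, can be sketched in Python over relations modeled as (attribute names, set of tuples); the representation and default relation names are illustrative assumptions:

```python
def cartesian(r, s, rname="R1", sname="R2"):
    """r × s: pair every tuple of r with every tuple of s, giving
    |r| * |s| result tuples.  Attributes occurring in both schemas
    are prefixed with the relation name, e.g. R1.B and R2.B."""
    rattrs, rrows = r
    sattrs, srows = s
    common = set(rattrs) & set(sattrs)
    attrs = ([f"{rname}.{a}" if a in common else a for a in rattrs]
             + [f"{sname}.{a}" if a in common else a for a in sattrs])
    return attrs, {x + y for x in rrows for y in srows}
```

With the 3-tuple relations shown above, the result has 3 × 3 = 9 tuples, matching the table.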
Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along with
account balance, who have an account in the bank?
• To solve this query, note that customers who have an account appear in depositor, while
balances appear in account. So each tuple of the table account must be matched with each
tuple of depositor; they have a common attribute account_no, and if the values match the
combined tuple is valid, otherwise it is redundant and must be eliminated with a selection
condition.
Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along with
account balance, who have an account in the bank?
Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along with loan
amount, who have a loan in the bank?
Q Write a RELATIONAL ALGEBRA query to find all loan_no along with amount and
branch_name, which is situated in Delhi?
Q Write a RELATIONAL ALGEBRA query to find the name of the customer who have an account
in the branch situated in Delhi and balance greater than 1000?
The Rename Operation
● The results of relational algebra are also relations but without any name.
● The rename operation allows us to rename the output relation. It is denoted by the
lowercase Greek letter rho (ρ), where the result of expression E is saved under the name x.
● ρx(A1, A2, ..., An)(E)
Q Write a RELATIONAL ALGEBRA query to find the account_no along with balance with 8%
interest as total amount, with table name as balance sheet?
• even if an attribute name can be derived from the base relations, we may want to
change the attribute name in the result.
• One reason to rename a relation is to replace a long relation name with a shortened
version that is more convenient to use elsewhere in the query.
Q Write a RELATIONAL ALGEBRA query to find the loan_no with maximum loan amount?
• Another reason to rename a relation is a case where we wish to compare tuples in the
same relation. We then need to take the Cartesian product of a relation with itself and,
without renaming, it becomes impossible to distinguish one tuple from the other.
• Names (such as a and b) used to rename a relation are referred to as table aliases,
correlation variables, or tuple variables.
Additional Relational-Algebra Operations
• If we restrict ourselves to just the fundamental operations, certain common queries are
lengthy to express. Therefore, we use additional operations.
● These additional operations do not add any power to the algebra.
● They are used to simplify the queries.
The Set-Intersection Operation
• We will be using ∩ symbol to denote set intersection.
• r ∩ s = r − (r − s)
• Set intersection is not a fundamental operation and does not add any power to the
relational algebra.
• r ∩ s = {t | t ∈ r and t ∈ s}
• 0 <= |R ∩ S| <= min(|R|, |S|)
Q Write a RELATIONAL ALGEBRA query to find all the customer name who have both a loan
and an account?
The Natural-Join Operation
● The natural join is a binary operation that allows us to combine certain selections and a
Cartesian product into one operation.
● The natural join of r and s, denoted by r ⋈ s
● The natural-join operation forms a Cartesian product of its two arguments, performs a
selection forcing equality on those attributes that appear in both relation schemas and
finally removes duplicate attributes.
• The natural join of r and s is a relation on schema R ∪ S, formally defined as follows:
● r ⋈ s = Π R ∪ S (σ r.A1 = s.A1 ∧ … ∧ r.An = s.An (r × s)), where A1, …, An are the
attributes common to R and S.
R1 R2
A B B C
1 P Q X
2 Q R Y
3 R S Z
R1 ⋈ R2
A B C
2 Q X
3 R Y
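The natural join shown above (Cartesian product, equality on every common attribute, then the duplicate copy of each common attribute dropped) can be sketched in Python over relations modeled as (attribute names, set of tuples); the representation is an illustrative assumption:

```python
def natural_join(r, s):
    """r ⋈ s: keep the pairs of tuples that agree on all common
    attributes, emitting each common attribute only once."""
    rattrs, rrows = r
    sattrs, srows = s
    common = [a for a in rattrs if a in sattrs]
    attrs = rattrs + [a for a in sattrs if a not in common]
    out = set()
    for x in rrows:
        for y in srows:
            xd, yd = dict(zip(rattrs, x)), dict(zip(sattrs, y))
            if all(xd[a] == yd[a] for a in common):
                out.add(tuple(xd[a] for a in rattrs)
                        + tuple(yd[a] for a in sattrs if a not in common))
    return attrs, out
```

On the R1 and R2 tables above, only the tuples agreeing on B survive, reproducing the two-row result shown.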
Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along with
account balance, who have an account in the bank?
Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along with loan
amount, who have a loan in the bank?
Q Write a RELATIONAL ALGEBRA query to find all loan_no along with amount and
branch_name, which is situated in Delhi?
Q Write a RELATIONAL ALGEBRA query to find the name of the customer who have an account
in the branch situated in Delhi and balance greater than 1000?
Q Let r be a relation instance with schema R = (A, B, C, D). We define r1 = ΠA, B, C (r) and r2 = ΠA, D
(r). Let s = r1 * r2 where * denotes natural join. Given that the decomposition of r into r1 and r2
is lossy, which one of the following is TRUE? (Gate-2005) (1 Marks)
(A) s ⊂ r (B) r ∪ s (C) r ⊂ s (D) r * s = s
Q Let r and s be two relations over the relation schemes R and S respectively, and let A be an
attribute in R. The relational algebra expression σ A=a(r ⋈ s) is always equal to (Gate-2001) (1
Marks)
a) σ A=a(r) b) r c) σ A=a(r)⋈s d) None of the above
Answer: (C)
(C) is just a better form of the same query, more execution-friendly because it requires less
memory while joining; the query given in the question takes more time and memory while joining.
Q Consider the relations R(A, B) and S(B, C) and the following four relational algebra queries
over R and S :
I. Π A, B (R ⨝ S) II. R ⨝ ΠB(S)
III. R ∩ (ΠA(R) × ΠB(S)) IV. ΠA, R.B (R × S)
where R⋅B refers to the column B in table R. One can determine that: (NET-JULY-2016)
(1) I, III and IV are the same query. (2) II, III and IV are the same query.
(3) I, II and IV are the same query. (4) I, II and III are the same query
Ans. 4
Q Let R and S be two relations with the following schema (Gate-2008) (2 Marks)
R (P, Q, R1, R2, R3) S (P, Q, S1, S2)
where {P, Q}is the key for both schemas. Which of the following queries are equivalent?
i) ΠP (R ⋈ S) ii) ΠP (R) ⋈ ΠP (S)
iii) ΠP (ΠP, Q (R) ∩ ΠP, Q (S)) iv) ΠP (ΠP, Q (R) − (ΠP, Q (R)−ΠP, Q (S)))
a) Only I and II b) Only I and III
c) Only I, II and III d) Only I, III and IV
Ans: d
Q Consider the join of a relation R with a relation S. If R has m tuples and S has n tuples, then
the maximum and minimum sizes of the join respectively are: (Gate-1999) (1 Marks)
(A) m+n and 0 (B) mn and 0 (C) m+n and m-n (D) mn and m+n
Answer: (B)
Q The following functional dependencies hold for relations R(A, B, C) and S(B, D, E).
B→A
A→C
The relation R contains 200 tuples and the relation S contains 100 tuples. What is the
maximum number of tuples possible in the natural join R ⋈ S? (Gate-2010) (2 Marks)
a) 100 b) 200 c) 300 d) 2000
Ans: a
Q (NET-DEC-2015)
Theta join / Conditional Join
• The theta join/ Conditional join operation is a variant of the natural-join operation that allows
us to combine a selection and a Cartesian product into a single operation.
• The theta-join operation is defined as: r ⋈θ s = σθ(r × s)
Q Let R1 (A, B, C) and R2 (D, E) be two relation schema, where the primary keys are shown
underlined, and let C be a foreign key in R1 referring to R2. Suppose there is no violation of the
above referential integrity constraint in the corresponding relation instances r1 and r2. Which
one of the following relational algebra expressions would necessarily produce an empty
relation? (Gate-2004) (1 Marks)
a) ΠD(r2)−ΠC(r1) b) ΠC(r1)−ΠD(r2) c) ΠD(r1⋈C≠Dr2) d) ΠC(r1⋈C=Dr2)
Answer: (B)
C in R1 is a foreign key referring to the primary key D in R2. So, every element of C must come from
some D element.
How many tuples does the result of the following relational algebra expression contain?
Assume that the schema of A∪B is the same as that of A. (Gate-2007) (2 Marks)
(A∪B) ⋈ A.Id > 40 ∨ C.Id < 15 C
a) 7 b) 4 c) 5 d) 9
Q Suppose database table T1(P, R) currently has tuples {(10, 5), (15, 8), (25, 6)} and table T2(A,
C) currently has {(10, 6), (25, 3), (10, 5)}. Consider the following three relational algebra
queries RA1, RA2 and RA3: (NET-AUG-2016)
RA1: T1 ⨝ T1.P = T2.A T2 where ⨝ is natural join symbol
RA2: T1 =⨝ T1.P = T2.A T2 where =⨝ is left outer join symbol
RA3: T1 ⨝ T1.P = T2.A and T1.R = T2.C T2
The number of tuples in the resulting table of RA1, RA2 and RA3 are given by:
(1) 2, 4, 2 respectively (2) 2, 3, 2 respectively
(3) 3, 3, 1 respectively (4) 3, 4, 1 respectively
Ans: 4
Outer join Operations
● The outer-join operation is an extension of the join operation to deal with missing
information.
● The outer join operation works in a manner similar to the natural join operation, but
preserves those tuples that would be lost in a join by creating tuples in the result
containing null values.
● We can use the outer-join operation to avoid this loss of information.
● There are actually three forms of the operation: left outer join, denoted ⟕; right outer
join, denoted ⟖; and full outer join, denoted ⟗.
Left Outer Join
● The left outer join (⟕) takes all tuples in the left relation that did not match with any tuple
in the right relation, pads the tuples with null values for all other attributes from the right
relation, and adds them to the result of the natural join.
R1 R2
A B B C
1 P Q X
2 Q R Y
3 R S Z
R1 ⟕ R2
A B C
1 P NULL
2 Q X
3 R Y
Q Consider the relations r(A, B) and s(B, C), where s.B is a primary key and r.B is a
foreign key referencing s.B. Consider the query
Q: r ⋈ (σ B<5 (s))
Let LOJ denote the natural left outer-join operation. Assume that r and s contain
no null values. Which of the following is NOT equivalent to Q? (Gate-2018) (2
Marks)
a) σ B<5(r ⋈ s) b) σ B<5(r LOJ s)
c) r LOJ (σ B<5 (s)) d) σ B<5(r) LOJ s
Right Outer Join
● The right outer join (⟖) is symmetric with the left outer join: It pads tuples from the right
relation that did not match any from the left relation with nulls and adds them to the result
of the natural join.
R1 R2
A B B C
1 P Q X
2 Q R Y
3 R S Z
R1 ⟖R2
A B C
2 Q X
3 R Y
NULL S Z
Full Outer Join
● The full outer join (⟗) does both the left and right outer join operations, padding tuples
from the left relation that did not match any from the right relation, as well as tuples from
the right relation that did not match any from the left relation, and adding them to the
result of the join.
R1            R2
A   B         B   C
1   P         Q   X
2   Q         R   Y
3   R         S   Z

R1 ⟗ R2
A      B   C
1      P   NULL
2      Q   X
3      R   Y
NULL   S   Z
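A sketch of R1 ⟗ R2 in SQL via Python's sqlite3. Since older SQLite versions lack FULL OUTER JOIN, the sketch emulates it as a LEFT JOIN unioned with the unmatched right-side tuples, which itself mirrors the definition above:

```python
import sqlite3

# Same R1 and R2 as in the running example.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE R1 (A INTEGER, B TEXT);
CREATE TABLE R2 (B TEXT, C TEXT);
INSERT INTO R1 VALUES (1, 'P'), (2, 'Q'), (3, 'R');
INSERT INTO R2 VALUES ('Q', 'X'), ('R', 'Y'), ('S', 'Z');
""")

# Full outer join = left outer join UNION right tuples with no left match.
rows = con.execute("""
    SELECT R1.A, R1.B, R2.C
    FROM R1 LEFT JOIN R2 ON R1.B = R2.B
    UNION ALL
    SELECT NULL, R2.B, R2.C        -- right tuples padded with NULL for A
    FROM R2
    WHERE R2.B NOT IN (SELECT B FROM R1)
""").fetchall()
print(sorted(rows, key=repr))
```

The result contains the four tuples of the table above: (1, P, NULL), (2, Q, X), (3, R, Y) and (NULL, S, Z).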
Q Consider two relations R1(A, B) with the tuples (1, 5), (3, 7) and R2(A, C) with the tuples (1, 7), (4, 9).
Assume that R(A, B, C) is the full natural outer join of R1 and R2. Consider the following tuples
of the form (A, B, C)
a = (1, 5, null),
b = (1, null, 7),
c = (3, null, 9),
d = (4, 7, null),
e = (1, 5, 7),
f = (3, 7, null),
g = (4, null, 9).
Which one of the following statements is correct? (Gate-2015) (1 Marks)
(A) R contains a, b, e, f, g but not c, d (B) R contains a, b, c, d, e, f, g
(C) R contains e, f, g but not a, b (D) R contains e but not f, g
Answer: (C)
Q Consider the following 2 tables
Ans: 3
DIVISION
Q Consider the given tables R and S and find the number of tuples retrieved by the query
The following query is made on the database.
T1 ← π CourseName (σ StudentName='SA' (CR))
T2 ← CR ÷ T1
The number of rows in T2 is ______________ . (Gate-2017) (1 Marks)
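The division CR ÷ T1 asks for the students who are enrolled in every course of T1. A minimal SQL sketch (via Python's sqlite3) using the classic double-NOT-EXISTS encoding of division; the CR data below is made up for illustration and is not the table from the GATE question:

```python
import sqlite3

# Hypothetical CR(StudentName, CourseName) instance for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CR (StudentName TEXT, CourseName TEXT);
INSERT INTO CR VALUES
  ('SA','CA'), ('SA','CB'),
  ('SB','CA'),
  ('SC','CA'), ('SC','CB');
""")

# T1 = courses taken by 'SA'; CR ÷ T1 = students enrolled in ALL of them.
rows = con.execute("""
    SELECT DISTINCT s.StudentName
    FROM CR s
    WHERE NOT EXISTS (              -- there is no course of SA ...
        SELECT 1 FROM CR t1
        WHERE t1.StudentName = 'SA'
          AND NOT EXISTS (          -- ... that student s has not taken
              SELECT 1 FROM CR x
              WHERE x.StudentName = s.StudentName
                AND x.CourseName  = t1.CourseName)
    )
""").fetchall()
print(sorted(rows))   # [('SA',), ('SC',)] — SB misses course CB
```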
Q Information about a collection of students is given by the relation studInfo(studId, name,
sex). The relation enroll (studId, courseId) gives which student has enrolled for (or taken) that
course(s). Assume that every course is taken by at least one male and at least one female
student. What does the following relational algebra expression represent?(Gate-2007) (2
Marks)
π courseId((π studId(σ sex="female"(studInfo)) × π courseId(enroll)) − enroll)
(A) Courses in which all the female students are enrolled.
(B) Courses in which a proper subset of female students are enrolled.
(C) Courses in which only male students are enrolled.
(D) None of the above
Answer: (B)
Q Consider the relational schema given below, where eId of the relation dependent is a
foreign key referring to empId of the relation employee. Assume that every employee has at
least one associated dependent in the dependent relation. (Gate-2014) (2 Marks)
employee (empId, empName, empAge)
dependent(depId, eId, depName, depAge)
ΠempId(employee) − ΠempId(employee ⋈ (empId=eId)∧(empAge≤depAge) dependent)
The above query evaluates to the set of empIds of employees whose age is greater than that
of
(A) some dependent. (B) all dependents.
(C) some of his/her dependents (D) all of his/her dependents.
Answer: (D)
(D) all of his/her dependents. The inner query selects the employees whose age is less than or
equal to that of at least one of his/her dependents. Subtracting this set from the set of all
employees gives the employees whose age is greater than that of all of his/her dependents.
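The set-difference reading of this query can be checked with a small sqlite3 sketch (the sample rows are made up for illustration):

```python
import sqlite3

# E1 (age 50) is older than both dependents; E2 (age 30) is not older
# than dependent D3 (age 40), so only E1 should survive the difference.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee (empId INTEGER, empName TEXT, empAge INTEGER);
CREATE TABLE dependent (depId INTEGER, eId INTEGER, depName TEXT, depAge INTEGER);
INSERT INTO employee VALUES (1, 'E1', 50), (2, 'E2', 30);
INSERT INTO dependent VALUES (10, 1, 'D1', 20), (11, 1, 'D2', 25),
                             (12, 2, 'D3', 40);
""")

# ΠempId(employee) − ΠempId(employee ⋈ (empId=eId)∧(empAge≤depAge) dependent)
rows = con.execute("""
    SELECT empId FROM employee
    EXCEPT
    SELECT e.empId
    FROM employee e JOIN dependent d
      ON e.empId = d.eId AND e.empAge <= d.depAge
""").fetchall()
print(rows)   # [(1,)] — only E1 is older than ALL of his/her dependents
```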
Q Consider the relation Student (name, sex, marks), where the primary key is shown
underlined, pertaining to students in a class that has at least one boy and one girl. What does
the following relational algebra expression produce? (Note: ρ is the rename operator). (Gate-
2004) (2 Marks)
π name(σ sex=female(Student)) − π name(Student ⨝ (sex=female ∧ x=male ∧ marks≤m) ρ n,x,m(Student))
Ans: names of girl students with more marks than all the boy students
Q With respect to relational algebra, which of the following operations are taken directly
from mathematical set theory? (NET-JUNE-2019)
1) Join 2) Intersection 3) Cartesian Product 4) Project
a) 1 and 4 b) 2 and 3 c) 3 and 4 d) 2 and 4
Ans: b
Q Consider a selection of the form σ A≤100 (r), where r is a relation with 1000 tuples. Assume
that the attribute values for A among the tuples are uniformly distributed in the
interval [0,500]. Which one of the following options is the best estimate of the number of
tuples returned by the given selection query? (Gate-2007) (2 Marks)
a) 50 b) 100 c) 150 d) 200
Ans: d
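The estimate follows directly from the uniformity assumption: the fraction of the domain [0, 500] satisfying A ≤ 100 is 100/500 = 1/5, so about one fifth of the 1000 tuples qualify. A quick check:

```python
# Uniform-distribution selectivity estimate for σ A<=100 (r).
n_tuples = 1000
lo, hi = 0, 500
cutoff = 100

selectivity = (cutoff - lo) / (hi - lo)   # 100/500 = 0.2
estimate = n_tuples * selectivity
print(int(estimate))   # 200, i.e. option (d)
```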
Q Consider a relational table r with sufficient number of records, having attributes A1, A2…, An
and let 1 ≤ p ≤ n. Two queries Q1 and Q2 are given below. (Gate-2011) (2 Marks)
Q1: π A1,…,Ap (σ Ap=c (r)) where c is a constant
Q2: π A1,…,Ap (σ c1 ≤ Ap ≤ c2 (r)) where c1 and c2 are constants.
The database can be configured to do ordered indexing on Ap or hashing on Ap. Which of the
following statements is TRUE?
a) Ordered indexing will always outperform hashing for both queries
b) Hashing will always outperform ordered indexing for both queries
c) Hashing will outperform ordered indexing on Q1, but not on Q2
d) Hashing will outperform ordered indexing on Q2, but not on Q1
Ans: c
Q Consider the relation enrolled (student, course) in which (student, course) is the primary
key, and the relation paid (student, amount), where student is the primary key. Assume no
null values and no foreign keys or integrity constraints. Assume that amounts 6000, 7000,
8000, 9000 and 10000 were each paid by 20% of the students. Consider these query plans
(Plan 1 on left, Plan 2 on right) to “list all courses taken by students who have paid more than
x”. (Gate-2006) (2 Marks)
A disk seek takes 4ms, disk data transfer bandwidth is 300 MB/s and checking a tuple to see if
amount is greater than x takes 10 micro-seconds. Which of the following statements is
correct?
(A) Plan 1 and Plan 2 will not output identical row sets for all databases.
(B) A course may be listed more than once in the output of Plan 1 for some databases
(C) For x = 5000, Plan 1 executes faster than Plan 2 for all databases.
(D) For x = 9000, Plan 1 executes slower than Plan 2 for all databases.
Answer: (C)
Q Consider a join (relation algebra) between relations r(R)and s(S) using the nested loop
method. There are 3 buffers each of size equal to disk block size, out of which one buffer is
reserved for intermediate results. Assuming size(r(R)) < size(s(S)), the join will have fewer
number of disk block accesses if (Gate-2014) (2 Marks)
a) relation r(R) is in the outer loop.
b) relation s(S) is in the outer loop.
c) join selection factor between r(R) and s(S) is more than 0.5.
d) join selection factor between r(R) and s(S) is less than 0.5.
Ans: a
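The reasoning behind option (a) can be checked numerically. With one buffer for each relation (the third holds intermediate results), a block nested-loop join costs b_outer + b_outer × b_inner block accesses, so the smaller relation should drive the outer loop. A sketch with made-up block counts:

```python
# Block nested-loop join cost: one pass over the outer relation, plus a
# full scan of the inner relation for every outer block.
b_r, b_s = 10, 100   # hypothetical block counts, size(r(R)) < size(s(S))

cost_r_outer = b_r + b_r * b_s   # r(R) in the outer loop
cost_s_outer = b_s + b_s * b_r   # s(S) in the outer loop

print(cost_r_outer, cost_s_outer)   # 1010 1100
assert cost_r_outer < cost_s_outer  # smaller relation outside costs less
```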
Q Suppose the adjacency relation of vertices in a graph is represented in a table Adj (X, Y).
Which of the following queries cannot be expressed by a relational algebra expression of
constant length?(Gate-2001) (1 Marks)
(A) List of all vertices adjacent to a given vertex
(B) List all vertices which have self-loops
(C) List all vertices which belong to cycles of less than three vertices
(D) List all vertices reachable from a given vertex
Ans: d