DBMS Questions Merged
It is a collection of inter-related data organized in a well-structured manner which represents an aspect of the real world. The DB allows for retrieval, continuous modification and data search operations. Prior to the DB, a file system was used which stored permanent records, aka files. There were a few drawbacks with this system –
Data redundancy
Inconsistent data
Maladaptive data
Insecure data
Incorrect data
The DB resolved these issues. We have a few key terms related to DBMS –
ADV OF DB
DISADV OF DB
DB TYPES
Hierarchical
Network
Object – Oriented
Relational
HIERARCHICAL DB
The DB is arranged in levels where each parent can have multiple children but each child has a single parent.
NETWORK DB
It is the same as the hierarchical DB except that here a child can have multiple parents.
OBJECT – ORIENTED DB
The DB is arranged as classes and objects, and they are connected to each other via methods.
RELATIONAL DB
This is the most used DB. It stores information in tables of rows and columns, where every piece of info is related to other pieces of info. Every table has a primary key to uniquely identify its records.
DBMS is the software, aka group of programs, which handles the data in the DB. DBMS acts as an interface between the user and the OS. It takes requests from the user and gives them to the OS for servicing. There can be 3 types of users in a DBMS –
Application Programmers – These users write the programs to interact with the data
DB Admin – They manage the entire DBMS and DB environment
End users – They are the ones that send the requests to manipulate the data.
As we can see, DBMS has a lot of advantages. However, it is also very expensive to implement and at the same time can increase the complexity of the system.
TYPES OF DBMS ARCHITECTURE
Single tier architecture – In this case, the DB is available on the client system itself and thus there is no need for any network connection or service–request model to manipulate the data.
Two tier architecture – In this case, the client sends a request to the server and then receives a response. Maintenance and understanding are better here, but the system loses performance for a large number of users.
Three tier architecture – In this case, the client doesn’t directly interact with the server. Rather, it interacts with the application server, which in turn interacts with the DB to execute the query and send the response back. This is used when we have large web applications.
These insights have helped DB engineers divide the DBMS into three independent levels –
DATABASE ABSTRACTION
Abstraction is the process of hiding unnecessary data and only displaying the required data.
DATA INDEPENDENCE
It is the property where a change in one data level does not affect the other data levels. This can be of two types –
Physical – A change in the physical storage should not change the conceptual level or external view of the data.
Conceptual – A change in the conceptual schema should not affect the data at the external view level.
FEATURES OF DBMS
DATA LANGUAGES
Data Definition Language (DDL) – Deals with DB schemas and descriptions of how data resides in the database. For eg – CREATE, ALTER, DROP, TRUNCATE, COMMENT, RENAME.
Data Manipulation Language (DML) – Deals with data manipulation and change. For eg – SELECT, INSERT, DELETE, UPDATE, MERGE, CALL, LOCK TABLE, EXPLAIN PLAN
Data Control Language (DCL) – Deals with granting and revoking permissions using the GRANT and REVOKE queries.
Transaction Control Language (TCL) – Manages transactions using ROLLBACK, COMMIT and SAVEPOINT.
ENTITY RELATIONSHIP MODEL (ER MODEL)
This is a conceptual model that gives a graphical representation of the logical structure of the DB. It shows how the different entities are related (hence the name). Here are some standard definitions –
For example,
Here,
Student is the entity
ROLL_NO is a key attribute which can be used to uniquely identify the student data.
Address is a composite attribute which is formed by combining Street, City, State, Country.
Phone_No is a multivalued attribute since a student can have multiple phone numbers.
Age is a derived attribute which can be derived if we know the value of the DOB attribute.
RELATIONSHIP
It defines the relationship between 2 entities and is represented by a diamond shape. For example, if a student is enrolled in a course, we can represent it as –
The number of entities participating in the relationship is called the degree of the relationship. Additionally, the number of times an entity can participate in a relationship defines the cardinality of the relationship.
One-to-one – Marriage is a great example. For one groom, there is only one bride.
Many-to-one – A parent–child relationship. Multiple children can have the same single parent.
Many-to-many – Courses and students. Multiple students can enroll in a single course and a single student can enroll in multiple courses.
PARTICIPATION CONSTRAINTS
Total participation – All entities in the set must take part in the relationship
Partial participation – Not all entities are required to take part in the relationship
For example, let us take the relation of student and roll no. Each student must have a roll no, and each roll no is unique. Hence, it is a total participation with a one-to-one cardinality. On the other hand, when students enroll in a course, not all students are required to enroll. Hence, this is a partial participation relation, which is also referred to as Optional participation.
As mentioned previously, a weak entity set is one which doesn’t have a primary key. Instead, a weak entity set has a discriminator, which is a partial key that is formed by taking multiple attributes at a time. For example, let us assume that we have 2 sets – Building and Apartment. Building is a strong set as the building numbers are unique. On the other hand, multiple buildings can have the same apartment numbers, making Apartment a weak set. Here, Apartment no. is a discriminator and is represented by a dashed line.
Using the above symbols, we can represent the apartment and building relationship as follows –
NOTE
To form a relationship, we need a primary key. Hence, it is not possible to have a relationship between 2 weak sets.
RECURSIVE RELATIONSHIPS
It is a relationship defined between 2 entities of the same entity set. For example, let us take the example of an entity set of employees and a relationship called REPORTS_TO. Here,
Any employee can report to any other employee who is senior to them
The CEO doesn’t report to anyone
The junior-most employees don’t have anyone reporting to them
In this example, the relationship is recursive since we are defining it over the same set of Employee entities. At the same time, it is a partial participation as the CEO doesn’t report to anyone. In this relationship, suppose employee E1 reports to employee E2. Then, we can assign the role names Subordinate (E1) and Supervisor (E2).
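The REPORTS_TO relationship above can be sketched as a self-referencing table. This is a minimal illustration using SQLite; the table and column names (Employee, supervisor_id) are assumptions, not from the notes.

```python
import sqlite3

# Recursive relationship as a self-referencing table: supervisor_id
# points back at the same table's primary key. NULL marks the CEO.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        emp_id        INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        supervisor_id INTEGER REFERENCES Employee(emp_id)  -- NULL for the CEO
    )
""")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "CEO", None), (2, "Manager", 1), (3, "Junior", 2)],
)
# Self-join pairing each subordinate with their supervisor (role names
# Subordinate / Supervisor, as in the notes).
rows = conn.execute("""
    SELECT sub.name, sup.name
    FROM Employee AS sub JOIN Employee AS sup
      ON sub.supervisor_id = sup.emp_id
    ORDER BY sub.emp_id
""").fetchall()
print(rows)  # the CEO never appears as a subordinate (partial participation)
```

Note how the partial participation shows up naturally: the CEO's NULL supervisor_id simply never matches in the self-join.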
One of the most important applications of the ER model is converting it to tables. To do so, we need to follow a certain number of rules –
Rule 1
Any strong entity set with basic attributes can be represented using 1 table.
Rule 2
Suppose the strong entity set has a composite attribute. In such a case, we still need only 1 table, but instead of mentioning the composite attribute, we create columns based on the components of the composite attribute.
Rule 3
In the case of a strong entity set which has a multivalued attribute, we need to create a separate table for each multivalued attribute. Hence, if the number of multivalued attributes is n, then we would need a total of n + 1 tables to represent the ER model.
Rule 4
Just like the entity set, a relationship can also be converted to a table that includes the primary keys of the entity sets in the relationship.
In this case, the relationship ENROLLED_IN can be represented as –
Rule 5
This rule deals with binary relationships and their cardinality ratios. There can be 4 total cases here –
Many-To-Many
A(a1, a2)
B(b1, b2)
In this case, we need 1 table for A, 1 for B and 1 for R(a1, b1). The total number of tables is 3.
Many-to-One
A(a1, a2)
B(b1, b2)
R(a1, b1)
In this case, many entities of A are mapped to a single entity of B. For such cases, we can combine A & R into a single table AR(a1, a2, b1), keeping B(b1, b2) as is. Thus, we need 2 tables.
One-to-Many
Like the previous case, except here we combine B & R into a single table BR(b1, b2, a1), keeping A(a1, a2), thus causing us to need 2 tables.
One-to-One
If both sides have partial participation, then we can combine either A & R or B & R
If only one side has partial participation, then we can combine A, B and R as 1 table
If both sides have total participation, then also we can combine A, B and R as 1 table
Question
Answer
From the ER model, we can see that the first relationship is many-to-one and the second relationship is also many-to-one. Therefore, we need –
T1(a1, a2, a3, b1)
T2(b1, b2, c1)
T3(c1, c2)
GENERALIZATION
This is a bottom-up approach where we take multiple entities which have something in common and generalize them into a higher-level entity. For example, the ER model is shown below –
In this case, the entities FACULTY and STUDENT are generalized into a higher-level entity called PERSON. Here, P_NAME and P_ADD are the common attributes since both Faculties and Students have a name and an address. However, S_FEE is a specialized attribute since only a Student pays fees.
SPECIALIZATION
In this case, a higher-level entity is specialized into multiple lower-level entities. This is a top-down approach. For example,
Here, the EMPLOYEE entity is specialized into either a TESTER or a DEVELOPER. Salary and Name are common attributes since they are present for both TESTER and DEVELOPER. However, TEST_TYPE is only for TESTER and hence is a specialized attribute.
AGGREGATION
Using regular ER models, we can define relationships between one entity and another entity. However, there can be cases where we need to define a relationship between an entity and a relationship. In this case, we use aggregation, where we combine a relationship between entities into a higher-level entity, which can then be used to form further relationships. For example,
In the above case, we are combining the WORKS_FOR relationship into a higher-level entity. It is then used to form the REQUIRE relationship. Hence, we can now form a relationship such as an Employee working on a project that requires certain machinery.
RELATIONAL MODEL
It is a way of defining how the data will be stored in an RDBMS. The data in an RDBMS is stored in the form of tables, as rows and columns.
Domain Constraints – Ensures that the value entered in a cell is part of the domain, aka the allowed values. For eg, a field called AGE can only house positive integers.
Entity Integrity constraints – This states that the primary key can’t be NULL. However, other values can be NULL.
Referential Integrity constraints – This is present between tables. If the foreign key of table A refers to the primary key of table B, then the foreign key values need to be either present in table B or NULL. Basically, the foreign key values of A must be a subset of the primary key values of B.
Key Constraints – The primary key must be unique
PREVENTION OF ANOMALIES
An anomaly is basically an irregularity. Usually, these occur in 3 processes – Insert, Delete and Update. The Referential Integrity constraints can be used to prevent these anomalies.
Post the insertion, the foreign key is no longer a subset of the primary key. This is called the Insert anomaly in the referencing relation. This causes a Referential Integrity violation. To resolve this, one of the methods used is to deny the insert request if it will cause a Referential Integrity violation.
Now, let’s say that we delete the row with ROLL_NO = 2 in Table A (referenced relation). In that case, the foreign key will no longer be a subset of the primary key and hence this is a Referential Integrity violation. This is called the Deletion anomaly in the referenced relation. To resolve this, we can employ 3 methods –
If the value is deleted in Table A (referenced relation), then delete the tuple corresponding to the same foreign key in Table B as well (referencing relation). In our example, if we have deleted ROLL_NO = 2, then we need to delete the tuple with EMP_ID = 20 from Table B as well. This is called On Delete Cascade.
The second solution is to outright deny the delete requests that cause a referential integrity violation.
Finally, we can also simply set the value of the foreign key to NULL. In our example, if we delete ROLL_NO = 2 in the referenced relation, then we set the foreign key of EMP_ID = 20 (ROLL_NO = 2) to NULL.
Now, the foreign keys are no longer a subset of the primary keys and hence there is a Referential Integrity violation. This is called the Update anomaly in the referenced relation. To solve this, we again have 3 solutions –
If the primary key in the referenced relation is updated, update the corresponding foreign key as well in the referencing relation. This is called On Update Cascade.
Deny the update requests that cause the anomaly and integrity violations
If the primary key is updated, then set the previous foreign key value to NULL
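The On Delete Cascade behaviour described above can be sketched in SQLite (via Python's sqlite3 module). The table names mirror the notes' Table A / Table B example, but the exact schemas are assumptions.

```python
import sqlite3

# Referential integrity with ON DELETE CASCADE: deleting a referenced
# primary key automatically deletes the referencing rows.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only with this on
conn.execute("CREATE TABLE A (ROLL_NO INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE B (
        EMP_ID  INTEGER PRIMARY KEY,
        ROLL_NO INTEGER REFERENCES A(ROLL_NO) ON DELETE CASCADE
    )
""")
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO B VALUES (?, ?)", [(10, 1), (20, 2)])

conn.execute("DELETE FROM A WHERE ROLL_NO = 2")  # cascades: EMP_ID = 20 goes too
remaining = conn.execute("SELECT EMP_ID FROM B").fetchall()
print(remaining)
```

Swapping `ON DELETE CASCADE` for `ON DELETE SET NULL` (or omitting it, which makes the delete fail) gives the other two resolution strategies listed above.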
TYPES OF KEYS
Super Key – It is a set of attributes that can be used to uniquely identify each tuple.
Candidate Key – A minimal super key is called a candidate key. Candidate keys have to be unique and NOT NULL. The attributes used in candidate keys are called prime attributes. A relation can have multiple candidate keys.
Primary Key – The primary key is the candidate key that the designer uses to uniquely identify the tuples in a relation. There can only be one primary key in a relation. Primary Key ⊆ Candidate Keys ⊆ Super Keys
Alternate Key – The candidate keys that are not the primary key are called alternate keys.
Foreign Key – A key is said to be a foreign key if it is present in the referencing relation and refers to a key attribute in the referenced relation. A foreign key can be NULL and need not be unique. Its values are a subset of the referenced primary key’s values (Foreign Key ⊆ Primary Key).
Composite Key – It is a key made from multiple attributes. It is also referred to as a compound key.
Partial Key – It is a key which can’t be used to uniquely identify the tuples
Secondary Key – It is used for indexing and improving the speed of data access.
Surrogate Key – It is a key that is unique for all tuples, can be updated and can’t be NULL.
Unique Key – It is a key that is unique for all tuples and can’t be updated. However, it can be NULL.
Question
Answer
(Shankar, 19)
(Shankar, X)
Question
Answer
Option B
Question
Answer
A candidate key is one which can derive every other attribute. In this question, the given derivations (together with the fact that any attribute set can derive itself) show that EC derives all the attributes. Hence, EC is the candidate key. Now, to get the number of super keys, we need to form combinations of EC with the other attributes. Thus,
EC ____
We have a slot for each of the other attributes, and each of these attributes can either participate or not. Hence,
Number of super keys = 2^(number of remaining attributes)
NOTE
In general, for a relation with n attributes and a single candidate key of k attributes, we have –
Number of super keys = 2^(n − k)
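The counting argument above can be checked by brute force. This is a small sketch assuming a hypothetical relation with 6 attributes A–F and the single candidate key {E, C} from the worked example: every superset of the candidate key is a super key, so there should be 2^(6 − 2) = 16 of them.

```python
from itertools import combinations

# Enumerate every attribute subset and keep those that contain the
# candidate key; these are exactly the super keys.
attributes = {"A", "B", "C", "D", "E", "F"}
candidate_key = {"E", "C"}

super_keys = [
    set(combo)
    for r in range(len(attributes) + 1)
    for combo in combinations(sorted(attributes), r)
    if candidate_key <= set(combo)
]
print(len(super_keys))  # 2 ** (6 - 2) = 16
```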
NORMALIZATION
This is the process of organizing the data in a DB so as to minimize redundancy between the relations. It also helps in resolving/eliminating the insert, update and delete anomalies.
First Normal Form (1NF)
In this form, no cell has composite values. There are only atomic values. No two rows are identical and every row has unique values.
Second Normal Form (2NF)
In this case, first the table has to be in 1NF. In addition to that, there should be no partial dependency. This means that no non-prime attribute (NPA) should depend on a partial candidate key. For example, let’s take a relation such as –
R(A, B, C, D, E, F)
AB → CDEF
In this case, the prime attributes are A, B and the NPAs are C, D, E, F. We can see that all NPAs are dependent on the entire candidate key. Hence, there is no partial dependency. Suppose we now add another dependency –
B → C
In this case, the NPA C is dependent on B, which is a partial candidate key (aka part of the candidate key). Hence, there is a partial dependency and the table will NOT be in 2NF.
Third Normal Form (3NF)
In this case, the table first has to be in 2NF. In addition to that, it must not have any transitive dependency. This means that there shouldn’t be any NPA deriving another NPA. For example,
R(A, B, C, D, E, F)
AB → CDEF
This relation is in 2NF. Here, there is no NPA deriving another NPA. Hence, this table is in 3NF. Suppose we add the following dependency –
C → D
In this case, C, which is an NPA, is deriving D, which is also an NPA. Hence, this relation is not in 3NF.
Boyce–Codd Normal Form (BCNF)
In this case, the relation must first be in 3NF. In addition to that, for every derivation X → Y, X must be a super key. For example,
R(A, B, C, D, E, F)
BC → ADEF
In this case, BC is the candidate key and hence every super key must contain BC. In the above relation, the relation is in BCNF as BC is a super key. However, let’s say we have the following derivation –
D → E
In the above case, D does not contain BC and hence is not a super key. Therefore, the relation is no longer in BCNF.
Fourth Normal Form (4NF)
In this case, the relation must already be in BCNF. In addition to that, there must not be any multivalued dependency. For a dependency A ↠ B, if for a single value of A, multiple values of B exist, then it is termed a multivalued dependency. For example –
STU_ID ↠ COURSE
STU_ID ↠ PHONE_NO
Fifth Normal Form (5NF)
The relation must be in 4NF and must not have any join dependency. A join dependency means that the relation can be decomposed into lossless sub-relations. For a relation to be in 5NF, any further decomposition must be lossy. The difference between lossy and lossless decomposition is explained later on. This is also called Project-Join Normal Form (PJ/NF).
Question
→
→
Find the normal form of this relation
Answer
By default, we first assume that the relation is in 1NF. Now, we can see that –
={ }→{ , , }
Hence, the remaining attribute is an NPA. Now, from the question we can see that this NPA is derived from / is dependent on a partial candidate key. Thus, there is a partial dependency and hence the relation is NOT in 2NF. As it is not in 2NF, it can’t be in any higher form either. Hence, the relation is in 1NF.
Question
Answer
Question
Answer
={ }→{ , , , , , , , }
Thus, the remaining attributes are NPAs. We can see that these NPAs are dependent on a partial candidate key. Therefore, there is a partial dependency and the relation is NOT in 2NF. Hence, Option (a) is correct.
NOTE
In relations where all attributes are prime attributes, the relation will always be in 3NF but needn’t be in BCNF. If a relation is binary, aka has only 2 attributes, then it is in BCNF. BCNF has no redundancy.
Trivial derivations are those derivations X → Y where Y ⊆ X. In case all derivations in a relation are trivial, the relation is definitely in BCNF.
2NF is the normal form that is based on the concept of “Full Functional Dependency”.
DECOMPOSITION
When we have a relation that is not in the required normal form, we decompose the relation into sub-relations. While decomposing into sub-relations, we need to ensure –
No data is lost
No dependency is changed
For example, let us assume we have a relation R that is decomposed into R1 and R2. Then, if R1 ⋈ R2 ⊃ R (the natural join of R1 and R2 is a proper superset of R), the decomposition is lossy. On the other hand, if R1 ⋈ R2 = R, then the decomposition is lossless.
If a relation decomposition is lossless, then it needs to satisfy the following conditions –
The union of the attributes of the sub-relations must be equal to the attributes of the original relation.
Att(R1) ∪ Att(R2) = Att(R)
There must be some common attributes between the sub-relations.
Att(R1) ∩ Att(R2) ≠ ∅
At least one of the common attributes must be a candidate key for either of the sub-relations.
Now, suppose we decompose a relation R(A, B, C, D) with the FD C → D into 2 sub-relations as follows –
R1(A, B, C) and R2(C, D)
We can now check the lossless conditions –
Att(R1) ∪ Att(R2) = {A, B, C, D} = Att(R)
Att(R1) ∩ Att(R2) = {C} ≠ ∅
From the derivation C → D, we can see that C is the candidate key of R2. Since C is also the common attribute between the sub-relations, this decomposition is lossless.
One thing to note here is that since the derivation C → D lies entirely within one sub-relation, the derivation is preserved and hence the decomposition is also called dependency preserving.
Question
Answer
1( , ) →
1( , ) →
An FD is a dependency X → Y such that for a given value of X, the value of Y remains the same across all instances.
Question
Answer
Question
Answer
( ) ={ , , , , }
( ) ={ , }
( ) ={ , , , , }
( ) ={ , , , , }
→
→
→
INFERENCE RULES
These are rules based on Armstrong’s axioms and are used to derive additional FDs from a given set of FDs. There are 7 inference rules –
Reflexive Rule
Augmentation Rule
Transitive Rule
Union Rule
Decomposition Rule
Pseudo-Transitive Rule
Composition Rule
The Decomposition rule is the reverse of the Union rule. So, if X → YZ, then we can derive X → Y and X → Z.
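These inference rules can be verified mechanically through attribute closures. Below is a small sketch with a hypothetical FD set (A → B, A → C), checking the Union rule and then the Transitive rule.

```python
# Attribute closure: repeatedly apply every FD whose LHS is already derivable.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"A"}, {"C"})]
# Union rule: A -> B and A -> C together imply A -> BC.
union_holds = {"B", "C"} <= closure({"A"}, fds)
# Transitive rule: adding B -> D lets A reach D as well.
fds.append(({"B"}, {"D"}))
transitive_holds = "D" in closure({"A"}, fds)
print(union_holds, transitive_holds)
```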
CANONICAL COVER
This is a simplified, reduced version of a set of FDs. It is an irreducible set of FDs. For a given FD set, the canonical cover must be equivalent to the original set (no dependency is lost), and at the same time it need not be unique. To get the canonical cover, we need to perform the following steps –
First, write the FDs such that the RHS has a single element.
Now, for every FD, find the closure of its LHS.
Next, ignore the FD and recompute the closure. If the closure remains the same, then that FD is redundant and can be discarded.
For example, let us consider the relation R(A, B, C, D) with the following FDs –
→
→
→
As per Step 1, we decompose the FDs such that there is only 1 element in the RHS –
→ (1)
→ (2)
→ (3)
→ (4)
→ (5)
→ (6)
Now, we take the FDs one by one and check the closure.
FD (1)
( ) ={ , }
FD (2)
( ) ={ , , , }
We continue the same for all the FDs and finally, we get –
→
→
→
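Step 3 of the procedure (dropping redundant FDs) can be sketched as follows. The FD set here (A → B, B → C, A → C) is hypothetical; A → C is redundant because it already follows from the other two by transitivity.

```python
# Attribute closure under a set of FDs, each given as (lhs_set, rhs_set).
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# An FD is redundant if its RHS is still derivable from its LHS when the
# FD itself is ignored; remove such FDs one by one.
def remove_redundant(fds):
    cover = list(fds)
    for fd in list(cover):
        rest = [f for f in cover if f is not fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover

fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
minimal = remove_redundant(fds)
print(minimal)  # A -> C dropped: it follows from A -> B and B -> C
```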
SQL
It is a language used for manipulating and accessing DBs. SQL is used in RDBMS, where data is stored in the form of tables and records. Each column is called a field. There are some important SQL commands –
SQL Select Statement
Used to select data from a table. The general syntax is as follows –
In case we want to select all the data from the table, we can use –
To ensure that we get only unique values from the table, we can use the DISTINCT keyword –
SQL Where Statement
The ORDER BY clause is used to arrange the records in either ascending or descending order.
In this case, first the records will be arranged as per column1. If certain rows have the same value in column1, then the column2 value is checked to determine their order.
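The SELECT, DISTINCT and ORDER BY forms described above can be sketched with SQLite (via Python's sqlite3 module). The Students table and its data are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT, city TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("Asha", "Pune", 90), ("Ravi", "Pune", 75), ("Meena", "Delhi", 90)],
)

chosen = conn.execute("SELECT name, marks FROM Students").fetchall()       # named columns
all_rows = conn.execute("SELECT * FROM Students").fetchall()               # every column
distinct_cities = conn.execute("SELECT DISTINCT city FROM Students").fetchall()
# ORDER BY: marks descending, ties broken by name ascending.
ordered = conn.execute(
    "SELECT name FROM Students ORDER BY marks DESC, name ASC"
).fetchall()
print(distinct_cities, ordered)
```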
SQL Insert Statement
It is used to insert records into the table. The data can be inserted either by stating the columns or by just providing the values.
SQL Update Statement
Suppose we remove the WHERE condition; then all the rows under that field will be updated.
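Both INSERT forms and the effect of dropping the WHERE clause can be sketched as follows (SQLite; the Emp table is a hypothetical example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (id INTEGER, name TEXT, salary INTEGER)")

conn.execute("INSERT INTO Emp (id, name, salary) VALUES (1, 'Asha', 100)")  # columns stated
conn.execute("INSERT INTO Emp VALUES (2, 'Ravi', 200)")                     # values only

conn.execute("UPDATE Emp SET salary = 150 WHERE id = 1")  # updates one row
conn.execute("UPDATE Emp SET salary = 0")                 # no WHERE: every row updated
salaries = conn.execute("SELECT salary FROM Emp").fetchall()
print(salaries)
```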
This statement is used to specify how many rows from the top, among those that satisfy the conditions, need to be returned.
AGGREGATE FUNCTIONS
These are used to perform calculations on multiple rows of a single field. These functions include –
COUNT – Used to get the number of rows. It includes duplicate data and can be applied to either numeric or non-numeric data.
SUM – Used to get the sum of the row data. It works only for numeric data.
MIN – Gives the minimum value
MAX – Gives the maximum value
AVG – Gives the average of the rows. It is basically SUM/COUNT
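All five aggregate functions can be demonstrated in one query (SQLite; the Scores table is a hypothetical example). Note how COUNT includes the duplicate mark of 20 and AVG equals SUM/COUNT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (student TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?)",
    [("A", 10), ("B", 20), ("C", 30), ("D", 20)],
)

row = conn.execute(
    "SELECT COUNT(*), SUM(marks), MIN(marks), MAX(marks), AVG(marks) FROM Scores"
).fetchone()
print(row)  # AVG = SUM / COUNT = 80 / 4
```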
In the LIKE operator, the '%' wildcard matches 0 or more characters. The '_' wildcard matches exactly one character.
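The two wildcards can be sketched as follows (SQLite; names are hypothetical): 'Ra%' matches anything starting with "Ra", while 'R__' matches exactly three characters starting with "R".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (name TEXT)")
conn.executemany("INSERT INTO T VALUES (?)", [("Ram",), ("Raman",), ("Rohan",)])

starts_ra = conn.execute("SELECT name FROM T WHERE name LIKE 'Ra%'").fetchall()
three_chars = conn.execute("SELECT name FROM T WHERE name LIKE 'R__'").fetchall()
print(starts_ra, three_chars)
```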
SQL RELATIONAL OPERATORS
Question
Answer
SQL Joins
These statements are used to combine data from 2 or more tables. They can be of 4 types –
INNER – This join will select the records present in both tables as long as the conditions are satisfied
FULL – This will return the complete left and right table values as long as the conditions are satisfied. If there are no matching values, it will return NULL.
LEFT – This will return all the records of the left table and the corresponding matching values from the right table. If there are no matching values, then it will return NULL
RIGHT – This will return all the records of the right table and the corresponding matched values from the left table. If there are no matched values, then it will return NULL
UNION – This is used to combine the output of 2 SELECT statements. It also removes duplicate items.
UNION ALL – It is the same as UNION except it doesn’t remove the duplicate items.
INTERSECT – This returns the intersection of the outputs of the 2 SELECT statements. It removes duplicate items.
MINUS – This returns the rows present in the first query but absent in the second.
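A few of these can be sketched in SQLite (which portably supports INNER and LEFT joins, plus UNION / UNION ALL; RIGHT and FULL joins require newer SQLite versions, and MINUS is spelled EXCEPT there). The S and M tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (roll INTEGER, name TEXT)")
conn.execute("CREATE TABLE M (roll INTEGER, marks INTEGER)")
conn.executemany("INSERT INTO S VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO M VALUES (?, ?)", [(1, 90)])

inner = conn.execute(
    "SELECT S.name, M.marks FROM S JOIN M ON S.roll = M.roll").fetchall()
left = conn.execute(
    "SELECT S.name, M.marks FROM S LEFT JOIN M ON S.roll = M.roll").fetchall()
print(inner)  # only the matching row
print(left)   # unmatched left rows padded with NULL (None in Python)

union = conn.execute("SELECT roll FROM S UNION SELECT roll FROM M").fetchall()
union_all = conn.execute("SELECT roll FROM S UNION ALL SELECT roll FROM M").fetchall()
print(len(union), len(union_all))  # UNION removes the duplicate roll 1
```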
RELATIONAL ALGEBRA
It is a conceptual, procedural query language which takes a relation as input and produces a relation as output.
Projection (π)
Projects the required column data from a relation. By default, it removes duplicate data.
Projection is NOT COMMUTATIVE. Hence, in general, π_A(π_B(R)) ≠ π_B(π_A(R)).
Normally, the projected table’s cardinality will be less than that of the original table. However, in case there are no duplicate tuples, the cardinality will be the same. For a relation of degree n, we can have a total of 2^n − 1 possible projections.
Selection (σ)
Selection is used to select the required tuples of a relation as per a condition. However, selection doesn’t display the data; for that, we need to use a projection.
The degree of the input and selected relations will be the same. The minimum number of tuples selected will be 0 and the maximum will be the number of tuples in R. Hence, the number of selected tuples lies in [0, |R|].
Union (∪)
This is the same as the set-theory union, except that the two tables being passed through Union must have the same set of attributes. Also, it removes duplicate elements. It is both commutative and associative.
Intersection (∩)
This is the same as the set-theory intersection, except that the two tables being passed through Intersection must have the same set of attributes. Also, it removes duplicates. It is both commutative and associative.
Difference (−)
This is the same as the set-theory difference, except that the two tables being passed through Difference must have the same set of attributes. Also, it removes duplicates. It is neither commutative nor associative.
Conditional Join (⋈)
Conditional Join is used when you want to join two or more relations based on some condition. For example,
The command will join the Student and Employee tables and choose the tuples that have a ROLL_NO greater than EMP_NO.
One thing to note here is that Join is an Extended/Derived operator. This means that it can be derived using the basic operators as well. This can be done as follows –
R ⋈_θ S = σ_θ(R × S)
where R × S is the Cartesian product which maps each row of R with every row of S.
Division Operator (÷)
For the operation R ÷ S, the result will have the attributes that are in R but not in S, aka Att(R) − Att(S). This only works when Att(S) ⊆ Att(R). For example,
Suppose we want to get the students that are learning BOTH Machine Learning and Data Mining. Then, we can divide the projection of (Student, Course) pairs by the projection of those two Course values. This is possible since the divisor’s attributes are a subset of the dividend’s attributes.
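The division operator can be sketched on plain Python sets of tuples: R ÷ S keeps the students that are paired with EVERY course of S. The enrollment data below is a hypothetical example.

```python
def divide(r, s):
    """R(student, course) ÷ S(course): students enrolled in every course of S."""
    students = {student for student, _ in r}
    return {st for st in students if all((st, c) in r for (c,) in s)}

enrolled = {("Asha", "ML"), ("Asha", "DM"), ("Ravi", "ML")}
courses = {("ML",), ("DM",)}
result = divide(enrolled, courses)
print(result)  # only Asha takes both courses
```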
Rename Operator (ρ)
It is used to rename the output relation or its attributes.
TYPES OF JOINS
Question
Answer
R ⋈_{R.A = S.B} S = σ_{R.A = S.B}(R × S)
There can be a case where every value of R.A has a mapping in S.B. Hence, the maximum number of tuples will be 100. On the other hand, it can also be possible that none of the R.A values map to S.B values. Thus, the minimum number of tuples will be 0.
NOTE
In the Selection operator, the AND condition is represented by ∧ and the OR condition is represented by ∨. For example, if we want to select tuples in relation R such that they satisfy conditionA AND (conditionB OR conditionC), then we can write –
σ_{conditionA ∧ (conditionB ∨ conditionC)}(R)
2 relations are said to be Union Compatible if they satisfy the following 3 conditions –
They have the same number of attributes
The corresponding attributes have compatible domains
The attributes appear in the same order
We can use the Union, Intersection and Difference operators only between Union Compatible relations.
TRANSACTIONS
It is a set of logically related operations. There are mainly two operations in a transaction – Read and Write.
There can be cases where the transaction starts its operations but, due to some error, stops mid-way and hence causes data inconsistency. To handle such situations, we have 2 important operations – Commit and Rollback.
Active – When the transaction is being executed. All changes are stored in the buffer.
Partially committed – The transaction enters this state when the last instruction has executed and the changes are still in the buffer.
Committed – When the changes from the buffer are stored to the DB via the commit operation, it enters the Committed state.
Failed – When the transaction encounters a failure in either the active or partially committed state, it enters the failed state.
Aborted – After a failure, we use the Rollback operation to undo the changes and the transaction enters the Aborted state.
Terminated – This is where the transaction life cycle ends. It can be entered via the committed or aborted states.
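The commit and rollback operations from this life cycle can be sketched with SQLite; the Account table and the simulated failure are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id INTEGER, balance INTEGER)")
conn.execute("INSERT INTO Account VALUES (1, 100)")
conn.commit()  # transaction reaches the Committed state

try:
    # Active state: this change lives only in the pending transaction.
    conn.execute("UPDATE Account SET balance = balance - 50 WHERE id = 1")
    raise RuntimeError("simulated failure before commit")  # Failed state
except RuntimeError:
    conn.rollback()  # Aborted state: the buffered change is undone

final = conn.execute("SELECT balance FROM Account").fetchone()
print(final)  # the balance is back to its committed value
```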
The DBMS is responsible for monitoring and performing the transaction operations. The operations in the transaction cycle are as follows –
ACID PROPERTIES
These are the properties that need to be followed by transactions to ensure DB consistency. These properties are –
Atomicity – This states that the transaction must occur completely or not be executed at all. Instructions are not allowed to occur partially. Atomicity is ensured by the Transaction Control Manager.
Consistency – This property ensures that the integrity constraints of the data are maintained. It is the responsibility of the DBMS and the app programmer.
Isolation – This property ensures that multiple transactions can occur simultaneously without causing any inconsistency. Transactions are executed in parallel without any interaction between them. This also means that changes made by a transaction will only be visible to the other transactions once it commits the changes to the DB. It is the responsibility of the Concurrency Control Manager.
Durability – This property ensures that once the transaction is completed, the data stored on the disk persists even if a system failure occurs. This is the responsibility of the Recovery Manager.
Question
Answer
CONCURRENCY PROBLEMS
When multiple transactions execute concurrently in an uncontrolled manner, there can be several problems –
Dirty Read
It is the situation where a transaction reads data which has been written by an uncommitted transaction. This is a problem because there is still a chance that the uncommitted transaction might roll back later, and this will lead to data inconsistency. However, the inconsistency only occurs when the uncommitted transaction actually performs a rollback.
For example,
In this case, uncommitted transaction T1 has modified the value of A but has not yet committed it. Before the commit, transaction T2 reads the value of A. Later, transaction T1 fails and performs a rollback. This is a classic dirty read for transaction T2.
Unrepeatable Read
This happens when a transaction gets different values when repeatedly reading the same variable. For example,
Here, transaction T2 has not updated the value of X, but between the 2 reads, transaction T1 has updated the value. Thus, T2 will read 2 different values despite not having updated X itself.
Lost Update
This happens when multiple transactions update the same variable one after the other. For example,
In this case, first transaction T1 will update the value of A from 10 to 15 (say). After that, transaction T2 will update A to 25, overwriting T1’s value before T1’s change is ever seen. Thus, the final value is 25 and T1’s update has been lost.
Thus, we can see that this problem occurs when there is a write–write conflict.
Phantom Read
This is a variation of the Unrepeatable Read problem. It occurs when a transaction reads a variable once, but the next time it reads, the variable has been deleted or no longer exists. For example,
In this case, T2 first reads the value of X. However, the next time it tries to read X, the variable no longer exists as T1 has deleted it.
SCHEDULE
A transaction is a collection of operations. A collection of transactions is known as a schedule. It can be of 2 types –
Serial Schedule – In this case, a transaction completes its entire execution and only after that can the next transaction be executed. This has low throughput and resource utilization.
Concurrent Schedule – There can be interleaving of transactions, and a transaction can be interrupted by another transaction. This is also called a non-serial schedule.
In a non-serial schedule, the operations of 2 transactions can either interfere with each other or not. When they do not interfere, it is called a serializable schedule. A serializable schedule maintains consistency in the DB and behaves like a serial schedule.
RECOVERABLE vs IRRECOVERABLE SCHEDULES
Let’s assume that a transac on reads a value of variable from transac on and the read is a dirty
read. Now, if commits the value of before , then the schedule is called an Irrecoverable
schedule. It is called irrecoverable because when commits, there is no rever ng back and we can’t
rollback to recover the false changes. For example –
On the other hand, if there is no dirty read of even if there is a dirty read but commits before ,
then we can perform a rollback and recover the faulty changes. Hence, these schedules are termed as
recoverable schedules. For example –
TYPES OF RECOVERABLE SCHEDULES
Cascading Schedule
Cascadeless Schedule
Strict Schedule
Cascading Schedule
In the example above, we can see that a failure in T1 has occurred. As a result, T2 has to rollback. Since
T2 has to rollback, even T3 has to rollback. Finally, due to T3 rolling back, even T4 has to perform a
rollback. Therefore, there is a cascading rollback and this is called a Cascading schedule.
In case T2, T3 and T4 had committed before T1 failed, then the schedule would have been
irrecoverable.
Cascadeless Schedule
If a schedule allows only committed reads, i.e. a transaction Tj can read a value only after the
transaction Ti that wrote it commits, then the schedule is called a Cascadeless Schedule. Since only
committed reads are allowed, there is no case where Ti will fail and cause a rollback of Tj. Hence,
cascading rollback is prevented.
One important thing to note is that Cascadeless schedules allow only committed reads but can allow
uncommitted write operations.
In the above example, T2 writes before T1 can commit. This is allowed by a cascadeless schedule.
Strict Schedule
This is a sub-class of Cascadeless schedules which allows only committed writes. Hence, there can
be only committed reads and writes.
NOTE
Strict schedules apply the maximum number of restrictions. Every strict schedule is a cascadeless
schedule.
Question
Answer
Option 2.
SERIALIZABILITY
As mentioned previously, a non-serial schedule is said to be serializable if it maintains data
consistency throughout the operations. Hence, a non-serial serializable schedule behaves like a serial
schedule. Serializability is checked in two ways –
Conflict
View
Conflict Serializability
If a non-serial schedule can be converted into a serial schedule by swapping non-conflicting
operations, then it is called a conflict serializable schedule. To understand this, we need to understand
what conflicting operations are. A pair of conflicting operations follows these three rules –
1. The operations belong to different transactions
2. The operations are performed on the same variable/data entry
3. At least one of the operations is a write operation.
For example –
Let us take the operations W1(A) and R2(A). These operations are from different transactions, are
performed on the same variable, and one of them is a write operation. Therefore, they are conflicting
operations.
To figure out if a schedule is conflict serializable, we need to perform the following steps –
Question
Answer
2( ) → 1( )
1( ) → 2( )
3( ) → 2( )
Suppose we have two schedules S1 and S2 which have the same conflicting operations. In that case,
S1 and S2 are said to be conflict equivalent.
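The conflict serializability test — collect the conflicting operation pairs, draw a precedence graph, and look for a cycle — can be sketched as follows. The schedule encoding (transaction, operation, item) is my own assumption for illustration:

```python
# Conflict-serializability check: build a precedence graph from conflicting
# operation pairs (different txns, same item, at least one write), then
# report whether the graph is acyclic.

def conflict_serializable(schedule):
    """schedule: list of (txn, 'R' or 'W', item) in execution order."""
    graph = {}
    for i, (ti, oi, xi) in enumerate(schedule):
        for tj, oj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (oi, oj):
                graph.setdefault(ti, set()).add(tj)   # edge Ti -> Tj

    def has_cycle(node, visiting, done):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and has_cycle(nxt, visiting, done)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    nodes = {t for t, _, _ in schedule}
    return not any(has_cycle(n, set(), set()) for n in nodes)

# R1(A) W2(A) W1(A): edges T1 -> T2 and T2 -> T1 form a cycle.
print(conflict_serializable([("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]))  # False
```

A serial-looking schedule such as R1(A) W1(A) R2(A) produces only the edge T1 → T2 and passes the check.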
View Serializability
There can be cases where the precedence graph of a schedule contains a cycle but the schedule
maintains data consistency. In short, a schedule can have a cycle in its precedence graph and at the same
time be serializable. Therefore, just because a schedule fails the conflict serializability check
doesn't mean it is non-serializable. This is where view serializability comes into play.
A schedule is called view serializable if it is view equivalent to a serial schedule. To be view
equivalent, the 2 schedules need to satisfy the following conditions –
For example,
Schedule S1 is the serial schedule and S2 is the non-serial schedule. We can see that –
Question
Answer
In this schedule, a cycle will exist between T1 and T2, making it non conflict serializable. Thus, we
will now check for View serializability. We can see that none of the transactions have any blind writes.
Hence, the schedule is also not view serializable.
Question
Answer
Option C
Question
Answer
We need to write all the conflicting operations of S1, S2 and S3. Once we do that, we will observe that
the conflicting operations of S1 and S2 are the same but those of S3 are different. Hence, Option D is the
correct answer.
Storing a copy of the original data in case of data loss is called a backup. When this backed up
data is used to restore lost data, the process is referred to as recovery. Backup and recovery
procedures improve DB reliability and data protection. The main reason to use a recovery
process is to handle cases where there is a failure before the transaction completes/commits.
Thus, we need to either Undo or Redo the changes to prevent any data inconsistency and also ensure
atomicity.
TYPES OF FAILURES
System crash – When there is an error during execution that causes the entire system to crash.
System Error – A logical programming error (e.g. divide by zero).
Local Error – When conditions are such that a transaction request ends up getting cancelled.
Concurrency Control Enforcement – The CCE may decide to cancel a transaction to ensure
serializability and concurrency control.
Disk Failure – When disks lose data due to read/write errors.
Catastrophe – Extreme events like fire, theft, natural disaster, power outage.
Network Failure – Disruption in the communication between client and server.
Deadlock – Two or more transactions are waiting for each other.
Software bugs – Bugs in the DBMS that result in unexpected outcomes.
Power Outage – Sudden loss of power to the system mid-operation.
Data Corruption – Any of the above failures can cause data to become inconsistent or
corrupted.
DB RECOVERY TECHNIQUES
Undo/Rollback – This is done in cases where the transaction hasn't committed its changes and
encountered a failure. In such a case, the recovery system undoes the changes back to the last valid
checkpoint.
Redo/Commit – This is done for transactions that have successfully committed and after
that encounter a failure. In this case, the recovery system redoes or re-applies the committed
changes to bring the DB back to the most recent stable state.
The transaction states and operations are stored in a transaction log file and this acts as the basis for
the DB recovery system. The DB also regularly records points where it was in a stable state; these
are called checkpoints.
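The Undo/Redo rules can be sketched as a toy recovery pass over a transaction log. The log record format below is my own assumption; real DBMSs use far richer write-ahead logs:

```python
# Toy log-based recovery: redo writes of committed transactions (forward
# scan), undo writes of uncommitted transactions (backward scan).
# Each WRITE record carries (txn, "WRITE", item, old_value, new_value).

log = [
    ("T1", "START"), ("T1", "WRITE", "A", 10, 20), ("T1", "COMMIT"),
    ("T2", "START"), ("T2", "WRITE", "B", 5, 50),
    # <-- crash here: T1 committed, T2 did not
]

db = {"A": 10, "B": 50}   # B already reflects T2's uncommitted write

committed = {rec[0] for rec in log if rec[1] == "COMMIT"}

# Redo phase: re-apply writes of committed transactions.
for rec in log:
    if rec[1] == "WRITE" and rec[0] in committed:
        _, _, item, old, new = rec
        db[item] = new

# Undo phase: restore old values for uncommitted transactions.
for rec in reversed(log):
    if rec[1] == "WRITE" and rec[0] not in committed:
        _, _, item, old, new = rec
        db[item] = old

print(db)   # {'A': 20, 'B': 5}
```

After recovery, the DB reflects exactly the committed transactions: T1's write survives (redo) and T2's write is rolled back (undo), preserving atomicity.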
Question
Answer
Question
Answer
Since the system crash occurs before transaction 7, we can conclude that before the crash transaction
T1 has committed and transaction T2 has not committed. Therefore, we need to redo the write ops of T1
and undo the write ops of T2. Hence, the correct answer is Option B.
Concurrency control is the procedure in DBMS for managing simultaneous operations without them
conflicting with each other. The aims of a concurrency control protocol are to –
Ensure Serializability
Prevent Deadlock
Prevent Cascading Rollback
Prevent Starvation
Ensure Recoverability
There are different types of concurrency control protocols that are implemented –
A lock is a variable associated with a data item that indicates the status of the data item. If we
lock a data item, then no other transaction can operate on it till we unlock the data item. Simple
concept. We usually deal with 2 types of locks –
Shared Lock (S) – These are also referred to as Read-Only Locks as this lock prevents any
writing or modification of the data item.
Exclusive Lock (X) – These allow both read and write operations to be performed on the
data item. As a result, an Exclusive lock can be held on a data item by only a single transaction
at a time, hence the name.
NOTE
A Shared Lock allows only reading of the data item. This means multiple transactions can hold a
Shared Lock over the same data item, as multiple transactions can read the same data item without
any issues. However, this is not the case with an Exclusive Lock, as multiple simultaneous
modifications are not allowed and can cause the Dirty Read, Lost Update and Phantom Read
problems. Thus, we get a compatibility matrix as follows –
S X
S Compatible Not Compatible
X Not Compatible Not Compatible
Suppose only 1 transaction holds a Shared lock on a data item. Then, we can easily Upgrade the
Shared lock to an Exclusive lock (S → X). At the same time, there can be cases where a transaction
holds an Exclusive lock on a data item but no longer needs to write. There, we can Downgrade the
Exclusive lock to a Shared lock (X → S).
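The compatibility matrix and the upgrade rule can be sketched as a small grant check. The lock-table representation (a dict of holder → mode) is my own illustration:

```python
# S/X lock compatibility: a request is granted only if it is compatible
# with every lock held by *other* transactions on the same item.

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(held, requested_mode, requester):
    """held: dict mapping txn -> mode currently held on the item."""
    for txn, mode in held.items():
        if txn == requester:
            continue                       # a txn's own lock never blocks it
        if not COMPATIBLE[(mode, requested_mode)]:
            return False
    return True

print(can_grant({"T1": "S", "T2": "S"}, "S", "T3"))   # True  -- readers share
print(can_grant({"T1": "S", "T2": "S"}, "X", "T3"))   # False -- X conflicts with S
print(can_grant({"T1": "S"}, "X", "T1"))              # True  -- sole holder may upgrade S -> X
```

The last call shows why the upgrade is only possible when a single transaction holds the Shared lock: any other holder would make the X request incompatible.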
Deadlock
In this schedule, we can see that in Operation 1, T1 applies an Exclusive Lock on B and there is no
unlock. Hence, when T2 tries to apply a Shared Lock on B in Operation 7, it can't do so till T1 unlocks
B. Similarly, Operation 8 of T1 can't be executed till the Shared Lock applied by T2 in Operation 5 is
unlocked. Therefore, the transactions are in a state of Deadlock.
Starvation
Serializability
In this case, we can see that T2 performs R(A) after T1 performs W(A) without committing. At the same
time, T1 performs W(A) after T2 performs R(A) without committing. Therefore, even though we are
using the simple locking system, there will be a loop in the precedence graph and the schedule is non
– serializable.
Recoverable
In this case, T2 performs a dirty read of A and then commits. After that, T1 faces an error and rolls back.
Therefore, we can see that this schedule is irrecoverable.
TWO PHASE LOCKING
As we can see, simple locking is not a good concurrency control protocol as it can lead to deadlock,
starvation and irrecoverability, and also does not guarantee serializability. To ensure serializability, we
can use 2-Phase Locking (2PL). In 2PL, we have 2 main phases (as the name suggests) –
Growing Phase – In this phase, the transaction acquires all the locks for all the data items
it needs. The operation where it obtains the final lock is called the Lock Point. This phase allows
for the S → X upgrade.
Shrinking Phase – This is the phase where the transaction releases all the locks it has acquired.
This phase allows for the X → S downgrade.
The transaction can't release locks in the Growing phase and can't acquire locks in the Shrinking
phase. Therefore, the schedule acts as a serial schedule as all the transactions follow the 2 phases.
Therefore, 2PL ensures Serializability.
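The two-phase rule — no acquire after the first release — is easy to check mechanically. This is a sketch; the operation encoding is my own:

```python
# 2PL compliance check for a single transaction's lock operations:
# once any lock is released (Shrinking phase begins), no further lock
# may be acquired.

def follows_2pl(ops):
    """ops: list of ('lock' or 'unlock', item) in execution order."""
    shrinking = False
    for action, item in ops:
        if action == "unlock":
            shrinking = True               # transaction enters Shrinking phase
        elif action == "lock" and shrinking:
            return False                   # acquired a lock after releasing one
    return True

print(follows_2pl([("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]))  # True
print(follows_2pl([("lock", "A"), ("unlock", "A"), ("lock", "B")]))                   # False
```

The second schedule violates 2PL because the lock on B is requested after A was already unlocked, i.e. after the Lock Point has been passed and the Shrinking phase has begun.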
DRAWBACKS OF 2PL
Cascading Rollback
All the transactions T1, T2 and T3 follow the Growing and Shrinking phases, indicating that they are
following the 2PL protocol. However, we can see that once T1 fails and rolls back, both T2 and T3
are forced to rollback as a result, and therefore we face the problem of cascading rollback.
Deadlock
In this case, we can see that T1 can't acquire a lock on B thanks to T2, and T2 can't acquire a lock on A
thanks to T1. Therefore, this is a classic deadlock case.
TYPES OF 2PL
Till now, we have discussed the basic 2PL protocol. We have also seen that Basic 2PL suffers from
Cascading Rollback and Deadlock problems. Now, we can see the different enhancements made to
2PL to overcome these problems –
Strict 2PL
In this case, all the exclusive locks held by the transaction are released only after the transaction
commits.
As we can see, Strict 2PL solves both the Cascading Rollback and Irrecoverability problems.
However, deadlocks are still possible.
Rigorous 2PL
This is a more restrictive version of Strict 2PL. In this case, both Exclusive and Shared locks are
released only after the transaction commits. This also ensures that the Cascading Rollback and
Irrecoverability problems are solved. However, deadlocks are still possible.
Conservative 2PL
In this case, the transaction needs to predeclare its read/write sets of operations. Once it has the
read and write sets, the transaction acquires the locks only if it can get the locks for all the operations
in the read and write sets. Therefore, this solves the Deadlock problem but it can't solve the Cascading
Rollback and Irrecoverability problems.
However, Conservative 2PL is not a practical solution as it is not easy to implement a transaction
that can predeclare its read and write sets. Therefore, it is not widely used.
Question
For the following schedule, implement it using the 2PL protocol and also identify the class of 2PL it
belongs to.
Answer
The first thing we need to do here is to check whether the given schedule is conflict serializable or not.
Since it is, we can proceed with the 2PL implementation as follows –
We can see that both the Shared lock S(A) and Exclusive lock X(B) are being unlocked after the commit
operation in T1 and T2. Therefore, the schedule is Rigorous 2PL and since all Rigorous 2PL schedules
are also Strict 2PL, the above schedule is also Strict 2PL.
A timestamp is basically a tag attached to a transaction which denotes the specific time
at which the transaction was started. To get the timestamp, we can either take the current clock value,
or we can have a logical counter that keeps incrementing and the value it returns will be the
timestamp. We have 2 timestamps –
As per the timestamp ordering protocol, it is ensured that the transactions must be accessed in the
order of their timestamps. If we have 2 transactions T1 and T2 with timestamps of, say, 100 and
200 clocks respectively, then as per the TSO protocol, T1 should be executed before T2. In short –
This is also referred to as the Basic TSO protocol. Since this protocol decides the serialization order,
it results in a schedule which is both conflict and view serializable. In addition to that, Basic TSO
schedules the transactions based on timestamps which are static values, i.e. they don't change at all.
Hence, there is never a case where a new transaction arrives and can execute before the pending
transactions in the schedule. Therefore, Basic TSO prevents Deadlocks.
Whenever transactions issue conflicting Read and Write operations, Basic TSO follows the given
algorithm –
Question
Assume that initially the read and write timestamps for the data items are 0. Additionally, the
timestamps for T1, T2 and T3 are 100, 200 and 300. Find the instruction that will be rolled back.
Answer
As we can see, there are conflicting read and write statements in this schedule. Hence, we need to use
the TSO flowchart that was discussed previously. As a result, we get –
Operation 1 – R1(A)
Transaction T1 issues a Read of data item A. We can see that WTS(A) = 0 and TS(T1) = 100.
Therefore, WTS(A) < TS(T1) and as a result, the R1(A) operation is executed and RTS(A) =
max{RTS(A), TS(T1)} = 100. Thus,
A B C
RTS 100 0 0
WTS 0 0 0
Operation 2 – R2(B)
Transaction T2 issues a Read of data item B. We can see that WTS(B) = 0 and TS(T2) = 200.
Therefore, WTS(B) < TS(T2) and as a result, the R2(B) operation is executed and RTS(B) =
max{RTS(B), TS(T2)} = 200. Thus,
A B C
RTS 100 200 0
WTS 0 0 0
Hence, when we continue the above operations in the same way, we find the operation that will cause
a rollback.
NOTE
The TSO protocol doesn't remove the possibility of a dirty read happening and also doesn't exclude
the cases where transaction T1 will abort after T2 commits. Hence, TSO still has Cascading Rollback
and Recoverability issues.
STRICT TSO
This is a variation of the Basic TSO where transaction T2 will delay its read/write operations until the
transaction T1 which has written the value of the same data item commits or is aborted. Hence, the
schedule becomes Strict and also Recoverable, and the Cascading Rollback issue is solved.
THOMAS WRITE RULE
This is a variation of the Basic TSO protocol. We know that the Basic TSO protocol applies only to
conflict serializable schedules. However, with the Thomas Write rule, we can apply it to any view
serializable schedule and it doesn't matter if the schedule is not conflict serializable.
In Basic TSO, we can see that when a write W(X) is scheduled and WTS(X) > TS(T), the transaction
needs to be aborted. On the other hand, as per the Thomas Write rule, the write is ignored and the
transaction is not aborted. Processing continues further without any rollback. These ignored write
operations are referred to as Outdated/Obsolete Writes.
Let us assume the schedule above is such that TS(T2) < TS(T1). As a result, the serialization order is
T2 → T1. Therefore, Operation 3, which would require T1 → T2, is not allowed and is referred to as an
Obsolete write. Basic TSO would have aborted the transaction while the Thomas Write rule will ignore
it and move on.
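Only the write rule changes under Thomas's modification: a write that arrives after a younger transaction has already written the item (TS < WTS) is skipped instead of triggering an abort. A sketch, with my own function name and return strings:

```python
# Write rule with Thomas's modification: an obsolete write (ts < WTS) is
# silently ignored; only a conflicting read by a younger txn (ts < RTS)
# still forces a rollback.

RTS, WTS = {}, {}

def thomas_write(ts, item):
    if ts < RTS.get(item, 0):
        return "rollback"     # a younger transaction already read the item
    if ts < WTS.get(item, 0):
        return "ignored"      # obsolete write: skip it, do not abort
    WTS[item] = ts
    return "ok"

print(thomas_write(200, "A"))   # ok      -> WTS(A) = 200
print(thomas_write(100, "A"))   # ignored -- Basic TSO would have aborted here
```

Because the obsolete write would have been overwritten by the TS-200 write in any serial order anyway, skipping it preserves view serializability, which is why Thomas's rule admits schedules Basic TSO rejects.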
Question
Answer
In the Thomas write rule, obsolete writes are ignored and the transactions are not aborted. On the
other hand, Basic TSO will abort the transactions. Therefore, we can conclude that the Thomas Write
rule allows higher concurrency when compared to Basic TSO. Thus, Option A is the correct answer.
In this case, the transactions are represented as nodes and their conflicts are represented as edges.
The rules of these graphs are as follows –
Graph based protocols are advantageous as they can handle complex dependencies. However, they
are also computationally expensive.
CHEAT SHEET
Question
Answer
Option A
INDEXING IN DB
Indexing is a way to minimize the disk accesses required by the DB. This helps to quickly access the
data and prevent unnecessary overhead. An index basically has 2 parts –
Search Key – The first part is the search key which is either a primary/candidate key for which
we need to get the data.
Data Reference/Pointer – This is the set of pointers holding the address of the disk block
where the data can be found.
TYPES OF INDEXING
Primary Indexing
This is when the primary key is used as the index. Since the PK is always unique and ordered, the
index will be in ascending order and at the same time will be a 1:1 mapping. Since the indexes are
ordered, searching is efficient. There are two types of Primary Indexing –
Dense Index – In this indexing method, for each value in the data file there exists an index entry.
Sparse Index – In this case, only select values in the data file have a corresponding index
entry. To get the disk address for the values that don't have an index entry, we go to the
nearest value with an index entry and then scan the disk sequentially after that till we find the
desired value.
Clustered Indexing
When two or more records are stored in the same file location, it is called Clustered Indexing.
There are cases where we need to perform clustering based on non-primary key attributes. In such
cases, we have to combine multiple attributes and then perform the clustered indexing. The output
of clustered indexing is always in ascending order.
Non-Clustered Indexing
In this case, the data itself is not present in the clusters. Instead, references to the data are
present. Here, the data is not sorted but the references are ordered. Since the data is
not physically stored in clustered order, we need to perform more operations to get the data in this
case when compared to Clustered indexing. For example –
In general, we can classify the indexing methods as follows –
INDEXING ATTRIBUTES
MULTILEVEL INDEXING
When the size of the database grows, the indices also grow. Thus, a single index may be too large to
be stored in the main memory. To solve this, we use multilevel indexing where we have outer blocks
and inner blocks.
FILE ORGANIZATION
A file is a collection of related information on the secondary memory. File Organization refers to the
logical relationships between the files that help us access them. There are five types of file
organization –
Sequential
Heap
Hash
B+ Tree
Clustered
Sequential is the easiest method, where the files are stored in a sequence. There are 2 ways to
implement this method –
Pile File Method – In this case, the files are stored in the same sequence in which they are
added.
Sorted File Method – In this case, the files are added and then sorted in ascending order.
Sequential File Organization is simple and cheap, but at the same time it takes a long time to
access a file (sequential searching) and uses the most storage space.
In the Heap method, the files are stored in blocks. There is no sorting and the DBMS has the
responsibility of taking care of the mapping.
This is good for smaller DBs but at the same time there are a lot of unused memory blocks.
In Hash File Organization, the location of a record is found by using a hash function. The records
are stored in memory locations called Data Buckets. There can be 2 types of hashing in DBMS –
Static – In this case, the number of data buckets remains the same and the hash function also
remains the same. So, the indexing result is always the same.
Dynamic – In Static hashing, the number of data buckets remains fixed. To accommodate
the cases where the data buckets must increase, we resort to dynamic hashing.
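Static hashing can be sketched as a fixed array of buckets with a fixed hash function; the bucket count and records below are my own illustration:

```python
# Toy static hash file: a fixed number of buckets and a fixed hash
# function, so a given key always maps to the same bucket.

NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, record):
    buckets[key % NUM_BUCKETS].append((key, record))   # fixed hash function

def search(key):
    for k, rec in buckets[key % NUM_BUCKETS]:          # probe only one bucket
        if k == key:
            return rec
    return None

for k in (3, 7, 11):        # 3, 7 and 11 all hash to bucket 3
    insert(k, f"rec-{k}")
print(search(7))            # rec-7
```

Note how keys 3, 7 and 11 pile up in the same bucket: with a fixed bucket count this chain only grows, which is exactly the bucket-overflow situation the next section discusses and the motivation for dynamic hashing.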
BUCKET OVERFLOW
This is the case where the hash function produces a block which is already occupied. For this, there
can be 3 solutions –
B-trees are self-balancing trees which are used to store and manage large data sets and help
simplify the scenarios where Multilevel Indexing has been implemented. It is called a balanced tree
because all the leaf nodes are on the same level.
In a B-tree, every node can have multiple keys in it. Each node is made up of the following three
elements –
Keys – These are the keys that help to perform the indexing.
Record Pointer (RP) – Each key has a corresponding record pointer which points to the block
in the secondary memory where the required data for that key is stored.
Block/Tree Pointer (BP) – This is the pointer that points to the child nodes of that node in the
B-tree.
The max number of children any node can have is termed the order of the B-tree. Therefore, when we
say that a B-tree has an order of 4, it means that any node can have a maximum of 4 children.
Additionally, for any node we have –
No of children = No of keys + 1
From here, we can see that the number of keys is 1 less than the number of children.
NOTE
Suppose we have n keys and the order of the B-tree is m, then we can write –
Minimum height = h_min = ⌈log_m(n + 1)⌉ − 1
Maximum height = h_max = ⌊log_t((n + 1)/2)⌋, where t = ⌈m/2⌉ is the minimum degree
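These height bounds (h_min = ⌈log_m(n+1)⌉ − 1 and h_max = ⌊log_t((n+1)/2)⌋ with minimum degree t = ⌈m/2⌉) are easy to evaluate numerically. The sample values n = 99, m = 4 are my own:

```python
# Height bounds for a B-tree of order m holding n keys.
# Best case: every node is packed with m - 1 keys.
# Worst case: internal nodes hold only t - 1 keys (t = ceil(m/2)).

import math

def h_min(n, m):
    return math.ceil(math.log(n + 1, m)) - 1

def h_max(n, m):
    t = math.ceil(m / 2)
    return math.floor(math.log((n + 1) / 2, t))

print(h_min(99, 4))   # 3 -- fully packed order-4 tree
print(h_max(99, 4))   # 5 -- sparsest valid order-4 tree
```

The gap between the two bounds shows why node fill matters: the same 99 keys can sit anywhere between 4 and 6 levels deep, and every extra level is one more disk access per search.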
B-TREE TRAVERSAL
A B-tree follows Inorder traversal where we first print the left child, then the root and finally the
right child.
B-TREE SEARCHING
We know that the keys in a node are ordered and at the same time, for any key we have –
keys in left subtree < key < keys in right subtree
B-TREE INSERTION
Here, overflow refers to the case where a node tries to fit more than the maximum number of keys it
can hold. Let us take an example as follows –
Keys = {1,2,3,4,5,6,7,8,9,10}
Order m = 4
For this case, we can have a maximum of 3 keys per node. Let us begin now.
Initially, there are no nodes and no keys inserted. Thus, we create a root node and allocate the element 1
to it as follows –
Once this is done, we can also insert keys 2 and 3 in the same node as there will be no overflow –
Now, we try to insert key 4 into the node. In this case, there is an overflow. Thus, we need to find
the median of the keys, push the median up as the root node, and split the remaining keys into the left
and right children.
Median(1,2,3,4) = 2 or 3
We shall take 2. Thus, we keep key 2 as the root node and have –
Left child = {1}
Right child = {3,4}
Now, we can add key 5 to the right child of the root node without any overflow. However, when it
comes to adding key 6, we will face overflow and must repeat the process again –
Median(3,4,5,6) = 4
Left child = {3}
Right child = {5,6}
Hence, key 4 is now pushed up to the root level. We can continue this entire process and end up with
the following B-tree –
B-TREE DELETION
For example,
In the above B-tree, the minimum number of keys in a node is 2. Let us perform the deletion
operations –
Delete Key 6
We can see that Key 6 is in a leaf node and at the same time the node doesn't go into underflow.
Therefore, we can simply delete 6.
Delete Key 13
We can see that key 13 is in an intermediate node and thus, we can't simply delete it. To delete it, we
can push its largest predecessor (12) up to that level. We can't push the smallest successor
(14) up, as the number of keys needs to be at least 2 in each node. Therefore, we get –
Delete Key 7
Again, key 7 is in an intermediate node. In this case, we can't take the predecessor or successor as at
least 1 node would be left with a single key in either scenario. Hence, we will merge the two leaf nodes
and get the following tree –
Delete Key 2
Here key 2 is in a leaf node but we can't simply delete it as it would cause the node to have just 1 key
left. Hence, we take the successor from the root node (3) and to replace 3, we take the successor from
the right child (4) into the root node.
B+ TREE
These are an enhanced version of B-trees wherein the intermediate nodes don't have the
record/data pointers. Only the leaf nodes have the record/data pointers. In short, the intermediate
nodes only have keys that help in navigating and indexing through the B+ tree. Only the leaf nodes
store the RPs and can be used to access the disk storage.
One more important thing is that the leaf nodes are arranged as a linked list to ensure
sequential access of the RPs and disk data. Therefore, for a B+ tree, we can see –
One more thing to note here is that since we are not storing the RPs at every node, we save
space and can also have a larger number of leaf nodes, which ensures better indexing. In addition to
that, there can be cases where a key is repeated in an intermediate node and also in a leaf node.
Other than this, B+ and B trees are both balanced, ordered and multi-level.
Question
Answer
Option A is correct
Question
Answer
We know that 𝑟1 → 𝑟2 is a many to one relationship. So, multiple records in 𝑟1 will be mapped to the
same entity in 𝑟2. However, if we have an entity in 𝑟2, there are multiple entities in 𝑟1 that can be
mapped to it. Therefore, the 𝑟1 entity will be able to uniquely identify an entity in 𝑟2. Hence, Option
A is the correct answer
Question
Answer
To convert a weak entity set to a strong entity set, we need to combine the primary key of the strong
set with the discriminator from the weak set. Hence, Option 4 is the correct answer
Question
Answer
Since Y is the dominant entity and it has a subordinate as X, deleting Y will also delete X. Hence, Option
B is correct.
Question
Answer
• 𝐸1 → 𝐸2 is a many to one relationship. This means that multiple elements in 𝐸1 are mapped
to 1 element in 𝐸2.
• We can also see that 𝐸1 participates totally. That means, each element in 𝐸1 has a mapping.
Therefore, from the above statements we can conclude that the relationship allows for every element
in 𝐸1 to be mapped to a single element in 𝐸2. Thus, Option A is correct.
Question
Answer
The answer is Option 3 since Owner Entity set is another name for Identifying Set.
Question
Answer
Since Weak Entity set is dependent on Strong Entity set, the deletion of the Strong entity will cause
the weak entity to be deleted. Hence, Option A is TRUE. Also, the weak entity set forms its keys by
taking the Primary Key of the strong entity set and the discriminator. Since Primary keys are all unique,
there will no duplication, redundancy or inconsistency in the data of the weak entity as well. Hence,
Option B is TRUE. Since we know that weak entity set doesn’t have a primary key, Option C is also
TRUE.
Question
Answer
The immunity to change is called Data Independence. The data independence between External
schema (View) and Conceptual Schema is referred to as Logical Data Independence making the
correct option to be Option B
Question
Answer
We know that one author can write multiple books and at the same time, one book may require the
contribution of multiple authors. Therefore, this is a many-to-many relationship which in turn makes
Option 2 as correct.
Question
Answer
Cardinality gives the information of the max number of times an entity can participate in the
relationship. Hence, Option 1 is correct.
Question
Answer
𝐸1 = (𝑎1, 𝑎2)
𝐸3(𝑐1, 𝑐2)
Question
Answer
𝐴(𝑎1, 𝑎2)
𝑅2(𝑎1, 𝑐1)
𝐷(𝑑1, 𝑑2)
Question
Answer
𝐵(𝑏1, 𝑏2)
𝐶(𝑐1, 𝑐2)
𝑅3(𝑏1, 𝑐1)
Question
Answer
𝑃𝑒𝑟𝑠𝑜𝑛(𝑁𝐼𝐷, 𝑁𝑎𝑚𝑒)
𝐸𝑥𝑎𝑚(𝐸𝑥𝑎𝑚𝐼𝐷, 𝐸𝑥𝑎𝑚𝑁𝑎𝑚𝑒)
Answer
𝑅2 𝑅12 (𝑋, 𝑌, 𝐴)
𝑅1 (𝐴, 𝐵)
𝑅2 (𝐴, 𝐶)
Question
Answer
Answer
𝐴(𝑎1, 𝑎2)
𝐴(𝑎1, 𝑎3)
𝑅(𝑎1, 𝑏1)
𝐵(𝑏1, 𝑏2)
𝐵(𝑏1, 𝑏3)
Question
Answer
Since there are 2 arrows, we can see that this is a one-to-one relationship. At the same time, we can
see that E1 has total participation. That means, all 𝑛 entries in E1 are being mapped to a single element
in E2. Therefore, there will be exactly 𝒏 entries in the relationship set and hence Option 3 is the correct
answer.
Question
𝐸1𝑅(𝐴, 𝐵, 𝐶, 𝐷)
𝐸2𝑅2(𝐷, 𝐸, 𝐹, 𝐺)
𝐸3(𝐺, 𝐻)
Question
Answer
𝐸2 (𝐶, 𝐷)
𝐸1 𝑅𝐸4 (𝐴, 𝐵, 𝐸, 𝐹)
𝐸3 (𝐺, 𝐻)
Question
Answer
Question
Answer
We need to store the information about the rent being paid by a person lodging in a particular hotel
room. Since, we need both the hotel room and person details, the attribute rent should belong to
lodging as it maps the person to the hotel room. Hence, Option C is the correct answer.
Question
Answer
𝑅2(𝑎1, 𝑏1)
Question
Answer
Question
Answer
Question
Answer
Question
Answer
• 𝑀
• 𝑁𝑅1
• 𝑅2
Question
Answer
We know that when we have multiple tables involved in a relation, we can have multiple foreign keys
thus making S1 false. At the same time, there can be a case where the foreign and primary keys are
the same. In such cases, we can use the foreign key to refer to tuples (rows) of R and hence even S2
is false.
Question
Answer
𝐴𝑅(𝑎, 𝑏, 𝑐)
𝐵(𝑐, 𝑑)
Question
Answer
Question
Find the minimum number of tables and number of foreign keys used to realize this ER model
Answer
𝑋𝑅(𝑐, 𝑚, 𝑏, 𝑎)
𝑌(𝑎, 𝑏, 𝑐)
Now, we can see that 𝑍 has been generalized into 𝑋. Hence, 𝑍 will have the primary key of 𝑋 which
will in turn contain the primary key of 𝑌. Thus,
𝑍(𝑚, 𝑛, 𝑐, 𝑎)
Since 𝑌 is not a subclass entity and is a strong entity, we check its primary key, which is 𝑎.
We can see that key 𝑎 is used in both 𝑋𝑅 and 𝑍. Hence, there are 3 tables and 2 foreign keys.
Question
Answer
Option 2 is the correct answer. The number of super keys will be 2^7 = 128
Question
Answer
Answer
Question
Answer
{𝐹2} → {𝐹2, 𝐹4, 𝐹1, 𝐹3, 𝐹5}
Question
{𝐴} → {𝐴, 𝐵, 𝐶, 𝐷, 𝐸}
We can see that none of the keys derive 𝐹. Hence, the candidate key will be 𝑨𝑭
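The closure computation behind this kind of candidate-key question can be sketched as a fixpoint loop. The FD set below is a hypothetical example chosen so that {A}+ = {A, B, C, D, E}, matching the result above (the question's actual FDs are not reproduced in these notes):

```python
# Attribute closure: repeatedly apply every FD whose left side is already
# contained in the result, until nothing new is added.

def closure(attrs, fds):
    """attrs: set of attributes; fds: list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C", "D"}), ({"D"}, {"E"})]
print(sorted(closure({"A"}, fds)))    # ['A', 'B', 'C', 'D', 'E']
# F appears on no right-hand side, so it must be added by hand: AF is the key.
```

An attribute that never appears on the right side of any FD (like F here) can only come from the key itself, which is why it always joins the candidate key.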
Question
Let a Relation 𝑅 have attributes {𝑎1, 𝑎2, 𝑎3 … 𝑎𝑛} and the candidate key is 𝑎1𝑎2. Then, find the
number of possible super keys.
Answer
We know that,
𝑁𝑜 𝑜𝑓 𝑎𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠 = 𝑛
𝑁𝑜 𝑜𝑓 𝑎𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠 𝑖𝑛 𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 𝑘𝑒𝑦 = 2
Hence,
Question
Answer
This is a different case since we have 2 separate candidate keys. Thus, in this case –
Question
Answer
𝑆(𝑎1 ∪ 𝑎2𝑎3) = 𝑆(𝑎1) + 𝑆(𝑎2𝑎3) − 𝑆(𝑎1 ∩ 𝑎2𝑎3) = 2^(n−1) + 2^(n−2) − 2^(n−3) = 5 · 2^(n−3)
Question
Answer
𝑆(𝑎1𝑎2 ∪ 𝑎3𝑎4) = 𝑆(𝑎1𝑎2) + 𝑆(𝑎3𝑎4) − 𝑆(𝑎1𝑎2 ∩ 𝑎3𝑎4) = 2^(n−2) + 2^(n−2) − 2^(n−4) = 7 · 2^(n−4)
Question
Answer
𝑆(𝑎1𝑎2 ∪ 𝑎1𝑎3) = 𝑆(𝑎1𝑎2) + 𝑆(𝑎1𝑎3) − 𝑆(𝑎1𝑎2 ∩ 𝑎1𝑎3) = 2^(n−2) + 2^(n−2) − 2^(n−3) = 3 · 2^(n−3)
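These inclusion–exclusion counts can be sanity-checked by brute force: enumerate every attribute subset and count those containing at least one candidate key. The choice n = 5 is arbitrary, just small enough to enumerate:

```python
# Brute-force super-key count: a subset of attributes is a super key
# iff it contains at least one candidate key.

from itertools import combinations

def count_super_keys(n, candidate_keys):
    attrs = list(range(1, n + 1))
    count = 0
    for r in range(n + 1):
        for subset in combinations(attrs, r):
            s = set(subset)
            if any(ck <= s for ck in candidate_keys):
                count += 1
    return count

# Candidate keys a1a2 and a1a3: the formula gives 3 * 2^(n-3) = 12 for n = 5.
print(count_super_keys(5, [{1, 2}, {1, 3}]))   # 12
```

The same function checks the single-key case too: with candidate key a1 alone, it returns 2^(n−1) = 16 for n = 5, matching the first formula.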
Question
Answer
Here,
𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 𝑘𝑒𝑦 = 𝐴𝐵
Since we have a derivation as 𝐵 → 𝐺, we can conclude that the relation is NOT in 2NF.
Question
Answer
Question
Answer
𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 𝑘𝑒𝑦 = 𝑆 𝑜𝑟 𝑇 𝑜𝑟 𝑈 𝑜𝑟 𝑉
Since all the attributes are prime attributes, the relation is in BCNF form and hence Option D is the
correct answer.
Question
Answer
Here,
• Schema IV is 1NF
• Schema III is 2NF
• Schema II is 3NF
• Schema I is BCNF
Question
Answer
𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 𝑘𝑒𝑦 = 𝐴, 𝐸, 𝐶𝐷
Since we have 𝐵 → 𝐷 and 𝐵 is NOT a super key, the above relation is in 3NF but not BCNF.
Question
Answer
𝑅𝑒𝑙(𝑁, 𝐶, 𝑅, 𝐺)
𝑁𝐶 → 𝐺
𝑅𝐶 → 𝐺
𝑁→𝑅
𝑅→𝑁
Hence,
Answer
𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 𝐾𝑒𝑦 = 𝑎𝑏
Since an NPA (𝑐) is being derived from a part of the candidate key (𝑎), there is a partial
dependency and hence the relation is only in 1NF.
Question
Answer
𝑅(𝑉, 𝑁, 𝑆, 𝐸, 𝑇, 𝑌, 𝑃)
𝑉𝑁𝑆𝐸 → 𝑇
𝑉𝑁 → 𝑌
𝑉𝑁𝑆𝐸 → 𝑃
From the derivation 𝑉𝑁 → 𝑌, we can see that there is a partial dependency and hence this relation is in 1NF.
𝑅(𝑉, 𝑁, 𝑆, 𝐸, 𝑇, 𝑃)
𝑉𝑁𝑆𝐸 → 𝑇
𝑉𝑁𝑆𝐸 → 𝑃
Since there is no longer a partial dependency, we can conclude that the weakest NF which is satisfied
by the new relation and not by the old relation is 2NF.
Question
Answer
3NF form
Question
Answer
We know from the question that one student can have multiple bank accounts and 2 students can
share a bank account (joint account). Hence, BankAccount_Num can’t be a candidate key. So, Option
A is INCORRECT.
Since it is mentioned that each student has a unique Registration_Num, it can be used as a Primary
key for Student relation. Hence, Option B is CORRECT.
It is also mentioned that UID is a unique ID for people of the country. Hence, if all the students are of
the same country, they will have unique UIDs and hence it can be used as a primary key in that
scenario. Therefore, Option C is CORRECT.
Finally, let’s assume Registration_Num is the Primary key. Then, we can have –
Answer
Here,
𝑃𝐴 = {𝑒𝑚𝑝𝑐𝑜𝑑𝑒}
𝑁𝑃𝐴 = {𝑛𝑎𝑚𝑒, 𝑠𝑡𝑟𝑒𝑒𝑡, 𝑐𝑖𝑡𝑦, 𝑠𝑡𝑎𝑡𝑒, 𝑝𝑖𝑛𝑐𝑜𝑑𝑒}
From the question, we get –
Question
Answer
𝑃𝐴 = {𝐴, 𝐵, 𝐶, 𝐷}
Hence, the relation is 3NF. It is not BCNF since 𝐶 → 𝐴 has LHS which is not a super key.
Question
Answer
Question
𝐴𝐵 → 𝐶
𝐷 → 𝐴
Suppose we decompose the relation into 𝑅1(𝐴, 𝐷) and 𝑅2(𝐵, 𝐶, 𝐷). Is the decomposition dependency
preserving?
Answer
In this case, we can get 𝐷 → 𝐴 using 𝑅1, but under no circumstances will we be able to derive 𝐴𝐵 → 𝐶. Hence, this is a non-dependency-preserving decomposition.
Question
𝐴𝐵 → 𝐶𝐷
𝐶 → 𝐷
𝐷 → 𝐸
Suppose we decompose the relation into
• 𝑅1(𝐴, 𝐵, 𝐶)
• 𝑅2(𝐶, 𝐷)
• 𝑅3(𝐷, 𝐸)
Is the decomposition dependency preserving?
Answer
From 𝑅2 and 𝑅3, we can directly get 𝐶 → 𝐷 and 𝐷 → 𝐸. Now, from 𝑅1 we get 𝐴𝐵 → 𝐶, and from 𝑅2 we get 𝐶 → 𝐷; thus, we can also get 𝐴𝐵 → 𝐶𝐷. Therefore, this is a dependency-preserving decomposition.
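The reasoning in the last two answers can be automated with the standard dependency-preservation test: grow an FD's left side by repeatedly projecting closures onto each decomposed schema, then check whether the right side is covered. A Python sketch (function names are ours), using the FDs and decompositions from the two examples above:

```python
def closure(attrs, fds):
    """Attribute closure under FDs given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fd, decomposition, fds):
    """Standard test: grow lhs by projecting closures onto each Ri;
    the FD is preserved iff rhs ends up inside the grown set."""
    lhs, rhs = fd
    result = set(lhs)
    while True:
        before = set(result)
        for ri in decomposition:
            result |= closure(result & ri, fds) & ri
        if result == before:
            return rhs <= result

# Second example: AB->CD, C->D, D->E into R1(ABC), R2(CD), R3(DE)
fds = [(frozenset("AB"), frozenset("CD")),
       (frozenset("C"),  frozenset("D")),
       (frozenset("D"),  frozenset("E"))]
decomp = [set("ABC"), set("CD"), set("DE")]
# every FD is preserved, so the decomposition is dependency preserving
assert all(preserves(fd, decomp, fds) for fd in fds)

# First example: AB->C, D->A into R1(A, D), R2(B, C, D)
fds2 = [(frozenset("AB"), frozenset("C")), (frozenset("D"), frozenset("A"))]
decomp2 = [set("AD"), set("BCD")]
# AB -> C cannot be recovered, so this decomposition is not preserving
assert not preserves((frozenset("AB"), frozenset("C")), decomp2, fds2)
```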
Question
Answer
One thing to note here is that in the Student table, the name Raj appears twice. Since the query is
grouping by Student name, the result will have only 2 tuples.
Question
Answer
𝐴𝑉𝐺 = 𝟐. 𝟔
Question
Answer
The WITH clause is used to form temporary relations. Here, we create 2 temporary relations –
• Total
• Total_avg
Question
Answer
In a SQL query, if we use HAVING without a GROUP BY clause, then the HAVING clause acts as a WHERE
clause. Hence, Statement P is True but Statement Q is False. Additionally, there can be queries where
we don’t select an attribute but refer to that attribute in GROUP BY. Hence, Statement R is False but
Statement S is True. Hence, Option B is the correct answer.
Question
Answer
We know that a FULL OUTER JOIN gives the maximum coverage of all the listed join types. Hence, Query 4 is the correct answer.
Question
Answer
Basically, we need the query to be such that P1.Capacity is greater than or equal to ALL of the other
cinema capacities. Hence Option A is the correct answer.
Question
Answer
Option C
Question
Answer
Question
Answer
Option D
Question
Answer
Question
Answer
• 1123, John
• 9876, Bart
Question
Write the relational algebra query for the following SQL query –
Answer
Question
Find the output for 𝑈𝑠𝑒𝑟 ⋈ 𝑂𝑐𝑐𝑢𝑝𝑎𝑡𝑖𝑜𝑛 ⋈ 𝐶𝑖𝑡𝑦
Answer
Question
Answer
The above relational algebra query will return the difference of the following sets –
• All empIDs
• All empIDs of the employees who are younger than or the same age as their dependents
Question
Answer
Answer
Question
Answer
Question
Answer
We know that, if a relation 𝑅 is decomposed into 𝑅1 and 𝑅2, then for a lossy decomposition –
𝑟 ⊂ 𝑟1 ⋈ 𝑟2
Since it is given that 𝑠 = 𝑟1 ∗ 𝑟2, where ∗ denotes the natural join ⋈, we can write –
𝑟 ⊂ 𝑠
Hence, Option C is the correct answer.
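The strict containment 𝑟 ⊂ 𝑠 for a lossy decomposition can be seen concretely with a tiny relation (the relation and its values below are invented for illustration):

```python
# r(A, B, C) decomposed into r1 = project_AB(r) and r2 = project_BC(r)
r = {(1, 'x', 10), (2, 'x', 20)}
r1 = {(a, b) for a, b, c in r}          # projection on A, B
r2 = {(b, c) for a, b, c in r}          # projection on B, C
# natural join of the projections on the shared attribute B
s = {(a, b1, c) for a, b1 in r1 for b2, c in r2 if b1 == b2}
# the join contains every original tuple plus spurious ones: r is a
# proper subset of s, i.e. the decomposition is lossy
assert r < s
assert s - r == {(1, 'x', 20), (2, 'x', 10)}  # the spurious tuples
```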
Question
Answer
Option D
Question
Answer
So, the result of 𝑇2 will be the students who study all three of the courses in 𝑇1: 𝑇2 = {𝑆𝐴, 𝑆𝐶, 𝑆𝐷, 𝑆𝐹}. Hence, the number of rows will be 4.
Question
Answer
Question
Answer
Question
Answer
Question
Answer
Question
Answer
Option 1.
Question
Answer
Option 2
Question
Answer
Option 1
Question
Answer
Option 3
Question
Answer
Question
Answer
Option D
Question
Answer
T2 reads the value of A and then commits before T1. Later, T1 aborts, causing data inconsistency. Therefore, the schedule is non-recoverable.
Question
Answer
Question
Answer
Question
Answer
Option A
Question
Answer
There is no dirty read in this case, and hence S is recoverable. Thus, Option A is FALSE. Additionally, since there are no commit statements, S will not be strict. Hence, Option D is also FALSE. Now, if T2 aborted before T1, we would have to cascade the abort. However, since T1 aborts before T2, there is no cascading abort. Hence, Option B is FALSE and Option C is TRUE.
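The dirty-read argument used in the last two answers can be mechanized. The sketch below uses a simplified model of our own (operations as (txn, op, item) triples, with 'C'/'A' marking commit/abort) and flags reads of uncommitted writes:

```python
def dirty_reads(schedule):
    """Return (reader, writer, item) triples where a transaction reads a
    value written by another transaction that has not yet committed."""
    uncommitted = {}   # item -> last uncommitted writer
    out = []
    for txn, op, item in schedule:
        if op == 'W':
            uncommitted[item] = txn
        elif op == 'R':
            w = uncommitted.get(item)
            if w is not None and w != txn:
                out.append((txn, w, item))
        elif op in ('C', 'A'):
            # txn's writes stop being dirty (commit) or are undone (abort)
            for k in [k for k, v in uncommitted.items() if v == txn]:
                del uncommitted[k]
    return out

# T1 writes A, T2 reads A and commits, then T1 aborts: a dirty read,
# and the reader committed before the writer finished -> non-recoverable
S1 = [(1, 'W', 'A'), (2, 'R', 'A'), (2, 'C', None), (1, 'A', None)]
assert dirty_reads(S1) == [(2, 1, 'A')]

# T2 reads A only after T1 commits: no dirty read, recoverable
S2 = [(1, 'W', 'A'), (1, 'C', None), (2, 'R', 'A'), (2, 'C', None)]
assert dirty_reads(S2) == []
```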
Question
Answer
Question
Answer
• 𝑅2(𝐼) → 𝑊3(𝐼)
• 𝑊3(𝐼) → 𝑊2(𝐼)
• 𝑊3(𝐼) → 𝑊1(𝐼)
• 𝑊3(𝐼) → 𝑅1(𝐼)
• 𝑊2(𝐼) → 𝑅1(𝐼)
• 𝑊2(𝐼) → 𝑊1(𝐼)
• 𝑊2(𝐽) → 𝑅1(𝐽)
• 𝑊2(𝐽) → 𝑊1(𝐽)
Thus, we can draw the precedence graph as follows –
Since there is a cycle, the schedule is not conflict-serializable.
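Building the precedence graph and testing it for cycles can also be done programmatically. The sketch below uses a schedule reconstructed to be consistent with the conflicts listed above (an assumption, since the original question's schedule is not reproduced here):

```python
def precedence_graph(schedule):
    """Edges Ti -> Tj for conflicting operations: different transactions,
    same item, at least one write, with Ti's operation coming first."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'W' in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """DFS-based cycle detection on the precedence graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visiting, done = set(), set()
    def dfs(u):
        if u in visiting:
            return True
        if u in done:
            return False
        visiting.add(u)
        if any(dfs(v) for v in graph.get(u, ())):
            return True
        visiting.discard(u)
        done.add(u)
        return False
    return any(dfs(u) for u in graph)

# A schedule consistent with the conflicts listed above (reconstructed)
S = [(2, 'R', 'I'), (3, 'W', 'I'), (2, 'W', 'I'), (2, 'W', 'J'),
     (1, 'R', 'I'), (1, 'R', 'J'), (1, 'W', 'I'), (1, 'W', 'J')]
edges = precedence_graph(S)
assert (2, 3) in edges and (3, 2) in edges   # the T2 <-> T3 cycle
assert has_cycle(edges)                      # not conflict-serializable
```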
Question
Answer
Question
Answer
Question
Answer
Conflict serializable
Question
Answer
Option B
Question
Answer
Option A
Question
Answer
Option B
Question
Answer
Option D
Question
Answer
Option C
Question
Answer
Option A
Question
Answer
Option A
Question
Answer
Option B
Question
Answer
Option A
Question
Answer
Option B
Question
Answer
Basically, we need to find the possible combinations of operations that create a conflict from 𝑇1 → 𝑇2 and also from 𝑇2 → 𝑇1, i.e. a cycle. This can be done by trial and error. The correct answer is 4.
Question
Answer
Question
Answer
Option B
Question
Answer
Option A
Question
Answer
Option B
Question
Answer
Option A
Question
Answer
Option B
Question
Answer
Option B
Question
Answer
Option C
Question
Answer
Question
Answer
Option B
Question
Answer
This is the basic TO (timestamp-ordering) protocol, so we know it is deadlock-free. Moreover, the ordering of the transactions is based on their timestamps, which are static values that can't change, so the DBMS is also starvation-free. Thus, Option A is the correct answer.
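The read/write rules of basic timestamp ordering can be sketched as follows. This is a simplified model of our own: each item only tracks its largest read and write timestamps, a violating operation just reports that the transaction would be rolled back, and Thomas's write rule is not applied:

```python
class BasicTO:
    """Sketch of the basic timestamp-ordering checks."""
    def __init__(self):
        self.rts = {}   # item -> largest timestamp that has read it
        self.wts = {}   # item -> largest timestamp that has written it

    def read(self, ts, item):
        # reject a read arriving after a younger txn has already written
        if ts < self.wts.get(item, 0):
            return False          # the transaction would be rolled back
        self.rts[item] = max(self.rts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        # reject a write a younger txn has already read or written past
        if ts < self.rts.get(item, 0) or ts < self.wts.get(item, 0):
            return False          # the transaction would be rolled back
        self.wts[item] = ts
        return True

to = BasicTO()
assert to.write(ts=10, item='A')      # T(10) writes A
assert not to.read(ts=5, item='A')    # older T(5) arrives too late
assert to.read(ts=20, item='A')       # younger T(20) may read
assert not to.write(ts=15, item='A')  # 15 < rts(A) = 20: rejected
```

There are no locks anywhere, which is why the protocol cannot deadlock: a transaction either proceeds or is rolled back immediately.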
Question
Answer
Question
Answer
Question
Answer
𝒑𝒍𝒆𝒂𝒇 = 𝟒𝟔
Question
Answer
Here, we need to check the intermediate nodes, so first we find the order of the nodes. An intermediate node looks like –
Thus, we get –
Therefore, we have a 3-level B+ tree, so we need to access 3 levels of the tree and then fetch the record from the data block. Therefore, we need 4 memory accesses.
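The order-and-level arithmetic used in these two answers follows a fixed recipe, sketched below. The block, key, and pointer sizes are illustrative assumptions, NOT the parameters of the original questions (which are not reproduced here), so the outputs differ from 𝑝𝑙𝑒𝑎𝑓 = 46:

```python
import math

def bplus_orders(block, key, rec_ptr, blk_ptr):
    """Max entries per leaf (p_leaf) and children per internal node (p):
    a leaf holds p_leaf (key, record-pointer) pairs plus one next-leaf
    pointer; an internal node holds p block pointers and p - 1 keys."""
    p_leaf = (block - blk_ptr) // (key + rec_ptr)
    # largest p with p*blk_ptr + (p - 1)*key <= block
    p = (block + key) // (key + blk_ptr)
    return p_leaf, p

def levels(n_records, p_leaf, p):
    """Number of B+ tree levels needed to index n_records,
    assuming all nodes are full."""
    leaves = math.ceil(n_records / p_leaf)
    lvls = 1
    while leaves > 1:
        leaves = math.ceil(leaves / p)
        lvls += 1
    return lvls

# Assumed parameters: 512 B blocks, 9 B keys, 7 B record pointers,
# 6 B block pointers
p_leaf, p = bplus_orders(512, 9, 7, 6)
print(p_leaf, p)          # -> 31 34
# a 3-level tree: 3 index accesses + 1 data-block access = 4 in total
assert levels(10_000, p_leaf, p) == 3
```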
Question
Answer
Question
Answer
Option A
Question
Answer
Option C
Question
Answer
Option C
Question
Answer
Option B
Question
Answer
Option C
QUESTION BANK
Question 1
SQL is a non-procedural/declarative language in which we only specify what data to query, not how to query it. Is this true or false?
Question 2
Question 3
Question 4
Question 5
Question 6
Question 7
Question 8
Question 9
Question 10
Question 11
ANSWER KEY
1 True
2 B
3 A
4 D
5 D
6 A
7 C
8 B
9 C
10 1
11 5