DBMS Note
Sir's Note: study the questions marked in sir's book PDF + the notes given by sir + this copy. All of them must be read together.
Physical Data Independence vs Logical Data Independence
Degree of Changes Required:
   Physical: Changes made at the physical level need not be made at the application level.
   Logical: Any changes made at the logical level need to be made at the application level as well.
Internal Modification:
   Physical: We may or may not need modifications at the internal level to improve the performance of the system's structure.
   Logical: Making modifications at the logical level is required whenever the database structure is to be changed.
Type of Schema:
   Physical: The internal schema is the primary concern.
   Logical: The conceptual schema is the primary concern.
File System vs DBMS
1. What it is:
   File System: Used to manage and organise the files stored on the hard disk of the computer.
   DBMS: A software to store and retrieve the user's data.
4. Data consistency:
   File System: Data consistency is low.
   DBMS: Due to the process of normalisation, data consistency is high.
5. Complexity:
   File System: Less complex; does not support complicated transactions.
   DBMS: More complexity in managing the data, but easier to implement complicated transactions.
7. Expense:
   File System: Less expensive in comparison to DBMS.
   DBMS: Higher cost than the file system.
8. Crash recovery:
   File System: Does not support crash recovery.
   DBMS: Crash recovery mechanism is highly supported.
Procedural vs Non-Procedural Languages
- Procedural languages are command-driven or statement-oriented; non-procedural languages are fact-oriented.
- Programs in a procedural language specify what is to be accomplished and instruct the computer exactly how the evaluation is to be carried out; programs in a non-procedural language specify what is to be done but do not state exactly how the result is to be evaluated.
- Procedural languages are used for application and system programming; non-procedural languages are used in RDBMS, expert systems, natural language processing, and education.
- Procedural languages are imperative programming languages; non-procedural languages are declarative programming languages.
- In procedural languages the textual context or execution sequence is considered; in non-procedural languages there is no need to consider textual context or execution sequence.
- Procedural languages have good machine efficiency; logic programs that use only resolution face serious problems of machine efficiency.
- The procedural paradigm leads to a large number of possible connections between functions and data when there are many functions and many global data items; no such connections are present in the non-procedural paradigm.
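To make the contrast concrete, here is a small illustrative sketch in Python (the employee data is invented, not from the notes): the procedural version spells out how to loop and filter, while the declarative version only states what is wanted, much like an SQL SELECT.

employees = [
    {"name": "Asha", "dept": "HR", "salary": 50000},
    {"name": "Rafi", "dept": "IT", "salary": 72000},
    {"name": "Mina", "dept": "IT", "salary": 65000},
]

# Procedural style: spell out HOW to compute the result, step by step.
it_names_procedural = []
for row in employees:
    if row["dept"] == "IT":
        it_names_procedural.append(row["name"])

# Declarative (non-procedural) style: state WHAT is wanted; the iteration
# strategy is left to the language/runtime.
it_names_declarative = [row["name"] for row in employees if row["dept"] == "IT"]

print(it_names_procedural)   # ['Rafi', 'Mina']
print(it_names_declarative)  # ['Rafi', 'Mina']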
Simple Attributes
Composite Attributes
Single Valued Attributes
Multi-Valued Attributes
Derived Attributes
Complex Attributes (Rarely used attributes)
Key Attributes
Stored Attributes
Now, we will study each of these types of attributes in DBMS in detail, along with their diagrams and examples.
Simple Attributes
Simple attributes in an ER model diagram are independent attributes that cannot be classified further or subdivided into other components. These attributes are also known as atomic attributes.
Example Diagram:
As we can see in the above example, Student is an entity represented by a rectangle, and it has the attributes Roll_no, Class, and Age. Note that Roll_no, and likewise the other two attributes, cannot be subdivided further into sub-attributes. Hence, they are known as simple attributes of the Student entity.
Composite Attributes
Composite attributes are the opposite of simple attributes: they can be further subdivided into components or sub-parts that are themselves simple attributes. In simple terms, composite attributes are composed of two or more simple attributes.
Example Diagram
As we can see in the above example, Address is a composite attribute represented by an elliptical shape, and it
can be further subdivided into many simple attributes like Street, City, State, Country, Landmark, etc.
Single-Valued Attributes
Single-valued attributes are those attributes that hold a single value for each entity instance and cannot store more than one value, just like the name of a person.
Example Diagram:
As we can see in the above example, Student is an entity instance, and it consists of the attributes Roll_no, Age, DOB, and Gender. These attributes can store only one value from a set of possible values. Each entity instance can have only one Roll_no (which is unique), a single DOB from which the age can be calculated, and a fixed gender. Also, we can't further subdivide these attributes; hence, they are simple as well as single-valued attributes.
Multi-Valued Attributes
Multi-valued attributes are the opposite of single-valued attributes and, as the name suggests, can take and store more than one value at a time for an entity instance from a set of possible values. These attributes are represented by concentric ellipses, and we can also use curly braces { } to represent multi-valued attributes.
Example Diagram:
As we can see in the above example, the Student entity has four attributes: Roll_no and Age are simple as well as single-valued attributes, as discussed above, but Mob_no and Email_id, represented by concentric ellipses, are multi-valued attributes. Each student in the real world can provide more than one email id as well as mobile contact number, and therefore we need these attributes to be multi-valued so that they can store multiple values at a time for an entity instance.
Derived Attributes
Derived attributes are those attributes whose values can be derived from the values of other attributes. They are
always dependent upon other attributes for their value.
For example, as we were discussing above, DOB is a single-valued attribute and remains constant for an entity instance. From DOB, we can derive the Age attribute, which changes every year, and we can easily calculate a person's age from his/her date of birth. Hence, the Age attribute here is derived from the single-valued DOB attribute.
Example Diagram:
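In addition to the diagram, here is a minimal sketch (the student data and function name are invented for illustration) of how a derived attribute such as Age can be computed from the stored DOB attribute rather than stored itself.

from datetime import date

def derive_age(dob, today=None):
    """Derive the Age attribute from the stored DOB attribute."""
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet.
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

student = {"Roll_no": 1, "DOB": date(2003, 5, 14)}   # stored attributes
student["Age"] = derive_age(student["DOB"])          # derived attribute
print(student["Age"])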
Key Attributes
Key attributes are special types of attributes that act as the primary key for an entity and they can uniquely identify
an entity from an entity set. The values that key attributes store must be unique and non-repeating.
Example Diagram:
As we can see in the above example, the Roll_no attribute of the Student entity is not only a simple and single-valued attribute but also a key attribute. The Roll_no of a student is always unique and identifies the student. Also note that the Gender and Age of two or more persons can be the same and overlapping in nature, so obviously we can't identify a student on the basis of them. Hence, Gender and Age are not key attributes.
Complex Attributes
Complex attributes are rarely used in DBMS. They are formed by the combination of multi-valued and composite
attributes. These attributes always have many sub-sections in their values.
Example Diagram:
As we can see in the above example, Address_EmPhone (which represents Address, Email, and
Phone number altogether) is a complex attribute. Email and Phone number are multi-valued attributes
while Address is a composite attribute which is further subdivided as House number, Street,
City & State. This combination of multi-valued and composite attributes altogether forms a complex
attribute.
Stored Attributes
Values of stored attributes remain constant and fixed for an entity instance, and they help in deriving the derived attributes. For example, the Age attribute can be derived from the Date of Birth attribute, and the Date of Birth attribute has a fixed and constant value throughout the life of an entity. Hence, the Date of Birth attribute is a stored attribute.
Example Diagram:
As we can see in the above image, the different types of attributes in DBMS are mapped to the appropriate fields of an entity instance.
Answer:
Let's say we take two relations, namely R and S, created from two entity sets in such a way that every entity in R is also an S entity. An inclusion dependency occurs when projecting R's key attributes gives a relation that is contained in the relation obtained by projecting S's key attributes.
Let's name the relation R teacher and S student, and take the attribute teacher_id, so we can write:
teacher:
student:
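Using relational-algebra notation, the inclusion dependency described above could be written as follows (an illustrative formulation, not quoted from the book):
π_teacher_id(teacher) ⊆ π_teacher_id(student)
that is, every teacher_id value appearing in teacher must also appear in student.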
In the first phase, when the transaction begins to execute, it requests permission for the locks it needs. In the second part, the transaction obtains all the locks. When the transaction releases its first lock, the third phase starts; in this phase, the transaction cannot demand any new locks and only releases the locks it has already acquired.
The Two-Phase Locking protocol allows each transaction to make a lock or unlock request in two phases (see the sketch after this list):
Growing Phase: In this phase transaction may obtain locks but may not release any locks.
Shrinking Phase: In this phase, a transaction may release locks but not obtain any new lock
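A minimal sketch, assuming an invented TwoPhaseTransaction class, of how the growing/shrinking rule can be enforced: once a transaction releases its first lock, any further lock request is refused.

class TwoPhaseTransaction:
    def __init__(self, tid):
        self.tid = tid
        self.locks = set()
        self.shrinking = False   # becomes True after the first unlock (lock point passed)

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"T{self.tid}: cannot acquire '{item}' in the shrinking phase")
        self.locks.add(item)     # growing phase: acquiring locks is allowed

    def unlock(self, item):
        self.shrinking = True    # entering the shrinking phase
        self.locks.discard(item)

t1 = TwoPhaseTransaction(1)
t1.lock("A")
t1.lock("B")      # still growing
t1.unlock("A")    # lock point passed, shrinking begins
# t1.lock("C")    # would raise: locks may not be acquired after the first unlock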
It is true that the 2PL protocol offers serializability. However, it does not ensure that deadlocks do not happen.
In the above-given diagram, you can see that local and global deadlock detectors search for deadlocks and resolve them by rolling the involved transactions back to their initial states.
Strict 2-PL: A transaction can release a shared lock after the lock point, but it cannot release any exclusive lock until the transaction commits. This protocol produces a cascadeless schedule.
Cascading schedule: In this schedule, one transaction is dependent on another transaction, so if one has to roll back then the other has to roll back as well.
Rigorous 2-PL: A transaction cannot release any lock, either shared or exclusive, until it commits.
The 2PL protocol guarantees serializability, but cannot guarantee that deadlock will not happen.
Example
T1 T2
Lock-X(A) Lock-X(B)
Read A; Read B;
Lock-X(B) Lock-X(A)
Here,
Lock-X(B) : Cannot execute Lock-X(B) since B is locked by T2.
Lock-X(A) : Cannot execute Lock-X(A) since A is locked by T1.
In the above situation, T1 waits for B and T2 waits for A, and the waiting never ends. Neither transaction can proceed unless one of them releases its lock voluntarily. This situation is called a deadlock.
The wait for graph is as follows −
Wait for graph: It is used in the deadlock detection method, creating a node for each transaction,
creating an edge Ti to Tj, if Ti is waiting to lock an item locked by Tj. A cycle in WFG indicates a
deadlock has occurred. WFG is created at regular intervals.
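A small sketch of deadlock detection with a wait-for graph: the dictionary below encodes the T1/T2 example above (T1 waits for T2 and T2 waits for T1), and a depth-first search looks for a cycle.

def has_cycle(wfg):
    """Return True if the wait-for graph (dict: Ti -> set of Tj it waits for) has a cycle."""
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in wfg.get(node, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(t) for t in wfg if t not in visited)

wfg = {"T1": {"T2"}, "T2": {"T1"}}   # edges T1 -> T2 and T2 -> T1
print(has_cycle(wfg))                # True: a deadlock exists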
8. What are the various states of a transaction? Explain with a state diagram.
Answer:
A transaction is a unit of database processing which contains a set of operations. For example, deposit of money,
balance enquiry, reservation of tickets etc.
Every transaction starts with delimiters begin transaction and terminates with end transaction delimiters. The set
of operations within these two delimiters constitute one transaction.
main()
{
   begin transaction
      // set of operations
   end transaction
}
A transaction is divided into states to handle various situations such as failure. It passes through various states
during its lifetime. The state of a transaction is defined by the current activity it is performing.
At a particular instant of time, a transaction can be in one of the following states: Active, Partially Committed, Committed, Failed, or Aborted.
o The log is a sequence of records. The log of each transaction is maintained in some stable storage so that if any failure occurs, the database can be recovered from it.
o But the process of storing the logs should be done before the actual transaction is applied to the database.
Let's assume there is a transaction to modify the City of a student. The following logs are written for this
transaction.
1. <Tn, Start>
o When the transaction modifies the City from 'Noida' to 'Bangalore', then another log record is written to the file: <Tn, City, 'Noida', 'Bangalore'>.
o When the transaction is finished, then it writes another log to indicate the end of the transaction.
1. <Tn, Commit>
o The deferred modification technique occurs if the transaction does not modify the database until it has
committed.
o In this method, all the logs are created and stored in the stable storage, and the database is updated when
a transaction commits.
o The Immediate modification technique occurs if database modification occurs while the transaction is
still active.
o In this technique, the database is modified immediately after every operation. It follows an actual
database modification.
When the system crashes, it consults the log to find which transactions need to be undone and which need to be redone.
1. If the log contains both the record <Ti, Start> and the record <Ti, Commit>, then the transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor <Ti, Abort>, then the transaction Ti needs to be undone.
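A simplified sketch (the log records are invented, and a real recovery manager does more work) of how the redo/undo classification above can be computed from a log:

log = [
    ("T1", "start"),
    ("T1", "write", "City", "Noida", "Bangalore"),
    ("T1", "commit"),
    ("T2", "start"),
    ("T2", "write", "Age", 20, 21),   # crash before <T2, Commit> or <T2, Abort>
]

started, committed, aborted = set(), set(), set()
for record in log:
    tid, kind = record[0], record[1]
    if kind == "start":
        started.add(tid)
    elif kind == "commit":
        committed.add(tid)
    elif kind == "abort":
        aborted.add(tid)

redo = started & committed                 # <Ti, Start> and <Ti, Commit> both present
undo = started - committed - aborted       # <Ti, Start> present, no commit/abort record
print("redo:", redo)   # {'T1'}
print("undo:", undo)   # {'T2'}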
11. Show that if a relation schema is in BCNF then it is in 3NF, but the reverse is not true.
Answer: A Data Dictionary is an integral part of a database. It holds information about the database and the data it stores, such as names, types, ranges of values, access authorizations, and which application programs use the data.
This metadata is used by developers to write programs and queries to manage and manipulate the data.
Types of dictionaries
In general, DBMS data dictionaries are of two types. These dictionaries are as follows −
Active data dictionary
Passive data dictionary
13. Explain the terms candidate key, primary key, foreign key, super key.
Answer:
o Keys play an important role in the relational database.
o It is used to uniquely identify any record or row of data from the table. It is also used to establish and identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON table,
passport_number, license_number, SSN are keys since they are unique for each person.
Types of keys:
1. Primary key
o It is the first key used to identify one and only one instance of an entity uniquely. An entity can contain
multiple keys, as we saw in the PERSON table. The key which is most suitable from those lists becomes
a primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the EMPLOYEE table, we can even select License_Number and Passport_Number as primary keys since they are also unique.
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
o Except for the primary key, the remaining such attributes are considered candidate keys. The candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, ID is best suited for the primary key. The rest of the attributes, like SSN, Passport_Number, License_Number, etc., are considered candidate keys.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate key.
For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of two employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also be a key.
4. Foreign key
o Foreign keys are columns of a table used to point to the primary key of another table.
o Every employee works in a specific department in a company, and employee and department are two
different entities. So we can't store the department's information in the employee table. That's why we
link these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
Functional Dependency
The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.
1. X → Y
The left side of the FD is known as the determinant, and the right side is known as the dependent.
For example:
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table, because if we know the Emp_Id, we can tell the employee name associated with it.
1. Emp_Id → Emp_Name
Example:
1. ID → Name
2. Name → DOB
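A small sketch (with made-up rows) of what a functional dependency X → Y means operationally: every value of X must be associated with exactly one value of Y.

def holds(rows, x, y):
    """Return True if the FD x -> y holds in the given list of row dicts."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        val = tuple(row[a] for a in y)
        if seen.setdefault(key, val) != val:
            return False   # same X value, different Y value: FD violated
    return True

employee = [
    {"Emp_Id": 1, "Emp_Name": "Asha"},
    {"Emp_Id": 2, "Emp_Name": "Rafi"},
    {"Emp_Id": 1, "Emp_Name": "Asha"},   # repeated Emp_Id keeps the same name
]
print(holds(employee, ["Emp_Id"], ["Emp_Name"]))   # True: Emp_Id -> Emp_Name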
16. What is data dictionary? What do you mean by unary operation in relational algebra?
Answer:
Data Dictionary is made up of two words, data which means the collected information through multiple sources,
and dictionary meaning the place where all this information is made available.
A data dictionary is a crucial part of a relational database as it provides additional information about the
relationships between multiple tables in a database. The data dictionary in DBMS helps the user to arrange data in
a neat and well-organized way, thus preventing data redundancy.
1. Data models provide very little information about the database, so a data dictionary is very essential to have proper knowledge about the entities, relationships, and attributes that are present in a data model.
2. The Data Dictionary provides consistency by reducing data redundancy in the collection and use of data across various members of a team.
3. The Data Dictionary provides structured analysis and design tools by enforcing the use of data standards. Data standards are the set of rules that govern the way data is collected, recorded, and represented.
4. Using a Data Dictionary helps to define the naming conventions that are used in a model.
There are mainly two types of data dictionary in a database management system:
Every relational database has an Integrated Data Dictionary contained within the DBMS. This integrated data
dictionary acts as a system catalog that is accessed and updated by the relational database. Older databases did not include an integrated data dictionary, so in that case the database administrator had to use a Stand-Alone Data Dictionary. In DBMS, an Integrated Data Dictionary can bind metadata to data.
The Integrated Data Dictionary can be further classified into two types:
Active: An active data dictionary is updated automatically by the DBMS whenever any changes are
made to the database. This is also known as a self-updating dictionary as it keeps the information up-to-
date.
Passive: In contrast to an active dictionary, a passive dictionary needs to be updated manually whenever
any changes are made to the database. This type of data dictionary is difficult to handle as it requires
proper handling. Otherwise, the database and the data dictionary will get unsynchronized.
Stand-Alone Data Dictionary: In DBMS, this type of data dictionary is very flexible as it allows the Database Administrator to define and
manage all the confidential data. It doesn't matter whether the data is computerized or not. A stand-alone data
dictionary allows database designers to interact with end-users regardless of the data dictionary format.
There is no standard format for a data dictionary. Given below are some of the common elements:
1. Data Elements: The Data Dictionary stores the definition of all the data elements, such as name, datatype, storage format, and validation rules.
2. Tables: All information regarding a table, such as the user who created the table, the number of rows and columns, the date on which the table was created and accessed, etc.
3. Indexes: Indexes defined on database tables are stored in the data dictionary. For each index, the DBMS stores the index name, the attributes it uses, the location and characteristics of the index, and the date of creation.
4. Programs: Programs defined to access the database, including reports, application and screen formats, SQL queries, etc., are also stored in the data dictionary.
5. Relationships between data elements: The Data Dictionary stores the type of relationship, for example whether it is mandatory or optional, the cardinality of the relationship, connectivity, etc.
6. Administrators and End-Users: The Data Dictionary stores all the information about the administrators along with the end-users.
The metadata, which is stored in the Data Dictionary, is similar to a monitor that monitors the use of the database
and the allocation of permission to access the database by the users.
As discussed above, most businesses rely on database management systems having an integrated data dictionary as
they are updated automatically and are easy to maintain. Documentation for a data dictionary can be generated in
various types of relational databases like MySQL, SQL Server, Oracle, etc.
While creating a stand-alone data dictionary, the database administrator can take the help of a template in SQL
Server, Oracle, or even Microsoft Excel.
In the above sections, we discussed the advantages of a data dictionary, but dealing with a data dictionary has its
challenges.
A data dictionary can be difficult and time-consuming to create if we have not done any kind of data preparation. Without proper data preparation, a data dictionary might only standardize a part of a database, while doing data preparation for large-scale data can be a huge maintenance burden for little value and can quickly become outdated.
An entity set that does not have a primary key is referred to as a weak entity set.
The existence of a weak entity set depends on the existence of an identifying entity set:
o It must relate to the identifying entity set via a total, one-to-many relationship set from the
identifying to the weak entity set.
o Identifying relationship depicted using a double diamond.
The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all
the entities of a weak entity set.
The primary key of a weak entity set is formed by the primary key of the strong entity set on which the
weak entity set is existence dependent, plus the weak entity set’s discriminator.
We depict a weak entity set by double rectangles. We underline the discriminator of a weak entity set
with a dashed line.
Example:
Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is
implicit in the identifying relationship.
If loan-number were explicitly stored, payment could be made a strong entity, but then the relationship
between payment and loan would be duplicated by an implicit relationship defined by the attribute loan-
number common to payment and loan.
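A minimal sketch using sqlite3 of the loan/payment example: the weak entity payment gets its primary key from the strong entity's key (loan_number) combined with its discriminator (payment_number). The exact table and column layouts are assumptions for illustration.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE loan (loan_number INTEGER PRIMARY KEY, amount REAL)")
con.execute("""
    CREATE TABLE payment (
        loan_number    INTEGER NOT NULL REFERENCES loan(loan_number),
        payment_number INTEGER NOT NULL,          -- discriminator (partial key)
        payment_amount REAL,
        PRIMARY KEY (loan_number, payment_number) -- strong entity key + discriminator
    )
""")
con.execute("INSERT INTO loan VALUES (17, 1000.0)")
con.execute("INSERT INTO payment VALUES (17, 1, 250.0)")   # identified via loan 17
con.commit()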
Deadlock Handling
Ostrich Algorithm
Ostrich Algorithm is an approach of handling deadlock that involves ignoring the deadlock and
pretending that it would never occur. This approach derives its name from the behavior of an Ostrich
which is “to stick one’s head in the sand and pretend there is no problem”. Windows and UNIX-based
systems use this approach of handling a deadlock.
This is because deadlock is a very rare case and the cost of handling a deadlock is very high. You might have encountered a situation when your system hung and a restart was needed to fix it. In this case, the operating system ignores the deadlock because the time required to handle it is higher than the time needed to reboot Windows. Rebooting is the preferred choice, considering the rarity of deadlocks in Windows.
Deadlock Avoidance
Deadlock avoidance is a technique of detecting any deadlock in advance. Methods like Wait-
For graph can be used in smaller databases to detect deadlocks, but in the case of larger
databases deadlock prevention measures have to be used.
When a database gets stuck in a state of deadlock, it is preferred to avoid the deadlock rather than aborting or rebooting the database server, as that wastes both time and resources.
Deadlock Detection
During a database transaction, if a task waits indefinitely to obtain a resource, then the DBMS has to check whether that task is in a state of deadlock or not. To detect a deadlock, a resource scheduler is used. A resource scheduler can detect a deadlock by keeping track of the resources allocated to a specific transaction and requested by another transaction.
Wait-For Graph: This method of detecting deadlocks involves creating a graph based on a
transaction and its acquired lock (a lock is a way of preventing multiple transactions from
accessing any resource simultaneously). If the created graph contains a cycle/loop, it means a
deadlock exists.
DBMS creates a wait-for graph for every transaction/task that is in a waiting state and keeps on
checking whether there exists a cycle in any of the graphs or not
The above is a wait-for graph representation for two transactions T1 and T2 in a deadlock situation.
Deadlock Prevention
1. Avoiding one or more of the Coffman conditions (mutual exclusion, hold and wait, no preemption, circular wait) can lead to the prevention of a deadlock. Deadlock prevention in larger databases is much more feasible than handling a deadlock after it has occurred.
2. The DBMS is made to efficiently analyze every database transaction, whether they can cause a
deadlock or not, if any of the transactions can lead to a deadlock, then that transaction is never
executed.
3. Wait-Die Scheme: When a transaction requests a resource that is already locked by some other
transaction, then the DBMS checks the timestamp of both the transactions and makes the older
transaction wait until that resource is available for execution.
4. Wound-wait Scheme: When an older transaction demands a resource that is already locked by a
younger transaction (a transaction that is initiated later), the younger transaction is forced to kill/stop its
processing and release the locked resource for the older transaction's execution. The younger transaction is restarted later with a small delay, but its timestamp remains the same. If a younger
transaction requests a resource held by an older one, the younger transaction is made to wait until the
older one releases the resource.
3NF vs BCNF
Strength:
   3NF is comparatively less strong than BCNF.
   BCNF is comparatively much stronger than 3NF.
Functional Dependencies:
   In the case of 3NF, preservation occurs for all the functional dependencies.
   In the case of BCNF, preservation is not guaranteed for all the functional dependencies.
Conservative 2-PL vs Strict 2-PL
1. In Conservative 2-PL, a transaction has to acquire locks on all the data items it requires before the transaction begins its execution. In Strict 2-PL, a transaction can acquire locks on data items whenever it requires them (only in the growing phase) during its execution.
4. Conservative 2-PL ensures that the schedule generated is serializable and deadlock-free. Strict 2-PL ensures that the schedule generated is serializable, recoverable and cascadeless.
5. Conservative 2-PL does not ensure a recoverable and cascadeless schedule. Strict 2-PL does not ensure a deadlock-free schedule.
6. Conservative 2-PL does not ensure a strict schedule. Strict 2-PL ensures that the schedule generated is strict.
7. Conservative 2-PL is less popular as compared to Strict 2-PL. Strict 2-PL is the most popular variation of 2-PL.
9. In Conservative 2-PL, a transaction can read a value written by an uncommitted transaction. In Strict 2-PL, a transaction only reads values of committed transactions.
26. Discuss the advantages and disadvantages of using DBMS approach as compared to using a
conventional file system.
Answer: pdf book from page 5 to 9.
Generalization
Generalization is the process of combining two or more entities that share common attributes or properties into a single generalized entity. The entity that is created contains the common features. Generalization is a bottom-up process.
We can have three sub entities as Car, Truck, Motorcycle and these three entities can be generalized into one
general super class as Vehicle.
It is a form of abstraction that specifies two or more entities (sub class) having common characters that can be
generalized into one single entity (super class) at higher level hiding all the differences.
Specialization
Specialization is a process of identifying subsets of an entity that share distinguishing characteristics. It breaks an
entity into multiple entities from higher level (super class) to lower level (sub class). The breaking of higher level
entity is based on some distinguishing characteristics of the entities in super class.
It is a top down approach in which we first define the super class and then sub class and then their attributes and
relationships.
Aggregation
Aggregation represents relationship between a whole object and its component. Using aggregation we can express
relationship among relationships. Aggregation shows ‘has-a’ or ‘is-part-of’ relationship between entities where
one represents the ‘whole’ and other ‘part’.
Consider a ternary relationship Works_On between Employee, Branch and Manager. The best way to model this situation is to use aggregation: the relationship set Works_On is treated as a higher-level entity set, handled in the same manner as any other entity set. We can then create a binary relationship between Works_On and Manager to represent who manages what tasks.
28. Explain the terms 'partial functional dependency' and 'non-transitive dependency' with example.
Answer:
Partial Dependency
Partial Dependency occurs when a non-prime attribute is functionally dependent on part of a candidate key.
The 2nd Normal Form (2NF) eliminates the Partial Dependency.
Let us see an example −
Example
<StudentProject>
StudentID = Unique ID of the student
StudentName = Name of the student
ProjectNo = Unique ID of the project
ProjectName = Name of the project
As stated, the non-prime attributes i.e. StudentName and ProjectName should be functionally dependent on part
of a candidate key, to be Partial Dependent.
The StudentName can be determined by StudentID, which makes the relation Partial Dependent.
The ProjectName can be determined by ProjectNo, which makes the relation Partial Dependent.
Therefore, the <StudentProject> relation violates the 2NF in Normalization and is considered a bad database
design.
To remove Partial Dependency and violation on 2NF, decompose the tables −
<StudentInfo> (StudentID, StudentName, ProjectNo)
<ProjectInfo> (ProjectNo, ProjectName)
ProjectNo ProjectName
199 Geo Location
120 Cluster Exploration
Now the relation is in 2nd Normal form of Database Normalization.
29. With suitable examples show how recovery in a database system can be done using LOG file with :
i) immediate updation
ii) deferred updation.
Answer:
o The deferred modification technique occurs if the transaction does not modify the database until it has
committed.
o In this method, all the logs are created and stored in the stable storage, and the database is updated when
a transaction commits.
2. Immediate database modification:
o The Immediate modification technique occurs if database modification occurs while the transaction is
still active.
o In this technique, the database is modified immediately after every operation. It follows an actual
database modification.
For a tangible example, let’s look at the orders table in our database again. The user_id column here
corresponds with the user_id column in the users table, and the product_sku column corresponds with
the product_sku column in the books table.
When we’re setting up this table, it would make sense to add foreign key rules to
both orders.user_id and orders.product_sku:
Using these foreign keys saves us from having to store the same data repeatedly – we don’t have to
store the user’s name in the orders table, because we can use orders.user_id to reference that user’s
unique row in users.user_id to get their name and other information about them.
But the real purpose of foreign keys is that they add a restriction: entries to the table with a foreign
key must have a value that corresponds with the ‘foreign’ table column.
This restriction is called a foreign key constraint. Let’s take a look at foreign key constraints in more
detail.
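A hedged sketch using sqlite3 of the orders/users relationship described above; only user_id and product_sku come from the text, the remaining column names are assumptions.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")            # sqlite needs this to enforce FKs
con.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE books (product_sku TEXT PRIMARY KEY, title TEXT)")
con.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        user_id     INTEGER REFERENCES users(user_id),      -- foreign key 1
        product_sku TEXT    REFERENCES books(product_sku)   -- foreign key 2
    )
""")
con.execute("INSERT INTO users VALUES (1, 'Asha')")
con.execute("INSERT INTO books VALUES ('SKU-9', 'DBMS Notes')")
con.execute("INSERT INTO orders VALUES (100, 1, 'SKU-9')")  # OK: both parent rows exist

try:
    con.execute("INSERT INTO orders VALUES (101, 999, 'SKU-9')")  # no such user
except sqlite3.IntegrityError as e:
    print("rejected by the foreign key constraint:", e)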
o The Timestamp Ordering Protocol is used to order the transactions based on their timestamps. The lock-based protocol manages the order among conflicting transactions at execution time, but timestamp-based protocols start working as soon as a transaction is created.
o Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the system at time 007 and transaction T2 entered the system at time 009. T1 has the higher priority, so it executes first. The protocol also maintains the timestamps of the last 'read' and 'write' operations on each data item.
1. Check the following condition whenever a transaction Ti issues a Read (X) operation:
o If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise the operation is executed.
Where W_TS(X) denotes the largest timestamp of any transaction that executed Write(X) successfully.
o But the schedule may not be recoverable and may not even be cascade-free.
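A minimal sketch of the Read(X) rule just stated; the DataItem class and the W_TS/R_TS bookkeeping names are invented for illustration.

class DataItem:
    def __init__(self):
        self.w_ts = 0   # largest timestamp of any successful Write(X)
        self.r_ts = 0   # largest timestamp of any successful Read(X)

def read(ts_ti, item):
    """Apply the timestamp-ordering check for Read(X) issued by Ti."""
    if ts_ti < item.w_ts:
        return "reject: roll back Ti"          # Ti is too old; a newer write already happened
    item.r_ts = max(item.r_ts, ts_ti)          # otherwise execute and update R_TS(X)
    return "execute Read(X)"

x = DataItem()
x.w_ts = 9                                     # X was last written by a transaction with TS = 9
print(read(7, x))                              # TS = 7: rejected, rolled back
print(read(11, x))                             # TS = 11: executed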
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the operations, where
transactions are about to execute. The DBMS inspects the operations and analyzes if they can create a deadlock
situation. If it finds that a deadlock situation might occur, then that transaction is never allowed to be executed.
There are deadlock prevention schemes that use timestamp ordering mechanism of transactions in order to
predetermine a deadlock situation.
Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data item), which is already held with a conflicting lock
by another transaction, then one of the two possibilities may occur −
If TS(Ti) < TS(Tj) − that is Ti, which is requesting a conflicting lock, is older than Tj − then Ti is allowed to
wait until the data-item is available.
If TS(Ti) > TS(Tj) − that is Ti is younger than Tj − then Ti dies. Ti is restarted later with a random delay but
with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
In this scheme, if a transaction requests to lock a resource (data item), which is already held with conflicting lock
by some another transaction, one of the two possibilities may occur −
If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is Ti wounds Tj. Tj is restarted later with a
random delay but with the same timestamp.
If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait; but when an older transaction requests an item held by a
younger one, the older transaction forces the younger one to abort and release the item.
In both the cases, the transaction that enters the system at a later stage is aborted.
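A small sketch (function names invented) that contrasts the two decisions for a requesting transaction Ti and a lock-holding transaction Tj:

def wait_die(ts_ti, ts_tj):
    # Older requester waits; younger requester dies (rolled back, restarted with same timestamp).
    return "Ti waits" if ts_ti < ts_tj else "Ti dies (restart later, same timestamp)"

def wound_wait(ts_ti, ts_tj):
    # Older requester wounds (preempts) the younger holder; younger requester waits.
    return "Ti wounds Tj (Tj restarts)" if ts_ti < ts_tj else "Ti waits"

print(wait_die(5, 9))    # Ti is older   -> Ti waits
print(wait_die(9, 5))    # Ti is younger -> Ti dies
print(wound_wait(5, 9))  # Ti is older   -> Tj is wounded
print(wound_wait(9, 5))  # Ti is younger -> Ti waits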
Deadlock Avoidance
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance mechanisms can be used to
detect any deadlock situation in advance. Methods like "wait-for graph" are available but they are suitable for only
those systems where transactions are lightweight having fewer instances of resource. In a bulky system, deadlock
prevention techniques may work well.
Wait-for Graph
This is a simple method available to track if any deadlock situation may arise. For each transaction entering into
the system, a node is created. When a transaction Ti requests for a lock on an item, say X, which is held by some
other transaction Tj, a directed edge is created from Ti to Tj. If Tj releases item X, the edge between them is
dropped and Ti locks the data item.
The system maintains this wait-for graph for every transaction waiting for some data items held by others. The
system keeps checking if there's any cycle in the graph.
The second option is to roll back one of the transactions. It is not always feasible to roll back the younger transaction, as it may be more important than the older one. With the help of some algorithm, a transaction is chosen to be aborted. This transaction is known as the victim, and the process is known as victim selection.
If we union the sub relations X1 and X2, then it should consist of all the attributes available before the
decomposition in the original relation X.
The intersection of X1 and X2 must never be empty; there must be a common attribute in the sub relations, and this common attribute must contain unique data/information. Here, the common attribute needs to be a super key of at least one of the sub relations, X1 or X2.
In this case,
X = (P, Q, R)
X1 = (P, Q)
X2 = (Q, R)
The relation X here consists of three attributes P, Q, and R. It decomposes into two separate relations X1 and X2, each of which has two attributes. The common attribute between them is Q.
Remember that the value present in column Q has to be unique. In case it consists of a duplicate value, then a
lossless-join decomposition would not be possible here.
Example 1
Draw a table with the relation X that has raw data:
X (P, Q, R)
P Q R
37 25 16
29 18 35
16 39 28
This relation would decompose into the following sub relations, X1 and X2:
X1 (P, Q)
P Q
37 25
29 18
16 39
X2 (Q, R)
Q R
25 16
18 35
39 28
Let us now check whether the decomposition satisfies the lossless-join condition. Here, the natural join of the sub relations X1 and X2 generates the same result as the relation X:
X1 ⋈ X2 = X
Here, we will get the result as follows:
X (P, Q, R)
P Q R
37 25 16
29 18 35
16 39 28
This relation is similar to the original relation X. Thus, this decomposition can be considered as the lossless join
decomposition in DBMS.
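A quick sketch, using the rows of Example 1, that checks the lossless-join property by projecting X onto X1 and X2 and joining them back on the common attribute Q:

X = {(37, 25, 16), (29, 18, 35), (16, 39, 28)}

X1 = {(p, q) for (p, q, r) in X}                # project X onto (P, Q)
X2 = {(q, r) for (p, q, r) in X}                # project X onto (Q, R)

# Natural join of X1 and X2 on the common attribute Q.
joined = {(p, q1, r) for (p, q1) in X1 for (q2, r) in X2 if q1 == q2}

print(joined == X)   # True: the decomposition is lossless for this instance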
Example 2
Let us take a look at an example:
<Cand_Info>
Sec_1 G0091 HR
Indexing is used to quickly retrieve particular data from the database. Formally, we can define indexing as a technique that uses data structures to optimize the searching time of a database query. Indexing reduces the number of disk accesses required to reach a particular record by internally creating an index table.
Index usually consists of two columns which are a key-value pair. The two columns of the index table(i.e., the
key-value pair) contain copies of selected columns of the tabular data of the database.
Here, Search Key contains the copy of the Primary Key or the Candidate Key of the database table. Generally, we
store the selected Primary or Candidate keys in a sorted manner so that we can reduce the overall query time or
search time(from linear to binary).
Data Reference contains a set of pointers that holds the address of the disk block. The pointed disk block contains
the actual data referred to by the Search Key. Data Reference is also called Block Pointer because it uses block-
based addressing.
Ordered indexing is the traditional way of storing that gives fast retrieval. The indices are stored in a sorted manner hence it is
also known as ordered indices.
1. Dense Indexing: In dense indexing, the index table contains records for every search key value of the database. This
makes searching faster but requires a lot more space. It is like primary indexing but contains a record for every search
key.
Example:
2. Sparse Indexing: Sparse indexing consumes less space than dense indexing, but it is a bit slower as well. We do not include a search key for every record; instead, we store search keys that each point to a block. The pointed block further contains a group of data. Sometimes we have to perform a double search, which makes sparse indexing a bit slower.
Example:
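A rough sketch (with invented records) of the difference described above: the dense index has one entry per search-key value, while the sparse index has one entry per block and needs a short scan inside the located block.

from bisect import bisect_right

# "Blocks" of records sorted by Roll_no.
blocks = [
    [(1, "Asha"), (2, "Rafi"), (3, "Mina")],
    [(4, "Joy"), (5, "Lina"), (6, "Noor")],
]

# Dense index: every search-key value -> (block number, offset).
dense = {key: (b, i) for b, block in enumerate(blocks) for i, (key, _) in enumerate(block)}

# Sparse index: only the first key of each block.
sparse_keys = [block[0][0] for block in blocks]        # [1, 4]

def sparse_lookup(key):
    b = bisect_right(sparse_keys, key) - 1             # pick the block whose first key <= key
    for k, value in blocks[b]:                          # then scan inside that block
        if k == key:
            return value
    return None

print(dense[5])          # (1, 1): direct hit in the dense index
print(sparse_lookup(5))  # 'Lina': block located via the sparse index, then scanned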
Answer:
According to the attributes defined above, we divide indexing into three types:
It is somewhat like the index (or the table of contents) found in a book. Index of a book contains topic names along
with the page number similarly the index table of the database contains keys and their corresponding block
address.
1. Primary Indexing: The indexing or the index table created using primary keys is known as primary indexing. It is defined on ordered data. As the index is composed of primary keys, the entries are unique, not null, and have a one-to-one relationship with the data blocks.
Example:
Characteristics of Primary Indexing:
2. Secondary Indexing: It is a two-level indexing technique used to reduce the mapping size of the primary index. The secondary index points to a certain location where the data is to be found, but the actual data is not sorted as in primary indexing. Secondary indexing is also known as non-clustered indexing.
Example:
3. Cluster Indexing: Clustered indexing is used when there are multiple related records found at one place. It is defined on ordered data. The important thing to note here is that the index table of clustered indexing is created using non-key values, which may or may not be unique. To achieve faster retrieval, we group columns having similar characteristics, and the indexes are created using these groups; this process is known as clustering indexing.
Example:
Characteristics of Clustered Indexing:
40. All candidate keys are super keys, but not all super keys are candidate keys. Justify with a suitable example.
Answer:
A superkey SK is a subset of attributes of a relation schema R, such that for any two
distinct tuples t1 and t2 in relation state r of R, we have t1 [SK] ≠ t2 [SK]. For example,
consider a relation schema BOOK with three attributes ISBN, Book_title, and Category. The
value of ISBN is unique for each tuple; hence, {ISBN } is a superkey. In addition, the
combination of all the attributes, that is, {ISBN, Book_title, Category} is a default superkey for
this relation schema.
Generally, all the attributes of a superkey are not required to identify each tuple uniquely
in a relation. Instead, only a subset of attributes of the superkey is sufficient to uniquely
identify each tuple. Further, if any attribute is removed from this subset, the remaining set
of attributes can no longer serve as a superkey. Such a minimal set of attributes, say K, is
a candidate key (also known as irreducible superkey). For example, the
superkey {ISBN, Book_title, Category} is not a candidate key, since its subset {ISBN} is a
minimal set of attributes that uniquely identify each tuple of BOOK relation. So, ISBN is a
candidate key as well as superkey. Hence, it is concluded that all candidate keys are
superkeys, whereas all superkeys are not candidate keys.
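A small sketch (sample rows invented) of these definitions on a relation instance: a superkey allows no two tuples to agree on it, and a candidate key is a superkey from which no attribute can be removed. Strictly, keys are properties of the schema, so checking a single instance is only illustrative.

from itertools import combinations

BOOK = [
    {"ISBN": "111", "Book_title": "DBMS", "Category": "CS"},
    {"ISBN": "222", "Book_title": "Networks", "Category": "CS"},
    {"ISBN": "333", "Book_title": "DBMS", "Category": "CS"},
]

def is_superkey(rows, attrs):
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))          # no two tuples share the same value on attrs

def is_candidate_key(rows, attrs):
    if not is_superkey(rows, attrs):
        return False
    # minimal: no proper subset may itself be a superkey
    return not any(is_superkey(rows, list(sub))
                   for k in range(1, len(attrs))
                   for sub in combinations(attrs, k))

print(is_superkey(BOOK, ["ISBN", "Book_title", "Category"]))      # True  (a superkey)
print(is_candidate_key(BOOK, ["ISBN", "Book_title", "Category"])) # False (not minimal)
print(is_candidate_key(BOOK, ["ISBN"]))                           # True  (a candidate key)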
A transitive dependency can occur only in a relation of three or more attributes. Identifying such dependencies helps us normalize the database into the 3rd Normal Form (3NF).