Database Management System Notes
Now, both tables satisfy BCNF because in each table, the determinant (EmployeeID,
ProjectID) is a superkey, and there are no non-trivial dependencies on proper subsets
of superkeys.
Q 4. What do you mean by lossless decomposition? Explain with a suitable
example how functional dependencies can be used to show that
decompositions are lossless.
Ans- Lossless decomposition is a property of database normalization that
ensures that when a relation (table) is decomposed into multiple smaller
relations, the original information can be reconstructed without loss. In other
words, after decomposing a relation, joining the decomposed relations should
yield the original relation.
Functional dependencies play a crucial role in demonstrating lossless
decomposition. A decomposition of a relation R into R1 and R2 is lossless if
the common attributes of R1 and R2 form a superkey of at least one of the two,
and this condition can be checked directly from the functional dependencies
that hold on R.
Example-
Consider a relation R with attributes (A, B, C) and a functional dependency
A → B.
Original relation R:
|A|B|C|
|---|---|---|
|1|x|p|
|2|y|q|
|3|z|r|
Now, let's decompose R into two relations:
1. Relation R1 with attributes (A, B)
2. Relation R2 with attributes (A, C)
Both R1 and R2 have the common attribute A, and since A → B, the common
attribute A is a key of R1. We want to show that the decomposition is
lossless.
1. Projection (π) on the Common Attribute (A):
• π(A) on R1 and R2 yields the same set of values:
R1:
|A|
|---|
|1|
|2|
|3|
R2:
|A|
|---|
|1|
|2|
|3|
2. Join (⨝) on the Common Attribute (A):
• Joining R1 and R2 using the common attribute A results in a relation with
attributes (A, B, C):
R1 ⨝ R2:
|A|B|C|
|---|---|---|
|1|x|p|
|2|y|q|
|3|z|r|
This result contains exactly the tuples of the original relation R, with no
spurious tuples. The decomposition is lossless: because A → B makes A a key
of R1, joining R1 and R2 on the common attribute A reconstructs R exactly.
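The check above can be sketched in a few lines of Python, using sets of tuples to stand in for relation instances (an illustrative sketch, not DBMS code):

```python
# A sketch of the lossless-join check: project R onto (A, B) and (A, C),
# then natural-join the projections back together and compare with R.
R = {(1, "x", "p"), (2, "y", "q"), (3, "z", "r")}   # R(A, B, C)

R1 = {(a, b) for (a, b, c) in R}   # projection of R onto (A, B)
R2 = {(a, c) for (a, b, c) in R}   # projection of R onto (A, C)

# Natural join of R1 and R2 on the common attribute A.
joined = {(a, b, c) for (a, b) in R1 for (a2, c) in R2 if a == a2}

# Because A -> B makes A a key of R1, the join introduces no spurious
# tuples and recovers R exactly.
print(joined == R)  # True: the decomposition is lossless
```

If the common attribute were not a key of either projection, the join would generally produce extra (spurious) tuples and the comparison would fail.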
Q 5. What are MVD and join dependency? Describe.
Ans-
Multivalued Dependency (MVD):
A Multi-Valued Dependency (MVD) is a concept in database theory that
extends the idea of functional dependencies to handle situations where one
attribute determines a set of values of another attribute, independently of
the remaining attributes. MVDs express relationships involving sets of
values rather than individual values.
Given a relation R with attributes A, B, and C, an MVD A →→ B (read as "A
multi-determines B") means that for each combination of values of A, there is a
set of values for B that is independent of the other attributes in the relation.
For example, consider a relation representing employees and their projects:
| EmployeeID | ProjectID | Skills |
|------------|-----------|--------------|
|1 | 101 | Programming |
|1 | 102 | Database |
|2 | 101 | Design |
Here, if we have the MVD EmployeeID →→ Skills, it means that for each
employee, the set of skills is determined independently of the projects that
the employee works on.
Join Dependency:
• A join dependency is a constraint on a relational schema that specifies
how a relation can be reconstructed by joining other relations. It deals
with the relationships among attributes across multiple relations.
• If a set of relations {R1, R2, ..., Rn} satisfies a join dependency D, it means
that any legal instance of the relations can be reconstructed by joining
them.
• A join dependency is represented as D = {R1, R2, ..., Rn}, where the set of
relations {R1, R2, ..., Rn} must satisfy the join dependency.
For example:
R1:
| A | B |
|---|---|
| 1 | x |
| 2 | y |
R2:
| A | C |
|---|---|
| 1 | p |
| 2 | q |
The join dependency D = {R1, R2} indicates that any instance of R1 and R2
can be reconstructed by joining them on the common attribute A.
Q 6. Explain the fourth and fifth normal forms with suitable examples. Also
differentiate between BCNF and 3NF.
Ans- Fourth Normal Form (4NF):
• Definition:
• A relation is in 4NF if it is in Boyce-Codd Normal Form (BCNF) and
has no non-trivial multivalued dependencies (MVDs).
• An MVD X →→ Y states that, for each value of X, there is a set of
values for Y that is independent of all other attributes.
• Example:
• Consider a relation with attributes (EmployeeID, ProjectID, Skill),
and the MVD EmployeeID →→ ProjectID.
| EmployeeID | ProjectID | Skill |
|------------|-----------|---------|
|1 | 101 | Java |
|1 | 102 | SQL |
|2 | 101 | Python |
|2 | 103 | Java |
In this example, for each employee (EmployeeID), there is a set of projects
(ProjectID) associated with them, and the skill is independent of other
attributes. To bring it to 4NF, you might create two relations: one for
(EmployeeID, ProjectID) and another for (EmployeeID, Skill).
Fifth Normal Form (5NF):
• Definition:
• A relation is in 5NF if it is in 4NF and has no non-trivial join
dependencies.
• A join dependency is a constraint on a relational schema that
specifies how a relation can be reconstructed by joining other
relations.
• Example:
Consider a relation with attributes (CourseID, StudentID, Instructor) and the
join dependency ⨝{(CourseID, StudentID), (CourseID, Instructor),
(StudentID, Instructor)}.
| CourseID | StudentID | Instructor  |
|----------|-----------|-------------|
| Math     | 1         | Dr. Smith   |
| Math     | 2         | Dr. Johnson |
| Physics  | 1         | Dr. White   |
| Physics  | 2         | Dr. Brown   |
In this example, the join dependency asserts that the relation can be
reconstructed by joining its projections onto (CourseID, StudentID),
(CourseID, Instructor), and (StudentID, Instructor). When such a non-trivial
join dependency holds, achieving 5NF means decomposing the relation into
those smaller projections; a relation with no non-trivial join dependency is
already in 5NF.
| Sno. | Criteria | BCNF | 3NF |
|------|----------|------|-----|
| 5. | Example | Consider a table with attributes (A, B, C) and functional dependencies A → B, B → C. Decomposing it to BCNF results in two tables: (A, B) and (B, C). | Consider a table with attributes (A, B, C) and functional dependencies A → B, B → C. Decomposing it to 3NF results in two tables: (A, B) and (B, C). |
1. Recoverable Schedule:
• This schedule is recoverable because a transaction that reads a value
written by another transaction commits only after the writing transaction
has committed. Here, if T2 reads a value written by T1, T2 commits only
after T1 commits, so no committed transaction ever depends on changes
that are later rolled back.
2. Cascadeless Schedule:
• This schedule is cascadeless because transactions read only committed
values. T2's read occurs only after the transaction that wrote the item has
committed, so the abort of one transaction never forces (cascades into)
the rollback of another.
3. Strict Schedule:
• This schedule is strict because no transaction reads or overwrites a data
item until the transaction that last wrote it has committed or aborted.
Serializable Schedule: A schedule is serializable if it is equivalent to some serial
execution of its transactions. Serializable schedules ensure that the final state
of the database is the same as if the transactions were executed in some serial
order.
Conflict Serializability:
Conflict serializability is a criterion for determining whether a schedule is
equivalent to some serial execution. Two actions conflict if they belong to
different transactions, and at least one of them is a write operation. A schedule
is conflict serializable if it can be transformed into a serial schedule by
swapping non-conflicting actions while preserving the order of conflicting
actions.
Example:
In this schedule, the conflicting actions are the write operations on A in T1 and
T2. To determine conflict serializability, we can swap non-conflicting actions
while preserving the order of conflicting actions. In this case, the read
operations do not conflict, so we can swap them:
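Since the schedule discussed above is not reproduced in these notes, the following sketch uses a hypothetical two-transaction schedule to show how conflict serializability is usually tested in practice: build a precedence graph from the conflicting pairs, then check it for a cycle (a cycle means the schedule is not conflict serializable).

```python
# A small conflict-serializability test using a precedence graph.
# The schedule is a hypothetical list of (transaction, action, item) steps.
schedule = [
    ("T1", "R", "A"), ("T2", "R", "A"),
    ("T1", "W", "A"), ("T2", "W", "A"),
]

# Two steps conflict if they come from different transactions, touch the
# same item, and at least one of them is a write.
edges = set()
for i, (t1, a1, x1) in enumerate(schedule):
    for t2, a2, x2 in schedule[i + 1:]:
        if t1 != t2 and x1 == x2 and "W" in (a1, a2):
            edges.add((t1, t2))  # t1's step precedes t2's conflicting step

def has_cycle(edges):
    """DFS over the precedence graph; a cycle means not conflict serializable."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def visit(node, path):
        if node in path:
            return True
        return any(visit(n, path | {node}) for n in graph.get(node, ()))
    return any(visit(u, set()) for u in graph)

print(has_cycle(edges))  # True: this schedule is not conflict serializable
```

Swapping only non-conflicting actions can never remove the T1 → T2 → T1 cycle, which is why this lost-update interleaving has no equivalent serial order.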
Q 3. What is a log? Explain log-based recovery. What is a log file? Write the
steps for log-based recovery of a system with a suitable example.
Ans- Log: A log is a sequential record of events, actions, or changes made to a
system. Logs are crucial for maintaining system integrity, recovering from
failures, and ensuring consistency.
Log-Based Recovery: Log-based recovery is a mechanism used to restore a
system to a consistent state after a failure. It involves the use of a transaction
log, which records all changes made to the system's database. In the event of a
system failure, the log is consulted to redo or undo transactions, ensuring that
the database is brought back to a consistent state.
Log File: A log file is a file that contains the transaction log, which records the
sequence of actions or changes made to the system. Each entry in the log file
corresponds to a specific event or operation, such as the start or end of a
transaction, a commit, or a write operation.
Steps for Log-Based Recovery: Log-based recovery typically involves three
phases: the redo phase, the undo phase, and the commit phase. Here are the
steps for log-based recovery:
1. Redo Phase:
• During the redo phase, transactions that were in progress at the
time of failure are redone to ensure that their changes are applied
to the database.
2. Undo Phase:
• During the undo phase, transactions that were not completed (i.e., not
committed) at the time of failure are undone to rollback their changes and
maintain consistency.
3. Commit Phase:
• After the redo and undo phases, the system reaches a consistent state. The
commit phase involves marking all recovered transactions as committed.
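The redo and undo phases can be sketched as follows. The log record layout (start / write-with-old-and-new-value / commit) is an assumed textbook-style format, not any particular DBMS's:

```python
# A minimal sketch of the redo and undo phases of log-based recovery.
log = [
    ("start", "T1"),
    ("write", "T1", "A", 100, 200),  # (op, txn, item, old_value, new_value)
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "B", 50, 75),    # T2 never commits and must be undone
]

db = {"A": 100, "B": 50}             # database state at the time of failure

committed = {rec[1] for rec in log if rec[0] == "commit"}

# Redo phase: reapply the writes of committed transactions, in log order.
for rec in log:
    if rec[0] == "write" and rec[1] in committed:
        _, _, item, old, new = rec
        db[item] = new

# Undo phase: roll back the writes of uncommitted transactions, newest first.
for rec in reversed(log):
    if rec[0] == "write" and rec[1] not in committed:
        _, _, item, old, new = rec
        db[item] = old

print(db)  # {'A': 200, 'B': 50}: T1's update survives, T2's is rolled back
```

Storing both the old and the new value in each write record is what makes the same log usable for both phases: new values drive redo, old values drive undo.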
Q 5. What are distributed databases and their types? What are the advantages
and disadvantages of distributed databases? What is the difference between
data replication and data fragmentation, with all their types?
Ans-
1. A distributed database system consists of a collection of sites, connected
together through a communication network.
2. Each site is a database system site in its own right, and the sites have
agreed to work together, so that a user at any site can access data anywhere
in the network as if the data were all stored at the user's local site.
3. Each site has its own local database.
4. A distributed database is fragmented into smaller data sets.
5. A DDBMS can handle both local and global transactions.
Distributed databases are classified as :
1. Homogeneous distributed database :
a. In this, all sites have identical database management system software.
b. All sites are aware of one another, and agree to co-operate in
processing user’s requests.
2. Heterogeneous distributed database :
a. In this, different sites may use different schemas, and different
database management system software.
b. The sites may not be aware of one another, and they may provide only
limited facilities for co-operation in transaction processing.
Advantages of Distributed Databases:
1. Improved Availability and Reliability:
• Distribution of data across multiple locations reduces the risk of a
single point of failure, enhancing system reliability and availability.
2. Scalability:
• Distributed databases can be easily scaled horizontally by adding
more nodes to the network, accommodating increasing data and
user demands.
3. Improved Performance:
• Data can be located closer to the users, reducing latency and
improving query performance.
4. Fault Tolerance:
• Distributed databases can continue to function even if some nodes
experience failures, ensuring fault tolerance and data integrity.
5. Local Autonomy:
• Each local site may have some degree of autonomy, allowing it to
manage its portion of the database independently.
Disadvantages of Distributed Databases:
1. Complexity:
• Managing a distributed system is more complex than a centralized
one, requiring sophisticated coordination and communication
mechanisms.
2. Security Challenges:
• Distributed systems may face additional security challenges due to
the need for secure communication and coordination across
different nodes.
3. Data Consistency:
• Ensuring consistency of data across distributed nodes can be
challenging, especially in the presence of network failures or
delays.
4. Cost:
• The initial setup and maintenance costs of a distributed database
system can be higher compared to a centralized system.
5. Synchronization Issues:
• Coordinating updates and ensuring consistency among distributed
copies of the database can lead to synchronization challenges.
Difference between Data Replication and Data Fragmentation:
Data Replication:
• Definition: Data replication involves creating and maintaining copies of
the same data at multiple locations.
• Types:
• Full Replication: Entire database is copied to each site.
• Partial Replication: Only a subset of the database is copied to each
site.
• Advantages:
• Improved data availability and reliability.
• Enhanced query performance, as data can be accessed locally.
• Disadvantages:
• Increased storage requirements.
• Synchronization challenges to maintain consistency.
Data Fragmentation:
• Definition: Data fragmentation involves dividing a database into
fragments, each stored at different locations.
• Types:
• Horizontal Fragmentation: Divides the rows of a table.
• Vertical Fragmentation: Divides the columns of a table.
• Hybrid Fragmentation: Combination of both horizontal and
vertical fragmentation.
• Advantages:
• Improved query performance, as each site only accesses relevant
fragments.
• Allows for local autonomy in managing specific fragments.
• Disadvantages:
• Coordination challenges to ensure data consistency.
• Increased complexity in managing fragmented data.
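The two fragmentation styles can be sketched on a toy Employee table; the column names and data below are illustrative assumptions only:

```python
# Horizontal vs. vertical fragmentation on a small in-memory table.
rows = [
    {"id": 1, "name": "Asha", "dept": "Sales"},
    {"id": 2, "name": "Ben", "dept": "HR"},
]

# Horizontal fragmentation: split the rows by a predicate.
sales_frag = [r for r in rows if r["dept"] == "Sales"]
other_frag = [r for r in rows if r["dept"] != "Sales"]

# Vertical fragmentation: split the columns, repeating the key ("id") in
# every fragment so the original rows can be rebuilt by joining on it.
ids_names = [{"id": r["id"], "name": r["name"]} for r in rows]
ids_depts = [{"id": r["id"], "dept": r["dept"]} for r in rows]

# Rebuild the original table from the vertical fragments by joining on "id".
rebuilt = [{**n, **d} for n in ids_names for d in ids_depts
           if n["id"] == d["id"]]
print(rebuilt == rows)  # True
```

Keeping the key in every vertical fragment is what makes the fragmentation lossless, mirroring the lossless-join condition from the normalization sections above.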
Multi-Version Scheme:
In a multi-version concurrency control scheme (MVCC), each write
operation creates a new version of the data item. This allows multiple
versions of the same data item to coexist. Each version is associated with a
timestamp or a version number.
• Read Operation:
• A transaction reads the version of a data item that was committed
before the transaction's start time.
• Write Operation:
• A transaction creates a new version of a data item when it writes.
This new version is associated with the transaction's timestamp.
• Advantages:
• Read operations are not blocked by write operations, and vice
versa.
• Provides a consistent snapshot of the database for each
transaction.
• Disadvantages:
• Increased storage requirements due to multiple versions of data
items.
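The read and write rules above can be sketched with a toy version store; the class name and integer timestamps are illustrative assumptions:

```python
# A toy multi-version store: each write appends a new version, and a
# reader sees the newest version committed at or before its start time.
class MVCCStore:
    def __init__(self):
        self.versions = {}  # item -> list of (commit_ts, value)

    def write(self, item, value, commit_ts):
        # Each write creates a new version instead of overwriting in place.
        self.versions.setdefault(item, []).append((commit_ts, value))

    def read(self, item, start_ts):
        # Return the newest version committed at or before the reader's
        # start timestamp, so readers never block on writers.
        visible = [(ts, v) for ts, v in self.versions.get(item, [])
                   if ts <= start_ts]
        return max(visible)[1] if visible else None

store = MVCCStore()
store.write("X", "v1", commit_ts=1)
store.write("X", "v2", commit_ts=5)
print(store.read("X", start_ts=3))  # v1: snapshot as of timestamp 3
print(store.read("X", start_ts=9))  # v2: sees the later committed version
```

Both reads succeed without any locking, which is exactly the advantage listed above; the cost is that old versions must be retained (and eventually garbage-collected).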
Q 2. Explain 2 phase locking for concurrency control.
Ans- the two phase locking(2PL) is a concurrency control mechanism used in
database management system to ensure the consistency and integrity of
the data when multiple transaction are executed concurrently.
The two phase locking protocol consist of 2 phases:
1.Growing phase ( Lock Acquisition)
- During this phase, a transaction is allowed to acquire locks but is not
allowed to release any locks.
- The transaction can acquire locks on data items until it has acquired all the
lock it requires
2. Shrinking phase(Lock Release)
- Once a transaction releases its first lock, it enters the shrinking phase.
- In the shrinking phase, the transaction is not allowed to acquire any new
locks, but it can release locks that it holds.
- After releasing a lock, a transaction cannot acquires any new locks.
- The transaction proceeds to release all its locks.
Advantages of Two-Phase Locking:
1. Serializability: Guarantees that the execution of transactions is
serializable, ensuring consistency of the database.
2. Deadlock Prevention (conservative variant): Conservative 2PL, in which a
transaction acquires all of its locks before it begins executing, prevents
deadlocks; basic 2PL on its own does not rule them out.
Disadvantages of Two-Phase Locking:
1. Conservative Approach: Can lead to inefficiencies when transactions are
unnecessarily delayed due to the conservative lock acquisition strategy.
2. Locking Overhead: The need to acquire and release locks incurs
additional overhead in terms of both time and system resources.
Example- consider two transactions T1 and T2
T1: Lock(x)
Read(x)
Lock(y)
Write(x)
Unlock(x)
Unlock(y)
T2: Lock(y)
Read(y)
Lock(x)
Write(y)
Unlock(y)
Unlock(x)
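The two-phase rule itself (locks may be acquired only while none has yet been released) can be sketched as follows; the class name is an illustrative assumption, and a real lock manager would also handle lock modes and blocking between transactions:

```python
# A sketch of the 2PL growing/shrinking rule for a single transaction.
class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False  # flips to True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(self.name + ": cannot lock " + item +
                               " after releasing a lock (2PL violated)")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True  # entering the shrinking phase
        self.locks.discard(item)

t1 = TwoPhaseTxn("T1")
t1.lock("x")
t1.lock("y")      # growing phase: acquiring locks is allowed
t1.unlock("x")    # first release: T1 enters the shrinking phase
t1.unlock("y")
try:
    t1.lock("z")  # illegal: acquiring after a release violates 2PL
except RuntimeError as err:
    print(err)
```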
Q 3. How do optimistic concurrency control techniques differ from other
concurrency control techniques? Why they are also called validation or
certification techniques? Discuss the typical phases of an optimistic
concurrency control method.
Ans- Optimistic concurrency control (OCC) differs from locking-based
techniques in that transactions execute without acquiring locks and are
checked for conflicts only at commit time. It is also called a validation or
certification technique because each transaction must be validated
(certified) against concurrently executing transactions before its updates
are made permanent. The typical phases are:
1. Read Phase:
• Transactions read data items without acquiring locks. The current state
of the data is recorded or buffered locally for the transaction.
2. Validation Phase:
• At the time of transaction commitment, the system checks whether any
conflicts exist between the locally recorded state of the transaction and
the current state of the database. Conflicts may arise if another
transaction has modified the same data items concurrently.
3. Conflict Detection:
• Conflicts are detected by comparing the timestamps, versions, or other
markers associated with the data items. If conflicts are detected, the
transaction is considered potentially invalid.
4. Resolution of Conflicts:
• If conflicts are detected, the system may employ conflict resolution
mechanisms to address them. This may involve aborting one or more
transactions, forcing them to roll back and retry.
5. Write Phase:
• If the transaction passes the validation phase without conflicts, it
acquires locks on the data items and proceeds to the write phase. The
changes are applied to the database, and the transaction is committed.
Disadvantages:
• Under high contention, many transactions fail validation and must be
rolled back and restarted, wasting the work they have already performed.
Multiple granularity :
1. Multiple granularity can be defined as hierarchically breaking up the
database into blocks which can be locked.
2. It keeps track of what to lock and how to lock.
3. It makes it easy to decide whether to lock or to unlock a data
item.
Implementation :
1. Multiple granularity is implemented in transaction system by defining
multiple levels of granularity by allowing data items to be of various
sizes and defining a hierarchy of data granularity where the small
granularities are nested within larger ones.
2. In the tree, a non-leaf node represents the data associated with its
descendants.
3. Each node is an independent data item.
4. The highest level represents the entire database.
5. Each node in the tree can be locked individually using shared or exclusive
mode locks.
6. If a node is locked in an intention mode, explicit locking is being done at
lower level of the tree (that is, at a finer granularity).
7. Intention locks are put on all the ancestors of a node before that node is
locked explicitly.
8. While traversing the tree, the transaction locks the various nodes in an
intention mode. This hierarchy can be represented graphically as a tree.
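Rule 7 above (intention locks on all ancestors before an explicit lock) can be sketched like this; the hierarchy database > area > table > record and the function name are illustrative assumptions:

```python
# Compute the lock sequence for an explicit lock at some granularity:
# intention locks on every ancestor, root first, then the node itself.
parent = {           # child -> parent in the granularity tree
    "record1": "table1",
    "table1": "area1",
    "area1": "database",
}

def lock_with_intentions(node, mode):
    """Return the sequence of locks to take, root first."""
    path = []
    cur = node
    while cur in parent:          # walk up to the root collecting ancestors
        cur = parent[cur]
        path.append(cur)
    # IX (intention-exclusive) for X locks, IS (intention-shared) for S locks.
    intention = "IX" if mode == "X" else "IS"
    return [(a, intention) for a in reversed(path)] + [(node, mode)]

print(lock_with_intentions("record1", "X"))
# [('database', 'IX'), ('area1', 'IX'), ('table1', 'IX'), ('record1', 'X')]
```

Because every explicit lock leaves intention marks along the path from the root, a transaction wanting to lock a whole subtree can detect finer-grained locks below it without scanning the subtree.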
Q 5. What is concurrency control? Why is it needed in a database system?
Ans-
1. Concurrency Control (CC) is a process to ensure that data is updated
correctly and appropriately when multiple transactions are concurrently
executed in DBMS.
2. It is a mechanism for correctness when two or more database transactions
that access the same data or dataset are executed concurrently with time
overlap.
3. In general, concurrency control is an essential part of transaction
management.
Concurrency control is needed :
1. To ensure consistency in the database. 2. To prevent following problem : a.
Lost update : i. A second transaction writes a second value of a data item on
top of a first value written by a first concurrent transaction, and the first value
is lost to other transactions running concurrently which need, by their
precedence, to read the first value. ii. The transactions that have read the
wrong value end with incorrect results. b. Dirty read : i. Transactions read a
value written by a transaction that has been later aborted. ii. This value
disappears from the database upon abort, and should not have been read by
any transaction (“dirty read”). iii. The reading transactions end with incorrect
results.