Database Management System Notes

➔ Chapter : 3 - Database Design & Normalization

Q 1. What is functional dependency? Explain full functional dependency and partial functional dependency. Explain trivial and non-trivial functional dependency, and also define canonical cover.
Ans- A functional dependency is a constraint between two sets of attributes of a relation schema. A functional dependency X → Y, between two sets of attributes X and Y that are subsets of R, specifies a constraint on the possible tuples that can form a relation instance r of R:
for any two tuples t1 and t2 in r which have
t1[X] = t2[X],
we must also have
t1[Y] = t2[Y].
1. Full Functional Dependency:
• A functional dependency A → B is a full functional dependency when B is functionally dependent on A and there is no proper subset of A that also determines B. In other words, removing any attribute from A destroys the dependency.
2. Partial Functional Dependency:
• A functional dependency A → B is a partial functional dependency when B is functionally dependent on a proper subset of A rather than on the whole set A. In this case, some attributes can be removed from A and the remainder still determines B.
Trivial and Non-Trivial Functional Dependency:
A functional dependency A → B is trivial if B is a subset of A. In other words, if
B can be determined by A because all the attributes in B are already in A, it is
considered trivial.
A functional dependency is non-trivial if B is not a subset of A. It provides new
information and is considered meaningful.
For example, let R(A, B, C) be a relation. The following functional dependencies are non-trivial:
• A → B (B is not a subset of A)
• A → C (C is not a subset of A)
The following dependency is trivial:
• {A, B} → B (B is a subset of {A, B})
Canonical cover (minimal cover):
A canonical cover is a minimal representation of a set of functional dependencies. It eliminates redundant dependencies and extraneous attributes while preserving the closure of the original set, so exactly the same dependencies can still be inferred from it.
Q 2. Describe Armstrong’s axioms in detail. What is the role of these rules in
database development process?
Ans- Armstrong's axioms are a set of inference rules used for reasoning about functional dependencies in a relational database. The first three rules below (reflexivity, augmentation, transitivity) are the axioms proper; union and decomposition are derived rules that follow from them. In the database development process, these rules are used to compute the closure of a set of dependencies, derive attribute closures and candidate keys, and test whether a schema satisfies a given normal form.
1. Reflexivity (or Identity):
• If Y is a subset of X, then X determines Y.
• Symbolically, if Y ⊆ X, then X → Y.
2. Augmentation (or Additivity):
• If X → Y, then adding an attribute Z to both sides of the
dependency results in XZ → YZ.
• Symbolically, if X → Y, then XZ → YZ.
3. Transitivity:
• If X → Y and Y → Z, then X → Z.
• Symbolically, if X → Y and Y → Z, then X → Z.
4. Union (or Combination):
• If X → Y and X → Z, then you can combine these dependencies to
derive X → YZ.
• Symbolically, if X → Y and X → Z, then X → YZ.
5. Decomposition (or Project):
• If X → YZ, then you can decompose this dependency into two
dependencies: X → Y and X → Z.
• Symbolically, if X → YZ, then X → Y and X → Z.
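To make these rules concrete, here is a minimal Python sketch (not from the source; the names are illustrative) of the attribute-closure algorithm, which applies reflexivity, augmentation, and transitivity repeatedly to compute X+, the set of all attributes functionally determined by X:

def attribute_closure(attrs, fds):
    # Reflexivity: X trivially determines every attribute of X.
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Transitivity/augmentation: once the closure covers the
            # left-hand side, the right-hand side is also determined.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

fds = [(frozenset("A"), frozenset("B")),   # A -> B
       (frozenset("B"), frozenset("C"))]   # B -> C
print(attribute_closure({"A"}, fds))       # {'A', 'B', 'C'}: so A -> C holds

An FD X → Y follows from a set F exactly when Y ⊆ X+; this is how the axioms are applied mechanically during schema design.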
Q 3. Define normal forms. List the definitions of first, second and third normal
forms. Explain BCNF with a suitable example.
Ans- Database normalization is a process in database design that involves
organizing the data in a relational database to reduce redundancy and improve
data integrity. Normal forms are a set of guidelines or rules that define the
structure of well-formed relational databases. The most common normal forms
are the First Normal Form (1NF), Second Normal Form (2NF), Third Normal
Form (3NF), and Boyce-Codd Normal Form (BCNF).
1. First Normal Form (1NF):
• A relation is in 1NF if all its attributes (columns) contain atomic
(indivisible) values. It means that each column should have a
single, non-repeating value, and there should be no sets, arrays, or
nested structures.
• Example: Consider a table representing students and their courses.
| StudentID | Courses |
|-----------|--------------------|
| 1 | Math, Physics |
| 2 | Chemistry, Biology |
To convert it to 1NF, you would split the Courses column:
| StudentID | Course |
|-----------|-----------|
| 1 | Math |
| 1 | Physics |
| 2 | Chemistry |
| 2 | Biology |

2. Second Normal Form (2NF):


• A relation is in 2NF if it is in 1NF and all non-key attributes are fully
functionally dependent on the primary key. In other words, every
non-key attribute should be functionally dependent on the entire
primary key, not just part of it.
• Example: Consider a table with composite primary key (StudentID, Course) and
non-prime attribute (Instructor).
| StudentID | Course | Instructor |
|-----------|-----------|-------------|
| 1 | Math | Dr. Smith |
| 1 | Physics | Dr. Johnson |
| 2 | Chemistry | Dr. Brown |
| 2 | Biology | Dr. White |
To convert it to 2NF, you would create two tables:
• Table 1: StudentsCourses (StudentID, Course)
• Table 2: CoursesInstructors (Course, Instructor)

3. Third Normal Form (3NF):


• A relation is in 3NF if it is in 2NF, and no transitive dependencies
exist. This means that non-key attributes should not depend on
other non-key attributes.
• Example: Consider a table with attributes (StudentID, Course, Instructor,
InstructorOffice).
| StudentID | Course | Instructor | InstructorOffice |
|-----------|-----------|-------------|------------------|
| 1 | Math | Dr. Smith | Room 101 |
| 1 | Physics | Dr. Johnson | Room 102 |
| 2 | Chemistry | Dr. Brown | Room 103 |
| 2 | Biology | Dr. White | Room 104 |
To convert it to 3NF, you would create three tables:
• Table 1: StudentsCourses (StudentID, Course)
• Table 2: CoursesInstructors (Course, Instructor)
• Table 3: InstructorsOffices (Instructor, InstructorOffice)

4. Boyce-Codd Normal Form (BCNF):


• A relation is in BCNF if it is in 3NF and, for every non-trivial
functional dependency X → Y, X is a superkey. This means that
every determinant of a non-trivial dependency must be a
superkey.
• Example: Consider a table with attributes (EmployeeID, ProjectID, Manager), where each manager manages exactly one project, giving the functional dependencies:
1. EmployeeID, ProjectID → Manager
2. Manager → ProjectID
The dependency Manager → ProjectID violates BCNF because Manager is not a superkey: its closure is only {Manager, ProjectID}. (The relation is still in 3NF, since ProjectID is a prime attribute.) To convert it to BCNF, decompose on the violating dependency:
• Table 1: ManagerProjects (Manager, ProjectID)
• Table 2: ManagerEmployees (Manager, EmployeeID)

Now both tables satisfy BCNF: in ManagerProjects the determinant Manager is a key, and ManagerEmployees has no non-trivial dependencies at all, so no determinant fails to be a superkey.
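As a hedged illustration of the definition (a sketch, not a production checker), the Python function below reuses the attribute_closure helper from the Armstrong's-axioms sketch to flag BCNF violations; note it only examines the listed dependencies, not every implied one:

def is_bcnf(relation_attrs, fds):
    rel = set(relation_attrs)
    for lhs, rhs in fds:
        # Look only at FDs that apply within this relation and are
        # non-trivial once projected onto its attributes.
        if lhs <= rel and (rhs & rel) - lhs:
            # The determinant must be a superkey of this relation,
            # i.e., its closure must cover every attribute of it.
            if not rel <= attribute_closure(lhs, fds):
                return False
    return True

fds = [(frozenset({"EmployeeID", "ProjectID"}), frozenset({"Manager"})),
       (frozenset({"Manager"}), frozenset({"ProjectID"}))]
print(is_bcnf({"EmployeeID", "ProjectID", "Manager"}, fds))  # False
print(is_bcnf({"Manager", "ProjectID"}, fds))                # True
print(is_bcnf({"Manager", "EmployeeID"}, fds))               # True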
Q 4. What do you mean by lossless decomposition? Explain with a suitable example how functional dependencies can be used to show that decompositions are lossless.
Ans- Lossless decomposition is a property of database normalization that
ensures that when a relation (table) is decomposed into multiple smaller
relations, the original information can be reconstructed without loss. In other
words, after decomposing a relation, joining the decomposed relations should
yield the original relation.
Functional dependencies play a crucial role in demonstrating lossless decomposition. For a binary decomposition of R into R1 and R2, the decomposition is lossless if the common attributes form a superkey of at least one of the two relations, i.e., if (R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 holds.
Example-
Consider a relation R(A, B, C) with the functional dependency A → B.
Original relation R:
| A | B | C |
|---|---|---|
| 1 | x | p |
| 2 | y | q |
| 3 | z | r |
Now, let's decompose R into two relations:
1. Relation R1 with attributes (A, B): R1 = π(A, B)(R)
2. Relation R2 with attributes (A, C): R2 = π(A, C)(R)
Both R1 and R2 share the common attribute A. We want to show that the decomposition is lossless.
1. Projection (π) onto the decomposed relations:
R1:
| A | B |
|---|---|
| 1 | x |
| 2 | y |
| 3 | z |
R2:
| A | C |
|---|---|
| 1 | p |
| 2 | q |
| 3 | r |
2. Join (⨝) on the Common Attribute (A):
Joining R1 and R2 on the common attribute A yields a relation with attributes (A, B, C):
R1 ⨝ R2:
| A | B | C |
|---|---|---|
| 1 | x | p |
| 2 | y | q |
| 3 | z | r |
This result is exactly the original relation R: every original tuple reappears and no spurious tuples are created. The decomposition is lossless because the common attribute A is a key of R1 (A → B gives (R1 ∩ R2) → R1), which is precisely the functional-dependency test for a lossless binary decomposition.
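The functional-dependency test just described is easy to mechanize. A minimal Python sketch (again reusing the attribute_closure helper from earlier; names illustrative):

def is_lossless(r1, r2, fds):
    # Binary lossless-join test: the common attributes must
    # functionally determine all of R1 or all of R2.
    common = set(r1) & set(r2)
    closure = attribute_closure(common, fds)
    return set(r1) <= closure or set(r2) <= closure

fds = [(frozenset("A"), frozenset("B"))]
print(is_lossless({"A", "B"}, {"A", "C"}, fds))  # True: A -> B makes A a key of R1
print(is_lossless({"B"}, {"A", "C"}, fds))       # False: no common attribute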
Q 5. What are MVD and join dependency? Describe them.
Ans-
Multivalued Dependency (MVD):
A multivalued dependency (MVD) is a concept in database theory that extends functional dependencies to handle situations where one attribute determines a set of values of another attribute, independently of the remaining attributes. MVDs express relationships involving sets of values rather than individual values.
Given a relation R with attributes A, B, and C, an MVD A →→ B (read as "A multi-determines B") means that for each value of A there is a well-defined set of values for B, and this set is independent of the values of the remaining attributes (C).
For example, consider a relation recording employees' projects and skills, where an employee's projects are independent of their skills:
| EmployeeID | ProjectID | Skill |
|------------|-----------|-------------|
| 1 | 101 | Programming |
| 1 | 101 | Database |
| 1 | 102 | Programming |
| 1 | 102 | Database |
Here the MVDs EmployeeID →→ ProjectID and EmployeeID →→ Skill hold: for each EmployeeID, the set of projects is independent of the set of skills, so every project of an employee must be paired with every one of that employee's skills.

Join Dependency:
• A join dependency is a constraint on a relational schema that specifies
how a relation can be reconstructed by joining other relations. It deals
with the relationships among attributes across multiple relations.
• If a set of relations {R1, R2, ..., Rn} satisfies a join dependency D, it means
that any legal instance of the relations can be reconstructed by joining
them.
• A join dependency is represented as D = {R1, R2, ..., Rn}, where the set of
relations {R1, R2, ..., Rn} must satisfy the join dependency.
For example:
R1:
| A | B |
|---|---|
| 1 | x |
| 2 | y |
R2:
| A | C |
|---|---|
| 1 | p |
| 2 | q |
The join dependency D = {R1, R2} indicates that any legal instance of the original relation can be reconstructed by joining R1 and R2 on the common attribute A.
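A small Python sketch of this reconstruction (illustrative data from the example above; not a real DBMS join):

def natural_join(r1, r2, on):
    # Pair up tuples that agree on all the join attributes.
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in on)]

R1 = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
R2 = [{"A": 1, "C": "p"}, {"A": 2, "C": "q"}]
print(natural_join(R1, R2, ["A"]))
# [{'A': 1, 'B': 'x', 'C': 'p'}, {'A': 2, 'B': 'y', 'C': 'q'}]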
Q 6. Explain the fourth and fifth normal forms with suitable examples. Also differentiate between BCNF and 3NF.
Ans- Fourth Normal Form (4NF):
• Definition:
• A relation is in 4NF if it is in Boyce-Codd Normal Form (BCNF) and
has no non-trivial multivalued dependencies (MVDs).
• An MVD X →→ Y states that, for each value of X, there is a set of
values for Y that is independent of all other attributes.
• Example:
• Consider a relation with attributes (EmployeeID, ProjectID, Skill),
and the MVD EmployeeID →→ ProjectID.
| EmployeeID | ProjectID | Skill |
|------------|-----------|-------|
| 1 | 101 | Java |
| 1 | 101 | SQL |
| 1 | 102 | Java |
| 1 | 102 | SQL |
In this example, employee 1's set of projects {101, 102} is independent of their set of skills {Java, SQL}, so under the MVD every project-skill combination must appear, which is highly redundant. To bring the relation to 4NF, decompose it into two relations: one for (EmployeeID, ProjectID) and another for (EmployeeID, Skill).
Fifth Normal Form (5NF):
• Definition:
• A relation is in 5NF if it is in 4NF and has no non-trivial join
dependencies.
• A join dependency is a constraint on a relational schema that
specifies how a relation can be reconstructed by joining other
relations.
• Example:
Consider a relation R(CourseID, StudentID, Instructor) in which a row (c, s, i) appears whenever student s takes course c, instructor i teaches course c, and instructor i teaches student s. Under this business rule the join dependency *{(CourseID, StudentID), (CourseID, Instructor), (StudentID, Instructor)} holds: the relation can always be reconstructed by joining its three binary projections, so it is not in 5NF. To achieve 5NF, decompose it into three relations: (CourseID, StudentID), (CourseID, Instructor), and (StudentID, Instructor); joining all three reconstructs R exactly, without spurious tuples.
| Sno. | Criteria | BCNF | 3NF |
|------|----------|------|-----|
| 1. | Dependency type | Deals with non-trivial functional dependencies: every determinant must be a superkey. | Permits a non-trivial dependency X → A when X is a superkey or A is a prime attribute; targets transitive dependencies. |
| 2. | Decomposition | Decomposition is driven by any non-trivial dependency whose determinant is not a superkey. | Decomposition aims to eliminate partial and transitive dependencies. |
| 3. | Superkey requirement | The determinant of every non-trivial functional dependency must be a superkey. | The determinant must be a superkey unless the dependent attribute is prime (part of some candidate key). |
| 4. | Goal | Ensure that no non-trivial dependency has a determinant that is not a superkey. | Eliminate transitive dependencies and store data without unnecessary redundancy. |
| 5. | Example | For R(A, B, C) with A → B and B → C, decomposing to BCNF yields two tables: (A, B) and (B, C). | The same decomposition, (A, B) and (B, C), also satisfies 3NF. |
| 6. | Use case | Chosen when it is crucial that every determinant be a superkey. | A common choice when eliminating transitive dependencies while keeping all dependencies enforceable. |
| 7. | Dependency preservation | A lossless BCNF decomposition always exists, but it may not preserve all functional dependencies. | A lossless and dependency-preserving 3NF decomposition always exists. |
➔ Chapter : 4 - Transaction Processing Concepts
Q 1. What do you mean by transaction? Explain ACID properties of
transaction. Draw a transaction state diagram and describe the states that
a transaction goes through during execution.
Ans- A transaction is a sequence of one or more operations (reads or writes)
that are executed as a single unit of work. Transactions ensure the
consistency and integrity of a database by providing a way to group multiple
operations into an atomic and isolated operation.
ACID Properties: ACID is an acronym that stands for Atomicity, Consistency,
Isolation, and Durability. These properties define the key characteristics that
a transaction must exhibit:
1. Atomicity:
• Atomicity ensures that a transaction is treated as a single,
indivisible unit of work. Either all the operations within the
transaction are executed, or none of them are. There is no partial
execution.
2. Consistency:
• Consistency ensures that a transaction brings the database from
one consistent state to another. The database must satisfy certain
integrity constraints before and after the transaction.
3. Isolation:
• Isolation ensures that the execution of one transaction is isolated
from the execution of other transactions. Each transaction appears
to be the only transaction interacting with the database, even
though there may be concurrent transactions.
4. Durability:
• Durability ensures that once a transaction is committed, its effects
are permanent and survive subsequent system failures. The
changes made by a committed transaction should persist in the
database.
Transaction State Diagram: A transaction goes through various states during
its execution. The typical states in a transaction state diagram are as follows:
1. Active (A):
• The initial state where the transaction is actively executing its
operations.
2. Partially Committed (PC):
• The state where the transaction has completed its execution, and
the system is about to commit the changes. However, it is not yet
guaranteed that the changes will be permanent.
3. Committed (C):
• In this state, the transaction has been successfully completed, and
its changes have been permanently saved to the database.
4. Failed (F):
• If an unrecoverable error occurs during execution, the transaction enters the failed state and can no longer proceed; its changes must then be rolled back.
5. Aborted (Abo):
• Once the rollback is complete (whether after a failure or a deliberate abort), the transaction is in the aborted state and the database is restored to its state before the transaction started.
6. Terminated (T):
• After the transaction has been committed or aborted, it enters the
terminated state, indicating the end of its life cycle.
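Since the diagram itself cannot be drawn in plain-text notes, one way to capture it is as a transition table; the Python sketch below (illustrative, not from the source) encodes the legal moves between the states listed above:

TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "committed":           {"terminated"},
    "failed":              {"aborted"},      # rollback happens here
    "aborted":             {"terminated"},
    "terminated":          set(),
}

def can_move(src, dst):
    # A transition is legal only if the diagram has that edge.
    return dst in TRANSITIONS[src]

print(can_move("active", "committed"))  # False: a transaction must pass
                                        # through "partially committed"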
Q 2. What is schedule? Define the concept of recoverable, cascadeless and
strict schedules? Describe serializable schedule. Discuss conflict serializability
and view serializability with example.
Ans- A schedule is an ordered sequence of actions (operations) performed by a set of transactions. These actions include read and write operations on the database. Schedules play a crucial role in ensuring the correctness and consistency of database transactions.
Recoverable Schedule: A schedule is recoverable if every transaction commits only after all the transactions whose changes it has read have committed. This guarantees that if a transaction aborts, no already-committed transaction has read its rolled-back values.
Cascadeless Schedule: A schedule is cascadeless if no transaction reads a value written by a transaction that has not yet committed. Cascadeless schedules avoid cascading rollbacks of uncommitted changes.
Strict Schedule: A schedule is strict if no transaction reads or overwrites a data item written by another transaction until that transaction commits or aborts. Strict schedules provide the strongest of these three guarantees: every strict schedule is cascadeless, and every cascadeless schedule is recoverable.
Example - Suppose we have two transactions T1 and T2 operating on a database with two data items A and B.

1. Recoverable Schedule:
W1(A); R2(A); W2(B); Commit(T1); Commit(T2)
• T2 reads the value of A written by T1, but T2 commits only after T1 commits, so the schedule is recoverable: had T1 aborted before committing, T2 could still have been rolled back.
2. Cascadeless Schedule:
W1(A); Commit(T1); R2(A); W2(B); Commit(T2)
• T2 reads A only after T1 has committed, so T2 never reads an uncommitted value and an abort of T1 can never cascade to T2.
3. Strict Schedule:
W1(A); Commit(T1); R2(A); W2(A); Commit(T2)
• T2 neither reads nor overwrites A until T1's write has been committed; no transaction touches a data item written by an uncommitted transaction.
Serializable Schedule: A schedule is serializable if it is equivalent to some serial
execution of its transactions. Serializable schedules ensure that the final state
of the database is the same as if the transactions were executed in some serial
order.
Conflict Serializability:
Conflict serializability is a criterion for determining whether a schedule is
equivalent to some serial execution. Two actions conflict if they belong to
different transactions, and at least one of them is a write operation. A schedule
is conflict serializable if it can be transformed into a serial schedule by
swapping non-conflicting actions while preserving the order of conflicting
actions.
Example: Consider the schedule
S: R1(A); W1(A); R2(A); W2(A); R1(B); W1(B); R2(B); W2(B)
The conflicting actions are the operations on A (T1's operations precede T2's) and the operations on B (again T1 precedes T2). Read and write operations on different data items do not conflict, so we can repeatedly swap such adjacent non-conflicting actions, for instance W2(A) with R1(B), until all of T1's actions come before all of T2's. The result is the serial schedule T1; T2, so the original schedule S is conflict serializable.
View Serializability:
View serializability is another criterion for determining whether a schedule is equivalent to some serial execution. Two schedules are view equivalent if (1) each transaction reads the same initial value of every data item in both schedules, (2) every read operation reads the value produced by the same write operation in both schedules, and (3) the final write on each data item is performed by the same transaction in both schedules. A schedule is view serializable if it is view equivalent to some serial schedule. Every conflict-serializable schedule is view serializable, but not vice versa.
Example: In the schedule R1(A); W2(A); W1(A); W3(A), the conflicts between T1 and T2 form a cycle (R1(A) precedes W2(A), but W2(A) precedes W1(A)), so the schedule is not conflict serializable. Yet it is view equivalent to the serial order T1; T2; T3: T1 reads the initial value of A in both, no other reads occur, and T3 performs the final write in both. Such schedules are view serializable thanks to the blind writes of T2 and T3.
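Conflict serializability is usually tested with a precedence (serialization) graph: add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, and check for cycles. A minimal Python sketch (illustrative names):

from itertools import combinations

def conflict_serializable(schedule):
    # schedule: list of (transaction, action, item), in execution order.
    edges = set()
    for (t1, a1, x1), (t2, a2, x2) in combinations(schedule, 2):
        # Conflict: different transactions, same item, at least one write.
        if t1 != t2 and x1 == x2 and "W" in (a1, a2):
            edges.add((t1, t2))
    # Serializable iff the precedence graph is acyclic: repeatedly
    # strip nodes with no outgoing edge to the remaining nodes.
    nodes = {t for t, _, _ in schedule}
    while nodes:
        sink = next((n for n in nodes
                     if not any(u == n and v in nodes for u, v in edges)),
                    None)
        if sink is None:
            return False  # no sink left: the graph contains a cycle
        nodes.remove(sink)
    return True

S = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
     ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
print(conflict_serializable(S))  # True: the only edge is T1 -> T2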

Q 3. What is a log? Explain log-based recovery. What is a log file? Write the steps for log-based recovery of a system with a suitable example.
Ans- Log: A log is a sequential record of events, actions, or changes made to a
system. Logs are crucial for maintaining system integrity, recovering from
failures, and ensuring consistency.
Log-Based Recovery: Log-based recovery is a mechanism used to restore a
system to a consistent state after a failure. It involves the use of a transaction
log, which records all changes made to the system's database. In the event of a
system failure, the log is consulted to redo or undo transactions, ensuring that
the database is brought back to a consistent state.
Log File: A log file is a file that contains the transaction log, which records the
sequence of actions or changes made to the system. Each entry in the log file
corresponds to a specific event or operation, such as the start or end of a
transaction, a commit, or a write operation.
Steps for Log-Based Recovery: Log-based recovery typically involves three
phases: the redo phase, the undo phase and commit phase. Here are the steps
for log-based recovery:
1. Redo Phase:
• During the redo phase, transactions that were in progress at the
time of failure are redone to ensure that their changes are applied
to the database.
2. Undo Phase:
• During the undo phase, transactions that were not completed (i.e., not
committed) at the time of failure are undone to rollback their changes and
maintain consistency.

3. Commit Phase:
• After the redo and undo phases, the system reaches a consistent state. The
commit phase involves marking all recovered transactions as committed.

Consider a simple example with a log file containing entries like:
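A typical log, using <Ti, item, old value, new value> records (an illustrative format; the concrete entries are an assumption, not reproduced from any specific system), might look like:
<T1 start>
<T1, A, 50, 20>
<T1 commit>
<T2 start>
<T2, B, 10, 15>
— system crash —
On recovery, T1 has both <start> and <commit> records, so its update is redone (A is set to 20); T2 has a <start> record but no <commit>, so its update is undone (B is restored to 10).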

Q 4. What is a deadlock? What are the necessary conditions for it? Describe methods to handle a deadlock, and discuss deadlock prevention schemes.
Ans- A deadlock is a situation in which two or more transactions wait indefinitely for one another to release resources, so that none of them can ever proceed. In a DBMS, deadlocks can occur when multiple transactions are contending for exclusive access to resources, such as database tables or rows. The necessary conditions for a deadlock in a DBMS are adapted from the general conditions:
Necessary Conditions for Deadlock in a DBMS:
1. Mutual Exclusion:
• Transactions request exclusive locks on data items, and only one
transaction can hold an exclusive lock on a data item at a time.
2. Hold and Wait:
• Transactions hold locks on some data items and may request
additional locks while holding the existing ones.
3. No Preemption:
• Locks cannot be preempted from one transaction to be given to
another; a transaction must release its locks voluntarily.
4. Circular Wait:
• There is a circular chain of transactions, each waiting for a lock
held by the next transaction in the chain.
Handling Deadlocks in a DBMS:
1. Deadlock Prevention:
• Mutual Exclusion Prevention:
• Use shared locks instead of exclusive locks where
appropriate.
• Hold and Wait Prevention:
• Adopt a policy where transactions request all the locks they
need before starting their execution. This can reduce the
likelihood of a deadlock but may lead to decreased
concurrency.
2. Deadlock Avoidance:
• Use techniques like the Banker's algorithm to dynamically analyze
transaction requests and ensure that granting a lock will not lead
to a deadlock.
3. Deadlock Detection and Recovery:
• Periodically check the system for deadlock conditions. If a
deadlock is detected, take actions such as aborting one or more
transactions to break the deadlock and allow the system to
continue.
4. Ignore the Problem:
• Some DBMS may choose to ignore the deadlock problem, relying
on the assumption that deadlocks are rare and handling them
manually when they occur.
Deadlock Prevention Schemes in a DBMS:
| Criteria | Wait-Die Technique | Wound-Wait Technique |
|----------|--------------------|----------------------|
| Basic concept | An older transaction that requests a lock held by a younger one waits; a younger requester "dies" (is aborted). | An older transaction that requests a lock held by a younger one "wounds" (aborts) the younger; a younger requester waits. |
| Abort policy | Only younger transactions are aborted, when they request locks held by older ones. | Only younger transactions are aborted, when older transactions request the locks they hold. |
| Who waits | Older transactions wait for younger ones. | Younger transactions wait for older ones. |
| Implementation | Typically implemented using timestamps assigned at transaction start. | Typically implemented using timestamps assigned at transaction start. |
| Example scenario | An older T1 holds a lock and a younger T2 requests it: T2 is aborted (dies) and is later restarted. | An older T1 requests a lock held by a younger T2: T2 is aborted (wounded). If instead T2 requests T1's lock, T2 simply waits. |

In both schemes a restarted transaction keeps its original timestamp, which guarantees that it eventually becomes the oldest transaction and cannot starve.
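Deadlock detection (method 3 above) is typically implemented with a wait-for graph: an edge Ti → Tj means Ti is waiting for a lock held by Tj, and a cycle means deadlock. A minimal Python sketch (illustrative names):

def has_deadlock(waits_for):
    # waits_for: dict mapping each transaction to the set it waits on.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(t):
        color[t] = GRAY
        for u in waits_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:
                return True          # back edge: a cycle, hence deadlock
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in waits_for)

# T1 waits for T2 and T2 waits for T1: circular wait, hence deadlock.
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
print(has_deadlock({"T1": {"T2"}, "T2": set()}))   # False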

Q 5. What are distributed databases and their types? What are the advantages and disadvantages of distributed databases? What is the difference between data replication and data fragmentation, including all their types?
Ans-
1. A distributed database system consists of collection of sites, connected
together through a communication network.
2. Each site is a database system site in its own right and the sites have agreed
to work together, so that a user at any site can access anywhere in the network
as if the data were all stored at the user’s local site.
3. Each side has its own local database.
4. A distributed database is fragmented into smaller data sets.
5. DDBMS can handle both local and global transactions
Distributed databases are classified as :
1. Homogeneous distributed database :
a. In this, all sites have identical database management system software.
b. All sites are aware of one another, and agree to co-operate in
processing user’s requests.
2. Heterogeneous distributed database :
a. In this, different sites may use different schemas, and different
database management system software.
b. The sites may not be aware of one another, and they may provide only limited facilities for co-operation in transaction processing.
Advantages of Distributed Databases:
1. Improved Availability and Reliability:
• Distribution of data across multiple locations reduces the risk of a
single point of failure, enhancing system reliability and availability.
2. Scalability:
• Distributed databases can be easily scaled horizontally by adding
more nodes to the network, accommodating increasing data and
user demands.
3. Improved Performance:
• Data can be located closer to the users, reducing latency and
improving query performance.
4. Fault Tolerance:
• Distributed databases can continue to function even if some nodes
experience failures, ensuring fault tolerance and data integrity.
5. Local Autonomy:
• Each local site may have some degree of autonomy, allowing it to
manage its portion of the database independently.
Disadvantages of Distributed Databases:
1. Complexity:
• Managing a distributed system is more complex than a centralized
one, requiring sophisticated coordination and communication
mechanisms.
2. Security Challenges:
• Distributed systems may face additional security challenges due to
the need for secure communication and coordination across
different nodes.
3. Data Consistency:
• Ensuring consistency of data across distributed nodes can be
challenging, especially in the presence of network failures or
delays.
4. Cost:
• The initial setup and maintenance costs of a distributed database
system can be higher compared to a centralized system.
5. Synchronization Issues:
• Coordinating updates and ensuring consistency among distributed
copies of the database can lead to synchronization challenges.
Difference between Data Replication and Data Fragmentation:
Data Replication:
• Definition: Data replication involves creating and maintaining copies of
the same data at multiple locations.
• Types:
• Full Replication: Entire database is copied to each site.
• Partial Replication: Only a subset of the database is copied to each
site.
• Advantages:
• Improved data availability and reliability.
• Enhanced query performance, as data can be accessed locally.
• Disadvantages:
• Increased storage requirements.
• Synchronization challenges to maintain consistency.
Data Fragmentation:
• Definition: Data fragmentation involves dividing a database into
fragments, each stored at different locations.
• Types:
• Horizontal Fragmentation: Divides the rows of a table.
• Vertical Fragmentation: Divides the columns of a table.
• Hybrid Fragmentation: Combination of both horizontal and
vertical fragmentation.
• Advantages:
• Improved query performance, as each site only accesses relevant
fragments.
• Allows for local autonomy in managing specific fragments.
• Disadvantages:
• Coordination challenges to ensure data consistency.
• Increased complexity in managing fragmented data.
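A small Python sketch contrasting the two ideas on an illustrative employees relation (the data and site names are assumptions for illustration only):

employees = [
    {"id": 1, "name": "Asha",  "dept": "Sales", "salary": 50000},
    {"id": 2, "name": "Ravi",  "dept": "HR",    "salary": 45000},
    {"id": 3, "name": "Meena", "dept": "Sales", "salary": 52000},
]

# Horizontal fragmentation: split by rows, e.g. one fragment per dept.
sales_site = [r for r in employees if r["dept"] == "Sales"]
hr_site    = [r for r in employees if r["dept"] == "HR"]

# Vertical fragmentation: split by columns, keeping the key ("id") in
# every fragment so the original rows can be rebuilt by a join.
directory_site = [{"id": r["id"], "name": r["name"]} for r in employees]
payroll_site   = [{"id": r["id"], "salary": r["salary"]} for r in employees]

# Full replication, by contrast, would store a complete copy of
# `employees` at every site.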

➔ Chapter : 5 - Concurrency Control Techniques

Q 1. Explain the timestamp-based protocol for concurrency control. How does strict timestamp ordering differ from basic timestamp ordering? Also discuss the multiversion scheme of concurrency control.
Ans- Timestamp-Based Concurrency Control:
Timestamp-based concurrency control is a technique used in database
management systems to ensure that transactions are executed in a
serializable manner, preserving consistency. Each transaction is assigned a
timestamp representing its start time, and this timestamp is used to
determine the order of conflicting transactions.

| Aspect | Basic Timestamp Ordering (BTO) | Strict Timestamp Ordering (STO) |
|--------|--------------------------------|---------------------------------|
| Timestamp assignment | Each transaction is assigned a unique timestamp at start. | Each transaction is assigned a unique timestamp at start. |
| Read operations | Read(X) is allowed if TS(T) ≥ write_TS(X); otherwise T is aborted. | Same rule, but the read is additionally delayed until the transaction that last wrote X has committed or aborted. |
| Write operations | Write(X) is allowed if TS(T) ≥ read_TS(X) and TS(T) ≥ write_TS(X); otherwise T is aborted. | Same rule, with the operation delayed until the transaction that last wrote X has committed or aborted. |
| Concurrency control | Ensures that conflicting operations follow the timestamp (temporal) order. | Enforces the timestamp order plus strictness. |
| Recoverability | Resulting schedules may be unrecoverable or suffer cascading rollbacks. | Resulting schedules are strict: recoverable and cascadeless. |
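A minimal sketch of the basic timestamp-ordering rules in Python (illustrative; a real system would also handle Thomas' write rule, commit dependencies, and transaction restarts):

class DataItem:
    def __init__(self):
        self.read_ts = 0    # largest timestamp that has read this item
        self.write_ts = 0   # largest timestamp that has written it

def read(item, ts):
    if ts < item.write_ts:
        raise Exception("abort: a younger transaction already wrote this item")
    item.read_ts = max(item.read_ts, ts)

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        raise Exception("abort: a younger transaction already read/wrote it")
    item.write_ts = ts

x = DataItem()
write(x, ts=5)                # T5 writes x
read(x, ts=7)                 # T7 reads x: allowed, since 7 >= 5
try:
    write(x, ts=6)            # T6 writes x: rejected, T7 already read it
except Exception as e:
    print(e)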

Multi-Version Scheme:
In a multi-version concurrency control scheme (MVCC), each write
operation creates a new version of the data item. This allows multiple
versions of the same data item to coexist. Each version is associated with a
timestamp or a version number.
• Read Operation:
• A transaction reads the version of a data item that was committed
before the transaction's start time.
• Write Operation:
• A transaction creates a new version of a data item when it writes.
This new version is associated with the transaction's timestamp.
• Advantages:
• Read operations are not blocked by write operations, and vice
versa.
• Provides a consistent snapshot of the database for each
transaction.
• Disadvantages:
• Increased storage requirements due to multiple versions of data
items.
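A toy multiversion store in Python may make the read rule concrete (a sketch under simplifying assumptions: timestamps stand in for commit times, and no locking or validation is shown):

class MVStore:
    def __init__(self):
        self.versions = {}   # item -> sorted list of (timestamp, value)

    def write(self, item, value, ts):
        # Each write creates a new version tagged with the writer's ts.
        self.versions.setdefault(item, []).append((ts, value))
        self.versions[item].sort()

    def read(self, item, ts):
        # Return the newest version written at or before the reader's ts.
        older = [v for t, v in self.versions.get(item, []) if t <= ts]
        return older[-1] if older else None

db = MVStore()
db.write("A", 100, ts=1)
db.write("A", 200, ts=5)
print(db.read("A", ts=3))  # 100: the snapshot as of timestamp 3
print(db.read("A", ts=6))  # 200: sees the later version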
Q 2. Explain two-phase locking for concurrency control.
Ans- Two-phase locking (2PL) is a concurrency control mechanism used in database management systems to ensure the consistency and integrity of data when multiple transactions are executed concurrently.
The two-phase locking protocol consists of two phases:
1. Growing phase (lock acquisition):
- During this phase, a transaction is allowed to acquire locks but is not allowed to release any locks.
- The transaction can keep acquiring locks on data items until it has acquired all the locks it requires.
2. Shrinking phase (lock release):
- Once a transaction releases its first lock, it enters the shrinking phase.
- In the shrinking phase, the transaction is not allowed to acquire any new locks, but it can release the locks it holds.
- After releasing a lock, a transaction cannot acquire any new locks; it proceeds to release all its remaining locks.
Advantages of Two-Phase Locking:
1. Serializability: Guarantees that the execution of transactions is conflict serializable, ensuring consistency of the database.
2. Simplicity: The protocol is easy to enforce with an ordinary lock manager, since it only restricts the order of lock and unlock operations.
Note that basic 2PL does not by itself prevent deadlocks; the conservative variant, in which a transaction acquires all its locks before it starts executing, does prevent them.
Disadvantages of Two-Phase Locking:
1. Conservative Approach: Can lead to inefficiencies when transactions are unnecessarily delayed due to the conservative lock acquisition strategy.
2. Locking Overhead: The need to acquire and release locks incurs additional overhead in terms of both time and system resources.
Example- consider two transactions T1 and T2, each obeying the two-phase rule (all lock acquisitions precede the first unlock):
T1: Lock(x); Read(x); Lock(y); Write(x); Unlock(x); Unlock(y)
T2: Lock(y); Read(y); Lock(x); Write(y); Unlock(y); Unlock(x)
Both transactions are two-phase, so any interleaving of them is conflict serializable. Note, however, that if T1 acquires x while T2 acquires y, each will block waiting for the other's lock: 2PL guarantees serializability but does not rule out deadlock.
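A minimal sketch of how the two-phase rule can be enforced per transaction (illustrative, single-threaded; a real lock manager would also block conflicting requests from other transactions):

class Transaction:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise Exception("2PL violation: lock requested after an unlock")
        self.locks.add(item)     # growing phase

    def unlock(self, item):
        self.shrinking = True    # shrinking phase begins
        self.locks.discard(item)

t1 = Transaction("T1")
t1.lock("x"); t1.lock("y")       # growing phase
t1.unlock("x")                   # first unlock: shrinking phase begins
try:
    t1.lock("z")                 # violates the two-phase rule
except Exception as e:
    print(e)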
Q 3. How do optimistic concurrency control techniques differ from other
concurrency control techniques? Why they are also called validation or
certification techniques? Discuss the typical phases of an optimistic
concurrency control method.
Ans-
Optimistic Concurrency Control:

Optimistic concurrency control is a type of concurrency control technique that differs from traditional or pessimistic concurrency control methods. In optimistic concurrency control, transactions are allowed to proceed without acquiring locks on data items initially. Instead, conflicts are detected and resolved at the time of transaction commitment. It is often associated with techniques known as validation or certification.

Key Differences from Other Concurrency Control Techniques:

1. Locks are Acquired Late:


• In optimistic concurrency control, transactions do not acquire locks on
data items during the read phase. Locks are only acquired at the time
of the write or update phase, just before the transaction commits.
2. Conflict Detection at Commit Time:
• Conflicts between transactions are detected at the time of transaction
commitment, not during the read or write phases. Conflicts can arise
when two transactions attempt to modify the same data item
concurrently.
3. Assumption of Low Contention:
• Optimistic concurrency control assumes that conflicts between
transactions are relatively infrequent. It is well-suited for scenarios
where contention for data items is low.

Validation or Certification: Optimistic concurrency control is also referred to as validation or certification because the validation phase involves checking whether a transaction's execution has caused any conflicts with other concurrently executing transactions. If conflicts are detected, appropriate actions are taken to resolve them.

Typical Phases of an Optimistic Concurrency Control Method:

1. Read Phase:
• Transactions read data items without acquiring locks. The current state of the data is recorded or buffered locally for the transaction.
2. Validation Phase:
• At the time of transaction commitment, the system checks whether any
conflicts exist between the locally recorded state of the transaction and
the current state of the database. Conflicts may arise if another
transaction has modified the same data items concurrently.
3. Conflict Detection:
• Conflicts are detected by comparing the timestamps, versions, or other
markers associated with the data items. If conflicts are detected, the
transaction is considered potentially invalid.
4. Resolution of Conflicts:
• If conflicts are detected, the system may employ conflict resolution
mechanisms to address them. This may involve aborting one or more
transactions, forcing them to roll back and retry.
5. Write Phase:
• If the transaction passes the validation phase without conflicts, it
acquires locks on the data items and proceeds to the write phase. The
changes are applied to the database, and the transaction is committed.
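A minimal sketch of backward validation in Python (illustrative names and a simplified rule: a committing transaction fails validation if any transaction that committed after it started wrote an item it read):

from dataclasses import dataclass, field

@dataclass
class Txn:
    start_ts: int
    commit_ts: int = 0
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validate(txn, committed):
    # Backward validation: check txn against transactions that
    # committed after txn started; any overlap between their write
    # sets and txn's read set is a conflict.
    return not any(o.commit_ts > txn.start_ts and (o.write_set & txn.read_set)
                   for o in committed)

t_old = Txn(start_ts=1, commit_ts=4, write_set={"A"})
t_new = Txn(start_ts=2, read_set={"A"})
print(validate(t_new, [t_old]))  # False: t_new read A, but t_old wrote A
                                 # and committed after t_new started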

Advantages of Optimistic Concurrency Control:

1. Reduced Lock Contention:


• Lock contention is reduced since transactions acquire locks only at the
time of commit.
2. Increased Concurrency:
• Optimistic concurrency control allows for increased concurrency among
transactions since they are not blocked during the read phase.

Disadvantages:

1. Potential for Rollbacks:


• Transactions may need to be rolled back and retried if conflicts are
detected during validation.
2. Increased Validation Overhead:
• The validation phase can introduce additional overhead in terms of
comparing states and detecting conflicts.
Q 4. What do you mean by multiple granularity? How is it implemented in a transaction system?
Ans-

Multiple granularity :
1. Multiple granularity can be defined as hierarchically breaking up the
database into blocks which can be locked.
2. It keeps track of what to lock and how to lock.
3. It makes it easy to decide either to lock a data item or to unlock a data item.
Implementation :
1. Multiple granularity is implemented in transaction system by defining
multiple levels of granularity by allowing data items to be of various
sizes and defining a hierarchy of data granularity where the small
granularities are nested within larger ones.
2. In the tree, a non-leaf node represents the data associated with its descendants.
3. Each node is an independent data item.
4. The highest level represents the entire database.
5. Each node in the tree can be locked individually using shared or exclusive mode locks.
6. If a node is locked in an intention mode, explicit locking is being done at a lower level of the tree (that is, at a finer granularity).
7. Intention locks are placed on all the ancestors of a node before that node is locked explicitly.
8. While traversing the tree, the transaction locks the various nodes in an intention mode. This hierarchy can be represented graphically as a tree.
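The locking rules above are usually summarized by a lock-mode compatibility matrix; the Python sketch below encodes the standard matrix for IS, IX, S, SIX, and X modes (an illustrative encoding, not from the source):

# IS/IX = intention shared/exclusive, SIX = shared + intention exclusive.
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,
    ("IS", "SIX"): True, ("IS", "X"): False,
    ("IX", "IX"): True,  ("IX", "S"): False,  ("IX", "SIX"): False,
    ("IX", "X"): False,
    ("S", "S"): True,    ("S", "SIX"): False, ("S", "X"): False,
    ("SIX", "SIX"): False, ("SIX", "X"): False,
    ("X", "X"): False,
}

def compatible(held, requested):
    # The matrix is symmetric, so look up the pair in either order.
    return COMPATIBLE.get((held, requested),
                          COMPATIBLE.get((requested, held), False))

print(compatible("IS", "IX"))   # True: both are intention modes
print(compatible("S", "IX"))    # False: a reader blocks intent-to-write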
Q 5. What is concurrency control ? Why it is needed in database system.
Ans-
1. Concurrency Control (CC) is a process to ensure that data is updated
correctly and appropriately when multiple transactions are concurrently
executed in DBMS.
2. It is a mechanism for correctness when two or more database transactions
that access the same data or dataset are executed concurrently with time
overlap.
3. In general, concurrency control is an essential part of transaction
management.
Concurrency control is needed:
1. To ensure consistency in the database.
2. To prevent the following problems:
a. Lost update:
i. A second transaction writes a second value of a data item on top of a first value written by a first concurrent transaction, and the first value is lost to other transactions running concurrently which need, by their precedence, to read the first value.
ii. The transactions that have read the wrong value end with incorrect results.
b. Dirty read:
i. Transactions read a value written by a transaction that has been later aborted.
ii. This value disappears from the database upon abort, and should not have been read by any transaction ("dirty read").
iii. The reading transactions end with incorrect results.
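The lost-update problem in point 2a is easy to reproduce outside a DBMS; the Python sketch below (illustrative, with a sleep to force the bad interleaving) shows two unsynchronized "transactions" overwriting each other's update:

import threading, time

balance = 100

def withdraw(amount):
    global balance
    local = balance            # read the current value
    time.sleep(0.01)           # the other "transaction" runs meanwhile
    balance = local - amount   # write based on a stale read

t1 = threading.Thread(target=withdraw, args=(30,))
t2 = threading.Thread(target=withdraw, args=(50,))
t1.start(); t2.start(); t1.join(); t2.join()
print(balance)  # typically 50 or 70, not the correct 20: an update is lost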
