K.TEJESHWAR
Types of Indexing:
1. Primary Index
An index built on the primary key of a table, where the data file is ordered on that key.
Example (Students table):
Student_ID Name Age
101 Alice 20
102 Bob 22
103 Charlie 21
The index will store Student_ID values (101, 102, 103) with pointers to their rows.
2. Clustered Index
An index that determines the physical order of the rows in the table.
Example (Employees table, clustered on Salary):
Emp_ID Name Salary
1 John 3000
3 Sarah 4000
2 Emma 5000
The rows are physically sorted by Salary, and the index reflects this order.
100522729022
K.TEJESHWAR
3. Non-Clustered Index
Separate structure that contains the indexed column and pointers to rows in the table.
Example (Products table):
Product_ID Name Price
201 Keyboard 20
202 Mouse 15
A non-clustered index on Price stores values (15, 20, 120) with pointers to rows, while the table itself is unsorted.
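As a minimal sketch (Python; the Products rows are the ones above, plus an assumed third row whose Price 120 is mentioned in the text but whose ID and name are hypothetical), a non-clustered index can be modeled as a sorted list of (key, row position) pairs kept separate from the unsorted rows:

```python
import bisect

# Hypothetical Products table: rows stored in arbitrary (unsorted) order.
table = [
    (201, "Keyboard", 20),
    (202, "Mouse", 15),
    (203, "Monitor", 120),  # assumed third row; only its price appears in the notes
]

# Non-clustered index on Price: sorted (price, row_position) pairs,
# kept as a separate structure from the table itself.
price_index = sorted((row[2], pos) for pos, row in enumerate(table))

def lookup_by_price(price):
    """Return all rows whose Price equals the given value."""
    i = bisect.bisect_left(price_index, (price, -1))
    rows = []
    while i < len(price_index) and price_index[i][0] == price:
        rows.append(table[price_index[i][1]])  # follow the pointer back to the row
        i += 1
    return rows

print(lookup_by_price(15))  # [(202, 'Mouse', 15)]
```

Lookups binary-search the index and then follow the stored positions back into the table, mirroring how a non-clustered index points at rows rather than containing them.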
B-Trees
Properties:
1. Balanced Tree: All leaf nodes are at the same level.
2. Multi-way Tree: Each node can have multiple keys and children (defined by the order m of the tree).
3. Dynamic Growth: Automatically adjusts its height as keys are inserted or deleted.
4. Internal Node Storage: Keys and data are stored in internal and leaf nodes.
Merits:
Efficient Searching: Reduces the number of disk accesses due to its balanced structure.
Fast Updates: Insertion and deletion are efficient, maintaining balance dynamically.
Demerits:
Complex Structure: Managing internal nodes and balancing increases implementation complexity.
Slower for Sequential Access: Requires traversing internal nodes to access all records.
B+ Trees
Properties:
1. Leaf Nodes Only for Data: All data records are stored in the leaf nodes; internal nodes store keys for
navigation.
2. Linked Leaf Nodes: Leaf nodes are linked sequentially for faster traversal.
3. Balanced Tree: Like B-trees, all leaf nodes are at the same level.
Merits:
Fast Sequential Access: Linked leaf nodes allow quick traversal of all records.
Compact Internal Nodes: Internal nodes only store keys, leading to better utilization of memory.
Demerits:
More Disk Accesses: Every key lookup must traverse all the way down to a leaf node, since data is stored only in the leaves.
Higher Storage Requirement: Linked leaf nodes and separation of data from internal nodes use more
space.
Comparison:
B-Trees are better for frequent insertions and deletions due to fewer structural changes.
B+ Trees are preferred for range queries and sequential access due to linked leaf nodes.
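The linked-leaf advantage can be sketched with a toy structure (not a full B+ tree: internal nodes and balancing are omitted, and the leaves are wired up by hand):

```python
# Toy B+ tree leaf level: each leaf holds sorted keys plus a pointer
# to the next leaf, so a range query never revisits internal nodes.
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None

def range_query(first_leaf, lo, hi):
    """Collect all keys in [lo, hi] by walking the linked leaves."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return result   # past the range: stop early
            if k >= lo:
                result.append(k)
        leaf = leaf.next        # follow the sequential leaf link
    return result

# Build three linked leaves holding [5, 10], [15, 20], [25, 30].
a, b, c = Leaf([5, 10]), Leaf([15, 20]), Leaf([25, 30])
a.next, b.next = b, c
print(range_query(a, 10, 25))  # [10, 15, 20, 25]
```

In a B-tree, the same range scan would have to climb back up through internal nodes between leaves.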
A transaction is a sequence of one or more database operations (like read, write, update, or delete) performed
as a single logical unit of work. A transaction must maintain the database's consistency, even in case of failures.
1. Atomicity:
Either all operations in the transaction are completed, or none are applied.
Example: Transferring money from Account A to Account B must deduct from A and add to B; if
one fails, neither operation should occur.
2. Consistency:
A transaction must transform the database from one consistent state to another.
The integrity constraints of the database are maintained before and after the transaction.
Example: If the total amount in all bank accounts is $10,000, this must remain true after the
transaction.
3. Isolation:
Example: While transferring money, another transaction should not see a partially updated balance.
4. Durability:
Once a transaction is successfully committed, its changes are permanent, even in the case of a
system failure.
Example: After a successful transfer, the updated balances must remain in the database even if
the system crashes.
States of a Transaction
1. Active:
The transaction is executing its read and write operations.
Example: A money transfer is in progress and is still updating balances.
2. Partially Committed:
The transaction has completed its operations but has not yet been finalized.
Example: All database operations are executed, but the transaction is waiting for a commit.
3. Committed:
The transaction has completed successfully, and its changes are made permanent.
Example: Money transfer is finalized, and both accounts reflect the updated balances.
4. Failed:
The transaction cannot proceed due to an error (e.g., system failure or constraint violation).
5. Aborted:
The transaction is terminated, and all its operations are rolled back (undone).
Transactions ensure that database operations are consistent, reliable, and recoverable.
Serializability is a concept used to ensure the correctness of concurrent transactions. It ensures that the outcome of executing multiple transactions concurrently is the same as if the transactions were executed serially (one after the other).
Types of Serializability
1. Conflict Serializability
2. View Serializability
1. Conflict Serializability
A schedule (sequence of operations) is conflict serializable if it can be transformed into a serial schedule by
swapping non-conflicting operations.
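One standard way to test conflict serializability is a precedence graph: draw an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, and check for cycles; the schedule is conflict serializable exactly when the graph is acyclic. A Python sketch (the schedule encoding is an assumption for illustration):

```python
# Schedule = list of (transaction, op, item); op is 'R' or 'W'.
# Edge Ti -> Tj whenever an op of Ti conflicts with a LATER op of Tj
# (different transactions, same item, at least one write).
def is_conflict_serializable(schedule):
    edges = set()
    txns = {t for t, _, _ in schedule}
    for i, (ti, opi, xi) in enumerate(schedule):
        for tj, opj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and 'W' in (opi, opj):
                edges.add((ti, tj))

    # Acyclic precedence graph <=> conflict serializable (DFS cycle check).
    def has_cycle(node, visiting, done):
        visiting.add(node)
        for a, b in edges:
            if a == node:
                if b in visiting or (b not in done and has_cycle(b, visiting, done)):
                    return True
        visiting.discard(node)
        done.add(node)
        return False

    return not any(has_cycle(t, set(), set()) for t in txns)

serial = [('T1', 'R', 'A'), ('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'W', 'A')]
tangled = [('T1', 'R', 'A'), ('T2', 'W', 'A'), ('T1', 'W', 'A')]
print(is_conflict_serializable(serial))   # True
print(is_conflict_serializable(tangled))  # False (edges T1->T2 and T2->T1 form a cycle)
```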
2. View Serializability
A schedule is view serializable if it produces the same final result as a serial schedule, even if operations cannot
be swapped like in conflict serializability.
Example: a schedule whose final values of A and B match those of some serial schedule is view serializable, even if its operations cannot be swapped pairwise.
Comparison
Conflict Serializability: Checks whether non-conflicting operations can be swapped into a serial order; suited to systems where operations can be reordered.
View Serializability: Ensures the same final result as a serial schedule; suited to systems where operation reordering is not possible.
1. Hashing in DBMS
Hashing is a technique used to map data to a fixed-size value called a hash key or hash code. This mapping is done using a hash function, which enables quick data retrieval, especially in large datasets.
Types of Hashing
1. Static Hashing
Description: The hash table size is fixed, and the same hash function is used throughout the lifetime of the database.
Structure: Keys are mapped to buckets using the hash function.
Characteristics:
Easy to implement.
Performance degrades as the data grows beyond the table's capacity.
Example:
Hash Function: h(key) = key % 10
Keys: {11, 22, 33}
Buckets: Bucket 1 → 11, Bucket 2 → 22, Bucket 3 → 33.
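This can be sketched directly (the extra key 43 is an assumption, added to show a collision chaining inside a bucket):

```python
# Static hashing sketch: a fixed table of 10 buckets with h(key) = key % 10.
# Overflow within a bucket is handled here by simple chaining (a list per bucket).
NUM_BUCKETS = 10
buckets = [[] for _ in range(NUM_BUCKETS)]

def h(key):
    return key % NUM_BUCKETS

for key in (11, 22, 33, 43):
    buckets[h(key)].append(key)

print(buckets[1])  # [11]
print(buckets[3])  # [33, 43] -- 43 collides with 33 and chains in bucket 3
```

As more keys pile into the same fixed buckets, the chains grow and lookups degrade, which is exactly the weakness of static hashing noted above.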
2. Dynamic Hashing
Description: The hash table size grows or shrinks dynamically as the dataset changes.
Characteristics:
Adapts to changes in data size.
Reduces overflow issues.
Example:
Keys: {101, 202, 303}.
When a bucket overflows, the table doubles in size, and data is redistributed using a new hash
function.
3. Extendible Hashing
Description: A form of dynamic hashing where a directory structure points to hash buckets.
Characteristics:
Directory doubles in size when buckets overflow.
Uses a binary representation of hash values.
Example:
Directory: 00 → Bucket 0, 01 → Bucket 1, 10 → Bucket 2.
4. Linear Hashing
Description: A dynamic hashing technique that grows the table incrementally rather than doubling its size.
Characteristics:
Avoids sudden large memory usage.
Efficient for applications with gradual data growth.
Example:
If the hash table size is initially 4, it increases gradually as buckets overflow (e.g., 4 → 5 → 6).
A log-based protocol ensures database consistency and supports recovery from system failures by maintaining
a log of all database transactions. The log is a sequential record of all operations performed by transactions.
The log must be written to stable storage before any changes are made to the database.
1. Undo Logging:
Condition: Write the log entry <T, X, old_value> to stable storage before updating X in the database, and write <T, commit> only after all of T's changes have reached disk.
2. Redo Logging:
Condition: Write the log entries and the <T, commit> record to stable storage before any of T's changes are written to the database; each entry records the new value as <T, X, new_value>.
3. Undo-Redo Logging:
Condition: Write the log entry <T, X, old_value, new_value> before updating the database; data pages may then be flushed before or after the commit record.
Advantages: Supports both undoing uncommitted changes and redoing committed ones, giving the buffer manager freedom in when to flush pages.
Disadvantages: Larger log records and extra log I/O add storage and performance overhead.
Log-based protocols are fundamental in DBMS to ensure data integrity and support effective recovery in multi-transactional environments.
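Undo-style recovery can be sketched as a backward scan of the log that restores old values written by transactions that never committed (the log record shapes here are assumptions for illustration):

```python
# Assumed log record shapes:
#   ('START', T), ('UPDATE', T, X, old_value), ('COMMIT', T)
# Undo rule: scan the log backwards and restore the old value of every
# update made by a transaction that has no COMMIT record.
def undo_recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == 'COMMIT'}
    for rec in reversed(log):
        if rec[0] == 'UPDATE' and rec[1] not in committed:
            _, t, item, old = rec
            db[item] = old  # undo the uncommitted change
    return db

db = {'A': 999, 'B': 50}  # state on disk at crash time
log = [
    ('START', 'T1'), ('UPDATE', 'T1', 'A', 100), ('COMMIT', 'T1'),
    ('START', 'T2'), ('UPDATE', 'T2', 'B', 200),  # T2 never committed
]
print(undo_recover(log, db))  # T1's change to A survives; B is restored to 200
```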
The Timestamp-Based Protocol is a concurrency control method used in databases to ensure serializability of
transactions. It assigns a unique timestamp to each transaction and uses these timestamps to order the execution
of operations, ensuring that conflicting operations follow the order of their timestamps.
Key Concepts
1. Timestamp (TS):
A unique identifier for each transaction, typically based on the system clock or a counter.
Older transactions have smaller timestamps, while newer transactions have larger timestamps.
The protocol ensures serializability by controlling read and write operations as follows:
Read Rule for transaction T on item X:
If TS(T) < W-timestamp(X), the read is rejected and T is rolled back. This means T is trying to read a value of X that has already been overwritten by a more recent transaction.
Otherwise, T is allowed to read X, and R-timestamp(X) is updated to max(R-timestamp(X), TS(T)).
Write Rule for transaction T on item X:
If TS(T) < R-timestamp(X), the write is rejected. This means T is trying to write a value that has already been read by a newer transaction.
If TS(T) < W-timestamp(X), the write is rejected. This means T is trying to write a value of X that has already been written by a newer transaction.
Otherwise, T is allowed to write X, and W-timestamp(X) is updated to TS(T).
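These rules can be sketched as follows (simplified: per-item R- and W-timestamps, with a returned 'rollback' standing in for aborting and restarting the transaction):

```python
# Basic timestamp ordering (no Thomas write rule).
read_ts = {}   # item -> largest timestamp that has read it
write_ts = {}  # item -> timestamp of the last write

def read(ts, item):
    if ts < write_ts.get(item, 0):
        return 'rollback'  # item already overwritten by a newer transaction
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return 'ok'

def write(ts, item):
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return 'rollback'  # a newer transaction already read or wrote the item
    write_ts[item] = ts
    return 'ok'

print(read(1, 'X'))   # T1 (TS=1) reads X: ok
print(write(2, 'X'))  # T2 (TS=2) writes X: ok
print(write(1, 'X'))  # T1 writes X after T2's write: rollback
```

This reproduces the example below: T1's late write is rejected because TS(T1) = 1 is older than W-timestamp(X) = 2.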
Example
Transactions:
T1: Timestamp = 1
T2: Timestamp = 2
Operations:
1. T1: Read(X): allowed, since no newer transaction has written X; R-timestamp(X) becomes 1.
2. T2: Write(X): allowed, since TS(T2) = 2 is at least as large as R-timestamp(X) and W-timestamp(X); W-timestamp(X) becomes 2.
3. T1: Write(X): rejected, since TS(T1) = 1 < W-timestamp(X) = 2; T1 is rolled back and restarted with a new timestamp.
Advantages
Deadlock-Free: Transactions never wait for locks, so deadlocks cannot occur.
Serializability: Conflicting operations always execute in timestamp order.
Disadvantages
1. Cascading Rollbacks: Aborting one transaction can force transactions that read its data to abort as well.
2. Starvation: A transaction may be restarted repeatedly if it gets a smaller timestamp compared to others.
3. Overhead: Maintaining timestamps for transactions and data items adds complexity and requires additional storage.
Use Case
Timestamp-based protocols are suitable for applications with frequent read-write conflicts where deadlock prevention is critical, but they may not be ideal for high-contention environments due to the risk of starvation and frequent aborts.
Q)ACID properties?
ACID Properties in DBMS
ACID is a set of properties that guarantee that database transactions are processed reliably and ensure the integrity of the database, even in cases of system crashes, power failures, or errors during transaction processing.
1. Atomicity
Definition: A transaction is an atomic unit of work. It is either fully completed or fully rolled back. If any
part of the transaction fails, the entire transaction is canceled, and the database remains unchanged.
Example: If a money transfer operation involves two steps—deducting money from one account and
adding it to another—both steps must be completed. If one step fails, the entire transaction is rolled
back, and neither account is updated.
2. Consistency
Definition: A transaction must take the database from one consistent state to another. After the transaction completes, all database rules, constraints, and relationships must be satisfied, ensuring data integrity.
Example: A transaction transferring money from one account to another must preserve the total balance
across all accounts. If the system had a rule that no account could go below a minimum balance, the
transaction must not violate that rule.
3. Isolation
Definition: Transactions should operate independently and not interfere with each other. The results of a transaction should not be visible to other transactions until it is fully committed. This ensures that intermediate results of one transaction do not affect other concurrent transactions.
Example: If two transactions are trying to transfer money between accounts simultaneously, one transaction should not see the changes made by the other until both are completed.
4. Durability
Definition: Once a transaction is committed, its changes to the database are permanent, even in the
case of a system crash or failure. The system must ensure that committed data is saved to non-volatile
storage.
Example: After successfully completing a transaction that transfers money between two accounts, the changes (such as new balances) should persist even if the database crashes immediately after the commit.
Summary:
Atomicity: A transaction is fully completed or fully rolled back. Example: If a transfer fails after deducting money, the system reverts to the original state.
Consistency: The database remains in a valid state before and after the transaction. Example: A transfer must not violate account balance constraints or database rules.
Isolation: Transactions do not interfere with each other. Example: Transaction A's changes should not be visible to Transaction B until A is committed.
Durability: Committed transactions are permanent and unaffected by crashes. Example: After a successful transfer, the changes (account balances) persist even if the system crashes.
ACID properties are crucial in DBMS to ensure that transactions are executed in a reliable, consistent, and recoverable manner. This guarantees the integrity of the database and prevents issues such as data corruption or loss during concurrent processing or system failures.
Normalization is the process of organizing the attributes and tables of a database to reduce redundancy and dependency. The goal is to ensure that the data is stored in the most efficient way, maintaining data integrity and reducing the likelihood of anomalies (insertion, update, and deletion anomalies).
Normalization involves dividing a database into two or more tables and defining relationships between the tables. It follows a series of stages called normal forms (NF), each of which eliminates specific types of redundancy.
Normal Forms
1. First Normal Form (1NF)
A table is in 1NF if every attribute contains only atomic (single) values. A table whose Subjects column lists multiple subjects in one cell violates 1NF.
After 1NF:
Student_ID Student_Name Subject
1 Alice Math
1 Alice Science
2 Bob History
2 Bob Math
It is in 1NF.
2. Second Normal Form (2NF)
A table is in 2NF if it is in 1NF and has no partial dependencies. Here, the non-prime attribute Student_Name depends only on Student_ID, and Course_Name depends only on Course_ID. These are partial dependencies.
After 2NF:
Students Table:
Student_ID Student_Name
1 Alice
2 Bob
Courses Table:
Course_ID Course_Name
101 Math
102 Science
103 History
Enrollments Table:
Student_ID Course_ID
1 101
1 102
2 101
2 103
It is in 2NF.
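That this decomposition loses no information can be sketched by joining the three tables back together (Python dictionaries stand in for key lookups):

```python
# The three 2NF tables above, as simple Python structures.
students = {1: "Alice", 2: "Bob"}
courses = {101: "Math", 102: "Science", 103: "History"}
enrollments = [(1, 101), (1, 102), (2, 101), (2, 103)]

# Joining Enrollments with Students and Courses on their keys
# reproduces every original enrollment fact (a lossless join).
joined = [(s, students[s], c, courses[c]) for s, c in enrollments]
print(joined[0])  # (1, 'Alice', 101, 'Math')
```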
3. Third Normal Form (3NF)
A table is in 3NF if it is in 2NF and has no transitive dependencies. Here, Department_Head depends on Department, not directly on Student_ID, creating a transitive dependency.
After 3NF:
Students Table:
Student_ID Student_Name Department
1 Alice CS
2 Bob Math
Departments Table:
Department Department_Head
CS Dr. Smith
It is in 3NF.
4. Boyce-Codd Normal Form (BCNF)
For every functional dependency, the determinant (the attribute on the left side) must be a superkey.
Example (Enrollments table):
Student_ID Course_ID Instructor
1 101 Prof. A
1 102 Prof. B
2 101 Prof. A
Here, the Instructor depends on Course_ID, not just on the primary key (Student_ID, Course_ID), so it violates BCNF.
After BCNF:
Enrollments Table:
Student_ID Course_ID
1 101
1 102
2 101
Courses Table:
Course_ID Instructor
101 Prof. A
102 Prof. B
Summary: 1NF eliminates multi-valued attributes; 2NF eliminates partial dependencies; 3NF eliminates transitive dependencies (non-prime attributes depend only on the key); BCNF requires every determinant to be a superkey.
Benefits of Normalization
Reduces Redundancy: Redundant data is minimized, reducing storage and making updates
easier.
Normalization helps design efficient and well-organized databases by breaking down large, complex tables into
smaller ones, maintaining relationships among them through foreign keys.
A good relation design in a database ensures efficiency, minimizes redundancy, and supports the maintenance
of data integrity. Here are the key features of a well-designed relational database schema:
1. Minimal Redundancy
Feature: Each piece of information is stored only once, reducing storage requirements and the risk of
inconsistencies.
Example: Storing student details in a separate Students table rather than duplicating the same details in
every enrollment record.
2. Data Integrity
Feature: Relational designs use constraints (such as primary keys, foreign keys, and unique constraints)
to maintain data integrity.
Example: A foreign key constraint between a Course table and a Student table ensures that only valid
students can be enrolled in a course.
3. Avoidance of Anomalies
Definition: Prevents issues like insertion, update, and deletion anomalies that can occur due to poor
design.
Feature: Proper normalization to higher normal forms (like 3NF or BCNF) helps eliminate these anomalies.
Example: An insertion anomaly occurs if course details cannot be recorded until at least one student enrolls. This can be avoided by separating course-related data into its own table.
4. Scalability and Flexibility
Definition: The design should easily accommodate future growth or changes in requirements without needing major redesigns.
Feature: A good design uses normalization and modularity, ensuring that new entities or relationships
can be added without disrupting existing structures.
Example: Adding new subjects or student records can be done without altering the existing course
structure, as subjects are in a separate table.
5. Efficient Query Performance
Definition: The design should support quick and efficient queries, ensuring minimal computational overhead.
Feature: Proper indexing and use of primary and foreign keys help optimize query performance.
Example: Creating an index on the Student_ID column helps retrieve student information faster.
6. Clear Relationships
Feature: The design should have clear foreign key relationships and should reflect the business rules or real-world entities.
Example: A Student_Course table clearly represents the many-to-many relationship between students
and courses.
7. Support for ACID Transactions
Definition: The relation design should ensure that transactions can adhere to ACID (Atomicity, Consistency, Isolation, Durability) properties.
Feature: Constraints and proper schema design ensure that operations on data remain consistent, even
in case of failures.
Example: A Transaction table that keeps track of financial operations should be designed to guarantee
that each transaction is atomic and consistent.
8. Appropriate Data Types
Definition: Each column in the table should have an appropriate data type to minimize wasted space and ensure data accuracy.
Feature: Choosing the correct data type for each attribute ensures that data is stored efficiently and validation rules are applied.
Example: Using INT for age or VARCHAR for names ensures efficient storage.
9. Minimal Need for Complex Joins
Definition: The design should minimize the need for complex joins, which can decrease query performance.
Feature: The schema should be normalized to an extent that makes joins simpler and more efficient.
Example: Instead of storing all data in a single, overly large table, breaking the data into logically connected smaller tables minimizes the need for complex joins.
10. Clear and Consistent Naming
Definition: Naming conventions should be clear, consistent, and follow a logical pattern.
Feature: Tables, columns, and constraints should have descriptive names that reflect the data they
store.
Example: Naming a column Student_Name in the Students table is clearer than just Name, as it defines
the context.
Concurrency Control is a mechanism used in database management systems (DBMS) to manage simultaneous transactions in a way that ensures the consistency and correctness of the database. Since multiple transactions may execute concurrently, proper control is required to prevent conflicts and ensure that the database remains in a consistent state.
The main goal of concurrency control is to achieve serializability, which means that the concurrent execution of
transactions should result in the same state as if the transactions were executed serially (one after another).
1. Lock-Based Protocols
Definition: Transactions acquire locks on data items before accessing them. Locks prevent
other transactions from accessing the same data item in conflicting ways.
Types of Locks:
Shared Lock (S-lock): Allows a transaction to read a data item, but prevents other
transactions from modifying it.
Exclusive Lock (X-lock): Allows a transaction to both read and write a data item, and
prevents others from accessing it.
Two-Phase Locking (2PL): A protocol that requires each transaction to acquire all its locks before it releases any. This guarantees serializability but can lead to deadlocks.
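The shared/exclusive compatibility rule behind these protocols can be sketched as a toy lock manager (wait queues, lock upgrades, and the two-phase discipline itself are omitted):

```python
# locks: item -> list of (transaction, mode); 'S' locks share, 'X' is exclusive.
locks = {}

def acquire(txn, item, mode):
    """Grant the lock if compatible with all locks other transactions hold."""
    held = locks.setdefault(item, [])
    for other, other_mode in held:
        # Any pair involving an X lock from a different transaction conflicts.
        if other != txn and (mode == 'X' or other_mode == 'X'):
            return 'wait'
    held.append((txn, mode))
    return 'granted'

print(acquire('T1', 'A', 'S'))  # granted
print(acquire('T2', 'A', 'S'))  # granted (S locks are compatible)
print(acquire('T3', 'A', 'X'))  # wait (X conflicts with the held S locks)
```

Two transactions each holding an S lock and both requesting an X lock on the same item would both get 'wait', which is exactly how deadlocks arise under 2PL.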
2. Timestamp-Based Protocols
Definition: Every transaction is assigned a unique timestamp. The database system uses these
timestamps to determine the order of transactions and ensures that conflicting transactions are
executed in timestamp order.
3. Optimistic Concurrency Control (OCC)
Definition: In OCC, a transaction is allowed to execute without locking resources, but before committing, the system checks for conflicts with other transactions. If a conflict is detected, the transaction is rolled back.
Phases:
1. Read Phase: The transaction reads data and applies its changes to a private workspace.
2. Validation Phase: The system checks if the transaction can commit without causing conflicts.
3. Write Phase: The transaction writes the changes to the database if no conflicts are found.
4. Multiversion Concurrency Control (MVCC)
Definition: MVCC allows multiple versions of data to exist. Each transaction reads a snapshot of the data at the time it started, and transactions modify data without blocking others, creating new versions of the data.
Example: Systems like PostgreSQL use MVCC to allow concurrent reads and writes.
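A minimal MVCC sketch (simplified: each write appends a version tagged with the writer's timestamp, and a reader picks the newest version no newer than its snapshot; real systems such as PostgreSQL also track the visibility of uncommitted versions):

```python
# item -> list of (commit_ts, value) versions, oldest first.
versions = {'X': [(0, 'initial')]}

def write(ts, item, value):
    """Append a new version instead of overwriting the old one."""
    versions[item].append((ts, value))

def snapshot_read(ts, item):
    """Return the newest version committed at or before the snapshot ts."""
    return max(v for v in versions[item] if v[0] <= ts)[1]

write(5, 'X', 'updated')
print(snapshot_read(3, 'X'))  # 'initial' -- snapshot taken before TS 5
print(snapshot_read(7, 'X'))  # 'updated'
```

The reader at timestamp 3 is never blocked by the writer at timestamp 5; it simply sees the older version, which is the core benefit of MVCC.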
When transactions execute concurrently, several problems can arise due to conflicting operations on the same
data. The main problems include:
1. Lost Update
Definition: This occurs when two or more transactions update the same data item, and one of the updates is lost because it is overwritten by another.
Example: Transaction T1 reads a data item X, modifies it, and writes it back. Meanwhile, transaction T2
also reads the same X, modifies it, and writes it back. The update from T1 is lost, and only T2's changes
are retained.
2. Inconsistent (Unrepeatable) Read
Definition: A transaction reads data that is in an intermediate or inconsistent state, which can lead to incorrect results.
Example: Transaction T1 reads a record of an account balance, performs some calculations, and writes
the result. Meanwhile, transaction T2 updates the account balance, leading T1 to perform calculations
based on outdated data.
3. Dirty Read
Definition: A transaction reads data that has been modified by another transaction but not yet committed. If the second transaction is rolled back, the first transaction will have read invalid or "dirty" data.
Example: Transaction T1 reads an account balance while T2 is updating that balance but hasn't committed yet. If T2 aborts, T1 will have used invalid data in its operations.
4. Non-Serializable Schedules
Definition: An interleaving of operations whose final result does not match any serial execution of the transactions.
Example: Transaction T1 reads a data item, and T2 writes it. If the two transactions are interleaved in a
non-serializable order, the final result might not be the same as if T1 and T2 had executed one after the
other.
Deadlock occurs when two or more transactions are waiting for each other to release locks, causing them to be
stuck indefinitely.
Deadlock Prevention: Systems can prevent deadlock by using techniques like acquiring all locks before executing a transaction or using a timeout approach.
Deadlock Detection: Periodically checks for cycles in the wait-for graph to identify and resolve deadlocks, often by aborting one or more transactions.
Non-serializability refers to a situation where the execution of concurrent transactions in a database does not produce the same final result as some serial execution of those transactions. A serial execution is one where the transactions are executed one after the other without any interleaving. When transactions are interleaved (executed concurrently), the order in which operations occur can affect the final state of the database. If the interleaving leads to a final state that cannot be reproduced by a serial execution, the schedule is considered non-serializable.
Non-serializability is a critical problem in database management systems, as it may lead to inconsistent or incorrect results. A schedule of transactions is said to be serializable if the result of executing the transactions in a specific order is equivalent to executing them serially (one by one).
1. Conflict Serializable
2. View Serializable
A conflict serializable schedule is one where the transactions can be reordered (without changing the relative
order of conflicting operations) to form a serial schedule. Conflicting operations are operations that access the
same data item and at least one of them is a write.
Conflict Operations: Two operations conflict if they meet all the following conditions: they belong to different transactions, they access the same data item, and at least one of them is a write.
A view serializable schedule is one where, if we look at the values of the data items read and written by the
transactions, the schedule can be transformed into a serial schedule that produces the same final results.
View serializability is a broader concept than conflict serializability because it allows a larger set of interleaved operations to still produce the same final result.
Transaction T1:
1. Read(A)
2. Write(A)
Transaction T2:
1. Read(A)
2. Write(A)
Interleaved Schedule (Step, Operation, Transaction):
1 Read(A) T1
2 Read(A) T2
3 Write(A) T1
4 Write(A) T2
The schedule is non-serializable when the operations of T1 and T2 are interleaved so that both transactions read A before either writes it: each write is then based on a stale value, and the final state of A depends on the interleaving rather than matching either serial order (T1 before T2, or T2 before T1).
In this example, the interleaved operations cannot be transformed into a serial schedule without changing the
outcome, hence non-serializability occurs.
1. Inconsistent Data: Non-serializable schedules can lead to the database being in an inconsistent state.
For instance, one transaction may read an intermediate state of data that is not yet committed by another
transaction, leading to incorrect results.
2. Lost Updates: A non-serializable schedule may cause updates from one transaction to overwrite the updates of another transaction, leading to lost data.
Example (Lost Update):
Transaction T1: Read(A), then Write(A=50)
Transaction T2: Read(A), then Write(A=100)
Interleaved Schedule (Step, Operation, Transaction):
1 Read(A) T1
2 Read(A) T2
3 Write(A=50) T1
4 Write(A=100) T2
In this case, both transactions read the value of A before either transaction writes back its value. T1 reads A as some value (let's assume A = 20) and writes 50; T2, which also read the original A, then writes 100, overwriting T1's update. The final value of A is 100, and T1's update is lost.
Serial Schedule 1: T1 → T2
Final A = 100.
Serial Schedule 2: T2 → T1
Final A = 50.
The interleaved schedule silently discards T1's update: no transaction ever observes the value T1 wrote, which cannot happen under either serial schedule. This demonstrates non-serializability.
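The anomaly is starkest when each write depends on the value the transaction read. A small simulation (with hypothetical increments of +30 for T1 and +80 for T2 in place of the fixed writes above, starting from A = 20):

```python
# Lost-update simulation: each transaction reads A, then later writes a
# value computed from its own (possibly stale) read.
def run(schedule, a=20):
    reads = {}
    for txn, op in schedule:
        if op == 'read':
            reads[txn] = a                  # remember what this txn saw
        else:                               # write based on the earlier read
            a = reads[txn] + (30 if txn == 'T1' else 80)
    return a

interleaved  = [('T1', 'read'), ('T2', 'read'), ('T1', 'write'), ('T2', 'write')]
serial_t1_t2 = [('T1', 'read'), ('T1', 'write'), ('T2', 'read'), ('T2', 'write')]
serial_t2_t1 = [('T2', 'read'), ('T2', 'write'), ('T1', 'read'), ('T1', 'write')]

print(run(interleaved))   # 100 -- T1's +30 is lost
print(run(serial_t1_t2))  # 130
print(run(serial_t2_t1))  # 130
```

Both serial orders end at 130, while the interleaving ends at 100 because T2 overwrites A using a stale read, losing T1's increment.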
Conclusion
Non-serializability is a significant issue in concurrent transaction processing. It occurs when the interleaving of transaction operations leads to results that are inconsistent with every serial execution. To avoid non-serializability, DBMSs use concurrency control techniques, such as locking protocols, timestamp-based protocols, and optimistic concurrency control, that ensure serializability or manage conflicts in a way that maintains data integrity. Proper isolation levels are critical to ensuring that transactions do not interfere with each other in ways that cause non-serializability.