IGNOU MCS-23 Solved Question Exam Preparation
This guide provides concise yet comprehensive explanations and examples for each question, organized by topic.
Define:
DBMS (Database Management System): A DBMS is a software system that allows users to
define, create, maintain, and control access to the database. It acts as an interface between the
user and the database, providing tools for data manipulation, security, and integrity. Examples
include MySQL, PostgreSQL, Oracle, SQL Server.
Data Independence: Data independence is the ability to modify the schema at one level of the
database system without affecting the schema at the next higher level. It insulates applications
from changes in how data is stored and organized. There are two types:
o Logical Data Independence: The ability to change the conceptual schema without
affecting the external schemas (user views). This means that if we add a new attribute to
a table or change the data type of an existing attribute, it should not impact existing
applications that do not use those specific attributes.
o Physical Data Independence: The ability to change the internal schema (physical storage
details) without affecting the conceptual schema. This means that changes in how data
is physically stored (e.g., using different file organization, adding an index) should not
require changes to the application programs.
Differentiate between:
o DBMS vs File System:

Feature        | DBMS                                    | File System
---------------|-----------------------------------------|---------------------------------------------
Data Sharing   | Easy; concurrent access is controlled   | Difficult; concurrent access is problematic
Data Security  | High; built-in access control features  | Low; limited security mechanisms
Logical vs Physical Data Independence: (Already explained above under "Data Independence")
o Super Key: A set of one or more attributes that, taken collectively, can uniquely identify
a row in a table. It may contain redundant attributes.
o Candidate Key: A minimal super key; it is a super key such that no proper subset of its
attributes is a super key. There can be multiple candidate keys for a table.
Example: For the Employee table, if EmpID uniquely identifies an employee and
EmpPhone also uniquely identifies an employee, then {EmpID} and {EmpPhone}
are candidate keys. {EmpID, EmpName} is a super key but not a candidate key
because EmpID alone is sufficient.
o Primary Key: A candidate key chosen by the database designer to uniquely identify each
row in a table. It must be unique and cannot contain NULL values (NOT NULL constraint).
Each table can have only one primary key.
Example: From the candidate keys {EmpID} and {EmpPhone}, we would typically
choose EmpID as the primary key for the Employee table.
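As an illustrative sketch (the table layout and data types are assumed, not given in the question), the chosen primary key and the remaining candidate key can be declared in SQL:
SQL
-- EmpID is the chosen primary key; EmpPhone remains a candidate
-- (alternate) key, enforced with a UNIQUE constraint.
CREATE TABLE Employee (
    EmpID    INT         PRIMARY KEY,
    EmpName  VARCHAR(50) NOT NULL,
    EmpPhone VARCHAR(15) UNIQUE
);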
Explain:
o External Level (Views):
Users interact with their own "views" of the data, tailored to their specific needs.
Multiple external views can exist for the same conceptual schema.
Example: A student might see their course registration and grades, while an instructor sees their assigned courses and student rosters.
o Conceptual Level:
This level describes the entire database in terms of its logical structure, independent of physical storage.
It defines all the entities, attributes, relationships, data types, and constraints for the entire database.
o Internal Level:
It describes how the data is actually stored on the physical storage devices (e.g., hard drives).
Example: How Student records are stored in blocks on disk, the type of index used for StudentID, and the order of attributes within a record.
Diagram:
+-----------------+     +-----------------+
| External View 1 | ... | External View N |
+-----------------+     +-----------------+
        | Mapping 1             | Mapping N
        v                       v
        +-----------------------+
        |   Conceptual Schema   |
        +-----------------------+
                    | Mapping
                    v
        +-----------------------+
        |    Internal Schema    |
        +-----------------------+
o Query Processor: Interprets user queries and converts them into an executable form. It
includes the DDL interpreter, the DML compiler (with the query optimizer), and the query
evaluation engine.
o Storage Manager: Responsible for the interaction with the file system and managing the
physical storage of data. It includes:
Authorization and Integrity Manager: Checks for user permissions and ensures
that data integrity constraints are maintained.
Buffer Manager: Manages the main memory (buffer) for caching data blocks
from disk.
File Manager: Organizes data into files and manages their allocation on disk.
Index Manager: Manages the creation and use of indexes to speed up data
retrieval.
o Utilities: A set of tools for database administration tasks like backup, recovery,
performance monitoring, data loading, and reorganization.
o User Interfaces: Various interfaces for different types of users (e.g., SQL command line,
graphical tools).
ER to relational mapping steps: Converting an Entity-Relationship (ER) model into a relational
schema involves a systematic process to translate entities, attributes, and relationships into
tables, columns, and keys.
1. Strong entity types: Create a relation containing all the entity's simple attributes.
Choose one of its candidate keys as the primary key. If composite, all attributes
of the composite key become part of the primary key.
2. Weak entity types: Create a relation containing the weak entity's simple attributes.
Include the primary key of its identifying (owner) strong entity type as a foreign
key.
The primary key of the weak entity's relation will be the combination of the
foreign key from the owner entity and its own partial key (discriminator).
3. 1:1 relationship types:
Option A (Foreign Key in one entity): Add the primary key of one entity type as a
foreign key in the relation of the other entity type. The choice often depends on
participation constraints. If one side has total participation, its relation often
gets the FK.
Option B (Combined relation): If both entities have total participation, they can
be merged into a single relation.
4. 1:N relationship types: Add the primary key of the "one" side as a foreign key in the
relation of the "many" side.
5. M:N relationship types: Create a new relation for the relationship.
Include the primary keys of both participating entity types as foreign keys in this
new relation.
The primary key of the new relation will be the combination of these foreign
keys.
Include any attributes of the relationship itself as columns in this new relation.
6. Multivalued attributes: Create a new relation containing the attribute plus a foreign key
to the owning entity. The primary key of this new relation will be the primary key of the
original entity type combined with the multivalued attribute itself. (See the sketch after
this list.)
7. Composite attributes: Include only the component simple attributes as columns.
Example: Address (composite of Street, City, ZipCode). The Student table would
have Street, City, ZipCode as separate columns.
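A minimal SQL sketch of step 6, assuming a hypothetical multivalued Phone attribute on Student:
SQL
-- Each phone number becomes its own row; the primary key is the
-- owner's key plus the multivalued attribute itself.
CREATE TABLE Student (
    StudentID INT PRIMARY KEY,
    Name      VARCHAR(50)
);
CREATE TABLE StudentPhone (
    StudentID INT,
    Phone     VARCHAR(15),
    PRIMARY KEY (StudentID, Phone),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID)
);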
Library Management:
o Entities: Member (MemberID), Book (BookID), Borrowing, Librarian (LibID).
o Relationships:
Borrowing has Book: 1:1 (one borrowing record refers to one book instance).
Librarian manages Borrowing: 1:N (one librarian can manage multiple borrowing
records).
o Assumptions: A book can have multiple copies, but here BookID refers to a unique book
type. If we need to track individual copies, we'd add a BookCopy entity.
+-----------+   manages (1:N)   +-----------+
| Librarian |------------------>| Borrowing |
|  (LibID)  |                   +-----------+
+-----------+
o Note: A full ER diagram would use standard ER notation (rectangles for entities,
diamonds for relationships, ovals for attributes, lines for connections, and cardinality
notations).
University/College Database:
o Entities: Student, Course, Enrollment, Instructor.
o Relationships:
Enrollment has Student: 1:1 (one enrollment record for one student).
Enrollment for Course: 1:1 (one enrollment record for one course instance).
Instructor teaches Course: 1:N (one instructor can teach multiple courses, a
course is taught by one instructor usually).
o Entities:
Customer: CustomerID (PK), Name, Address, Phone, Email, DoB.
Account, Transaction, Employee (their attributes are omitted here; they appear in
the relationships below).
o Relationships:
Customer holds Account: 1:N (one customer can hold multiple accounts).
Account has Transaction: 1:N (one account can have many transactions).
Employee handles Customer: 1:N (one employee can handle many customers).
Convert ER model to relational schema: (Refer to the "ER to relational mapping steps" in Section 1. I'll
provide a simplified example for the Library Management system.)
ER Model (simplified):
Member (MemberID, Name, ...)
Book (BookID, Title, ...)
Borrows (M:N relationship between Member and Book, with BorrowDate, DueDate as
attributes)
Relational Schema:
Member (MemberID PK, Name, ...)
Book (BookID PK, Title, ...)
Borrows (MemberID, BookID, BorrowDate, DueDate)
PK: (MemberID, BookID)
MemberID FK REFERENCES Member(MemberID)
BookID FK REFERENCES Book(BookID)
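A hedged SQL sketch of this schema (attribute lists beyond the keys are assumed):
SQL
CREATE TABLE Member (
    MemberID INT PRIMARY KEY,
    Name     VARCHAR(50)
);
CREATE TABLE Book (
    BookID INT PRIMARY KEY,
    Title  VARCHAR(100)
);
CREATE TABLE Borrows (
    MemberID   INT,
    BookID     INT,
    BorrowDate DATE,
    DueDate    DATE,
    PRIMARY KEY (MemberID, BookID),
    FOREIGN KEY (MemberID) REFERENCES Member(MemberID),
    FOREIGN KEY (BookID)   REFERENCES Book(BookID)
);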
Explain:
o Simple Attribute: An attribute that cannot be further divided into smaller components.
It has a single atomic value.
Example: Age, Salary, StudentID.
o Composite Attribute: An attribute that can be divided into smaller, more meaningful
simple attributes.
Example: Address (can be divided into Street, City, State, ZipCode), Name (can
be divided into FirstName, MiddleInitial, LastName).
o Derived Attribute: An attribute whose value can be computed or derived from other
attributes in the database. It is not physically stored in the database but is generated
when needed.
Example: Age can be derived from DateOfBirth and the current date.
Types of Relationships (1:1, 1:N, M:N): These describe the number of instances of one entity
that can be associated with the number of instances of another entity.
o One-to-One (1:1): An instance of entity A is associated with at most one instance of
entity B, and vice versa.
Example: Employee 1:1 Parking Space (one employee is assigned one parking
space, and one parking space is assigned to one employee).
o One-to-Many (1:N or 1:M): An instance of entity A can be associated with many
instances of entity B, but an instance of entity B is associated with at most one instance
of entity A.
Example: Department 1:N Employee (one department has many employees;
each employee belongs to one department).
o Many-to-Many (M:N): An instance of entity A can be associated with many instances of
entity B, and vice versa.
Example: Student M:N Course (a student can enroll in many courses, and a
course can have many students enrolled).
Relational Algebra: A procedural query language that takes relations as input and produces
relations as output. It forms the theoretical basis for SQL and other query languages. It consists
of a set of fundamental operations.
o Selection ($\sigma$): Selects a subset of tuples from a relation that satisfy a specified
condition.
Syntax: $\sigma_{\text{condition}}(R)$
o Projection ($\pi$): Selects specified columns (attributes) from a relation, eliminating
duplicate tuples.
Syntax: $\pi_{\text{attribute list}}(R)$
o Join ($\bowtie$): Combines tuples from two relations based on a common attribute or a
join condition.
Natural Join ($\bowtie$): Joins relations on all common attributes with the same
name, eliminating duplicate columns.
Syntax: $R \bowtie S$
Example: Join Employee and Department on DeptID: $\text{Employee} \bowtie \text{Department}$
Theta Join ($\bowtie_\theta$): Joins relations on an arbitrary condition $\theta$.
Syntax: $R \bowtie_\theta S$
o Union ($\cup$): Combines two relations (must be union-compatible, i.e., same number of
attributes and corresponding attribute domains). Removes duplicates.
Syntax: $R \cup S$
o Set Difference ($-$): Returns tuples that are in the first relation but not in the second
(union-compatible relations).
Syntax: $R - S$
"Find students enrolled in more than one course" This requires a self-join or a combination of
operations. Let's find pairs of (StudentID, CourseID) and then group by StudentID. Temp1 =
pi_textStudentID(textEnrolls) Temp2 = rho_textStudentID2,CourseID2,Grade2(textEnrolls) Result
= $\\pi\_{\\text{E1.StudentID}}(\\sigma\_{\\text{E1.StudentID = E2.StudentID AND E1.CourseID
\<\> E2.CourseID}}(\\text{Enrolls as E1} \\times \\text{Enrolls as E2}))$ (This is a common pattern
for "more than one" using self-join. Alternatively, one could use extended relational algebra
operations like grouping and counting, which are not standard fundamental RA but often
supported.)
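For comparison, the same query in SQL using grouping, against the ENROLLMENTS table defined in the SQL section below:
SQL
-- Students with more than one enrollment
SELECT StudentID
FROM ENROLLMENTS
GROUP BY StudentID
HAVING COUNT(*) > 1;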
Tuple Relational Calculus (TRC) queries: A non-procedural query language where queries are
expressed as predicates that define a set of tuples. It describes what to retrieve, not how.
Syntax: { t | P(t) }, where t is a tuple variable and P(t) is a predicate.
o Example: Find all employees with salary > 50000: { t | Employee(t) AND t.Salary > 50000
}
Domain Relational Calculus (DRC) examples: Similar to TRC but uses domain variables (variables
representing attribute values) instead of tuple variables. Syntax: { a1, a2, ..., an | P(a1, a2, ..., an)
}, where a_i are domain variables.
o Example: Find all employees with salary > 50000 (EmpID, EmpName, DeptID, Salary are
domain variables): { e, n, d, s | Employee(e, n, d, s) AND s > 50000 }
o Example: Find names of employees in 'Sales' department (e, n, d_emp are for Employee;
d_dept, d_name, loc are for Department): { n | EXISTS e, d_emp, s (Employee(e, n,
d_emp, s) AND EXISTS d_dept, d_name, loc (Department(d_dept, d_name, loc) AND
d_emp = d_dept AND d_name = 'Sales')) }
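A SQL equivalent of this DRC query, shown as a sketch against the EMPLOYEES/DEPARTMENTS tables defined below:
SQL
SELECT E.EmpName
FROM EMPLOYEES E
JOIN DEPARTMENTS D ON E.DeptID = D.DeptID
WHERE D.DeptName = 'Sales';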
Assume tables: EMPLOYEES (EmpID INT PRIMARY KEY, EmpName VARCHAR(50), DeptID INT, Salary
DECIMAL(10,2)) DEPARTMENTS (DeptID INT PRIMARY KEY, DeptName VARCHAR(50), Location
VARCHAR(50)) COURSES (CourseID INT PRIMARY KEY, CourseName VARCHAR(100), Credits INT)
STUDENTS (StudentID INT PRIMARY KEY, StudentName VARCHAR(50), Major VARCHAR(50))
ENROLLMENTS (StudentID INT, CourseID INT, Grade CHAR(2), PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES STUDENTS(StudentID), FOREIGN KEY (CourseID) REFERENCES
COURSES(CourseID))
SQL
CREATE TABLE EMPLOYEES (
    EmpID INT PRIMARY KEY,
    EmpName VARCHAR(50),
    DeptID INT,
    Salary DECIMAL(10,2)
);

CREATE TABLE DEPARTMENTS (
    DeptID INT PRIMARY KEY,
    DeptName VARCHAR(50),
    Location VARCHAR(50)
);

CREATE TABLE COURSES (
    CourseID INT PRIMARY KEY,
    CourseName VARCHAR(100),
    Credits INT
);

CREATE TABLE STUDENTS (
    StudentID INT PRIMARY KEY,
    StudentName VARCHAR(50),
    Major VARCHAR(50)
);

CREATE TABLE ENROLLMENTS (
    StudentID INT,
    CourseID INT,
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES STUDENTS(StudentID),
    FOREIGN KEY (CourseID) REFERENCES COURSES(CourseID)
);
SQL
-- INSERT
INSERT INTO DEPARTMENTS (DeptID, DeptName, Location) VALUES (101, 'Sales', 'New York');
INSERT INTO EMPLOYEES (EmpID, EmpName, DeptID, Salary) VALUES (1, 'Alice', 101, 60000.00);
INSERT INTO ENROLLMENTS (StudentID, CourseID, Grade) VALUES (1, 101, 'A');
-- UPDATE
UPDATE EMPLOYEES
SET Salary = 65000.00   -- illustrative new value
WHERE EmpID = 1;

-- DELETE
DELETE FROM EMPLOYEES
WHERE EmpID = 1;
o Left (Outer) Join: Returns all rows from the left table, and the matched rows from the
right table. NULLs for non-matches.
SQL
SELECT S.StudentName, E.CourseID
FROM STUDENTS S
LEFT JOIN ENROLLMENTS E ON S.StudentID = E.StudentID;
o Right (Outer) Join: Returns all rows from the right table, and the matched rows from the
left table. NULLs for non-matches.
SQL
SELECT E.EmpName, D.DeptName
FROM EMPLOYEES E
RIGHT JOIN DEPARTMENTS D ON E.DeptID = D.DeptID;
o Full (Outer) Join: Returns all rows when there is a match in one of the tables.
SQL
-- Not supported by all databases (e.g., MySQL doesn't directly support FULL OUTER JOIN)
SELECT E.EmpName, D.DeptName
FROM EMPLOYEES E
FULL OUTER JOIN DEPARTMENTS D ON E.DeptID = D.DeptID;

-- MySQL workaround: LEFT JOIN combined with RIGHT JOIN via UNION
SELECT E.EmpName, D.DeptName
FROM EMPLOYEES E
LEFT JOIN DEPARTMENTS D ON E.DeptID = D.DeptID
UNION
SELECT E.EmpName, D.DeptName
FROM EMPLOYEES E
RIGHT JOIN DEPARTMENTS D ON E.DeptID = D.DeptID
WHERE E.EmpID IS NULL; -- Ensures only unmatched rows from the right table are added
o Natural Join: Joins tables implicitly based on columns with the same name.
SQL
SELECT *
FROM EMPLOYEES
NATURAL JOIN DEPARTMENTS;
(Caution: Natural Join can be risky if column names are coincidentally the same but conceptually
different.)
SQL
-- Average salary per department
SELECT D.DeptName, AVG(E.Salary) AS AvgSalary
FROM DEPARTMENTS D
JOIN EMPLOYEES E ON D.DeptID = E.DeptID
GROUP BY D.DeptName;

-- Average salary per department, only for departments with more than 5 employees
SELECT D.DeptName, AVG(E.Salary) AS AvgSalary
FROM DEPARTMENTS D
JOIN EMPLOYEES E ON D.DeptID = E.DeptID
GROUP BY D.DeptName
HAVING COUNT(E.EmpID) > 5;

-- Number of courses per student
SELECT S.StudentName, COUNT(E.CourseID) AS NumCourses
FROM STUDENTS S
JOIN ENROLLMENTS E ON S.StudentID = E.StudentID
GROUP BY S.StudentName;
o Find employees whose salary is greater than the average salary of all employees:
SQL
SELECT EmpName, Salary
FROM EMPLOYEES
WHERE Salary > (SELECT AVG(Salary) FROM EMPLOYEES);
o Find the names of students who are enrolled in the 'Database Management Systems'
course:
SQL
SELECT S.StudentName
FROM STUDENTS S
WHERE S.StudentID IN (
    SELECT E.StudentID
    FROM ENROLLMENTS E
    JOIN COURSES C ON E.CourseID = C.CourseID
    WHERE C.CourseName = 'Database Management Systems'
);
o Find departments that have no employees:
SQL
SELECT DeptName
FROM DEPARTMENTS
WHERE DeptID NOT IN (SELECT DISTINCT DeptID FROM EMPLOYEES WHERE DeptID IS NOT NULL);
Views creation and use: A virtual table based on the result-set of an SQL query.
SQL
CREATE VIEW EmployeeDepartmentView AS
SELECT E.EmpID, E.EmpName, E.Salary, D.DeptName
FROM EMPLOYEES E
JOIN DEPARTMENTS D ON E.DeptID = D.DeptID;

-- Using the view like a table
SELECT EmpName, DeptName
FROM EmployeeDepartmentView
WHERE DeptName = 'Sales';
Triggers:
o Definition: A named block of code stored in the database that executes (fires)
automatically in response to a specified event (INSERT, UPDATE, or DELETE) on a table.
o Purpose: Enforce complex business rules, maintain data consistency, audit data changes,
automate tasks.
o Characteristics:
Fire automatically; cannot be invoked directly.
Can run BEFORE or AFTER the triggering event, for each row or for each
statement.
o Example: An AFTER INSERT trigger on an Orders table to automatically update the Stock
quantity in a Products table.
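A sketch of that trigger (MySQL syntax assumed; the Orders.ProductID/Quantity and Products.Stock columns are hypothetical):
SQL
DELIMITER //
CREATE TRIGGER trg_update_stock
AFTER INSERT ON Orders
FOR EACH ROW
BEGIN
    -- Decrement stock by the quantity just ordered
    UPDATE Products
    SET Stock = Stock - NEW.Quantity
    WHERE ProductID = NEW.ProductID;
END //
DELIMITER ;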
Stored Procedures:
o Definition: A named block of SQL statements (and procedural logic like loops,
conditionals) that is stored in the database and can be executed multiple times.
o Characteristics:
Can accept input parameters and return output parameters or result sets.
Stored (and typically precompiled) in the database, reducing network traffic and
enabling reuse.
o Example: A stored procedure GetEmployeeDetails(IN emp_id INT) that returns the name
and salary of a given employee.
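A minimal sketch of that procedure (MySQL syntax assumed), using the EMPLOYEES table defined earlier:
SQL
DELIMITER //
CREATE PROCEDURE GetEmployeeDetails(IN emp_id INT)
BEGIN
    SELECT EmpName, Salary
    FROM EMPLOYEES
    WHERE EmpID = emp_id;
END //
DELIMITER ;

-- Usage:
CALL GetEmployeeDetails(1);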
Cursors:
o Definition: A database object that enables traversal over the rows of a result set, one
row at a time. They allow procedural processing of query results.
o Characteristics:
Follow a declare-open-fetch-close lifecycle.
Useful when set-based SQL cannot express the required row-by-row logic.
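A hedged sketch of the declare-open-fetch-close lifecycle (MySQL stored-procedure syntax assumed; the TotalSalaries procedure is hypothetical), against the EMPLOYEES table defined earlier:
SQL
DELIMITER //
CREATE PROCEDURE TotalSalaries(OUT total DECIMAL(12,2))
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE sal DECIMAL(10,2);
    -- Declarations must come in this order: variables, cursor, handler
    DECLARE emp_cursor CURSOR FOR SELECT Salary FROM EMPLOYEES;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    SET total = 0;
    OPEN emp_cursor;
    read_loop: LOOP
        FETCH emp_cursor INTO sal;      -- one row at a time
        IF done = 1 THEN
            LEAVE read_loop;
        END IF;
        SET total = total + sal;
    END LOOP;
    CLOSE emp_cursor;
END //
DELIMITER ;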
Feature      | Triggers                   | Stored Procedures                        | Cursors
-------------|----------------------------|------------------------------------------|--------------------------------------
Control Flow | Limited procedural control | Full procedural control (loops, if/else) | Row-by-row control over a result set
Define:
o Functional Dependency (FD): An attribute set B is functionally dependent on an attribute
set A (written A -> B) if each value of A is associated with exactly one value of B.
o Example: In a relation with attributes (StudentID, StudentName, CourseID, Grade),
StudentID -> StudentName means if two rows have the same StudentID, they must have
the same StudentName. (StudentID, CourseID) -> Grade means a specific student in a
specific course gets a unique grade.
o A relation is in 1NF if all attribute values are atomic (indivisible) and each column
contains values of a single data type.
o A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the primary key. This means there are no partial dependencies.
o Partial Dependency: A non-key attribute is dependent on only part of a composite
primary key.
o A relation is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on
the primary key (every non-key attribute depends directly on a key).
o A relation is in BCNF if for every non-trivial functional dependency A -> B, A is a super
key.
o It's a stricter form of 3NF. It handles cases where 3NF falls short, particularly when
candidate keys overlap and a prime attribute is determined by something that is not a
super key.
o Difference from 3NF: 3NF allows A -> B where A is not a super key, provided B is a prime
attribute (part of some candidate key). BCNF does not allow this exception; it ensures
that every determinant is a candidate key.
Let's normalize the relation R (StudentID, StudentName, Major, AdvisorID, AdvisorName, AdvisorOffice,
CourseID, CourseName, Grade) with the following FDs:
StudentID -> StudentName, Major
Major -> AdvisorID
AdvisorID -> AdvisorName, AdvisorOffice
CourseID -> CourseName
(StudentID, CourseID) -> Grade
Step 1: Convert to 1NF. Assume the initial relation is already in 1NF (all attributes are atomic, no
repeating groups). R (StudentID, CourseID, StudentName, Major, AdvisorID, AdvisorName,
AdvisorOffice, CourseName, Grade) Candidate Key: (StudentID, CourseID)
Step 2: Convert to 2NF. Check for partial dependencies on the composite primary key (StudentID,
CourseID):
StudentID -> StudentName, Major (partial dependency: StudentName and Major depend only on
StudentID; via Major and AdvisorID, the advisor attributes also depend only on StudentID).
CourseID -> CourseName (partial dependency: CourseName depends only on CourseID).
Remove the partial dependencies by decomposing into:
STUDENTS_INFO (StudentID, StudentName, Major, AdvisorID, AdvisorName, AdvisorOffice) with FDs
StudentID -> StudentName, Major; Major -> AdvisorID; AdvisorID -> AdvisorName, AdvisorOffice
COURSES (CourseID, CourseName) with FD CourseID -> CourseName
ENROLLMENTS (StudentID, CourseID, Grade) with FD (StudentID, CourseID) -> Grade
Step 3: Convert to 3NF. Check STUDENTS_INFO (StudentID, StudentName, Major, AdvisorID,
AdvisorName, AdvisorOffice) for transitive dependencies:
Major -> AdvisorID (transitive: StudentID -> Major -> AdvisorID; Major is a non-key attribute
determining AdvisorID).
AdvisorID -> AdvisorName, AdvisorOffice (transitive: StudentID -> Major -> AdvisorID ->
AdvisorName, AdvisorOffice).
Remove the transitive dependencies by decomposing:
STUDENTS (StudentID, StudentName, Major)
MAJOR_ADVISORS (Major, AdvisorID)
ADVISORS (AdvisorID, AdvisorName, AdvisorOffice)
o Note on MAJOR_ADVISORS: for this relation to be in BCNF, Major must be a candidate
key. If a major could have multiple advisors, (Major, AdvisorID) would be the key and
the FD Major -> AdvisorID would not hold. Assuming Major uniquely determines
AdvisorID in this context, Major is a candidate key.
In this specific example, all relations are also in BCNF. BCNF issues typically arise with overlapping
candidate keys or when a non-key attribute determines part of a candidate key (which didn't happen
here after 2NF).
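As a sketch, the final decomposition for the student/advisor part can be written in SQL (data types assumed; COURSES and ENROLLMENTS are unchanged from the 2NF step):
SQL
CREATE TABLE ADVISORS (
    AdvisorID     INT PRIMARY KEY,
    AdvisorName   VARCHAR(50),
    AdvisorOffice VARCHAR(20)
);
CREATE TABLE MAJOR_ADVISORS (
    Major     VARCHAR(50) PRIMARY KEY,  -- the PK enforces Major -> AdvisorID
    AdvisorID INT,
    FOREIGN KEY (AdvisorID) REFERENCES ADVISORS(AdvisorID)
);
CREATE TABLE STUDENTS_3NF (             -- suffix avoids clashing with the STUDENTS table defined earlier
    StudentID   INT PRIMARY KEY,
    StudentName VARCHAR(50),
    Major       VARCHAR(50),
    FOREIGN KEY (Major) REFERENCES MAJOR_ADVISORS(Major)
);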
Identify partial and transitive dependencies: (Already covered in the normalization steps above)
Transitive Dependency: StudentID -> AdvisorID via Major (StudentID -> Major and Major ->
AdvisorID). StudentID -> AdvisorName, AdvisorOffice via Major -> AdvisorID.
Feature    | 3NF (Third Normal Form)                                              | BCNF (Boyce-Codd Normal Form)
-----------|----------------------------------------------------------------------|---------------------------------------------------------------
Definition | In 2NF, and no non-key attribute is transitively dependent on a key   | In 3NF, and for every non-trivial FD X -> Y, X is a super key
Redundancy | May still have some redundancy in edge cases involving overlapping    | Eliminates all forms of redundancy based on functional
           | candidate keys                                                         | dependencies
Define:
Transaction: A logical unit of work that consists of one or more database operations (e.g.,
SELECT, INSERT, UPDATE, DELETE). It must be treated as a single, indivisible unit; either all its
operations are completed successfully (committed), or none of them are (rolled back).
o Atomicity: (All or Nothing) A transaction is treated as a single, indivisible unit. Either all
operations within it are successfully completed, or none are. If any part of the
transaction fails, the entire transaction is aborted, and the database is rolled back to its
state before the transaction began.
Example: Money transfer: If debit succeeds but credit fails, the debit is rolled
back.
o Consistency: A transaction brings the database from one valid state to another valid
state. It must obey all defined integrity constraints (e.g., primary keys, foreign keys,
check constraints). If a transaction starts with a consistent database, it will end with a
consistent database.
Example: Money transfer: Total money in the system must remain constant
before and after the transaction (assuming no new money is created/destroyed).
o Isolation: The execution of concurrent transactions should not interfere with each other.
Each transaction should appear to execute in isolation, as if it were the only transaction
running. This prevents one transaction from seeing the intermediate, uncommitted
results of another transaction.
o Durability: Once a transaction has been committed, its changes are permanently stored
in the database and will survive subsequent system failures (e.g., power outages,
crashes).
o Example: After a bank transfer commits, the updated balances survive even if the
system crashes immediately afterwards (typically guaranteed by logging to stable
storage).
Dirty Read (Uncommitted Read): Occurs when a transaction reads data that has been written by
another concurrent transaction, but that data has not yet been committed (it might be rolled
back). If the uncommitted transaction fails, the first transaction has read "dirty" data.
o Example:
T1: Read Balance (100), Write Balance (90) -- not yet committed
T2: Read Balance (90) -- dirty read
T1: Rollback -- Balance restored to 100
T2 now has an incorrect balance (90) based on data that was never committed.
Inconsistent Read (Non-repeatable Read): Occurs when a transaction reads the same data item
twice, and between the two reads, another transaction modifies that data item and commits,
causing the two reads to return different values.
o Example:
T1: Read Balance (100)
T2: Update Balance to 80
T2: Commit
T1: Read Balance (80) -- First read was 100, second is 80. Inconsistent.
Two-Phase Locking (2PL) protocol: A concurrency control protocol that ensures serializability by
restricting the way transactions acquire and release locks. It consists of two phases:
1. Growing Phase: A transaction can acquire locks but cannot release any locks.
2. Shrinking Phase: A transaction can release locks but cannot acquire any new locks.
Once a transaction releases its first lock, it enters the shrinking phase and cannot acquire any more
locks. This protocol guarantees serializability.
o Example:
Transaction T1:
LOCK(A)     -- growing phase
LOCK(B)     -- growing phase
UNLOCK(B)   -- shrinking phase begins; no new locks may be acquired
UNLOCK(A)
Transaction T2:
LOCK(C)     -- growing phase
LOCK(D)     -- growing phase
UNLOCK(C)   -- shrinking phase
UNLOCK(D)
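In SQL terms, a transaction that takes row locks with SELECT ... FOR UPDATE and releases everything only at COMMIT trivially obeys 2PL (this is strict 2PL, which most engines implement). A hedged sketch, assuming a hypothetical Accounts(AcctID, Balance) table:
SQL
START TRANSACTION;
SELECT Balance FROM Accounts WHERE AcctID = 1 FOR UPDATE;  -- growing phase: lock row 1
UPDATE Accounts SET Balance = Balance - 50 WHERE AcctID = 1;
SELECT Balance FROM Accounts WHERE AcctID = 2 FOR UPDATE;  -- growing phase: lock row 2
UPDATE Accounts SET Balance = Balance + 50 WHERE AcctID = 2;
COMMIT;  -- all locks released together (shrinking phase)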
Consider two transactions, each reading and writing A and B:
T1: R(A), W(A), R(B), W(B)
T2: R(A), W(A), R(B), W(B)

Schedule 1 (serial: T1 followed by T2, serializable by definition):

Time | T1        | T2
-----|-----------|-----------
1 | R(A) |
2 | W(A) |
3 | R(B) |
4 | W(B) |
5 | COMMIT |
6 | | R(A)
7 | | W(A)
8 | | R(B)
9 | | W(B)
10 | | COMMIT
Schedule 2 (interleaved):

Time | T1        | T2
-----|-----------|-----------
1    | R(A)      |
2    |           | R(A)
3    | W(A)      |
4    |           | W(A)
5    | R(B)      |
6    | W(B)      |
7    | COMMIT    |
8    |           | R(B)
9    |           | W(B)
10   |           | COMMIT
This schedule is not conflict serializable. T2 reads A before T1 writes it (requiring T2 before T1 in any
equivalent serial order), while T1 writes A before T2 writes it (requiring T1 before T2). These conflicting
orders cannot both hold in any serial execution, so they create a cycle in the precedence graph.
Precedence (serializability) graph test:
1. Create a node for each transaction in the schedule.
2. Draw a directed edge from T_i to T_j if an operation in T_i conflicts with an operation in
T_j, and T_i's operation occurs before T_j's operation in the schedule.
3. If the graph has no cycles, the schedule is conflict serializable (and thus serializable).
For Schedule 2:
o T2: R(A) before T1: W(A) -> Edge T2 -> T1 (RW conflict on A)
o T1: W(A) before T2: W(A) -> Edge T1 -> T2 (WW conflict on A)
o The cycle T1 -> T2 -> T1 confirms that the schedule is not conflict serializable.
Deadlock prevention methods: Deadlock occurs when two or more transactions are indefinitely
waiting for each other to release resources (locks) that they need.
1. Pre-declaration of Locks:
Transactions declare all the locks they need before starting execution.
The DBMS grants all locks at once or none. If not all locks can be granted, the
transaction waits.
Pros: Deadlock is impossible, since a transaction never holds some locks while
waiting for others.
Cons: May lead to lower concurrency; difficult to know all required locks in
advance for complex transactions.
2. Wait-Die Scheme (timestamp-based):
If TS(T_i) < TS(T_j) (T_i is older), T_i waits for T_j.
If TS(T_i) > TS(T_j) (T_i is younger), T_i dies (rolls back) and restarts later
with the same timestamp.
Ensures: Only older transactions ever wait for younger ones, preventing circular
wait.
3. Wound-Wait Scheme (timestamp-based):
If TS(T_i) < TS(T_j) (T_i is older), T_j is wounded (rolled back) and T_i
acquires the lock. T_j restarts later.
If TS(T_i) > TS(T_j) (T_i is younger), T_i waits for T_j.
Ensures: Younger transactions are rolled back if an older one needs their
resource, preventing starvation of older transactions.
Define:
Heap (Unordered) File Organization:
o Definition: The simplest file organization, where records are placed in the file in the
order in which they are inserted. There is no specific ordering or indexing.
o Characteristics: New records are simply appended at the end of the file.
o Pros: Very fast insertion.
o Cons: Very slow for searching (requires a full file scan); updates and deletions can be
inefficient if not done at the end.
Sequential (Ordered) File Organization:
o Definition: Records are stored in a specific sorted order based on the value of a
designated search key (ordering attribute).
o Characteristics: Maintaining the sort order on insertion typically requires overflow
areas and periodic reorganization.
o Pros: Efficient for sequential processing (e.g., generating reports) and exact match
queries on the ordering key (using binary search).
o Cons: Inefficient for random access, insertions and deletions are costly as they disrupt
the order.
Hash File Organization:
o Definition: Records are stored based on a hash function, which computes a storage
address (bucket) from a specified attribute (hash key). The record is stored at or near
that address.
o Characteristics:
Collision handling mechanisms are required when different keys map to the
same address.
o Pros: Very fast for exact match queries on the hash key.
o Cons: Inefficient for range queries, collisions can degrade performance, poor for
sequential access.
Explain:
Primary Index:
o Definition: An index on a file where the search key specifies the sequential order of the
file. It is built on the primary key of the table. There can be only one primary index per
file.
o Characteristics:
The records in the data file are physically ordered according to the primary index
key.
o Example: An index on StudentID in a Student file, where the Student records are
physically stored sorted by StudentID.
Secondary Index:
o Definition: An index on a non-ordering attribute (could be a non-key or even a candidate
key, but not the primary key if it's already used for primary index). The data file is not
physically ordered by the secondary index key.
o Characteristics:
The data file is not necessarily sorted by the secondary index key.
Dense vs Sparse Index: These terms describe how many index entries there are relative to the
data records.
o Dense Index:
Definition: An index that contains an index entry for every search key value in
the data file.
Pros: Faster for exact match lookups, as you can directly find the record's
location.
Cons: Larger index size and higher maintenance overhead than a sparse index.
o Sparse Index:
Definition: An index that contains an index entry for only some of the search key
values in the data file.
Characteristics: Typically stores an entry for the first record in each data block.
Pros: Smaller index that occupies fewer blocks, with lower maintenance
overhead.
Cons: Might require some sequential scan within a block after finding the
correct block.
Used for: Primary indexes on files that are physically sorted by the search key.
B+ Tree structure (with diagram): A self-balancing tree data structure that maintains sorted data
and allows searches, sequential access, insertions, and deletions in logarithmic time. Widely
used for indexing in databases.
o Characteristics:
Internal Nodes: Store pointers to child nodes and range of key values to guide
searches. They do not store actual data pointers.
Leaf Nodes: Form a doubly-linked list, containing all actual data pointers (or the
data records themselves for clustered indexes). This allows for efficient range
queries.
Order m: Each node (except root) has between ceil(m/2) and m children. The
root can have fewer.
                     [ 50 | 100 ]                       (Root: keys guide the search)
                    /      |      \
             k < 50   50 <= k < 100   k >= 100
                |          |             |
  [10|20|30] -----> [50|60|70] -----> [100|150|160]     (Leaf nodes: keys + data pointers)
  <----------- leaves linked for sequential/range access ----------->
o All actual data (or pointers to data) are in the leaf nodes.
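In practice, most relational engines build a B+ tree when you create an index (the exact structure is implementation-defined, but a B+ tree is the common default); the index name below is assumed:
SQL
-- Typically materialized as a B+ tree on StudentName
CREATE INDEX idx_students_name ON STUDENTS (StudentName);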
8. Recovery Techniques
Describe:
Log-based recovery:
o Concept: The most common recovery technique. It involves maintaining a log file (or
journal) that records all database modifications. The log contains information about each
transaction, including the transaction ID, the operation performed (insert, delete,
update), the data item affected, the before-image (value before change), and the after-
image (value after change).
o Process:
1. During normal operation: All changes are first written to the log before being
applied to the database itself (Write-Ahead Logging - WAL). This ensures that if a
crash occurs, the necessary information for recovery is available in the log.
2. After a crash (restart recovery), two passes are made over the log:
Redo Phase (Forward Pass): Scan the log forward from the last
checkpoint. For all committed transactions (those with a commit record
in the log), any changes that were not written to disk are redone
(reapplied) using the after-images in the log. This brings the database to
a state where all committed transactions are reflected.
Undo Phase (Backward Pass): Scan the log backward from the end. For
all uncommitted transactions (those without a commit or abort record,
or with an abort record but no completed undo), any changes that were
written to disk are undone (rolled back) using the before-images in the
log. This removes the effects of partial/failed transactions.
o Pros: Works with concurrent transactions and with both immediate and deferred
update strategies; it is the most widely used approach in practice.
o Cons: Performance overhead due to log writes. Log can grow very large (mitigated by
checkpoints).
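As an illustration, here is a tiny log fragment in textbook-style notation (<T, item, before-image, after-image>; not the on-disk format of any specific DBMS) and how restart recovery treats it:

<T1, START>
<T1, X, old=100, new=90>
<T1, COMMIT>
<T2, START>
<T2, Y, old=50, new=60>
*** CRASH ***

On restart, T1 has a COMMIT record, so its change to X is redone using the after-image (new=90); T2 has no COMMIT record, so its change to Y is undone using the before-image (old=50).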
Shadow Paging:
o Concept: Maintains two page tables during a transaction: a current page table, used for
updates, and a shadow page table, an unmodified snapshot of the database state at the
start of the transaction.
o Process:
When a transaction starts, the shadow page table is a copy of the current page
table.
All database updates by the transaction are performed on new physical pages.
The current page table is updated to point to these new pages, while the
shadow page table continues to point to the original (old) pages.
If the transaction aborts, the current page table is discarded, and the shadow
page table becomes the current page table, effectively restoring the database to
its state before the transaction. The new pages are simply ignored (garbage
collected).
If the transaction commits, the shadow page table is updated to become a copy
of the current page table, and the old pages are discarded.
o Pros: No log is needed to undo or redo data pages; recovery after a crash is fast (the
shadow page table is simply retained).
o Cons: Can lead to data fragmentation, overhead of copying page tables, doesn't easily
handle concurrent transactions (typically used for single-user or small multi-user
systems).
Checkpoints:
o Concept: A checkpoint is a synchronization point at which the DBMS forces buffered log
records and modified data blocks out to disk, so that recovery never needs to scan the
log further back than the most recent checkpoint.
o Process:
It forces all log records and modified data buffers currently in main memory
onto disk.
It writes a checkpoint record to the log, listing all currently active transactions.
It updates a restart file (or similar metadata) with the address of the checkpoint
record in the log.
o Benefit during recovery: Instead of scanning the entire log from the beginning, recovery
can start from the last checkpoint. Only transactions active at the time of the
checkpoint, or those that started after the checkpoint, need to be considered for
undo/redo.
o Cons: Can temporarily halt database operations during the checkpoint process (though
modern systems try to minimize this).
Explain:
Immediate vs Deferred update: These are two strategies for how database updates are applied
to the disk.
o Immediate Update:
Concept: Database modifications may be applied to the database on disk before
the transaction commits (with the corresponding log records written first, per
WAL).
Recovery: Requires both undo (for uncommitted transactions that have written
to disk) and redo (for committed transactions whose changes might not have
made it to disk due to a crash after commit but before flush). The log must
contain both before-images and after-images.
Pros: Changes are visible to other transactions sooner (if isolation levels allow).
o Deferred Update:
Concept: Database modifications are not written to the actual database on disk
until the transaction successfully commits. All changes are initially buffered in
main memory or the log.
Recovery: Requires only redo (using after-images) for committed transactions;
uncommitted transactions never modified the disk, so no undo is needed.
Pros: Simpler undo logic (no undo needed), faster commit (no disk writes before
commit).
Cons: Changes are not durable until commit. Requires more memory for
buffering.
Recovery with concurrent transactions: When multiple transactions are executing concurrently,
recovery becomes more complex. The goal is to ensure that after a crash, the database is
restored to a consistent state, reflecting only the effects of committed transactions.
o Log-based recovery is crucial: The log must capture the order of operations and the
before/after images for all interleaved transactions.
o Atomicity for individual transactions: Each transaction must still satisfy its own ACID
properties.
1. Find relevant log records: Start scanning the log from the last checkpoint.
2. Build lists: From the checkpoint record and subsequent log records, build a redo
list (transactions with a commit record) and an undo list (transactions still active
at the time of the crash).
3. Redo Phase: Apply redo operations for all committed transactions (and possibly
some active ones whose changes reached disk) to ensure all committed changes
are reflected in the database. This typically happens forward from the
checkpoint.
4. Undo Phase: Apply undo operations for all uncommitted transactions (those
active at crash or started after checkpoint and not committed) to remove their
partial effects. This typically happens backward from the end of the log.