IGNOU MCS-23 Solved Question Exam Preparation

The document outlines various topics related to relational databases, including definitions of databases and DBMS, data independence, and key concepts like super keys, candidate keys, and primary keys. It also explains the three-level DB architecture, components of a DBMS, and the process of converting ER models to relational schemas, along with examples for library management and university databases. Additionally, it provides detailed mappings for different relationship types and attributes in database design.

Uploaded by Samir Rajak

Instead of providing a single, monolithic answer, I will break down the response by topic, providing concise yet comprehensive explanations and examples for each question. This will make the information more manageable and easier to digest.

I will cover the following topics in detail, as requested:

1. Relational Database Concepts

2. Entity-Relationship (ER) Modeling

3. Relational Algebra & Calculus

4. SQL (Structured Query Language)

5. Normalization & Functional Dependencies

6. Transaction Management & Concurrency Control

7. File Organization & Indexing

8. Recovery Techniques

Let's begin!

1. Relational Database Concepts

Define:

 Database: A database is an organized collection of structured information, or data, typically stored electronically in a computer system. It is designed to efficiently store, retrieve, and manage large amounts of data. Databases can be categorized based on their organization (e.g., relational, NoSQL, object-oriented).

 DBMS (Database Management System): A DBMS is a software system that allows users to
define, create, maintain, and control access to the database. It acts as an interface between the
user and the database, providing tools for data manipulation, security, and integrity. Examples
include MySQL, PostgreSQL, Oracle, SQL Server.

 Data Independence: Data independence is the ability to modify the schema at one level of the
database system without affecting the schema at the next higher level. It insulates applications
from changes in how data is stored and organized. There are two types:

o Logical Data Independence: The ability to change the conceptual schema without
affecting the external schemas (user views). This means that if we add a new attribute to
a table or change the data type of an existing attribute, it should not impact existing
applications that do not use those specific attributes.
o Physical Data Independence: The ability to change the internal schema (physical storage
details) without affecting the conceptual schema. This means that changes in how data
is physically stored (e.g., using different file organization, adding an index) should not
require changes to the application programs.
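Physical data independence can be illustrated with SQLite (an illustrative choice, not from the original text): adding an index changes a physical storage detail, but the application's query text is unchanged and returns the same result.

```python
import sqlite3

# Minimal sketch: the query is written against the logical schema only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi"), (3, "Meera")])

query = "SELECT Name FROM Student WHERE StudentID = 2"
before = conn.execute(query).fetchall()

# Change the internal schema: add an index on StudentID.
conn.execute("CREATE INDEX idx_student_id ON Student(StudentID)")
after = conn.execute(query).fetchall()

print(before == after)  # True: the query and its result are unaffected
```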

Differentiate between:

 DBMS vs File System:

Feature             DBMS                                       File System
Data Redundancy     Minimized, controlled                      High, uncontrolled
Data Consistency    High, enforced through constraints         Low, difficult to maintain
Data Sharing        Easy, concurrent access controlled         Difficult, concurrent access problematic
Data Security       High, built-in access control features     Low, limited security mechanisms
Data Integrity      High, enforced through constraints         Low, application-dependent
Backup & Recovery   Built-in mechanisms                        Manual, application-dependent
Querying            Powerful query languages (SQL)             Programmatic file access only
Scalability         Highly scalable                            Limited scalability
Cost                Generally higher initial cost              Lower initial cost

 Logical vs Physical Data Independence: (Already explained above under "Data Independence")

 Super Key vs Candidate Key vs Primary Key:

o Super Key: A set of one or more attributes that, taken collectively, can uniquely identify
a row in a table. It may contain redundant attributes.

 Example: In an Employee table with attributes {EmpID, EmpName, EmpAddress, EmpPhone}, the sets {EmpID}, {EmpID, EmpName}, and {EmpPhone} (if phone numbers are unique) are all super keys.

o Candidate Key: A minimal super key; it is a super key such that no proper subset of its
attributes is a super key. There can be multiple candidate keys for a table.
 Example: For the Employee table, if EmpID uniquely identifies an employee and
EmpPhone also uniquely identifies an employee, then {EmpID} and {EmpPhone}
are candidate keys. {EmpID, EmpName} is a super key but not a candidate key
because EmpID alone is sufficient.

o Primary Key: A candidate key chosen by the database designer to uniquely identify each
row in a table. It must be unique and cannot contain NULL values (NOT NULL constraint).
Each table can have only one primary key.

 Example: From the candidate keys {EmpID} and {EmpPhone}, we would typically
choose EmpID as the primary key for the Employee table.
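The choice can be sketched in SQLite (names are the examples above; this is illustrative): EmpID becomes the primary key, while EmpPhone, the remaining candidate key, is declared UNIQUE, so violating either key is rejected by the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EmpID    INTEGER PRIMARY KEY,  -- chosen primary key
        EmpName  TEXT NOT NULL,
        EmpPhone TEXT UNIQUE           -- remaining candidate key
    )
""")
conn.execute("INSERT INTO Employee VALUES (1, 'Alice', '555-0101')")

rejected = []
# Each insert below violates one of the two candidate keys.
for row in [(1, 'Bob', '555-0202'),     # duplicate EmpID
            (2, 'Carol', '555-0101')]:  # duplicate EmpPhone
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[1])
print(rejected)  # ['Bob', 'Carol']
```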

Explain:

 Three-level DB architecture (External, Conceptual, Internal): Also known as the ANSI/SPARC architecture, it provides a structured way to view and manage data in a database system, promoting data independence.

o External Level (View Level):

 This is the highest level, closest to the end-users.

 It describes only the part of the database relevant to a particular user or application.

 Users interact with their own "views" of the data, tailored to their specific
needs.

 Multiple external views can exist for the same conceptual schema.

 Example: A student might see their course registration and grades, while an
instructor sees their assigned courses and student rosters.

o Conceptual Level (Logical Level):

 This level describes the entire database in terms of its logical structure,
independent of physical storage.

 It defines all the entities, attributes, relationships, data types, and constraints for
the entire database.

 It hides the details of physical storage.

 It provides a global view of the database.

 Example: A detailed ER diagram or a set of relational schemas for the entire university database (students, courses, instructors, departments, etc.).
o Internal Level (Physical Level):

 This is the lowest level, closest to physical storage.

 It describes how the data is actually stored on the physical storage devices (e.g.,
hard drives).

 It includes details about file organization, indexing, data compression, and physical record layouts.

 Example: How Student records are stored in blocks on disk, the type of index
used for StudentID, and the order of attributes within a record.

Diagram:

+-----------------+     +-----------------+
| External View 1 | ... | External View N |
+-----------------+     +-----------------+
          \                    /
      external/conceptual mappings
                    |
                    V
         +-------------------+
         | Conceptual Schema |
         +-------------------+
                    |
                    | conceptual/internal mapping
                    V
         +-------------------+
         |  Internal Schema  |
         +-------------------+
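The three levels can be mimicked in SQLite terms (an analogy of my own, not an exact mapping from the text): a base table plays the conceptual schema, a view is one external schema, and an index is an internal-level storage detail invisible to both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full logical structure of the data.
conn.execute("""CREATE TABLE Student
                (StudentID INTEGER PRIMARY KEY, Name TEXT, GPA REAL)""")
conn.execute("INSERT INTO Student VALUES (1, 'Asha', 3.8)")
# External level: a tailored user view over the conceptual schema.
conn.execute("CREATE VIEW RegistrarView AS SELECT Name, GPA FROM Student")
# Internal level: a physical access path, invisible to the levels above.
conn.execute("CREATE INDEX idx_name ON Student(Name)")
rows = conn.execute("SELECT Name, GPA FROM RegistrarView").fetchall()
print(rows)  # [('Asha', 3.8)]
```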

 Components of DBMS: A typical DBMS consists of several integrated components:

o Query Processor: Interprets user queries and converts them into an executable form. It
includes:

 DDL Interpreter: Processes DDL (Data Definition Language) statements to create, modify, and drop database objects.

 DML Compiler/Query Optimizer: Translates DML (Data Manipulation Language) statements into low-level instructions that the database can understand. The optimizer finds the most efficient execution plan for a query.

o Storage Manager: Responsible for the interaction with the file system and managing the
physical storage of data. It includes:

 Authorization and Integrity Manager: Checks for user permissions and ensures
that data integrity constraints are maintained.

 Transaction Manager: Ensures that concurrent transactions are executed correctly and that the database remains in a consistent state.

 Buffer Manager: Manages the main memory (buffer) for caching data blocks
from disk.

 File Manager: Organizes data into files and manages their allocation on disk.

 Index Manager: Manages the creation and use of indexes to speed up data
retrieval.

o Data Dictionary/Catalog: Stores metadata (data about data), including schema definitions, data types, constraints, user permissions, and storage details. It is a crucial component for the DBMS to operate.

o Utilities: A set of tools for database administration tasks like backup, recovery,
performance monitoring, data loading, and reorganization.

o Application Programming Interface (API): Provides a way for external applications to interact with the database (e.g., JDBC, ODBC).

o User Interfaces: Various interfaces for different types of users (e.g., SQL command line,
graphical tools).
 ER to relational mapping steps: Converting an Entity-Relationship (ER) model into a relational
schema involves a systematic process to translate entities, attributes, and relationships into
tables, columns, and keys.

1. Mapping of Strong Entity Types:

 For each strong entity type, create a relation (table).

 Include all its simple attributes as columns.

 Choose one of its candidate keys as the primary key. If composite, all attributes
of the composite key become part of the primary key.

 Example: Student (<u>StudentID</u>, Name, DoB, Address)

2. Mapping of Weak Entity Types:

 For each weak entity type, create a relation.

 Include all its simple attributes as columns.

 Include the primary key of its identifying (owner) strong entity type as a foreign
key.

 The primary key of the weak entity's relation will be the combination of the
foreign key from the owner entity and its own partial key (discriminator).

 Example: If Dependent is a weak entity of Employee: Dependent (<u>EmpID</u>, <u>DependentName</u>, DoB), where EmpID is a foreign key referencing Employee.

3. Mapping of 1:1 Relationship Types:

 Option A (Foreign Key in one entity): Add the primary key of one entity type as a
foreign key in the relation of the other entity type. The choice often depends on
participation constraints. If one side has total participation, its relation often
gets the FK.

 Option B (Combined relation): If both entities have total participation, they can
be merged into a single relation.

 Example: Employee 1:1 Manages Department. Since every department has exactly one manager, add ManagerID (FK to Employee) to the Department table: Department (<u>DeptID</u>, DeptName, ManagerID).

4. Mapping of 1:N Relationship Types:


 Add the primary key of the "one" side entity as a foreign key in the relation of
the "many" side entity.

 Example: Department 1:N Employee. Add DeptID (FK to Department) to the Employee table: Employee (<u>EmpID</u>, EmpName, DeptID).

5. Mapping of M:N Relationship Types:

 Create a new relation (junction table) for the relationship type.

 Include the primary keys of both participating entity types as foreign keys in this
new relation.

 The primary key of the new relation will be the combination of these foreign
keys.

 Include any attributes of the relationship itself as columns in this new relation.

 Example: Student M:N Course. Create an Enrollment table: Enrollment (<u>StudentID</u>, <u>CourseID</u>, Grade), where StudentID and CourseID are foreign keys.

6. Mapping of Multivalued Attributes:

 Create a new relation for the multivalued attribute.

 The primary key of this new relation will be the primary key of the original entity
type combined with the multivalued attribute itself.

 Example: Employee has PhoneNumbers (multivalued). Create Employee_Phone (<u>EmpID</u>, <u>PhoneNumber</u>).

7. Mapping of Composite Attributes:

 Only the simple component attributes of a composite attribute are included in the relation; the composite attribute itself is not directly mapped.

 Example: Address (composite of Street, City, ZipCode). The Student table would
have Street, City, ZipCode as separate columns.
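The mapping steps above can be sketched as DDL, here run through SQLite (an illustrative sketch using the example names from the text, not a complete schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Step 1: strong entity types become tables, candidate key as PK.
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, DoB TEXT, Address TEXT);
    CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, CourseName TEXT);

    -- Step 5: an M:N relationship becomes a junction table whose PK
    -- combines the two foreign keys, plus any relationship attributes.
    CREATE TABLE Enrollment (
        StudentID INTEGER REFERENCES Student(StudentID),
        CourseID  INTEGER REFERENCES Course(CourseID),
        Grade     TEXT,
        PRIMARY KEY (StudentID, CourseID)
    );

    -- Step 6: a multivalued attribute gets its own table.
    CREATE TABLE Employee_Phone (
        EmpID       INTEGER,
        PhoneNumber TEXT,
        PRIMARY KEY (EmpID, PhoneNumber)
    );
""")
conn.execute("INSERT INTO Student VALUES (1, 'Asha', '2001-04-02', 'Delhi')")
conn.execute("INSERT INTO Course VALUES (10, 'DBMS')")
conn.execute("INSERT INTO Enrollment VALUES (1, 10, 'A')")
try:
    # Referencing a non-existent student violates the foreign key.
    conn.execute("INSERT INTO Enrollment VALUES (99, 10, 'B')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
```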

2. Entity-Relationship (ER) Modeling

Draw ER diagram for:

 Library Management:
o Entities:

 Book: Attributes: BookID (PK), Title, Author, ISBN, Publisher, PublicationYear, Genre, AvailabilityStatus.

 Member: Attributes: MemberID (PK), Name, Address, Phone, Email, MembershipDate.

 Librarian: Attributes: LibrarianID (PK), Name, EmployeeID, Shift.

 Borrowing: Could be modeled as a relationship or an entity depending on complexity; here it is a strong entity so borrowing details can be tracked. Attributes: BorrowID (PK), BorrowDate, DueDate, ReturnDate.

o Relationships:

 Member borrows Book: M:N relationship, implemented via the Borrowing entity. A member can borrow multiple books, and a book can be borrowed by multiple members over time.

 Borrowing refers to Book: N:1 (each borrowing record refers to one book, but a book can appear in many borrowing records).

 Borrowing made by Member: N:1 (each borrowing record is initiated by one member, but a member can have many borrowing records).

 Librarian manages Borrowing: 1:N (one librarian can manage multiple borrowing
records).

o Assumptions: A book can have multiple copies, but here BookID refers to a unique book
type. If we need to track individual copies, we'd add a BookCopy entity.

Simplified ER Diagram Sketch:

+-----------+      +----------------------+      +----------+
|  Member   |<-----|      Borrowing       |----->|   Book   |
| (MemberID)|      | (BorrowID, FK_MemID, |      | (BookID) |
+-----------+      |  FK_BookID, Dates)   |      +----------+
                   +----------------------+
                              |
                              | manages (1:N)
                              |
                        +-----------+
                        | Librarian |
                        |  (LibID)  |
                        +-----------+

o Note: A full ER diagram would use standard ER notation (rectangles for entities,
diamonds for relationships, ovals for attributes, lines for connections, and cardinality
notations).

 University/College Database:

o Entities:

 Student: StudentID (PK), Name, DoB, Address, Phone, Email, Major.

 Course: CourseID (PK), CourseName, Credits, Description.

 Instructor: InstructorID (PK), Name, Department, Phone, Email, Rank.

 Department: DeptID (PK), DeptName, Location, Phone.

 Enrollment: Could be a relationship or an entity; modeled as a strong entity here to record grades. Attributes: EnrollmentID (PK), Grade, Semester.

o Relationships:

 Student enrolls in Course: M:N relationship. Implemented via Enrollment entity.

 Enrollment belongs to Student: N:1 (each enrollment record is for one student).

 Enrollment is for Course: N:1 (each enrollment record is for one course instance).

 Instructor teaches Course: 1:N (one instructor can teach multiple courses; a course is usually taught by one instructor).

 Department offers Course: 1:N (one department offers many courses).

 Instructor belongs to Department: 1:N (one department has many instructors).

 Student majors in Department: 1:N (one department has many students majoring in it).

 Banking or Hospital System: (Let's go with Banking)

o Entities:
 Customer: CustomerID (PK), Name, Address, Phone, Email, DoB.

 Account: AccountNum (PK), AccountType (Savings, Checking, Loan), Balance, OpenDate.

 Transaction: TransID (PK), TransType (Deposit, Withdrawal, Transfer), Amount, TransDate, TransTime.

 Branch: BranchID (PK), BranchName, Address, Phone.

 Employee: EmpID (PK), Name, Position, Salary, Phone.

o Relationships:

 Customer holds Account: 1:N (one customer can hold multiple accounts).

 Account has Transaction: 1:N (one account can have many transactions).

 Account at Branch: 1:N (one branch has many accounts).

 Employee works at Branch: 1:N (one branch has many employees).

 Employee handles Customer: 1:N (one employee can handle many customers).

 Transaction initiated by Customer: 1:N (one customer can initiate many transactions).

Convert ER model to relational schema: (Refer to the "ER to relational mapping steps" in Section 1. I'll
provide a simplified example for the Library Management system.)

ER Model (simplified):

 Book (BookID, Title, Author)

 Member (MemberID, Name, Address)

 Borrowing (<u>BorrowID</u>, BorrowDate, DueDate, ReturnDate) - conceptually a weak entity dependent on Book and Member, but it can be made strong with a composite PK.

 Borrows (M:N relationship between Member and Book, with BorrowDate, DueDate as
attributes)

Relational Schema:

1. Book: Book (BookID, Title, Author)

2. Member: Member (MemberID, Name, Address)

3. Borrowing (M:N mapping):


o Create a new table for the Borrows relationship.

o Borrow (MemberID, BookID, BorrowDate, DueDate, ReturnDate)

o MemberID is a foreign key referencing Member.

o BookID is a foreign key referencing Book.

o The primary key is the composite (MemberID, BookID, BorrowDate), which allows the same member to borrow the same book multiple times over the years. If each borrowing is instead identified by a unique BorrowID, then:

 Borrowing (BorrowID, MemberID, BookID, BorrowDate, DueDate, ReturnDate)

 MemberID FK Member(MemberID)

 BookID FK Book(BookID)

Explain:

 Attributes (simple, composite, derived):

o Simple Attribute: An attribute that cannot be further divided into smaller components.
It has a single atomic value.

 Example: Age, Gender, BookID.

o Composite Attribute: An attribute that can be divided into smaller, more meaningful
simple attributes.

 Example: Address (can be divided into Street, City, State, ZipCode), Name (can
be divided into FirstName, MiddleInitial, LastName).

o Derived Attribute: An attribute whose value can be computed or derived from other
attributes in the database. It is not physically stored in the database but is generated
when needed.

 Example: Age (derived from DateOfBirth), TotalSalary (derived from BasicSalary + Allowances).

 Types of Relationships (1:1, 1:N, M:N): These describe the number of instances of one entity
that can be associated with the number of instances of another entity.

o One-to-One (1:1): An instance of entity A is associated with at most one instance of entity B, and vice versa.

 Example: Employee 1:1 Parking Space (one employee is assigned one parking
space, and one parking space is assigned to one employee).
o One-to-Many (1:N or 1:M): An instance of entity A can be associated with many
instances of entity B, but an instance of entity B is associated with at most one instance
of entity A.

 Example: Department 1:N Employee (one department can have many employees, but an employee belongs to only one department).

o Many-to-Many (M:N): An instance of entity A can be associated with many instances of entity B, and an instance of entity B can also be associated with many instances of entity A.

 Example: Student M:N Course (a student can enroll in many courses, and a
course can have many students enrolled).

3. Relational Algebra & Calculus

Define & use:

 Relational Algebra: A procedural query language that takes relations as input and produces
relations as output. It forms the theoretical basis for SQL and other query languages. It consists
of a set of fundamental operations.

o Selection (σ): Selects a subset of tuples from a relation that satisfy a specified condition.

 Syntax: σ_condition(R)

 Example: Find employees in the 'Sales' department: σ_DeptName='Sales'(Employee)

o Projection (π): Selects a subset of attributes (columns) from a relation. It eliminates duplicate tuples in the result.

 Syntax: π_attribute-list(R)

 Example: Get names and salaries of all employees: π_EmpName,Salary(Employee)

o Join (⋈): Combines tuples from two relations based on a common attribute or a join condition.

 Natural Join (⋈): Joins relations on all common attributes with the same name, eliminating duplicate columns.

 Syntax: R ⋈ S

 Example: Join Employee and Department on DeptID: Employee ⋈ Department

 Theta Join (⋈_θ): Joins relations based on an arbitrary condition θ.

 Syntax: R ⋈_θ S

 Example: Join Employee and Department where Employee.DeptID = Department.DeptID: Employee ⋈_(Employee.DeptID=Department.DeptID) Department

o Union (∪): Combines two relations (must be union-compatible, i.e., same number of attributes and corresponding attribute domains). Removes duplicates.

 Syntax: R ∪ S

 Example: List all names from Student and Instructor tables: π_Name(Student) ∪ π_Name(Instructor)

o Set Difference (−): Returns tuples that are in the first relation but not in the second (union-compatible relations).

 Syntax: R − S

 Example: Find students who have not enrolled in the 'DBMS' course: π_StudentID(Student) − π_StudentID(σ_CourseName='DBMS'(Enrollment ⋈ Course))

o Rename (ρ): Renames a relation or an attribute within a relation.

 Syntax: ρ_NewName(A1,A2,...)(R) or ρ_NewName(R)

 Example: Rename Employee to Workers: ρ_Workers(Employee)

Relational Algebra expressions for queries like:

Assume relations: Employee (<u>EmpID</u>, EmpName, DeptID, Salary), Department (<u>DeptID</u>, DeptName, Location), Course (<u>CourseID</u>, CourseName, Credits), Student (<u>StudentID</u>, StudentName), Enrolls (<u>StudentID</u>, <u>CourseID</u>, Grade)

 "Find names of employees working in department X" (Let X = 'Sales'):
π_EmpName(σ_DeptName='Sales'(Employee ⋈ Department))

 "Find students enrolled in more than one course": This requires a self-join. Rename two copies of Enrolls, pair every tuple with every other, and keep students who appear with two different courses:
E1 = ρ_E1(Enrolls), E2 = ρ_E2(Enrolls)
Result = π_E1.StudentID(σ_(E1.StudentID = E2.StudentID AND E1.CourseID ≠ E2.CourseID)(E1 × E2))
(This is a common pattern for "more than one" using a self-join. Alternatively, one could use extended relational algebra operations like grouping and counting, which are not among the fundamental RA operations but are often supported.)
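The fundamental operations above can be simulated in Python over relations represented as lists of dicts (an illustrative sketch of my own, not part of the original material):

```python
def select(relation, pred):            # sigma_pred(R)
    return [t for t in relation if pred(t)]

def project(relation, attrs):          # pi_attrs(R): drops duplicates
    seen, out = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r, s):                # R |x| S on all common attributes
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

Employee = [{"EmpID": 1, "EmpName": "Alice", "DeptID": 10},
            {"EmpID": 2, "EmpName": "Bob",   "DeptID": 20}]
Department = [{"DeptID": 10, "DeptName": "Sales"},
              {"DeptID": 20, "DeptName": "HR"}]

# pi_EmpName(sigma_DeptName='Sales'(Employee |x| Department))
result = project(select(natural_join(Employee, Department),
                        lambda t: t["DeptName"] == "Sales"),
                 ["EmpName"])
print(result)  # [{'EmpName': 'Alice'}]
```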

 Tuple Relational Calculus (TRC) queries: A non-procedural query language where queries are
expressed as predicates that define a set of tuples. It describes what to retrieve, not how.
Syntax: { t | P(t) }, where t is a tuple variable and P(t) is a predicate.

o Example: Find all employees with salary > 50000: { t | Employee(t) AND t.Salary > 50000 }

o Example: Find names of employees in the 'Sales' department: { t.EmpName | Employee(t) AND EXISTS d (Department(d) AND t.DeptID = d.DeptID AND d.DeptName = 'Sales') }

 Domain Relational Calculus (DRC) examples: Similar to TRC but uses domain variables (variables
representing attribute values) instead of tuple variables. Syntax: { a1, a2, ..., an | P(a1, a2, ..., an)
}, where a_i are domain variables.

o Example: Find all employees with salary > 50000 (EmpID, EmpName, DeptID, Salary are
domain variables): { e, n, d, s | Employee(e, n, d, s) AND s > 50000 }

o Example: Find names of employees in 'Sales' department (e, n, d_emp are for Employee;
d_dept, d_name, loc are for Department): { n | EXISTS e, d_emp, s (Employee(e, n,
d_emp, s) AND EXISTS d_dept, d_name, loc (Department(d_dept, d_name, loc) AND
d_emp = d_dept AND d_name = 'Sales')) }

4. SQL (Structured Query Language)

Write SQL queries for:

Assume tables:
EMPLOYEES (EmpID INT PRIMARY KEY, EmpName VARCHAR(50), DeptID INT, Salary DECIMAL(10,2))
DEPARTMENTS (DeptID INT PRIMARY KEY, DeptName VARCHAR(50), Location VARCHAR(50))
COURSES (CourseID INT PRIMARY KEY, CourseName VARCHAR(100), Credits INT)
STUDENTS (StudentID INT PRIMARY KEY, StudentName VARCHAR(50), Major VARCHAR(50))
ENROLLMENTS (StudentID INT, CourseID INT, Grade CHAR(2), PRIMARY KEY (StudentID, CourseID), FOREIGN KEY (StudentID) REFERENCES STUDENTS(StudentID), FOREIGN KEY (CourseID) REFERENCES COURSES(CourseID))

 Table creation with constraints (PK, FK, CHECK, NOT NULL):

SQL

-- DEPARTMENTS is created first so EMPLOYEES can reference it.
CREATE TABLE DEPARTMENTS (
    DeptID INT PRIMARY KEY,
    DeptName VARCHAR(50) UNIQUE,
    Location VARCHAR(50)
);

CREATE TABLE EMPLOYEES (
    EmpID INT PRIMARY KEY,
    EmpName VARCHAR(50) NOT NULL,
    DeptID INT,
    Salary DECIMAL(10,2) CHECK (Salary > 0),
    FOREIGN KEY (DeptID) REFERENCES DEPARTMENTS(DeptID)
);

CREATE TABLE COURSES (
    CourseID INT PRIMARY KEY,
    CourseName VARCHAR(100) NOT NULL,
    Credits INT CHECK (Credits >= 1 AND Credits <= 6)
);

CREATE TABLE STUDENTS (
    StudentID INT PRIMARY KEY,
    StudentName VARCHAR(50) NOT NULL,
    Major VARCHAR(50)
);

CREATE TABLE ENROLLMENTS (
    StudentID INT,
    CourseID INT,
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID), -- Composite Primary Key
    FOREIGN KEY (StudentID) REFERENCES STUDENTS(StudentID),
    FOREIGN KEY (CourseID) REFERENCES COURSES(CourseID)
);

 Insert, Update, Delete:

SQL

-- INSERT (parent rows first, so the foreign keys are satisfied)
INSERT INTO DEPARTMENTS (DeptID, DeptName, Location) VALUES (101, 'Sales', 'New York');
INSERT INTO EMPLOYEES (EmpID, EmpName, DeptID, Salary) VALUES (1, 'Alice', 101, 60000.00);
INSERT INTO STUDENTS (StudentID, StudentName, Major) VALUES (1, 'Bob', 'CS');
INSERT INTO COURSES (CourseID, CourseName, Credits) VALUES (101, 'DBMS', 4);
INSERT INTO ENROLLMENTS (StudentID, CourseID, Grade) VALUES (1, 101, 'A');

-- UPDATE
UPDATE EMPLOYEES
SET Salary = 65000.00
WHERE EmpID = 1;

-- DELETE
DELETE FROM ENROLLMENTS
WHERE StudentID = 1 AND CourseID = 101;

 Joins (Inner, Outer, Natural):

o Inner Join: Returns rows when there is a match in both tables.

SQL

SELECT E.EmpName, D.DeptName
FROM EMPLOYEES E
INNER JOIN DEPARTMENTS D ON E.DeptID = D.DeptID;

o Left (Outer) Join: Returns all rows from the left table, and the matched rows from the right table. NULLs for non-matches.

SQL

SELECT S.StudentName, E.Grade, C.CourseName
FROM STUDENTS S
LEFT JOIN ENROLLMENTS E ON S.StudentID = E.StudentID
LEFT JOIN COURSES C ON E.CourseID = C.CourseID;

o Right (Outer) Join: Returns all rows from the right table, and the matched rows from the left table. NULLs for non-matches.

SQL

SELECT E.EmpName, D.DeptName
FROM EMPLOYEES E
RIGHT JOIN DEPARTMENTS D ON E.DeptID = D.DeptID;

o Full (Outer) Join: Returns all rows when there is a match in either table.

SQL

-- Not supported by all databases (e.g., MySQL doesn't directly support FULL OUTER JOIN)
-- For databases that support it:
SELECT E.EmpName, D.DeptName
FROM EMPLOYEES E
FULL OUTER JOIN DEPARTMENTS D ON E.DeptID = D.DeptID;

-- Alternative for MySQL (UNION of LEFT and RIGHT JOIN):
SELECT E.EmpName, D.DeptName
FROM EMPLOYEES E
LEFT JOIN DEPARTMENTS D ON E.DeptID = D.DeptID
UNION
SELECT E.EmpName, D.DeptName
FROM EMPLOYEES E
RIGHT JOIN DEPARTMENTS D ON E.DeptID = D.DeptID
WHERE E.EmpID IS NULL; -- Keeps only right-table rows with no left match

o Natural Join: Joins tables implicitly based on columns with the same name.

SQL

SELECT EmpName, DeptName
FROM EMPLOYEES NATURAL JOIN DEPARTMENTS;

(Caution: Natural Join can be risky if column names are coincidentally the same but conceptually different.)

 Aggregate functions (COUNT, AVG, SUM) with GROUP BY, HAVING:

SQL

-- Count employees per department
SELECT D.DeptName, COUNT(E.EmpID) AS NumberOfEmployees
FROM DEPARTMENTS D
LEFT JOIN EMPLOYEES E ON D.DeptID = E.DeptID
GROUP BY D.DeptName;

-- Average salary per department, only for departments with more than 5 employees
SELECT D.DeptName, AVG(E.Salary) AS AverageSalary
FROM DEPARTMENTS D
INNER JOIN EMPLOYEES E ON D.DeptID = E.DeptID
GROUP BY D.DeptName
HAVING COUNT(E.EmpID) > 5;

-- Total credits for each student
SELECT S.StudentName, SUM(C.Credits) AS TotalCredits
FROM STUDENTS S
INNER JOIN ENROLLMENTS E ON S.StudentID = E.StudentID
INNER JOIN COURSES C ON E.CourseID = C.CourseID
GROUP BY S.StudentName;

 Nested Queries, Subqueries:

o Find employees whose salary is greater than the average salary of all employees:

SQL

SELECT EmpName, Salary
FROM EMPLOYEES
WHERE Salary > (SELECT AVG(Salary) FROM EMPLOYEES);

o Find the names of students who are enrolled in the 'Database Management Systems' course:

SQL

SELECT S.StudentName
FROM STUDENTS S
WHERE S.StudentID IN (
    SELECT E.StudentID
    FROM ENROLLMENTS E
    INNER JOIN COURSES C ON E.CourseID = C.CourseID
    WHERE C.CourseName = 'Database Management Systems'
);

o Find departments that have no employees:

SQL

SELECT DeptName
FROM DEPARTMENTS
WHERE DeptID NOT IN (SELECT DISTINCT DeptID FROM EMPLOYEES WHERE DeptID IS NOT NULL);

 Views creation and use: A virtual table based on the result-set of an SQL query.

SQL

-- Create a view showing employee and department details
CREATE VIEW EmployeeDepartmentView AS
SELECT E.EmpName, E.Salary, D.DeptName, D.Location
FROM EMPLOYEES E
INNER JOIN DEPARTMENTS D ON E.DeptID = D.DeptID;

-- Use the view
SELECT EmpName, DeptName
FROM EmployeeDepartmentView
WHERE Salary > 70000;

Define and differentiate:

 Triggers:

o Definition: A special type of stored procedure that automatically executes or "fires" when a specific event occurs in the database (e.g., INSERT, UPDATE, DELETE on a table).

o Purpose: Enforce complex business rules, maintain data consistency, audit data changes,
automate tasks.

o Characteristics:

 Associated with a specific table and event.

 Can be BEFORE or AFTER the event.


 Can be FOR EACH ROW or FOR EACH STATEMENT.

o Example: An AFTER INSERT trigger on an Orders table to automatically update the Stock
quantity in a Products table.
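That AFTER INSERT example can be sketched with SQLite's trigger syntax (table and column names are illustrative assumptions, not from the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Stock INTEGER);
    CREATE TABLE Orders   (OrderID INTEGER PRIMARY KEY, ProductID INTEGER, Qty INTEGER);

    -- Fires automatically after each row inserted into Orders and
    -- subtracts the ordered quantity from the product's stock.
    CREATE TRIGGER trg_update_stock AFTER INSERT ON Orders
    FOR EACH ROW
    BEGIN
        UPDATE Products SET Stock = Stock - NEW.Qty
        WHERE ProductID = NEW.ProductID;
    END;
""")
conn.execute("INSERT INTO Products VALUES (1, 100)")
conn.execute("INSERT INTO Orders VALUES (1, 1, 30)")
stock = conn.execute("SELECT Stock FROM Products WHERE ProductID = 1").fetchone()[0]
print(stock)  # 70 -- no application code touched Products directly
```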

 Stored Procedures:

o Definition: A named block of SQL statements (and procedural logic like loops,
conditionals) that is stored in the database and can be executed multiple times.

o Purpose: Encapsulate complex logic, improve performance (pre-compiled), reduce network traffic, enhance security (grant execute permission without direct table access).

o Characteristics:

 Explicitly called by name.

 Can accept input parameters and return output parameters or result sets.

 Can perform DML and DDL operations.

o Example: A stored procedure GetEmployeeDetails(IN emp_id INT) that returns the name
and salary of a given employee.

 Cursors:

o Definition: A database object that enables traversal over the rows of a result set, one
row at a time. They allow procedural processing of query results.

o Purpose: To process individual rows returned by a SELECT statement, typically within stored procedures or functions, when set-based operations are not sufficient.

o Characteristics:

 Requires DECLARE, OPEN, FETCH, and CLOSE operations.

 Can be FOR UPDATE or FOR READ ONLY.

 Can be SCROLL (move backward/forward) or NO SCROLL (forward only).

o Example: Iterating through a result set of customer orders to calculate a complex discount for each order based on specific criteria.
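The DECLARE/OPEN/FETCH/CLOSE cycle can be mimicked with Python's sqlite3 Cursor, which plays an analogous role (the discount rule here is an invented illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Amount REAL)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(1, 120.0), (2, 80.0), (3, 300.0)])

cur = conn.cursor()                                  # DECLARE
cur.execute("SELECT OrderID, Amount FROM Orders")    # OPEN
totals = {}
while True:
    row = cur.fetchone()                             # FETCH one row at a time
    if row is None:
        break
    order_id, amount = row
    discount = 0.10 if amount > 100 else 0.0         # per-row procedural logic
    totals[order_id] = amount * (1 - discount)
cur.close()                                          # CLOSE
print(totals)
```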

Feature        Trigger                            Stored Procedure                            Cursor
Execution      Automatic (on DML event)           Manual (explicit call)                      Used within SPs/functions for row-by-row processing
Purpose        Enforce rules, audit, automate     Encapsulate logic, performance, security    Row-level processing of result sets
Input/Output   No direct input/output parameters  Input/output parameters, can return sets    Iterates over a result set
Control Flow   Limited procedural control         Full procedural control (loops, if/else)    Row-by-row control over a result set
Context        Tied to a specific table/event     General-purpose, callable from anywhere     Tied to a specific SELECT statement

5. Normalization & Functional Dependencies

Define:

 Functional Dependency (FD): An association between two sets of attributes in a relation. An attribute or set of attributes A is said to functionally determine an attribute B (written as A -> B) if, for any two tuples in the relation, whenever they have the same value for A, they must also have the same value for B.

o Example: In (StudentID, CourseID, Grade), StudentID -> StudentName means if two rows
have the same StudentID, they must have the same StudentName. (StudentID, CourseID)
-> Grade means a specific student in a specific course gets a unique grade.
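The FD test can be stated operationally: group tuples by the determinant and check that each group agrees on the dependent attributes. A minimal sketch in Python (relation represented as a list of dicts; the sample data is invented):

```python
# Check whether a functional dependency lhs -> rhs holds in a relation.
def holds(fd_lhs, fd_rhs, rows):
    seen = {}  # determinant value -> dependent value already observed
    for row in rows:
        lhs = tuple(row[a] for a in fd_lhs)
        rhs = tuple(row[a] for a in fd_rhs)
        if lhs in seen and seen[lhs] != rhs:
            return False  # same A-value, different B-value: FD violated
        seen[lhs] = rhs
    return True

rows = [
    {"StudentID": 1, "StudentName": "Asha", "CourseID": "C1", "Grade": "A"},
    {"StudentID": 1, "StudentName": "Asha", "CourseID": "C2", "Grade": "B"},
    {"StudentID": 2, "StudentName": "Ravi", "CourseID": "C1", "Grade": "B"},
]

print(holds(["StudentID"], ["StudentName"], rows))        # True
print(holds(["StudentID", "CourseID"], ["Grade"], rows))  # True
print(holds(["CourseID"], ["Grade"], rows))               # False: C1 yields both A and B
```

Note that a relation instance can only refute an FD, never prove it; an FD is a statement about all legal instances.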

 1NF (First Normal Form):

o A relation is in 1NF if all attribute values are atomic (indivisible) and each column
contains values of a single data type.

o There are no repeating groups (multivalued attributes).

o Each row is uniquely identified by a primary key.

o Violation Example: A Student table with PhoneNumbers as a comma-separated list in one column.

 2NF (Second Normal Form):

o A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the primary key. This means there are no partial dependencies.
o Partial Dependency: A non-key attribute is dependent on only part of a composite
primary key.

o Violation Example: (<u>StudentID</u>, <u>CourseID</u>, StudentName, CourseName, Grade). If (StudentID, CourseID) is the PK, then StudentName partially depends on StudentID (part of the PK), and CourseName partially depends on CourseID (part of the PK).

 3NF (Third Normal Form):

o A relation is in 3NF if it is in 2NF and there are no transitive dependencies of non-key attributes on the primary key.

o Transitive Dependency: A non-key attribute is dependent on another non-key attribute, which in turn is dependent on the primary key. If A -> B and B -> C, then A -> C is a transitive dependency (provided B is neither a super key nor part of a candidate key).

o Violation Example: (<u>EmpID</u>, EmpName, DeptName, DeptLocation). If EmpID -> DeptName and DeptName -> DeptLocation, then EmpID -> DeptLocation transitively.

 BCNF (Boyce-Codd Normal Form):

o A relation is in BCNF if it is in 3NF and for every non-trivial functional dependency A -> B,
A must be a super key.

o It is a stricter form of 3NF. It handles cases where 3NF still permits anomalies, typically when candidate keys overlap and an attribute that is not a super key determines a prime attribute.

o Difference from 3NF: 3NF allows A -> B where A is not a super key, provided B is a prime attribute (part of some candidate key). BCNF does not: every determinant must be a super key.

Normalize given relation step by step:

Let's normalize the relation R (StudentID, StudentName, Major, AdvisorID, AdvisorName, AdvisorOffice,
CourseID, CourseName, Grade) with the following FDs:

1. StudentID -> StudentName, Major

2. AdvisorID -> AdvisorName, AdvisorOffice

3. Major -> AdvisorID (Each major has one assigned advisor)

4. (StudentID, CourseID) -> Grade

5. CourseID -> CourseName

Step 1: Convert to 1NF
Assume the initial relation is already in 1NF (all attributes are atomic, no repeating groups).

Initial relation and its FDs:
R (<u>StudentID</u>, <u>CourseID</u>, StudentName, Major, AdvisorID, AdvisorName, AdvisorOffice, CourseName, Grade)
PK: (StudentID, CourseID)
FDs:

1. StudentID -> StudentName, Major

2. AdvisorID -> AdvisorName, AdvisorOffice

3. Major -> AdvisorID

4. (StudentID, CourseID) -> Grade

5. CourseID -> CourseName

Decomposition to 2NF:

 Partial Dependency 1: StudentID -> StudentName, Major (depends only on StudentID, part of
PK)

o Create STUDENTS (<u>StudentID</u>, StudentName, Major)

 Partial Dependency 2: CourseID -> CourseName (depends only on CourseID, part of PK)

o Create COURSES (<u>CourseID</u>, CourseName)

 Remaining attributes with the original PK:

o Create ENROLLMENTS (<u>StudentID</u>, <u>CourseID</u>, Grade). The advisor attributes (AdvisorID, AdvisorName, AdvisorOffice) do not depend on (StudentID, CourseID); they depend on StudentID alone (via Major), so for now they stay with the student data, and their final placement is settled by the transitive-dependency analysis in the 3NF step.

So, after 2NF decomposition, we have:

 STUDENTS_INFO (<u>StudentID</u>, StudentName, Major, AdvisorID, AdvisorName, AdvisorOffice)

 COURSES (<u>CourseID</u>, CourseName)

 ENROLLMENTS (<u>StudentID</u>, <u>CourseID</u>, Grade)

Step 3: Convert to 3NF
Now check STUDENTS_INFO (<u>StudentID</u>, StudentName, Major, AdvisorID, AdvisorName, AdvisorOffice) for transitive dependencies. FDs for STUDENTS_INFO:

 StudentID -> StudentName, Major (This is fine, StudentID is PK)

 Major -> AdvisorID (Transitive: StudentID -> Major -> AdvisorID. Major is a non-key attribute
determining AdvisorID).

 AdvisorID -> AdvisorName, AdvisorOffice (Transitive: StudentID -> Major -> AdvisorID ->
AdvisorName, AdvisorOffice)

Remove transitive dependencies:

 From Major -> AdvisorID, create MAJOR_ADVISOR (<u>Major</u>, AdvisorID)

 From AdvisorID -> AdvisorName, AdvisorOffice, create ADVISORS (<u>AdvisorID</u>, AdvisorName, AdvisorOffice)

Update STUDENTS_INFO by removing the determined attributes:

 STUDENTS (<u>StudentID</u>, StudentName, Major)

Final relations after 3NF:

1. STUDENTS: (<u>StudentID</u>, StudentName, Major)

2. COURSES: (<u>CourseID</u>, CourseName)

3. ENROLLMENTS: (<u>StudentID</u>, <u>CourseID</u>, Grade)

4. MAJOR_ADVISOR: (<u>Major</u>, AdvisorID) (AdvisorID is a FK referencing ADVISORS; Major in STUDENTS references this relation)

5. ADVISORS: (<u>AdvisorID</u>, AdvisorName, AdvisorOffice)


All these relations are now in 3NF.

Step 4: Convert to BCNF
Check if for every FD X -> Y, X is a super key.

1. STUDENTS (StudentID, StudentName, Major):

o StudentID -> StudentName, Major. StudentID is PK (super key). OK.

2. COURSES (CourseID, CourseName):

o CourseID -> CourseName. CourseID is PK (super key). OK.

3. ENROLLMENTS (StudentID, CourseID, Grade):

o (StudentID, CourseID) -> Grade. (StudentID, CourseID) is PK (super key). OK.

4. MAJOR_ADVISOR (Major, AdvisorID):

o Major -> AdvisorID. Major is the PK (super key). OK.

o Note: This relies on FD 3, which asserts that each major has exactly one advisor, making Major a candidate key. If a major could have multiple advisors, the FD Major -> AdvisorID would not hold and (Major, AdvisorID) would be the key instead.

5. ADVISORS (AdvisorID, AdvisorName, AdvisorOffice):

o AdvisorID -> AdvisorName, AdvisorOffice. AdvisorID is PK (super key). OK.

In this specific example, all relations are also in BCNF. BCNF issues typically arise with overlapping
candidate keys or when a non-key attribute determines part of a candidate key (which didn't happen
here after 2NF).
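A decomposition such as the one above can be sanity-checked for the lossless-join property: natural-joining the projections must reproduce exactly the original tuples, with no spurious rows. A small Python sketch, using invented data for the Major -> AdvisorID split:

```python
def project(rows, attrs):
    # Represent each projected row as a frozenset of (attribute, value)
    # pairs so duplicates collapse, as in relational projection.
    return {frozenset((a, r[a]) for a in attrs) for r in rows}

def natural_join(r1, r2):
    joined = set()
    for x in r1:
        for y in r2:
            dx, dy = dict(x), dict(y)
            shared = set(dx) & set(dy)
            # Join rows that agree on all shared attributes.
            if all(dx[c] == dy[c] for c in shared):
                joined.add(frozenset({**dx, **dy}.items()))
    return joined

rows = [
    {"StudentID": 1, "Major": "CS", "AdvisorID": 10},
    {"StudentID": 2, "Major": "CS", "AdvisorID": 10},
    {"StudentID": 3, "Major": "EE", "AdvisorID": 20},
]

students = project(rows, ["StudentID", "Major"])
major_advisor = project(rows, ["Major", "AdvisorID"])

original = {frozenset(r.items()) for r in rows}
print(natural_join(students, major_advisor) == original)  # True: no spurious tuples
```

The split is lossless here because the shared attribute Major is a key of MAJOR_ADVISOR, which is exactly the condition for a lossless binary decomposition.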

Identify partial and transitive dependencies: (Already covered in the normalization steps above)

 Partial Dependency: StudentID -> StudentName, Major in R (<u>StudentID</u>, <u>CourseID</u>, StudentName, Major, ...).

 Transitive Dependency: StudentID -> AdvisorID via Major (StudentID -> Major and Major ->
AdvisorID). StudentID -> AdvisorName, AdvisorOffice via Major -> AdvisorID.

Difference between 3NF and BCNF:

 Definition: 3NF — in 2NF, and no non-key attribute is transitively dependent on the primary key. BCNF — in 3NF, and for every non-trivial FD X -> Y, X must be a super key.

 Strictness: BCNF is stricter than 3NF.

 Redundancy: 3NF may still have some redundancy in certain edge cases involving overlapping candidate keys; BCNF eliminates all forms of redundancy based on functional dependencies.

 Lossless join: Both guarantee lossless-join decomposition, but 3NF also guarantees dependency preservation, which BCNF may not.

 Applicability: 3NF is more commonly used in practical database design due to guaranteed dependency preservation; BCNF is used when redundancy must be completely eliminated, even at the cost of dependency preservation.

 When 3NF ≠ BCNF: 3NF is achieved, but there exists an FD X -> Y where X is not a super key and Y is a prime attribute (part of a candidate key); this typically occurs with multiple overlapping candidate keys.

 Example scenario: Consider R(A, B, C) with FDs AB -> C and C -> B. (A, B) and (A, C) are candidate keys. Here, C -> B violates BCNF (C is not a super key), but R is in 3NF because B is a prime attribute.

6. Transaction Management & Concurrency Control

Define:

 Transaction: A logical unit of work that consists of one or more database operations (e.g.,
SELECT, INSERT, UPDATE, DELETE). It must be treated as a single, indivisible unit; either all its
operations are completed successfully (committed), or none of them are (rolled back).

o Example: Transferring money from account A to account B involves: DEBIT A, CREDIT B. Both must succeed or fail together.
 ACID properties: A set of properties that guarantee reliable processing of database transactions.

o Atomicity: (All or Nothing) A transaction is treated as a single, indivisible unit. Either all
operations within it are successfully completed, or none are. If any part of the
transaction fails, the entire transaction is aborted, and the database is rolled back to its
state before the transaction began.

 Example: Money transfer: If debit succeeds but credit fails, the debit is rolled
back.

o Consistency: A transaction brings the database from one valid state to another valid
state. It must obey all defined integrity constraints (e.g., primary keys, foreign keys,
check constraints). If a transaction starts with a consistent database, it will end with a
consistent database.

 Example: Money transfer: Total money in the system must remain constant
before and after the transaction (assuming no new money is created/destroyed).

o Isolation: The execution of concurrent transactions should not interfere with each other.
Each transaction should appear to execute in isolation, as if it were the only transaction
running. This prevents one transaction from seeing the intermediate, uncommitted
results of another transaction.

 Example: If Transaction 1 is updating an account balance, Transaction 2 trying to read that balance should either see the balance before Transaction 1 started or after it committed, but not an inconsistent intermediate value.

o Durability: Once a transaction has been committed, its changes are permanently stored
in the database and will survive subsequent system failures (e.g., power outages,
crashes).

 Example: After a money transfer is committed, even if the system crashes immediately, the changes to the account balances will be present when the system recovers.

 Serializability: A property of concurrent transaction schedules. A schedule (interleaving of operations from multiple transactions) is serializable if its effect on the database is equivalent to the effect of some serial execution of the same set of transactions. Serial execution means transactions run one after another, with no interleaving. Serializability is a key goal of concurrency control.
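Atomicity and consistency can be observed directly with any transactional engine. A sketch using Python's built-in sqlite3 (the table, CHECK constraint, and amounts are illustrative): the failed debit rolls back the already-applied credit, so the transfer is all-or-nothing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id TEXT PRIMARY KEY,"
            " balance INTEGER CHECK (balance >= 0))")
con.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
con.commit()

try:
    with con:  # one transaction: commits on success, rolls back on error
        con.execute("UPDATE account SET balance = balance + 200 WHERE id = 'B'")
        con.execute("UPDATE account SET balance = balance - 200 WHERE id = 'A'")
except sqlite3.IntegrityError:
    pass  # CHECK fired (A would go negative); B's credit is rolled back too

balances = dict(con.execute("SELECT id, balance FROM account"))
print(balances)  # {'A': 100, 'B': 50} -- unchanged: all-or-nothing
```

The CHECK constraint plays the role of the consistency rule here: a transaction that would leave the database in an invalid state is aborted, and atomicity undoes its partial work.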

Explain with examples:


 Lost Update: Occurs when two concurrent transactions both read and then update the same
data item, and the update of one transaction is overwritten by the update of the other,
effectively "losing" one of the updates.

o Example:

 Initial: Account Balance = 100

 T1: Read Balance (100)

 T2: Read Balance (100)

 T1: Balance = Balance - 10 (Balance = 90)

 T2: Balance = Balance + 20 (Balance = 120)

 T1: Write Balance (90)

 T2: Write Balance (120) -- T1's update is lost.

 Expected: 100 - 10 + 20 = 110. Actual: 120.
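The lost-update interleaving above can be replayed deterministically in a few lines; each transaction keeps a private copy of the value it read (no real DBMS involved, just the timeline from the example):

```python
balance = 100            # shared data item

t1_read = balance        # T1: Read Balance (100)
t2_read = balance        # T2: Read Balance (100) -- stale once T1 writes

t1_new = t1_read - 10    # T1 computes 90 from its local copy
t2_new = t2_read + 20    # T2 computes 120 from its local copy

balance = t1_new         # T1: Write Balance (90)
balance = t2_new         # T2: Write Balance (120) -- overwrites T1's write

print(balance)           # 120, not the expected 110: T1's update is lost
```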

 Dirty Read (Uncommitted Read): Occurs when a transaction reads data that has been written by
another concurrent transaction, but that data has not yet been committed (it might be rolled
back). If the uncommitted transaction fails, the first transaction has read "dirty" data.

o Example:

 Initial: Account Balance = 100

 T1: Begin Transaction

 T1: Update Balance = Balance - 10 (Balance = 90) -- Not yet committed

 T2: Read Balance (90)

 T1: Rollback (Balance reverts to 100)

 T2 now has an incorrect balance (90) based on data that was never committed.

 Inconsistent Read (Non-repeatable Read): Occurs when a transaction reads the same data item
twice, and between the two reads, another transaction modifies that data item and commits,
causing the two reads to return different values.

o Example:

 Initial: Balance = 100

 T1: Read Balance (100)


 T2: Update Balance = Balance - 20 (Balance = 80)

 T2: Commit

 T1: Read Balance (80) -- First read was 100, second is 80. Inconsistent.

 Two-Phase Locking (2PL) protocol: A concurrency control protocol that ensures serializability by
restricting the way transactions acquire and release locks. It consists of two phases:

1. Growing Phase: A transaction can acquire locks but cannot release any locks.

2. Shrinking Phase: A transaction can release locks but cannot acquire any new locks.

Once a transaction releases its first lock, it enters the shrinking phase and cannot acquire any more
locks. This protocol guarantees serializability.

o Example:

 Transaction T1:

 LOCK(A)

 LOCK(B)

 UNLOCK(A) -- Enters Shrinking Phase

 UNLOCK(B)

 Transaction T2:

 LOCK(C)

 LOCK(D)

 UNLOCK(C)

 UNLOCK(D)

If T1 needs C after releasing A, it won't be allowed.
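Whether a transaction's lock sequence obeys 2PL can be verified with a single scan: once any UNLOCK has occurred, no further LOCK is allowed. A small sketch:

```python
def is_two_phase(ops):
    """ops: ordered ('LOCK', item) / ('UNLOCK', item) pairs of one transaction."""
    shrinking = False
    for action, _item in ops:
        if action == "UNLOCK":
            shrinking = True       # first release starts the shrinking phase
        elif shrinking:
            return False           # acquiring a lock after a release violates 2PL
    return True

t1 = [("LOCK", "A"), ("LOCK", "B"), ("UNLOCK", "A"), ("UNLOCK", "B")]
bad = [("LOCK", "A"), ("UNLOCK", "A"), ("LOCK", "C")]  # needs C after releasing A

print(is_two_phase(t1))   # True
print(is_two_phase(bad))  # False
```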

 Draw schedule and check serializability:

Consider two transactions: T1: R(A), W(A), R(B), W(B) T2: R(A), W(A), R(B), W(B)

Example Schedule 1 (Serializable - Conflict Serializable):

Time | T1 | T2

-----|-----------|-----------
1 | R(A) |

2 | W(A) |

3 | R(B) |

4 | W(B) |

5 | COMMIT |

6 | | R(A)

7 | | W(A)

8 | | R(B)

9 | | W(B)

10 | | COMMIT

This is a serial schedule (T1 then T2), hence serializable.

Example Schedule 2 (Non-serializable - Lost Update):

Time | T1 | T2

-----|-----------|-----------

1 | R(A) |

2 | | R(A)

3 | W(A) |

4 | | W(A) -- T1's write of A is lost if T1 commits later

5 | R(B) |

6 | W(B) |

7 | COMMIT |

8 | | R(B)

9 | | W(B)

10 | | COMMIT
This schedule is not conflict serializable. T2 reads A before T1 writes A, which forces T2 before T1 in any equivalent serial order; but T1 writes A before T2 writes A, which forces T1 before T2. These two requirements contradict each other, producing a cycle in the precedence graph.

Checking Conflict Serializability (Precedence Graph):

1. Create a node for each transaction.

2. Draw a directed edge from T_i to T_j if an operation in T_i conflicts with an operation in T_j, and T_i's operation occurs before T_j's operation in the schedule.

 Conflicts: Read-Write (RW), Write-Read (WR), Write-Write (WW) on the same data item.

3. If the graph has no cycles, the schedule is conflict serializable (and thus serializable).

For Schedule 2:

o T2: R(A) (time 2) occurs before T1: W(A) (time 3) -> edge T2 -> T1 (RW conflict on A)

o T1: W(A) (time 3) occurs before T2: W(A) (time 4) -> edge T1 -> T2 (WW conflict on A)

o The graph contains the cycle T1 -> T2 -> T1. So, not conflict serializable.
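The precedence-graph test can be automated: derive edges from conflicting operations on the same item, then check for a cycle. A sketch applied to Schedule 2 above (operations listed in schedule order as (transaction, action, item)):

```python
def precedence_edges(schedule):
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # Conflict: same item, different transactions, at least one write.
            if xi == xj and ti != tj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    def reachable(src, dst, seen):
        for a, b in edges:
            if a == src and (b == dst or (b not in seen
                                          and reachable(b, dst, seen | {b}))):
                return True
        return False
    return any(reachable(n, n, set()) for n in {x for e in edges for x in e})

# Schedule 2: T1 R(A), T2 R(A), T1 W(A), T2 W(A), T1 R(B), W(B), T2 R(B), W(B)
s2 = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]

edges = precedence_edges(s2)
print(sorted(edges))     # [('T1', 'T2'), ('T2', 'T1')] -- a cycle
print(has_cycle(edges))  # True -> not conflict serializable
```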

 Deadlock prevention methods: Deadlock occurs when two or more transactions are indefinitely
waiting for each other to release resources (locks) that they need.

1. Pre-declaration of Locks:

 Transactions declare all the locks they need before starting execution.

 The DBMS grants all locks at once or none. If not all locks can be granted, the
transaction waits.

 Pros: Prevents deadlocks.

 Cons: May lead to lower concurrency, difficult to know all required locks in
advance for complex transactions.

2. Wait-Die Scheme:

 A non-preemptive method based on timestamps. Each transaction T is assigned a timestamp TS(T) when it begins.

 If T_i requests a lock held by T_j:

 If TS(T_i) < TS(T_j) (T_i is older), T_i waits for T_j.

 If TS(T_i) > TS(T_j) (T_i is younger), T_i dies (rolls back) and restarts later
with the same timestamp.
 Ensures: Older transactions always wait for younger ones, preventing circular
wait.

3. Wound-Wait Scheme:

 Another non-preemptive method using timestamps.

 If T_i requests a lock held by T_j:

 If TS(T_i) < TS(T_j) (T_i is older), T_j is wounded (rolled back) and T_i
acquires the lock. T_j restarts later.

 If TS(T_i) > TS(T_j) (T_i is younger), T_i waits for T_j.

 Ensures: Younger transactions are rolled back if an older one needs their
resource, preventing starvation of older transactions.
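Both timestamp schemes reduce to a two-way decision rule. A sketch (smaller timestamp means older transaction; function names and return strings are illustrative):

```python
def wait_die(ts_requester, ts_holder):
    # Older (smaller timestamp) requester waits; younger requester dies.
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (rolls back) the holder; younger requester waits.
    return "wound holder" if ts_requester < ts_holder else "wait"

print(wait_die(5, 9))    # wait: T(5) is older than lock holder T(9)
print(wait_die(9, 5))    # die: T(9) is younger, so it rolls back and restarts
print(wound_wait(5, 9))  # wound holder: older T(5) preempts T(9)
print(wound_wait(9, 5))  # wait: younger T(9) waits for T(5)
```

In both schemes the roll-back victim restarts with its original timestamp, so it eventually becomes "old enough" to proceed, which rules out starvation as well as deadlock.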

7. File Organization & Indexing

Define:

 Heap File Organization:

o Definition: The simplest file organization where records are placed in the file in the
order in which they are inserted. There is no specific ordering or indexing.

o Characteristics:

 New records are simply appended to the end of the file.

 Deletions create empty spaces that might be reused.

 No sorting or logical order.

o Pros: Very fast for insertions.

o Cons: Very slow for searching (requires full file scan), updates and deletions can be
inefficient if not done at the end.

 Sequential File Organization:

o Definition: Records are stored in a specific sorted order based on the value of a
designated search key (ordering attribute).

o Characteristics:

 Maintained in logical order.


 Updates and insertions can be complex as they might require shifting records to
maintain order.

o Pros: Efficient for sequential processing (e.g., generating reports) and exact match
queries on the ordering key (using binary search).

o Cons: Inefficient for random access, insertions and deletions are costly as they disrupt
the order.

 Hashed File Organization:

o Definition: Records are stored based on a hash function, which computes a memory
address from a specified attribute (hash key). The record is stored at or near that
address.

o Characteristics:

 Uses a hash function to map key values to disk block addresses.

 Collision handling mechanisms are required when different keys map to the
same address.

o Pros: Very fast for exact match queries on the hash key.

o Cons: Inefficient for range queries, collisions can degrade performance, poor for
sequential access.
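A hashed file can be sketched as a fixed set of buckets with chaining for collisions; exact-match lookup touches a single bucket, which is why range queries are poor. Bucket count, hash function, and records below are invented:

```python
N_BUCKETS = 4
buckets = [[] for _ in range(N_BUCKETS)]  # each bucket chains colliding records

def insert(record):
    buckets[record["id"] % N_BUCKETS].append(record)  # hash function: id mod N

def lookup(key):
    # Exact match inspects one bucket only; a range query would scan them all.
    return [r for r in buckets[key % N_BUCKETS] if r["id"] == key]

insert({"id": 3, "name": "Asha"})
insert({"id": 7, "name": "Ravi"})   # 7 % 4 == 3: collides with id 3's bucket
insert({"id": 5, "name": "Meena"})

print(lookup(7))          # [{'id': 7, 'name': 'Ravi'}]
print(len(buckets[3]))    # 2: ids 3 and 7 share a bucket (collision chain)
```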

Explain:

 Primary Index:

o Definition: An index on a file where the search key specifies the sequential order of the
file. It is built on the primary key of the table. There can be only one primary index per
file.

o Characteristics:

 The records in the data file are physically ordered according to the primary index
key.

 Usually a dense or sparse index (sparse if data blocks are sorted).

o Example: An index on StudentID in a Student file, where the Student records are
physically stored sorted by StudentID.

 Secondary Index:
o Definition: An index on a non-ordering attribute (could be a non-key or even a candidate
key, but not the primary key if it's already used for primary index). The data file is not
physically ordered by the secondary index key.

o Characteristics:

 The data file is not necessarily sorted by the secondary index key.

 Always a dense index, as there's no guarantee of physical contiguity for records with similar key values.

 Multiple secondary indexes can exist on a single file.

o Example: An index on StudentName or Major in a Student file.

 Dense vs Sparse Index: These terms describe how many index entries there are relative to the
data records.

o Dense Index:

 Definition: An index that contains an index entry for every search key value in
the data file.

 Characteristics: Each record or block has a corresponding entry in the index.

 Pros: Faster for exact match lookups, as you can directly find the record's
location.

 Cons: Larger index size, more overhead to maintain.

 Used for: Secondary indexes, or primary indexes on unsorted files.

o Sparse Index:

 Definition: An index that contains an index entry for only some of the search key
values in the data file.

 Characteristics: Typically stores an entry for the first record in each data block.

 Pros: Smaller index size, less overhead.

 Cons: Might require some sequential scan within a block after finding the
correct block.

 Used for: Primary indexes on files that are physically sorted by the search key.
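A sparse index can be sketched as one (first key, block) entry per sorted block: binary-search the index, then scan only the chosen block. A small sketch using Python's bisect module (block size and key values are invented):

```python
import bisect

# Sorted data file split into blocks (keys only, three records per block).
blocks = [[2, 5, 8], [11, 14, 17], [20, 23, 26]]

# Sparse index: one entry per block, holding the block's first key.
index_keys = [b[0] for b in blocks]  # [2, 11, 20]

def lookup(key):
    # Binary-search the index for the last block starting at or before key,
    # then scan only that block.
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None  # key smaller than every indexed key
    return key if key in blocks[i] else None

print(lookup(14))  # 14: index points at block 1, scan finds it
print(lookup(13))  # None: block 1 scanned, key absent
```

The index holds 3 entries instead of 9, illustrating the size/lookup trade-off against a dense index.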
 B+ Tree structure (with diagram): A self-balancing tree data structure that maintains sorted data
and allows searches, sequential access, insertions, and deletions in logarithmic time. Widely
used for indexing in databases.

o Characteristics:

 Balanced Tree: All leaf nodes are at the same level.

 Internal Nodes: Store pointers to child nodes and range of key values to guide
searches. They do not store actual data pointers.

 Leaf Nodes: Form a doubly-linked list, containing all actual data pointers (or the
data records themselves for clustered indexes). This allows for efficient range
queries.

 Order m: Each node (except root) has between ceil(m/2) and m children. The
root can have fewer.

 Efficient for: Both exact match and range queries.

Simplified B+ Tree Diagram (keys and fan-out are illustrative):

                  [ 100 | 200 | 300 ]                 (root: keys and child pointers only)
                 /       |       |    \
            < 100    100-199  200-299  >= 300         (key range handled by each subtree)
              |          |
   [10|ptr][20|ptr]...  [150|ptr][160|ptr]...         (leaf nodes: search keys + data pointers)
   <------ leaves form a doubly-linked list for efficient range scanning ------>

o ptr represents a pointer to the actual data record or data block.

o Internal nodes only contain keys and pointers to lower-level nodes.

o All actual data (or pointers to data) are in the leaf nodes.
8. Recovery Techniques

Describe:

 Log-based recovery:

o Concept: The most common recovery technique. It involves maintaining a log file (or
journal) that records all database modifications. The log contains information about each
transaction, including the transaction ID, the operation performed (insert, delete,
update), the data item affected, the before-image (value before change), and the after-
image (value after change).

o Process:

1. During normal operation: All changes are first written to the log before being
applied to the database itself (Write-Ahead Logging - WAL). This ensures that if a
crash occurs, the necessary information for recovery is available in the log.

2. During recovery (after crash):

 Redo Phase (Forward Pass): Scan the log forward from the last
checkpoint. For all committed transactions (those with a commit record
in the log), any changes that were not written to disk are redone
(reapplied) using the after-images in the log. This brings the database to
a state where all committed transactions are reflected.

 Undo Phase (Backward Pass): Scan the log backward from the end. For
all uncommitted transactions (those without a commit or abort record,
or with an abort record but no completed undo), any changes that were
written to disk are undone (rolled back) using the before-images in the
log. This removes the effects of partial/failed transactions.

o Pros: Provides a robust and flexible recovery mechanism. Supports concurrent transactions well.

o Cons: Performance overhead due to log writes. Log can grow very large.
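The redo and undo passes can be sketched over a toy log of (txn, item, before-image, after-image) records plus commit markers (values invented; checkpoints omitted for brevity):

```python
log = [
    ("T1", "A", 100, 90),  # T1 updates A: before-image 100, after-image 90
    ("commit", "T1"),
    ("T2", "B", 50, 70),   # T2 updates B but never commits
]

db = {"A": 100, "B": 70}   # crash state: T2's write reached disk, T1's did not

committed = {rec[1] for rec in log if rec[0] == "commit"}

# Redo pass (forward): reapply after-images of committed transactions.
for rec in log:
    if rec[0] != "commit" and rec[0] in committed:
        _txn, item, _before, after = rec
        db[item] = after

# Undo pass (backward): restore before-images of uncommitted transactions.
for rec in reversed(log):
    if rec[0] != "commit" and rec[0] not in committed:
        _txn, item, before, _after = rec
        db[item] = before

print(db)  # {'A': 90, 'B': 50}: committed change kept, partial change undone
```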

 Shadow Paging:

o Concept: A recovery technique that avoids undo/redo operations by maintaining two page tables (or directories): a current page table and a shadow page table.

o Process:

 When a transaction starts, the shadow page table is a copy of the current page
table.
 All database updates by the transaction are performed on new physical pages.
The current page table is updated to point to these new pages, while the
shadow page table continues to point to the original (old) pages.

 If the transaction aborts, the current page table is discarded, and the shadow
page table becomes the current page table, effectively restoring the database to
its state before the transaction. The new pages are simply ignored (garbage
collected).

 If the transaction commits, the shadow page table is updated to become a copy
of the current page table, and the old pages are discarded.

o Pros: Simpler and faster recovery from crashes (no undo/redo).

o Cons: Can lead to data fragmentation, overhead of copying page tables, doesn't easily
handle concurrent transactions (typically used for single-user or small multi-user
systems).

 Checkpoints:

o Concept: A mechanism used in log-based recovery to reduce the amount of work needed during recovery after a crash. A checkpoint is a point in time where the DBMS ensures that all log records for committed transactions up to that point have been written to stable storage, and all modified buffer pages have been written to disk.

o Process:

 Periodically, the DBMS performs a checkpoint operation.

 It writes a checkpoint record to the log, listing all currently active transactions.

 It flushes all dirty (modified) buffer blocks to disk.

 It updates a restart file (or similar metadata) with the address of the checkpoint
record in the log.

o Benefit during recovery: Instead of scanning the entire log from the beginning, recovery
can start from the last checkpoint. Only transactions active at the time of the
checkpoint, or those that started after the checkpoint, need to be considered for
undo/redo.

o Pros: Significantly reduces recovery time.

o Cons: Can temporarily halt database operations during the checkpoint process (though
modern systems try to minimize this).

Explain:
 Immediate vs Deferred update: These are two strategies for how database updates are applied
to the disk.

o Immediate Update:

 Concept: Database modifications (updates) are written to the actual database on disk before a transaction commits.

 Recovery: Requires both undo (for uncommitted transactions that have written
to disk) and redo (for committed transactions whose changes might not have
made it to disk due to a crash after commit but before flush). The log must
contain both before-images and after-images.

 Pros: Changes are visible to other transactions sooner (if isolation levels allow).

 Cons: More complex recovery logic (both undo and redo).

o Deferred Update (No-undo/Redo):

 Concept: Database modifications are not written to the actual database on disk
until the transaction successfully commits. All changes are initially buffered in
main memory or temporary log.

 Recovery:

 If a transaction aborts, no undo is needed because no changes were written to disk.

 If a crash occurs before commit, the changes are simply discarded.

 If a crash occurs after commit, redo is required for committed transactions to ensure their changes are eventually written to disk. The log only needs to contain after-images.

 Pros: Simpler undo logic (no undo needed), faster commit (no disk writes before
commit).

 Cons: Changes are not durable until commit. Requires more memory for
buffering.

 Recovery with concurrent transactions: When multiple transactions are executing concurrently,
recovery becomes more complex. The goal is to ensure that after a crash, the database is
restored to a consistent state, reflecting only the effects of committed transactions.

o Log-based recovery is crucial: The log must capture the order of operations and the
before/after images for all interleaved transactions.
o Atomicity for individual transactions: Each transaction must still satisfy its own ACID
properties.

o Serializability for the schedule: The recovery mechanism, in conjunction with concurrency control (e.g., 2PL), ensures that the effects of concurrent transactions are equivalent to some serial execution.

o Steps during recovery (using log and checkpoints):

1. Find relevant log records: Start scanning the log from the last checkpoint.

2. Identify active/committed transactions: Build a list of active transactions at the time of the crash and a list of committed transactions.

3. Redo Phase: Apply redo operations for all committed transactions (and possibly
some active ones whose changes reached disk) to ensure all committed changes
are reflected in the database. This typically happens forward from the
checkpoint.

4. Undo Phase: Apply undo operations for all uncommitted transactions (those
active at crash or started after checkpoint and not committed) to remove their
partial effects. This typically happens backward from the end of the log.
