22MCA21 - DBMS - Simplified Notes
Simplified Notes
Based on
Previous Year VTU Questions and answers
Ms.SHAHEENA K V
Assistant Professor, Department of MCA
Acharya Institute of Technology
Soladevanahalli, Bengaluru – 560107
2023-24
ACHARYA INSTITUTE OF TECHNOLOGY
DEPARTMENT OF MCA
PO 2: Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering and business problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
PO 5: Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools, including prediction and modeling, to complex
engineering activities with an understanding of the limitations.
III. PROGRAM EDUCATIONAL OBJECTIVES (PEOS)
PEO 1: Be skilled software engineers who can adopt a lifelong learning philosophy for
continuous improvement and acquire higher education.
PEO 3: Expertise to use research, experiment, contemporary issues to solve real time industrial
problems.
IV. PROGRAM SPECIFIC OUTCOME (PSOS)
PSO 1: Develop self-motivational ability to use current technologies, skills and models in
computing discipline.
PSO 3: Expertise to use research, experiments, and contemporary issues to solve real-time industrial
problems with complete zeal and commitment.
1. What is a DBMS? Explain the actors on the scene and the workers behind the scene, or
discuss the different types of people who work in a database environment.
A DBMS is software that helps define, store, manipulate, and share databases. Its key functions include:
Defining the database: Specifying data types, structures, and constraints.
Constructing the database: Storing data on a controlled storage medium.
Manipulating the database: Retrieving and updating data, and generating reports.
Sharing the database: Allowing multiple users and applications to access it simultaneously.
Additional functions:
Protection: Safeguarding against hardware failures and unauthorized access.
Maintenance: Evolving the system as requirements change over time, since databases can
have a long life cycle.
Actors on the Scene in Database Systems
Database Administrators (DBAs): Responsible for managing and overseeing the database. Tasks
include authorizing user access, managing database resources, addressing security issues, and monitoring
system performance. In large organizations, DBAs often work with a team for support.
Database Designers: Focus on designing the structure of the database. Decide what data needs to
be stored and how it should be structured. Collaborate with users to understand their requirements and
create a database that fulfills their needs.
End Users:
Casual Users: Use the database occasionally with advanced tools for queries.
Naive/Parametric Users: Perform routine tasks using predefined queries and updates.
Sophisticated Users: Advanced users, like engineers, who handle complex tasks using DBMS
features.
Standalone Users: Manage personal databases using pre-built software.
System Analysts and Application Programmers:
System Analysts: Gather user requirements and specify the needs for the database.
Application Programmers: Write programs that implement standard transactions and applications
based on these requirements.
Workers Behind the Scene:
1. DBMS System Designers and Implementers: Develop the software that powers the DBMS.
Create modules for query processing, data access, concurrency control, and data recovery.
2. Tool Developers: Develop specialized software tools for database design, performance monitoring,
and other tasks. These tools are often developed by independent software vendors.
3. Operators and Maintenance Personnel: Ensure the smooth operation of the database hardware
and software. They do not interact with the database content directly but manage the infrastructure.
2. Insulation between Programs and Data (Program-Data Independence): The structure of the data is stored in the DBMS
catalog separately from the access programs, so structural changes (like adding a new field) are made in the catalog, not in the programs. This allows flexible updates
without altering applications.
3. Support for Multiple Views of the Data: A database can provide different views to different
users based on their needs. For example, an admin might need access to student grades, while a teacher
might only need attendance records.
4. Sharing of Data and Multi-User Transaction Processing: Multiple users can access the
database simultaneously. The DBMS ensures that transactions (such as booking a flight seat) are handled
without conflicts, maintaining data accuracy through concurrency control.
3. Explain the advantages of using the database system approach over the traditional file system
approach.
Advantages of Using the DBMS Approach
1. Controlling Redundancy: In traditional systems, different departments maintain their own files,
leading to duplicate data. A DBMS centralizes data storage, reducing redundancy and ensuring consistency,
though controlled redundancy can be used for performance improvement.
2. Restricting Unauthorized Access: A DBMS allows restricting access, so only authorized users
can view or modify data. For instance, salary information might be accessible only to HR staff.
3. Providing Persistent Storage for Program Objects: Databases can permanently store complex
program objects, helping maintain object integrity across programs without requiring manual conversion to
and from files.
4. Efficient Query Processing: A DBMS optimizes queries using indexes and specialized data
structures, speeding up data retrieval. It also manages buffering and caching for frequently accessed data.
5. Backup and Recovery: A DBMS provides mechanisms for recovering from hardware or
software failures, ensuring that the database can be restored to its previous state in case of errors.
6. Providing Multiple User Interfaces: A DBMS supports various user interfaces like web-based
GUIs, query languages, and programming interfaces for different types of users.
7. Representing Complex Relationships: A DBMS handles complex relationships efficiently. For
example, it can manage relationships between students and courses and ensure consistent updates.
8. Enforcing Integrity Constraints: A DBMS enforces rules (like data types or unique keys) to
maintain data accuracy and prevent errors.
9. Permitting Inference and Actions: Modern databases allow for automated inference (deductive
rules) and actions (triggers or stored procedures) based on changes in the data.
A database environment is made up of various components that work together to manage, store, and use
data efficiently. These components include:
Hardware: The physical equipment such as computers and devices used to run the database system.
Software: Includes the operating system and database management software like MS Access or
SQL Server, which helps in managing the data.
People: Those who manage and use the database system, including database administrators and
end-users.
Techniques: The rules, concepts, and procedures used to manage the database and guide how
people and software interact with the data.
Data: The actual information or facts stored in the database that are organized and accessed
through the system.
In short, the database environment is a combination of all these components working together to handle
data efficiently.
Entity: Entities are represented by rectangles in an ERD. Rectangles are labelled with the entity set they
represent.
Attribute: Attributes are represented by ellipses. Ellipses are labelled with the attribute name and connected to
the rectangle (entity).
Relationship A relationship describes how entities interact. For example, the entity “Carpenter” may be
related to the entity “table” by the relationship “builds” or “makes”. Relationships are represented by
diamond shapes and are labelled using verbs.
Relation Instance: A set of tuples at any given point in time. A relation instance doesn't have duplicate
tuples.
Relation Key: One or more attributes that uniquely identify a tuple in a relation.
Attribute Domain: The predefined set of values and data types for an attribute.
2. Describe entity integrity and referential integrity in detail with necessary
examples.
Integrity Constraints: Integrity constraints are a set of rules used to maintain the quality of
information. They ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected. Thus, integrity constraints guard
against accidental damage to the database.
Types of Integrity Constraints
Entity Integrity Constraint: The entity integrity constraint states that a primary key value cannot be NULL.
This is because the primary key value is used to identify individual rows in a relation, and if the primary
key had a NULL value, we could not identify those rows.
A table can contain NULL values in any column other than the primary key field.
Referential Integrity Constraint: A foreign key is a column or group of columns in a relational database table that provides a link between
data in two tables. It is a column (or columns) that references a column (most often the primary key) of
another table.
Example: STUD_NO in STUDENT_COURSE is a foreign key to STUD_NO in STUDENT relation.
Table Student_Course
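The STUDENT and STUDENT_COURSE tables themselves are not reproduced here; a minimal SQL sketch of how the two constraints could be declared for this example (the non-key columns are assumed for illustration):

CREATE TABLE STUDENT (
    STUD_NO    INT PRIMARY KEY,        -- entity integrity: must be unique and never NULL
    STUD_NAME  VARCHAR(50),            -- non-key columns may hold NULL values
    STUD_PHONE VARCHAR(15)
);

CREATE TABLE STUDENT_COURSE (
    STUD_NO    INT,
    COURSE_NO  INT,
    PRIMARY KEY (STUD_NO, COURSE_NO),
    -- referential integrity: every STUD_NO here must exist in STUDENT
    FOREIGN KEY (STUD_NO) REFERENCES STUDENT (STUD_NO)
);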
4. How do the various update operations deal with constraint violations?
In a database, there are four main update operations:
Insert: Adds new data to a table.
Delete: Removes data from a table.
Update: Changes existing data in a table.
Select: Retrieves specific data from the database (this doesn't modify data).
Insert Operation: When adding new data (inserting), certain rules (called constraints) must be followed:
Domain: The data must be in the correct format (like numbers for age).
Primary Key: The key (ID) must be unique and cannot be empty (NULL).
Foreign Key: If a column refers to another table, that data must exist in the other table.
Delete Operation: When deleting data, there could be issues if the data is linked to another table:
Restrict: You can't delete the data if it's linked.
Cascade: Automatically delete the linked data too.
Set Null: Set the linked data to NULL.
Update Operation: When changing data (updating), you must ensure: The new data fits the rules (like the
correct format). Changing a key (ID) doesn't create duplicates or break links with other tables.
In case of any rule violations, the database rejects the operation to maintain data integrity.
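A hedged SQL sketch of how these options can be declared, reusing the EMPLOYEE and DEPARTMENT names that appear later in these notes (the column definitions are assumed):

CREATE TABLE DEPARTMENT (
    Dno   INT PRIMARY KEY,
    Dname VARCHAR(30)
);

CREATE TABLE EMPLOYEE (
    Ssn   CHAR(9) PRIMARY KEY,
    Ename VARCHAR(50) NOT NULL,
    Dno   INT,
    FOREIGN KEY (Dno) REFERENCES DEPARTMENT (Dno)
        ON DELETE SET NULL              -- alternatives: ON DELETE CASCADE, or no action (restrict)
);

-- Rejected insert: department 99 does not exist, so referential integrity is violated.
INSERT INTO EMPLOYEE (Ssn, Ename, Dno) VALUES ('123456789', 'John Smith', 99);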
Other Operations
JOIN:Combines rows from two tables based on a matching condition. Example: Combine employees
with their department details based on Dno.
Cartesian Product (Cross Product):
Operation: Combines every tuple from one relation with every tuple from another relation.
Resulting Relation: Includes all attributes from both relations.
Tuple Count: If relation A has n tuples and relation B has m tuples, then the Cartesian product
A × B will have n × m tuples.
Example:
If A = {(1, 2), (3, 4)} and B = {(5, 6)}, then A × B = {(1, 2, 5, 6), (3, 4, 5, 6)}.
2. JOIN Operations:
JOIN: Combines tuples from two relations based on a related condition.
Difference from Cartesian Product: JOIN is more efficient as it only returns rows where the join condition
is met.
Types of JOIN:
Theta Join: Uses comparison operators (e.g., <, >, =).
Cross Join (Cartesian Product): When there is no join condition, it results in a Cartesian Product.
Outer Joins:
Left Outer Join: Returns all tuples from the left table (A), and matching tuples from the right table (B).
Non-matching tuples are filled with NULL.
Right Outer Join: Returns all tuples from the right table (B), and matching tuples from the left table (A).
Non-matching tuples are filled with NULL.
Full Outer Join: Combines both Left and Right Outer Joins. It returns all tuples from both tables, filling
non-matching tuples with NULL.
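A short SQL sketch of these joins, assuming the EMPLOYEE and DEPARTMENT tables sketched earlier:

-- Theta/equi join: only employees whose Dno matches an existing department.
SELECT E.Ename, D.Dname
FROM EMPLOYEE E JOIN DEPARTMENT D ON E.Dno = D.Dno;

-- Left outer join: every employee; Dname is NULL when there is no matching department.
SELECT E.Ename, D.Dname
FROM EMPLOYEE E LEFT OUTER JOIN DEPARTMENT D ON E.Dno = D.Dno;

-- Full outer join: all employees and all departments, NULL-padded where no match exists.
SELECT E.Ename, D.Dname
FROM EMPLOYEE E FULL OUTER JOIN DEPARTMENT D ON E.Dno = D.Dno;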
3. DIVISION Operation: Used for queries that require identifying tuples in one relation (R) associated
with all tuples in another relation (S).
Operation:
Given R(Z) and S(X), where X ⊆ Z and Y = Z − X, the result of R ÷ S is a relation T(Y) containing tuples that
are associated with all tuples in S.
Example: If relation R contains student-course combinations and S contains a list of courses, division
would find students enrolled in all courses in S.
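SQL has no direct DIVISION operator; a common way to express it is a double NOT EXISTS. A sketch, assuming relations R(StudentID, Course) and S(Course):

-- Students in R who are enrolled in every course listed in S.
SELECT DISTINCT r.StudentID
FROM R r
WHERE NOT EXISTS (
    SELECT *
    FROM S s
    WHERE NOT EXISTS (
        SELECT *
        FROM R r2
        WHERE r2.StudentID = r.StudentID
          AND r2.Course = s.Course
    )
);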
Aggregate Functions and Grouping in Relational Algebra: Aggregate functions like SUM, COUNT,
AVERAGE, MAX, and MIN are used to perform calculations on data. Grouping is used to organize data
based on one or more attributes before applying these functions. Special Operator: ℑ
Function list: Aggregate functions applied to grouped data (e.g., COUNT(Ssn), AVERAGE(Salary)).
R: The relation (table) on which the operation is performed.
Example
Suppose we want to retrieve each department's number (Dno), the number of employees in the department
(COUNT(Ssn)), and their average salary (AVERAGE(Salary)). The query is written as:
ρ R(Dno, No_of_employees, Average_sal) (Dno ℑ COUNT(Ssn), AVERAGE(Salary) (EMPLOYEE))
Here, ρ renames the resulting relation to R and its attributes to Dno, No_of_employees, and Average_sal.
Without Grouping: If no grouping is specified, the aggregate functions apply to the entire relation.
For example, to compute the total number of employees and the average salary for all employees, the query
is: ℑ COUNT(Ssn), AVERAGE(Salary) (EMPLOYEE)
This would result in a single tuple relation with the values of the aggregate functions.
Common Aggregate Functions:
COUNT(X): Counts the number of tuples.
SUM(X): Adds up values in a column.
AVERAGE(X): Calculates the average value.
MAX(X): Finds the highest value.
MIN(X): Finds the lowest value.
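For comparison, the grouped relational algebra expression above corresponds roughly to the following SQL (note that SQL spells the average function AVG rather than AVERAGE), assuming the usual EMPLOYEE(Ssn, Salary, Dno) schema:

-- One row per department: department number, employee count, and average salary.
SELECT Dno, COUNT(Ssn) AS No_of_employees, AVG(Salary) AS Average_sal
FROM EMPLOYEE
GROUP BY Dno;

-- Without grouping: a single row of aggregates over the whole relation.
SELECT COUNT(Ssn), AVG(Salary)
FROM EMPLOYEE;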
TIMESTAMP: Stores date and time, including fractions of a second (e.g., '2024-10-16
10:30:45.123').
Binary Data Types:
BLOB: Stores large binary data like images or videos (e.g., up to 4 GB).
Boolean (Emulated in SQL):
BOOLEAN: Represents true/false values (often emulated with 1 for true and 0 for false).
NUMBER(p,s): Stores integers and real numbers with precision p and scale s. Range/Storage: precision up to 38 digits; scale from -84 to 127.
TIMESTAMP: Stores date and time with fractional seconds. Range/Storage: up to 9 digits of fractional-second precision.
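A small sketch showing these Oracle-style data types in a table definition (the table and column names are assumed for illustration):

CREATE TABLE Product_Log (
    ProductID NUMBER(10),       -- integer value: precision 10, scale 0
    Price     NUMBER(8,2),      -- up to 8 digits, 2 of them after the decimal point
    LoggedAt  TIMESTAMP(3),     -- date and time with 3 fractional-second digits
    Photo     BLOB,             -- large binary data such as an image
    IsActive  NUMBER(1)         -- BOOLEAN emulated as 1 (true) / 0 (false)
);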
4. Briefly explain about Data Manipulation Commands with syntax and examples.
Data Manipulation Commands (DML) in SQL: Data Manipulation Language (DML) includes commands
that manage data stored in the database. These commands allow users to retrieve, insert, update, and delete
data.
Key DML Commands:
SELECT:Retrieves data from one or more tables.
Syntax: SELECT column1, column2, ... FROM table_name WHERE condition;
Example: SELECT EmpName, Salary FROM Employees WHERE Department = 'HR';
Retrieves the names and salaries of employees from the HR department.
INSERT: Adds new data (rows) to a table.
Syntax: INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...);
Example:
INSERT INTO Employees (EmpID, EmpName, Department, Salary) VALUES (101, 'John Doe', 'HR',
50000);
Inserts a new employee into the Employees table.
UPDATE:Modifies existing data in a table.
Syntax: UPDATE table_name SET column1 = value1, column2 = value2, ... WHERE condition;
Example:UPDATE Employees SET Salary = 55000 WHERE EmpID = 101;
Updates the salary of the employee with ID 101.
DELETE:Removes rows from a table based on a condition.
Syntax:DELETE FROM table_name WHERE condition;
Example:DELETE FROM Employees WHERE EmpID = 101;
Deletes the employee with ID 101 from the Employees table.
1. GROUP BY Clause: The GROUP BY clause groups rows that have the same values in one or more columns so that aggregate functions can be applied to each group.
Syntax: SELECT column1, aggregate_function(column2) FROM table_name GROUP BY column1;
Example: SELECT Department, COUNT(EmpID) AS TotalEmployees FROM Employees GROUP BY Department;
This query groups employees by their department and counts the number of employees in each department.
2. HAVING Clause: The HAVING clause is used to filter groups created by the GROUP BY clause.
It is similar to the WHERE clause but works after the grouping is done, allowing you to filter groups based
on aggregate functions.
Syntax: SELECT column1, aggregate_function(column2) FROM table_name GROUP BY column1 HAVING condition;
Example: SELECT Department, COUNT(EmpID) AS TotalEmployees FROM Employees GROUP BY
Department HAVING COUNT(EmpID) > 5;
This query groups employees by department, then filters those groups to show only departments with more
than 5 employees.
Step 1: Create a View: Suppose we want to create a view that displays the names and salaries of employees
in the IT department:
CREATE VIEW IT_Employees AS SELECT Name, Salary FROM Employees
WHERE Department = 'IT';
Step 2: Query the View:Now you can query the view just like a table:
SELECT * FROM IT_Employees;
Output:
Name Salary
Bob 80000
Charlie 75000
Step 3: Updating a View:If the view is updatable and you want to change Bob's salary, you can do so:
UPDATE IT_Employees SET Salary = 85000 WHERE Name = 'Bob';
This update will reflect in the underlying Employees table.
Step 4: Dropping a View: If you no longer need the view, you can drop it using:
DROP VIEW IT_Employees;
Employees Table:
EmployeeID  Name     DepartmentID  Salary
1           Alice    1             60000
2           Bob      2             80000
3           Charlie  2             75000
4           David    3             70000
Departments Table:
DepartmentID  DepartmentName
1             HR
2             IT
To find the names of employees who earn more than the average salary in the Employees table, you can
use a subquery:
SELECT Name FROM Employees WHERE Salary > (SELECT AVG(Salary) FROM Employees);
The inner query (SELECT AVG(Salary) FROM Employees) calculates the average salary, and the outer
query retrieves the names of employees whose salaries exceed this average.
2. Correlated Subqueries: A correlated subquery is a type of subquery that references columns from the
outer query. Unlike a regular subquery, which can be executed independently, a correlated subquery cannot.
It is executed once for each row processed by the outer query.
Example of a Correlated Subquery
Using the same Employees and Departments tables, let’s find employees who earn more than the average
salary in their respective departments.
SELECT Name, Salary FROM Employees e WHERE Salary >
(SELECT AVG(Salary) FROM Employees WHERE DepartmentID = e.DepartmentID);
The inner query calculates the average salary for the department of the current employee from the outer
query (e.DepartmentID). The outer query selects the name and salary of employees whose salaries are
greater than the average salary of their respective departments.
The trigger ensures that the Constituency table is updated automatically whenever a new voter is added.
Triggers are event-driven, so once created, you don't need to call them manually—they execute whenever
the defined event occurs.
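The trigger's definition itself is not reproduced above; a minimal PL/SQL-style sketch of how such a trigger might be written (the Voter and Constituency table and column names are assumed for illustration):

CREATE OR REPLACE TRIGGER Update_Voter_Count
AFTER INSERT ON Voter
FOR EACH ROW
BEGIN
    -- Increment the voter count of the constituency the new voter belongs to.
    UPDATE Constituency
    SET Total_Voters = Total_Voters + 1
    WHERE Constituency_ID = :NEW.Constituency_ID;
END;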
11. Define Database stored procedure. Explain creating and calling stored procedure
with example.
Database Stored Procedure: A stored procedure is a precompiled group of SQL statements stored in the
database. It allows you to execute multiple SQL commands as a single unit.
Reusable: Can be called multiple times, reducing repetition.
Takes Parameters: Can accept input and output parameters.
Efficient: Runs faster as it’s precompiled.
Syntax to Create a Stored Procedure:
CREATE OR REPLACE PROCEDURE procedure_name
( param1 IN datatype, param2 OUT datatype )
AS
BEGIN
-- SQL statements
END procedure_name;
Example:Creating a procedure to add a new employee to the Employees table:
CREATE OR REPLACE PROCEDURE AddNewEmployee (
emp_id IN NUMBER,
emp_name IN VARCHAR2,
emp_salary IN NUMBER
)
AS
BEGIN
INSERT INTO Employees (EmployeeID, EmployeeName, Salary)
VALUES (emp_id, emp_name, emp_salary);
END AddNewEmployee;
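Calling the stored procedure (the argument values here are illustrative):

-- From an anonymous PL/SQL block:
BEGIN
    AddNewEmployee(102, 'Jane Doe', 60000);
END;

-- Or, in SQL*Plus / SQL Developer:
EXEC AddNewEmployee(102, 'Jane Doe', 60000);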
JDBC Drivers: The drivers are used to translate generic JDBC calls into database-specific calls.
Data Sources: Databases or data storage systems that the Java application interacts with.
Types of JDBC Drivers: JDBC drivers are classified into four types based on how they communicate with
the database.
Type I: JDBC-ODBC Bridge Driver: Translates JDBC calls into ODBC (Open Database Connectivity)
calls, and then uses ODBC to connect to the database.
Type I driver provides mapping between JDBC and access API of a database
A common Type I driver defines a JDBC to ODBC bridge
Architecture: Client application → JDBC-ODBC bridge driver (Type I) → ODBC API → native database API → database.
Type II: Native-API Driver (Partially Java): Converts JDBC calls into database-specific calls using
native database libraries.
A Type II driver communicates with the database by making calls directly to its native API
More efficient since there is one less layer to contend with (i.e. no ODBC)
It is dependent on the existence of a native API for a database
Type III: Network Protocol Driver:Converts JDBC calls into a database-independent protocol that is sent to
a server. The server then translates the requests into database-specific calls.
This communication uses a database independent net protocol
Middleware server then makes calls to the database using database-specific protocol
The program sends JDBC call through the JDBC driver to the middle tier
Middle-tier may use Type I or II JDBC driver to communicate with the database.
Type IV: Thin Driver (Pure Java): Directly converts JDBC calls into database-specific protocol using Java.
It communicates directly with the database.
It issues requests directly to the database using its native protocol
It can be used directly on any platform with a JVM
Most efficient since requests only go through one layer
Simplest to deploy since no additional libraries or middle-ware
each employee, it’s redundant since these could be stored once in the DEPARTMENT table.
Anomalies:
Insertion: Adding new data might require unnecessary or inconsistent information.
Deletion: Deleting data could unintentionally remove other data.
Modification: Updating one attribute could require changes across multiple rows.
Guideline: Design the schema to avoid insertion, deletion, and update anomalies.
Minimize NULL Values: NULL Values: Attributes that frequently hold NULLs could be split into
separate relations.
Reasons for NULLs:
Attribute is not applicable.
Attribute value is unknown.
Value exists but is currently unavailable.
Guideline: Design relations to have as few NULL values as possible.
Prevent Spurious Tuples:
Spurious Tuples: Poor designs can lead to erroneous results when performing JOIN operations.
Lossless Join: Ensures no incorrect data appears when relations are joined.
Guideline: Ensure the schema satisfies the lossless join property to avoid generating spurious tuples.
2. What are the problems caused by Redundancy? Explain about Normalization and the need for Normalization.
Problems Caused by Redundancy:
Insertion Anomalies: Adding new data may require unnecessary or repetitive information.
Example: To add a new employee, you might also have to insert department details if stored redundantly in
the same table.
Deletion Anomalies: Removing data can unintentionally delete other important information.
Example: Deleting the last employee in a department might result in the deletion of department details if
they are stored in the same table.
Modification Anomalies: Updating data in one place but missing it elsewhere leads to inconsistencies
across the database.
Normalization: Normalization is a systematic process of organizing data in a database to reduce
redundancy and improve data integrity. It involves decomposing tables to ensure that each piece of
information is stored only once.
Need for Normalization
Reduce Redundancy: By eliminating duplicate data, normalization ensures that each fact is
stored in only one place.
Improve Data Integrity: Since data is not repeated, updates, insertions, and deletions are easier
to manage and less prone to errors.
Prevent Anomalies:
Update Anomalies are minimized since data changes only need to happen in one place.
Insertion Anomalies are reduced by separating data into logical tables.
Deletion Anomalies are avoided because data relationships are maintained properly.
Efficient Data Organization: Breaking down data into related tables helps streamline queries
and optimizes database performance.
3. Define Functional Dependency. State and prove Armstrong's inference rules.
4. Discuss Different inference rules of Functional Dependencies
Functional Dependency: A functional dependency is a constraint between two sets of attributes in a relation. It states that if
two tuples (rows) have the same values for certain attributes, then they must have the same values for
another set of attributes.
Notation: If X and Y are sets of attributes in a relation R, then the functional dependency is represented as:
X→Y
This means that if two tuples have the same value for X, they must also have the same value for Y. In other
words, X uniquely determines Y.
Example: In a table of students, if StudentID determines StudentName, we write it as:
StudentID→StudentName
This means that each unique StudentID corresponds to one StudentName.
Armstrong's Axioms (Inference Rules for Functional Dependencies):Armstrong's Axioms are a set of
sound and complete inference rules used to derive all possible functional dependencies logically implied by
a given set of functional dependencies. These axioms were introduced by William W. Armstrong in 1974.
Armstrong's Axioms
Reflexivity Rule: If Y is a subset of X, then X → Y.
Example: If X = {A, B} and Y = {A}, then X → Y is valid.
Augmentation Rule: If X → Y, then XZ → YZ for any Z (where Z is an additional set of attributes).
Example: If StudentID → StudentName, then {StudentID, CourseID} → {StudentName, CourseID} is
valid.
Transitivity Rule: If X → Y and Y → Z, then X → Z.
Example: If StudentID → StudentName and StudentName → StudentAddress, then StudentID →
StudentAddress.
Additional Inference Rules (Derived from Armstrong's Axioms):
These additional rules can be derived from the three basic axioms (reflexivity, augmentation, and
transitivity):
Union Rule (Additivity): If X → Y and X → Z, then X → YZ.
Proof:
From X → Y, augmenting both sides with X gives X → XY.
From X → Z, augmenting both sides with Y gives XY → YZ.
By transitivity on X → XY and XY → YZ, we get X → YZ.
Decomposition Rule (Projectivity): If X → YZ, then X → Y and X → Z.
Proof:
By reflexivity, YZ → Y and YZ → Z.
By transitivity on X → YZ and YZ → Y, we get X → Y; similarly, from X → YZ and YZ → Z, we get X → Z.
Normal forms: Condition using keys and FDs of a relation to certify whether a relation schema is in
a particular normal form
1NF - based on the basic definition of a relation (attribute values must be atomic)
2NF, 3NF, BCNF - based on keys and FDs of a relation schema
4NF - based on keys and multi-valued dependencies (MVDs)
5NF - based on keys and join dependencies (JDs)
Here, DeptName depends only on DeptID and not the whole key (partial dependency).
Third Normal Form: A given relation is in Third Normal Form (3NF) if and only if:
1. The relation is already in 2NF.
2. No non-key attribute is transitively dependent on the primary key.
Here, DeptLocation depends on DeptID, not directly on EmpID, leading to a transitive dependency.
STUDENT_COURSE Table:
StudentID Course
101 Math
102 Math
103 Science
104 Science
In COURSE_INSTRUCTOR, Course uniquely determines Instructor, and Course is the primary key.
In STUDENT_COURSE, (StudentID, Course) is the primary key, and there are no non-trivial
dependencies other than the primary key itself.
Benefits of BCNF: Reduces redundancy by removing anomalies. And Makes updates, deletions, and
insertions more efficient and consistent.
In this example, Grade has a fully functional dependency on (StudentID, CourseID), as it depends on both
attributes together, while Instructor only depends on StudentID (a partial dependency). To meet 2NF, we
should remove partial dependencies by decomposing the table.
2. Transitive Dependency:A transitive dependency occurs when a non-key attribute depends on another
non-key attribute rather than directly on the primary key. In other words, if attribute A depends on attribute
B, and B depends on attribute C, then A is transitively dependent on C.
Transitive dependencies often lead to redundancy and anomalies and prevent a table from meeting Third
Normal Form (3NF).
Example: Consider a table EmployeeDept with attributes (EmployeeID, DepartmentID,
DepartmentLocation, DepartmentHead):
FD 1: EmployeeID → DepartmentID
FD 2: DepartmentID → DepartmentLocation, DepartmentHead
Since EmployeeID determines DepartmentID and DepartmentID determines DepartmentLocation and DepartmentHead, these two attributes are transitively dependent on EmployeeID through DepartmentID. Decomposing the table, as sketched below, removes this transitive dependency.
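A sketch of the 3NF decomposition; the table names DEPT and EMP are chosen for illustration, and the column names follow the example above:

-- Department facts stored once, keyed by DepartmentID.
CREATE TABLE DEPT (
    DepartmentID       INT PRIMARY KEY,
    DepartmentLocation VARCHAR(50),
    DepartmentHead     VARCHAR(50)
);

-- Each employee now records only the department it belongs to.
CREATE TABLE EMP (
    EmployeeID   INT PRIMARY KEY,
    DepartmentID INT,
    FOREIGN KEY (DepartmentID) REFERENCES DEPT (DepartmentID)
);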
A transaction is a logical unit of database work that must either execute all of its
steps successfully or none at all. If any step fails, the transaction must be rolled back to maintain database
consistency.
ACID Properties of Transactions: The ACID properties are crucial for maintaining the reliability,
consistency, and integrity of transactions in a database system. They ensure that even in the presence of
failures, concurrent executions, or other unexpected events, the database remains in a consistent state.
To ensure that transactions are processed reliably, they are characterized by four key properties known as
ACID:
Atomicity: This property states that a transaction must be treated as an atomic unit; that is, either all of its
operations are executed successfully, or none of them are executed at all. In other words, a transaction is
"all or nothing."
Example: If the money transfer transaction described above fails after deducting the amount from the
sender's account but before crediting it to the recipient's account, atomicity ensures that the deduction is
rolled back. As a result, the sender's balance remains unchanged, preventing any partial completion of the
transaction.
Consistency: The consistency property ensures that a transaction takes the database from one consistent
state to another consistent state. It mandates that all data integrity constraints must be maintained.
Example: In the money transfer example, if the sender’s account has sufficient funds before the transaction,
the database should remain consistent after the transaction. The total amount of money in the system before
and after the transaction should be the same.
Isolation: Isolation ensures that transactions are executed independently of one another. Even if multiple
transactions are running concurrently, the results of each transaction should not be visible to others until
they are committed.
Example: If two users are transferring money simultaneously, isolation ensures that the operations of one
transaction do not affect the operations of another. Each transaction appears to run in isolation, which
prevents any interference that could lead to inconsistent results.
Durability:The durability property guarantees that once a transaction has been committed, its changes are
permanent, even in the event of a system failure. Once the database confirms that a transaction has been
completed, the changes it made are stored in a way that survives crashes and restarts.
Example: After the money transfer transaction is successfully committed, even if a power failure occurs
immediately afterward, the changes (i.e., updated balances of both accounts) will be saved in the database.
Non-recoverable Schedule
Recoverable Schedule: A schedule is recoverable if it ensures that no transaction reads data written by
another uncommitted transaction.
Example: Consider two transactions, T1 and T2.
If T1 writes to a data item A and T2 reads A, T2 must not commit until T1 has committed.
If T1 fails and is rolled back, T2 should also roll back to maintain consistency.
Non-Recoverable Schedule: In a non-recoverable schedule, a transaction may read data written by another
uncommitted transaction.
Example: If T1 writes to data item A, and T2 reads A before T1 commits, then if T1 fails and rolls back,
T2 may have committed using an invalid value, resulting in inconsistencies.
Such schedules can lead to the "lost update" or "dirty read" problems discussed earlier.
Cascading Schedule: A cascading schedule is one where the rollback of one transaction may cause other
transactions to also rollback.
Example: If T1 writes to A and T2 reads A before T1 commits, if T1 fails, T2 must also be rolled back.
This can lead to performance issues and should be avoided in practice, as it creates a chain reaction of
rollbacks.
Strict Schedule: A strict schedule prohibits any transaction from reading or writing data written by another
transaction until that transaction has been committed.
This is the safest type of scheduling for ensuring recoverability, as it eliminates the possibility of cascading
rollbacks.
Importance of Recoverability
Data Integrity: Ensuring that transactions do not interfere with each other's uncommitted changes helps
maintain the integrity and consistency of the database.
System Stability: Recoverability helps the system recover from failures without compromising data
correctness. It minimizes the potential for inconsistencies that can arise from concurrent transaction
execution.
Conflict Management: By managing read and write operations carefully, recoverability mechanisms help
prevent common concurrency problems like lost updates and dirty reads.
3. Describe the two-phase locking technique for concurrency control in databases. Explain
how it ensures serializability.
Two-Phase Locking (2PL) Technique for Concurrency Control in Databases:Two-phase locking (2PL)
is a concurrency control protocol used in database management systems to ensure that transactions are
executed in a manner that maintains database consistency and ensures serializability. The protocol divides
the execution of a transaction into two distinct phases: the growing phase and the shrinking phase.
Phases of Two-Phase Locking
Growing Phase:
1. In this phase, a transaction may acquire any number of locks on data items it needs to access, but it
cannot release any locks.
2. As the transaction progresses, it requests and obtains the necessary locks to perform read or write
operations on the data.
3. The goal is to gather all the required locks before proceeding to the next phase.
Shrinking Phase:
1. Once a transaction releases its first lock, it enters the shrinking phase. During this phase, the
transaction can only release locks; it cannot acquire any new locks.
2. This restriction ensures that once a transaction starts releasing locks, it is moving toward completion.
Lock Conversion: In 2PL, lock conversion is allowed:
Upgrading Lock: In the growing phase, a transaction can upgrade a shared lock (S) to an exclusive
lock (X) on a data item. For example, if a transaction holds a shared lock S(a) on a data item and
later requires an exclusive lock, it can upgrade it to X(a).
Downgrading Lock: In the shrinking phase, if a transaction holds an exclusive lock (X) on a data
item and wants to downgrade it to a shared lock (S), this must occur during the shrinking phase.
Lock Release: In Strict-2PL, a transaction does not release any locks until the transaction is fully
committed. This means that all locks are held until the end of the transaction, eliminating the
shrinking phase.
This approach guarantees that no other transaction can access the data items that are locked by the
transaction until it has completed, ensuring a higher level of consistency.
Ensuring Serializability: Two-phase locking ensures serializability, which means that the outcome of
executing transactions concurrently is equivalent to some serial execution of those transactions. Here’s
how 2PL achieves this:
No New Locks in Shrinking Phase: Since no new locks can be acquired in the shrinking phase, once a
transaction starts releasing locks, it is guaranteed that the operations can no longer interfere with other
transactions.
Lock Point: The concept of a lock point (the moment when a transaction acquires its last lock) helps
establish a clear boundary between the growing and shrinking phases, ensuring that any locks held by
other transactions are respected.
Conflict Serializability: Transactions are structured in such a way that if two transactions access the
same data items, they will do so in a consistent manner, preventing anomalies such as lost updates or
dirty reads.
Strict-2PL Advantage: By delaying lock releases until a transaction is completely finished, Strict-2PL
further enhances the guarantee of serializability. Since locks are not released until commit, it prevents
cascading rollbacks, where the failure of one transaction might necessitate rolling back others.
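A hedged SQL illustration of strict 2PL behaviour, assuming an Account(AccNo, Balance) table: each SELECT ... FOR UPDATE and UPDATE acquires a lock during the growing phase, and COMMIT releases all locks at once.

-- Transfer 1000 from account 101 to account 202 inside one transaction.
SELECT Balance FROM Account WHERE AccNo = 101 FOR UPDATE;   -- exclusive row lock on 101
SELECT Balance FROM Account WHERE AccNo = 202 FOR UPDATE;   -- exclusive row lock on 202
UPDATE Account SET Balance = Balance - 1000 WHERE AccNo = 101;
UPDATE Account SET Balance = Balance + 1000 WHERE AccNo = 202;
-- Strict 2PL: no lock has been released so far; COMMIT releases them all together.
COMMIT;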
When a transaction wants to write to a data item, it checks the timestamps of any existing transactions that
have read or written that item. If there are conflicts (i.e., another transaction with an earlier timestamp has
written to the item), the transaction is aborted.
Let's assume there are two transactions, T1 and T2. Suppose transaction T1 entered the system at time
007 and transaction T2 entered the system at time 009. T1 has the higher priority, so it executes
first, since it entered the system first.
The timestamp ordering protocol also maintains the timestamp of last 'read' and 'write'
operation on a data.
Basic Timestamp ordering protocol works as follows:
1. Whenever a transaction Ti issues a Read(X) operation, check the following conditions:
If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is set to the larger of R_TS(X) and TS(Ti).
2. Whenever a transaction Ti issues a Write(X) operation, check the following conditions:
If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.
If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise the operation is
executed and W_TS(X) is set to TS(Ti).
Where,
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read time-stamp of data-item X.
W_TS(X) denotes the Write time-stamp of data-item X.
Multiple Granularity Locking allows a more efficient concurrency control mechanism by providing flexibility in the
size of locks. It reduces contention for resources, improves concurrency by allowing multiple transactions
to access different parts of the database simultaneously, and ultimately enhances the overall performance of
database operations.
Here’s how it works and improves concurrency:
Hierarchical Structure: The database is divided into different levels (or granularity levels), which can be
visualized as a tree. For example:
Root Level: Entire database
Second Level: Areas (logical divisions of the database)
Third Level: Files (collections of records)
Fourth Level: Records (individual data entries)
Locking Mechanism: In this structure, you can lock at different levels. For instance, locking an entire area
would also implicitly lock all files and records within that area.
Conversely, if you lock a specific record, you do not lock the entire file or area, allowing other transactions
to access other records within the same file or area concurrently.
Improved Concurrency:
Reduced Lock Overhead: By allowing locks at different levels, Multiple Granularity Locking
reduces the overhead of managing locks, as fewer locks need to be held simultaneously.
Increased Flexibility: It provides flexibility in how locks are acquired and released, enabling
transactions to work with only the specific data they need without locking larger sections unnecessarily.
Better Resource Utilization: Multiple transactions can work on different levels of the database
simultaneously, improving resource utilization and overall throughput.
Reference books:
1. Fundamentals of Database Systems, Ramez Elmasri and Shamkant B. Navathe, 7th Edition, 2017,
Pearson.
2. Database management systems, Ramakrishnan, and Gehrke, 3rd Edition, 2014, McGraw Hill.
3. Database System Concepts, Abraham Silberschatz, Henry F. Korth and S. Sudarshan, 6th Edition, Tata
McGraw Hill.