Database Management Systems Solutions
The core components of a database schema are integral to its design. Tables represent the
basic units of data storage, organized into rows and columns. Columns define the specific type
of data to be held, while rows represent individual entries or records. Fields (Columns) within
these tables are assigned specific data types, such as integers, variable characters (varchar), or
dates, which dictate the kind of data they can hold. These fields can also be associated with
constraints, including primary keys, foreign keys, and unique constraints, to enforce data
integrity. Relationships are established through primary keys, which uniquely identify records
within a table, and foreign keys, which link tables together to ensure referential integrity.
Indexes are crucial for enhancing the speed of data retrieval, allowing for quicker access to
specific data points. Lastly, Views are virtual tables derived from queries across one or more
existing tables, serving to simplify complex queries and enhance data security by restricting
access to specific data subsets.
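To make these components concrete, the following is a minimal SQL sketch (the table, index, and view names such as STUDENT, idx_student_name, and STUDENT_CONTACT_VIEW are hypothetical, and the syntax is generic SQL rather than any specific vendor's dialect):

-- Hypothetical table with typed columns and constraints.
CREATE TABLE STUDENT (
    Student_Id INT PRIMARY KEY,        -- primary key: uniquely identifies each row
    Name       VARCHAR(50) NOT NULL,   -- variable-character field
    Email      VARCHAR(100) UNIQUE,    -- unique constraint prevents duplicate emails
    DOB        DATE                    -- date-typed field
);

-- Index to speed up retrieval by Name.
CREATE INDEX idx_student_name ON STUDENT(Name);

-- View: a virtual table exposing only a restricted subset of columns.
CREATE VIEW STUDENT_CONTACT_VIEW AS
SELECT Name, Email FROM STUDENT;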
Database schemas are categorized into various types, each offering distinct levels of
abstraction and serving different purposes. The Physical Schema describes the lowest level of
data organization, detailing how data is physically stored in the database. This includes
specifications for files, indices, storage devices, and hardware configurations, with a focus on
optimizing storage and retrieval performance. The Logical Schema outlines the logical design
of the database, focusing on the structure without considering physical implementation details. It
specifies tables, fields, data types, relationships, and constraints, defining how data is logically
organized and interconnected. The Conceptual Schema, often referred to as the View
Schema, provides a high-level overview of the entire database structure. It abstracts the logical
schema, presenting an overall picture of entities, relationships, and constraints without delving
into implementation specifics. The existence of physical, logical, and conceptual schemas is a
deliberate design choice to manage the inherent complexity of database systems. Each level
provides a different perspective or abstraction, allowing various stakeholders, such as database
administrators, developers, and end-users, to interact with the database at their appropriate
level of detail without being overwhelmed by unnecessary complexities. For instance, an
end-user does not need to understand physical storage mechanisms to query data; they only
require a conceptual view. This multi-level architecture is crucial for modularity, data
independence, and maintainability in large-scale database systems, enabling changes at one
level (e.g., physical storage optimization) with minimal impact on higher levels (e.g., user
applications), which directly supports the concept of data independence.
In contrast to a static database schema, a Database Instance refers to a specific instantiation
or snapshot of a database system at a particular moment in time. It encompasses the
operational database along with its associated resources, including memory structures and background processes. Unlike the static schema blueprint, a database instance is dynamic and
evolves as data is inserted, updated, or deleted. The primary distinction is that the schema
serves as the static blueprint of the database's structure, while the instance represents the
dynamic, active state containing the actual data values at any given point.
Table 1.3: Database Schema vs. Database Instance
Aspect | Database Schema | Database Instance
Definition | Blueprint or design of the database structure. | Actual data stored in the database at a given time.
Nature | Static (does not change frequently). | Dynamic (changes with every data modification).
Represents | Structure (tables, columns, data types, relationships). | State of the data in the database.
Example | Table definitions, data types, constraints. | Actual rows of data in the tables.
Change Frequency | Changes infrequently (e.g., during schema design changes). | Changes frequently with transactions.
Database Management Systems provide a specialized set of languages for manipulating data,
encompassing operations such as insertion, deletion, updating, and modification. These
database languages are specifically designed to read, update, and store data efficiently within
the database.
Data Definition Language (DDL) is used for describing structures, patterns, and their
relationships within a database. It is the language used to define the database schema,
including tables, indexes, and constraints. DDL commands primarily affect the structure of the
database, not the data itself. A key characteristic of DDL commands is that they are
"auto-committed," meaning that any changes made are permanently saved to the database
immediately upon execution. Common DDL commands include:
● CREATE: Used to construct a new table or an entire database. For example, CREATE
TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);.
● ALTER: Employed to modify the existing structure of a database. This could involve
adding a new column, such as ALTER TABLE STU_DETAILS ADD(ADDRESS
VARCHAR2(20));, or modifying the characteristics of an existing column, as in ALTER
TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));.
● DROP: Used to completely delete both the structure and all records stored within a table.
For instance, DROP TABLE EMPLOYEE;.
● TRUNCATE: Removes all rows from a table while preserving its structure and freeing up
the space occupied by the deleted data. An example is TRUNCATE TABLE EMPLOYEE;.
● RENAME: Used to change the name of a table.
● COMMENT: Allows for adding descriptive comments to the data dictionary, aiding in
documentation.
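The RENAME and COMMENT commands are not illustrated above; the following hedged examples use Oracle-style syntax (the old table name STU_DETAILS_OLD and the comment text are hypothetical):

RENAME STU_DETAILS_OLD TO STU_DETAILS;                               -- renames an existing table
COMMENT ON TABLE STU_DETAILS IS 'One row per enrolled student';      -- stores documentation in the data dictionary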
Data Manipulation Language (DML) is used to manipulate the actual data present within
tables or the database. It enables users to perform operations such as storing new data,
modifying existing data, updating records, and deleting information. Unlike DDL commands,
DML commands are not auto-committed, which means that changes made can be rolled back if
necessary, providing a layer of transactional control. Common DML commands include:
● INSERT: Used to add new rows of data into a table. For example, INSERT INTO BCA
VALUES ('Anuj', 'DBMS');.
● UPDATE: Employed to modify or update the values of one or more columns in a table. An
example is UPDATE students SET User_Name = 'Anuj' WHERE Student_Id = '3';.
● DELETE: Used to remove one or more rows from a table based on specified conditions.
For instance, DELETE FROM BCA WHERE Author = 'Anuj';.
● SELECT: Used to retrieve records from a specific table, often combined with a WHERE
clause to filter for particular records.
● MERGE: A command that allows for both insert and update operations (often referred to
as UPSERT) in a single statement.
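MERGE is listed above without an example; a hedged Oracle-style sketch of an UPSERT follows (the staging table students_stage and its columns are hypothetical):

-- Update existing students from a staging table, insert the rest (UPSERT).
MERGE INTO students t
USING students_stage s
ON (t.Student_Id = s.Student_Id)
WHEN MATCHED THEN
    UPDATE SET t.User_Name = s.User_Name
WHEN NOT MATCHED THEN
    INSERT (Student_Id, User_Name) VALUES (s.Student_Id, s.User_Name);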
The clear distinction between DDL and DML highlights a fundamental separation of concerns in
database management. DDL focuses on the schema or blueprint, defining the rules and
structure, while DML focuses on the instance or content, manipulating the actual data within that
defined structure. The auto-commit nature of DDL versus the rollback capability of DML further
emphasizes this, as structural changes are typically permanent, while data modifications often
require transactional integrity. This dual linguistic approach provides precise control over
different aspects of the database. It allows database administrators to manage the underlying
structure independently from how application developers and end-users interact with the data,
ensuring both structural integrity and operational flexibility.
Table 3.1: SQL Command Categories
Category | Purpose | Common Commands | Syntax/Example
DDL (Data Definition Language) | Defines/modifies database structure. Auto-committed. | CREATE, ALTER, DROP, TRUNCATE, RENAME, COMMENT | CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100)); ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20)); DROP TABLE EMPLOYEE; TRUNCATE TABLE EMPLOYEE;
DML (Data Manipulation Language) | Manipulates data within tables. Not auto-committed (can rollback). | INSERT, UPDATE, DELETE, SELECT, MERGE | INSERT INTO BCA VALUES ('Anuj', 'DBMS'); UPDATE students SET User_Name = 'Anuj' WHERE Student_Id = '3'; DELETE FROM BCA WHERE Author = 'Anuj'; SELECT COUNT(PHONE) FROM STUDENT;
DCL (Data Control Language) | Manages user access and permissions. | GRANT, REVOKE | GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER; REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1;
TCL (Transaction Control Language) | Manages database transactions. Used with DML. | COMMIT, ROLLBACK, SAVEPOINT | DELETE FROM CUSTOMERS WHERE AGE = 25; COMMIT; DELETE FROM CUSTOMERS WHERE AGE = 25; ROLLBACK; SAVEPOINT MY_SAVEPOINT;
Unit 2: Data Modeling using the Entity-Relationship Model
This unit introduces the Entity-Relationship (ER) model as a high-level conceptual data model,
detailing its components, types of relationships, key concepts, and its translation into the
relational data model.
The fundamental building blocks of an ER model include entities, attributes, and relationships.
An Entity can be any distinct object, class, person, or place about which data needs to be
stored. Examples in an organizational context might include a manager, product, employee, or
department. In ER diagrams, entities are represented as rectangles. An Attribute describes a
property or characteristic of an entity. For instance, for a student entity, attributes could include
an ID, age, contact number, or name. Attributes are graphically represented as ellipses. A
Relationship defines an association or connection between entities. These connections are
depicted using diamonds or rhombuses in ER diagrams.
Beyond basic entities, the ER model distinguishes between regular and Weak Entities. A weak
entity is one that depends on another entity for its existence and does not possess a key
attribute of its own. It is visually represented by a double rectangle.
Attributes themselves can be further classified based on their characteristics:
● A Key Attribute represents the main distinguishing characteristic of an entity, often
serving as its primary identifier. In an ER diagram, it is shown as an ellipse with
underlined text.
● A Composite Attribute is an attribute that is composed of several other, simpler
attributes. For example, a "Name" attribute might be composed of "First_name,"
"Middle_name," and "Last_name." It is represented by an ellipse connected to other
ellipses that denote its constituent parts.
● A Multivalued Attribute is an attribute that can hold more than one value for a single
entity instance. An example is a student having multiple phone numbers. This type of
attribute is depicted by a double oval.
● A Derived Attribute is an attribute whose value can be computed or derived from other
attributes. For instance, a person's "Age" can be derived from their "Date of birth." It is
represented by a dashed ellipse.
Relationships between entities are categorized based on their cardinality, which describes how
many instances of one entity can be associated with instances of another.
● One-to-One (1:1) Relationship: Occurs when one instance of an entity is associated with
exactly one instance of another entity. An example is a monogamous marriage, in which
one woman is married to exactly one man.
● One-to-Many (1:M) Relationship: Involves one instance of an entity on one side being
associated with multiple instances of an entity on the other side. For example, a scientist
can invent many inventions, but each invention is attributed to a specific scientist.
● Many-to-One (M:1) Relationship: The inverse of one-to-many, where multiple instances
of an entity on one side are associated with a single instance on the other. For instance,
many students can enroll in the same course, so many student records all point to one
course record.
● Many-to-Many (M:N) Relationship: Occurs when multiple instances of an entity on one
side can be associated with multiple instances of an entity on the other. An example is
employees being assigned to many projects, and a project having many employees.
Mapping Cardinality (or Cardinality Ratio) defines the maximum number of relationship
instances in which an entity can participate. For binary relationship types, the possible ratios
include 1:1, 1:N, N:1, and N:M.
Participation or Existence Constraint represents the minimum number of relationship
instances each entity must participate in, also known as the minimum cardinality constraint.
● Total Participation: Implies that every entity in the set must be related to another entity
via the relationship. This is also referred to as existence dependency and is represented
by a double line connecting the entity to the relationship in an ER diagram.
● Partial Participation: Indicates that not every entity in the set needs to be related
through the relationship. This is shown by a single line connecting the entity to the
relationship.
The ER model's detailed concepts—entities, attributes, relationships, weak entities, and various
attribute types—along with precise mapping constraints like cardinality and participation, are not
just arbitrary symbols. They provide a standardized, high-level language to capture the
semantics of the real world. This allows database designers to communicate complex business
rules and data interdependencies to non-technical stakeholders, ensuring the database
accurately reflects the organizational domain before technical implementation. The ER model
thus serves as a crucial conceptual bridge, translating ambiguous real-world requirements into a
structured, unambiguous blueprint for database design. This conceptual clarity minimizes
misinterpretations during the translation to logical and physical models, ultimately leading to
more accurate and robust database systems.
Table 2.1: ER Diagram Notations
ER Component | Description | Standard Graphical Representation
Entity | Any object, class, person, or place. | Rectangle
Weak Entity | Entity dependent on another, without its own key. | Double Rectangle
Attribute | Property of an entity. | Ellipse
Key Attribute | Main characteristic, primary identifier. | Ellipse with underlined text
Composite Attribute | Attribute composed of other attributes. | Ellipse connected to other ellipses
Multivalued Attribute | Attribute with more than one value. | Double Oval
Derived Attribute | Attribute derived from other attributes. | Dashed Ellipse
Relationship | Association between entities. | Diamond or Rhombus
2.1.5 Keys in DBMS
Keys play a fundamental role in relational databases, serving as essential tools for uniquely
identifying records and establishing meaningful relationships between tables.
● Primary Key: This is the most critical key, used to uniquely identify one and only one
instance of an entity within a table. While an entity might possess multiple attributes that
could potentially serve as unique identifiers, the most suitable one is chosen as the
primary key. For example, in an EMPLOYEE table, Employee_ID is an ideal candidate for
a primary key because it is unique for each employee. Other attributes like
License_Number or Passport_Number could also be unique, but Employee_ID might be
chosen for practical reasons.
● Candidate Key: A candidate key is an attribute or a set of attributes that can uniquely
identify a tuple (row) in a table. All attributes that are not selected as the primary key but
still possess the ability to uniquely identify a tuple are considered candidate keys. These
keys are as strong as the primary key in their identification capability. Continuing with the
EMPLOYEE table example, if Employee_ID is designated as the primary key, then SSN,
Passport_Number, and License_Number would all be considered candidate keys because
they too can uniquely identify an employee record.
● Super Key: A super key is a set of attributes that, when combined, can uniquely identify a
tuple in a table. It is essentially a superset of a candidate key. This implies that a super
key can include additional attributes beyond what is strictly necessary for unique
identification, as long as the combination still guarantees uniqueness. For instance, for
the EMPLOYEE table, EMPLOYEE_ID by itself is a super key. The combination
(EMPLOYEE_ID, EMPLOYEE_NAME) is also a super key, because even if two
employees share the same name, their EMPLOYEE_ID will be distinct, ensuring unique
identification of the tuple.
● Foreign Key: Foreign keys are columns in one table that are used to point to the primary
key of another table. Their primary purpose is to establish and identify relationships
between different tables in a relational database. This mechanism is vital for linking
related information across various entities without introducing data redundancy. For
example, in a company, employees work in specific departments. To link the EMPLOYEE
table with the DEPARTMENT table, a Department_ID column in the EMPLOYEE table
would act as a foreign key, referencing the Department_ID (which is the primary key) in
the DEPARTMENT table. This approach ensures that departmental information is not
duplicated within the employee table, maintaining data integrity.
The various types of keys (Primary, Candidate, Super, Foreign) are not just labels; they are the
fundamental mechanisms through which data integrity and relationships are enforced in a
relational database. The primary key ensures uniqueness within a table, preventing duplicate
records. Candidate keys represent alternative unique identifiers. Super keys illustrate that
uniqueness can be achieved with more attributes than strictly necessary. Most importantly,
foreign keys are the glue that binds related tables together, maintaining referential integrity and
allowing meaningful queries across multiple entities. A robust key structure is paramount for a
well-designed database. It ensures data accuracy, prevents inconsistencies, and facilitates
efficient data retrieval and manipulation by clearly defining how data points relate to each other
across the entire database schema.
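As a hedged illustration of how these keys are declared in practice, the sketch below reuses the EMPLOYEE and DEPARTMENT example from above (the exact data types, and any column not named in the text, are assumptions):

-- Candidate keys are declared UNIQUE; one of them (Employee_ID) is chosen as the primary key.
CREATE TABLE DEPARTMENT (
    Department_ID INT PRIMARY KEY
);

CREATE TABLE EMPLOYEE (
    Employee_ID     INT PRIMARY KEY,        -- chosen primary key
    SSN             VARCHAR(11) UNIQUE,     -- candidate key
    Passport_Number VARCHAR(20) UNIQUE,     -- candidate key
    License_Number  VARCHAR(20) UNIQUE,     -- candidate key
    Department_ID   INT,
    FOREIGN KEY (Department_ID) REFERENCES DEPARTMENT(Department_ID)  -- foreign key enforcing referential integrity
);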
Table 2.2: Types of Keys in DBMS
Key Type Definition Example
Primary Key Uniquely identifies one and only Employee_ID in an
one instance of an entity. Most EMPLOYEE table.
suitable key from potential
candidates.
Candidate Key An attribute or set of attributes If Employee_ID is PK, then
that can uniquely identify a SSN, Passport_Number,
tuple. All non-primary keys that License_Number in
can uniquely identify a tuple. EMPLOYEE table.
Super Key A set of attributes that can EMPLOYEE_ID or
uniquely identify a tuple. A (EMPLOYEE_ID,
superset of a candidate key. EMPLOYEE_NAME) in
EMPLOYEE table.
Foreign Key Columns in one table that Department_ID in EMPLOYEE
reference the primary key of table referencing
another table, establishing Department_ID (PK) in
relationships. DEPARTMENT table.
2.1.6 Generalization and Aggregation
Generalization and Aggregation are advanced ER modeling techniques that directly address the
challenge of complexity in larger database designs, providing mechanisms for abstraction.
Generalization is a bottom-up process that involves extracting common properties from a set of
lower-level entities to create a higher-level, more generalized entity. This approach simplifies the
model by identifying commonalities and creating super-entities, reducing redundancy in the
model itself. For example, entities such as Pigeon, House Sparrow, Crow, and Dove can all be
generalized into a single Birds entity. Similarly, STUDENT and FACULTY entities, which share
common attributes like name and address, can be generalized into a PERSON entity. This
process supports both attribute inheritance (where lower-level entities inherit attributes from
higher-level entities, e.g., a Car inheriting a Model attribute from Vehicle) and participation
inheritance (where relationships involving a higher-level entity set are also inherited by
lower-level entities).
Aggregation is an abstraction mechanism used when an ER diagram needs to represent a
relationship between an entity and another relationship. It allows a relationship, along with its
corresponding entities, to be treated as a single higher-level entity set. This is particularly useful
for modeling 'has-a', 'is-a', or 'is-part-of' relationships. The primary purpose of aggregation is to
address scenarios where a direct relationship between an entity and a relationship cannot be
adequately represented in a standard ER diagram. For instance, if an Employee WORKS_FOR
a Project, and this combined WORKS_FOR relationship then REQUIRES Machinery,
aggregation allows the WORKS_FOR relationship (along with EMPLOYEE and PROJECT) to
be treated as a single entity. A new REQUIRE relationship can then be established between this
aggregated entity and the MACHINERY entity.
These abstraction mechanisms are critical for creating clear, concise, and manageable ER
diagrams for complex systems. They improve the readability and maintainability of the
conceptual model, making it easier to translate into an efficient relational schema.
Relational Algebra is a procedural query language that operates on relations (tables), taking
relations as input and producing relations as output. It serves as the mathematical and logical
foundation upon which all relational database query languages, such as SQL, are built.
Basic Operators:
● Selection (σ): Used to select tuples (rows) from a relation that satisfy a specified
condition. The syntax is σ(Condition)(Relation Name). For example,
σ(AGE>18)(STUDENT) would extract all students older than 18 from the STUDENT
relation.
● Projection (∏): Used to select specific columns from a relation. The syntax is ∏(Column
1, Column 2…Column n)(Relation Name). A key characteristic is that it automatically
removes duplicate rows from the result. For instance, ∏(ROLL_NO,NAME)(STUDENT)
would extract only the ROLL_NO and NAME columns from the STUDENT relation.
● Cross Product (X): Used to combine two relations by concatenating every row of the first
relation with every row of the second relation. If Relation1 has 'm' tuples and Relation2
has 'n' tuples, their cross product will yield 'm x n' tuples. The syntax is Relation1 X
Relation2.
● Union (U): This operator combines tuples from two relations (R1 and R2) that are "union
compatible," meaning they must have the same number of attributes and their
corresponding attributes must have the same domain. The result contains all unique
tuples present in either R1 or R2, with duplicates appearing only once. The syntax is
Relation1 U Relation2.
● Minus (-): Also applicable only to union-compatible relations (R1 and R2), the minus
operator R1 - R2 yields a relation containing tuples that are present in R1 but not in R2.
The syntax is Relation1 - Relation2.
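Because SQL is built on relational algebra, each basic operator has an approximate SQL counterpart. The hedged sketch below assumes the STUDENT, STUDENT_SPORTS, and EMPLOYEE relations used in this unit's examples (and assumes EMPLOYEE is union-compatible with the projected STUDENT columns); DISTINCT is added because SQL does not remove duplicates automatically the way projection does:

-- Selection σ(AGE>18)(STUDENT)
SELECT * FROM STUDENT WHERE AGE > 18;

-- Projection ∏(ROLL_NO, NAME)(STUDENT), with duplicates removed
SELECT DISTINCT ROLL_NO, NAME FROM STUDENT;

-- Cross product STUDENT X STUDENT_SPORTS
SELECT * FROM STUDENT CROSS JOIN STUDENT_SPORTS;

-- Union of two union-compatible results (duplicates appear once)
SELECT ROLL_NO, NAME FROM STUDENT
UNION
SELECT ROLL_NO, NAME FROM EMPLOYEE;

-- Minus: tuples in STUDENT but not in EMPLOYEE (EXCEPT here; called MINUS in Oracle)
SELECT ROLL_NO, NAME FROM STUDENT
EXCEPT
SELECT ROLL_NO, NAME FROM EMPLOYEE;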
Extended Operators: These operators are derived from the basic relational algebra operators
and provide more specialized functionalities.
● Join: Used to combine data from two or more tables based on a related column between
them. Joins are crucial for efficient data retrieval in complex queries.
○ Inner Join: Returns only those rows where there is a match in both tables. If no
match exists, the row is excluded. Types include:
■ Conditional Join (Theta Join ⋈θ): Joins relations based on any specified
condition (e.g., equality, inequality, greater than).
■ Equi Join: A special case of conditional join where the join condition is solely
based on equality between attributes.
■ Natural Join (⋈): Automatically combines tables based on matching column
names and data types, eliminating duplicate columns in the result set without
explicit equality conditions.
○ Outer Join: Returns all records from one table and the matched records from the
other. If no match is found, NULL values are included for the non-matching
columns. Types include:
■ Left Outer Join (⟕): Returns all records from the left table and matching
records from the right table. Unmatched rows from the left table have NULLs
for right table columns.
■ Right Outer Join (⟖): Returns all records from the right table and matching
records from the left table. Unmatched rows from the right table have NULLs
for left table columns.
■ Full Outer Join (⟗): Returns all records when there is a match in either the
left or right table. If no match, it includes all rows from both tables with NULL
values for the missing side.
● Intersection (∩): Returns the common records from two union-compatible relations. It
retrieves rows that appear in both tables, ensuring only matching data is included.
● Divide (÷): Used to find records in one relation that are associated with all records in
another relation. This operator is particularly useful for identifying entities that satisfy
conditions across multiple related datasets.
Relational Algebra is presented as a procedural query language, which is crucial because it is
the mathematical and logical foundation upon which all relational database query languages
(like SQL) are built. Understanding these operators (selection, projection, joins, set operations)
provides a deep insight into how data is manipulated and retrieved at a fundamental level,
regardless of the specific SQL syntax used. The distinction between basic and extended
operators shows how complex operations can be built from simpler ones. Mastering relational
algebra provides a powerful conceptual framework for understanding query optimization and
database performance. It allows for thinking about data manipulation in a structured, formal way,
which is invaluable for designing efficient queries and understanding query execution plans.
Table 2.3: Relational Algebra Operators
Operator | Symbol | Purpose | Syntax/Example
Selection | σ | Selects tuples (rows) based on a condition. | σ(AGE>18)(STUDENT)
Projection | ∏ | Selects particular columns; removes duplicates. | ∏(ROLL_NO, NAME)(STUDENT)
Cross Product | X | Joins two relations by concatenating every row of R1 with every row of R2. | STUDENT X STUDENT_SPORTS
Union | U | Combines unique tuples from two union-compatible relations. | STUDENT U EMPLOYEE
Minus | - | Returns tuples in R1 but not in R2 (union-compatible). | STUDENT - EMPLOYEE
Join | ⋈, ⋈θ, ⟕, ⟖, ⟗ | Combines data from two or more tables based on related columns. | R ⋈ S (Natural Join); R ⟕ S (Left Outer Join)
Intersection | ∩ | Returns common unique records from two union-compatible relations. | R ∩ S
Divide | ÷ | Finds records in one relation associated with all records in another. | R ÷ S
Data Definition Language (DDL) commands are used to alter the structure of database objects
like tables. These commands are characterized by their auto-committed nature, meaning that
changes are permanently saved to the database immediately upon execution.
● CREATE: This command is used to construct new tables or entire databases. For
example, to create an EMPLOYEE table with specific columns for name, email, and date
of birth, the syntax would be CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email
VARCHAR2(100), DOB DATE);.
● ALTER: The ALTER command modifies the structure of an existing database object. This
could involve adding a new column to a table, such as ALTER TABLE STU_DETAILS
ADD(ADDRESS VARCHAR2(20));, or changing the characteristics of an existing column,
as in ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));.
● DROP: This command is used to delete both the structure and all records stored within a
table. For instance, DROP TABLE EMPLOYEE; would remove the entire EMPLOYEE
table from the database.
● TRUNCATE: The TRUNCATE command is used to delete all rows from a table while
preserving its structure. It also frees up the space previously occupied by the deleted
data. An example is TRUNCATE TABLE EMPLOYEE;.
Data Manipulation Language (DML) commands are employed to modify the data residing within
the database. Unlike DDL commands, DML operations are not auto-committed, allowing
changes to be rolled back if necessary, which is crucial for transactional integrity.
● INSERT: This statement adds new rows of data into a table. For example, INSERT INTO
BCA VALUES ('Anuj', 'DBMS'); would insert a new record into the BCA table.
● UPDATE: The UPDATE command is used to modify existing values in one or more
columns of a table. A conditional WHERE clause is typically used to specify which rows to
update. For instance, UPDATE students SET User_Name = 'Anuj' WHERE Student_Id =
'3'; changes the User_Name for a specific student.
● DELETE: This command removes one or more rows from a table. Similar to UPDATE, a
WHERE clause can be used to specify which rows to delete. An example is DELETE
FROM BCA WHERE Author = 'Anuj';.
● SELECT: Although often categorized separately as DQL (Data Query Language),
SELECT is a fundamental DML operation used to retrieve records from a table. It is
frequently combined with a WHERE clause to filter for particular records, as seen in
various examples throughout the document.
3.3.3 Subqueries
Subqueries, also known as inner queries or nested queries, are SQL queries embedded within
another SQL query. The inner query executes first, and its result is then used by the outer query.
This capability allows for complex data retrieval and the implementation of sophisticated
business logic directly within the database.
Subqueries play several important roles, including filtering records based on data from related
tables, aggregating data and performing calculations dynamically, cross-referencing data
between tables to retrieve specific insights, and conditionally selecting rows without requiring
explicit joins or external code logic.
There are several types of subqueries, each suited for different scenarios:
● Scalar Subqueries: These return a single value (one row and one column). They are
frequently used where a single value is expected, such as in calculations, comparisons, or
assignments within SELECT or WHERE clauses. For example, SELECT
employee_name, salary FROM employees WHERE salary > (SELECT AVG(salary)
FROM employees); retrieves employees whose salary is greater than the overall average
salary.
● Column Subqueries: These return a single column but multiple rows. They are often
used with operators like IN or ANY, where the outer query compares values from multiple
rows. An example is SELECT employee_name FROM employees WHERE department_id
IN (SELECT department_id FROM departments WHERE location = 'New York'); which
filters employees based on departments located in a specific city.
● Row Subqueries: These return a single row containing multiple columns. They are
typically used with comparison operators that can compare an entire row of data, such as
= or IN, when multiple values are expected. For instance, SELECT employee_name
FROM employees WHERE (department_id, job_title) = (SELECT department_id, job_title
FROM managers WHERE manager_id = 1); finds employees with matching department
and job titles to a specific manager.
● Table Subqueries (Derived Tables): These return a complete table of multiple rows and
columns. They are commonly used in the FROM clause as a temporary table within a
query. An example is SELECT dept_avg.department_id, dept_avg.avg_salary FROM
(SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY
department_id) AS dept_avg WHERE dept_avg.avg_salary > 50000; which uses a
derived table of average salaries per department to filter departments above a certain
threshold.
● Correlated Subqueries: These refer to columns from the outer query in their WHERE
clause and are re-executed once for each row processed by the outer query. This means
the subquery depends on the outer query for its values.
● Non-Correlated Subqueries: These do not refer to the outer query and can be executed
independently. Their result is calculated once and then used by the outer query.
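A correlated subquery is described above but not illustrated; a minimal hedged sketch follows, reusing the employees table from the earlier examples (the column names are assumptions carried over from those examples):

-- The inner query references e.department_id from the outer row, so it is
-- re-evaluated once per employee: who earns above their own department's average?
SELECT e.employee_name, e.salary
FROM employees e
WHERE e.salary > (SELECT AVG(salary)
                  FROM employees
                  WHERE department_id = e.department_id);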
The introduction of subqueries signifies SQL's capability to handle increasingly complex data
retrieval and business logic. Instead of just fetching data directly, subqueries allow for multi-step
computations and filtering, where one query's result informs another. This enables dynamic data
aggregation, cross-referencing, and conditional selection that would be cumbersome or
impossible with simple queries. The different types (scalar, column, row, table) show the
versatility in how these intermediate results can be used. Subqueries are a powerful feature for
advanced SQL users, enabling them to construct sophisticated queries that mimic complex
application logic directly within the database. This reduces the need for external
application-level processing, potentially improving performance and simplifying application code.
Aggregate functions perform mathematical operations on a set of data values within a relation
and return a single summary result. These functions are crucial for data analysis and reporting,
transforming raw data into meaningful insights.
● COUNT: This function counts the number of rows in a relation that satisfy a specified
condition. For example, SELECT COUNT (PHONE) FROM STUDENT; would count the
number of students with a phone number.
● SUM: The SUM function adds up the values of a specific numeric attribute in a relation.
An example is SELECT SUM(AGE) FROM STUDENT; which calculates the total sum of
ages for all students.
● AVERAGE (AVG): This function calculates the average value of tuples for a given
attribute. It can be expressed as AVG(attributename) or
SUM(attributename)/COUNT(attributename).
● MAXIMUM (MAX): The MAX function extracts the highest value among a set of tuples for
a specified attribute. Its syntax is MAX(attributename).
● MINIMUM (MIN): Conversely, the MIN function extracts the lowest value among a set of
tuples for a specified attribute. Its syntax is MIN(attributename).
The GROUP BY clause is used in conjunction with aggregate functions to group the tuples of a
relation based on one or more attributes. The aggregation function is then computed for each
distinct group. For example, SELECT ADDRESS, SUM(AGE) FROM STUDENT GROUP BY
(ADDRESS); would calculate the sum of ages for students residing in each distinct address.
The output would show aggregated sums per address, such as DELHI 36, GURGAON 18, and
ROHTAK 20.
Aggregate functions combined with GROUP BY transform raw data into meaningful summaries.
This is a critical capability for business intelligence and reporting. Instead of just listing individual
records, these operations allow users to derive insights like total sales per region, average
employee salary per department, or the number of students in each course. This moves beyond
simple data retrieval to data analysis. The ability to aggregate and group data directly within
SQL empowers analysts and decision-makers to gain high-level insights from large datasets
efficiently. This reduces the need for external tools for basic statistical analysis, making SQL a
powerful tool for data-driven decision-making.
Table 3.3: SQL Aggregate Functions
Function Name | Purpose | Example SQL Query
COUNT | Counts the number of rows (or non-NULL values in a column). | SELECT COUNT(PHONE) FROM STUDENT;
SUM | Calculates the sum of values in a numeric column. | SELECT SUM(AGE) FROM STUDENT;
AVG (Average) | Calculates the average value of a numeric column. | SELECT AVG(AGE) FROM STUDENT;
MAX (Maximum) | Finds the maximum value in a column. | SELECT MAX(AGE) FROM STUDENT;
MIN (Minimum) | Finds the minimum value in a column. | SELECT MIN(AGE) FROM STUDENT;
3.3.5 Joins
Joins are fundamental SQL operations used to combine data from two or more tables based on
related columns between them. They are essential for reconstructing information that is logically
distributed across multiple normalized tables.
● Inner Join: This type of join returns only those rows where there is a match in both tables
based on the join condition. If a row in one table does not have a corresponding match in
the other, it is excluded from the result.
○ Conditional Join (Theta Join ⋈θ): A general form of join that combines relations
based on any specified condition, which can include comparison operators like >, <,
=, >=, <=, or !=.
○ Equi Join: A specific type of conditional join where the join condition is solely
based on an equality (=) between attributes from the two tables. In the result, only
one of the equal attributes is typically displayed.
○ Natural Join (⋈): This join automatically combines two tables based on matching
column names and data types. It implicitly applies an equality condition on all
identically named columns and eliminates duplicate columns from the result,
providing a seamless combined dataset.
● Outer Join: Unlike inner joins, outer joins return all records from one table and the
matched records from the other. If no match is found for a row in one table, the result will
include NULL values for the columns of the non-matching table.
○ Left Outer Join (⟕): Returns all records from the "left" table (the first table specified
in the FROM clause) and only the matching records from the "right" table. If there is
no match in the right table, the columns from the right table will contain NULL
values for that row.
○ Right Outer Join (⟖): Returns all records from the "right" table and only the
matching records from the "left" table. Unmatched rows from the right table will
have NULL values for the left table's columns.
○ Full Outer Join (⟗): Returns all records when there is a match in either the left or
right table. If no match is found in either table, it includes all rows from both tables,
with NULL values for the missing side.
● Cross Join: This operation produces the Cartesian product of two tables. It combines
every row from the first table with every row from the second table, resulting in m x n
tuples, where m and n are the number of rows in each table, respectively.
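As a hedged illustration of the join types listed above, the sketch below uses the EMPLOYEE and DEPARTMENT tables from the running example (column names such as Dept_Name are assumptions):

-- Inner join: only employees whose Department_ID matches a department.
SELECT e.Name, d.Dept_Name
FROM EMPLOYEE e
INNER JOIN DEPARTMENT d ON e.Department_ID = d.Department_ID;

-- Left outer join: every employee; Dept_Name is NULL where no department matches.
SELECT e.Name, d.Dept_Name
FROM EMPLOYEE e
LEFT OUTER JOIN DEPARTMENT d ON e.Department_ID = d.Department_ID;

-- Cross join: Cartesian product, m x n rows.
SELECT * FROM EMPLOYEE CROSS JOIN DEPARTMENT;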
Joins are fundamental to the relational model's power. They enable the reconstruction of
information that is logically distributed across multiple tables (due to normalization). Without
joins, normalized databases would be fragmented, making it impossible to answer queries that
require data from related entities. The different types of joins (inner, outer) provide flexibility in
how data is combined, allowing for precise control over what data is included or excluded based
on matching criteria. Joins are essential for data integration and comprehensive querying in
relational databases. They allow complex relationships between entities to be leveraged,
enabling the retrieval of a holistic view of the data that is critical for business operations and
analysis.
Table 3.4: SQL Join Types
Join Type | Description
Inner Join | Returns rows only when there is a match in both tables based on the join condition.
Left Outer Join | Returns all rows from the left table, and the matched rows from the right table. Unmatched left rows have NULLs for right columns.
Right Outer Join | Returns all rows from the right table, and the matched rows from the left table. Unmatched right rows have NULLs for left columns.
Full Outer Join | Returns all rows from both tables when there is a match in either. Unmatched rows from either side have NULLs for the missing columns.
Cross Join | Produces the Cartesian product of two tables, combining every row from the first with every row from the second.
SQL provides set operations that allow for combining the results of two or more SELECT
statements. These operations require the participating queries to be "union compatible,"
meaning they must have the same number of columns, and corresponding columns must have
compatible data types.
● Union (U): This operation combines the result sets of two or more SELECT statements
and returns all unique rows from both queries. Duplicate rows are eliminated from the final
result. For example, STUDENT U EMPLOYEE would combine unique rows from both
student and employee tables, assuming they are union compatible.
● Intersection (∩): The INTERSECT operator returns only the distinct rows that are
present in both result sets of two SELECT statements. For instance, SELECT manager_id
FROM departments INTERSECT SELECT employee_id FROM employees; would list IDs
that are present in both the manager_id column of the departments table and the
employee_id column of the employees table, effectively showing managers who are also
employees.
● Minus (EXCEPT): The MINUS (or EXCEPT in some SQL dialects like SQL Server)
operator returns distinct rows from the first SELECT statement that are not present in the
second SELECT statement. For example, STUDENT - EMPLOYEE would return records
of individuals who are students but not employees. Similarly, SELECT employee_id
FROM employees EXCEPT SELECT manager_id FROM departments; would list
employees who are not managers.
Set operations are powerful tools for comparative analysis between different datasets, provided
they are "union compatible." They allow users to find combined results, common elements, or
distinct differences between two query results, which is invaluable for identifying overlaps, gaps,
or unique populations within the data. These operations extend SQL's analytical capabilities,
enabling users to perform sophisticated comparisons directly within the database. This is
particularly useful for tasks like identifying shared customer bases, finding employees who are
not also managers, or combining sales data from different regions.
Table 3.5: SQL Set Operations
Operation | Description | Example (Conceptual)
Union | Combines unique rows from two or more queries. | SELECT Name FROM Students UNION SELECT Name FROM Employees; (lists all unique names from both tables)
Intersection | Returns common unique rows found in both queries. | SELECT ProductID FROM Sales2023 INTERSECT SELECT ProductID FROM Sales2024; (lists products sold in both years)
Minus (EXCEPT) | Returns unique rows from the first query that are not present in the second. | SELECT EmployeeID FROM FullTimeStaff EXCEPT SELECT EmployeeID FROM PartTimeStaff; (lists full-time staff who are not also part-time)
Data Control Language (DCL) commands are specifically designed to manage user access
privileges and control authority within a database. They determine who can perform what
actions on which database objects.
● GRANT: This command is used to bestow specific user access privileges to a database
or its objects. For instance, GRANT SELECT, UPDATE ON MY_TABLE TO
SOME_USER, ANOTHER_USER; would allow SOME_USER and ANOTHER_USER to
read and modify data in MY_TABLE.
● REVOKE: Conversely, the REVOKE command is used to withdraw permissions that were
previously granted to users. An example is REVOKE SELECT, UPDATE ON MY_TABLE
FROM USER1, USER2;, which would remove the SELECT and UPDATE privileges from
USER1 and USER2 on MY_TABLE.
DCL commands are the primary mechanisms for implementing access control and security
within the database. They define who can do what with the data and schema. This is not just
about preventing unauthorized access but also about implementing the principle of least
privilege, ensuring users only have the necessary permissions for their roles. This directly ties
into the broader concept of database security. Proper use of DCL is critical for maintaining the
confidentiality, integrity, and availability of sensitive database information. It allows
administrators to enforce granular security policies, protecting data from unauthorized
modifications or disclosures.
Transaction Control Language (TCL) commands are used in conjunction with DML commands
to manage transactions. These operations are distinct in that they are not automatically
committed, providing explicit control over the permanence of data modifications.
● COMMIT: The COMMIT command is used to permanently save all transactions to the
database. Once a transaction is committed, its changes become a permanent part of the
database and cannot be undone by a simple rollback. For example, after deleting
customers aged 25, DELETE FROM CUSTOMERS WHERE AGE = 25; COMMIT; would
make those deletions permanent.
● ROLLBACK: The ROLLBACK command is used to undo transactions that have not yet
been permanently saved to the database. This is crucial for reverting the database to its
state before a series of operations if an error occurs or the transaction is not completed
successfully. For instance, DELETE FROM CUSTOMERS WHERE AGE = 25;
ROLLBACK; would undo the deletion if it had not yet been committed.
● SAVEPOINT: A SAVEPOINT allows for rolling back a transaction to a specific designated
point without having to roll back the entire transaction. This provides finer-grained control
over transaction management, enabling partial rollbacks. The syntax is SAVEPOINT
SAVEPOINT_NAME;.
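The three TCL commands can be combined within a single transaction; the sketch below is a hedged example built on the CUSTOMERS deletions shown above (the second deletion, AGE = 30, is hypothetical):

DELETE FROM CUSTOMERS WHERE AGE = 25;
SAVEPOINT MY_SAVEPOINT;                  -- marks a point within the transaction
DELETE FROM CUSTOMERS WHERE AGE = 30;    -- hypothetical further deletion
ROLLBACK TO MY_SAVEPOINT;                -- undoes only the work done after the savepoint
COMMIT;                                  -- makes the AGE = 25 deletion permanent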
TCL commands are directly linked to the ACID properties of transactions. They provide the
explicit control necessary to ensure Atomicity (all or nothing) and Durability (permanent
changes) for DML operations. The ability to ROLLBACK is crucial for maintaining consistency in
the face of errors or incomplete operations, preventing partial updates from corrupting the
database state. TCL is essential for managing the reliability of data modifications in a multi-user
environment. It allows developers to define logical units of work that are either fully completed or
completely undone, guaranteeing the integrity of the database even during complex operations
or system failures.
To systematically derive new functional dependencies from a given set of existing ones, a set of
inference rules, often referred to as Armstrong's Axioms, are employed. These rules form the
formal grammar that defines valid relationships between attributes in a relational database.
They are not just descriptive; they are prescriptive rules that dictate how data must behave to
ensure consistency. The inference rules allow designers to logically deduce new dependencies
from a given set, which is critical for identifying potential problems and designing optimal
schemas.
The six primary inference rules are:
● Reflexivity Rule (IR1): If Y is a subset of X, then X determines Y (X → Y). This rule
states that a set of attributes always determines any of its subsets. For example, if a
database contains StudentID and Name attributes, then {StudentID,Name} → {StudentID}
is a valid dependency, as StudentID is a subset of the combined set.
● Augmentation Rule (IR2): If X → Y, then XZ → YZ. This rule indicates that adding any
set of attributes (Z) to both sides of an existing functional dependency does not invalidate
the dependency. For instance, if StudentID → Name, then adding Address to both sides
implies that {StudentID, Address} → {Name, Address} is also true.
● Transitive Rule (IR3): If X → Y and Y → Z, then X → Z. This rule allows for the chaining
of dependencies, meaning if X determines Y, and Y in turn determines Z, then X must also
determine Z. An example is if StudentID → Name and Name → Address, then StudentID
→ Address can be inferred.
● Union Rule (IR4) / Additivity: If X → Y and X → Z, then X → YZ. This rule states that if a
set of attributes X determines two other sets of attributes Y and Z separately, then X also
determines the combination of Y and Z. For example, if EmployeeID → EmployeeName
and EmployeeID → Department, then EmployeeID → EmployeeName, Department can
be derived.
● Decomposition Rule (IR5) / Projectivity: If X → YZ, then X → Y and X → Z. This rule is
the inverse of the Union Rule, allowing a combined dependency to be broken down into
individual dependencies. For instance, if ProductID → ProductName, ProductPrice, then it
can be decomposed into ProductID → ProductName and ProductID → ProductPrice.
● Pseudo-transitive Rule (IR6): If X → Y and YZ → W, then XZ → W. This rule is a
variation of transitivity, where an additional set of attributes (Z) is involved in the
intermediate step. An example is if CourseID → Department and {Department, Semester}
→ Instructor, then {CourseID, Semester} → Instructor can be inferred.
A deep understanding of functional dependencies is essential for effective database design and
normalization. They provide the theoretical foundation for identifying redundancy and
anomalies, guiding the decomposition of tables into well-structured forms that maintain data
integrity and improve query performance.
Table 4.1: Functional Dependency Inference Rules (Armstrong's Axioms)
Rule Name | Syntax | Example
Reflexivity | If Y ⊆ X, then X → Y | {StudentID, Name} → {StudentID}
Augmentation | If X → Y, then XZ → YZ | If StudentID → Name, then {StudentID, Address} → {Name, Address}
Transitivity | If X → Y and Y → Z, then X → Z | If StudentID → Name and Name → Address, then StudentID → Address
Union (Additivity) | If X → Y and X → Z, then X → YZ | If EmployeeID → EmployeeName and EmployeeID → Department, then EmployeeID → {EmployeeName, Department}
Decomposition (Projectivity) | If X → YZ, then X → Y and X → Z | If ProductID → {ProductName, ProductPrice}, then ProductID → ProductName and ProductID → ProductPrice
Pseudo-transitivity | If X → Y and YZ → W, then XZ → W | If CourseID → Department and {Department, Semester} → Instructor, then {CourseID, Semester} → Instructor
First Normal Form (1NF) is the most basic level of normalization. A relation is in 1NF if each
table cell contains only a single, atomic value, and each column has a unique name. Essentially,
every attribute in that relation must be a single-valued attribute. A relation violates 1NF if it
contains composite or multi-valued attributes. For example, a STUDENT table with a
multi-valued STUD_PHONE attribute (where a student can have multiple phone numbers stored
in a single cell) would violate 1NF. To convert such a table to 1NF, the multi-valued attribute
would need to be separated, perhaps by creating a new table for phone numbers linked by the
STUD_NO.
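A hedged sketch of that 1NF conversion follows (column names such as STUD_NAME and PHONE_NO, and the data types, are assumptions beyond those named in the text):

-- Base table keeps only single-valued attributes.
CREATE TABLE STUDENT (
    STUD_NO   INT PRIMARY KEY,
    STUD_NAME VARCHAR(50)
);

-- The multi-valued phone attribute moves to its own table, linked by STUD_NO.
CREATE TABLE STUDENT_PHONE (
    STUD_NO  INT REFERENCES STUDENT(STUD_NO),
    PHONE_NO VARCHAR(15),
    PRIMARY KEY (STUD_NO, PHONE_NO)      -- one row per (student, phone number) pair
);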
To be in Second Normal Form (2NF), a relation must first satisfy the conditions of 1NF, and it
must not contain any partial dependency. A partial dependency occurs when a non-prime
attribute (an attribute that is not part of any candidate key) is dependent on a proper subset of a
candidate key of the table. Consider a table with attributes STUD_NO, COURSE_NO, and
COURSE_FEE. If {STUD_NO, COURSE_NO} is the primary key, and COURSE_NO alone
determines COURSE_FEE (i.e., COURSE_NO → COURSE_FEE), then COURSE_FEE is a
non-prime attribute that is partially dependent on COURSE_NO (which is a proper subset of the
primary key). This scenario violates 2NF. To convert this relation to 2NF, it would be split into
two tables: (STUD_NO, COURSE_NO) and (COURSE_NO, COURSE_FEE), thereby removing
the partial dependency and reducing data redundancy.
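A hedged sketch of that 2NF decomposition follows (the data types are assumptions; the table names mirror the attributes given in the text):

-- Enrollment table: the full composite key remains, and no non-prime attribute
-- depends on only part of it.
CREATE TABLE STUDENT_COURSE (
    STUD_NO   INT,
    COURSE_NO INT,
    PRIMARY KEY (STUD_NO, COURSE_NO)
);

-- The fee now depends on the whole key of its own table, removing the partial dependency.
CREATE TABLE COURSE (
    COURSE_NO  INT PRIMARY KEY,
    COURSE_FEE DECIMAL(10, 2)
);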
Third Normal Form (3NF) builds upon 2NF. A relation is in 3NF if it is in 2NF and does not have
any transitive dependency for non-prime attributes. A transitive dependency exists when a
non-prime attribute depends on another non-prime attribute, which in turn depends on the
primary key, creating an indirect dependency. The basic condition for a non-trivial functional
dependency X → Y to satisfy 3NF is that either X is a Super Key, or Y is a Prime Attribute
(meaning every element of Y is part of a candidate key). For example, consider an Enrollments
table where Enrollment ID is the primary key, and it includes Course ID and Course Name. If
Enrollment ID → Course ID and Course ID → Course Name, then Course Name transitively
depends on Enrollment ID via Course ID. This violates 3NF. To achieve 3NF, the table would
be split into Enrollments (Enrollment ID, Student Name, Course ID) and Courses (Course ID,
Course Name), ensuring that Course Name directly depends only on Course ID.
Transactions adhere to four fundamental properties, collectively known as ACID, which are
essential for maintaining consistency and reliability in a database before and after a
transaction's execution. These properties form the bedrock of transactional reliability in DBMS.
They are not merely desirable features but a contract that the database system offers to ensure
that concurrent operations and system failures do not compromise data integrity.
● Atomicity: This property dictates that all operations within a transaction must either take
place entirely or not at all. There is no concept of partial execution; a transaction is treated
as a single, indivisible unit (an "all-or-nothing" proposition). If a transaction Aborts, none of
its changes are visible in the database. If it Commits, all its changes become permanently
visible.
● Consistency: This property ensures that integrity constraints are maintained. A
transaction must transform the database from one consistent state to another consistent
state. For example, in a money transfer, if the total balance across two accounts is 900
before the transaction, it must remain 900 after the transaction, even if individual account
balances change. Inconsistency would arise if one part of the transfer succeeded while
another failed.
● Isolation: This property mandates that data being used by one transaction during its
execution cannot be accessed or modified by a second transaction until the first one is
completed. This prevents interference between concurrent transactions, ensuring that
each transaction appears to execute in isolation from others. The concurrency control
subsystem of the DBMS is responsible for enforcing isolation.
● Durability: This property guarantees that once a transaction is successfully completed
and committed, the permanent changes made to the database will not be lost, even in the
event of system failures (such as power outages or hardware malfunctions). The recovery
subsystem of the DBMS is responsible for ensuring this durability.
Adherence to ACID properties is paramount for any reliable database system. It ensures that
complex, multi-step operations are treated as indivisible units, preventing data corruption and
providing a consistent view of the data to all users, even in highly concurrent or failure-prone
environments.
During its lifecycle, a transaction progresses through various states, each representing a
different stage in its execution. This structured progression is essential for the DBMS to manage
complex operations, especially in multi-user environments.
● Active State: This is the initial state for every transaction. In this state, the transaction is
actively executing its operations, such as inserting, deleting, or updating records.
However, at this point, none of the changes made by the transaction have been
permanently saved to the database.
● Partially Committed State: A transaction enters this state after it has successfully
executed its final operation. Although all intended operations are complete, the data
changes are still residing in volatile memory and are not yet permanently saved to the
database.
● Committed State: A transaction reaches the committed state if all its operations have
been executed successfully and its changes have been permanently saved on the
database system. Once committed, the effects of the transaction are durable and cannot
be lost.
● Failed State: A transaction enters the failed state if any of the checks performed by the
database recovery system fail. This could occur due to various issues, such as a query
failing to execute or a system constraint being violated.
● Aborted State: If a transaction reaches a failed state, the database recovery system
ensures that the database is returned to its previous consistent state. This involves
aborting or rolling back the transaction, undoing all changes made since its start. After
aborting, the database recovery module will either attempt to restart the transaction or
terminate it permanently.
Understanding transaction states is crucial for debugging database applications and for
designing robust error handling mechanisms. It clarifies how the DBMS ensures data
consistency and recovers from failures by systematically managing the lifecycle of each unit of
work.
5.2 Serializability
Serializability is a fundamental concept in concurrency control, aiming to identify non-serial
schedules that allow transactions to execute concurrently without interfering with one another. A
non-serial schedule is considered serializable if its final result is equivalent to the result of some
serial execution of the same transactions. This concept is central to concurrency control
because while serial execution is simple and correct, it is inefficient in multi-user environments.
Non-serial schedules allow concurrency, but they risk data inconsistency. Serializability provides
the theoretical framework to determine if a concurrent schedule is "correct" (i.e., produces the
same result as some serial execution).
● Serial Schedule: In a serial schedule, transactions are executed one after another,
completely finishing one transaction before starting the next. There is no interleaving of
operations from different transactions.
● Non-serial Schedule: A non-serial schedule allows for the interleaving of operations from
multiple transactions. This means that operations from different transactions can be
executed concurrently, leading to many possible orders in which the system can execute
the individual operations.
A Serialization Graph, also known as a Precedence Graph, is a directed graph used to test
the conflict serializability of a schedule. For a given schedule S, a graph G = (V, E) is
constructed, where V is a set of vertices representing all participating transactions, and E is a
set of directed edges.
An edge Ti → Tj is drawn in the precedence graph if one of the following conflicting conditions
holds:
1. Transaction Ti executes a write(Q) operation before transaction Tj executes a read(Q)
operation on the same data item Q.
2. Transaction Ti executes a read(Q) operation before transaction Tj executes a write(Q)
operation on the same data item Q.
3. Transaction Ti executes a write(Q) operation before transaction Tj executes a write(Q)
operation on the same data item Q.
The serializability condition for a schedule is determined by the presence of cycles in its precedence graph. If the precedence graph for schedule S contains no cycle, then S is conflict serializable. Conversely, if the precedence graph contains a cycle, then S is not conflict serializable.
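This test is mechanical and easy to automate. The following Python sketch builds a precedence graph from a schedule, represented here (an assumption made for illustration) as a list of (transaction, operation, data item) triples, and reports whether the graph contains a cycle.

# Build a precedence graph from a schedule and test it for cycles.
# op is "R" (read) or "W" (write); this schedule encoding is illustrative.
def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            if ti != tj and q_i == q_j and (op_i == "W" or op_j == "W"):
                edges.add((ti, tj))          # conflicting pair: W-R, R-W or W-W
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visited, on_stack = set(), set()
    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False
    return any(dfs(n) for n in graph if n not in visited)

# T1 reads Q before T2 writes it, and T2 writes Q before T1 writes it again,
# so the graph has edges T1 -> T2 and T2 -> T1: a cycle, hence not conflict serializable.
s = [("T1", "R", "Q"), ("T2", "W", "Q"), ("T1", "W", "Q")]
print(has_cycle(precedence_graph(s)))   # True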
To understand the nature and origin of recovery problems, database failures are generally classified into the following categories:
● Transaction Failure: This occurs when a transaction fails to execute or reaches a point
from which it cannot proceed further. Reasons for transaction failure include:
○ Logical Errors: These happen if a transaction cannot complete due to a code error
or an internal error condition.
○ System Errors: These occur when the DBMS itself terminates an active transaction because it cannot continue executing it, for example in cases of deadlock or resource unavailability.
● System Crash: System failures can be caused by power outages or other hardware or
software malfunctions, such as operating system errors. In the context of system crashes,
a "fail-stop assumption" is often made, meaning that non-volatile storage (where data is
permanently stored) is assumed not to be corrupted.
● Disk Failure: This occurs when hard-disk or other storage drives fail; such failures were more common in the early days of the technology. Disk failure can result from issues like bad sectors, disk head crashes, or the disk becoming unreachable, any of which can destroy all or part of the stored data.
The core principle behind recovery is redundancy, achieved through various mechanisms that
ensure a consistent state can be reconstructed even if primary data or ongoing operations are
lost.
● Commit: This operation is used to permanently save the work done by a transaction,
making its changes durable in the database.
● Rollback: This operation is used to undo the work done by a transaction, reverting the
database to a consistent state prior to the transaction's initiation.
● Log-Based Recovery: This technique involves keeping a detailed record of all database
changes in a log file. If a failure occurs, this log file is used to redo completed transactions
(that were committed but not yet written to disk) and undo incomplete ones (that were not
committed). This can operate in two modes: Deferred database modification, where all
logs are written to stable storage and the database is updated only when a transaction
commits; or Immediate database modification, where the database is modified
immediately after every operation, with logs also recorded.
● Checkpointing: This technique involves creating a snapshot of the database at a
particular point in time. During recovery, the system can start from the last checkpoint,
significantly reducing the amount of data that needs to be processed from the log file.
● Shadow Paging: This method maintains two copies of a database page: a shadow page
(the old, consistent version) and a current page (where updates are made). During
updates, changes are made to the current page while the shadow page remains intact,
providing an immediate recovery point to revert to the previous state in case of failure.
● Backup and Restore: This involves creating periodic backups of the entire database. In
the event of a catastrophic failure, these backups can be used to restore the data to a
previous consistent state.
Robust recovery mechanisms are fundamental to the "Durability" aspect of ACID properties.
They ensure business continuity and data trustworthiness by minimizing data loss and
downtime in the face of unexpected system failures, which is paramount for mission-critical
applications.
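To make the redo/undo idea behind log-based recovery concrete, the following highly simplified Python sketch replays a log after a crash. The record format and function name are assumptions made for this sketch; real write-ahead logging additionally involves log sequence numbers, checkpoints, and careful ordering of disk writes, none of which are modelled here.

# Toy illustration of undo/redo recovery from a log. Each log record stores the
# old and new value of a data item; this is not a real WAL implementation.
def recover(log, database):
    committed = {rec["txn"] for rec in log if rec["type"] == "commit"}
    # Redo phase: reapply the changes of committed transactions, in log order.
    for rec in log:
        if rec["type"] == "update" and rec["txn"] in committed:
            database[rec["item"]] = rec["new"]
    # Undo phase: remove the changes of uncommitted transactions, in reverse order.
    for rec in reversed(log):
        if rec["type"] == "update" and rec["txn"] not in committed:
            database[rec["item"]] = rec["old"]
    return database

log = [
    {"type": "update", "txn": "T1", "item": "A", "old": 5000, "new": 6000},
    {"type": "commit", "txn": "T1"},
    {"type": "update", "txn": "T2", "item": "B", "old": 3000, "new": 2500},
    # crash occurs before T2 commits
]
print(recover(log, {"A": 5000, "B": 2500}))   # {'A': 6000, 'B': 3000}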
When multiple transactions execute simultaneously, they can interfere with each other, leading
to various concurrency issues:
● Lost Update Problem: This occurs when two transactions read the same data, modify it
independently, and then write their changes back without proper synchronization. The
update made by the first transaction is effectively overwritten and "lost" by the second
transaction's write. For example, if Transaction T1 reads Balance = 5000, T2 also reads
Balance = 5000. T1 adds 1000 (resulting in 6000), and T2 subtracts 500 (resulting in
4500). If T2 writes its result last, T1's update is lost, and the final balance becomes 4500
instead of the correct 5500.
● Dirty Read Problem: Also known as reading uncommitted data, this problem arises when
a transaction reads data that has been modified by another transaction but has not yet
been committed. If the modifying transaction later rolls back, the data read by the first
transaction becomes invalid. For instance, if T1 updates Balance = 7000 but has not
committed, and T2 reads this 7000. If T1 then rolls back, restoring the balance to 5000,
T2 is left holding an incorrect value.
● Unrepeatable Read Problem: This occurs when a transaction reads the same data item
multiple times but retrieves different values because another committed transaction
modified the data between the reads. For example, T1 reads Balance = 5000. T2 updates
Balance = 6000 and commits. When T1 reads the balance again, it now sees 6000,
leading to inconsistent results within the same transaction.
● Phantom Read Problem: This problem occurs when a transaction executes a query
twice, and the second execution returns a different set of rows satisfying the query
criteria. This happens because another committed transaction inserted or deleted rows
that match the criteria between the two reads. For example, T1 selects all employees with
salary > 50000. T2 inserts a new employee with salary = 60000 and commits. When T1
re-executes the same query, it sees an extra "phantom" row.
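The lost update scenario above can be replayed deterministically. The short Python sketch below interleaves the two transactions exactly as in the example, using ordinary variables to stand in for the database; it is a didactic simulation, not a real concurrency test.

# Deterministic replay of the lost update example: T1 adds 1000, T2 subtracts 500,
# both starting from the same stale read. The variable names are illustrative.
balance = 5000             # shared data item

t1_read = balance          # T1 reads 5000
t2_read = balance          # T2 reads 5000 (before T1 writes)

balance = t1_read + 1000   # T1 writes 6000
balance = t2_read - 500    # T2 writes 4500, overwriting T1's update

print(balance)             # 4500 -- T1's update is lost; the correct result is 5500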
To prevent these concurrency problems, DBMSs employ various concurrency control techniques. Locking is a primary method: a lock is a mechanism that controls whether other transactions may access a data item while it is in use.
● Types of Locks:
○ Shared Lock (S-Lock): This type of lock, also known as a read-only lock, allows
multiple transactions to access the same data item for reading concurrently.
However, no transaction holding an S-Lock can make changes to the data. An
S-Lock is requested using the lock-S instruction.
○ Exclusive Lock (X-Lock): An Exclusive Lock provides a transaction with exclusive
access to a data item, allowing it to both read and modify the data. While an X-Lock
is held, no other transaction can access the same data item (neither read nor write).
An X-Lock is requested using the lock-X instruction.
● Two-Phase Locking Protocol (2PL): This is a widely used concurrency control protocol
that defines clear rules for managing data locks to ensure serializability. It divides a
transaction's execution into two distinct phases:
○ Phase 1 (Growing Phase): In this phase, a transaction can acquire new locks, but
it cannot release any locks it currently holds. It continues to acquire all the locks it
needs to access the required data.
○ Phase 2 (Shrinking Phase): Once a transaction releases its first lock, it enters the shrinking phase. In this phase, the transaction can only release locks; it cannot acquire any new ones.
The 2PL protocol guarantees serializability, meaning that any schedule produced by 2PL will be equivalent to some serial execution of the transactions. However, it is susceptible to deadlocks, where two or more transactions wait indefinitely for each other to release locks.
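The growing and shrinking phases can be made explicit in code. The following sketch enforces only the two-phase rule itself; lock compatibility, blocking, and deadlock handling are deliberately omitted, and all class and method names are hypothetical.

# Minimal sketch of the two-phase locking rule for a single transaction.
class TwoPhaseTransaction:
    def __init__(self):
        self.held = set()
        self.shrinking = False       # becomes True after the first unlock

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire locks in the shrinking phase")
        self.held.add((item, mode))  # growing phase: acquire lock-S / lock-X

    def unlock(self, item, mode):
        self.shrinking = True        # the first release starts the shrinking phase
        self.held.discard((item, mode))

t = TwoPhaseTransaction()
t.lock("A", "X")      # growing phase
t.lock("B", "S")
t.unlock("A", "X")    # shrinking phase begins
try:
    t.lock("C", "S")  # not allowed once shrinking has begun
except RuntimeError as err:
    print(err)        # 2PL violation: cannot acquire locks in the shrinking phase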
Conclusions
The comprehensive examination of Database Management Systems, spanning from
foundational concepts to advanced transaction processing and security, underscores the critical
role of DBMS in modern information technology. The evolution from rudimentary file systems to
sophisticated multi-tier architectures demonstrates a continuous drive to overcome limitations in
data management, particularly regarding redundancy, consistency, scalability, and security. The
multi-level abstraction provided by database schemas (physical, logical, conceptual) is
instrumental in managing complexity, allowing different stakeholders to interact with the
database at appropriate levels of detail while ensuring data independence and system
maintainability.
The relational model, with its structured tables and rigorous integrity constraints, provides a
robust framework for organizing data. Relational algebra serves as the logical foundation for
querying, while SQL, as its practical implementation, offers a powerful and universal language
for data definition, manipulation, and control. The systematic process of normalization, guided
by functional dependencies, is essential for designing efficient and anomaly-free database
schemas, directly impacting data integrity and query performance.
Finally, the intricacies of transaction processing, governed by the ACID properties, are
paramount for ensuring data reliability in concurrent environments. Concurrency control
techniques, such as locking, timestamping, and validation-based protocols, are vital for
balancing simultaneous access with data consistency. Coupled with robust recovery
mechanisms and comprehensive database security measures, these elements collectively
ensure the resilience, trustworthiness, and continuous availability of critical data assets.
In essence, the study of DBMS is not merely about understanding software tools; it is about
grasping the fundamental principles of data organization, integrity, and secure access that
underpin virtually all modern digital systems. Mastering these concepts is crucial for designing,
implementing, and managing reliable and high-performing database solutions in any domain.
Works cited