
Comprehensive Study Guide for

Database Management Systems


(BCA(MAWT)-Semester IV)
Abstract
This study guide provides a comprehensive and detailed examination of core concepts in
Database Management Systems (DBMS), aligned with the BCA (Honours) with Specialization in
MAWT - Semester IV syllabus. Covering Units 1 through 5, the report delves into fundamental
principles, data modeling techniques, structured query language, data normalization, and
transaction processing, including concurrency control and recovery mechanisms. Designed for
exam preparation, it offers in-depth explanations, illustrative examples, and structured insights
to facilitate a thorough understanding of DBMS.

Unit 1: Introduction to Database Management Systems


This unit lays the foundational groundwork for understanding database systems, distinguishing
them from traditional file systems, exploring their architectural components, and introducing the
concept of data independence and various database languages.

1.1 Overview of DBMS


A Database Management System (DBMS) represents a systematic approach to the creation,
retrieval, updating, and overall management of data. It functions as a specialized software
system designed to manage and organize databases efficiently. Prominent examples of DBMS
include MySQL, Oracle, SQL Server, IBM DB2, and cloud-based solutions like Amazon
SimpleDB.
The operation of a DBMS is characterized by several key features. Data within a DBMS is
typically structured and stored in tables, which facilitates organization and access. A primary
characteristic is the significant reduction in data redundancy, meaning that information is stored
once, minimizing duplication. This also contributes to improved data consistency across the
system. Furthermore, DBMS are designed to support multiple users concurrently, allowing
various individuals or applications to access and modify data simultaneously. They incorporate
powerful query languages, such as SQL, for efficient data retrieval and manipulation, and they
offer robust security features to protect data integrity and confidentiality.
The advantages of employing a DBMS are substantial. By minimizing data duplication, DBMS
optimize storage space and streamline data management. The availability of sophisticated
query languages enables easy and efficient retrieval of specific data. Moreover, DBMS
significantly reduce the development time required for applications that interact with data, and
they lower ongoing maintenance needs due to their structured and consistent nature.
Despite these benefits, DBMS also present certain challenges. Their inherent complexity can be
a barrier, requiring specialized knowledge for setup and administration. Licensed DBMS
products, such as Oracle, often come with a comparatively high cost. Additionally, these
systems can be quite large in terms of software footprint and resource requirements.
The pervasive nature of data management in modern systems is evident across numerous
sectors. DBMS are integral to banking operations, managing all transactions. In the airline
industry, they handle reservations and schedules. Universities rely on DBMS for student
registration and grade management. Sales operations utilize them for customer, product, and
purchase records. Manufacturing processes, including production, inventory, orders, and supply
chain management, are heavily dependent on DBMS. Human resources departments use them
for employee records, salaries, and tax deductions. This widespread adoption highlights that
data management is not a niche IT function but a critical, pervasive element across virtually
every sector of contemporary life. The characteristics of DBMS, such as reduced redundancy,
consistency, and multi-user support, directly address the complex problems that arise when data
is not well-managed, establishing DBMS as a critical enabling technology for almost all modern
information systems. This underscores its importance beyond simple data storage, extending to
complex operational and strategic functions. The existence of specialized roles like Database
Administrators (DBAs) and Application Programmers further emphasizes the complexity and
dedicated expertise required for effective database management.
The users of a DBMS typically fall into three main categories: Database Administrators (DBAs),
who are responsible for the overall management and maintenance of the database system;
Application Programmers or Software Developers, who design and implement applications that
interact with the database; and End Users, who utilize the applications to access and
manipulate data for their specific tasks.

1.2 Database System vs. File System


The evolution of Database Management Systems was driven by the inherent limitations of
traditional file systems. A detailed comparative analysis reveals why DBMS became a
necessary advancement for modern data management.
Traditional file systems organize files within a storage medium on a computer, providing a basic
method for data storage. In contrast, a DBMS is a sophisticated software system specifically
designed for managing databases, offering a more structured and controlled environment.
A significant drawback of file systems is the prevalence of data redundancy, where the same
information might be stored in multiple locations, leading to inefficiencies and inconsistencies.
DBMS, by design, aim to eliminate or significantly reduce redundant data, ensuring that
information is stored uniquely. This problem-solution paradigm is fundamental: almost every
deficiency of a file system, such as redundancy, poor consistency, lack of security, and difficult
sharing, is directly addressed as an advantage of DBMS. This is not coincidental; it represents
the historical and technological evolution driven by the need to overcome these specific
limitations.
File systems typically lack an inbuilt mechanism for data backup and recovery, making data
vulnerable to loss in case of system failures. DBMS, however, provide robust in-house tools for
backing up and recovering data, even if it is lost due to unforeseen circumstances.
Query processing in file systems is generally inefficient, requiring manual or custom
programming to retrieve specific data. DBMS offer efficient query processing capabilities,
allowing users to retrieve complex data sets quickly using specialized query languages.
Data consistency is a major challenge in file systems, as redundant data can easily lead to
conflicting information. DBMS ensure higher data consistency through structured storage and
the application of normalization processes.
In terms of complexity, file systems are relatively simpler to manage for basic data storage.
However, DBMS are inherently more complex due to their advanced features for data
management, security, and concurrency control.
Security constraints are often rudimentary in file systems, providing less protection against
unauthorized access or data breaches. DBMS incorporate more sophisticated security
mechanisms, including access controls and encryption, to safeguard sensitive information.
From a cost perspective, file systems are generally less expensive to implement and maintain
for simple applications. Conversely, DBMS, especially commercial licensed versions, can have a
comparatively higher cost due to their advanced functionalities and support infrastructure.
A critical limitation of file systems is the absence of data independence, meaning changes to
data storage or structure often require modifications to applications accessing that data. DBMS,
in contrast, provide both logical and physical data independence, allowing changes at one level
without affecting others.
File systems typically support only single-user access at a time, limiting collaborative work.
DBMS are designed for multi-user access, enabling many users to interact with the database
concurrently.
For file systems, users often have to write explicit procedures for managing data, which can be
time-consuming and error-prone. In a DBMS, many common data management tasks are
handled by the system itself, reducing the need for extensive user-written procedures.
Data sharing is difficult in file systems because data is often distributed across many disparate
files. Due to their centralized nature, DBMS facilitate easy data sharing among authorized users
and applications.
File systems expose details of data storage and representation to the user. DBMS, on the other
hand, offer data abstraction, hiding the internal details of the database from users and
applications, simplifying interaction.
Implementing integrity constraints, which enforce rules to maintain data quality, is challenging in
file systems. DBMS make it easy to define and enforce such constraints, ensuring data
accuracy and validity.
To access data in a file system, users typically require specific attributes like file name and file
location. In a DBMS, such explicit attributes are generally not required for data access, as the
system manages data location and retrieval internally.
Examples of environments using file system principles include programming languages like
Cobol and C++ for data handling, while Oracle and SQL Server are prime examples of DBMS.
This comprehensive comparison serves as a powerful justification for the existence and
widespread adoption of DBMS. It highlights that DBMS is not just an alternative but a necessary
advancement for managing complex, shared, and evolving data environments, leading to
improved data quality, efficiency, and security.
Table 1.1: Comparison of File System and DBMS
Criteria | File System | DBMS
Structure | Arranges files in a storage medium. | Software for managing databases.
Data Redundancy | Can be present. | Absent.
Backup & Recovery | No inbuilt mechanism. | Provides in-house tools.
Query Processing | Inefficient. | Efficient.
Consistency | Less data consistency. | More data consistency (due to normalization).
Complexity | Less complex. | More complex.
Security Constraints | Less security. | More security mechanisms.
Cost | Less expensive. | Comparatively higher cost.
Data Independence | No data independence. | Logical and physical data independence exist.
User Access | Single user at a time. | Multiple users at a time.
User Procedures | Users write procedures for management. | Users not required to write procedures.
Sharing | Difficult (data distributed in many files). | Easy (centralized nature).
Data Abstraction | Gives details of storage and representation. | Hides internal details of the database.
Integrity Constraints | Difficult to implement. | Easy to implement.
Attributes for Access | Requires file name, location. | No such attributes required.
Examples | Cobol, C++. | Oracle, SQL Server.
1.3 Database System Concepts and Architecture
The conceptualization of a database system involves understanding its fundamental
components and how they are structured to facilitate data management. A database schema
serves as the foundational blueprint, meticulously defining the structure, organization, and
constraints of the data stored within a database. It outlines how the database is constructed and
how its data elements are interconnected.

1.3.1 Database Schema and Instances

The core components of a database schema are integral to its design. Tables represent the
basic units of data storage, organized into rows and columns. Columns define the specific type
of data to be held, while rows represent individual entries or records. Fields (Columns) within
these tables are assigned specific data types, such as integers, variable characters (varchar), or
dates, which dictate the kind of data they can hold. These fields can also be associated with
constraints, including primary keys, foreign keys, and unique constraints, to enforce data
integrity. Relationships are established through primary keys, which uniquely identify records
within a table, and foreign keys, which link tables together to ensure referential integrity.
Indexes are crucial for enhancing the speed of data retrieval, allowing for quicker access to
specific data points. Lastly, Views are virtual tables derived from queries across one or more
existing tables, serving to simplify complex queries and enhance data security by restricting
access to specific data subsets.
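To make these schema components concrete, the following minimal sketch shows how tables, typed fields, key constraints, an index, and a view might be declared in SQL. The COURSE and STUDENT tables and their columns are illustrative assumptions, not part of the syllabus text, and the Oracle-style types (NUMBER, VARCHAR2) follow the conventions used elsewhere in this guide.

-- Table with typed fields, a primary key, and a unique constraint
CREATE TABLE COURSE (
    Course_ID   NUMBER        PRIMARY KEY,          -- uniquely identifies each course
    Title       VARCHAR2(50)  NOT NULL,
    Course_Code VARCHAR2(10)  UNIQUE                -- alternate unique identifier
);

-- Table with a foreign key establishing a relationship to COURSE
CREATE TABLE STUDENT (
    Roll_No   NUMBER       PRIMARY KEY,             -- one row per student
    Name      VARCHAR2(30) NOT NULL,
    DOB       DATE,
    Course_ID NUMBER REFERENCES COURSE(Course_ID)   -- foreign key for referential integrity
);

-- Index to speed up retrieval by Name
CREATE INDEX idx_student_name ON STUDENT(Name);

-- View: a virtual table derived from a query, simplifying access and restricting columns
CREATE VIEW STUDENT_COURSE AS
    SELECT s.Roll_No, s.Name, c.Title
    FROM STUDENT s JOIN COURSE c ON s.Course_ID = c.Course_ID;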
Database schemas are categorized into various types, each offering distinct levels of
abstraction and serving different purposes. The Physical Schema describes the lowest level of
data organization, detailing how data is physically stored in the database. This includes
specifications for files, indices, storage devices, and hardware configurations, with a focus on
optimizing storage and retrieval performance. The Logical Schema outlines the logical design
of the database, focusing on the structure without considering physical implementation details. It
specifies tables, fields, data types, relationships, and constraints, defining how data is logically
organized and interconnected. The Conceptual Schema, often referred to as the View
Schema, provides a high-level overview of the entire database structure. It abstracts the logical
schema, presenting an overall picture of entities, relationships, and constraints without delving
into implementation specifics. The existence of physical, logical, and conceptual schemas is a
deliberate design choice to manage the inherent complexity of database systems. Each level
provides a different perspective or abstraction, allowing various stakeholders, such as database
administrators, developers, and end-users, to interact with the database at their appropriate
level of detail without being overwhelmed by unnecessary complexities. For instance, an
end-user does not need to understand physical storage mechanisms to query data; they only
require a conceptual view. This multi-level architecture is crucial for modularity, data
independence, and maintainability in large-scale database systems, enabling changes at one
level (e.g., physical storage optimization) with minimal impact on higher levels (e.g., user
applications), which directly supports the concept of data independence.
In contrast to a static database schema, a Database Instance refers to a specific instantiation
or snapshot of a database system at a particular moment in time. It encompasses the
operational database along with its associated resources, including memory, processes, and
background processes. Unlike the static schema blueprint, a database instance is dynamic and
evolves as data is inserted, updated, or deleted. The primary distinction is that the schema
serves as the static blueprint of the database's structure, while the instance represents the
dynamic, active state containing the actual data values at any given point.
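As a brief sketch of this distinction (the table name and values are assumed for illustration), the CREATE TABLE statement below fixes the schema once, while every subsequent INSERT or DELETE changes only the instance:

-- Schema: the static blueprint, defined once
CREATE TABLE DEPT (Dept_ID NUMBER PRIMARY KEY, Dept_Name VARCHAR2(30));

-- Instance: the set of rows at a given moment, changing with each modification
INSERT INTO DEPT VALUES (10, 'Sales');   -- instance now holds one tuple
INSERT INTO DEPT VALUES (20, 'HR');      -- two tuples; the schema is unchanged
DELETE FROM DEPT WHERE Dept_ID = 10;     -- instance changes again; schema still unchanged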
Table 1.3: Database Schema vs. Database Instance
Aspect | Database Schema | Database Instance
Definition | Blueprint or design of the database structure. | Actual data stored in the database at a given time.
Nature | Static (does not change frequently). | Dynamic (changes with every data modification).
Represents | Structure (tables, columns, data types, relationships). | State of the data in the database.
Example | Table definitions, data types, constraints. | Actual rows of data in the tables.
Change Frequency | Changes infrequently (e.g., during schema design changes). | Changes frequently with transactions.

1.3.2 Types of DBMS Architecture

The architecture of a Database Management System is fundamentally determined by how users connect to the database to fulfill their requests. This design choice significantly impacts the system's scalability, security, and performance.
1-Tier Architecture In a 1-Tier Architecture, the database is directly accessible to the user on
the same system. This means that all components—the client application, the server logic, and
the database itself—reside on a single machine. This setup is straightforward and easy to use,
making it ideal for personal or standalone applications where no external server or network
connection is required. The advantages of this architecture include its simplicity of setup, as
only a single machine is needed for maintenance. It is also cost-effective because it requires no
additional hardware, making it easy to implement for small-scale projects. However, the 1-Tier
Architecture has significant disadvantages. It is limited to a single user, as it is not designed for
multi-user access or collaborative environments. Security is poor, as compromising the single
machine grants easy access to both the data and the application. There is no centralized
control, as data is stored locally, complicating data management and backup across multiple
devices. Consequently, sharing data between users becomes difficult since all information
resides on a single computer.
2-Tier Architecture The 2-Tier Architecture resembles a basic client-server model, where
applications on the client side communicate directly with the database on the server side. This
interaction typically occurs through Application Programming Interfaces (APIs) like ODBC (Open
Database Connectivity) and JDBC (Java Database Connectivity). The server is responsible for
core database functionalities such as query processing and transaction management, while the
client handles user interfaces and application programs, establishing a direct connection with
the server to interact with the DBMS. This architecture offers several advantages, including easy
access to the database, which leads to fast data retrieval. It is also scalable to some extent,
allowing for the addition of more clients or hardware upgrades. Compared to 3-Tier Architecture,
it is generally lower in cost and easier to deploy, and its two-component structure makes it
simple to understand. Despite its benefits, the 2-Tier Architecture faces limitations. Its scalability
is limited, as system performance can degrade significantly when the number of users
increases, potentially overloading the server with too many direct requests. Security can be an
issue due to direct client-to-database connections, which make the system more vulnerable to
attacks or data leaks. The architecture also exhibits tight coupling between the client and server,
often necessitating client application updates if the database schema changes. This can make
maintenance more challenging as the number of users or systems grows.
3-Tier Architecture In a 3-Tier Architecture, an additional intermediate layer, typically an
application server, is introduced between the client and the database server. The client does not
communicate directly with the database; instead, it interacts with this application server, which
then communicates with the database system for query processing and transaction
management. This intermediate layer facilitates the exchange of partially processed data
between the server and the client and is commonly employed in large web applications, such as
e-commerce platforms. The advantages of the 3-Tier Architecture are significant. It offers
enhanced scalability due to the distributed deployment of application servers, eliminating the
need for individual client-server connections. Data integrity is better maintained because the
middle layer acts as a buffer, helping to prevent or remove data corruption. Security is also
enhanced, as this model prevents direct client interaction with the database server, thereby
reducing access to unauthorized data. However, this architecture introduces increased
complexity, making it more challenging to design and manage compared to 2-Tier systems.
Interaction can be more difficult due to the presence of additional middle layers. Response times
may be slower because requests must pass through an extra layer (the application server).
Finally, the higher cost is a notable disadvantage, as setting up and maintaining three separate
layers requires more hardware, software, and skilled personnel. The progression from 1-Tier to
3-Tier architecture demonstrates an evolutionary trend in database system design. Each
subsequent tier addresses a critical limitation of its predecessor, particularly concerning
scalability and security. The 1-Tier is simple but unscalable and insecure for shared data. The
2-Tier improves access but still struggles with large user bases and direct exposure. The 3-Tier
introduces an application layer to abstract complexity, improve security by preventing direct
client-database interaction, and enhance scalability through distributed processing. This
evolution reflects the increasing demands placed on database systems by growing user bases,
complex application logic, and stringent security requirements. The choice of architecture is a
strategic decision balancing simplicity, cost, performance, and robustness, directly impacting the
system's ability to meet business needs.
Table 1.2: Types of DBMS Architecture
Architecture Type | Explanation | Advantages | Disadvantages
1-Tier | Database directly available to the user on the same system (client, server, and database on one machine). | Simple, cost-effective, easy to implement. | Limited to a single user, poor security, no centralized control, hard to share data.
2-Tier | Client applications communicate directly with the database server using APIs (e.g., ODBC/JDBC). | Easy access (fast retrieval), scalable (add clients/upgrade hardware), lower cost than 3-Tier, easier deployment, simple. | Limited scalability (performance degrades with many users), security issues (direct connection), tight coupling, difficult maintenance.
3-Tier | Intermediate application server layer between client and database server; the client interacts with the application server. | Enhanced scalability (distributed application servers), data integrity (middle layer), enhanced security (no direct client-database interaction). | More complex, difficult interaction, slower response time, higher cost.

1.4 Data Independence and Database Languages


The effective management of complex database systems relies heavily on the principle of data
independence and the structured use of database languages.

1.4.1 Logical and Physical Data Independence

Data Independence is a crucial feature in a Database Management System that allows changes to be made to the schema of one layer without affecting the schema of the layer above it. This separation of concerns is fundamental to the flexibility and maintainability of modern DBMS.
Logical Data Independence refers to the ability to modify the logical schema, which represents
the conceptual level of the database, without impacting the external schema or view level. This
means that alterations made at the conceptual level, such as adding or deleting new entities or
attributes, will not necessitate changes in how users view or interact with the data. This type of
independence operates primarily at the user interface level, ensuring that applications and user
queries remain unaffected by changes to the underlying logical structure.
Physical Data Independence is the ability to change the physical schema without altering the
logical schema. For example, if the storage size of the database system server is modified, or if
the database is moved from one physical drive to another (e.g., from C drive to D drive), these
changes will not affect the conceptual structure of the database or the applications that interact
with it. This independence ensures that the conceptual level remains distinct from the internal
physical storage details and operates at the logical interface level.
Data independence (both logical and physical) is not merely a feature; it is a core design
principle that underpins the flexibility and maintainability of modern DBMS. Without it, every
change to storage (physical) or conceptual structure (logical) would necessitate widespread
modifications across applications and user views, leading to brittle, high-maintenance systems.
This separation of concerns is fundamental to agile development and long-term system viability.
Data independence significantly reduces the cost and complexity of database evolution. It
allows developers to optimize physical storage or refine the logical model without forcing
application rewrites, thereby increasing system longevity and adaptability to changing
requirements.
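One common way this plays out in practice is through views: applications read from a view (the external level) rather than the base table (the conceptual level), so certain logical changes do not disturb them. The sketch below is illustrative only; the EMPLOYEE table and its columns are assumptions, and the degree of independence actually achieved depends on the specific change and the DBMS.

-- Conceptual level: base table
CREATE TABLE EMPLOYEE (
    Emp_ID NUMBER PRIMARY KEY,
    Name   VARCHAR2(30),
    Salary NUMBER(10,2)
);

-- External level: the view that applications query
CREATE VIEW EMP_PUBLIC AS
    SELECT Emp_ID, Name FROM EMPLOYEE;   -- exposes only non-sensitive columns

-- A logical change at the conceptual level: add a new attribute
ALTER TABLE EMPLOYEE ADD (Department VARCHAR2(20));

-- Applications built on the view keep working unchanged
SELECT * FROM EMP_PUBLIC;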

1.4.2 Overview of Data Definition Language (DDL) and Data Manipulation Language (DML)

Database Management Systems provide a specialized set of languages for manipulating data,
encompassing operations such as insertion, deletion, updating, and modification. These
database languages are specifically designed to read, update, and store data efficiently within
the database.
Data Definition Language (DDL) is used for describing structures, patterns, and their
relationships within a database. It is the language used to define the database schema,
including tables, indexes, and constraints. DDL commands primarily affect the structure of the
database, not the data itself. A key characteristic of DDL commands is that they are
"auto-committed," meaning that any changes made are permanently saved to the database
immediately upon execution. Common DDL commands include:
●​ CREATE: Used to construct a new table or an entire database. For example, CREATE
TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);.
●​ ALTER: Employed to modify the existing structure of a database. This could involve
adding a new column, such as ALTER TABLE STU_DETAILS ADD(ADDRESS
VARCHAR2(20));, or modifying the characteristics of an existing column, as in ALTER
TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));.
●​ DROP: Used to completely delete both the structure and all records stored within a table.
For instance, DROP TABLE EMPLOYEE;.
●​ TRUNCATE: Removes all rows from a table while preserving its structure and freeing up
the space occupied by the deleted data. An example is TRUNCATE TABLE EMPLOYEE;.
●​ RENAME: Used to change the name of a table.
●​ COMMENT: Allows for adding descriptive comments to the data dictionary, aiding in
documentation.
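Because RENAME and COMMENT are listed above without examples, here is a minimal sketch in Oracle-style syntax; the table name and comment text are illustrative assumptions, and the exact syntax varies by DBMS (MySQL, for instance, uses RENAME TABLE ... TO ...).

-- Rename an existing table
RENAME STU_DETAILS TO STUDENT_DETAILS;

-- Record descriptive comments in the data dictionary
COMMENT ON TABLE STUDENT_DETAILS IS 'Master table of enrolled students';
COMMENT ON COLUMN STUDENT_DETAILS.ADDRESS IS 'Current postal address';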
Data Manipulation Language (DML) is used to manipulate the actual data present within
tables or the database. It enables users to perform operations such as storing new data,
modifying existing data, updating records, and deleting information. Unlike DDL commands,
DML commands are not auto-committed, which means that changes made can be rolled back if
necessary, providing a layer of transactional control. Common DML commands include:
●​ INSERT: Used to add new rows of data into a table. For example, INSERT INTO BCA VALUES ('Anuj', 'DBMS');.
●​ UPDATE: Employed to modify or update the values of one or more columns in a table. An example is UPDATE students SET User_Name = 'Anuj' WHERE Student_Id = '3';.
●​ DELETE: Used to remove one or more rows from a table based on specified conditions. For instance, DELETE FROM BCA WHERE Author = 'Anuj';.
●​ SELECT: Used to retrieve records from a specific table, often combined with a WHERE
clause to filter for particular records.
●​ MERGE: A command that allows for both insert and update operations (often referred to
as UPSERT) in a single statement.
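The MERGE command above is listed without an example, so the following is a minimal Oracle-style sketch; the STUDENT target table, the STUDENT_STAGE source table, and their columns are assumptions made for illustration.

-- Upsert: update matching rows and insert the rest in a single statement
MERGE INTO STUDENT s
USING STUDENT_STAGE st
   ON (s.Roll_No = st.Roll_No)
WHEN MATCHED THEN
    UPDATE SET s.Name = st.Name, s.Phone = st.Phone
WHEN NOT MATCHED THEN
    INSERT (Roll_No, Name, Phone) VALUES (st.Roll_No, st.Name, st.Phone);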
The clear distinction between DDL and DML highlights a fundamental separation of concerns in
database management. DDL focuses on the schema or blueprint, defining the rules and
structure, while DML focuses on the instance or content, manipulating the actual data within that
defined structure. The auto-commit nature of DDL versus the rollback capability of DML further
emphasizes this, as structural changes are typically permanent, while data modifications often
require transactional integrity. This dual linguistic approach provides precise control over
different aspects of the database. It allows database administrators to manage the underlying
structure independently from how application developers and end-users interact with the data,
ensuring both structural integrity and operational flexibility.
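A short sketch of this difference in behaviour, assuming Oracle-style transaction semantics and the BCA table used in the examples above:

-- DML is transactional: changes can be undone until committed
DELETE FROM BCA WHERE Author = 'Anuj';
ROLLBACK;                         -- the deleted rows are restored

DELETE FROM BCA WHERE Author = 'Anuj';
COMMIT;                           -- the deletion is now permanent

-- DDL is auto-committed: the structural change takes effect immediately
TRUNCATE TABLE BCA;               -- cannot be rolled back in Oracle-style systems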
Table 3.1: SQL Command Categories
Category | Purpose | Common Commands | Syntax/Example
DDL (Data Definition Language) | Defines/modifies database structure. Auto-committed. | CREATE, ALTER, DROP, TRUNCATE, RENAME, COMMENT | CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100)); ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20)); DROP TABLE EMPLOYEE; TRUNCATE TABLE EMPLOYEE;
DML (Data Manipulation Language) | Manipulates data within tables. Not auto-committed (can roll back). | INSERT, UPDATE, DELETE, SELECT, MERGE | INSERT INTO BCA VALUES ('Anuj', 'DBMS'); UPDATE students SET User_Name = 'Anuj' WHERE Student_Id = '3'; DELETE FROM BCA WHERE Author = 'Anuj'; SELECT COUNT(PHONE) FROM STUDENT;
DCL (Data Control Language) | Manages user access and permissions. | GRANT, REVOKE | GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER; REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1;
TCL (Transaction Control Language) | Manages database transactions. Used with DML. | COMMIT, ROLLBACK, SAVEPOINT | DELETE FROM CUSTOMERS WHERE AGE = 25; COMMIT; DELETE FROM CUSTOMERS WHERE AGE = 25; ROLLBACK; SAVEPOINT MY_SAVEPOINT;
Unit 2: Data Modeling using the Entity-Relationship Model
This unit introduces the Entity-Relationship (ER) model as a high-level conceptual data model,
detailing its components, types of relationships, key concepts, and its translation into the
relational data model.

2.1 ER Model Concepts


The Entity-Relationship (ER) model is a high-level data model specifically designed to define the
data elements and their relationships within a given system. It serves as a crucial tool for
developing a conceptual design for the database, which is visually represented through an
Entity-Relationship Diagram (ERD).
ER diagrams are widely used in database design for several reasons. They effectively represent
the E-R model in a visual format, making them straightforward to convert into relational tables.
These diagrams are highly valuable for real-world modeling of objects, providing an intuitive
representation of complex systems. A significant advantage is that ER diagrams do not require
technical knowledge of the underlying DBMS, allowing for broader participation in the design
process. Ultimately, they offer a standardized and accessible solution for logically visualizing
data structures.

2.1.1 Entities, Attributes, and Relationships

The fundamental building blocks of an ER model include entities, attributes, and relationships.
An Entity can be any distinct object, class, person, or place about which data needs to be
stored. Examples in an organizational context might include a manager, product, employee, or
department. In ER diagrams, entities are represented as rectangles. An Attribute describes a
property or characteristic of an entity. For instance, for a student entity, attributes could include
an ID, age, contact number, or name. Attributes are graphically represented as ellipses. A
Relationship defines an association or connection between entities. These connections are
depicted using diamonds or rhombuses in ER diagrams.

2.1.2 Weak Entities and Various Attribute Types

Beyond basic entities, the ER model distinguishes between regular and Weak Entities. A weak
entity is one that depends on another entity for its existence and does not possess a key
attribute of its own. It is visually represented by a double rectangle.
Attributes themselves can be further classified based on their characteristics:
●​ A Key Attribute represents the main distinguishing characteristic of an entity, often
serving as its primary identifier. In an ER diagram, it is shown as an ellipse with
underlined text.
●​ A Composite Attribute is an attribute that is composed of several other, simpler
attributes. For example, a "Name" attribute might be composed of "First_name,"
"Middle_name," and "Last_name." It is represented by an ellipse connected to other
ellipses that denote its constituent parts.
●​ A Multivalued Attribute is an attribute that can hold more than one value for a single
entity instance. An example is a student having multiple phone numbers. This type of
attribute is depicted by a double oval.
●​ A Derived Attribute is an attribute whose value can be computed or derived from other
attributes. For instance, a person's "Age" can be derived from their "Date of birth." It is
represented by a dashed ellipse.

2.1.3 Types of Relationships

Relationships between entities are categorized based on their cardinality, which describes how
many instances of one entity can be associated with instances of another.
●​ One-to-One (1:1) Relationship: Occurs when only one instance of an entity is
associated with exactly one instance of another entity. An example is a female marrying
one male.
●​ One-to-Many (1:M) Relationship: Involves one instance of an entity on one side being
associated with multiple instances of an entity on the other side. For example, a scientist
can invent many inventions, but each invention is attributed to a specific scientist.
●​ Many-to-One (M:1) Relationship: The inverse of one-to-many, where multiple instances
of an entity on one side are associated with a single instance on the other. For instance,
many students can enroll for only one course, but a course can have many students.
●​ Many-to-Many (M:N) Relationship: Occurs when multiple instances of an entity on one
side can be associated with multiple instances of an entity on the other. An example is
employees being assigned to many projects, and a project having many employees.

2.1.4 Mapping Constraints: Cardinality Ratios and Participation Constraints

Mapping Cardinality (or Cardinality Ratio) defines the maximum number of relationship
instances in which an entity can participate. For binary relationship types, the possible ratios
include 1:1, 1:N, N:1, and N:M.
Participation or Existence Constraint represents the minimum number of relationship
instances each entity must participate in, also known as the minimum cardinality constraint.
●​ Total Participation: Implies that every entity in the set must be related to another entity
via the relationship. This is also referred to as existence dependency and is represented
by a double line connecting the entity to the relationship in an ER diagram.
●​ Partial Participation: Indicates that not every entity in the set needs to be related
through the relationship. This is shown by a single line connecting the entity to the
relationship.
The ER model's detailed concepts—entities, attributes, relationships, weak entities, and various
attribute types—along with precise mapping constraints like cardinality and participation, are not
just arbitrary symbols. They provide a standardized, high-level language to capture the
semantics of the real world. This allows database designers to communicate complex business
rules and data interdependencies to non-technical stakeholders, ensuring the database
accurately reflects the organizational domain before technical implementation. The ER model
thus serves as a crucial conceptual bridge, translating ambiguous real-world requirements into a
structured, unambiguous blueprint for database design. This conceptual clarity minimizes
misinterpretations during the translation to logical and physical models, ultimately leading to
more accurate and robust database systems.
Table 2.1: ER Diagram Notations
ER Component | Description | Standard Graphical Representation
Entity | Any object, class, person, or place. | Rectangle
Weak Entity | Entity dependent on another, without its own key. | Double Rectangle
Attribute | Property of an entity. | Ellipse
Key Attribute | Main characteristic, primary identifier. | Ellipse with underlined text
Composite Attribute | Attribute composed of other attributes. | Ellipse connected to other ellipses
Multivalued Attribute | Attribute with more than one value. | Double Oval
Derived Attribute | Attribute derived from other attributes. | Dashed Ellipse
Relationship | Association between entities. | Diamond or Rhombus
2.1.5 Keys in DBMS

Keys play a fundamental role in relational databases, serving as essential tools for uniquely
identifying records and establishing meaningful relationships between tables.
●​ Primary Key: This is the most critical key, used to uniquely identify one and only one
instance of an entity within a table. While an entity might possess multiple attributes that
could potentially serve as unique identifiers, the most suitable one is chosen as the
primary key. For example, in an EMPLOYEE table, Employee_ID is an ideal candidate for
a primary key because it is unique for each employee. Other attributes like
License_Number or Passport_Number could also be unique, but Employee_ID might be
chosen for practical reasons.
●​ Candidate Key: A candidate key is an attribute or a set of attributes that can uniquely
identify a tuple (row) in a table. All attributes that are not selected as the primary key but
still possess the ability to uniquely identify a tuple are considered candidate keys. These
keys are as strong as the primary key in their identification capability. Continuing with the
EMPLOYEE table example, if Employee_ID is designated as the primary key, then SSN,
Passport_Number, and License_Number would all be considered candidate keys because
they too can uniquely identify an employee record.
●​ Super Key: A super key is a set of attributes that, when combined, can uniquely identify a
tuple in a table. It is essentially a superset of a candidate key. This implies that a super
key can include additional attributes beyond what is strictly necessary for unique
identification, as long as the combination still guarantees uniqueness. For instance, for
the EMPLOYEE table, EMPLOYEE_ID by itself is a super key. The combination
(EMPLOYEE_ID, EMPLOYEE_NAME) is also a super key, because even if two
employees share the same name, their EMPLOYEE_ID will be distinct, ensuring unique
identification of the tuple.
●​ Foreign Key: Foreign keys are columns in one table that are used to point to the primary
key of another table. Their primary purpose is to establish and identify relationships
between different tables in a relational database. This mechanism is vital for linking
related information across various entities without introducing data redundancy. For
example, in a company, employees work in specific departments. To link the EMPLOYEE
table with the DEPARTMENT table, a Department_ID column in the EMPLOYEE table
would act as a foreign key, referencing the Department_ID (which is the primary key) in
the DEPARTMENT table. This approach ensures that departmental information is not
duplicated within the employee table, maintaining data integrity.
The various types of keys (Primary, Candidate, Super, Foreign) are not just labels; they are the
fundamental mechanisms through which data integrity and relationships are enforced in a
relational database. The primary key ensures uniqueness within a table, preventing duplicate
records. Candidate keys represent alternative unique identifiers. Super keys illustrate that
uniqueness can be achieved with more attributes than strictly necessary. Most importantly,
foreign keys are the glue that binds related tables together, maintaining referential integrity and
allowing meaningful queries across multiple entities. A robust key structure is paramount for a
well-designed database. It ensures data accuracy, prevents inconsistencies, and facilitates
efficient data retrieval and manipulation by clearly defining how data points relate to each other
across the entire database schema.
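As a sketch of how these key types surface in SQL, using the EMPLOYEE and DEPARTMENT example from the text with an assumed column list: the chosen primary key is declared PRIMARY KEY, the remaining candidate keys are typically declared UNIQUE, the super key is a purely conceptual notion with no SQL clause of its own, and the foreign key is declared with REFERENCES.

CREATE TABLE DEPARTMENT (
    Department_ID NUMBER PRIMARY KEY,        -- primary key of DEPARTMENT
    Dept_Name     VARCHAR2(30)
);

CREATE TABLE EMPLOYEE (
    Employee_ID     NUMBER PRIMARY KEY,      -- candidate key chosen as the primary key
    SSN             VARCHAR2(12) UNIQUE,     -- remaining candidate keys kept unique
    Passport_Number VARCHAR2(12) UNIQUE,
    License_Number  VARCHAR2(12) UNIQUE,
    Employee_Name   VARCHAR2(30),
    Department_ID   NUMBER
        REFERENCES DEPARTMENT(Department_ID) -- foreign key linking EMPLOYEE to DEPARTMENT
);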
Table 2.2: Types of Keys in DBMS
Key Type | Definition | Example
Primary Key | Uniquely identifies one and only one instance of an entity; the most suitable key chosen from the candidates. | Employee_ID in an EMPLOYEE table.
Candidate Key | An attribute or set of attributes that can uniquely identify a tuple; all non-primary keys that can uniquely identify a tuple. | If Employee_ID is the primary key, then SSN, Passport_Number, and License_Number in the EMPLOYEE table.
Super Key | A set of attributes that can uniquely identify a tuple; a superset of a candidate key. | EMPLOYEE_ID or (EMPLOYEE_ID, EMPLOYEE_NAME) in the EMPLOYEE table.
Foreign Key | Columns in one table that reference the primary key of another table, establishing relationships. | Department_ID in the EMPLOYEE table referencing Department_ID (PK) in the DEPARTMENT table.
2.1.6 Generalization and Aggregation

Generalization and Aggregation are advanced ER modeling techniques that directly address the
challenge of complexity in larger database designs, providing mechanisms for abstraction.
Generalization is a bottom-up process that involves extracting common properties from a set of
lower-level entities to create a higher-level, more generalized entity. This approach simplifies the
model by identifying commonalities and creating super-entities, reducing redundancy in the
model itself. For example, entities such as Pigeon, House Sparrow, Crow, and Dove can all be
generalized into a single Birds entity. Similarly, STUDENT and FACULTY entities, which share
common attributes like name and address, can be generalized into a PERSON entity. This
process supports both attribute inheritance (where lower-level entities inherit attributes from
higher-level entities, e.g., a Car inheriting a Model attribute from Vehicle) and participation
inheritance (where relationships involving a higher-level entity set are also inherited by
lower-level entities).
Aggregation is an abstraction mechanism used when an ER diagram needs to represent a
relationship between an entity and another relationship. It allows a relationship, along with its
corresponding entities, to be treated as a single higher-level entity set. This is particularly useful
for modeling 'has-a', 'is-a', or 'is-part-of' relationships. The primary purpose of aggregation is to
address scenarios where a direct relationship between an entity and a relationship cannot be
adequately represented in a standard ER diagram. For instance, if an Employee WORKS_FOR
a Project, and this combined WORKS_FOR relationship then REQUIRES Machinery,
aggregation allows the WORKS_FOR relationship (along with EMPLOYEE and PROJECT) to
be treated as a single entity. A new REQUIRE relationship can then be established between this
aggregated entity and the MACHINERY entity.
These abstraction mechanisms are critical for creating clear, concise, and manageable ER
diagrams for complex systems. They improve the readability and maintainability of the
conceptual model, making it easier to translate into an efficient relational schema.
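One common way a generalization hierarchy is carried into tables (anticipating the mapping rules of section 2.2.4) is to give the superclass its own table and let each subclass table share its primary key. The sketch below is illustrative only, using the PERSON/STUDENT/FACULTY example with assumed columns.

-- Superclass holds the common attributes
CREATE TABLE PERSON (
    Person_ID NUMBER PRIMARY KEY,
    Name      VARCHAR2(30),
    Address   VARCHAR2(60)
);

-- Each subclass reuses the superclass key and adds its own attributes
CREATE TABLE STUDENT (
    Person_ID NUMBER PRIMARY KEY REFERENCES PERSON(Person_ID),
    Roll_No   VARCHAR2(10) UNIQUE
);

CREATE TABLE FACULTY (
    Person_ID NUMBER PRIMARY KEY REFERENCES PERSON(Person_ID),
    Salary    NUMBER(10,2)
);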

2.1.7 Extended ER Model (EER) Concepts

The Extended Entity-Relationship (EER) model represents a significant enhancement to the original ER model, designed to support the more complex data model requirements found in modern database systems. It incorporates all the fundamental elements of the ER model while introducing several new constructs.
Key features of the EER model include:
●​ Subclasses and Superclasses: The EER model allows for the creation of a hierarchical
structure where a superclass is a high-level entity that can be further divided into more
specific subsets called subclasses. Subclass entities automatically inherit all attributes
and relationships defined for their superclass. For example, Employee could be a
superclass, with Secretary, Technician, and Engineer as its subclasses. An employee with
emp_no 1001 who is a secretary would inherit general employee attributes like eno,
name, and salary, in addition to the subclass-specific attribute typing_speed.
●​ Specialization: This is a top-down process of defining subclasses from a superclass. It
involves identifying unique attributes and relationships among entities and creating
specialized subtypes based on these distinguishing features.
●​ Generalization: This is the reverse process of specialization, a bottom-up approach
where common attributes and relationships between multiple entities are identified to
create a higher-level superclass.
●​ Inheritance: A core mechanism in the EER model, inheritance allows subtypes
(subclasses) to automatically acquire attributes and relationships defined for their
supertype (superclass). The EER model also supports multiple inheritance, where an
entity can be a subclass of multiple superclasses, combining attributes from all its parent
superclasses. A Teaching Assistant, for instance, could be a subclass of both Employee
and Student.
●​ Constraints on Subclass Relationships: EER models allow for specifying constraints
on how entities relate to their subclasses:
○​ Total or Partial: Determines whether every entity in the superclass must be
associated with at least one entity in a subclass (total participation), or if it's optional
(partial participation).
○​ Overlapped or Disjoint: Specifies whether an entity from a superclass can exist in
multiple subclass sets (overlapped) or can belong to only one subclass set
(disjoint).
●​ Union (Category) Type: This concept represents a collection of objects formed by the
union of objects from different entity types, indicating an "either/or" relationship. For
example, a Library Member could be a union of Faculty, Student, and Staff, meaning a
library member is one of these types. Unlike subclasses, entities in a union type do not
necessarily inherit attributes from a common superclass; their attributes can be
independent.
The development of the Extended ER (EER) model indicates that the basic ER model, while
powerful, was insufficient for representing the intricate relationships and hierarchies found in
more complex, real-world databases (e.g., CAD/CAM, telecommunications). Concepts like
subclasses, superclasses, specialization, generalization, and union types directly address this
gap by providing richer semantic modeling capabilities, particularly for "is-a" relationships and
shared characteristics. EER models allow for a more precise and expressive representation of
complex organizational structures and data relationships. This enhanced modeling capability
leads to database designs that are more accurate, flexible, and capable of supporting advanced
applications, ultimately improving data quality and system utility.

2.2 Relational Data Model and Language


The Relational Data Model, proposed by E.F. Codd, revolutionized database management by
representing data in the form of relations, which are essentially tables. A relational database
stores data in these structured tables.

2.2.1 Relational Data Model Concepts

Several key terminologies define the relational data model:


●​ An Attribute refers to the properties that define a relation, corresponding to the columns
in a table. For instance, in a STUDENT relation, ROLL_NO and NAME are attributes.
●​ A Relation Schema represents the name of the relation along with its attributes. An
example is STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, AGE) for the STUDENT
table. If a schema comprises more than one relation, it is termed a Relational Schema.
●​ A Tuple is each individual row within a relation.
●​ A Relation Instance is the collection of tuples in a relation at a specific point in time. This
instance is dynamic and can change with insertions, deletions, or updates to the
database.
●​ The Degree of a relation is the total number of attributes it contains. For example, a
STUDENT relation with ROLL_NO, NAME, ADDRESS, PHONE, and AGE has a degree
of 5.
●​ Cardinality refers to the number of tuples (rows) present in a relation. A STUDENT
relation with 4 rows has a cardinality of 4.
●​ A Column represents the set of values for a particular attribute.
●​ NULL Values denote data that is unknown or unavailable, typically represented by a
blank space.
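A brief sketch tying these terms to a concrete relation; the data values below are assumptions used purely for illustration.

-- Relation schema STUDENT(ROLL_NO, NAME, ADDRESS, PHONE, AGE): degree = 5 attributes
CREATE TABLE STUDENT (
    ROLL_NO NUMBER PRIMARY KEY,
    NAME    VARCHAR2(30),
    ADDRESS VARCHAR2(60),
    PHONE   VARCHAR2(15),
    AGE     NUMBER
);

-- Inserting four tuples yields a relation instance with cardinality = 4
INSERT INTO STUDENT VALUES (1, 'Ram',    'Delhi',   '9455123451', 18);
INSERT INTO STUDENT VALUES (2, 'Ramesh', 'Gurgaon', '9652431543', 18);
INSERT INTO STUDENT VALUES (3, 'Sujit',  'Rohtak',  '9156253131', 20);
INSERT INTO STUDENT VALUES (4, 'Suresh', 'Delhi',   NULL,         18);  -- NULL: phone unknown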

2.2.2 Integrity Constraints


Integrity constraints are fundamental conditions that must be upheld for the data within a
database. These constraints are rigorously checked before any operations (such as insertion,
deletion, or updation) are performed. If a constraint is violated, the intended operation will fail,
thus ensuring data consistency and validity.
The types of integrity constraints discussed are:
●​ Domain Constraints: These are attribute-level constraints that dictate that an attribute
can only accept values that fall within its predefined domain range. For example, if an
AGE attribute has a constraint AGE > 0, attempting to insert a negative age value would
result in a failure.
●​ Key Integrity: Every relation in the database must possess at least one set of attributes
that uniquely identifies a tuple. This set is known as a key. A key must satisfy two
properties: it must be unique for all tuples, and it cannot contain NULL values. For
instance, ROLL_NO in the STUDENT table functions as a key because no two students
can have the same roll number. This category includes Primary Key, Candidate Key, and
Super Key, as previously discussed in section 2.1.5.
●​ Referential Integrity: This constraint ensures that an attribute in one relation can only
take values that are already present in another attribute of the same or a different relation.
For example, if a STUDENT table includes a BRANCH_CODE attribute, its values must
correspond to existing BRANCH_CODE values in a BRANCH table. The relation that
references another is termed the REFERENCING RELATION (e.g., STUDENT), and the
relation being referred to is the REFERENCED RELATION (e.g., BRANCH).
Integrity constraints (Domain, Key, Referential) are not just theoretical concepts; they are
practical rules implemented to ensure the quality, accuracy, and consistency of data within the
database. Without them, a database could quickly become a repository of invalid or
contradictory information, undermining its utility. For example, referential integrity prevents
"orphan" records by ensuring foreign key values always refer to existing primary key values,
maintaining meaningful relationships between tables. The rigorous enforcement of integrity
constraints is a cornerstone of reliable database systems. It prevents data anomalies at the
point of entry or modification, reducing the need for costly data cleaning and ensuring that the
database remains a trustworthy source of information for applications and analysis.
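A sketch of how each constraint type can be declared, and what a violation looks like, using the STUDENT and BRANCH relations from the text; the column definitions and sample values are assumptions.

CREATE TABLE BRANCH (
    BRANCH_CODE VARCHAR2(5) PRIMARY KEY,    -- key integrity: unique and non-NULL
    BRANCH_NAME VARCHAR2(30)
);

CREATE TABLE STUDENT (
    ROLL_NO     NUMBER PRIMARY KEY,         -- key integrity
    NAME        VARCHAR2(30),
    AGE         NUMBER CHECK (AGE > 0),     -- domain constraint on AGE
    BRANCH_CODE VARCHAR2(5)
        REFERENCES BRANCH(BRANCH_CODE)      -- referential integrity
);

-- Each of the following statements would be rejected by the DBMS:
-- INSERT INTO STUDENT VALUES (5, 'Anuj', -3, 'CS');    -- violates the domain constraint AGE > 0
-- INSERT INTO STUDENT VALUES (5, 'Anuj', 19, 'XX');    -- violates referential integrity (no such branch)
-- INSERT INTO STUDENT VALUES (NULL, 'Anuj', 19, 'CS'); -- violates key integrity (NULL primary key)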

2.2.3 Relational Algebra

Relational Algebra is a procedural query language that operates on relations (tables), taking
relations as input and producing relations as output. It serves as the mathematical and logical
foundation upon which all relational database query languages, such as SQL, are built.
Basic Operators:
●​ Selection (σ): Used to select tuples (rows) from a relation that satisfy a specified
condition. The syntax is σ(Condition)(Relation Name). For example,
σ(AGE>18)(STUDENT) would extract all students older than 18 from the STUDENT
relation.
●​ Projection (∏): Used to select specific columns from a relation. The syntax is ∏(Column
1, Column 2…Column n)(Relation Name). A key characteristic is that it automatically
removes duplicate rows from the result. For instance, ∏(ROLL_NO,NAME)(STUDENT)
would extract only the ROLL_NO and NAME columns from the STUDENT relation.
●​ Cross Product (X): Used to combine two relations by concatenating every row of the first
relation with every row of the second relation. If Relation1 has 'm' tuples and Relation2
has 'n' tuples, their cross product will yield 'm x n' tuples. The syntax is Relation1 X
Relation2.
●​ Union (U): This operator combines tuples from two relations (R1 and R2) that are "union
compatible," meaning they must have the same number of attributes and their
corresponding attributes must have the same domain. The result contains all unique
tuples present in either R1 or R2, with duplicates appearing only once. The syntax is
Relation1 U Relation2.
●​ Minus (-): Also applicable only to union-compatible relations (R1 and R2), the minus
operator R1 - R2 yields a relation containing tuples that are present in R1 but not in R2.
The syntax is Relation1 - Relation2.
Extended Operators: These operators are derived from the basic relational algebra operators
and provide more specialized functionalities.
●​ Join: Used to combine data from two or more tables based on a related column between
them. Joins are crucial for efficient data retrieval in complex queries.
○​ Inner Join: Returns only those rows where there is a match in both tables. If no
match exists, the row is excluded. Types include:
■​ Conditional Join (Theta Join ⋈θ): Joins relations based on any specified
condition (e.g., equality, inequality, greater than).
■​ Equi Join: A special case of conditional join where the join condition is solely
based on equality between attributes.
■​ Natural Join (⋈): Automatically combines tables based on matching column
names and data types, eliminating duplicate columns in the result set without
explicit equality conditions.
○​ Outer Join: Returns all records from one table and the matched records from the
other. If no match is found, NULL values are included for the non-matching
columns. Types include:
■​ Left Outer Join (⟕): Returns all records from the left table and matching
records from the right table. Unmatched rows from the left table have NULLs
for right table columns.
■​ Right Outer Join (⟖): Returns all records from the right table and matching
records from the left table. Unmatched rows from the right table have NULLs
for left table columns.
■​ Full Outer Join (⟗): Returns all records when there is a match in either the
left or right table. If no match, it includes all rows from both tables with NULL
values for the missing side.
●​ Intersection (∩): Returns the common records from two union-compatible relations. It
retrieves rows that appear in both tables, ensuring only matching data is included.
●​ Divide (÷): Used to find records in one relation that are associated with all records in
another relation. This operator is particularly useful for identifying entities that satisfy
conditions across multiple related datasets.
Relational Algebra is presented as a procedural query language, which is crucial because it is
the mathematical and logical foundation upon which all relational database query languages
(like SQL) are built. Understanding these operators (selection, projection, joins, set operations)
provides a deep insight into how data is manipulated and retrieved at a fundamental level,
regardless of the specific SQL syntax used. The distinction between basic and extended
operators shows how complex operations can be built from simpler ones. Mastering relational
algebra provides a powerful conceptual framework for understanding query optimization and
database performance. It allows for thinking about data manipulation in a structured, formal way,
which is invaluable for designing efficient queries and understanding query execution plans.
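Because SQL is built on these operators, each algebra expression has a direct SQL counterpart. The sketch below pairs the unit's examples with roughly equivalent queries; table and column names follow the text, DISTINCT is used to mirror projection's duplicate elimination, and MINUS is Oracle's spelling of the standard EXCEPT operator.

-- Selection: σ(AGE>18)(STUDENT)
SELECT * FROM STUDENT WHERE AGE > 18;

-- Projection: ∏(ROLL_NO, NAME)(STUDENT)
SELECT DISTINCT ROLL_NO, NAME FROM STUDENT;

-- Cross product: STUDENT X STUDENT_SPORTS
SELECT * FROM STUDENT CROSS JOIN STUDENT_SPORTS;

-- Union, minus, and intersection on union-compatible relations
SELECT ROLL_NO FROM STUDENT UNION     SELECT ROLL_NO FROM STUDENT_SPORTS;
SELECT ROLL_NO FROM STUDENT MINUS     SELECT ROLL_NO FROM STUDENT_SPORTS;  -- EXCEPT in standard SQL
SELECT ROLL_NO FROM STUDENT INTERSECT SELECT ROLL_NO FROM STUDENT_SPORTS;

-- Natural join: STUDENT ⋈ STUDENT_SPORTS
SELECT * FROM STUDENT NATURAL JOIN STUDENT_SPORTS;

-- Left outer join: STUDENT ⟕ STUDENT_SPORTS (on ROLL_NO)
SELECT *
FROM STUDENT S LEFT OUTER JOIN STUDENT_SPORTS SP ON S.ROLL_NO = SP.ROLL_NO;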
Table 2.3: Relational Algebra Operators
Operator | Symbol | Purpose | Syntax/Example
Selection | σ | Selects tuples (rows) based on a condition. | σ(AGE>18)(STUDENT)
Projection | ∏ | Selects particular columns; removes duplicates. | ∏(ROLL_NO, NAME)(STUDENT)
Cross Product | X | Joins two relations by concatenating every row of R1 with every row of R2. | STUDENT X STUDENT_SPORTS
Union | U | Combines unique tuples from two union-compatible relations. | STUDENT U EMPLOYEE
Minus | - | Returns tuples in R1 but not in R2 (union-compatible). | STUDENT - EMPLOYEE
Join | ⋈, ⋈θ, ⟕, ⟖, ⟗ | Combines data from two or more tables based on related columns. | R ⋈ S (Natural Join); R ⟕ S (Left Outer Join)
Intersection | ∩ | Returns common unique records from two union-compatible relations. | R ∩ S
Divide | ÷ | Finds records in one relation associated with all records in another. | R ÷ S

2.2.4 Reduction of an ER Diagram to Tables

The process of reduction of an ER diagram to tables is a pivotal step in database design, transforming a high-level ER conceptual model into a more detailed relational schema that can
be implemented in a relational database. This involves systematically breaking down entities,
attributes, and relationships from the ERD into tables (relations), columns (fields), and keys
within the relational schema.
The transformation follows a structured algorithm:
●​ Mapping Entity: For each entity identified in the ER Model, a corresponding table is
created in the relational schema. All attributes of the entity become fields (columns) in this
new table, retaining their respective data types. A primary key is then identified and
declared for the table to uniquely identify each record. For example, if a "Student" entity
has attributes like "StudentID", "Name", and "Major", a "Student" table would be created
with these as fields, and "StudentID" would be declared as the primary key.
●​ Mapping Relationship: A new table is created specifically for each relationship. The
primary keys of all participating entities are added as fields (columns) in this new
relationship table, serving as foreign keys. If the relationship itself possesses any
attributes, these are also added as fields. A primary key for the relationship table is
declared, which is typically a composite key formed by combining the primary keys of all
participating entities. Finally, all necessary foreign key constraints are declared to link the
primary keys from the participating entity tables to their corresponding fields in the
relationship table, ensuring referential integrity. For instance, a "Works_On" relationship
between "Employee" and "Project" entities would lead to a "Works_On" table containing
"EmployeeID" (foreign key from Employee), "ProjectID" (foreign key from Project), and
any relationship-specific attributes like "HoursWorked". The primary key for "Works_On"
would be the composite of "EmployeeID" and "ProjectID".
●​ Mapping Weak Entity Sets: For a weak entity set, which lacks its own primary key and
depends on an identifying entity for its existence, a dedicated table is created. All
attributes of the weak entity set are added as fields. Crucially, the primary key of its
identifying entity set is included as a field in the weak entity set's table, where it functions
as a foreign key. All foreign key constraints are then declared to link the weak entity table
to its identifying entity table. For example, if "Dependent" is a weak entity dependent on
"Employee", the "Dependent" table would include "DependentName", "Relationship", and
"EmployeeID" (as a foreign key referencing the "Employee" table).
●​ Mapping Hierarchical Entities (Specialization/Generalization): For hierarchical
structures representing specialization or generalization relationships
(superclass/subclass), separate tables are created for all higher-level entities
(superclasses) and all lower-level entities (subclasses). The primary keys of the
higher-level entities are added to the tables of their respective lower-level entities, serving
as foreign keys and often as primary keys for the subclass tables. Other specific attributes
unique to the lower-level entities are also included. All primary keys and foreign key
constraints are declared. For example, a "Person" superclass with "Employee" and
"Customer" subclasses would result in a "Person" table (with PersonID as PK), an
"Employee" table (with PersonID as FK/PK, plus EmployeeID, Salary), and a "Customer"
table (with PersonID as FK/PK, plus CustomerID, LoyaltyPoints). Foreign key constraints
would link the PersonID in Employee and Customer tables back to the Person table.
The "reduction of an ER diagram to tables" is a pivotal step in database design. It is where the
high-level, human-centric conceptual model (ERD) is systematically translated into the
structured, machine-implementable logical model (relational schema). This process is
algorithmic, ensuring that the integrity and relationships captured in the ERD are preserved in
the relational tables, forming the basis for actual database creation. The detailed steps for
entities, relationships, weak entities, and hierarchies illustrate the precision required in this
translation. This mapping process is foundational for database implementation. A correct and
systematic reduction ensures that the resulting relational database accurately reflects the
business requirements and maintains data integrity, preventing design flaws that could lead to
performance issues or data inconsistencies later.
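A minimal DDL sketch of this mapping, using the Student, Employee, Project, and Works_On examples discussed above (all column names and data types are illustrative assumptions):

-- Mapping the Student entity: attributes become columns, StudentID becomes the primary key
CREATE TABLE Student (
    StudentID INT PRIMARY KEY,
    Name      VARCHAR(100),
    Major     VARCHAR(50)
);

-- Minimal participating entities for the relationship example
CREATE TABLE Employee (EmployeeID INT PRIMARY KEY, Name VARCHAR(100));
CREATE TABLE Project  (ProjectID  INT PRIMARY KEY, Title VARCHAR(100));

-- Mapping the Works_On relationship: primary keys of both entities plus the relationship attribute
CREATE TABLE Works_On (
    EmployeeID  INT,
    ProjectID   INT,
    HoursWorked DECIMAL(5, 2),
    PRIMARY KEY (EmployeeID, ProjectID),
    FOREIGN KEY (EmployeeID) REFERENCES Employee(EmployeeID),
    FOREIGN KEY (ProjectID)  REFERENCES Project(ProjectID)
);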

Unit 3: Structured Query Language (SQL)


This unit focuses on SQL, the standard language for interacting with relational databases,
covering its characteristics, advantages, various operations for data definition, manipulation,
control, and transaction management.
3.1 Characteristics and Advantages of SQL
SQL, or Structured Query Language, is the standard language for managing data in relational
database management systems (RDBMS). It empowers users to create, read, update, and
delete relational databases and their constituent tables.
The characteristics of SQL make it a powerful and widely adopted language: it is easy to
learn, allowing rapid adoption by new users. SQL is specifically designed to access data from
RDBMS, enabling efficient data retrieval and manipulation. It can execute queries against the
database, retrieve specific information, and perform complex data operations. SQL is used to
describe the data, defining its structure and properties. It also serves to define the data within
the database and manipulate it as needed. Furthermore, SQL facilitates the creation and
deletion of databases and tables, and it supports the creation of views, stored procedures, and
functions, which enhance database functionality and security. SQL also allows administrators to
set granular permissions on tables, procedures, and views, controlling user access.
The advantages of SQL are numerous and contribute to its widespread use:
●​ High speed: SQL queries are designed to retrieve large amounts of records quickly and
efficiently from a database.
●​ No extensive coding needed: Managing the database system with standard SQL
requires a relatively small amount of code compared to other programming paradigms,
simplifying database administration.
●​ Well-defined standards: SQL adheres to long-established standards set by
organizations like ISO and ANSI, ensuring consistency and interoperability across
different RDBMS platforms.
●​ Portability: SQL can be used across various computing environments, including laptops,
personal computers, servers, and even some mobile phones, making it highly versatile.
●​ Interactive language: SQL serves as a domain-specific language for communicating with
the database, enabling users to pose complex questions and receive rapid answers.
●​ Multiple data view: SQL allows users to create different logical views of the database
structure, tailoring data presentation to specific needs without altering the underlying
physical storage.
The listed characteristics and advantages of SQL highlight its role as the de facto standard for
relational database interaction. Its "easy to learn" and "English-like" nature democratizes
database access, while "well-defined standards" and "portability" ensure interoperability across
different RDBMS products. This standardization is a key factor in the widespread adoption and
success of relational databases. SQL's universality makes it an indispensable skill for anyone
working with data. Its design facilitates both simple data retrieval and complex data
manipulation, enabling a broad range of applications from basic reporting to sophisticated data
analytics.

3.2 SQL Data Types and Literals


SQL data types and literals are fundamental concepts that define how data is represented and
stored within a database.
SQL Data Types specify the kind of data that can be stored in a column, ensuring data integrity
and efficient storage.
●​ Numeric Data Types: Used for storing numerical values. Examples include INT for whole
numbers, DECIMAL for exact decimal numbers (often used for financial calculations to
avoid rounding errors), and FLOAT or DOUBLE for approximate floating-point numbers.
●​ String Data Types: Used for storing textual data. CHAR stores fixed-length strings,
meaning it will always occupy a specified amount of storage regardless of the actual
string length. VARCHAR stores variable-length strings, which are more memory-efficient
as they only occupy space for the actual string plus a small overhead.
●​ Date and Time Data Types: Essential for storing temporal information. DATE stores year,
month, and day in a YYYY-MM-DD format. TIME stores hour, minute, and second in an
HH:MM:SS format. DATETIME combines both date and time components. TIMESTAMP is
an extension often storing year, month, day, hour, minute, second, and even fractional
seconds, and can also track changes over time.
SQL Literals are values directly expressed in a SQL command. They represent fixed values,
such as numbers or strings, that are provided directly within the SQL code rather than being
referenced from a database table or calculated at runtime.
●​ String Literals: Character sequences enclosed in single quotes. Examples include 'hello'
or 'Name'.
●​ Numeric Literals: Represent numerical values. These can be integers (e.g., 123, 55533),
floating-point numbers containing a decimal point (e.g., 3.14, 7777.333), or values
expressed in scientific notation (e.g., 1.5e3 for 1500).
●​ Date and Time Literals: Character representations of datetime values, also enclosed in
single quotes, typically following a specific format like 'YYYY-MM-DD' for dates or
'HH:MM:SS' for times.
●​ Boolean Literals: Represent logical truth values, typically TRUE or FALSE.
Understanding data types and literals is essential for writing correct and efficient SQL queries
and for designing robust database schemas. Data types define the nature of information a
column can hold, which is fundamental for data integrity and efficient storage. Literals are the
concrete values used in queries, directly representing data. This combination ensures that data
is correctly interpreted and manipulated by the DBMS. Without proper data type assignment,
numerical operations might fail, or dates might be misinterpreted. Incorrect data type choices
can lead to storage inefficiencies, data corruption, and erroneous query results, impacting the
overall reliability of the database.
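The following sketch shows data types and literals working together; the Payment table and its columns are assumptions made purely for illustration:

CREATE TABLE Payment (
    payment_id INT,                  -- whole number
    payer_name VARCHAR(100),         -- variable-length string
    amount     DECIMAL(10, 2),       -- exact decimal, suitable for currency
    paid_on    DATE                  -- date value
);

INSERT INTO Payment
VALUES (101, 'Anuj', 1500.50, '2024-03-15');   -- numeric, string, numeric, and date literals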
Table 3.2: Common SQL Data Types
| Data Type Category | Specific Data Types | Description/Example of Usage |
|---|---|---|
| Numeric | INT (Integer) | Stores whole numbers (e.g., employee_id, quantity). |
| Numeric | DECIMAL(P, S) | Stores exact decimal numbers (e.g., amount DECIMAL(10, 2) for currency). |
| Numeric | FLOAT, DOUBLE | Stores approximate floating-point numbers (e.g., fee FLOAT). |
| String | CHAR(N) | Stores fixed-length strings (e.g., gender CHAR(1)). |
| String | VARCHAR(N) | Stores variable-length strings (e.g., name VARCHAR(255)). |
| Date and Time | DATE | Stores date values (YYYY-MM-DD). |
| Date and Time | TIME | Stores time values (HH:MM:SS). |
| Date and Time | DATETIME | Stores both date and time values. |
| Date and Time | TIMESTAMP | Stores date and time, often with fractional seconds, and can track changes. |

3.3 SQL Operations


SQL operations comprise a set of commands used to interact with and manage data in a
relational database management system. These operations empower users to define the
structure of databases and tables, manipulate the data within them, control access permissions,
and manage the integrity of transactions.

3.3.1 DDL Commands

Data Definition Language (DDL) commands are used to alter the structure of database objects
like tables. These commands are characterized by their auto-committed nature, meaning that
changes are permanently saved to the database immediately upon execution.
●​ CREATE: This command is used to construct new tables or entire databases. For
example, to create an EMPLOYEE table with specific columns for name, email, and date
of birth, the syntax would be CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email
VARCHAR2(100), DOB DATE);.
●​ ALTER: The ALTER command modifies the structure of an existing database object. This
could involve adding a new column to a table, such as ALTER TABLE STU_DETAILS
ADD(ADDRESS VARCHAR2(20));, or changing the characteristics of an existing column,
as in ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));.
●​ DROP: This command is used to delete both the structure and all records stored within a
table. For instance, DROP TABLE EMPLOYEE; would remove the entire EMPLOYEE
table from the database.
●​ TRUNCATE: The TRUNCATE command is used to delete all rows from a table while
preserving its structure. It also frees up the space previously occupied by the deleted
data. An example is TRUNCATE TABLE EMPLOYEE;.

3.3.2 DML Commands

Data Manipulation Language (DML) commands are employed to modify the data residing within
the database. Unlike DDL commands, DML operations are not auto-committed, allowing
changes to be rolled back if necessary, which is crucial for transactional integrity.
●​ INSERT: This statement adds new rows of data into a table. For example, INSERT INTO
BCA VALUES ('Anuj', 'DBMS'); would insert a new record into the BCA table.
●​ UPDATE: The UPDATE command is used to modify existing values in one or more
columns of a table. A conditional WHERE clause is typically used to specify which rows to
update. For instance, UPDATE students SET User_Name = 'Anuj' WHERE Student_Id =
'3'; changes the User_Name for a specific student.
●​ DELETE: This command removes one or more rows from a table. Similar to UPDATE, a
WHERE clause can be used to specify which rows to delete. An example is DELETE
FROM BCA WHERE Author = 'Anuj';.
●​ SELECT: Although often categorized separately as DQL (Data Query Language),
SELECT is a fundamental DML operation used to retrieve records from a table. It is
frequently combined with a WHERE clause to filter for particular records, as seen in
various examples throughout the document.

3.3.3 Subqueries

Subqueries, also known as inner queries or nested queries, are SQL queries embedded within
another SQL query. The inner query executes first, and its result is then used by the outer query.
This capability allows for complex data retrieval and the implementation of sophisticated
business logic directly within the database.
Subqueries play several important roles, including filtering records based on data from related
tables, aggregating data and performing calculations dynamically, cross-referencing data
between tables to retrieve specific insights, and conditionally selecting rows without requiring
explicit joins or external code logic.
There are several types of subqueries, each suited for different scenarios:
●​ Scalar Subqueries: These return a single value (one row and one column). They are
frequently used where a single value is expected, such as in calculations, comparisons, or
assignments within SELECT or WHERE clauses. For example, SELECT
employee_name, salary FROM employees WHERE salary > (SELECT AVG(salary)
FROM employees); retrieves employees whose salary is greater than the overall average
salary.
●​ Column Subqueries: These return a single column but multiple rows. They are often
used with operators like IN or ANY, where the outer query compares values from multiple
rows. An example is SELECT employee_name FROM employees WHERE department_id
IN (SELECT department_id FROM departments WHERE location = 'New York'); which
filters employees based on departments located in a specific city.
●​ Row Subqueries: These return a single row containing multiple columns. They are
typically used with comparison operators that can compare an entire row of data, such as
= or IN, when multiple values are expected. For instance, SELECT employee_name
FROM employees WHERE (department_id, job_title) = (SELECT department_id, job_title
FROM managers WHERE manager_id = 1); finds employees with matching department
and job titles to a specific manager.
●​ Table Subqueries (Derived Tables): These return a complete table of multiple rows and
columns. They are commonly used in the FROM clause as a temporary table within a
query. An example is SELECT dept_avg.department_id, dept_avg.avg_salary FROM
(SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY
department_id) AS dept_avg WHERE dept_avg.avg_salary > 50000; which uses a
derived table of average salaries per department to filter departments above a certain
threshold.
●​ Correlated Subqueries: These refer to columns from the outer query in their WHERE
clause and are re-executed once for each row processed by the outer query. This means
the subquery depends on the outer query for its values.
●​ Non-Correlated Subqueries: These do not refer to the outer query and can be executed
independently. Their result is calculated once and then used by the outer query.
The introduction of subqueries signifies SQL's capability to handle increasingly complex data
retrieval and business logic. Instead of just fetching data directly, subqueries allow for multi-step
computations and filtering, where one query's result informs another. This enables dynamic data
aggregation, cross-referencing, and conditional selection that would be cumbersome or
impossible with simple queries. The different types (scalar, column, row, table) show the
versatility in how these intermediate results can be used. Subqueries are a powerful feature for
advanced SQL users, enabling them to construct sophisticated queries that mimic complex
application logic directly within the database. This reduces the need for external
application-level processing, potentially improving performance and simplifying application code.
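As a sketch of the correlated form described above (the employees table and its columns are assumed), the inner query below is re-evaluated for each outer row because it references e1.department_id:

SELECT e1.employee_name, e1.salary
FROM employees e1
WHERE e1.salary > (SELECT AVG(e2.salary)              -- recomputed for every outer row
                   FROM employees e2
                   WHERE e2.department_id = e1.department_id);
-- Lists employees earning more than the average salary of their own department.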

3.3.4 Aggregate Functions and GROUP BY

Aggregate functions perform mathematical operations on a set of data values within a relation
and return a single summary result. These functions are crucial for data analysis and reporting,
transforming raw data into meaningful insights.
●​ COUNT: This function counts the number of rows in a relation that satisfy a specified
condition. For example, SELECT COUNT (PHONE) FROM STUDENT; would count the
number of students with a phone number.
●​ SUM: The SUM function adds up the values of a specific numeric attribute in a relation.
An example is SELECT SUM(AGE) FROM STUDENT; which calculates the total sum of
ages for all students.
●​ AVERAGE (AVG): This function calculates the average value of tuples for a given
attribute. It can be expressed as AVG(attributename) or
SUM(attributename)/COUNT(attributename).
●​ MAXIMUM (MAX): The MAX function extracts the highest value among a set of tuples for
a specified attribute. Its syntax is MAX(attributename).
●​ MINIMUM (MIN): Conversely, the MIN function extracts the lowest value among a set of
tuples for a specified attribute. Its syntax is MIN(attributename).
The GROUP BY clause is used in conjunction with aggregate functions to group the tuples of a
relation based on one or more attributes. The aggregation function is then computed for each
distinct group. For example, SELECT ADDRESS, SUM(AGE) FROM STUDENT GROUP BY
(ADDRESS); would calculate the sum of ages for students residing in each distinct address.
The output would show aggregated sums per address, such as DELHI 36, GURGAON 18, and
ROHTAK 20.
Aggregate functions combined with GROUP BY transform raw data into meaningful summaries.
This is a critical capability for business intelligence and reporting. Instead of just listing individual
records, these operations allow users to derive insights like total sales per region, average
employee salary per department, or the number of students in each course. This moves beyond
simple data retrieval to data analysis. The ability to aggregate and group data directly within
SQL empowers analysts and decision-makers to gain high-level insights from large datasets
efficiently. This reduces the need for external tools for basic statistical analysis, making SQL a
powerful tool for data-driven decision-making.
Table 3.3: SQL Aggregate Functions
| Function Name | Purpose | Example SQL Query |
|---|---|---|
| COUNT | Counts the number of rows (or non-NULL values in a column). | SELECT COUNT(PHONE) FROM STUDENT; |
| SUM | Calculates the sum of values in a numeric column. | SELECT SUM(AGE) FROM STUDENT; |
| AVG (Average) | Calculates the average value of a numeric column. | SELECT AVG(AGE) FROM STUDENT; |
| MAX (Maximum) | Finds the maximum value in a column. | SELECT MAX(AGE) FROM STUDENT; |
| MIN (Minimum) | Finds the minimum value in a column. | SELECT MIN(AGE) FROM STUDENT; |
3.3.5 Joins

Joins are fundamental SQL operations used to combine data from two or more tables based on
related columns between them. They are essential for reconstructing information that is logically
distributed across multiple normalized tables.
●​ Inner Join: This type of join returns only those rows where there is a match in both tables
based on the join condition. If a row in one table does not have a corresponding match in
the other, it is excluded from the result.
○​ Conditional Join (Theta Join ⋈θ): A general form of join that combines relations
based on any specified condition, which can include comparison operators like >, <,
=, >=, <=, or !=.
○​ Equi Join: A specific type of conditional join where the join condition is solely
based on an equality (=) between attributes from the two tables. In the result, only
one of the equal attributes is typically displayed.
○​ Natural Join (⋈): This join automatically combines two tables based on matching
column names and data types. It implicitly applies an equality condition on all
identically named columns and eliminates duplicate columns from the result,
providing a seamless combined dataset.
●​ Outer Join: Unlike inner joins, outer joins return all records from one table and the
matched records from the other. If no match is found for a row in one table, the result will
include NULL values for the columns of the non-matching table.
○​ Left Outer Join (⟕): Returns all records from the "left" table (the first table specified
in the FROM clause) and only the matching records from the "right" table. If there is
no match in the right table, the columns from the right table will contain NULL
values for that row.
○​ Right Outer Join (⟖): Returns all records from the "right" table and only the
matching records from the "left" table. Unmatched rows from the right table will
have NULL values for the left table's columns.
○​ Full Outer Join (⟗): Returns all records when there is a match in either the left or
right table. If no match is found in either table, it includes all rows from both tables,
with NULL values for the missing side.
●​ Cross Join: This operation produces the Cartesian product of two tables. It combines
every row from the first table with every row from the second table, resulting in m x n
tuples, where m and n are the number of rows in each table, respectively.
Joins are fundamental to the relational model's power. They enable the reconstruction of
information that is logically distributed across multiple tables (due to normalization). Without
joins, normalized databases would be fragmented, making it impossible to answer queries that
require data from related entities. The different types of joins (inner, outer) provide flexibility in
how data is combined, allowing for precise control over what data is included or excluded based
on matching criteria. Joins are essential for data integration and comprehensive querying in
relational databases. They allow complex relationships between entities to be leveraged,
enabling the retrieval of a holistic view of the data that is critical for business operations and
analysis.
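A brief sketch of the corresponding SQL syntax, assuming employees and departments tables related by department_id:

-- Inner join: only employees whose department_id has a match in departments
SELECT e.employee_name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id;

-- Left outer join: every employee, with NULL department_name where there is no match
SELECT e.employee_name, d.department_name
FROM employees e
LEFT OUTER JOIN departments d ON e.department_id = d.department_id;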
Table 3.4: SQL Join Types
| Join Type | Description |
|---|---|
| Inner Join | Returns rows only when there is a match in both tables based on the join condition. |
| Left Outer Join | Returns all rows from the left table, and the matched rows from the right table. Unmatched left rows have NULLs for right columns. |
| Right Outer Join | Returns all rows from the right table, and the matched rows from the left table. Unmatched right rows have NULLs for left columns. |
| Full Outer Join | Returns all rows from both tables when there is a match in either. Unmatched rows from either side have NULLs for the missing columns. |
| Cross Join | Produces the Cartesian product of two tables, combining every row from the first with every row from the second. |

3.3.6 Set Operations

SQL provides set operations that allow for combining the results of two or more SELECT
statements. These operations require the participating queries to be "union compatible,"
meaning they must have the same number of columns, and corresponding columns must have
compatible data types.
●​ Union (U): This operation combines the result sets of two or more SELECT statements
and returns all unique rows from both queries. Duplicate rows are eliminated from the final
result. For example, STUDENT U EMPLOYEE would combine unique rows from both
student and employee tables, assuming they are union compatible.
●​ Intersection (∩): The INTERSECT operator returns only the distinct rows that are
present in both result sets of two SELECT statements. For instance, SELECT manager_id
FROM departments INTERSECT SELECT employee_id FROM employees; would list IDs
that are present in both the manager_id column of the departments table and the
employee_id column of the employees table, effectively showing managers who are also
employees.
●​ Minus (EXCEPT): The MINUS (or EXCEPT in some SQL dialects like SQL Server)
operator returns distinct rows from the first SELECT statement that are not present in the
second SELECT statement. For example, STUDENT - EMPLOYEE would return records
of individuals who are students but not employees. Similarly, SELECT employee_id
FROM employees EXCEPT SELECT manager_id FROM departments; would list
employees who are not managers.
Set operations are powerful tools for comparative analysis between different datasets, provided
they are "union compatible." They allow users to find combined results, common elements, or
distinct differences between two query results, which is invaluable for identifying overlaps, gaps,
or unique populations within the data. These operations extend SQL's analytical capabilities,
enabling users to perform sophisticated comparisons directly within the database. This is
particularly useful for tasks like identifying shared customer bases, finding employees who are
not also managers, or combining sales data from different regions.
Table 3.5: SQL Set Operations
| Operation | Description | Example (Conceptual) |
|---|---|---|
| Union | Combines unique rows from two or more queries. | SELECT Name FROM Students UNION SELECT Name FROM Employees; (Lists all unique names from both tables) |
| Intersection | Returns common unique rows found in both queries. | SELECT ProductID FROM Sales2023 INTERSECT SELECT ProductID FROM Sales2024; (Lists products sold in both years) |
| Minus (EXCEPT) | Returns unique rows from the first query that are not present in the second. | SELECT EmployeeID FROM FullTimeStaff EXCEPT SELECT EmployeeID FROM PartTimeStaff; (Lists full-time staff not also part-time) |

3.3.7 Data Control Language (DCL)

Data Control Language (DCL) commands are specifically designed to manage user access
privileges and control authority within a database. They determine who can perform what
actions on which database objects.
●​ GRANT: This command is used to bestow specific user access privileges to a database
or its objects. For instance, GRANT SELECT, UPDATE ON MY_TABLE TO
SOME_USER, ANOTHER_USER; would allow SOME_USER and ANOTHER_USER to
read and modify data in MY_TABLE.
●​ REVOKE: Conversely, the REVOKE command is used to withdraw permissions that were
previously granted to users. An example is REVOKE SELECT, UPDATE ON MY_TABLE
FROM USER1, USER2;, which would remove the SELECT and UPDATE privileges from
USER1 and USER2 on MY_TABLE.
DCL commands are the primary mechanisms for implementing access control and security
within the database. They define who can do what with the data and schema. This is not just
about preventing unauthorized access but also about implementing the principle of least
privilege, ensuring users only have the necessary permissions for their roles. This directly ties
into the broader concept of database security. Proper use of DCL is critical for maintaining the
confidentiality, integrity, and availability of sensitive database information. It allows
administrators to enforce granular security policies, protecting data from unauthorized
modifications or disclosures.

3.3.8 Transaction Control Language (TCL)

Transaction Control Language (TCL) commands are used in conjunction with DML commands
to manage transactions. These operations are distinct in that they are not automatically
committed, providing explicit control over the permanence of data modifications.
●​ COMMIT: The COMMIT command is used to permanently save all transactions to the
database. Once a transaction is committed, its changes become a permanent part of the
database and cannot be undone by a simple rollback. For example, after deleting
customers aged 25, DELETE FROM CUSTOMERS WHERE AGE = 25; COMMIT; would
make those deletions permanent.
●​ ROLLBACK: The ROLLBACK command is used to undo transactions that have not yet
been permanently saved to the database. This is crucial for reverting the database to its
state before a series of operations if an error occurs or the transaction is not completed
successfully. For instance, DELETE FROM CUSTOMERS WHERE AGE = 25;
ROLLBACK; would undo the deletion if it had not yet been committed.
●​ SAVEPOINT: A SAVEPOINT allows for rolling back a transaction to a specific designated
point without having to roll back the entire transaction. This provides finer-grained control
over transaction management, enabling partial rollbacks. The syntax is SAVEPOINT
SAVEPOINT_NAME;.
TCL commands are directly linked to the ACID properties of transactions. They provide the
explicit control necessary to ensure Atomicity (all or nothing) and Durability (permanent
changes) for DML operations. The ability to ROLLBACK is crucial for maintaining consistency in
the face of errors or incomplete operations, preventing partial updates from corrupting the
database state. TCL is essential for managing the reliability of data modifications in a multi-user
environment. It allows developers to define logical units of work that are either fully completed or
completely undone, guaranteeing the integrity of the database even during complex operations
or system failures.
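A minimal sketch of these commands working together on the CUSTOMERS table used above (the exact ROLLBACK TO syntax can vary slightly between DBMS products):

UPDATE CUSTOMERS SET AGE = 26 WHERE AGE = 25;
SAVEPOINT AFTER_UPDATE;                  -- marks a point inside the open transaction

DELETE FROM CUSTOMERS WHERE AGE = 26;
ROLLBACK TO AFTER_UPDATE;                -- undoes only the DELETE; the UPDATE survives

COMMIT;                                  -- permanently saves the remaining changes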

Unit 4: Data Normalization


This unit delves into functional dependencies, which are the theoretical basis for data
normalization. It explains the purpose and advantages of normalization, identifies data
anomalies, and details the various normal forms up to BCNF, providing criteria and examples for
each.

4.1 Functional Dependencies


A functional dependency (FD) is a relationship that exists between two attributes or sets of
attributes within a database table. It specifies that the value of one attribute (or set of attributes),
known as the determinant (X), uniquely determines the value of another attribute (or set of
attributes), known as the dependent (Y). This is commonly expressed as X → Y. For example,
in an employee table, Emp_Id → Emp_Name signifies that knowing an employee's ID uniquely
determines their name.
Functional dependencies are categorized into different types:
●​ Trivial Functional Dependency: An FD A → B is considered trivial if B is a subset of A.
This means that the dependency is self-evident and does not provide new information
about the relationships between attributes. For instance, {Employee_id, Employee_Name}
→ Employee_Id is a trivial functional dependency because Employee_Id is a subset of
{Employee_id, Employee_Name}.
●​ Non-trivial Functional Dependency: An FD A → B is non-trivial if B is not a subset of A.
These dependencies reveal meaningful relationships between distinct attributes. An
example is ID → Name, where Name is not a subset of ID.
4.1.2 Inference Rules for Functional Dependencies (Armstrong's Axioms)

To systematically derive new functional dependencies from a given set of existing ones, a set of
inference rules, often referred to as Armstrong's Axioms, are employed. These rules form the
formal grammar that defines valid relationships between attributes in a relational database.
They are not just descriptive; they are prescriptive rules that dictate how data must behave to
ensure consistency. The inference rules allow designers to logically deduce new dependencies
from a given set, which is critical for identifying potential problems and designing optimal
schemas.
The six primary inference rules are:
●​ Reflexivity Rule (IR1): If Y is a subset of X, then X determines Y (X → Y). This rule
states that a set of attributes always determines any of its subsets. For example, if a
database contains StudentID and Name attributes, then {StudentID,Name} → {StudentID}
is a valid dependency, as StudentID is a subset of the combined set.
●​ Augmentation Rule (IR2): If X → Y, then XZ → YZ. This rule indicates that adding any
set of attributes (Z) to both sides of an existing functional dependency does not invalidate
the dependency. For instance, if StudentID → Name, then adding Address to both sides
implies that {StudentID, Address} → {Name, Address} is also true.
●​ Transitive Rule (IR3): If X → Y and Y → Z, then X → Z. This rule allows for the chaining
of dependencies, meaning if X determines Y, and Y in turn determines Z, then X must also
determine Z. An example is if StudentID → Name and Name → Address, then StudentID
→ Address can be inferred.
●​ Union Rule (IR4) / Additivity: If X → Y and X → Z, then X → YZ. This rule states that if a
set of attributes X determines two other sets of attributes Y and Z separately, then X also
determines the combination of Y and Z. For example, if EmployeeID → EmployeeName
and EmployeeID → Department, then EmployeeID → EmployeeName, Department can
be derived.
●​ Decomposition Rule (IR5) / Projectivity: If X → YZ, then X → Y and X → Z. This rule is
the inverse of the Union Rule, allowing a combined dependency to be broken down into
individual dependencies. For instance, if ProductID → ProductName, ProductPrice, then it
can be decomposed into ProductID → ProductName and ProductID → ProductPrice.
●​ Pseudo-transitive Rule (IR6): If X → Y and YZ → W, then XZ → W. This rule is a
variation of transitivity, where an additional set of attributes (Z) is involved in the
intermediate step. An example is if CourseID → Department and {Department, Semester}
→ Instructor, then {CourseID, Semester} → Instructor can be inferred.
A deep understanding of functional dependencies is essential for effective database design and
normalization. They provide the theoretical foundation for identifying redundancy and
anomalies, guiding the decomposition of tables into well-structured forms that maintain data
integrity and improve query performance.
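A short worked derivation under an assumed set of dependencies illustrates how the rules combine:

Given:   F = { Emp_Id → Dept,  Dept → Dept_Location,  Emp_Id → Emp_Name }
Step 1:  Emp_Id → Dept and Dept → Dept_Location  give  Emp_Id → Dept_Location   (Transitivity, IR3)
Step 2:  Emp_Id → Emp_Name and Emp_Id → Dept_Location  give  Emp_Id → {Emp_Name, Dept_Location}   (Union, IR4)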
Table 4.1: Functional Dependency Inference Rules (Armstrong's Axioms)
| Rule Name | Syntax | Example |
|---|---|---|
| Reflexivity | If Y ⊆ X, then X → Y | {StudentID, Name} → {StudentID} |
| Augmentation | If X → Y, then XZ → YZ | If StudentID → Name, then {StudentID, Address} → {Name, Address} |
| Transitivity | If X → Y and Y → Z, then X → Z | If StudentID → Name and Name → Address, then StudentID → Address |
| Union (Additivity) | If X → Y and X → Z, then X → YZ | If EmployeeID → EmployeeName and EmployeeID → Department, then EmployeeID → {EmployeeName, Department} |
| Decomposition (Projectivity) | If X → YZ, then X → Y and X → Z | If ProductID → {ProductName, ProductPrice}, then ProductID → ProductName and ProductID → ProductPrice |
| Pseudo-transitivity | If X → Y and YZ → W, then XZ → W | If CourseID → Department and {Department, Semester} → Instructor, then {CourseID, Semester} → Instructor |

4.2 Normal Forms


Normalization is a systematic process in database management systems aimed at minimizing
redundancy from a relation or set of relations. Its primary purpose is to ensure that the design of
a database is efficient, organized, and free from data anomalies. Normal forms serve as a series
of guidelines to achieve this reduction in redundancy and improve data quality.
Redundancy in a relation can lead to various undesirable inconsistencies or problems known as
anomalies during data manipulation. These include:
●​ Insertion Anomalies: Difficulties that arise when attempting to insert new data into the
database.
●​ Deletion Anomalies: Problems where deleting certain data inadvertently leads to the
unintended loss of other related data.
●​ Update Anomalies: Issues that occur when updating data, requiring multiple changes
across the database to maintain consistency, which can be error-prone.
Normalization offers several significant advantages for database design and management:
●​ Reduced Data Redundancy: By eliminating duplicate data in tables, normalization
optimizes the amount of storage space required and enhances overall database
efficiency.
●​ Improved Data Consistency: It ensures that data is stored in a consistent and organized
manner, thereby significantly reducing the risk of data inconsistencies and errors.
●​ Simplified Database Design: Normalization provides clear guidelines for organizing
tables and defining data relationships, making the database easier to design, understand,
and maintain.
●​ Improved Query Performance: Normalized tables are typically easier to search and
retrieve data from, which often results in faster query execution and improved overall
system performance.
●​ Easier Database Maintenance: By breaking down a database into smaller, more
manageable tables, normalization reduces complexity, simplifying tasks such as adding,
modifying, or deleting data.
4.2.2 First Normal Form (1NF)

First Normal Form (1NF) is the most basic level of normalization. A relation is in 1NF if each
table cell contains only a single, atomic value, and each column has a unique name. Essentially,
every attribute in that relation must be a single-valued attribute. A relation violates 1NF if it
contains composite or multi-valued attributes. For example, a STUDENT table with a
multi-valued STUD_PHONE attribute (where a student can have multiple phone numbers stored
in a single cell) would violate 1NF. To convert such a table to 1NF, the multi-valued attribute
would need to be separated, perhaps by creating a new table for phone numbers linked by the
STUD_NO.

4.2.3 Second Normal Form (2NF)

To be in Second Normal Form (2NF), a relation must first satisfy the conditions of 1NF, and it
must not contain any partial dependency. A partial dependency occurs when a non-prime
attribute (an attribute that is not part of any candidate key) is dependent on a proper subset of a
candidate key of the table. Consider a table with attributes STUD_NO, COURSE_NO, and
COURSE_FEE. If {STUD_NO, COURSE_NO} is the primary key, and COURSE_NO alone
determines COURSE_FEE (i.e., COURSE_NO → COURSE_FEE), then COURSE_FEE is a
non-prime attribute that is partially dependent on COURSE_NO (which is a proper subset of the
primary key). This scenario violates 2NF. To convert this relation to 2NF, it would be split into
two tables: (STUD_NO, COURSE_NO) and (COURSE_NO, COURSE_FEE), thereby removing
the partial dependency and reducing data redundancy.
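A minimal DDL sketch of this 2NF decomposition (data types are assumptions for the example):

-- Enrollment fact: which student has taken which course
CREATE TABLE STUDENT_COURSE (
    STUD_NO   INT,
    COURSE_NO INT,
    PRIMARY KEY (STUD_NO, COURSE_NO)
);

-- COURSE_FEE now depends only on COURSE_NO, so the partial dependency is gone
CREATE TABLE COURSE (
    COURSE_NO  INT PRIMARY KEY,
    COURSE_FEE DECIMAL(10, 2)
);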

4.2.4 Third Normal Form (3NF)

Third Normal Form (3NF) builds upon 2NF. A relation is in 3NF if it is in 2NF and does not have
any transitive dependency for non-prime attributes. A transitive dependency exists when a
non-prime attribute depends on another non-prime attribute, which in turn depends on the
primary key, creating an indirect dependency. The basic condition for a non-trivial functional
dependency X → Y to satisfy 3NF is that either X is a Super Key, or Y is a Prime Attribute
(meaning every element of Y is part of a candidate key). For example, consider an Enrollments
table where Enrollment ID is the primary key, and it includes Course ID and Course Name. If
Enrollment ID → Course ID and Course ID → Course Name, then Course Name transitively
depends on Enrollment ID via Course ID. This violates 3NF. To achieve 3NF, the table would
be split into Enrollments (Enrollment ID, Student Name, Course ID) and Courses (Course ID,
Course Name), ensuring that Course Name directly depends only on Course ID.

4.2.5 Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form (BCNF) is a stricter form of 3NF. A relation is in BCNF if it is already in 3NF, and for every non-trivial functional dependency X → Y, X must be a superkey.
This means that every determinant in the table must be a candidate key. A BCNF violation
occurs when a non-trivial functional dependency exists, and the determinant (the left side of the
dependency) is not a superkey. This situation often arises in tables that have multiple
overlapping candidate keys. For example, in a Courses table where (course_id, professor) is the
primary key, if there is a dependency course_id → room (meaning each course is taught in a
specific room), this violates BCNF because course_id alone is not a superkey, even though it
determines room. To convert this table to BCNF, it would be decomposed into courses_rooms
(course_id, room) and professors_courses (course_id, professor).
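A DDL sketch of this BCNF decomposition (data types are assumed for illustration):

-- course_id → room: each course is taught in exactly one room
CREATE TABLE courses_rooms (
    course_id INT PRIMARY KEY,
    room      VARCHAR(20)
);

-- A course may be taught by several professors
CREATE TABLE professors_courses (
    course_id INT,
    professor VARCHAR(100),
    PRIMARY KEY (course_id, professor),
    FOREIGN KEY (course_id) REFERENCES courses_rooms(course_id)
);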
Normalization (1NF, 2NF, 3NF, BCNF) is a systematic, step-by-step process for refining
database schemas. It is not just about "reducing redundancy" but about eliminating specific
types of data anomalies (insertion, deletion, update) that can compromise data integrity. Each
normal form builds upon the previous one, imposing stricter rules based on functional
dependencies, leading to increasingly robust and consistent database designs. The progression
reflects a deepening understanding of data relationships and their potential pitfalls.
Normalization is a critical discipline in database design that directly impacts the long-term
maintainability, reliability, and performance of a database. By systematically addressing
dependencies and anomalies, normalized databases are less prone to errors, easier to update,
and provide a consistent foundation for data analysis and application development.
Table 4.2: Normal Forms Summary
| Normal Form | Condition(s) | Example of Violation & Resolution |
|---|---|---|
| 1NF | Each cell has a single value; each column has a unique name; no multi-valued/composite attributes. | Violation: STUDENT table with multi-valued STUD_PHONE. Resolution: Create a separate STUDENT_PHONE table (STUD_NO, PHONE). |
| 2NF | In 1NF; no partial dependencies (non-prime attribute dependent on a subset of a candidate key). | Violation: (STUD_NO, COURSE_NO) PK, COURSE_NO → COURSE_FEE. Resolution: Split into (STUD_NO, COURSE_NO) and (COURSE_NO, COURSE_FEE). |
| 3NF | In 2NF; no transitive dependencies (non-prime attribute dependent on another non-prime attribute). | Violation: Enrollment ID PK, Course ID → Course Name. Resolution: Split into Enrollments (Enrollment ID, Student Name, Course ID) and Courses (Course ID, Course Name). |
| BCNF | In 3NF; for every non-trivial FD X → Y, X must be a superkey. | Violation: (course_id, professor) PK, but course_id → room. Resolution: Split into courses_rooms (course_id, room) and professors_courses (course_id, professor). |

Unit 5: Transaction Processing and Concurrency Control
This unit explores the critical aspects of transaction management, including the ACID
properties, transaction states, serializability for concurrent execution, recovery mechanisms
from failures, and various concurrency control techniques, concluding with an overview of
database security.

5.1 Transaction Processing Concepts


Transaction processing is a core component of Database Management Systems, ensuring data
integrity and reliability, especially in multi-user environments.

5.1.1 Transaction System

A transaction is defined as a set of logically related operations or a cohesive group of tasks performed by a single user to access and manipulate the contents of a database. For example,
the seemingly simple act of transferring Rs 800 from account X to account Y involves multiple
low-level, interdependent tasks: reading X's balance, deducting 800 from X, writing the new X
balance, reading Y's balance, adding 800 to Y, and finally writing the new Y balance.
The primary operations of a transaction are:
●​ Read(X): This operation retrieves the value of a data item X from the database and
temporarily stores it in a buffer within the main memory.
●​ Write(X): This operation takes the value from the buffer and writes it back to the
database, making the changes persistent.
Transactions can fail due to various reasons, such as hardware malfunctions, software errors, or
power outages, before completing all their operations. To manage such failures and ensure data
consistency, two crucial operations are employed:
●​ Commit: This operation is used to permanently save all the work done by a transaction to
the database, making its changes visible and durable.
●​ Rollback: This operation is used to undo all the work done by a transaction that has not
yet been saved to the database. It reverts the database to its state prior to the
transaction's initiation, ensuring that partial or erroneous updates are not made
permanent.
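A hedged SQL sketch of the Rs 800 transfer described above, assuming an ACCOUNTS table with ACC_NO and BALANCE columns (the statement that opens a transaction, such as START TRANSACTION, varies by DBMS):

START TRANSACTION;
UPDATE ACCOUNTS SET BALANCE = BALANCE - 800 WHERE ACC_NO = 'X';
UPDATE ACCOUNTS SET BALANCE = BALANCE + 800 WHERE ACC_NO = 'Y';
COMMIT;        -- both updates become permanent together
-- If either UPDATE fails before COMMIT, a ROLLBACK restores both balances.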

5.1.2 ACID Properties

Transactions adhere to four fundamental properties, collectively known as ACID, which are
essential for maintaining consistency and reliability in a database before and after a
transaction's execution. These properties form the bedrock of transactional reliability in DBMS.
They are not merely desirable features but a contract that the database system offers to ensure
that concurrent operations and system failures do not compromise data integrity.
●​ Atomicity: This property dictates that all operations within a transaction must either take
place entirely or not at all. There is no concept of partial execution; a transaction is treated
as a single, indivisible unit (an "all-or-nothing" proposition). If a transaction Aborts, none of
its changes are visible in the database. If it Commits, all its changes become permanently
visible.
●​ Consistency: This property ensures that integrity constraints are maintained. A
transaction must transform the database from one consistent state to another consistent
state. For example, in a money transfer, if the total balance across two accounts is 900
before the transaction, it must remain 900 after the transaction, even if individual account
balances change. Inconsistency would arise if one part of the transfer succeeded while
another failed.
●​ Isolation: This property mandates that data being used by one transaction during its
execution cannot be accessed or modified by a second transaction until the first one is
completed. This prevents interference between concurrent transactions, ensuring that
each transaction appears to execute in isolation from others. The concurrency control
subsystem of the DBMS is responsible for enforcing isolation.
●​ Durability: This property guarantees that once a transaction is successfully completed
and committed, the permanent changes made to the database will not be lost, even in the
event of system failures (such as power outages or hardware malfunctions). The recovery
subsystem of the DBMS is responsible for ensuring this durability.
Adherence to ACID properties is paramount for any reliable database system. It ensures that
complex, multi-step operations are treated as indivisible units, preventing data corruption and
providing a consistent view of the data to all users, even in highly concurrent or failure-prone
environments.

5.1.3 Transaction States

During its lifecycle, a transaction progresses through various states, each representing a
different stage in its execution. This structured progression is essential for the DBMS to manage
complex operations, especially in multi-user environments.
●​ Active State: This is the initial state for every transaction. In this state, the transaction is
actively executing its operations, such as inserting, deleting, or updating records.
However, at this point, none of the changes made by the transaction have been
permanently saved to the database.
●​ Partially Committed State: A transaction enters this state after it has successfully
executed its final operation. Although all intended operations are complete, the data
changes are still residing in volatile memory and are not yet permanently saved to the
database.
●​ Committed State: A transaction reaches the committed state if all its operations have
been executed successfully and its changes have been permanently saved on the
database system. Once committed, the effects of the transaction are durable and cannot
be lost.
●​ Failed State: A transaction enters the failed state if any of the checks performed by the
database recovery system fail. This could occur due to various issues, such as a query
failing to execute or a system constraint being violated.
●​ Aborted State: If a transaction reaches a failed state, the database recovery system
ensures that the database is returned to its previous consistent state. This involves
aborting or rolling back the transaction, undoing all changes made since its start. After
aborting, the database recovery module will either attempt to restart the transaction or
terminate it permanently.
Understanding transaction states is crucial for debugging database applications and for
designing robust error handling mechanisms. It clarifies how the DBMS ensures data
consistency and recovers from failures by systematically managing the lifecycle of each unit of
work.

5.2 Serializability
Serializability is a fundamental concept in concurrency control, aiming to identify non-serial
schedules that allow transactions to execute concurrently without interfering with one another. A
non-serial schedule is considered serializable if its final result is equivalent to the result of some
serial execution of the same transactions. This concept is central to concurrency control
because while serial execution is simple and correct, it is inefficient in multi-user environments.
Non-serial schedules allow concurrency, but they risk data inconsistency. Serializability provides
the theoretical framework to determine if a concurrent schedule is "correct" (i.e., produces the
same result as some serial execution).

5.2.1 Serial and Non-serial Schedules

●​ Serial Schedule: In a serial schedule, transactions are executed one after another,
completely finishing one transaction before starting the next. There is no interleaving of
operations from different transactions.
●​ Non-serial Schedule: A non-serial schedule allows for the interleaving of operations from
multiple transactions. This means that operations from different transactions can be
executed concurrently, leading to many possible orders in which the system can execute
the individual operations.

5.2.2 Testing of Serializability: Precedence Graph

A Serialization Graph, also known as a Precedence Graph, is a directed graph used to test
the conflict serializability of a schedule. For a given schedule S, a graph G = (V, E) is
constructed, where V is a set of vertices representing all participating transactions, and E is a
set of directed edges.
An edge Ti → Tj is drawn in the precedence graph if one of the following conflicting conditions
holds:
1.​ Transaction Ti executes a write(Q) operation before transaction Tj executes a read(Q)
operation on the same data item Q.
2.​ Transaction Ti executes a read(Q) operation before transaction Tj executes a write(Q)
operation on the same data item Q.
3.​ Transaction Ti executes a write(Q) operation before transaction Tj executes a write(Q)
operation on the same data item Q.
The serializability condition for a schedule is determined by the presence of cycles in its
precedence graph. If the precedence graph for schedule S contains no cycle, then S is
considered serializable. Conversely, if the precedence graph contains a cycle, then S is
non-serializable.
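A short worked example, using an assumed schedule S over transactions T1 and T2:

Schedule S:  R1(A);  W2(A);  W1(A)
Edge T1 → T2:  R1(A) occurs before the conflicting W2(A)   (read before write)
Edge T2 → T1:  W2(A) occurs before the conflicting W1(A)   (write before write)

The precedence graph contains the cycle T1 → T2 → T1, so S is not conflict serializable.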

5.2.3 Conflict and View Serializability

●​ Conflict Serializable Schedule: A schedule is deemed conflict serializable if it can be transformed into a serial schedule by swapping only non-conflicting operations. This
implies that the schedule is conflict equivalent to a serial schedule, meaning it produces
the same result as if the transactions were executed one after another in some serial
order.
●​ View Serializable Schedule: Some non-serial schedules that are not conflict serializable
can still be considered correct if they produce the same final result as some serial
schedule. Such schedules are called view serializable. View serializability is typically
checked by verifying three conditions for "view equivalence" between the non-serial
schedule and a serial schedule: Initial Read, Final Write, and Updated Read. A blind
write, where a transaction writes to a data item without first reading its value, can impact
view serializability.
The concepts of serializability are critical for ensuring the correctness of concurrent
transactions. They allow DBMS to maximize throughput by interleaving operations while
guaranteeing that the final state of the database is consistent, as if transactions ran one after
another.

5.3 Recoverability of Schedules


A schedule is considered recoverable if, whenever a transaction Ti reads a data item that was
previously written by another transaction Tj, Tj must commit before Ti commits. This condition is
crucial for maintaining data consistency in the event of failures.
An irrecoverable schedule is one where this condition is violated. For example, if transaction
Tj writes a value, and transaction Ti reads that uncommitted value and then commits, but Tj
subsequently fails, Ti cannot be rolled back because it has already committed. This leads to an
inconsistent state in the database, as Ti's committed work is based on data that was later
undone.
The concept of recoverability highlights a critical interdependence between transactions. It is not
enough for individual transactions to be atomic; their interleaving must also allow for proper
recovery in case of failure. An irrecoverable schedule demonstrates how a committed
transaction can be based on data from a later-failed transaction, leading to a permanent
inconsistent state. This implies that the order of commits relative to reads is crucial for
maintaining data integrity. Ensuring recoverability is a vital aspect of designing robust
concurrency control mechanisms. It prevents cascading rollbacks and ensures that the
database can always be restored to a consistent state, even when transactions fail in a
concurrent environment.

5.4 Recovery from Transaction Failures


Database systems are designed with an inherent understanding of their vulnerability to
disruptions. When a transaction fails to execute completely due to various issues—such as
software problems, system crashes, or hardware failures—it must be rolled back to maintain
data consistency. Furthermore, if other transactions have used values produced by the failed
transaction, those dependent transactions also need to be rolled back.

5.4.1 Failure Classification

To understand the nature and origin of problems, failures are generally classified into the
following categories:
●​ Transaction Failure: This occurs when a transaction fails to execute or reaches a point
from which it cannot proceed further. Reasons for transaction failure include:
○​ Logical Errors: These happen if a transaction cannot complete due to a code error
or an internal error condition.
○​ System Errors: These occur when the DBMS itself terminates an active transaction
because it is unable to execute it, for example, in cases of deadlock or resource
unavailability.
●​ System Crash: System failures can be caused by power outages or other hardware or
software malfunctions, such as operating system errors. In the context of system crashes,
a "fail-stop assumption" is often made, meaning that non-volatile storage (where data is
permanently stored) is assumed not to be corrupted.
●​ Disk Failure: This type of failure was more common in the early days of technology. It
occurs when hard-disk drives or storage drives fail frequently. Disk failure can result from
issues like bad sectors, disk head crashes, or unreachability to the disk, which can
destroy all or part of the stored data.

5.4.2 Basic Recovery Techniques

The core principle behind recovery is redundancy, achieved through various mechanisms that
ensure a consistent state can be reconstructed even if primary data or ongoing operations are
lost.
●​ Commit: This operation is used to permanently save the work done by a transaction,
making its changes durable in the database.
●​ Rollback: This operation is used to undo the work done by a transaction, reverting the
database to a consistent state prior to the transaction's initiation.
●​ Log-Based Recovery: This technique involves keeping a detailed record of all database
changes in a log file. If a failure occurs, this log file is used to redo completed transactions
(that were committed but not yet written to disk) and undo incomplete ones (that were not
committed). This can operate in two modes: Deferred database modification, where all
logs are written to stable storage and the database is updated only when a transaction
commits; or Immediate database modification, where the database is modified
immediately after every operation, with logs also recorded.
●​ Checkpointing: This technique involves creating a snapshot of the database at a
particular point in time. During recovery, the system can start from the last checkpoint,
significantly reducing the amount of data that needs to be processed from the log file.
●​ Shadow Paging: This method maintains two copies of a database page: a shadow page
(the old, consistent version) and a current page (where updates are made). During
updates, changes are made to the current page while the shadow page remains intact,
providing an immediate recovery point to revert to the previous state in case of failure.
●​ Backup and Restore: This involves creating periodic backups of the entire database. In
the event of a catastrophic failure, these backups can be used to restore the data to a
previous consistent state.
Robust recovery mechanisms are fundamental to the "Durability" aspect of ACID properties.
They ensure business continuity and data trustworthiness by minimizing data loss and
downtime in the face of unexpected system failures, which is paramount for mission-critical
applications.
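As a concrete illustration of the log-based redo/undo idea described above, the minimal Python sketch below recovers an in-memory "database" from a log of update and commit records; the record format and the dictionary standing in for the database are simplifying assumptions.

def recover(db, log):
    # Log records: ("UPDATE", txn, item, old_value, new_value) and ("COMMIT", txn).
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Redo pass: reapply every update made by a committed transaction, in log order.
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, item, _, new_value = rec
            db[item] = new_value
    # Undo pass: roll back uncommitted transactions, scanning the log backwards.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, item, old_value, _ = rec
            db[item] = old_value
    return db

db = {"A": 500, "B": 250}                      # state found on disk after a crash
log = [("UPDATE", "T1", "A", 500, 400),
       ("UPDATE", "T2", "B", 200, 250),
       ("COMMIT", "T1")]                       # T2 never committed
print(recover(db, log))                        # {'A': 400, 'B': 200}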

5.5 Concurrency Control Techniques


Concurrency control in Database Management Systems is crucial for ensuring data
consistency and preventing conflicts when multiple users or transactions attempt to access and
modify the database simultaneously. Without proper concurrency control, several problems can
arise, compromising data integrity.
5.5.1 Concurrency Control Problems

When multiple transactions execute simultaneously, they can interfere with each other, leading
to various concurrency issues:
●​ Lost Update Problem: This occurs when two transactions read the same data, modify it
independently, and then write their changes back without proper synchronization. The
update made by the first transaction is effectively overwritten and "lost" by the second
transaction's write. For example, if Transaction T1 reads Balance = 5000, T2 also reads
Balance = 5000. T1 adds 1000 (resulting in 6000), and T2 subtracts 500 (resulting in
4500). If T2 writes its result last, T1's update is lost, and the final balance becomes 4500
instead of the correct 5500.
●​ Dirty Read Problem: Also known as reading uncommitted data, this problem arises when
a transaction reads data that has been modified by another transaction but has not yet
been committed. If the modifying transaction later rolls back, the data read by the first
transaction becomes invalid. For instance, if T1 updates Balance = 7000 but has not
committed, and T2 reads this 7000. If T1 then rolls back, restoring the balance to 5000,
T2 is left holding an incorrect value.
●​ Unrepeatable Read Problem: This occurs when a transaction reads the same data item
multiple times but retrieves different values because another committed transaction
modified the data between the reads. For example, T1 reads Balance = 5000. T2 updates
Balance = 6000 and commits. When T1 reads the balance again, it now sees 6000,
leading to inconsistent results within the same transaction.
●​ Phantom Read Problem: This problem occurs when a transaction executes a query
twice, and the second execution returns a different set of rows satisfying the query
criteria. This happens because another committed transaction inserted or deleted rows
that match the criteria between the two reads. For example, T1 selects all employees with
salary > 50000. T2 inserts a new employee with salary = 60000 and commits. When T1
re-executes the same query, it sees an extra "phantom" row.
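The lost update scenario from the first bullet can be reproduced with a few lines of Python; the interleaving is written out explicitly to show exactly where the first update is overwritten.

balance = 5000

# Both transactions read the balance before either one writes it back.
t1_local = balance          # T1 reads 5000
t2_local = balance          # T2 reads 5000

t1_local += 1000            # T1 computes 6000
t2_local -= 500             # T2 computes 4500

balance = t1_local          # T1 writes 6000
balance = t2_local          # T2 writes 4500, silently overwriting T1's update

print(balance)              # 4500, although the correct serial result is 5500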

5.5.2 Locking Techniques for Concurrency Control

To prevent these concurrency problems, DBMS employ various concurrency control techniques.
Locking techniques are a primary method, where a lock is a mechanism that prevents multiple
transactions from accessing the same resource simultaneously.
●​ Types of Locks:
○​ Shared Lock (S-Lock): This type of lock, also known as a read-only lock, allows
multiple transactions to access the same data item for reading concurrently.
However, no transaction holding an S-Lock can make changes to the data. An
S-Lock is requested using the lock-S instruction.
○​ Exclusive Lock (X-Lock): An Exclusive Lock provides a transaction with exclusive
access to a data item, allowing it to both read and modify the data. While an X-Lock
is held, no other transaction can access the same data item (neither read nor write).
An X-Lock is requested using the lock-X instruction.
●​ Two-Phase Locking Protocol (2PL): This is a widely used concurrency control protocol
that defines clear rules for managing data locks to ensure serializability. It divides a
transaction's execution into two distinct phases:
○​ Phase 1 (Growing Phase): In this phase, a transaction can acquire new locks, but
it cannot release any locks it currently holds. It continues to acquire all the locks it
needs to access the required data.
○​ Phase 2 (Shrinking Phase): Once a transaction releases its first lock, it enters the
shrinking phase. In this phase, the transaction can only release locks; it cannot
acquire any new ones.
The 2PL protocol guarantees serializability, meaning that any schedule produced by 2PL
will be equivalent to some serial execution of the transactions. However, it is susceptible
to deadlocks, where two or more transactions wait indefinitely for each other to release locks.
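A minimal Python sketch of the two-phase rule is shown below. It enforces only the growing/shrinking discipline within a single transaction and does not model lock compatibility between transactions; the class and method names are illustrative assumptions.

class TwoPhaseLockingError(Exception):
    pass

class Transaction:
    def __init__(self, name):
        self.name = name
        self.locks = {}          # item -> "S" or "X"
        self.shrinking = False   # becomes True after the first unlock

    def lock(self, item, mode):
        # Growing phase only: no new locks once any lock has been released.
        if self.shrinking:
            raise TwoPhaseLockingError(f"{self.name} cannot lock {item} in its shrinking phase")
        self.locks[item] = mode

    def unlock(self, item):
        self.shrinking = True    # the first release ends the growing phase
        del self.locks[item]

t1 = Transaction("T1")
t1.lock("A", "X")                # growing phase
t1.lock("B", "S")
t1.unlock("A")                   # shrinking phase begins
try:
    t1.lock("C", "S")            # violates 2PL: a lock request after a release
except TwoPhaseLockingError as error:
    print(error)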

5.5.3 Time Stamping Protocols for Concurrency Control

Timestamp-based protocols are concurrency control mechanisms that ensure serializability
and avoid conflicts without relying on traditional locking. The core idea is to assign a unique
timestamp to each transaction, which determines the order in which transactions should be
executed. The timestamp can be the system's clock time or a logical counter that increments
with each new transaction. Older transactions (with smaller timestamps) are generally given
priority over newer ones.
These protocols operate with specific rules for read and write operations:
●​ Read Rule: If a transaction T wants to read an item that was last written by a younger
transaction T' (i.e., TS(T') > TS(T)), the read operation is rejected because it attempts to
read a value from the "future" relative to its timestamp. Otherwise, the read is permitted.
●​ Write Rule: If a transaction T wants to write an item that has been read or written by a
younger transaction T' (TS(T') > TS(T)), the write operation is rejected. This prevents
overwriting a value that a younger transaction has already interacted with.
When a transaction's operation violates these rules, the transaction is typically rolled back
and restarted or aborted. Schedules generated by basic timestamp ordering protocols are
conflict serializable and deadlock-free.
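These rules can be stated compactly in code. The Python sketch below keeps a read timestamp and a write timestamp for each data item, as in the basic timestamp ordering protocol, and simply returns False where a real DBMS would roll back and restart the transaction; that simplification, and the Item class itself, are assumptions made for brevity.

class Item:
    def __init__(self):
        self.read_ts = 0    # largest timestamp of any transaction that read the item
        self.write_ts = 0   # timestamp of the transaction that last wrote the item

def read_item(item, ts):
    # Reject the read if a younger transaction has already written the item.
    if ts < item.write_ts:
        return False                      # roll back / restart the reader
    item.read_ts = max(item.read_ts, ts)
    return True

def write_item(item, ts):
    # Reject the write if a younger transaction has already read or written the item.
    if ts < item.read_ts or ts < item.write_ts:
        return False                      # roll back / restart the writer
    item.write_ts = ts
    return True

x = Item()
print(write_item(x, ts=2))   # True: the transaction with timestamp 2 writes x
print(read_item(x, ts=1))    # False: an older transaction tries to read a "future" value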

5.5.4 Validation Based Protocol (Optimistic Concurrency Control)

Validation-based protocols, also known as Optimistic Concurrency Control (OCC), are a
set of techniques that operate on the assumption that conflicts between transactions will be
rare. Unlike locking or timestamping, OCC defers conflict checking until the transaction's commit
time, aiming to increase system concurrency and performance by allowing transactions to
proceed without immediate checks.
A typical validation-based protocol consists of three phases:
1.​ Read Phase: The transaction reads data from the database. All updates are made to a
local, private copy of the data items, not directly to the main database.
2.​ Validation Phase: Before committing, the system checks to ensure that the transaction's
local updates will not cause conflicts with other concurrently running or recently
committed transactions. If conflicts are detected, the transaction fails validation.
3.​ Write Phase: If the transaction successfully passes validation, its local updates are then
applied to the actual database. If validation fails, the transaction is aborted and may be
restarted.
Advantages of OCC include high concurrency, as transactions do not acquire locks during the
read phase, and it is deadlock-free. It also avoids cascading rollbacks because updates are
applied to the database only after successful validation. However, disadvantages include the
overhead introduced by the validation process and the potential for wasted work if transactions
frequently fail validation and need to be restarted, especially in environments with high conflict
rates.
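One common way to realize the validation phase is to compare the read set of the committing transaction with the write sets of transactions that committed while it was running. The Python sketch below illustrates that set-based test; the class structure and the single-threaded driver at the end are illustrative assumptions.

class OptimisticTxn:
    def __init__(self, name):
        self.name = name
        self.read_set = set()     # items this transaction has read
        self.write_set = set()    # items this transaction intends to write
        self.local = {}           # private copies of updated items

    def read(self, db, item):
        self.read_set.add(item)
        return self.local.get(item, db.get(item))

    def write(self, item, value):
        self.write_set.add(item)
        self.local[item] = value  # the real database is untouched until commit

def validate(txn, recently_committed):
    # Fail validation if a committed transaction wrote anything this one read.
    return all(not (other.write_set & txn.read_set) for other in recently_committed)

def commit(txn, db, recently_committed):
    if not validate(txn, recently_committed):
        return False              # abort; the transaction may be restarted
    db.update(txn.local)          # write phase: apply the buffered updates
    recently_committed.append(txn)
    return True

db, done = {"A": 10}, []
t1, t2 = OptimisticTxn("T1"), OptimisticTxn("T2")
t1.write("A", t1.read(db, "A") + 1)
t2.write("A", t2.read(db, "A") + 5)
print(commit(t1, db, done))   # True
print(commit(t2, db, done))   # False: T1 committed a write to an item T2 had read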

5.5.5 Multiple Granularity

Multiple granularity locking is a concurrency control mechanism that allows transactions to
lock data items at different levels of granularity within a hierarchical structure, such as the entire
database, a table, a page, or a single data item. This approach aims to reduce lock overhead
and increase concurrency by allowing transactions to lock only the minimum amount of data
necessary.
The system defines a hierarchy of lockable objects. For example, a common hierarchy might be:
Database -> Area -> File -> Record. Transactions can then acquire locks at different levels
depending on their specific needs. To manage this, additional types of locks are introduced
beyond simple shared (S) and exclusive (X) locks:
●​ Intention Shared (IS) Lock: Indicates that a transaction intends to set S-locks on some
descendants (finer-grained objects) in the hierarchy.
●​ Intention Exclusive (IX) Lock: Indicates that a transaction intends to set X-locks on
some descendants in the hierarchy.
●​ Shared and Intention Exclusive (SIX) Lock: Allows a transaction to read an entire
subtree (S-lock on the node) and also to update some individual items within that subtree
(intends to set X-locks on descendants).
Locks are typically acquired in a top-down order (from coarser to finer granularity) and released
in a bottom-up order. This flexibility improves concurrency by reducing the likelihood of lock
contention, where transactions are blocked waiting for locks held by others. It also enhances
scalability by allowing transactions to lock only the subset of data they need. However, multiple
granularity locking introduces complexity and overhead, and while it improves concurrency,
deadlocks can still occur.
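The interaction among IS, IX, S, SIX, and X locks is usually summarized in a lock compatibility matrix. The Python sketch below encodes the standard matrix and checks whether a requested lock mode can coexist with the modes other transactions already hold on the same node of the hierarchy.

# True means the requested mode (outer key) is compatible with the held mode (inner key).
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested, held_modes):
    # A lock is granted only if it is compatible with every lock already held.
    return all(COMPATIBLE[requested][held] for held in held_modes)

# Another transaction holds IX on a File node (it intends to X-lock some records).
print(can_grant("IS", ["IX"]))   # True : reading a few records underneath is allowed
print(can_grant("S",  ["IX"]))   # False: an S lock on the whole file would conflict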

5.5.6 Database Security

Database security is a critical aspect of database management, encompassing various layers
and types of information security controls to protect the confidentiality, integrity, and availability
(CIA Triad) of sensitive data. It spans across physical infrastructure, operating systems,
applications, and the database platforms themselves.
Key concepts and practices in database security include:
●​ Access Control: This involves defining and enforcing who can access the database and
what actions they are permitted to perform. Role-Based Access Control (RBAC) is a
common method, where permissions are assigned to roles, and users are assigned to
roles, simplifying management. This includes checking levels of access privilege (e.g.,
read-only vs. administrative users).
●​ Authentication: The process of verifying the identity of a user attempting to access the
database, ensuring they are who they claim to be. This can involve passwords,
multi-factor authentication (MFA), or other verification methods.
●​ Authorization: Once authenticated, authorization determines what specific operations a
user is permitted to perform on database objects (e.g., SELECT, INSERT, UPDATE,
DELETE on specific tables or views). This involves both system privileges (administrative
actions) and object privileges (operations on specific objects).
●​ Encryption: Protecting the contents of the database from unauthorized access by
transforming data into an unreadable format. This applies to data at rest (stored on disk)
and data in transit (moving across networks).
●​ Auditing: The process of monitoring and reviewing database activities and access logs to
track who accessed what data, when, and what operations were performed. This helps in
detecting anomalies, suspicious user activity, and ensuring compliance with security
policies.
●​ Vulnerability Assessments and Penetration Testing: Regularly performing these tests
helps identify security weaknesses that could be exploited by malicious actors.
●​ Proactive Threat Hunting: Scanning database environments for malware or threat
signatures to discover and mitigate attacks early.
●​ Separation of Duties: Physically separating database servers from other servers (e.g.,
web servers) to reduce the attack surface.
●​ Regular Patching and Updates: Keeping database software up-to-date to address
known vulnerabilities.
●​ Data Discovery: Continuously identifying where sensitive data resides, including
backups, development/test instances, and shadow IT, to ensure all data is protected.
Database security is a multi-layered defense, ensuring that data is protected from unauthorized
access, modification, or destruction. It involves a combination of technical controls,
administrative procedures, and continuous monitoring to uphold the confidentiality, integrity, and
availability of information.
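Role-Based Access Control, mentioned above, can be illustrated with a short Python sketch; the users, roles, and permissions shown here are invented purely for illustration and do not correspond to any DBMS's security catalog.

ROLE_PERMISSIONS = {
    "analyst":  {("employees", "SELECT")},
    "hr_admin": {("employees", "SELECT"), ("employees", "UPDATE"), ("employees", "INSERT")},
}
USER_ROLES = {"alice": "analyst", "bob": "hr_admin"}

def is_authorized(user, table, action):
    # Authorization check: the user's role must grant the action on the table.
    role = USER_ROLES.get(user)
    return role is not None and (table, action) in ROLE_PERMISSIONS.get(role, set())

print(is_authorized("alice", "employees", "SELECT"))   # True
print(is_authorized("alice", "employees", "UPDATE"))   # False: analysts are read-only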

Conclusions
The comprehensive examination of Database Management Systems, spanning from
foundational concepts to advanced transaction processing and security, underscores the critical
role of DBMS in modern information technology. The evolution from rudimentary file systems to
sophisticated multi-tier architectures demonstrates a continuous drive to overcome limitations in
data management, particularly regarding redundancy, consistency, scalability, and security. The
multi-level abstraction provided by database schemas (physical, logical, conceptual) is
instrumental in managing complexity, allowing different stakeholders to interact with the
database at appropriate levels of detail while ensuring data independence and system
maintainability.
The relational model, with its structured tables and rigorous integrity constraints, provides a
robust framework for organizing data. Relational algebra serves as the logical foundation for
querying, while SQL, as its practical implementation, offers a powerful and universal language
for data definition, manipulation, and control. The systematic process of normalization, guided
by functional dependencies, is essential for designing efficient and anomaly-free database
schemas, directly impacting data integrity and query performance.
Finally, the intricacies of transaction processing, governed by the ACID properties, are
paramount for ensuring data reliability in concurrent environments. Concurrency control
techniques, such as locking, timestamping, and validation-based protocols, are vital for
balancing simultaneous access with data consistency. Coupled with robust recovery
mechanisms and comprehensive database security measures, these elements collectively
ensure the resilience, trustworthiness, and continuous availability of critical data assets.
In essence, the study of DBMS is not merely about understanding software tools; it is about
grasping the fundamental principles of data organization, integrity, and secure access that
underpin virtually all modern digital systems. Mastering these concepts is crucial for designing,
implementing, and managing reliable and high-performing database solutions in any domain.
