DBMS Notes by Tarun

The document outlines the history, advantages, disadvantages, applications, and architecture of Database Management Systems (DBMS). It details the evolution from early systems in the 1960s to modern relational and NoSQL databases, highlighting key innovations like SQL standardization and cloud computing. Additionally, it discusses the need for DBMS over traditional file systems, emphasizing data integrity, security, and efficient data management.


History of DBMS

1. Origins of DBMS (1960s)


• Developed for business data management.
• Key Systems:
o IDS (Integrated Data Store) – Charles Bachman.
o IMS (Information Management System) – Used in NASA’s Apollo program.
• Database Models: Hierarchical (tree-like) & Network (graph-like).

2. Relational Model & SQL (1970s)


• Edgar F. Codd (1970) introduced the Relational Model (data in tables).
• SQL (Structured Query Language) created for querying relational databases.
• Impact: Led to modern DBMS (Oracle, MySQL).
3. Conclusion
o Evolution: Hierarchical/Network DBMS → Relational DBMS (SQL) → NoSQL & Cloud Databases.
• Key Innovations:
o SQL standardization (1980s).
o Big Data & NoSQL for unstructured data.
o Cloud & distributed DBMS for large-scale processing.

DBMS continues to evolve with AI, cloud computing, and real-time analytics!
4. Key FAQs
• First DBMS: Among the earliest were IDS (Charles Bachman) and IBM IMS (1960s); IMS was used in NASA’s Apollo program.
• Relational Model: Introduced by Edgar F. Codd (1970).
• Pre-relational models: Hierarchical & Network models.
• SQL Standardization: Standardized by ANSI in 1986, after which SQL became the industry standard.

DBMS: Advantages & Disadvantages


Advantages

Security & Integrity – Encryption, authentication, and access control.


Efficiency – Reduces redundancy, ensures consistency, and speeds up queries.
Data Sharing – Multi-user access without conflicts.
Scalability – Handles growing data and users.
Backup & Recovery – Prevents data loss and ensures business continuity.
Disadvantages
High Cost – Expensive hardware, software, and training.
Complexity – Requires skilled personnel and proper design.
Maintenance – Frequent updates and security patches needed.
Performance Issues – Can be slow for small-scale use.
Data Risks – Vulnerable to corruption, breaches, and redundancy.
Compatibility – May not integrate well with all systems.

FAQs on Disadvantages of DBMS


1. Why is DBMS considered complex?
o DBMS requires knowledge of query languages and database design, making it harder to use
compared to simple file-based systems.
2. Why is data recovery important in DBMS?
o Recovery techniques like backups, logging, checkpoints, and shadow paging ensure data
consistency during system failures.
3. Is DBMS suitable for small businesses?
o DBMS is often too costly for small businesses. Alternative cloud-based or simpler database
solutions may be more appropriate.

Applications of DBMS
Reservations (Railway/Airline) – Manages bookings, schedules, and transactions.
Library – Tracks books, issues, and returns.
Banking & Finance – Handles accounts, transactions, and security.
Education & HR – Manages student records, payroll, and recruitment.
E-Commerce & Credit Cards – Tracks orders, payments, and recommendations.
Social Media & Web – Stores user data, messages, and activity logs.
Telecom – Manages calls, billing, and subscriptions.
Healthcare – Stores patient records, prescriptions, and billing.
Security – Ensures encryption and access control.
Manufacturing – Tracks inventory, production, and supply chain.

FAQs on DBMS Applications


1. How is DBMS used in banking systems?
o DBMS manages transactions, customer accounts, and financial records, ensuring secure
and reliable banking operations.
2. What is the role of DBMS in hospital management?
o It stores patient records, schedules appointments, manages billing, and ensures efficient
healthcare administration.
3. Why is DBMS important for e-commerce?
o It tracks orders, manages inventory, processes payments, and provides personalized
customer experiences.
4. How does DBMS help in social media?
o It stores user profiles, interactions, and multimedia content, enabling seamless
connectivity.

Need for DBMS


Limitations of File Systems

Data Redundancy – Duplicate data across multiple files.


Data Inconsistency – Changes in one file don’t reflect in others.
Difficult Data Retrieval – Requires manual effort or complex programs.
Limited Security – No role-based access control.
No Relationship Support – Hard to link related data.
Concurrency Issues – No simultaneous multi-user access.
Advantages of DBMS

Eliminates Redundancy – Centralized data storage.


Ensures Integrity & Consistency – Auto-updates across related records.
Enhanced Security – Role-based access control.
Efficient Data Retrieval – SQL for easy querying.
Supports Relationships – Data linking via unique keys.
Concurrency Control – Multiple users can modify data without conflict.
Why DBMS is Needed

Organized & Scalable Data Storage – Faster search & retrieval.


Security & Compliance – Controls access, follows legal standards.
ACID Transactions – Ensures consistency & reliability.
Concurrent Access – Multiple users operate without issues.
Advanced Data Analysis – Supports business intelligence & reporting.
Cost-Effective & Scalable – Reduces storage costs, supports large-scale growth.

Frequently Asked Questions (FAQs)


1. What is a Database?
A database is a structured collection of data that allows for easy storage, retrieval, and management.
2. What are the Types of Databases?
DBMS is categorized into:
• Relational Database Management System (RDBMS) – Uses structured tables (e.g., MySQL,
PostgreSQL).
• Non-Relational (NoSQL) Databases – Handles unstructured data (e.g., MongoDB, Cassandra).
3. What is a Database Model?
A database model defines the structure of a database, such as:
• Hierarchical Model
• Relational Model
• Object-Oriented Model
4. What are Modern Databases?
Modern databases support cloud storage, machine learning, and big data processing with high scalability.
5. What is a Datastore?
A datastore is a large repository for structured and unstructured data, used by enterprises for file storage,
customer records, and application data.

DBMS Architecture Overview


DBMS architecture defines how data is stored, managed, and accessed efficiently. It ensures structured
data management, better scalability, and secure access.
Types of DBMS Architecture:
1. 1-Tier Architecture (Single-Tier)
o The database, client, and server are all on the same machine.
o Example: Microsoft Excel
o Advantages:
▪ Simple and easy to set up
▪ Cost-effective (no extra hardware needed)
▪ Best for personal use or standalone applications
2. 2-Tier Architecture (Client-Server Model)
o The application (client) interacts directly with the database (server) via APIs like ODBC/JDBC.
o Example: Library Management System
o Advantages:
▪ Faster access to data
▪ Scalable (supports more clients)
▪ Cheaper than 3-tier systems
3. 3-Tier Architecture (Client - Application Server - Database Server)
o The client interacts with an application server, which then connects to the database.
o Example: E-commerce websites (Amazon, Flipkart, etc.)
o Advantages:
▪ Highly scalable
▪ Enhanced security (clients don’t directly access the database)
▪ Ensures data integrity and efficient load balancing
o Disadvantages:
▪ More complex
▪ Requires better infrastructure
Comparison of DBMS Architectures

Feature 1-Tier 2-Tier 3-Tier

Complexity Low Medium High

Scalability Low Medium High

Security Low Medium High

Cost Low Medium High

Key Takeaways
• 1-Tier: Best for standalone applications.
• 2-Tier: Good for small businesses or internal applications.
• 3-Tier: Ideal for large-scale, multi-user applications with better security and performance.

Data Independence in DBMS


What is Data Independence?
Data Independence refers to the ability to change the schema at one level of a database without affecting
the schema at the next higher level. This allows flexibility in modifying the database structure without
changing applications or user views.

Types of Data Independence


1. Logical Data Independence
• Ability to change the logical schema (e.g., adding/removing attributes, changing relationships)
without affecting applications.
• Applications continue to function even if the database structure changes.
Example:
• A new column (e.g., "Email") is added to a Customer table.
• Queries and applications using other columns (Name, Address) remain unchanged.
Key Benefit:
✔ Reduces the need to modify application programs when the database schema changes.

2. Physical Data Independence


• Ability to change the physical storage of data (e.g., indexing, file organization) without affecting the
logical schema.
• Changes in storage structure do not require modifications in queries or applications.
Example:
• A database administrator moves data from one storage device to another.
• No changes are required in how users or applications access the data.
Key Benefit:
✔ Optimizes storage and performance without affecting the database's logical design.

Importance of Data Independence

✔ Reduces maintenance costs by minimizing the impact of schema changes.


✔ Enhances database flexibility by allowing easy modifications.
✔ Improves data security by separating physical storage from user access.
✔ Ensures application stability, even when database structures evolve.

File System vs. DBMS


What is a File System?
A file system is a method for storing and organizing files on a storage device. It handles file creation,
deletion, and retrieval but lacks advanced data management features.
Example: NTFS (Windows), EXT (Linux)
What is a DBMS (Database Management System)?
DBMS is software designed to store, retrieve, and manage structured data efficiently. It provides security,
integrity, and advanced data-handling features.
Example: MySQL, Oracle, SQL Server

Key Differences Between File System and DBMS

Feature | File System | DBMS
Structure | Manages files and directories | Manages databases with tables and relationships
Data Redundancy | High (duplicate data) | Low (prevents redundancy)
Backup & Recovery | No built-in mechanism | Provides automated backup and recovery
Query Processing | No efficient query system | Uses SQL for complex queries
Data Consistency | Low consistency | Ensures data consistency through normalization
Complexity | Simple | More complex but efficient
Security | Low | High (access control, encryption)
Cost | Less expensive | Higher cost
Data Independence | No separation of data from applications | Supports logical and physical data independence
User Access | One user at a time | Supports multiple concurrent users
Data Sharing | Hard to share data across systems | Easy data sharing due to centralization
Data Integrity | Hard to enforce | Enforces constraints (Primary Key, Foreign Key)
Attributes | Requires file name and location to access data | Data is accessed through queries
Examples | C++, COBOL (manual file handling) | MySQL, Oracle, PostgreSQL

Key Takeaways

Use File System when you need simple file storage (documents, images, media).
Use DBMS for handling structured data, complex queries, and multiple user access.
DBMS ensures data integrity, security, and scalability, unlike the file system.

Entity-Relationship (ER) Model in Database Design


Introduction to ER Model
1. Purpose: The ER Model is used to design databases by identifying entities and their relationships.
2. Steps in Database Design:
o Gather requirements (functional and data).
o Perform logical/conceptual design using the ER Model.
o Physical database design (e.g., indexing) and external design (e.g., views).
3. ER Model: A graphical representation of the logical structure of a database, showing entities,
attributes, and relationships.

Why Use ER Diagrams?


1. Visual Representation: ER diagrams make it easy to convert entities and relationships into database
tables.
2. Real-World Modeling: They represent real-world objects and their interactions.
3. Non-Technical: No technical knowledge of DBMS is required to understand ER diagrams.
4. Standardization: Provides a standard way to visualize data logically.

Symbols Used in ER Model


1. Rectangles: Represent Entities.
2. Ellipses: Represent Attributes.
3. Diamond: Represents Relationships between entities.
4. Lines: Connect attributes to entities and entity sets to relationships.
5. Double Ellipse: Represents Multi-Valued Attributes.
6. Double Rectangle: Represents Weak Entities.

Components of ER Diagram
1. Entities: Objects with physical or conceptual existence (e.g., person, company).
2. Attributes: Properties that define an entity (e.g., name, age).
3. Relationships: Associations between entities (e.g., student enrolled in a course).

What is an Entity?
1. Definition: An object with physical (e.g., person) or conceptual (e.g., company) existence.
2. Entity Set: A collection of entities of the same type (e.g., all students).
o Represented in ER diagrams, but individual entities (rows) are not.

Types of Entities
1. Strong Entity:
o Has a key attribute (primary key) for unique identification.
o Does not depend on other entities.
o Represented by a rectangle.
2. Weak Entity:
o Lacks a key attribute and depends on a strong entity for identification.
o Represented by a double rectangle.
o Example: Dependents of an employee.

What are Attributes?


1. Definition: Properties that define an entity (e.g., Roll_No, Name).
2. Representation: Shown as ovals in ER diagrams.

Types of Attributes
1. Key Attribute:
o Uniquely identifies an entity (e.g., Roll_No).
o Represented by an oval with the attribute name underlined.
2. Composite Attribute:
o Made up of multiple attributes (e.g., Address = Street + City).
o Represented by an oval containing smaller ovals.
3. Multivalued Attribute:
o Can have multiple values (e.g., Phone_No).
o Represented by a double oval.
4. Derived Attribute:
o Can be derived from other attributes (e.g., Age from DOB).
o Represented by a dashed oval.

Relationship Types and Relationship Sets


1. Relationship Type:
o Association between entities (e.g., "Enrolled in" between Student and Course).
o Represented by a diamond.
2. Relationship Set:
o A collection of relationships of the same type.
o Example: Students enrolled in courses.

Degree of a Relationship Set


1. Unary Relationship: Involves one entity set (e.g., a person married to another person).
2. Binary Relationship: Involves two entity sets (e.g., student enrolled in a course).
3. Ternary Relationship: Involves three entity sets.
4. N-ary Relationship: Involves n entity sets.

Cardinality in Relationships
1. One-to-One (1:1):
o Each entity in one set relates to only one entity in another set.
o Example: One person marries one person.
2. One-to-Many (1:M):
o One entity relates to multiple entities in another set.
o Example: One department has many doctors.
3. Many-to-One (M:1):
o Many entities relate to one entity in another set.
o Example: Many students enroll in one course.
4. Many-to-Many (M:N):
o Entities in both sets relate to multiple entities.
o Example: Students enroll in multiple courses, and courses have multiple students (a SQL sketch follows below).
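How these cardinalities are typically realized in SQL (a minimal sketch built on the department–doctor and student–course examples above; table and column names are illustrative, not taken from the notes): a 1:M link becomes a foreign key on the "many" side, while an M:N link needs a separate junction table.

-- 1:M — one department has many doctors (foreign key on the "many" side)
CREATE TABLE Department (
    Dept_ID   INT PRIMARY KEY,
    Dept_Name VARCHAR(50)
);

CREATE TABLE Doctor (
    Doctor_ID INT PRIMARY KEY,
    Name      VARCHAR(50),
    Dept_ID   INT,                                   -- each doctor belongs to exactly one department
    FOREIGN KEY (Dept_ID) REFERENCES Department(Dept_ID)
);

-- M:N — students enroll in many courses and courses have many students (junction table)
CREATE TABLE Enrollment (
    Student_ID INT,
    Course_ID  INT,
    PRIMARY KEY (Student_ID, Course_ID)              -- one row per (student, course) pair;
                                                     -- foreign keys to Student and Course omitted for brevity
);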
Participation Constraints
1. Total Participation:
o Every entity in the set must participate in the relationship.
o Represented by a double line in ER diagrams.
2. Partial Participation:
o Entities may or may not participate in the relationship.
o Example: Some courses may not have any students enrolled.

How to Draw an ER Diagram


1. Identify all entities and represent them as rectangles.
2. Identify relationships and represent them as diamonds.
3. Connect entities to relationships using lines.
4. Add attributes to entities.
5. Remove redundant entities and relationships.
6. Use colors to highlight data.

Conclusion
• The ER Model is a powerful tool for designing databases.
• It visually represents entities, attributes, and relationships, making it easier to understand and
organize data.

Frequently Asked Questions (FAQs)


1. What is the purpose of an ER Diagram?
o To visually represent the structure of a database, showing entities, attributes, and
relationships.
2. How do ER Diagrams help in database design?
o They simplify the process of organizing data and understanding entity interactions.
3. What is the difference between a Weak Entity and a Strong Entity?
o A Strong Entity has a primary key, while a Weak Entity depends on a Strong Entity for
identification.
4. Can ER Diagrams represent complex relationships?
o Yes, they can model one-to-one, one-to-many, and many-to-many relationships.
5. Why are Participation Constraints used?
o To indicate whether all entities must participate in a relationship or only some may do so.
Structural Constraints of Relationships in ER Model
Introduction
Structural constraints in Entity-Relationship (ER) modeling define the participation rules and limitations
between entities in a database. These constraints ensure the correctness and efficiency of the database
schema by establishing rules for entity interaction. The two primary structural constraints are:
• Cardinality Constraints: Defines the number of instances involved in a relationship.
• Participation Constraints: Specifies whether all or some instances must participate in a relationship.
Understanding these constraints is crucial for designing accurate and functional databases.

Cardinality Ratios in ER Model


Cardinality ratios define the maximum number of entities that can be associated with each other in a
relationship. In ER diagrams, cardinality is represented by numbers (M, N) above the relationship lines.
Structural Constraints in ER Model
Structural constraints combine cardinality and participation constraints to enforce database rules. They
ensure consistency and prevent data anomalies.
Min-Max Notation
• Minimum (m): The minimum number of times an entity can participate in a relationship.
• Maximum (n): The maximum number of times an entity can participate in a relationship.
• Interpretation:
o If m = 0, it represents partial participation.
o If m ≥ 1, it represents total participation.
Example:
• In a library system, a book may or may not be borrowed by a member (partial participation).
• A student must be enrolled in at least one course (total participation).

Conclusion
Structural constraints in ER modeling play a crucial role in defining relationships between entities and
ensuring database integrity. By combining cardinality constraints and participation constraints, designers
can create well-structured and efficient databases. These constraints help prevent data inconsistencies and
optimize schema design, leading to improved query performance and schema evolution.

FAQs on Structural Constraints in ER Model


1. How are cardinality constraints represented in ER diagrams?
o Cardinality constraints are shown as symbols: ‘1’ for one-to-one, ‘1’ and ‘N’ (or crow’s foot)
for one-to-many, and crow’s feet at both ends for many-to-many relationships.
2. What is a weak entity, and how does it relate to structural constraints?
o A weak entity depends on a strong entity for identification and cannot exist independently.
Structural constraints ensure it is always associated with a strong entity.
3. How do participation constraints affect database design and data integrity?
o Total participation ensures complete data inclusion, improving integrity.
o Partial participation provides flexibility but requires careful handling to maintain
correctness.

Difference between Entity, Entity Type, and Entity Set in the ER


Model:
1. Entity
• A real-world object with distinct identity and attributes.
• Can be tangible (e.g., a person, car) or intangible (e.g., a bank account, contract).
• Example: A student named "Avi" with ID 1 in a university database.

2. Entity Type
• A category or class of similar entities that share the same attributes.
• Defines what attributes an entity will have in a database.
• Represented as a table schema in a relational database.
• Example: The "Student" entity type includes attributes like Student_ID, Name, Age.

3. Entity Set
• The collection of all entities of a particular entity type at a given time.
• Represents the data stored in the table at a specific moment.
• Can grow or shrink as entities are added or removed.
• Example: All students currently enrolled in a university form the Student entity set.

Comparison Table

Feature | Entity | Entity Type | Entity Set
Definition | A real-world object | A category of similar entities | A collection of all entities of a type
Representation | A single row (record) in a table | The schema (structure) of a table | All rows (records) in the table
Example | A student with ID 1 | The "Student" table schema | All student records in the "Student" table
Change Over Time | Fixed identity | Fixed structure | Can grow or shrink

In short:
✔ Entity = Single record (row in a table)
✔ Entity Type = Table schema (structure of records)
✔ Entity Set = Collection of records (all rows in a table)

Strong Entity vs. Weak Entity in ER Model


1. Strong Entity
• Independent: Can exist on its own.
• Primary Key: Has a unique identifier.
• Representation: Single rectangle in ER diagram.
• Relationships: Linked with a single diamond.
• Example: A Student with a unique Student_ID.
2. Weak Entity
• Dependent: Cannot exist without a strong entity.
• No Primary Key: Uses a partial key combined with the strong entity’s key.
• Representation: Double rectangle in ER diagram.
• Relationships: Linked with a double diamond (Identifying Relationship).
• Total Participation: Always requires a relationship with a strong entity.
• Example: A Dependent (like a "Spouse" or "Child") relies on an Employee.
Comparison Table

Feature Strong Entity Weak Entity

Primary Key Yes No (uses partial key)

Dependency Independent Dependent on strong entity

Representation Single rectangle Double rectangle

Relationship Representation Single diamond Double diamond (Identifying)

Participation Partial or total Always total
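As a rough SQL illustration of the Employee/Dependent example above (a sketch only; column names are assumed), a weak entity is usually mapped to a table whose primary key combines its partial key with the owning strong entity’s key, with total participation enforced by the foreign key:

CREATE TABLE Employee (
    Emp_ID INT PRIMARY KEY,
    Name   VARCHAR(50)
);

CREATE TABLE Dependent (
    Emp_ID   INT,                                     -- key of the owning strong entity
    Dep_Name VARCHAR(50),                             -- partial key of the weak entity
    Relation VARCHAR(20),                             -- e.g., 'Spouse', 'Child'
    PRIMARY KEY (Emp_ID, Dep_Name),                   -- identifiable only together with its Employee
    FOREIGN KEY (Emp_ID) REFERENCES Employee(Emp_ID)
        ON DELETE CASCADE                             -- a dependent cannot outlive its employee
);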

Integrity Constraints in DBMS


Integrity constraints in DBMS are rules that ensure the accuracy, consistency, and validity of data in a
database. These constraints help maintain data integrity and prevent errors or inconsistencies.
Types of Integrity Constraints
1. Entity Integrity Constraint
• Ensures each record in a table is uniquely identifiable.
• Rule: Every table must have a Primary Key, and it cannot be NULL.
• Example:
CREATE TABLE Student (
Student_ID INT PRIMARY KEY, -- Ensures uniqueness and non-null values
Name VARCHAR(50),
Age INT
);

• Prevents duplicate and null primary keys.

2. Referential Integrity Constraint


• Ensures relationships between tables remain consistent.
• Rule: A Foreign Key must reference a valid Primary Key in another table or be NULL.
• Example:
CREATE TABLE Student (
Student_ID INT PRIMARY KEY,
Name VARCHAR(50)
);

CREATE TABLE Course (


Course_ID INT PRIMARY KEY,
Student_ID INT,
FOREIGN KEY (Student_ID) REFERENCES Student(Student_ID) -- Enforces referential integrity
);

• Prevents orphaned records by ensuring referenced data exists.

3. Domain Integrity Constraint


• Ensures data values fall within a valid range or domain.
• Rule: Attribute values must be valid, meaningful, and within predefined limits.
• Example:
CREATE TABLE Employee (
Emp_ID INT PRIMARY KEY,
Age INT CHECK (Age >= 18), -- Restricts age to be 18 or above
Salary DECIMAL(10,2) CHECK (Salary > 0) -- Salary must be positive
);

• Prevents invalid data entry (e.g., negative salaries or ages below 18).

4. Key Integrity Constraint


• Ensures that keys uniquely identify records in a table.
• Rule: Every table must have at least one candidate key, and one Primary Key is chosen from them.
• Example:
CREATE TABLE Employee (
Emp_ID INT UNIQUE, -- Ensures uniqueness
Email VARCHAR(100) UNIQUE -- Ensures unique emails
);

• Ensures uniqueness in key attributes.

5. NOT NULL Constraint


• Ensures that a column cannot contain NULL values.
• Rule: Certain attributes must always have a value.
• Example:
CREATE TABLE Customer (
Cust_ID INT PRIMARY KEY,
Name VARCHAR(50) NOT NULL, -- Name cannot be NULL
Email VARCHAR(100) NOT NULL
);

• Ensures critical data is always provided.

Generalization, Specialization, and Aggregation in ER Model


To manage complex databases, we use Generalization, Specialization, and Aggregation as data
abstraction techniques in the ER Model. These techniques help in organizing and structuring data more
efficiently.

1. Generalization (Bottom-Up Approach)


• What is it?
Generalization is the process of combining multiple lower-level entities into a higher-level entity
based on common attributes.
• How does it work?
o It extracts common properties from two or more entities and creates a new generalized
entity.
o Common attributes move up to the generalized entity.
• Example:
o Entities STUDENT and FACULTY have common attributes like P_NAME and P_ADD.
o A new generalized entity called PERSON is created.
o Specific attributes (e.g., S_FEE for STUDENT) remain in the specialized entities.
• Key Feature:
Bottom-Up Approach (Moves from specific to general).

2. Specialization (Top-Down Approach)


• What is it?
Specialization is the process of breaking down a higher-level entity into two or more sub-entities
based on specific characteristics.
• How does it work?
o A general entity is divided into sub-entities that inherit common attributes.
o Sub-entities get additional specific attributes.
• Example:
o EMPLOYEE entity can be specialized into DEVELOPER and TESTER.
o Common attributes (e.g., E_NAME, E_SAL) belong to the EMPLOYEE entity.
o Specific attributes (e.g., TES_TYPE) belong to TESTER.
• Key Feature:
Top-Down Approach (Moves from general to specific).

3. Inheritance in Generalization & Specialization


Inheritance allows sub-entities to inherit attributes and participation constraints from a higher-level
entity.
• Attribute Inheritance:
o Lower-level entities inherit attributes from the higher-level entity.
o Example: CAR inherits the Model attribute from VEHICLE.
• Participation Inheritance:
o Sub-entities inherit participation constraints but not relationships.
o Example: VEHICLE has a relationship with CYCLE, but the relationship itself is not inherited.
4. Aggregation (Abstraction of Relationships)
• What is it?
Aggregation is used when a relationship itself needs to be treated as an entity.
• How does it work?
o A relationship and its entities are grouped together as a higher-level entity.
o This new entity can then participate in additional relationships.
• Example:
o An EMPLOYEE WORKS_FOR a PROJECT and REQUIRES MACHINERY.
o The WORKS_FOR relationship (between EMPLOYEE and PROJECT) is aggregated into a
higher entity.
o A new REQUIRE relationship is created between this aggregated entity and MACHINERY.
• Key Feature:
Helps when a relationship needs to act as an entity.

Comparison Table

Feature | Generalization | Specialization | Aggregation
Definition | Combining similar entities into a higher-level entity | Dividing a higher-level entity into sub-entities | Representing a relationship as an entity
Approach | Bottom-Up | Top-Down | Relationship Abstraction
Purpose | Reduces redundancy by merging similar entities | Increases specificity by defining sub-entities | Allows relationships to be treated as entities
Example | STUDENT + FACULTY → PERSON | EMPLOYEE → DEVELOPER, TESTER | (EMPLOYEE, PROJECT) → REQUIRES MACHINERY

Key Takeaways

✔ Generalization = Bottom-Up (Combine similar entities into one)


✔ Specialization = Top-Down (Divide one entity into sub-entities)
✔ Aggregation = Treat a relationship as an entity
✔ Inheritance allows attributes & constraints to pass between entities
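One common way to map the EMPLOYEE → DEVELOPER/TESTER specialization above onto tables is a "table per subclass" design (a sketch under that assumption; DEV_LANG is an illustrative attribute, not from the notes):

CREATE TABLE Employee (
    E_ID   INT PRIMARY KEY,
    E_Name VARCHAR(50),
    E_Sal  DECIMAL(10,2)                   -- common attributes stay with the general entity
);

CREATE TABLE Developer (
    E_ID     INT PRIMARY KEY,              -- same key as Employee (attribute inheritance)
    Dev_Lang VARCHAR(30),                  -- subclass-specific attribute (assumed)
    FOREIGN KEY (E_ID) REFERENCES Employee(E_ID)
);

CREATE TABLE Tester (
    E_ID     INT PRIMARY KEY,
    Tes_Type VARCHAR(30),                  -- subclass-specific attribute from the notes
    FOREIGN KEY (E_ID) REFERENCES Employee(E_ID)
);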

Recursive Relationships in ER Diagrams


A recursive relationship (also called a self-referential relationship) occurs when an entity relates to itself in
an ER diagram. This means that a single entity set participates more than once in the same relationship but
plays different roles each time.
Understanding Recursive Relationships
• Definition:
A relationship where an entity is related to itself is called a recursive relationship.
• Common Use Cases:
o Organizational Hierarchies (Employees reporting to other employees)
o Social Networks (Users adding other users as friends)
o Family Trees (A person has parents who are also people)
• Example:
o In a company, an EMPLOYEE entity can have a REPORTS_TO relationship with itself.
o An employee can be both a supervisor (manager) and a subordinate (worker).
o The CEO has no manager, but other employees report to someone.
Frequently Asked Questions (FAQs)
1. What is a recursive relationship?
A recursive relationship is when an entity relates to itself in an ER diagram, such as an EMPLOYEE reporting
to another EMPLOYEE.
2. How do you represent a recursive relationship in an ER diagram?
• Use a self-relationship with different role names (e.g., Manager and Subordinate).
• The relationship has one-to-many (1:N) or one-to-one (1:1) cardinality.
3. How do you implement a recursive relationship in SQL?
• Create a foreign key referencing the primary key of the same table:
FOREIGN KEY (manager_id) REFERENCES employee(id)
4. What happens when a manager is deleted?
• The foreign key’s referential action decides what happens to the subordinates:
o Their manager_id is set to NULL (ON DELETE SET NULL), or
o Their rows are deleted along with the manager (ON DELETE CASCADE).
A minimal SQL sketch of the self-referencing key follows below.
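A sketch of the EMPLOYEE self-reference (column names assumed; some engines restrict referential actions on self-referencing keys, so treat the ON DELETE clause as illustrative):

CREATE TABLE Employee (
    Emp_ID     INT PRIMARY KEY,
    Name       VARCHAR(50),
    Manager_ID INT,                                    -- NULL for the CEO, who reports to no one
    FOREIGN KEY (Manager_ID) REFERENCES Employee(Emp_ID)
        ON DELETE SET NULL                             -- subordinates are detached, not deleted
);

-- List each employee with their manager (a self-join over the recursive relationship)
SELECT e.Name AS Subordinate, m.Name AS Manager
FROM Employee e
LEFT JOIN Employee m ON e.Manager_ID = m.Emp_ID;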

Relational Model and Codd's Rules in DBMS


Relational Model Overview
• Relational Model organizes data into tables (relations) consisting of rows (tuples) and columns
(attributes).
• Simplifies data storage, retrieval, and management.
• SQL (Structured Query Language) is the most common tool used to manage and query relational
databases.
• NoSQL databases offer alternatives for unstructured data but relational databases remain popular
for applications requiring consistency and complex queries.
Key Features of the Relational Model
1. Simplicity: Easy to implement, simplifying operations.
2. Linking: Uses primary and secondary keys to relate tables.
3. Normalization: Eliminates data redundancy and improves efficiency.
4. Data Processing: Uses Relational Algebra and Relational Calculus to manipulate data.
Relational Model Terminology
• Relation (Table): Basic structure where data is stored, consisting of rows and columns.
o Example: Student table with StudentID, Name, Age, and Course.
• Relational Schema: Defines the structure of a relation.
o Example: STUDENT(StudentID, Name, Age, Course).
• Relational Instance: The actual set of data (tuples) in a relation at any given time.
• Attribute: A property or column in a relation.
o Example: StudentID, Name, Age, and Course.
• Domain of an Attribute: The set of possible values for an attribute.
o Example: The domain of Age might be valid ages like 21, 22, etc.
• Tuple: A row in a relation.
o Example: A single student entry in the Student table.
• Cardinality: The number of tuples (rows) in a relation.
o Example: A Student table holding records for three students has a cardinality of 3.
• Degree (Arity): The number of attributes (columns) in a relation.
o Example: The Student table has 4 columns, so its degree is 4.
• Primary Key: A set of attributes that uniquely identify a tuple in a relation.
• NULL values: Represents missing, unknown, or undefined values.

Relational Database Management Systems (RDBMS)


• Common RDBMS include Oracle, MySQL, PostgreSQL, Microsoft SQL Server, and MariaDB.
Relational Algebra
• A procedural language used for querying relational databases. It includes various operators:
o Union (U): Combines all elements from two relations without duplicates.
o Intersection (∩): Finds common elements in two relations.
o Difference (−): Displays elements in one relation that aren't in the other.
o Cartesian Product (X): Combines all pairs of tuples from two relations.
o Selection (σ): Chooses subsets of tuples based on a condition.
o Projection (π): Selects specific columns from a relation.
o Join: Combines tuples from two relations based on a common attribute.
o Division (÷): Returns tuples of one relation that are related to every tuple of another relation.
o Rename (ρ): Renames relations or attributes.
Features of the Relational Model and Codd's Rules
• Tables/Relations: The core structure where data is represented.
• Primary and Foreign Keys: Ensure data consistency and define relationships between tables.
• Normalization: Reduces redundancy and organizes data efficiently.
• Codd’s Rules: Set of 12 rules that a DBMS must follow to be considered a true relational database.
o Ensure data consistency, integrity, and ease of access.
• ACID Properties: Transactions must follow Atomicity, Consistency, Isolation, and Durability to
maintain data integrity.
Codd’s Twelve Rules of Relational Database
1. Rule 0 (Foundation): DBMS must manage databases solely through relational capabilities.
2. Rule 1 (Information): All data must be stored in tables.
3. Rule 2 (Guaranteed Access): Every data element must be accessible by the table name, primary key,
and attribute name.
4. Rule 3 (Systematic Treatment of NULLs): NULL values should represent missing or unknown data.
5. Rule 4 (Active Online Catalog): Database structure must be stored in an online catalog.
6. Rule 5 (Comprehensive Data Sub-language): Supports a language for definition, manipulation, and
transaction operations.
7. Rule 6 (View Updating): All views that are theoretically updatable must be updatable by the system.
8. Rule 7 (Insert/Update/Delete Operations): Supports operations like insert, delete, and update at
each relation level.
9. Rule 8 (Physical Data Independence): Changes to physical storage shouldn't affect the application.
10. Rule 9 (Logical Data Independence): Changes to the logical schema shouldn't affect applications.
11. Rule 10 (Integrity Independence): Integrity constraints should be modifiable at the database level.
12. Rule 11 (Distribution Independence): Data distribution should be transparent to users.
13. Rule 12 (Non-Subversion): Low-level access should not bypass integrity rules.
Advantages of Relational Algebra
• Simplicity: Easy to learn and use with simple operators.
• Formality: Provides a standardized way of expressing queries.
• Abstraction: Focuses on the logical structure rather than physical storage.
• Portability: Queries can be easily moved between different systems.
• Efficiency: Optimized for quick query execution.
• Extensibility: Can be extended with new operators.
Disadvantages of Relational Algebra
• Limited Expressiveness: May require advanced techniques for complex queries.
• Lack of Flexibility: Not suitable for non-relational data.
• Performance Limitations: Can be slow with large or complex datasets.
• Limited Data Types: Struggles with complex data types like multimedia or spatial data.
• Integration Issues: Can be challenging to integrate with other systems.
Conclusion
• The Relational Model revolutionized data storage and management through its use of tables.
• Codd’s Rules ensure that databases are consistent, reliable, and maintainable.
• Despite some limitations, relational databases remain widely used for their simplicity and powerful
query capabilities.

FAQs (Frequently Asked Questions)


1. What are Codd’s Rules in the Relational Model?
o Codd’s Rules define the characteristics a DBMS must follow to be considered relational,
ensuring data independence, consistency, and proper relational operations.
2. Why are Codd’s Rules important in relational databases?
o They standardize relational databases, ensuring integrity, flexibility, and the ability to handle
complex queries.
3. What is the difference between a primary key and a foreign key in a relational database?
o A primary key uniquely identifies rows in a table, while a foreign key links a column in one
table to the primary key of another table.
4. Who introduced the relational model?
o Dr. Edgar F. Codd introduced the relational model in 1970 at IBM’s San Jose Research
Laboratory.

Types of Keys in Relational Model (Candidate, Super, Primary,


Alternate, and Foreign)
Why Keys are Required in a DBMS:
Keys are essential in a DBMS to uniquely identify records in a table, prevent duplication, and ensure data
accuracy and integrity. They also establish relationships between tables, enabling efficient querying and
management of data. Without keys, retrieving or updating specific records would be difficult, leading to
data inconsistency.
Different Types of Keys:
1. Super Key:
o A super key is a set of one or more attributes (columns) that uniquely identifies a tuple
(record) in a table.
o It can include extra attributes that aren't necessary for uniqueness.
o Example: In the STUDENT table, (STUD_NO, PHONE) is a super key, even though STUD_NO
alone is sufficient.
2. Candidate Key:
o A candidate key is a minimal set of attributes that can uniquely identify a tuple.
o It is a super key with no extra attributes.
o A table can have multiple candidate keys, but only one is chosen as the primary key.
o Example: In the STUDENT table, STUD_NO is a candidate key.
3. Primary Key:
o A primary key is one of the candidate keys selected to uniquely identify records in a table.
o It must have unique values and cannot contain NULL values.
o It can be a single column or a combination of multiple columns (composite primary key).
o Example: In the STUDENT table, STUD_NO is the primary key.
4. Alternate Key:
o An alternate key is any candidate key that was not chosen as the primary key.
o It can uniquely identify records in the table, just like the primary key.
o Example: In the STUDENT table, PHONE could be an alternate key if STUD_NO is the primary
key.
5. Foreign Key:
o A foreign key is an attribute in one table that references the primary key of another table.
o It establishes relationships between tables and ensures referential integrity.
o Example: In the STUDENT_COURSE table, STUD_NO is a foreign key referring to STUD_NO in
the STUDENT table.
6. Composite Key:
o A composite key is made up of two or more columns used together to uniquely identify a
tuple.
o Example: In the STUDENT_COURSE table, {STUD_NO, COURSE_NO} can form a composite
key.
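Tying the key types above together in SQL (a sketch; column types and the NAME column are assumed, while STUD_NO, PHONE and COURSE_NO follow the notes’ examples):

CREATE TABLE STUDENT (
    STUD_NO INT PRIMARY KEY,               -- candidate key chosen as the primary key
    PHONE   VARCHAR(15) UNIQUE,            -- alternate key: a candidate key not chosen as primary
    NAME    VARCHAR(50)
);                                          -- (STUD_NO, PHONE) together is a super key with a redundant attribute

CREATE TABLE STUDENT_COURSE (
    STUD_NO   INT,
    COURSE_NO INT,
    PRIMARY KEY (STUD_NO, COURSE_NO),       -- composite key
    FOREIGN KEY (STUD_NO) REFERENCES STUDENT(STUD_NO)   -- foreign key preserving referential integrity
);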
Conclusion:
The relational model relies on various keys like candidate, primary, alternate, and foreign keys to uniquely
identify records and establish relationships between tables. The proper use of these keys is crucial in
creating robust and efficient relational databases.
FAQs:
1. Why are keys necessary for DBMS?
o Keys uniquely identify records in a table and help establish relationships between tables,
ensuring data consistency and efficient access.
2. What is a Unique Key?
o A unique key enforces uniqueness like a primary key but, unlike a primary key, it can contain
NULL values (how many NULLs are permitted depends on the DBMS).
3. What is an Artificial Key?
o An artificial key is used when no existing attributes can serve as a primary key, often because
they are too complex.
4. Can one column have two foreign keys?
o Yes, a column can have multiple foreign keys, typically in composite relationships where the
column references more than one table.
5. Can we update a foreign key in a table?
o Yes, foreign keys can be updated, but the new value must still exist in the referenced table to
maintain referential integrity. Cascading updates can also be set to automate this process.

Constraints in DBMS
1. Primary Key Constraint: Ensures that each record in a table is unique (e.g., "StudentID" for a
"Students" table).
2. Foreign Key Constraint: Links a table to another by referencing its primary key, maintaining
referential integrity (e.g., linking "OrderID" in an "OrderDetails" table to "Orders" table).
3. Unique Constraint: Ensures all values in a column are different (e.g., "Email" column in a "Users"
table).
4. Not Null Constraint: Ensures that a column cannot have null values (e.g., "LastName" column in an
"Employees" table).
5. Check Constraint: Ensures that values meet a specific condition (e.g., "Age" must be over 18).
6. Default Constraint: Provides a default value for a column if no value is specified (e.g., default
"Pending" status in an "Orders" table).

Introduction to Relational Algebra in DBMS


Relational Algebra is a procedural query language used in relational databases. It provides the theoretical
foundation for relational databases and SQL. The primary purpose of Relational Algebra is to define
operators that transform one or more input relations into an output relation. It uses mathematical
operators and does not rely on English keywords, using symbols to represent operations.

What is Relational Algebra?


Relational Algebra is a set of operations used to manipulate and query data from a relational database.
These operations allow users to filter, combine, and organize data efficiently, serving as the basis for most
database queries. Though often implemented via SQL, it is a foundational aspect of querying relational
data.
Fundamental Operators of Relational Algebra
1. Selection (σ): Filters rows based on a given condition.
o Example: To select records where column C > 3, use σ(C > 3)(R).
o Purpose: Filters out rows that match the specified condition.
2. Projection (π): Selects specific columns from a relation.
o Example: To select columns B and C, use π(B,C)R.
o Purpose: Focuses on specific attributes (columns).
3. Union (U): Combines the results of two queries with the same number of columns and data types.
o Example: π(Student_Name)FRENCH U π(Student_Name)GERMAN.
o Purpose: Merges two sets of data.
4. Set Difference (-): Returns rows present in one table but not in another.
o Example: π(Student_Name)FRENCH - π(Student_Name)GERMAN.
o Purpose: Finds elements that are exclusive to one set.
5. Set Intersection (∩): Returns rows that are common between two sets.
o Example: π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN.
o Purpose: Identifies the overlap between two sets.
6. Rename (ρ): Renames a relation or its attributes temporarily.
o Example: ρ(a/b)R renames attribute 'b' to 'a'.
o Purpose: Resolves ambiguities or simplifies complex queries.
7. Cartesian Product (X): Combines every row from one table with every row of another table,
creating all possible combinations.
o Example: A X B.
o Purpose: Precursor to complex operations like joins.
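Although Relational Algebra uses symbols rather than keywords, each fundamental operator has a direct SQL counterpart. A rough correspondence, assuming a hypothetical relation R(A, B, C) and the FRENCH/GERMAN relations used above (EXCEPT is called MINUS in Oracle):

-- Selection σ(C > 3)(R): filter rows
SELECT * FROM R WHERE C > 3;

-- Projection π(B, C)(R): keep chosen columns (DISTINCT mirrors set semantics)
SELECT DISTINCT B, C FROM R;

-- Union, set difference and intersection on union-compatible results
SELECT Student_Name FROM FRENCH
UNION
SELECT Student_Name FROM GERMAN;

SELECT Student_Name FROM FRENCH
EXCEPT
SELECT Student_Name FROM GERMAN;

SELECT Student_Name FROM FRENCH
INTERSECT
SELECT Student_Name FROM GERMAN;

-- Cartesian product A X B
SELECT * FROM A CROSS JOIN B;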

Derived Operators
1. Natural Join (⋈): Combines tables based on common attributes with matching values.
o Example: EMP ⋈ DEPT where Dept_Name is common between EMP and DEPT.
o Purpose: Combines related data using shared attributes.
2. Conditional Join: Similar to Natural Join, but allows for custom conditions such as >=, <, or ≠.
o Example: Join R and S where R.Marks >= S.Marks.
o Purpose: More flexible join conditions.

Relational Calculus
While Relational Algebra is procedural, Relational Calculus is non-procedural. It describes what data is
required but not how to obtain it. There are two types:
• Tuple Relational Calculus (TRC)
• Domain Relational Calculus (DRC)

Conclusion
Relational Algebra may seem theoretical, but it plays a critical role in database query design and
optimization. By understanding its operators, you can break down complex queries into simpler ones and
efficiently retrieve and manipulate data in relational databases. Whether you're working with selection,
projection, or joins, these operators are essential tools for anyone managing a relational database.

FAQs on Relational Algebra in DBMS


• What is a relational database?
A relational database stores data in tables (rows and columns), with relationships between tables
defined by shared attributes, making data retrieval more organized.
• What is the relational model?
The relational model organizes data into tables, each representing a specific type of information,
and defines how tables relate to one another through keys.
• Difference between Selection and Projection?
Selection filters rows, while projection selects specific columns from a table.
• Cartesian Product vs Join?
Cartesian Product creates all possible combinations of rows, whereas joins combine rows based on
a related column (usually a key).
• Why is Relational Algebra important?
It provides a framework for designing efficient queries, helping to retrieve and manipulate data in
relational databases more effectively.

FAQs
• What is the difference between Relational Algebra and SQL?
o Relational Algebra is procedural (it tells how to retrieve data), whereas SQL is declarative (it
tells what data to retrieve).
• Why is Relational Algebra important?
o It forms the theoretical foundation for relational databases and query optimization.
• Can Relational Algebra handle complex queries?
o Yes, by combining basic operators, complex queries can be constructed.
• What is meant by “union-compatible” relations?
o Relations are union-compatible if they have the same number of attributes with matching
data types.
• How does the Join operator work in Relational Algebra?
o It combines rows based on a specified condition, usually involving a common attribute.

Extended Operators in Relational Algebra


Extended operators enhance basic relational algebra operations to handle complex queries efficiently.
These include Intersection, Conditional Join, Equijoin, Natural Join, Outer Joins, and Division.

1. Intersection (∩)
• Returns common tuples between two relations.
• Relations must be union-compatible (same attributes and domains).
2. Conditional Join (⋈ₓ)
• Joins two relations based on any condition, not just equality.
• Uses selection and cross-product.
3. Equijoin (⋈)
• A specific Conditional Join based on equality.
4. Natural Join (⋈)
• Automatically joins two relations on attributes with the same name.
• Duplicate columns are removed.
5. Left Outer Join (⟕)
• Returns all tuples from the left relation, even if no match exists.
• Unmatched tuples in the right relation get NULL values.
6. Right Outer Join (⟖)
• Returns all tuples from the right relation, even if no match exists.
• Unmatched tuples in the left relation get NULL values.
7. Full Outer Join (⟗)
• Returns all tuples from both relations.
• If no match is found, missing attributes are filled with NULL.
8. Division (÷)
• Finds tuples in A that are related to all tuples in B.
• Used for "For All" queries.
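Division has no dedicated SQL keyword; one common way to express a "for all" query (a sketch assuming Enrolled(Student_ID, Course_ID) and Course(Course_ID) tables, which are not defined in the notes) is to compare counts:

-- Students enrolled in ALL courses (Enrolled ÷ Course style query)
SELECT Student_ID
FROM Enrolled
GROUP BY Student_ID
HAVING COUNT(DISTINCT Course_ID) = (SELECT COUNT(*) FROM Course);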

SQL Joins: An Overview


SQL joins are critical for querying and combining data from multiple tables based on relationships between
columns. Understanding how and when to use different types of joins will significantly improve your ability
to retrieve meaningful insights from relational databases. The most commonly used SQL joins are INNER
JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, and NATURAL JOIN.
What is a SQL Join?
A JOIN in SQL is used to combine rows from two or more tables based on a related column between them.
It helps to bring data from multiple tables together for meaningful analysis. Joins are often paired with
other clauses, like the WHERE clause, to filter data as needed.
Types of SQL Joins
1. INNER JOIN
o Description: The INNER JOIN keyword returns rows when there is a match in both tables. If
there’s no match, the row is excluded from the result set.
o Syntax:
   SELECT table1.column1, table1.column2, table2.column1, ...
   FROM table1
   INNER JOIN table2
   ON table1.matching_column = table2.matching_column;
o Example:
   SELECT StudentCourse.COURSE_ID, Student.NAME, Student.AGE
   FROM Student
   INNER JOIN StudentCourse
   ON Student.ROLL_NO = StudentCourse.ROLL_NO;
o Result: Displays student names, ages, and course IDs for students enrolled in courses.
2. LEFT JOIN (or LEFT OUTER JOIN)
o Description: The LEFT JOIN returns all rows from the left table and matching rows from the
right table. If no match is found, NULL values are returned for the right table's columns.
o Syntax:
   SELECT table1.column1, table1.column2, table2.column1, ...
   FROM table1
   LEFT JOIN table2
   ON table1.matching_column = table2.matching_column;
o Example:
   SELECT Student.NAME, StudentCourse.COURSE_ID
   FROM Student
   LEFT JOIN StudentCourse
   ON Student.ROLL_NO = StudentCourse.ROLL_NO;
o Result: Retrieves all students and the courses they are enrolled in. Students without courses
will have NULL for the COURSE_ID.
3. RIGHT JOIN (or RIGHT OUTER JOIN)
o Description: The RIGHT JOIN returns all rows from the right table and matching rows from
the left table. If no match is found, NULL values are returned for the left table's columns.
o Syntax:
   SELECT table1.column1, table1.column2, table2.column1, ...
   FROM table1
   RIGHT JOIN table2
   ON table1.matching_column = table2.matching_column;
o Example:
   SELECT Student.NAME, StudentCourse.COURSE_ID
   FROM Student
   RIGHT JOIN StudentCourse
   ON Student.ROLL_NO = StudentCourse.ROLL_NO;
o Result: Retrieves all courses and the students enrolled in them. If a course has no students,
the student-related columns will be NULL.
4. FULL JOIN (or FULL OUTER JOIN)
o Description: The FULL JOIN returns all rows from both the left and right tables. If a row has
no match in the opposite table, NULL values are used for the missing data.
o Syntax:
   SELECT table1.column1, table1.column2, table2.column1, ...
   FROM table1
   FULL JOIN table2
   ON table1.matching_column = table2.matching_column;
o Example:
   SELECT Student.NAME, StudentCourse.COURSE_ID
   FROM Student
   FULL JOIN StudentCourse
   ON Student.ROLL_NO = StudentCourse.ROLL_NO;
o Result: Retrieves all students and courses. Students without courses and courses without
students will have NULL values in their respective columns.
5. NATURAL JOIN
o Description: A NATURAL JOIN automatically joins tables based on columns with the same
name and data type in both tables. It eliminates duplicate columns in the result set.
o Syntax:
   SELECT table1.column1, table1.column2, table2.column1, ...
   FROM table1
   NATURAL JOIN table2;
o Example:
   SELECT Employee.Emp_id, Employee.Emp_name, Department.Dept_name
   FROM Employee
   NATURAL JOIN Department;
o Result: Joins the Employee and Department tables based on the common column Dept_id,
returning each employee and their department.
Practical Use Cases and Visual Representation
1. INNER JOIN:
o Retrieves only matched rows between tables, like fetching students enrolled in courses.
o Example: List students and the courses they are enrolled in.
2. LEFT JOIN:
o Retrieves all rows from the left table, ensuring no data is lost from it, even when there is no
match in the right table.
o Example: List all students and their course enrollments, even if some students are not
enrolled in any courses.
3. RIGHT JOIN:
o Retrieves all rows from the right table, even when there is no match in the left table.
o Example: List all courses and their enrolled students, even if some courses have no students.
4. FULL JOIN:
o Combines the results of both LEFT JOIN and RIGHT JOIN, ensuring all data from both tables is
included.
o Example: List all students and all courses, even if some students are not enrolled in any
courses and some courses have no students.
5. NATURAL JOIN:
o Automatically joins tables based on common columns with the same name and data type,
simplifying queries.
o Example: Combine employee details and department details based on the common Dept_id.
Conclusion
SQL joins are essential for efficiently querying relational databases, combining data from multiple tables
based on logical relationships. By mastering INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, and NATURAL
JOIN, we can write more powerful and meaningful queries that address complex business needs.
FAQs
1. What are the 4 types of joins in SQL?
o INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN.
2. What is a join in SQL?
o A join combines rows from two or more tables based on a related column between them.
3. What is the difference between INNER JOIN and LEFT JOIN?
o INNER JOIN returns only matched rows from both tables, while LEFT JOIN returns all rows
from the left table, including unmatched rows, with NULLs for the right table's columns.

Join Operation vs Nested Query in DBMS


In relational databases, both join operations and nested queries (subqueries) are used to retrieve data
from multiple tables. They serve similar purposes but differ in their approach, performance, and use cases.
Here’s a detailed comparison:

1. Join Operation
A join operation is used to combine data from two or more tables based on a common column or
condition. The result of a join is a single result set containing columns from all the involved tables.
• Types of Joins:
o INNER JOIN: Returns rows where there is a match in both tables.
o LEFT JOIN (OUTER JOIN): Returns all rows from the left table, and matching rows from the
right table. If no match, returns NULL for columns from the right table.
o RIGHT JOIN (OUTER JOIN): Similar to LEFT JOIN but returns all rows from the right table.
o NATURAL JOIN: Automatically joins tables based on columns with the same name and
compatible data types.
Example of an Inner Join:
Let's say we have two tables:
Table1 (ID, Name):

ID Name

1 John

2 Sarah

3 David

Table2 (ID, Address):

ID Address

1 123 Main St.

2 456 Elm St.

4 789 Oak St.

SQL Query:
SELECT Table1.ID, Table1.Name, Table2.Address
FROM Table1
INNER JOIN Table2
ON Table1.ID = Table2.ID;
Result:

ID Name Address

1 John 123 Main St.

2 Sarah 456 Elm St.

Explanation: The query combines rows from both tables based on the common column ID.

2. Nested Query (Subquery)


A nested query (also called a subquery) is a query embedded within another SQL query. The subquery is
executed first and its results are passed to the outer query.
• Types of Subqueries:
o Single-row subquery: Returns a single value to the outer query.
o Multiple-row subquery: Returns multiple values.
o Correlated subquery: References columns from the outer query, executing for each row of
the outer query.
Example of a Nested Query:
Using the same tables:
Table1 (ID, Name):

ID Name

1 John

2 Sarah

3 David

Table2 (ID, Address):

ID Address

1 123 Main St.

2 456 Elm St.

4 789 Oak St.

SQL Query:
SELECT Name
FROM Table1
WHERE ID IN (SELECT ID FROM Table2);
Result:
Name

John

Sarah

Explanation: The subquery (SELECT ID FROM Table2) retrieves IDs from Table2, and the outer query selects
the names of the people in Table1 whose IDs match those returned by the subquery.
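The query above uses a multiple-row subquery. A correlated subquery, by contrast, references the current row of the outer query and is conceptually re-evaluated for each row; a sketch on the same tables using EXISTS (one common form) returns the same names:

SELECT Name
FROM Table1
WHERE EXISTS (
    SELECT 1
    FROM Table2
    WHERE Table2.ID = Table1.ID    -- refers to the outer query's current row
);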

Comparison: Join vs Nested Query

Aspect | Join Operation | Nested Query
Definition | Combines rows from multiple tables based on a condition. | Query within a query; results of the inner query are used by the outer query.
Performance | Often more efficient, especially for large datasets. | May be slower for large datasets, as the subquery may be executed for each row.
Use Cases | Best for straightforward relationships between tables. | Best for complex conditions or when a subquery is required for a particular part of the query.
Readability | Can become complex with multiple joins. | Easier to understand for complex queries, as they isolate different parts of the task.
Flexibility | Less flexible in cases requiring multiple conditions or nested logic. | More flexible for advanced conditions or operations on subsets of data.
Distributed Databases | Joins may require fetching entire tables from multiple locations, leading to higher overhead. | Nested queries can reduce data transferred, as only relevant data is fetched from each location.

When to Use Joins vs Nested Queries


• Use Joins when:
o You need to retrieve data from multiple tables based on shared columns.
o The query is relatively simple and doesn’t require complex conditions or operations on
subsets.
o Performance is a concern, especially when dealing with large datasets.
• Use Nested Queries when:
o You need to perform complex filtering or calculations that depend on intermediate results.
o You require operations like checking for membership (IN), comparing values (ANY, ALL), or
working with aggregate data in one part of the query.
o Readability and clarity are important, especially for more complex tasks.

Functional Dependency and Attribute Closure


In Relational Database Management Systems (RDBMS), functional dependency (FD) and attribute closure
are essential concepts for maintaining data integrity, building normalized databases, and optimizing
queries. Let's go through these concepts in detail:

1. Functional Dependency (FD)


A functional dependency specifies a relationship between two sets of attributes in a relation. If a relation
has an FD, it means that for any two tuples in the relation, if they have the same values for a set of
attributes (called the determinant), they must also have the same value for another set of attributes (called
the dependent).
• Notation: If attribute set A functionally determines attribute set B, we write A → B.
• Example:
o If STUD_NO → STUD_NAME, this means that for any two records with the same STUD_NO,
the STUD_NAME will be the same.
o Similarly, STUD_NO → STUD_PHONE implies that the STUD_PHONE is uniquely determined
by STUD_NO.
Example Table: STUDENT

STUD_NO STUD_NAME STUD_PHONE STUD_STATE STUD_COUNTRY STUD_AGE
101     John      1234567890 NY         USA          22
102     Sarah     9876543210 CA         USA          21
103     David     1112233445 NY         USA          23
In this case:
• STUD_NO → STUD_NAME, STUD_NO → STUD_PHONE, STUD_NO → STUD_STATE, etc., hold true
because STUD_NO uniquely identifies these attributes.

2. How to Find Functional Dependencies in a Relation


To identify FDs in a relation, consider the domain of the attributes. If one attribute (or set of attributes)
uniquely identifies another attribute, a functional dependency exists.
Example: In the STUDENT relation, the following FDs hold:
• STUD_NO → STUD_NAME (A student's number determines their name)
• STUD_NO → STUD_PHONE
• STUD_NO → STUD_STATE
• STUD_STATE → STUD_COUNTRY (If two students share the same state, they are from the same
country)
FD Set for STUDENT:
• { STUD_NO → STUD_NAME, STUD_NO → STUD_PHONE, STUD_NO → STUD_STATE, STUD_NO →
STUD_COUNTRY, STUD_NO → STUD_AGE, STUD_STATE → STUD_COUNTRY }
3. Attribute Closure
Attribute closure is the set of all attributes that can be functionally determined by a given set of attributes.
To find the closure of an attribute set:
1. Add all the attributes in the attribute set to the result.
2. Recursively add attributes that can be functionally determined from the current set.
Steps:
1. Start with the given attribute set.
2. Apply functional dependencies to add new attributes to the set until no more attributes can be
added.
Example:
For STUD_NO+ (the closure of STUD_NO):
• STUD_NO → STUD_NAME, STUD_NO → STUD_PHONE, STUD_NO → STUD_STATE, STUD_NO →
STUD_COUNTRY, STUD_NO → STUD_AGE
• STUD_NO+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE}
For STUD_STATE+ (the closure of STUD_STATE):
• STUD_STATE → STUD_COUNTRY
• STUD_STATE+ = {STUD_STATE, STUD_COUNTRY}

4. Functional Dependency Set (FD Set)


An FD set is the collection of all functional dependencies that hold true for a given relation. The FD set for a
relation helps in identifying the relationships between attributes and is essential for designing and
optimizing databases.
For example, the FD set for the STUDENT relation is:
• {STUD_NO → STUD_NAME, STUD_NO → STUD_PHONE, STUD_NO → STUD_STATE, STUD_NO →
STUD_COUNTRY, STUD_NO → STUD_AGE, STUD_STATE → STUD_COUNTRY}

5. How to Find Candidate Keys and Super Keys Using Attribute Closure
• Super Key: A set of attributes that can uniquely identify a tuple in a relation. If the closure of an
attribute set contains all attributes in the relation, the set is a super key.
• Candidate Key: A minimal super key (i.e., no proper subset of the attribute set can uniquely identify
all attributes).
Steps to Find Candidate Keys:
1. Find the closure of different attribute sets.
2. The attribute set whose closure contains all the attributes of the relation is a super key.
3. If no subset of the set can be a super key, it is a candidate key.
Example: For STUD_NO+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY,
STUD_AGE}, STUD_NO is a candidate key.
Prime Attribute: Attributes that are part of any candidate key.
• In the STUDENT relation, STUD_NO is a prime attribute because it is part of the candidate key.
Non-Prime Attribute: Attributes that are not part of any candidate key.
• In the STUDENT relation, STUD_NAME, STUD_PHONE, etc., are non-prime attributes.

6. Example Questions on Functional Dependency


Q1: Find the Key for the Relation Scheme
Given:
• Relation R = {E, F, G, H, I, J, K, L, M, N}
• Functional Dependencies:
o {E, F} → {G}
o {F} → {I, J}
o {E, H} → {K, L}
o K → {M}
o L → {N}
Solution:
• {E, F}+ = {E, F, G, I, J}
• {E, F, H}+ = {E, F, G, I, J, K, L, M, N}
Thus, {E, F, H} is a candidate key.
Q2: Deriving an FD from a Given FD Set
Given: A → B, A → C, B → D, C → E, find whether A → D holds.
• {A}+ = {A, B, C, D, E} Thus, A → D holds, as D is in the closure of A.

Conclusion
Functional Dependency and Attribute Closure are fundamental concepts in database design, helping
ensure data consistency, integrity, and efficiency. By understanding these tools, one can:
• Identify candidate keys and super keys.
• Determine the closure of attribute sets.
• Optimize database queries and designs for better performance.

➤ Prime Attribute
An attribute that is part of a candidate key (i.e., it contributes to uniquely identifying records).
Example: In a STUDENT table with (Roll_No, Name, Age), if Roll_No is the candidate key, then Roll_No is
a prime attribute.

➤ Non-Prime Attribute
An attribute that is not part of any candidate key.
Example: If Roll_No is the candidate key, then Name and Age are non-prime attributes.
Example Table

Roll_No (PK) Name Age
101          Ram  18
102          Sita 19

• Roll_No (Prime Attribute)
• Name, Age (Non-Prime Attributes)

Functional Dependencies (FDs) in DBMS


1. Introduction
A functional dependency (FD) is a constraint between two sets of attributes in a relation (table). It defines
how one attribute (or a group of attributes) determines another attribute uniquely.

➤ Notation
X→Y
This means:
• If two tuples have the same value for X, they must have the same value for Y.
• X is called the determinant, and Y is the dependent attribute.

Example:
• Consider a STUDENT table:

Roll_No Name Age Course
101     Ram  18  CS
102     Sita 19  IT
103     Amit 18  CS

• Here, Roll_No → Name, Age, Course because:
o If two rows have the same Roll_No, they must have the same Name, Age, and Course.

2. Types of Functional Dependencies


1. Trivial Functional Dependency

X → Y is trivial if Y is a subset of X.
Example: {Roll_No, Name} → Name
• Since Name is already part of {Roll_No, Name}, it is trivial.

General Rule
• A functional dependency X → Y is trivial if X already contains Y.

2. Non-Trivial Functional Dependency

X → Y is non-trivial if Y is NOT a subset of X.


Example: Roll_No → Name
• Name is not a part of Roll_No, so this is a non-trivial dependency.

General Rule
• A dependency is non-trivial if X does not contain Y.

3. Fully Functional Dependency

X → Y is fully functional if Y depends on the entire X, not just a part of it.


Example: (Roll_No, Subject) → Marks
• Marks depend on both Roll_No and Subject together, not individually.

General Rule
• If removing any part of X breaks the dependency, it is fully functional.

4. Partial Functional Dependency

A dependency where an attribute depends on only part of a composite key.


Example: (Roll_No, Subject) → Name
• Name depends only on Roll_No, not Subject, so this is a partial dependency.

Why is it a problem?
• Leads to redundancy (same name repeated multiple times).
• Violates 2NF (Second Normal Form).

5. Transitive Functional Dependency

If X → Y and Y → Z, then X → Z is a transitive dependency.


Example:
• Roll_No → Class & Class → Teacher
• Since Roll_No indirectly determines Teacher, we get Roll_No → Teacher (transitive dependency).

Why is it a problem?
• Leads to redundancy.
• Violates 3NF (Third Normal Form).

6. Multivalued Dependency (MVD)


A multivalued dependency exists when, for a single value of one attribute, another attribute can take multiple independent values.
Example: Course →→ Book (A course can have multiple books, but books do not determine the
course).

Why is it a problem?
• Leads to data redundancy.
• Violates 4NF (Fourth Normal Form).

3. Functional Dependency Closure (F⁺)

The closure of a set of functional dependencies is the complete set of dependencies that can be
logically inferred.
Steps to Find Closure (F⁺)
1. Start with a given set of attributes X.
2. Apply all functional dependencies iteratively.
3. Find all attributes that can be derived from X.

Example:
Given:
• A→B
• B→C
Find A⁺ (Closure of A):
• A → B (Given)
• B → C (Since B is known, we get C)
• So, A⁺ = {A, B, C}

4. Armstrong’s Axioms for Functional Dependencies


Armstrong's Axioms are a set of rules to infer all functional dependencies.
Axioms (Rules)

1⃣ Reflexivity: If Y ⊆ X, then X → Y
• Example: {Roll_No, Name} → Name

2⃣ Augmentation: If X → Y, then X, Z → Y, Z
• Example: If Roll_No → Name, then {Roll_No, Age} → {Name, Age}

3⃣ Transitivity: If X → Y and Y → Z, then X → Z


• Example: Roll_No → Class and Class → Teacher ⟹ Roll_No → Teacher

5. Importance of Functional Dependencies in Normalization


Functional dependencies are used in Normalization to remove redundancy and improve database
efficiency.

How FDs affect normal forms?


1NF (First Normal Form):
• Ensures atomicity (each attribute contains only one value).

2NF (Second Normal Form):


• Removes partial dependencies.

3NF (Third Normal Form):


• Removes transitive dependencies.

BCNF (Boyce-Codd Normal Form):


• Ensures every determinant is a candidate key.

6. Key Takeaways

✔ Functional Dependencies define relationships between attributes.


✔ Prime Attributes are part of the candidate key, while Non-Prime Attributes are not.
✔ Normalization uses functional dependencies to remove redundancy.
✔ Armstrong's Axioms help infer new functional dependencies.

Functional dependencies are critical for designing an efficient and well-structured database.

Anomalies in the Relational Model


What Are Anomalies?
Anomalies in the relational model refer to issues that arise during data insertion, deletion, or modification
in relational databases. These anomalies typically occur due to poor database design, lack of normalization,
data redundancy, or improper use of primary or foreign keys. There are three primary types of anomalies:
1. Insertion Anomalies
2. Deletion Anomalies
3. Update Anomalies
Causes of Anomalies
• Poor Database Management: Storing data in a flat, non-normalized structure.
• Data Redundancy: Storing duplicate data across multiple records.
• Improper Key Usage: Not using primary or foreign keys correctly, leading to integrity issues.
These problems lead to inconsistencies during insert, update, or delete operations, which ultimately
undermine the reliability of the database.
Types of Anomalies:
1. Insertion Anomalies:
o Definition: Occur when it's impossible to insert data into the database because some
required fields are missing or incomplete.
o Example: Trying to insert a record into the STUDENT_COURSE table with STUD_NO = 7
where there is no corresponding STUD_NO = 7 in the STUDENT table will cause an insertion
anomaly.
2. Deletion Anomalies:
o Definition: Occur when deleting a record unintentionally removes valuable data or creates
inconsistencies.
o Example: Deleting a record from the STUDENT table (e.g., STUD_NO = 1) might delete all
associated records from the STUDENT_COURSE table, leading to the loss of valuable
information.
3. Update Anomalies:
o Definition: Occur when updating data in one location but failing to update it in related
locations, resulting in inconsistent or incorrect data.
o Example: If an employee’s salary is updated in one record but not in others, it could lead to
incorrect salary reporting.
Solutions to Anomalies:
• On Delete/Update Set NULL: If a record in the referenced table is deleted or updated, the
corresponding referencing attribute in the referencing table is set to NULL.
• On Delete/Update Cascade: If a record in the referenced table is deleted or updated, the
corresponding record in the referencing table is also deleted or updated.
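A sketch of how these referential actions are declared, assuming a STUDENT table keyed by STUD_NO already exists (data types here are illustrative):

CREATE TABLE STUDENT_COURSE (
    STUD_NO   INT,
    COURSE_NO VARCHAR(10),
    FOREIGN KEY (STUD_NO) REFERENCES STUDENT(STUD_NO)
        ON DELETE CASCADE      -- or ON DELETE SET NULL
        ON UPDATE CASCADE
);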
Removal of Anomalies (Normalization):
Normalization is the process of organizing data into tables to eliminate redundancy and ensure consistency.
This reduces anomalies by breaking data into smaller, manageable parts and ensuring that relationships are
logical and well-structured.
The normalization process follows several stages:
1. First Normal Form (1NF):
o Ensures that each column contains atomic values (no repeating groups).
2. Second Normal Form (2NF):
o Removes partial dependencies, ensuring that all non-key attributes are fully dependent on
the primary key.
3. Third Normal Form (3NF):
o Removes transitive dependencies, ensuring that non-key attributes depend only on the
primary key.
By applying these steps, a database is structured in a way that minimizes redundancy and ensures data
integrity, which prevents insertion, update, and deletion anomalies.
FAQs:
1. What is Normalization?
o Normalization is the process of dividing large tables into smaller ones to remove data
redundancy and anomalies. It improves database organization and integrity.
2. What are Anomalies in the Relational Model?
o Anomalies refer to inconsistencies or errors in the database due to poor data management,
redundancy, or improper key usage. These issues can be removed through normalization.
3. How Can Anomalies Be Removed?
o Anomalies can be eliminated by normalizing the database. This involves organizing data into
smaller tables and applying rules that ensure consistent and efficient data storage.

Introduction to Database Normalization


Normalization is a critical process in database design aimed at organizing the attributes of a database to
minimize redundancy and prevent anomalies. The purpose of normalization is to improve efficiency, ensure
consistency, and make it easier to manage and maintain the data over time.
What is Normalization?
Normalization is the process of organizing the attributes of the database to reduce or eliminate data
redundancy. Data redundancy refers to the unnecessary repetition of the same data in various parts of the
database, which can lead to inefficient use of storage, inconsistencies, and errors during data manipulation.
Why is Normalization Necessary?
The primary goal of normalization is to avoid the following anomalies that arise due to redundancy:
1. Insertion Anomalies: These occur when it is not possible to insert data due to missing or
incomplete fields.
2. Deletion Anomalies: Deleting a record may unintentionally remove critical related data.
3. Updation Anomalies: When data is modified inconsistently across multiple records.
Normalization helps in eliminating these anomalies and improving the overall structure of the database,
ensuring that data can be efficiently and accurately stored, updated, and deleted.
Prerequisites for Understanding Database Normalization
To understand database normalization, a few fundamental concepts are necessary:
1. Keys: Unique identifiers in a table (e.g., student ID).
2. Functional Dependency: Describes the relationship between data attributes where one attribute
can determine the value of another.
Additionally, concepts like Dependency Preserving Decomposition and Lossless Decomposition are crucial
for splitting tables while maintaining the integrity of the data.
Features of Database Normalization
• Elimination of Data Redundancy: Reduces repetitive data and ensures consistency.
• Ensuring Data Consistency: Prevents discrepancies caused by redundant data.
• Simplification of Data Management: Eases the process of updating and managing the database.
• Improved Database Design: Makes the database flexible and adaptable.
• Avoiding Update Anomalies: Ensures smooth updating without inconsistent data.
• Standardization: Ensures that the database follows a consistent and uniform structure.
Normal Forms in DBMS
There are several normal forms in database normalization, each offering specific guidelines for structuring
the data:
1. First Normal Form (1NF): Requires that every attribute in the relation is single-valued.
2. Second Normal Form (2NF): In addition to 1NF, it ensures that all non-key attributes are fully
dependent on the primary key.
3. Third Normal Form (3NF): Ensures no transitive dependency for non-prime attributes and is based
on 2NF.
4. Boyce-Codd Normal Form (BCNF): A stricter version of 3NF where every functional dependency’s
left-hand side is a superkey.
5. Fourth Normal Form (4NF): Ensures that no multi-valued dependencies exist.
6. Fifth Normal Form (5NF): Ensures that the relation cannot be further decomposed without loss of
information.
Advantages of Normalization
• Eliminates Redundancy: Reduces the need for repeated data entries.
• Improves Data Integrity: Ensures consistency and accuracy of data.
• Simplifies Updates: Changes are made in one place, reducing errors.
• Flexible Queries: Easier to query the database with smaller, more specific tables.
• Integration: Ensures consistency across different applications using the same database.
Disadvantages of Normalization
• Performance Overhead: Increased complexity in querying due to additional joins.
• Loss of Context: Data is split across tables, requiring more joins to retrieve complete data.
• Complexity: The process of normalization may be complex and requires a deep understanding of
database design.
Conclusion
Database normalization plays a crucial role in organizing data efficiently within a database. By reducing
redundancy, ensuring consistency, and improving overall structure, normalization enhances the accuracy,
scalability, and maintainability of a database.
FAQs on Database Normalization
1. When should I Denormalize a database?
o Denormalization is used when you intentionally introduce redundancy to improve
performance, especially in systems with high read operations.
2. How does normalization impact database performance?
o While normalization reduces redundancy and ensures consistency, it may add performance
overhead due to additional joins.
3. What role does database design play in normalization?
o Effective database design ensures that the database is properly structured and that
normalization procedures can be correctly implemented.

Overview of Normalization and Normal Forms in DBMS


• Normalization is the process of organizing a database to eliminate redundancy and anomalies,
ensuring data consistency and reducing redundancy.
• Normal Forms (NF) are specific criteria that guide the normalization process, with each higher form
providing stricter rules to maintain the database's integrity.
Steps of Normalization:
1. First Normal Form (1NF):
o Ensures each column contains atomic values and each row is unique.
o Eliminates composite or multi-valued attributes.
2. Second Normal Form (2NF):
o Builds upon 1NF.
o Eliminates partial dependencies by ensuring all non-prime attributes depend on the entire
candidate key.
3. Third Normal Form (3NF):
o Builds upon 2NF.
o Eliminates transitive dependencies, ensuring non-prime attributes are only dependent on
super keys or candidate keys.
4. Boyce-Codd Normal Form (BCNF):
o A stricter version of 3NF.
o Ensures every determinant is a candidate key, eliminating any remaining redundancy.
5. Fourth Normal Form (4NF):
o Builds upon BCNF.
o Eliminates multi-valued dependencies.
6. Fifth Normal Form (5NF):
o Builds upon 4NF.
o Ensures no further lossless decomposition is possible.
Key Concepts:
• Candidate Key: A minimal set of attributes that can uniquely identify a record.
• Prime Attributes: Attributes that are part of the candidate key.
• Non-prime Attributes: Attributes not part of any candidate key.
Advantages of Normalization:
• Reduces Redundancy: Minimizes duplicate data.
• Improves Data Integrity: Ensures consistent data and relationships.
• Simplifies Database Design: Helps in creating manageable, efficient structures.
• Improves Query Performance: By organizing data into smaller, related tables.
Disadvantages of Over-Normalization:
• Performance Overhead: Complex queries due to multiple table joins.
• Loss of Data Context: Data might need to be reconstructed through joins.
Example of Highest Normal Form:
• A step-by-step example shows how to find the highest normal form (2NF, in this case) of a given
relation by analyzing its functional dependencies.
Conclusion:
• While normalization is crucial for maintaining data integrity, too much normalization may introduce
complexity. It’s essential to strike a balance based on the needs of the application.
FAQs on Normal Forms:
1. Why is normalization important?
o Ensures consistency, eliminates redundancy, and facilitates easier database maintenance.
2. Can a database be over-normalized?
o Yes, excessive normalization can lead to complex queries and slower performance.
3. Is it necessary to achieve the highest normal form?
o Not necessarily. Lower forms can suffice based on performance and simplicity needs.
4. Can a relation in 2NF have partial dependency?
o No, not for non-prime attributes: by definition, every non-prime attribute in a 2NF relation must be fully dependent on every candidate key. 2NF does not restrict dependencies among prime attributes themselves.
This structure provides clarity in understanding the need for and application of normal forms in database
management.

First Normal Form (1NF) in DBMS


Overview:
First Normal Form (1NF) is the first level of normalization in database design, focusing on ensuring that a
table has a basic structure that minimizes redundancy and complexity. It addresses fundamental issues
related to data consistency and integrity by imposing certain rules on how data should be organized in
tables.
Key Points of 1NF:
1. Single Valued Attributes:
o Every column must contain only a single value for each record. This means no multi-valued
or composite attributes are allowed in a cell.
o Example: A column containing multiple phone numbers or addresses should be split into
multiple rows, each having a single phone number or address.
2. Consistent Data Type per Column:
o Each column must store data of the same type. Mixing data types (e.g., storing names in a
date column) is not allowed.
o Example: A column labeled "DOB" should only contain date values, not text or numeric
values.
3. Unique Column Names:
o Each column must have a distinct name to avoid confusion when accessing or manipulating
the data.
o Example: Two columns with the same name would cause ambiguity in queries and updates.
4. Order of Data Doesn’t Matter:
o The order in which the data is stored in the table (rows or columns) does not affect the
table’s operation or structure.
o Example: Reordering the rows or columns in a table won't change its function or the
accuracy of its data retrieval.
Example of 1NF Violation and Correction:
Consider a table with columns such as [Writer 1], [Writer 2], and [Writer 3] for a single book:

Book ID Writer 1 Writer 2 Writer 3
101     John     Sarah    Jack

This violates 1NF because the "Writer" attribute is multi-valued. To make it compliant with 1NF, the table should be restructured so that each writer gets its own row:

Book ID Writer
101     John
101     Sarah
101     Jack
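A minimal SQL sketch of the corrected design (table and column names are illustrative): each book-writer pair becomes its own row, keyed by the combination of the two columns.

CREATE TABLE BOOK_WRITER (
    Book_ID INT,
    Writer  VARCHAR(50),
    PRIMARY KEY (Book_ID, Writer)   -- one row per book-writer pair
);

INSERT INTO BOOK_WRITER VALUES (101, 'John');
INSERT INTO BOOK_WRITER VALUES (101, 'Sarah');
INSERT INTO BOOK_WRITER VALUES (101, 'Jack');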

Conclusion:
1NF lays the foundation for a well-structured database by enforcing that each column has atomic
(indivisible) values, reducing redundancy and making the data easier to manage.
FAQs on First Normal Form (1NF)
• What does 1NF mean? 1NF ensures that a database table contains only atomic (indivisible) values
and that all columns have unique names, facilitating easy and consistent data management.
• What is the significance of 1NF in database design? Implementing 1NF is crucial because it
removes redundant data and ensures that tables are structured in a way that supports data
integrity, efficient queries, and operations.
• What is the first normal form (1NF)? 1NF guarantees that there are no repeating groups within
rows and that all columns contain atomic, indivisible values, ensuring a basic level of data
consistency and organization in the database.
This explanation of First Normal Form (1NF) emphasizes its role in organizing and structuring data
efficiently to eliminate redundancy and maintain consistency. It also highlights the importance of following
simple rules to make the database more manageable and query-friendly.

Second Normal Form (2NF) in DBMS


Overview:
Second Normal Form (2NF) is a level of database normalization that builds on the foundation of First
Normal Form (1NF). It aims to eliminate partial dependencies, where non-prime attributes depend on a
part of a composite primary key instead of the whole key. By removing these partial dependencies, 2NF
reduces redundancy and ensures data integrity.
Key Points of 2NF:
1. Meeting 1NF Requirements:
o To be in 2NF, a table must first satisfy the conditions of First Normal Form (1NF): it should
have atomic (single) values in each cell, and there should be no repeating groups.
2. Eliminating Partial Dependencies:
o In 2NF, partial dependencies are removed. A partial dependency occurs when a non-prime
attribute (an attribute not part of the primary key) is functionally dependent on part of a
composite primary key rather than on the entire key.
o Non-prime Attribute: An attribute that is not part of any candidate key.
o Partial Dependency Example:
▪ If a table has a composite primary key consisting of A and B, and a non-prime
attribute C depends only on A (not the whole key {A, B}), then C is partially
dependent on the key.
3. Functional Dependency:
o For a table to be in 2NF, every non-prime attribute must be fully dependent on the entire
candidate key, not just a part of it.
Examples:
Example 1: Consider a table with the following attributes: STUD_NO, COURSE_NO, COURSE_FEE.

STUD_NO COURSE_NO COURSE_FEE
101     C1        1000
101     C2        2000
102     C1        1000

• The composite primary key here is {STUD_NO, COURSE_NO}.
• The non-prime attribute COURSE_FEE depends only on COURSE_NO (not the entire key).
• This creates a partial dependency (since COURSE_FEE depends only on COURSE_NO, not STUD_NO).
To convert this to 2NF, we remove the partial dependency by splitting the table into two:
1. Student-Course Table: {STUD_NO, COURSE_NO} (Primary key: {STUD_NO, COURSE_NO})
2. Course-Fee Table: {COURSE_NO, COURSE_FEE} (Primary key: {COURSE_NO})
Now, COURSE_FEE is fully dependent on COURSE_NO, which is the primary key of the second table. The
tables are in 2NF.
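A sketch of this decomposition in SQL (data types are assumptions; table and column names follow the example above):

CREATE TABLE COURSE_FEE (
    COURSE_NO  VARCHAR(10) PRIMARY KEY,
    COURSE_FEE DECIMAL(10, 2)          -- now fully dependent on its whole key
);

CREATE TABLE STUDENT_COURSE (
    STUD_NO   INT,
    COURSE_NO VARCHAR(10),
    PRIMARY KEY (STUD_NO, COURSE_NO),
    FOREIGN KEY (COURSE_NO) REFERENCES COURSE_FEE(COURSE_NO)
);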
Example 2: Consider a relation R(A, B, C, D) with the following functional dependencies:
• AB → C (A and B together determine C)
• BC → D (B and C together determine D)
Here, AB is the only candidate key, and there is no partial dependency because no proper subset of AB
determines a non-prime attribute. Hence, the relation is already in 2NF.
What is Partial Dependency?
A partial dependency occurs when a non-prime attribute is functionally dependent on part of a composite
key, rather than the entire composite key.
• Example: If a composite primary key is {A, B}, and a non-prime attribute C depends only on A, then
C is partially dependent on {A, B}, because it depends on only part of the key (just A).
Conclusion:
Second Normal Form (2NF) ensures that a database is well-structured by eliminating partial dependencies,
which helps reduce redundancy and improve data consistency. It is an essential step in organizing a
database schema that optimizes memory and improves query efficiency.
FAQs on Second Normal Form (2NF)
• What is the purpose of normalization? Normalization is the process of structuring data to minimize
redundancy and inconsistencies, helping with efficient storage and retrieval while maintaining data
integrity.
• What is 1NF? First Normal Form (1NF) eliminates repeating groups and ensures that all columns
contain atomic values, avoiding multi-valued attributes.
• What is 2NF? Second Normal Form (2NF) ensures that a table is in 1NF and that all non-prime
attributes are fully dependent on the entire candidate key, eliminating partial dependencies.
• What is a composite primary key? A composite primary key is made up of two or more columns
that together uniquely identify each record in the table.
• What is a candidate key? A candidate key is any set of attributes that can uniquely identify a
record. A table can have multiple candidate keys, but only one is chosen as the primary key.
• What are functional dependencies? Functional dependencies define the relationship between
attributes in a table. If attribute A determines B, we write it as A → B.
• How does 2NF improve efficiency? 2NF minimizes redundancy by ensuring that non-prime
attributes are fully dependent on the entire candidate key, reducing unnecessary data duplication
and optimizing database performance.
• What are transitive dependencies? Transitive dependencies occur when non-prime attributes
depend on other non-prime attributes. These are eliminated in Third Normal Form (3NF).

What is Boyce-Codd Normal Form (BCNF)?


Boyce-Codd Normal Form (BCNF) is a stronger version of Third Normal Form (3NF), designed to eliminate
redundancy more effectively. It requires that for every non-trivial functional dependency (FD) in a relation,
the left-hand side (determinant) of the FD must be a superkey.
In simpler terms, BCNF ensures that:
1. A relation is in 3NF.
2. Every functional dependency in the relation must have a superkey as its determinant.

Rules for BCNF:


1. The table must be in 3NF.
2. For every non-trivial functional dependency X → Y, X must be a superkey (i.e., a candidate key or a superset of one, uniquely identifying each row in the relation).

Why BCNF?
Even though a relation might be in 3NF, there could still be cases where a functional dependency exists
where the left-hand side isn't a superkey. This can lead to redundancy and anomalies. BCNF eliminates this
problem by ensuring all dependencies are properly associated with superkeys.

Key Concepts:
• Superkey: A set of one or more attributes that can uniquely identify a record in a relation.
• Candidate Key: A minimal superkey (i.e., no subset of the key can uniquely identify records).
• Non-prime Attribute: Attributes that are not part of any candidate key.

Advantages of BCNF:
1. Reduces Redundancy: Eliminates data duplication by ensuring that functional dependencies
depend on superkeys.
2. Improves Data Integrity: By removing unnecessary dependencies, BCNF enhances the consistency
and integrity of the database.
3. Prevents Update Anomalies: BCNF ensures there are no situations where updates might result in
inconsistencies, like having multiple entries for the same data.

Frequently Asked Questions (FAQs):


1. How does BCNF differ from 3NF?
o BCNF is stricter than 3NF because it requires that every determinant in a functional
dependency must be a superkey, while 3NF only requires that non-prime attributes be
dependent on a candidate key.
2. What are the benefits of applying BCNF?
o BCNF minimizes redundancy, improves data integrity, and prevents update anomalies,
making it highly beneficial for large databases with multiple candidate keys.
3. Why is BCNF considered stricter than 3NF?
o BCNF ensures that every functional dependency has a superkey as the determinant, while
3NF allows functional dependencies with non-prime attributes as long as they depend on a
candidate key.
4. When should BCNF be used instead of 3NF?
o BCNF should be used when you have a database with multiple candidate keys, or when you
experience update anomalies despite achieving 3NF.

Conclusion
BCNF is a more robust normalization technique compared to 3NF, addressing potential redundancy issues
caused by non-superkey determinants. While it may not always be feasible to apply BCNF without losing
dependency preservation, it remains a powerful tool for ensuring a consistent, efficient, and less redundant
database design.

Third Normal Form (3NF)


Third Normal Form (3NF) is an essential stage in the process of database normalization. It aims to eliminate
redundancy and improve data integrity while still allowing for flexibility and efficiency in database design.
While 2NF helps eliminate partial dependency, 3NF addresses transitive dependencies.

What is Third Normal Form (3NF)?


Third Normal Form (3NF) is a normal form used in relational database design that builds on the
requirements of First Normal Form (1NF) and Second Normal Form (2NF). It ensures that:
1. A relation is in 2NF.
2. There are no transitive dependencies.
In simpler terms:
• No transitive dependency means that non-prime attributes (attributes that are not part of any
candidate key) should not depend on other non-prime attributes.

Rules for 3NF:


1. The relation must be in 2NF.
2. For every functional dependency X → Y, at least one of the following conditions must hold:
o X is a superkey, or
o Y is a prime attribute (i.e., part of a candidate key).

Why 3NF?
The primary goal of 3NF is to eliminate transitive dependencies. These are cases where one non-prime
attribute depends on another non-prime attribute, potentially leading to data redundancy and
inconsistency. By removing these dependencies, 3NF ensures a more efficient, consistent, and reliable
database design.

Key Concepts:
• Superkey: A set of one or more attributes that uniquely identifies a record in a relation.
• Candidate Key: A minimal superkey, meaning no proper subset of it can uniquely identify records.
• Prime Attribute: An attribute that is part of any candidate key.
• Non-prime Attribute: An attribute that is not part of any candidate key.

Advantages of 3NF:
1. Reduces Redundancy: By eliminating transitive dependencies, 3NF helps in removing unnecessary
data duplication.
2. Improves Data Integrity: 3NF ensures that the data in the database remains consistent and free
from anomalies.
3. Flexibility in Queries: With fewer dependencies, queries become simpler and more efficient.
4. Minimizes Update Anomalies: Eliminating transitive dependencies reduces the chances of
inconsistent updates.

Frequently Asked Questions (FAQs):


1. What is the difference between 2NF and 3NF?
o While 2NF eliminates partial dependencies (i.e., attributes depending on part of a candidate
key), 3NF addresses transitive dependencies, ensuring that non-prime attributes do not
depend on other non-prime attributes.
2. What is a transitive dependency?
o A transitive dependency occurs when a non-prime attribute depends on another non-prime attribute through a chain of dependencies. For example, A → B and B → C give A → C, where both B and C are non-prime attributes.
3. When should 3NF be used instead of 2NF?
o 3NF should be used after achieving 2NF. If a database is in 2NF, but still has transitive
dependencies, it should be normalized to 3NF to further reduce redundancy and improve
data integrity.
4. What are the limitations of 3NF?
o While 3NF eliminates transitive dependencies, it does not necessarily eliminate all types of
redundancy. There could still be cases where certain redundancies persist, such as when a
table has multiple candidate keys or when some dependencies are not captured effectively.

Conclusion
Third Normal Form (3NF) is an important normalization step that builds on 1NF and 2NF by eliminating
transitive dependencies. By ensuring that non-prime attributes only depend on superkeys or are part of
candidate keys, 3NF helps in reducing redundancy, improving data integrity, and maintaining a consistent
and efficient database design. However, for even stricter designs, higher normal forms like BCNF may be
considered.

Introduction to Fourth and Fifth Normal Forms (4NF & 5NF)


The Fourth Normal Form (4NF) and Fifth Normal Form (5NF) are advanced levels of database normalization,
primarily addressing multivalued dependencies and join dependencies, respectively. These normal forms
are used to ensure that databases are free from unnecessary redundancy and to maintain high integrity by
organizing data in a structured way.

Fourth Normal Form (4NF)


Definition: The Fourth Normal Form (4NF) builds upon the Boyce-Codd Normal Form (BCNF) and focuses
on multivalued dependencies. A relation is in 4NF if:
1. It is in BCNF.
2. It does not have any non-trivial multivalued dependencies other than a candidate key.
What are Multivalued Dependencies?:
• A multivalued dependency (MVD) occurs when a set of attributes A in a relation determines two or more independent sets of attributes, say B and C. These sets are independent of each other, but both depend on A.
• In simpler terms, if for a given value of attribute A there are multiple independent values of B and of C, a multivalued dependency exists.
Conclusion
• 4NF addresses multivalued dependencies, ensuring that no non-trivial multivalued dependencies
exist in a relation except those involving candidate keys.
• 5NF deals with join dependencies, ensuring that a relation is decomposable without loss of data
through natural joins.
• These higher normal forms are used for more complex database structures, aiming to eliminate
redundancies and ensure the integrity and accuracy of data.
• While applying 4NF and 5NF can result in a more normalized database, it may also introduce more
complex database schemas and possibly slow down query performance, so these are often used
when data accuracy and integrity are of utmost importance.

FAQs on 4NF and 5NF


Q.1: What is the difference between 4NF and 5NF in DBMS?
• 4NF handles multivalued dependencies, ensuring that a relation does not contain non-trivial
multivalued dependencies.
• 5NF addresses join dependencies, ensuring that a relation cannot be further decomposed without
losing information.
Q.2: What is the 6th Normal Form (6NF)?
• 6NF is used when dealing with temporal data that varies over time. It focuses on eliminating
unwanted duplication in such cases, ensuring that data is represented in the most granular form
possible to handle temporal relationships efficiently.

The Problem of Redundancy in Database


Redundancy refers to the presence of duplicate copies of data within a database, often resulting from a
lack of proper database normalization. When data is not efficiently organized, multiple records containing
the same information are stored, which can cause various issues related to database integrity,
performance, and maintenance.

Example of Redundancy
Consider a table storing details about students, including attributes like student ID, name, college name,
course, and rank. The following table shows how data redundancy can appear:

Student_ID Name     Contact    College Course Rank
100        Himanshu 7300934851 GEU     B.Tech 1
101        Ankit    7900734858 GEU     B.Tech 1
102        Ayush    7300936759 GEU     B.Tech 1
103        Ravi     7300901556 GEU     B.Tech 1

• Notice that College, Course, and Rank attributes are repeated for every student, which creates
unnecessary redundancy.
Problems Caused by Redundancy
1. Insertion Anomaly
• Definition: This occurs when inserting a new record into the database requires unnecessary
additional data to be inserted.
• Example: If a student’s course is undecided, the student's record cannot be inserted without
assigning a course. This would prevent the insertion of incomplete records.
2. Deletion Anomaly
• Definition: Deleting a record can unintentionally remove useful data.
• Example: If a student’s record is deleted, the college information might also be deleted, which
should not happen as the college's data is independent of the student's record.
3. Updation Anomaly
• Definition: Updates to data may need to be applied multiple times across the database, leading to
inconsistencies.
• Example: If the college rank changes, every record for students at that college must be updated.
Failing to do so would leave some records with outdated rank data.

Additional Issues Caused by Redundancy


1. Data Inconsistency:
o Redundant data can lead to inconsistencies where different copies of the same information
may not match, leading to errors and unreliable data.
2. Storage Requirements:
o Redundant data requires more storage space, which can be costly and inefficient. More
space is required to store duplicate data, leading to higher resource consumption.
3. Performance Issues:
o Redundant data can result in slower database performance, as it requires more time to
search, update, and maintain larger amounts of data.
4. Security Issues:
o Redundant data increases the risk of unauthorized access and manipulation. Multiple copies
of data can lead to breaches and reduce the overall security of the system.
5. Maintenance Complexity:
o The presence of redundant data makes it more challenging to maintain consistency across
the system. Each copy of the data must be updated and managed, requiring more effort and
resources.
6. Data Duplication:
o The unnecessary repetition of data can confuse users and result in inconsistent information
across different parts of the database.
7. Data Integrity:
o Redundancy can compromise data integrity because if one copy of the data is updated and
the others are not, it leads to inconsistent or inaccurate data.
8. Usability Issues:
o Users may struggle to find the correct version of the data due to redundancy, leading to
frustration and reduced productivity.

How to Prevent Redundancy


To reduce redundancy and improve database performance, normalization is the most effective approach.
Normalization involves breaking down large, redundant tables into smaller, related ones and using keys to
link them together, thereby reducing unnecessary repetition.
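A sketch of such a normalization for the student example above (column types are assumptions, and the rank is treated as a property of the college): the college's details are stored once and referenced by a foreign key.

CREATE TABLE COLLEGE (
    College      VARCHAR(50) PRIMARY KEY,
    College_Rank INT                      -- stored once instead of per student
);

CREATE TABLE STUDENT (
    Student_ID INT PRIMARY KEY,
    Name       VARCHAR(50),
    Contact    VARCHAR(15),
    Course     VARCHAR(20),
    College    VARCHAR(50),
    FOREIGN KEY (College) REFERENCES COLLEGE(College)
);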

Advantages and Disadvantages of Redundant Data


Advantages:
1. Enhanced Query Performance: Redundant data can sometimes speed up data retrieval by avoiding
complex joins.
2. Offline Access: Redundant copies allow access to data when the primary source is unavailable.
3. Increased Availability: Data redundancy can increase fault tolerance, ensuring data is accessible
even if one server or location fails.
Disadvantages:
1. Increased Storage Requirements: Redundant data increases the storage costs and space needed.
2. Inconsistency: If data is stored in multiple places, there is a risk of having outdated or conflicting
versions of the same data.
3. Difficulty in Maintenance: Keeping all copies of data consistent requires more effort and resources,
making maintenance more complex.
4. Increased Risk of Errors: Redundant data increases the chances of errors in updates, deletions, and
modifications.
5. Reduced Flexibility: Changing redundant data requires making multiple updates, which can be time-
consuming and error-prone.

Conclusion
Redundancy in databases is a common issue that can cause data inconsistencies, higher storage
requirements, performance degradation, and security risks. The best way to handle redundancy is through
normalization, which organizes data to eliminate unnecessary duplication and improve database efficiency
and integrity.

FAQs on Redundancy in Database


Q.1: What is the Redundancy Problem?
• Answer: The redundancy problem arises when multiple copies of the same data are stored in a
database, leading to increased size, complexity, and potential data inconsistency.
Q.2: What are the problems caused due to redundancy?
• Answer: Problems include data inconsistency, higher storage requirements, update anomalies,
security issues, and increased maintenance complexity.
Q.3: How is data redundancy handled?
• Answer: Data redundancy is handled by normalizing the database, breaking down large tables into
smaller ones, and removing duplicate data entries, ensuring each piece of information is stored only
once.

Lossless Join and Dependency Preserving Decomposition in DBMS
When designing a normalized database, it's crucial to decompose relations (tables) in a way that meets two
important properties: Lossless Join and Dependency Preservation. These properties ensure that the
decomposition maintains the integrity and enforceability of the original data.
1. Lossless (Non-Additive) Join Decomposition
Definition:
A decomposition of a relation R into two or more subrelations (e.g., R1 and R2) is said to be lossless if,
when the subrelations are joined back together, the result is exactly the original relation R—no tuples are
lost or spurious (extra) tuples are generated.
Key Points:
• No Data Loss: The original relation's data can be completely reconstructed by joining the
decomposed relations.
• No Spurious Tuples: The join does not create any tuples that did not exist in the original relation.
• Key Condition: For a binary decomposition of R into R1 and R2, the join is lossless if the intersection of R1 and R2 (i.e., their common attributes) is a superkey for at least one of the decomposed relations.
Example:
Suppose we have a relation R(A, B, C) and we decompose it into:
• R1(A, B)
• R2(B, C)
For the join to be lossless, the common attribute B should be such that B → A (making B a superkey for R1)
or B → C (making B a superkey for R2). If this condition is met, joining R1 and R2 on B will reconstruct R
exactly.

2. Dependency Preserving Decomposition


Definition:
A decomposition is dependency preserving if all functional dependencies (FDs) defined on the original
relation R can be enforced by simply enforcing them on each of the decomposed subrelations without
needing to perform a join. In other words, the union of the FDs that hold in the decomposed relations is
equivalent to the set of FDs in the original relation.
Key Points:
• Ease of Enforcement: Dependency preservation means that constraints (FDs) can be checked within
individual subrelations. This avoids the need for costly join operations to enforce or verify the
dependencies.
• Maintenance of Integrity: When all FDs are preserved in the decomposed relations, the integrity
constraints of the original relation remain enforceable.
• Desirability vs. Lossless: While lossless join is non-negotiable for correctness, dependency
preservation is highly desirable for practical maintenance and performance. Sometimes, designers
may face a trade-off where achieving both properties simultaneously is not possible.
Example:
Consider a relation R(A, B, C) with a functional dependency A → B.
• If we decompose R into:
o R1(A, B)
o R2(A, C)
The dependency A → B is preserved in R1 (since R1 contains both A and B). Additionally, if there are no
dependencies involving C that are lost, then the decomposition is dependency preserving.

Summary of the Two Properties

• Lossless Join: The original relation can be exactly reconstructed by joining the decomposed subrelations. Key condition: the common attributes must form a superkey for at least one of the subrelations.
• Dependency Preservation: All functional dependencies of the original relation are enforceable on the decomposed subrelations without needing to join them. Key condition: the set of FDs in the subrelations should cover all FDs of the original relation.

Conclusion
• Lossless Join ensures that no data is lost or extraneous data added during decomposition. It is
essential for maintaining the correctness of the database.
• Dependency Preserving Decomposition allows all integrity constraints to be enforced locally within
each subrelation, making the database easier to maintain and efficient in operation.
Both properties are crucial in the normalization process to achieve a database design that is both efficient
and maintains data integrity.

Denormalization in Databases
Denormalization is an optimization technique used in databases to improve query performance, primarily
by reducing the number of joins required during querying. However, this technique comes at the cost of
adding redundancy to the database, which can lead to increased maintenance complexity.
What is Denormalization in Databases?
Denormalization involves adding redundant data to one or more tables after a database has been
normalized. This is done to avoid the performance overhead of joining multiple normalized tables during
query execution. It is not a reversal of normalization but an optimization technique applied to make
databases more efficient for read-heavy applications.
In a normalized database, the goal is to minimize redundancy by splitting data into smaller, related tables.
For example, in a normalized schema, a Courses table and a Teachers table might store the teacher's name
and ID separately. To get a list of courses with teacher names, a join would be required.
In a denormalized schema, some of this information may be combined into one table, such as having both
the course and teacher's information in the same table to speed up query execution, albeit at the cost of
redundancy.
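A sketch of such a denormalized table (table and column names here are illustrative): the teacher's name is copied into each course row, so the listing query no longer needs a join.

CREATE TABLE COURSE_DENORM (
    Course_ID    INT PRIMARY KEY,
    Course_Name  VARCHAR(50),
    Teacher_ID   INT,
    Teacher_Name VARCHAR(50)    -- redundant copy of the teacher's name
);

-- Listing courses with teacher names now reads a single table:
SELECT Course_Name, Teacher_Name FROM COURSE_DENORM;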
Steps to Denormalization
1. Unnormalized Table:
In this stage, all data is stored in one large table with significant redundancy. For example, student
names and class information could appear multiple times.
2. Normalized Structure:
In a normalized schema, this data is split into smaller, related tables to minimize redundancy and
avoid update anomalies. Each table now represents a specific aspect, such as students or classes.
3. Denormalized Table:
To improve query performance, we can combine the related tables into a single table. This removes
the need for complex joins when fetching data, improving performance for read-heavy systems.
Denormalization vs. Normalization
• Normalization: Focuses on removing redundancy, improving data integrity, and ensuring efficient
storage. It splits data into logical, smaller tables and avoids duplicate entries.
• Denormalization: Introduces redundancy by combining related tables, which reduces the need for
complex joins, thus optimizing query performance for read-heavy operations.
Advantages of Denormalization
• Improved Query Performance: By reducing the need for joins, denormalization speeds up read
operations and query performance.
• Reduced Complexity: It simplifies queries and the overall database schema by consolidating data
into fewer tables.
• Easier Maintenance and Updates: Fewer tables make it easier to update the schema or modify
queries.
• Improved Read Performance: The database is optimized for quick read access, which is beneficial
for systems with high read-to-write ratios.
• Better Scalability: Systems that focus on read-heavy operations benefit from denormalization due
to reduced joins and simpler query execution plans.
Disadvantages of Denormalization
• Reduced Data Integrity: Redundant data increases the risk of data inconsistencies. Updates need to
be applied to all copies of duplicated data.
• Increased Complexity: While it can simplify queries, the introduction of redundant data can
complicate database management and schema changes.
• Increased Storage Requirements: Redundant data consumes more storage space, potentially
increasing costs and database size.
• Increased Update and Maintenance Complexity: When data changes, it must be updated in all
places where it appears, which can lead to issues with consistency if not properly managed.
• Limited Flexibility: Redundancy makes it more difficult to modify the database schema, as changes
must be reflected in all places where data is duplicated.
When Should You Use Denormalization?
Denormalization is most useful in systems where read performance is more critical than write performance.
It is ideal for:
• Read-heavy systems: Applications where queries are frequent and must be optimized for quick
retrieval.
• Reporting systems: Where complex queries or aggregations are frequently run and performance is
a priority.
• Data Warehouses: For systems focused on large-scale data analysis and querying.
How to Maintain Data Consistency in Denormalized Databases
To address the main drawback of denormalization—data inconsistency—the following techniques are
typically used:
• Triggers and Stored Procedures: These can be employed to ensure that when data is updated in one table, all related copies are updated accordingly (a sketch follows this list).
• Caching: Helps avoid repeated updates by storing frequently accessed data.
• Version Control: To keep track of changes and avoid discrepancies across redundant data entries.
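As an illustration of the trigger approach (MySQL-style syntax; the Teachers table and the COURSE_DENORM table from the earlier sketch are assumptions), a change to a teacher's name is propagated to every redundant copy:

CREATE TRIGGER sync_teacher_name
AFTER UPDATE ON Teachers
FOR EACH ROW
    UPDATE COURSE_DENORM
    SET Teacher_Name = NEW.Name          -- push the new name into each copy
    WHERE Teacher_ID = NEW.Teacher_ID;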
Denormalization vs. Data Aggregation
• Denormalization: Involves adding redundant raw data by combining tables, often resulting in larger
table sizes.
• Data Aggregation: Focuses on reducing data size by summarizing it (e.g., computing averages or
totals). Aggregation does not increase redundancy, but rather reduces the total amount of data by
summarizing it.
Conclusion
Denormalization is a useful technique for improving the performance of read-heavy systems by reducing
the complexity of joins. However, it introduces redundancy, which can lead to data integrity issues and
increased maintenance complexity. Its application should be considered carefully, particularly in systems
where performance and scalability are prioritized over strict adherence to normalization principles.

Transaction and Concurrency Control in DBMS


1. What is a Transaction in DBMS?
A transaction in DBMS is a sequence of one or more database operations (such as INSERT, UPDATE, DELETE,
SELECT) that are performed as a single logical unit of work. It ensures data integrity and consistency, even
in the case of system failures.
Key Characteristics of a Transaction
• Atomicity: A transaction is all-or-nothing; either all operations are executed, or none are.
• Consistency: The database must move from one consistent state to another.
• Isolation: Concurrent transactions should not interfere with each other.
• Durability: Once committed, changes must be permanent, even if the system crashes.
Example of a Transaction:
START TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 500 WHERE account_id = 2;
COMMIT;
If any step fails, the entire transaction is rolled back.
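A minimal sketch of the failure path (error-handling details vary by DBMS): issuing ROLLBACK undoes the debit, so the transfer stays all-or-nothing.

START TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;
-- Suppose the matching credit cannot be applied (e.g., account 2 is closed):
ROLLBACK;   -- the debit above is undone; neither account changes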

2. ACID Properties of RDBMS


The ACID properties ensure the reliability of database transactions.
1. Atomicity:
o Ensures that a transaction is either fully executed or not executed at all.
o If any part of a transaction fails, the entire transaction is rolled back.
o Example: Transferring money between accounts; if debit is successful but credit fails, the
debit should be undone.
2. Consistency:
o Guarantees that a transaction moves the database from one valid state to another.
o Example: If an employee's salary is updated, the total payroll must reflect the change.
3. Isolation:
o Ensures that concurrent transactions do not interfere with each other.
o Implemented using different isolation levels (Read Uncommitted, Read Committed,
Repeatable Read, Serializable).
4. Durability:
o Once a transaction is committed, changes are permanent, even in case of a system crash.
o Uses techniques like database logs and checkpoints.

3. Transaction States in DBMS


A transaction goes through different states during its execution:
1. Active:
o The transaction starts and executes its operations.
2. Partially Committed:
o The transaction has executed all operations but is yet to be permanently stored.
3. Committed:
o The transaction is successfully completed, and changes are permanently saved.
4. Failed:
o The transaction encounters an error before reaching the commit stage.
5. Aborted (Rolled Back):
o If a failure occurs, the database restores to its previous state.
6. Terminated:
o The transaction has completed successfully or has been rolled back.

4. Advantages of Concurrency in DBMS


Concurrency control allows multiple transactions to execute simultaneously, improving system efficiency.
Advantages:
• Increases System Throughput: Multiple transactions can run at the same time, reducing waiting
time.
• Better Resource Utilization: CPU, memory, and disk are utilized efficiently.
• Faster Response Time: Reduces waiting time for users by executing queries in parallel.
• Ensures Data Consistency: Proper concurrency control prevents data corruption or inconsistency.
• Supports Multiple Users: Multiple users can perform operations simultaneously.

5. Lost Update Problem in Concurrency


Occurs when multiple transactions read and update the same data, leading to incorrect results.
Example:
Two transactions (T1 and T2) read the same balance of an account and update it independently, causing
one update to be lost.
Scenario:
Initial Balance: $1000
1. T1 reads balance = $1000
2. T2 reads balance = $1000
3. T1 adds $200 and writes balance = $1200
4. T2 adds $100 to its stale value and writes balance = $1100, overwriting T1's change
Final balance = $1100, although a serial execution of both updates would end at $1300; T1's update is lost.

Solution: Use locking mechanisms or timestamp ordering.

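One common locking fix, sketched below (SELECT ... FOR UPDATE is supported by systems such as MySQL/InnoDB and PostgreSQL): each transaction locks the row before reading it, so the second transaction waits until the first commits.

START TRANSACTION;
SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;   -- row is locked
UPDATE accounts SET balance = balance + 200 WHERE account_id = 1;
COMMIT;   -- lock released; a concurrent transaction now reads the updated balance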
6. Dirty Read Problem in Concurrency


Happens when a transaction reads uncommitted data from another transaction, which may later be rolled
back.
Example:
1. T1 updates balance to $1500 but does NOT commit.
2. T2 reads balance as $1500.
3. T1 rolls back (balance returns to original $1000).
4. T2 has read incorrect data ($1500), leading to inconsistencies.

Solution: Use Read Committed isolation level.
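A hedged sketch of the fix (MySQL-style syntax; other systems phrase the statement slightly differently): running T2 at READ COMMITTED or stronger means it only ever sees committed values, so T1's uncommitted $1500 stays invisible.
-- Session T2
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;      -- applies to the next transaction
START TRANSACTION;
SELECT balance FROM accounts WHERE account_id = 1;   -- returns the committed value ($1000), not T1's uncommitted $1500
COMMIT;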

7. Unrepeatable Read Problem


Occurs when a transaction reads the same data multiple times but gets different values due to another
transaction’s update.
Example:
1. T1 reads balance as $1000.
2. T2 updates balance to $1200 and commits.
3. T1 reads balance again and finds $1200 (different from initial read).

Solution: Use Repeatable Read isolation level.

8. Phantom Read Problem


Occurs when a transaction reads a set of records twice, but new records appear in the second read due to
another transaction.
Example:
1. T1 executes:
SELECT * FROM Employees WHERE salary > 5000;
2. T2 inserts a new employee with a salary of 6000 and commits.
3. T1 executes the same query again and sees an extra record.

Solution: Use Serializable isolation level.
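A sketch of the fix for this and the previous problem (MySQL-style syntax assumed): the unrepeatable read in Section 7 is avoided by running T1 at REPEATABLE READ, and the phantom read here by running it at SERIALIZABLE, so T2's concurrent INSERT does not change T1's result set.
-- Session T1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;    -- use REPEATABLE READ for the unrepeatable-read case
START TRANSACTION;
SELECT * FROM Employees WHERE salary > 5000;
-- ... T2 inserts a row with salary 6000 and commits here ...
SELECT * FROM Employees WHERE salary > 5000;     -- same result set as the first query; no phantom row appears
COMMIT;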

9. Why is Concurrency Control Needed?


Concurrency control ensures that transactions execute safely in a multi-user environment without violating
data integrity.
Key Reasons:
• Prevents Lost Updates: Ensures no changes are accidentally overwritten.
• Avoids Dirty Reads: Ensures transactions only see committed data.
• Ensures Consistency: Prevents data anomalies like phantom reads.
• Maintains Isolation: Transactions execute independently.
• Improves Performance: Optimizes parallel execution of transactions.
Concurrency Control Techniques:
• Lock-Based Protocols (Shared & Exclusive Locks)
• Timestamp Ordering
• Optimistic Concurrency Control
• Multiversion Concurrency Control (MVCC)

Schedules and Serializability in DBMS


1. What Are Schedules in DBMS?
A schedule in DBMS is the sequence in which operations (like READ, WRITE, COMMIT, ABORT) of multiple
transactions are executed.
Key Characteristics of Schedules:
• Determines the order of execution of transactions.
• Affects data consistency and concurrency control.
• Ensures transactions do not lead to inconsistencies or conflicts.
Types of Schedules:
1. Serial Schedule – Transactions execute one after another, without interleaving.
2. Non-Serial Schedule – Transactions are interleaved to improve performance.
3. Serializable Schedule – A non-serial schedule that produces the same output as a serial execution.
4. Recoverable Schedule – Ensures that transactions commit only after all dependent transactions
commit.
5. Cascadeless Schedule – Prevents cascading rollbacks by ensuring transactions read only committed
values.

2. Serial and Non-Serial Schedules in DBMS


Serial Schedule:
A serial schedule is one where only one transaction executes at a time, meaning no two transactions
overlap.
Example of a Serial Schedule:

Step Transaction Operation

1 T1 Read(A)

2 T1 Write(A)

3 T1 Read(B)

4 T1 Write(B)

5 T2 Read(A)

6 T2 Write(A)

Advantage: Guarantees consistency.


Disadvantage: Slow execution due to lack of concurrency.

Non-Serial Schedule:
A non-serial schedule allows transactions to execute concurrently, meaning their operations are
interleaved.
Example of a Non-Serial Schedule:

Step Transaction Operation

1 T1 Read(A)

2 T2 Read(B)

3 T1 Write(A)

4 T2 Write(B)

Advantage: Improves system performance by executing transactions in parallel.


Disadvantage: Can lead to concurrency issues such as lost updates, dirty reads, and unrepeatable
reads.

3. Serializable Schedules in DBMS


A serializable schedule is a non-serial schedule that ensures the final result is equivalent to some serial
execution.
Why Serializable Schedules?
• Ensures consistency and correctness of transactions.
• Allows concurrency while preserving a valid state.
Types of Serializability:
1. Conflict Serializability – Based on conflicting operations (read/write conflicts).
2. View Serializability – Based on equivalent views of transactions.

4. Conflict Serializability in DBMS


A schedule is conflict serializable if it can be transformed into a serial schedule by swapping non-
conflicting operations.
Conflicting Operations:
Two operations conflict if:
1. They belong to different transactions.
2. They operate on the same data item.
3. At least one operation is a WRITE.

Case Transaction 1 (T1) Transaction 2 (T2) Conflict?

1 READ(A) READ(A) No

2 READ(A) WRITE(A) Yes

3 WRITE(A) READ(A) Yes

4 WRITE(A) WRITE(A) Yes

Testing for Conflict Serializability:


• Construct a precedence graph.
• If the graph has a cycle, the schedule is not conflict serializable.
Example of Conflict Serializability Check:

Step Transaction Operation

1 T1 Read(A)

2 T2 Write(A)

3 T1 Write(A)

Step 1: Identify Conflicts


• T1 Read(A) vs. T2 Write(A) → T1 → T2
• T2 Write(A) vs. T1 Write(A) → T2 → T1
Step 2: Draw Precedence Graph
T1 → T2
T2 → T1 (Cycle detected)

Cycle found → NOT conflict serializable.

5. View Serializability in DBMS


A schedule is view serializable if:
1. Initial Reads are the same – Transactions read the same values as in a serial execution.
2. Writes preserve final values – The final write operation results in the same database state as a
serial execution.
Example of View Serializability:

Step Transaction Operation

1 T1 Read(A)

2 T2 Write(A)

3 T1 Write(A)

• To decide, check whether some serial order of T1 and T2 yields the same initial reads, the same read-from relationships, and the same final write; if so, the schedule is view serializable.
• View serializability is a weaker (less strict) condition than conflict serializability.

6. Serializability Testing (Precedence Graph Method)


Steps to Check Serializability:
1. Identify Transactions and Operations
2. Find Conflicts (Read-Write, Write-Read, Write-Write on the same data)
3. Draw Precedence Graph:
o Nodes = Transactions
o Edges = Dependencies (T1 → T2 if T1’s operation must precede T2’s)
4. Check for Cycles:
o Cycle Found → Not Serializable
o No Cycle → Conflict Serializable

7. Numerical Example of Serializability


Consider the following schedule:

Step Transaction Operation

1 T1 Read(A)

2 T2 Read(B)

3 T1 Write(A)

4 T2 Write(A)

Step 1: Identify Conflicts


• T1 Read(A) vs. T2 Write(A) → T1 → T2
• T1 Write(A) vs. T2 Write(A) → T1 → T2
• T2 Read(B) touches a different data item, so it adds no edge.
Step 2: Draw Precedence Graph
T1 → T2

No cycle → Conflict Serializable!

Conclusion
• Schedules control transaction execution order.
• Serializable schedules ensure correct transaction execution.
• Conflict serializability can be tested using precedence graphs.
• View serializability is less strict than conflict serializability.

Conflict Serializability & View Serializability in DBMS

1. Conflict Serializability
What is Conflict Serializability?
A schedule is Conflict Serializable if it can be transformed into a serial schedule by swapping non-
conflicting operations without changing the result.
Conflicting Operations:
Two operations conflict if:
1. They belong to different transactions.
2. They operate on the same data item.
3. At least one of them is a WRITE operation.
Types of Conflicts:

Conflict Type Example Conflict?

Read-Read (RR) T1: Read(A), T2: Read(A) No Conflict

Read-Write (RW) T1: Read(A), T2: Write(A) Conflict

Write-Read (WR) T1: Write(A), T2: Read(A) Conflict

Write-Write (WW) T1: Write(A), T2: Write(A) Conflict

How to Check for Conflict Serializability (Precedence Graph Method)?


Steps to Check Conflict Serializability:
1. Identify conflicting operations in the schedule.
2. Create a precedence graph:
o Nodes represent transactions.
o Edges (T1 → T2) indicate dependency (if T1 must be executed before T2).
3. Check for cycles:
o If a cycle exists → NOT Conflict Serializable.
o If no cycle → Conflict Serializable.
Example 1: Conflict Serializable Schedule

Step Transaction Operation

1 T1 Read(A)

2 T2 Read(B)

3 T1 Write(A)

4 T2 Write(A)

Step 1: Identify Conflicts

• T1 Read(A), T2 Read(B) → No conflict (different data items)

• T1 Read(A), T2 Write(A) → T1 → T2

• T1 Write(A), T2 Write(A) → T1 → T2
Step 2: Draw Precedence Graph
T1 → T2
No cycle → Conflict Serializable!

Example 2: Not Conflict Serializable

Step Transaction Operation

1 T1 Read(A)

2 T2 Write(A)

3 T1 Write(A)

Step 1: Identify Conflicts

• T1 Read(A), T2 Write(A) → T1 → T2

• T2 Write(A), T1 Write(A) → T2 → T1 (T2's write precedes T1's write in the schedule)
Step 2: Draw Precedence Graph
T1 → T2
T2 → T1 (Cycle detected)

Cycle found → NOT Conflict Serializable!

2. View Serializability
What is View Serializability?
A schedule is View Serializable if it produces the same final result as a serial schedule, even if operations
cannot be swapped like in conflict serializability.
Conditions for View Serializability:
A schedule is view serializable if it satisfies:
1. Initial Reads are the same – Every transaction reads the same value as in a serial execution.
2. Final Writes are the same – The last write operation in both schedules must be the same.
3. Read-Write Order is Maintained – If a transaction T2 reads a value written by T1, this order must be
preserved in the equivalent serial schedule.

View serializability is more flexible than conflict serializability but harder to test.

3. Numerical Example: View Serializability


Example 1: View Serializable Schedule

Step Transaction Operation

1 T1 Read(A)

2 T2 Write(A)

3 T1 Write(A)

4 T3 Write(A)

(T3's blind write is what makes this schedule view serializable even though it is not conflict serializable.)

Check View Serializability (against the serial order T1 → T2 → T3):

1. Initial Read (Same as Serial Execution)

o T1 reads the initial value of A in both the schedule and the serial order.

2. Final Write (Same in Both Schedules)

o T3 performs the last write on A in both the schedule and the serial order.

3. Read-Write Order Maintained

o No transaction reads a value written by another transaction, so every read-from relationship matches the serial order.

✔ View Serializable! (Even though it is NOT conflict serializable: T1's Read(A) before T2's Write(A) forces T1 → T2, while T2's Write(A) before T1's Write(A) forces T2 → T1, creating a cycle.)

Example 2: Not View Serializable

Step Transaction Operation

1 T1 Read(A)

2 T2 Write(A)

3 T1 Write(A)

4 T2 Read(A)

Check View Serializability (try both possible serial orders):

1. Initial Read is Different

o In the serial order T2 → T1, T1 would read the value written by T2, but in the schedule T1 reads the initial value of A.

2. Final Write is Different

o In the serial order T1 → T2, the final write on A would be T2's, but in the schedule the last write is T1's.

3. Read-From Relationship is Violated

o In the schedule, T2 reads the value written by T1; in either serial order, T2 would read its own write instead.

Neither serial order gives the same reads and final write → Not View Serializable!

4. Conflict Serializability vs. View Serializability

Feature | Conflict Serializability | View Serializability

Concept | Operations must be swappable to match a serial execution. | Final result must match a serial execution.

Testing Method | Precedence graph (check for cycles). | Compare reads/writes in serial and non-serial schedules.

Flexibility | Stricter (if a schedule is conflict serializable, it is always view serializable). | More flexible (some schedules are view serializable but not conflict serializable).

Complexity | Easier to check (precedence graph). | Harder to verify manually.

5. Summary

Topic Key Points

Conflict Serializability Uses a precedence graph to check if transactions can be reordered into a serial schedule.

View Serializability Ensures transactions produce the same final result as a serial execution.

Conflict vs. View Conflict serializability is stricter; view serializability is more flexible.

Cycle in Precedence Graph If a cycle exists → not conflict serializable.

Numerical Example Conflict serializable if there is no cycle; view serializable if it produces the same final state as a serial execution.

Recoverability and Concurrency Control in DBMS

1. Recoverability of Schedules
A schedule is recoverable if it ensures that transactions commit only after all transactions whose changes
they depend on have committed.
Types of Recoverability:
1. Recoverable Schedule
2. Cascadeless Schedule
3. Strict Schedule

1.1 Recoverable Schedule


A recoverable schedule ensures that if a transaction T2 reads a value written by T1, then T1 must commit
before T2 commits.
Example of a Recoverable Schedule:
Step Transaction Operation

1 T1 Write(A)

2 T2 Read(A)

3 T1 Commit

4 T2 Commit

T1 commits before T2, so this schedule is recoverable.

1.2 Non-Recoverable Schedule


A schedule is non-recoverable if a transaction commits before another transaction it depends on.
Example of a Non-Recoverable Schedule:

Step Transaction Operation

1 T1 Write(A)

2 T2 Read(A)

3 T2 Commit

4 T1 Abort

T2 commits before T1, but if T1 aborts, T2 is left with an inconsistent value!


This is non-recoverable and leads to inconsistencies.

2. Cascading Rollback & Cascadeless Schedule


2.1 Cascading Rollback
A cascading rollback occurs when one transaction’s failure causes multiple other transactions to roll back.
Example of Cascading Rollback:

Step Transaction Operation

1 T1 Write(A)

2 T2 Read(A)

3 T3 Read(A)

4 T1 Abort

T1 aborts, forcing T2 and T3 to roll back, causing a cascading effect.

2.2 Cascadeless Schedule


A cascadeless schedule prevents cascading rollbacks by ensuring that transactions only read committed
values.
Example of a Cascadeless Schedule:

Step Transaction Operation

1 T1 Write(A)

2 T1 Commit

3 T2 Read(A)

T2 reads A only after T1 commits → No cascading rollback!

3. Concurrency Control Techniques


Concurrency control ensures that multiple transactions can execute simultaneously without conflicts or
data inconsistencies.
Types of Concurrency Control Techniques:
1. Lock-Based Protocols (e.g., Two-Phase Locking)
2. Timestamp-Based Protocols
3. Validation-Based Protocols

4. Lock-Based Protocols
Locks restrict access to data items to ensure consistency.
4.1 Granularity of Locks
Granularity defines the level at which locks are applied in the database.
Levels of Lock Granularity:

Granularity Level Example Use Case

Database Level Entire DB Backups, admin operations

Table Level Single Table Prevents updates on entire tables

Page Level Single Page Blocks certain storage units

Row Level Single Row High concurrency, but overhead

Column Level Single Column Rare, used for analytics

Field Level Specific Field Most precise but slow

Smaller granularity (row, field) → More concurrency


Larger granularity (table, database) → Less concurrency, more control
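A rough illustration of two granularity levels (MySQL-style syntax assumed; other systems expose lock levels differently). The first block locks the whole accounts table and blocks every other writer; the second locks only one row, so transactions touching other rows proceed in parallel.
-- Table-level lock (coarse granularity)
LOCK TABLES accounts WRITE;
UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;
UNLOCK TABLES;

-- Row-level lock (fine granularity)
START TRANSACTION;
SELECT * FROM accounts WHERE account_id = 1 FOR UPDATE;            -- only this row is locked
UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;
COMMIT;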

5. Shared and Exclusive Locks


Locks can be shared (S) or exclusive (X):
Lock Type Operations Allowed Concurrency Level

Shared (S) Multiple transactions can READ, but no writes. High

Exclusive (X) Only one transaction can READ and WRITE. Low

Shared locks allow multiple reads.


Exclusive locks prevent all other access.
Lock Compatibility Table:

Requested Lock Existing Lock (S) Existing Lock (X)

Shared (S) Allowed Not Allowed

Exclusive (X) Not Allowed Not Allowed
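A sketch of how shared and exclusive row locks are typically requested inside a transaction (PostgreSQL / MySQL 8.0 syntax assumed; older MySQL writes LOCK IN SHARE MODE instead of FOR SHARE):
START TRANSACTION;
-- Shared (S) lock: other transactions may also read or share-lock this row, but none may write it
SELECT balance FROM accounts WHERE account_id = 1 FOR SHARE;
-- Exclusive (X) lock: no other transaction may share-lock or write this row until we commit
SELECT balance FROM accounts WHERE account_id = 2 FOR UPDATE;
COMMIT;  -- both locks released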

6. Locking in Lock-Based Protocols


Lock-based protocols ensure serializability and prevent conflicts.
Types of Locking Protocols:
1. Two-Phase Locking (2PL)
2. Strict Two-Phase Locking (Strict 2PL)
3. Rigorous Two-Phase Locking (Rigorous 2PL)

6.1 Two-Phase Locking (2PL)


A transaction must follow two phases:
1. Growing Phase – Can acquire locks but cannot release them.
2. Shrinking Phase – Can release locks but cannot acquire new ones.

Ensures Conflict Serializability


May lead to Deadlocks

6.2 Strict Two-Phase Locking (Strict 2PL)


• All exclusive (X) locks are held until COMMIT/ABORT.
• Prevents cascading rollbacks.

Prevents dirty reads


Higher transaction wait times

6.3 Rigorous Two-Phase Locking (Rigorous 2PL)


• Both shared and exclusive locks are held until the transaction commits.

Stronger than Strict 2PL


More delays, but best for consistency
7. Summary Table

Topic Key Points

Recoverable Schedule Transactions commit in a proper order to avoid inconsistency.

Cascading Rollback One transaction failure can cause multiple rollbacks.

Cascadeless Schedule Transactions read only committed values to prevent cascading rollbacks.

Concurrency Control Ensures safe execution of concurrent transactions.

Granularity of Locks Locks can be applied at different levels (DB, Table, Row, etc.).

Shared vs. Exclusive Locks Shared (Read only, allows concurrency), Exclusive (Read/Write, blocks others).

Locking Protocols 2PL, Strict 2PL, and Rigorous 2PL ensure consistency and prevent conflicts.

Two-Phase Locking (2PL) and Timestamp-Based Protocols in DBMS

1. Two-Phase Locking (2PL) Protocol


The Two-Phase Locking (2PL) protocol is a locking-based concurrency control method that ensures
serializability.
Rules of 2PL:
• Every transaction must follow two distinct phases:
1. Growing Phase: The transaction acquires locks but cannot release any.
2. Shrinking Phase: The transaction releases locks but cannot acquire new ones.

Example of 2PL:
Example Schedule:

Step Transaction Operation

1 T1 Lock-X(A)

2 T1 Write(A)

3 T2 Lock-X(B)

4 T2 Write(B)

5 T1 Unlock(A) → Shrinking Phase Starts



6 T2 Unlock(B) → Shrinking Phase Starts

This schedule follows 2PL since unlocking happens after acquiring all locks.
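In practice, most relational engines behave like strict/rigorous 2PL automatically: locks are acquired as statements execute (growing phase) and released only at COMMIT or ROLLBACK. A hedged sketch using the accounts table from earlier (FOR UPDATE syntax as in MySQL/PostgreSQL):
START TRANSACTION;
SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;      -- growing phase: X lock on row 1
UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;
SELECT balance FROM accounts WHERE account_id = 2 FOR UPDATE;      -- growing phase: X lock on row 2
UPDATE accounts SET balance = balance + 500 WHERE account_id = 2;
COMMIT;                                                            -- shrinking phase: all locks released together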

Types of 2PL Protocols:


1. Strict Two-Phase Locking (Strict 2PL)
• All exclusive (X) locks are held until COMMIT/ABORT.
• Prevents dirty reads and cascading rollbacks.

Better consistency
More waiting time

2. Rigorous Two-Phase Locking (Rigorous 2PL)


• Both shared (S) and exclusive (X) locks are held until the transaction commits.
• More restrictive than Strict 2PL but ensures higher consistency.

Stronger than Strict 2PL


Causes more blocking

2. Timestamp-Based Protocol
The Timestamp-Based Concurrency Control Protocol ensures that transactions execute in order of their
timestamps without locks.
How Timestamps Work?
• Each transaction Ti is assigned a timestamp TS(Ti) when it starts.
• The timestamp represents the order in which transactions should execute.
• Each data item Q has two timestamps:
1. Read Timestamp (RTS(Q)) – Last transaction that read Q.
2. Write Timestamp (WTS(Q)) – Last transaction that wrote Q.

Rules of Timestamp Ordering Protocol:


1. Read Rule
• If Ti wants to read Q:
o If TS(Ti)<WTS(Q) → Abort Ti (too late, Q has been modified).
o Otherwise, read Q and update RTS(Q).
2. Write Rule
• If Ti wants to write Q:
o If TS(Ti)<RTS(Q) → Abort Ti (too late, another transaction already read Q).
o If TS(Ti)<WTS(Q) → Abort Ti (Q has been written by a newer transaction).
o Otherwise, write Q and update WTS(Q).

Example of Timestamp-Based Protocol

Step Transaction Operation Read Timestamp (RTS) Write Timestamp (WTS)

1 T1 (TS=5) Read(A) RTS(A) = 5 WTS(A) = 0

2 T2 (TS=10) Write(A) RTS(A) = 5 WTS(A) = 10

3 T3 (TS=3) Write(A) Abort (TS=3 < WTS(A)=10) WTS(A) = 10

T3 is aborted because it tries to write an old value after a newer write by T2.

3. Timestamp-Based Locking (Ordering)


Timestamp Ordering ensures serializability by preventing conflicts based on timestamps.
How It Works:
1. Every transaction is assigned a unique timestamp when it starts.
2. Transactions execute in timestamp order (older transactions execute before newer ones).
3. If a transaction tries to access outdated data, it is aborted and restarted.

Ensures serializability without deadlocks!


No transaction ever waits for another; conflicts are resolved by aborting and restarting the offending transaction.

4. Advantages & Disadvantages of Timestamp-Based Protocols

Advantages:

✔ Deadlock-Free – No waiting, transactions execute based on timestamps.


✔ No Locks Required – Avoids issues like lock contention.
✔ Ensures Serializability – Transactions execute in a strict order.

Disadvantages:

Many Transactions Abort – Older transactions get aborted often.


Wasted Computation – Transactions may execute partially but get rolled back.
Not Suitable for High Contention Databases – Frequent restarts reduce performance.

5. Summary Table
Concept Key Points

Two-Phase Locking (2PL) Ensures serializability by dividing transactions into a growing and a shrinking phase.

Strict 2PL Holds exclusive locks until commit, preventing cascading rollbacks.

Rigorous 2PL Holds all locks until commit, ensuring maximum consistency.

Timestamp-Based Protocol Transactions execute based on timestamps instead of locks.

Timestamp Ordering Ensures order based on transaction timestamps to maintain serializability.

Advantages of Timestamp Protocols No deadlocks, no locks required, guarantees serializability.

Disadvantages of Timestamp Protocols High abort rate, wasted computation, not ideal for high-contention databases.

Deadlocks in DBMS: Causes, Handling, and Recovery

1. Necessary Conditions for Deadlock


A deadlock occurs when two or more transactions wait indefinitely for resources held by each other.

Deadlock happens only if these four conditions hold simultaneously:


1. Mutual Exclusion
• At least one resource is held in an exclusive mode (locked by one transaction only).
• Other transactions must wait to access it.
2. Hold and Wait
• A transaction holds at least one resource and is waiting for additional resources held by others.
3. No Preemption
• A resource cannot be forcibly taken from a transaction; it must be released voluntarily.
4. Circular Wait
• A set of transactions {T1, T2, …, Tn} exist in a cycle where each transaction waits for a resource
held by the next.

All four conditions must be true for a deadlock to occur!
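A typical way all four conditions arise in SQL: two sessions update the same two rows in opposite order (a sketch using the accounts table from earlier; most engines, e.g. InnoDB, detect this and abort one session as the victim).
-- Interleaved timeline across two separate client sessions:
-- T1: START TRANSACTION;
-- T2: START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;  -- T1 locks row 1 (mutual exclusion)
UPDATE accounts SET balance = balance - 100 WHERE account_id = 2;  -- T2 locks row 2
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;  -- T1 blocks, waiting for T2 (hold and wait)
UPDATE accounts SET balance = balance + 100 WHERE account_id = 1;  -- T2 blocks, waiting for T1 (circular wait → deadlock)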

2. Methods for Handling Deadlocks in DBMS


DBMS uses three main approaches to handle deadlocks:
1. Deadlock Prevention (Avoiding the occurrence of deadlocks)
Ensures that at least one necessary condition does not hold.
Techniques:

Eliminating Hold & Wait – A transaction must request all the resources it needs at once, before it starts.
Eliminating Circular Wait – Impose a fixed ordering on resources so a cycle of waits cannot form.
Allowing Preemption – If a transaction's resource request is denied, it must release all resources it currently holds and restart.

2. Deadlock Avoidance (Allow transactions to execute but avoid circular waits)

Uses additional information (timestamps, priority) to avoid deadlocks.


Techniques:

Wait-Die Scheme – Older transactions can wait, but younger transactions are aborted.
Wound-Wait Scheme – Older transactions can force younger transactions to abort and restart.

3. Deadlock Detection and Recovery (Allow deadlocks and resolve them when detected)

The system detects deadlocks using a Wait-for Graph and then recovers by aborting transactions.

3. Deadlock Detection and Recovery


If deadlock prevention and avoidance fail, DBMS must detect and recover from deadlocks.
3.1 Detection: Wait-for Graph (WFG)

A Wait-for Graph (WFG) is a directed graph where:


• Transactions are nodes.
• Edges (T1 → T2) indicate T1 is waiting for T2 to release a resource.
• A cycle in the graph indicates a deadlock.

Example of Wait-for Graph:


Before Deadlock Detection:

T1 → T2 → T3 → T1 (Cycle exists)

A cycle in the graph means deadlock is present.


After Recovery (T3 is aborted):

T1 → T2 (No cycle)
Deadlock resolved!

3.2 Recovery from Deadlock


Once a deadlock is detected, DBMS must recover by breaking the cycle.
Techniques for Recovery:

Transaction Rollback – Abort one or more transactions to remove deadlock.


Transaction Restart – Restart aborted transactions later.
Resource Preemption – Forcefully take a resource from a transaction and assign it to another.

4. Summary Table

Concept Key Points

Necessary Conditions for Deadlock Mutual Exclusion, Hold & Wait, No Preemption, Circular Wait.

Deadlock Prevention Prevents at least one of the four conditions.

Deadlock Avoidance Uses Wait-Die and Wound-Wait schemes.

Deadlock Detection Uses Wait-for Graph (WFG) to detect cycles.

Deadlock Recovery Resolves deadlock by aborting or rolling back transactions.
