DBMS
4 Data Models
Definition:
A data model is a collection of conceptual tools used to describe:
• Data,
• Relationships among data,
• The semantics (meaning) of data, and
• Constraints on the data.
It acts as a blueprint for the database, providing a structured way to describe its design at the
physical, logical, and view levels.
1. Relational Model:
Advantages:
1. Structural Independence – Changes in database structure don’t affect overall application.
2. Conceptual Simplicity – Easy to design and understand.
3. Powerful Query Capability – SQL can be used to retrieve and manipulate data.
4. Easy Maintenance – Logical design allows better maintainability.
Disadvantages:
1. Requires powerful hardware and large storage.
2. May result in slower processing.
3. Poor design leads to inefficiency.
2. Entity-Relationship (E-R) Model:
Advantages:
1. Simple & Intuitive – Easy to draw and understand.
2. Effective Communication Tool – Helps developers and stakeholders discuss database design.
3. Integrates Well – Can be easily converted into relational model.
4. Flexible Design – Easy to update and adapt.
Disadvantages:
1. Possible Information Loss – During abstraction some data may be omitted.
2. Limited Relationships – Can't capture very complex associations.
3. No Data Manipulation Representation – Only describes structure, not behavior.
4. Lack of Standard Notation – No universal format.
3. Object-Based Data Model:
Advantages:
1. Enriched Modeling – Can closely mimic real-world entities.
2. Reusability – Object-oriented features enable reusable code.
3. Schema Evolution Support – Flexible for changes over time.
4. Performance – Can outperform traditional models in certain applications.
Disadvantages:
1. No Universal Model – Lacks a standard theoretical foundation.
2. Less Experience – Not as widely adopted as relational model.
3. Complex Design – More features can make it harder to design.
4. Semi-Structured Data Model:
Advantages:
1. Schema Flexibility – No fixed schema required.
2. Portable and Adaptable – Can be used across different platforms.
3. Useful for Irregular Data – Ideal for heterogeneous data sets.
Disadvantage:
• Inefficient Queries – Query execution may be slower compared to structured models.
5. Hierarchical Model:
• Organizes data in a tree-like structure with parent-child relationships.
• Each parent can have multiple children, but each child has only one parent.
Example:
Company
|
Department
|
Employee
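The parent-child rule above can be sketched in a few lines of Python. This is only an illustration, not how a hierarchical DBMS is implemented; the class name Node and the method path() are hypothetical names chosen for the sketch.

```python
class Node:
    """One record in a hierarchy; it keeps a single parent reference."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # exactly one parent (None for the root)
        self.children = []        # a parent may have many children
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the access path from the root down to this node."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " > ".join(reversed(names))


company = Node("Company")
department = Node("Department", parent=company)
employee = Node("Employee", parent=department)

print(employee.path())  # Company > Department > Employee
```

Because each node stores only one parent, a many-to-many relationship cannot be expressed without duplicating records.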
Advantages:
1. Simple and fast for 1:N relationships
2. Easy to navigate using predefined paths
3. Data security is better due to fixed structure
Disadvantages:
1. Cannot handle many-to-many relationships
2. Insertion and deletion are complex
3. Changes in structure affect the whole system
6. Network Model:
• Organizes data using graph structure.
• Entities are represented as nodes; relationships as edges.
• A child can have multiple parents, supporting many-to-many relationships.
Example:
      Student
      /     \
 Subject   Project
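A minimal sketch of the graph idea, assuming records are identified by plain strings. The function link() and the sample student, subject, and project names are hypothetical; a real network DBMS uses owner-member sets and pointers rather than Python dictionaries.

```python
from collections import defaultdict

# Edges are stored in both directions: owner -> members and member -> owners.
members = defaultdict(set)
owners = defaultdict(set)


def link(owner, member):
    """Record one edge of the graph between an owner and a member record."""
    members[owner].add(member)
    owners[member].add(owner)


# Hypothetical sample data: two students share one subject.
link("Student:Ram", "Subject:DBMS")
link("Student:Shyam", "Subject:DBMS")
link("Student:Ram", "Project:Library")

print(sorted(owners["Subject:DBMS"]))   # a child with two parents
print(sorted(members["Student:Ram"]))   # a parent with two children
```

Here the subject record has two parents, which the hierarchical model does not allow.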
Advantages:
1. Handles complex relationships
2. More flexible than hierarchical model
3. Easy data access via pointers
Disadvantages:
1. Complex structure and implementation
2. Difficult to manage and maintain
3. Requires knowledge of navigational commands
Conclusion:
Data models are essential tools in the design and implementation of databases. Depending on the use
case (structured, semi-structured, or object-oriented), different models offer different trade-offs in
performance, flexibility, and complexity.
1. Introduction to Relational Database
A Relational Database (RDB) is a type of database that stores data in the form of tables. It follows the
relational model proposed by E.F. Codd in 1970, which represents data as collections of relations.
Each relation is stored as a table composed of rows (tuples) and columns (attributes). These relations are
linked through keys, allowing retrieval of meaningful and connected information.
Definition:
A relational database is a collection of interrelated tables with unique names. It ensures data consistency,
integrity, and avoids redundancy through its structure.
i. Student Table
RollNo | Name  | Phone
001    | Ram   | 9876543210
002    | Shyam | 9123456789
003    | CCC   | 9900112233
• Each row contains details of one student.
• RollNo is the unique identifier for each student.
Relation (Table)
A table with rows and columns that represents data about a particular entity.
Example: Student, Course, and Admission are relations.
Attribute (Column)
Each column represents a property or characteristic of the entity.
Example: RollNo, Name, and Phone are attributes of Student.
Relation Schema
It defines the structure of a relation.
Example:
Student(RollNo INT, Name VARCHAR, Phone BIGINT)
Relation Instance
The actual data stored in a table at a point in time.
It changes over time as we insert/delete data.
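The difference between schema and instance can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: an in-memory SQLite database stands in for the DBMS, roll numbers are stored as integers (so 001 becomes 1), and the helper instance() is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation schema: the structure of the table.
conn.execute(
    "CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name VARCHAR(50), Phone BIGINT)"
)


def instance():
    """Return the current relation instance as a list of tuples."""
    return conn.execute(
        "SELECT RollNo, Name, Phone FROM Student ORDER BY RollNo"
    ).fetchall()


conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "Ram", 9876543210), (2, "Shyam", 9123456789), (3, "CCC", 9900112233)],
)
before = instance()   # instance with three tuples

conn.execute("DELETE FROM Student WHERE RollNo = 3")
after = instance()    # same schema, different instance

print(len(before), len(after))  # 3 2
```

The schema stays the same throughout; only the set of tuples changes.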
Domain
The set of allowed values for an attribute.
Example: The domain of Phone is the set of 10-digit mobile numbers.
Atomicity
Each attribute holds atomic (indivisible) values.
Example: Phone = 9876543210 is atomic; a list of phone numbers is not.
NULL
A special marker indicating unknown or missing value.
NULL is not the same as 0 or a blank string. Example: if a student's phone number is unknown, Phone is NULL.
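A short sketch of how NULL behaves in queries, assuming an in-memory SQLite database. The row with Phone = 0 is a hypothetical value added only to contrast with NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name TEXT, Phone INTEGER)")
conn.execute("INSERT INTO Student VALUES (1, 'Ram', 9876543210)")
conn.execute("INSERT INTO Student VALUES (2, 'Shyam', NULL)")  # phone unknown
conn.execute("INSERT INTO Student VALUES (3, 'CCC', 0)")       # hypothetical zero value

# Comparing with "= NULL" never matches, because the result is unknown, not true.
eq_null = conn.execute("SELECT Name FROM Student WHERE Phone = NULL").fetchall()
# IS NULL is the correct test for a missing value.
is_null = conn.execute("SELECT Name FROM Student WHERE Phone IS NULL").fetchall()
# Zero is an ordinary value and is found by a normal comparison.
is_zero = conn.execute("SELECT Name FROM Student WHERE Phone = 0").fetchall()

print(eq_null, is_null, is_zero)  # [] [('Shyam',)] [('CCC',)]
```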
Super Key
A set of one or more attributes that together uniquely identify each tuple.
Example: {RollNo}, {RollNo, Name}, {RollNo, Phone}
Candidate Key
Minimal super key. No subset of this key can uniquely identify a tuple.
Example: {RollNo}, {Phone} (if unique)
Primary Key
The candidate key chosen to uniquely identify tuples. It cannot be NULL.
Example: In the Student table, RollNo is the primary key.
Alternate Key
Candidate keys that were not chosen as the primary key.
Example: If both RollNo and Phone are unique and RollNo is chosen, then Phone is an alternate key.
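The key definitions above can be checked mechanically on sample data. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the fourth row (a second student named Ram) is invented so that Name is not unique. Note that a key is a property of the schema, so a check on one instance can only rule a key out, not prove it.

```python
from itertools import combinations

ATTRS = ("RollNo", "Name", "Phone")
ROWS = [
    ("001", "Ram", "9876543210"),
    ("002", "Shyam", "9123456789"),
    ("003", "CCC", "9900112233"),
    ("004", "Ram", "9811122233"),   # hypothetical row: duplicate name
]


def is_super_key(attrs):
    """True if no two rows agree on every attribute in attrs."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)


def candidate_keys():
    """Super keys with no proper subset that is also a super key."""
    supers = [
        set(c)
        for n in range(1, len(ATTRS) + 1)
        for c in combinations(ATTRS, n)
        if is_super_key(c)
    ]
    return [k for k in supers if not any(other < k for other in supers)]


print(is_super_key(("RollNo", "Name")))          # True, but not minimal
print([sorted(k) for k in candidate_keys()])     # [['RollNo'], ['Phone']]
```

If RollNo is then picked as the primary key, Phone is left as the alternate key.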
Foreign Key
An attribute in one table that refers to the primary key of another table.
Example: In Admission, RollNo is a foreign key referencing Student.RollNo.
Parent Table → Student
Child Table → Admission
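A minimal sketch of the parent-child rule, assuming SQLite as the DBMS. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; the CourseID value 101 and roll number 99 are hypothetical sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Admission ("
    " RollNo INTEGER REFERENCES Student(RollNo),"   # child column -> parent key
    " CourseID INTEGER)"
)

conn.execute("INSERT INTO Student VALUES (1, 'Ram')")
conn.execute("INSERT INTO Admission VALUES (1, 101)")   # accepted: student 1 exists

try:
    conn.execute("INSERT INTO Admission VALUES (99, 101)")  # no student 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

The child table cannot hold a RollNo that is missing from the parent table.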
8. Diagrammatic Representation
[Student]      [Course]
RollNo         CourseID
Name           Name
Phone          Credits

[Admission]
RollNo    (foreign key ──> Student.RollNo)
CourseID
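The three relations in the diagram can be combined with a join to retrieve connected information. This sketch assumes an in-memory SQLite database; the course row and the admission rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name TEXT, Phone INTEGER);
CREATE TABLE Course (CourseID INTEGER PRIMARY KEY, Name TEXT, Credits INTEGER);
CREATE TABLE Admission (RollNo INTEGER, CourseID INTEGER);
INSERT INTO Student VALUES (1, 'Ram', 9876543210);
INSERT INTO Student VALUES (2, 'Shyam', 9123456789);
INSERT INTO Course VALUES (101, 'DBMS', 4);      -- hypothetical course
INSERT INTO Admission VALUES (1, 101);           -- hypothetical admissions
INSERT INTO Admission VALUES (2, 101);
""")

# Follow Admission.RollNo to Student and Admission.CourseID to Course.
rows = conn.execute("""
    SELECT s.Name, c.Name
    FROM Admission a
    JOIN Student s ON s.RollNo = a.RollNo
    JOIN Course c ON c.CourseID = a.CourseID
    ORDER BY s.RollNo
""").fetchall()

print(rows)  # [('Ram', 'DBMS'), ('Shyam', 'DBMS')]
```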
Conclusion
A Relational Database is the backbone of many modern information systems. It stores and manages data
in structured tables with defined relationships. Concepts such as primary keys, foreign keys, and
normalization help it maintain consistency, limit redundancy, and keep data easy to access.
The relational model is implemented in systems such as MySQL, Oracle, PostgreSQL, and SQL Server, and
forms the basis of many business, academic, and web applications today.
B+ Tree Index Files
Introduction:
A B+ Tree is a balanced tree data structure that maintains sorted data and allows search, sequential
access, insertions, and deletions in logarithmic time. It is widely used in database systems and file systems
for indexing large datasets.
Unlike a B-Tree, a B+ Tree stores all actual data records (or pointers to them) at the leaf level. Internal
nodes contain only keys and child pointers that direct the search.
Characteristics of B+ Tree:
1. Balanced tree: All leaves are at the same level.
2. Dynamic growth/shrinkage: Adapts with insertions and deletions.
3. Minimum 50% occupancy in non-root nodes.
4. Only leaf nodes contain actual data entries.
5. Sequential access is fast due to linked leaves.
6. Supports range queries efficiently.
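Characteristics 5 and 6 can be illustrated with a small sketch of linked leaves. The class Leaf, the function range_scan(), and the sample keys are hypothetical. For brevity the scan starts at the leftmost leaf, whereas a real B+ Tree would first descend from the root to the leaf containing the lower bound.

```python
class Leaf:
    """A B+ Tree leaf: sorted keys plus a link to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None   # right sibling in the leaf chain


def range_scan(first_leaf, low, high):
    """Collect keys in [low, high] by walking the leaf chain left to right."""
    result, leaf = [], first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result      # keys are sorted, so nothing further can match
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


# Hypothetical chain of three leaves.
a, b, c = Leaf([22, 23]), Leaf([24, 28, 29]), Leaf([30, 31, 32])
a.next, b.next = b, c

print(range_scan(a, 23, 30))  # [23, 24, 28, 29, 30]
```

No internal node is visited during the scan, which is why range queries are cheap once the first leaf is found.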
Insertion Algorithm:
Step 1: Find the appropriate leaf node L to insert the key.
Step 2: Insert the key:
• If L has space, insert directly.
• Else, split L:
o Create new node L2
o Redistribute keys between L and L2
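o Copy the middle key up to the parent (the key also stays in the leaf)
o Insert pointer to L2 in the parent
Step 3: If the parent overflows, split it and repeat recursively up to the root. When an internal node splits, the middle key moves up instead of being copied.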
Step 4: If the root splits, the tree's height increases.
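The leaf-level part of Step 2 can be sketched as follows. This is only one possible reading of the steps: the function name insert_into_leaf() and the parameter max_keys are hypothetical, a leaf is modelled as a plain sorted list, and the split point (left half gets the smaller half of the keys) is an assumption. Updating the parent is left out.

```python
import bisect


def insert_into_leaf(leaf, key, max_keys=4):
    """Insert key into a sorted leaf and split it if it overflows.

    Returns (left, right, separator). right and separator are None
    when the leaf had space and no split was needed.
    """
    leaf = list(leaf)
    bisect.insort(leaf, key)          # keep the leaf sorted
    if len(leaf) <= max_keys:
        return leaf, None, None
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    # Leaf split: the separator is COPIED up, so it also stays in the right leaf.
    return left, right, right[0]


print(insert_into_leaf([23, 30, 31], 32))        # ([23, 30, 31, 32], None, None)
print(insert_into_leaf([23, 30, 31, 32], 22))    # ([22, 23], [30, 31, 32], 30)
print(insert_into_leaf([22, 23, 24, 28], 29))    # ([22, 23], [24, 28, 29], 24)
```

With max_keys = 4, the two splits shown send 30 and then 24 up to the parent.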
Example:
Insert keys: 30, 31, 23, 32, 22, 28, 24, 29 (max 4 keys per node)
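1. Insert 30, 31, 23, 32 ➝ one leaf [23, 30, 31, 32] ➝ no split yet
2. Insert 22 ➝ leaf overflows ([22, 23, 30, 31, 32]) ➝ split into [22, 23] and [30, 31, 32] ➝ copy 30 up; root = [30]
3. Insert 28, 24 ➝ left leaf becomes [22, 23, 24, 28]; insert 29 ➝ overflow ➝ split into [22, 23] and [24, 28, 29] ➝ copy 24 up; root = [24, 30]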
(Show step-wise B+ Tree diagrams here.)
Deletion Algorithm:
Step 1: Find the leaf node L containing the key.
Step 2: Delete the key from L:
• If L is still at least half-full, done.
• If underflow occurs:
o Try redistributing keys from a sibling.
o If redistribution not possible, merge with sibling.
Step 3: If merge happens, delete parent key pointing to L or sibling.
Step 4: If root has only one child, reduce height.
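Step 2 of the deletion algorithm can be sketched for a single leaf and its right sibling. This is a simplified illustration under assumptions: the function name delete_from_leaf() and the parameter min_keys are hypothetical, only the right sibling is considered (a fuller version would also try the left sibling), and the parent update from Steps 3 and 4 is only reported, not performed.

```python
def delete_from_leaf(leaf, right_sibling, key, min_keys=2):
    """Delete key from leaf and repair underflow using the right sibling.

    Returns (leaf, right_sibling, action), where action is "none",
    "redistribute", or "merge". After a merge, right_sibling is None and
    the parent must drop the separator key that pointed to it.
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= min_keys:
        return leaf, right_sibling, "none"          # still at least half full
    if len(right_sibling) > min_keys:
        # Borrow the sibling's smallest key; the parent separator must then
        # be changed to the sibling's new first key.
        return leaf + [right_sibling[0]], right_sibling[1:], "redistribute"
    # The sibling has no key to spare: merge both leaves into one.
    return leaf + right_sibling, None, "merge"


print(delete_from_leaf([22, 23, 24], [28, 29], 22))   # ([23, 24], [28, 29], 'none')
print(delete_from_leaf([22, 23], [24, 28, 29], 22))   # ([23, 24], [28, 29], 'redistribute')
print(delete_from_leaf([22, 23], [24, 28], 22))       # ([23, 24, 28], None, 'merge')
```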
Example:
• Keys: 2, 3, 5, 7, 11, 17, 19, 23, 29, 31 (max 3 keys per node)
• Insert: 9, 10, 8
• Delete: 23 ➝ merge leaves
• Delete: 19 ➝ rearrange parent
Diagram Summary:
You can draw these in your answer sheet:
• Structure of a B+ Tree node (with P1, K1, P2, ..., Pn format)
• Linked leaf nodes
• Sample tree before/after insertion and deletion
Conclusion:
B+ Trees are crucial for database indexing due to their efficiency, balance, and support for both point and
range queries. They ensure faster retrieval of data even in massive datasets and adapt well to frequent
insertions and deletions, making them suitable for real-world database applications.
B-Tree Index Files
[AU: Dec-12,14, May-08 | Marks: 16]
Definition:
• A B-tree is a self-balancing search tree used primarily for indexing large amounts of data stored on
disks.
• It is a multi-way search tree (not binary), meaning each node can have more than two children.
• Each search-key value appears only once in the tree, unlike B+ trees, where a key may appear in both
an internal node and a leaf node.
Structure of B-Tree:
• Order m of a B-tree defines the maximum number of children a node can have.
• Each node contains at most m – 1 keys and m pointers (children).
• A node with k keys has k+1 children.
• All leaves appear at the same level (height-balanced).
• Keys are stored in both internal and leaf nodes.
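The structure above leads directly to a search routine. The sketch below is a minimal illustration: the class BTreeNode, the function search(), and the small hand-built tree are hypothetical and not taken from any particular DBMS.

```python
import bisect


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted keys in this node
        self.children = children or []   # empty for a leaf, else len(keys) + 1 children


def search(node, key):
    """Return True if key is stored anywhere in the subtree rooted at node."""
    while True:
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True                  # a key may be found in an internal node
        if not node.children:
            return False                 # reached a leaf without finding the key
        node = node.children[i]          # child i holds keys between keys[i-1] and keys[i]


# Hypothetical tree: one root key with two leaf children.
root = BTreeNode([20], [BTreeNode([10, 15]), BTreeNode([30, 40])])

print(search(root, 20), search(root, 15), search(root, 25))  # True True False
```

A search for 20 stops at the root, while a search for 15 continues down to a leaf, so lookups do not all cost the same number of node visits.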
Node Properties:
• Root: Minimum 1 key.
• Internal/leaf nodes (except root): Minimum ⌈m/2⌉ - 1 keys.
• Maximum m - 1 keys in any node.
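The node properties can be turned into a small calculation. The function name key_bounds() is a hypothetical helper written for this sketch.

```python
import math


def key_bounds(m, is_root=False):
    """Return (min_keys, max_keys) for a node of a B-tree of order m."""
    max_keys = m - 1
    min_keys = 1 if is_root else math.ceil(m / 2) - 1
    return min_keys, max_keys


print(key_bounds(3))                # (1, 2)
print(key_bounds(5))                # (2, 4)
print(key_bounds(5, is_root=True))  # (1, 4)
```

For order 3, every non-root node holds 1 or 2 keys.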
Operations on B-Tree:
1. Insertion Algorithm:
Step 1: Start from the root and find the appropriate leaf node for the key.
Step 2: Insert the key into that leaf in sorted order.
Step 3:
• If the node now holds at most m-1 keys, the insertion is complete.
• Else (overflow), split the node and promote the middle key to the parent.
Step 4:
• Repeat recursively if the parent overflows.
• If the root splits, create a new root; the tree grows in height.
Example:
Construct a B-tree of order 3 for the following values:
20, 10, 30, 15, 12, 40, 50
(Order 3 → Max 2 keys per node)
Step-by-step Construction:
1. Insert 20, 10 → Node = [10, 20]
2. Insert 30 → [10, 20, 30] → Overflow → split →
• Middle key 20 goes up,
• Left child = [10], Right child = [30]
• Root = [20]
3. Insert 15 → goes to [10] → [10, 15]
4. Insert 12 → [10, 12, 15] → Overflow → split →
• Middle key 12 goes up
• Root = [12, 20], children = [10], [15], [30]
5. Insert 40 → goes to [30] → [30, 40]
6. Insert 50 → [30, 40, 50] → Overflow → split →
• Middle key 40 goes up
• Root = [12, 20, 40] → Overflow →
• Middle key 20 goes up → new root = [20]
• Tree now has:
o Root = [20]
o Left child = [12], with children [10], [15]
o Right child = [40], with children [30], [50]
Final B-Tree:
            [20]
          /      \
      [12]        [40]
     /    \      /    \
  [10]   [15]  [30]   [50]
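The construction above can be reproduced with a short program. This is a sketch, not a production B-tree: it assumes distinct keys, inserts first and splits on overflow as in the worked steps, and uses hypothetical names (Node, insert, dump).

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # an empty list means this node is a leaf


def _insert(node, key, order):
    """Insert key below node; return (middle_key, right_node) if node split, else None."""
    i = bisect.bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(i, key)                     # leaf: place key in sorted order
    else:
        promoted = _insert(node.children[i], key, order)
        if promoted is not None:                     # child split: absorb its middle key
            mid_key, right = promoted
            node.keys.insert(i, mid_key)
            node.children.insert(i + 1, right)
    if len(node.keys) <= order - 1:
        return None                                  # no overflow
    mid = len(node.keys) // 2                        # overflow: split around the middle key
    mid_key = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return mid_key, right                            # the middle key moves up


def insert(root, key, order):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key, order)
    if promoted is None:
        return root
    mid_key, right = promoted
    return Node([mid_key], [root, right])            # root split: height grows by one


def dump(node):
    """Nested-list view of the tree: [keys, child, child, ...]."""
    return [node.keys] + [dump(c) for c in node.children]


root = Node()
for k in [20, 10, 30, 15, 12, 40, 50]:
    root = insert(root, k, order=3)

print(dump(root))
# [[20], [[12], [[10]], [[15]]], [[40], [[30]], [[50]]]]
```

The printed structure has root [20], children [12] and [40], and leaves [10], [15], [30], [50], matching the final tree.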
Merits of B-Tree:
1. Efficient searching due to balanced tree structure.
2. No redundant storage of key values.
3. Minimizes disk I/O as tree height remains low.
4. Supports both random access and, through in-order traversal, sequential access.
5. Supports dynamic insertions and deletions without re-balancing entire tree.
Demerits of B-Tree:
1. More complex to implement than binary trees.
2. Slower range queries compared to B+ trees.
3. Keys and data are stored at every level, so the number of disk accesses per lookup is not uniform.
4. Requires more pointer manipulations during insertions and deletions.
Conclusion:
B-Tree is a dynamic, balanced tree structure ideal for indexing in disk-based storage systems. It provides
efficient operations for insertion, deletion, and search, and is widely used in databases and file systems
where large datasets are stored.