DBMS Mid2
The basic form of an SQL query follows a structured syntax to retrieve or manipulate data in a database. The
general structure of a SELECT query is:
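SELECT column_list
FROM table_name
[WHERE condition]
[GROUP BY column_list]
[HAVING group_condition]
[ORDER BY column_list [ASC | DESC]];
(The WHERE, GROUP BY, HAVING, and ORDER BY clauses are optional.)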
Example:
SELECT name, age, city FROM students WHERE age > 18 GROUP BY city HAVING COUNT(*) > 5
ORDER BY name ASC;
This query selects the name, age, and city of students who are older than 18, groups them by city, filters groups with
more than 5 students, and sorts them in ascending order by name.
● This query calculates the total price of each product ordered by multiplying the quantity with the price per
unit.
● Selects employees who either work in HR or have a salary greater than 50,000.
SELECT name FROM employees WHERE (age > 30 AND department = 'Finance') OR NOT salary < 40000;
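Illustrative versions of the two queries described in the bullets above (table and column names are assumed):
SELECT product_name, quantity * price_per_unit AS total_price FROM orders;
SELECT name, department, salary FROM employees WHERE department = 'HR' OR salary > 50000;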
These queries show how arithmetic and logical operators are applied in SQL to filter and manipulate data efficiently.
SQL Functions:
SQL functions are built-in methods used to perform operations on data in a database. These functions can be
categorized into Aggregate Functions, String Functions, Date Functions, Numeric Functions, and Conversion
Functions.
1. Aggregate Functions
Common aggregate functions include COUNT(), SUM(), AVG(), MIN(), and MAX(), which operate on a set of rows and return a single value.
Example:
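A typical aggregate query, assuming an employees table with a salary column:
SELECT COUNT(*) AS total_employees, AVG(salary) AS avg_salary, MAX(salary) AS max_salary FROM employees;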
2. String Functions
Common string functions include UPPER(), LOWER(), LENGTH(), CONCAT(), and SUBSTRING(), which manipulate character data.
Example:
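A typical string-function query (table and column names assumed):
SELECT UPPER(name) AS upper_name, LENGTH(name) AS name_length FROM employees;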
3. Date Functions
Common date functions include CURRENT_DATE/NOW(), YEAR(), MONTH(), and DATEDIFF(); exact names vary slightly between database systems.
Example:
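A typical date-function query using MySQL-style functions on an assumed orders table:
SELECT order_id, YEAR(order_date) AS order_year, DATEDIFF(CURRENT_DATE, order_date) AS days_since_order FROM orders;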
4. Numeric Functions
Common numeric functions include ROUND(), CEIL(), FLOOR(), ABS(), and MOD().
Example:
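A typical numeric-function query (column names assumed):
SELECT ROUND(salary / 12, 2) AS monthly_salary, FLOOR(salary / 1000) AS salary_band FROM employees;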
5. Conversion Functions
Common conversion functions include CAST() and CONVERT(), which change a value from one data type to another.
Example:
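A typical conversion query using the standard CAST function (column names assumed):
SELECT CAST(salary AS DECIMAL(10,2)) AS salary_decimal FROM employees;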
3.Create tables using unique, primary key, check and foreign key constraints.
-- Illustrative reconstruction; column names other than age and credits are assumed.
CREATE TABLE students (
    student_id INT PRIMARY KEY,                   -- Primary Key Constraint
    email VARCHAR(100) UNIQUE,                    -- Unique Constraint
    age INT CHECK (age >= 18)                     -- Check Constraint (age must be 18 or above)
);
CREATE TABLE courses (
    course_id INT PRIMARY KEY,                    -- Primary Key Constraint
    credits INT CHECK (credits BETWEEN 1 AND 10)  -- Check Constraint (valid credit range)
);
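To illustrate the foreign key constraint as well, a third table can reference the two above (table and column names follow the sketch above):
CREATE TABLE enrollments (
    enrollment_id INT PRIMARY KEY,                              -- Primary Key Constraint
    student_id INT,
    course_id INT,
    FOREIGN KEY (student_id) REFERENCES students(student_id),   -- Foreign Key Constraint
    FOREIGN KEY (course_id) REFERENCES courses(course_id)       -- Foreign Key Constraint
);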
Constraint Description
PRIMARY KEY — uniquely identifies each row and does not allow NULL values.
UNIQUE — ensures all values in a column are distinct.
CHECK — restricts a column to values that satisfy a given condition.
FOREIGN KEY — enforces referential integrity by referencing a key in another table.
A nested query (also known as a subquery) is a query that is placed inside another SQL query. The inner query is
executed first, and its result is used by the outer query.
A simple nested query runs independently, meaning the inner query executes first, and then its result is used by
the outer query.
● The inner query SELECT MAX(marks) FROM students; returns the highest marks.
● The outer query then selects students whose marks match the result.
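Put together, the full query could look like this (column names assumed):
SELECT name, marks FROM students WHERE marks = (SELECT MAX(marks) FROM students);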
A correlated nested query is dependent on the outer query, meaning the inner query runs for each row processed
by the outer query.
Example: Find employees earning more than the average salary of their department
SELECT e1.name, e1.salary FROM employees e1
WHERE e1.salary > (SELECT AVG(salary) FROM employees e2 WHERE e1.department_id = e2.department_id);
● The inner query calculates the average salary for the specific department (e2.department_id).
● The outer query checks which employees (e1.salary) earn more than that department’s average.
Example: Find students who have scored above the average marks of their class
SELECT s1.name, s1.marks FROM students s1
WHERE s1.marks > (SELECT AVG(marks) FROM students s2 WHERE s1.class_id = s2.class_id);
● The inner query calculates the average marks for the student's class.
● The outer query filters students who scored above their class average.
Execution: in a simple nested query the inner query runs once, before the outer query; in a correlated nested query the inner query runs for each row of the outer query.
Use Case: a simple nested query suits filtering based on a single value from the inner query; a correlated nested query suits filtering based on row-wise conditions.
Conclusion
● Use simple nested queries when you need a single value from the inner query.
● Use correlated nested queries when the inner query depends on each row of the outer query
Set operators in SQL are used to combine the results of two or more SELECT queries. The major set operators
in SQL are:
1. UNION
2. UNION ALL
3. INTERSECT
4. EXCEPT (or MINUS in some databases)
1. UNION
The UNION operator combines results from two queries and removes duplicate rows.
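For example (table names as used elsewhere in this section):
SELECT name FROM students UNION SELECT name FROM teachers;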
● If a person is both a student and a teacher, their name appears only once.
2. UNION ALL
The UNION ALL operator works like UNION, but does not remove duplicates.
SELECT name FROM students UNION ALL SELECT name FROM teachers;
3. INTERSECT
The INTERSECT operator returns only common rows between two queries.
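For example (same tables as above):
SELECT name FROM students INTERSECT SELECT name FROM teachers;
● Returns only people who appear in both tables.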
4. EXCEPT (or MINUS)
The EXCEPT operator returns rows from the first query that are not in the second query.
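For example (same tables as above):
SELECT name FROM students EXCEPT SELECT name FROM teachers;
● Returns students who are not also teachers.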
Aggregate functions in SQL perform calculations on a set of values and return a single result. These functions are
commonly used in SELECT statements, often with the GROUP BY clause.
The COUNT() function returns the total number of rows that match a condition.
● Groups employees by department and calculates average salary for each department.
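Illustrative queries for both functions (column names assumed):
SELECT COUNT(*) FROM employees WHERE department = 'IT';
SELECT department, AVG(salary) AS avg_salary FROM employees GROUP BY department;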
SQL provides powerful clauses like GROUP BY, HAVING, and ORDER BY to organize and filter query results.
1. GROUP BY Clause
The GROUP BY clause is used to group rows with the same values in one or more columns and perform
aggregate functions like COUNT(), SUM(), AVG(), etc.
SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
GROUP BY column_name;
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
● Groups employees by department and counts the number of employees in each department.
2. HAVING Clause
The HAVING clause is used to filter grouped results based on an aggregate function. It is similar to WHERE, but
WHERE cannot be used with aggregate functions.
SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
GROUP BY column_name
HAVING condition;
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;  -- condition illustrative
3. ORDER BY Clause
The ORDER BY clause is used to sort the result set in ascending (ASC) or descending (DESC) order.
SELECT column_name
FROM table_name
ORDER BY column_name [ASC | DESC];
Example:
SELECT name, salary
FROM employees
ORDER BY salary DESC;  -- sort column illustrative
Example: Sort employees first by department (A-Z), then by salary (high to low)
SELECT name, department, salary FROM employees ORDER BY department ASC, salary DESC;
Example: Find departments with more than 5 employees and sort them by total employees (descending)
SELECT department, COUNT(*) AS total_employees FROM employees
GROUP BY department
HAVING COUNT(*) > 5
ORDER BY total_employees DESC;
Clause Purpose
GROUP BY Groups rows with the same values and applies aggregate functions to each group.
HAVING Filters grouped results based on a condition over an aggregate function.
ORDER BY Sorts the result set in ascending (ASC) or descending (DESC) order.
Joins in SQL are used to combine rows from two or more tables based on a related column between them. They
help retrieve meaningful information from multiple tables efficiently.
Types of Joins
1. INNER JOIN
○ Returns only the matching records from both tables.
○ Non-matching rows are excluded.
Syntax:
SELECT column_names FROM table1 INNER JOIN table2 ON table1.common_column = table2.common_column;
Example:
SELECT employees.name, departments.dept_name FROM employees INNER JOIN departments ON
employees.dept_id = departments.dept_id;
2. LEFT JOIN (LEFT OUTER JOIN)
○ Returns all records from the left table and the matching records from the right table.
○ If no match is found, NULL is returned for columns from the right table.
Syntax:
SELECT column_names FROM table1 LEFT JOIN table2 ON table1.common_column = table2.common_column;
Example:
SELECT employees.name, departments.dept_name FROM employees LEFT JOIN departments
ON employees.dept_id = departments.dept_id;
3. RIGHT JOIN (RIGHT OUTER JOIN)
○ Returns all records from the right table and matching records from the left table.
○ If no match is found, NULL is returned for columns from the left table.
Syntax:
SELECT column_names FROM table1 RIGHT JOIN table2 ON table1.common_column = table2.common_column;
Example:
SELECT employees.name, departments.dept_name FROM employees RIGHT JOIN departments
ON employees.dept_id = departments.dept_id;
4. FULL JOIN (FULL OUTER JOIN)
○ Returns all records from both tables, combining matching rows where possible.
○ If no match is found, NULL is returned for the columns of the table without a matching row.
Syntax:
SELECT column_names FROM table1 FULL JOIN table2 ON table1.common_column = table2.common_column;
Example:
SELECT employees.name, departments.dept_name FROM employees FULL JOIN departments ON
employees.dept_id = departments.dept_id;
5. CROSS JOIN
○ Produces a Cartesian product, where each row from the first table is combined with every row from
the second table.
Syntax:
SELECT column_names FROM table1 CROSS JOIN table2;
Example:
SELECT employees.name, departments.dept_name FROM employees CROSS JOIN departments;
6. SELF JOIN
○ A table joins itself to compare rows within the same table.
○ Uses an alias to differentiate table instances.
Syntax:
SELECT A.column_name, B.column_name FROM table_name A, table_name B WHERE condition;
Example:
SELECT A.employee_name, B.employee_name AS Manager FROM employees A, employees B
WHERE A.manager_id = B.employee_id;  -- assumes a manager_id column
A NULL value in SQL represents missing or unknown data. While NULL values help handle incomplete data, they
introduce several challenges:
a) Issues in Arithmetic Calculations
If bonus is NULL, an expression such as salary + bonus makes total_income NULL as well, which can cause incorrect calculations.
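A common workaround is to substitute a default value with COALESCE (column names assumed):
SELECT name, salary + COALESCE(bonus, 0) AS total_income FROM employees;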
b) Issues in Comparisons
Comparisons with NULL (for example, salary = NULL) evaluate to UNKNOWN rather than TRUE, so rows with NULL values are silently excluded; IS NULL and IS NOT NULL must be used instead.
c) Issues in Aggregate Functions
If some employees have NULL salaries, they are excluded from the calculation, potentially skewing results.
● When using joins, NULL values in key columns can prevent proper matching.
Outer joins (LEFT, RIGHT, FULL) return unmatched rows with NULL values from one or both tables. While useful,
they can introduce issues:
● When there is no match, columns from the unmatched table return NULL.
c) Performance Overhead
● FULL OUTER JOIN can be slow on large datasets because it returns all records from both tables, filling
unmatched rows with NULL.
● When dealing with NULLs in outer joins, additional conditions (COALESCE(), CASE, IS NULL) are often
needed to handle missing values properly.
Example: SELECT name, COALESCE(dept_name, 'No Department') AS department
FROM employees LEFT JOIN departments ON employees.dept_id = departments.dept_id;  -- join condition as in the earlier examples
A view in SQL is a virtual table based on the result of a SQL query. It does not store data physically but provides a
stored query that can be executed when needed. Views help in simplifying complex queries, improving security, and
maintaining data abstraction.
Creating a View
Syntax
CREATE VIEW view_name AS SELECT column1, column2, … FROM table_name WHERE condition;
Example
We create a view that shows only the employees working in the IT department:
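A view of this kind might be defined as follows (matching the IT_Employees view used later in this answer):
CREATE VIEW IT_Employees AS SELECT emp_id, name, salary FROM employees WHERE department = 'IT';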
Types of Views
1. Simple Views
● Based on a single table and do not use joins, grouping, or aggregate functions; they can generally be updated.
Example:
CREATE VIEW High_Salary AS SELECT name, salary FROM employees WHERE salary > 55000;
2. Complex Views
● Built from multiple tables or use joins, grouping, or aggregate functions; they generally cannot be updated directly.
Example:
CREATE VIEW Employee_Department AS
SELECT e.name, e.salary, d.dept_name
FROM employees e JOIN departments d ON e.dept_id = d.dept_id;  -- columns illustrative
3. Inline Views
● A subquery placed in the FROM clause of another query, acting as a temporary, unnamed view.
Example:
SELECT AVG(salary) FROM
(SELECT * FROM employees WHERE department = 'IT') AS it_employees;  -- inner query illustrative
4. Materialized Views
● Unlike regular views, materialized views store data physically for better performance.
● Used for large queries that don’t need frequent updates.
● Requires manual refresh.
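For example, in databases that support them (PostgreSQL/Oracle-style syntax):
CREATE MATERIALIZED VIEW Department_Salaries AS
SELECT department, AVG(salary) AS avg_salary FROM employees GROUP BY department;
REFRESH MATERIALIZED VIEW Department_Salaries;  -- manual refresh (PostgreSQL)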
Modifying Views
Updating a View
Example: CREATE OR REPLACE VIEW IT_Employees AS SELECT emp_id, name, salary, department FROM
employees WHERE department = 'IT';
Deleting a View
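A view is removed with DROP VIEW, for example:
DROP VIEW IT_Employees;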
Advantages of Views
1. Data Security
2. Simplifies Complex Queries
Example:
CREATE VIEW SalesReport AS SELECT sales.date, customers.name, sales.amount FROM sales
JOIN customers ON sales.customer_id = customers.customer_id;  -- join column assumed
3. Data Consistency
4. Reduces Redundancy
Limitations of Views
1. Performance Issues
○ Since views are virtual tables, each query runs on the base table.
○ For better performance, materialized views are preferred.
2. Cannot Modify Certain Views
○ Complex views with JOIN, GROUP BY, or DISTINCT cannot be updated directly.
Example: UPDATE Employee_Department SET salary = 70000 WHERE name = 'Alice'; -- This might fail if the view
has a JOIN
Views are a powerful SQL feature that enhance security, simplify queries, and improve data abstraction. While they
offer advantages in query efficiency and user access control, they also come with limitations like update restrictions
and performance concerns. Understanding how and when to use views is essential for efficient database
management.
8.Write SQL queries using the following relational database. Sailors (sid:integer,
sname:string, rating:integer, age:real),Boats (bid:integer, bname:string,
color:string),Reserves (sid:integer, bid:integer, day:date)
a)
SELECT AVG(age) AS avg_age FROM Sailors
WHERE rating = 10;
● This query calculates the average age of sailors who have a rating of 10.
b)
SELECT sname, age
FROM Sailors
WHERE age = (SELECT MAX(age) FROM Sailors);
● This query first finds the maximum age from the Sailors table and retrieves the sname and age of the
sailor(s) with that age.
c)
SELECT sid, sname, rating FROM Sailors
WHERE rating = (SELECT MAX(rating) FROM Sailors);
● This query finds the highest rating using MAX(rating) and retrieves all sailors who have that rating.
d)
SELECT sname FROM Sailors
WHERE age > (SELECT MAX(age) FROM Sailors WHERE rating = 10);
● The subquery finds the maximum age among sailors with a rating of 10.
● The outer query retrieves the names of sailors whose age is greater than that.
a)
SELECT DISTINCT s.sname FROM Sailors s
JOIN Reserves r ON s.sid = r.sid WHERE r.bid = 100;
● This query finds sailors (sname) who have reserved the boat with bid = 100 using a JOIN between Sailors
and Reserves.
b)
SELECT DISTINCT s.sname FROM Sailors s
JOIN Reserves r ON s.sid = r.sid
JOIN Boats b ON r.bid = b.bid
WHERE b.color IN ('Red', 'Green');
● This query finds sailors (sname) who have reserved boats that are either red or green, using a JOIN
between Sailors, Reserves, and Boats.
c)
SELECT DISTINCT b.color FROM Boats b
JOIN Reserves r ON b.bid = r.bid
JOIN Sailors s ON r.sid = s.sid
WHERE s.sname = 'Anil';
● This query retrieves all distinct colors of boats reserved by the sailor Anil.
d)
SELECT DISTINCT s.sid FROM Sailors s WHERE s.age > 20 AND s.sid NOT IN (
SELECT r.sid FROM Reserves r JOIN Boats b ON r.bid = b.bid
WHERE b.color = 'Red');
● This query selects sailors (sid) whose age > 20 and who have not reserved a red boat.
● The subquery retrieves sid of sailors who have reserved a red boat, and NOT IN ensures exclusion.
e)
SELECT s.sname FROM Sailors s WHERE NOT EXISTS (
SELECT b.bid
FROM Boats b
WHERE NOT EXISTS (
SELECT r.bid
FROM Reserves r
WHERE r.bid = b.bid AND r.sid = s.sid ) );
● This query uses double NOT EXISTS to find sailors who have reserved every boat in the Boats table.
● The inner query ensures that for each boat in Boats, there is at least one reservation by the sailor.
a) Find the names of students who are enrolled in a class taught by Harish.
b) Find the age of oldest student.
c) Find the names of students enrolled in History.
d) Find the department of faculty whose name starts with ‘s’.
e) Find the names of students who are enrolled in a class and age is over 17 taught by
Harish.
a)
SELECT DISTINCT s.sname FROM Students s
JOIN Enrolled e ON s.cid = e.cid
JOIN Faculty f ON e.fid = f.fid WHERE f.fname = 'Harish';
● This query finds students (sname) who are enrolled in a class taught by Harish using a JOIN between
Students, Enrolled, and Faculty.
b)
SELECT MAX(age) AS oldest_age FROM Students;
● This query retrieves the maximum age from the Students table.
c)
SELECT DISTINCT s.sname FROM Students s
JOIN Enrolled e ON s.cid = e.cid
WHERE e.cname = 'History';
● This query finds students (sname) who are enrolled in History by checking the cname column in Enrolled.
d)
SELECT DISTINCT dept FROM Faculty WHERE fname LIKE 'S%';
● This query retrieves departments of faculty members whose names start with ‘S’, using LIKE 'S%'.
e)
SELECT DISTINCT s.sname FROM Students s
JOIN Enrolled e ON s.cid = e.cid
JOIN Faculty f ON e.fid = f.fid
WHERE f.fname = 'Harish' AND s.age > 17;
● This query retrieves students (sname) who are: Enrolled in a class ;Age is greater than 17 ;Taught by
Harish
i)
SELECT DISTINCT b.color FROM Boats b
JOIN Reserves r ON b.bid = r.bid JOIN Sailors s ON r.sid = s.sid
WHERE s.sname = 'Anil';
● This query finds distinct colors of boats reserved by the sailor Anil using JOINs.
ii)
SELECT sname FROM Sailors WHERE age > 20;
● This query retrieves sailors’ names whose age is greater than 20.
iii)
SELECT DISTINCT s.sname FROM Sailors s JOIN Reserves r ON s.sid = r.sid
JOIN Boats b ON r.bid = b.bid WHERE b.color IN ('Red', 'Green');
● This query retrieves sailors (sname) who have reserved either a red or green boat.
iv)
SELECT DISTINCT s.sname FROM Sailors s WHERE s.sid IN (
SELECT r1.sid FROM Reserves r1 JOIN Boats b1 ON r1.bid = b1.bid
WHERE b1.color = 'Red')
AND s.sid IN ( SELECT r2.sid FROM Reserves r2 JOIN Boats b2 ON r2.bid = b2.bid WHERE b2.color = 'Green');
● This query ensures that a sailor has reserved both a Red boat AND a Green boat by using two
subqueries.
v)
SELECT s.sname FROM Sailors s
WHERE NOT EXISTS ( SELECT b.bid FROM Boats b WHERE NOT EXISTS ( SELECT r.bid
FROM Reserves r
WHERE r.bid = b.bid AND r.sid = s.sid )
);
● This query finds sailors who have reserved every boat in the Boats table using double NOT EXISTS.
UNIT 4
1.Explain about the problems caused by Redundancy
Redundancy in databases and information systems refers to the unnecessary duplication of data, which can lead to several issues in data management and integrity. The key problems caused by redundancy include:
● Wasted storage space, since the same data is stored in multiple places.
● Update anomalies: when duplicated data is changed in one place but not in the others, the database becomes inconsistent.
● Insertion anomalies: certain facts cannot be recorded without also repeating unrelated data.
● Deletion anomalies: deleting one fact may unintentionally remove other, still-needed information.
● Higher maintenance cost and slower queries caused by the larger volume of data.
Solution: Normalization
To avoid redundancy and its associated problems, database normalization techniques are used. Normalization
involves organizing data into related tables to minimize duplication while ensuring data integrity and efficiency.
By reducing redundancy, organizations can improve data consistency, enhance performance, and lower storage and
maintenance costs.
In database design, decomposition refers to breaking a large relation (table) into smaller relations to remove
redundancy and ensure normalization. While decomposition helps in eliminating redundancy and anomalies, it can
also introduce several challenges:
● A decomposition must be lossless to ensure that no data is lost when splitting relations.
● If the decomposition is not lossless, it may become impossible to reconstruct the original relation correctly.
● Solution: Ensure that the common attribute (joining key) in decomposed relations maintains sufficient
information to reconstruct the original table.
● Functional dependencies define relationships between attributes in a table. When a table is decomposed,
some dependencies may be lost.
● If dependencies are not preserved, queries may require complex joins to retrieve missing data.
● Solution: Choose a decomposition that preserves all functional dependencies to maintain database integrity.
● Decomposition may require frequent join operations to retrieve data, leading to performance issues.
● A poorly designed decomposition can slow down queries, especially for large datasets.
● Solution: Ensure decomposition balances normalization and efficiency by minimizing unnecessary joins.
● Splitting a relation into multiple tables may result in loss of context or meaning.
● Users may find it difficult to understand how different decomposed tables relate to each other.
● Solution: Maintain clear relationships between decomposed tables and ensure proper documentation.
Conclusion
Decomposition is necessary for database normalization but must be performed carefully to avoid issues like loss of
data, dependency violations, and performance degradation. A well-structured database design ensures that
decomposition achieves both efficiency and data integrity while minimizing unnecessary complexity.
A functional dependency (FD) is a constraint between two sets of attributes in a relational database. It describes
how the value of one attribute (or a set of attributes) determines the value of another attribute.
Notation
X → Y
Where:
● X is the determinant (the attribute or set of attributes on the left-hand side).
● Y is the dependent attribute: whenever two tuples agree on X, they must also agree on Y.
Example:
In a STUDENT table, if Student_ID uniquely determines Student_Name, then: {Student_ID}-->{Student_Name}
This means that for each Student_ID, there is only one Student_Name.
Conclusion
Functional dependencies play a vital role in relational database design by ensuring data consistency, reducing
redundancy, and improving efficiency. Understanding FDs helps in normalization and structuring optimized
database schemas.
In database normalization, different normal forms (NF) help reduce redundancy and improve data integrity. The
main normal forms are:
1. First Normal Form (1NF)
A table is in 1NF if every column contains atomic (indivisible) values and there are no repeating groups.
2. Second Normal Form (2NF)
A table is in 2NF if:
● It is already in 1NF.
● No partial dependency exists (i.e., non-key attributes should depend on the whole primary key, not just a
part of it).
Here, Student_Name depends only on Student_ID, not on the full (Student_ID, Course_ID) key.
Student Table
Student_ID Student_Name
101 Alice
102 Bob
Course Table
Course_ID Course_Name
C1 Math
C2 Physics
Enrollment Table
Student_ID Course_ID
101 C1
102 C2
3. Third Normal Form (3NF)
A table is in 3NF if:
● It is in 2NF.
● No transitive dependency exists (i.e., non-key attributes should depend only on the primary key, not on
other non-key attributes).
Example (Before 3NF - Transitive Dependency): a table Student(Student_ID, Student_Name, Department, HOD_Name) in which HOD_Name depends on Department rather than directly on the key; splitting out a separate Department table removes the transitive dependency (see the detailed 3NF example in question 5(a)).
4. Boyce-Codd Normal Form (BCNF)
A table is in BCNF if:
● It is in 3NF.
● Every determinant (a field that determines another field) is a candidate key.
Example (Before BCNF):
Teacher_ID  Course    Room
T1          Math      R1
T2          Physics   R2
T1          Physics   R3
After decomposition:
Course_Room Table
Course    Room
Math      R1
Physics   R2
Physics   R3
Teacher_Course Table
Teacher_ID  Course
T1          Math
T2          Physics
T1          Physics
5. Fourth Normal Form (4NF)
A table is in 4NF if:
● It is in BCNF.
● No multi-valued dependencies exist.
Here, Course and Hobby are independent of each other but are related to Student_ID.
Student_Hobby Table
Student_ID   Hobby
101          Painting
Student_Course Table
Student_ID   Course
101          Math
6. Fifth Normal Form (5NF)
A table is in 5NF if:
● It is in 4NF.
● It cannot be decomposed further without losing information (i.e., no join dependency).
● The Subjects column has multiple values (Math, Science, etc.), which violates 1NF.
Converting to 1NF
Now, each column contains atomic values, and there are no repeating groups.
A table is in 2NF if:
1. It is in 1NF.
2. No Partial Dependency: A non-prime attribute (an attribute that is not part of the primary key) must depend
on the whole primary key, not just a part of it.
Converting to 2NF
To remove partial dependency, split the table into two:
Student_Subject Table
Student_ID   Subject
101          Math
101          Science
102          English
103          Math
103          English
Subject_Teacher Table
Subject    Teacher
Math       Mr. A
Science    Mr. B
English    Mr. C
Now, every non-key column depends fully on the primary key, ensuring 2NF.
Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF) both eliminate transitive dependencies, but
BCNF is stricter than 3NF.
A table is in 3NF if:
1. It is in 2NF.
2. No Transitive Dependency: Every non-prime attribute (a column not part of the primary key) must depend
only on the primary key.
3NF Tables:
1. Student_Course Table
Student_ID Course_ID
101 CSE101
102 CSE102
103 CSE101
2. Course Table
Course_ID Course_Name
CSE101 DBMS
CSE102 OS
A table is in BCNF if:
1. It is in 3NF.
2. For every functional dependency (X → Y), X should be a superkey.
Key Difference: In 3NF, a functional dependency X → Y is allowed even when X is not a superkey, as long as Y is a prime attribute (part of some candidate key). BCNF removes even this possibility: every determinant must be a superkey.
Example (Before BCNF):
Professor   Department   Course
Prof. A     CS           DBMS
Prof. B     CS           OS
Prof. C     EE           Circuits
Converting to BCNF
Prof_Dept Table
Professor Department
Prof. A CS
Prof. B CS
Prof. C EE
Dept_Course Table
Department Course
CS DBMS
CS OS
EE Circuits
5(a).Explain 3NF
Third Normal Form (3NF) is a database normalization form that aims to reduce redundancy and dependency by
ensuring that every non-key attribute is only dependent on the primary key.
Transitive Dependency:
A transitive dependency occurs when a non-key attribute depends on another non-key attribute instead of
depending directly on the primary key.
Here, the HOD_Name depends on the Department, not directly on Student_ID. This is a transitive dependency.
Converting to 3NF
Student Table (Student_ID, Student_Name, Department)
Department Table (Department, HOD_Name)
Now, HOD_Name depends only on Department, and all attributes in the Student table depend only on Student_ID.
This ensures 3NF.
Benefits of 3NF
● Eliminates transitive dependencies, reducing redundancy.
● Prevents update, insertion, and deletion anomalies.
● Keeps each fact stored in exactly one place, improving data integrity.
5(b).Consider the relation SUPPLIER (SNAME, STREET, CITY, STATE, TAX) with key on
SNAME and FD: STATE →TAX. Decompose the relation SUPPLIER into 3NF Relations.
Here, we observe:
● SNAME → STREET, CITY, STATE, TAX (Since SNAME is the primary key)
● STATE → TAX (This creates a transitive dependency because TAX depends on STATE, not directly on
SNAME.)
Since TAX is indirectly dependent on SNAME through STATE, this violates 3NF.
The relation is decomposed into two relations:
1. SUPPLIER Table (Stores supplier details) (SNAME, STREET, CITY, STATE)
2. STATE_TAX Table (Stores tax information for each state) (STATE, TAX)
● In SUPPLIER (SNAME, STREET, CITY, STATE), all non-key attributes (STREET, CITY, STATE) depend
only on SNAME, making it 3NF compliant.
● In STATE_TAX (STATE, TAX), TAX depends only on STATE, and STATE is the primary key, so it is also
in 3NF.
This decomposition ensures that the database is in Third Normal Form (3NF).
When decomposing a relation into smaller relations, we aim to maintain two important properties:
1. Lossless-Join Decomposition
A decomposition is lossless if we can reconstruct the original relation by joining the decomposed relations
without any loss of data.
Definition:
If a relation R is decomposed into R1 and R2, the decomposition is lossless when joining the parts reproduces exactly the original relation: R1 ⋈ R2 = R.
This ensures that no extra tuples are introduced, and no information is lost.
A decomposition is guaranteed to be lossless if the common attributes form a superkey of at least one part:
R1 ∩ R2 → R1 or R1 ∩ R2 → R2
This means that the common attributes between R1 and R2 must act as a superkey in at least one of the decomposed relations.
Example:
Consider R(A, B, C) with functional dependency A → B. If we decompose it into:
1. R1(A, B)
2. R2(A, C)
The common attribute is A, and since A → B, A acts as a superkey for R1.
Thus, the decomposition is lossless.
2. Dependency-Preserving Decomposition
A decomposition is dependency-preserving if all functional dependencies (FDs) in the original relation are
maintained in at least one decomposed relation.
Definition:
If a relation R with a set of functional dependencies F is decomposed into R1, R2, …, Rn, where Fi is the set of dependencies of F that can be checked using only the attributes of Ri, then the decomposition is dependency-preserving if:
(F1 ∪ F2 ∪ ⋯ ∪ Fn)+ = F+
Example:
Consider R(A, B, C) with functional dependencies A → B and B → C. Decomposing it into:
1. R1(A, B)
2. R2(B, C)
Here:
● A → B is preserved in R1
● B → C is preserved in R2
Property Purpose
Lossless-Join Guarantees the original relation can be reconstructed exactly by joining the decomposed relations.
Dependency-Preserving Guarantees every functional dependency can still be enforced without joining the decomposed relations.
Ideal Decomposition: one that is both lossless and dependency-preserving.
However, sometimes we must choose between them, especially in higher normal forms.
In database theory, a multivalued dependency (MVD) is a constraint that specifies that the presence of certain
tuples in a relation implies the presence of other tuples. Here's a breakdown:
Core Concepts:
● What it is:
○ An MVD exists when having a value for one attribute determines a set of values for another attribute,
and this set of values is independent of the values of other attributes in the relation.
○ It's a constraint between sets of attributes in a relation.
● Key Distinction from Functional Dependency (FD):
○ While an FD states that one attribute (or set of attributes) determines a single value of another
attribute, an MVD deals with situations where one attribute determines multiple independent values of
another attribute.
● Role in Normalization:
○ MVDs are crucial in database normalization, specifically in the context of Fourth Normal Form (4NF).
4NF aims to eliminate redundancy caused by MVDs.
Imagine a database for a course. A course can have multiple assigned textbooks and multiple assigned instructors.
These two sets of information (textbooks and instructors) are independent of each other.
Key Points:
● For each course, the set of textbooks is independent of the set of instructors, so Course ->-> Textbook and Course ->-> Instructor hold.
● Storing both independent facts in one table forces every textbook–instructor combination to be repeated.
Fourth Normal Form (4NF) is a level of database normalization that builds upon Boyce-Codd Normal Form (BCNF).
Its primary goal is to eliminate redundancies caused by multivalued dependencies. Here's a breakdown:
Key Concepts:
● Foundation:
○ A relation (table) is in 4NF if it is already in BCNF.
○ It must have no non-trivial multivalued dependencies other than those whose determinant is a candidate key (superkey).
● Multivalued Dependencies (MVDs):
○ These occur when an attribute determines multiple independent values for another attribute.
○ 4NF focuses on removing these dependencies to reduce redundancy.
● Purpose:
○ To further refine database design and prevent anomalies that can arise from MVDs.
Example:
Let's consider a scenario involving students, their hobbies, and the courses they are enrolled in.
○ A student can have multiple hobbies, and a student can enroll in multiple courses.
○ The hobbies and courses are independent of each other.
○ This leads to redundant data. For example, the student "S1" and their hobbies and courses are
repeated.
○ This table has multi valued dependencies. StudentID ->-> Hobby and StudentID ->-> Course.
● Solution (4NF): decompose into two tables, one for each independent multivalued fact.
■ StudentHobbies Table: (StudentID, Hobby)
■ StudentCourses Table: (StudentID, Course)
● Result:
○ Each table now contains only one multivalued dependency, which is based on the candidate key
(StudentID).
○ Redundancy is eliminated, and data integrity is improved.
In essence:
4NF ensures that independent multivalued facts are stored in separate tables, preventing unnecessary repetition of
data
Surrogate keys play a crucial role in database design, particularly in data warehousing, by providing a stable and
efficient way to identify records. Here's a breakdown of their uses:
● Stability:
○ Natural keys (keys derived from real-world data) can change. For example, a customer's name or
address might be used as a natural key, but these can be updated. Surrogate keys, once assigned,
remain constant, ensuring data integrity.
● Simplicity and Performance:
○ Surrogate keys are typically simple data types, such as integers. This makes them efficient for
indexing, joining tables, and performing queries, leading to improved database performance.
● Handling Complex Natural Keys:
○ Natural keys can be complex, involving multiple columns or lengthy strings. Surrogate keys provide a
single, concise identifier, simplifying database operations.
● Data Integration:
○ When integrating data from multiple sources, natural keys may conflict or have inconsistencies.
Surrogate keys provide a unified and consistent way to identify records across different systems.
● Historical Data Tracking:
○ In data warehousing, it's essential to track changes over time. Surrogate keys allow you to maintain
historical records even when natural key values change.
● Data Anonymization:
○ Surrogate keys can replace sensitive natural keys, such as social security numbers, to protect privacy
while still maintaining the ability to uniquely identify records.
● Decoupling from Business Logic:
○ By using surrogate keys, the database structure is decoupled from the business logic. This means
that changes to the business rules will have less of an impact on the database structure.
In essence, surrogate keys provide a reliable and efficient way to manage data relationships, especially in complex
database environments.
To understand 5NF, we must first grasp the concept of join dependencies. Here's a breakdown:
Join Dependency:
● Definition:
○ A join dependency (JD) exists when a relation can be reconstructed by joining certain of its
projections. In simpler terms, it means that a table can be losslessly decomposed into multiple
smaller tables, and those smaller tables can be joined back together to recreate the original table.
○ It's a generalization of multivalued dependencies.
● Purpose:
○ Join dependencies highlight complex relationships where data is dependent on combinations of
attributes, rather than just single attributes.
Fifth Normal Form (5NF):
● Definition:
○ A relation is in 5NF if it is in 4NF and every join dependency in it is implied by the candidate keys.
○ Essentially, 5NF aims to eliminate redundancy that cannot be removed by 4NF, focusing on very complex join dependencies.
○ It is also known as Project-Join Normal Form (PJ/NF).
● Purpose:
○ 5NF addresses situations where breaking down a table into smaller tables is necessary to avoid
redundancy, but those smaller tables must be able to be rejoined without losing any information. It is
the final normal form that is often discussed.
● Relationship with Join Dependency:
○ 5NF is directly related to join dependencies. A relation that has a non-trivial join dependency that is not implied by candidate keys is not in 5NF.
○ So in essence, 5NF ensures that all join dependencies are a result of the candidate keys.
In simpler terms:
● Imagine a scenario where a complex relationship involves multiple attributes that must be considered
together. 5NF ensures that this relationship is broken down into its most basic components, preventing any
redundancy that could arise from storing those attributes together.
Key takeaway:
● 5NF is a high level of normalization, and it is less commonly used than lower normal forms.
● It is most relevant in situations with very complex data relationships.
UNIT 5
1(a).Describe the transaction states
In database management systems (DBMS), a transaction represents a single logical unit of work. To ensure data
integrity and consistency, transactions progress through various states during their lifecycle. Here's a breakdown of
the common transaction states:
● Active:
○ This is the initial state. A transaction enters the active state when it begins execution.
○ During this state, the transaction performs read and write operations on the database.
○ Changes made in this state are typically stored in temporary memory or buffers.
● Partially Committed:
○ A transaction enters this state after it has executed its final operation.
○ At this point, the transaction has completed its logical processing, but the changes have not yet been
permanently written to the database.
○ There's still a possibility that the transaction might fail before the changes are made permanent.
● Committed:
○ A transaction reaches the committed state when all its operations have been successfully completed,
and its changes have been permanently recorded in the database.
○ Once a transaction is committed, its effects are durable, meaning they will survive system failures.
● Failed:
○ A transaction enters the failed state if any error or failure occurs during its execution.
○ This could be due to hardware failures, software errors, or violation of database constraints.
○ When a transaction fails, it cannot continue its normal execution.
● Aborted:
○ If a transaction enters the failed state, the DBMS initiates the abortion process.
○ During abortion, the DBMS rolls back the transaction, undoing any changes it made to the database.
○ This ensures that the database returns to a consistent state.
● Terminated:
○ This is the final state. A transaction enters the terminated state after it has either been committed or
aborted.
○ At this point, the transaction has completed its lifecycle, and the system is ready to process new
transactions.
In essence:
These transaction states are crucial for maintaining the ACID properties (Atomicity, Consistency, Isolation,
Durability) of database transactions, which are essential for reliable data management.
When discussing database transactions, the acronym ACID is fundamental. It represents the four key properties that
guarantee reliable transaction processing. Here's a breakdown:
ACID Properties:
● Atomicity:
○ This property ensures that a transaction is treated as a single, indivisible unit of work.
○ Either all operations within the transaction are completed successfully, or none of them are.
○ If any part of the transaction fails, the entire transaction is rolled back, and the database returns to its previous consistent state.
○ Essentially, it's the "all or nothing" principle.
● Consistency:
○ This property guarantees that a transaction moves the database from one valid consistent state to
another.
○ It ensures that the database adheres to all defined rules, constraints, and integrity conditions.
○ The transaction must preserve the database's integrity.
● Isolation:
○ This property ensures that concurrent transactions do not interfere with each other.
○ Each transaction appears to execute independently, as if it were the only transaction running.
○ This prevents data corruption and ensures that transactions do not see intermediate, uncommitted
changes made by other transactions.
● Durability:
○ This property guarantees that once a transaction is committed, its changes are permanent and will
survive even system failures, such as power outages or crashes.
○ Committed changes are written to persistent storage, ensuring that they are not lost.
● The ACID properties are crucial for maintaining data integrity and reliability in database systems.
● They ensure that transactions are processed correctly, even in complex and concurrent environments.
● They provide a foundation for building robust and dependable database applications.
In summary, the ACID properties are essential for ensuring that database transactions are processed reliably and
accurately, safeguarding the integrity of the data.
In modern computing, concurrent execution refers to the ability of a system to execute multiple tasks
simultaneously. This is essential for improving system performance, resource utilization, and responsiveness. The
need for concurrent execution arises in various scenarios, including multi-user environments, parallel computing,
and real-time applications.
Without concurrency, a CPU may remain idle while waiting for input/output (I/O) operations to complete. By allowing
multiple processes or threads to execute simultaneously, the system ensures that the CPU is used efficiently,
minimizing idle time.
Concurrency increases the number of tasks a system can process within a given time. Instead of executing tasks
sequentially, concurrent execution enables multiple tasks to run in parallel, leading to better system throughput.
In applications like web browsers, operating systems, and real-time systems, concurrency ensures that multiple user
requests can be processed simultaneously. This enhances responsiveness, as users do not have to wait for one
task to finish before another begins.
In systems where multiple users interact simultaneously (e.g., databases, web servers), concurrent execution allows
multiple queries or transactions to be processed without significant delays, preventing bottlenecks.
Modern processors have multiple cores that can handle multiple threads or processes simultaneously. Concurrency
enables efficient parallel execution, improving the performance of applications such as scientific computing, artificial
intelligence, and simulations.
Concurrency enables multiple processes to share resources like memory, files, and networks efficiently. However,
synchronization mechanisms (e.g., locks, semaphores) are required to prevent race conditions and ensure data
consistency.
In real-time systems (e.g., autonomous vehicles, industrial automation, medical systems), concurrency ensures that
time-sensitive tasks are executed without delays, meeting strict deadlines for data processing.
2(b).Analyse the anomalies associated with interleaved execution.
Interleaved execution occurs when multiple processes or threads execute concurrently, with their instructions
interleaved over time. While this improves system efficiency, it can also lead to various anomalies that affect data
consistency, correctness, and program behavior. These anomalies are particularly significant in database systems,
operating systems, and multi-threaded applications.
1. Lost Update
Occurs when two transactions or processes update the same data simultaneously, and one update is lost due to interleaved execution.
Example:
● T1 reads X = 10 and updates it to 15.
● T2, which also read X = 10 before T1's update, writes X = 20 afterwards.
Since T2 was unaware of T1's update, the final value is X = 20, and T1's update to 15 is lost.
2. Dirty Read (Uncommitted Dependency)
Occurs when a transaction reads data that another transaction has modified but not yet committed. If the modifying transaction is rolled back, the read transaction will have incorrect data.
Example:
● T1 updates X but has not yet committed.
● T2 reads the updated, uncommitted value of X.
● T1 is rolled back, so T2 has based its work on a value that never became permanent.
3. Non-Repeatable Read
Happens when a transaction reads the same data multiple times, but another transaction modifies it between reads, leading to inconsistent results.
Example:
● T1 reads X as 10.
● T2 updates X to 15 and commits.
● T1 reads X again and gets 15.
4. Phantom Read
Occurs when a transaction retrieves a set of records based on a condition, but another transaction inserts, updates,
or deletes records, changing the result set.
Example:
5. Deadlocks
Deadlocks occur when two or more transactions wait for each other to release resources, causing a permanent
block in execution.
Example:
● T1 locks resource A and requests resource B, which is held by T2.
● T2 locks resource B and requests resource A, which is held by T1.
Since neither can proceed without the other releasing the lock, a deadlock occurs.
6. Priority Inversion
Happens when a high-priority task is waiting for a low-priority task to release a resource, but the low-priority task
cannot complete due to system constraints, leading to delays.
Interleaved execution improves system utilization and responsiveness but introduces anomalies that can lead to
data inconsistencies, deadlocks, and unpredictable program behavior. To mitigate these issues, synchronization
techniques such as locks, transactions, isolation levels, and concurrency control mechanisms must be used.
3.Explain the following i) Serializability ii) Testing for Serializability iii) Recoverability
i) Serializability
Serializability is a key concept in database concurrency control that ensures the correctness of transactions
executed concurrently. A schedule (a sequence of interleaved operations from different transactions) is serializable
if it results in the same final database state as some serial execution (where transactions execute one after another
without interleaving).
Types of Serializability:
1. Conflict Serializability
○A schedule is conflict serializable if it can be transformed into a serial schedule by swapping non-
conflicting operations.
○ Two operations conflict if:
■ They belong to different transactions.
■ They access the same data.
■ At least one of them is a write operation.
2. View Serializability
○ A schedule is view serializable if it produces the same final result as a serial schedule, even if
conflicts exist.
○ This is less restrictive than conflict serializability but harder to test.
ii) Testing for Serializability
To determine whether a schedule is conflict serializable, we use the precedence graph (serialization graph) method:
1. Create a node for every transaction in the schedule.
2. Add a directed edge Ti → Tj whenever an operation of Ti conflicts with, and comes before, an operation of Tj.
3. Check the graph for cycles: the schedule is conflict serializable if and only if the precedence graph is acyclic.
iii) Recoverability
Recoverability ensures that a schedule maintains database consistency by allowing transactions to undo changes
safely in case of failure. A schedule is recoverable if a transaction commits only after all transactions from which it
has read data have also committed.
1. Recoverable Schedule
○ A transaction Tj should not commit before Ti if Tj has read uncommitted data from Ti.
2. Cascadeless Schedule (ACA - Avoids Cascading Aborts)
○ Prevents cascading rollbacks, where aborting one transaction forces others to abort.
○ No transaction should read uncommitted data from another transaction.
3. Strict Schedule (ST)
○Ensures strict two-phase locking (Strict 2PL), where no transaction reads or writes a data item
until the transaction that last modified it has committed.
4. Rigorous Schedule
○ The most restrictive, where locks are held until a transaction commits or aborts, preventing dirty
reads and cascading rollbacks.
● Serializability ensures that concurrent execution produces the same results as some serial order.
● Testing for serializability involves checking for cycles in the precedence graph.
● Recoverability ensures that transactions commit safely without leading to inconsistencies or cascading
failures.
The Two-Phase Locking (2PL) Protocol is a concurrency control mechanism used in databases to ensure
serializability by managing how transactions acquire and release locks. It prevents anomalies like dirty reads, lost
updates, and non-repeatable reads.
1. Growing Phase:
○ A transaction acquires locks but does not release any locks during this phase.
○ It can obtain shared (read) locks and exclusive (write) locks as needed.
○ This phase continues until the transaction reaches its lock point, where it acquires its last lock.
2. Shrinking Phase:
○ Once the transaction releases its first lock, it may only release locks; it cannot acquire any new ones.
○ Following these two phases guarantees that the resulting schedules are conflict serializable.
Example schedule under 2PL:
Transaction T1 Transaction T2
Lock-X (X)
Read(X)
Lock-X (Y)
Write(Y) Read(Z)
Variants of 2PL:
1. Strict 2PL
● All locks (both read and write) are held until the transaction commits or aborts.
● Prevents dirty reads and cascading rollbacks.
● Used in most database systems.
2. Rigorous 2PL
● Even stricter than Strict 2PL—all locks are held until the transaction commits, ensuring strict
serializability.
● Provides better recoverability but increases waiting time.
3. Conservative (Static) 2PL
● All required locks are acquired before the transaction starts execution (pre-locking).
● Avoids deadlocks but may cause delays due to waiting for lock availability.
The Two-Phase Locking (2PL) Protocol ensures serializability by dividing a transaction into growing and
shrinking phases. Variants like Strict 2PL and Rigorous 2PL enhance safety but may reduce concurrency.
Deadlock detection and prevention mechanisms are often used alongside 2PL to manage its limitations effectively.
Conflict in a Schedule
Two operations in a schedule are said to be in conflict if they meet the following three conditions simultaneously:
1. They belong to different transactions.
2. They access the same data item
○ Both operations must access the same database item (e.g., X, Y, etc.).
3. At least one of the operations is a write (update)
○ If at least one operation is a write, a conflict occurs because the value of the data item may change.
Types of Conflicts
1. Read-Write Conflict
○ A transaction reads a value that another transaction is updating but has not committed.
○ Example:
■ T1: Read(X)
■ T2: Write(X) (Conflict occurs)
2. Write-Read Conflict (Dirty Read)
○ A transaction reads a value written by another transaction that has not yet committed.
○ Example:
■ T1: Write(X)
■ T2: Read(X) (Conflict occurs)
3. Write-Write Conflict (Lost Update)
○ Two transactions write to the same data item, causing one update to be lost.
○ Example:
■ T1: Write(X)
■ T2: Write(X) (Conflict occurs)
Time T1 T2
1 Read(X)
3 Write(X)
Two operations are in conflict if they are from different transactions, access the same data item, and at least one of
them is a write operation. Conflicts lead to concurrency anomalies, which are managed using locking,
timestamp ordering, and concurrency control techniques.
5.Explain about the lock management in detail.
Lock management is a concurrency control mechanism in databases and operating systems that ensures data
consistency and prevents conflicts in multi-user environments. A lock manager is responsible for granting and
releasing locks, ensuring that transactions follow correct synchronization protocols.
1. Row-Level Locking
3. Locking Protocols
● All locks are held until commit or rollback, preventing dirty reads.
● Prevents cascading rollbacks.
● Even stricter than Strict 2PL—all locks are held until the transaction commits.
Deadlock Occurrence
● A deadlock occurs when two or more transactions wait indefinitely for each other to release locks.
● Example:
○ T1 locks A and requests B (held by T2).
○ T2 locks B and requests A (held by T1).
○ Both transactions wait indefinitely → Deadlock!
A. Lock Table
● The lock manager maintains an in-memory lock table (typically a hash table keyed on the data-item identifier) that records, for each item, which transactions hold locks on it, in which mode, and which transactions are waiting.
Timestamp-Based Concurrency Control
Key Concepts
A. Timestamp (TS)
Every transaction T is assigned a unique timestamp TS(T) when it starts; an older transaction has a smaller timestamp than a newer one. For each data item X, the system also maintains:
1. Read Timestamp (RTS(X)): The largest timestamp of any transaction that successfully read X.
2. Write Timestamp (WTS(X)): The largest timestamp of any transaction that successfully wrote X.
The Timestamp-Ordering (TO) Protocol ensures serializability by enforcing the following rules:
1. Read(X) by T: if TS(T) < WTS(X), the read is rejected and T is aborted; otherwise the read is allowed and RTS(X) is set to max(RTS(X), TS(T)).
2. Write(X) by T: if TS(T) < RTS(X) or TS(T) < WTS(X), the write is rejected and T is aborted; otherwise the write is allowed and WTS(X) is set to TS(T).
Example (T1 with TS = 10, T2 with TS = 20, WTS(X) initially 0):
Step  Operation                TS(T)  WTS(X)
1     T1: Read(X) (Allowed)    10     0
2     T1: Write(X) (Allowed)   10     10
3     T2: Read(X) (Allowed)    20     10
4     T2: Write(X) (Allowed)   20     20
5     T1: Write(X) (Rejected)  10     20
● T1’s write is rejected in Step 5 because T2 (newer transaction) has already updated X.
✅ Advantages
✔ No locks are used, so deadlocks cannot occur.
✔ Guarantees conflict-serializable schedules based on timestamp order.
❌ Disadvantages
✘ May cause frequent transaction aborts (if timestamps are not managed well).
✘ Not suitable for write-heavy workloads (as older transactions may get aborted).
✘ Requires system clock synchronization for accurate timestamp ordering.
Thomas Write Rule (an optimization):
● If TS(T) < WTS(X), ignore the outdated Write(X) instead of aborting the transaction.
● This reduces unnecessary aborts and improves performance.
Timestamp-based concurrency control ensures serializability without locks, using timestamps for ordering. While it
avoids deadlocks, it may cause frequent aborts. Variants like the Thomas Write Rule and Multiversion
Timestamp Ordering help optimize performance.
Optimistic Concurrency Control (OCC) is a concurrency control method that assumes conflicts are rare and
allows transactions to execute without acquiring locks. Instead of locking data items during execution, OCC
verifies conflicts at the validation phase before committing the transaction.
A. Read Phase
● The transaction reads data from the database without acquiring locks.
● It performs all necessary computations and stores updates in a local workspace (buffer).
B. Validation Phase
● Before committing, the DBMS checks whether the transaction's reads and writes conflict with transactions that have committed (or are validating) concurrently.
● If a conflict is detected, the transaction is aborted and restarted; otherwise it proceeds to the write phase.
C. Write Phase
● If the validation is successful, changes from the local workspace are written to the database.
● Otherwise, the transaction is aborted and restarted.
A transaction T is validated by checking whether it conflicts with other transactions T’ that have already committed. For each committed transaction T’ that overlaps with T, one of the following conditions must hold for T to pass validation:
1. T’ completes all three phases before T begins.
2. T’ finishes its write phase before T starts its write phase, and WriteSet(T’) ∩ ReadSet(T) = ∅.
3. T’ finishes its read phase before T finishes its read phase, and WriteSet(T’) ∩ (ReadSet(T) ∪ WriteSet(T)) = ∅.
If none of these conditions holds, T is aborted and restarted.
4. Example of OCC
● T1 reads X = 100, updates it to 110, and commits while T2 is still executing with the old value.
● T2 is aborted because its read value (100) is no longer valid due to T1’s update (110).
✅ Advantages
✔ No locking overhead and no deadlocks.
✔ Works well for read-heavy workloads where conflicts are rare.
❌ Disadvantages
✘ Conflicts are detected only at validation time, so the aborted transaction's work is wasted.
✘ Frequent rollbacks occur in write-heavy, high-contention environments.
Optimistic Concurrency Control is efficient for systems where conflicts are rare. It avoids locking overhead and
deadlocks, making it ideal for read-heavy workloads. However, it can lead to frequent transaction rollbacks in
write-heavy environments.
A deadlock occurs in a database when two or more transactions are waiting indefinitely for resources locked by
each other, creating a cyclic dependency. Deadlock management is crucial for maintaining the performance and
reliability of a database system.
1. Deadlock Prevention – Ensures that deadlocks never occur by controlling how transactions request
resources.
2. Deadlock Detection and Recovery – Allows deadlocks to occur but detects and resolves them when
they happen.
2. Deadlock Prevention
Deadlock prevention techniques ensure that the system never enters a deadlock state by following specific rules.
The main strategies include:
A. Resource Ordering
● Every transaction requests locks in a single predefined order, so a circular wait cannot form.
B. Wait-Die Scheme
● An older transaction may wait for a younger one; a younger transaction requesting a lock held by an older one is aborted ("dies") and restarted later with its original timestamp.
C. Wound-Wait Scheme
● An older transaction forces ("wounds") a younger lock holder to abort; a younger transaction requesting a lock held by an older one is allowed to wait.
D. Timeout-Based Prevention
● If a transaction waits too long for a resource, it is automatically aborted and restarted.
● Works well in systems where deadlocks are rare.
Instead of preventing deadlocks, some systems allow deadlocks to occur and use detection mechanisms to
identify and resolve them.
A. Deadlock Detection
● Uses a Wait-for Graph (WFG) to represent transactions and their waiting dependencies.
● A cycle in the graph indicates a deadlock.
● The system periodically checks the WFG and detects cycles using algorithms like Depth-First Search
(DFS).
B. Deadlock Recovery
Once a deadlock is detected, the system must resolve it by aborting one or more transactions. Recovery strategies
include:
1. Selecting a Victim
○ Abort the transaction whose rollback is cheapest (for example, the one that has done the least work).
2. Partial Rollback
○ Instead of aborting an entire transaction, roll back only the conflicting part.
3. Preempting Resources
○ Take a needed resource away from one transaction and grant it to another, rolling the first back to a safe point.
4. Example of Deadlock
Scenario:
● T1 locks A and requests B (held by T2); T2 locks B and requests A (held by T1).
Using Prevention:
● Under a scheme such as wait-die, the younger of the two transactions is aborted rather than allowed to wait, so the circular wait never forms.
Using Detection:
● The system detects a cycle in the Wait-for Graph and aborts one transaction.
Deadlock handling is essential for maintaining database performance. Prevention techniques avoid deadlocks
entirely but may delay transactions. Detection and recovery allow deadlocks but require periodic checks and
rollback strategies. The choice depends on the system workload and performance requirements.
Isolation is one of the four ACID properties (Atomicity, Consistency, Isolation, Durability) that ensures transactions
execute independently without interfering with each other. It prevents concurrent transaction anomalies such as
dirty reads, non-repeatable reads, and phantom reads.
● Isolation ensures that the intermediate states of a transaction are not visible to other transactions.
● It controls the way transactions interact in a multi-user environment.
Isolation is implemented using different isolation levels, as defined by SQL standards. The higher the isolation
level, the stronger the data consistency but at the cost of performance.
A. Read Uncommitted
● The lowest isolation level; a transaction may read uncommitted changes made by others.
● Issues: dirty reads, non-repeatable reads, and phantom reads can all occur.
B. Read Committed
● A transaction can read only data that has been committed, preventing dirty reads.
● Issues: non-repeatable reads and phantom reads can still occur.
● Used in: the default level of many databases (e.g., PostgreSQL, Oracle, SQL Server).
C. Repeatable Read
● Ensures that if a transaction reads a value multiple times, it sees the same value (no non-repeatable
reads).
● Issues: Phantom reads can still occur.
● Used in: MySQL InnoDB (default), banking transactions.
D. Serializable
● The strictest level; transactions behave as if they executed one after another.
● Prevents dirty reads, non-repeatable reads, and phantom reads, at the cost of reduced concurrency.
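The isolation level is usually chosen per transaction; in standard SQL, for example:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;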
🔹 Types of Locks:
1. Shared Lock (S-Lock) – Allows multiple transactions to read but not write the same data.
2. Exclusive Lock (X-Lock) – Only one transaction can read and write.
🔹 Implementation Methods:
1. Lock-Based Protocols (Strict 2PL)
○ Holds all locks until the end of the transaction to prevent cascading aborts.
2. Timestamp Ordering
🔹 Rules:
1. If T1 (older) wants to write X, it waits if X is modified by T2 (newer).
2. If T2 (newer) wants to write X, but T1 (older) read it earlier, T2 is aborted.
3. Multiversion Concurrency Control (MVCC)
● Used in: High-performance databases that avoid locks.
● Transactions read old versions of data while others write new versions.
In the context of database management systems (DBMS), "Recovery" and "Atomicity" are crucial concepts that
ensure data integrity and reliability. Here's a breakdown:
Atomicity:
● Definition:
○ Atomicity is one of the ACID properties (Atomicity, Consistency, Isolation, Durability) of database
transactions.
○ It ensures that a transaction is treated as a single, indivisible unit of work.
○ This means that either all operations within a transaction are completed successfully, or none of them
are. There's no partial execution.
● Importance:
○ Atomicity prevents inconsistent data states. If a transaction fails in the middle of its execution (due to
a system crash, for example), the database is rolled back to its previous consistent state.
○ This guarantees that data remains accurate and reliable.
● Example:
○ Consider a bank transfer from account A to account B. Atomicity ensures that either both the debit
from A and the credit to B occur, or neither occurs. If the system crashes after the debit but before the
credit, the system will roll back the debit, maintaining data integrity.
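In SQL, such a transfer is typically wrapped in a single transaction (account identifiers and column names assumed; syntax varies slightly by DBMS):
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 100 WHERE account_id = 'B';
COMMIT;  -- or ROLLBACK; if any step fails, so that neither update takes effect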
Recovery:
● Definition:
○ Recovery refers to the process of restoring a database to a consistent state after a system failure.
○ DBMSs employ various techniques to recover from failures, such as hardware crashes, software
errors, or power outages.
● Importance:
○ Recovery ensures that data is not lost or corrupted due to system failures.
○ It allows the database to resume operations from a known, consistent state.
● Key Techniques:
○ Log-based recovery:
■ This involves maintaining a log of all database changes.
■ In case of a failure, the log is used to undo or redo transactions, restoring the database to a
consistent state.
○ Checkpoints:
■ Checkpoints are points in time when the database's state is written to stable storage.
■ They reduce the amount of log data that needs to be processed during recovery.
● Relationship between Atomicity and Recovery:
○ Atomicity is a property that recovery mechanisms rely on.
○ Recovery procedures use logs to ensure that transactions are either fully applied or fully undone,
upholding the principle of atomicity.
○ In essence, Atomicity is a property that must be upheld, and recovery is the mechanism that allows
the database to uphold that property, even after system failures.
Failure classification is essential in database systems to ensure reliability and consistency in the presence of errors.
Failures can occur due to various reasons such as system crashes, software bugs, or human errors. Proper
classification of failures helps in implementing recovery mechanisms and maintaining data integrity.
Types of Failures
1. Transaction Failures
○Occur when a transaction cannot complete its execution due to logical or system-related errors.
○Examples:
■ Deadlock detection and termination.
■ Logical errors (e.g., division by zero, constraint violations).
■ System-imposed aborts due to excessive resource usage.
2. System Failures
○Occur when the system crashes due to hardware or software issues, affecting the database’s normal
operations.
○ Characteristics:
■ The main memory (volatile storage) is lost, but secondary storage (disk) remains intact.
■ Requires recovery mechanisms like undo-redo logging to restore consistency.
○ Examples:
■ Power failure.
■ Operating system crash.
■ Memory corruption.
3. Media Failures
○ Occur when physical storage devices such as hard disks or SSDs are damaged.
○ Results in loss of stored data unless proper backups exist.
○ Examples:
■ Hard disk crash.
■ Bad sectors or corrupted storage blocks.
■ SSD wear-out.
4. Communication Failures
○ Occur when network links fail or messages are lost or delayed between the sites of a distributed database.
5. Application Errors
○ Occur due to errors in application logic that lead to incorrect data being processed.
○ Examples:
■ Software bugs.
■ Misconfigured transactions.
■ Incorrect user inputs leading to erroneous database operations.
Failure Recovery Mechanisms
Typical mechanisms include log-based recovery (undo/redo logging) for transaction and system failures, checkpoints to limit how much of the log must be replayed, and backups or replication to recover from media failures.
By classifying failures properly, database systems can apply suitable recovery strategies and ensure high availability
and data integrity.
ARIES is a widely used recovery algorithm in database systems that ensures atomicity and durability in the
presence of failures. It follows a Write-Ahead Logging (WAL) approach and supports fine-grained concurrency
control.
1. Write-Ahead Logging (WAL):
○ Before applying any change to the database, a log record is written to stable storage.
○ Ensures that redo and undo operations can be performed correctly.
2. Repeating History During Redo:
○After a failure, ARIES repeats the exact history of the system by reapplying all operations from the
log.
○ Ensures that all committed transactions are recovered properly.
3. Logging Undo Operations:
○ Changes made while undoing (rolling back) a transaction are themselves logged as compensation log records (CLRs), so recovery can resume correctly even if a crash occurs during the undo.
Phases of ARIES Recovery
1. Analysis Phase
● Scans the log forward from the most recent checkpoint to identify dirty pages and the transactions that were active at the time of the crash.
2. Redo Phase
● Reapplies all logged operations from the last checkpoint to reconstruct the database state.
● Ensures that all committed transactions are properly applied.
● Uses the log sequence number (LSN) to avoid redundant operations.
3. Undo Phase
● Rolls back all transactions that were still active (uncommitted) at the time of the failure, writing compensation log records for each undone action.
Advantages of ARIES
● Supports fine-grained (record-level) locking and high concurrency.
● Redo work is limited by log sequence numbers (LSNs), making recovery efficient.
● Handles repeated crashes during recovery correctly because undo actions are themselves logged.
ARIES is a robust and efficient recovery algorithm that guarantees data consistency and durability in database
management systems. By following Write-Ahead Logging, Repeating History, and Logging Undo Operations, it
ensures reliable transaction recovery even in the presence of complex failures
A B+ Tree is a self-balancing m-ary search tree used in database indexing and file systems. It efficiently supports
search, insert, delete, and range queries.
1. Insertion in B+ Tree
Example: insert the keys 10, 20, 30, 40, … into a B+ Tree whose leaf nodes hold at most 3 keys (order illustrative). After inserting 10, 20, and 30 the single leaf node is:
[10 | 20 | 30]
● Since the leaf node can hold only 3 keys, it overflows when inserting 40.
● We split the node into two and promote (copy up) 30 to a new root:
        [30]
       /    \
[10 | 20]  [30 | 40]
● After further insertions and splits (e.g., 50 and 60), the root becomes:
     [30 | 60]
    /    |    \
2. Searching in B+ Tree
To search 50:
● Start at the root [30 | 60] → 50 is in the middle subtree.
● Move to [40 | 50] → Found 50!
Search takes O(log n) time due to the balanced structure.
3. Deletion in B+ Tree
[30 | 60]
/ | \
The B+ Tree maintains balance using splitting and merging operations, ensuring efficient O(log n) performance
for search, insert, and delete operations. It is widely used in databases for indexing and range queries due to its
linked leaf nodes for sequential access.
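In SQL, such an index (typically implemented internally as a B+ Tree) is created with a statement like the following (table and column names assumed):
CREATE INDEX idx_employees_salary ON employees(salary);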
Static Hashing is a technique used in database management systems (DBMS) and file organization to store and
retrieve records efficiently using hash functions. In static hashing, the number of primary buckets (storage
locations) remains fixed throughout the lifespan of the database.
1. Hash Function:
○ A hash function H(K) = B maps a given key K to a specific bucket B.
○ Example: If the total number of buckets is 10, a simple hash function is H(K) = K mod 10.
○ For K = 25, the bucket assigned is H(25) = 25 % 10 = 5.
2. Bucket Structure:
○ Each bucket is a disk block (or page) that can hold a fixed number of records; when a bucket fills up, additional records are placed in chained overflow buckets.
1. Insertion
○ Compute H(K) and place the record in the corresponding bucket (or in an overflow bucket if it is full).
Example: Insert keys {10, 22, 35, 40} into a hashing scheme with H(K) = K mod 10.
○ 10 → bucket 0, 22 → bucket 2, 35 → bucket 5, 40 → bucket 0 (shared with 10).
2. Search
○ Compute H(K) and scan that bucket (and its overflow chain) for the key.
3. Deletion
○ Compute H(K), locate the record in the bucket or its overflow chain, and remove it.
Example: Delete 40
○ H(40) = 40 mod 10 = 0, so bucket 0 is searched and the record for 40 is removed.
❌ Fixed bucket size – Leads to overflow if too many records are inserted.
❌ Wasted space – If records are fewer, some buckets remain unused.
❌ Poor scalability – If data grows, reorganization of the entire hash table is needed.
Static Hashing is efficient for small, stable datasets but struggles with scalability. For dynamic applications,
Dynamic Hashing (e.g., Extendible or Linear Hashing) is preferred.