DBMS Module 2,4
Module 2
Relational Algebra (Think Building Blocks):
This approach breaks down data retrieval into basic operations like building
blocks. These operations are then combined to form more complex queries.
Operators like SELECT, PROJECT, JOIN, and SET DIFFERENCE are like Lego bricks for
your query.
Example: Say you want to find all the albums in the Rock genre (SELECT) and
display only the album title and artist (PROJECT). Relational Algebra would let
you combine these operations to get the desired results.
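The album example can be sketched in Python with rows held as dictionaries. This is only an illustration of how SELECT and PROJECT compose, not how a DBMS implements them. The sample albums and the helper names `select` and `project` are assumptions made for this sketch.

```python
# Hypothetical sample data; the titles and artists are made up.
albums = [
    {"title": "Album A", "artist": "Artist X", "genre": "Rock"},
    {"title": "Album B", "artist": "Artist Y", "genre": "Jazz"},
    {"title": "Album C", "artist": "Artist Z", "genre": "Rock"},
]

def select(rows, predicate):
    """SELECT (sigma): keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attributes):
    """PROJECT (pi): keep only the named attributes and drop duplicate rows."""
    result = []
    for row in rows:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

# Combine the two building blocks: first SELECT, then PROJECT.
rock_albums = project(
    select(albums, lambda r: r["genre"] == "Rock"),
    ["title", "artist"],
)
print(rock_albums)
```

Each operator takes a table and returns a table, which is what makes it possible to nest one inside another.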
Tuple Relational Calculus:
This method uses a more logical approach to express your query. It focuses
on what data you want, rather than how to get it step-by-step.
Imagine writing a sentence like "Find all albums where the genre is 'Rock'".
That's kind of like tuple relational calculus.
Example: You want to find all jazz albums released after 2010. Tuple relational
calculus allows you to write a logical statement that specifies this condition.
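A tuple relational calculus statement for the jazz example could be written as { t | t ∈ Albums ∧ t.genre = 'Jazz' ∧ t.year > 2010 }. A Python set comprehension reads almost the same way, so it makes a convenient sketch. The `Album` type, its field names, and the sample rows are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical schema and data for the example.
Album = namedtuple("Album", ["title", "genre", "year"])

albums = {
    Album("Album A", "Jazz", 2008),
    Album("Album B", "Jazz", 2015),
    Album("Album C", "Rock", 2018),
}

# { t | t in Albums AND t.genre = 'Jazz' AND t.year > 2010 }
jazz_after_2010 = {t for t in albums if t.genre == "Jazz" and t.year > 2010}
print(jazz_after_2010)
```

The comprehension states the condition a result tuple must satisfy and leaves the order of evaluation unspecified, which is the declarative flavor the calculus is known for.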
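Domain Relational Calculus:
Here the variables range over individual attribute values (domains) rather than
over whole rows.
Example: Imagine you want to find all artists whose names start with the letter "A" and
have at least one Rock album in the database. Domain relational calculus
allows for such granular queries.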
Complexity: Easier to understand for basic queries | More complex syntax | Most complex
DML (Data Manipulation Language): INSERT, UPDATE, DELETE, SELECT
Setup and Management: Requires more technical expertise | Easier to set up and manage | Easier to set up and manage | Easier to set up and manage
Security: Requires careful configuration and maintenance | Robust security features | Strong security features | Strong security features
corresponding attribute in another table. This ensures that the data in each table is
consistent and can be used to create queries that return the desired results.
Data Dependency:
There are different types of data dependencies, but some common ones
include:
Armstrong's Axioms:
These are a set of inference rules that allow you to determine all the functional
dependencies (FDs) implied by a given set of FDs. A Functional
Dependency (FD) X → Y means that the values of one set of attributes X (the
determinant) uniquely determine the values of another set of attributes Y (the
dependent) in a table.
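Armstrong's Axioms provide a systematic way to derive additional FDs based on
existing ones. There are three main axioms:
Reflexivity: If Y is a subset of X, then X → Y.
Augmentation: If X → Y, then XZ → YZ for any set of attributes Z.
Transitivity: If X → Y and Y → Z, then X → Z.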
By repeatedly applying these axioms to a set of initial FDs, you can discover all the
implicit FDs that hold true in the data.
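In practice the axioms are rarely applied one by one. A common shortcut is the attribute closure: the set of all attributes determined by a starting set, and X → Y is implied exactly when Y lies inside the closure of X. The sketch below is a minimal version of that idea. The function name `closure` and the student/department FDs are assumptions made for illustration.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under `fds`.

    `fds` is a list of (left, right) pairs, each side a set of attribute names.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # If the whole left side is already determined, so is the right side.
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result

# Hypothetical FDs: StudentID -> Name, StudentID -> DeptID, DeptID -> DeptName
fds = [
    ({"StudentID"}, {"Name"}),
    ({"StudentID"}, {"DeptID"}),
    ({"DeptID"}, {"DeptName"}),
]

student_closure = closure({"StudentID"}, fds)
print(sorted(student_closure))
```

Here StudentID → DeptName is never listed explicitly, but it shows up in the closure because it follows from transitivity.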
Normal Forms:
First Normal Form (1NF): The most basic level. It eliminates repeating groups
of data within a table and ensures each cell contains a single atomic value (no
lists or sets).
The Relationship:
Armstrong's Axioms help you identify all the functional dependencies within your
data. By understanding these dependencies, you can then choose the appropriate
normal form to normalize your database tables.
space and minimize the risk of data inconsistencies.
Armstrong's Axioms guide normalization: By identifying all the FDs, you can
see which normal form is most suitable for your data structure and avoid
potential anomalies.
Dependency Preservation:
Why it matters:
Lossless Design:
Think of it as a jigsaw puzzle: lossless design ensures you can take
the separate pieces (normalized tables) and put them back together (join the
tables) to get the complete picture (original data) without any missing or extra pieces.
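Preserving FDs and lossless design are related but separate goals: a
decomposition can be lossless and still fail to preserve every FD. What
guarantees a lossless split into two tables is that their shared attributes form
a key (superkey) of at least one of them. Otherwise, joining the tables back can
produce spurious rows, and you cannot reconstruct the original data accurately.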
Flexibility: A lossless design allows you to easily retrieve the original data if
needed, which can be helpful for reporting or analysis purposes.
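One way to see what "lossless" means is to split a small table two different ways and join the pieces back together. The sketch below assumes a hypothetical employee table in which EmpID determines Dept and Dept determines Manager. The helper names and data are made up for illustration.

```python
def project(rows, attributes):
    """Project dict rows onto `attributes`, removing duplicates."""
    unique = {tuple(row[a] for a in attributes) for row in rows}
    return [dict(zip(attributes, values)) for values in sorted(unique)]

def natural_join(left, right):
    """Join two lists of dict rows on every attribute they share."""
    joined = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                joined.append({**l, **r})
    return joined

def same_rows(a, b):
    """Compare two relations as sets of rows, ignoring row order."""
    def as_set(rows):
        return {tuple(sorted(row.items())) for row in rows}
    return as_set(a) == as_set(b)

# Hypothetical data: Ann manages two departments.
employees = [
    {"EmpID": 1, "Dept": "Sales", "Manager": "Ann"},
    {"EmpID": 2, "Dept": "IT", "Manager": "Bob"},
    {"EmpID": 3, "Dept": "Support", "Manager": "Ann"},
]

# Split on Dept: Dept is a key of the (Dept, Manager) table.
good = natural_join(project(employees, ["EmpID", "Dept"]),
                    project(employees, ["Dept", "Manager"]))

# Split on Manager: Manager is not a key of either piece.
bad = natural_join(project(employees, ["EmpID", "Manager"]),
                   project(employees, ["Dept", "Manager"]))

print(same_rows(good, employees))
print(len(bad), "rows after the bad split,", len(employees), "in the original")
```

The second split loses information by gaining rows: the join can no longer tell which of Ann's departments each employee belongs to, so it pairs them with both.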
2. Bottom-Up Evaluation: The evaluation starts from the leaves (tables) and
progresses upwards.
3. Operator Application: At each level, the operator at that node is applied to the
results coming from its children nodes.
Evaluation Example:
Consider a query that retrieves the names (Name) of all customers in a "Customers"
table who placed orders after a specific date (OrderDate). Here's a possible
relational algebra expression:
π_Name((σ_{OrderDate > '2024-06-15'}(Orders)) ⋈_{Orders.CustomerID = Customers.CustomerID} Customers)
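Evaluation:
The σ (sigma) operation first selects the rows of the "Orders" table whose
OrderDate is later than '2024-06-15'.
The result of this selection is then joined with the "Customers" table based
on the matching CustomerID.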
Finally, the π (pi) operation projects only the "Name" attribute from the
joined result.
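The evaluation steps above can be traced in a few lines of Python, working from the leaves of the query tree up to the root. The table contents are hypothetical and only the column names Name, OrderDate and CustomerID come from the example. Dates are kept as ISO strings, which compare correctly as plain text.

```python
# Hypothetical sample tables.
customers = [
    {"CustomerID": 1, "Name": "Asha"},
    {"CustomerID": 2, "Name": "Ben"},
    {"CustomerID": 3, "Name": "Chen"},
]
orders = [
    {"OrderID": 10, "CustomerID": 1, "OrderDate": "2024-06-10"},
    {"OrderID": 11, "CustomerID": 2, "OrderDate": "2024-06-20"},
    {"OrderID": 12, "CustomerID": 2, "OrderDate": "2024-07-01"},
    {"OrderID": 13, "CustomerID": 3, "OrderDate": "2024-06-16"},
]

# Step 1 (leaf): selection on Orders.
recent_orders = [o for o in orders if o["OrderDate"] > "2024-06-15"]

# Step 2: join the selection result with Customers on CustomerID.
joined = [
    {**o, **c}
    for o in recent_orders
    for c in customers
    if o["CustomerID"] == c["CustomerID"]
]

# Step 3 (root): project the Name attribute, removing duplicates.
names = sorted({row["Name"] for row in joined})
print(names)
```

Each step consumes only the output of the step below it, which is the bottom-up order described for the query tree.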
Query Equivalence:
In the world of relational databases, sometimes multiple ways of writing a query
can lead to the same result. This concept is called query equivalence.
Understanding query equivalence allows you to optimize your queries for better
performance.
Concept: Two queries are considered equivalent if they produce the same
output for any given database state.
Benefits:
Flexibility: You can rewrite your queries for better readability or to utilize
specific functionalities of the database system.
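A classic pair of equivalent queries is "join, then filter" versus "filter, then join". The sketch below runs both on a small hypothetical data set and compares the results along with the size of the intermediate join. Agreement on one data set does not prove equivalence, which must hold for every database state, so treat this as an illustration only.

```python
# Hypothetical sample tables.
customers = [{"CustomerID": i, "Name": f"Customer {i}"} for i in range(1, 4)]
orders = [
    {"CustomerID": 1, "OrderDate": "2024-06-10"},
    {"CustomerID": 2, "OrderDate": "2024-06-20"},
    {"CustomerID": 3, "OrderDate": "2024-06-16"},
    {"CustomerID": 3, "OrderDate": "2024-05-01"},
]

def join(left, right, key):
    """Equi-join two lists of dict rows on `key`."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def is_recent(row):
    return row["OrderDate"] > "2024-06-15"

# Plan A: join everything, then filter the joined rows.
plan_a_joined = join(orders, customers, "CustomerID")
plan_a = sorted({r["Name"] for r in plan_a_joined if is_recent(r)})

# Plan B: filter Orders first, then join the smaller result.
plan_b_joined = join([o for o in orders if is_recent(o)], customers, "CustomerID")
plan_b = sorted({r["Name"] for r in plan_b_joined})

print(plan_a == plan_b)
print(len(plan_a_joined), "joined rows in plan A,", len(plan_b_joined), "in plan B")
```

Both plans return the same names, but plan B builds a smaller intermediate result, which is why optimizers generally prefer to apply selections as early as possible.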
Join Strategies:
Joins are a fundamental operation in relational databases, combining data from
multiple tables based on a common attribute. Choosing the right join strategy can
significantly affect query performance.
Types of Joins:
Inner Join: Returns rows where the join condition is met in both tables.
Left Join: Includes all rows from the left table and matching rows from the
right table, with null values in the right table's columns where there is no match.
Full Join: Includes all rows from both tables, with null values filling the
columns of whichever side has no match.
Join Strategies:
Nested Loop Join: A simple but potentially inefficient method where rows
from one table are compared to every row in the other table.
Merge Join: A more efficient strategy that sorts both tables on the join
attribute and then merges them in sorted order.
Hash Join: Another efficient method that creates a hash table from one
table and probes the hash table with values from the other table.
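To make the difference concrete, here is a minimal sketch of a nested loop join and a hash join for an inner equi-join on a single column. The function names and sample rows are assumptions for illustration. Real systems add buffering, spilling to disk and many other details, and merge join is left out to keep the sketch short.

```python
from collections import defaultdict

def nested_loop_join(left, right, key):
    """Compare every left row with every right row: len(left) * len(right) checks."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = defaultdict(list)
    for r in right:                      # build phase: one pass over `right`
        buckets[r[key]].append(r)
    result = []
    for l in left:                       # probe phase: one pass over `left`
        for r in buckets.get(l[key], []):
            result.append({**l, **r})
    return result

# Hypothetical sample tables.
customers = [
    {"CustomerID": 1, "Name": "Asha"},
    {"CustomerID": 2, "Name": "Ben"},
]
orders = [
    {"OrderID": 10, "CustomerID": 1},
    {"OrderID": 11, "CustomerID": 2},
    {"OrderID": 12, "CustomerID": 2},
    {"OrderID": 13, "CustomerID": 9},  # no matching customer, dropped by an inner join
]

nested = nested_loop_join(orders, customers, "CustomerID")
hashed = hash_join(orders, customers, "CustomerID")
print(nested == hashed, len(hashed))
```

Both functions return the same rows. The hash join reaches them with one pass over each input plus dictionary lookups instead of comparing every pair.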
1. Cost-Based Optimization:
This is the most widely used approach. The optimizer estimates the cost
(processing time, memory usage) of executing different query plans for the same
query. These costs are based on factors like:
Number of rows processed in each step of the query plan (e.g., selection,
join).
The optimizer then chooses the plan with the lowest estimated cost, aiming for the
most efficient execution.
2. Rule-Based Optimization:
This approach relies on a set of pre-defined rules that guide the optimizer in
choosing an execution plan. These rules are based on best practices and
knowledge about how different operations interact. While simpler than cost-based
optimization, it might not always find the absolute best plan for complex queries.
3. Heuristic Optimization:
Here, the optimizer uses heuristics (informed guesses) to guide its decision-
making. These heuristics can be based on past experiences or statistical analysis
of the data. While less precise than cost-based optimization, heuristics can be
helpful for situations where accurate cost estimation might be difficult.
4. Genetic Algorithms:
These algorithms, inspired by the process of natural selection, involve creating a
population of possible execution plans. The plans are then evaluated based on
their estimated cost, and "better" plans are used to create new generations
through crossover and mutation operations. This iterative process aims to
converge on a near-optimal plan over multiple generations, though it does not
guarantee finding the best one.
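The cost-based approach (item 1 above) can be illustrated with a toy cost model. Everything in this sketch is an assumption made for illustration: the table statistics, the two candidate plans, and the formulas, which simply count rows touched under a nested loop join. Real optimizers use far richer statistics and cost formulas.

```python
# Hypothetical statistics an optimizer might keep about the tables.
stats = {"orders_rows": 100_000, "customers_rows": 5_000, "recent_fraction": 0.02}

def cost_join_then_filter(s):
    """Nested loop join over all orders, then filter the joined rows."""
    join_cost = s["orders_rows"] * s["customers_rows"]
    filter_cost = s["orders_rows"]
    return join_cost + filter_cost

def cost_filter_then_join(s):
    """Scan Orders once to filter, then join only the surviving rows."""
    scan_cost = s["orders_rows"]
    surviving = s["orders_rows"] * s["recent_fraction"]
    return scan_cost + surviving * s["customers_rows"]

plans = {
    "join_then_filter": cost_join_then_filter,
    "filter_then_join": cost_filter_then_join,
}

# Estimate every candidate plan and keep the cheapest one.
costs = {name: estimate(stats) for name, estimate in plans.items()}
best = min(costs, key=costs.get)
print(best, costs)
```

The estimate depends entirely on the statistics: if `recent_fraction` were badly out of date, the optimizer could pick the wrong plan, which is one reason statistics need to be kept fresh.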
Module 4
ACID Properties:
ACID stands for Atomicity, Consistency, Isolation, and Durability. These properties
guarantee reliable data manipulation within a database transaction:
Atomicity: A transaction is all-or-nothing: either every operation in it takes
effect, or none does.
Imagine transferring money from one account to another. The transaction would
involve debiting the sender's account and crediting the receiver's account.
Atomicity guarantees that either both the debit and the credit happen (successful
transfer), or neither happens (no transfer occurs).
Consistency: Each transaction transforms the database from one valid state
to another valid state. It enforces business rules and data integrity constraints.
Continuing the money transfer example, consistency ensures that the total amount
of money in the system remains the same after the transaction. There's no
possibility of the sender's account being debited but the receiver's account not
being credited (violating the money balance).
Isolation: Transactions are isolated from each other, even if they happen
concurrently. This prevents data inconsistencies that could arise if multiple
transactions try to modify the same data at the same time.
Think of multiple users trying to update the same inventory record at the same
time. Isolation ensures that each user's update is treated as if they were the only
one accessing the data, preventing conflicts and maintaining data integrity.
Durability: Once a transaction has been committed, its changes are permanent,
even if the system fails afterwards.
Going back to the money transfer, durability ensures that if the system
crashes right after the transfer commits, the transaction is not lost. Both the
debit and the credit are reflected in the accounts upon system recovery.
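The money transfer example can be tried out with Python's built-in sqlite3 module. In this sketch the `accounts` table, the account names, and the `transfer` helper are assumptions made for illustration. A CHECK constraint stands in for the business rule that a balance may not go negative. It uses an in-memory database, so it demonstrates atomicity and consistency but not durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move `amount` between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)        # both updates succeed
failed = transfer(conn, "alice", "bob", 500)   # debit breaks the CHECK, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to the receiver has already run when the debit is rejected, yet the rollback removes it again, so the total amount of money in the system stays the same.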
Concurrency Control:
Concurrency control mechanisms ensure that multiple transactions can access
and modify the database concurrently without corrupting data or violating the
ACID properties. Here are some common techniques:
Multi-Version Concurrency Control (MVCC):
Benefits:
Benefits:
Drawbacks:
Aborted transactions can lead to wasted work and require retry logic.
Database Recovery:
Even with concurrency control, unexpected events like system crashes or
hardware failures can corrupt data. Database recovery techniques ensure data
remains consistent and available even after such incidents.
Recovery Techniques:
Undo logging: If a transaction was not committed (e.g., due to a crash), its
changes are undone using the log information, preventing inconsistent
data from persisting.
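Undo logging can be sketched with an in-memory dictionary standing in for the database and a list standing in for the log. The record layout, function names and values below are assumptions made for illustration. A real system also has to force log records to disk before the data pages they describe.

```python
# In-memory stand-ins for the stored data and the append-only log.
database = {"A": 100, "B": 50}
log = []

def write(txn, key, new_value):
    """Record the old value in the log before overwriting it."""
    log.append(("UPDATE", txn, key, database[key]))
    database[key] = new_value

def commit(txn):
    log.append(("COMMIT", txn))

def recover():
    """Undo, newest first, every update whose transaction never committed."""
    committed = {record[1] for record in log if record[0] == "COMMIT"}
    for record in reversed(log):
        if record[0] == "UPDATE" and record[1] not in committed:
            _, _, key, old_value = record
            database[key] = old_value

write("T1", "A", 70)
write("T1", "B", 80)
commit("T1")
write("T2", "A", 0)   # T2 "crashes" here, before it can commit
recover()
print(database)
```

After recovery the committed changes of T1 survive while the half-finished change of T2 is rolled back to the value saved in the log.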