DBMS Module 2,4

The document discusses various concepts in database management systems (DBMS), including relational algebra, tuple and domain relational calculus, and normalization principles. It explains the importance of functional dependencies, Armstrong's Axioms, and the ACID properties for ensuring data integrity and consistency. Additionally, it covers query optimization strategies and concurrency control mechanisms to manage multiple transactions effectively.

DBMS

Module 2
Relational Algebra (Think Building Blocks):

This approach breaks down data retrieval into basic operations like building
blocks. These operations are then combined to form more complex queries.

Operators like SELECT , PROJECT , JOIN , and SET DIFFERENCE are like Lego bricks for
your query.

Example: Say you want to find all the albums in the Rock genre (SELECT) and
display only the album title and artist (PROJECT). Relational Algebra would let
you combine these operations to get the desired results.
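The two building blocks in this example can be sketched in a few lines of Python; the album rows and column names below are invented for illustration:

```python
# Sketch of SELECT (filter rows) and PROJECT (keep columns) on an
# illustrative album table represented as a list of dicts.
albums = [
    {"title": "Back in Black", "artist": "AC/DC", "genre": "Rock"},
    {"title": "Kind of Blue", "artist": "Miles Davis", "genre": "Jazz"},
    {"title": "Nevermind", "artist": "Nirvana", "genre": "Rock"},
]

def select(rows, predicate):
    # SELECT (sigma): keep only rows satisfying the predicate
    return [r for r in rows if predicate(r)]

def project(rows, attrs):
    # PROJECT (pi): keep only the named attributes, dropping duplicates
    seen, out = set(), []
    for r in rows:
        t = tuple((a, r[a]) for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(t))
    return out

# Compose the two blocks: Rock albums, showing only title and artist
rock = project(select(albums, lambda r: r["genre"] == "Rock"),
               ["title", "artist"])
```

Composing `project` around `select` mirrors how relational algebra nests operators to build a larger query.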

Tuple Relational Calculus (Think Logical Statements):

This method uses a more logical approach to express your query. It focuses
on what data you want, rather than how to get it step-by-step.

Imagine writing a sentence like "Find all albums where the genre is 'Rock'".
That's kind of like tuple relational calculus.

Example: You want to find all jazz albums released after 2010. Tuple relational
calculus allows you to write a logical statement that specifies this condition.
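The "what, not how" flavor can be mimicked by writing the condition as a single predicate, then asking for the set of tuples that satisfy it; the rows below are invented for illustration:

```python
# Tuple relational calculus reads as "the set of tuples t such that P(t)".
albums = [
    {"title": "The Epic", "genre": "Jazz", "year": 2015},
    {"title": "Kind of Blue", "genre": "Jazz", "year": 1959},
    {"title": "Lonerism", "genre": "Rock", "year": 2012},
]

# { t | t in albums AND t.genre = 'Jazz' AND t.year > 2010 }
def condition(t):
    return t["genre"] == "Jazz" and t["year"] > 2010

result = [t for t in albums if condition(t)]
```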

Domain Relational Calculus (Think Going Deeper):

This is an advanced version of tuple relational calculus. It focuses on
individual data values within a table, rather than entire tuples (rows).

Imagine you want to find all artists whose names start with the letter "A" and
have at least one Rock album in the database. Domain relational calculus
allows for such granular queries.

Here's a quick comparison:

| Feature | Relational Algebra | Tuple Relational Calculus | Domain Relational Calculus |
| --- | --- | --- | --- |
| Approach | Building-block operations | Logical statements | Values within tables |
| Complexity | Easier to understand for basic queries | More complex syntax | Most complex |
| Power | Can express complex queries through combinations | Good for specific data retrieval | Highly granular data manipulation |

| Feature | Purpose | Examples |
| --- | --- | --- |
| SQL3 | Standard for SQL | Defines advanced database functionalities |
| DDL | Data Definition Language | CREATE, ALTER, DROP (tables, views, users) |
| DML | Data Manipulation Language | INSERT, UPDATE, DELETE, SELECT |

| Feature | MySQL (Open Source) | Oracle (Commercial) | DB2 (Commercial) | SQL Server (Commercial) |
| --- | --- | --- | --- | --- |
| Licensing | Free | Paid | Paid | Paid |
| Cost | Lower upfront cost | Higher upfront cost | Higher upfront cost | Higher upfront cost |
| Setup and Management | Requires more technical expertise | Easier to set up and manage | Easier to set up and manage | Easier to set up and manage |
| Features | Strong for web applications, good community support | Most extensive features, good for large enterprises | Mature and widely used, good for OLTP (Online Transaction Processing) | Scalable, good for Windows environments |
| Security | Requires careful configuration and maintenance | Robust security features | Strong security features | Strong security features |
| Support | Community-driven support | Vendor support available | Vendor support available | Vendor support available |

Domain dependency in a database management system (DBMS) is a rule that
states that the domain of an attribute in one table must match the domain of the
corresponding attribute in another table. This ensures that the data in each table is
consistent and can be used to create queries that return the desired results.
Data Dependency:

A data dependency is a relationship between two or more attributes (columns)
within a table, or between tables themselves. It describes how a change in the
value of one attribute might affect the value of another.

There are different types of data dependencies, but some common ones
include:

Functional dependency (FD): When the value of one attribute
(determinant) uniquely determines the value of another attribute
(dependent). For example, in a "Customers" table, "CustomerID"
functionally determines "CustomerName" (one customer cannot have two
different names).

Multivalued dependency (MVD): When a single value in one attribute
determines a well-defined set of values in another attribute, independently
of the remaining attributes. Imagine an "Orders" table with an attribute
"OrderID" and an attribute "ProductID" (allowing multiple products in one
order). "OrderID" multi-determines "ProductID" (written OrderID ->> ProductID).

Join dependency (JD): When a table can be reconstructed, without loss of
information, by joining two or more of its projections. This often arises when
data is split across multiple tables to avoid redundancy.
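Whether an FD holds in a given table instance can be checked mechanically: no two rows may agree on the determinant but differ on the dependent. A minimal sketch, with invented customer rows:

```python
# Check whether the functional dependency lhs -> rhs holds in a table
# instance (a list of dicts). Returns False on the first violation.
def fd_holds(rows, lhs, rhs):
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False          # same determinant, different dependent
        seen[key] = val
    return True

customers = [
    {"CustomerID": 1, "CustomerName": "Ada", "City": "Pune"},
    {"CustomerID": 2, "CustomerName": "Raj", "City": "Pune"},
    {"CustomerID": 1, "CustomerName": "Ada", "City": "Delhi"},
]
```

Here `CustomerID -> CustomerName` holds, but `CustomerID -> City` does not, because customer 1 appears with two cities.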

Armstrong's Axioms:

These are a set of inference rules that allow you to determine all the functional
dependencies (FDs) that hold true within a set of existing FDs. Functional
Dependency (FD) simply means that one attribute (determinant) uniquely
determines another attribute (dependent) in a table.

Imagine a table "Customers" with attributes "CustomerID", "CustomerName", and
"Email". The FD "CustomerID -> CustomerName" holds true, meaning the
customer ID uniquely identifies the customer's name.

Armstrong's Axioms provide a systematic way to derive additional FDs based on
existing ones. There are three main axioms:

Reflexivity: If Y is a subset of X, then X determines Y (written as X -> Y).

Augmentation: If X determines Y, and Z is any set of attributes, then XZ
determines YZ.

Transitivity: If X determines Y, and Y determines Z, then X determines Z.

By repeatedly applying these axioms to a set of initial FDs, you can discover all the
implicit FDs that hold true in the data.
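The repeated application of the axioms is usually implemented as an attribute-closure computation: X+ is the set of all attributes determined by X. A minimal sketch, with an invented FD set:

```python
# Compute the closure X+ of an attribute set under a set of FDs.
# Repeatedly apply every FD whose left side is already covered;
# this fixpoint loop is what Armstrong's axioms boil down to in practice.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)   # X determines lhs, lhs -> rhs, so X -> rhs
                changed = True
    return result

# Example FDs: CustomerID -> CustomerName, CustomerName -> Email
fds = [({"CustomerID"}, {"CustomerName"}),
       ({"CustomerName"}, {"Email"})]
```

The closure of {CustomerID} picks up Email via transitivity, even though no explicit FD says CustomerID -> Email.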
Normal Forms:

Normal forms are a series of increasingly stricter guidelines for designing
relational database tables to minimize redundancy and improve data integrity.
Here are the common normal forms:

First Normal Form (1NF): The most basic level. It eliminates repeating groups
of data within a table and ensures each cell contains a single atomic value (no
lists or sets).

Second Normal Form (2NF): Builds on 1NF by eliminating partial
dependencies. In simpler terms, all non-key attributes must be fully dependent
on the entire primary key (the unique identifier for a row), not just a part of it.

Third Normal Form (3NF): Further reduces redundancy by eliminating
transitive dependencies. This means no non-key attribute should be
dependent on another non-key attribute; they should all be dependent solely
on the primary key.

Boyce-Codd Normal Form (BCNF): A stricter version of 3NF, requiring that
for every nontrivial functional dependency X -> Y, the determinant X must be
a superkey of the table.
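The BCNF condition can be tested directly with the closure computation: a determinant is a superkey exactly when its closure covers every attribute. A minimal sketch, using an invented Student/Course/Instructor schema:

```python
# BCNF check: every nontrivial FD X -> Y must have a superkey determinant.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(all_attrs, fds):
    for lhs, rhs in fds:
        if rhs <= lhs:                      # trivial FD, ignore
            continue
        if closure(lhs, fds) != set(all_attrs):
            return False                    # determinant is not a superkey
    return True

# Invented schema: {Student, Course} -> Instructor, Instructor -> Course.
# The second FD's determinant (Instructor) is not a superkey, so not BCNF.
fds = [({"Student", "Course"}, {"Instructor"}),
       ({"Instructor"}, {"Course"})]
```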

The Relationship:

Armstrong's Axioms help you identify all the functional dependencies within your
data. By understanding these dependencies, you can then choose the appropriate
normal form to normalize your database tables.

Normalization removes redundancy: By eliminating duplicate data and
ensuring each attribute depends solely on the primary key, you save storage
space and minimize the risk of data inconsistencies.

Armstrong's Axioms guide normalization: By identifying all the FDs, you can
see which normal form is most suitable for your data structure and avoid
potential anomalies.

Dependency Preservation:

Concept: This principle emphasizes maintaining the functional dependencies
(FDs) that exist within the original data during the process of database
normalization. FDs, as a reminder, define relationships between attributes,
where one attribute (determinant) uniquely identifies another (dependent).

Imagine: A table storing student information with attributes "StudentID",
"Name", and "Major". "StudentID" functionally determines "Name" (one
student has one name). When you normalize this table (split it into smaller
tables), dependency preservation ensures this relationship is preserved in the
resulting tables.

Why it matters:

Data integrity: Preserving FDs ensures data remains consistent. If an FD is
lost during normalization, changes to one attribute might not be reflected
correctly in the dependent attribute, leading to inconsistencies.

Normalization benefits: Normalization, which involves splitting tables to
reduce redundancy, is essential for efficient database management.
Dependency preservation ensures you can still derive the original FDs even
after splitting the tables.

Lossless Design:

Concept: This approach aims to create a normalized database design where
you can reconstruct the original data from the decomposed tables without any
loss of information. It essentially ensures the process of normalization is
reversible.

Think of it as: Imagine a jigsaw puzzle. Lossless design ensures you can take
the separate pieces (normalized tables) and put them back together (join the
tables) to get the complete picture (original data) without any missing pieces.

Lossless design and dependency preservation are closely linked:

Preserving FDs is a key requirement for achieving lossless design. If you
lose FDs during normalization, you might not be able to reconstruct the
original data accurately.

Lossless design methods, like normalization based on Armstrong's Axioms,
often ensure dependency preservation. By following these methods, you can
be confident that the relationships within your data are maintained even after
normalization.

Benefits of Lossless Design:

Data integrity: Like dependency preservation, lossless design helps maintain
data consistency and avoid information loss.

Flexibility: A lossless design allows you to easily retrieve the original data if
needed, which can be helpful for reporting or analysis purposes.

Maintainability: Normalized databases with lossless design principles are
easier to maintain and update as your data needs evolve.
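The jigsaw-puzzle idea can be tested directly: split a table into two projections and natural-join them back. In the sketch below (invented student rows), both pieces keep the key StudentID, so the decomposition is lossless and the join recovers the original rows exactly:

```python
# Lossless-join check: decompose, natural-join, compare with the original.
def project(rows, attrs):
    return {tuple(r[a] for a in attrs) for r in rows}

def natural_join(rows1, attrs1, rows2, attrs2):
    common = [a for a in attrs1 if a in attrs2]
    joined = set()
    for r1 in rows1:
        d1 = dict(zip(attrs1, r1))
        for r2 in rows2:
            d2 = dict(zip(attrs2, r2))
            if all(d1[a] == d2[a] for a in common):
                joined.add(tuple(sorted({**d1, **d2}.items())))
    return joined

students = [
    {"StudentID": 1, "Name": "Ada", "Major": "CS"},
    {"StudentID": 2, "Name": "Raj", "Major": "EE"},
]
a1, a2 = ["StudentID", "Name"], ["StudentID", "Major"]
rejoined = natural_join(project(students, a1), a1,
                        project(students, a2), a2)
original = {tuple(sorted(r.items())) for r in students}
```

If the shared attribute were not a key of one of the pieces, the join could produce spurious extra rows — the signature of a lossy decomposition.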

Relational Algebra Expressions:

Imagine building blocks for manipulating data. Relational algebra provides
operators like SELECT , PROJECT , JOIN , SET DIFFERENCE , and others that act on tables to
achieve specific results. Here's how evaluation works:

1. Expression Breakdown: The query is first parsed into a tree-like structure
representing the relational algebra expression. Each node in the tree
represents an operation (e.g., SELECT ), and the leaves represent the tables
involved.

2. Bottom-Up Evaluation: The evaluation starts from the leaves (tables) and
progresses upwards.

3. Operator Application: At each level, the operator at that node is applied to the
results coming from its children nodes.

Evaluation Example:

Consider a query that retrieves all customer names ( Name ) from a "Customers"
table who placed orders after a specific date ( OrderDate ). Here's a possible
relational algebra expression:

π Name ( σ OrderDate > '2024-06-15' (Orders) ⋈ Orders.CustomerID = Customers.CustomerID Customers )

Breakdown: This expression involves a JOIN operation between two tables:
"Orders" and "Customers".

Evaluation:

The σ (sigma) operation selects rows from "Orders" where OrderDate is
greater than the specified date.

The result of this selection is then joined with the "Customers" table based
on the matching CustomerID .

Finally, the π (pi) operation projects only the "Name" attribute from the
joined result.
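The bottom-up evaluation of this expression can be traced in Python, one operator per step; the sample rows are invented for illustration:

```python
# Bottom-up evaluation: sigma on Orders, then join with Customers on
# CustomerID, then pi on Name.
orders = [
    {"OrderID": 10, "CustomerID": 1, "OrderDate": "2024-07-01"},
    {"OrderID": 11, "CustomerID": 2, "OrderDate": "2024-05-01"},
]
customers = [
    {"CustomerID": 1, "Name": "Ada"},
    {"CustomerID": 2, "Name": "Raj"},
]

# sigma: select orders after the cutoff date (ISO dates compare as strings)
selected = [o for o in orders if o["OrderDate"] > "2024-06-15"]
# join: combine each selected order with its matching customer
joined = [{**o, **c} for o in selected for c in customers
          if o["CustomerID"] == c["CustomerID"]]
# pi: keep only the Name attribute
names = [{"Name": r["Name"]} for r in joined]
```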

Query Equivalence:
In the world of relational databases, sometimes multiple ways of writing a query
can lead to the same result. This concept is called query equivalence.
Understanding query equivalence allows you to optimize your queries for better
performance.

Concept: Two queries are considered equivalent if they produce the same
output for any given database state.

Benefits:

Optimization opportunities: By recognizing equivalent queries, the
database optimizer can choose the most efficient execution plan.

Flexibility: You can rewrite your queries for better readability or to utilize
specific functionalities of the database system.

Join Strategies:
Joins are a fundamental operation in relational databases, combining data from
multiple tables based on a common attribute. Choosing the right join strategy can
significantly affect query performance.

Types of Joins:

Inner Join: Returns rows where the join condition is met in both tables.

Left Join: Includes all rows from the left table and matching rows from the
right table, with null values for non-matching rows in the right table.

Right Join: Similar to left join, but with tables reversed.

Full Join: Includes all rows from both tables, regardless of whether they
match the join condition.

Join Strategies:

Nested Loop Join: A simple but potentially inefficient method where rows
from one table are compared to every row in the other table.

Merge Join: More efficient strategy that sorts the tables on the join
attribute and then merges them based on the sorted order.

Hash Join: Another efficient method that creates a hash table from one
table and probes the hash table with values from the other table.
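The difference between the nested loop and hash strategies is easy to see side by side — both return the same rows, but the hash join avoids the full cross-comparison. The tables below are invented for illustration:

```python
# Two join strategies over the same tables; both produce the same result.
left = [("A", 1), ("B", 2), ("C", 2)]    # (name, dept_id)
right = [(1, "Sales"), (2, "Eng")]       # (dept_id, dept_name)

def nested_loop_join(left, right):
    # compare every left row with every right row: O(n * m) comparisons
    return [(n, d) for (n, k) in left for (j, d) in right if k == j]

def hash_join(left, right):
    # build a hash table on one input, then probe it: roughly O(n + m)
    table = {j: d for (j, d) in right}
    return [(n, table[k]) for (n, k) in left if k in table]
```

A merge join would instead sort both inputs on `dept_id` and advance two cursors in step, which pays off when the inputs are already sorted or indexed.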

Query optimization algorithms:

1. Cost-Based Optimization:
This is the most widely used approach. The optimizer estimates the cost
(processing time, memory usage) of executing different query plans for the same
query. These costs are based on factors like:

Number of rows processed in each step of the query plan (e.g., selection,
join).

Availability of indexes on relevant attributes, which can significantly speed up
operations.

I/O operations required to access data from disk.

The optimizer then chooses the plan with the lowest estimated cost, aiming for the
most efficient execution.
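A toy version of this idea: estimate the rows each candidate plan touches and pick the cheapest. The cost formulas and numbers below are invented purely to show the mechanism:

```python
# Toy cost model comparing two plans for "filter Orders, join Customers".
def cost_filter_then_join(n_orders, n_customers, selectivity):
    # filter Orders first, then join only the (much smaller) result
    filtered = n_orders * selectivity
    return n_orders + filtered * n_customers

def cost_join_then_filter(n_orders, n_customers, selectivity):
    # join everything first, then filter the huge joined result
    joined = n_orders * n_customers
    return joined + joined * selectivity

plans = {
    "filter-then-join": cost_filter_then_join(10_000, 1_000, 0.01),
    "join-then-filter": cost_join_then_filter(10_000, 1_000, 0.01),
}
best = min(plans, key=plans.get)   # optimizer picks the lowest estimate
```

Real optimizers use the same shape of reasoning with far richer statistics (histograms, index availability, I/O costs).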

2. Rule-Based Optimization:
This approach relies on a set of pre-defined rules that guide the optimizer in
choosing an execution plan. These rules are based on best practices and
knowledge about how different operations interact. While simpler than cost-based
optimization, it might not always find the absolute best plan for complex queries.
3. Heuristic Optimization:

Here, the optimizer uses heuristics (informed guesses) to guide its
decision-making. These heuristics can be based on past experiences or statistical
analysis of the data. While less precise than cost-based optimization, heuristics
can be helpful for situations where accurate cost estimation might be difficult.

4. Genetic Algorithms:
These algorithms, inspired by the process of natural selection, involve creating a
population of possible execution plans. The plans are then evaluated based on
their estimated cost, and "better" plans are used to create new generations
through crossover and mutation operations. This iterative process aims to
converge on the optimal plan over multiple generations.

5. Machine Learning Techniques:


Emerging techniques involve training machine learning models on historical data
about query execution times and costs. These models can then be used to predict
the cost of different plans for new queries, potentially leading to more accurate
optimization.

Module 4
ACID Properties:
ACID stands for Atomicity, Consistency, Isolation, and Durability. These properties
guarantee reliable data manipulation within a database transaction:

Atomicity: A transaction is treated as an indivisible unit. Either all the
operations within the transaction are completed successfully, or none of them
are. This ensures that the database never ends up in an inconsistent state due
to partial updates.

Imagine transferring money from one account to another. The transaction would
involve debiting the sender's account and crediting the receiver's account.
Atomicity guarantees that either both debits and credits happen (successful
transfer), or neither happens (no transfer occurs).
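The all-or-nothing behavior can be sketched with a rollback-on-failure pattern; the account data and the dict-based "database" are invented for illustration:

```python
# Atomic transfer: apply both updates, or roll back to the saved state.
def transfer(accounts, src, dst, amount):
    snapshot = dict(accounts)          # saved state for rollback
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount        # debit ...
        accounts[dst] += amount        # ... and credit commit together
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # rollback: as if nothing ran
        raise

accounts = {"alice": 100, "bob": 50}
transfer(accounts, "alice", "bob", 30)   # succeeds: both sides updated
```

A failed transfer (for example, for more than the balance) leaves the accounts exactly as they were before it started.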

Consistency: Each transaction transforms the database from one valid state
to another valid state. It enforces business rules and data integrity constraints.

Continuing the money transfer example, consistency ensures that the total amount
of money in the system remains the same after the transaction. There's no
possibility of the sender's account being debited but the receiver's account not
being credited (violating the money balance).

Isolation: Transactions are isolated from each other, even if they happen
concurrently. This prevents data inconsistencies that could arise if multiple
transactions try to modify the same data at the same time.

Think of multiple users trying to update the same inventory record at the same
time. Isolation ensures that each user's update is treated as if they were the only
one accessing the data, preventing conflicts and maintaining data integrity.

Durability: Once a transaction is committed (marked as successful), the
changes made by the transaction are permanent and survive system failures
like crashes or power outages.

Going back to the money transfer, durability ensures that even if the system
crashes after debiting the sender's account, the transaction is not rolled back. The
credited amount is reflected in the receiver's account upon system recovery.

Concurrency Control:
Concurrency control mechanisms ensure that multiple transactions can access
and modify the database concurrently without corrupting data or violating the
ACID properties. Here are some common techniques:

Locking: This technique prevents other transactions from accessing data
while one transaction is modifying it. There are different locking mechanisms
(shared locks, exclusive locks) to manage read and write access.

Optimistic Concurrency Control (OCC): This approach allows multiple
transactions to proceed without locking data initially. However, it validates
changes before committing them. If conflicts are detected, the transaction
might be rolled back.

Timestamp Ordering: Transactions are assigned timestamps, and their
execution order is determined based on these timestamps. This ensures a
predictable execution order and helps prevent conflicts.

Multi-Version Concurrency Control (MVCC):

Concept: MVCC avoids locking entire data items by maintaining multiple
versions of a record. Each transaction "sees" its own version, and writes
create new versions instead of modifying existing ones.

Benefits:

Improves concurrency by avoiding long-held locks.

Enables consistent (snapshot) reads, where a read operation sees the
data as it was at the start of the transaction.

Example: Imagine two transactions trying to update the same bank
account balance. MVCC allows both transactions to read the current
balance (each seeing its own snapshot) and then write their updates as new
versions, so readers are never blocked by writers.
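A minimal multi-version store can be sketched in a few lines; the class and method names below are invented for illustration, and transaction IDs double as snapshot timestamps:

```python
# Each write appends a (txn_id, value) version instead of overwriting;
# a reader sees the newest version at or before its own snapshot.
class MVCCStore:
    def __init__(self):
        self.versions = {}     # key -> list of (txn_id, value)
        self.next_txn = 1

    def begin(self):
        txn = self.next_txn    # transaction id doubles as snapshot point
        self.next_txn += 1
        return txn

    def write(self, txn, key, value):
        self.versions.setdefault(key, []).append((txn, value))

    def read(self, txn, key):
        # newest version written at or before this transaction's snapshot
        candidates = [(t, v) for t, v in self.versions.get(key, [])
                      if t <= txn]
        return max(candidates)[1] if candidates else None

store = MVCCStore()
t1 = store.begin()
store.write(t1, "balance", 100)
t2 = store.begin()
store.write(t2, "balance", 70)   # new version; t1's view is untouched
```

Transaction t1 keeps seeing 100 even after t2 writes 70, because t2's version is newer than t1's snapshot.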

Optimistic Concurrency Control (OCC):

Concept: OCC takes a more relaxed approach, assuming conflicts are
rare. Transactions proceed without locking data upfront.

Validation: When a transaction attempts to commit, it's validated against
the current database state. If a conflict is detected (e.g., another
transaction modified the same data), the transaction is aborted and needs
to be retried.

Benefits:

Simpler implementation compared to locking-based schemes.

Potentially higher concurrency as data is not locked during the entire
transaction.

Drawbacks:

Requires additional processing during commit for validation, which
might impact performance.

Aborted transactions can lead to wasted work and require retry logic.

Aborted transactions can lead to wasted work and require retry logic.
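The validate-at-commit step is often implemented with a version counter: read the value and its version, compute, and commit only if the version is unchanged. The class below is an invented sketch of that pattern:

```python
# Optimistic concurrency via version checking: commit fails if another
# transaction committed between our read and our write.
class OCCRecord:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, read_version):
        # validation: abort if someone committed since we read
        if self.version != read_version:
            return False               # conflict -> caller must retry
        self.value = new_value
        self.version += 1
        return True

rec = OCCRecord(100)
v1, ver1 = rec.read()              # transaction A reads
v2, ver2 = rec.read()              # transaction B reads concurrently
ok_a = rec.commit(v1 - 30, ver1)   # A commits first and succeeds
ok_b = rec.commit(v2 - 50, ver2)   # B fails validation (stale version)
```

Transaction B would re-read the record (now 70, version 1) and retry its update — the retry logic the drawbacks above refer to.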

Database Recovery:

Even with concurrency control, unexpected events like system crashes or
hardware failures can corrupt data. Database recovery techniques ensure data
remains consistent and available even after such incidents.

Transaction Logs: These logs track all database modifications made by
transactions. They record information like before-and-after states of data
items and transaction statuses (committed, aborted).

Recovery Techniques:

Redo logging: After a crash, the system replays the committed
transactions from the log to restore the database to its most recent
consistent state.

Undo logging: If a transaction was not committed (e.g., due to a crash), its
changes are undone using the log information, preventing inconsistent
data from persisting.
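Log-based recovery can be sketched as a replay over the log records; the tuple-based log format below is invented for illustration:

```python
# Replay a transaction log after a crash: redo writes from committed
# transactions, and skip (undo) writes from transactions that never
# committed before the crash.
def recover(log):
    committed = {txn for op, txn, *_ in log if op == "COMMIT"}
    db = {}
    for op, txn, *args in log:
        if op == "WRITE" and txn in committed:   # redo committed work
            key, value = args
            db[key] = value
        # writes from uncommitted transactions are simply not applied
    return db

log = [
    ("WRITE", "T1", "A", 10),
    ("COMMIT", "T1"),
    ("WRITE", "T2", "B", 20),   # T2 never committed (crash happened)
]
```

After recovery, T1's write survives (durability) while T2's unfinished write is discarded (atomicity).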
