ADBMS Notes
Query Processing
Query processing in a database management system (DBMS) refers to the series of steps
required to execute a query and retrieve the desired data. It involves translating a high-level
SQL query into an efficient execution plan, optimizing that plan, and then executing it to
obtain the result. The main goal of query processing is to minimize response time and
resource usage.
1. Query:
User queries are initially written in high-level database languages such as SQL.
These queries need to be translated into expressions that can be used at the physical level of
the file system.
2. Parsing and Translation:
The parser checks the syntax of the query, verifies that the relations and attributes it references exist, and translates the query into an internal relational-algebra representation.
3. Relational-Algebra Expression:
Description: The relational-algebra expression represents the logical steps needed to execute
the query.
SQL is suitable for humans to write and understand queries.
However, SQL is not perfectly suitable for internal representation within the system.
Relational algebra is better suited for internal representation of queries.
Example: The relational algebra operation σ (selection) is applied to the employees relation
to filter rows where department = 'Sales'.
4. Optimizer
Description: The optimizer's goal is to find the most efficient way to execute the
query.
Logical Optimization: Applies transformations to the relational-algebra expression to find
more efficient equivalent expressions.
Physical Optimization: Determines the best physical execution plan by choosing algorithms
for each operation and considering different access paths.
To optimize a query, the query optimizer needs an estimated cost for each operation, because the overall cost of a plan depends on the memory allocated to individual operations, their execution costs, and so on.
The optimizer improves query performance by selecting the best possible execution plan.
It selects the most efficient plan with the help of statistics about the data (metadata, i.e., data about the data).
5. Statistic Data:
Role: Provides information about the database, such as table sizes, data distribution, and
index usage, to help estimate the cost of different execution plans.
Example: Statistics might indicate that using an index on the department column is faster
than a full table scan.
6. Execution Plan:
An execution plan is a detailed strategy that the database management system (DBMS) uses
to execute a query. It involves choosing the most efficient way to retrieve the requested data
by specifying the operations and the order in which they should be performed.
The execution plan is typically represented in the form of a query tree.
7. Evaluation:
After finding the best execution plan, the DBMS executes the optimized query and retrieves the results from the database.
In this step, the DBMS performs the actual operations on the data, such as selecting, inserting, or updating rows.
Once these operations are complete, the DBMS returns the result of the evaluation step.
8. Query Output:
Description: The final result of the query is generated and returned to the user.
Example: The output might be a set of rows showing all employees in the Sales department.
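To make the selection step above concrete, here is a minimal Python sketch of the relational-algebra selection operator; the employees table, its columns, and the rows are hypothetical example data, not part of any particular system.

    # Minimal sketch of the selection operator (sigma) over an in-memory table.
    # The 'employees' rows and column names are illustrative assumptions.
    employees = [
        {"id": 1, "name": "Asha",  "department": "Sales"},
        {"id": 2, "name": "Ravi",  "department": "HR"},
        {"id": 3, "name": "Meena", "department": "Sales"},
    ]

    def select(rows, predicate):
        """sigma: keep only the rows for which the predicate is true."""
        return [row for row in rows if predicate(row)]

    # sigma department = 'Sales' (employees)
    sales_employees = select(employees, lambda r: r["department"] == "Sales")
    print(sales_employees)   # rows 1 and 3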
Recovery:
Recovery in Database Management Systems (DBMS) refers to the process of restoring a database to a consistent and usable state after a system failure or error. Recovery is important for the following reasons:
1. Data Integrity: Databases often store critical and sensitive information. In the event
of system failures, software bugs, or human errors, data can become corrupted or lost.
Recovery mechanisms ensure that the database can recover to a consistent and valid
state, preserving data integrity.
2. Transaction Atomicity: Databases maintain the ACID properties (Atomicity,
Consistency, Isolation, Durability) to ensure reliable transactions. Atomicity
guarantees that either all operations within a transaction are completed successfully,
or none are. Recovery mechanisms help maintain atomicity by ensuring that partially
completed transactions are rolled back in the event of a failure.
3. Durability: The durability property of transactions ensures that committed changes
persist even in the face of system failures. Recovery mechanisms, such as logging and
checkpointing, help achieve durability by recording changes to disk and allowing the
system to recover committed changes if a failure occurs.
4. Data Consistency: Databases must remain in a consistent state, adhering to integrity
constraints and business rules. Recovery mechanisms ensure that the database can
recover to a consistent state after a failure, by rolling back incomplete transactions
and applying committed changes.
5. System Reliability: System failures, such as power outages, hardware malfunctions,
or software crashes, can occur unexpectedly. Recovery mechanisms enhance system
reliability by providing mechanisms to recover from these failures, minimizing data
loss and downtime.
Check Points:
A checkpoint is a mechanism in which all earlier log records are written out of main memory and stored permanently on the storage disk.
A checkpoint acts like a bookmark. During the execution of transactions, checkpoints are marked at intervals, and as the transactions execute, log records are created for each of their steps.
When a checkpoint is reached, all updates recorded in the log up to that point are applied to the database, and the log records up to that point are removed from the log file. The log file is then filled with the records of the subsequent transaction steps until the next checkpoint, and so on.
A checkpoint is used to declare a point before which the DBMS was in a consistent state and all transactions had been committed.
In the following example (four transactions T1 to T4), the recovery system reads the log file from the end back to the start, i.e., from T4 to T1.
The recovery system maintains two lists: a redo-list and an undo-list.
A transaction is put into the redo-list if the recovery system finds a log containing both <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All transactions in the redo-list are redone, i.e., their logged updates are re-applied to the database.
For example: in the log file, transactions T2 and T3 have both <Tn, Start> and <Tn, Commit> records. Transaction T1 has only a <Tn, Commit> record, because it started before the checkpoint and committed after the checkpoint was crossed. Hence T1, T2 and T3 are put into the redo-list.
A transaction is put into the undo-list if the recovery system finds a log containing <Tn, Start> but no commit or abort record. All transactions in the undo-list are undone, and their log records are removed.
For example: transaction T4 has only a <Tn, Start> record, so T4 is put into the undo-list, since it had not completed when the failure occurred.
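The classification just described can be sketched in a few lines of Python. The log contents below simply mirror the T1 to T4 example above (illustrative data only); abort records are not modelled because the example contains none.

    # Sketch: build the redo-list and undo-list from the log records after a checkpoint.
    log = [
        ("T1", "COMMIT"),            # T1 started before the checkpoint
        ("T2", "START"), ("T2", "COMMIT"),
        ("T3", "START"), ("T3", "COMMIT"),
        ("T4", "START"),             # T4 never committed -> failed in between
    ]

    started   = {t for (t, a) in log if a == "START"}
    committed = {t for (t, a) in log if a == "COMMIT"}

    redo_list, undo_list = [], []
    for txn in sorted(started | committed):
        if txn in committed:          # <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>
            redo_list.append(txn)
        else:                         # <Tn, Start> with no commit record
            undo_list.append(txn)

    print("redo:", redo_list)   # ['T1', 'T2', 'T3']
    print("undo:", undo_list)   # ['T4']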
"Log-based recovery" is a technique used in database systems to ensure that the database
remains in a consistent state even in the event of system failures. It involves maintaining a
log, also known as a transaction log or redo log, which records all the changes made to the
database. This log allows the database management system (DBMS) to recover transactions
that were in progress but not yet completed at the time of a failure.
1. Logging: Whenever a transaction modifies data in the database, the DBMS writes a
record of the modification to the transaction log before making the actual changes to
the database. This log record contains information such as the type of operation
performed (insert, update, delete), the data item affected, and the old and new values
of the data item.
2. Checkpointing: Periodically, the DBMS performs a checkpoint operation where it
writes a record to the log indicating that all transactions that were committed before
the checkpoint have been successfully written to disk. This helps in reducing the time
needed for recovery by limiting the portion of the log that needs to be examined
during the recovery process.
3. Recovery: In the event of a system failure, such as a crash or power outage, the
DBMS uses the transaction log to recover the database to a consistent state. It replays
the logged changes starting from the last checkpoint, applying the changes to the
database in the same order they were originally made. This process ensures that all
committed transactions are successfully re-executed, restoring the database to a
consistent state.
Deferred database modification − all log records are written to stable storage first, and the database itself is updated only when the transaction commits.
Immediate database modification − each log record is immediately followed by the actual database modification; that is, the database is modified right after every operation (but only after the corresponding log record has been written).
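The sketch below illustrates the kind of log record described in the Logging step and how old/new values support undo and redo under immediate modification. The record layout, data item, and values are assumptions for illustration, not any specific DBMS's log format.

    # Sketch of write-ahead logging with old/new values (illustrative record layout).
    database = {"A": 100}      # hypothetical data item
    log = []                   # the transaction log

    def logged_update(txn, item, new_value):
        # The log record is written BEFORE the database is changed
        # (immediate modification with write-ahead logging).
        log.append({"txn": txn, "op": "UPDATE", "item": item,
                    "old": database[item], "new": new_value})
        database[item] = new_value

    def undo(txn):
        """Roll back an uncommitted transaction using the old values."""
        for rec in reversed(log):
            if rec["txn"] == txn:
                database[rec["item"]] = rec["old"]

    def redo(txn):
        """Re-apply a committed transaction using the new values."""
        for rec in log:
            if rec["txn"] == txn:
                database[rec["item"]] = rec["new"]

    logged_update("T1", "A", 150)
    undo("T1")
    print(database)   # {'A': 100}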
Transaction States:
1. Active: The initial state of a transaction, in which it is executing and performing database
operations, such as reading or modifying data.
2. Partially Committed: After a transaction has executed all of its operations and is
ready to commit, but the changes have not been made permanent in the database yet.
This is a transitional state before committing.
3. Committed: In this state, the transaction has successfully completed all of its
operations, and its changes have been made permanent in the database. Once
committed, the changes are durable and cannot be rolled back.
4. Aborted: If a transaction encounters an error or is explicitly rolled back before
committing, it enters the aborted state. In this state, any changes made by the
transaction are undone, restoring the database to its state before the transaction began.
5. Failed: A transaction enters the failed state if it encounters a system failure or an error
that prevents it from completing its operations. In this state, the DBMS may
automatically abort the transaction or initiate recovery procedures to restore the
database to a consistent state.
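A small sketch of the state transitions just described; the transition table itself is a simplification assumed for illustration.

    # Sketch: allowed transitions between transaction states.
    TRANSITIONS = {
        "active":              {"partially committed", "failed"},
        "partially committed": {"committed", "failed"},
        "failed":              {"aborted"},
        "committed":           set(),   # terminal state
        "aborted":             set(),   # terminal state
    }

    def move(state, new_state):
        if new_state not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition: {state} -> {new_state}")
        return new_state

    s = "active"
    s = move(s, "partially committed")
    s = move(s, "committed")
    print(s)   # committed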
ACID Properties of a transaction:
In the context of databases, the term "ACID properties" refers to the four key properties of a
transaction. A transaction is a logical unit of work that is performed within a database
management system. Here's an explanation of each of the ACID properties:
1. Atomicity: Atomicity ensures that a transaction is treated as a single, indivisible unit of work: either all of its operations are performed successfully, or none of them are applied to the database.
2. Consistency: Consistency ensures that a transaction takes the database from one valid state to another valid state, preserving all integrity constraints and business rules.
3. Isolation: Isolation ensures that concurrently executing transactions do not interfere with one another; the outcome of running transactions concurrently is the same as if they had run one after another.
4. Durability: Durability ensures that once a transaction is committed, its effects persist
even in the event of system failures such as power outages or crashes. Once a
transaction has been successfully committed, the changes made by the transaction are
permanent and cannot be undone. The database system must store the changes
permanently, typically by writing them to non-volatile storage such as disk, to ensure
durability.
Together, these ACID properties provide a set of guarantees that ensure the reliability,
integrity, and consistency of transactions in a database system, even in the presence of
concurrent execution and system failures.
Concurrency:
What is a Schedule?
A schedule is a series of operations from one or more transactions. A schedule can be of two
types:
Serial Schedule: When one transaction completely executes before another transaction starts, the schedule is called a serial schedule. A serial schedule is always consistent.
e.g., if a schedule S has a debit transaction T1 and a credit transaction T2, the possible serial schedules are T1 followed by T2 (T1->T2) or T2 followed by T1 (T2->T1). A serial schedule has low throughput and poor resource utilization.
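To make the T1/T2 example concrete, here is a small sketch (hypothetical operations) showing a serial schedule next to an interleaved, concurrent one; each entry is (transaction, operation, data item).

    # Serial schedule: T1 runs to completion before T2 starts.
    serial = [
        ("T1", "read",  "A"), ("T1", "write", "A"),   # debit
        ("T2", "read",  "B"), ("T2", "write", "B"),   # credit
    ]

    # Concurrent (interleaved) schedule of the same two transactions.
    concurrent = [
        ("T1", "read",  "A"),
        ("T2", "read",  "B"),
        ("T1", "write", "A"),
        ("T2", "write", "B"),
    ]
    print(len(serial), len(concurrent))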
Concurrency control is a very important concept in DBMS: it ensures the simultaneous execution or manipulation of data by several processes or users without resulting in data inconsistency.
Concurrency control provides procedures that are able to control the concurrent execution of operations in the database.
One approach is to define concurrency control purely in terms of read and write operations; another is to treat some operations besides read and write as fundamental low-level operations and to extend concurrency control to deal with them.
Lock Based Concurrency Control Protocol in DBMS
In a database management system (DBMS), lock-based concurrency control is used to control the access of multiple transactions to the same data item. This protocol helps to maintain data consistency and integrity across multiple users. Under this protocol, transactions acquire locks on data items to control access and prevent conflicts between concurrent transactions.
1. Shared Lock (S): also known as a read-only lock. As the name suggests, it can be shared between transactions, because while holding this lock a transaction does not have permission to update the data item. An S-lock is requested using the lock-S instruction.
2. Exclusive Lock (X): the data item can be both read and written. This lock is exclusive and cannot be held simultaneously on the same data item by more than one transaction. An X-lock is requested using the lock-X instruction.
Implementing shared S(a) and exclusive X(a) locks without any further restrictions gives us the simple lock-based protocol (or binary locking), but it has its own disadvantage: it does not guarantee serializability.
A transaction is said to follow the Two-Phase Locking (2PL) protocol if all of its locking and unlocking operations are performed in two phases:
Growing Phase: New locks on data items may be acquired but none can be released.
Shrinking Phase: Existing locks may be released but no new locks can be acquired.
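A minimal sketch of the growing and shrinking phases, tracking the locks held by a single transaction; lock-S and lock-X are modelled as simple method calls, and no actual waiting or conflict detection is implemented.

    # Sketch: a transaction object that enforces the two-phase locking rule.
    class TwoPhaseTxn:
        def __init__(self):
            self.locks = set()
            self.shrinking = False     # once True, no new locks may be acquired

        def lock(self, item, mode):            # growing phase: lock-S / lock-X
            if self.shrinking:
                raise RuntimeError("2PL violation: cannot acquire after first unlock")
            self.locks.add((item, mode))

        def unlock(self, item, mode):          # shrinking phase begins here
            self.shrinking = True
            self.locks.discard((item, mode))

    t = TwoPhaseTxn()
    t.lock("A", "X")
    t.lock("B", "S")
    t.unlock("A", "X")
    # t.lock("C", "S")   # would raise: acquiring a lock after unlocking violates 2PL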
Validation-Based (Optimistic) Concurrency Control:
Assumption: T1 comes before T2, so T1 will be committed before T2.
1. Read phase: In this phase, transaction T is read and executed. It reads the values of the various data items and stores them in temporary local variables. It performs all the
of various data items and stores them in temporary local variables. It can perform all the
write operations on temporary variables without an update to the actual database.
2. Validation phase: In this phase, the values held in the temporary variables are validated against the actual data to check whether committing them would violate serializability.
3. Write phase: If the transaction passes validation, the temporary results are written to the database; otherwise, the transaction is rolled back.
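The read/validation/write structure can be sketched as follows. Note that the validation step here simply re-checks that the values read are unchanged; the full textbook validation test compares the read and write sets of overlapping transactions, so this is a simplified assumption for illustration.

    # Simplified sketch of the read / validation / write phases.
    database = {"A": 10, "B": 20}

    def run_optimistic(updates):
        # 1. Read phase: read values and compute new ones in local variables only.
        snapshot = {item: database[item] for item in updates}
        local = {item: fn(snapshot[item]) for item, fn in updates.items()}

        # 2. Validation phase: fail if anything we read has changed meanwhile.
        if any(database[item] != old for item, old in snapshot.items()):
            return False                 # roll back (discard local results)

        # 3. Write phase: copy the temporary results into the database.
        database.update(local)
        return True

    ok = run_optimistic({"A": lambda v: v + 5})
    print(ok, database)   # True {'A': 15, 'B': 20}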
Multi-Version Concurrency Control (MVCC) is a way for databases to handle
multiple users reading and writing data at the same time without getting in each other's way.
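A small sketch of the idea behind MVCC: writers create new versions instead of overwriting, and each reader sees the newest version committed no later than its own timestamp. This is a much-simplified model, assumed for illustration only.

    # Sketch: one data item kept as a list of (commit_timestamp, value) versions.
    versions = [(1, "v1"), (5, "v2"), (9, "v3")]   # illustrative history

    def read(as_of_ts):
        """Return the newest version committed at or before the reader's timestamp."""
        visible = [v for ts, v in versions if ts <= as_of_ts]
        return visible[-1] if visible else None

    def write(commit_ts, value):
        """Writers append a new version instead of overwriting old ones."""
        versions.append((commit_ts, value))

    print(read(6))    # 'v2' - a reader that started at ts=6 never blocks the writer
    write(12, "v4")
    print(read(6))    # still 'v2'; the old snapshot remains consistent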
Types of data distribution in a distributed database:
Replication:
Duplicates Everywhere: Every piece of data is copied and stored in two or more
locations. This means that multiple copies of the same data exist across different
places.
Advantages:
o More Accessible: Having data in multiple places means it's easier to access from
different locations.
o Parallel Processing: Queries can be handled simultaneously because each location
has its own copy of the data.
Drawbacks:
o High Update Frequency: Whenever data changes, updates must be made to every
copy of that data across all locations.
o Overhead: Maintaining multiple copies of data requires additional storage and
resources.
o Complex Concurrency Management: Managing concurrent access to data across
multiple locations becomes more complicated.
Fragmentation:
Breaking Up Data: Data is divided into smaller pieces, called fragments, and each fragment is stored at the site where it is needed. The fragments are designed in such a way that the original data can be reconstructed from them.
No Duplicate Data: Unlike replication, where data is duplicated, fragmentation does
not result in duplicate data. Each piece of data is unique and distributed across
different locations.
Advantage:
o Consistency Not an Issue: Since there are no duplicate copies, there's no need to
worry about keeping multiple copies consistent.
Consideration:
o Reconstruction: While fragmentation doesn't duplicate data, it's important to ensure
that the original data can be reconstructed from its fragments when needed.
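A small sketch of horizontal fragmentation and reconstruction, using an illustrative accounts relation fragmented by branch; the branch names and rows are assumptions for the example.

    # Sketch: horizontal fragmentation of one relation across two sites.
    accounts = [
        {"acct": 101, "branch": "Pune",   "balance": 500},
        {"acct": 102, "branch": "Mumbai", "balance": 800},
        {"acct": 103, "branch": "Pune",   "balance": 300},
    ]

    # Each fragment is a selection on the branch attribute (no duplicated rows).
    fragment_pune   = [r for r in accounts if r["branch"] == "Pune"]
    fragment_mumbai = [r for r in accounts if r["branch"] == "Mumbai"]

    # Reconstruction: the union of the fragments gives back the original relation.
    reconstructed = fragment_pune + fragment_mumbai
    print(len(reconstructed) == len(accounts))   # True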
Copying Data: Data replication means making copies of data from one place (like a
server or a database) to another. This helps in making sure that the same data is
available in multiple locations.
Improving Availability: By having copies of data in different places, it ensures that
the data is available even if one server or location goes down. This improves the
availability of data for users.
Avoiding Inconsistencies: When many users need to access the same data,
replication ensures that everyone gets the same version of the data. This prevents any
confusion or inconsistencies in the data users are working with.
Continuous Updates: Replication involves regularly updating the copies of data so
that they stay consistent with the original data source. This ensures that all copies are
up-to-date and match the latest changes made to the data.
Improved Access: Users can access data from their nearest or most convenient location,
improving efficiency.
Redundancy: Having multiple copies of data adds redundancy, ensuring data availability even
if one copy is lost or inaccessible.
Data Consistency: It ensures that everyone sees the same version of data, avoiding
confusion or conflicts.
Full Replication: In full replication, all data from the primary database is replicated to one
or more secondary databases asynchronously. This means that every change made to the
primary database is eventually propagated to all secondary databases. Full replication ensures
that all replicas have the same complete dataset as the primary database. It provides strong
consistency guarantees but may introduce higher overhead in terms of network bandwidth
and storage space.
No Replication: In this scheme, no data replication occurs. Changes made to the primary
database are not propagated to any secondary databases. This might be suitable for scenarios
where data redundancy or fault tolerance is not a requirement, or where the overhead of
maintaining replicas is not justified.
Partial Replication: Partial replication involves replicating only a subset of the data from
the primary database to secondary databases asynchronously. This subset could be selected
based on criteria such as access frequency, importance, or relevance to specific geographic
regions. Partial replication can help reduce replication overhead and optimize resource usage
while still providing some level of fault tolerance and scalability.
Commit Protocol in DBMS
The concept of the commit protocol was developed in the context of database systems. Commit protocols are algorithms used in distributed systems to ensure that a transaction either completes entirely or not at all. They help maintain data integrity, atomicity, and consistency, and they allow us to build robust, efficient, and reliable systems.
1. One-Phase Commit
It is the simplest commit protocol. In this protocol there is a controlling site, and there are a number of slave sites where the transaction is performed. The steps followed in the one-phase commit protocol are as follows:
Each slave sends a ‘DONE’ message to the controlling site after it has completed its part of the transaction.
After sending the ‘DONE’ message, the slaves wait for a ‘Commit’ or ‘Abort’ message from the controlling site.
After receiving the ‘DONE’ message from all the slaves, the controlling site decides whether to commit or abort. It then sends this decision as a message to every slave.
The slaves perform the operation as instructed by the controlling site and send an acknowledgement back to the controlling site.
2. Two-Phase Commit (2PC):
Phase 1 - Prepare Phase: In the first phase, the coordinator (typically the transaction
manager) sends a prepare request to all participating nodes (participants). Each
participant replies with an acknowledgment indicating whether it is ready to commit
or abort the transaction.
Phase 2 - Commit Phase (or Abort): If all participants vote to commit in Phase 1,
the coordinator sends a commit message to all participants. If any participant votes to
abort or if the coordinator times out waiting for responses, it sends an abort message
to all participants.
Advantages:
o Simple and easy to implement.
o Provides atomicity guarantees for distributed transactions.
Disadvantages:
o Blocking: The coordinator may block if it cannot reach a participant or if a
participant fails.
o Single point of failure: The coordinator can become a single point of failure.
o Blocking locks: Participants may hold locks until the decision is made,
potentially leading to deadlocks or performance issues.
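A minimal sketch of the coordinator's decision logic in the two phases described above; participants are modelled as simple objects returning their votes, and real implementations would need timeouts, logging, and failure handling.

    # Sketch: two-phase commit from the coordinator's point of view.
    def two_phase_commit(participants):
        # Phase 1 - prepare: ask every participant whether it can commit.
        votes = [p.prepare() for p in participants]

        # Phase 2 - commit or abort: commit only if every vote was "yes".
        decision = "commit" if all(votes) else "abort"
        for p in participants:
            p.commit() if decision == "commit" else p.abort()
        return decision

    class Participant:
        def __init__(self, ready): self.ready = ready
        def prepare(self): return self.ready          # vote yes/no
        def commit(self):  print("committed")
        def abort(self):   print("aborted")

    print(two_phase_commit([Participant(True), Participant(True)]))    # commit
    print(two_phase_commit([Participant(True), Participant(False)]))   # abort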
3. Three-Phase Commit (3PC):
Phase 1 - Can Commit Phase: Similar to the prepare phase in 2PC, the coordinator
asks each participant if it can commit the transaction. However, instead of waiting for
an immediate response, it asks participants to promise to commit if all are able.
Phase 2 - Pre-Commit Phase: If all participants agree to commit in Phase 1, the
coordinator sends a pre-commit message to all participants, instructing them to
prepare to commit.
Phase 3 - Commit (or Abort): If all participants are prepared to commit in Phase 2,
the coordinator sends a commit message to all participants. If any participant is not
prepared, the coordinator sends an abort message to all participants.
Advantages:
o Reduces the chance of blocking: The extra phase helps to ensure that
transactions are not blocked indefinitely.
o Reduces the window of inconsistency: The third phase reduces the time during
which a transaction may commit locally but fail globally.
Disadvantages:
o Increased complexity: 3PC is more complex to implement compared to 2PC.
o Higher latency: The additional phase can introduce extra latency into the
commit process.
UNIT – 4
1. Object Structure:
o Refers to the components that make up an object, including attributes
(characteristics) and methods (operations).
o Objects encapsulate data and behavior in a single unit, providing data abstraction.
2. Messages and Methods:
o Messages: Act as communication channels between entities and the outside world.
Read-only messages: Do not change the value of a variable.
Update messages: Modify the value of a variable.
o Methods: Chunks of code executed in response to messages.
Read-only methods: Do not change variable values.
Update methods: Modify variable values.
3. Variables:
o Store an object's data, allowing objects to be distinguished from each other.
4. Object Classes:
o Blueprints for creating objects with similar characteristics and behaviors.
o Instances of a class are objects representing real-world items.
o Classes define attributes, methods, and messages related to objects.
5. Inheritance:
o Allows classes to inherit attributes and methods from other classes.
o Establishes a class hierarchy to illustrate commonalities between classes.
6. Encapsulation:
o Conceals internal details of objects, exposing only necessary parts for interaction.
o Supports the idea of data or information hiding for cleaner and more secure code.
7. Abstract Data Types (ADTs):
o User-defined data types that carry both data and methods.
o Can encapsulate complex data structures and operations, promoting modularity and
reusability.
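To tie the terms above together, here is a small sketch using a hypothetical Account class: it shows a class, encapsulated variables, read-only versus update methods, and inheritance.

    # Sketch: class, encapsulated variables, read-only/update methods, inheritance.
    class Account:
        def __init__(self, owner, balance=0):
            self._owner = owner          # variables are hidden behind methods
            self._balance = balance      # (encapsulation / information hiding)

        def balance(self):               # read-only method: does not change state
            return self._balance

        def deposit(self, amount):       # update method: modifies a variable
            self._balance += amount

    class SavingsAccount(Account):       # inheritance: reuses Account's members
        def add_interest(self, rate):
            self.deposit(self._balance * rate)

    acct = SavingsAccount("Asha", 1000)
    acct.add_interest(0.05)
    print(acct.balance())   # 1050.0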
Object-Relational Data Model in DBMS
The Object-Relational data model refers to a combination of a Relational database model and
an Object-Oriented database model. As a result, it supports classes, objects, inheritance, and
other features found in Object-Oriented models, as well as data types, tabular structures, and
other features found in Relational Data Models.
Association:
Definition:
Association represents a relationship between two or more objects. It describes how objects
are related or connected to each other within the system.
Characteristics:
Aggregation:
Definition:
Aggregation represents a whole-part relationship between objects, where one object (the
whole) is composed of or contains other objects (the parts). It's a specialized form of
association that denotes a stronger relationship between objects.
Characteristics:
1. Ownership: The whole object owns or manages the parts. When the whole object is
deleted, its parts may or may not be deleted depending on the aggregation type.
2. Composition: Stronger form of aggregation where the parts are exclusively owned by
the whole. Parts cannot exist independently outside the context of the whole.
3. Association: Aggregation implies an association between the whole and its parts, but
the relationship is more tightly coupled compared to regular associations.
Example:
In a car manufacturing system, a Car object may aggregate Engine, Wheel, and Body objects.
The Car object is composed of these parts, and it manages their lifecycle. If the Car object is
destroyed, its parts are typically destroyed as well (composition), but if it's a weaker form of
aggregation, the parts might still exist independently.
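The Car example can be sketched as follows (class and attribute names are illustrative). Composition is modelled by creating the parts inside the whole, while the weaker form of aggregation receives externally created parts that keep an independent lifecycle.

    # Sketch: composition vs. (weaker) aggregation for the Car example.
    class Engine:
        pass

    class Wheel:
        pass

    class Car:                               # composition: Car creates and owns its parts
        def __init__(self):
            self.engine = Engine()
            self.wheels = [Wheel() for _ in range(4)]
            # When the Car is destroyed, nothing else references these parts,
            # so they are destroyed with it.

    class CarPool:                           # weaker aggregation: parts exist independently
        def __init__(self, cars):
            self.cars = cars                 # CarPool refers to cars it did not create

    car = Car()
    pool = CarPool([car])
    del pool                                 # the car still exists independently
    print(type(car).__name__)    # Car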
Comparison: the notes also compare the related relationship types (Specialization, Generalization, Aggregation, and Association), with an example of each.
A deadlock is a situation in which a set of processes is blocked because each process is holding a resource while waiting for another resource that is held by some other process.
The compatibility of lock modes is summarised in a lock compatibility matrix, for example:
         S    X
    S    Y    N
    X    N    N
In this matrix:
"Y" indicates compatibility, meaning that two transactions holding locks of these
types can coexist without conflict.
"N" indicates incompatibility, meaning that if one transaction holds a lock of a certain
type, another transaction cannot acquire a conflicting lock type simultaneously.
In the example:
Shared (S) locks are compatible with other Shared (S) locks but not with Exclusive
(X) locks.
Exclusive (X) locks are not compatible with any other locks, including Shared (S)
locks.
This matrix helps in determining when transactions can proceed concurrently without causing
conflicts or deadlocks. It guides the behavior of the concurrency control mechanisms in a
database system, ensuring data consistency while allowing for efficient concurrent access to
the database.
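A small sketch of how the compatibility matrix above can be consulted before granting a lock; the dictionary simply encodes the S/X table shown earlier.

    # Sketch: using the lock compatibility matrix to decide whether to grant a lock.
    COMPATIBLE = {
        ("S", "S"): True,   ("S", "X"): False,
        ("X", "S"): False,  ("X", "X"): False,
    }

    def can_grant(requested_mode, held_modes):
        """Grant the request only if it is compatible with every lock already held."""
        return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)

    print(can_grant("S", ["S"]))        # True  - shared locks coexist
    print(can_grant("X", ["S"]))        # False - the requester must wait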
A join operation combines rows from two or more tables based on a related column between
them. It allows you to retrieve data that spans multiple tables by linking rows with related
information.
Cartesian Product Operation in Relational Algebra
On applying the CARTESIAN PRODUCT to two relations, that is, to two sets of tuples, the operation takes every tuple one by one from the left set (relation) and pairs it with all the tuples in the right set (relation).
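A minimal sketch of both operations over two small illustrative relations: itertools.product gives the Cartesian product, and the join then keeps only the pairs whose linking columns match.

    # Sketch: Cartesian product and a join over two small illustrative relations.
    from itertools import product

    employees   = [(1, "Asha", 10), (2, "Ravi", 20)]     # (emp_id, name, dept_id)
    departments = [(10, "Sales"), (20, "HR")]             # (dept_id, dept_name)

    # Cartesian product: every employee tuple paired with every department tuple.
    cartesian = list(product(employees, departments))
    print(len(cartesian))        # 2 * 2 = 4 pairs

    # Join: keep only the pairs whose dept_id columns match.
    joined = [(e, d) for e, d in cartesian if e[2] == d[0]]
    print(joined)                # [((1, 'Asha', 10), (10, 'Sales')), ((2, 'Ravi', 20), (20, 'HR'))]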