ADBMS Notes


Unit – 1

Query Processing

Query processing in a database management system (DBMS) refers to the series of steps
required to execute a query and retrieve the desired data. It involves translating a high-level
SQL query into an efficient execution plan, optimizing that plan, and then executing it to
obtain the result. The main goal of query processing is to minimize response time and
resource usage.

Advantages of Query Processing:


 Efficiency: Query processing optimizes the retrieval of data, making it faster and
more efficient.
 Concurrency Control: DBMS handles multiple queries simultaneously, ensuring
data integrity and consistency.
 Security: Access control mechanisms in DBMS prevent unauthorized access to data.
 Scalability: DBMS can handle a large volume of queries and data, scaling up as
needed.
 Data Integrity: Queries are processed with transaction management to ensure data
remains accurate and consistent.
 Flexibility: Users can query data in various formats and structures, enhancing
flexibility in data retrieval.
 Optimization: DBMS employs query optimization techniques to improve
performance and resource utilization.
Query processing is the activity of extracting data from the database. It takes several steps to fetch the data from the database. The steps involved are:

1. Parsing and translation
2. Optimization
3. Evaluation

1. Query :

 User queries are initially written in high-level database languages such as SQL.
 These queries need to be translated into expressions that can be used at the physical level of
the file system.
2. Parsing and Translation:

 Before processing, the query undergoes various evaluations and query-optimizing transformations.
 The system needs to translate the query into a language that the system can internally represent and optimize.
 Parser:
o Syntax Checking: The parser checks the query for correct SQL syntax.
o Parse Tree Generation: If the syntax is correct, it generates a parse tree, which
represents the syntactic structure of the query.
 Translator:
o Semantic Checking: Ensures that the query is semantically correct (e.g., checking
table and column names).
o Intermediate Representation: Converts the parse tree into an intermediate
representation, typically a relational-algebra expression.
 Example: The SQL query is converted into a relational-algebra expression like
σ(department='Sales')(employees).

3. Relational-Algebra Expression:

 Description: The relational-algebra expression represents the logical steps needed to execute
the query.
 SQL is suitable for humans to write and understand queries.
 However, SQL is not perfectly suitable for internal representation within the system.
 Relational algebra is better suited for internal representation of queries.
 Example: The relational algebra operation σ (selection) is applied to the employees relation
to filter rows where department = 'Sales'.

4. Optimizer

 Description: The optimizer's goal is to find the most efficient way to execute the
query.
 Logical Optimization: Applies transformations to the relational-algebra expression to find
more efficient equivalent expressions.
 Physical Optimization: Determines the best physical execution plan by choosing algorithms
for each operation and considering different access paths.
 For optimizing a query, the query optimizer needs an estimated cost for each operation, because the overall cost of a plan depends on the memory allocated to the operations, execution costs, and so on.
 It increases the performance of the query by selecting the best possible execution plan.
 It selects the most efficient plan with the help of Statistics About Data (metadata, i.e., data about the data).
5. Statistics About Data:

 Role: Provides information about the database, such as table sizes, data distribution, and
index usage, to help estimate the cost of different execution plans.
 Example: Statistics might indicate that using an index on the department column is faster
than a full table scan.
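
To make the idea of cost estimation concrete, here is a minimal sketch in Python of how an optimizer might compare a full table scan against an index lookup using simple statistics. The cost formulas, table sizes, and selectivity values are made up for illustration; real optimizers use far more detailed models.

# Hypothetical cost model: choose between a full scan and an index lookup
# using simple statistics (row count, selectivity). All numbers are illustrative.

def full_scan_cost(num_rows, cost_per_row=1.0):
    # Every row must be read.
    return num_rows * cost_per_row

def index_lookup_cost(num_rows, selectivity, index_depth=3, cost_per_row=1.0):
    # Traverse the index, then fetch only the matching rows.
    matching_rows = num_rows * selectivity
    return index_depth + matching_rows * cost_per_row

stats = {"num_rows": 100_000, "selectivity_department_sales": 0.02}

scan = full_scan_cost(stats["num_rows"])
index = index_lookup_cost(stats["num_rows"], stats["selectivity_department_sales"])

plan = "index scan on department" if index < scan else "full table scan"
print(f"full scan cost={scan:.0f}, index cost={index:.0f} -> choose {plan}")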

6. Execution Plan:

 An execution plan is a detailed strategy that the database management system (DBMS) uses
to execute a query. It involves choosing the most efficient way to retrieve the requested data
by specifying the operations and the order in which they should be performed.
 The execution plan is typically represented in the form of a query tree.

7. Evaluation:

 After finding the best execution plan, the DBMS starts executing the optimized query and produces the results from the database.
 In this step, DBMS can perform operations on the data. These operations are selecting the
data, inserting something, updating the data, and so on.
 Once everything is completed, DBMS returns the result after the evaluation step.

8. Query Output:

 Description: The final result of the query is generated and returned to the user.
 Example: The output might be a set of rows showing all employees in the Sales department.

# Some ways to improve query performance:

1. Optimize Query Structure: Simplify and streamline queries by removing
unnecessary operations and optimizing joins and filters.
2. Indexing: Create indexes on frequently queried columns to speed up data
retrieval and reduce the need for full-table scans.
3. Table Partitioning: Partition large tables into smaller segments based on
specific criteria to minimize data scanned during queries.
4. Statistics Maintenance: Keep database statistics up-to-date to enable the
query optimizer to make informed decisions.
5. Hardware Scaling: Scale up hardware resources or distribute the database
across multiple servers for improved parallelism.
6. Caching: Implement result caching to store frequently accessed query results
and reduce processing time.
7. Regular Maintenance: Perform routine tasks like index rebuilds, statistics
updates, and data compaction to ensure optimal performance over time.
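
As an illustration of points 2 and 4, the sketch below uses Python's standard sqlite3 module to create an index on a frequently filtered column, refresh statistics with ANALYZE, and ask the optimizer for its plan. The table and column names are invented; other DBMSs expose the same ideas through their own syntax.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.executemany("INSERT INTO employees (name, department) VALUES (?, ?)",
                 [("A", "Sales"), ("B", "HR"), ("C", "Sales")])

# 2. Indexing: create an index on a frequently filtered column.
conn.execute("CREATE INDEX idx_emp_department ON employees(department)")

# 4. Statistics maintenance: refresh optimizer statistics.
conn.execute("ANALYZE")

# Inspect the plan chosen for a filtered query (SQLite-specific syntax).
for row in conn.execute("EXPLAIN QUERY PLAN SELECT * FROM employees WHERE department = 'Sales'"):
    print(row)
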
UNIT – 2

Recovery in Database Management Systems (DBMS) refers to the process of restoring a database to
a consistent and usable state after a system failure or error.

Recovery is crucial in databases for several reasons:

1. Data Integrity: Databases often store critical and sensitive information. In the event
of system failures, software bugs, or human errors, data can become corrupted or lost.
Recovery mechanisms ensure that the database can recover to a consistent and valid
state, preserving data integrity.
2. Transaction Atomicity: Databases maintain the ACID properties (Atomicity,
Consistency, Isolation, Durability) to ensure reliable transactions. Atomicity
guarantees that either all operations within a transaction are completed successfully,
or none are. Recovery mechanisms help maintain atomicity by ensuring that partially
completed transactions are rolled back in the event of a failure.
3. Durability: The durability property of transactions ensures that committed changes
persist even in the face of system failures. Recovery mechanisms, such as logging and
checkpointing, help achieve durability by recording changes to disk and allowing the
system to recover committed changes if a failure occurs.
4. Data Consistency: Databases must remain in a consistent state, adhering to integrity
constraints and business rules. Recovery mechanisms ensure that the database can
recover to a consistent state after a failure, by rolling back incomplete transactions
and applying committed changes.
5. System Reliability: System failures, such as power outages, hardware malfunctions,
or software crashes, can occur unexpectedly. Recovery mechanisms enhance system
reliability by providing mechanisms to recover from these failures, minimizing data
loss and downtime.

Check Points:
 A checkpoint is a mechanism by which all the log records accumulated so far are flushed from the system and permanently stored on the storage disk.
 A checkpoint is like a bookmark. During the execution of transactions, checkpoints are marked, and as each transaction executes, log records are written for its steps.
 When a checkpoint is reached, the updates recorded in the log up to that point are written into the database, and the log up to that point is removed. The log is then filled with the steps of subsequent transactions until the next checkpoint, and so on.
 The checkpoint declares a point before which the DBMS was in a consistent state and all transactions had been committed.

 The recovery system reads the log file backwards, from the end towards the most recent checkpoint; in the example below it reads from T4 back to T1.
 The recovery system maintains two lists: a redo-list and an undo-list.
 A transaction is put on the redo-list if the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit> (the transaction started before the checkpoint). All transactions on the redo-list are redone by replaying their logged updates.
 For example: transactions T2 and T3 have both <Tn, Start> and <Tn, Commit> in the log after the checkpoint, while T1 has only <Tn, Commit> because it started before the checkpoint and committed after it was crossed. Hence T1, T2 and T3 are put on the redo-list.
 A transaction is put on the undo-list if the recovery system sees a log with <Tn, Start> but no commit or abort record. All transactions on the undo-list are undone and their log records are removed.
 For example: transaction T4 has only <Tn, Start>, so T4 is put on the undo-list, since it was not yet complete and failed in the middle.
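
A minimal sketch of this classification in Python; the log format and transaction names are illustrative, not taken from any particular DBMS:

# Each log record is (transaction, event); the list below mirrors the example above.
log = [
    ("T1", "start"),
    ("CHECKPOINT", ""),        # T1 started before the checkpoint
    ("T1", "commit"),
    ("T2", "start"), ("T2", "commit"),
    ("T3", "start"), ("T3", "commit"),
    ("T4", "start"),           # no commit/abort: T4 was interrupted by the failure
]

redo_list, undo_list = [], []
for txn, event in log:
    if txn == "CHECKPOINT":
        continue
    if event == "start" and txn not in undo_list:
        undo_list.append(txn)             # tentatively undo until a commit is seen
    elif event == "commit":
        if txn in undo_list:
            undo_list.remove(txn)
        redo_list.append(txn)             # committed transactions are redone

print("redo:", redo_list)   # ['T1', 'T2', 'T3']
print("undo:", undo_list)   # ['T4']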

Log-Based Recovery:


Log is a sequence of records, which maintains the records of actions performed by a
transaction. It is important that the logs are written prior to the actual modification and stored
on a stable storage media, which is failsafe.

"Log-based recovery" is a technique used in database systems to ensure that the database
remains in a consistent state even in the event of system failures. It involves maintaining a
log, also known as a transaction log or redo log, which records all the changes made to the
database. This log allows the database management system (DBMS) to recover transactions
that were in progress but not yet completed at the time of a failure.

Here's how log-based recovery typically works:

1. Logging: Whenever a transaction modifies data in the database, the DBMS writes a
record of the modification to the transaction log before making the actual changes to
the database. This log record contains information such as the type of operation
performed (insert, update, delete), the data item affected, and the old and new values
of the data item.
2. Checkpointing: Periodically, the DBMS performs a checkpoint operation where it
writes a record to the log indicating that all transactions that were committed before
the checkpoint have been successfully written to disk. This helps in reducing the time
needed for recovery by limiting the portion of the log that needs to be examined
during the recovery process.
3. Recovery: In the event of a system failure, such as a crash or power outage, the
DBMS uses the transaction log to recover the database to a consistent state. It replays
the logged changes starting from the last checkpoint, applying the changes to the
database in the same order they were originally made. This process ensures that all
committed transactions are successfully re-executed, restoring the database to a
consistent state.

The database can be modified using two approaches −

 Deferred database modification − All logs are written on to the stable storage and
the database is updated when a transaction commits.
 Immediate database modification − Each log follows an actual database
modification. That is, the database is modified immediately after every operation.

Deferred database modification refers to a strategy used in database management systems (DBMS) where changes made by transactions are not immediately applied to the database. Instead, the modifications are temporarily held in memory or in a transaction log until the transaction is ready to commit.

Here's how deferred database modification typically works:

1. Transaction Execution: When a transaction begins executing, any data modifications it performs, such as inserts, updates, or deletes, are applied to a temporary area in memory or recorded in a transaction log.
2. Transaction Completion: Throughout the transaction's execution, its changes are
kept separate from the main database. Other transactions running concurrently might
not be able to see these changes.
3. Commit Point: When the transaction reaches a point where it is ready to be
committed (i.e., all its operations have completed successfully), it signals its intention
to commit.
4. Commitment: At this point, the changes made by the transaction are applied to the
main database, making them visible to other transactions. If the transaction involves
multiple operations, all changes are applied atomically, ensuring that either all
changes are committed or none.
5. Rollback: If the transaction encounters an error or is explicitly rolled back before
reaching the commit point, all the changes made by the transaction are discarded. The
database remains unchanged, and it's as if the transaction never occurred.
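
A toy illustration of the deferred approach in Python; the in-memory "database" and the buffering scheme are invented purely to show the idea of holding writes back until commit:

# Deferred modification: buffer writes, apply them only at commit.
database = {"A": 100, "B": 50}

class DeferredTransaction:
    def __init__(self):
        self.pending = {}                 # changes held back until commit

    def write(self, key, value):
        self.pending[key] = value         # not visible in `database` yet

    def commit(self):
        database.update(self.pending)     # apply all buffered changes together
        self.pending.clear()

    def rollback(self):
        self.pending.clear()              # nothing was applied, so nothing to undo

t = DeferredTransaction()
t.write("A", 70)
t.write("B", 80)
print(database)   # {'A': 100, 'B': 50}  -- unchanged before commit
t.commit()
print(database)   # {'A': 70, 'B': 80}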

Immediate database modification is a strategy used in database management systems (DBMS) where changes made by transactions are immediately applied to the database upon execution of the modification operation. In contrast to deferred database modification, where changes are held temporarily until the transaction commits, immediate modification applies the changes directly to the database.

Here's how immediate database modification typically works:

1. Transaction Execution: When a transaction begins executing, any data modifications it performs, such as inserts, updates, or deletes, are immediately applied to the database. The changes become visible to other transactions as soon as the modification operation is executed.
2. Visibility: Changes made by the transaction are immediately visible to other
transactions running concurrently. This means that other transactions can potentially
see intermediate states of the data modified by the transaction.
3. Commit Point: While immediate modification applies changes immediately,
transactions may still have a commit point. However, this commit point is primarily
for ensuring atomicity and durability rather than for applying changes to the database.
4. Atomicity and Durability: Although changes are applied immediately, the
transaction's commit point ensures that all changes made by the transaction are
committed atomically. Once committed, the changes are durable and persist even in
the event of system failures.
States of Transaction:
A transaction in the context of database management is a logical unit of work that comprises
one or more database operations, such as reading, writing, or modifying data. Transactions
are fundamental to maintaining the consistency, integrity, and reliability of the data within a
database.

1. Active: The initial state of a transaction where it is executing and performing database
operations, such as reading or modifying data.
2. Partially Committed: After a transaction has executed all of its operations and is
ready to commit, but the changes have not been made permanent in the database yet.
This is a transitional state before committing.
3. Committed: In this state, the transaction has successfully completed all of its
operations, and its changes have been made permanent in the database. Once
committed, the changes are durable and cannot be rolled back.
4. Aborted: If a transaction encounters an error or is explicitly rolled back before
committing, it enters the aborted state. In this state, any changes made by the
transaction are undone, restoring the database to its state before the transaction began.
5. Failed: A transaction enters the failed state if it encounters a system failure or an error
that prevents it from completing its operations. In this state, the DBMS may
automatically abort the transaction or initiate recovery procedures to restore the
database to a consistent state.
ACID Properties of a transaction:

In the context of databases, the term "ACID properties" refers to the four key properties of a
transaction. A transaction is a logical unit of work that is performed within a database
management system. Here's an explanation of each of the ACID properties:

1. Atomicity: Atomicity ensures that a transaction is indivisible, or atomic. This means that either all the operations within the transaction are completed successfully, or none of them are. If any part of the transaction fails, the entire transaction is rolled back to its original state, ensuring data consistency. For example, in a bank transfer transaction, if money is being withdrawn from one account and deposited into another, either both operations should succeed or neither should take place.
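
The bank-transfer example can be sketched as follows, using Python with the standard sqlite3 module; the account names and amounts are made up. The two UPDATE statements either both commit or are both rolled back:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(amount):
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,))
        # Simulate a failure between the withdrawal and the deposit:
        if amount > 50:
            raise RuntimeError("crash before deposit")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,))
        conn.commit()          # both updates become permanent together
    except Exception:
        conn.rollback()        # neither update takes effect

transfer(80)   # fails -> rolled back
print(list(conn.execute("SELECT * FROM accounts")))  # [('alice', 100), ('bob', 0)]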

2. Consistency: Consistency ensures that the database remains in a consistent state before and after the transaction. This means that the transaction must adhere to all the integrity constraints, business rules, and data validation rules defined in the database schema. It prevents the database from entering an inconsistent state, ensuring that all data modifications maintain the overall correctness and validity of the database.

3. Isolation: Isolation ensures that the concurrent execution of multiple transactions does not result in interference or inconsistency. Each transaction should appear to execute in isolation from other transactions, even though they may be executing concurrently. This property prevents concurrent transactions from accessing each other's intermediate states, thereby avoiding issues such as dirty reads, non-repeatable reads, and phantom reads.

4. Durability: Durability ensures that once a transaction is committed, its effects persist
even in the event of system failures such as power outages or crashes. Once a
transaction has been successfully committed, the changes made by the transaction are
permanent and cannot be undone. The database system must store the changes
permanently, typically by writing them to non-volatile storage such as disk, to ensure
durability.

Together, these ACID properties provide a set of guarantees that ensure the reliability,
integrity, and consistency of transactions in a database system, even in the presence of
concurrent execution and system failures.

Concurrency:

What is a Schedule?
A schedule is a series of operations from one or more transactions. A schedule can be of two
types:

Serial Schedule: When one transaction executes completely before another transaction starts, the schedule is called a serial schedule. A serial schedule is always consistent. For example, if a schedule S has a debit transaction T1 and a credit transaction T2, the possible serial schedules are T1 followed by T2 (T1 -> T2) or T2 followed by T1 (T2 -> T1). A serial schedule has low throughput and poor resource utilization.

Concurrent Schedule: When the operations of a transaction are interleaved with the operations of other transactions, the schedule is called a concurrent schedule. For example, an interleaved schedule of the debit and credit transactions above is concurrent. Concurrency can, however, lead to inconsistency in the database, as the interleaving below illustrates.
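
For instance, consider a hypothetical interleaving of a debit transaction T1 and a credit transaction T2 on the same balance A, initially 100 (the values are illustrative):

T1: read(A)                -- T1 reads 100
T2: read(A)                -- T2 also reads 100
T1: A := A - 10; write(A)  -- T1 writes 90
T2: A := A + 20; write(A)  -- T2 writes 120, overwriting T1's update

The final balance is 120 instead of the correct 110 (a lost update), so this concurrent schedule is inconsistent, whereas either serial order would give 110.
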
Concurrency Control in DBMS
 Executing a single transaction at a time increases the waiting time of other transactions, which may delay the overall execution. Hence, to increase the overall throughput and efficiency of the system, several transactions are executed concurrently.

 Concurrency control is a very important concept of DBMS which ensures the simultaneous
execution or manipulation of data by several processes or user without resulting in data
inconsistency.

 Concurrency control provides a procedure that is able to control concurrent execution of the
operations in the database.

 The fundamental goal of database concurrency control is to ensure that concurrent execution of transactions does not result in a loss of database consistency. The concept of serializability can be used to achieve this goal, since all serializable schedules preserve consistency of the database. However, not all schedules that preserve consistency of the database are serializable.

 In general it is not possible to perform an automatic analysis of low-level operations by transactions and check their effect on database consistency constraints. However, there are simpler techniques. One is to use the database consistency constraints as the basis for a split of the database into subdatabases on which concurrency can be managed separately.

 Another is to treat some operations besides read and write as fundamental low-level
operations and to extend concurrency control to deal with them.
Lock Based Concurrency Control Protocol in DBMS
In a database management system (DBMS), lock-based concurrency control is used to control the access of multiple transactions to the same data item. This protocol helps to maintain data consistency and integrity across multiple users. In the protocol, transactions acquire locks on data items to control their access and prevent conflicts between concurrent transactions.

1. Shared Lock (S): also known as Read-only lock. As the name suggests it can be
shared between transactions because while holding this lock the transaction does not
have the permission to update data on the data item. S-lock is requested using lock-S
instruction.

2. Exclusive Lock (X): Data item can be both read as well as written. This is Exclusive
and cannot be held simultaneously on the same data item. X-lock is requested using
lock-X instruction.
Implementing a Shared S(a) and Exclusive X(a) lock system without any restrictions gives us the simple lock-based protocol (or binary locking), but it has a disadvantage: it does not guarantee serializability.

To guarantee serializability, we must follow additional protocols concerning the positioning of locking and unlocking operations in every transaction. This is where the concept of Two-Phase Locking (2PL) comes into the picture; 2PL ensures serializability.

Two Phase Locking

A transaction is said to follow the Two-Phase Locking protocol if Locking and Unlocking
can be done in two phases.

 Growing Phase: New locks on data items may be acquired but none can be released.

 Shrinking Phase: Existing locks may be released but no new locks can be acquired.
Assumption (for the example below): T1 begins before T2, and T1 will be committed before T2.
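
A minimal sketch of a lock table with shared/exclusive locks and a two-phase discipline, written in Python. It is deliberately simplified: there is no blocking, no lock upgrading, and no deadlock handling; conflicting requests are simply refused.

class LockManager:
    def __init__(self):
        self.locks = {}   # data item -> (mode, set of transaction ids holding it)

    def acquire(self, txn, item, mode):
        """mode is 'S' (shared) or 'X' (exclusive). Returns True if granted."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        # Only S is compatible with S; every other combination conflicts.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return holders == {txn}    # re-requesting one's own lock is allowed

    def release_all(self, txn):
        """Shrinking phase: release every lock held by txn (no new acquires afterwards)."""
        for item in list(self.locks):
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

lm = LockManager()
# Growing phase of T1: acquire all needed locks before releasing any.
print(lm.acquire("T1", "A", "X"))   # True
print(lm.acquire("T2", "A", "S"))   # False: conflicts with T1's exclusive lock
lm.release_all("T1")                # T1's shrinking phase (at commit, under strict 2PL)
print(lm.acquire("T2", "A", "S"))   # True: granted once T1 has released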

Validation Based Protocol


The validation-based protocol is also known as the optimistic concurrency control technique. In this protocol, a transaction is executed in the following three phases:

1. Read phase: In this phase, the transaction T reads the values of the various data items and stores them in temporary local variables. It performs all its write operations on these temporary variables, without updating the actual database.
2. Validation phase: In this phase, the values in the temporary variables are validated against the actual data to check whether applying them would violate serializability.
3. Write phase: If the transaction passes validation, the temporary results are written to the database; otherwise, the transaction is rolled back.
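
A rough sketch of the validation test in Python: a simplified check that a committing transaction's read set does not overlap the write sets of transactions that committed after it started. Real validators compare the timestamps of all three phases; the data here is invented.

def validate(txn_read_set, txn_start_time, committed):
    """committed: list of (commit_time, write_set) for already-committed transactions."""
    for commit_time, write_set in committed:
        # Conflict: someone committed writes to items we read, after we started,
        # so the values we read may be stale.
        if commit_time > txn_start_time and txn_read_set & write_set:
            return False
    return True

committed = [(5, {"A"}), (12, {"B"})]
print(validate({"A", "C"}, txn_start_time=10, committed=committed))  # True: A was written at time 5 <= 10
print(validate({"B"}, txn_start_time=10, committed=committed))       # False: B was written at time 12 > 10
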
Multi-Version Concurrency Control (MVCC) is a way for databases to handle
multiple users reading and writing data at the same time without getting in each other's way.

Here's how it works:

1. Keeping Track of Changes: Instead of immediately changing data when someone makes a change, MVCC keeps track of what the data looked like before the change and what it looks like after. This means that different users can see different versions of the data depending on when they started working.
2. No Waiting in Line: With MVCC, users don't have to wait for others to finish
making changes before they can read the data. Even if someone is changing the data,
others can still read it without any delay.
3. Everyone Gets a Snapshot: Each user sees a snapshot of the data as it was when they
started working. This way, they can be sure that what they're looking at won't
suddenly change while they're working on it.
4. Cleaning Up Old Versions: To keep things tidy, MVCC periodically cleans up old
versions of the data that nobody needs anymore. This helps keep the database running
smoothly.
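
A toy sketch of version chains and snapshot reads in Python; the bookkeeping is greatly simplified compared to a real MVCC engine, and the item names and timestamps are invented:

import itertools

timestamps = itertools.count(1)
versions = {"X": [(0, "initial")]}     # item -> list of (commit_timestamp, value)

def write(item, value):
    ts = next(timestamps)
    versions.setdefault(item, []).append((ts, value))   # keep old versions, append a new one
    return ts

def read(item, snapshot_ts):
    # Return the newest version committed at or before the reader's snapshot.
    visible = [(ts, v) for ts, v in versions[item] if ts <= snapshot_ts]
    return max(visible)[1]

snapshot = 0                 # a reader starts, taking a snapshot of the database
write("X", "updated")        # a concurrent writer creates a new version (timestamp 1)
print(read("X", snapshot))   # 'initial'  -- the reader still sees its snapshot
print(read("X", 1))          # 'updated'  -- a later reader sees the new version
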
UNIT – 3

Distributed Database System in DBMS


A distributed database is essentially a database that is dispersed across numerous sites, i.e., on
various computers or over a network of computers, and is not restricted to a single system. A
distributed database system is spread across several locations with distinct physical
components. This can be necessary when different people from all over the world need to
access a certain database. It must be handled such that, to users, it seems to be a single
database.

Types:

1. Homogeneous Database: A homogeneous database stores data uniformly across all locations. All sites utilize the same operating system, database management system, and data structures. They are therefore simple to handle.

2. Heterogeneous Database: With a heterogeneous distributed database, different locations may employ different software and schemas, which may cause issues with queries and transactions. Moreover, one site may not even be aware of the existence of the other sites. Different machines may use different operating systems and database applications, and they may even employ separate database data models. Translations are therefore necessary for communication across the various sites.
With distributed data storage, data may be stored across several sites in two ways:

Replication:

 Duplicates Everywhere: Every piece of data is copied and stored in two or more
locations. This means that multiple copies of the same data exist across different
places.
 Advantages:
o More Accessible: Having data in multiple places means it's easier to access from
different locations.
o Parallel Processing: Queries can be handled simultaneously because each location
has its own copy of the data.
 Drawbacks:
o High Update Frequency: Whenever data changes, updates must be made to every
copy of that data across all locations.
o Overhead: Maintaining multiple copies of data requires additional storage and
resources.
o Complex Concurrency Management: Managing concurrent access to data across
multiple locations becomes more complicated.

Fragmentation:

 Breaking Up Data: Data is divided into smaller pieces, called fragments, and each
piece is stored in multiple locations where it's needed. These fragments are designed
in a way that allows the original data to be reconstructed.
 No Duplicate Data: Unlike replication, where data is duplicated, fragmentation does
not result in duplicate data. Each piece of data is unique and distributed across
different locations.
 Advantage:
o Consistency Not an Issue: Since there are no duplicate copies, there's no need to
worry about keeping multiple copies consistent.
 Consideration:
o Reconstruction: While fragmentation doesn't duplicate data, it's important to ensure
that the original data can be reconstructed from its fragments when needed.

Fragmentation and Its Types

Fragmentation in database management refers to the division of a database table into smaller, more manageable pieces. This division helps in optimizing performance, managing storage efficiently, and enhancing security. Fragmentation can occur in several ways:
 Horizontal Fragmentation: Horizontal fragmentation refers to the process of
dividing a table horizontally by assigning each row (or a group of rows) of relation to
one or more fragments. These fragments can then be assigned to different sites in the
distributed system. Some of the rows or tuples of the table are placed in one system
and the rest are placed in other systems. The rows that belong to the horizontal
fragments are specified by a condition on one or more attributes of the relation.
 Vertical Fragmentation: Divides a table into subsets of columns. Each subset contains a portion of the table's columns but maintains the same set of rows. This type of fragmentation is useful for partitioning data based on the frequency of access to columns or for grouping related columns together. The fragmentation should be done in such a manner that the original table can be rebuilt from the fragments by taking the natural JOIN operation; to make this possible, each fragment must include a special attribute called a Tuple-id (any super key can serve this purpose), through which the tuples or rows can be linked back together.
 Hybrid Fragmentation: Combines both horizontal and vertical fragmentation
techniques. It allows for more flexible partitioning strategies by partitioning both rows
and columns simultaneously. This type of fragmentation is useful for complex
partitioning requirements where neither horizontal nor vertical fragmentation alone is
sufficient.
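
A small sketch of horizontal and vertical fragmentation on an invented Employee relation, using Python lists of dictionaries to stand in for relations (hybrid fragmentation would simply apply both steps in sequence):

employees = [
    {"tuple_id": 1, "name": "Asha",  "dept": "Sales", "salary": 50000},
    {"tuple_id": 2, "name": "Ravi",  "dept": "HR",    "salary": 45000},
    {"tuple_id": 3, "name": "Meena", "dept": "Sales", "salary": 60000},
]

# Horizontal fragmentation: rows are split by a condition on an attribute.
site_sales = [r for r in employees if r["dept"] == "Sales"]
site_other = [r for r in employees if r["dept"] != "Sales"]

# Vertical fragmentation: columns are split, each fragment keeping the Tuple-id
# so the original rows can be rebuilt with a natural join on tuple_id.
frag_personal = [{"tuple_id": r["tuple_id"], "name": r["name"]} for r in employees]
frag_payroll  = [{"tuple_id": r["tuple_id"], "salary": r["salary"]} for r in employees]

# Rebuilding the original rows by joining the vertical fragments on tuple_id:
rebuilt = [{**p, **s} for p in frag_personal for s in frag_payroll
           if p["tuple_id"] == s["tuple_id"]]
print(rebuilt[0])   # {'tuple_id': 1, 'name': 'Asha', 'salary': 50000}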

Data Replication in DBMS


Data Replication is the process of storing data in more than one site or node. It is useful
in improving the availability of data. It is simply copying data from a database from one
server to another server so that all the users can share the same data without any
inconsistency. The result is a distributed database in which users can access data relevant to
their tasks without interfering with the work of others. Data replication encompasses the
duplication of transactions on an ongoing basis so that the replicate is in a consistently
updated state and synchronized with the source.

What is Data Replication?

 Copying Data: Data replication means making copies of data from one place (like a
server or a database) to another. This helps in making sure that the same data is
available in multiple locations.
 Improving Availability: By having copies of data in different places, it ensures that
the data is available even if one server or location goes down. This improves the
availability of data for users.
 Avoiding Inconsistencies: When many users need to access the same data,
replication ensures that everyone gets the same version of the data. This prevents any
confusion or inconsistencies in the data users are working with.
 Continuous Updates: Replication involves regularly updating the copies of data so
that they stay consistent with the original data source. This ensures that all copies are
up-to-date and match the latest changes made to the data.

Why Do We Use Data Replication?

 Improved Access: Users can access data from their nearest or most convenient location,
improving efficiency.
 Redundancy: Having multiple copies of data adds redundancy, ensuring data availability even
if one copy is lost or inaccessible.
 Data Consistency: It ensures that everyone sees the same version of data, avoiding
confusion or conflicts.
Types of Data Replication:

 Full Replication: In full replication, all data from the primary database is replicated to one
or more secondary databases asynchronously. This means that every change made to the
primary database is eventually propagated to all secondary databases. Full replication ensures
that all replicas have the same complete dataset as the primary database. It provides strong
consistency guarantees but may introduce higher overhead in terms of network bandwidth
and storage space.

 No Replication: In this scheme, no data replication occurs. Changes made to the primary
database are not propagated to any secondary databases. This might be suitable for scenarios
where data redundancy or fault tolerance is not a requirement, or where the overhead of
maintaining replicas is not justified.

 Partial Replication: Partial replication involves replicating only a subset of the data from
the primary database to secondary databases asynchronously. This subset could be selected
based on criteria such as access frequency, importance, or relevance to specific geographic
regions. Partial replication can help reduce replication overhead and optimize resource usage
while still providing some level of fault tolerance and scalability.
Commit Protocol in DBMS

The concept of the Commit Protocol was developed in the context of database systems.
Commit protocols are defined as algorithms that are used in distributed systems to ensure that
the transaction is completed entirely or not. It helps us to maintain data integrity, Atomicity,
and consistency of the data. It helps us to create robust, efficient and reliable systems.

1. One-Phase Commit

It is the simplest commit protocol. In this commit protocol, there is a controlling site, and
there are a variety of slave sites where the transaction is performed. The steps followed in the
one-phase commit protocol are following: –

 Each slave sends a ‘DONE’ message to the controlling site after each slave has completed its
transaction.

 After sending the ‘DONE’ message, the slaves start waiting for a ‘Commit’ or ‘Abort’ response
from the controlling site.

 After receiving the ‘DONE’ message from all the slaves, the controlling site decides whether to commit or abort, and then sends that decision as a message to every slave.

 Then after the slave performs the operation as instructed by the controlling site, they send
an acknowledgement to the controlling site.
2. Two-Phase Commit (2PC):

 Phase 1 - Prepare Phase: In the first phase, the coordinator (typically the transaction
manager) sends a prepare request to all participating nodes (participants). Each
participant replies with an acknowledgment indicating whether it is ready to commit
or abort the transaction.
 Phase 2 - Commit Phase (or Abort): If all participants vote to commit in Phase 1,
the coordinator sends a commit message to all participants. If any participant votes to
abort or if the coordinator times out waiting for responses, it sends an abort message
to all participants.
 Advantages:
o Simple and easy to implement.
o Provides atomicity guarantees for distributed transactions.
 Disadvantages:
o Blocking: The coordinator may block if it cannot reach a participant or if a
participant fails.
o Single point of failure: The coordinator can become a single point of failure.
o Blocking locks: Participants may hold locks until the decision is made,
potentially leading to deadlocks or performance issues.
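
A minimal sketch of the two-phase commit message flow in Python; the participants are simulated in-process, and the timeouts, logging, and failure recovery that a real 2PC implementation needs are omitted:

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit

    def prepare(self):
        # Phase 1: vote after making the transaction's changes durable locally.
        return "READY" if self.can_commit else "ABORT"

    def finish(self, decision):
        print(f"{self.name}: {decision}")

def two_phase_commit(participants):
    # Phase 1 - prepare: the coordinator collects votes from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 - commit only if everyone voted READY, otherwise abort.
    decision = "COMMIT" if all(v == "READY" for v in votes) else "ABORT"
    for p in participants:
        p.finish(decision)
    return decision

two_phase_commit([Participant("site1"), Participant("site2")])                    # COMMIT
two_phase_commit([Participant("site1"), Participant("site2", can_commit=False)])  # ABORT
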
3. Three-Phase Commit (3PC):

 Phase 1 - Can Commit Phase: Similar to the prepare phase in 2PC, the coordinator
asks each participant if it can commit the transaction. However, instead of waiting for
an immediate response, it asks participants to promise to commit if all are able.
 Phase 2 - Pre-Commit Phase: If all participants agree to commit in Phase 1, the
coordinator sends a pre-commit message to all participants, instructing them to
prepare to commit.
 Phase 3 - Commit (or Abort): If all participants are prepared to commit in Phase 2,
the coordinator sends a commit message to all participants. If any participant is not
prepared, the coordinator sends an abort message to all participants.
 Advantages:
o Reduces the chance of blocking: The extra phase helps to ensure that
transactions are not blocked indefinitely.
o Reduces the window of inconsistency: The third phase reduces the time during
which a transaction may commit locally but fail globally.
 Disadvantages:
o Increased complexity: 3PC is more complex to implement compared to 2PC.
o Higher latency: The additional phase can introduce extra latency into the
commit process.
UNIT – 4

ODBMS, an abbreviation for object-oriented database management system, is a data model in which data is stored in the form of objects, which are instances of classes. These classes and objects together make up an object-oriented data model.

Object-Oriented Data Model Elements:

1. Object Structure:
o Refers to the components that make up an object, including attributes
(characteristics) and methods (operations).
o Objects encapsulate data and behavior in a single unit, providing data abstraction.
2. Messages and Methods:
o Messages: Act as communication channels between entities and the outside world.
 Read-only messages: Do not change the value of a variable.
 Update messages: Modify the value of a variable.
o Methods: Chunks of code executed in response to messages.
 Read-only methods: Do not change variable values.
 Update methods: Modify variable values.
3. Variables:
o Store an object's data, allowing objects to be distinguished from each other.
4. Object Classes:
o Blueprints for creating objects with similar characteristics and behaviors.
o Instances of a class are objects representing real-world items.
o Classes define attributes, methods, and messages related to objects.
5. Inheritance:
o Allows classes to inherit attributes and methods from other classes.
o Establishes a class hierarchy to illustrate commonalities between classes.
6. Encapsulation:
o Conceals internal details of objects, exposing only necessary parts for interaction.
o Supports the idea of data or information hiding for cleaner and more secure code.
7. Abstract Data Types (ADTs):
o User-defined data types that carry both data and methods.
o Can encapsulate complex data structures and operations, promoting modularity and
reusability.
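
The elements above map naturally onto classes in any object-oriented language. A short sketch in Python follows; the Person/Student classes and their attributes are invented for illustration:

class Person:
    def __init__(self, name, email):
        self._name = name          # variables: the object's state (kept internal)
        self._email = email

    # A read-only method: responds to a message without changing state.
    def contact(self):
        return f"{self._name} <{self._email}>"

    # An update method: responds to a message by modifying a variable.
    def change_email(self, new_email):
        self._email = new_email

class Student(Person):             # inheritance: Student reuses Person's attributes and methods
    def __init__(self, name, email, roll_no):
        super().__init__(name, email)
        self.roll_no = roll_no

s = Student("Asha", "asha@example.com", 42)   # an object: an instance of a class
s.change_email("asha@univ.edu")               # sending an update message
print(s.contact())                            # Asha <asha@univ.edu>
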
Object-Relational Data Model in DBMS
The Object-Relational data model refers to a combination of a Relational database model and
an Object-Oriented database model. As a result, it supports classes, objects, inheritance, and
other features found in Object-Oriented models, as well as data types, tabular structures, and
other features found in Relational Data Models.

 Relational Foundation: ORDBMS is built upon the foundation of relational database systems. It retains the traditional tabular structure of relational databases, consisting of tables with rows and columns.
 Object Extensions: ORDBMS extends the relational model by incorporating object-
oriented features such as user-defined data types, methods, and inheritance. This
allows for more complex data modeling capabilities compared to traditional relational
databases.
 User-Defined Data Types (UDTs): ORDBMS allows users to define custom data
types, which can encapsulate both data and behavior. These user-defined data types
can be used as attributes within tables, providing a way to model complex data
structures more naturally.
 Methods and Functions: ORDBMS supports the association of methods or functions
with user-defined data types. These methods can perform operations on the data
encapsulated within the type, adding behavior to the data.
 Inheritance: ORDBMS supports inheritance relationships between user-defined data
types. This means that a derived type can inherit attributes and methods from a base
type, facilitating code reuse and promoting a hierarchical data model.
 Integration with SQL: ORDBMS typically provides SQL (Structured Query
Language) extensions to support object-oriented features. These extensions allow
users to define and manipulate user-defined data types, methods, and inheritance
within SQL queries.
 Complex Data Modeling: ORDBMS offers more flexibility in data modeling
compared to traditional relational databases. It allows for the representation of
complex relationships, polymorphism, and encapsulation of behavior within the
database schema.
 Scalability and Performance: ORDBMS aims to maintain the scalability and
performance benefits of relational databases while providing additional features for
complex data modeling. However, the performance of ORDBMS may vary depending
on the implementation and specific use cases.

Objects, Object Identity, Equality, and Object Reference

1. Objects: The fundamental units in object-oriented databases, encapsulating data (attributes) and behavior (methods). Each object belongs to a class.
2. Object Identity (OID): A unique identifier assigned to each object. It distinguishes
objects independently of their attribute values.
3. Equality: Objects can be compared based on their OID (identity equality) or their
state (state equality).
o Identity Equality: Two objects are identical if they have the same OID.
o State Equality: Two objects are equal if they have the same attribute values.
4. Object Reference: Objects can reference other objects through their OID, enabling
complex relationships and structures.
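
Identity versus state equality can be illustrated in a few lines of Python; here the built-in id() merely stands in for an OID, as an analogy:

class Account:
    def __init__(self, owner, balance):
        self.owner, self.balance = owner, balance

    def __eq__(self, other):                  # state equality: same attribute values
        return (self.owner, self.balance) == (other.owner, other.balance)

a = Account("Asha", 100)
b = Account("Asha", 100)
c = a                                         # c is a reference to the same object as a

print(a is b, a == b)   # False True  -- different identity (OID), equal state
print(a is c, a == c)   # True  True  -- same identity, trivially equal state
print(id(a) == id(c))   # True        -- id() plays the role of the OID here
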
In object-oriented databases (OODBs), aggregation and association are two fundamental concepts that describe relationships between objects. Let's break down each concept:

Association:

Definition:
Association represents a relationship between two or more objects. It describes how objects
are related or connected to each other within the system.

Characteristics:

1. Multiplicity: Specifies the number of objects participating in the association. For example, a one-to-one, one-to-many, or many-to-many relationship.
2. Navigation: Determines how objects navigate or interact with each other. This
includes accessing related objects through references or pointers.
3. Directionality: Defines the direction of the relationship, such as uni-directional (one-
way) or bi-directional (two-way) association.
Example:
Consider a library system where Book objects are associated with Author objects. Each book
may have one or more authors, creating a one-to-many association. This association allows
books to be linked to their respective authors, enabling navigation from a book to its authors
and vice versa.

Aggregation:

Definition:
Aggregation represents a whole-part relationship between objects, where one object (the
whole) is composed of or contains other objects (the parts). It's a specialized form of
association that denotes a stronger relationship between objects.

Characteristics:

1. Ownership: The whole object owns or manages the parts. When the whole object is
deleted, its parts may or may not be deleted depending on the aggregation type.
2. Composition: Stronger form of aggregation where the parts are exclusively owned by
the whole. Parts cannot exist independently outside the context of the whole.
3. Association: Aggregation implies an association between the whole and its parts, but
the relationship is more tightly coupled compared to regular associations.

Example:
In a car manufacturing system, a Car object may aggregate Engine, Wheel, and Body objects.
The Car object is composed of these parts, and it manages their lifecycle. If the Car object is
destroyed, its parts are typically destroyed as well (composition), but if it's a weaker form of
aggregation, the parts might still exist independently.
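
The car example can be sketched as follows in Python. The classes are invented; composition is shown by having Car create and own its Engine, while the Car-to-Driver link is a plain association between independently existing objects:

class Engine:
    def __init__(self, power):
        self.power = power

class Driver:
    def __init__(self, name):
        self.name = name

class Car:
    def __init__(self, model, power):
        self.model = model
        self.engine = Engine(power)   # composition: the Car creates and owns its Engine,
                                      # so the part's lifetime is tied to the whole
        self.driver = None

    def assign_driver(self, driver):
        self.driver = driver          # association: the Driver exists independently of the Car

d = Driver("Asha")
car = Car("Sedan", 90)
car.assign_driver(d)
print(car.engine.power, car.driver.name)   # 90 Asha
del car                                    # the Engine is discarded with the Car; the Driver remains
print(d.name)                              # Asha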

Comparison:

 Association represents a generic relationship between objects, while aggregation represents a more specific relationship where one object is composed of or contains other objects.
 Association describes how objects are related and navigated, while aggregation
emphasizes ownership and composition of parts within a whole.
 Association is often used for loosely coupled relationships, while aggregation is used
for stronger, hierarchical relationships.

Modeling Complex Data Semantics:

1. Complex Data Semantics:


 Definition: Complex data semantics refer to the ability to represent and manipulate
data structures and relationships that are more intricate and sophisticated than simple
values or atomic types.
 Application: OODBs allow for the representation of complex data structures such as
objects, arrays, lists, or nested structures. This includes modeling relationships
between objects through associations, aggregations, and inheritance hierarchies.

Specialization and Generalization:

2. Specialization:

 Definition: Specialization involves creating more specialized subclasses (or subtypes) of existing classes (or supertypes) by refining their attributes or behaviors.
 Application: In OODBs, specialization enables the creation of subclasses with
additional attributes or methods that refine the behavior of the parent class. For
example, a superclass "Animal" could be specialized into subclasses like "Mammal"
and "Bird" with specific attributes and behaviors.

3. Generalization:

 Definition: Generalization abstracts common attributes and behaviors of multiple classes into a higher-level class (or supertype), capturing shared characteristics.
 Application: OODBs use generalization to create abstract classes that define common
properties shared by multiple subclasses. For instance, a superclass "Vehicle" may
generalize common attributes like "make" and "model" shared by subclasses such as
"Car" and "Truck".

Aggregation and Association:

4. Aggregation:

 Definition: Aggregation represents a whole-part relationship between objects, where one object (the whole) contains or is composed of other objects (the parts).
 Application: OODBs utilize aggregation to model hierarchical relationships between
objects. For instance, a "Library" object may aggregate "Book" objects as its parts,
representing the composition relationship between them.

5. Association:

 Definition: Association represents a generic relationship between objects, describing how they are related or connected.
 Application: OODBs use association to establish connections between objects. For
example, a "Student" object may be associated with multiple "Course" objects
through enrollment relationships, enabling navigation between them.

Example:

University Management System:


 Complex Data Semantics: The system models complex relationships between
objects such as students, courses, professors, and departments, capturing intricate data
semantics like enrollment, teaching assignments, and departmental affiliations.
 Specialization: The system may define specialized subclasses like
"UndergraduateStudent" and "GraduateStudent" inheriting from a common "Student"
superclass, with additional attributes or methods specific to each subtype.
 Generalization: Common properties shared by entities like "Student" and "Professor"
may be abstracted into higher-level classes like "Person", capturing shared
characteristics such as name and contact information.
 Aggregation and Association: The system utilizes aggregation to represent
hierarchical structures like courses containing lectures and assignments, and
association to model relationships like students enrolling in courses or professors
teaching courses.
Short Questions:
Relational algebra:

 Formal Framework: Relational algebra provides a formal framework for manipulating relations (tables) in a relational database management system (RDBMS).
 Basic Operations: Includes operations like selection (σ), projection (π), union (∪),
intersection (∩), and difference (-), which operate on relations to retrieve, filter, and
combine data.
 Composite Operations: More complex operations like join (⨝) and division (÷)
combine multiple relations based on common attributes or criteria.
 Optimization: Provides a basis for query optimization techniques to improve the
efficiency of database operations.
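
As a quick worked example (assuming an Employee(name, dept, salary) relation): the query "names of all employees in the Sales department" is written π_name(σ_dept='Sales'(Employee)); a join with a Department(dept, building) relation would be written Employee ⨝ Department, matching tuples on the common dept attribute.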

A deadlock is a situation where a set of processes is blocked because each process is holding a resource while waiting for another resource held by some other process.

Handling of deadlocks in distributed systems is more complex than in centralized systems because the resources, the processes, and other relevant information are scattered across different nodes of the system.

Three commonly used strategies to handle deadlocks are as follows:

 Avoidance: Resources are carefully allocated to avoid deadlocks.


 Prevention: Constraints are imposed on the ways in which processes request
resources in order to prevent deadlocks.
 Detection and recovery: Deadlocks are allowed to occur and a detection algorithm is
used to detect them. After a deadlock is detected, it is resolved by certain means.

A lock compatibility matrix, often used in concurrency control mechanisms, specifies which types of locks are compatible with each other and which types conflict. Here's a simplified example of a lock compatibility matrix:

              | Shared (S) | Exclusive (X)
--------------+------------+--------------
Shared (S)    |     Y      |      N
Exclusive (X) |     N      |      N

In this matrix:
 "Y" indicates compatibility, meaning that two transactions holding locks of these
types can coexist without conflict.
 "N" indicates incompatibility, meaning that if one transaction holds a lock of a certain
type, another transaction cannot acquire a conflicting lock type simultaneously.

In the example:

 Shared (S) locks are compatible with other Shared (S) locks but not with Exclusive
(X) locks.
 Exclusive (X) locks are not compatible with any other locks, including Shared (S)
locks.

This matrix helps in determining when transactions can proceed concurrently without causing
conflicts or deadlocks. It guides the behavior of the concurrency control mechanisms in a
database system, ensuring data consistency while allowing for efficient concurrent access to
the database.
A join operation combines rows from two or more tables based on a related column between
them. It allows you to retrieve data that spans multiple tables by linking rows with related
information.
Cartesian Product Operation in Relational Algebra
On applying the CARTESIAN PRODUCT to two relations, i.e., two sets of tuples, every tuple from the left set (relation) is taken one by one and paired with all the tuples in the right set (relation).
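
For example, with two small invented relations R(A) = {(1), (2)} and S(B) = {(x), (y)}, the Cartesian product R × S contains 2 × 2 = 4 tuples: (1, x), (1, y), (2, x), (2, y). In general, |R × S| = |R| × |S|, and the result carries the attributes of both relations.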
