DB Unit-2

The document provides an overview of distributed databases, detailing their types, architectures, and transaction management. It distinguishes between homogeneous and heterogeneous databases, explains various architectures like client-server and peer-to-peer, and outlines distributed transaction mechanisms including the two-phase commit protocol. Additionally, it discusses concurrency control, recovery methods, and the importance of maintaining ACID properties in distributed transactions.

Uploaded by

Manoj D

UNIT-2

Distributed databases:
A distributed database is a collection of logically related
data that is stored at multiple sites connected by a network
and presented to users as a single database.
In this type of database system, the data is not kept in one
place; it is distributed across various sites of an organization.
Types of Distributed Databases

Homogeneous Distributed Databases:


Here, all the sites use similar DBMS and operating
systems. Its properties are –
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same
vendor.
• Each site is aware of all other sites and cooperates with
other sites to process user requests.
Types of Homogeneous Distributed Database :
There are two types of homogeneous distributed
database –
• Autonomous − Each database is independent and
functions on its own. The databases are integrated by a
controlling application and use message passing to
share data updates.
• Non-autonomous − Data is distributed across the
homogeneous nodes and a central or master DBMS
coordinates data updates across the sites.
Heterogeneous Distributed Databases:
In a heterogeneous distributed database, different sites
have different operating systems, DBMS products and data
models. Its properties are –
• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMS
like relational, network, hierarchical or object
oriented.
• Query processing is complex due to dissimilar
schemas.
• A site may not be aware of other sites and so there
is limited co-operation in processing user requests.
Types of Heterogeneous Distributed Databases
• Federated − The heterogeneous database systems are
independent in nature and integrated together so that
they function as a single database system.
• Un-federated − The database systems employ a central
coordinating module through which the databases are
accessed.
Distributed DBMS Architectures :
DDBMS architectures are generally developed depending
on three parameters –
• Distribution − It states the physical distribution of data
across the different sites.
• Autonomy − It indicates the distribution of control of the
database system and the degree to which each constituent
DBMS can operate independently.
• Heterogeneity − It refers to the uniformity or
dissimilarity of the data models.
Architectural Models :
Some of the common architectural models are −
• Client - Server Architecture for DDBMS
• Peer - to - Peer Architecture for DDBMS
• Multi - DBMS Architecture
Client - Server Architecture for DDBMS :
This is a two-level architecture where the functionality is
divided into servers and clients.
Server functions primarily cover data management,
query processing, optimization and transaction management.
Client functions mainly cover the user interface.
The two different client - server architectures are −
• Single Server Multiple Client
• Multiple Server Multiple Client

Peer- to-Peer Architecture for DDBMS :


In these systems, each peer acts both as a client and a
server. The peers share their resource with other peers and
co-ordinate their activities.
This architecture generally has four levels of schemas –
• Global Conceptual Schema − Describes the global
logical view of data.
• Local Conceptual Schema – Describes the logical
data organization at each site.
• Local Internal Schema – Describes the physical data
organization at each site.
• External Schema – Describes the user view of data.

Multi - DBMS Architectures:


This is an integrated database system formed by a
collection of two or more autonomous database systems.
Multi-DBMS can be expressed through six levels of schemas –
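• Multi-database View Level − Describes multiple user
views, each a subset of the integrated distributed database.
• Multi-database Conceptual Level – Describes the
integrated multi-database through its global logical
structure definitions.
• Multi-database Internal Level − Describes how data is
distributed across the sites and how the multi-database
maps to local data.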
• Local database View Level − Describes public view
of local data.
• Local database Conceptual Level − Describes local
data organization at each site.
• Local database Internal Level − Describes physical
data organization at each site.
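There are two design alternatives for multi-DBMS: a model
with a multi-database conceptual level and a model without
one.
Design Alternatives :
The distribution design alternatives for the tables in a
DDBMS are as follows −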
• Non-replicated and non-fragmented
• Fully replicated
• Partially replicated
• Fragmented
• Mixed
Non-replicated & Non-fragmented :
Different tables are placed at different sites. Data is
placed near to the site where it is used most.
It is most suitable for databases in which the percentage of
queries that join tables placed at different sites is low.
Fully Replicated:
At each site, one copy of all the database tables is
stored, so each site has its own copy of the entire
database.
Data retrieval time is therefore low, but every update must
be applied at every site, so the update cost is high.
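The read/write trade-off of full replication can be sketched in a few lines. This is only a toy model under stated assumptions: the class name FullyReplicatedTable, the site names and the key/value data are hypothetical, and each "site" is just an in-memory dictionary.

```python
class FullyReplicatedTable:
    """Toy model: every site holds a full copy of one key/value table."""

    def __init__(self, site_names):
        self.copies = {site: {} for site in site_names}

    def read(self, site, key):
        # A read is served from the local copy, so only one site is touched.
        return self.copies[site].get(key)

    def write(self, key, value):
        # A write must reach every copy; return how many sites were updated.
        for copy in self.copies.values():
            copy[key] = value
        return len(self.copies)


table = FullyReplicatedTable(["site1", "site2", "site3"])
sites_updated = table.write("emp:1", "Asha")
print(sites_updated, table.read("site3", "emp:1"))
```

A read touches one site while a write touches all of them, which is why retrieval is cheap and updates are expensive in this design.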
Partially Replicated :
Copies of some portions of tables are stored at different
sites. The necessary files, i.e. the frequently accessed ones,
are copied to the different sites.
Fragmented :
A table is divided into two or more pieces referred to as
fragments or partitions, and each fragment can be stored at
different sites.
Here, there is only one copy of each fragment in the
system, i.e. no redundant data.
The three fragmentation techniques are −
1. Vertical fragmentation
2. Horizontal fragmentation
3. Hybrid fragmentation
Mixed Distribution:
Fragmentation + Partial replication = Mixed Distribution.
Here, the tables are initially fragmented in any form
(horizontal or vertical), and then these fragments are
partially replicated across the different sites.
Distributed Data Storage
A distributed database is a database that is not
limited to one system; it is spread over different sites, i.e. on
multiple computers or over a network of computers.
This may be required when a particular database needs
to be accessed by various users globally.
Types of Data Storage in Distributed DB:
There are 2 ways in which data can be stored on different
sites. These are:
1. Replication – The entire relation is stored redundantly
at 2 or more sites. If the entire database is available at all
sites, it is a fully redundant database.
Advantage: Any data can be accessed easily at a nearby site.
Disadvantage: Every update must be applied at all sites.
2. Fragmentation – The relations are fragmented (i.e.,
they’re divided into smaller parts) and each of the fragments
is stored at the sites where it is required.
Fragmentation of relations can be done in two ways:
• Horizontal fragmentation – Splitting by rows – The
relation is fragmented into groups of tuples so that each
tuple is assigned to at least one fragment.
• Vertical fragmentation – Splitting by columns – The
schema of the relation is divided into smaller schemas.
Each fragment must contain a common candidate key so
as to ensure lossless join.
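The two fragmentation styles and their reconstruction can be illustrated with a small sketch. The EMPLOYEE rows, the column names and the helper function names below are assumptions made for illustration only; horizontal fragments are recombined with a union, and vertical fragments with a join on the shared key.

```python
# Hypothetical EMPLOYEE relation; rows and column names are made up.
employees = [
    {"emp_id": 1, "name": "Asha", "city": "Chennai", "salary": 50000},
    {"emp_id": 2, "name": "Ravi", "city": "Mumbai", "salary": 60000},
    {"emp_id": 3, "name": "Meena", "city": "Chennai", "salary": 55000},
]


def horizontal_fragments(rows, attribute):
    """Split by rows: group tuples by the value of one attribute."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments


def vertical_fragment(rows, key, columns):
    """Split by columns: keep the key plus the chosen columns."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def join_on_key(left, right, key):
    """Rebuild the relation by joining two vertical fragments on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


by_city = horizontal_fragments(employees, "city")
personal = vertical_fragment(employees, "emp_id", ["name", "city"])
payroll = vertical_fragment(employees, "emp_id", ["salary"])

# Union of the horizontal fragments, restored to the original row order.
union = sorted((r for frag in by_city.values() for r in frag),
               key=lambda r: r["emp_id"])
rebuilt = join_on_key(personal, payroll, "emp_id")
print(union == employees, rebuilt == employees)
```

Because both vertical fragments carry emp_id, the join restores every original row without loss, which is the reason for the common candidate key requirement.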
Distributed Transactions:
A distributed transaction includes one or more
statements that, individually or as a group, update data on
two or more distinct nodes of a distributed database.
A transaction is a series of object operations that must
be done in an ACID-compliant manner.
1. Atomicity – The transaction is completed entirely
or not at all; it either commits or aborts.
2. Consistency – The transaction takes the database
from one consistent state to another.
3. Isolation – The transaction is carried out as if it were
separate from all other transactions.
4. Durability – Once the transaction has committed, its
changes are permanent.
Transactions – Commands :
• Begin– initiate a new transaction.
• Commit– End a transaction and the changes made
during the transaction are saved. Also, it allows other
transactions to see the modifications you’ve made.
• Abort– End a transaction and all changes made during
the transaction will be undone.
Various roles are allocated to run a transaction successfully :
• Client– The transactions are issued by the clients.
• Coordinator– The execution of the entire transaction is
controlled by it (handles Begin, commit & abort).
• Server – Each transactional server must know its
coordinator, and it registers its participation in a
transaction with that coordinator (in the analogy used here,
the coordinator is the staff and the transactional server is
the college management).
Distributed transactions can be structured in two different
ways:
1. Flat transactions
2. Nested transactions
FLAT TRANSACTIONS:
A flat transaction has a single initiating point(Begin) and
a single end point(Commit or abort).
They are usually very simple and are generally used for
short activities rather than larger ones.
A client makes requests to multiple servers in a flat
transaction.
Limitations of a flat Transaction :
1. All work is lost in the event of a crash.
2. Only one DBMS may be used at a time.
3. No partial rollback is possible.
NESTED TRANSACTIONS :
A transaction that contains other transactions between its
initiating point and its end point is known as a nested
transaction.
The transactions nested inside it are called sub-transactions.
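Unlike a flat transaction, a nested transaction allows part of the work to be undone while the rest is kept. The sketch below approximates sub-transactions with SQL savepoints in a single in-memory SQLite database, which is an assumption made for illustration (a real distributed system would spread the sub-transactions over several servers); the table name log and the savepoint names are hypothetical.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT/SAVEPOINT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (entry TEXT)")

conn.execute("BEGIN")                          # parent transaction starts
conn.execute("INSERT INTO log VALUES ('parent work')")

conn.execute("SAVEPOINT sub1")                 # first sub-transaction
conn.execute("INSERT INTO log VALUES ('sub1 work')")
conn.execute("RELEASE sub1")                   # sub1 succeeds, work is kept

conn.execute("SAVEPOINT sub2")                 # second sub-transaction
conn.execute("INSERT INTO log VALUES ('sub2 work')")
conn.execute("ROLLBACK TO sub2")               # undo only sub2's work
conn.execute("RELEASE sub2")

conn.execute("COMMIT")                         # parent transaction ends
entries = [row[0] for row in conn.execute("SELECT entry FROM log ORDER BY rowid")]
print(entries)
```

Only the work of the second sub-transaction is discarded; the parent's work and the first sub-transaction's work are committed together.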

EXAMPLE of distributed transaction (T):


A customer transfers:
➢ Rs. 105 from account A to account C and
➢ Subsequently, Rs. 205 from account B to account D.
It can be viewed as:
Transaction T : Start
Transfer Rs 105 from A to C :
Deduct Rs 105 from A (withdraw from A) & Add Rs 105 to
C (deposit to C)
Transfer Rs 205 from B to D :
Deduct Rs 205 from B (withdraw from B) & Add Rs 205 to
D (deposit to D)
End
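Transaction T can be sketched as one all-or-nothing unit of work. The sketch assumes, for simplicity, that all four accounts live in a single in-memory SQLite database rather than on separate nodes; the table name account, the function name run_transaction_t and the opening balances are hypothetical.

```python
import sqlite3


def run_transaction_t(conn, transfers):
    """Apply every transfer in one transaction; undo all of them on failure."""
    conn.execute("BEGIN")
    try:
        for source, target, amount in transfers:
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (source,)
            ).fetchone()
            if balance < amount:
                raise ValueError(f"insufficient funds in {source}")
            # Withdraw from the source account, then deposit to the target.
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, source))
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, target))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        return False


conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 500), ("B", 100), ("C", 0), ("D", 0)])

# B holds only Rs 100, so the second transfer fails and the first is undone too.
ok = run_transaction_t(conn, [("A", "C", 105), ("B", "D", 205)])
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)
```

With these assumed balances the second transfer cannot complete, so the deduction from A and the deposit to C are rolled back as well, which is the atomicity property in action.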
Mechanisms in Distributed Transactions:
1. Supported Types of Distributed Transactions
2. Session Trees for Distributed Transactions
3. Two-Phase Commit Mechanism
Supported Types of Distributed Transactions
➢ DML and DDL Transactions
➢ Transaction Control Statements
DML and DDL Transactions
➢ CREATE TABLE AS SELECT
➢ DELETE
➢ INSERT (default and direct load)
➢ LOCK TABLE
➢ SELECT
➢ SELECT FOR UPDATE
Transaction Control Statements
➢ COMMIT
➢ ROLLBACK
➢ SAVEPOINT
Session Trees for Distributed Transactions
A session tree is a hierarchical model of the transaction
that describes the relationships among the nodes.
The node that originates the transaction is the global
coordinator(root node), and the node in charge of initiating a
commit or rollback is called the commit point site(leaf node).
Two-Phase Commit Mechanism:
The two-phase commit mechanism ensures the integrity of
data in a distributed transaction.

Different phases in 2-phase commit:


1) Prepare phase - The initiating node in the transaction asks
the other participating nodes to promise to commit
or roll back the transaction.
2) Commit phase - The initiating node asks all participating
nodes to commit the transaction.
3) Forget phase - The initiating node forgets about the
transaction once the participants have committed.
In-Doubt Transactions: (2marks)
The two-phase commit mechanism ensures that all
nodes either commit or perform a rollback together.
If any of the three phases fails because of a system or
network error, the transaction becomes in-doubt.
Until recovery is done, the affected data is locked for both
reads and writes.

3)Distributed Transaction Management :


It deals with the problems of providing a consistent
distributed database in the presence of a large number of
transactions (local and global) and failures (communication
link and/or site failures).
This is accomplished through
(i) Distributed commit protocols (ensure atomicity
property).
(ii) Distributed concurrency control techniques
(ensure consistency and isolation properties).
(iii) Distributed recovery methods (preserve
consistency and durability when failures occur).
1)Distributed Commit Protocols :
In a local database:
For committing a transaction, the transaction manager
has to only convey the decision to commit to the recovery
manager.
In a distributed database:
The transaction manager must convey the decision to commit
to all the servers at the various sites where the transaction
is being executed and uniformly enforce the decision.
When processing is complete at each site, it reaches the
partially committed transaction state and waits for all
other transactions to reach their partially committed
states.
When it receives the message that all the sites are ready
to commit, it starts to commit. In a distributed system,
either all sites commit or none of them does.

The different distributed commit protocols are −


➢ One-phase commit
➢ Two-phase commit
➢ Three-phase commit

Distributed One-phase Commit


Distributed one-phase commit is the simplest commit
protocol. The steps in distributed commit are –
• After each slave has locally completed its transaction, it
sends a “DONE” message to the controlling site.
• The slaves wait for “Commit” or “Abort” message from
the controlling site. This waiting time is called window of
vulnerability.
• When the controlling site receives “DONE” message
from each slave, it makes a decision to commit or abort.
This is called the commit point. Then, it sends this
message to all the slaves.
• On receiving this message, a slave either commits or
aborts and then sends an acknowledgement message to
the controlling site.

Distributed Two-phase Commit:


Distributed two-phase commit reduces the vulnerability
of one-phase commit protocols. The steps performed in the
two phases are as follows –
Phase 1: Prepare Phase
• After each slave has locally completed its transaction, it
sends a “DONE” message to the controlling site. When
the controlling site has received “DONE” message from
all slaves, it sends a “Prepare” message to the slaves.
• The slaves vote on whether they still want to commit or
not. If a slave wants to commit, it sends a “Ready”
message.
• A slave that does not want to commit sends a “Not
Ready” message. This may happen when the slave has
conflicting concurrent transactions or there is a timeout.
Phase 2: Commit/Abort Phase :
After the controlling site has received “Ready” message from
all the slaves –
• The controlling site sends a “Global Commit” message to
the slaves.
• The slaves apply the transaction and send a “Commit
ACK” message to the controlling site.
• When the controlling site receives “Commit ACK”
message from all the slaves, it considers the transaction
as committed.
After the controlling site has received the first “Not Ready”
message from any slave –
• The controlling site sends a “Global Abort” message to
the slaves.
• The slaves abort the transaction and send an “Abort ACK”
message to the controlling site.
• When the controlling site receives “Abort ACK” message
from all the slaves, it considers the transaction as
aborted.
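The message flow of the two phases can be sketched as follows. This is a simplified, single-process illustration: the class Slave, its can_commit flag and the function two_phase_commit are hypothetical names, messages are modelled as plain method calls, and site failures and timeouts are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    name: str
    can_commit: bool = True     # hypothetical flag standing in for local checks
    state: str = "DONE"         # the local work has already completed

    def on_prepare(self):
        """Phase 1: vote Ready or Not Ready."""
        self.state = "READY" if self.can_commit else "NOT READY"
        return self.state

    def on_decision(self, decision):
        """Phase 2: apply the global decision and acknowledge it."""
        if decision == "GLOBAL COMMIT":
            self.state = "COMMITTED"
            return "COMMIT ACK"
        self.state = "ABORTED"
        return "ABORT ACK"


def two_phase_commit(slaves):
    """Controlling site: collect votes, broadcast the decision, collect ACKs."""
    votes = [s.on_prepare() for s in slaves]
    decision = ("GLOBAL COMMIT" if all(v == "READY" for v in votes)
                else "GLOBAL ABORT")
    acks = [s.on_decision(decision) for s in slaves]
    return decision, acks


all_ready = [Slave("s1"), Slave("s2")]
one_refuses = [Slave("s1"), Slave("s2", can_commit=False)]
print(two_phase_commit(all_ready)[0], two_phase_commit(one_refuses)[0])
```

A single Not Ready vote is enough to abort the transaction at every site, including the sites that voted Ready.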

Distributed Three-phase Commit :


Phase 1: Prepare Phase
The steps are same as in distributed two-phase commit.
Phase 2: Prepare to Commit Phase
• The controlling site issues an “Enter Prepared State”
broadcast message.
• The slave sites vote “OK” in response.
Phase 3: Commit / Abort Phase:
The steps are the same as in two-phase commit except that the
“Commit ACK”/“Abort ACK” message is not required.

2)Distributed DBMS - Controlling Concurrency


Concurrency controlling techniques ensure that multiple
transactions are executed simultaneously while maintaining
the ACID properties of the transactions and serializability in
the schedules.
Locking Based Concurrency Control Protocols
These protocols use the concept of locking data items. A lock
is a variable associated with a data item that determines whether
read/write operations can be performed on that data item or
not.
Locking-based concurrency control systems can use
either one-phase or two-phase locking protocols.
One-phase Locking Protocol
Here, each transaction locks an item before use and
releases the lock as soon as it has finished using it.
Note: One-phase locking provides maximum concurrency but
does not always guarantee serializability.
Two-phase Locking Protocol
Growing phase: The transaction acquires the locks it
needs and does not release any lock.
Shrinking phase: The transaction releases its locks and
cannot request any new locks.
Note: Schedules produced by the two-phase locking protocol
are guaranteed to be serializable.
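The difference between the two protocols comes down to one rule: under two-phase locking, no lock may be acquired after the first lock has been released. A minimal checker for that rule is sketched below; the function name and the (action, item) encoding of a transaction's lock operations are assumptions made for illustration.

```python
def follows_two_phase_locking(operations):
    """Return True if no lock is acquired after the first unlock."""
    shrinking = False
    for action, _item in operations:
        if action == "unlock":
            shrinking = True            # the shrinking phase has begun
        elif action == "lock" and shrinking:
            return False                # a lock in the shrinking phase breaks 2PL
    return True


# One-phase style: each item is released as soon as it has been used.
one_phase = [("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("unlock", "Y")]
# Two-phase style: all locks are acquired before any lock is released.
two_phase = [("lock", "X"), ("lock", "Y"), ("unlock", "X"), ("unlock", "Y")]

print(follows_two_phase_locking(one_phase), follows_two_phase_locking(two_phase))
```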

3)Distributed Recovery Methods :


In order to recover from database failure, database
management systems resort to a number of recovery
management techniques.
The typical strategies for database recovery are −
• Soft failures: for failures that result in inconsistency of
the database, the recovery strategy includes transaction undo
or rollback.
• Hard failures: for failures that result in extensive damage
to the database, the recovery strategies include restoring a
past copy of the database from archival backup.
Other types of recovery methods are:
1) Recovery from Power Failure
2) Recovery from Disk Failure
3) Recovery Using UNDO / REDO
Checkpointing (2mark):
A checkpoint is a point in time at which records are written
from the buffers onto the database.
If the system crashes, the recovery manager does not have to
redo the transactions that were committed before the
checkpoint.
The two types of checkpointing techniques are −
• Consistent checkpointing
• Fuzzy checkpointing

Consistent Checkpointing :
It creates a consistent image of the database at
checkpoint.
During recovery, only those transactions that come after
the last checkpoint (to its right on a timeline) are undone
or redone.
The transactions before the last consistent checkpoint
(to its left) are already committed and needn’t be
processed again.
Fuzzy Checkpointing :
In fuzzy checkpointing, at the time of checkpoint, all the
active transactions are written in the log.
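How a checkpoint shortens recovery can be sketched with a toy log. The record layout below, with ("start", T), ("commit", T) and ("checkpoint", [active transactions]) entries, and the function name plan_recovery are assumptions made for illustration; real recovery managers also log the individual data changes.

```python
def plan_recovery(log):
    """Return (redo, undo) transaction sets based on the last checkpoint."""
    checkpoints = [i for i, rec in enumerate(log) if rec[0] == "checkpoint"]
    if checkpoints:
        last = checkpoints[-1]
        active = set(log[last][1])      # transactions active at the checkpoint
        tail = log[last + 1:]           # only records after the checkpoint matter
    else:
        active, tail = set(), log
    redo = set()
    for kind, txn in tail:
        if kind == "start":
            active.add(txn)
        elif kind == "commit":
            active.discard(txn)
            redo.add(txn)               # committed after the checkpoint: redo
    return redo, active                 # still active at the crash: undo


log = [
    ("start", "T1"), ("commit", "T1"),
    ("start", "T2"),
    ("checkpoint", ["T2"]),
    ("start", "T3"), ("commit", "T2"),
    ("start", "T4"), ("commit", "T3"),
]   # the crash happens here
redo, undo = plan_recovery(log)
print(sorted(redo), sorted(undo))
```

T1 committed before the checkpoint and is ignored, T2 and T3 committed after it and are redone, and T4 never committed and is undone.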
5)Event-Condition-Action rule(ECA Rule):
The ECA rule approach was developed to support the need
to react to different kinds of events occurring in active
databases.
There are three components in an ECA rule:
•Event: Specifies the event that triggers the invocation of the
rule.
•Condition: Consists of the conditions that need to be
satisfied in order to carry out the specified action.
•Action: Specifies the actions to be taken on the data.

Eg : ATM process
Event – The ATM card is inserted.
Condition – The password entered by the user matches the
stored password.
Action – If the password matches, carry out the transaction.
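The ATM example can be written down as a small rule object. This is only a sketch: the class EcaRule, the function dispatch, the event name card_inserted and the context keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EcaRule:
    event: str                              # what triggers the rule
    condition: Callable[[dict], bool]       # must hold for the action to run
    action: Callable[[dict], str]           # what to do with the data


def dispatch(rules, event, context):
    """Run the action of every rule whose event matches and condition holds."""
    return [rule.action(context) for rule in rules
            if rule.event == event and rule.condition(context)]


atm_rule = EcaRule(
    event="card_inserted",
    condition=lambda ctx: ctx["entered_pin"] == ctx["stored_pin"],
    action=lambda ctx: f"start transaction for {ctx['account']}",
)

good = {"account": "ACC-1", "entered_pin": "1234", "stored_pin": "1234"}
bad = {"account": "ACC-1", "entered_pin": "0000", "stored_pin": "1234"}
print(dispatch([atm_rule], "card_inserted", good))
print(dispatch([atm_rule], "card_inserted", bad))
```

The action runs only when both the event and the condition match, so a wrong password produces no action at all.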

Active Databases:
An active database is a database that has an associated set
of triggers. Before executing a statement that modifies the
database, the DBMS checks whether that statement activates
any trigger.
If a trigger is activated, the DBMS evaluates its condition
part and executes the action part only if the condition
evaluates to true.

Features of Active Database:


1. It possesses all the concepts of a conventional database
i.e. data modelling facilities, query language etc.
2. It supports all the functions of a traditional database
like data definition, data manipulation, storage management
etc.
TRIGGER:
A trigger is a procedure which is automatically invoked
by the DBMS in response to changes to the database, and is
specified by the database administrator (DBA).
Parts of trigger:
• Event − A change made to the database that activates the
trigger.
• Condition − A query or test that is run when the trigger is
activated.
• Action − A procedure that is executed when the trigger
is activated and its condition is true.
Use of trigger:
• To implement any complex business rule.
• Triggers can be used for auditing, for example
to keep track of changes made to a table.
• Triggers can perform an automatic action when
another related action takes place.
Types of triggers:
1)Statement-level trigger − It is fired only once for a DML
statement, irrespective of the number of rows affected.
2)Before-triggers − The trigger is fired before a
command like INSERT, DELETE, or UPDATE is executed.
3)After-triggers − The trigger is fired after the triggering
action is completed.
4)Row-level triggers − The trigger is fired once for each row
that is affected by the DML command.
Create database trigger
CREATE TRIGGER triggername
{BEFORE|AFTER} {DELETE|INSERT|UPDATE} ON table
[FOR EACH ROW]
BEGIN
SQL Statements…..
END
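A concrete instance of this syntax is sketched below as an audit trigger, run through Python's built-in sqlite3 module. SQLite's dialect is an assumption here (other DBMS products differ in detail, and SQLite supports only row-level triggers); the table names account and audit and the trigger name log_balance_change are hypothetical. The event is the UPDATE, the condition is the WHEN clause, and the action is the INSERT into the audit table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER);
CREATE TABLE audit (name TEXT, old_balance INTEGER, new_balance INTEGER);

-- Event: UPDATE on account. Condition: the balance really changed.
-- Action: record the old and new balance in the audit table.
CREATE TRIGGER log_balance_change
AFTER UPDATE ON account
FOR EACH ROW
WHEN OLD.balance <> NEW.balance
BEGIN
    INSERT INTO audit VALUES (OLD.name, OLD.balance, NEW.balance);
END;

INSERT INTO account VALUES ('A', 500);
UPDATE account SET balance = 395 WHERE name = 'A';  -- fires the action
UPDATE account SET balance = 395 WHERE name = 'A';  -- condition is false
""")
audit_rows = conn.execute("SELECT * FROM audit").fetchall()
print(audit_rows)
```

The second UPDATE still raises the event, but because the balance does not change the condition is false and no audit row is written.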

6)ODBC:
ODBC stands for Open Database Connectivity. It is an
open standard Application Programming Interface(API).
It is an important technology that applications use to access
and manage databases.
ODBC allows programs to use SQL requests that access
databases without knowing the proprietary interfaces to the
databases.
ODBC consists of four components, working together to
enable functions:
• Application. The end-user program that calls the ODBC
functions and submits the SQL statements.
• Driver manager. Loads the correct driver for each
application and database.
• Driver. Handles ODBC function calls, and then submits
each SQL request to the data source.
• Data source. The database being accessed and
its database management system.
ODBC Architecture:

ODBC Enabled Application


This is any ODBC compliant application, such as
Microsoft Excel, Tableau, Crystal Reports, Microsoft Power BI,
or a similar application (spreadsheet, word processor, data
access and retrieval tool, etc.).
ODBC Driver Manager
The ODBC Driver Manager loads and unloads ODBC
drivers on behalf of an application.
The Windows platform comes with a default Driver
Manager, while non-Windows platforms can
use an open source ODBC Driver Manager such as unixODBC
or iODBC.
