
Alfred Advanced DTB

The document contains solutions to three questions regarding distributed database systems. Q1 is summarized as discussing how the top three layers of a reference architecture for distributed databases are site-independent and define logical relationships rather than physical data distribution. Q2 asks about dependency types in distributed database reference architectures and the role of local mapping schemas in integrating heterogeneous multi-site databases. Q3 differentiates between transaction recovery, which undoes faulty transactions, and crash recovery, which recovers a database after a failure and brings it to a consistent state. Transaction and transaction coordinator responsibilities are also defined.

Uploaded by

Mr Tuchel

NAME: ALFRED ADEDE

REG : 19/02741

ADVANCED DATABASE

LEC. MR ISAAC OKOLA

SOLUTION

Q1. Why are the top three layers in the reference architecture for distributed database systems
often referred to as site-independent schemas? Define the physical image of a global relation at a site.

The top three layers of the reference architecture (the global schema, the fragmentation schema, and
the allocation schema) are called site-independent because they describe the database logically and
do not depend on the data model of the local DBMS at any site. The global conceptual schema describes
the database as if there were no data distribution: it specifies the entities, relationships, and
constraints, including security and integrity restrictions, for the whole distributed database
application. This guarantees physical data independence between an application and the environment
in which the data will be distributed. Above it sits the global external schema, which represents
user applications and user access and specifies the portion of the distributed database that is
relevant to each group of users.

The global conceptual schema consists of a set of global relations. The fragmentation schema
describes how the data is logically partitioned: it defines the mapping between each global relation
and its fragments.

The allocation schema specifies the site or sites at which each fragment is stored. The type of this
mapping determines whether the distributed database is redundant: when fragments are replicated the
mapping is one-to-many, and when they are not replicated it is one-to-one.

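All fragments of the same global relation that are stored at site j are referred to as the physical
image of that global relation at site j. The architecture therefore distinguishes three kinds of
objects: global relations, fragments, and physical images. A copy of a fragment is denoted Rji, where
i identifies the fragment and j the site at which the copy is stored.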
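
The relationship between the fragmentation schema, the allocation schema, and the physical image can
be sketched in a few lines of Python. This is only an illustration: the relation EMP, its fragments,
the site names, and both helper functions are hypothetical and do not come from the question.

```python
# Fragmentation schema: global relation -> its fragments (hypothetical example).
fragmentation = {"EMP": ["EMP1", "EMP2", "EMP3"]}

# Allocation schema: fragment -> sites that hold a copy of it.
# EMP2 is replicated (one-to-many); EMP1 and EMP3 are not (one-to-one).
allocation = {
    "EMP1": ["site1"],
    "EMP2": ["site1", "site2"],
    "EMP3": ["site3"],
}


def physical_image(relation, site):
    """Return every fragment of `relation` that is stored at `site`."""
    return [f for f in fragmentation[relation] if site in allocation[f]]


def is_redundant():
    """The database is redundant if any fragment is stored at more than one site."""
    return any(len(sites) > 1 for sites in allocation.values())


print(physical_image("EMP", "site1"))
print(is_redundant())
```

Under these assumptions the physical image of EMP at site1 is made up of EMP1 and EMP2, and the
database counts as redundant because EMP2 is stored at two sites.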

Q2) Comment on the different types of dependency for the bottom layers in the reference
architecture of DDBMS. Explain the role of local mapping schema towards the integration of
heterogeneous multi-site databases in this context. (5 marks)

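The local mapping schema maps the fragments in the allocation schema onto the objects manipulated by
the local DBMS at each site. This mapping depends on the type of the local DBMS, so in a
heterogeneous distributed DBMS different sites may use different types of local mapping. This is
what allows heterogeneous databases to be integrated: the layers above remain site-independent, and
only the local mapping has to be adapted to each local DBMS.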

Q3.) a) Differentiate between transaction recovery and crash recovery

Transaction recovery is done to eliminate the adverse effects of faulty transactions rather than to
recover from a failure. Faulty transactions include all transactions that have changed the database
into an undesired state and the transactions that have used values written by the faulty transactions.
Transaction recovery in these cases is a two-step process:

• UNDO all faulty transactions and all transactions that may be affected by them.
• REDO all transactions that are not faulty but have been undone because of the faulty transactions.

Steps for the UNDO operation are:

• If the faulty transaction has done an INSERT, the recovery manager deletes the inserted data item(s).
• If the faulty transaction has done a DELETE, the recovery manager re-inserts the deleted data
item(s) from the log.
• If the faulty transaction has done an UPDATE, the recovery manager restores the old value by
writing the before-update value from the log.

Steps for the REDO operation are:

• If the transaction has done an INSERT, the recovery manager generates an insert from the log.
• If the transaction has done a DELETE, the recovery manager generates a delete from the log.
• If the transaction has done an UPDATE, the recovery manager generates an update from the log.
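
The UNDO and REDO steps above can be sketched as follows. This is a minimal illustration, not a real
recovery manager: the log record layout (`txn`, `op`, `key`, `before`, `after`), the in-memory
dictionary standing in for the database, and the function names are all assumptions made for the
example.

```python
def undo(db, rec):
    """Reverse one logged operation using its before-image."""
    if rec["op"] == "INSERT":
        db.pop(rec["key"], None)        # delete the item that was inserted
    else:                               # DELETE or UPDATE
        db[rec["key"]] = rec["before"]  # write the before-image back


def redo(db, rec):
    """Re-apply one logged operation using its after-image."""
    if rec["op"] == "DELETE":
        db.pop(rec["key"], None)        # generate the delete again
    else:                               # INSERT or UPDATE
        db[rec["key"]] = rec["after"]   # generate the insert/update again


def recover(db, log, faulty, collateral):
    """UNDO faulty and collateral transactions, then REDO the collateral ones."""
    undone = faulty | collateral
    for rec in reversed(log):           # undo in reverse log order
        if rec["txn"] in undone:
            undo(db, rec)
    for rec in log:                     # redo in original log order
        if rec["txn"] in collateral:
            redo(db, rec)
    return db


# Hypothetical log: T2 is faulty; T3 is not faulty but is undone along with it.
log = [
    {"txn": "T1", "op": "INSERT", "key": "a", "after": 1},
    {"txn": "T2", "op": "UPDATE", "key": "a", "before": 1, "after": 99},
    {"txn": "T3", "op": "INSERT", "key": "b", "after": 20},
]
db = {"a": 99, "b": 20}
result = recover(db, log, faulty={"T2"}, collateral={"T3"})
print(result)
```

In this example the faulty update by T2 is rolled back to the value written by T1, while the insert
by T3 is undone and then redone from the log.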

Crash recovery is the process by which the database is moved back to a consistent and usable state
after a failure. This is done by rolling back incomplete transactions and completing committed
transactions whose changes were still in memory when the crash occurred.

Conditions that can necessitate a crash recovery include:

i. A power failure on the machine, causing the database manager and the database partitions on it to
go down.

ii. A hardware failure such as memory, disk, CPU, or network failure.

iii. A serious operating system error that causes the database instance to end abnormally.

b) Briefly explain the objectives of distributed transaction management. 5 marks

i. CPU and main memory utilization should be improved

ii. Response time should be minimized

iii. Communication cost should be minimized

Q3) Write down the responsibilities of the transaction manager and the transaction coordinator.

Transaction managers often have the following responsibilities:

i. Demarcation: Starting and finishing transactions by means of begin, commit and rollback
methods.
ii. Controlling the transaction context: A transaction context contains all the information that
a transaction manager needs to monitor a transaction. Transaction managers are in charge
of building transaction contexts and attaching them to the current thread.
iii. Coordinating the transaction: Transaction managers are generally able to coordinate
a transaction across multiple resources. This requires the two-phase commit protocol.
The XA protocol is also used to register and manage the resources.
iv. Recovery from failure: Transaction managers are responsible for guaranteeing that the
resources are not left in an inconsistent state after a system or application failure.

Responsibilities of the transaction coordinator:

i. Starting the execution of a transaction initiated at that site.
ii. Breaking the transaction into a set of sub-transactions if needed.
iii. Distributing the sub-transactions to different sites.
iv. Coordinating the completion of the transaction.
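
The coordinator's responsibilities, combined with the two-phase commit protocol mentioned above, can
be sketched as follows. This is a simplified illustration under stated assumptions: the `Participant`
class, its states, and `run_transaction` are hypothetical names, and the sketch ignores timeouts,
logging, and site failures, which a real coordinator has to handle.

```python
class Participant:
    """A site that executes one sub-transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INITIAL"

    def prepare(self, operations):
        # Phase 1: vote on whether the sub-transaction can be committed.
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def run_transaction(operations, participants):
    """operations: list of (site_name, operation) pairs for one transaction."""
    # Break the transaction into one sub-transaction per site.
    subtxns = {}
    for site, op in operations:
        subtxns.setdefault(site, []).append(op)
    involved = [participants[site] for site in subtxns]

    # Phase 1: distribute the sub-transactions and collect the votes.
    votes = [p.prepare(subtxns[p.name]) for p in involved]

    # Phase 2: coordinate completion so every site reaches the same outcome.
    if all(votes):
        for p in involved:
            p.commit()
        return "COMMIT"
    for p in involved:
        p.abort()
    return "ABORT"


sites = {"A": Participant("A"), "B": Participant("B")}
print(run_transaction([("A", "debit"), ("B", "credit")], sites))
```

If any participant votes no in the first phase, every involved site is told to abort, so the
transaction has the same fate everywhere.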
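b) Define distributed serializability. How is it ensured in a DDBMS? {5 marks}

Serializability is the classical correctness criterion for concurrency control. It ensures that a
schedule for executing concurrent transactions is equivalent to one that executes the
transactions serially in some order. It assumes that all accesses to the database are done
using read and write operations. In a DDBMS the same requirement applies to the global
schedule: the local schedule at each site must be serializable, and the transactions must be
serialized in the same order at every site. This is typically ensured by a distributed
concurrency control protocol such as distributed two-phase locking or timestamp ordering.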

4) a) Compare and contrast Pessimistic and Optimistic Concurrency Control techniques. {5 marks}
i. Optimistic concurrency control allows a concurrency conflict to happen and reacts to it
when it is detected, while pessimistic concurrency control protects the system from
concurrency conflicts so that they cannot happen.
ii. Optimistic concurrency control is the better choice when the likelihood of conflict is
rather low, while pessimistic concurrency control is the better choice when there are a lot of
updates and the likelihood of conflict is high.
iii. Optimistic concurrency control does not lock records. To ensure that a record was not changed
between the select and submit operations, it checks the row version. Pessimistic
concurrency control locks records, so a record selected for update cannot be updated
in the meantime by another user.
iv. Optimistic concurrency control is simple to design and program, while pessimistic
concurrency control is more complex to design and manage in the program
(risk of deadlocks).
v. Optimistic concurrency control suits best when the database has a lot of records and relatively
few users, while pessimistic concurrency control suits best when a table has a
relatively small number of records but a lot of update operations, where a transaction
rollback would often be wasted effort.
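
The difference between the two approaches can be sketched in Python. This is a minimal in-memory
illustration, not how a DBMS implements either technique: the `Row` class, its `version` field, and
both update functions are assumptions made for the example.

```python
import threading


class Row:
    """A single record with a version counter and a lock (hypothetical model)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()


def optimistic_update(row, read_version, new_value):
    """Optimistic: no lock is held between read and write; check the version on submit.

    In a real DBMS the check and the write happen as one atomic statement.
    """
    if row.version != read_version:
        return False                # conflict detected: caller retries or rolls back
    row.value = new_value
    row.version += 1
    return True


def pessimistic_update(row, compute):
    """Pessimistic: hold the lock for the whole read-modify-write."""
    with row.lock:
        row.value = compute(row.value)
        row.version += 1


row = Row(10)
seen = row.version                        # two users read the same version
print(optimistic_update(row, seen, 11))   # first submit succeeds
print(optimistic_update(row, seen, 12))   # second submit sees a stale version
pessimistic_update(row, lambda v: v + 1)  # locked update cannot be interleaved
print(row.value)
```

The second optimistic submit is rejected because the row version changed after it was read, whereas
the pessimistic update never has to be rejected because no other writer can enter while the lock is
held.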

b) Describe quorum-based protocol for distributed concurrency control. {5 marks}

A quorum is the minimum number of votes that a distributed transaction has to obtain in
order to be allowed to perform an operation in a distributed system. A quorum-based
technique is implemented to enforce consistent operation in a distributed system.

Quorum-based voting in commit protocols

In a distributed database system, a transaction could be executing its operations at
multiple sites. Since atomicity requires every distributed transaction to be atomic, the
transaction must have the same fate (commit or abort) at every site. In case of network
partitioning, sites are partitioned and the partitions may not be able to communicate with
each other. This is where a quorum-based technique comes in. The fundamental idea is that
a transaction is executed if the majority of sites vote to execute it. Every site in the
system is assigned a vote Vi. Let us assume that the total number of votes in the system is
V and the abort and commit quorums are Va and Vc, respectively. Then the following rules
must be obeyed in the implementation of the commit protocol:

1. Va + Vc > V, where 0 < Vc, Va ≤ V.
2. Before a transaction commits, it must obtain a commit quorum Vc: the votes of at least
one site that is prepared to commit, plus zero or more waiting sites, must total at least Vc.
3. Before a transaction aborts, it must obtain an abort quorum Va: the votes of zero or more
sites that are prepared to abort, plus any waiting sites, must total at least Va.

The first rule ensures that a transaction cannot be committed and aborted at the same
time. The next two rules indicate the votes that a transaction has to obtain before it can
terminate one way or the other.

Quorum-based voting for replica control

In a replicated database, quorum-based voting can also be used to ensure that no two copies
of a data item are read or written by two transactions concurrently. The quorum-based
voting for replica control is due to [Gifford, 1979]. Each copy of a replicated data item is
assigned a vote. Each operation then has to obtain a read quorum (Vr) or a write quorum
(Vw) to read or write a data item, respectively. If a given data item has a total of V votes,
the quorums have to obey the following rules:

1. Vr + Vw > V
2. Vw > V/2

The first rule ensures that a data item is not read and written by two transactions
concurrently. Additionally, it ensures that a read quorum contains at least one site with the
newest version of the data item. The second rule ensures that two write operations from
two transactions cannot occur concurrently on the same data item. The two rules ensure
that one-copy serializability is maintained.
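
The quorum rules above are simple inequalities and can be checked mechanically. The following sketch
only encodes those rules; the function names and the example vote assignments are assumptions made
for illustration.

```python
def valid_commit_quorums(v, vc, va):
    """Commit protocol rule 1: Va + Vc > V, with 0 < Vc, Va <= V."""
    return 0 < vc <= v and 0 < va <= v and va + vc > v


def valid_replica_quorums(v, vr, vw):
    """Replica control rules: Vr + Vw > V and Vw > V/2."""
    return vr + vw > v and 2 * vw > v


def has_quorum(site_votes, responding_sites, quorum):
    """True if the votes of the responding sites reach the required quorum."""
    return sum(site_votes[s] for s in responding_sites) >= quorum


# Hypothetical system: five sites with one vote each, so V = 5.
votes = {"S1": 1, "S2": 1, "S3": 1, "S4": 1, "S5": 1}
V = sum(votes.values())

print(valid_commit_quorums(V, vc=3, va=3))    # 3 + 3 > 5
print(valid_commit_quorums(V, vc=2, va=3))    # 2 + 3 is not > 5
print(valid_replica_quorums(V, vr=2, vw=4))   # 2 + 4 > 5 and 4 > 2.5
print(has_quorum(votes, ["S1", "S2", "S3"], quorum=3))
```

With V = 5, choosing Vc = 3 and Va = 3 means any commit quorum and any abort quorum must share at
least one site, which is why a transaction cannot gather both.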

5) Discuss how distributed systems are used in organizations {10 marks}

A distributed system is a collection of independent computers that appears to its users as a
single system.

• Real-time process control:
  - Industrial control systems
  - Aircraft control systems
• Parallel computation:
  - Distributed rendering in computer graphics
  - Scientific computing, including cluster computing and grid computing
• Telecommunication networks:
  - Routing algorithms
  - Computer networks such as the Internet
  - Telephone networks and cellular networks
  - Wireless sensor networks
• Network applications:
  - The World Wide Web, mostly for resource sharing
  - Peer-to-peer networks

6) Discuss the techniques used to facilitate distributed query processing and Optimization.
{10 marks}

The query processing methods can be classified along several dimensions:
1. Query model: In the selection query model, scores are attached directly to the tuples. In the
join query model, scores are computed over join results. In the aggregate query model,
groups of tuples are ranked.
2. Data access model: Techniques are classified by the access methods available in the
underlying data sources. For example, some techniques assume that random access is
available, whereas others are restricted to sorted access.
3. Implementation level: Techniques are classified by their level of integration with the
DBMS. For example, some are implemented on top of the database system at the
application layer, while others are implemented as query operators.
4. Query and data uncertainty: Techniques are classified by the uncertainty in their query and
data models. For example, some return exact answers, while others return approximate
answers or work with uncertain data.
5. Ranking function: Techniques are classified by the restrictions they impose on the
underlying scoring function.
The steps involved in transforming a high-level query into a low-level one are as follows:
1. Parsing and translation
2. Optimization
3. Evaluation
4. Execution

1. Parsing and translation: This phase translates the query into an internal form, which is
then translated into relational algebra. The parser extracts the raw tokens from the query
string, checks the syntax, and verifies that the relations named in the query exist.
2. Optimization: The query planner generates an evaluation plan with the lowest estimated
cost.
3. Evaluation: The query execution engine takes the evaluation plan and executes it.
4. Execution: After evaluation, the engine returns the answer to the query.
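
The optimization step can be illustrated with a very small sketch: given several candidate
evaluation plans, the planner keeps the one with the lowest estimated cost. The `Plan` class, the
plan descriptions, and the cost figures below are hypothetical; a real optimizer derives its
estimates from statistics about the relations.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    description: str
    estimated_cost: float   # hypothetical cost units


def choose_plan(candidates):
    """Return the candidate evaluation plan with the lowest estimated cost."""
    return min(candidates, key=lambda plan: plan.estimated_cost)


candidates = [
    Plan("join first, then select", 1200.0),
    Plan("select first, then join", 150.0),
    Plan("ship both relations to one site, then join", 900.0),
]
best = choose_plan(candidates)
print(best.description)
```

Here the plan that applies the selection before the join is chosen, because its assumed cost is the
smallest of the three.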
Techniques:
ARRQ technique
This technique processes queries with a minimum quantity of intersite data transfer. It can
be used when all of the relations referenced by a query are non-fragmented but distributed
across different sites. The technique determines which relations are to be partitioned into
fragments and where the fragments are to be sent for processing. It is efficient compared
to other techniques because it generally chooses more than one relation to remain
fragmented, which exploits parallelism, while replicating the other relations to the sites of
the fragmented relations. Communication costs and local processing costs can thus be
reduced, because the fragmented relations are smaller, and the response time of queries
can be improved.

Six definitions D-1 to D-6 technique

The proposed technique for query processing in a DDBS is based on six definitions, D-1 to
D-6. The straightforward (naive) approach to processing a distributed query would be to
send all relations directly to the assembly site, where all joins are performed. This naive
method, however, is unfavorable because of its high transmission overhead and low level of
parallelism.
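
The transmission overhead of the naive approach can be made concrete with a short sketch. It only
models the naive method described above and does not implement the technique based on D-1 to D-6,
whose definitions are not given here. The relation names, sites, sizes, and function names are
hypothetical.

```python
def naive_transfer_cost(relations, assembly_site):
    """Bytes shipped when every relation is sent to the assembly site.

    relations: dict mapping relation name -> (site, size_in_bytes).
    Relations already stored at the assembly site cost nothing to ship.
    """
    return sum(size for site, size in relations.values() if site != assembly_site)


def cheapest_assembly_site(relations):
    """Pick the site that minimizes the naive transfer cost."""
    sites = sorted({site for site, _ in relations.values()})
    return min(sites, key=lambda s: naive_transfer_cost(relations, s))


# Hypothetical query over three relations stored at three different sites.
relations = {"R": ("A", 1000), "S": ("B", 200), "T": ("C", 50)}

for site in ("A", "B", "C"):
    print(site, naive_transfer_cost(relations, site))
print(cheapest_assembly_site(relations))
```

Even with the best choice of assembly site, every other relation is shipped in full and all joins
run at a single site, which is the overhead and lack of parallelism the passage refers to.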
