Answer: The different components of a DDBMS are as follows:
Q # 2: What is the difference between homogeneous and heterogeneous distributed database system?
Answer: A DDBMS may be classified as homogeneous or heterogeneous. In a homogeneous system, all sites use the same DBMS
product. In a heterogeneous system, sites may run different DBMS products, which need not be based on the same underlying data
model, and so the system may be composed of relational, network, hierarchical and object-oriented DBMSs.
Homogeneous systems are much easier to design and manage. This approach provides incremental growth, making the addition of a new
site to the DDBMS easy, and allows increased performance by exploiting the parallel processing capability of multiple sites.
Heterogeneous systems usually result when individual sites have implemented their own databases and integration is considered at a later
stage. In a heterogeneous system, translations are required to allow communication between different DBMSs. To provide DBMS
transparency, users must be able to make requests in the language of the DBMS at their local site. The system then has the task of
locating the data and performing any necessary translation.
Q # 3: What is location transparency?
Answer: Location Transparency:
Location transparency ensures that the user can query any table(s) or fragment(s) of a table as if they were stored locally at the user's
site. The fact that a table or its fragments are stored at a remote site in the distributed database system should be completely hidden from
the end user. The addresses of the remote site(s) and the access mechanisms are completely hidden.
In order to incorporate location transparency, the DDBMS should have access to an updated and accurate data dictionary and DDBMS
directory, which contain the details of the locations of data.
Q # 4: What is Fragmentation? Discuss various types of Fragmentations with the help of suitable examples.
Answer:
Fragmentation:
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are called fragments. Fragmentation
can be of three types: horizontal, vertical, and hybrid (combination of horizontal and vertical). Horizontal fragmentation can further be
classified into two techniques: primary horizontal fragmentation and derived horizontal fragmentation.
Fragmentation should be done in such a way that the original table can be reconstructed from the fragments whenever required. This
requirement is called "reconstructiveness."
Fragmenting a relation offers the following advantages:
Easy usage of data: It keeps the most frequently accessed data close to the users who need it, so that data can be accessed easily as
and when required.
Efficiency: It increases the efficiency of queries by reducing the table to a smaller subset and making it available with less network
access time.
Security: It provides security for the data. Only valid and useful records are made available to the actual user; the database near a
user will not hold unwanted data, only the information that is necessary for that user.
Parallelism: Fragmentation allows users at different locations to access the same table at the same time, each working against the
database at their own location and seeing the data meant for them. If the whole table were held at a single location, they would have
to wait for locks before performing their transactions.
Reliability: It increases the reliability of fetching data. If users at different locations all access a single database, the network load
is huge and there is no guarantee that correct records are fetched and returned to the user. Accessing the fragment of data in the
nearest database reduces the risk of data loss and improves correctness.
Balanced storage: Data is distributed evenly among the databases in the DDB.
Information about the fragmentation of the data is stored in the distributed data catalog (DDC). When a user sends a query, the DDC
determines which fragment is to be accessed and points the query to that data fragment.
Fragmentation of data can be done according to the databases and user requirements, but while fragmenting the data, the following points
should be kept in mind:
Completeness: While creating fragments, partial records of the table should not be considered. Fragmentation should be
performed on the whole table's data to get correct results. For example, if we are creating fragments of the EMPLOYEE table,
we need to consider the whole EMPLOYEE table when constructing the fragments; they should not be created from a subset of
the EMPLOYEE records.
Reconstruction: When all the fragments are combined, they should give the whole table's data; that is, the whole table should
be reconstructible from its fragments. For example, all fragments of the EMPLOYEE table in the DB, when combined, should
give the complete set of EMPLOYEE records.
Disjointness: There should not be any overlapping data between fragments; otherwise it becomes difficult to maintain the
consistency of the data, since the same change has to be replicated in every copy. If we have fragments of the EMPLOYEE
table based on location, no two fragments should hold the details of the same employee.
Queries that select rows by the location of the employees give subsets of records from the EMPLOYEE table. These subsets of data are
stored in the databases at the respective locations. Any insert, update, or delete on the employee records is done on the database at that
location and is synchronized with the main table at regular intervals.
This is a simple example of horizontal fragmentation. The fragmentation can be based on more than one condition joined by AND or OR
clauses; it is done according to the requirements and purpose of the DDB.
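The idea above can be sketched in code. This is a minimal illustration using in-memory SQLite databases to stand in for the sites; the schema and the location values ("Delhi", "Mumbai") are illustrative assumptions, not from the original example:

```python
# Hypothetical sketch of horizontal fragmentation of an EMPLOYEE table.
# Each in-memory SQLite database stands in for one site.
import sqlite3

rows = [
    (1, "Asha", "Delhi"),
    (2, "Ravi", "Mumbai"),
    (3, "Meena", "Delhi"),
]

# One database per site; each holds only the rows for its own location.
sites = {}
for loc in ("Delhi", "Mumbai"):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE EMPLOYEE (ID INTEGER PRIMARY KEY, Name TEXT, Location TEXT)")
    db.executemany(
        "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
        [r for r in rows if r[2] == loc],   # selection predicate: Location = loc
    )
    sites[loc] = db

# Reconstruction: the union of all fragments gives back the original table.
union = sorted(
    row for db in sites.values() for row in db.execute("SELECT * FROM EMPLOYEE")
)
print(union)  # all three employees, exactly once each
```

Each fragment is defined by a selection predicate on Location, so the fragments are complete, disjoint, and their union reconstructs the original relation.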
Vertical Data Fragmentation:
This is a vertical subset of a relation; that is, the table is fragmented by considering its columns.
For example, consider the EMPLOYEE table with the columns ID, Name, Address, Age, Location, DeptID, and ProjID. A vertical
fragmentation of this table divides it into different tables, each with one or more columns from EMPLOYEE.
Each such fragment holds part of the details of every employee. This is useful when the user needs to query only a few details about an
employee. For example, a query to find the department of an employee can be answered from the fragment holding the DeptID column
alone, and a query to find the name and age of an employee with a given ID can be answered from the fragment holding those columns.
This avoids a 'SELECT *' operation, which needs a lot of memory to query the whole table: to traverse all the rows as well as to hold all
the columns.
In these fragments an overlapping column can be seen, but that column is the primary key, which hardly ever changes throughout the life
cycle of a record, so the cost of maintaining the overlap is very small. In addition, this column is required if we need to reconstruct the
table or pull data from two fragments. Hence vertical fragmentation still meets the conditions of fragmentation.
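A small sketch of this, with the primary key repeated in every fragment so the rows can be rebuilt by joining on it. The column names and values are illustrative assumptions:

```python
# Hypothetical sketch of vertical fragmentation: the EMPLOYEE relation is
# split column-wise, and the primary key (ID) is repeated in every fragment.
employees = {
    1: {"Name": "Asha", "Age": 30, "DeptID": 10},
    2: {"Name": "Ravi", "Age": 41, "DeptID": 20},
}

# Fragment 1: ID + Name + Age.  Fragment 2: ID + DeptID.
frag1 = {eid: (rec["Name"], rec["Age"]) for eid, rec in employees.items()}
frag2 = {eid: rec["DeptID"] for eid, rec in employees.items()}

# A query for an employee's department touches only the smaller fragment.
def department_of(eid):
    return frag2[eid]

# Reconstruction: join the fragments on the shared key ID.
rebuilt = {
    eid: {"Name": frag1[eid][0], "Age": frag1[eid][1], "DeptID": frag2[eid]}
    for eid in frag1
}
assert rebuilt == employees
```

Because ID appears in both fragments, the join reproduces the original relation, satisfying the reconstruction condition despite the overlapping key column.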
Hybrid Data Fragmentation:
This is the combination of horizontal and vertical fragmentation: horizontal fragmentation distributes subsets of the rows over the
databases, and vertical fragmentation keeps subsets of the columns of the table.
This type of fragmentation can be done in either order; it has no fixed order and is based solely on user requirements, but it must still
satisfy the fragmentation conditions. For example, the EMPLOYEE table can first be fragmented horizontally by location, and each
horizontal fragment can then be fragmented vertically.
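The two-step idea can be sketched as follows; the schema, locations, and fragment layout are illustrative assumptions:

```python
# Hypothetical sketch of hybrid fragmentation: EMPLOYEE is split
# horizontally by location, then each horizontal fragment is split
# vertically, with ID repeated as the join key.
rows = [
    {"ID": 1, "Name": "Asha", "Location": "Delhi", "DeptID": 10},
    {"ID": 2, "Name": "Ravi", "Location": "Mumbai", "DeptID": 20},
    {"ID": 3, "Name": "Meena", "Location": "Delhi", "DeptID": 10},
]

hybrid = {}
for loc in ("Delhi", "Mumbai"):
    horizontal = [r for r in rows if r["Location"] == loc]      # step 1: by row
    hybrid[loc] = {
        "names": {r["ID"]: r["Name"] for r in horizontal},      # step 2: by column
        "depts": {r["ID"]: r["DeptID"] for r in horizontal},
    }

# Reconstruction reverses both steps: join the vertical pieces per site,
# then take the union over the sites.
rebuilt = sorted(
    (eid, frag["names"][eid], loc, frag["depts"][eid])
    for loc, frag in hybrid.items()
    for eid in frag["names"]
)
assert [r[0] for r in rebuilt] == [1, 2, 3]
```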
Answer:
a) Network Transparency:
Network transparency is one of the properties of a distributed database. According to this property, a
distributed database must be network transparent: the user must be unaware of the operational details of
the network.
In distributed databases, when a user wants to access data that does not exist on the user's
computer, it is the responsibility of the DBMS to provide the data from another computer where it
exists. The user does not know where the data is coming from.
Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another.
Concurrent access is quite easy if all users are just reading data, since there is no way they can interfere with one another. Any practical
database, though, has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in multi-user systems. It helps you make sure that
database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of a system where two or more database
transactions that require access to the same data are executed simultaneously.
Here are some issues which you are likely to face while using the concurrency control method:
Lost updates occur when multiple transactions select the same row and update it based on the value selected.
Uncommitted dependency issues occur when a second transaction selects a row that is being updated by another transaction
(dirty read).
Non-repeatable reads occur when a second transaction accesses the same row several times and reads different data
each time.
Incorrect summary issues occur when one transaction takes a summary over the values of all the instances of a repeated data
item while a second transaction updates a few instances of that data item. In that situation, the resulting summary does not
reflect a correct result.
The system needs to control the interaction among the concurrent transactions. This control is achieved using concurrent-control
schemes.
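The lost-update anomaly above can be shown with a small, deterministic interleaving; the account name and amounts are illustrative assumptions:

```python
# A minimal, deterministic illustration of the lost-update anomaly: two
# transactions both read the same balance, then both write, and the first
# write is silently overwritten.
balance = {"acct": 100}

# T1 and T2 each intend to add an amount to the balance.
t1_read = balance["acct"]          # T1 reads 100
t2_read = balance["acct"]          # T2 reads 100 (before T1 writes)
balance["acct"] = t1_read + 50     # T1 writes 150
balance["acct"] = t2_read + 30     # T2 writes 130 -- T1's update is lost

print(balance["acct"])  # 130, not the correct 180
```

A concurrency-control scheme would force T2 either to wait for T1's lock or to restart with T1's committed value, giving 180.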
Example
Assume that two people go to electronic kiosks at the same time to buy a movie ticket for the same movie and the same show time.
However, there is only one seat left for that show in that particular theatre. Without concurrency control, it is possible that both
moviegoers will end up purchasing a ticket. A concurrency control method does not allow this to happen. Both moviegoers can
still access the information in the movie seating database, but concurrency control issues a ticket only to the buyer who
completes the transaction process first.
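The kiosk scenario can be sketched with two threads and an exclusive lock around the check-and-decrement; the buyer names and messages are illustrative assumptions:

```python
# Two buyer threads race for the last seat; holding a lock across the
# check-and-decrement ensures exactly one purchase succeeds.
import threading

seats_left = 1
lock = threading.Lock()
results = {}

def buy(buyer):
    global seats_left
    with lock:                      # only one buyer inside at a time
        if seats_left > 0:
            seats_left -= 1
            results[buyer] = "ticket issued"
        else:
            results[buyer] = "sold out"

threads = [threading.Thread(target=buy, args=(b,)) for b in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # exactly one buyer gets the ticket
```

Which buyer wins depends on scheduling, but the invariant holds either way: one "ticket issued", one "sold out", and no overselling.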
Different concurrency control protocols offer different benefits between the amount of concurrency they allow and the amount of
overhead that they impose.
Lock-Based Protocols
Two-Phase Locking Protocol
Timestamp-Based Protocols
Validation-Based Protocols
Lock-based Protocols
A lock is a data variable which is associated with a data item. The lock signifies which operations can be performed on the
data item. Locks help synchronize access to the database items by concurrent transactions.
All lock requests are made to the concurrency-control manager. Transactions proceed only once the lock request is granted.
Binary locks: A binary lock on a data item can be in either a locked or an unlocked state.
Shared/exclusive: This type of locking mechanism separates the locks based on their use. If a lock is acquired on a data
item to perform a write operation, it is an exclusive lock; if it is acquired to perform a read operation, it is a shared
lock, which several transactions can hold at the same time.
Two Phase Locking (2PL) Protocol
The Two-Phase Locking protocol is also known as the 2PL protocol. In this type of locking protocol, a
transaction may not acquire any new lock after it has released one of its locks.
This locking protocol divides the execution of a transaction into parts around its lock point.
When the transaction begins to execute, it requests and keeps acquiring the locks it needs.
The point at which the transaction has obtained all of its locks is called its lock point. When the transaction releases its
first lock, the final part starts.
In this final part, the transaction cannot demand any new locks. Instead, it only releases the acquired locks.
The Two-Phase Locking protocol allows each transaction to make a lock or unlock request in two steps:
Growing Phase: In this phase transaction may obtain locks but may not release any locks.
Shrinking Phase: In this phase, a transaction may release locks but not obtain any new lock
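The two-phase rule itself can be sketched directly; the class and item names are illustrative assumptions:

```python
# A minimal sketch of 2PL: a transaction may acquire locks only while it
# has released none (growing phase); the first release moves it to the
# shrinking phase, after which new acquisitions are refused.
class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after an unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.locks.remove(item)
        self.shrinking = True    # the lock point has passed

t = TwoPhaseTransaction()
t.lock("A")
t.lock("B")        # growing phase: allowed
t.unlock("A")      # shrinking phase begins
try:
    t.lock("C")    # violates the two-phase rule
except RuntimeError as e:
    print(e)
```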
It is true that the 2PL protocol offers serializability. However, it does not ensure that deadlocks do not happen.
In a distributed setting, local and global deadlock detectors search for deadlocks and resolve them by restoring the affected
transactions to their initial states.
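Deadlock detection is commonly done by looking for a cycle in a wait-for graph, where an edge T1 → T2 means T1 is waiting for a lock held by T2. A sketch, with an illustrative graph:

```python
# Deadlock detection via a wait-for graph: a cycle means deadlock.
def has_cycle(graph):
    # Depth-first search with a recursion stack.
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(n) for n in graph if n not in visited)

wait_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}   # T3 waits for T1: a cycle
print(has_cycle(wait_for))            # True -> deadlock
print(has_cycle({"T1": ["T2"]}))      # False -> no deadlock
```

Once a cycle is found, the detector breaks it by aborting one transaction in the cycle (the victim) and restarting it.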
Timestamp-based Protocols
The timestamp-based algorithm uses a timestamp to serialize the execution of concurrent transactions. This protocol ensures
that every pair of conflicting read and write operations is executed in timestamp order. The protocol uses the system time or a
logical counter as the timestamp.
The older transaction is always given priority in this method. It uses system time to determine the time stamp of the
transaction. This is the most commonly used concurrency protocol.
Lock-based protocols manage the order between conflicting transactions at the time they execute, whereas timestamp-based
protocols resolve conflicts as soon as an operation is created.
Case Study
By taking any real time situation, explain how a participating node performs its recovery when it fails during the processing of a
transaction?
Answer:
Scalability:
A scalable system is any system that is flexible with respect to its number of components.
For an efficiently designed distributed system, adding and removing nodes should be an easy task. The system architecture
must be capable of accommodating such changes. You call a system scalable when adding or removing components
doesn't make the user sense a difference; the entire system feels like one coherent, logical system.
A distributed system (a system comprising many servers and probably many networks) is called scalable
when it gives the right response to requests regardless of the incoming traffic; basically, as the
computation grows (perhaps 1 user now, 1M users in an hour, 1K users in the second hour, and so on) it does not fail.
Scalability can be achieved in many ways: horizontally, vertically, etc.