Q # 1: What Are The Components of Distributed Database System? Explain With The Help of A Diagram.
Answer: The contents of a distributed database are spread across multiple locations. That is, the contents may be stored
in different systems that are located in the same place or geographically far apart. However, the database still appears
uniform to the users, i.e. the fact that the database is stored at multiple locations is transparent to the users.
The data of a distributed database is divided among the sites by fragmentation. The different types of fragmentation are −
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to maintain
reconstructiveness, each fragment should contain the primary key field(s) of the table. Vertical fragmentation can be used to
enforce privacy of data.
For example, consider a University database that keeps records of all registered students in a Student table with the
following schema.
STUDENT (Regd_No, Name, Course, Address, Semester, Fees, Marks)
Now, the fees details are maintained in the accounts section. In this case, the designer will fragment the database as follows −
CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT;
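The reconstruction rule can be checked with a small Python sketch using the standard sqlite3 module. The second fragment STD_INFO and the two sample rows are assumptions made for illustration; they are not part of the original example. Because both fragments keep the primary key Regd_No, joining them on that key gives back the STUDENT table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
    Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT,
    Address TEXT, Semester INTEGER, Fees REAL, Marks INTEGER
);
-- Made-up sample rows, for illustration only.
INSERT INTO STUDENT VALUES
    (1, 'Asha', 'Computer Science', 'Pune', 3, 1200.0, 81),
    (2, 'Ravi', 'Physics', 'Delhi', 1, 900.0, 74);

-- Vertical fragments: each one keeps the primary key Regd_No.
CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT;
CREATE TABLE STD_INFO AS  -- hypothetical second fragment
    SELECT Regd_No, Name, Course, Address, Semester, Marks FROM STUDENT;
""")

# Reconstruct the base table by joining the fragments on the primary key.
rebuilt = conn.execute("""
    SELECT i.Regd_No, i.Name, i.Course, i.Address, i.Semester, f.Fees, i.Marks
    FROM STD_INFO AS i JOIN STD_FEES AS f ON f.Regd_No = i.Regd_No
    ORDER BY i.Regd_No
""").fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
print(rebuilt == original)
```

If a fragment dropped Regd_No, there would be no column to join on and the original rows could not be matched up again.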
Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table according to the values of one or more fields. Horizontal
fragmentation should also conform to the rule of reconstructiveness. Each horizontal fragment must have all columns of the
original base table.
For example, in the student schema, if the details of all students of the Computer Science course need to be maintained at the
School of Computer Science, then the designer will horizontally fragment the database as follows −
CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE COURSE = 'Computer Science';
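As a rough illustration in Python (sqlite3), the horizontal fragments can be recombined with a UNION. The complementary fragment OTHER_STD, the shortened three-column schema and the sample rows are assumptions added for this sketch only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shortened STUDENT schema and made-up rows, for illustration only.
CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT);
INSERT INTO STUDENT VALUES
    (1, 'Asha', 'Computer Science'),
    (2, 'Ravi', 'Physics'),
    (3, 'Meena', 'Computer Science');

-- Horizontal fragments: every fragment keeps all columns of the base table.
CREATE TABLE COMP_STD AS
    SELECT * FROM STUDENT WHERE Course = 'Computer Science';
CREATE TABLE OTHER_STD AS  -- hypothetical fragment holding the remaining rows
    SELECT * FROM STUDENT WHERE Course <> 'Computer Science';
""")

# Reconstruct the base table as the union of the fragments.
rebuilt = conn.execute(
    "SELECT * FROM COMP_STD UNION ALL SELECT * FROM OTHER_STD ORDER BY Regd_No"
).fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
print(rebuilt == original)
```

The fragments here are disjoint and together cover every row, which is what lets a plain UNION ALL rebuild the table. Rows with a NULL Course would fall into neither fragment and would need a fragment of their own.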
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used. This is the most
flexible fragmentation technique since it generates fragments with minimal extraneous information. However,
reconstruction of the original table is often an expensive task.
Hybrid fragmentation can be done in two alternative ways −
• First generate a set of horizontal fragments, then generate vertical fragments from one or more of the horizontal
fragments.
• First generate a set of vertical fragments, then generate horizontal fragments from one or more of the vertical
fragments.
Q # 5: Write short note on any three of the following.
a. Remote and distributed transactions.
b. Replication and considerations while designing the distributed database system.
c. Network Transparency.
d. Underlying design principles in CORBA architecture.
e. Deadlock prevention and deadlock detection algorithms.
Answer:
a) Remote and distributed transactions.
Remote transaction:
A remote transaction, composed of several requests, accesses data at a single remote site. A remote
transaction is illustrated in the following figure.
The features of remote transaction are as follows:
• The transaction updates the PRODUCT and INVOICE tables (located at site B).
• The remote transaction is sent to and executed at the remote site B.
• The transaction can reference only one remote data processor (DP).
• Each SQL statement (or request) can reference only one (the same) remote DP at a time, and the
entire transaction can reference and be executed at only one remote DP.
Distributed transactions:
A distributed transaction is a database transaction in which two or more network hosts are
involved. Usually, hosts provide transactional resources, while the transaction manager is
responsible for creating and managing a global transaction that encompasses all operations against such
resources. Distributed transactions, like any other transactions, must have all four ACID (atomicity,
consistency, isolation, durability) properties, where atomicity guarantees all-or-nothing outcomes for the
unit of work (operations bundle).
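The all-or-nothing behaviour of a global transaction can be sketched in Python as follows. The text above does not name a commit protocol; this sketch assumes a simplified two-phase approach (every host first votes, then all commit or all roll back), and the names Participant and run_global_transaction are hypothetical. It also ignores failures that occur between the two phases.

```python
class Participant:
    """Hypothetical host that provides a transactional resource (a dict)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a host that must refuse
        self.data = {}
        self.pending = None

    def prepare(self, updates):
        # Phase 1: stage the updates and vote yes (True) or no (False).
        if not self.can_commit:
            return False
        self.pending = dict(updates)
        return True

    def commit(self):
        # Phase 2 (success): make the staged updates visible.
        self.data.update(self.pending or {})
        self.pending = None

    def rollback(self):
        # Phase 2 (failure): discard the staged updates.
        self.pending = None


def run_global_transaction(work):
    """work is a list of (participant, updates) pairs; returns True on commit."""
    prepared = []
    for participant, updates in work:
        if participant.prepare(updates):
            prepared.append(participant)
        else:
            for p in prepared:  # one "no" vote aborts everything
                p.rollback()
            return False
    for p in prepared:
        p.commit()
    return True


site_a, site_b = Participant("A"), Participant("B")
ok = run_global_transaction([(site_a, {"x": 1}), (site_b, {"y": 2})])
print(ok, site_a.data, site_b.data)
```

If any host votes no, none of the hosts applies its updates, which is the atomicity property described above.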
b) Replication and considerations while designing the distributed database system
Data Replication:
Data replication is the process of storing separate copies of the database at two or more sites. It is a popular
fault tolerance technique of distributed databases.
Advantages of Data Replication:
Reliability − In case of failure of any site, the database system continues to work since a copy is available
at another site(s).
Reduction in Network Load − Since local copies of data are available, query processing can be done with
reduced network usage, particularly during prime hours. Data updating can be done at non-prime hours.
Quicker Response − Availability of local copies of data ensures quick query processing and consequently
quick response time.
Simpler Transactions − Transactions require fewer joins of tables located at different sites and
minimal coordination across the network. Thus, they become simpler in nature.
Disadvantages of Data Replication
Increased Storage Requirements − Maintaining multiple copies of data is associated with increased
storage costs. The storage space required is in multiples of the storage required for a centralized system.
Increased Cost and Complexity of Data Updating − Each time a data item is updated, the update needs to
be reflected in all the copies of the data at the different sites. This requires complex synchronization
techniques and protocols.
Undesirable Application-Database Coupling − If complex update mechanisms are not used, removing
data inconsistency requires complex coordination at the application level. This results in undesirable
application-database coupling.
Some commonly used replication techniques are −
Snapshot replication
Near-real-time replication
Pull replication
Considerations while designing the distributed database system:
The following characteristics of distributed systems should be considered when
designing a project in a distributed environment:
1- No global clock
2- Geographical distribution
3- No shared memory
4- Independence and heterogeneity
5- Fail-over mechanism
6- Security concerns
7- Java support
c) Network Transparency:
Network transparency is one of the properties of a distributed database. According to this
property, a distributed database must be network transparent, which means that a user
need not be aware of the operational details of the network.
In a distributed database, when a user wants to access data that does not
exist on the user's computer, it is the responsibility of the DBMS to provide the data from another
computer where it exists. The user does not know where the data is coming from.
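A minimal Python sketch of this idea is given below. The classes Site and TransparentDatabase and the sample tables are hypothetical and only illustrate the principle: the user asks for a table by name, and the lookup decides which site supplies the rows.

```python
class Site:
    """Hypothetical computer in the network holding some tables."""

    def __init__(self, name, tables):
        self.name = name
        self.tables = tables  # table name -> list of rows


class TransparentDatabase:
    """Hides from the caller which site actually stores a table."""

    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes

    def select_all(self, table):
        # Try the user's own site first, then the other sites.
        for site in [self.local, *self.remotes]:
            if table in site.tables:
                return list(site.tables[table])
        raise KeyError(f"table {table!r} is not stored at any site")


local = Site("A", {"CUSTOMER": [("C1", "Ali")]})
remote = Site("B", {"PRODUCT": [("P1", "Pen")]})
db = TransparentDatabase(local, [remote])

# Same call for both tables; the caller never mentions site A or site B.
print(db.select_all("CUSTOMER"))
print(db.select_all("PRODUCT"))
```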
e) Deadlock prevention and deadlock detection algorithms.
Deadlock Prevention:
The deadlock prevention approach does not allow any transaction to acquire locks that will lead to
deadlocks. The convention is that when more than one transaction requests a lock on the same data
item, only one of them is granted the lock.
One of the most popular deadlock prevention methods is pre-acquisition of all the locks. In this
method, a transaction acquires all the locks before starting to execute and retains the locks for the
entire duration of the transaction. If another transaction needs any of the already acquired locks, it has to
wait until all the locks it needs are available. Using this approach, the system is prevented from being
deadlocked since none of the waiting transactions holds any lock.
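Pre-acquisition of locks can be sketched in Python as an all-or-nothing lock request. The LockManager class and its method names are hypothetical, and the sketch is single-threaded: it only shows the bookkeeping, not real blocking.

```python
class LockManager:
    """Hypothetical lock table: data item -> transaction that holds it."""

    def __init__(self):
        self.owner = {}

    def acquire_all(self, txn, items):
        """Grant every lock in `items` to `txn`, or grant none of them."""
        if any(item in self.owner for item in items):
            return False  # txn has to wait, and it holds nothing meanwhile
        for item in items:
            self.owner[item] = txn
        return True

    def release_all(self, txn):
        """Release every lock held by `txn` when the transaction ends."""
        self.owner = {i: t for i, t in self.owner.items() if t != txn}


locks = LockManager()
first = locks.acquire_all("T1", {"A", "B"})   # T1 gets both locks
second = locks.acquire_all("T2", {"B", "C"})  # B is taken, so T2 gets nothing
print(first, second, sorted(locks.owner))
```

Since T2 is not given the lock on C while it waits for B, no transaction can end up waiting for a lock held by another waiting transaction.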
Deadlock Detection:
The deadlock detection and removal approach runs a deadlock detection algorithm periodically and
removes the deadlock if there is one. It does not check for deadlock when a transaction places a
request for a lock. When a transaction requests a lock, the lock manager checks whether it is available.
If it is available, the transaction is allowed to lock the data item; otherwise the transaction
waits.
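The text does not say which detection algorithm is run. One common choice, assumed here, is to keep a wait-for graph (an edge T1 → T2 means T1 is waiting for a lock held by T2) and to look for a cycle in it. The function name find_deadlock and the sample graph are hypothetical.

```python
def find_deadlock(wait_for):
    """Return a list of transactions that form a cycle, or None if there is none.

    wait_for maps each transaction to the transactions it is waiting for.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(txn):
        colour[txn] = GREY
        path.append(txn)
        for other in wait_for.get(txn, ()):
            state = colour.get(other, WHITE)
            if state == GREY:
                # Reached a transaction already on the current path: a cycle.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        colour[txn] = BLACK
        return None

    for txn in list(wait_for):
        if colour.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


graph = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T1"]}
cycle = find_deadlock(graph)
print(cycle)
```

Once a cycle is found, the deadlock is removed by aborting one of the transactions in it, which releases its locks and lets the others proceed.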
Q # 1: What are the functions of distributed database management system? What are the
advantages and disadvantages of distributed database system?
Answer: Functions of Distributed Database
Below are the functions of the Distributed Database System:
1. Cataloguing
As the data on a distributed system is spread across locations, it becomes imperative to have a catalogue of
what is stored at what location, along with details of its autonomy and confidentiality. The details of the database
with respect to data distribution, data fragmentation, and replication need to be recorded in the database
catalog, which is dynamic and ever-expanding.
2. Data Recoverability
One of the main advantages of a DDBMS is that it is spread across a network of computers with independent
components, and therefore recovery of data at a particular site becomes an important function of the distributed
system. If there is a problem at one of the locations, the system needs to be robust enough to
recover the data that has been lost at that particular location.
3. Security
Security is an important aspect of distributed systems. With their ever-increasing use by
organizations, various complex security measures have become possible in distributed systems, which have the
advantage of being spread over multiple locations.
4. Distributed Query Processing
In an environment where data is located at various sites of a distributed system, distributed query processing
is used to process the queries.
5. Data Transaction Management
In a distributed system, transactions take place in different systems at various physical locations. Therefore, in
order to complete a transaction, the data transaction manager communicates with all the local transaction
managers. Thus, data transaction management plays a very important role.
Advantages & Disadvantages of Distributed System
Below are the advantages and disadvantages of a Distributed System:
Advantages:
• They are cost-effective and can substantially reduce database management costs. This helps
startups and cash-strapped companies to invest in other technologies.
• They are easily scalable, so as the business grows, the distributed system can be scaled to
handle an increased workload.
• Performance is improved compared with a traditional single large computer system, as the data is
located at the site with the greatest demand and the database systems can be parallelized, allowing
the load to be distributed as required.
• An organization can better structure its various departments, since the data for a department can be
fragmented and stored at that department’s location.
• Transactions can be reliable due to the replication of the database.
• They are fault-tolerant: the failure of a single site does not affect the entire system.
• Local or central autonomy gives the organization greater flexibility to choose
who can access what data.
• They support both OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) on
diversified systems that may have common data.
• Most organizations use various applications, and a distributed system is robust enough to allow the
same data to be used by various applications.
Disadvantages:
• While distributed systems are economical in the long run, installation can be expensive, as
setting up a distributed system effectively requires relatively high resources.
• The need to update data at every site can sometimes affect data integrity.
• The entire system can become unresponsive or slow if data is not properly distributed and work is not
handled correctly. This can also increase overheads, as personnel need to be available to
monitor the system constantly.
• Organizations wanting to convert from a centralized database to a distributed system might also face
problems due to the lack of a standard procedure, as there are no standard tools or methodologies for the conversion.
Conclusion
As far as database management systems go, a distributed database management system is what many
organizations opt for. Given its several advantages and use cases, anyone looking for
a database management system should consider a distributed system. Also, various cloud solution
providers like Microsoft Azure, AWS and Google are already offering these systems at one-tenth of the cost of
centralized or even some decentralized systems, making it easier for businesses and organizations to switch to
distributed systems.
In the validation phase, this algorithm ensures serializability by following three rules. They are −
Rule 1 − Consider two transactions Ti and Tj. If Ti is reading a data item that Tj is writing,
the commit phase of Tj cannot overlap with the execution phase of Ti. In other words, Tj can start its commit only once
Ti has finished executing.
Rule 2 − Consider two transactions Ti and Tj. If Ti is writing a data item that Tj is reading, the execution
phase of Tj cannot overlap with the commit phase of Ti. Tj can start executing only once Ti has
committed.
Rule 3 − Consider two transactions Ti and Tj. If Ti is writing a data item that Tj is also writing, then Ti’s
commit phase cannot overlap with Tj’s commit phase. Tj can start to commit only after Ti has already committed.
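The three rules can be written down as a small Python check. This is only a sketch under stated assumptions: each phase is modelled as a numeric time interval, the rules are applied to the ordered pair (Ti, Tj) exactly as worded above, and the names Txn and violated_rules are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    reads: set           # data items the transaction reads
    writes: set          # data items the transaction writes
    exec_start: float    # execution phase interval
    exec_end: float
    commit_start: float  # commit phase interval
    commit_end: float


def violated_rules(ti, tj):
    """Return the numbers of the rules broken by the ordered pair (Ti, Tj)."""
    broken = []
    # Rule 1: Ti reads what Tj writes -> Tj may commit only after Ti has executed.
    if ti.reads & tj.writes and tj.commit_start < ti.exec_end:
        broken.append(1)
    # Rule 2: Ti writes what Tj reads -> Tj may execute only after Ti has committed.
    if ti.writes & tj.reads and tj.exec_start < ti.commit_end:
        broken.append(2)
    # Rule 3: both write the same item -> Tj may commit only after Ti has committed.
    if ti.writes & tj.writes and tj.commit_start < ti.commit_end:
        broken.append(3)
    return broken


ti = Txn("Ti", {"x"}, {"y"}, 0, 5, 5, 6)
tj_ok = Txn("Tj", {"y"}, {"x", "y"}, 6, 8, 8, 9)     # starts after Ti commits
tj_bad = Txn("Tj", {"y"}, {"x", "y"}, 2, 4, 4, 5.5)  # overlaps with Ti
print(violated_rules(ti, tj_ok), violated_rules(ti, tj_bad))
```

A transaction that breaks any of the rules fails validation and would have to be restarted rather than committed.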
Q # 3: Why is distributed database system said to be scalable?
Answer:
Scalability:
At a high level, both scalability and elasticity help to improve availability and performance when demand is changing,
especially when changes are unpredictable. If the data is not available, applications cannot run. If applications cannot run or
run slowly, the company loses business. Therefore, it is important to be able to ensure that databases are kept online and
operational.
Scalability refers to the capability of a system to handle a growing amount of work, or its potential to perform more total
work in the same elapsed time when processing power is expanded to accommodate growth. A system is said to be scalable
if it can increase its workload and throughput when additional resources are added.
There are two broad categories for scaling database systems: vertical scaling and horizontal scaling.
Vertical scaling, also known as scaling up, is the process of adding resources, such as memory or more powerful CPUs, to an
existing server. Removing memory or changing to a less powerful CPU is known as scaling down.
Adding or replacing resources to a system typically results in performance gains, but realizing such gains often requires
reconfiguration and downtime. Furthermore, there are limitations to the number of additional resources that can be applied to
a single system, as well as to the software that uses the system.
Horizontal scaling, sometimes referred to as scaling out, is the process of adding more hardware to a system. This typically
means adding nodes (new servers) to an existing system. Doing the opposite, that is, removing hardware, is known as scaling
in.
With the cost of hardware declining, it makes more sense to adopt horizontal scaling using low-cost "commodity" systems
for tasks that previously required larger computers, such as mainframes. Of course, horizontal scaling can be limited by the
capability of software to exploit networked computer resources and by other technical constraints. Keep in mind that
traditional database servers typically cannot run on more than a few machines. In such cases, scaling is limited, in that you are scaling
to several machines, not to 100x or more.
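As a rough Python sketch of why adding nodes helps, the rows of a table can be spread over the nodes by hashing a key, so each node stores and serves a smaller share as nodes are added. Hash partitioning, the function names and the row keys are assumptions made for this illustration; the text above does not prescribe a placement scheme.

```python
import hashlib
from collections import Counter


def node_for(key, node_count):
    """Map a row key to a node index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(keys, node_count):
    """Count how many of the given keys each node would store."""
    return Counter(node_for(key, node_count) for key in keys)


keys = [f"row-{i}" for i in range(1000)]  # made-up row keys
before = rows_per_node(keys, 2)  # two nodes
after = rows_per_node(keys, 8)   # scaled out to eight nodes
print(max(before.values()), max(after.values()))
```

With simple modulo placement most rows move to a different node when the node count changes, which is one of the software limits on horizontal scaling mentioned above.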
Case Study
By taking any real time situation, explain how a participating node performs its recovery when it
fails during the processing of a transaction?
Hospital Management System
Aim:
XYZ hospital is a multi-specialty hospital that includes a number of departments, rooms, doctors, nurses, compounders, and
other staff working in the hospital. Patients having different kinds of ailments come to the hospital and get a checkup done
from the concerned doctors. If required, they are admitted to the hospital and discharged after treatment.
The aim of this case study is to design and develop a database for the hospital to maintain the records of various departments,
rooms, and doctors in the hospital. It also maintains records of the regular patients, patients admitted to the hospital, the
checkups of patients done by the doctors, the patients that have been operated on, and patients discharged from the hospital.
Description: In the hospital, there are many departments like Orthopedic, Pathology, Emergency, Dental, Gynecology,
Anesthetics, I.C.U., Blood Bank, Operation Theater, Laboratory, M.R.I., Neurology, Cardiology, Cancer Department, Corpse, etc.
There is an OPD where patients come and get a card (that is, the entry card of the patient) for a checkup from the concerned doctor.
After making an entry in the card, they go to the concerned doctor’s room and the doctor checks their ailments. According to
the ailments, the doctor either prescribes medicine or admits the patient to the concerned department. The patient may choose
either a private or a general room according to his/her need. But before getting admission to the hospital, the patient has to fulfill
certain formalities of the hospital like room charges, etc. After the treatment is completed, the doctor discharges the patient.
Before being discharged from the hospital, the patient again has to complete certain formalities of the hospital like balance
charges, test charges, operation charges (if any), blood charges, doctors’ charges, etc.
Next we talk about the doctors of the hospital. There are two types of doctors in the hospital, namely, regular doctors and
on-call doctors. Regular doctors are those doctors who come to the hospital daily. On-call doctors are those doctors who are
called by the hospital if the concerned doctor is not available.
Table Description:
Following are the tables along with constraints used in Hospital Management database.
1. DEPARTMENT:
This table consists of details about the various departments in the hospital. The information stored in this table includes
department name, department location, and facilities available in that department.
Constraint: Department name will be unique for each department.
2. ALL_DOCTORS:
This table stores information about all the doctors working for the hospital and the departments they are associated with.
Each doctor is given an identity number starting with DR or DC prefixes only.
Constraint: Identity number is unique for each doctor and the corresponding
department should exist in DEPARTMENT table.
3. DOC_REG:
This table stores details of regular doctors working in the hospital. Doctors are referred to by their doctor number. This table
also stores personal details of doctors like name, qualification, address, phone number, salary, date of joining, etc.
Constraint: Doctor’s number entered should contain DR only as a prefix and must exist in ALL_DOCTORS table.
4. DOC_ON_CALL:
This table stores details of doctors called by the hospital when additional doctors are required. Doctors are referred to by their
doctor number. Other personal details like name, qualification, fees per call, payment due, address, phone number, etc., are
also stored.
Constraint: Doctor’s number entered should contain DC only as a prefix and must exist in ALL_DOCTORS table.
5. PAT_ENTRY:
The record in this table is created when any patient arrives in the hospital for a checkup. When a patient arrives, a patient
number is generated which acts as a primary key. Other details like name, age, sex, address, city, phone number, entry date,
name of the doctor referred to, diagnosis, and department name are also stored. After the necessary details are stored, the patient is
sent to the doctor for a checkup.
Constraint: Patient number should begin with prefix PT. Sex should be M or F only.
Doctor’s name and department referred must exist.
6. PAT_CHKUP:
This table stores the details about the patients who get treatment from the doctor referred to. Details like patient number from
the patient entry table, doctor number, date of checkup, diagnosis, and treatment are stored. One more field, status, is used to
indicate whether the patient is admitted, referred for operation, or is a regular patient of the hospital. If the patient is admitted,
further details are stored in the PAT_ADMIT table. If the patient is referred for operation, the further details are stored in the
PAT_OPR table, and if the patient is a regular patient of the hospital, the further details are stored in the PAT_REG table.
Constraint: Patient number should exist in PAT_ENTRY table and it should be unique.
7. PAT_ADMIT:
When a patient is admitted, his/her related details are stored in this table. Information stored includes patient number,
advance payment, mode of payment, room number, department, date of admission, initial condition, diagnosis, treatment,
number of the doctor under whom treatment is done, attendant name, etc.
Constraint: Patient number should exist in PAT_ENTRY table. Department, doctor number, room number must be valid.
8. PAT_DIS:
An entry is made in this table whenever a patient gets discharged from the hospital. Each entry includes details like patient
number, treatment given, treatment advice, payment made, mode of payment, date of discharge, etc.
Constraint: Patient number should exist in PAT_ENTRY table.
9. PAT_REG:
Details of regular patients are stored in this table. Information stored includes date of visit, diagnosis, treatment, medicine
recommended, status of treatment, etc.
Constraint: Patient number should exist in the patient entry table. There can be multiple entries for one patient, as the patient might
be visiting the hospital repeatedly for checkups and there will be an entry for each of the patient’s visits.
10. PAT_OPR:
If a patient is operated on in the hospital, his/her details are stored in this table. Information stored includes patient number,
date of admission, date of operation, number of the doctor who conducted the operation, number of the operation theater in
which the operation was carried out, type of operation, patient’s condition before and after the operation, treatment advice, etc.
Constraint: Patient number should exist in PAT_ENTRY table. Department, doctor number should exist or should be valid.
11. ROOM_DETAILS:
It contains details of all rooms in the hospital. The details stored in this table include room number, room type (general or
private), status (whether occupied or not), and, if occupied, the patient number, patient name, charges per day, etc.
Constraint: Room number should be unique. Room type can only be G or P and status can only be Y or N.
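A few of the constraints above can be sketched as table definitions, here run through Python's sqlite3 module. The column names (D_NAME, DOC_NO, ROOM_TYPE and so on) and the sample rows are assumptions made for illustration; only the table names and the constraints come from the description, and only four of the eleven tables are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE DEPARTMENT (
    D_NAME     TEXT PRIMARY KEY,  -- department name is unique
    D_LOCATION TEXT,
    FACILITIES TEXT
);
CREATE TABLE ALL_DOCTORS (
    DOC_NO     TEXT PRIMARY KEY   -- identity number is unique, prefix DR or DC
               CHECK (substr(DOC_NO, 1, 2) IN ('DR', 'DC')),
    DEPARTMENT TEXT NOT NULL REFERENCES DEPARTMENT (D_NAME)
);
CREATE TABLE DOC_REG (
    DOC_NO   TEXT PRIMARY KEY     -- regular doctors: prefix DR only
             CHECK (substr(DOC_NO, 1, 2) = 'DR')
             REFERENCES ALL_DOCTORS (DOC_NO),
    DOC_NAME TEXT
);
CREATE TABLE ROOM_DETAILS (
    ROOM_NO   TEXT PRIMARY KEY,   -- room number is unique
    ROOM_TYPE TEXT CHECK (ROOM_TYPE IN ('G', 'P')),
    STATUS    TEXT CHECK (STATUS IN ('Y', 'N'))
);
""")


def rejected(sql):
    """Return True if the statement is refused by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


conn.execute("INSERT INTO DEPARTMENT VALUES ('Cardiology', 'Block A', 'ECG')")
conn.execute("INSERT INTO ALL_DOCTORS VALUES ('DR001', 'Cardiology')")
print(rejected("INSERT INTO ALL_DOCTORS VALUES ('XX002', 'Cardiology')"))  # bad prefix
print(rejected("INSERT INTO ALL_DOCTORS VALUES ('DC003', 'Neurology')"))   # no such department
```

The patient tables would follow the same pattern, with the patient number referencing PAT_ENTRY.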
E‐R Diagram