
Unit 4 DBMS

DBMS for School

Uploaded by srnarayanan_slm
Copyright © All Rights Reserved

Chapter 4: Distributed Database System

This chapter introduces the concept of DDBMS. In a distributed database, there are a
number of databases that may be geographically distributed all over the world. A distributed
DBMS manages the distributed database in a manner so that it appears as one single database to
users. In the later part of the chapter, we go on to study the factors that lead to distributed
databases, and their advantages and disadvantages.
A distributed database is a collection of multiple interconnected databases, which are spread
physically across various locations that communicate via a computer network.
 Databases in the collection are logically interrelated with each other. Often they represent a
single logical database.
 Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
 The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
 Transaction servers are widely used in relational and distributed database systems.
 A distributed database is not a loosely connected file system.
 A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system. A transaction processing system is also called a TP
monitor.
 A global wait-for graph is used for handling deadlocks in a distributed database.
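To make the last point concrete, here is a minimal sketch of deadlock detection with a global wait-for graph. It assumes (hypothetically) that each site reports its local edges as (waiting transaction, holding transaction) pairs; the function names and input format are illustrative, not taken from any particular DDBMS. A cycle in the merged graph signals a deadlock even when no single site's local graph contains one.

```python
from collections import defaultdict


def build_global_wfg(local_graphs):
    """Merge each site's local wait-for edges into one global wait-for graph.

    local_graphs maps a site name to a list of (waiting_txn, holding_txn) edges.
    """
    graph = defaultdict(set)
    for edges in local_graphs.values():
        for waiter, holder in edges:
            graph[waiter].add(holder)
    return graph


def find_deadlock(graph):
    """Return the transactions on one cycle of the graph, or None if it is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = defaultdict(int)
    path = []

    def dfs(txn):
        color[txn] = GREY
        path.append(txn)
        for nxt in sorted(graph.get(txn, ())):
            if color[nxt] == GREY:  # edge back into the current path: a cycle
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(graph):
        if color[txn] == WHITE:
            cycle = dfs(txn)
            if cycle:
                return cycle
    return None


# No single site sees a cycle, but the merged global graph does.
sites = {
    "S1": [("T1", "T2")],  # at S1, T1 waits for a lock held by T2
    "S2": [("T2", "T3")],
    "S3": [("T3", "T1")],
}
print(find_deadlock(build_global_wfg(sites)))  # ['T1', 'T2', 'T3']
```

In practice one site (or a designated coordinator) collects the local graphs periodically; once a cycle is found, one transaction on it is chosen as the victim and rolled back.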

4.1 Characteristics of Distributed Database Management System

A distributed database management system (DDBMS) is a centralized software system
that manages a distributed database in a manner as if it were all stored in a single location.
 A collection of logically related shared data;
 The data is split into a number of fragments;
 Fragments may be replicated;
 Fragments/replicas are allocated to sites;
 The sites are linked by a communications network, but the sites of a distributed database do
not necessarily commit at exactly the same instant;
 The data at each site is under the control of a DBMS;
 The DBMS at each site can handle local applications, autonomously;
 Each DBMS participates in at least one global application.
It is not necessary for every site in the system to have its own local database.

4.2 Factors Encouraging DDBMS

The following factors encourage moving over to DDBMS −


 Distributed Nature of Organizational Units − Most organizations in the current times are
subdivided into multiple units that are physically distributed over the globe. Each unit
requires its own set of local data. Thus, the overall database of the organization becomes
distributed. One possible setup is an autonomous homogeneous environment, in which the same
DBMS runs at each node and each DBMS works independently.
 Need for Sharing of Data − The multiple organizational units often need to communicate
with each other and share their data and resources. This demands common databases or
replicated databases that should be used in a synchronized manner.
Example: In a distributed database, in-doubt transactions are those for which a <ready T>
record is found in the log file, but neither a <commit T> record nor an <abort T> record
is found.

 Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP) work upon diversified systems which may have common
data. Distributed database systems aid both kinds of processing by providing synchronized
data.
 Database Recovery − One of the common techniques used in DDBMS is replication of
data across different sites. Replication of data automatically helps in data recovery if
database in any site is damaged. Users can access data from other sites while the damaged
site is being reconstructed. Thus, database failure may become almost inconspicuous to
users.
Example: Suppose that in a distributed database, one of the sites, say S1, fails during a
transaction T1. When S1 recovers, it has to check its log file (log-based recovery) to decide the
next move on transaction T1. If the log contains a <commit T1> record, then S1 has to perform
a redo(T1) operation.
 Support for Multiple Application Software − Most organizations use a variety of
application software each with its specific database support. DDBMS provides a uniform
functionality for using the same data among different platforms.
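The two log-based examples above (in-doubt transactions and the recovering site S1) follow one decision rule. The sketch below encodes the usual textbook rule for a site recovering under two-phase commit; the string log format and the function name are hypothetical simplifications.

```python
def recovery_action(log, txn):
    """Decide what a recovering site does with transaction txn.

    log is the site's list of log records, written here as strings such as
    "<ready T1>" or "<commit T1>" (a simplified, hypothetical format).
    """
    records = set(log)
    if f"<commit {txn}>" in records:
        return "redo"      # T committed: reapply its updates
    if f"<abort {txn}>" in records:
        return "undo"      # T aborted: roll back its updates
    if f"<ready {txn}>" in records:
        return "in-doubt"  # voted ready but outcome unknown: ask the coordinator
    return "undo"          # failed before voting, so the coordinator must have aborted T


print(recovery_action(["<ready T1>", "<commit T1>"], "T1"))  # redo
print(recovery_action(["<ready T2>"], "T2"))                 # in-doubt
```

Only the in-doubt case needs other sites: the recovering site must learn the outcome from the coordinator (or another participant) before it can redo or undo T.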

4.3 Advantages and Disadvantages of DDBMS

Advantages of Distributed Databases

1. Data are located near the greatest demand site. The data in a distributed database system are
dispersed to match business requirements, which reduces the cost of data access.

2. Faster data access. End users often work with only a locally stored subset of the company’s
data.

3. Faster data processing. A distributed database system spreads out the system's workload by
processing data at several sites.

4. Growth facilitation. New sites can be added to the network without affecting the operations
of other sites.

5. Improved communications. Because local sites are smaller and located closer to customers,
local sites foster better communication among departments and between customers and
company staff.

6. Reduced operating costs. It is more cost-effective to add workstations to a network than to
update a mainframe system. Development work is done more cheaply and more quickly on
low-cost PCs than on mainframes.

7. User-friendly interface. PCs and workstations are usually equipped with an easy-to-use
graphical user interface (GUI). The GUI simplifies training and use for end users.

8. Less danger of a single-point failure. When one of the computers fails, the workload is
picked up by other workstations. Data are also distributed at multiple sites.

9. Processor independence. The end user is able to access any available copy of the data, and
an end user's request is processed by any processor at the data location.

Disadvantages of DDBMS:
1. Complexity of management and control. Applications must recognize data location, and
they must be able to stitch together data from various sites. Database administrators must have
the ability to coordinate database activities to prevent database degradation due to data
anomalies.

2. Technological difficulty. Data integrity, transaction management, concurrency control,
security, backup, recovery, query optimization, access path selection, and so on, must all be
addressed and resolved.

3. Security. The probability of security lapses increases when data are located at multiple
sites. The responsibility of data management will be shared by different people at several
sites.

4. Lack of standards. There are no standard communication protocols at the database level.
(Although TCP/IP is the de facto standard at the network level, there is no standard at the
application level.) For example, different database vendors employ different—and often
incompatible—techniques to manage the distribution of data and processing in a DDBMS
environment.

5. Increased storage and infrastructure requirements. Multiple copies of data are required at
different sites, thus requiring additional disk storage space.

6. Increased training cost. Training costs are generally higher in a distributed model than they
would be in a centralized model, sometimes even to the extent of offsetting operational and
hardware savings.

7. Costs. Distributed databases require duplicated infrastructure to operate (physical location,
environment, personnel, software, licensing, etc.).

4.4 Difference between Centralized Database and Distributed Database

Centralized Database:

A centralized database is a type of database that is stored, located and maintained at a single
location only, which is why centralized and distributed databases are not the same. This type of
database is modified and managed from that location itself, which is typically a database server
or a central computer system. The centralized location is accessed via a network connection
(LAN, WAN, etc.). Centralized databases are mainly used by institutions and organizations.
Advantages:
 Since all data is stored at a single location, it is easier to access and coordinate the data.
 The centralized database has very minimal data redundancy since all data is stored in a single
place.
 It is generally cheaper than other types of databases.
Note: modular growth is an advantage that a distributed database has over a centralized
database, and the two-phase commit protocol, whose real use is to guarantee atomicity, is a
feature of distributed databases.


Disadvantages:
 Data traffic is higher in a centralized database, because every request goes to the same site.
 If the central system fails, the entire data may be lost or become unavailable.

Distributed Database:

A distributed database is a type of database which consists of multiple databases
that are connected with each other and are spread across different physical locations. The data
stored at each physical location can be managed independently of the other locations, and the
databases at different locations communicate over a computer network. When commit protocols
are used to handle atomicity, the distributed database system may enter a situation called the
blocking problem, which can be avoided by using the three-phase commit protocol.
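The blocking problem is easiest to see in a small model of two-phase commit (2PC). The sketch below is a simplified, single-process illustration: participants are hypothetical callables that return a vote, and a missing reply stands for a timeout. It shows the coordinator's all-or-nothing decision and why a participant that has already voted READY cannot decide on its own.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


def two_phase_commit(participants):
    """Coordinator side of 2PC for one transaction T.

    participants maps a site name to a callable that handles <prepare T> and
    returns Vote.READY, Vote.ABORT, or None (no reply before the timeout).
    """
    # Phase 1: send <prepare T> to every participant and collect the votes.
    votes = {site: prepare() for site, prepare in participants.items()}
    # Phase 2: commit only if every participant voted READY; a missing vote
    # is treated like an ABORT vote.
    if all(vote is Vote.READY for vote in votes.values()):
        return "commit"
    return "abort"


def participant_on_timeout(state):
    """What a participant can do alone when the coordinator stops responding."""
    if state == "initial":  # has not voted yet, so it may abort unilaterally
        return "abort"
    if state == "ready":    # voted READY: must wait for the decision (blocking)
        return "blocked"
    return state            # already "commit" or "abort": nothing to decide


print(two_phase_commit({"S1": lambda: Vote.READY, "S2": lambda: Vote.READY}))  # commit
print(two_phase_commit({"S1": lambda: Vote.READY, "S2": lambda: None}))        # abort
```

The "blocked" branch is the blocking problem: a READY participant holds its locks until it learns the outcome. Three-phase commit adds a pre-commit phase so that operational sites can reach a decision without the failed coordinator.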
Advantages:
 This database can be easily expanded as data is already spread across different physical
locations.
 The biased protocol for concurrency control, in which a read needs a shared lock on only one
replica while a write needs exclusive locks on all replicas, is suitable for an application where
read operations are much more frequent than write operations.
 The distributed database can easily be accessed from different networks.
 Transparent management of distributed, fragmented, and replicated data is one of the promises
of distributed databases.
 This database is more secure in comparison to a centralized database.
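A minimal single-threaded sketch of the biased protocol mentioned above. Each replica has its own lock table (a hypothetical in-memory class); a read lock is requested at one replica, while a write lock must be granted at every replica. Real lock managers also queue waiting requests, which is omitted here.

```python
class ReplicaLocks:
    """Lock table for one replica of a data item (single-threaded sketch)."""

    def __init__(self):
        self.shared = set()    # transactions holding shared (read) locks
        self.exclusive = None  # transaction holding the exclusive (write) lock

    def lock_shared(self, txn):
        if self.exclusive not in (None, txn):
            return False
        self.shared.add(txn)
        return True

    def can_lock_exclusive(self, txn):
        return self.exclusive in (None, txn) and not (self.shared - {txn})

    def release(self, txn):
        self.shared.discard(txn)
        if self.exclusive == txn:
            self.exclusive = None


def biased_read_lock(replicas, txn):
    """Shared lock: request it at just one site that holds a replica."""
    for site, locks in replicas.items():
        if locks.lock_shared(txn):
            return site
    return None


def biased_write_lock(replicas, txn):
    """Exclusive lock: it must be granted at every site holding a replica."""
    if not all(locks.can_lock_exclusive(txn) for locks in replicas.values()):
        return False  # all-or-nothing: grant nothing if any replica conflicts
    for locks in replicas.values():
        locks.exclusive = txn
    return True


replicas = {"S1": ReplicaLocks(), "S2": ReplicaLocks(), "S3": ReplicaLocks()}
print(biased_read_lock(replicas, "T1"))   # S1: one replica suffices for a read
print(biased_write_lock(replicas, "T2"))  # False: T1's shared lock at S1 conflicts
```

Reads touch one site and writes touch all of them, which is why the protocol pays off when reads dominate.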
Disadvantages:
 This database is very costly and is difficult to maintain because of its complexity.
 In this database, it is difficult to provide a uniform view to users since it is spread across
different physical locations.

4.5 Distributed DBMS - Environments


Distributed databases can be classified into homogeneous and heterogeneous databases
having further divisions. The shared-nothing parallel database architecture is the one mainly
used by distributed database systems. The next section of this chapter discusses the distributed
architectures, namely client-server, peer-to-peer and multi-DBMS. Finally, the different
design alternatives like replication and fragmentation are introduced. The read-one,
write-all-available protocol is used to increase availability and robustness in a distributed
database system.
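A minimal sketch of the read-one, write-all-available idea, assuming hypothetical in-memory replicas with an availability flag. A read is served by any single available replica; a write goes to every replica that is currently up, so a site failure does not stop updates.

```python
class Replica:
    """One site's copy of the replicated data (hypothetical, in-memory)."""

    def __init__(self, site):
        self.site = site
        self.available = True
        self.data = {}


def rowa_read(replicas, key):
    """Read one: any single available replica can serve the read."""
    for replica in replicas:
        if replica.available:
            return replica.data.get(key)
    raise RuntimeError("no replica is available")


def rowa_write(replicas, key, value):
    """Write all available: update every replica that is currently up."""
    up = [replica for replica in replicas if replica.available]
    if not up:
        raise RuntimeError("no replica is available")
    for replica in up:
        replica.data[key] = value
    return [replica.site for replica in up]


replicas = [Replica("S1"), Replica("S2"), Replica("S3")]
replicas[1].available = False         # S2 is down
print(rowa_write(replicas, "x", 10))  # ['S1', 'S3']: the write still succeeds
replicas[0].available = False         # now S1 fails too
print(rowa_read(replicas, "x"))       # 10, served by S3
```

The price of this availability is that S2 missed the write; before a recovering site serves reads again it has to be brought up to date.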

Types of Distributed Databases

Distributed databases can be broadly classified into homogeneous and heterogeneous
distributed database environments, each with further sub-divisions, as shown in the following
illustration.
Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating systems.
Its properties are −
 The sites use very similar software.
 The sites use identical DBMS or DBMS from the same vendor.
 Each site is aware of all other sites and cooperates with other sites to process user requests.
 The database is accessed through a single interface as if it were a single database.
 All accesses to the database go through the DB buffer manager.
Types of Homogeneous Distributed Database
There are two types of homogeneous distributed database −
 Autonomous − Each database is independent and functions on its own. They are integrated
by a controlling application and use message passing to share data updates.
 Non-autonomous − Data is distributed across the homogeneous nodes and a central or
master DBMS co-ordinates data updates across the sites.
Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have different operating systems,
DBMS products and data models. Its properties are −
 Different sites use dissimilar schemas and software.
 The system may be composed of a variety of DBMSs like relational, network, hierarchical
or object oriented.
 The local recovery manager (LRM), which implements the local reliability protocol, resides within the data processor.
 Query processing is complex due to dissimilar schemas.
 Transaction processing is complex due to dissimilar software.
 A site may not be aware of other sites and so there is limited co-operation in processing
user requests.

Types of Heterogeneous Distributed Databases


 Federated − The heterogeneous database systems are independent in nature and integrated
together so that they function as a single database system. A federated database is a system
in which several databases appear to function as a single entity.
 Each component database in the system is completely self-sustained and functional. Within
a federated system, a single SQL statement can access data that is distributed among several
data sources. For example, a single SQL statement can join data that is located in a Db2®
table, an Oracle table, and an XML tagged file.
 Un-federated − The database systems employ a central coordinating module through
which the databases are accessed.

4.6 Distributed DBMS Architectures

DDBMS architectures are generally developed depending on three parameters −


 Distribution − It states the physical distribution of data across the different sites. A DDB
does not use a multiprocessor configuration; each site has its own processor.
 Autonomy − It indicates the distribution of control of the database system and the degree to
which each constituent DBMS can operate independently.
 Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system
components and databases.
Architectural Models

Some of the common architectural models are −


 Client - Server Architecture for DDBMS
 Peer - to - Peer Architecture for DDBMS
 Multi - DBMS Architecture

Client - Server Architecture for DDBMS


This is a two-level architecture where the functionality is divided into servers and clients.
The server functions primarily encompass data management, query processing, optimization and
transaction management. Client functions mainly include the user interface. However, clients also
have some functions like consistency checking and transaction management.
The two different client - server architectures are −
 Single Server Multiple Client
 Multiple Server Multiple Client (shown in the following diagram)

Peer- to-Peer Architecture for DDBMS


In these systems, each peer acts both as a client and a server for imparting database services. The
peers share their resources with other peers and co-ordinate their activities.
This architecture generally has four levels of schemas −
 Global Conceptual Schema − Depicts the global logical view of data.
 Local Conceptual Schema − Depicts logical data organization at each site.
 Local Internal Schema − Depicts physical data organization at each site.
 External Schema − Depicts user view of data.
Multi - DBMS Architectures
This is an integrated database system formed by a collection of two or more autonomous
database systems.
Multi-DBMS can be expressed through six levels of schemas −
 Multi-database View Level − Depicts multiple user views comprising subsets of the
integrated distributed database. (The lock mode a transaction requests when it wants to edit a
data item is called exclusive mode.)
 Multi-database Conceptual Level − Depicts integrated multi-database that comprises
global logical multi-database structure definitions.
 Multi-database Internal Level − Depicts the data distribution across different sites and
multi-database to local data mapping.
 Local database View Level − Depicts public view of local data.
 Local database Conceptual Level − Depicts local data organization at each site.
 Local database Internal Level − Depicts physical data organization at each site.
There are two design alternatives for multi-DBMS −
 Model with multi-database conceptual level.
 Model without multi-database conceptual level.
4.7 Design Alternatives

The distribution design alternatives for the tables in a DDBMS are as follows −
 Non-replicated and non-fragmented
 Fully replicated
 Partially replicated
 Fragmented
 Mixed
Non-replicated & Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed so
that it is at a close proximity to the site where it is used most. It is most suitable for database
systems where the percentage of queries needed to join information in tables placed at different
sites is low. If an appropriate distribution strategy is adopted, then this design alternative helps to
reduce the communication cost during data processing.
Fully Replicated
In this design alternative, at each site, one copy of all the database tables is stored. Since
each site has its own copy of the entire database, queries are very fast, requiring negligible
communication cost. On the contrary, the massive redundancy in data requires huge cost during
update operations. Hence, this is suitable for systems where a large number of queries is required
to be handled whereas the number of database updates is low.
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the
tables is done in accordance with the frequency of access. This takes into consideration the fact
that the frequency of accessing the tables varies considerably from site to site. The number of
copies of the tables (or portions) depends on how frequently the access queries execute and on
the sites which generate the access queries. (Note: a transaction enters the partially committed
state after its final statement has been executed, and waits there until it is committed.)

Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or
partitions, and each fragment can be stored at different sites. This considers the fact that it
seldom happens that all data stored in a table is required at a given site. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of each
fragment in the system, i.e. no redundant data.
The three fragmentation techniques are −
 Vertical fragmentation
 Horizontal fragmentation
 Hybrid fragmentation
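The three techniques can be sketched on a small in-memory table. The EMPLOYEE rows and helper names below are hypothetical. Horizontal fragments split the rows and are recombined by union; vertical fragments split the columns, each keeping the primary key so the table can be rebuilt by a join; a hybrid fragment applies one split after the other.

```python
# Hypothetical EMPLOYEE table, stored as a list of rows.
EMPLOYEE = [
    {"emp_id": 1, "name": "Asha", "branch": "Chennai", "salary": 52000},
    {"emp_id": 2, "name": "Ravi", "branch": "Mumbai", "salary": 48000},
    {"emp_id": 3, "name": "Meena", "branch": "Chennai", "salary": 61000},
]


def horizontal_fragment(rows, predicate):
    """Horizontal fragmentation: keep the rows (tuples) that satisfy predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, columns):
    """Vertical fragmentation: keep some columns, always including the key."""
    return [{col: row[col] for col in (key, *columns)} for row in rows]


def join_on_key(left, right, key):
    """Rebuild a table from two vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal: one fragment per branch; the union of the fragments is the table.
chennai = horizontal_fragment(EMPLOYEE, lambda r: r["branch"] == "Chennai")
others = horizontal_fragment(EMPLOYEE, lambda r: r["branch"] != "Chennai")

# Vertical: personal details and payroll stored apart, both carrying emp_id.
personal = vertical_fragment(EMPLOYEE, "emp_id", ["name", "branch"])
payroll = vertical_fragment(EMPLOYEE, "emp_id", ["salary"])

# Hybrid: a vertical fragment of a horizontal fragment (Chennai salaries only).
chennai_payroll = vertical_fragment(chennai, "emp_id", ["salary"])
```

Apart from the repeated key in vertical fragments, no data is duplicated, which matches the "only one copy of each fragment" property described above.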

Mixed Distribution
This is a combination of fragmentation and partial replications. Here, the tables are
initially fragmented in any form (horizontal or vertical), and then these fragments are partially
replicated across the different sites according to the frequency of accessing the fragments.
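The replication step of mixed distribution needs some allocation rule. The sketch below uses a deliberately simple, hypothetical rule: a fragment is copied to every site whose access count reaches a threshold, and if no site qualifies, the busiest site keeps the only copy. Real allocation methods also weigh update and storage costs.

```python
def allocate_replicas(access_counts, threshold):
    """Choose the sites that receive a copy of each fragment.

    access_counts maps fragment -> {site: number of accesses}. A site gets a
    replica if its count reaches threshold; if no site does, the single busiest
    site keeps the only copy, so every fragment is stored somewhere.
    """
    allocation = {}
    for fragment, counts in access_counts.items():
        sites = sorted(site for site, n in counts.items() if n >= threshold)
        if not sites:
            sites = [max(counts, key=counts.get)]
        allocation[fragment] = sites
    return allocation


counts = {
    "EMP_CHENNAI": {"S1": 120, "S2": 5, "S3": 40},  # hypothetical access statistics
    "EMP_PAYROLL": {"S1": 3, "S2": 8, "S3": 2},
}
print(allocate_replicas(counts, threshold=30))
# {'EMP_CHENNAI': ['S1', 'S3'], 'EMP_PAYROLL': ['S2']}
```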
4.8 Distributed DBMS - Database Control
Database control refers to the task of enforcing regulations so as to provide correct data
to authentic users and applications of a database. In order that correct data is available to users,
all data should conform to the integrity constraints defined in the database. Besides, data should
be screened away from unauthorized users so as to maintain security and privacy of the database.
Database control is one of the primary tasks of the database administrator (DBA). Data security
includes data protection and access control.
The three dimensions of database control are −
 Authentication
 Access rights
 Integrity constraints

Authentication

In a distributed database system, authentication is the process through which only legitimate
users can gain access to the data resources.
Authentication can be enforced in two levels −
 Controlling Access to Client Computer − At this level, user access is restricted while logging
in to the client computer that provides the user interface to the database server. The most
common method is a username/password combination. However, more sophisticated
methods like biometric authentication may be used for high-security data. (Separately, note
that the part of the database held in main-memory buffers is lost as a result of a system failure.)

 Controlling Access to the Database Software − At this level, the database
software/administrator assigns some credentials to the user. The user gains access to the
database using these credentials. One of the methods is to create a login account within the
database server. (Batch transactions access a larger portion of the database than online
transactions do.)
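When credentials are a username/password pair stored in a login account, the password itself should not be kept. The sketch below shows one common approach, a salted PBKDF2 hash from Python's standard library, checked with a constant-time comparison. The account store, function names and iteration count are illustrative assumptions; each DBMS has its own built-in mechanism.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # PBKDF2 work factor (an illustrative choice)


def create_login(accounts, username, password):
    """Store a salted PBKDF2 hash of the password, never the password itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    accounts[username] = (salt, digest)


def authenticate(accounts, username, password):
    """Recompute the hash with the stored salt and compare in constant time."""
    if username not in accounts:
        return False
    salt, digest = accounts[username]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


accounts = {}  # hypothetical in-memory store of login accounts
create_login(accounts, "ABC", "s3cret")
print(authenticate(accounts, "ABC", "s3cret"))  # True
print(authenticate(accounts, "ABC", "guess"))   # False
```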

Access Rights

A user’s access rights refer to the privileges that the user is given regarding DBMS
operations such as the rights to create a table, drop a table, add/delete/update tuples in a table or
query upon the table.
In distributed environments, since there are a large number of tables and an even larger number
of users, it is not feasible to assign individual access rights to users. So, the DDBMS defines
certain roles. (Two-phase locking, by contrast, is a locking mechanism used for concurrency
control; it consists of a growing phase and a shrinking phase.)
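The two-phase rule mentioned above can be stated in a few lines: a transaction may acquire locks only during its growing phase, and its first release starts the shrinking phase, after which no new lock may be acquired. The class below is a hypothetical sketch that models only this rule, not lock conflicts between transactions.

```python
class TwoPhaseTxn:
    """Enforces the two-phase rule for one transaction's locks.

    Only the phase rule is modeled; conflicts with other transactions are not.
    """

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # False while in the growing phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name} cannot lock {item}: already in shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True  # the first release ends the growing phase
        self.held.discard(item)


t1 = TwoPhaseTxn("T1")
t1.lock("A")      # growing phase
t1.lock("B")
t1.unlock("A")    # shrinking phase starts here
try:
    t1.lock("C")  # violates 2PL
except RuntimeError as err:
    print(err)
```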
A role is a construct with certain privileges within a database system. Once the different roles
are defined, the individual users are assigned one of these roles. Often a hierarchy of roles is
defined according to the organization’s hierarchy of authority and responsibility.
For example, the following SQL statements create a role "Accountant" and then assign this role
to user "ABC".
CREATE ROLE ACCOUNTANT;
GRANT SELECT, INSERT, UPDATE ON EMP_SAL TO ACCOUNTANT;
GRANT INSERT, UPDATE, DELETE ON TENDER TO ACCOUNTANT;
GRANT INSERT, SELECT ON EXPENSE TO ACCOUNTANT;
COMMIT;
GRANT ACCOUNTANT TO ABC;
COMMIT;
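The effect of these grants can be modeled as two lookups: user to roles, and role to privileges per table. The sketch below is a hypothetical in-memory model of that check, mirroring the statements above; it is not how Oracle or any other DBMS stores its privilege catalog.

```python
# Privileges granted to each role, mirroring the GRANT statements above.
ROLE_PRIVILEGES = {
    "ACCOUNTANT": {
        "EMP_SAL": {"SELECT", "INSERT", "UPDATE"},
        "TENDER": {"INSERT", "UPDATE", "DELETE"},
        "EXPENSE": {"INSERT", "SELECT"},
    },
}

# Roles granted to each user (GRANT ACCOUNTANT TO ABC).
USER_ROLES = {"ABC": {"ACCOUNTANT"}}


def is_allowed(user, privilege, table):
    """A user may act if any of the user's roles holds the privilege on the table."""
    return any(
        privilege in ROLE_PRIVILEGES.get(role, {}).get(table, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("ABC", "DELETE", "TENDER"))   # True
print(is_allowed("ABC", "DELETE", "EMP_SAL"))  # False
```

Changing what every accountant may do then means editing one role entry instead of one entry per user, which is the point of roles in a system with many users.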

Semantic Integrity Control

Semantic integrity control defines and enforces the integrity constraints of the database system.
The integrity constraints are as follows −
 Data type integrity constraint
 Entity integrity constraint
 Referential integrity constraint

Data Type Integrity Constraint


A data type constraint restricts the range of values and the type of operations that can be
applied to the field with the specified data type. The need to update data at multiple sites
causes data integrity problems. Commit and rollback are related to data integrity.
For example, let us consider that a table "HOSTEL" has three fields - the hostel number,
hostel name and capacity. The hostel number should start with capital letter "H" and cannot be
NULL, and the capacity should not be more than 150. The following SQL command can be used
for data definition –

CREATE TABLE HOSTEL (
H_NO VARCHAR2(5) NOT NULL,
H_NAME VARCHAR2(15),
CAPACITY INTEGER,
CHECK ( H_NO LIKE 'H%'),
CHECK ( CAPACITY <= 150)
);
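To see the constraints reject bad rows, the table can be tried out in SQLite through Python's built-in sqlite3 module. This is an adaptation, not the original Oracle DDL: TEXT replaces VARCHAR2 (SQLite does not enforce the length), and the first-character test uses substr() because SQLite's LIKE is case-insensitive for ASCII letters by default, so 'h02' would otherwise pass. The sample hostel names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE HOSTEL (
        H_NO     TEXT NOT NULL,
        H_NAME   TEXT,
        CAPACITY INTEGER,
        CHECK (substr(H_NO, 1, 1) = 'H'),
        CHECK (CAPACITY <= 150)
    )
    """
)


def try_insert(h_no, h_name, capacity):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO HOSTEL VALUES (?, ?, ?)", (h_no, h_name, capacity))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("H01", "Ganga", 120))   # True
print(try_insert("h02", "Yamuna", 100))  # False: does not start with capital "H"
print(try_insert("H03", "Kaveri", 200))  # False: capacity above 150
```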

Entity Integrity Constraint


Entity integrity control enforces the rules so that each tuple can be uniquely identified
from other tuples. For this a primary key is defined. A primary key is a minimal set of fields that
can uniquely identify a tuple. Entity integrity constraint states that no two tuples in a table can
have identical values for primary keys and that no field which is a part of the primary key can
have NULL value. (In the three-phase commit protocol, a participant can time out in the initial,
ready and pre-commit states.)
For example, in the above hostel table, the hostel number can be assigned as the primary key
through the following SQL statement (ignoring the checks) −
CREATE TABLE HOSTEL (
H_NO VARCHAR2(5) PRIMARY KEY,
H_NAME VARCHAR2(15),
CAPACITY INTEGER
);
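The entity integrity rule can be checked the same way in SQLite via Python's sqlite3 module. One adaptation matters here: SQLite, unlike Oracle, accepts NULL in a PRIMARY KEY column that is not an INTEGER PRIMARY KEY, so NOT NULL is written out explicitly to get the behavior described above. The sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, unlike Oracle, otherwise accepts NULL
# in a PRIMARY KEY column that is not an INTEGER PRIMARY KEY.
conn.execute(
    """
    CREATE TABLE HOSTEL (
        H_NO     TEXT NOT NULL PRIMARY KEY,
        H_NAME   TEXT,
        CAPACITY INTEGER
    )
    """
)


def try_insert(h_no, h_name, capacity):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO HOSTEL VALUES (?, ?, ?)", (h_no, h_name, capacity))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("H01", "Ganga", 120))   # True
print(try_insert("H01", "Yamuna", 100))  # False: duplicate primary key
print(try_insert(None, "Kaveri", 80))    # False: NULL primary key
```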

Referential Integrity Constraint


Referential integrity constraint lays down the rules of foreign keys. A foreign key is a
field in a data table that is the primary key of a related table. A classification that has been
proposed is with respect to the organization of the read-write actions.
The referential integrity constraint lays down the rule that the value of the foreign key field
should either be among the values of the primary key of the referenced table or be entirely
NULL.
For example, let us consider a student table where a student may opt to live in a hostel.
To include this, the primary key of hostel table should be included as a foreign key in the student
table. The following SQL statement incorporates this −
CREATE TABLE STUDENT (
S_ROLL INTEGER PRIMARY KEY,
S_NAME VARCHAR2(25) NOT NULL,
S_COURSE VARCHAR2(10),
S_HOSTEL VARCHAR2(5) REFERENCES HOSTEL
);
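The referential rule (a foreign key value must match a primary key in the referenced table or be NULL) can be observed in SQLite through Python's sqlite3 module. Note the assumption-specific detail: SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection. As in the statement above, REFERENCES HOSTEL without a column list points at HOSTEL's primary key. Sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE HOSTEL (H_NO TEXT NOT NULL PRIMARY KEY, H_NAME TEXT, CAPACITY INTEGER)")
conn.execute(
    """
    CREATE TABLE STUDENT (
        S_ROLL   INTEGER PRIMARY KEY,
        S_NAME   TEXT NOT NULL,
        S_COURSE TEXT,
        S_HOSTEL TEXT REFERENCES HOSTEL
    )
    """
)
conn.execute("INSERT INTO HOSTEL VALUES ('H01', 'Ganga', 120)")


def add_student(roll, name, course, hostel):
    """Return True if the student row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", (roll, name, course, hostel))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_student(1, "Ravi", "BSc", "H01"))   # True: H01 exists in HOSTEL
print(add_student(2, "Meena", "BCom", None))  # True: NULL foreign key is allowed
print(add_student(3, "Arun", "BA", "H99"))    # False: no hostel H99
```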
