Unit 4 DBMS
This chapter introduces the concept of a distributed database management system (DDBMS). In a
distributed database, there are a number of databases that may be geographically distributed all
over the world. A distributed DBMS manages the distributed database so that it appears as one
single database to users. In the later part of the chapter, we go on to study the factors that lead
to distributed databases, and their advantages and disadvantages.
A distributed database is a collection of multiple interconnected databases that are spread
physically across various locations and communicate via a computer network.
Databases in the collection are logically interrelated with each other. Often they represent a
single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
Transaction servers are widely used in relational and distributed database systems.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system. A transaction processing system is also called a TP
monitor.
A global wait-for graph is used for handling deadlocks in a distributed database.
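To make the global wait-for graph concrete, here is a minimal Python sketch. It assumes each site reports its local wait-for edges as (waiting transaction, holding transaction) pairs; the site names, transaction names, and the helpers build_global_wfg and find_cycle are hypothetical and chosen only for illustration. The global graph is the union of the local graphs, and a cycle in it indicates a deadlock that no single site can see on its own.

```python
from collections import defaultdict


def build_global_wfg(local_graphs):
    """Union the local wait-for edges reported by every site."""
    global_wfg = defaultdict(set)
    for edges in local_graphs.values():
        for waiter, holder in edges:
            global_wfg[waiter].add(holder)
    return global_wfg


def find_cycle(graph):
    """Return one cycle as a list of transactions, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = defaultdict(int)
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            if colour[nxt] == GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for start in sorted(graph):
        if colour[start] == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Hypothetical local graphs: each site sees one edge and no local cycle.
local_graphs = {
    "S1": [("T1", "T2")],  # at S1, T1 waits for T2
    "S2": [("T2", "T3")],  # at S2, T2 waits for T3
    "S3": [("T3", "T1")],  # at S3, T3 waits for T1
}
global_wfg = build_global_wfg(local_graphs)
deadlock = find_cycle(global_wfg)
print(deadlock)
```

In this example none of the three sites has a cycle locally, but the combined graph contains T1, T2, T3 waiting on each other in a ring, which is why the check has to be made on the global graph.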
Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP) work on different systems that may share common
data. Distributed database systems support both kinds of processing by providing synchronized
data.
Database Recovery − One of the common techniques used in DDBMS is replication of
data across different sites. Replication of data automatically helps in data recovery if
the database at any site is damaged. Users can access data from other sites while the damaged
site is being reconstructed. Thus, a database failure may become almost unnoticeable to
users.
Example: Suppose that in a distributed database, one of the sites, say S1, fails during a
transaction T1. When it recovers, site S1 has to check its log file (log-based
recovery) to decide the next move on transaction T1. If the log contains a <commit T1>
record, then site S1 has to perform a redo operation on T1.
Support for Multiple Application Software − Most organizations use a variety of
application software each with its specific database support. DDBMS provides a uniform
functionality for using the same data among different platforms.
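The log-based recovery example given under Database Recovery can be sketched in Python as follows. Only the commit case (redo) comes from the text above. The abort, ready, and no-record cases follow the usual textbook treatment of recovery under two-phase commit and are included here as assumptions. The log format and the function name recovery_action are hypothetical.

```python
def recovery_action(log, txn):
    """Decide what a recovering site does about one transaction.

    `log` is a list of (record_kind, transaction) pairs read from the
    site's log file after it restarts.
    """
    records = {kind for kind, t in log if t == txn}
    if "commit" in records:
        return "redo"             # <commit T> found: reapply T's updates
    if "abort" in records:
        return "undo"             # <abort T> found: roll T back
    if "ready" in records:
        return "ask coordinator"  # T voted ready: its fate was decided elsewhere
    return "undo"                 # no control record: T cannot have committed


# Hypothetical log found at site S1 after it recovers.
s1_log = [("start", "T1"), ("ready", "T1"), ("commit", "T1"), ("start", "T2")]

print(recovery_action(s1_log, "T1"))
print(recovery_action(s1_log, "T2"))
```

Here T1 has a commit record, so S1 redoes it, while T2 only has a start record and is undone.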
Advantages of DDBMS:
1. Data are located near the greatest demand site. The data in a distributed database system are
dispersed to match business requirements, which reduces the cost of data access.
2. Faster data access. End users often work with only a locally stored subset of the company’s
data.
3. Faster data processing. A distributed database system spreads out the system's workload by
processing data at several sites.
4. Growth facilitation. New sites can be added to the network without affecting the operations
of other sites.
5. Improved communications. Because local sites are smaller and located closer to customers,
local sites foster better communication among departments and between customers and
company staff.
7. User-friendly interface. PCs and workstations are usually equipped with an easy-to-use
graphical user interface (GUI). The GUI simplifies training and use for end users.
8. Less danger of a single-point failure. When one of the computers fails, the workload is
picked up by other workstations. Data are also distributed at multiple sites.
9. Processor independence. The end user is able to access any available copy of the data, and
an end user's request is processed by any processor at the data location.
Disadvantages of DDBMS:
1. Complexity of management and control. Applications must recognize data location, and
they must be able to stitch together data from various sites. Database administrators must have
the ability to coordinate database activities to prevent database degradation due to data
anomalies.
3. Security. The probability of security lapses increases when data are located at multiple
sites. The responsibility of data management will be shared by different people at several
sites.
4. Lack of standards. There are no standard communication protocols at the database level.
(Although TCP/IP is the de facto standard at the network level, there is no standard at the
application level.) For example, different database vendors employ different—and often
incompatible—techniques to manage the distribution of data and processing in a DDBMS
environment.
5. Increased storage and infrastructure requirements. Multiple copies of data are required at
different sites, thus requiring additional disk storage space.
6. Increased training cost. Training costs are generally higher in a distributed model than they
would be in a centralized model, sometimes even to the extent of offsetting operational and
hardware savings.
Centralized Database:
Disadvantages:
Data traffic is higher in a centralized database, because every request goes to the central site.
If a system failure occurs at the central site, the entire database becomes unavailable, and
the data may be lost if no backup exists.
Distributed Database:
The distribution design alternatives for the tables in a DDBMS are as follows −
Non-replicated and non-fragmented
Fully replicated
Partially replicated
Fragmented
Mixed
Non-replicated & Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed so
that it is in close proximity to the site where it is used most. It is most suitable for database
systems where the percentage of queries that need to join information in tables placed at different
sites is low. If an appropriate distribution strategy is adopted, then this design alternative helps to
reduce the communication cost during data processing.
Fully Replicated
In this design alternative, one copy of all the database tables is stored at each site. Since
each site has its own copy of the entire database, queries are very fast and require negligible
communication cost. On the other hand, the massive redundancy in data makes update operations
costly, because every copy must be updated. Hence, this is suitable for systems where a large
number of queries has to be handled whereas the number of database updates is low.
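The trade-off described for full replication can be illustrated with a rough cost model. The sketch below assumes a read-one/write-all policy and counts one remote message per remote copy for each update, ignoring acknowledgements and commit protocol traffic. The function name remote_messages and the workload figures are hypothetical.

```python
def remote_messages(num_sites, num_reads, num_updates):
    """Count remote messages under full replication with read-one/write-all."""
    read_cost = num_reads * 0                    # every read uses the local copy
    update_cost = num_updates * (num_sites - 1)  # each update reaches every other copy
    return read_cost + update_cost


# Hypothetical workloads on a five-site system.
read_heavy = remote_messages(num_sites=5, num_reads=1000, num_updates=10)
update_heavy = remote_messages(num_sites=5, num_reads=10, num_updates=1000)
print(read_heavy, update_heavy)
```

Under these assumptions the read-heavy workload needs 40 remote messages and the update-heavy one needs 4000, which matches the statement that full replication suits systems with many queries and few updates.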
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the
tables is done in accordance with the frequency of access. This takes into consideration the fact that
the frequency of accessing the tables varies considerably from site to site. The number of copies of
the tables (or portions) depends on how frequently the access queries execute and on the sites that
generate the access queries. A transaction is in the partially committed state after its final
statement has been executed.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or
partitions, and each fragment can be stored at different sites. This considers the fact that it
seldom happens that all data stored in a table is required at a given site. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of each
fragment in the system, i.e. no redundant data.
The three fragmentation techniques are −
Vertical fragmentation
Horizontal fragmentation
Hybrid fragmentation
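The three fragmentation techniques can be sketched on a small in-memory table. The employees table, its column names, and the helper functions below are hypothetical and serve only to illustrate the idea: horizontal fragmentation splits rows by a predicate, vertical fragmentation splits columns while repeating the primary key in every fragment, and hybrid fragmentation applies one after the other.

```python
# Hypothetical table, stored as a list of rows.
employees = [
    {"emp_id": 1, "name": "Asha", "city": "Delhi", "salary": 50000},
    {"emp_id": 2, "name": "Ravi", "city": "Mumbai", "salary": 62000},
    {"emp_id": 3, "name": "Meena", "city": "Delhi", "salary": 58000},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, columns):
    """Keep only some columns, always together with the primary key."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


# Horizontal fragments: one per city, each stored at that city's site.
delhi = horizontal_fragment(employees, lambda r: r["city"] == "Delhi")
mumbai = horizontal_fragment(employees, lambda r: r["city"] == "Mumbai")

# Vertical fragments: general details at one site, salary data at another.
public = vertical_fragment(employees, "emp_id", ["name", "city"])
payroll = vertical_fragment(employees, "emp_id", ["salary"])

# Hybrid fragment: the salary columns of the Delhi rows only.
delhi_payroll = vertical_fragment(delhi, "emp_id", ["salary"])

# Reconstruction: union for horizontal fragments, key join for vertical ones.
from_horizontal = sorted(delhi + mumbai, key=lambda r: r["emp_id"])
payroll_by_id = {row["emp_id"]: row for row in payroll}
from_vertical = [{**row, **payroll_by_id[row["emp_id"]]} for row in public]

print(from_horizontal == employees, from_vertical == employees)
```

Repeating the key in each vertical fragment is what makes the join in the last step possible, so the original table can be rebuilt without loss.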
Mixed Distribution
This is a combination of fragmentation and partial replication. Here, the tables are
initially fragmented in any form (horizontal or vertical), and then these fragments are partially
replicated across the different sites according to the frequency of accessing the fragments.
4.8 Distributed DBMS - Database Control
Database control refers to the task of enforcing regulations so as to provide correct data
to authentic users and applications of a database. In order that correct data is available to users,
all data should conform to the integrity constraints defined in the database. Besides, data should
be screened away from unauthorized users so as to maintain security and privacy of the database.
Database control is one of the primary tasks of the database administrator (DBA). Data security
includes data protection and access control.
The three dimensions of database control are −
Authentication
Access rights
Integrity constraints
Authentication
In a distributed database system, authentication is the process through which only legitimate
users can gain access to the data resources.
Authentication can be enforced in two levels −
Controlling Access to Client Computer − At this level, user access is restricted at
login to the client computer that provides the user interface to the database server. The most
common method is a username/password combination. However, more sophisticated
methods like biometric authentication may be used for high-security data.
Note that any part of the database held only in main memory buffers is lost as a result of a
system failure.
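The username/password method mentioned above can be sketched as follows. This is only an illustration, assuming that passwords are stored as salted PBKDF2 hashes rather than in plain text. The user name, password, iteration count, and function names are hypothetical and not taken from any particular DBMS.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen for illustration


def hash_password(password, salt=None):
    """Return (salt, digest) for a password, generating a salt if needed."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


# Hypothetical credential store: user name -> (salt, digest).
credentials = {"ABC": hash_password("s3cret")}


def login(user, password):
    """Return True only if the user exists and the password matches."""
    if user not in credentials:
        return False
    salt, expected = credentials[user]
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)


print(login("ABC", "s3cret"), login("ABC", "wrong"), login("XYZ", "s3cret"))
```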
Access Rights
A user’s access rights refer to the privileges that the user is given regarding DBMS
operations, such as the rights to create a table, drop a table, add/delete/update tuples in a table or
query the table.
In distributed environments, since there are a large number of tables and a still larger number
of users, it is not feasible to assign individual access rights to users. So, the DDBMS defines certain
roles. Two-phase locking (2PL) is a sophisticated locking mechanism that consists of a growing
phase and a shrinking phase.
A role is a construct with certain privileges within a database system. Once the different roles
are defined, the individual users are assigned one of these roles. Often a hierarchy of roles is
defined according to the organization’s hierarchy of authority and responsibility.
For example, the following SQL statements create a role "Accountant" and then assign this role
to user "ABC".
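-- Create the role and grant it privileges on three tables.
CREATE ROLE ACCOUNTANT;
GRANT SELECT, INSERT, UPDATE ON EMP_SAL TO ACCOUNTANT;
GRANT INSERT, UPDATE, DELETE ON TENDER TO ACCOUNTANT;
GRANT INSERT, SELECT ON EXPENSE TO ACCOUNTANT;
COMMIT;
-- Assign the role to user ABC, who receives all of its privileges.
GRANT ACCOUNTANT TO ABC;
COMMIT;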
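The effect of these grants can be modelled with a small lookup, shown here as a Python sketch. The dictionaries mirror the SQL statements above. The function name is_allowed and the user "XYZ" are hypothetical, and a real DDBMS performs this check internally.

```python
# Privileges per role, copied from the GRANT statements above.
ROLE_PRIVILEGES = {
    "ACCOUNTANT": {
        "EMP_SAL": {"SELECT", "INSERT", "UPDATE"},
        "TENDER": {"INSERT", "UPDATE", "DELETE"},
        "EXPENSE": {"INSERT", "SELECT"},
    },
}

# Roles assigned to each user.
USER_ROLES = {"ABC": {"ACCOUNTANT"}}


def is_allowed(user, operation, table):
    """Return True if any of the user's roles grants the operation on the table."""
    return any(
        operation in ROLE_PRIVILEGES.get(role, {}).get(table, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("ABC", "SELECT", "EMP_SAL"))  # granted through ACCOUNTANT
print(is_allowed("ABC", "DELETE", "EMP_SAL"))  # not granted on EMP_SAL
print(is_allowed("XYZ", "SELECT", "EMP_SAL"))  # user without any role
```

Because privileges are attached to the role, adding another accountant only needs one more entry in USER_ROLES instead of a fresh set of grants.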
Semantic integrity control defines and enforces the integrity constraints of the database system.
The integrity constraints are as follows −
Data type integrity constraint
Entity integrity constraint
Referential integrity constraint
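The three kinds of constraint listed above can be demonstrated with Python's built-in sqlite3 module. This is a sketch on a single local SQLite database, not a distributed one, and the dept and emp tables are hypothetical. Data type integrity is approximated with a CHECK on typeof(), entity integrity with a primary key, and referential integrity with a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,
        salary  INTEGER NOT NULL CHECK (typeof(salary) = 'integer'),
        dept_id INTEGER NOT NULL REFERENCES dept(dept_id)
    )
    """
)
conn.execute("INSERT INTO dept VALUES (10, 'Accounts')")
conn.execute("INSERT INTO emp VALUES (1, 50000, 10)")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Data type integrity: salary must be an integer.
bad_type = rejected("INSERT INTO emp VALUES (2, 'lots', 10)")
# Entity integrity: emp_id 1 already identifies a row.
bad_key = rejected("INSERT INTO emp VALUES (1, 40000, 10)")
# Referential integrity: department 99 does not exist.
bad_ref = rejected("INSERT INTO emp VALUES (3, 40000, 99)")

row_count = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(bad_type, bad_key, bad_ref, row_count)
```

All three invalid inserts are refused, so the emp table still holds only the one valid row.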