Distributed DB

A distributed database is a collection of interconnected databases located across different physical locations but which can be viewed logically as a single database. A distributed database management system (DDBMS) allows for the management of this distributed database and makes the distribution transparent to users. Key features of distributed databases include logical interrelation of data across sites, physical storage of data at different sites, and connection of processors via a network.

Uploaded by

Gopal Garg
Copyright
© © All Rights Reserved

Unit#1

Distributed DB
A distributed database is a collection of multiple interconnected databases, which are
spread physically across various locations that communicate via a computer network.

A distributed database management system (distributed DBMS) is then defined as
the software system that permits the management of the distributed database and
makes the distribution transparent to the users. Sometimes “distributed database
system” (DDBS) is used to refer jointly to the distributed database and the
distributed DBMS. The two important terms in these definitions are “logically
interrelated” and “distributed over a computer network.”
Features
 Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
 Data is physically stored across multiple sites. Data in each site can be managed
by a DBMS independent of the other sites.
 The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
 A distributed database is not a loosely connected file system.
 A distributed database incorporates transaction processing, but it is not
synonymous with a transaction processing system.

 Location independent

 Distributed query processing

 Distributed transaction management

 Hardware independent

 Operating system independent

 Network independent

 Transaction transparency

 DBMS independent

Advantages of Distributed Database :

 Transparent Management of Distributed and Replicated Data

 Reliability Through Distributed Transactions

 Improved Performance

 Easier System Expansion


 Modular Development − If the system needs to be expanded to new locations
or new units, in centralized database systems, the action requires substantial
efforts and disruption in the existing functioning. However, in distributed databases,
the work simply requires adding new computers and local data to the new site and
finally connecting them to the distributed system, with no interruption in current
functions.
 More Reliable − In case of database failures, the total system of centralized
databases comes to a halt. However, in distributed systems, when a component
fails, the system continues to function, possibly at reduced performance.
Hence DDBMS is more reliable.
 Better Response − If data is distributed in an efficient manner, then user
requests can be met from local data itself, thus providing faster response. On the
other hand, in centralized systems, all queries have to pass through the central
computer for processing, which increases the response time.
 Lower Communication Cost − In distributed database systems, if data is
located locally where it is mostly used, then the communication costs for data
manipulation can be minimized. This is not feasible in centralized systems.

Disadvantages of Distributed Database System

 Design Issues

 Cost of Update Replication and Synchronization

 Cost of Security Constraints

 Recovery from Failure and Synchronization

Distributed Database Management System


A distributed database management system (DDBMS) is a centralized software system
that manages a distributed database in a manner as if it were all stored in a single
location.

Features

 It is used to create, retrieve, update and delete distributed databases.


 It synchronizes the database periodically and provides access mechanisms by
virtue of which the distribution becomes transparent to the users.
 It ensures that the data modified at any site is universally updated.
 It is used in application areas where large volumes of data are processed and
accessed by numerous users simultaneously.
 It is designed for heterogeneous database platforms.
 It maintains confidentiality and data integrity of the databases.

Factors Encouraging DDBMS


The following factors encourage moving over to DDBMS −
 Distributed Nature of Organizational Units − Most organizations in the current
times are subdivided into multiple units that are physically distributed over the
globe. Each unit requires its own set of local data. Thus, the overall database of
the organization becomes distributed.
 Need for Sharing of Data − The multiple organizational units often need to
communicate with each other and share their data and resources. This
demands common databases or replicated databases that should be used in a
synchronized manner.
 Support for Both OLTP and OLAP − Online Transaction Processing (OLTP)
and Online Analytical Processing (OLAP) work upon diversified systems which
may have common data. Distributed database systems aid both kinds of
processing by providing synchronized data.
 Database Recovery − One of the common techniques used in DDBMS is
replication of data across different sites. Replication of data automatically helps
in data recovery if the database at any site is damaged. Users can access data from
other sites while the damaged site is being reconstructed. Thus, database
failure may become almost inconspicuous to users.
 Support for Multiple Application Software − Most organizations use a variety
of application software each with its specific database support. DDBMS provides
a uniform functionality for using the same data among different platforms.

Adversities of Distributed Databases


Following are some of the adversities associated with distributed databases.
 Need for complex and expensive software − DDBMS demands complex and
often expensive software to provide data transparency and co-ordination across
the several sites.
 Processing overhead − Even simple operations may require a large number of
communications and additional calculations to provide uniformity in data across
the sites.
 Data integrity − The need for updating data in multiple sites poses problems of
data integrity.
 Overheads for improper data distribution − Responsiveness of queries is
largely dependent upon proper data distribution. Improper data distribution often
leads to very slow response to user requests.
Centralized database

A centralized database (sometimes abbreviated CDB) is a database that is located,
stored, and maintained in a single location. This location is most often a central
computer or database system, for example a desktop or server CPU, or a mainframe
computer.
The Advantages of Centralized Database Storage

 Improves Data Preservation. Centralized database storage improves data
preservation. ...
 Improves Physical Security. Locally stored data represents an ongoing physical
security risk. ...
 Improves Data Security. ...
 Reduces costs. ...
 Improved Reliability and Update Speed.

A centralized database is stored at a single location such as a mainframe computer. It is
maintained and modified from that location only and usually accessed over a network
connection such as a LAN or WAN. The centralized database is used by organisations
such as colleges, companies, banks etc.

As can be seen from the above diagram, all the information for the organisation is
stored in a single database. This database is known as the centralized database.

Advantages
Some advantages of Centralized Database Management System are −

 The data integrity is maximised as the whole database is stored at a single physical
location. This means that it is easier to coordinate the data and it is as accurate and
consistent as possible.
 The data redundancy is minimal in the centralised database. All the data is stored
together and not scattered across different locations. So, it is easier to make sure there
is no redundant data available.
 Since all the data is in one place, there can be stronger security measures around it. So,
the centralised database is much more secure.
 Data is easily portable because it is stored at the same place.
 The centralized database is cheaper than other types of databases as it requires less
power and maintenance.
 All the information in the centralized database can be easily accessed from the same
location and at the same time.
Disadvantages
Some disadvantages of Centralized Database Management System are −
 Since all the data is at one location, it takes more time to search and access it. If the
network is slow, this process takes even more time.
 There is a lot of data access traffic for the centralized database. This may create a
bottleneck situation.
 Since all the data is at the same location, if multiple users try to access it simultaneously
it creates a problem. This may reduce the efficiency of the system.
 If there are no database recovery measures in place and a system failure occurs, then
all the data in the database may be lost.

Main Features of a DBMS

Some of the significant features of a DBMS include:

·       Low Repetition and Redundancy

In a database, the chances of data duplication are quite high as several users
use one database. A DBMS reduces data repetition and redundancy by
creating a single data repository that can be accessed by multiple users.

·       Easy Maintenance of Large Databases

Most organizational data is stored in large databases. A DBMS helps maintain
these databases by enforcing user-defined validation and integrity
constraints, such as user-based access.

·       Enhanced Security

When handling large amounts of data, security becomes the top-most
concern for all businesses. Database management software doesn’t allow
full access to anyone except the database administrator or the departmental
head. Only they can modify the database and control user access, making the
database more secure. All other users are restricted, depending on their
access level.

·       Improved File Consistency

By implementing a database management system, organizations can create
a standardized way to use files and ensure consistency of data with other
systems and applications. This streamlines data management and
manipulation because the same rules can be applied to all the data
throughout the organization.

·       Multi-User Environment Support

Database management software supports a multi-user environment,
allowing several users to access and work on data concurrently. It also
supports several views of the data. A view is a subsection of the database
that’s distinct and dedicated for specific operators of the system.

As a database is typically accessed by multiple operators simultaneously,
these operators may need different database views. For example, operator A
may want to print a bank statement, whereas operator B would want to only
check the bank balance. Although both are querying the same database, they
will be presented with different views.

Types of Distributed Databases


Distributed databases can be broadly classified into homogeneous and heterogeneous
distributed database environments, each with further sub-divisions, as shown in the
following illustration.

Homogeneous Distributed Databases

In a homogeneous distributed database, all the sites use identical DBMS and operating
systems. Its properties are −
 The sites use very similar software.
 The sites use identical DBMS or DBMS from the same vendor.
 Each site is aware of all other sites and cooperates with other sites to process
user requests.
 The database is accessed through a single interface as if it is a single database.

Types of Homogeneous Distributed Database

There are two types of homogeneous distributed database −


 Autonomous − Each database is independent and functions on its own. The
databases are integrated by a controlling application and use message passing
to share data updates.
 Non-autonomous − Data is distributed across the homogeneous nodes and a
central or master DBMS co-ordinates data updates across the sites.

Heterogeneous Distributed Databases

In a heterogeneous distributed database, different sites have different operating
systems, DBMS products and data models. Its properties are −
 Different sites use dissimilar schemas and software.
 The system may be composed of a variety of DBMSs like relational, network,
hierarchical or object oriented.
 Query processing is complex due to dissimilar schemas.
 Transaction processing is complex due to dissimilar software.
 A site may not be aware of other sites and so there is limited co-operation in
processing user requests.

Types of Heterogeneous Distributed Databases

 Federated − The heterogeneous database systems are independent in nature
and integrated together so that they function as a single database system.
 Un-federated − The database systems employ a central coordinating module
through which the databases are accessed.

DDBMS - Distribution Transparency


Distribution transparency is the property of distributed databases by the virtue of which
the internal details of the distribution are hidden from the users. The DDBMS designer
may choose to fragment tables, replicate the fragments and store them at different
sites. However, since users are oblivious of these details, they find the distributed
database easy to use like any centralized database.
The three dimensions of distribution transparency are −

 Location transparency
 Fragmentation transparency
 Replication transparency

Location Transparency
Location transparency ensures that the user can query any table(s) or fragment(s)
of a table as if they were stored locally at the user’s site. The end user should be
completely unaware that the table or its fragments are stored at a remote site in
the distributed database system. The address of the remote site(s) and the access
mechanisms are completely hidden.
In order to incorporate location transparency, the DDBMS should have access to an
updated and accurate data dictionary and DDBMS directory, which contain the details
of the locations of data.
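
The role of the directory can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `DIRECTORY`, `SITES`, `read_table` and `move_table`, the site names, and the sample rows are all hypothetical, and each "site" is modelled as a plain dictionary rather than a real remote database.

```python
# Hypothetical DDBMS directory: table name -> site that currently stores it.
DIRECTORY = {
    "CUSTOMER": "site_delhi",
    "ORDERS": "site_mumbai",
}

# Each "site" is modelled as a dict of table name -> list of rows.
SITES = {
    "site_delhi": {"CUSTOMER": [("C1", "Asha"), ("C2", "Ravi")]},
    "site_mumbai": {"ORDERS": [("O1", "C1", 250)]},
}


def read_table(table_name):
    """Return all rows of a table; the caller never names a site."""
    site = DIRECTORY[table_name]  # the location is resolved by the system
    return SITES[site][table_name]


def move_table(table_name, new_site):
    """Relocate a table and update the directory; readers are unaffected."""
    old_site = DIRECTORY[table_name]
    SITES.setdefault(new_site, {})[table_name] = SITES[old_site].pop(table_name)
    DIRECTORY[table_name] = new_site


rows_before = read_table("ORDERS")
move_table("ORDERS", "site_delhi")
rows_after = read_table("ORDERS")  # same call, same rows, different site
```

Because callers only ever pass a table name, moving `ORDERS` to another site changes the directory entry but not the code that reads the table.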

Fragmentation Transparency
Fragmentation transparency enables users to query upon any table as if it were
unfragmented. Thus, it hides the fact that the table the user is querying on is actually a
fragment or union of some fragments. It also conceals the fact that the fragments are
located at diverse sites.
This is somewhat similar to users of SQL views, where the user may not know that
they are using a view of a table instead of the table itself.

Replication Transparency
Replication transparency ensures that the replication of databases is hidden from
the users. It enables users to query a table as if only a single copy of the table
exists.
Replication transparency is associated with concurrency transparency and failure
transparency. Whenever a user updates a data item, the update is reflected in all the
copies of the table. However, the user should not be aware of this operation. This is
concurrency transparency. Also, if a site fails, the user can still proceed with
their queries using replicated copies without any knowledge of the failure. This is
failure transparency.
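
A minimal sketch of this behaviour, assuming a simple write-to-all, read-from-any scheme (one possible approach, not the only one). The class `ReplicatedTable` and the site names are hypothetical, and the sketch deliberately ignores how a failed site catches up on missed updates after it recovers.

```python
class ReplicatedTable:
    """Hypothetical wrapper that hides replicas behind a single-copy interface."""

    def __init__(self, site_names):
        # One dict per site, mapping key -> value. All replicas start empty.
        self.replicas = {name: {} for name in site_names}
        self.down = set()  # sites currently marked as failed

    def write(self, key, value):
        # Write-all: propagate the update to every available replica.
        for name, copy in self.replicas.items():
            if name not in self.down:
                copy[key] = value

    def read(self, key):
        # Read-any: answer from the first available replica.
        for name, copy in self.replicas.items():
            if name not in self.down:
                return copy[key]
        raise RuntimeError("no replica available")


table = ReplicatedTable(["site_a", "site_b", "site_c"])
table.write("emp_42", "Atlanta")  # one logical update, three physical copies
table.down.add("site_a")          # simulate a site failure
value = table.read("emp_42")      # still answered, from a surviving replica
```

The caller issues one `write` and one `read`; the number of copies and the failure of `site_a` never appear in the calling code.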

Combination of Transparencies
In any distributed database system, the designer should ensure that all the stated
transparencies are maintained to a considerable extent. The designer may choose to
fragment tables, replicate them and store them at different sites, all hidden from
the end user. However, complete distribution transparency is a tough task and
requires considerable design effort.

Distribution Transparency
Distribution transparency allows a physically dispersed database to be
managed as though it were a centralized database. The level of transparency
supported by the DDBMS varies from system to system. Three levels of
distribution transparency are recognized:
• Fragmentation transparency is the highest level of transparency. The end
user or programmer does not need to know that a database is partitioned.
Therefore, neither fragment names nor fragment locations are specified prior
to data access.
• Location transparency exists when the end user or programmer must
specify the database fragment names but does not need to specify where
those fragments are located.
• Local mapping transparency exists when the end user or programmer must
specify both the fragment names and their locations.

To illustrate the use of various transparency levels, suppose you have an
EMPLOYEE table containing the attributes EMP_NAME, EMP_DOB,
EMP_ADDRESS, EMP_DEPARTMENT, and EMP_SALARY. The EMPLOYEE data
are distributed over three different locations: New York, Atlanta, and Miami.
The table is divided by location; that is, New York employee data are stored
in fragment E1, Atlanta employee data are stored in fragment E2, and Miami
employee data are stored in fragment E3. Consider the following figure.

Now suppose that the end user wants to list all employees with a date of
birth prior to January 1, 1960. To focus on the transparency issues, also
suppose that the EMPLOYEE table is fragmented and each fragment is
unique. The unique fragment condition indicates that each row is unique,
regardless of the fragment in which it is located. Finally, assume that no
portion of the database is replicated at any other site on the network.
Depending on the level of distribution transparency support, you may
examine three query cases.

 
Case 1: The Database Supports Fragmentation Transparency

The query conforms to a non-distributed database query format; that is, it
does not specify fragment names or locations.

The query reads:

SELECT *
FROM EMPLOYEE
WHERE EMP_DOB < '01-JAN-1960';

 
Case 2: The Database Supports Location Transparency

Fragment names must be specified in the query, but the fragment’s location
is not specified. The query reads:
SELECT *
FROM E1
WHERE EMP_DOB < '01-JAN-1960'
UNION
SELECT *
FROM E2
WHERE EMP_DOB < '01-JAN-1960'
UNION
SELECT *
FROM E3
WHERE EMP_DOB < '01-JAN-1960';
 
Case 3: The Database Supports Local Mapping Transparency

Both the fragment name and its location must be specified in the query.
Using pseudo-SQL:

SELECT *
FROM E1 NODE NY
WHERE EMP_DOB < '01-JAN-1960'
UNION
SELECT *
FROM E2 NODE ATL
WHERE EMP_DOB < '01-JAN-1960'
UNION
SELECT *
FROM E3 NODE MIA
WHERE EMP_DOB < '01-JAN-1960';
Distribution transparency is supported by a distributed data dictionary
(DDD), or a distributed data catalog (DDC). The DDC contains the
description of the entire database as seen by the database administrator.
The database description, known as the distributed global schema, is the
common database schema used by local transaction processors (TPs) to translate
user requests into subqueries (remote requests) that will be processed by
different data processors (DPs). The DDC is itself distributed, and it is
replicated at the network nodes. Therefore, the DDC must maintain consistency
through updating at all sites.
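
The difference between Case 1 and Case 2 can be tried out with Python's built-in `sqlite3` module. This is a rough sketch, not how a real DDBMS is implemented: all three fragments live in one in-memory database, a view named `EMPLOYEE` stands in for the global relation, only two of the five attributes are kept, the sample rows are invented, and dates are stored as ISO `YYYY-MM-DD` strings (an assumption made so that plain string comparison works in SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE E1 (EMP_NAME TEXT, EMP_DOB TEXT);  -- New York fragment
    CREATE TABLE E2 (EMP_NAME TEXT, EMP_DOB TEXT);  -- Atlanta fragment
    CREATE TABLE E3 (EMP_NAME TEXT, EMP_DOB TEXT);  -- Miami fragment
    INSERT INTO E1 VALUES ('Ann', '1955-03-12'), ('Bob', '1972-07-01');
    INSERT INTO E2 VALUES ('Cal', '1959-12-31');
    INSERT INTO E3 VALUES ('Dee', '1980-01-15');
    -- The view plays the role of the global EMPLOYEE relation.
    CREATE VIEW EMPLOYEE AS
        SELECT * FROM E1 UNION ALL SELECT * FROM E2 UNION ALL SELECT * FROM E3;
""")

# Case 1 style: the query names no fragments.
case1 = conn.execute(
    "SELECT EMP_NAME FROM EMPLOYEE WHERE EMP_DOB < '1960-01-01' ORDER BY EMP_NAME"
).fetchall()

# Case 2 style: fragments are named explicitly; locations are still hidden.
case2 = conn.execute("""
    SELECT EMP_NAME FROM E1 WHERE EMP_DOB < '1960-01-01'
    UNION
    SELECT EMP_NAME FROM E2 WHERE EMP_DOB < '1960-01-01'
    UNION
    SELECT EMP_NAME FROM E3 WHERE EMP_DOB < '1960-01-01'
    ORDER BY EMP_NAME
""").fetchall()
```

Both queries return the same employees; the only difference is how much of the fragmentation the person writing the query has to know.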

Explain different types of transparencies in distributed database.
Transparency in a DBMS means separating the high-level semantics of the system
from low-level implementation issues. High-level semantics are what the end
user sees; low-level implementation concerns how the data is physically stored in
the database and on the hardware. We are familiar with the term
data independence. Using data independence in various layers of the database,
transparency can be implemented in a DBMS.
Unlike a normal DBMS, a DDBMS deals with a communication network, replicas and
fragments of data. Thus, transparency also involves these three factors.
Following are three types of transparency:
1) Network transparency
2) Replication transparency
3) Fragmentation transparency
1. Network transparency
DBMS users should not be concerned about the type of DBMS they are using. In
the case of a DDBMS, the network should not be visible to the user. To provide
network transparency, the following conditions should hold:
• The query language that will be used should not include any location
specification. In this way, location transparency can be achieved.
• The data that is stored in a relation should not contain any location specification.
In this way, naming transparency can be assured.
• Every database object must have a system-wide unique name.
• The location information can be found using the data dictionary.
• Using aliases, we can move the database objects transparently.
2. Replication Transparency
Replication transparency states that the replicas that are created should be
controlled by the system, not by the user. The user should not need to know
whether the fetched data is coming from a replicated copy of the relation or from the
actual copy of the relation. To achieve replication transparency, a concurrency
control protocol needs to be devised which assures that an update of data
taking place in one copy is also applied to the other copies. In this way,
transparency regarding replicas of data can be maintained.
3. Fragmentation Transparency
Fragmentation transparency states that the fragments that are created to store
the data in a distributed manner should remain transparent, and all the data
management work required to control the fragments should be done by the
system, not by the user. When a user submits a query, the global query
is distributed to many sites to fetch data from the fragments, and this data is put
together at the end to generate the result. The system ensures that the whole
procedure of query decomposition and re-composition is transparent to
the user.
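
Query decomposition and re-composition can be sketched as follows. This is an illustrative sketch only: each site is simulated by its own in-memory SQLite database, and the names `make_site`, `SITES`, `EMPLOYEE_FRAG` and `global_query`, as well as the sample rows, are hypothetical.

```python
import sqlite3


def make_site(rows):
    """Create one in-memory database standing in for a remote site."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE EMPLOYEE_FRAG (EMP_NAME TEXT, EMP_SALARY INTEGER)")
    db.executemany("INSERT INTO EMPLOYEE_FRAG VALUES (?, ?)", rows)
    return db


SITES = {
    "NY": make_site([("Ann", 5200), ("Bob", 3100)]),
    "ATL": make_site([("Cal", 4700)]),
    "MIA": make_site([("Dee", 2900)]),
}


def global_query(min_salary):
    """Decompose a global query into one subquery per site, then recompose."""
    merged = []
    for db in SITES.values():
        sub = db.execute(
            "SELECT EMP_NAME, EMP_SALARY FROM EMPLOYEE_FRAG WHERE EMP_SALARY >= ?",
            (min_salary,),
        )
        merged.extend(sub.fetchall())
    # Re-composition: the union of the partial results, in a stable order.
    return sorted(merged)


result = global_query(4000)
```

The caller of `global_query` sees one result list and never learns that three separate databases were consulted.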

4.4 REFERENCE ARCHITECTURE OF DDBMS


From pdf Unit1(Unit 4)
Explain Reference Architecture of Distributed DBMS
Data in a distributed system is usually fragmented and replicated. To account for
fragmentation and replication, the reference architecture of a distributed DBMS
consists of the following schemas:
● A set of global external schemas.
● A global conceptual schema.
● A fragmentation schema and an allocation schema.
● A set of schemas for each local DBMS.
Global external schema- In a distributed system, user applications and user access to
the distributed database are represented by a number of global external schemas. This
is the topmost level in the reference architecture. This level describes the part
of the distributed database that is relevant to different users.
Global conceptual schema- The GCS represents the logical description of the entire
database as if it were not distributed. This level contains definitions of all
entities, relationships among entities, and security and integrity information for the
whole database stored at all sites in a distributed system.
Fragmentation schema and allocation schema- The fragmentation schema
describes how the data is to be logically partitioned in a distributed database. The GCS
consists of a set of global relations, and the mapping between the global relations and
fragments is defined in the fragmentation schema.
The allocation schema is a description of where the data (fragments) are to be
located, taking account of any replication. The type of mapping in the allocation schema
determines whether the distributed database is redundant or non-redundant. In the case
of redundant data distribution, the mapping is one to many, whereas in the case of
non-redundant data distribution the mapping is one to one.
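
The two mappings can be pictured as lookup tables. The sketch below is illustrative only; the relation, fragment and site names and the helper functions `is_redundant` and `sites_for` are hypothetical.

```python
# Fragmentation schema: global relation -> its fragments (illustrative names).
FRAGMENTATION = {"EMPLOYEE": ["E1", "E2", "E3"]}

# Allocation schema: fragment -> sites that hold a copy of it.
ALLOCATION = {
    "E1": ["NY"],
    "E2": ["ATL", "NY"],  # E2 is replicated at two sites
    "E3": ["MIA"],
}


def is_redundant(allocation):
    """True if any fragment maps to more than one site (one-to-many)."""
    return any(len(sites) > 1 for sites in allocation.values())


def sites_for(relation):
    """Resolve a global relation to every site that stores part of it."""
    return sorted(
        {site for frag in FRAGMENTATION[relation] for site in ALLOCATION[frag]}
    )
```

With the sample data, `is_redundant(ALLOCATION)` is true because `E2` is stored at two sites; removing the second copy would make the mapping one to one and the distribution non-redundant.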
Local schemas- In a distributed database system, the physical data organization at
each machine is probably different, and therefore it requires an individual internal
schema definition at each site, called the local internal schema.
To handle fragmentation and replication issues, the logical organization of data at each
site is described by a third layer in the architecture, called the local conceptual schema.
The GCS is the union of all local conceptual schemas; thus the local conceptual
schemas are mappings of the global schema onto each site. This mapping is done
by local mapping schemas.
This architecture provides a very general conceptual framework for understanding
distributed databases.

Reference architecture of a distributed DBMS


In chapter 1 we looked at the ANSI-SPARC three-level architecture of a DBMS. That reference
architecture shows how the different schemas of a DBMS can be organised. It cannot be
applied directly to distributed environments because of the diversity and complexity of distributed
DBMSs. The diagram below shows how the schemas of a distributed database system can be
organised. The diagram is adopted from Hirendra Sisodiya (2011).

Figure 15.4
Reference architecture for distributed database
1. Global schema
The global schema contains two parts, a global external schema and a global conceptual
schema. The global schema gives access to the entire system. It provides applications with
access to the entire distributed database system, and logical description of the whole
database as if it was not distributed.
2. Fragmentation schema
The fragmentation schema gives the description of how the data is partitioned.
3. Allocation schema
Gives a description of where the partitions are located.
4. Local mapping
The local mapping contains the local conceptual and local internal schema. The local
conceptual schema provides the description of the local data. The local internal schema
gives the description of how the data is physically stored on the disk.
Data fragmentation in DBMS
Distributed database systems provide distribution transparency of the data
over the DBs. This builds on data fragmentation, that is, dividing the data
across the network and across the DBs. Initially
all the DBs and data are designed as per the standards of any database
system, by applying normalization and denormalization. A distributed system
then divides this normalized data further, because the main goal of a DDBMS
is to serve data to users from the location nearest to them, as fast as
possible. Hence the data in a table is divided according to location or
according to user requirements.

Dividing a table's data into smaller chunks and storing them in different
DBs in the DDBMS is called data fragmentation. Fragmenting a relation
allows:

 Easy usage of Data: It places the most frequently accessed data
near the user, so that data can be accessed easily as and when
required.
 Efficiency : It increases the efficiency of queries by reducing
the table to a smaller subset and making it available with
less network access time.
 Security : It provides security to the data. Only the records relevant
to a site's users are stored in the DB near them; that DB does not
hold data they have no need for.
 Parallelism : Fragmentation allows users to access the same table at
the same time from different locations. Users at different locations
access the fragment of the table held in the DB at their location, seeing
the data that is meant for them. If they all accessed the table at one
location, they would have to wait for locks to perform their transactions.
 Reliability : It increases the reliability of fetching the data. If users
at different locations all access a single DB, there will
be a heavy network load, which makes it harder to guarantee that correct
records are fetched and returned to the user. Accessing the fragment in the
nearest DB reduces the risk of data loss and of incorrect data.
 Balanced Storage : Data will be distributed evenly among the
databases in DDB.
Information about the fragmentation of the data is stored in the DDC. When a user
sends a query, the DDC is used to determine which fragment should be accessed
and to locate that fragment.

Fragmentation of data can be done according to the DBs and user
requirements. But while fragmenting the data, the points below should be kept in
mind :

 Completeness : Every row (or column) of the original table must appear in
at least one fragment. Fragmentation should therefore be defined over the
whole table's data, not over part of it. For example, if we are
creating fragments of the EMPLOYEE table, then we need to consider the
whole EMPLOYEE table when constructing them, not just a
subset of EMPLOYEE records.
 Reconstruction : When all the fragments are combined, they should give the
whole table's data. That means it should be possible to
reconstruct the whole table using all the fragments. For example, all the
fragments of the EMPLOYEE table, when combined, should give the complete
set of EMPLOYEE table records.
 Disjointness : There should not be any overlapping data in the
fragments. Otherwise it is difficult to maintain the consistency of the
data, because extra effort is needed to keep every copy of the overlapping
data in step. Suppose we have fragments of the EMPLOYEE table based on
location; then no two fragments should hold the details
of the same employee.
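
These three conditions can be checked mechanically for horizontal fragments. The sketch below is a simplified illustration: the function name `check_horizontal_fragments` and the sample rows are hypothetical, rows are compared as whole tuples, and it is assumed that every row in the original table is unique (for example because it carries a primary key).

```python
def check_horizontal_fragments(table_rows, fragments):
    """Check the three rules for horizontal fragments given as lists of rows."""
    original = set(table_rows)
    combined = [row for frag in fragments for row in frag]
    return {
        # Completeness: every original row appears in some fragment.
        "complete": original <= set(combined),
        # Reconstruction: the union of the fragments is exactly the table.
        "reconstructible": set(combined) == original,
        # Disjointness: no row is stored in more than one fragment.
        "disjoint": len(combined) == len(set(combined)),
    }


employee = [(1, "Ann", "INDIA"), (2, "Bob", "USA"), (3, "Cal", "UK")]

# One row per fragment: satisfies all three rules.
good = check_horizontal_fragments(
    employee, [[employee[0]], [employee[1]], [employee[2]]]
)

# Row 3 is missing and row 2 appears twice: violates all three rules.
bad = check_horizontal_fragments(
    employee, [[employee[0], employee[1]], [employee[1]]]
)
```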
There are 3 types of data fragmentation in DDBMS.
 Horizontal Data Fragmentation :
As the name suggests, here the records are fragmented horizontally,
i.e., horizontal subsets (sets of rows) of the table are created and stored in
different databases in the DDB.

For example, consider the employees working at different locations of the
organization, such as India, USA, UK etc. The number of employees across all
these locations is large. When the details
of any one employee are required, the whole table needs to be accessed to get
the information, and the employee table may be stored at any location in the
world. But the idea of a DDB is to place the data in the nearest DB so that it
can be accessed quickly. Hence we divide the employee
table horizontally based on location, i.e.:

SELECT * FROM EMPLOYEE WHERE EMP_LOCATION = 'INDIA';

SELECT * FROM EMPLOYEE WHERE EMP_LOCATION = 'USA';

SELECT * FROM EMPLOYEE WHERE EMP_LOCATION = 'UK';

Each of these queries gives a subset of records from the EMPLOYEE table
depending on the location of the employees. These subsets of data are
stored in the DBs at the respective locations. Any insert, update or delete on
employee records is done on the DB at their location and is
synchronized with the main table at regular intervals.
The above is a simple example of horizontal fragmentation. Fragmentation
can also be done with more than one condition joined by AND or OR.
Fragmentation is done based on the requirement and the purpose of the DDB.
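
The location-based split can be tried out with Python's built-in `sqlite3` module. This is a sketch under simplifying assumptions: all fragments are created inside one in-memory database instead of at separate sites, the column list is shortened, and the fragment table names (`EMPLOYEE_INDIA` and so on) and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT, EMP_LOCATION TEXT
    );
    INSERT INTO EMPLOYEE VALUES
        (1, 'Ann', 'INDIA'), (2, 'Bob', 'USA'), (3, 'Cal', 'UK'), (4, 'Dee', 'INDIA');
""")

LOCATIONS = ("INDIA", "USA", "UK")

# One fragment per location; each would live in the DB at that site.
for location in LOCATIONS:
    name = f"EMPLOYEE_{location}"
    conn.execute(
        f"CREATE TABLE {name} "
        "(EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT, EMP_LOCATION TEXT)"
    )
    conn.execute(
        f"INSERT INTO {name} SELECT * FROM EMPLOYEE WHERE EMP_LOCATION = ?",
        (location,),
    )

india_rows = conn.execute(
    "SELECT EMP_ID, EMP_NAME FROM EMPLOYEE_INDIA ORDER BY EMP_ID"
).fetchall()
counts = {
    loc: conn.execute(f"SELECT COUNT(*) FROM EMPLOYEE_{loc}").fetchone()[0]
    for loc in LOCATIONS
}
```

Each fragment has the same columns as `EMPLOYEE` but only the rows for its own location, and the row counts of the fragments add up to the row count of the original table.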

 Vertical Data Fragmentation :


This is a vertical subset of a relation. That means a relation / table is
fragmented by its columns.
For example, consider the EMPLOYEE table with ID, Name, Address, Age, Location, DeptID, and
ProjID. Vertical fragmentation divides this table into different tables, each holding
one or more columns from EMPLOYEE.

SELECT EMP_ID, EMP_FIRST_NAME, EMP_LAST_NAME, AGE FROM EMPLOYEE;
SELECT EMP_ID, STREETNUM, TOWN, STATE, COUNTRY, PIN FROM EMPLOYEE;
SELECT EMP_ID, DEPTID FROM EMPLOYEE;
SELECT EMP_ID, PROJID FROM EMPLOYEE;

Each fragment of this type holds only part of the details of every employee. This is useful
when the user needs to query only a few details about an employee. For example, a query
to find the department of an employee can be answered from the third fragment alone, and a
query to find the name and age of an employee whose ID is given can be answered from the first
fragment. This avoids a 'SELECT *' operation on the whole table, which needs a lot of memory
both to traverse all the data and to hold all the columns.


The fragments share an overlapping column, but this column is the primary key, which
hardly changes during the life cycle of a record. Hence the cost of maintaining this
overlapping column is very low. In addition, this column is required to reconstruct the
table or to combine data from two fragments. Hence the design still meets the conditions of fragmentation.
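
The role of the repeated primary key can be illustrated with a short sketch, again assuming Python's built-in sqlite3 module as a stand-in; the fragment names EMP_PERSONAL and EMP_DEPT and the sample rows are hypothetical. A department lookup reads only the small fragment, and joining the fragments on EMP_ID reconstructs the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, "
    "EMP_FIRST_NAME TEXT, AGE INTEGER, DEPTID INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [(1, "Asha", 30, 10), (2, "Bob", 41, 20)],
)

# Each vertical fragment repeats the primary key EMP_ID.
conn.execute("CREATE TABLE EMP_PERSONAL AS SELECT EMP_ID, EMP_FIRST_NAME, AGE FROM EMPLOYEE")
conn.execute("CREATE TABLE EMP_DEPT AS SELECT EMP_ID, DEPTID FROM EMPLOYEE")

# A department lookup touches only the EMP_DEPT fragment.
dept_of_2 = conn.execute(
    "SELECT DEPTID FROM EMP_DEPT WHERE EMP_ID = ?", (2,)
).fetchone()[0]

# Joining the fragments on EMP_ID reconstructs the original rows.
rebuilt = conn.execute(
    "SELECT p.EMP_ID, p.EMP_FIRST_NAME, p.AGE, d.DEPTID "
    "FROM EMP_PERSONAL p JOIN EMP_DEPT d ON p.EMP_ID = d.EMP_ID "
    "ORDER BY p.EMP_ID"
).fetchall()
```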

 Hybrid Data Fragmentation :


This is a combination of horizontal and vertical fragmentation. It uses
horizontal fragmentation to distribute subsets of rows over the DBs, and
vertical fragmentation to obtain subsets of the table's columns.

As we observe in the above diagram, this type of fragmentation can be done in
any order. The order is based solely on the user's requirements, but the
result must satisfy the fragmentation conditions.

Consider the EMPLOYEE table with the fragmentations below.

SELECT EMP_ID, EMP_FIRST_NAME, EMP_LAST_NAME, AGE
FROM EMPLOYEE WHERE EMP_LOCATION = 'INDIA';
SELECT EMP_ID, DEPTID FROM EMPLOYEE WHERE EMP_LOCATION = 'INDIA';
SELECT EMP_ID, EMP_FIRST_NAME, EMP_LAST_NAME, AGE
FROM EMPLOYEE WHERE EMP_LOCATION = 'US';
SELECT EMP_ID, PROJID FROM EMPLOYEE WHERE EMP_LOCATION = 'US';

This is a hybrid or mixed fragmentation of EMPLOYEE table.
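
As a rough illustration of a hybrid fragment, the sketch below (Python's sqlite3 as a stand-in, with hypothetical sample rows and fragment names) applies a row condition and a column selection in the same statement, then rebuilds the INDIA rows by joining the two INDIA fragments on EMP_ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_FIRST_NAME TEXT, "
    "AGE INTEGER, DEPTID INTEGER, EMP_LOCATION TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
    [(1, "Asha", 30, 10, "INDIA"), (2, "Bob", 41, 20, "US"), (3, "Dev", 25, 10, "INDIA")],
)

# Horizontal cut (rows for INDIA) combined with vertical cuts (column subsets).
conn.execute(
    "CREATE TABLE EMP_INDIA_PERSONAL AS "
    "SELECT EMP_ID, EMP_FIRST_NAME, AGE FROM EMPLOYEE WHERE EMP_LOCATION = 'INDIA'"
)
conn.execute(
    "CREATE TABLE EMP_INDIA_DEPT AS "
    "SELECT EMP_ID, DEPTID FROM EMPLOYEE WHERE EMP_LOCATION = 'INDIA'"
)

# Joining the two INDIA fragments recovers the INDIA rows for these columns.
india_rows = conn.execute(
    "SELECT p.EMP_ID, p.EMP_FIRST_NAME, p.AGE, d.DEPTID "
    "FROM EMP_INDIA_PERSONAL p JOIN EMP_INDIA_DEPT d ON p.EMP_ID = d.EMP_ID "
    "ORDER BY p.EMP_ID"
).fetchall()
```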

What are distributed database access primitives?


Database access primitives are often categorized as the CRUD operations (Create, Read,
Update, Delete). Whether the database is "distributed" (there are multiple relevant
definitions for that term) isn't directly relevant to what the primitive operations are.

Note that CRUD doesn't include a means of searching for things in a database, which would
generally be considered a higher-level capability.

Distribution of a database affects implementation strategy, but it's generally desirable that
the semantics of the core operations don't constrain the user to a particular
implementation.

Access Rights
A user's access rights are the privileges that the user is given regarding DBMS
operations, such as the rights to create a table, drop a table, add/delete/update tuples
in a table, or query the table.
In distributed environments, since there is a large number of tables and an even larger
number of users, it is not feasible to assign individual access rights to users. So, the
DDBMS defines certain roles. A role is a construct with certain privileges within a
database system. Once the different roles are defined, the individual users are
assigned one of these roles. Often a hierarchy of roles is defined according to the
organization's hierarchy of authority and responsibility.
For example, the following SQL statements create a role "Accountant" and then
assign this role to user "ABC".
CREATE ROLE ACCOUNTANT;
GRANT SELECT, INSERT, UPDATE ON EMP_SAL TO ACCOUNTANT;
GRANT INSERT, UPDATE, DELETE ON TENDER TO ACCOUNTANT;
GRANT INSERT, SELECT ON EXPENSE TO ACCOUNTANT;
COMMIT;
GRANT ACCOUNTANT TO ABC;
COMMIT;
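
The effect of these grants can be pictured with a small plain-Python sketch. It is only an illustration of how privileges reach a user through a role; the dictionary names and the is_allowed function are hypothetical, and this is not how any particular DDBMS implements authorization.

```python
# Privileges per role, mirroring the GRANT statements above.
ROLE_PRIVILEGES = {
    "ACCOUNTANT": {
        "EMP_SAL": {"SELECT", "INSERT", "UPDATE"},
        "TENDER": {"INSERT", "UPDATE", "DELETE"},
        "EXPENSE": {"INSERT", "SELECT"},
    }
}

# Each user is assigned one role (GRANT ACCOUNTANT TO ABC).
USER_ROLES = {"ABC": "ACCOUNTANT"}


def is_allowed(user, operation, table):
    """Return True if the user's role grants `operation` on `table`."""
    role = USER_ROLES.get(user)
    return operation in ROLE_PRIVILEGES.get(role, {}).get(table, set())
```

Here is_allowed("ABC", "SELECT", "EMP_SAL") is true, while is_allowed("ABC", "DELETE", "EMP_SAL") is false because DELETE on EMP_SAL was never granted to the role.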

Semantic Integrity Control


Semantic integrity control defines and enforces the integrity constraints of the database
system.
The integrity constraints are as follows −

 Data type integrity constraint


 Entity integrity constraint
 Referential integrity constraint

Data Type Integrity Constraint

A data type constraint restricts the range of values and the type of operations that can
be applied to the field with the specified data type.
For example, let us consider that a table "HOSTEL" has three fields - the hostel
number, hostel name and capacity. The hostel number should start with capital letter
"H" and cannot be NULL, and the capacity should not be more than 150. The following
SQL command can be used for data definition −
CREATE TABLE HOSTEL (
H_NO VARCHAR2(5) NOT NULL,
H_NAME VARCHAR2(15),
CAPACITY INTEGER,
CHECK ( H_NO LIKE 'H%'),
CHECK ( CAPACITY <= 150)
);
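
To see these constraints reject bad rows, the table can be created in SQLite through Python's built-in sqlite3 module. This is a sketch under assumptions: VARCHAR replaces the Oracle-specific VARCHAR2, the sample rows and the try_insert helper are hypothetical, and SQLite's LIKE is case-insensitive for ASCII by default, so the "capital H" rule is only approximated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE HOSTEL (
        H_NO     VARCHAR(5) NOT NULL,
        H_NAME   VARCHAR(15),
        CAPACITY INTEGER,
        CHECK (H_NO LIKE 'H%'),
        CHECK (CAPACITY <= 150)
    )
""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO HOSTEL VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert(("H1", "Ganga", 120))        # satisfies every constraint
bad_prefix = try_insert(("X9", "Yamuna", 100))  # H_NO does not start with H
too_big = try_insert(("H2", "Kaveri", 200))  # CAPACITY exceeds 150
no_number = try_insert((None, "Tapti", 80))  # H_NO is NULL
```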

Entity Integrity Control

Entity integrity control enforces the rules so that each tuple can be uniquely identified
from other tuples. For this a primary key is defined. A primary key is a minimal set of
fields that can uniquely identify a tuple. Entity integrity constraint states that no two
tuples in a table can have identical values for primary keys and that no field which is a
part of the primary key can have NULL value.
For example, in the above hostel table, the hostel number can be assigned as the
primary key through the following SQL statement (ignoring the checks) −
CREATE TABLE HOSTEL (
H_NO VARCHAR2(5) PRIMARY KEY,
H_NAME VARCHAR2(15),
CAPACITY INTEGER
);
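
Both halves of the entity integrity rule can be checked with a short sketch using Python's built-in sqlite3 module. The sample rows and the helper name are hypothetical. NOT NULL is written out explicitly because SQLite, unlike standard SQL, accepts NULL in a non-INTEGER PRIMARY KEY column unless told otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE HOSTEL (
        H_NO     VARCHAR(5) PRIMARY KEY NOT NULL,
        H_NAME   VARCHAR(15),
        CAPACITY INTEGER
    )
""")
conn.execute("INSERT INTO HOSTEL VALUES ('H1', 'Ganga', 120)")


def violates_entity_integrity(row):
    """Return True if the insert is rejected by the primary key rules."""
    try:
        conn.execute("INSERT INTO HOSTEL VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_key = violates_entity_integrity(("H1", "Other", 50))  # H1 already exists
null_key = violates_entity_integrity((None, "NoKey", 50))       # key is NULL
new_key = violates_entity_integrity(("H2", "Yamuna", 90))       # valid new row
```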

Referential Integrity Constraint

Referential integrity constraint lays down the rules of foreign keys. A foreign key is a
field in a data table that is the primary key of a related table. The referential integrity
constraint lays down the rule that the value of the foreign key field should either be
among the values of the primary key of the referenced table or be entirely NULL.
For example, let us consider a student table where a student may opt to live in a
hostel. To include this, the primary key of hostel table should be included as a foreign
key in the student table. The following SQL statement incorporates this −
CREATE TABLE STUDENT (
S_ROLL INTEGER PRIMARY KEY,
S_NAME VARCHAR2(25) NOT NULL,
S_COURSE VARCHAR2(10),
S_HOSTEL VARCHAR2(5) REFERENCES HOSTEL
);
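
The rule that a foreign key value must either match a referenced primary key or be NULL can be demonstrated as follows. This is a sketch using Python's built-in sqlite3 module with hypothetical sample rows; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and VARCHAR replaces VARCHAR2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute(
    "CREATE TABLE HOSTEL (H_NO VARCHAR(5) PRIMARY KEY, "
    "H_NAME VARCHAR(15), CAPACITY INTEGER)"
)
conn.execute("""
    CREATE TABLE STUDENT (
        S_ROLL   INTEGER PRIMARY KEY,
        S_NAME   VARCHAR(25) NOT NULL,
        S_COURSE VARCHAR(10),
        S_HOSTEL VARCHAR(5) REFERENCES HOSTEL
    )
""")
conn.execute("INSERT INTO HOSTEL VALUES ('H1', 'Ganga', 120)")


def try_insert_student(row):
    """Return True if the student row is accepted, False if rejected."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


in_hostel = try_insert_student((1, "Ravi", "CS", "H1"))       # H1 exists
no_hostel = try_insert_student((2, "Mina", "EE", None))       # NULL is allowed
unknown_hostel = try_insert_student((3, "Tom", "ME", "H9"))   # H9 does not exist
```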

Integrity Constraints
o Integrity constraints are a set of rules. They are used to maintain the quality of
information.
o Integrity constraints ensure that data insertion, updating, and other processes
are performed in such a way that data integrity is not affected.
o Thus, integrity constraints are used to guard against accidental damage to the
database.
Types of Integrity Constraint

1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an
attribute.
o The data type of domain includes string, character, integer, time, date, currency,
etc. The value of the attribute must be available in the corresponding domain.

Example:
2. Entity integrity constraints
o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in a relation,
and if the primary key has a null value, then we can't identify those rows.
o A table can contain null values in fields other than the primary key.

Example:

3. Referential Integrity Constraints


o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the
Primary Key of Table 2, then every value of the Foreign Key in Table 1 must be null
or be available in Table 2.

Example:

4. Key constraints
o A key is an attribute or set of attributes used to uniquely identify an entity
within its entity set.
o An entity set can have multiple keys, but only one of them will be the primary
key. A primary key must contain unique values and cannot contain a null value.

Example:
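
A table with more than one key can be sketched as follows, using Python's built-in sqlite3 module. The STUDENT columns, the sample rows, and the helper name are hypothetical. S_ROLL and S_EMAIL can each identify a student, S_ROLL is chosen as the primary key, and S_EMAIL is declared UNIQUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# S_ROLL and S_EMAIL are both keys; S_ROLL is chosen as the primary key.
conn.execute("""
    CREATE TABLE STUDENT (
        S_ROLL  INTEGER PRIMARY KEY,
        S_EMAIL VARCHAR(40) NOT NULL UNIQUE,
        S_NAME  VARCHAR(25)
    )
""")
conn.execute("INSERT INTO STUDENT VALUES (1, 'ravi@example.com', 'Ravi')")


def rejected(row):
    """Return True if a key constraint rejects the row."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


same_roll = rejected((1, "new@example.com", "Tara"))    # primary key reused
same_email = rejected((2, "ravi@example.com", "Ravi2"))  # other key reused
fresh_row = rejected((2, "mina@example.com", "Mina"))    # both keys are new
```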
