Database Modeling - notes-VII

Requirements of a Generalized DDBMS: Date’s 12 Rules

Rule 1. Local Autonomy. Local data is locally owned and managed, even when it is accessible from a remote site. Security, integrity, and storage remain under the control of the local system. Local users should not be hampered when their system is part of a distributed system.

Rule 2. No Central Site. There must be no central point of failure or bottleneck. Therefore, the following functions must be distributed: dictionary management, query processing, concurrency control, and recovery control.

Rule 3. Continuous Operation. The system should not require a shutdown to add or remove a node from the network. User applications should not have to change when a new node is added, provided they do not need information from the added node.

Rule 4. Location Independence (or Transparency). A common global user view of the database should be supported so that users need not know where the data is located. This allows data to be moved for performance reasons or in response to storage constraints without affecting user applications.

Rule 5. Fragmentation Independence (or Transparency). This allows tables to be split among several sites, transparent to user applications. For example, we can store New York employee records at the New York site and Boston employee records at the Boston site, but allow the user to refer to the separated data as EMPLOYEES, independent of their locations.
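
A minimal SQL sketch of this example (assuming the DBMS resolves the remote fragment names transparently; table and column names here are hypothetical):

    -- Horizontal fragments stored at the New York and Boston sites.
    CREATE TABLE employees_ny  (emp_id INT PRIMARY KEY, name VARCHAR(40), city VARCHAR(20));
    CREATE TABLE employees_bos (emp_id INT PRIMARY KEY, name VARCHAR(40), city VARCHAR(20));

    -- A global view reconstructs the whole table, hiding the fragmentation
    -- so users can simply query EMPLOYEES.
    CREATE VIEW employees AS
        SELECT * FROM employees_ny
        UNION ALL
        SELECT * FROM employees_bos;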

Rule 6. Replication Independence (or Transparency). This allows several copies of a table (or portions thereof) to reside at different nodes. Query performance can be improved, since applications can work with a local copy instead of a remote one. Update performance, however, may be degraded due to the additional copies. Availability also improves, since the data can be reached as long as at least one copy is available.
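
As an illustration, one common way to keep a local read-only copy is a materialized view; the PostgreSQL-style syntax below is an assumption, and a fully transparent DDBMS would hide replication even from this level:

    -- A local, periodically refreshed copy of a remote employees table.
    CREATE MATERIALIZED VIEW employees_local AS
        SELECT * FROM employees;

    -- Local queries read employees_local; refreshing propagates remote
    -- updates, which is where the extra update cost comes from.
    REFRESH MATERIALIZED VIEW employees_local;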

Rule 7. Distributed Query Processing. No single central site should perform all query optimization; instead, the submitting site, which receives the query from the user, decides the overall strategy, and the other participating sites perform optimization at their own level.

Rule 8. Distributed Transaction Processing. The system should process a transaction across
multiple databases exactly as if all of the data were local. Each node should be capable of acting as a
coordinator for distributed updates, and as a participant in other transactions. Concurrency control
must occur at the local level (Rule 2), but there must also be cooperation between individual systems to
ensure that a “global deadlock” does not occur.
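
For illustration, a participant-side sketch of such a distributed update using PostgreSQL-style two-phase commit statements (the transaction identifier and table are hypothetical):

    -- Phase 1 (prepare): the participant makes its work durable but not yet visible.
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;
    PREPARE TRANSACTION 'global_txn_42';

    -- Phase 2 (commit): once every participant has prepared, the coordinator
    -- instructs each one to commit; otherwise it issues ROLLBACK PREPARED.
    COMMIT PREPARED 'global_txn_42';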

Rule 9. Hardware Independence. The concept of a single database system must be presented
regardless of the underlying hardware used to implement the individual systems.

Rule 10. Operating System Independence. The concept of a single database system must be
presented regardless of the underlying operating systems used.

Rule 11. Network Independence. The distributed system must be capable of communicating over a wide variety of networks, often different ones in the same configuration. Standard network protocols must be adhered to.

Rule 12. DBMS Independence (Heterogeneity). The distributed system should be able to be
made up of individual sites running different database management systems.

What are the basic issues in the design and implementation of
distributed database systems?
* Data Distribution Strategies
- Fragmentation
- Data allocation
- Replication
- Network data directory distribution

* Query Processing and Optimization
* Distribution Transparency
- Location, fragmentation, replication, update
* Integrity
- Transaction management
- Concurrency control
- Recovery and availability
- Integrity constraint checking
* Privacy and Security
- Database administrators
* Data Manipulation Languages
- SQL is the standard
- Forms coming into common use

Modified Life Cycle for Data Distribution

IV. Data distribution (allocation). Create a data allocation schema that indicates
where each copy of each table is to be stored. The allocation schema defines at which site(s) a table
is located. A one-to-one mapping in the allocation schema results in non-redundancy, while a one-to-
many mapping defines a redundant distributed database.
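
As an illustration, a minimal sketch of a catalog table recording such an allocation schema (all names are hypothetical):

    -- One row per (table, site) pair. Exactly one row per table defines a
    -- non-redundant allocation; multiple rows per table define a redundant one.
    CREATE TABLE allocation_schema (
        table_name VARCHAR(30) NOT NULL,
        site_name  VARCHAR(30) NOT NULL,
        PRIMARY KEY (table_name, site_name)
    );

    INSERT INTO allocation_schema VALUES ('EMPLOYEES', 'new_york');
    INSERT INTO allocation_schema VALUES ('DEPARTMENTS', 'new_york');
    INSERT INTO allocation_schema VALUES ('DEPARTMENTS', 'boston');  -- redundant copy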
Fragmentation.
Fragmentation is the process of taking subsets of rows and/or columns of tables as the smallest unit of
data to be sent across the network. Unfortunately, very few commercial systems have implemented
this feature, but we include a brief discussion for historical reasons. We could define a fragmentation
schema of the database based on dominant applications’ “select” predicates (set of conditions for
retrieval specified in a select statement).

Horizontal fragmentation partitions the rows of a global fragment into subsets. A fragment ri is a
selection on the global fragment r using a predicate Pi, its qualification. The reconstruction of r is
obtained by taking the union of all fragments.
Vertical fragmentation subdivides the attributes of the global fragment into groups. The simplest
form of vertical fragmentation is decomposition. A unique row-id may be included in each fragment
to guarantee that the reconstruction through a join operation is possible.
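
A sketch of decomposition in SQL (hypothetical names; the shared row-id is what makes lossless reconstruction possible):

    -- Two vertical fragments of one global employee table.
    CREATE TABLE emp_info (row_id INT PRIMARY KEY, name VARCHAR(40), dept VARCHAR(20));
    CREATE TABLE emp_pay  (row_id INT PRIMARY KEY, salary DECIMAL(10,2));

    -- Reconstruction through a join on the row-id.
    SELECT i.row_id, i.name, i.dept, p.salary
    FROM emp_info i
    JOIN emp_pay p ON p.row_id = i.row_id;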

Mixed fragmentation is the result of the successive application of both fragmentation techniques.

Rules for Fragmentation


1. Fragments are formed by the select predicates associated with dominant database transactions. The predicates specify attribute values used in the conjunctive (AND) and disjunctive (OR) forms of select commands, and rows (records) containing the same values form fragments.

2. Fragments must be disjoint, and their union must reconstruct the whole table. Overlapping fragments are too difficult to analyze and implement.

3. The largest fragment is the whole table; the smallest fragment is a single record. Fragments should be designed to maintain a balance between these extremes.

Data Distribution
Data distribution defines the constraints under which data allocation strategies may operate. These constraints are determined by the system architecture and the available network database management software. The four basic data distribution approaches are:
* Centralized
In the centralized database approach, all the data are located at a single site. The implementation
of this approach is simple. However, the size of the database is limited by the availability of the
secondary storage at the central site. Furthermore, the database may become unavailable from any of
the remote sites when communication failures occur, and the database system fails totally when the
central site fails.
* Partitioned
In this approach, the database is partitioned by tables, and each table is assigned to a particular
site. This strategy is particularly appropriate where local secondary storage is limited compared to
the database size, the reliability of the centralized database is not sufficient, or operating efficiencies
can be gained through the exploitation of the locality of references in database accesses.
* Replicated
The replicated data distribution strategy allocates a complete copy of the database to each site in
the network. This completely redundant distributed data strategy is particularly appropriate when
reliability is critical, the database is small, and update inefficiency can be tolerated.
* Hybrid
The hybrid data distribution strategy partitions the database into critical and non-critical
tables. Non-critical tables need only be stored once, while critical tables are duplicated as desired to
meet the required level of reliability.

Distributed Database Requirements
Database Description

1. Conceptual schema (ER diagram)

2. Transactions: functions and data accessed

Configuration Information

1. Sources of data—where data can be located.

2. Sinks of data—where user transactions can be initiated and data transferred.

3. Transaction rate (frequency) and volume (data flow).

4. Processing capability at each site—CPU and I/O capability (speed).

5. Security—data ownership (who can update) and access authorization (who can query) for each transaction.

6. Recovery—estimated frequency and volume of backup operations.

7. Integrity—referential integrity, concurrency control, journaling, overhead, etc.

Constraints

1. Network topology: Ethernet, token ring, ATM

2. Processing capability needed at each site.

3. Channel (link) transmission capacity.

4. Availability—related to mean-time-between-failures (MTBF) and mean-time-to-repair (MTTR).
Objective Functions

1. Response time as a function of transaction size.

2. Total system cost—communications, local I/O, CPU time, disk space.

The General Data Allocation Problem
Given
1. The application system specifications:
- A database global schema.
- A set of user transactions and their frequencies.
- Security, i.e. data ownership (who can update) and access authorization (who can query)
for each transaction.
- Recovery, estimated frequency and volume of backup operations.
2. The distributed system configuration and software:
- The network topology, network channel capacities, and network control mechanism.
- The site locations and their processing capacity (CPU and I/O processing).
- Sources of data (where data can be located), and sinks of data (where user transactions can
be initiated and data transferred).
- The transaction processing options and synchronization algorithms.
- The unit costs for data storage, local site processing, and communications.
Find
the allocation of programs and database tables to sites that minimizes C, the total cost:
C = Ccomm + Cproc + Cstor
where:
Ccomm = communications cost for messages and data.
Cproc = site processing cost (CPU and I/O).
Cstor = storage cost for data and programs at sites.

subject to possible additional constraints on:

* Transaction response time, which is the sum of communication delays, local processing, and all resource queuing delays.

* Transaction availability, which is the percentage of time the transaction executes with all components available.
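
For illustration, suppose a candidate allocation yields the following hypothetical unit costs and volumes (not from any measured system):

    Ccomm = 2,000 messages/day x $0.01/message = $20/day
    Cproc = 5,000 I/Os/day x $0.002/I/O = $10/day
    Cstor = 500 MB x $0.01/MB/day = $5/day
    C = 20 + 10 + 5 = $35/day

An alternative allocation that replicates a heavily queried table might lower Ccomm but raise Cstor and the update portion of Cproc; the goal is the allocation with the smallest total C that still meets the response-time and availability constraints.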
