Distributed Database Overview
A distributed database is a collection of data whose parts are under the control of separate DBMSs running on independent computer systems. The computers are interconnected, and each system has autonomous processing capability for serving local applications. Each system also participates in the execution of one or more global applications, which require data from more than one site. The distributed nature of the database is hidden from users, and this transparency manifests itself in a number of ways. Although a distributed DBMS offers a number of advantages, it also raises a number of problems and implementation issues. Finally, data in a distributed DBMS can be partitioned, replicated, or both.
http://www.compapp.dcu.ie/databases/f449.html
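To illustrate partitioning and replication, the sketch below places the rows of a table across sites by horizontal partitioning (hashing a key into fragments) and then stores each fragment on one or more sites. The site names, the hash-based fragmenting rule, and the `replication_factor` parameter are assumptions made for this example; real systems may partition by value ranges or predicates and choose replica sites differently.

```python
from collections import defaultdict

SITES = ["dublin", "cork", "galway"]  # hypothetical site names


def fragment_of(key: int, n_fragments: int) -> int:
    """Horizontal partitioning: each row belongs to exactly one fragment."""
    return key % n_fragments


def replica_sites(fragment: int, replication_factor: int) -> list[str]:
    """Replication (assumed rule): fragment i is stored on sites i, i+1, ... wrapping round."""
    return [SITES[(fragment + k) % len(SITES)] for k in range(replication_factor)]


def place(rows: dict[int, str], n_fragments: int, replication_factor: int) -> dict[str, list[int]]:
    """Return, for each site, the sorted keys of the rows it stores."""
    stored = defaultdict(set)
    for key in rows:
        for site in replica_sites(fragment_of(key, n_fragments), replication_factor):
            stored[site].add(key)
    return {site: sorted(keys) for site, keys in stored.items()}


rows = {1: "Ann", 2: "Bob", 3: "Cy", 4: "Dee"}
# Partitioned only: every row is stored at exactly one site.
print(place(rows, n_fragments=3, replication_factor=1))
# {'cork': [1, 4], 'galway': [2], 'dublin': [3]}
# Partitioned and replicated: every fragment is stored at two sites.
print(place(rows, n_fragments=3, replication_factor=2))
# {'cork': [1, 3, 4], 'galway': [1, 2, 4], 'dublin': [2, 3]}
```

With `replication_factor=1` the data is purely partitioned; setting `n_fragments=1` with `replication_factor=len(SITES)` would give full replication instead.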
Transparency
- Location transparency
- Replication transparency
- Performance transparency
- Transaction transparency
- Catalog transparency
The distributed database should look like a centralised system to its users; the problems introduced by distribution arise only at the internal level.
Advantages
- Capacity and incremental growth
- Reliability and availability
- Efficiency and flexibility
- Sharing
Problems and implementation issues
- Distributed query optimisation
- Distributed update propagation
- Distributed concurrency control
- Distributed catalog management
Catalog management strategies
- Centralised: keep one master copy of the catalog.
- Fully replicated: keep one copy of the catalog at each site.
- Partitioned: partition and replicate the catalog as usage patterns demand.
- Centralised/partitioned: a combination of the above.
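The trade-off between these strategies can be sketched by asking, for each one, which sites hold a copy of a given catalog entry and therefore which sites can resolve a lookup without contacting another site. The site names, the master site, the relations' home sites, and the usage-driven replicas below are hypothetical, and reading the combined strategy as "each relation's home site plus a master copy" is an assumption made for illustration.

```python
# Hypothetical configuration: three sites, one of which holds the master catalog.
SITES = ["A", "B", "C"]
MASTER = "A"

# Site that stores each relation's data (hypothetical).
HOME = {"employee": "B", "project": "C"}

# Extra catalog replicas pushed to sites that use an entry often (assumed usage pattern).
USAGE_REPLICAS = {"employee": {"C"}}


def catalog_holders(strategy: str, relation: str) -> set[str]:
    """Sites holding a copy of the catalog entry for `relation` under `strategy`."""
    if strategy == "centralised":
        return {MASTER}
    if strategy == "fully_replicated":
        return set(SITES)
    if strategy == "partitioned":
        return {HOME[relation]} | USAGE_REPLICAS.get(relation, set())
    if strategy == "centralised_partitioned":
        return {MASTER, HOME[relation]}
    raise ValueError(f"unknown strategy: {strategy}")


def lookup_is_local(strategy: str, relation: str, site: str) -> bool:
    """True if `site` can resolve `relation` without contacting another site."""
    return site in catalog_holders(strategy, relation)


for strategy in ["centralised", "fully_replicated", "partitioned", "centralised_partitioned"]:
    local = [s for s in SITES if lookup_is_local(strategy, "employee", s)]
    print(f"{strategy}: 'employee' resolvable locally at {local}")
```

Full replication makes every lookup local but means every catalog change must reach every site; a single master copy is the reverse, with cheap updates but remote lookups from all other sites.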
Semantic query optimisation
Semantic query optimisation uses constraints declared on the database schema to simplify queries or to avoid executing them. Consider a query that retrieves the names of employees who earn more than their supervisors. Suppose the schema has a constraint stating that no employee can earn more than their supervisor. If the semantic query optimiser checks for the existence of this constraint, it need not execute the query at all, because the answer must be empty. This can save considerable time if the applicable constraints can be found efficiently; however, searching through many constraints for the ones that apply to a given query can itself be time consuming.
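A minimal sketch of this idea, using Python's built-in `sqlite3` module: before running the query, the optimiser checks whether the schema declares a constraint that makes the query's condition unsatisfiable, and if so returns the empty answer without executing anything. The `employee` table, its columns, the constraint identifier, and the way each query is tagged with the constraint that contradicts it are all hypothetical; a real semantic optimiser would have to derive that contradiction by reasoning over the predicates, which is the search cost described above.

```python
import sqlite3

# Hypothetical identifier for the schema rule "no employee can earn more than their supervisor".
NO_EMPLOYEE_OUTEARNS_SUPERVISOR = "employee.salary <= supervisor.salary"

# The query from the text, tagged (an assumption of this sketch) with the
# constraint that would make its WHERE clause unsatisfiable.
QUERY = {
    "sql": """
        SELECT e.name
        FROM employee AS e JOIN employee AS s ON e.supervisor_id = s.id
        WHERE e.salary > s.salary
    """,
    "contradicted_by": NO_EMPLOYEE_OUTEARNS_SUPERVISOR,
}


def semantic_execute(conn, query, schema_constraints):
    """Return (names, executed); skip execution when a constraint proves the answer empty."""
    if query["contradicted_by"] in schema_constraints:
        return [], False
    return [row[0] for row in conn.execute(query["sql"])], True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, "
    "salary INTEGER, supervisor_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ann", 100, None), (2, "Bob", 80, 1), (3, "Cy", 90, 1)],
)

print(semantic_execute(conn, QUERY, set()))  # ([], True): the query ran and found nothing
print(semantic_execute(conn, QUERY, {NO_EMPLOYEE_OUTEARNS_SUPERVISOR}))  # ([], False): never ran
```

Both calls return the same empty answer; the difference is that the second one does no join at all.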
Timestamping
Timestamping is a method of concurrency control in which every transaction is given a timestamp, a unique date/time/site combination, and the database management system uses one of a number of protocols to schedule transactions that require access to the same piece of data. Although timestamping is more complex to implement than locking, it avoids deadlock altogether: no transaction waits for a lock held by another, so no cycle of waiting transactions can form. Conflicts are instead resolved by timestamp order, typically by rolling back and restarting the transaction that would violate it.
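One such protocol, basic timestamp ordering, can be sketched as follows. Each data item remembers the largest timestamps that have read and written it, and an operation arriving "too late" aborts its transaction instead of waiting. As an assumption of this sketch, a timestamp is a (local counter, site id) pair standing in for the date/time/site combination, so the site id breaks ties between sites; the `Item`, `read`, `write`, and `Abort` names are hypothetical.

```python
import itertools
from dataclasses import dataclass

Timestamp = tuple[int, int]  # (local counter, site id); compared lexicographically


class Abort(Exception):
    """The operation would violate timestamp order; the transaction must restart."""


def timestamp_source(site_id: int):
    """Issue unique timestamps at one site: a local counter paired with the site id."""
    counter = itertools.count(1)
    return lambda: (next(counter), site_id)


@dataclass
class Item:
    value: int = 0
    read_ts: Timestamp = (0, 0)   # largest timestamp that has read the item
    write_ts: Timestamp = (0, 0)  # largest timestamp that has written the item


def read(item: Item, ts: Timestamp) -> int:
    # A younger transaction has already overwritten the value this one should see.
    if ts < item.write_ts:
        raise Abort(f"read by {ts} rejected: item written by {item.write_ts}")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item: Item, ts: Timestamp, value: int) -> None:
    # A younger transaction has already read or written the item.
    if ts < item.read_ts or ts < item.write_ts:
        raise Abort(f"write by {ts} rejected: read_ts={item.read_ts}, write_ts={item.write_ts}")
    item.value = value
    item.write_ts = ts


site1, site2 = timestamp_source(1), timestamp_source(2)
t1, t2 = site1(), site2()  # (1, 1) is older than (1, 2)

x = Item()
read(x, t2)  # the younger transaction reads x first
try:
    write(x, t1, 5)  # the older write arrives too late: T1 is rolled back, it never waits
except Abort as err:
    print(err)  # write by (1, 1) rejected: read_ts=(1, 2), write_ts=(0, 0)
write(x, t2, 7)
print(x)  # Item(value=7, read_ts=(1, 2), write_ts=(1, 2))
```

Because a conflicting transaction is aborted rather than blocked, no transaction ever holds resources while waiting on another, which is why this scheme cannot deadlock; the price is the restarts.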