Distributed Databases
Andrew Mwaura
Lecture 1
* Distributed Databases - CSC451 1
Outline
⚫ Motivation of Distributed Databases
⚫ Distributed Data Processing vs Distributed Database
⚫ Distributed Databases
⚫ Partitioning
⚫ Replication
⚫ Advantages and Disadvantages of Distributed Databases
Motivation of Distributed Databases
Distributed Database - A User’s View
Distributed Database - Reality
Peer-to-Peer Databases
Distributed Database Applications
⚫ Corporate MIS
⚫ Hotel Chains
⚫ Manufacturing
Distributed Processing vs Distributed Database
Definition (Distributed Data Processing)
⚫ Refers to the deployment of related computing tasks across two or more discrete computing systems, or “nodes”.
⚫ Nodes may or may not be geographically dispersed.
⚫ In the “client/server” model, a central unit (the “server”) stores the data and manages the nodes' access to it.
⚫ The nodes act on the data, typically using locally loaded software.
Distributed Processing…
⚫ This puts the data under the control of end users, subject to restrictions imposed by the server, i.e. control is distributed.
Distributed Processing…
⚫ Early distributed data processing systems worked with a single centralized database.
⚫ Smaller local systems then started to store local databases as well as do local data processing.
⚫ The central database could be distributed to local processors as long as the following could all be assured and maintained:
  ⚫ Accurate updating of data
  ⚫ Integrity of data
  ⚫ Sharing of data
  ⚫ Central administrative controls
Distributed Processing
Why the increase in DDP?
⚫ Dramatically reduced workstation costs
⚫ Improved user interfaces and desktop power
⚫ Ability to share data across multiple servers
Distributed Database
Definition (DDB)
⚫ A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
Distributed Database…
Definition (D-DBMS)
⚫ A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes the distribution transparent to users.
Distributed Database System (DDBS) = DDB + D-DBMS
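As a rough illustration of what "transparent to users" means, the sketch below uses a hypothetical `DDBMS` class whose catalogue records which site stores each table. The class names, site names, and table names are invented for this example and do not describe any real product.

```python
class Site:
    """One local DBMS, modelled here as a dict of table name -> list of rows."""

    def __init__(self, name):
        self.name = name
        self.tables = {}


class DDBMS:
    """Hypothetical D-DBMS: keeps a catalogue of where each table lives."""

    def __init__(self):
        self.catalogue = {}  # table name -> Site that stores it

    def create_table(self, table, site):
        site.tables[table] = []
        self.catalogue[table] = site

    def insert(self, table, row):
        # The caller names only the table; the catalogue picks the site.
        self.catalogue[table].tables[table].append(row)

    def select(self, table):
        return list(self.catalogue[table].tables[table])


nairobi, mombasa = Site("nairobi"), Site("mombasa")
db = DDBMS()
db.create_table("bookings", nairobi)
db.create_table("suppliers", mombasa)
db.insert("bookings", {"id": 1, "guest": "A. Otieno"})
db.insert("suppliers", {"id": 7, "name": "Coast Linen"})
print(db.select("bookings"))  # the user never mentions a site
```

The user-facing calls (`insert`, `select`) take only a table name; where the data physically lives is known only to the catalogue.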
Distributed Database…
⚫ Consists of a collection of data with different parts under the control of separate DBMSs running on independent computer systems.
⚫ Some parts (partitioned database) or copies (replicated database) are physically stored at one location, while other parts or copies are stored and maintained at other locations.
⚫ All the computers are interconnected, and each system has autonomous processing capability for serving local applications.
Distributed Database
⚫ Data in a distributed DBMS can be (a) partitioned, (b) replicated, or (c) both.
Distributed Database
(a) Data partitioning
⚫ In a distributed DBMS a relational table may be broken up into two or more non-overlapping partitions or slices.
⚫ A table may be broken up horizontally (by rows), vertically (by columns), or a combination of both.
⚫ Partitions may in turn be replicated, which complicates concurrency control and catalogue management.
⚫ Partitioning is transparent to users.
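A minimal sketch of horizontal and vertical partitioning, using plain Python lists of dicts in place of a relational table. The `employees` data and the helper names are made up for illustration. Note one assumption: the vertical partitions here repeat the key column (`id`) so that the original rows can be rejoined.

```python
employees = [
    {"id": 1, "name": "Amina", "branch": "Nairobi", "salary": 5000},
    {"id": 2, "name": "Brian", "branch": "Mombasa", "salary": 4200},
    {"id": 3, "name": "Carol", "branch": "Nairobi", "salary": 6100},
]


def horizontal_partition(rows, column):
    """Split rows into non-overlapping groups by the value of one column."""
    parts = {}
    for row in rows:
        parts.setdefault(row[column], []).append(row)
    return parts


def vertical_partition(rows, key, columns):
    """Keep only the chosen columns, plus the key needed to rejoin later."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def rejoin(left, right, key):
    """Rebuild full rows from two vertical partitions that share a key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


by_branch = horizontal_partition(employees, "branch")
public = vertical_partition(employees, "id", ["name", "branch"])
payroll = vertical_partition(employees, "id", ["salary"])

# Horizontal: the partitions together hold every row exactly once.
assert sum(len(p) for p in by_branch.values()) == len(employees)
# Vertical: joining on the key gives back the original rows.
assert rejoin(public, payroll, "id") == employees
```

In a real D-DBMS each partition would be stored at a different site (for example, each branch holding its own rows), and the system would do the union or join on the user's behalf.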
Distributed Database
(b) Data replication
⚫ In a distributed DBMS a relational table or a partition may be replicated (copied), and the copies may be distributed across the sites of the database.
⚫ Replication complicates update propagation and concurrency control.
⚫ Replication is transparent to users.
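The update-propagation problem can be seen in a small sketch. It assumes a deliberately simplified scheme, chosen here for illustration and not taken from the slides: every write is applied to each copy that is currently online, and reads go to the first online copy. The `Replica` class and site names are hypothetical.

```python
class Replica:
    """One copy of a table, held at a single site."""

    def __init__(self, site):
        self.site = site
        self.rows = {}      # primary key -> row
        self.online = True


def write_all_available(replicas, key, row):
    """Apply the update at every online copy.

    Returns the sites that missed the update and are now stale.
    """
    missed = []
    for replica in replicas:
        if replica.online:
            replica.rows[key] = dict(row)
        else:
            missed.append(replica.site)
    return missed


def read_any(replicas, key):
    """Read from the first online copy, so one failed site does not block reads."""
    for replica in replicas:
        if replica.online:
            return replica.rows.get(key)
    raise RuntimeError("no replica is reachable")


copies = [Replica("nairobi"), Replica("mombasa"), Replica("kisumu")]
write_all_available(copies, 1, {"room": 101, "status": "free"})

copies[1].online = False                      # mombasa goes down
stale = write_all_available(copies, 1, {"room": 101, "status": "booked"})
print(stale)                                  # ['mombasa']
print(read_any(copies, 1)["status"])          # booked
print(copies[1].rows[1]["status"])            # free (out of date)
```

When the failed site comes back it still holds the old value, so a real system needs some way to bring it up to date before serving reads from it. That reconciliation step is what makes replication hard.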
Distributed Database
Advantages
⚫ Capacity and incremental growth
⚫ Reliability and availability
⚫ Improved performance
⚫ Improved shareability and local autonomy
Distributed Database
Capacity and incremental growth
⚫ As the organisation grows, new sites can be added with little or no change to the DBMS.
⚫ Growth does not require hardware or software upgrades that affect the entire database.
Reliability and availability
⚫ Even when a portion of the system (e.g. a local site) is down, the overall system remains available.
⚫ With replicated data, if one site fails the data can still be accessed from a copy at another site.
⚫ The greater accessibility enhances the reliability of the system.
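A back-of-the-envelope calculation shows why replication helps availability. It rests on two assumptions that are not in the slides: each site is up independently of the others, and each is up with the same probability. The 0.95 figure is an arbitrary example value.

```python
def availability(p_site_up, copies):
    """Probability that at least one of `copies` replicas is reachable,
    assuming sites fail independently with the same uptime probability."""
    return 1 - (1 - p_site_up) ** copies


for n in (1, 2, 3):
    print(n, round(availability(0.95, n), 6))
```

Under these assumptions, data held at a single 95%-available site is reachable 95% of the time, while two copies raise that to 99.75% and three to about 99.99%. Real failures are often correlated (for example, a shared network link), so actual gains may be smaller.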
Distributed Database
Improved performance
⚫ Data is physically stored close to the anticipated point of use.
⚫ If usage patterns change, data can be dynamically moved or replicated to where it is most needed.
Improved shareability and local autonomy
⚫ Users at a given site can access data stored at other sites while retaining control over the data at their own site.
Distributed Database
Disadvantages …
⚫ Increased cost
⚫ Complexity
⚫ Security
⚫ Integrity control may be difficult
Distributed Database
Increased cost
⚫ Implementation is costly and complex
⚫ Maintenance costs of the system are higher
⚫ Large communication overhead in coordinating messages between the different sites
Security
⚫ Security risk is high
⚫ Needs secure network communication
Distributed Database
Complexity
⚫ A distributed system that hides its distributed nature from the end user is more complex than a centralised system.
⚫ The parallel nature of the system means that errors are harder to avoid, and errors in applications are difficult to pinpoint.
Integrity control may be difficult
⚫ The validity and consistency of stored data must be maintained.
⚫ The communication and processing cost required to enforce integrity constraints may be significant in some situations.
Summary
⚫ Motivation of Distributed Databases
⚫ Distributed Data Processing vs Distributed Database
⚫ Distributed Databases
⚫ Partitioning
⚫ Replication
⚫ Advantages and Disadvantages of Distributed Databases