Distributed Databases and Client-Server Architectures
Chapter 25-2
Distributed Database Concepts
A distributed database system processes a unit of execution (a transaction) in
a distributed manner. That is, a transaction can be executed
by multiple networked computers in a unified manner.
It can be defined as follows:
A distributed database (DDB) is a collection of multiple
logically related databases distributed over a computer
network, and a distributed database management system (DDBMS)
is a software system that manages a distributed
database while making the distribution transparent to
the user.
A distributed database is under the control of a central database management system (DBMS),
but its storage devices are not all attached to a common CPU. It may be stored on multiple
computers in the same physical location, or dispersed over a network of interconnected computers.
Distributed Database System
Advantages
1. Management of distributed data with different
levels of transparency: the physical placement of
data (files, relations, etc.) is not known to the
user (distribution transparency).
[Figure: Sites 1, 2, 3, and 5]
Distributed Database System
Advantages
The EMPLOYEE, PROJECT, and WORKS_ON tables may be
fragmented horizontally and stored with possible replication as shown
below.
Chicago (headquarters):
    EMPLOYEES - All
    PROJECTS - All
    WORKS_ON - All
New York data:
    EMPLOYEES - New York
    PROJECTS - All
    WORKS_ON - New York employees
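Horizontal fragmentation keeps whole rows and splits a table by a selection predicate. The following is a minimal sketch, assuming a hypothetical `location` attribute and made-up rows (neither appears in the slides), and assuming the headquarters site stores every row while the New York fragment stores only its own.

```python
# Hypothetical EMPLOYEE rows; the values and the "location" attribute are
# illustrative assumptions, not data from the slides.
EMPLOYEE = [
    {"ssn": "111", "name": "A", "location": "New York"},
    {"ssn": "222", "name": "B", "location": "Chicago"},
    {"ssn": "333", "name": "C", "location": "New York"},
]


def horizontal_fragment(rows, predicate):
    """Return the whole rows that satisfy the predicate (a selection)."""
    return [row for row in rows if predicate(row)]


# Headquarters keeps all rows; the New York fragment keeps only local rows.
chicago_hq = horizontal_fragment(EMPLOYEE, lambda row: True)
new_york = horizontal_fragment(
    EMPLOYEE, lambda row: row["location"] == "New York"
)
```

Because every row of `new_york` also appears in `chicago_hq`, this layout combines fragmentation with replication.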
Allocation schema
It describes how fragments are distributed to the sites of a distributed
database. The database can be fully replicated, partially replicated, or
partitioned.
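One way to picture an allocation schema is as a mapping from each fragment to the set of sites that store it. The sketch below is only an illustration: the fragment names, site names, and the `classify` helper are hypothetical.

```python
def classify(allocation, all_sites):
    """Classify an allocation as 'full', 'partitioned', or 'partial'."""
    copies = list(allocation.values())
    if all(sites == all_sites for sites in copies):
        return "full"         # every fragment is stored at every site
    if all(len(sites) == 1 for sites in copies):
        return "partitioned"  # exactly one copy of each fragment
    return "partial"          # some fragments replicated at some sites


# Hypothetical allocation schema: fragment name -> sites holding a copy.
sites = {"site1", "site2", "site3"}
allocation = {
    "EMPLOYEE_NY": {"site1", "site2"},
    "EMPLOYEE_CHI": {"site1"},
    "PROJECT": {"site1", "site2", "site3"},
}
kind = classify(allocation, sites)
```

Here `kind` is `"partial"`, since only some fragments have more than one copy.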
Data Fragmentation
branch_name   customer_name   tuple_id
Hillside      Lowman          1
Hillside      Camp            2
Valleyview    Camp            3
Valleyview    Kahn            4
Hillside      Kahn            5
Valleyview    Kahn            6
Valleyview    Green           7
deposit1 = Π branch_name, customer_name, tuple_id (employee_info)

account_number   balance   tuple_id
A-305            500       1
A-226            336       2
A-177            205       3
A-402            10000     4
A-155            62        5
A-408            1123      6
A-639            750       7
deposit2 = Π account_number, balance, tuple_id (employee_info)
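Both fragments carry tuple_id, so the original relation can be rebuilt by joining them on that attribute. A minimal sketch of that reconstruction, using the rows listed above; the function name is a hypothetical helper.

```python
# Rows of the two vertical fragments; tuple_id is the last field of each.
deposit1 = [
    ("Hillside", "Lowman", 1), ("Hillside", "Camp", 2),
    ("Valleyview", "Camp", 3), ("Valleyview", "Kahn", 4),
    ("Hillside", "Kahn", 5), ("Valleyview", "Kahn", 6),
    ("Valleyview", "Green", 7),
]
deposit2 = [
    ("A-305", 500, 1), ("A-226", 336, 2), ("A-177", 205, 3),
    ("A-402", 10000, 4), ("A-155", 62, 5), ("A-408", 1123, 6),
    ("A-639", 750, 7),
]


def join_on_tuple_id(left, right):
    """Rebuild full rows by matching the trailing tuple_id of each side."""
    right_by_id = {row[-1]: row[:-1] for row in right}
    return [
        row[:-1] + right_by_id[row[-1]] + (row[-1],)
        for row in left
        if row[-1] in right_by_id
    ]


# Columns: branch_name, customer_name, account_number, balance, tuple_id
employee_info = join_on_tuple_id(deposit1, deposit2)
```

Without tuple_id the join would be ambiguous, because neither fragment shares any other attribute with the other.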
Data Fragmentation, Replication and Allocation
Data Replication
In full replication the entire database is replicated at every site;
in partial replication selected parts are replicated at some of the
sites. Data replication is specified through a replication schema.
[Figure: sites running Oracle on Linux (Site 2, Site 3)]
Types of Distributed Database Systems
Heterogeneous
Federated: Each site may run a different database system, but data access is managed
through a single conceptual schema. This implies that the degree of local autonomy is
minimal: each site must adhere to a centralized access policy. There may be a global
schema.
Multidatabase: There is no single global conceptual schema. A schema for data access is
constructed dynamically, as needed by the application software.
[Figure: networked Linux sites (Site 2, Site 3) running different DBMSs: object-oriented and relational]
Query Processing in Distributed Databases
Department at site 2: 100 rows, row size = 35 bytes, table size = 100 * 35 = 3,500 bytes.
Attributes: Dname, Dnumber, Mgrssn, Mgrstartdate
Strategies (Employee is stored at site 1; the result is needed at site 3):
1. Transfer Employee and Department to site 3. Total bytes transferred
= 1,000,000 + 3,500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute the join at site 2, and send the
result to site 3. Query result size = 40 * 10,000 = 400,000 bytes.
Total bytes transferred = 400,000 + 1,000,000 = 1,400,000 bytes.
Query Processing in Distributed Databases
Strategies (continued):
3. Transfer Department to site 1, execute the join at site 1,
and send the result to site 3. Total bytes transferred = 400,000 +
3,500 = 403,500 bytes.
Optimization criterion: minimizing data transfer.
Preferred approach: strategy 3.
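The comparison can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the byte counts quoted in the strategies; the variable names are illustrative.

```python
EMPLOYEE_BYTES = 1_000_000   # Employee relation, shipped from site 1
DEPARTMENT_BYTES = 3_500     # Department relation, shipped from site 2
RESULT_BYTES = 40 * 10_000   # join result: 10,000 tuples of 40 bytes

# Bytes transferred by each strategy when the result is needed at site 3.
strategies = {
    1: EMPLOYEE_BYTES + DEPARTMENT_BYTES,  # ship both relations to site 3
    2: EMPLOYEE_BYTES + RESULT_BYTES,      # join at site 2, ship result
    3: DEPARTMENT_BYTES + RESULT_BYTES,    # join at site 1, ship result
}
best = min(strategies, key=strategies.get)
```

Under the stated criterion (minimum bytes transferred), `best` is strategy 3 at 403,500 bytes.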
Query Processing in Distributed Databases
The result of this query (Q') will have 100 tuples, assuming that every
department has a manager. The execution strategies are:
Strategies:
1. Transfer Employee and Department to the result site and perform
the join at site 3. Total bytes transferred = 1,000,000 + 3,500 =
1,003,500 bytes.
2. Transfer Employee to site 2, execute the join at site 2, and send the
result to site 3. Query result size = 40 * 100 = 4,000 bytes. Total
bytes transferred = 4,000 + 1,000,000 = 1,004,000 bytes.
3. Transfer Department to site 1, execute the join at site 1, and
send the result to site 3. Total bytes transferred = 4,000 + 3,500 =
7,500 bytes.
Query Processing in Distributed Databases
Possible strategies when the result is needed at site 2:
1. Transfer Employee to site 2, execute the query, and present
the result to the user at site 2. Total bytes transferred = 1,000,000 bytes
for both queries Q and Q'.
2. Transfer Department to site 1, execute the join at site 1, and
send the result back to site 2. Total bytes transferred for Q = 400,000 +
3,500 = 403,500 bytes and for Q' = 4,000 + 3,500 = 7,500 bytes.
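The same arithmetic, parameterized by the size of the query result, shows how the two strategies compare for Q and Q' when site 2 is the result site. This is a sketch with illustrative names; the byte counts are the ones quoted above.

```python
EMPLOYEE_BYTES = 1_000_000
DEPARTMENT_BYTES = 3_500
RESULT_BYTES = {"Q": 40 * 10_000, "Q'": 40 * 100}


def transfer_costs(result_bytes):
    """Bytes moved per strategy when the result is needed at site 2."""
    return {
        # Strategy 1: ship Employee to site 2 and join there.
        1: EMPLOYEE_BYTES,
        # Strategy 2: ship Department to site 1, join, ship result back.
        2: DEPARTMENT_BYTES + result_bytes,
    }


costs = {name: transfer_costs(size) for name, size in RESULT_BYTES.items()}
```

For both queries strategy 2 moves fewer bytes, and the gap is far larger for Q' because its result is only 4,000 bytes.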
Concurrency Control and Recovery
Distributed databases encounter a number of concurrency control and
recovery problems that are not present in centralized databases.
Some of them are listed below.
Primary site technique
[Figure: Sites 1, 2, 3, and 5, with one site designated as the primary site]
Concurrency Control and Recovery
Transaction management: Concurrency control and commit are
managed by the primary site. In two-phase locking, the primary site manages
locking and releasing of data items. If all transactions follow the two-phase
policy at all sites, then serializability is guaranteed.
Advantages: It is an extension of centralized two-phase locking, so
implementation and management are simple. Data items are locked at only
one site, but they can be accessed at any site.
Disadvantages: All transaction management activities go to the primary
site, which is likely to overload it. If the primary site fails, the
entire system is inaccessible.
To aid recovery, a backup site is designated that behaves as a shadow
of the primary site. If the primary site fails, the backup site can act as
the primary site.
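A rough way to picture the primary site's role is a single lock table that every transaction, from any site, must consult. The class below is a simplified, hypothetical sketch (no waiting queues, deadlock handling, or backup site), not an actual DBMS interface.

```python
class PrimarySiteLockManager:
    """Hypothetical central lock table kept at the primary site."""

    def __init__(self):
        self._locks = {}  # item -> (mode, set of transaction ids)

    def acquire(self, txn, item, mode):
        """Try to lock item in mode 'S' (shared) or 'X' (exclusive)."""
        held = self._locks.get(item)
        if held is None:
            self._locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # shared locks are compatible
            return True
        if holders == {txn}:
            # The sole holder may re-request or upgrade its own lock.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self._locks[item] = (new_mode, holders)
            return True
        return False  # conflict: the caller must wait or abort

    def release_all(self, txn):
        """Simplified shrinking phase: drop all of txn's locks at once."""
        for item in list(self._locks):
            _, holders = self._locks[item]
            holders.discard(txn)
            if not holders:
                del self._locks[item]


manager = PrimarySiteLockManager()
first_shared = manager.acquire("T1", "x", "S")
second_shared = manager.acquire("T2", "x", "S")
blocked_upgrade = manager.acquire("T2", "x", "X")
manager.release_all("T1")
upgrade_after_release = manager.acquire("T2", "x", "X")
```

Releasing all locks together at the end is one simple way to respect the two-phase rule, since no lock is acquired after any lock has been released.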
Concurrency Control and Recovery
Primary Copy Technique: In this approach, instead of a site, a data
item partition is designated as the primary copy. To lock a data item,
only its primary copy is locked.
Advantages: Since primary copies are distributed across various sites, no
single site is overloaded with locking and unlocking requests.
Disadvantages: Identifying the primary copy is complex. A
distributed directory must be maintained, possibly at all sites.
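The directory lookup can be sketched as a mapping from each data item to the site that holds its primary copy, with lock requests routed accordingly. The item names, site names, and helper function below are hypothetical.

```python
# Hypothetical directory: data item -> site holding its primary copy.
PRIMARY_COPY_DIRECTORY = {
    "EMPLOYEE_NY": "site2",
    "EMPLOYEE_CHI": "site1",
    "PROJECT": "site1",
}


def route_lock_requests(items):
    """Group lock requests by the site holding each item's primary copy."""
    by_site = {}
    for item in items:
        site = PRIMARY_COPY_DIRECTORY[item]
        by_site.setdefault(site, []).append(item)
    return by_site


routed = route_lock_requests(["EMPLOYEE_NY", "PROJECT", "EMPLOYEE_CHI"])
```

The requests spread across sites rather than converging on one, which is the load-balancing advantage; the cost is that every site needs an up-to-date copy of the directory.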
Client-Server Database Architecture
It consists of clients running client software, a set of servers that
provide all database functionality, and a reliable communication
infrastructure.
[Figure: Servers 1, 2, ..., n and Clients 1, 2, 3, ..., n]
Client-Server Database Architecture
Clients reach the server for a desired service, but the server does not reach out to clients.
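The request direction can be illustrated with a connected socket pair inside one process: the client sends a request first, and the server only answers. This is a toy sketch; the table contents, function names, and the one-message protocol are assumptions made for illustration.

```python
import socket
import threading


def serve_once(conn, tables):
    """Hypothetical server: wait for one request, then send one reply."""
    with conn:
        # A single recv is enough only because the messages here are tiny.
        name = conn.recv(1024).decode()
        rows = tables.get(name, [])
        conn.sendall(repr(rows).encode())


def request(conn, name):
    """Hypothetical client: initiate the exchange and read the reply."""
    with conn:
        conn.sendall(name.encode())
        return conn.recv(1024).decode()


# socketpair() stands in for the communication infrastructure.
client_end, server_end = socket.socketpair()
tables = {"DEPARTMENT": [("Research", 5)]}  # made-up sample data
server = threading.Thread(target=serve_once, args=(server_end, tables))
server.start()
reply = request(client_end, "DEPARTMENT")
server.join()
```

The server thread blocks in `recv` until a client request arrives; it never initiates contact on its own.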
NAVATHE - CHAPTER - Distributed Database
1. What are the main reasons for and potential advantages of distributed databases?
2. When are voting and elections used in distributed databases?