Student Notes

The document discusses distributed database systems and their architecture. It describes distributed database systems as collections of logically related databases distributed over a computer network and managed by a distributed database management system (DBMS). The ANSI/SPARC architecture model defines three views of data - the external view for users, internal view for the system, and conceptual view for the enterprise. Client/server systems separate functions into server functions like data management and client functions like applications. Peer-to-peer systems have no centralized server and require local schemas to describe data at each site along with a global conceptual schema. The major components of a distributed DBMS are the user processor and data processor.
DISTRIBUTED DATA SYSTEMS - SSZG554

Module – 2: Distributed DBMS Architecture

Module Structure
Distributed DBMS Architecture
• Distributed DBMS
• Distributed DBMS Architecture
• Distributed Data Sources
• Distributed Design Issues

Distributed Database System


• We define a distributed database as a collection of multiple, logically interrelated databases
distributed over a computer network.
• A distributed database management system (distributed DBMS) is then defined as the
software system that permits the management of the distributed database and makes the
distribution transparent to the users.
• Sometimes “distributed database system” (DDBS) is used to refer jointly to the distributed
database and the distributed DBMS.
• The two important terms in these definitions are
– logically interrelated
– distributed over a computer network

ANSI/SPARC Architecture
• In late 1972, the Computer and Information Processing Committee (X3) of the American
National Standards Institute (ANSI) established a Study Group on Database Management
Systems under the auspices of its Standards Planning and Requirements Committee (SPARC).
• The mission of the study group was to study the feasibility of setting up standards in this
area, and to determine which aspects should be standardized if it was feasible.
• The study group proposed that the interfaces be standardized, and defined an architectural
framework that contained 43 interfaces, 14 of which would deal with the physical storage
subsystem of the computer and therefore not be considered essential parts of the DBMS
architecture.

• In a simplified version of the ANSI/SPARC architecture there are three views of data:
– The external view, which is that of the end user, who might be a programmer;
– The internal view, that of the system or machine; and
– The conceptual view, that of the enterprise.

Internal Schema
• At the lowest level of the architecture is the internal view, which deals with the physical
definition and organization of data.
• The location of data on different storage devices and the access mechanisms used to reach
and manipulate data are the issues dealt with at this level.
External Schema
• At the other extreme is the external view, which is concerned with how users view the
database.
• An individual user’s view represents the portion of the database that will be accessed by that
user as well as the relationships that the user would like to see among the data.
• A view can be shared among a number of users, with the collection of user views making up
the external schema.
Conceptual Schema
• In between these two ends is the conceptual schema, which is an abstract definition of the
database.
• It is the “real world” view of the enterprise being modeled in the database.

Client/Server Systems
• The general idea is very simple and elegant: distinguish the functionality that needs to be
provided and divide these functions into two classes: server functions and client functions.
• This provides a two-level architecture, which makes it easier to manage the complexity of
modern DBMSs and the complexity of distribution.
• In relational systems, the server does most of the data management work.

• This means that all of query processing and optimization, transaction management, and
storage management are done at the server.
• The client, in addition to the application and the user interface, has a DBMS client module
that is responsible for managing the data that is cached to the client and (sometimes)
managing the transaction locks that may have been cached as well.
• It is also possible to place consistency checking of user queries at the client side, but this is
not common since it requires the replication of the system catalog at the client machines.
• In relational systems, the communication between the clients and the server(s) is at
the level of SQL statements.
• There are a number of different types of client/server architecture.
• The simplest is the case where there is only one server, which is accessed by multiple clients;
we call this multiple client/single server.
• From a data management perspective, this is not much different from centralized databases
since the database is stored on only one machine (the server) that also hosts the software to
manage it.
• A more sophisticated client/server architecture is one where there are multiple servers in
the system: the so-called multiple client/multiple server approach.

• In this case, two alternative management strategies are possible: either each client manages
its own connection to the appropriate server, or each client knows of only its “home server”,
which then communicates with other servers as required.
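
The two management strategies for the multiple client/multiple server case can be sketched as follows. This is only a minimal illustration, not a real DBMS client: the Server, DirectClient and HomeServerClient classes, the table names and the rows are all hypothetical, and a "query" is reduced to fetching a whole table.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A hypothetical database server holding a few named tables."""
    name: str
    tables: dict = field(default_factory=dict)   # table name -> list of rows
    peers: list = field(default_factory=list)    # other servers it can contact

    def execute(self, table):
        """Answer locally if possible, otherwise ask a peer server."""
        if table in self.tables:
            return self.tables[table]
        for peer in self.peers:
            if table in peer.tables:
                return peer.tables[table]
        raise KeyError(f"no server holds table {table!r}")


class DirectClient:
    """Strategy 1: the client manages its own connection to the right server."""
    def __init__(self, directory):
        self.directory = directory               # table name -> Server

    def query(self, table):
        return self.directory[table].tables[table]


class HomeServerClient:
    """Strategy 2: the client knows only its home server."""
    def __init__(self, home):
        self.home = home

    def query(self, table):
        return self.home.execute(table)


s1 = Server("s1", {"emp": [("E1", "Ann")]})
s2 = Server("s2", {"proj": [("P1", "CAD")]})
s1.peers.append(s2)

direct = DirectClient({"emp": s1, "proj": s2})
via_home = HomeServerClient(s1)
```

Both clients return the same rows for "proj"; the difference is where the knowledge of data location lives. The direct client needs a directory of servers, while the home-server client stays simple and leaves the forwarding to the server.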

Peer-to-Peer Systems
• The physical data organization on each machine may be, and probably is, different.
• This means that there needs to be an individual internal schema definition at each site,
which we call the local internal schema (LIS).
• The enterprise view of the data is described by the global conceptual schema (GCS), which is
global because it describes the logical structure of the data at all the sites.
• To handle data fragmentation and replication, the logical organization of data at each site
needs to be described.
• Therefore, there needs to be a third layer in the architecture, the local conceptual schema
(LCS).

• In the architectural model we have chosen, then, the global conceptual schema is the union
of the local conceptual schemas.
• Finally, user applications and user access to the database are supported by external schemas.
• The user queries data irrespective of its location or of which local component of the
distributed database system will service it.
• The distributed DBMS translates global queries into a group of local queries, which are
executed by distributed DBMS components at different sites that communicate with one
another.
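
The translation of a global query into a group of local queries can be illustrated with a small sketch. It assumes a single global EMP relation that is horizontally fragmented across two sites, so that the global relation is the union of the local fragments. The site names, the rows and the function names are hypothetical, and the "sites" are just in-memory lists rather than communicating DBMS components.

```python
# Hypothetical sites, each holding one horizontal fragment of a global EMP relation.
SITES = {
    "site_a": [{"eno": 1, "name": "Ann", "city": "Paris"}],
    "site_b": [{"eno": 2, "name": "Bob", "city": "Delhi"},
               {"eno": 3, "name": "Cy", "city": "Delhi"}],
}


def local_query(site, predicate):
    """Run the selection against one site's fragment (a local query)."""
    return [row for row in SITES[site] if predicate(row)]


def global_query(predicate):
    """Decompose a global selection into one local query per site and union the results."""
    result = []
    for site in SITES:
        result.extend(local_query(site, predicate))
    return sorted(result, key=lambda row: row["eno"])


delhi = global_query(lambda row: row["city"] == "Delhi")
```

The caller of global_query never names a site, which is the location independence described above. A real distributed DBMS would also use the schemas to skip sites whose fragments cannot contain matching rows.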

The first major component, which we call the user processor, consists of four elements:
1. The user interface handler is responsible for interpreting user commands as they come in, and
formatting the result data as it is sent to the user.
2. The semantic data controller uses the integrity constraints and authorization that are defined as
part of the global conceptual schema to check if the user query can be processed.

3. The global query optimizer and decomposer determines an execution strategy to minimize a cost
function, and translates the global queries into local ones using the global and local conceptual
schemas as well as the global directory.
The global query optimizer is responsible, among other things, for generating the best strategy to
execute distributed join operations.
4. The distributed execution monitor coordinates the distributed execution of the user request.
The execution monitor is also called the distributed transaction manager.
In executing queries in a distributed fashion, the execution monitors at various sites may, and usually
do, communicate with one another.

The second major component of a distributed DBMS is the data processor, which consists of
three elements:
1. The local query optimizer, which actually acts as the access path selector, is responsible for
choosing the best access path to access any data item.
2. The local recovery manager is responsible for making sure that the local database remains
consistent even when failures occur.
3. The run-time support processor physically accesses the database according to the physical
commands in the schedule generated by the query optimizer.
The run-time support processor is the interface to the operating system and contains the database
buffer (or cache) manager, which is responsible for maintaining the main memory buffers and
managing the data accesses.
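
As a rough illustration of what "choosing the best access path" means for the local query optimizer, the sketch below compares two candidate paths with a toy cost model. The function name, the two path names and the cost formulas are assumptions made up for this example; real optimizers use far more detailed statistics.

```python
def choose_access_path(table_rows, matching_rows, has_index):
    """Pick the cheaper of a full scan and an index lookup using a toy cost model."""
    # Assumed cost of a full scan: every row of the table is read once.
    costs = {"full_scan": table_rows}
    if has_index:
        # Assumed cost of an index lookup: a small fixed traversal cost
        # plus one fetch per matching row.
        costs["index_lookup"] = 3 + matching_rows
    return min(costs, key=costs.get)
```

With these assumed costs, a selective predicate on an indexed column favors the index, while a predicate that matches nearly every row makes the full scan the cheaper path.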

Multidatabase System

• Multidatabase systems (MDBS) represent the case where individual DBMSs (whether
distributed or not) are fully autonomous and have no concept of cooperation; they may not
even “know” of each other’s existence or how to talk to each other.
• The differences in the level of autonomy between the distributed multi-DBMSs and
distributed DBMSs are also reflected in their architectural models.

• In the case of logically integrated distributed DBMSs, the global conceptual schema defines
the conceptual view of the entire database, while in the case of distributed multi-DBMSs, it
represents only the collection of some of the local databases that each local DBMS wants to
share.
• The individual DBMSs may choose to make some of their data available for access by others
by defining an export schema.
• In an MDBS, the GCS (which is also called a mediated schema) is defined by integrating either
the external schemas of local autonomous databases or (possibly parts of their) local
conceptual schemas.
• Designing the global conceptual schema in multidatabase systems involves the integration of
either the local conceptual schemas or the local external schemas.
• A major difference between the design of the GCS in multi-DBMSs and in logically integrated
distributed DBMSs is that in the former the mapping is from local conceptual schemas to a
global schema, whereas in the latter the mapping runs in the reverse direction.

• If heterogeneity exists in the multidatabase system, a canonical data model has to be found
to define the GCS.
• If heterogeneity exists in the system, then two implementation alternatives exist: unilingual
and multilingual.
• A unilingual multi-DBMS requires the users to utilize possibly different data models and
languages when both a local database and the global database are accessed.
• The identifying characteristic of unilingual systems is that any application that accesses data
from multiple databases must do so by means of an external view that is defined on the
global conceptual schema.
• This means that the user of the global database is effectively a different user than those who
access only a local database, utilizing a different data model and a different data language.
• An alternative is multilingual architecture, where the basic philosophy is to permit each user
to access the global database (i.e., data from other databases) by means of an external
schema, defined using the language of the user’s local DBMS.
• A popular implementation architecture for MDBSs is the mediator/wrapper approach.

• A mediator is a software module that exploits encoded knowledge about certain sets or
subsets of data to create information for a higher layer of applications.
• Using this architecture to implement an MDBS, each module in the multi-DBMS layer is
realized as a mediator.
• Since mediators can be built on top of other mediators, it is possible to construct a layered
implementation.
• In mapping this architecture to the datalogical view, the mediator level implements the
GCS.
• It is this level that handles user queries over the GCS and performs the MDBS functionality.
• The mediators typically operate using a common data model and interface language.
• To deal with potential heterogeneities of the source DBMSs, wrappers are implemented
whose task is to provide a mapping between a source DBMS’s view and the mediators’ view.
• The exact role and function of mediators differ from one implementation to another.
• In some cases, thin mediators have been implemented that do nothing more than
translation.
• In other cases, wrappers take over the execution of some of the query functionality.
• One can view the collection of mediators as a middleware layer that provides services above
the source systems.
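
The mediator/wrapper idea can be sketched in a few lines. The example assumes two made-up sources with different native formats; each wrapper maps its source's view to a common model (here, dictionaries with "eno" and "name" keys), and a mediator answers queries over that common model. All class names and data are hypothetical.

```python
class CsvWrapper:
    """Hypothetical wrapper over a source that stores rows as comma-separated strings."""
    def __init__(self, lines):
        self.lines = lines

    def rows(self):
        # Map the source's view (raw strings) to the mediator's common model (dicts).
        for line in self.lines:
            eno, name = line.split(",")
            yield {"eno": int(eno), "name": name}


class TupleWrapper:
    """Hypothetical wrapper over a source that stores rows as (name, eno) tuples."""
    def __init__(self, tuples):
        self.tuples = tuples

    def rows(self):
        for name, eno in self.tuples:
            yield {"eno": eno, "name": name}


class Mediator:
    """Answers queries over the common model by asking every component below it."""
    def __init__(self, components):
        self.components = components

    def rows(self):
        # Exposing rows() lets a mediator be stacked under another mediator.
        for component in self.components:
            yield from component.rows()

    def select(self, predicate):
        return [row for row in self.rows() if predicate(row)]


lower = Mediator([CsvWrapper(["1,Ann", "2,Bob"]), TupleWrapper([("Cy", 3)])])
upper = Mediator([lower])
names = [row["name"] for row in upper.select(lambda row: row["eno"] >= 2)]
```

Because a mediator exposes the same rows() interface as a wrapper, the upper mediator can sit on top of the lower one, which is the layered implementation mentioned above.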

Distributed Data Sources


• A data source is simply the source of the data.
• It can be a file, a particular database on a DBMS, or even a live data feed.
• The data might be located on the same computer as the program, or on another computer
somewhere on a network.
• The purpose of a data source is to gather all of the technical information needed to access
the data — the driver name, network address, network software, and so on — into a single
place and hide it from the user.
• The user should be able to look at a list that includes Payroll, Inventory, and Personnel,
choose Payroll from the list, and have the application connect to the payroll data, all without
knowing where the payroll data resides or how the application got to it
• There are two types of data sources:
– machine data sources, and
– file data sources.
• Although both contain similar information about the source of the data, they differ in the
way this information is stored.
• Because of these differences, they are used in somewhat different manners.

Machine data sources

• Machine data sources are stored on the system with a user-defined name.
• Associated with the data source name is all of the information the Driver Manager and
driver need to connect to the data source.
• For an Xbase data source, this might be the name of the Xbase driver, the full path of the
directory containing the Xbase files, and some options that tell the driver how to use those
files, such as single-user mode or read-only.
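
Conceptually, a machine data source is a lookup from a user-defined name to the stored connection details. The sketch below models that lookup with a plain dictionary. It does not reflect how any particular system stores this information; the driver name, directory path and option names are invented for illustration.

```python
# Hypothetical registry of machine data sources, keyed by user-defined name.
MACHINE_DATA_SOURCES = {
    "Payroll": {
        "Driver": "Hypothetical Xbase Driver",
        "Directory": "/data/payroll",   # assumed path to the Xbase files
        "ReadOnly": "1",
    },
}


def lookup_data_source(name):
    """Return the stored connection attributes for a machine data source name."""
    try:
        return MACHINE_DATA_SOURCES[name]
    except KeyError:
        raise LookupError(f"data source {name!r} is not defined on this system") from None
```

An application would pass only the name "Payroll"; everything else needed to reach the data comes from the stored entry.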

File data sources

• File data sources are stored in a file and allow connection information to be used repeatedly
by a single user or shared among several users.
• When a file data source is used, the Driver Manager makes the connection to the data
source using the information in a .dsn file.
• This file can be manipulated like any other file. A file data source does not have a data
source name, as does a machine data source, and is not registered to any one user or
machine.
• Data sources usually are created by the end user or a technician with a program called
the ODBC Administrator.
• The ODBC Administrator prompts the user for the driver to use and then calls that driver.
• The driver displays a dialog box that requests the information it needs to connect to the data
source.
• After the user enters the information, the driver stores it on the system.
• Later, the application calls the Driver Manager and passes it the name of a machine data
source or the path of a file containing a file data source.

• When passed a machine data source name, the Driver Manager searches the system to find
the driver used by the data source.
• It then loads the driver and passes the data source name to it. The driver uses the data
source name to find the information it needs to connect to the data source.
• Finally, it connects to the data source, typically prompting the user for a user ID and
password, which generally are not stored.
• When passed a file data source, the Driver Manager opens the file and loads the specified
driver.
• If the file also contains a connection string, it passes this to the driver.
• Using the information in the connection string, the driver connects to the data source.
• If no connection string was passed, the driver generally prompts the user for the necessary
information.
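
The file data source case can be sketched with Python's standard configparser module. The example assumes the .dsn file uses an INI-style layout with an [ODBC] section of keyword=value pairs, and it only builds a connection string; it does not call a real Driver Manager. The driver name, server name and the connection_string_from_dsn function are hypothetical.

```python
import configparser

# Assumed contents of a file data source; no user ID or password is stored.
EXAMPLE_DSN = """\
[ODBC]
DRIVER=Hypothetical Payroll Driver
SERVER=db.example.test
DATABASE=Payroll
"""


def connection_string_from_dsn(text):
    """Turn the key/value pairs of a file data source into a connection string."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str          # keep keyword case as written in the file
    parser.read_string(text)
    return ";".join(f"{key}={value}" for key, value in parser["ODBC"].items())


conn_str = connection_string_from_dsn(EXAMPLE_DSN)
```

The resulting string contains the driver and the location of the data but no credentials, which matches the note above that the user is typically prompted for a user ID and password at connection time.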
