
DATA ARCHITECTURES IN DATABASES

INTRODUCTION
An organisation requires accurate and reliable data and an efficient
database system for effective decision-making.

To achieve this goal, the organisation maintains records for its varied
operations by building appropriate database models and by capturing
essential properties of the objects and the relationships among them. Users
of a database system in the organisation look for an abstract view of the
data they are interested in.

Furthermore, since the database is a shared resource, each user may require
a different view of the data held in it. Therefore, one of the main
aims of a database system is to provide users with an abstract view of data,
hiding certain details of how data is stored and manipulated. To satisfy
these needs, we need to develop an architecture for the database system. The
database architecture is a framework in which the structure of the DBMS is
described.

The DBMS architecture has evolved from early centralised monolithic
systems to modern distributed DBMS systems with modular design.
Large centralised mainframe computers have been replaced by hundreds of
distributed workstations and personal computers connected via
communication networks.

In the early systems, the whole DBMS package was a single, tightly
integrated system, whereas the modern DBMS is based on a client-server
architecture. Under the client-server architecture, the majority of the
users of the DBMS are not present at the site of the database system but
are connected to it through a network. The database system runs on server
machines, whereas remote database users work on client machines (which are
typically workstations or personal computers).

The client-server architecture is explained in detail below.


Database applications are usually partitioned into a two-tier architecture
or a three-tier architecture, as shown in Fig. 1.

In a two-tier architecture, the application is partitioned into a component
that resides at the client machine, which invokes database system
functionality at the server machine through query language statements.
Application program interface standards are used for interaction between
the client and the server.

Fig. 1. Database system architectures

In a three-tier architecture, the client machine acts merely as a front end
and does not contain any direct database calls. Instead, the client
communicates with an application server, usually through a forms interface.

The application server in turn communicates with a database system to
access data. The business logic of the application, which says what actions
to carry out and under what conditions, is embedded in the application
server instead of being distributed across multiple clients.

Three-tier architectures are more appropriate for large applications and for
applications that run on the World Wide Web (WWW).
Not every database system can be fitted or matched to a particular
framework. Also, no single framework can be said to be the only possible
one for defining database architecture. However, a generalised architecture
of the database system, which fits most systems reasonably well, is
discussed here.

2.2. SCHEMAS, SUBSCHEMA AND INSTANCES

When the database is designed to meet the information needs of an
organisation, the plan (or scheme) of the database and the actual data to
be stored in it become the most important concerns of the organisation.

It is important to note that the data in the database changes frequently,
while the plans remain the same over long periods of time (although not
necessarily forever). The database plans consist of types of entities that a
database deals with, the relationships among these entities and the ways in
which the entities and relationships are expressed from one level of
abstraction to the next level for the users’ view.

The users’ view of the data (also called logical organisation of data) should
be in a form that is most convenient for the users and they should not be
concerned about the way data is physically organised. Therefore, a DBMS
should do the translation between the logical (users’ view) organisation and
the physical organisation of the data in the database.

2.2.1. Schema
The plan (or formulation of the scheme) of the database is known as the
schema. The schema gives the names of the entities and attributes and
specifies the relationships among them. It is a framework into which the
values of the data items (or fields) are fitted.

The plan or format of the schema remains the same, but the values fitted
into this format change from instance to instance. In other words, a schema
is an overall plan of all the data item (field) types and record types
stored in a database.

The schema includes the definition of the database name, the record types
and the components that make up those records.
Let us look at Fig. 1.23 and assume that it is the sales record database of
M/s ABC, a manufacturing company. The structure of the database,
consisting of three files (or tables), namely PRODUCT, CUSTOMER and
SALES, is the schema of the database.

A database schema corresponds to the variable declarations (along with
associated type definitions) in a program. Fig. 2.2 shows a schema diagram
for the database structure shown in Fig. 1.23.
The schema diagram displays the structure of each record type but not the
actual instances of records. Each object in the schema, for example
PRODUCT, CUSTOMER or SALES, is called a schema construct.

Fig. 1.23. Schema diagram for database of M/s ABC Company

(a) Schema diagram for sales record database

(b) Schema defined using database language
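Such a database-language definition of a schema can be sketched in SQL DDL. The sketch below (SQLite through Python) is only an illustration: the three table names follow the text, but the column names are assumptions, not taken from the figure.

```python
import sqlite3

# The schema: the names and structure of the tables, not their contents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (
    PROD_ID   INTEGER PRIMARY KEY,
    PROD_NAME TEXT,
    PRICE     REAL
);
CREATE TABLE CUSTOMER (
    CUST_ID   INTEGER PRIMARY KEY,
    CUST_NAME TEXT,
    CUST_CITY TEXT
);
CREATE TABLE SALES (
    SALE_ID  INTEGER PRIMARY KEY,
    PROD_ID  INTEGER REFERENCES PRODUCT(PROD_ID),
    CUST_ID  INTEGER REFERENCES CUSTOMER(CUST_ID),
    QTY      INTEGER
);
""")

# The system catalog records the schema constructs.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['CUSTOMER', 'PRODUCT', 'SALES']
```

Note that nothing about the stored data appears here; the schema is pure structure, queryable from the system catalog (`sqlite_master` in SQLite).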


Fig. 2.3 shows the schema diagram and the relationships for another
example, the purchasing system of M/s KLY System.

The purchasing system schema has five records (or objects), namely
PURCHASE-ORDER, SUPPLIER, PURCHASE-ITEM, QUOTATION and
PART. Solid arrows connecting different blocks show the relationships
among the objects. For example, the PURCHASE-ORDER record is
connected to the PURCHASE-ITEM records of which that purchase order is
composed, and the SUPPLIER record to the QUOTATION records showing
the parts that the supplier can provide, and so forth. The dotted arrows
show the cross-references between attributes (or data items) of different
objects or records.

Fig. 2.3. Schema diagram for database of M/s KLY System

(a) Schema diagram of purchasing system database


(b) Schema defined using database language
(c) Schema relationship diagram
As can be seen in Fig. 2.3 (c), the duplication of attributes is avoided
using relationships and cross-referencing. For example, the attributes SUP-
NAME, SUP-ADD and SUP-DETAILS are included in a separate SUPPLIER
record and not in the PURCHASE-ORDER record.

Similarly, attributes such as PART-NAME, PART-DETAILS and QTY-ON-HAND
are included in a separate PART record and not in the PURCHASE-ITEM
record.

Thus, the duplication of including PART-DETAILS and SUPPLIER details in
every PURCHASE-ITEM is avoided. With the help of relationships and
cross-referencing, the records are linked appropriately with each other to
complete the information, and data is located quickly.
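The cross-referencing described above corresponds to foreign keys in SQL. A minimal sketch (SQLite through Python; the table and column names loosely follow the figure but are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supplier details are stored once, in SUPPLIER ...
CREATE TABLE SUPPLIER (
    SUP_NO   INTEGER PRIMARY KEY,
    SUP_NAME TEXT,
    SUP_ADD  TEXT
);
-- ... and PURCHASE_ORDER only cross-references them by SUP_NO,
-- instead of duplicating the supplier attributes in every order.
CREATE TABLE PURCHASE_ORDER (
    ORDER_NO INTEGER PRIMARY KEY,
    SUP_NO   INTEGER NOT NULL REFERENCES SUPPLIER(SUP_NO)
);
""")
conn.execute("INSERT INTO SUPPLIER VALUES (1, 'Acme', '12 Main St')")
conn.execute("INSERT INTO PURCHASE_ORDER VALUES (100, 1)")

# Following the cross-reference completes the information:
row = conn.execute("""
    SELECT po.ORDER_NO, s.SUP_NAME
    FROM PURCHASE_ORDER po JOIN SUPPLIER s ON po.SUP_NO = s.SUP_NO
""").fetchone()
print(row)  # (100, 'Acme')
```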

The database system can have several schemas, partitioned according to the
levels of abstraction. In general, a schema can be categorised into two
parts: (a) a logical schema and (b) a physical schema.

The logical schema is concerned with exploiting the data structures offered
by a DBMS to make the scheme understandable to the computer. The
physical schema, on the other hand, deals with the manner in which the
conceptual database is represented in the computer as a stored database.

The logical schema is the most important, as programs use it to construct
applications. The physical schema is hidden beneath the logical schema and
can usually be changed easily without affecting application programs.
DBMSs provide a data definition language (DDL) and a data storage
definition language (DSDL) to make the specification of both the logical
and the physical schema easy for the DBA.

2.2.2. Subschema
A subschema is a subset of the schema and inherits the same properties
that a schema has. The plan (or scheme) for a view is often called a
subschema. A subschema refers to an application programmer's (user's)
view of the data item types and record types which he or she uses. It gives
the user a window through which he or she can view only that part of the
database which is of interest. In other words, a subschema defines the
portion of the database as "seen" by the application programs that actually
produce the desired information from the data contained within the
database. Therefore, different application programs can have different
views of the data. Fig. 2.4 shows the subschemas viewed by two different
application programs, derived from the example of Fig. 2.3.
Fig. 2.4. Subschema views of two applications programs

(a) Subschema for first application program

(b) Subschema for second application program


As shown in Fig. 2.4, the SUPPLIER-MASTER record of the first application
program {Fig. 2.4 (a)} now contains additional attributes such as SUP-
NAME and SUP-ADD from the SUPPLIER record of Fig. 2.3, and the
PURCHASE-ORDER-DETAILS record contains additional attributes such as
PART-NAME, SUP-NAME and PRICE from the two records PART and
SUPPLIER respectively. Similarly, the ORDER-DETAILS record of the
second application program {Fig. 2.4 (b)} contains additional attributes
such as SUP-NAME and QTY-ORDRD from the two records SUPPLIER and
PURCHASE-ITEM respectively.
Individual application programs can change their respective subschemas
without affecting the subschema views of others. The DBMS software
derives the subschema data requested by application programs from the
schema data. The database administrator (DBA) ensures that the subschema
requested by an application program is derivable from the schema.

The application programs are not concerned with the physical organisation
of data. The physical organisation of data in the database can change
without affecting the application programs; in other words, when the
physical organisation of data changes, application programs written
against a subschema need not be changed or modified. Subschemas also act
as a unit for enforcing controlled access to the database; for example, a
user of a subschema can be barred from updating a certain value in the
database but allowed to read it. Further, the subschema can be made the
basis for controlling concurrent operations on the database. A subschema
definition language (SDL) is used to specify a subschema in the DBMS. The
nature of this language depends upon the data structure on which a DBMS
is based and also upon the host language within which the DBMS facilities
are used. The subschema is sometimes referred to as an LVIEW or logical
view. Many different subschemas can be derived from one schema.
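In SQL terms a subschema corresponds closely to a view. The sketch below (SQLite through Python; names and the "sensitive" attribute are assumptions) shows a window that exposes only part of a record, and through which updates are barred while reads succeed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (
    SUP_NO   INTEGER PRIMARY KEY,
    SUP_NAME TEXT,
    SUP_BAL  REAL      -- sensitive attribute, hidden from the subschema
);
-- The subschema: the window this application sees.
CREATE VIEW SUPPLIER_MASTER AS
    SELECT SUP_NO, SUP_NAME FROM SUPPLIER;
""")
conn.execute("INSERT INTO SUPPLIER VALUES (1, 'Acme', 5000.0)")

# Reading through the subschema works ...
rows = conn.execute("SELECT * FROM SUPPLIER_MASTER").fetchall()
print(rows)  # [(1, 'Acme')]

# ... but updating through it is refused (SQLite views are read-only).
try:
    conn.execute("UPDATE SUPPLIER_MASTER SET SUP_NAME = 'X' WHERE SUP_NO = 1")
except sqlite3.OperationalError as e:
    print("update refused:", e)
```

In larger DBMSs the same effect is usually achieved with views plus explicit access privileges; SQLite simply makes such views non-updatable.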

2.2.3. Instances

When the schema framework is filled in with data item values, the contents
of the database at that point in time (the current contents) are referred to
as an instance of the database. An instance is also called a state of the
database, or a snapshot. Each variable has a particular value at a given
instant. The values of the variables in a program at a point in time
correspond to an instance of a database schema, as shown in Fig. 2.5.

Fig. 2.5. Instance of the database of M/s ABC Company


(a) Instance of the PRODUCT relation

(b) Instance of the CUSTOMER relation

(c) Instance of the SALES relation

The difference between a database schema and a database state (or
instance) is very distinct. The database schema is specified to the DBMS
when a new database is defined, and at this point the corresponding
database state is empty, with no data in the database. Once the database is
first populated with initial data, we get a new database state whenever an
update operation is applied. At any point of time, the current state of the
database is called the instance.
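The distinction can be seen directly in any DBMS; a sketch using SQLite from Python (table and value names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema is defined once ...
conn.execute("CREATE TABLE PRODUCT (PROD_ID INTEGER PRIMARY KEY, PROD_NAME TEXT)")

# ... at which point the corresponding state (instance) is empty.
def instance():
    """Current contents of the database: its state, or snapshot."""
    return conn.execute("SELECT * FROM PRODUCT ORDER BY PROD_ID").fetchall()

print(instance())  # []

# Each update operation yields a new database state; the schema is unchanged.
conn.execute("INSERT INTO PRODUCT VALUES (1, 'Bolt')")
print(instance())  # [(1, 'Bolt')]
conn.execute("INSERT INTO PRODUCT VALUES (2, 'Nut')")
print(instance())  # [(1, 'Bolt'), (2, 'Nut')]
```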
2.3. THREE-LEVEL ANSI-SPARC DATABASE ARCHITECTURE

In 1971, the Database Task Group (DBTG), appointed by the Conference on
Data Systems and Languages (CODASYL), produced the first proposal for a
general architecture for database systems. The DBTG proposed a two-tier
architecture, as shown in Fig. 2.1 (a), with a system view called the
schema and user views called subschemas. In 1975, ANSI-SPARC (American
National Standards Institute – Standards Planning and Requirements
Committee) produced a three-tier architecture with a system catalog. The
architecture of most commercial DBMSs available today is based to some
extent on the ANSI-SPARC proposal.

The ANSI-SPARC three-tier database architecture is shown in Fig. 2.6. It
consists of the following three levels:
 Internal level,
 Conceptual level,
 External level.

Fig. 2.6. ANSI-SPARC three-tier database structure


The view at each of the above levels is described by a scheme or schema. As
explained in Section 2.2, a schema is an outline or plan that describes the
records, attributes and relationships existing in the view. The terms view,
scheme and schema are used interchangeably. A data definition language
(DDL), as explained in Section 1.10.1, is used to define the conceptual and
external schemas. Structured query language (SQL) commands are used to
describe aspects of the physical (or internal) schema. Information about
the internal, conceptual and external schemas is stored in the system
catalog, as explained in Section 1.2.6.

Let us take the example of the CUSTOMER record of Fig. 2.2, as shown in
Fig. 2.7 (a). The integrated record definition of the CUSTOMER record is
shown in Fig. 2.7 (b). The data has been abstracted in three levels
corresponding to three views (namely internal, conceptual and external
views), as shown in Fig. 2.8. The lowest level of abstraction contains a
description of the actual method of storing data and is called the internal
view, as shown in Fig. 2.8 (c). The second level of abstraction is the
conceptual or global view, as shown in Fig. 2.8 (b). The third level is the
highest level of abstraction, seen by the user or application program, and
is called the external view or user view, as shown in Fig. 2.8 (a). The
conceptual view is the sum total of the users' external views of the data.
Fig. 2.7. CUSTOMER record definition

(a) CUSTOMER record

(b) Integrated record definition of CUSTOMER record

Fig. 2.8. Three views of the data

(a) Logical records

(b) Conceptual records


(c) Internal record

From Fig. 2.8, the following explanations can be derived:

 At the internal or physical level, as shown in Fig. 2.8 (c), customers
are represented by a stored record type called STORED-CUST, which
is 74 characters (or bytes) long. The record contains five fields or
data items, namely CUST-ID, CUST-NAME, CUST-STREET, CUST-CITY and
CUST-BAL, corresponding to five properties of customers.
 At the conceptual or global level, as shown in Fig. 2.8 (b), the
database contains information concerning an entity type called
CUSTOMER. Each individual customer has a CUST-ID (4 digits),
CUST-NAME (20 characters), CUST-STREET (40 characters), CUST-
CITY (10 characters) and CUST-BAL (8 digits).
 The user view 1 in Fig. 2.8 (a) has an external schema of the database
in which each customer is represented by a record containing two
fields or data items namely CUST-NAME and CUST-CITY. The other
three fields are of no interest to this user and have therefore been
omitted.
 The user view 2 in Fig. 2.8 (a) has an external schema of the database
in which each customer is represented by a record containing three
fields or data items namely CUST-ID, CUST-NAME and CUST-BAL.
The other two fields are of no interest to this user and have thus been
omitted.
 There is only one conceptual schema and one internal schema per
database.
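The two user views above can be sketched as SQL views defined over the one conceptual CUSTOMER relation (SQLite through Python; the sample row is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: one CUSTOMER entity type for the whole database.
CREATE TABLE CUSTOMER (
    CUST_ID     INTEGER PRIMARY KEY,
    CUST_NAME   TEXT,
    CUST_STREET TEXT,
    CUST_CITY   TEXT,
    CUST_BAL    REAL
);
-- External schemas: each user sees only the fields of interest.
CREATE VIEW USER_VIEW_1 AS SELECT CUST_NAME, CUST_CITY FROM CUSTOMER;
CREATE VIEW USER_VIEW_2 AS SELECT CUST_ID, CUST_NAME, CUST_BAL FROM CUSTOMER;
""")
conn.execute("INSERT INTO CUSTOMER VALUES (1001, 'Ali', '5 High St', 'Pune', 250.0)")

print(conn.execute("SELECT * FROM USER_VIEW_1").fetchall())  # [('Ali', 'Pune')]
print(conn.execute("SELECT * FROM USER_VIEW_2").fetchall())  # [(1001, 'Ali', 250.0)]
```

There is one conceptual schema (the CUSTOMER table) and any number of external schemas (the views), exactly as the bullet points state.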

2.3.1. Internal Level


The internal level is the physical representation of the database on the
computer; this view is found at the lowest level of abstraction of the
database. This level indicates how the data will be stored in the database
and describes the data structures, file structures and access methods to be
used by the database. It describes the way the DBMS and the operating
system perceive the data in the database. Fig. 2.8 (c) shows the internal
view record of a database. Just below the internal level there is the
physical-level data organisation, whose implementation is covered by the
internal level to achieve routine performance and storage space
utilisation. The internal schema defines the internal level (or view). The
internal schema contains the definition of the stored records, the method
of representing the data fields (or attributes), the indexing and hashing
schemes, and the access methods used. The internal level provides coverage
of the data structures and file organisations used to store data on
storage devices.
Essentially, the internal schema summarises how the relations described in
the conceptual schema are actually stored on secondary storage devices
such as disks and tapes. It interfaces with the operating system access
methods (also called file management techniques for storing and retrieving
data records) to place the data on the storage devices, build the indexes,
retrieve the data and so on. The internal level is concerned with the
following activities:
 Storage space allocation for data and storage structures.
 Record descriptions for storage, with stored sizes for data items.
 Record placement.
 Data compression and data encryption techniques.
The process of arriving at a good internal (or physical) schema is
called physical database design. The internal schema is written using SQL
or an internal data definition language (internal DDL).
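As an illustration of an internal-schema concern, the sketch below (SQLite through Python; table and index names are assumptions) adds an index, an access method, and asks the system for its access path. The conceptual relation itself is untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (CUST_ID INTEGER, CUST_NAME TEXT)")
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(1000)])

# An internal-schema decision: add an index, leaving the conceptual
# schema -- the CUSTOMER relation as users see it -- unchanged.
conn.execute("CREATE INDEX idx_cust_id ON CUSTOMER(CUST_ID)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM CUSTOMER WHERE CUST_ID = 500").fetchall()
print(plan)  # the access path now mentions idx_cust_id (a search, not a scan)
```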

2.3.2. Conceptual Level


The conceptual level is the middle level in the three-tier architecture. At
this level of database abstraction, all the database entities and the
relationships among them are included. The conceptual level provides the
community view of the database and describes what data is stored in the
database and the relationships among the data. It contains the logical
structure of the entire database as seen by the DBA. One conceptual view
represents the entire database of an organisation.

It is a complete view of the data requirements of the organisation that is
independent of any storage considerations. The conceptual schema defines
the conceptual view; it is also called the logical schema. There is only
one conceptual schema per database. Fig. 2.8 (b) shows the conceptual view
record of a database.


This schema contains the method of deriving the objects in the conceptual
view from the objects in the internal view. Conceptual level is concerned
with the following activities:
 All entities, their attributes and their relationships.
 Constraints on the data.
 Semantic information about the data.
 Checks to retain data consistency and integrity.
 Security information.
The conceptual level supports each external view, in that any data
available to a user must be contained in, or derivable from, the conceptual
level. However, this level must not contain any storage-dependent details.
For example, the description of an entity should contain only the data
types of attributes (for example, integer, real, character and so on) and
their length (such as the maximum number of digits or characters), but not
any storage considerations, such as the number of bytes occupied. The
choice of relations, and the choice of fields (or data items) for each
relation, is not always obvious. The process of arriving at a good
conceptual schema is called conceptual database design. The conceptual
schema is written using a conceptual data definition language (conceptual
DDL).
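A conceptual-schema sketch in this spirit (SQLite through Python; names, lengths and constraints are assumptions) declares types, lengths and integrity constraints, but says nothing about storage:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual schema: data types, lengths and integrity constraints --
# no storage detail such as bytes occupied, indexes, or record placement.
conn.execute("""
CREATE TABLE CUSTOMER (
    CUST_ID   INTEGER PRIMARY KEY,
    CUST_NAME TEXT NOT NULL CHECK (length(CUST_NAME) <= 20),
    CUST_BAL  REAL CHECK (CUST_BAL >= 0)
)""")

conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Ali', 100.0)")   # satisfies checks
try:
    conn.execute("INSERT INTO CUSTOMER VALUES (2, 'Bob', -5.0)")  # violates CUST_BAL check
except sqlite3.IntegrityError as e:
    print("rejected:", e)

n = conn.execute("SELECT COUNT(*) FROM CUSTOMER").fetchone()[0]
print(n)  # 1 -- only the valid row was stored
```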

2.3.3. External Level


The external level is the user's view of the database. This level is at the
highest level of data abstraction, where only those portions of the
database of concern to a user or application program are included. In other
words, this level describes the part of the database that is relevant to
the user. Any number of user views, even identical ones, may exist for a
given conceptual or global view of the database. Each user has a view of
the "real world" represented in a form that is familiar to that user. The
external view includes only those entities, attributes and relationships in
the "real world" that the user is interested in. Other entities, attributes
and relationships that are not of interest may be represented in the
database, but the user will be unaware of them. Fig. 2.8 (a) shows the
external or user view record of a database.

At the external level, different views may have different representations
of the same data. For example, one user may view dates in the form day,
month, year, while another may view them as year, month, day. Some views
might include derived or calculated data, that is, data not stored in the
database but created when needed. For example, the average age of
employees in an organisation may be derived or calculated from the
individual ages of all employees stored in the database. External views may
also include data combined or derived from several entities.

An external schema describes each external view. The external schema
consists of the definition of the logical records and the relationships in
the external view. It also contains the method of deriving the objects (for
example, entities, attributes and relationships) in the external view from
the objects in the conceptual view. External schemas allow data access to
be customised at the level of individual users or groups of users. Any
given database has exactly one internal (or physical) schema and one
conceptual schema, because it has just one set of stored relations, as
shown in Fig. 2.8 (a) and (b). But it may have several external schemas,
each tailored to a particular group of users, as shown in Fig. 2.8 (a). The
external schema is written using an external data definition language
(external DDL).

2.3.4. Advantages of Three-tier Architecture


The main objective of the three-tier database architecture is to isolate each
user’s view of the database from the way the database is physically stored
or represented. Following are the advantages of a three-tier database
architecture:
 Each user is able to access the same data but has a different
customised view of the data as per his or her own needs. Each user can
change the way he or she views the data, and this change does not
affect other users of the same database.
 The user is not concerned about the physical data storage details. The
user’s interaction with the database is independent of physical data
storage organisation.
 The internal structure of the database is unaffected by changes to the
physical storage organisation, such as changeover to a new storage
device.
 The database administrator (DBA) is able to change the database
storage structures without affecting the user’s view.
 The DBA is able to change the conceptual structure of the database
without affecting all users.

2.3.5. Characteristics of Three-tier Architecture


Table 2.1 shows the degree of abstraction, characteristics and type of
DBMS used for the three levels.
Table 2.1. Features of three-tier structure

Features ↓ / Level →        Physical level   Internal level   Conceptual level   External level

Degree of abstraction       Low              Medium           High               Medium

Characteristics             Hardware and software dependent at all four levels

Degree of knowledge required by the database designer using:
  Hierarchical DBMS         Attention required about physical-level details at all levels
  Network DBMS              Attention required about physical-level details at all levels
  Relational DBMS           No concern about physical-level details at any level

2.4. DATA INDEPENDENCE

Data independence (briefly discussed in Section 1.8.5 (b)) is a major
objective of implementing a DBMS in an organisation. It may be defined as
the immunity of application programs to changes in physical representation
and access techniques. Alternatively, data independence is the
characteristic of a database system that allows the schema at one level to
be changed without having to change the schema at the next higher level. In
other words, the application programs do not depend on any one particular
physical representation or access technique. This characteristic of a DBMS
insulates the application programs from changes in the way the data is
structured and stored. Data independence is achieved by the DBMS through
the use of the three-tier architecture of data abstraction. There are two
types of data independence, as shown in the mapping of the three-tier
architecture in Fig. 2.9.

1. Physical data independence.
2. Logical data independence.

Fig. 2.9. Mappings of three-tier architecture


2.4.1. Physical Data Independence

Immunity of the conceptual (or external) schemas to changes in the internal
schema is referred to as physical data independence. In physical data
independence, the conceptual schema insulates the users from changes in
the physical storage of the data. Changes to the internal schema, such as
using different file organisations or storage structures, using different
storage devices, or modifying indexes or hashing algorithms, must be
possible without changing the conceptual or external schemas. In other
words, physical data independence indicates that the physical storage
structures or devices used for storing the data can be changed without
necessitating a change in the conceptual view or any of the external views.
The change is absorbed by the conceptual/internal mapping, as discussed in
Section 2.5.1.
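A small sketch of physical data independence (SQLite through Python; names are assumptions): the internal schema changes, but the application's query text and its result do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (CUST_ID INTEGER, CUST_CITY TEXT)")
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)",
                 [(1, 'Pune'), (2, 'Delhi'), (3, 'Pune')])

# The application program: written against the conceptual schema only.
QUERY = "SELECT CUST_ID FROM CUSTOMER WHERE CUST_CITY = 'Pune' ORDER BY CUST_ID"
before = conn.execute(QUERY).fetchall()

# Internal-schema change: a new access structure (an index) is added.
conn.execute("CREATE INDEX idx_city ON CUSTOMER(CUST_CITY)")
after = conn.execute(QUERY).fetchall()

# The query and its result are unchanged; only the access path differs.
print(before == after, after)
```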

2.4.2. Logical Data Independence

Immunity of the external schemas (or application programs) to changes in
the conceptual schema is referred to as logical data independence. In
logical data independence, the users are shielded from changes in the
logical structure of the data or changes in the choice of relations to be
stored. Changes to the conceptual schema, such as the addition and
deletion of entities, attributes or relationships, must be possible without
changing existing external schemas or having to rewrite application
programs. Only the view definition and the mapping need be changed in a
DBMS that supports logical data independence. It is important that users
for whom the changes have been made are not affected. In other words, the
application programs that refer to the external schema constructs must
work as before after the conceptual schema undergoes a logical
reorganisation.
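A small sketch of logical data independence (SQLite through Python; names are assumptions): the conceptual schema gains an attribute, yet the external schema, a view, works unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CUST_ID INTEGER PRIMARY KEY, CUST_NAME TEXT);
-- External schema used by an existing application:
CREATE VIEW CUST_NAMES AS SELECT CUST_NAME FROM CUSTOMER;
""")
conn.execute("INSERT INTO CUSTOMER (CUST_ID, CUST_NAME) VALUES (1, 'Ali')")

# Conceptual-schema change: a new attribute is added ...
conn.execute("ALTER TABLE CUSTOMER ADD COLUMN CUST_BAL REAL DEFAULT 0")

# ... yet the existing external schema works exactly as before.
rows = conn.execute("SELECT * FROM CUST_NAMES").fetchall()
print(rows)  # [('Ali',)]
```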

2.5. MAPPINGS

The three schemas and their levels discussed in Section 2.3 are
descriptions of data; the data itself actually exists only in the physical
database. In the three-schema architecture, each user group refers only to
its own external schema. Hence, a user's request specified at the external
schema level must be transformed into a request at the conceptual schema
level, which in turn must be transformed into a request at the internal
schema level for final processing of data in the stored database. The
result of processing the user's request must then be reformatted to satisfy
the user's external view. The processes of transforming requests and
results between the three levels are called mappings. The database
management system (DBMS) is responsible for the mappings between the
internal, conceptual and external schemas.

The three-tier architecture of the ANSI-SPARC model provides the following
two-stage mappings, as shown in Fig. 2.9:
 Conceptual/Internal mapping
 External/Conceptual mapping
2.5.1. Conceptual/Internal Mapping
The conceptual schema is related to the internal schema through
the conceptual/internal mapping. The conceptual/internal mapping defines
the correspondence between the conceptual view and the stored database. It
specifies how conceptual records and fields are represented at the
internal level. It enables the DBMS to find the actual record or
combination of records in physical storage that constitute a logical record
in the conceptual schema, together with any constraints to be enforced on
the operations for that logical record. It also allows any differences in
entity names, attribute names, attribute orders, data types, and so on, to
be resolved. In case of any change in the structure of the stored database,
the conceptual/internal mapping is changed accordingly by the DBA, so that
the conceptual schema can remain invariant. Therefore, the effects of
changes to the database storage structure are isolated below the conceptual
level, in order to preserve physical data independence.

2.5.2. External/Conceptual Mapping

Each external schema is related to the conceptual schema by
the external/conceptual mapping. The external/conceptual mapping defines
the correspondence between a particular external view and the conceptual
view. It gives the correspondence among the records and relationships of
the external and conceptual views. It enables the DBMS to map names in
the user's view onto the relevant part of the conceptual schema. Any
number of external views can exist at the same time, any number of users
can share a given external view, and different external views can overlap.

There can be only one mapping between the conceptual and internal levels,
but several mappings between the external and conceptual levels. The
conceptual/internal mapping is the key to physical data independence,
while the external/conceptual mapping is the key to logical data
independence. Fig. 2.9 illustrates the three-tier ANSI-SPARC architecture
with mappings.

The information about the mappings among the various schema levels is
included in the system catalog of the DBMS. The DBMS uses additional
software to accomplish the mappings by referring to the mapping
information in the system catalog. When the schema is changed at some
level, the schema at the next higher level remains unchanged; only the
mapping between the two levels is changed. Thus, data independence is
accomplished. The two-stage mapping of the ANSI-SPARC three-tier
structure provides greater data independence but less efficient mapping.
However, ANSI-SPARC also allows the direct mapping of external schemas
onto the internal schema (bypassing the conceptual schema), which is more
efficient but provides reduced data independence (it is more
data-dependent).

2.6. STRUCTURE, COMPONENTS, AND FUNCTIONS OF DBMS

As discussed in Chapter 1, Section 1.5, a database management system
(DBMS) is highly complex and sophisticated software that handles access to
the database. The structure of a DBMS varies greatly from system to system
and, therefore, it is not possible to give a single generalised component
structure of a DBMS.
2.6.1. Structure of a DBMS
A typical structure of a DBMS with its components and relationships
between them is shown in Fig. 2.10. The DBMS software is partitioned into
several modules. Each module or component is assigned a specific
operation to perform. Some of the functions of the DBMS are supported by
operating systems (OS) to provide basic services and DBMS is built on top
of it. The physical data and system catalog are stored on a physical disk.
Access to the disk is controlled primarily by OS, which schedules disk
input/output. Therefore, while designing a DBMS its interface with the OS
must be taken into account.

Fig. 2.10. Structure of DBMS


2.6.2. Execution Steps of a DBMS
As shown in Fig. 2.10, conceptually, the following logical steps are followed
while executing a user's request to access the database system:
1. Users issue a query using a particular database language, for example,
SQL commands.
2. The parsed query is presented to a query optimiser, which uses
information about how the data is stored to produce an efficient
execution plan for evaluating the query.
3. The DBMS accepts the user's SQL commands and analyses them.
4. The DBMS produces a query evaluation plan by consulting the external
schema for the user, the corresponding external/conceptual mapping,
the conceptual schema, the conceptual/internal mapping, and the
storage structure definition. Thus, an evaluation plan is a blueprint for
evaluating a query.
5. The DBMS executes these plans against the physical database and
returns the answers to the users.
Using components such as the transaction manager, buffer manager, and
recovery manager, the DBMS supports concurrency and crash recovery by
carefully scheduling users' requests and maintaining a log of all changes to
the database.
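The five steps above can be sketched in miniature. The following Python toy (an invented table, invented fields, and a deliberately restricted `SELECT` syntax) only illustrates the parse, plan, and execute flow, not a real DBMS:

```python
# Toy sketch of the query lifecycle: parse -> plan -> execute.
# All names (tables, fields) are illustrative, not a real DBMS API.
database = {"employee": [{"name": "Ann", "dept": "HR"},
                         {"name": "Raj", "dept": "IT"}]}

def parse(sql):
    # Accept and analyse a very restricted
    # "SELECT <field> FROM <table>" command (steps 1 and 3).
    tokens = sql.split()
    return {"field": tokens[1], "table": tokens[3]}

def plan(query):
    # The "optimiser" produces an evaluation plan (steps 2 and 4);
    # here, simply a full scan of the stored records.
    return lambda: [row[query["field"]] for row in database[query["table"]]]

def execute(sql):
    # Run the plan against the physical data and return answers (step 5).
    return plan(parse(sql))()

print(execute("SELECT name FROM employee"))  # ['Ann', 'Raj']
```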

2.6.3. Components of a DBMS

As explained in Section 2.6.2, the DBMS accepts the SQL commands
generated from a variety of user interfaces, produces query evaluation
plans, executes these plans against the database, and returns the answers.
As shown in Fig. 2.10, the major software modules or components of a DBMS
are as follows:

1. Query processor: The query processor transforms users' queries into a
series of low-level instructions directed to the run time database
manager. It is used to interpret the online user’s query and convert it
into an efficient series of operations in a form capable of being sent to
the run time data manager for execution. The query processor uses
the data dictionary to find the structure of the relevant portion of the
database and uses this information in modifying the query and
preparing an optimal plan to access the database.

2. Run time database manager: The run time database manager is the
central software component of the DBMS, which interfaces with user-
submitted application programs and queries. It handles database
access at run time. It converts operations in users' queries, coming
directly via the query processor or indirectly via an application
program, from the user's logical view to a physical file system. It
accepts queries and examines the external and conceptual schemas to
determine what conceptual records are required to satisfy the user's
request. The run time database manager then places a call to the physical
database to perform the request. It enforces constraints to maintain
the consistency and integrity of the data, as well as its security. It also
performs backup and recovery operations. The run time database
manager is sometimes referred to as the database control system and
has the following components:
1. Authorization control: The authorization control module checks
that the user has necessary authorization to carry out the
required operation.
2. Command processor: The command processor processes the
queries passed by authorization control module.
3. Integrity checker: The integrity checker checks the necessary
integrity constraints for all the requested operations that
change the database.
4. Query optimizer: The query optimizer determines an optimal
strategy for the query execution. It uses information on how the
data is stored to produce an efficient execution plan for
evaluating the query.
5. Transaction manager: The transaction manager performs the
required processing of the operations it receives from transactions.
It ensures that (a) transactions request and release locks
according to a suitable locking protocol and (b) the execution of
transactions is scheduled appropriately.
6. Scheduler: The scheduler is responsible for ensuring that
concurrent operations on the database proceed without
conflicting with one another. It controls the relative order in
which transaction operations are executed.
7. Data manager: The data manager is responsible for the actual
handling of data in the database. This module has the following
two components:
- Recovery manager: The recovery manager ensures that the database
remains in a consistent state in the presence of failures. It is responsible for
(a) transaction commit and abort operations, (b) maintaining a log, and (c)
restoring the system to a consistent state after a crash.

- Buffer manager: The buffer manager is responsible for the transfer of data
between the main memory and secondary storage (such as disk or tape). It
brings pages from the disk into the main memory as needed in response to
user read requests. The buffer manager is sometimes referred to as the
cache manager.

3. DML processor: Using a DML compiler, the DML processor converts
the DML statements embedded in an application program into
standard function calls in the host language. The DML compiler
converts the DML statements written in a host programming language
into object code for database access. The DML processor must
interact with the query processor to generate the appropriate code.
4. DDL processor: Using a DDL compiler, the DDL processor converts
the DDL statements into a set of tables containing metadata. These
tables contain the metadata concerning the database and are in a
form that can be used by other components of the DBMS. These tables
are then stored in the system catalog while control information is
stored in data file headers. The DDL compiler processes schema
definitions, specified in the DDL, and stores a description of the schema
(metadata) in the DBMS system catalog. The system catalog includes
information such as the names of data files, data items, storage details
of each data file, mapping information among schemas, and
constraints.
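Of the components listed above, the buffer manager's behaviour is easy to illustrate. The sketch below assumes fixed page identifiers and a least-recently-used (LRU) eviction policy, which is one common choice rather than a DBMS standard:

```python
from collections import OrderedDict

class BufferManager:
    """Toy page cache: keeps at most `capacity` pages in main memory,
    fetching from simulated 'disk' storage on a miss and evicting the
    least recently used page when full (an assumed policy)."""
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk            # simulated secondary storage
        self.pages = OrderedDict()  # page_id -> page contents, in LRU order

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)     # mark as recently used
        else:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict the LRU page
            self.pages[page_id] = self.disk[page_id]  # fetch from disk
        return self.pages[page_id]

disk = {1: "page-1 data", 2: "page-2 data", 3: "page-3 data"}
bm = BufferManager(capacity=2, disk=disk)
bm.read(1); bm.read(2); bm.read(3)   # reading page 3 evicts page 1
print(sorted(bm.pages))              # [2, 3]
```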

2.6.4. Functions and Services of DBMS

As discussed in Chapter 1, Section 1.8.5, the DBMS offers several
advantages over file-oriented systems. A DBMS performs several important
functions that guarantee the integrity and consistency of data in the database.
Most of these functions are transparent to end-users. Fig. 2.11 illustrates
the functions and services provided by a DBMS.
Fig. 2.11. Functions of DBMS
1. Data Storage Management: The DBMS creates the complex structures
required for data storage in the physical database. It provides a
mechanism for the management of permanent storage of the data. The
internal schema defines how the data should be stored by the storage
management mechanism, and the storage manager interfaces with the
operating system to access the physical storage. This relieves users
of the difficult task of defining and programming the physical
data characteristics. The DBMS provides storage not only for the data, but
also for related data entry forms or screen definitions, report
definitions, data validation rules, procedural code, structures to handle
video and picture formats, and so on.

2. Data Manipulation Management: A DBMS furnishes users with the
ability to retrieve, update, and delete existing data in the database or
to add new data to the database. It includes a DML processor
component (as shown in Fig. 2.10) to deal with the data manipulation
language (DML).

3. Data Definition Services: The DBMS accepts the data definitions such
as external schema, the conceptual schema, the internal schema, and
all the associated mappings in source form. It converts them to the
appropriate object form using a DDL processor component (as shown
in Fig. 2.10) for each of the various data definition languages (DDLs).

4. Data Dictionary/System Catalog Management: The DBMS provides a
data dictionary or system catalog function in which descriptions of
data items are stored and which is accessible to users. As explained
in Chapter 1, Sections 1.2.6 and 1.3, a system catalog or data
dictionary is a system database that serves as a repository of information
describing the data in the database: it is the data about the data, or
metadata. All of the various schemas and mappings, and all of the
various security and integrity constraints, in both source and object
forms, are stored in the data dictionary. The system catalog is
automatically created by the DBMS and consulted frequently to
resolve user requests. For example, the DBMS will consult the system
catalog to verify that a requested table exists and that the user issuing
the request has the necessary access privileges.
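Such a catalog consultation amounts to two lookups, one for the table's existence and one for the user's privileges. A toy sketch with invented user and table names:

```python
# Toy system catalog: metadata about tables and access privileges.
# The users, tables, and privilege names are all hypothetical.
catalog = {
    "tables": {"employee": {"columns": ["name", "dept"]}},
    "privileges": {("alice", "employee"): {"read"},
                   ("bob", "employee"): {"read", "modify"}},
}

def check_request(user, table, operation):
    """Reject a request if the table is unknown or the user lacks
    the privilege, as the DBMS does before executing the request."""
    if table not in catalog["tables"]:
        return "error: no such table"
    if operation not in catalog["privileges"].get((user, table), set()):
        return "error: access denied"
    return "ok"

print(check_request("alice", "employee", "read"))    # ok
print(check_request("alice", "employee", "modify"))  # error: access denied
```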

5. Database Communication Interfaces: The end-user's requests for
database access (possibly from a remote location through the internet or
from computer workstations) are transmitted to the DBMS in the form of
communication messages. The DBMS provides special communication
routines designed to allow the database to accept end-user requests
within a computer network environment. The response to the end user
is transmitted back from the DBMS in the form of such communication
messages. The DBMS integrates with a communication software
component called the data communication manager (DCM), which
controls such message transmission activities. Although the DCM is
not a part of the DBMS, the two work in harmony: the DBMS looks
after the database and the DCM handles all messages to and from the
DBMS.

6. Authorisation/Security Management: The DBMS protects the
database against unauthorized access, either intentional or
accidental. It furnishes mechanisms to ensure that only authorized
users can access the database. It creates a security system that
enforces user security and data privacy within the database. Security
rules determine which users can access the database, which data
items each user may access, and which data operations (read, add,
delete, and modify) the user may perform. This is especially important
in a multi-user environment where many users can access the database
simultaneously. The DBMS monitors user requests and rejects any
attempts to violate the security rules defined by the DBA. It monitors
and controls the level of access of each user and the operations that
each user can perform on the data, depending on the
access privileges or access rights of the users.

There are many ways for a DBMS to identify legitimate users. The most
common method is to establish accounts with passwords. Some DBMSs
use data encryption mechanisms to ensure that the information written to
disk cannot be read or changed unless the user provides the encryption key
that unscrambles the data. Some DBMSs also provide users with the ability
to instruct the DBMS, via user exits, to employ custom-written routines to
encode the data. In some cases, organisations may be interested in
conducting security audits, particularly if they suspect the database may
have been tampered with. Some DBMSs provide audit trails, which are
traces or logs that record various kinds of database access activities (for
example, unsuccessful access attempts). Security management is discussed
in further detail in Chapter 14.
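Password accounts and an audit trail can be sketched together. The scheme below (salted SHA-256 hashes and an in-memory log) is a simplified illustration of the idea, not the mechanism of any particular DBMS:

```python
import hashlib
import os

# Toy account store: passwords are kept only as salted hashes
# (a common approach; the details vary between DBMSs).
def make_account(password):
    salt = os.urandom(16)
    return salt, hashlib.sha256(salt + password.encode()).hexdigest()

accounts = {"alice": make_account("s3cret")}
audit_trail = []   # toy audit trail: every access attempt is logged

def login(user, password):
    salt, stored = accounts.get(user, (b"", ""))
    ok = hashlib.sha256(salt + password.encode()).hexdigest() == stored
    audit_trail.append((user, "success" if ok else "failure"))
    return ok

login("alice", "wrong")     # unsuccessful attempt, still recorded
login("alice", "s3cret")
print(audit_trail)  # [('alice', 'failure'), ('alice', 'success')]
```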

7. Backup and Recovery Management: The DBMS provides mechanisms
for backing up data periodically and recovering from different types of
failures. This prevents the loss of data. It ensures that aborted or
failed transactions do not create any adverse effect on the database or
other transactions. The recovery mechanisms of DBMSs make sure
that the database is returned to a consistent state after a transaction
fails or aborts due to a system crash, media failure, hardware or
software errors, power failure, and so on. Many DBMSs enable users
to make full or partial backups of their data. A full backup saves all
the data in the target resource, such as an entire file or an entire
database. Full backups are useful after a large quantity of work has been
completed, such as loading data into a newly created database.
Partial, or incremental, backups usually record only the data that has
changed since the last full backup. These are less time-
consuming than full backups and are useful for capturing periodic
changes. Some DBMSs support online backups, enabling a database
to be backed up while it is open and in use. This is important for
applications that require support for continuous operations and
cannot afford to have the database inaccessible. Recovery management
is discussed in further detail in Chapter 13.
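The distinction between full and incremental backups reduces to which files are selected. A sketch, assuming invented file names and modification timestamps:

```python
# Toy file metadata: name -> last-modified timestamp (assumed values).
files = {"orders.dat": 1000, "customers.dat": 2000, "log.dat": 3000}
last_full_backup = 1500   # time of the previous full backup

def full_backup():
    # A full backup saves every file in the target resource.
    return sorted(files)

def incremental_backup(since):
    # An incremental backup saves only files changed since `since`.
    return sorted(name for name, mtime in files.items() if mtime > since)

print(full_backup())                         # all three files
print(incremental_backup(last_full_backup))  # only the two changed files
```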

8. Concurrency Control Services: Since DBMSs support the sharing of data
among multiple users, they must provide a mechanism for managing
concurrent access to the database. DBMSs ensure that the database is
kept in a consistent state and that the integrity of the data is preserved.
The DBMS ensures that the database is updated correctly when multiple
users are updating it concurrently. Concurrency control is
discussed in further detail in Chapter 12.
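The classic hazard of concurrent updates is the lost update, where two interleaved read-modify-write sequences overwrite each other's changes. The sketch below uses a thread lock to stand in for a DBMS-managed lock on a data item:

```python
import threading

balance = 0
lock = threading.Lock()   # stands in for a DBMS-managed lock on the item

def deposit(amount, times):
    global balance
    for _ in range(times):
        with lock:             # acquire before the read-modify-write
            balance += amount  # the critical section

threads = [threading.Thread(target=deposit, args=(1, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 40000; without the lock, some updates could be lost
```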

9. Transaction Management: A transaction is a series of database
operations, carried out by a single user or application program, which
accesses or changes the contents of the database. Therefore, a DBMS
must provide a mechanism to ensure that either all the updates
corresponding to a given transaction are made or none of them is
made. A detailed discussion of transaction management is
given in Chapter 1, Section 1.11. Further details of transaction
processing are given in Chapter 12.
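This all-or-nothing behaviour can be sketched with a snapshot that is restored when a transaction aborts. The account transfer below is illustrative only:

```python
# Toy all-or-nothing transaction: apply every update or none of them.
# The account names and the transfer example are invented.
accounts = {"A": 100, "B": 50}

def transfer(source, target, amount):
    snapshot = dict(accounts)        # remember the last consistent state
    try:
        accounts[source] -= amount
        if accounts[source] < 0:
            raise ValueError("insufficient funds")
        accounts[target] += amount   # both updates made: commit
    except ValueError:
        accounts.clear()
        accounts.update(snapshot)    # roll back: no update survives
        return "aborted"
    return "committed"

print(transfer("A", "B", 30))   # committed
print(transfer("A", "B", 999))  # aborted; the database is unchanged
print(accounts)                 # {'A': 70, 'B': 80}
```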

10. Integrity Services: As discussed in Chapter 1, Section 1.5 (f),
database integrity refers to the correctness and consistency of stored
data and is especially important in transaction-oriented database
systems. Therefore, a DBMS must provide means to ensure that both
the data in the database and changes to the data follow certain rules.
This minimises data redundancy and maximises data consistency. The
data relationships stored in the data dictionary are used to enforce
data integrity. Various types of integrity mechanisms and constraints
may be supported to help ensure that the data values within a
database are valid, that the operations performed on those values are
valid, and that the database remains in a consistent state.
11. Data Independence Services: As discussed in Chapter 1, Section
1.8.5 (b) and Section 2.4, a DBMS must support the independence of
programs from the actual structure of the database.
12. Utility Services: The DBMS provides a set of utility services
used by the DBA and the database designer to create, implement,
monitor, and maintain the database. These utility services help the
DBA to administer the database effectively.
13. Database Access and Application Programming Interfaces: All
DBMSs provide interfaces to enable applications to use DBMS
services. They provide data access via the structured query language
(SQL). The DBMS query language contains two components: (a) a
data definition language (DDL) and (b) a data manipulation language
(DML). As discussed in Chapter 1, Section 1.10, the DDL defines the
structure in which the data are stored, and the DML allows end users
to extract the data from the database. The DBMS also provides data
access to application programmers via procedural (3GL) languages
such as C, PASCAL, COBOL, Visual BASIC, and others.
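The DDL/DML split is visible in any SQL interface. Using Python's built-in sqlite3 module as one example of a database access API (the table and column names are invented):

```python
import sqlite3

# An in-memory SQLite database reached through a programming interface:
# DDL defines the structure, DML manipulates the stored data.
conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")  # DDL
conn.execute("INSERT INTO employee VALUES ('Ann', 'HR')")     # DML
rows = conn.execute("SELECT name FROM employee").fetchall()   # DML
print(rows)  # [('Ann',)]
```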
