Database Architecture
Databases have become ubiquitous, touching almost every activity. Every IT application today uses databases in some form
or the other. They have tremendous impact in all applications, and have made qualitative changes in fields as diverse as
health, education, entertainment, industry, and banking.
Database systems have evolved from the late 1960s hierarchical and network models to today’s relational model.
From the earlier file-based systems (basically repositories of data, providing very simple retrieval facilities), they now
address complex environments, offering a range of functionalities in a user-friendly environment.
The academic community has strived, and is continuing to strive, to improve these services. At the back of these complex
software packages is mathematics and other research which provides the backbone and basic building blocks of these
systems. It is a challenge to provide good database services in a dynamic and flexible environment in a user-friendly way.
An understanding of the basics of database systems is crucial to designing good applications.
An organisation requires accurate and reliable data and an efficient database system for effective decision-making. To
achieve this goal, the organisation maintains records for its varied operations by building appropriate database models
and by capturing the essential properties of the objects and recording their relationships. Users of a database system in
the organisation look for an abstract view of the data they are interested in.
Furthermore, since a database is a shared resource, each user may require a different view of the data held in the database.
Therefore, one of the main aims of a database system is to provide users with an abstract view of data, hiding certain
details of how data is stored and manipulated.
To satisfy these needs, we need to develop an architecture for database systems. The database architecture is a
framework in which the structure of the DBMS is described.
The DBMS architecture has evolved from early centralized monolithic systems to the modern distributed DBMS
with modular design. Large centralized mainframe computers have been replaced by hundreds of distributed workstations
and personal computers connected via communications networks.
In the early systems, the whole DBMS package was a single, tightly integrated system, whereas the modern DBMS is
based on client-server system architecture. Under the client-server system architecture, the majority of the users of the
DBMS are not present at the site of the database system, but are connected to it through a network. The database system
runs on server machines, whereas remote database users work on client machines (which are typically workstations or
personal computers).
The database applications are usually partitioned into a two-tier architecture or a three-tier architecture, as shown in
Fig. 2.1. In a two-tier architecture, the application is partitioned into a component that resides at the client machine,
which invokes database system functionality at the server machine through query language statements. Application program
interface standards are used for interaction between the client and the server.
In a three-tier architecture, the client machine acts as merely a front-end and does not contain any direct database calls.
Instead, the client end communicates with an application server, usually through a forms interface. The application server
in turn communicates with a database system to access data. The business logic of the application, which says what
actions to carry out and under what conditions, is embedded in the application server, instead of being distributed across
multiple clients. Three-tier architectures are more appropriate for large applications and for applications that run on the
World Wide Web (WWW).
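The difference between the two partitionings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the database server, and the account table, the withdrawal rule and all function and class names are hypothetical, not taken from the text.

```python
import sqlite3

# In-memory SQLite stands in for the database server (assumption for illustration).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES (1, 100)")

def two_tier_client_withdraw(account_id, amount):
    """Two-tier: the client holds the business rule and issues SQL itself."""
    (balance,) = db.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)).fetchone()
    if amount > balance:          # rule would be repeated in every client program
        return False
    db.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
               (amount, account_id))
    return True

class ApplicationServer:
    """Three-tier: the middle tier owns the business logic and all database calls."""

    def __init__(self, connection):
        self._db = connection

    def withdraw(self, account_id, amount):
        (balance,) = self._db.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)).fetchone()
        if amount > balance:
            return {"ok": False, "reason": "insufficient funds"}
        self._db.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, account_id))
        return {"ok": True, "balance": balance - amount}

def three_tier_client_withdraw(server, account_id, amount):
    """The front end only forwards form input; it contains no SQL."""
    return server.withdraw(account_id, amount)

server = ApplicationServer(db)
first = two_tier_client_withdraw(1, 30)              # balance 100 -> 70
second = three_tier_client_withdraw(server, 1, 50)   # balance 70 -> 20
third = three_tier_client_withdraw(server, 1, 50)    # rejected by the server
```

In the three-tier variant the rule lives in one place, so changing it does not require touching any client.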
It is not always possible to fit every database system into a particular framework. Also, there is no
particular framework that can be said to be the only possible framework for defining database architecture. However, in
this topic, a generalized architecture of the database system, which fits most systems reasonably well, will be discussed.
When the database is designed to meet the information needs of an organisation, the plans (or scheme) of the database and
the actual data to be stored in it become the most important concerns of the organisation. It is important to note that the
data in the database changes frequently, while the plans remain the same over long periods of time (although not
necessarily forever).
The database plans consist of types of entities that a database deals with, the relationships among these entities and the
ways in which the entities and relationships are expressed from one level of abstraction to the next level for the users’
view. The users’ view of the data (also called logical organization of data) should be in a form that is most convenient for
the users and they should not be concerned about the way data is physically organized. Therefore, a DBMS should do the
translation between the logical (users’ view) organisation and the physical organization of the data in the database.
2.2.1. Schema
The plan (or formulation of scheme) of the database is known as the schema. The schema gives the names of the entities and
attributes and specifies the relationships among them. It is a framework into which the values of the data items (or
fields) are fitted. The plan or format of the schema remains the same, but the values fitted into this format change from
instance to instance. In other words, a schema is an overall plan of all the data item (field) types and record types
stored in a database. The schema includes the definition of the database name, the record types and the components that
make up those records.
Let us look at Fig. 1.23 and assume that it is a sales record database of M/s ABC, a manufacturing company. The
structure of the database consisting of three files (or tables) namely, PRODUCT, CUSTOMER and SALES files is the schema
of the database. A database schema corresponds to the variable declarations (along with associated type definitions) in a
program.
Fig. 2.2 shows a schema diagram for the database structure shown in Fig. 1.23. The schema diagram displays the structure
of each record type but not the actual instances of records. Each object in the schema, for example, PRODUCT, CUSTOMER
or SALES, is called a schema construct.
Fig. 2.2. Schema diagram for database of M/s ABC Company
(a) Schema diagram for sales record database
As can be seen in Fig. 2.3 (c), the duplication of attributes is avoided using relationships and cross-referencing. For
example, the attributes SUP-NAME, SUP-ADD and SUP-DETAILS are included in a separate SUPPLIER record and not in the
PURCHASE-ORDER record. Similarly, attributes such as PART-NAME, PART-DETAILS and QTY-ON-HAND are included in a
separate PART record and not in the PURCHASE-ITEM record. Thus, the duplication of including PART-DETAILS and
SUPPLIERS in every PURCHASE-ITEM is avoided. With the help of relationships and cross-referencing, the records are
linked appropriately with each other to complete the information, and data is located quickly.
The database system can have several schemas partitioned according to the levels of abstraction. In general, a schema can
be categorised in two parts: (a) a logical schema and (b) a physical schema. The logical schema is concerned with
exploiting the data structures offered by a DBMS in order to make the scheme understandable to the computer.
The physical schema, on the other hand, deals with the manner in which the conceptual database is represented in
the computer as a stored database. The logical schema is the most important, as programs use it to construct applications.
The physical schema is hidden beneath the logical schema and can usually be changed easily without affecting application
programs. DBMSs provide a data definition language (DDL) and a database storage definition language (DSDL) in order to
make the specification of both the logical and physical schema easy for the DBA.
2.2.2. Subschema
A subschema is a subset of the schema and inherits the same properties that a schema has. The plan (or scheme) for a view
is often called a subschema. A subschema refers to an application programmer’s (user’s) view of the data item types and
record types that he or she uses. It gives users a window through which they can view only that part of the
database which is of interest to them. In other words, a subschema defines the portion of the database as “seen” by the
application programs that actually produce the desired information from the data contained within the database.
Therefore, different application programs can have different views of data. Fig. 2.4 shows subschemas viewed by two
different application programs derived from the example of Fig. 2.3.
Fig. 2.4. Subschema views of two applications programs
(a) Subschema for first application program
As shown in Fig. 2.4, the SUPPLIER-MASTER record of the first application program {Fig. 2.4 (a)} now contains additional
attributes such as SUP-NAME and SUP-ADD from the SUPPLIER record of Fig. 2.3, and the PURCHASE-ORDER-DETAILS record
contains additional attributes such as PART-NAME, SUP-NAME and PRICE from the two records PART and SUPPLIER
respectively. Similarly, the ORDER-DETAILS record of the second application program {Fig. 2.4 (b)} contains additional
attributes such as SUP-NAME and QTY-ORDRD from the two records SUPPLIER and PURCHASE-ITEM respectively.
Individual application programs can change their respective subschemas without affecting the subschema views of others. The
DBMS software derives the subschema data requested by application programs from schema data. The database
administrator (DBA) ensures that the subschema requested by application programs is derivable from schema.
The application programs are not concerned about the physical organisation of data. The physical organisation of data in
the database can change without affecting application programs. In other words, with the change in physical organisation
of data, application programs for subschema need not be changed or modified. Subschemas also act as a unit for enforcing
controlled access to the database. For example, a subschema can bar a user from updating a certain value in the
database while allowing that user to read it. Further, the subschema can be made the basis for controlling concurrent
operations on the database. A subschema definition language (SDL) is used to specify a subschema in the DBMS. The nature
of this language depends upon the data structure on which the DBMS is based and also upon the host language within which
DBMS facilities are used. The subschema is sometimes referred to as an LVIEW or logical view. Many different subschemas
can be derived from one schema.
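In a relational system, a subschema of this kind is commonly approximated by a view. The following is a minimal sketch using Python's built-in sqlite3 module, not the SDL the text describes. The table and column names are adapted from the SUPPLIER and PURCHASE-ORDER records (with underscores instead of hyphens), while SUP_ID, ORD_NO and the data rows are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Schema: supplier details are stored once and cross-referenced by SUP_ID.
    CREATE TABLE SUPPLIER (SUP_ID INTEGER PRIMARY KEY, SUP_NAME TEXT, SUP_ADD TEXT);
    CREATE TABLE PURCHASE_ORDER (ORD_NO INTEGER PRIMARY KEY,
                                 SUP_ID INTEGER REFERENCES SUPPLIER(SUP_ID));
    INSERT INTO SUPPLIER VALUES (1, 'Acme Supplies', 'Box 7');
    INSERT INTO PURCHASE_ORDER VALUES (500, 1);
    INSERT INTO PURCHASE_ORDER VALUES (501, 1);

    -- Subschema for one application: orders together with the supplier name.
    CREATE VIEW ORDER_DETAILS AS
        SELECT o.ORD_NO, s.SUP_NAME
        FROM PURCHASE_ORDER AS o JOIN SUPPLIER AS s ON s.SUP_ID = o.SUP_ID;
""")

# The application reads through its own window onto the database.
order_rows = db.execute("SELECT * FROM ORDER_DETAILS ORDER BY ORD_NO").fetchall()

# The supplier name is still stored only once in the underlying schema.
stored_supplier_rows = db.execute("SELECT COUNT(*) FROM SUPPLIER").fetchone()[0]

# Controlled access: SQLite views are read-only, so an update through the
# view is refused while reading remains possible.
try:
    db.execute("UPDATE ORDER_DETAILS SET SUP_NAME = 'Other' WHERE ORD_NO = 500")
    view_is_updatable = True
except sqlite3.Error:
    view_is_updatable = False
```

The view is derived from the schema data on request, which mirrors the statement that the DBMS derives subschema data from schema data.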
2.2.3. Instances
When the schema framework is filled in with data item values, the contents of the database at any point of time (the
current contents) are referred to as an instance of the database. An instance is also called the state of the database
or a snapshot. The analogy with a program is that each variable has a particular value at a given instant: the values of
the variables in a program at a point in time correspond to an instance of a database schema, as shown in Fig. 2.5.
Fig. 2.5. Instance of the database of M/s ABC Company
(a) Instance of the PRODUCT relation
The difference between a database schema and a database state or instance is very distinct. The database schema is
specified to the DBMS when a new database is defined; at this point of time, the corresponding database state is
empty, with no data in the database. Once the database is first populated with the initial data, we get
another database state whenever an update operation is applied to the database. At any point of time, the current state
of the database is called the instance.
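The distinction can be made concrete with a small sketch using Python's sqlite3 module. The product values (A1234, Steel almirah, 4000) come from the example used elsewhere in this chapter; the column names, the updated price and the helper functions are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Defining the table specifies the schema; the database state is still empty.
db.execute(
    "CREATE TABLE PRODUCT (PROD_ID TEXT PRIMARY KEY, DESCRIPTION TEXT, UNIT_COST INTEGER)")

def schema():
    """Return the column names of PRODUCT (the plan, not the data)."""
    return [row[1] for row in db.execute("PRAGMA table_info(PRODUCT)")]

def instance():
    """Return the current contents of PRODUCT (the state or snapshot)."""
    return db.execute("SELECT * FROM PRODUCT ORDER BY PROD_ID").fetchall()

schema_before = schema()
state_0 = instance()                       # empty state right after definition

db.execute("INSERT INTO PRODUCT VALUES ('A1234', 'Steel almirah', 4000)")
state_1 = instance()                       # state after initial population

db.execute("UPDATE PRODUCT SET UNIT_COST = 4200 WHERE PROD_ID = 'A1234'")
state_2 = instance()                       # every update yields a new state

schema_after = schema()                    # the schema itself has not changed
```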
The view at each of the above levels is described by a scheme or schema. As explained in Section 2.2, a schema is an
outline or plan that describes the records, attributes and relationships existing in the view. The terms view, scheme and
schema are used interchangeably. A data definition language (DDL), as explained in Section 1.10.1, is used to define the
conceptual and external schemas. Structured query language (SQL) commands are used to describe aspects of the
physical (or internal) schema. Information about the internal, conceptual and external schemas is stored in the system
catalog, as explained in Section 1.2.6.
Let us take an example of CUSTOMER record of Fig. 2.2 as shown in Fig. 2.7 (a). The integrated record definition of
CUSTOMER record is shown in Fig. 2.7 (b). The data has been abstracted in three levels corresponding to three views
(namely internal, conceptual and external views), as shown in Fig. 2.8. The lowest level of abstraction of data contains a
description of the actual method of storing data and is called the internal view, as shown in Fig. 2.8 (c). The second level of
abstraction is the conceptual or global view, as shown in Fig. 2.8 (b). The third level is the highest level of abstraction seen
by the user or application program and is called the external view or user view, as shown in Fig. 2.8 (a). The conceptual
view is the sum total of the user or external views of data.
Fig. 2.7. CUSTOMER record definition
The conceptual level supports each external view, in that any data available to a user must be contained in, or derived
from, the conceptual level. However, this level must not contain any storage-dependent details. For example, the
description of an entity should contain only data types of attributes (for example, integer, real, character and so on) and
their length (such as the maximum number of digits or characters), but not any storage consideration, such as the number
of bytes occupied. The choice of relations and the choice of fields (or data items) for each relation are not always obvious.
The process of arriving at a good conceptual schema is called conceptual database design. The conceptual schema is
written using conceptual data definition language (conceptual DDL).
Degree of knowledge required by database designer using | Hierarchical DBMS | Attention required about physical-level details | Attention required about physical-level details | Attention required about physical-level details | Attention required about physical-level details
Immunity of the conceptual (or external) schemas to changes in the internal schema is referred
to as physical data independence. In physical data independence, the conceptual schema
insulates the users from changes in the physical storage of the data. Changes to the internal
schema, such as using different file organisations or storage structures, using different storage
devices, modifying indexes or hashing algorithms, must be possible without changing the
conceptual or external schemas. In other words, physical data independence indicates that the
physical storage structures or devices used for storing the data could be changed without
necessitating a change in the conceptual view or any of the external views. The change is
absorbed by conceptual/internal mapping, as discussed in Section 2.5.1.
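As a small illustration of physical data independence, the sketch below changes only a storage structure (it adds an index) and reruns an unchanged query. It uses Python's sqlite3 module. The customer names come from examples in this chapter, while the column names, the city assigned to the second customer and the index name are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE CUSTOMER (CUST_ID INTEGER, CUST_NAME TEXT, CITY TEXT)")
db.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [(1001, "Waterhouse Ltd.", "Mumbai"), (1002, "Lions Distributors", "Delhi")],
)

# The application's query is written once, against the logical structure only.
QUERY = "SELECT CUST_NAME FROM CUSTOMER WHERE CITY = ? ORDER BY CUST_NAME"

before = db.execute(QUERY, ("Mumbai",)).fetchall()

# Internal-level change: add an index on CITY. Neither the table definition
# nor the query text changes.
db.execute("CREATE INDEX idx_customer_city ON CUSTOMER (CITY)")

after = db.execute(QUERY, ("Mumbai",)).fetchall()
```

The result is identical before and after; only the access path available to the DBMS differs.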
Immunity of the external schemas (or application programs) to changes in the conceptual
schema is referred to as logical data independence. In logical data independence, the users are
shielded from changes in the logical structure of the data or changes in the choice of relations to
be stored. Changes to the conceptual schema, such as the addition and deletion of entities,
addition and deletion of attributes, or addition and deletion of relationships, must be possible
without changing existing external schemas or having to rewrite application programs. Only the
view definition and the mapping need be changed in a DBMS that supports logical data
independence. It is important that users other than those for whom the changes have been made
should not be affected. In other words, the application programs that refer to the external schema
constructs must work as before, after the conceptual schema undergoes a logical reorganisation.
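Logical data independence can be sketched in the same way: an application that reads through an external view keeps working after an attribute is added to the underlying table. This is an illustrative example with Python's sqlite3 module; the view name, the column names and the added CREDIT_LIMIT attribute are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE CUSTOMER (CUST_ID INTEGER PRIMARY KEY, CUST_NAME TEXT);
    INSERT INTO CUSTOMER VALUES (1001, 'Waterhouse Ltd.');
    -- External schema used by one application.
    CREATE VIEW CUSTOMER_NAMES AS SELECT CUST_ID, CUST_NAME FROM CUSTOMER;
""")

def application_report():
    """An existing application that refers only to the external view."""
    return db.execute("SELECT CUST_ID, CUST_NAME FROM CUSTOMER_NAMES").fetchall()

before = application_report()

# Conceptual-level change: a new attribute is added to the CUSTOMER entity.
db.execute("ALTER TABLE CUSTOMER ADD COLUMN CREDIT_LIMIT INTEGER")

after = application_report()   # the application is not rewritten
```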
2.5. MAPPINGS
The three schemas and their levels discussed in Section 2.3 are the description of data that actually exists in the physical
database. In the three-schema architecture database system, each user group refers only to its own external schema.
Hence, the user’s request specified at external schema level must be transformed into a request at conceptual schema
level. The transformed request at conceptual schema level should be further transformed at internal schema level for final
processing of data in the stored database as per user’s request. The final result from processed data as per user’s request
must be reformatted to satisfy the user’s external view. The processes of transforming requests and results between the
three levels are called mappings. The database management system (DBMS) is responsible for this mapping between the
internal, conceptual and external schemas. The three-level architecture of the ANSI-SPARC model provides the following
two-stage mappings, as shown in Fig. 2.9:
Conceptual/Internal mapping
External/Conceptual mapping
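The two mapping stages can be illustrated with a toy model in Python. Everything here is a simplifying assumption: stored records are plain tuples, each mapping is a dictionary, and the attribute names are hypothetical. The customer values are taken from the example used later in this chapter.

```python
# Internal level: stored records as plain tuples (hypothetical layout).
STORED_CUSTOMERS = [(1001, "Waterhouse Ltd.", "Box 41, Mumbai")]

# Conceptual/internal mapping: conceptual attribute -> position in the stored record.
CONCEPTUAL_TO_INTERNAL = {"CUST_ID": 0, "CUST_NAME": 1, "CUST_ADDRESS": 2}

# External/conceptual mapping: names seen by one user group -> conceptual attributes.
EXTERNAL_TO_CONCEPTUAL = {"id": "CUST_ID", "name": "CUST_NAME"}

def query_external_view(requested_fields):
    """Transform an external request down to stored data and reformat the result."""
    # Stage 1: external names are translated to conceptual attributes.
    conceptual = [EXTERNAL_TO_CONCEPTUAL[field] for field in requested_fields]
    # Stage 2: conceptual attributes are translated to stored-record positions.
    positions = [CONCEPTUAL_TO_INTERNAL[attribute] for attribute in conceptual]
    # The result is reformatted to match the user's external view.
    return [
        {field: record[pos] for field, pos in zip(requested_fields, positions)}
        for record in STORED_CUSTOMERS
    ]

result = query_external_view(["name", "id"])
```

If the stored layout changes, only `CONCEPTUAL_TO_INTERNAL` needs updating; if the conceptual attributes are renamed, only `EXTERNAL_TO_CONCEPTUAL` does.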
A database management system (DBMS) is highly complex and sophisticated software that handles access to the
database.
The structure of a DBMS varies greatly from system to system and, therefore, it is not possible to give a single
generalised component structure of a DBMS.
A typical structure of a DBMS with its components and the relationships between them is shown in Fig. 2.10. The DBMS
software is partitioned into several modules. Each module or component is assigned a specific operation to perform. Some
of the functions of the DBMS are supported by the operating system (OS), which provides basic services on top of which
the DBMS is built. The physical data and system catalog are stored on a physical disk. Access to the disk is controlled
primarily by the OS, which schedules disk input/output. Therefore, while designing a DBMS, its interface with the OS must
be taken into account.
A hierarchical path that traces the parent segments to the child segments, beginning from the left, defines the tree shown
in Fig. 2.12. For example, the hierarchical path for segment ‘E’ can be traced as ABDE, tracing all segments from the root
starting at the leftmost segment. This left-traced path is known as preorder traversal or the hierarchical sequence. As can
be noted from Fig. 2.12, each parent can have many children but each child has only one parent.
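Preorder traversal visits a parent before its children and takes the children from left to right. The sketch below shows this in Python. Because Fig. 2.12 is not reproduced here, the tree shape is an assumption, chosen so that the hierarchical sequence up to segment 'E' reads A, B, D, E as stated above; segments C and F are hypothetical.

```python
# Hypothetical tree: parent segment -> ordered list of child segments.
CHILDREN = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": [],
    "F": [],
}

def preorder(segment):
    """Visit the parent first, then each child subtree from left to right."""
    sequence = [segment]
    for child in CHILDREN[segment]:
        sequence.extend(preorder(child))
    return sequence

hierarchical_sequence = preorder("A")

# Segments traced from the root until 'E' is reached.
path_to_e = hierarchical_sequence[: hierarchical_sequence.index("E") + 1]

# Each child appears under exactly one parent.
parents = {child: parent for parent, kids in CHILDREN.items() for child in kids}
```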
Fig. 2.13 (a) shows a hierarchical data model of a UNIVERSITY tree type consisting of three levels and three record types,
namely DEPARTMENT, FACULTY and COURSE. This tree contains information about university academic departments along
with data on all faculties for each department and all courses taught by each faculty within a department. Fig. 2.13
(b) shows the defined fields or data types for the department, faculty, and course record types. A single department record
at the root level represents one instance of the department record type. Multiple instances of a given record type are used
at lower levels to show that a department may employ many (or no) faculties and that each faculty may teach many (or
no) courses. For example, we have a COMPUTER department at the root level and as many instances of the FACULTY
record type as there are faculties in the computer department. Similarly, there will be as many COURSE record instances for
each FACULTY record as that faculty teaches. Thus, there is a one-to-many (1:m) association among record instances, moving
from the root to the lowest level of the tree. Since there are many departments in the university, there are many instances
of the DEPARTMENT record type, each with its own FACULTY and COURSE record instances connected to it by appropriate
branches of the tree. This database then consists of a forest of such tree instances: as many instances of the tree type as
there are departments in the university at any given time. Collectively, these comprise a single hierarchic database, and
multiple databases may be online at a time.
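The forest of tree instances described above can be modelled with nested Python structures. This is a rough sketch rather than an IMS-style definition: apart from the COMPUTER department, all department, faculty and course names are hypothetical.

```python
# One tree instance per department: DEPARTMENT -> FACULTY -> COURSE.
university = [
    {
        "department": "COMPUTER",
        "faculty": [
            {"name": "Faculty 1", "courses": ["Databases", "Networks"]},
            {"name": "Faculty 2", "courses": []},   # a faculty may teach no courses
        ],
    },
    {"department": "PHYSICS", "faculty": []},        # a department may have no faculty
]

def courses_in(department_name):
    """Navigate top-down from the root record to the lowest level."""
    for tree in university:
        if tree["department"] == department_name:
            return [course
                    for member in tree["faculty"]
                    for course in member["courses"]]
    return []

computer_courses = courses_in("COMPUTER")
tree_count = len(university)   # as many tree instances as there are departments
```

Access always starts at the root of a tree, which is why a course can be reached only through its faculty and department.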
Fig. 2.13. Hierarchical data model relationship of university tree type
Suppose we are interested in adding more information about departments to our hierarchical database. For example, since the
departments have various subjects for teaching, we want to keep a record of subjects with each department in the
university. In that case, we would expand the diagram of Fig. 2.13 to look like that of Fig. 2.14. DEPARTMENT is still related
to FACULTY, which is related to COURSE. DEPARTMENT is also related to SUBJECT, which is related to TOPIC. We see from
this diagram that DEPARTMENT is at the top of a hierarchy from which a large amount of information can be derived.
Fig. 2.14. Hierarchical relationship of department with faculty and subject
The hierarchical database is one of the oldest database models used by enterprises in the past. Information Management
System (IMS), developed jointly by IBM and North American Rockwell Company for the mainframe computer platform, was
one of the first hierarchical databases. IMS became the world’s leading hierarchical database system in the 1970s and
early 1980s. The hierarchical database model was the first major commercial implementation of a growing pool of database
concepts that were developed to counter the computer file system’s inherent shortcomings.
Unlike the hierarchical data model, the network data model supports multiple paths to the same record, thus avoiding the
data redundancy problem associated with the hierarchical system.
E.F. Codd of IBM Research first introduced the relational data model in a paper in 1970. The relational data model is
implemented using a very sophisticated relational database management system (RDBMS). The RDBMS performs the same
basic functions as the hierarchical and network DBMSs, plus a host of other functions that make the relational data model
easier to understand and implement. The relational data model simplified the user’s view of the database by using simple
tables instead of the more complex tree and network structures. It is a collection of tables (also called relations), as
shown in Fig. 2.17 (a), in which data is stored. Each of the tables is a matrix of a series of row and column intersections.
Tables are related to each other by sharing a common entity characteristic. For example, a CUSTOMER table might contain an
AGENT-ID that is also contained in the AGENT table, as shown in Fig. 2.17 (a) and (b).
Fig. 2.17. Relational data model
(a) Relational Tables
(b) Linkage between relational tables
Even though the customer and agent data are stored in two different tables, the common link between the CUSTOMER
and AGENT tables, which is AGENT-ID, helps in connecting or matching the customer to its sales agent. Although tables
are completely independent of one another, data between the tables can be easily connected using common links. For
example, the agent of customer “Lions Distributors” in the CUSTOMER table can be retrieved as “Greenlay & Co.” from the
AGENT table with the help of the common link AGENT-ID, which is AO-9999.
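The retrieval just described corresponds to a join on the common column. A minimal sketch with Python's sqlite3 module follows; the customer, agent and AGENT-ID values are those of the example above, while the column names (written with underscores) and the reduced table layouts are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE AGENT (AGENT_ID TEXT PRIMARY KEY, AGENT_NAME TEXT);
    CREATE TABLE CUSTOMER (CUST_NAME TEXT,
                           AGENT_ID TEXT REFERENCES AGENT(AGENT_ID));
    INSERT INTO AGENT VALUES ('AO-9999', 'Greenlay & Co.');
    INSERT INTO CUSTOMER VALUES ('Lions Distributors', 'AO-9999');
""")

# The two tables are matched purely by value, through the shared AGENT_ID column.
agent_name = db.execute(
    """SELECT a.AGENT_NAME
       FROM CUSTOMER AS c JOIN AGENT AS a ON a.AGENT_ID = c.AGENT_ID
       WHERE c.CUST_NAME = ?""",
    ("Lions Distributors",),
).fetchone()[0]
```

No pointer or navigation path between the records is stored; the link exists only because both rows hold the same AGENT_ID value.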
2.7.6.1. Advantages of Relational Data Model
Simplicity: A relational data model is even simpler than hierarchical and network models. It frees the designers
from the actual physical data storage details, thereby allowing them to concentrate on the logical view of the
database.
Structural independence: Unlike hierarchical and network models, the relational data model does not depend on
the navigational data access system. Changes in the database structure do not affect the data access.
Ease of design, implementation, maintenance and usage: The relational model provides both structural
independence and data independence. Therefore, it makes the database design, implementation, maintenance
and usage much easier.
Flexible and powerful query capability: The relational database model provides very powerful, flexible, and
easy-to-use query facilities. Its structured query language (SQL) capability makes ad hoc queries a reality.
This organisation manufactures various products, which are sold to the customers against an order.
Fig. 2.19 (b) shows data items and records of entities. According to the E-R diagram of Fig. 2.19 (a), a customer having
identification no. 1001, name Waterhouse Ltd. with address Box 41, Mumbai [as shown in Fig. 2.19 (b)], is an entity since it
uniquely identifies one particular customer. Similarly, a product A1234 with a description Steel almirah and unit cost
of 4000 is an entity since it uniquely identifies one particular product and so on.
Fig. 2.19. E-R diagram for M/s ABC & Co
Instances of the class-object correspond to individual customers. Within an object, the class attributes take specific
values, which distinguish one customer (object) from another. However, all the objects belonging to the class, share the
behaviour pattern of the class. The object-oriented database maintains relationships through logical containment.
The object-oriented database is based on encapsulation of data and code related to an object into a single unit, whose
contents are not visible to the outside world. Therefore, object-oriented data models emphasise objects (which are
combinations of data and code), rather than data alone. This is largely due to their heritage from object-oriented
programming languages, where programmers can define new types or classes of objects that may contain their own
internal structures, characteristics and behaviours.
Thus, data is not thought of as existing by itself. Instead, it is closely associated with code (methods or member functions)
that defines what objects of that type can do (their behaviour or available services). The structure of an object-oriented
data model is highly variable. Unlike traditional databases (such as hierarchical, network or relational), it has no single inherent
database structure. The structure for any given class or type of object could be anything a programmer finds useful, for
example, a linked list, a set, an array and so forth. Furthermore, an object may contain varying degrees of complexity,
making use of multiple types and multiple structures.
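The idea of a class that bundles data with behaviour can be sketched in Python. The class, its attributes and its methods are hypothetical illustrations. Note that Python marks internal state only by convention (a leading underscore) and does not enforce hiding the way some object-oriented languages do.

```python
class Customer:
    """A class bundles data (attributes) with the code (methods) that acts on it."""

    def __init__(self, cust_id, name):
        self._cust_id = cust_id      # leading underscore: internal by convention
        self._name = name
        self._orders = []            # internal structure chosen by the programmer

    def place_order(self, product, quantity):
        """Behaviour shared by every object of the class."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._orders.append((product, quantity))

    def total_items(self):
        return sum(quantity for _, quantity in self._orders)

# Two instances share the class behaviour but hold their own attribute values.
first = Customer(1001, "Waterhouse Ltd.")
second = Customer(1002, "Lions Distributors")
first.place_order("A1234", 3)
first.place_order("A1234", 2)
```

Here the order list could be replaced by any other internal structure without changing the callers, since they use only `place_order` and `total_items`.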
The object-oriented database management system (OODBMS) is among the most recent approaches to database
management. OODBMSs started in engineering and design domain applications, and became the favoured system for
financial, telecommunications, and World Wide Web (WWW) applications. The OODBMS is suited for multimedia applications
as well as data with complex relationships that are difficult to model and process in a relational DBMS.
Data model | Data element organisation | Relationship organisation | Identity | Access language | Data independence | Structural independence
Relational | Tables | Identifiers of rows in one table are embedded as attribute values in another table | Value-based | Non-procedural | Yes | Yes
E-R diagram | Objects, entity set | Relational extenders that support specialized applications | Value-based | Non-procedural | Yes | Yes
The classification of a database management system (DBMS) is greatly influenced by the underlying computing system on
which it runs, in particular by the computer architecture, such as parallel, networked or distributed. However, a DBMS can
also be classified according to the number of users, the database site locations and the expected type and extent of use.
1. On the basis of the number of users:
   1. Single-user DBMS.
   2. Multi-user DBMS.
2. On the basis of the site locations:
   1. Centralized DBMS.
   2. Parallel DBMS.
   3. Distributed DBMS.
   4. Client/server DBMS.
3. On the basis of the type and the extent of use:
   1. Transactional or production DBMS.
   2. Decision support DBMS.
   3. Data warehouse.
In this section, we will discuss some of the important types of DBMS that are presently in use.
Fig. 2.21 illustrates the different architectures of a parallel database system. In shared disk architecture, all the
processors share a common disk (or set of disks), as shown in Fig. 2.21 (a). In shared memory architecture, all the
processors share common memory, as shown in Fig. 2.21 (b). In independent resource architecture, the processors share
neither a common memory nor a common disk. They have their own independent resources, as shown in Fig. 2.21 (c).
Hierarchical architecture is a hybrid of the three earlier architectures, as shown in Fig. 2.21 (d). Further details on
parallel database systems are given in Chapter 17.
As shown in Fig. 2.23, in a distributed database system, data is spread across a variety of different databases. These are
managed by a variety of different DBMS software packages running on a variety of different computing machines supported by
a variety of different operating systems. These machines are spread (or distributed) geographically and connected together
by a variety of communication networks. In a distributed database system, one application can operate on data that is
spread geographically on different machines. Thus, the enterprise data might be distributed on different computers in such
a way that data for one portion (or department) of the enterprise is stored in one computer and the data for another
department is stored in another. Each machine can have data and applications of its own. However, the users on one
computer can access data stored in several other computers. Therefore, each machine will act as a server for some users
and a client for others. Further details on distributed database systems are given in Chapter 18.
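The arrangement can be imitated on a single machine for illustration. In this hedged sketch, each "site" is a separate in-memory SQLite database holding one department's data, and one application function gathers data from all of them. The site names, table layout and rows are hypothetical, and a real distributed DBMS would reach the sites over a network and handle issues such as transactions across sites, which this sketch ignores.

```python
import sqlite3

def make_site(rows):
    """Create one site: an independent database holding one department's data."""
    site = sqlite3.connect(":memory:")
    site.execute("CREATE TABLE EMPLOYEE (NAME TEXT, DEPARTMENT TEXT)")
    site.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", rows)
    return site

# Hypothetical sites, one per department of the enterprise.
SITES = {
    "sales_site": make_site([("Asha", "SALES"), ("Ravi", "SALES")]),
    "accounts_site": make_site([("Meena", "ACCOUNTS")]),
}

def all_employees():
    """One application operates on data spread across several sites."""
    names = []
    for site in SITES.values():
        names.extend(row[0] for row in site.execute("SELECT NAME FROM EMPLOYEE"))
    return sorted(names)

def local_employees(site_name):
    """Each site can also answer queries on its own data by itself."""
    rows = SITES[site_name].execute("SELECT NAME FROM EMPLOYEE")
    return sorted(row[0] for row in rows)

everyone = all_employees()
```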
2.8.4.1. Advantages of Distributed Database System
Distributed database architecture provides greater efficiency and better performance.
Response is fast and throughput is high.
The server (database) machine can be custom-built (tailored) to the DBMS function and thus can provide better
DBMS performance.
The client (application database) might be a personal workstation, tailored to the needs of the end users and thus
able to provide better interfaces, high availability, faster responses and overall improved ease of use to the user.
A single database (on server) can be shared across several distinct client (application) systems.
As data volumes and transaction rates increase, users can grow the system incrementally.
It causes less impact on ongoing operations when adding new locations.
Distributed database system provides local autonomy.