CSC 122: DATABASE SYSTEMS
LECTURE TWO
DATABASE SYSTEM CONCEPTS AND ARCHITECTURE
BY MADAM PAULINE MWAKA
2.1 INTRODUCTION
The architecture of DBMS packages has evolved from the early monolithic systems, where
the whole DBMS software package was one tightly integrated system, to the modern
DBMS packages that are modular in design, with a client/server system architecture. This
evolution mirrors the trends in computing, where large centralized mainframe computers
are being replaced by hundreds of distributed workstations and personal computers
connected via communications networks to various types of server machines—Web
servers, database servers, file servers, application servers, and so on.
2.2 DATA MODELS
One fundamental characteristic of the database approach is that it provides some level of
data abstraction. Data abstraction generally refers to the suppression of details of data
organization and storage, and the highlighting of the essential features for an improved
understanding of data.
One of the main characteristics of the database approach is to support data abstraction so
that different users can perceive data at their preferred level of detail. A data model, a
collection of concepts that can be used to describe the structure of a database, provides
the necessary means to achieve this abstraction. By structure of a database we mean the
data types, relationships, and constraints that apply to the data.
There are numerous data models that can be used. They include the following:
2.2.1 Relational Model
The relational model was first introduced in 1970 by Dr. Edgar Frank Codd, an
Oxford-trained mathematician, while he was working at IBM Research.
It organizes data into tables with rows and columns, where each table represents an entity,
each row represents a record or instance of that entity, and each column represents an
attribute of the entity. Relationships between tables are established through keys, such as
primary keys and foreign keys. The relational model allows for flexible querying and
supports the use of Structured Query Language (SQL).
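The ideas above can be tried out with a small sketch. It uses SQLite through Python's standard sqlite3 module; the department and student tables, their columns, and the sample names are assumptions made for illustration only.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each table represents an entity; each column is an attribute of that entity.
conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept_id    INTEGER NOT NULL REFERENCES department(dept_id)
    )
""")

# Each row is one record (instance) of the entity.
conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(100, "Amina", 1), (101, "Brian", 1)],
)

# A join follows the foreign key from student to department.
rows = conn.execute("""
    SELECT s.name, d.name
    FROM student AS s
    JOIN department AS d ON d.dept_id = s.dept_id
    ORDER BY s.student_id
""").fetchall()
print(rows)

# A row that refers to a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO student VALUES (102, 'Carol', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```

The primary key identifies each row, while the foreign key dept_id is what ties a student row to a department row.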
2.2.2 Entity-Relationship Model (ER Model)
The E-R model is a high-level conceptual data model developed by Peter Chen in 1976 to
facilitate database design. The entity-relationship model is used to represent relationships
between entities in a database. It uses entities to represent real-world objects, attributes to
describe properties of those objects, and relationships to define associations between
entities. The ER model is often used during the design phase of a database to identify
entities, their attributes, and the relationships between them.
2.2.3 Hierarchical Model
The hierarchical model organizes data in a tree-like structure, with parent-child
relationships. Each record or data item is connected to only one parent record but can have
multiple child records. The hierarchical model was popular in the early days of databases
and is still used in certain specialized applications, such as file systems.
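A minimal sketch of the tree structure, written in plain Python rather than in any hierarchical DBMS. The Record class and the university example are hypothetical; the point is only that each record keeps a single parent reference and a list of children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Record:
    name: str
    parent: Optional["Record"] = None                  # at most one parent
    children: List["Record"] = field(default_factory=list)

    def add_child(self, name: str) -> "Record":
        child = Record(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        # Walk up the single chain of parents until the root is reached.
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


university = Record("University")
science = university.add_child("Science")
csc = science.add_child("Computer Science")
maths = science.add_child("Mathematics")

print(csc.path())
```

Because every record has exactly one parent, there is exactly one path from the root to any record, which is the same property a file system path relies on.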
2.2.4 Network Model
The network model was developed in response to the limitations of the hierarchical
model. It extends the hierarchical model by allowing a record to have more than one
parent. Data is represented as a collection of records and sets, where each set links one
owner record to its member records, and a record may be a member of several sets and so
have multiple owners. This ability to handle many-to-many (M:N) relationships is the main
feature that distinguishes the network model from the hierarchical model and makes it
more flexible.
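One way to picture a many-to-many relationship in this style is with a link record that belongs to two sets at once. The sketch below is plain Python, not a real network DBMS; the set names, the enrol function, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

# Two hypothetical set types: a student owns enrolment records, and so does a course.
# Each enrolment record is a member of both sets, so it has two owners.
student_enrolments = defaultdict(list)   # owner: student name
course_enrolments = defaultdict(list)    # owner: course code


def enrol(student, course):
    link = {"student": student, "course": course}
    student_enrolments[student].append(link)
    course_enrolments[course].append(link)


def courses_of(student):
    # Navigate from a student owner to its member records.
    return [link["course"] for link in student_enrolments[student]]


def students_in(course):
    # Navigate from a course owner to its member records.
    return [link["student"] for link in course_enrolments[course]]


enrol("Amina", "CSC 122")
enrol("Amina", "MAT 101")
enrol("Brian", "CSC 122")

print(courses_of("Amina"))
print(students_in("CSC 122"))
```

A strict tree could not express this, since the enrolment for Amina in CSC 122 would need both a student parent and a course parent.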
2.2.5 Object-Oriented Model
The object-oriented model represents data as objects that encapsulate both data and the
operations that can be performed on the data. It combines data and behavior into a single
unit and supports inheritance, polymorphism, and encapsulation. Object-oriented databases
(OODBs) are based on this model and are suitable for applications with complex data
structures and object-oriented programming paradigms.
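The three properties named above can be seen in a short Python sketch. The Account and StudentAccount classes and the fee amounts are invented for this example and do not come from any particular OODB product.

```python
class Account:
    """Data (owner, balance) and the operations on it live in one unit."""

    def __init__(self, owner, balance=0):
        self.owner = owner
        self._balance = balance   # leading underscore: internal by convention

    def deposit(self, amount):
        # Encapsulation: the balance changes only through this operation.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self):
        return self._balance

    def monthly_fee(self):
        return 100


class StudentAccount(Account):    # inheritance: reuses Account's data and behavior
    def monthly_fee(self):        # polymorphism: same call, different behavior
        return 0


accounts = [Account("Amina", 500), StudentAccount("Brian", 200)]
fees = [account.monthly_fee() for account in accounts]
print(fees)

accounts[1].deposit(50)
print(accounts[1].balance)
```

An object-oriented database would store such objects persistently; this sketch only shows the modelling concepts.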
2.3 DATABASE SCHEMA AND INSTANCE
In any data model, it is important to distinguish between the description of the database
and the database itself. The description of a database is called the database schema, which
is specified during database design and is not expected to change frequently. Most data
models have certain conventions for displaying schemas as diagrams. A displayed schema
is called a schema diagram.
A schema diagram displays only some aspects of a schema, such as the names of record
types and data items, and some types of constraints. Other aspects are not specified in the
schema diagram.
The actual data in a database may change quite frequently. For example, the student
database changes every time we add a new student or enter a course. The data in the
database at a particular moment in time is called a database instance or database state or
snapshot.
The distinction between database schema and database state is very important. When we
define a new database, we specify its database schema only to the DBMS. At this point,
the corresponding database state is the empty state with no data. We get the initial state of
the database when the database is first populated or loaded with the initial data. From then
on, every time an update operation is applied to the database, we get another database
state. At any point in time, the database has a current state.
The DBMS is partly responsible for ensuring that every state of the database is a valid
state—that is, a state that satisfies the structure and constraints specified in the schema.
Hence, specifying a correct schema to the DBMS is extremely important and the schema
must be designed with utmost care. The DBMS stores the descriptions of the schema
constructs and constraints—also called the meta-data—in the DBMS catalog so that
DBMS software can refer to the schema whenever it needs to.
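The difference between schema and state, and the role of the catalog, can be observed directly. This is a sketch using SQLite through Python's sqlite3 module; the student table and its CHECK constraint are assumptions for illustration, and sqlite_master is the name SQLite happens to use for its catalog table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining the schema. At this point the database state is empty.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        year       INTEGER NOT NULL CHECK (year BETWEEN 1 AND 4)
    )
""")
empty_state = conn.execute("SELECT * FROM student").fetchall()

# Loading data gives the initial state; every update yields another state.
conn.execute("INSERT INTO student VALUES (100, 'Amina', 1)")
initial_state = conn.execute("SELECT * FROM student").fetchall()
conn.execute("UPDATE student SET year = 2 WHERE student_id = 100")
current_state = conn.execute("SELECT * FROM student").fetchall()

# The DBMS refuses an update that would produce an invalid state.
try:
    conn.execute("INSERT INTO student VALUES (101, 'Brian', 9)")
    invalid_accepted = True
except sqlite3.IntegrityError:
    invalid_accepted = False

# The schema description (meta-data) is itself stored in the catalog.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'student'"
).fetchall()

print(empty_state, initial_state, current_state, invalid_accepted, catalog)
```

The schema was written once, while the state changed three times; the rejected insert shows why a carefully designed schema matters.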
2.4 THREE-SCHEMA ARCHITECTURE AND DATA INDEPENDENCE
Three of the four important characteristics of the database approach, listed in Lecture One,
are:
i. Use of a catalog to store the database description (schema) so as to make it
self-describing,
ii. Insulation of programs and data (program-data and program-operation
independence), and
iii. Support of multiple user views.
In this section we specify an architecture for database systems, called the three-schema
architecture, that was proposed to help achieve and visualize these characteristics. Then we
discuss the concept of data independence further.
2.4.1 The Three-Schema Architecture
The goal of the three-schema architecture is to separate the user applications from the
physical database. In this architecture, schemas can be defined at the following three levels:
1. The internal level has an internal schema, which describes the physical storage structure
of the database. The internal schema uses a physical data model and describes the complete
details of data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the
whole database for a community of users. The conceptual schema hides the details of
physical storage structures and concentrates on describing entities, data types,
relationships, user operations, and constraints. Usually, a representational data model is
used to describe the conceptual schema when a database system is implemented.
3. The external or view level includes a number of external schemas or user views. Each
external schema describes the part of the database that a particular user group is interested
in and hides the rest of the database from that user group.
Figure 1: Three level schema architecture diagram
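A rough analogy for the three levels can be built in SQLite, under the assumption that a base table stands for the conceptual schema, an index stands for one internal-level access path, and views stand for external schemas. Real systems separate the levels more or less completely, and all names below are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the structure of the whole database for all users.
conn.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        department TEXT NOT NULL,
        salary     INTEGER NOT NULL
    )
""")

# Internal level: an access path that affects storage and speed, not meaning.
conn.execute("CREATE INDEX idx_employee_department ON employee(department)")

# External level: each user group sees only the part it is interested in.
conn.execute("""
    CREATE VIEW staff_directory AS
    SELECT name, department FROM employee
""")
conn.execute("""
    CREATE VIEW payroll_summary AS
    SELECT department, SUM(salary) AS total_salary
    FROM employee
    GROUP BY department
""")

conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Amina", "Finance", 5000), (2, "Brian", "IT", 4000), (3, "Carol", "IT", 4500)],
)

directory = conn.execute("SELECT * FROM staff_directory ORDER BY name").fetchall()
payroll = conn.execute("SELECT * FROM payroll_summary ORDER BY department").fetchall()
directory_columns = [
    column[0] for column in conn.execute("SELECT * FROM staff_directory").description
]

print(directory)
print(payroll)
print(directory_columns)
```

Users of staff_directory never see the salary column, which is the hiding that the external level is meant to provide.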
2.4.2 Data Independence
The three-schema architecture can be used to further explain the concept of data
independence, which can be defined as the capacity to change the schema at one level of a
database system without having to change the schema at the next higher level. We can
define two types of data independence:
i. Logical data independence is the capacity to change the conceptual schema
without having to change external schemas or application programs. We may
change the conceptual schema to expand the database (by adding a record type or
data item), to change constraints, or to reduce the database (by removing a record
type or data item).
ii. Physical data independence is the capacity to change the internal schema without
having to change the conceptual schema. Hence, the external schemas need not be
changed as well. Changes to the internal schema may be needed because some
physical files were reorganized—for example, by creating additional access
structures—to improve the performance of retrieval or update. If the same data as
before remains in the database, we should not have to change the conceptual
schema.
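Both kinds of data independence can be illustrated with a small experiment. In this sketch (SQLite via Python's sqlite3 module, with an assumed student table and first_years view) the function query_first_years plays the role of an application program written against an external schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, year INTEGER)"
)
conn.execute("CREATE VIEW first_years AS SELECT name FROM student WHERE year = 1")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(100, "Amina", 1), (101, "Brian", 2)],
)


def query_first_years():
    # The "application program": it only knows about the external view.
    return conn.execute("SELECT name FROM first_years ORDER BY name").fetchall()


before = query_first_years()

# Physical change: add an access structure. The conceptual schema is untouched.
conn.execute("CREATE INDEX idx_student_year ON student(year)")
after_physical = query_first_years()

# Logical change: expand the conceptual schema with a new data item.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical = query_first_years()

print(before, after_physical, after_logical)
```

The application code was not edited after either change and still returns the same answer, which is what the two definitions above promise.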
2.5 DBMS ARCHITECTURES
A database management system is organized into multiple layers of abstraction. These
layers describe both the design of the DBMS and how it functions.
Because users and applications do not always access the database directly, DBMS
architectures are classified by how the user is connected to the database. The classification
is tier-based: an architecture is named for the number of layers (tiers) present in
its structure.
Hence, an n-tier DBMS architecture divides the whole DBMS into n related but
independent layers: a one-tier architecture has a single layer, a two-tier architecture has
two layers, a three-tier architecture has three layers, and so on. As layers are added, the
level of abstraction increases, which tends to improve security but also increases the
complexity of the DBMS structure. The layers are designed to be independent, so that a
modification made in one layer should not affect the other layers in the architecture.
Now, let’s look at the most common DBMS architectures:
• Single Tier Architecture (One-Tier Architecture)
• Two-Tier Architecture
• Three-Tier Architecture
1. Single Tier Architecture
• In this architecture, the database is directly available to the user. It means the user
works directly on the DBMS and uses it.
• Any changes made here are applied directly to the database itself. This architecture does
not provide a convenient tool for end users.
• The 1-tier architecture is used for the development of local applications, where
programmers can communicate directly with the database for a quick response.
Single Tier DBMS Architecture is used whenever:
• The data isn't changed frequently.
• Only a single user accesses the database system at a time.
• We need a direct and simple way to modify or access the database for application
development.
Example of Single Tier DBMS Architecture:
In order to learn the Structured Query Language (SQL), we set up an SQL server and the
database on our local system. This SQL server enables us to interact directly with the
relational database and execute operations without requiring any network
connection. This whole setup for learning SQL queries is an example of Single-Tier DBMS
architecture.
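A comparable local setup can be sketched with SQLite, which ships with Python. Note the substitution: SQLite is an embedded library that reads and writes an ordinary file, not a separately installed SQL server, but it shows the same single-tier idea of working on the database directly with no network involved. The file name and the note table are assumptions.

```python
import os
import sqlite3
import tempfile

# The database is an ordinary local file in a throwaway temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "practice.db")

# The user works on the database directly: no client, no server, no network.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO note (body) VALUES ('learning SQL locally')")
conn.commit()
conn.close()

# Reopening the same file shows that the change was made on the database itself.
conn = sqlite3.connect(db_path)
notes = conn.execute("SELECT id, body FROM note").fetchall()
conn.close()

print(notes)
```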
2. Two Tier Architecture
The 2-tier architecture is based on the client-server model.
In this type of architecture, applications on the client side interact directly with the
database on the server side.
This interaction between client and server uses application programming interfaces (APIs)
such as ODBC and JDBC.
• ODBC − Open Database Connectivity
• JDBC − Java Database Connectivity
When a large number of users on the client side access the database, this architecture
gives poor performance.
The server side is responsible for functionality such as query processing and
transaction management.
Examples of DBMSs used on the server side include Oracle, Sybase, and Microsoft SQL Server.
Figure: Two-tier DBMS architecture diagram
The main advantages of having a two-tier architecture over a single tier are:
• Multiple users can use it at the same time. Hence, it can be used in an organization.
• It has high processing ability as the database functionality is handled by the server
alone.
• Access to the database is fast because the client connects to the server directly.
• Because of the two independent layers, it's easier to maintain.
Example of Two Tier DBMS Architecture:
Consider a situation where you go to a bank to withdraw some cash. After you enter the
withdrawal amount and the account details on the withdrawal slip, the banker uses a client
application, with his or her credentials, to query the server-side database (through an API
call) and check whether the account has sufficient balance. This client-server model is an
example of Two-Tier DBMS architecture.
3. Three Tier Architecture
A 3-tier architecture separates its tiers from each other based on the complexity of the users
and how they use the data present in the database. It is the most widely used architecture
to design a DBMS.
• Database (Data) Tier − At this tier, the database resides along with its query
processing languages. We also have the relations that define the data and their
constraints at this level.
• Application (Middle) Tier − At this tier reside the application server and the
programs that access the database. For a user, this application tier presents an
abstracted view of the database. End-users are unaware of any existence of the
database beyond the application. At the other end, the database tier is not aware of
any other user beyond the application tier. Hence, the application layer sits in the
middle and acts as a mediator between the end-user and the database.
• User (Presentation) Tier − End-users operate on this tier and they know nothing
about any existence of the database beyond this layer. At this layer, multiple views
of the database can be provided by the application. All views are generated by
applications that reside in the application tier.
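The division of responsibilities among the three tiers can be sketched in a single Python program. This is only an illustration: in a real deployment each tier would normally run as a separate process or machine, and the class names, the account table, and the messages are all hypothetical.

```python
import sqlite3


class DataTier:
    """Database tier: holds the data, its constraints, and the queries."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE account ("
            "acc_no TEXT PRIMARY KEY, "
            "balance INTEGER NOT NULL CHECK (balance >= 0))"
        )
        self.conn.execute("INSERT INTO account VALUES ('A-1', 1000)")
        self.conn.commit()

    def get_balance(self, acc_no):
        row = self.conn.execute(
            "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
        ).fetchone()
        return None if row is None else row[0]

    def set_balance(self, acc_no, balance):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "UPDATE account SET balance = ? WHERE acc_no = ?", (balance, acc_no)
            )


class BankService:
    """Application tier: business rules; the only code that talks to the data tier."""

    def __init__(self, data):
        self._data = data

    def withdraw(self, acc_no, amount):
        # Simplified: a production service would do this check and update atomically.
        balance = self._data.get_balance(acc_no)
        if balance is None:
            return {"ok": False, "message": "unknown account"}
        if amount <= 0 or amount > balance:
            return {"ok": False, "message": "insufficient funds or invalid amount"}
        self._data.set_balance(acc_no, balance - amount)
        return {"ok": True, "message": f"new balance {balance - amount}"}


def render(result):
    """Presentation tier: formats what the end user sees; it knows nothing of SQL."""
    prefix = "SUCCESS: " if result["ok"] else "DECLINED: "
    return prefix + result["message"]


service = BankService(DataTier())
first = render(service.withdraw("A-1", 300))
second = render(service.withdraw("A-1", 5000))
print(first)
print(second)
```

The render function never touches the database, and the DataTier never formats text for a user, so the application tier is the only mediator between the two.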
2.6 DATABASE DESIGN AND IMPLEMENTATION PROCESS
A database goes through six phases in its design and implementation, as discussed
below:
1. Requirements collection and analysis. During this step, the database designers
interview prospective database users to understand and document their data
requirements. The result of this step is a concisely written set of users’ requirements.
These requirements should be specified in as detailed and complete a form as
possible. In parallel with specifying the data requirements, it is useful to specify the
known functional requirements of the application. These consist of the user defined
operations (or transactions) that will be applied to the database, including both
retrievals and updates.
2. Conceptual Design: Once the requirements have been collected and analyzed, the
next step is to create a conceptual schema for the database, using a high-level
conceptual data model. The conceptual schema is a concise description of the data
requirements of the users and includes detailed descriptions of the entity types,
relationships, and constraints; these are expressed using the concepts provided by
the high-level data model.
Because these concepts do not include implementation details, they are usually
easier to understand and can be used to communicate with nontechnical users. The
high-level conceptual schema can also be used as a reference to ensure that all users’
data requirements are met and that the requirements do not conflict. This approach
enables database designers to concentrate on specifying the properties of the data,
without being concerned with storage and implementation details. This makes it
easier to create a good conceptual database design.
3. Choice of a DBMS: The choice of a DBMS is governed by a number of factors,
some technical, others economic, and still others concerned with the politics of the
organization. The technical factors focus on the suitability of the DBMS for the task
at hand. Issues to consider are
• The type of DBMS (relational, object-relational, object, other),
• The storage structures and access paths that the DBMS supports,
• The user and programmer interfaces available,
• The types of high-level query languages,
• The availability of development tools,
• The ability to interface with other DBMSs via standard interfaces,
• The architectural options related to client-server operation, and so on.
Nontechnical factors include the financial status and the support organization of the
vendor.
4. Logical design: The next step in database design is the actual implementation of
the database, using a selected DBMS. Most current commercial DBMSs use an
implementation data model—such as the relational or the object-relational
database model—so the conceptual schema is transformed from the high-level data
model into the implementation data model. This step is called logical design or
data model mapping; its result is a database schema in the implementation data
model of the DBMS. Data model mapping is often automated or semiautomated
within the database design tools.
5. Physical design: The next step is the physical design phase, during which the
internal storage structures, file organizations, indexes, access paths, and physical
design parameters for the database files are specified. In parallel with these
activities, application programs are designed and implemented as database
transactions corresponding to the high-level transaction specifications.
6. Database system implementation and tuning. During this phase, the database and
application programs are implemented, tested, and eventually deployed for service.
Various transactions and applications are tested individually and then in conjunction
with each other. This typically reveals opportunities for physical design changes,
data indexing, reorganization, and different placement of data—an activity referred
to as database tuning. Tuning is an ongoing activity—a part of system
maintenance that continues for the life cycle of a database as long as the database
and applications keep evolving and performance problems are detected.
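As a small illustration of a physical design change made during tuning, the sketch below asks SQLite how it plans to run a query before and after an index is added. The booking table and its data are assumptions, and the exact wording of the plan text varies between SQLite versions, so the example only checks whether the index name appears in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking (id INTEGER PRIMARY KEY, passenger TEXT, flight TEXT)"
)
conn.executemany(
    "INSERT INTO booking (passenger, flight) VALUES (?, ?)",
    [(f"passenger{i}", f"KQ{i % 50}") for i in range(1000)],
)

query = "SELECT passenger FROM booking WHERE flight = 'KQ7'"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)
rows_before = sorted(conn.execute(query).fetchall())

# Tuning step: add an index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_booking_flight ON booking(flight)")

plan_after = plan(query)
rows_after = sorted(conn.execute(query).fetchall())

print(plan_before)
print(plan_after)
```

The query text and its result stay the same; only the access path chosen by the DBMS changes, which is why tuning can continue throughout the life of the database without rewriting applications.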
SUMMARY
In this lecture we introduced the main concepts used in database systems. We defined a
data model and we distinguished its five main categories. We also distinguished the
schema, or description of a database, from the database itself. The schema does not change
very often, whereas the database state changes every time data is inserted, deleted, or
modified. Then we described the three-schema DBMS architecture, which allows three
schema levels:
• An internal schema describes the physical storage structure of the database.
• A conceptual schema is a high-level description of the whole database.
• External schemas describe the views of different user groups.
A DBMS that cleanly separates the three levels must have mappings between the schemas
to transform requests and query results from one level to the next. Most DBMSs do not
separate the three levels completely. We used the three-schema architecture to define the
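concepts of logical and physical data independence. Finally, we discussed the one-tier,
two-tier, and three-tier DBMS architectures and the six phases of the database design and
implementation process.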
DISCUSSION QUESTION
If you were designing a Web-based system to make airline reservations and sell airline
tickets, which DBMS architecture would you choose from the above discussed ones? Why?
Why would the other architectures not be a good choice?