Schemas, Subschemas and Instances
INTRODUCTION
An organisation requires accurate and reliable data and an efficient
database system for effective decision-making.
To achieve this goal, the organisation maintains records for its varied
operations by building appropriate database models and by capturing the
essential properties of objects and the relationships among records. Users
of a database system in the organisation look for an abstract view of the
data they are interested in.
**********
In early systems, the whole DBMS package was a single, tightly integrated
system, whereas a modern DBMS is based on a client-server architecture.
Under the client-server architecture, the majority of DBMS users are not
present at the site of the database system, but are connected to it through
a network.
The database system runs on server machines, whereas remote database users
work on client machines (which are typically workstations or personal
computers).
Three-tier architectures are more appropriate for large applications and for
applications that run on the World Wide Web (WWW).
Not every database system can be fitted or matched to a particular
framework.
****
It is important to note that the data in the database changes frequently,
while the plans remain the same over long periods of time (although not
necessarily forever). The database plans consist of types of entities that a
database deals with, the relationships among these entities and the ways in
which the entities and relationships are expressed from one level of
abstraction to the next level for the users’ view.
The users’ view of the data (also called logical organisation of data) should
be in a form that is most convenient for the users and they should not be
concerned about the way data is physically organised. Therefore, a DBMS
should do the translation between the logical (users’ view) organisation and
the physical organisation of the data in the database.
2.2.1. Schema
The plan (or formulation of scheme) of the database is known as the schema.
A schema gives the names of the entities and attributes and specifies the
relationships among them. It is a framework into which the values of the
data items (or fields) are fitted.
******
The plan or format of the schema remains the same, but the values fitted
into this format change from instance to instance.
In other words, a schema is an overall plan of all the data item (field)
types and record types stored in a database.
Schema includes the definition of the database name, the record type and
the components that make up those records.
Let us look at Fig. 1.23 and assume that it is a sales record database of
M/s ABC, a manufacturing company. The structure of the database,
consisting of three files (or tables), namely the PRODUCT, CUSTOMER and
SALES files, is the schema of the database.
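As a minimal sketch, the three-file sales schema could be declared with Python's built-in sqlite3 module as follows. The text names only the PRODUCT, CUSTOMER and SALES files, so every attribute name below is a hypothetical placeholder, and the helper table_names is introduced purely for illustration.

```python
import sqlite3

# Hypothetical attributes: the text names only the three files, not their fields.
SCHEMA = """
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    unit_price REAL NOT NULL
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT
);
CREATE TABLE sales (
    sale_id     INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    quantity    INTEGER NOT NULL
);
"""


def table_names(conn):
    """Return the names of the tables that make up the schema."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)   # defines the framework; no data values yet
print(table_names(conn))     # ['customer', 'product', 'sales']
```

At this point the database holds a schema but no data: the framework exists before any values are fitted into it.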
The purchasing system schema has five records (or objects), namely
PURCHASE-ORDER, SUPPLIER, PURCHASE-ITEM, QUOTATION and
PART. Solid arrows connecting different blocks show the relationships
among the objects. For example, the PURCHASE-ORDER record is
connected to the PURCHASE-ITEM records of which that purchase order is
composed, the SUPPLIER record is connected to the QUOTATION records
showing the parts that the supplier can provide, and so forth. The dotted
arrows show the cross-references between attributes (or data items) of
different objects or records.
A database system can have several schemas, partitioned according to the
levels of abstraction. In general, schemas can be categorised into two
types: (a) a logical schema and (b) a physical schema.
The logical schema is concerned with exploiting the data structures offered
by a DBMS in order to make the scheme understandable to the computer.
The physical schema, on the other hand, deals with the manner in which the
conceptual database is represented in the computer as a stored database.
2.2.2. Subschema
A subschema is a subset of the schema and inherits the same properties that
a schema has. The plan (or scheme) for a view is often called a subschema.
A subschema refers to an application programmer’s (user’s) view of the data
item types and record types that he or she uses. It gives users a window
through which they can view only that part of the database which is of
interest to them. In other words, a subschema defines the portion of the
database as “seen” by the application programs that actually produce the
desired information from the data contained within the database.
Therefore, different application programs can have different views of the
data. Fig. 2.4 shows subschemas viewed by two different application
programs derived from the example of Fig. 2.3.
Fig. 2.4. Subschema views of two application programs
The application programs are not concerned about the physical organisation
of data. The physical organisation of data in the database can change
without affecting application programs. In other words, when the physical
organisation of data changes, application programs written against a
subschema need not be changed or modified. Subschemas also act as a unit
for enforcing controlled access to the database. For example, a subschema
can bar a user from updating a certain value in the database while still
allowing that user to read it. Further, the subschema can be made the basis
for controlling concurrent operations on the database. A subschema
definition language (SDL) is used to specify a subschema in the DBMS. The
nature of this language depends upon the data structure on which a DBMS is
based and also upon the host language within which DBMS facilities are
used. The subschema is sometimes referred to as an LVIEW or logical view.
Many different subschemas can be derived from one schema.
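The idea of a subschema as a restricted, read-only window can be approximated with an SQL view. This is a sketch under stated assumptions, not an SDL: it uses Python's sqlite3 module, borrows the SUPPLIER record from the purchasing example, and invents the attributes and the view name supplier_directory for illustration. SQLite views are read-only unless INSTEAD OF triggers are defined, which gives the "read but not update" behaviour described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Schema: the full SUPPLIER record (attributes are hypothetical).
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,
    name          TEXT,
    credit_rating TEXT
);
INSERT INTO supplier VALUES (1, 'Acme', 'A'), (2, 'Globex', 'B');

-- Subschema: one application sees only the id and name, not the rating.
CREATE VIEW supplier_directory AS
    SELECT supplier_id, name FROM supplier;
""")

# Reading through the subschema is allowed.
rows = conn.execute(
    "SELECT * FROM supplier_directory ORDER BY supplier_id"
).fetchall()

# Updating through the subschema is refused by SQLite.
try:
    conn.execute("UPDATE supplier_directory SET name = 'X' WHERE supplier_id = 1")
    update_allowed = True
except sqlite3.Error:
    update_allowed = False

print(rows)            # [(1, 'Acme'), (2, 'Globex')]
print(update_allowed)  # False
```

A second application could be given a different view over the same table, which mirrors the statement that many subschemas can be derived from one schema.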
2.2.3. Instances
When the schema framework is filled in with data item values, the contents
of the database at any point in time (the current contents) are referred to
as an instance of the database. An instance is also called the state of the
database or a snapshot. Each variable has a particular value at a given
instant. The values of the variables in a program at a point in time
correspond to an instance of a database schema, as shown in Fig. 2.5.
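The distinction between a schema and its instances can be sketched as follows, again assuming Python's sqlite3 module. The PART record comes from the purchasing example, while its attributes and the helper names schema_of and instance_of are hypothetical. The schema is read once before and once after the inserts; each call to instance_of captures a snapshot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_no INTEGER PRIMARY KEY, description TEXT)")


def schema_of(conn):
    """The plan: the stored table definitions."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


def instance_of(conn):
    """A snapshot: the contents of the database at this moment."""
    return conn.execute(
        "SELECT part_no, description FROM part ORDER BY part_no"
    ).fetchall()


schema_before = schema_of(conn)
instance_t0 = instance_of(conn)                      # []
conn.execute("INSERT INTO part VALUES (1, 'bolt')")
instance_t1 = instance_of(conn)                      # [(1, 'bolt')]
conn.execute("INSERT INTO part VALUES (2, 'nut')")
instance_t2 = instance_of(conn)                      # [(1, 'bolt'), (2, 'nut')]
schema_after = schema_of(conn)

print(schema_before == schema_after)  # True: the schema did not change
print(instance_t1 == instance_t2)     # False: the instance did
```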
In 1971, the Database Task Group (DBTG), appointed by the Conference on
Data Systems Languages (CODASYL), produced the first proposal for a general
architecture for database systems. The DBTG proposed a two-tier
architecture, as shown in Fig. 2.1 (a), with a system view called the
schema and user views called subschemas. In 1975, ANSI-SPARC (American
National Standards Institute – Standards Planning and Requirements
Committee) produced a three-tier architecture with a system catalog. The
architecture of most commercial DBMSs available today is based to some
extent on the ANSI-SPARC proposal.
At the external level, different views may have different representations
of the same data. For example, one user may view a date in the form day,
month, year while another may view it as year, month, day. Some views might
include derived or calculated data, that is, data that is not stored in the
database but is created when needed. For example, the average age of the
employees in an organisation may be derived or calculated from the
individual ages of all employees stored in the database. External views may
include data combined or derived from several entities.
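Both points, alternative representations of the same stored value and derived data, can be sketched with SQL views in Python's sqlite3 module. The employee table, its sample rows and the view names are assumptions made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stored data (hypothetical employee records).
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT,
    hired  TEXT,      -- stored once, in ISO form
    age    INTEGER
);
INSERT INTO employee VALUES
    (1, 'Asha', '2019-03-07', 30),
    (2, 'Ravi', '2021-11-15', 40);

-- External view 1: day, month, year.
CREATE VIEW employee_dmy AS
    SELECT name, strftime('%d/%m/%Y', hired) AS hired FROM employee;

-- External view 2: year, month, day.
CREATE VIEW employee_ymd AS
    SELECT name, strftime('%Y/%m/%d', hired) AS hired FROM employee;

-- External view 3: derived data, computed when needed and never stored.
CREATE VIEW staff_summary AS
    SELECT AVG(age) AS average_age FROM employee;
""")

dmy = conn.execute("SELECT hired FROM employee_dmy ORDER BY name").fetchall()
ymd = conn.execute("SELECT hired FROM employee_ymd ORDER BY name").fetchall()
average_age = conn.execute("SELECT average_age FROM staff_summary").fetchone()[0]

print(dmy)          # [('07/03/2019',), ('15/11/2021',)]
print(ymd)          # [('2019/03/07',), ('2021/11/15',)]
print(average_age)  # 35.0
```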
2.5. MAPPINGS
The three schemas and their levels discussed in Section 2.3 are the
description of data that actually exists in the physical database. In the
three-schema architecture database system, each user group refers only to
its own external schema. Hence, the user’s request specified at external
schema level must be transformed into a request at conceptual schema
level. The transformed request at conceptual schema level should be further
transformed at internal schema level for final processing of data in the
stored database as per user’s request. The final result from processed data
as per user’s request must be reformatted to satisfy the user’s external
view. The process of transforming requests and results between the three
levels is called mapping. The database management system (DBMS) is
responsible for this mapping between the internal, conceptual and external
schemas.
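The request and result flow described above can be sketched in a few lines of plain Python. Everything here is a toy assumption: the stored records, the two mapping tables and the function fetch are hypothetical and only illustrate how a request phrased in external terms is translated to conceptual attributes, then to stored positions, and how the result is reformatted for the external view.

```python
# Internal level: stored records are positional tuples (hypothetical data).
STORED = [
    (101, "Asha", "19900412"),
    (102, "Ravi", "19851130"),
]

# Conceptual/internal mapping: conceptual attribute -> position in the stored record.
CONCEPTUAL_TO_INTERNAL = {"emp_no": 0, "emp_name": 1, "birth_date": 2}

# External/conceptual mapping for one user group's external schema.
EXTERNAL_TO_CONCEPTUAL = {"id": "emp_no", "name": "emp_name"}


def fetch(external_fields):
    """Answer a request stated in external terms."""
    # Step 1: external request -> conceptual request.
    conceptual = [EXTERNAL_TO_CONCEPTUAL[field] for field in external_fields]
    # Step 2: conceptual request -> internal request.
    positions = [CONCEPTUAL_TO_INTERNAL[attr] for attr in conceptual]
    # Step 3: process stored data and reformat the result for the external view.
    return [
        dict(zip(external_fields, (record[pos] for pos in positions)))
        for record in STORED
    ]


result = fetch(["name", "id"])
print(result)  # [{'name': 'Asha', 'id': 101}, {'name': 'Ravi', 'id': 102}]
```

The user of this external schema never sees birth_date or the tuple layout; both are hidden behind the two mappings.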
There could be one mapping between conceptual and internal levels and
several mappings between external and conceptual levels. The
conceptual/internal mapping is the key to physical data independence while
the external/conceptual mapping is the key to the logical data
independence. Fig. 2.9 illustrates the three-tier ANSI-SPARC architecture
with mappings.
The information about mapping requests among the various schema levels is
included in the system catalog of the DBMS. The DBMS uses additional
software to accomplish the mappings by referring to the mapping
information in the system catalog. When the schema is changed at some
level, the schema at the next higher level remains unchanged. Only the
mapping between the two levels is changed. Thus, data independence is
accomplished. The two-stage mapping of the ANSI-SPARC three-tier structure
provides greater data independence but makes mapping less efficient.
ANSI-SPARC therefore also allows external schemas to be mapped directly on
to the internal schema (bypassing the conceptual schema), which is more
efficient but reduces data independence (the result is more
data-dependent).
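A small sketch can show why changing only the mapping preserves data independence. It is plain Python with hypothetical names throughout: two internal layouts hold the same data, and only the conceptual/internal mapping differs between them, so the application function names runs unchanged against both.

```python
def read_conceptual(stored, mapping):
    """Rebuild conceptual records from stored records via the conceptual/internal mapping."""
    return [{attr: record[pos] for attr, pos in mapping.items()} for record in stored]


# Original internal layout: (emp_no, emp_name).
stored_v1 = [(101, "Asha"), (102, "Ravi")]
mapping_v1 = {"emp_no": 0, "emp_name": 1}

# Reorganised internal layout: name first, plus an extra physical-only field.
stored_v2 = [("Asha", 101, 0xA1), ("Ravi", 102, 0xB2)]
mapping_v2 = {"emp_no": 1, "emp_name": 0}


def names(records):
    """An application written against the conceptual schema only."""
    return [record["emp_name"] for record in records]


view_v1 = read_conceptual(stored_v1, mapping_v1)
view_v2 = read_conceptual(stored_v2, mapping_v2)

print(view_v1 == view_v2)              # True: the conceptual level is unchanged
print(names(view_v1), names(view_v2))  # ['Asha', 'Ravi'] ['Asha', 'Ravi']
```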
- Buffer manager: The buffer manager is responsible for the transfer of data
between the main memory and secondary storage (such as disk or tape). It
brings in pages from the disk to the main memory as needed in response to
user read requests. The buffer manager is sometimes referred to as the
cache manager.
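The buffer manager's role can be sketched as a small page cache. The text does not say which replacement policy is used, so the least-recently-used (LRU) eviction below is an assumption, as are the class name BufferManager, its methods and the dictionary that stands in for the disk.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool: keeps at most `capacity` pages in main memory (LRU eviction assumed)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict page_id -> contents; stands in for secondary storage
        self.capacity = capacity
        self.frames = OrderedDict()   # pages currently in memory, oldest use first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            # Hit: the page is already in memory; mark it as most recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        # Miss: make room if needed, then bring the page in from disk.
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)   # evict the least recently used page
        self.disk_reads += 1
        self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]


disk = {n: f"page-{n}" for n in range(4)}
buf = BufferManager(disk, capacity=2)
for page_id in (0, 1, 0, 2, 1):
    buf.get_page(page_id)

print(buf.disk_reads)    # 4: only the second request for page 0 was served from memory
print(list(buf.frames))  # [2, 1]
```

A real buffer manager would also track modified pages and write them back to disk before eviction; that is omitted here to keep the sketch short.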
3. Data Definition Services: The DBMS accepts the data definitions such
as the external schema, the conceptual schema, the internal schema, and
all the associated mappings in source form. It converts them to the
appropriate object form using a DDL processor component (as shown
in Fig. 2.10) for each of the various data definition languages (DDLs).
There are many ways for a DBMS to identify legitimate users. The most
common method is to establish accounts with passwords. Some DBMSs
use data encryption mechanisms to ensure the information written to disk
cannot be read or changed unless the user provides the encryption key that
unscrambles the data. Some DBMSs also provide users with the ability to
instruct the DBMS, via user exits, to employ custom-written routines to
encode the data. In some cases, organisations may be interested in
conducting security audits, particularly if they suspect the database may
have been tampered with. Some DBMSs provide audit trails, which are
traces or logs that record various kinds of database access activities (for
example, unsuccessful access attempts). Security management is discussed
in further detail in Chapter 14.
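Password-based accounts and an audit trail can be sketched with Python's standard library. This is an illustration under assumptions, not how any particular DBMS works: storing a salted PBKDF2 hash instead of the password is a common practice that the text does not mention, and the names ACCOUNTS, AUDIT_TRAIL, create_account and login are hypothetical.

```python
import hashlib
import hmac
import os

ACCOUNTS = {}      # user -> (salt, password hash); hypothetical in-memory account store
AUDIT_TRAIL = []   # log of access attempts, including unsuccessful ones
ITERATIONS = 100_000


def create_account(user, password):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    ACCOUNTS[user] = (salt, digest)


def login(user, password):
    # Unknown users get a dummy salt and an empty hash so the check still runs.
    salt, expected = ACCOUNTS.get(user, (b"\x00" * 16, b""))
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    ok = user in ACCOUNTS and hmac.compare_digest(digest, expected)
    AUDIT_TRAIL.append((user, "login", "ok" if ok else "denied"))
    return ok


create_account("dba", "s3cret")
outcomes = [login("dba", "s3cret"), login("dba", "wrong"), login("ghost", "x")]

print(outcomes)     # [True, False, False]
print(AUDIT_TRAIL)  # [('dba', 'login', 'ok'), ('dba', 'login', 'denied'), ('ghost', 'login', 'denied')]
```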
11. The data relationships stored in the data dictionary are used to
enforce data integrity. Various types of integrity mechanisms and
constraints may be supported to help ensure that the data values
within a database are valid, that the operations performed on those
values are valid and that the database remains in a consistent state.
12. Data Independence Services: As discussed in Chapter 1, Section
1.8.5 (b) and Section 2.4, a DBMS must support the independence of
programs from the actual structure of the database.
13. Utility Services: The DBMS provides a set of utility services
used by the DBA and the database designer to create, implement,
monitor and maintain the database. These utility services help the
DBA to administer the database effectively.
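The integrity enforcement described in item 11 can be sketched with declarative constraints in Python's sqlite3 module. The SUPPLIER and QUOTATION records come from the purchasing example; their attributes, the specific constraints and the helper try_insert are assumptions for illustration. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves foreign keys off by default
conn.executescript("""
CREATE TABLE supplier (
    supplier_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE quotation (
    quotation_id INTEGER PRIMARY KEY,
    supplier_id  INTEGER NOT NULL REFERENCES supplier(supplier_id),
    price        REAL NOT NULL CHECK (price > 0)
);
INSERT INTO supplier VALUES (1, 'Acme');
""")


def try_insert(row):
    """Attempt to insert a quotation; report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO quotation VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert((1, 1, 9.5)),    # valid supplier, valid price
    try_insert((2, 99, 9.5)),   # no such supplier: relationship violated
    try_insert((3, 1, -1.0)),   # negative price: value constraint violated
]
print(results)  # ['accepted', 'rejected', 'rejected']
```

Because the rules live in the schema rather than in each application, every program that writes to the database is held to the same constraints.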