CHAPTER 1 Final
This chapter discusses the basic concepts of the database management system (DBMS). After
studying this chapter, students will understand the following:
Basics of DBMS
File System vs. DBMS
Architecture of DBMS
Data Independence
Database Languages
Database Users
Advantages and Disadvantages of DBMS
1.0 INTRODUCTION
There has been enormous growth in the number and importance of database applications
over the last two decades. Databases are used in almost every application in every
organization, including business, healthcare, education, government, military and
libraries. Because of the highly competitive environment, organizations require
accurate and reliable data for effective and efficient decision making. Many
organizations today build separate databases, called data warehouses, for this
type of decision support application; these are outside the scope of this book.
Before defining the DBMS, we must have a clear understanding of what the DBMS
manages: the database. Data are distinct pieces of information. They can be stored
in a variety of ways, such as numbers or text on pieces of paper, or as bits and bytes
in electronic memory. A collection of such data, designed to be
used by different people in different applications, is called a database. More precisely,
it is a collection of inter-related data stored together with limited duplication,
which can be used efficiently by many users in one or more applications.
To access the information that applications need from the
database, we need some mechanism, which is termed the management
system. A management system is a collection of programs that enables users to create
and maintain the database efficiently and quickly.
The primary purpose of a DBMS is to allow a user to store, update and retrieve data in
abstract terms and thus make it easy to maintain and retrieve information from a
database. A DBMS relieves the user from having to know about exact physical
representations of data and having to specify detailed algorithms for storing, updating
and retrieving data.
In summary, a DBMS can be defined as software that provides secure, survivable
services for creating and accessing the database while maintaining all the required
features of the data. A DBMS provides facilities such as:
i) Creating a database
ii) Modifying / Removing the database.
iii) Inserting, Updating or Deleting the data.
iv) Maintaining the data integrity and the security.
v) Retrieving data from existing databases.
Besides these, a DBMS also provides advanced features such as Transaction
Management, Concurrency Management, Recovery Management and Storage
Management. These features are explained in the chapter on Transaction and
Concurrency Control in this book.
A DBMS provides data integrity services to ensure that the data is not corrupted by
outside events such as a power failure or disk crash. It also protects data
against unauthorized users, so that only authorized users can access
the data in the database. Recovery management is the activity that comes into play
when a transaction fails. It returns the system to a consistent state.
The data stored in a database can be described as a group of related items, called
fields. The fields together constitute a record. A database can have many records;
for example, consider the STUDENT database shown in the following figure:
[Figure: STUDENT database table, with one field labelled]
Each record in the database stores the information about a single student. The fields
in each record store the important information about that student. For example, Ram
has Enrollment Number 1. His father’s name is Mr. Shyam, and he is studying in
class XII in the science stream.
The data in the database will be both integrated and shared. An organization that
wishes to establish a database on a modern computer system will want all of its
data to be considered for inclusion in a common pool. This common pool is
accessible to all parts of the organization that need the information. The
database therefore needs to be integrated in such a way that the data represent not just
isolated facts about the real world but also the naturally occurring
relationships among them.
For example, a given database might contain two files: one for students, which contains
Enrollment_No, Stud_Name, Class and F_name, and another for books, which contains
Book_Id, Title, Author, Publisher and Price. These two files are used in two
different applications, say a Library Management System and an Accounts Management
System. In the Library Management System, the STUDENT and BOOK data files are used to
issue books to students, while in the Accounts Management System, the STUDENT
file is used for fees and the BOOK file is used for billing (for
calculating the total amount spent on purchasing books). The same files can thus be
used in multiple applications.
[Figure: the Library Information System and the Account System sharing the same data files]
Fig. 1.3 A data file being used in more than one application.
1.1 FILE SYSTEM
When computers were first introduced into the business world, data were kept in
large files. In the file system, each field or data item is stored sequentially on
disk in one large file. To find a particular item, the system has to search the
entire file from the beginning. It can also keep a pointer (a locator on the disk) to the
last data item retrieved, so that searches for further occurrences of the same data type
do not have to begin at the start of the file.
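The sequential search and the saved pointer described above can be sketched in C as follows. This is a minimal illustration, not the layout of any real file system: the struct item layout, the function name find_next and the use of an in-memory array in place of a disk file are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A data item as it might appear in one large sequential file.
   The layout is an illustrative assumption. */
struct item {
    char type[12];   /* e.g. "STUDENT" or "BOOK" */
    int  id;
};

/* Scans forward from *cursor for the next item of the given type.
   Returns its index and advances *cursor past it, or returns -1 when the
   end of the file is reached. With *cursor == 0 the scan starts at the
   beginning; later calls resume just after the last match. */
long find_next(const struct item *file, size_t n, const char *type, size_t *cursor)
{
    assert(file != NULL && type != NULL && cursor != NULL);
    for (size_t i = *cursor; i < n; i++) {
        if (strcmp(file[i].type, type) == 0) {
            *cursor = i + 1;   /* remember where to resume */
            return (long)i;
        }
    }
    *cursor = n;
    return -1;
}
```

A caller sets a cursor variable to 0 once and then calls find_next repeatedly with the same cursor; each call continues from the saved position instead of rescanning the file from the start.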
The applications that interacted with these data files were written mostly in COBOL. It was
very complex to write such programs for all data-related activities, such as inserting,
updating and querying the data. Moreover, each application had its own master files,
causing huge redundancy, since the same data files may be needed by many applications.
Several disadvantages are associated with conventional file processing systems. These
disadvantages are as follows:
a) Limited Data Sharing: With the traditional file processing approach, each
application has its own private files and users have little opportunity to share
data outside their own applications. In addition, a major management effort
may also be required since different organizational units may own these
different files.
b) Program Data Dependence: File descriptions are stored within each
application program that accesses a given file. As a consequence, any change
to a file structure requires changes to the file descriptions for all programs that
access the file. It is often difficult even to locate all programs affected by such
changes.
c) Duplication of Data: Because the system cannot make the same data available
to different applications, the data is duplicated in each application's files.
d) Occurrence of inconsistencies: Duplicated data leads to inconsistencies and other
errors in data files, because a change of information in one place requires
the same update in all the copies. To overcome such problems, user
requirements must be assessed and considered prior to the development and
implementation of software packages.
e) Lengthy Development Time: With traditional file processing system, there is
little opportunity to leverage previous development efforts. Each new
application requires that the developer essentially start from scratch by
designing new file formats and descriptions, and then writing the file access
logic for each new program. The lengthy development times required are often
inconsistent with today’s fast-paced business environment, in which time to
market (or time to production for an information system) is a key business
success factor.
f) Inadequate Security options: In a file system, security can be provided only
at the operating system level. We cannot authorize users to access
different subsets of the data, which many applications may
require.
g) Excessive Program Maintenance: Special programs had to be written to
answer each query required by the application. These programs
were generally very complex because of the large volume of data to be
searched.
1.2 ARCHITECTURE OF DBMS
The generally accepted method of explaining the architecture of a database system
was formalized by the ANSI/SPARC committee in 1975 and subsequently in more
detail in 1978. Knowledge of this architecture is extremely useful in describing
general database concepts and the structure of individual systems. A major
purpose of a database system is to provide users with an abstract view of the data. The
system hides certain details of how the data is stored and maintained. The
ANSI/SPARC model of a database identifies three distinct levels at which data items can be
described. A brief overview of these three levels is given here.
Internal Level or Physical Level: At the lowest level the data elements appear as
disk storage. It contains the specifications for how data are actually stored in a
computer’s secondary memory. The internal view is expressed by the internal schema,
which contains the definition of the stored record, the method of representing the data
fields, and the access aids used. In general, the main aim of this level is to describe
how we intend physically to implement the logical database design. In the following
example for the STUDENT database, the internal level contains a physical description of the
structure for the conceptual record, expressed in a high-level language.
struct STUDENT {
    int Enrollment_No;
    char Stud_Name[15];
    char F_Name[15];
    char Class[15];
    char Stream[15];
    struct STUDENT *next; // pointer to next record
};
The physical structure contains a “pointer”, next. This will be simply the memory
address at which the next record is stored. Thus the set of student records may be
physically linked together to form a chain.
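Following the chain of next pointers can be sketched as below. The structure repeats the STUDENT declaration from the text; the helper functions link_chain, find_student and chain_length are hypothetical names introduced only for this illustration, and a real DBMS would link records on disk rather than in main memory.

```c
#include <assert.h>
#include <stddef.h>

struct STUDENT {
    int  Enrollment_No;
    char Stud_Name[15];
    char F_Name[15];
    char Class[15];
    char Stream[15];
    struct STUDENT *next;   /* pointer to next record, NULL at the end */
};

/* Links an array of n records into a chain in array order and returns
   the head of the chain (NULL when n is 0). */
struct STUDENT *link_chain(struct STUDENT *records, size_t n)
{
    assert(records != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        records[i].next = (i + 1 < n) ? &records[i + 1] : NULL;
    return n > 0 ? records : NULL;
}

/* Follows the next pointers from head and returns the record with the
   given enrollment number, or NULL if the chain does not contain it. */
const struct STUDENT *find_student(const struct STUDENT *head, int enrollment_no)
{
    for (const struct STUDENT *p = head; p != NULL; p = p->next) {
        if (p->Enrollment_No == enrollment_no)
            return p;
    }
    return NULL;
}

/* Counts the records in the chain. */
size_t chain_length(const struct STUDENT *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Because each record stores only the address of its successor, reaching a given record means walking the chain from the head, which is the kind of physical detail the higher levels hide from the user.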
Below the internal level is the physical level, which is managed by the OS under the
direction of the DBMS. It deals with the mechanics of physically storing data on a
device such as a disk.
View Level or External Schema: This is a logical description of the portion of the
database that a user requires to perform some task. This user view is
independent of database technology and typically contains a subset of the associated
conceptual schema, relevant to a particular user or group of users. There can be many
user views for a given conceptual design. For example, a large
organization may have finance and stock control departments. Workers in finance
will not usually view stock details, as they are more concerned with the accounting
side of things. Thus, workers in each department will require a different
user interface to the information stored in the database. Two external schemas of
STUDENT are given in the following figure.
[Figure: external views of STUDENT; External View 2 contains Stud_Name and F_Name]
Views may provide different representations of the same data. For example, some
users might view dates in the form (day/month/year) while others prefer
(year/month/day). Some views might include derived or calculated data. For example,
a person’s age might be calculated from his date of birth since storing his age would
require it to be updated each year.
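The derived age mentioned above could be computed whenever the view is read instead of being stored. The following is a minimal sketch in C; struct date and age_in_years are hypothetical names introduced for the example, and the function assumes that both dates are valid and that the current date is not earlier than the date of birth.

```c
#include <assert.h>

struct date { int year, month, day; };

/* Age in completed years on the date `today` for a person born on `dob`.
   Assumes both dates are valid and that today is not earlier than dob. */
int age_in_years(struct date dob, struct date today)
{
    assert(today.year >= dob.year);
    int age = today.year - dob.year;
    /* Subtract one if the birthday has not yet occurred this year. */
    if (today.month < dob.month ||
        (today.month == dob.month && today.day < dob.day))
        age--;
    return age;
}
```

Only the date of birth is stored; the age shown in the view is always current because it is recalculated from the stored value and the current date.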
Each external view is described by means of a schema called an external schema or
subschema. The external schema consists of the definitions of the logical records and
the relationships in the external view. The external schema also contains the method of
deriving the objects in the external view from the objects in the conceptual view.
The objects include entities, attributes, and relationships.
Data Definition Language: As the name suggests, DDL is used to define the
conceptual schema. It also provides the details of how to implement this
schema on the physical devices used to store the data. This definition includes:
- the name of the schema
- the attributes of the schema along with their data types
- constraint specifications
- relationships among various schemas
On compilation of a DDL statement, a schema (usually a table) is created. The DBMS
maintains the information about all such schemas in a special file called the data dictionary.
A data dictionary is a file that contains metadata, that is, data about data. It works
like a central repository for the database. For any data manipulation operation, the DBMS
has to consult this data dictionary first. For example, if a constraint on the
STUDENT schema states that students can enroll in only two courses,
B.A. or M.A., then any attempt to register a student who wishes to study a course other
than these two will be checked against the data dictionary and will produce an appropriate
error message.
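The B.A./M.A. check described above can be pictured as a lookup in the data dictionary before the data is accepted. The sketch below is only an illustration under stated assumptions: the attribute name "Course", the struct attribute_constraint layout and the function value_allowed are hypothetical, and a real DBMS stores its dictionary in its own internal format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A data-dictionary entry describing one constrained attribute.
   The structure and names are illustrative assumptions. */
struct attribute_constraint {
    const char *schema;       /* e.g. "STUDENT" */
    const char *attribute;    /* e.g. "Course"  */
    const char *allowed[4];   /* permitted values, NULL-terminated */
};

static const struct attribute_constraint dictionary[] = {
    { "STUDENT", "Course", { "B.A.", "M.A.", NULL } },
};

/* Returns 1 if value is permitted for schema.attribute and 0 if it violates
   a constraint recorded in the dictionary. Attributes with no entry are
   treated as unconstrained and accept any value. */
int value_allowed(const char *schema, const char *attribute, const char *value)
{
    assert(schema != NULL && attribute != NULL && value != NULL);
    size_t n = sizeof dictionary / sizeof dictionary[0];
    for (size_t i = 0; i < n; i++) {
        const struct attribute_constraint *c = &dictionary[i];
        if (strcmp(c->schema, schema) != 0 || strcmp(c->attribute, attribute) != 0)
            continue;
        for (size_t j = 0; c->allowed[j] != NULL; j++) {
            if (strcmp(c->allowed[j], value) == 0)
                return 1;
        }
        return 0;   /* constrained attribute, value not in the list */
    }
    return 1;
}
```

An insert routine would call value_allowed("STUDENT", "Course", ...) first and report an error when it returns 0, so the rule lives in one place (the dictionary) rather than in every application program.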
There is a special type of DDL, called DSDL (Data Storage and Definition Language),
which is used to specify the storage structure (internal schema) and access methods. The compiled
internal schema specifies the implementation details of the internal database,
including the access methods employed. This information is handled by the DBMS;
the user need not be aware of these details.
- Retrieving data from the database, called a query. A query is a statement in
the DML that requests the retrieval of data from the database as per the
application requirement.
- Inserting new data into the database.
- Modifying the existing data.
- Deleting data from the database.
The DML provides the commands to perform all of the above activities. These
commands can be used in an interactive mode, so that a result is returned immediately
following the execution of the command, or embedded in a programming language such as
COBOL or C. Embedded DML may give the programmer more control
over the timing of report generation. There are two types of DML. The first is procedural
DML, which requires writing procedures/methods that specify what data is needed
and how to get it. A procedural DML has the features of any other high-level
language, including control constructs and error handling. The
second type is non-procedural DML, in which the user specifies only what data is needed,
not how to access it.
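The procedural style can be illustrated with a short C function in which the program itself states how the records are visited. This is a sketch under assumptions: struct student and select_by_class are hypothetical names, and the records are held in an in-memory array rather than fetched through a real DML.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct student {
    int  enrollment_no;
    char stud_name[15];
    char class_name[15];
};

/* Procedural retrieval: the code spells out HOW to find the data by
   visiting every record and testing it. Copies up to max matching
   enrollment numbers into out and returns the number stored.
   A non-procedural request would state only WHAT is wanted, for example
   "the enrollment numbers of all students in class XII", and leave the
   access method to the DBMS. */
size_t select_by_class(const struct student *rows, size_t n,
                       const char *class_name, int *out, size_t max)
{
    assert(rows != NULL && class_name != NULL && out != NULL);
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++) {
        if (strcmp(rows[i].class_name, class_name) == 0)
            out[found++] = rows[i].enrollment_no;
    }
    return found;
}
```

The loop, the comparison and the result buffer are all the programmer's responsibility here; with a non-procedural DML those decisions are made by the DBMS.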
1.7 DATABASE USERS
A primary goal of a database system is to provide an improved environment for
retrieving information from and storing new information into the database. A
number of different users can interact with the system.
End Users: Usually, these are not computer professionals. They access the
database for querying, updating and generating reports. Their main activities
include constantly querying and updating the database, using standard types of queries
and updates called canned transactions that have been carefully programmed and
tested. End users come from a diverse and increasing number of areas. They simply
use applications written by database application programmers, and so require little
technical knowledge of DBMS software. For example, reservation clerks for
airlines, hotels, and car rental companies check availability for given requests and
make reservations if available. Here the user does not know the logic behind the
application.
Database Administrators
The DBMS is at the center of most modern application systems. Technology and
business requirements come together to deliver business solutions with the DBMS as
the central point of convergence. And the DBA is the guardian of the DBMS.
Each database requires at least one database administrator (DBA) to administer it.
Because a database management system can be large and can have many users, this is
often not a one-person job. In such cases, there is a group of DBAs who share
responsibility. The DBA must possess a mixture of technical expertise, political
savvy, leadership and business knowledge to succeed.
A database administrator’s responsibilities can include the following tasks:
Monitoring of growth: The DBA must also be aware of the growth of the
database. The DBA is expected to provide management with growth
forecasts so that any additional hardware needed can be ordered in a timely
manner.
To accurately build databases, and then manage data quality, integrity and security, a
thorough understanding of the data from a business perspective is mandatory. DBAs
have difficult jobs that require a delicate balance of business and technology;
leadership and understanding. Indeed, the role of the DBA is changing.
File Manager: File Manager is responsible for allocation of space on disk storage
and the data structures used to represent information stored on physical media.
Data Manager: The database manager is the program or program unit that provides
the interface between the physical level and the conceptual level. It translates DML
statements into low-level file system commands to interact with the data stored in the
database.
Functions of the database manager:
- interaction with the file manager (file system)
  - minimizing file reads and writes, as disk access is slower than main
    memory access
  - translating DML commands to file operations
- integrity enforcement
  - checking that consistency constraints are satisfied
  - taking some action when they aren't
- security enforcement
  - preventing unauthorized access to data
  - example: through a password and security classification system
- backup and recovery
  - detect when information in the database or data dictionary is lost or
    corrupted due to disk crash, power failure, software errors ...
  - restore the database to a previous consistent state
- concurrency control: making sure that concurrent updates don't give
  surprising or inconsistent results.
The database manager for a small system typically does not implement all of these
functions.
Query Processor: To retrieve the desired information from the database, the user has to
write a query in DML, either in interactive mode or in embedded form. The query
processor transforms this query into an equivalent, correct and efficient execution
strategy and sends it to the data manager for execution.
Data Dictionary: The data dictionary is used to store metadata, that is, the data about
the data. It contains a list of all schemas in the database, the number of records in each
file, the names and types of each field and relationships between different data
structures. Most database management systems keep the data dictionary hidden from
users to prevent them from accidentally destroying its contents.
Data dictionaries do not contain any actual data from the database, only bookkeeping
information for managing it. Without a data dictionary, however, a database
management system cannot access data from the database.
1.10 DISADVANTAGES
A database system generally provides on-line access to the database for many users.
In contrast, a conventional system is often designed to meet a specific need and
therefore generally provides access to only a small number of users. Because of the
larger number of users accessing the data when a database is used, the enterprise may
face additional risks compared with a conventional data processing system in the
following areas.
1. Confidentiality, Privacy and Security: When information is centralized and is
made available to users from remote locations, the possibilities of abuse are often
more than in a conventional system. To reduce the chances of unauthorized users
accessing sensitive information, it is necessary to take technical, administrative and,
possibly, legal measures. Most databases store valuable information that must be
protected against deliberate trespass and destruction.
2. Data Quality: Since the database is accessible to users remotely, adequate controls
are needed over how users update data and over data quality. With an increased
number of users accessing data directly, there are many more opportunities for users to
damage the data. Unless there are suitable controls, data quality may be
compromised.
3. Data Integrity: Since a large number of users could be using a database
concurrently, technical safeguards are necessary to ensure that the data remain
correct during operation. The main threat to data integrity comes from several
different users attempting to update the same data at the same time. The database
therefore needs to be protected against inadvertent changes by the users.
4. Enterprise Vulnerability: Centralizing all data of an enterprise in one database
may mean that the database becomes an indispensable resource. The survival of
the enterprise may depend on reliable information being available from its
database. The enterprise therefore becomes vulnerable to the destruction of the
database or to unauthorized modification of the database.
5. The Cost of using a DBMS: Conventional data processing systems are typically
designed to run a number of well-defined, preplanned processes. Such systems are
often "tuned" to run efficiently for the processes that they were designed for.
Although the conventional systems are usually fairly inflexible in that new
applications may be difficult to implement and/or expensive to run, they are
usually very efficient for the applications they are designed for.
The database approach, on the other hand, provides a flexible alternative where
new applications can be developed relatively inexpensively. The flexible approach
is not without its costs and one of these costs is the additional cost of running
applications that the conventional system was designed for. Using standardized
software is almost always less machine efficient than specialized software.
EXERCISE