DBMS Module 1
This University database stores student and course information. The database is a collection of
tables. A table has a set of named columns. Each column is of some data type. Database
manipulation involves querying and updating.
Example query: Retrieve class of 'Smith'.
Example of update: Change the class of ‘Smith’.
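The query and the update above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the STUDENT table layout, the column names (Name, Student_number, Class) and the sample rows are assumptions, not part of the University database definition given in these notes.

```python
import sqlite3

# In-memory database; the STUDENT table and its columns are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER, Class INTEGER)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [("Smith", 17, 1), ("Brown", 8, 2)],
)

# Query: retrieve the class of 'Smith'.
class_before = conn.execute(
    "SELECT Class FROM STUDENT WHERE Name = ?", ("Smith",)
).fetchone()[0]

# Update: change the class of 'Smith'.
conn.execute("UPDATE STUDENT SET Class = ? WHERE Name = ?", (2, "Smith"))
class_after = conn.execute(
    "SELECT Class FROM STUDENT WHERE Name = ?", ("Smith",)
).fetchone()[0]

print(class_before, class_after)
```

The query only reads the stored data, while the update changes it; both are expressed against the named columns of the table rather than against a file layout.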
Database Approach:
When an organization uses the database approach, it allows many programs and users to share
the data in the database. Authorized users can access, retrieve and select data by a single query
from the database. As shown in the below image, various areas within the university share and
interact with the data in the database.
In the database approach, a single database is maintained that is used by all the departments in
the organization.
Disadvantages of Traditional File processing:
1. Data Redundancy – Data redundancy occurs when the same data items are stored in
multiple files. Each department in an organization has its own separate files in the file
processing system. For example, the Student contact file and the Student file store the same
students’ names and addresses.
2. Wastes resources – Duplicating data in this way wastes storage space. People's time is
also wasted, because whenever the data is modified, every file that contains it must be
updated or deleted. Duplication also increases the chance of errors. For example, if a
student changes his or her telephone number, the institution must update every file that
stores it, not only the student contact file. If the change is not made in all the files where
the data is stored, the files become inconsistent.
3. Isolated Data – Isolated data increases the difficulty in accessing the data that is stored in
separate files in different departments. Sharing data from multiple and separate files is a
complicated procedure especially when there are many files involved.
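The redundancy problem can be sketched in a few lines of Python. The two dictionaries below stand in for two departmental files; the file names, field names and phone numbers are made up for the illustration.

```python
# Two departments keep their own copy of the same student's data.
# File names, field names and values are assumptions made for this illustration.
student_file = {"Smith": {"address": "12 Main St", "phone": "555-0100"}}
student_contact_file = {"Smith": {"address": "12 Main St", "phone": "555-0100"}}


def update_phone(file, name, new_phone):
    """Update one file only, as a single department's program would."""
    file[name]["phone"] = new_phone


def inconsistent_names(file_a, file_b):
    """Return the names whose records differ between the two files."""
    return [n for n in file_a if n in file_b and file_a[n] != file_b[n]]


# Only the contact file is updated; the other copy is forgotten.
update_phone(student_contact_file, "Smith", "555-0199")

mismatches = inconsistent_names(student_file, student_contact_file)
print(mismatches)
```

Nothing in the file-processing approach forces the second copy to be updated, so the mismatch goes unnoticed until someone compares the files.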
The main characteristics of the database approach versus the file-processing approach are the
following:
1. Self-describing nature of a database system
2. Insulation between programs and data, and data abstraction
3. Support of multiple views of the data
4. Sharing of data and multi-user transaction processing
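1. Self-describing nature of a database system: A database system contains not only
● the database itself, but also
● the complete definition of the database structure and
● constraints (limitations).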
The information such as the structure of each table, the type and storage format of each data
item, and various constraints on the data are stored in the DBMS catalog, which is called
meta-data (data about data). It is also called a data dictionary.
The catalog is used by the DBMS software and also by database users who need information
about the database structure. These definitions are specified by the database designer before
creating the actual database and are stored in the catalog.
For example, whenever a request is made to access the Name of a STUDENT record, the DBMS
software refers to the catalog to determine the structure of the STUDENT table and the position
and size of the Name data item within a STUDENT record. Part of the catalog for the University
database is given below:
TABLES
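As a rough illustration of a self-describing system, SQLite keeps its own catalog, which a program can query like any other data. The sketch below assumes a three-column STUDENT table; sqlite_master and PRAGMA table_info are SQLite's own catalog facilities and are not the catalog format used in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table layout, used only for this illustration.
conn.execute("CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER, Class INTEGER)")

# sqlite_master is SQLite's catalog table: one row per table, index, or view.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = {
    name: (cid, col_type)
    for cid, name, col_type, _notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(STUDENT)"
    )
}

print(table_names)
print(columns["Name"])  # position and declared type of the Name data item
```

The program never hard-codes the record layout; it asks the catalog where Name sits and what type it has, which is the lookup the DBMS software performs internally.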
3. Multiple Views of Data (Data Abstraction): Different users (e.g., in different departments
of an organization) have different "views" on the database. A good DBMS has facilities for
defining multiple views. This is not only convenient for users, but also provides security in
data access.
For example, from the point of view of an institution’s office employee, student data does not
include which grades were earned. As another example, a university's office employee might
think that GPA is a field of data in each student's record. In reality, the underlying database
might calculate that value each time it is needed. This is called virtual (or derived) data.
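Both examples can be sketched with SQL views in SQLite. The table names, column names and grade values are assumptions, and the GPA here is a plain unweighted average chosen only to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE GRADE_REPORT (Student_number INTEGER, Course TEXT, Grade_points REAL);
INSERT INTO STUDENT VALUES (17, 'Smith');
INSERT INTO GRADE_REPORT VALUES (17, 'MATH2410', 4.0);
INSERT INTO GRADE_REPORT VALUES (17, 'CS1310', 3.0);

-- Office view: student identity only, no grades.
CREATE VIEW OFFICE_VIEW AS
SELECT Student_number, Name FROM STUDENT;

-- GPA is derived each time it is requested; it is not stored anywhere.
CREATE VIEW STUDENT_GPA AS
SELECT s.Name AS Name, AVG(g.Grade_points) AS GPA
FROM STUDENT s JOIN GRADE_REPORT g ON g.Student_number = s.Student_number
GROUP BY s.Student_number, s.Name;
""")

office_columns = [d[0] for d in conn.execute("SELECT * FROM OFFICE_VIEW").description]
gpa = conn.execute("SELECT GPA FROM STUDENT_GPA WHERE Name = 'Smith'").fetchone()[0]
print(office_columns, gpa)
```

A user who is given access only to OFFICE_VIEW cannot see grades at all, and a user of STUDENT_GPA sees GPA as if it were a stored field.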
4. Data Sharing and Multi-user Transaction Processing: A multi-user DBMS must allow
multiple users to access the database at the same time. The DBMS must include concurrency
control software.
The concurrency control manager of the DBMS ensures that several users can update the same
data in a "controlled" manner. A fundamental role of multi-user DBMS software is to ensure
that concurrent transactions operate correctly and efficiently.
A transaction is an executing program or process that includes one or more database accesses,
such as reading or updating of database records. Each transaction should execute a logically
correct database access without interference from other transactions. The DBMS must enforce
several transaction properties.
i) The atomicity property ensures that either all the database operations in a transaction are
executed or none are.
ii) The isolation property ensures that each transaction appears to execute in isolation from
other transactions, even though hundreds of transactions may be executing concurrently.
Applications such as airline reservation systems are known as online transaction processing
applications.
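The atomicity property can be sketched with a small seat-reservation example in SQLite. The FLIGHT and BOOKING tables, the flight number and the passenger names are hypothetical; the sketch shows only atomicity within one connection, not isolation between concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FLIGHT ("
    "Flight_no TEXT PRIMARY KEY, "
    "Seats_left INTEGER CHECK (Seats_left >= 0))"
)
conn.execute("CREATE TABLE BOOKING (Flight_no TEXT, Passenger TEXT)")
conn.execute("INSERT INTO FLIGHT VALUES ('AI101', 1)")
conn.commit()


def reserve_seat(conn, flight_no, passenger):
    """Record the booking and decrement the seat count as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO BOOKING VALUES (?, ?)", (flight_no, passenger))
            conn.execute(
                "UPDATE FLIGHT SET Seats_left = Seats_left - 1 WHERE Flight_no = ?",
                (flight_no,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = reserve_seat(conn, "AI101", "Smith")
second = reserve_seat(conn, "AI101", "Brown")  # no seats left: the CHECK fails
bookings = [r[0] for r in conn.execute("SELECT Passenger FROM BOOKING")]
seats_left = conn.execute("SELECT Seats_left FROM FLIGHT").fetchone()[0]
print(first, second, bookings, seats_left)
```

In the second call the booking row is inserted before the seat update fails, yet the rollback removes it again, so the database never shows a booking without a matching seat.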
Database Users
Many people are involved in the design, use, and maintenance of a large
database with hundreds of users.
2. Database Designers
Database designers are responsible for
i) identifying the data to be stored in the database,
ii) choosing appropriate structures to represent and store this data,
iii) communicating with all database users to understand their requirements, and
iv) creating a design that meets these requirements.
3. End Users
End users are the people whose jobs require access to the database for querying,
updating, and generating reports. The database primarily exists for their use.
There are several categories of end users:
■ Casual end users use the database occasionally, but they may need different
information each time. They use a query language to specify their requests. Casual
users learn only a few facilities that they may use repeatedly.
■ Naive or parametric end users make up a major portion of database end
users. (Naive means having no experience: parametric end users have no DBMS
knowledge, but they frequently use database applications in their daily work to get the
desired results.) They use program interfaces to access the database system. E.g., bank
tellers and reservation agents for airlines, hotels, car rental companies, etc.
■ Sophisticated end users include engineers, scientists, business analysts,
and others who thoroughly familiarize themselves with the facilities of the DBMS.
(They have knowledge about the database). They implement their own applications to
meet their complex requirements. Sophisticated users try to learn most of the DBMS
facilities in order to achieve their complex requirements.
In addition, a DBMS provides some additional capabilities. They can be listed as follows:
1. Controlling Redundancy
Redundancy in storing the same data multiple times leads to several problems. They
are:
i) Duplication of effort: There is the need to update multiple copies of the
same data item.
ii) Storage space is wasted when the same data is stored repeatedly.
iii) Files that represent the same data may become inconsistent.
A DBMS should ensure that no inconsistencies are introduced when data is
updated. Data normalization helps ensure consistency and saves storage space.
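A minimal sketch of controlled redundancy, assuming a STUDENT table that holds the phone number once and an ENROLLMENT table that refers to the student by key. The table names, columns and values are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
    Student_number INTEGER PRIMARY KEY,
    Name TEXT,
    Phone TEXT
);
-- Enrollment rows refer to the student by key instead of repeating the phone.
CREATE TABLE ENROLLMENT (
    Student_number INTEGER REFERENCES STUDENT(Student_number),
    Course TEXT
);
INSERT INTO STUDENT VALUES (17, 'Smith', '555-0100');
INSERT INTO ENROLLMENT VALUES (17, 'CS1310');
INSERT INTO ENROLLMENT VALUES (17, 'MATH2410');
""")

# One UPDATE touches the single stored copy of the phone number.
conn.execute(
    "UPDATE STUDENT SET Phone = ? WHERE Student_number = ?", ("555-0199", 17)
)

# Every enrollment row sees the same phone number through the join.
phones_seen = {
    row[0]
    for row in conn.execute(
        "SELECT s.Phone FROM ENROLLMENT e "
        "JOIN STUDENT s ON s.Student_number = e.Student_number"
    )
}
print(phones_seen)
```

Because the phone number is stored in one place, there is no second copy that could be left out of date.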
5. Economies of Scale.
A DBMS enables the whole organization to invest in more powerful processors, storage
devices, or communication gear, rather than having each department purchase its own
(lower performance) equipment. This reduces overall costs of operation and
management.
6. Improved Data Integrity – When users modify data in the database, they can
directly make changes to one file instead of multiple files. Therefore, mismatches
between different copies of the same data will not occur. The database approach thus
improves the quality and integrity of the data by reducing the possibility of inconsistencies.
Database System: The database and DBMS software together are called a database
system. The figure shows a simplified database system environment.
Data Models
The diagram displays the structure of each record type. The following
Figure shows a schema diagram for the University database.
Data Independence:
Data independence is the capacity to change the schema at one level of a
database system without having to change the schema at the next higher
level. There are two types of data independence:
The DBMS must provide appropriate languages and interfaces for each
category of users. In the database world, the two functions of declaration
and manipulation are separated into two different languages: DDL and DML.
DDL: In many DBMSs, the data definition language (DDL) is used by the
DBA and by database designers to define schemas. The DBMS has a DDL
compiler whose function is to process DDL statements in order to identify
descriptions of the schema constructs and to store the schema description
in the DBMS catalog.
A view definition language (VDL) is used to specify user views and their
mappings to the conceptual schema, but in most DBMSs the DDL is used
to define both conceptual and external schemas.
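The separation of declaration and manipulation can be illustrated with SQLite, where CREATE TABLE plays the role of DDL and INSERT, UPDATE and SELECT play the role of DML. The COURSE table and its contents are assumptions for this sketch, and the stored_definition helper is a hypothetical name introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: declares the schema; SQLite records the definition in its catalog.
conn.execute("CREATE TABLE COURSE (Course_number TEXT PRIMARY KEY, Course_name TEXT)")


def stored_definition(conn, table):
    """Return the schema description that the catalog holds for a table."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row[0]


ddl_before = stored_definition(conn, "COURSE")

# DML: manipulates the stored data; the schema description is left untouched.
conn.execute(
    "INSERT INTO COURSE VALUES (?, ?)", ("CS1310", "Intro to Computer Science")
)
conn.execute(
    "UPDATE COURSE SET Course_name = ? WHERE Course_number = ?",
    ("Intro to CS", "CS1310"),
)
rows = conn.execute("SELECT Course_number, Course_name FROM COURSE").fetchall()
ddl_after = stored_definition(conn, "COURSE")

print(ddl_before == ddl_after, rows)
```

The DDL statement changes what the catalog says about the database, while the DML statements change only the rows stored under that definition.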
The top part of the figure shows the following users and their interfaces:
i) DBA staff
ii) casual users
iii) application programmers
iv) parametric users
i) The DBA staff works on defining the database and making changes to the
database definition. They use the DDL and other privileged commands.
The DDL compiler processes schema definitions, specified in the DDL, and
stores descriptions of the schemas (meta-data) in the DBMS catalog.
The catalog includes information such as the names of files, sizes of files,
data item names and data types of data items, storage details of each file,
mapping information among schemas, and constraints. (In addition, the
catalog stores many other types of information that are needed by the
DBMS modules.)
ii) Casual users occasionally need information from the database. They use
interfaces to formulate queries. These queries are parsed and validated for
correctness of the query syntax by a query compiler. The query compiler
compiles them into an internal form. (This internal query is subjected to
query optimization.) The query optimizer optimizes the query by the
rearrangement and possible reordering of operations, elimination of
redundancies, and use of correct algorithms and indexes.
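One way to observe an optimizer's choice is SQLite's EXPLAIN QUERY PLAN statement. The sketch below assumes a STUDENT table and a hypothetical index named idx_student_name; the exact wording of the plan text varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table layout, used only for this illustration.
conn.execute("CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER, Class INTEGER)")


def plan(conn, sql, params=()):
    """Return the 'detail' text of each step in the plan SQLite chose."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


query = "SELECT Class FROM STUDENT WHERE Name = ?"

# Without an index, the only option is to scan the whole table.
plan_without_index = plan(conn, query, ("Smith",))

# After an index on Name is created, the optimizer can search through it instead.
conn.execute("CREATE INDEX idx_student_name ON STUDENT (Name)")
plan_with_index = plan(conn, query, ("Smith",))

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both cases; only the execution strategy changes, which is the kind of decision the query optimizer makes on the user's behalf.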
iii) Application programmers write programs in host languages such as
Java, C, or C++. These programs are submitted to a precompiler. The
precompiler extracts DML commands from the application program. These
commands are sent to the DML compiler for compilation into object code for
database access. The rest of the program is sent to the host language
compiler. The object code for the DML commands and the rest of the
program are linked, forming a canned transaction (standard transaction).
iv) Canned transactions are executed repeatedly by parametric users,
who simply supply the parameters to the transactions. Each execution is
considered to be a separate transaction. An example is a bank withdrawal
transaction.
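A canned withdrawal transaction might look like the following sketch, in which the logic is fixed and the caller supplies only the account number and the amount. The ACCOUNT table, the account number and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (Account_no INTEGER PRIMARY KEY, Balance INTEGER)")
conn.execute("INSERT INTO ACCOUNT VALUES (1001, 500)")
conn.commit()


def withdraw(conn, account_no, amount):
    """Canned withdrawal: fixed logic, the caller supplies only the parameters."""
    with conn:  # each call runs as its own transaction
        cur = conn.execute(
            "UPDATE ACCOUNT SET Balance = Balance - ? "
            "WHERE Account_no = ? AND Balance >= ?",
            (amount, account_no, amount),
        )
        # False when funds are insufficient or the account does not exist.
        return cur.rowcount == 1


# The same canned transaction executed three times with runtime parameters.
results = [withdraw(conn, 1001, 200) for _ in range(3)]
balance = conn.execute(
    "SELECT Balance FROM ACCOUNT WHERE Account_no = 1001"
).fetchone()[0]
print(results, balance)
```

A bank teller using such a program never writes a query; each press of the button is a separate execution of the same prepared logic with new parameter values.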
In the lower part, the runtime database processor executes (1) the
privileged commands, (2) the executable query plans, and (3) the canned
transactions with runtime parameters. It works with the system catalog. It
also works with the stored data manager, which uses basic operating
system services. The runtime database processor handles other aspects of
data transfer, such as management of buffers in the main memory. Some
DBMSs have their own buffer management module while others depend on
the OS for buffer management.
The DBMS interacts with the operating system when disk accesses are
needed. If the computer system is mainly dedicated to running the
database server, the DBMS will control the main memory buffering of disk
pages. The DBMS also interfaces with compilers for general purpose host
programming languages, and with application servers and client programs
running on separate machines through the system network interface.