Database Systems: Concepts, Design and Applications (ISBN 9788131760925)
Second Edition
S. K. SINGH
Head
Maintenance Engineering Department (Electrical)
Tata Steel Limited
Jamshedpur
Foreword
Preface
1.1 Introduction
1.2.1 Data
1.2.2 Information
1.2.5 Metadata
1.2.8 Records
1.2.9 Files
1.4 Database
Review Questions
2.1 Introduction
2.2 Schemas, Sub-schemas, and Instances
2.2.1 Schema
2.2.2 Sub-schema
2.2.3 Instances
2.5 Mappings
Review Questions
3.1 Introduction
3.6 Indexing
Review Questions
4.1 Introduction
4.3.1 Domain
Review Questions
5.1 Introduction
Review Questions
6.1 Introduction
6.2.1 Entities
6.2.2 Relationship
6.2.3 Attributes
6.2.4 Constraints
Review Questions
7.1 Introduction
7.3.1 Specialisation
7.3.2 Generalisation
7.4 Categorisation
Review Questions
8.1 Introduction
Review Questions
9.1 Introduction
9.3 Decomposition
Review Questions
Chapter 10 Normalization
10.1 Introduction
10.2 Normalization
Review Questions
11.1 Introduction
Review Questions
12.1 Introduction
12.3.2 Schedule
12.4.3 Deadlocks
Review Questions
13.1 Introduction
13.5.4 Checkpoints
Review Questions
14.1 Introduction
14.5 Firewalls
Review Questions
15.1 Introduction
15.2 Object-Oriented Data Model (OODM)
15.3.1 Objects
15.3.4 Classes
15.3.7 Operation
15.3.8 Polymorphism
Review Questions
16.1 Introduction
Review Questions
17.1 Introduction
17.4.1 Speed-up
17.4.2 Scale-up
17.4.3 Synchronization
17.4.4 Locking
Review Questions
18.1 Introduction
18.5.1 Semi-JOIN
18.6.3 Timestamping
Review Questions
19.1 Introduction
Review Questions
20.1 Introduction
Review Questions
21.1 Introduction
Review Questions
22.1 Introduction
Review Questions
23.1 Introduction
23.4.2 Installation Prerequisite: DB2 Workgroup Server Edition and Non-partitioned DB2
Enterprise Server Edition
23.6.1 Performing Installation Operation for IBM DB2 Universal Database Version 8.1
Review Questions
Chapter 24 Oracle
24.1 Introduction
24.4 SQL*Plus
Review Questions
25.1 Introduction
25.4.6 Security
Review Questions
26.1 Introduction
26.2.1 Tables
26.2.2 Queries
26.2.3 Reports
26.2.4 Forms
26.2.5 Macros
Review Questions
Chapter 27 MySQL
27.1 Introduction
Review Questions
28.1 Introduction
Review Questions
Answers
Bibliography
About the Author
S. K. GUPTA
Professor
Department of Computer Science and Engineering
IIT Delhi
Preface to the Second Edition
The first edition of the book received an overwhelming response from both
students and teaching faculty of undergraduate and postgraduate
engineering courses, and also from practicing engineers in the computer and IT-
application industries. The large number of reprints of the first edition in the
last five years indicates the great demand and popularity of the book amongst
the student and teaching communities.
The advancement and rapid growth in computing and communication
technologies have revolutionized computer applications in everyday life. The
dependence of the business establishment on computers is also increasing at
an accelerated pace. Thus, the key to success of a modern business is an
effective data-management strategy and interactive data-analysis
capabilities. To meet these challenges, database management has evolved
from a specialized computer application to a central component of the
modern computing environment. This has resulted in the development of
new database application platforms.
While retaining the features of the previous edition, this book contains a
chapter on a new commercial database called “TERADATA Relational
Database Management System”. Teradata is a parallel processing system
and is linearly and predictably scalable in all dimensions of a database
system workload (data volume, breadth, number of users and complexity of
queries). Due to the scalability features, it is popular in enterprise data
warehousing applications.
This book also has a study card giving brief definitions of important
topics related to DBMS. This will help students to quickly grasp the subject.
Preface
ACKNOWLEDGEMENTS
S. K. SINGH
Part-I
DATABASE CONCEPTS
Chapter 1
1.1 INTRODUCTION
With the growing use of computers, organisations are fast migrating from
manual systems to computerised information systems, for which the data
within the organisation is a basic resource. Therefore, proper organisation
and management of data is necessary to run the organisation efficiently. The
efficient use of data for planning, production control, marketing, invoicing,
payroll, accounting and other functions in an organisation has a major
impact on its competitive edge. In this section, formal definitions of the terms
used in databases are provided.
1.2.1 Data
Data may be defined as known facts that can be recorded and that have
implicit meaning. Data are raw or isolated facts from which the required
information is produced.
Data are distinct pieces of information, usually formatted in a special way.
They are binary computer representations of stored logical entities. A single
piece of data represents a single fact about something in which we are
interested. For an industrial organisation, it may be the fact that Thomas
Mathew’s employee (or social security) number is 106519, or that the largest
supplier of the casting materials of the organisation is located in Indore, or
that the telephone number of one of the key customers M/s Elbee Inc. is 001-
732-3931650. Similarly, for a Research and Development set-up it may be
the fact that the largest number of new products as on date is 100, or for a
training institute it may be the fact that the largest enrolment was in the Database
Management course. Therefore, a piece of data is a single fact about
something that we care about in our surroundings.
Data can exist in a variety of forms that have meaning in the user’s
environment such as numbers or text on a piece of paper, bits or bytes stored
in computer’s memory, or as facts stored in a person’s mind. Data can also
be objects such as documents, photographic images and even video
segments. The example of data is shown in Table 1.1.
Table 1.1 Examples of data (sample fields: Amount-payable, Skill-type)
Usually there are many facts to describe something of interest to us. For
example, let us consider the facts that as a Manager of M/s Elbee Inc., we
might be interested in our employee Thomas Mathew. We want to remember
that his employee number is 106519, his basic salary rate is Rs. 2,00,000
(US$ 4000) per month, his home town is Jamshedpur, his home country is
India, his date of birth is September 6th, 1957, his marriage anniversary is on
May 29th, his telephone number is 0091-657-2431322 and so forth. We need
to know these things in order to process Mathew’s payroll check every
month, to send him company greeting cards on his birthday or marriage
anniversary, print his salary slip, to notify his family in case of any
emergency and so forth. It certainly seems reasonable to collect all the facts
(or data) about Mathew that we need for the stated purposes and to keep
(store) all of them together. Table 1.2 shows all these facts about Thomas
Mathew that concern payroll and related applications.
Data is also the plural of datum, which means a single piece of
information. However, in practice, data is used as both the singular and the
plural form of the word. The term data is often used to distinguish machine-
readable (binary) information from human-readable (textual) information.
For example, some applications make a distinction between data files (that
contain binary data) and text files (that contain ASCII data). Either numbers,
or characters or both can represent data.
Figure 1.1 shows a three-layer data structure that is generally used for data
warehousing applications (the detailed discussion on data warehouse is
given in Chapter 20).
Fig. 1.1 Three-layer data structure
1.2.2 Information
Data and information are closely related and are often used interchangeably.
Information is processed, organised or summarised data. It may be defined
as a collection of related data that, when put together, communicates a meaningful
and useful message to a recipient who uses it to make decisions or to
interpret the data to get its meaning.
Data are processed to create information, which is meaningful to the
recipient, as shown in Fig. 1.2. For example, from the salesperson’s view, we
might want to know the current balance of a customer M/s Waterhouse Ltd.,
or perhaps we might ask for the average current balance of all the customers
in Asia. The answers to such questions are information. Thus, information
involves the communication and reception of knowledge or intelligence.
Information apprises and notifies, surprises and stimulates. It reduces
uncertainty, reveals additional alternatives or helps in eliminating irrelevant
or poor ones, influences individuals and stimulates them into action. It gives
warning signals before something starts going wrong. It predicts the future
with reasonable level of accuracy and helps the organisation to make the best
decisions.
Now let us modify the data in Example 1.1 by adding a few additional data
items and providing some structure, and place the same data in the context shown in
Fig. 1.4 (a). Now the data has been rearranged or processed to provide a
meaningful message or information, which is an Employee Master of M/s
Metal Rolling Pvt. Ltd. Now this is useful information for the departmental
head or the organisational head for taking decisions related to the additional
requirement of experienced and qualified manpower.
1.2.5 Metadata
Metadata (also called the data dictionary) is data about the data. It is
also called the system catalog; it is the self-describing part of the
database that provides program-data independence. The system catalog
integrates the metadata. Metadata is the data that describes objects in the
database and makes it easier for those objects to be accessed or manipulated. It
describes the database structure, constraints, applications, authorisation,
sizes of data types and so on. These are often used as an integral tool for
information resource management.
Metadata is found in documentation describing source systems. It is used
to analyze the source files selected to populate the data warehouse. It
is also produced at every point along the way as data goes through the data
integration process. Therefore, it is an important by-product of the data
integration process. The efficient management of a production or enterprise
warehouse relies heavily on the collection and storage of metadata. Metadata
is used for understanding the content of the source, all the conversion steps it
passes through and how it is finally described in the target system or data
warehouse.
Metadata is used by developers who rely on it to help them develop the
programs, queries, controls and procedures to manage and manipulate the
warehouse data. Metadata is also used for creating reports and graphs in
front-end data access tools, as well as for the management of enterprise-wide
data and report changes for the end-user. Change management relies on
metadata to administer all of the related objects for example, data model,
conversion programs, load jobs, data definition language (DDL), and so on,
in the warehouse that are impacted by a change request. Metadata is
available to database administrators (DBAs), designers and authorised users
as on-line system documentation. This improves the control of database
administrators (DBAs) over the information system and the users’
understanding and use of the system.
1.2.8 Records
A record is a collection of logically related fields or data items, with each
field possessing a fixed number of bytes and having a fixed data type. A
record consists of values for each field. It is an occurrence of a named
collection of zero, one, or more than one data items or aggregates. The data
items are grouped together to form records. The grouping of data items can
be achieved in different ways to form different records for different
purposes. These records are retrieved or updated using programs.
1.2.9 Files
A file is a collection of related records. In many cases, all
records in a file are of the same record type (each record having an identical
format). If every record in the file has exactly the same size (in bytes), the
file is said to be made up of fixed-length records. If different records in the
file have different sizes, the file is said to be made of variable-length
records.
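In a relational DBMS this distinction typically surfaces in the column types chosen when a table is declared. The following sketch uses hypothetical table and column names and assumes standard SQL data types: a CHAR column always occupies its declared width, giving fixed-length records, while a VARCHAR column stores only what each value needs, giving variable-length records.
CREATE TABLE FIXED_EMP (
    EMP_NO   CHAR(6),         -- always 6 characters
    EMP_NAME CHAR(30)         -- padded to 30 characters, so every row has the same size
);
CREATE TABLE VAR_EMP (
    EMP_NO   CHAR(6),
    EMP_NAME VARCHAR(30)      -- stores only the characters supplied, so row sizes differ
);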
Table 1.3 Employee payroll file for M/s Metal Rolling Pvt. Ltd.
Data dictionary is usually a part of the system catalog that is generated for
each database. A useful data dictionary system usually stores and manages
the following types of information:
Descriptions of the schema of the database.
Detailed information on physical database design, such as storage structures, access paths
and file and record sizes.
Description of the database users, their responsibilities and their access rights.
High-level descriptions of the database transactions and applications and of the relationships
of users to transactions.
The relationship between database transactions and the data items referenced by them. This is
useful in determining which transactions are affected when certain data definitions are
changed.
Usage statistics such as frequencies of queries and transactions and access counts to different
portions of the database.
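Many SQL-based DBMSs expose part of this catalog information through the standard INFORMATION_SCHEMA views, so some of the entries listed above can themselves be queried with ordinary SELECT statements. The sketch below is hedged: the table name EMPLOYEE is hypothetical, and the exact views and columns available vary from product to product.
-- Columns and data types recorded in the catalog for a hypothetical EMPLOYEE table
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'EMPLOYEE';
-- Access rights (privileges) recorded for the same table
SELECT GRANTEE, PRIVILEGE_TYPE
FROM INFORMATION_SCHEMA.TABLE_PRIVILEGES
WHERE TABLE_NAME = 'EMPLOYEE';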
1.3.1.1 Entities
An entity is a real physical object or an event that the user is interested in keeping
track of. In other words, any item about which information is stored is called an
entity. For example, in Fig. 1.9 (b), Thomas Mathew, a real living person
and an employee of M/s ABC Motors Ltd., is an entity for which the
company is interested in keeping track of various details or facts.
Similarly, in Fig. 1.9 (a), the Maharaja model car (Model no. M-1000), a real
physical object manufactured by M/s ABC Motors Ltd., is an entity. A
collection of entities of the same type, for example "all" of the
company's employees (the rows in the EMPLOYEE file in Fig. 1.9 (b)) or
"all" the company's models (the rows in the INVENTORY file in Fig. 1.9 (a)), is
called an entity set. In other words, we can say that a record describes an
entity and a file describes an entity set.
1.3.1.2 Attributes
An attribute is a property or characteristic (field) of an entity. In Fig. 1.9 (b),
Mathew’s EMP-NO, EMP-SALARY and so forth, all are his attributes.
Similarly, in Fig. 1.9 (a), Maharaja car’s MOD-NO, MOD-DESC, UNIT-
PRICE and so forth, all are its attributes. In other words, we can say that,
values in all the fields are attributes. Fig. 1.12 shows an example of an entity
set and its attributes.
1.3.1.3 Relationships
The associations or the ways that different entities relate to each other are
called relationships, as shown in Fig. 1.11. The relationship between any pair
of entities of a data dictionary can have value to some part or department of
the organisation. Some data dictionaries define a limited set of relationships
among their entities, while others allow relationships between every pair
of entities. Some examples of common data dictionary relationships are
given below:
Record construction: for example, which field appears in which records.
Security: for example, which user has access to which file.
Impact of change: for example, which programs might be affected by changes to which files.
Physical residence: for example, which files are residing in which storage device or disk
packs.
Program data requirement: for example, which programs use which file.
Responsibility: for example, which users are responsible for updating which files.
Let us take the example shown in Fig. 1.9 (b), wherein there is only one
EMP-NO (employee identification number) in the EMPLOYEE file of the
personnel department for each employee, which is unique. This is called a
unary association or one-to-one (1:1) relationship, as shown in Fig. 1.13
(a).
Now let us assume that an employee belongs to a manufacturing
department. While for a given employee there is one manufacturing
department, in the manufacturing department there may be many employees.
Thus, in this case, there is a one-to-one relationship in one direction and a
multiple association in the other direction. This combination is called a one-to-
many (1:m) relationship, as shown in Fig. 1.13 (b).
Fig. 1.13 Entity relationship (ER) diagram
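In SQL, such associations are usually realised with primary and foreign keys. The sketch below is only illustrative; the table and column names are hypothetical, and the foreign key expresses the one-to-many case: one DEPARTMENT row can be referenced by many EMPLOYEE rows.
CREATE TABLE DEPARTMENT (
    DEPT_NO   CHAR(4) PRIMARY KEY,
    DEPT_NAME VARCHAR(30)
);
CREATE TABLE EMPLOYEE (
    EMP_NO   CHAR(6) PRIMARY KEY,           -- each employee number identifies exactly one employee
    EMP_NAME VARCHAR(30),
    DEPT_NO  CHAR(4) REFERENCES DEPARTMENT  -- many employees may carry the same DEPT_NO (1:m)
);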
1.3.1.4 Key
The data item (or field) that a computer uses to identify a record in a
database system is referred to as a key. In other words, a key is a single attribute
or combination of attributes of an entity set that is used to identify one or
more instances of the set. There are various types of keys:
Primary key
Concatenated key
Secondary key
Super key
Primary key is used to uniquely identify a record. It is also called entity
identifier, for example, EMP-NO in the EMPLOYEE file of Fig. 1.9 (b) and
MOD-NO in the INVENTORY file of Fig. 1.9 (a). When more than one data
item is used to identify a record, it is called concatenated key, for example,
both EMP-NO and EMP-FNAME in EMPLOYEE file of Fig. 1.9 (b) and
both MOD-NO and MOD-TYPE in INVENTORY file of Fig. 1.9 (a).
Secondary key is used to identify all those records which have a certain
property. It is an attribute or combination of attributes that may not be a
concatenated key but that classifies the entity set on a particular
characteristic. A super key includes any number of attributes that possess a
uniqueness property. For example, if we add additional attributes to a
primary key, the resulting combination would still uniquely identify an
instance of the entity set. Such keys are called super keys. Thus, a primary
key is a minimal super key.
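These key types can be declared directly in SQL. The following sketch uses hypothetical tables loosely modelled on the INVENTORY and SALES files mentioned above (with underscores in place of hyphens so that the identifiers are valid SQL); the CREATE INDEX statement used for the secondary key is supported by most products, though it is not part of the SQL standard.
CREATE TABLE INVENTORY (
    MOD_NO   CHAR(6),
    MOD_TYPE CHAR(10),
    MOD_DESC VARCHAR(30),
    PRIMARY KEY (MOD_NO),             -- primary key: uniquely identifies each record
    UNIQUE (MOD_NO, MOD_TYPE)         -- a super key: the combination remains unique after adding MOD_TYPE
);
CREATE TABLE SALES (
    CUST_ID  CHAR(4),
    PROD_ID  CHAR(6),
    PROD_QTY NUMERIC(3),
    PRIMARY KEY (CUST_ID, PROD_ID)    -- concatenated (composite) key: two attributes together identify a record
);
-- A secondary key is typically supported by an index on a non-unique attribute.
CREATE INDEX INV_DESC_IDX ON INVENTORY (MOD_DESC);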
1.4 DATABASE
As explained in the earlier sections, data (or data item) is a distinct piece
of information. Relationships represent a correspondence (or
communication) between various data elements. Constraints are predicates
that define correct database states. Schema describes the organisation of data
and relationships within the database. It defines various views of the
database for the use of the various system components of the database
management system and for application security. A schema separates the
physical aspect of data storage from the logical aspects of data
representation.
An organisation of a database is shown in Fig. 1.15. It consists of the
following three independent levels:
Physical storage organisation or internal schema layer
Overall logical organisation or global conceptual schema layer
Programmers’ logical organisation or external schema layer.
Fig. 1.15 Database organisation
The internal schema defines how and where the data are organised in
physical data storage. The conceptual schema defines the stored data
structure in terms of the database model used. The external schema defines a
view of the database for particular users. A database management system
provides for accessing the database while maintaining the required
correctness and consistency of the stored data.
Data description language (DDL): It allows users to define the database, specify the data
types and data structures, and the constraints on the data to be stored in the database, usually
through the data definition language. The DDL translates the schema written in a source language
into an object schema, thereby creating the logical and physical layout of the database.
Data manipulation language (DML) and query facility: It allows users to insert, update,
delete and retrieve data from the database, usually through data manipulation language
(DML). It provides general query facility through structured query language (SQL).
Software for controlled access of database: It provides controlled access to the database,
for example, preventing unauthorized users from accessing the database, providing a
concurrency control system to allow shared access of the database, activating a recovery
control system to restore the database to a previous consistent state following a hardware or
software failure, and so on. A small access-control example in SQL follows this list.
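In SQL, such controlled access is commonly expressed with GRANT and REVOKE statements. The user name and table below are hypothetical and serve only as a sketch.
-- Allow a payroll clerk to read and update employee data, but nothing else.
GRANT SELECT, UPDATE ON EMPLOYEE TO PAYROLL_CLERK;
-- Withdraw the update right again if responsibilities change.
REVOKE UPDATE ON EMPLOYEE FROM PAYROLL_CLERK;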
The database and the DBMS software together are called a database system. A
database system overcomes the limitations of the traditional file-oriented system,
such as a large amount of data redundancy, poor data control, inadequate data
manipulation capabilities and excessive programming effort, by supporting
an integrated and centralized data structure.
Let us take an example of M/s Metal Rolling Pvt. Ltd. having a very small
database containing just one file, called EMPLOYEE, as shown in Table 1.4.
The EMPLOYEE file in turn contains data concerning the details of the
employees working in the company. Fig. 1.17 depicts the various operations
that can be performed on the EMPLOYEE file and the results thereafter
displayed on the computer screen.
Table 1.4 EMPLOYEE file of M/s Metal Rolling Pvt. Ltd.
With the assistance of the DP department, the files were used for a number of
different applications by the user departments, for example, an accounts
receivable program written to generate billing statements for customers. This
program used the CUSTOMER and SALES files; these files were both
stored in the computer in order by CUST-ID and were merged to create a
printed statement. Similarly, a sales statement generation program (using
PRODUCT and SALES files) was written to generate product-wise sales
performance. This type of program, which accomplishes a specific task of
practical value in a business situation is called application program or
application software. Each application program that is developed is designed
to meet the specific needs of the particular requesting department or user
group.
Fig. 1.18 illustrates structures in which application programs are written
specifically for each user department for accessing their own files. Each set
of departmental programs handles data entry, file maintenance and the
generation of a fixed set of specific reports. Here, the physical structure and
storage of the data files and records are defined in the application program.
For example:
It can be seen from the above examples that there is a significant amount of
duplication of data storage in different departments (for example, CUST-ID
and PROD-ID), which is generally true with file-oriented systems.
Fig. 1.19 shows an example of data inconsistency in which a field for product description is
stored in all three department files, namely SALES, PRODUCT and
ACCOUNTS. It can be seen in this example that even though it was always the product
description, the related field in the three department files often had a different name, for
example, PROD-DESC, PROD-DES and PRODDESC. Also, the same data field might have
a different length in the various files, for example, 15 characters in the SALES file, 20 characters
in the PRODUCT file and 10 characters in the ACCOUNTS file. Furthermore, suppose a product
description was changed from steel cabinet to steel chair. This duplication (or redundancy) of
data increased the maintenance overhead and storage costs. As shown in Fig. 1.19, the
product description field might be updated immediately in the SALES file but updated
incorrectly, or only the following week, in the PRODUCT file and the ACCOUNTS file. Over a period of
time, such discrepancies can cause serious degradation in the quality of information
contained in the data files and can also affect the accuracy of reports.
c. Program-data dependence: As we have seen, file descriptions (physical structure, storage
of the data files and records) are defined within each application program that accesses a
given file. For example, “Account receivable program” of Fig. 1.18 accesses both
CUSTOMER file and SALES file. Therefore, this program contains a detailed file
description for both these files. As a consequence, any change for a file structure requires
changes to the file description for all programs that access the file. It can also be noticed in
Fig. 1.18 that SALES file has been used in both “Account receivable program” and “Sales
statement program”. If it is decided to change the CUST-ID field length from 4 characters to
6 characters, the file descriptions in each program that is affected would have to be modified
to conform to the new file structure. It is often difficult to even locate all programs affected by
such changes, and making such changes could be very time consuming and error prone.
This characteristic of file-oriented systems is known as program-data dependence.
d. Poor data control: As shown in Fig. 1.19, a file-oriented system being decentralised in
nature, there was no centralised control at the data element (field) level. It was very
common for a data field to have multiple names, defined by the various departments of an
organisation depending on the file it was in. This could lead to different meanings of a
data field in different contexts and, conversely, the same meaning for different fields. This leads
to poor data control, resulting in considerable confusion.
e. Limited data sharing: There are limited data-sharing opportunities with the traditional file-
oriented system. Each application has its own private files and users have little opportunity to
share data outside their own applications. To obtain data from several incompatible files in
separate systems will require a major programming effort. In addition, a major management
effort may also be required since different organisational units may own these different files.
f. Inadequate data manipulation capabilities: File-oriented systems do not provide
strong connections between data in different files, and therefore their data manipulation
capability is very limited.
g. Excessive programming effort: There was a very high interdependence between programs
and data in the file-oriented system, and therefore an excessive programming effort was required
for a new application program to be written. Even though an existing file might contain some
of the data needed, the new application often required a number of other data fields that were
not available in the existing file. As a result, the programmer had to rewrite the code for the
definitions of the needed data fields from the existing file as well as the definitions of all new data
fields. Therefore, each new application required that the developers (or programmers)
essentially start from scratch by designing new file formats and descriptions and then writing
the file access logic for each new program. Also, both initial and maintenance programming
efforts for management information applications were significant.
h. Security problems: Every user of the database system should not be allowed to access all
the data. Each user should be allowed to access only the data concerning his or her area of
application. Since application programs were added to the file-oriented system in an ad hoc manner,
it was difficult to enforce such a security requirement.
Data: From the user’s point of view, the most important component of
database system is perhaps the data. The term data has been explained in
Section 1.2.1. The totality of data in the system is stored in a single
database, as shown in Fig. 1.20 (b). The data in a database are both
integrated and shared. Data integration means that the database
can be thought of as a unification of several otherwise distinct files, with at
least part of the redundancy among the files eliminated. With data sharing,
individual pieces of data in the database can be shared among different users,
and each of those users can have access to the same piece of data, possibly
for different purposes. Different users can even access the same
piece of data concurrently (at the same time). Such concurrent access of data
by different users is possible because the database is integrated.
Depending on the size and requirement of an organisation or enterprise,
database systems are available on machines ranging from the small personal
computers to the large mainframe computers. The requirement could be a
single-user system (in which at most one user can access the database at a
given time) or multi-user system (in which many users can access the
database at the same time).
Hardware: All the physical devices of a computer system are termed hardware.
The computer can range from a personal computer
(microcomputer), to a minicomputer, to a single mainframe, to a network of
computers, depending upon the organisation’s requirement and the size of
the database. From the point of view of the database system the hardware
can be divided into two components:
The processor and associated main memory to support the execution of database system
(DBMS) software and
The secondary (or external) storage devices (for example, hard disk, magnetic disks, compact
disks and so on) that are used to hold the stored data, together with the associated peripherals
(for example, input/output devices, device controllers, input/output channels and so on).
A database system requires a minimum amount of main memory and disk
space to run. With a large number of users, a very large amount of main
memory and disk space is required to maintain and control the huge quantity
of data stored in a database. In addition, high-speed computers, networks and
peripherals are necessary to execute the large number of data accesses required
to retrieve information in an acceptable amount of time. The advancement in
computer hardware technology and the development of powerful and less
expensive computers have resulted in increased development of database
technology and its applications.
Software: Software is the basic interface (or layer) between the physical
database and the users. It is most commonly known as database management
system (DBMS). It comprises the application programs together with the
operating system software. All requests from the users to access the database
are handled by DBMS. DBMS provides various facilities, such as adding
and deleting files, retrieving and updating data in the files and so on.
Application software is generally written by company employees to solve a
specific common problem.
Application programs are written typically in a third-generation
programming language (3GL), such as C, C++, Visual Basic, Java, COBOL,
Ada, Pascal, Fortran and so on, or using fourth-generation language (4GL),
such as SQL, embedded in a third-generation language. Application
programs use the facilities of the DBMS to access and manipulate data in the
database, providing reports or documents needed for the information and
processing needs of the organisation. The operating system software
manages all hardware components and makes it possible for all other
software to run on the computers.
Users: The users are the people interacting with the database system in
any form. There could be various categories of users. The first category of
users is the application programmers who write database application
programs in some programming language. The second category of users is
the end users who interact with the system from online workstations or
terminals and access the database via one of the online application
programs to get information for carrying out their primary business
responsibilities. The third category of users is the database administrators
(DBAs), as explained in Section 1.7, who manage the DBMS and its proper
functioning. The fourth category of users is the database designers who
design the database structure.
m. Increased concurrency: DBMSs manage concurrent database access and prevent the
problems of loss of information or loss of integrity.
n. Reduced program maintenance: The problems of high maintenance effort required in
file-oriented system, as explained in Section 1.8.2 (g), are reduced in database system. In a
file-oriented environment, the descriptions of data and the logic for accessing data are built
into individual application programs. As a result, changes to data formats and access methods
inevitably result in the need to modify application programs. In database environment, data
are more independent of the application programs.
o. Improved backup and recovery services: DBMS provides facilities for recovering from
hardware or software failures through its backup and recovery subsystem. For example, if
the computer system fails in the middle of a complex update program, the recovery
subsystem is responsible for ensuring that the database is restored to the state it was in
before the program started executing. Alternatively, the recovery subsystem ensures that the
program is resumed from the point at which it was interrupted so that its full effect is
recorded in the database.
p. Improved data quality: The database system provides a number of tools and processes to
improve data quality.
From the earliest days of computers, storing and manipulation of data have
been a major application focus. Historically, the initial computer applications
focused on clerical tasks, for example, employee payroll calculation, work
scheduling in a manufacturing industry, order entry and processing and so
on. Based on the request from the users, such applications accessed data
stored in computer files, converted stored data into information, and
generated various reports useful for the organisation. These were called file-
based systems. Decades-long evolution in computer technology, data
processing and information management has resulted in the development of
sophisticated modern database systems. Due to the needs and demands of
organisations, database technology has developed from the primitive file-
based methods of the fifties to the powerful integrated database systems of
today. The file-based system still exists in specific areas of applications. Fig.
1.22 illustrates the evolution of database system technologies in the last
decades.
During the 1960s, the US President, John F. Kennedy, initiated a project called
"Apollo Moon Landing", with the objective of landing a man on the moon
by the end of that decade. The project was expected to generate a large volume of
data, and no system available at that time, including the file-based system, was
able to handle such voluminous data. Database systems were first
introduced during this time to handle such requirements. North
American Aviation (now known as Rockwell International), which was the
prime contractor for the project, developed a software product known as the Generalized
Update Access Method (GUAM) to meet the voluminous data-processing
demands of the project. The GUAM software was based on the concept that
smaller components come together as parts of larger components, and so on,
until the final product is assembled. This structure conformed to an upside-down
tree and was named the hierarchical structure. Thereafter, database systems
have continued to evolve during subsequent decades.
In practice, the data definition and data manipulation languages are not
two separate languages. Instead, they simply form parts of a single database
language, and a comprehensive integrated language is used, such as the
widely used structured query language (SQL). SQL represents a combination
of DDL, VDL and DML, as well as statements for constraint specification
and schema evolution. It includes constructs for conceptual schema
definition, view definition, and data manipulation.
Example 1
CREATE TABLE PRODUCT
(PROD-ID CHAR (6),
PROD-DESC CHAR (20),
UNIT-COST NUMERIC (4));
Example 2
CREATE TABLE CUSTOMER
(CUST-ID CHAR (4),
CUST-NAME CHAR (20),
CUST-STREET CHAR (25),
CUST-CITY CHAR (15),
CUST-BAL NUMERIC (10));
Example 3
CREATE TABLE SALES
(CUST-ID CHAR (4),
PROD-ID CHAR (6),
PROD-QTY NUMERIC (3));
For example, let us look at the following statements of DML that are
specified to retrieve data from tables shown in Fig. 1.24.
Fig. 1.24 Retrieve data from tables using DML
Example 1
SELECT PRODUCT.PROD-DESC
FROM PRODUCT
WHERE PROD-ID = ‘B4432’;
The above query (or DML statement) specifies that those rows from the
table PRODUCT where the PROD-ID is B4432 should be retrieved and the
PROD-DESC attribute of these rows should be displayed on the screen.
Once this query is run for the table PRODUCT, as shown in Fig. 1.24 (a), the
result will be displayed on the computer screen as shown below.
B4432 Freeze
Example 2
SELECT CUSTOMER.CUST-ID,
CUSTOMER.CUST-NAME
FROM CUSTOMER
WHERE CUST-CITY = 'Mumbai';
The above query (or DML statement) specifies that those rows from the
table CUSTOMER where the CUST-CITY is Mumbai will be retrieved. The
CUST-ID and CUST-NAME attributes of these rows will be
displayed on the screen.
Once this query is run for the table CUSTOMER, as shown in Fig. 1.24 (b), the
result will be displayed on the computer screen as shown below.
A DML query may also be used for retrieving information from more than one
table, as explained in Example 3 below.
Example 3
SELECT CUSTOMER.CUST-NAME,
CUSTOMER.CUST-BAL
FROM CUSTOMER, SALES
WHERE SALES.PROD-ID = 'B23412'
AND CUSTOMER.CUST-ID = SALES.CUST-ID;
The above query (or DML statement) specifies that those rows from the
tables CUSTOMER and SALES where PROD-ID = B23412 and the CUST-
ID is the same in both tables will be retrieved, and the CUST-NAME and
CUST-BAL attributes of those rows will be displayed on the screen.
Once this query is run for tables CUSTOMER and SALES, as shown in
Fig. 1.24 (b) and (c), the result will be displayed on the computer screen as
shown below.
There are two ways of accessing (or retrieving) data from the database. In
one way, an application program issues instructions (called embedded
statements) to the DBMS to find certain data in the database and return it to
the program. This is called procedural DML. Procedural DML allows the
user to tell the system what data is needed and exactly how to retrieve the
data. Procedural DML retrieves a record, processes it and retrieves another
record based on the results obtained by this processing and so on. The
process of such retrievals continues until the data request from the retrieval
has been obtained. Procedural DML is embedded in a high-level language,
which contains constructs to facilitate iteration and handle navigational
logic.
In the second way of accessing the data, the person seeking data sits down
at a computer display terminal and issues a command in a special language
(called query) directly to the DBMS to find certain data and returns it to the
display screen. This is called non-procedural DML (or declarative
language). Non-procedural DML allows the user to state what data are
needed, rather than how they are to be retrieved.
DBMS translates a DML statement into a procedure (or set of procedures)
that manipulates the required set of records. This removes the concern of the
user to know how data structures are internally implemented, what
algorithms are required to retrieve and how to transform the data. This
provides users with a considerable degree of data independence.
Structured query language (SQL) and query by example (QBE) are the
examples of fourth-generation language.
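The contrast can be sketched within SQL itself. The first statement below is non-procedural: it states only what data is wanted. The cursor that follows illustrates the record-at-a-time, navigational style used with embedded SQL; the table, columns and host variables (:NAME, :BALANCE) are hypothetical, and the exact embedded-SQL syntax varies with the host language and product.
-- Non-procedural (declarative): state what is needed, not how to get it.
SELECT CUST_NAME, CUST_BAL
FROM CUSTOMER
WHERE CUST_CITY = 'Mumbai';
-- Procedural, record-at-a-time style: declare a cursor, then fetch one row at a time
-- under program control.
DECLARE MUMBAI_CUST CURSOR FOR
    SELECT CUST_NAME, CUST_BAL
    FROM CUSTOMER
    WHERE CUST_CITY = 'Mumbai';
OPEN MUMBAI_CUST;
FETCH MUMBAI_CUST INTO :NAME, :BALANCE;   -- :NAME and :BALANCE are host-language variables
CLOSE MUMBAI_CUST;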
All work that logically represents a single unit is called a transaction. A
sequence of database operations that represents a logical unit of work is
grouped together as a single transaction, which accesses a database and
transforms it from one state to another. A transaction can update a record,
delete a record, modify a set of records and so on. When the DBMS does a
'commit', the changes made by the transaction are made permanent. If the
changes are not to be made permanent, the transaction can be 'rolled back' and the
database will remain in its original state.
When updates are performed on a database, we need some way to
guarantee that a set of updates will succeed all at once or not at all.
Transaction ensures that all the work completes or none of it affects the
database. This is necessary in order to keep the database in a consistent state.
For example, a transaction might involve transferring money from a person's bank
savings account to a checking account. While this would typically
involve two separate database operations, first a withdrawal from the
savings account and then a deposit into the checking account, it is logically
considered one unit of work. It is not acceptable to do one operation and not
the other, because that would violate the integrity of the database.
Thus, both withdrawal and deposit must be completed (committed) or the partial
transaction must be aborted (rolled back), so that uncompleted work does
not affect the database.
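A hedged SQL sketch of this transfer is given below. The SAVINGS and CHECKING tables, account numbers and amount are hypothetical, and the exact transaction-control syntax (START TRANSACTION or BEGIN TRANSACTION) varies slightly between products.
START TRANSACTION;
UPDATE SAVINGS
SET BALANCE = BALANCE - 5000
WHERE ACCT_NO = 'S1001';          -- withdrawal from the savings account
UPDATE CHECKING
SET BALANCE = BALANCE + 5000
WHERE ACCT_NO = 'C2001';          -- deposit into the checking account
COMMIT;                           -- both changes become permanent together
-- If anything fails before COMMIT, issue ROLLBACK so that neither change takes effect.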
Consider another example of a railway reservation system in which at any
given instant, it is likely that several travel agents are looking for
information about available seats on various trains and routes and making
new reservations. When several users (travel agents) access the railway
database concurrently, the DBMS must order their requests carefully to avoid
conflicts. For example, when one travel agent looks for train no. 8314 on
some given day and finds an empty seat, another travel agent may
simultaneously be making a reservation for the same seat, thereby making
the information seen by the first agent obsolete.
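One common way a DBMS orders such conflicting requests is by locking the row being booked until the first transaction commits. The sketch below assumes a hypothetical SEAT table and uses the widely supported (though not universal) FOR UPDATE clause.
START TRANSACTION;
-- Lock the seat row so that a second agent cannot book it concurrently.
SELECT STATUS
FROM SEAT
WHERE TRAIN_NO = '8314' AND SEAT_NO = '42'
FOR UPDATE;
UPDATE SEAT
SET STATUS = 'BOOKED'
WHERE TRAIN_NO = '8314' AND SEAT_NO = '42';
COMMIT;   -- releases the lock and makes the booking visible to other agents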
Through its transaction management feature, a database management system
must protect users from the effects of system failures or crashes. The DBMS
ensures that all data and status information is restored to a consistent state when the system
is restarted after a crash or failure. For example, if the travel agent asks for a
reservation to be made and the DBMS has responded saying that the
reservation has been made, the reservation is not lost even if the system
crashes or fails. On the other hand, if the DBMS has not yet responded to the
request, but is in the process of making the necessary changes to the data
when the crash occurs, the partial changes are not reflected in the database
when the system is restored.
A transaction generally has the following four properties, called the ACID properties:
Atomicity
Consistency
Isolation
Durability
REVIEW QUESTIONS
1. What is data?
2. What do you mean by information?
3. What are the differences between data and information?
4. What is database and database system? What are the elements of database system?
5. Why do we need a database?
6. What is system catalog?
7. What is database management system? Why do we need a DBMS?
8. What is transaction?
9. What is data dictionary? Explain its function with a neat diagram.
10. What are the components of data dictionary?
11. Discuss active and passive data dictionaries.
12. What is entity and attribute? Give some examples of entities and attributes in a
manufacturing environment.
13. Name some entities and attributes with which an educational institution would be concerned.
14. Name some entities and attributes related to a personnel department and storage warehouse.
15. Why are relationships between entities important?
16. Describe the relationships among the entities you have found in Questions 13 and 14.
17. Outline the advantages of implementing database management system in an organisation.
18. What is the difference between a data definition language and a data manipulation language?
19. The data file shown in Table 1.6 is used in the data processing system of M/s ABC Motors
Ltd., which makes cars of different models.
a. Name one of the entities described in the data file. How would you describe the
entity set?
b. What are the attributes of the entities? Choose one of the entities and describe it.
c. Choose one of the attributes and discuss the nature of the set of values that it can
take.
20. What do you mean by redundancy? What is the difference between controlled and
uncontrolled redundancy? Illustrate with examples.
21. Define the following terms:
a. Data
b. Database
c. Database system
d. DBMS
e. Database catalog
f. DBA
g. Metadata
h. DA
i. End user
j. Security
k. Data Independence
l. Data Integrity
m. Files
n. Records
o. Data warehouse.
a. Data administrator
b. Database administrator
c. Application developer
d. End users.
38. Show the effects of the following SQL operation on the EMPLOYEE file of M/s KLY System
Ltd. of Table 1.7.
a. Get employee’s number, employee’s name and telephone number for all employees
of DP department.
b. Get employee’s number, employee’s name, department and telephone number for all
employees of Indian origin.
c. Add 250 in the salary of employees belonging to USA.
d. Remove all records of employees getting salary of more than 6000.
e. Add a new employee whose details are as follows: employee no.: 106520,
last name: Joseph, first name: Gorge, salary: 8200, country: AUS, birth place:
Melbourne, department: DP, and telephone no.: 334455661
40. List the DDL statements to be given to create three tables shown in Fig. 1.25.
41. Show the effects of the following DML statements on the EMPLOYEE file of M/s KLY
System Ltd., of Table 1.7.
42. A personnel department of an enterprise has the structure of an EMPLOYEE data file, as shown in
Table 1.8.
Table 1.8 EMPLOYEE data file of an enterprise
a. How many records does the file contain, and how many fields are there per record?
b. What data redundancies do you detect and how could these redundancies lead to
anomalies?
c. If you wanted to produce a listing of the database file contents by the last name,
city’s name, country’s name and telephone number, how would you alter the file
structure?
d. What problem would you encounter if you wanted to produce a listing by city? How
would you solve this problem by altering the file structure?
a. Technical university
b. Public library
c. General hospital
d. Departmental store
e. Fastfood restaurant
f. Software marketing company.
For each such entity set, list the attributes that could be used to model each of the entities.
What are some of the applications that may be automated for the above enterprise using a
DBMS?
44. Datasoft Inc. is an enterprise involved in the design, development, testing and marketing of
software for the auto industry (two-wheelers). What entities are of interest to such an enterprise?
Give a list of these entities and the relationships among them.
45. Some of the entities relevant to a technical university are given below.
For each of them, indicate the type of relationship existing among them (for example, one-to-
one, one-to-many or many-to-many). Draw a relationship diagram for each of them.
STATE TRUE/FALSE
a. data
b. communication
c. knowledge
d. all of these.
2. Data is:
a. a piece of fact
b. metadata
c. information
d. none of these.
3. Which of the following is element of the database?
a. data
b. constraints and schema
c. relationships
d. all of these.
a. data
b. constraints
c. relationships
d. schema.
a. security enforcement
b. avoidance of redundancy
c. reduced inconsistency
d. all of these.
a. independent
b. secure
c. shared
d. all of these.
7. The name of the system database that contains descriptions of data in the database is:
a. data dictionary
b. metadata
c. table
d. none of these.
a. operational
b. EDW
c. data mart
d. all of these.
a. database objects
b. data dictionary information
c. user access information
d. all of these.
10. A file is a collection of:
a. related records
b. related fields
c. related data items
d. none of these.
a. one-to-one relationship
b. one-to-many relationships
c. many-to-many relationships
d. all of these.
a. data inconsistency
b. duplication of data
c. data dependence
d. all of these.
a. increased productivity
b. improved security
c. economy of scale
d. all of these.
a. network model
b. hierarchical model
c. relational model
d. all of these.
a. Bachman
b. Codd
c. James Gray
d. None of them.
17. DSDL is used to specify:
a. internal schema
b. external schema
c. conceptual schema
d. none of these.
a. internal schema
b. external schema
c. conceptual schema
d. none of these.
a. query languages
b. report generators
c. spreadsheets
d. all of these.
2.1 INTRODUCTION
Fig. 2.3 shows the schema diagram and the relationships for another
example of the purchasing system of M/s KLY System. The purchasing system
schema has five records (or objects), namely PURCHASE-ORDER,
SUPPLIER, PURCHASE-ITEM, QUOTATION and PART. Solid arrows
connecting different blocks show the relationships among the objects. For
example, the PURCHASE-ORDER record is connected to the PURCHASE-
ITEM records of which that purchase order is composed and the SUPPLIER
record to the QUOTATION records showing the parts that supplier can
provide and so forth. The dotted arrows show the cross-references between
attributes (or data items) of different objects or records.
As can be seen in Fig. 2.3 (c), the duplication of attributes is avoided by
using relationships and cross-referencing. For example, the attributes SUP-
NAME, SUP-ADD and SUP-DETAILS are included in the separate SUPPLIER
record and not in the PURCHASE-ORDER record. Similarly, attributes such
as PART-NAME, PART-DETAILS and QTY-ON-HAND are included in the
separate PART record and not in the PURCHASE-ITEM record. Thus, the
duplication of including PART-DETAILS and SUPPLIER details in every
PURCHASE-ITEM is avoided. With the help of relationships and cross-
referencing, the records are linked appropriately with each other to complete
the information and data is located quickly.
Fig. 2.3 Schema diagram for database of M/s KLY System
2.2.2 Subschema
A subschema is a subset of the schema and inherits the same property that a
schema has. The plan (or scheme) for a view is often called subschema.
Subschema refers to an application programmer’s (user’s) view of the data
item types and record types which he or she uses. It gives the user a
window through which he or she can view only that part of the database
which is of interest to him or her. In other words, a subschema defines the portion of
the database as "seen" by the application programs that actually produce
the desired information from the data contained within the database.
Therefore, different application programs can have different view of data.
Fig. 2.4 shows subschemas viewed by two different application programs
derived from the example of Fig. 2.3.
As shown in Fig. 2.4, the SUPPLIER-MASTER record of the first application
program {Fig. 2.4 (a)} now contains additional attributes such as SUP-
NAME and SUP-ADD from the SUPPLIER record of Fig. 2.3, and the
PURCHASE-ORDER-DETAILS record contains additional attributes such
as PART-NAME, SUP-NAME and PRICE from the two records PART and
SUPPLIER respectively. Similarly, the ORDER-DETAILS record of the second
application program {Fig. 2.4 (b)} contains additional attributes such as
SUP-NAME and QTY-ORDRD from the two records SUPPLIER and
PURCHASE-ITEM respectively.
Individual application programs can change their respective subschemas
without affecting the subschema views of others. The DBMS software derives
the subschema data requested by application programs from the schema data.
The database administrator (DBA) ensures that the subschema requested by
an application program is derivable from the schema.
Fig. 2.4 Subschema views of two applications programs
2.2.3 Instances
When the schema framework is filled in with data item values, the contents
of the database at any point of time (the current contents) are referred to as
an instance of the database. An instance is also called a state of the
database, or a snapshot. Each variable has a particular value at a given instant.
The values of the variables in a program at a point in time correspond to an
instance of a database schema, as shown in Fig. 2.5.
The difference between a database schema and a database state (or instance) is
quite distinct. A database schema is specified to the DBMS when a
new database is defined; at this point of time, the corresponding
database state is empty, with no data in the database. Once the database is
first populated with the initial data, we get another database
state whenever an update operation is applied to the database. At any point
of time, the current state of the database is called its instance.
Fig. 2.5 Instance of the database of M/s ABC Company
In 1971, the Database Task Group (DBTG), appointed by the
Conference on Data Systems Languages (CODASYL), produced its first
proposal for a general architecture for database systems. The DBTG proposed
a two-tier architecture, as shown in Fig. 2.1 (a), with a system view called the
schema and user views called subschemas. In 1975, ANSI-SPARC
(American National Standards Institute/Standards Planning and
Requirements Committee) produced a three-tier architecture with a system
catalog. The architecture of most commercial DBMSs available today is
based to some extent on the ANSI-SPARC proposal.
The ANSI-SPARC three-tier database architecture is shown in Fig. 2.6. It
consists of the following three levels:
Internal level,
Conceptual level,
External level.
The conceptual level supports each external view, in that any data
available to a user must be contained in, or derived from, the conceptual
level. However, this level must not contain any storage-dependent details.
For example, the description of an entity should contain only data types of
attributes (for example, integer, real, character and so on) and their length
(such as the maximum number of digits or characters), but not any storage
consideration, such as the number of bytes occupied. The choice of relations
and the choice of fields (or data items) for each relation are not always obvious.
The process of arriving at a good conceptual schema is called conceptual
database design. The conceptual schema is written using conceptual data
definition language (conceptual DDL).
2.5 MAPPINGS
The three schemas and their levels discussed in Section 2.3 are the
description of data that actually exists in the physical database. In the three-
schema architecture database system, each user group refers only to its own
external schema. Hence, the user’s request specified at external schema level
must be transformed into a request at conceptual schema level. The
transformed request at conceptual schema level should be further
transformed at internal schema level for final processing of data in the stored
database as per user’s request. The final result from processed data as per
user’s request must be reformatted to satisfy the user’s external view. The
process of transforming requests and results between the three levels are
called mappings. The database management system (DBMS) is responsible
for this mapping between internal, conceptual and external schemas. The
three-tier architecture of ANSI-SPARC model provides the following two-
stage mappings as shown in Fig. 2.9:
Conceptual/Internal mapping
External/Conceptual mapping
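The external/conceptual mapping is essentially what a view definition captures in SQL. The sketch below is a minimal illustration, assuming the conceptual schema contains an EMPLOYEE table with the attributes shown; the names are hypothetical.
-- External schema for the payroll user group, derived (mapped) from the conceptual-level table.
CREATE VIEW PAYROLL_VIEW AS
    SELECT EMP_NO, EMP_NAME, EMP_SALARY
    FROM EMPLOYEE;
-- Users of this external view query it as if it were a table; the DBMS performs
-- the external/conceptual mapping whenever the view is referenced.
SELECT EMP_NO, EMP_SALARY
FROM PAYROLL_VIEW;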
i. Users issue a query using particular database language, for example, SQL commands.
ii. The passed query is presented to a query optimiser, which uses information about how the
data is stored to produce an efficient execution plan for evaluating the query.
iii. The DBMS accepts the user's SQL commands and analyses them.
iv. The DBMS produces query evaluation plans using the external schema for the user, the
corresponding external/conceptual mapping, the conceptual schema, the conceptual/internal
mapping, and the storage structure definition. Thus, an evaluation plan is a blueprint for
evaluating a query.
v. The DBMS executes these plans against the physical database and returns the answers to the
users.
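Many DBMSs (MySQL and PostgreSQL, for example) let the user inspect the chosen evaluation plan through an EXPLAIN command. The statement below is only an illustration; the keyword and the format of the output are product-specific.
-- Ask the query optimiser to display its evaluation plan instead of executing the query.
EXPLAIN SELECT CUST_NAME
FROM CUSTOMER
WHERE CUST_CITY = 'Mumbai';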
Authorization control: The authorization control module checks that the user has
necessary authorization to carry out the required operation.
Command processor: The command processor processes the queries passed by
authorization control module.
Integrity checker: The integrity checker checks the necessary integrity
constraints for all requested operations that change the database.
Query optimizer: The query optimizer determines an optimal strategy for the
query execution. It uses information on how the data is stored to produce an
efficient execution plan for evaluating query.
Transaction manager: The transaction manager performs the required processing
of the operations it receives from transactions. It ensures that transactions request
and release locks according to a suitable locking protocol and schedules the
execution of transactions.
Scheduler: The scheduler is responsible for ensuring that concurrent operations
on the database proceed without conflicting with one another. It controls the relative
order in which transaction operations are executed.
Data manager: The data manager is responsible for the actual handling of data in
the database. This module has the following two components:
Recovery manager: The recovery manager ensures that the database remains in a consistent
state in the presence of failures. It is responsible for (a) transaction commit and abort operations,
(b) maintaining a log, and (c) restoring the system to a consistent state after a crash.
Buffer manager: The buffer manager is responsible for the transfer of data between the main
memory and secondary storage (such as disk or tape). It brings in pages from the disk to the main
memory as needed in response to user read requests. The buffer manager is sometimes referred to as the cache manager (a minimal sketch of such a page cache follows this list).
iii. DML processor: Using a DML compiler, the DML processor converts the DML
statements embedded in an application program into standard function calls in the host
language. The DML compiler converts the DML statements written in a host programming
language into object code for database access. The DML processor must interact with the
query processor to generate the appropriate code.
iv. DDL processor: Using a DDL compiler, the DDL processor converts the DDL statements
into a set of tables containing metadata. These tables contain the metadata concerning the
database and are in a form that can be used by other components of the DBMS. These tables
are then stored in the system catalog while control information is stored in data file headers.
The DDL compiler processes schema definitions, specified in the DDL and stores description
of the schema (metadata) in the DBMS system catalog. The system catalog includes
information such as the names of data files, data items, storage details of each data file,
mapping information amongst schemas, and constraints.
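As a small illustration of the buffer manager described above, the following Python sketch keeps a tiny buffer pool and evicts the least recently used page when the pool is full. The page contents, pool size and eviction policy are assumptions for the sketch only; real buffer managers use a variety of replacement policies.

from collections import OrderedDict

DISK = {page_no: "contents of page %d" % page_no for page_no in range(100)}
POOL_SIZE = 3

class BufferManager:
    def __init__(self):
        self.pool = OrderedDict()           # page_no -> page data, in LRU order

    def get_page(self, page_no):
        if page_no in self.pool:            # page already in the buffer pool
            self.pool.move_to_end(page_no)
        else:                               # bring the page in from disk
            if len(self.pool) >= POOL_SIZE:
                self.pool.popitem(last=False)   # evict the least recently used page
            self.pool[page_no] = DISK[page_no]
        return self.pool[page_no]

bm = BufferManager()
for p in (1, 2, 3, 1, 4):                   # page 2 is evicted on the last request
    bm.get_page(p)
print(list(bm.pool))                        # [3, 1, 4]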
i. Data Storage Management: The DBMS creates the complex structures required for data
storage in the physical database. It provides a mechanism for management of permanent
storage of the data. The internal schema defines how the data should be stored by the storage
management mechanism and the storage manager interfaces with the operating system to
access the physical storage. This relieves the users from the difficult task of defining and
programming the physical data characteristics. The DBMS provides not only for the data, but
also for related data entry forms or screen definitions, report definitions, data validation rules,
procedural code, structure to handle video and picture formats, and so on.
ii. Data Manipulation Management: A DBMS furnishes users with the ability to retrieve,
update and delete existing data in the database or to add new data to the database. It includes
a DML processor component (as shown in Fig. 2.10) to deal with the data manipulation
language (DML).
iii. Data Definition Services: The DBMS accepts the data definitions such as external
schema, the conceptual schema, the internal schema, and all the associated mappings in
source form. It converts them to the appropriate object form using a DDL processor
component (as shown in Fig. 2.10) for each of the various data definition languages (DDLs).
iv. Data Dictionary/System Catalog Management: The DBMS provides a data dictionary or
system catalog function in which descriptions of data items are stored and which is accessible
to users. As explained in Chapter 1, Section 1.2.6 and 1.3, a system catalog or data dictionary
is a system database, which is a repository of information describing the data in the database.
It is the data about the data or metadata. All of the various schemas and mappings and all of
the various security and integrity constraints, in both source and object forms, are stored in
the data dictionary. The system catalog is automatically created by the DBMS and consulted frequently to resolve user requests. For example, the DBMS will consult the system catalog to verify that a requested table exists and that the user issuing the request has the necessary access privileges (a sketch of such a catalog check follows this list).
v. Database Communication Interfaces: The end-user’s requests for database access (may
be from remote location through internet or computer workstations) are transmitted to DBMS
in the form of communication messages. The DBMS provides special communication
routines designed to allow the database to accept end-user requests within a computer
network environment. The response to the end user is transmitted back from DBMS in the
form of such communication messages. The DBMS integrates with a communication
software component called data communication manager (DCM), which controls such
message transmission activities. Although the DCM is not a part of the DBMS, the two work in harmony: the DBMS looks after the database and the DCM handles all messages to and from the DBMS.
vi. Authorisation / Security Management: The DBMS protects the database against
unauthorized access, either intentional or accidental. It furnishes mechanism to ensure that
only authorized users can access the database. It creates a security system that enforces user
security and data privacy within the database. Security rules determine which users can
access the database, which data items each user may access and which data operations (read,
add, delete and modify) the user may perform. This is especially important in multi-user
environment where many users can access the database simultaneously. The DBMS monitors
user requests and rejects any attempts to violate the security rules defined by the DBA. It
monitors and controls the level of access for each user and the operations that each user can
perform on the data depending on the access privileges or access rights of the users.
There are many ways for a DBMS to identify legitimate users. The most common method is
to establish accounts with passwords. Some DBMSs use data encryption mechanisms to
ensure the information written to disk cannot be read or changed unless the user provides the
encryption key that unscrambles the data. Some DBMSs also provide users with the ability to
instruct the DBMS, via user exits, to employ custom-written routines to encode the data. In
some cases, organisations may be interested in conducting security audits, particularly if they
suspect the database may have been tampered with. Some DBMSs provide audit trails, which are traces or logs that record various kinds of database access activities (for example, unsuccessful access attempts). Security management is discussed in further detail in Chapter 14.
vii. Backup and Recovery Management: The DBMS provides mechanisms for backing up
data periodically and recovering from different types of failures. This prevents the loss of
data. It ensures that the aborted or failed transactions do not create any adverse effect on the
database or other transactions. The recovery mechanisms of DBMSs make sure that the
database is returned to a consistent state after a transaction fails or aborts due to a system
crash, media failure, hardware or software errors, power failure, and so on. Many DBMSs
enable users to make full or partial backups of their data. A full backup saves all the data in
the target resource, such as the entire file or an entire database. These are useful after a large
quantity of work has been completed, such as loading data into a newly created database.
Partial, or incremental, backups usually record only the data that has been changed since the
last full backup. These are less time-consuming than full backups and are useful for capturing
periodic changes. Some DBMSs support online backups, enabling a database to be backed up
while it is open and in use. This is important for applications that require support for
continuous operations and cannot afford having a database inaccessible. Recovery
management is discussed in further detail in Chapter 13.
viii. Concurrency Control Services: Since DBMSs support sharing of data among multiple
users, they must provide a mechanism for managing concurrent access to the database.
DBMSs ensure that the database is kept in a consistent state and that the integrity of the data is preserved. They ensure that the database is updated correctly when multiple users are updating
the database concurrently. Concurrency control is discussed in further detail in Chapter 12.
ix. Transaction Management: A transaction is a series of database operations, carried out by
a single user or application program, which accesses or changes the contents of the database.
Therefore, a DBMS must provide a mechanism to ensure either that all the updates
corresponding to a given transaction are made or that none of them is made. A detailed
discussion on transaction management has been given in Chapter 1, Section 1.11. A further
detail on transaction processing is given in Chapter 12.
x. Integrity Services: As discussed in Chapter 1, Section 1.5 (f), database integrity refers to
the correctness and consistency of stored data and is especially important in transaction-
oriented database system. Therefore, a DBMS must provide means to ensure that both the
data in the database and changes to the data follow certain rules. This minimises data
redundancy and maximises data consistency. The data relationships stored in the data
dictionary are used to enforce data integrity. Various types of integrity mechanisms and
constraints may be supported to help ensure that the data values within a database are valid,
that the operations performed on those values are valid and that the database remains in a
consistent state.
xi. Data Independence Services: As discussed in Chapter 1, Section 1.8.5 (b) and Section
2.4, a DBMS must support the independence of programs from the actual structure of the
database.
xii. Utility Services: The DBMS provides a set of utility services used by the DBA and the
database designer to create, implement, monitor and maintain the database. These utility
services help the DBA to administer the database effectively.
xiii. Database Access and Application Programming Interfaces: All DBMSs provide interfaces to enable applications to use DBMS services. They provide data access via
structured query language (SQL). The DBMS query language contains two components: (a) a
data definition language (DDL) and (b) a data manipulation language (DML). As discussed
in Chapter 1, Section 1.10, the DDL defines the structure in which the data are stored and the
DML allows end users to extract the data from the database. The DBMS also provides data
access to application programmers via procedural (3GL) languages such as C, PASCAL,
COBOL, Visual BASIC and others.
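To make the catalog consultation mentioned in item (iv) concrete, the sketch below keeps a toy system catalog and checks a request against it before running it, in the spirit of items (iv) and (vi). The catalog contents, user names and privileges are invented for illustration only.

catalog = {
    "tables": {"EMPLOYEE": {"file": "emp.dat"}, "DEPT": {"file": "dept.dat"}},
    "privileges": {("alice", "EMPLOYEE"): {"read", "update"},
                   ("bob", "EMPLOYEE"): {"read"}},
}

def check_request(user, table, operation):
    # consult the catalog: does the table exist, and does the user hold the privilege?
    if table not in catalog["tables"]:
        return "rejected: table %s does not exist" % table
    if operation not in catalog["privileges"].get((user, table), set()):
        return "rejected: %s lacks %s privilege on %s" % (user, operation, table)
    return "accepted: %s may %s %s" % (user, operation, table)

print(check_request("bob", "EMPLOYEE", "update"))    # rejected: no such privilege
print(check_request("alice", "EMPLOYEE", "update"))  # accepted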
Data models can be broadly classified into the following three categories:
Record-based data models
Object-based data models
Physical data models
Most commercial DBMSs support a single data model but the data models
supported by different DBMSs differ.
The entity-relationship (E-R) data model is one of the main techniques for
a database design and widely used in practice. The object-oriented data
models extend the definition of an entity to include not only the attributes
that describe the state of the object but also the actions that are associated
with the object, that is, its behaviour.
A hierarchical path that traces the parent segments to the child segments,
beginning from the left, defines the tree shown in Fig. 2.12. For example, the
hierarchical path for segment ‘E’ can be traced as ABDE, tracing all
segments from the root starting at the leftmost segment. This left-traced path
is known as preorder traversal or the hierarchical sequence. As can be noted from Fig. 2.12, each parent can have many children but each child has only one parent.
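The hierarchical sequence can be generated by a simple preorder traversal. The tree below is only an assumed stand-in for Fig. 2.12 (root A with children B and C, B having child D, and D having child E); the actual figure may differ, but the traversal logic is the same.

tree = {"A": ["B", "C"],        # parent -> children, listed left to right
        "B": ["D"],
        "D": ["E"],
        "C": [],
        "E": []}

def preorder(node, visited=None):
    if visited is None:
        visited = []
    visited.append(node)                    # visit the parent first ...
    for child in tree[node]:                # ... then each child, left to right
        preorder(child, visited)
    return visited

print("".join(preorder("A")))   # ABDEC; the path traced up to segment E is A, B, D, E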
Fig. 2.13 (a) shows a hierarchical data model of a UNIVERSITY tree type
consisting of three levels and three record types such as DEPARTMENT,
FACULTY and COURSE. This tree contains information about university
academic departments along with data on all faculties for each department
and all courses taught by each faculty within a department. Fig. 2.13 (b)
shows the defined fields or data types for department, faculty, and course
record types. A single department record at the root level represents one
instance of the department record type. Multiple instances of a given record
type are used at lower levels to show that a department may employ many
(or no) faculties and that each faculty may teach many (or no) courses. For
example, we have a COMPUTER department at the root level and as many instances of the FACULTY record type as there are faculties in the computer department. Similarly, there will be as many COURSE record instances for
each FACULTY record as that faculty teaches. Thus, there is a one-to-many
(1:m) association among record instances, moving from the root to the
lowest level of the tree. Since there are many departments in the university,
there are many instances of the DEPARTMENT record type, each with its
own FACULTY and COURSE record instances connected to it by
appropriate branches of the tree. This database then consists of a forest of
such tree instances; as many instances of the tree type as there are
departments in the university at any given time. Collectively, these comprise
a single hierarchic database and multiple databases will be online at a time.
Fig. 2.13 Hierarchical data model relationship of university tree type
Fig. 2.15 shows a diagram of network data model. It can be seen in the
diagram that member ‘B’ has only one owner ‘A’ whereas member ‘E’ has
two owners namely ‘B’ and ‘C’. Fig. 2.16 illustrates an example of
implementing network data model for a typical sales organisation in which
CUSTOMER, SALES_REPRESENTATIVE, INVOICE, INVOICE_LINE,
PRODUCT and PAYMENT represent record types. It can be seen in Fig.
2.16 that INVOICE_LINE is owned by both PRODUCT and INVOICE.
Similarly, INVOICE has two owners namely SALES_REPRESENTATIVE
and CUSTOMER. In network data model, each link between two record
types represents a one-to-many (1:m) relationship between them.
Unlike the hierarchical data model, the network data model supports multiple paths to the same record, thus avoiding the data redundancy problem associated with the hierarchical system.
Fig. 2.19 (a) illustrates a typical E-R diagram for a product sales
organisation called M/s ABC & Co. This organisation manufactures various
products, which are sold to the customers against an order. Fig. 2.19 (b)
shows data items and records of entities. According to the E-R diagram of
Fig. 2.19 (a), a customer having identification no. 1001, name Waterhouse
Ltd. with address Box 41, Mumbai [as shown in Fig. 2.19 (b)], is an entity
since it uniquely identifies one particular customer. Similarly, a product
A1234 with a description Steel almirah and unit cost of 4000 is an entity
since it uniquely identifies one particular product and so on.
Now the set of all products [all the records in the PRODUCT table of Fig. 2.19 (b)] of M/s ABC & Co. is defined as the entity set PRODUCT.
Similarly, the entity set CUSTOMER represents the set of all the customers
of M/s ABC & Co. and so on. An entity set is represented by set of attributes
(called data items or fields). Each rectangular box represents an entity for
example, PRODUCT, CUSTOMER and ORDER. Each ellipse (or oval
shape) represents attributes (or data items or fields). For example, attributes
of entity PRODUCT are PROD-ID, PROD-DESC and UNIT-COST.
CUSTOMER entity contains attributes such as CUST-ID, CUST-NAME and
CUST-ADDRESS. Similarly, entity ORDER contains attributes such as
ORD-DATE, PROD-ID and PROD-QTY. There is a set of permitted values
for each attribute, called the domain of that attribute, as shown in Fig. 2.19
(b).
Fig. 2.19 E-R diagram for M/s ABC & Co
The E-R diagram has become a widely accepted data model. It is used for designing relational databases. Further details on the E-R data model are given in Chapter 6.
Single-user DBMS.
Multi-user DBMS.
Centralised DBMS.
Parallel DBMS.
Distributed DBMS.
Client/server DBMS.
REVIEW QUESTIONS
1. Describe the three-tier ANSI-SPARC architecture. Why do we need mappings between
different schema levels? How do different schema definition languages support this
architecture?
2. Discuss the advantages and characteristics of the three-tier architecture.
3. Discuss the concept of data independence and explain its importance in a database
environment.
4. What is logical data independence and why is it important?
5. What is the difference between physical data independence and logical data independence?
6. How does the ANSI-SPARC three-tier architecture address the issue of data independence?
7. Explain the difference between external, conceptual and internal schemas. How are these
different schema layers related to the concepts of physical and logical data independence?
8. Describe the structure of a DBMS.
9. Describe the main components of a DBMS.
10. With a neat sketch, explain the structure of DBMS.
11. What is a transaction?
12. How does the hierarchical data model address the problem of data redundancy?
13. What do you mean by a data model? Describe the different types of data models used.
14. Explain the following with their advantages and disadvantages:
a. Data independence
b. Query processor
c. DDL processor
d. DML processor.
e. Run time database manager.
16. How does the hierarchical data model address the problem of data redundancy?
17. What do each of the following acronyms represent and how is each related to the birth of the
network database model?
a. SPARC
b. ANSI
c. DBTG
d. CODASYL.
18. Describe the basic features of the relational data model. Discuss their advantages,
disadvantages and importance to the end-user and the designer.
19. A university has an entity COURSE with a large number of courses in its catalog. The
attributes of COURSE include COURSE-NO, COURSE-NAME and COURSE-UNITS.
Each course may have one or more different courses as prerequisites or may have no
prerequisites. Similarly, a particular course may be a prerequisite for any number of courses,
or may not be a prerequisite for any other course. Draw an E-R diagram for this situation.
20. A company called M/s ABC Consultants Ltd. has an entity EMPLOYEE with a number of
employees having attributes such as EMP-ID, EMP-NAME, EMP-ADD and EMP-BDATE.
The company has another entity PROJECT that has several projects having attributes such as
PROJ-ID, PROJ-NAME and START-DATE. Each employee may be assigned to one or more
projects, or may not be assigned to a project. A project must have at least one employee
assigned and may have any number of employees assigned. An employee’s billing rate may
vary by project, and the company wishes to record the applicable billing rate (BILL-RATE)
for each employee when assigned to a particular project. By making additional assumptions,
if so required, draw an E-R diagram for the above situation.
21. An entity type STUDENT has the attributes such as name, address, phone, activity, number
of years and age. Activity represents some campus-based student activity, while number of
years represents the number of years the student has engaged in these activities. A given
student may engage in more than one activity. Draw an E-R diagram for this situation.
22. Draw an E-R diagram for an enterprise or an organisation you are familiar with.
23. What is meant by the term client/server architecture and what are the advantages and
disadvantages of this approach?
24. Compare and contrast the features of hierarchical, network and relational data models. What
business needs led to the development of each of them?
25. Differentiate between schema, subschema and instances.
26. Discuss the various execution steps that are followed while executing a user's request to access the database system.
27. With a neat sketch, describe the various components of database management systems.
28. With a neat sketch, describe the various functions and services of database management
systems.
29. Describe in detail the different types of DBMSs.
30. Explain with a neat sketch, advantages and disadvantages of a centralised DBMS.
31. Explain with a neat sketch, advantages and disadvantages of a parallel DBMS.
32. Explain with a neat sketch, advantages and disadvantages of a distributed DBMS.
STATE TRUE/FALSE
1. In a database management system, data files are the files that store the database information.
2. The external schema defines how and where data are organised in physical data storage.
3. In a network database terminology, a relationship is a set.
4. A feature of relational database is that a single database can be spread across several tables.
5. An SQL is a fourth generation language.
6. An object-oriented DBMS is suited for multimedia applications as well as data with complex
relationships.
7. An OODBMS allows for fully integrated databases that hold data, text, voice, pictures and
video.
8. The hierarchical model assumes that a tree structure is the most frequently occurring
relationship.
9. The hierarchical database model is the oldest data model.
10. The data in a database cannot be shared.
11. The primary difference between the different data models lies in the methods of expressing
relationships and constraints among the data elements.
12. In a database, the data are stored in such a fashion that they are independent of the programs
of users using the data.
13. The plan (or formulation of scheme) of the database is known as schema.
14. The physical schema is concerned with exploiting the data structures offered by a DBMS in
order to make the scheme understandable to the computer.
15. The logical schema deals with the manner in which the conceptual database is represented in the computer as a stored database.
16. Subschemas act as a unit for enforcing controlled access to the database.
17. The process of transforming requests and results between three levels are called mappings.
18. The conceptual/ internal mapping defines the correspondence between the conceptual view
and the stored database.
19. The external/conceptual mapping defines the correspondence between a particular external
view and the conceptual view.
20. A data model is an abstraction process that concentrates on the essential and inherent aspects of the organisation’s applications while ignoring superfluous or accidental details.
21. Object-oriented data model is a logical data model that captures the semantics of objects
supported in object-oriented programming.
22. Centralised database system is physically confined to a single location.
23. Parallel database systems architecture consists of one central processing unit (CPU) and data
storage disks in parallel.
24. Distributed database systems are similar to client/server architecture.
a. data
b. constraints and schema
c. relationships
d. all of these.
2. What separates the physical aspects of data storage from the logical aspects of data
representation?
a. data
b. schema
c. constraints
d. relationships.
3. What schema defines how and where the data are organised in a physical data storage?
a. external
b. internal
c. conceptual
d. none of these.
4. Which of the following schemas defines the stored data structures in terms of the database
model used?
a. external
b. conceptual
c. internal
d. none of these.
5. Which of the following schemas defines a view or views of the database for particular users?
a. external
b. conceptual
c. internal
d. none of these.
a. Database
b. RDBMS
c. DBMS
d. none of these.
a. shared
b. secure
c. independent
d. all of these.
8. Which of the following is the database management activity of coordinating the actions of
database manipulation processes that operate concurrently, access shared data and can
potentially interfere with each other?
a. concurrency management
b. database management
c. transaction management
d. information management.
11. Immunity of the conceptual (or external) schemas to changes in the internal schema is
referred to as:
13. Immunity of the external schemas (or application programs) to changes in the conceptual
schema is referred to as:
a. SPARC
b. E.F. Codd
c. ANSI
d. Chen.
17. The E-R data model was first introduced by:
a. SPARC
b. E.F. Codd
c. ANSI
d. Chen.
3.1 INTRODUCTION
Thus, large volumes of data and programs are stored in physical storage
devices, called secondary, auxiliary or external storage devices. The database
management system (DBMS) software then retrieves, updates and processes
this data as needed. When data are stored physically on secondary storage
devices, the organisation of data determines the way data can be accessed.
The organisation of data is influenced by a number of factors such as:
Maximizing the amount of data that can be stored efficiently in a particular storage device by
suitable structuring and blocking of data or records.
Time (also called response time) required for accessing a record, writing a record, modifying
a record and transferring a record to the main memory. This affects the types of applications
that can use the data and the time and cost required to do so.
Minimizing or zero data redundancy.
Characteristics of secondary storage devices.
Expandability of data.
Recovery of vital data in case of system failure or data loss.
Data independence.
Complexity and cost.
The secondary (also called on-line) storage can be further categorised as:
Magnetic disk.
Advantages:
High-speed storage and much faster than main memory.
Disadvantages:
Small storage device.
Expensive as compared to main memory.
Volatile memory.
Advantages:
High-speed random access memory.
Its operation is very fast.
Disadvantages:
Usually small in size but bigger than cache memory.
Very costly.
Volatile memory.
Disadvantages:
Usually small in size.
It is costly as compared to secondary storage.
Hard disks.
Removable-pack disks.
Winchester disks.
b. Exchangeable or flexible disks
Floppy disks.
Zip disks.
Jaz disks.
Super disks.
Access time is the time from when a read or write request is issued to
when the data transfer begins. The read/write arm first moves to get
positioned over the correct track to access data on a given sector of a disk. It
then waits for the sector to appear under it as the disk rotates. The time
required to move the read/write heads from their current position to a new
cylinder address is termed as seek time (or access motion time). Seek time
increases with the distance that the arm moves and it ranges typically from 2
to 30 milliseconds depending on how far the track is from the initial arm
position. The time for a seek is the most significant delay when accessing
data on a disk, just as it is when accessing data on a movable-head assembly.
Therefore, it is always desirable to minimise the total seek time. Once seek
has started, the read/write head waits for the sector to be accessed to appear
under it. This waiting time, due to rotational delay is termed as rotational
latency time. There is a third timing factor, called head activation time, which is required to electronically activate the read/write head over the disk surface where data transfer is to take place. Head activation time is regarded as negligible compared to the other performance factors. Therefore, access time depends on both the seek time and the rotational latency time.
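A quick back-of-the-envelope calculation shows how the two dominant components combine. The seek time and rotational speed below are assumed figures chosen from the ranges mentioned above, not measurements.

seek_time_ms = 8.0                            # an assumed average seek time
rpm = 7200                                    # an assumed rotational speed
rotational_latency_ms = 0.5 * (60_000 / rpm)  # on average, half a revolution
head_activation_ms = 0.0                      # negligible, as noted above

access_time_ms = seek_time_ms + rotational_latency_ms + head_activation_ms
print("average access time ~ %.1f ms" % access_time_ms)    # about 12.2 ms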
Disk transfer rate is the rate at which data can be transferred between the disk and main memory. In other words, it is the rate at which data can be retrieved from or stored to the disk. Data transfer rate is a function of the rotational speed and the density of the recorded data. Ideally, current magnetic disks have data transfer rates of about 25 to 40 megabytes per second; however, actual data transfer rates are significantly less (of the order of 4 to 8 megabytes per second).
Mean time to failure (MTTF) is the measure of reliability of the disk.
MTTF of a disk is the amount of time that the disk is, on average, expected to run continuously without any failure. Theoretically, the MTTF of presently available disks ranges from 30,000 to 1,200,000 hours (about 3.4 to 136 years). But in practice the MTTF is computed from the probability of failure when the disk is new, and an MTTF of 1,200,000 hours does not mean that a disk can be expected to function for 136 years. Most disks have an expected life span of about 5 years and have high rates of failure with increased years of use.
The most popular form of optical storages is the compact disk (CD) and
digital video disk (DVD). CD can store more than 1 gigabyte of data and
DVD can store more than 20 gigabytes of data on both sides of the disk.
Like audio CDs, CD-ROMs come with data already encoded onto them. The
data is permanent and can be read any number of times but cannot be
modified. A CD-ROM drive is required to read data from a CD-ROM.
There are record-once versions of compact disks called CD-recordable (CD-
R) and DVD-Recordable (DVD-R), which can be written only once. Such
disks are also called write-once, read-many (WORM) disks.
Multiple-write versions of compact disks called CD-ReWritable (CD-RW)
and digital video disks called DVD-ReWritable (DVD-RW) and DVD-RAM
are also available which can be written multiple times. Recordable compact
disks are magnetic-optical storage devices that use optical means to read
magnetically encoded data. Such optical disks are useful for archival storage
of data as well as distribution of data.
Since the head assembly is heavier, DVD and CD drives have much
longer seek time (typically 100 milliseconds) as compared to magnetic-disk
drives. Rotational speeds of DVD and CD drives are lower than that of
magnetic disk drives. Faster DVD and CD drives have rotational speed of
about 3000 rotations per minute, which is comparable to speed of lower-end
magnetic-disk drives. Data transfer rates of DVD and CD drives are less
than that of magnetic disk drives. The data transfer rate of CD drive is
typically 3 to 6 megabytes per second and that of DVD drive is 8 to 15
megabytes per second. The transfer rate of optical drives is characterised as
n×, which means the drive supports transfer at n-times the standard rate. The
commonly available transfer rate of CDs is 50× and that of DVDs is 12×. Due to their high storage capacity, longer lifetime than magnetic disks and removability, CD-R/CD-RW and DVD-R/DVD-RW are popular for archival storage of data.
Current magnetic tapes are available with high storage capacity. Digital
audio tape (DAT) cartridges are available with storage capacities of a few gigabytes, whereas digital linear tape (DLT) is available with a storage
capacity of more than 40 gigabytes. The storage capacity of Ultrium tape
format is more than 100 gigabytes and that of Ampex helical scan tapes is in
the range of 330 gigabytes. Data transfer rates of these tapes are of the order
of a few megabytes per second to tens of megabytes per second.
Advantages:
Improved overall reliability.
Disadvantages:
Expensive.
Redundant data.
The RAID level 5 configuration has the best redundancy performance for
small and large read and large write requests. Small writes require a read-
modify-write cycle and are thus less efficient than RAID level 1 system. The
effective space utilisation of RAID level 5 system is 80 per cent, the same as
in RAID 3 and 4 systems.
The first operation is to insert records in the first available slots (or empty
spaces). Now whenever a record is deleted, the empty slot created by
deletion of record must be filled with some other record of the file. This can
be achieved using a number of alternatives. The first alternative is that the record that came after the deleted record can be moved into the empty space formerly occupied by the deleted record. This operation will continue until
every record following the deleted record has been moved ahead. Fig. 3.9 (a)
shows an empty slot created by deletion of record 5, whereas in Fig. 3.9 (b)
all the subsequent records have moved one slot upward from record 6
onwards. All empty slots appear together at the end of the page. Such an
approach requires moving a large number of records depending on the
position of deleted record in a page of the file.
Fig. 3.9 Deletion operation on PURCHASE record
The second alternative is that only the last record is shifted in empty slot
of deleted record, instead of disturbing large number of records, as shown in
Fig. 3.9 (c). In both of these alternatives, it is not desirable to move records to occupy the empty slot of the deleted record, because doing so requires additional block accesses. As insertion of records is a more frequently performed operation than deletion, it is more appropriate to keep the empty slot of the deleted record vacant and wait for a subsequent insertion before the space is reused.
Therefore, a third alternative is used in which the deletion of a record is
handled by using an array of bits (or bytes) called file header at the
beginning of the file, one per slot, to keep track of free (or empty) slot
information. As long as a record is stored in the slot, its bit is ON; when the record is deleted, its bit is turned OFF. The file header tracks this bit
becoming ON or OFF. A file header contains a variety of information about
the file including the addresses of the slot of deleted records. When the first
record is deleted, the file header stores its slot address. Now this empty slot
of the first deleted record is used to store the empty slot address of the second deleted record, and so on, as shown in Fig. 3.9 (d). These stored empty slot
addresses of deleted records are also called pointers since they point to the
location of a record. The empty slot of deleted records thus forms a linked
list, which is referred to as a free list. Under this arrangement, whenever a
new record is inserted, the first available empty slot pointed by the file
header is used to store it. The file header pointer is now pointed to the next
available empty slot for storing next inserted record. In case of unavailability
of an empty slot, the new record is added at the end of the file.
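The free-list scheme of the third alternative can be sketched in a few lines of Python. The slot contents are illustrative only; the point is that the file header holds the address of the first deleted slot, each deleted slot holds the address of the next one, and a new record reuses the slot at the head of the chain.

slots = ["rec1", "rec2", "rec3", "rec4", "rec5"]
free_head = None                        # file header: address of the first free slot

def delete(slot_no):
    global free_head
    slots[slot_no] = None               # mark the slot empty (end of the chain)
    if free_head is None:
        free_head = slot_no             # header points at the first deleted slot
    else:
        current = free_head             # walk the free list to its last slot ...
        while slots[current] is not None:
            current = slots[current]
        slots[current] = slot_no        # ... and chain the new empty slot there

def insert(record):
    global free_head
    if free_head is None:
        slots.append(record)            # no free slot: add at the end of the file
    else:
        slot_no = free_head
        free_head = slots[slot_no]      # header now points at the next free slot
        slots[slot_no] = record         # reuse the first deleted slot

delete(1); delete(3)                    # free list: header -> slot 1 -> slot 3
insert("recA")                          # recA reuses slot 1
print(slots, "free_head =", free_head)  # ['rec1', 'recA', 'rec3', None, 'rec5'] free_head = 3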
In case of multiple key search (or sorting), the first key is called a primary
key while the others are called secondary keys. Fig. 3.16 (a) shows a simple
EMPLOYEE file of an organisation, while Fig. 3.16 (b) shows the same file
sorted on three keys in ascending order. As shown in Fig. 3.16 (b), the first
key (primary key) is employee’s last name (EMP-LNAME), the second key
(secondary key) is employee’s identification number (EMP-ID) and the third
key (secondary key) is employee’s country (COUNTRY) to which they
belong. Whenever an attribute (field item) or a set of attributes is added to the record, the entire file is reorganised to effect the addition of the new attribute in each record of the file. Therefore, extra fields are always kept in the
sequential file for future addition of items.
Fig. 3.16 EMPLOYEE payroll file of an organisation
(a) Unsorted
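Sorting on a primary key and two secondary keys, as in Fig. 3.16 (b), amounts to comparing records on a tuple of key values. The records below are invented; only the key order (last name, then employee number, then country) follows the text.

employees = [
    {"EMP_LNAME": "Singh", "EMP_ID": 102, "COUNTRY": "India"},
    {"EMP_LNAME": "Brown", "EMP_ID": 205, "COUNTRY": "UK"},
    {"EMP_LNAME": "Singh", "EMP_ID": 101, "COUNTRY": "India"},
]

# primary key first, then the secondary keys, all in ascending order
employees.sort(key=lambda r: (r["EMP_LNAME"], r["EMP_ID"], r["COUNTRY"]))
for r in employees:
    print(r)        # Brown/205, then Singh/101, then Singh/102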
The primary data storage area (also called prime area) is an area in which
records are written when an indexed-sequential file is originally created. It
contains the records written by the users’ programs. The records are written
in data blocks in ascending key sequence. These data blocks are in turn
stored in ascending sequence in the primary data storage area. The data
blocks are sequenced by the highest key of logical records contained in
them. The prime area is essentially a sequential file.
The overflow area is essentially used to store new records, which cannot
be otherwise inserted in the prime area without rewriting the sequential file.
It permits the addition of records to the file whenever a new record is
inserted in the original logical block. Multiple records belonging to the same
logical area may be chained to maintain logical sequencing. A pointer is
associated with each record in the prime area which indicates that the next
sequential record is stored in the overflow area. Two types of overflow areas
are generally used, which are known as:
a. Cylinder overflow area.
b. Independent overflow area.
Either or both of these overflow areas may be specified for a particular
file. In a cylinder overflow area, the spare tracks in every cylinder are reserved
for accommodating the overflow records, whereas in an independent
overflow area, overflow records from anywhere in the prime area may be
placed.
In case of a random enquiry or update, a hierarchy of indices is maintained and accessed to get the physical location of the desired record. The data
of the indexed-sequential files is stored on the cylinders, each of which is
made up of a number of tracks. Some of these tracks are reserved for
primary data storage area and others are used for an overflow area associated
with the primary data area on the cylinder. A track index is written and
maintained for each cylinder. It contains an entry of each primary data track
in the cylinder as well as an entry to indicate if any records have overflowed
from the track.
Fig. 3.17 shows an example of indexed-sequential file organisation and
access. Fig. 3.17 (a) shows how overflow area is created. As shown, when a
new record 512 is inserted in an existing logical block having records 500,
505, 510, 515, 520 and 525, an overflow area is created and record 525 is
shifted into it. Fig. 3.17 (b) illustrates the relationships between the different
levels of indices. Locating a particular record involves a search operation of
master index to find the proper cylinder index (for example Cyl index 1)
with which the record is associated. Next, a search is made of the cylinder
index to find the cylinder (for example, Cyl 1) on which the record is
located. A search of the track index is then made to know the track number
on which the record resides (for example, Track 0). Finally, a search of the
track is required to locate the desired record. The master index resides in main memory during file processing and remains there until the file is closed. However, a master index is not always necessary and should only be used for large files. The master index is the highest level of index in an indexed-sequential file organisation.
An example of an indexed-sequential file organisation, developed by IBM,
is called Index Sequential Access Method (ISAM). Since the records are
organised and stored sequentially in ISAM files, adding new records to the
file can be a problem. To overcome this problem, ISAM files maintain an
overflow area for records added after a file is created. Pointers are used to
find the records in their proper sequence when the file is processed
sequentially. If the overflow area becomes full, an ISAM file can be
reorganised by merging records in the overflow area with the records in the
primary data storage area to produce a new file with all the records in the
proper sequence. Virtual Storage Access Method (VSAM) is an advanced version of the ISAM file in which virtual storage methods are used. It is a version of the B+-tree discussed in Section 3.6.3. In VSAM
files, instead of using overflow area for adding records, the new records are
inserted into the appropriate place in the file and the records that follow are
shifted to new physical locations. The shifted records are logically connected
through pointers located at the end of the inserted records. Thus, VSAM files do not require reorganisation, as is the case with ISAM files. The VSAM file method is much more efficient than the ISAM method.
(a) Shifting of the last record into overflow area while inserting a record
(b) Relationship between different levels of indices
In a hash file, the data is scattered throughout the disk in a random order.
The processing of a hash file is dependent on how the search key set for the
records is transformed (or mapped) into the addresses of secondary storage
device (for example, hard disk) to locate the desired record. The search
condition must be an equality condition on a single field, called the hash
field of the file. In most cases, the hash field is also a key field of the file, in
which case it is called hash key. In hashing operations, there is a function h
called a hash function or randomising function that is applied to the hash
field value v of a record. This operation yields the address of the disk block in which the record is stored. A search for the record within the block can
be carried out in the main memory buffer. The function h(v) indicates the
number of the bucket in which the record with key value v is to be found. It
is desirable that h “hashes” v, that is, h(v) takes all its possible values with roughly equal probability as v ranges over likely collections of values for the key.
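A division-remainder function is one common choice for h. In the sketch below the bucket count and key values are illustrative only; h(v) maps each hash-key value to a bucket (block) number, and records whose keys hash to the same value land in the same bucket.

NUM_BUCKETS = 7
buckets = {b: [] for b in range(NUM_BUCKETS)}

def h(v):
    return v % NUM_BUCKETS              # a simple division-remainder hash function

for key in (500, 505, 510, 515, 512):
    buckets[h(key)].append(key)         # place each record in its hashed bucket

print(h(512), buckets[h(512)])          # bucket 1, which holds keys 505 and 512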
3.6 INDEXING
Fig. 3.21 shows an example of trees. Each node has to be reachable from
the root through a unique sequence of arcs called a path. The number of arcs
in a path is called the length of the path. The length of the path from the root
to the node plus 1 is called level of a node. The height of a non-empty tree is
the maximum level of a node in the tree. The empty tree is a legitimate tree
of height 0 and a single node is a tree of height 1. This is the only case in
which a node is both the root and a leaf. The level of a node must be
between the levels of the root (that is 1) and the height of the tree. Fig. 3.21
shows an example of a tree structure that reflects the hierarchy of a
manufacturing organisation.
In a tree-based indexing scheme, the search generally starts at the root
node. Depending on the conditions that are satisfied at the node under
examination, a branch is made to one of several nodes and the procedure is
repeated until a match is found or a leaf node is encountered. A leaf node is
the last node beyond which there are no more nodes available. There are
several types of tree-based index structures; however, detailed explanations of B-tree indexing and B+-tree indexing are provided in this section.
Fig. 3.21 Example of trees
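The root-to-leaf search described above can be sketched with a tiny two-level index. The separator keys, node names and key ranges are invented for illustration; B-trees and B+-trees generalise this idea to many keys per node.

index = {                               # internal node -> (separator key, left, right)
    "root": (300, "n1", "n2"),
    "n1":   (150, "leaf_a", "leaf_b"),
    "n2":   (450, "leaf_c", "leaf_d"),
}
leaves = {"leaf_a": range(0, 150), "leaf_b": range(150, 300),
          "leaf_c": range(300, 450), "leaf_d": range(450, 600)}

def search(key, node="root"):
    if node in leaves:                  # reached a leaf: scan it for the key
        return node if key in leaves[node] else None
    separator, left, right = index[node]
    return search(key, left if key < separator else right)

print(search(512))                      # leaf_d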
REVIEW QUESTIONS
1. Discuss physical storage media available on the computer system.
2. What is a file? What are records and data items in a file?
3. List down the factors that influence organisation of data in a database system.
4. What is a physical storage? Explain with block diagrams, a system of physically accessing
the database.
5. A RAID system allows replacing failed disks without stopping access to the system. Thus,
the data in the failed disk must be rebuilt and written to the replacement disk while the
system is in operation. With which of the RAID levels is the amount of interference between
the rebuild and ongoing disk accesses least? Explain.
6. How are records and files related?
7. List down the factors that influence the organisation of a file.
8. Explain the differences between master files, transaction files and report files.
9. Consider the deletion of record 6 from file of Fig. 3.8 (b). Compare the relative merits of the
following techniques for implementing the deletion:
a. Move record 7 to the space occupied by record 6 and move record 8 to the space
occupied by record 7 and so on.
b. Move record 8 to the space occupied by record 6.
c. Mark record 6 as deleted and move no records.
10. Show the structure of the file of Fig. 3.9 (d) after each of the following steps:
11. Give an example of a database application in which variable-length records are preferred to
the pointer method. Explain your answer.
12. What is a file organisation? What are the different types of file organisation? Explain using a
sketch each of them with their advantages and disadvantages.
13. What is a sequential file organisation and a sequential file processing?
14. What are the advantages and disadvantages of a sequential file organisation?
15. In the sequential file organisation, why is an overflow block used even if there is, at the
moment, only one overflow record?
16. What is indexing and hashing?
17. When is it preferable to use a dense index rather than a sparse index? Explain your answer.
18. What is the difference between primary index and secondary index?
19. What is the most important difference between a disk and a tape?
20. Explain the terms seek time, rotational delay and transfer time.
21. Explain what buffer manager must do to process a read request for a page.
22. What is direct file organisation? Write its advantages and disadvantages.
23. What are secondary indexes and what are they used for?
24. When does a buffer manager write a page to disk?
25. What do you mean by indexed-sequential file processing?
26. Explain the difference between the following:
a. File organisation
b. Sequential file organisation
c. Indexed-file organisation
d. Direct file organisation
e. Indexing
f. RAID
g. File manager
h. Buffer manager
i. Tree
j. Leaf.
k. Cylinder
l. Main memory.
a. Cache memory
b. Main memory
c. Magnetic disk
d. Magnetic tape
e. Optical disk
f. Flash memory.
55. With a neat sketch, explain the advantages, and disadvantages of a magnetic disk storage
mechanism.
56. Explain the factors affecting the performance of magnetic disk storage device.
57. What do you mean by RAID technology? What are the various RAID levels?
58. What are the factors that influence the choice of RAID levels? Provide an orientation table
for RAID levels.
59. Explain the working of a tree-based indexing.
STATE TRUE/FALSE
1. The efficiency of the computer system greatly depends on how it stores data and how fast it
can retrieve the data.
2. Because of the high cost and volatile nature of the auxiliary memory, permanent storage of
data is done in the main memory.
3. In a computer, a file is nothing but a series of bytes.
4. An indexed-sequential file organisation is a direct processing method.
5. In a physical storage, a record has a physical storage location or address associated with it.
6. Access time is the time from when a read or write request is issued, to the time when data
transfer begins.
7. The file manager is a software that manages the allocation of storage locations and data
structure.
8. The different types of files are master files, report files and transaction files.
9. The secondary devices are volatile whereas the tertiary storage devices are non-volatile.
10. The buffer manager fetches a requested page from disk into a region of main memory called
the buffer pool and tells the file manager the location of the requested page.
11. The term non-volatile means it stores and retains the programs and data even after the
computer is switched off.
12. Auxiliary storage devices are also useful for transferring data from one computer to another.
13. Transaction files contain relatively permanent information about entities.
14. Master file is a collection of records describing activities or transactions by organisation.
15. Report file is a file created by extracting data to prepare a report.
16. Auxiliary storage devices process data faster than main memory.
17. The capacity of secondary storage devices is practically unlimited.
18. It is more economical to store data on secondary storage devices than in primary storage
devices.
19. Delete operation deletes the current record and updates the file on the disk to reflect the
deletion.
20. In case of sequential file organisation, records are stored in some predetermined sequence,
one after another.
21. A file could be made of records which are of different sizes. These records are called
variable-length records.
22. Sequential file organisation is most common because it makes effective use of the least
expensive secondary storage devices such as magnetic disk.
23. When using sequential access to reach a particular record, all the records preceding it need
not be processed.
24. In direct file processing, on an average, finding one record will require that half of the
records in the file be read.
25. In a direct file, the data may be organised in such a way that they are scattered throughout the
disk in what may appear to be random in order.
26. Auxiliary and secondary storage devices are the same.
27. Sequential access storage is off-line.
28. Magnetic tapes are direct-access media.
29. Direct access systems do not search the entire file, instead, they move directly to the needed
record.
30. Hashing is a method of determining the physical location of a record.
31. In hashing, the record key is processed mathematically.
32. The file storage organisation determines how to access the record.
33. Files could be made of fixed-length records or variable-length records.
34. A file in which all the records are of the same length are said to contain fixed-length-records.
35. Because tapes are slow, they are generally used only for long-term storage and backup.
36. There are many types of magnetic disks such as hard disks, flexible disks, zip disks and jaz
disks.
37. Data transfer time is the time it takes to transfer the data to the primary storage.
38. Optical storage is a low-speed direct-access storage device.
39. In magnetic tape, the read/write head reads magnetized areas (which represent data on the
tape), converts them into electrical signals and sends them to main memory and CPU for
execution or further processing.
40. In a bit-level stripping, splitting of bits of each byte is done across multiple disks.
41. In a block-level stripping, splitting of blocks is done across multiple disks and it treats the
array of disks as a single large disk.
42. B+-tree index is a balanced tree in which the internal nodes direct the search operation and
the leaf nodes contain the data entries.
1. If data are stored sequentially on a magnetic tape, they are ideal for:
a. on-line applications
b. batch processing applications
c. spreadsheet applications
d. decision-making applications.
a. costly
b. volatile
c. faster
d. none of these.
a. relative addressing
b. indexing
c. hashing
d. all of these.
a. costly
b. volatile
c. faster
d. none of these.
a. record
b. file
c. field
d. none of these.
a. knowledge
b. instructions
c. data
d. none of these.
a. master file
b. report file
c. transaction file
d. all of these.
a. transaction file
b. master file
c. report file
d. none of these.
11. Which of the following file is created by extracting data to prepare a report?
a. report file
b. master file
c. transaction file
d. all of these.
a. economy
b. security
c. capacity
d. all of these.
14. Employee ID, Supplier ID, Model No and so on are examples of:
a. primary keys
b. fields
c. unique record identifier
d. all of these.
a. magnetic tape
b. magnetic disk
c. zip disk
d. DAT cartridge.
a. hard disks
b. magnetic tape
c. jaz disk
d. floppy disk.
18. Which storage media does not permit a record to be read and written in the same place?
a. magnetic disk
b. hard disk
c. magnetic tape
d. none of these.
a. from when a read or write request is issued to when data transfer begins
b. amount of time required to transfer data from the disk to or from main memory
c. required to electronically activate the read/write head over the disk surface where
data transfer is to take place
d. none of these.
a. from when a read or write request is issued to when data transfer begins
b. amount of time required to transfer data from the disk to or from main memory
c. required to electronically activate the read/write head over the disk surface where
data transfer is to take place
d. none of these.
a. from when a read or write request is issued to when data transfer begins
b. amount of time required to transfer data from the disk to or from main memory
c. required to electronically activate the read/write head over the disk surface where
data transfer is to take place
d. none of these.
23. Which of the following is a factor that affects the access time of hard disks?
a. zip disk
b. hard disk
c. magnetic tape
d. none of these.
a. optical disk
b. zip disk
c. hard disk
d. jaz disk.
a. WORM
b. Super disk
c. CD-ROM
d. CD-RW.
a. DEC
b. IBM
c. COMPAQ
d. HP.
a. slowest
b. fastest
c. medium speed
d. none of these.
a. cache memory
b. main memory
c. flash memory
d. all of these.
1. The _____ temporarily stores data and programs in its main memory while the data are being
processed.
2. The most common types of _____ devices are magnetic tapes, magnetic disks, floppy disks,
hard disks and optical disks.
3. The buffer manager fetches a requested page from disk into a region of main memory called
_____ pool.
4. _____ is also known as secondary memory or auxiliary storage.
5. Redundancy is introduced using _____ technique.
6. In a bit-level stripping, splitting of bits of each byte is done across _____ .
7. There are two types of secondary storage devices (a) _____ and (b) _____ .
8. A collection of related record is called _____.
9. RAID stands for _____.
10. ISAM stands for _____.
11. VSAM stands for _____.
12. There are mainly two kinds of file operations (a) _____ and (b) _____.
13. Direct access storage devices are called _____.
14. Mean time to failure (MTTF) is the measure of _____ of the disk.
15. The overflow area is essentially used to store _____, which cannot be otherwise inserted in
the prime area without rewriting the sequential file.
16. Primary index is called _____ index.
17. Primary index is an index based on a set of fields that include _____ key.
18. Data to be used regularly is almost always kept on a _____.
19. A dust particle or a human hair on the magnetic disk surface could cause the head to crash
into the disk. This is called _____.
20. Secondary index is used to search a file on the basis of _____ keys.
21. The two forms of record organisations are (a) _____ and (b) _____.
22. In sequential processing, one field referred to as the _____, usually determines the sequence
or order in which the records are stored.
23. Secondary storage is called _____ storage whereas Tertiary storage is called _____ storage
device.
24. Processing data using sequential access is referred to as _____.
25. _____ is the duration taken to complete a data transfer _____ from the time when the
computer requests data from a secondary storage device to the time when the transfer of data
is complete.
26. A _____ is a field or set of fields whose contents is unique to one record and can therefore be
used to identify that record.
27. Hashing is also known as _____.
28. _____ is the time it takes an access arm (read/write head) to get into position over a
particular track.
29. In an indexing method, a _____ associates a primary key with the physical location at which
a record is stored.
30. When the records in a large file must be accessed immediately, then _____ organisation must
be used.
31. In an _____, the records are stored either sequentially or non-sequentially and an index is
created that allows the applications to locate the individual records using the index.
32. In an indexed organisation, if the records are stored sequentially based on primary key value,
then that file organisation is called an _____.
33. A track is divided into smaller units called _____.
34. The sectors are further divided into _____.
35. CD-R drive is short for _____.
36. _____ stands for write-once, read-many.
37. In tree-based indexing scheme, the search generally starts at the _____ node.
38. Deletion time is the time taken to delete _____.
39. ISAM was developed by _____.
Part-II
RELATIONAL MODEL
Chapter 4
4.1 INTRODUCTION
4.3.1 Domain
Fig. 4.1 shows the structure of an instance or extension, of a relation called
EMPLOYEE. The EMPLOYEE relation has seven attributes (field items), namely EMP-NO, LAST-NAME, FIRST-NAME, DATE-OF-BIRTH, SEX, TEL-NO and SALARY. The extension has seven tuples (records). Each attribute contains values drawn from a particular domain. A domain is a set
of atomic values. Atomic means that each value in the domain is indivisible
to the relational model. Domain is usually specified by name, data type,
format and constrained range of values. For example, in Fig. 4.1, attribute
EMP-NO, is a domain whose data type is an integer with value ranging
between 1,00,000 and 2,00,000. Additional information for interpreting the values of a domain can also be given; for example, SALARY should have the
units of measurement as Indian Rupees or US Dollar. Table 4.1 shows an
example of seven different domains with respect to EMPLOYEE record of
Fig. 4.1. The value of each attribute within each tuple is atomic, that means
it is a single value drawn from the domain of the attribute. Multiple or
repeating values are not permitted.
Fig. 4.1 EMPLOYEE relation
The relation R over n domains D1, D2, D3, …, Dn consists of an unordered set of n-tuples with attribute values (A1, A2, A3, …, An), where each value Ai is drawn from the corresponding domain Di. Thus,

A1 ∈ D1, A2 ∈ D2, …, An ∈ Dn        (4.1)
Each tuple is a member of the set formed by the Cartesian product (that is, all possible distinct combinations) of the domains D1 × D2 × D3 × … × Dn. Thus, each tuple is distinct from all others, and any instance of the relation is a subset of the Cartesian product of its domains.
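This subset property is easy to check mechanically. The domains below are small invented stand-ins for those of Fig. 4.1; the sketch simply confirms that every tuple of an instance lies in the Cartesian product of the domains.

from itertools import product

D1 = {100000, 100001, 100002}            # a truncated EMP-NO domain
D2 = {"Singh", "Gupta"}                  # a truncated LAST-NAME domain
D3 = {"M", "F"}                          # the SEX domain

cartesian = set(product(D1, D2, D3))     # all possible distinct combinations
instance = {(100000, "Singh", "M"), (100002, "Gupta", "F")}

print(instance <= cartesian)             # True: the instance is a subset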
4.3.2.1 Superkey
Superkey is an attribute, or set of attributes, that uniquely identifies a tuple
within a relation. In Fig. 4.1, the attribute EMP-NO is a superkey because
only one row in the relation has a given value of EMP-NO. Taken together,
the two attributes EMP-NO and LAST-NAME are also a superkey because
only one tuple in the relation has a given value of EMP-NO and LAST-
NAME. In fact, all the attributes in a relation taken together are a superkey
because only one row in a relation has a given value for all the relation
attributes.
The project manager of project P1 is Thomas and this project uses five
excavators and four drills. There will be at most one row for a combination
of a project and machine, and {PROJECT, MACHINE} is the relation key. It
is to be noted that a project has only one project manager and that
consequently PROJ-MANAGER can identify a project. {PROJ-
MANAGER, MACHINE} is also a relation key. Thus relation USE of Fig.
4.3 has two relation keys. Some keys are more important than others. For example, {PROJECT, MACHINE} is considered more important than {PROJ-MANAGER, MACHINE} because PROJECT is a more stable identifier of projects; PROJ-MANAGER is not a stable identifier because a project’s manager can change during its execution. The most important such key is usually designated the primary key, taking precedence over the remaining candidate keys.
A candidate key can also be described as a superkey without the
redundancies. In other words, candidate key is a superkey such that no
proper subset is a superkey within the relation. There may be several
candidate keys for a relation.
Set-theoretic operations make use of the fact that tables are essentially
sets of rows. There are four set-theoretical operations, as shown in Table 4.3.
Native relational operation focuses on the structure of the rows. There are
four native relational operations, as shown in Table 4.4.
SELECT *
From WAREHOUSE
where LOCATION = ‘Mumbai’
into R2
We can also impose conditions on more than one attribute. For example,
SELECT *
From WAREHOUSE
where LOCATION = ‘Mumbai’ and NO-OF-BINS >
into R3
In the case of PROJECTION operation, the SQL does not follow the
relational model and the operation is expressed as:
PROJECT WAREHOUSE
ON WH-ID, LOCATION, PHONE
Into R4
PROJECT WAREHOUSE
ON LOCATION
Into R5
JOIN WAREHOUSE
with ITEMS
ON WH-ID
Into R6
The above operations will select all the attributes of both relations
WAREHOUSE and ITEMS with the same value of matching attribute WH-
ID and create a new relation R6, as shown in Fig. 4.7 (b). Thus, in a JOIN operation, the tuples that have the same value of the matching attribute in relations WAREHOUSE and ITEMS are combined into a single tuple in the new relation R6.
Fig. 4.7 The JOIN operation
(b) Relation R6
There are several types of JOIN operations. The JOIN operation discussed
above is called equijoin, in which two tuples are combined if the values of
the two nominated attributes are the same. A JOIN operation may also be based on conditions such as ‘greater-than’, ‘less-than’ or ‘not-equal’. The JOIN operation requires a domain that is common to the tables (or relations) being joined. This prerequisite enables RDBMSs that support domains to check for a common domain before performing the requested join. This check protects users from possible errors.
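As an illustration only (the WAREHOUSE and ITEMS relations and the matching attribute WH-ID are taken from the discussion above), the equijoin producing R6 could be expressed in standard SQL as follows; a ‘greater-than’ join is obtained simply by replacing ‘=’ in the join condition.
-- Equijoin of WAREHOUSE and ITEMS on the common attribute WH-ID
SELECT *
FROM WAREHOUSE, ITEMS
WHERE WAREHOUSE.WH-ID = ITEMS.WH-ID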
SELECT *
from relation 1
UNION
SELECT *
from relation 2
Fig. 4.9 The UNION operation
For example, let us consider relations R8 and R9, as shown in Fig. 4.9 (a).
Now, UNION of the two relations, R8 and R9, is given in relation R10, as
shown in Fig. 4.9 (b). The operation may be written as:
UNION R8, R9
Into R10
SELECT *
from relation 1
MINUS
SELECT *
from relation 2
Fig. 4.10 The difference operation
For example, let us consider relations R11 and R12 as shown in Fig. 4.10
(a). Now, DIFFERENCE of the two relations, R11 and R12 is given in
relation R13, as shown in Fig. 4.10 (b). The operation may be written as:
In the case of difference, only those tuples (rows) that appear in the first relation (R11) but not in the second (R12) are output (R13).
SELECT *
from relation 1
INTERSECT
SELECT *
from relation 2
For example, let us consider relations R11 and R12 as shown in Fig. 4.10
(a). Now, INTERSECTION of the two relations, R11 and R12 is given in
relation R14, as shown in Fig. 4.11. The operation may be written as:
In the case of intersection, only those tuples (rows) that appear in both relations R11 and R12 are output (R14).
SELECT *
from relation 1, relation 2
For example, let us consider relations R15 and R16 as shown in Fig. 4.12
(a). Now, the CARTESIAN PRODUCT of two relations, R15 and R16 is
given in relation R17, as shown in Fig. 4.12 (b). The operation may be
written as:
SELECT *
from relation 1
DIVISION
SELECT *
from relation 2
Suppose we have two relations R18 and R19, as shown in Fig. 4.13. If R18
is the dividend and R19 the divisor, then relation R20 = R18 / R19. The
operation may be written as:
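Standard SQL has no DIVISION operator; purely as an illustrative sketch (assuming R18 has attributes A and B and R19 the single attribute B), the effect of R20 = R18 / R19 is commonly obtained with nested NOT EXISTS subqueries:
-- R20: every value of A in R18 that is paired with all B values appearing in R19
SELECT DISTINCT A
FROM   R18 X
WHERE  NOT EXISTS
       (SELECT *
        FROM   R19
        WHERE  NOT EXISTS
               (SELECT *
                FROM   R18 Y
                WHERE  Y.A = X.A
                AND    Y.B = R19.B))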
(a) σdept-no=10(EMPLOYEE)
(b) σSALARY=80000(EMPLOYEE)
Query # 2 Select tuples for all employees in the relation
EMPLOYEE who either work in DEPT-NO 10 and get
annual salary of more than INR 80,000, or work in
DEPT-NO 12 and get annual salary of more than INR
90,000.
ACTUAL-DEPENDENTS ← (σEMP-ID=FEPT-
ID(DEPENDENTS)
FINAL-RESULT ← ∏EMP-NAME(RESULT-EMP-ID *
EMPLOYEE)
Query # 9 Retrieve the names of employees who have no
dependents.
ALL-EMP ← ∏EMP-ID(EMPLOYEE)
EMP-WITHOUT-DEPENDENT ← (ALL-EMP -
EMP-WITH-DEPENDENT)
FINAL-RESULT ← ∏EMP-NAME(EMP-WITHOUT-
DEPENDENT * EMPLOYEE)
Query # 10 Retrieve the names of managers who have at least one
dependent.
MGRS-WITH-DEPENDENT ← (MANAGER ⋂
EMP-WITH-DEPENDENT)
FINAL-RESULT ← ∏EMP-NAME(MGRS-WITH-
DEPENDENT * EMPLOYEE)
Query # 11 Prepare a list of project numbers (PROJ-NO) for
projects (PROJECT) that involve an employee whose
name is “Thomas”, either as a technician or as a
manager of the department that controls the project.
Thomas-TECH-PROJ ← ∏PROJ-NO(WORKS-ON *
Thomas)
FINAL-RESULT ← (Thomas-TECH-PROJ ⋃
Thomas-MGR-PROJ)
Let us take another example. In Fig. 4.14 (b), predicates “is smaller than”,
“is greater than”, “is north of”, “is south of” require two objects and are
called two-place predicates.
In database applications, a relational calculus is of two types:
Tuple relational calculus.
Domain relational calculus.
EMPLOYEE(R)
To express the query ‘Find the set of all tuples R such that F(R) is true’,
we write:
{R|F(R)}
This term can be illustrated for relations shown in Fig. 4.15, for example,
WAREHOUSE.LOCATION = MUMBAI
or, ITEMS.ITEM-NO > 30
All tuple variables in terms are defined to be free. In defining a WFF, the following symbols, commonly found in predicate calculus, are used:
⌉ = negation
∃ = existential quantifier (meaning ‘there EXISTS’), used in formulae that must be true for at least one instance
∀ = universal quantifier (meaning ‘FOR ALL’), used in statements about every instance
In the above examples, STORED and ITEMS are free variables in the first
WFF. In the second WFF, only STORED is free, whereas ITEMS is bound.
Bound and free variables are important to formulating calculus expression.
A calculus expression may be given in the form mentioned below so that all
tuple variables preceding WHERE are free in the WFF.
Fig. 4.15 Sample relations
Relational calculus expressions can be used to retrieve data from one or
more relations, with the simplest expressions being those that retrieve data
from one relation only.
b. List the details of employees earning a salary of more than INR 40000.
c. List the details of cities where there is a branch office but no properties for rent.
d. List the names of clients who have viewed a property for rent in Delhi.
e. List all the cities where there is a branch office and at least one property for the client.
where d1, d2, …, dn, …, dm represent domain variables and F(d1, d2, …, dm) represents a formula composed of atoms. Each atom has one of the following forms:
R(d1, d2, …, dn), where R is a relation of degree n and each di is a domain variable.
di θ dj, where di and dj are domain variables and θ is one of the comparison operators (<, ≤, >, ≥, =, ≠); the domains of di and dj must have members that can be compared by θ.
di θ c, where di is a domain variable, c is a constant from the domain of di, and θ is one of the comparison operators.
{FN, IN | (∃EN, PROJ, SEX, DOB, SAL) (EMPLOYEE(EN, FN, IN, PROJ, SEX, DOB,
SAL) ⋀ PROJ = ‘SAP’)}
b. List the details of employees working on a SAP project and drawing a salary of more than INR 30000.
{FN, IN | (∃EN, PROJ, SEX, DOB, SAL) (EMPLOYEE(EN, FN, IN, PROJ, SEX, DOB,
SAL) ⋀ PROJ = ‘SAP’ ⋀ SAL > 30000)}
c. List the names of clients who have viewed a property for rent in Delhi.
{FN, IN | (∃CN, CN1, PN, PN1, CITY) (CLIENT(CN, FN, IN, TEL, PT, MR) ⋀ VIEWING(CN1, PN1, DT, CMT) ⋀ PROPERTY-FOR-RENT(PN, ST, CITY, PC, TYP, RMS, MT, ON, SN) ⋀ (CN = CN1) ⋀ (PN = PN1) ⋀ (CITY = ‘Delhi’))}
d. List the details of cities where there is a branch office but no properties for rent.
{CITY | (BRANCH(BN, ST, CITY, PC) ⋀ ⌉ (∃CITY1) (PROPERTY-FOR-RENT(PN, ST1, CITY1, PC1, TYP, RMS, RNT, ON, SN, BN1) ⋀ (CITY = CITY1)))}
e. List all the cities where there is both a branch office and at least one property for client.
f. List all the cities where there is either a branch office or a property for client.
{CITY | (BRANCH(BN, ST, CITY, PC) ⋁ PROPERTY-FOR-RENT(PN, ST1, CITY1, PC1, TYP, RMS, RNT, ON, SN, BN))}
REVIEW QUESTIONS
1. In the context of a relational model, discuss each of the following concepts:
a. relation
b. attributes
c. tuple
d. cardinality
e. domain.
2. Discuss the various types of keys that are used in relational model.
3. The relations (tables) shown in Fig. 4.15 are a part of the relational database (RDBMS) of an
organisation.
Find primary key, secondary key, foreign key and candidate key.
4. Let us assume that a database system has the following relations:
12. Let us assume that a relation MANUFACTURE of a database system is given, as shown in
Fig. 4.16 below:
13. What do you mean by relational calculus? What are the types of relational calculus?
14. Define the structure of well-formed formula (WFF) in both the tuple relational calculus and
domain relational calculus.
15. What is difference between JOIN and OUTER JOIN operator?
16. Describe the relations that would be produced by the following tuple relational calculus
expressions:
17. Provide the equivalent domain relational calculus and relational algebra expressions for each
of the tuple relational calculus expressions of Exercise 16.
18. Generate the relational algebra, tuple relational calculus, and domain relational calculus
expressions for the following queries:
19. You are given the relational database as shown in Fig. 4.15. How would you retrieve the
following information, using relational algebra and relation calculus?
20. For the relation A and B shown in Fig. 4.17 below, perform the following operations and
show the resulting relations.
21. Consider a database for the telephone company that contains relation SUBSCRIBERS,
whose attributes are given as:
Assume that the INFORMATION-NO is the unique 10-digit telephone number, including
area code, provided for subscribers. Although one subscriber may have multiple phone
numbers, such alternate numbers are carried in a separate relation (table). The current relation
has a row for each distinct subscriber (but note that husband and wife, subscribing together,
can occupy two rows and share an information number). The database administrator has set
up the following rules about the relation, reflecting design intentions for the data:
No two subscribers (on separate rows) have the same social security number (SSN).
Two different subscribers can share the same information number (for example,
husband and wife). They are listed separately in the SUBSCRIBERS relation.
However, two different subscribers with the same name cannot share the same
address, city, and zip code and also the same information number.
a. Identify all candidate keys for the SUBSCRIBERS relation, based on the
assumptions given above. Note that there are two such keys: one of them contains the INFORMATION-NO attribute and a different one contains the ZIP attribute.
b. Which of these candidate keys would you choose for a primary key? Explain why.
b. The primary keys in the relations are underlined. Give an expression in relational algebra to express each of the following queries:
i. Find the names of all employees who work for ABC Co.
ii. Find the names and cities of residence of all employees who work for ABC Co.
iii. Find the names, street address, and cities of residence of all employees who work for ABC Co.
and earn more than INR 35000 per month.
iv. Find names of all employees who live in the same city and on the same street as do their
managers.
v. Find the names of all employees who do not work for ABC Co.
The key fields are underlined, and the domain of each field is listed after the field name.
Write the following queries in relational algebra, tuple relational calculus, and domain
relational calculus:
STATE TRUE/FALSE
a. Pascal
b. C.J. Date
c. Dr. Edgar F. Codd
d. none of these.
2. Who wrote the paper titled “A Relational Model of Data for Large Shared Data Banks”?
a. F.R. McFadden
b. C.J. Date
c. Dr. Edgar F. Codd
d. none of these.
3. The first large scale implementation of Codd’s relational model was IBM’s:
a. DB2
b. system R
c. ingress
d. none of these.
a. ingress
b. DB2
c. IMS
d. sybase.
a. tuple
b. relation
c. attribute
d. domain.
a. 10
b. 100
c. 1000
d. none of these.
a. 10
b. 50
c. 500
d. 5000.
a. 10
b. 100
c. 1000
d. none of these.
10. Which of the following keys in a table can uniquely identify a row in a table?
a. primary key
b. alternate key
c. candidate key
d. all of these.
a. primary key
b. alternate key
c. candidate key
d. all of these.
12. What are all candidate keys, other than the primary keys called?
a. secondary keys
b. alternate keys
c. eligible keys
d. none of these.
13. What is the name of the attribute or attribute combination of one relation whose values are
required to match those of the primary key of some other relation?
a. candidate key
b. primary key
c. foreign key
d. matching key.
a. tuple
b. relation
c. attribute
d. domain.
a. tuple
b. relation
c. attribute
d. domain.
16. What is the RDBMS terminology for a set of legal values that an attribute can have?
a. tuple
b. relation
c. attribute
d. domain.
17. What is the RDBMS terminology for the number of tuples in a relation?
a. degree
b. relation
c. attribute
d. cardinality.
a. degree
b. attribute
c. domain
d. tuple.
19. What is the RDBMS terminology for the number of attributes in a relation?
a. degree
b. relation
c. attribute
d. cardinality.
20. Which of the following aspects of data is the concern of a relational database model?
a. data manipulation
b. data integrity
c. data structure
d. all of these.
a. data type
b. field
c. data value
d. none of these.
5.1 INTRODUCTION
Dr. Edgar F. Codd proposed a set of rules that were intended to define the
important characteristics and capabilities of any relational system [Codd
1986]. Today, Codd’s rules are used as a yardstick for what can be expected
from a conventional relational DBMS. Though it is referred to as “Codd’s twelve rules”, in reality there are thirteen rules, numbered 0 to 12. Codd’s rules are summarised in Table 5.1.
Of the rules given in Table 5.1, Rules 1 to 5 and 8 are well supported by the majority of currently available commercial RDBMSs. Rule 11 is applicable to distributed database systems.
XYZ = (R * S) : B = C % A, D
LIST XYZ
P = XYZ + Q
the values of R and S at that time are substituted into the formula for XYZ to get a value for XYZ.
c. Print names of parts (in the relation of example (b)) having cost more than Rs. 4000.00.
e. Print names of the suppliers (in the relation of example (b)) who supply every part ordered by a customer “Abhishek”.
f. Print names of the suppliers (in the relation of example (b)) who have not supplied any part ordered by a customer “Abhishek”.
range of t1 is R1
range of t2 is R2
:
:
range of tn is Rn
where ψ
Table 5.3 shows various QUEL operations for relations R(A1, …, An) and S(B1, …, Bm).
range of t is CUSTOMERS
RETRIEVE (t.CUST-NAME)
where t.BALANCE < 0
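For comparison (an illustration only, using the same CUSTOMERS relation and attributes), the equivalent query in SQL would read:
SELECT CUST-NAME
FROM   CUSTOMERS
WHERE  BALANCE < 0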
ii. Print the supplier names, items and prices of all suppliers that supply at least one
item ordered by M/s ABC Co.
range of t is ORDERS
range of s is SUPPLIERS
RETRIEVE (s.SUP-NAME, s.ITEM, s.PRICE)
where t.CUST-NAME = “M/s ABC Co.” and t.ITEM = s.ITEM
iii. Print the supplier names that supply every item ordered by M/s ABC Co.
This query can be executed in the following three steps.
CREATE STUDENT_ADMISSION
CREATE STUDENTS (S-NAME IS 〈format〉, Roll-NO. IS 〈format〉, ADDRESS IS 〈format〉,
MAIN IS 〈format〉)
CREATE ADMISSION (ROLL-NO. IS 〈format〉, COURSE IS 〈format〉, SEMESTER IS
〈format〉)
CREATE FACULTY (COURSE IS 〈format〉, FACULTY IS 〈format〉, SEMESTER IS
〈format〉)
CREATE OFFERING (BRANCH IS 〈format〉, COURSE IS 〈format〉)
CREATE TABLE
CREATE VIEW
CREATE INDEX
ALTER TABLE
DROP TABLE
DROP VIEW
DROP INDEX
Fig. 5.2 Creating SQL table for employee health centre schema
As shown in Fig. 5.2 under the PATIENT table, a constraint may be named by preceding it with CONSTRAINT 〈constraint-name〉. ON UPDATE and ON
DELETE clauses are used to trigger referential integrity checks and
specifying their corresponding actions. The possible actions of these clauses
are SET NULL, SET DEFAULT and CASCADE. Both SET NULL and SET
DEFAULT remove the relationship by resetting the foreign key value to null,
or to its default if it has one. The action is same for both updates and deletes.
The effect of CASCADE depends on the event. With ON UPDATE, a
change to the primary key value in the related tuple is reflected in the foreign
key. Changing a primary key should normally be avoided but it may be
necessary when a value has been entered incorrectly. Cascaded update
ensures that referential integrity is maintained. With ON DELETE, if the
related tuple is deleted then the tuple containing the foreign key is also
deleted. Cascaded deletes are therefore appropriate for mandatory
relationships such as those involving weak entity classes. As shown in Fig.
5.2, the PATIENT table includes the following named referential integrity
constraints:
CONSTRAINT PATIENT-REG
FOREIGN KEY (REGISTERED-WITH) REFERENCES DOCTOR
(DOCTOR-ID)
ON DELETE SET NULL
ON UPDATE CASCADE
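In context, such a constraint appears inside the CREATE TABLE statement for PATIENT. Since Fig. 5.2 is not reproduced here, the following is only a sketch; the column names other than REGISTERED-WITH and the data types are assumed.
CREATE TABLE PATIENT
      (PATIENT-ID       CHAR(6)      NOT NULL,
       PATIENT-NAME     VARCHAR(30),
       REGISTERED-WITH  CHAR(6),
       PRIMARY KEY (PATIENT-ID),
       CONSTRAINT PATIENT-REG
            FOREIGN KEY (REGISTERED-WITH) REFERENCES DOCTOR (DOCTOR-ID)
            ON DELETE SET NULL
            ON UPDATE CASCADE)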
A DROP SCHEMA statement with the CASCADE option drops the named schema as well as all tables, data and other schema objects that still exist (that is, it removes the entire schema irrespective of its content).
With the RESTRICT option, the schema is dropped only if all other schema objects have already been deleted (that is, only if the schema is empty); otherwise, an exception is raised.
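The two statements typically take the following form (the schema name HEALTH-CENTRE is assumed here):
DROP SCHEMA HEALTH-CENTRE CASCADE      -- removes the schema and everything it contains
DROP SCHEMA HEALTH-CENTRE RESTRICT     -- succeeds only if the schema is already empty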
Using ALTER TABLE, the default value of 10 for the appointment duration can be changed to 20, and the default can even be removed altogether.
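An illustrative sketch of the two statements (the table name APPOINTMENT and column name DURATION are assumed):
ALTER TABLE APPOINTMENT ALTER COLUMN DURATION SET DEFAULT 20
ALTER TABLE APPOINTMENT ALTER COLUMN DURATION DROP DEFAULT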
The CLUSTER option can also be specified to indicate that the records are to be placed in physical proximity to each other. The UNIQUE option specifies that only one record can exist at any time with a given value for the column(s) specified in the CREATE INDEX statement. An example of creating an index for the EMPLOYEE relation is given below.
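A sketch of such an index on the EMPLOYEE relation of Fig. 4.1 follows; the index name is assumed, and the exact placement of the CLUSTER keyword varies between systems.
CREATE UNIQUE INDEX EMP-NO-INDEX
ON EMPLOYEE (EMP-NO)
CLUSTER          -- optional: requests physical proximity of records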
The sub-query in a view definition cannot include either UNION or ORDER BY. The clause
‘WITH CHECK OPTION’ indicates that modifications (update and insert)
operations against the view are to be checked to ensure that the modified
row satisfies the view-defining condition. There are limitations on updating
data through views. Where views can be updated, those changes can be
transferred to the underlying base tables originally referenced to create the
view. An example of creating a view for the EMPLOYEE relation is given below:
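A sketch of such a view over the EMPLOYEE relation of Fig. 4.1 (the view name and the selection condition are assumed):
CREATE VIEW HIGH-PAID-EMPLOYEE AS
      SELECT EMP-NO, LAST-NAME, FIRST-NAME, SALARY
      FROM   EMPLOYEE
      WHERE  SALARY > 80000
      WITH CHECK OPTION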
In the syntax of the SELECT statement, clauses such as WHERE, GROUP BY, HAVING and ORDER BY are optional. They are included in the SELECT statement only when the functions provided by them are required in the query. In its basic form, the SQL SELECT statement is formed of three clauses, namely SELECT, FROM and WHERE. This basic form of the SELECT statement is sometimes called a mapping or a select-from-where block. These three clauses correspond to the relational algebra operations as follows:
The SELECT clause corresponds to the projection operation of the relational algebra. It is
used to list the attributes (columns) desired in the result of a query. SELECT * is used to get
all the columns of a particular table.
The FROM clause corresponds to the Cartesian-product operation of the relational algebra. It
is used to list the relations (tables) to be scanned from where data has to be retrieved.
The WHERE clause corresponds to the selection predicate of the relational algebra. It
consists of a predicate involving attributes of the relations that appear in the FROM clause. It
tells SQL to include only certain rows of data in the result set. The search criteria are specified in the WHERE clause.
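Putting the three clauses together for the EMPLOYEE relation of Fig. 4.1 (the condition used here is assumed):
SELECT EMP-NO, LAST-NAME      -- projection
FROM   EMPLOYEE               -- relation(s) to be scanned
WHERE  SALARY > 80000         -- selection predicate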
In the syntax of the INSERT command, the list of attribute names following the relation name is optional. An example of the INSERT command, with reference to Fig. 4.15 of Chapter 4, is given in Fig. 5.13 below:
Fig. 5.13 Examples of INSERT command
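The figure itself is not reproduced; the following is only a sketch of the two forms, using the WAREHOUSE relation of Fig. 4.15 with assumed attribute order and values.
-- Query 1: values supplied for every attribute, in schema order
INSERT INTO WAREHOUSE
VALUES ('WH-4', 'Mumbai', '022-23456789', 250)

-- Query 2: explicit attribute names; unlisted attributes take their DEFAULT or NULL
INSERT INTO WAREHOUSE (WH-ID, LOCATION)
VALUES ('WH-5', 'Chennai')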
In Query 2 of the above example, the INSERT command allows the user
to specify explicit attribute names that correspond to the values provided in
the INSERT command. This is useful if a relation has many attributes, but
only a few of those attributes are assigned values in the new tuple. The
attributes not specified in the command format (as shown in Query 2), are
set to their DEFAULT or to NULL and the values are listed in the same
order as the attributes are listed in the INSERT command itself.
UPDATE 〈table-name〉
SET 〈target-value-list〉
WHERE 〈predicate〉
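For instance, an update of the WAREHOUSE relation of Fig. 4.15 might be written as follows (the values are assumed):
UPDATE WAREHOUSE
SET    NO-OF-BINS = 300
WHERE  WH-ID = 'WH-4'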
GRANT 〈privilege(s)〉
ON 〈table-name/view-name〉
TO 〈user(s)-id〉, 〈group(s)-id〉, 〈public〉
The key words for this command are GRANT, ON and TO. A privilege is
typically a SQL command such as CREATE, UPDATE or DROP and so on.
The user-id is the identification code of the user to whom the DBA wants to
grant the specific privilege. The example of GRANT command is given
below:
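The original examples are not reproduced here; the following is only a sketch consistent with the description that follows, using standard table-level privileges (CREATE and DROP are schema-level operations in most SQL dialects, so ALL PRIVILEGES stands in for the full set of rights in example 4):
GRANT SELECT ON ITEMS TO Abhishek                              -- example 1
GRANT UPDATE ON ITEMS TO Abhishek                              -- example 2
GRANT INSERT, DELETE ON ITEMS TO Abhishek                      -- example 3
GRANT ALL PRIVILEGES ON ITEMS TO Abhishek WITH GRANT OPTION    -- example 4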
In the above examples, the DBA has granted a user-id named Abhishek the capability to create, update, drop and/or select tables. As shown in example 4, the DBA has granted Abhishek the right to create, update, drop and select data in the ITEMS table. Furthermore, Abhishek can grant these same rights to others at his discretion.
REVOKE 〈privilege(s)〉
ON 〈table-name/view-name〉
FROM 〈user(s)-id〉, 〈group(s)-id〉, 〈public〉
The key words for this command are REVOKE and FROM. The example
of REVOKE command is given below:
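Again as a sketch only (the privileges and table name mirror the GRANT examples above):
REVOKE SELECT, UPDATE ON ITEMS FROM Abhishek
REVOKE ALL PRIVILEGES ON ITEMS FROM Abhishek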
In the above examples, the DBA has revoked the privileges that were previously granted to the user-id named Abhishek.
For example, the following code segment shows how an SQL statement is
included in a COBOL program.
COBOL statements
…
…
EXEC SQL
SELECT 〈attribute(s)-name〉
INTO :WS-NAME
FROM 〈table(s)-name〉
WHERE 〈conditions〉
END-EXEC
The embedded SQL statements are thus used in the application to perform the data access and manipulation tasks. A special SQL pre-compiler accepts the combined source code, that is, code containing the embedded SQL statements together with the host programming-language statements, and compiles it into an executable form. This compilation process is slightly different from the compilation of a program that does not contain embedded SQL statements.
The exact syntax for embedded SQL requests depends on the language in which SQL is embedded. For instance, a semicolon (;) is used instead of END-EXEC (as in the case of COBOL) when SQL is embedded in ‘C’, while the Java embedding of SQL (called SQLJ) uses its own syntax. A cursor over the result of a query may be declared as follows:
EXEC SQL
DECLARE 〈variable-name〉 CURSOR FOR
SELECT 〈attribute(s)-name〉
FROM 〈table(s)-name〉
WHERE 〈conditions〉
END-EXEC
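To show how such a cursor would then be used (an illustration only, following the same COBOL-style convention; the cursor name EMP-CURSOR and host variable WS-NAME are assumed):
EXEC SQL
     OPEN EMP-CURSOR
END-EXEC
EXEC SQL
     FETCH EMP-CURSOR INTO :WS-NAME
END-EXEC
EXEC SQL
     CLOSE EMP-CURSOR
END-EXEC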
REVIEW QUESTIONS
With reference to the above relations display the result of the following commands:
7. With reference to Fig. 5.22, write relational statements to answer the following queries:
8. What is Information System Based Language (ISBL)? What are its limitations?
9. Explain the syntax of ISBL for executing a query. Show a comparison of the syntax of ISBL and relational algebra.
10. How do we create an external relation using ISBL syntax?
11. What will be the output of the following ISBL syntax?
a. Print the supplier names, items, and prices of all suppliers that supply at least one
item ordered by M/s ABC Co.
b. Print the supplier names that supply every item ordered by M/s ABC Co.
c. Print the names of customers with negative balance.
16. How do we create an external relation using QUEL? Explain.
17. What is structured query language? What are its advantages and disadvantages?
18. Explain the syntax of SQL for executing query.
19. What is the basic data structure of SQL? What do you mean by SQL data type? Explain.
20. What are SQL operators? List them in a tabular form.
21. What are the uses of views? How are data retrieved using views.
22. What are the main components of SQL? List the commands/statements used under these
components.
23. What are logical operators in SQL? Explain with examples.
24. Write short notes on the following:
25. How do we create table, views and index using SQL commands?
26. What would be the output of following SQL statements?
27. What is embedded SQL? Why do we use it? What are its advantages?
28. The following four relations (tables), as shown in Fig. 5.23, constitute the database of an
appliance repair company named M/s ABC Appliances Company. The company maintains
the following information:
Formulate the SQL commands to answer the following requests for data from M/s ABC
Appliances Company database:
29. Using the database of M/s ABC Appliances Company of Fig. 5.23, translate the meaning of
following SQL commands and indicate their results with the data shown.
(a) SELECT *
    FROM TECHNICIAN
    WHERE JOB-TITLE = ‘Sr. Technician’
(b) SELECT APPL-NO, APPL-OWN, APPL-AGE
    FROM APPLIANCES
    WHERE APPL-TYPE = ‘Freezer’
    ORDER BY APPL-AGE
(c) SELECT APPL-TYPE, APPL-OWN
    FROM APPLIANCES
    WHERE APPL-AGE BETWEEN 4 AND 9
(d) SELECT COUNT(*)
    FROM TECHNICIAN
(e) SELECT AVG(RATE)
    FROM TYPES
    GROUP BY APPL-CAT
(f) SELECT APPL-NO, APPL-OWN
    FROM TYPES, APPLIANCES
    WHERE TYPES.APPL-TYPE = APPLIANCES.APPL-TYPE
    AND APPL-CAT = ‘Minor’
(g) SELECT APPL-NAME, APPL-OWN
    FROM TECHNICIAN, QUALIFICATION, APPLIANCES
    WHERE TECHNICIAN.TECH-ID = QUALIFICATION.TECH-NO
    AND QUALIFICATION.APPL-TYPE = APPLIANCES.APPL-TYPE
    AND TECH-NAME = ‘Rajesh Mathew’
30. What are the uses of SUM(), AVG(), COUNT(), MIN() and MAX()?
31. What is query-by-example (QBE)? What are its advantages?
32. List the QBE commands in relational database system. Explain the meaning of these
commands with examples.
33. Using the database of M/s ABC Appliances Company of Fig. 5.23, translate the meaning of
following QBE commands and indicate their results with the data shown.
34. Consider the following relational schema in which an employee can work in more than one
department.
STATE TRUE/FALSE
1. Dr. Edgar F. Codd proposed a set of rules that were intended to define the important
characteristics and capabilities of any relational system.
2. Codd’s Logical Data Independence rule states that user operations and application programs
should be independent of any changes in the logical structure of base tables provided they
involve no loss information.
3. The entire field of RDBMS has its origin in Dr. E.F. Codd’s paper.
4. ISBL has no aggregate operators for example, average, mean and so on.
5. ISBL has no facilities for insertion, deletion or modification of tuples.
6. QUEL is a tuple relational calculus language of a relational database system INGRESS
(Interactive Graphics and Retrieval System).
7. QUEL supports relational algebraic operations such as intersection, minus or union.
8. The first commercial RDBMS was IBM’s DB2.
9. The first commercial RDBMS was IDM’s INGRES.
10. SEQUEL and SQL are the same.
11. SQL is a relational query language.
12. SQL is essentially not a free-format language.
13. SQL statements can be invoked interactively in a terminal session but cannot be embedded in application programs.
14. In SQL data type of every data object is required to be declared by the programmer while
using programming languages.
15. HAVING clause is equivalent of WHERE clause and is used to specify the search criteria or
search condition when GROUP BY clause is specified.
16. HAVING clause is used to eliminate groups just as WHERE is used to eliminate rows.
17. If HAVING is specified, ORDER BY clause must also be specified.
18. ALTER TABLE command enables us to delete columns from a table.
19. The SQL data definition language provides commands for defining relation schemas,
deleting relations and modifying relation schemas.
20. In SQL, it is not possible to create local or global temporary tables within a transaction.
21. All tasks related to relational data management cannot be done using SQL alone.
22. DCL commands let users insert data into the database, modify and delete the data in the
database.
23. DML consists of commands that control the user access to the database objects.
24. If nothing is specified, the result set is stored in descending order, which is the default.
25. ‘*’ is used to get all the columns of a particular table.
26. The CREATE TABLE statement creates new base table.
27. A base table is not an autonomous named table.
28. DDL is used to create, alter and delete database objects.
29. SQL data administration statement (DAS) allows the user to perform audits and analysis on
operations within the database.
30. COMMIT statement ends the transaction successfully, making the database changes
permanent.
31. Data administration Commands allow the users to perform audits and analysis on operations
within the database.
32. Transaction control statements manage all the changes made by the DML statement.
33. DQL enables the users to query one or more table to get the information they want.
34. In embedded SQL, SQL statements are merged with the host programming language.
35. The DISTINCT keyword is illegal for MAX and MIN.
36. Application written in SQL can be easily ported across systems.
37. Query-By-Example (QBE) is a two-dimensional domain calculus language.
38. QBE was originally developed by M.M. Zloof at IBM’s T.J. Watson Research Centre.
39. QBE represents a visual approach for accessing information in a database through the use of
query templates.
40. The QBE make-table action query is an action query as it performs an action on existing
table or tables to create a new table.
41. QBE differs from SQL in that the user does not have to specify a structured query explicitly.
42. In QBE, user does not have to remember the names of the attributes or relations, because
they are displayed as part of the templates.
43. The delete action query of QBE deletes one or more than one records from a table or more
than one table.
a. the system should be able to perform all theoretically possible updates on views.
b. the logical description of the database is represented and may be interrogated by
authorised users, in the same way as for normal data.
c. the ability to treat whole tables as single objects applies to insertion, modification
and deletion, as well as retrieval of data.
d. null values are systematically supported independent of data type.
a. the system should be able to perform all theoretically possible updates on views.
b. the logical description of the database is represented and may be interrogated by
authorised users, in the same way as for normal data.
c. the ability to treat whole tables as single objects applies to insertion, modification
and deletion, as well as retrieval of data.
d. null values are systematically supported independent of data type.
a. SEQUEL
b. SQL
c. QUEL
d. All of these.
a. INDEX
b. CREATE
c. MODIFY
d. DELETE.
a. GET
b. RETRIEVE
c. SELECT
d. None of these.
a. COUNT
b. Intersection
c. Union
d. Subquery.
a. INGRES
b. DB2
c. ORACLE
d. None of these.
a. INGRESS
b. DB2
c. ORACLE
d. None of these.
a. CREATE TABLE
b. MAKE TABLE
c. CONSTRUCT TABLE
d. None of these.
a. TRIGGER
b. INDEX
c. TABLE
d. None of these.
14. The SQL data definition language (DDL) provides commands for:
a. DB2
b. SQL/DS
c. IMS
d. None of these.
a. MODIFY TABLE
b. UPDATE TABLE
c. ALTER TABLE
d. All of these.
20. Which of the following clause specifies the table or tables from where the data has to be
retrieved?
a. WHERE
b. TABLE
c. FROM
d. None of these.
22. Which of the following is used to get all the columns of a table?
a. *
b. @
c. %
d. #
a. LIKE
b. BETWEEN
c. IN
d. None of these
26. Which of the following clause is usually used together with aggregate functions?
a. ORDER BY ASC
b. GROUP BY
c. ORDER BY DESC
d. None of these.
a. ALTER
b. DROP
c. CREATE
d. SELECT.
a. ROLLBACK
b. GRANT
c. REVOKE
d. None of these.
a. UPDATE
b. COMMIT
c. INSERT
d. DELETE.
1. Information system based language (ISBL) is a pure relational algebra based query language,
which was developed in _____ in UK in the year _____.
2. ISBL was first used in an experimental interactive database management system called
_____.
3. In ISBL, to print the value of an expression, the command is preceded by _____.
4. _____ is a standard command set used to communicate with the RDBMS.
5. To query data from tables in a database, we use the _____ statement.
6. The expanded form of QUEL is _____.
7. QUEL is a tuple relational calculus language of a relational database system called _____.
8. QUEL is based on _____.
9. INGRES is the relational database management system developed at _____.
10. _____ is the data definition and data manipulation language for INGRES.
11. The data definition statements used in QUEL (a)_____, (b)_____, (c)_____, (d)_____ and
(e)_____.
12. The basic data retrieval statement in QUEL is _____.
13. SEQUEL was the first prototype query language of _____.
14. SEQUEL was implemented in the IBM prototype called _____ in early-1970s.
15. SQL was first implemented on a relational database called _____.
16. DROP operation of SQL is used for _____ tables from the schema.
17. The SQL data definition language provides commands for (a) _____, (b) _____, and (c) _____.
18. _____ is an example of data definition language command or statement.
19. _____ is an example of data manipulation language command or statement.
20. The _____ clause sorts or orders the results based on the data in one or more columns in the
ascending or descending order.
21. The _____ clause _____ specifies a summary query.
22. _____ is an example of data control language command or statement.
23. The _____ clause _____ specifies the table or tables from where the data has to be retrieved.
24. The _____ clause _____ directs SQL to include only certain rows of data in the result set.
25. _____ is an example of data administration system command or statement.
26. _____ is an example of transaction control statement.
27. SQL data administration statement (DAS) allows the user to perform (a) _____and (b) _____
on operations within the database.
28. The five aggregate functions provided by SQL are (a) _____, (b) _____, (c) _____, (d )
_____and (e) _____.
29. Portability of embedded SQL is _____.
30. Query-By-Example (QBE) is a two-dimensional _____ language.
31. QBE was originally developed by _____ at IBM’s T.J. Watson Research Centre.
32. The QBE _____ creates a new table from all or part of the data in one or more tables.
33. QBE’s _____ can be used to update or modify the values of one or more records in one or
more than one table in a database.
34. In QBE, the query is formulated by filling in _____ of relations that are displayed on the MS-Access screen.
Chapter 6
6.1 INTRODUCTION
6.2.1 Entities
An entity is an ‘object’ or a ‘thing’ in the real world with an independent
existence and that is distinguishable from other objects. Entities are the principal data objects about which information is to be collected. An entity
may be an object with a physical existence such as a person, car, house,
employee or city. Or, it may be an object with a conceptual existence such as
a company, an enterprise, a job or an event of informational interest. Each
entity has attributes. Some of the examples of the entity are given below:
An entity set (also called entity type) is a set of entities of the same type that
share the same properties or attributes. In E-R modelling, similar entities are
grouped into an entity type. An entity type is a group of objects with the
same properties. These are identified by the enterprise as having an
independent existence. It can have objects with physical (or real) existence
or objects with a conceptual (or abstract) existence. Each entity type is
identified by a name and a list of properties. A database normally contains many different entity types, and the term ‘entity’ refers to an entity type rather than to a single entity occurrence. In other words, the word ‘entity’ in E-R modelling corresponds to a table and not to a row in the relational environment. The E-R model refers to a specific
table row as an entity instance or entity occurrence. An entity occurrence
(also called entity instance) is a uniquely identifiable object of an entity type.
For example, in a relation (table) PERSONS, the person identification
(PERSON-ID), person name (PERSON-NAME), designation (DESG), date
of birth (DOB) and so on are all entities. In Fig. 6.1, there are two entity sets
namely PROJECT and PERSON.
Entity types can be classified as either strong or weak. An entity type that is not existence-dependent on some other entity type is called a strong entity type. A strong entity type has the characteristic that each entity occurrence is uniquely identifiable using the primary key attribute(s) of that entity type. Strong entity types are sometimes referred to as parent, owner or dominant entities. An entity type that is existence-dependent on some other entity type is called a weak entity type. A weak entity type has the characteristic that an entity occurrence cannot be uniquely identified using only the attributes associated with that entity type. Weak entity types are sometimes referred to as child, dependent or subordinate entities.
With reference to the semantic net of Fig. 6.1, Fig. 6.2 illustrates the distinction between an entity type and two of its instances.
Fig. 6.1 Semantic net of an enterprise
As shown in Fig. 6.4, there are three basic constructs of connectivity for
binary relationships, namely one-to-one (1:1), one-to-many (1:N) and many-to-many (M:N). In the case of a one-to-one connection, exactly one PERSON
manages the entity DEPT and each person manages exactly one DEPT.
Therefore, the maximum and minimum connectivities are exactly one for
both the entities. In case of one-to-many (1:N), the entity DEPT is associated
to many PERSON, whereas each person works within exactly one DEPT.
The maximum and minimum connectivities to the PERSON side are of
unknown value N, and one respectively. Both maximum and minimum
connectivities on DEPT side are one only. In case of many-to-many (M:N)
connectivity, the entity PERSON may work on many PROJECTS and each
project may be handled by many persons. Therefore, maximum connectivity
for PERSON and PROJECT are M and N respectively, and minimum
connectivities are each defined as one. If the values of M and N are 10 and 5
respectively, it means that the entity PERSON may be a member of a
maximum 5 PROJECTs, whereas, the entity PROJECT may contain
maximum of 10 PERSONs.
6.2.3 Attributes
An attribute is a property of an entity or a relationship type. An entity is
described using a set of attributes. All entities in a given entity type have the
same or similar attributes. For example, an EMPLOYEE entity type could
use name (NAME), social security number (SSN), date of birth (DOB) and
so on as attributes. A domain of possible values identifies each attribute
associated with an entity type. Each attribute is associated with a set of
values called a domain. The domain defines the potential values that an
attribute may hold and is similar to the domain concept in relational model
explained in Chapter 4, Section 4.3.1. For example, if the age of an
employee in an enterprise is between 18 and 60 years, we can define a set of
values for the age attribute of the ‘employee’ entity as the set of integers
between 18 and 60. Domain can be composed of more than one domain. For
example, domain for the date of birth attribute is made up of sub-domains
namely, day, month and year. Attributes may share a domain and is called
the attribute domain. The attribute domain is the set of allowable values for
one or more attributes. For example, the date of birth attributes for both
‘worker’ and ‘supervisor’ entities in an organisation can share the same
domain.
Fig. 6.6 Existence of a relationship
The attributes hold values that describe each entity occurrence and
represent the main part of the data stored in the database. For example, an
attribute NAME of the EMPLOYEE entity might be a set of 30-character strings, SSN might be a 10-digit integer and so on. Attributes can be assigned to
relationships as well as to entities. An attribute of a many-to-many
relationship such as the ‘works-on’ relationship of Fig. 6.4 between the
entities PERSON and PROJECT could be ‘task-management’ or ‘start-date’.
In this case, a given task assignment or start date is common only to an
instance of the assignment of a particular PERSON to a particular
PROJECT, and it would be multivalued when characterising either the
PERSON or the PROJECT entity alone. Attributes of relationships are
assigned only to binary many-to-many relationships and to ternary
relationships and normally not to one-to-one or one-to-many relationships.
This is because at least one side of the relationship is a single entity and
there is no ambiguity in assigning the attribute to a particular entity instead
of assigning it to relationship.
Attributes can be classified into the following categories:
Simple attribute.
Composite attribute.
Single-valued attribute.
Multi-valued attribute.
Derived attribute.
6.2.3.1 Simple Attributes
A simple attribute is an attribute composed of a single component with an
independent existence. A simple attribute cannot be subdivided or broken
down into smaller components. Simple attributes are sometimes called
atomic attributes. EMP-ID, EMP-NAME, SALARY and EMP-DOB of the EMPLOYEE entity are examples of simple attributes.
(a) Multi-attribute
(b) Multi-value
6.2.4 Constraints
Relationship types usually have certain constraints that limit the possible
combinations of entities that may participate in the corresponding
relationship set. The constraints should reflect the restrictions on the
relationships as perceived in the ‘real world’. For example, there could be a
requirement that each department in the entity DEPT must have a person and
each person in the PERSON entity must have a skill. The main types of
constraints on relationships are multiplicity, cardinality, participation and so
on.
An E-R model can be converted to relations, in which each entity set and
each relationship set is converted to a relation. Fig. 6.12 illustrates a
conversion of E-R diagram into a set of relations.
Fig. 6.12 Conversion of E-R model to relations
A separate relation represents each entity set and each relationship set.
The attributes of the entities in the entity set become the attributes of the
relation, which represents that entity set. The entity identifier becomes the
key of the relation and each entity is represented by a tuple in the relation.
Similarly, the attributes of the relationships in each relationship set become
the attributes of the relation, which represents the relationship set. The
relationship identifiers become the key of the relation and each relationship
is represented by a tuple in that relation.
The E-R model of Fig. 6.1 and Fig. 6.12 (a) is converted to the following
three relations as shown in Fig. 6.12 (b):
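Since Fig. 6.12 (b) is not reproduced here, the following is only a sketch of how such relations could be declared as SQL tables; the attribute names and data types are assumed, with PERSON and PROJECT taken as the entity sets of Fig. 6.1 and WORKS-ON as the relationship set between them.
CREATE TABLE PERSON
      (PERSON-ID    CHAR(6)       NOT NULL,
       PERSON-NAME  VARCHAR(30),
       PRIMARY KEY (PERSON-ID))

CREATE TABLE PROJECT
      (PROJ-ID      CHAR(6)       NOT NULL,
       PROJ-BUDGET  DECIMAL(12,2),
       PRIMARY KEY (PROJ-ID))

-- The relationship set becomes a relation whose key combines the keys of the
-- participating entity sets; START-DATE is an attribute of the relationship.
CREATE TABLE WORKS-ON
      (PERSON-ID    CHAR(6)       NOT NULL REFERENCES PERSON,
       PROJ-ID      CHAR(6)       NOT NULL REFERENCES PROJECT,
       START-DATE   DATE,
       PRIMARY KEY (PERSON-ID, PROJ-ID))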
b. An entity table with the embedded foreign key of the parent entity: This is one of the most
common ways CASE tools handle relationships. It prompts the user to define a foreign key in
the ‘child’ table that matches a primary key in the ‘parent’ table. This transformation rule
always occurs with the following relationships:
c. A relationship table with the foreign keys of all the entities in the relationship: This is the
other most common way CASE tools handle relationships in the E-R model. In this case, a
many-to-many (M:N) relationship can only be defined in terms of a table that contains
foreign keys that match the primary keys of the two associated entities. This new table may
also contain attributes of the original relationship. This transformation rule always occurs
with the following relationships:
In the above transformations, the following rules apply to handle SQL null
values:
Nulls are allowed in an entity table for foreign keys of associated (referenced) optional
entities.
Nulls are not allowed in an entity table for foreign keys of associated (referenced) mandatory
entities.
Nulls are not allowed for any key in a relationship table because only complete row entries are meaningful in such a table.
Some problems, called connection traps, may arise when creating an E-R
model. The connection traps normally occur due to a misinterpretation of the
meaning of certain relationships. There are mainly two types of connection
traps:
Fan traps.
Chasm traps.
a. Faculty can teach the same course in several semesters and each offering must be
recorded.
b. Faculty can teach the same course in several semesters and only the most recent
such offering needs to be recorded.
c. Every faculty must teach some course and only the most recent such offering needs
to be recorded.
d. Every faculty teaches exactly one course and every course must be taught by some
faculty.
13. Discuss the E-R symbols used for E-R diagram. Discuss the conventions for displaying an E-
R model database schema as an E-R diagram.
14. E-R diagram of Fig. 6.25 shows a simplified schema for an Airline Reservations System.
From the E-R diagram, extract the requirements and constraints that produced this schema.
15. A university needs a database to hold current information on its students. An initial analysis
of these requirements produced the following facts:
a. Each of the faculties in the university is identified by a unique name and a faculty
head is responsible for each faculty.
b. There are several major courses in the university. Some major courses are managed
by one faculty member, whereas others are managed jointly by two or more faculty
members.
c. Teaching is organised into courses and varying numbers of tutorials are organised
for each course.
d. Each major course has a number of required courses.
e. Each course is supervised by one faculty member.
f. Each major course has a unique name.
g. A student has to pass the prerequisite courses to take certain courses.
h. Each course is at a given level and has a credit-point value.
i. Each course has one lecturer in charge of the course. The university keeps a record
of the lecturer’s name and address.
j. Each course can have a number of tutors.
k. Any number of students can be enrolled in each of the major courses.
l. Each student can be enrolled in only one major course and the university keeps a
record of that student’s name and address and an emergency contact number.
m. Any number of students can be enrolled in a course and each student in a course can
be enrolled in only one tutorial for that course.
n. Each tutorial has one tutor assigned to it.
o. A tutor can tutor in more than one tutorial for one or more courses.
p. Each tutorial is given in an assigned class room at a given time on a given day.
q. Each tutor not only supervises tutorials but also is in charge of some course.
Identify the entities and relationships for this university and construct an E-R
diagram.
16. Some new information has been added in the database of Exercise 6.15, which are as
follows:
a. Some tutors work part time and some are full-time staff members. Some tutors (may
be from both full-time and part-time) are not in charge of any units.
b. Some students are enrolled in major courses, whereas others are enrolled in a single
course only. Change your E-R diagrams considering the additional information.
a. Attribute
b. Domain
c. Relationship
d. Entity
e. Entity set
f. Relationship set
g. 1:1 relationship
h. 1:N relationship
i. M:N relationship
j. Strong entity
k. Weak entity
l. Constraint
m. Role name
n. Identifier
o. Degree of relationship
p. Composite attribute
q. Multi-valued attribute
r. Derived attribute.
25. Define the concept of aggregation. Give few examples of where this concept is used.
26. We can convert any weak entity set into a strong entity set by adding appropriate attributes.
Why, then, do we have a weak entity set?
27. A person identified by a PER-ID and a LAST-NAME, can own any number of vehicles. Each
vehicle is of a given VEH-MAKE and is registered in any one of a number of states
identified STATE-NAME. The registration number (REG-NO) and the registration
termination date (REG-TERM-DATE) are of interest, and so is the address of a registration
office (REG-OFF-ADD) in each state.
Identify the entities and relationships for this enterprise and construct an E-R diagram.
28. An organisation purchases items from a number of suppliers. Suppliers are identified by
SUP-ID. It keeps track of the number of each item type purchased from each supplier. It also
keeps a record of supplier’s addresses. Supplied items are identified by ITEM-TYPE and
have description (DESC). There may be more than one such addresses for each supplier and
the price charged by each supplier for each item type is stored.
Identify the entities and relationships for this organisation and construct an E-R diagram.
29. Given the following E-R diagram of Fig. 6.26, define the appropriate SQL tables.
30. (a) Construct an E-R diagram for a hospital management system with a set of doctors and a
set of patients. With each patient, a series of various tests and examinations are conducted.
On the basis of preliminary report patients are admitted to a particular speciality ward.
(b) Construct appropriate tables for the above E-R diagram.
31. A chemical testing laboratory has several chemists who work on one or more projects.
Chemists may have a variety of equipment on each project. The CHEMIST has the attributes
namely EMP-ID (identifier), CHEM-NAME, ADDRESS and PHONE-NO. The PROJECT
has attributes such as PROJ-ID (identifier), START-DATE and END-DATE. The
EQUIPMENT has attributes such as EQUP-SERIAL-NO and EQUP-COST. The laboratory
management wants to record the EQUP-ISSUE-DATE when given equipment item is
assigned to a particular chemist working on a specified project. A chemist must be assigned
to at least one project and one equipment item. A given equipment item need not be assigned
and a given project need not be assigned either a chemist or an equipment item.
Draw an E-R diagram for this situation.
32. A project handling organisation has persons identified by a PER-ID and a LAST-NAME.
Persons are assigned to departments identified by a DEP-NAME. Persons work on projects
and each project has a PROJ-ID and a PROJ-BUDGET. Each project is managed by one
department and a department may manage many projects. But a person may work on only
some (or none) of the projects in his or her department.
a. Identify the entities and relationships for this organisation and construct an E-R
diagram.
b. Would your E-R diagram change if the person worked on all the projects in his or
her department?
c. Would there be any change if you also recorded the TIME-SPENT by the person on
each project?
STATE TRUE/FALSE
a. binary relationship.
b. ternary relationship.
c. recursive relationship.
d. none of these.
a. binary relationship.
b. ternary relationship.
c. recursive relationship.
d. none of these.
a. external.
b. internal.
c. conceptual.
d. all of these.
a. entity.
b. attribute.
c. relationship.
d. all of these.
a. composite attribute.
b. atomic attribute.
c. single-valued attribute.
d. derived attribute.
10. The attribute composed of multiple components, each with an independent existence is
called:
a. composite attribute.
b. simple attribute.
c. single-valued attribute.
d. derived attribute.
11. Which of these expresses the specific number of entity occurrences associated with one
occurrence of the related entity?
a. degree of relationship.
b. connectivity of relationship.
c. cardinality of relationship.
d. none of these.
FILL IN THE BLANKS
7.1 INTRODUCTION
The basic concepts of an E-R model discussed in Chapter 6 are adequate for
representing database schemas for traditional and administrative database
applications in business and industry such as customer invoicing, payroll
processing, product ordering and so on. However, it poses inherent problems
when representing complex applications of newer databases that are more
demanding than traditional applications such as Computer-aided Software
Engineering (CASE) tools, Computer-aided Design (CAD) and Computer-
aided Manufacturing (CAM), Digital Publishing, Data Mining, Data
Warehousing, Telecommunications applications, images and graphics,
Multimedia Systems, Geographical Information Systems (GIS), World Wide
Web (WWW) applications and so on. To represent these modern and more complex applications, designers use additional semantic modelling concepts. There are various abstractions available to capture semantic features that cannot be explicitly modelled by entities and relationships.
Enhanced Entity-Relationship (EER) model uses such additional semantic
concepts incorporated into the original E-R model to overcome these
problems. The EER model consists of all the concepts of the E-R model
together with the following additional concepts:
Specialisation/Generalisation
Categorisation.
This chapter describes the entity types called superclasses (or supertype)
and subclasses (or subtype) in addition to these additional concepts
associated with the EER model. How to convert the E-R model into EER
model has also been demonstrated in this chapter.
It can be noticed from the above attributes that all the three categories of
employees have some attributes in common such as EMP-ID, EMP-NAME,
ADDRESS, DATE-OF-BIRTH and DATE-OF-JOINING. In addition to
these common attributes, each type has one or more attributes that are unique to that type. For example, SALARY and ALLOWANCES are unique to a full-time employee, whereas HOURLY-RATE is unique to part-time employees and so on. While developing a conceptual data model in this
situation, the database designer might consider the following three choices:
a. Treat these entities as three separate ones. In this case, the model will fail to exploit the
common attributes of all employees and thus creating an inefficient model.
b. Treat these entities as a single entity, which contains a superset of all attributes. In this case,
the model requires the use of nulls (for the attributes for which the different entities have no value), thus making the design more complex.
Let us expand the EMPLOYEE example of Fig. 7.3 to illustrate the above
conditions, as shown in Fig. 7.4. The EMPLOYEE supertype has three
subtypes namely FULL-TIME-EMPLOYEE, PART-TIME-EMPLOYEE and
CONSULTANT. All employees have common attributes like EMP-ID,
EMP-NAME, ADDRESS, DATE-OF-BIRTH and DATE-OF-JOINING.
Each subtype has attributes unique to that subtype. Full time employees have
SALARY and ALLOWANCE, part time employees have HOURLY-RATE,
and consultants have CONTRACT-NO and BILLING-RATE. The full time
employees have a unique relationship with the TRAINING entity. Only full
time employees can enrol in the training courses conducted by the enterprise.
Thus, this is a case where one has to use a supertype/subtype relationship, as there exists an instance of a subtype that participates in a relationship unique to that subtype.
7.3.1 Specialisation
Specialisation is the process of identifying subsets of an entity set (the
superclass or supertype) that share some distinguishing characteristic. In
other words, specialisation maximises the differences between the members
of an entity by identifying the distinguishing and unique characteristics (or
attributes) of each member. Specialisation is a top-down process of defining
superclasses and their related subclasses. Typically the superclass is defined
first, the subclasses are defined next and subclass-specific attributes and
relationship sets are then added. If the specialisation approach were not applied, the three subtypes would look as depicted in Fig. 7.5 (a).
Creation of three subtypes for the EMPLOYEE supertype in Fig. 7.5 (b),
is an example of specialisation. The three subclasses have many attributes in
common. But there are also attributes that are unique to each subtype, for
example, SALARY and ALLOWANCES for the FULL-TIME-EMPLOYEE.
Also there are relationships unique to some subclasses, for example,
relationship of full-time employee to TRAINING. In this case, specialisation
has permitted a preferred representation of the problem domain.
Fig. 7.5 Example of specialisation
7.3.2 Generalisation
Generalisation is the process of identifying some common characteristics of a collection of entity sets and creating a new entity set that contains entities possessing these common characteristics. In other words, it is the process of
minimising the differences between the entities by identifying the common
features. Generalisation is a bottom-up process, just opposite to the
specialisation process. It identifies a generalised superclass from the original
subclasses. Typically, these subclasses are defined first, the superclass is
defined next and any relationship sets that involve the superclass are then
defined. Creation of the EMPLOYEE superclass with common attributes of
three subclasses namely FULL-TIME-EMPLOYEE, PART-TIME-
EMPLOYEE and CONSULTANT as shown in Fig. 7.7, is an example of
generalisation.
Fig. 7.7 Example of generalisation
If the subtypes are not constrained to be disjoint, the sets of entities may
overlap. In other words, the same real-world entity may be a member of
more than one subtype of the specialisation/generalisation. This is called an
overlapping constraint. For example, the subtypes of EMPLOYEE
supertype namely UNION-MEMBER and CLUB-MEMBER are connected
as shown in Fig. 7.9 (b). This means that an employee can be a member of
one, or two of the subtypes. In other words, an employee can be a union-
member as well as a club-member. The overlapping constraint is represented
by placing letter ‘o’ in the circle that connects the subtypes to the supertype.
7.4 CATEGORISATION
REVIEW QUESTIONS
1. What are the disadvantages or limitations of an E-R Model? What led to the development of
EER model?
2. What do you mean by superclass and subclass entity types? What are the differences between
them? Explain with an example.
3. Using a semantic net diagram, explain the concept of superclasses and subclasses.
4. With an example, explain the notations used for EER diagram while designing database for
an enterprise.
5. What do you mean by attribute inheritance? Why do we use it in an EER diagram? Explain with an example.
6. Differentiate between a shared subtype and multiple inheritance.
7. What are the conditions that must be considered while deciding on supertype/subtype
relationship? Explain with an example.
8. What are the advantages of using supertypes and subtypes?
9. What do you understand by specialisation and generalisation in EER modelling? Explain
with examples.
10. Discuss the constraints on specialisation and generalisation.
11. What is participation constraint? What are its types? Explain with an example.
12. What is partial participation? Explain with an example.
13. What is mandatory participation? Explain with an example.
14. What do you mean by disjoint constraints of specialisation/generalisation? Explain with an
example.
15. What is overlapping constraint? Explain with an example.
16. A non-government organisation (NGO) depends on the number of different types of persons
for its operations. The NGO is interested in three types of persons namely volunteers, donors
and patrons. The attributes of such persons are person identification number, person name,
address, city, pin code and telephone number. The patrons have only a date-elected attribute
while the volunteers have only a skill attribute. The donors only have a relationship ‘donates’
with an ITEM entity type. A donor must have donated one or more items and an item may
have no donors, or one or more donors. There are persons other than donors, volunteers and
patrons who are of interest to the NGO, so that a person need not belong to any of these three
groups. On the other hand, at a given time a person may belong to two or more of these
groups.
Draw an EER diagram for this NGO database schema.
17. Draw an EER diagram for a typical banking organisation. Make assumptions wherever
required.
STATE TRUE/FALSE
1. Subclasses are the sub-grouping of occurrences of entities in an entity type that shares
common attributes or relationships distinct from other sub-groupings.
2. In case of supertype, objects in one set are grouped or subdivided into one or more classes in
many systems.
3. Superclass is a generic entity type that has a relationship with one or more subtypes.
4. Each member of the subclass is also a member of the superclass.
5. The relationship between a superclass and a subclass is a one-to-many (1:N) relationship.
6. The U-shaped symbols in the EER model indicate that the supertype is a subset of the subtype.
7. Attribute inheritance is the property by which supertype entities inherit values of all
attributes of the subtype.
8. Specialisation is the process of identifying subsets of an entity set of the superclass or
supertype that share some distinguishing characteristic.
9. Specialisation minimizes the differences between members of an entity by identifying the
distinguishing and unique characteristics of each member.
10. Generalisation is the process of identifying some common characteristics of a collection of
entity sets and creating a new entity set that contains entities processing these common
characteristics.
11. Generalisation maximizes the differences between the entities by identifying the common
features.
12. Total participation is also called an optional participation.
13. A total participation specifies that every member (or entity) in the supertype (or superclass)
must participate as a member of some subclass in the specialisation/generalisation.
14. The participation constraint can be total or partial.
15. A partial participation constraint specifies that a member of a supertype need not belong to
any of its subclasses of a specialisation/generalisation.
16. A nondisjoint constraint is also called an overlapping constraint.
17. A partial participation is also called a mandatory participation.
18. Disjoint constraint specifies the relationship between members of the subtypes and indicates
whether it is possible for a member of a supertype to be a member of one, or more than one,
subtype.
19. The disjoint constraint is only applied when a supertype has one subtype.
20. A partial participation is represented using a single line between the supertype and the
specialisation/generalisation circle.
21. A subtype is not an entity on its own.
22. A subtype cannot have its own subtypes.
1. The U-shaped symbols on each line connecting a subtype to the circle indicate that the subtype is a
a. maximized.
b. minimized.
c. both of these.
d. none of these.
5. Specialisation is a
a. extended E-R.
b. effective E-R.
c. expanded E-R.
d. enhanced E-R.
7. Which are the additional concepts that are added in the E-R model?
a. specialisation.
b. generalisation.
c. supertype/subtype entity.
d. all of these.
DATABASE DESIGN
Chapter 8
8.1 INTRODUCTION
The number and types of data fields, whether one composite database or several separate databases are needed, and the other details necessary to fulfil the requirements of an enterprise are derived from the information system strategic planning exercise known as the information system life cycle. This is also called the software development life cycle (SDLC).
In this chapter, the basic concepts of the software development life cycle (SDLC), structured system analysis and design (SSAD), the database development life cycle (DDLC) and automated design tools are explained.
8.2 SOFTWARE DEVELOPMENT LIFE CYCLE (SDLC)
Fig. 8.1 Approximate relative costs for the phases of the software process
Fig. 8.2 Relative cost of fixing a fault at each phase of the software process
Fig. 8.3 Relative cost to fix an error (fault) plotted on linear scale
Fig. 8.5 A typical example of DFD for modelling of steel making process
Testing and Validation: At this stage, the new database system is tested
and validated for its intended results.
As shown in Fig. 8.7, the conceptual database design stage involves two parallel activities
namely:
The technical factors are concerned with the suitability of the DBMS for
the intended task to be performed. It considers issues such as:
Types of DBMS, such as relational, hierarchical, network, object-oriented, object-relational and so on.
The storage structures.
Access paths that the DBMS supports.
User and programmer interfaces available.
Types of high-level query languages.
Availability of development tools.
Ability to interface with other DBMS via standard interfaces.
Architectural options related to client-server operation.
The economic factors are concerned with the costs of the DBMS product
and consider the following issues:
Costs of additional hardware and software required to support the database system.
Purchase cost of basic DBMS software and other products such as language options, different
interface options such as forms, menu and Web-based graphic user interface (GUI) tools,
recovery and backup options, special access methods, documentation and so on.
Cost associated with the changeover.
Cost of staff training.
Maintenance cost.
Cost of database creation and conversion.
8.3.1.4 Logical Database Design
The logical database design may be defined as the process of the following:
Creating a conceptual schema and external schemas from the high-level data model of
conceptual database design stage into the data model of the selected DBMS by mapping
those schemas produced in conceptual design stage.
Organising the data fields into non-redundant groupings based on the data relationship and an
initial arrangement of those logical groupings into structures based on the nature of the
DBMS and the applications that will use the data.
Constructing a model of the information used in an enterprise based on a specific data model,
but independent of a particular DBMS and other physical considerations.
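As a small illustration of this mapping, a one-to-many relationship identified during conceptual design, say a DEPARTMENT employing many EMPLOYEEs, could be carried into the relational model of a selected relational DBMS as sketched below; the table and column names here are assumed for illustration only.

CREATE TABLE DEPARTMENT (
    DEPT_NO   INTEGER PRIMARY KEY,
    DEPT_NAME VARCHAR(60)
);

-- The 1:N relationship becomes a foreign key on the "many" side.
CREATE TABLE EMPLOYEE (
    EMP_ID   INTEGER PRIMARY KEY,
    EMP_NAME VARCHAR(60),
    DEPT_NO  INTEGER REFERENCES DEPARTMENT (DEPT_NO)
);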
8.3.1.6 Prototyping
Prototyping is a rapid method of interactively building a working model of
the proposed database application. It is one of the rapid application
development (RAD) methods to design a database system. RAD is an
interactive process of rapidly repeating analysis, design and implementation
steps until it fulfils the user requirements. Therefore, prototyping is an
interactive process of database systems development in which the user
requirements are converted to a working system that is continually revised
through close work between database designer and the users.
A prototype does not normally have all the required features and
functionality of the final system. It basically allows users to identify the
features of the proposed system that work well, or are inadequate and if
possible to suggest improvements or even new features to the database
application.
Fig. 8.8 shows the prototyping steps. With the increasing use of visual
programming tools such as Java, Visual Basic, Visual C++ and fourth
generation languages, it has become very easy to modify the interface
between the system and the user while prototyping. Prototyping has the following advantages:
Relatively inexpensive.
Quick to build.
Easy to change the contents and layout of user reports and displays.
With changing needs and evolving system requirements, the prototype database can be
rebuilt.
Fig. 8.9 illustrates the three application levels of CASE tools with respect
to the database development life cycle (DDLC) of Fig. 8.6. Upper-CASE
tools support the initial stages of DDLC, that is, from feasibility study and
requirement analysis through the database design. Lower-CASE tools
support the later stages of DDLC, that is, from database implementation
through testing, to monitoring and maintenance. Integrated-CASE tools support all stages of the DDLC and provide the functionality of both upper-CASE and lower-CASE in one tool.
Facilities provided by CASE Tools: CASE tools provide the following
facilities to the database designer:
Create a data dictionary to store information about the database application’s data.
Design tools to support data analysis.
Tools to permit development of the corporate data model and the conceptual and logical data
models.
Help in drawing conceptual schema diagrams using entity-relationship (E-R) and other notations, including entity types, relationship types, attributes, keys and so on.
Generate schemas (or codes) in SQL DDL for various RDBMSs for model mapping and
implementing algorithms.
Decomposition and normalisation.
Indexing.
Tools to enable prototyping of applications.
Performance monitoring and measurement.
Fig. 8.9 Application levels of CASE Tools
Some of the popular CASE tools being used for database design are
shown in Table 8.1.
Table 8.1 Popular CASE tools for database design
REVIEW QUESTIONS
1. What is the software development life cycle (SDLC)? What are the different phases of an SDLC?
2. What is the cost impact of frequent software changes? Explain.
3. What is structured system analysis and design (SSAD)? Explain.
4. What do you mean by database development life cycle (DDLC)? When does DDLC start?
5. What are the various stages of DDLC? Explain each of them.
6. What are the different approaches of database design? Explain each of them.
7. What are the different phases of database design? Discuss each phase.
8. Discuss the relationship between the SDLC and DDLC.
9. Write short notes on the following:
a. Conceptual database design
b. Logical database design
c. Physical database design
d. Prototyping
e. CASE tools
f. DBMS selection.
g. Database implementation and tuning
10. Which of the different phases of database design are considered the main activities of the
database design process itself? Why?
11. Consider an actual application of a database system for an off-shore software development
company. Define the requirements of the different levels of users in terms of data needed,
types of queries and transactions to be processed.
12. What functions do the typical automated database design tools provide?
13. What are the limitations of manual database design?
14. Discuss the main purpose and activities associated with each phase of the DDLC.
15. Compare and contrast the various phases of database design.
16. Identify the stage where it is appropriate to select a DBMS and describe an approach to
selecting the best DBMS for a particular use.
17. Describe the main advantages of using a prototyping approach when building a database
application.
18. What are computer-aided software engineering (CASE) tools?
19. What are the facilities provided by CASE tools?
20. What should be the characteristics of the right CASE tools?
21. List the various types of CASE tools and their functions provided by different vendors.
STATE TRUE/FALSE
1. Database design is a process of arranging the data fields into an organised structure needed
by one or more applications.
2. Frequent software changes, without change in specifications, is an indication of a good
design.
3. Maintenance is an extremely time-consuming and expensive phase of the software process.
4. Relative costs of fixing a fault at later phases, is less as compared to fixing the fault at the
early phases of the software process.
5. Database requirement phase of database design is the process of detailed analysis of the
expectations of the users and intended uses of the database.
6. The conceptual database design is dependent on specific DBMS.
7. The physical database design is independent of any specific DBMS.
8. The bottom-up approach is appropriate for the design of simple databases with a relatively
small number of attributes.
9. The top-down approach is appropriate for the design of simple databases with a relatively
small number of attributes.
10. The top-down database design approach starts at the fundamental level of abstractions.
11. The bottom-up database design approach starts with the development of data models that
contains high-level abstractions.
12. The inside-out database design approach uses both the bottom-up and top-down approach
instead of following any particular approach.
13. The mixed strategy database design approach starts with the identification of set of major
entities and then spreading out.
14. The objective of developing a prototype of the system is to demonstrate the understanding of
the user requirements and the functionality that will be provided in the proposed system.
a. data redundancy
b. data independence
c. data security
d. all of these.
2. Structured system analysis and design (SSAD) is a software engineering approach to the
specification, design, construction, testing and maintenance of software for
3. The database development life cycle (DDLC) is to meet strategic or operational information
needs of an organisation and is a process of
a. designing.
b. implementing.
c. maintaining.
d. all of these.
5. Which of the following is the SDLC phase that starts after the software is released into use?
10. Which of the following design is both hardware and software independent?
a. conceptual
b. logical
c. physical
d. none of these.
1. Database design is a process of arranging the _____ into an organised structure needed by
one or more _____.
2. Frequent software changes, without change in specifications, is an indication of a _____
design.
3. Structured system analysis and design (SSAD) is a software engineering approach to the
specification, design, construction, testing and maintenance of software for (a) maximising
the _____ and _____ of the system as well as for reducing _____.
4. The _____ is the main source of database development projects.
5. The four database design approaches are (a) _____, (b) _____, (c) _____ and (d) _____.
6. The bottom-up database design approach starts at the _____.
7. The top-down database design approach starts at the _____.
8. The inside-out database design approach starts at the _____.
9. It is in the _____ phase that the system design objectives are defined.
10. In the _____ phase, the conceptual database design is translated into internal model for the
selected DBMS.
Chapter 9
9.1 INTRODUCTION
Example 1
Example 2
Similarly, in the above example, if the person also uses one machine each
day, then FD can be given as:
Example 3
It means that in an ASSIGN relation (or table), once the values of EMP-NO and PROJECT are known, a unique value of YRS-SPENT-BY-EMP-ON-PROJECT can also be determined. Fig. 9.5 (b) shows the functional dependency diagram (FDD) for this example.
Fig. 9.5 Example 3
Example 4
Example 5
A number of (or all) functional dependencies can be represented on one
functional dependency diagram (FDD). In this case FDD contains one entry
for each attribute and shows all functional dependencies between attributes.
Fig. 9.8 shows a functional dependency diagram with a number of functional
dependencies.
As shown in FDD of Fig. 9.8, suppliers make deliveries to warehouses.
One of the attributes, WAREHOUSE-NAME, identifies a warehouse. Each
warehouse has one WAREHOUSE-ADDRESS. An attribute QTY-IN-
STORE-ON-DATE is determined by the combination of attributes
WAREHOUSE-NAME, INVENTORY-DATE and PART-NO. This is an
example of a technique for modelling of time variations by functional
dependencies.
Another technique used by functional dependencies is modelling
composite identifiers. As shown in Fig. 9.8, a delivery is identified by a
combination of SUPPLIER-NAME and DELIVERY-NO within the supplier.
The QTY-DELIVERED of a particular part is determined by the
combination of this composite identifier, that is, SUPPLIER-NAME and
DELIVERY-NO.
Fig. 9.8 also shows one-to-one dependencies, in which each warehouse
has one manager and each manager manages one warehouse. These
dependencies are modelled by the double arrow between WAREHOUSE-
NAME and MANAGER-NAME, showing that the following statements are true:
Rule 4 Self-determination: X → X.
Rule 5 Pseudo-transitivity: If X → Y and YW → Z, then XW → Z.
Rule 6 Union or additive: If X → Z and X → Y, then X → YZ.
Rule 7 Decomposition or projective: If X → YZ, then X → Y and X → Z.
Rule 8 Composition: If X → Y and Z → W, then XZ → YW.
Rule 9 Self-accumulation: If X → YZ and Z → W, then X → YZW.
Fig. 9.10 Diagrammatical representation of Armstrong’s axioms
F = {Z → A, B → X, AX → Y, ZB → Y}
Fig. 9.11 Membership algorithm to find redundant FDs
F = {A → B, B → C, C → D, D → E, E → F, F → G, G → H}
Using the algorithm of Fig. 9.13, the closure sets with respect to F can be calculated as follows:
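For instance, assuming that the algorithm of Fig. 9.13 computes the attribute closure X+ (the set of all attributes functionally determined by X), repeatedly applying the dependencies in F gives:

{A}+ = {A, B, C, D, E, F, G, H}
{B}+ = {B, C, D, E, F, G, H}
{C}+ = {C, D, E, F, G, H}
and so on down the chain, until
{G}+ = {G, H}
{H}+ = {H}

Each closure is obtained by starting with the attribute itself and adding the right-hand side of every FD whose left-hand side already lies in the set, until no more attributes can be added.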
9.3 DECOMPOSITION
A functional decomposition is the process of breaking down the functions of
an organisation into progressively greater (finer and finer) levels of detail. In
decomposition, one function is described in greater detail by a set of other
supporting functions. In other words, decomposition is done to break a relation into smaller relations so that the data model can be converted into normal forms and redundancies avoided. The decomposition of a relation scheme R consists of
replacing the relation schema by two or more relation schemas that each
contain a subset of the attributes of R and together include all attributes in R.
The algorithm of relational database design starts from a single universal
relation schema R= {A1, A2, A3…, An}, which includes all the attributes of
the database. The universal relation states that every attribute name is
unique. Using the functional dependencies, the design algorithms decompose
the universal relation schema R into a set of relation schemas D = {R1 ,R2,
R3,… Rm}. Now, D becomes the relational database schema and D is called
a decomposition of R.
The decomposition of a relation scheme R = {A1, A2, A3, …, An} is its replacement by a set of relation schemes D = {R1, R2, R3, …, Rm}, such that
Ri ⊆ R for 1 ≤ i ≤ m
and R1 ∪ R2 ∪ R3 ∪ … ∪ Rm = R
The decomposition has the lossless (non-additive) join property if the original relation can be recovered by joining its projections, that is,
R = ∏R1 (R) ⋈ ∏R2 (R) ⋈ … ⋈ ∏Rm (R)
where ∏ = projection
⋈ = the natural join of all relations in D.
For example, for a relation R (X, Y, Z):
R1 = projection of R over X, Y
R2 = projection of R over X, Z
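A small SQL sketch of the same idea follows, assuming a relation R with attributes X, Y and Z (the names are illustrative). In dialects that support CREATE TABLE ... AS, the two projections can be materialised and re-joined on X, and the result compared with the original relation.

-- Project R onto {X, Y} and {X, Z}.
CREATE TABLE R1 AS SELECT DISTINCT X, Y FROM R;
CREATE TABLE R2 AS SELECT DISTINCT X, Z FROM R;

-- If the decomposition is lossless, this join over X returns exactly
-- the tuples of the original relation R, no more and no fewer.
SELECT R1.X, R1.Y, R2.Z
FROM R1 JOIN R2 ON R1.X = R2.X;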
REVIEW QUESTIONS
1. What do you mean by functional dependency? Explain with an example and a functional
dependency diagram.
2. What is the importance of functional dependencies in database design?
3. What are the main characteristics of functional dependencies?
4. Describe Armstrong’s axioms. What are derived rules?
5. Describe how a database designer typically identifies the set of FDs associated with a
relation.
6. A relation schema R (A, B, C) is given, which represents a relationship between two entity
sets with primary key A and B respectively. Let us assume that R has the FDs A → B and B
→ A, amongst others. Explain what such a pair of dependencies means about the relationship
in the database model?
7. What is a functional dependency diagram? Explain with an example.
8. Draw a functional dependency diagram (FDD) for the following:
a. Which of the following dependencies can you infer does not hold over schema S?
i. A → B
ii. BC → A
iii. B → C.
Which of the following decompositions of R = ABCDEG, with the same set of dependencies
F, is
(a) dependency-preserving and (b) lossless-join?
12. What is the dependency preservation property for decomposition? Why is it important?
13. Let R be decomposed into R1, R2,…, Rn and F be a set of functional dependencies (FDs) on
R. Define what it means for F to be preserved in the set of decomposed relations.
14. A relation R having three attributes ABC is decomposed into relations R1 with attributes AB
and R2 with attributes BC. State the definition of lossless-join decomposition with respect to
this example, by writing a relational algebra equation involving R, R1, and R2.
15. What is the lossless or non-additive join property of decomposition? Why is it important?
16. The following relation is given:
A university can have any number of campuses. Each campus has one library. Each library is
on one campus. Each library has a distinct name. A student is at one university only and can
use the libraries at some, but not all, of the campuses.
Decomposition 1
R1 (UNIVERSITY, CAMPUS, LIBRARY)
R2 (STUDENT, UNIVERSITY)
Decomposition 2
R1 (UNIVERSITY, CAMPUS, LIBRARY)
R2 (STUDENT, LIBRARY)
17. Consider the relation SUPPLIES given as:
Now the above relation is decomposed into the following two relations:
18. What do you mean by trivial dependency? What is its significance in database design?
19. What are redundant functional dependencies? Explain with an example. Discuss the
membership algorithm to find redundant FDs.
20. What do you mean by the closure of a set of functional dependencies? Discuss how
Armstrong’s axioms can be used to develop algorithm that will allow computing F+ from F.
21. Illustrate the three Armstrong’s axioms using diagrammatical representation.
22. A relation R(A, B, C, D) is given. For each of the following sets of FDs, assuming they are
the only dependencies that hold for R, state whether or not the proposed decomposition of R
into smaller relations is a good decomposition. Briefly explain your answer why or why not.
23. Consider the relation R (ABCD) and the FDs {A → B, C → D, A → E}. Is the decomposition
of R into (ABC), (BCD) and (CDE) lossless?
24. Remove any redundant FDs from the following sets of FDs:
Set 1: A → B, B → C, AD → C
Set 2: XY → V, ZW → V, VX → Y, W → Y, Z → X
Set 3: PQ → R, PS → Q, QS → P, PR → Q, S → R.
26. Consider that there are the following requirements for a university database to keep track of
students’ transcripts:
a. The university keeps track of each student’s name (STDT-NAME), student number
(STDT-NO), social security number (SS-NO), present address (PREST-ADDR),
permanent address (PERMT-ADDR), present contact number (PREST-CONTACT-
NO), permanent contact number (PERMT- CONTACT-NO), date of birth (DOB),
sex (SEX), class (CLASS) for example fresh, graduate and so on, major department
(MAJOR-DEPT), minor department (MINOR-DEPT) and degree program (DEG-
PROG) for example, BA, BS, PH.D and so on. Both SS-NO and STDT-NO have
unique values for each student.
b. Each department is described by a name (DEPT-NAME), department code (DEPT-
CODE), office number (OFF-NO), office phone (OFF-PHONE) and college
(COLLEGE). Both DEPT-NAME and DEPT-CODE have unique values for each
department.
c. Each course has a course name (COURSE-NAME), description (COURSE-DESC),
course number (COURSE-NO), credit for number of semester hours (CREDIT),
level (LEVEL) and course offering department (COURSE-DEPT). The COURSE-
NO is unique for each course.
d. Each section has a faculty (FACULTY-NAME), semester (SEMESTER), year
(YEAR), section course (SEC-COURSE) and section number (SEC-NO). The SEC-
NO distinguishes different sections of the same course that are taught during the
same semester/year. The values of SEC-NO are 1, 2, 3,…, up to the total number of
sections taught during each semester.
e. A grade record refers to a student (SS-NO), a particular section (SEC-NO), and a
grade (GRADE).
i. Design a relational database schema for this university database application.
ii. Specify the key attributes of each relation.
iii. Show all the FDs that should hold among attributes.
Make appropriate assumptions for any unspecified requirements to render the specification
complete.
STATE TRUE/FALSE
1. A functional dependency is a
a. redundancy
b. inconsistencies
c. anomalies
d. all of these.
a. loss of information.
b. loss of attributes.
c. loss of relations.
d. none of these.
a. X is functionally dependent on Y.
b. X is not functionally dependent on any subset of Y.
c. both (a) and (b).
d. none of these.
Normalization
10.1 INTRODUCTION
10.2 NORMALIZATION
A normal form is a state of a relation that results from applying simple rules
regarding functional dependencies (FDs) to that relation. It refers to the
highest normal form of condition that it meets. Hence, it indicates the degree
to which it has been normalised. The normal forms are used to ensure that
various types of anomalies and inconsistencies are not introduced into the
database. For determining whether a particular relation is in normal form or
not, the FDs between the attributes in the relation are examined and not the
current contents of the relation. C. Beeri and his co-workers first proposed a notation to emphasise these relational characteristics. They proposed that the relation is defined as containing two components, namely (a) the attributes and (b) the FDs between them. It takes the form
The first component of the relation R1 is the attributes, and the second
component is the FDs. For example, let us look at the relation ASSIGN of
Table 10.2.
The FDs between attributes are important when determining the relation’s
key. A relation key uniquely identifies a tuple (row). Hence, the key or prime
attributes uniquely determines the values of the non-key or non-prime
attributes. Therefore, a full FD exists from the prime to the nonprime
attributes. It is with full FDs whose determinants are not keys of a relation
that problems arise. For example, in the relation ASSIGN of Table 10.2, the
key is {EMP-NO, PROJECT}. However, PROJECT-BUDGET depends on
only part of the key. Alternatively, the determinant of the FD: PROJECT →
PROJECT-BUDGET is not the key of the relation. This undesirable property
causes the anomalies. Conversion to normal forms requires a choice of
relations that do not contain such undesirable dependencies. Various types of
normal forms used in relational database are as follows:
First normal form (1NF).
Second normal form (2NF).
Third normal form (3NF).
Boyce/Codd normal form (BCNF).
Fourth normal form (4NF).
Fifth normal form (5NF).
Example 1
Consider a relation LIVED_IN, as shown in Fig. 10.2 (a), which keeps
records of person and his residence in different cities. In this relation, the
domain RESIDENCE is not simple. For example, the person “Abhishek” can have residences in Jamshedpur, Mumbai or Delhi. Therefore, the relation
is un-normalised. Now, the relation LIVED_IN is normalised by combining
each row in residence with its corresponding value of PERSON and making
this combination a tuple (row) of the relation, as shown in Fig. 10.2 (b).
Thus, now non-simple domain RESIDENCE is replaced with simple
domains.
Example 2
It can be observed from the relational table that a doctor cannot have two simultaneous appointments, and thus DOCTOR-NAME and DATE-TIME together form a compound key. Similarly, a patient cannot have an appointment at the same time with two different doctors. Therefore, the PATIENT-NAME and DATE-TIME attributes together also form a candidate key.
Problems with 1NF
1NF contains redundant information. For example, the relation PATIENT_DOCTOR in 1NF
of Table 10.3 has the following problems with the structure:
a. A doctor, who does not currently have an appointment with a patient, cannot be
represented.
b. Similarly, we cannot represent a patient who does not currently have an appointment
with a doctor.
c. There is redundant information such as the patient’s date-of-birth and the doctor’s
phone numbers, stored in the table. This will require considerable care while
inserting new records, updating existing records or deleting records to ensure that
all instances retain the correct values.
d. While deleting the last remaining record containing details of a patient or a doctor,
all records of that patient or doctor will be lost.
Example 1
Table 10.4 Relation PATIENT_DOCTOR decomposed into two tables for refinement into 2NF
Relation: DOCTOR
DOCTOR-NAME CONTACT-NO
Abhishek 657-2145063
Sanjay 651-2214381
Thomas 011-2324567
Thomas 011-2324567
Abhishek 657-2145063
Thomas 011-2324567
Sanjay 651-2214381
Abhishek 657-2145063
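Expressed in SQL DDL, the refinement of Table 10.4 could be sketched as follows; the APPOINTMENT table name and the column types are assumed here for illustration (only the DOCTOR part of the decomposition is reproduced above), and the identifiers are written with underscores so that the statements are valid SQL.

-- Doctor details are stored once, removing the repeated contact numbers.
CREATE TABLE DOCTOR (
    DOCTOR_NAME VARCHAR(60) PRIMARY KEY,
    CONTACT_NO  VARCHAR(15)
);

-- Each appointment row carries only the compound key and the patient,
-- and refers back to DOCTOR for the doctor's details.
CREATE TABLE APPOINTMENT (
    DOCTOR_NAME  VARCHAR(60) REFERENCES DOCTOR (DOCTOR_NAME),
    DATE_TIME    TIMESTAMP,
    PATIENT_NAME VARCHAR(60),
    PRIMARY KEY (DOCTOR_NAME, DATE_TIME)
);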
Example 2
Table 10.5 Decomposition of relations ASSIGN into ASSIGN and PROJECTS as 2NF
Relation: ASSIGN
EMP-NO PROJECT YRS-SPENT-BY-EMP-ON-PROJECT
106519 P1 5
112233 P3 2
106519 P2 5
123243 P4 10
106519 P3 3
111222 P1 4
(a)
Relation: PROJECT
PROJECT PROJECT-BUDGET
P1 INR 100 CR
P2 INR 150 CR
P3 INR 200 CR
P4 INR 100 CR
P5 INR 150 CR
P6 INR 300 CR
(b)
As can be seen from Table 10.6 and Table 10.7, each project is in one department, and each department has one address. It is, however, possible for a department to include more than one project. The relation has
only one relation (primary) key, namely, PROJECT. Both DEPARTMENT
and DEPARTMENT- ADDRESS are fully functionally dependent on
PROJECT. Thus, relation PROJECT_DEPARTMENT is in 2NF.
Relation: EMPLOYEE
EMP-ID EMP-NAME
106519 Kumar Abhishek
(a)
Relation: PROJECT-ASSIGNMENT
EMP-NO PROJECT YRS-SPENT-BY-EMP-ON-PROJECT
106519 P1 20.05.04
112233 P1 11.11.04
106519 P2 03.03.05
123243 P3 12.01.05
112233 P4 30.03.05
(b)
Example 1
Example 2
Relation: PROJECT
PROJECT PROJECT-BUDGET DEPARTMENT
P1 INR 100 CR Manufacturing
P2 INR 150 CR Manufacturing
P3 INR 200 CR Manufacturing
P4 INR 100 CR Training
(a)
Relation: DEPARTMENT
DEPARTMENT DEPARTMENT-ADDRESS
Manufacturing Jamshedpur-1
Manufacturing Jamshedpur-1
Manufacturing Jamshedpur-1
Training Mumbai-2
(b)
Example 3
In the previous examples of 3NF, only one relation (primary) key has been
used. Conversion into 3NF becomes problematic when the relation has more
than one relation key. Let us consider another relation USE, as shown in
Fig. 10.7 (a). Functional dependency diagram (FDD) of relation USE is
shown in Fig. 10.7 (b).
As shown in Fig. 10.7 (a), the relation USE stores the machines used by
both projects and project managers. Each project has one project manager
and each project manager manages one project. Now, it can be observed that
this relation USE has two relation (primary) keys, namely, {PROJECT,
MACHINE} and {PROJ-MANAGER, MACHINE}. The keys overlap
because MACHINE appears in both keys, whereas, PROJECT and PROJ-
MANAGER each appear in one relation key only.
The relation USE of Fig. 10.7 has only one non-prime attribute, called QTY-USED, which is fully functionally dependent on each of the two relation keys. Thus, relation USE is in 2NF. Furthermore, as there is only one non-prime attribute QTY-USED, there can be no dependencies between non-prime attributes. Thus, the relation USE is also in 3NF.
There is, however, a dependency between PROJECT and PROJ-MANAGER, both of which appear in one relation key only. This dependency leads to redundancy.
Example 1
Relation USE in Fig. 10.7 (a) does not satisfy the above condition, as it
contains the following two functional dependencies:
PROJ-MANAGER → PROJECT
PROJECT → PROJ-MANAGER
Both of the above relations are in BCNF. The only FDs between the USE attributes are
PROJECT → PROJ-MANAGER
PROJ-MANAGER → PROJECT
This table lists the projects, the parts, the quantities of those parts they use
and the vendors who supply these parts. There are two assumptions. Firstly,
each project is supplied with a specific part by only one vendor, although a
vendor can supply that part to more than one project. Secondly, a vendor
makes only one part but the same part can be made by other vendors. The
primary keys of the relation PROJECT_PART are PROJECT-NAME and
PART-CODE. However, another, overlapping, candidate key is present in the
concatenation of the VENDOR-NAME (assumed unique for all vendors)
and PROJECT-NAME (assumed unique for all projects) attributes. These
would also uniquely identify each tuple of the relation (table).
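The two assumptions above correspond to the functional dependencies {PROJECT-NAME, PART-CODE} → VENDOR-NAME and VENDOR-NAME → PART-CODE. Since VENDOR-NAME is a determinant but not a relation key, the relation is not in BCNF. A standard BCNF decomposition, sketched here using only the attributes named above (with the quantity attribute written as QTY), is:

R1 (VENDOR-NAME, PART-CODE)
R2 (PROJECT-NAME, VENDOR-NAME, QTY)

This decomposition is lossless, although the dependency {PROJECT-NAME, PART-CODE} → VENDOR-NAME can no longer be enforced within a single relation, which is the usual trade-off when decomposing to BCNF.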
Let us consider a relation PERSON_SKILL, as shown in Table 10.11. This relation contains
the following:
a. The SKILL-TYPE possessed by each person. For example, “Abhishek” has “DBA”
and “Quality Auditor” skills.
b. The PROJECTs to which a person is assigned. For example, “John” is assigned to
projects “P1” and “P2”.
c. The MACHINEs used on each project. For example, “Excavator”, “Shovel” and
“Drilling” are used on project “P1”.
Table 10.11 Relation PERSON_SKILL
To deal with the problem of BCNF, R. Fagin introduced the idea of multi-
valued dependency (MVD) and the fourth normal form (4NF). A multi-
valued dependency (MVD) is a functional dependency where the dependency
may be to a set and not just a single value. It is defined as X→→Y in relation
R (X, Y, Z), if each X value is associated with a set of Y values in a way that
does not depend on the Z values. Here X and Y are both subsets of R. The
notation X→→Y is used to indicate that a set of attributes of Y shows a
multi-valued dependency (MVD) on a set of attributes of X.
Example 1
Relation: COURSE_STUDENT_BOOK
COURSE STUDENT-NAME TEXT-BOOK
Computer Engg Thomas Database Management
Computer Engg Thomas Software Engineering
Computer Engg John Database Management
Computer Engg John Software Engineering
Electronics Engg Thomas Digital Electronics
Electronics Engg Thomas Pulse Theory
MCA Abhishek Computer Networking
MCA Abhishek Data Communication
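In this relation the text books recommended for a course are independent of the students enrolled in it, which gives the multi-valued dependencies COURSE →→ STUDENT-NAME and COURSE →→ TEXT-BOOK. A sketch of the corresponding 4NF decomposition, using only the attributes shown above, is:

R1 (COURSE, STUDENT-NAME)
R2 (COURSE, TEXT-BOOK)

Each of the two smaller relations records one independent fact about a course, and the redundant pairing of every student with every text book disappears.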
Example 2
Example 3
PERSON →→ PROJECT
Both these types of trivial MVDs hold for any set of attributes of R and
therefore can serve no purpose as design criteria.
In the above rules, X, Y, and Z all are sets of attributes of a relation R and
U is the set of all the attributes of R. These four axioms can be used to derive the closure D+ of a set D of multi-valued dependencies. It can be noticed that there are similarities between Armstrong’s axioms for FDs and Beeri’s axioms for MVDs. Both have reflexivity, augmentation, and transitivity rules. But the MVD set also has a complementation rule.
The following additional rules can be derived from the above Beeri’s axioms to derive the closure of a set of FDs and MVDs:
Further additional rules can be derived from above rules, which are as
follows:
Example 1
Example 2
R1 (PROJECT, MACHINE)
R2 (PERSON, SKILL-TYPE)
R3 (PERSON, PROJECT)
Example 3
Relation: PERSONS_ON_JOB_SKILLS
PERSON SKILL-TYPE JOB
Thomas Analyst J-1
Thomas Analyst J-2
Thomas DBA J-2
Thomas DBA J-3
John DBA J-1
Abhishek Analyst J-1
The anomalies of MVDs are eliminated by join dependency (JD) and 5NF.
R = R1 ∪ R2 ∪ … ∪ Rn
Example 1
Example 1
Now by applying the definition of 5NF, the join dependency is given as:
The above statement is true because a join of these three relations is equal to the original relation PERSONS_ON_JOB_SKILLS. The consequence of this join dependency is that none of SKILL-TYPE, JOB or PERSON is a relation key, and hence the relation is not in 5NF. Now suppose the second tuple (row 2) is removed from the relation PERSONS_ON_JOB_SKILLS; a new relation is created that no longer has any join dependencies. Thus the new relation will be in 5NF.
REVIEW QUESTIONS
1. What do you understand by the term normalization? Describe the data normalization process.
What does it accomplish?
2. Describe the purpose of normalising data.
3. What are different normal forms?
4. Define 1NF, 2NF and 3NF.
5. Describe the characteristics of a relation in un-normalised form and how is such a relation
converted to a first normal form (1NF).
6. What undesirable dependencies are avoided when a relation is in 3NF?
7. Given a relation R(A, B, C, D, E) and F = (A → B, BC → D, D → BC, DE → ϕ), synthesise
a set of 3NF relation schemes.
8. Define Boyce-Codd normal form (BCNF). How does it differ from 3NF? Why is it considered a stronger form than 3NF? Provide an example to illustrate.
9. Why is 4NF preferred to BCNF?
10. A relation R(A, B, C) has FDs AB → C and C → A. Is R in 3NF or in BCNF? Justify your answer.
11. A relation R(A, B, C, D) has FD C → B. Is R in 3NF? Justify your answer.
12. A relation R(A, B, C) has FD A → C. Is R in 3NF? Does AB → C hold? Justify your answer.
13. Given the relation R(A, B, C, D, E) with the FDs (A → BCDE, B → ACDE, C → ABDE),
what are the join dependencies of R? Give the lossless decomposition of R.
14. Given the relation R(A, B, C, D, E, F) with the set X = (A → CE, B → D, C → ADE, BD
→→ F), find the dependency basis of BCD.
15. Explain the following:
16. Consider the functional dependency diagram as shown in Fig. 10.12. Following relations are
given:
PROJ-NO → PROJ-NAME
PROJ-NO → START-DATE
PROJ-NO, MACHINE-NO → TIME-SPENT-ON-PROJ
MACHINE-NO, PERSON-NO → TIME-SPENT-BY-PERSON
18. Define the concept of multi-valued dependency (MVD) and describe how this concept relates
to 4NF. Provide an example to illustrate your answer.
19. Following relation is given:
This relation stores the actors in each play and the performance times of each play. It is
assumed that each actor takes part in every performance.
21. A role of the actor is added in the relation of exercise 20, which now becomes
a. Assuming that each actor has one role in each play, find the MVDs for the following
cases:
i. Each actor takes part in every performance of the play.
ii. An actor takes part in only some performances of the play.
b. In each case determine whether the relation is in 4NF and decompose it if it is not.
22. For exercise 6 of Chapter 9, design relational schemas for the database that are each in 3NF
or BCNF.
23. Consider the universal relation R (A, B, C, D, E, F, G, H, I, J) and the sets of FDs
F = {{A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H}, {D} → {I, J}}.
G = {{A, B} → {C}, {B, D} → {E, F}, {A, D} → {G, H}, {A} → {I}, {H} → {J}}.
25. Following relations for an order-processing application database of M/s KLY Ltd. are given:
ORDER (ORD-NO, ORD-DATE, CUST-NO, TOT-AMNT)
ORDER_ITEM (ORD-NO, ITEM-NO, QTY-ORDRD, TOT-PRICE, DISCT%)
Assume that each item has a different discount. The TOT-PRICE refers to the price of one
item. ORD-DATE is the date on which the order was placed. TOT-AMNT is the amount of
the order.
a. If natural join is applied on the relations ORDER and ORDER_ITEM in the above
database, what does the resulting relation schema look like?
b. What will be its key?
c. Show the FDs in this resulting relation.
d. State why or why not is it in 2NF.
e. State why or why not is it in 3NF.
AUTH-AFFL refers to the affiliation of author. Suppose that the following FDs exist:
27. Set of FDs given are A → BCDEF, AB → CDEF, ABC → DEF, ABCD → EF, ABCDE →
F, B → DG, BC → DEF, BD → EF and E → BF.
a. Is R in 3NF?
b. Is R in BCNF?
c. Does the MVD AB →→ C hold?
d. Does the set {R1(A, B, C), R2(A, B, D)} satisfy the lossless join property?
29. A relation R(A, B, C) and the set {R1(A, B), R2(B, C)}satisfies the lossless decomposition
property.
a. Is R in 4NF?
b. Is B a candidate key?
c. Does the MVD B →→ C hold?
31. A life insurance company has a large number of policies. For each policy, the company wants to know the policy holder’s social security number, name, address, date of birth, policy number, annual premium and death benefit amount. The company also wants to keep track of the agent number, name, and city of residence of the agent who made the policy. A policy holder can have many policies and an agent can make many policies.
Create a relational database schema for the above life insurance company with all relations in
4NF.
32. Define the concept of join dependency (JD) and describe how this concept relates to 5NF.
Provide an example to illustrate your answer.
33. Give an example of a relation schema R and a set of dependencies such that R is in BCNF,
but is not in 4NF.
34. Explain why 4NF is a normal form more desirable than BCNF.
STATE TRUE/FALSE
7. When a relation R in 3NF with FDs AB → C and C → B is decomposed into two relations
R1 (with AB → null, that is, all key) and R2 (with C → B), the relations R1 and R2
8. When a relation R in BCNF with FDs A → BCD (where A is the primary key) is decomposed
into two relations R1 (with A → B) and R2 (with A → CD), the resulting two relations R1
and R2
1. Normalization is a process of
a. E.F. Codd.
b. R.F. Boyce.
c. R. Fagin.
d. Collin White.
3. A normal form is
a. a state of a relation that results from applying simple rules regarding FDs.
b. the highest normal form condition that it meets.
c. an indication of the degree to which it has been normalised.
d. all of these.
4. Which of the following is the formal process of deciding which attributes should be grouped
together in a relation?
a. optimization
b. normalization
c. tuning
d. none of these.
5. In 1NF,
6. 2NF is always in
a. 1NF.
b. BCNF.
c. MVD.
d. none of these.
a. if it is in 1NF.
b. every non-prime key attributes of R is fully functionally dependent on each relation
key of R.
c. if it is in BCNF.
d. both (a) and (b).
a. relation R is in 2NF.
b. nonprime attributes are mutually independent.
c. functionally dependent on the primary key.
d. all of these.
a. E.F. Codd.
b. R.F. Boyce.
c. R. Fagin.
d. none of these.
10. The expansion of BCNF is
11. The fourth normal form (4NF) is concerned with dependencies between the elements of
compound keys composed of
a. one attribute.
b. two attributes.
c. three or more attributes.
d. none of these.
12. When all the columns (attributes) in a relation describe and depend upon the primary key, the
relation is said to be in
a. 1NF.
b. 2NF.
c. 3NF.
d. 4NF.
1. Normalization is a process of _____a set of relations with anomalies to produce smaller and
well-structured relations that contain minimum or no _____.
2. _____ is the formal process for deciding which attributes should be grouped together.
3. In the _____ process we analyse and decompose the complex relations and transform them
into smaller, simpler, and well-structured relations.
4. _____ first developed the process of normalization.
5. A relation is said to be in 1NF if the values in the domain of each attribute of the relation
are_____.
6. A relation R is said to be in 2NF if it is in _____ and every non-prime key attributes of R is
_____ on each relation key of R.
7. 2NF can be violated only when a key is a _____ key or one that consists of more than one _____.
8. When the multi-valued attributes or repeating groups in a relation are removed then that
relation is said to be in _____.
9. In 3NF, no non-prime attribute is functionally dependent on _____ .
10. Relation R is said to be in BCNF if for every nontrivial FD: _____ between attributes X and Y
holds in R.
11. A relation is said to be in the _____ when transitive dependencies are removed.
12. A relation is in BCNF if and only if every determinant is a _____ .
13. Any relation in BCNF is also in _____ and consequently in_____.
14. The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF
allows this dependency in a relation if B is a _____ key attribute and A is not a _____ key.
Whereas, BCNF insists that for this dependency to remain in a relation, A must be a _____
key.
15. 4NF is violated when a relation has undesirable _____.
16. A relation is said to be in 5NF if every join dependency is a _____ of its relation keys.
Part-IV
11.1 INTRODUCTION
The syntax analyser takes the query from the users, parses it into tokens and
analyses the tokens and their order to make sure they comply with the rules
of the language grammar. If an error is found in the query submitted by the
user, it is rejected and an error code together with an explanation of why the
query was rejected is returned to the user.
A simple form of language grammar that could be used to implement an SQL statement is given below:
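A minimal sketch of such a grammar, covering only a simple SELECT statement (the production names below are illustrative and not the author's original notation), could be:

<query>          ::= SELECT <attribute-list> FROM <table-list> [ WHERE <condition> ]
<attribute-list> ::= <attribute> | <attribute> , <attribute-list>
<table-list>     ::= <table> | <table> , <table-list>
<condition>      ::= <attribute> <comparison-operator> <value>

The syntax analyser checks each token of the submitted statement against productions of this kind and rejects the statement if no valid derivation exists.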
The query decomposition is the first phase of query processing whose aims
are to transform a high-level query into a relational algebra query and to
check whether that query is syntactically and semantically correct. Thus, a
query decomposition phase starts with a high-level query and transforms into
a query graph of low-level operations (algebraic expressions), which
satisfies the query. In practice, SQL (a relational calculus language) is used as the high-level query language in most commercial RDBMSs. The SQL query is then decomposed into query blocks (low-level operations), which form the basic units. A query block contains a single SELECT-FROM-WHERE expression, as well as clauses such as GROUP BY and HAVING if these are part of the block. Hence, nested queries within a query
are identified as separate query blocks. The query decomposer goes through
five stages of processing for decomposition into low-level operation and to
accomplish the translation into algebraic expressions. Fig. 11.2 shows the
five stages of query decomposer. The five stages of query decomposition
are:
Fig. 11.2 Stages of query decomposer
Query analysis.
Query normalization.
Semantic analysis.
Query simplifier.
Query restructuring.
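As an illustration of the kind of statement that fails the query analysis stage, consider the following query against an EMPLOYEE relation; it is assumed here, for the purpose of the example, that EMPLOYEE has no EMP-ID attribute and that EMP-DESIG is a character string.

SELECT EMP-ID
FROM EMPLOYEE
WHERE EMP-DESIG > 100;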
The above query will be rejected because of the following two reasons:
In the SELECT list, the attribute EMP-ID is not defined for the relation EMPLOYEE.
In the WHERE clause, the comparison “> 100” is incompatible with the data type of EMP-DESIG, which is a variable character string.
In the above SQL query, the join condition DEPT-NO = D-NUM relates a
project to its controlling department, whereas the join condition MGR-ID =
EMP-ID relates the controlling department to the employee who manages
that department. The equivalent relational algebra expression for the above
SQL query can be written as:
Or
(b) Initial query tree for SQL query
Same SQL query can have many different relational algebra expressions
and hence many different query trees. The query parser typically generates a
standard initial (canonical) query tree to correspond to an SQL query,
without doing any optimization. For example, the initial query tree is shown
in Fig. 11.3 (b) for a SELECT-PROJECT-JOIN query. The CARTESIAN
PRODUCT (×) of the relations specified in the FROM clause is first applied,
then the SELECTION and JOIN conditions of the WHERE clause are
applied, followed by the PROJECTION on the SELECT clause attributes.
Because of the CARTESIAN PRODUCT (×) operations, a relational algebra expression represented by such a canonical query tree is very inefficient if executed directly.
The disadvantage of a query graph is that it does not indicate an order in which the operations are to be performed, as a query tree does. Therefore, a query tree representation is preferred over the query graph in practice.
There is only one graph corresponding to each query. Query tree and query
graph notations are used as the basis for the data structures that are used for
internal representation of queries.
(EMP-DESIG=‘Programmer’^ EMP-DESIG=‘Analyst’)
A relation query graph for the above SQL query is shown in Fig. 11.5 (a),
which is not fully connected. That means, query is not correctly formulated.
In this graph, the join condition (V.PROJ-NO = P.PROJ-NO) has been
omitted.
Now let us consider the SQL query given as:
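A query of this kind might be the following (the relation name DEPARTMENT, the alias D and the attribute DEPT-NAME are assumed here for illustration; only MAX-BUDGET appears in the discussion below):

SELECT D.DEPT-NAME
FROM DEPARTMENT D
WHERE D.MAX-BUDGET > 85000 AND D.MAX-BUDGET < 50000;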
A normalised relation query graph for the above SQL query is shown in
Fig. 11.5 (b). This graph has a cycle between the nodes D.MAX-BUDGET
and 0 with a negative valuation sum. Thus, it indicates that the query is
contradictory. Clearly, we cannot have a department with a maximum budget
that is both greater than INR 85,000 and less than INR 50,000.
Let us examine the following part of the above query statement in greater
detail:
Now, the above part of the query can be represented in the form of
idempotence rules of Boolean algebra as follows:
PRED1 AND NOT (PRED2 AND PRED3) = P1 ^ ∼(P2 ^ P3)
But in the above query, PRED1 and PRED2 are identical. Now the query
simplifier module applies idempotency rule 1 (Table 11.2) to obtain the
following form:
The SQL query in our example finally looks like the following form:
Thus, in the above example, the original query contained many redundant
predicates, which were eliminated without changing the semantics of the
query.
The term query optimization does not always mean finding an optimal (best) strategy as the execution plan; it means finding a reasonably efficient strategy for execution of the query. The decomposed query blocks of SQL are translated into an equivalent extended relational algebra expression (or
operators) and then optimised. There are two main techniques for
implementing query optimization. The first technique is based on heuristic
rules for ordering the operations in a query execution strategy. A heuristic
rule works well in most cases but is not guaranteed to work well in every
possible case. The rules typically reorder the operations in a query tree. The
second technique involves systematic estimation of the cost of different
execution strategies and choosing the execution plan with the lowest cost
estimate. Semantic query optimization is used in combination with the
heuristic query transformation rules. It uses constraints specified on the
database schema such as unique attributes and other more complex
constraints, in order to modify one query into another query that is more
efficient to execute.
Now, let us consider a query in the above database to find the names of employees born after 1970 who work on a project named ‘Growth’. This SQL query can be written as follows:
SELECT EMP-NAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PROJ-NAME = ‘Growth’ AND PROJ-NO = P-NO
AND E-ID = EMP-ID AND BIRTH-DATE > ‘31-12-1970’;
Fig. 11.7 (a) shows the initial query tree for the above SQL query. It can
be observed that executing this initial query tree directly creates a very
large file containing the CARTESIAN PRODUCT (×) of the entire
EMPLOYEE, WORKS_ON and PROJECT files. But, the query needed only
one tuple (record) from the PROJECT relation for the ‘Growth’ project and
only the EMPLOYEE records for those whose date of birth is after ‘31-12-
1970’.
Fig. 11.7 (b) shows an improved version of a query tree that first applies
the SELECT operations to reduce the number of tuples that appear in the
CARTESIAN PRODUCT. As shown in Fig. 11.7 (c), further improvement in
the query tree is achieved by applying more restrictive SELECT operations
and switching the positions of the EMPLOYEE and PROJECT relations in
the query tree. The information that PROJ-NO is a key attribute of
PROJECT relation is used. Hence, SELECT operation on the PROJECT
relation retrieves a single record only.
A further improvement in the query tree can be achieved by replacing any
CARTESIAN PRODUCT (×) operation and SELECT operations with JOIN
operations, as shown in Fig. 11.7 (d). Another improvement in the query tree
can be achieved by keeping only the attributes needed by the subsequent
operations in the intermediate relations, by including PROJECT (∏)
operations in the query tree, as shown in Fig. 11.7 (e). This reduces the
attributes (columns or fields) of the intermediate relations, whereas the
SELECT operations reduce the number of tuples (rows or records).
Fig. 11.7 Steps in converting query tree during heuristic optimization
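The net effect of these steps can also be written as a single relational algebra expression. One possible final form, corresponding to the tree of Fig. 11.7 (d) and omitting the intermediate PROJECT (∏) operations of Fig. 11.7 (e) for brevity, is:

∏EMP-NAME (((σPROJ-NAME = ‘Growth’ (PROJECT)) ⋈PROJ-NO = P-NO WORKS_ON) ⋈E-ID = EMP-ID (σBIRTH-DATE > ‘31-12-1970’ (EMPLOYEE)))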
Example:
σBRANCH-LOCATION = ‘Mumbai’ ∧ EMP-SALARY > 85000 (EMPLOYEE)
≡ σBRANCH-LOCATION = ‘Mumbai’ (σEMP-SALARY > 85000 (EMPLOYEE))
Example:
σBRANCH-LOCATION = ‘Mumbai’ (σEMP-SALARY > 85000 (EMPLOYEE))
≡ σEMP-SALARY > 85000 (σBRANCH-LOCATION = ‘Mumbai’ (EMPLOYEE))
R × S ≡ S × R
Example:
EMPLOYEE ⋈EMPLOYEE.BRANCH-NO = BRANCH.BRANCH-NO BRANCH
≡ BRANCH ⋈BRANCH.BRANCH-NO = EMPLOYEE.BRANCH-NO EMPLOYEE
σc (R × S) ≡ (σc (R)) × S
Example:
σEMP-TITLE = ‘Manager’ ∧ CITY = ‘Mumbai’ (EMPLOYEE ⋈EMPLOYEE.BRANCH-NO = BRANCH.BRANCH-NO BRANCH)
≡ (σEMP-TITLE = ‘Manager’ (EMPLOYEE)) ⋈EMPLOYEE.BRANCH-NO = BRANCH.BRANCH-NO (σCITY = ‘Mumbai’ (BRANCH))
Example:
∏EMP-TITLE, CITY (EMPLOYEE ⋈EMPLOYEE.BRANCH-NO = BRANCH.BRANCH-NO BRANCH)
≡ (∏EMP-TITLE, BRANCH-NO (EMPLOYEE)) ⋈EMPLOYEE.BRANCH-NO = BRANCH.BRANCH-NO (∏CITY, BRANCH-NO (BRANCH))
Rule 8: Commutativity of Union (∪) and Intersection (∩)
R ∪ S ≡ S ∪ R
R ∩ S ≡ S ∩ R
Rule 9: Commutativity of Selection (σ) and set operations
such as Union (∪), Intersection (∩) and set difference (−)
σc (R ∪ S) ≡ σc (R) ∪ σc (S)
σc (R ∩ S) ≡ σc (R) ∩ σc (S)
σc (R − S) ≡ σc (R) − σc (S)
If θ stands for any of the set operations Union (∪), Intersection (∩) or set difference (−), then the above expressions can be written as:
σc (R θ S) ≡ σc (R) θ σc (S)
(R × S) × T ≡ R × (S × T)
If the join condition c involves only attributes from the relations S and T, then the join is associative in the following manner:
If θ stands for any of the set of operations such as Join (⋈), Union (∪),
Intersection (∩) or Cartesian product (×), then the above expression can be
written as:
(R θ S) θ T = R θ (S θ T)
Rule 12: Associativity of Union (∪) and Intersection (∩)
(R ∪ S) ∪ T ≡ R ∪ (S ∪ T)
(R ∩ S) ∩ T ≡ R ∩ (S ∩ T)
Rule 13: Converting a Selection and Cartesian Product (σ, ×)
sequence into Join (⋈)
σc (R × S) ≡ (R ⋈c S)
Let us consider the SQL query in which the prospective renters are looking
for a ‘Bungalow’. Now, we have to develop a query to find the properties
that match their requirements and are owned by owner ‘Mathew’.
The SQL query for the above requirement can be written as:
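A sketch of such a query is given below; the PROPERTY and OWNER table names and their columns are assumed here for illustration.

SELECT P.PROPERTY-NO, P.ADDRESS, P.RENT
FROM PROPERTY P, OWNER O
WHERE P.OWNER-NO = O.OWNER-NO
AND P.PROPERTY-TYPE = ‘Bungalow’
AND O.OWNER-NAME = ‘Mathew’;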
The main aim of query optimization is to choose the most efficient way of
implementing the relational algebra operations at the lowest possible cost.
Therefore, the query optimizer should not depend solely on heuristic rules; it should also estimate the cost of executing the different strategies and
find out the strategy with the minimum cost estimate. The method of
optimising the query by choosing the strategy that results in the minimum cost is called cost-based query optimization. Cost-based query optimization uses formulae that estimate the costs of a number of options and selects the one with the lowest cost as the most efficient to execute. The cost functions used
in query optimization are estimates and not exact cost functions. So, the
optimization may select a query execution strategy that is not the optimal
one.
The cost of an operation is heavily dependent on its selectivity, that is, the
proportion of the input relation(s) that forms the output. In general, different
algorithms are suitable for low- and high-selectivity queries. In order for a query optimiser to choose a suitable algorithm for an operation, an estimate of the cost of executing that algorithm must be provided. The cost of an
algorithm is dependent on the cardinality of its input. To estimate the cost of
different query execution strategies, the query tree is viewed as containing a
series of basic operations which are linked in order to perform the query.
Each basic operation has an associated cost function whose argument(s) are
the cardinality of its input(s). It is also important to know the expected
cardinality of an operation’s output, since this forms the input to the next
operation in the tree. The expected cardinalities are derived from statistical
estimates of a query's selectivity, that is, the proportion of tuples satisfying
the query.
Out of the above five cost components, the most important is the
secondary storage access cost. The emphasis of cost minimisation depends
on the size and type of database applications. For example, in smaller
databases the emphasis is on minimising the computation cost, because
most of the data in the files involved in the query can be held completely in
main memory. For large databases, the main emphasis is on minimising the
access cost to secondary storage. For distributed databases, the
communication cost is minimised because many sites are involved in the
data transfer.
To estimate the costs of various execution strategies, we must keep track
of any information that is needed for the cost functions. This information
may be stored in the DBMS catalog, where it is accessed by the query
optimiser. Typically, the DBMS is expected to hold the following types of
information in its system catalogue:
i. The number of tuples (records) in relation R, given as [nTuples(R)].
ii. The average record size in relation R.
iii. The number of blocks required to store relation R, given as [nBlocks(R)].
iv. The blocking factor of relation R (that is, the number of tuples of R that fit into one block),
given as [bFactor(R)].
v. Primary access method for each file.
vi. Primary access attributes for each file.
vii. The number of levels of each multi-level index I (primary, secondary, or clustering), given as
[nLevelsA(I)].
viii. The number of first-level (leaf) index blocks, given as [nLfBlocksA(I)].
ix. The number of distinctive values that appear for attribute A in relation R, given as
[nDistinctA(R)].
x. The minimum and maximum possible values for attribute A in relation R, given as [minA(R),
maxA(R)].
xi. The selectivity of an attribute, which is the fraction of records satisfying an equality
condition on the attribute.
xii. The selection cardinality of attribute A in relation R, given as [SCA(R)]. The selection
cardinality is the average number of tuples (records) that satisfy an equality condition on
attribute A.
To estimate the cost of various execution strategies, the query optimiser
needs reasonably up-to-date values of frequently changing parameters, such
as the number of tuples (records) in a file (or relation). However, updating
these statistics every time a tuple is inserted, deleted or updated would have
a significant impact on the performance of the DBMS during peak periods.
Alternatively, the DBMS may update the statistics held in the system catalog
on a periodic basis, for example fortnightly, or whenever the system is idle.
This keeps the overhead of maintaining the statistics low while still allowing
reasonably accurate cost estimates.
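As a practical illustration (the exact commands and options vary between products and versions), commercial DBMSs provide utilities to refresh these catalog statistics on demand; for example, Oracle supports the ANALYZE TABLE … COMPUTE STATISTICS statement (and the DBMS_STATS package), while IBM DB2 provides the RUNSTATS utility. A sketch, using the EMPLOYEE table of the running example:
ANALYZE TABLE EMPLOYEE COMPUTE STATISTICS;
Running such a command during an idle period refreshes nTuples, nDistinct and related statistics used by the cost-based optimizer.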
Let us also assume that the EMPLOYEE relation has the following
statistics stored in the system catalog:
nTuples(EMPLOYEE) = 6,000
bFactor(EMPLOYEE) = 60
nBlocks(EMPLOYEE) = [nTuples(EMPLOYEE)/bFactor(EMPLOYEE)] = 6,000/60 = 100
nDistinctDEPT-ID(EMPLOYEE) = 1,000
SCDEPT-ID(EMPLOYEE) = [nTuples(EMPLOYEE)/nDistinctDEPT-ID(EMPLOYEE)] = 6,000/1,000 = 6
nDistinctPOSITION(EMPLOYEE) = 20
SCPOSITION(EMPLOYEE) = [nTuples(EMPLOYEE)/nDistinctPOSITION(EMPLOYEE)] = 6,000/20 = 300
nDistinctSALARY(EMPLOYEE) = 1,000
SCSALARY(EMPLOYEE) = [nTuples(EMPLOYEE)/nDistinctSALARY(EMPLOYEE)] = 6,000/1,000 = 6
minSALARY(EMPLOYEE) = 20,000
maxSALARY(EMPLOYEE) = 80,000
nLevelsDEPT-ID(I) = 2
nLevelsSALARY(I) = 2
nLfBlocksSALARY(I) = 50
The estimated cost of a linear search on the key attribute EMP-ID is 50
blocks and the cost of a linear search on a non-key attribute is 100 blocks.
Now let us consider the following Selection operations, and use the
strategies of Table 11.4 to improve on these two costs:
Now we will choose the query execution strategies by comparing the cost
as follows:
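As an illustrative sketch of how such a comparison proceeds (the predicates below are assumed examples, not necessarily the ones used in the book's exercise), consider the statistics listed above:
For an equality selection such as σ DEPT-ID = 'D10' (EMPLOYEE), the estimated cardinality is SCDEPT-ID(EMPLOYEE) = 6 tuples. A linear search on this non-key attribute costs about nBlocks(EMPLOYEE) = 100 blocks, whereas using the two-level index on DEPT-ID costs roughly nLevelsDEPT-ID(I) + ⌈SCDEPT-ID(EMPLOYEE)/bFactor(EMPLOYEE)⌉ = 2 + 1 = 3 blocks if the index is a clustering index, so the index-based strategy would be chosen.
For a range selection such as σ SALARY > 60000 (EMPLOYEE), assuming salaries are uniformly distributed between minSALARY(EMPLOYEE) = 20,000 and maxSALARY(EMPLOYEE) = 80,000, the estimated cardinality is 6,000 × (80,000 − 60,000)/(80,000 − 20,000) = 2,000 tuples. Retrieving these through the B+-tree index on SALARY would require roughly nLevelsSALARY(I) plus half the leaf blocks plus one block access per qualifying tuple, far more than the 100 blocks of a linear search, so the linear search would be chosen.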
Let us also assume that the EMPLOYEE relation has the following
statistics stored in the system catalog:
nTuples(EMPLOYEE) = 12,000
bFactor(EMPLOYEE) = 120
nBlocks(EMPLOYEE) = [nTuples(EMPLOYEE)/bFactor(EMPLOYEE)] = 200
nTuples(DEPARTMENT) = 600
bFactor(DEPARTMENT) = 60
nBlocks(DEPARTMENT) = [nTuples(DEPARTMENT)/bFactor(DEPARTMENT)] = 600/60 = 10
nTuples(PROJECT) = 80,000
bFactor(PROJECT) = 40
nBlocks(PROJECT) = [nTuples(PROJECT)/bFactor(PROJECT)] = 80,000/40 = 2,000
Now let us consider the following two Join operations and use the strategies
of Table 11.5 to improve on the costs:
The estimated I/O cost of Join operations for the above two joins is shown
in Table 11.6.
Table 11.6
It can be seen in both Join 1 and 2 that the cardinality of the result relation
can be no larger than the cardinality of the first relation, as we are joining
over the key of the first relation. Also, it is to be noted that no one strategy is
best for both join operations. The sort-merge join is the best for the Join 1
provided both relations are already sorted. The indexed nested-loop join is
the best for the Join 2.
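As an illustrative sketch of the kind of estimate that lies behind Table 11.6 (the join below is an assumed example), consider joining DEPARTMENT with EMPLOYEE using the statistics listed above. With DEPARTMENT as the outer relation, a block nested-loop join costs approximately nBlocks(DEPARTMENT) + nBlocks(DEPARTMENT) × nBlocks(EMPLOYEE) = 10 + 10 × 200 = 2,010 block accesses if only one buffer page is available per relation; if the buffer can hold all 10 blocks of DEPARTMENT plus one page of EMPLOYEE, the cost falls to about nBlocks(DEPARTMENT) + nBlocks(EMPLOYEE) = 210 block accesses. A hash join would cost roughly 3 × (nBlocks(DEPARTMENT) + nBlocks(EMPLOYEE)) = 630 block accesses, assuming no partition overflow.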
Advantages
The use of pipelining saves on the cost of creating temporary relations and reading the results
back in again.
Disadvantages
The inputs to operations are not necessarily available all at once for processing. This can
restrict the choice of algorithms.
The above terms were defined by Graefe and DeWitt in 1987. They refer
to how operations are combined to execute a query. The naming convention
relates to the way the inputs of binary operations, particularly joins, are
treated. Most join operations treat their two inputs differently, so the
performance characteristics differ according to the ordering of the inputs.
Fig. 11.10 illustrates different schemes of query evaluation plans.
Left-deep (or join) tree query execution plan starts from a relation (table)
and constructs the result by successively adding an operation involving a
single relation (table) until the query is completed. That is, only one input
into a binary operation is an intermediate result. The term relates to how
operations are combined to execute the query, for example, only the left
hand side of a join is allowed to be something that results from a previous
join and hence the name left-deep tree. Fig. 11.10 (a) shows an example of
left-deep query execution plan. All the relational algebra trees we have
discussed in the earlier sections of this chapter are left-deep (join) trees. The
left-deep tree query execution plan has the advantages of reducing the search
space and allowing the query optimiser to be based on dynamic
programming techniques. Left-deep join plans are particularly convenient for
pipelined evaluation, since the right operand of each join is a stored relation,
and thus only one input to each join is pipelined. The main disadvantage is that, in
reducing the search space, many alternative execution strategies are not
considered, some of which may be of lower cost than the one found using
the linear tree.
Fig. 11.10 Query execution plan
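As a compact illustration of the difference (the relation names are placeholders), a left-deep plan for joining four relations has the shape ((R ⋈ S) ⋈ T) ⋈ U, in which at most the left input of each join is an intermediate result, whereas a bushy plan such as (R ⋈ S) ⋈ (T ⋈ U) allows both inputs of a join to be intermediate results.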
REVIEW QUESTIONS
1. What do you mean by the term query processing? What are its objectives?
2. What are the typical phases of query processing? With a neat sketch discuss these phases in
high-level query processing.
3. Discuss the reasons for converting SQL queries into relational algebra queries before query
optimization is done.
4. What is syntax analyser? Explain with an example.
5. What is the objective of query decomposer? What are the typical phases of query
decomposition? Describe these phases with a neat sketch.
6. What is a query execution plan?
7. What is query optimization? Why is it needed?
8. With a detailed block diagram, explain the function of query optimization.
9. What is meant by the term heuristic optimization? Discuss the main heuristics that are
applied during query optimization to improve the processing of query.
10. Explain how heuristic query optimization is performed with an example.
11. How does a query tree represent a relational algebra expression?
12. Write and justify an efficient relational algebra expression that is equivalent to the following
given query:
SELECT B1.BANK-NAME
FROM BANK1 AS B1, BANK2 AS B2
WHERE B1.ASSETS > B2.ASSETS AND
B2.BANK-LOCATION = ‘Jamshedpur’
13. What is query tree? What is meant by an execution of a query tree? Explain with an example.
14. What is relational algebra query tree?
15. What is the objective of query normalization? What are its equivalence rules?
16. What is the purpose of syntax analyser? Explain with an example.
17. What is the objective of a query simplifier? What are the idempotence rules used by the query
simplifier? Give an example to explain the concept.
18. What are query transformation rules?
19. Discuss the rules for transformation of query trees and identify when each rule should be
applied during optimization.
20. Discuss the main cost components for a cost function that is used to estimate query execution
cost.
21. What cost components are used most often as the basis for cost functions?
22. List the cost functions for the SELECT and JOIN operations.
23. What are the cost functions of the SELECT operation for a linear search and a binary search?
24. Consider the relations R(A, B, C), S(C, D, E) and T(E, F), with primary keys A, C and E,
respectively. Assume that R has 2000 tuples, S has 3000 tuples, and T has 1000 tuples.
Estimate the size of R ⋈ S ⋈ T and give an efficient strategy for computing the join.
25. What is meant by semantic query optimization?
26. What are heuristic optimization algorithms? Discuss various steps in heuristic optimization
algorithm.
27. What is a query evaluation plan? What are its advantages and disadvantages?
28. Discuss the different types of query evaluation trees with the help of a neat sketch.
29. What is materialization?
30. What is pipelining? What are its advantages?
31. Let us consider the following relations (tables) that form part of a database of a relational
DBMS:
Using the above HOTEL schema, determine whether the following queries are semantically
correct:
32. Using the hotel schema of exercise 31, draw a relational algebra tree for each of the
following queries. Use the heuristic rules to transform the queries into a more efficient form.
There is a hash index with no overflow on the primary key attributes ROOM-NO,
HOTEL-NO in the relation ROOM.
There is a clustering index on the foreign key attribute HOTEL-NO in the relation
ROOM.
There is B+-tree index on the PRICE attribute in the relation ROOM.
There is a secondary index on the attribute type in the relation ROOM.
Let us also assume that the schema has the following statistics stored in the system catalogue:
nTuples(ROOM) = 10,000
nTuples(HOTEL) = 50
nTuples(BOOKING) = 100,000
nDistinctHOTEL-NO(ROOM) = 50
nDistinctTYPE(ROOM) = 10
nDistinctPRICE(ROOM) = 500
minPRICE(ROOM) = 200
maxPRICE(ROOM) = 50
nLevelsHOTEL-NO(I) = 2
nLevelsPRICE(I) = 2
nLfBlocksPRICE(I) = 50
bFactor(ROOM) = 200
bFactor(HOTEL) = 40
bFactor(BOOKING) = 60
a. Calculate the cardinality and minimum cost for each of the following Selection
operations:
b. Calculate the cardinality and minimum cost for each of the following Join
operations:
STATE TRUE/FALSE
1. Query processing is the procedure of selecting the most appropriate plan that is used in
responding to a database request.
2. Execution plan is a series of query complication steps.
3. The cost of processing a query is usually dominated by secondary storage access, which is
slow compared to memory access.
4. The transformed query is used to create a number of strategies called execution (or access)
plans.
5. The internal query representation is usually a binary query tree.
6. A query is contradictory if its predicate cannot be satisfied by any tuple in the relation(s).
7. A query tree is also called a relational algebra tree.
8. Heuristic rules are used as an optimization technique to modify the internal representation of
a query.
9. Transformation rules are used by the query optimiser to transform one relational algebra
expression into an equivalent expression that is more efficient to execute.
10. Systematic query optimization is used for estimation of the cost of different execution
strategies and choosing the execution plan with the lowest cost estimate.
11. Usually, heuristic rules are used in the form of query tree or query graph data structure.
12. The heuristic optimization algorithm utilizes some of the transformation rules to transform an
initial query tree into an optimised and efficiently executable query tree.
13. The emphasis of cost minimisation depends on the size and type of database applications.
14. The success of estimating size and cost of intermediate relational algebra operations depends
on the amount and accuracy of the statistical data information stored with the database
management system (DBMS).
15. The cost of materialised evaluation includes the cost of writing the result of each operation to
the secondary storage and reading them back for the next operation.
16. Combining operations into a pipeline eliminates the cost of reading and writing temporary
relations.
a. parser.
b. compiler.
c. syntax checker.
d. none of these.
a. decomposition.
b. restructuring.
c. analysis.
d. none of these.
5. In which phase of the query processing is the query lexically and syntactically analysed using
parsers to find out any syntax errors?
a. normalization.
b. semantic analysis.
c. analysis.
d. all of these.
6. Which of the following represents the result of a query in a query tree?
a. root node.
b. leaf node.
c. intermediate node.
d. none of these.
7. In which phase of query processing are queries that are incorrectly formulated or
contradictory rejected?
a. simplification.
b. semantic analysis.
c. analysis.
d. none of these.
a. R ∪ S = S ∪ R.
b. R ∩ S = S ∩ R.
c. R − S = S − R.
d. All of these.
a.
b.
c. ∏L ∏M ……… ∏N (R) ≡ ∏L (R).
d.
a.
b.
c. ∏L ∏M ……… ∏N (R) ≡ ∏L (R).
d.
a.
b.
c. ∏L ∏M ……… ∏N (R) ≡ ∏L (R).
d.
a.
b.
c. ∏L ∏M ……… ∏N (R) ≡ ∏L (R).
d.
14. Which of the following transformation is referred to as commutativity of projection and join?
a.
b. R ∪ S = S ∪ R.
c. R ∩ S = S ∩ R.
d. both (b) and (c).
a.
b. R ∪ S = S ∪ R.
c. R ⋂ S = S ⋂ R.
d. both (b) and (c).
17. Which of the following cost is the most important cost component to be considered during
the cost-based query optimization?
a. query tree.
b. query graph data structure.
c. both (a) and (b).
d. either (a) or (b).
20. The success of estimating the size and cost of intermediate relational algebra operations
depends on the
a. pipelining.
b. materialization.
c. tunnelling.
d. none of these.
1. A query processor transforms a _____ query into an _____ that performs the required
retrievals and manipulations in the database.
2. Execution plan is a series of _____ steps.
3. In syntax-checking phase of query processing the system _____ the query and checks that it
obeys the _____ rules.
4. _____ is the process of transforming a query written in SQL (or any high-level language)
into a correct and efficient execution strategy expressed in a low-level language.
5. During the query transformation process, the _____ checks the syntax and verifies if the
relations and the attributes used in the query are defined in the database.
6. Query transformation is performed by transforming the query into _____ that are more
efficient to execute.
7. The four main phases of query processing are (a) _____, (b) _____, (c) _____ and (d) _____.
8. The two types of query optimization techniques are (a) _____ and (b) _____.
9. In _____, the query is parsed, validated and optimised once.
10. The objective of _____ is to transform the high-level query into a relational algebra query
and to check whether that query is syntactically and semantically correct.
11. The five stages of query decomposition are (a) _____ , (b) _____, (c) _____, (d) _____ and
(e) _____.
12. In the _____ stage, the query is lexically and syntactically analysed using parsers to find out
any syntax error.
13. In _____ stage, the query is converted into normalised form that can be more easily
manipulated.
14. In _____ stage, incorrectly formulated and contradictory queries are rejected.
15. _____ uses the transformation rules to convert one relational algebraic expression into an
equivalent form that is more efficient.
16. The main cost components of query optimization are (a) _____ and (b) _____.
17. A query tree is also called a _____ tree.
18. Usually, heuristic rules are used in the form of _____ or _____ data structure.
19. The heuristic optimization algorithm utilises some of the transformation rules to transform an
_____ query tree into an _____ and _____ query tree.
20. The emphasis of cost minimization depends on the _____ and _____ of database
applications.
21. The process of query evaluation in which several relational operations are combined into a
pipeline of operations is called_____.
22. If the results of the intermediate processes in a query are created and then are used for
evaluation of the next-level operations, this kind of query execution is called _____.
Chapter 12
12.1 INTRODUCTION
For the client, the entire process as explained above is a single operation
called a transaction, which is the payment of the mobile bill to Reliance mobile.
But within the database system, this comprises several operations. It is
essential that either all these operations occur, in which case the bill payment
will be successful, or in case of a failure, none of the operations should take
place, in which case the bill payment would be unsuccessful and client will
be asked to try again. It is unacceptable if the client’s credit card account is
debited and the Reliance mobile's account is not credited. The client will
lose the money and his mobile number will be deactivated.
A transaction is a sequence of READ and WRITE actions that are grouped
together to form a database access. Whenever we Read from and/or Write to
(update) the database, a transaction is created. A transaction may consist of a
simple SELECT operation to generate a list of table contents, or it may
consist of a series of related UPDATE command sequences. A transaction
can include the following basic database access operations:
Read_item(X): This operation reads a database item named X into a
program variable Y. Execution of the Read_item(X) command includes the
following steps:
Find the address of disk block that contains the item X.
Copy that disk block into a buffer in main memory.
Copy item X from the buffer to the program variable named Y.
BEGIN_TRANSACTION_1:
READ (TABLE = T1, ROW = 15, OBJECT = COL1);
:COL1 = COL1 + 500;
WRITE (TABLE = T1, ROW = 15, OBJECT = COL1, VALUE
=:COL1);
READ (TABLE = T2, ROW = 15, OBJECT = COL2);
:COL2 = COL2 + 500;
WRITE (TABLE = T2, ROW = 30, OBJECT = COL2, VALUE
=:COL2);
READ (TABLE = T3, ROW = 30, OBJECT = COL3);
:COL3 = COL3 + 500;
WRITE (TABLE = T3, ROW = 45, OBJECT = COL3, VALUE
=:COL3);
END_OF_TRANSACTION_1;
BEGIN TRANSACTION_T1
END TRANSACTION_T1;
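As a hedged illustration of how the bill-payment transaction of Section 12.1 might be written in SQL (the table ACCOUNT and its columns ACC-NO and BALANCE are assumed names, and the exact transaction syntax varies between products):
BEGIN TRANSACTION;
UPDATE ACCOUNT SET BALANCE = BALANCE - 500 WHERE ACC-NO = 'C101'; -- debit the client's account
UPDATE ACCOUNT SET BALANCE = BALANCE + 500 WHERE ACC-NO = 'R201'; -- credit the payee's account
COMMIT;
If either UPDATE fails, the application issues a ROLLBACK instead of COMMIT, so that neither change becomes permanent.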
The log is written before any updates are made to the database. This is
called the write-ahead log strategy. In this strategy, a transaction is not allowed
to modify the physical database until the undo portion of the log has been written
to stable storage. Table 12.1 illustrates an example of a transaction log for the
two SQL sequences of Section 12.2.2, which update the database tables
EMPLOYEE and PROJECT. In case of a system failure, the DBMS
examines the transaction log for all uncommitted or incomplete transactions
and restores (ROLLBACK) the database to its previous state based on the
information in the transaction log. When the recovery process is completed,
the DBMS writes to the database all committed transactions that had
not been physically written to the physical database before the failure occurred.
The TRANSACTION-ID is automatically assigned by the DBMS. If a
ROLLBACK is issued before the termination of a transaction, the DBMS
restores the database only for that particular transaction, rather than for all
transactions, in order to maintain the durability of the previous transactions.
In other words, committed transactions are not rolled back.
Table 12.4 shows the serial execution of these transactions under normal
circumstances, yielding the correct result of EMP-LOAN-BAL = 60000.
Table 12.5 illustrates the sequence of execution resulting in dirty read (or
uncommitted data) problem when the ROLLBACK is completed after
transaction T2 has begun its execution.
Although the final results are correct after the adjustment, inconsistent
retrievals are possible during the correction process as illustrated in Table
12.7.
As shown in Table 12.7, the computed answer of INR 350000 is obviously
wrong, because we know that the correct answer is INR 330000. Unless the
DBMS exercises concurrency control, a multi-user database environment
can create havoc within the information system.
Read-Read: Permutable
Read-write: Not permutable, since the result is different depending
on whether read is first or write is first.
Write-Write: Not permutable, as the second write always nullifies the
effects of the first write.
12.3.4 Schedule
A schedule (also called a history) is a sequence of actions or operations (for
example, reading, writing, aborting or committing) that is constructed by
merging the actions of a set of transactions, respecting the sequence of
actions within each transaction. As we have explained in our previous
discussions, as long as two transactions T1 and T2 access unrelated data,
there is no conflict and the order of execution is not relevant to the final
result. But, if the transactions operate on the same or related
(interdependent) data, conflict is possible among the transaction components
and the selection of one operational order over another may have some
undesirable consequences. Thus, DBMS has inbuilt software called
scheduler, which determines the correct order of execution. The scheduler
establishes the order in which the operations within concurrent transactions
are executed. The scheduler interleaves the execution of database operations
to ensure serialisability (as explained in section 12.3.5). The scheduler bases
its actions on concurrency control algorithms, such as locking or time
stamping methods. The scheduler also ensures the efficient utilisation of the
central processing unit (CPU) of the computer system.
Fig. 12.6 shows a schedule involving two transactions. It can be observed
that the schedule does not contain an ABORT or COMMIT action for either
transaction. Schedules which contain either an ABORT or COMMIT action
for each transaction whose actions are listed in it are called a complete
schedule. If the actions of different transactions are not interleaved, that is,
transactions are executed one by one from start to finish, the schedule is
called a serial schedule. A non-serial schedule is a schedule where the
operations from a group of concurrent transactions are interleaved.
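As a small illustration in the Read/Write notation of this chapter (the particular operations are chosen only for illustration), a serial schedule executes the transactions one after the other, for example
R1(A); W1(A); R1(B); W1(B); R2(A); W2(A);
whereas a non-serial schedule interleaves their operations, for example
R1(A); W1(A); R2(A); W2(A); R1(B); W1(B).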
A lock is a variable associated with a data item that describes the status of
the item with respect to possible operations that can be applied to it. It
prevents access to a database record by a second transaction until the first
transaction has completed all of its actions. Generally, there is one lock for
each data item in the database. Locks are used as means of synchronising the
access by concurrent transactions to the database items. Thus, locking
schemes aim to allow the concurrent execution of compatible operations. In
other words, permutable actions are compatible. Locking is the most widely
used form of concurrency control and is the method of choice for most
applications. Locks are granted and released by a lock manager. The
principal data structure of a lock manager is the lock table. In the lock table,
an entry consists of a transaction identifier, a granule identifier and a lock
type. The simplest locking scheme has two types of locks, namely
(a) S locks (shared or Read locks) and (b) X locks (exclusive or Write locks). The
lock manager refuses incompatible requests, so that:
a. If transaction T1 holds an S lock on granule G1, a request by transaction T2 for an S lock on
G1 will be granted. In other words, Read-Read is permutable.
b. If transaction T1 holds an S lock on granule G1, a request by transaction T2 for an X lock on
G1 will be refused. In other words, Read-Write is not permutable.
c. If transaction T1 holds an X lock on granule G1, no request by transaction T2 for a lock on
G1 will be granted. In other words, Write-Read and Write-Write are not permutable.
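Many SQL products expose these lock types directly through explicit table-level lock statements; as a rough illustration (the syntax below follows the LOCK TABLE statement of Oracle and DB2, and details vary by product):
LOCK TABLE EMPLOYEE IN SHARE MODE;      -- S lock: other readers are allowed, writers must wait
LOCK TABLE EMPLOYEE IN EXCLUSIVE MODE;  -- X lock: all other transactions must wait
The locks are released automatically when the requesting transaction commits or rolls back.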
The granularity of the data items, that is, what portion of the database a
single data item represents, also affects concurrency control. An item can be as
small as a single attribute (or field) value or as large as a disk block, or even
a whole file or the entire database.
Table 12.8 illustrates the binary locking technique for the example of the lost
update (Section 12.3.1). It can be observed from the table that the lock
and unlock features eliminate the lost update problem depicted in Table
12.3. A binary locking system has the advantage of being easy to implement.
However, the binary locking technique is too restrictive to yield
optimal concurrency. For example, the DBMS will not allow
two transactions to read the same database object concurrently, even though neither
transaction updates the database and therefore no concurrency problem, such
as a lost update, can occur.
12.4.3 Deadlocks
A deadlock is a condition in which two (or more) transactions in a set are
waiting simultaneously for locks held by some other transaction in the set.
Neither transaction can continue because each transaction in the set is on a
waiting queue, waiting for one of the other transactions in the set to release
the lock on an item. Thus, a deadlock is an impasse that may result when
two or more transactions are each waiting for locks to be released that are
held by the other. Transactions whose lock requests have been refused are
queued until the lock can be granted. A deadlock is also called a circular
waiting condition where two transactions are waiting (directly or indirectly)
for each other. Thus in a deadlock, two transactions are mutually excluded
from accessing the next record required to complete their transactions, also
called a deadly embrace. A deadlock exists when two transactions T1 and T2
exist in the following mode:
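For example, T1 holds a lock on data item X and requests a lock on Y, while T2 holds the lock on Y and requests a lock on X, so that each waits for the other indefinitely. A minimal SQL sketch of such a circular wait (the ACCOUNT table and its values are assumptions for illustration) is:
-- Transaction T1 locks row 'A':
UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE ACC-NO = 'A';
-- Transaction T2 locks row 'B':
UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE ACC-NO = 'B';
-- Transaction T1 now requests row 'B' and must wait for T2:
UPDATE ACCOUNT SET BALANCE = BALANCE + 100 WHERE ACC-NO = 'B';
-- Transaction T2 now requests row 'A' and must wait for T1: a deadlock (circular wait) results
UPDATE ACCOUNT SET BALANCE = BALANCE + 100 WHERE ACC-NO = 'A';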
Fig. 12.10 Schedule with strict two-phase locking with interleaved actions
Fig. 12.11 Waits-for graph for deadlocks involving two or more transactions
In the complex situation of Fig. 12.11 (b), two alternatives can be used
such as (a) to minimize the amount of work done by the transactions to be
aborted or (b) to find the minimal cut-set of the graph and abort the
corresponding transactions.
REVIEW QUESTIONS
1. What is a transaction? What are its properties? Why are transactions important units of
operation in a DBMS?
2. Draw a state diagram and discuss the typical states that a transaction goes through during
execution.
3. How does the DBMS ensure that the transactions are executed properly?
4. What is consistent database state and how is it achieved?
5. What is transaction log? What are its functions?
6. What are the typical kinds of records in a transaction log? What are transaction commit
points and why are they important?
7. What is a schedule? What does it do?
8. What is concurrency control? What are its objectives?
9. What do you understand by the concurrent execution of database transactions in a multi-user
environment?
10. What do you mean by atomicity? Why is it important? Explain with an example.
11. What do you mean by consistency? Why is it important? Explain with an example.
12. What do you mean by isolation? Why is it important? Explain with an example.
13. What do you mean by durability? Why is it important? Explain with an example.
14. What are transaction states?
15. A hospital blood bank transaction system is given which records the following information:
16. Discuss the transition execution state with a state transition diagram and related problems.
17. What are ACID properties of a database transaction? Discuss each of these properties and
how they relate to the concurrency control. Give examples to illustrate your answer.
18. Explain the concepts of serial, non-serial and serialisable schedules. State the rules for
equivalence of schedules.
19. Explain the distinction between the terms serial schedule and serialisable schedule.
20. What is locking? What is the relevance of lock in database management system? How does a
lock work?
21. What are the different types of locks?
22. What is deadlock? How can a deadlock be avoided?
23. Discuss the problems of deadlock and the different approaches to dealing with these
problems.
24. Consider the following two transactions:
T1 : Read (A)
Read (B)
If A = 0 then B := B + 1
Write (B).
T2 : Read (B)
Read (A)
If B = 0 then A := A + 1
Write (A).
a. Add lock and unlock instructions to transactions T1 and T2 , so that they observe
the two-phase locking protocol.
b. Can the execution of these transactions result in a deadlock?
25. Compare binary locks to shared/exclusive locks. Why is the former type of locks preferable?
26. Discuss the actions taken by Read_item and Write_item operations on a database.
27. Discuss how serialisability is used to enforce concurrency control in a database system. Why
is serialisability sometimes considered too restrictive as a measure of correctness for
schedules?
28. Describe the four levels of transaction concurrency.
29. Define the violations caused by the following:
a. Lost updates.
b. Dirty read (or uncommitted data).
c. Unrepeatable read (or inconsistent retrievals).
30. Describe the wait-die and wound-wait techniques for deadlock prevention.
31. What is a timestamp? How does the system generate timestamp?
32. Discuss the timestamp ordering techniques for concurrency control.
33. When a transaction is rolled back under timestamp ordering, it is assigned a new timestamp.
Why can it not simply keep its old timestamp?
34. How do optimistic concurrency control methods differ from other concurrency control
methods? Why are they also called validation or certification methods?
35. How does the granularity of data items affect the performance of concurrency control
methods? What factors affect selection of granularity size of data items?
36. What is serialisability? What is its objective?
37. Using an example, illustrate how two-phase locking works.
38. Two transactions are said to be serialisable if they can be executed in parallel (interleaved) in
such a way that their results are identical to that achieved if one transaction was processed
completely before the other was initiated. Consider the following two interleaved
transactions, and suppose a consistency condition requires that data items A or B must always
be equal to 1. Assume that A = B = 1 before these transactions execute.
Transaction T1 Transaction T2
Read_item(A)
Read_item(B)
Read_item(A)
Read_item(B)
If A = 1
then B := B + 1
If B = 1
then A := A + 1
Write_item(A)
Write_item(B)
a. Will the consistency requirement be satisfied? Justify your answer.
b. Is there an interleaved processing schedule that will guarantee serialisability? If so,
demonstrate it. If not, explain why?
39. Assuming a transaction log with immediate updates, create the log entries corresponding to
the following transaction actions:
40. Suppose that in Question 1 a failure occurs just after the transaction log record for the action
write (B, b1) has been written.
Transaction    Data items locked by transaction    Data items transaction is waiting for
T1             X2                                  X1, X3
T2             X3, X10                             X7, X8
T3             X8                                  X4, X5
T4             X7                                  X1
T5             X1, X5                              X3
T6             X4, X9                              X6
T7             X6                                  X5
STATE TRUE/FALSE
1. The transaction consists of all the operations executed between the beginning and end of the
transaction.
2. A transaction is a program unit, which can either be embedded within an application program
or can be specified interactively via a high-level query language such as SQL.
3. The changes made to the database by an aborted transaction should be reversed or undone.
4. A transaction that is either committed or aborted is said to be terminated.
5. Atomic transaction is transactions in which either all actions associated with the transaction
are executed to completion, or none are performed.
6. The effects of a successfully completed transaction are permanently recorded in the database
and must not be lost because of a subsequent failure.
7. Level 0 transactions are recoverable.
8. Level 1 transaction is the minimum consistency requirement that allows a transaction to be
recovered in the event of system failure.
9. Log is a record of all transactions and the corresponding changes to the database.
10. Level 2 transaction consistency isolates from the updates of other transactions.
11. The DBMS automatically updates the transaction log while executing transactions that modify
the database.
12. A committed transaction that has performed updates transforms the database into a new
consistent state.
13. The objective of concurrency control is to schedule or arrange the transactions in such a way
as to avoid any interference.
14. Incorrect analysis problem is also known as dirty read or unrepeatable read.
15. A consistent database state is one in which all data integrity constraints are satisfied.
16. The serial execution always leaves the database in a consistent state although different results
could be produced depending on the order of execution.
17. Cascading rollbacks are not desirable.
18. Locking and timestamp ordering are optimistic techniques, as they are designed based on the
assumption that conflict is rare.
19. Two types of locks are Read and Write locks.
20. In the two-phase locking, every transaction is divided into (a) growing phase and (b)
shrinking phase.
21. A dirty read problem occurs when one transaction updates a database item and then the
transaction fails for some reason.
22. The size of the locked item determines the granularity of the lock.
23. There is no deadlock in the timestamp method of concurrency control.
24. A transaction that changes the contents of the database must alter the database from one
consistent state to another.
25. A transaction is said to be in committed state if it has partially committed, and it can be
ensured that it will never be aborted.
26. Level 3 transaction consistency adds consistent reads so that successive reads of a record will
always give the same values.
27. A lost update problem occurs when two transactions that access the same database items
have their operations interleaved in a way that makes the value of some database item incorrect.
28. Serialisability describes the concurrent execution of several transactions.
29. Unrepeatable read occurs when a transaction calculates some summary function over a set of
data while other transactions are updating the data.
30. A lock prevents access to a database record by a second transaction until the first transaction has
completed all of its actions.
31. In a shrinking phase, a transaction releases all locks and cannot obtain any new lock.
32. A deadlock in a distributed system may be either local or global.
1. Which of the following is the activity of coordinating the actions of process that operate in
parallel and access shared data?
a. Transaction management
b. Recovery management
c. Concurrency control
d. None of these.
2. Which of the following is the ability of a DBMS to manage the various transactions that
occur within the system?
a. Transaction management
b. Recovery management
c. Concurrency control
d. None of these.
a. Isolation
b. Durability
c. Atomicity
d. All of these.
4. Which of the following ensures the consistency of the transactions?
a. Application programmer
b. Concurrency control
c. Recovery management
d. Transaction management.
a. Application programmer
b. Concurrency control
c. Recovery management
d. Transaction management.
a. Application programmer
b. Concurrency control
c. Recovery management
d. Transaction management.
a. Application programmer
b. Concurrency control
c. Recovery management
d. Transaction management.
a. Active
b. Commit
c. Aborted
d. All of these.
a. lost updates
b. dirty read
c. unrepeatable read
d. all of these.
11. Which of the following is not a transaction management SQL command?
a. COMMIT
b. SELECT
c. SAVEPOINT
d. ROLLBACK.
12. Which of the following is a statement after which you cannot issue a COMMIT command?
a. INSERT
b. SELECT
c. UPDATE
d. DELETE.
a. uniqueness.
b. monotonicity.
c. both (a) and (b).
d. none of these.
a. validation
b. write
c. read
d. all of these.
15. The READ and WRITE operations of database within the same transaction must have
a. same timestamp.
b. different timestamp.
c. no timestamp.
d. none of these.
16. Which of the following is a transaction state when the normal execution of the transaction
cannot proceed?
a. Failed
b. Active
c. Terminated
d. Aborted.
a. Page level.
b. Database level.
c. Row level.
d. all of these.
a. Recovery
b. Compensating transaction
c. Rollback
d. None of these.
20. Which of the following is the size of the data item chosen as the unit of protection by a
concurrency control program?
a. Blocking factor
b. Granularity
c. Lock
d. none of these.
a. Read_item(X).
b. Write_item(X).
c. both (a) & (b).
d. none of these.
22. Which of the following is a problem resulting from concurrent execution of transaction?
a. Incorrect analysis
b. Multiple update
c. Uncommitted dependency
d. all of these.
a. Timeout
b. Deadlock annihilation
c. Deadlock prevention
d. Deadlock detection.
24. In which of the following schedule are the transactions performed one after another, one at a
time?
a. Non-serial schedule
b. Conflict serialisable schedule
c. Serial schedule
d. None of these.
25. A shared lock exists when concurrent transactions are granted the following access on the
basis of a common lock:
a. READ
b. WRITE
c. SHRINK
d. UPDATE.
a. by locking data
b. without unlocking any data
c. with unlocking any data
d. None of these.
a. Validation-based
b. Timestamp ordering
c. Lock-based
d. None of these.
29. In optimistic methods, each transaction moves through the following phases:
a. read phase
b. validation phase
c. write phase
d. All of these.
13.1 INTRODUCTION
Concurrency control and database recovery are intertwined and both are a
part of the transaction management. Recovery is required to protect the
database from data inconsistencies and data loss. It ensures the atomicity and
durability properties of transactions as discussed in chapter 12, Section
12.2.3. This characteristic of the DBMS helps it to recover from failures and
restore the database to a consistent state. It minimises the time for which the
database is not usable after a crash and thus provides high availability. The
recovery system is an integral part of a database system.
In this chapter, we will discuss database recovery and examine the
techniques that can be used to ensure that the database remains in a consistent
state in the event of failures. We will finally examine the buffer management
method used for database recovery.
There are many types of failures that can affect database processing. Some
failures affect the main memory only, while others involve secondary
storage. Following are the types of failure:
Hardware failures: Hardware failures may include memory errors, disk crashes, bad disk
sectors, disk full errors and so on. Hardware failures can also be attributed to design errors,
inadequate (poor) quality control during fabrication, overloading (use of under-capacity
components) and wearout of mechanical parts.
Software failures: Software failures may include failures related to software such as the
operating system, DBMS software, application programs and so on.
System crashes: System crashes are due to hardware or software errors, resulting in the loss
of main memory. There could be a situation that the system has entered an undesirable state,
such as deadlock, which prevents the program from continuing with normal processing. This
type of failure may or may not result in corruption of data files.
Network failures: Network failures can occur while using a client-server configuration or a
distributed database system where multiple database servers are connected by common
networks. Network failures such as communication software failures or aborted
asynchronous connections will interrupt the normal operation of the database system.
Media failures: Such failures are due to head crashes or unreadable media, resulting in the
loss of parts of secondary storage. They are the most dangerous failures.
Application software errors: These are logical errors in the program that is accessing the
database, which cause one or more transactions to fail.
Natural physical disasters: These are failures such as fires, floods, earthquake or power
failures.
Carelessness: These are failures due to unintentional destruction of data or facilities by
operators or users.
Sabotage: These are failures due to intentional corruption or destruction of data, hardware, or
software facilities.
In the event of failure, there are two principal effects that happen, namely
(a) loss of main memory including the database buffer and (b) the loss of the
disk copy (secondary storage) of the database. Depending on the type and
the extent of the failure, the recovery process ranges from a minor short-term
inconvenience to major long-term rebuild action. Regardless of the extent of
the required recovery process, recovery is not possible without backup.
Example 1
Roll-backward (undo) and roll forward (redo) can be explained with an
example as shown in Fig. 13.3 in which there are a number of concurrently
executing transactions T1, T2, ……, T6. Now, let us assume that the DBMS
starts execution of transactions at time ts but fails at time tf due to disk crash
at time tc. Let us also assume that the data for transactions T2 and T3 has
already been written to the disk (secondary storage) before failure at time tf.
It can be observed from Fig. 13.3 that transactions T1 and T6 had not
committed at the point of the disk crash. Therefore, the recovery manager
must undo the transactions T1 and T6 at the start. However, it is not clear
from Fig. 13.3 to what extent the changes made by the already
committed transactions T2, T3, T4 and T5 have been propagated to the database on
secondary storage. This uncertainty arises because the buffers may or may
not have been flushed to secondary storage. Thus, the recovery manager
would be forced to redo transactions T2, T3, T4 and T5.
Fig. 13.3 Example of roll backward (undo) and roll forward (redo)
Example 2
Let us consider another example in which a transaction log operation
history is given as shown in Table 13.1. Besides the operation history, log
entries are listed that are written into the log buffer memory (resident in
main or physical memory) for the database recovery. The second transaction
operation W1 (A, 20) in Table 13.1 is assumed to represent an update by
transaction T1, changing the balance column value to 20 for a row in the
accounts table with ACCOUNT-ID = A. Similarly, in the write log entry (W,
1, A, 50, 20), the value 50 is the before image for the balance column in this
row and 20 is the after image for this column. Now, let us assume that a
system crash occurs immediately after the operation W1 (B, 80) has
completed, in the sequence of events of Table 13.1. This means that the log
entry (W, 1, B, 50, 80) has been placed in the log buffer, but the last point at
which the log buffer was written out to disk was with the log entry (C, 2).
This is the final log entry that will be available when recovery is started to
recover from the crash. At this time, since transaction T2 has committed
while transaction T1 has not, we want to make sure that all updates
performed by transaction T2 are placed on disk and that all updates performed
by transaction T1 are rolled back on disk. The final values for these data
items after recovery has been performed should be A = 50, B = 50, and C =
50, which are the values these items held just before the history of Table 13.1 began.
After the crash, the system is reinitialised and a command is given to initiate
database recovery. The process of recovery takes place in two phases, namely
(a) roll backward or ROLLBACK and (b) roll forward or ROLL
FORWARD. In the ROLLBACK phase, the entries in the sequential log file
are read in reverse order back to system start-up, when all data access
activity began. We assume that the system start-up happened just before the
first operation R1 (A, 50) of transaction history. In the ROLL FORWARD
phase, the entries in the sequential log file are read forward again to the last
entry. During the ROLLBACK step, recovery performs UNDO of all the
updates that should not have occurred, because the transaction that made
them did not commit. It also makes a list of all transactions that have
committed. We have assumed here that the ROLLBACK phase occurs first
and the ROLL FORWARD phase afterward, as is the case in most of the
commercial DBMSs such as DB2, System R of IBM.
Table 13.2 ROLLBACK process for transaction history crashed just after Wl (B, 80)
4. (W, 1, A, 50, 20) Transaction T1 has never committed. Its last operation
was a write. Therefore, system performs UNDO of
this update by writing the before image value (50)
into data item A. Put transaction T1 into the
uncommitted list.
5. (S, 1) Make a note that transaction T1 is no longer active.
Now that no transactions were active, ROLLBACK
phase is ended.
Tables 13.2 and 13.3 list all the log entries encountered and the actions
taken during the ROLLBACK and ROLL FORWARD phases of recovery. It is
to be noted that the steps of ROLLBACK are numbered on the left and the
numbering is continued during the ROLL FORWARD phase of Table 13.3.
During ROLLBACK the system reads backward through the log entries of
the sequential log file and makes a list of all transactions that did and did not
commit. The list of committed transactions is used in the ROLL
FORWARD, but the list of transactions that did not commit is used to decide
when to UNDO updates. Since the system knows which transactions did not
commit as soon as it encounters (reading backward) the final log entry, it can
immediately begin to UNDO write log changes of uncommitted transactions
by writing before images onto disk over the row values affected. Disk
buffering is used during recovery to read in pages containing rows that need
to be updated by UNDO or REDO steps. An example of UNDO write is
shown in step 4 of table 13.2. Since the transaction responsible for the write
log entry did not commit, it should not have any transactional updates out on
disk. It is possible that some values given in the after images of these write
log entries are not out on disk. But, in any event it is clear that writing the
before images in place of these data items cannot hurt. Eventually, we return
to the value such data items had before any uncommitted transactions tried
to change them.
Table 13.3 ROLL FORWARD process for transaction history taking place after ROLLBACK of
table 13.2
Database recovery techniques used by DBMS depend on the type and extent
of damage that has occurred to the database. These techniques are based on
the atomic transaction property. All portions of transactions must be treated
as a single logical unit of work, in which all operations must be applied and
completed to produce a consistent database. The following two types of
damages can take place to the database:
a. Physical damage: If the database has been physically damaged, for example disk crash has
occurred, then the last backup copy of the database is restored and update operations of
committed transactions are reapplied using the transaction log file. It is to be noted that the
restoration in this case is possible only if the transaction log has not been damaged.
b. Non-physical or Transaction failure: If the database has become inconsistent due to a
system crash during execution of transactions, then the changes that caused the inconsistency
are rolled-backward (undo). It may also be necessary to roll-forward (redo) some transactions
to ensure that the updates performed by them have reached secondary storage. In this case,
the database is restored to a consistent state using the before- and after-images held in the
transaction log file. This technique is also known as log-based recovery technique. The
following two techniques are used for recovery from nonphysical or transaction failure:
Deferred update.
Immediate update.
Time-4: READ (B, b1) — read the current loan cash balance.
Time-5: b1 := b1 − 20000 — reduce the loan cash balance by INR 20000.
Two failure points are considered for this transaction: (a) a failure occurring
just after the COMMIT record is entered in the transaction log and before the updated
records are written to the database, and (b) a failure occurring
just before the execution of the WRITE operation.
Table 13.6 shows the transaction log when a failure has occurred just after
the "<T, COMMIT>" record is entered in the transaction log and before the
updated records are written to the database. When the system comes back
up, the COMMIT record for transaction T appears in the transaction log, but
the updated values may not have reached the database. The REDO operation
is therefore executed, resulting in the values INR 90000 and INR 60000
being written to the database as the updated values of A and B.
Table 13.6 Deferred update log entries for transaction T after failure occurrence and updates are
written to the database
Table 13.7 shows the transaction log when a failure has occurred just
before the execution of the write operation “WRITE (B, b1)”. When the
system comes back up, no action is necessary because no COMMIT record
for transaction T appears in the transaction log. The values of A and B in the
database remain INR 70000 and INR 80000 respectively. In this case, the
transaction must be restarted.
Table 13.7 Deferred update log entries for transaction T when failure occurs before the WRITE
action to the database
Therefore, using the transaction log, the DBMS can handle any failure
that does not involve loss of the log information itself. The loss of the
transaction log is prevented by keeping parallel backups (replicas) of the
transaction log on more than one disk (secondary storage). Since the
probability of losing all copies of the transaction log is very small, this
arrangement is usually referred to as stable storage.
13.5.2 Immediate Update
In case of immediate update technique, all updates to the database are
applied immediately as they occur without waiting to reach the COMMIT
point and a record of all changes is kept in the transaction log. As discussed
in the previous case of deferred update, if a failure occurs, the transaction
log is used to restore the state of the database to a consistent previous state.
Similarly, in immediate update, when a transaction begins, a record "<T,
BEGIN>" is written to the transaction log, and each update operation is
written to the transaction log on disk before it is applied to the database.
This type of recovery method requires two procedures, namely (a) redoing
transaction T (REDO(T)) and (b) undoing transaction T (UNDO(T)). The
first procedure redoes the same operations as before, whereas the second one
restores the values of all attributes updated
by transaction T to their old values. Table 13.8 shows the entries in the
transaction log after the execution of transaction T. After a failure has
occurred, the recovery system examines the transaction log to identify those
transactions that need to be undone or redone.
In the case of immediate update, the transaction log file is used in the
following ways:
When a transaction T begins, transaction begin (or “<T, BEGIN>”) is written to the
transaction log.
When a write operation is performed, a record containing the necessary data is written to the
transaction log file.
Once the transaction log is written, the update is written to the database buffers.
The updates to the database itself are written when the buffers are next flushed (transferred)
to secondary storage.
When the transaction T commits, a transaction commit (“<T, COMMIT>”) record is written
to the transaction log.
If the transaction log reveals the record “<T, BEGIN>” but does not reveal “<T, COMMIT>”,
transaction T is undone. The old values of affected data items are restored and transaction T
is restarted.
If the transaction log contains both of the preceding records, transaction T is redone. The
transaction is not restarted.
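As a hedged sketch of what the log records of Table 13.8 look like for this example (using the opening balances quoted in the text, A = INR 70000 and B = INR 80000, and the updated values INR 90000 and INR 60000; the exact record layout varies between systems), the immediate-update log for transaction T would contain entries of roughly the following form:
<T, BEGIN>
<T, A, 70000, 90000>   (before image 70000, after image 90000)
<T, B, 80000, 60000>   (before image 80000, after image 60000)
<T, COMMIT>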
Table 13.9 Immediate update log entries for transaction T when failure occurs before the WRITE
action to the database
Table 13.9 shows the transaction log when a failure has occurred just
before the execution of the write operation “WRITE (B, b1)” of table 13.4.
When the system comes back up, it finds the record “<T, BEGIN>” but no
corresponding “<T, COMMIT>”. This means that the transaction T must be
undone. Thus, an UNDO(T) operation is executed. This restores the value
of A to INR 70000 and the transaction can be restarted.
Table 13.10 shows the transaction log when a failure has occurred just
after the execution of “<T, COMMIT>” is written to the transaction log but
before the new values are written to the database. When the system comes
back again, a scan of the transaction log finds corresponding "<T,
BEGIN>" and "<T, COMMIT>" records. Thus, a REDO(T) operation is
executed. This results in the values of A and B being INR 90000 and INR
60000 respectively.
The shadow paging scheme is similar to the one which is used by the
operating system for virtual memory management. In case of virtual memory
management, the memory is divided into pages that are assumed to be of a
certain size (in terms of bytes, kilobytes, or megabytes). The virtual or
logical pages are mapped onto physical memory blocks of the same size as
the pages. The mapping is provided by means of a table known as page
table, as shown in Fig. 13.4. The page table contains one entry for each
logical page of the process’s virtual address space.
The shadow paging technique maintains two page tables during the life of
a transaction namely (a) a current page table and (b) a shadow page table,
for a transaction that is going to modify the database. Fig. 13.5 shows
the shadow paging scheme. The shadow page table is the original page table, and the
transaction addresses the database using the current page table. At the start of a
transaction the two tables are same and both point to the same blocks of
physical storage. The shadow page table is never changed thereafter, and is
used to restore the database in the event of a system failure. However,
current page table entries may change during execution of a transaction. The
current page table is used to record all updates to the database. When the
transaction completes, the current page table becomes the shadow page
table.
As shown in Fig. 13.5, the pages that are affected by a transaction are
copied to new blocks of physical storage and these blocks, along with the
blocks not modified, are accessible to the transaction via the current page
table. The old version of the changed pages remains unchanged and these
pages continue to be accessible via the shadow page table. The shadow page
table contains the entries that existed in the page table before the start of the
transaction and points to the blocks that were never changed by the
transaction. The shadow page table remains unaltered by the transaction and
is used for undoing the transaction.
13.5.4 Checkpoints
The point of synchronisation between the database and the transaction log
file is called the checkpoint. As explained in the preceding discussions, the
general method of database recovery uses information in the transaction
log. But the main difficulty of this recovery is knowing how far back in the
transaction log to search in case of failure. In the absence of this
exact information, we may end up redoing transactions that have already
been safely written to the database. Also, this can be very time-consuming
and wasteful. A better way is to find a point that is sufficiently far back to
ensure that any item written before that point has been done correctly and
stored safely. This method is called checkpointing. In checkpointing, all
buffers are force-written to secondary storage. The checkpoint technique is
used to limit (a) the volume of log information, (b) amount of searching and
(c) subsequent processing that is needed to carry out on the transaction log
file. The checkpoint technique is an additional component of the transaction
logging method.
During execution of transactions, the DBMS maintains the transaction log
as we have described in the preceding sections but periodically performs
checkpoints. Checkpoints are scheduled at predetermined intervals and
involve the following operations:
Writing the start-of-checkpoint record along with the time and date to the log on a stable
storage device giving the identification that it is a checkpoint.
Writing all transaction log file records in main memory to secondary storage.
Writing the modified blocks in the database buffers to secondary storage.
Writing a checkpoint record to the transaction log file. This record contains the identifiers of
all transactions that are active at the time of the checkpoint.
Writing an end-of-checkpoint record and saving of the address of the checkpoint record on a
file accessible to the recovery routine on start-up after a system crash.
For all transactions active at the checkpoint, their identifiers and their database
modification actions, which at that time are reflected only in the database
buffers, are propagated to the appropriate storage. The frequency of
checkpointing is a design consideration of the database recovery system. A
checkpoint can be taken at a fixed interval of time (for example, every 15
minutes, or 30 minutes or one hour and so on).
In case of a failure during the serial operation of transactions, the
transaction log file is checked to find the last transaction that started before
the last checkpoint. Any earlier transactions would have committed
previously and would have written to the database at the checkpoint.
Therefore, it is only necessary to redo (a) the transaction that was active at the
checkpoint and (b) any subsequent transactions for which both start and
commit records appear in the transaction log. If a transaction is active at the
time of failure, the transaction must be undone. If transactions are performed
concurrently, redo all transactions that have committed since the checkpoint
and undo all transactions that were active at the time of failure.
Fig. 13.6 Example of checkpointing
Let us assume that a transaction log is used with immediate updates. Also,
consider that the timeline for transaction T1, T2, T3 and T4 are as shown in
Fig. 13.6. When the system fails at time tf, the transaction log need only be
scanned as far back as the most recent checkpoint tc. Transaction T1 is okay,
unless there has been a disk failure that destroyed it and probably other
records prior to the last checkpoint. In that case, the database is reloaded
from the backup copy that was made at the last checkpoint. In either case,
transactions T2 and T3 are redone from the transaction log, and transaction
T4 is undone from the transaction log.
REVIEW QUESTIONS
1. Discuss the different types of transaction failures that may occur in a database environment.
2. What is database recovery? What is meant by forward and backward recovery? Explain with
an example.
3. How does the recovery manager ensure atomicity and durability of transactions?
4. What is the difference between stable storage and disk?
5. Describe how the transaction log file is a fundamental feature in any recovery mechanism.
6. What is the difference between a system crash and media failure?
7. Describe how transaction log file is used in forward and backward recovery.
8. Explain with the help of examples why it is necessary to store transaction log records in a
stable storage before committing that transaction when immediate update is allowed.
9. What can be done to recover the modifications made by partially completed transactions that
are running at the time of a system crash? Can on-line transaction be recovered?
10. What are the types of damages that can take place to the database? Explain.
11. Differentiate between immediate update and deferred update recovery techniques.
12. Assuming a transaction log with immediate updates, create log entries corresponding to the
transactions as shown in Table 13.11 below.
13. Suppose that in Question 12 a failure occurs just after the transaction log record for the action
WRITE (B, b1) has been written.
14. Suppose that in Question 12 a failure occurs just after the “<T, COMMIT>” record is written
to the transaction log.
15. Consider the entries shown in Table 13.12 at the time of database system failure in the
recovery log.
a. Assuming a deferred update log, describe for each case (A, B, C) what recovery
actions are necessary and why. Indicate what are the values for the given attributes
after the recovery actions are completed.
<T1, COMMIT>
b. Assuming an immediate update log, describe for each case (A, B, C) what recovery
actions are necessary and why. Indicate what are the values for the given attributes
after the recovery actions are completed.
16. What is a checkpoint? How is the checkpoint information used in the recovery operation
following a system crash?
17. Describe the shadow paging recovery technique. Under what circumstances does it not
require a transaction log? List the advantages and disadvantages of shadow paging.
18. What is a buffer? Explain the buffer management technique used in database recovery.
STATE TRUE/FALSE
1. Concurrency control and database recovery are intertwined and both are a part of the
transaction management.
2. Database recovery is a service that is provided by the DBMS to ensure that the database is
reliable and remains in consistent state in case of a failure.
3. Database recovery is the process of restoring the database to a correct (consistent) state in the
event of a failure.
4. Forward recovery is the recovery procedure, which is used in case of physical damage.
5. Backward recovery is the recovery procedure, which is used in case an error occurs in the
midst of normal operation on the database.
6. Media failures are the most dangerous failures.
7. Media recovery is performed when there is a head crash (record scratched by a phonograph
needle) on the disk.
8. The recovery process is closely associated with the operating system.
9. Shadow paging technique does not require the use of a transaction log in a single-user
environment
10. In shadowing both the before-image and after-image are kept on the disk, thus avoiding the
need for a transaction log for the recovery process.
11. The REDO operation updates the database with new values (after-image) that is stored in the
log.
12. The REDO operation copies the old values from log to the database, thus restoring the
database prior to a state before the start of the transaction.
13. In case of deferred update technique, updates are not written to the database until after a
transaction has reached its COMMIT point.
14. In case of an immediate update technique, all updates to the database are applied immediately
as they occur, without waiting to reach the COMMIT point, and a record of all changes is kept
in the transaction log.
15. A checkpoint is a point of synchronisation between the database and the transaction log file.
16. In checkpointing, all buffers are force-written to secondary storage.
17. The deferred update technique is also known as the UNDO/REDO algorithm.
18. Shadow paging is a technique where transaction log are not required.
19. Recovery restores a database from a given state, usually inconsistent, to a previously
consistent state.
20. The assignment and management of memory blocks is called the buffer manager.
a. Shadow paging.
b. Deferred update.
c. Write-ahead logging.
d. Immediate update.
2. Incremental logging with deferred updates implies that the recovery system must necessarily
store
a. Transaction log
b. Physical backup
c. Logical backup
d. None of these.
4. In case of transaction failure under a deferred update incremental logging scheme, which of
the following will be needed:
a. An undo operation
b. A redo operation
c. Both undo and redo operations
d. None of these.
a. Operations
b. Design
c. Physical
d. None of these.
6. For incremental logging with immediate updates, a transaction log record would contain
a. a transaction name, data item name, old value of item and new value of item.
b. a transaction name, data item name, old value of item.
c. a transaction name, data item name and new value of item.
d. a transaction name and data item name.
a. Hardware
b. Network
c. Media
d. Software.
8. When a failure occurs, the transaction log is referred and each operation is either undone or
redone. This is a problem because
a. memory errors.
b. disk crashes.
c. disk full errors.
d. All of these.
a. operating system.
b. DBMS software.
c. application programs.
d. All of these.
11. Which of the following is a facility provided by the DBMS to assist the recovery process?
a. Recovery manager
b. Logging facilities
c. Backup mechanism
d. All of these.
13. When using a transaction log based recovery scheme, it might improve performance as well
as providing a recovery mechanism by
a. writing the appropriate log records to disk during the transaction’s execution.
b. writing the log records to disk when each transaction commits.
c. never writing the log records to disk.
d. waiting to write the log records until multiple transactions commit and writing them
as a batch.
a. Shadow paging
b. Immediate update
c. Deferred update
d. None of these.
a. Shadow paging
b. Immediate update
c. Deferred update
d. None of these.
17. If the shadowing approach is used for flushing a data item back to disk, then the item is
written to
a. Lorie
b. Codd
c. IBM
d. Boyce.
21. Which of the following recovery techniques does not need logs?
a. Shadow paging
b. Immediate update
c. Deferred update
d. None of these.
a. in a different building.
b. protected against danger such as fire, theft, flood.
c. other potential calamities.
d. All of these.
Database Security
14.1 INTRODUCTION
The goal of database security is the protection of data against threats such as
accidental or intentional loss, destruction or misuse. These threats pose
problems to the database integrity and access. Threats may be defined as any
situation or event, whether intentional or accidental, that may adversely
affect a system and consequently the organisation. A threat may be caused
by a situation or event involving a person, action or circumstances that are
likely to harm the organisation. The harm may be tangible, such as loss of
hardware, software, or data. The harm could be intangible, such as loss of
credibility or client confidence in the organisation. Database security
involves allowing or disallowing users from performing actions on the
database and the objects within it, thus protecting the database from abuse or
misuse.
The database administrator (DBA) is responsible for the overall security
of the database system. Therefore, the DBA of an organisation must identify
the most serious threats and enforce security to take appropriate control
actions to minimise these threats. Any individual user (a person) or a user
group (group of persons) needing to access the database system applies to the DBA
for a user account. The DBA then creates an account number and password
for the user to access the database on the basis of legitimate need and the policy of
the organisation. The user afterwards logs in to the DBMS using the given
account number and password whenever database access is needed. The
DBMS checks for the validity of the user’s entered account number and
password. Then the valid user is permitted to use the DBMS and access the
database. The DBMS maintains these two fields of user account number and
password in an encrypted table, appending a new record to the table
whenever a new account is created. When the
account is cancelled, the corresponding record is deleted from the encrypted
table.
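As an illustrative sketch only (not part of the original text), creating such an account in Oracle-style SQL might look like the following; the user name and password here are purely hypothetical:
CREATE USER MATHEW IDENTIFIED BY tempPass123;
GRANT CREATE SESSION TO MATHEW;
When the account is cancelled, the DBA would remove it with a corresponding statement such as DROP USER MATHEW;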
Loss of availability means that the data, or the system, or both cannot be
accessed by the users. This situation can arise due to sabotage of hardware,
networks or applications. The loss of availability can seriously cause
operational difficulties and affect the financial performance of an
organisation. Almost all organisations are now seeking virtually continuous
operation, the so called 24 × 7 operations, that is, 24 hours a day and seven
days a week.
Loss of data integrity causes invalid or corrupted data, which may
seriously affect the operation of an organisation. Unless data integrity is
restored through established backup and recovery procedures, an
organisation may suffer serious losses or make incorrect and expensive
decisions based on the wrong or invalid data.
Loss of confidentiality refers to loss of protecting or maintaining secrecy
over critical data of the organisation, which may have strategic value to the
organisation. Loss of confidentiality could lead to loss of competitiveness.
Loss of privacy refers to loss of protecting data from individuals. Loss of
privacy could lead to blackmail, bribery, public embarrassment, stealing of
user passwords or legal action being taken against the organisation.
Theft and fraud affect not only the database environment but also the
entire organisation. Since these situations are related to the involvement of
people, attention should be given to reducing the opportunity for the occurrence
of these activities. For example, control of physical security, so that
unauthorised personnel are not able to gain access to the computer room,
should be established. Another example of a security procedure could be
establishment of a firewall to protect from unauthorised access to
inappropriate parts of the database through outside communication links.
This will hamper people who are intent on theft or fraud. Theft and fraud do
not necessarily alter data, as is the case for loss of confidentiality or loss of
privacy.
Accidental losses could be unintentional threats including human error,
software and hardware-caused breaches. Operating procedures, such as user
authorisation, uniform software installation procedures and hardware
maintenance schedules, can be established to address threats from accidental
losses.
b. The relation (or table) level privilege assignment: At relation or table level of privilege
assignment, the DBA controls the privilege to access each individual relation or view in the
database. Privileges at the relation level specify for each user the individual relations on
which each type of command can be applied. Some privileges also refer to individual
attributes (columns) of relations. Granting and revoking of relation privileges is controlled by
assigning an owner account for each relation R in a database. The owner account is typically
the account that was used when the relation was first created. The owner of the relation is
given all privileges on the relation. In SQL, privileges such as SELECT, INSERT, UPDATE,
DELETE and REFERENCES can be granted on each individual relation R using the GRANT
statement, whose clauses are as follows:
ALL: All the privileges for the object for which the user issuing the GRANT has grant authority are granted.
privilege-list: Only the listed privileges are granted.
ON: Specifies the object on which the privileges are granted. It can be a table or a view.
column-comma-list: The privileges are restricted to the specified columns. If this is not specified, the grant is given for the entire table/view.
TO: Identifies the users to whom the privileges are granted.
PUBLIC: The privileges are granted to all known users of the system who have a valid User ID and Password.
user-list: The privileges are granted to the user(s) specified in the list.
WITH GRANT OPTION: The recipient has the authority to grant the privileges that were granted to him to another user.
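The GRANT statement that the next sentence explains appears to be missing from this copy. A plausible reconstruction, using the user and table names from that explanation, is:
GRANT SELECT
ON EMPLOYEE
TO ABHISHEK, MATHEW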
This means that the users ‘ABHISHEK’ and ‘MATHEW’ are authorised
to perform SELECT operations on the table (or relation) EMPLOYEE.
GRANT SELECT
ON EMPLOYEE
TO PUBLIC
This means that all users are authorised to perform SELECT operations on
the table (or relation) EMPLOYEE.
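Similarly, the statement explained in the next sentence is missing from this copy; based on that explanation, it would be along the lines of:
GRANT SELECT, UPDATE (EMP-ID)
ON EMPLOYEE
TO MATHEW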
This means that the user ‘MATHEW’ has the right to perform SELECT
operations on the table EMPLOYEE as well as the right to update the EMP-
ID attribute.
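A REVOKE statement also appears to be missing before the “or” below; it would presumably be the counterpart of the GRANT above, something like:
REVOKE SELECT, UPDATE (EMP-ID)
ON EMPLOYEE
FROM MATHEW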
or
REVOKE SELECT
ON EMPLOYEE
FROM MATHEW
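The explanation that follows refers to removing a system privilege (the right to create tables), not to the REVOKE SELECT statement above; the statement it describes seems to have been lost in this copy. In Oracle-style syntax it would be something like:
REVOKE CREATE TABLE
FROM MATHEW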
This means that the system privilege for creating tables is removed from
the user ‘MATHEW’.
REVOKE ALL
ON EMPLOYEE
FROM MATHEW
This means that all the privileges are removed from the user ‘MATHEW’.
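Again, the statement being explained in the next sentence is not present in this copy; based on that explanation, it would be something like:
REVOKE DELETE, UPDATE (EMP-ID, EMP-SALARY)
ON EMPLOYEE
FROM ABHISHEK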
This means that the DELETE and UPDATE authority on the EMP-ID and
EMP-SALARY attributes (columns) are removed from the user
‘ABHISHEK’.
The above examples illustrate a few of the possibilities for granting or
revoking authorisation privileges. The GRANT option may cascade among
users. For example, if ‘Mathew’ has the right to grant authority X to another
user ‘Abhishek’, then ‘Abhishek’ has the right to grant authority X to
another user ‘Rajesh’ and so on. Consider the following example:
Mathew:
GRANT SELECT
ON EMPLOYEE
TO ABHISHEK
WITH GRANT OPTION
Abhishek:
GRANT SELECT
ON EMPLOYEE
TO RAJESH
WITH GRANT OPTION
As long as the user has received a GRANT OPTION, he or she can confer
the same authority to others. However, if the user ‘Mathew’ later wishes to
revoke the privilege he granted (along with its GRANT OPTION), he could do so by using the following
command:
REVOKE SELECT
ON EMPLOYEE
FROM ABHISHEK
b. Star security property (or *-property): In this case, a subject S is not allowed to write an
object O unless the classification of subject S is less than or equal to the classification of object O.
In other words, class(S) ≤ class(O).
14.5 FIREWALLS
Example data used in the encryption illustrations below: plaintext ‘Well done’; ciphertext ‘xfmmaepof’; key ‘safetysaf’.
b. The blank space occupies the twenty-seventh (last but one) and twenty-eighth (last) positions
in the alphabet. For each character, the alphabet position of the plaintext character and that of the
key character are added. The resultant number is divided by 27 and the remainder is kept
separately. In our example, the first letter of the plaintext, ‘W’, is found in the twenty-third
place in the alphabet, while the first letter of the key, ‘s’, is found in the nineteenth position. Thus,
(23 + 19) = 42, and the remainder on division by 27 is 15. This process is called division
modulo 27.
Now we can find that the letter in the fifteenth position in the alphabet is ‘O’. Thus, the
plaintext letter ‘W’ is encrypted as the letter ‘O’ in the ciphertext. In this way, all the letters
can be encrypted.
REVIEW QUESTIONS
1. What is database security? Explain the purpose and scope of database security.
2. What do you mean by threat in a database environment? List the potential threats that could
affect a database system.
3. List the types of database security issues.
4. Differentiate between authorization and authentication.
5. Discuss each of the following terms:
a. Database Authorization
b. Authentication
c. Audit Trail
d. Privileges
e. Data encryption
f. Firewall.
a. Give an outline of the data structure which can be used by the first user to
implement this scheme.
b. Explain how this scheme can keep track of the current access rights of the users.
STATE TRUE/FALSE
1. Database security encompasses hardware, software, network, people and data of the
organisation.
2. Threats are any situation or event, whether intentional or accidental, that may adversely
affect a system and consequently the organisation.
3. Authentication is a mechanism that determines whether a user is who he or she claims to be.
4. When a user is authenticated, he or she is verified as an authorized user of an application.
5. Authorization and authentication controls can be built into the software.
6. Privileges are granted to users at the discretion of other users.
7. A user automatically has all object privileges for the objects that are owned by him/her.
8. The REVOKE command is used to take away a privilege that was granted.
9. Encryption alone is sufficient for data security.
10. Discretionary access control (also called security scheme) is based on the concept of access
rights (also called privileges) and mechanism for giving users such privileges.
11. A firewall is a system designed to prevent unauthorized access to or from a private network.
12. Statistical database security system is used to control the access to a statistical database.
13. Data encryption is a method of coding or scrambling of data so that humans cannot read
them.
a. a situation or event involving a person that are likely to harm the organisation.
b. an action that is likely to harm the organisation.
c. circumstances that are likely to harm the organisation.
d. All of these.
3. Which of the following is the permission to access a named object in a prescribed manner?
a. Role
b. Privilege
c. Permission
d. All of these.
a. Data
b. Hardware and Software
c. People
d. External hackers.
6. Discretionary access control (also called security scheme) is based on the concept of
a. access rights
b. system-wide policies
c. Both (a) and (b)
d. None of these.
7. Mandatory access control (also called security scheme) is based on the concept of
a. access rights
b. system-wide policies
c. Both (a) and (b)
d. None of these.
10. Which of the following is the process by which a user’s identity is checked?
a. Authorization
b. Authentication
c. Access Control
d. None of these.
12. Which of the following is the process by which a user’s privileges are ascertained?
a. Authorization
b. Authentication
c. Access Control
d. None of these.
14. Which of the following is the process by which a user’s access to physical data in the
application is limited, based on his privileges?
a. Authorization
b. Authentication
c. Access Control
d. None of these.
OBJECT-BASED DATABASES
Chapter 15
Object-Oriented Databases
15.1 INTRODUCTION
As we discussed in the earlier chapters, the relational data model was first
produced by Dr. E.F. Codd in his seminal paper, which addressed the
disadvantages of legacy database approaches such as hierarchical and
network (CODASYL) databases. Since then, more than a hundred commercial
relational DBMSs have been developed and put in use both for mainframe
and PC environments. However, RDBMSs have their own disadvantages,
particularly, limited modelling capabilities. Various data models were
developed and implemented for database design that represent the ‘real-world’
more closely. Fig. 15.1 shows the history of data models.
Each data model addressed the shortcomings of previous models. The
hierarchical model was replaced by the network model because the latter made it
much easier to represent complex (many-to-many) relationships. In turn, the
relational model offered several advantages over the hierarchical and
network models through its simpler data representation, superior data
independence and relatively easy-to-use query language. Thereafter, the entity-
relationship (E-R) model was introduced by Chen for an easy-to-use
graphical data representation. The E-R model became the database design
standard. As more intricate real-world problems were modelled, a need arose
for a different data model to represent the real-world more closely. Thus, the
Semantic Data Model (SDM) was developed by M. Hammer and D. McLeod
to capture more meaning from real-world objects. SDM
incorporated more semantics into the data model and introduced concepts
such as class, inheritance and so forth. This helped to model the real-world
objects more objectively. In response to the increasing complexity of
database applications, the following two new data models emerged:
Object-oriented data model (OODM).
Object-relational data model (ORDM), also called extended-relational data model (ERDM).
Fig. 15.1 History of evolution of data model
OODBs provide a unique system-generated object identifier (OID) for each object so that an
object can easily be identified and operated upon. This is in contrast with the relational model
where each relation must have a primary key attribute whose value identifies each tuple
uniquely.
OODBs are extensible, that is, capable of defining new data types as well as the operations to
be performed on them.
OODBs support encapsulation, that is, the data representation and the method implementation are
hidden from external entities.
OODBs exhibit inheritance, that is, an object inherits the properties of other objects.
15.3.1 Objects
An object is an abstract representation of a real-world entity that has a
unique identity, embedded properties and the ability to interact with other
objects and itself. It is a uniquely identified entity that contains both the
attributes that describe the state of a real-world object and the actions that
are associated with it. An object may have a name, a set of attributes and a
set of actions or services. An object may stand alone or it may belong to a
class of similar objects. Thus, the definition of objects encompasses a
description of attributes, behaviours, identity, operations and messages. An
object encapsulates both data and the processing that is applied to the data.
A typical object has two components: (a) state (value) and (b) behaviour
(operations). Hence, it is somewhat similar to a program variable in a
programming language, except that it will typically have a complex data
structure as well as specific operations defined by the programmer. Fig. 15.3
illustrates the examples of objects. Each object is represented by a rectangle.
The first item in the rectangle is the name of the object. The name of the
object is separated from the object attributes by a straight line. An object
may have zero or more attributes. Each attribute has its own name, value and
specifications. The list of attributes is followed by a list of services or
actions. Each service has a name associated with it and eventually will be
translated to executable program (machine) code. Services or actions are
separated from the list of attributes by a horizontal line.
The OID should not be confused with the primary key of a relational
database. In contrast to the OID, the primary key of a relation consists of user-
defined values of selected attributes and can be changed at any time.
Examples of Objects
The data structures for an OO database schema can be defined using type
constructors. Fig. 15.5 illustrates how the objects ‘Student’ and ‘Chair’ of
Fig. 15.4 can be declared corresponding to the object instances.
15.3.4 Classes
A class is a collection of similar objects with shared structure (attributes) and
behaviour (methods). It contains the description of the data structure and the
method implementation details for the objects in that class. Therefore, all
objects in a class share the same structure and respond to the same messages.
In addition, a class acts as a storage bin for similar objects. Thus, a class has
a class name, a set of attributes and a set of services or actions. Each object
in a class is known as a class instance or object instance. There are two
implicit service or action functions defined for each class namely
GET<attribute> and PUT<attribute>. The GET function determines the
value of the attribute associated with it, and the PUT function assigns the
computed value of the attribute to the attribute’s name.
Fig. 15.6 illustrates an example of a class ‘Furniture’ with two instances. The
‘Chair’ is a member (or instance) of a class ‘Furniture’. A set of generic
attributes can be associated with every object in the class ‘Furniture’, for
example, price, dimension, weight, location and colour. Because ‘Chair’ is a
member of ‘Furniture’, ‘Chair’ inherits all attributes defined for the class.
Once the class has been defined, the attributes can be reused when new
instances of the class are created. For example, assume that a new object
called ‘Table’ has been defined that is a member of the class ‘Furniture’, as
shown in Fig. 15.6. ‘Table’ inherits all of the attributes of ‘Furniture’. The
services associated with the class ‘Furniture’ are buy (purchase the furniture
object), sell (sell the furniture object) and move (move the furniture object
from one place to another).
Fig. 15.6 Example of Class ‘Furniture’
Examples of Classes
Fig. 15.9 illustrates how the type definition of object of Fig. 15.5 may be
extended with operations (services or actions) to define classes of Fig. 15.7.
15.3.6.1 Structure
Structure is basically the association of class and its objects. Let us consider
the following classes:
a. Person
b. Student
c. Employee
d. Graduate
e. Undergraduate
f. Administration
g. Staff
h. Faculty
Association lines between a superclass and its subclass all originate from
a half circle that is connected to the superclass, as shown in Fig. 15.12. The
relationship between a superclass and its subclass is known as
generalisation.
Fig. 15.12 Subclass and superclass structure
Assembly Structure
Combined Structure
15.3.6.2 Inheritance
Inheritance is copying the attributes of the superclass into all of its subclasses.
It is the ability of an object within the structure (or hierarchy) to inherit the
data structure and behaviour (methods) of the classes above it. For example,
as shown in Fig. 15.12, class ‘Graduate’ inherits its data structure and
behaviour from the superclasses ‘Student’ and ‘Person’. Similarly, class
‘Staff’ inherits its data structure and behaviour from the superclasses
‘Employee’, ‘Person’ and so on. The inheritance of data and methods goes
from the top to bottom in the class hierarchy. There are two types of
inheritances:
a. Single inheritance: Single inheritance exists when a class has only one immediate (parent)
superclass above it. An example of a single inheritance can be given as the class ‘Student’
and class ‘Employee’ inheriting immediate superclass ‘Person’.
b. Multiple inheritances: Multiple inheritances exist when a class is derived from several parent
superclasses immediately above it.
Fig. 15.14 Combined structure
15.3.7 Operation
An operation is a function or a service that is provided by all the instances of
a class. It is only through such operations that other objects can access or
manipulate the information stored in an object. The operation, therefore,
provides an external interface to a class. The interface presents the outside
view of the class without showing its internal structure or how its operations
are implemented. The operations can be classified into the following four
types:
a. Constructor operation: It creates a new instance of a class.
b. Query operation: It accesses the state of an object but does not alter the state. It has no side
effects.
c. Update operation: This operation alters the state of an object. It has side effects.
d. Scope operation: This operation applies to a class rather than an object instance.
15.3.8 Polymorphism
Object-oriented systems provide for polymorphism of operations. The
polymorphism is also sometimes referred to as operator overloading. The
polymorphism concept allows the same operator name or symbol to be bound
to two or more different implementations of the operator, depending on the
type of objects to which the operator is applied.
3. Design
Better representation of the real-
world situation.
Captures more of the data model’s
semantics.
4. Operating System
Enhances system portability.
Improves systems interoperability.
5. Databases
Supports complex objects.
Supports abstract data types.
Supports multimedia databases.
The bindings have been specified for three OOPLs namely, (a) C++, (b)
SmallTalk and (c) JAVA. Some vendors offer specific language bindings,
without offering the full capabilities of ODL and OQL.
The ODMG proposed a standard known as the ODMG-93 or ODMG 1.0
standard released in 1993. This was later on revised into ODMG 2.0 in 1997.
In late 1999, ODMG 3.0 was released that included a number of
enhancements to the object model and to the JAVA language binding.
Fig. 15.16 illustrates few query statements with reference to Fig. 15.15
that are used in OQL and their corresponding results.
5. With an example, differentiate between object, object identity and object attributes.
6. What is OID? What are its advantages and disadvantages?
7. Explain how the concept of OID in OO model differs from the concept of tuple equality in
the relational model.
8. Using an example, illustrate the concepts of class and class instances.
9. Discuss the implementation of class using C++ programming language.
10. Define the concepts of class structure (or hierarchy), superclasses and subclasses.
11. What is the relationship between a subclass and superclass in a class structure?
12. What do you mean by operation in OODM? What are its types? Explain.
13. Discuss the concept of polymorphism or operator overloading.
14. Compare and contrast the OODM with the E-R and relational models.
15. A car-rental company maintains a vehicle database for all vehicles in its current fleet. For all
vehicles, it includes the vehicle identification number, license number, manufacturer, model,
date of purchase and colour. Special data are included for certain types of vehicles:
STATE TRUE/FALSE
1. An OODBMS is suited for multimedia applications as well as data with complex relationships
that are difficult to model and process in an RDBMS.
2. An OODBMS does not call for fully integrated databases that hold data, text, pictures, voice
and video.
3. OODMs are logical data models that capture the semantics of objects supported in object-
oriented programming.
4. OODMs implement conceptual models directly and can represent complexities that are
beyond the capabilities of relational systems.
5. OODBs maintain a direct correspondence between real-world and database objects so that
objects do not lose their integrity and identity.
6. The conceptual data modelling (CDM) is based on an OO modelling.
7. Object-oriented concepts stem from object-oriented programming languages (OOPLs).
8. A class is a collection of similar objects with shared structure (attributes) and behaviour
(methods).
9. Structure is the association of class and its objects.
10. Inheritance is copying the attributes of the superclass into all of its subclass.
11. Single inheritance exists when a class has one or more immediate (parent) superclass above
it.
12. Multiple inheritances exist when a class is derived from several parent superclasses
immediately above it.
13. An operation is a function or a service that is provided by all the instances of a class.
14. The object definition language (ODL) is designed to support the semantic constructs of the
ODMG object model.
15. An OQL query is embedded into these programming languages.
a. late 1960s.
b. late 1970s.
c. early 1980s.
d. late 1990s.
3. Object-oriented data models (OODMs) and object-relational data models (ORDMs) represent
a. first-generation DBMSs.
b. second-generation DBMSs.
c. third-generation DBMSs.
d. none of these.
4. An OODBMS can hold
a. Polymorphism
b. Inheritance
c. Abstraction
d. all of these.
a. SQL.
b. OPL.
c. QUEL.
d. None of these.
a. Ada.
b. Algol.
c. SIMULA.
d. All of these.
8. A class is a collection of
a. similar objects.
b. similar objects with shared attributes.
c. similar objects with shared attributes and behaviour.
d. None of these.
a. software engineering.
b. knowledge base.
c. artificial intelligence.
d. All of these.
a. one-to-one.
b. many-to-one.
c. many-to-many.
d. All of these.
11. OODBMSs have
a. experience.
b. standards.
c. support for views.
d. All of these
a. ODMG 1.0.
b. ODMG 2.0.
c. ODMG 3.0.
d. All of these.
a. C++.
b. SmallTalk.
c. JAVA.
d. All of these.
Object-Relational Database
16.1 INTRODUCTION
In addition to storing general and simple data types (such as numeric,
character and temporal data), modern databases are required to
handle the complex data types required by modern business
applications. Table 16.1 summarises some of the common complex data
types or objects.
SQL3 includes extensions for content addressing with complex data types.
For example, suppose that a user wants to issue the following query:
“Given a photograph of a person, scan the EMPLOYEE table to determine if there is a close
match for any employee to that photo and then display the record (tuple) of the employee
including the photograph”.
SELECT *
FROM EMPLOYEE
WHERE MY-PHOTO LIKE EMP-PHOTO
Query 1: Retrieve only required segment of the video, rather than the
entire video.
SELECT display (P.video, 6.00 a.m. Jan 01
2005, 6.00 a.m. Jan 30 2005)
FROM PROBES_INFO AS P
WHERE P.PROBE-ID = 05
Query 2: Create the location_seq type by defining list type containing
a list of ROW type objects. Extract the time column from
this list to obtain a list of timestamp values. Apply the MIN
aggregate operator to this list to find the earliest time at
which the given probe recorded.
CREATE TYPE location_seq listof
(row (TIME: timestamp, LAT: real,
LONG: real))
SELECT P.PROBE-ID, MIN(P.LOC-
SEQ.TIME)
FROM PROBES_INFO AS P
From the above examples we can see that an ORDBMS gives us many
useful design options that are not available in a RDBMS.
As shown in Table 16.4, each DBMS model uses a particular style of
access languages to manipulate the database contents. Hierarchical, network
and object-oriented DBMS models employ a procedural language describing
the precise sequence of operations to compute the desired results. Relational,
object-relational and deductive DBMS models use non-procedural
languages, stating only the desired results and leaving specific computation
to the DBMS.
16.4.4 Advantages of ORDBMS
Resolving many weaknesses of RDBMS.
Reduced network traffic.
Reuse and sharing.
Improved application and query performance.
Simplified software maintenance.
Preservation of the significant body of knowledge and experience that has gone into
developing relational applications.
Integrated data and transaction management.
REVIEW QUESTIONS
STATE TRUE/FALSE
a. complex objects.
b. user-defined types.
c. abstract data types.
d. All of these.
3. ORDBMS supports
a. object capabilities.
b. relational capabilities.
c. Both (a) and (b).
d. None of these.
a. universal database.
b. postgres.
c. informix.
d. ODB-II.
a. universal database.
b. postgres.
c. informix.
d. None of these.
a. universal database.
b. postgres.
c. informix.
d. ODB-II.
a. universal database.
b. postgres.
c. informix.
d. ODB-II.
a. universal database.
b. postgres.
c. adapter.
d. ODB-II.
17.1 INTRODUCTION
17.4.1 Speed-up
Speed-up is a property in which the time taken for performing a task
decreases in proportion to the increase in the number of CPUs and disks in
parallel. In other words, speed-up is the property of running a given task in
less time by increasing the degree of parallelism (more hardware).
With additional hardware, speed-up holds the task constant and measures the
time saved. Thus, speed-up enables users to improve the system response
time for their queries, assuming the size of their databases remains roughly
the same. Speed-up due to parallelism can be defined as
Speed-up = (elapsed time taken by the original, smaller system) / (elapsed time taken by the parallel, larger system)
where both systems perform the same task. For example, if the original system takes 200 seconds to perform a task and a parallel system takes 50 seconds for the same task, the speed-up is 200/50 = 4.
17.4.2 Scale-up
Scale-up is the property in which the performance of the parallel database is
sustained if the number of CPU and disks are increased in proportion to the
amount of data. In other words, scale-up is the ability of handling larger
tasks by increasing the degree of parallelism (providing more resources) in
the same time period as the original system. With added hardware (CPUs
and disks), a formula for scale-up holds the time constant and measures the
increased size of the task, which can be performed. Thus, scale-up enables
users to increase the sizes of their databases while maintaining roughly the
same response time. Scale-up due to parallelism can be defined as
Scale-up = (volume of the task processed by the parallel, larger system) / (volume of the task processed by the original, smaller system)
where both systems process their tasks in the same elapsed time. For example, if the original system can process 1000 transactions in a given time and the parallel system can process 3000 transactions in the same time, the scale-up is 3000/1000 = 3.
17.4.3 Synchronisation
Synchronisation is the coordination of concurrent tasks. For a successful
operation of the parallel database systems, the tasks should be divided such
that the synchronisation requirement is less. It is necessary for correctness.
With less synchronisation requirement, better speed-up and scale-up can be
achieved. The amount of synchronisation depends on the amount of
resources (CPUs, disks, memory, databases, communication network and so
on) and the number of users and tasks working on the resources. More
synchronisation is required to coordinate a large number of concurrent tasks
and less synchronisation is necessary to coordinate a small number of
concurrent tasks.
17.4.4 Locking
Locking is a method of synchronising concurrent tasks. Both internal as well
as external locking mechanisms are used for synchronisation of tasks that are
required by the parallel database systems. For external locking, a distributed
lock manager (DLM) is used, which is a part of the operating system
software. DLM coordinates resource sharing between communication nodes
running a parallel server. The instances of a parallel server use the DLM to
communicate with each other and coordinate modification of database
resources. The DLM allows applications to synchronise access to resources
such as data, software and peripheral devices, so that concurrent requests for
the same resource are coordinated between applications running on different
nodes.
Hash partitioning on the EMP-ID attribute is well suited to point queries that specify a
particular value of the partitioning attribute, for example:
SELECT *
FROM EMPLOYEE
WHERE EMP-ID = 106519;
Hash partitioning is also useful for sequential scans of the entire relation
(table) placed on n disks. The time taken to scan the relation is
approximately 1/n of the time required to scan the relation in a single disk
system.
Hash partitioning is, however, poorly suited to range queries such as
SELECT *
FROM EMPLOYEE
WHERE EMP-ID > 105000 AND EMP-ID < 150000;
In such a case, the search (scanning) would have to involve most (or all)
disks over which the relation has been partitioned.
Advantages
Intra-query parallelism speeds up long-running queries.
It is beneficial for decision support applications that issue complex, read-only queries,
including queries involving multiple joins.
17.5.3.1 Advantages
Easiest form of parallelism to support in a database system, particularly in a shared-memory
parallel system.
Increased transaction throughput.
It scales up a transaction-processing system to support a larger number of transactions per
second.
17.5.3.2 Disadvantages
Response times of individual transactions are no faster than they would be if the transactions
were run in isolation.
It is more complicated in a shared-disk or shared-nothing architecture.
17.5.4.1 Advantages
Intra-operation parallelism is natural in a database.
Degree of parallelism is potentially enormous.
17.5.5 Inter-operation Parallelism
In inter-operation parallelism, the different operations in a query expression
are executed in parallel. The following two types of inter-operation
parallelism are used:
Pipelined parallelism
Independent parallelism
REVIEW QUESTIONS
1. What do you mean by parallel processing and parallel databases? What are the typical
applications of parallel databases?
2. What are the advantages and disadvantages of parallel databases?
3. Discuss the architecture of parallel databases.
4. What is shared-memory architecture? Explain with a neat sketch. What are its benefits and
limitations?
5. What is shared-disk architecture? Explain with a neat sketch. What are its benefits and
limitations?
6. What is shared-nothing architecture? Explain with a neat sketch. What are its benefits and
limitations?
7. Discuss the key elements of parallel processing in brief.
8. What do you mean by speed-up and scale-up? What is the importance of linearity in speed-
up and scale-up? Explain with diagrams and examples.
9. What is synchronisation? Why is it necessary?
10. What is locking? How is locking performed?
11. What is query parallelism? What is its type?
12. What do you mean by data partitioning? What are the different types of partitioning
techniques?
13. For each of the partitioning techniques, give an example of a query for which that
partitioning technique would provide the fastest response.
14. In a range selection on a range-partitioned attribute, it is possible that only one disk may need
to be accessed. Describe the advantages and disadvantages of this property.
15. What form of parallelism (inter-query, inter-operation or intra-operation) is likely to be the
most important for each of the following tasks:
16. What do you mean by pipelined parallelism? Describe the advantages and disadvantages of
pipelined parallelism.
17. Write short notes on the following:
a. Hash partitioning.
b. Round-robin partitioning.
c. Range partitioning.
d. Schema partitioning.
18. Write short notes on the following:
a. Intra-query parallelism.
b. Inter-query parallelism.
c. Intra-operation parallelism.
d. Inter-operation parallelism.
STATE TRUE/FALSE
a. Parallel processing
b. Centralised processing
c. Sequential processing
d. None of these.
2. What is the value of speed-up if the original system took 200 seconds to perform a task, and
two parallel systems took 50 seconds to perform the same task?
a. 2
b. 3
c. 4
d. None of these.
3. What is the value of scale-up if the original system can process 1000 transactions in a given
time, and the parallel system can process 3000 transactions in the same time?
a. 2
b. 3
c. 4
d. None of these.
a. Improved performance
b. Greater flexibility
c. Better availability
d. All of these.
6. The architecture having multiple CPUs working in parallel and physically located in a close
environment in the same building and communicating at very high speed is called
a. parallel database system.
b. distributed database system.
c. centralised database system.
d. None of these.
a. DBMS.
b. portion of data managed by the DBMS.
c. operating system.
d. All of these.
a. shared-disk architecture.
b. shared-nothing architecture.
c. shared-memory architecture.
d. None of these.
16. Speed-up is a property in which the time taken for performing a task
a. decreases in proportion to the increase in the number of CPUs and disks in parallel.
b. increases in proportion to the increase in the number of CPUs and disks in parallel.
c. Both (a) and (b).
d. None of these.
a. I/O parallelism.
b. inter-operation parallelism.
c. intra-query parallelism.
d. inter-query parallelism.
a. the execution of a single query is done in parallel on multiple CPUs using shared-
nothing parallel architecture technique.
b. multiple transactions are executed in parallel, one by each (CPU).
c. we parallelise the execution of each individual operation of a task, such as sorting,
projection, join and so on.
d. the different operations in a query expression are executed in parallel.
a. the execution of a single query is done in parallel on multiple CPUs using shared-
nothing parallel architecture technique.
b. multiple transactions are executed in parallel, one by each (CPU).
c. we parallelise the execution of each individual operation of a task, such as sorting,
projection, join and so on.
d. the different operations in a query expression are executed in parallel.
a. the execution of a single query is done in parallel on multiple CPUs using shared-
nothing parallel architecture technique.
b. multiple transactions are executed in parallel, one by each (CPU).
c. we parallelise the execution of each individual operation of a task, such as sorting,
projection, join and so on.
d. the different operations in a query expression are executed in parallel.
a. the execution of a single query is done in parallel on multiple CPUs using shared-
nothing parallel architecture technique.
b. multiple transactions are executed in parallel, one by each (CPU).
c. we parallelise the execution of each individual operation of a task, such as sorting,
projection, join and so on.
d. the different operations in a query expression are executed in parallel.
18.1 INTRODUCTION
Data fragmentation and data replication are the techniques most commonly
used during DDBS design to break up the database into logical units and to
store certain data at more than one site.
These two techniques are further discussed below in detail.
18.4.1 Data Fragmentation
The technique of breaking up the database into logical units, which may be
assigned for storage at the various sites, is called data fragmentation. In
data fragmentation, a relation can be partitioned (or fragmented) into several
fragments (pieces) for physical storage purposes and there may be several
replicas of each fragment. These fragments contain sufficient information to
allow reconstruction of the original relation. All fragments of a given
relation will be independent. None of the fragments can be derived from the
others or has a restriction or a projection that can be derived from the others.
For example, let us consider an EMPLOYEE relation as shown in table 18.1.
FRAGMENT EMPLOYEE AS
MUMBAI_EMP AT SITE ‘Mumbai’ WHERE DEPT-ID = 2
JAMSHEDPUR_EMP AT SITE ‘Jamshedpur’ WHERE DEPT-ID = 4
LONDON_EMP AT SITE ‘London’ WHERE DEPT-ID = 5;
Each horizontal fragment is defined by a selection operation σP(R) on the relation, where P is the predicate (for example, DEPT-ID = 2) that specifies the fragment.
Fig. 18.6 An example of horizontal data fragmentation
REPLICATE LONDON_EMP AS
LONMUM_EMP AT SITE ‘Mumbai’
REPLICATE MUMBAI_EMP AS
MUMLON_EMP AT SITE ‘London’
18.5.1 Semi-JOIN
In a distributed query processing, the transmission or communication cost is
high. Therefore, semijoin operation is used to reduce the size of a relation
that needs to be transmitted and hence the communication costs. Let us
suppose that the relation R (EMPLOYEE) and S (PROJECT) are stored at
site C (Mumbai) and site B (London), respectively as shown in Fig. 18.10. A
user issues a query at site C to prepare a project allocation list, which
requires the computation JOIN of the two relations given as
JOIN (R, S)
or JOIN (EMPLOYEE, PROJECT)
To reduce the volume of data transmitted, the projection of relation S (PROJECT) on the joining attribute EMP-ID is first computed at site B (London) and sent to site C:
X = ∏EMP-ID (S)
or X = ∏EMP-ID (PROJECT)
The result of the projection operation is shown in Fig. 18.11. Now at site
C (Mumbai), those tuples of relation R (EMPLOYEE) are selected that have
the same value for the attribute EMP-ID as a tuple in X = ∏EMP-ID
(PROJECT) by a JOIN and can be computed as
Y = JOIN (R, X)
or Y = EMPLOYEE ⋈ X
Fig. 18.10 An example of data replication
Y = EMPLOYEE ⋉ PROJECT
≅ EMPLOYEE ⋈ X
The result of the semijoin operation is shown in Fig. 18.12. But, as can be
seen, the desired result is not obtained after the semijoin operation. The
semijoin operation reduces the number of tuples of relation R (EMPLOYEE)
that have to be transmitted to site B (London). The final result is obtained by
joining of the reduced relation R (EMPLOYEE) and relation S (PROJECT)
as shown in Fig. 18.12 and can be computed as
R⋈S = Y⋈S
or EMPLOYEE ⋈ PROJECT = Y ⋈ PROJECT
The semijoin operator (⋉) is used to reduce the communication cost. If, Z
is the result of the semijoin of relations R and S, then semijoin can be
defined as
Z = R⋉S
Z represents the set of tuples of relation R that join with some tuple(s) in
relation S. Z does not contain tuples of relation R that do not join with any
tuple in relation S. Thus, Z represents the reduced R that can be transmitted
to a site of S for a join with it. If the join of R and S is highly selective, the
size of Z would be a small proportion of the size of R. To get the join of R
and S, we now join Z with S, given as
T = Z⋈S
= (R ⋉ S) ⋈ S
= (S ⋉ R) ⋈ R
= (R ⋉ S) ⋈ (S ⋉ R)
Fig. 18.12 Result of projection operation at site B
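As a rough SQL sketch of this strategy (not from the original text, and assuming the shipped projection is materialised at site C as a temporary relation named X), the steps might be expressed as follows.
At site B (London), the joining attribute is projected and the result X is shipped to site C:
SELECT DISTINCT EMP-ID
FROM PROJECT;
At site C (Mumbai), the reduced relation Y is computed using X:
SELECT *
FROM EMPLOYEE
WHERE EMP-ID IN (SELECT EMP-ID FROM X);
Y is then transmitted to site B and joined with PROJECT there to produce the final result.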
18.6.1.1 Advantages
Simple implementation.
Reduces the degree of bottleneck.
Reasonably low overhead, requiring two message transfers for handling lock requests, and
one message transfer for handling unlock requests.
18.6.1.2 Disadvantages
More complex deadlock handling because the lock and unlock requests are not made at
single site.
Possibility of inter-site deadlocks even when there is no deadlock within a single site.
18.6.3 Timestamping
As discussed in chapter 12, section 12.5, timestamping is a method of
identifying messages with their time of transaction. In the DDBSs, each
copy of the data item contains two timestamp values, namely read timestamp
and the write timestamp. Also, each transaction in the system is assigned a
timestamp value that determines its serialisability order.
In distributed systems, each site generates unique local timestamp using
either a logical counter or the local clock and concatenates it with the site
identifier. If the local timestamp were unique, its concatenation with the
unique site identifier would make the global timestamp unique across the
network. The global timestamp is obtained by concatenating the unique local
timestamp with the site identifier, which also must be unique. The site
identifier must be the least significant digits of the timestamp so that the
events can be ordered according to their occurrence and not their location.
Thus, this ensures that the global timestamps generated in one site are not
always greater than those generated in another site.
There could be a problem if one site generates local timestamps at a rate
faster than that of the other sites. Therefore, a mechanism is required to
ensure that local timestamps are generated fairly across the system and
synchronised. The synchronisation is achieved by including the timestamp in
the messages (called logical timestamp) sent between sites. On receiving a
message, a site compares its clock or counter with the timestamp contained
in the message. If it finds its clock or counter slower, it sets it to some value
greater than the message timestamp. In this way, an inactive site’s counter or
a slower clock gets synchronised with the others at the first message
interaction with other site.
Limitations
A failure of the coordinator of sub-transactions can result in the transaction being blocked
from completion until the coordinator is restored.
The requirement of a coordinator results in more messages and more overhead.
Advantages
3PC does not block the sites.
Limitations
3PC adds to the overhead and cost.
REVIEW QUESTIONS
1. What is distributed database? Explain with a neat diagram.
2. What are the main advantages and disadvantages of distributed databases?
3. Differentiate between parallel and distributed databases.
4. What are the desired properties of distributed databases?
5. What do you mean by architecture of a distributed database system? What are different types
of architectures? Discuss each of them with neat sketch.
6. What is client/server computing? What are its main components?
7. Discuss the benefits and limitations of client/server architecture of the DDBS.
8. What are the various types of distributed databases? Discuss in detail.
9. What are homogeneous DDBSs? Explain in detail with an example.
10. What are heterogeneous DDBSs? Explain in detail with an example.
11. What do you mean by distributed database design? What strategies and objectives are
common to most of the DDBMSs?
12. What is a fragment of a relation? What are the main types of data fragments? Why is
fragmentation a useful concept in distributed database design?
13. What is horizontal data fragmentation? Explain with an example.
14. What is vertical data fragmentation? Explain with an example.
15. What is mixed data fragmentation? Explain with an example.
16. Consider the following relation
17. For each of the strategy of the previous question, state how your choice of a strategy depends
on:
18. What is data replication? Why is data replication useful in DDBMSs? What are the typical
units of data that are replicated?
19. What is data allocation? Discuss.
20. Write short notes on the following:
a. Distributed Database
b. Data Fragmentation
c. Data Allocation
d. Data Replication
e. Two-phase Commit
f. Three-phase Commit
g. Timestamping
h. Distributed Locking
i. Semi-JOIN
j. Distributed Deadlock.
22. What do you mean by data replication? What are its advantages and disadvantages?
23. What is distributed database query processing? How is it achieved?
24. What is semi-JOIN in a DDBS query processing? Explain with an example.
25. Compute a semijoin for the following relation shown in Fig. 18.17 kept at two different sites.
Fig. 18.17 Obtaining a join using semijoin
26. What is the difference between a homogeneous and a heterogeneous DDBS? Under what
circumstances would such systems generally arise?
27. Discuss the issues that have to be addressed with distributed database design.
28. What is middleware system architecture? Explain with a neat sketch and an example.
29. Under what condition is
(R ⋉ S) = (S ⋉ R)
Assume that each fragment has two replicas; one stored at the Bangalore site and one stored
locally at the plant site of Jamshedpur. Describe a good processing strategy for the following
queries entered at the Singapore site:
STATE TRUE/FALSE
1. In a distributed database system, each site is typically managed by a DBMS that is dependent
on the other sites.
2. Distributed database systems arose from the need to offer local database autonomy at
geographically distributed locations.
3. The main aim of client/server architecture is to utilise the processing power on the desktop
while retaining the best aspects of centralised data processing.
4. Distributed transaction atomicity property enables users to ask queries without specifying
where the reference relations, or copies or fragments of the relations, are located.
5. Distributed data independence property enables users to write transactions that access and
update data at several sites just as they would write transactions over purely local data.
6. Although geographically dispersed, a distributed database system manages and controls the
entire database as a single collection of data.
7. In homogeneous DDBS, there are several sites, each running their own applications on the
same DBMS software.
8. In heterogeneous DDBS, different sites run under the control of different DBMSs, essentially
autonomously and are connected somehow to enable access to data from multiple sites.
9. A distributed database system allows applications to access data from local and remote
databases.
10. Homogeneous database systems have well-accepted standards for gateway protocols to
expose DBMS functionality to external applications.
11. Distributed databases do not use client/server architecture.
12. In the client/server architecture, client is the provider of the resource whereas the server is a
user of the resource.
13. The client/server architecture does not allow a single query to span multiple servers.
14. A horizontal fragmentation is produced by specifying a predicate that performs a restriction
on the tuples in the relation.
15. Data replication is used to improve the local database performance and protect the
availability of applications.
16. Transparency in data replication makes the user unaware of the existence of the copies.
17. The server is the machine that runs the DBMS software and handles the functions required
for concurrent, shared data access.
18. Data replication enhances the performance of read operations by increasing the processing speed at each site.
19. Data replication decreases the availability of data to read-only transactions.
20. In distributed locking, the DDBS maintains a lock manager at each site whose function is to
administer the lock and unlock requests for those data items that are stored at that site.
21. In distributed systems, each site generates unique local timestamp using either a logical
counter or the local clock and concatenates it with the site identifier.
22. In a recovery control, transaction atomicity must be ensured.
23. The two-phase commit protocol guarantees that all database servers participating in a
distributed transaction either all commit or all abort.
24. The use of 2PC is not transparent to the users.
a. local databases.
b. remote databases.
c. both local and remote databases
d. None of these.
2. In homogeneous DDBS,
a. there are several sites, each running their own applications on the same DBMS
software.
b. all sites have identical DBMS software.
c. all users (or clients) use identical software
d. All of these.
3. In heterogeneous DDBS,
a. different sites run under the control of different DBMSs, essentially autonomously.
b. different sites are connected somehow to enable access to data from multiple sites.
c. different sites may use different schemas, and different DBMS software.
d. All of these.
a. communication networks.
b. server.
c. application softwares.
d. All of these.
a. Communication network
b. Server
c. Client
d. All of these.
a. Client/Server computing
b. Mainframe computing
c. Personal computing
d. None of these.
a. technique of breaking up the database into logical units, which may be assigned for
storage at the various sites.
b. process of deciding about locating (or placing) data to several sites.
c. technique that permits storage of certain data in more than one site.
d. None of these.
a. technique of breaking up the database into logical units, which may be assigned for
storage at the various sites
b. process of deciding about locating (or placing) data to several sites.
c. technique that permits storage of certain data in more than one site.
d. None of these.
a. technique of breaking up the database into logical units, which may be assigned for
storage at the various sites.
b. process of deciding about locating (or placing) data to several sites.
c. technique that permits storage of certain data in more than one site.
d. None of these.
16. Which of the following refers to the operation of copying and maintaining database objects in
multiple databases belonging to a distributed system?
a. Replication
b. Backup
c. Recovery
d. None of these.
a. 2PC
b. Backup
c. Immediate update
d. None of these.
21. In a distributed database system, deadlock prevention methods that abort a transaction include
a. timestamping.
b. wait-die method.
c. wound-wait method.
d. All of these.
19.1 INTRODUCTION
Since data are a crucial raw material in the information age, the preceding
chapters focussed on data storage and its management for efficient database
design and the process of implementation. These chapters were mostly devoted to good database design and controlled data redundancy, and to producing effective operational databases that fulfil business needs, such as tracking customers, sales and inventories, thereby facilitating management decision-making.
In the last few decades, there has been a revolutionary change in computer-based technologies to improve the effectiveness of managerial decision-making, especially in complex tasks. The decision support system (DSS) is one such technology, developed to facilitate the decision-making process. DSS helps in the analysis of business information. It provides a computerised interface that enables business decision makers to creatively approach, analyse and understand business problems. Decision support systems, now more than 30 years old, have already proven themselves by providing businesses with substantial savings in time and money.
In this chapter, decision support system (DSS) technology has been
introduced.
The concept of the decision support system (DSS) can be traced back to the 1940s and 1950s, with the emergence of operations research, behavioural and scientific theories of management, and statistical process control, much before the general availability of computers. In those days, the basic objective was to collect business operational data and convert it into a form useful for analysing and modifying the behaviour of the business in an intelligent manner. Fig. 19.1 illustrates the evolution of the decision support system.
In the late 1960s and early 1970s, researchers at Harvard and
Massachusetts Institute of Technology (MIT), USA, introduced the use of
computers in the decision-making process. The computing systems to help in
decision-making process were known as management decision systems
(MDS) or management information systems (MIS). Later on, they came to be most commonly known as decision support systems (DSS). The term management decision system (MDS) was introduced by Scott-Morton in the early 1970s.
During the 1970s, several query languages were developed and a number of custom-built decision support systems were built around such languages. These custom-built DSS were implemented using report generators such as RPG or data retrieval products such as FOCUS, DATATRIEVE and NOMAD. The data were stored in simple flat files until the early 1980s, when relational databases began to be used for decision support purposes.
Fig. 19.2 illustrates the relation among EDP, MIS and DSS. As shown,
DSS can be considered as a subset of MIS.
The decision support system (DSS) emerged from a data processing world of
routine static reports. According to Clyde Holsapple, professor in the
decision science department of the College of Business and Economics at
the University of Kentucky in Lexington, “Decision-makers can’t wait a
week or a month for a report”. As per Holsapple, the advances in the 1960s,
such as the IBM 360 and other mainframe technologies, laid the foundation
for DSS. But, he claims, it was during the 1970s that DSS took off, with the
arrival of query systems, what-if spreadsheets, rules-based software
development and packaged algorithms from companies such as Chicago-
based SPSS Inc. and Cary, N.C.-based SAS Institute Inc.
The concept of the decision support system (DSS) was first articulated in the early 1970s by Scott-Morton under the term management decision systems (MDS). He defined such systems as “interactive computer-based systems, which help decision makers utilise data and models to solve unstructured problems”.
Keen and Scott-Morton provided another classical definition of the decision support system: “Decision support systems couple the intellectual resources of individuals with the capabilities of the computer to improve the quality of decisions. It is a computer-based support system for management decision makers who deal with semi-structured problems”.
A working definition of DSS can be given as: “a DSS is an interactive, flexible and adaptable computer-based information system (CBIS) that utilises decision rules, models and a model base, coupled with a comprehensive database and the decision maker's own insights, leading to specific, implementable decisions in solving problems”. DSS is a methodology designed to extract information from data and to use that information as a basis for decision-making. It is an arrangement of computerised tools used to assist managerial decision-making within a business. The DSS is used at all levels within an organisation and is often tailored to focus on specific business areas or problems, such as insurance, finance, banking, health care, manufacturing, marketing, sales and so on.
The DSS is an interactive computerised system that provides ad hoc query tools to retrieve data and display them in different formats. It helps decision makers compile useful information, received in the form of raw data from a wide range of sources, documents, personal knowledge and/or business models, to identify and solve problems and make decisions.
For example, suppose a national on-line bookseller wants to begin selling its products internationally but first needs to determine whether that would be a wise business decision. The vendor can use a DSS to gather information from its
own resources to determine if the company has the ability or potential ability
to expand its business and also from external resources, such as industry
data, to determine if there is indeed a demand to meet. The DSS will collect
and analyse the data and then present it in a way that can be interpreted by
humans. Some decision support systems come very close to acting as
artificial intelligence agents.
DSS applications are not single information resources, such as a database
or a program that graphically represents sales figures, but the combination of
integrated resources working together.
Typical information that a decision support application might gather and
present would be:
Accessing all current information assets of an enterprise, including legacy and relational
data sources, cubes, data warehouses and data marts.
Comparative sales figures between one week and the next.
Projected revenue figures based on new product sales assumptions.
The consequences of different decision alternatives, given past experience in a context that is
described.
The best DSS combines data from both internal and external sources in a
common view allowing managers and executives to have all of the
information they need at their fingertips.
Operational data and DSS data serve different purposes. Their formats and
structure differ from one another. While operational data captures daily
business transactions, the DSS data give tactical and strategic business
meaning to the operational data. Most operational data are stored in a relational database in which the structures (tables) tend to be highly normalised. Operational data storage is optimised to support transactions that represent daily operations. Customer data, inventory data and so on are frequently updated as and when their status changes. Operational systems store data in more than one table for effective update performance. For example, a sales transaction might be represented by many tables, such as invoice, discount, store, department and so on. Therefore, operational databases are not query friendly: to extract the details of a single invoice, one has to join several tables.
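As a rough illustration of why several joins are needed, the following sketch (in Python, using the standard sqlite3 module and purely hypothetical table and column names) reassembles a single invoice from a normalised operational schema:

import sqlite3

# In-memory database with a hypothetical, highly normalised operational schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store        (store_id INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE product      (prod_id  INTEGER PRIMARY KEY, prod_name TEXT, unit_price REAL);
    CREATE TABLE invoice      (inv_no   INTEGER PRIMARY KEY, store_id INTEGER, inv_date TEXT);
    CREATE TABLE invoice_line (inv_no INTEGER, prod_id INTEGER, quantity INTEGER);

    INSERT INTO store   VALUES (1, 'Jamshedpur');
    INSERT INTO product VALUES (10, 'Steel rod', 250.0);
    INSERT INTO invoice VALUES (1001, 1, '2011-06-15');
    INSERT INTO invoice_line VALUES (1001, 10, 4);
""")

# Extracting the details of one invoice already requires joining four tables.
rows = conn.execute("""
    SELECT i.inv_no, i.inv_date, s.store_name, p.prod_name,
           l.quantity, l.quantity * p.unit_price AS line_total
    FROM invoice i
    JOIN store        s ON s.store_id = i.store_id
    JOIN invoice_line l ON l.inv_no   = i.inv_no
    JOIN product      p ON p.prod_id  = l.prod_id
    WHERE i.inv_no = 1001
""").fetchall()

for row in rows:
    print(row)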
DSS data differ from operational data in three main areas, namely time
span, granularity and dimensionality. Table 19.1 shows the difference
between operational data and DSS data under these three areas.
Dimensionality: operational data represent a single-transaction view of the data, whereas DSS data represent a multidimensional view of the data.
From the above table it can be observed that operational data have a narrow time span, low granularity and a single focus. They are normally seen in tabular formats in which each row represents a single transaction, and it is difficult to derive useful information from them. DSS data, on the other hand, cover a broader time span, have multiple levels of granularity and can be seen from multiple dimensions.
REVIEW QUESTIONS
1. What do you mean by the decision support system (DSS)? What role does it play in the
business environment?
2. Discuss the evolution of decision support system.
3. What are the main components of a DSS? Explain the functions of each of them with a neat
diagram.
4. What are the differences between operational data and DSS data?
5. Discuss the major characteristics of DSS.
6. List major benefits of DSS.
STATE TRUE/FALSE
3. The term management decision system (MDS) was introduced in the year
a. early-1960s
b. early-1970s.
c. early-1980s
d. None of these.
a. Scott-Morton
b. Kroeber-Waston.
c. Harvard and MIT
d. None of these.
a. MDS.
b. MIT.
c. MIS.
d. Both (b) and (c).
a. Scott-Morton
b. Kroeber-Waston
c. Harvard and MIT
d. None of these.
a. combines data from both internal and external sources in a common view allowing
managers and executives to have all of the information they need at their fingertips.
b. combines data from only internal sources in a common view allowing managers and
executives to have all of the information they need at their fingertips.
c. combines data from only external sources in a common view allowing managers
and executives to have all of the information they need at their fingertips.
d. None of these.
8. DSS incorporates
a. only data
b. only model.
c. both data and model
d. None of these.
a. time span
b. granularity.
c. dimensionality
d. All of these.
20.1 INTRODUCTION
Integrated. In operational databases, similar data can have different representations or meanings, for example, business metrics, social security numbers and others. A data warehouse provides a unified view of all data elements, with a common definition and representation for all departments.
Time-variant. Operational data represent current transactions, for example, the sales of a product on a given date or over the last week. Data warehouse data are historic in nature; a time dimension is added to facilitate data analysis and comparisons over time.
Non-volatile. In operational databases, data updates and deletes are very common. In a data warehouse, data are only added periodically from the operational systems; once data are stored, no changes are allowed.
As can be seen in table 20.2 (a), the tabular view (in the case of operational data) of sales data is not well-suited to decision support, because the relationship INVOICE → PRODUCT_LINE between INVOICE and PRODUCT_LINE does not provide a business perspective of the sales data. On the other hand, the end-user's view of sales data from a business perspective is more closely represented by the multidimensional view of sales than by the tabular view of separate tables, as shown in table 20.2 (b). It
can also be noted that the multidimensional view allows end-users to
consolidate or aggregate data at different levels, for example, total sales
figures by customers and by date. The multidimensional view of data also
allows a business data analyst to easily switch business perspectives from
sales by customers to sales by division, by region, by products and so on.
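A minimal sketch of this idea of consolidation, using hypothetical sales facts in Python, shows how the same figures can be rolled up by customer, by region or by date:

from collections import defaultdict

# Hypothetical sales facts: (customer, region, date, amount).
sales = [
    ("Asha", "East", "2011-06-01", 1200.0),
    ("Asha", "East", "2011-06-02",  800.0),
    ("Ravi", "West", "2011-06-01",  500.0),
    ("Ravi", "West", "2011-06-02",  700.0),
]

def consolidate(facts, dimension_index):
    """Aggregate (roll up) the sales amount along one chosen dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[dimension_index]] += fact[-1]
    return dict(totals)

# The same facts can be viewed by customer, by region, or by date.
print(consolidate(sales, 0))   # sales by customer
print(consolidate(sales, 1))   # sales by region
print(consolidate(sales, 2))   # sales by date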
OLAP is a database interface tool that allows users to quickly navigate
within their data. The term OLAP was coined in a white paper written for
Arbor Software Corporation in 1993. OLAP tools are based on
multidimensional databases (MDDBs). These tools allow the users to
analyse the data using elaborate, multidimensional and complex views.
These tools assume that the data is organised in a multidimensional model
that is supported by a special multidimensional database (MDDB) or by a
relational database designed to enable multidimensional properties, such as
multi-relational databases (MRDB). OLAP tools are very useful in business applications such as sales forecasting, product performance and profitability analysis, capacity planning, measuring the effectiveness of a marketing campaign or sales program, and so on. In summary, OLAP systems have the following main characteristics:
Uses multidimensional data analysis techniques.
Provides advanced database support.
Provides easy-to-use end-user interfaces.
Supports client/server architecture.
REVIEW QUESTIONS
1. What is a data warehouse? How does it differ from a database?
2. What are the goals of a data warehouse?
3. What are the characteristics of a data warehouse?
4. What are the different components of a data warehouse? Explain with the help of a diagram.
5. List the benefits and limitations of a data warehouse.
6. Discuss what is meant by the following terms when describing the characteristics of the data
in a data warehouse:
a. subject-oriented
b. integrated
c. time-variant
d. non-volatile.
STATE TRUE/FALSE
a. Non-volatile.
b. Subject-oriented.
c. Time-variant.
d. All of these.
3. Data warehouses extract information for strategic use of the organisation in reducing costs
and improving revenues, out of
a. legacy systems.
b. secondary storage.
c. main memory.
d. None of these.
4. The advancements in technology and the development of microcomputers (PCs), along with data-orientation in the form of relational databases, drove the emergence of end-user computing during
a. 1970s and 1980s.
b. 1980s and 1990s.
c. 1990s and 2000s.
d. the start of the 21st century.
a. 1970s.
b. 1980s.
c. 1990s.
d. early 2000.
7. Which of the following technological advances in data modelling, databases and application development methods resulted in a paradigm shift from the information system (IS) approach to business-driven warehouse implementations?
a. Data modelling.
b. Databases.
c. Application development methods.
d. All of these.
a. providing the end-users with access to the stored warehouse information through the
use of specialised end-user tools.
b. collecting the data from legacy system and convert them into usable form for the
users.
c. holding a vast amount of information from a wide variety of sources.
d. None of these.
a. providing the end-users with access to the stored warehouse information through the
use of specialised end-user tools.
b. collecting the data from legacy system and convert them into usable form for the
users.
c. holding a vast amount of information from a wide variety of sources.
d. None of these.
a. providing the end-users with access to the stored warehouse information through the
use of specialised end-user tools.
b. collecting the data from legacy system and convert them into usable form for the
users.
c. holding a vast amount of information from a wide variety of sources.
d. None of these.
a. subject-oriented.
b. integrated.
c. volatile.
d. All of these.
a. summarised data.
b. de-normalised data.
c. aggregated departmental data.
d. All of these.
17. Online analytical processing (OLAP) is an advanced data analysis environment that supports
a. decision making.
b. business modelling.
c. operations research activities.
d. All of these.
18. OLAP is
21.1 INTRODUCTION
The Internet revolution of the late 1990s has resulted in the explosive growth of World Wide Web (WWW) technology and has sharply increased direct user access to databases. Organisations converted many of their phone interfaces to databases into Web interfaces and made a variety of services and information available on-line. The transaction requirements of organisations
have grown with increasing use of computers and the phenomenal growth in
the Web technology. These developments have created many sites with
millions of viewers and the increasing amount of data collected from these
viewers has produced extremely large databases at many companies.
Today, millions of people use the Internet to shop for goods and services,
listen to music, view network, conduct research, get stock quotes, keep up-
to-date with current events and send electronic mail to other Internet users.
More and more people are using the Internet at work and at home to view
and download multimedia computer files containing graphics, sound, video
and text.
Internet History
21.2.2 TCP/IP
The Internet is held together by two basic protocols, collectively referred to as TCP/IP; TCP and IP are, in fact, two separate protocols.
The Internet Protocol (IP) joins together the separate network segments
that constitute the Internet. Every computer on the Internet has a unique
address, known as an IP address. The address consists of four numbers, each
in the range 0 to 255, such as 132.151.3.90. Within a computer, these are
stored as four bytes. When printed, the convention is to separate them with
periods as in this example. IP, the Internet Protocol, enables any computer on
the Internet to dispatch a message to any other, using the IP address. The
various parts of the Internet are connected by specialised computers, known
as “routers”. As their name implies, routers use the IP address to route each
message on the next stage of the journey to its destination. Messages on the
Internet are transmitted as short packets, typically a few hundred bytes in
length. A router simply receives a packet from one segment of the network
and dispatches it on its way. An IP router has no way of knowing whether
the packet ever reaches its ultimate destination.
The Transport Control Protocol (TCP) is responsible for reliable delivery
of complete messages from one computer to another. On the sending
computer, an application program passes a message to the local TCP
software. TCP takes the message, divides it into packets, labels each with the
destination IP address and a sequence number and sends them out on the
network. At the receiving computer, each packet is acknowledged when
received. The packets are reassembled into a single message and handed
over to an application program.
TCP guarantees error-free delivery of messages, but it does not guarantee that they will be delivered punctually. Sometimes punctuality is more important than complete accuracy: when sound is transmitted over the Internet, for example, the human ear would much prefer to lose tiny sections of the sound track rather than wait for a missing packet to be retransmitted, which would make playback horribly jerky. Since TCP is unsuitable for such applications, they use an
alternate protocol, named UDP, which also runs over IP. With UDP, the
sending computer sends out a sequence of packets, hoping that they will
arrive. The protocol does its best, but makes no guarantee that any packets
ever arrive.
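A small Python sketch (using a hypothetical local port, 9999) illustrates the difference in spirit: a UDP datagram is simply sent towards an address with no delivery guarantee, and an IP address is really just four bytes printed in dotted-decimal form:

import socket

# With UDP (SOCK_DGRAM) a datagram is simply sent towards an address; the
# protocol makes no guarantee that it ever arrives. A hypothetical local
# port, 9999, is used here so the sketch is self-contained.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 9999))

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"best-effort packet", ("127.0.0.1", 9999))

# This only succeeds because both ends are on the same machine; over the
# Internet the datagram could be lost without notice. TCP (SOCK_STREAM)
# would instead set up a connection and retransmit lost packets.
data, addr = receiver.recvfrom(1024)
print("UDP received:", data, "from", addr)

# An IP address is just four bytes; dotted-decimal is only a printing convention.
print(list(socket.inet_aton("132.151.3.90")))   # [132, 151, 3, 90]

sender.close()
receiver.close()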
As shown in the example of Fig. 21.3, the HTML file contains both the
text to be rendered and codes, known as tags that describe the format or
structure. The HTML tags can always be recognised by the angle brackets (< and >). Most HTML tags come in pairs, with a “/” indicating the end of a pair.
Thus <title> and </title> enclose some text that is interpreted as a title. Some
of the HTML tags show format; thus <i> and </i> enclose text to be
rendered in italic and <br> shows a line break. Other tags show structure:
<p> and </p> delimit a paragraph and <h1> and </h1> bracket a level one
heading. Structural tags do not specify the format, which is left to the
browser.
For example, many browsers show the beginning of a paragraph by inserting a blank line, but this is a stylistic convention determined by the browser. This example also shows two features that are special to HTML and have been vital to the success of the Web. The first special feature is the ease of including colour images in Web pages. The tag:
http://www.dlib.org/dlib.html
http://www.google.co.in/google.html
This URL has three parts:
Some URLs are very lengthy and contain additional information about the
path and file name of the Web page. URLs can also contain the identifier of
a program located on the Web server, as well as arguments to be given to the
program. An example of such a URL is given below:
http://www.google.co.in/topic/search?q=database
In the above example, “/topic/” is the path name of the HTML document
on the Web server and “/search?q=database” is an execution argument for
the search on the server www.google.co.in. Using the given arguments, the
program executes and returns an HTML document, which is then sent to the
front end.
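One way to see these parts programmatically is with Python's standard urllib.parse module; the decomposition below is the library's own, splitting the example URL into scheme, host, path and query string:

from urllib.parse import urlsplit, parse_qs

# Splitting the example URL from the text into its constituent parts.
url = "http://www.google.co.in/topic/search?q=database"
parts = urlsplit(url)

print(parts.scheme)             # 'http': the protocol used to fetch the resource
print(parts.netloc)             # 'www.google.co.in': the Web server's domain name
print(parts.path)               # '/topic/search': path of the document or program
print(parse_qs(parts.query))    # {'q': ['database']}: arguments passed to the program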
The document begins with a processing instruction: <?xml …?>. This is the XML
declaration. While it is not required, its presence explicitly identifies the document as an
XML document and indicates the version of XML to which it was authored.
There is no document type declaration. XML does not require a document type declaration.
However, a document type declaration can be supplied and some documents will require one
in order to be understood unambiguously.
Empty elements (<applause/> in this example) have a modified syntax. While most elements
in a document are wrappers around some content, empty elements are simply markers where
something occurs. The trailing /> in the modified syntax indicates to a program processing
the XML document that the element is empty and no matching end-tag should be sought.
Since XML documents do not require a document type declaration, without this clue it could
be impossible for an XML parser to determine which tags were intentionally empty and
which had been left empty by mistake.
XML has softened the distinction between elements which are declared as
EMPTY and elements which merely have no content. In XML, it is legal to
use the empty-element tag syntax in either case. It is also legal to use a start-
tag/end-tag pair for empty elements: <applause></applause>. If
interoperability is of any concern, it is best to reserve empty-element tag
syntax for elements which are declared as EMPTY and to only use the
empty-element tag form for those elements.
XML documents are composed of markup and content. There are six
kinds of markup that can occur in an XML document, namely, elements,
entity references, comments, processing instructions, marked sections and
document type declarations.
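The following sketch parses a small, hypothetical XML document with Python's standard xml.etree.ElementTree module, showing ordinary elements with content alongside an empty element written as <applause/>:

import xml.etree.ElementTree as ET

# A small, hypothetical XML document: an XML declaration, two ordinary
# elements with text content, and one empty element written as <applause/>.
doc = """<?xml version="1.0"?>
<oldjoke>
  <burns>Say goodnight, Gracie.</burns>
  <gracie>Goodnight, Gracie.</gracie>
  <applause/>
</oldjoke>"""

root = ET.fromstring(doc)
for child in root:
    # The empty element simply has no text content (None).
    print(child.tag, repr(child.text))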
21.3.2.1 People
Understanding digital libraries requires an understanding of the people who are developing them.
Technology has dictated the pace at which digital libraries have been able to
develop, but the manner in which the technology is used depends upon
people. Two important communities are the source of much of this
innovation. One group is the information professionals. They include
librarians, publishers and a wide range of information providers, such as
indexing and abstracting services. The other community contains the
computer science researchers and their offspring, the Internet developers.
Until recently, these two communities had disappointingly little interaction;
even now it is commonplace to find a computer scientist who knows nothing
of the basic tools of librarianship, or a librarian whose concepts of
information retrieval are years out of date. Over the past few years, however,
there has been much more collaboration and understanding.
A variety of words are used to describe the people who are associated
with digital libraries. One group of people are the creators of information in
the library. Creators include authors, composers, photographers, map
makers, designers and anybody else who creates intellectual works. Some
are professionals; some are amateurs. Some work individually, others in
teams. They have many different reasons for creating information.
Another group is the users of the digital library. Depending on the context,
users may be described by different terms. In libraries, they are often called
“readers” or “patrons”; at other times they may be called the “audience” or
the “customers”. A characteristic of digital libraries is that creators and users
are sometimes the same people. In academia, scholars and researchers use
libraries as resources for their research and publish their findings in forms
that become part of digital library collections.
The final group of people is a broad one that includes everybody whose
role is to support the creators and the users. They can be called information
managers. The group includes computer specialists, librarians, publishers,
editors and many others. The World Wide Web has created a new profession
of Webmaster. Frequently a publisher will represent a creator, or a library
will act on behalf of users, but publishers should not be confused with
creators or librarians with users. A single individual may be a creator, user
and information manager.
21.3.2.2 Economics
Technology influences the economic and social aspects of information and
vice versa. The technology of digital libraries is developing fast and so are
the financial, organisational and social frameworks. The various groups that
are developing digital libraries bring different social conventions and
different attitudes to money. Publishers and libraries have a long tradition of
managing physical objects, notably books, but also maps, photographs,
sound recordings and other artifacts. They evolved economic and legal
frameworks that are based on buying and selling these objects. Their natural
instinct is to transfer to digital libraries the concepts that have served them
well for physical artifacts. Computer scientists and scientific users, such as
physicists, have a different tradition. Their interest in digital information
began in the days when computers were very expensive. Only a few well-
funded researchers had computers on the first networks. They exchanged
information informally and openly with colleagues, without payment. The
networks have grown, but the tradition of open information remains.
The economic framework that is developing for digital libraries shows a
mixture of these two approaches. Some digital libraries mimic traditional
publishing by requiring a form of payment before users may access the
collections and use the services. Other digital libraries use a different
economic model. Their material is provided with open access to everybody.
The costs of creating and distributing the information are borne by the
producer, not the user of the information. Almost certainly, both have a long-
term future, but the final balance is impossible to forecast.
21.3.4.1 Mercury
One of the first attempts to create a campus digital library was the Mercury Electronic Library, a project undertaken at Carnegie Mellon University between 1987 and 1993. It began in 1988 and went live in 1991 with a dozen textual databases and a small number of page images of journal articles in computer science. Mercury was able to build upon the advanced computing infrastructure at Carnegie Mellon, which included a high-performance network, a fine computer science department and a tradition of innovation by the university libraries.
21.3.4.2 CORE
CORE was a joint project by Bellcore, Cornell University, OCLC and the
American Chemical Society that ran from 1991 to 1995. The project
converted about 400,000 pages, representing four years of articles from
twenty journals published by the American Chemical Society.
The project used a number of ideas that have since become popular in
conversion projects. CORE included two versions of every article, a scanned
image and a text version marked up in SGML. The scanned images ensured
that when a page was displayed or printed it had the same design and layout
as the original paper version. The SGML text was used to build a full-text
index for information retrieval and for rapid display on computer screens.
Two scanned images were stored for each page, one for printing and the
other for screen display. The printing version was black and white, 300 dots
per inch; the display version was 100 dots per inch, grayscale.
Although both the Mercury and CORE projects converted existing journal
articles from print to bitmapped images, conversion was not seen as the
long-term future of scientific libraries. It simply reflected the fact that none
of the journal publishers were in a position to provide other formats.
Mercury and CORE were followed by a number of other projects that
explored the use of scanned images of journal articles. One of the best
known was Elsevier Science Publishing’s Tulip project. For three years,
Elsevier provided a group of universities, which included Carnegie Mellon
and Cornell, with images from forty-three journals in materials science. Each university individually mounted these images on its own computers and made them available locally.
Large libraries are painfully expensive for even the richest organisations. Buildings are about
a quarter of the total cost of most libraries. Behind the collections of many great libraries are
huge, elderly buildings, with poor environmental control. Even when money is available,
space for expansion is often hard to find in the centre of a busy city or on a university
campus.
The costs of constructing new buildings and maintaining old ones to store printed books and
other artifacts will only increase with time, but electronic storage costs decrease by at least
30 per cent per annum. In 1987, work began on a digital library at Carnegie Mellon University, known as the Mercury library. The collections were stored on computers, each
with ten gigabytes of disk storage. In 1987, the list price of these computers was about
$120,000. In 1997, a much more powerful computer with the same storage cost about $4,000.
In ten years, the price was reduced by about 97 per cent. Moreover, there is every reason to
believe that by 2007 the equipment will be reduced in price by another 97 per cent.
Ten years ago, the cost of storing documents on CD-ROM was already less than the cost of
books in libraries. Today, storing most forms of information on computers is much cheaper
than storing artifacts in a library. Ten years ago, equipment costs were a major barrier to
digital libraries. Today, they are much lower, though still noticeable, particularly for storing
large objects such as digitised videos, extensive collections of images, or high-fidelity sound
recordings. In ten years time, equipment that is too expensive to buy today will be so cheap
that the price will rarely be a factor in decision making.
Storage cost is not the only factor. Otherwise libraries would have standardised on microfilm
years ago. Until recently, very few people were happy to read from a computer. The quality
of the representation of documents on the screen was also poor. The usual procedure was to
print a paper copy. Recently, however, major advances have been made in the quality of
computer displays, in the fonts which are displayed on them and in the software that is used
to manipulate and render information. People are beginning to read directly from computer
screens, particularly materials that were designed for computer display, such as Web pages.
The best computer displays are still quite expensive, but every year they get cheaper and
better. It will be a long time before computers match the convenience of books for general
reading, but the high-resolution displays to be seen in research laboratories are very
impressive indeed.
Most users of digital libraries have a mixed style of working, with only part of the materials
that they use in digital form. Users still print materials from the digital library and read the
printed version, but every year more people are reading more materials directly from the
screen.
The growth of the Internet over the past few years has been phenomenal.
Telecommunications companies compete to provide local and long distance Internet service
across the United States; international links reach almost every country in the world; every
sizable company has its internal network; universities have built campus networks;
individuals can purchase low-cost, dial-up services for their homes.
The coverage is not universal. Even in the US there are many gaps and some countries are
not yet connected at all, but in many countries of the world it is easier to receive information
over the Internet than to acquire printed books and journals by orthodox methods.
Portable computers:
Although digital libraries are based around networks, their utility has been greatly enhanced
by the development of portable, laptop computers. By attaching a laptop computer to a
network connection, a user combines the digital library resources of the Internet with the
personal work that is stored on the laptop. When the user disconnects the laptop, copies of
selected library materials can be retained for personal use.
During the past few years, laptop computers have increased in power, while the quality of
their screens has improved immeasurably. Although batteries remain a problem, laptops are
no heavier than a large book and the cost continues to decline steadily.
Using a library requires access. Traditional methods require that the user goes to the library. In a university, the walk to a library takes a few minutes, but not many people are members of universities or have a nearby library. Many engineers or physicians carry out their work with
depressingly poor access to the latest information.
A digital library brings the information to the user’s desk, either at work or at home, making
it easier to use and hence increasing its usage. With a digital library on the desktop, a user
need never visit a library building. The library is wherever there is a personal computer and a
network connection.
Computing power can be used to find information. Paper documents are convenient to read,
but finding information that is stored on paper can be difficult. Despite the myriad of
secondary tools and the skill of reference librarians, using a large library can be a tough
challenge. A claim that used to be made for traditional libraries is that they stimulate
serendipity, because readers stumble across unexpected items of value. The truth is that
libraries are full of useful materials that readers discover only by accident.
In most aspects, computer systems are already better than manual methods for finding
information. They are not as good as everybody would like, but they are good and improving
steadily. Computers are particularly useful for reference work that involves repeated leaps
from one source of information to another.
Libraries and archives contain much information that is unique. Placing digital information
on a network makes it available to everybody. Many digital libraries or electronic
publications are maintained at a single central site, perhaps with a few duplicate copies
strategically placed around the world. This is a vast improvement over expensive physical
duplication of little used material, or the inconvenience of unique material that is inaccessible
without traveling to the location where it is stored.
Many libraries have the provision of online text of reference works, such as directories or
encyclopedias. Whenever revisions are received from the publisher, they are installed on the
library’s computer. The new versions are available immediately. The Library of Congress has
an online collection, called Thomas. This contains the latest drafts of all legislation currently
before the US Congress; it changes continually.
The doors of the digital library never close; a recent study at a British university found that
about half the usage of a library’s digital collections was at hours when the library buildings
were closed. Materials are never checked out to other readers, mis-shelved or stolen; they are never in an off-campus warehouse. The scope of the collections expands beyond the walls of
the library. Private papers in an office or the collections of a library on the other side of the
world are as easy to use as materials in the local library.
Digital libraries are not perfect. Computer systems can fail and networks may be slow or
unreliable, but, compared with a traditional library, information is much more likely to be
available when and where the user wants it.
Most of what is stored in a conventional library is printed on paper, yet print is not always the
best way to record and disseminate information. A database may be the best way to store
census data, so that it can be analysed by computer; satellite data can be rendered in many
different ways; a mathematics library can store mathematical expressions, not as ink marks
on paper but as computer symbols to be manipulated by programs such as Mathematica or
Maple.
Even when the formats are similar, material that is created explicitly for the digital world is not the same as material originally designed for paper or other media. Words that are spoken
have a different impact from the words that are written and online textual material is subtly
different from either the spoken or printed word. Good authors use words differently when
they write for different media and users find new ways to use the information. Material
created for the digital world can have a vitality that is lacking in material that has been
mechanically converted to digital formats, just as a feature film never looks quite right when
shown on television.
21.4.1.1 Images
Images include photographs, drawings and so on. Images are usually stored
in raw form as a set of pixel or cell values, or in a compressed form to save
storage space. The image shape descriptor describes the geometric shape of
the raw image, which is typically a rectangle of cells of a certain width and
height. Each cell contains a pixel value that describes the cell content. In
black and white images, each pixel can be represented by one bit. In grayscale or colour images, each pixel requires multiple bits. Images require very large storage space; hence, they are often stored in a compressed form, such as GIF or JPEG. These
compressed forms use various mathematical transformations to reduce the
number of cells stored, without disturbing the main image characteristics.
The mathematical transforms used to compress images include Discrete
Fourier Transform (DFT), Discrete Cosine Transform (DCT) and Wavelet
Transforms.
In order to identify the particular objects in an image, the image is divided into homogeneous segments using a homogeneity predicate. The homogeneity predicate defines the conditions for automatically grouping
those cells. For example, in a colour image, cells that are adjacent to one
another and whose pixel values are close are grouped into a segment.
Segmentation and compression can hence identify the main characteristics of
an image.
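A toy illustration of segmentation under a simple homogeneity predicate (adjacent cells whose pixel values lie within a hypothetical threshold of the segment's seed value are grouped together) might look like this in Python:

# A tiny grayscale image as a grid of pixel values; purely illustrative data.
image = [
    [10, 12, 11, 200],
    [11, 10, 13, 198],
    [90, 92, 91, 199],
]
THRESHOLD = 15   # hypothetical homogeneity threshold

def segment(img):
    """Label each cell with a segment number using a simple flood fill."""
    rows, cols = len(img), len(img[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            # Grow a new segment from this seed cell.
            seed, stack = img[r][c], [(r, c)]
            labels[r][c] = next_label
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] is None
                            and abs(img[ny][nx] - seed) <= THRESHOLD):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels

for row in segment(image):
    print(row)   # cells sharing a label belong to the same homogeneous segment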
Inexpensive image-capture and storage technologies have allowed
massive collections of digital images to be created. However, as a database
grows, the difficulty of finding relevant images increases. Two general approaches to this problem, namely manual identification and automatic analysis, have been developed. Both approaches use metadata for image retrieval.
21.6.3.1 Elements
An element is a basic building block of a geometric feature for the Spatial
Data Option. The supported spatial element types are points, line strings and
polygons. For example, elements might be used to model historic markers (point clusters), roads (line strings) and county boundaries (polygons). Each
coordinate in an element is stored as an X, Y pair.
Point data consists of one coordinate and the sequence number is ‘0’. Line
data consists of two coordinates representing a line segment of the element,
starting with sequence number ‘0’. Polygon data consists of coordinate pair
values, one vertex pair for each line segment of the polygon. The first
coordinate pair (with sequence number ‘0’), represents the first line segment,
with coordinates defined in either a clockwise or counter-clockwise order
around the polygon with successive sequence numbers. Each layer’s
geometric objects and their associated spatial index are stored in the
database in tables.
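A rough sketch of how such elements could be held in a program, with hypothetical coordinates and the list position standing in for the sequence number (the area computation is an illustrative extra, not part of the storage model described above):

# Hypothetical elements stored as ordered (x, y) coordinate pairs; the position
# in each list plays the role of the sequence number described above.
point       = [(85.1, 22.8)]                               # sequence number 0 only
line_string = [(85.1, 22.8), (85.4, 22.9), (85.9, 23.1)]   # a road as connected segments
polygon     = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]     # a boundary, closed back on itself

def polygon_area(ring):
    """Shoelace formula for a closed ring listed in counter-clockwise order."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        area += x1 * y2 - x2 * y1
    return area / 2.0

print(polygon_area(polygon))   # 12.0 for the 4 x 3 rectangle above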
21.6.3.2 Geometries
A geometry or geometric object is the representation of a user’s spatial
feature, modelled as an ordered set of primitive elements. Each geometric
object is required to be uniquely identified by a numeric geometric identifier
(GID), associating the object with its corresponding attribute set. A complex
geometric feature such as a polygon with holes would be stored as a
sequence of polygon elements. In a multi-element polygon geometry, all sub-elements are wholly contained within the outermost element, thus building a more complex geometry from simpler pieces. For example, a geometry might
describe the fertile land in a village. This could be represented as a polygon
with holes that represent buildings or objects that prevent cultivation.
21.6.3.3 Layers
A layer is a homogeneous collection of geometries having the same attribute
set. For example, one layer in a GIS includes topographical features, while another describes population density and a third describes the network of roads and bridges in the area (lines and points). Layers are composed of
geometries, which in turn are made up of elements. For example, a point
might represent a building location, a line string might be a road or flight
path and a polygon could be a state, city, zoning district or city block.
Nearest neighbour query or adjacency: This query finds an object of a particular type that is closest to a given location. For example, finding the police post that is closest to your house, finding all restaurants that lie within a five-kilometre distance of your residence, or finding the hospital nearest to a given site.
Spatial joins or overlays: This query typically joins the objects of two types based on some spatial condition, such as the objects intersecting or overlapping spatially or being within a certain distance of one another. For example, finding all cities that fall on the National Highway from Jamshedpur to Patna, or finding all buildings within two kilometres of a steel plant.
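A brute-force sketch of both query types, with hypothetical point data and straight-line distances standing in for real road distances:

from math import hypot

# Hypothetical point data sets for the two query types described above.
hospitals   = {"Sadar": (2.0, 3.0), "City": (8.0, 1.0), "Steel": (5.0, 7.0)}
residence   = (4.0, 4.0)
restaurants = {"Dosa Hut": (4.5, 4.2), "Chaat Corner": (9.0, 9.0)}

def nearest(query_point, places):
    """Brute-force nearest-neighbour query."""
    return min(places, key=lambda name: hypot(places[name][0] - query_point[0],
                                              places[name][1] - query_point[1]))

def within(query_point, places, radius):
    """Distance-based condition: places within 'radius' of the query point."""
    return [name for name, (x, y) in places.items()
            if hypot(x - query_point[0], y - query_point[1]) <= radius]

print(nearest(residence, hospitals))         # hospital closest to the residence
print(within(residence, restaurants, 5.0))   # restaurants within five kilometres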
21.6.5.1 R-Tree
To answer spatial queries efficiently, special techniques for spatial indexing are needed. One of the best-known techniques is the R-tree and its variations. R-trees group together objects that
are in close spatial physical proximity on the same leaf nodes of a tree-
structured index. Since a leaf node can point to only a certain number of
objects, algorithms for dividing the space into rectangular subspaces that
include the objects are needed. Typical criteria for dividing space include
minimising the rectangular areas, since this would lead to a quicker
narrowing of the search space. Problems such as having objects with overlapping spatial areas are handled differently by different variations of R-trees. The internal nodes of R-trees are associated with rectangles whose
area covers all the rectangles in its sub-tree. Hence, R-trees can easily
answer queries, such as find all objects in a given area by limiting the tree
search to those sub-trees whose rectangles intersect with the area given in
the query.
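A minimal sketch of this pruning idea (not a full R-tree: there is no insertion, balancing or node splitting) using hypothetical bounding rectangles:

# Every node carries a bounding rectangle covering its whole sub-tree, so a
# region query only descends into sub-trees whose rectangle intersects the
# query window; all other sub-trees are pruned without being visited.

def intersects(r1, r2):
    """Rectangles given as (xmin, ymin, xmax, ymax)."""
    return not (r1[2] < r2[0] or r2[2] < r1[0] or r1[3] < r2[1] or r2[3] < r1[1])

class Node:
    def __init__(self, rect, children=None, objects=None):
        self.rect = rect                 # minimum bounding rectangle of the sub-tree
        self.children = children or []   # internal node: child nodes
        self.objects = objects or []     # leaf node: (object, rectangle) pairs

def search(node, window, result):
    if not intersects(node.rect, window):
        return result                    # prune the whole sub-tree
    result.extend(obj for obj, rect in node.objects if intersects(rect, window))
    for child in node.children:
        search(child, window, result)
    return result

leaf1 = Node((0, 0, 5, 5),   objects=[("park", (1, 1, 2, 2))])
leaf2 = Node((6, 6, 12, 12), objects=[("mall", (7, 7, 9, 9))])
root  = Node((0, 0, 12, 12), children=[leaf1, leaf2])

print(search(root, (0, 0, 3, 3), []))    # [('park', (1, 1, 2, 2))]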
21.6.5.2 Quadtree
Other spatial storage structures include quadtrees and their variations. The quadtree is an alternative representation for two-dimensional data. A quadtree is a spatial index that generally divides each space or sub-space into equally sized areas and proceeds with the subdivision of each sub-space to identify the positions of various objects. Quadtrees are often used for storing raster data. Raster is a cellular data structure composed of rows and columns for storing images; groups of cells with the same value represent features.
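A tiny quadtree sketch in the same spirit: each node covers a square region and, once it holds more than a hypothetical capacity of one point, subdivides into four equally sized quadrants:

CAPACITY = 1   # hypothetical node capacity before subdivision

class QuadNode:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size   # lower-left corner and side length
        self.points, self.children = [], []

    def insert(self, px, py):
        if self.children:                        # already subdivided: pass downwards
            self._child_for(px, py).insert(px, py)
        elif len(self.points) < CAPACITY:
            self.points.append((px, py))
        else:                                    # overflow: split into four quadrants
            half = self.size / 2
            self.children = [QuadNode(self.x + dx, self.y + dy, half)
                             for dx in (0, half) for dy in (0, half)]
            for p in self.points + [(px, py)]:
                self._child_for(*p).insert(*p)
            self.points = []

    def _child_for(self, px, py):
        half = self.size / 2
        index = (2 if px >= self.x + half else 0) + (1 if py >= self.y + half else 0)
        return self.children[index]

root = QuadNode(0, 0, 16)
for p in [(1, 1), (10, 3), (12, 13)]:
    root.insert(*p)
print(len(root.children))   # 4: the root has split into four equal sub-spaces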
If downtime is not an option and the Web never closes for business, how do
we keep our company’s doors open 24/7? The answer lies in high-
availability (HA) systems that approach 100 per cent uptime.
The principles of high availability define a level of backup and recovery.
Until recently, high availability simply meant hardware or software recovery
via RAID (Redundant Array of Independent Disks). RAID addressed the need for fault tolerance in data but did not solve the problem of keeping a complete DBMS available.
For even more uptime, database administrators are turning to clustering as
the best way to achieve high availability. Recent moves by Oracle, with its Real Application Cluster, and Microsoft, with MCS (Microsoft Cluster Service), have made multinode clusters for HA in production environments mainstream.
REVIEW QUESTIONS
1. What is Internet? What are the available Internet services?
2. What is WWW? What are Web technologies? Discuss each of them.
3. What are hypertext links?
4. What is HTML? Give an example of HTML file.
5. What is HTTP? How does it work?
6. What is an IP address? What is its importance?
7. What is domain name? What is its use?
8. What is a URL? Explain with an example.
9. What is MIME in the context of the WWW? What is its importance?
10. What are Web browsers?
11. What do you mean by web databases? What are Web database tools? Explain.
12. What is XML? What are XML documents? Explain with an example.
13. What are the advantages and disadvantages of Web databases?
14. What do you mean by spatial data? What are spatial databases?
15. What is a digital library? What are its components? Discuss each one of them.
16. Why do we use digital libraries?
17. Discuss the technical developments and technical areas of digital libraries.
18. How do we get access to digital libraries?
19. Discuss the application of digital libraries for scientific journals.
20. Explain the method or form in which data is stored in digital libraries.
21. What are the potential benefits of digital libraries?
22. What are multimedia databases?
23. What are multimedia sources? Explain each one of them.
24. What do you mean by content-based retrieval in multimedia databases?
25. What is automatic analysis and manual identification approaches to multimedia indexing?
26. What are the different multimedia sources?
27. What are the properties of images?
28. What are the properties of the video?
29. What are documents and how are they stored in a multimedia database?
30. What are the properties of the audio source?
31. How is a query processed in multimedia databases? Explain.
32. How are multimedia sources identified in multimedia databases? Explain.
33. What are the applications of multimedia databases?
34. What is mobile computing?
35. Explain the mobile computing environment with the help of a diagram.
36. What is a mobile database? Explain the architecture of mobile database with neat sketch.
37. What is spatial data model?
38. What do you mean by element?
39. What is geometry or geometric object?
40. What is a layer?
41. What is spatial query?
42. What is spatial overlay?
43. Differentiate between range queries, neighbour queries and spatial joins.
44. What are R-trees and Quadtrees?
45. What are the main characteristics of spatial databases?
46. Explain the concept of clustering-based disaster-proof databases.
STATE TRUE/FALSE
a. search engine.
b. WWW.
c. FTP.
d. All of these.
a. ARPAnet.
b. NSFnet.
c. MILInet.
d. All of these.
a. Domain name.
b. URL.
c. IP address.
d. HTTP.
a. IP address
b. E-mail address
c. Domain name
d. All of these.
a. GIS data.
b. CAD data.
c. CAM data.
d. All of these.
a. people.
b. economic.
c. computers and networks.
d. All of these.
a. Line
b. Points
c. Polygon
d. Area.
10. Which of the following finds objects of a particular type that is within a given spatial area or
within a particular distance from a given location?
a. Range query
b. Spatial joins
c. Nearest neighbour query
d. None of these.
a. X-trees
b. R-trees
c. B-trees
d. None of these.
12. Which of the following is a mathematical transformation used by image compression
standards?
a. Wavelet Transform
b. Discrete Cosine Transform
c. Discrete Fourier Transform
d. All of these.
a. Cell
b. Shape descriptor
c. Property descriptor
d. Pixel descriptor.
14. Which of the following is an example of a database application where content-based retrieval is useful?
a. multidimensional space.
b. Single dimensional space.
c. Both (a) & (b).
d. None of these.
1. The Internet is a worldwide collection of _____ connected by _____ media that allow users
to view and transfer information between _____.
2. The World Wide Web is a subset of _____ that uses computers called _____ to store
multimedia files.
3. The _____ is a system, based on hypertext and HTTP, for providing, organising, and
accessing a wide variety of resources that are available via the Internet.
4. HTML is the abbreviation of _____.
5. HTML is used to create _____ stored at web sites.
6. URL is the abbreviation for _____.
7. An _____ is a unique number that identifies computers on the Internet.
8. _____ look up the domain name and match it to the corresponding IP address so that data can
be properly routed to its destination on the Internet.
9. The Common Gateway Interface CGI is a standard for interfacing _____ with _____.
10. _____ is the set of rules, or protocol, that governs the transfer of hypertext between two or
more computers.
11. _____ provide the concept of database that keep track of objects in a multidimensional space.
12. _____ provide features that allow users to store and query different types of multimedia
information like images, video clips, audio clips and text or documents.
13. The _____ is a hierarchical structure consisting of elements, geometries and layers, which
correspond to representations of spatial data.
14. _____ are composed of geometries, which in turn are made up of elements.
15. An element is the basic building block of a geometric feature for the _____.
16. A _____ finds objects of a particular type that are within a given spatial area or within a
particular distance from a given location.
17. _____ query finds an object of a particular type that is closest to a given location.
18. A _____ is the representation of a user’s spatial feature, modelled as an ordered set of
primitive elements.
19. The process of overlaying one theme with another in order to determine their geographic
relationships is called _____.
20. _____ joins the objects of two types based on some spatial condition, such as the objects
intersecting or overlapping spatially or being within a certain distance of one another.
21. The multimedia queries are called _____ queries.
22. _____ is a general-purpose mathematical analysis tool that has been used in a variety of information-retrieval applications.
23. An indexing technique called _____ can then be used to group similar documents together.
24. Spatial databases keep track of objects in a _____ space.
Part-VII
CASE STUDIES
Chapter 22
22.1 INTRODUCTION
M/s Greenlay Bank has just ventured into a retail banking system with the
following functions (sub-processes) at the beginning:
Saving Bank accounts.
Current Bank accounts.
Fixed deposits.
Loans.
DEMAT Account.
Fig. 22.1 shows various sub-processes of a typical retail banking system.
Each function (or sub-process), in turn, has multiple child processes that work together in harmony for the process to be useful. In this example, we
will consider only three functions, namely saving bank (SB) accounts,
current bank (CB) accounts and fixed deposits (FDs).
Saving bank transactions, both deposits and withdrawals, are updated on real-time
basis.
Current Account
Bank maintains record of each organisation or company with the following details:
Current account transactions, both deposits and withdrawals, are updated on real-
time basis.
During manufacturing, an assembly can pass through any sequence of processes in any order.
It may pass through the same process more than once.
A unique job number (JOB-NO) is assigned every time a process begins on an assembly.
Information recorded about a JOB-NO includes COST, DATE-COMMENCED and DATE-
COMPLETED at the process as well as additional information that depends on the type of
JOB process.
JOBs are classified into job type sets. These type sets are uniform and hence use the same
identifier as JOB-NO. Information stored about particular job types is as follows:
PROC-ID
ASS
DEPT
The above account types can be kept in different type sets. The type sets are unique and
hence use a common identifier as ACCOUNT.
As a job proceeds, cost transactions can be recorded against it. Each such transaction is
identified by a unique transaction number (TRANS-NO) and is for a given cost, SUP-COST.
Each transaction updates the following three accounts:
PROC-ACCT
ASS-ACCT
DEPT-ACCT
The course may be presented either to the general public (GEN-OFFERING), or, as a special
presentation (SPECIAL-OFFERING), to a specific organisation.
There can be any number of participants at each course presentation. Each participant has a
name and is associated with some organisation.
Each course has a fee structure. There is a standard FEE for each participant at a general
offering.
There is a separate SPECIAL-FEE if the course is a SPECIAL-OFFERING on an
organisation’s premises. In that case only a fixed fee is charged for the whole course to the
organisation and there is no extra fee for each participant.
Employees of the organisation can be authorised to be LECTURERs or ORIGINATORs. The sets that represent these roles are uniform and use the same identifier as the source entity, EMPLOYEE.
Each lecturer may spend any number of days on one presentation of a given course provided
that such an assignment does not exceed the duration of the course.
The DAYS-SPENT by a lecturer on a course offering are recorded.
M/s KLY Enterprise is one of the M/s KLY group companies and runs a large
book store. It keeps books on various subjects. Presently, M/s KLY
Enterprise takes orders from its customers over the phone, and inquiries
about order shipment, delivery status and so on are handled manually. M/s
KLY Enterprise wants to go online and automate its activities through database
design and implementation. It wants to put its entire activities on
a new Web site so that customers can access and order books
directly over the Internet.
22.6.1 Requirement Definition and Analysis
The following requirements have been identified after the detailed analysis
of the existing system:
Customers browse the catalogue of books and place orders over the Internet.
M/s KLY’s Internet Book Shop has mostly corporate customers who call the book store and
give the ISBN number of a book and a quantity. M/s KLY then prepares a shipment that
contains the books they have ordered. In case enough copies are not available in the stock,
additional copies are ordered by M/s KLY. The shipment is delayed until the new copies
arrive and the entire order is shipped together.
The book store’s catalogue includes all the books that M/s KLY Enterprise sells.
For each book, the catalogue contains the following details:
Most of the customers of M/s KLY Enterprise are regulars, and their records are kept as
follows:
New customers are given an account number before they can use the Web site.
On M/s KLY’s Web site, customers first identify themselves by their unique customer
identification number (CUST-ID) and then they are allowed to browse the catalogue and
place orders on line.
Thus, now the ORDERS are assigned sequential order numbers (ORDER-
NO). The orders that are placed later will have higher order numbers. If
several orders are placed by the same customer on a single day, these orders
will have different order numbers and can thus be distinguished.
ORDER-NO → CUST-ID
ORDER-NO → ORDER-DATE
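These dependencies can be captured directly in a relation schema. The following is a minimal sketch using the attribute names from the case study; the data types and the referenced CUSTOMER table are illustrative assumptions:

    CREATE TABLE ORDERS (
        ORDER_NO    INTEGER  NOT NULL,   -- sequential order number
        CUST_ID     CHAR(10) NOT NULL,   -- customer placing the order
        ORDER_DATE  DATE     NOT NULL,   -- date on which the order was placed
        PRIMARY KEY (ORDER_NO),
        FOREIGN KEY (CUST_ID) REFERENCES CUSTOMER (CUST_ID)
    );

Because ORDER_NO is the primary key, both of the functional dependencies above are enforced automatically.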
Fig. 22.13 Sample relations and contents for internet book shop
22.7 DATABASE DESIGN FOR CUSTOMER ORDER WAREHOUSE
REVIEW QUESTIONS
1. Draw a functional dependency (FD) diagram for the retail banking case study discussed in Section
22.2.
2. M/s KLY Computer System and Services is in the business of computer assembly and
retailing. It assembles personal computers (PCs) and sells them to its customers. To remain
competitive in the computer segment and provide its customers the best deals, M/s KLY has
decided to implement a computerised manufacturing and sales system. The requirement
definition and analysis is given below:
M/s KLY Computer System and Services has the following main processes:
Marketing.
PC assembly.
Finished goods warehouse.
Sales and delivery.
Finance.
Purchase and stores.
The following requirements have been identified after the detailed analysis of the existing
system:
Customer places order for PCs with detailed technical specification in consultation
with the marketing person of M/s KLY.
The customer order is delivered to the PC Assembly department. The Assembly department
creates the customer invoice with advance details based on the customer order, together
with the explicit unit cost and the total assembly cost.
PC Assembly department requisitions for parts or components from the Purchase
and Store department. After receiving parts, PCs are assembled and moved to the
Finished Goods Warehouse for temporary storage (prior to delivery to the customer)
along with finished goods delivery note.
The Purchase and Stores department buys computer parts in bulk from various
suppliers and stocks them in the stores.
PCs are dispatched to the customer by the Sales and Delivery department along with
a goods delivery challan.
After receiving the delivery challan, customer makes the payment at the Finance
department of M/s KLY.
Figs. 22.18, 22.19 and 22.20 show workflow diagrams of M/s KLY Computer System and
Services for Customer Order, PC Assembly and Delivery and Spare Parts Inventory,
respectively.
3. For the private technical training institute case discussed in Section 22.5, develop the following:
4. For the Internet book shop case discussed in Section 22.6, develop the following:
COMMERCIAL DATABASES
Chapter 23
23.1 INTRODUCTION
DB2 comes in four editions, namely DB2 Express, DB2 Workgroup
Server Edition, DB2 Enterprise Server Edition and DB2 Personal Edition.
All four editions provide the same full-function database management
system, but they differ from each other in terms of connectivity options,
licensing agreements and additional functions.
All DB2 products have a common component called the DB2 Client
Application Enabler (CAE). Once a DB2 application has been developed,
the DB2 Client Application Enabler (CAE) component must be installed on each
workstation executing the application. Fig. 23.1 shows the relationship
between the application, CAE and the DB2 database server. If the
application and database are installed on the same workstation, the
application is known as a local client. If the application is installed on a
workstation other than the DB2 server, the application is known as a remote
client.
The Client Application Enabler (CAE) provides functions other than the
ability to communicate with a DB2 UDB server or DB2 Connect gateway
machine. CAE enables users to perform any of the following tasks:
Issue an interactive SQL statement using CAE on a remote client to access data on a remote
UDB server.
Graphically administer and monitor a UDB database server.
Run applications that were developed to comply with the Open Database Connectivity
(ODBC) standard.
Run Java applications that access and manipulate data in DB2 UDB databases using Java
Database Connectivity (JDBC).
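For instance, the first of the tasks listed above amounts to issuing ordinary SQL from the client workstation. The following is a small sketch, assuming the DB2 SAMPLE database and an illustrative user ID:

    CONNECT TO SAMPLE USER db2admin USING password;
    -- the statement below runs on the remote UDB server; only the result set travels back
    SELECT EMPNO, LASTNAME FROM EMPLOYEE;
    CONNECT RESET;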
There are no licensing requirements to install the Client Application
Enabler (CAE) component. Licensing is controlled at the DB2 UDB server.
The CAE installation depends on the operating system on the client machine.
There is a different CAE for each supported DB2 client operating system.
The supported platforms are OS/2, Windows NT, Windows 95, Windows
2000, Windows XP, Windows 3.x, AIX, HP-UX and Solaris. The CAE
component should be installed on all end-user workstations.
The DB2 database products are collectively known as the DB2 Family.
The DB2 family is divided into the following two main groups:
DB2 for midrange and large systems. This is supported on platforms such as OS/400,
VSE/VM and OS/390.
DB2 UDB for Intel and UNIX environments. This is supported on platforms such as
OS/2, Windows NT, Windows 95, Windows 2000, AIX, HP-UX and Sun Solaris.
The midrange and large system members of the DB2 Family are very
similar to DB2 UDB, but their features and implementations sometimes
differ due to operating system differences.
Table 23.1 summarises the DB2 family of products. The DB2 provides
seamless database connectivity using the most popular network
communications protocols, including NetBIOS, TCP/IP, IPX/SPX, Named
Pipes and APPC. The infrastructure within which DB2 database clients and
DB2 database servers communicate is provided by the DB2.
All the DB2 UDB editions contain the same database management engine,
support the full SQL language and provide graphical user interfaces (GUIs)
for interactive query and database administration.
DB2 Universal Database (UDB) product family also includes two
“developer’s editions” that provide tools for application program
development. These two developer's editions are as follows:
DB2 UDB Personal Developer’s Edition.
DB2 UDB Universal Developer’s Edition.
With the exception of the Personal Edition and the Personal Developer’s
Edition, all versions of DB2 UDB are multi-user systems that support remote
clients and include client software called Client Application Enablers
(CAEs) for all supported platforms. The licensing terms of the multi-user
versions of DB2 UDB depend on the number of users and the number of
processors in user’s hardware configuration.
Fig. 23.3 illustrates configuration of DB2 UDB Personal Edition. The user
can access a local database on their mobile workstation (for example, a
laptop) and access remote databases found on the database server. From the
laptop, the user can make changes to the database throughout the day and
replicate those changes as a remote client to a DB2 UDB remote server. DB2
UDB Personal Edition includes graphical tools that enable users to
administer, tune for performance, access remote DB2 servers, process SQL
queries and manage other servers from a single workstation.
DB2 UDB Personal Edition product may be appropriate for the following
users:
DB2 mobile users who use a local database and can take advantage of the replication feature
in UDB to copy local changes to a remote server.
DB2 end-users requiring access to local and remote databases.
Another edition, called DB2 Express Edition, has been introduced, which
has the same full-function DB2 database as DB2 Workgroup Edition with
additional new features. The new features make it easy to transparently
install within an application. DB2 Express Edition is available for the Linux
and Windows platforms (Windows NT, Windows 2000 and Windows XP).
The popularity of the Internet and the World Wide Web (WWW) has
created a demand for web access to enterprise data. The DB2 UDB server
product includes all supported DB2 Net.Data products. Applications that are
built with Net.Data may be stored on a web server and can be viewed from
any web browser because they are written in hypertext markup language
(HTML). While viewing these documents, users can either select automated
queries or define new ones that retrieve specific information directly from a
DB2 UDB database. The ability to connect to a host database (DB2
Connect) is built into Enterprise Edition.
23.3.1.4 DB2 UDB Enterprise-Extended Edition
As discussed earlier, all the DB2 UDB editions can take advantage of
parallel processing when installed on a symmetric multiprocessor platform.
DB2 UDB Enterprise-Extended Edition introduces a new dimension of
parallelism that can be scaled to a very large capacity and very high
performance. It contains all the features and functions of DB2 UDB
Enterprise Edition. It also provides the ability for an Enterprise-Extended
Edition (EEE) database to be partitioned across multiple independent
machines (computers) of the same platform that are connected by network or
a high-speed switch. Additional machines can be added to an EEE system as
application requirements grow. The individual machines participating in an
EEE installation may be either uni-processors or symmetric multiprocessors.
To the end-user or application developer, the EEE database appears to be
on a single machine. While DB2 UDB Workgroup and DB2 UDB
Enterprise Edition can handle large databases, the Enterprise-Extended
Edition (EEE) is designed for applications where the database is simply too
large for a single machine to handle efficiently. SQL operations can operate
in parallel on the individual database partitions, thus speeding up the
execution of a single query.
DB2 UDB Enterprise-Extended Edition licensing is similar to that of DB2
Enterprise Edition. However, the licensing is based on the number of
registered or concurrent users, the type of processor and the number of
database partitions. The base license for DB2 UDB Enterprise-Extended
Edition is for machines ranging from a uni-processor up to a 4-way SMP.
The base number of users is different in Enterprise-Extended Edition than in
Enterprise Edition. The base user license is for one user with an additional
50 users, equalling 51 users for that database partition. The total number of
users per database partition also depends on the total number of database
partitions. For example, in a system configuration of four nodes or database
partitions, the system could support 51 × 4 = 204 users. Tier upgrades also
are available. The first tier upgrade for a database partition provides the
rights to a 50 user entitlement pack for that database partition node.
Additional user entitlements are available for 1, 5, 10 or 50 users. DB2 UDB
Enterprise-Extended Edition is available on the AIX platform.
23.3.3.3 SmartGuides
SmartGuides are tutors that guide a user in creating objects and performing other
database operations. Each operation has detailed information available to
help the user. The DB2 SmartGuides are integrated into the administration
tools and assist users in completing administration tasks. As shown in Fig.
23.11, Client Configuration Assistant (CCA) tool of DB2 Desktop Folder is
used to set up communication on a remote client to the database server.
Fig. 23.13 Control centre
Fig. 23.14 shows several ways of adding a remote database. Users do not
have to know the syntax of commands, or even the location of the remote
database server. One option searches the network, looking for valid DB2
UDB servers for remote access.
Fig. 23.14 Client configuration assistant (CCA)-Add Database SmartGuide
By extracting information from the system and asking questions about the
database workload, the Performance Configuration SmartGuide tool will run
a series of calculations designed to determine an appropriate set of values for
the database and database manager configuration variables. One can choose
whether to apply the changes immediately, or to save them in a file that can
be executed at a later time.
All data access takes place through the SQL interface. The basic elements
of a database engine are database objects, system catalogs, directories and
configuration files.
As discussed in the previous sections, the DB2 UDB server runs on many
different operating systems. However, in this section we will discuss
the installation of DB2 server on the Windows platform.
The DB2 Setup wizard will provide a disk space estimate for the
installation options you select. Remember to include disk space allowance
for required software, communication products, and documentation. In DB2
Version 8, HTML and PDF documentation is provided on separate CD-
ROMs.
Windows 2000 SP3 and Windows XP SP1 are required for running DB2
applications in either of the following environments:
Applications that have COM+ objects using ODBC; or
Applications that use OLE DB Provider for ODBC with OLE DB resource pooling disabled.
If you are not sure about whether your application environment qualifies,
then it is recommended that you install the appropriate Windows service
level. The Windows 2000 SP3 and Windows XP SP1 are not required for the
DB2 server itself or any applications that are shipped as part of DB2
products.
You cannot install DB2 Version 8 from a network mapped drive using a
remote session on Windows 2000 Terminal Server edition. The available
workaround is to use Universal Naming Convention (UNC) paths to launch
the installation, or run the install from the console session.
For example, if the directory c:\pathA\pathB\...\pathN on serverA is
shared as serverdir, you can open \\serverA\serverdir\filename.ext to access
the file c:\pathA\pathB\...\pathN\filename.ext on serverA.
Table 23.5 DB2 Connect Personal Edition for Windows Memory requirements
The DB2 Setup wizard will provide a disk space estimate for the
installation options you select. Remember to include a disk space allowance
for required software, communication products, and documentation. In DB2
Version 8, HTML and PDF documentation is provided on separate CD-
ROMs.
The DB2 Setup wizard provides a disk space estimate for the installation
option you select. Remember to include disk space allowance for required
software, communication products and documentation. For DB2 Cube Views
Version 8.1, the HTML documentation is installed with the product and the
PDF documentation is on the product CD-ROM.
Server component:
Microsoft Windows NT 4 32-bit.
Windows 2000 32-bit.
Client component:
Microsoft Windows NT 4 32-bit.
Windows 2000 32-bit.
Windows XP 32-bit.
Before installing a DB2 product, ensure that you meet the prerequisite
hardware and software requirements, such as disk, memory, communication,
operating system and so on, as discussed in the previous sections.
Fig. 23.19 The “Welcome to the DB2 Setup Wizard” dialogue box
Fig. 23.20 The “License Agreement” dialogue box
Fig. 23.21 The “Select the Installation Type” dialogue box
Fig. 23.22 The “Select Installation Folder” dialogue box
Step 9: Click Next to continue. “Set user Information for the DB2
Administration Server” appears on the dialogue box as shown in
Fig. 23.23. Enter the user name and password you would like to
use for the DB2 Administration Server.
Fig. 23.23 The “Set user information for the DB2 Administration Server” dialogue box
Fig. 23.26 The “Prepare the DB2 tools catalog” dialogue box
Fig. 23.27 The “Specify a local database to store the DB2 tools catalog” dialogue box
Fig. 23.28 The “Specify a contact for health monitor notification” dialogue box
Step 15: Click Next to continue. “Start copying files” appears on the
dialogue box as shown in Fig. 23.29. As you have already
given DB2 all the information required to install the product
on your computer, it gives you one last chance to verify the
values you have entered.
Fig. 23.29 The “Start copying files” dialogue box
Step 16: Click Install to have the files copied to your system. You
can also click Back to return to the dialogue boxes that you
have already completed to make any changes. The
installation progress bars appear on screen while the product
is being installed.
Step 17: After the completion of installation process, “Set up is
complete” appears on the dialogue box as shown in Fig.
23.30.
Step 18: Click the Finish button to complete the installation. “First
Steps” and “Congratulations!” appear on the dialogue box
as shown in Fig. 23.31 with the following options:
Create Sample Database.
Work with Sample Database.
Work with Tutorial.
View the DB2 Product Information Library.
Launch DB2 UDB Quick Tour.
Find other DB2 Resources on the World Wide Web.
Exit First Step.
Created the DB2 Administration Server, added it as a service, and configured it so that DB2
tools can administer the server. The service’s start type was set to Automatic.
Activated DB2 First Steps to start automatically following the first boot after installation.
Now all the installation steps have been completed and DB2 UDB can be
used to create DB2 UDB applications using options as shown in Fig. 23.32.
Fig. 23.32 The “First Steps” dialogue box with DB2 UDB Sample
REVIEW QUESTIONS
1. What is DB2? Who developed the DB2 products?
2. What are the main DB2 products? What are their functions? Explain.
3. On what platforms can DB2 Universal Database be run?
4. What is DB2 SQL? Explain.
5. What tools are available to help administer and manage DB2 databases?
6. What is DB2 Universal Database? Explain with its configuration.
7. With neat sketches, write short notes on the following:
a. DB2 Extenders
b. Text Extenders
c. IAV Extenders
d. DB2 DataJoiner.
16. What are the major components of DB2 Universal Database? Explain each of them.
17. What are the features of DB2 Universal Databases?
18. What is DB2 Administrator’s Tool Folder? What are its components?
19. What is Control Centre? What are its main components?
20. What is a SmartGuide?
21. What are the functions of Database engine?
STATE TRUE/FALSE
1. Once a DB2 application has been developed, the DB2 Client Application (CAE) component
must be installed on each workstation executing the application.
2. DB2 UDB is a Web-enabled relational database management system that supports data
warehousing and transaction processing.
3. DB2 UDB can be scaled from hand-held computers to single processors to clusters of
computers and is multimedia-capable with image, audio, video, and text support.
4. The term “universal” in DB2 UDB refers to the ability to store all kinds of electronic
information.
5. DB2 UDB Personal Edition allows the users to create and use local databases and access
remote databases if they are available.
6. DB2 UDB Workgroup Edition is a server that supports both local and remote users and
applications.
7. DB2 UDB Personal Edition provides different engine functions found in Workgroup,
Enterprise and Enterprise-Extended Editions.
8. DB2 UDB Personal Edition can accept requests from a remote client.
9. DB2 UDB Personal Edition is licensed for multi user to create databases on the workstation
in which it was installed.
10. Remote clients can connect to a DB2 UDB Workgroup Edition server, but DB2 UDB
Workgroup Edition does not provide a way for its users to connect to databases on host
systems.
11. DB2 UDB Workgroup Edition is not designed for use in a LAN environment.
12. The DB2 UDB Workgroup Edition is most suitable for large enterprise applications.
13. DB2 Enterprise-Extended Edition provides the ability for an Enterprise-Extended Edition
(EEE) database to be partitioned across multiple independent machines (computers) of the
same platform that are connected by network or a high-speed switch.
14. Lotus Approach is a comprehensive World Wide Web (WWW) development tool kit to create
dynamic web pages or complex web-based applications that can access DB2 databases.
15. Net.Data provides an easy-to-use interface for interfacing with UDB and other relational
databases.
16. DB2 Connect enables applications to create, update, control, and manage DB2 databases and
host systems using SQL, DB2 Administrative APIs, ODBC, JDBC, SQLJ, or DB2 CLI.
17. DB2 Connect supports Microsoft Windows data interfaces such as ActiveX Data Objects
(ADO), Remote Data Objects (RDO) and Object Linking and Embedding (OLE) DB.
18. DB2 Connect Personal Edition provides access to remote databases for a multi workstation.
19. DB2 Connect Enterprise Edition provides access from network clients to DB2 databases
residing on iSeries and zSeries host systems.
20. The DB2 Extenders add functions to DB2’s SQL grammar and expose a C API for
searching and browsing.
21. The Text Extender provides linguistic, precise, dual and ngram indexes.
22. The IAV Extenders provide the ability to use images, audio and video data in user’s
applications.
23. DB2 DataJoiner is a version of DB2 Version 2 for Common Servers that enables its users to
interact with data from multiple heterogeneous sources, providing an image of a single
relational database.
1. Which DB2 UDB product cannot accept requests from remote clients?
a. Control Centre
b. Command Centre
c. Client Configuration Assistant
d. Both (a) and (c).
3. Which of the following is the main function of the DB2 Connect product?
a. DB2 Connect
b. DB2 Personal Edition
c. DB2 Personal Developer’s Edition
d. DB2 Enterprise Edition.
5. Which communication protocol could you use to access a DB2 UDB database?
a. X.25
b. AppleTalk
c. TCP/IP
d. None of these.
6. What product is required to access a DB2 for OS/390 from a DB2 CAE workstation?
7. Which communication protocol can be used between DRDA Application Requester (such as
DB2 Connect) and a DRDA Application Server (such as DB2 for OS/390)?
a. TCP/IP
b. NetBIOS
c. APPC
d. Both (a) and (c).
8. Which of the following provides the ability to access a host database with Distributed
Relational Database Architecture (DRDA)?
a. DB2 Connect
b. DB2 UDB
c. DB2 Developer’s Edition
d. All of these.
9. Which of the following provides the ability to develop and test a database application for one
user?
a. DB2 Connect
b. DB2 UDB
c. DB2 Developer’s Edition
d. All of these.
a. includes all the features provided in the DB2 UDB Workgroup Edition.
b. supports for host database connectivity.
c. provides users with access to DB2 databases residing on iSeries. or zSeries
platforms.
d. All of these.
13. DB2 UDB Personal Developer’s Edition includes the following for the Windows platform:
14. A comprehensive World Wide Web (WWW) development tool kit to create dynamic web
pages or complex web-based applications that can access DB2 databases, is provided by
a. Net.Data.
b. Lotus Approach.
c. SDK.
d. JDBC.
15. An easy-to-use interface for interfacing with UDB and other relational databases, is provided
by
a. Net.Data.
b. Lotus Approach.
c. SDK.
d. JDBC.
a. Net.Data.
b. Lotus Approach.
c. SDK.
d. JDBC.
17. A communication product that enables its users to connect to any database server that
implements the Distributed Relational Database Architecture (DRDA) protocol, including all
servers in the DB2 product family, is known as
a. DB2 Extender.
b. DB2 DataJoiner.
c. DB2 Connect.
d. None of these.
18. Access from network clients to DB2 databases residing on iSeries and zSeries host systems
is provided by
20. A vehicle for extending DB2 with new types and functions to support operations is known as
Chapter 24
Oracle
24.1 INTRODUCTION
Year Feature
1979 Oracle Release 2: the first commercially available relational database to
use SQL.
1983 Single code base for Oracle across multiple platforms.
1984 Portable toolset.
1986 Client/server Oracle relational database.
1987 CASE and 4GL toolset.
1988 Oracle Financial Applications built on relational database.
1989 Oracle6.
1991 Oracle Parallel Server on massively parallel platforms.
1993 Oracle7 with cost-based optimiser.
1994 Oracle Version 7.1 generally available: parallel operations including query,
load and create index.
1996 Universal database with extended SQL via cartridges, thin client and
application server.
1997 Oracle8 generally available: including object-relational and Very Large
Database (VLDB) features.
1999 Oracle8i generally available: Java Virtual Machine (JVM) in the database.
2000 Oracle9i Application Server generally available: Oracle tools integrated in
middle tier.
2001 Oracle9i Database Server generally available: Real Application Clusters,
Advanced Analytic Services.
In 1998, Oracle announced Oracle8i, which is sometimes referred to as
Version 8.1 of the Oracle8 database. The “i” was added to denote added
functionality supporting Internet deployment in the new version. Oracle9i
followed, with Application Server available in 2000 and Database Server in
2001. The terms “Oracle”, “Oracle8”, “Oracle8i” and “Oracle9i” may appear
to be used somewhat interchangeably in this book, since Oracle9i includes
all the features of previous versions. When we describe a new feature that
was first made available specifically for Oracle8i or Oracle9i we have tried
to note that fact to avoid confusion, recognising that many of you may have
old releases of Oracle. We typically use the simple term “Oracle” when
describing features that are common to all these releases.
Oracle has focused development around a single source code model since
1983. While each database implementation includes some operating system-
specific source code, most of the code is common across the various
implementations. The interfaces that users, developers and administrators
deal with for each version are consistent. Features are consistent across
platforms for implementations of Oracle Standard Edition and Oracle
Enterprise Edition. As a result, companies have been able to migrate Oracle
applications easily to various hardware vendors and operating systems while
leveraging their investments in Oracle technology. From the company’s
perspective, Oracle has been able to focus on implementing new features
only once in its product set, instead of having to add functionality at
different times to different implementations.
Oracle Enterprise Edition: Version of Oracle for a large number of users or a large
database, with advanced features for extensibility, performance and management.
Oracle Personal Edition: Single-user version of Oracle, typically used for
development of applications for deployment on other Oracle versions.
24.3.1.2 SQL
The ANSI standard Structured Query Language (SQL) provides basic
functions for data manipulation, transaction control and record retrieval from
the database. However, most end users interact with Oracle through
applications that provide an interface that hides the underlying SQL and its
complexity.
24.3.1.3 PL/SQL
Oracle’s PL/SQL, a procedural language extension to SQL, is commonly
used to implement program logic modules for applications. PL/SQL can be
used to build stored procedures and triggers, looping controls, conditional
statements and error handling. You can compile and store PL/SQL
procedures in the database. You can also execute PL/SQL blocks via
SQL*Plus, an interactive tool provided with all versions of Oracle.
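The following is a minimal sketch of such a PL/SQL stored procedure; the EMP table and its columns are illustrative assumptions, not taken from the text:

    CREATE OR REPLACE PROCEDURE raise_salary (
        p_empno  IN NUMBER,
        p_amount IN NUMBER
    ) AS
    BEGIN
        -- conditional logic, looping and error handling can all be expressed in PL/SQL
        UPDATE emp
        SET    sal = sal + p_amount
        WHERE  empno = p_empno;

        IF SQL%ROWCOUNT = 0 THEN
            RAISE_APPLICATION_ERROR(-20001, 'Employee not found');
        END IF;
    END raise_salary;
    /

Once compiled and stored in the database, the procedure can be invoked from SQL*Plus with a command such as EXECUTE raise_salary(7369, 500);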
Oracle Reports, which provides a scalable middle tier for the reporting of prebuilt
query results.
Oracle Discoverer, for ad hoc query and relational online analytical processing
(ROLAP).
OLAP applications custom-built with JDeveloper.
Business intelligence beans that leverage Oracle9i Advanced Analytic Services.
Clickstream Intelligence.
24.3.4.6 Oracle9i AQ
It adds XML support and Oracle Internet Directory (OID) integration. This
technology is leveraged in Oracle Application Interconnect (OAI), which
includes adapters to non-Oracle applications, messaging products and
databases.
24.3.4.7 Availability
Although basic replication has been included with both Oracle Standard
Edition and Enterprise Edition, advanced features such as advanced
replication, transportable tablespaces and Advanced Queuing have typically
required Enterprise Edition.
When you are using Oracle, by default the degree of parallelism for any
operation is set to twice the number of CPUs. This degree can be adjusted
automatically for each subsequent query based on the system load. You can
also generate statistics for the cost-based optimiser in parallel.
Maintenance functions can also be performed, such as loading (via
SQL*Loader), backups and index builds in parallel in Oracle Enterprise
Edition. Oracle Partitioning for the Enterprise Edition enables additional
parallel Data Manipulation Language (DML) inserts, updates and deletes as
well as index scans.
Bitmap indexes: Rather than storing the actual value, a bitmap index uses an individual bit for each potential
value, with the bit either “on” (set to 1) to indicate that the row contains the value or “off” (set
to 0) to indicate that the row does not contain the value. This storage mechanism can also
provide performance improvements for the types of joins typically used in data warehousing (a short sketch follows this list).
Star query optimization: Typical data warehousing queries occur against a large fact table
with foreign keys to much smaller dimension tables. Oracle added an optimisation for this
type of star query to Oracle 7.3. Performance gains are realised through the use of Cartesian
product joins of dimension tables with a single join back to the large fact table. Oracle8
introduced a further mechanism called a parallel bitmap star join, which uses bitmap indexes
on the foreign keys to the dimension tables to speed star joins involving a large number of
dimension tables.
Materialised views: In Oracle, materialised views provide another means of achieving a
significant speed-up of query performance. Summary-level information derived from a fact
table and grouped along dimension values is stored as a materialised view. Queries that can
use this view are directed to the view, transparently to the user and the SQL they submit.
Analytic functions: A growing trend in Oracle and other systems is the movement of some
functions from decision-support user tools into the database. Oracle8i and Oracle9i feature
the addition of ANSI standard OLAP SQL analytic functions for windowing, statistics,
CUBE and ROLLUP and more.
Oracle9i Advanced Analytic Services: Oracle9i Advanced Analytic Services are a
combination of what used to be called OLAP Services and Data Mining. The OLAP services
provide a Java OLAP API and are typically leveraged to build custom OLAP applications
through the use of Oracle’s JDeveloper product. Oracle9i Advanced Analytic Services in the
database also provide predictive OLAP functions and a multidimensional cache for doing the
same kinds of analysis previously possible in Oracle’s Express Server.
The Oracle9i database engine also includes data-mining algorithms that are exposed through
a Java data-mining API.
Availability: Oracle Standard Edition lacks many important data warehousing features
available in the Enterprise Edition, such as bitmap indexes and materialised views. Hence,
use of Enterprise Edition is recommended for data warehousing projects.
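As a small illustration of two of the features above, the following sketch shows how a bitmap index and a summary-level materialised view might be declared; the SALES fact table and its columns are illustrative assumptions:

    -- bitmap index on a low-cardinality foreign-key column of the fact table
    CREATE BITMAP INDEX sales_region_bix
        ON sales (region_id);

    -- summary grouped along a dimension value; query rewrite lets matching
    -- queries be redirected to the view transparently to the user
    CREATE MATERIALIZED VIEW sales_by_region
        ENABLE QUERY REWRITE
        AS SELECT region_id, SUM(amount) AS total_amount
           FROM   sales
           GROUP  BY region_id;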
24.3.6.7 Availability
Oracle Enterprise Manager can be used for managing Oracle Standard
Edition and/or Enterprise Edition. Additional functionality for diagnostics,
tuning and change management of Standard Edition instances is provided by
the Standard Management Pack. For Enterprise Edition, such additional
functionality is provided by separate Diagnostics, Tuning and Change
Management Packs.
You can define replication of subsets of data via SQL statements. Because
data distributed to multiple locations can lead to conflicts (such as which
location now has the “true” version of the data), multiple conflict-resolution
algorithms are provided. Alternatively, you can write your own
algorithm.
In the typical usage of Oracle Lite, the user will link her handheld or
mobile device running Oracle Lite to an Oracle Database Server. Data and
applications will be synchronised between the two systems. The user will
then remove the link and work in disconnected mode. After she has
performed her tasks, she will relink and resynchronise the data with the
Oracle Database Server.
24.4 SQL*PLUS
SQL> is the prompt you get when you are connected to the Oracle
database system. In SQL*Plus you can divide a statement into separate lines;
each continuation line is indicated by a prompt such as 2>, 3> and so on. An
SQL statement must always be terminated by a semicolon (;). In addition to
the SQL statements, SQL*Plus provides some special SQL*Plus commands.
These commands need not be terminated by a semicolon. Upper and lower
case letters are only important for string comparisons. An SQL query can
always be interrupted by using <Control>C. To exit SQL*Plus you can
either type exit or quit.
The editor can be defined in the SQL*Plus shell by typing the command
define _editor = <name>, where <name> can be any editor such as emacs, vi,
joe or jove.
The command set linesize <number> can be used to set the maximum
length of a single line that can be displayed on screen.
set pagesize <number> sets the total number of lines SQL*Plus displays
before printing the column names and headings, respectively, of the selected
rows. Several other formatting features can be enabled by setting SQL*Plus
variables.
The command show all displays all variables and their current values.
To set a variable, type set <variable> <value>. For example, set timing on
causes SQL*Plus to display timing statistics for each SQL command that is
executed.
set pause on [<text>] makes SQL*Plus wait for you to press Return after
the number of lines defined by set pagesize has been displayed. <text> is the
message SQL*Plus will display at the bottom of the screen as it waits for
you to hit Return.
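A short session illustrating these settings might look as follows (a sketch; the query and the table name EMP are illustrative):

    SET LINESIZE 100
    SET PAGESIZE 24
    SET TIMING ON
    -- any query issued afterwards is displayed using the settings above
    SELECT empno, ename, sal
    FROM   emp
    WHERE  sal > 2000;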
The tables and views provided by the data dictionary contain information
about the following:
Users and their privileges.
Tables, table columns and their data types, integrity constraints and indexes.
Statistics about tables and indexes used by the optimiser.
Privileges granted on database objects.
Storage structures of the database.
The SQL command select * from DICT[IONARY]; lists all tables and
views of the data dictionary that are accessible to the user. The selected
information includes the name and a short description of each table and
view. Before issuing this query, check the column definitions of
DICT[IONARY] using desc DICT[IONARY] and set the appropriate column
formats using the format command.
The query select * from TAB; retrieves the names of all tables owned by
the user who issues this command. The query select * from COL; returns all
information about the columns of one's own tables.
Each SQL query requires various internal accesses to the tables and views
of the data dictionary. Since the data dictionary itself consists of tables,
Oracle has to generate numerous SQL statements to check whether the SQL
command issued by a user is correct and can be executed.
For example, a query that selects rows from the table EMP based on the
column SAL requires a verification of whether (1) the table EMP exists, (2) the user has
the privilege to access this table, (3) the column SAL is defined for this table
and so on.
USER
Tuples in the USER views contain information about objects owned by the
account performing the SQL query (current user).
USER_TABLES: all tables with their name, number of columns, storage information,
statistical information and so on (TABS).
USER_CATALOG: tables, views and synonyms (CAT).
USER_COL_COMMENTS: comments on columns.
USER_CONSTRAINTS: constraint definitions for tables.
USER_INDEXES: all information about indexes created for tables (IND).
USER_OBJECTS: all database objects owned by the user (OBJ).
USER_TAB_COLUMNS: columns of the tables and views owned by the user (COLS).
USER_TAB_COMMENTS: comments on tables and views.
USER_TRIGGERS: triggers defined by the user.
USER_USERS: information about the current user.
USER_VIEWS: views defined by the user.
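For example, the following queries list the current user's tables and then the columns of one of them (the table name EMP is illustrative):

    SELECT table_name
    FROM   user_tables;

    SELECT column_name, data_type
    FROM   user_tab_columns
    WHERE  table_name = 'EMP';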
ALL
Rows in the ALL views include rows of the USER views and all information
about objects that are accessible to the current user. The structure of these
views is analogous to the structure of the USER views.
ALL_CATALOG: owner, name and type of all accessible tables, views and
synonyms.
ALL_TABLES: owner and name of all accessible tables.
ALL_OBJECTS: owner, type and name of accessible database objects.
ALL_TRIGGERS …
ALL_USERS …
ALL_VIEWS …
DBA
24.6.1.3 Redo-Log-Buffer
This buffer contains information about changes of data blocks in the
database buffer. While the redo-log buffer is filled during data
modifications, the log writer process writes information about the
modifications to the redo-log files. These files are used, for example after a
system crash, in order to restore the database (database recovery).
24.6.1.4 Shared Pool
The shared pool is the part of the SGA that is used by all users. The
main components of this pool are the dictionary cache and the library cache.
Information about database objects is stored in the data dictionary tables.
When information is needed by the database, for example, to check whether
a table column specified in a query exists, the dictionary tables are read and
the data returned is stored in the dictionary cache.
24.6.1.5 DBWR
This process is responsible for managing the contents of the database buffer
and the dictionary cache. For this, DBWR writes modified data blocks to the
data files. The process only writes blocks to the files if more blocks are
going to be read into the buffer than free blocks exist.
24.6.1.6 LGWR
This process manages writing the contents of the redo-log-buffer to the redo-
log files.
24.6.1.7 SMON
When a database instance is started, the system monitor process performs
instance recovery as needed (for example, after a system crash). It cleans up
the database from aborted transactions and objects involved. In particular,
this process is responsible for coalescing contiguous free extents to larger
extents.
24.6.1.8 PMON
The process monitor process cleans up behind failed user processes and it
also cleans up the resources used by these processes. Like SMON, PMON
wakes up periodically to check whether it is needed.
24.6.1.10 USER
The task of this process is to communicate with other processes started by
application programs such as SQL*Plus. The USER process then is
responsible for sending respective operations and requests to the SGA or
PGA. This includes, for example, reading data blocks.
24.6.2.1 Database
A database consists of one or more storage divisions, so-called tablespaces.
24.6.2.2 Tablespaces
A tablespace is a logical division of a database. All database objects are
logically stored in tablespaces. Each database has at least one tablespace, the
SYSTEM tablespace, that contains the data dictionary. Other tablespaces can
be created and used for different applications or tasks.
24.6.2.3 Segments
If a database object (for example, a table or a cluster) is created,
automatically a portion of the tablespace is allocated. This portion is called a
segment. For each table there is a table segment. For indexes, the so-called
index segments are allocated. The segment associated with a database object
belongs to exactly one tablespace.
24.6.2.4 Extent
An extent is the smallest logical storage unit that can be allocated for a
database object, and it consists of a contiguous sequence of data blocks. If the
size of a database object increases (for example, due to insertions of tuples
into a table), an additional extent is allocated for the object. Information
about the extents allocated for database objects can be found in the data
dictionary view USER_EXTENTS.
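For instance, a query of the following form (a sketch) shows the extents allocated for the user's objects:

    SELECT segment_name, extent_id, blocks, bytes
    FROM   user_extents
    ORDER  BY segment_name, extent_id;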
A special type of segment is the rollback segment. Rollback segments do not contain a
database object, but contain a “before image” of modified data for which the
modifying transaction has not yet been committed. Modifications are undone
using rollback segments. Oracle uses rollback segments in order to maintain
read consistency among multiple users. Furthermore, rollback segments are
used to restore the “before image” of modified tuples in the event of a
rollback of the modifying transaction. Typically, an extra tablespace (RBS)
is used to store rollback segments. This tablespace can be defined during the
creation of a database. The size of this tablespace and its segments depends
on the type and size of transactions that are typically performed by
application programs.
A database typically consists of a SYSTEM tablespace containing the data
dictionary and further internal tables, procedures etc., and a tablespace for
rollback segments. Additional tablespaces include a tablespace for user data
(USERS), a tablespace for temporary query results and tables (TEMP) and a
tablespace used by applications such as SQL*Forms (TOOLS).
24.6.3.2 Blocks
An extent consists of one or more contiguous Oracle data blocks. A block
determines the finest level of granularity of where data can be stored. One
data block corresponds to a specific number of bytes of physical database
space on disk. A data block size is specified for each Oracle database when
the database is created. A database uses and allocates free database space in
Oracle data blocks. Information about data blocks can be retrieved from the
data dictionary views USER_SEGMENTS and USER_EXTENTS. These
views show how many blocks are allocated for a database object and how
many blocks are available (free) in a segment/ extent.
As mentioned in Section 24.6.1, aside from datafiles three further types of
files are associated with a database instance:
STATE TRUE/FALSE
1. In 1983, a portable version of Oracle (Version 3) was created that ran only on Digital
VAX/VMS systems.
2. Oracle Personal Edition is the single-user version of Oracle Enterprise Edition.
3. Oracle8i introduced the use of Java as a procedural language with a Java Virtual Machine
(JVM) in the database.
4. National Language Support (NLS) provides character sets and associated functionality, such
as date and numeric formats, for a variety of languages.
5. SQL*Plus is used to issue ad-hoc queries and to view the query result on the screen.
6. The SGA serves as that part of the hard disk where all database operations occur.
1. Oracle is a
a. relational DBMS.
b. hierarchical DBMS.
c. networking DBMS.
d. None of these.
a. Lawrence Ellison.
b. Bob Miner.
c. Ed Oates.
d. All of these.
a. 1977.
b. 1979.
c. 1983.
d. 1985.
4. A portable version of Oracle (Version 3) was created that ran not only on Digital VAX/VMS
systems in
a. 1977.
b. 1979.
c. 1983.
d. 1985.
5. The first version of Oracle, version 2.0, was written in assembly language for the
a. Macintosh machine.
b. IBM Machine.
c. HP machine.
d. DEC PDP-11 machine.
a. System/R.
b. DB2.
c. Sybase.
d. None of these.
a. 1997.
b. 1999.
c. 2000.
d. 2001.
a. 1997.
b. 1999.
c. 2000.
d. 2001.
a. single-server architecture.
b. multi-server architecture.
c. Both (a) and (b).
d. None of these.
1. The first version of Oracle, version 2.0, was written in assembly language for the _____
machine.
2. Oracle 9i application server was developed in the year _____ and the database server was
developed in the year _____.
3. Oracle Lite is intended for single users who are using _____ devices.
4. Oracle’s PL/SQL is commonly used to implement _____ modules for applications.
5. Oracle Lite is Oracle’s suite of products for enabling _____ use of database-centric
applications.
6. SQL*Plus is the _____ to the Oracle database management system.
7. SGA is expanded as _____.
Chapter 25
25.1 INTRODUCTION
Example
This resulted in very inefficient performance at the SQL Server. Each time
a warehouse manager executed the query, the database server was forced to
recompile the query and execute it from scratch. It also required the
warehouse manager to have knowledge of SQL and appropriate permissions
to access the table information.
We can simplify this process through the use of a stored procedure. Let us
create a procedure called sp_GetInventory that retrieves the inventory levels
for a given warehouse. Here is the SQL code:
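A minimal Transact-SQL sketch of such a procedure is shown below; the Inventory table and its Warehouse, Product and QuantityOnHand columns are illustrative assumptions:

    CREATE PROCEDURE sp_GetInventory
        @Warehouse NVARCHAR(50)
    AS
    BEGIN
        -- return the current stock level of every product held at the given warehouse
        SELECT Product, QuantityOnHand
        FROM   Inventory
        WHERE  Warehouse = @Warehouse;
    END;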
The New Delhi warehouse manager can use the same stored procedure to
access that area’s inventory.
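For instance, the call might simply pass the warehouse name as a parameter (a sketch, using the procedure outlined above):

    EXEC sp_GetInventory @Warehouse = 'New Delhi';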
The benefit of abstraction here is that the warehouse manager does not
need to understand SQL or the inner workings of the procedure. From a
performance perspective, the stored procedure will work wonders. SQL
Server creates an execution plan once and then reuses it by plugging in the
appropriate parameters at execution time.
If you had downloaded the file, you may see the Download Complete
dialogue box, as shown in Fig. 25.2.
Fig. 25.2 Download complete dialog box
Step 03: In this case, click Open. A dialogue box will indicate where
the file would be installed as shown in Fig. 25.3.
Fig. 25.3 Installation folder
Step 04: You can accept the default and click Finish. You may be
asked whether you want to create the new folder that does
not exist and you should click Yes. After a while, you
should receive a message indicating success, as shown in
Fig. 25.4.
Step 1: To start SQL Server, on the Taskbar as shown in Fig. 25.5, click
Start -> Programs -> Microsoft SQL Server -> Service
Manager.
Step 6: On the SQL Server Service Manager dialogue box, click the
Stop button.
Step 7: You will receive a confirmation message box. Click Yes.
Step 3: You can also establish the connection through the SQL Query
Analyser. To do this, from the task bar, you can click Start →
(All) Programs → Microsoft SQL Server → Query Analyser.
This action would open the Connect to SQL Server dialogue
box as shown in Fig. 25.12.
Fig. 25.12 Connect to SQL Server dialogue box
Step 4: If the Enterprise Manager was already opened but the server or
none of its nodes is selected, on the toolbar of the MMC, you
can click Tools → SQL Query Manager. This also would
display the Connect to SQL Server dialogue box.
Step 1: To proceed, you can right-click the SQL Server Group node and
click New SQL Server Registration as shown in Fig. 25.14.
Step 2: Click Next in the first page of the wizard as shown in Fig.
25.15.
Step 3: In the Register SQL Server Wizard and in the Available Servers
combo box, you can select the desired server or click (local),
then click Add as shown in Fig. 25.16.
Step 4: After selecting the server, you can click Next. In the third page
of the wizard as shown in Fig. 25.17, you would be asked to
specify how security for the connection would be handled. If
you are planning to work in a non-production environment
where you would not be concerned with security, the first radio
button would be fine. In most other cases, you should select the
second radio button as it allows you to eventually perform some
security tests during your development. This second radio
button is associated with an account created automatically
during installation. This account is called sa.
Fig. 25.14 SQL Server group and registration dialogue box
Step 5: After making the selection, you can click Next. If you had
clicked the second radio button in the third page, one option
would ask you to provide the user name and the password for
your account as shown in Fig. 25.18. You can then type either sa
or Administrator (or the account you would be using) in the
Login Name text box and the corresponding password. The
second option would ask you to let the computer prompt you for
a username and a password. For our exercise, you should accept
the first radio button, then type a username and a password.
Fig. 25.18 Select connection option dialogue box
Step 6: The next (before last) page would ask you to add the new
server to the existing SQL Server Group as shown in Fig.
25.19. If you prefer to add the server to another group, you
would click the second radio button, type the desired name
in the Group Name text box and click Next.
Step 7: Once all the necessary information has been specified, you
can click Finish.
Step 8: When the registration of the server is over, if everything is
fine, you would be presented with a dialogue box
accordingly as shown in Fig. 25.21.
Step 9: You can then click Close.
Step 10: To specify the computer you want connecting to, if you are
working from the SQL Server Enterprise Manager, you can
click either (local) or the name of the server you want to
connect to as shown in Fig. 25.22.
Fig. 25.19 Select SQL Server group dialogue box
Step 12: If the SQL Server Enterprise Manager is already opened and
you want to open SQL Query Analyser as shown in Fig.
25.23, in the left frame, you can click the server or any node
under the server to select it. Then, on the toolbar of the
MMC, click Tools → SQL Query Analyser. In this case, the
Query Analyser would open directly.
25.4.6 Security
An important aspect of establishing a connection to a computer is security.
Even if you are developing an application that would be used on a
standalone computer, you must take care of this issue. The security referred
to in this attribute has to do with the connection, not how to protect your
database.
If you are using SQL Server Enterprise Manager, you can simply connect
to the computer using the steps we have reviewed so far.
Step 1: If you are accessing SQL Query Analyser from the taskbar
where you had clicked Start → (All) Programs → Microsoft
SQL Server → Query Analyser, after selecting the computer in
the SQL Server combo box, you can specify the type of
authentication you want. If security is not an issue in this
instance, you can click the Windows Authentication radio
button as shown in Fig. 25.24.
Step 2: If you want security to apply and if you are connecting to SQL
Query Analyser using the Connect To SQL Server dialogue box,
you must click the SQL Server Authentication radio button as
shown in Fig. 25.25.
Step 3: If you are connecting to SQL Query Analyser using the Connect
To SQL Server dialogue box and you want to apply
authentication, after selecting the second radio button, this
would prompt you for a username.
Step 4: If you are “physically” connecting to the server through SQL
Query Analyser, besides the username, you can (must) also
provide a password to complete the authentication as shown in
Fig. 25.26.
Fig. 25.24 Connect to SQL Server dialogue box
Step 5: After providing the necessary credentials and once you click
OK, the SQL Query Analyser would display as shown in Fig.
25.27.
Fig. 25.27 SQL query analyzer display
If you are not trying to connect to one particular database, you do not need
to locate and click any. If you are attempting to connect to a specific
database, in SQL Server Enterprise Manager, you can simply click the
desired database as shown in Fig. 25.29.
After using a connection and getting the necessary information from it,
you should terminate it. If you are working in SQL Server Enterprise
Manager or the SQL Query Analyser, to close the connection, you can
simply close the window as an application.
Any of these actions causes the Database Properties to display. You can
then enter the name of the database.
To assist you with writing code, the SQL Query Analyser includes
sections of sample code that can provide placeholders. To access one of these
code sections, on the main menu of SQL Query Analyser, click File → New. Then,
in the general property page of the New dialogue box, you can double-click
a category to see the available options.
REVIEW QUESTIONS
1. What is Microsoft SQL Server? Explain.
2. What is Microsoft SQL Server 2000? What are its components? Explain.
3. Write the features of Microsoft SQL server.
4. What do you mean by stored procedures in SQL Server? What are its benefits?
5. Explain the structure of stored procedure.
STATE TRUE/FALSE
1. Microsoft SQL Server is a multithreaded server that scales from laptops and desktops to
enterprise servers.
2. Microsoft SQL Server can operate on clusters and symmetrical multiprocessing (SMP)
configurations.
3. SQL Profiler provides a window into the inner workings of the database.
4. Data Transmission Services (DTS) provide an extremely flexible method for importing and
exporting data between a Microsoft SQL Server installation and a large variety of other
formats.
5. SQL Server does not provide any graphical tools.
a. Relational DBMS.
b. Hierarchical DBMS.
c. Networking DBMS.
d. None of these.
a. 1980.
b. 1990.
c. 2000.
d. None of these.
a. Windows NT system.
b. UNIX system.
c. Both (a) and (b).
d. None of these.
a. clusters.
b. symmetrical multiprocessing.
c. personal digital assistant.
d. All of these.
1. Microsoft SQL Server is a _____ management system that was originally developed in 1980s
at _____ for _____ systems.
2. Query Analyser offers a quick method for performing _____ against any of your SQL Server
databases.
3. Microsoft SQL Server provides the stored procedure mechanism to simplify the database
development process by grouping _____ into _____.
4. SQL Server supports network management using the _____.
5. SQL Server supports distributed transactions using _____ and _____.
Chapter 26
Microsoft Access
26.1 INTRODUCTION
26.2.1 Tables
Tables in Access database are tabular arrangements of information. Columns
represent fields of information, or one particular piece of information that
can be stored for each entity in the table. The rows of the table contain the
records. A record contains one of each field in the database. Although a field
can be left blank, each record in the database has the potential for storing
information in each field in the table. Fig. 26.1 shows some of the fields and
records in an Access table.
Generally each major type of information in the database is represented by
a table. You might have a Supplier table, a Client table and an Employee
table. It is unlikely that such dissimilar information would be placed together
in the same table, although this information is all part of the same database.
Access Table Wizard makes table creation easy. When you use the Wizard
to build a table, you can select fields from one or more sample tables. Access
allows you to define relationships between fields in various tables. Using
Wizards, you can visually connect data in the various tables by dragging
fields between them.
Access provides two different views for tables, namely the Design view
and the Datasheet view. The Design view, as shown in Fig. 26.2, is used
when you are defining the fields that store the data in the table. For each
field in the table you define the field name and data type. You can also set
field properties to change the field format and caption (used for the fields on
reports and forms), provide validation rules to check data validity, create
index entries for the field and provide a default value.
In the Datasheet view, you can enter data into fields or look at existing
records in the table. Fig. 26.1 and 26.2 show the same Employee table: Fig.
26.1 presents the Datasheet view of it and Fig. 26.2 shows the design view.
Fig. 26.1 Access table in datasheet view
26.2.2 Queries
Access supports different kinds of queries, such as select, crosstab and
action queries. You can also create parameters that let you customise the
query each time you use it. Select queries choose records from a table and
display them in a temporary table called a dynaset. Select queries are
essentially questions that ask Access about the entries in tables. You can create
queries with a Query-by-Example (QBE) grid. The entries you make in this
grid tell Access which fields and records you want to appear in a temporary
table (dynaset) that shows the query results. You can use complex
combinations of criteria to define your needs and see only the records that
you need. Fig. 26.3 shows the entries in the QBE grid that will select the
records you want. This QBE grid includes a Sort row that allows you to
specify the order of records in the resulting dynaset.
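For instance, a select query built in the QBE grid corresponds to an ordinary SQL
SELECT statement that Access generates behind the scenes; a minimal sketch with
illustrative table and field names is:
SELECT EmployeeID, LastName, Department
FROM Employee
WHERE Department = 'Sales'
ORDER BY LastName ASC;
The fields, criteria and sort order entered in the grid map directly onto the select list,
the WHERE clause and the ORDER BY clause of such a statement.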
Fig. 26.2 Design view for a table
Queries can include calculated fields. These fields do not actually exist in
any permanent table, but display the results of calculations that use the
contents of one or more fields. Queries that use calculated fields let you
derive more meaningful information from the data you record in your tables,
such as year-end totals for sales and expenditures. The Query Wizard can
guide you through the steps of creating some common, but more complicated
types of queries.
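As a sketch of a calculated field (names are illustrative), a query might derive a
value from two stored fields:
SELECT ProductName, UnitPrice * UnitsInStock AS StockValue
FROM Products;
Here StockValue is not stored in any table; it is computed each time the query runs.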
26.2.3 Reports
In reports, you can see the detail as you can with a form on the screen but
you can also look at many records at the same time. Reports also let you
look at summary information obtained after reading every record in the
table, such as totals or averages. Reports can show the data from either a
table or a query. Fig. 26.5 shows a report created with Access. The drawing
was created using CorelDRAW software.
Access can use OLE and DDE, which are windows features that let you
share data between applications. The Report Wizard of Access helps you in
creating reports.
26.2.4 Forms
You can use forms to view the records in tables or to add new records.
Unlike datasheets, which present many records on the screen at one time,
forms have a narrower focus and usually present one record on the screen at
a time. You can use either queries or tables as the input for a form. You can
create forms using Form Wizard of Access. Access also has an AutoForm
feature that can automatically create a form for a table or query.
Controls are placed on a form to display fields or text. You can select
these controls and move them to a new location or resize them to give your
form the look you want. You can move the controls for fields and the text
that describes that field separately. You can also add other text to the form.
You can change the appearance of text on a form by changing the font or
making the type boldface or italic. You also can show text as raised or
sunken or use a specific colour. Lines and rectangles can be added to a form
to enhance its appearance. Fig. 26.6 shows a form developed to present data
in an appealing manner.
Fig. 26.5 Access report
Forms allow you to show data from more than one table. You can build a
query first to select the data from different tables to appear on a form or use
sub-forms to handle the different tables you want to work with. A sub-form
displays the records associated with a particular field on a form. Sub-forms
provide the best solution when one record in a table relates to many records
in another table. Sub-forms allow you to show the data from one record at
the top of the form with the data from related records shown below it. For
example, Fig. 26.7 shows a form that displays information from the Client
table at the top of the form and information from the Employee Time Log
table in the bottom half of the form, in a sub-form.
A form has events that you can have Access perform as different things
occur. Events happen at particular points in time in the use of a form. For
example, moving from one record to the next is an event. You can have
macros or procedures assigned to an event to tell Access what you want to
happen when an event occurs.
26.2.5 Macros
Macros are a series of actions that describe what you want Access to do.
Macros are an ideal solution for repetitive tasks. You can specify the exact
steps for a macro to perform and the macro can repeat them whenever you
need these steps executed again, without making a mistake.
Access macros are easy to work with. Access lets you select from a list of
all the actions that you can use in a macro. Once you select an action, you
use arguments to control the specific effect of the action. Arguments differ
for each of the actions, since each action requires different information
before it can perform a task. Fig. 26.8 shows macro instructions entered in a
Macro window. For many argument entries, Access provides its best guess at
which entry you will want; you only need to change the entry if you want
something different.
You can create macros for a command button in a form that will open
another form and select the records that appear in the other form. Macros
also allow other sophisticated options such as custom menus and popup
forms for data collection. The Menu Builder of Access offers an easier way
to create custom menus to work with macros.
You can execute macros from the database window or other locations. Fig.
26.9 shows a number of macros in the Database Window. You can highlight
a macro and then select Run to execute it.
Fig. 26.8 Access macro
Step 1: Open your database. If you have not already installed the
Northwind sample database, these instructions will assist you.
Otherwise, you need to go to the File tab, select Open and locate
the Northwind database on your computer.
Step 2: Select the queries tab. This will bring up a listing of the
existing queries that Microsoft included in the sample database
along with two options to create new queries as shown in Fig.
26.10.
Step 3: Double-click on “create query by using wizard”. The query
wizard simplifies the creation of new queries as shown in Fig.
26.10.
Step 4: Select the appropriate table from the pull-down menu. When
you select the pull-down menu as shown in Fig. 26.11, you will
be presented with a listing of all the tables and queries currently
stored in your Access database. These are the valid data sources
for your new query. In this example, we want to first select the
Products table, which contains information about the products
we keep in our inventory.
Fig. 26.11 Simple query wizard with pull-down menu
Step 5: Choose the fields you wish to appear in the query results by
either double-clicking on them or by single clicking first on the
field name and then on the “>” icon as shown in Fig. 26.12. As
you do this, the fields will move from the Available Fields
listing to the Selected Fields listing. Notice that there are three
other icons offered. The “>>” icon will select all available
fields. The "<" icon allows you to remove the highlighted field
from the Selected Fields list while the "<<" icon removes all
selected fields. In this example, we want to select the
ProductName, UnitsInStock and UnitsOnOrder from the
Product table.
Fig. 26.12 Field selection for query result
Step 11: Click on Finish. You will be presented with the two windows
below. The first window (Fig. 26.15) is the Query tab that we
started with. Notice that there is one additional listing now: the
Product Supplier Listing we created. The second window (Fig.
26.16) contains our results: a list of our company products,
inventory levels and the supplier’s name and telephone number!
Fig. 26.15 Query tab
You have successfully created your first query using Microsoft Access!
Step 1: Choose an open table entry. Look for an entry in the field row
that does not contain any information. Depending upon the size
of your window you may need to use the horizontal scroll bar at
the bottom of the table to locate an open entry.
Step 2: Select the desired field. Single click in the field portion of the
chosen entry and a small black down arrow will appear. Click
this once and you will be presented with a list of currently
available fields as shown in Fig. 26.18. Select the field of
interest by single clicking on it. In our example, we want to
choose the ContactName field from the Suppliers table (listed as
Suppliers.ContactName).
Step 1: Click on the field name. Single click on the name of the field
you wish to remove in the query table. In our example, we want
to remove the CompanyName field from the Suppliers table.
Step 2: Open the Edit menu and select Delete Columns. Upon
completion of this step, as shown in Fig. 26.19, the
CompanyName column will disappear from the query table.
Fig. 26.19 Removing fields
Step 1: Select the criteria field of interest. Locate the field that you
would like to use as the basis for the filter and single click
inside the criteria box for that field. In our example, we would
first like to limit the query based upon the UnitsInStock field of
the Products table.
Step 2: Type the selection criteria. We want to limit our results to those
products with less than ten items in inventory. To accomplish
this, enter the mathematical expression “< 10” in the criteria
field as shown in Fig. 26.20.
Step 3: Repeat steps 1 and 2 for additional criteria. We would also like
to limit our results to those instances where the UnitsOnOrder
field is equal to zero as shown in Fig. 26.20. Repeat the steps
above to include this filter as well.
Step 1: Click the Sort entry for the appropriate field. Single click in
the Sort area of the field entry and a black down arrow will
appear. Single click on this arrow and you’ll be presented with a
list of sort order choices as shown in Fig. 26.22. Do this for the
Products.ProductName field in our example.
Step 2: Choose the sort order. For text fields, ascending order will sort
alphabetically and descending order will sort by reverse
alphabetic order as shown in Fig. 26.22. We want to choose
ascending order for this example.
That is it! Close the design view by clicking the “X” icon in the upper
right corner. From the database menu, double click on our query name and
you’ll be presented with the desired results as shown in Fig. 26.23.
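The query assembled in this walkthrough corresponds roughly to the following SQL;
the SupplierID join field is assumed from the Northwind design:
SELECT Products.ProductName, Products.UnitsInStock,
       Products.UnitsOnOrder, Suppliers.ContactName
FROM Products INNER JOIN Suppliers
     ON Products.SupplierID = Suppliers.SupplierID
WHERE Products.UnitsInStock < 10
  AND Products.UnitsOnOrder = 0
ORDER BY Products.ProductName ASC;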
It allows us to create the framework (forms, tables and so on) for storing information in a
database.
Microsoft Access allows opening the table and scrolling through the records contained within
it.
Microsoft Access forms provide a quick and easy way to modify and insert records into your
databases.
Microsoft Access has capabilities to answer more complex requests or queries.
Access queries provide the capability to combine data from multiple tables and place specific
conditions on the data retrieved.
Access provides a user-friendly forms interface that allows users to enter information in a
graphical form and have that information transparently passed to the database.
Microsoft Access provides features such as reports, web integration and SQL Server
integration that greatly enhance the usability and flexibility of the database platform.
Microsoft Access provides native support for the World Wide Web.
Features of Access 2000 provide interactive data manipulation capabilities to web users.
Microsoft Access provides capability to tightly integrate with SQL Server, Microsoft’s
professional database server product.
REVIEW QUESTIONS
1. What is Microsoft Access?
2. How are tables, forms, queries and reports created in Access? Explain.
3. What are the different types of queries that are supported by Access? Explain each of them.
4. What do you mean by macro? Explain how macros are used in Access.
5. What is form in Access? What are its purposes?
STATE TRUE/FALSE
1. Microsoft Access is a powerful and user-friendly database management system for UNIX
systems.
2. Access supports Object Linking and Embedding (OLE) and dynamic data exchange (DDE).
3. Access provides a graphical user interface (GUI).
4. Reports, forms and queries are difficult to design and execute with Access.
5. Access considers both the tables of data that store your information and the supplemental
objects that present information and work with it, to be part of the database.
6. Select queries are essentially questions that ask Access about the entries in tables.
7. In Access, you cannot create queries with a Query-by-Example (QBE) grid.
1. Access is a
a. relational DBMS.
b. hierarchical DBMS.
c. networking DBMS.
d. none of these.
a. when you are defining the fields that store the data in the table.
b. to enter data into fields or look at existing records in the table.
c. to create parameters that let you customise the query.
d. None of these.
a. when you are defining the fields that store the data in the table.
b. to enter data into fields or look at existing records in the table.
c. to create parameters that let you customise the query.
d. None of these.
a. a table.
b. a query.
c. either a table or a query.
d. None of these.
1. Microsoft Access is a powerful and user-friendly database management system for _____.
2. Access provides two different views for tables, namely (a) _____ and (b)_____.
3. Select queries choose records from a table and display them in a temporary table called a
_____.
4. Crosstab queries provide a concise summary view of data in a _____ format.
5. Access provides four types of action queries, namely (a) _____, (b) _____, (c) _____ and
(d)_____.
Chapter 27
MySQL
27.1 INTRODUCTION
Column Types
Full operator and function support in the SELECT and WHERE clauses of queries.
For example: mysql> SELECT CONCAT(first_name, ' ', last_name)
    -> FROM citizen;
Full support for SQL GROUP BY and ORDER BY clauses. Support for group
functions (COUNT(), COUNT(DISTINCT …), AVG(), STD(), SUM(), MAX(),
MIN() and GROUP_CONCAT()).
Support for LEFT OUTER JOIN and RIGHT OUTER JOIN with both standard
SQL and ODBC syntax.
Support for aliases on tables and columns as required by standard SQL.
DELETE, INSERT, REPLACE and UPDATE return the number of rows that were
changed (affected). It is possible to return the number of rows matched instead by
setting a flag when connecting to the server.
The MySQL-specific SHOW command can be used to retrieve information about
databases, database engines, tables and indexes. The EXPLAIN command can be
used to determine how the optimiser resolves a query.
Function names do not clash with table or column names. For example, ABS is a
valid column name. The only restriction is that for a function call, no spaces are
allowed between the function name and the '(' that follows it.
You can mix tables from different databases in the same query.
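As an illustration of the query support summarised in the points above (table and
column names are illustrative), a session might combine an outer join, group functions
and an ORDER BY clause:
mysql> SELECT d.dept_name, COUNT(*), AVG(e.salary)
    -> FROM employee e LEFT OUTER JOIN department d
    ->   ON e.dept_id = d.dept_id
    -> GROUP BY d.dept_name
    -> ORDER BY d.dept_name;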
Security: A privilege and password system that is very flexible and secure, and that allows
host-based verification. Passwords are secure because all password traffic is encrypted when
you connect to a server.
Scalability and Limits.
Handles large databases. We use MySQL Server with databases that contain 50
million records.
Up to 64 indexes per table are allowed. Each index may consist of 1 to 16 columns
or parts of columns. The maximum index width is 1000 bytes. An index may use a
prefix of a column for CHAR, VARCHAR, BLOB or TEXT column types.
Connectivity.
Clients can connect to the MySQL server using TCP/IP sockets on any platform. On
Windows systems in the NT family (NT, 2000, XP or 2003), clients can connect
using named pipes. On Unix systems, clients can connect using Unix domain socket
files.
In MySQL versions 4.1 and higher, Windows servers also support shared-memory
connections if started with the --shared-memory option. Clients can connect through
shared memory by using the --protocol=memory option.
The Connector/ODBC (MyODBC) interface provides MySQL support for client
programs that use ODBC (Open Database Connectivity) connections. For example,
you can use MS Access to connect to your MySQL server. Clients can be run on
Windows or Unix. MyODBC source is available. All ODBC 2.5 functions are
supported, as are many others.
The Connector/J interface provides MySQL support for Java client programs that
use JDBC connections. Clients can be run on Windows or Unix. Connector/J source
is available.
Localisation
The MySQL server has built-in support for SQL statements to check, optimise, and
repair tables. These statements are available from the command line through the
mysqlcheck client. MySQL also includes myisamchk, a very fast command-line
utility for performing these operations on MyISAM tables.
All MySQL programs can be invoked with the --help or -? options to obtain online
assistance.
Other ways to work around file-size limits for MyISAM tables are as
follows:
If your large table is read-only, you can use myisampack to compress it. myisampack
usually compresses a table by at least 50%, so you can have, in effect, much bigger tables.
myisampack also can merge multiple tables into a single table.
MySQL includes a MERGE library that allows you to handle a collection of MyISAM tables
that have identical structure as a single MERGE table.
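A minimal sketch of such a MERGE table, assuming two illustrative MyISAM tables with
identical structure, might look like this:
CREATE TABLE log_2006 (id INT NOT NULL, msg CHAR(20)) ENGINE=MyISAM;
CREATE TABLE log_2007 (id INT NOT NULL, msg CHAR(20)) ENGINE=MyISAM;
CREATE TABLE log_all (id INT NOT NULL, msg CHAR(20))
  ENGINE=MERGE UNION=(log_2006, log_2007) INSERT_METHOD=LAST;
Queries against log_all then see the rows of both underlying tables as if they formed
one table.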
Feature         Version
Subqueries      4.1
R-trees         4.1 (for MyISAM tables)
Cursors         5.0
Foreign keys    5.1 (implemented in 3.23 for InnoDB)
Triggers        5.0 and 5.1
MySQL 4.0 has a query cache that can give a huge speed boost to applications with
repetitive queries.
Version 4.0 further increases the speed of MySQL Server in a number of areas, such
as bulk INSERT statements, searching on packed indexes, full-text searching (using
FULLTEXT indexes) and COUNT(DISTINCT).
The new Embedded Server library can easily be used to create standalone and
embedded applications. The embedded server provides an alternative to using
MySQL in a client/server environment.
InnoDB storage engine as standard.
The InnoDB storage engine is offered as a standard feature of the MySQL server.
This means full support for ACID transactions, foreign keys with cascading
UPDATE and DELETE and row-level locking are standard features.
New functionality.
Internationalisation.
Our German, Austrian and Swiss users should note that MySQL 4.0 supports a new
character set, latin1_de, which ensures that the German sorting order sorts words
with umlauts in the same order as do German telephone books.
Usability enhancements.
Most mysqld parameters (startup options) can be set without taking down the
server. This is a convenient feature for database administrators (DBAs).
Multiple-table DELETE and UPDATE statements have been added.
On Windows, symbolic link handling at the database level is enabled by default. On
Unix, the MyISAM storage engine supports symbolic linking at the table level (and
not just the database level as before).
SQL_CALC_FOUND_ROWS and FOUND_ROWS() are new functions that make
it possible to find out the number of rows a SELECT query that includes a LIMIT
clause would have returned without that clause.
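A short sketch of how these functions might be used (the table name is illustrative):
SELECT SQL_CALC_FOUND_ROWS name, age FROM example LIMIT 10;
SELECT FOUND_ROWS();
The second statement returns the number of rows the first SELECT would have produced
without its LIMIT clause.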
Speed enhancements.
Faster binary client/server protocol with support for prepared statements and
parameter binding.
BTREE indexing is supported for HEAP tables, significantly improving response
time for non-exact searches.
New functionality.
CREATE TABLE tbl_name2 LIKE tbl_name1 allows you to create, with a single
statement, a new table with a structure exactly like that of an existing table.
The MyISAM storage engine supports OpenGIS spatial types for storing
geographical data.
Replication can be done over SSL connections.
The new client/server protocol adds the ability to pass multiple warnings to the
client, rather than only a single result. This makes it much easier to track problems
that occur in operations such as bulk data loading.
SHOW WARNINGS shows warnings for the last command.
To support applications that require the use of local languages, the MySQL software
offers extensive Unicode support through the utf8 and ucs2 character sets.
Character sets can be defined per column, table and database. This allows for a high
degree of flexibility in application design, particularly for multi-language Web sites.
Per-connection time zones are supported, allowing individual clients to select their
own time zone when necessary.
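Two of the 4.1 additions listed above can be sketched in a short session (the table name
reuses the illustrative citizen table mentioned earlier):
CREATE TABLE citizen_backup LIKE citizen;
SHOW WARNINGS;
The first statement creates an empty table with exactly the structure of citizen; the
second displays any warnings raised by the last command.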
Usability enhancements.
Not all platforms are equally well-suited for running MySQL. How well a
certain platform is suited for a high-load mission-critical MySQL server is
determined by the following factors:
General stability of the thread library. A platform may have an excellent reputation
otherwise, but MySQL is only as stable as the thread library it calls, even if everything else is
perfect.
The capability of the kernel and the thread library to take advantage of symmetric multi-
processor (SMP) systems. In other words, when a process creates a thread, it should be
possible for that thread to run on a different CPU than the original process.
The capability of the kernel and the thread library to run many threads that acquire and
release a mutex over a short critical region frequently without excessive context switches. If
the implementation of pthread_mutex_lock() is too anxious to yield CPU time, this hurts
MySQL tremendously. If this issue is not taken care of, adding extra CPUs actually makes
MySQL slower.
General file system stability and performance.
If your tables are big, the ability of the file system to deal with large files at all and to deal
with them efficiently.
To begin working with PHP, you must first have access to either of the
following:
A web hosting account that supports the use of PHP web pages and grants you access to
MySQL databases.
Have PHP and MySQL installed on your own computer.
OR
OR
<script language="php">
php_code_here
</script>
So, what kind of code goes where it says php_code_here? Here is a quick
example.
<html>
<head>
<title>My Simple Page</title>
</head>
<body>
<?php echo "Hi There"; ?>
</body>
</html>
If you copy that code to a text editor and then view it from a web site that
has PHP enabled you get a page that says Hi There. The echo command
displays whatever is within quotes to the browser. There is also a print
command which does the same thing. Note the semicolon after the quoted
string. The semicolon tells PHP that the command has finished. It is very
important to watch your semicolons! If you do not, you may spend hours
debugging a page. You have been warned.
A little more information can be gained by using the phpinfo() command:
<html>
<head>
<title>My Simple Page</title>
</head>
<body>
<?php phpinfo(); ?>
</body>
</html>
This page will display a bunch of information about the current PHP setup
on the server as well as tell you about the many built-in variables that are
available.
It is important to note that most server configurations require that your
files be named with a .php3 extension in order for them to be parsed. Name
all of your PHP coded files filename.php3.
We can now use that variable to replace text throughout the page, as in the
example below:
<html>
<head>
<title>My Simple Page</title>
</head>
<body>
<?php $my_text = "Hello World"; /* illustrative variable name */ echo $my_text; ?>
</body>
</html>
The above code creates a page that prints the words “Hello World”. One
reason to use variables is that you can set up a page that repeats a value
throughout and then only need to change the variable value to make all the
values on the page change.
</form>
</body>
</html>
Let us look at a few of the highlights in this page. The first is the action of
this page, tips.php3. That means that the web server is going to send the
information contained in this form to a page on the server called tips.php3
which is in the same folder as the form.
The names of the input items are also important. PHP will automatically
create a variable with that name and set its value equal to the value that is
sent.
Now we need to create a PHP page that will handle the data. Of course,
this page needs to be named tips.php3. The source is listed below. One way,
perhaps the best way, to create a PHP page is to create the results page in a
graphical editor, highlighting areas where dynamic content should go. You
can then use a text editor to replace the highlighted area with PHP.
1. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
2. "http://www.w3.org/TR/REC-html40/loose.dtd">
3. <html>
4. <head>
5. <title>Tip Calculation Complete</title>
6. </head>
7. <body>
8.
9. <?php
10.
11. if ($sub_total == "") { echo "<h4>Error: You need to input a total!</h4>"; }
12.
13. if ($tip_percent == "") { echo "<h4>Error: You need to input a tip percentage!</h4>"; }
14.
15. $tip_decimal = $tip_percent / 100;
16.
17. $tip = $tip_decimal * $sub_total;
18.
19. $total = $sub_total + $tip;
20.
21. ?>
22.
23. <form action="tips.php3" method="get">
24.
25. <p>Meal Cost: <strong>$<?php echo $sub_total; ?></strong></p>
26.
27. <p>Tip %: <strong><?php echo $tip_percent; ?>%</strong></p>
28.
29. <p>Tip Amount: <strong>$<?php echo $tip; ?></strong></p>
30.
31. <p>Total: <font size="+1" color="#990000"><strong>$<?php echo
$total; ?></strong></font></p>
32.
33. </form>
34.
35. </body>
36. </html>
Note that the line numbers are there for illustrative purposes only. Please
do not include them in your source code.
The server is the name of the server we want to connect to. Because all of
our scripts are going to be run locally on your web server, the correct address
is localhost.
<?php
mysql_connect("localhost", "admin", "1admin") or
die(mysql_error());
echo "Connected to MySQL<br />";
?>
Display:
Connected to MySQL
If you load the above PHP script to your webserver and everything works
properly, then you should see “Connected to MySQL” displayed when you
view the .php page.
The mysql_connect function takes three arguments: server, username and
password. In our example above, these arguments were:
Server - localhost.
Username - admin.
Password - 1admin.
The "or die(mysql…" code displays an error message in your browser if,
you guessed it, there is an error!
<?php
mysql_connect("localhost", "admin", "1admin") or
die(mysql_error());
echo "Connected to MySQL<br />";
mysql_select_db("test") or die(mysql_error());
echo "Connected to Database";
?>
Display:
Connected to MySQL
Connected to Database
This table has three categories or “columns”, of data: Age, Height and
Weight. This table has four entries, or in other words, four rows.
<?php
// Make a MySQL Connection
mysql_connect("localhost", "admin", "1admin") or die(mysql_error());
mysql_select_db("test") or die(mysql_error());
// Create the "example" table with an auto-incrementing primary key
mysql_query("CREATE TABLE example(id INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(id), name VARCHAR(30), age INT)") or die(mysql_error());
echo "Table Created!";
?>
Display:
Table Created!
The first part of the mysql_query told MySQL that we wanted to create a
new table. We capitalised the two words because they are reserved MySQL
keywords.
The word “example” is the name of our table, as it came directly after
“CREATE TABLE”. It is a good idea to use descriptive names when creating
a table, such as: employee information, contacts or customer orders. Clear
names will ensure that you will know what the table is about when revisiting
it a year after you make it.
Here we create a column “id” that will automatically increment each time a
new entry is added to the table. This will result in the first row in the table
having an id = 1, the second row id = 2, the third row id = 3 and so on.
PRIMARY KEY is used as a unique identifier for the rows. Here, we have
made “id” the PRIMARY KEY for this table. This means that no two ids can
be the same, or else we will run into trouble. This is why we made “id” an
auto incrementing counter in the previous line of code.
‘name VARCHAR(30),’
Here we make a new column with the name "name"! VARCHAR stands for
variable character, that is, a variable-length character field. We will most likely only
be using this column to store characters (A-Z, a-z). The number inside the parentheses
sets the limit on how many characters can be entered. In this case, the limit is 30.
‘age INT,’
Our third and final column is age, which stores an integer. Notice that there
are no parentheses following "INT", as SQL already knows what to do with
an integer. The possible integer values that can be stored within an “INT” are
-2,147,483,648 to 2,147,483,647, which is more than enough!
‘or die(mysql_error());’
This will print out an error if there is a problem in the creation process.
Display:
Data Inserted!
Again we are using the mysql_query function. "INSERT INTO" means that
data is going to be put into a table. The name of the table we specified is
“example”.
'(name, age) VALUES('Kumar Abhishek', '23')'
“(name, age)” are the two columns we want to add data in. “VALUES”
means that what follows is the data to be put into the columns that we just
specified. Here, we enter the name Kumar Abhishek for “name” and the age
23 for “age”.
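Put together, the SQL statement passed to mysql_query in this step is therefore of the
form:
INSERT INTO example (name, age) VALUES ('Kumar Abhishek', '23');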
<?php
// Make a MySQL Connection
mysql_connect("localhost", "admin", "1admin") or die(mysql_error());
mysql_select_db("test") or die(mysql_error());
// Retrieve all the data from the "example" table
$result = mysql_query("SELECT * FROM example") or die(mysql_error());
// Store the first row of the result in an associative array
$row = mysql_fetch_array($result);
echo "Name: ".$row['name']." Age: ".$row['age'];
?>
Display:
When you perform a SELECT query on the database it will return a MySQL
result. We want to use this result in our PHP code, so we need to store it in a
variable. $result now holds the result from our mysql_query.
This line of code reads “Select everything from the table example”. The
asterisk is the wild card in MySQL which just tells MySQL not to exclude
anything in its selection.
mysql_fetch_array returns the first associative array of the mysql result that
we pass to it. Here we are passing our MySQL result $result and the function
will return the first row of that result, which includes the data “Kumar
Abhishek” and “23”.
In our MySQL table “example” there are only two fields that we care
about: name and age. These names are the keys to extracting the data from
our associative array. To get the name we use $row[‘name’] and to get the
age we use $row[‘age’]. MySQL is case sensitive, so be sure to use
capitalization in your PHP code that matches the MySQL column names.
<?php
// Make a MySQL Connection
mysql_connect("localhost", "admin", "1admin") or die(mysql_error());
mysql_select_db("test") or die(mysql_error());
$result = mysql_query("SELECT * FROM example") or die(mysql_error());
echo "<table border='1'><tr><th>Name</th><th>Age</th></tr>";
// Print each row of the result as a table row
while ($row = mysql_fetch_array($result)) {
  echo "<tr><td>".$row['name']."</td><td>".$row['age']."</td></tr>";
}
echo "</table>";
?>
Display:
Name Age
Kumar Abhishek 23
Kumar Avinash 21
Alka Singh 15
Only the entries currently in our table appeared above. If you added more
entries to your table then you may see more data than what is above.
'$result = mysql_query'
When you select items from a database using mysql_query, the data is
returned as a MySQL result. Since we want to use this data in our table we
need to store it in a variable. $result now holds the result from our
mysql_query.
This line of code reads “Select everything from the table example”. The
asterisk is the wild card in MySQL which just tells MySQL to get
everything.
The mysql_fetch_array function gets the next in line associative array from a
MySQL result. By putting it in a while loop it will continue to fetch the next
array until there is no next array to fetch. At this point the loop check will
fail and the code will continue to execute.
In our MySQL table “example” there are only two fields that we care
about: name and age. These names are the keys to extracting the data from
our associative array. To get the name we use $row[‘name’] and to get the
age we use $row[‘age’].
wget http://www.washington.edu/computing/web/publishing/mysql-standard-4.1.11-ibm-aix4.3.3.0-powerpc.tar.gz
lynx:
gunzip -cd mysql-standard-4.1.11-ibm-aix4.3.3.0-powerpc.tar.gz | tar xvf -
cd mysql
./scripts/mysql_install_db
Step 5: The script informs you that a root password should be set.
You will do this in a few more steps.
Step 6: If you are upgrading an existing version of MySQL, move
back your .my.cnf file:
mv ~/.my.cnf.temp ~/.my.cnf
This requires that you keep the same port number for your
MySQL server when installing the new software.
Step 7: If you are installing MySQL for the first time, get the path
to your home directory:
echo $HOME
Note this down, as you will need the information in the next
step.
Create a new file called .my.cnf in your home directory.
This file contains account-specific settings for your MySQL
server.
pico ~/.my.cnf
Copy and paste the following lines into the file, making the
substitutions listed below:
[mysqld]
port=XXXXX
socket=/hw13/d06/accountname/mysql
.sock
basedir=/hw13/d06/accountname/mysql
datadir=/hw13/d06/accountname/mysql
/data
old-passwords
[client]
port=XXXXX
socket=/hw13/d06/accountname/mysqlm
.sock
Note: You must use a port number that is not already in use.
You can test a port number by typing telnet localhost
XXXXX (again replacing XXXXX with the port number). If
it says “Connection Refused”, then you have a good
number. If it says something ending in “Connection closed
by foreign host.” then there is already a server running on
that port, so you should choose a different number.
rm -R ~/mysql/data
cp -R ~/mysql-bak/data ~/mysql/data
./bin/mysqld_safe &
[1] 67786
% Starting mysqld daemon with databases
from
/hw13/d06/accountname/mysql/data
Step 10: At this point your MySQL password is still empty. Use the
following command to set a new root password:
./bin/mysql -u root -p
mysql>
Step 13: You are done! A MySQL server is now running in your
account and is ready to accept connections. At this point you
can learn about MySQL administration to get more familiar
with MySQL, and you can install phpMyAdmin to help you
administer your new database server.
You can delete the file used to install MySQL with the
following command:
rm ~/mysql-standard-4.1.11-ibm-aix4.3.3.0-
powerpc.tar.gz
REVIEW QUESTIONS
1. What is MySQL?
2. What are the features of MySQL?
3. What do you mean by MySQL stability? Explain.
4. Discuss the features available in MySQL 4.0.
5. What do you mean by embedded MySQL Server?
6. What are the features of MySQL Server 4.1?
7. What are MySQL mailing lists? What does MySQL mailing list contain?
8. What are the operating systems supported by MySQL?
9. What is PHP? What is its relevance to MySQL?
STATE TRUE/FALSE
1. MySQL is
a. relational DBMS.
b. Networking DBMS.
c. Open source SQL DBMS.
d. Both (a) and (c).
a. David Axmark.
b. Allan Larsson.
c. Michael “Monty” Widenius.
d. All of these.
a. Subqueries.
b. Unicode support.
c. Both (a) and (b).
d. None of these.
4. PHP allows to
Chapter 28
Teradata RDBMS
28.1 INTRODUCTION
The Teradata Tools and Utilities software, together with the Teradata
relational database management system (RDBMS) software, permits
communication between a Teradata client and a Teradata RDBMS.
PASSWORD= specifies the Teradata password that is associated with the Teradata user
name. One must also specify USER=.
Example 1
proc sql;
connect to teradata as dbcon
(user=kamdar pass=ellis);
quit;
In Example 1, SAS/ACCESS
connects to the Teradata DBMS using the alias dbcon;
performs no other work.
Example 2
proc sql;
connect to teradata as tera (user=kamdar password=ellis);
execute (drop table salary) by tera;
execute (create table salary (current_salary float, name char(10))) by tera;
execute (insert into salary values (35335.00, 'Dan J.')) by tera;
execute (insert into salary values (40300.00, 'Irma L.')) by tera;
disconnect from tera;
quit;
In Example 2, SAS/ACCESS
connects to the Teradata DBMS using the alias tera;
drops the SALARY table;
recreates the SALARY table;
inserts two rows;
disconnects from the Teradata DBMS.
Example 3
proc sql;
connect to teradata as tera ( user=kamdar
password=ellis );
execute (update salary set current_salary=45000
where (name='Alka Singh')) by tera;
disconnect from tera;
quit;
In Example 3, SAS/ACCESS
connects to the Teradata DBMS using the alias tera.
updates the row for Alka Singh, changing her current salary to Rs. 45,000.00.
disconnects from the Teradata DBMS.
Example 4
proc sql;
connect to teradata as tera2 ( user=kamdar
password=ellis ) ;
select * from connection to tera2 (select *
from salary);
disconnect from tera2;
quit;
In Example 4, SAS/ACCESS
connects to the Teradata database using the alias tera2;
selects all rows in the SALARY table and displays them using PROC SQL;
disconnects from the Teradata database.
All client components are based on CLI or ODBC or both. So, once the
client software is installed, these two components should be configured
appropriately before these client utilities are executed.
Teradata RDBMS is able to support JDBC programs in both forms of
application and applet. The client installation manual mentions that we need
to install JDBC driver on client computers, and we also need to start a
JDBC Gateway and Web server on the database server. Teradata supports at
least two types of JDBC drivers. The first type can be loaded locally and the
second should be downloadable. In either case, to support the
development, we need a local JDBC driver or a Web server/JDBC Gateway
running on the same node on which Query Manager is running. But in the
setup CD we received, there is neither a JDBC driver nor any Java
development tools. Moreover, the Web server is not started on your system yet.
One floppy disk is needed, which contains licenses for all components that can be installed.
Each component has one entry in the license.txt file.
If you are asked to choose between ODBC and Teradata ODBC with the DBQM enhanced
version, just ignore it. In this case, one cannot install DBQM_Admin, DBQM_Client and
DBQM_Server. These three components are used to optimize the processing of SQL
queries. The client software still works smoothly without them.
Because CLI and ODBC are the infrastructures of other components, neither of them should
be deleted from the installation list if there is any component based on it.
After ODBC installation, you will be asked to run the ODBC administrator to configure a Data
Source Name (DSN). This may simply be cancelled because the job can be done later. After
Teradata Manager installation, you will be asked to run Start RDBMS Setup. This can also be
done later.
Add one line into the hosts file: “130.108.5.57 teradatacop1”. Here, 130.108.5.57 is the IP
address of the top node of the system on which Query Manager is running. “teradata” will
be the TDPID which is used in many client components we installed. “cop” is a fixed suffix
string and "1" indicates that there is one RDBMS.
Fig. 28.2 Finding hosts file and setting network parameters
For Windows 2000, perform the following step: Start -> Settings -> Control Panel, as shown
in Fig. 28.4.
Find the icon “System”, double click it, get the following window, then choose “Advanced”
sub-window as shown in Figs. 28.5 and 28.6.
Fig. 28.5 Selecting “System” option
This file is copied to the computer when Teradata client software is installed. It contains
some default settings for CLI.
This can be set as we want. If the file does not exist, when an error occurs, the client
software will create the file to record the log information.
TDMSTPORT = 1025
Because our server is listening for connect requests on port 1025, it should be set as 1025.
This system environment variable is added for future usage. One should not insert a line
tdmst 1025/TCP
We can find the file clispb.dat after we install the client software. In our computer, it is
under the directory C:\Program Files\NCR\Teradata Client. Please use Notepad to open it.
Originally, i_dbcpath was set as dbc. That is not the same as what was set in the file hosts.
So it was modified as teradata. When we use some components based on CLI and do not
specify the TDPID or RDBMS, the components will open this file to find this default
setting. Therefore, it is suggested to set it as what is set in the file hosts.
For other entries in this file, we can just keep them as original settings.
To use utilities such as Queryman and WinDDI, we still need to configure a Data Source
Name (DSN) for ourselves.
There are four methods to install Teradata Tools and Utilities products:
Installing with PUT: The Teradata Parallel Upgrade Tool is an alternative method of
installing some of the Teradata Tools and Utilities products purchased.
Installing with the Client Main Install: All Teradata Tools and Utilities products, except
for the OLE DB Provider for Teradata, can be installed using the Client Main Install. The
Client Main Install is typical of Windows installations, allowing three forms of installation:
Typical installation: A typical installation installs all the products on each CD.
Custom installation: A custom installation installs only those products selected
from the available products.
Network installation: A network installation copies the installation packages for
the selected products to a specified folder. The network installation does not
actually install the products. This must be done by the user.
Installing Teradata JDBC driver by copying files: Starting with Teradata Tools and
Utilities Release 13.00.00, the three Teradata JDBC driver files are included on the utility
pack CDs. To install Teradata JDBC driver, the three files are manually copied from the
\TeraJDBC directory in root on the CD ROM into a directory of choice on the target client.
Installing from the command prompt: Teradata Tools and Utilities packages are
downloaded from a patch server, or copied using Network Setup Type, then installed on the
target machine by providing the package response file name as an input to the setup.exe
command at the command prompt. These packages are installed silently.
Downloading files from the Teradata Download Center: Several Teradata Tools and
Utilities products can be downloaded from the Teradata Download Center located at:
http://www.teradata.com/resources/drivers-udfs-and-toolbox
Teradata Call Level Interface version 2 (CLIv2): This product and its dependent
products namely Teradata Generic Security Services (TDGSS) Client and the
Shared ICU Libraries for Teradata are available for download.
ODBC Driver for Teradata (ODBC): This product and its dependent products
namely Teradata Generic Security Services (TeraGSS) Client and the Shared ICU
Libraries for Teradata are available for download.
Additionally, three other products are available from the Teradata Download Center:
Step 1: After highlighting Typical and clicking Next in the initial Setup
Type dialog box, the Choose Destination Location dialog box appears. If
the default path shown in the Destination Folder block is acceptable, click
Next. (This is recommended).
As shown in Fig. 28.9, to use a destination location other than the default,
click Browse, navigate to the location where the files are to be installed,
click OK, then click Next.
One must have write access to the destination folder, the Windows root
folder and the Windows system folder.
Step 2: In the Select Install Method dialog box as shown in Fig. 28.10,
select the products to automatically install (silent install) or clear the
products to interactively install:
a. Highlight the products to be installed silently.
The ODBC Driver for Teradata and Teradata Manager can be installed silently or
interactively; the default is interactive.
Teradata SQL Assistant/Web Edition, Teradata MultiTool and Teradata DQM
Administrator can only be installed interactively.
All other products can be installed silently or interactively; the default is silent.
b. Those not highlighted will be installed interactively, meaning that the product setup
sequence will be activated so you can make adjustments during installation.
c. Click Next.
Fig. 28.9 Choosing destination location
Step 2: In the License Agreement dialog box, read the agreement, select
I accept the terms in the license agreement, then click Next.
http://tssprod.teradata.com:8080/TSFS/home.do
Step 3: In the Setup Type dialog box, click the name of the desired
installation setup, then click Next:
Custom is for advanced users who want to choose the options to install.
Typical is recommended for most users. All ODBC driver programs will be installed.
Step 5: In the Start Copying Files dialog box, review the information.
When satisfied that it is correct, click Next.
The driver installation begins. During installation, progress monitors
appear. No action is required.
Step 1: After highlighting Custom and clicking Next in the initial Setup
Type dialog box, the Select Components dialog box appears as shown in
Fig. 28.14. Do the following:
Select the check boxes for the products to install.
Clear the check boxes for the products not to install.
Click Next.
Fig. 28.14 Select component dialog box
If the product selected is dependent on other products, then those are also
selected.
If there are questions about the interdependence of products, and
Teradata MultiTool is being installed without having Java 2 Runtime
Environment, install it when prompted to do so.
Step 3: In the Select Install Method dialog box as shown in Fig. 28.16,
select automatic (silent) or interactive installation for each product.
a. Highlight the products being installed silently.
Shared ICU Libraries for Teradata can only be installed in the silent mode from the
CD media.
The ODBC Driver for Teradata and Teradata Manager can be installed silently or
interactively; the default is interactive.
Teradata SQL Assistant/Web Edition, Teradata MultiTool and Teradata DQM
Administrator can only be installed interactively.
All other products can be installed silently or interactively; the default is silent.
b. The products not highlighted are installed interactively, meaning that the product setup
sequence is activated so that adjustments can be made during installation.
c. Click Next.
Step 4: After the installation process copies the necessary files to the
specified folder, the Setup Complete dialog box appears. Choose whether
or not to view the Release Definition, then click Finish.
Fig. 28.18 Choose destination location dialog box
Some usage tips and examples for the frequently used commands are given here.
A Teradata SQL statement does not begin with a dot character, but it must end with a ';'
character.
Both BTEQ commands and SQL statements can be entered in any combination of uppercase
and lowercase and mixed-case formats.
.Logoff
After we enter the last ‘;’ and hit the [enter] key, these SQL requests will
be submitted as a transaction. If any one of these has an error, the whole
transaction will be rolled back.
.logon teradata/thomas
PASSWORD:thomaspass
In the above example, we connect to RDBMS called “teradata”. “teradata” is the TDPID of
the server. “thomas” is the userid and “thomaspass” is the password of the user.
LOGOFF
.logoff
Just logoff from the current user account without exiting from BTEQ.
EXIT or QUIT
.exit
.quit
These two commands are the same. After executing them, it will exit from BTEQ.
SHOW VERSIONS
.show versions
SECURITY
Specify the security level of messages sent from network-attached systems to the Teradata
RDBMS. By the first one, only messages containing user passwords, such as CREATE
USER statement, will be encrypted. By the second one, all messages will be encrypted.
SESSIONS
.sessions 5
.repeat 3
After executing the above commands one by one, it will create five sessions running in
parallel. Then it will execute the select request three times. In this situation, three out of the five
sessions will execute the select statement one time in parallel.
QUIET
.set quiet on
If switched on, the result of the command or SQL statement will not be displayed.
SHOW CONTROLS
.show controls
.show control
.set retlimit 4
Just display the first 4 rows of the result table and ignore the rest.
.set retlimit 0
RECORDMODE
.set recordmode on
SUPPRESS
.set suppress on 3
If the third column of the students table is Department Name, then the same department
names will be displayed only once on the terminal screen.
SKIPLINE/SKIPDOUBLE
.set skipline on 1
.set skipdouble on 3
During the display of result table, if the value in column 1 changes, skip one blank line to
display the next row. If the value in column 3 changes, skip two blank lines to display the
next row.
FORMAT
.set format on
Add the heading line, report title and footing line to the result displayed on the terminal
screen.
OS
.os command
c:\progra~1\ncr\terada~\bin> dir
The first command allows entering the Windows/Dos command prompt status. Then OS
commands such as dir, del, copy, etc. can be entered.
.os dir
Another way to execute the OS command is entering the command after the .os keyword.
RUN
We can run a script file which contains several BTEQ commands and SQL requests. Let us
see the following example:
.set defaults
.set defaults
2. RUN
If the working directory of BTEQ is not the same as the directory containing the file,
we must specify the full path.
SYSIN and SYSOUT are standard input and output streams of BTEQ. They can be
redirected as the following example:
Start -> programs -> accessories -> command prompt
In the above example, all output will be written into result.txt file but not to the terminal
screen. If runfile.txt file is placed in the root directory c:\, we can redirect the standard input
stream of BTEQ as the following example:
EXPORT
Command keyword “report” specifies the format of output. “file” specifies the name of
output file. If we want to export the data to a file for backup, use the following command:
.export data file = exdata
select * from students;
we will get all data from the select statement and store them into the file, exdata, in a special
format.
After exporting the required data, we should reset the export options.
.export reset
IMPORT
As mentioned above, we have already stored all data of the students table into the file
exdata. Now, we want to restore them into database. See the following example:
The third command requires BTEQ to execute the following command five times.
The last command has three lines. It will insert one row into the students table each time.
MACRO
We can use the SQL statements to create a macro and execute this macro at any time. See
the following example:
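A minimal sketch of such a macro, assuming the students table used throughout this
chapter (Teradata's ECHO statement passes a BTEQ command string back to the client), is:
CREATE MACRO MyMacro1 AS (
   ECHO '.set retlimit 10';
   SELECT * FROM students; );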
This macro executes one BTEQ command and one SQL request.
execute MyMacro1;
In the following demo, each step of using the Teradata DBMS to develop a
DB application has been described. In the example, there are two users:
John and Mike. John is the administrator of the application database. Mike
works for John and he is the person who manipulates the table students in
the database everyday.
John was created by the Teradata DBMS administrator and was granted the privileges to
create a USER and a DATABASE, as shown in Fig. 28.21. In Teradata DBMS, the owner
automatically has all privileges on the database he/she creates.
Fig. 28.22 shows how to create a user. In Teradata DBMS, a user is seen as a special database.
The difference between user and database is that a user has a password and can logon to the
DBMS, while a database is just a passive object in DBMS. Fig. 28.23 shows how to create a
database.
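A sketch of the two statements behind these figures, reusing the names that appear later
in this chapter and assuming illustrative PERM space allocations, is:
CREATE USER mike AS PASSWORD = mikepass, PERM = 1000000;
CREATE DATABASE student_info AS PERM = 10000000;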
John is the owner of user Mike and database student_info. John has all privileges on this
database such as creating table, executing select, insert, update and delete statements. But
we notice that Mike does not have any privilege on this database now. So John needs to
grant some privileges to Mike for his daily work as shown in Fig. 28.24.
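The grant shown in Fig. 28.24 amounts to a statement of roughly this form (the exact
privilege list is John's choice):
GRANT SELECT, INSERT, UPDATE, DELETE ON student_info TO mike;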
Fig. 28.24 Granting privilege to Mike
Create table
After granting appropriate privileges to Mike, John needs to create a table for storing the
information of all students. First, he must specify the database containing the table as shown
in Fig. 28.25.
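A sketch of these two steps in SQL, with illustrative column definitions, is:
DATABASE student_info;
CREATE TABLE students (
   studentid CHAR(5),
   name VARCHAR(30),
   dept VARCHAR(30));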
Using SQL statements such as Select, Insert, Delete and Update
Now, Mike can logon and insert some data into the table students as shown in Figs. 28.27
through 28.29.
Fig. 28.27 Inserting data into table
In Fig. 28.30, Mike inserts a new row whose first field is "00003". We notice that
there are two rows whose first fields have the same value. So, Mike decides to delete one of
them as shown in Fig. 28.31.
Fig. 28.30 Inserting a new row in the table
.exit
Figs. 28.33 through 28.38 show each step of creating the DSN used in the
user application:
a. start -> settings -> control panel (Fig. 28.33)
Fig. 28.33 Selecting “Control Panel” option for creating DSN
d. The ODBC Data Source Administrator window lists all DSN already created on the
computer as shown in Fig. 28.36. Now, click the button “Add…”
e. When asked to choose one ODBC driver for the Data Source, choose Teradata. (Fig. 28.37).
Fig. 28.37 Choosing Teradata option
f. As shown in Fig. 28.38, we then need to type in all information about the DSN, such as IP
address of server, username, password and the default database we will use.
We can access the table students via ODBC interface. We need to include
the following files (for developing demo for ODBC interface on Windows
NT/2000 by using VC++ 6.0):
#include <sql.h>
#include <sqlext.h>
#include <odbcinst.h>
#include <odbcss.h>
#include <odbcver.h>
SQLAllocEnv(&ODBChenv);
SQLAllocConnect(ODBChenv, &ODBChdbc);
SQLConnect(ODBChdbc, DataSourceName, SQL_NTS, DBusername, SQL_NTS,
           DBuserpassword, SQL_NTS);
SQLAllocStmt(ODBChdbc, &ODBChstmt);
/* construct the SQL command string */
SQLExecDirect(ODBChstmt, (UCHAR *)command, SQL_NTS);
if (ODBC_SUCCESS)
{
    SQLFetch(ODBChstmt);
    while (ODBC_SUCCESS /* && data set is not empty */)
    {
        /* process the data */
        SQLFetch(ODBChstmt);
    }
}
SQLFreeStmt(ODBChstmt, SQL_DROP);
SQLDisconnect(ODBChdbc);
SQLFreeConnect(ODBChdbc);
SQLFreeEnv(ODBChenv);
Copy all files of the project onto the target PC and double click the file
ODBCexample.dsw. VC++ 6.0 developing studio will load the Win32
project automatically as shown in Fig. 28.39. Then, choose menu item
“build” or “execute ODBCexample”.
Fig. 28.39 Loading Win32 and executing ODBC example
In the next window (Fig. 28.40), we can see all DSN defined on our PC.
We can choose TeradataExample created in Step 2. We do not need to
provide the user name mike and the password mikepass, because they were
already set in DSN. Then, click the button “Connect To Database” to
connect to the Teradata DBMS server.
Fig. 28.40 Choosing Teradata Example from all defined DSN lists
Click the ">>" button to enter the next window sheet. Now, after pressing "Get
Information of All Tables In The Database", we can see all tables in the
database including students created in Step 1. Then we can choose one table
from the leftmost listbox and enter the next window sheet by clicking ">>"
as shown in Fig. 28.41.
Fig. 28.41 Listing of all tables
After typing the SQL statement in the edit box, you can press button “Get
Information” to execute it as shown in Fig. 28.44. If we want to add a
student, please click “Add Student” as shown in Fig. 28.45.
Fig. 28.44 Choosing “Get Information” option
REVIEW QUESTIONS
1. What is Teradata technology? Who developed Teradata? Explain.
2. Discuss hardware, software and operating system platforms on which Teradata works.
3. Discuss the features of Teradata.
4. List some of the Teradata utilities and Teradata products that are generally used.
5. Briefly discuss the arguments that are used to connect to Teradata using Teradata-specific
SQL procedures. Give examples.
6. What is the purpose of using pass-through facility in Teradata? Discuss with examples how
pass-through facilities are implemented using Teradata-specific SQL procedures.
7. What is Teradata database? With a neat diagram, briefly discuss various components of a
Teradata database.
8. Discuss the functions of parsing engine (PE) and Access Module Processor (AMP).
9. Briefly discuss the various components of Teradata Director Program (TDP) and Teradata
Client software.
STATE TRUE/FALSE
a. 1980
b. 1990
c. 1979
d. none of these.
3. Teradata company was founded by a group of people, namely Dr. Jack E. Shemer, Dr. Philip
M. Neches, Walter E. Muir, Jerold R. Modes, William P. Worth and Carroll Reed.
a. 5 group of people
b. an individual
c. a corporate house
d. none of these.
4. The concept of Teradata grew out of research at the California Institute of Technology
(Caltech) and from the discussions of Citibank’s advanced technology group.
8. BTEQ is a component of
a. CLI
b. TDP
c. Teradata client software
d. none of these.
a. data
b. command
c. SQL statement
d. all of these.
STATE TRUE/FALSE
1. Data
2. Fact, processed/organized/summarized data
3. DBMS
4. (a) Data description language (DDL), (b) data manipulation language (DML)
5. DBMS
6. Database Management System
7. Structured Query Language
8. Fourth Generation Language
9. (a) Operational Data, (b) Reconciled Data, (c) Derived Data
10. Data Definition Language
11. Data Manipulation Language
12. Each of the data mart (a selected, limited, and summarized data warehouse)
13. (a) Entities, (b) Attributes, (c) Relationships, (d) Key
14. (a) Primary key, (b) Secondary key, (c) Super key, (d) Concatenated key
15. (a) Active data dictionary, (b) passive data dictionary
16. Conference of Data Systems Languages
17. List Processing Task Force
18. Data Base Task Force
19. Integrated Data Store (IDS)
20. Bachman
21. Permanent.
STATE TRUE/FALSE
STATE TRUE/FALSE
TICK (✓) THE APPROPRIATE ANSWER
a. fixed-length records,
b. variable-length records
22. Search-key
23. Fixed, flexible (removable)
24. Access time
25. Access time
26. Primary key
27. Direct file organization
28. Head activation time
29. Primary (or clustering) index
30. Indexed-sequential file
31. Indexed-sequential file
32. Sequential file
33. Sectors
34. Bytes of storage area
35. Compact disk-recordable
36. WORM
37. Root
38. A data item or record
39. IBM.
STATE TRUE/FALSE
1. One-to-one (1:1)
2. Subtype, subset, supertype
3. Subtype, supertype
4. Enhanced Entity Relationship (EER) model
5. Redundancy
6. Shared subtype
7. Supertype (or superclass), specialization/generalization
8. Supertype, subclass, specialization/generalization
9. Mandatory
10. Optional
11. One
12. Attribute inheritance
13. ‘d’, circle
14. ‘o’, circle
15. Shared subtype
16. Generalization
17. Generalization
18. Enhanced Entity Relationship.
CHAPTER 8 INTRODUCTION TO DATABASE DESIGN
STATE TRUE/FALSE
CHAPTER 10 NORMALIZATION
STATE TRUE/FALSE
1. Decomposing, redundancy
2. Normalization
3. Normalization
4. E. F. Codd
5. Atomic
6. 1NF, fully functionally dependent
7. Composite, attribute
8. 1NF
9. Primary (or relation) key
10. X → Y
11. 3NF
12. Candidate key
13. 3NF, 2NF
14. Primary, candidate, candidate
15. MVDs
16. Consequence.
STATE TRUE/FALSE
TICK (✓) THE APPROPRIATE ANSWER
STATE TRUE/FALSE
TICK (✓) THE APPROPRIATE ANSWER
1. Logical unit
2. Concurrency control
3. Wait-for-graph.
4. Read, Write
5. Concurrency control
6. All actions associated, none
7. (a) Atomicity, (b) Consistency, (c) Isolation, (d) Durability
8. Isolation
9. Transaction recovery subsystem
10. Record, transactions, database
11. Recovery subsystem
12. A second transaction
13. Data integrity
14. Consistent state
15. Concurrency control
16. Committed
17. Updates
18. Aborted
19. Non-serial schedule
20. Serializability
21. Cascading rollback
22. Granularity
23. Validation or certification method
24. Rollback
25. Transaction
26. Inconsistency
27. Serializability
28. Database record
29. Multiple-mode
30. READ
31. Concurrent processing
32. Unlocking
33. All locks, new.
STATE TRUE/FALSE
1. Database recovery
2. Rollback
3. Global undo
4. Force approach, force writing
5. No force approach
6. A single-user
7. NO-UNDO/NO-REDO
8. Transaction management
9. (a) data inconsistencies, (b) data loss
10. (a) hardware failure, (b) software failure, (c) media failure, (d) network failure
11. Inconsistent state, consistent
12. (a) different building, (b) protected against danger
13. Main memory
14. (a) loss of main memory including the database buffer, (b) the loss of the disk copy
(secondary storage) of the database
15. Head crash (record scratched by a phonograph needle)
16. COMMIT point
17. Without waiting, transaction log
18. (a) a current page table, (b) a shadow page table
19. Force-written
20. Buffer management, buffer manager.
CHAPTER 14 DATABASE SECURITY
STATE TRUE/FALSE
STATE TRUE/FALSE
STATE TRUE/FALSE
1. RDBMS, object-oriented
2. Complex objects type
3. Object-oriented
4. Complex data
5. HP
6. Universal server
7. Client, server
8.
STATE TRUE/FALSE
1. Windows
2. (a) the Design view, (b) the Datasheet view
3. Dynaset
4. Spreadsheet
5. (a) make-table, (b) delete, (c) append, (d) update.
Data: A known fact that can be recorded and that has implicit meaning.
Data item: The smallest unit of data that has meaning to its user.
Key: A data item (or field) that a computer uses to identify a record in a
database system.
Schema: A framework into which the values of the data items (or fields)
are fitted.
Model: A representation of the real world objects and events and their
associations.
SQL Data Query Language (DQL): SQL statements that enable the users
to query one or more tables to get the information they want.
CASE Tools: Software that provides automated support for some portion of
the systems development process.
CHAPTER 10 NORMALIZATION
Normal Form: State of a relation that results from applying simple rules
regarding functional dependencies (FDs) to that relation.
Query Decomposition: The first phase of query processing whose aims are
to transform a high-level query into a relational algebra query and to check
whether that query is syntactically and semantically correct.
World Wide Web (WWW): A subset of the Internet that uses computers
called Web servers to store multimedia files.
S. K. SINGH
Copyright © 2011 Dorling Kindersley (India) Pvt. Ltd.
Licensees of Pearson Education in South Asia.
No part of this eBook may be used or reproduced in any manner whatsoever without the publisher’s
prior written consent.
This eBook may or may not include all assets that were part of the print version. The publisher
reserves the right to remove any material present in this eBook at any time, as deemed necessary.
ISBN 9788131760925
ePub ISBN 9789332503212
Head Office: A-8(A), Sector 62, Knowledge Boulevard, 7th Floor, NOIDA 201 309, India.
Registered Office: 11 Local Shopping Centre, Panchsheel Park, New Delhi 110 017, India