
Database Module

The document provides an introduction to database systems, defining a database as a collection of shared information managed by a database management system (DBMS). It discusses various data management approaches, including manual, traditional file-based, and database approaches, highlighting their limitations and the advantages of the database approach such as data integration, sharing, and program-data independence. The document emphasizes the importance of databases in organizations for efficient data management and decision-making.

Uploaded by

kalkidanasdro11

Fundamentals of Database System

Chapter One
Introduction to Database Systems
1.1 Database - Definition and Usage
Database systems are designed to manage large data sets in an organization. Data
management involves both the definition and the manipulation of data, ranging from the simple
representation of data to the design of structures for storing information. It also
covers the provision of mechanisms for manipulating that information.
Today, databases are essential to every business. They are used to maintain internal records, to
present data to customers and clients on the World Wide Web, and to support many other
commercial processes. Databases are likewise found at the core of many modern organizations.
The power of databases comes from a body of knowledge and technology that has developed
over several decades and is embodied in specialized software called a database management
system, or DBMS. A DBMS is a powerful tool for creating and managing large amounts of data
efficiently and allowing it to persist over long periods of time, safely. These systems are among
the most complex types of software available.
So, what is a database? In essence, a database is nothing more than a
collection of shared information that exists over a long period of time, often many years. In
common usage, the term database refers to a collection of data that is managed by a DBMS.
Thus the database course is about:
• How to organize data
• Supporting multiple users
• Efficient and effective data retrieval
• Secure and reliable storage of data
• Maintaining consistent data
• Making information useful for decision making

1.2 Data Management Approaches


Data management has passed through different levels of development along with
advances in technology and services. These levels can best be described in three
categories. Even though each new level brings an advantage and overcomes a problem
of the previous one, all methods of data handling are still in use to some extent. The three
major levels are:
1. Manual Approach
2. Traditional File Based Approach
3. Database Approach

AASTU Compiled by Chere L. (M.Tech) Page 1


There are two major reasons to study the alternative data management approaches:
• Understanding the problems in those systems or approaches prevents us from repeating
similar problems.
• If you want to convert these approaches to a database system, understanding how these
systems work will be extremely useful.
1) Manual Approach
In the manual approach, data storage and retrieval follow the primitive and traditional way of
information handling, where cards and paper are used for the purpose. Data storage and
retrieval are performed using human labour.
• Files are kept for as many events and objects as the organization has.
• Each of the files, containing various kinds of information, is labelled and stored in one or
more cabinets.
• The cabinets can be kept in safe places for security purposes, based on the sensitivity of
the information they contain.
• Insertion and retrieval are done by searching first for the right cabinet, then for the right
file, then for the information.
• One could have an indexing system to facilitate access to the data.

Limitations of the Manual approach


• Prone to error
• Difficult to update, retrieve, integrate
• You have the data but it is difficult to compile the information
• Limited to small volumes of information
• Cross referencing is difficult
An alternative approach to data handling is a computerized way of dealing with the information.
The computerized approach can be either decentralized or centralized, based on where the
data resides in the system.

2) Traditional File Based Approach


After computers were introduced to the business community for data processing, the need to
use them for data storage and processing increased. There were, and still are, several
computer applications that use file-based processing for data handling. Even
though the approach has evolved over time, the basic structure is still similar if not identical.
The term file-based approach refers to the situation where data is stored in one or more separate
computer files. Typically, for example, the details of customers may be stored in one file, orders
in another, and so on. Computer programs that perform the various tasks required by the business process
the data stored in these files. Each program, or sometimes a related set of programs, is
called a computer application.
For example, all of the programs associated with processing customers’ orders are referred to as
the order processing application. The file-based approach might have application programs that
deal with purchase orders, invoices, sales and marketing, suppliers, customers, employees, and
so on. We can imagine that some of these different programs might use the same data. If the data
is kept in different files, there could be problems when an item of data needs updating, as it will
need to be updated in all the relevant files; if this is not done, the data will be inconsistent, and
this could lead to errors. The problem could be made even worse if different items of data are
changed in different departments, so that the invoice application uses a different address from the
sales mailing list program for the same customer.
The following diagram shows how different applications will each have their own copy of the
files they need in order to carry out the activities for which they are responsible.

For example, one user, the grade reporting office, may keep a file on students and their grades.
Programs to print a transcript and to enter new grades into the file are implemented as part of the
application. A second user, the accounting office, may keep track of students' fees and their
payments. Although both are interested in data about students, each user maintains separate
files and programs to manipulate those files, because each requires some data not available from the
other user's files.
Summary
• File-based systems were an early attempt to computerize the manual filing system.
• This approach is the decentralized computerized data handling method.
• A file-based system is a collection of application programs that perform services for the end users. In such
systems, every application program that provides a service to end users defines and manages
its own data.
• Such systems have a number of programs for each of the different applications in the
organization.
• Since every application defines and manages its own data, the system is subject to
serious data duplication problems.
• A file, in the traditional file-based approach, is a collection of records that contain logically
related data.

Limitations of the Traditional File Based approach


As business applications became more complex and demanded more flexible and reliable data
handling methods, the shortcomings of the file-based system became evident. These
shortcomings include, but are not limited to:
1) Separation and isolation of data:- When data is isolated in separate files, it is more difficult
to access data that should be available. Information available in one application may not
be known to another. The application programmer must synchronize the processing of two or more
files manually to ensure that the correct data is extracted.

2) Duplication of data:- When the decentralized file-based approach is employed,
uncontrolled duplication of data occurs. Uncontrolled duplication of data is undesirable
because it is wasteful (it costs money and time) and can lead to loss of data integrity.
3) Data dependence on the application:- In a file-based system, the physical structure and
storage of the data files and records are defined in the application program code. This
characteristic is known as program-data dependence. Making changes to an existing data
structure is rather difficult and requires modifying the programs. Such maintenance
activities are time-consuming and subject to error.
4) Incompatible file formats or data structures (e.g. C and COBOL):- The structure of a
file depends on the application programming language. A file structure provided by
one programming language, such as the direct file or indexed-sequential file available in
COBOL, may be different from the structure generated by another programming
language such as C. This incompatibility makes the files difficult to process jointly.
5) Fixed queries / proliferation of application programs:- File-based systems are very
dependent upon the application programmer. Any required queries or reports have to be written
by the application programmer. Normally, only fixed-format queries or reports can be entertained,
and no facility for ad hoc queries is offered. In general, the file-based approach supports only fixed query
processing, which is defined during application development.
6) Update anomalies:- The most significant problem experienced by the traditional file-based
approach to data handling is the update anomaly. There are three types of update anomalies:
1. Modification anomalies: a problem experienced when one or more data values are
modified in one application program but not in others containing the same data set.
2. Deletion anomalies: a problem encountered when a record set is deleted from one
application but remains untouched in other application programs.
3. Insertion anomalies: a problem experienced whenever there is a new data item to be
recorded and the recording is not made in all the applications. In addition, when the same data item
is inserted in different applications, there could be encoding errors that cause the
new data item to be treated as a totally different object.
7) Limited data sharing:- Every application maintains its own data.
8) Weak security measures:- It is difficult to protect data from unauthorized access and to provide
different levels of access privilege.
The limitations of the traditional file-based data handling approach arise from two basic causes:
i. The definition of the data is embedded in the application programs, which makes it difficult to
modify the data definition.
ii. There is no control over the access and manipulation of the data beyond that imposed by the
application programs.
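
The modification anomaly described above can be illustrated with a minimal sketch. The two dictionaries stand in for the separate files of an invoicing program and a mailing-list program; the customer record, the identifiers, and the function names are all hypothetical and chosen only for illustration.

```python
# Two applications, each keeping its own copy of the same customer data
# (hypothetical records standing in for two separate files).
invoice_file = {"C001": {"name": "Abebe", "address": "Addis Ababa"}}
mailing_file = {"C001": {"name": "Abebe", "address": "Addis Ababa"}}


def update_address_in_invoicing(customer_id, new_address):
    """The invoicing program only knows about its own file."""
    invoice_file[customer_id]["address"] = new_address


def inconsistent_customers(file_a, file_b):
    """Return the ids of customers whose address differs between the files."""
    return [
        cid
        for cid in file_a
        if cid in file_b and file_a[cid]["address"] != file_b[cid]["address"]
    ]


# The address is changed in one application but not in the other.
update_address_in_invoicing("C001", "Adama")
conflicts = inconsistent_customers(invoice_file, mailing_file)
print(conflicts)
```

Nothing in the file-based setup forces the second copy to be updated, so the two applications now disagree about the same customer.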

The Shared File Approach
One approach to solving the problem of each application having its own set of files is to share
files between different applications. This will alleviate the problem of inconsistent data between
different applications, and is illustrated in the diagram below.

The introduction of shared files solves the problem of inconsistent data across different versions
of the same file held by different departments, but other problems may emerge, including:
• When each department had its own version of a file for processing, each department
could ensure that the structure of the file suited their specific application. If departments
have to share files, the file structure that suits one department might not suit another, for
example, data might need to be sorted in a different sequence for different applications
(for instance, customer details could be stored in alphabetical order, or numerical order,
or ascending or descending order of customer number).
• Some applications may require access to more data than others, for instance a credit
control application will need access to customer credit limit information, whereas a
delivery note printing application will only need access to customer name and address
details. The file will still need to contain the additional information to support the
application that requires it.
• If the structure of the data file needs to be changed in some way (for example, to reflect a
change in currency), this alteration will need to be reflected in all application programs
that use that data file. This problem is known as physical data dependence, and will be
examined in more detail later in the unit.
• While a data file is being processed by one application, the file will not be available for
other applications or for ad hoc queries. This is because, if more than one application is
allowed to alter data in a file at one time, serious problems can arise in ensuring that the
updates made by each application do not clash with one another. This issue of ensuring
consistent, concurrent updating of information is an extremely important one, and is dealt
with in detail for database systems in the unit on concurrency control. File-based systems
avoid these problems by not allowing more than one application to access a file at one
time.

3) Database Approach
A database is a collection of information that is organized so that it can easily be accessed,
managed, and updated. The database approach emphasizes the integration and sharing of data
throughout the organization. Thus, in the database approach, logically related collections of data are
stored in a database.
A database is a computerized record-keeping system, a kind of electronic filing cabinet. It
can be viewed as a repository for a collection of computerized data files.
A database is a shared collection of logically related data designed to meet the information needs
of an organization. Since it is a shared corporate resource, the database is integrated with
minimal or no duplication. The logically related data comprise the entities, attributes,
relationships, and business rules of an organization's information. In addition to containing the data
required by an organization, a database also contains a description of the data, which is called
“metadata”, the “data dictionary”, the “system catalogue”, or “data about data”. Since a database
contains information about the data (metadata), it is called a self-describing collection of
integrated records.
The purpose of a database is to store information and to allow users to retrieve and update that
information on demand. A database is designed once and used simultaneously by many users.
Unlike the traditional file-based approach, the database approach offers program-data
independence, that is, the separation of the data definition from the application. Thus the
application is not affected by changes made to the data structure and file organization. Each
database application performs some combination of creating, reading, updating, and
deleting data.

Characteristics of the Database Approach


With the database approach, a single repository of data is maintained that is defined once and is
then accessed by various users. The main characteristics of the database approach are as follows:
• Self-describing nature of the database system
• Insulation between programs and data, and data abstraction
• Support of multiple views of the data
• Sharing of data and multi-user transaction processing

(a) Self-Describing Nature of a Database System


A fundamental characteristic of the database approach is that the database system contains not
only the database itself, but a complete definition or description of the database structure and
constraints. The definition is stored in the DBMS catalog which contains information such as the
structure of each file, the type and storage format of each data item, and various constraints on
the data. The information stored in the catalog is called metadata, and it describes the structure
of the primary database. The catalog is used by the DBMS software and also by database users
who need information about the database structure.

A general-purpose DBMS software package is not written for a specific database application, so it
must refer to the catalog to know the structure of the files in a specific database, such as the type
and format of the data it will access. The DBMS software must work equally well with any number of
database applications.
In traditional file processing, the data definition is part of the application programs themselves,
hence they are constrained to work with only one specific database (e.g. a banking database or a
university database). For example, an application written in C++ may declare the record layout in a struct or class
declaration. File processing software can access only specific databases; DBMS software can
access diverse databases by extracting the database definitions from the catalog and using these
definitions.
Example: In the example database, the catalog stores the definitions of all files. The
database designer specifies these definitions prior to creating the actual database. Whenever a
request is made to access, for example, the name of a student record, the DBMS software refers to
the catalog to determine the structure of the student file and the position and size of the name
data item within a student record. In traditional file processing, the size and file structure are
coded within the application that accesses the data item.
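
The self-describing nature of a database can be seen in a small sketch using SQLite through Python's standard sqlite3 module. SQLite is only one possible DBMS, and the student table and its columns are assumptions made for this illustration. The point is that the structure of the table is read back from the catalog rather than being coded in the program.

```python
import sqlite3

# Hypothetical example database: one student table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_number INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " major TEXT)"
)

# The catalog is itself queryable: sqlite_master holds one row per schema object.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(student)").fetchall()
column_types = {row[1]: row[2] for row in columns}

print(catalog_rows)
print(column_types)
```

A general-purpose tool can use the same two queries against any SQLite database, which is what lets it work without knowing the file structure in advance.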

(b) Insulation between Programs and Data, and Data Abstraction


In traditional file processing, the structure of the data files is embedded in the application
programs so any changes to the structure of a file may require changing all programs that access
the file. For example, a file access program may be written so that it can only access student
records with the exact structure specified by the starting position and length. If we add another
piece of data to the student record, such as BirthDate, the program must then be changed.
By contrast, DBMS access programs in most cases do not require such changes. The structure of
data files is stored in the catalog separately from the access programs. This is called program-data
independence. In the previous example, in a DBMS environment, we just need to
change the description of a student record in the catalog to reflect the inclusion of the new data
item BirthDate, and no programs need to be changed. The next time a program refers to the
catalog, the new structure of student records will be accessed and used.
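
The BirthDate example can be sketched with SQLite, again as an assumed stand-in for a DBMS. The table layout, the sample row, and the function student_name are hypothetical. The access program asks for a column by name, so adding a column to the stored structure does not require changing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (17, 'Smith')")


def student_name(connection, number):
    """Access program: requests a data item by name, not by byte position."""
    row = connection.execute(
        "SELECT name FROM student WHERE student_number = ?", (number,)
    ).fetchone()
    return row[0] if row else None


name_before = student_name(conn, 17)

# The record structure changes: a birth date item is added to every student record.
conn.execute("ALTER TABLE student ADD COLUMN birth_date TEXT")

# The access program is unchanged and still returns the same answer.
name_after = student_name(conn, 17)
column_names = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(name_before, name_after, column_names)
```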
In some types of database systems, such as object-oriented database systems, users can define
operations on data as part of the database. An operation is specified in two parts. The interface of
an operation includes the operation name and the data types of its arguments. The
implementation of the operation is specified separately and can be changed without affecting the
interface.
User application programs can operate on the data by invoking the operations through their
names and arguments, regardless of how the operations are implemented. This can be called
program-operation independence.
The characteristic that allows program-data independence and program-operation independence
is called data abstraction. A DBMS provides users with a conceptual representation of
data that does not include details of how the data is stored or how the operations are
implemented. A data model is a type of data abstraction that is used to provide this conceptual
representation.
Example: The internal implementation of a file may be defined by its record length (the number
of characters in each record), and each data item may be specified by its starting byte within a
record and its length in bytes. But a typical database user is not concerned with the location of
each data item within a record or its length. The DBMS hides details of the file storage
organization from the user.
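
For contrast, the kind of storage detail that a DBMS hides can be sketched with Python's struct module. The record layout below (a 30-byte name, a 4-byte student number, and a 4-byte major code) is a hypothetical one chosen for illustration; a file-processing program would have to hard-code exactly these positions and lengths.

```python
import struct

# Hypothetical fixed-length record layout (little-endian, no padding):
# 30-byte name, 4-byte unsigned student number, 4-byte major code.
RECORD_FORMAT = "<30sI4s"
RECORD_LENGTH = struct.calcsize(RECORD_FORMAT)

# Short byte strings are padded with null bytes up to the field length.
record = struct.pack(RECORD_FORMAT, b"Smith", 17, b"CS")

# A file-processing program hard-codes the starting byte and length of each item.
NAME_START, NAME_LENGTH = 0, 30
NUMBER_START = 30

name = record[NAME_START:NAME_START + NAME_LENGTH].rstrip(b"\x00").decode("ascii")
student_number = struct.unpack_from("<I", record, NUMBER_START)[0]
print(RECORD_LENGTH, name, student_number)
```

If the name field were widened, every constant above would have to change in every program that reads the file, which is the dependence the database approach removes.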

(c) Support of Multiple Views of the Data


A database typically has many users, each of whom may require a different perspective or view
of the database. A view may be a subset of the database, or it may contain virtual data that is
derived from the database files but not explicitly stored. For example, a student's grade average can be
derived from the stored grades when it is requested rather than being stored itself.
Users generally do not need to know whether the data is stored or derived.
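
Both kinds of view can be sketched in SQL, here run through SQLite as an assumed DBMS. The grade_report table, its rows, and the view names are hypothetical. The first view is a subset of the stored data; the second contains derived data that is computed when queried and never stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grade_report (student_number INTEGER, course TEXT, points REAL)"
)
conn.executemany(
    "INSERT INTO grade_report VALUES (?, ?, ?)",
    [(17, "MATH2410", 3.0), (17, "CS1310", 4.0), (8, "CS1310", 2.0)],
)

# View 1: a subset of the stored data (grades for one course only).
conn.execute(
    "CREATE VIEW cs1310_grades AS "
    "SELECT student_number, points FROM grade_report WHERE course = 'CS1310'"
)

# View 2: virtual data derived from the stored data (average per student).
conn.execute(
    "CREATE VIEW student_average AS "
    "SELECT student_number, AVG(points) AS average_points "
    "FROM grade_report GROUP BY student_number"
)

subset = conn.execute(
    "SELECT * FROM cs1310_grades ORDER BY student_number"
).fetchall()
averages = conn.execute(
    "SELECT * FROM student_average ORDER BY student_number"
).fetchall()
print(subset)
print(averages)
```

A user querying either view writes the same kind of SELECT statement and does not need to know which one is stored and which is derived.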

(d) Sharing of Data and Multi-User Transaction Processing


A multi-user DBMS must allow multiple users to access the database at the same time. This is
important if data for multiple applications is to be integrated (brought together) and maintained
in a single database.
The DBMS must include concurrency control software to ensure that several users trying to
update the same data do so in a controlled manner. For example, when several reservation clerks
try to assign a seat on a flight, the DBMS should ensure that each seat can be accessed by only
one clerk at a time.
An important concept in database applications is the transaction, which is an executing program or
process that includes one or more database accesses, such as reading or updating database
records. Each transaction is supposed to execute a logically correct database access if executed in
its entirety without interference from other transactions. The isolation property ensures that each
transaction appears to execute in isolation from other transactions, even though hundreds of
transactions may be executing concurrently.
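
The seat reservation example can be sketched as follows, assuming SQLite and a hypothetical seat table. This is a single-connection sketch, so it does not demonstrate real concurrency; it only shows the idea that the check ("is the seat free?") and the update are carried out as one transaction, so a second attempt on the same seat is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (seat_no TEXT PRIMARY KEY, passenger TEXT)")
conn.execute("INSERT INTO seat VALUES ('12A', NULL)")
conn.commit()


def reserve(connection, seat_no, passenger):
    """Assign the seat only if it is still free.

    The test for a free seat and the update are a single statement inside one
    transaction, so no other transaction can slip in between them.
    """
    with connection:  # commits on success, rolls back if an exception is raised
        cursor = connection.execute(
            "UPDATE seat SET passenger = ? WHERE seat_no = ? AND passenger IS NULL",
            (passenger, seat_no),
        )
        return cursor.rowcount == 1


first = reserve(conn, "12A", "Abebe")   # first clerk's request
second = reserve(conn, "12A", "Sara")   # second clerk's request for the same seat
holder = conn.execute(
    "SELECT passenger FROM seat WHERE seat_no = '12A'"
).fetchone()[0]
print(first, second, holder)
```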

Benefits of the database approach


• Data can be shared: two or more users can access and use the same data instead of
storing data redundantly for each user, i.e. the database belongs to the entire
organization and can be shared by all authorized users.
• Integrity can be maintained: data from different applications is integrated together with
additional constraints to facilitate a shared data resource. Database integrity refers to the
validity and consistency of stored data. Integrity is usually expressed in terms of
constraints, which are consistency rules that the database is not permitted to violate.
• Improved data accessibility and responsiveness: because the data is integrated in the database
approach, data access can cross departmental boundaries. This feature provides
more functionality and better services to the users. By using structured query
languages, users can easily access data without programming experience.
• Control of data redundancy: isolated data is integrated in the database to decrease the
redundant data stored by different applications. Although the database approach does not
eliminate redundancy entirely, it controls the amount of redundancy inherent in the
database.
• Inconsistency can be avoided: by eliminating or controlling data redundancy, the
database approach reduces the risk of inconsistencies occurring in the data. It helps
ensure that all copies of the data are kept consistent.
• Security measures can be enforced: the shared data can be secured by having different
levels of clearance and other data security mechanisms. The database approach protects
the data from unauthorized users. This may take the form of user names
and passwords that identify the type of user and their access rights for operations including
retrieval, insertion, updating and deletion.
• Standards can be enforced: the integration of the database enforces the necessary
standards, including data formats, naming conventions, documentation standards, update
procedures and access rules. Therefore the different ways of using and dealing with data
in different units of an organization can be balanced and standardized by using the database
approach.
• Improved backup and recovery services:- A modern database management system
provides facilities to minimize the amount of processing that is lost following a
failure by using the transaction approach.
• Improved maintenance:- The database approach provides data independence. Because a change
of data structure in the database does not affect the application programs, it simplifies
database application maintenance.
• Physical data dependence is resolved: this means that the underlying structure of a data
file can be changed without the application programs needing amendment. This is
achieved by a hierarchy of levels of data specification. Each such specification of data in
a database system is called a schema. The different levels of schema provided in database
systems, and further details of what is included within each specific schema,
are discussed later on.
• Transaction support can be provided: the basic demands of any transaction support system
are implemented in a full-scale DBMS.
• Quality data can be maintained for improved decision support: the integrity
constraints in the database approach maintain the quality of data, leading to better
decision making.

• Increased productivity:- The database approach provides all the low-level file-handling
routines. The provision of these functions allows the programmer to concentrate more on
the specific functionality required by the users. The fourth-generation environment
provided by the database can simplify database application development.
• Compactness: since it is an electronic data handling method, the data is stored compactly
(no voluminous papers).
• Speed: data storage and retrieval are fast because they use modern, fast computer
systems.
• Less labour: unlike the other data handling methods, data maintenance does not demand
many resources.
• Centralized information control: since the relevant data in the organization is stored in
one repository, it can be controlled and managed at the central level.
• Increased concurrency:- The database can manage concurrent data access effectively. It
ensures that users do not interfere with one another, so that neither information nor
integrity is lost.
• Balance of conflicting requirements:- With a structured design in the database, the
conflicts between users or departments can be resolved. Decisions will be based on the
best use of resources for the organization as a whole rather than for an individual entity.
• More information from the same amount of data:- With the integration of the operational
data in the database approach, it may be possible to derive additional information from the
same data.
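
The integrity benefit listed above, where constraints are rules that the database is not permitted to violate, can be sketched with declarative constraints in SQLite. The student table, its three constraints, and the helper try_insert are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_number INTEGER PRIMARY KEY,"          # no duplicate student numbers
    " name TEXT NOT NULL,"                          # every student must have a name
    " year INTEGER CHECK (year BETWEEN 1 AND 5))"   # year must be in a valid range
)


def try_insert(row):
    """Return 'accepted' if the DBMS stores the row, otherwise 'rejected'."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert((17, "Smith", 2)),  # valid row
    try_insert((17, "Brown", 1)),  # duplicate student number
    try_insert((8, None, 1)),      # missing name
    try_insert((9, "Brown", 9)),   # year outside the allowed range
]
stored_rows = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(results, stored_rows)
```

The rules are stated once in the data definition and enforced by the DBMS for every application, instead of being re-implemented in each program.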

Limitations and Risks of the Database Approach
In spite of the large number of advantages of the database approach, it is not without
challenges. Its disadvantages include the following:
• Complexity in designing and managing data:- A database management system is an
extremely complex piece of software. All parties must be familiar with its functionality
in order to take full advantage of it. The changes introduced by the adoption of a database
system must be properly managed to ensure that they help advance the company’s
objectives. Given the fact that database systems hold crucial company data that are
accessed from multiple sources, security issues must be assessed constantly. Therefore,
training for the administrators, designers and users is required.
• Size:- The database management system consumes a substantial amount of main memory
as well as a large amount of disk space in order to run efficiently.
• Cost of DBMS:- A multi-user database management system requires sophisticated
hardware and software and highly skilled personnel. Therefore it may cost more
to develop and maintain the system. Even after installation, there is a high
recurrent annual maintenance cost for the software. In general, the cost of maintaining
the hardware, software, and personnel required to operate and manage a database
system can be substantial. Training, licensing, and regulation compliance costs are often
overlooked when database systems are implemented.
• Risk and cost of conversion:- When moving from a file-based system to a database
system, the company incurs additional expenses for hardware acquisition and
training.
• Reduced performance (due to centralization and data independence):- As the database
approach caters for many applications rather than exclusively for a particular one,
some applications may not run as fast as before.
• Higher impact of a failure:- The database approach increases the vulnerability of the
system due to centralization. As all users and applications rely on the availability of the
database, the failure of any component can bring operations to a halt and seriously affect the
services to the customer. A failure of the central system therefore has a high impact
on the whole organization.
• Vendor dependence:- Given the heavy investment in technology and personnel training,
companies might be reluctant to change database vendors.
• Frequent upgrade/replacement cycles:- DBMS vendors frequently upgrade their
products by adding new functionality. Such new features often come bundled in new
upgrade versions of the software. Some of these versions require hardware upgrades. Not
only do the upgrades themselves cost money, but it also costs money to train database
users and administrators to properly use and manage the new features.
• Complex backup and recovery services from the user's perspective

1.3 Database Management System (DBMS)
Database Management System (DBMS) is a general-purpose Software packages (a collection of
programs) used for providing EFFICIENT, CONVENIENT and SAFE MULTI-USER (many
people/programs accessing same database, or even same data, simultaneously) storage of and
access to MASSIVE amounts of PERSISTENT (data outlives programs that operate on it) data.
A DBMS also provides a systematic method for define/create, construct, manipulate, maintain,
storing, retrieving data in a database and share databases among various users and applications. It
also provides the service of controlling data access, enforcing data integrity, managing
concurrency control, and recovery. Having this in mind, a full scale DBMS performs several
important functions that guarantee the integrity and consistency of the data in the database. Most
of those functions are transparent to end users, and most can be achieved only through the use
of a DBMS. This includes:
• Data dictionary management:- The DBMS stores definitions of the data elements and
their relationships (metadata) in a data dictionary. These definitions specify the data
types, structures and constraints for the data to be stored in the database. In turn, all
programs that access the data in the database work through the DBMS. The DBMS uses
the data dictionary to look up the required data component structures and relationships.
Additionally, any changes made in a database structure are automatically recorded in the
data dictionary. In other words, the DBMS provides data abstraction, and it removes
structural and data dependence from the system.
• Data storage management:- The DBMS handles the construction of the database, that is, the
process of storing the data itself on some storage medium. The DBMS creates and manages the
complex structures required for data storage. A modern DBMS provides storage not only
for the data, but also for related data entry forms or screen definitions, report definitions,
data validation rules, procedural code, structures to handle video and picture formats, and
so on. Data storage management is also important for database performance tuning.
Performance tuning relates to the activities that make the database perform more
efficiently in terms of storage and access speed. Although the user sees the database as a
single data storage unit, the DBMS actually stores the database in multiple physical data
files, which may even be stored on different storage media. Therefore, the DBMS doesn't have
to wait for one disk request to finish before the next one starts. In other words, the DBMS
can fulfill database requests concurrently.
• Data transformation and presentation (Manipulating):- Includes such functions as
querying the database to retrieve specific data, updating the database to reflect changes in
the miniworld, and generating reports from the data.
• Security management:- The DBMS supports the implementation of access and authorization
services for database administrators and users. It also provides protection services, which
include system protection against software/hardware malfunction and security protection
against unauthorized or malicious access.

• Multi-user access control (Sharing):- Allows multiple users and programs to access the
database concurrently. To provide data integrity and data consistency, the DBMS uses
sophisticated algorithms to ensure that multiple users can access the database concurrently
without compromising the integrity of the database.
• Backup and recovery management:- The DBMS provides backup and data recovery to
ensure data safety and integrity. Current DBMSs provide special utilities that allow
the DBA to perform routine and special backup and restore procedures. Recovery
management deals with the recovery of the database after a failure, such as a bad sector on
the disk or a power failure. Such capability is critical to preserving the database's integrity.
• Data integrity management:- Integrity services cover the rules about the data and the changes
made to it, the correctness and consistency of stored data, and the quality of data based on
business constraints. The DBMS promotes and enforces integrity rules, thus minimizing
data redundancy and maximizing data consistency. The data relationships stored in the data
dictionary are used to enforce data integrity. Ensuring data integrity is especially important
in transaction-oriented database systems.
• Database access language:- The DBMS provides data access through a query language. A
query language is a nonprocedural language, one that lets the user specify what must be
done without having to specify how it is to be done. Structured Query Language (SQL) is
a data access standard supported by the majority of DBMS vendors.
• Application program interface management:- The DBMS provides application programming
interfaces to procedural languages such as COBOL, C, Java, Visual Basic.NET, and C#.
• Utility services:- The DBMS provides administrative utilities used by the DBA and the
database designer to create, implement, monitor, and maintain the database. These utilities
include data import, statistical analysis support, index reorganization, and garbage
collection.
• Database Communication Interfaces:- Current-generation DBMSs accept end-user
requests via multiple, different network environments. For example, the DBMS might
provide access to the database via the Internet through the use of Web browsers such as
Mozilla Firefox or Microsoft Internet Explorer. In this environment, communications can
be accomplished in several ways. End users can generate answers to queries by filling in
screen forms through their preferred Web browser. The DBMS can automatically publish
predefined reports on a Website. The DBMS can connect to third-party systems to
distribute information via e-mail or other productivity applications.
• Concurrency Control Services:- Simultaneous access to and update of the database by
different users must be handled correctly.
• Data Independence Services:- Services that promote independence between the data and the
applications that use it.
Databases can be implemented by general-purpose DBMSs, or special-purpose custom software
can be written to create and maintain a database.
Database System =====> Database and DBMS software together.

Components of DBMS Environment
As discussed above, a DBMS is a software package used to design, manage, and maintain databases.
Each DBMS should have facilities to define the database, manipulate the content of the database
and control the database. These facilities help the designer, the user and the database
administrator to discharge their responsibilities in designing, using and managing the database.
The DBMS therefore provides the following facilities, which are referred to as the components of a DBMS.

DBMS Engine
The engine is the central component of a DBMS. This component provides access to the
database and coordinates all of the functional elements of the DBMS. An important source of
data for the DBMS engine, and the database system as a whole, is known as metadata. Metadata
means data about data. Metadata is contained in a part of the DBMS called the data dictionary
(described below), and is a key source of information to guide the processes of the DBMS
engine. The DBMS engine receives logical requests for data (and metadata) from human users
and from applications, determines the secondary storage location (i.e. the disk address) of the
requested data, and issues physical input/output requests to the computer operating system. The
requested data is fetched from physical storage into computer main memory, where it is held
in special data structures provided by the DBMS. Whilst the data remains in memory, it is
managed by the DBMS engine. Additional data structures are created by the database system
itself, or by users of the system, in order to provide rapid access to data being processed by the
system. These data structures include indexes to speed up access to the data, buffer areas into
which particular types of data are retrieved, lists of free space etc. The management of these
additional data structures is also carried out by the DBMS engine.

User Interface Subsystem
The interface subsystem provides facilities for users and applications to access the various
components of the DBMS. Most DBMS products provide a range of languages and other
interfaces, since the system will be used both by programmers (or other technical persons) and
by users with little or no programming experience. Some of the typical interfaces to a DBMS are
the following:

1. Data Definition Language (DDL):

• Language used to define each data element required by the organization. It is used by the
DBA and database designers to specify the conceptual schema of a database. In many
DBMSs, it is also used to define internal and external schemas (views).
• In some DBMSs, a separate storage definition language (SDL) and view definition
language (VDL) are used to define internal and external schemas.
• The DDL allows the DBA or user to describe and name the entities, attributes and relationships
required for the application. It is the specification notation for defining the database schema.
• Generally, it is a set of commands for setting up the schema or the intension of the database.
These commands are used to set up a database and to create, delete and alter tables, with the
facility of handling constraints. They are also used to define, modify or remove database
structures such as records, tables, files, and views.
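The DDL commands described above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the student table, its columns and the student_directory view are hypothetical names chosen only for illustration, and the exact DDL syntax differs somewhat between DBMS products.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# CREATE TABLE: define a structure, its attributes and constraints on them.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        full_name  TEXT NOT NULL,
        gpa        REAL CHECK (gpa BETWEEN 0.0 AND 4.0)
    )
""")

# ALTER TABLE: modify the structure of an existing table.
conn.execute("ALTER TABLE student ADD COLUMN department TEXT")

# CREATE VIEW: define an external schema (one user's view of the data).
conn.execute("""
    CREATE VIEW student_directory AS
    SELECT student_id, full_name, department FROM student
""")


def column_names(connection, relation):
    """Return the column names of a table or view (hypothetical helper)."""
    cursor = connection.execute(f"SELECT * FROM {relation}")
    return [column[0] for column in cursor.description]


student_columns = column_names(conn, "student")
view_columns = column_names(conn, "student_directory")
print(student_columns)
print(view_columns)

# DROP VIEW: remove a structure from the schema.
conn.execute("DROP VIEW student_directory")
remaining = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')")]
print(remaining)
```

None of these statements stores or reads any student data; they only describe the structure of the database, which is what distinguishes DDL from DML.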

2. Data Manipulation Language (DML):

• Language for accessing and manipulating the data organized by the appropriate data model.
• Provides the core commands used by end users and programmers to store, retrieve, and
update the data in the database.
• Used to specify database retrievals and updates.
• Since the data required by the user (the query) is extracted using this type of
language, it is also called a "Query Language".
• DML commands (data sublanguage) can be embedded in a general-purpose
programming language (host language), such as COBOL, C or an Assembly Language.
Alternatively, stand-alone DML commands can be applied directly (query language).
• There are two types of DML, namely procedural DML (the user specifies what data is
required and how to get the data) and non-procedural DML (the user specifies what data is
required but not how it is to be retrieved). SQL is the most widely used non-procedural
query language.
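A minimal sketch of DML in use is given below, again assuming Python's sqlite3 module as the DBMS and Python as the host language. The employee table and its rows are hypothetical. The last part contrasts a non-procedural request (state what is wanted) with a procedural one (spell out how to get it); both should return the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# INSERT: store new data.
conn.executemany(
    "INSERT INTO employee (emp_id, name, salary) VALUES (?, ?, ?)",
    [(1, "Abebe", 4000.0), (2, "Sara", 6500.0), (3, "Kebede", 5200.0)],
)

# UPDATE: change existing data.
conn.execute("UPDATE employee SET salary = salary + 500 WHERE emp_id = 1")

# DELETE: remove data.
conn.execute("DELETE FROM employee WHERE emp_id = 3")

# Non-procedural retrieval: the query states WHAT is wanted and the DBMS
# decides how to find it.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE salary > 4200 ORDER BY name")]

# Procedural retrieval: the host program spells out HOW, record by record.
procedural = []
for name, salary in conn.execute("SELECT name, salary FROM employee"):
    if salary > 4200:
        procedural.append(name)
procedural.sort()

print(declarative)
print(procedural)
```

The SQL statements here are embedded in a host language, which corresponds to the "data sublanguage" usage described above; the same statements could also be typed directly into an interactive query tool.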

3. Data Dictionary:
• Because a database is a self-describing system, a data dictionary is
used to store and organize information about the data stored in the database.

4. Data Control Language:
• A database is a shared resource that demands control of data access and usage. The database
administrator should have the facility to control the overall operation of the system.
• Data Control Language commands help the Database Administrator to
control access to the database. They allow a Database Administrator to have overall control
of the system, often including the administration of security, so that access to both the
data and processes of the database system can be controlled.
• It is used to define the security on the data in the database. The commands include those to
grant or revoke privileges to access the database or a particular object within the database,
and to commit or roll back database transactions.
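The transaction half of these commands can be sketched with Python's sqlite3 module. Note the assumption: SQLite is an embedded, single-file DBMS and has no GRANT or REVOKE statements, so privilege control is not shown here; in a multi-user server DBMS those would be statements of the form GRANT SELECT ON account TO some_user. The account table and balances are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()  # COMMIT: make the changes of the transaction permanent


def balances(connection):
    """Return {account number: balance} for all accounts."""
    return dict(connection.execute(
        "SELECT acc_no, balance FROM account ORDER BY acc_no"))


# A transfer that is abandoned half way: ROLLBACK removes its effects.
conn.execute("UPDATE account SET balance = balance - 30 WHERE acc_no = 1")
conn.rollback()
after_rollback = balances(conn)

# The complete transfer: both updates are committed together.
conn.execute("UPDATE account SET balance = balance - 30 WHERE acc_no = 1")
conn.execute("UPDATE account SET balance = balance + 30 WHERE acc_no = 2")
conn.commit()
after_commit = balances(conn)

print(after_rollback)
print(after_commit)
```

After the rollback the balances are unchanged, and after the commit the money has moved from one account to the other with the total preserved.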

5. A graphical user interface:

• Provides a visual means of browsing or querying the data, including a range of different
display options, such as bar charts, pie charts etc. One particular example of such a
system is Query-by-Example, in which the system displays a skeleton table (or tables),
and users pose requests by suitable entries in the table.
• A forms user interface, in which a screen-oriented form is presented to the user, who
responds by filling in blanks in the form. Such forms-based systems are a popular means
of providing a visual front-end both to developers and to users of a database system.
Typically, developers use the forms-based system in "developer mode", where they
design the forms or screens that will make up an application, and attach fragments of
code which will be triggered by the actions of users as they use the forms-based user
interface.
• User-friendly interfaces:
o Menu-based, popular for browsing on the web
o Forms-based, designed for naïve users
o Graphics-based (Point and Click, Drag and Drop etc.)
o Natural language: requests in written English
o Speech as Input (?) and Output
o Setting system parameters
o Changing schemas or access path
o Creating accounts, granting authorizations
o Parametric interfaces (e.g., bank tellers) using function keys.
o Combinations of the above
• A natural language user interface that allows users to present requests in free-form
English statements.

• A DBMS procedural programming language, often based on standard third-generation
programming languages such as C and COBOL, which allows programmers to develop
sophisticated applications.
• Fourth-generation languages (4GLs), which permit applications to be developed relatively
quickly compared to the procedural languages mentioned above. Typical 4GL facilities include:
o Query Languages
o Forms Generators
o Report Generators
o Graphics Generator
o Application Generator

6. Database System Utilities:- Used to perform certain functions such as:

• Loading data stored in files into a database. Includes data conversion tools.
• Backing up the database periodically on tape.
• Reorganizing database file structures.
• Report generation utilities.
• Performance monitoring utilities.
• Other functions, such as sorting, user monitoring, data compression, etc.

Data Dictionary Subsystem


The data dictionary subsystem is used to store data about many aspects of how the DBMS works. The
data contained in the Dictionary subsystem varies from DBMS to DBMS, but in all systems it is a key
component of the database. Typical data to be contained in the Dictionary includes: definitions of the
users of the system and the access rights they have, usage standards, details of the data structures used to
contain data in the DBMS, design decisions, application program descriptions, descriptions of business
rules that are stored and enforced within the DBMS, definitions of the additional data structures used to
improve systems performance.
It is important to understand that, because of the importance and sensitive nature of the data contained in
the Dictionary subsystem, most users will have little or no direct access to this information. However,
the Database Administrator will need to have regular access to much of the dictionary system, and
should have a detailed knowledge of the way in which the Dictionary is organized.
• An active data dictionary is accessed by the DBMS software as well as by users and the DBA.
• A passive data dictionary is accessed by users and the DBA only.
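To make the idea of a dictionary of metadata concrete, the sketch below reads the catalog of a small database. It assumes Python's sqlite3 module, where the catalog table is called sqlite_master and column details are available through PRAGMA table_info; other DBMS products expose their dictionary under different names. The course table and idx_course_title index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE course (
        course_code TEXT PRIMARY KEY,
        title       TEXT NOT NULL,
        credit_hour INTEGER
    )
""")
conn.execute("CREATE INDEX idx_course_title ON course (title)")

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
# Internal objects (names starting with "sqlite_") are filtered out.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()

# PRAGMA table_info returns one row per column:
# (position, name, declared type, NOT NULL flag, default value, key position).
columns = [
    (row[1], row[2], bool(row[3]))
    for row in conn.execute("PRAGMA table_info(course)")
]

print(catalog)
print(columns)
```

The DBMS itself maintained these entries when the CREATE statements ran, and it consults them on every query; in that sense this catalog behaves like the active data dictionary described above.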

Performance Management Subsystem
The performance management subsystem provides facilities to optimize (or at least improve) DBMS
performance. This is necessary because the large and complex software in a DBMS requires attention to
ensure it performs efficiently, i.e. it allows retrieval and changes to data to be made without requiring
users to wait for significant periods of time for the DBMS to carry out the requested action.
Two important functions of the Performance management subsystem are:
• Query optimization: Structuring SQL queries (or other forms of user queries) to minimize
response times.
• DBMS reorganization: Maintaining statistics on database usage and taking (or recommending)
actions such as database reorganization, creating indexes, and so on to improve DBMS
performance.
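The effect of these two functions can be observed in a small experiment. The sketch assumes Python's sqlite3 module, whose EXPLAIN QUERY PLAN statement reports how the optimizer intends to run a query; the customer table, the generated rows and the index name idx_customer_city are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [(f"customer{i}", f"city{i % 50}") for i in range(1000)],
)
conn.commit()

QUERY = "SELECT * FROM customer WHERE city = ?"


def plan(connection):
    """Return the optimizer's plan for QUERY as one line of text."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN " + QUERY, ("city7",)).fetchall()
    # The last column of each row is the human-readable description.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(conn)   # no index yet: every row has to be scanned

# Reorganization step: add an index and refresh the usage statistics.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
conn.execute("ANALYZE")

plan_after = plan(conn)    # the optimizer can now search through the index

print(plan_before)
print(plan_after)
```

The query text is identical in both runs; only the access path chosen by the DBMS changes, which is why this kind of tuning is transparent to application programs.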

Data Integrity Management Subsystem


The data integrity management subsystem provides facilities for managing the integrity of data in the
database and the integrity of metadata in the Dictionary. This subsystem is concerned with ensuring that
data is, as far as software can ensure, correct and consistent. There are three important functions:
• Intra-record integrity: Enforcing constraints on data item values and types within each record in
the database.
• Referential integrity: Enforcing the validity of references between records in the database.
• Concurrency control: Assuring the validity of database updates when multiple users access the
database.
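The first two functions can be illustrated with a short sketch; concurrency control is not shown. It assumes Python's sqlite3 module, where foreign keys are enforced only after PRAGMA foreign_keys = ON has been issued. The department and employee tables and the age limits are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks references only if enabled
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        age     INTEGER CHECK (age BETWEEN 18 AND 65),            -- intra-record rule
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)  -- referential rule
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Finance')")


def try_insert(row):
    """Try to store one employee row and report what the DBMS decided."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"


results = {
    "valid row": try_insert((1, 30, 10)),
    "age out of range": try_insert((2, 12, 10)),
    "unknown department": try_insert((3, 40, 99)),
}
stored_rows = conn.execute("SELECT COUNT(*) FROM employee").fetchall()[0][0]

for case, outcome in results.items():
    print(case, "->", outcome)
```

The rules are declared once in the schema and enforced by the DBMS for every program that touches the data, rather than being re-coded in each application.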

Backup and Recovery Subsystem


The backup and recovery subsystem provides facilities for logging transactions and database changes,
periodically making backup copies of the database, and recovering the database in the event of some
type of failure. (Backup and recovery will be explained in greater detail in an advanced database course.) A good
DBMS will provide comprehensive and flexible mechanisms for backing up and restoring copies of
data, and it will be up to the Database Administrator, in consultation with users of the system, to decide
precisely how these features should be used.
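A minimal sketch of a backup and restore cycle follows. It assumes Python 3.7 or later, whose sqlite3 module provides Connection.backup; two in-memory databases stand in for the live database and the backup medium, and the orders table is hypothetical. Transaction logging, which a full DBMS would also use for recovery, is not shown.

```python
import sqlite3

# The "live" database in daily use.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, item TEXT)")
live.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "pen"), (2, "book")])
live.commit()

# Backup: copy the whole live database into a second database.
backup_copy = sqlite3.connect(":memory:")
live.backup(backup_copy)

# Simulated failure: the data is lost after the backup was taken.
live.execute("DELETE FROM orders")
live.commit()
rows_after_failure = len(live.execute("SELECT * FROM orders").fetchall())

# Recovery: copy the backup over the damaged database.
backup_copy.backup(live)
rows_after_recovery = len(live.execute("SELECT * FROM orders").fetchall())

print(rows_after_failure, rows_after_recovery)
```

Anything changed between the last backup and the failure would be lost with this approach alone, which is one reason the DBA has to decide how often backups are taken.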

Application Development Subsystem


The application development sub-system is for programmers to develop complete database applications.
It includes CASE tools (software to enable the modeling of applications), as well as facilities such as
screen generators (for automatically creating the screens of an application given details about the data to
be input and/or output) and report generators.

In most commercial situations there will in fact be a number of different database systems, operating
within a number of different computer environments. By computer environment we mean a set of
programs and data made available usually on a particular computer. One such set of database systems,
used in a number of medium to large companies, involves the establishment of 3 different computer
environments.
The first of these is the Development environment, where new applications are developed and new
applications, whether written within the company or bought in from outside, are tested. The
Development environment usually contains relatively little data, just enough in fact to adequately test
the logic of the applications being developed and tested. Security within the Development environment
is usually not an important issue, unless the actual logic of the applications being developed is, in its
own right, of a sensitive nature.
The second of the three environments is often called pre-production. Applications that have been tested
in the development environment will be moved into pre-production for volume testing, that is testing
with quantities of data that are typical of the application when it is in live operation.
The final environment is known as the production or live environment. Applications should only be
moved into this environment when they have been fully tested in Pre-production. Security is nearly
always a very important issue in the production environment, as the data being used reflects important
information in current use by the Organization.
Each of these separate environments will have at least one database system, and because of the widely
varying activities and security measures required in each environment, the volumes of data and degree
of administration required will themselves vary considerably between environments, with the production
database(s) requiring by far the most support.
Given the need for the Database Administrator to migrate both programs and data between these
environments, an important tool in performing this process will be a set of utilities or programs for
migrating applications and their associated data both forwards and backwards between the environments
in use.

Security Management Subsystem


The security management subsystem provides facilities to protect and control access to the database and
data dictionary.

1.4 Components of Database System
Taking a DBMS as a system, one can describe it with respect to its environment or other systems
interacting with the DBMS. To design and use a database, there should be the interaction or integration
of Hardware, Software, Data, Procedure and People.

Hardware:
Hardware consists of the components that one can touch and feel. These comprise various types
of personal computers, mainframes or any server computers to be used in a multi-user system, network
infrastructure, and other peripherals required in the system.

Software:
Software is the collection of commands and programs used to manipulate the hardware to perform a
function. It includes components like the DBMS software, application programs, operating systems,
network software, language software and other relevant software.

Data:
Since the goal of any database system is to have better control of the data and to make data useful, data
is the most important component to the user of the database. There are two categories of data in any
database system: operational data and metadata. Operational data is the data actually stored in the
system to be used by the user. Metadata is the data that is used to store information about the database
itself. The structure of the data in the database is called the schema, which is composed of the entities,
the properties of entities, and the relationships between entities.

Procedure:
Procedures are the rules and regulations on how to design and use a database. They include procedures such as
how to log on to the DBMS, how to use its facilities, how to start and stop transactions, how to make
backups, how to treat hardware and software failures, and how to change the structure of the database.

People:
This component is composed of the people in the organization who are responsible for, or play a role in,
designing, implementing, managing, administering and using the resources in the database. It
ranges from people with a high level of knowledge about the database and the design
technology to others with no knowledge of the system except how to use the data in the database. In general,
database users include the following people:
• Database Administrator
• Database Designer
• Application Programmer and Systems Analyst
• End Users

Roles in Database Design and Use
As people are one of the components in the DBMS environment, different stakeholders play a
number of roles in the design and operation of a database system.
1. Database Administrator (DBA)
• Responsible for overseeing, controlling and managing the database resources (the database itself, the
DBMS and other related software)
• Authorizes access to the database
• Coordinates and monitors the use of the database
• Responsible for determining and acquiring hardware and software resources
• Accountable for problems such as poor security or poor performance of the system
• Involved in all steps of database development
This role can be further divided in big organizations that have huge amounts of data and many user
requirements.
A. Data Administrator (DA): responsible for the management of data resources. Involved in
database planning, development, and maintenance of standards, policies and procedures at the
conceptual and logical design phases.
B. Database Administrator (DBA): a more technical role. Responsible for the physical
realization of the database. Involved in physical design, implementation, security and
integrity control of the database.

2. Database Designer (DBD)

• Identifies the data to be stored and chooses the appropriate structures to represent and store the
data. Most of these functions are done before the database is implemented and populated with
the data.
• Should understand the user requirements and should choose how the user views the database.
It is therefore the designer's responsibility to communicate with all prospective users to understand
their requirements and come up with a design that meets these requirements.
• Is involved in the design phase before the implementation of the database system.
• Interacts with all potential users and develops views of the database that
meet the data and processing requirements of these groups.
There are two kinds of database designers: one involved in the conceptual and logical design, and
another involved in the physical design.

a. Logical and Conceptual DBD
• Identifies the data (entities, attributes and relationships) relevant to the organization
• Identifies the constraints on each data item
• Understands the data and business rules in the organization
• Sees the database independently of any data model at the conceptual level and considers one
specific data model at the logical design phase.
b. Physical DBD
• Takes the logical design specification as input and decides how it should be physically
realized.
• Maps the logical data model onto the specified DBMS with respect to tables and integrity
constraints (DBMS-dependent design).
• Selects the specific storage structures and access paths to the database
• Designs the security measures required on the database

3. Application Programmer and Systems Analyst

• The systems analyst determines the user requirements and how the user wants to view the database.
• The application programmer implements these specifications as programs: codes, tests, debugs,
documents and maintains the application program.
• Determines the interface on how to retrieve, insert, update and delete data in the database.
• The application could be written in any high-level programming language according to the
availability, the facility and the required service.

4. End Users
Workers whose jobs require accessing the database frequently for various purposes. There are different
groups of users in this category.
i. Naïve Users:
• Make up a sizable proportion of users
• Are unaware of the DBMS
• Only access the database based on their access level and demand
• Use standard and pre-specified types of queries.

ii. Sophisticated Users
• Are users familiar with the structure of the database and the facilities of the DBMS.
• Have complex requirements
• Have higher level queries
• Are usually engineers, scientists, business analysts, etc.
iii. Casual Users
• Users who access the database occasionally.
• Need different information from the database each time.
• Use sophisticated database queries to satisfy their needs.
• Are usually middle to high level managers.

The people involved with a database can again be classified as "Actors on the Scene" and "Workers Behind the Scene".
Actors on the Scene:
• Data Administrator
• Database Administrator
• Database Designer
• End Users

Workers Behind the Scene:
• DBMS designers and implementers: people who design and implement different DBMS
software.
• Tool Developers: experts who develop software packages that facilitate database
system design and use. Developers of prototyping tools, simulators and code generators are
examples. Independent software vendors could also be categorized in this group.
• Operators and Maintenance Personnel: system administrators who are responsible for
actually running and maintaining the hardware and software of the database system and
the information technology facilities.

1.5 Database Development Life Cycle
A database is one component of most information system development tasks, and designing a database
system involves several steps. Here more emphasis is given to the design phases of the system
development life cycle. The major steps in database design are:

1. Planning: identifying an information gap in an organization and proposing a database solution to
solve the problem.

2. Data analysis and requirements: concentrates on fact finding about the problem or the
opportunity. Feasibility analysis, requirement determination and structuring, and selection of the best
design method are also performed at this phase.
i. Designer’s efforts are focused on
a) Information needs of Information users.
b) Information sources. Information constitution.
ii. Sources of information for the designer
a) Developing and gathering end user data views
b) Direct observation of the current system: existing and desired output
c) Interface with the systems design group
iii. The designer must identify the company’s business rules and analyze their impacts.

3. Define Problems and Constraints

• How does the existing system function?
• What input does the system require?
• What documents does the system generate?
• How is the system output used? By whom?
• What are the operational relationships among business units?
• What are the limits and constraints imposed on the system?

4. Design: in database design, more emphasis is given to this phase. The phase is further divided into
three sub-phases.
a) Conceptual Design: a concise description of the data, data types, relationships between data
and constraints on the data. There is no consideration of implementation or physical detail.
Used to elicit and structure all information requirements.

b) Logical Design: a higher-level abstraction that maps the conceptual design onto a selected
specific data model to implement the data structure. It is independent of any particular DBMS
and involves no other physical considerations.
c) Physical Design: the physical implementation of the logical design of the database with
respect to the internal storage and file structure of the database for the selected DBMS. Used to
develop all technology and organizational specifications.

5. DBMS Selection
• The selection of an appropriate DBMS to support the database application.
• Can be undertaken at any time prior to logical design, provided sufficient information is available
regarding system requirements.
• Also involves designing the user interface and the application programs using the selected DBMS.

6. Prototyping: Building a working model of a database application.

Purpose
• To identify features of a system that work well, or are inadequate
• To suggest improvements or even new features
• To clarify the users' requirements
[Figure: Prototype Development Method Stages]

7. Implementation: the deployment and testing of the designed database for use.
8. Data conversion and loading
9. Testing
• The process of executing the application programs with the intent of finding errors.
• Uses carefully planned test strategies and realistic data.
• Testing cannot show the absence of faults; it can show only that software faults are
present.
• Demonstrates that the database and application programs appear to be working according to
requirements.
10. Operation and Support: administering and maintaining the operation of the database system and
providing support to users.
• The process of monitoring and maintaining the system following installation.
• Monitoring the performance of the system.
• If performance falls, the database may require reorganization.
• Maintaining and upgrading the database application (when required).
• Incorporating new requirements into the database application.

Chapter Two
Database system Concepts and Architectures
2.1 Database Terminology
Any firm that gathers extensive amounts of information to be retrieved at some point in the future
depends on a file system (generally a database system) that allows for the access of specific information
in a timely and cost effective manner. This information is usually stored in files, records and fields.
A file is a set of related information stored together. It is a collection of information relative to a number
of related objects. For example, a bank needs to collect various types of information about customers,
employees, savings accounts, demand accounts, etc. Each one of these is a ‘subpackage’ of information,
stored as a file. They may have 4 files (Employee File, a Customer File, a Savings Account file and a
Demand Account file).
A record is a collection of information about an object. If we look at the example from above, the
Customer File, a record is a collection of information about a customer object. The record contains
information required by the bank about each customer, and stored for each customer. For example,
information contained in a customer record may include at least first_name, last_name, id_number,
birth_date. Each of these pieces of information are taken together to form a customer record. Because
each record in the file contain the same number and types of fields, the records have the same type.
A field is a piece of information about an entity, contained in a record. In the above example, the four
pieces of information, first_name, last_name, id_number and birth_date, are all fields. We can therefore
use the following definitions: a record is a collection of related fields, and a file is a collection of
related records of the same type.
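The file, record and field definitions above can be sketched in a few lines of code. This is only an illustration, assuming Python dataclasses; the class name CustomerRecord and the sample values are hypothetical and not part of the module.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class CustomerRecord:
    """One record: a collection of related fields about one customer."""
    first_name: str
    last_name: str
    id_number: str
    birth_date: date


# A file is a collection of related records of the same type.
customer_file = [
    CustomerRecord("Abebe", "Kebede", "C001", date(1990, 5, 17)),
    CustomerRecord("Sara", "Tesfaye", "C002", date(1985, 11, 2)),
]

# Every record in the file has the same fields, so the records share one type.
field_names = [f.name for f in fields(CustomerRecord)]
print(field_names)
```

Each attribute of the class plays the role of a field, each object is one record, and the list stands in for the Customer file.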
Keys:- In most cases there is a field in a record that identifies that particular record uniquely. This field
is referred to as the primary key of the record. In some cases, no single field uniquely identifies a
record, so no single field can serve as the primary key. In these cases, a combination of two or more
fields can serve as the primary key. This is called a composite key.
Let’s assume there is a field for Customer Name. Because it is likely that two customers will have the
same name, this field can contain duplicate values, so Customer Name can’t be used as a primary key.
Such a field is referred to as a secondary key: a field that can be used to find records but that does
not uniquely identify a record.
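The three kinds of key can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the table names customer and account_holder and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id_number  TEXT PRIMARY KEY,   -- primary key: unique per record
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    )
    """
)
conn.execute(
    """
    CREATE TABLE account_holder (
        id_number  TEXT,
        account_no TEXT,
        PRIMARY KEY (id_number, account_no)   -- composite key: two fields together
    )
    """
)

# Two different customers may share the same name.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("C001", "Abebe", "Kebede"), ("C002", "Abebe", "Kebede")],
)
# With a composite key, each field may repeat as long as the pair is unique.
conn.executemany(
    "INSERT INTO account_holder VALUES (?, ?)",
    [("C001", "SA-1"), ("C001", "SA-2")],
)

# A duplicate primary key value is rejected by the DBMS.
try:
    conn.execute("INSERT INTO customer VALUES ('C001', 'Sara', 'Tesfaye')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The name behaves as a secondary key: it finds records, but not uniquely.
same_name = conn.execute(
    "SELECT id_number FROM customer "
    "WHERE first_name = ? AND last_name = ? ORDER BY id_number",
    ("Abebe", "Kebede"),
).fetchall()
account_count = conn.execute("SELECT COUNT(*) FROM account_holder").fetchone()[0]
```

Searching by name returns both customers, while searching by id_number can return at most one.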

2.2 Data Models, Schemas, and Instances
2.2.1 Data Models
A specific DBMS has its own specific Data Definition Language, but this type of language is too low
level to describe the data requirements of an organization in a way that is readily understandable by a
variety of users. We need a higher-level description. Such a higher-level description is called a data
model. A model is a representation (abstraction) of real-world objects and events and their associations.
The main use of a model is to help you understand the complexities of the real-world environment.
A data model is a set of concepts to describe the structure of a database, and certain constraints
that the database should obey. Most data models also include a set of operations.
• Structure of the database: the data types, relationships, and constraints that should hold for the data
• Data model operations: operations for specifying database retrievals and updates by referring
to the concepts of the data model. Operations on the data model may include basic operations
and user-defined operations.
A data model describes the way that data is organized in a database. It helps to understand the
relationships between entities and to create the most effective structure to hold the data. The
main purpose of a data model is to represent the data in an understandable way.
A data model is a collection of tools or concepts for describing:
 Data and data relationships
 Data semantics
 Data constraints
Therefore, a data model:
 Is useful in understanding the complexities of the real-world environment
 Facilitates interaction among the designer, the applications programmer, and the end user
 Organizes data for various users, since end users have different views of and needs for data
Data modeling is an iterative, progressive process. You start with a simple understanding of the
problem domain, and as your understanding of the problem domain increases, so does the level of detail
of the data model. The final data model is in effect a “blueprint” containing all the instructions to build
a database that will meet all end-user requirements. This blueprint is narrative and graphical in nature,
meaning that it contains both text descriptions in plain, unambiguous language and clear, useful
diagrams depicting the main data elements.


Categories of data models


Many data models have been proposed, which we can categorize according to the types of concepts they
use to describe the database structure. The categories of data models include:
 Conceptual
 Physical
 Implementation(Representational)

Conceptual (high-level, semantic) data models:


 Provide concepts that are close to the way many users perceive data (also called entity-based
or object-based data models).
 Conceptual data models use concepts such as entities, attributes, and relationships.
 Entity: represents a real-world object or concept, such as an employee or a project
 Attribute: represents a property that further describes an entity, e.g. the employee’s name or salary
 Relationship: represents an association among two or more entities.
 The Entity-Relationship model is a popular high-level conceptual model.

Physical (low-level, internal) data models:


 Provide concepts that describe details of how data is stored on the computer storage media,
typically magnetic disks.
 Concepts provided by low-level data models are generally meant for computer specialists, not
for end users.

Implementation (representational) data models:


 Provide concepts that fall between the above two, balancing user views with some computer
storage details.
 Provide concepts that may be easily understood by end users but that are not too far removed
from the way data is organized in computer storage.
 Representational data models hide many details of data storage on disk but can be implemented
on a computer system directly.
 Representational or implementation data models are the models used most frequently in
traditional commercial DBMSs. These include the widely used relational data model, as well as
the so-called legacy data models, the network and hierarchical models.


Types of data models


1) Hierarchical Model
This model was implemented in a joint effort by IBM and North American Rockwell around 1965, which
resulted in the IMS family of systems. Another system based on this model is System 2k (SAS Inc.).
The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child
data segments. This structure implies that a record can have repeating information, generally in the child
data segments. Data is held in a series of records, each of which has a set of field values attached to it.
The model collects all the instances of a specific record together as a record type.
 The simplest data model
 Record type is referred to as node or segment
 Nodes are arranged in a hierarchical structure as sort of upside-down tree
 The top node is the root node
 A parent node can have more than one child node
 A child node can only have one parent node
 The relationship between parent and child is one-to-many
 Relationships are established by creating physical links between stored records (each is stored with
a predefined access path to other records)
 To add a new record type or relationship, the database must be redefined and then stored in a
new form.
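The parent-child structure listed above can be sketched as a small tree. This is a simplified in-memory illustration in Python, not how IMS stores segments; the class name Segment and the sample node names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Segment:
    """A node (segment) in the hierarchy; it has at most one parent."""
    name: str
    parent: Optional["Segment"] = None
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, name: str) -> "Segment":
        # The link to the parent is fixed when the child is created.
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Access path from the root node down to this segment."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


root = Segment("Company")            # the top node is the root
sales = root.add_child("Sales")      # a parent can have many children
finance = root.add_child("Finance")
clerk = sales.add_child("Clerk")     # a child has exactly one parent

print(clerk.path())
```

Reaching the Clerk segment means following the one predefined path from the root, which is why processing in this model is navigational.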


Advantages of Hierarchical Data Model:


 Hierarchical Model is simple to construct and operate on
 Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in
manufacturing, personnel organization in companies
 Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT
WITHIN PARENT etc.

Disadvantages of Hierarchical Data Model:


 Database is visualized as a linear arrangement of records
 Navigational and procedural nature of processing
 Little scope for "query optimization"
 Addition, deletion, and search operations are very difficult.
 There is duplication of data.
 Programs must be written to access the data, and complex programming may be required.

2) Network Model
The network model was first implemented by Honeywell in 1964-65 (IDS system). It was adopted
heavily due to the support of CODASYL (CODASYL DBTG report of 1971) and was later implemented in a
large variety of systems: IDMS (Cullinet, now CA), DMS 1100 (Unisys), IMAGE (H.P.) and VAX-DBMS
(Digital Equipment Corp).
The network model is a database model conceived as a flexible way of representing objects and their
relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are
nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice. The nodes
correspond to record types and the links to pointers or relationships. All the relationships are hardwired
(precomputed) and built into the structure of the database itself, which makes them efficient in space
utilization and query execution time.
The network data structure looks like a tree structure, except that a dependent node, which is called a
child or member, may have more than one parent or owner node. The network model replaces the
hierarchical tree with a graph, thus allowing more general connections among the nodes. The main
difference between the network model and the hierarchical model is its ability to handle many-to-many
relationships. In other words, it allows a record to have more than one parent.

Generally, the network model is characterized by the following concepts:
 Like the hierarchical model, the network model is a collection of physically linked records.
 Unlike the hierarchical model, it allows record types to have more than one parent
 A network data model sees records as set members; each set has an owner and one or more
members
 It allows member records to have more than one owner
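The owner and member concepts above can be sketched as linked records. The sketch assumes, in the CODASYL style, that a member has one owner per set type and gains several owners by taking part in several set types; the class Record, the function connect and the set names WORKS_IN and ASSIGNED_TO are hypothetical.

```python
class Record:
    """A stored record that can own sets and be a member of sets."""

    def __init__(self, record_type, name):
        self.record_type = record_type
        self.name = name
        self.owners = {}    # set name -> owner record
        self.members = {}   # set name -> list of member records


def connect(set_name, owner, member):
    """Link a member into the occurrence of set_name owned by owner."""
    owner.members.setdefault(set_name, []).append(member)
    member.owners[set_name] = owner


dept = Record("DEPARTMENT", "Sales")
project = Record("PROJECT", "Website")
emp = Record("EMPLOYEE", "Sara")

# The same employee record is a member of two sets, so it has two owners.
connect("WORKS_IN", dept, emp)
connect("ASSIGNED_TO", project, emp)

owner_names = sorted(owner.name for owner in emp.owners.values())
print(owner_names)
```

In a hierarchy the employee could hang under only one of the two parents, so the other relationship would need duplicated data.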

Advantages of Network Data Model:


 Conceptual simplicity:- Just like the hierarchical model, the network model is
conceptually simple and easy to design.
 Capability to handle more relationship types:- The network model can handle one-to-many
as well as many-to-many relationships, which is a real help in modeling real-life
situations. It is also able to model complex relationships and represents the semantics of add/delete
on the relationships.
 Can handle most situations for modeling using record types and relationship types.
 Ease of data access:- Data access is easier and more flexible than in the hierarchical model.
Programmers can do optimal navigation through the database.
 Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND
NEXT within set, GET etc.
 Data integrity:- The network model does not allow a member to exist without an owner.
 Duplication of data is reduced as compared to the hierarchical model.
 Data independence:- The network model is better than the hierarchical model in isolating the
programs from the complex physical storage details.
 Easy to show the connections between items; good for network-type problems.

Disadvantages of Network Data Model:
 Navigational and procedural nature of processing
 System complexity:- All the records are maintained using pointers, and the database contains a
complex array of pointers that thread through a set of records. Hence the whole database
structure becomes very complex.
 Operational anomalies:- The insertion, deletion and updating of any record
require a large number of pointer adjustments.
 Absence of structural independence:- Structural changes to the database are very difficult.
 Little scope for automated "query optimization"
 Programming is required.

3) Relational Data Model


Following a famous paper written by Ted Codd in 1970, database systems changed significantly. Codd
proposed that database systems should present the user with a view of data organized as tables called
relations. Behind the scenes, there might be a complex data structure that allowed rapid response to a
variety of queries. But, unlike the user of earlier database systems, the user of a relational system would
not be concerned with the storage structure. Queries could be expressed in a very high-level language,
which greatly increased the efficiency of database programmers.
The first commercial systems appeared in 1981-82. The model is now used in several commercial
products (DB2, ORACLE, SQL Server, SYBASE, INFORMIX).

 Terminology originates from the branch of mathematics called set theory and relations
 Can define more flexible and complex relationships
 Viewed as a collection of tables called “Relations”, equivalent to a collection of record types
 Relation: a two-dimensional table
 Stores information or data in the form of tables, with rows and columns
 A row of the table is called a tuple, equivalent to a record
 A column of a table is called an attribute, equivalent to a field
 A data value is the value of an attribute
 Records are related by the data stored jointly in the fields of records in two tables or files. The
related tables contain the information that creates the relation
 The tables seem to be independent but are related through these shared values.
 No physical consideration of the storage is required by the user
 Many tables are merged together to come up with a new virtual view of the relationship

 The rows represent records (collections of information about separate items)
 The columns represent fields (particular attributes of a record)
 Conducts searches by using data in specified columns of one table to find additional data in
another table
 In conducting searches, a relational database matches information from a field in one table
with information in a corresponding field of another table to produce a third table that
combines requested data from both tables
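The matching of fields across tables described above can be sketched with a join. The example assumes Python's built-in sqlite3 module; the tables student and enrollment and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id INTEGER, course TEXT);
    INSERT INTO student VALUES (1, 'Abebe'), (2, 'Sara');
    INSERT INTO enrollment VALUES (1, 'Database'), (2, 'Database'), (2, 'Networks');
    """
)

# Match student_id in one table with student_id in the other; the result
# is a third table that combines the requested data from both.
rows = conn.execute(
    """
    SELECT s.name, e.course
    FROM student AS s
    JOIN enrollment AS e ON e.student_id = s.student_id
    ORDER BY s.name, e.course
    """
).fetchall()
print(rows)
```

No pointer links the two tables; they are related only by the values stored in the shared student_id column.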


4) Entity Relationship Model


The conceptual simplicity of relational database technology triggered the demand for RDBMSs. In turn,
the rapidly increasing requirements for transactions and information created the need for more complex
database implementation structures, thus creating the need for more effective database design tools.
Complex design activities require conceptual simplicity to yield successful results.
Although the relational model was a vast improvement over the hierarchical and network models, it still
lacked the features that would make it an effective database design tool. Because it is easier to examine
structures graphically than to describe them in text, database designers prefer to use a graphical tool in
which entities and their relationships are pictured. Thus, the entity relationship (ER) model, or ERM, has
become a widely accepted standard for data modeling.
 ER model is the graphical representation of entities and their relationships in a database
structure that quickly became popular because it complemented relational data model concepts.
 The relational data model and ER-model combined to provide the foundation for tightly
structured database design.
 ER models are normally represented in an entity relationship diagram (ERD), which uses
graphical representations to model database components.

2.2.2 Schemas and Instances


In any data model, it is important to distinguish between the description of the database and the
database itself. For instance, when a database is designed using the relational data model, all the data is
represented in the form of tables. Such a database has two basic components: the definition of each
relation (table) and the actual data stored in each table.
The data definition is what we call the schema, or the skeleton of the database; the relations with
the data they hold at some point in time are the instance, or the flesh of the database.
Schemas
 A schema describes how data is to be structured; it is defined at setup/design time (also called
"metadata")
 Since it is fixed during the database development phase, the schema rarely changes unless
system maintenance demands a change to the definition of
a relation.
 Most data models have certain conventions for displaying schemas as diagrams. A displayed
schema is called a schema diagram.

 The following figure shows a schema diagram for the database. The diagram displays the
structure of each record type but not the actual instances of records. We call each object in the
schema (such as STUDENT or COURSE) a schema construct.
 A schema diagram displays only some aspects of a schema, such as the names of record types
and data items, and some types of constraints.
 Other aspects are not specified in the schema diagram; for example, the previous figure shows
neither the data type of each data item nor the relationships among the various files.
 Many types of constraints are not represented in schema diagrams.
 For example, a constraint such as "students majoring in computer science must take CS1310
before the end of their sophomore year" is quite difficult to represent diagrammatically.

DBMS schemas are defined at three levels:


 Internal schema: at the internal level to describe physical storage structures and access paths.
Typically uses a physical data model.
 Conceptual schema: at the conceptual level to describe the structure and constraints for the
whole database for a community of users. Uses a conceptual or an implementation data model.
 External schema: at the external level to describe the various user views. Usually uses the same
data model as the conceptual level.

Instances
 An instance is the collection of data in the database at a particular point in time (snapshot).
o Also called the state, snapshot or extension of the database
o Refers to the actual data in the database at a specific point in time
o The state of the database changes any time we add, delete or update an item.
 Valid state: a state that satisfies the structure and constraints specified in the schema and is
enforced by the DBMS
 Since an instance is the actual data of the database at some point in time, it may change quite
frequently.
 For example, the database shown in the previous figure changes every time we add a new student
or enter a new grade.
When we define a new database, we specify its database schema only to the DBMS. At this point, the
corresponding database state is the empty state with no data. We get the initial state of the database
when the database is first populated or loaded with the initial data.
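The difference between schema and state can be observed directly. The sketch below assumes Python's built-in sqlite3 module; the student table is hypothetical, and PRAGMA table_info is SQLite's own way of reading a table definition, so other DBMSs expose the schema differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Defining the database: only the schema is given to the DBMS.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)"
)


def schema(conn):
    """Column names of the student table (part of the schema)."""
    return [row[1] for row in conn.execute("PRAGMA table_info(student)")]


def state(conn):
    """The current database state: the rows stored right now."""
    return conn.execute("SELECT * FROM student ORDER BY student_id").fetchall()


schema_before = schema(conn)
empty_state = state(conn)        # no data has been loaded yet

conn.execute("INSERT INTO student VALUES (1, 'Abebe', 'CS')")
initial_state = state(conn)      # first populated state

conn.execute("UPDATE student SET major = 'Math' WHERE student_id = 1")
later_state = state(conn)        # every update produces a new state
schema_after = schema(conn)
```

The state passes through three different snapshots while the schema read before and after is identical.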

2.3 Three-Schema Architecture and Data Independence


Three of the four important characteristics of the database approach, listed in Chapter 1, Section 1.2,
subsection 3, are
(1) use of a catalog to store the database description (schema) so as to make it self-describing,
(2) insulation of programs and data (program-data and program-operation independence), and
(3) support of multiple user views.
In this section we specify an architecture for database systems, called the three-schema architecture,
that was proposed to help achieve and visualize these characteristics. Then we discuss the concept of
data independence further.

Three-Level database architecture (ANSI-SPARC Architecture)


The major aim of a database is to provide users with an abstract view of data, hiding certain details of
how data is stored and manipulated. Since a database is a shared resource, each user may require a
different view of the data. To satisfy these needs, the architecture of most commercial DBMSs available
today is based on the so-called ANSI-SPARC architecture (American National Standards Institute,
Standards Planning and Requirements Committee).

The purpose and origin of the Three-Level database architecture
 All users should be able to access the same data. This is important since the database is a
shared resource: all the data is stored in one location, and each user has their own
customized way of interacting with the data.
 A user's view is unaffected by (immune to) changes made in other views. Since the requirements of one
user are independent of the others, a change made in one user’s view should not affect other users.
 Users should not need to know physical database storage details. As there are naïve users of the
system, hardware-level or physical details should be a black box for such users.
 The DBA should be able to change database storage structures without affecting the users' views. A
change in file organization or access method should not affect the structure of the data, which in turn
will have no effect on the users.
 The internal structure of the database should be unaffected by changes to physical aspects of storage.
 The DBA should be able to change the conceptual structure of the database without affecting all users.
In any database system, the DBA has the privilege to change the structure of the database, such as
adding tables, adding and deleting attributes, and changing the specification of the objects in the
database.

Three-level ANSI-SPARC Architecture of a Database


External view: This is the highest level of abstraction, as seen by the user: the users' view of the
database. This level of abstraction describes only the part of the entire database that is relevant to a
particular user. Different users have their own customized views of the database, independent of other
users. Based on the conceptual model, it is the end user's view of the data environment. Each external
view is described by means of a schema called an external schema or subschema.

Conceptual level: At this level of database abstraction all the database entities and the relationships
among them are included. It describes what data is stored in the database and the relationships among
the data. It is the community view of the database. One conceptual view represents the entire database.
The conceptual schema defines this conceptual view.

Internal (physical) level: This is the lowest level of abstraction, the one closest to the physical storage
device. It describes how data are actually stored on the storage medium. The internal schema, which
contains the definition of the stored record, the method of representing the data fields, and the access
aids used, expresses the internal view.
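The three levels can be loosely mapped onto SQL objects. The sketch below is only an analogy, assuming Python's built-in sqlite3 module: a base table stands for the conceptual level, views for external schemas, and an index for an internal-level access aid. All table, view and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conceptual level: one community-wide description of the data.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL
    );
    INSERT INTO employee VALUES (1, 'Abebe', 'Sales', 5000), (2, 'Sara', 'Finance', 7000);

    -- External level: each user group sees only the part relevant to it.
    CREATE VIEW directory_view AS SELECT name, department FROM employee;
    CREATE VIEW payroll_view AS SELECT emp_id, salary FROM employee;

    -- Internal level: an access aid that neither view needs to know about.
    CREATE INDEX idx_employee_department ON employee (department);
    """
)

directory = conn.execute("SELECT * FROM directory_view ORDER BY name").fetchall()
payroll = conn.execute("SELECT * FROM payroll_view ORDER BY emp_id").fetchall()
print(directory)
print(payroll)
```

The directory users never see salaries and the payroll users never see departments, although both views are derived from the same stored table.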

ANSI-SPARC Architecture and Database Design Phases

The following example illustrates the difference between the three levels in the
ANSI-SPARC database architecture, where:
 The first level is concerned with groups of users and their respective data requirements,
each independent of the others.
 The second level describes the whole content of the database, where each piece of information
is represented once.
 The third level

Differences between Three Levels of ANSI-SPARC Architecture

Data Independence
The three-schema architecture can be used to further explain the concept of data independence, which
can be defined as the capacity to change the schema at one level of a database system without having to
change the schema at the next higher level.
Application programs interact with the external database schema, which has an interface, or mapping, to
the conceptual schema. The conceptual schema is concerned with the identity and relationships between
elements of data of interest to an organization, and has an interface or mapping to the internal schema.
The internal schema controls how the data is stored on physical media, such as magnetic disks.

In a database environment, if there is a requirement to change the structure of a particular file of data
held on disk, this will be recorded in the internal schema. The interface between the internal schema and
the conceptual schema will be amended to reflect this, but there will be no need to change the external
schema. This means that any such change of physical data storage is transparent to users and application
programs. This approach removes the problem of physical data dependence.
In a similar manner, any changes to the conceptual schema can be isolated from the external schema
and the internal schema; such changes will be reflected in the interface between the conceptual schema
and the other levels. This achieves logical data independence. What this means, effectively, is that
changes can be made at the conceptual level, where the overall model of an organization's data is
specified, and these changes can be made independently of both the physical storage level and the
external level seen by individual users. The changes are handled by the interfaces between the
conceptual (middle) layer and the physical and external layers.
Mappings among schema levels are needed to transform requests and data. Programs refer to an
external schema, and are mapped by the DBMS to the internal schema for execution. When a schema at
a lower level is changed, only the mappings between this schema and higher-level schemas need to be
changed in a DBMS that fully supports data independence. The higher-level schemas themselves are
unchanged. Hence, the application programs need not be changed since they refer to the external
schemas.
The ability to modify a schema definition at one level without affecting a schema definition at a higher
level is called data independence. We can define two types of data independence:
 Logical data independence
 Physical data independence

Logical Data Independence:


 Refers to immunity of external schemas to changes in conceptual schema.
 Conceptual schema changes e.g. addition/removal of entities should not require changes to
external schema or rewrites of application programs.
 The capacity/ability to change/modify the conceptual schema without having to change the
external schemas and their application programs (i.e. without causing application programs to
be rewritten).
 Usually done when logical structure of database is altered
 Logical data independence is harder to achieve as the application programs are usually heavily
dependent on the logical structure of the data. An analogy is made to abstract data types in
programming languages.
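Logical data independence can be illustrated with a view that stands between the application and the base tables. This is a minimal sketch, assuming Python's built-in sqlite3 module; the table student, the view student_names and the function application_query are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO student VALUES (1, 'Abebe'), (2, 'Sara');
    -- External schema used by the application.
    CREATE VIEW student_names AS SELECT student_id, name FROM student;
    """
)


def application_query(conn):
    # The application refers only to the external schema (the view).
    return conn.execute(
        "SELECT name FROM student_names ORDER BY student_id"
    ).fetchall()


before = application_query(conn)

# Conceptual schema change: a new attribute and a new entity are added.
conn.executescript(
    """
    ALTER TABLE student ADD COLUMN major TEXT;
    CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT);
    """
)

after = application_query(conn)   # the application code was not rewritten
```

Additions like these leave the view untouched. Removing or renaming a column that the view uses would require the view definition (the mapping) to change, which is one reason logical data independence is the harder of the two to achieve.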

Physical Data Independence
 The ability to modify the physical schema without changing the logical schema and without
causing application programs to be rewritten
 Applications depend on the logical schema
 In general, the interfaces between the various levels and components should be well defined so
that changes in some parts do not seriously influence others.
 The capacity to change the internal schema without having to change the conceptual schema
 Modifications at this level are usually to improve performance
 Refers to immunity of conceptual schema to changes in the internal schema
 Internal schema changes e.g. using different file organizations, storage structures/devices
should not require change to conceptual or external schemas.
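Physical data independence can be illustrated by adding an index, a typical internal-level change made to improve performance. The sketch assumes Python's built-in sqlite3 module; the table, the index name and the helper plan are hypothetical, and EXPLAIN QUERY PLAN is SQLite-specific, with output wording that varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Abebe", "CS"), (2, "Sara", "Math"), (3, "Hana", "CS")],
)

# The application's query is written against the logical schema only.
QUERY = "SELECT name FROM student WHERE major = 'CS' ORDER BY name"


def plan(conn):
    """Text of the access strategy SQLite reports for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " ".join(row[3] for row in rows)


result_before, plan_before = conn.execute(QUERY).fetchall(), plan(conn)

# Internal schema change: a new access path is added.
conn.execute("CREATE INDEX idx_student_major ON student (major)")

result_after, plan_after = conn.execute(QUERY).fetchall(), plan(conn)
print(plan_before)
print(plan_after)
```

The query text and its result are the same before and after the change; only the way the DBMS reaches the rows may differ.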

Data Independence and the ANSI-SPARC Three-level Architecture

2.4 Centralized and Client-Server Architectures
Centralized DBMS
Earlier architectures used mainframe computers to provide the main processing for all system functions,
combining everything into a single system, including user application programs and user interface
programs as well as all the DBMS functionality. The reason was that most users accessed such systems
via computer terminals that did not have processing power and only provided display capabilities.
Therefore, all processing was performed remotely on the central computer system, and only display
information and controls were sent from the computer to the display terminals, which were connected to
the central computer via various types of communications networks.

All the DBMS functionality, application program execution, and user interface processing were carried
out on one machine. The figure above illustrates the physical components in a centralized architecture.
Gradually, DBMS systems started to exploit the available processing power at the user side, which led to
client/server DBMS architectures.


Client-Server Architectures:
The client/server architecture was developed to deal with computing environments in which a large
number of PCs, workstations, file servers, printers, database servers, Web servers, e-mail servers, and
other software and equipment are connected via a network. The idea is to define specialized servers with
specific functionalities.
A client/server DBMS environment has several kinds of components. These include specialized servers
with specialized functions, clients and the DBMS server.
Specialized Servers with Specialized functions.
 File servers: maintain the files of the client machines.
 Printer servers: are connected to various printers; all print requests by the clients are
forwarded to this machine.
 Web servers and e-mail servers also fall into the specialized server category

Clients:- The resources provided by specialized servers can be accessed by many client machines.
 The client machines provide appropriate interfaces and a client version of the system to access
and utilize the server resources, as well as local processing power to run local applications.
 Clients may be diskless machines, or PCs or workstations with disks and only the client software
installed. Others would have both client and server functionality.
 Clients are connected to the servers via some form of network (LAN: local area network, wireless
network, etc.)

DBMS Server
 A server is a system containing both hardware and software that can provide services to the
client machines, such as file access, printing, archiving, or database access.
 Provides database query and transaction services to the clients
 Sometimes called query and transaction servers

Two main types of basic DBMS architectures were created on this underlying client/server framework:
 Two-tier client/server architecture
 Three-tier client/server architecture

(a) Two Tier Client-Server Architecture
 User Interface Programs and Application Programs run on the client side
 An interface called ODBC (Open Database Connectivity) provides an application programming
interface (API) that allows client-side programs to call the DBMS. Most DBMS vendors provide
ODBC drivers.
 A client program may connect to several DBMSs.
A different approach to two-tier client/server architecture was taken by some object-oriented DBMSs,
where the software modules of the DBMS were divided between client and server in a more integrated
way. For example, the server level may include the part of the DBMS software responsible for handling
data storage on disk pages, local concurrency control and recovery, buffering and caching of disk
pages, and other such functions. Meanwhile, the client level may handle the user interface; data
dictionary functions; DBMS interactions with programming language compilers; global query
optimization, concurrency control, and recovery across multiple servers; structuring of complex objects
from the data in the buffers; and other such functions.
In this approach, the client/server interaction is more tightly coupled and is done internally by the
DBMS modules, some of which reside on the client and some on the server, rather than by the
users/programmers. The exact division of functionality can vary from system to system. In such a
client/server architecture, the server has been called a data server because it provides data in disk pages
to the client. This data can then be structured into objects for the client programs by the client-side
DBMS software.

The architectures described here are called two-tier architectures because the software components are
distributed over two systems: client and server. The advantages of this architecture are its simplicity and
seamless compatibility with existing systems.


(b) Three Tier Client-Server Architecture


The emergence of the Web changed the roles of clients and servers, leading to the three-tier architecture.
Many Web applications use this architecture, which adds an intermediate
layer between the client and the database server, as illustrated in the figure below:

The intermediate layer is called the Application Server or the Web Server, depending on the application. This
server plays an intermediary role by running application programs and storing the web connectivity software
together with the business rules and logic (constraints) of the application, which govern access to the right
data on the database server.
Clients contain GUI interfaces and some additional application-specific business rules. The intermediate
server accepts requests from the client, processes the request and sends database queries and commands
to the database server, and then acts as a conduit for passing (partially) processed data from the database
server to the clients, where it may be processed further and filtered to be presented to users in GUI
format. Thus, the user interface, application rules, and data access act as the three tiers.
Figure 2.7(b) shows another architecture used by database and other application package vendors. The
presentation layer displays information to the user and allows data entry. The business logic layer
handles intermediate rules and constraints before data is passed up to the user or down to the DBMS.
The bottom layer includes all data management services. The middle layer can also act as a Web server,
which retrieves query results from the database server and formats them into dynamic Web pages that
are viewed by the Web browser at the client side.
It can also improve database security by checking a client’s credentials before forwarding a request to
the database server. In addition, sensitive data can be:
• encrypted at the server before transmission
• decrypted at the client
Other architectures have also been proposed. It is possible to divide the layers between the user and the
stored data further into finer components, thereby giving rise to n-tier architectures, where n may be
four or five tiers. Typically, the business logic layer is divided into multiple layers. Besides distributing
programming and data throughout a network, n-tier applications afford the advantage that any one tier
can run on an appropriate processor or operating system platform and can be handled independently.
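
The three-tier split described above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the database server, and the names `VALID_TOKENS`, `get_balance`, and `render_balance` are hypothetical, not part of any real product.

```python
import sqlite3

# Data tier: an in-memory SQLite database stands in for the database server.
def make_data_tier():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (owner TEXT NOT NULL PRIMARY KEY, balance REAL)")
    conn.execute("INSERT INTO account VALUES ('abebe', 250.0)")
    return conn

# Middle tier: checks the client's credentials and applies business rules
# before any query is forwarded to the data tier.
VALID_TOKENS = {"abebe": "secret-token"}  # hypothetical credential store

def get_balance(conn, user, token):
    if VALID_TOKENS.get(user) != token:
        raise PermissionError("invalid credentials")
    row = conn.execute(
        "SELECT balance FROM account WHERE owner = ?", (user,)
    ).fetchone()
    return row[0] if row else None

# Presentation tier: formats the (already processed) data for the user.
def render_balance(user, balance):
    return f"{user}: {balance:.2f}"

conn = make_data_tier()
page = render_balance("abebe", get_balance(conn, "abebe", "secret-token"))
print(page)
```

Because the presentation code never touches SQL and the data tier never sees an unauthenticated request, each tier could in principle be moved to its own machine without changing the other two.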

2.5 Classification of DBMSs


Several criteria are normally used to classify DBMSs.
Based on the data model used:
• Traditional: Relational, Network, Hierarchical.
• Emerging: Object-oriented, Object-relational.
Based on number of users:
• Single-user: supports only one user at a time and is typically used with microcomputers.
• Multi-user: supports multiple concurrent users. This category includes the majority of DBMSs.

Based on number of sites:
• Centralized: the data is stored at a single computer site. A centralized DBMS can support
multiple users, but the DBMS and the database reside entirely at a single computer site.
• Distributed: can have the actual database and DBMS software distributed over many sites
connected by a computer network. It supports multiple computers and uses multiple databases.
• Distributed database systems have now come to be known as client/server-based database
systems because they do not support a totally distributed environment, but rather a set of
database servers supporting a set of clients.
• Variations of distributed environments:
   • Homogeneous DDBMS: uses the same DBMS software at all the sites
   • Heterogeneous DDBMS: uses different DBMS software at each site
• It is also possible to develop middleware software to access several autonomous preexisting
databases stored under heterogeneous DBMSs. This leads to a federated DBMS (or multidatabase
system), in which the participating DBMSs are loosely coupled and have a degree of
local autonomy.
We can also classify a DBMS on the basis of the types of access path options for storing files. One well-
known family of DBMSs is based on inverted file structures. Finally, a DBMS can be general purpose
or special purpose. When performance is a primary consideration, a special-purpose DBMS can be
designed and built for a specific application; such a system cannot be used for other applications without
major changes. Many airline reservations and telephone directory systems developed in the past are
special-purpose DBMSs. These fall into the category of online transaction processing (OLTP) systems,
which must support a large number of concurrent transactions without imposing excessive delays.


Chapter Three
Relational Data Model
3.1 Properties of Relational Databases
• Each row of a table is uniquely identified by a Primary Key composed of one or more columns.
• Each tuple in a relation must be unique.
• A minimal group of columns that uniquely identifies a row in a table is called a Candidate Key.
• The Entity Integrity rule of the model states that no component of the primary key may contain a
NULL value.
• A column or combination of columns that matches the primary key of another table is called a
Foreign Key. It is used to cross-reference tables.
• The Referential Integrity rule of the model states that, for every foreign key value in a table,
there must be a corresponding primary key value in another table in the database, or the foreign key
value must be NULL.
• All tables are logical entities.
• A table is either a BASE TABLE (Named Relation) or a VIEW (Unnamed Relation).
• Only base tables are physically stored.
• VIEWS are derived from BASE TABLES with SQL instructions like: [SELECT .. FROM
.. WHERE .. ORDER BY]
• In the collection of tables, each entity is represented in one table and attributes are fields
(columns) in the table.
• The order of rows and columns is immaterial.
• Entries with repeating groups are said to be un-normalized.
• Entries are single-valued.
• Each column (field or attribute) has a distinct name.
• All values in a column represent the same attribute and have the same data format.
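
The entity integrity and referential integrity rules listed above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `department` and `student` tables and their columns are made up for illustration. Two SQLite-specific details are assumptions worth noting: foreign keys are only enforced after `PRAGMA foreign_keys = ON`, and a non-integer primary key column needs an explicit `NOT NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
    CREATE TABLE department (
        dept_id TEXT NOT NULL PRIMARY KEY,
        name    TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE student (
        stud_id TEXT NOT NULL PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id TEXT REFERENCES department(dept_id)  -- foreign key, may be NULL
    )""")
conn.execute("INSERT INTO department VALUES ('CS', 'Computer Science')")
conn.execute("INSERT INTO student VALUES ('S1', 'Hana', 'CS')")
conn.execute("INSERT INTO student VALUES ('S2', 'Dawit', NULL)")  # NULL foreign key is allowed

def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

null_pk = rejected("INSERT INTO student VALUES (NULL, 'Sara', 'CS')")  # entity integrity
dup_pk = rejected("INSERT INTO student VALUES ('S1', 'Sara', 'CS')")   # uniqueness of the key
bad_fk = rejected("INSERT INTO student VALUES ('S3', 'Sara', 'EE')")   # referential integrity
print(null_pk, dup_pk, bad_fk)
```

All three invalid inserts should be refused, while the row with a NULL foreign key is accepted, matching the "or it should be NULL" clause of the referential integrity rule.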

3.2 Building Blocks of the Relational Data Model


The building blocks of the relational data model are:
 Entities: real world physical or logical object
 Attributes: properties used to describe each Entity or real world object.
 Relationship: the association between Entities
 Constraints: rules that should be obeyed while manipulating the data.

3.2.1 Entities
Entities are the persons, places, things, etc. that the organization has to deal with. Relations can also
describe relationships. The name given to an entity should always be a singular noun descriptive of each
item to be stored in it. E.g.: student NOT students.
3.2.2 Attributes
Attributes are the items of information that characterize and describe these entities. The analysis
must identify those attributes that are actually relevant to the proposed application. Attributes
give rise to recorded items of data in the database.
At this level we need to know such things as:
 Attribute name (be explanatory words or phrases)
 The domain from which attribute values are taken (A DOMAIN is a set of values from which
attribute values may be taken.) Each attribute has values taken from a domain.
For example, the domain of Name is string and that for salary is real
 Whether the attribute is part of the entity identifier (attributes which just describe an entity and
those which help to identify it uniquely)
 Whether it is permanent or time-varying (which attributes may change their values over time)
 Whether it is required or optional for the entity (whose values will sometimes be unknown or
irrelevant)

Types of Attributes
(1) Simple (atomic) Vs Composite attributes
• Simple : an attribute that cannot be subdivided (contains a single value) E.g. Age, gender
• Composite attribute, is an attribute that can be further subdivided to yield additional attributes.
• For example, the attribute ADDRESS can be subdivided into street_Address, city, state, and
Postal code. Similarly, the attribute PHONE_NUMBER can be subdivided into area code and
exchange number.
• To facilitate detailed queries, it is wise to change composite attributes into a series of simple
attributes.
(2) Single-valued Vs multi-valued attributes
• Single-valued attribute is an attribute that can have only a single value (the value may change,
but there is only one value at any one time). For example, a person can have only one Name, Sex,
Id. No., color_of_eyes, and Social Security number, and a manufactured part can have only one serial
number.
• Keep in mind that a single-valued attribute is not necessarily a simple attribute. For instance, a
part’s serial number, such as SE-08-02-189935, is single-valued, but it is a composite attribute
because it can be subdivided into the region in which the part was produced (SE), the plant
within that region (08), the shift within the plant (02), and the part number (189935).
• Multi-valued attributes are attributes that can have many values. For instance, a person may
have several addresses, dependent names, and college degrees, and a household may have several
different phones, each with its own number. Similarly, a car’s color may be subdivided into
many colors (that is, colors for the roof, body, and trim).

(3) Stored vs. Derived Attribute


• Stored: an attribute whose value is stored directly in the entity and cannot be derived or
computed from other attributes. E.g. Name, Address, Birthdate.
• Storing a value that could be derived saves CPU processing cycles and data access time, makes
the data value readily available, and can be used to keep track of historical data. However, it
requires constant maintenance to ensure that the stored value is current, especially if any
values used in the calculation change.
• Derived: an attribute whose value is calculated (derived) from other attributes,
sometimes referred to as a computed attribute. E.g.
Age (current year – year of birth)
Length of employment (current date – start date)
G.P.A (grade points / credit hours)
• A derived attribute need not be physically stored within the database; instead, it can be derived
by using an algorithm. Not storing it saves storage space, and the computation always yields the
current value. The disadvantages are that it uses CPU
processing cycles, increases data access time, and adds coding complexity to queries.
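
A small sketch of the stored versus derived distinction: only the two dates are stored, while age and length of employment are computed on request. The `Employee` class and `full_years` helper are hypothetical names for illustration. The calculation also checks whether the anniversary has passed, which is slightly more precise than the "current year – year of birth" shorthand above.

```python
from dataclasses import dataclass
from datetime import date

def full_years(start: date, today: date) -> int:
    """Whole years elapsed between start and today."""
    years = today.year - start.year
    # Subtract one year if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (start.month, start.day):
        years -= 1
    return years

@dataclass
class Employee:
    name: str
    birth_date: date   # stored attribute
    start_date: date   # stored attribute

    def age(self, today: date) -> int:                   # derived attribute
        return full_years(self.birth_date, today)

    def length_of_employment(self, today: date) -> int:  # derived attribute
        return full_years(self.start_date, today)

emp = Employee("Hana", date(1990, 6, 15), date(2015, 1, 1))
print(emp.age(date(2024, 6, 14)), emp.length_of_employment(date(2024, 6, 14)))
```

Nothing has to be updated when a birthday passes, which is the "always yields the current value" advantage. The cost is that the computation runs on every access.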

(4) Null Values


• NULL applies to attributes which are not applicable or which do not have values.
• You may enter the value NA (meaning not applicable)
• Value of a key attribute cannot be null.
• Default value - assumed value if no explicit value

Entity versus Attributes


When designing the conceptual specification of the database, one should pay attention to the distinction
between an Entity and an Attribute.
 Consider designing a database of employees for an organization:
 Should address be an attribute of Employees or an entity (connected to Employees by a
relationship)?
 If we have several addresses per employee, address must be an entity (attributes cannot be
set-valued/multi valued)

If the structure (city, Woreda, Kebele, etc) is important, e.g. want to retrieve employees in a given city,
address must be modeled as an entity (attribute values are atomic)

Relationships
Relationships are the associations between entities that exist and must be taken into account when
processing information. In any business process, one object may be associated with another object due to
some event. Such an association is what we call a RELATIONSHIP between entity objects.
• One external event or process may affect several related entities.
• Related entities require setting of LINKS from one part of the database to another.
• A relationship should be named by a word or phrase which explains its function
• Role names are different from the names of entities forming the relationship: one entity may
take on many roles, the same role may be played by different entities
• For each RELATIONSHIP, one can talk about the Number of Entities and the Number of
Tuples participating in the association. These two concepts are called Degree and Cardinality
of a relationship respectively.

Degree of a Relationship
An important point about a relationship is how many entities participate in it. The number of entities
participating in a relationship is called the Degree of the relationship. Among the Degrees of
relationship, the following are the basic:
• Unary/Recursive Relationship: tuples/records of a single entity are related to each other.
• Binary Relationship: tuples/records of two entities are associated in a relationship.
• Ternary Relationship: tuples/records of three different entities are associated.
• N-ary Relationship (a generalized one): tuples from an arbitrary number of entity sets
participate in a relationship.

Cardinality of a Relationship
Another important concept about relationship is the number of instances/tuples that can be associated
with a single instance from one entity in a single relationship. The number of instances participating or
associated with a single instance from an entity in a relationship is called the Cardinality of the
relationship. The major cardinalities of a relationship are:
 ONE-TO-ONE: one tuple is associated with only one other tuple.
E.g. Building – Location as a single building will be located in a single location and as a
single location will only accommodate a single Building.
 ONE-TO-MANY, one tuple can be associated with many other tuples, but not the reverse.
E.g. Department-Student as one department can have multiple students.

 MANY-TO-ONE, many tuples are associated with one tuple but not the reverse.
E.g. Employee – Department: as many employees belong to a single department.
• MANY-TO-MANY: one tuple is associated with many other tuples and, from the other side, with
a different role name, one tuple is associated with many tuples.
E.g. Student – Course, as a student can take many courses and a single course can be
attended by many students.

Relational Constraints/Integrity Rules


1. Relational Integrity
 Domain Integrity: No value of the attribute should be beyond the allowable limits
 Entity Integrity: In a base relation, no attribute of a Primary Key can assume a value of NULL
 Referential Integrity: If a Foreign Key exists in a relation, either the Foreign Key value must
match a Candidate Key value in its home relation or the Foreign Key value must be NULL
 Enterprise Integrity: Additional rules specified by the users or database administrators of a
database are incorporated
2. Key constraints
If tuples need to be unique in the database, then we need to make each tuple distinct. To do this
we need relational keys that uniquely identify each tuple in a relation.
• Super Key: an attribute or set of attributes that uniquely identifies a tuple within a relation.
• Candidate Key: a super key such that no proper subset of it is a super key within
the relation. A candidate key has two properties: Uniqueness and Irreducibility.
If a super key has only one attribute, it is automatically a candidate key. If a candidate key
consists of more than one attribute, it is called a Composite Key.
• Primary Key: the candidate key that is selected to identify tuples uniquely within the relation.
In the worst case, the entire set of attributes in a relation can serve as the primary key.
• Foreign Key: an attribute, or set of attributes, within one relation that matches the candidate key
of some relation. A foreign key is a link between different relations and is used to create the view
or the unnamed relation.
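
The uniqueness and irreducibility properties can be checked mechanically on sample data. The sketch below uses hypothetical helper names (`is_superkey`, `candidate_keys`) and a made-up three-row relation. One caveat: a key is a property of the schema, so a test on one set of rows can only show that an attribute set is not a key; it can never prove that it is one for all future data.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs (uniqueness)."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, all_attrs):
    """Superkeys of which no proper subset is a superkey (irreducibility)."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # Skip any set that contains an already-found smaller key.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

rows = [
    {"id": 1, "email": "a@x.org", "dept": "CS"},
    {"id": 2, "email": "b@x.org", "dept": "CS"},
    {"id": 3, "email": "c@x.org", "dept": "EE"},
]
keys = candidate_keys(rows, ["id", "email", "dept"])
print(keys)
```

Here `("id", "dept")` is a super key but not a candidate key, because dropping `dept` still leaves a unique identifier.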

Relational Views
Relations are perceived as a Table from the users’ perspective. Actually, there are two kinds of relation
in relational database. The two categories or types of Relations are Named and Unnamed Relations. The
basic difference is on how the relation is created, used and updated:
1. Base Relation:- A Named Relation corresponding to an entity in the conceptual schema, whose
tuples are physically stored in the database.

2. View (Unnamed Relation):- A View is the dynamic result of one or more relational operations
operating on the base relations to produce another virtual relation that does not actually exist as
presented. So a view is a virtual, derived relation that does not necessarily exist in the database but
can be produced upon request by a particular user at the time of request. The virtual table or relation
can be created from single or different relations by extracting some attributes and records with or
without conditions.
Purpose of a view
 Hides unnecessary information from users: since only part of the base relation (Some collection
of attributes, not necessarily all) are to be included in the virtual table.
 Provide powerful flexibility and security: since unnecessary information will be hidden from the
user there will be some sort of data security.
• Provides a customized view of the database for users: each user is presented with
their own preferred data set and format by making use of views.
• A view of one base relation can be updated.
• Updates on views derived from several relations are not allowed since they may violate the integrity
of the database.
• Updates on views with aggregation and summary are not allowed, since aggregation and summary
results are computed from a base relation and do not actually exist.
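
The idea of a view as a virtual relation computed on request can be illustrated with `sqlite3`. The `employee` table, the view names, and the data are assumptions made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Hana", "CS", 9000.0), (2, "Dawit", "CS", 7000.0), (3, "Sara", "EE", 8000.0)],
)

# A view over one base relation that hides the salary column.
conn.execute(
    "CREATE VIEW cs_staff AS SELECT emp_id, name FROM employee WHERE dept = 'CS'"
)
# A summary view: its rows exist only as the result of a computation.
conn.execute(
    "CREATE VIEW dept_avg AS "
    "SELECT dept, AVG(salary) AS avg_salary FROM employee GROUP BY dept"
)

cs_before = conn.execute("SELECT name FROM cs_staff ORDER BY emp_id").fetchall()
conn.execute("INSERT INTO employee VALUES (4, 'Meron', 'CS', 6000.0)")
cs_after = conn.execute("SELECT name FROM cs_staff ORDER BY emp_id").fetchall()
averages = dict(conn.execute("SELECT dept, avg_salary FROM dept_avg").fetchall())
print(cs_before, cs_after, averages["EE"])
```

The new employee shows up in `cs_staff` without the view being touched, because the view is re-evaluated against the base table on every query. Note that SQLite is stricter than the rule in the text: it treats all views as read-only, including views over a single base table.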


Chapter Four
Data Modeling Using the Entity-Relationship (ER) Model
4.1 Database Design
Database design is the process of producing the different kinds of specification for the data to be stored
in the database. It is one of the middle phases of information systems
development where the system uses a database approach. In the design phase we
describe how the data should be perceived at different levels and, finally, how it is going to be
stored in a computer system.
The ability to design databases and associated applications is critical to the success of the modern
enterprise. Database design requires understanding both the operational and business requirements of an
organization as well as the ability to model and realize those requirements using a database.
Developing database and information systems is performed using a development lifecycle, which
consists of a series of steps. As it is one component in most information system development tasks, there
are several steps to follow in designing a database system.
Information System with Database application consists of several tasks which include:
 Planning of Information systems Design
 Requirements Analysis,
 Design (Conceptual, Logical and Physical Design)
 Tuning
 Implementation
 Operation and Support
Requirements gathering and specification provide you with a high-level understanding of the
organization, its data, and the processes that you must model in the database. Database design involves
constructing a suitable model of this information. Since the design process is complicated, especially for
large databases, database design is mainly focused on these three phases:
1. Conceptual Design
2. Logical Design, and
3. Physical Design
In general, one has to go back and forth between these tasks to refine a database design, and decisions in
one task can influence the choices in another task.

The Three levels of Database Design

Conceptual Database Design


Conceptual design is the process of constructing a model of the information used in an enterprise,
independent of any physical considerations.
• It is used as input or a source of information for the logical design phase.
• It mostly uses an Entity Relationship Model to describe the data at this level.
• It is a phase which is independent of all physical considerations (DBMS, OS, . . . ).
Conceptual design revolves around discovering and analyzing organizational and user data
requirements. The important activities are to identify
 Entities
 Attributes
 Relationships
 Constraints
And based on these components develop the ER model using
 ER diagrams
After the completion of conceptual design, one has to refine the schema, that is,
verify the Entities, Attributes, and Relationships.
In developing a good design, one should answer such questions as:
• What are the relevant Entities for the Organization?
• What are the important features of each Entity?
• What are the important Relationships?
• What are the important queries from the user?
• What are the constraints (business rules) that must hold for entities and relationships?
• What are the other requirements of the Organization and the Users?

Reasons for conceptual modeling
 Helps users and system developers to identify data requirements (abstract model)
 Helps in understanding how existing systems can be modified/maintained
 Allows for easy communication between end-users and developers.
• Independent of any DBMS or OS.
 Has a clear method to convert from high-level model to relational model.
 It is a permanent description of the database requirements.

Logical Database Design


Logical design is the process of constructing a model of the information used in an enterprise based on a
specific data model (e.g. relational, hierarchical or network or object), but independent of a particular
DBMS and other physical considerations.
 Collection of Rules to be maintained
 Discover new entities in the process
 Revise attributes based on the rules and the discovered Entities

Physical Database Design


Physical design is the process of producing a description of the implementation of the database on
secondary storage. It defines the specific storage structures and access methods used by the database.
• Describes the storage structures and access methods used to achieve efficient access to data.
• Tailored to a specific DBMS; its characteristics are a function of the DBMS and the operating
system.
• Includes an estimate of storage space.

4.2 The Entity Relationship (E-R) Model


ER model is the graphical representation of entities and their relationships in a database structure that
quickly became popular because it complemented relational data model concepts. Entity-Relationship
modeling is used to represent conceptual view of database. The main components of ER Modeling are:
(a) Entities
 An entity is defined as anything about which data are to be collected and stored.
 Represented in the ERD by a rectangle, also known as an entity box.
 The name of the entity, a noun, is written in the center of the rectangle.
 The entity name is generally written in capital letters and is written in the singular form:
PAINTER rather than PAINTERS, and EMPLOYEE rather than EMPLOYEES.

• Usually, when applying the ERD to the relational model, an entity is mapped to a relational
table; that is, it corresponds to the entire table, not to a row.
 Each row in the relational table is known as an entity instance or entity occurrence in the ER
model.
 Each entity is described by a set of attributes that describes particular characteristics of the entity.
 For example, the entity EMPLOYEE will have attributes such as a Social Security number, a last
name, and a first name.
Examples of Entities
Persons: agency, contractor, customer, department, division, employee, instructor, student,
supplier.
Places: sales region, building, room, branch office, campus.
Objects: book, machine, part, product, raw material, software license, software package, tool,
vehicle model, vehicle.
Events: application, award, cancellation, class, flight, invoice, order, registration, renewal,
requisition, reservation, sale, trip.

(b) Attributes
 Are properties used to describe each Entity or real world object.
 Are used to store pieces of information about entities.
 Attributes will give rise to recorded items of data in the database
 For example, the STUDENT entity includes, among many others, the attributes STU_LNAME,
STU_FNAME, and STU_INITIAL.
 In the original Chen notation, attributes are represented by ovals and are connected to the entity
rectangle with a line.

(c) Relationships
 Relationships describe associations among data (exist between entities).
 Most relationships describe associations between two entities.
 Relationship (relationship type) is a meaningful association among entity types.
 Generally, a relationship is represented as a connection between (or among) entities.
• In the standard ER model, a diamond shape is used to connect the related entities.

 The relationship name is an active or passive verb; for example, a STUDENT takes a CLASS,
a PROFESSOR teaches a CLASS, a DEPARTMENT employs a PROFESSOR, a
DIVISION is managed by an EMPLOYEE.

• There are several types of relationships based on degree, cardinality, and participation.

 The entities that participate in a relationship are also known as participants, and each
relationship is identified by a name that describes the relationship.

When the basic data model components were introduced, three types of relationships among data were
illustrated:
 One-to-Many (1:M)
 Many-to-Many (M:N), and
 One-to-One (1:1)
The ER model uses the term connectivity to label the relationship types.
 The name of the relationship is usually an active or passive verb.
 For example, a PAINTER paints many PAINTINGs; an EMPLOYEE learns many SKILLs;
an EMPLOYEE manages a STORE.

(d) Constraints:- Represent the constraint in the data

Before working on the conceptual design of the database, one has to know and answer the following
basic questions.
• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should we store in the database?
• What are the integrity constraints that hold? These are constraints on each data item with respect to
update, retrieval, and storage.
• Represent this information pictorially in ER diagrams, then map ER diagram into a relational
schema.

4.3 Developing an E-R Diagram


Designing a conceptual model for the database is not a linear process but an iterative activity in which
the design is refined again and again.
To identify the entities, attributes, relationships, and constraints on the data, different
methods are used during the analysis phase. These include information gathered by:
 Interviewing end users individually and in a group
 Questionnaire survey
 Direct observation
 Examining different documents

The basic E-R model is graphically depicted and presented for review. The process is repeated until the
end users and designers agree that the E-R diagram is a fair representation of the organization’s
activities and functions.
Check for redundant relationships in the ER diagram. Relationships between entities indicate
access from one entity to another. It is therefore possible to access one entity occurrence from another
entity occurrence even if there are other entities and relationships that separate them. This is often
referred to as 'navigation' of the ER diagram.
The last phase in ER modeling is validating the ER model against the requirements of the user.

Graphical Representations in ER Diagramming


• Entity is represented by a RECTANGLE containing the name of the entity.
• Connected entities are called relationship participants.
• Attributes are represented by OVALS and are connected to the entity by a line.
• A derived attribute is indicated by a DOTTED LINE.
• Primary keys are underlined.
• Partial keys are underlined with a dotted line.

• Relationships are represented by DIAMOND shaped symbols
 Weak Relationship is a relationship between Weak and Strong Entities
 Strong Relationship is a relationship between two strong Entities

Example 1: Build an ER Diagram for the following information:
A student record management system will have the following two basic data object categories with their
own features or properties: Student will have an Id, Name, Dept, Age, GPA, and Course will have an Id,
Name, Credit Hours. Whenever a student enrolls in a course in a specific Academic Year and Semester,
the student will have a grade for the course.
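
One possible way to carry Example 1 through to tables is sketched below with `sqlite3`. It is an assumption-laden illustration, not the official solution: the table and column names are invented, and the enrollment relationship is given its own table because Grade, Academic Year, and Semester describe the relationship rather than either entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE student (
        stud_id TEXT NOT NULL PRIMARY KEY,
        name TEXT, dept TEXT, age INTEGER, gpa REAL
    )""")
conn.execute("""
    CREATE TABLE course (
        course_id TEXT NOT NULL PRIMARY KEY,
        name TEXT, credit_hours INTEGER
    )""")
# The Enrolls relationship with its own attributes.
conn.execute("""
    CREATE TABLE enrollment (
        stud_id   TEXT NOT NULL REFERENCES student(stud_id),
        course_id TEXT NOT NULL REFERENCES course(course_id),
        acad_year INTEGER NOT NULL,
        semester  INTEGER NOT NULL,
        grade     TEXT,
        PRIMARY KEY (stud_id, course_id, acad_year, semester)
    )""")

conn.execute("INSERT INTO student VALUES ('S1', 'Hana', 'CS', 20, 3.6)")
conn.execute("INSERT INTO course VALUES ('DB101', 'Database Systems', 5)")
conn.execute("INSERT INTO course VALUES ('PR102', 'Programming', 4)")
conn.execute("INSERT INTO enrollment VALUES ('S1', 'DB101', 2024, 1, 'A')")
conn.execute("INSERT INTO enrollment VALUES ('S1', 'PR102', 2024, 1, 'B+')")

transcript = conn.execute("""
    SELECT c.name, e.grade
    FROM enrollment e JOIN course c ON c.course_id = e.course_id
    WHERE e.stud_id = 'S1'
    ORDER BY c.course_id
""").fetchall()
print(transcript)
```

Including the year and semester in the primary key lets a student retake the same course in a later semester without overwriting the earlier grade.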

Example 2: Build an ER Diagram for the following information:


A Personnel record management system will have the following two basic data object categories with
their own features or properties: Employee will have an Id, Name, DoB, Age, Tel and Department will
have an Id, Name, Location. Whenever an Employee is assigned in one Department, the duration of his
stay in the respective department should be registered.

Example 3: Build an ER Diagram for the following information:


A company database needs to store information about employees (identified by ssn, with salary and
phone as attributes); departments (identified by dno, with dname and budget as attributes); and children
of employees (with name and age as attributes). Employees work in departments; each department is
managed by an employee; a child must be identified uniquely by name when the parent (who is an
employee; assume that only one parent works for the company) is known. We are not interested in
information about a child once the parent leaves the company.
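
The child in Example 3 is a weak entity: its name is only unique together with the parent's ssn, and it should disappear when the parent leaves. A minimal `sqlite3` sketch of just that part follows (departments are left out for brevity, and all names and data are made up).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE to take effect
conn.execute("""
    CREATE TABLE employee (
        ssn TEXT NOT NULL PRIMARY KEY,
        salary REAL,
        phone TEXT
    )""")
# Weak entity: the key combines the owner's key with the partial key (name).
conn.execute("""
    CREATE TABLE child (
        parent_ssn TEXT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE,
        name       TEXT NOT NULL,
        age        INTEGER,
        PRIMARY KEY (parent_ssn, name)
    )""")

conn.execute("INSERT INTO employee VALUES ('E1', 9000.0, '0911')")
conn.execute("INSERT INTO employee VALUES ('E2', 8000.0, '0922')")
conn.execute("INSERT INTO child VALUES ('E1', 'Abel', 5)")
conn.execute("INSERT INTO child VALUES ('E2', 'Abel', 7)")  # same name, different parent

children_before = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
conn.execute("DELETE FROM employee WHERE ssn = 'E1'")  # the parent leaves the company
children_after = conn.execute(
    "SELECT parent_ssn, name FROM child"
).fetchall()
print(children_before, children_after)
```

The cascade expresses the requirement that the database keeps no information about a child once the parent is gone.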

4.4 Structural Constraints on Relationship
Relationship types usually have certain constraints that limit the possible combinations of entities that
may participate in the corresponding relationship set. These constraints are determined from the
miniworld situation that the relationships represent. For example, if a company has a rule that each
employee must work for exactly one department, then we would like to describe this constraint in the
schema. We can distinguish two main types of relationship constraints: cardinality ratio and
participation.

(a) Cardinality Ratio (Multiplicity) Constraints


Multiplicity constraint is the number or range of possible occurrences of an entity type/relation that may
relate to a single occurrence/tuple of an entity type/relation through a particular relationship. In
general, it specifies the maximum number of relationship instances that an entity can participate in.
For example, in the WORKS_FOR binary relationship type, DEPARTMENT : EMPLOYEE is of
cardinality ratio 1:N, meaning that each department can be related to (that is, employs) any number of
employees, but an employee can be related to (work for) only one department. This means that for this
particular relationship WORKS_FOR, a particular department entity can be related to any number of
employees (N indicates there is no maximum number). On the other hand, an employee can be related to
a maximum of one department.
The possible cardinality ratios for binary relationship types are 1:1, 1:N, N:1, and M:N. The cardinality
ratio is mostly used to ensure that the appropriate enterprise constraints are enforced.
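
To make the four ratios concrete, the following sketch classifies a set of relationship instances given as (a, b) pairs. The function name `cardinality_ratio` and the sample data are hypothetical. It only describes what the observed instances look like; the real cardinality ratio is a rule of the schema and may be stricter or looser than any one sample suggests.

```python
from collections import defaultdict

def cardinality_ratio(pairs):
    """Classify observed (a, b) relationship instances as 1:1, 1:N, N:1 or M:N."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "M:N"
    if a_has_many:
        return "1:N"
    if b_has_many:
        return "N:1"
    return "1:1"

# WORKS_FOR instances written as (department, employee) pairs.
works_for = [("CS", "Hana"), ("CS", "Dawit"), ("EE", "Sara")]
ratio = cardinality_ratio(works_for)
print(ratio)
```

Reading the pairs in the opposite direction, as (employee, department), gives N:1 for the same data, which is why the order DEPARTMENT : EMPLOYEE matters when a ratio is stated.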
One-to-one relationship (1:1)
• A customer is associated with at most one loan via the relationship borrower
• A loan is associated with at most one customer via borrower

E.g.: Relationship Manages between STAFF and BRANCH


The multiplicity of the relationship is:
• One branch can have only one manager
• One employee can manage either one branch or none

One-To-Many Relationships
• An entity on one side of the relationship can have many related entities, but an entity on the other
side will have a maximum of one related entity
• In the one-to-many relationship, a loan is associated with at most one customer via borrower, and a
customer is associated with several (including 0) loans via borrower.

E.g.: Relationship Leads between STAFF and PROJECT


The multiplicity of the relationship:
• One staff member may lead one or more project(s)
• One project is led by one staff member

Many-To-Many Relationship (Sometimes called non-specific)


A many-to-many relationship exists when, for one instance of entity A, there are zero, one, or many
instances of entity B, and for one instance of entity B there are zero, one, or many instances of entity A.
An example is: employees can be assigned to no more than two projects at the same time; projects must
have at least three employees assigned. A single employee can be assigned to many projects; conversely,
a single project can have many employees assigned to it.
Here the maximum cardinality from employee to project is two, and the minimum cardinality from
project to employee is three.
Many-to-many relationships cannot be directly translated to relational tables but instead must be
transformed into two or more one-to-many relationships using associative entities.
• A customer is associated with several (possibly 0) loans via borrower
• A loan is associated with several (possibly 0) customers via borrower

E.g.: Relationship Teaches between INSTRUCTOR and COURSE
The multiplicity of the relationship:
• One instructor teaches one or more course(s)
• One course is taught by zero or more instructor(s)

(b) Participation of an Entity Set in a Relationship Set


Recall that relationships are bidirectional; that is, they operate in both directions. For instance, if
COURSE is related to CLASS, then by definition, CLASS is related to COURSE. Because of the
bidirectional nature of relationships, it is necessary to determine the connectivity of the relationship
from COURSE to CLASS and the connectivity of the relationship from CLASS to COURSE. Similarly,
the specific maximum and minimum cardinalities must be determined in each direction for the
relationship. Once again, you must consider the bidirectional nature of the relationship when
determining participation.
The participation constraint specifies whether the existence of an entity depends on its being related to
another entity via the relationship type. This constraint specifies the minimum number of relationship
instances that each entity can participate in, and is sometimes called the minimum cardinality
constraint.
The participation constraint of a relationship identifies and sets whether it is mandatory or optional for
an entity occurrence to take a role in the relationship. There are two distinct participation constraints
in this respect, namely Total Participation and Partial Participation.

Total participation:
Every tuple in the entity or relation participates in at least one relationship instance by taking a role.
This means that every tuple in the relation is attached to at least one other tuple. An entity with total
participation in a relationship is connected to the relationship using a double line. The existence of a
mandatory relationship indicates that the minimum cardinality is at least 1 for the mandatory entity.
For example, suppose that Tiny College employs some professors who conduct research without
teaching classes.
If you examine the “PROFESSOR teaches CLASS” relationship, it is quite possible for a
PROFESSOR not to teach a CLASS. Therefore, CLASS is optional to PROFESSOR. On the
other hand, a CLASS must be taught by a PROFESSOR. Therefore, PROFESSOR is mandatory
to CLASS.

Partial participation:
Some tuples in the entity or relation may not participate in the relationship. This means there may be
one or more tuples of that relation that take no role in that specific relationship. An entity with partial
participation in a relationship is connected to the relationship using a single line.
For example, in the “COURSE generates CLASS” relationship, you noted that at least some
courses do not generate a class. In other words, an entity occurrence in the COURSE table does
not necessarily require the existence of a corresponding entity occurrence in the CLASS table.
(Remember that each entity is implemented as a table.) Therefore, the CLASS entity is
considered to be optional to the COURSE entity. The existence of an optional entity indicates
that the minimum cardinality is 0 for the optional entity. (The term optionality is used to label
any condition in which one or more optional relationships exist.)
E.g. 1: Participation of EMPLOYEE in the “belongs to” relationship with DEPARTMENT is total, since
every employee should belong to a department.
Participation of DEPARTMENT in the “belongs to” relationship with EMPLOYEE is total, since every
department should have at least one employee.

E.g. 2: Participation of EMPLOYEE in the “manages” relationship with DEPARTMENT is partial,
since not all employees are managers.
Participation of DEPARTMENT in the “manages” relationship with EMPLOYEE is total, since every
department should have a manager.
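
The “manages” example can be checked mechanically. The sketch below is a minimal illustration, assuming a relationship is given as a list of (employee, department) pairs; the function name participation and the sample names are hypothetical.

```python
def participation(entities, pairs, position):
    """Return 'total' if every entity appears in some pair at `position`, else 'partial'."""
    participating = {pair[position] for pair in pairs}
    return "total" if set(entities) <= participating else "partial"


# Illustrative data: three employees, two departments, one manager per department.
employees = {"Abebe", "Lemma", "Chane"}
departments = {"Research", "Admin"}
manages = [("Abebe", "Research"), ("Lemma", "Admin")]

print(participation(employees, manages, 0))    # partial: Chane manages nothing
print(participation(departments, manages, 1))  # total: every department has a manager
```

As with cardinality ratios, the data can only confirm or violate a participation constraint; the constraint itself comes from the business rule.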

4.5 Problem in ER Modeling


The Entity-Relationship Model is a conceptual data model that views the real world as consisting of
entities and relationships. The model visually represents these concepts by the Entity-Relationship
diagram. The basic constructs of the ER model are entities, relationships, and attributes. Entities are
concepts, real or abstract, about which information is collected. Relationships are associations between
the entities. Attributes are properties which describe the entities.
While designing the ER model one could face design problems called connection traps. Connection
traps are problems arising from misinterpreting certain relationships.
There are two types of connection traps: fan traps and chasm traps.

1. Fan trap:
A fan trap occurs where a model represents a relationship between entity types, but the pathway between
certain entity occurrences is ambiguous.
It may exist where two or more one-to-many (1:M) relationships fan out from an entity. The problem
can be avoided by restructuring the model so that two or more 1:M relationships no longer fan out
from a single entity while all the semantics of the relationships are preserved.
Example:

Semantics description of the problem;

Problem: Which car (Car1, Car3 or Car5) is used by Employee 6 (Emp6) working in Branch 1 (Bra1)?
From this ER model one cannot tell which car is used by which staff member, since a branch can have
more than one car and a branch is also staffed by more than one employee.
To avoid the fan trap we need to restructure the E-R model. This will result in the following E-R
model.

Semantics description of the problem;
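
The ambiguity of the fan trap can be reproduced with a few dictionaries. This is only a sketch: the occurrence names follow the problem statement above, but the assignment of cars to employees and the shape of the restructured model (BRANCH has EMPLOYEE, EMPLOYEE uses CAR) are assumptions made for illustration.

```python
# Fan-trap model: two 1:N relationships fan out of BRANCH (to EMPLOYEE and to CAR).
branch_employees = {"Bra1": ["Emp1", "Emp6"], "Bra2": ["Emp3"]}
branch_cars = {"Bra1": ["Car1", "Car3", "Car5"], "Bra2": ["Car2"]}


def cars_via_branch(employee):
    """Follow EMPLOYEE -> BRANCH -> CAR; every car of the branch is a candidate."""
    for branch, staff in branch_employees.items():
        if employee in staff:
            return branch_cars.get(branch, [])
    return []


# Assumed restructuring: BRANCH 1:N EMPLOYEE and EMPLOYEE 1:N CAR (invented assignments).
employee_cars = {"Emp1": ["Car1", "Car5"], "Emp6": ["Car3"], "Emp3": ["Car2"]}


def cars_of(employee):
    """Follow EMPLOYEE -> CAR directly; the answer is no longer ambiguous."""
    return employee_cars.get(employee, [])


print(cars_via_branch("Emp6"))  # ['Car1', 'Car3', 'Car5']
print(cars_of("Emp6"))          # ['Car3']
```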

2. Chasm Trap:
A chasm trap occurs where a model suggests the existence of a relationship between entity types, but the
pathway does not exist between certain entity occurrences.
It may exist when there are one or more relationships with a minimum multiplicity (cardinality) of zero
forming part of the pathway between related entities.
Example:

If we have a set of projects that are not currently active, then we cannot assign a project manager to
these projects. So there are projects with no project manager, which makes the minimum participation
zero.
Problem:
How can we identify which BRANCH is responsible for which PROJECT? We know that, whether the
PROJECT is active or not, there is a responsible BRANCH. But which branch is the question to be
answered, and since we have a minimum participation of zero between EMPLOYEE and PROJECT, we
can’t identify the BRANCH responsible for each PROJECT.
The solution to this chasm trap problem is to add another relationship between the extreme entities
(BRANCH and PROJECT).
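
A minimal sketch of the chasm trap and its fix, assuming the pathway BRANCH has EMPLOYEE, EMPLOYEE manages PROJECT. All occurrence names and the helper functions are hypothetical.

```python
# Pathway with a chasm: PROJECT -> (manager) EMPLOYEE -> BRANCH.
employee_branch = {"Emp1": "Bra1", "Emp2": "Bra2"}
project_manager = {"Proj1": "Emp1", "Proj2": None}  # Proj2 is inactive and has no manager


def branch_via_manager(project):
    """Find the responsible branch through the project's manager, if there is one."""
    manager = project_manager.get(project)
    if manager is None:
        return None  # the pathway is broken for this occurrence
    return employee_branch[manager]


# Fix: a direct relationship between the extreme entities BRANCH and PROJECT.
project_branch = {"Proj1": "Bra1", "Proj2": "Bra2"}


def branch_of(project):
    """Find the responsible branch through the direct relationship."""
    return project_branch[project]


print(branch_via_manager("Proj2"))  # None
print(branch_of("Proj2"))           # Bra2
```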

Example:
The company is organized into departments. Each department has a unique name, a unique number, and
a particular employee who manages the department. We keep track of the start date when that employee
began managing the department. A department may have several locations. A department controls a
number of projects, each of which has a unique name, a unique number, and a single location.
We store each employee’s name, Social Security number, address, salary, sex (gender), and birth date.
An employee is assigned to one department, but may work on several projects, which are not necessarily
controlled by the same department. We keep track of the current number of hours per week that an
employee works on each project. We also keep track of the direct supervisor of each employee (who is
another employee). We want to keep track of the dependents of each employee for insurance purposes.
We keep each dependent’s first name, sex, birth date, and relationship to the employee.

We can now define the entity types for the COMPANY database, based on the requirements described
above.
1. An entity type DEPARTMENT with attributes Name, Number, Locations, Manager, and
Manager_start_date. Locations is the only multivalued attribute. We can specify that both Name
and Number are (separate) key attributes because each was specified to be unique.
2. An entity type PROJECT with attributes Name, Number, Location, and Controlling_department.
Both Name and Number are (separate) key attributes.
3. An entity type EMPLOYEE with attributes Name, Ssn, Sex, Address, Salary, Birth_date,
Department, and Supervisor. Both Name and Address may be composite attributes; however, this
was not specified in the requirements. We must go back to the users to see if any of them will refer
to the individual components of Name (First_name, Middle_initial, Last_name) or of Address.
4. An entity type DEPENDENT with attributes Employee, Dependent_name, Sex, Birth_date, and
Relationship (to the employee).
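
The four entity types can be written down as plain record types. The sketch below uses Python dataclasses; the attribute names follow the list above, while the Python types and the choice to refer to other entities by their key values (Ssn, department Number) are assumptions, since the requirements do not specify them.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Department:
    name: str                                           # key attribute
    number: int                                         # key attribute
    locations: List[str] = field(default_factory=list)  # multivalued attribute
    manager: Optional[str] = None                       # Ssn of the managing employee
    manager_start_date: Optional[date] = None


@dataclass
class Project:
    name: str                    # key attribute
    number: int                  # key attribute
    location: str
    controlling_department: int  # Number of the controlling department


@dataclass
class Employee:
    name: str
    ssn: str
    sex: str
    address: str
    salary: float
    birth_date: date
    department: int                   # Number of the employee's department
    supervisor: Optional[str] = None  # Ssn of the supervising employee


@dataclass
class Dependent:
    employee: str  # Ssn of the employee the dependent belongs to
    dependent_name: str
    sex: str
    birth_date: date
    relationship: str


research = Department(name="Research", number=5, locations=["Addis Ababa", "Adama"])
print(research.locations)  # ['Addis Ababa', 'Adama']
```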

So far, we have not represented the fact that an employee can work on several projects, nor have we
represented the number of hours per week an employee works on each project. This characteristic is
listed as part of the third requirement and it can be represented by a multivalued composite attribute of
EMPLOYEE called Works_on with the simple components (Project, Hours). Alternatively, it can be
represented as a multivalued composite attribute of PROJECT called Workers with the simple
components (Employee, Hours). The Name attribute of EMPLOYEE is shown as a composite attribute,
presumably after consultation with the users.

Exercises
1. Consider the following set of requirements for a UNIVERSITY database that is used to keep track of
students’ transcripts.
a) The university keeps track of each student’s name, student number, Social Security number,
current address and phone number, permanent address and phone number, birth date, sex,
class (freshman, sophomore, ..., graduate), major department, minor department (if any), and
degree program (B.A., B.S., ..., Ph.D.). Some user applications need to refer to the city, state,
and ZIP Code of the student’s permanent address and to the student’s last name. Both Social
Security number and student number have unique values for each student.
b) Each department is described by a name, department code, office number, office phone
number, and college. Both name and code have unique values for each department.
c) Each course has a course name, description, course number, number of semester hours, level,
and offering department. The value of the course number is unique for each course.

d) Each section has an instructor, semester, year, course, and section number. The section
number distinguishes sections of the same course that are taught during the same
semester/year; its values are 1, 2, 3, ..., up to the number of sections taught during each
semester.
e) A grade report has a student, section, letter grade, and numeric grade (0,1, 2, 3, or 4).
Design an ER schema for this application, and draw an ER diagram for the schema. Specify key
attributes of each entity type, and structural constraints on each relationship type. Note any unspecified
requirements, and make appropriate assumptions to make the specification complete.

2. Design an ER schema for keeping track of information about votes taken in the U.S. House of
Representatives during the current two-year congressional session. The database needs to keep track
of each U.S. STATE’s Name (e.g., ‘Texas’, ‘New York’, ‘California’) and include the Region of
the state (whose domain is {‘Northeast’, ‘Midwest’, ‘Southeast’, ‘Southwest’, ‘West’}). Each
CONGRESS_PERSON in the House of Representatives is described by his or her Name, plus the
District represented, the Start_date when the congressperson was first elected, and the political
Party to which he or she belongs (whose domain is {‘Republican’, ‘Democrat’, ‘Independent’,
‘Other’}). The database keeps track of each BILL (i.e., proposed law), including the Bill_name, the
Date_of_vote on the bill, whether the bill Passed_or_failed (whose domain is {‘Yes’, ‘No’}), and the
Sponsor (the congressperson(s) who sponsored, that is, proposed, the bill). The database also
keeps track of how each congressperson voted on each bill (domain of Vote attribute is {‘Yes’, ‘No’,
‘Abstain’, ‘Absent’}). Draw an ER schema diagram for this application. State clearly any
assumptions you make.
3. A database is being constructed to keep track of the teams and games of a sports league. A team has
a number of players, not all of whom participate in each game. It is desired to keep track of the
players participating in each game for each team, the positions they played in that game, and the
result of the game. Design an ER schema diagram for this application, stating any assumptions you
make. Choose your favorite sport (e.g., soccer, baseball, football).
4. Consider an entity type SECTION in a UNIVERSITY database, which describes the section
offerings of courses. The attributes of SECTION are Section_number, Semester, Year,
Course_number, Instructor, Room_no (where section is taught), Building (where section is taught),
Weekdays (domain is the possible combinations of weekdays in which a section can be offered
{‘MWF’, ‘MW’, ‘TT’, and so on}), and Hours (domain is all possible time periods during which
sections are offered {‘9–9:50 A.M.’, ‘10–10:50 A.M.’, ..., ‘3:30–4:50 P.M.’, ‘5:30–6:20 P.M.’,
and so on}). Assume that Section_number is unique for each course within a particular
semester/year combination (that is, if a course is offered multiple times during a particular semester,
its section offerings are numbered 1, 2, 3, and so on). There are several composite keys for SECTION,
and some attributes are components of more than one key. Identify three composite keys, and show
how they can be represented in an ER schema diagram.

4.6 Enhanced E-R (EER) Models
The EER model includes all the modeling concepts of the ER model that were presented in earlier
discussion of this chapter. In addition, it includes the concepts of subclass and superclass and the related
concepts of specialization and generalization. Another concept included in the EER model is that of a
category or union type, which is used to represent a collection of objects (entities) that is the union of
objects of different entity types. Associated with these concepts is the important mechanism of attribute
and relationship inheritance. Unfortunately, no standard terminology exists for these concepts, so we use
the most common terminology and we also describe a diagrammatic technique for displaying these
concepts when they arise in an EER schema. We call the resulting schema diagrams enhanced ER or
EER diagrams.
The EER model can be described as follows:
• Object-oriented extensions to the E-R model
• EER is important when we have a relationship between two entities and the participation is
partial between entity occurrences. In such cases EER is used to reduce the complexity of the
participation and of the relationship.
• ER diagrams consider entity types to be primitive objects
• EER diagrams allow refinements within the structures of entity types
• EER concepts:
o Subclasses
o Superclasses
o Specialization
o Generalization
o Attribute inheritance
o Constraints on specialization and generalization

(a) Subclass/Subtype Vs Superclass /Supertype


As discussed previously in this chapter, an entity type is used to represent both a type of entity and
the entity set or collection of entities of that type that exist in the database. For example, the entity type
EMPLOYEE describes the type (that is, the attributes and relationships) of each employee entity, and
also refers to the current set of EMPLOYEE entities in the COMPANY database.
In many cases an entity type has numerous sub-groupings or subtypes of its entities that are meaningful
and need to be represented explicitly because of their significance to the database application. For
example, the entities that are members of the EMPLOYEE entity type may be distinguished further into
SECRETARY, ENGINEER, MANAGER, TECHNICIAN, SALARIED_EMPLOYEE,
HOURLY_EMPLOYEE, and so on. The set of entities in each of the latter groupings is a subset of the
entities that belong to the EMPLOYEE entity set, meaning that every entity that is a member of one of
these sub-groupings is also an employee. We call each of these sub-groupings a subclass or subtype of
the EMPLOYEE entity type, and the EMPLOYEE entity type is called the superclass or supertype for
each of these subclasses. The figure below shows how to represent these concepts diagrammatically in
EER diagrams. (The circle notation in the figure will be explained later.)

Superclass/Supertype Entity
• Is the generalized entity
• An entity type whose tuples share common attributes. Attributes that are shared by all entity
occurrences (including the identifier) are associated with the supertype.

Subclass/Subtype Entity
• An entity type whose tuples have attributes that distinguish its members from tuples of the
generalized or Superclass entities.
• When one generalized Superclass has various subgroups with distinguishing features and these
subgroups are represented by specialized form, the groups are called subclasses.
• Subclasses can be either mutually exclusive (disjoint) or overlapping (inclusive).
• A single subclass may inherit attributes from two distinct superclasses.
• A mutually exclusive category/subclass is when an entity instance can be in only one of the
subclasses.
E.g.: An EMPLOYEE can either be SALARIED or PART-TIMER but not both.
• An overlapping category/subclass is when an entity instance may be in two or more subclasses.
E.g.: A PERSON who works for a university can be both an EMPLOYEE and a
STUDENT at the same time.

(b) Specialization
• Specialization is the process of defining a set of subclasses of an entity type;
• The set of subclasses that forms a specialization is defined on the basis of some distinguishing
characteristic of the entities in the superclass.
• The specialization process identifies the distinguishing features of some entity occurrences and
specializes them into different subclasses.
• A specialized entity set is the result of taking a subset of a higher-level entity set to form a
lower-level entity set.
• The specialized entities will have an additional set of attributes (distinguishing characteristics)
that distinguish them from the generalized entity.
• Is considered a top-down definition of entities.
• Reasons for specialization:
o Attributes only partially applying to superclasses
o Relationship types only partially applicable to the superclass
• In many cases, an entity type has numerous sub-groupings of its entities that are meaningful
and need to be represented explicitly. This need requires the representation of each subgroup in
the ER model. The generalized entity is a superclass and the set of specialized entities will be
subclasses of that specific superclass.
• Example: Saving Accounts and Current Accounts are specialized entities of the
generalized entity Accounts. Manager, Sales, and Secretary are specialized employees.
(c) Generalization
• Generalization is the process of defining a more general entity type from a set of more
specialized entity types.
• A generalization hierarchy is a form of abstraction that specifies that two or more entities that
share common attributes can be generalized into a higher level entity type.
• Generalization occurs when two or more entities represent categories of the same real-world
object.
• Is considered as bottom-up definition of entities.
• Generalization hierarchy depicts relationship between higher level superclass and lower level
subclass.
Generalization hierarchies can be nested. That is, a subtype of one hierarchy can be a supertype of
another. The level of nesting is limited only by the constraint of simplicity.

Example: Vehicle is a generalized form for Car and Truck

Relationship Between Superclass and Subclass

• The relationship between a superclass and any of its subclasses is called a superclass/subclass or
class/subclass relationship.
• An instance cannot be a member of a subclass only; i.e., every instance of a subclass is also an
instance of the superclass.
• A member of a subclass is represented as a distinct database object, a distinct record that is
related via the key attribute to its superclass entity.
• An entity cannot exist in the database merely by being a member of a subclass; it must also be a
member of the superclass.
• An entity occurrence of a superclass need not belong to any of the subclasses unless there is
total participation in the specialization.
• The relationship between a subclass and a superclass is an “IS A” or “IS PART OF” type:
o Subclass IS PART OF Superclass
o Manager IS AN Employee
• All subclasses or specialized entity sets should be connected with the superclass using a line to a
circle, with a subset symbol indicating the direction of the subclass/superclass relationship.

• We can also have subclasses of a subclass, forming a hierarchy of specialization.
• Superclass attributes are shared by all subclasses of that superclass.
• Subclass attributes are unique to the subclass.

(d) Attribute Inheritance

• An entity that is a member of a subclass inherits all the attributes of the entity as a member of the
superclass.
• The entity also inherits all the relationships in which the superclass participates.
• An entity may belong to more than one subclass category.
• All entities/subclasses of a generalized entity or superclass share a common unique identifier
attribute (primary key); i.e., the primary keys of the superclass and its subclasses are always
identical.

• Consider the EMPLOYEE supertype entity shown above. This entity can have several different
subtype entities (for example, HOURLY and SALARIED), each with distinct properties not
shared by other subtypes. But whether the employee is HOURLY or SALARIED, the same
attributes (EmployeeId, Name, and DateHired) are shared.
• The supertype EMPLOYEE stores all properties that the subclasses have in common. HOURLY
employees have the unique attribute Wage (hourly wage rate), while SALARIED
employees have two unique attributes, StockOption and Salary.
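
Attribute inheritance in the EER model closely resembles class inheritance in an object-oriented language. The sketch below mirrors the EMPLOYEE, HOURLY and SALARIED example with Python dataclasses; the attribute types (for instance, StockOption as an integer count) are assumptions, as the text does not give them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    """Supertype: attributes shared by every employee."""
    employee_id: int  # identifier shared by the supertype and its subtypes
    name: str
    date_hired: date


@dataclass
class Hourly(Employee):
    """Subtype: inherits the shared attributes and adds Wage."""
    wage: float  # hourly wage rate


@dataclass
class Salaried(Employee):
    """Subtype: inherits the shared attributes and adds StockOption and Salary."""
    stock_option: int  # assumed to be a number of options
    salary: float


h = Hourly(employee_id=1, name="Abebe", date_hired=date(2020, 1, 1), wage=50.0)
s = Salaried(employee_id=2, name="Almaz", date_hired=date(2019, 6, 1), stock_option=100, salary=9000.0)
print(h.name, h.wage)         # inherited attribute, then subtype attribute
print(isinstance(s, Employee))  # True: every SALARIED entity is also an EMPLOYEE
```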

(e) Constraints on specialization and generalization

Completeness Constraint.
• The Completeness Constraint addresses the issue of whether or not an occurrence of a
superclass must also have a corresponding subclass occurrence.
• The completeness constraint requires that all instances of the subtype be represented in the
supertype.
• The Total Specialization Rule specifies that an entity occurrence must be a member of at least
one of the subclasses. Total participation of superclass instances in subclasses is diagrammed
with a double line from the supertype to the circle, as shown below.
E.g.: If we have EXTENSION and REGULAR as subclasses of a superclass STUDENT,
then each student must be either an EXTENSION or a REGULAR student.
Thus the participation of instances of STUDENT in the EXTENSION and REGULAR
subclasses will be total.

• The Partial Specialization Rule specifies that it is not necessary for every entity occurrence in the
superclass to be a member of one of the subclasses. Here we have optional participation in
the specialization. Partial participation of superclass instances in subclasses is diagrammed with
a single line from the supertype to the circle.
E.g.: If we have MANAGER and SECRETARY as subclasses of a superclass EMPLOYEE,
then it is not the case that all employees are either managers or secretaries. Thus the
participation of instances of EMPLOYEE in the MANAGER and SECRETARY subclasses
will be partial.

Disjointness Constraints
• Specifies whether one entity occurrence can be a member of more than one subclass;
i.e., it is a type of business rule that deals with the situation where an entity occurrence of a
superclass may also have more than one subclass occurrence.
• The Disjoint Rule restricts one entity occurrence of a superclass to be a member of only one of
the subclasses. Example: an EMPLOYEE can either be SALARIED or PART-TIMER, but not
both at the same time.
• The Overlap Rule allows one entity occurrence to be a member of more than one
subclass. Example: a PERSON working at the university can be both a STUDENT and an
EMPLOYEE at the same time.
• This is diagrammed by placing either the letter "d" for disjoint or "o" for overlapping inside the
circle on the generalization hierarchy portion of the E-R diagram.

The two types of constraints on generalization and specialization (disjointness and completeness
constraints) are independent of one another. That is, being disjoint does not determine whether the
tuples in the superclass have total or partial participation in that specific specialization.
Combining the two types of constraints gives four possible cases:
• Disjoint AND Total
• Disjoint AND Partial
• Overlapping AND Total
• Overlapping AND Partial
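
Because the two constraints are independent, each can be tested separately on a set of entity occurrences. The following sketch assumes entities are identified by simple ids and that each subclass is given as a set of member ids; the function name classify_specialization and the sample ids are hypothetical.

```python
def classify_specialization(superclass, subclasses):
    """Return (disjointness, completeness) for a superclass and its subclass member sets."""
    superclass = set(superclass)
    members = [set(ids) for ids in subclasses.values()]
    union = set().union(*members)
    if not union <= superclass:
        raise ValueError("every subclass member must also be a member of the superclass")
    # Total: every superclass occurrence appears in at least one subclass.
    completeness = "Total" if union == superclass else "Partial"
    # Disjoint: no occurrence is counted in two subclasses.
    disjointness = "Disjoint" if sum(len(m) for m in members) == len(union) else "Overlapping"
    return disjointness, completeness


# Employee 3 and 4 are neither managers nor secretaries, so the specialization is partial.
employees = {1, 2, 3, 4}
print(classify_specialization(employees, {"MANAGER": {1}, "SECRETARY": {2}}))
# ('Disjoint', 'Partial')

# Person 2 is both an employee and a student, so the specialization overlaps.
persons = {1, 2, 3}
print(classify_specialization(persons, {"EMPLOYEE": {1, 2}, "STUDENT": {2, 3}}))
# ('Overlapping', 'Total')
```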


Chapter Five
Logical and Physical Database Design
5.1 Logical Database Design
Logical design is the process of constructing a model of the information used in an
enterprise based on a specific data model (e.g. relational, hierarchical or network or
object), but independent of a particular DBMS and other physical considerations.
There is a collection of rules to be followed in logical database design that helps us to
discover new entities and revise attributes based on the rules and the discovered entities.
The process of applying these rules is called normalization.
The first step before applying the rules in the relational data model is converting the
conceptual design to a form suitable for the relational logical model, which is in the form of
tables.

5.1.1 Converting ER Diagram to Relational Tables


Three basic rules to convert ER into tables or relations:
1. For a relationship with One-to-One Cardinality:
• Either merge all the attributes into a single table, or post the primary key or
candidate key of one of the relations to the other as a foreign key.
2. For a relationship with One-to-Many Cardinality:
• Post the primary key or candidate key from the “one” side as a foreign key
attribute to the “many” side. E.g.: For a relationship called “Belongs To” between
Employee (Many) and Department (One), the primary key of Department is
posted to Employee as a foreign key.
3. For a relationship with Many-to-Many Cardinality:
• Create a new table (the associative entity) and post the primary key or
candidate key from each entity as attributes in the new table, along with any
additional attributes (if applicable).
After converting the ER diagram into tables, the next phase is to apply the
process of normalization, which is a collection of rules each table should satisfy.
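
The three conversion rules can be tried out with SQLite from Python's standard library. This is a sketch under stated assumptions: the table and column names are invented for the example, and the one-to-one case uses the foreign-key option (with a UNIQUE constraint) rather than merging the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to
conn.executescript("""
-- Rule 2 (1:N): the key of the "one" side (department) is posted to the "many" side (employee).
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
-- Rule 1 (1:1): the manager's key is posted to branch; UNIQUE stops one
-- employee from managing two branches.
CREATE TABLE branch (
    branch_id  INTEGER PRIMARY KEY,
    manager_id INTEGER UNIQUE REFERENCES employee(emp_id)
);
-- Rule 3 (M:N): an associative table holds both keys plus an extra attribute.
CREATE TABLE project (
    proj_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE works_on (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    hours   REAL,
    PRIMARY KEY (emp_id, proj_id)
);
""")

conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee VALUES (10, 'Abebe', 1)")
conn.execute("INSERT INTO project VALUES (100, 'Payroll')")
conn.execute("INSERT INTO works_on VALUES (10, 100, 12.5)")

# An employee that points at a missing department is rejected by the foreign key.
try:
    conn.execute("INSERT INTO employee VALUES (11, 'Lemma', 99)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
```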

5.1.2 Normalization
A relational database is merely a collection of data, organized in a particular manner. As
the father of the relational database approach, Codd created a series of rules called
normal forms that help define that organization.

One of the best ways to determine what information should be stored in a database is to
clarify what questions will be asked of it and what data would be included in the answers.
Database normalization is a series of steps followed to obtain a database design that
allows for consistent storage and efficient access of data in a relational database. These
steps reduce data redundancy and the risk of data becoming inconsistent.
Normalization is the process of identifying the logical associations between data items
and designing a database that will represent such associations without suffering from the
update anomalies, which are:
• Insertion Anomalies
• Deletion Anomalies
• Modification Anomalies
Normalization may reduce system performance since data will be cross referenced from
many tables. Thus de-normalization is sometimes used to improve performance, at the
cost of reduced consistency guarantees.
A normalization step is considered good if it is a lossless decomposition. Together, the
normalization rules remove the update anomalies that may otherwise occur during
data manipulation after implementation.
The problems that can occur in an insufficiently normalized table are called update
anomalies, and include the following:

(a) Insertion anomalies


An "insertion anomaly" is a failure to place information about a new database entry into
all the places in the database where information about that new entry needs to be stored.
In a properly normalized database, information about a new entry needs to be inserted
into only one place in the database; in an inadequately normalized database, information
about a new entry may need to be inserted into more than one place and, human fallibility
being what it is, some of the needed additional insertions may be missed.

(b) Deletion anomalies


A "deletion anomaly" is a failure to remove information about an existing database entry
when it is time to remove that entry. In a properly normalized database, information
about an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the
database; in an inadequately normalized database, information about that old entry may
need to be deleted from more than one place, and, human fallibility being what it is, some
of the needed additional deletions may be missed.

(c) Modification anomalies
A "modification anomaly" is a failure to change every copy of a value when that value is
updated. In a properly normalized database table, a value is stored in only one place, so
whatever the user modifies is changed once and takes effect everywhere it is used; in an
inadequately normalized table, the same value may be repeated in many rows and some of
the needed changes may be missed.
Note: The purpose of normalization is to reduce the chances for anomalies to occur in a
database.
Example of problems related with Anomalies

Emp_ID  FName   LName    Skill_ID  Skill   Skill_Type   School  SchoolAdd    Skill_Level
12      Abebe   Mekuria  2         SQL     Database     AAU     Sidist_Kilo  5
16      Lemma   Alemu    5         C++     Programming  Unity   Gerji        6
28      Chane   Kebede   2         SQL     Database     AAU     Sidist_Kilo  10
25      Abera   Taye     6         VB6     Programming  Helico  Piazza       8
65      Almaz   Belay    2         SQL     Database     Helico  Piazza       9
24      Dereje  Tamiru   8         Oracle  Database     Unity   Gerji        5
51      Selam   Belay    4         Prolog  Programming  Jimma   JimmaCity    8
94      Alem    Kebede   3         Cisco   Networking   AAU     Sidist_Kilo  7
18      Girma   Dereje   1         IP      Programming  Jimma   JimmaCity    4
13      Yared   Gizaw    7         Java    Programming  AAU     Sidist_Kilo  6

Deletion Anomalies:
If the employee with ID 16 is deleted, then all information about the skill C++ and its
skill type is deleted from the database. We will then have no information about C++ or
its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called Pascal? We cannot decide whether
Pascal is allowed as a value for Skill, and we have no clue about the type of skill that
Pascal should be categorized as.

Modification Anomalies:
What if the address for Helico is changed from Piazza to Mexico? We need to look for
every occurrence of Helico and change the value of SchoolAdd from Piazza to Mexico,
which is prone to error.
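
The deletion and modification anomalies described above can be reproduced with a small
script. This is only a sketch: it uses Python's built-in sqlite3 module, a table name
(EmpSkill) chosen for illustration, and three rows copied from the example table.

```python
import sqlite3

# A cut-down copy of the sample table (three rows and six columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmpSkill (EmpID INTEGER, FName TEXT, Skill TEXT, "
    "SkillType TEXT, School TEXT, SchoolAdd TEXT)"
)
conn.executemany(
    "INSERT INTO EmpSkill VALUES (?, ?, ?, ?, ?, ?)",
    [
        (16, "Lemma", "C++", "Programming", "Unity", "Gerji"),
        (25, "Abera", "VB6", "Programming", "Helico", "Piazza"),
        (65, "Almaz", "SQL", "Database", "Helico", "Piazza"),
    ],
)

# Deletion anomaly: removing employee 16 also removes the only record of C++.
conn.execute("DELETE FROM EmpSkill WHERE EmpID = 16")
cpp_rows = conn.execute(
    "SELECT COUNT(*) FROM EmpSkill WHERE Skill = 'C++'"
).fetchone()[0]

# Modification anomaly: the address is changed in one row but missed in another.
conn.execute("UPDATE EmpSkill SET SchoolAdd = 'Mexico' WHERE EmpID = 25")
helico_addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT SchoolAdd FROM EmpSkill WHERE School = 'Helico'"
    )
)

print(cpp_rows)          # 0
print(helico_addresses)  # ['Mexico', 'Piazza']
```

After the partial update the same school has two different addresses, which is exactly
the inconsistency that normalization is meant to prevent.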
Note:
A database management system can work only with the information that we put explicitly
into its tables for a given database and into its rules for working with those tables, where
such rules are appropriate and possible.

5.1.3 Functional Dependency (FD)


Before moving to the definition and application of normalization, it is important to have
an understanding of "functional dependency."
Data Dependency
The logical associations between data items that point the database designer in the
direction of a good database design are referred to as determinant or dependent
relationships.
Two data items A and B are said to be in a determinant or dependent relationship if
certain values of data item B always appear with certain values of data item A. If the
data item A is the determinant data item and B the dependent data item, then the direction
of the association is from A to B and not vice versa.
The essence of this idea is that if the existence of something, call it A, implies that B
must exist and have a certain value, then we say that "B is functionally dependent on
A." We also often express this idea by saying that "A determines B," or that "B is a
function of A," or that "A functionally governs B." Often, the notions of functionality and
functional dependency are expressed briefly by the statement, "If A, then B." It is
important to note that the value B must be unique for a given value of A, i.e., any given
value of A must imply just one and only one value of B, in order for the relationship to
qualify for the name "function." (However, this does not necessarily prevent different
values of A from implying the same value of B.)
X → Y holds if, whenever two tuples have the same value for X, they must
have the same value for Y.
The notation is: A → B, which is read as: B is functionally dependent on A.
In general, a functional dependency is a relationship among attributes. In relational
databases, we can have a determinant that governs one other attribute or several other
attributes.
FDs are derived from the real-world constraints on the attributes.

Example
Dinner Course   Type of Wine
Meat            Red
Fish            White
Cheese          Rose

Since the type of Wine served depends on the type of Dinner, we say Wine is
functionally dependent on Dinner.
Dinner → Wine

Dinner Course   Type of Wine   Type of Fork
Meat            Red            Meat fork
Fish            White          Fish fork
Cheese          Rose           Cheese fork

Since both Wine type and Fork type are determined by the Dinner type, we say Wine is
functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
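
The tuple-based definition of a functional dependency can be tested against sample data.
The sketch below is an illustration only; the function name fd_holds and the dictionary
layout of the rows are assumptions. Note that a data sample can only disprove a
dependency: whether an FD really holds is decided by the real-world constraints, not by
the rows that happen to be stored.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


dinner = [
    {"Dinner": "Meat", "Wine": "Red", "Fork": "Meat fork"},
    {"Dinner": "Fish", "Wine": "White", "Fork": "Fish fork"},
    {"Dinner": "Cheese", "Wine": "Rose", "Fork": "Cheese fork"},
]

print(fd_holds(dinner, ["Dinner"], ["Wine"]))          # True
print(fd_holds(dinner, ["Dinner"], ["Wine", "Fork"]))  # True

# Serving a second wine with Meat breaks Dinner -> Wine.
broken = dinner + [{"Dinner": "Meat", "Wine": "White", "Fork": "Meat fork"}]
print(fd_holds(broken, ["Dinner"], ["Wine"]))          # False
```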
Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the
primary key (if we have a composite primary key), then that attribute is partially
functionally dependent on the primary key.
Let {A,B} be the primary key and C a non-key attribute.
Then if {A,B} → C and B → C,
then C is partially functionally dependent on {A,B}.
Full Dependency
If an attribute which is not a member of the primary key is dependent not on some part of
the primary key but on the whole key (if we have a composite primary key), then that
attribute is fully functionally dependent on the primary key.
Let {A,B} be the primary key and C a non-key attribute.
Then if {A,B} → C holds, and neither B → C nor A → C holds,
then C is fully functionally dependent on {A,B}.
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form:
"If A implies B, and if also B implies C, then A implies C."

Example:
If Mr X is a Human, and if every Human is an Animal, then Mr X must be an Animal.
A generalized way of describing transitive dependency is:
If A functionally governs B, AND
If B functionally governs C
THEN A functionally governs C
Provided that neither C nor B determines A, i.e. (B ↛ A and C ↛ A)
In the normal notation:
{(A → B) AND (B → C)} ==> A → C, provided that B ↛ A and C ↛ A
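
Transitive dependencies can be derived mechanically by computing an attribute closure,
that is, the set of all attributes determined by a given set of attributes. Attribute
closure is a standard textbook technique that is not introduced in this module; the
sketch below, with the assumed function name closure, shows how A → C follows from
A → B and B → C while B still does not determine A.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, so is the right.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
a_closure = closure({"A"}, fds)
b_closure = closure({"B"}, fds)

print(sorted(a_closure))  # ['A', 'B', 'C']  so A -> C holds transitively
print(sorted(b_closure))  # ['B', 'C']       so B does not determine A
```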

5.1.4 Steps of Normalization:


We have various levels or steps in normalization called Normal Forms. The level of
complexity, the strength of the rule and the degree of decomposition increase as we move
from a lower-level Normal Form to a higher one.
A table in a relational database is said to be in a certain normal form if it satisfies certain
constraints. Each normal form listed below represents a stronger condition than the previous one.
Normalization towards a logical design consists of the following steps:
• UnNormalized Form:
Identify all data elements
• First Normal Form:
Find the key with which you can find all data
• Second Normal Form:
Remove part-key dependencies. Make all data dependent on the whole key.
• Third Normal Form:
Remove non-key dependencies. Make all data dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they adhere to third
normal form.

(1) First Normal Form (1NF)


Requires that all column values in a table are atomic (e.g., a number is an atomic value,
while a list or a set is not). We have two ways of achieving this:
• Putting each repeating group into a separate table and connecting them with a
primary key-foreign key relationship
• Moving the repeating groups to new rows by repeating the common attributes. If
this approach is used, then find the key with which you can find all data
Definition: a table (relation) is in 1NF if it meets the following conditions:
• There are no duplicated rows in the table (every row has a unique identifier).
• Each cell is single-valued (i.e., there are no repeating groups).
• Entries in a column (attribute, field) are of the same kind.
Example for First Normal Form (1NF)
Unnormalized
EmpID  First_Name  Last_Name  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe       Mekuria    SQL     Database     AAU     Sidist_Kilo  5
                              VB6     Programming  Helico  Piazza       8
16     Lemma       Alemu      C++     Programming  Unity   Gerji        6
                              IP      Programming  Jimma   JimmaCity    4
28     Chane       Kebede     SQL     Database     AAU     Sidist_Kilo  10
65     Almaz       Belay      SQL     Database     Helico  Piazza       9
                              Prolog  Programming  Jimma   JimmaCity    8
                              Java    Programming  AAU     Sidist_Kilo  6
24     Dereje      Tamiru     Oracle  Database     Unity   Gerji        5
94     Alem        Kebede     Cisco   Networking   AAU     Sidist_Kilo  7

First Normal Form (1NF): Remove all repeating groups. Distribute the multi-valued
attributes into different rows and identify a unique identifier for the relation so that it can
be said to be a relation in a relational database.
EmpID  FirstName  LastName  SkillID  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe      Mekuria   1        SQL     Database     AAU     Sidist_Kilo  5
12     Abebe      Mekuria   3        VB6     Programming  Helico  Piazza       8
16     Lemma      Alemu     2        C++     Programming  Unity   Gerji        6
16     Lemma      Alemu     7        IP      Programming  Jimma   JimmaCity    4
28     Chane      Kebede    1        SQL     Database     AAU     Sidist_Kilo  10
65     Almaz      Belay     1        SQL     Database     Helico  Piazza       9
65     Almaz      Belay     5        Prolog  Programming  Jimma   JimmaCity    8
65     Almaz      Belay     8        Java    Programming  AAU     Sidist_Kilo  6
24     Dereje     Tamiru    4        Oracle  Database     Unity   Gerji        5
94     Alem       Kebede    6        Cisco   Networking   AAU     Sidist_Kilo  7
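
The second approach (moving each repeating group to its own row) can be sketched in a
few lines. The data layout and the function name to_first_normal_form are assumptions
made for illustration, only two employees are reproduced, and the skill name is used in
place of SkillID to keep the example short.

```python
# Unnormalized input: each employee carries a repeating group of skills.
unnormalized = [
    {
        "EmpID": 12, "FirstName": "Abebe", "LastName": "Mekuria",
        "Skills": [
            ("SQL", "Database", "AAU", "Sidist_Kilo", 5),
            ("VB6", "Programming", "Helico", "Piazza", 8),
        ],
    },
    {
        "EmpID": 28, "FirstName": "Chane", "LastName": "Kebede",
        "Skills": [("SQL", "Database", "AAU", "Sidist_Kilo", 10)],
    },
]


def to_first_normal_form(records):
    """Return one flat row per (employee, skill) pair, repeating the common attributes."""
    rows = []
    for rec in records:
        for skill, skill_type, school, school_add, level in rec["Skills"]:
            rows.append({
                "EmpID": rec["EmpID"],
                "FirstName": rec["FirstName"],
                "LastName": rec["LastName"],
                "Skill": skill,
                "SkillType": skill_type,
                "School": school,
                "SchoolAdd": school_add,
                "SkillLevel": level,
            })
    return rows


flat = to_first_normal_form(unnormalized)
# (EmpID, Skill) identifies each row once the repeating groups are removed.
keys = [(row["EmpID"], row["Skill"]) for row in flat]
for row in flat:
    print(row["EmpID"], row["Skill"], row["SkillLevel"])
```

Every cell in the result holds a single value, and the pair (EmpID, Skill) is unique,
so it can serve as the key of the flattened relation.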

(2) Second Normal form 2NF


A table is in Second Normal Form if no non-key attribute is partially dependent on part
of the primary key. Removing such dependencies results in a set of relations that are in
Second Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., a non-composite) key is
automatically also in 2NF.
Definition: a table (relation) is in 2NF if it meets the following conditions:
• It is in 1NF, and
• All non-key attributes are dependent on the entire primary key, i.e. there is no partial
dependency.
Example for 2NF:
EMP_PROJ
EmpID  EmpName  ProjNo  ProjName  ProjLoc  ProjFund  ProjMangID  Incentive

EMP_PROJ rearranged
EmpID  ProjNo  EmpName  ProjName  ProjLoc  ProjFund  ProjMangID  Incentive

Business rule: Whenever an employee participates in a project, he/she will be entitled to
an incentive.
This schema is in 1NF since we don’t have any repeating groups or attributes with
multi-valued properties. To convert it to 2NF we need to remove all partial dependencies
of non-key attributes on part of the primary key.
{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID, Incentive
But in addition to this we have the following dependencies:
FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive

As we can see, some non-key attributes are partially dependent on some part of the
primary key. This can be witnessed by analyzing the first two functional dependencies
(FD1 and FD2). Thus, each functional dependency, with its dependent attributes,
should be moved to a new relation where the determinant will be the primary key.
EMPLOYEE
EmpID EmpName

PROJECT
ProjNo ProjName ProjLoc ProjFund ProjMangID

EMP_PROJ
EmpID ProjNo Incentive
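
The decomposition above amounts to projecting EMP_PROJ onto the attributes of each
functional dependency and dropping duplicate rows. The following sketch illustrates this
with invented sample rows (the module gives only the schema, not data) and omits ProjLoc,
ProjFund and ProjMangID for brevity; the helper name project is also an assumption.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Invented sample data for the 1NF relation EMP_PROJ.
emp_proj = [
    {"EmpID": 1, "ProjNo": 10, "EmpName": "Abebe", "ProjName": "Payroll", "Incentive": 500},
    {"EmpID": 1, "ProjNo": 20, "EmpName": "Abebe", "ProjName": "Library", "Incentive": 300},
    {"EmpID": 2, "ProjNo": 10, "EmpName": "Lemma", "ProjName": "Payroll", "Incentive": 400},
]

employee = project(emp_proj, ["EmpID", "EmpName"])               # FD1
project_rel = project(emp_proj, ["ProjNo", "ProjName"])          # FD2
emp_proj_2nf = project(emp_proj, ["EmpID", "ProjNo", "Incentive"])  # FD3

print(len(employee), len(project_rel), len(emp_proj_2nf))  # 2 2 3
```

Each employee name and each project name is now stored once, instead of once per
participation.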

(3) Third Normal Form (3NF )


Eliminate Columns Dependent on another non-Primary Key - If attributes do not
contribute to a description of the key, remove them to a separate table.
This level avoids update and delete anomalies.
Definition: a table (relation) is in 3NF if it meets the following conditions:
• It is in 2NF, and
• There are no transitive dependencies between the primary key and non-primary
key attributes.
Example for 3NF
Assumption: Students of the same batch (same year) live in one building or dormitory
STUDENT
StudID  Stud_F_Name  Stud_L_Name  Dept    Year  Dormitory
125/97  Abebe        Mekuria      InfoSc  1     401
654/95  Lemma        Alemu        Geog    3     403
842/95  Chane        Kebede       CompSc  3     403
165/97  Alem         Kebede       InfoSc  1     401
985/95  Almaz        Belay        Geog    3     403

This schema is in 2NF since the primary key is a single attribute.
Let’s take StudID, Year and Dormitory and see the dependencies.
StudID → Year AND Year → Dormitory
And Year cannot determine StudID and Dormitory cannot determine StudID. Then
transitively StudID → Dormitory.
To convert it to 3NF we need to remove all transitive dependencies of non-key
attributes on another non-key attribute.
The non-primary key attributes that are dependent on each other will be moved to another table
and linked with the main table using a Candidate Key-Foreign Key relationship.
STUDENT
StudID  Stud_F_Name  Stud_L_Name  Dept    Year
125/97  Abebe        Mekuria      InfoSc  1
654/95  Lemma        Alemu        Geog    3
842/95  Chane        Kebede       CompSc  3
165/97  Alem         Kebede       InfoSc  1
985/95  Almaz        Belay        Geog    3

DORM
Year  Dormitory
1     401
3     403
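
A quick way to convince oneself that this decomposition is lossless is to split the
original rows and join them back on Year. The sketch below does this with plain Python
tuples; the variable names are chosen for illustration.

```python
# Rows of the original STUDENT table: (StudID, FName, LName, Dept, Year, Dormitory).
student = [
    ("125/97", "Abebe", "Mekuria", "InfoSc", 1, 401),
    ("654/95", "Lemma", "Alemu", "Geog", 3, 403),
    ("842/95", "Chane", "Kebede", "CompSc", 3, 403),
    ("165/97", "Alem", "Kebede", "InfoSc", 1, 401),
    ("985/95", "Almaz", "Belay", "Geog", 3, 403),
]

# Decompose: STUDENT keeps Year as a foreign key, DORM stores each Year once.
student_3nf = [row[:5] for row in student]
dorm = sorted({(row[4], row[5]) for row in student})

# Rejoin on Year to check that no information was lost.
dorm_by_year = dict(dorm)
rejoined = [row + (dorm_by_year[row[4]],) for row in student_3nf]

print(dorm)                 # [(1, 401), (3, 403)]
print(rejoined == student)  # True
```

The dormitory of each year is now recorded in one row only, so moving a batch to another
building requires a single update.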

Generally, even though there are four additional levels of normalization, a table is
said to be normalized if it reaches 3NF. A database with all tables in 3NF is said to be a
Normalized Database.
A mnemonic for remembering the rationale for normalization up to 3NF could be the
following:
• No Repeating or Redundancy: no repeating fields in the table.
• The Fields Depend Upon the Key: the table should solely depend on the key.
• The Whole Key: no partial key dependency.
• And Nothing But The Key: no inter-data dependency.
• So Help Me Codd: since Codd came up with these rules.

5.1.5 Other Levels of Normalization

Boyce-Codd Normal Form (BCNF):


Def: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.

Fourth Normal Form (4NF)


Isolate Independent Multiple Relationships - No table may contain two or more 1:N or
N:M relationships that are not directly related.
To bring the model into fourth normal form, all M:M relationships must be resolved
independently if they are indeed independent.
Def: A table is in 4NF if it is in BCNF and if it has no non-trivial multi-valued
dependencies other than those whose determinant is a superkey.

Fifth Normal Form (5NF)


Isolate Semantically Related Multiple Relationships - There may be practical constraints
on information that justify separating logically related many-to-many relationships.
A model limited to only simple (elemental) facts, as expressed in ORM.
Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is
in 4NF and if every join dependency in the table is a consequence of the
candidate keys of the table.

Domain-Key Normal Form (DKNF)


A model free from all modification anomalies.
Def: A table is in DKNF if every constraint on the table is a logical consequence
of the definition of keys and domains.
The underlying ideas in normalization are simple enough. Through normalization we
want to design for our relational database a set of tables that;
(1) Contain all the data necessary for the purposes that the database is to serve,
(2) Have as little redundancy as possible,
(3) Accommodate multiple values for types of data that require them,
(4) Permit efficient updates of the data in the database, and
(5) Avoid the danger of losing data unknowingly.


5.1.6 Pitfalls of Normalization


• Requires data to see the problems
• May reduce performance of the system
• Is time consuming
• Is difficult to design and apply
• Is prone to human error

6.2 Physical Database Design
We have established that there are three levels of database design:
• Conceptual: producing a data model which accounts for the relevant entities and relationships
within the target application domain;
• Logical: ensuring, via normalization procedures and the definition of integrity rules, that the
stored database will be non-redundant and properly connected;
• Physical: specifying how database records are stored, accessed and related to ensure adequate
performance.

It is considered desirable to keep these three levels quite separate -- one of Codd's requirements for an
RDBMS is that it should maintain logical-physical data independence. The generality of the relational
model means that RDBMSs are potentially less efficient than those based on one of the older data
models where access paths were specified once and for all at the design stage. However the relational
data model does not preclude the use of traditional techniques for accessing data - it is still essential to
exploit them to achieve adequate performance with a database of any size.
We can consider the topic of physical database design from three aspects:
• What techniques for storing and finding data exist
• Which are implemented within a particular DBMS
• Which might be selected by the designer for a given application knowing the properties of the
data
Thus physical database design addresses the following questions:
1. How to map the logical database design to a physical database design.
2. How to design base relations for target DBMS.
3. How to design enterprise constraints for target DBMS.
4. How to select appropriate file organizations based on analysis of transactions.
5. When to use secondary indexes to improve performance.
6. How to estimate the size of the database.
7. How to design user views.
8. How to design security mechanisms to satisfy user requirements.

Physical database design is the process of producing a description of the implementation of the database
on secondary storage.

Physical design describes the base relation, file organization, and indexes used to achieve efficient
access to the data, and any associated integrity constraints and security measures.

• Sources of information for the physical design process include the global logical data model
and the documentation that describes the model.
• Logical database design is concerned with the what; physical database design is concerned with
the how.
• Physical design describes the storage structures and access methods used to achieve efficient
access to the data.

6.2.1 Steps in physical database design


1. Translate logical data model for target DBMS
1.1. Design base relation
1.2. Design representation of derived data
1.3. Design enterprise constraint
2. Design physical representation
2.1. Analyze transactions
2.2. Choose file organization
2.3. Choose indexes
2.4. Estimate disk space and system requirement
3. Design user view
4. Design security mechanisms
5. Consider controlled redundancy
6. Monitor and tune the operational system

6.2.2 Translate Logical Data Model for Target DBMS


This phase is the translation of the global logical data model to produce a relational database schema in
the target DBMS. This includes creating the data dictionary based on the logical model and information
gathered.
After the creation of the data dictionary, the next activity is to understand the functionality of the target
DBMS so that all necessary requirements are fulfilled for the database intended to be developed.
Knowledge of the DBMS includes:
• how to create base relations
• whether the system supports:
o definition of Primary key
o definition of Alternate key
o definition of Foreign key
o definition of Domains
o Referential integrity constraints
o definition of enterprise level constraints

a) Design base relation
To decide how to represent base relations identified in global logical model in target DBMS.
Designing base relation involves identification of all necessary requirements about a relation starting
from the name up to the referential integrity constraints.
For each relation, need to define:
• The name of the relation;
• A list of simple attributes in brackets;
• The PK and, where appropriate, AKs and FKs.
• A list of any derived attributes and how they should be computed;
• Referential integrity constraints for any FKs identified.
For each attribute, need to define:
• Its domain, consisting of a data type, length, and any constraints on the domain;
• An optional default value for the attribute;
• Whether the attribute can hold nulls.
The implementation of the physical model is dependent on the target DBMS, since some DBMSs
provide more facilities than others for defining database objects.
The base relation design along with every justifiable reason should be fully documented.
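
As an illustration of what a base relation definition may contain, the sketch below
declares two relations in SQLite through Python's sqlite3 module. SQLite is chosen here
only because it ships with Python; it is not the module's target DBMS. The relation names
follow the 2NF example, while the default value and the CHECK rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Project (
    ProjNo   INTEGER PRIMARY KEY,            -- primary key
    ProjName TEXT NOT NULL UNIQUE,           -- alternate key, nulls not allowed
    ProjLoc  TEXT DEFAULT 'Addis Ababa'      -- hypothetical default value
);
CREATE TABLE EmpProj (
    EmpID     INTEGER NOT NULL,
    ProjNo    INTEGER NOT NULL REFERENCES Project(ProjNo),  -- foreign key
    Incentive REAL CHECK (Incentive >= 0),   -- hypothetical domain constraint
    PRIMARY KEY (EmpID, ProjNo)
);
""")

conn.execute("INSERT INTO Project (ProjNo, ProjName) VALUES (10, 'Payroll')")
conn.execute("INSERT INTO EmpProj VALUES (1, 10, 500)")

# Referential integrity: project 99 does not exist, so this row is rejected.
try:
    conn.execute("INSERT INTO EmpProj VALUES (1, 99, 100)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

default_loc = conn.execute(
    "SELECT ProjLoc FROM Project WHERE ProjNo = 10"
).fetchone()[0]
print(fk_rejected, default_loc)  # True Addis Ababa
```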

b) Design representation of derived data


While analyzing the requirement of users, we may encounter that there are some attributes holding data
that will be derived from existing or other attributes. A decision on how to represent any derived data
present in the global logical data model in the target DBMS should be devised.
Examine logical data model and data dictionary, and produce list of all derived attributes. Most of the
time derived attributes are not expressed in the logical model but will be included in the data dictionary.
Whether to store derived attributes in a base relation or calculate them when required is a decision to be
made by the designer considering the performance impact.
Option selected is based on:
• Additional cost to store the derived data and keep it consistent with operational data from which
it is derived;
• Cost to calculate it each time it is required.
Less expensive option is chosen subject to performance constraints.
The representation of derived attributes should be fully documented.
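
One way to represent a derived attribute without storing it is to define a view that
calculates it whenever it is required. The sketch below assumes a derived attribute
TotalIncentive per employee, which is not part of the module's examples, and uses SQLite
through Python's sqlite3 module for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmpProj (
    EmpID INTEGER, ProjNo INTEGER, Incentive REAL,
    PRIMARY KEY (EmpID, ProjNo)
);
INSERT INTO EmpProj VALUES (1, 10, 500), (1, 20, 300), (2, 10, 400);

-- The derived attribute is calculated each time the view is queried.
CREATE VIEW EmpTotalIncentive AS
    SELECT EmpID, SUM(Incentive) AS TotalIncentive
    FROM EmpProj
    GROUP BY EmpID;
""")

totals = dict(conn.execute("SELECT EmpID, TotalIncentive FROM EmpTotalIncentive"))
print(totals)

# A new participation is reflected automatically; nothing has to be kept in sync.
conn.execute("INSERT INTO EmpProj VALUES (2, 20, 100)")
updated = dict(conn.execute("SELECT EmpID, TotalIncentive FROM EmpTotalIncentive"))
print(updated)
```

Storing TotalIncentive in a base relation instead would make reads cheaper, but every
insert, update and delete on EmpProj would then have to keep the stored value consistent.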

c) Design enterprise constraint
Data in the database is subject not only to constraints on the database and the data model used but also
to some enterprise-dependent constraints. These constraint definitions are also dependent on the
DBMS selected and enterprise level requirements.
One needs to know the functionalities of the DBMS, since in designing the enterprise constraints for the
target DBMS some DBMSs provide more facilities than others.
All the enterprise level constraints and the definition method in the target DBMS should be fully
documented.
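
As a sketch of how an enterprise constraint might be defined, the example below uses a
trigger in SQLite (through Python's sqlite3 module). The business rule itself, that an
employee may work on at most two projects, is hypothetical, and other DBMSs offer
different facilities for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmpProj (
    EmpID INTEGER, ProjNo INTEGER,
    PRIMARY KEY (EmpID, ProjNo)
);

-- Hypothetical business rule: an employee may work on at most two projects.
CREATE TRIGGER max_two_projects
BEFORE INSERT ON EmpProj
WHEN (SELECT COUNT(*) FROM EmpProj WHERE EmpID = NEW.EmpID) >= 2
BEGIN
    SELECT RAISE(ABORT, 'employee already has two projects');
END;
""")

conn.execute("INSERT INTO EmpProj VALUES (1, 10)")
conn.execute("INSERT INTO EmpProj VALUES (1, 20)")

# The third assignment violates the rule and is refused by the trigger.
try:
    conn.execute("INSERT INTO EmpProj VALUES (1, 30)")
    third_rejected = False
except sqlite3.DatabaseError:
    third_rejected = True

project_count = conn.execute(
    "SELECT COUNT(*) FROM EmpProj WHERE EmpID = 1"
).fetchone()[0]
print(third_rejected, project_count)  # True 2
```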
6.2.3 Design physical representation
This phase is the level for determining the optimal file organizations to store the base relations and the
indexes that are required to achieve acceptable performance; that is, the way in which relations and
tuples will be held on secondary storage.
Number of factors that may be used to measure efficiency:
• Transaction throughput: number of transactions processed in given time interval.
• Response time: elapsed time for completion of a single transaction.
• Disk storage: amount of disk space required to store database files.
However, no one factor is always correct.
Typically, have to trade one factor off against another to achieve a reasonable balance.
a) Analyze transactions
To understand the functionality of the transactions that will run on the database and to analyze the
important transactions.
Attempt to identify performance criteria, e.g.:
• Transactions that run frequently and will have a significant impact on performance;
• Transactions that are critical to the business;
• Times during the day/week when there will be a high demand made on the database (called the
peak load).
Use this information to identify the parts of the database that may cause performance problems.
To select appropriate file organizations and indexes, also need to know high-level functionality of the
transactions, such as:
• Attributes that are updated in an update transaction;
• Criteria used to restrict tuples that are retrieved in a query.
Often not possible to analyze all expected transactions, so investigate most ‘important’ ones.

To help identify which transactions to investigate, can use:
• Transaction/relation cross-reference matrix, showing relations that each transaction accesses,
and/or
• Transaction usage map, indicating which relations are potentially heavily used.

To focus on areas that may be problematic:


1. Map all transaction paths to relations.
2. Determine which relations are most frequently accessed by transactions.
3. Analyze the data usage of selected transactions that involve these relations.
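
The three steps above can be sketched as a small transaction/relation cross-reference
matrix. The transactions listed here are hypothetical; in practice they come from the
requirements analysis.

```python
from collections import Counter

# Hypothetical transactions and the relations each one accesses.
transactions = {
    "T1 register employee": ["EMPLOYEE"],
    "T2 assign to project": ["EMPLOYEE", "PROJECT", "EMP_PROJ"],
    "T3 list project staff": ["PROJECT", "EMP_PROJ", "EMPLOYEE"],
    "T4 pay incentives": ["EMP_PROJ"],
}
relations = ["EMPLOYEE", "PROJECT", "EMP_PROJ"]

# Step 1: map every transaction to the relations it touches.
matrix = {
    name: {rel: rel in used for rel in relations}
    for name, used in transactions.items()
}

# Step 2: count how many transactions access each relation.
access_counts = Counter(rel for used in transactions.values() for rel in used)

# Step 3: the most frequently accessed relations are analyzed first.
print(f"{'Transaction':24}" + "".join(f"{rel:>10}" for rel in relations))
for name, row in matrix.items():
    print(f"{name:24}" + "".join(f"{'X' if row[rel] else '-':>10}" for rel in relations))
print(access_counts.most_common())
```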

b) Choose file organization


To determine an efficient file organization for each base relation
File organizations include Heap, Hash, Indexed Sequential Access Method (ISAM), B+-Tree, and
Clusters.

c) Choose indexes
To determine whether adding indexes will improve the performance of the system.
One approach is to keep tuples unordered and create as many secondary indexes as necessary.
Another approach is to order tuples in the relation by specifying a primary or clustering index.
In this case, choose the attribute for ordering or clustering the tuples as:
• Attribute that is used most often for join operations - this makes join operation more efficient, or
• Attribute that is used most often to access the tuples in a relation in order of that attribute.
If ordering attribute chosen is key of relation, index will be a primary index; otherwise, index will be a
clustering index.
Each relation can only have either a primary index or a clustering index.
Secondary indexes provide a mechanism for specifying an additional key for a base relation that can be
used to retrieve data more efficiently.
There is an overhead involved in the maintenance and use of secondary indexes that has to be balanced
against the performance improvement gained when retrieving data.
This includes:
• Adding an index record to every secondary index whenever tuple is inserted;
• Updating a secondary index when corresponding tuple is updated;
• Increase in disk space needed to store the secondary index;
• Possible performance degradation during query optimization to consider all secondary
indexes.

Guidelines for Choosing Indexes
(1) Do not index small relations.
(2) Index PK of a relation if it is not a key of the file organization.
(3) Add secondary index to a FK if it is frequently accessed.
(4) Add secondary index to any attribute that is heavily used as a secondary key.
(5) Add secondary index on attributes that are involved in: selection or join criteria; ORDER
BY; GROUP BY; and other operations involving sorting (such as UNION or DISTINCT).
(6) Add secondary index on attributes involved in built-in functions.
(7) Add secondary index on attributes that could result in an index-only plan.
(8) Avoid indexing an attribute or relation that is frequently updated.
(9) Avoid indexing an attribute if the query will retrieve a significant proportion of the tuples
in the relation.
(10) Avoid indexing attributes that consist of long character strings.
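
The effect of a secondary index on a selection can be observed by asking the DBMS for
its query plan. The sketch below uses SQLite through Python's sqlite3 module; the table
contents are generated for illustration and the index name is an assumption. The exact
wording of the plan differs between SQLite versions, so only the presence of the index
name is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, LName TEXT, School TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(i, f"Name{i}", "AAU" if i % 2 else "Unity") for i in range(1000)],
)


def plan(sql):
    """Return the textual query plan; the last column of each row is the detail."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT EmpID FROM Employee WHERE LName = 'Name42'"
before = plan(query)  # without an index the whole table is scanned

# Guideline (4): index an attribute that is heavily used as a secondary key.
conn.execute("CREATE INDEX idx_employee_lname ON Employee (LName)")
after = plan(query)   # the plan now refers to idx_employee_lname

print(before)
print(after)
```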

d) Estimate disk space and system requirement


To estimate the amount of disk space that will be required by the database.
Purpose:
• If system already exists: is there adequate storage?
• If procuring new system: what storage will be required?
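
A first rough estimate multiplies the expected number of rows by the size of one row.
The sketch below does this arithmetic; the column widths, the per-row overhead and the
fill percentage are all assumed figures, and the real values depend on the target DBMS
and its storage format.

```python
def estimate_table_bytes(row_count, column_bytes, row_overhead=0, fill_percent=100):
    """Rough size estimate: rows x (column widths + per-row overhead), scaled by page fill."""
    row_bytes = sum(column_bytes.values()) + row_overhead
    raw = row_count * row_bytes
    # Integer ceiling division: pages that are only fill_percent full need more space.
    return (raw * 100 + fill_percent - 1) // fill_percent


# Assumed column widths in bytes for a small EMPLOYEE relation.
employee_columns = {"EmpID": 4, "FName": 20, "LName": 20}

estimate = estimate_table_bytes(
    row_count=10_000,
    column_bytes=employee_columns,
    row_overhead=8,   # assumed bookkeeping bytes per row
    fill_percent=80,  # assumed average page fill
)
print(estimate)  # 650000
```

Here each row takes 44 + 8 = 52 bytes, so 10,000 rows need 520,000 bytes, which grows to
650,000 bytes when pages are on average 80% full. Indexes would have to be estimated
separately in the same way.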

6.2.4 Design user view


To design the user views that were identified during the Requirements
Collection and Analysis stage of the relational database application lifecycle.
Define views in DDL to provide the user views identified in the data model.
Map them onto objects in the physical data model.

6.2.5 Design security mechanisms


To design the security measures for the database as specified by the users.
• System security
• Data security

6.2.6 Consider the Introduction of Controlled Redundancy


To determine whether introducing redundancy in a controlled manner by relaxing the normalization
rules will improve the performance of the system.

Result of normalization is a logical database design that is structurally consistent and has minimal
redundancy.
However, sometimes a normalized database design does not provide maximum processing efficiency.
It may be necessary to accept the loss of some of the benefits of a fully normalized design in favor of
performance.
Also consider that denormalization:
• Makes implementation more complex;
• Often sacrifices flexibility;
• May speed up retrievals but it slows down updates.
Denormalization refers to a refinement to relational schema such that the degree of normalization for a
modified relation is less than the degree of at least one of the original relations.
Also use term more loosely to refer to situations where two relations are combined into one new
relation, which is still normalized but contains more nulls than original relations.
Consider denormalization in following situations, specifically to speed up frequent or critical
transactions:
• Step 1 Combining 1:1 relationships
• Step 2 Duplicating non-key attributes in 1:* relationships to reduce joins
• Step 3 Duplicating foreign key attributes in 1:* relationships to reduce joins
• Step 4 Introducing repeating groups
• Step 5 Merging lookup tables with base relations
• Step 6 Creating extract tables.

6.2.7 Monitoring and Tuning the operational system


Meaning of denormalization
When to denormalize to improve performance
Importance of monitoring and tuning the operational system
To monitor operational system and improve performance of system to correct inappropriate design
decisions or reflect changing requirements.



Chapter Six
Relational Query Languages
6.1 Query languages:
• Allow manipulation and retrieval of data from a database.
• Query languages != programming languages!
• QLs are not intended to be used for complex calculations.
• QLs support easy, efficient access to large data sets.
• The relational model supports simple, powerful query languages.
Formal Relational Query Languages
• There are varieties of query languages used by relational DBMSs for manipulating relations.
• Some of them are procedural
o User tells the system exactly what data is needed and how to manipulate it
• Others are non-procedural
o User states what data is needed rather than how it is to be retrieved.

Two mathematical query languages form the basis for relational languages:
• Relational Algebra
• Relational Calculus

• We may describe the relational algebra as a procedural language: it can be used to tell the DBMS how
to build a new relation from one or more relations in the database.

• We may describe relational calculus as a non-procedural language: it can be used to formulate the
definition of a relation in terms of one or more database relations.

• Formally the relational algebra and relational calculus are equivalent to each other. For every
expression in the algebra, there is an equivalent expression in the calculus, and vice versa.

• Both are non-user friendly languages. They have been used as the basis for other, higher-level data
manipulation languages for relational databases.

A query is applied to relation instances, and the result of a query is also a relation instance.
• Schemas of input relations for a query are fixed.
• The schema for the result of a given query is also fixed! It is determined by the definition of the
query language constructs.

6.2 Relational Algebra


The basic set of operations for the relational model is known as the relational algebra. These operations
enable a user to specify basic retrieval requests.

The result of the retrieval is a new relation, which may have been formed from one or more relations. The
algebra operations thus produce new relations, which can be further manipulated using operations of the
same algebra.
A sequence of relational algebra operations forms a relational algebra expression, whose result will also
be a relation that represents the result of a database query (or retrieval request).

• Relational algebra is a theoretical language with operations that work on one or more relations to
define another relation without changing the original relation.
• The output from one operation can become the input to another operation (nesting is possible)

• There are different basic operations that could be applied on relations on a database based on the
requirement.
♠ Selection ( σ ) Selects a subset of rows from a relation.
♠ Projection ( π ) Deletes unwanted columns from a relation.
♠ Renaming: assigning a name to an intermediate relation produced by a single operation
♠ Cross-Product ( × ) Allows us to combine two relations.
♠ Set-Difference ( - ) Tuples in relation1, but not in relation2.
♠ Union ( ∪ ) Tuples in relation1 or in relation2.
♠ Intersection ( ∩ ) Tuples in relation1 and in relation2
♠ Join ( ⋈ ) Tuples joined from two relations based on a condition
• Using these we can build up sophisticated database queries.

Table1:
Sample table used to illustrate different kinds of relational operations. The relation contains information
about employees, IT skills they have and the school where they attend each skill.

Employee
EmpID FName LName SkillID Skill SkillType School SchoolAdd SkillLevel
12 Abebe Mekuria 2 SQL Database AAU Sidist_Kilo 5
16 Lemma Alemu 5 C++ Programming Unity Gerji 6
28 Chane Kebede 2 SQL Database AAU Sidist_Kilo 10
25 Abera Taye 6 VB6 Programming Helico Piazza 8
65 Almaz Belay 2 SQL Database Helico Piazza 9
24 Dereje Tamiru 8 Oracle Database Unity Gerji 5
51 Selam Belay 4 Prolog Programming Jimma Jimma City 8
94 Alem Kebede 3 Cisco Networking AAU Sidist_Kilo 7
18 Girma Dereje 1 IP Programming Jimma Jimma City 4
13 Yared Gizaw 7 Java Programming AAU Sidist_Kilo 6
6.2.1 Selection
 Selects subset of tuples/rows in a relation that satisfy selection condition.
 Selection operation is a unary operator (it is applied to a single relation)
 The Selection operation is applied to each tuple individually
 The degree of the resulting relation is the same as the original relation but the cardinality (no. of
tuples) is less than or equal to the original relation.
 The Selection operator is commutative.
 A set of conditions can be combined using the Boolean operators ∧ (AND), ∨ (OR), and ~ (NOT)
 No duplicates in result!
 Schema of result identical to schema of (only) input relation.
 Result relation can be the input for another relational algebra operation! (Operator composition.)
 It is a filter that keeps only those tuples that satisfy a qualifying condition (those satisfying the
condition are selected while others are discarded.)

Notation:
σ <Selection Condition> (<Relation Name>)
Example: Find all Employees with skill type of Database.

σ <SkillType = "Database"> (Employee)

This query extracts every tuple of the relation Employee, with all of its attributes, in which the
SkillType attribute has the value "Database".

The resulting relation will be the following.

EmpID FName LName SkillID Skill SkillType School SchoolAdd SkillLevel


12 Abebe Mekuria 2 SQL Database AAU Sidist_Kilo 5
28 Chane Kebede 2 SQL Database AAU Sidist_Kilo 10
65 Almaz Belay 2 SQL Database Helico Piazza 9
24 Dereje Tamiru 8 Oracle Database Unity Gerji 5

If the query asks for all employees with SkillType Database and School Unity, the relational algebra
operation and the resulting relation will be as follows.

σ <SkillType = "Database" ∧ School = "Unity"> (Employee)


EmpID FName LName SkillID Skill SkillType School SchoolAdd SkillLevel
24 Dereje Tamiru 8 Oracle Database Unity Gerji 5
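
As a rough illustration of the selection operator, the sketch below models a relation as a Python list of dictionaries (one dictionary per tuple). The helper name select and the reduced set of attributes and rows are assumptions made for this example only.

```python
# Hypothetical subset of the Employee relation: one dict per tuple.
employee = [
    {"EmpID": 12, "FName": "Abebe", "SkillType": "Database", "School": "AAU", "SkillLevel": 5},
    {"EmpID": 16, "FName": "Lemma", "SkillType": "Programming", "School": "Unity", "SkillLevel": 6},
    {"EmpID": 28, "FName": "Chane", "SkillType": "Database", "School": "AAU", "SkillLevel": 10},
    {"EmpID": 24, "FName": "Dereje", "SkillType": "Database", "School": "Unity", "SkillLevel": 5},
]


def select(relation, condition):
    """Selection: keep the tuples that satisfy the condition; the schema is unchanged."""
    return [row for row in relation if condition(row)]


# sigma <SkillType = "Database"> (Employee)
database_rows = select(employee, lambda r: r["SkillType"] == "Database")

# sigma <SkillType = "Database" AND School = "Unity"> (Employee)
unity_database_rows = select(
    employee, lambda r: r["SkillType"] == "Database" and r["School"] == "Unity"
)

print([r["EmpID"] for r in database_rows])        # [12, 28, 24]
print([r["EmpID"] for r in unity_database_rows])  # [24]
```

The result has the same attributes as the input and never more tuples than the input, which matches the degree and cardinality properties listed above.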
6.2.2 Projection
 Selects certain attributes while discarding the others from the base relation.
 The PROJECT creates a vertical partitioning – one with the needed columns (attributes) containing
results of the operation and other containing the discarded Columns.
 Deletes attributes that are not in projection list.
 Schema of result contains exactly the fields in the projection list, with the same names that they had in
the (only) input relation.
 Projection operator has to eliminate duplicates!
o Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it.
 If the Primary Key is in the projection list, then duplication will not occur
 Duplicate removal is necessary to ensure that the resulting table is also a relation.

Notation:
π <Selected Attributes> (<Relation Name>)
Example: To display Name, Skill, and Skill Level of an employee, the query and the resulting relation
will be:

π <FName, LName, Skill, SkillLevel> (Employee)


FName LName Skill SkillLevel
Abebe Mekuria SQL 5
Lemma Alemu C++ 6
Chane Kebede SQL 10
Abera Taye VB6 8
Almaz Belay SQL 9
Dereje Tamiru Oracle 5
Selam Belay Prolog 8
Alem Kebede Cisco 7
Girma Dereje IP 4
Yared Gizaw Java 6
If we want to have the Name, Skill, and Skill Level of an employee with Skill SQL and SkillLevel greater
than 5 the query will be:

π <FName, LName, Skill, SkillLevel> (σ <Skill="SQL" ∧ SkillLevel>5> (Employee))

FName LName Skill SkillLevel
Chane Kebede SQL 10
Almaz Belay SQL 9
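
The duplicate elimination performed by projection can be sketched as follows. This is only an illustration: the helper name project and the reduced Employee data are assumptions for the example.

```python
# Hypothetical subset of the Employee relation: one dict per tuple.
employee = [
    {"EmpID": 12, "FName": "Abebe", "Skill": "SQL", "SkillLevel": 5},
    {"EmpID": 28, "FName": "Chane", "Skill": "SQL", "SkillLevel": 10},
    {"EmpID": 65, "FName": "Almaz", "Skill": "SQL", "SkillLevel": 9},
    {"EmpID": 24, "FName": "Dereje", "Skill": "Oracle", "SkillLevel": 5},
]


def project(relation, attributes):
    """Projection: keep only the listed attributes and drop duplicate tuples."""
    result = []
    seen = set()
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:  # a relation is a set, so duplicates are removed
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# pi <Skill> (Employee): three SQL tuples collapse into one.
skills = project(employee, ["Skill"])
print(skills)  # [{'Skill': 'SQL'}, {'Skill': 'Oracle'}]

# pi <EmpID, Skill> (Employee): the key EmpID is included, so nothing collapses.
with_key = project(employee, ["EmpID", "Skill"])
print(len(with_key))  # 4
```

The second call shows why duplicates cannot occur when the primary key is part of the projection list.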

6.2.3 Rename Operation


We may want to apply several relational algebra operations one after the other. The query could be written
in two different forms:

1. Write the operations as a single relational algebra expression by nesting the operations.
2. Apply one operation at a time and create intermediate result relations. In the latter case, we must give
names to the relations that hold the intermediate results; this is the Rename Operation.

If we want to have the Name, Skill, and Skill Level of an employee with Skill SQL and SkillLevel greater
than 5, we can write the expression for this query using the two alternatives:
1. A single algebraic expression:
The query used above is written as a single expression by nesting the operations:

π <FName, LName, Skill, SkillLevel> (σ <Skill="SQL" ∧ SkillLevel>5> (Employee))

2. Using an intermediate relation by the Rename Operation:

Step1: Result1 ← σ <Skill="SQL" ∧ SkillLevel>5> (Employee)

Step2: Result ← π <FName, LName, Skill, SkillLevel> (Result1)

Then Result will be equivalent to the relation we get using the first alternative.
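
The two ways of writing a query can be compared with a small sketch. The helpers select and project and the sample rows are assumptions for illustration; the point is only that naming an intermediate result does not change the final relation.

```python
# Hypothetical subset of the Employee relation: one dict per tuple.
employee = [
    {"FName": "Abebe", "LName": "Mekuria", "Skill": "SQL", "SkillLevel": 5},
    {"FName": "Chane", "LName": "Kebede", "Skill": "SQL", "SkillLevel": 10},
    {"FName": "Almaz", "LName": "Belay", "Skill": "SQL", "SkillLevel": 9},
    {"FName": "Dereje", "LName": "Tamiru", "Skill": "Oracle", "SkillLevel": 5},
]


def select(relation, condition):
    """Selection: keep the tuples that satisfy the condition."""
    return [row for row in relation if condition(row)]


def project(relation, attributes):
    """Projection: keep the listed attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


def is_advanced_sql(row):
    return row["Skill"] == "SQL" and row["SkillLevel"] > 5


attributes = ["FName", "LName", "Skill", "SkillLevel"]

# Alternative 1: a single nested expression.
nested = project(select(employee, is_advanced_sql), attributes)

# Alternative 2: one operation at a time, naming the intermediate relation.
result1 = select(employee, is_advanced_sql)  # Step 1
result = project(result1, attributes)        # Step 2

print(nested == result)               # True
print([r["FName"] for r in result])   # ['Chane', 'Almaz']
```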

6.2.4 Set Operations


The three main set operations are Union, Intersection and Set Difference. The properties of these set
operations are similar to those in mathematical set theory. The difference is that, in the
database context, each set is a Relation and its elements are tuples. The set
operations are Binary operations which require the two operand Relations to be type compatible.

Type Compatibility
Two relations R1 and R2 are said to be Type Compatible if:
1. The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) have the same number of
attributes, and
2. The domains of corresponding attributes must be compatible; that is, Dom(Ai)=Dom(Bi) for
i=1, 2, ..., n.
To illustrate the three set operations, we will make use of the following two tables:
Employee
EmpID FName LName SkillID Skill SkillType School SkillLevel
12 Abebe Mekuria 2 SQL Database AAU 5
16 Lemma Alemu 5 C++ Programming Unity 6
28 Chane Kebede 2 SQL Database AAU 10
25 Abera Taye 6 VB6 Programming Helico 8
65 Almaz Belay 2 SQL Database Helico 9
24 Dereje Tamiru 8 Oracle Database Unity 5
51 Selam Belay 4 Prolog Programming Jimma 8
94 Alem Kebede 3 Cisco Networking AAU 7
18 Girma Dereje 1 IP Programming Jimma 4
13 Yared Gizaw 7 Java Programming AAU 6
RelationOne: Employees who attend Database Course
EmpID FName LName SkillID Skill SkillType School SkillLevel
12 Abebe Mekuria 2 SQL Database AAU 5
28 Chane Kebede 2 SQL Database AAU 10
65 Almaz Belay 2 SQL Database Helico 9
24 Dereje Tamiru 8 Oracle Database Unity 5

RelationTwo : Employees who attend a course in AAU


EmpID FName LName SkillID Skill SkillType School SkillLevel
12 Abebe Mekuria 2 SQL Database AAU 5
94 Alem Kebede 3 Cisco Networking AAU 7
28 Chane Kebede 2 SQL Database AAU 10
13 Yared Gizaw 7 Java Programming AAU 6

6.2.5 UNION Operation


The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are either in R or in
S or in both R and S. Duplicate tuples are eliminated.
The two operands must be "type compatible"
Eg: RelationOne ∪ RelationTwo
Employees who attend Database in any School or who attend any course at AAU

EmpID FName LName SkillID Skill SkillType School SkillLevel


12 Abebe Mekuria 2 SQL Database AAU 5
28 Chane Kebede 2 SQL Database AAU 10
65 Almaz Belay 2 SQL Database Helico 9
24 Dereje Tamiru 8 Oracle Database Unity 5
94 Alem Kebede 3 Cisco Networking AAU 7
13 Yared Gizaw 7 Java Programming AAU 6

6.2.6 INTERSECTION Operation


The result of this operation, denoted by R ∩ S, is a relation that includes all tuples that are in both R and
S. The two operands must be "type compatible"
Eg: RelationOne ∩ RelationTwo
Employees who attend Database Course at AAU

EmpID FName LName SkillID Skill SkillType School SkillLevel


12 Abebe Mekuria 2 SQL Database AAU 5
28 Chane Kebede 2 SQL Database AAU 10

6.2.7 Set Difference (or MINUS) Operation


The result of this operation, denoted by R - S, is a relation that includes all tuples that are in R but not in S.
The two operands must be "type compatible"
Eg: RelationOne - RelationTwo
Employees who attend Database Course but didn’t take any course at AAU
EmpID FName LName SkillID Skill SkillType School SkillLevel
65 Almaz Belay 2 SQL Database Helico 9
24 Dereje Tamiru 8 Oracle Database Unity 5
Eg: RelationTwo - RelationOne
Employees who attend a course at AAU that is not a Database Course

EmpID FName LName SkillID Skill SkillType School SkillLevel
94 Alem Kebede 3 Cisco Networking AAU 7
13 Yared Gizaw 7 Java Programming AAU 6

The resulting relation for R1 ∪ R2, R1 ∩ R2, or R1 - R2 has the same attribute names as the first operand
relation R1 (by convention).
Some Properties of the Set Operators
Notice that both union and intersection are commutative operations; that is
R ∪ S = S ∪ R, and R ∩ S = S ∩ R
Both union and intersection can be treated as n-ary operations applicable to any number of relations as
both are associative operations; that is
R ∪ (S ∪ T) = (R ∪ S) ∪ T, and (R ∩ S) ∩ T = R ∩ (S ∩ T)
The minus operation is not commutative; that is, in general
R - S ≠ S - R
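
Because relations are sets of tuples, the three set operations map directly onto Python's built-in set type. The sketch below is an illustration under that assumption; the tuples keep only EmpID, FName, Skill and School from RelationOne and RelationTwo, and the helper emp_ids is hypothetical.

```python
# Type-compatible relations: same number of attributes, same domains per position.
relation_one = {  # employees who attend a Database course
    (12, "Abebe", "SQL", "AAU"),
    (28, "Chane", "SQL", "AAU"),
    (65, "Almaz", "SQL", "Helico"),
    (24, "Dereje", "Oracle", "Unity"),
}
relation_two = {  # employees who attend a course at AAU
    (12, "Abebe", "SQL", "AAU"),
    (94, "Alem", "Cisco", "AAU"),
    (28, "Chane", "SQL", "AAU"),
    (13, "Yared", "Java", "AAU"),
}


def emp_ids(relation):
    """Return the sorted EmpID values, just to make the results easy to read."""
    return sorted(t[0] for t in relation)


union = relation_one | relation_two          # R ∪ S, duplicates removed
intersection = relation_one & relation_two   # R ∩ S
one_minus_two = relation_one - relation_two  # R - S
two_minus_one = relation_two - relation_one  # S - R

print(emp_ids(union))          # [12, 13, 24, 28, 65, 94]
print(emp_ids(intersection))   # [12, 28]
print(emp_ids(one_minus_two))  # [24, 65]
print(emp_ids(two_minus_one))  # [13, 94]

# Union and intersection are commutative; difference is not.
print(relation_one | relation_two == relation_two | relation_one)  # True
print(one_minus_two == two_minus_one)                              # False
```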

6.2.8 CARTESIAN (cross product) Operation


This operation is used to combine tuples from two relations in a combinatorial fashion. That means every
tuple in Relation1 (R) will be paired with every tuple in Relation2 (S).
 In general, the result of R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm) is a relation Q of degree n + m,
with attributes Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
 Here R has n attributes and S has m attributes.
 The resulting relation Q has one tuple for each combination of tuples: one from R and one from
S.
 Hence, if R has p tuples and S has q tuples, then R x S will have p * q tuples (| R x S | = p * q).
Example:

Employee
ID FName LName
123 Abebe Lemma
567 Belay Taye
822 Kefle Kebede

Dept
DeptID DeptName MangID
2 Finance 567
3 Personnel 123

Then the Cartesian product between Employee and Dept relations will be of the form:
Employee X Dept:
ID FName LName DeptID DeptName MangID
123 Abebe Lemma 2 Finance 567
123 Abebe Lemma 3 Personnel 123
567 Belay Taye 2 Finance 567
567 Belay Taye 3 Personnel 123
822 Kefle Kebede 2 Finance 567
822 Kefle Kebede 3 Personnel 123

Even though it is very important in query processing, the Cartesian Product is not useful by itself,
since it pairs every tuple in the first relation with every tuple in the second relation, whether or not
they are related. Thus, to make use of the Cartesian Product, one has to combine it with the Selection
Operation, which filters the tuples of a relation by testing whether each satisfies the selection condition.

In our example, to extract employee information about managers of the departments (Managers of each
department), the algebra query and the resulting relation will be:

π <ID, FName, LName, DeptName> (σ <ID=MangID> (Employee X Dept))


ID FName LName DeptName
123 Abebe Lemma Personnel
567 Belay Taye Finance
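
The Cartesian product followed by selection and projection can be sketched with itertools.product from the Python standard library. The helper name cross_product is an assumption for this example, and the sketch assumes the two relations have no attribute names in common.

```python
from itertools import product

employee = [
    {"ID": 123, "FName": "Abebe", "LName": "Lemma"},
    {"ID": 567, "FName": "Belay", "LName": "Taye"},
    {"ID": 822, "FName": "Kefle", "LName": "Kebede"},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance", "MangID": 567},
    {"DeptID": 3, "DeptName": "Personnel", "MangID": 123},
]


def cross_product(r, s):
    """Pair every tuple of r with every tuple of s (attribute names must be disjoint)."""
    return [{**x, **y} for x, y in product(r, s)]


pairs = cross_product(employee, dept)
print(len(pairs))  # 6, i.e. 3 tuples * 2 tuples

# Selection <ID = MangID> followed by projection <ID, FName, LName, DeptName>.
managers = [
    {a: row[a] for a in ("ID", "FName", "LName", "DeptName")}
    for row in pairs
    if row["ID"] == row["MangID"]
]
print([(m["ID"], m["DeptName"]) for m in managers])
# [(123, 'Personnel'), (567, 'Finance')]
```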

6.2.9 JOIN Operation


The sequence of Cartesian product followed by select is used quite commonly to identify and select
related tuples from two relations, so a special operation, called JOIN, was created for it. Thus in the
JOIN operation, the Cartesian Product and the Selection operations are used together.

The JOIN operation is denoted by the ⋈ symbol.

This operation is very important for any relational database with more than a single relation, because it
allows us to process relationships among relations.
The general form of a join operation on two relations
R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is:

R ⋈ <join condition> S is equivalent to σ <selection condition> (R X S)

where <join condition> and <selection condition> are the same.

Here R and S can be any relations that result from general relational algebra expressions.
Since JOIN is an operation that needs two relations, it is a Binary operation.

This type of JOIN is called a THETA JOIN (θ-JOIN),
where θ is the comparison operator used in the join condition.
θ could be any of { <, ≤, >, ≥, ≠, = }
Example:
In the above example, where we want to extract employee information about managers of the
departments, the algebra query using the JOIN operation will be:

Employee ⋈ <ID=MangID> Dept
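
A theta join can be sketched as a nested loop that keeps a combined tuple only when the comparison holds. The function name theta_join and its parameter layout are assumptions for this illustration; the comparison operators come from Python's standard operator module.

```python
import operator

employee = [
    {"ID": 123, "FName": "Abebe", "LName": "Lemma"},
    {"ID": 567, "FName": "Belay", "LName": "Taye"},
    {"ID": 822, "FName": "Kefle", "LName": "Kebede"},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance", "MangID": 567},
    {"DeptID": 3, "DeptName": "Personnel", "MangID": 123},
]


def theta_join(r, s, left_attr, theta, right_attr):
    """Combine x from r and y from s whenever x[left_attr] theta y[right_attr] holds."""
    return [
        {**x, **y}
        for x in r
        for y in s
        if theta(x[left_attr], y[right_attr])
    ]


# Employee JOIN <ID = MangID> Dept (theta is "=").
managers = theta_join(employee, dept, "ID", operator.eq, "MangID")
print([(m["FName"], m["DeptName"]) for m in managers])
# [('Abebe', 'Personnel'), ('Belay', 'Finance')]

# Any other comparison works the same way, e.g. theta is ">".
greater = theta_join(employee, dept, "ID", operator.gt, "MangID")
print(len(greater))  # 3
```

The result is the same as applying the selection to the Cartesian product, only without materialising the non-matching pairs first.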

6.2.9.1 EQUIJOIN Operation


The most common use of join involves join conditions with equality comparisons only ( = ). Such a join,
where the only comparison operator used is =, is called an EQUIJOIN. In the result of an EQUIJOIN we
always have one or more pairs of attributes (whose names need not be identical) that have identical values
in every tuple, since we used the equality operator.
For example, the above JOIN expression is an EQUIJOIN since the comparison operator used is the
equal to operator ( = ).
6.2.9.2 NATURAL JOIN Operation
We have seen that in an EQUIJOIN one attribute of each pair of attributes with identical values is
redundant. A new operation called NATURAL JOIN was created to get rid of the second (or extra) attribute
that we would otherwise have in the result of an EQUIJOIN.

The standard definition of natural join requires that the two join attributes, or each pair of corresponding
join attributes, have the same name in both relations. If this is not the case, a renaming operation on the
attributes is applied first.
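
A natural join can be sketched as an equijoin on all attributes that share a name, keeping a single copy of each. The helpers rename and natural_join are assumptions for this illustration; MangID is renamed to ID first so that the join attribute has the same name in both relations.

```python
employee = [
    {"ID": 123, "FName": "Abebe", "LName": "Lemma"},
    {"ID": 567, "FName": "Belay", "LName": "Taye"},
    {"ID": 822, "FName": "Kefle", "LName": "Kebede"},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance", "MangID": 567},
    {"DeptID": 3, "DeptName": "Personnel", "MangID": 123},
]


def rename(relation, old, new):
    """Return a copy of the relation with attribute `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in relation]


def natural_join(r, s):
    """Equijoin on all attributes common to r and s, keeping one copy of each."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**x, **y}  # merging the dicts keeps a single copy of the common attributes
        for x in r
        for y in s
        if all(x[a] == y[a] for a in common)
    ]


dept_renamed = rename(dept, "MangID", "ID")
managers = natural_join(employee, dept_renamed)

print(sorted(managers[0].keys()))
# ['DeptID', 'DeptName', 'FName', 'ID', 'LName']
print([(m["ID"], m["DeptName"]) for m in managers])
# [(123, 'Personnel'), (567, 'Finance')]
```

The result has five attributes instead of the six produced by the equijoin, because the duplicated join attribute appears only once.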

6.2.9.3 OUTER JOIN Operation


OUTER JOIN is another version of the JOIN operation where non matching tuples from a relation are also
included in the result with NULL values for attributes in the other relation.
There are two major types of OUTER JOIN.

1. RIGHT OUTER JOIN: where non matching tuples from the second (Right) relation are included
in the result with NULL value for attributes of the first (Left) relation.
2. LEFT OUTER JOIN: where non matching tuples from the first (Left) relation are included in the
result with NULL value for attributes of the second (Right) relation.

Notation for Left Outer Join:

R ⟕ <Join Condition> S

When two relations are joined by a JOIN operator, there could be some tuples in the first relation that do
not have a matching tuple in the second relation (or the other way round). When a query also needs to display
these non-matching tuples from the first or second relation, it is expressed with the OUTER JOIN.
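
A left outer join can be sketched as follows, with Python's None standing in for NULL. The function name left_outer_join and its s_attributes parameter are assumptions for this illustration.

```python
employee = [
    {"ID": 123, "FName": "Abebe", "LName": "Lemma"},
    {"ID": 567, "FName": "Belay", "LName": "Taye"},
    {"ID": 822, "FName": "Kefle", "LName": "Kebede"},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance", "MangID": 567},
    {"DeptID": 3, "DeptName": "Personnel", "MangID": 123},
]


def left_outer_join(r, s, condition, s_attributes):
    """Join r and s; tuples of r without a match are kept and padded with None."""
    result = []
    for x in r:
        matches = [y for y in s if condition(x, y)]
        if matches:
            result.extend({**x, **y} for y in matches)
        else:
            # No matching tuple in s: fill the attributes of s with NULL (None).
            result.append({**x, **{a: None for a in s_attributes}})
    return result


joined = left_outer_join(
    employee,
    dept,
    lambda e, d: e["ID"] == d["MangID"],
    ["DeptID", "DeptName", "MangID"],
)
print([(row["FName"], row["DeptName"]) for row in joined])
# [('Abebe', 'Personnel'), ('Belay', 'Finance'), ('Kefle', None)]
```

A right outer join is the mirror image: swap the roles of the two relations so that unmatched tuples of the second relation are the ones padded with NULL.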
6.2.9.4 SEMIJOIN Operation
SEMI JOIN is another version of the JOIN operation where the resulting Relation contains the attributes
of only one of the Relations, and only those of its tuples that are related to tuples in the other Relation.
The following notation depicts a result that includes only the attributes from the first relation (R), for the
tuples of R that actually participate in the relationship.

R ⋉ <Join Condition> S
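
A semijoin can be sketched as a filter on the first relation: a tuple of R is kept if at least one tuple of S satisfies the join condition with it. The function name semijoin is an assumption for this illustration.

```python
employee = [
    {"ID": 123, "FName": "Abebe", "LName": "Lemma"},
    {"ID": 567, "FName": "Belay", "LName": "Taye"},
    {"ID": 822, "FName": "Kefle", "LName": "Kebede"},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance", "MangID": 567},
    {"DeptID": 3, "DeptName": "Personnel", "MangID": 123},
]


def semijoin(r, s, condition):
    """Keep the tuples of r that match at least one tuple of s; only r's attributes remain."""
    return [x for x in r if any(condition(x, y) for y in s)]


# Employee SEMIJOIN <ID = MangID> Dept: employees who manage some department.
managing_employees = semijoin(employee, dept, lambda e, d: e["ID"] == d["MangID"])
print([e["FName"] for e in managing_employees])  # ['Abebe', 'Belay']
print(sorted(managing_employees[0].keys()))      # ['FName', 'ID', 'LName']
```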
6.3 Relational Calculus
A relational calculus expression creates a new relation, which is specified in terms of variables that range
over rows of the stored database relations (in tuple calculus) or over columns of the stored relations (in
domain calculus).

In a calculus expression, there is no order of operations to specify how to retrieve the query result. A
calculus expression specifies only what information the result should contain rather than how to retrieve
it.

In Relational calculus, there is no description of how to evaluate a query; this is the main distinguishing
feature between relational algebra and relational calculus.

Relational calculus is considered to be a nonprocedural language. This differs from relational algebra,
where we must write a sequence of operations to specify a retrieval request; hence relational algebra can
be considered as a procedural way of stating a query.

When applied to relational databases, the calculus is not that of derivatives and differentials; it is a form of
first-order logic, or predicate calculus. A predicate is a truth-valued function with arguments.

When we substitute values for the arguments in the predicate, the function yields an expression, called a
proposition, which can be either true or false.

If a predicate contains a variable, as in ‘x is a member of staff’, there must be a range for x. When we
substitute some values of this range for x, the proposition may be true; for other values, it may be false.

If COND is a predicate, then the set of all tuples for which the predicate COND evaluates to true is
expressed as follows:
{t | COND(t)}
Where t is a tuple variable and COND (t) is a conditional expression involving t. The result of such a
query is the set of all tuples t that satisfy COND (t).

If we have a set of predicates to evaluate for a single query, the predicates can be connected using
∧ (AND), ∨ (OR), and ~ (NOT)
6.3.1 Tuple-oriented Relational Calculus
 The tuple relational calculus is based on specifying a number of tuple variables. Each tuple variable
usually ranges over a particular database relation, meaning that the variable may take as its value any
individual tuple from that relation.

 Tuple relational calculus is interested in finding tuples for which a predicate is true for a relation.
Based on use of tuple variables.

 Tuple variable is a variable that ‘ranges over’ a named relation: that is, a variable whose only
permitted values are tuples of the relation.

 If E is a tuple that ranges over a relation employee, then it is represented as EMPLOYEE(E) i.e.
Range of E is EMPLOYEE

 Then, to extract all tuples that satisfy a certain condition, we represent the query as all tuples E such that
COND(E) evaluates to true.

{E | COND(E)}
The predicates can be connected using the Boolean operators:
∧ (AND), ∨ (OR), ¬ (NOT)

COND(t) is a formula, and is called a Well-Formed-Formula (WFF) if:

 COND is composed of n predicates (a formula composed of n single predicates) and the
predicates are connected by any of the Boolean operators.
 Each predicate is of the form A θ B, where θ is one of the comparison operators { <, ≤, >, ≥,
≠, = }, so that the predicate can be evaluated to either true or false. A and B are either constants or
variables.
 Formulae should be unambiguous and should make sense.

Example (Tuple Relational Calculus)


 Extract all employees whose skill level is greater than or equal to 8
{E | Employee(E) ∧ E.SkillLevel >= 8}

EmpID FName LName SkillID Skill SkillType School SchoolAdd SkillLevel


28 Chane Kebede 2 SQL Database AAU Sidist_Kilo 10
25 Abera Taye 6 VB6 Programming Helico Piazza 8
65 Almaz Belay 2 SQL Database Helico Piazza 9
51 Selam Belay 4 Prolog Programming Jimma Jimma City 8

 To find only the EmpId, FName, LName, Skill and the School where the skill is attended, for
employees with skill level greater than or equal to 8, the tuple based relational calculus expression will
be:
{E.EmpId, E.FName, E.LName, E.Skill, E.School | Employee(E) ∧ E.SkillLevel >= 8}

EmpID FName LName Skill School


28 Chane Kebede SQL AAU
25 Abera Taye VB6 Helico
65 Almaz Belay SQL Helico
51 Selam Belay Prolog Jimma

 E.FName means the value of the First Name (FName) attribute for the tuple E.
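
A tuple calculus expression reads much like a Python set comprehension, which states what the result contains rather than a sequence of operations. The sketch below is only an analogy: the namedtuple Emp and the reduced sample data are assumptions for this example.

```python
from collections import namedtuple

# Hypothetical tuple type; E.SkillLevel in the calculus becomes attribute access here.
Emp = namedtuple("Emp", ["EmpID", "FName", "LName", "Skill", "School", "SkillLevel"])

Employee = [
    Emp(12, "Abebe", "Mekuria", "SQL", "AAU", 5),
    Emp(28, "Chane", "Kebede", "SQL", "AAU", 10),
    Emp(25, "Abera", "Taye", "VB6", "Helico", 8),
    Emp(65, "Almaz", "Belay", "SQL", "Helico", 9),
    Emp(51, "Selam", "Belay", "Prolog", "Jimma", 8),
    Emp(18, "Girma", "Dereje", "IP", "Jimma", 4),
]

# {E | Employee(E) AND E.SkillLevel >= 8}
high_level = {E for E in Employee if E.SkillLevel >= 8}

# {E.EmpID, E.FName, E.LName, E.Skill, E.School | Employee(E) AND E.SkillLevel >= 8}
high_level_details = {
    (E.EmpID, E.FName, E.LName, E.Skill, E.School)
    for E in Employee
    if E.SkillLevel >= 8
}

print(sorted(E.EmpID for E in high_level))  # [25, 28, 51, 65]
print(len(high_level_details))              # 4
```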

6.3.2 Quantifiers in Relation Calculus


 To tell how many instances the predicate applies to, we can use the two quantifiers in the predicate
logic.
 An expression written using the Existential Quantifier can also be expressed using the Universal
Quantifier, and vice versa, since (∃x)(F), 'there exists x such that F', is equivalent to ¬(∀x)(¬F).

1. Existential quantifier ∃ ('there exists')

The existential quantifier is used in formulae that must be true for at least one instance, such as:
There is an employee with skill level greater than or equal to 8:
(∃E)(Employee(E) ∧ E.SkillLevel >= 8)

This means that there exists at least one tuple of the relation Employee where the value for
the SkillLevel attribute is greater than or equal to 8.

2. Universal quantifier ∀ ('for all')

The universal quantifier is used in statements about every instance, such as:
Every employee has a skill level greater than or equal to 8:
(∀E)(¬Employee(E) ∨ E.SkillLevel >= 8)

This means that, for all tuples E, either E is not a tuple of the relation Employee or the value of its
SkillLevel attribute is greater than or equal to 8.
Example:

Let’s say that we have the following Schema (set of Relations)

Employee(EID, FName, LName, Dept)


Project(PID, PName, Dept)
Dept(DID, DName, DMangID)
WorksOn(EID, PID)

To find employees who work on projects controlled by department 5 the query will be:
{E | Employee(E) ∧ (∃P)(∃W)(Project(P) ∧ WorksOn(W) ∧ P.Dept=5 ∧ W.PID=P.PID ∧ E.EID=W.EID)}
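
The two quantifiers correspond loosely to Python's built-in any() and all(). The sketch below uses small, entirely hypothetical instances of the Employee, Project and WorksOn relations, and it assumes that a WorksOn tuple is linked to its project through PID and to its employee through EID.

```python
# Hypothetical instances of the schema above (Dept holds a department number).
employee = [
    {"EID": 1, "FName": "Abebe", "Dept": 5},
    {"EID": 2, "FName": "Belay", "Dept": 4},
    {"EID": 3, "FName": "Kefle", "Dept": 5},
]
project = [
    {"PID": 10, "PName": "Payroll", "Dept": 5},
    {"PID": 20, "PName": "Website", "Dept": 4},
]
works_on = [
    {"EID": 1, "PID": 10},
    {"EID": 2, "PID": 20},
    {"EID": 3, "PID": 10},
    {"EID": 3, "PID": 20},
]

# Existential quantifier: is there at least one employee in department 5?
exists_dept5 = any(E["Dept"] == 5 for E in employee)
# Universal quantifier: is every employee in department 5?
all_dept5 = all(E["Dept"] == 5 for E in employee)
# "There exists E with F" is the same as "not (for all E, not F)".
same = exists_dept5 == (not all(not (E["Dept"] == 5) for E in employee))

# Employees who work on at least one project controlled by department 5.
result = [
    E
    for E in employee
    if any(
        P["Dept"] == 5 and W["PID"] == P["PID"] and W["EID"] == E["EID"]
        for P in project
        for W in works_on
    )
]

print(exists_dept5, all_dept5, same)  # True False True
print([E["EID"] for E in result])     # [1, 3]
```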
