Bonga University College of Engineering & Technology_________

Bonga University
College of Engineering and Technology
Department of Computer Science

January 2023
Bonga University, Ethiopia

Department of Computer Science Fundamentals of Database Systems (CoSc2041)



CHAPTER ONE
INTRODUCTION TO DATABASE SYSTEM

1.1. Database System
Database systems are designed to manage large data sets in an organization. Data
management involves both the definition and the manipulation of data, ranging from
simple representation of the data to the design of structures for storing
information. Data management also covers the provision of mechanisms for
manipulating that information.

The power of databases comes from a body of knowledge and technology that has
developed over several decades and is embodied in specialized software called a database
management system, or DBMS. A DBMS is a powerful tool for creating and managing
large amounts of data efficiently and allowing it to persist over long periods of time, safely.
These systems are among the most complex types of software available.

So, to answer our question: what is a database? In essence, a database is nothing more than a
collection of shared information that exists over a long period of time, often many years.
In common usage, the term database refers to a collection of data that is managed by a
DBMS.

Thus, the DB course is about:


◼ How to organize data
◼ Supporting multiple users
◼ Efficient and effective data retrieval
◼ Secured and reliable storage of data
◼ Maintaining consistent data
◼ Making information useful for decision making
Data management has passed through different levels of development along with
advances in technology and services. These can best be described as three levels of
development. Even though each new level brought advantages and overcame problems of the
previous one, all three methods of data handling are still in use to some extent. The three
major levels are:


1. Manual Approach
2. Traditional File Based Approach
3. Database Approach

1.2. Manual Approach


In the manual approach, data storage and retrieval follow the primitive and traditional way
of information handling, where cards and paper are used for the purpose. Data storage
and retrieval are performed by human labour.

• Files for as many events and objects as the organization has are used to store
information.
• Each of the files containing various kinds of information is labelled and stored in
one or more cabinets.
• The cabinets could be kept in safe places for security purposes, based on the
sensitivity of the information they contain.
• Insertion and retrieval are done by searching first for the right cabinet, then for the
right file, and then for the information.
• One could have an indexing system to facilitate access to the data.

Limitations of the Manual approach

• Prone to error
• Difficult to update, retrieve, integrate
• You have the data but it is difficult to compile the information
• Limited to small amounts of information
• Cross-referencing is difficult
An alternative approach to data handling is a computerized way of dealing with the
information. The computerized approach could be either decentralized or centralized,
based on where the data resides in the system.


1.3. Traditional File Based Approach


After computers were introduced to the business community for data processing, the
need to use them for data storage and processing increased. There were, and still are,
several computer applications with file-based processing used for data
handling. Even though the approach evolved over time, the basic structure is still similar,
if not identical.
• File-based systems were an early attempt to computerize the manual filing system.
• This approach is the decentralized computerized data handling method.
• A collection of application programs performs services for the end-users. In such
systems, every application program that provides a service to end users defines and
manages its own data.
• Such systems have a number of programs for each of the different applications in
the organization.
• Since every application defines and manages its own data, the system is subject
to a serious data duplication problem.
• A file, in the traditional file-based approach, is a collection of records that contain
logically related data.


Limitations of the Traditional File Based approach


As business applications became more complex, demanding more flexible and reliable data
handling methods, the shortcomings of the file-based system became evident. These
shortcomings include, but are not limited to:
• Separation or isolation of data: information available in one application may not
be known to others. Data synchronisation is done manually.
• Limited data sharing: every application maintains its own data.
• Lengthy development and maintenance time
• Duplication or redundancy of data (cost in money and time, and loss of data integrity)
• Data dependency on the application: the data structure is embedded in the application;
hence, a change in the data structure requires changing the application as well.
• Incompatible file formats or data structures (e.g. between C and COBOL programs) across
different applications and programs, creating inconsistency and making joint
processing difficult.

• Fixed query processing, which is defined during application development
The limitations of the traditional file-based data handling approach arise from two
basic reasons:
1. The definition of the data is embedded in the application program, which
makes it difficult to modify the database definition.
2. There is no control over the access and manipulation of the data beyond that
imposed by the application programs.
The most significant problem experienced by the traditional file-based approach to data
handling is update anomalies. There are three types of update anomalies:
1. Modification Anomalies: a problem experienced when one or more data values are
modified in one application program but not in others containing the same data
set.
2. Deletion Anomalies: a problem encountered when a record set is deleted from
one application but remains untouched in other application programs.
3. Insertion Anomalies: a problem experienced whenever a new data item is to
be recorded and the recording is not made in all the applications. Also, when the same
data item is inserted in different applications, there could be errors in encoding
that cause the new data item to be treated as a totally different object.
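To make the modification and deletion anomalies concrete, here is a minimal Python sketch of a file-based system. It assumes two hypothetical applications, a registrar and a library, each keeping its own private copy of the same student record; the "files", field names and the student "S01" are invented for illustration.

```python
# Hypothetical file-based system: each application owns a private copy
# of the same student data (duplication across "files").
registrar_file = {"S01": {"name": "Abebe", "dept": "CS"}}
library_file = {"S01": {"name": "Abebe", "dept": "CS"}}


def registrar_change_dept(student_id: str, new_dept: str) -> None:
    """The registrar program updates only the file it manages."""
    registrar_file[student_id]["dept"] = new_dept


def library_delete_student(student_id: str) -> None:
    """The library program deletes the record only from its own file."""
    del library_file[student_id]


# Modification anomaly: the same fact now has two different values.
registrar_change_dept("S01", "IT")
print(registrar_file["S01"]["dept"], library_file["S01"]["dept"])  # IT CS

# Deletion anomaly: the student is gone from one file but not the other.
library_delete_student("S01")
print("S01" in registrar_file, "S01" in library_file)  # True False
```

Nothing in either program keeps the two copies synchronised; that coordination is left to people, which is the weakness the database approach sets out to remove.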

1.4. Database Approach


Following a famous paper written by Ted Codd in 1970, database systems changed
significantly. Codd proposed that database systems should present the user with a view of
data organized as tables called relations. Behind the scenes, there might be a complex
data structure that allowed rapid response to a variety of queries. But, unlike the user of
earlier database systems, the user of a relational system would not be concerned with the
storage structure. Queries could be expressed in a very high-level language, which
greatly increased the efficiency of database programmers. The database approach
emphasizes the integration and sharing of data throughout the organization.

Thus, in the database approach:

• A database is simply a computerized record-keeping system or a kind of electronic
filing cabinet.
• A database is a repository for a collection of computerized data files.
• A database is a shared collection of logically related data designed to meet the
information needs of an organization. Since it is a shared corporate resource, the
database is integrated, with minimal or no duplication.
• A database is a collection of logically related data, where this logically related data
comprises the entities, attributes, relationships, and business rules of an organization's
information.
• In addition to the data required by an organization, a database also contains
a description of that data, which is called “Metadata”, “Data Dictionary”,
“System Catalogue” or “Data about Data”.
• Since a database contains information about the data (metadata), it is called a
self-describing collection of integrated records.
• The purpose of a database is to store information and to allow users to retrieve and
update that information on demand.
• A database is designed once and used simultaneously by many users.
• Unlike the traditional file-based approach, the database approach provides program-data
independence, that is, the separation of the data definition from the
application. Thus, the application is not affected by changes made to the data
structure and file organization.
• Each database application performs some combination of creating the database and
reading, updating and deleting data.
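For contrast with the file-based anomalies, here is a minimal sketch of the same scenario under the database approach. It assumes Python's built-in sqlite3 module as a stand-in DBMS and a hypothetical student table; both hypothetical applications read one shared table, so a single create, update or delete is seen by every user of the data.

```python
import sqlite3

# One shared, integrated store instead of one file per application.
# ":memory:" keeps the sketch self-contained; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, dept TEXT)")

# Create
conn.execute("INSERT INTO student VALUES (?, ?, ?)", ("S01", "Abebe", "CS"))


def registrar_view(student_id: str):
    """Registrar application reads from the shared table."""
    return conn.execute(
        "SELECT dept FROM student WHERE id = ?", (student_id,)
    ).fetchone()


def library_view(student_id: str):
    """Library application reads the very same row."""
    return conn.execute(
        "SELECT name, dept FROM student WHERE id = ?", (student_id,)
    ).fetchone()


# Update once; every application sees the change.
conn.execute("UPDATE student SET dept = ? WHERE id = ?", ("IT", "S01"))
print(registrar_view("S01"))  # ('IT',)
print(library_view("S01"))    # ('Abebe', 'IT')

# Delete once; the record disappears for everyone.
conn.execute("DELETE FROM student WHERE id = ?", ("S01",))
print(library_view("S01"))    # None
```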

Benefits of the database approach


➢ Data can be shared: two or more users can access and use the same data instead of
storing data redundantly for each user.
➢ Improved accessibility of data: by using structured query languages, users can
easily access data without programming experience.
➢ Redundancy can be reduced: isolated data is integrated in the database to decrease the
redundant data stored in different applications.

➢ Quality data can be maintained: the different integrity constraints in the database
approach maintain data quality, leading to better decision making.
➢ Inconsistency can be avoided: controlling data redundancy avoids inconsistency
of the data in the database to some extent.
➢ Transaction support can be provided: the basic demands of any transaction support
system are implemented in a full-scale DBMS.
➢ Integrity can be maintained: data from different applications is integrated together
with additional constraints to facilitate a shared data resource.
➢ Security measures can be enforced: the shared data can be secured by having different
levels of clearance and other data security mechanisms.
➢ Improved decision support: the database provides information useful for decision
making.
➢ Standards can be enforced: the different ways of using and dealing with data by
different units of an organization can be balanced and standardized by using the database
approach.
➢ Compactness: since it is an electronic data handling method, the data is stored
compactly (no voluminous papers).
➢ Speed: data storage and retrieval are fast, as they use modern fast computer
systems.
➢ Less labour: unlike the other data handling methods, data maintenance does not
demand many resources.
➢ Centralized information control: since relevant data in the organization is stored
in one repository, it can be controlled and managed at the central level.
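The "quality data" and "integrity" benefits rely on constraints that the DBMS enforces centrally rather than in each program. A minimal sketch, again assuming Python's sqlite3 module as a stand-in DBMS and a hypothetical employee table, shows invalid rows being rejected no matter which program tries to insert them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraints are declared once, in the schema, and enforced by
# the DBMS for every application (table and rules are hypothetical).
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL CHECK (salary > 0)
    )
    """
)
conn.execute("INSERT INTO employee VALUES (1, 'Hana', 5000)")

rejected = []
for row in [(2, None, 4000), (3, "Kebede", -100), (1, "Duplicate", 3000)]:
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as err:
        # NOT NULL, CHECK and PRIMARY KEY violations all land here.
        rejected.append((row[0], str(err)))

print(len(rejected))  # 3
print(conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0])  # 1
```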


Limitations and risk of Database Approach


➢ Introduction of new professional and specialized personnel.
➢ Complexity in designing and managing data
➢ The cost and risk during conversion from the old to the new system
➢ High cost to be incurred to develop and maintain the system
➢ Complex backup and recovery services from the user's perspective
➢ Reduced performance due to centralization and data independence
➢ High impact on the system when a failure occurs in the central system

1.5 Database Management System (DBMS)


A Database Management System (DBMS) is a software package used for providing
EFFICIENT, CONVENIENT and SAFE MULTI-USER (many people/programs accessing the same
database, or even the same data, simultaneously) storage of and access to MASSIVE amounts of
PERSISTENT (data outlives the programs that operate on it) data. A DBMS also provides a systematic
method for creating, updating, storing and retrieving data in a database. It also provides
services for controlling data access, enforcing data integrity, managing concurrency
control, and recovery. With this in mind, a full-scale DBMS should provide at least the
following services to the user.

1. Data storage, retrieval and update in the database
2. A user accessible catalogue
3. Transaction Support Service: all-or-none transactions, which minimize data
inconsistency.
4. Concurrency Control Services: access to and update of the database by different
users simultaneously should be handled correctly.
5. Recovery Services: a mechanism for recovering the database after a failure must
be available.
6. Authorization Services (Security): must support access control and
authorization for the database administrator and users.
7. Support for Data Communication: should provide the facility to integrate with
data transfer software or data communication managers.
8. Integrity Services: rules about the data and the changes that take place on the data,
the correctness and consistency of stored data, and the quality of data based on business
constraints.
9. Services to promote data independence between the data and the application
10. Utility services: sets of utility service facilities like
➢ Importing data
➢ Statistical analysis support
➢ Index reorganization
➢ Garbage collection
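Item 3, transaction support, is easiest to see with a money transfer between two accounts: either both updates happen or neither does. This is a minimal sketch assuming Python's sqlite3 module as the DBMS and a hypothetical account table whose CHECK constraint forbids negative balances:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()


def transfer(src: str, dst: str, amount: float) -> bool:
    """Move money between accounts as one all-or-none transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK (balance >= 0) failed; both updates are undone


def balances() -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer("A", "B", 30.0), balances())   # True {'A': 70.0, 'B': 80.0}
print(transfer("A", "B", 500.0), balances())  # False {'A': 70.0, 'B': 80.0}
```

The credit is deliberately applied before the debit, so the failed second transfer shows that an update which had already executed is rolled back as well.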

DBMS and Components of DBMS Environment


A DBMS is a software package used to design, manage, and maintain databases. Each
DBMS should have facilities to define the database, manipulate the content of the
database and control the database. These facilities help the designer, the user and
the database administrator to discharge their responsibilities in designing, using and
managing the database. It provides the following facilities:

Data Definition Language (DDL):


➢ Language used to define each data element required by the organization.

➢ Commands for setting up the schema, or the intension, of the database
➢ These commands are used to set up a database and to create, delete and alter tables,
with the facility of handling constraints
Data Manipulation Language (DML):
➢ Core commands used by end-users and programmers to store, retrieve,
and access the data in the database, e.g. SQL
➢ Since the data or query required by the user is extracted using this
type of language, it is also called a "Query Language"
Data Dictionary:
Because a database is a self-describing system, this tool, the Data
Dictionary, is used to store and organize information about the data stored
in the database.
Data Control Language:
➢ A database is a shared resource that demands control of data access and
usage. The database administrator should have the facility to control the
overall operation of the system.
➢ Data Control Languages are commands that help the Database
Administrator to control access to the database.
➢ The commands include granting or revoking privileges to access the database or
particular objects within the database, and storing or removing database
transactions
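The DDL, DML and data dictionary facilities can be sketched with Python's built-in sqlite3 module standing in for a full DBMS; the department and employee tables are hypothetical. The DDL statements set up the schema, the DML statements store and retrieve data, and SQLite's own catalogue (the sqlite_master table and PRAGMA table_info) plays the role of the data dictionary. SQLite does not implement GRANT or REVOKE, so Data Control Language is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema (the "intension" of the database).
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")

# DML: store and retrieve the data (the "extension").
conn.execute("INSERT INTO department VALUES (10, 'Computer Science')")
conn.execute("INSERT INTO employee VALUES (1, 'Sara', 10, NULL)")
rows = conn.execute(
    "SELECT e.name, d.name FROM employee e JOIN department d USING (dept_id)"
).fetchall()
print(rows)  # [('Sara', 'Computer Science')]

# Data dictionary: SQLite describes its own schema in sqlite_master.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['department', 'employee']

# Column metadata for one table, also maintained by the DBMS itself.
columns = [r[1] for r in conn.execute("PRAGMA table_info(employee)")]
print(columns)  # ['emp_id', 'name', 'dept_id', 'phone']
```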
The DBMS is a software package that helps to design, manage, and use data using the
database approach. Taking a DBMS as a system, one can describe it with respect to its
environment, that is, the other systems interacting with the DBMS. The DBMS environment has
five components. To design and use a database, there should be interaction and
integration of Hardware, Software, Data, Procedure and People.

1. Hardware: the physical components that one can touch and feel. These components
comprise various types of personal computers, mainframes or other server
computers used in multi-user systems, network infrastructure, and other
peripherals required in the system.


2. Software: the collection of commands and programs used to direct the
hardware to perform a function. These include components like the DBMS
software, application programs, operating systems, network software, language
software and other relevant software.

3. Data: since the goal of any database system is to have better control of the data
and to make data useful, data is the most important component for the users of the
database. There are two categories of data in any database system: operational data
and metadata. Operational data is the data actually stored in the system
to be used by the user. Metadata is the data used to store information about
the database itself.
The structure of the data in the database is called the schema, which is composed
of the entities, the properties of entities, and the relationships between entities.

4. Procedure: the rules and regulations on how to design and use a
database. It includes procedures such as how to log on to the DBMS, how to use its
facilities, how to start and stop transactions, how to make backups, how to handle
hardware and software failures, and how to change the structure of the database.

5. People: this component is composed of the people in the organization who are
responsible for, or play a role in, designing, implementing, managing, administering and
using the resources in the database. It ranges from people with a
high level of knowledge about the database and the design technology to others with
no knowledge of the system beyond using the data in the database.

1.5. Database Development Life Cycle


As database design is one component of most information system development tasks, there are
several steps in designing a database system. Here more emphasis is given to the design phases
of the system development life cycle. The major steps in database design are:

1. Planning: identifying an information gap in the organization and proposing a
database solution to solve the problem.


2. Analysis: concentrates on fact finding about the problem or the
opportunity. Feasibility analysis, requirement determination and structuring, and
selection of the best design method are also performed in this phase.
3. Design: in database design, most emphasis is given to this phase. The phase is
further divided into three sub-phases.
a. Conceptual Design: a concise description of the data, data types, relationships
between data and constraints on the data.
• There is no implementation or physical detail consideration.
• Used to elicit and structure all information requirements
b. Logical Design: a refinement of the conceptual design that uses a selected specific
data model (e.g. the relational model) to represent the data structure.
• It is independent of any particular DBMS and has no other physical
considerations.
c. Physical Design: the physical implementation of the higher-level design of the
database with respect to the internal storage and file structure of the database for
the selected DBMS.
• Develops all technology and organizational specifications.

4. Implementation: the deployment and testing of the designed database for use.
5. Operation and Support: administering and maintaining the operation of the
database system and providing support to users.

Roles in Database Design and Use


As people are one of the components in the DBMS environment, there is a group of roles
played by different stakeholders in the design and operation of a database system.

1. Database Administrator (DBA)


➢ Responsible for overseeing, controlling and managing the database resources (the
database itself, the DBMS and other related software)
➢ Authorizing access to the database
➢ Coordinating and monitoring the use of the database
➢ Responsible for determining and acquiring hardware and software resources

➢ Accountable for problems like poor security and poor performance of the system
➢ Involved in all steps of database development
In big organizations with huge amounts of data and user requirements, this role can be
further classified as follows.

a. Data Administrator (DA): responsible for the management of data
resources. Involved in database planning, development, and maintenance of
standards, policies and procedures at the conceptual and logical design
phases.

b. Database Administrator (DBA): a more technical role, responsible
for the physical realization of the database. Involved in physical design,
implementation, security and integrity control of the database.

2. Database Designer (DBD)


➢ Identifies the data to be stored and chooses the appropriate structures to
represent and store the data.
➢ Should understand the user requirements and should choose how the user
views the database.
➢ Involved in the design phase before the implementation of the database system.
There are two kinds of database designers: one involved in logical
and conceptual design and another involved in physical design.

a. Logical and Conceptual DBD

➢ Identifies data (entities, attributes and relationships) relevant to the
organization
➢ Identifies constraints on each data item
➢ Understands the data and business rules in the organization
➢ Sees the database independently of any data model at the conceptual level
and considers one specific data model in the logical design phase.

b. Physical DBD
➢ Takes the logical design specification as input and decides how it should be
physically realized.
➢ Maps the logical data model onto the specified DBMS with respect to tables
and integrity constraints (DBMS-dependent design).
➢ Selects specific storage structures and access paths for the database.
➢ Designs the security measures required on the database.
3. Application Programmer and Systems Analyst
➢ The systems analyst determines the user requirements and how the users want to view
the database.
➢ The application programmer implements these specifications as programs: codes,
tests, debugs, documents and maintains the application programs.
➢ Determine the interface for retrieving, inserting, updating and deleting data in the
database.
➢ The application could use any high-level programming language according to the
availability, the facilities and the required service.

4. End Users
Workers whose jobs require accessing the database frequently for various purposes.
There are different groups of users in this category:

a. Naïve Users:
➢ A sizable proportion of users
➢ Unaware of the DBMS
➢ Only access the database based on their access level and demand
➢ Use standard and pre-specified types of queries.
b. Sophisticated Users
➢ Are users familiar with the structure of the database and the facilities of the
DBMS.
➢ Have complex requirements
➢ Have higher-level queries
➢ Are most of the time engineers, scientists, business analysts, etc.

c. Casual Users
➢ Users who access the database occasionally.
➢ Need different information from the database each time.
➢ Use sophisticated database queries to satisfy their needs.
➢ Are most of the time middle- to high-level managers.
The people involved with a database can also be classified as “Actors on the Scene” and
“Workers Behind the Scene”.

Actors on the Scene:


➢ Data Administrator
➢ Database Administrator
➢ Database Designer
➢ End Users

Workers behind the Scene


➢ DBMS designers and implementers: who design and implement different DBMS
software.
➢ Tool Developers: experts who develop software packages that facilitate database
system design and use. Developers of prototyping, simulation and code generation tools
are examples. Independent software vendors could also be categorized in this
group.
➢ Operators and Maintenance Personnel: system administrators who are
responsible for actually running and maintaining the hardware and software of the
database system and the information technology facilities.

ANSI-SPARC Architecture
The purpose and origin of the three-level database architecture:
➢ All users should be able to access the same data. This is important since the
database has a shared data feature, where all the data is stored in one
location and every user has their own customized way of interacting with
the data.

➢ A user's view is unaffected by, or immune to, changes made in other views. Since
the requirements of one user are independent of those of others, a change made in one
user's view should not affect other users.
➢ Users should not need to know physical database storage details. As there are
naïve users of the system, hardware-level or physical details should be a black
box for such users.
➢ The DBA should be able to change database storage structures without affecting the
users' views. A change in file organization or access method should not affect the
structure of the data, which in turn will have no effect on the users.
➢ The internal structure of the database should be unaffected by changes to physical
aspects of storage.
➢ The DBA should be able to change the conceptual structure of the database without
affecting all users. In any database system, the DBA has the privilege to
change the structure of the database, like adding tables, adding and deleting
attributes, and changing the specification of the objects in the database.
All of the above, and much other functionality, is made possible by the three-level
ANSI-SPARC architecture.

Figure: Three-level ANSI-SPARC Architecture of a Database


Figure: ANSI-SPARC Architecture and Database Design Phases

External Level: Users' view of the database. Describes that part of the database that is
relevant to a particular user. Different users have their own customized view of the
database independent of other users.
Conceptual Level: Community view of the database. Describes what data is stored in
database and relationships among the data.
Internal Level: Physical representation of the database on the computer. Describes how
the data is stored in the database.
The difference between the three levels of the ANSI-SPARC database architecture can be
illustrated as follows. The first level is concerned with groups of users and their
respective data requirements, each independent of the others.
The second level describes the whole content of the database, where each piece of
information is represented once. The third level describes the physical storage of the data.
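As a rough sketch of how the external and conceptual levels can map onto SQL, assume Python's sqlite3 module as the DBMS and a hypothetical staff table as the conceptual schema. Views then play the role of external schemas for two hypothetical user groups, while the storage details SQLite chooses for itself correspond to the internal level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one community view of all staff data (hypothetical).
conn.execute(
    "CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, branch TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [("S1", "Hana", "Bonga", 9000.0), ("S2", "Dawit", "Jimma", 8500.0)],
)

# External level: each user group gets its own customized view.
conn.execute("CREATE VIEW payroll_view AS SELECT staff_no, salary FROM staff")
conn.execute("CREATE VIEW directory_view AS SELECT name, branch FROM staff")

print(conn.execute("SELECT * FROM payroll_view ORDER BY staff_no").fetchall())
# [('S1', 9000.0), ('S2', 8500.0)]
print(conn.execute("SELECT * FROM directory_view ORDER BY name").fetchall())
# [('Dawit', 'Jimma'), ('Hana', 'Bonga')]

# Internal level (file layout, pages, indexes) is handled by SQLite itself
# and is invisible to both views.
```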


Figure: Differences between the Three Levels of the ANSI-SPARC Architecture

DBMS schemas at three levels:


Internal schema: at the internal level to describe physical storage structures and access
paths. Typically uses a physical data model.
Conceptual schema: at the conceptual level to describe the structure and constraints for
the whole database for a community of users. Uses a conceptual or an implementation
data model.
External schema: at the external level to describe the various user views. Usually uses the
same data model as the conceptual level.

Data Independence
Logical Data Independence:
➢ Refers to the immunity of external schemas to changes in the conceptual schema.
➢ Conceptual schema changes, e.g. addition/removal of entities, should not
require changes to the external schema or rewrites of application programs.
➢ The capacity to change the conceptual schema without having to change the
external schemas and their application programs.
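Logical data independence can be sketched with Python's sqlite3 module, assuming a hypothetical student table and a hypothetical application that reads only through a view (its external schema). Adding an attribute and a whole new entity to the conceptual schema leaves the application's query and its result unchanged:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO student VALUES ('S01', 'Abebe', 'CS')")

# External schema used by a hypothetical application program.
conn.execute("CREATE VIEW student_names AS SELECT id, name FROM student")


def app_query():
    """The application only ever talks to its external view."""
    return conn.execute("SELECT id, name FROM student_names").fetchall()


before = app_query()

# Conceptual-schema change: add a new attribute and a new entity.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")

after = app_query()
print(before == after, after)  # True [('S01', 'Abebe')]
```

The independence is not absolute: dropping a column that the view itself uses would still force the external schema to change.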


Physical Data Independence


➢ The ability to modify the physical schema without changing the logical
schema
➢ Applications depend on the logical schema
➢ In general, the interfaces between the various levels and components should
be well defined so that changes in some parts do not seriously influence
others.
➢ The capacity to change the internal schema without having to change the
conceptual schema
➢ Refers to the immunity of the conceptual schema to changes in the internal schema
➢ Internal schema changes, e.g. using different file organizations or storage
structures/devices, should not require changes to the conceptual or external
schemas.
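Physical data independence can be illustrated by changing only the internal schema. The sketch assumes Python's sqlite3 module and a hypothetical student table: adding an index changes the access path that SQLite reports through EXPLAIN QUERY PLAN, but the application's query text and its result stay the same. The exact plan wording differs between SQLite versions, so the printed plans are only indicative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, dept TEXT)")
conn.executemany(
    "INSERT INTO student (dept) VALUES (?)",
    [("CS",), ("IT",), ("CS",), ("SE",)] * 250,  # 1000 rows, 500 of them CS
)

QUERY = "SELECT COUNT(*) FROM student WHERE dept = 'CS'"  # the application's query


def access_path() -> str:
    """Ask SQLite how it would execute QUERY (the detail column of the plan)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))


result_before = conn.execute(QUERY).fetchone()[0]
plan_before = access_path()

# Internal-schema change: add an index, i.e. a new access path.
conn.execute("CREATE INDEX idx_student_dept ON student(dept)")

result_after = conn.execute(QUERY).fetchone()[0]
plan_after = access_path()

print(result_before == result_after == 500)  # True
print(plan_before)  # e.g. a full scan of student
print(plan_after)   # e.g. a search using idx_student_dept
```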

Figure: Data Independence and the ANSI-SPARC Three-level Architecture

Difference between a Data Definition Language (DDL), Data Manipulation
Language (DML) and Data Control Language (DCL)

1.6. Database Languages

Data Definition Language (DDL)


➢ Allows the DBA or user to describe and name the entities, attributes and
relationships required for the application.
➢ Specification notation for defining the database schema
Data Manipulation Language (DML)
➢ Provides basic data manipulation operations on data held in the database.
➢ Language for accessing and manipulating the data organized by the
appropriate data model
➢ DML is also known as a query language
Procedural DML: the user specifies what data is required and how to get the data.
Non-Procedural DML: the user specifies what data is required but not how it is to be retrieved.
SQL is the most widely used non-procedural query language.
Data Control Language (DCL): used to define security on the data in the database.
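The procedural and non-procedural styles are easier to compare side by side. In this sketch (Python with the built-in sqlite3 module; the employee table and its rows are hypothetical), the first function walks the records one at a time in host-language code, in the spirit of a procedural DML, while the second states only the desired result in SQL and leaves the access strategy to the DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Hana", "CS", 9000), ("Dawit", "IT", 7000), ("Sara", "CS", 6500)],
)


# Procedural style: we spell out HOW to get the data, record by record.
def cs_high_earners_procedural(min_salary: float) -> list[str]:
    result = []
    for name, dept, salary in conn.execute("SELECT * FROM employee"):
        if dept == "CS" and salary > min_salary:
            result.append(name)
    return sorted(result)


# Non-procedural style (SQL): we state WHAT we want; the DBMS decides how.
def cs_high_earners_declarative(min_salary: float) -> list[str]:
    rows = conn.execute(
        "SELECT name FROM employee WHERE dept = 'CS' AND salary > ? ORDER BY name",
        (min_salary,),
    )
    return [name for (name,) in rows]


print(cs_high_earners_procedural(6000))   # ['Hana', 'Sara']
print(cs_high_earners_declarative(6000))  # ['Hana', 'Sara']
```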

1.7. Classification of data models

Data Model
A specific DBMS has its own specific Data Definition Language, but this type of
language is too low-level to describe the data requirements of an organization in a
way that is readily understandable by a variety of users.
We need a higher-level language.
Such a higher-level description is called a data model.
Data Model: a set of concepts to describe the structure of a database, and
certain constraints that the database should obey.


A data model is a description of the way that data is stored in a database. A data model
helps to understand the relationships between entities and to create the most effective
structure to hold data.
A data model is a collection of tools or concepts for describing
➢ Data
➢ Data relationships
➢ Data semantics
➢ Data constraints
The main purpose of a data model is to represent the data in an understandable way.
Categories of data models include:
➢ Object-based
➢ Record-based
➢ Physical
Record-based Data Models
Record-based data models consist of a number of fixed-format records.
Each record type defines a fixed number of fields,
and each field is typically of a fixed length. Examples include:
➢ Hierarchical Data Model
➢ Network Data Model
➢ Relational Data Model

1. Hierarchical Model
• The simplest data model
• A record type is referred to as a node or segment
• The top node is the root node
• Nodes are arranged in a hierarchical structure, as a sort of upside-down tree
• A parent node can have more than one child node
• A child node can have only one parent node
• The relationship between parent and child is one-to-many

• Relationships are established by creating physical links between stored records (each record is stored with a predefined access path to other records)
• To add a new record type or relationship, the database must be redefined and then stored in a new form.

Figure: Hierarchical model example (Department at the root; Employee and Job on the second level; Time Card and Activity on the third level)

ADVANTAGES of Hierarchical Data Model:
• The hierarchical model is simple to construct and operate on
• Corresponds to a number of naturally hierarchical domains, e.g. assemblies in manufacturing, personnel organization in companies
• The language is simple; it uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT etc.
DISADVANTAGES of Hierarchical Data Model:
• Navigational and procedural nature of processing
• The database is visualized as a linear arrangement of records
• Little scope for "query optimization"
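A minimal sketch of the hierarchical rules above, assuming records are held as in-memory Python objects rather than in a real hierarchical DBMS: each `Segment` can have many children but only one parent, and `get_next_within_parent` (a hypothetical helper) loosely imitates GET NEXT WITHIN PARENT by walking a parent's children in stored order. The record keys and the placement of the Time Card segment under Employee are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Segment:
    """One record (node/segment) in a toy hierarchical database."""
    kind: str
    key: str
    # Excluded from repr/compare to avoid recursing back up the tree.
    parent: Optional["Segment"] = field(default=None, repr=False, compare=False)
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, child: "Segment") -> "Segment":
        # Enforce the hierarchical rule: a child has exactly one parent.
        if child.parent is not None:
            raise ValueError("a child segment can have only one parent")
        child.parent = self
        self.children.append(child)  # a parent may have many children (1:M)
        return child


def get_next_within_parent(parent: Segment, kind: str) -> Iterator[Segment]:
    """Loosely imitates GET NEXT WITHIN PARENT: visit the parent's
    children one at a time, in stored order, yielding those of `kind`."""
    for child in parent.children:
        if child.kind == kind:
            yield child


# Hypothetical data shaped after the Department example.
dept = Segment("Department", "D1")
emp1 = dept.add_child(Segment("Employee", "E1"))
emp2 = dept.add_child(Segment("Employee", "E2"))
job1 = dept.add_child(Segment("Job", "J1"))
emp1.add_child(Segment("TimeCard", "T1"))

employees = [s.key for s in get_next_within_parent(dept, "Employee")]
print(employees)  # ['E1', 'E2']
```

Because every access starts from a parent and follows stored links, the program itself decides the access path, which is the navigational, procedural style listed under the disadvantages.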

2. Network Model
◼ Allows record types to have more than one parent, unlike the hierarchical model
◼ A network data model sees records as set members
◼ Each set has an owner and one or more members
◼ Allows many-to-many relationships between entities

◼ Like the hierarchical model, the network model is a collection of physically linked records.
◼ Allows member records to have more than one owner

Figure: Network model example (record types Department, Job, Employee, Activity and Time Card)

ADVANTAGES of Network Data Model:
• The network model is able to model complex relationships and represents the semantics of add/delete on the relationships.
• Can handle most modeling situations using record types and relationship types.
• The language is navigational; it uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc. Programmers can do optimal navigation through the database.
DISADVANTAGES of Network Data Model:
• Navigational and procedural nature of processing
• The database contains a complex array of pointers that thread through a set of records.
• Little scope for automated "query optimization"
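The owner/member idea can be sketched the same way. In this toy `NetworkDB` class (a hypothetical illustration, not CODASYL syntax), a named set type links one owner to many members, and because a record may belong to several set types it can have more than one owner. The set-type names `Staffs` and `Performs` and the record keys are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class NetworkDB:
    """Toy network-model store built from named set types.

    Each set occurrence has one owner record and an ordered list of member
    records. Because a record can be a member of several set types, it can
    have more than one owner, which the hierarchical model does not allow.
    """

    def __init__(self) -> None:
        self._members: Dict[Tuple[str, str], List[str]] = defaultdict(list)
        self._owner: Dict[Tuple[str, str], str] = {}

    def connect(self, set_type: str, owner: str, member: str) -> None:
        # Within a single set type, a member belongs to only one owner.
        if (set_type, member) in self._owner:
            raise ValueError(f"{member} already has an owner in {set_type}")
        self._members[(set_type, owner)].append(member)
        self._owner[(set_type, member)] = owner

    def find_members(self, set_type: str, owner: str) -> List[str]:
        """Navigate owner -> members (like FIND member / FIND NEXT within set)."""
        return list(self._members.get((set_type, owner), []))

    def find_owner(self, set_type: str, member: str) -> str:
        """Navigate member -> owner (like FIND owner)."""
        return self._owner[(set_type, member)]


# Hypothetical set types: a Department owns Employees, a Job owns Employees.
db = NetworkDB()
db.connect("Staffs", "Dept1", "Emp1")
db.connect("Staffs", "Dept1", "Emp2")
db.connect("Performs", "JobA", "Emp1")

print(db.find_members("Staffs", "Dept1"))  # ['Emp1', 'Emp2']
print(db.find_owner("Staffs", "Emp1"))     # Dept1
print(db.find_owner("Performs", "Emp1"))   # JobA
```

Emp1 ends up with two owners (Dept1 and JobA), one per set type; the program still has to walk these links explicitly, which is why the model offers little room for automated query optimization.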

3. Relational Data Model


• Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A Relational Model of Data for Large Shared Data Banks')

• Its terminology originates from set theory and the mathematical concept of a relation
• Can define more flexible and complex relationships
• Viewed as a collection of tables called “relations”, equivalent to a collection of record types
• Relation: a two-dimensional table
• Stores information or data in the form of tables made of rows and columns
• A row of a table is called a tuple (equivalent to a record)
• A column of a table is called an attribute (equivalent to a field)
• A data value is the value of an attribute in a particular tuple
• Records are related by the data stored jointly in the fields of records in two tables or files. The related tables contain the information that creates the relation
• The tables seem to be independent but are related through these shared values
• No physical consideration of the storage is required by the user
• Several tables can be combined (joined) to come up with a new virtual view of the relationship

Alternative terminologies

Formal term    Table term    File term
Relation       Table         File
Tuple          Row           Record
Attribute      Column        Field

• The rows represent records (collections of information about separate items)
• The columns represent fields (particular attributes of a record)
• Conducts searches by using data in specified columns of one table to find
additional data in another table
• In conducting searches, a relational database matches information from a
field in one table with information in a corresponding field of another table
to produce a third table that combines requested data from both tables
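Matching a field in one table with the corresponding field of another is what SQL calls a join. A minimal sketch with Python's `sqlite3` module, assuming hypothetical `Department` and `Employee` tables that share a `DeptId` column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptId TEXT PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employee (EmpId INTEGER PRIMARY KEY, Name TEXT, DeptId TEXT);
    INSERT INTO Department VALUES ('D1', 'Computer Science'), ('D2', 'Physics');
    INSERT INTO Employee VALUES (1, 'Abebe', 'D1'), (2, 'Sara', 'D2'), (3, 'Kebede', 'D1');
""")

# Match Employee.DeptId with Department.DeptId to build a third, combined table.
rows = conn.execute("""
    SELECT e.Name, d.DeptName
    FROM Employee AS e
    JOIN Department AS d ON e.DeptId = d.DeptId
    ORDER BY e.EmpId
""").fetchall()
print(rows)
# [('Abebe', 'Computer Science'), ('Sara', 'Physics'), ('Kebede', 'Computer Science')]
```

Neither table stores a pointer to the other; the relationship exists only through the equal `DeptId` values, which is why no physical storage details are needed to ask the question.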


CHAPTER TWO
RELATIONAL DATA MODEL

PROPERTIES OF RELATIONAL DATABASES

• Each row of a table is uniquely identified by a PRIMARY KEY composed of one or more columns
• Each tuple in a relation must be unique
• A minimal group of one or more columns that uniquely identifies a row in a table is called a CANDIDATE KEY
• The ENTITY INTEGRITY RULE of the model states that no component of the primary key may contain a NULL value.
• A column or combination of columns that matches the primary key of another table is called a FOREIGN KEY. It is used to cross-reference tables.
• The REFERENTIAL INTEGRITY RULE of the model states that, for every foreign key value in a table, there must be a corresponding primary key value in another table in the database, or the foreign key value should be NULL.
• All tables are LOGICAL ENTITIES
• A table is either a BASE TABLE (named relation) or a VIEW (unnamed relation)
• Only base tables are physically stored
• VIEWS are derived from BASE TABLES with SQL instructions like:
[SELECT .. FROM .. WHERE .. ORDER BY]
• A relational database is a collection of tables
o Each entity is stored in one table
o Attributes are fields (columns) in a table

• Order of rows and columns is immaterial
• Entries with repeating groups are said to be un-normalized
• Entries are single-valued
• Each column (field or attribute) has a distinct name

• All values in a column represent the same attribute and have the same data format


2.1. Building Blocks of the Relational Data Model


The building blocks of the relational data model are:

➢ Entities: real-world physical or logical objects
➢ Attributes: properties used to describe each entity or real-world object
➢ Relationships: the associations between entities
➢ Constraints: rules that should be obeyed while manipulating the data

1. The ENTITIES (persons, places, things etc.) which the organization has to deal with. Relations can also describe relationships.

The name given to an entity should always be a singular noun descriptive of each item to be stored in it. E.g.: student NOT students.

Every relation has a schema, which describes its columns (fields). The relation itself corresponds to our familiar notion of a table: a relation is a collection of tuples, each of which contains values for a fixed number of attributes.

◼ Existence Dependency: the dependence of an entity on the existence of one or more other entities.

◼ Weak entity: an entity that cannot exist without the entity with which it has a relationship; it is indicated by a double rectangle.

2. The ATTRIBUTES - the items of information which characterize and describe these
entities.

Attributes are pieces of information ABOUT entities. The analysis must, of course, identify those which are actually relevant to the proposed application. Attributes will give rise to recorded items of data in the database.

At this level we need to know such things as:

• Attribute name (which should be explanatory words or phrases)
• The domain from which attribute values are taken (a DOMAIN is a set of values from which attribute values may be taken). Each attribute has values taken from a domain. For example, the domain of Name is string and that of Salary is real.
• Whether the attribute is part of the entity identifier (attributes which just
describe an entity and those which help to identify it uniquely)
• Whether it is permanent or time-varying (which attributes may change their
values over time)
• Whether it is required or optional for the entity (whose values will
sometimes be unknown or irrelevant)

Types of Attributes
(1) Simple (atomic) vs. Composite attributes
• Simple: contains a single value (not divided into sub-parts), e.g. Age, Gender
• Composite: divided into sub-parts (composed of other attributes), e.g. Name, Address
(2) Single-valued vs. Multi-valued attributes
Single-valued: has only a single value at any one time (the value may change), e.g. Name, Sex, Id. No., color_of_eyes
Multi-valued: may have more than one value, e.g. Address, college_degree (a person may have several college degrees)
(3) Stored vs. Derived attributes
Stored: cannot be derived or computed from other attributes, e.g. Name, Address
Derived: the value may be derived (computed) from the values of other attributes, e.g.
Age (current year - year of birth)

Length of employment (current date - start date)
Profit (earning - cost)
G.P.A (total grade points / total credit hours)
(4) Null Values
• NULL applies to attributes which are not applicable or which do not have values.
• You may enter the value NA (meaning not applicable).
• The value of a key attribute cannot be null.
Default value: the value assumed if no explicit value is given
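Derived attributes need only their source values to be stored. The functions below are a minimal sketch of the formulas listed above; the function names, the fixed "current date" and the 4-point grade scale are assumptions made so the results are repeatable. Note that `age` follows the simple year-difference rule in the text and ignores whether the birthday has already passed this year.

```python
from datetime import date
from typing import List, Tuple


def age(year_of_birth: int, today: date) -> int:
    # Derived attribute: Age = current year - year of birth (the rule in the text).
    return today.year - year_of_birth


def length_of_employment_days(start: date, today: date) -> int:
    # Derived attribute: current date - start date, expressed in days.
    return (today - start).days


def gpa(courses: List[Tuple[int, float]]) -> float:
    """Derived attribute: total grade points / total credit hours.

    `courses` holds (credit_hours, grade_point) pairs, e.g. (3, 4.0) for an A
    in a 3-credit course on a hypothetical 4-point scale."""
    total_hours = sum(hours for hours, _ in courses)
    total_points = sum(hours * point for hours, point in courses)
    return round(total_points / total_hours, 2)


today = date(2023, 1, 15)  # fixed "current date" so the results are repeatable
print(age(2000, today))                                     # 23
print(length_of_employment_days(date(2022, 1, 15), today))  # 365
print(gpa([(3, 4.0), (4, 3.0), (2, 2.0)]))                  # 3.11
```

Only year of birth, start date and per-course grades need to be kept; the derived values are recomputed whenever they are requested, so they can never go stale.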

Entity versus Attributes


When designing the conceptual specification of the database, one should pay attention
to the distinction between an Entity and an Attribute.

◼ Consider designing a database of employees for an organization:

◼ Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?

If we have several addresses per employee, address must be an entity (attributes cannot be set-valued/multi-valued).

If the structure (city, Woreda, Kebele, etc.) is important, e.g. if we want to retrieve the employees in a given city, address must be modeled as an entity (attribute values are atomic).
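A minimal sketch of modelling address as a separate entity, using Python's `sqlite3` module; the table layout and the Woreda/Kebele values are hypothetical. Each address is its own row linked to an employee, so an employee can have several addresses and the city can be queried directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
    CREATE TABLE Employee (
        EmpId INTEGER PRIMARY KEY,
        Name  TEXT NOT NULL
    );
    -- Address modelled as its own entity: one row per address, so an
    -- employee can have several, and City/Woreda/Kebele stay atomic.
    CREATE TABLE Address (
        AddrId INTEGER PRIMARY KEY,
        EmpId  INTEGER NOT NULL REFERENCES Employee(EmpId),
        City   TEXT NOT NULL,
        Woreda TEXT,
        Kebele TEXT
    );
    INSERT INTO Employee VALUES (1, 'Abebe'), (2, 'Sara');
    INSERT INTO Address (EmpId, City, Woreda, Kebele) VALUES
        (1, 'Bonga', 'W1', 'K01'),
        (1, 'Addis Ababa', 'W3', 'K07'),
        (2, 'Jimma', 'W2', 'K04');
""")

# Retrieve the employees who have an address in a given city.
in_city = conn.execute("""
    SELECT DISTINCT e.Name
    FROM Employee AS e JOIN Address AS a ON a.EmpId = e.EmpId
    WHERE a.City = ?
""", ("Bonga",)).fetchall()
print(in_city)  # [('Abebe',)]
```

Had address been a single text attribute of Employee, the second address for employee 1 would have nowhere to go, and finding employees by city would require picking the city out of a string.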

3. The RELATIONSHIPS between entities which exist and must be taken into account
when processing information. In any business processing one object may be associated
with another object due to some event. Such kind of association is what we call a
RELATIONSHIP between entity objects.

• One external event or process may affect several related entities.
• Related entities require setting of LINKS from one part of the database to another.
• A relationship should be named by a word or phrase which explains its function.
• Role names are different from the names of the entities forming the relationship: one entity may take on many roles, and the same role may be played by different entities.
• For each RELATIONSHIP, one can talk about the number of entities and the number of tuples participating in the association. These two concepts are called the DEGREE and the CARDINALITY of a relationship, respectively.

Degree of a Relationship
An important point about a relationship is how many entities participate in it.
The number of entities participating in a relationship is called the DEGREE
of the relationship.

Among the Degrees of relationship, the following are the basic:


o UNARY/RECURSIVE RELATIONSHIP: tuples/records of a single entity are related with each other.
o BINARY RELATIONSHIP: tuples/records of two entities are associated in a relationship.
o TERNARY RELATIONSHIP: tuples/records of three different entities are associated.
o And a generalized one:

N-ARY RELATIONSHIP: tuples from an arbitrary number of entity sets participate in a relationship.

Cardinality of a Relationship
Another important concept about relationship is the number of
instances/tuples that can be associated with a single instance from one entity
in a single relationship. The number of instances participating or associated
with a single instance from an entity in a relationship is called the
CARDINALITY of the relationship. The major cardinalities of a
relationship are:
o ONE-TO-ONE: one tuple is associated with only one other tuple.
▪ E.g. Building – Location as a single building will be located
in a single location and as a single location will only
accommodate a single Building.
o ONE-TO-MANY, one tuple can be associated with many other
tuples, but not the reverse.
▪ E.g. Department-Student as one department can have
multiple students.
o MANY-TO-ONE, many tuples are associated with one tuple but not
the reverse.
▪ E.g. Employee – Department: as many employees belong to
a single department.
o MANY-TO-MANY: one tuple is associated with many other tuples and, from the other side (with a different role name), one tuple will be associated with many tuples.
▪ E.g. Student – Course as a student can take many courses and a single course can be attended by many students.
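In a relational schema, a many-to-many relationship such as Student-Course is usually stored as a separate table holding the keys of both participants. A minimal sketch with Python's `sqlite3` module; the `Takes` table name, course codes and rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Course  (CourseId TEXT PRIMARY KEY, Title TEXT NOT NULL);
    -- The M:N relationship becomes its own table: each row links one student
    -- to one course, and the composite key forbids duplicate pairs.
    CREATE TABLE Takes (
        StudId   INTEGER NOT NULL REFERENCES Student(StudId),
        CourseId TEXT    NOT NULL REFERENCES Course(CourseId),
        PRIMARY KEY (StudId, CourseId)
    );
    INSERT INTO Student VALUES (1, 'Abebe'), (2, 'Sara');
    INSERT INTO Course  VALUES ('C1', 'Databases'), ('C2', 'Data Structures');
    INSERT INTO Takes   VALUES (1, 'C1'), (1, 'C2'), (2, 'C1');
""")

# One student -> many courses.
courses_of_1 = conn.execute(
    "SELECT CourseId FROM Takes WHERE StudId = 1 ORDER BY CourseId").fetchall()
# One course -> many students.
students_of_c1 = conn.execute(
    "SELECT StudId FROM Takes WHERE CourseId = 'C1' ORDER BY StudId").fetchall()
print(courses_of_1)    # [('C1',), ('C2',)]
print(students_of_c1)  # [(1,), (2,)]
```

A one-to-many relationship, by contrast, usually needs no extra table: the key of the "one" side is simply placed as a foreign key in the "many" side.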

4. Relational Constraints/Integrity Rules

Relational Integrity

➢ Domain Integrity: no value of an attribute should be beyond the allowable limits (its domain)
➢ Entity Integrity: in a base relation, no attribute of a primary key can assume a value of NULL
➢ Referential Integrity: if a foreign key exists in a relation, either the foreign key value must match a candidate key value in its home relation or the foreign key value must be NULL
➢ Enterprise Integrity: additional rules specified by the users or database administrators of a database are incorporated
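The first three integrity rules map directly onto SQL constraints. A minimal sketch with Python's `sqlite3` module, where the tables, the `Salary > 0` domain rule and the test rows are hypothetical. Two SQLite specifics are assumed: foreign keys are only enforced after `PRAGMA foreign_keys = ON`, and a non-INTEGER primary key column needs an explicit `NOT NULL`, because SQLite (for historical reasons) otherwise accepts NULL there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE Department (
        DeptId TEXT NOT NULL PRIMARY KEY           -- entity integrity: no NULL key
    );
    CREATE TABLE Employee (
        EmpId  INTEGER PRIMARY KEY,
        Salary REAL CHECK (Salary > 0),            -- domain integrity (hypothetical rule)
        DeptId TEXT REFERENCES Department(DeptId)  -- referential integrity
    );
    INSERT INTO Department VALUES ('D1');
""")


def try_sql(sql: str) -> str:
    """Run one statement; report whether the DBMS accepted or rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    "valid row": try_sql("INSERT INTO Employee VALUES (1, 5000, 'D1')"),
    "NULL foreign key": try_sql("INSERT INTO Employee VALUES (2, 4000, NULL)"),
    "dangling foreign key": try_sql("INSERT INTO Employee VALUES (3, 4000, 'D9')"),
    "salary out of domain": try_sql("INSERT INTO Employee VALUES (4, -10, 'D1')"),
    "NULL primary key": try_sql("INSERT INTO Department VALUES (NULL)"),
}
for case, outcome in results.items():
    print(f"{case:22} {outcome}")
```

The NULL foreign key is accepted, exactly as the referential integrity rule allows, while the dangling reference, the out-of-domain salary and the NULL primary key are rejected. Enterprise integrity rules usually go beyond column constraints and are enforced with triggers or application code.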

Key constraints
Since tuples need to be unique in the database, we need to make each tuple distinct. To do this we need relational keys that uniquely identify each tuple within a relation.

Super Key: an attribute or set of attributes that uniquely identifies a tuple within a relation.
Candidate Key: a super key such that no proper subset of that collection is a super key within the relation. A candidate key has two properties:
1. Uniqueness
2. Irreducibility
If a super key has only one attribute, it is automatically a candidate key.
Primary Key: the candidate key that is selected to identify tuples uniquely within the relation.
In the worst case, the entire set of attributes in a relation can serve as the primary key.
Foreign Key: an attribute, or set of attributes, within one relation that matches the candidate key of some (possibly the same) relation.
A foreign key is a link between different relations, used to create a view or unnamed relation.
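The super key and candidate key definitions can be checked mechanically against a sample of tuples. A minimal sketch in plain Python, assuming a relation is a list of dictionaries; the function names and the STUDENT rows are hypothetical. An instance can only show that a set of attributes is not a key; whether it is a key for every possible state is decided from the meaning of the data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

Row = Dict[str, object]


def is_super_key(rows: List[Row], attrs: Sequence[str]) -> bool:
    """Uniqueness: no two tuples share the same values on all of `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows: List[Row], attributes: Sequence[str]) -> List[FrozenSet[str]]:
    """Super keys none of whose proper subsets is a super key (irreducibility).

    Smaller attribute sets are tried first, so any super key that contains an
    already-found key is reducible and is skipped."""
    found: List[FrozenSet[str]] = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            key = frozenset(combo)
            if any(k < key for k in found):
                continue
            if is_super_key(rows, combo):
                found.append(key)
    return found


# Hypothetical instance of STUDENT.
students = [
    {"Id": 1, "Name": "Abebe", "Dept": "CS"},
    {"Id": 2, "Name": "Sara", "Dept": "CS"},
    {"Id": 3, "Name": "Abebe", "Dept": "IT"},
]
print(is_super_key(students, ["Id", "Name"]))  # True (but reducible)
keys = candidate_keys(students, ["Id", "Name", "Dept"])
print([sorted(k) for k in keys])               # [['Id'], ['Dept', 'Name']]
```

In this instance {Id, Name} is a super key but not a candidate key, because its subset {Id} is already unique. Choosing between the candidates {Id} and {Dept, Name} as primary key is a design decision.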


2.2. Relational Views

Relations are perceived as tables from the users’ perspective. Actually, there are two kinds of relation in a relational database: named and unnamed relations. The basic difference is in how the relation is created, used and updated:
1. Base Relation
A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database.

2. View (Unnamed Relation)

A view is the dynamic result of one or more relational operations operating on the base relations to produce another virtual relation that does not actually exist as presented. So a view is a virtual, derived relation that does not necessarily exist in the database but can be produced upon request by a particular user at the time of request. The virtual table or relation can be created from a single relation or from several relations by extracting some attributes and records, with or without conditions.

Purpose of a view
➢ Hides unnecessary information from users: only part of the base relation (some collection of attributes, not necessarily all) is included in the virtual table.
➢ Provides powerful flexibility and security: since unnecessary information is hidden from the user, there is some degree of data security.
➢ Provides a customized view of the database for users: each user interacts with their own preferred data set and format by making use of views.
➢ A view of one base relation can generally be updated.

➢ Updates on views derived from several relations are generally not allowed, since they may violate the integrity of the database.
➢ Updates on views with aggregation and summary are not allowed, since aggregation and summary results are computed from a base relation and do not actually exist.
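A minimal sketch of both kinds of view using Python's `sqlite3` module; the `Employee` table, its rows and the view names are hypothetical. `EmployeeDirectory` hides the `Salary` column of a single base relation, and `DeptPayroll` is an aggregated view whose totals are recomputed on every request. SQLite is stricter than the rules listed above: all its views are read-only unless `INSTEAD OF` triggers are defined, so whether a single-table view can be updated depends on the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmpId INTEGER PRIMARY KEY, Name TEXT, Dept TEXT, Salary REAL
    );
    INSERT INTO Employee VALUES
        (1, 'Abebe', 'CS', 9000), (2, 'Sara', 'IT', 8000), (3, 'Kebede', 'CS', 7000);

    -- View over one base relation that hides the Salary column.
    CREATE VIEW EmployeeDirectory AS
        SELECT EmpId, Name, Dept FROM Employee;

    -- View with aggregation: its rows are computed, not stored.
    CREATE VIEW DeptPayroll AS
        SELECT Dept, SUM(Salary) AS Total FROM Employee GROUP BY Dept;
""")

directory = conn.execute("SELECT * FROM EmployeeDirectory ORDER BY EmpId").fetchall()
payroll = conn.execute("SELECT * FROM DeptPayroll ORDER BY Dept").fetchall()
print(directory)  # [(1, 'Abebe', 'CS'), (2, 'Sara', 'IT'), (3, 'Kebede', 'CS')]
print(payroll)    # [('CS', 16000.0), ('IT', 8000.0)]

# A view is evaluated at request time, so base-table changes show up at once.
conn.execute("UPDATE Employee SET Salary = 7500 WHERE EmpId = 3")
cs_total = conn.execute("SELECT Total FROM DeptPayroll WHERE Dept = 'CS'").fetchone()
print(cs_total)   # (16500.0,)

# SQLite views are read-only (unless INSTEAD OF triggers are defined).
try:
    conn.execute("UPDATE EmployeeDirectory SET Dept = 'IT' WHERE EmpId = 1")
    view_update_rejected = False
except sqlite3.OperationalError:
    view_update_rejected = True
print(view_update_rejected)  # True
```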

2.3. Schemas, Instances and Database States


When a database is designed using the relational data model, all the data is represented in the form of tables. In such definitions and representation, there are two basic components of the database: the definition of the relation (table) and the actual data stored in each table. The data definition is what we call the schema or the skeleton of the database, and the relations with some information at some point in time are the instance or the flesh of the database.

Schemas
◼ A schema describes how data is to be structured; it is defined at setup/design time (also called "metadata")
◼ Since it is defined during the database development phase, the schema is rarely changed unless system maintenance demands a change to the definition of a relation.
⚫ Database Schema (Intension): specifies the name of each relation and the collection of its attributes (specifically the names of the attributes).
➢ refers to a description of the database (or intension)
➢ specified during database design
➢ should not be changed except during maintenance
⚫ Schema Diagrams
➢ a convention to display some aspects of a schema visually
⚫ Schema Construct
➢ refers to each object in the schema (e.g. STUDENT)
E.g.: STUDENT (FName, LName, Id, Year, Dept, Sex)


Instances
Instance: the collection of data in the database at a particular point in time (snapshot).
Also called the State, Snapshot or Extension of the database
➢ Refers to the actual data in the database at a specific point in time
➢ The state of the database changes any time we add, delete or update an item.
➢ Valid state: a state that satisfies the structure and constraints specified in the schema; it is enforced by the DBMS
◼ Since an instance is the actual data of the database at some point in time, it changes frequently
◼ To define a new database, we specify its database schema to the DBMS (at this point the database is empty)
◼ The database is initialized when we first load it with data
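The difference between a schema and an instance can be observed directly: defining a table fixes its columns (the intension), while every insert or delete produces a new state (the extension). A minimal sketch with Python's `sqlite3` module, reusing the STUDENT schema from the example above with hypothetical rows; `schema` and `instance` are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining the schema (intension): the database now exists but is empty.
conn.execute("""CREATE TABLE Student (
    FName TEXT, LName TEXT, Id TEXT PRIMARY KEY,
    Year INTEGER, Dept TEXT, Sex TEXT)""")


def schema(table: str) -> list:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # The table name is a trusted constant here, so string formatting is safe.
    return [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]


def instance(table: str) -> list:
    # The current state (extension): the rows stored right now.
    return conn.execute(f"SELECT * FROM {table} ORDER BY Id").fetchall()


print(schema("Student"), instance("Student"))  # schema defined, empty state
conn.execute("INSERT INTO Student VALUES ('Abebe', 'Kebede', 'S01', 2, 'CS', 'M')")
conn.execute("INSERT INTO Student VALUES ('Sara', 'Tesfaye', 'S02', 3, 'IT', 'F')")
print(schema("Student"), instance("Student"))  # same schema, new state
conn.execute("DELETE FROM Student WHERE Id = 'S01'")
print(len(instance("Student")))                # 1
```

Across all three prints the schema is identical; only the instance changes.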


CHAPTER THREE
DATABASE DESIGN

Database design is the process of coming up with different kinds of specification for
the data to be stored in the database. The database design part is one of the middle
phases we have in information systems development where the system uses a
database approach. Design is the part on which we would be engaged to describe
how the data should be perceived at different levels and finally how it is going to be
stored in a computer system.
An information system with a database application consists of several tasks, which include:
➢ Planning of Information Systems Design
➢ Requirements Analysis
➢ Design (Conceptual, Logical and Physical Design)
➢ Tuning
➢ Implementation
➢ Operation and Support
Of these phases, the prime interest of a database system is the Design phase, which is further subdivided into three sub-phases.
These sub-phases are:
1. Conceptual Design
2. Logical Design, and
3. Physical Design
➢ In general, one has to go back and forth between these tasks to refine a database
design, and decisions in one task can influence the choices in another task.
➢ In developing a good design, one should answer such questions as:
▪ What are the relevant Entities for the Organization
▪ What are the important features of each Entity
▪ What are the important Relationships
▪ What are the important queries from the user

▪ What are the other requirements of the Organization and the
Users
Figure: The three levels of database design (Conceptual Design, Logical Design and Physical Design)

Conceptual Database Design


◼ Conceptual design is the process of constructing a model of the information used
in an enterprise, independent of any physical considerations.
◼ It is the source of information for the logical design phase.
◼ Mostly uses an Entity Relationship Model to describe the data at this level.
◼ After the completion of Conceptual Design one has to go for refinement of the
schema, which is verification of Entities, Attributes, and Relationships

Logical Database Design


◼ Logical design is the process of constructing a model of the
information used in an enterprise based on a specific data model (e.g.
relational, hierarchical or network or object), but independent of a
particular DBMS and other physical considerations.
◼ Normalization process
◼ Collection of Rules to be maintained
◼ Discover new entities in the process
◼ Revise attributes based on the rules and the discovered Entities

Physical Database Design


◼ Physical design is the process of producing a description of the implementation of the database on secondary storage. It defines the specific storage or access methods used by the database.

◼ Describes the storage structures and access methods used to achieve efficient access to the data.
◼ Tailored to a specific DBMS; its characteristics are a function of the DBMS and the operating system
◼ Includes an estimate of storage space

Conceptual Database Design


◼ Conceptual design revolves around discovering and analyzing
organizational and user data requirements
◼ The important activities are to identify
➢ Entities
➢ Attributes
➢ Relationships
➢ Constraints
◼ And based on these components develop the ER model using ER diagrams

3.1. The Entity Relationship (E-R) Model

◼ Entity-Relationship modeling is used to represent the conceptual view of the database

◼ The main components of ER modeling are:
o Entities
▪ Correspond to an entire table, not a row
▪ Represented by a rectangle
o Attributes
▪ Represent the properties used to describe an entity or a relationship
▪ Represented by an oval
o Relationships
▪ Represent the associations that exist between entities
▪ Represented by a diamond
o Constraints
▪ Represent the constraints on the data
Before working on the conceptual design of the database, one has to know and
answer the following basic questions.

• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should we store in the
database?
• What are the integrity constraints that hold? (Constraints on the data with respect to update, retrieval and storage.)
• Represent this information pictorially in ER diagrams, then map the ER diagram into a relational schema.
3.2. Developing an E-R Diagram

◼ Designing the conceptual model for the database is not a single linear process but an iterative activity in which the design is refined again and again.
◼ To identify the entities, attributes, relationships, and constraints on the data, different methods are used during the analysis phase.
These include information gathered by…
➢ Interviewing end users individually and in groups
➢ Questionnaire surveys
➢ Direct observation
➢ Examining different documents
◼ The basic E-R model is graphically depicted and presented for review.
◼ The process is repeated until the end users and designers agree that the ER diagram is a fair representation of the organization’s activities and functions.
◼ Checking for redundant relationships in the ER diagram: relationships between entities indicate access from one entity to another, so it is possible to access one entity occurrence from another entity occurrence even if there are other entities and relationships that separate them. This is often referred to as 'navigation' of the ER diagram.
◼ The last phase in ER modeling is validating the ER model against the requirements of the user.


Graphical Representations in ER Diagramming


◼ Entity is represented by a RECTANGLE containing the name of the entity.

Figure: symbols for a Strong Entity (single rectangle) and a Weak Entity (double rectangle)

◼ Connected entities are called relationship participants

◼ Attributes are represented by OVALS and are connected to the entity by a line.

Figure: oval symbols for an attribute, a multi-valued attribute and a composite attribute

◼ A derived attribute is indicated by an oval drawn with a DOTTED LINE.

◼ PRIMARY KEYS are underlined.

◼ Relationships are represented by DIAMOND-shaped symbols

◼ A weak relationship is a relationship between a weak entity and a strong entity
◼ A strong relationship is a relationship between two strong entities

Figure: diamond symbols for a Strong Relationship and a Weak Relationship

Example 1: Build an ER Diagram for the following information:


◼ A student record management system will have the following two basic data object categories with their own features or properties: Students will have an Id, Name, Dept, Age and GPA, and Course will have an Id, Name and Credit Hours.
◼ Whenever a student enrolls in a course in a specific Academic Year and Semester, the student will have a grade for the course.

Figure: ER diagram with the entities Students (Id, Name, Dept, DoB, Age, Gpa) and Course (Id, Name, Credit), connected by the relationship Enrolled_In, which carries the attributes Academic Year, Semester and Grade.
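Mapping Example 1 into a relational schema could look like the sketch below, run through Python's `sqlite3` module. It rests on several assumptions not fixed by the example: Age is treated as derived from DoB and so not stored, `Enrolled_In` becomes its own table because it is many-to-many, and its primary key includes AcademicYear and Semester so that a student may retake a course. Column types and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Entity Students -> table; Age is derived from DoB, so it is not stored.
    CREATE TABLE Students (
        Id   TEXT PRIMARY KEY,
        Name TEXT NOT NULL,
        Dept TEXT,
        DoB  TEXT,           -- ISO date string, e.g. '2001-05-20'
        Gpa  REAL
    );
    -- Entity Course -> table.
    CREATE TABLE Course (
        Id     TEXT PRIMARY KEY,
        Name   TEXT NOT NULL,
        Credit INTEGER
    );
    -- M:N relationship Enrolled_In -> table holding both keys plus the
    -- relationship's own attributes (Academic Year, Semester, Grade).
    CREATE TABLE Enrolled_In (
        StudentId    TEXT NOT NULL REFERENCES Students(Id),
        CourseId     TEXT NOT NULL REFERENCES Course(Id),
        AcademicYear TEXT NOT NULL,
        Semester     INTEGER NOT NULL,
        Grade        TEXT,
        PRIMARY KEY (StudentId, CourseId, AcademicYear, Semester)
    );
""")

conn.execute("INSERT INTO Students VALUES ('S01', 'Abebe', 'CS', '2001-05-20', 3.5)")
conn.execute("INSERT INTO Course VALUES ('C1', 'Databases', 3)")
conn.execute("INSERT INTO Enrolled_In VALUES ('S01', 'C1', '2022/23', 1, 'A')")

rows = conn.execute("""
    SELECT s.Name, c.Name, e.AcademicYear, e.Semester, e.Grade
    FROM Enrolled_In AS e
    JOIN Students AS s ON s.Id = e.StudentId
    JOIN Course   AS c ON c.Id = e.CourseId
""").fetchall()
print(rows)  # [('Abebe', 'Databases', '2022/23', 1, 'A')]
```

The relationship's attributes (Grade, Semester, Academic Year) end up in the relationship table, because they describe an enrollment rather than a student or a course alone.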

Example 2: Build an ER Diagram for the following information:


◼ A personnel record management system will have the following two basic data object categories with their own features or properties: Employee will have an Id, Name, DoB, Age and Tel, and Department will have an Id, Name and Location.
◼ Whenever an employee is assigned to a department, the duration of his or her stay in that department should be registered.


Structural Constraints on Relationship


1. Constraints on Relationship / Multiplicity/ Cardinality Constraints
➢ A multiplicity constraint is the number or range of possible occurrences of an entity type/relation that may relate to a single occurrence/tuple of an entity type/relation through a particular relationship.
➢ Mostly used to ensure appropriate enterprise constraints.

One-to-one relationship:
➢ A customer is associated with at most one loan via the relationship borrower
➢ A loan is associated with at most one customer via borrower

E.g.: Relationship Manages between STAFF and BRANCH. The multiplicity of the relationship is:
➢ One branch can only have one manager
➢ One employee could manage either one or no branches

Employee 1..1 Manages 0..1 Branch

One-To-Many Relationships
➢ In a one-to-many relationship, a loan is associated with at most one customer via borrower, and a customer is associated with several (including 0) loans via borrower


E.g.: Relationship Leads between STAFF and PROJECT

The multiplicity of the relationship:
➢ One staff member may lead zero or more projects
➢ One project is led by exactly one staff member

Employee 1..1 Leads 0..* Project

Many-To-Many Relationship
➢ A customer is associated with several (possibly 0) loans via borrower
➢ A loan is associated with several (possibly 0) customers via borrower

E.g.: Relationship Teaches between INSTRUCTOR and COURSE

The multiplicity of the relationship:
➢ One instructor teaches one or more courses
➢ One course is taught by zero or more instructors

Instructor 0..* Teaches 1..* Course
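Multiplicity ranges like 1..1, 0..1, 0..* and 1..* can be checked against a set of relationship instances. A minimal sketch in plain Python; `check_multiplicity` is a hypothetical helper, `None` stands for `*`, and the instances are made up. The range written next to an entity says how many of that entity may relate to one instance on the other side, following the reading used in the examples above.

```python
from typing import Iterable, Optional, Set, Tuple

Range = Tuple[int, Optional[int]]  # (min, max); max None means '*'


def within(count: int, rng: Range) -> bool:
    low, high = rng
    return count >= low and (high is None or count <= high)


def check_multiplicity(pairs: Iterable[Tuple[str, str]], left: Set[str],
                       right: Set[str], left_range: Range,
                       right_range: Range) -> bool:
    """Check a relationship drawn as  LEFT left_range  Rel  right_range RIGHT.

    left_range:  how many LEFT instances may relate to one RIGHT instance.
    right_range: how many RIGHT instances may relate to one LEFT instance.
    """
    links = set(pairs)
    for x in left:
        if not within(sum(1 for a, _ in links if a == x), right_range):
            return False
    for y in right:
        if not within(sum(1 for _, b in links if b == y), left_range):
            return False
    return True


# Hypothetical instances for "Employee 1..1 Manages 0..1 Branch".
staff = {"E1", "E2", "E3"}
branches = {"B1", "B2"}
ok = check_multiplicity({("E1", "B1"), ("E2", "B2")}, staff, branches, (1, 1), (0, 1))
bad = check_multiplicity({("E1", "B1"), ("E1", "B2")}, staff, branches, (1, 1), (0, 1))
print(ok, bad)  # True False

# Hypothetical instances for "Instructor 0..* Teaches 1..* Course".
instructors = {"I1", "I2"}
courses = {"C1", "C2", "C3"}
teaches = {("I1", "C1"), ("I1", "C2"), ("I2", "C1")}
print(check_multiplicity(teaches, instructors, courses, (0, None), (1, None)))  # True
```

The `bad` case fails because E1 would manage two branches, breaking the 0..1 limit. A minimum of 1 in a range corresponds to total participation and a minimum of 0 to partial participation.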


Participation of an Entity Set in a Relationship Set


Participation constraint of a relationship is involved in identifying and setting the
mandatory or optional feature of an entity occurrence to take a role in a
relationship. There are two distinct participation constraints with this respect,
namely: Total Participation and Partial Participation

➢ Total participation: every tuple in the entity or relation participates in at least one relationship by taking a role. This means every tuple in a relation will be attached to at least one other tuple. An entity with total participation in a relationship is connected to the relationship using a double line.

➢ Partial participation: some tuples in the entity or relation may not participate in the relationship. This means there may be tuples from that relation not taking any role in that specific relationship. An entity with partial participation in a relationship is connected to the relationship using a single line.

➢ E.g. 1: Participation of EMPLOYEE in the “belongs to” relationship with DEPARTMENT is total, since every employee should belong to a department. Participation of DEPARTMENT in the “belongs to” relationship with EMPLOYEE is total, since every department should have at least one employee.

Employee Belongs To Department

➢ E.g. 2: Participation of EMPLOYEE in the “manages” relationship with DEPARTMENT is partial, since not all employees are managers. Participation of DEPARTMENT in the “manages” relationship with EMPLOYEE is total, since every department should have a manager.


Employee Manages Department
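For E.g. 1 above, total participation of EMPLOYEE in “belongs to” maps naturally onto a NOT NULL foreign key, whereas total participation of DEPARTMENT cannot be written as a column constraint. A minimal sketch with Python's `sqlite3` module that checks the second rule with a query; the tables and rows are hypothetical, and a trigger would be an alternative way to enforce it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Department (DeptId TEXT PRIMARY KEY, Name TEXT);
    -- Total participation of EMPLOYEE in "belongs to": DeptId is NOT NULL,
    -- so an employee row cannot exist without a department.
    CREATE TABLE Employee (
        EmpId  INTEGER PRIMARY KEY,
        Name   TEXT,
        DeptId TEXT NOT NULL REFERENCES Department(DeptId)
    );
    INSERT INTO Department VALUES ('D1', 'CS'), ('D2', 'IT');
    INSERT INTO Employee VALUES (1, 'Abebe', 'D1'), (2, 'Sara', 'D1');
""")

try:
    conn.execute("INSERT INTO Employee VALUES (3, 'Kebede', NULL)")
    employee_without_dept = "accepted"
except sqlite3.IntegrityError:
    employee_without_dept = "rejected"
print(employee_without_dept)  # rejected

# Total participation of DEPARTMENT (every department has at least one
# employee) is not a column constraint; here it is checked with a query
# that lists the departments violating it.
violators = conn.execute("""
    SELECT d.DeptId FROM Department AS d
    WHERE NOT EXISTS (SELECT 1 FROM Employee AS e WHERE e.DeptId = d.DeptId)
""").fetchall()
print(violators)  # [('D2',)]
```

Partial participation needs no constraint at all: an employee who manages nothing simply has no row in a Manages relationship.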

Problems in ER Modeling
The Entity-Relationship Model is a conceptual data model that views the real world as
consisting of entities and relationships. The model visually represents these concepts by
the Entity-Relationship diagram. The basic constructs of the ER model are entities,
relationships, and attributes. Entities are concepts, real or abstract, about which information
is collected. Relationships are associations between the entities. Attributes are properties
which describe the entities.
While designing the ER model, one could face design problems called connection traps. Connection traps are problems arising from misinterpreting the meaning of certain relationships.
There are two types of connection traps:
1. Fan trap:
Occurs where a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous.
May exist where two or more one-to-many (1:M) relationships fan out from an entity. The problem can be avoided by restructuring the model so that no 1:M relationships fan out from a single entity and all the semantics of the relationships are preserved.
Example:

EMPLOYEE 1..* Works For 1..1 BRANCH 1..1 IsAssigned 1..* CAR

Semantic description of the problem (sample occurrences):


[Figure: occurrence diagram linking employees Emp1 to Emp7 to branches Bra1 to Bra4, and branches to cars Car1 to Car7]

Problem: Which car (Car1, Car3 or Car5) is used by employee Emp6, who works in
branch Bra1? From this ER model one cannot tell which car is used by which
employee, since a branch can have more than one car and a branch also has more
than one employee. Thus, we need to restructure the model to avoid the connection trap.
To avoid the fan trap, we restructure the E-R model. This results
in the following E-R model.

BRANCH 1..1 Has 1..* CAR 1..* Used By 1..* EMPLOYEE

Semantic description of the restructured model (sample occurrences):

[Figure: occurrence diagram linking branches Bra1 to Bra4 to cars Car1 to Car7, and cars to employees Emp1 to Emp7]
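To make the ambiguity concrete, here is a minimal Python sketch of both models, assuming each relationship is stored as a plain dictionary or set of pairs. Only two facts come from the text (Emp6 works in Bra1; Bra1 has Car1, Car3 and Car5); every other occurrence, including which employee uses which car, is hypothetical.

```python
# Fan-trap sketch. From the text: Emp6 works in Bra1, and Bra1 has Car1, Car3, Car5.
# Every other occurrence (Emp1, Emp2, Bra2, Car2, and all used_by pairs) is hypothetical.

# Original model: EMPLOYEE Works For BRANCH, BRANCH IsAssigned CAR
works_for = {"Emp1": "Bra1", "Emp6": "Bra1", "Emp2": "Bra2"}
is_assigned = {"Bra1": {"Car1", "Car3", "Car5"}, "Bra2": {"Car2"}}


def candidate_cars(emp):
    """Fan-trap model: the best answer is every car of the employee's branch."""
    return is_assigned.get(works_for.get(emp), set())


# Restructured model: BRANCH Has CAR, CAR Used By EMPLOYEE
has = {"Car1": "Bra1", "Car3": "Bra1", "Car5": "Bra1", "Car2": "Bra2"}
used_by = {("Car5", "Emp6"), ("Car1", "Emp1"), ("Car2", "Emp2")}


def cars_used_by(emp):
    """Restructured model: the Used By relationship answers the question directly."""
    return {car for car, user in used_by if user == emp}


print(sorted(candidate_cars("Emp6")))   # ['Car1', 'Car3', 'Car5']  (ambiguous)
print(sorted(cars_used_by("Emp6")))     # ['Car5']
print(has["Car5"])                      # Bra1  (the branch is still reachable)
```

In the fan-trap model the query can only return all cars of the branch; once Used By is modelled explicitly, the answer is a single car, and the branch remains reachable through Has.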

2. Chasm Trap:
Occurs where a model suggests the existence of a relationship between entity types,
but the pathway does not exist between certain entity occurrences.
A chasm trap may exist when one or more relationships with a minimum multiplicity
(cardinality) of zero form part of the pathway between related entities.
Example:

BRANCH 1..1 Has 1..* EMPLOYEE 0..1 Manages 0..* PROJECT


If some projects are not currently active, no project manager may be assigned to
them. So there are projects with no project manager, which makes the minimum
participation zero.
Problem:
How can we identify which BRANCH is responsible for which PROJECT? We
know that every PROJECT, active or not, has a responsible BRANCH.
But which branch? Since the participation between EMPLOYEE and PROJECT has a
minimum of zero, a project without a manager has no path to any branch, so we cannot
identify the BRANCH responsible for each PROJECT.
The solution to this chasm trap is to add another relationship between the
extreme entities (BRANCH and PROJECT):

BRANCH 1..1 Has 1..* EMPLOYEE 0..1 Manages 0..* PROJECT
BRANCH 1..1 Responsible For 1..* PROJECT
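The broken pathway can be sketched in Python as well. This is a minimal illustration, assuming dictionaries for each relationship; all projects, employees and branches in it are hypothetical.

```python
# Chasm-trap sketch; every occurrence below is hypothetical.
has = {"Emp1": "Bra1", "Emp2": "Bra2"}   # EMPLOYEE -> BRANCH (Has)
manages = {"Proj1": "Emp1"}              # PROJECT -> managing EMPLOYEE (0..1)
projects = ["Proj1", "Proj2"]            # Proj2 is inactive and has no manager


def branch_via_manager(project):
    """Follow PROJECT -> Manages -> EMPLOYEE -> Has -> BRANCH; None if the path breaks."""
    manager = manages.get(project)
    if manager is None:
        return None                      # the chasm: no pathway to any branch
    return has[manager]


# Fix: a direct Responsible For relationship, BRANCH (1..1) to PROJECT (1..*)
responsible_for = {"Proj1": "Bra1", "Proj2": "Bra2"}

for p in projects:
    print(p, branch_via_manager(p), responsible_for[p])
# Proj1 Bra1 Bra1
# Proj2 None Bra2
```

The unmanaged project falls into the chasm when reached through EMPLOYEE, while the direct relationship always yields a branch.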

Enhanced E-R (EER) Models


◼ Object-oriented extensions to the E-R model
◼ EER is important when we have a relationship between two entities and the
participation of entity occurrences is partial. In such cases EER is used to reduce
the complexity of participation and of the relationship.
◼ ER diagrams consider entity types to be primitive objects
◼ EER diagrams allow refinements within the structures of entity types
◼ EER Concepts
◼ Generalization
◼ Specialization
◼ Sub classes
◼ Super classes
◼ Attribute Inheritance
◼ Constraints on specialization and generalization

Generalization
➢ Generalization occurs when two or more entities represent categories of the same real-world
object.
➢ Generalization is the process of defining a more general entity type from a set of more
specialized entity types.
➢ A generalization hierarchy is a form of abstraction that specifies that two or more entities
that share common attributes can be generalized into a higher-level entity type.
➢ It is considered a bottom-up definition of entities.
➢ A generalization hierarchy depicts the relationship between a higher-level superclass and
lower-level subclasses.
Generalization hierarchies can be nested. That is, a subtype of one hierarchy can be a supertype of
another. The level of nesting is limited only by the need to keep the model simple.
Example: ACCOUNT is a generalized form of SAVING and CURRENT accounts.
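The bottom-up idea can be sketched in a few lines of Python: the attributes shared by every specialized entity type move up to the generalized entity type. The attribute names for SAVING and CURRENT below are hypothetical; the text does not list them.

```python
# Bottom-up generalization sketch: attributes shared by all the specialized
# entity types move up to the generalized (superclass) entity type.
# Attribute names are hypothetical.
saving = {"AccountNo", "Balance", "OpenDate", "InterestRate"}
current = {"AccountNo", "Balance", "OpenDate", "OverdraftLimit"}


def generalize(*entity_types):
    """Return (superclass attributes, list of remaining subclass-only attributes)."""
    common = set.intersection(*entity_types)
    return common, [attrs - common for attrs in entity_types]


account, (saving_only, current_only) = generalize(saving, current)
print(sorted(account))            # ['AccountNo', 'Balance', 'OpenDate']
print(saving_only, current_only)  # {'InterestRate'} {'OverdraftLimit'}
```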


Specialization
➢ Specialization is the process of taking subsets of a higher-level entity set to form lower-level entity sets.
➢ The specialized entities have an additional set of attributes (distinguishing characteristics)
that distinguish them from the generalized entity.
➢ It is considered a top-down definition of entities.
➢ The specialization process is the inverse of the generalization process: identify the distinguishing
features of some entity occurrences, and specialize them into different subclasses.
➢ Reasons for specialization:
o Attributes that apply only to some members of the superclass
o Relationship types that apply only to some members of the superclass
➢ In many cases, an entity type has numerous sub-groupings of its entities that are meaningful
and need to be represented explicitly. This requires representing each subgroup
in the ER model. The generalized entity is the superclass, and the specialized entities are
subclasses of that superclass.
Example: SAVING and CURRENT accounts are specialized entities of the generalized
entity ACCOUNT. MANAGER, SALES and SECRETARY are specialized employees.

Subclass/Subtype
➢ An entity type whose tuples have attributes that distinguish its members from
tuples of the generalized or Superclass entities.
➢ When one generalized Superclass has various subgroups with distinguishing
features and these subgroups are represented by specialized form, the groups are
called subclasses.

➢ Subclasses can be either mutually exclusive (disjoint) or overlapping (inclusive).
➢ A single subclass may inherit attributes from two distinct superclasses.
➢ A mutually exclusive category/subclass is when an entity instance can be in only
one of the subclasses.
E.g.: An EMPLOYEE can either be SALARIED or PART-TIMER but not
both.
➢ An overlapping category/subclass is when an entity instance may be in two or
more subclasses.
E.g.: A PERSON who works for a university can be both
EMPLOYEE and a STUDENT at the same time.

Superclass/Supertype
➢ An entity type whose tuples share common attributes. Attributes that are shared
by all entity occurrences (including the identifier) are associated with the
supertype.
➢ It is the generalized entity.

Relationship Between Superclass and Subclass


➢ The relationship between a superclass and any of its subclasses is called
a superclass/subclass or class/subclass relationship
➢ An instance cannot be a member of a subclass only; every instance
of a subclass is also an instance of the superclass.
➢ A member of a subclass is represented as a distinct database object, a
distinct record that is related via the key attribute to its superclass entity.
➢ An entity cannot exist in the database merely by being a member of a
subclass; it must also be a member of the superclass.
➢ An entity occurrence of a superclass need not belong to any
of the subclasses unless there is total participation in the specialization.

➢ The relationship between a subclass and its superclass is an “IS A” (or “IS AN”)
relationship.
▪ Subclass IS A Superclass
▪ Manager IS AN Employee
➢ All subclasses or specialized entity sets should be connected with the
superclass using a line to a circle where there is a subset symbol
indicating the direction of subclass/superclass relationship.

➢ We can also have subclasses of a subclass, forming a hierarchy of
specialization.
➢ Superclass attributes are shared by all subclasses of that superclass.
➢ Subclass attributes are unique to the subclass.

Attribute Inheritance
➢ An entity that is a member of a subclass inherits all the attributes of the
entity as a member of the superclass.
➢ The entity also inherits all the relationships in which the superclass
participates.
➢ An entity type may have more than one subclass category.
➢ All subclasses of a generalized entity (superclass) share a
common unique identifier attribute (primary key); i.e., the primary key
of the superclass and of its subclasses is always the same.


• Consider an EMPLOYEE supertype entity. This entity can
have several different subtype entities (for example, HOURLY and
SALARIED), each with distinct properties not shared by the other subtypes. But
whether an employee is HOURLY or SALARIED, the same attributes
(EmployeeId, Name, and DateHired) are shared.
• The supertype EMPLOYEE stores all properties that the subclasses have in
common. HOURLY employees have the unique attribute Wage (hourly
wage rate), while SALARIED employees have two unique attributes,
StockOption and Salary.
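Attribute inheritance behaves much like class inheritance in a programming language. The Python sketch below is only an analogy (an EER model is not object-oriented code); it uses the EMPLOYEE, HOURLY and SALARIED attributes named above, while the attribute types and the sample employee are assumptions.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Employee:
    """Supertype: the attributes shared by every employee, including the identifier."""
    EmployeeId: int
    Name: str
    DateHired: date


@dataclass
class Hourly(Employee):
    """Subtype: inherits EmployeeId, Name, DateHired and adds Wage."""
    Wage: float          # hourly wage rate


@dataclass
class Salaried(Employee):
    """Subtype: inherits the shared attributes and adds StockOption and Salary."""
    StockOption: bool    # assumed type; the text does not specify one
    Salary: float


h = Hourly(EmployeeId=1, Name="Abebe", DateHired=date(2020, 1, 15), Wage=50.0)
print([f.name for f in fields(h)])     # ['EmployeeId', 'Name', 'DateHired', 'Wage']
print(isinstance(h, Employee))         # True: every HOURLY instance is an EMPLOYEE
```

Because the subtypes inherit EmployeeId, superclass and subclasses share the same identifier, mirroring the rule that their primary keys are identical.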

Constraints on specialization and generalization


◼ Completeness Constraint.
◼ The completeness constraint addresses whether or not an occurrence of
a superclass must also have a corresponding subclass occurrence.
◼ Every instance of a subtype is always represented in the supertype; the completeness
constraint concerns the opposite direction, i.e., whether every supertype instance must
appear in some subtype.
◼ The Total Specialization Rule specifies that an entity occurrence must be a
member of at least one of the subclasses. Total participation of superclass instances in
subclasses is diagrammed with a double line from the supertype to the circle.

E.g.: If we have EXTENSION and REGULAR as subclasses of a superclass STUDENT, then
each student must be either an EXTENSION or a REGULAR student. Thus, the
participation of instances of STUDENT in the EXTENSION and REGULAR subclasses is total.

◼ The Partial Specialization Rule specifies that it is not necessary for all entity
occurrences in the superclass to be a member of one of the subclasses. Here we have an
optional participation on the specialization. Partial Participation of superclass instances
on subclasses is diagrammed with a single line from the Supertype to the circle.

◼ E.g.: If we have MANAGER and SECRETARY as subclasses of a superclass
EMPLOYEE, then it is not the case that every employee is either a manager or a secretary.
Thus, the participation of instances of EMPLOYEE in the MANAGER and SECRETARY
subclasses is partial.

◼ Disjointness Constraint.
• Specifies whether one entity occurrence can be a member of more than one
subclass; i.e., it is a type of business rule that deals with the situation where an entity
occurrence of a superclass may also have more than one subclass occurrence.
• The Disjoint Rule restricts an entity occurrence of a superclass to be a member of only
one of the subclasses. Example: an EMPLOYEE can be either SALARIED or PART-TIMER,
but not both at the same time.
• The Overlap Rule allows one entity occurrence to be a member of more than one subclass.
Example: a PERSON working at the university can be both a STUDENT and an
EMPLOYEE at the same time.

• This is diagrammed by placing either the letter "d" for disjoint or
"o" for overlapping inside the circle on the Generalization Hierarchy
portion of the E-R diagram.

The two types of constraints on generalization and specialization (disjointness and
completeness) are independent of one another. That is, whether a specialization is disjoint
says nothing about whether the tuples in the superclass have total or partial participation
in that specialization.
Combining the two types of constraints gives four possible combinations:
◼ Disjoint AND Total
◼ Disjoint AND Partial
◼ Overlapping AND Total
◼ Overlapping AND Partial
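The four combinations can be checked mechanically against a given database state. The Python sketch below assumes superclass and subclass memberships are sets of hypothetical entity identifiers; note that a real constraint is a rule on every valid state, so a single state can illustrate a combination but not prove it.

```python
def classify_specialization(superclass, subclasses):
    """Classify one database state of a specialization.

    superclass: set of entity identifiers in the superclass.
    subclasses: dict mapping each subclass name to the set of identifiers in it.
    Returns a (disjointness, completeness) pair.
    """
    members = list(subclasses.values())
    for name, ids in subclasses.items():
        # Every subclass instance must also be a superclass instance.
        if not ids <= superclass:
            raise ValueError(f"{name} contains entities missing from the superclass")
    union = set().union(*members)
    # Pairwise disjoint exactly when no identifier is counted twice.
    disjoint = sum(len(ids) for ids in members) == len(union)
    total = union == superclass
    return ("disjoint" if disjoint else "overlapping",
            "total" if total else "partial")


students = {"S1", "S2", "S3"}
print(classify_specialization(students, {"EXTENSION": {"S1"}, "REGULAR": {"S2", "S3"}}))
# ('disjoint', 'total')

employees = {"E1", "E2", "E3"}
print(classify_specialization(employees, {"MANAGER": {"E1"}, "SECRETARY": {"E2"}}))
# ('disjoint', 'partial')

people = {"P1", "P2"}
print(classify_specialization(people, {"EMPLOYEE": {"P1", "P2"}, "STUDENT": {"P2"}}))
# ('overlapping', 'total')
```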


CHAPTER FOUR
LOGICAL DATABASE DESIGN
Logical design is the process of constructing a model of the information used in an
enterprise based on a specific data model (e.g., relational, hierarchical or network or
object), but independent of a particular DBMS and other physical considerations.
◼ Normalization process
◼ Collection of Rules to be maintained
◼ Discover new entities in the process
◼ Revise attributes based on the rules and the discovered Entities
The first step before applying the rules of the relational data model is converting the conceptual
design to a form suitable for the relational logical model, which is in the form of tables.
Converting ER Diagram to Relational Tables
Three basic rules convert an ER diagram into tables or relations:
1. For a relationship with One-to-One Cardinality:
⚫ All the attributes can be merged into a single table, or the primary key or a
candidate key of one of the relations can be posted to the other as a
foreign key.
2. For a relationship with One-to-Many Cardinality:
⚫ Post the primary key or a candidate key from the “one” side as a
foreign key attribute to the “many” side. E.g.: For a relationship
called “Belongs To” between Employee (Many) and Department
(One), the primary key of Department is posted to Employee as a foreign key.
3. For a relationship with Many-to-Many Cardinality:
⚫ Create a new table (which is the associative entity) and post primary
key or candidate key from each entity as attributes in the new table
along with some additional attributes (if applicable)
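The three rules can be sketched as a small schema transformation in Python. This is an illustration under stated assumptions: each table is a list of column names whose first column is its primary key, all table and column names (DeptID, WORKS_ON, Hours, and so on) are hypothetical, and for 1:1 relationships the sketch takes the foreign-key option rather than merging the tables.

```python
# Each entity becomes a table; by convention here its first column is the primary key.
# Table and column names (DeptID, WORKS_ON, Hours, ...) are hypothetical.
schema = {
    "EMPLOYEE": ["EmpID", "EmpName"],
    "DEPARTMENT": ["DeptID", "DeptName"],
    "PROJECT": ["ProjNo", "ProjName"],
}


def map_relationship(schema, a, b, cardinality, assoc=None, extra=()):
    """Apply one conversion rule to a binary relationship between tables a and b.

    '1:1' and '1:M': the primary key of a (the 'one' side for 1:M) is posted into b
    as a foreign key. 'M:N': a new associative table `assoc` is created holding both
    primary keys plus any extra relationship attributes.
    """
    key_a, key_b = schema[a][0], schema[b][0]
    if cardinality in ("1:1", "1:M"):
        schema[b].append(key_a)
    elif cardinality == "M:N":
        if assoc is None:
            raise ValueError("an M:N relationship needs a name for the new table")
        schema[assoc] = [key_a, key_b, *extra]
    else:
        raise ValueError(f"unknown cardinality: {cardinality}")


map_relationship(schema, "DEPARTMENT", "EMPLOYEE", "1:M")   # Belongs To
map_relationship(schema, "EMPLOYEE", "PROJECT", "M:N", assoc="WORKS_ON", extra=("Hours",))
print(schema["EMPLOYEE"])   # ['EmpID', 'EmpName', 'DeptID']
print(schema["WORKS_ON"])   # ['EmpID', 'ProjNo', 'Hours']
```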
After converting the ER diagram into table form, the next phase is applying the
process of normalization, which is a collection of rules each table should satisfy.


4.1. Normalization
A relational database is merely a collection of data, organized in a particular manner.
Codd, the father of the relational database approach, created a series of rules called normal
forms that help define that organization.

One of the best ways to determine what information should be stored in a database is to
clarify what questions will be asked of it and what data would be included in the answers.

Database normalization is a series of steps followed to obtain a database design that
allows for consistent storage and efficient access of data in a relational database. These
steps reduce data redundancy and the risk of data becoming inconsistent.

NORMALIZATION is the process of identifying the logical associations between data
items and designing a database that will represent such associations without suffering
from the update anomalies, which are:

1. Insertion Anomalies
2. Deletion Anomalies
3. Modification Anomalies

Normalization may reduce system performance, since data must be cross-referenced (joined)
from many tables. Thus, denormalization is sometimes used to improve performance, at the cost
of reduced consistency guarantees.

A normalization step is considered good only if it is a lossless decomposition, i.e., joining
the decomposed tables reproduces exactly the original table, with no rows lost and no spurious
rows added.
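Losslessness can be tested on sample data by projecting a table onto two attribute sets and joining the projections back together. The Python sketch below represents each tuple as a frozenset of (attribute, value) pairs; the rows are taken from the STUDENT and skill examples in this chapter. It checks only the given instance; as a design rule, splitting R into R1 and R2 is guaranteed lossless when the attributes they share functionally determine all of R1 or all of R2 (here Year determines Dormitory).

```python
def project(rows, attrs):
    """Relational projection onto attrs; duplicate tuples collapse in the set."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}


def natural_join(r1, r2):
    """Natural join of two relations stored as sets of frozenset((attr, value)) tuples."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {frozenset(r.items()) for r in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


# Rows taken from the STUDENT and skill examples in this chapter.
students = [
    {"StudID": "125/97", "Year": 1, "Dormitory": 401},
    {"StudID": "654/95", "Year": 3, "Dormitory": 403},
    {"StudID": "842/95", "Year": 3, "Dormitory": 403},
]
skills = [
    {"EmpID": 12, "Skill": "SQL", "School": "AAU"},
    {"EmpID": 65, "Skill": "SQL", "School": "Helico"},
]
print(is_lossless(students, ["StudID", "Year"], ["Year", "Dormitory"]))  # True
print(is_lossless(skills, ["EmpID", "Skill"], ["Skill", "School"]))      # False
```

The second split is lossy because joining on Skill pairs each SQL employee with both schools, producing spurious rows.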

Taken together, the normalization rules remove the update anomalies that could otherwise
arise during data manipulation after implementation.

The problems that can occur in an insufficiently normalized table are called update
anomalies; they include:
(1) Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database entry
into all the places in the database where information about that new entry needs to be
stored. In a properly normalized database, information about a new entry needs to be
inserted into only one place in the database; in an inadequately normalized database,
information about a new entry may need to be inserted into more than one place and,
human fallibility being what it is, some of the needed additional insertions may be
missed.

(2) Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database
entry when it is time to remove that entry. In a properly normalized database,
information about an old entry that is to be removed needs to be deleted from only one
place in the database; in an inadequately normalized database, information about that
old entry may need to be deleted from more than one place, and, human fallibility being
what it is, some of the needed additional deletions may be missed.
(3) Modification anomalies
A modification of a database involves changing some value of an attribute of a table.
In an inadequately normalized table the same fact may be stored in several rows, so every
copy must be changed, and missing one leaves the data inconsistent. In a properly normalized
table each fact is stored in one place, so a modification made by the user takes effect
everywhere the fact is used.
The purpose of normalization is to reduce the chances for anomalies to occur in a
database.

Example of problems related with Anomalies

EmpID | FName  | LName   | SkillID | Skill  | SkillType   | School | SchoolAdd   | SkillLevel
12    | Abebe  | Mekuria | 2       | SQL    | Database    | AAU    | Sidist_Kilo | 5
16    | Lemma  | Alemu   | 5       | C++    | Programming | Unity  | Gerji       | 6
28    | Chane  | Kebede  | 2       | SQL    | Database    | AAU    | Sidist_Kilo | 10
25    | Abera  | Taye    | 6       | VB6    | Programming | Helico | Piazza      | 8
65    | Almaz  | Belay   | 2       | SQL    | Database    | Helico | Piazza      | 9
24    | Dereje | Tamiru  | 8       | Oracle | Database    | Unity  | Gerji       | 5
51    | Selam  | Belay   | 4       | Prolog | Programming | Jimma  | Jimma City  | 8
94    | Alem   | Kebede  | 3       | Cisco  | Networking  | AAU    | Sidist_Kilo | 7
18    | Girma  | Dereje  | 1       | IP     | Programming | Jimma  | Jimma City  | 4
13    | Yared  | Gizaw   | 7       | Java   | Programming | AAU    | Sidist_Kilo | 6

Deletion Anomalies:
If the employee with ID 16 is deleted, then all information about the skill C++ and its
type is deleted from the database. We will then have no information
about C++ and its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called Pascal? We cannot decide
whether Pascal is allowed as a value for Skill, and we have no clue about the type
of skill that Pascal should be categorized as.
Modification Anomalies:
What if the address of Helico changes from Piazza to Mexico? We need to look
for every occurrence of Helico and change the value of SchoolAdd from Piazza
to Mexico, which is prone to error.
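The three anomalies can be reproduced on the table above with a short Python sketch, assuming the table is held as a list of dictionaries. The rows are copied from the table; the Pascal row and the new address Mexico come from the text.

```python
COLUMNS = ("EmpID", "FName", "LName", "SkillID", "Skill", "SkillType",
           "School", "SchoolAdd", "SkillLevel")
DATA = [
    (12, "Abebe", "Mekuria", 2, "SQL", "Database", "AAU", "Sidist_Kilo", 5),
    (16, "Lemma", "Alemu", 5, "C++", "Programming", "Unity", "Gerji", 6),
    (28, "Chane", "Kebede", 2, "SQL", "Database", "AAU", "Sidist_Kilo", 10),
    (25, "Abera", "Taye", 6, "VB6", "Programming", "Helico", "Piazza", 8),
    (65, "Almaz", "Belay", 2, "SQL", "Database", "Helico", "Piazza", 9),
    (24, "Dereje", "Tamiru", 8, "Oracle", "Database", "Unity", "Gerji", 5),
    (51, "Selam", "Belay", 4, "Prolog", "Programming", "Jimma", "Jimma City", 8),
    (94, "Alem", "Kebede", 3, "Cisco", "Networking", "AAU", "Sidist_Kilo", 7),
    (18, "Girma", "Dereje", 1, "IP", "Programming", "Jimma", "Jimma City", 4),
    (13, "Yared", "Gizaw", 7, "Java", "Programming", "AAU", "Sidist_Kilo", 6),
]
rows = [dict(zip(COLUMNS, r)) for r in DATA]

# Deletion anomaly: employee 16 holds the only record that C++ is a Programming skill.
after_delete = [r for r in rows if r["EmpID"] != 16]
print(any(r["Skill"] == "C++" for r in after_delete))   # False

# Insertion anomaly: a row for Pascal has no EmpID, so it has no valid identifier.
pascal = dict.fromkeys(COLUMNS)
pascal["Skill"] = "Pascal"
print(pascal["EmpID"] is None)                          # True

# Modification anomaly: Helico's address is repeated, so every copy must be updated.
helico_rows = [r for r in rows if r["School"] == "Helico"]
print(len(helico_rows))                                 # 2
for r in helico_rows:
    r["SchoolAdd"] = "Mexico"
```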

A database management system can work only with the information that we put
explicitly into its tables for a given database and into its rules for working with those
tables, where such rules are appropriate and possible.

4.2. Functional Dependency (FD)


Before moving to the definition and application of normalization, it is important to have an
understanding of "functional dependency."

Data Dependency
The logical associations between data items that point the database designer in the direction
of a good database design are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent relationship if certain
values of data item B always appear with certain values of data item A. If data item A
is the determinant and B the dependent data item, then the direction of the
association is from A to B and not vice versa.
The essence of this idea is that if the existence of something, call it A, implies that B must
exist and have a certain value, then we say that "B is functionally dependent on
A." We also often express this idea by saying that "A determines B," or that "B is a function
of A," or that "A functionally governs B." Often, the notions of functionality and functional
dependency are expressed briefly by the statement, "If A, then B." It is important to note
that the value of B must be unique for a given value of A, i.e., any given value of A must
imply one and only one value of B, in order for the relationship to qualify for the name
"function." (However, this does not prevent different values of A from implying
the same value of B.)
Formally, X → Y holds if, whenever two tuples have the same value for X, they must have the
same value for Y.

The notation is A → B, which is read as "B is functionally dependent on A".


In general, a functional dependency is a relationship among attributes. In relational
databases, we can have a determinant that governs one other attribute or several other
attributes.

FDs are derived from the real-world constraints on the attributes.

Example
Dinner Course | Type of Wine
Meat          | Red
Fish          | White
Cheese        | Rose
Since the type of wine served depends on the type of dinner, we say Wine is functionally
dependent on Dinner.
Dinner → Wine

Dinner Course | Type of Wine | Type of Fork
Meat          | Red          | Meat fork
Fish          | White        | Fish fork
Cheese        | Rose         | Cheese fork

Since both the wine type and the fork type are determined by the dinner type, we say Wine is
functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
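The formal definition (equal X values imply equal Y values) translates directly into a check over sample rows. The Python sketch below applies it to the dinner table; the extra violating row is hypothetical. Keep in mind that sample data can only refute an FD, never prove it, since FDs come from real-world constraints.

```python
def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in rows: equal lhs values imply equal rhs values."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


dinners = [
    {"Dinner": "Meat", "Wine": "Red", "Fork": "Meat fork"},
    {"Dinner": "Fish", "Wine": "White", "Fork": "Fish fork"},
    {"Dinner": "Cheese", "Wine": "Rose", "Fork": "Cheese fork"},
]
print(fd_holds(dinners, ["Dinner"], ["Wine"]))   # True
print(fd_holds(dinners, ["Dinner"], ["Fork"]))   # True

# A hypothetical second Meat row served with White wine would violate Dinner -> Wine.
bad = dinners + [{"Dinner": "Meat", "Wine": "White", "Fork": "Meat fork"}]
print(fd_holds(bad, ["Dinner"], ["Wine"]))       # False
```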

Partial Dependency
If an attribute that is not a member of the primary key depends on some part of the
primary key (when the primary key is composite), then that attribute is partially functionally
dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.

If {A, B} → C and B → C,

then C is partially functionally dependent on {A, B}.


Full Dependency
If an attribute that is not a member of the primary key depends not on some part of
the primary key but only on the whole key (when the primary key is composite), then that
attribute is fully functionally dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.

If {A, B} → C, but neither A → C nor B → C holds,

then C is fully functionally dependent on {A, B}.
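Whether a dependency is partial or full can be decided from a set of FDs by computing attribute closures (every attribute a set of attributes determines) for each proper part of the key. The Python sketch below uses the standard closure algorithm and, as sample input, the FDs FD1 to FD3 of the EMP_PROJ example from the 2NF section of this chapter.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dependency_kind(key, attr, fds):
    """Return 'partial' or 'full' for a non-key attr that depends on a composite key."""
    if attr not in closure(key, fds):
        raise ValueError(f"{attr} does not depend on {sorted(key)}")
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):   # every proper part of the key
            if attr in closure(part, fds):
                return "partial"
    return "full"


# The FDs of the EMP_PROJ example (FD1 to FD3).
FDS = [
    ({"EmpID"}, {"EmpName"}),
    ({"ProjNo"}, {"ProjName", "ProjLoc", "ProjFund", "ProjMangID"}),
    ({"EmpID", "ProjNo"}, {"Incentive"}),
]
KEY = {"EmpID", "ProjNo"}
print(dependency_kind(KEY, "EmpName", FDS))     # partial
print(dependency_kind(KEY, "Incentive", FDS))   # full
```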

Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form: "If
A implies B, and if also B implies C, then A implies C."
Example:
If Mr X is a Human, and if every Human is an Animal, then Mr X must be an Animal.
A general way of describing transitive dependency is:
If A functionally governs B, AND
if B functionally governs C,
THEN A functionally governs C,
provided that neither B nor C determines A, i.e. (B ↛ A and C ↛ A). In the
usual notation:

{(A → B) AND (B → C)} ⟹ A → C, provided that B ↛ A and C ↛ A
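The definition, including its proviso, can be checked with attribute closures. This Python sketch uses abstract attributes A, B and C as in the text; the FD lists are illustrative.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive(a, b, c, fds):
    """True if A -> B and B -> C hold while neither B -> A nor C -> A holds."""
    a, b, c = set(a), set(b), set(c)
    return (b <= closure(a, fds) and c <= closure(b, fds)
            and not a <= closure(b, fds) and not a <= closure(c, fds))


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print("C" in closure({"A"}, fds))              # True: A -> C is implied
print(transitive({"A"}, {"B"}, {"C"}, fds))    # True
# If B also determined A, the proviso fails and the dependency is not transitive.
print(transitive({"A"}, {"B"}, {"C"}, fds + [({"B"}, {"A"})]))  # False
```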

Steps of Normalization:
We have various levels or steps in normalization called Normal Forms. The level of
complexity, strength of the rule and decomposition increase as we move from a
lower-level normal form to a higher one.

A table in a relational database is said to be in a certain normal form if it satisfies certain
constraints.

Each normal form below represents a stronger condition than the previous one.

Normalization towards a logical design consists of the following steps:

Unnormalized Form:
Identify all data elements.
First Normal Form:
Find the key with which you can find all data.
Second Normal Form:
Remove part-key dependencies. Make all data dependent on the whole key.
Third Normal Form:
Remove non-key dependencies. Make all data dependent on nothing but the key. For most practical
purposes, databases are considered normalized if they adhere to third normal form.

First Normal Form (1NF)
Requires that all column values in a table are atomic (e.g., a number is an atomic
value, while a list or a set is not).
We have two ways of achieving this:
1. Putting each repeating group into a separate table and connecting them with
a primary key-foreign key relationship
2. Moving these repeating groups to new rows by repeating the common
attributes. If so, then find the key with which you can find all data
Definition: a table (relation) is in 1NF
If
➢ There are no duplicated rows in the table (there is a unique identifier).
➢ Each cell is single-valued (i.e., there are no repeating groups).
➢ Entries in a column (attribute, field) are of the same kind.

Example for First Normal Form (1NF)

UNNORMALIZED
EmpID | FirstName | LastName | Skill  | SkillType   | School | SchoolAdd   | SkillLevel
12    | Abebe     | Mekuria  | SQL    | Database    | AAU    | Sidist_Kilo | 5
      |           |          | VB6    | Programming | Helico | Piazza      | 8
16    | Lemma     | Alemu    | C++    | Programming | Unity  | Gerji       | 6
      |           |          | IP     | Programming | Jimma  | Jimma City  | 4
28    | Chane     | Kebede   | SQL    | Database    | AAU    | Sidist_Kilo | 10
65    | Almaz     | Belay    | SQL    | Database    | Helico | Piazza      | 9
      |           |          | Prolog | Programming | Jimma  | Jimma City  | 8
      |           |          | Java   | Programming | AAU    | Sidist_Kilo | 6
24    | Dereje    | Tamiru   | Oracle | Database    | Unity  | Gerji       | 5
94    | Alem      | Kebede   | Cisco  | Networking  | AAU    | Sidist_Kilo | 7

FIRST NORMAL FORM (1NF)
Remove all repeating groups. Distribute the multi-valued attributes into different rows
and identify a unique identifier for the relation so that it can be considered a relation
in a relational database.

EmpID | FirstName | LastName | SkillID | Skill  | SkillType   | School | SchoolAdd   | SkillLevel
12    | Abebe     | Mekuria  | 1       | SQL    | Database    | AAU    | Sidist_Kilo | 5
12    | Abebe     | Mekuria  | 3       | VB6    | Programming | Helico | Piazza      | 8
16    | Lemma     | Alemu    | 2       | C++    | Programming | Unity  | Gerji       | 6
16    | Lemma     | Alemu    | 7       | IP     | Programming | Jimma  | Jimma City  | 4
28    | Chane     | Kebede   | 1       | SQL    | Database    | AAU    | Sidist_Kilo | 10
65    | Almaz     | Belay    | 1       | SQL    | Database    | Helico | Piazza      | 9
65    | Almaz     | Belay    | 5       | Prolog | Programming | Jimma  | Jimma City  | 8
65    | Almaz     | Belay    | 8       | Java   | Programming | AAU    | Sidist_Kilo | 6
24    | Dereje    | Tamiru   | 4       | Oracle | Database    | Unity  | Gerji       | 5
94    | Alem      | Kebede   | 6       | Cisco  | Networking  | AAU    | Sidist_Kilo | 7
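The second way of reaching 1NF (moving each repeating-group member to its own row and repeating the common attributes) can be sketched in Python. The two input rows are taken from the unnormalized example, with the repeating group held in a hypothetical Skills list; in this sketch {EmpID, Skill} identifies each row, whereas the table above introduces SkillID for that role.

```python
# Two unnormalized rows from the example; Skills holds the repeating group.
unnormalized = [
    {"EmpID": 12, "FirstName": "Abebe", "LastName": "Mekuria",
     "Skills": [("SQL", "Database", "AAU", "Sidist_Kilo", 5),
                ("VB6", "Programming", "Helico", "Piazza", 8)]},
    {"EmpID": 16, "FirstName": "Lemma", "LastName": "Alemu",
     "Skills": [("C++", "Programming", "Unity", "Gerji", 6),
                ("IP", "Programming", "Jimma", "Jimma City", 4)]},
]
GROUP_COLUMNS = ("Skill", "SkillType", "School", "SchoolAdd", "SkillLevel")


def to_1nf(rows, group_field, group_columns):
    """Emit one flat row per repeating-group member, repeating the common columns."""
    flat = []
    for row in rows:
        common = {k: v for k, v in row.items() if k != group_field}
        for member in row[group_field]:
            flat.append({**common, **dict(zip(group_columns, member))})
    return flat


first_nf = to_1nf(unnormalized, "Skills", GROUP_COLUMNS)
for r in first_nf:
    print(r["EmpID"], r["Skill"], r["School"])
# 12 SQL AAU
# 12 VB6 Helico
# 16 C++ Unity
# 16 IP Jimma

# Every value is now atomic, and {EmpID, Skill} identifies each row in this sketch.
assert len({(r["EmpID"], r["Skill"]) for r in first_nf}) == len(first_nf)
```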

Second Normal Form (2NF)
No partial dependency of a non-key attribute on part of the primary key. Removing such
dependencies results in a set of relations in Second Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., non-composite) key is automatically
also in 2NF.

Definition: a table (relation) is in 2NF If
➢ It is in 1NF and
➢ If all non-key attributes are dependent on the entire primary key.
i.e., no partial dependency.
Example for 2NF:
EMP_PROJ
EmpID EmpName ProjNo ProjName ProjLoc ProjFund ProjMangID Incentive

EMP_PROJ rearranged
EmpID ProjNo EmpName ProjName ProjLoc ProjFund ProjMangID Incentive
Business rule: Whenever an employee participates in a project, he/she is entitled to an
incentive.
This schema is in 1NF since we don’t have any repeating groups or attributes with
multi-valued property. To convert it to 2NF we need to remove all partial dependencies
of non-key attributes on part of the primary key.
{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID, Incentive
But in addition to this we have the following dependencies:
FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive
As we can see, some non-key attributes are partially dependent on part of the
primary key, as the first two functional dependencies (FD1 and FD2) show. Thus, each of
these functional dependencies, with its dependent attributes, should be moved to a new
relation in which the determinant is the primary key.
EMPLOYEE
EmpID EmpName

PROJECT
ProjNo ProjName ProjLoc ProjFund ProjMangID

EMP_PROJ
EmpID ProjNo Incentive
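The decomposition step just performed can be sketched as a small Python function over the FDs FD1 to FD3. It assumes, as in this example, that every determinant is either the whole composite key or a proper part of it; dependencies on non-key attributes are the concern of 3NF, and the sketch rejects them.

```python
def decompose_2nf(key, fds):
    """Split out partial dependencies of a relation whose primary key is `key`.

    fds: list of (determinant, dependents) pairs of attribute sets. Each determinant
    is assumed to be either the whole key or a proper part of it.
    Returns a list of (primary key, non-key columns) pairs, one per new relation.
    """
    relations, whole_key_cols = [], set()
    for lhs, rhs in fds:
        if lhs < key:              # proper subset of the key: a partial dependency
            relations.append((set(lhs), set(rhs)))
        elif lhs == key:           # full dependency: stays with the composite key
            whole_key_cols |= rhs
        else:
            raise ValueError(f"determinant {sorted(lhs)} is not part of the key")
    relations.append((set(key), whole_key_cols))
    return relations


KEY = {"EmpID", "ProjNo"}
FDS = [
    ({"EmpID"}, {"EmpName"}),                                         # FD1
    ({"ProjNo"}, {"ProjName", "ProjLoc", "ProjFund", "ProjMangID"}),  # FD2
    ({"EmpID", "ProjNo"}, {"Incentive"}),                             # FD3
]
for k, cols in decompose_2nf(KEY, FDS):
    print(sorted(k), sorted(cols))
# ['EmpID'] ['EmpName']
# ['ProjNo'] ['ProjFund', 'ProjLoc', 'ProjMangID', 'ProjName']
# ['EmpID', 'ProjNo'] ['Incentive']
```

The three results correspond to the EMPLOYEE, PROJECT and EMP_PROJ relations above.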

Third Normal Form (3NF)
Eliminate columns dependent on another non-primary-key column: if attributes do not
contribute to a description of the key, move them to a separate table. This level avoids
update and delete anomalies.
Definition: A Table (Relation) is in 3NF
If
➢ It is in 2NF and
➢ There are no transitive dependencies between a primary key and non-
primary key attribute.
Example for (3NF)
Assumption: Students of the same batch (same year) live in one building or dormitory.
STUDENT

StudID | Stud_F_Name | Stud_L_Name | Dept   | Year | Dormitory
125/97 | Abebe       | Mekuria     | InfoSc | 1    | 401
654/95 | Lemma       | Alemu       | Geog   | 3    | 403
842/95 | Chane       | Kebede      | CompSc | 3    | 403
165/97 | Alem        | Kebede      | InfoSc | 1    | 401
985/95 | Almaz       | Belay       | Geog   | 3    | 403
This schema is in 2NF since the primary key is a single attribute.
Let’s take StudID, Year and Dormitory and see the dependencies.

StudID → Year AND Year → Dormitory

Since Year cannot determine StudID and Dormitory cannot determine
StudID, transitively StudID → Dormitory.

To convert it to 3NF we need to remove all transitive dependencies of non-key
attributes on another non-key attribute.


The non-primary-key attributes that depend on each other are moved to another table
and linked to the main table using a candidate key-foreign key relationship.

STUDENT
StudID | Stud_F_Name | Stud_L_Name | Dept   | Year
125/97 | Abebe       | Mekuria     | InfoSc | 1
654/95 | Lemma       | Alemu       | Geog   | 3
842/95 | Chane       | Kebede      | CompSc | 3
165/97 | Alem        | Kebede      | InfoSc | 1
985/95 | Almaz       | Belay       | Geog   | 3

DORM
Year | Dormitory
1    | 401
3    | 403
Generally, even though there are four additional levels of normalization, a table is
said to be normalized if it reaches 3NF. A database with all tables in 3NF is said to be a
normalized database.
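The 3NF split of the STUDENT example can be reproduced in Python: the transitively dependent attribute (Dormitory) moves out together with its determinant (Year), and rejoining on Year recovers the original rows. The data is the STUDENT table above; holding it as a list of dictionaries is an assumption of the sketch.

```python
COLS = ("StudID", "Stud_F_Name", "Stud_L_Name", "Dept", "Year", "Dormitory")
DATA = [
    ("125/97", "Abebe", "Mekuria", "InfoSc", 1, 401),
    ("654/95", "Lemma", "Alemu", "Geog", 3, 403),
    ("842/95", "Chane", "Kebede", "CompSc", 3, 403),
    ("165/97", "Alem", "Kebede", "InfoSc", 1, 401),
    ("985/95", "Almaz", "Belay", "Geog", 3, 403),
]
student_flat = [dict(zip(COLS, row)) for row in DATA]

# Move the transitively dependent attribute (Dormitory) out with its determinant (Year).
student = [{k: r[k] for k in COLS if k != "Dormitory"} for r in student_flat]
dorm = sorted({(r["Year"], r["Dormitory"]) for r in student_flat})
print(dorm)   # [(1, 401), (3, 403)]

# Rejoin STUDENT with DORM on Year to check that no information was lost.
dorm_of_year = dict(dorm)
rejoined = [{**s, "Dormitory": dorm_of_year[s["Year"]]} for s in student]
print(rejoined == student_flat)   # True
```

Each dormitory now appears once per year in DORM instead of once per student, which removes the modification anomaly for dormitory changes.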

A mnemonic for remembering the rationale for normalization up to 3NF could be the
following:

1. No Repeating or Redundancy: no repeating fields in the table.
2. The Fields Depend Upon the Key: every field should depend on the key.
3. The Whole Key: no partial key dependency.
4. And Nothing but The Key: no inter-data (transitive) dependency.
5. So, Help Me Codd: since Codd came up with these rules.


Other Levels of Normalization
Boyce-Codd Normal Form (BCNF):
A stronger version of 3NF.
Def: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.

Fourth Normal Form (4NF)
Isolate Independent Multiple Relationships - No table may contain two or more 1:N or M:N
relationships that are not directly related. The correct solution, to cause the model to be
in 4th normal form, is to ensure that all M:N relationships are resolved independently if
they are indeed independent.
Def: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.

Fifth Normal Form (5NF)
Isolate Semantically Related Multiple Relationships - There may be practical constraints on
information that justify separating logically related many-to-many relationships.
Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and
if every join dependency in the table is a consequence of the candidate keys of the table.
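The BCNF condition can be tested with attribute closures. The Python sketch below uses the usual formal version of the test: for every non-trivial FD X → Y, X must be a superkey, i.e. its closure must contain every attribute of the relation. As sample input it reuses the STUDENT attributes and the dependencies StudID → (other student attributes) and Year → Dormitory from the 3NF example.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey of the relation."""
    attributes = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs                          # skip trivial FDs
            and not attributes <= closure(lhs, fds)]   # lhs does not determine everything


STUDENT = {"StudID", "Stud_F_Name", "Stud_L_Name", "Dept", "Year", "Dormitory"}
STUDENT_FDS = [
    ({"StudID"}, {"Stud_F_Name", "Stud_L_Name", "Dept", "Year"}),
    ({"Year"}, {"Dormitory"}),
]
print(bcnf_violations(STUDENT, STUDENT_FDS))   # [({'Year'}, {'Dormitory'})]

# After the 3NF split, both relations pass the test.
print(bcnf_violations({"Year", "Dormitory"}, [({"Year"}, {"Dormitory"})]))  # []
print(bcnf_violations(STUDENT - {"Dormitory"}, STUDENT_FDS[:1]))            # []
```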

Domain-Key Normal Form (DKNF)


A model free from all modification anomalies.
Def: A table is in DKNF if every constraint on the table is a logical consequence of the
definition of keys and domains.
The underlying ideas in normalization are simple enough. Through normalization we want to design
for our relational database a set of tables that:
(1) Contain all the data necessary for the purposes that the database is to serve,
(2) Have as little redundancy as possible,
(3) Accommodate multiple values for types of data that require them,
(4) Permit efficient updates of the data in the database, and
(5) Avoid the danger of losing data unknowingly.


Pitfalls of Normalization

• Requires (sample) data to reveal the problems
• May reduce the performance of the system
• Is time consuming
• Is difficult to design and apply
• Is prone to human error


CHAPTER FIVE
Physical Database Design Methodology for Relational Database
We have established that there are three levels of database design:

• Conceptual: producing a data model which accounts for the relevant entities and
relationships within the target application domain;
• Logical: ensuring, via normalization procedures and the definition of integrity rules, that the
stored database will be non-redundant and properly connected;
• Physical: specifying how database records are stored, accessed and related to ensure
adequate performance.
We can consider the topic of physical database design from three aspects:
• What techniques for storing and finding data exist
• Which are implemented within a particular DBMS
• Which might be selected by the designer for a given application knowing the properties
of the data
Thus, the purpose of physical database design is to describe:
1. How to map the logical database design to a physical database design.
2. How to design base relations for target DBMS.
3. How to design enterprise constraints for target DBMS.
4. How to select appropriate file organizations based on analysis of transactions.
5. When to use secondary indexes to improve performance.
6. How to estimate the size of the database.
7. How to design user views.
8. How to design security mechanisms to satisfy user requirements.

Physical database design is the process of producing a description of the implementation of the
database on secondary storage.
Physical design describes the base relations, file organizations, and indexes used to achieve
efficient access to the data, and any associated integrity constraints and security measures.


◼ Sources of information for the physical design process include the global logical data model and the documentation that describes the model.
◼ Logical database design is concerned with the what; physical database design is concerned with the how.
◼ Physical database design is the process of producing a description of the implementation of the database on secondary storage.
◼ It describes the storage structures and access methods used to achieve efficient access to the data.

Steps in physical database design


1. Translate logical data model for target DBMS
1.1. Design base relation
1.2. Design representation of derived data
1.3. Design enterprise constraint
2. Design physical representation
2.1. Analyze transactions
2.2. Choose file organization
2.3. Choose indexes
2.4. Estimate disk space and system requirement
3. Design user view
4. Design security mechanisms
5. Consider controlled redundancy
6. Monitor and tune the operational system
1. Translate logical data model for target DBMS
This phase is the translation of the global logical data model to produce a relational
database schema in the target DBMS. This includes creating the data dictionary based on
the logical model and information gathered.
After the creation of the data dictionary, the next activity is to understand the functionality
of the target DBMS so that all necessary requirements are fulfilled for the database
intended to be developed.

Knowledge of the DBMS includes how to create base relations and whether the system supports the definition of:
• Primary keys
• Foreign keys
• Alternate keys
• Domains
• Referential integrity constraints
• Enterprise-level constraints

1.1. Design base relation


To decide how to represent the base relations identified in the global logical model in the target DBMS.
Designing a base relation involves identifying all the necessary requirements about the relation, from its name up to its referential integrity constraints.
For each relation, we need to define:
• The name of the relation;
• A list of simple attributes in brackets;
• The PK and, where appropriate, AKs and FKs;
• A list of any derived attributes and how they should be computed;
• Referential integrity constraints for any FKs identified.
For each attribute, we need to define:
• Its domain, consisting of a data type, length, and any constraints on the domain;
• An optional default value for the attribute;
• Whether the attribute can hold nulls.
The implementation of the physical model depends on the target DBMS, since some DBMSs offer more facilities than others for defining a database. The base relation design, along with the justification for every decision, should be fully documented.
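As an illustration of these definitions, here is a minimal sketch that uses Python's built-in `sqlite3` module, with SQLite as a stand-in target DBMS (an assumption; these notes do not name one). The `Dept` and `Employee` relations and their rules are hypothetical. Each column shows one decision from the lists above: domain (type plus CHECK), default value, nullability, PK, AK, and FK with referential actions. Other DBMSs offer different facilities and syntax.

```python
import sqlite3

# Hypothetical base relations in SQLite syntax; names, lengths and rules are illustrative.
DDL = """
CREATE TABLE Dept (
    DeptID   INTEGER PRIMARY KEY,              -- primary key (PK)
    DeptName TEXT    NOT NULL UNIQUE           -- alternate key (AK)
);
CREATE TABLE Employee (
    EmpID      INTEGER PRIMARY KEY,
    FName      VARCHAR(20) NOT NULL,           -- SQLite records the length but does not enforce it
    LName      VARCHAR(20) NOT NULL,
    SkillLevel INTEGER NOT NULL DEFAULT 1      -- default value, nulls not allowed
               CHECK (SkillLevel BETWEEN 1 AND 10),  -- domain constraint
    DeptID     INTEGER REFERENCES Dept(DeptID)       -- foreign key (FK), nulls allowed
               ON UPDATE CASCADE ON DELETE SET NULL  -- referential integrity actions
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this is on
conn.executescript(DDL)

conn.execute("INSERT INTO Dept VALUES (2, 'Finance')")
conn.execute("INSERT INTO Employee (EmpID, FName, LName, DeptID) VALUES (567, 'Belay', 'Taye', 2)")

rejected = []
for bad in (
    "INSERT INTO Employee VALUES (1, 'A', 'B', 11, 2)",  # SkillLevel outside its domain
    "INSERT INTO Employee VALUES (2, 'A', 'B', 5, 99)",  # no Dept 99: FK violation
):
    try:
        conn.execute(bad)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

print(len(rejected), "rows rejected")
```

Employee 567 receives the default SkillLevel of 1, and both bad rows are rejected by the DBMS rather than by application code, which is the point of declaring these rules in the base relation design.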

1.2. Design representation of derived data


While analyzing the users' requirements, we may find that some attributes hold data that can be derived from other attributes. A decision must be made on how to represent any derived data present in the global logical data model in the target DBMS.
Examine the logical data model and the data dictionary, and produce a list of all derived attributes. Most of the time derived attributes are not expressed in the logical model but are included in the data dictionary. Whether to store a derived attribute in a base relation or calculate it when required is a decision the designer makes by considering the performance impact.
The option selected is based on:
• The additional cost to store the derived data and keep it consistent with the operational data from which it is derived;
• The cost to calculate it each time it is required.
The less expensive option is chosen, subject to performance constraints.
The representation of derived attributes should be fully documented.
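A minimal sketch of the two options, again assuming SQLite through Python's `sqlite3` as the target DBMS, for a hypothetical derived attribute `NoOfEmployees` of a `Dept` relation. Option 1 stores the value and keeps it consistent with triggers. Option 2 computes it on demand in a view. Both give the same answer; they differ in where the cost is paid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dept (
    DeptID        INTEGER PRIMARY KEY,
    DeptName      TEXT,
    NoOfEmployees INTEGER NOT NULL DEFAULT 0   -- option 1: stored derived attribute
);
CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, DeptID INTEGER REFERENCES Dept(DeptID));

-- Option 1 cost: every change to Employee must also maintain the stored count.
CREATE TRIGGER emp_ins AFTER INSERT ON Employee BEGIN
    UPDATE Dept SET NoOfEmployees = NoOfEmployees + 1 WHERE DeptID = NEW.DeptID;
END;
CREATE TRIGGER emp_del AFTER DELETE ON Employee BEGIN
    UPDATE Dept SET NoOfEmployees = NoOfEmployees - 1 WHERE DeptID = OLD.DeptID;
END;
CREATE TRIGGER emp_move AFTER UPDATE OF DeptID ON Employee BEGIN
    UPDATE Dept SET NoOfEmployees = NoOfEmployees - 1 WHERE DeptID = OLD.DeptID;
    UPDATE Dept SET NoOfEmployees = NoOfEmployees + 1 WHERE DeptID = NEW.DeptID;
END;

-- Option 2 cost: nothing stored, but the count is recomputed on every read.
CREATE VIEW DeptSize AS
    SELECT d.DeptID, COUNT(e.EmpID) AS NoOfEmployees
    FROM Dept d LEFT JOIN Employee e ON e.DeptID = d.DeptID
    GROUP BY d.DeptID;
""")

conn.executemany("INSERT INTO Dept (DeptID, DeptName) VALUES (?, ?)",
                 [(2, "Finance"), (3, "Personnel")])
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(123, 3), (567, 2), (822, 2)])
conn.execute("DELETE FROM Employee WHERE EmpID = 822")

stored = conn.execute("SELECT DeptID, NoOfEmployees FROM Dept ORDER BY DeptID").fetchall()
computed = conn.execute("SELECT DeptID, NoOfEmployees FROM DeptSize ORDER BY DeptID").fetchall()
print(stored, computed)  # both [(2, 1), (3, 1)]
```

Which option is cheaper depends on how often the count is read compared with how often `Employee` changes. This is the trade-off between storage and consistency cost on one side and calculation cost on the other, as described above.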

1.3. Design enterprise constraint


Data in the database is subject not only to the constraints of the database and the data model used but also to enterprise-dependent constraints. How these constraints are defined also depends on the DBMS selected and on the enterprise-level requirements. The designer needs to know the functionality of the DBMS, since some DBMSs provide more facilities than others for defining enterprise constraints.
All the enterprise-level constraints, and the method used to define each one in the target DBMS, should be fully documented.
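For illustration, suppose a hypothetical enterprise rule that no department may have more than two employees. The sketch below assumes SQLite through Python's `sqlite3` and enforces the rule with a trigger, because SQLite does not allow subqueries inside CHECK constraints. In another DBMS the same rule might be defined differently, which is why the definition method should be documented per DBMS. `MAX_PER_DEPT` and `hire` are illustrative names.

```python
import sqlite3

MAX_PER_DEPT = 2  # hypothetical enterprise rule: no department may have more than 2 employees

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, DeptID INTEGER NOT NULL);

-- The rule looks at many rows, which a SQLite CHECK constraint cannot do,
-- so a trigger enforces it before each insert.
CREATE TRIGGER dept_size_limit BEFORE INSERT ON Employee
BEGIN
    SELECT CASE
        WHEN (SELECT COUNT(*) FROM Employee WHERE DeptID = NEW.DeptID) >= {MAX_PER_DEPT}
        THEN RAISE(ABORT, 'department already has the maximum number of employees')
    END;
END;
""")


def hire(emp_id, dept_id):
    """Insert an employee and report whether the enterprise constraint allowed it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Employee VALUES (?, ?)", (emp_id, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False


print([hire(1, 5), hire(2, 5), hire(3, 5), hire(4, 6)])  # [True, True, False, True]
```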

2. Design physical representation


This phase determines the optimal file organizations for storing the base relations and the indexes required to achieve acceptable performance; that is, the way in which relations and tuples will be held on secondary storage.
A number of factors may be used to measure efficiency:
• Transaction throughput: the number of transactions processed in a given time interval.
• Response time: the elapsed time for the completion of a single transaction.
• Disk storage: the amount of disk space required to store the database files.
However, no single factor is always the right one to optimize. Typically, the designer has to trade one factor off against another to achieve a reasonable balance.

2.1. Analyze transactions


To understand the functionality of the transactions that will run on the database and to
analyze the important transactions.
Attempt to identify performance criteria, e.g.:
• Transactions that run frequently and will have a significant impact on
performance;
• Transactions that are critical to the business;
• Times during the day/week when there will be a high demand made on the
database (called the peak load).
Use this information to identify the parts of the database that may cause performance
problems.
To select appropriate file organizations and indexes, we also need to know the high-level functionality of the transactions, such as:
• The attributes that are updated in an update transaction;
• The criteria used to restrict the tuples that are retrieved in a query.
It is often not possible to analyze all expected transactions, so investigate the most 'important' ones.
To help identify which transactions to investigate, we can use:
• A transaction/relation cross-reference matrix, showing the relations that each transaction accesses, and/or
• A transaction usage map, indicating which relations are potentially heavily used.
To focus on areas that may be problematic:
1. Map all transaction paths to relations.
2. Determine which relations are most frequently accessed by transactions.
3. Analyze the data usage of selected transactions that involve these relations.
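The three steps above can be sketched in a few lines of Python. The transactions, the relations they touch, and their daily frequencies are hypothetical. The access codes I/R/U/D (insert, read, update, delete) are a common convention for such a matrix, assumed here. `cross_reference` builds the transaction/relation cross-reference matrix (step 1), and `relation_load` ranks relations by how heavily they are accessed (step 2). That ranking points at the relations whose transactions deserve detailed analysis (step 3).

```python
from collections import Counter

# Hypothetical transactions: relations each one accesses (I=insert, R=read,
# U=update, D=delete) and how many times per day it runs.
TRANSACTIONS = {
    "T1 register employee": {"access": {"Employee": "I", "Dept": "R"}, "per_day": 20},
    "T2 record new skill": {"access": {"Employee": "U", "Skill": "R"}, "per_day": 150},
    "T3 list dept staff": {"access": {"Employee": "R", "Dept": "R"}, "per_day": 400},
    "T4 school report": {"access": {"Skill": "R", "School": "R"}, "per_day": 5},
}


def cross_reference(txns):
    """Step 1: map every transaction path onto the relations it touches."""
    relations = sorted({rel for t in txns.values() for rel in t["access"]})
    matrix = {name: [t["access"].get(rel, "") for rel in relations] for name, t in txns.items()}
    return relations, matrix


def relation_load(txns):
    """Step 2: rank relations by the number of accesses per day they receive."""
    load = Counter()
    for t in txns.values():
        for rel in t["access"]:
            load[rel] += t["per_day"]
    return load.most_common()


relations, matrix = cross_reference(TRANSACTIONS)
print("Transaction".ljust(22), *(rel.ljust(9) for rel in relations))
for name, row in matrix.items():
    print(name.ljust(22), *(code.ljust(9) for code in row))
print(relation_load(TRANSACTIONS))
```

With these made-up figures, Employee (570 accesses per day) and Dept (420) would be the first candidates for the detailed analysis of step 3.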

2.2. Choose file organization


To determine an efficient file organization for each base relation.
File organizations include Heap, Hash, Indexed Sequential Access Method (ISAM), B+-Tree, and Clusters.

2.3. Choose indexes


To determine whether adding indexes will improve the performance of the system.
One approach is to keep tuples unordered and create as many secondary indexes as
necessary.
Another approach is to order tuples in the relation by specifying a primary or clustering
index.
In this case, choose the attribute for ordering or clustering the tuples as:
• Attribute that is used most often for join operations - this makes join operation
more efficient, or
• Attribute that is used most often to access the tuples in a relation in order of that
attribute.
If the ordering attribute chosen is a key of the relation, the index will be a primary index; otherwise, the index will be a clustering index.
Each relation can have either a primary index or a clustering index, but not both. Secondary
indexes provide a mechanism for specifying an additional key for a base relation that
can be used to retrieve data more efficiently.
There is an overhead involved in the maintenance and use of secondary indexes that has to be balanced against the performance improvement gained when retrieving data.
This includes:
• Adding an index record to every secondary index whenever a tuple is inserted;
• Updating a secondary index when the corresponding tuple is updated;
• The increase in disk space needed to store the secondary index;
• Possible performance degradation during query optimization, since the optimizer has to consider all secondary indexes.
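To make these overheads concrete, here is a toy in-memory sketch (not how a real DBMS stores indexes): a heap of unordered tuples plus one secondary index that maps attribute values to row ids. The class `HeapFileWithSecondaryIndex` and its methods are hypothetical. A lookup through the index avoids a full scan, but every insert, and every update of the indexed attribute, must also touch the index.

```python
from collections import defaultdict


class HeapFileWithSecondaryIndex:
    """Toy model: unordered (heap) tuples plus one secondary index on a chosen attribute."""

    def __init__(self, indexed_attr):
        self.indexed_attr = indexed_attr
        self.rows = {}                    # row id -> tuple stored as a dict
        self.index = defaultdict(set)     # attribute value -> set of row ids
        self.next_id = 0

    def insert(self, row):
        rid = self.next_id
        self.next_id += 1
        self.rows[rid] = dict(row)
        self.index[row[self.indexed_attr]].add(rid)   # overhead: one extra index write per insert
        return rid

    def update(self, rid, attr, value):
        row = self.rows[rid]
        if attr == self.indexed_attr:                 # overhead: move the index entry as well
            self.index[row[attr]].discard(rid)
            self.index[value].add(rid)
        row[attr] = value

    def lookup(self, value):
        """Index search: visit only the row ids stored under this value."""
        return [self.rows[rid] for rid in sorted(self.index.get(value, ()))]

    def scan(self, value):
        """Full scan without the index, for comparison."""
        return [r for r in self.rows.values() if r[self.indexed_attr] == value]


emp = HeapFileWithSecondaryIndex("SkillType")
for emp_id, skill_type in [(12, "Database"), (16, "Programming"), (28, "Database"), (94, "Networking")]:
    emp.insert({"EmpID": emp_id, "SkillType": skill_type})
print([r["EmpID"] for r in emp.lookup("Database")])  # [12, 28]
```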
Guidelines for Choosing Indexes
(1) Do not index small relations.
(2) Index PK of a relation if it is not a key of the file organization.
(3) Add secondary index to a FK if it is frequently accessed.
(4) Add secondary index to any attribute that is heavily used as a secondary key.
(5) Add secondary index on attributes that are involved in: selection or join criteria;
ORDER BY; GROUP BY; and other operations involving sorting (such as
UNION or DISTINCT).
(6) Add secondary index on attributes involved in built-in functions.
(7) Add secondary index on attributes that could result in an index-only plan.
(8) Avoid indexing an attribute or relation that is frequently updated.
(9) Avoid indexing an attribute if the query will retrieve a significant proportion of
the tuples in the relation.
(10) Avoid indexing attributes that consist of long character strings.
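One way to check guidelines such as (4) and (5) in practice is to ask the DBMS for its query plan. The sketch below assumes SQLite through Python's `sqlite3` and a hypothetical 1,000-row `Employee` table. It compares the plan for a selection on `LName` before and after adding a secondary index. SQLite's exact plan wording differs between versions, so only the index name should be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, LName TEXT, SkillType TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(i, f"Name{i}", ("Database", "Programming", "Networking")[i % 3]) for i in range(1000)],
)


def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT EmpID FROM Employee WHERE LName = 'Name42'"
before = plan(query)                      # expected: a full SCAN of Employee
conn.execute("CREATE INDEX idx_emp_lname ON Employee(LName)")  # guideline (5): selection attribute
after = plan(query)                       # expected: a SEARCH using idx_emp_lname
print(before)
print(after)
```

The same check, run against a query that retrieves most of the table, is a way to test guideline (9) on a real workload.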

2.4. Estimate disk space and system requirement


To estimate the amount of disk space that will be required by the database.
Purpose:
• If system already exists: is there adequate storage?
• If procuring new system: what storage will be required?
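The estimate itself is simple arithmetic. The minimal sketch below assumes fixed-length tuples, a hypothetical 4 KiB block size, and blocks filled to 80% to leave room for growth. All three are assumptions, to be replaced by figures from the target DBMS. The blocking factor is the number of whole tuples that fit in the usable part of a block, and the relation needs ⌈tuples / blocking factor⌉ blocks. Indexes and DBMS overheads would be estimated separately.

```python
import math


def estimate_relation_size(num_tuples, tuple_bytes, block_bytes=4096, fill_factor=0.8):
    """Rough space estimate for one relation stored as fixed-length tuples.

    Assumptions (hypothetical defaults): 4 KiB blocks, blocks filled to 80%
    to leave room for growth, and no tuple spans two blocks.
    Returns (number of blocks, number of bytes).
    """
    usable = int(block_bytes * fill_factor)
    tuples_per_block = usable // tuple_bytes          # blocking factor
    if tuples_per_block == 0:
        raise ValueError("tuple does not fit in a block")
    blocks = math.ceil(num_tuples / tuples_per_block)
    return blocks, blocks * block_bytes


# Hypothetical Employee relation: 10,000 tuples of about 100 bytes each.
print(estimate_relation_size(10_000, 100))  # (313, 1282048)
```

Here ⌊3276 / 100⌋ = 32 tuples fit per block, so ⌈10000 / 32⌉ = 313 blocks, about 1.28 MB.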

3. Design user view


To design the user views that were identified during the Requirements Collection and Analysis stage of the relational database application lifecycle.
• Define views in DDL to provide the user views identified in the data model.
• Map them onto objects in the physical data model.
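Below is a minimal sketch of a user view defined in DDL, assuming SQLite through Python's `sqlite3`. The view name `DatabaseSkills` and the user it serves are hypothetical. The view exposes only the rows and columns that this user needs, using a few rows from the Employee sample table of Chapter Six.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, FName TEXT, LName TEXT,
                       Skill TEXT, SkillType TEXT, School TEXT, SkillLevel INTEGER);
-- Hypothetical user view: a training coordinator sees only database skills
-- and only the columns they need.
CREATE VIEW DatabaseSkills AS
    SELECT FName, LName, Skill, SkillLevel
    FROM Employee
    WHERE SkillType = 'Database';
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (12, "Abebe", "Mekuria", "SQL", "Database", "AAU", 5),
    (16, "Lemma", "Alemu", "C++", "Programming", "Unity", 6),
    (24, "Dereje", "Tamiru", "Oracle", "Database", "Unity", 5),
])

rows = conn.execute("SELECT * FROM DatabaseSkills ORDER BY LName").fetchall()
print(rows)  # [('Abebe', 'Mekuria', 'SQL', 5), ('Dereje', 'Tamiru', 'Oracle', 5)]
```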


4. Design security mechanisms


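To design the security measures for the database as specified by the users. Security measures fall into two categories:
• System security: covers access to and use of the database at the system level, such as user names and passwords.
• Data security: covers access to and use of database objects (such as relations and views) and the actions that users can perform on those objects.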

5. Consider the Introduction of Controlled Redundancy


To determine whether introducing redundancy in a controlled manner by relaxing the
normalization rules will improve the performance of the system.
The result of normalization is a logical database design that is structurally consistent and has minimal redundancy.
However, a normalized database design sometimes does not provide maximum processing efficiency.
It may be necessary to accept the loss of some of the benefits of a fully normalized design in favor of performance. Also consider that denormalization:
• Makes implementation more complex;
• Often sacrifices flexibility;
• May speed up retrievals but slows down updates.
Denormalization refers to a refinement to the relational schema such that the degree of normalization for a modified relation is less than the degree of at least one of the original relations.
The term is also used more loosely to refer to situations where two relations are combined into one new relation, which is still normalized but contains more nulls than the original relations.
Consider denormalization in the following situations, specifically to speed up frequent or critical transactions:
• Step 1: Combining 1:1 relationships
• Step 2: Duplicating non-key attributes in 1:* relationships to reduce joins
• Step 3: Duplicating foreign key attributes in 1:* relationships to reduce joins
• Step 4: Introducing repeating groups
• Step 5: Merging lookup tables with base relations
• Step 6: Creating extract tables


6. Monitoring and Tuning the operational system


This step covers:
• The meaning of denormalization
• When to denormalize to improve performance
• The importance of monitoring and tuning the operational system
Its purpose is to monitor the operational system and improve its performance, in order to correct inappropriate design decisions or to reflect changing requirements.


CHAPTER SIX
RELATIONAL QUERY LANGUAGES
◼ Query languages: allow manipulation and retrieval of data from a database.
◼ Query languages ≠ programming languages!
◼ QLs are not intended to be used for complex calculations.
◼ QLs support easy, efficient access to large data sets.
◼ The relational model supports simple, powerful query languages.
Formal Relational Query Languages
◼ There is a variety of query languages used by relational DBMSs for manipulating relations.
◼ Some of them are procedural:
◼ The user tells the system exactly what data to manipulate and how.
◼ Others are non-procedural:
◼ The user states what data is needed rather than how it is to be retrieved.
Two mathematical query languages form the basis for relational languages:
◼ Relational Algebra
◼ Relational Calculus
◼ We may describe the relational algebra as a procedural language: it can be used to tell the DBMS how to build a new relation from one or more relations in the database.
◼ We may describe the relational calculus as a non-procedural language: it can be used to formulate the definition of a relation in terms of one or more database relations.
◼ Formally, the relational algebra and the (safe) relational calculus are equivalent in expressive power: for every expression in the algebra, there is an equivalent expression in the calculus, and vice versa.
◼ Both are non-user-friendly languages. They have been used as the basis for other, higher-level data manipulation languages for relational databases.
A query is applied to relation instances, and the result of a query is also a relation instance.
◼ The schemas of the input relations for a query are fixed.
◼ The schema for the result of a given query is also fixed; it is determined by the definitions of the query language constructs.


Relational Algebra
The basic set of operations for the relational model is known as the relational algebra. These
operations enable a user to specify basic retrieval requests.
The result of the retrieval is a new relation, which may have been formed from one or more
relations. The algebra operations thus produce new relations, which can be further manipulated
using operations of the same algebra.
A sequence of relational algebra operations forms a relational algebra expression, whose result
will also be a relation that represents the result of a database query (or retrieval request).
◼ Relational algebra is a theoretical language with operations that work on one or
more relations to define another relation without changing the original relation.
◼ The output from one operation can become the input to another operation (nesting
is possible)
◼ There are different basic operations that can be applied to the relations in a database, depending on the requirement:
◼ Selection ( $\sigma$ ): selects a subset of rows from a relation.
◼ Projection ( $\pi$ ): deletes unwanted columns from a relation.
◼ Renaming: assigns a name to the intermediate relation produced by an operation.
◼ Cross-Product ( $\times$ ): allows us to combine two relations.
◼ Set-Difference ( $-$ ): tuples in relation1 but not in relation2.
◼ Union ( $\cup$ ): tuples in relation1 or in relation2.
◼ Intersection ( $\cap$ ): tuples in relation1 and in relation2.
◼ Join ( $\bowtie$ ): tuples joined from two relations based on a condition.
◼ Using these we can build up sophisticated database queries.

Table1:
Sample table used to illustrate the different kinds of relational operations. The relation contains information about employees, the IT skills they have, and the school where they attended the course for each skill.

Employee
EmpID FName  LName   SkillID Skill  SkillType   School SchoolAdd   SkillLevel
12    Abebe  Mekuria 2       SQL    Database    AAU    Sidist_Kilo 5
16    Lemma  Alemu   5       C++    Programming Unity  Gerji       6
28    Chane  Kebede  2       SQL    Database    AAU    Sidist_Kilo 10
25    Abera  Taye    6       VB6    Programming Helico Piazza      8
65    Almaz  Belay   2       SQL    Database    Helico Piazza      9
24    Dereje Tamiru  8       Oracle Database    Unity  Gerji       5
51    Selam  Belay   4       Prolog Programming Jimma  Jimma City  8
94    Alem   Kebede  3       Cisco  Networking  AAU    Sidist_Kilo 7
18    Girma  Dereje  1       IP     Programming Jimma  Jimma City  4
13    Yared  Gizaw   7       Java   Programming AAU    Sidist_Kilo 6

1. Selection
◼ Selects the subset of tuples/rows in a relation that satisfy the selection condition.
◼ Selection is a unary operator (it is applied to a single relation).
◼ The selection condition is applied to each tuple individually.
◼ The degree of the resulting relation is the same as that of the original relation, but the cardinality (no. of tuples) is less than or equal to that of the original relation.
◼ The Selection operator is commutative: $\sigma_{c_1}(\sigma_{c_2}(R)) = \sigma_{c_2}(\sigma_{c_1}(R))$.
◼ Conditions can be combined using the Boolean operators $\wedge$ (AND), $\vee$ (OR), and $\neg$ (NOT).
◼ No duplicates in the result!
◼ The schema of the result is identical to the schema of the (only) input relation.
◼ The result relation can be the input for another relational algebra operation (operator composition).
◼ It is a filter that keeps only those tuples that satisfy a qualifying condition (those satisfying the condition are selected while the others are discarded).


Notation:

$\sigma_{\langle\text{selection condition}\rangle}(\langle\text{relation name}\rangle)$


Example: Find all employees with a skill type of Database.

$\sigma_{\text{SkillType} = \text{"Database"}}(\text{Employee})$

This query extracts from the Employee relation every tuple, with all its attributes, whose SkillType attribute has the value "Database".
The resulting relation will be the following:

EmpID FName  LName   SkillID Skill  SkillType   School SchoolAdd   SkillLevel
12    Abebe  Mekuria 2       SQL    Database    AAU    Sidist_Kilo 5
28    Chane  Kebede  2       SQL    Database    AAU    Sidist_Kilo 10
65    Almaz  Belay   2       SQL    Database    Helico Piazza      9
24    Dereje Tamiru  8       Oracle Database    Unity  Gerji       5

If the query asks for all employees with SkillType "Database" who attended School "Unity", the relational algebra operation and the resulting relation will be as follows:

$\sigma_{\text{SkillType} = \text{"Database"} \wedge \text{School} = \text{"Unity"}}(\text{Employee})$

EmpID FName  LName   SkillID Skill  SkillType   School SchoolAdd   SkillLevel
24    Dereje Tamiru  8       Oracle Database    Unity  Gerji       5
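The selection examples above can be reproduced with a short Python sketch. For illustration, a relation is modelled here as a list of dictionaries (one per tuple) holding the Employee sample table; this representation is an assumption, and `select` is a hypothetical helper, not a library function. The condition is passed as a function, so AND, OR and NOT map onto Python's `and`, `or` and `not`.

```python
COLUMNS = ("EmpID", "FName", "LName", "SkillID", "Skill",
           "SkillType", "School", "SchoolAdd", "SkillLevel")
ROWS = [
    (12, "Abebe", "Mekuria", 2, "SQL", "Database", "AAU", "Sidist_Kilo", 5),
    (16, "Lemma", "Alemu", 5, "C++", "Programming", "Unity", "Gerji", 6),
    (28, "Chane", "Kebede", 2, "SQL", "Database", "AAU", "Sidist_Kilo", 10),
    (25, "Abera", "Taye", 6, "VB6", "Programming", "Helico", "Piazza", 8),
    (65, "Almaz", "Belay", 2, "SQL", "Database", "Helico", "Piazza", 9),
    (24, "Dereje", "Tamiru", 8, "Oracle", "Database", "Unity", "Gerji", 5),
    (51, "Selam", "Belay", 4, "Prolog", "Programming", "Jimma", "Jimma City", 8),
    (94, "Alem", "Kebede", 3, "Cisco", "Networking", "AAU", "Sidist_Kilo", 7),
    (18, "Girma", "Dereje", 1, "IP", "Programming", "Jimma", "Jimma City", 4),
    (13, "Yared", "Gizaw", 7, "Java", "Programming", "AAU", "Sidist_Kilo", 6),
]
EMPLOYEE = [dict(zip(COLUMNS, row)) for row in ROWS]


def select(relation, condition):
    """σ: keep every tuple for which condition(tuple) is true; all attributes are kept."""
    return [t for t in relation if condition(t)]


database = select(EMPLOYEE, lambda t: t["SkillType"] == "Database")
unity_db = select(EMPLOYEE, lambda t: t["SkillType"] == "Database" and t["School"] == "Unity")

print([t["EmpID"] for t in database])  # [12, 28, 65, 24]
print([t["EmpID"] for t in unity_db])  # [24]
```

Because each tuple is tested independently, applying two selections in either order gives the same result, which is the commutativity noted above.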

2. Projection
◼ Selects certain attributes while discarding the others from the base relation.
◼ PROJECT creates a vertical partitioning: one part with the needed columns (attributes), containing the result of the operation, and the other containing the discarded columns.
◼ Deletes the attributes that are not in the projection list.
◼ The schema of the result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.
◼ The projection operator has to eliminate duplicates!
◼ Note: real systems typically don't do duplicate elimination unless the user explicitly asks for it (for example, with SELECT DISTINCT in SQL).
◼ If the primary key is in the projection list, then duplication will not occur.
◼ Duplicate removal is necessary to ensure that the resulting table is also a relation.

Notation:

$\pi_{\langle\text{selected attributes}\rangle}(\langle\text{relation name}\rangle)$

Example: To display the name, skill, and skill level of each employee, the query and the resulting relation will be:

$\pi_{\text{FName}, \text{LName}, \text{Skill}, \text{SkillLevel}}(\text{Employee})$

FName  LName   Skill  SkillLevel
Abebe  Mekuria SQL    5
Lemma  Alemu   C++    6
Chane  Kebede  SQL    10
Abera  Taye    VB6    8
Almaz  Belay   SQL    9
Dereje Tamiru  Oracle 5
Selam  Belay   Prolog 8
Alem   Kebede  Cisco  7
Girma  Dereje  IP     4
Yared  Gizaw   Java   6
If we want the name, skill, and skill level of the employees with Skill SQL and SkillLevel greater than 5, the query will be:

$\pi_{\text{FName}, \text{LName}, \text{Skill}, \text{SkillLevel}}(\sigma_{\text{Skill} = \text{"SQL"} \wedge \text{SkillLevel} > 5}(\text{Employee}))$

FName  LName   Skill  SkillLevel
Chane  Kebede  SQL    10
Almaz  Belay   SQL    9
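Below is a sketch of projection, including the duplicate elimination that the formal operator requires. It uses the same list-of-dictionaries representation (an assumption) and the Employee sample table reduced to the columns needed here. `select` and `project` are illustrative helpers. Composing them reproduces the query above for SQL skills with SkillLevel greater than 5.

```python
COLUMNS = ("EmpID", "FName", "LName", "Skill", "SkillType", "SkillLevel")
ROWS = [
    (12, "Abebe", "Mekuria", "SQL", "Database", 5),
    (16, "Lemma", "Alemu", "C++", "Programming", 6),
    (28, "Chane", "Kebede", "SQL", "Database", 10),
    (25, "Abera", "Taye", "VB6", "Programming", 8),
    (65, "Almaz", "Belay", "SQL", "Database", 9),
    (24, "Dereje", "Tamiru", "Oracle", "Database", 5),
    (51, "Selam", "Belay", "Prolog", "Programming", 8),
    (94, "Alem", "Kebede", "Cisco", "Networking", 7),
    (18, "Girma", "Dereje", "IP", "Programming", 4),
    (13, "Yared", "Gizaw", "Java", "Programming", 6),
]
EMPLOYEE = [dict(zip(COLUMNS, row)) for row in ROWS]


def select(relation, condition):
    """σ: keep the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]


def project(relation, attributes):
    """π: keep only the listed attributes, then drop duplicate tuples (first occurrence wins)."""
    seen, result = set(), []
    for t in relation:
        values = tuple(t[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


names = project(EMPLOYEE, ["FName", "LName", "Skill", "SkillLevel"])
skill_types = project(EMPLOYEE, ["Skill", "SkillType"])  # ("SQL", "Database") occurs 3 times in the input
sql_experts = project(select(EMPLOYEE, lambda t: t["Skill"] == "SQL" and t["SkillLevel"] > 5),
                      ["FName", "LName", "Skill", "SkillLevel"])

print(len(names), len(skill_types))  # 10 8
print(sql_experts)
```

Projecting on (Skill, SkillType) shows why duplicate elimination matters: the SQL/Database pair appears in three input tuples but only once in the result.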

3. Rename Operation
We may want to apply several relational algebra operations one after the other. The query can be written in two different forms:
1. Write the operations as a single relational algebra expression by nesting the operations.
2. Apply one operation at a time and create intermediate result relations. In the latter case, we must give names to the relations that hold the intermediate results; this is the role of the Rename operation.
If we want the name, skill, and skill level of the employees with Skill SQL and SkillLevel greater than 5, we can write the expression for this query using either alternative:

1. A single algebraic expression:
The query used above is a single algebraic expression:

$\pi_{\text{FName}, \text{LName}, \text{Skill}, \text{SkillLevel}}(\sigma_{\text{Skill} = \text{"SQL"} \wedge \text{SkillLevel} > 5}(\text{Employee}))$

2. Using an intermediate relation through the Rename operation:

Step 1: $\text{Result1} \leftarrow \sigma_{\text{Skill} = \text{"SQL"} \wedge \text{SkillLevel} > 5}(\text{Employee})$
Step 2: $\text{Result} \leftarrow \pi_{\text{FName}, \text{LName}, \text{Skill}, \text{SkillLevel}}(\text{Result1})$

Result will then be equivalent to the relation obtained with the first alternative.

4. Set Operations
The three main set operations are Union, Intersection and Set Difference. The properties of these set operations are similar to the corresponding concepts in mathematical set theory. The difference is that, in the database context, the elements of each set (each set being a relation) are tuples. The set operations are binary operations which require the two operand relations to be type compatible.

Type Compatibility
Two relations R1 and R2 are said to be Type Compatible if:
1. The operand relations $R_1(A_1, A_2, \ldots, A_n)$ and $R_2(B_1, B_2, \ldots, B_n)$ have the same number of attributes, and
2. The domains of corresponding attributes must be compatible; that is, $\text{dom}(A_i) = \text{dom}(B_i)$ for $i = 1, 2, \ldots, n$.
To illustrate the three set operations, we will make use of the following two tables:

Employee
EmpID FName  LName   SkillID Skill  SkillType   School SkillLevel
12    Abebe  Mekuria 2       SQL    Database    AAU    5
16    Lemma  Alemu   5       C++    Programming Unity  6
28    Chane  Kebede  2       SQL    Database    AAU    10
25    Abera  Taye    6       VB6    Programming Helico 8
65    Almaz  Belay   2       SQL    Database    Helico 9
24    Dereje Tamiru  8       Oracle Database    Unity  5
51    Selam  Belay   4       Prolog Programming Jimma  8
94    Alem   Kebede  3       Cisco  Networking  AAU    7
18    Girma  Dereje  1       IP     Programming Jimma  4
13    Yared  Gizaw   7       Java   Programming AAU    6


RelationOne: Employees who attend a Database course

EmpID FName  LName   SkillID Skill  SkillType   School SkillLevel
12    Abebe  Mekuria 2       SQL    Database    AAU    5
28    Chane  Kebede  2       SQL    Database    AAU    10
65    Almaz  Belay   2       SQL    Database    Helico 9
24    Dereje Tamiru  8       Oracle Database    Unity  5

RelationTwo: Employees who attend a course at AAU

EmpID FName  LName   SkillID Skill  SkillType   School SkillLevel
12    Abebe  Mekuria 2       SQL    Database    AAU    5
94    Alem   Kebede  3       Cisco  Networking  AAU    7
28    Chane  Kebede  2       SQL    Database    AAU    10
13    Yared  Gizaw   7       Java   Programming AAU    6

a. UNION Operation
The result of this operation, denoted by $R \cup S$, is a relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated.
The two operands must be "type compatible".

Eg: $\text{RelationOne} \cup \text{RelationTwo}$

Employees who attend a Database course at any school or who attend any course at AAU

EmpID FName  LName   SkillID Skill  SkillType   School SkillLevel
12    Abebe  Mekuria 2       SQL    Database    AAU    5
28    Chane  Kebede  2       SQL    Database    AAU    10
65    Almaz  Belay   2       SQL    Database    Helico 9
24    Dereje Tamiru  8       Oracle Database    Unity  5
94    Alem   Kebede  3       Cisco  Networking  AAU    7
13    Yared  Gizaw   7       Java   Programming AAU    6


b. INTERSECTION Operation
The result of this operation, denoted by $R \cap S$, is a relation that includes all tuples that are in both R and S. The two operands must be "type compatible".

Eg: $\text{RelationOne} \cap \text{RelationTwo}$

Employees who attend a Database course at AAU

EmpID FName  LName   SkillID Skill  SkillType   School SkillLevel
12    Abebe  Mekuria 2       SQL    Database    AAU    5
28    Chane  Kebede  2       SQL    Database    AAU    10

c. Set Difference (or MINUS) Operation

The result of this operation, denoted by $R - S$, is a relation that includes all tuples that are in R but not in S.
The two operands must be "type compatible".

Eg: $\text{RelationOne} - \text{RelationTwo}$

Employees who attend a Database course but didn't take any course at AAU

EmpID FName  LName   SkillID Skill  SkillType   School SkillLevel
65    Almaz  Belay   2       SQL    Database    Helico 9
24    Dereje Tamiru  8       Oracle Database    Unity  5

Eg: $\text{RelationTwo} - \text{RelationOne}$

Employees who take a course at AAU but didn't attend any Database course

EmpID FName  LName   SkillID Skill  SkillType   School SkillLevel
94    Alem   Kebede  3       Cisco  Networking  AAU    7
13    Yared  Gizaw   7       Java   Programming AAU    6

The resulting relation for $R_1 \cup R_2$, $R_1 \cap R_2$, or $R_1 - R_2$ has the same attribute names as the first operand relation $R_1$ (by convention).
Some Properties of the Set Operators
Notice that both union and intersection are commutative operations; that is,

$R \cup S = S \cup R$, and $R \cap S = S \cap R$

Both union and intersection can be treated as n-ary operations applicable to any number of relations, as both are associative operations; that is,

$R \cup (S \cup T) = (R \cup S) \cup T$, and $(R \cap S) \cap T = R \cap (S \cap T)$

The minus operation is not commutative; that is, in general,

$R - S \neq S - R$
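The set operations above, including the type-compatibility check, can be sketched with Python sets of tuples. For illustration, a relation is represented here as a pair (attribute names, set of tuples). Since real domains are not available, equal domains are approximated, as an assumption, by requiring each column to hold values of a single Python type. The result keeps the first operand's attribute names, following the convention stated above.

```python
ATTRS = ("EmpID", "FName", "LName", "SkillID", "Skill", "SkillType", "School", "SkillLevel")
RELATION_ONE = (ATTRS, {  # employees who attend a Database course
    (12, "Abebe", "Mekuria", 2, "SQL", "Database", "AAU", 5),
    (28, "Chane", "Kebede", 2, "SQL", "Database", "AAU", 10),
    (65, "Almaz", "Belay", 2, "SQL", "Database", "Helico", 9),
    (24, "Dereje", "Tamiru", 8, "Oracle", "Database", "Unity", 5),
})
RELATION_TWO = (ATTRS, {  # employees who attend a course at AAU
    (12, "Abebe", "Mekuria", 2, "SQL", "Database", "AAU", 5),
    (94, "Alem", "Kebede", 3, "Cisco", "Networking", "AAU", 7),
    (28, "Chane", "Kebede", 2, "SQL", "Database", "AAU", 10),
    (13, "Yared", "Gizaw", 7, "Java", "Programming", "AAU", 6),
})


def type_compatible(r, s):
    """Same degree, and (as a stand-in for equal domains) one Python type per column."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    if len(r_attrs) != len(s_attrs):
        return False
    rows = r_rows | s_rows
    return all(len({type(t[i]) for t in rows}) <= 1 for i in range(len(r_attrs)))


def _set_op(r, s, op):
    if not type_compatible(r, s):
        raise ValueError("operands are not type compatible")
    return (r[0], op(r[1], s[1]))  # result keeps R's attribute names by convention


def union(r, s):
    return _set_op(r, s, set.union)


def intersection(r, s):
    return _set_op(r, s, set.intersection)


def difference(r, s):
    return _set_op(r, s, set.difference)


def emp_ids(relation):
    return sorted(t[0] for t in relation[1])


print(emp_ids(union(RELATION_ONE, RELATION_TWO)))         # [12, 13, 24, 28, 65, 94]
print(emp_ids(intersection(RELATION_ONE, RELATION_TWO)))  # [12, 28]
print(emp_ids(difference(RELATION_ONE, RELATION_TWO)))    # [24, 65]
print(emp_ids(difference(RELATION_TWO, RELATION_ONE)))    # [13, 94]
```

The last two lines give different results, which illustrates that set difference is not commutative. Because Python sets are used, duplicate tuples disappear automatically in the union.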
5. CARTESIAN (Cross Product) Operation
This operation is used to combine tuples from two relations in a combinatorial fashion. That means every tuple in Relation1 (R) will be related to every tuple in Relation2 (S).
• In general, the result of $R(A_1, A_2, \ldots, A_n) \times S(B_1, B_2, \ldots, B_m)$ is a relation Q of degree $n + m$ with attributes $Q(A_1, A_2, \ldots, A_n, B_1, B_2, \ldots, B_m)$, in that order,
• where R has n attributes and S has m attributes.
• The resulting relation Q has one tuple for each combination of tuples: one from R and one from S.
• Hence, if R has $n_R$ tuples and S has $n_S$ tuples, then $|R \times S| = n_R \cdot n_S$ tuples.
Example:
Employee
ID  FName LName
123 Abebe Lemma
567 Belay Taye
822 Kefle Kebede

Dept
DeptID DeptName  MangID
2      Finance   567
3      Personnel 123
Then the Cartesian product between the Employee and Dept relations will be of the form:

$\text{Employee} \times \text{Dept}$:
ID  FName LName  DeptID DeptName  MangID
123 Abebe Lemma  2      Finance   567
123 Abebe Lemma  3      Personnel 123
567 Belay Taye   2      Finance   567
567 Belay Taye   3      Personnel 123
822 Kefle Kebede 2      Finance   567
822 Kefle Kebede 3      Personnel 123


Even though it is very important in query processing, the Cartesian product is not useful by itself, since it relates every tuple in the first relation with every tuple in the second relation. Thus, to make use of the Cartesian product, one has to use it with the Selection operation, which discriminates among tuples by testing whether each one satisfies the selection condition.
In our example, to extract employee information about the managers of the departments, the algebra query and the resulting relation will be:

$\pi_{\text{ID}, \text{FName}, \text{LName}, \text{DeptName}}(\sigma_{\text{ID} = \text{MangID}}(\text{Employee} \times \text{Dept}))$

ID  FName LName DeptName
123 Abebe Lemma Personnel
567 Belay Taye  Finance
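The Cartesian product, and the selection and projection that follow it, can be traced with a small Python sketch. For illustration, a relation is represented as (attribute names, list of tuples), and `cross`, `select` and `project` are hypothetical helpers. `itertools.product` generates exactly one pair for each combination of a tuple from R with a tuple from S.

```python
from itertools import product

# A relation is modelled as (attribute names, list of tuples).
EMPLOYEE = (("ID", "FName", "LName"),
            [(123, "Abebe", "Lemma"), (567, "Belay", "Taye"), (822, "Kefle", "Kebede")])
DEPT = (("DeptID", "DeptName", "MangID"),
        [(2, "Finance", 567), (3, "Personnel", 123)])


def cross(r, s):
    """R × S: every tuple of R concatenated with every tuple of S."""
    return (r[0] + s[0], [a + b for a, b in product(r[1], s[1])])


def select(rel, condition):
    """σ over this representation: the condition receives the tuple as a dict."""
    attrs, rows = rel
    return (attrs, [t for t in rows if condition(dict(zip(attrs, t)))])


def project(rel, wanted):
    """π with duplicate elimination (a list scan is fine for tiny examples)."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in wanted]
    out = []
    for t in rows:
        values = tuple(t[i] for i in idx)
        if values not in out:
            out.append(values)
    return (tuple(wanted), out)


prod = cross(EMPLOYEE, DEPT)
print(len(prod[0]), len(prod[1]))  # 6 attributes (3 + 3), 6 tuples (3 * 2)
managers = project(select(prod, lambda t: t["ID"] == t["MangID"]),
                   ["ID", "FName", "LName", "DeptName"])
print(managers[1])  # [(123, 'Abebe', 'Lemma', 'Personnel'), (567, 'Belay', 'Taye', 'Finance')]
```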


6. JOIN Operation
The sequence of Cartesian product followed by select is used quite commonly to identify and
select related tuples from two relations, a special operation, called JOIN. Thus in JOIN
operation, the Cartesian Operation and the Selection Operations are used together.

JOIN Operation is denoted by a symbol.


This operation is very important for any relational database with more than a single relation,
because it allows us to process relationships among relations.
The general form of a join operation on two relations R(A1, A2,. .
., An) and S(B1, B2, . . ., Bm) is:

R <join condition> S is equivalent to <


selection condition>(R X S)
where <join condition> and <selection condition> are the same
Where, R and S can be any relation that results from general relational algebra expressions.
Since JOIN is an operation that needs two relation, it is a Binary operation.
This type of JOIN is called a THETA JOIN ( - JOIN)
Where is the logical operator used in the join condition.

Could be { <, , >, , ,=}


Example:
Thus, in the above example we want to extract employee information about managers of the
departments, the algebra query using the JOIN operation will be.

Employee < ID=MangID> Dept


a. EQUIJOIN Operation
The most common use of join involves join conditions with equality comparisons only (=).
Such a join, where the only comparison operator used is =, is called an EQUIJOIN. In the result of
an EQUIJOIN, we always have one or more pairs of attributes (whose names need not be identical)
that have identical values in every tuple, since the equality operator was used.
For example, the JOIN expression above is an EQUIJOIN, since the logical operator used is the
equal-to operator (=).

b. NATURAL JOIN Operation
In an EQUIJOIN, one attribute of each pair of attributes with identical values is redundant. A new
operation, called NATURAL JOIN, was created to get rid of the second (extra) attribute that appears
in the result of an EQUIJOIN.
The standard definition of natural join requires that the two join attributes, or each pair of
corresponding join attributes, have the same name in both relations. If this is not the case, a
renaming operation on the attributes is applied first.
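The rename-then-join procedure just described can be sketched in Python as follows. This assumes relations stored as lists of dictionaries, and the helper names `rename` and `natural_join` are hypothetical. Tuples are joined on every attribute name the two relations share, and each shared attribute is kept only once. If the relations share no attribute name, the result degenerates to the Cartesian product.

```python
def rename(relation, old, new):
    """rho: rename attribute `old` to `new` in every tuple."""
    return [{(new if k == old else k): v for k, v in t.items()} for t in relation]

def natural_join(r, s):
    """Equijoin on all attributes with the same name, keeping one copy of each."""
    result = []
    for t1 in r:
        for t2 in s:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})   # shared attributes appear only once
    return result

# Hypothetical sample relations (not from the notes).
employee = [
    {"ID": 123, "FName": "Abebe"},
    {"ID": 567, "FName": "Belay"},
    {"ID": 345, "FName": "Hana"},
]
dept = [
    {"DeptName": "Personnel", "MangID": 123},
    {"DeptName": "Finance", "MangID": 567},
]

# MangID and ID have different names, so rename first, then join on ID.
joined = natural_join(employee, rename(dept, "MangID", "ID"))
```

Each resulting tuple has the attributes ID, FName and DeptName. Unlike the EQUIJOIN, there is no second, duplicate ID/MangID column.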
c. OUTER JOIN Operation
OUTER JOIN is another version of the JOIN operation, where non-matching tuples from a relation are
also included in the result, with NULL values for the attributes of the other relation.
There are two major types of OUTER JOIN:
1. RIGHT OUTER JOIN: non-matching tuples from the second (right) relation are included in the
result with NULL values for the attributes of the first (left) relation.
2. LEFT OUTER JOIN: non-matching tuples from the first (left) relation are included in the result
with NULL values for the attributes of the second (right) relation.
Notation for Left Outer Join:

$$R \;⟕_{\langle join\ condition \rangle}\; S$$

When two relations are joined by a JOIN operator, some tuples in the first relation may have no
matching tuple in the second relation (and vice versa), yet the query may need to display these
non-matching tuples from the first or second relation. Such a query is expressed with an OUTER
JOIN.
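The two outer joins can be sketched in Python as follows. The sketch uses Python's `None` as a stand-in for NULL, and the sample relations are hypothetical. Hana manages no department and the Sales department has no matching manager tuple, so each one appears only in one of the outer joins. The right outer join is obtained by swapping the operands of the left outer join.

```python
def left_outer_join(r, s, cond, s_attrs):
    """Keep every tuple of r; pad with None (NULL) when no tuple of s matches."""
    result = []
    for t1 in r:
        matches = [{**t1, **t2} for t2 in s if cond(t1, t2)]
        if matches:
            result.extend(matches)
        else:
            result.append({**t1, **{a: None for a in s_attrs}})
    return result

def right_outer_join(r, s, cond, r_attrs):
    """Keep every tuple of s; pad with None when no tuple of r matches."""
    # Swap the operands and reuse the left outer join (the condition is swapped too).
    return left_outer_join(s, r, lambda t_s, t_r: cond(t_r, t_s), r_attrs)

# Hypothetical sample relations (not from the notes).
employee = [
    {"ID": 123, "FName": "Abebe"},
    {"ID": 345, "FName": "Hana"},
    {"ID": 567, "FName": "Belay"},
]
dept = [
    {"DeptName": "Personnel", "MangID": 123},
    {"DeptName": "Finance", "MangID": 567},
    {"DeptName": "Sales", "MangID": 999},
]

def is_manager(e, d):
    return e["ID"] == d["MangID"]

left = left_outer_join(employee, dept, is_manager, ["DeptName", "MangID"])
right = right_outer_join(employee, dept, is_manager, ["ID", "FName"])
```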

d. SEMIJOIN Operation
SEMIJOIN is another version of the JOIN operation, in which the resulting relation contains only the
attributes of one of the relations, restricted to the tuples that are related to tuples in the
other relation. The following notation depicts including in the result only the attributes of the
first relation (R), for those tuples of R that actually participate in the relationship:

$$R \ltimes_{\langle join\ condition \rangle} S$$
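The semijoin just defined can be sketched as a filter on the first relation. This is a minimal Python illustration with hypothetical sample data. Each tuple of R is kept at most once, no matter how many tuples of S it matches, and none of S's attributes are added.

```python
def semijoin(r, s, cond):
    """R semijoin S: tuples of r (r's attributes only) matching at least one tuple of s."""
    return [t1 for t1 in r if any(cond(t1, t2) for t2 in s)]

# Hypothetical sample relations (not from the notes).
employee = [
    {"ID": 123, "FName": "Abebe"},
    {"ID": 345, "FName": "Hana"},
    {"ID": 567, "FName": "Belay"},
]
dept = [
    {"DeptName": "Personnel", "MangID": 123},
    {"DeptName": "Finance", "MangID": 567},
]

# Employees who manage some department, with Employee's attributes only.
managers_only = semijoin(employee, dept, lambda e, d: e["ID"] == d["MangID"])
```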

Relational Calculus
A relational calculus expression creates a new relation, which is specified in terms of
variables that range over rows of the stored database relations (in tuple calculus) or over
columns of the stored relations (in domain calculus).
In a calculus expression, there is no order of operations to specify how to retrieve the query
result. A calculus expression specifies only what information the result should contain rather
than how to retrieve it.

In relational calculus, there is no description of how to evaluate a query; this is the main
distinguishing feature between relational algebra and relational calculus.
Relational calculus is considered a nonprocedural language. This differs from relational algebra,
where we must write a sequence of operations to specify a retrieval request; hence relational
algebra can be considered a procedural way of stating a query.
When applied to relational databases, the calculus is not the calculus of derivatives and
differentials but a form of first-order logic, or predicate calculus. A predicate is a truth-valued
function with arguments.
When we substitute values for the arguments in the predicate, the function yields an expression,
called a proposition, which can be either true or false.
If a predicate contains a variable, as in ‘x is a member of staff’, there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for other values,
it may be false.
If COND is a predicate, then the set of all tuples for which COND evaluates to true is expressed
as follows:
{t | COND(t)}
where t is a tuple variable and COND(t) is a conditional expression involving t. The result of such
a query is the set of all tuples t that satisfy COND(t).

If we have a set of predicates to evaluate for a single query, the predicates can be connected
using ∧ (AND), ∨ (OR), and ¬ (NOT).

Tuple-oriented Relational Calculus
➢ The tuple relational calculus is based on specifying a number of tuple variables. Each
tuple variable usually ranges over a particular database relation, meaning that the variable
may take as its value any individual tuple from that relation.
➢ Tuple relational calculus is interested in finding the tuples of a relation for which a
predicate is true. It is based on the use of tuple variables.
➢ A tuple variable is a variable that ‘ranges over’ a named relation: that is, a variable whose
only permitted values are tuples of that relation.
➢ If E is a tuple variable that ranges over the relation Employee, this is written
Employee(E), i.e. the range of E is Employee.
➢ Then, to extract all tuples that satisfy a certain condition, we write the set of all
tuples E such that COND(E) evaluates to true:

{E | COND(E)}
The predicates can be connected using the Boolean operators:
∧ (AND), ∨ (OR), ¬ (NOT)
COND(t) is a formula, and is called a Well-Formed Formula (WFF) if:
➢ COND is composed of n single predicates, and the predicates are connected by any of
the Boolean operators.
➢ Each predicate is of the form A θ B, where θ is one of the logical operators
{ <, ≤, >, ≥, ≠, = }, so that the predicate evaluates to either true or false, and A and B are
either constants or variables.
➢ Formulae should be unambiguous and should make sense.

Example (Tuple Relational Calculus)

Extract all employees whose skill level is greater than or equal to 8:
{E | Employee(E) ∧ E.SkillLevel >= 8}

| EmpID | FName | LName  | SkillID | Skill  | SkillType   | School | SchoolAdd   | SkillLevel |
|-------|-------|--------|---------|--------|-------------|--------|-------------|------------|
| 28    | Chane | Kebede | 2       | SQL    | Database    | AAU    | Sidist_Kilo | 10         |
| 25    | Abera | Taye   | 6       | VB6    | Programming | Helico | Piazza      | 8          |
| 65    | Almaz | Belay  | 2       | SQL    | Database    | Helico | Piazza      | 9          |
| 51    | Selam | Belay  | 4       | Prolog | Programming | Jimma  | Jimma City  | 8          |

➢ To find only the EmpID, FName, LName, Skill and School (where the skill was acquired) of
employees with a skill level greater than or equal to 8, the tuple relational calculus
expression will be:
{E.EmpID, E.FName, E.LName, E.Skill, E.School | Employee(E) ∧ E.SkillLevel >= 8}

| EmpID | FName | LName  | Skill  | School |
|-------|-------|--------|--------|--------|
| 28    | Chane | Kebede | SQL    | AAU    |
| 25    | Abera | Taye   | VB6    | Helico |
| 65    | Almaz | Belay  | SQL    | Helico |
| 51    | Selam | Belay  | Prolog | Jimma  |
➢ E.FName means the value of the First Name (FName) attribute for the tuple E.
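The set-builder notation {E | COND(E)} maps closely onto a Python comprehension: the tuple variable E ranges over the relation, and the condition filters it. The sketch below copies the example table as namedtuples. One extra row (EmpID 99, "Tigist Alemu", skill level 5) is hypothetical and is not in the notes; it is added only so that the condition has something to exclude. Lists are used instead of sets so that row order is predictable.

```python
from collections import namedtuple

Employee = namedtuple(
    "Employee",
    "EmpID FName LName SkillID Skill SkillType School SchoolAdd SkillLevel",
)

employee = [
    Employee(28, "Chane", "Kebede", 2, "SQL", "Database", "AAU", "Sidist_Kilo", 10),
    Employee(25, "Abera", "Taye", 6, "VB6", "Programming", "Helico", "Piazza", 8),
    Employee(65, "Almaz", "Belay", 2, "SQL", "Database", "Helico", "Piazza", 9),
    Employee(51, "Selam", "Belay", 4, "Prolog", "Programming", "Jimma", "Jimma City", 8),
    # Hypothetical extra row (not in the notes) with a skill level below 8.
    Employee(99, "Tigist", "Alemu", 6, "VB6", "Programming", "Helico", "Piazza", 5),
]

# {E | Employee(E) AND E.SkillLevel >= 8}
skilled = [E for E in employee if E.SkillLevel >= 8]

# {E.EmpID, E.FName, E.LName, E.Skill, E.School | Employee(E) AND E.SkillLevel >= 8}
skilled_summary = [
    (E.EmpID, E.FName, E.LName, E.Skill, E.School)
    for E in employee
    if E.SkillLevel >= 8
]
```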

Quantifiers in Relational Calculus
➢ To tell how many instances the predicate applies to, we can use the two quantifiers
of predicate logic.
➢ Any relational calculus expression written using the existential quantifier can also be
written using the universal quantifier, and vice versa, since (∃x)P(x) ≡ ¬(∀x)(¬P(x)).

1. Existential quantifier (‘there exists’, ∃)
The existential quantifier is used in formulae that must be true for at least one instance,
such as:
An employee with skill level greater than or equal to 8 will be:
{E | Employee(E) ∧ (∃E)(E.SkillLevel >= 8)}
This means that there exists at least one tuple of the relation Employee where the value of
SkillLevel is greater than or equal to 8.
2. Universal quantifier (‘for all’, ∀)
The universal quantifier is used in statements about every instance, such as:
An employee with skill level greater than or equal to 8 will be:
{E | Employee(E) ∧ (∀E)(E.SkillLevel >= 8)}
This means that, for all tuples of the relation Employee, the value of the SkillLevel attribute
is greater than or equal to 8.
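Over a finite relation, the two quantifiers behave like Python's built-in `any()` and `all()`. The sketch below uses the SkillLevel column of the example table. A second, hypothetical column with one low value shows a case where the two quantifiers disagree. The helper names `exists`, `for_all` and `exists_via_for_all` are illustrative only; the last one checks the equivalence (∃x)P(x) ≡ ¬(∀x)(¬P(x)).

```python
def exists(predicate, relation):
    """(EXISTS E)(predicate(E)): true if at least one tuple satisfies the predicate."""
    return any(predicate(t) for t in relation)

def for_all(predicate, relation):
    """(FORALL E)(predicate(E)): true only if every tuple satisfies the predicate."""
    return all(predicate(t) for t in relation)

def exists_via_for_all(predicate, relation):
    """(EXISTS E)P(E) rewritten as NOT (FORALL E)(NOT P(E))."""
    return not for_all(lambda t: not predicate(t), relation)

def at_least_8(level):
    return level >= 8

skill_levels = [10, 8, 9, 8]   # SkillLevel column of the example table
mixed_levels = [10, 5]         # hypothetical column with one low value
```

On an empty relation, `for_all` is vacuously true and `exists` is false. This matches the usual semantics of ∀ and ∃.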
Example:
Let’s say that we have the following schema (set of relations):
Employee(EID, FName, LName, Dept)
Project(PID, PName, Dept)
Dept(DID, DName, DMangID)
WorksOn(EID, PID)
To find the employees who work on projects controlled by department 5, the query will be:
{E | Employee(E) ∧ (∃P)(Project(P) ∧ (∃W)(WorksOn(W) ∧ P.Dept = 5 ∧ W.PID = P.PID ∧ E.EID = W.EID))}
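The nested existential quantifiers in this query translate naturally into nested `any()` calls. The sketch below follows the schema above, but all tuples are hypothetical sample data, not taken from the notes.

```python
# Hypothetical sample data for the schema above (not from the notes).
employees = [
    {"EID": 1, "FName": "Abebe"},
    {"EID": 2, "FName": "Belay"},
    {"EID": 3, "FName": "Hana"},
]
projects = [
    {"PID": 10, "PName": "Payroll", "Dept": 5},
    {"PID": 20, "PName": "Website", "Dept": 4},
]
works_on = [
    {"EID": 1, "PID": 10},
    {"EID": 2, "PID": 20},
    {"EID": 3, "PID": 10},
    {"EID": 3, "PID": 20},
]

# {E | Employee(E) AND (EXISTS P)(Project(P) AND (EXISTS W)(WorksOn(W)
#       AND P.Dept = 5 AND W.PID = P.PID AND E.EID = W.EID))}
result = [
    E for E in employees
    if any(
        P["Dept"] == 5
        and any(W["PID"] == P["PID"] and W["EID"] == E["EID"] for W in works_on)
        for P in projects
    )
]
```

The condition W.PID = P.PID ties each WorksOn tuple to the project being tested. Without it, Belay, who works only on a department 4 project, would also be returned, because department 5 controls some project.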

CHAPTER SEVEN
ADVANCED CONCEPTS IN DATABASE SYSTEMS
Database Security and Integrity
Distributed Database Systems
Data warehousing
1. Database Security and Integrity
A database represents an essential corporate resource that should be properly secured using
appropriate controls.
Database security encompasses hardware, software, people and data.
In a multi-user database system, the DBMS must provide a database security and authorization
subsystem to enforce limits on individual and group access rights and privileges.
Database security and integrity is about protecting the database from becoming inconsistent and
from being disrupted; threats of this kind are collectively called database misuse.
Database misuse can be intentional or accidental, and accidental misuse is easier to cope with than
intentional misuse. Accidental inconsistency could occur due to:
➢ System crash during transaction processing
➢ Anomalies due to concurrent access
➢ Anomalies due to redundancy
➢ Logical errors
Likewise, although many threats fall into the intentional category, intentional misuse typically
takes one of the following forms:
➢ Unauthorized reading of data
➢ Unauthorized modification of data
➢ Unauthorized destruction of data
Most systems implement good database integrity to protect the system from accidental misuse, while
the many computer-based measures that protect the system from intentional misuse are termed
database security measures.
Database security is considered in relation to the following situations:
➢ Theft and fraud
➢ Loss of confidentiality (secrecy)
➢ Loss of privacy
➢ Loss of integrity
➢ Loss of availability
Security Issues and general considerations
• Legal, ethical and social issues regarding the right to access information
• Physical control
• Policy issues regarding privacy at the individual, enterprise and national levels
• Operational considerations on the techniques used (passwords, etc.)
• System-level security, including operating system and hardware control
• Security levels and security policies at the enterprise level
• Database security: the mechanisms that protect the database against intentional or accidental
threats. Database security encompasses hardware, software, people and data.
• Threat: any situation or event, whether intentional or accidental, that may adversely affect a
system and consequently the organization
• A threat may be caused by a situation or event involving a person, action, or circumstance that
is likely to bring harm to an organization
• The harm to an organization may be tangible or intangible:
Tangible – loss of hardware, software, or data
Intangible – loss of credibility or client confidence
Examples of threats:
✓ Using another person’s means of access
✓ Unauthorized amendment/modification or copying of data
✓ Program alteration
✓ Inadequate policies and procedures that allow a mix of confidential and normal output
✓ Wire-tapping
✓ Illegal entry by a hacker
✓ Blackmail
✓ Creating a ‘trapdoor’ into the system
✓ Theft of data, programs, and equipment
✓ Failure of security mechanisms, giving greater access than normal
✓ Staff shortages or strikes
✓ Inadequate staff training
✓ Viewing and disclosing unauthorized data
✓ Electronic interference and radiation
✓ Data corruption owing to power loss or surge
✓ Fire (electrical fault, lightning strike, arson), flood, bomb
✓ Physical damage to equipment
✓ Breaking or disconnection of cables
✓ Introduction of viruses

Levels of Security Measures
Security measures can be implemented at several levels and for different components of the
system. These levels are:
1. Physical Level: the site containing the computer system should be physically secured. The
backup systems should also be physically protected from access by anyone except authorized users.
2. Human Level: concerned with authorizing database users to access the content at different
levels and with different privileges.
3. Operating System: concerned with the weaknesses and strengths of the operating system’s security
on data files. A weakness may serve as a means of unauthorized access to the database. This level
also includes protecting data in primary and secondary memory from unauthorized access.

4. Database System: concerned with the data access limits enforced by the database system, such as
passwords, isolated transactions, etc.
Even though we can have different levels of security and authorization on data objects and users,
who accesses which data is a policy matter rather than a technical one.
These policies:
➢ should be known by the system: they should be encoded in the system
➢ should be remembered: they should be saved somewhere (the catalogue)

• An organization needs to identify the types of threat it may be subjected to and initiate
appropriate plans and countermeasures, bearing in mind the costs of implementing them.

Countermeasures: Computer based controls


• The types of countermeasures to threats on computer systems range from physical controls
to administrative procedures
• Despite the range of computer-based controls that are available, it is worth noting that,
generally, the security of a DBMS is only as good as that of the operating system, owing to
their close association
• The following are computer-based security controls for a multi-user environment:

Authorization
▪ The granting of a right or privilege that enables a subject to have legitimate access
to a system or a system’s object
▪ Authorization controls can be built into the software, and govern not only what
system or object a specified user can access, but also what the user may do with
it
▪ Authorization controls are sometimes referred to as access controls
▪ The process of authorization involves authentication of subjects (i.e. a user or
program) requesting access to objects (i.e. a database table, view, procedure,
trigger, or any other object that can be created within the system)

Views
▪ A view is the dynamic result of one or more relational operations on the base relations to
produce another relation
▪ A view is a virtual relation that does not actually exist in the database, but is
produced upon request by a particular user
▪ The view mechanism provides a powerful and flexible security mechanism by
hiding parts of the database from certain users
▪ Using a view is more restrictive than simply having certain privileges granted to
a user on the base relation(s)
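The hiding effect of a view can be demonstrated with Python's built-in `sqlite3` module. In this sketch the Employee table, its Salary column and the view name EmployeeDirectory are all hypothetical. One caveat applies: SQLite has no user accounts or GRANT statement. In a multi-user DBMS, the view would be combined with granting users SELECT on the view but not on the base relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, FName TEXT, LName TEXT, Salary REAL);
    INSERT INTO Employee VALUES (123, 'Abebe', 'Lemma', 9000), (567, 'Belay', 'Taye', 12000);
    -- The view exposes names only; the Salary column of the base relation stays hidden.
    CREATE VIEW EmployeeDirectory AS SELECT ID, FName, LName FROM Employee;
""")

# The view is virtual: its rows are produced from Employee each time it is queried.
rows = conn.execute("SELECT * FROM EmployeeDirectory ORDER BY ID").fetchall()
columns = [d[0] for d in conn.execute("SELECT * FROM EmployeeDirectory").description]
```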

Integrity
▪ Integrity constraints contribute to maintaining a secure database system by
preventing data from becoming invalid and hence giving misleading or incorrect
results
▪ Domain integrity
▪ Entity integrity
▪ Referential integrity
▪ Key constraints

Backup and recovery


▪ Backup is the process of periodically taking a copy of the database and log
file (and possibly programs) on to offline storage media
▪ A DBMS should provide backup facilities to assist with the recovery of a
database following failure
▪ Database recovery is the process of restoring the database to a correct state
in the event of a failure
▪ Journaling is the process of keeping and maintaining a log file (or journal)
of all changes made to the database to enable recovery to be undertaken
effectively in the event of a failure

▪ The advantage of journaling is that, in the event of a failure, the database
can be recovered to its last known consistent state using a backup copy of
the database and the information contained in the log file
▪ If no journaling is enabled on a failed system, the only means of recovery
is to restore the database using the latest backup version of the database
▪ However, without a log file, any changes made after the last backup to the
database will be lost
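The difference between recovery with and without a journal can be sketched in Python. The sketch makes several simplifying assumptions. The database is a dictionary, and the log is a list of (sequence number, key, new value) entries that are assumed to be committed changes only, so recovery simply redoes them. The backup records the sequence number of the last change it contains. All of these names and structures are hypothetical; real DBMS logs also support undo, checkpoints and more.

```python
import copy

def recover(backup, log, backup_lsn):
    """Restore the backup copy, then redo every logged change made after it was taken."""
    db = copy.deepcopy(backup)                 # 1. restore from the backup copy
    for lsn, key, new_value in log:            # 2. replay the journal in order
        if lsn > backup_lsn:                   #    skip changes already in the backup
            db[key] = new_value
    return db

# Hypothetical state: the backup was taken after log entry 2.
backup = {"A": 100, "B": 200}
log = [(1, "A", 100), (2, "B", 200), (3, "A", 150), (4, "C", 50)]

recovered = recover(backup, log, backup_lsn=2)   # backup plus journal
restore_only = copy.deepcopy(backup)             # no journal: entries 3 and 4 are lost
```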

Encryption
▪ The encoding of the data by a special algorithm that renders the data
unreadable by any program without the decryption key
▪ If a database system holds particularly sensitive data, it may be deemed
necessary to encode it as a precaution against possible external threats or
attempts to access it
▪ The DBMS can access data after decoding it, although there is a
degradation in performance because of the time taken to decode it
▪ Encryption also protects data transmitted over communication lines
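▪ Transmitting data securely over insecure networks requires the use of a
cryptosystem, which includes:
• an encryption key to encrypt the data (plaintext)
• an encryption algorithm that, with the encryption key, transforms the plaintext into
ciphertext
• a decryption key to decrypt the ciphertext
• a decryption algorithm that, with the decryption key, transforms the ciphertext back into
plaintext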

Authentication
➢ All users of the database have different access levels and permissions for different data
objects; authentication is the process of checking whether the user is the one who holds the
privilege for that access level.
➢ It is the process of checking that users are who they say they are.
➢ Each user is given a unique identifier, which is used by the operating system to
determine who they are.
➢ Thus, the system checks whether it is really the user with a specific username and
password who is trying to use the resource.

➢ Associated with each identifier is a password, chosen by the user and known to
the operating system, which must be supplied to enable the operating system to
authenticate who the user claims to be.
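The identifier-plus-password check can be sketched with Python's standard library. A common practice, assumed here rather than taken from the notes, is to store only a random salt and a slow salted hash (PBKDF2) of each password, never the password itself. The function names and the iteration count are hypothetical choices for illustration. `hmac.compare_digest` compares the digests in constant time.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor for PBKDF2; tune for the hardware in use

def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    salt = salt if salt is not None else secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

# The system keeps only (salt, digest) for each unique user identifier.
accounts = {}

def create_account(user_id, password):
    accounts[user_id] = hash_password(password)

def authenticate(user_id, password):
    """Check that the user is who they claim to be."""
    if user_id not in accounts:
        return False
    salt, stored = accounts[user_id]
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)

create_account("abebe", "s3cret")
```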
Any database access request has the following three major components:
1. Requested Operation: what kind of operation is requested by a specific query?
2. Requested Object: on which resource or data of the database is the operation to be applied?
3. Requesting User: who is the user requesting the operation on the specified object?
The database should check all three components before processing any request. The checking is
performed by the security subsystem of the DBMS.
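The three-component check can be sketched as a lookup in an authorization table. In this Python sketch, a request is allowed only if the exact (user, operation, object) combination has been granted. The `AccessRequest` type, the `granted` set and the sample users are hypothetical. A real security subsystem keeps this information in the catalogue.

```python
from typing import NamedTuple

class AccessRequest(NamedTuple):
    operation: str   # requested operation, e.g. "read"
    obj: str         # requested object, e.g. a table name
    user: str        # requesting user

# Hypothetical authorization table: (user, operation, object) triples.
granted = {
    ("abebe", "read", "Employee"),
    ("abebe", "update", "Employee"),
    ("belay", "read", "Dept"),
}

def is_authorized(request):
    """Allow the request only if this user holds this operation on this object."""
    return (request.user, request.operation, request.obj) in granted
```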

Forms of user authorization


There are different forms of user authorization on the resource of the database. These forms are
privileges on what operations are allowed on a specific data object.

User authorization on the data/extension


1. Read Authorization: the user with this privilege is allowed only to read the content of
the data object.
2. Insert Authorization: the user with this privilege is allowed only to insert new records
or items to the data object.
3. Update Authorization: users with this privilege are allowed to modify content of
attributes but are not authorized to delete the records.
4. Delete Authorization: users with this privilege are only allowed to delete a record and
not anything else.
Different users, depending on the power of the user, can have one or the combination of the
above forms of authorization on different data objects.

Role of DBA in Database Security


The database administrator is responsible for making the database as secure as possible. For this,
the DBA should have more powerful privileges than any other user. The DBA grants database users the
capabilities they need to access the content of the database.
The major responsibilities of the DBA in relation to the authorization of users are:
1. Account Creation: creating different accounts for different USERS as well as USER GROUPS.
2. Security Level Assignment: assigning different users to different categories of access levels.
3. Privilege Grant: giving different levels of privileges to different users and user groups.
4. Privilege Revocation: denying or canceling previously granted privileges of users for various
reasons.
5. Account Deletion: deleting an existing account of a user or user group. This is similar to
denying the user all privileges on the database.
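The five DBA responsibilities map onto a small set of operations over accounts and privileges. The Python class below is a toy model for illustration only. The class, its method names and the operation strings ("read", "insert", and so on, echoing the forms of authorization above) are hypothetical. Real DBMSs provide these operations through commands such as SQL's GRANT and REVOKE.

```python
class SecuritySubsystem:
    """Toy model of the DBA duties listed above (all names are hypothetical)."""

    def __init__(self):
        self.accounts = {}        # account name -> security level
        self.privileges = set()   # (account, operation, object)

    def create_account(self, name, level=0):
        self.accounts[name] = level

    def assign_level(self, name, level):
        self.accounts[name] = level

    def grant(self, name, operation, obj):
        if name not in self.accounts:
            raise KeyError(f"no such account: {name}")
        self.privileges.add((name, operation, obj))

    def revoke(self, name, operation, obj):
        self.privileges.discard((name, operation, obj))

    def delete_account(self, name):
        # Deleting an account also removes every privilege it held.
        self.privileges = {p for p in self.privileges if p[0] != name}
        self.accounts.pop(name, None)

    def can(self, name, operation, obj):
        return name in self.accounts and (name, operation, obj) in self.privileges

dba = SecuritySubsystem()
dba.create_account("clerk", level=1)
dba.grant("clerk", "read", "Employee")
dba.grant("clerk", "insert", "Employee")
dba.revoke("clerk", "insert", "Employee")
```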

2. Distributed Database Systems


◼ Database development facilitates the integration of the data available in an organization and
enforces security on data access. But organizational data do not always reside at one site. This
demands that databases at different sites be integrated and synchronized with all the facilities
of the database approach, which leads to Distributed Database Systems.
◼ In a distributed database system, the database is stored on several computers. The
computers in a distributed system communicate with each other through various
communication media, such as high-speed buses or telephone lines.
◼ A distributed database system consists of a collection of sites, each of which maintains a
local database system and also participates in global transactions in which the different
databases are integrated together.
◼ Even though integration of data implies centralized storage and control, in distributed
database systems the intention is different: data is stored in different database systems in a
decentralized manner, but the systems act as if they were centralized, which is made possible by
computer networks.
◼ A distributed database system consists of loosely coupled sites that share no physical
component, and the database systems that run on each site are independent of each other.
Transactions may access data at one or more sites.
◼ An organization may implement its database system on a number of separate computer systems
rather than a single, centralized mainframe; for example, a computer system may be located at
each local branch office.
The functionalities of a DDBMS will include: Extended Communication Services, Extended Data
Dictionary, Distributed Query Processing, Extended Concurrency Control and Extended Recovery
Services.
Concepts in DDBMS

◼ Replication: the system maintains multiple copies of data, stored at different sites, for
faster retrieval and fault tolerance.
◼ Fragmentation: a relation is partitioned into several fragments stored at distinct sites.
◼ Data transparency: the degree to which a system user may remain unaware of the
details of how and where the data items are stored in a distributed system.
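Fragmentation and replication can be illustrated with a short Python sketch. It assumes horizontal fragmentation, in which each branch site stores the Employee tuples of its own branch, and full replication of a small Dept relation at every site. The Branch attribute, the site names and all tuples are hypothetical. Taking the union of the fragments reconstructs the original relation.

```python
# Hypothetical Employee relation with a Branch attribute (not from the notes).
employee = [
    {"ID": 1, "FName": "Abebe", "Branch": "Bonga"},
    {"ID": 2, "FName": "Belay", "Branch": "Jimma"},
    {"ID": 3, "FName": "Hana", "Branch": "Bonga"},
]
dept = [{"DID": 5, "DName": "Finance"}]

def horizontal_fragments(relation, attr):
    """Partition the tuples by the value of `attr`, giving one fragment per site."""
    fragments = {}
    for t in relation:
        fragments.setdefault(t[attr], []).append(t)
    return fragments

sites = horizontal_fragments(employee, "Branch")

# Replication: every site keeps its own full copy of the small Dept relation.
replicas = {site: list(dept) for site in sites}

# Reconstruction: the union of all fragments gives back the original relation.
reconstructed = [t for fragment in sites.values() for t in fragment]
```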

Advantages of DDBMS
1. Data sharing and distributed control:
➢ A user at one site may be able to access data that is available at another site.
➢ Each site can retain some degree of control over its local data.
➢ There are local as well as global database administrators.
2. Reliability and availability of data
➢ If one site fails, the rest can continue operating, as long as transactions do not demand data
that is stored only at the failed site (that is, data that is not replicated at other sites).
3. Speedup of query processing
➢ If a query involves data from several sites, it may be possible to split the query into
sub-queries that are executed at several sites in parallel.
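Splitting a query into parallel sub-queries can be sketched with Python's `concurrent.futures`. Here, threads stand in for remote sites, and the Sales amounts held at each site are hypothetical. Each site computes a partial SUM and COUNT over its own fragment, and the coordinating site combines them. The average is computed from the combined sum and count rather than by averaging the per-site averages, which would give a wrong answer when the fragments differ in size.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of a Sales relation held at three sites.
site_fragments = {
    "Bonga": [120, 80, 45],
    "Jimma": [200, 15],
    "Addis": [60, 60, 60, 10],
}

def local_subquery(amounts):
    """Sub-query run at one site: partial SUM and COUNT over the local fragment."""
    return sum(amounts), len(amounts)

# Run the sub-queries in parallel (threads stand in for remote sites).
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(local_subquery, site_fragments.values()))

# The coordinating site combines the partial results into the global answer.
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
average = total / count
```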

Disadvantages of DDBMS
1. Software development cost
2. Greater potential for bugs (parallel processing may endanger correctness)
3. Increased processing overhead (due to the exchange of messages between sites)
4. Communication problems

Homogeneous and Heterogeneous Distributed Databases


In a homogeneous distributed database
◼ All sites have identical software
◼ Sites are aware of each other and agree to cooperate in processing user requests.
◼ Each site surrenders part of its autonomy in terms of right to change schemas or
software
◼ Appears to user as a single system

In a heterogeneous distributed database


◼ Different sites may use different schemas and software
◼ Difference in schema is a major problem for query processing
◼ Difference in software is a major problem for transaction processing
◼ Sites may not be aware of each other and may provide only limited facilities
for cooperation in transaction processing

3. Data warehousing
A data warehouse is an integrated, subject-oriented, time-variant, nonvolatile
database that provides support for decision making.
✓ Integrated: a centralized, consolidated database that integrates data derived from
the entire organization.
➢ Consolidates data from multiple and diverse sources with diverse formats.
➢ Helps managers to better understand the company’s operations.
✓ Subject-oriented: the data warehouse contains data organized by topics, e.g.
sales, marketing, finance, etc.
✓ Time-variant: in contrast to operational data, which focus on current
transactions, warehouse data represent the flow of data through time.
➢ The data warehouse contains data that reflect what happened last week, last
month, over the past five years, and so on.
✓ Nonvolatile: once data enter the data warehouse, they are never removed,
because the data in the warehouse represent the company’s entire history.
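The time-variant and nonvolatile properties can be illustrated with Python's `sqlite3` module. In this sketch, an operational Account table holds only the current state and is updated in place. A warehouse table, BalanceHistory, receives a periodic snapshot that is only ever appended, never updated. All table names, column names, dates and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table: current state only, updated in place.
    CREATE TABLE Account (AccID INTEGER PRIMARY KEY, Balance REAL);
    -- Warehouse table: one row per account per snapshot date, insert-only.
    CREATE TABLE BalanceHistory (SnapshotDate TEXT, AccID INTEGER, Balance REAL);
    INSERT INTO Account VALUES (1, 500), (2, 300);
""")

def load_snapshot(date):
    """Copy the current operational state into the warehouse (append, never update)."""
    conn.execute(
        "INSERT INTO BalanceHistory SELECT ?, AccID, Balance FROM Account", (date,)
    )

load_snapshot("2023-01-31")
conn.execute("UPDATE Account SET Balance = 650 WHERE AccID = 1")  # operational update
load_snapshot("2023-02-28")

# Time-variant query: total balance at each snapshot date.
history = conn.execute(
    "SELECT SnapshotDate, SUM(Balance) FROM BalanceHistory "
    "GROUP BY SnapshotDate ORDER BY SnapshotDate"
).fetchall()
```

The operational table now shows only the latest balance, while the warehouse still answers "what was the total at the end of January?"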

Differences between database and data warehouse


✓ Because data is added all the time, the warehouse keeps growing.
✓ The data warehouse and operational environments are separated: the data warehouse
receives its data from operational databases.
✓ The data warehouse environment is characterized by read-only transactions on very
large data sets.
✓ The operational environment is characterized by numerous update transactions on a
few data entities at a time.
✓ The data warehouse contains historical data over a long time horizon.
◼ Ultimately, information is created from data warehouses. Such information becomes the
basis for rational decision making.

◼ The data found in a data warehouse is analyzed to discover previously unknown data
characteristics, relationships, dependencies, or trends.
