DataBases Design
COURSE DETAILS
Course Writer/Developer: Dr. A.T. Akinwale
&
Miss A. J. Ikuomola
Department of Computer Science
College of Natural Science
University of Agriculture, Abeokuta,
Ogun State, Nigeria
COURSE CONTENT
COURSE REQUIREMENTS
This is a compulsory course for all students in the Department of Computer Science. In
view of this, students are expected to participate in all the activities and have a
minimum of 75% attendance to be able to write the final examination.
READING LIST
Abiteboul S., Hull R. and Vianu V. (1995). Foundations of Databases, Addison-Wesley.
Anderson R.G. (1990). Data Processing, Principle and Practise, Pitman Publishing,
Longman Group, 7th Edition, UK.
Anthony H. (1996). Uncertainty in Information Systems, The McGraw-Hill Companies.
Cormen T., Leiserson C. and Rivest R. (1990). Introduction to Algorithms, MIT Press.
Date, C. J. (2003). An Introduction to Database Systems, 8th edition, Addison-Wesley.
ISBN 0-321-19784-4.
Date, C. J., Darwen, H. (2000). Foundation for Future Database Systems: The Third
Manifesto, 2nd edition, Addison-Wesley Professional. ISBN 0-201-70928-7.
Elmasri R. and Navathe S.B. (1994). Fundamentals of Database Systems.
Garcia-Molina H., Ullman J.D. and Widom J. (2008). Database Systems: The Complete
Book, 2nd Edition, Prentice Hall.
UNIT ONE
DATABASE DESIGN
Designing a database is a craft, similar to building a house. Professionals use many techniques
to design databases. Before proceeding with database design, it is necessary to know the basic
concepts of a database.
WHAT IS A DATABASE?
A database can be defined as a central pool of data which is shared by the various users of an
organization.
Data processing is the execution of a systematic sequence of operations performed upon data.
Daniel Martin defines a database as a collection of data that obeys three criteria:
· Exhaustiveness
· Non-redundancy
· Appropriate structure
Exhaustiveness means that all the data about the subject are actually present in the database.
Non-redundancy means that each individual piece of data exists only once in the database.
Appropriate structure means that the data are stored in such a way as to minimize the cost of
the expected processing and storage.
Some authors say that a database has an "open" structure. An open database allows for easy
change of field dimensions (e.g. increasing a given field size from six to seven digits).
The problem with an open database is that it is very costly in terms of processing time, memory
space and disk storage.
Exhaustiveness implies the presence within the database of all information pertaining to a given
subject, for example a customer or a payment. Non-redundancy excludes the possibility that
certain pieces of data exist more than once within a database. For example, if the payment_due
file of the database contains the name and the address of each debtor, and this information is
already stored in the customer file, then it is redundant.
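The redundancy example above can be sketched in a few lines of Python. This is only an illustration: the customer key, the names and the amounts are hypothetical, and plain dictionaries stand in for the customer file and the payment_due file.

```python
# Hypothetical data: keys, names and amounts are illustrative only.
customer_file = {
    "C001": {"name": "A. Debtor", "address": "17, Ibadan Road"},
}

# Redundant design: the payment_due record repeats the name and address.
payment_due_redundant = [
    {"customer_id": "C001", "name": "A. Debtor",
     "address": "17, Ibadan Road", "amount_due": 2500},
]

# Non-redundant design: the payment_due record keeps only the customer key.
payment_due = [
    {"customer_id": "C001", "amount_due": 2500},
]


def debtor_details(payment):
    """Fetch the name and address from the customer file through the key."""
    customer = customer_file[payment["customer_id"]]
    return {**customer, "amount_due": payment["amount_due"]}


# A change of address is made in exactly one place ...
customer_file["C001"]["address"] = "5, Lagos Road"

# ... and is seen by every lookup, while the redundant copy is now out of date.
details = debtor_details(payment_due[0])
```

In the redundant design the old address is still sitting in the payment_due record, which is exactly the inconsistency that non-redundancy is meant to prevent.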
Another concept of a database is that it can be shared by many users. For this to happen, the
data organization and access technique are separated from the application programs. Users then
have a view of the database that conforms to the way they want to see the data.
It follows therefore that the concept of a database is potentially capable of solving both the
redundancy and the loss-of-flexibility problems experienced in conventional programming.
The tool that performs these tasks is a software package called a Database Management
System (DBMS). A DBMS is software that solves these problems of conventional programming.
Here, conventional refers to the usual way of doing things.
Assume that the colour of a car changes from red to black. This change must be captured in the
conceptual database in a faithful manner. This means the colour of the car should be correctly
updated to reflect the current colour. The following operations allow us to make changes in the
conceptual database:
· Insert information
· Delete information
· Update or modify information
· Retrieve information
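As a minimal sketch of these four operations, the car example can be tried with Python's built-in sqlite3 module. The table name, column names and plate numbers are assumptions made for the illustration; they do not come from the course text.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (plate TEXT PRIMARY KEY, colour TEXT)")

# Insert information
conn.execute("INSERT INTO car VALUES (?, ?)", ("ABC-123", "red"))
conn.execute("INSERT INTO car VALUES (?, ?)", ("XYZ-789", "blue"))

# Update or modify information: the car is repainted from red to black
conn.execute("UPDATE car SET colour = ? WHERE plate = ?", ("black", "ABC-123"))

# Delete information
conn.execute("DELETE FROM car WHERE plate = ?", ("XYZ-789",))

# Retrieve information
rows = conn.execute("SELECT plate, colour FROM car").fetchall()
conn.close()
```

After these statements, `rows` holds the single remaining car with its current colour.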
Let us suppose that at a certain stage in the life of the university, the need for computerization
is felt. This is because it is found very difficult to keep track of the students in the university
vis-à-vis the departments to which they are affiliated. Consequently, a student record is defined
containing the fields of interest. Let these be the name of the student, the department to which
the student is affiliated, the student's age, the last qualifying examination passed, and the
specialization for which the student is enrolled in the department.
After the record has been defined, a programming language like COBOL, PL/1 or PASCAL,
which supports the notion of a record, is selected and used to program the two stages outlined
above. Later, the university decides to computerize information regarding faculty members and
their affiliation to departments. A new record is defined to capture this information and may look
as in figure II.
NAME DEPT AGE QUALIFICATION RESEARCH INTEREST
The university also tries to computerize the courses offered by the various departments and the
courses taught by the various faculty members respectively. These records are as follows:
The field course_no appears in both the course_offered and course_taught records. This means
that all records which contain this field carry information about the course number. Clearly, this
implies a certain amount of redundant information, which in turn results in a certain amount of
wasted storage.
A database represents a new approach to storing data. The problems of company A may differ
from the problems of company B; hence the implementation procedure of their databases may
vary with each of these factors. The database introduces a new way of storing and accessing
processed data. The database represents a consolidation of files. Once files are consolidated,
input and output operations, the rate of update processing and the amount of file design are
reduced, duplicate backup copies are eliminated, and the operating environment becomes cleaner.
For example, Mr Daniel has his savings account and checking account at the 1st Bank. Assume he
acquired a car loan from the 1st Bank when he purchased his car. This means Mr Daniel's name and
other personal data appear in the files belonging to the savings account, the checking account
and the loan account. Any time Mr Daniel makes a deposit, it is certain that his name will appear
in several accounts.
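[Figure: Mr Daniel's data held once in the database and shared by the savings account file and the
checking account file]
This means the database helps to share files.
[Figure: software in storage: operating system, database system, application program]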
A database system is designed to handle data stored in a database. This system updates and
retrieves information stored in the database when requests are issued. During database operation,
the operating system loads the database system and the application program. The application
program can execute any instruction until it wants data from the database. At this point, the
database system gains control and handles the database requests. The database system resides in
storage with the application program whenever there is a need to use a database.
Knowledge Know-how on Database system
Data Base Management: A database is a collection of carefully integrated files. The data in
these files must be managed. Special software packages are available to manage a database. Such
a software package is known as a Data Base Management System.
A database management system is the software that manages the database and provides facilities
for storing, accessing, and maintaining the data. DBMSs are developed by computer manufacturers
or software houses. These software packages tend to be referred to by several names, among them
"database system", "data management system" and "DBMS". The software is in effect an extension
to the operating system. It acts as an interface between the programs which need to access data in
the database and the database itself, allowing the data to be retrieved or updated.
DBMSs have been in use for almost two decades. It was only about a decade ago that this field
came to be recognized as a major discipline in computer science, and only recently have DBMSs
been studied systematically from both the user and the system points of view. The beginning of
DBMS development was marked by the Database Task Group (DBTG), which published a report in
1971 called CODA 71.
The CODD 70 report was a proposal for the development of the relational model. For a few years,
a great debate raged between the proponents of the Database Task Group report and those of the
CODD 70 report. Each side claimed that its views were the better ones. During this period, the
entire area of DBMS was in a state of abject turmoil (miserable disorder). There were arguments
and counter-arguments. What CODA 71 claimed as advantages, the CODD 70 group claimed as
disadvantages. The basic tenets of each proposal were questioned, examined and re-examined over
and over again. Finally, two major facts stood out:
- The network model was shown to be reasonably efficient to implement, because it could
handle large databases of over a billion bytes.
- The relational model could, at that time, support only relatively small databases.
The CODA 71 proposal of the network model became the basis for the design and implementation
of Data Base Management Systems. The experts began to arrive at a consensus. The great debate
had served its purpose: a new discipline of computer science was born.
In 1976, a study group called ANSI 76 put forward a new concept of DBMS. This concept opened
up a new area of DBMS. It emphasized the role of a DBMS as a tool for representing, in a
computer, a model of the real world. The ANSI 76 report considered a DBMS as software to manage
a large pool of data, called a database. Earlier researchers had been working towards this view
of DBMS before the ANSI 76 group gave it final approval.
It now remains for us to realize this structure not only in a neat and clean form but also in a
reasonably efficient manner.
The ANSI study group recognized three functions that are necessary in order to support a database
system, namely those of the enterprise manager, the database administrator and the application
administrator.
The enterprise manager is responsible for ensuring that a proper and adequate system analysis is
done which meets the needs of the enterprise. The database administrator exercises control over
the data structure and the storage methods, and is concerned with the overall efficiency of the
implementation.
The application administrator is responsible to split up the centralized pool of data among
various users in such a way that each user;
SPARC = Standards Planning and Requirements Committee.
- Bibliographic database
- Knowledge database
- Graphics-oriented database
- Decision-making database
- Bibliographic database: BDBs hold data which is free of format; they display little or no
format. Such databases are often used in library information systems. The data could be composed
of abstracts of books. It could also be composed of keywords and key phrases. It is possible to
use these keywords and key phrases to select documents. If desired, the source of the document
could be the original document.
- Knowledge database: KDBs are used in artificial intelligence applications. The data in these
KDBs is discrete and formatted. In these KDBs, there are many kinds of data with only a very
few occurrences of each kind. Clearly, such databases have the peculiarity that the size of the
data is almost as large as the definition of the data.
HIERARCHY OF DATA
Character → Fact → Record → File → Database
Character, fact, record, file and data base form a hierarchy of data.
The basic building block is a character. Characters consist of upper- and lower-case letters,
numeric digits and symbols. The upper- and lower-case letters are A, a, B, b, ..., Z, z. The numeric
digits are 0, 1, 2, 3, ..., 9. Symbols include the comma, question mark, plus sign, division sign, etc.
Upper- and lower-case letters are called alphabetic characters. Numeric digits are called numeric
characters. Symbols are called special characters. A combination of the three is referred to as
alphanumeric characters (e.g. #2B, ₦2.50K). A computer can accept both alphanumeric and numeric
data and store them in memory. Characters are put together to form a fact. A fact is also called a
field. A fact or field is a number, an item, a word, a name or a combination of characters.
- Character /text
- Numeric
- Date
- Logical
- Memo
A field is an individual item of data within a record. Facts are put together to form a record. A
record is a group of related items of data in a file. An employee record in a company would be a
collection of facts about one employee. These facts would include the employee's name, address,
department, phone, position, pay rate, earnings made to date, etc.
Records are combined to make a file. A file is a collection of related records; for example, a
collection of all employee records for one company would be an employee file. What is an
inventory file? It is a collection of all inventory records for a particular company.
The employee file is an example of a permanent file. The data stored in a permanent file, or
master file, should be accurate and current. A permanent file of all the customers who owe
money to a company is an accounts receivable master file. An accounts payable master file
contains all the suppliers to which the organization owes money.
A transaction file is a temporary file which represents the transactions of the organization. The
data stored in a transaction file can be re-adjusted.
Files are combined to make a database. The heart of most organizations is data. A database is a
collection of integrated and related master files. An organization uses data as the raw material
to be stored in the database. Once the data have been processed, they are called information.
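The hierarchy described above can be mirrored with ordinary Python containers. The sketch below is only an illustration; the employee names, departments, pay rates and inventory item are hypothetical.

```python
# Characters are put together to form a field (fact).
name_field = "".join(["A", "d", "a"])

# Fields are put together to form a record about one employee.
employee_record = {"name": name_field, "department": "Sales", "pay_rate": 1500}

# Related records are combined to make a file.
employee_file = [
    employee_record,
    {"name": "Bola", "department": "Accounts", "pay_rate": 1800},
]
inventory_file = [{"item": "Pen", "quantity": 40}]

# Integrated, related files are combined to make a database.
database = {"employee": employee_file, "inventory": inventory_file}

# Total number of records held across all files of the database.
total_records = sum(len(records) for records in database.values())
```

Each level is simply a container of the level below it, which is why the same data can be reached by walking down from database to file to record to field.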
TYPES OF DATA
- Numeric data
- Alphanumeric data
Numeric data is expressed in numbers, e.g. age = 35, year of birth = 1970.
Numeric data contains only numeric characters.
Alphanumeric data is composed of a combination of letters, numbers or special punctuation
characters, such as:
Name - Abeokuta
Address - 17, Ibadan Road
Let us take an example of long-distance telephone data. When you make a long-distance
telephone call, the following items of data are recorded:
STRUCTURE OF THE DATA
TOTAL CHARACTER 29
Record 2      -  -  -  -  -
...
Record 5000   -  -  -  -  -
Each record has five fields totalling 29 characters. With 5000 records, the file requires 5000 × 29
characters on an external storage device. To process this long-distance telephone call data into
a bill, each record has to be identified through a record key.
The record key in this case is the telephone number of the person to whom the call is to be billed.
Field        Characters
Telephone    10
Name         20
Address      20
City         20
Balance      8
Date         8
Total        66
Each record is 66 characters. With 200 customers, we need to keep 66 × 200 characters of data.
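The two storage estimates can be checked with a one-line calculation. The function name below is hypothetical, and the record lengths are taken as stated in the text (29 and 66 characters).

```python
def file_size(record_length, record_count):
    """Characters needed on external storage for a file of fixed-length records."""
    return record_length * record_count


# Long-distance call file: 5000 records of 29 characters each.
call_file_chars = file_size(29, 5000)

# Customer file: 200 records of 66 characters each.
customer_file_chars = file_size(66, 200)
```

This gives 145,000 characters for the call file and 13,200 characters for the customer file.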
UNIT TWO
DATABASE APPROACH
The database is closely associated with Data Base Management System (DBMS) software. A
database management system (DBMS) is a series of computer programs used to create, store,
maintain and access a database. The features offered by a particular DBMS depend on its type
and level of sophistication. For example dBASE is a sophisticated DBMS for microcomputers.
[Figure: the CPU (ALU and CU) and main memory, with the DBMS placed between the application
program and the database]
As this figure indicates, an application program written in a high level language accesses the
database through the DBMS software. In other words, DBMS software serves as the gate keeper
for the database.
[Figure: a manual database made up of the VC file, student file, dean file and staff file]
A flat file is a file or series of files that contain records and fields. These files are called flat
because they have no repeating groups.
Advantage of Database
· Data and application programs are independent, so the same data can be used by several
application programs.
· More information can be generated from the same amount of data. In other words, a given set
of data can be manipulated in many ways.
· One-of-a kind requests can be fulfilled easily
· Data duplication is minimal. This is true because only one occurrence of each data item is
maintained.
· Data management is enhanced and improved. This is possible because there is only one set of
data for all users.
· More sophisticated security measures can be implemented.
· Data is readily shared between applications. This eliminates duplication and the problems
of maintaining consistency between duplicate values.
· New requests or one-of-a kind requests can be more easily implemented, because the logical
interface with the DBMS is simpler than a set of physical interfaces.
· The applications programs are independent of the stored data. If the storage format changes,
there is no need to alter the applications programs since they communicate with the DBMS in
logical rather than physical terms.
· It can be argued that a single database management system for an integrated database allows
for better management of data, since it is effectively in one place under the control of one set
of people, namely those who implement the database.
· Finally, the integration or sharing of data between applications puts sophisticated
programming within reach of all users of the database.
Disadvantage of Database
· Both DBMS software and extra hardware which might be needed to support the system can
be expensive
· A DBMS is much more complex than a file processing system
· Organizations put all their data in a single basket; if anything happens to the basket, that is
the end of the data in the database.
FILE PROCESSING APPROACH
[Figure: separate applications, such as scheduling rehearsals and cataloguing music, each with its
own interfaces to the data files]
· If the number of files increases and applications are generated which use data from one, two
or three files, the number of interfaces increases rapidly. Such a situation might be termed an
interface explosion.
· It should be apparent that allowing different applications to share data in a traditional file
processing environment can cause considerable problems, simply because of the number of
interfaces required.
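The growth in the number of interfaces can be illustrated by simple counting. In the sketch below, the applications and the files each one uses are hypothetical; the only point is that file processing needs one interface per (application, file) pair, whereas an application that goes through a DBMS needs a single interface.

```python
# Hypothetical mapping: which data files each application reads or writes.
uses = {
    "payroll": {"musician", "salary"},
    "rehearsal scheduling": {"musician", "rehearsal", "performance"},
    "music cataloguing": {"music", "performance"},
}

# File processing approach: one physical interface per (application, file) pair.
file_interfaces = sum(len(files) for files in uses.values())

# With a DBMS in between: one logical interface per application.
dbms_interfaces = len(uses)
```

With only three applications the difference is already seven interfaces against three, and each additional shared file widens the gap.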
DATABASE APPROACH
[Figure: the payroll/salary system, the rehearsal schedule/timetable system and the music catalogue
system all reach the musician database (payroll data, rehearsal and performance data, music
catalogue data) through the database management system]
· One of the fundamental features of the database approach is that it allows data to be shared
between different applications.
· In the database approach, all the data are integrated into one physical file or a set of related
files.
· The sets of data are separated only logically within the database.
· All access to the data is performed through a database management system (DBMS), a piece
of software which understands and manipulates the logical data structures in the file.
· Since all the application programs interface only with the DBMS, they require only a single
interface to the data in the database.
· The interface of application programs to data in the database can be at a logical rather than a
physical level.
· It is not necessary for any application program to know how a particular data item is stored,
as long as the DBMS can provide the data in the form required by the application. For
example, if the data item is an integer, it does not matter to the application whether the value
is actually stored in binary or character format, as long as it is supplied to the application in
the format it requires.
· The property of needing to know nothing about the physical storage of the data is termed
data independence.
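The integer example can be turned into a small sketch of data independence. The class and function names are hypothetical; the two stores stand in for a DBMS that keeps the same value either in binary or in character format, while the application code stays unchanged.

```python
import struct


class BinaryStore:
    """Keeps integers as 4-byte little-endian binary values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = struct.pack("<i", value)

    def get(self, key):
        return struct.unpack("<i", self._data[key])[0]


class TextStore:
    """Keeps integers as character strings."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = str(value)

    def get(self, key):
        return int(self._data[key])


def application(store):
    """Application logic: it only sees integers, never the storage format."""
    store.put("age", 35)
    return store.get("age") + 1


result_binary = application(BinaryStore())
result_text = application(TextStore())
```

Both runs return the same result, so the storage format can be changed without altering the application program.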
Data Independence
Every item of data which is used by any application must be present in database
The information which is stored should not change if some files are re-formatted.
[Figure: faculty records, staff records, database file and spreadsheet]
Server Types
Database Servers
File transfer: users can transfer files between clients and servers.
Encryption
File storage and data migration: online storage consists of hard drive storage. The process of
moving data from online to offline or near-line storage is called data migration.
File update: ensuring that each user of a file has the latest version.
File archiving: the process of backing up files on offline storage devices such as tapes.
With a DBMS – the procedure becomes
Data collection
The process of capturing raw data for use within a DBMS involves getting the original data to
the processing centre, transcribing it, converting it from the one medium to another and finally
getting it into the computer.
Problems
· Source documents – a great deal of data still originate in the form of a clerically prepared
document
· Data transmission
· Data preparation – this is the term given to the transcription of data from the source
document to a machine-sensible medium
· Media conversion – data is prepared in a particular medium and converted to another
medium for faster input to the computer.
Database optimization
Data distribution
· Physical loading of raw data. Once the raw data is loaded, it must be maintained and kept up-
to-date
· Duplicating raw data
Classification of raw data into specific object types. Object types are described by listing their
characteristics:
· Specify the domain of values for the smallest units of logical data e.g. integer or real
· Specify the units of measurement for logical data e.g. dollars, pound or feet
· Specify keys for certain logical units of data e.g. record types or relations
· Specify integrity constraints on the data e.g. an allowable range of values
· Specify access rules for the data e.g. allow update only if a correct password is supplied
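The characteristics listed above can be written down as a small specification and checked in code. The sketch below is an assumption-laden illustration: the field (an account balance), its range, the key name and the password are all hypothetical.

```python
# Hypothetical specification of one logical data item.
BALANCE_SPEC = {
    "domain": float,               # domain of values: real numbers
    "unit": "dollars",             # unit of measurement
    "key": "account_no",           # key of the record type holding this item
    "range": (0.0, 1_000_000.0),   # integrity constraint: allowable range
}
PASSWORD = "secret"                # access rule: update needs this password


def is_valid(value, spec):
    """Check the value against its domain and its allowable range."""
    low, high = spec["range"]
    return isinstance(value, spec["domain"]) and low <= value <= high


def update(record, value, spec, password):
    """Update the balance only if the password and the constraints hold."""
    if password != PASSWORD:
        raise PermissionError("incorrect password")
    if not is_valid(value, spec):
        raise ValueError("value violates the domain or range")
    record["balance"] = value
    return record


account = update({"account_no": 1, "balance": 0.0}, 250.0, BALANCE_SPEC, "secret")
```

A value outside the range, a value of the wrong type, or a wrong password is rejected before the record is touched.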
ORGANIZATION OF DATA
The physical organization of the data in a database is described by physical storage structures
such as volumes, files or bytes. Physical storage structures are defined by means of a storage
definition language.
Data validation
Data security
[Figure: Logical data ↔ DBMS ↔ Physical data]
The DBMS serves as the interface between the logical and physical data. The logical data entities
employed by the DBA and the user (the requestor) are:
- Data items
- Logical records
- Files
Note that a logical file may contain records of more than one type. The physical data units are
called:
- Data elements
- Stored records
- Physical records
- Database
A physical record is defined as the data unit input from the hardware on a single access.
UNIT THREE
DATABASE MODELS
TYPES OF DATA MODELS (CALLED DATA STRUCTURES)
Computer based Information System (CBIS):
Design and implementation is done by Database Administrators (DBAs). The scope of
responsibilities of DBA depends on the complexity of the database. In small organizations, one
person may carry the entire responsibility of database design.
Generating a database increases cost and creates more complexity in a CBIS operation.
Implementation of an effective CBIS requires an online and comprehensive database regardless
of its cost and complexity.
CBIS is designed to provide timely and relevant information by performing data analysis,
modeling analysis or both.
Data analysis includes various query operations on a database.
Modeling analysis applies some types of model to the data available in the database and provides
additional information that is not directly available within the data.
DATA MODEL
THE FLAT FILE MODEL:
This is a file or a series of files that contains records and fields. These files are called flat because
there are no relationships between them. They have no repeating groups. The model does not
allow sophisticated database operations.
Example:
Basic data management operations such as file creation, file deletion, file update and simple
single-file queries can be performed using this model.
This model is limited in its capacity to support complex CBIS requirements.
Table 3.5
To clarify this concept, look at the two relations in Table 3.4 and 3.5. As you can see the
common field in these two relations is the customer number.
A relational DBMS can use these two relations to generate a report like in Table 3.6.
· Creation of relation.
· Updating (insertion, deletion and modification).
· Selection of a relation or a sub-relation.
· Join operations (putting two relations side-by-side).
· Projection (Selection of a subset of a field or a subset of a series of fields).
· General query operations.
· Cross operation
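Three of the operations listed above (selection, projection and join) can be sketched with lists of dictionaries standing in for relations. The customer and invoice rows below are hypothetical and are not the contents of Tables 3.4 and 3.5; the only feature carried over from the text is that the two relations share the customer number field.

```python
# Hypothetical relations sharing the common field customer_no.
customers = [
    {"customer_no": 1, "name": "Bob"},
    {"customer_no": 2, "name": "Sue"},
]
invoices = [
    {"invoice_no": 101, "customer_no": 1, "amount": 2000},
    {"invoice_no": 102, "customer_no": 2, "amount": 500},
    {"invoice_no": 103, "customer_no": 1, "amount": 1500},
]


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, fields):
    """Projection: keep only the named fields of every row."""
    return [{f: row[f] for f in fields} for row in relation]


def join(left, right, field):
    """Join: pair up rows of two relations that agree on the common field."""
    return [{**l, **r} for l in left for r in right if l[field] == r[field]]


# A report built from both relations, then restricted to large invoices.
report = project(join(customers, invoices, "customer_no"),
                 ["name", "invoice_no", "amount"])
large = select(report, lambda row: row["amount"] >= 1500)
```

The report has one row per invoice with the customer's name attached, which is the kind of output a relational DBMS derives from two relations with a common field.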
Shortcoming of Relational model:
Supplier
P1 P2 P3 P4 P5 P6 P7
A B C D E F G H I J K L M N O P Q R S T U
Fig. 3.6
Like the relational model, a hierarchical data model is made up of records called nodes. Each of
these nodes can have several fields.
The presentation is similar to a one-dimensional array (a table with only one column or one row)
or tree structure.
The relationships between the records are called branches. The node at the top of the hierarchy is
called the root. Every node of the tree except the root node has a parent. The nodes with the same
parent are called twins or siblings. For example, P1 and P2 in fig. 3.6 are twins or siblings.
The hierarchical model is sometimes called an upside-down tree (a tree with its root up). Fig. 3.6
illustrates an example of a hierarchical model. It indicates that a supplier may supply three
different families of products. In each family, there may be several different product categories.
As an example, supplier X may supply soap, shampoo and toothpaste.
Within each product category, there may be many brands of the same product, for example
nine different shampoos or five different toothpastes. Such a relationship is called a one-to-many
data structure. This means a parent can have many children, while each child has only one parent.
In the hierarchical model, a search in the parent node can lead you to the children nodes and vice
versa. Any updating in a parent node should automatically update the children nodes.
The operations associated with the hierarchical model include file creation, file updating
(insertion, deletion, addition and modification), file queries, retrieval of the next descendant
record, and retrieval of the parent record.
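The parent, child and sibling relationships of the hierarchical model can be sketched with a small tree class. The class name and the brand names are hypothetical; the supplier and its three product families follow the example in the text.

```python
class Node:
    """A record in a hierarchy: one parent (or none for the root), many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def siblings(node):
    """Nodes that share the same parent (twins), excluding the node itself."""
    if node.parent is None:
        return []
    return [child for child in node.parent.children if child is not node]


supplier = Node("Supplier X")                 # the root has no parent
soap = Node("soap", supplier)
shampoo = Node("shampoo", supplier)
toothpaste = Node("toothpaste", supplier)
brand_a = Node("shampoo brand A", shampoo)    # hypothetical brand names
brand_b = Node("shampoo brand B", shampoo)

children_of_supplier = [child.name for child in supplier.children]
parent_of_brand_a = brand_a.parent.name
twins_of_soap = [node.name for node in siblings(soap)]
```

Moving from a parent to its children uses the `children` list, and moving from a child back to its single parent uses the `parent` link, which mirrors the two retrieval operations named above.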
[Figure 3.7: customer and invoice records linked to the methods of payment (cash, credit)]
[Fig 3.8: A complex network]
The network model is similar to the hierarchical model. The records and fields of a network are
organized differently. Fig 3.7 illustrates customer and invoice relations in a network model. In
place of related key fields, there is a connection between the invoice number, customer number
and the method of payment. In this case, the customer number no longer needs to remain in the
invoice record. As fig 3.7 illustrates, invoice numbers are connected to the customer number in
the same order in which they were connected in table 3.4.
· File creation.
· File updating (insertion, deletion, addition and modification).
· File queries.
One-to-many: Each child (invoice) has two parents (methods of payment and customer number)
Fig 3.8 illustrates a many-to-many relationship. In a real estate agency, each agent is selling
several properties. For example, agent A-1 sells properties P-1, P-2 and P-6, while property P-1
has been listed under agent A-1 and A-2. In a many-to-many relationship; the parent-child
relationship breaks down because any record can be the parent and any record can be the child.
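The real estate example can be sketched as a set of (agent, property) pairs, which is one simple way to hold a many-to-many relationship. The function names are hypothetical; the listings are those given in the text.

```python
# Each pair records that an agent has a property listed.
listings = {
    ("A-1", "P-1"),
    ("A-1", "P-2"),
    ("A-1", "P-6"),
    ("A-2", "P-1"),
}


def properties_of(agent):
    """All properties listed under one agent."""
    return sorted(prop for a, prop in listings if a == agent)


def agents_of(prop):
    """All agents under whom one property is listed."""
    return sorted(agent for agent, p in listings if p == prop)


a1_properties = properties_of("A-1")
p1_agents = agents_of("P-1")
```

Neither side plays the fixed role of parent: the same set of pairs is read from the agent side or from the property side.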
1. Basic data management operations: The basic data management operations include database
creation, modification, deletion, addition, insertion, and maintenance. These operations are
supported even in a flat file management system.
2. Basic arithmetic operations: These include simple arithmetic operations performed on
different records and fields in a database including addition, subtraction, multiplication and
division. These basic operations may be quite useful for simple query operations, such as
calculating the average salary for both male and female employees or finding the maximum
and minimum salary for each gender.
3. Projection operation: This function may be a special case of a general query operation that
generates a subset of the fields. For example, in a student database that includes each
student's name, GPA, age, gender, address and nationality, a projection operation could
generate a listing of the names and GPAs of all the students, or a mailing list for mailing the
students' transcripts.
4. Search (Query): This function may include different searches on a database for specific
conditions. As an example, a triple criteria search on our example student database is as
follows:
DISPLAY ALL STUDENTS FOR GPA >= 3 AND MAJOR = "CS" AND AGE <= 22
Query operations can include as many criteria as the number of fields in the database. The
search can include an AND search (all criteria specified must be met) an OR search (only one
of the specified criteria must be met) and a NOT search (opposite criteria must be met or
supply an alternative)
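The AND, OR and NOT searches can be sketched with list comprehensions over a small student file. The GPA and age values below are hypothetical and chosen only to make the three searches return different results.

```python
# Hypothetical student records.
students = [
    {"name": "Bob", "gpa": 2.8, "major": "MIS", "age": 20},
    {"name": "Barry", "gpa": 3.4, "major": "CS", "age": 21},
    {"name": "James", "gpa": 3.1, "major": "MIS", "age": 25},
    {"name": "Sue", "gpa": 3.7, "major": "Accounting", "age": 23},
]

# AND search: all three criteria must be met.
and_result = [s["name"] for s in students
              if s["gpa"] >= 3 and s["major"] == "CS" and s["age"] <= 22]

# OR search: at least one of the criteria must be met.
or_result = [s["name"] for s in students
             if s["major"] == "CS" or s["age"] <= 22]

# NOT search: the opposite of the criterion must be met.
not_result = [s["name"] for s in students if not s["major"] == "CS"]
```

The AND search corresponds to the triple-criteria DISPLAY query shown earlier in this item.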
Table 3.4
File 1
STUDENT MAJOR
Bob MIS
Barry CS
James MIS
Sue Accounting
File 2
STUDENT MAJOR
Mary Marketing
Sherry MIS
Suzy Math
File 3
STUDENT MAJOR
Bob MIS
Barry CS
James MIS
Sue Accounting
Mary Marketing
Sherry MIS
Suzy Math
Table 3.7 presents the union operation on a student database. File 3 is the union of files 1 and 2.
Remember that to perform the union operation, the two files must be union compatible. This
means they must include the same number of fields and the same data types.
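The union of File 1 and File 2 can be reproduced in a few lines. The function names are hypothetical, and union compatibility is checked here in a simplified way: every row must have the same number of fields with the same Python types in the same positions.

```python
file1 = [("Bob", "MIS"), ("Barry", "CS"), ("James", "MIS"), ("Sue", "Accounting")]
file2 = [("Mary", "Marketing"), ("Sherry", "MIS"), ("Suzy", "Math")]


def union_compatible(a, b):
    """Same number of fields and same data types, position by position."""
    def shape(relation):
        return {tuple(type(value) for value in row) for row in relation}
    return shape(a) == shape(b)


def union(a, b):
    """Rows of a followed by the rows of b that are not already present."""
    if not union_compatible(a, b):
        raise ValueError("files are not union compatible")
    result = list(a)
    for row in b:
        if row not in result:
            result.append(row)
    return result


file3 = union(file1, file2)
```

The result contains the seven rows shown in File 3, in the same order.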
8. Join operation: This operation combines two or more files, tables or relations within a
database on a common field in order to generate a third file, table or relation. Table 3.8
illustrates one example of this operation, in which the common key is the customer name.
RELATION 1 RELATION 2
Purchase Number Customer Purchase amount
112 Barry 2000
118 James 5000
129 Susan 1000
135 Bob 1500
RELATION 3
9. Intersection operation: The intersection operation generates, in a third relation, the tuples
(rows) common to two relations. The result of the intersection of relations 1 and 2 is
relation 3, which contains only one row (tuple), the one belonging to both of the first two
relations.
An intersection operation
Relation 3
Union=Set1+Set2
Intersection=Set1*Set2
Difference=Set1-Set2
Difference operation is defined as the set of elements that are in Set A but not in Set B. For
example,
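The three set operations can be tried directly with Python sets, whose operators `|`, `&` and `-` play the roles written above as +, * and -. The membership of the two sets below is hypothetical.

```python
set1 = {"Bob", "Barry", "James", "Sue"}
set2 = {"Sue", "Mary", "Barry"}

union = set1 | set2           # elements in Set1, in Set2, or in both
intersection = set1 & set2    # elements common to Set1 and Set2
difference = set1 - set2      # elements in Set1 but not in Set2
```

Here the difference holds Bob and James, the two elements of the first set that do not appear in the second.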
DATA MODEL
Within a model, there is object type. Example of object type is house. House has characteristics
such as address (street number and street name), color (red, green etc.), style (bungalow, duplex),
price etc.
A set of characteristics that can be used to uniquely identify each house within its object type
is referred to as a key. For example, if the address of a house can be used to identify each
house, then the address characteristic is a key of the object type house.
Characteristics (attribute)
For each entity set (house), its attributes have certain values. For example, the color attribute has
values such as red, green and blue. The set of possible values of an attribute is called the domain
of the attribute. A set of attributes that uniquely determines an instance of an entity is called a
key.
Relationships:
Address Color
The two sets of values immediately carry some information. This information is available
because a relationship has been established between the values of address and color.
every element of Set A is also
an element of Set B
B that is not in A
element of A
A that is not in B.
UNIT FOUR
Movies
Title                 Year   Length   Genre
Gone with the wind    1939   231      Drama
Star wars             1977   124      Sci-fi
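Entity relationship diagram
[Figure: E/R diagram showing the attributes length and type of a film, and the relationship owns
connecting it to the entity set cinema with attributes name and address]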
Star_in: is a relationship connecting each film to the stars of that film.
Owns: each film is owned by at most one cinema.
Relationship = 1 : 1
m:1 1:m Cinema President
Owns
n:m
For a particular star and film, there is only one cinema with which the star has contracted for the
film.
A cinema may contract with several stars for a film, and a star may contract with one cinema for
more than one film.
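[Figure: three-way relationship Contract among Star, Film and Cinema]
[Figure: relationship "Sequel of" connecting film to itself, with the roles original and sequel]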
Rules in relationship
Original = role
Sequel= role
· Film may have many sequels.
· Each sequel, there is only one original film.
Many – one
Sequel to original
name address
title year
salary
stars
film
contract
s
cinema
length type
name address
length title year type
film
murder cartoons
weapon
length
cinema owns
name address
· Keys are indicated by underlining. When an entity set has different types of keys, we underline
only the primary key.
Degree constraints
10
stars film
Star_in
· A film entity cannot be connected by the relationship Star_in to more than 10 star entities.
The crews might be designated by a given cinema as crew1, crew2, and so on.
Requirements for weak entity set
· Zero or more of its own attributes
· These many-one relationships are called supporting relationships for E, and the entity sets
reached from E are supporting entity sets.
cinema
An association is a set of pairs of objects, one from each of the classes it connects.
Cinema Film
0..1
Name pic
0..∞
address owns Title pic
Year pic
Length
0..∞
Stars 0..∞ type
Star_in
Name pic
address
Two associations: Owns and Star_in.
Every association has a constraint on the number of objects from each of its classes.
A constraint is written as m..n.
m..∞ = ∞ stands for infinity, so there is no upper limit
0..∞ = no constraint at all on the number of objects
1..1 = exactly one
* subclasses = UML permit four sub classes
* an aggregation is a line between two classes that ends in an open diamond at one end
0....1 = aggregation is a many-one association
Title pic
Type
The bottom section of a class box is the place for methods. Neither the E/R model nor the
relational model provides methods.
A binary relationship between classes is called an association.
Declaration of keys
class film (key (title, year)) {
    attribute string title;
    attribute integer year;
    attribute integer length;
    attribute enum FilmType {drama, horror, comedy} Film_Type;
};
Relationships in ODL
· A relationship is declared inside a class declaration by the keyword relationship, followed by
a type and the name of the relationship:
relationship set<star> stars
    inverse star::starred_in;
Inverse relationship
As well as accessing the stars of a given film, we might like to know the films in which a given
star acted. In class star we therefore declare the inverse of stars:
relationship set<film> starred_in
    inverse film::stars;
Object Definition Language ODL
Like UML, the class is the central concept in ODL.
A declaration of a class in ODL in its simplest form is
class <name> {
    <list of properties>
};
A property is one of the following:
- attribute
- relationship
- method
ODL has structural type
APPLICATIONS THAT USE XML
· Cell phones
· File converters, e.g. a PDF to XML converter
· Voice XML
XML code
<?xml version="1.0" encoding="ISO-8859-15"?>
<class_list>
  <student>
    <name>Robert</name>
    <grade>A+</grade>
  </student>
  <student>
    <name>Leonard</name>
    <grade>A-</grade>
  </student>
</class_list>
The first line is the XML declaration: it gives the version of XML and the type of encoding you
are using.
<student> is an example of an XML element.
XPath
In XPath, there are seven kinds of nodes:
· Element
· Attribute
· Text
· Namespace
· Processing instruction
· Comment
· Document
XML documents are treated as trees of nodes. The topmost element of the tree is called the root
element.
XML document.
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
Example of nodes in the XML document above
<bookstore> (root element node)
<author> J.K. Rowling </author> (element node)
lang="en" (attribute node)
<book>
<title lang = “eng”>learning XML</title>
<price>39.99</price>
</book>
</bookstore>
The most useful path expressions are listed below:
Expression   Description
nodename     Selects all child nodes of the named node
/            Selects from the root node
//           Selects nodes in the document from the current node that match the selection, no
             matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes
NOTE: if the path starts with a slash (/) it always represents an absolute path to an element.
<title lang = “en”> Harry Potter </title>
<author> J.K. Rowling </author>
<year> 2005 </year>
<price>29.99</price>
</book>
/bookstore/book/title | //price = selects all the title elements of the book element of the
bookstore element AND all the price elements in the document.
Query one = 1. Selects all the titles
/bookstore/book/title
Selecting unknown nodes
XPath wildcards can be used to select unknown XML elements.
Wildcard   Description
*          Matches any element node
@*         Matches any attribute node
node()     Matches any node of any kind
Predicates
Predicates are used to find a specific node or a node that contains a specific value. Predicates are
always embedded in square brackets.
Path expression                   Result
/bookstore/book[1]                Selects the first book element that is the child of the
                                  bookstore element
/bookstore/book[last()]           Selects the last book element that is the child of the
                                  bookstore element
/bookstore/book[last()-1]         Selects the last but one book element that is the child of the
                                  bookstore element
/bookstore/book[position()<3]     Selects the first two book elements that are children of the
                                  bookstore element
//title[@lang]                    Selects all the title elements that have an attribute named lang
//title[@lang="eng"]              Selects all the title elements that have an attribute named lang
                                  with a value of "eng"
/bookstore/book[price>35.00]      Selects all the book elements of the bookstore element that
                                  have a price element with a value greater than 35.00
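The path expressions and predicates above can be tried out with a short script. The following is a minimal sketch in Python using the standard-library module xml.etree.ElementTree, which implements only a subset of XPath. The variable names are chosen for illustration, and the document is the two-book bookstore used in these notes. Because the numeric predicate [price>35.00] is not part of that subset, the price test is written as ordinary Python code.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the notes, with the two books shown there.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">learning XML</title>
    <price>39.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# ElementTree paths are relative to the element they are called on,
# so "book/title" plays the role of /bookstore/book/title here.
all_titles = [t.text for t in root.findall("book/title")]
first_book = root.find("book[1]")
last_book = root.find("book[last()]")
with_lang = root.findall(".//title[@lang]")
lang_eng = root.findall(".//title[@lang='eng']")

# Numeric comparisons are outside ElementTree's XPath subset,
# so the price filter is applied in Python instead.
expensive = [b for b in root.findall("book")
             if float(b.findtext("price")) > 35.00]

print(all_titles)
print(first_book.findtext("title"), "|", last_book.findtext("title"))
print(len(with_lang), [t.text for t in lang_eng])
print([b.findtext("title") for b in expensive])
```

A full XPath processor would accept every expression in the table directly; the subset used here is enough to see how positions and attribute predicates select nodes.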
XSLT, XQuery
XPath Axis
An axis defines a node-set relative to the current node
UNIT FIVE
DATABASE NORMALIZATION
There are many kinds of relationships.
- A relationship may be 1:1, 1:N or N:M.
An example of a 1:1 relationship is the relationship between an employee’s personnel number and
social insurance number.
NORMALIZATION
Suppose that the value of the attribute BUILDER determines values for the attribute STYLE and
PRICE and that the value for the attribute STYLE determines the values for the PRICE.
Grouping these attributes together in the relation HOMES1 (BUILDER, STYLE, PRICE) has
several undesirable properties.
First, the relationship between style and price is repeated in the relation for each builder who
builds a particular style of home. This repetition creates difficulties. If a builder who happens to
be the last builder of a certain style of home is deleted from the relation, then the relationship
between the style and its price also disappears from the relation. This is called a deletion
anomaly.
· Similarly, if a new builder who happens to be the first builder of a certain style of home is
added, then the relationship between a style of home and its price will also be added,
even though this was not the purpose of the insertion. This is called an insertion anomaly.
Such insertions and deletions are anomalous because most such operations will not
produce these side effects on the style-price relationship. These anomalies are undesirable
since the user is not likely to realize the consequences of the insertion or deletion.
· A second problem with the grouping is the effect of updates on the consistency of the
relation. Suppose that the relationship between a style and its price is changed, e.g. the
price is increased. To maintain the consistency of the relation, the new style-price
relationship should be included for every builder of the style.
· If the relation HOMES1 (BUILDER, STYLE, PRICE) is normalized, then the
consistency and anomaly problems disappear.
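The deletion anomaly can be made concrete with a small sketch in Python. The three rows and their prices are made-up sample values, not data from these notes, and the function name style_prices is hypothetical. The first half deletes a builder from HOMES1 and loses a price; the second half repeats the deletion on a split design, where the style-price pairs are stored separately.

```python
def style_prices(rows):
    """Project (builder, style, price) rows onto their (style, price) pairs."""
    return {(style, price) for _, style, price in rows}

# HOMES1(BUILDER, STYLE, PRICE); sample rows with assumed prices.
homes1 = {
    ("METRO", "BUNGALOW", 90),
    ("HOWLETT", "BUNGALOW", 90),
    ("TEREX", "RANCH", 120),
}

# Deleting the only RANCH builder also deletes the RANCH price.
after_delete = {row for row in homes1 if row[0] != "TEREX"}
print(style_prices(after_delete))

# Split design: HOMES2(BUILDER, STYLE) and COST(STYLE, PRICE).
homes2 = {(builder, style) for builder, style, _ in homes1}
cost = style_prices(homes1)

# The same deletion now touches HOMES2 only; COST keeps the RANCH price.
homes2_after = {row for row in homes2 if row[0] != "TEREX"}
print(homes2_after)
print(cost)
```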
Normalization is a step-by-step reversible process of replacing a given collection of relations by
successive collections in which the relations have a progressively simpler and more regular
structure. The reversibility guarantees that the original collection of relations can be recovered
and therefore no information has been lost.
The objectives of normalization are
1. To make it feasible to represent any relation in the database.
2. To obtain powerful retrieval algorithms based on a simpler collection of relational
operations than would otherwise be necessary.
3. To free relations from undesirable insertion, update and deletion dependencies.
4. To reduce the need for restructuring the relations as new types of data are introduced.
5. To make the collection of relations neutral to the query statistics, where these statistics are
liable to change as time goes by.
· The first two objectives apply only to the first step (conversion to first normal form).
· The last three objectives apply to all normalization steps.
· Unnormalized form:
Eliminate attributes that have relations as elements
· 1NF
· 2NF
· 3NF
· BCNF
Consider the relation;
HOMES (BUILDER, MODEL) 1NF
HOMES1 (BUILDER, STYLE, PRICE) 2NF
HOMES2 (BUILDER, STYLE) COST (STYLE, PRICE) 3NF
A → B: B is said to be dependent (functionally dependent) on A, and
A is said to determine (functionally determine) B.
A ↛ B means that B is not functionally dependent on A.
If both A → B and B → A hold, then at all times A and B are in 1:1 correspondence and the
notation A ↔ B is used.
Let f: A1 A2 A3 … An → B
g: A1 A2 A3 … Am → B where m < n
Assume f (a1, a2, a3, …, an) = g (a1, a2, a3, …, am) for all ai in Ai, 1 ≤ i ≤ n
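Whether a functional dependency A → B holds in a given set of tuples can be checked mechanically: no two tuples may agree on A and differ on B. The sketch below is an illustration in Python. The function name fd_holds and the sample rows are assumptions, not taken from these notes. A functional dependency is a statement about every state of a relation, so a check on one sample can refute a dependency but cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first right-side value seen for each left-side value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Sample HOMES1 tuples; the prices are made-up values.
homes1 = [
    {"BUILDER": "METRO", "STYLE": "BUNGALOW", "PRICE": 90},
    {"BUILDER": "HOWLETT", "STYLE": "BUNGALOW", "PRICE": 90},
    {"BUILDER": "TEREX", "STYLE": "RANCH", "PRICE": 120},
]

print(fd_holds(homes1, ["BUILDER"], ["STYLE"]))   # BUILDER -> STYLE
print(fd_holds(homes1, ["STYLE"], ["PRICE"]))     # STYLE -> PRICE
print(fd_holds(homes1, ["STYLE"], ["BUILDER"]))   # STYLE -> BUILDER fails
```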
The second normalization (2NF) removes partial dependencies of non-prime attributes on keys.
Second Normal Form (2NF): A relation R is in 2NF if R is in 1NF, and each non-prime attribute
in R is fully dependent upon every key.
Given the relation (keys are underlined)
HOUSES (ID, ADDRESS, SUBDIV, STYLE, BUILDER).
In the HOUSES relation, the keys are ID, ADDRESS and SUBDIV.
The non-prime attributes are STYLE and BUILDER. The relation is therefore not in 2NF. To
place the HOUSES relation into 2NF, it is split into two relations:
HOUSES1 (ID, ADDRESS, SUBDIV, STYLE)
CONTRACTOR (SUBDIV, BUILDER)
· Access to the builder information is still possible from the HOUSES1 relation through
the SUBDIV attribute common to both relations.
For example, to determine the builder of a certain house, the SUBDIV attribute value is obtained
from the appropriate tuple in the HOUSES1 relation. This value is then used to search the
CONTRACTOR relation and determine the corresponding builder.
Notice that it is possible to keep information about a builder for a subdivision independent of
the HOUSES1 relation. This capability eliminates one of the insertion anomalies discussed earlier.
The next normalization step converts relations to Third Normal Form or 3NF by eliminating
transitive dependence of non-prime attributes on keys.
Suppose that A, B and C are three subsets of the attributes of a relation R. Suppose that the
following time-independent conditions hold:
A → B
B ↛ A
B → C
A → C
C ↛ A
[Figure: A → B → C, showing the transitive dependence of C on A]
NOTE: C → B is neither prohibited nor required.
If the above conditions hold, then C is transitively dependent on A under R.
For example, PRICE is transitively dependent on BUILDER under HOMES1.
In the special case where C → B also holds, both B and C are transitively dependent on A under
R.
Transitive dependencies also lead to the insertion/deletion anomalies and consistency problems.
Builder
Style
Price
Transitive dependence
Consider now the problem of changing the style of house that is built by a builder. In this case,
the value of the price attribute also has to be changed. If it is not, then the database will show an
inconsistency.
Consider also the problem of inserting and deleting tuples. If a new HOMES1 tuple is inserted
for a new style home, then the relationship between style and price is also created.
Similarly, if a builder is deleted from the HOMES1 relation, and if this is the last or only builder
of a particular style, then all information about the particular style-price relationship is also
deleted.
The transitive dependency of PRICE on BUILDER can be eliminated by splitting the HOMES1
relation into the two relations HOMES2 (BUILDER, STYLE) and COST (STYLE, PRICE).
· These two relations cannot contain any transitive dependencies since they are each only of
order two.
· The price now appears only once for each style.
· No information has been lost since the price of a style of house can be obtained by using the
style values from the HOMES2 relation to access the COST relation.
Both the HOMES2 and COST relations are in 3NF
CADILLAC DUPLEX
DELZOTO DUPLEX
HOWLETT BUNGALOW
JOINT RANCH
METRO BUNGALOW
MONZA DUPLEX
TEREX RANCH
WIMREY RANCH
WEEK – SIX
NORMALIZATION / DECOMPOSITION
For example, consider a modified set of functional dependencies describing part of the
HOUSES2 model shown below.
Suppose the price a builder charged in an old subdivision is also to be charged in a new
subdivision, i.e. the PRICE → SUBDIV dependency is modified.
In a BCNF relation R, every functional dependency in R must be of the form K → A where K is a
key and A is any attribute. The following can be asserted:
1. All non-prime attributes must be fully dependent on each key.
2. All prime attributes must be fully dependent on all keys of which they are not a part.
3. No attribute (prime or not) can be fully dependent on any set of attributes that is not a
key.
Factors – To link relationship of network data mobile, object types, characteristics, and
relationships.
Value of the attribute BUILDER determines values for the attribute STYLE and PRICE.
Value of the attribute STYLE determines the value for PRICE.
Unlike mathematical relations, database relations are time-varying since tuples may be inserted,
deleted or updated.
An index implies a selection mechanism or a pointer structure to desired data.
2NF = the second normalization removes partial dependencies of non-prime attributes on keys.
HOUSES (ID, ADDRESS, SUBDIV, STYLE, BUILDER)
- The keys are ID, ADDRESS, SUBDIV
- The non-prime attributes are STYLE and BUILDER
HOUSES1 (ID, ADDRESS, SUBDIV, STYLE)
CONTRACTOR (SUBDIV, BUILDER)
A→B B↛A
B→C C↛A
A→C
Transitive dependencies lead to insertion and deletion anomalies and consistency problems.
The transitive dependency of PRICE on BUILDER can be eliminated by splitting the HOMES1 relation
into the two relations.
Home 2 (Builder, style) cost (style, price)
Metro bungalow
Moon duplex
- eliminate redundancy
- eliminate anomalies
Kind of anomalies
Insertion anomalies
Deletion anomalies
Update anomalies
Decomposition Relations
Given a relation R with schema {A1, A2, …., An} we may decompose R into two relations S
and T with schema {B1,B2,…, Bm } and {C1, C2,…..,Ck}
T = {title, year, Star name}
S = movie 1
T = movie 2
That is the left side of every nontrivial functional dependency must be a super key.
Thus, an equivalent statement of the BCNF condition is that the left side of every nontrivial
functional dependency must contain a key
R → S ∪T
Title   Year   Film type   Studio name   Length
Star    1997   Color       Fox           124
Might   1991   Color       Disney        104
Way     1992   Color       Paramount     95
Add     2001   Color       Paramount     102
R is not in BCNF
S is in BCNF
T is in BCNF
Note that the difference between the 3NF condition and the BCNF condition is the clause “or B is a
member of some key”.
3NF allows the right-hand side attribute to be a member of a key.
When these relations are not in BCNF, there will be some redundancy left in the schema.
When we decompose a relation schema, we need to check that the resulting schemas are in
BCNF.
Suppose we have a relation R, which is decomposed into relation S and some other relation
R (S1, S2, S3)
Compute X+
1. B is an attribute of S
2. B is in X+
3. B is not in X
E.g. R (A, B, C, D)
A→B
We must compute the closure of each subset of [A, C] which is a set of attributes of S
A+ = A → B AB
B→C ABC
A→C C is in S [A, C]
C+ = A → B
B→C
=C
AC + = ABC for A → B
B → C = ABC
AC → B B is not in S [A, C] there is no new dependency
R (A B C D E)
S (A, B, C)
A→D
B→E
DE → C
The closure of every subset of the attributes of S is computed using the three FDs:
Set    Closure   FDs obtained
A+     AD        A → D
B+     BE        B → E
C+     C         (none)
AB+    ABCDE     AB → D, AB → E, AB → C
AC+    ACD       AC → D
BC+    BCE       BC → E
ABC+   ABCDE     ABC → D, ABC → E
Of these, only AB → C has both its left and right side inside S (A, B, C), so it is the only
nontrivial FD that needs to be kept for S.
Suppose we have a relation ABCD with some FD’s F. If we decide to decompose ABCD into
ABC and AD, what are the FD’s for ABC and AD
F = AB → C
C→ D
D→A
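Projection onto {A, B, C}:
Set    Closure   FDs obtained       FDs inside {A, B, C}
A+     A         (none)
B+     B         (none)
C+     ACD       C → D, C → A       C → A
AB+    ABCD      AB → C, AB → D     AB → C
AC+    ACD       AC → D
BC+    ABCD      BC → D, BC → A     BC → A
Projection onto {A, D}:
Set    Closure   FDs obtained       FDs inside {A, D}
A+     A         (none)
D+     AD        D → A              D → A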
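The projection just worked by hand can be reproduced with a short program. This is a sketch in Python under stated assumptions: the function names closure and project are illustrative, attribute names are single letters so that a string such as "AB" stands for the set {A, B}, and every subset of the schema is tried, which is only practical for small schemas. An FD whose left side contains a smaller left side already found for the same attribute is skipped, so BC → A is left out because C → A covers it.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project(fds, schema):
    """Nontrivial FDs X -> A holding on schema, keeping only minimal X."""
    found = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            for a in sorted((closure(x, fds) & set(schema)) - x):
                # Skip X -> A if a smaller left side already gives A.
                if not any(lhs < x and rhs == a for lhs, rhs in found):
                    found.append((frozenset(x), a))
    return [("".join(sorted(lhs)), a) for lhs, a in found]

F = [("AB", "C"), ("C", "D"), ("D", "A")]
print(project(F, "ABC"))
print(project(F, "AD"))
```

The list returned is not guaranteed to be a minimal basis in general; it only removes FDs with an unnecessarily large left side.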
A→B
B→C
Project at AC
1. A+ = ABC yields A→ B, A→ C
2. B+ = BC yields B → C
3. C+ = C
4. AB+ = ABC yields AB → C
5. AC+ = ABC yields AC→ B
6. BC+ = BC
7. ABC = ABC
Resulting FD’s
A→B
A→C
B→C
FD’s are AC → B
BD → F
F → CE
such that all of B1, B2, …, Bm are in the set of attributes X but C is not. We add C to the set X.
4. The set X, after no more attributes can be added to it, is the correct value of {A1, A2, …, An}+
UNIT SIX
Output: the set C of super keys that hold as candidate keys in all relation universes over H in
begin
while Q <>∅ do
Q := Q – {K};
minimal := true;
K := (K−Y) ∪X ;
if Kʹ in K then
minimal := false ;
Q := Q ∪ { Kʹ } ;
end if.
end for
if minimal and there is not a subset of K in C then remove all super keys of K from C ;
C := C ∪{ K }
end if
end while
end.
· Can recover any instance of the decomposed relation from corresponding instance of the
smaller relations.
· It is required to be reversible.
· It wants no information loss in the process.
The decomposition is lossless since it does not lose any information contained in the original
relation.
· It does not generate any spurious tuples, which would lead to false or misleading information.
· It is called a non-additive join since it does not add any new tuples.
· It allows us to get back exactly what we started with before decomposing.
Remember that we decompose to do away with redundancy and all of the problems associated with
duplicated data.
John 2 Books
John 2 Rulers
Bello 2 bks
John 2 rulers
The question is: what conditions must be satisfied in order to guarantee that joining R1 and R2
back together takes us back to the original R?
A decomposition D = {R1, R2} of R has the lossless join property with respect to the set of FD’s
f if
MULTIVALUE DEPENDENCIES
4NF
Multivalued Dependencies
A multivalued dependency (MVD) has the form X─>─>Y where X and Y are sets of attributes
in a relation R.
X ─>─> Y means that whenever two rows in R agree on all the attributes of X, we can swap their
Y components and get new rows that are also in R.
X   Y    Z
a   b1   c1
a   b2   c2
a   b1   c2   (must be in R too)
a   b2   c1   (must be in R too)
- - -
- - -
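The swapping condition can be tested directly on a set of tuples. The following Python sketch is an illustration only; the function name mvd_holds and the way rows are represented are assumptions. For every pair of rows that agree on X it builds the row that takes its Y values from one row and all other values from the other, and checks that this row is present.

```python
def mvd_holds(rows, x, y, attrs):
    """Check X ->> Y on rows (a set of tuples) with columns named by attrs."""
    xi = [attrs.index(a) for a in x]
    yi = [attrs.index(a) for a in y]
    for t in rows:
        for u in rows:
            if all(t[i] == u[i] for i in xi):
                # Take the Y values from u and every other value from t.
                swapped = tuple(u[i] if i in yi else t[i]
                                for i in range(len(attrs)))
                if swapped not in rows:
                    return False
    return True

attrs = ["X", "Y", "Z"]
r = {("a", "b1", "c1"), ("a", "b2", "c2"),
     ("a", "b1", "c2"), ("a", "b2", "c1")}
incomplete = {("a", "b1", "c1"), ("a", "b2", "c2")}

print(mvd_holds(r, ["X"], ["Y"], attrs))           # all four rows present
print(mvd_holds(incomplete, ["X"], ["Y"], attrs))  # swapped rows missing
```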
MVD student (sid, cid, club)
sid ─>─> cid
Given: a relation R and a set of MVD’s for R
Definition: R is in 4NF with respect to its MVD’s if, for every nontrivial MVD AA ─>─> BB, AA
contains a key.
Note: Since every FD is also an MVD, 4NF implies BCNF
Trivial MVD
An MVD AA ─>─> BB is trivial when BB is a subset of AA or (AA U BB) contains all the
attributes of R.
Nontrivial MVD
AA─>─>BB where BB is not a subset of AA and (AA U BB) does not contain all attributes of
R.
MVD
Definition:
- Complementation rule
If X ─>─> Y then
X ─>─> attributes (R) − X − Y
- Augmentation rule
If X ─>─> Y and
V ⊆ W then
XW ─>─> YV
- Transitivity rule
If X ─>─> Y and
Y ─>─> Z then
X ─>─> Z − Y
Algorithm for Decomposing a relation into 4NF relation (same ideas as BCNF)
-2b- Decompose R′ into R1(AA, BB) and
R2 (AA,CC) where
5NF
If a relation is already in 3NF and each of its keys consists of a single attribute, it is also in 5NF.
There may be some relations that are in 4NF, but still have some redundant information.
However, there are no violating MVD’s and/or FD’s so we cannot use either of these
dependencies to decompose the relation.
If it is true, give a brief argument.
- Why it is true e.g. by applying the closure to desire FD’s.
- You may assume the relation to which they apply is R(A, B, C, D)
a) If A─>─>B then A─>B
If A─>B then A─>─>B false. A─>B
A─>─>B
BC─> D
Note that you cannot reduce ABC ─> BD to AC ─> D by removing B from both sides.
c) If A ─> B and
B ─> C then
A ─>─> BD
Applying the transitive rule to A ─> B and B ─> C yields A ─> C.
Applying the promotion rule to A ─> C then yields A ─>─> C.
Applying the complementation rule to A ─>─> C then yields A ─>─> BD.
1) Create a matrix S with one row i for each relation Ri in the decomposition D, and one column
j for each attribute Aj in R.
2) Set S(i, j) := bij for all matrix entries
(* each bij is a distinct symbol associated with indices (i, j) *).
3) For each row i representing relation schema Ri, and for each column j representing attribute Aj:
if Ri includes attribute Aj then set S(i, j) := aj
(* each aj is a distinct symbol associated with index j *).
4) Repeat the following until a loop execution results in no changes to S:
For each functional dependency X ─> Y in f, for all rows in S which have the same symbols
in the columns corresponding to attributes in X,
make the symbols in each column that corresponds to an attribute in Y the same in all these
rows, as follows:
if any of the rows has an “a” symbol for the column, set the other rows to that same “a”
symbol in the column. If no “a” symbol exists for the attribute in any of the rows, choose one
of the “b” symbols that appear in one of the rows for the attribute and set the other rows to
that “b” symbol in the column.
5) If a row is made up entirely of “a” symbols, then the decomposition has the lossless join
property; otherwise, it does not.
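The five steps above can be sketched in Python as follows. The function name lossless_join and the data layout (one dictionary per matrix row) are assumptions made for illustration. One detail goes slightly beyond the wording of step 4: when two symbols are made equal, the sketch renames the symbol wherever it occurs in that column, which is the usual convention for this kind of test. The example applies it to the HOMES1 relation split into HOMES2 and COST, and to a second split chosen only to show a failing case.

```python
def lossless_join(attributes, decomposition, fds):
    """Matrix test: True if some row of S ends up with only "a" symbols."""
    # Steps 1-3: "a" where relation i contains the attribute, else b_ij.
    S = [{A: ("a" if A in Ri else f"b{i}{j}")
          for j, A in enumerate(attributes)}
         for i, Ri in enumerate(decomposition)]
    changed = True
    while changed:                       # step 4: repeat until S is stable
        changed = False
        for X, Y in fds:
            groups = {}
            for row in S:                # group rows agreeing on all of X
                groups.setdefault(tuple(row[A] for A in X), []).append(row)
            for rows in groups.values():
                for A in Y:
                    symbols = {row[A] for row in rows}
                    if len(symbols) > 1:
                        # Prefer the "a" symbol, otherwise pick one b symbol.
                        target = "a" if "a" in symbols else min(symbols)
                        for row in S:    # rename throughout the column
                            if row[A] in symbols:
                                row[A] = target
                        changed = True
    # Step 5: look for a row made up entirely of "a" symbols.
    return any(all(v == "a" for v in row.values()) for row in S)

attrs = ["BUILDER", "STYLE", "PRICE"]
fds = [(["BUILDER"], ["STYLE"]), (["STYLE"], ["PRICE"])]
good = [{"BUILDER", "STYLE"}, {"STYLE", "PRICE"}]
bad = [{"BUILDER", "PRICE"}, {"STYLE", "PRICE"}]

print(lossless_join(attrs, good, fds))
print(lossless_join(attrs, bad, fds))
```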
A decomposition D= (R1, R2,...,Rm) of R has the lossless (non additive) join property with
respect to set of dependencies F on R if for every relation instance r of R that satisfies f,
*(∏<R1>(r), …,∏<Rm>(r)) = r
Additional tuples represent erroneous information and hence add more information.
1) A decomposition D = {R1, R2} of R has the lossless join property with respect to a set of
functional dependencies f on R if and only if either
· The FD ((R1 ∩ R2) ─> (R1 − R2)) is in f+, or
· The FD ((R1 ∩ R2) ─> (R2 − R1)) is in f+
2) If a decomposition D = {R1, R2, …, Rm} of R has the lossless join property with respect to a
set of functional dependencies f on R, and if a decomposition D1 = {Q1, Q2, …, Qk} of Ri has
the lossless join property with respect to the projection of f on Ri, then the decomposition
D2 = {R1, R2, …, Ri-1, Q1, Q2, …, Qk, Ri+1, …, Rm} of R has the lossless join property with
respect to f.
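Property 1 gives a quick test for a decomposition into two relations: the FD (R1 ∩ R2) ─> (R1 − R2) is in f+ exactly when R1 − R2 is contained in the closure of R1 ∩ R2. The Python sketch below uses that fact. The function names are illustrative, and the dependencies ID ─> ADDRESS, SUBDIV, STYLE and SUBDIV ─> BUILDER are assumptions read off the HOUSES example rather than dependencies stated in these notes.

```python
def closure(attrs, fds):
    """Closure of a set of attribute names under fds, a list of (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """Property 1: test the two FDs on the common attributes of r1 and r2."""
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure

# Assumed dependencies for the HOUSES example.
fds = [(("ID",), ("ADDRESS", "SUBDIV", "STYLE")),
       (("SUBDIV",), ("BUILDER",))]

houses1 = {"ID", "ADDRESS", "SUBDIV", "STYLE"}
contractor = {"SUBDIV", "BUILDER"}
print(lossless_binary(houses1, contractor, fds))

# A split on STYLE, which determines nothing here, is not lossless.
print(lossless_binary({"ID", "STYLE"}, {"STYLE", "BUILDER"}, fds))
```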
R2 a1 b22 a3 a4 a5 a6
R1 = {SSN, EName}
R3 = {SSN, PNumber, Hours}
FD = SSN → EName
R2 b21 b22 a3 a4 a5
b26
b32 b34 b35
R3 a1 a2 a3 a4 a5a6
Matrix S after applying the first two functional dependencies = last row is all “a” symbols
So we stop.
UNIT SEVEN
MULTIVALUED DEPENDENCIES
· 4th = fourth normal form
· Join dependencies
· 5th = fifth normal form
· Inclusion dependencies
· Template dependencies
· Domain-key normal form
Multivalued dependencies are a consequence of first normal form, which disallows an attribute
in a tuple from having a set of values, a list of values or a combination of both.
· Whenever two independent 1:N relationships A:B and A:C are mixed in the same
relation by representing all possible combinations, an MVD may arise.
Definition:
Note that, because of the symmetry in the definition, whenever X ─>─> Y holds in R, so does
X ─>─> (R − X − Y)
EName ─>─> PName
EName ─>─> DName or
EName ─>─> PName | DName
EMP
EName   PName   DName
Smith   X       John
Smith   Y       Anna
Smith   X       Anna
Smith   Y       John
Emp-Projects emp-Dependents
Smith works on project with PName X, Y. Has two dependent with DName John and Anna
An MVD X─>─> Y in R is called a trivial MVD if
a) Y is a subset of X or
b) X U Y = R
Emp is in BCNF because no FD holds in Emp. We need to define a fourth normal form that is
stronger than BCNF.
Properties of MVD
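- Complementation: if X ─>─> Y then X ─>─> {R − (X U Y)}
- Augmentation: if X ─>─> Y and W ⊇ Z then
WX ─>─> YZ
- Coalescence: if X ─>─> Y and there exists W such that
(a) W ∩ Y is empty
(b) W ─> Z
(c) Y ⊇ Z then X ─> Z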
A relation schema R is in 4NF with respect to a set of dependencies f if, for every nontrivial
multivalued dependency X ─>─> Y in f+, X is a superkey for R.
The EMP relation is not in 4NF because, in the nontrivial MVDs EName ─>─> PName and EName
─>─> DName, EName is not a superkey of EMP.
Both Emp-Projects and Emp-Dependents are in 4NF, because EName ─>─> PName is a
trivial MVD in Emp-Projects.
In fact, no nontrivial MVDs hold in either Emp-Projects or Emp-Dependents.
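As a small illustration, the EMP tuples for Smith can be split into the two projections and joined back. The sketch below is in Python with illustrative variable names; it shows that each projection needs only two tuples and that their natural join on the employee name rebuilds the four original tuples.

```python
# EMP(EName, PName, DName) with the four Smith tuples from the notes.
emp = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}

# Project onto (EName, PName) and (EName, DName).
emp_projects = {(e, p) for e, p, _ in emp}
emp_dependents = {(e, d) for e, _, d in emp}

# Natural join of the two projections on EName.
rejoined = {(e, p, d)
            for e, p in emp_projects
            for e2, d in emp_dependents
            if e == e2}

print(sorted(emp_projects))
print(sorted(emp_dependents))
print(rejoined == emp)
```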
Emp
Brown z Jim
Brown x bob Brown Joan
Brown z bob
16 tuples 11 tuples
48 facts 22 facts
PROJECTING MVDs
R (A B C D E) S (A B C)
MVD A ─>─> CD
Chain
A ─>─> C holds in S
A ─>─> B by complementation
A B C D E
a b1 c d1 e1
a b c2 d e
A─>─> CD
A B C D E
a b1 C d1 e1
a b c2 d e
a b1 c2 d e1
a b C d1 e
The last row has un-subscripted symbols in all the attributes of S, that is A, B and C. This is
enough to conclude that A─>─>C holds in S
- Trivial MVD
A1, A2, A3….. An─>─> B1 B2 B3……Bm
B1 B2………Bm─>─>C1 C2……….CK
then
A1, A2, A3….. An─>B1 B2……Bm then
- Complementation rule
If A1, A2, A3….. An─>─> B1 B2 B3……Bm is an MVD for R then R also satisfies A1,
A2….. An─>─> C1 C2……CK where the C’s are all attributes of R not among the A’s and
B’s
Multivalue dependency
X ─>─> Y tells us that if we find two rows of the tableau that agree in X, then we can form two
new tuples by swapping all their components in the attributes of Y
R(A B C D)
A─>B
B ─>─>C
A B C D A B C D
a b1 c d1 A─> B a b c d1
a b c2 d a b c2 d
A B C D
a b c d1
a b c2 d
a b c2 d1
a b c d
A─>─> C
Given two tuples of R that agree on A, they must also agree in B, A ─>B
MVD
X ─>─>Y and any FD whose right side is a (not necessarily proper) subset of Y, say Z then X
─>Z
R (A B C D) A B C D
a b2 c2 d2
FD D ─>C
A B C D
a b1 c1 d1
a b2 c2 d2
a b2 c2 d1 D ─>C
a b1 c1 d2
A B C D
a b1 c1 d1
a b2 c2 d2
a b1 c1 d2
1. If necessary, split the FD’s of S so that each FD in S has a single attribute on the right.
2. Let X be a set of attributes that eventually will become the closure.
Initialize X to be {A1, A2, A3, …, An}.
3. Repeatedly search for some FD
B1 B2 B3 … Bm → C
such that all of B1, B2, …, Bm are in the set of attributes X, but C is not. Add C to the set X
and repeat the search. Since X can only grow and the number of attributes of any relation
schema must be finite, eventually nothing more can be added to X, and this step ends.
4. The set X, after no more attributes can be added to it, is the correct value of
{A1, A2, A3, …, An}⁺
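The closure steps translate almost line for line into code. The following Python sketch is one possible rendering; the function name closure and the use of single-letter attribute names (so that "DE" stands for {D, E}) are assumptions. It is run on the relation R (A B C D E) with FDs A → D, B → E and DE → C used earlier in these notes.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    # Step 1: split right sides so each FD has a single attribute on the right.
    split = [(lhs, c) for lhs, rhs in fds for c in rhs]
    # Step 2: X starts as the given set of attributes.
    X = set(attrs)
    # Step 3: keep adding C while some FD B1...Bm -> C has all Bi in X.
    grew = True
    while grew:
        grew = False
        for lhs, c in split:
            if set(lhs) <= X and c not in X:
                X.add(c)
                grew = True
    # Step 4: nothing more can be added, so X is the closure.
    return X

F = [("A", "D"), ("B", "E"), ("DE", "C")]
print(sorted(closure("A", F)))
print(sorted(closure("BC", F)))
print(sorted(closure("AB", F)))
```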
Algorithm for projecting a set of functional Dependencies
method
Repeat the above steps in all possible ways until no more changes to T can be made.
Decomposition Algorithm
Method: The following steps can be applied recursively to any relation R and set of FD’s S.
Initially, apply them with R = R0 and S = S0.
1. Check whether R is in BCNF. If so, nothing more needs to be done. Return {R} as the
answer.
2. If there are BCNF violations, let one be X → Y. Use the closure algorithm to compute X⁺.
Choose R1 = X⁺ as one relation schema and let R2 have the attributes of X and those attributes of
R that are not in X⁺.
3. Use the algorithm for projecting a set of functional dependencies to compute the sets of FD’s for
R1 and R2; let these be S1 and S2 respectively.
4. Recursively decompose R1 and R2 using this algorithm. Return the union of the results of
these decompositions.
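A compact sketch of this decomposition in Python is given below. It is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, and instead of computing the projected sets S1 and S2 explicitly (step 3) it recomputes closures under the original FD's and intersects them with the current schema, which identifies the same violations. Every subset of a schema is tried, so it suits only small examples such as HOMES1.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attribute names under fds, a list of (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation(schema, fds):
    """Return a set X that determines more of schema, but not all, or None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            inside = closure(x, fds) & schema
            if x < inside < schema:      # nontrivial and X is not a superkey
                return x
    return None

def bcnf_decompose(schema, fds):
    """Steps 1-4 of the method: split on a violation and recurse."""
    schema = set(schema)
    x = bcnf_violation(schema, fds)
    if x is None:
        return [schema]                  # step 1: already in BCNF
    r1 = closure(x, fds) & schema        # step 2: R1 = X+
    r2 = x | (schema - r1)               # R2 = X plus attributes not in X+
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

fds = [(("BUILDER",), ("STYLE",)), (("STYLE",), ("PRICE",))]
result = bcnf_decompose({"BUILDER", "STYLE", "PRICE"}, fds)
print(result)
```

For HOMES1 the only violation is STYLE → PRICE, so the result is the pair of schemas (STYLE, PRICE) and (BUILDER, STYLE), matching COST and HOMES2.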
In database design, to test for lossless decomposition, one may use the following algorithm
i = i + 1;
Ri =
end
if none of the schemas Rj, 1 ≤ j ≤ i,
contains a candidate key for R then
begin
i = i + 1;
Ri = any candidate key for R;
end
return (R1, R2, R3, …, Ri)
Fc computation algorithm
Fc = F
repeat
apply union rule (right side of fd)
find fd with extraneous attributes ( left | right side)
and delete these
until Fc does not change.
result := {R};
done := false;
compute F⁺;
while (not done) do
if (there is a schema Ri in result that is not in BCNF) then
begin
let α → β be a nontrivial functional dependency that holds on Ri such that
α → Ri is not in F⁺ and α ∩ β = ∅;
result := (result – Ri) ∪ (Ri − β) ∪ (α, β);
end
else
done := true;
When we decompose a relation, we have to use natural joins or Cartesian products to put the
pieces back together .This takes computational time.
3. If none of the sets of relations from step 2 is a super key for R, add another relation
whose schema is a key for R.
R (A B C D E)
FDs
AB → C
C→B
A→D
Compute the closure of {A, B} using only C → B and A → D:
{A, B}+ = {A, B, D}, so D is included but not C.
We conclude that the first FD AB → C is not implied by the second and third FD’s. We get a
similar conclusion if we try to drop the second or third FD.
· We must also verify that we cannot eliminate any attributes from a left side.
· We start the 3NF synthesis by taking the attributes of each FD as a relation schema.
{A, B, C}
{C, B}
{A, D}
It is never necessary to use a relation whose schema is a proper subset of another relation’s
schema, so we can drop {C, B} and keep {A, B, C} and {A, D}.
R has two keys: {A, B, E} and {A, C, E}. Neither of the keys is a subset of the schemas chosen so far.
We must add one of them, say {A, B, E}. The final decomposition is R1 {A, B, C}, R2 {A, D}, R3
{A, B, E}
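The synthesis just carried out by hand can be sketched in Python. The sketch assumes that the FD's passed in already form a minimal basis, as was checked above, and it finds the keys by trying every subset of attributes, which is only reasonable for a small schema. The function names are illustrative, and attribute names are single letters so that "AB" stands for {A, B}.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            k = set(combo)
            if closure(k, fds) >= set(schema) and not any(key <= k for key in keys):
                keys.append(k)
    return keys

def synthesize_3nf(schema, minimal_basis):
    """One schema per FD, drop proper subsets, add a key if none is covered."""
    schemas = []
    for lhs, rhs in minimal_basis:
        s = set(lhs) | set(rhs)
        if s not in schemas:
            schemas.append(s)
    schemas = [s for s in schemas if not any(s < t for t in schemas)]
    keys = candidate_keys(schema, minimal_basis)
    if not any(k <= s for k in keys for s in schemas):
        schemas.append(keys[0])
    return schemas

F = [("AB", "C"), ("C", "B"), ("A", "D")]
print(candidate_keys("ABCDE", F))
print(synthesize_3nf("ABCDE", F))
```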
D = {Rı, R2,…,Rn}.The decomposition is said to satisfy the attribute preservation condition if
A decomposition is called lossless (nonadditive) when natural joins applied to the relations in the
decomposition do not generate spurious tuples. The word loss in lossless refers to the loss of
information and not the loss of tuples. In fact, the loss of information occurs because of added
(spurious) tuples in the joins. A decomposition which is not lossless is called lossy.
UNIT EIGHT
RELATIONAL ALGEBRA
In the relational data model, attribute relationships are represented by relations. The relational
data model, unlike the network model, does not provide links to represent associations. Instead,
another relation is used to represent an association.
Relations that represent associations can be existing relations in the database or they can be
generated (created) from existing relations by using relational operators.
The relational operators can be described using either the relational algebra or relational calculus.
Relational algebra is a set of operators that constructs the required relation from given relations.
∃   there exists
∀   for all
˄   and
˅   or
-   not (complement)
∈   belongs to (member of)
⊆   subset
∅   empty set
:   such that
S = <s1, ……,sn>
= <1, 2, x, a, 2, 3 >
2. Let R be an n-ary relation, r ∈ R a tuple of R, and {D1, …, Dn} the domains of R; then
5. Two sets of attributes A and B are compatible if they are of the same degree and the
corresponding domains are of the same data type.
The relations depict the products manufactured by a company (their code, production cost and
selling price), and the buyers of those products (their names and the products they buy).
For each operator, the expression on the left-hand side of the equality sign is the relational
algebra expression for the operator. The expression on the right-hand side is the relational
calculus definition of the operator.
The first three relational operators are required to obtain any subset of a given relation. For
example, suppose that a user wants those tuples in the PRODUCT relation where the PRICE
attribute value is less than or equal to 8.
The restriction operator can express this requirement as
PRODUCT [PRICE ≤ 8] = (CODE, COST, PRICE)
                       A     5     8
                       B     4     4
More formally, restriction is defined as
R [A θ v] = {r : r ∈ R ˄ (r[A] θ v)}
where A is an attribute of R,
θ (theta) is one of the conditional operators <, ≤, >, ≥, =, ≠, and
v is a literal value.
The restriction operator is equivalent to a qualification, containing a single condition, on a single
relation. It requires a specific data value (8 in the preceding example) in the condition. The
selection operator, in contrast, uses a condition involving two attributes of the same relation. For
example, suppose that one would like to know which products are being sold at cost. The selection
operator can specify this data selection as
PRODUCT [PRICE = COST] = (CODE, COST, PRICE)
                          B     4     4
In this case, the qualification specifies a condition involving the two attributes PRICE and COST
in the relation PRODUCT. The two attributes must be compatible. That is, they must be of the
same data type. The result relation contains only those tuples of PRODUCT where the COST
attribute value is equal to the PRICE attribute value. More formally, selection is defined as
R [A θ B] = {r : r ∈ R ˄ (r[A] θ r[B])}
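Restriction and selection differ only in what the attribute is compared with, which a short sketch makes visible. The Python functions restrict and select below are illustrative names, and the conditional operator θ is passed in as a function from the standard operator module. The PRODUCT tuples are the ones used in the examples of this unit.

```python
import operator

# PRODUCT(CODE, COST, PRICE) as used in this unit.
PRODUCT = [
    {"CODE": "A", "COST": 5, "PRICE": 8},
    {"CODE": "B", "COST": 4, "PRICE": 4},
    {"CODE": "C", "COST": 6, "PRICE": 9},
]

def restrict(relation, attr, theta, value):
    """R[A theta v]: compare attribute A of each tuple with a literal v."""
    return [r for r in relation if theta(r[attr], value)]

def select(relation, attr_a, theta, attr_b):
    """R[A theta B]: compare two attributes of the same tuple."""
    return [r for r in relation if theta(r[attr_a], r[attr_b])]

cheap = restrict(PRODUCT, "PRICE", operator.le, 8)       # PRODUCT [PRICE <= 8]
at_cost = select(PRODUCT, "PRICE", operator.eq, "COST")  # PRODUCT [PRICE = COST]

print([r["CODE"] for r in cheap])
print([r["CODE"] for r in at_cost])
```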
Projection operator can be used to perform this selection. For example, if a user wants to know
names of all buyers of products, this data selection specified as
BUYER [NAME] = (NAME)
Smith
Jones
Adams
In addition, any duplicate tuples are eliminated. More formally, projection is defined as
R [A] = {r[A] : r ∈ R}
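The three operators can be sketched with Python set comprehensions. This is only an illustration: the relations are modelled as sets of tuples, and the rows are the PRODUCT and BUYER tuples that appear in the examples of this section.

```python
# Relations as sets of tuples (rows taken from the examples in this section).
PRODUCT = {("A", 5, 8), ("B", 4, 4), ("C", 6, 9)}            # (CODE, COST, PRICE)
BUYER = {("Smith", "A"), ("Jones", "B"), ("Adams", "A"),
         ("Smith", "B"), ("Jones", "A"), ("Smith", "C")}     # (NAME, ITEM)

# Restriction PRODUCT[PRICE <= 8]: compare one attribute with a literal value.
restriction = {r for r in PRODUCT if r[2] <= 8}

# Selection PRODUCT[PRICE = COST]: compare two attributes of the same tuple.
selection = {r for r in PRODUCT if r[2] == r[1]}

# Projection BUYER[NAME]: keep one column; the set removes duplicates.
projection = {r[0] for r in BUYER}

print(sorted(restriction))
print(sorted(selection))
print(sorted(projection))
```

Using a set for the result mirrors the rule that duplicate tuples are eliminated.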
The cross-product operator forms all possible combinations of the tuples of two relations.
For example, the cross product of BUYER and PRODUCT is
BUYER ⊗ PRODUCT = (NAME, ITEM, CODE, COST, PRICE)
1. Smith A A 5 8
2. Jones B A 5 8
3. Adams A A 5 8
4. Smith B A 5 8
5. Jones A A 5 8
6. Smith C A 5 8
7. Smith A B 4 4
8. Jones B B 4 4
9. Adams A B 4 4
10. Smith B B 4 4
11. Jones A B 4 4
12. Smith C B 4 4
13. Smith A C 6 9
14. Jones B C 6 9
15. Adams A C 6 9
16. Smith B C 6 9
17. Jones A C 6 9
18. Smith C C 6 9
R ⊗ S = {(r, s) : r ∈ R ∧ s ∈ S}
Suppose that a user wants a list of buyer names, the products they buy and the cost and price of
each product. The answer to this query is contained in two relations, BUYER and PRODUCT. A
new relation containing the answer to the query can be constructed by taking the join of BUYER
and PRODUCT according to a join condition.
The join condition is expressed on two compatible attributes, one from each of the original
relations.
BUYER [ITEM = CODE] PRODUCT = (NAME, ITEM, CODE, COST, PRICE)
Smith A A 5 8
Jones B B 4 4
Adams A A 5 8
Smith B B 4 4
Jones A A 5 8
Smith C C 6 9
R [A θ B] S = {(r, s) : r ∈ R ∧ s ∈ S ∧ (r[A] θ s[B])}
Join
(1) A generalized join forms a new tuple from a BUYER tuple and a PRODUCT tuple whenever the
join condition is satisfied.
(2) A natural join is a join where the conditional operator is equality.
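A minimal Python sketch of the cross product and of the join on ITEM = CODE follows. The tuple layout and variable names are assumptions made for the illustration; the rows are those of the BUYER and PRODUCT examples.

```python
from itertools import product

BUYER = {("Smith", "A"), ("Jones", "B"), ("Adams", "A"),
         ("Smith", "B"), ("Jones", "A"), ("Smith", "C")}     # (NAME, ITEM)
PRODUCT = {("A", 5, 8), ("B", 4, 4), ("C", 6, 9)}            # (CODE, COST, PRICE)

# Cross product: every BUYER tuple concatenated with every PRODUCT tuple.
cross = {b + p for b, p in product(BUYER, PRODUCT)}

# Join: keep only the combinations that satisfy the join condition ITEM = CODE.
join = {b + p for b, p in product(BUYER, PRODUCT) if b[1] == p[0]}

print(len(cross), len(join))
for row in sorted(join):
    print(row)
```

The join is simply the cross product filtered by the join condition, which is why it has 6 tuples instead of 18.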
“Find the buyers who buy each type of product”
BUYER [ITEM ÷ CODE] PRODUCT = (NAME)
Smith
R [A ÷ B] S = {r[Ā] : r ∈ R ∧ S[B] ⊆ gR(r[Ā])}, where the attributes A and B are compatible
and Ā denotes the attributes of R other than A.
The result relation consists of the projection of the tuples in the dividend relation on those
attributes (NAME) not in the dividing attribute, for the tuples that satisfy the division.
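Division can be sketched in Python as follows, assuming the same BUYER and PRODUCT rows as in the earlier examples. For each buyer name the code collects the set of items bought and checks that it covers every product code.

```python
BUYER = {("Smith", "A"), ("Jones", "B"), ("Adams", "A"),
         ("Smith", "B"), ("Jones", "A"), ("Smith", "C")}     # (NAME, ITEM)
PRODUCT = {("A", 5, 8), ("B", 4, 4), ("C", 6, 9)}            # (CODE, COST, PRICE)

codes = {p[0] for p in PRODUCT}          # the divisor: all product codes
names = {name for name, _ in BUYER}      # candidate values for the result

# A name qualifies when the items it is paired with include every code.
division = {
    name for name in names
    if codes <= {item for (n, item) in BUYER if n == name}
}
print(division)
```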
∩ = intersection
− = difference
Find those buyers who purchase products whose price is greater than 5 but less than 9 would be
expressed as
· Intersection = set1 * set2 (∩)
· Difference = set1 - set2 (−), e.g. [1, 3, 4] - [1, 2, 4] = [3]
· Division = set1 / set2
· Equality: [1, 3] = [1, 3] is true
· Inequality: [1, 3] <> [2, 4]
· Subset (⊆): [1, 3] <= [1, 2, 3, 4]
· Proper subset: [1, 3] < [1, 2, 3, 4]
· Superset: [1, 2, 3, 4] >= [1, 3]
· Proper superset: [1, 2, 3, 4] > [1, 3]
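Python's built-in set type offers the same comparisons, which makes it convenient for checking the examples in the list above. The variable names are illustrative only.

```python
set1, set2 = {1, 3, 4}, {1, 2, 4}

intersection = set1 & set2                 # elements in both sets
difference = set1 - set2                   # elements in set1 but not in set2

is_equal = {1, 3} == {1, 3}
is_unequal = {1, 3} != {2, 4}
is_subset = {1, 3} <= {1, 2, 3, 4}
is_proper_subset = {1, 3} < {1, 2, 3, 4}
is_superset = {1, 2, 3, 4} >= {1, 3}
is_proper_superset = {1, 2, 3, 4} > {1, 3}

print(intersection, difference)
```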
Query =search
Relational operation.operates
Concatenation of
Many relation = gR r [ ̅]
UNIT NINE
RELATIONAL ALGEBRA
Four broad classes
R ∪ S, the union of R and S, is the set of elements that appear in R or S or both. An element
appears only once in the union even if it is present in both R and S.
R − S, the difference of R and S, is the set of elements that are in R but not in S. Note that
R − S is different from S − R. The latter is the set of elements that are in S but not in R.
a 2 f a f 2
b 1 g b g 1
c 3 f c f 3
1 A
gR(D1 = 2) ={<c>}
1 B
2 C gR (D1 = 3) = {∅}
1 d
1 a x f 2 R [A] = {x, f, a}
2 a y g 3 R [A] = {1,2}
1 b x f
2 c y b 2
π A1, A2, …, An (R) is a relation that has only the columns for attributes A1, A2, …, An of R.
Movie
in color (movie) =
Incolor
true
Selection: this operator, applied to a relation R, produces a new relation with a subset of R’s
tuples. The tuples in the resulting relation are those that satisfy some condition C that involves
the attributes of R. We denote this operation бC (R).
Title Year Length InColor StudioName Producer
Star Wars 1977 124 true Fox 12345
Mighty Ducks 1991 104 true Disney 67890
Operations that combine the tuples of two relations include the Cartesian product or cross product.
The Cartesian product, cross product or product of two sets R and S is the set of pairs that can be
formed by choosing the first element of the pair to be any element of R and the second to be any
element of S. This product is denoted by R ⊗ S
Relation R
A B
1 2
3 4
Relation S
B C D
2 5 6
4 7 8
9 10 11
R ⊗ S
A R.B S.B C D
1 2 2 5 6
1 2 4 7 8
1 2 9 10 11
3 4 2 5 6
3 4 4 7 8
3 4 9 10 11
Natural join
We find a need to join two relations by pairing only those tuples that match in some way. The
simplest sort of match is the natural join of two relations R and S.
Let A1, A2, …, An be the attributes in both the schema of R and the schema of S. Then a tuple r
from R and a tuple s from S are successfully paired if and only if r and s agree on each of the
attributes A1, A2, …, An.
A B C D
1 2 5 6
3 4 7 8
B C D
9 10 11
A tuple that fails to pair with any tuple of the other relation in a join is said to be a dangling
tuple.
A B C
1 2 3
6 7 8
9 7 8
B C D
2 3 4
2 3 5
7 8 10
U⋈ V
A B C D
1 2 3 4
1 2 3 5
6 7 8 10
9 7 8 10
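The pairing rule for the natural join can be sketched in Python. Each tuple is modelled as a dictionary from attribute name to value (an assumption of this sketch), and `natural_join` is a hypothetical helper name. The data are the relations U and V shown above.

```python
def natural_join(r, s):
    """Pair rows of r and s that agree on every attribute they share."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                result.append({**a, **b})   # shared columns appear only once
    return result

U = [dict(A=1, B=2, C=3), dict(A=6, B=7, C=8), dict(A=9, B=7, C=8)]
V = [dict(B=2, C=3, D=4), dict(B=2, C=3, D=5), dict(B=7, C=8, D=10)]

joined = natural_join(U, V)
for row in joined:
    print(row["A"], row["B"], row["C"], row["D"])
```

A tuple of U or V that matched nothing would simply not appear in `joined`; that is the dangling tuple case.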
Theta- joins
The natural join forces us to pair tuples using one specific condition.
Theta join refers to an arbitrary condition which we shall represent by ∁
R ⋈ ∁S
U ⋈A< D V
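As a sketch, the theta-join of U and V on the condition A < D can be computed by filtering the product of the two relations. The tuple layouts (A, B, C) and (B, C, D) are those of the relations U and V used for the natural join example.

```python
U = [(1, 2, 3), (6, 7, 8), (9, 7, 8)]      # (A, B, C)
V = [(2, 3, 4), (2, 3, 5), (7, 8, 10)]     # (B, C, D)

# Keep every combination of a U tuple and a V tuple where U.A < V.D.
# Unlike the natural join, both B and both C columns are kept.
theta = [u + v for u in U for v in V if u[0] < v[2]]

for row in theta:
    print(row)     # (A, U.B, U.C, V.B, V.C, D)
```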
What are the titles and years of movies made by Fox that are at least 100 minutes long?
To compute the answer:
π title, year (б length ≥ 100 (Movie) ∩ б studioName = “Fox” (Movie))
Movie1 (title, year, length, filmType, studioName)
Movie2 (title, year, starName)
Find the stars of movies that are at least 100 minutes long
π starName (б length ≥ 100 (Movie1 ⋈ Movie2))
Renaming
ρS (R) = change the name of relation R to S
ρayo (bell) = change the name from bell to ayo
ρS(X, C, D) (S) is a relation named S whose first column has attribute X instead of B
Uncle (x, y) P
↙ ↘
U S
Grandparent (p, y) mother (P, X)
Sister (S, X)
As a collection of tuples of facts
Father <adam, bill>
Father <adam, beth>
Mother <anne, bill>
Mother <anne, beth>
Parent <adam, bill>
Parent <adam, beth>
Parent <anne, bill>
Parent <anne, beth>
Deductive database
Parent (X, Y) ← father (X, Y)
Parent (X, Y) ← mother (X, Y)
Father (adam, bill)
Father (adam, beth)
Mother (anne, bill)
Mother (anne, beth)
Grandparent (X, Z) ← parent (X, Y), parent (Y, Z)
Parent (X, Y) ← father (X, Y)
Parent (X, Y) ← mother (X, Y)
Father
X Y
adam Bill
Bill cathy
a
Parent
Y Z
Adam Bill
Bill Cathy
Cathy dove
b
X F.Y P.Y Z
Adam Bill adam Bill
Adam Bill Bill cathy
Adam Bill Cathy Dove
Bill Cathy Adam Bill
Bill Cathy Bill Cathy
Bill cathy Cathy dove
X F.Y P.Y Z
Adam Bill bill Cathy
Bill cathy Cathy dove
X F.Y Z
Adam Bill Cathy
Bill Cathy Dove
Natural join
The natural join is computed by
· Taking the Cartesian product of the two relations
· Selecting those tuples which have identical values in the columns with the same attribute
· Filtering out the superfluous columns
F (X, Y) ⋈ P (Y, Z) is defined as
π X, F.Y, Z (б F.Y = P.Y (F (X, Y) × P (Y, Z)))
1. Cartesian product F (X, Y) × P (Y, Z)
2. Select the tuples that have the same value for F.Y and P.Y
π X, Z (F (X, Y) ⋈ P (Y, Z))
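The three steps can be traced in Python on the Father and Parent tables shown earlier. The set-of-tuples representation and the names `step1`, `step2`, `step3` are assumptions of this sketch.

```python
F = {("adam", "bill"), ("bill", "cathy")}                     # F(X, Y)
P = {("adam", "bill"), ("bill", "cathy"), ("cathy", "dove")}  # P(Y, Z)

# Step 1: Cartesian product, columns (X, F.Y, P.Y, Z).
step1 = {f + p for f in F for p in P}

# Step 2: select the tuples where F.Y = P.Y.
step2 = {t for t in step1 if t[1] == t[2]}

# Step 3: filter out the superfluous duplicate column, leaving (X, Y, Z).
step3 = {(x, y, z) for (x, y, _, z) in step2}

# Projecting on X and Z gives the pairs linked through a common Y.
pairs = {(x, z) for (x, _, z) in step3}
print(sorted(step3))
print(sorted(pairs))
```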
Or
Or
r (a, c)∈ P for which there exist no b (b ≠ a and b ≠ c) such that r (a, b) ∈ P and r (b, c) ∈ P
a
b
c
d
Deductive database
Over (a, b)
Over (b, c)
Over (c, d)
Or
On (a, b)
On (b, c)
On (c, d)
Path (X, X)
UNIT TEN
SQL
Table column1 column2 column n
Tuples or records
· A table can have up to 254 columns with the same or different data types
· A set of values (domain)
Data types derived from number are:
Integer = int
Decimal = dec
Smallint
Real
Oracle SQL has no data type Boolean. It can be simulated by using char(1) or number(1).
A column may have the value null. Null is different from the empty string ‘’ or 0.
Example database
EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO)
7369 Smith clerk 7902 17-Dec-80 800 20
DEPT (DEPTNO, DNAME, LOC)
20 research Dallas
SALGRADE (GRADE, LOWSAL, HIGHSAL)
1 700 1200
2 1201 1400
3 1401 2000
QUERIES
SQL is used to retrieve information from database.
Form:
select [distinct] <columns>
from <table>
[where <condition>]
[order by <column> [asc | desc]]
The select clause corresponds to projection in relational algebra:
Select loc, DEPTNO
From DEPT;
Operators
a) for numbers: abs, cos, sin, exp, log, power, mod, sqrt, +, -, *, /
b) for strings: chr, concat (string1, string2),
lower, upper,
replace (string, search_string, replacement_string),
substr (string, m, n),
length, to_date, translate
c) for dates: add_months,
months_between,
next_day,
to_char
· Distinct: placed after the keyword select, it forces the elimination of duplicates from the query
result.
· Order by: sorts the query result by the given column(s).
SELECTION OF TUPLES/RECORDS
Where = for records retrieval
= simple operator = and, or, not
= condition may be pattern matching
List the job title and the salary of those employees whose manager has the number 7698 or 7566
and who earn more than 1500
Select JOB, SAL
From EMP
Where (MGR = 7698 or MGR = 7566)
And SAL > 1500;
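The query can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for Oracle, the table has fewer columns than EMP, and apart from the Smith row the sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table EMP (EMPNO integer, ENAME text, JOB text, "
             "MGR integer, SAL integer, DEPTNO integer)")
conn.executemany(
    "insert into EMP values (?, ?, ?, ?, ?, ?)",
    [
        (7369, "SMITH", "CLERK", 7902, 800, 20),
        (7499, "ALLEN", "SALESMAN", 7698, 1600, 30),   # illustrative row
        (7521, "WARD", "SALESMAN", 7698, 1250, 30),    # illustrative row
        (7788, "SCOTT", "ANALYST", 7566, 3000, 20),    # illustrative row
    ],
)

rows = conn.execute(
    "select JOB, SAL from EMP "
    "where (MGR = 7698 or MGR = 7566) and SAL > 1500 "
    "order by SAL"
).fetchall()
print(rows)
```

The parentheses matter: without them, and binds more tightly than or, so the salary test would apply only to manager 7566.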
Further comparison operators are
2. Set condition: <column> [not] in (<list of values>)
select *
from DEPT
String Operations
Like = uses two wildcard characters, % and _ (percent sign and underscore)
Percent sign = wild card for any substring
Underscore = position marker for exactly one character
If one is interested in all tuples of the table DEPT that contain two Cs in the name of the
department, the condition would be
where DNAME like ‘%C%C%’
The % sign means that any (sub)string is allowed there, even the empty string. The underscore stands
for exactly one character: where DNAME like ‘%C_C%’ would require that exactly one character
appears between the two Cs.
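Both patterns can be checked with sqlite3. Note the assumptions: SQLite is used instead of Oracle (its like is case-insensitive for ASCII letters by default, so the sample names are all upper case), and the department CYCLES is invented to give one match for the second pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table DEPT (DEPTNO integer, DNAME text)")
conn.executemany(
    "insert into DEPT values (?, ?)",
    [(10, "ACCOUNTING"), (20, "RESEARCH"), (50, "CYCLES")],  # 50 is hypothetical
)

def names(pattern):
    """Return the department names matching the given like pattern."""
    cur = conn.execute(
        "select DNAME from DEPT where DNAME like ? order by DNAME", (pattern,)
    )
    return [row[0] for row in cur]

two_cs = names("%C%C%")          # two Cs anywhere, in this order
one_between = names("%C_C%")     # exactly one character between the Cs
print(two_cs, one_between)
```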
1. upper (<string>) converts letters to upper case, e.g.
DNAME = upper (DNAME)
2. lower (<string>) converts letters to lower case
3. initcap (<string>) converts the initial letter to upper case
4. length (<string>) returns the length of the string
5. substr (<string>, n, m) returns the substring of length m starting at position n, e.g.
substr (‘DATABASE SYSTEMS’, 10, 7) returns ‘SYSTEMS’
Aggregate function
1. count = count rows
How many tuples are stored in the relation EMP?
select count (*)
from EMP
How many different job titles are stored in the relation EMP?
select count (distinct JOB)
from EMP
2. MAX
3. MIN
List the minimum and maximum salary
Select min (SAL), max(SAL)
From EMP
4. SUM = sum of all salaries of employees working in the department 30
Select sum (SAL)
From EMP
Where DEPTNO = 30
QUERIES
Join tuples
Select distinct [<alias ak>.]<column i>, …
select ENAME, E.DEPTNO, DNAME
from EMP E, DEPT D
where E.DEPTNO = D.DEPTNO
and JOB = ‘SALESMAN’;
The same query written without aliases:
select ENAME, EMP.DEPTNO, DNAME
from EMP, DEPT
where EMP.DEPTNO = DEPT.DEPTNO
and JOB = ‘SALESMAN’;
e.g. for each project, retrieve its name, the name of its manager, and the name of the department
where the manager is working.
select ENAME, DNAME, PNAME
from EMP E, DEPT D, PROJECT P
SUBQUERIES
A condition in the where clause that uses a subquery can have one of the following forms:
1. set-valued subqueries
expression [not] in <subquery>
expression<comparison operator> any/all <subquery>
An expression can either be a column or a computed value.
2. test for (non) existence
[not] exists <subquery>
In a where clause, conditions using subqueries can be combined arbitrarily by using the logical
connectives and, or
e.g. list the name and salary of employees of department 20 who are leading a project that started
before December 31, 1990
select ENAME, SAL
from EMP
where EMPNO in
(select PMGR
from PROJECT
where PSTART < ‘31-DEC-90’)
and DEPTNO = 20;
The subquery retrieves the set of those employees who manage a project that started before
December 31, 1990. If the employee working in department 20 is contained in this set (in
operator), this tuple belongs to the query result set.
List all employees who are working in a department located in BOSTON
select *
from EMP
where DEPTNO in
(select DEPTNO
from DEPT
where LOC = ‘BOSTON’);
A subquery may itself contain a subquery in its where clause
List all those employees who are working in the same department as their manager
Select *
From EMP E1
Where DEPTNO in
(select DEPTNO
from EMP [E]
where [E.]EMPNO = E1.MGR);
The subquery in this example is related to its surrounding query since it refers to the column
E1.MGR. For Each tuple in the table E1, the subquery is evaluated individually.
Conditions of the form <expression> <comparison operator> [any | all] <subquery> are used to
compare a given <expression> with each value selected by <subquery>.
Retrieve all employees who are working in department 10 and who earn at least as much as any
(i.e. at least one) employee working in department 30
select *
from EMP
where SAL >= any
(select SAL
from EMP
where DEPTNO = 30)
and DEPTNO = 10;
For all and any, the following equivalences hold:
in ⇔ = any
not in ⇔ <> all or != all
Often a query result depends on whether certain rows do (not) exist in (other) tables. Such
queries are formulated using the exists operator.
List all the departments that have no employees
select *
from DEPT
where not exists
(select * from EMP
where DEPTNO = DEPT.DEPTNO);
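A runnable sketch of the not exists query using sqlite3 follows. SQLite stands in for Oracle here, the tables are cut down to the columns the query needs, and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table DEPT (DEPTNO integer, DNAME text);
create table EMP (EMPNO integer, ENAME text, DEPTNO integer);
insert into DEPT values (10, 'ACCOUNTING'), (20, 'RESEARCH'), (40, 'OPERATIONS');
insert into EMP values (7782, 'CLARK', 10), (7369, 'SMITH', 20);
""")

# The subquery is evaluated for each DEPT row; the row is kept only
# when the subquery returns no EMP row for that department.
empty_departments = conn.execute("""
    select DNAME
    from DEPT
    where not exists
        (select * from EMP
         where EMP.DEPTNO = DEPT.DEPTNO)
""").fetchall()
print(empty_departments)
```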
select EMPNO, ENAME
from EMP
union
select EMPNO, ENAME
from EMP2
· Employees who are listed in both EMP and EMP2
select *
from EMP
intersect
select *
from EMP2
from EMP2
Each operator requires that both tables have the same data type for the columns to which the
operator is applied.
Grouping
Group by<column(s)>
This clause appears after the where clause and must refer to columns of tables listed in the from
clause, e.g. for each department, we want to retrieve the minimum and maximum salary
Select DEPTNO, min (SAL), max (SAL)
From EMP
Group by DEPTNO
Result
DEPTNO min (SAL) max(SAL)
10 1300 5000
20 800 3000
30 900 2850
Once groups have been formed, certain groups can be eliminated based on their properties, for
example if a group contains fewer than three rows. This type of condition is specified using the
having clause. As in the select clause, only <group_column(s)> and aggregations can be used in a
having clause
e.g. retrieve the minimum and maximum salary of clerks for each department having more than
three clerks
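One way to write this query is sketched below with sqlite3. The sample rows are invented so that department 10 has four clerks and department 20 has two; SQLite stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table EMP (EMPNO integer, JOB text, SAL integer, DEPTNO integer)")
conn.executemany(
    "insert into EMP values (?, ?, ?, ?)",
    [
        (1, "CLERK", 900, 10), (2, "CLERK", 1000, 10),
        (3, "CLERK", 1100, 10), (4, "CLERK", 1300, 10),
        (5, "MANAGER", 2450, 10),                        # filtered out by where
        (6, "CLERK", 800, 20), (7, "CLERK", 1100, 20),   # group too small
    ],
)

# where filters rows before grouping; having filters the groups afterwards.
result = conn.execute("""
    select DEPTNO, min(SAL), max(SAL)
    from EMP
    where JOB = 'CLERK'
    group by DEPTNO
    having count(*) > 3
""").fetchall()
print(result)
```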
· Comment on table
· Comment on column
[<column constraint>]
If more than one column should be added at one time, the respective add clauses need to be
separated by commas
A table constraint can be added to a table using
Alter table<table>
Add (<table constraints>)
When the size of strings that can be stored needs to be increased
Alter table<table>
Modify <column><datatype>
Default <value><column constraints>
It is now possible to rename a table, column and constraint
Deleting a table
A table and its rows can be deleted by using the command drop table <table>
View
The command to create a view (virtual table) has the form
create [or replace] view <view name> [(<column(s)>)] as <select statement>
[with check option [constraint <name>]];
or replace recreates the view if it already exists.
The following view contain the name, job title and annual salary of employees working in
department 20
A view can be used in the same way as a table, that is, rows can be retrieved from a view or rows
can be modified.
In Oracle SQL, no insert, update, or delete modifications are allowed on views that use one of
the following constructs in the view definition
· Join
· Aggregate functions such as sum, min, max, count etc
· Set-valued subqueries (in, any, all) or test for existence (exists)
· Group by clause or distinct clause
A view can be deleted using the command
drop view <view_name>
----------------
----------------
<Column n ><data type> [not null] [unique] [<column constraint>],
[<Table constraint(s)>]
);
create table EMP(
empno number(4) not null,
empName varchar2(30) not null,
job varchar2(10)
or
b) insert into PROJECT values (313, ‘DBS’, 7411, null, 1500.42, ‘10-OCT-94’, null);
If there are already some data in other table; these data can be used for insertions into a new table
Insert into <table>[(<column i,…,column j>)]<query>
Create table OLDEMP (
Where HIREDATE<’31-DEC-60’;
Updates
For modifying attribute values of (some) tuples in a table, we use the update statement;
Update<table> set
<column i> =<expression i>,….
Note
§ The new value assigned to <column i> must match the data type of the column.
§ An update statement without a where clause changes the respective attributes of
all tuples in the specified table.
e.g. a) the employee JONES is transferred to the department 20 as a manager and his salary is
increased by 1000
update EMP set
JOB = ‘MANAGER’, DEPTNO = 20, SAL = SAL + 1000
where ENAME = ‘JONES’;
SAL = SAL * 1.5
Where DEPTNO IN (10,30);
We can use query instead of expression:
e.g. all salesmen working in department 20 get the same salary as the manager who has the
lowest salary among all managers
update EMP set
SAL = (select min(SAL) from EMP
where JOB = ‘MANAGER’)
where JOB = ‘SALESMAN’ and DEPTNO = 20;
The query retrieves the minimum salary from all managers. This value is assigned to all
salesmen working in department 20.
Deletions
delete from <table>
[where <condition>];
Note: if the where clause is omitted, all tuples are deleted from the table.
e.g. delete all projects (tuples) that have been finished before the current date (system date):
delete from PROJECT
where PEND < sysdate;
Note: sysdate is a function in SQL that returns the system date. Another SQL function is user,
which returns the name of the user logged into the current Oracle session.
Commit and rollback
from
where
group by
order by
inner join merge rows
insert rows
update - rows
delete - rows
Constraints
When creating a table, two types of constraints can be specified:
Þ column constraints
Þ table constraints
Column constraints are associated with a single column. Table constraints are associated with
more than one column.
[constraint <name>] primary key | unique | not null
· A constraint can be named, so that it can be identified when an insertion violates it. Two
constraints seen so far are unique and not null
· The most important type of integrity constraints in a database are primary key constraints.
· A primary key constraint enables a unique identification of each tuple in the table. Based
on the primary key, the database system ensures that no duplicates appear in a table.
e.g
create table EMP(
EMPNO number(4) constraint pk_emp primary key
);
· Defines the attribute EMPNO as the primary key for the table.
· Each value for the attribute EMPNO must appear only once in the table EMP.
e.g. we want to create a table called PROJECT to store information about projects. For each
project, we want to store the
i. number of the project
ii. name of the project
iii. the employee number of the project’s manager
iv. the budget
v. the number of persons working on the project
vi. The start date
vii. The end date of the project
We have the following conditions
a. A project is identified by its project number
b. The name of a project must be unique
c. The manager and the budget must be defined
Pstart date
Pend date
);
If it is required that no two projects have the same start and end date, we have to add the table
constraint
constraint no_same_dates unique (PEND, PSTART)
This constraint has to be defined in the create table command after both columns PEND and PSTART
have been defined.
· not null
· primary key
· unique
Check constraint = to restrict possible attribute values
Foreign key constraint = to specify interdependences between relations
Check constraint
syntax
Constraint<name> check <condition>
Columns in a table often must have values that are within a certain range or that satisfy certain
conditions. If a check constraint is specified as a column constraint, the condition can only refer
to that column
e.g
o The name of employees must consist of upper case letters only
o The minimum salary of an employee is 500
o The numbers must range between 10 and 100
If a check constraint is specified as a table constraint, the condition can refer to all columns of
the table. Only simple conditions are allowed.
SAL number(5,2) constraint check_sal
check (SAL is not null and SAL >= 500)
The operators and, or, not are allowed in the condition.
e.g. at least two persons must participate in a project and the project’s start date must be before
the project’s end date.
create table PROJECT(
-----
PERSONS number(5) constraint check_person
check (PERSONS >= 2),
-----
constraint dates_ok check (PEND > PSTART),
In this table definition, check_person is a column constraint and dates_ok is a table constraint.
The database system automatically checks the specified conditions each time a database
modification is performed on this relation.
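The automatic checking can be observed with sqlite3. This is only a sketch: SQLite replaces Oracle, the table is reduced to four columns, dates are stored as ISO strings so that string comparison orders them correctly, and `rejected` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table PROJECT (
        PNO     integer primary key,
        PERSONS integer constraint check_person check (PERSONS >= 2),
        PSTART  text,
        PEND    text,
        constraint dates_ok check (PEND > PSTART)
    )
""")

def rejected(row):
    """Try to insert a row; return True if a constraint refuses it."""
    try:
        conn.execute("insert into PROJECT values (?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True

ok_row = rejected((1, 3, "1994-10-10", "1995-01-31"))
too_few_persons = rejected((2, 1, "1994-10-10", "1995-01-31"))
end_before_start = rejected((3, 5, "1995-01-31", "1994-10-10"))
print(ok_row, too_few_persons, end_before_start)
```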
References<table> [(<columns>)]
On delete cascade
ü A foreign key constraint or referential integrity constraint can be specified as a column
constraint or as a table constraint
ü This constraint specifies a column or a list of columns as a foreign key of the referencing
table
ü The referencing table is called the child table and the referenced table is called the parent
table
ü The clause foreign key has to be used in addition to the clause references if the foreign
key includes more than one column
ü The clause references defines which columns of the parent table are referenced.
e.g. each employee in the table EMP must work in a department that is contained in the table
DEPT
create table EMP(
EMPNO number(4) constraint pk_emp primary key,
-----
According to the above definition for the table EMP, an employee need not necessarily work in a
department.
e.g. each project manager must be an employee
Constraint <name>| primary key |unique <columns>
Cascade
To disable a primary key, one must disable all foreign key constraints that depend on this
primary key. The clause cascade automatically disables foreign key constraints that depend on
the (disabled) primary key.
Triggers
Þ A trigger is a procedure
Þ Such a procedure is associated with a table and is automatically called by the database
system whenever a certain modification (event) occurs on that table. Modifications on a
table may include:
i. Insert
ii. Delete
iii. Update operations
Structure of triggers
A trigger definition consists of the following components:
i. Trigger name
Create or replace trigger<trigger name>
ii. Trigger time point
Before/after
iii. Trigger event(s)
Insert or update of <columns>
Or delete on <table>
iv. Trigger type (optional)
For each row
A row trigger is specified using the clause for each row. A row trigger executes once for each row
after (before) the event. A statement trigger is executed once after (before) the event. If an
update affects 20 tuples, a row trigger is executed 20 times, once for each row. A statement
trigger is executed only once.
When combining the different types of triggers, there are twelve possible trigger configurations:
Update
delete
Only with a row trigger is it possible to access the attribute values of a tuple before and after
the modification.
v For an update trigger, the old attribute value can be accessed using :old.<column>
and the new attribute value can be accessed using :new.<column>
v For an insert trigger only :new.<column> can be used
v For a delete trigger only :old.<column> can be used
In these cases:
:new.<column> refers to the attribute value of <column> of the inserted tuple
:old.<column> refers to the attribute value of <column> of the deleted tuple
In a row trigger it is thus possible to specify comparisons between old and new attribute values
in the PL/SQL block, e.g.
if :old.SAL < :new.SAL then
<sequence of statements>
o A when clause can be used in combination with a for each row trigger.
The trigger body consists of a PL/SQL block. The statements rollback and commit cannot be used in
this block.
Three additional constructs exist: if inserting
if updating
if deleting
e.g.
create or replace trigger emp_check
after insert or delete or update on EMP for each row
begin
if inserting then
<PL/SQL block>
end if;
if updating then
<PL/SQL block>
end if;
if deleting then
<PL/SQL block>
end if;
end;
In the PL/SQL block of a trigger, an exception can be raised using the statement:
raise_application_error(<error number>, <message text>);
raise_application_error can refer to old/new values of modified rows:
raise_application_error(-20010, ‘salary increase from ’
|| to_char(:old.SAL) || ‘ to ’
|| to_char(:new.SAL) || ‘ is too high’)
raise_application_error(-20030,’employee id’||
We assume a table SALGRADE that stores the minimum (MINSAL) and maximum (MAXSAL)
salary for each job title (JOB).
Since the above condition can be checked for each employee individually, we define the
following row trigger:
Create or replace trigger check_salary_EMP
After insert or update of SAL, JOB on EMP for each row
from SALGRADE
where JOB = :new. JOB;
-- if the new salary has been decreased or does not lie within the salary range, raise an
-- exception.
if (:new.SAL < minsal or
:new.SAL > maxsal) then
raise_application_error(-20230, ‘salary range exceeded’);
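The idea of a row trigger that rejects out-of-range salaries can be tried in SQLite through Python. This is an analogue, not Oracle syntax: SQLite has no PL/SQL, so the check is written in the trigger's when clause and the error is raised with SQLite's raise(abort, ...) function. The trigger fires on insert only, and the sample salary range is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table SALGRADE (JOB text, MINSAL integer, MAXSAL integer);
create table EMP (EMPNO integer, JOB text, SAL integer);
insert into SALGRADE values ('CLERK', 700, 1200);

create trigger check_salary_EMP
before insert on EMP
for each row
when exists (select 1 from SALGRADE
             where SALGRADE.JOB = new.JOB
               and (new.SAL < MINSAL or new.SAL > MAXSAL))
begin
    select raise(abort, 'salary range exceeded');
end;
""")

def try_insert(row):
    """Return True if the row was stored, False if the trigger refused it."""
    try:
        conn.execute("insert into EMP values (?, ?, ?)", row)
        return True
    except sqlite3.DatabaseError:
        return False

accepted = try_insert((1, "CLERK", 1000))   # inside 700..1200
refused = try_insert((2, "CLERK", 5000))    # outside the range
stored = conn.execute("select count(*) from EMP").fetchone()[0]
print(accepted, refused, stored)
```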
UNIT ELEVEN
PL/SQL
PL/SQL (Procedural Language/SQL) allows users and designers to develop complex database
applications that require the usage of control structures and procedural elements such as
· Procedures
· Functions
· Modules
The basic construct in PL/SQL is a block. Statements in a PL/SQL block include
· SQL statements
· Control structures (loops)
· Condition statement (if-then-else)
· Exception handling
· Calls of other PL/SQL blocks
PL/SQL blocks that specify procedures and functions can be grouped into packages. A package
is similar to a module and has an interface and an implementation part.
PL/SQL offers a mechanism to process query results in a tuple-oriented way, that is, one tuple at
a time. For this, cursors are used. A cursor is a pointer to a query result and used to read attribute
values of selected tuples into variables. A cursor is used in combination with a loop construct
such that each tuple read by the cursor can be processed individually.
[declare
<constants>
<variables>
<cursors>
<user defined exceptions>]
begin
<PL/SQL statements>
[exception
<exception handling>]
end;
Declarations: constants, variables, cursors and exceptions must be declared in the declaration
section of the block.
The not null clause requires that the declared variable must always have a value different from null.
An expression is used to initialize a variable.
Constant means that once a value has been assigned to the variable, the value cannot be changed.
Declare
hire_date date;
emp_found Boolean;
….
Begin
…..
End;
₋ column
- many columns
a. EMP.EMPNO%TYPE: refers to the data type of the column EMPNO in the relation EMP.
b. DEPT%ROWTYPE: specifies a record suitable to store all attribute values of a complete
row from the table DEPT.
Such records are typically used in combination with a cursor. A field in a record can be accessed
using <record name>.<column name>
A cursor declaration specifies a set of tuples (as a query result) such that the tuples can be
processed in a tuple-oriented way (i.e one tuple at a time) using the fetch statement.
Examples of a parameter type are char, varchar2, number, date, Boolean, integer.
Parameters are used to assign values to the variables that are given in the select statement.
e.g we want to retrieve the following attribute values from the table EMP in a tuple-
oriented way: the job title and name of those employees who have been hired after a
given date, and who have a manager working in a given department.
Dno number) is
From EMP
And exit.
Before a declared cursor can be used in PL/SQL statements, the cursor must be opened and after
processing the selected tuples, the cursor must be closed. We use open statement to open a
cursor.
open <cursor name> [(<list of parameters>)];
fetch <cursor name> into <list of variables>;
· Fetch command assigns the selected attribute values of the current tuple to the list of
variables.
· After the fetch command, the cursor advances to the next tuple in the result set.
Note that the variables in the list must have the same datatypes as the selected values.
After all the tuples have been processed, the close command is used to disable the cursor.
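The open, fetch, close cycle has a close analogue in Python's sqlite3 module, sketched here. It is not PL/SQL: the cursor object, `fetchone`, and the `None` test play the roles of open, fetch, and the end-of-data check. The two sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table EMP (EMPNO integer, ENAME text, SAL integer)")
conn.executemany("insert into EMP values (?, ?, ?)",
                 [(7369, "SMITH", 800), (7499, "ALLEN", 1600)])

cur = conn.cursor()
cur.execute("select ENAME, SAL from EMP order by EMPNO")   # "open" the cursor

processed = []
while True:
    row = cur.fetchone()          # "fetch": read the current tuple and advance
    if row is None:               # no tuple left, so leave the loop
        break
    ename, sal = row              # assign attribute values to variables
    processed.append((ename, sal))

cur.close()                       # "close" the cursor
print(processed)
```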
· PL/SQL does not permit the use of create table statements.
There are two ways to assign a value to a variable:
1. Assignment
declare
counter integer := 0;
begin
counter := counter + 1;
end;
2. Select statement
select <columns>
into <variables>
from <tables>
where <condition>;
Here, the select statement must retrieve a single tuple. To process more than one tuple, use a cursor.
Instead of a list of single variables, a record can be given after the keyword into. Also in this
case, the select statement must retrieve at most one tuple.
Declare
Max_sal EMP.SAL%TYPE;
Begin
DEPTNO
Into employee_rec
From EMP
Where EMPNO = 5698;
Select max(SAL)
Into max_sal
From EMP;
End;
Loops
1. While loops
2. Two types of for loops
3. Continuous loops: continuous loops are used in combination with cursors.
Label name
<sequence of statements>
2. Label name
For <index> in [reverse] lower bound..upper bound loop
<sequence of statement>
Cursor for loops can be used to simplify the usage of a cursor;
Label name
For <record_name>
In <cursor name>
<sequence of statements>
<sequence of statements>
End loop;
end loop;
If – then – else
If <condition> then
<sequence of statements>
End if;
Except for data definition commands such as create table, other commands such as delete, update
and insert can be used in PL/SQL blocks.
If update or delete statements are used in combination with a cursor, these commands can be
restricted to the currently fetched tuple. In these cases the clause where current of <cursor name>
is added.
The example below illustrates how a cursor is used together with a continuous loop.
declare
cursor emp_cur is select * from EMP;
emp_rec EMP%ROWTYPE;
emp_sal EMP.SAL%TYPE;
begin
open emp_cur;
loop
fetch emp_cur into emp_rec;
exit when emp_cur%NOTFOUND;
emp_sal := emp_rec.sal;
<sequence of statements>
end loop;
close emp_cur;
end;
Exit <block label>
When <condition>
· Using exit without a block label causes the completion of the loop that contains the exit
statement.
· A condition can be a simple comparison of values.
· In most cases, the condition refers to a cursor.
· %NOTFOUND is a predicate that evaluates to false if the most recent fetch command has
read a tuple.
· The value of <cursor name>%NOTFOUND is null before the first tuple is fetched.
· The predicate evaluates to true if the most recent fetch failed to return a tuple and false
otherwise.
%FOUND is the logical opposite of %NOTFOUND.
EX2: the following PL/SQL block performs the following modification: all employees having
KING as their manager get a 5% salary increase.
Declare
Manager EMP.MGR%TYPE;
Select SAL
From EMP
Begin
Select EMPNO
Into manager
From EMP
Update EMP
End loop;
Commit;
End;
EXCEPTION HANDLING
CURSOR_ALREADY_OPEN
INVALID_CURSOR
NO_DATA_FOUND
TOO_MANY_ROWS
ZERO_DIVIDE
3. raise_application_errore.g
declare
emp_sal EMP.SAL%TYPE;
emp_no EMP.EMPNO%TYPE;
too_high_sal exception;
begin
from EMP
else
update EMP
set SQL…..
end if;
exception
then rollback;
when too_high_sal
commit;
end;
then raise_application_error (-20000, ‘salary increase for employee with id ’ || to_char
(emp_no) || ‘ is too high’);
user-defined exception: which must be declared by the user in the declaration part of a
block where the exception is used/implemented.
o After the keyword exception at the end of a block, user defined exception
handling routines are implemented.
An implementation has the pattern
Declare
Emp_sal EMP.SAL%TYPE;
Emp_no EMP.EMPNO%TYPE;
Too_high_sal exception;
Begin
Select EMPNO
SAL into emp_no,emp_sal
From EMP
Raise too_high_sal
Else
Update EMP
Set sal….
End if;
Exception
Then rollback;
Values (emp_no);
Commit;
End;
If a PL/SQL program is executed from the SQL*Plus shell, exception handling routines may
contain statements that display error or warning messages on the screen.
For this, raise_application_error can be used.
The error number is a negative integer defined by the user and must range between -20000 and
-20999.
|| = the concatenation operator can be used to concatenate single strings to one string.
In order to display numeric variables, these variables must be converted to strings using the
function to_char, e.g.
raise_application_error (-20010, ‘salary increase for employee with id ’ || to_char (emp_no) ||
‘ is too high’);
ETHERNET
SQL statements
Control statements(if-then-else)
Exception handling
PL/SQL blocks that specify procedures and functions can be grouped into packages
Input/output routines
File handling
Job scheduling
One important feature of PL/SQL is that it offers a mechanism to process query results in a
tuple-oriented way, i.e. one tuple at a time. For this, cursors are used.
- A cursor is a pointer to a query result.
- A cursor is used to read attribute values of selected tuples into variables.
- A cursor is used in combination with a loop construct such that each tuple read by the cursor
can be processed individually.
[declare
<constants>
<variables>
<cursors>]
begin
<PL/SQL statements>
[exception
<exception handling>]
end;
Declaration
- Constants, variables, cursors and exceptions used in a PL/SQL block must be declared in the
declare section of that block, e.g.
Variables and constant can be declared as follows:
[:= <expression> ];
· Char(40)
O=38
D= - 84 to +127
Eg. Number(8)
Number(5,2)
Boolean data may only be true, false or null. The “not null” clause requires that the declared
variable must always have a value different from null
declare
emp_found boolean;
……….
begin …. end;
Instead of specifying a data type, one can also refer to the data type of a table column, a
so-called anchored declaration, e.g.
EMP.EMPNO%TYPE refers to the data type of the column EMPNO in the relation EMP.
DEPT%ROWTYPE specifies a record suitable to store all attribute values of a
complete row from the table DEPT.
CURSOR
- A cursor declaration specifies a set of tuples.
- The tuples can be processed in a tuple-oriented way, i.e. one tuple at a time, using the fetch
statement.
- A cursor declaration has the form:
cursor <cursor name> [(<list of parameters>)] is
   <select statement>;
E.g. we want to retrieve the following attribute values from the table EMP in a tuple-oriented
way: the job title and name of those employees who have been hired after a given date
and who have a manager working in a given department.
and DEPTNO = dno);
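A cursor declaration matching this description could be sketched as follows. The cursor name employee_cur and the parameter start_date are hypothetical (only dno appears in the text), and the columns JOB, ENAME, HIREDATE, MGR, EMPNO and DEPTNO are assumed to exist in EMP.

```plsql
declare
   -- employee_cur and start_date are hypothetical names; dno is the parameter named in the text.
   cursor employee_cur (start_date date, dno number) is
      select JOB, ENAME
      from EMP E
      where HIREDATE > start_date
      and exists (select *
                  from EMP
                  where E.MGR = EMPNO
                  and DEPTNO = dno);
begin
   null;  -- the cursor is only declared here; opening and fetching are described next
end;
/
```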
- Before a cursor can be used, it must be opened using the open statement:
open <cursor name> [(<list of parameters>)];
The associated select statement is then processed and the cursor references the first selected
tuple.
- Selected tuples can then be processed one tuple at a time using the fetch command:
fetch <cursor name> into <list of variables>;
- The fetch command assigns the selected attribute values of the current tuple to the list of
variables.
- After the fetch command, the cursor advances to the next tuple in the result set.
- Note that the variables in the list must have the same data types as the selected values.
- After all tuples have been processed, the close command is used to disable the cursor:
close <cursor name>;
declare
   cursor emp_cur is
      select * from EMP;
   emp_rec EMP%ROWTYPE;
   emp_sal EMP.SAL%TYPE;
begin
   open emp_cur;
   loop
      fetch emp_cur into emp_rec;
      exit when emp_cur%NOTFOUND;
      emp_sal := emp_rec.sal;
   end loop;
   close emp_cur;
end;
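The open, fetch, exit and close steps of the loop above can also be left to PL/SQL by using a cursor for loop. The sketch below assumes the columns ENAME and SAL in EMP; the variable total_sal is hypothetical and only serves to give the loop something to do.

```plsql
declare
   cursor emp_cur is
      select ENAME, SAL from EMP;
   total_sal number := 0;
begin
   -- emp_rec is declared implicitly; open, fetch and close happen automatically.
   for emp_rec in emp_cur loop
      total_sal := total_sal + nvl(emp_rec.SAL, 0);
   end loop;
   dbms_output.put_line('Total salary: ' || to_char(total_sal));
end;
/
```

The explicit form is still useful when the fetch must be controlled by hand, for example to stop after a fixed number of tuples.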
Language elements
(1) variable assignments
(2) control structures
- loops (while and for)
- if-then-else
1. Variable assignments
declare
   counter integer := 0;
begin
   counter := counter + 1;
end;
2. Control structures
while <condition> loop
   <sequence of statements>
end loop;
for <index> in [reverse] <lower bound>..<upper bound> loop
   <sequence of statements>
end loop;
reverse = causes the iteration to proceed downwards from the higher bound to the
lower bound.
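Both loop forms can be tried in one small block. This is a sketch with hypothetical variables counter and total; the bounds are arbitrary.

```plsql
declare
   counter integer := 0;
   total   integer := 0;
begin
   -- while loop: runs as long as the condition holds
   while counter < 5 loop
      counter := counter + 1;
   end loop;

   -- for loop with reverse: i takes the values 3, 2, 1
   for i in reverse 1..3 loop
      total := total + i;
   end loop;

   dbms_output.put_line('counter = ' || counter || ', total = ' || total);
end;
/
```

The index i of the for loop is declared implicitly and is only visible inside the loop.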
if <condition> then
   <sequence of statements>
else
   <sequence of statements>
end if;
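A small sketch of the if-then-else construct. The salary value and the bracket names are hypothetical; elsif is the PL/SQL keyword for chaining a further condition.

```plsql
declare
   sal     number(7,2) := 2800;   -- illustrative value
   bracket varchar2(10);
begin
   if sal < 1000 then
      bracket := 'low';
   elsif sal < 3000 then
      bracket := 'medium';
   else
      bracket := 'high';
   end if;
   dbms_output.put_line(bracket);
end;
/
```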
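The following PL/SQL block performs the following modification: all employees having
'KING' as their manager get a 5% salary increase.
declare
   manager EMP.MGR%TYPE;
   cursor emp_cur (mgr_no number) is
      select SAL
      from EMP
      where MGR = mgr_no
      for update of SAL;
begin
   select EMPNO into manager
   from EMP
   where ENAME = 'KING';
   for emp_rec in emp_cur(manager) loop
      update EMP set SAL = emp_rec.SAL * 1.05
      where current of emp_cur;
   end loop;
   commit;
end;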
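Procedure syntax
create [or replace] procedure <procedure name> [(<list of parameters>)] is
   <declarations>
begin
   <sequence of statements>
[exception
   <exception handling routines>]
end [<procedure name>];
Function syntax
create [or replace] function <function name> [(<list of parameters>)]
return <data type> is
followed by the same declaration, begin, exception and end parts as in a procedure.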
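This procedure is used to increase the salary of all employees who work in the department given
by the procedure's parameter. The percentage of the salary increase is given by a parameter too.
create or replace procedure raise_salary (dno number, percentage number) is
   cursor emp_cur (dept_no number) is
      select SAL
      from EMP
      where DEPTNO = dept_no
      for update of SAL;
   empsal number(8);
begin
   open emp_cur(dno);
   loop
      fetch emp_cur into empsal;
      exit when emp_cur%NOTFOUND;
      update EMP set SAL = empsal * ((100 + percentage) / 100)
      where current of emp_cur;
   end loop;
   close emp_cur;
   commit;
end raise_salary;
This procedure can be called from the SQL*Plus shell using the execute command.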
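A call might look as follows. This is a sketch under the assumption that raise_salary takes the department number as its first parameter and the percentage as its second; the values 10 and 3 are illustrative only.

```plsql
-- SQL*Plus: raise the salaries in department 10 by 3 percent
-- (10 and 3 are illustrative values; the parameter order is an assumption)
execute raise_salary(10, 3);

-- The same call written as an anonymous PL/SQL block
begin
   raise_salary(10, 3);
end;
/
```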
Functions have the same structure as procedures. The only difference is that a function returns a
value whose data type (unconstrained) must be specified.
create or replace function get_dept_salary (dno number)
return number is
   all_sal number;
begin
   all_sal := 0;
   for emp_sal in
      (select SAL
       from EMP
       where DEPTNO = dno and SAL is not null) loop
      all_sal := all_sal + emp_sal.SAL;
   end loop;
   return all_sal;
end get_dept_salary;
In order to call a function from the SQL*Plus shell, it is necessary to first define a variable to
which the return value can be assigned.
E.g.
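A minimal sketch of such a call in SQL*Plus, assuming that get_dept_salary takes a department number as its only parameter; the bind variable name salary and the department number 20 are hypothetical.

```plsql
-- Define a bind variable that receives the return value
variable salary number
-- Call the function and assign its result to the bind variable
execute :salary := get_dept_salary(20);
-- Display the value of the bind variable
print salary
```

The colon in :salary marks a SQL*Plus bind variable, which distinguishes it from a variable declared inside a PL/SQL block.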
We use an after trigger because the inserted or updated row is not changed within the PL/SQL
block (e.g. in case of a constraint violation, it would be possible to restore the old attribute
values).
Note that modifications on the table SALGRADE can also cause a constraint violation.
In order to maintain the complete condition, we define the following trigger on the table
SALGRADE. In case of a violation by an update modification, however, we do not raise an
exception, but restore the old attribute values.
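create or replace trigger check_salary_SALGRADE
before update or delete on SALGRADE
for each row
declare
   job_emps number(3) := 0;
begin
   if deleting then -- does there exist an employee having the deleted job?
      select count(*)
      into job_emps
      from EMP
      where JOB = :old.JOB;
      if job_emps != 0 then
         raise_application_error(-20240, 'There still exist employees with the job ' || :old.JOB);
      end if;
   end if;
   if updating then -- are there employees whose salary does not lie within the
                    -- modified salary range?
      select count(*)
      into job_emps
      from EMP
      where JOB = :new.JOB
      and SAL not between :new.MINSAL and :new.MAXSAL;
      if job_emps != 0 then -- restore the old salary range
         :new.MINSAL := :old.MINSAL;
         :new.MAXSAL := :old.MAXSAL;
      end if;
   end if;
end;
In this case a before trigger must be used in order to restore the old attribute values of an
updated row.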
Suppose we furthermore have a column BUDGET in the table DEPT that is used to store the
budget available for each department.
Assume the integrity constraint requires that the total of all salaries in a department must not
exceed the department budget.
Critical operations on the relation EMP are insertions into EMP and updates on the attributes
SAL or DEPTNO.
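create or replace trigger check_budget_EMP
after insert or update of SAL, DEPTNO on EMP
declare
   cursor DEPT_CURR is
      select DEPTNO, BUDGET
      from DEPT;
   DNO      DEPT.DEPTNO%TYPE;
   ALLSAL   DEPT.BUDGET%TYPE;
   DEPT_SAL number;
begin
   open DEPT_CURR;
   loop
      fetch DEPT_CURR into DNO, ALLSAL;
      exit when DEPT_CURR%NOTFOUND;
      select sum(SAL) into DEPT_SAL
      from EMP
      where DEPTNO = DNO;
      if DEPT_SAL > ALLSAL then
         raise_application_error(-20325, 'Total of salaries in the department ' ||
            to_char(DNO) || ' exceeds budget');
      end if;
   end loop;
   close DEPT_CURR;
end;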