Database Lecture Notes
BY MR. KASHALE
2023
Syllabus Content
An overview of database systems
Data Models and Implications
Database Architecture and its
Environment
Database Functions and Languages
Conceptual Database Modelling
Relational Database Systems
Relational Database System / Relational
Database Languages
Structured Query Languages (SQL)
Functional Dependencies
Normalizations for Relational DBS
Procedural DBL and PL SQL
Recommended Books
Data Management: Databases and
Organisations. Watson, Richard T. John Wiley
and Sons: 5th Edition 2005
Data Mining: Concepts and Techniques. Han,
Jiawei; Kamber, Micheline. Elsevier: 2nd
Edition 2006
Data Modelling Essentials. Simsion, Graeme;
Witt, Graham. Morgan Kaufmann Publishers:
3rd Edition 2005
Database Management Systems.
Ramakrishnan, Raghu; Gehrke, Johannes.
McGraw-Hill: 3rd Edition 2002
An overview of database
systems
The aim of this lecture is to introduce you to some
of the basic DB terminology that you need to know.
The topics covered are listed below:
1. Data and information;
2. Traditional file-based processing versus database
processing;
3. Evolvement of Database systems;
4. Database Applications and users.
Data and information
The term data refers to raw facts and figures,
such as orders and payments, which are
processed into information.
Data is unprocessed facts or figures.
Information on the other hand refers to the
implicit association between data.
When the data is used in context and has a
meaning it then becomes information.
Data represent the values physically recorded in
the Database; e.g. 10023.
Information, though, refers to the meaning of
those values as understood by some users; e.g.
Mohammed’s Employee number is 10023.
Data is the technological term and information is
the business term.
A common misconception is that software is also
data. Software is however “executed”, or “run”,
by the computer whilst data is "processed."
Data and Information cont..
Electronic storage of data or information can take many
forms, such as files, databases, text documents,
images and digitally encoded voice and video.
Data is a basic resource of the organisation and must be
collectively organized and managed to support all business
functions.
Traditional file-based processing
The file-based system is the predecessor of Database
systems.
It is a collection of applications (data + programs)
that perform specific tasks to support some business
functions.
In the earlier days of computing, a simple approach
was to imitate the way in which paper files had been
organized.
Each department kept the records that it needed for
its own applications.
The Accounting department kept accounts payable
and receivable, sales and marketing kept information
on customers and products, and so forth.
These files were separate and not cross-indexed; such
"stand-alone" files are sometimes called flat files.
Traditional file-based processing
cont..
Programs were specially written for a specific task to
process a batch of records from these flat files.
Usually, it was the case that the programs used for
one department's records would not understand the
records designed for another department's
applications.
Each set of "application programs" owned its own set
of files, and this approach is known as file processing.
Such file processing approaches resulted in some
serious problems for the business organisation.
Since data was organised according to a particular
application, and each application held its own data in
"flat" (meaning unrelated) files, a lot
of duplicated data was being maintained.
Traditional file-based processing
cont..
For example, a customer might have had a "sales"
record held by the Marketing Department and a
"receivables" record held by Accounting.
Each might have a separate record for the customer,
which included the customer's mailing address.
Of course, one drawback to file based processing was
wasted file space, and storage was, in the early days
of computing, very expensive.
This problem of duplicated data values is called data
redundancy.
A worse problem, however, was that data values were
inevitably often inconsistent between the files, which
led to confusion and inaccuracies.
Traditional file-based processing
cont..
For example, if a customer’s address changed, the
sales department might alter their files to reflect this
change.
However, the mechanism that should apply the same change to
the accounting department's address records might be
missing, or might fail.
The fact that each program was linked to its own
particular file organization caused more headaches.
When new programs were written, data often had to
be restructured and, conversely, when data was
restructured, programs had to be revised.
This problem of data dependence made program
maintenance expensive and the development of new
programs much more difficult.
Typically, with a file-based system, data for various
applications will be stored in a collection of flat files.
This approach has some disadvantages including the
following:
Traditional file-based processing
cont..
1. Difficult to handle complex data
2. Low data quality
3. Redundancy and inconsistency
4. No central management
5. Difficult to maintain and share in multi-user
environments
Traditional file-based processing system example (figure not reproduced).
Evolvement of Database systems
There are two key points that need addressing in
order to solve the problems listed above, or similar
problems.
1. First, programs and data need to be separated and
stored independently.
2. Second, access control and manipulation of data need
to be developed beyond those imposed by the
application programs.
This new approach should support management of vast
amounts of data, including efficient access, some degree
of independence from the application programs,
quick application development, support for concurrent
access, recovery from system failure, security of data,
etc.
Such a system, developed to support these features is
called a Database Management System (DBMS).
The DBMS is therefore a collection of programs that
manages a large amount of data, which we call a Database.
Evolvement of Database systems
cont…
The database approach rests upon the concept of the
modern database which is formally defined as an
integrated, self-describing collection of related records
organized into files which are often referred to as
database tables.
Application programs do not create or maintain these
database tables themselves. This is the work of a DBMS.
The diagram below shows the Database System
Approach:
The DB, hence, is a collection of related data.
The data must be related; otherwise there is no context in
the data and the data becomes useless.
Data becomes useful only when it is in context: it
then becomes information.
The definitions of all data in the DB are called Meta-data
(data about data) and are stored in a Data Dictionary
(DD).
The term Information Resource Dictionary System (IRDS) is
also used for such a dictionary and is backed by a
published standard.
In the database approach, company data resources are
held in a common collection of data files (often referred
to as data tables) which are organized so that they can
be easily accessed by all application programs which
have need for that data.
The database approach reduces maintenance of
application programs and data, and speeds up
development of new applications.
The database partitions the data into tables that are
each designed to hold one very narrow type of data, for
example a person's name and address.
These files (tables) are then cross-referenced with
each other in some way.
For example the Marketing Department may have an
application which reports the customer's sales, while the
Accounting department has an application concerning a
customer's payment record.
Evolvement of Database systems
cont…
Each of these applications would access the same table
that provided the customer's address.
Sales and accounting information would be kept in
separate database tables.
Database processing involves assembling the data
contained in the tables into a form that is suitable for
the end-user's particular application needs.
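The shared-table idea above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names Customer, Sale and Payment, their columns and the sample rows are assumptions made up for the example, not part of the lecture's schema.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustNo INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
    CREATE TABLE Sale    (SaleNo INTEGER PRIMARY KEY,
                          CustNo INTEGER REFERENCES Customer(CustNo), Amount REAL);
    CREATE TABLE Payment (PayNo INTEGER PRIMARY KEY,
                          CustNo INTEGER REFERENCES Customer(CustNo), Paid REAL);
    INSERT INTO Customer VALUES (1, 'A. Smith', '12 High Street');
    INSERT INTO Sale VALUES (100, 1, 250.0);
    INSERT INTO Payment VALUES (900, 1, 100.0);
""")

# The address is stored once, so one update is seen by every application.
conn.execute("UPDATE Customer SET Address = '7 New Road' WHERE CustNo = 1")

# "Marketing" application: customer sales.
marketing_report = conn.execute(
    "SELECT c.Name, c.Address, s.Amount "
    "FROM Customer c JOIN Sale s ON s.CustNo = c.CustNo"
).fetchall()

# "Accounting" application: customer payments.
accounting_report = conn.execute(
    "SELECT c.Name, c.Address, p.Paid "
    "FROM Customer c JOIN Payment p ON p.CustNo = c.CustNo"
).fetchall()

print(marketing_report)
print(accounting_report)
```

Both reports pick up the new address because neither application keeps its own copy of it.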
Database Applications and
users
There are various classifications of Database users.
One classification groups them according to whether or not
they can be seen on the site of the Database system.
The first group represents Actors on Scene. These include
persons whose jobs involve the day-to-day use of a large
database.
Examples of these are the Data and Database
administrators, Database Designers, End Users: Casual,
Naive, Sophisticated and Stand-alone users, System
Analysts and Application Programmers.
The second category represents Workers behind the
Scene.
These include DBMS Designers and Implementers, Tool
Developers and Operators and Maintenance Personnel.
Database Applications and
users cont..
The Data administrator (DA) is the chief person who
oversees and manages all data resources within the
organization.
This includes Database planning, development and
maintenance of standards, policies and procedures, and
conceptual/logical Database design.
The Database administrator (DBA) is responsible for the
physical actualization of the Database.
The role of the DBA is more technical compared with that
of the DA. The DBA requires deep knowledge of the
target DBMS and its environment.
For example, the DBA is responsible for authorizing
access to the Database, for coordinating and monitoring
its use, etc.
Other Database users are Database designers. They are
of two types, logical and physical Database designers.
Database Applications and
users cont..
Logical designers are concerned with identifying the data
elements and they must be aware of the business rules.
The logical Database designer usually gets involved in
producing the conceptual Database model, which is
independent of any implementation details
related to the target DBMS.
The physical Database designer takes the logical data
model (produced by the logical Database designer) and
translates it into a physical implementation (a set of
tables and integrity constraints in a relational model).
The physical Database designer selects suitable storage
structures and access methods, and designs any necessary
security measures for the data.
The physical Database designer must be aware of all
alternative implementations of the same application and
choose the most cost-effective strategy.
The Conceptual and logical Database design is concerned
with the “what” while physical Database design is
concerned with the “how”.
Database Applications and
users cont..
End users, who are another type of DB users, have a set
of requirements.
These requirements are translated into specifications for
canned transactions.
An application programmer must implement these
specifications.
The application programmer tests, debugs, documents,
and maintains all developed transactions.
End users are of different types, depending on their
knowledge and usage of the DBMS.
These include Naïve users who are typically unaware of
the DBMS. These users access the Database in the
simplest possible way through application programs.
They do not need any knowledge of the DBMS.
The other type of end users is called sophisticated users.
They are familiar with the structure of the Database and
the facilities provided by the DBMS. They may use a
stand-alone Database programming language such as SQL.
Lecture Summary
A successful Information System depends upon the
database approach.
Older file processing approaches had some serious
problems.
Data was organized according to a particular application,
and each application held its own data in flat
(meaning unrelated) files.
Data values were often inconsistent between the files,
which led to confusion and inaccuracies.
When programs were written, data often had to be
restructured and, conversely, when data was
restructured, programs had to be revised.
The database approach reduces maintenance of
application programs and data.
A database is an integrated, self-describing collection of
related records organized into files, which are often
referred to as database tables.
Lecture Summary cont..
This approach facilitates the development of new
applications and encourages data sharing among
organizational units.
There are a number of different types of users. The most
important one is the DBA, who is ultimately responsible
for the maintenance and running of the Database.
Database Models and Implications
Aims and Objectives
The aim of this lecture is to continue from
the last lecture on the introduction to
Databases.
The lecture will cover the following topics
briefly.
1. Data Models;
2. Characteristics of the Database approach;
3. Advantages and Disadvantages of
Databases
4. When not to use a DBMS;
Data Models
A model is a set of concepts used to represent one or
more aspects of a Universe of Discourse (UoD), that is, a
domain or area of interest; it is an abstraction of the
important things in the domain.
A model is a highly abstract perception of the domain. The
modeler (analyst or database administrator) plays the
role of a philosopher-king in determining what
knowledge to represent, how to organize and express it,
and what constraints to impose to keep it a consistent,
faithful model of the outside world.
To do a good job the modeler/analyst must be sensitive
to semantic issues and have a good working knowledge
of conceptual structures.
Building a good data model depends on the process of
data analysis. The objectives of data analysis are
twofold:
Data Models cont..
1. Investigate the "Natural Structure" of the information to
be stored, i.e. decide exactly what the relationships
are between individual data items.
2. Produce a representation (or model) of that structure,
which will be suitable for easy conversion into a
database structure description (schema).
There are various ways in which models are classified.
The main criterion normally used to classify DBMSs is the
data model on which the DBMS is based.
A Data model is a set of concepts that can be used to
describe the structure of a Database.
There are various categorizations for data models. These
are Conceptual, Implementation and Physical models.
The most popular representational model is the
Relational model, which will be covered in depth in the
coming lectures while the object-oriented approach will
be covered at a later stage during the course.
Data Models cont..
In the next lecture we will describe what is known as the
three-level architecture for a Database system.
Each of these levels focuses on a specific aspect of
Database design. These are:
Conceptual data model (High-Level):
This provides concepts that are close to the way many
users perceive data (e.g. ERD). It is a set-based data
model and is DBMS independent.
Representational data model (Implementation):
This provides concepts that are understood by end users
but that are not too far removed from the way data is
organized within the computer.
This model hides some details of data storage but can be
implemented on a computer system in a direct way.
Physical model (low-level):
This provides concepts that describe details of how data
is stored in the computer (e.g. Access paths & data
structures). These concepts are generally meant for
computer specialists, not for typical users.
Characteristics of the
Database approach
Self-describing nature of the DBS:
in addition to the application data, the Database system
holds information about the data (meta-data).
Meta-data describes the structure of the primary Database.
Data Abstraction & Program-Data independence:
since data and programs are stored separately and
independently, changes to the structure of the data do
not necessitate changes to all programs that access that
data.
This independence is called program-data independence.
The DBMS provides users with a conceptual
representation of the data that does not include details
of how the data is stored. This facility is called data
abstraction.
Support for multiple views of the data:
there are many users in a Database system, each of
whom may require a different perspective or view of the
Database.
Characteristics of the
Database approach cont..
A view may present a subset of the Database, or it may
contain virtual data that is derived from existing data.
For example, one may be interested in looking at the
name and salary of each employee in the Database while
another user may be interested in looking at the name
and age of each employee.
The age could be calculated from the stored date of birth.
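A minimal sketch of the two views described above, using Python's built-in sqlite3 module. The Employee table, its rows, the view names PayView and AgeView, and the fixed reference date 2023-01-01 are all assumptions chosen for the example so that the derived ages are reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base table: the age itself is never stored.
    CREATE TABLE Employee (Name TEXT, Salary REAL, DOB TEXT);
    INSERT INTO Employee VALUES ('Mary', 9000, '1970-02-19');
    INSERT INTO Employee VALUES ('John', 30000, '1945-10-01');

    -- View 1: a subset of the stored columns.
    CREATE VIEW PayView AS SELECT Name, Salary FROM Employee;

    -- View 2: virtual data derived from the stored date of birth.
    -- Year difference, minus 1 if the birthday has not yet occurred
    -- by the assumed reference date.
    CREATE VIEW AgeView AS
        SELECT Name,
               CAST(strftime('%Y', '2023-01-01') AS INTEGER)
                 - CAST(strftime('%Y', DOB) AS INTEGER)
                 - (strftime('%m-%d', '2023-01-01') < strftime('%m-%d', DOB)) AS Age
        FROM Employee;
""")

pay_rows = conn.execute("SELECT * FROM PayView ORDER BY Name").fetchall()
age_rows = conn.execute("SELECT * FROM AgeView ORDER BY Name").fetchall()
print(pay_rows)
print(age_rows)
```

Each user queries only the view that matches their perspective; the base table is unchanged.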
Sharing of data and Multi-user transaction processing:
one of the main aims of a Database system is to provide
access to multiple users at the same time.
Data that is used in multiple applications is bound to be
used concurrently by more than one user.
Hence, a mechanism to control the concurrency is
required so that data stays accurate and no updates will
be lost.
Advantages of Databases
Controlling redundancy:
Eliminate the unnecessary redundancy of data by
integrating various files so that several copies of the
same data are not stored.
Data consistency:
Consistent data means that all copies of a data item have
the same values. A database reduces the risk of
inconsistency since data is not duplicated unless it
is necessary and well controlled.
Multipurpose use of data:
Since the Database is a collection of related data, one
can obtain more information from the same data via the
existing links between various data items.
Data Sharing:
Files are owned by the department or people who use
them, while a Database belongs to the entire
organization and can be shared by all authorized users.
Advantages of Databases
cont..
Data integrity & security:
Data integrity refers to the validity and consistency of the
data. Security is the protection of the Database from
unauthorized users.
The data is more vulnerable to unauthorized access if it is
integrated. The DBA can enforce Database security at
different levels for different users.
Enforcement of standards:
Integration allows the DBA to define and enforce
standards. The DBA uses the DD to specify data formats,
naming conventions, update procedures, access rules,
etc.
Improved data accessibility and responsiveness:
Since the data is integrated, it is directly accessible to
end users with faster responses and more services. An
end user can generate a report or write a simple query
immediately at their terminals.
Advantages of Databases
cont..
Increased concurrency:
In a multi-user environment, there is a high chance of
access interference between various users. Most DBMSs
manage concurrency and ensure that there is no loss of
information.
Disadvantages of Databases
Complexity:
The functionality expected of a good DBMS makes it an
extremely complex piece of software and hence difficult
to use. Database designers,
developers, administrators, and sophisticated end users
must understand most DBMS functionality to take full
advantage of it.
Size:
Due to its complexity and breadth of functionality, a
DBMS becomes an extremely large piece of software,
occupying many gigabytes of disk space and requiring
substantial amounts of memory to run efficiently.
Cost of DBMS:
A personal or single-user DBMS may be relatively cheap
to buy, while a large mainframe multi-user DBMS
servicing hundreds of users can be extremely expensive.
Disadvantages of Databases
cont..
Additional hardware costs:
You may need to buy additional storage space or
expand the RAM for better performance.
Performance:
Since the DBMS is written to be more general, in order to
cater for many applications rather than just one, the
response time may not be acceptable for some
applications.
Higher impact of failure:
As a result of integration, all users rely on the same
resources. Hence, a failure of any component of a DBMS
has a large impact on all its users.
When not to use DBMS
This could be deduced from the list of disadvantages
above. It may be more desirable to use a traditional file
system under the following circumstances:
.
The join operation cont..
Often in joining two relations, there is no matching value
in the join columns.
To display rows in the result that do not have matching
values in the join column, we use another type of join,
the outer join.
The outer join "pads with nulls" the tuples that have no
counterpart. There are three variants:
– "left": only tuples of the first operand are padded
– "right": only tuples of the second operand are padded
– "full": tuples of both operands are padded
The left outer join is a join in which the
tuples from the first relation that do not have
matching values in the common columns of the second
relation are also
included in the result relation; the right and full outer
joins are defined analogously. Figure 4 shows the three
different variations of outer joins.
The Set Operations of
Relational Algebra
Relations are sets, so we can apply set operators.
However, we want the results to be relations (that is,
homogeneous sets of tuples).
Therefore, it is meaningful to apply union, intersection,
and difference only to pairs of relations defined over the
same attributes.
The two relations must be union compatible. This means
that they must be of the same degree, n say, and the jth
attribute of one must be drawn from the same domain as
the jth attribute of the other (1≤j≤n).
The Union operation
The union of two (union-compatible) relations A and B, A
∪ B is the set of all tuples t belonging to either A or B (or
both).
For example, from figure 1, let A be the set of supplier
tuples for suppliers in London and
B the set of supplier tuples for suppliers who supply part
P1. Then the operation A ∪ B is the set of supplier
tuples for suppliers who either are located in London or
supply part P1 (or both). The union table would be:
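The union can also be tried out in SQL. The sketch below uses Python's built-in sqlite3 module with a small supplier table S and a shipment table SP whose rows are assumptions made up for this example (they are not the contents of figure 1). Both operands return the same three columns, so they are union compatible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical suppliers and shipments.
    CREATE TABLE S  (Sno TEXT PRIMARY KEY, Sname TEXT, City TEXT);
    CREATE TABLE SP (Sno TEXT, Pno TEXT);
    INSERT INTO S VALUES ('S1', 'Smith', 'London'), ('S2', 'Jones', 'Paris'),
                         ('S3', 'Blake', 'Paris'),  ('S4', 'Clark', 'London');
    INSERT INTO SP VALUES ('S1', 'P1'), ('S2', 'P1'), ('S3', 'P2');
""")

# A: suppliers located in London.  B: suppliers who supply part P1.
# UNION removes duplicates, so S1 (in both A and B) appears once.
union_rows = conn.execute("""
    SELECT Sno, Sname, City FROM S WHERE City = 'London'
    UNION
    SELECT S.Sno, S.Sname, S.City
    FROM S JOIN SP ON SP.Sno = S.Sno
    WHERE SP.Pno = 'P1'
    ORDER BY Sno
""").fetchall()

print(union_rows)
```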
.
The Relational calculus
cont..
The Domain-oriented relational calculus
Uses variables that take values from domains instead
of tuples of relations.
We often test for a membership condition, to
determine whether values belong to a relation.
The expression R(x, y ) evaluates to true if and only if
there is a tuple in relation R with values x, y for its
two attributes.
Let us assume the existence of domain calculus
range variables as follows:
The variable SX ranges over the domain S#, PX over
P#, and CITYX, CITYY, … over CITY.
For example, using Figure 1, get
supplier-number/part-number pairs such that the supplier and
part are not colocated.
The domain calculus expression would be:
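What such a query computes can be traced by hand. The sketch below is a Python illustration, not the calculus notation itself: it assumes two small hypothetical relations S(S#, CITY) and P(P#, CITY) and keeps every pair (SX, PX) for which values CITYX and CITYY exist such that (SX, CITYX) is in S, (PX, CITYY) is in P, and CITYX differs from CITYY.

```python
# Hypothetical relations, each a set of (number, city) tuples.
S = {("S1", "London"), ("S2", "Paris")}   # (S#, CITY)
P = {("P1", "London"), ("P2", "Paris")}   # (P#, CITY)

# The variables sx, px, cityx, cityy take values from domains; the
# membership tests "(sx, cityx) in S" and "(px, cityy) in P" are
# expressed here by iterating over the tuples of S and P.
pairs = sorted(
    (sx, px)
    for (sx, cityx) in S
    for (px, cityy) in P
    if cityx != cityy
)

print(pairs)
```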
End of Unit Seven
Relational Database
System / Structured Query
Language
The aim of this lecture is to introduce the Structured
Query Language (SQL) and its main operations.
The notes will focus on the data manipulation part of
SQL, as it includes the most common and frequently
used operations in databases (i.e. queries).
The lecture also covers other topics. These are:
Introduction to SQL
Basic structure of SQL commands
Data Definition
Data Manipulation
Aggregation
Introduction to SQL
A database language should allow a user to create the
database and relation structures.
The user should be able to perform insertion,
modification, and deletion of data from relations.
The language should also support both simple and
complex queries.
It must perform these tasks with minimal user effort,
and its command structure and syntax must be easy to
learn.
The language must be portable.
SQL is a transform-oriented language with two major
components.
These are the DDL for defining the database structure
and the DML for retrieving and updating data.
SQL does not contain flow control commands.
These must be implemented using a programming or
job-control language, or interactively by the decisions
of the user.
Introduction to SQL cont..
SQL is relatively easy to learn. SQL is a nonprocedural
language, in contrast to the procedural or third-
generation languages (3GLs) such as COBOL and C
that had been created up to that time - you specify
what information you require, rather than how to get
it.
It is essentially free-format. It consists of standard
English words.
A range of users including data and database
administrators, management, application
programmers, and other types of end users can use
SQL.
ORACLE was probably the first commercial RDBMS
based on SQL. ANSI and ISO published many
standards for SQL.
The most popular and widely implemented is referred
to as SQL2 or SQL/92. The SQL examples used in these
lecture notes adhere to the SQL2 standard.
Basic structure of SQL commands
An SQL statement consists of reserved words and user-defined
words.
Reserved words are a fixed part of SQL; they must be spelt
exactly as required and cannot be split across lines.
User-defined words are made up by the user and represent
the names of various database objects such as relations,
columns and views.
Most components of an SQL statement are case insensitive,
except for literal character data.
SQL statements are more readable with indentation and
lineation. An extended form of BNF notation is used to
express the syntax. That is:
Upper case letters represent reserved words.
Lower case letters represent user-defined words.
| indicates a choice among alternatives.
Curly braces indicate a required element.
Square brackets indicate an optional element.
… indicates optional repetition (0 or more).
The above syntax will be used in the rest of this document
when we illustrate the syntax of various SQL commands.
Data Definition
Department
DeptName        Address          City
Administration  Bond Street      London
Production      Rue Victor Hugo  Toulouse
Distribution    Pond Road        Brighton
Planning        Bond Street      London
Research        Sunset Street    San José
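A possible definition of the Department table above, run through Python's built-in sqlite3 module. The column types, lengths and the choice of DeptName as primary key are assumptions; the notes only show the column names and rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the relation structure (types and key are assumed).
conn.execute("""
    CREATE TABLE Department (
        DeptName VARCHAR(20) PRIMARY KEY,
        Address  VARCHAR(50),
        City     VARCHAR(20)
    )
""")

# DML: load the rows shown in the Department table.
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)", [
    ("Administration", "Bond Street", "London"),
    ("Production", "Rue Victor Hugo", "Toulouse"),
    ("Distribution", "Pond Road", "Brighton"),
    ("Planning", "Bond Street", "London"),
    ("Research", "Sunset Street", "San José"),
])

london = conn.execute(
    "SELECT DeptName FROM Department WHERE City = 'London' ORDER BY DeptName"
).fetchall()
print(london)
```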
Data Manipulation cont..
Q1: Specific Columns, Specific Rows. Find the salaries
of employees named Brown.
SELECT Salary AS Remuneration
FROM Employee
WHERE Surname = 'Brown';
Q2: Find all the information relating to employees
named Brown.
SELECT *
FROM Employee
WHERE Surname = 'Brown';
Q3: Find the monthly salary of the employees named
White.
SELECT Salary / 12 AS MonthlySalary
FROM Employee
WHERE Surname = 'White';
Simple join query
Q4: Find the names of the employees and the cities in
which they work.
SELECT Employee.FirstName,
Employee.Surname,
Department.City
FROM Employee, Department
WHERE Employee.Dept = Department.DeptName;
Using table aliases
Q5: Find the names of the employees and the cities in
which they work (using an alias).
SELECT FirstName, Surname, D.City
FROM Employee, Department D
WHERE Dept = DeptName;
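The join with aliases can be checked against a small data set. In the sketch below (Python's built-in sqlite3 module) the Employee rows are hypothetical, since the Employee table is not shown here; the two Department rows come from the Department table. The alias E for Employee is an addition so that every column is qualified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Employee rows are made up for the example.
    CREATE TABLE Employee (FirstName TEXT, Surname TEXT, Dept TEXT);
    CREATE TABLE Department (DeptName TEXT, Address TEXT, City TEXT);
    INSERT INTO Employee VALUES ('Mary', 'Brown', 'Administration'),
                                ('Charles', 'White', 'Production');
    INSERT INTO Department VALUES ('Administration', 'Bond Street', 'London'),
                                  ('Production', 'Rue Victor Hugo', 'Toulouse');
""")

# E and D are table aliases, valid only inside this statement.
rows = conn.execute("""
    SELECT E.FirstName, E.Surname, D.City
    FROM Employee E, Department D
    WHERE E.Dept = D.DeptName
    ORDER BY E.Surname
""").fetchall()
print(rows)
```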
Q6: Using predicate conjunction.
Find the first names and surnames of the employees who
work in office number 20 of the Administration department.
SELECT FirstName, Surname
FROM Employee
WHERE Office = '20' AND
Dept = 'Administration';
Data Manipulation cont..
Q7: Find the first names and surnames of the employees who
work in either the Administration or the Production
department;
SELECT FirstName, Surname
FROM Employee
WHERE Dept = 'Administration' OR
Dept = 'Production';
Q8: Find the first names of the employees named Brown who
work in the Administration department or the Production
department.
SELECT FirstName
FROM Employee
WHERE Surname = 'Brown' AND
(Dept = 'Administration' OR
Dept = 'Production');
Q9: Find the employees with surnames that have 'r' as the
second letter and end in 'n'.
SELECT *
FROM Employee
WHERE Surname LIKE '_r%n';
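Two points from Q8 and Q9 are easy to verify on sample data: the parentheses in Q8 change the result, because AND binds more tightly than OR, and in the LIKE pattern `_` matches exactly one character while `%` matches any sequence. The sketch uses Python's built-in sqlite3 module; the Employee rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical employees.
    CREATE TABLE Employee (FirstName TEXT, Surname TEXT, Dept TEXT);
    INSERT INTO Employee VALUES
        ('Mary', 'Brown', 'Administration'),
        ('Charles', 'Brown', 'Planning'),
        ('Alice', 'Green', 'Production'),
        ('Gus', 'White', 'Production');
""")

# Q8 as written: Brown AND (Administration OR Production).
with_parens = conn.execute("""
    SELECT FirstName FROM Employee
    WHERE Surname = 'Brown' AND
          (Dept = 'Administration' OR Dept = 'Production')
    ORDER BY FirstName
""").fetchall()

# Without parentheses: (Brown AND Administration) OR Production.
without_parens = conn.execute("""
    SELECT FirstName FROM Employee
    WHERE Surname = 'Brown' AND
          Dept = 'Administration' OR Dept = 'Production'
    ORDER BY FirstName
""").fetchall()

# Q9 pattern: any one character, then 'r', then anything, ending in 'n'.
like_rows = conn.execute("""
    SELECT DISTINCT Surname FROM Employee
    WHERE Surname LIKE '_r%n'
    ORDER BY Surname
""").fetchall()

print(with_parens, without_parens, like_rows)
```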
Data Manipulation cont..
We will use another example to illustrate other concepts of SQL
queries. The schema below presents a database snapshot (i.e.
a schema instance) for an estate agent.
Property
Pno   Street         Area      City      Pcode    Type  Rooms  Rent     Ono   Sno   Bno
PA1   16 Holhead     Dee       Aberdeen  AB7 5SU  H     6      £650.00  CO46  SA9   B7
PG1   5 Novar Dr     Hyndland  Glasgow   G12 9AX  F     4      £450.00  CO93  SG14  B3
PG2   8 Dale Rd      Hyndland  Glasgow   G12      H     5      £600.00  CO87  SG37  B3
PG3   2 Manor Rd               Glasgow   G32 4QX  F     3      £375.00  CO93  SG37  B3
PG4   6 Lawrence St  Partick   Glasgow   G11 9QX  F     3      £350.00  CO40  SG14  B3
PL94  6 Argyll St    Kilburn   London    NW2      F     4      £400.00  CO87  SL41  B5
Data Manipulation cont..
Staff
Sno  FName  LName  Address                   Tel_No         Position   Sex  Salary    DOB      NIN      Bn
SA9  Mary   Howe   2 Elm Pl, Aberdeen AB2                   Assistant  F    £9,000.0  19/2/70  WM5321   B7
SG1  David  Ford   63 Ashby St, Partick,     0141-339-2177  Deputy     M    £18,000.  24/3/58  WL22065  B3
SG3  Ann    Beech  81 George St, Glasgow,    0171-848-3345  Snr Asst   F    £12,000.  10/11/6  WL44201  B3
SG5  Susan  Brand  5 Gt Western Rd, Glasgow  0141-334-2001  Manager    F    £24,000.  3/6/40   WK5889   B3
SL2  John   White  19 Taylor St, Cranford,   0171-884-5112  Manager    M    £30,000.  1/10/45  WL43251  B5
SL4  Julie  Lee    28 Malvern St, Kilburn    0181-554-3541  Assistant  F    £9,000.0  13/6/65  WA2905   B5
Data Manipulation cont..
Q10: List the details of all viewings on property PG4 where
a comment has not been supplied (Null search condition).
SELECT viewing.pno, viewing.rno, Date
FROM viewing
WHERE pno='PG4' AND comment IS NULL;
Q11: Produce an abbreviated list of properties arranged in
order of property type (sorting results)
SELECT Pno, Type, Rooms, Rent
FROM Property
ORDER BY Type;
Q12: List all staff with a salary greater than 10,000
(Comparison Search Condition)
SELECT Staff.Sno, Staff.Fname, Staff.Lname, Position,
Salary
FROM Staff
WHERE Salary > 10000;
Q13: List all staff with a salary between 20,000 and 30,000.
(Range search condition)
SELECT staff.Sno, staff.FName, staff.LName, staff.Position,
staff.Salary
FROM staff
WHERE staff.Salary BETWEEN 20000 AND 30000;
Q14: List all Managers and Deputy Managers. (Set
membership search condition)
SELECT staff.Sno, staff.FName, staff.LName, staff.Position
FROM staff
WHERE position IN ('Manager', 'Deputy');
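The range and set membership conditions of Q13 and Q14 can be tried on a cut-down Staff table. The sketch uses Python's built-in sqlite3 module and keeps only three columns; the names and positions follow the Staff table, and the salaries are read as whole pounds, which is an assumption because the printed values are truncated. BETWEEN includes both end points.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced Staff table; salary values are an assumed reading
    -- of the truncated figures in the printed table.
    CREATE TABLE Staff (LName TEXT, Position TEXT, Salary INTEGER);
    INSERT INTO Staff VALUES
        ('Howe', 'Assistant', 9000), ('Ford', 'Deputy', 18000),
        ('Beech', 'Snr Asst', 12000), ('Brand', 'Manager', 24000),
        ('White', 'Manager', 30000), ('Lee', 'Assistant', 9000);
""")

# Range search: 30000 qualifies because BETWEEN is inclusive.
in_range = conn.execute("""
    SELECT LName, Salary FROM Staff
    WHERE Salary BETWEEN 20000 AND 30000
    ORDER BY LName
""").fetchall()

# Set membership: shorthand for Position = 'Manager' OR Position = 'Deputy'.
managers = conn.execute("""
    SELECT LName FROM Staff
    WHERE Position IN ('Manager', 'Deputy')
    ORDER BY LName
""").fetchall()

print(in_range)
print(managers)
```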
Aggregation
The ISO standard defines five aggregate functions. These are:
.
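In the SQL2 standard the five aggregate functions are COUNT, SUM, AVG, MIN and MAX. A minimal sketch of all five in one query, using Python's built-in sqlite3 module; the reduced Staff table and its salary values are assumptions based on the truncated figures in the Staff table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced Staff table with assumed whole-pound salaries.
    CREATE TABLE Staff (LName TEXT, Salary INTEGER);
    INSERT INTO Staff VALUES
        ('Howe', 9000), ('Ford', 18000), ('Beech', 12000),
        ('Brand', 24000), ('White', 30000), ('Lee', 9000);
""")

# Each aggregate reduces the Salary column to a single value.
summary = conn.execute("""
    SELECT COUNT(*), SUM(Salary), AVG(Salary), MIN(Salary), MAX(Salary)
    FROM Staff
""").fetchone()

print(summary)
```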
Attribute grouping and functional
dependency notation cont..
Functional dependencies help in accomplishing
the following two goals:
(a) controlling redundancy and
(b) enhancing data reliability.
If two tuples agree on the ‘X’ attribute, they
*must* agree on the ‘Y’ attribute, too.
If X → Y we say X functionally determines Y.
Notice that X → Y implies a many-to-one or
one-to-one mapping.
Example: Consider the Emp schema below:
EMP (name, salary, dept, mgr).
Consider the following data dependencies:
1. Each employee has one salary
name → salary
2. Each employee works in only one department
name → dept
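The rule "if two tuples agree on X they must agree on Y" can be tested mechanically on sample data. The helper `holds` and the EMP rows below are hypothetical, written for this illustration. Note that a sample can only refute a dependency; whether an FD really holds is a statement about the meaning of the data, not about one instance.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and
        # compare every later occurrence against it.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical instance of EMP (name, salary, dept, mgr).
emp = [
    {"name": "Ann", "salary": 12000, "dept": "Sales", "mgr": "Susan"},
    {"name": "David", "salary": 18000, "dept": "Sales", "mgr": "Susan"},
    {"name": "John", "salary": 30000, "dept": "Research", "mgr": "Julie"},
]

name_salary = holds(emp, ["name"], ["salary"])  # name -> salary
name_dept = holds(emp, ["name"], ["dept"])      # name -> dept
dept_name = holds(emp, ["dept"], ["name"])      # dept -> name (violated by Sales)

print(name_salary, name_dept, dept_name)
```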
Attribute grouping and functional
dependency notation cont..
If each possible P# (i.e. Part number) value has precisely
one associated P-desc value, then P# is a determinant of
P-desc, or P# → P-desc.
If each possible P# value has only one associated
Qty-in-stock value, then P# is a determinant of
Qty-in-stock, or P# → Qty-in-stock.
Dependency Diagrams
The fact that attribute A is a determinant of B, or B is
dependent on A, can be represented in an
FD diagram as:
A → B
For example: P# → P-desc
Inference Axioms
An inference axiom is a rule that states
that: if a relation satisfies certain FDs then
it must satisfy certain other FDs.
The closure of F (usually written as F+) is
the set of all functional dependencies that
may be logically derived from F.
Often F is the set of most obvious and
important functional dependencies and
F+, the closure, is the set of all the
functional dependencies including F and
those that can be deduced from F.
The closure is important and may, for
example, be needed in finding one or more
candidate keys of the relation.
Inference Axioms cont..
A set of inference rules, called
Armstrong’s axioms, specifies how new
functional dependencies can be inferred
from given ones.
Let A, B, and C be subsets of the
attributes of the relation R. Armstrong’s F
axioms are as follows:
(F1) Reflexivity
If B is a subset of A, then A → B
(F2) Augmentation
If A → B, then A,C → B,C
(F3) Transitivity
If A → B and B → C, then A → C
Inference Axioms cont..
Further rules can be derived from the first
three rules that simplify the practical task
of computing X+.
Let D be another subset of the attributes
of relation R, then:
(F4) Self-determination
A→A
(F5) Decomposition
If A → B,C, then A → B and A → C
(F6) Union
If A → B and A → C, then A → B,C
(F7) Composition
If A → B and C → D then A,C →
B,D
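The task of computing X+ mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name closure is hypothetical, and the dependency dept → mgr is an assumption added to the EMP example so that transitivity has something to do.

```python
def closure(attrs, fds):
    """Return the closure X+ of the attribute set attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # result determines lhs by reflexivity, so by transitivity
            # it also determines rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# FDs on EMP (name, salary, dept, mgr). The first two come from the notes;
# dept -> mgr is an assumption added for illustration.
fds = [
    ({"name"}, {"salary"}),
    ({"name"}, {"dept"}),
    ({"dept"}, {"mgr"}),
]

print(sorted(closure({"name"}, fds)))  # ['dept', 'mgr', 'name', 'salary']
print(sorted(closure({"dept"}, fds)))  # ['dept', 'mgr']
```

Under these assumed dependencies the closure of name contains every attribute of EMP, so name is a candidate key, whereas dept is not.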
Relational Database System /
Normalization for Relational
Databases
In the previous lecture we covered
some informal guidelines for good
database design.
We also covered the concept of functional
dependency, which is the key factor for
grouping attributes in one relation.
We showed how bad design causes
modification anomalies such as insertion,
deletion and update anomalies.
Normalization for Relational
Databases
In this lecture we are covering a series of
formal tests on a relation to determine
whether it satisfies or violates the
requirements of a given normal form.
The objective of this process is to separate
the data (attribute values) into sets based
on functional dependencies between
attributes.
The lecture will cover the following topics:
1. The purpose of Normalization
2. First Normal Form (1NF)
3. Second Normal Form (2NF)
4. Third Normal Form (3NF)
5. Boyce-Codd Normal Form (BCNF)
The purpose of Normalization
Normalizing a logical database design involves
using formal methods to separate the data into
multiple related tables.
A normalised database is characterised by a
large number of tables, each with few columns.
A database with only a few tables and many
columns is indicative of an un-normalised or
partially normalised database.
The benefits of a normalised relation include:
1. Faster sorting and index creation.
2. Fewer null values for data that is either not
required or not known.
3. Less opportunity for database inconsistency.
The side effect of normalization is that, as it is
implemented, the number and complexity of joins
required to retrieve data increase.
The purpose of Normalization
cont..
Normalization aims to avoid redundant
duplication.
Data duplication does not always imply
redundancy. Data can be duplicated for efficiency
purposes.
However, this duplication must be controlled.
Duplicated data is present when an attribute has
two (or more) identical values.
A data value is redundant if you can delete it
without information being lost, so redundancy is
unnecessary duplication.
Consider Figure 1 below:
The purpose of Normalization
cont..
Part (a)
P#   P-desc.
p2   nut
p1   bolt
p3   washer
p4   nut

Part (b): "nut" deleted from the p2 row, information is lost
P#   P-desc.
p2   ----
p1   bolt
p3   washer
p4   nut
Duplicated, but not redundant.

Supp-Part (c)
S#   P#   P-desc.
S2   P1   bolt
S7   P6   bolt
S2   P4   nut
S5   P1   bolt

Supp-Part (c): "bolt" deleted from the S5, P1 row, no information is lost
S#   P#   P-desc.
S2   P1   bolt
S7   P6   bolt
S2   P4   nut
S5   P1   ----
Redundantly duplicated data.
Figure 1 Redundancy vs. Duplication
The purpose of Normalization
cont..
We have eliminated redundancy by splitting the
table. The advantage of this split is that the P1
description appears only once.
We have linked the two tables by including p# in
the two tables.
Supp-Part-1 (d)
S#   P#
S2   P1
S7   P6
S2   P4
S5   P1

Part-1 (e)
P#   P-desc.
P1   bolt
P6   bolt
P4   nut
Figure 2, Splitting the table
The purpose of Normalization
cont..
So far we implied that table structures which
permit redundancy could be recognized by
inspection of the table occurrence.
This is not entirely accurate, because
attribute values are subject to change / deletion /
insertion.
This is called deceptive appearances.
Consider deleting the 4th row from the Supp-Part (c)
table. This results in table Supp-Part-2 (f):
Supp-Part-2 (f)
S#   P#   P-desc.
S2   P1   bolt
S7   P6   bolt
S2   P4   nut
The purpose of Normalization
cont..
Inspection of (f) does not reveal any redundant
data.
It could even be consistent with a rule: "No two
suppliers may supply the same p#".
Hence, a snapshot of a table is an inadequate guide
to the presence or absence of redundant data.
We need to know the underlying rules, and the DBA
must discover the rules which apply to the
conceptual model.
These are the functional dependency rules that
we covered in the last lecture.
The purpose of Normalization
cont..
The above discussion leads us to the fact that
whenever we split a table for the purpose of
reducing unnecessary duplication, we must
maintain two important properties during the
decomposition.
The first is the lossless-join property. This
enables us to find any instance of the original
relation from corresponding instances in the
smaller relations.
The second is the dependency preservation
property. This enables us to enforce a constraint
on the original relation by enforcing some
constraint on each of the smaller relations.
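The lossless-join property can be illustrated with the Supp-Part table (c): project it onto two smaller tables, join them back on the shared attribute, and compare with the original. The sketch below is a minimal illustration; the helper names project, natural_join and as_set are hypothetical, and the second split (on P-desc) is an assumption added only to show what a lossy decomposition looks like.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, dropping duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Join two relations on all attribute names they share."""
    joined = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


# Supp-Part table (c) from Figure 1.
supp_part = [
    {"S#": "S2", "P#": "P1", "P-desc": "bolt"},
    {"S#": "S7", "P#": "P6", "P-desc": "bolt"},
    {"S#": "S2", "P#": "P4", "P-desc": "nut"},
    {"S#": "S5", "P#": "P1", "P-desc": "bolt"},
]

# Split on P#, as in Supp-Part-1 (d) and Part-1 (e).
supp_part_1 = project(supp_part, ["S#", "P#"])
part_1 = project(supp_part, ["P#", "P-desc"])
good_join = natural_join(supp_part_1, part_1)
print(as_set(good_join) == as_set(supp_part))  # True: the join is lossless

# Hypothetical bad split on P-desc: the join produces spurious rows.
bad_left = project(supp_part, ["S#", "P-desc"])
bad_right = project(supp_part, ["P#", "P-desc"])
bad_join = natural_join(bad_left, bad_right)
print(as_set(bad_join) == as_set(supp_part))   # False: the join is lossy
```

The split on P# is lossless because P# determines P-desc, so each row of Supp-Part-1 matches exactly one row of Part-1.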
First Normal Form (1NF)
The data as it is first collected may or may not be
suitable to be stored in a relational table.
In order to be placed in a relational table
it must meet certain criteria.
The main criterion is that each
single cell in a relational table must hold only a
single atomic value.
When we store the data as collected in a table format,
we refer to the table as being in Unnormalized Form
(UNF).
The creation of such a table results from the
process of transforming the data from the
information source (e.g. a sample form) into table
format with columns and rows.
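The atomic-value criterion can be sketched as follows. This is a minimal illustration only: the function name to_1nf and the student and course values are assumptions invented for the example, not data from the notes. A cell that holds a list of values is split so that each resulting row holds a single value in that column.

```python
def to_1nf(rows, repeating):
    """Split a multi-valued attribute so each cell holds one atomic value."""
    flat = []
    for row in rows:
        for value in row[repeating]:
            # Copy the row, replacing the list with one of its values.
            flat.append({**row, repeating: value})
    return flat


# Hypothetical unnormalized rows: the course cell holds several values.
unf = [
    {"student_no": "S1", "name": "Amina", "course": ["DB", "Networks"]},
    {"student_no": "S2", "name": "John", "course": ["DB"]},
]

for row in to_1nf(unf, "course"):
    print(row)
```

The flattened table repeats student_no and name once per course, which is the kind of duplication the later normal forms deal with.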
First Normal Form (1NF) cont..
Consider the Student_Course relational schema
and the table snapshot below:
STU_COURSE