Database Lecture Notes

The document provides an overview of database systems, covering topics such as data models, database architecture, and the evolution from traditional file-based processing to modern database management systems (DBMS). It highlights the importance of data organization, the roles of various database users, and the advantages of using a DBMS over older methods. Recommended readings and a brief summary of the lecture content are also included.


DATABASE SYSTEMS

BY MR. KASHALE
2023
Syllabus Content
 An overview of database systems
 Data Models and Implications
 Database Architecture and its
Environment
 Database Functions and Languages
 Conceptual Database Modelling
 Relational Database Systems
 Relational Database System / Relational
Database Languages
 Structured Query Language (SQL)
 Functional Dependencies
 Normalization for Relational DBS
 Procedural DBL and PL/SQL
Recommended Books
 Data Management: Databases and
Organisations. Watson, Richard T. John Wiley
and Sons: 5th Edition 2005
 Data Mining: Concepts and Techniques. Han,
Jiawei; Kamber, Micheline. Elsevier: 2nd
Edition 2006
 Data Modelling Essentials. Simsion, Graeme;
Witt, Graham. Morgan Kaufmann Publishers:
3rd Edition 2005
 Database Management Systems.
Ramakrishnan, Raghu; Gehrke, Johannes.
McGraw-Hill: 3rd Edition 2002
An overview of database
systems
 The aim of this lecture is to introduce some of the
basic database (DB) terminology that you need to know.
 Topics covered are listed below:
1. Data and information.
2. Traditional file-based processing versus database
processing;
3. Evolvement of Database systems;
4. Database Applications and users;
Data and information
 The term data refers to raw facts and figures,
such as orders and payments, which are
processed into information.
 Data is unprocessed facts or figures.
 Information on the other hand refers to the
implicit association between data.
 When the data is used in context and has a
meaning it then becomes information.
 Data represent the values physically recorded in
the Database; e.g. 10023.
 Information, though, refers to the meaning of
those values as understood by some users; e.g.
Mohammed’s Employee number is 10023.
 Data is the technological term and information is
the business term.
 A common misconception is that software is also
data. Software, however, is "executed" or "run"
by the computer, whilst data is "processed".
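The distinction between data and information drawn above can be illustrated with a minimal sketch. The record layout and the `describe` helper are hypothetical and are used only for illustration; the value 10023 and the name come from the example above.

```python
# Data: a bare value as it might be physically recorded in the database.
raw_value = 10023

# Information: the same value placed in context (whose value it is and what it means).
# The dictionary keys below are hypothetical names chosen for this sketch.
record = {"name": "Mohammed", "attribute": "employee number", "value": raw_value}


def describe(rec):
    """Render a record as a sentence that a business user can interpret."""
    return f"{rec['name']}'s {rec['attribute']} is {rec['value']}."


print(describe(record))
```

On its own, `raw_value` tells the reader nothing; only the surrounding context makes it meaningful.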
 Electronic storage of data or information can take many
forms, such as files, databases, text documents,
images and digitally encoded voice and video.
 Data is a basic resource of the organisation and must be
collectively organized and managed to support all business
functions.
Traditional file-based processing
 The file-based system is the predecessor of Database
systems.
 It is a collection of applications (data + programs)
that perform specific tasks to support some business
functions.
 In the earlier days of computing, a simple approach
was to imitate the way in which paper files had been
organized.
 Each department kept the records that it needed for
its own applications.
 The Accounting department kept accounts payable
and receivable, sales and marketing kept information
on customers and products, and so forth.
 These files were separate and not cross-indexed; such
"stand-alone" files are sometimes called flat files.
 Programs were specially written for a specific task to
process a batch of records from these flat files.
 Usually, it was the case that the programs used for
one department's records would not understand the
records designed for another department's
applications.
 Each set of "application programs" owned its own set
of files; this approach is known as file processing.
 Such file processing approaches resulted in some
serious problems for the business organisation.
 Since data was organised according to a particular
application, and each application held its own data in
"flat" (meaning unrelated) files, a lot of duplicated
data was being maintained.
 For example, a customer might have had a "sales"
record held by the Marketing Department and a
"receivables" record held by Accounting.
 Each might have a separate record for the customer,
which included the customer's mailing address.
 Of course, one drawback to file based processing was
wasted file space, and storage was, in the early days
of computing, very expensive.
 This problem of duplicated data values is called data
redundancy.
 A worse problem, however, was that data values were
inevitably often inconsistent between the files, which
led to confusion and inaccuracies.
 For example, if a customer’s address changed, the
sales department might alter their files to reflect this
change.
 However, a mechanism that makes the same change to the
accounting department's address records may not be in
place, or may fail.
 The fact that each program was linked to its own
particular file organization caused more headaches.
 When new programs were written, data often had to
be restructured and, conversely, when data was
restructured, programs had to be revised.
 This problem of data dependence made program
maintenance expensive and the development of new
programs much more difficult.
 Typically, with a file-based system, data for various
applications will be stored in a collection of flat files.
This approach has some disadvantages including the
following:
1. Difficult to handle complex data
2. Low data quality
3. Redundancy and inconsistency
4. No central management
5. Difficult to maintain and share in multi-user
environments
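The redundancy and inconsistency problems described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two in-memory "files", the customer, the addresses and the helper names are all hypothetical.

```python
import csv
import io

# Two hypothetical departmental flat files; each keeps its own copy of the address.
sales_file = "cust_id,name,address\n1,A. Banda,12 Cairo Road\n"
accounts_file = "cust_id,name,address,balance\n1,A. Banda,12 Cairo Road,250.00\n"


def load(text):
    """Parse CSV text into a list of dictionaries, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


sales = load(sales_file)
accounts = load(accounts_file)

# The customer moves, and only the sales department updates its own file.
sales[0]["address"] = "7 Independence Avenue"


def inconsistent_addresses(a_rows, b_rows):
    """Return the ids of customers whose address differs between the two files."""
    b_by_id = {row["cust_id"]: row for row in b_rows}
    return [
        row["cust_id"]
        for row in a_rows
        if row["cust_id"] in b_by_id
        and row["address"] != b_by_id[row["cust_id"]]["address"]
    ]


print(inconsistent_addresses(sales, accounts))
```

Because the address is stored twice (redundancy), nothing forces the second copy to follow the first, so the two departments now disagree (inconsistency).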
[Figure: example of a traditional file-based processing system]
Evolvement of Database systems
 Two key points need to be addressed in order to solve
the problems listed above and other similar problems.
1. First, programs and the data need to be separated and
stored separately and independently.
2. Second, access control and manipulation of data need
to be developed beyond those imposed by the
application programs.
 This new approach should support management of vast
amounts of data, including efficient access, some degree
of independence from the application programs, quick
application development, support for concurrent
access, recovery from system failure, security of data,
etc.
 Such a system, developed to support these features is
called a Database Management System (DBMS).
 The DBMS is therefore a collection of programs that
manages a large amount of data, which we call a Database.
 The database approach rests upon the concept of the
modern database which is formally defined as an
integrated, self-describing collection of related records
organized into files which are often referred to as
database tables.
 Application programs do not create or maintain these
database tables themselves. This is the work of a DBMS.
[Figure: the database system approach]
 Hence, the DB is a collection of related data.
 The data must be related; otherwise there is no context
in the data and the data becomes useless.
 Data becomes useful only when it is in context. It
then becomes information.
 The definitions of all data in the DB are called meta-data
(data about data), and they are stored in a Data
Dictionary (DD).
 Such a dictionary is also known as an Information
Resource Dictionary System (IRDS), a term for which a
published standard exists.
 In the database approach, company data resources are
held in a common collection of data files (often referred
to as data tables) which are organized so that they can
be easily accessed by all application programs which
have need for that data.
 The database approach reduces maintenance of
application programs and data, and speeds up
development of new applications.
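The idea that a database stores definitions of its own data (meta-data) can be observed directly in a small system. The sketch below uses SQLite through Python's standard `sqlite3` module as an assumed stand-in for a DBMS; the `employee` table is hypothetical. SQLite's built-in catalog table `sqlite_master` and the `PRAGMA table_info` command play the role of a very simple data dictionary here.

```python
import sqlite3

# An in-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# SQLite records every table definition in its catalog table, sqlite_master.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

print(tables)
print(columns)
```

No application program had to keep its own description of the `employee` records; the description is held by the database itself.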
 The database partitions the data into tables that are
designed to each hold one very narrow type of data; a
person's name and address for example.
 These files (tables) are then cross-referenced with
each other in some way.
 For example the Marketing Department may have an
application which reports the customer's sales, while the
Accounting department has an application concerning a
customer's payment record.
 Each of these applications would access the same table
that provided the customer's address.
 Sales and accounting information would be kept in
separate database tables.
 Database processing involves assembling the data
contained in the tables into a form that is suitable for
the end-user's particular application needs.
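The Marketing and Accounting example above can be sketched as follows. This is a minimal illustration, assuming SQLite (via Python's `sqlite3` module) as the DBMS; the table names, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE sale (
    sale_id INTEGER PRIMARY KEY,
    cust_id INTEGER REFERENCES customer(cust_id),
    amount  REAL
);
CREATE TABLE payment (
    pay_id  INTEGER PRIMARY KEY,
    cust_id INTEGER REFERENCES customer(cust_id),
    amount  REAL
);
INSERT INTO customer VALUES (1, 'A. Banda', '12 Cairo Road');
INSERT INTO sale     VALUES (1, 1, 400.0);
INSERT INTO payment  VALUES (1, 1, 150.0);
""")

# The address is stored once, so a change of address is a single update.
conn.execute("UPDATE customer SET address = '7 Independence Avenue' WHERE cust_id = 1")

# Marketing's report and Accounting's report both read the same customer table.
sales_report = conn.execute(
    "SELECT c.name, c.address, s.amount "
    "FROM customer c JOIN sale s ON s.cust_id = c.cust_id"
).fetchall()
payment_report = conn.execute(
    "SELECT c.name, c.address, p.amount "
    "FROM customer c JOIN payment p ON p.cust_id = c.cust_id"
).fetchall()

print(sales_report)
print(payment_report)
```

Each report is assembled from the shared tables at query time, which is the sense in which database processing "assembles" data for a particular application.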
Database Applications and
users;
 There are various classifications for Database users.
 One classification groups them according to whether or
not they can be seen at the site of the Database system.
 The first group represents Actors on Scene. These include
persons whose jobs involve the day-to-day use of a large
database.
 Examples of these are the Data and Database
administrators, Database Designers, End Users: Casual,
Naive, Sophisticated and Stand-alone users, System
Analysts and Application Programmers.
 The second category represents Workers behind the
Scene.
 These include DBMS Designers and Implementers, Tool
Developers and Operators and Maintenance Personnel.
 The Data administrator (DA) is the chief person who
oversees and manages all data resources within the
organization.
 This includes Database planning, development and
maintenance of standards, policies and procedures, and
conceptual/logical Database design.
 The Database administrator (DBA) is responsible for the
physical actualization of the Database.
 The role of the DBA is more technical compared with that
of the DA. The DBA requires deep knowledge of the
target DBMS and its environment.
 For example, the DBA is responsible for authorizing
access to the Database, for coordinating and monitoring
its use, etc.
 Other Database users are Database designers. They are
of two types, logical and physical Database designers.
 Logical designers are concerned with identifying the data
elements and they must be aware of the business rules.
 The logical Database designer usually gets involved in
producing the conceptual Database model, which is
independent of any implementation details that are
related to the target DBMS.
 The physical Database designer takes the logical data
model (produced by the logical Database designer) and
translates it into a physical implementation (a set of
tables and integrity constraints in a relational model).
 The physical Database designer selects suitable storage
structures and access methods, and designs any necessary
security measures for the data.
 The physical Database designer must be aware of all
alternative implementations of the same application and
choose the most cost-effective strategy.
 The Conceptual and logical Database design is concerned
with the “what” while physical Database design is
concerned with the “how”.
 End users, who are another type of DB users, have a set
of requirements.
 These requirements are translated into specifications for
canned transactions.
 An application programmer must implement these
specifications.
 The application programmer tests, debugs, documents,
and maintains all developed transactions.
 End users are of different types, depending on their
knowledge and usage of the DBMS.
 These include Naïve users who are typically unaware of
the DBMS. These users access the Database in the
simplest possible way through application programs.
 They do not need any knowledge of the DBMS.
 The other type of end users is called sophisticated users.
They are familiar with the structure of the Database and
the facilities provided by the DBMS. They may use a
stand-alone Database programming language such as SQL.
Lecture Summary
 A successful Information System depends upon the
database approach.
 Older file processing approaches had some serious
problems.
 Data was organized according to a particular application,
and each application held its own data in flat
(meaning unrelated) files.
 Data values were often inconsistent between the files,
which led to confusion and inaccuracies.
 When programs were written, data often had to be
restructured and, conversely, when data was
restructured, programs had to be revised.
 The database approach reduces maintenance of
application programs and data.
 A database is an integrated, self-describing collection of
related records organized into files, which are often
referred to as database tables.
 This approach facilitates the development of new
applications and encourages data sharing among
organizational units.
 There are a number of different types of users. The most
important one is the DBA, who is ultimately responsible
for the maintenance and running of the Database.
The End
Database Models and Implications
Aims and Objectives
 The aim of this lecture is to continue from
the last lecture on the introduction to
Databases.
 The lecture will cover the following topics
briefly.

1. Data Models;
2. Characteristics of the Database approach;
3. Advantages and Disadvantages of
Databases
4. When not to use a DBMS;
Data Models
 A model is a set of concepts used to represent one or
more aspects of a universe of discourse (UoD), that is, a
domain or area of interest for some requirement; it is an
abstraction of the important things in the domain.
 A model is a highly abstract perception of the domain. The
modeler (analyst or database administrator) plays the
role of a philosopher-king in determining what
knowledge to represent, how to organize and express it,
and what constraints to impose to keep it a consistent,
faithful model of the outside world.
 To do a good job, the modeler/analyst must be sensitive
to semantic issues and have a good working knowledge
of conceptual structures.
 Building a good data model depends on the process of
data analysis. The objectives of data analysis are
twofold:
1. Investigate the "Natural Structure" of the information to
be stored, i.e. decide exactly what the relationships
are between individual data items.
2. Produce a representation (or model) of that structure,
which will be suitable for easy conversion into a
database structure description (schema).
 There are various ways in which models are classified.
 The main criterion normally used to classify DBMSs is the
data model on which the DBMS is based.
 A Data model is a set of concepts that can be used to
describe the structure of a Database.
 There are various categorizations for data models. These
are Conceptual, Implementation and Physical models
 The most popular representational model is the
Relational model, which will be covered in depth in the
coming lectures while the object-oriented approach will
be covered at a later stage during the course.
 In the next lecture we will describe what is known as the
three-level architecture for a Database system.
 Each of these levels focuses on a specific aspect in
Database design. These are:
 Conceptual data model (High-Level):
 This provides concepts that are close to the way many
users perceive data (e.g. ERD). It is a set-based data
model and is independent of any DBMS.
 Representational data model (Implementation):
 This provides concepts that are understood by end users
but that are not too far removed from the way data is
organized within the computer.
 This model hides some details of data storage but can be
implemented on a computer system in a direct way.
 Physical model (low-level):
 This provides concepts that describe details of how data
is stored in the computer (e.g. Access paths & data
structures). These concepts are generally meant for
computer specialists, not for typical users.
Characteristics of the
Database approach
 Self-describing nature of the DBS:
 In addition to the application data, the Database system
holds information about the data (meta-data). Meta-data
describes the structure of the primary Database.
 Data Abstraction & Program-Data independence:
 Since data and programs are stored separately and
independently, changes to the structure of the data do
not necessitate changes to all programs that access that
data.
 This independence is called program-data independence.
 The facility of providing the users with a conceptual
representation of the data, one that does not include
details of how the data is stored, is called data
abstraction.
 Support multiple views of the data:
 There are many users in a Database system, each of
whom may require a different perspective or view of the
Database.
 A view may present a subset of the Database, or it may
contain virtual data that is derived from existing data.
 For example, one may be interested in looking at the
name and salary of each employee in the Database while
another user may be interested in looking at the name
and age of each employee.
 The age could be calculated from the stored date of birth.
 Sharing of data and Multi-user transaction processing:
 One of the main aims of a Database system is to provide
access to multiple users at the same time.
 Data that is used in multiple applications is bound to be
used concurrently by more than one user.
 Hence, a mechanism to control the concurrency is
required so that data stays accurate and no updates will
be lost.
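The two employee views mentioned above (name and salary for one user, name and age for another) can be sketched with SQL views. This is only an illustration, assuming SQLite through Python's `sqlite3` module; the table, the view names, the sample rows and the fixed reference date 2023-06-30 (used instead of the current date so that the result is reproducible) are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, salary REAL, dob TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (10023, "Mohammed", 5200.0, "1990-03-15"),
        (10024, "Grace", 6100.0, "1985-11-02"),
    ],
)

# View 1: a subset of the stored columns.
conn.execute("CREATE VIEW emp_salary AS SELECT name, salary FROM employee")

# View 2: virtual data. The age is derived from the stored date of birth and is
# never stored itself. One year is subtracted if the birthday has not yet
# occurred by the reference date.
conn.execute("""
    CREATE VIEW emp_age AS
    SELECT name,
           CAST(strftime('%Y', '2023-06-30') AS INTEGER)
         - CAST(strftime('%Y', dob) AS INTEGER)
         - (strftime('%m-%d', '2023-06-30') < strftime('%m-%d', dob)) AS age
    FROM employee
""")

salaries = conn.execute("SELECT name, salary FROM emp_salary ORDER BY name").fetchall()
ages = conn.execute("SELECT name, age FROM emp_age ORDER BY name").fetchall()

print(salaries)
print(ages)
```

Both users query what looks like a table of their own, while only one copy of the employee data is actually stored.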
Advantages of Databases
 Controlling redundancy:
 Eliminate the unnecessary redundancy of data by
integrating various files so that several copies of the
same data are not stored.
 Data consistency:
 Consistent data means that all copies of a data item have
the same values. A Database reduces the risk of
inconsistency, since duplicates are kept only when
necessary and are well controlled.
 Multipurpose use of data:
 Since the Database is a collection of related data, one
can obtain more information from the same data via the
existing links between various data items.
 Data Sharing:
 Files are owned by the department or people who use
them, while a Database belongs to the entire
organization and can be shared by all authorized users.
 Data integrity & security:
 Data integrity refers to the validity and consistency of the
data. Security is the protection of the Database from
unauthorized users.
 The data is more vulnerable to unauthorized access if it is
integrated. The DBA can enforce Database security at
different levels for different users.
 Enforcement of standards:
 Integration allows the DBA to define and enforce
standards. The DBA uses the DD to specify data formats,
naming conventions, update procedures, access rules,
etc.
 Improved data accessibility and responsiveness:
 Since the data is integrated, it is directly accessible to
end users with faster responses and more services. An
end user can generate a report or write a simple query
immediately at their terminals.
 Increased concurrency:
 In a multi-user environment, there is a high chance of
access interference between various users. Most DBMSs
manage concurrency and ensure that no information is
lost.
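Of the advantages listed above, data integrity is the easiest to demonstrate in a few lines. The sketch below assumes SQLite through Python's `sqlite3` module; the `employee` table, its constraints and the `try_insert` helper are hypothetical. It shows the DBMS itself, rather than each application program, rejecting invalid data. Security through user accounts is not shown, because SQLite has no user accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_no INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL CHECK (salary >= 0)
    )
""")
conn.execute("INSERT INTO employee VALUES (10023, 'Mohammed', 5200.0)")


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert((10024, "Grace", 6100.0))
rejected_negative = try_insert((10025, "Peter", -1.0))     # violates the CHECK constraint
rejected_duplicate = try_insert((10023, "Other", 100.0))   # violates the PRIMARY KEY
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

print(accepted, rejected_negative, rejected_duplicate, row_count)
```

Because the rules are declared once in the schema, every program that writes to the table is subject to the same checks.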
Disadvantages of Databases
 Complexity:
 The functionality expected of a good DBMS makes the
software extremely complex and hence difficult to use.
Database designers, developers, administrators, and
sophisticated end users must understand most DBMS
functionality to take full advantage of it.
 Size:
 Due to its complexity and breadth of functionality, a
DBMS becomes an extremely large piece of software,
occupying many gigabytes of disk space and requiring
substantial amounts of memory to run efficiently.
 Cost of DBMS:
 A personal or single-user DBMS may be relatively cheap
to buy, while a large mainframe multi-user DBMS
servicing hundreds of users can be extremely expensive.
 Additional hardware costs:
 You may need to buy additional storage space or
expand the RAM for better performance.
 Performance:
 Since the DBMS is written to be more general, in order to
cater for many applications rather than just one, the
response time may not be acceptable for some
applications.
 Higher impact of failure:
 As a result of integration, all users rely on the same
resources. Hence, a failure of any component of a DBMS
has a large impact on all its users
When not to use DBMS
 This could be deduced from the list of disadvantages
above. It may be more desirable to use a traditional file
system under the following circumstances:

1. The Database & applications are simple (e.g. one table),
well defined, and not expected to change.
2. There are stringent real-time requirements for some
programs that may not be met because of DBMS
overhead.
3. Multiple-user access to data is not required.
End of Unit 2
Database Architecture and
its Environment
 The aim of this lecture is to introduce you to
the model architecture of a DB system. The
lecture will cover the following topics:

 Three-level schema architecture:
  External level
  Conceptual level
  Internal level
 Logical data independence;
 Physical data independence
 The difference between Conceptual,
Logical and Physical data models
 Database schema vs. database instance
Three-level schema
architecture
 There are a number of features or advantages that are
associated with having a DBMS.
 Having different levels of abstraction in database
modeling helps to achieve most of the advantages
mentioned earlier in the course.
 These include the support of data and program
independence, multiple user views and use of a catalog
to store the database description (schema).
 In 1971 (and later in 1978), a Special interest group
proposed a three level architecture to help achieve and
visualize these features.
 This architecture is called the ANSI-SPARC
architecture, after the committee that proposed it:
the American National Standards Institute, Standards
Planning And Requirements Committee.
 The aim of presenting this architecture is to provide a
framework that is useful in describing general database
concepts and for explaining the structure of specific
database systems.
 The proposed model architecture seems to
fit most systems reasonably well.
 To support different levels of data abstraction,
the data in a database is described at three
different levels.
 These are the external schema at the
external level, the physical or internal
schema at the lowest level (nearest to the
storage level), and the conceptual schema in
the middle. The next three subsections describe
these levels in more detail.
The External level
 The external level (also known as the user logical level)
is the one closest to the users. It is the level concerned
with the way the data is seen by individual users.
 The user can be either an application programmer or an
end user.
 The external schema is used to describe the external
level.
 It contains a description of a portion of the database
that is of concern to the specific user.
 It is his or her view of the database. It includes only
those entities, attributes and relationships that are of
interest to the user.
 The external view is described in terms of external
records, which may be different from the actual stored
records.
 The external view could even represent records in a
data model different from that of the stored level.
The Conceptual level
 This level represents the community view of the
database as seen by the database administrator.
 It is also known as the community logical level or even
sometimes just the logical level.
 The conceptual schema is used to describe what data is
stored in the database and the relationships among the
data.
 The description includes the structure and constraints for
the whole database.
 While the external schema describes how users see the
data, the conceptual schema defines the logical structure
of all data in the database.
 The conceptual schema is defined by a Data Definition
Language (DDL). The DDL does not involve any
consideration of the physical representation or access
techniques at all.
 DDL definitions must relate to the information content
only. This is to achieve data independence
The Conceptual level cont..
 Both the external and the conceptual levels have an
important role in the database design process by
providing a high-level data model that is independent of
any DBMS.
 It is a simple and readily understood means for
communicating the development process among various
users.
 During the requirements analysis phase, it helps analysts
to communicate with end users, and during the design
phase it helps designers to communicate among
themselves and with the database administrator.
 This higher level of abstraction helps developers to
capture the semantics of a system at an early stage.
 Students(sid: string, name: string, login: string, age:
integer, gpa: real)
 Faculty(fid: string, fname: string, sal: real)
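The two schema definitions above could be written in a DDL roughly as follows. This is a sketch only, assuming SQLite through Python's `sqlite3` module; the mapping of `string`, `integer` and `real` to SQLite column types and the choice of `sid` and `fid` as primary keys are assumptions that the definitions themselves do not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema expressed in DDL. Note that nothing here says how or where
# the rows are physically stored.
conn.executescript("""
CREATE TABLE Students (
    sid   TEXT PRIMARY KEY,   -- assumed key
    name  TEXT,
    login TEXT,
    age   INTEGER,
    gpa   REAL
);
CREATE TABLE Faculty (
    fid   TEXT PRIMARY KEY,   -- assumed key
    fname TEXT,
    sal   REAL
);
""")

student_columns = [row[1] for row in conn.execute("PRAGMA table_info(Students)")]
faculty_columns = [row[1] for row in conn.execute("PRAGMA table_info(Faculty)")]

print(student_columns)
print(faculty_columns)
```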
The Internal level
 It is the description of the implementation of the
conceptual schema by means of physical storage
structures.
 It summarizes how the data are stored on secondary
storage devices such as disks and tapes. However, it is
still one level above the actual physical storage, which is
usually managed by the operating system.
 The internal level does not deal with the physical records
but it deals with the internal records (stored records).
 Details of how the address space is mapped to physical
storage are highly system specific (e.g. a block or a
page) and are deliberately omitted from the general
three-level architecture.
 The internal schema describes the various record types,
what indexes exist, how stored fields are represented,
what physical sequence the stored records are in, etc.
Logical data independence
 External schemas or views are in principle generated on
demand from the corresponding definitions in the
conceptual schema.
 If the underlying data definitions in the conceptual
schema are changed, then the definition of the views can
be modified so that the same data can be presented at
the external level.
 For example, suppose that the Faculty definition in the
previous student enrollment example is replaced by the
following two definitions.
 Faculty_public(fid: string, fname: string, office: integer)
Faculty_private(fid: string, sal: real)
 Intuitively, some confidential information about faculty
has been placed in a separate definition and
information about offices has been added.
 The CourseInfo view definition can be redefined in terms
of Faculty_public and Faculty_private, which together
contain all the information in Faculty, so that a user who
queries CourseInfo will get the same answer as before
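The effect described above can be sketched as follows. Since the CourseInfo view is not defined in these notes, the sketch uses a hypothetical view, `FacultyPay(fname, sal)`, in its place; SQLite (via Python's `sqlite3` module) and the sample rows are likewise assumptions. The same user query is run against the original conceptual schema and against the split one.

```python
import sqlite3

# The user's query is written once, against the view only.
QUERY = "SELECT fname, sal FROM FacultyPay ORDER BY fname"


def version_1():
    """Original conceptual schema: a single Faculty table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Faculty (fid TEXT PRIMARY KEY, fname TEXT, sal REAL);
    INSERT INTO Faculty VALUES ('f1', 'Mwale', 9000.0);
    INSERT INTO Faculty VALUES ('f2', 'Phiri', 8500.0);
    CREATE VIEW FacultyPay AS SELECT fname, sal FROM Faculty;
    """)
    return conn.execute(QUERY).fetchall()


def version_2():
    """Changed conceptual schema: Faculty split in two, view redefined as a join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Faculty_public  (fid TEXT PRIMARY KEY, fname TEXT, office INTEGER);
    CREATE TABLE Faculty_private (fid TEXT PRIMARY KEY, sal REAL);
    INSERT INTO Faculty_public  VALUES ('f1', 'Mwale', 101);
    INSERT INTO Faculty_public  VALUES ('f2', 'Phiri', 102);
    INSERT INTO Faculty_private VALUES ('f1', 9000.0);
    INSERT INTO Faculty_private VALUES ('f2', 8500.0);
    CREATE VIEW FacultyPay AS
        SELECT p.fname AS fname, s.sal AS sal
        FROM Faculty_public p JOIN Faculty_private s ON s.fid = p.fid;
    """)
    return conn.execute(QUERY).fetchall()


print(version_1())
print(version_2())
```

Only the view definition changed between the two versions; the query text and its answer did not.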
 From the above example, we can see that the user can
be shielded from changes in the logical structure of the
data, or changes in the choice of definitions to be stored.
 This immunity of external schemas to changes in
conceptual schema is referred to as logical data
independence.
 For example, the addition or removal of entities or
attributes in the conceptual schema should not require
changes to external schema or rewrites of application
programs.
 Data independence is intended to ensure that the only
elements requiring modification in the system are those
that are directly and logically involved in the alteration.
 Logical data independence permits the conceptual
schema to be altered without affecting the external
schemas or the application programs.
Physical data
independence
 It refers to immunity of conceptual schema to changes
in the internal schema. Internal schema changes such as
using different file organizations, or storage
structures/devices, should not require change to
conceptual or external schemas.
 Changes to the internal schema may be needed because
some physical files had to be reorganized.
 For example, in order to enhance database performance,
the database administrator (as part of database tuning
activities) may create an additional access structure,
such as an index.
 If the same data remain in the database as before, we
should not have to change the conceptual schema.
 The above discussion covers the three-level schema
architecture, which is a convenient tool for the user to
visualize the schema levels in a database system.
 This separation may not be so obvious in most DBMSs.
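The tuning example given above for physical data independence can be sketched as follows, assuming SQLite through Python's `sqlite3` module. The table contents and the index name are hypothetical. An access structure is added at the internal level, and neither the table definition nor the query has to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("s1", "Chanda", 3.4), ("s2", "Zulu", 3.9), ("s3", "Tembo", 2.8)],
)

QUERY = "SELECT name FROM Students WHERE gpa > 3.0 ORDER BY name"
before = conn.execute(QUERY).fetchall()

# Internal-level change: add an index on gpa as a tuning step.
conn.execute("CREATE INDEX idx_students_gpa ON Students (gpa)")

# The same query, unchanged, returns the same answer.
after = conn.execute(QUERY).fetchall()

print(before)
print(after)
```

The index may change how the DBMS finds the rows, but not which rows the query returns.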
The difference between
Conceptual, Logical and
Physical models
 We referred to the conceptual schema earlier as the
description of the conceptual level or the community
logical model.
 This description of the logical model is oriented towards
a specific data model such as a relational, network,
hierarchical or an object data model. However, it is
independent of the target DBMSs.
 A conceptual model, however, is used to describe data in
a way that is completely independent of any data model.
 The goal is to represent the concepts of the real world.
Conceptual models are used in the early stages of
database design. The most popular is the entity-relationship
(ER) model.
 A physical data model is concerned with the physical
structure and layout of the data in secondary storage
devices.
 It covers various file organization techniques that could
be used to physically store the data.
Database schema vs.
database instance
 A database schema refers to the description of a
database (or meta-data or intension).
 A displayed schema is called a schema diagram.
 Database instance refers to the data in the database at
a particular moment in time.
 It is also called a database state (or set of occurrences or
instances).
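The difference between schema and instance can be seen in a short sketch, assuming SQLite through Python's `sqlite3` module; the `Students` table, the sample row and the helper functions are hypothetical. Inserting a row changes the database instance (state) but leaves the schema untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT)")


def schema():
    """Return the stored definition of Students (the intension)."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'Students'"
    ).fetchone()[0]


def instance():
    """Return the rows currently in Students (the state at this moment)."""
    return conn.execute("SELECT sid, name FROM Students ORDER BY sid").fetchall()


schema_before, state_1 = schema(), instance()
conn.execute("INSERT INTO Students VALUES ('s1', 'Chanda')")
schema_after, state_2 = schema(), instance()

print(schema_before == schema_after)
print(state_1, state_2)
```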
End of Unit Three
Database Functions and
Languages
Aims and Objectives
 The aim of this lecture is to continue from
the previous lecture on the model
architecture of a DB system. The lecture will
cover the following topics:
1. Database languages and interfaces.
2. Typical functions of a DBMS.
3. Data & Database Administration.
4. Information Resource Dictionary System
(IRDS).
5. Lifecycle of Database system development.
Database Languages and
Interfaces
 The DBMS aims to serve various users. Each type of
these users is associated with one or more stages of the
development process of a database system.
 Their interactions and requirements vary from one stage
to another and from one user to another.
 Hence, the DBMS should offer a variety of languages for
the management of data.
 The language should support the interaction of a user
with any schema of the three-level architecture described
in earlier lectures, as well as mapping between various
levels.
 Therefore, in DBMSs where there is a true three-schema
architecture, there are three language types:
Database Languages and
Interfaces cont…
1. The view definition language (VDL) which is used to
specify user views and the mapping to the conceptual
schema.
2. The data definition language (DDL) is used to specify the
data types and structures, and the constraints on the
data to be stored in the database.
3. The storage definition language (SDL) is used to specify
the internal schema. In most database systems, there is
no strict separation of levels.
4. Having created the database structure, we need to
populate the database with data and then use the
database for routine data manipulation such as general
update and retrieval queries. The language that is used
to manipulate the data is called a data manipulation
language (DML).
5. The database administrator may want to enforce some
scrutiny and authorization measures on the database.
He or she would require another type of language, called
a data control language (DCL).
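As a rough illustration of how these language types can appear side by side, the sketch below uses Python's built-in sqlite3 module. The table, view and column names (lecturer, lecturer_names and so on) are invented for this example. SQLite has no user accounts, so the DCL part is only shown as a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define data types, structures and constraints.
conn.execute("""
    CREATE TABLE lecturer (
        lecturer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT
    )
""")

# VDL: in SQL, a user view is defined with CREATE VIEW.
conn.execute("CREATE VIEW lecturer_names AS SELECT name FROM lecturer")

# DML: populate, update and query the data.
conn.executemany(
    "INSERT INTO lecturer (lecturer_id, name, address) VALUES (?, ?, ?)",
    [(1, "Smith", "London"), (2, "Jones", "Paris")],
)
conn.execute("UPDATE lecturer SET address = 'Rome' WHERE lecturer_id = 2")
names = [row[0] for row in
         conn.execute("SELECT name FROM lecturer_names ORDER BY name")]

# DCL: a multi-user DBMS would accept something like
#   GRANT SELECT ON lecturer_names TO some_user;
# SQLite does not support GRANT, so it is not executed here.
```

The view is queried exactly like a table, which is the point of an external schema: the user does not need to know how lecturer is defined.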
Database Languages and
Interfaces cont…
 Commonly, the above types of languages are not
considered distinct languages.
 Rather, a comprehensive integral language is used that
includes constructs for conceptual schema definitions and
view definitions, and data manipulation.
 A typical example of a comprehensive integrated
language is the structured query language (SQL), which
is a relational database language.
 The DML can be of two main types: a high-level or
non-procedural DML, and a low-level or procedural DML. The
first can be used on its own as a stand-alone query
language to specify complex database operations in a
concise manner.
 The non-procedural or declarative DML is also known as a
set-at-a-time DML, whereas a procedural DML processes one
record at a time. Declarative DML is easier to use since
the user only needs to declare "what" is required;
specifying the procedure of "how" to retrieve the data is
a feature of the procedural DML.
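The contrast between the two DML styles can be sketched as follows, again with Python's sqlite3 module. The cd table and its three rows are sample data chosen for this example. The first query states only what is wanted, while the loop spells out how to find it one record at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cd (cd_id TEXT PRIMARY KEY, artist TEXT, price REAL)")
conn.executemany(
    "INSERT INTO cd VALUES (?, ?, ?)",
    [("AB1", "Acker Bilk", 10.0),
     ("DB2", "Dubliners", 6.5),
     ("GH1", "George Hamilton", 7.5)],
)

# Declarative (set-at-a-time): say WHAT is wanted; the DBMS decides how.
declarative = [row[0] for row in conn.execute(
    "SELECT cd_id FROM cd WHERE price < 8 ORDER BY cd_id")]

# Procedural (record-at-a-time): the program says HOW, fetching each
# record in turn and testing it itself.
procedural = []
cursor = conn.execute("SELECT cd_id, price FROM cd ORDER BY cd_id")
row = cursor.fetchone()
while row is not None:
    if row[1] < 8:
        procedural.append(row[0])
    row = cursor.fetchone()
```

Both approaches return the same CDs; the difference lies in who is responsible for the retrieval procedure.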
Database Languages and
Interfaces cont…
 The development of the declarative DML aims to provide
user-friendly interfaces to its users.
 This is specifically useful for casual end users when they
specify their requests. Programmers on the other hand
would use the DML in embedded form. The user-friendly
interfaces include:
1. Graphical user interface: the schema is displayed in
diagrammatic form. The user can specify a query by
manipulating the diagram.
2. Natural language interface: the user request is
expressed in natural language text. Usually this text is
translated by the DBMS into a conventional query
language such as SQL.
 Most of the user-friendly interfaces mentioned above
require some sort of high-level non-procedural
programming language.
 The user is not expected to write the steps that a
program needs to perform a task, but instead defines
parameters for the tools that use them to generate an
Database Languages and
Interfaces cont…
 These tools consist of higher-level components and
support what is known as the Fourth-Generation
Language (4GL).
 A typical 4GL should include a very high-level language
that is used to generate applications, presentation
languages (query language and report generators) and a
specialty language such as a database programming
language.
Typical functions of a
DBMS
 Maintenance of data structures: The DBMS must furnish
a data dictionary in which descriptions of data items are
stored and which is accessible to users, as well as to the
DBMS itself.
 Language for storage, retrieval and update of data: The
DBMS should provide a language that supports
operations at various levels of the three-level
architecture mentioned above for different users. These
include the ability to store, retrieve and update data in
the database.
 Facilities for ensuring data integrity, recovery, and
security: Data integrity refers to the state of a database
in which all constraints are fulfilled. The DBMS must be
able to recover the database to a consistent state. The
database security usually refers to the protection of the
database from malicious access or misuse.
Typical functions of a
DBMS cont..
 Various utilities for the DB administrator: A DBMS should
provide a set of utility services that help the database
administrator to carry out the job effectively. Examples
of these include import, monitoring, and indexing facilities.
 Separation of physical and logical data structures (data
independence): The separation of programs from the
actual structure of the database is one of the most
important features of a DBMS.
 Multi-user access (concurrency control): Simultaneous
access to the DB by many users, including those who are
altering the data, must be furnished by the DBMS.
 Support for data communication: With the current
distribution trends in most applications, users tend to
access their data from terminals, which may be local or
remote from the host DBMS.
 The DBMS receives requests as communication messages
and responds in a similar way.
Data & Database
Administration
 The database system is built around the concept of a
centralized data resource, which is shared by many users
in an organization.
 Managing this resource implies that there is a need for
effective management and control of this data. This
management role is at a senior management level.
 The manager of this resource is called the data
administrator.
 The data administrator should understand the data and
the needs of the enterprise with respect to that data.
 The data administrator's job is not a technical one.
However, some appreciation of the capabilities of the
database system would be useful.
 The job of the database administrator involves many
tasks. The database administrator creates the actual
database and implements policies and procedures to
control and protect the database.
Data & Database
Administration cont..
 The database administrator works within the guidelines
set by the data administrator. The tasks include selecting
the storage structure and access strategies, defining
authorization checks and validation procedures, defining a
storage strategy for back-up and recovery, managing data
changes and maintaining database programs, monitoring
performance and responding to changes in user
requirements.
 The database administrator is responsible for the overall
control of the system at a technical level.
Information Resource
Dictionary System
 There are many tools used to support the database
administrator's tasks in an enterprise.
 Some of these tools are used during the design phase of
a database system, such as Computer Aided Software
Engineering (CASE) tools that are dedicated to database
design.
 Most existing tools need some sort of a directory or a
dictionary to maintain all definitions and usage of data
and other database objects.
 This repository of metadata is known as a system catalog
or a data dictionary system (DDS).
 As metadata needs to be more accessible and sharable,
there is a need for a tool that can interface and support
various functions found within a database system.
 This is known as the Information Resource Dictionary
System (IRDS).
 Hence, the system catalogue or data dictionary should
be accessible to various users and IRDS defines a set of
access methods for a data dictionary.
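To make the idea of a queryable catalog concrete, here is a small sketch using SQLite through Python's sqlite3 module. SQLite keeps its metadata in a built-in table called sqlite_master; this is only one DBMS's catalog and is not the IRDS standard itself. The course table and course_names view are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_code TEXT PRIMARY KEY, course_name TEXT)")
conn.execute("CREATE VIEW course_names AS SELECT course_name FROM course")

# sqlite_master is SQLite's catalog: one row per schema object.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()

# PRAGMA table_info returns column-level metadata for one table;
# positions 1 and 2 of each row hold the column name and declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(course)")]
```

The catalog is read with the same query language as ordinary data, which is what makes the metadata accessible to both users and tools.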
Information Resource
Dictionary System cont..
 Other objectives of deploying a DDS include:
1. Using it to help establish effective communication
between the designer and users and among the users
themselves.
2. Controlling the costs of developing and maintaining
applications.
3. Assisting in achieving data independence by permitting
applications to access data without knowledge of the
location or storage characteristics of the data in the
system.
4. Supporting meta-data (data about data) across
computing environments.
 A number of users benefit from the DDS.
 Database administrators use the DDS as an
information source for designing, monitoring, and tuning
physical DB structures.
 Data administrators would use the DDS to inventory the
data resource, implement standards, design external
Information Resource
Dictionary System cont..
 Application personnel would use the DDS to design the
system, to analyze system changes, and to reduce
program coding.
 Operating staff retrieve information about various jobs
from the data dictionary.
 End users can obtain from the data dictionary
descriptions of the data used in their external schemas,
and data auditors may examine the documentation
provided by the DDS.
 There are various tools that may interface to the DDS
(see Figure 1). There are two types of interfaces. The first
type of interface is with people.
 It may involve the database administrator, system
programmer, systems analyst, application programmer,
management, end user and auditor.
 The second type of interface is with various software
systems such as DBMS, compilers, operating system and
report generators. A DDS can be either an active DDS or
a passive DDS.
Information Resource
Dictionary System cont..
Lifecycle of Database
system development
 The database application lifecycle is inherently
associated with the information system lifecycle.
 The stages, as shown in Figure 2, are not strictly
sequential. The process is iterative with feedback loops.
The stages shown in Figure 2 can be grouped into four
main areas.
 These are requirements elicitation; database design
(conceptual, logical, and physical design); database
implementation; and operational maintenance.
 Prior to the first phase of requirements elicitation, a
plan must be put forward showing each stage of the life
cycle and how it can be performed efficiently and
effectively.
 This is known as the planning phase. The next phase is the
system definition phase, where the scope and boundaries
of the database application, which include the major
application areas and user groups, are defined.
Lifecycle of Database
system development cont..
 The phase of requirement collection and analysis is
considered to be the most important phase in building
any information system.
 It is concerned with collecting the data and analysing the
user requirements in order to build the database
application.
 The output of the requirement analysis phase is used to
produce the conceptual model. The conceptual model is
used, in turn, to create the physical model. At this stage,
a suitable DBMS may be selected.
 Various user preferences should be considered at this
stage. This involves designing the user interface and the
application programs that use and process the database.
This may entail building a prototype.
 The prototype is a working model of the database
application, which allows the designers or users to
visualise and evaluate how the final system will look and
function.
Lifecycle of Database
system development cont..
 The physical design can be implemented at this stage,
whether or not a prototype has been built.
 The implementation phase involves the creation of the
external, conceptual and internal definitions and the
application programs.
 The system is then tested by validating the application
programs against the user requirements. The testing phase
aims at executing the application programs with the
intention of finding errors.
 After the testing phase, the operational maintenance
process continues. This process involves monitoring &
maintaining the system.
Lifecycle of Database
system development cont..
End of Unit Four
Conceptual Database
Modeling
 The aim of this lecture is to introduce you
to the purpose and importance of conceptual
modeling and to present the basic concepts
associated with the Entity-Relationship Model
(ERM).
 The lecture will cover the following topics
briefly:
1. Entity-Relationship Model concepts
2. Top-down and bottom-up modeling
approaches
3. Development of the Conceptual Data Model
(CDM) through case studies
Entity-Relationship Model
concepts
 One of the initial steps in the database design and
development process is analysing the data and building
the data model.
 A data model should be developed before designing and
developing the actual tables used to hold the data.
 There are various approaches for data model
development. The most common approaches are the top-
down and bottom-up approaches
 Top-down approach
 The top-down approach to data modeling follows these
steps in order:
1. Identify data entities: Any business "objects" -- persons,
places, things or events. These entities may be tangible
(such as a product) or intangible (such as a business
transaction). Deciding whether an object is an entity or an
attribute of an entity depends on the application and
on how important that object is for holding information in
that particular application.
Entity-Relationship Model
concepts cont..
 2. Determine attributes of the entities: Attributes are
data directly concerning the entity. They are used to
describe the entity itself regardless of its association to
other entities in the environment.
 3. Determine the nature of the relationships: This
depends upon established business policies. There are
two types of constraints (i.e. applicable business rules)
that need to be addressed here:
 The first one is called the cardinality of a relationship.
 This refers to the number of instances of each of the
entity types concerned that participate in a relationship.
 The second type of constraint determines whether the
participation of instances in a specific relationship is
required.
 If it applies to every instance of an entity type, then it
is called a total or mandatory participation (every
instance of that entity type must participate).
Entity-Relationship Model
concepts cont..
 If only some of the instances participate in a relationship,
in other words an instance may or may not participate in that
relationship, then this is called a partial or optional
participation.
 The advantage of this approach is that it usually results in
a data model that is well organized. The disadvantage is that
details can be easily overlooked.
 Bottom-up approach
 The steps for this approach are:
1. Gather information on data used by the organization by
examining current files and evaluating existing reports and
forms
2. Group gathered information into Entities and Attributes
3. Identify relationships and determine their nature.
 The advantage of this approach is that it ensures no important
data is overlooked. The disadvantage is that the overall
organization of the model may not be so apparent.
 It is often a good idea to combine both the top-down and
bottom-up approaches to crosscheck the design for both
completeness and good organization.
Entity-Relationship Model
Symbols
Development of the
Conceptual Data Model
(CDM)
 The following terminology will be used extensively:
 Entity: An object or concept of fundamental importance
about which there is a need to record data; the object or
concept rather than the qualities relating to the object
(e.g. Lecturer, Course)
 Attribute: A descriptive characteristic of an entity (e.g.
Lecturer-name, Course duration).
 Relationship: An association between one or more
entities (e.g. between Lecturer & Course, there is the
relationship "supervisor").
 The development of the CDM proceeds in four stages.
These are:
 1. Establish what information you have available, and
what (if any) extra information you require.
 2. Decide what constitutes Entities. This stage requires
an in-depth understanding of the meaning of the data
Development of the
Conceptual Data Model
(CDM) cont..
 Once the entities have been fixed, the attributes comprising
these entities, and the relationships between them may be
noted.
 Note, however, that this may well be an iterative process,
with the initial entities being broken down further into
smaller entities, linked by further relationships.
 3. Produce a diagrammatic representation of the
relationships between the entities.
 4. Complete the diagram by the addition of the other
characteristics of the relationships.
 Note that in practice, the above steps are often taken
together, rather than in strict sequence.
 E/R modelling is a tool for database design. It is also a
tool for communication between the database designer and
users during the system analysis and design process.
 It is used for constructing a conceptual data model, which
is independent of the development platform (or software).
Development of the
Conceptual Data Model
(CDM) cont..
 E/R model consists of Entities, Attributes, Relationships,
Subtypes and Supertypes.
 Case study /1: Data Analysis
 Consider the example problem given below for the
Lecturers/Courses example. The information involved, as
obtained from the written description, is as follows:
 Courses
1. Duration (years);
2. No. of Modules to be studied each year;
3. Course name;
4. Course-code;
5. Lecturer name of the supervisor;
 For each module of the course:
1. Module name,
2. Description,
3. Names of the pre-requisite modules,
4. Name of the lecturer who teaches the module;
Development of the
Conceptual Data Model
(CDM) cont..
 Lecturers
1. Name
2. Address, etc.
 For each module that the lecturer can teach:
1. Module-name
2. Experience in teaching it.
 Now, while there are no definitive rules as to what
constitutes an entity, the principal rule is to look for
"objects or concepts of fundamental interest" about
which we have information.
 Obvious candidates are COURSE and LECTURER.
 Other points to look for when trying to isolate entities
and their attributes are:
 Attributes of entities, which are associated with another
entity.
 Attributes which appear in more than one entity.
Development of the
Conceptual Data Model
(CDM) cont..
 Attributes which themselves have attributes.
 Attributes which occur a variable number of times.
 Remember that these are only points to look out for, they
are not rules. At all times, it is an understanding of the
meaning of the data, which is important
 By applying logical reasoning, COURSE & LECTURER will
obviously be entities, with "SUPERVISOR" being a
relationship between one occurrence of a COURSE entity
and one occurrence of a LECTURER entity.
 MODULE is also an object of fundamental importance
(and a module also has attributes in its own right).
 At this stage, we have three entities, LECTURER, COURSE
& MODULE. The relationships between these are as
follows:
Development of the
Conceptual Data Model
(CDM) cont..
 Between COURSE & LECTURER is the relationship SUPERVISOR
 Between COURSE & MODULE is the relationship COURSE-MODULE
 Between MODULE & LECTURER is the relationship CAN-TEACH
 Between MODULE & LECTURER is the relationship DOES-TEACH
 Between MODULE & MODULE is the relationship PRE-REQUISITE
Development of the
Conceptual Data Model
(CDM) cont..
 Note that now we have much of our original information in the
form of relationships between entities. If we now list the
attributes of each of the entities, we get the following:
 Course
Duration (yrs), No-modules to be studied each year,
Course-name and Course-code.
 Lecturer
Lecturer-name and Address,etc
 Module
Module-name and Description
 Using the Chen notation, we can now draw the basic
diagrammatic data model.
 Note that there are 1:1, 1:N & M:N relationships.
 Can-Teach is an M:N relationship. This means that a
LECTURER can teach several MODULES, and a MODULE
can in general, be taught by several LECTURERS.
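The 1:1, 1:N and M:N labels can be explored on sample data with a short Python sketch. A cardinality is really a business rule about every possible instance, so the function below only reports what one sample happens to exhibit. The lecturer, course and module names are hypothetical.

```python
from collections import Counter

def cardinality(pairs):
    """Classify a binary relationship given as a set of (left, right) pairs."""
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    # Does some left-hand instance have several partners, and vice versa?
    left_many = any(n > 1 for n in left_counts.values())
    right_many = any(n > 1 for n in right_counts.values())
    if left_many and right_many:
        return "M:N"
    if left_many or right_many:
        return "1:N"
    return "1:1"

# SUPERVISOR: one lecturer per course, one course per lecturer (sample data).
supervisor = {("BSc Computing", "Smith"), ("BSc Mathematics", "Jones")}

# CAN-TEACH: a lecturer can teach several modules and a module can be
# taught by several lecturers (sample data).
can_teach = {("Smith", "Databases"), ("Smith", "Networks"), ("Jones", "Databases")}

supervisor_kind = cardinality(supervisor)
can_teach_kind = cardinality(can_teach)
```

A sample can show that a relationship is at least M:N, but it cannot prove that a relationship will always stay 1:1; that guarantee has to come from the business rules.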
Development of the
Conceptual Data Model
(CDM) cont..
 ERD for the above case
Development of the
Conceptual Data Model
(CDM) cont..
 Case study /2: Data Analysis Exercise
 Requirements for the Company Database
 The company is organized into Departments. Each
department has a name, number, and an employee who
manages the department. We keep track of the start
date of the department manager. A department may have
several locations. Each department controls a number of
Projects. Each project has a name, number, and is
located at a single location. We store each employee's
social security number, address, salary, sex, and birth
date. Each employee works for one department but may
work on several projects. We keep track of the number of
hours per week that an employee currently works on
each project. We also keep track of the direct supervisor
of each employee. Each employee may have a number of
Dependents. For each dependent, we keep their name,
sex, birth date, and relationship to the
employee.
Development of the
Conceptual Data Model
(CDM) cont..
Development of the
Conceptual Data Model
(CDM) cont..
 In the diagram above, we have three basic entity types
(rectangle shape) and one weak entity type (rectangle
shape with a double-line border). The three basic entity
types are Employee, Project and Department.
 A few attributes are listed for each entity. One attribute
from each entity is identified as the primary key
(underlined).
 There are composite attributes such as the Name of an
employee, and multi-valued attributes such as the Location
of a department (oval with a double-line border).
 The attribute NumberOfEmployees is a derived attribute.
This means that we do not store its value; instead it is
defined by an operation that is calculated at run-time.
 There are four binary relationships and one unary
relationship. The relationship WORKS_FOR specifies that
a department must have some employees working for it
and an employee must work for one department only.
Development of the
Conceptual Data Model
(CDM) cont..
 The MANAGES relationship indicates that a
department must be managed by one employee only and
an employee may or may not manage a department. We
record the starting date of such a role in the relationship
attribute StartDate.
 The relationship WORKS_ON indicates that a project
must have many employees working on it and each
employee must work on at least one project.
 For each employee we keep the working hours spent on
a specific project in the relationship attribute Hours.
 The SUPERVISION relationship is a unary relationship
indicating that an employee may or may not supervise
many employees and an employee may or may not be
supervised by another employee.
 The entity DEPENDENT is a weak entity type. A weak
entity type is an entity type that does not have a key
attribute of its own.
Development of the
Conceptual Data Model
(CDM) cont..
 A weak entity type must participate in an identifying
relationship type with an owner or identifying entity type.
 Entities of a weak entity type are identified by the
combination of:
 A partial key of the weak entity type.
 The particular entity they are related to in the identifying
entity type.
 DEPENDENT is identified by the dependent’s first name
and date of birth, and the specific Employee that the
dependent is related to.
 Employee is its identifying entity type via the identifying
relationship type Dependent_of.
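The identification rule for a weak entity can be sketched in a few lines of Python. This is an illustration only: the dictionaries, the add_dependent helper and the sample social security numbers are all invented here. Each dependent is stored under a key made of the owner's identifier plus the partial key (first name and birth date).

```python
employees = {"111": "Smith", "222": "Jones"}   # owner entities, keyed by SSN
dependents = {}                                # weak entities

def add_dependent(owner_ssn, first_name, birth_date, relationship):
    """Store a dependent under (owner key + partial key)."""
    if owner_ssn not in employees:
        # A weak entity cannot exist without its identifying owner.
        raise ValueError("unknown employee: " + owner_ssn)
    key = (owner_ssn, first_name, birth_date)
    if key in dependents:
        raise ValueError("duplicate dependent for this employee")
    dependents[key] = relationship

# The same partial key is allowed under different owners...
add_dependent("111", "Anna", "2010-05-01", "daughter")
add_dependent("222", "Anna", "2010-05-01", "daughter")

# ...but a dependent with no existing owner is rejected.
try:
    add_dependent("999", "Tom", "2012-01-01", "son")
    orphan_rejected = False
except ValueError:
    orphan_rejected = True
```

The partial key alone would not tell the two Annas apart; only the combination with the owning employee does.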
End of Unit Five
Relational Database
Systems
 The aim of this lecture is to introduce the
relational database system and cover its
basic structures.
 The lecture will cover the following topics:
1. Introduction to Relational Databases
2. Relational Data Structures;
3. Relational Keys;
4. Relational Integrity Constraints;
5. Classifying a DBMS as Relational (12 Rules)
Introduction to Relational
Databases
 Over the years, a number of models have been adopted
for implementing database systems.
 The most common data model is the relational model. Its
popularity lies in its simplicity.
 It uses a simple constructor to organize data as
sets of homogeneous records. This is known as a
relation.
 The relational approach was originally proposed in 1970
and the first project that proved the practicality of the
relational model was System R, developed at IBM's San
Jose Research Laboratory in 1976.
 This development was followed by more research projects
such as INGRES.
 Commercial systems based on the relational model
started to appear in the late 1970s and early 1980s.
 Now there are several hundred relational DBMSs for
micro and mainframe computers, such as Oracle, Sybase,
MS Access and FoxPro.
Introduction to Relational
Databases cont..
 The relational model tends to treat the data in a
disciplined fashion.
 The model was proposed as a disciplined way of handling
data using the rigour of mathematics, particularly set
theory.
 This would enhance the concept of program-data
independence and improve programmer activities.
 The relational approach became the most important and
popular data model for most business applications due to
its simplicity.
 It gives a higher level of abstraction compared with other
models such as the Hierarchical and network models.
 These models are closer to physical structures. In the
relational model we have only values. Even references
between data in different sets (relations) are represented
by means of values.
 In the hierarchical and network model there are explicit
references (pointers), which make them more
complicated.
Relational Data Structure
 The Relational approach is based on elementary
mathematical relation theory. Its basic construct is a
relation.
 A relation, also called a table, looks like a traditional
sequential file.
 The data is organized in tables. The table has columns
and rows and it only applies to logical structure of the
database, not the physical structure.
 Columns of tables are referred to as attributes. Rows of
tables are referred to as tuples.
 Domain is a set of allowable values for one or more
attributes.
 The mathematical representation of the relational model
could be summarized as follows:
 Given a collection of sets D1, D2, …, Dn (not necessarily
distinct), R is a relation on these n sets if it is a set of
ordered n-tuples (d1, d2, …, dn) such that d1 ∈ D1,
d2 ∈ D2, …, dn ∈ Dn.
Relational Data Structure
cont..
 The sets D1, D2, …, Dn are the domains of R; the value of
n is the degree of R, where n refers to the number of
columns in the table/relation (e.g. if n=1 then R is a
unary relation, if n=2 then R is binary, etc.).
 The degree of a relation is the number of attributes in
that relation.
 The cardinality is the number of tuples in a relation.
 The relational database is a collection of normalized
relations.
 Relation name is distinct from all other relations. Each
cell of relation contains exactly one atomic (single) value.
 Each attribute has a distinct name. Values of an attribute
are all from the same domain.
 Figure 1, shows a sample of a relational table for music
CDs.
Relational Data Structure
cont..
(Columns are attributes; rows are tuples.)

CD ID   Title                 Artist            Price
AB1     Stranger and Others   Acker Bilk        £10.0
AB2     All That Trad         Acker Bilk        £10.0
DB1     Drinking & Courting   Dubliners         £10.0
DB2     The Hard Stuff        Dubliners         £6.5
DB3     More Genius           Dubliners         £8.8
GH1     Country Style         George Hamilton   £7.5
JC1     San Quinton           Johnny Cash       £8.5
SG1     Yuckee Spice          Spice Girls       £1.5
SP1     Spinners Story Book   Spinners          £8.5
SP2     10 Of The Best        Spinners          £8.5
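As a small sketch, the CD relation can be written in Python as a set of tuples, which makes its degree and cardinality easy to compute. Representing prices as plain numbers in pounds is a choice made for this example.

```python
# Attribute names, in the order used by each tuple.
attributes = ("CD ID", "Title", "Artist", "Price")

# The relation itself: a set of 4-tuples.
cd_relation = {
    ("AB1", "Stranger and Others", "Acker Bilk", 10.0),
    ("AB2", "All That Trad", "Acker Bilk", 10.0),
    ("DB1", "Drinking & Courting", "Dubliners", 10.0),
    ("DB2", "The Hard Stuff", "Dubliners", 6.5),
    ("DB3", "More Genius", "Dubliners", 8.8),
    ("GH1", "Country Style", "George Hamilton", 7.5),
    ("JC1", "San Quinton", "Johnny Cash", 8.5),
    ("SG1", "Yuckee Spice", "Spice Girls", 1.5),
    ("SP1", "Spinners Story Book", "Spinners", 8.5),
    ("SP2", "10 Of The Best", "Spinners", 8.5),
}

degree = len(attributes)        # number of attributes (columns)
cardinality = len(cd_relation)  # number of tuples (rows)

# Every tuple must supply exactly one value per attribute.
well_formed = all(len(row) == degree for row in cd_relation)
```

Here the degree is 4 and the cardinality is 10; adding or deleting a CD changes the cardinality but not the degree.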
Relational Data Structure
cont..
 Relation name is distinct from all other relations within
one database schema. Each cell of relation contains
exactly one atomic (single) value.
 Each attribute has a distinct name within the relation.
 Values of an attribute are all from the same domain. The
order of attributes has no significance.
 Each tuple is distinct, and there are no duplicate tuples.
 Order of tuples has no significance, theoretically.
 There are a number of operations that can be performed
on a relation, which originate from mathematical
set theory.
 It is essential to appreciate that the model has this
strong theoretical underpinning.
 For instance, a set of numbers {32, 5, 99, 1066};
another set of numbers could be {1}.
 A set of people could be {Chan, Mohammed, Peter,
Sara}. A subset is part of a full set.
Relational Data Structure
cont..
 For instance {Mohammed, Sara} is a subset of the
previous set. We are interested in the ways in which sets
can be processed mathematically.
 For example, the union operation to combine two sets
into one: the union of the sets {32, 5, 99, 1066} and
{1} is the set {32, 5, 99, 1066, 1}.
1. The importance of the previous example is that all
members of the set are of the same type.
2. Only one instance of any item is held in a set. For
instance, a set {Chan, Sara, Chan} is not a proper set
and would be expressed correctly as {Chan, Sara}.
3. The sequence of items in the set is not significant,
therefore the set {Mohammed, Peter} is the same as
{Peter, Mohammed}.
 These simple set principles can be used to develop the
concepts of a relation as described in a future lecture.
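These set principles can be tried out directly with Python's built-in set type, using the same example values as above.

```python
numbers = {32, 5, 99, 1066}

# Union combines two sets into one.
union = numbers | {1}

# Only one instance of any item is held in a set.
people = {"Chan", "Sara", "Chan"}

# The sequence of items is not significant.
same = {"Mohammed", "Peter"} == {"Peter", "Mohammed"}

# A subset is part of a full set.
is_subset = {"Mohammed", "Sara"} <= {"Chan", "Mohammed", "Peter", "Sara"}
```

Because a relation is a set of tuples, the same rules explain why a relation holds no duplicate tuples and why the order of its tuples carries no meaning.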
Relational Keys
 Application programs access the table columns by name.
 Most relations have one attribute whose values uniquely
identify the tuples of a relation.
 Consider the schema in Figure 2 for a suppliers and parts
database.
 It consists of three tables: Part (P), Supplier (S) and
Shipment (SP).
Relational Keys cont..
Supplier (S)
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris

Part (P)
P#   PNAME   COLOUR   WEIGHT   CITY
P1   Nut     Red      12       London
P2   Bolt    Green    17       Paris
P3   Screw   Blue     17       Rome
P4   Screw   Red      14       London

Shipment (SP)
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S2   P1   300
S2   P2   400
S3   P2   200
Relational Keys cont..
 Each tuple in any of the tables in Figure 2 occurs only
once. This is to preserve the property of uniqueness for
each record in a table.
 That is, the set of all attribute values for any single row
guarantees a unique instance that relates to a real-world
entity.
 This complete set of attributes that can be used to
identify any row is called the Superkey.
 The Superkey is an attribute or a set of attributes that
uniquely identifies a tuple within a relation.
 For example, the values (S3, P2, 200) identify the last
row in the Shipment table.
 The Superkey may contain additional attributes that are
not necessary for unique identification.
 If we can pick the minimum set of attributes (or a single
attribute) that can uniquely identify a tuple, then this
identifier is called a candidate key.
 For example, if we assume that the name of the supplier
is unique in the supplier table, then we would have two
candidate keys. The S# and the SNAME.
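The difference between a superkey and a candidate key can be sketched in Python over the Supplier rows. The function names are invented for this illustration. Note that a key is a property of the schema: checking one table instance can only show that an attribute set is consistent with being a key, or prove that it is not one.

```python
from itertools import combinations

supplier = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(rows)

def is_candidate_key(rows, attrs):
    """A candidate key is a superkey with no smaller superkey inside it."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

sno_is_candidate = is_candidate_key(supplier, ("S#",))
sname_is_candidate = is_candidate_key(supplier, ("SNAME",))
# (S#, CITY) identifies every row, but CITY is unnecessary, so it is
# a superkey without being a candidate key.
wide_is_superkey = is_superkey(supplier, ("S#", "CITY"))
wide_is_candidate = is_candidate_key(supplier, ("S#", "CITY"))
city_is_superkey = is_superkey(supplier, ("CITY",))
```

In this small sample STATUS also happens to be unique, which shows why keys must be chosen from the meaning of the data rather than from whatever rows are currently stored.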
Relational Keys cont..
 We can choose one of these candidate keys and use it as
the main identifier for the table, then we call it the
primary key.
 For example, P# in relation P uniquely identifies a part.
Such a key is called a primary key. Each relation must
have a primary key.
 This is to enforce the property where duplicate rows are
forbidden in a relation.
 A primary key is one or more columns of a table whose
values are used to uniquely identify each of the rows
within a relation.
 For example, S# for the Supplier table and {S#, P#} for
the Shipment table. The primary key of the Shipment
table is called a composite primary key, since it consists
of more than one attribute.
 As a common convention the primary key is usually
underlined in a relational schema.
 A candidate key that was not picked as a primary key
could be used as an alternative key for the table.
Relational Keys cont..
 For example, S# is the primary key for the Supplier
table and SNAME is the alternative key for this table
(assuming no duplicate names are allowed in the table).
 One of the main characteristics of the relational model is
that it is a value-based system. This means that data and
relationships are represented as values.
 Hence, relationships between tables are represented as
values. These values are used to link tables of the
database, see Figure 3.
 The composite primary key of the Shipment table
consists of two attributes, which originate from two
separate tables.
 These are P# from the Part table and S# from the
Supplier table. In addition to its role as part of the
primary key, P# in the Shipment table is a foreign key
(and so is S#).
 A foreign key is an attribute or set of attributes within
one relation that matches a candidate key of some
(possibly the same) relation.
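The difference between a superkey and a candidate key can be checked mechanically on a table instance. The sketch below is only an illustration: the helper names is_superkey and candidate_keys are hypothetical, and the rows are the Shipment rows of the supplier/part example. An instance can only rule keys out; whether a set of attributes really is a key depends on the meaning of the data, not on one snapshot.

```python
from itertools import combinations

# Shipment rows from the supplier/part example: (S#, P#, QTY)
ATTRS = ("S#", "P#", "QTY")
SHIPMENT = [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
]

def is_superkey(rows, attrs, key):
    """True if no two rows agree on every attribute in `key`."""
    idx = [attrs.index(a) for a in key]
    seen = {tuple(row[i] for i in idx) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows, attrs):
    """Superkeys of this instance that contain no smaller superkey."""
    found = []
    for size in range(1, len(attrs) + 1):
        for key in combinations(attrs, size):
            minimal = not any(set(k) <= set(key) for k in found)
            if minimal and is_superkey(rows, attrs, key):
                found.append(key)
    return found

keys = candidate_keys(SHIPMENT, ATTRS)
```

On this particular instance {S#, QTY} also happens to be unique, which is a coincidence of the sample data; only {S#, P#} is a key by the meaning of the table.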
Relational Integrity
Constraints
 The term integrity refers to the accuracy or correctness
of data in the database. The database might be subject
to any number of integrity constraints.
 The integrity constraints are of many types. The first
type is associated with the data model chosen for the
database.
 For our discussion these are the relational integrity
constraints. Another type of constraint is associated with
the business application.
 These are also called enterprise constraints.
 There are a number of different relational integrity
constraints. Since every attribute is drawn from a specific
domain, then there are constraints on the type and the
range of values that an attribute can have.
 Constraints of this type are called domain constraints.
For example, the status of a supplier can be poor,
satisfactory, good or excellent.
 The domain constraint specifies the legal values for a
given type.
Relational Integrity
Constraints cont..
 Whether an attribute may be left without a value depends
on whether it participates in the primary key of the
relation.
 If the attribute is part of the primary key then it must
have a value, as the primary key uniquely identifies a
tuple and hence a missing value causes problems.
 Allowing a missing value would imply that not all of the
key's attributes are required for unique identification,
which contradicts the minimality of a primary key.
 This constraint is known as the entity integrity
constraint. It states that no attribute of a primary key
can be null.
 Non-prime attributes may have a missing value (it
depends on their importance to the business application).
 If the value is not essential, for example, a customer
may or may not have a mobile phone, then a “Null” is
assigned.
 A null represents an unknown value. This can be
ambiguous, as it could mean that the attribute is
not applicable to that specific tuple, or that it is
applicable but no value has yet been supplied.
Relational Integrity
Constraints cont..
 For example, consider an employee record that has a field
for the number of children.
 A null could be read as meaning that this field is not
applicable, or that the employee has not supplied this
information, or that the field is applicable but the number
of children is zero, etc.
 A null represents the absence of a value and is not the
same as zero or blanks, which are values. It deals with
incomplete or exceptional data.
 As mentioned above, schema navigation in the relational
model is based on linking tables by matching their
values.
 Figure 3 presented two links from attributes in
two tables (the primary keys of the Supplier and Part tables)
to attributes in another table (the foreign keys in the
Shipment table; the two foreign keys also have the role of a
primary key for that table).
 The value must exist so the links can be established. If
a foreign key exists in a relation, either the foreign key
value must match a candidate key value of some tuple in
its home relation or the foreign key value must be wholly
null.
Relational Integrity
Constraints cont..
 This is to preserve the navigation. This is known as the
Referential integrity constraint.
 Referential integrity brings up special problems regarding
the updating or deleting of rows in a table.
 For example, consider deleting the Supplier tuple for
Smith in Figure 2, or changing the value of the S# of
Jones from S2 to S4.
 These actions will invalidate all references to the S# in
the Supplier table. There are three possible strategies to
avoid this problem:
1. Restrict the operation: ban any update or deletion of the
primary key as long as some foreign keys reference it.
2. Cascade the effect of the operation (update or delete)
on the original row to all rows in all tables that reference
it.
3. Nullify (set to null) all corresponding foreign key values,
and allow the updates or deletion to take place in the
original table.
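The three strategies can be tried out with SQLite through Python's standard sqlite3 module. This is a minimal sketch under stated assumptions: the helper delete_supplier and the column names sno, pno and qty are hypothetical, only two suppliers are loaded, and sno is deliberately not part of a primary key in SP so that the nullify strategy is permitted. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

def delete_supplier(action):
    """Delete supplier S1 under the given ON DELETE action and
    return the shipment rows that remain afterwards."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    con.execute("CREATE TABLE S (sno TEXT PRIMARY KEY, sname TEXT)")
    con.execute(
        "CREATE TABLE SP (sno TEXT REFERENCES S(sno) ON DELETE "
        + action + ", pno TEXT, qty INTEGER)"
    )
    con.executemany("INSERT INTO S VALUES (?, ?)",
                    [("S1", "Smith"), ("S2", "Jones")])
    con.executemany("INSERT INTO SP VALUES (?, ?, ?)",
                    [("S1", "P1", 300), ("S2", "P1", 300)])
    try:
        con.execute("DELETE FROM S WHERE sno = 'S1'")
    except sqlite3.IntegrityError:
        pass  # RESTRICT: the delete is refused, nothing changes
    rows = con.execute("SELECT sno, pno, qty FROM SP").fetchall()
    con.close()
    return rows

restricted = delete_supplier("RESTRICT")   # both shipments survive
cascaded = delete_supplier("CASCADE")      # the S1 shipment is removed
nullified = delete_supplier("SET NULL")    # the S1 reference becomes NULL
```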
Classifying a DBMS as a
Relational System (12 Rules)
 During the mid 1980s there was a growing concern at the
large number of DBMSs claiming to be relational.
 In response to this, E. F. Codd (the founder of the
relational theory) published a number of rules which
attempted to define the key features that a relational
DBMS should offer.
 If these rules are not satisfied, the product should not be
considered relational.
 These rules are categorized into five functional areas:
foundational, structural, integrity, data
manipulation and data independence rules.
The foundation rules:
 Rule 0: The foundation rule
 Any system that is marketed as a relational DBMS must
be self-contained. This implies that the system should be
able to manage databases entirely through its relational
capabilities.
 Rule 12: The non-subversion rule
 If a relational system has a single-record-oriented
language (low-level), it should not be used to bypass
integrity rules and constraints that were expressed in a
multiple-record-oriented language (high-level).
The structural rules:
 Rule 1: The information rule
 All information should be explicitly represented at the
logical level in exactly one way: as values in tables.
 Rule 6: The view updating rule
 All views that are theoretically updateable are also
updateable by the system.
The integrity rules:
 Rule 3: The Systematic treatment of null values
rule
 Null values are supported for representing missing
information and inapplicable information in a systematic
way, independent of data type.
 Rule 10: The integrity independence rule
 Integrity constraints specific to a particular relational
database must be definable in the relational data sub-
language and storable in the catalog, not in the
application programs. Storing the constraints in the
system catalog has the advantage of centralized control
and enforcement.
The data manipulation rules:
 Rule 2: The Guaranteed access rule
 Each and every datum (atomic value) in a relational
database is guaranteed to be logically accessible by
resorting to a combination of table name, primary key
value and column name.
 Rule 4: The dynamic on-line catalogue based on the
relational model
 The database description is represented at the logical level
in the same way as ordinary data, so that authorized users
can apply the same relational language to its interrogation
as they apply to regular data.
 Only one language should be used for manipulating
metadata as well as data, and only one logical structure
(relations) should be used to store system information.
The data manipulation rules
cont..:
 Rule 5: The comprehensive data sub-language rule
 A relational system may support several languages and
various modes of terminal use.
 However, there must be at least one language whose
statements can express all of the following items:
 (1) data definition; (2) view definition; (3) data
manipulation (interactive and by program); (4) integrity
constraints; (5) authorisation; (6) transaction boundaries
(begin, commit, and rollback).
 Rule 7: The high-level insert, update, and delete
rule
 The capability of handling a base relation or a derived
relation as a single operand applies not only to the
retrieval of data but also to the insertion, update, and
deletion of data.
The data independence rules:
 Rule 8: The physical data independence rule
 Application programs and interface activities remain
logically unimpaired when changes are made to storage
representations or access methods.
 Rule 9: The logical data independence rule
 Application programs and interface activities remain
logically unimpaired when information-preserving
changes of any kind that theoretically permit
unimpairment are made to the base tables.
 Rule 11: The distribution independence rule
 The relational DBMS should support distribution
transparency. The users should not be aware of any
distribution-related issues when they use the database.
For them the database should be used as if they were
accessing a centralized database system.
End of Unit Six
Relational Database
Languages
 The aim of this lecture is to introduce you to the basis of
the database programming languages that are used to support
operations over relations.
 These are relational algebra and relational calculus. The
lecture will cover the following topics.
 Relational algebra
 Basic operations:
 Select, Project, Cartesian product, Union &
Difference
 Derived operations:
 Join, Intersection & Division
 Relational calculus
 Tuple-oriented
 Domain-oriented
Introduction
 The relational data model must support a language to
define various types of operations on data.
 The language, as covered in a previous lecture, consists
of two main parts: the data definition language (DDL)
and the data manipulation language (DML).
 This language must support various functions including
data retrieval, update and schema modification.
 Some of these languages are procedural and others are
non-procedural.
 The relational algebra is a high-level procedural
language.
 We can build a new relation from one or more relations
in the database, using relational algebra operations.
 The relational calculus is a non-procedural language. It
can be used to formulate the definition of a relation in
terms of one or more database relations
Introduction cont..
 The relational algebra and relational calculus
are equivalent to one another: it has been shown that
any retrieval expression that can be specified in relational
algebra can also be specified in the relational calculus,
and vice versa.
 This concept of equivalent expressive power of the two
languages has led to the definition of the concept of a
relationally complete language.
 Relational completeness is used as a measure for any
database programming language.
 A programming language is considered to be relationally
complete if we can express in that language any query
that can be expressed in relational calculus.
 Both the algebra and the calculus are formal languages
that are not user-friendly.
 They are used as the basis for other, high-level DMLs for
relational databases.
 They also illustrate the basic operations required of
any DML.
Relational Operations of
Relational Algebra
 Relational algebra is a theoretical language with
operations that can derive new relations from original
relations.
 There are five basic operations of relational algebra.
 These are Selection (Restrict), Projection, Cartesian
product (Times), Union and Set difference (Minus).
 There are additionally three derived operations:
 Join, Intersect and Divide. They are derived in that they
can be built from the basic operations.
 However, it is convenient to treat them as if they were
basic operations. In fact, the Join operation is one of the
most commonly used relational operations and it merits
considerable attention in its own right.
 Relational algebra operations can also be classified into
two groups. These are traditional set operations (union,
intersection, etc.), and relational operations (selection,
projection, join, etc).
 All operations result in new relations.
Relational Operations of
Relational Algebra cont..
 The selection and projection operations are unary
operations since they operate on one relation.
 The other operations work on pairs of relations and are
therefore called binary operations.
 All the relational operations work at the relation level
alone. Their input must exist within the given database,
either as base relations or as views.
 The resulting relation (output) is a temporary relation
that has been derived by applying an operation to the
given relations.
 Both operands and results are relations, so output from
one operation can become input to another operation.
 This allows expressions to be nested, just as in
arithmetic. This property is called closure.
 Relational algebra is a set-oriented language in which all
tuples, possibly from different relations, are manipulated
in one statement without looping.
 The relational algebra as a formal language is very rarely
supported directly by a system.
 Most systems provide an additional interface, frequently
Relational Operations of
Relational Algebra cont..
 The task of the SQL interpreter is to break a given SQL
statement down into the series of algebraic operations
that will build the data set described by the statement.
 The rest of this section covers these operations.
 We will use the schema for the parts & supplier database
that we introduced in a previous lecture to illustrate the
output from each operation, see Figure 1.
Relational Operations of
Relational Algebra cont..
Supplier (S)
S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris
Part (P)
P#  PNAME  COLOUR  WEIGHT  CITY
P1  Nut    Red     12      London
P2  Bolt   Green   17      Paris
P3  Screw  Blue    17      Rome
P4  Screw  Red     14      London
Shipment (SP)
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S2  P1  300
S2  P2  400
S3  P2  200
The Select operation
 The SELECT operation is an operation for constructing a
horizontal subset of an existing relation, i.e. all rows
(tuples) of an existing relation, which can satisfy some
conditions.
 The notation used for the select operation is σ F (r); its
semantics are: σ F (r) = { t | t ∈ r and t
satisfies F}.
 For example, consider the following query on the table of
Figure 2:
 Get all employees who are under 30 years of age and
earn more than £4K.
 This query is presented in algebra notation as:
 σ Age<30 ∧ Salary>4000 (Employees)
The Select operation cont..
 This results in the table below:
 As another example, consider retrieving all the parts
that weigh less than 17 from the supplier/part
database instance presented in Figure 1.
 σ weight < 17 (Part) will produce:
P#  PNAME  COLOUR  WEIGHT  CITY
P1  Nut    Red     12      London
P4  Screw  Red     14      London
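The selection σ weight < 17 (Part) can be mimicked in a few lines of Python. This is only a sketch: the function name select and the representation of a relation as a list of dictionaries are assumptions made for illustration, and the rows are those of the Part table in Figure 1.

```python
# Part relation from Figure 1, one dictionary per tuple
PART = [
    {"P#": "P1", "PNAME": "Nut", "COLOUR": "Red", "WEIGHT": 12, "CITY": "London"},
    {"P#": "P2", "PNAME": "Bolt", "COLOUR": "Green", "WEIGHT": 17, "CITY": "Paris"},
    {"P#": "P3", "PNAME": "Screw", "COLOUR": "Blue", "WEIGHT": 17, "CITY": "Rome"},
    {"P#": "P4", "PNAME": "Screw", "COLOUR": "Red", "WEIGHT": 14, "CITY": "London"},
]

def select(relation, predicate):
    """sigma_F(r): keep the tuples t of r for which predicate F(t) holds."""
    return [t for t in relation if predicate(t)]

# sigma weight < 17 (Part)
light_parts = select(PART, lambda t: t["WEIGHT"] < 17)

# Conditions can be combined, as in the Age/Salary query:
light_london_parts = select(
    PART, lambda t: t["WEIGHT"] < 17 and t["CITY"] == "London"
)
```

The result keeps every column of the chosen rows, which is what makes selection a horizontal subset.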
The Project operation
 The project operator (π) is used for constructing a
vertical subset of a relation.
 The subset is obtained by selecting specified attributes and
eliminating the others (and also eliminating duplicate tuples
within the attributes selected).
 The notation used is:
 π Y(r), given a relation r(X) and a subset Y of X. The
semantics are: π Y(r) = {t[Y] | t ∈ r}.
 List the first and last names of all employees of Figure 2:
 The algebra expression is:
 π Surname, FirstName (Employees), and the result is:
 Another example is to project the city from the S table of
Figure 1, giving the table:
City
London
Paris
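Projection, including the elimination of duplicates, can be sketched in Python as follows. The function name project and the list-of-dictionaries representation are illustrative assumptions; the rows are the Supplier rows of the supplier/part example.

```python
SUPPLIER = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
]

def project(relation, attrs):
    """pi_Y(r): keep only the attributes in Y and drop duplicate tuples."""
    seen = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:          # a relation is a set: no duplicates
            seen.append(row)
    return [dict(zip(attrs, row)) for row in seen]

cities = project(SUPPLIER, ["CITY"])
```

Three supplier tuples go in but only two city tuples come out, because Paris appears twice and duplicates are removed.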
The join operation
 The join is the most typical operator in relational algebra.
It allows the establishment of connections among data in
different relations, taking advantage of the "value-
based" nature of the relational model.
 If two relations have a domain in common they may be
joined over that domain.
 The result of the join is a new relation of higher degree in
which each row is formed by joining together two rows,
one from each of the original tables, such that the two
rows concerned have the same value in the common
domain.
 The join operator is denoted by the “⋈” symbol.
 There are various forms of the join operation. The theta-
join is equivalent to a Cartesian product followed by a
select whose predicate compares attributes of the two
relations. The comparison operator can be any of the
common comparison operators (<, ≤, >, ≥, =, ≠).
 The degree of a theta-join is the sum of the degrees of the
operand relations.
 If the join predicate contains only equality (=), the term
equi-join is used to refer to the join operation.
The join operation cont..
 An equi-join that eliminates one of its duplicate columns
is called a natural join.
 The tuples in the result are obtained by combining tuples
in the operands with equal values on the common
attributes.
 The common attributes often form a key of one of the
operands. References are realized by means of keys, and
we join in order to follow references.
 Consider the example in Figure 3. The tables offences
and cars are joined (natural join) over the car
registration number.
 This type of join is the standard type of join and it is
sometimes called an inner join.
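The relationship between the theta-join, the equi-join and the natural join can be illustrated with the Supplier and Shipment tables of the supplier/part example. The sketch below makes some assumptions for illustration: the function names theta_join and natural_join are hypothetical, and the theta-join prefixes attribute names with "r." and "s." so that both copies of a shared column survive.

```python
SUPPLIER = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
]
SHIPMENT = [
    {"S#": "S1", "P#": "P1", "QTY": 300}, {"S#": "S1", "P#": "P2", "QTY": 200},
    {"S#": "S1", "P#": "P3", "QTY": 400}, {"S#": "S2", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P2", "QTY": 400}, {"S#": "S3", "P#": "P2", "QTY": 200},
]

def theta_join(r, s, predicate):
    """Cartesian product of r and s restricted by predicate(tr, ts)."""
    return [
        {**{"r." + k: v for k, v in tr.items()},
         **{"s." + k: v for k, v in ts.items()}}
        for tr in r for ts in s if predicate(tr, ts)
    ]

def natural_join(r, s):
    """Equi-join over the common attributes, keeping one copy of each."""
    return [
        {**tr, **ts}
        for tr in r for ts in s
        if all(tr[a] == ts[a] for a in tr if a in ts)
    ]

# Equi-join: a theta-join whose predicate uses only equality
equi = theta_join(SUPPLIER, SHIPMENT, lambda tr, ts: tr["S#"] == ts["S#"])
natural = natural_join(SUPPLIER, SHIPMENT)
```

The equi-join has degree 4 + 3 = 7 because S# appears twice, while the natural join has degree 6 because the duplicate column is dropped.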
The join operation cont..
 Often in joining two relations, a tuple in one relation has
no matching value in the join columns of the other.
 To display rows in the result that do not have matching
values in the join column, we use another type of join,
the outer join.
 The outer join "pads with nulls" the tuples that have no
counterpart. There are three variants:
 – "left": only tuples of the first operand are padded
 – "right": only tuples of the second operand are padded
 – "full": tuples of both operands are padded
 The left outer join, for example, is a join in which the
tuples from the first relation that do not have
matching values in the common columns of the second
relation are also included in the result relation.
 Figure 4 shows the three different variations of outer
joins.
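A left outer join can be sketched in Python as a natural join that keeps unmatched tuples of the first operand and pads them with None (standing in for null). The function names and the reduced Part table (only P# and PNAME) are assumptions made for this illustration; in the supplier/part data, part P4 has no shipment and is therefore padded.

```python
PART = [
    {"P#": "P1", "PNAME": "Nut"}, {"P#": "P2", "PNAME": "Bolt"},
    {"P#": "P3", "PNAME": "Screw"}, {"P#": "P4", "PNAME": "Screw"},
]
SHIPMENT = [
    {"S#": "S1", "P#": "P1", "QTY": 300}, {"S#": "S1", "P#": "P2", "QTY": 200},
    {"S#": "S1", "P#": "P3", "QTY": 400}, {"S#": "S2", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P2", "QTY": 400}, {"S#": "S3", "P#": "P2", "QTY": 200},
]

def left_outer_join(r, s, s_attrs):
    """Natural join of r and s; unmatched tuples of r are kept and
    padded with None for the attributes that only s has."""
    result = []
    for tr in r:
        matches = [ts for ts in s
                   if all(tr[a] == ts[a] for a in tr if a in ts)]
        if matches:
            result.extend({**tr, **ts} for ts in matches)
        else:
            padding = {a: None for a in s_attrs if a not in tr}
            result.append({**tr, **padding})
    return result

def right_outer_join(r, s, r_attrs):
    """Pads unmatched tuples of the second operand: the mirror image."""
    return left_outer_join(s, r, r_attrs)

parts_with_shipments = left_outer_join(PART, SHIPMENT, ["S#", "P#", "QTY"])
```

A full outer join would combine both directions, keeping the unmatched tuples of each operand.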
The Set Operations of
Relational Algebra
 Relations are sets, so we can apply set operators.
 However, we want the results to be relations (that is,
homogeneous sets of tuples).
 Therefore, it is meaningful to apply union, intersection,
and difference only to pairs of relations defined over the
same attributes.
 The two relations must be union compatible. This means
that they must be of the same degree, n say, and the jth
attribute of one must be drawn from the same domain as
the jth attribute of the other (1≤j≤n).
 The Union operation
 The union of two (union-compatible) relations A and B, A
∪ B is the set of all tuples t belonging to either A or B (or
both).
 For example, from Figure 1, let A be the set of supplier
tuples for suppliers in London and B the set of supplier
tuples for suppliers who supply part P1. Then A ∪ B is
the set of supplier tuples for suppliers who either are
located in London or supply part P1 (or both).
The Set Operations of
Relational Algebra cont..
 The union table would be:
S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
 The Intersection operation
 The intersection of two (union-compatible) relations A
and B, A ∩ B is the set of all tuples t belonging to both A
and B.
 For example, from Figure 1, let A be the set of supplier
tuples for suppliers in London, and B the set of supplier
tuples for suppliers who supply part P1.
 Then the operation A ∩ B
is the set of supplier tuples for suppliers who are located
in London and who supply part P1.
The Set Operations of
Relational Algebra cont..
 The intersection table would be:
S#  SNAME  STATUS  CITY
S1  Smith  20      London
 The difference operation
 The difference between two (union-compatible)
relations A and B, A−B is the set of all tuples
belonging to A and not to B.
 For example, from Figure 1, let A be the set of
supplier tuples for suppliers in Paris and B the set of
supplier tuples for suppliers who supply part P1. Then
the operation A − B
is the set of supplier tuples for suppliers who are
located in Paris and who do not supply part P1.
 The difference table would be:
S#  SNAME  STATUS  CITY
S3  Blake  30      Paris
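The union, intersection and difference examples can be cross-checked against Figure 1 using Python's built-in set type. This is a sketch for illustration only: relations are modelled as sets of tuples, and the variable names are assumptions. All operands hold Supplier tuples, so they are union-compatible.

```python
# Supplier tuples (S#, SNAME, STATUS, CITY) and shipments (S#, P#, QTY)
SUPPLIER = {
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
}
SHIPMENT = {
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
}

in_london = {s for s in SUPPLIER if s[3] == "London"}
in_paris = {s for s in SUPPLIER if s[3] == "Paris"}
supplies_p1 = {
    s for s in SUPPLIER
    if any(sp[0] == s[0] and sp[1] == "P1" for sp in SHIPMENT)
}

union = in_london | supplies_p1          # A ∪ B
intersection = in_london & supplies_p1   # A ∩ B
difference = in_paris - supplies_p1      # A − B
```

With the Figure 1 data the union holds S1 and S2, the intersection holds S1 only, and the difference holds S3 only.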
The Set Operations of
Relational Algebra cont..
 The Cartesian product operation
 The Cartesian product operation, denoted as R × S,
defines a relation that is the concatenation of every
tuple of relation R with every tuple of relation S.
 The result contains tuples obtained by combining the
tuples of the operands in all possible ways.
 Figure 5 shows an example of a Cartesian product of
two tables.
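As a small illustration, the Cartesian product can be sketched with itertools.product from the Python standard library. The two single-column relations below (supplier numbers and part numbers from the supplier/part example) and the function name cartesian_product are assumptions chosen to keep the output short.

```python
from itertools import product

SUPPLIER_NUMBERS = [("S1",), ("S2",), ("S3",)]
PART_NUMBERS = [("P1",), ("P2",), ("P3",), ("P4",)]

def cartesian_product(r, s):
    """R × S: concatenate every tuple of r with every tuple of s."""
    return [tr + ts for tr, ts in product(r, s)]

pairs = cartesian_product(SUPPLIER_NUMBERS, PART_NUMBERS)
```

The result has 3 × 4 = 12 tuples and its degree is the sum of the degrees of the operands, here 1 + 1 = 2.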
The Set Operations of
Relational Algebra cont..
 The Divide operation
 The divide operator is a derived operator in that it can
be built from other operations.
 It is concerned with overlapping attribute sets where
the attributes of one relation are a subset of the
attributes in the other.
 When we have two such relations, divide returns all of
the tuples in the first relation that can be matched
against all of the values in the second (smaller)
relation.
The Set Operations of
Relational Algebra cont..
 For example, from the database in Figure 1, to get the
supplier(s) who have shipments for all parts, we need
to divide SP by the set of part numbers and look up the
resulting S# values in S (S# is the matching attribute).
 This will produce the table:
S#  SNAME  STATUS  CITY
S1  Smith  20      London
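Division can be sketched in Python both directly and as a combination of the basic operations, which is what makes it a derived operator. The function names are hypothetical. One assumption is worth stating loudly: in Figure 1 as shown, part P4 has no shipment, so the sketch divides by the part numbers P1 to P3 that actually appear in SP in order to obtain S1; dividing by all four part numbers of P gives an empty result.

```python
# (S#, P#) pairs, i.e. the projection of SP on S# and P#
SP = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"),
    ("S2", "P1"), ("S2", "P2"), ("S3", "P2"),
}

def divide(dividend, divisor):
    """Return every x that is paired in `dividend` with all y in `divisor`."""
    xs = {x for x, _ in dividend}
    return {x for x in xs if all((x, y) in dividend for y in divisor)}

def divide_derived(dividend, divisor):
    """Same result built from projection, product and difference:
    pi_x(R) − pi_x((pi_x(R) × S) − R)."""
    xs = {x for x, _ in dividend}                                # pi_x(R)
    missing = {(x, y) for x in xs for y in divisor} - dividend   # (pi_x(R) × S) − R
    return xs - {x for x, _ in missing}

shipped_parts = {p for _, p in SP}          # {"P1", "P2", "P3"}
all_parts = {"P1", "P2", "P3", "P4"}        # part numbers in table P

suppliers_of_all_shipped = divide(SP, shipped_parts)
suppliers_of_all_parts = divide(SP, all_parts)
```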
The Relational calculus
 The second group of relational languages is based on a
different theoretical foundation, namely the relational
calculus.
 The relational algebra is a prescriptive or procedural
approach while calculus is a declarative or descriptive
approach.
 A query is specified in a single declarative statement,
without specifying any order or method for retrieving
the result of the query.
 For example, from Figure 1, for each part supplied,
find the part number and the names of all cities
supplying that part.
 The syntax of the query would look like:
 {(SP.P#, S.CITY) : SP.S# = S.S#}
 This has the general form <target> : <predicate>,
where the “:” can be read as WHERE.
The Relational calculus
cont..
 The “{}” indicates a relation (set) expression. The term
preceding the “:” represents a typical member, and
the term following it is the qualification or predicate
representing the defining property of the set or
relation.
 Relational calculus query specifies what is to be
retrieved rather than how to retrieve it.
 It does not need any description of how to evaluate a
query.
 It is based on a branch of symbolic logic called
predicate calculus.
 In first-order logic or predicate calculus, a predicate is
a truth-valued function with arguments.
 When we substitute values for the arguments, the
function yields an expression, called a proposition,
which can be either true or false.
The Relational calculus
cont..
 If a predicate contains a variable, as in ‘x is a member
of staff’, there must be a range for x.
 When we substitute some values of this range for x,
the proposition may be true; for other values, it may
be false. If P is a predicate, then we write the set of all
x such that P is true for x, as
{x | P(x)}
 Predicates can be connected using ∧ (AND), ∨ (OR),
and ~ (NOT)
 A relational calculus employs the concept of a
variable, which acts as a place-holder in the relation. If
the language uses tuple variables that range over
tuples (rows) of relations, then we have a tuple-
oriented calculus.
 If the language uses domain variables that range over
domains (columns of relations) then we call it a
domain-oriented calculus.
The Relational calculus
cont..
 The tuple-oriented relational calculus
 A tuple variable is a variable that ‘ranges over’ a named
relation: that is, a variable whose only permitted
values are tuples of the relation.
 A simple tuple relational calculus query is of the form
{t | Cond(t)} where t is a tuple variable and Cond(t)
is a conditional expression involving t.
 For example, for the database of figure 1, get supplier
numbers for suppliers in London.
 The following expression defines the range of the
variable over the supplier table:
 RANGE OF SX IS S.
 For each possible value of the variable SX, retrieve the
S# component of that value, if
 CITY='LONDON'. This is expressed as:
{SX.S# : SX.CITY='LONDON'}.
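Tuple calculus expressions read very much like set comprehensions, which allows a rough illustration in Python. In the sketch below the loop variable plays the role of the tuple variable and the "for ... in" clause plays the role of the RANGE declaration; the data are the S and SP tables of Figure 1 (with city names in the mixed case used there), and the variable names are assumptions.

```python
S = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
]
SP = [
    {"S#": "S1", "P#": "P1", "QTY": 300}, {"S#": "S1", "P#": "P2", "QTY": 200},
    {"S#": "S1", "P#": "P3", "QTY": 400}, {"S#": "S2", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P2", "QTY": 400}, {"S#": "S3", "P#": "P2", "QTY": 200},
]

# {SX.S# : SX.CITY = 'London'}  with  RANGE OF SX IS S
london_suppliers = {sx["S#"] for sx in S if sx["CITY"] == "London"}

# {(SP.P#, S.CITY) : SP.S# = S.S#}
part_cities = {
    (spx["P#"], sx["CITY"])
    for spx in SP for sx in S
    if spx["S#"] == sx["S#"]
}
```

In both cases the expression states which tuples qualify, not how to find them; the nested iteration is just how Python happens to evaluate it.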
The Relational calculus
cont..
 Another example, to find all employees whose
salary is above £50000, is expressed as follows:
 { t | EMPLOYEE(t) and t.SALARY>50000}
 The EMPLOYEE(t) specifies the range relation
EMPLOYEE for the tuple variable t.
 Each tuple t satisfying t.SALARY>50000 is retrieved. It
retrieves the whole tuple t.
 To retrieve only some attributes of t, the expression
would be:
 { t.FNAME, t.LNAME | EMPLOYEE(t) and
t.SALARY>50000}
The Relational calculus
cont..
 The Domain -oriented relational calculus
 Uses variables that take values from domains instead
of tuples of relations.
 We often test for a membership condition, to
determine whether values belong to a relation.
 The expression R(x, y ) evaluates to true if and only if
there is a tuple in relation R with values x, y for its
two attributes.
 Let us assume the existence of domain calculus
range variables as follows:
 The variable SX ranges over the domain S#, PX over
P#, and CITYX, CITYY, … over CITY.
 For example, using Figure 1, get supplier-
number/part-number pairs such that the supplier and
part are not colocated.
 The domain calculus expression would be:
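One plausible way to write such an expression, using the membership conditions described above and abbreviating the attributes that are not needed, is {(SX, PX) : ∃CITYX ∃CITYY (S(SX, …, CITYX) ∧ P(PX, …, CITYY) ∧ CITYX ≠ CITYY)}. This formulation is an assumption made here for illustration, not necessarily the notation the lecture intended. The same query can be sketched in Python, with each loop variable ranging over one column value rather than over a whole tuple:

```python
# S: (S#, SNAME, STATUS, CITY)   P: (P#, PNAME, COLOUR, WEIGHT, CITY)
S = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
]
P = [
    ("P1", "Nut", "Red", 12, "London"),
    ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"),
    ("P4", "Screw", "Red", 14, "London"),
]

# Domain variables sx, px, cityx, cityy each take a single value;
# the unpacking corresponds to the membership conditions S(...) and P(...).
not_colocated = {
    (sx, px)
    for (sx, _sname, _status, cityx) in S
    for (px, _pname, _colour, _weight, cityy) in P
    if cityx != cityy
}
```

With the Figure 1 data this yields eight pairs; for example (S1, P2) qualifies because S1 is in London and P2 is in Paris, while (S1, P1) does not.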
End of Unit Seven
Relational Database
System / Structured Query
Language
 The aim of this lecture is to introduce the Structured
Query Language (SQL) and its main operations.
 The notes will focus on the data manipulation part of
SQL, as it includes the most common and frequently
used operations in databases (i.e. queries).
 The lecture covers the following topics:
 Introduction to SQL
 Basic structure of SQL commands
 Data Definition
 Data Manipulation
 Aggregation
Introduction to SQL
 A database language should allow the user to create the
database and relation structures.
 The user should be able to perform insertion,
modification, and deletion of data from relations.
 The language should also support both simple and
complex queries.
 It must perform these tasks with minimal user effort,
and its command structure and syntax must be easy to
learn.
 The language must be portable.
 SQL is a transform-oriented language with two major
components.
 These are the DDL for defining the database structure
and the DML for retrieving and updating data.
 SQL does not contain flow control commands.
 These must be implemented using a programming or
job-control language, or interactively by the decisions
of the user.
Introduction to SQL cont..
 SQL is relatively easy to learn. SQL is a nonprocedural
language, in contrast to procedural third-
generation languages (3GLs) such as COBOL and C:
you specify what information you require, rather than
how to get it.
 It is essentially free-format. It consists of standard
English words.
 A range of users including data and database
administrators, management, application
programmers, and other types of end users can use
SQL.
 ORACLE was probably the first commercial RDBMS
based on SQL. ANSI and ISO have published several
standards for SQL.
 The most popular and widely implemented is referred
to as SQL2 or SQL/92. The SQL examples used in these
lecture notes adhere to the SQL2 standard.
Basic structure of SQL commands
 An SQL statement consists of reserved words and
user-defined words.
 Reserved words are a fixed part of SQL and must be spelt
exactly as required and cannot be split across lines.
 User-defined words are made up by the user and represent
names of various database objects such as relations,
columns and views.
 Most components of an SQL statement are case insensitive,
except for literal character data.
 SQL statements are more readable with indentation and
line breaks. An extended form of BNF notation is used to
express the syntax. That is:
 Upper case letters represent reserved words.
 Lower case letters represent user-defined words.
 | indicates a choice among alternatives.
 Curly braces indicate a required element.
 Square brackets indicate an optional element.
 … indicates optional repetition (0 or more).
 The above syntax will be used in the rest of this document
when we illustrate the syntax of various SQL commands.
Data Definition
 The first syntax we look at in this section is
schema definition.
 A schema is a collection of objects. It includes domains,
tables, indexes, assertions, views and privileges. A schema
has a name and an owner (the authorization).
 The syntax is:
 CREATE SCHEMA [name |
AUTHORIZATION creator_id ]
 A table definition consists of an ordered
set of attributes and a (possibly empty) set of
constraints.
 For example, the syntax below is for table creation:
 CREATE TABLE table_name
(col_name data_type [NULL | NOT NULL] [,...])
 The above statement creates a table with one or more
columns of the specified data_type.
 NULL (which is the default) indicates
that the column can contain nulls.
Data Definition cont..
 With NOT NULL, the system rejects any attempt to insert a
null in the column.
 Primary keys should always be specified as NOT NULL.
Foreign keys are often (but not always) candidates
for NOT NULL.
 The SQL statement below creates a table named Employee.
CREATE TABLE Employee
(
  RegNo     CHARACTER(6) PRIMARY KEY,
  FirstName CHARACTER(20) NOT NULL,
  Surname   CHARACTER(20) NOT NULL,
  Dept      CHARACTER(15) REFERENCES Department(DeptName)
            ON DELETE SET NULL
            ON UPDATE CASCADE,
  Salary    NUMERIC(9) DEFAULT 0,
  City      CHARACTER(15),
  UNIQUE (Surname, FirstName)
)
Data Definition cont..
 Constraints are conditions that must be verified by
every database instance.
 The first type includes intra-relational constraints. This
type involves a single relation.
 Examples are “not null” (on single attributes) and
“unique”, which permits the definition of keys.
 The syntax for a single attribute is to place the
reserved word “unique” after the domain; for
multiple attributes it is “unique (attribute1, attribute2,
etc.)”.
 The primary key phrase defines the primary key (once for
each table; it implies not null).
 The syntax is similar to that of the “unique” phrase.
 The second type of constraints includes the inter-relational
constraints.
 These constraints may take into account several relations.
 The clauses “references” and “foreign key” permit the
definition of referential integrity constraints.
 The syntax for a single attribute is “references” after the
domain, while for multiple attributes it is “foreign key
(Attribute1, Attribute2) references…”
Data Definition cont..
 It is possible to associate reaction policies to violations
of referential integrity.
 Reactions operate on the internal table, after changes
to the external table.
 Violations may be introduced (1) by updates on the
referred attribute or (2) by row deletions. Reactions
can be one of the following:
 CASCADE: propagate the change,
 SET NULL: nullify the referring attribute,
 SET DEFAULT: assign the default value to the
referring attribute or
 NO ACTION: forbid the change on the external
table.
 Reactions may depend on the event. The syntax is:
Data Definition cont..
 on < delete | update > < cascade | set null | set
default | no action >
 A schema can be deleted from a
database using the command “drop”. The following
syntax is used.
 DROP SCHEMA name [RESTRICT | CASCADE ];
 With RESTRICT (default), the schema must be empty or
the operation fails.
 With CASCADE, the operation cascades to drop all objects
associated with the schema in the order defined above.
 If any of these operations fail, DROP SCHEMA fails.
 To drop a table, the syntax below is used:
 DROP TABLE tbl_name [RESTRICT | CASCADE];
 To drop the Employee table created earlier,
the following statement can be used:
 DROP TABLE Employee;
Data Definition cont..
 The above statement removes the Employee table and
all rows within it.
 With RESTRICT, SQL does not allow the table to be
dropped if there are any other objects whose continued
existence depends on it (e.g. projects that
an employee is responsible for).
 With CASCADE, SQL drops all dependent objects,
and objects dependent on these objects (e.g.
dependent or next-of-kin records that are associated
with each employee).
 A schema can be changed or altered after creation,
using the SQL statement
ALTER (alter domain ..., alter table …).
 For example, the command:
 ALTER TABLE Department
 ADD COLUMN NoOfOffices NUMERIC(4);
 adds a new column called NoOfOffices to the
Department table, of numeric type with length four.
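The ALTER TABLE and DROP TABLE statements can be tried out with SQLite via Python's sqlite3 module. This is a sketch under assumptions: the two tables are cut-down stand-ins invented for the demonstration, and SQLite does not accept the RESTRICT or CASCADE keywords on DROP TABLE, so the plain form is used.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Department ("
    "DeptName CHARACTER(15) PRIMARY KEY, "
    "Address CHARACTER(30), City CHARACTER(15))"
)
con.execute("CREATE TABLE Employee (RegNo CHARACTER(6) PRIMARY KEY)")

# Add the new column to the existing table
con.execute("ALTER TABLE Department ADD COLUMN NoOfOffices NUMERIC(4)")
columns = [row[1] for row in con.execute("PRAGMA table_info(Department)")]

# Remove the Employee table together with all of its rows
con.execute("DROP TABLE Employee")
tables = [
    row[0]
    for row in con.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

Here PRAGMA table_info and sqlite_master are SQLite's own way of exposing the catalogue; other systems provide the same information through different system tables.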
Data Manipulation
 SQL expresses queries in declarative way – queries
specify the properties of the result, not the way to
obtain it.
 Queries are translated by the query optimizer into the
procedural language internal to the DBMS.
 The programmer should focus on readability, not on
efficiency.
 A query in SQL can consist of up to six clauses, but
only the first two are mandatory.
 SELECT [DISTINCT | ALL]
 {* | [column_expression [AS new_name]] [,...] }
 FROM table_name [alias] [, ...]
 [WHERE condition]
 [GROUP BY column_list] [HAVING condition] [ORDER
BY column_list]
 Conceptually, a query is evaluated by first forming the
rows named in the FROM clause, then applying the
WHERE clause, then GROUP BY and HAVING, then the
SELECT clause, and finally ORDER BY.
Data Manipulation cont..
 The order of the clauses cannot be changed. The
following is a brief description of each of the reserved
words
 SELECT specifies which columns are to appear in
output.
 FROM specifies table(s) to be used.
 WHERE filters rows.
 GROUP BY forms groups of rows with same column
value.
 HAVING filters groups subject to some condition.
 ORDER BY specifies the order of the output.
 The two table instances below are used to illustrate
various queries.
 An employee name (first plus surname) is unique. The
department name is unique. An employee must work
in one department.
Data Manipulation cont..
 Employee
FirstName  Surname   Dept            Office  Salary  City
Mary       Brown     Administration  10      45      London
Charles    White     Production      20      36      Toulouse
Gus        Green     Administration  20      40      Oxford
Jackson    Neri      Distribution    16      45      Dover
Charles    Brown     Planning        14      80      London
Laurence   Chen      Planning        7       73      Worthing
Pauline    Bradshaw  Administration  75      40      Brighton
Alice      Jackson   Production      20      46      Toulouse

 Department
DeptName        Address          City
Administration  Bond Street      London
Production      Rue Victor Hugo  Toulouse
Distribution    Pond Road        Brighton
Planning        Bond Street      London
Research        Sunset Street    San José
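As a rough illustration of the six clauses working together, the sketch below loads the Employee instance above into an in-memory SQLite database (an assumed stand-in DBMS; only the columns the query needs are kept) and asks for the departments with more than one employee earning at least 40, ordered by average salary. The threshold 40 and the aliases Staff and AvgSalary are choices made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (FirstName TEXT, Surname TEXT, Dept TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        ("Mary", "Brown", "Administration", 45),
        ("Charles", "White", "Production", 36),
        ("Gus", "Green", "Administration", 40),
        ("Jackson", "Neri", "Distribution", 45),
        ("Charles", "Brown", "Planning", 80),
        ("Laurence", "Chen", "Planning", 73),
        ("Pauline", "Bradshaw", "Administration", 40),
        ("Alice", "Jackson", "Production", 46),
    ],
)

# WHERE filters rows, GROUP BY forms groups, HAVING filters groups,
# SELECT builds the output columns and ORDER BY sorts the result.
rows = conn.execute(
    """
    SELECT Dept, COUNT(*) AS Staff, AVG(Salary) AS AvgSalary
    FROM Employee
    WHERE Salary >= 40
    GROUP BY Dept
    HAVING COUNT(*) > 1
    ORDER BY AvgSalary DESC
    """
).fetchall()

for dept, staff, avg_salary in rows:
    print(dept, staff, round(avg_salary, 2))
```

Production and Distribution drop out at the HAVING step because only one of their employees survives the WHERE filter.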
Data Manipulation cont..
 Q1: Specific Columns, Specific Rows. Find the salaries
of employees named Brown.
 SELECT Salary AS Remuneration
 FROM Employee
 WHERE Surname = 'Brown';
 Q2: Find all the information relating to employees
named Brown.
 SELECT *
 FROM Employee
 WHERE Surname = 'Brown';
 Q3: Find the monthly salary of the employees named
White.
 SELECT Salary / 12 AS MonthlySalary
 FROM Employee
 WHERE Surname = 'White';
 Simple join query
 Q4: Find the names of the employees and the cities in
which they work.
Data Manipulation cont..
 SELECT Employee.FirstName, Employee.Surname,
Department.City
 FROM Employee, Department
 WHERE Employee.Dept = Department.DeptName;
 Using table aliases
 Q5: Find the names of the employees and the cities in
which they work (using an alias).
 SELECT FirstName, Surname, D.City
 FROM Employee, Department D
 WHERE Dept = DeptName;
 Q6: Using predicate conjunction. Find the first names
and surnames of the employees who work in office
number 20 of the Administration department.
 SELECT FirstName, Surname
 FROM Employee
 WHERE Office = '20' AND
Dept = 'Administration';
Data Manipulation cont..
 Q7: Find the first names and surnames of the employees who
work in either the Administration or the Production
department;
 SELECT FirstName, Surname
 FROM Employee
 WHERE Dept = 'Administration' OR
 Dept = 'Production';
 Q8: Find the first names of the employees named Brown who
work in the Administration department or the Production
department.
 SELECT FirstName
 FROM Employee
 WHERE Surname = 'Brown' AND
(Dept = 'Administration' OR
Dept = 'Production');
 Q9: Find the employees with surnames that have 'r' as the
second letter and end in 'n'.
 SELECT *
 FROM Employee
 WHERE Surname LIKE '_r%n';
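The join in Q5 and the pattern match in Q9 can be checked against the two table instances with a short sketch. It assumes SQLite (through Python's sqlite3 module) as the DBMS; the ORDER BY clauses are added here only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (FirstName TEXT, Surname TEXT, Dept TEXT,
                       Office INTEGER, Salary INTEGER, City TEXT);
CREATE TABLE Department (DeptName TEXT PRIMARY KEY, Address TEXT, City TEXT);
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?, ?)", [
    ("Mary", "Brown", "Administration", 10, 45, "London"),
    ("Charles", "White", "Production", 20, 36, "Toulouse"),
    ("Gus", "Green", "Administration", 20, 40, "Oxford"),
    ("Jackson", "Neri", "Distribution", 16, 45, "Dover"),
    ("Charles", "Brown", "Planning", 14, 80, "London"),
    ("Laurence", "Chen", "Planning", 7, 73, "Worthing"),
    ("Pauline", "Bradshaw", "Administration", 75, 40, "Brighton"),
    ("Alice", "Jackson", "Production", 20, 46, "Toulouse"),
])
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)", [
    ("Administration", "Bond Street", "London"),
    ("Production", "Rue Victor Hugo", "Toulouse"),
    ("Distribution", "Pond Road", "Brighton"),
    ("Planning", "Bond Street", "London"),
    ("Research", "Sunset Street", "San José"),
])

# Q5: D.City is qualified because both tables have a City column.
work_cities = conn.execute("""
    SELECT FirstName, Surname, D.City
    FROM Employee, Department D
    WHERE Dept = DeptName
    ORDER BY Surname, FirstName
""").fetchall()

# Q9: '_' matches exactly one character, '%' matches any run of characters.
pattern_matches = conn.execute("""
    SELECT FirstName, Surname
    FROM Employee
    WHERE Surname LIKE '_r%n'
    ORDER BY Surname, FirstName
""").fetchall()

print(work_cities)
print(pattern_matches)
```

Note that Gus Green lives in Oxford but is reported with London, the city of the Administration department, and that Bradshaw is not matched by the pattern because it does not end in 'n'.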
Data Manipulation cont..
 We will use another example to illustrate other concepts of SQL
queries. The schema below presents a database snapshot (i.e.
a schema instance) for an estate agent.
 Owner
Ono   FName  LName   Address                                  Tel_No
CO40  Tina   Murphy  63 Well St, Shawlands, Glasgow G42       0141-943-1728
CO46  Joe    Keogh   2 Fergus Dr, Banchory, Aberdeen AB2 7SX  01224-861212
CO87  Carol  Farrel  6 Achray St, Glasgow G32 9DX             0141-357-7419
CO93  Tony   Shaw    12 Park Pl, Hillhead, Glasgow G4 0Q4     0141-225-7025
Data Manipulation cont..
 Branch
Bno Street Area City Pcode Tel_no Fax_no MgrNo
B2 56 Clover Dr London NW10 6EU 0181-963-1030 0181-453-7992 SG5
B3 163 Main St Patrick Glasgow G11 9QX 0141-339-2178 0141-339-4439 SL21
B4 32 Manse Rd Leigh Bristol BS99 1NZ 0171-916-1170 1007-776-1114 SL21
B5 22 Deer Rd Sidcup London SW1 4EH 0171-886-1212 0171-886-1214 SG5
B7 16 Argyll Dyce Aberdeen AB2 3SU 01224-67125 01224-67111 SG5
 Property

Pno Street Area City Pcode Type Rooms Rent Ono Sno Bno
PA1 16 Holhead Dee Aberdeen AB7 5SU H 6 £650.00 CO46 SA9 B7
PG1 5 Novar Dr Hyndland Glasgow G12 9AX F 4 £450.00 CO93 SG14 B3
PG2 8 Dale Rd Hyndland Glasgow G12 H 5 £600.00 CO87 SG37 B3
PG3 2 Manor Rd Glasgow G32 4QX F 3 £375.00 CO93 SG37 B3
PG4 6 Lawrence St Patrick Glasgow G11 9QX F 3 £350.00 CO40 SG14 B3
PL94 6 Argyll St Kilburn London NW2 F 4 £400.00 CO87 SL41 B5
Data Manipulation cont..
 Renter
Rno   FName  LName    Address                             Tel_No     Pref_Type  Max_Rent
CR56  Aline  Stewart  64 Fern Dr, Pollock, Glasgow, G42   0141-848-  Flat       $350.00
CR62  Mary   Tergear  5 Tarbot Rd, Kildary, Aberdeen AB9  01224-     Flat       $600.00
CR74  Mike   Ritchie  18 Tain St, Gourock PA1G 1YQ        01475-     House      $750.00
CR76  John   Kay      56 High St, Puttney, London SW1     0171-774-  Flat       $425.00
 Viewing
Pno   Rno   Date       Comment
PA14  CR56  24-May-95  too small
PA14  CR62  14-May-95  no dining room
PG36  CR56  28-Apr-95
PG4   CR56  26-May-95
PG4   CR76  20-Apr-95  too remote
Data Manipulation cont..
 Staff

Sno FName LName Address Tel_No Position Sex Salary DOB NIN Bn
SA9 Mary Howe 2 Elm Pl, Aberdeen AB2 Assistant F £9,000.0 19/2/70 WM5321 B7
SG1 David Ford 63 Ashby St, Partick, 0141-339-2177 Deputy M £18,000. 24/3/58 WL22065 B3
SG3 Ann Beech 81 George St, Glasgow, 0171-848-3345 Snr Asst F £12,000. 10/11/6 WL44201 B3
SG5 Susan Brand 5 Gt Western Rd, Glasgow 0141-334-2001 Manager F £24,000. 3/6/40 WK5889 B3
SL2 John White 19 Taylor St, Cranford, 0171-884-5112 Manager M £30,000. 1/10/45 WL43251 B5
SL4 Julie Lee 28 Mavlvern St, Kilburn 0181-554-3541 Assistant F £9,000.0 13/6/65 WA2905 B5
Data Manipulation cont..
 Q10: List the details of all viewings on property PG4 where
a comment has not been supplied (Null search condition).
 SELECT viewing.pno, viewing.rno, Date
 FROM viewing
 WHERE pno='PG4' AND comment IS NULL;
 Q11: Produce an abbreviated list of properties arranged in
order of property type (sorting results)
 SELECT Pno, Type, Rooms, Rent
 FROM Property
 ORDER BY Type;
 Q12: List all staff with a salary greater than 10,000
(Comparison Search Condition)
 SELECT Staff.Sno, Staff.Fname, Staff.Lname, Position,
Salary
 FROM Staff
 WHERE Salary > 10000;
 Q13: List all staff with a salary between 20,000 and 30,000.
(Range search condition)
Data Manipulation cont..
 SELECT staff.Sno, staff.FName, staff.LName, staff.Position,
 staff.Salary
 FROM staff
 WHERE staff.Salary BETWEEN 20000 AND 30000;
 Q14: List all Managers and Deputy Managers. (Set
membership search condition)
 SELECT staff.Sno, staff.FName, staff.LName, staff.Position
 FROM staff
 WHERE position in ('Manager', 'Deputy');
Aggregation
 The ISO standard defines five aggregate functions. These are:
 COUNT returns the number of values in a specified column.
 SUM returns the sum of values in a specified column.
 AVG returns the average of values in a specified column.
 MIN returns the smallest value in a specified column.
 MAX returns the largest value in a specified column.
 Each operates on a single column of a table and returns a single
value. The functions COUNT, MIN, and MAX apply to numeric
and non-numeric fields, but SUM and AVG may be used on
numeric fields only.
 Apart from COUNT(*) , each function eliminates nulls first and
operates only on remaining non-null values. COUNT(*) counts
all rows of a table, regardless of whether nulls or duplicate
values occur.
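The difference between COUNT(*), COUNT(column) and COUNT(DISTINCT column) can be seen on the Viewing snapshot, where two comments are null. This is a small sketch assuming SQLite as the DBMS; the Date column is left out because it plays no part in the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Viewing (Pno TEXT, Rno TEXT, Comment TEXT)")
conn.executemany(
    "INSERT INTO Viewing VALUES (?, ?, ?)",
    [
        ("PA14", "CR56", "too small"),
        ("PA14", "CR62", "no dining room"),
        ("PG36", "CR56", None),  # no comment supplied: stored as NULL
        ("PG4", "CR56", None),
        ("PG4", "CR76", "too remote"),
    ],
)

all_rows, with_comment, distinct_properties = conn.execute(
    "SELECT COUNT(*), COUNT(Comment), COUNT(DISTINCT Pno) FROM Viewing"
).fetchone()

print(all_rows)             # every row, nulls included
print(with_comment)         # nulls in Comment are eliminated first
print(distinct_properties)  # duplicates of Pno are eliminated first
```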
Aggregation cont…
 One can use DISTINCT before column name to eliminate
duplicates.
 DISTINCT has no effect with MIN/MAX, but may have with
SUM/AVG.
 Aggregate functions can be used only in SELECT list and in
HAVING clause.
 If SELECT list includes an aggregate function and there is no
GROUP BY clause, then SELECT list cannot reference a column
without an aggregate function.
 For example, following is illegal:
 SELECT sno, COUNT(salary) FROM staff;
 When GROUP BY is used, every column name in the SELECT list
must appear in the GROUP BY clause unless the name is used only
in an aggregate function.
 We will illustrate the use of these functions through examples
using the estate agent database.
Aggregation cont…
 Q16: How many properties cost more than £350 per month for
rent? (The Count function).
 SELECT Count(*) AS count
 FROM property
 WHERE property.Rent > 350;
 Q17: How many different properties were viewed in May 1995?
(count)
 SELECT COUNT (DISTINCT Pno) AS COUNT
 FROM viewing
 WHERE Date BETWEEN '05/1/95' AND '05/31/95';
 Q18: Find the minimum, maximum and average staff salary.
 SELECT MIN(salary) AS MIN,
 MAX(salary) AS MAX,
 AVG(salary) AS AVG
 FROM staff;
Aggregation cont…
 Q19: Find the number of staff working in each branch
and the total of their salaries.
 SELECT bno, COUNT(sno) AS count, SUM(salary) AS sum
 FROM Staff
 GROUP BY Bno
 ORDER BY bno;
 Q20: For each branch office with more than one member of staff,
find the number of staff working in each branch and the sum of
their salaries.
 SELECT bno, COUNT(sno) AS count, SUM(salary) AS sum
 FROM staff
 GROUP BY bno
 HAVING COUNT(SNO) > 1;
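Q19 and Q20 can be run side by side with the sketch below, assuming SQLite as the DBMS. The staff numbers, salaries and branch numbers are read from the Staff snapshot as printed; the salary column is cut off there, so the whole-pound amounts used here are an assumption. An ORDER BY is added to Q20 only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (Sno TEXT PRIMARY KEY, Bno TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("SA9", "B7", 9000),
        ("SG1", "B3", 18000),
        ("SG3", "B3", 12000),
        ("SG5", "B3", 24000),
        ("SL2", "B5", 30000),
        ("SL4", "B5", 9000),
    ],
)

# Q19: one output row per branch.
per_branch = conn.execute("""
    SELECT Bno, COUNT(Sno) AS StaffCount, SUM(Salary) AS TotalSalary
    FROM Staff
    GROUP BY Bno
    ORDER BY Bno
""").fetchall()

# Q20: HAVING filters the groups after they have been formed.
busy_branches = conn.execute("""
    SELECT Bno, COUNT(Sno) AS StaffCount, SUM(Salary) AS TotalSalary
    FROM Staff
    GROUP BY Bno
    HAVING COUNT(Sno) > 1
    ORDER BY Bno
""").fetchall()

print(per_branch)
print(busy_branches)
```

Branch B7 appears in the first result but not in the second, because its group contains a single member of staff.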
STRUCTURED DATA (PRACTICALS)
 Example 1: Creating a Table
 Use MS Access to create a table to store
information about weather observation stations.
 The table should contain the station ID, City,
State, Latitude and Longitude. Then populate the
table with imaginary values.
 Solution // SQL command to create a Table
 CREATE TABLE Station (
 ID integer PRIMARY KEY,
 City varchar(20),
 State Varchar(20),
 Lat_N Real,
 Long_N REAL
 );
STRUCTURED DATA (PRACTICALS)
cont..
 // SQL command to populate Table Station
 INSERT INTO Station VALUES (101,
'Lusaka', 'Lusaka', 30, 45);
 INSERT INTO Station VALUES (102,
'Kitwe', 'Copperbelt', 20, 40);
 INSERT INTO Station VALUES (103,
'Kapiri', 'Central', 60, 30);
 INSERT INTO Station VALUES (106,
'Ndola', 'Copperbelt', 100, 605);
 INSERT INTO Station VALUES (104,
'Mpika', 'Muchinga', 70, 65);
 INSERT INTO Station VALUES (105,
'Solwezi', 'North Western', 90, 55);
STRUCTURED DATA (PRACTICALS)
cont..
 Creating SQL Queries (Data Manipulation)
 Queries are a fundamental means of accessing
and displaying data from the Tables. Queries can
access a single table or multiple tables;
 Example 1
 Write a query to select only those cities where
latitude is greater than 30.
 Solution
 SELECT *
 FROM Station
 WHERE Lat_N>30;
 Example 2
 Write a query to select only ID, CITY, and STATE
columns.
STRUCTURED DATA (PRACTICALS)
cont..
 Solution
 SELECT ID, CITY, STATE
 FROM Station;
 Example 3
 Create another table to store normalized
temperature and precipitation data and populate
it with the necessary information.
 Solution
 CREATE TABLE STATISTICS
 (ID Int,
 Month Int,
 TEMP_F REAL,
 RAIN_I REAL,
 PRIMARY KEY (ID, Month));
 // Populate the table STATISTICS with some
statistics for January and July as presented below:
STRUCTURED DATA (PRACTICALS)
cont..
 INSERT INTO STATISTICS VALUES (101, 1, 57.4, 0.31);
 INSERT INTO STATISTICS VALUES (101, 7, 91.7, 5.15);
 INSERT INTO STATISTICS VALUES (102, 1, 27.3, 0.18);
 INSERT INTO STATISTICS VALUES (102, 7, 74.8, 2.11);
 INSERT INTO STATISTICS VALUES (103, 1, 6.7, 2.10);
 INSERT INTO STATISTICS VALUES (103, 7, 65.8, 4.52);
 Join Query Examples:
 Q1. Write a query to look at table
STATISTICS, picking up location information by
joining with table STATION on the ID column:
 Solution
 SELECT *
 FROM STATION, STATISTICS
 WHERE STATION.ID = STATISTICS.ID;
STRUCTURED DATA (PRACTICALS)
cont..
 Q2. Write a query to look at the table
STATISTICS, ordered by month and greatest
rainfall, with columns rearranged:
 Solution
 SELECT MONTH, ID, RAIN_I, TEMP_F
 FROM STATISTICS
 ORDER BY MONTH, RAIN_I DESC;
 Q3. Write query to show MAX and MIN
temperatures as well as average rainfall for each
station:
 Solution
 SELECT ID, MAX(TEMP_F), MIN(TEMP_F),
AVG(RAIN_I)
 FROM STATISTICS
 GROUP BY ID;
STRUCTURED DATA (PRACTICALS)
cont..
 SQL Update and Delete Examples
 Q4. Write an SQL command to update all rows of
table STATISTICS to compensate for faulty rain
gauges known to read 0.01 inches low:
 Solution
 UPDATE STATISTICS SET RAIN_I = RAIN_I +
0.01;
 Then run this command to view the changes
 SELECT *
 FROM STATISTICS;
 Q5. Write an SQL command to delete July data,
and data for stations whose longitude is below 90,
from table STATISTICS:
 Solution
 DELETE FROM STATISTICS
 WHERE MONTH = 7
 OR ID IN (SELECT ID FROM STATION
 WHERE Long_N < 90);
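The update and delete steps of the practical can be rehearsed in one script. This is a sketch under several assumptions: SQLite stands in for MS Access, the statistics table is keyed on (ID, Month) so that each station can hold a January and a July row, the statistics IDs are matched to the first three station IDs, and the longitude threshold is lowered to 40 so that some rows survive the delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Station (
    ID INTEGER PRIMARY KEY, City TEXT, State TEXT, Lat_N REAL, Long_N REAL
);
CREATE TABLE Statistics (
    ID INTEGER, Month INTEGER, Temp_F REAL, Rain_I REAL,
    PRIMARY KEY (ID, Month)
);
INSERT INTO Station VALUES
    (101, 'Lusaka', 'Lusaka', 30, 45),
    (102, 'Kitwe', 'Copperbelt', 20, 40),
    (103, 'Kapiri', 'Central', 60, 30);
INSERT INTO Statistics VALUES
    (101, 1, 57.4, 0.31), (101, 7, 91.7, 5.15),
    (102, 1, 27.3, 0.18), (102, 7, 74.8, 2.11),
    (103, 1, 6.7, 2.10), (103, 7, 65.8, 4.52);
""")

# Compensate for rain gauges that read 0.01 inches low.
conn.execute("UPDATE Statistics SET Rain_I = Rain_I + 0.01")

# Delete July rows and rows of stations below the assumed longitude threshold.
conn.execute("""
    DELETE FROM Statistics
    WHERE Month = 7
       OR ID IN (SELECT ID FROM Station WHERE Long_N < 40)
""")

remaining = conn.execute(
    "SELECT ID, Month, Rain_I FROM Statistics ORDER BY ID, Month"
).fetchall()

# Join the surviving statistics back to their stations.
joined = conn.execute("""
    SELECT Station.City, Statistics.Month, Statistics.Temp_F
    FROM Station, Statistics
    WHERE Station.ID = Statistics.ID
    ORDER BY Station.ID
""").fetchall()

print(remaining)
print(joined)
```

Only the January rows of stations 101 and 102 remain: the July rows are removed by the first condition and station 103 by the subquery.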
Functional Dependencies
 Relational database design is
concerned with the grouping of
attributes to form "good"
relational schemas.
 The good grouping of attributes
is based on dependency and
relationships of these attributes.
 Functional dependency describes
the relationship between
attributes in a relation.
Design guidelines for relational
schemas cont….
 Consider the relational schema below.
Employee (Ename, Ssn, Bdate, Address, Dnumber);
Project (Pname, Pnumber, Plocation, Dnum);
Department (Dname, Dnumber, Mgrssn);
Dept_Location (Dnumber, DLocation);
Works_On (SSN, Pnumber, Hours);
 There should be some justification for the
grouping of attributes in one relation.
 Ideally, this grouping is based on semantics
captured from the application domain and user
requirement analysis documents.
Design guidelines for relational
schemas cont….
 The meaning, or semantics, specifies how to
interpret the attribute values stored in a tuple of
the relation.
 The above can be summarized in guideline 1:
 Design a relation schema so that it is easy to
explain its meaning.
 Do not combine attributes from multiple entity
types and relationship types into a single relation.
 Only foreign keys should be used to refer to other
entities.
 For example, each of the two schemas below violates
this guideline by mixing attributes of several entity types:
 Emp_Dept (Ename, Ssn, Dmgrssn, Dnumber,
Dname, Bdate, Address);
 Emp_Proj (Ssn, Pnumber, Pname, Plocation,
Hours, Ename);
Design guidelines for relational
schemas cont….
 Reducing the redundant values in tuples
 One goal of schema design is to minimize the
storage space that the base relations occupy.
 Having redundant data in tuples wastes storage
space unnecessarily.
 The other serious problem with using the
schemas above is the problem of processing
anomalies.
 The various anomalies can be classified into
insertion, deletion and update or modification
anomalies.
 Reducing the null values in tuples
 Grouping many attributes in one relation
produces fat relations.
 A fat relation may have many attributes that do
not apply to all tuples in the relation.
Design guidelines for relational
schemas cont….
 This means we end up with many
nulls in those tuples.
 There are two problems with so
many nulls.
 The first is wasted storage space
and
 the second is the ambiguity that
is associated with the null values
Attribute grouping and functional
dependency notation
 Functional dependency describes the
relationship between attributes in a
relation.
 Eg. if A and B are attributes of
relation R, B is functionally
dependent on A (denoted A → B), if
each value of A in R is associated
with exactly one value of B in R.
 Consider the schema and the
snapshot of the table stock:
[Figure: snapshot of the Stock table, with attributes P#, P-desc and Qty-in-stock]
Attribute grouping and functional
dependency notation cont..
 Functional dependencies help in accomplishing
the following two goals:
 (a) controlling redundancy and
 (b) enhancing data reliability.
 If two tuples agree on the ‘X’ attribute, they
*must* agree on the ‘Y’ attribute, too.
 If X → Y we say X functionally determines Y.
 Notice that X → Y implies a many-to-one or one-to-
one mapping.
 Example: Consider the Emp schema below:
 EMP (name, salary, dept, mgr).
 Consider the following data dependencies:
 1. Each employee has one salary:
name → salary
 2. Each employee works in only one department:
name → dept
Attribute grouping and functional
dependency notation cont..
 If each possible P# (i.e. Part
number) value has precisely
one associated P-desc value,
then P# is a determinant of
P-desc, or P# → P-desc.
 If each possible P# value has only
one associated Qty-in-stock
value, then P# is a determinant
of Qty-in-stock, or
P# → Qty-in-stock.
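The definition can be turned into a small check on a table snapshot: an FD X → Y is contradicted as soon as two rows agree on X but differ on Y. The sketch below is illustrative only; the function name fd_holds and the stock rows (in particular the quantities) are hypothetical. A snapshot can refute an FD but cannot prove that it holds for every future state of the table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical snapshot of the stock table.
stock = [
    {"P#": "P1", "P-desc": "bolt", "Qty-in-stock": 120},
    {"P#": "P4", "P-desc": "nut", "Qty-in-stock": 75},
    {"P#": "P6", "P-desc": "bolt", "Qty-in-stock": 40},
]

part_determines_desc = fd_holds(stock, ["P#"], ["P-desc"])
part_determines_qty = fd_holds(stock, ["P#"], ["Qty-in-stock"])
desc_determines_part = fd_holds(stock, ["P-desc"], ["P#"])

print(part_determines_desc, part_determines_qty, desc_determines_part)
```

The last check fails because the description "bolt" is associated with two part numbers, so P-desc is not a determinant of P#.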
Dependency Diagrams
 "Attribute A is a determinant of B", or "B is
dependent on A", can be represented in an
FD diagram as:
A → B
 A second diagram, A ↔ B, is read as: attribute
A is a determinant of B and vice versa.
 For our stock table, the FD diagram would look
as follows:
P# → Qty-in-stock
P# → P-desc
Inference Axioms
 An inference axiom is a rule that states
that: if a relation satisfies certain FDs then
it must satisfy certain other FDs.
 The closure of F (usually written as F+) is
the set of all functional dependencies that
may be logically derived from F.
 Often F is the set of most obvious and
important functional dependencies and
 F+, the closure, is the set of all the
functional dependencies including F and
those that can be deduced from F.
 The closure is important and may, for
example, be needed in finding one or more
candidate keys of the relation.
Inference Axioms cont..
 A set of inference rules, called
Armstrong’s axioms, specifies how new
functional dependencies can be inferred
from given ones.
 Let A, B, and C be subsets of the
attributes of the relation R. Armstrong’s
axioms (F1 to F3) are as follows:
 (F1) Reflexivity
 If B is a subset of A, then A → B
 (F2) Augmentation
 If A → B, then A,C → B,C
 (F3) Transitivity
 If A → B and B → C, then A → C
Inference Axioms cont..
 Further rules can be derived from the first
three rules that simplify the practical task
of computing X+.
 Let D be another subset of the attributes
of relation R, then:
 (F4) Self-determination
 A→A
 (F5) Decomposition
 If A → B,C, then A → B and A → C
 (F6) Union
 If A → B and A → C, then A → B,C
 (F7) Composition
 If A → B and C → D then A,C → B,D
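In practice the closure of a set of attributes is computed by applying the FDs repeatedly until nothing new can be added, and candidate keys are the minimal attribute sets whose closure covers the whole relation. The sketch below shows one way of doing this for the EMP schema used earlier. The function names are illustrative, and the FD dept → mgr is an assumption added for the example (the notes only give name → salary and name → dept).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a smaller key is contained in combo, so it is not minimal
            if closure(combo, fds) == set(all_attrs):
                keys.append(combo)
    return keys


emp_attrs = {"name", "salary", "dept", "mgr"}
emp_fds = [
    (("name",), ("salary",)),
    (("name",), ("dept",)),
    (("dept",), ("mgr",)),  # assumed for the example
]

name_closure = closure({"name"}, emp_fds)
dept_closure = closure({"dept"}, emp_fds)
emp_keys = candidate_keys(emp_attrs, emp_fds)

print(sorted(name_closure), sorted(dept_closure), emp_keys)
```

Here name → mgr is obtained by transitivity (F3) inside the loop, which is why name alone turns out to be a candidate key.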
Relational Database System /
Normalization for Relational
Databases
 In the previous lecture we have covered
some informal guidelines for good
database design.
 We also covered the concept of functional
dependency, which is the key factor for
grouping attributes in one relation.
 We showed how bad design causes
modification anomalies such as insertion,
deletion and update anomalies.
Normalization for Relational
Databases
 In this lecture we are covering a series of
formal tests on a relation to determine
whether it satisfies or violates the
requirements of a given normal form.
 The objective of this process is to separate
the data (attribute values) into sets based
on the functional dependencies between
attributes.
 The lecture will cover the following topics
1. The purpose of Normalization
2. First Normal Form (1NF)
3. Second Normal Form (2NF)
4. Third Normal Form (3NF)
5. Boyce-Codd Normal Form (BCNF)
The purpose of Normalization
 Normalizing a logical database design involves
using formal methods to separate the data into
multiple related tables.
 The characteristics of normalised database are a
large number of tables with few columns.
 A database with only few tables and many
columns is indicative of an un-normalised or
partially normalised database.
 The benefits of a normalised relation include:
1. Faster sorting and index creation.
2. Fewer null values for data that is either not
required or not known.
3. Less opportunity for database inconsistency.
 The side effect of normalization is that the number
and complexity of joins required to retrieve data
increase.
The purpose of Normalization
cont..
 Normalization aims to avoid redundant
duplication.
 Data duplication does not always imply
redundancy. Data can be duplicated for efficiency
purposes.
 However, this duplication must be controlled.
Duplicated data is present when an attribute has
two (or more) identical values.
 A data value is redundant if you can delete it
without information being lost, so redundancy is
unnecessary duplication.
 Consider Figure 1 below:
The purpose of Normalization
cont..
(a) Part             (b) Part, after deleting "nut" from row p2
P#  P-desc           P#  P-desc
p2  nut              p2  ----
p1  bolt             p1  bolt
p3  washer           p3  washer
p4  nut              p4  nut
Duplicated, but not redundant: the deletion loses information.

(c) Supp-Part        Supp-Part, after deleting "bolt" from row (S5, P1)
S#  P#  P-desc       S#  P#  P-desc
S2  P1  bolt         S2  P1  bolt
S7  P6  bolt         S7  P6  bolt
S2  P4  nut          S2  P4  nut
S5  P1  bolt         S5  P1  ----
Redundantly duplicated data: the deletion loses no information.
Figure 1 Redundancy vs. Duplication
The purpose of Normalization
cont..
 We have eliminated redundancy by splitting the
table. The advantage of this split is that the P1
description appears only once.
 We have linked the two tables by including P# in
the two tables.

(d) Supp-Part-1      (e) Part-1
S#  P#               P#  P-desc
S2  P1               P1  bolt
S7  P6               P6  bolt
S2  P4               P4  nut
S5  P1
Figure 2, Splitting the table
The purpose of Normalization
cont..
 So far we implied that table structures which
permit redundancy could be recognized by
inspection of the table occurrence.
 This is not entirely accurate due to the fact that
attribute values are subject to change / deletion /
insertion.
 This is called deceptive appearances.
 Consider deleting the 4th row from the Supp-Part (c)
table. This results in table Supp-Part-2 (f):

(f) Supp-Part-2
S#  P#  P-desc
S2  P1  bolt
S7  P6  bolt
S2  P4  nut
The purpose of Normalization
cont..
 Inspection of (f) does not reveal any redundant
data.
 It could be even consistent with a rule: "No two
suppliers may supply the same p#".
 Hence, a snapshot of table is an inadequate guide
to presence/absence of redundant data.
 We need to know underlying rules and the DBA
must discover the rules, which apply to the
conceptual model.
 These are the functional dependency rules that
we have covered in the last lecture.
The purpose of Normalization
cont..
 The above discussion leads us to the fact that
whenever we split a table for the purpose of
reducing unnecessary duplication, we must
maintain two important properties during the
decomposition.
 The first is the lossless-join property, which
enables us to recover any instance of the original
relation from the corresponding instances of the
smaller relations.
 The second is the dependency preservation
property, which enables us to enforce a constraint
on the original relation by enforcing some
constraint on each of the smaller relations.
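The lossless-join property can be checked on the Supp-Part example by projecting the table onto the two smaller schemas and joining the projections back together. The sketch below uses plain Python sets as relations. The second split, on P-desc, is not in the notes; it is added as a counter-example of a decomposition that generates spurious rows.

```python
# Supp-Part (c) as a set of (S#, P#, P-desc) tuples.
supp_part = {
    ("S2", "P1", "bolt"),
    ("S7", "P6", "bolt"),
    ("S2", "P4", "nut"),
    ("S5", "P1", "bolt"),
}

# Split as in Figure 2: the common attribute is P#.
supp_part_1 = {(s, p) for s, p, _ in supp_part}
part_1 = {(p, d) for _, p, d in supp_part}

# Natural join of the two projections on P#.
rejoined = {
    (s, p, d)
    for s, p in supp_part_1
    for p2, d in part_1
    if p == p2
}
lossless = rejoined == supp_part

# Counter-example: splitting on P-desc instead of P#.
supp_desc = {(s, d) for s, _, d in supp_part}
part_desc = {(p, d) for _, p, d in supp_part}
bad_join = {
    (s, p, d)
    for s, d in supp_desc
    for p, d2 in part_desc
    if d == d2
}
spurious = bad_join - supp_part

print(lossless, len(bad_join), sorted(spurious))
```

The split on P# is lossless because P# determines P-desc, so P# is a key of Part-1. The split on P-desc returns all original rows plus extra ones, such as supplier S2 supplying P6, which is information that was never recorded.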
First Normal Form (1NF)
 .
The data as it first collected may or may not be
suitable to be stored in a relational table.
 In order to be able to place it in a relational table
it must have certain criteria.
 This main basic criterion is summarized by each
single cell in a relational table must only hold a
single atomic value.
 When we store such data into a table format we
refer to the table as an Unnormalized Form
(UNF).
 The creation of such a table results from the
process of transforming the data from the
information source (e.g. a sample form) into table
format with columns and rows.
First Normal Form (1NF) cont..
 .
Consider the Student_Course relational schema
and the table snapshot below:
 STU_COURSE

Sid Name Semester Courses


1 Jim fall95 comp231, comp334
1 Jim spring94 comp111
2 Alice spring94 comp111,comp211
First Normal Form (1NF) cont..
 The column “Courses” indicates that the student
“Jim” is doing two courses in fall95 and one in
spring94.
 The first and third rows have repeating groups and
hence the table is unnormalised.
 To transform the table into 1st Normal Form
(1NF) we need either to split the table into two
(as shown in the above section) or to repeat the
data for each course (this may create
unnecessary duplication, but we can get rid of
that later). The second option results in the table below:
 STU_COURSE_1NF
Sid Name Semester Course
1 Jim fall95 comp231
1 Jim fall95 comp334
1 Jim spring94 comp111
2 Alice spring94 comp111
2 Alice spring94 comp211
First Normal Form (1NF) cont..
 .
The steps of transformation from UNF into 1NF
are as follow:
1. Nominate an attribute or group of
attributes to act as the key for the
unnormalized table.
2. Identify the repeating group(s) in the
unnormalized table, which repeats for the
key attribute(s).
3. Remove the repeating group by entering
appropriate data into the empty columns
of tuples containing the repeating data
(‘flattening’ the table),
4. or by placing the repeating data along
with a copy of the original key
attribute(s) into a separate relation.
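Both ways of removing the repeating group can be sketched on the STU_COURSE snapshot. The snippet assumes the courses of a row arrive as one comma-separated string, as in the table above. The first result flattens the table, the second places the repeating data in a separate relation with a copy of the key (Sid, Semester).

```python
# STU_COURSE in unnormalized form: Courses holds several values per cell.
stu_course = [
    {"Sid": 1, "Name": "Jim", "Semester": "fall95", "Courses": "comp231, comp334"},
    {"Sid": 1, "Name": "Jim", "Semester": "spring94", "Courses": "comp111"},
    {"Sid": 2, "Name": "Alice", "Semester": "spring94", "Courses": "comp111,comp211"},
]

# Option 1: flatten, repeating the other columns once per course.
stu_course_1nf = [
    (row["Sid"], row["Name"], row["Semester"], course.strip())
    for row in stu_course
    for course in row["Courses"].split(",")
]

# Option 2: move the repeating group to its own relation, keyed by
# a copy of the original key attributes plus the course.
student_semester = sorted({(r["Sid"], r["Name"], r["Semester"]) for r in stu_course})
enrolment = [(sid, semester, course) for sid, _, semester, course in stu_course_1nf]

for row in stu_course_1nf:
    print(row)
```

The flattened result has the same five rows as STU_COURSE_1NF, and every cell now holds a single atomic value.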
Second Normal Form (2NF)
 .
The 2NF uses the concepts of FDs and prime
attribute.
 A prime attribute is an attribute that is member
of the primary key K.
 The term Full functional dependency is denoted
by a FD, such as Y → Z where removal of any
attribute from Y means the FD does not hold any
more.
 For example, using the company schema again,
consider the following.
 {SSN, PNUMBER) → HOURS is a full FD since
neither SSN → HOURS nor PNUMBER → HOURS
hold.
 {SSN, PNUMBER} → ENAME is not a full FD (it is
called a partial dependency) since SSN → ENAME
also holds.
Second Normal Form (2NF) cont..
 A relation is in 2NF if it is in 1NF and every non-
primary-key attribute is fully functionally
dependent on the primary key.
 Consider the Emp_Proj schema in figure 4.
 There are no repeating groups, nested
relationships or multi-valued attributes. Hence, the
relation Emp_Proj satisfies the conditions of 1NF.
 However, the non-prime attributes Ename,
Pname and Plocation are partially dependent on the
primary key. Hence, the relation does not satisfy
the FD condition of 2NF.
Second Normal Form (2NF) cont..
 .
We need to split the tables so that all non-prime
attributes should be fully dependent on the
primary key of the relation.

 The steps of transformation from 1NF into 2NF


are as follow:
1. Identify the primary key for the 1NF relation.
2. Identify the functional dependencies in the
relation.
3. If partial dependencies exist on the primary key
remove them by placing then in a new relation
along with a copy of their determinant.
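The three steps can be followed mechanically for Emp_Proj. The sketch below takes the primary key {Ssn, Pnumber} and the FDs read from the discussion above, finds the attributes that depend on only part of the key, and builds the resulting 2NF relations. The function names and the way the relations are printed are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = closure(part, fds) - set(key)
            if dependents:
                found[part] = dependents
    return found


emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
key = ("Ssn", "Pnumber")                      # step 1
fds = [                                       # step 2
    (("Ssn", "Pnumber"), ("Hours",)),
    (("Ssn",), ("Ename",)),
    (("Pnumber",), ("Pname", "Plocation")),
]

partials = partial_dependencies(key, fds)     # step 3

# Each partial dependency moves to a new relation with a copy of its determinant.
moved = set().union(*partials.values()) if partials else set()
relations = [sorted(emp_proj - moved)]
relations += [list(part) + sorted(deps) for part, deps in sorted(partials.items())]

for relation in relations:
    print(relation)
```

The result is one relation for the hours worked, one for the employee name and one for the project details, each with attributes that depend on the whole of its own key.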
Third Normal Form (3NF)
 This is based on the concept of transitive
dependency.
 A transitive dependency is a condition where A, B
and C are attributes of a relation such that if A →
B and B → C,
 then C is transitively dependent on A through B.
 Consider the company database.
 SSN → DMGRSSN is a transitive FD since
SSN → DNUMBER and DNUMBER → DMGRSSN
hold.
 SSN → ENAME is non-transitive since there is no
set of attributes X where SSN → X and X →
ENAME.
 A relation schema R is in third normal form (3NF)
if it is in 2NF and no non-prime attribute A in R is
transitively dependent on the primary key.
Third Normal Form (3NF) cont..
 .
This means that all non-prime attributes should
be fully and directly dependent on the primary
key.
 Consider the example in figure 6.

 The relation Emp_Dept is in 1NF since it has no


repeating groups. It is also in 2NF since all its non-
prime attributes are fully dependent on its primary
key.
 The only difference between this and the previous
example is that the attributes Dname and DmgrSSN
are indirectly dependent on the primary key SSN, but
it is still fully dependent on it.
Third Normal Form (3NF) cont..
 That is why it is in 2NF. To transform the
table into 3NF we need to get rid of this indirect
or transitive dependency.
 So we split the table into two separate tables as
shown in figure 7.
Third Normal Form (3NF) cont..
 .
The steps of transformation from 2NF into 3NF
are as follow:

1. Identify the primary key in the 2NF relation.

2. Identify functional dependencies in the relation.

3. If transitive dependencies exist on the primary


key remove them by placing them in a new
relation along with a copy of their determinant
(dominant).
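For Emp_Dept the same steps can be sketched in code. The check below flags every FD whose determinant is not a superkey and whose right-hand side contains non-prime attributes; with the single key SSN these are exactly the transitive dependencies. The FDs are read from the company example above, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_dependencies(all_attrs, key, fds):
    """FDs with a non-superkey determinant and non-prime dependent attributes."""
    found = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == set(all_attrs)
        non_prime = set(rhs) - set(key) - set(lhs)
        if not is_superkey and non_prime:
            found.append((tuple(lhs), sorted(non_prime)))
    return found


emp_dept = {"Ename", "Ssn", "Bdate", "Address", "Dnumber", "Dname", "Dmgrssn"}
key = ("Ssn",)                                             # step 1
fds = [                                                    # step 2
    (("Ssn",), ("Ename", "Bdate", "Address", "Dnumber")),
    (("Dnumber",), ("Dname", "Dmgrssn")),
]

violations = transitive_dependencies(emp_dept, key, fds)   # step 3

# Move each transitive dependency to a new relation with a copy of its determinant.
moved = {a for _, deps in violations for a in deps}
relations = [sorted(emp_dept - moved)]
relations += [list(lhs) + deps for lhs, deps in violations]

print(violations)
print(relations)
```

Dnumber stays in the employee relation as a foreign key, so the department name and manager can still be reached by a join.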
Boyce-Codd Normal Form
(BCNF)
 The above definitions of 2NF and 3NF consider the
primary key only.
 A relation may have more than one candidate key. A
relation in 3NF may still have redundancy problems, as 3NF
ignores relationships between candidate keys.
 We need a more general definition, based on
functional dependency, that takes into account all candidate
keys in a relation.
 BCNF is a stronger form than 3NF in the sense that a BCNF
relation does not suffer from anomalies even if the table has
relationships among its candidate keys.
 A relation is in BCNF if and only if every determinant is a
candidate key.
 BCNF is a stricter form of 3NF. Any 3NF relation with a
single candidate key is in BCNF.
 Violation of BCNF may occur in a relation that contains two
(or more) composite candidate keys which overlap, sharing at
least one attribute.
Boyce-Codd Normal Form
(BCNF) cont..
 Consider this scenario:
 The XYZ company provides end user
software training in Database, Network &
Spreadsheets. XYZ employs several
trainers in each of the three subjects. Each
trainer teaches only one subject; for
example, a database trainer teaches
database only. Corporate customers may
elect to purchase training contracts for one
or more subjects.
 The table in figure 8 represents a snapshot
of the Client-Staff table and its FDs.
Boyce-Codd Normal Form
(BCNF) cont..
Figure 8: the Client-Staff table and its FDs
{Client, Subject} → Staff
{Client, Staff} → Subject

Client  Subject      Staff
1001    Database     Ala
1001    Network      Sati
1002    Database     Ala
1003    Spreadsheet  Phil
1004    Database     Moh
Boyce-Codd Normal Form
(BCNF) cont..
 Figure 8 shows two candidate keys. They
are:
 {Client, Subject} → Staff & {Client, Staff} →
Subject
 These candidate keys overlap on
Client. The table is in 3NF but it still
suffers from modification anomalies.
 The FD Staff → Subject also holds (each
trainer teaches only one subject), and its
determinant Staff is not a candidate key.
 For example, consider deleting client 1004.
This will also delete the information that
“Moh” teaches Database. The same is true
for client 1001 on Network.
 The table also suffers from insertion and
update anomalies. Hence, we need to
decompose the table into two tables to
eliminate these anomalies.
Boyce-Codd Normal Form
(BCNF) cont..
 The table could be split as shown in figure
9.

Client  Staff        Staff  Subject
1001    Ala          Ala    Database
1001    Sati         Sati   Network
1002    Ala          Phil   Spreadsheet
1003    Phil         Moh    Database
1004    Moh
Figure 9: the two tables after the split
Boyce-Codd Normal Form
(BCNF) cont..
 The steps of transformation from 3NF into
BCNF are as follow:
1. Identify all candidate keys in the relation.
2. Identify all functional dependencies in the
relation.
3. If functional dependencies exist in the
relation where their determinants are not
candidate keys for the relation, remove
the functional dependencies by placing
them in a new relation along with a copy
of their determinant.
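The BCNF test, that every determinant is a candidate key, can be sketched for the Client-Staff relation. The FDs are taken from the scenario: a client has one trainer per subject, and each trainer teaches only one subject. The function names are illustrative, and the decomposition step shown handles a single violating FD, which is all this example needs.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose determinant does not determine the whole relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


client_staff = {"Client", "Subject", "Staff"}
fds = [
    (("Client", "Subject"), ("Staff",)),
    (("Staff",), ("Subject",)),
]

# {Client, Staff} is the second candidate key: its closure is the whole relation.
second_key_closure = closure({"Client", "Staff"}, fds)

violations = bcnf_violations(client_staff, fds)

# Place the violating FD in a new relation with a copy of its determinant.
lhs, rhs = violations[0]
staff_subject = set(lhs) | set(rhs)
client_staff_bcnf = client_staff - (set(rhs) - set(lhs))

print(violations, sorted(staff_subject), sorted(client_staff_bcnf))
```

The two resulting relations have the same attributes as the tables in figure 9.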
Relational DBS / Procedural
Database Languages and PL
 The aim of this lecture is to continue the
knowledge gained previously on SQL and
build on it to discuss another database
programming language that has become
very popular in recent years due to its
power and flexibility.
 This lecture will cover the following topics
briefly.
1. Programming in PL/SQL
2. The basic block structure in PL/SQL
3. Cursors in PL/SQL
Programming in PL/SQL
 PL/SQL is Oracle's procedural language.
 It combines standard SQL with a wide range
of high-level programming language
features. PL/SQL is seamlessly integrated
with Oracle's SQL implementation.
 It can be considered as a superset of SQL.
However, SQL DDL statements cannot be
included directly in PL/SQL code,
 because PL/SQL code is often stored in
compiled form and cannot refer to objects
that do not yet exist at compile time.
Programming in PL/SQL cont..
 PL/SQL enables you to control the execution of
SQL statements according to different conditions.
 It can also handle runtime errors. Features such as
block structure, various data types, assignment
and conditional statements, loops and error
handling routines give PL/SQL the power of a
third-generation programming language.
 PL/SQL allows you to write interactive, user-
friendly programs that can pass values into
variables.
 You can store procedures as compiled programs
in the database and give various users shared
access to these functions and procedures which
are usually stored at the server side.
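The control features listed above can be sketched in a small anonymous block. The variable names and the pass mark of 70 are made up for illustration, and the output is only visible if server output is enabled in the client tool (for example SET SERVEROUTPUT ON in SQL*Plus).

```sql
DECLARE
  v_mark  NUMBER := 72;        -- hypothetical value assigned to a variable
  v_grade VARCHAR2(10);
BEGIN
  -- Conditional statement: choose a result according to a condition.
  IF v_mark >= 70 THEN
    v_grade := 'Pass';
  ELSE
    v_grade := 'Fail';
  END IF;
  DBMS_OUTPUT.PUT_LINE('Grade: ' || v_grade);

  -- Loop: repeat a statement a fixed number of times.
  FOR i IN 1..3 LOOP
    DBMS_OUTPUT.PUT_LINE('Iteration ' || i);
  END LOOP;
END;
/
```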
Programming in PL/SQL cont..
 For example, these applications can run
as Web applications under a web application
server.
 PL/SQL code is grouped into structures
called blocks.
 A block can be unnamed in which case it is
known as an anonymous block.
 A block (or more) can be saved in a file and be
executed by calling the file from inside PL/SQL (or
SQL) environments.
 If the block is given a name then it is called a
sub-program. The sub-programs can be written
and stored as procedures or functions.
 These can be accessed and shared by many users
as Web applications (stored at the Web server),
stored and accessed at the client side to improve
the user interface, or as triggers or functions
attached to entry or query Oracle forms.
 The only difference between a stored procedure
and a function is that a function returns a single
value after execution.
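The three kinds of block described above might look as follows. This is a minimal sketch; the names full_name and print_full_name are hypothetical and chosen only for illustration.

```sql
-- Anonymous block: it has no name and is not stored in the database.
BEGIN
  DBMS_OUTPUT.PUT_LINE('Hello from an anonymous block');
END;
/

-- Named sub-program stored as a function: it returns a single value.
CREATE OR REPLACE FUNCTION full_name (
  p_first IN VARCHAR2,
  p_last  IN VARCHAR2
) RETURN VARCHAR2
IS
BEGIN
  RETURN p_first || ' ' || p_last;
END full_name;
/

-- Named sub-program stored as a procedure: it performs an action
-- and has no return value.
CREATE OR REPLACE PROCEDURE print_full_name (
  p_first IN VARCHAR2,
  p_last  IN VARCHAR2
)
IS
BEGIN
  DBMS_OUTPUT.PUT_LINE(full_name(p_first, p_last));
END print_full_name;
/
```

Once stored, the procedure can be called by any user with access to it, for example from another block: BEGIN print_full_name('Ala', 'Sati'); END;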
Programming in PL/SQL cont..
 A stored procedure or function should
encapsulate a logical set of commands that are
often executed sequentially.
 One of the biggest advantages of stored
procedures lies in the design of their execution.
 Without stored procedures, a client must stay in
constant communication with the server to execute
a sequence of SQL commands, which can be a major
performance problem on the network.
 Stored procedures can tremendously reduce the
communication load.
 The database server executes the SQL statements
sequentially, and output data or messages may
only be returned when the procedure is finished.
 This approach improves performance and offers
other benefits as well. Stored procedures are
compiled by the database engine (when they are
created or first used, depending on the DBMS)
rather than each time they are called.
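A sketch of how a stored procedure can replace several client calls with one is given below. The tables Enrolment and Course, their columns and the procedure name enrol_student are assumptions made for this example only.

```sql
-- Hypothetical tables: Enrolment(stu_id, course_id)
--                      Course(course_id, places_left)
CREATE OR REPLACE PROCEDURE enrol_student (
  p_stu_id    IN NUMBER,
  p_course_id IN NUMBER
)
IS
BEGIN
  -- Both statements run on the server inside a single call.
  INSERT INTO Enrolment (stu_id, course_id)
  VALUES (p_stu_id, p_course_id);

  UPDATE Course
  SET    places_left = places_left - 1
  WHERE  course_id = p_course_id;
END enrol_student;
/

-- One call from the client instead of two separate SQL commands.
BEGIN
  enrol_student(1001, 20);
END;
/
```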
Programming in PL/SQL cont..
 The compiled map is stored on the server with
the procedure.
 Therefore, you do not have to optimize SQL
statements each time you execute them, which
improves performance.
 Built-in functions are of two types. The first type
includes number functions (trigonometric or
mathematical functions).
 The second type includes conversion functions.
For example:
 TO_CHAR: converts numbers and dates to
character type
 TO_DATE: converts text to dates
 TO_NUMBER: converts text to numbers
 Many more built-in functions are available.
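The conversion functions listed above can be tried in a short anonymous block. The literal values and format masks are chosen only for illustration.

```sql
BEGIN
  -- Number to text, using a format mask.
  DBMS_OUTPUT.PUT_LINE(TO_CHAR(1234.5, '9999.99'));

  -- Text to date, then the date back to text in another format.
  DBMS_OUTPUT.PUT_LINE(
    TO_CHAR(TO_DATE('2023-01-15', 'YYYY-MM-DD'), 'DD Mon YYYY'));

  -- Text to number, so that arithmetic can be applied to it.
  DBMS_OUTPUT.PUT_LINE(TO_NUMBER('42') + 1);
END;
/
```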
The basic block structure in
PL/SQL
 PL/SQL programs are divided and written in
logical blocks of code. A block can contain DML
commands.
 Figure 1 below represents the basic syntax for a
PL/SQL block.
 A block is a logical unit of PL/SQL code,
containing at the least a PROCEDURE section and
optionally the DECLARE and EXCEPTION sections.
The basic block structure in
PL/SQL cont..
[BEGIN]                 -- beginning of block
[DECLARE]               -- variable definitions
    Declarations;
{BEGIN}                 -- beginning of procedure section
    Statements;
[EXCEPTION]             -- beginning of exception section
    Handlers;
{END;}                  -- ending of procedure section
[END]                   -- denotes ending of block
The basic block structure in
PL/SQL cont..
 The DECLARE section contains the definitions of
variables and other objects such as constants and
cursors (see later). This section is an optional
part of a PL/SQL block.
 The PROCEDURE section contains conditional
commands and SQL statements and is where the
block is controlled.
 This section is the only mandatory part of a
PL/SQL block in which statements are placed.
 The EXCEPTION section tells the PL/SQL block
how to handle specified errors and user-defined
exceptions. This section is an optional part of a
PL/SQL block.
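A small block that uses all three sections could look like the sketch below. The variable names are hypothetical; ZERO_DIVIDE is one of the exceptions predefined by Oracle.

```sql
DECLARE
  -- DECLARE section: variables used by the block.
  v_total   NUMBER := 10;
  v_count   NUMBER := 0;
  v_average NUMBER;
BEGIN
  -- PROCEDURE section: the statements to execute.
  v_average := v_total / v_count;   -- raises ZERO_DIVIDE
  DBMS_OUTPUT.PUT_LINE('Average: ' || v_average);
EXCEPTION
  -- EXCEPTION section: control passes here when the error occurs.
  WHEN ZERO_DIVIDE THEN
    DBMS_OUTPUT.PUT_LINE('Cannot divide by zero');
END;
/
```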
The basic block structure in
PL/SQL cont..
 The variables in the DECLARE section have the
same data types as in SQL, or user-defined types.
 These include Char, Number, Date, Boolean, etc.
 A variable declaration can include an assignment
of a default or initial value.
 You must initialize a variable that is defined as
NOT NULL. Each variable must be declared in its
own statement, ended with a ";". For
example, the declaration:
DECLARE
    TeachingYear NUMBER(4) NOT NULL := 2001;
(Figure 1: syntax for a basic PL/SQL block, shown earlier.)
 %TYPE is a variable attribute that gives a variable
the same data type as a given column of a table.
 Instead of hard-coding the data type in your
PL/SQL block, you can use %TYPE to maintain
data type consistency within your blocks of code.
The basic block structure in
PL/SQL cont..
 For example, see figure 2, which is a program
that picks the student last name and address
from a student table.
 The program declares variables to match the types
of the database attributes that the program will
process.
 The %type in each variable declaration means
that the variable is of the same type as the
corresponding column in the table.
 This procedure section calls variables and uses
cursors to manipulate data in the database.
 The PROCEDURE section is the main part of a
block, containing conditional statements and SQL
commands.
The basic block structure in
PL/SQL cont..
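DECLARE
    -- Each variable takes the data type of the matching Student column.
    V_Sname   Student.Lname%TYPE;
    V_Address Student.Address%TYPE;
BEGIN
    -- SELECT ... INTO expects exactly one matching row.
    SELECT Lname, Address
    INTO   V_Sname, V_Address
    FROM   Student
    WHERE  City = 'Kuala Lumpur';
EXCEPTION
    -- Any error (e.g. no row or several rows found) ends up here.
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Error Detected');
END;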
The basic block structure in
PL/SQL cont..
 The EXCEPTION section is optional; if it is
omitted and errors are encountered, the block will
be terminated.
 Some errors may be trivial, with no need to
terminate the execution of a block,
so the EXCEPTION section can be used to handle
specified errors or user-defined exceptions in an
orderly manner.
 Exceptions can be user-defined, although many
exceptions are predefined by the DBMS.
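A user-defined exception could be declared, raised and handled as in the sketch below. The names v_mark and invalid_mark and the limit of 100 are assumptions for illustration.

```sql
DECLARE
  v_mark       NUMBER := 120;
  invalid_mark EXCEPTION;              -- user-defined exception
BEGIN
  IF v_mark > 100 THEN
    RAISE invalid_mark;                -- jump to the EXCEPTION section
  END IF;
  DBMS_OUTPUT.PUT_LINE('Mark accepted');
EXCEPTION
  WHEN invalid_mark THEN
    DBMS_OUTPUT.PUT_LINE('Mark must not exceed 100');
  WHEN OTHERS THEN
    -- Catches any other error; SQLERRM returns its message text.
    DBMS_OUTPUT.PUT_LINE('Unexpected error: ' || SQLERRM);
END;
/
```

Because the error is handled, the block ends in an orderly manner instead of being terminated with an unhandled error.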
Cursors in PL/SQL
 A query execution can return zero or more
records from the target tables or views.
 This depends on how many records satisfy the
search condition(s).
 Results with multiple rows often need to be
processed individually.
 Therefore we need a pointer to process one record
at a time. This type of pointer or variable is
referred to as a Cursor.
 A database cursor is similar to the cursor on a
word processor screen.
 As you press the Down/Up arrow key, the cursor
scrolls down/up through the text one line at a
time.
Cursors in PL/SQL cont..
 You can skip to a specific position in either
direction, in the same way as keyboard keys
such as Page Up and Page Down move you
within a document. Database cursors operate in
the same way.
 One other common use of cursors is to save a
query's results for later use.
 A cursor's result set is created from the result set
of a SELECT query.
 If your application or procedure requires the
repeated use of a set of records, it is faster to
create a cursor once and reuse it several times
than to repeatedly query the database.
 The example in figure 3, represents a cursor that
has been defined to get each record that satisfies
the condition and insert it in a new table.
Cursors in PL/SQL cont..
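DECLARE
    -- Cursor over the students of one college.
    -- Address is used as the column name because ADD is a reserved word.
    CURSOR C1 IS
        SELECT Name, Tel, Address
        FROM   Student
        WHERE  College = 'ELC';
BEGIN
    -- The cursor FOR loop opens C1, fetches each row into REC and closes C1.
    FOR REC IN C1
    LOOP
        INSERT INTO ELC_Students (S_name, S_tel, S_address)
        VALUES (REC.Name, REC.Tel, REC.Address);
    END LOOP;
END;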
Cursors in PL/SQL cont..
 Cursors are controlled by three commands. These
are Open, Fetch and Close.
 The common steps to create, use, and close a
database cursor are as follows (see figure 4):
1. Create the cursor.
2. Open the cursor for use within the procedure or
application. This executes the query, retrieves
the resulting set of records, and sets the position
of the cursor to a position before the first record
in the result of the query.
3. Fetch a record. The first fetch retrieves the
first row into the program variables and sets the
cursor to point to that row. A subsequent fetch
brings the second row and so on (one row at a
time) until the end of the cursor's records.
4. Close the cursor when the last row has been
processed.
5. De-allocate the cursor to completely discard it.
Cursors in PL/SQL cont..
DECLARE
    CURSOR student_cursor IS
        SELECT stu_id, stu_name FROM students;
    -- Record variable with one field per column of the cursor.
    student_record student_cursor%ROWTYPE;
BEGIN
    OPEN student_cursor;
    LOOP
        FETCH student_cursor INTO student_record;
        -- Leave the loop once no more rows are returned.
        EXIT WHEN student_cursor%NOTFOUND;
    END LOOP;
    CLOSE student_cursor;
END;
Cursors in PL/SQL cont..
 The example in figure 4 fetches the current row
of the cursor into the aggregate variable
student_record. It uses a loop to scroll the cursor.
 The %ROWTYPE attribute declares a variable as a
record with one field for each column in
one entire row of data from the cursor.
 In this example we declare a variable called
student_record. The variable student_record has
the same data type as an entire row of data in
the student_cursor. Variables declared using the
%ROWTYPE attribute are also called aggregate
variables.
 A cursor can be created within a session, stored
procedure or a trigger. In a session the cursor
would exist until the user session ends (log off).
 A cursor created inside a stored procedure or a
trigger is good only during the execution of the
stored procedure or trigger.
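A cursor declared inside a stored procedure might look like the sketch below. The procedure name list_students is hypothetical; the students table and its columns follow figure 4.

```sql
CREATE OR REPLACE PROCEDURE list_students
IS
  -- The cursor is local to the procedure and exists only while it runs.
  CURSOR student_cursor IS
    SELECT stu_id, stu_name FROM students;
BEGIN
  -- The cursor FOR loop opens, fetches from and closes the cursor.
  FOR student_record IN student_cursor LOOP
    DBMS_OUTPUT.PUT_LINE(
      student_record.stu_id || ' ' || student_record.stu_name);
  END LOOP;
END list_students;
/
```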