
CIS 472 Database System

CSC 472 (Database Systems) is a course designed to introduce students to the principles of database design and applications, covering topics such as data models, database architecture, and SQL. The course includes various units that explore traditional record management systems, the importance of databases, architectural design, and database security, among others. Practical tasks on Microsoft Access RDBMS are also included to enhance learning and application of the concepts taught.

Uploaded by

samuelayomide032
Copyright
© All Rights Reserved

Ibadan Distance Learning Centre Series

CSC 472
(Database Systems)

By

S. O. Akinola (PhD)
Department of Computer Science,
University of Ibadan,
Ibadan, Nigeria
GENERAL INTRODUCTION AND COURSE OBJECTIVES

CSC 472 (DATABASE SYSTEMS) is a course meant to introduce students to the general
principles of database design and applications.

In this course, we study databases in an introductory context, covering the meaning of databases,
their types, data modelling, entity relationships and relational principles. In addition, Structured
Query Language (SQL) is introduced as a foundation for students to build upon.

The Course Contents


Data models; database and schema design; Basic database architecture. Relational database
systems. Relations. Conceptual modelling using ER modelling. Relational Calculus. Relational
Algebra. Schema normalization and integrity constraints; query processing; query optimization
and transactions; recovery; concurrency control; isolation and consistency; distributed, parallel,
and heterogeneous databases; adaptive databases; trigger systems; DB as a service. Database
query languages.
Semester 1, LH 30; PH 45; 4U; Status C

TABLE OF CONTENTS

Unit 1: The Traditional Record Management System


1.1 The Importance of Data
1.1.1 Differences between Data and Information
1.1.2 Forms of Data
1.2 The Manual Record Management System
1.3 Automated Data Management
1.4 The Integrated File System
1.5 Components of File Based System
1.6 Problems of File Based System

Unit 2 Meaning and Importance of Database


2.1 What is a Database?
2.2 Advantages of Databases over Integrated File System
2.3 Elements of a Database System
2.4 Contexts, Characteristics and Consequences of Database Environments
2.5 Benefits of the Database Approach
2.6 The Structure of a Database System

Unit 3 Architectural Design of a Database System


3.1 Components of a Database System
3.2 Functional Roles of the Components

Unit 4 Database Management System (DBMS)


4.1 What is a DBMS?
4.2 Functions of the DBMS
4.3 Components of the DBMS environment
4.4 Functional requirements of a DBMS

Unit 5 Data Models


5.1 The meaning of a data model
5.2 Logical Data Model
5.3 Physical Data Model
5.4 Hierarchical Data Model
5.5 Network Data Model

Unit 6 The Entity Relationship Data Model


6.1 Basic Concepts of ER Models
6.1.1 Entity Sets
6.1.2 Property
6.1.3 Relationships
6.2 ER Symbols

Unit 7 Relational Database Model


7.1 Relational Database (RD) Structure
7.1.1 Relation
7.1.2 Attribute
7.1.3 Domain
7.1.4 Tuple

7.1.5 Relational database
7.2 Properties of Relational Tables
7.3 Relational keys
7.3.1 Superkey
7.3.2 Candidate key
7.3.4 Primary key
7.3.5 Foreign key
7.4 Representing Relational Databases

Unit 8: The Relational Database Design Process


8.1 The Basic Steps in Designing a Database System
8.2 The purpose of the system
8.3 The tables that are needed in the system
8.4 Identification of fields with unique values
8.5 The relationships between tables
8.5.1 Relationship between Tables: Further Explanations
8.6 Refining the design
8.7 Entering data and create other system objects

Unit 9: Relational Database Integrity


9.1 Referential Integrity Concept
9.2 Types of Database Integrity
9.3 Relational Integrity Rules
9.4 Nulls
9.5 Entity integrity
9.6 Referential Integrity
9.7 Other Business Rules

Unit 10: Relational Calculus – An Introduction


10.1 The Relational Calculus
10.2 Relational Operators
10.3 One Table Operators
10.3.1 Restrict/Select
10.3.2 Project
10.4 Two-Table Operators
10.4.1 Cartesian Product
10.4.2 Union
10.4.3 Intersection
10.4.4 Difference
10.4.4 Divide / Quotient (÷)
10.4.5 Natural Join
10.4.6 Theta Join (θ)
10.4.7 Equi Join
10.4.7 Equi Join

Unit 11 Structured Query Language (SQL) – An Introduction


11.1 Few SQL Commands
11.1.1 DDL- Data Definition Language
11.1.2 DML - Data Manipulation Language
11.1.3 DCL- Data Control Language
11.2 Creating Tables

11.3 SELECT STATEMENT (GENERAL FORMAT)

Unit 12 Structured Query Language (SQL) – Continuation


12.1 Join, Union, Intersection Queries with Select
12.1.1 A Simple Equijoin
12.1.2 Natural Join
12.1.3 Intersection and Union for two compatible tables
12.1.4 Intersection

Unit 13 Structured Query Language (SQL) – Continuation


13.1 String Operations
13.2 Using IN in Nested Queries
13.3 Aggregate Functions in SQL
13.4 Use of GROUP BY
13.5 Use of HAVING Clause
13.6 SQL Update Operations - Putting data into the database
13.7 DELETE Operation
13.8 General Format for Inserting a New Record

Unit 14 Normalization of Relational Databases


14.1 Meaning of Database Normalization?
14.2 Basic Concepts
14.3 Database Problems without Normalization
14.3.1 Insertion Anomaly
14.3.2 Updating Anomaly
14.3.3 Deletion Anomaly
14.4 Normalization Rules
14.4.1 First (1st) Normal Form, 1NF
14.4.2 2nd Normal Form, 2NF
14.4.3 3rd Normal Form, 3NF
14.4.4 Boyce Codd Normal Form - BCNF
14.4.5 Fourth Normal Form (4NF)

Unit 15 Database Security


15.1 Meaning of Database Security
15.2 Importance of Data Security
15.3 Threats to Database
15.3.1 Major Threats to Data Security
15.3.2 Types of Threats
15.4 Integrity Controls: Backups
15.5 Aspects of Data Security
15.6 Types of Security Control on Data
15.7 Database Security Best Practices
15.8 Role of Database Administrator in Data Security
15.9 How to Secure a Database Server
15.10 Security in SQLs

Unit 16 Database Transactions and Concurrency Controls
16.1 Meaning of Database Transaction
16.2 Properties of Transaction
16.3 Concurrency Controls
16.4 The Need for Concurrency Control

Practical Tasks on Microsoft Access RDBMS

Where to type SQL statements in Microsoft Access 2007, 2010, 2013 or 2016

SQL Tutorials (Microsoft Access SQL)

Practical Tasks

Further Readings

Unit 1: The Traditional Record Management System
Expected Duration: 1 week or 2 contact hours

Introduction
This Unit introduces you to the meaning and importance of data. The traditional method of
storing data in the computer is discussed, as is the meaning of files, records and fields.
The Unit ends with the problems of file-based systems.

Learning Outcomes
When you have studied this session, you should be able to explain:
1.1 The Importance of Data
1.1.1 Differences between Data and Information
1.1.2 Forms of Data
1.2 The Manual Record Management System
1.3 Automated Data Management
1.4 The Integrated File System
1.5 Components of File Based System
1.6 Problems of File Based System

1.1 Importance of Data

Data are discrete facts about real-world phenomena. Data and their storage may be considered
to be the heart of any information system. Data has to be up to date, accurate, accessible in the
required form and available to one or perhaps many users at the same time. For data to be of
value, it must be presented in a form that supports the various operational, financial,
managerial, decision-making, administrative and clerical activities within an organization.

To meet these objectives, data needs to be stored efficiently – to avoid lengthy access times –
and with minimal duplication – to avoid lengthy update times and the possibility of
inconsistency and inaccuracy.

For the data stored by a given organization to have any value at all, it must be:
• Accurate
• Consistent
• Meaningful
• Comprehensive
• Relevant
• Timely
• Suitable

Information is processed data that is meaningful and useful to the user. It is a resource
produced by an information system that is important and essential to the operation and
management of a business or an organization.

1.1.1 Differences Between Data and Information
The following table gives the differences between data and information:

     Data                      Information
1.   Raw facts or figures      Finished figures/facts
2.   Unstructured              Structured
3.   Unprocessed               Processed
4.   What exists               What is required
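To make the distinction concrete, the short Python sketch below turns raw data (a set of exam scores) into information (a summary a lecturer could act on). The matriculation numbers, the scores and the pass mark of 40 are hypothetical values chosen only for illustration.

```python
# Raw data: hypothetical, unprocessed exam scores keyed by matriculation number.
raw_scores = {"178654": 72, "178655": 45, "178656": 38, "178657": 90, "178658": 61}

PASS_MARK = 40  # assumed pass mark, for illustration only


def summarise(scores: dict, pass_mark: int) -> dict:
    """Process raw scores into structured, decision-ready information."""
    values = list(scores.values())
    return {
        "count": len(values),
        "average": sum(values) / len(values),
        "passed": sum(1 for v in values if v >= pass_mark),
        "highest": max(values),
    }


info = summarise(raw_scores, PASS_MARK)
print(info)  # {'count': 5, 'average': 61.2, 'passed': 4, 'highest': 90}
```

The individual scores are "what exists"; the count, average, pass total and highest score are "what is required" by someone making a decision.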

1.1.2 Forms of Data

Data can take the form of text, images, audio or video.

(i) Text: This includes a series of letters, numbers and other characters whose combined
meaning does not depend on a pre-specified format. An example is a word-processed
document, which gives information when it is read and interpreted.

(ii) Images: This includes data in the form of pictures. Pictures can be graphs
generated from formatted text, photographs or hand-drawn pictures.

(iii) Audio: This includes data in the form of sounds. For example, the sound that a
doctor hears by listening to their stethoscope. By listening and interpreting the
sound, they are able to get some information on what the patient's ailment is, if any.
(iv) Video: This includes data that combines the image and audio information types. In
essence, this type of information is imparted through sounds and pictures, by viewing
and listening over a period of time. An example of this type of
information is a videoconference.

Table 1 gives an example of the data held about a student.

Table 1: A Student's Data

Datum      Meaning                      Type of Data
MatricNo   The matriculation number     Numeric or text, e.g. 178654 (UI format, purely
           of the student               numeric) or STA/2015/UG/123 (another university's
                                        format, showing the student's department, year of
                                        entry, and that the student is an undergraduate
                                        with number 123)
Name       Name of the student          Purely alphabetic (string/text), e.g. Usman
                                        Uloma Ajao
PhoneNo    Mobile phone number of       Purely numeric, e.g. 080978654321
           the student
Gender     Gender of the student        Text, e.g. Male or Female

A typical data processing function is to locate a student's record in a large
file based on the MatricNo. If this is carried out manually, a lot of time and effort will be
required from the data processing personnel.

1.2 The Manual Record Management System

The manual method of business record management involves the use of files, file jackets and
file cabinets with human beings moving files and information manually from one place or
location to another. The manual method of data processing is plagued with certain noticeable
disadvantages when compared with the use of automated systems for the same purpose. In
today’s world, the principal means of office automation among others is the computer system.

A critical look at this manual system reveals that the cabinets occupy a lot of space. The
more files or records there are, the more difficult it becomes to arrange and keep track of them,
making information retrieval slow and laborious. Security against theft, fire and
other physical hazards is also of great concern.

1.3 Automated Data Management

Automated processing includes, among others, the use of computers and other information
technology equipment, principles and practice for managing the information resources of an
individual or organisation.

1.4 The Integrated File System

Computerisation started by taking each department or section in an organisation and
computerising its operations. There was no interconnectivity of computers as we now have.
Organisational operations improved, but the system did not represent how an organisation
should work in an ideal situation. This departmental method of operation is the background
to the integrated file system, which was proposed as an approach to solving the problems of
piecemeal computer usage within an organisation.

In an integrated file system, the data are pooled into a set of interlocking and interdependent
files that are accessible by a number of different users. Some of these integrated file systems
have been tailored to meet the requirements of particular organisations, but they still
suffer from data duplication and lack proper central control.

1.5 Components of File Based System

(i) File: A file is a complete, named collection of information and the basic unit of storage
that enables a computer to distinguish one set of information from another. For
example, a file named “Aircraft” might contain information about the different types of
aircraft used by a particular company.
(ii) Records: The data held within a file are organised into structured groups of related
elements, called records, each of which describes a person or thing. For example, a record
describing an individual aircraft might be composed of the data elements
“ID_Number, Manufacturer_Name, Description, Classification, Seating_Capacity”
and so on. The aircraft file then contains zero, one or many such records, where each
record describes an individual aircraft.

(iii) Fields: The individual elements of a record are referred to as fields. Hence, in the
aircraft example, “ID_Number, Manufacturer_Name, Description,
Classification, Seating_Capacity”, etc. each represent an individual field (element) of
the aircraft record. A field is a character or group of characters that has a specific
meaning for describing data. It is also referred to as the smallest unit of information in a
record. One or more fields make up a record, while one or more records make up a file.
(iv) Data Types: The data to be held within each field of a given record will possess certain
characteristics in terms of size (length measured in characters or digits) and types
(numeric, alphabetic, dates, etc). Each field of a record is allocated a particular data
type which describes the allowed characteristics of the data to be held by the field and
further indicates the range of operations that can be carried out on the field. For
example, arithmetic operations would be valid on fields containing numeric data but
not on fields containing an address or a narrative description.
(v) Keys: A key is a field or combination of fields used to identify a record. When a key
uniquely identifies a record, it is referred to as the primary key.
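The five components can be seen together in a small Python sketch of the “Aircraft” file: the dataclass describes the record and its typed fields, the list plays the role of the file, and ID_Number acts as the key. The field values and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AircraftRecord:
    id_number: str          # key field: should identify each aircraft uniquely
    manufacturer_name: str  # alphabetic/text field
    description: str        # narrative text: arithmetic is meaningless here
    classification: str
    seating_capacity: int   # numeric field: arithmetic is valid


# The "Aircraft" file: a named collection of zero, one or many records.
aircraft_file = [
    AircraftRecord("AC-001", "Manufacturer X", "Twin-engine airliner", "Passenger", 150),
    AircraftRecord("AC-002", "Manufacturer Y", "Turboprop", "Passenger", 70),
]

# Arithmetic on a numeric field.
total_seats = sum(r.seating_capacity for r in aircraft_file)


def key_is_unique(records) -> bool:
    """True if no two records share an ID_Number (the primary-key requirement)."""
    ids = [r.id_number for r in records]
    return len(ids) == len(set(ids))


def find_by_key(records, key):
    """Return the record whose ID_Number equals key, or None."""
    return next((r for r in records if r.id_number == key), None)


print(total_seats)                                     # 220
print(find_by_key(aircraft_file, "AC-002").description)  # Turboprop
```

The data type chosen for each field decides which operations make sense: summing `seating_capacity` is meaningful, summing `description` is not.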

1.6 Problems of File Based System

(i) Even the simplest tasks required extensive programming and data manipulation, involving
highly skilled activities such as data path specification and an understanding of storage
structures.
(ii) There were no provisions for user-friendly system facilities such as queries.
(iii) Centralised control and administration of the system was difficult.
(iv) With each file having its own file management system, data access programs were
subject to change whenever the structure of the files being accessed changed, i.e.
structural dependence.
(v) Unnecessary duplication of data. The same data were stored in different files located in
different units of an organization. For instance, a change in the marital status of a staff
member would require updating several files and queries across the units in the organisation.
(vi) Data security was virtually non-existent. In addition, changes in file or data
characteristics, such as changing a field from integer to real or decimal, required changes
in all programs accessing the data. This is data dependence.
(vii) Lack of data integrity. Data or information about the same element or staff member in an
organisation could have different values when updates were not made across the board. There
were data anomalies due to inconsistencies and human handling during data transfer.
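Problems (v) and (vii) can be reproduced in a few lines of Python. In the sketch below, two departmental files (hypothetical dictionaries) each hold their own copy of a staff member's marital status, and updating only one copy leaves the organisation with inconsistent data. The file layouts, the staff ID and the function name are illustrative assumptions.

```python
# Two departmental files, each with its own copy of the same fact.
personnel_file = {"S001": {"name": "A. Bello", "marital_status": "Single"}}
payroll_file = {"S001": {"name": "A. Bello", "marital_status": "Single", "salary": 250000}}

# The personnel unit records a marriage; the payroll unit is never told.
personnel_file["S001"]["marital_status"] = "Married"


def inconsistent_staff(*files):
    """Return IDs of staff whose marital status differs between the files."""
    all_ids = set().union(*(f.keys() for f in files))
    result = []
    for staff_id in sorted(all_ids):
        statuses = {f[staff_id]["marital_status"] for f in files if staff_id in f}
        if len(statuses) > 1:
            result.append(staff_id)
    return result


print(inconsistent_staff(personnel_file, payroll_file))  # ['S001']
```

A database avoids this by storing the marital status once and letting every unit read the same copy, so a single update is seen everywhere.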

Summary
In Unit 1, you have learnt:
• Importance of Data
• The Manual Record Management System
• The Integrated File System
• Components of File Based System
• Problems of File Based System

Self Assessment Questions (SAQs)

1. What is a file?
2. Discuss the traditional file based system and its problems

Unit 2 Meaning and Importance of Database
Expected Duration: 1 week or 2 contact hours

Introduction

We assume that most people have some notion of a "database". We see databases in everyday
life: collections of CDs we can order from a company, a phonebook of names and phone numbers,
parts stocked by a supplier to be supplied to a project, records to be processed by
a program, or a general repository that a program acts upon (like a cgi-bin program acting on a
web client's behalf to read and write data to disk). This Unit introduces the reader to the
meaning of databases and the functional roles they play in organisations.

Learning Outcomes
When you have studied this session, you should be able to explain:
2.1 What is a Database?
2.2 Advantages of Databases over Integrated File System
2.3 Elements of a Database System
2.4 Contexts, Characteristics and Consequences of Database Environments
2.5 Benefits of the Database Approach
2.6 The Structure of a Database System

2.1 What is a Database?

One of the underlying ideas in modern information systems is the database, together with the
database management system (DBMS) software that manages it. A database is the storehouse
of data used by other packages. The data must be well organised so that updating, addition,
deletion, etc. are easy.

All definitions of a database centre on one basic principle: a collection of related
data that are of importance to an enterprise. It is a central store of independent data together
with a description of the data so collected and stored.

Colin Ritchie (1998) emphasized that, to be worthy of the name, a database must have two
essential properties:
(i) It holds data as an integrated system of records.
(ii) It contains self-describing information, i.e. a description of the data held in
the database, sometimes referred to as the database schema.

Stefan (1990) defines database as a structured collection of operational data together with a
description of that data. Also, according to Claude et al. (1995), “A database can be seen as a
collection of data managed by a computer, which can be accessed by several users at the same
time”.

According to C. J. Date, a database system is a computerised record-keeping system. The
question is: “What records should be kept?” The answer is records of the entities and
activities that are relevant to an enterprise or business. The records would contain
information about the status and activities of those entities in so far as they are relevant
to the business in question. The records would also be kept in data files, and these could be
data tables, queries, report formats, programs or procedures. Moreover, the database system
provides facilities for creating, updating, querying, using and administering the database.

With a bit more precision, when we use the term database, we mean a logically coherent
collection of related data with inherent meaning, built for a certain application, and representing
a "mini-world".

Let’s examine the definitions of a database in detail to understand this concept fully. The
database is a single, possibly large repository of data, which can be used simultaneously by
many departments and users. All data that is required by these users is integrated with a
minimum amount of duplication. And importantly, the database is normally not owned by any
one department or user but is a shared corporate resource.

2.2 Advantages of Databases over Integrated File System

A database system takes care of the problems identified with the integrated file system
discussed in Unit One because it is a single organised collection of structured data with a
minimum of duplication of data items so as to provide a consistent and controlled pool of data.
Databases are set up in order to meet the information needs of major parts of an organisation.
The data in a database are common to all users of the system but independent of the programs
that use them. Databases are constructed section by section, and during this process it is
possible to do the following:
(a) Add new files of data
(b) Add new fields to records already present in the database
(c) Create relationships between the items of data.
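The three operations can be tried with Python's built-in `sqlite3` module, which stands in here for a full DBMS such as Microsoft Access. This is a minimal sketch: the table and column names are hypothetical, and SQLite only enforces the relationship created in step (c) when `PRAGMA foreign_keys` is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# (a) Add new files of data (tables).
conn.execute("CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT NOT NULL)")
conn.execute("CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")

# (b) Add a new field to records already present in the database.
conn.execute("ALTER TABLE Employee ADD COLUMN Phone TEXT")

# (c) Create a relationship: each employee row can point to a department row.
conn.execute("ALTER TABLE Employee ADD COLUMN DeptID INTEGER REFERENCES Department(DeptID)")

conn.execute("INSERT INTO Department VALUES (1, 'Accounts')")
conn.execute("INSERT INTO Employee VALUES (10, 'A. Bello', '08000000000', 1)")

# The table's structure now includes the added fields.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]
print(columns)  # ['EmpID', 'Name', 'Phone', 'DeptID']

# The relationship is enforced: department 99 does not exist.
rejected = False
try:
    conn.execute("INSERT INTO Employee VALUES (11, 'C. Okafor', NULL, 99)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```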

2.3 Elements of a Database System

(i) The data, which are often structured into files, records and fields and also integrated
(linked) across different files.
(ii) The hardware for processing and doing all the manipulations on the data and also for
storage and communication in a network.
(iii) Software comprising three main levels – Operating System (single or
network/distributed user); Database Management System (DBMS) and the application
software, e.g. accounting, payroll, inventory control, personnel management systems.
(iv) Users at three levels – End users; application programmers customising applications
for users and the Database Administrator at the lowest level, related but with different
functions.
(v) A policy or organisational framework that sets out the policies within which
the database system would operate.

2.4 Contexts, Characteristics and Consequences of Database Environments

(i) A database may be implemented as a single data table or as multiple tables, multiple
tables being the more common arrangement.
(ii) Some of the data may be internal, within the machine, or external, distributed across a
distant network.
(iii) Large volumes of data are the essence of database systems.
(iv) Different storage media (hard disks, CDs, tapes) may hold parts of the data in the same
database. A database needs to be stored on a large-capacity direct access device, with a
good backup for security purposes.
(v) A database usually has multiple users who may need concurrent (simultaneous) access to
data.
(vi) Data security is required to protect all the investment made in the database against
cyber war (a situation in which one company destroys another company’s database), virus
attacks, etc.
(vii) A data recovery system is needed in case of natural or man-made disaster.
(viii) The balance between processing time and storage space requirements must be considered
in designing a database system.
(ix) Consider whether the database is static or dynamic. Static means the database will not
need any additions once it is created. Dynamic means data can be added to or deleted from it
at any time.
(x) The relative frequencies of update (adding, deleting, modifying) and retrieval operations
must also be considered.
(xi) Consider whether processing must be real time (instantaneous) or batch.

The concept of a database system becomes important in situations where persistent data
accumulate and need to be maintained over a long period. The data must also be organised,
protected and used from time to time to support the activities of a given enterprise. The data
must be pooled, integrated and shared among many, often concurrent, users in a business or an
enterprise.

2.5 Benefits of the Database Approach

(i) A database is a computerised management system. The first advantage of a database
derives from its computerisation: compactness, speed, reliability, accuracy, etc.
(ii) Centralised control of the data resources of an enterprise, which often results in
reduced data redundancy, i.e. elimination of unnecessary duplication of data.
High redundancy leads to inconsistency.
(iii) There is data pooling and sharing capacity.
(iv) Standards on the use of data can be enforced.
(v) Security restrictions on the data can be applied.
(vi) Data integrity can be maintained.
(vii) Conflicting user requirements can be balanced.

2.6 The Structure of a Database System

Figure 2.1 shows the structure of a database system. Data from different units are pooled into
a database. Application programs running on machines in different units/departments of the
organisation have access to the database via a Database Management System (DBMS) like
Microsoft Access.

[Figure: application programs in the Personnel, Sales, Accounts and Purchasing departments access, through the DBMS, a shared database holding Employees, Customers, Sales, Inventory and Accounts records.]
Figure 2.1: The structure of a Database System

Summary
This Unit briefly introduced you to database systems. The meaning and elements of a database
system were discussed, and the advantages of databases over the traditional file-based system
were also explained.

Self Assessment Questions (SAQs)


1. Attempt to define database in a concise and professional manner
2. Discuss the general characteristics and advantages of database systems.
3. Mention one or two places where databases are deployed and explain why databases are
used there.

Unit 3 Architectural Design of a Database System
Expected Duration: 1 week or 2 contact hours

Introduction

The components of a database system are discussed in this Unit. The construction of a database
system is also explained.

Learning Outcomes
When you have studied this session, you should be able to explain:
3.1 Components of a Database System
3.2 Functional Roles of the Components

3.1 Components of a Database System

As explained, a database system comprises the users, hardware and software. Each user's
interface gives access only to the part of the whole database that applies to that user, for
example the accounting or personnel aspect of the database. Users see the database arranged to
suit them, i.e. a user-specific logical view. Inside the database, however, all the
organisational data are organised in a particular manner: all the meaningful and appropriate
units of data are linked together in the database of the organisation. This is the global
logical view.

The Database Administrator (DBA) configures the Database Management System (DBMS) in
such a way that the logical data are arranged in a physical structure (how many bits and bytes,
which storage medium, etc.). End users are not expected to know these details. Figure 3.1 shows
that there is a mapping between the logical and physical views. The DBMS performs this mapping
on the instruction of the DBA.
[Figure: users at computer consoles and application packages (Accounting, Personnel, and a project application written by a programmer) work through the DBMS software, which maps logical data to physical data on the hardware; the DBA/DA configures the DBMS.]
Figure 3.1: Architectural design of a database system
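One way to picture the difference between a user's logical view and the global logical view is with SQL views. In the Python/`sqlite3` sketch below, the `Staff` table plays the role of the global view, and each view exposes only the columns a particular package needs; how the rows are laid out on disk is left entirely to the DBMS. The table, view and column names are hypothetical, and views are only one of the ways a DBMS can provide user-specific views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Global logical view: one table holding all staff data for the organisation.
conn.execute(
    "CREATE TABLE Staff (StaffID INTEGER PRIMARY KEY, Name TEXT, Department TEXT, "
    "Salary INTEGER, HomeAddress TEXT)"
)
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?, ?)", [
    (1, "A. Bello", "Sales", 250000, "12 Oke Road"),
    (2, "C. Okafor", "Accounts", 300000, "4 Ring Road"),
])

# User views: each package sees only the fields it needs.
conn.execute("CREATE VIEW AccountingView AS SELECT StaffID, Name, Salary FROM Staff")
conn.execute(
    "CREATE VIEW PersonnelView AS SELECT StaffID, Name, Department, HomeAddress FROM Staff"
)

acct_rows = conn.execute("SELECT * FROM AccountingView ORDER BY StaffID").fetchall()
personnel_cols = [d[0] for d in conn.execute("SELECT * FROM PersonnelView").description]
print(acct_rows)       # [(1, 'A. Bello', 250000), (2, 'C. Okafor', 300000)]
print(personnel_cols)  # ['StaffID', 'Name', 'Department', 'HomeAddress']
```

The accounting package never sees home addresses and the personnel package never sees salaries, yet both read the same underlying data.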

3.2 Functional Roles of the Components

1. Data Administrator (DA)


(i) Makes decisions on the database
(ii) Must understand the process of decision making
(iii) Formulates the data policies (who should have access, and when) and communicates them
to the DBA
(iv) Operates at the strategic level of management

2. Database Administrator (DBA)


(i) Fully a technical person.
(ii) Fully a computer programmer
(iii) Responsible for implementing the policies specified by the DA
(iv) Defining the conceptual schema
(v) Defining the internal schemas
(vi) Liaising with users to ensure that their needs for data are met
(vii) Writing or helping users to write the necessary individual external schemas
(viii) User training
(ix) Consulting on application design
(x) Technical support on application design
(xi) Providing system related services
(xii) Defining security and integrity checks
(xiii) Defining back-ups and recovery procedures
(xiv) Monitoring performance
(xv) Responding to changing requirements and opportunities

3. Programmers
(i) Write application programs like stock control, wage bills for end users
(ii) They are application developers

4. Database management System (DBMS)


It is the software that assists the DBA in creating the database, configuring it, monitoring and
supervising it, permitting access to it, etc. It consists of the following tools:
(i) Data Definition Language (DDL): for defining the data going into the database,
i.e. for data storage
(ii) Data Manipulation Language (DML): This is a tool that enables manipulation
of the data stored in the database for purposes of insertion, retrieval,
modification or deletion. DML is used mainly by programmers.
(iii) Data Control Language (DCL): A facility for ensuring data security and
integrity. Integrity means the data are correct; security means the data are
protected. In essence, DCL controls the activities in the database. The DBA uses it to
assign passwords and security permissions to users.
(iv) Facilities for data recovery and concurrency controls.
(v) Facilities for database performance, management and control.
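The sketch below shows DDL and DML statements issued through Python's `sqlite3` module. The `Stock` table and its contents are hypothetical. SQLite has no user accounts and therefore no DCL, so the `GRANT`/`REVOKE` statements appear only as comments, in the standard SQL form a server DBMS would accept, with `storekeeper` as a hypothetical user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data going into the database.
conn.execute("CREATE TABLE Stock (ItemCode TEXT PRIMARY KEY, Description TEXT, Quantity INTEGER)")

# DML: insert, retrieve, modify and delete stored data.
conn.execute("INSERT INTO Stock VALUES ('P100', 'A4 paper (ream)', 40)")
conn.execute("INSERT INTO Stock VALUES ('P200', 'Stapler', 3)")
conn.execute("UPDATE Stock SET Quantity = Quantity - 3 WHERE ItemCode = 'P200'")
conn.execute("DELETE FROM Stock WHERE Quantity = 0")  # removes the stapler row
rows = conn.execute("SELECT ItemCode, Quantity FROM Stock ORDER BY ItemCode").fetchall()
print(rows)  # [('P100', 40)]

# DCL (not supported by SQLite). On a server DBMS the DBA might run, for example:
#   GRANT SELECT, UPDATE ON Stock TO storekeeper;
#   REVOKE UPDATE ON Stock FROM storekeeper;
```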

5. Data Mapping Function / Program


It is a procedure, method or program for translating data between two different data models
or abstract data types (ADTs). For example, between logical and physical models of data or
between enterprise (global) view and specific application view of the data.
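As an illustration, the Python sketch below maps a logical student record (field names and values) to a fixed-width physical layout of bytes and back, using the standard `struct` module. The 43-byte layout, the field widths and the function names are hypothetical; real DBMSs use their own, far more elaborate storage formats.

```python
import struct

# Hypothetical physical layout of one record, little-endian with no padding:
#   10-byte MatricNo, 30-byte Name, 1-byte Year (0-255), 2-byte Units (0-65535)
RECORD_FORMAT = "<10s30sBH"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 10 + 30 + 1 + 2 = 43 bytes


def to_physical(record: dict) -> bytes:
    """Logical -> physical: pack named fields into fixed-width bytes."""
    return struct.pack(
        RECORD_FORMAT,
        record["MatricNo"].encode("ascii"),  # shorter strings are padded with zero bytes
        record["Name"].encode("ascii"),      # longer strings would be silently truncated
        record["Year"],
        record["Units"],
    )


def to_logical(raw: bytes) -> dict:
    """Physical -> logical: unpack bytes into named fields again."""
    matric, name, year, units = struct.unpack(RECORD_FORMAT, raw)
    return {
        "MatricNo": matric.rstrip(b"\x00").decode("ascii"),
        "Name": name.rstrip(b"\x00").decode("ascii"),
        "Year": year,
        "Units": units,
    }


student = {"MatricNo": "178654", "Name": "Usman Uloma Ajao", "Year": 4, "Units": 24}
raw = to_physical(student)
print(len(raw))                    # 43
print(to_logical(raw) == student)  # True
```

The user works only with the named fields; the byte positions and widths are the physical details the mapping hides.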

6. Database Users
There are three main categories of database users. They are:
(i) Application Programmers: These are expert programmers who write database
applications. These programs make use of query operations to support the database
end users.
(ii) End Users: These employ query languages (programs or packages with user-friendly,
question-like statements, e.g. LIST ALL EMPLOYEES FROM OYO
STATE or DISPLAY ALL EMPLOYEES HAVING SURNAME BEGINNING
WITH ‘A’) provided as an integrated part of the DBMS. They can also use
application programs that accept commands from the terminal and in
turn issue requests to the DBMS. End-user activities are mostly queries.
(iii) Database Administrators: The people responsible for the overall control and
maintenance of the database system.
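The two question-like requests above correspond to ordinary SQL `SELECT` statements. The Python/`sqlite3` sketch below runs them against a hypothetical `Employee` table with invented rows. Note that SQLite's `LIKE` uses `%` as the wildcard, whereas Microsoft Access, in its default query mode, uses `*`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Surname TEXT, State TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (1, "Adeyemi", "Oyo"),
    (2, "Okafor", "Anambra"),
    (3, "Akande", "Lagos"),
    (4, "Bello", "Oyo"),
])

# "LIST ALL EMPLOYEES FROM OYO STATE"
from_oyo = conn.execute(
    "SELECT EmpID, Surname FROM Employee WHERE State = ? ORDER BY EmpID", ("Oyo",)
).fetchall()

# "DISPLAY ALL EMPLOYEES HAVING SURNAME BEGINNING WITH 'A'"
# '%' matches any sequence of characters after the leading 'A'.
surname_a = conn.execute(
    "SELECT EmpID, Surname FROM Employee WHERE Surname LIKE ? ORDER BY EmpID", ("A%",)
).fetchall()

print(from_oyo)   # [(1, 'Adeyemi'), (4, 'Bello')]
print(surname_a)  # [(1, 'Adeyemi'), (3, 'Akande')]
```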

3.3 Constructing / Building a Database System

Data as facts only become useful when they are organised, processed and presented in human
understandable forms. Processed business facts provide information the business or
organisation needs to move forward.

A survey of the records an organisation deals with, the form of requests for those records and
the life span of such records are the most important factors in any database system. The
following points must be noted:
(a) The kinds of records to be maintained. Records come in many sizes and shapes with
different kinds of information. The methods for filing general correspondence will
definitely differ from the system for filing maps, blueprints, adverts, job
applications, Local Purchasing Orders (LPOs), quotations, etc.
(b) The nature of data or record requests. The frequency of requests and the speed of retrieval
are of utmost importance in the filing system, because the ability to retrieve records
on time might spell the difference between winning and losing contracts.
(c) The volume of records to be maintained. For reports that are in daily use and frequently
enter the organisational record pool, the equipment needed for filing must be
durable and strong enough to handle such a volume of information.
(d) How long the files will be kept. Records that must be kept permanently for the
rare instances when they may be requested need not be kept in active files, where they
would interfere with the fast retrieval of other, more frequently requested records.
Inactive files have to be removed over time; this process
is called transferring.
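The transferring step in (d) can be sketched with Python's `sqlite3` module: records not requested since a cut-off date are copied to an archive table and then removed from the active table, inside one transaction so that a failure cannot leave a record in both places or in neither. The table names, dates and cut-off are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("ActiveFile", "ArchiveFile"):
    conn.execute(f"CREATE TABLE {table} (RecID INTEGER PRIMARY KEY, Title TEXT, LastRequested TEXT)")

conn.executemany("INSERT INTO ActiveFile VALUES (?, ?, ?)", [
    (1, "LPO 2015/001", "2015-03-10"),
    (2, "Quotation Q-88", "2024-11-02"),
    (3, "Job application J-12", "2016-07-21"),
])
conn.commit()


def transfer_inactive(conn, cutoff: str) -> int:
    """Move records last requested before `cutoff` (ISO date) to the archive."""
    with conn:  # commits both statements together, or rolls both back on error
        conn.execute(
            "INSERT INTO ArchiveFile SELECT * FROM ActiveFile WHERE LastRequested < ?", (cutoff,))
        cur = conn.execute("DELETE FROM ActiveFile WHERE LastRequested < ?", (cutoff,))
    return cur.rowcount


moved = transfer_inactive(conn, "2020-01-01")
active = [r[0] for r in conn.execute("SELECT RecID FROM ActiveFile ORDER BY RecID")]
archived = [r[0] for r in conn.execute("SELECT RecID FROM ArchiveFile ORDER BY RecID")]
print(moved, active, archived)  # 2 [2] [1, 3]
```

Dates are stored as ISO `YYYY-MM-DD` text, so a plain text comparison orders them correctly.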

Summary
In this Unit you were introduced to the architectural design of a database system. The
different stakeholders in a database system were also highlighted. The Unit ends with the
factors to consider when building a database system.

Self Assessment Question (SAQ)


With a suitable diagram, explain the architectural design of a database system.

Unit 4 Database Management System (DBMS)
Expected Duration: 1 week or 2 contact hours

Introduction

A database management system (DBMS) is software that allows databases to be defined,
constructed, and manipulated. In this Unit, the meaning of DBMS, its functions in a database
environment and its functional requirements shall be discussed.

Learning Outcomes

When you have studied this session, you should be able to explain:
4.1 The meaning of DBMS
4.2 Functions of the DBMS
4.3 Components of the DBMS environment
4.4 Functional requirements of a DBMS

4.1 What is DBMS?

The Database Management System (DBMS) is a special software package that is used to
interact with the database. It is used to define the data, to design and query the database, and
to update it. It contains a variety of facilities including a Data Definition
Language (DDL) to create and modify the database structures – files, users and their privileges;
a query language which supports all forms of retrieval and updating; and numerous interfaces
to liaise with the operating system, telecommunication system, programming languages and
other utility software. It also contains data validation routines and maintains a data dictionary
– a complete description of the database structure and contents. Examples of DBMS are dBase,
MS Access, MySQL, SQL Server, Oracle, etc.
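To make these facilities concrete, the sketch below uses SQLite through Python's standard sqlite3 module; SQLite is our own choice of DBMS for illustration and is not one the text prescribes. The table branch and its rows are hypothetical. One statement uses DDL to define structure, a few use the query language to insert, update and retrieve data, and the last reads SQLite's built-in sqlite_master catalog, which plays the role of a data dictionary.

```python
import sqlite3

# An in-memory SQLite database stands in for a full DBMS in this sketch.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of the database (hypothetical table).
conn.execute("CREATE TABLE branch (branchNo TEXT PRIMARY KEY, city TEXT)")

# Query language: insert, update and retrieve data.
conn.execute("INSERT INTO branch VALUES ('B001', 'Ibadan'), ('B002', 'Apapa')")
conn.execute("UPDATE branch SET city = 'Lagos' WHERE branchNo = 'B002'")
rows = conn.execute("SELECT branchNo, city FROM branch ORDER BY branchNo").fetchall()
print(rows)  # [('B001', 'Ibadan'), ('B002', 'Lagos')]

# Data dictionary: SQLite describes its own schema in the sqlite_master catalog.
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # [('branch',)]
```

Other DBMSs expose the same ideas under different names; for instance, many provide an information_schema for their catalog.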

In summary, the Database Management System is a complex software system that is used to
construct, expand and maintain a database. It regulates data access in the shared database. In
essence, a DBMS is a software system that enables users to define, create, and maintain the
database and also provides controlled access to this database.

4.2 Functions of the DBMS

(i) It is conceptually the super operating system in a database system.
(ii) It serves as an interface between the database and its users.
(iii) It allocates storage space to data.
(iv) It stores the definitions of data relationships (metadata) in a data dictionary, which makes
database updates and maintenance largely automatic. Hence, the DBMS reduces data
dependence and redundancy.
(v) It keeps frequently used data in readily accessible forms, thereby saving time.
(vi) All user transactions on the system are handled by the DBMS.
(vii) It shields the users from the hardware-level details of the system.
(viii) It provides security to the data in the database.
(ix) It handles the transformation of data as it moves from one level to another.
(x) It initiates actual input/output (I/O) operations and coordinates them before the host
operating system performs them.

4.3 Components of the DBMS environment


We can identify five major components in the DBMS environment: hardware, software, data,
procedures, and people:
(i) Hardware. The computer system(s) that the DBMS and the application programs
run on. This can range from a single PC, to a single mainframe, to a network of
computers.
(ii) Software. The DBMS software and the application programs, together with the
operating system, including network software if the DBMS is being used over a
network.
(iii) Data. The data acts as a bridge between the hardware and software components
and the human components. As we’ve already said, the database contains both the
operational data and the meta-data (the ‘data about data’).
(iv) Procedures. The instructions and rules that govern the design and use of the
database. This may include instructions on how to log on to the DBMS, make
backup copies of the database, and how to handle hardware or software failures.
(v) People. This includes the database designers, database administrators (DBAs),
application programmers, and the end-users.

4.4 Functional Requirements of a DBMS

1. Links between Data: A database is based on a data model whose specific aim is to
define the way data items represented in the system are structured and the links that can
be established between those data items.
2. Data Consistency: The stored data must be consistent with reality.
3. Ease of Data Access: A DBMS must allow any data item in the database to be accessed
easily.
4. Data Security: A DBMS must be capable of protecting the data it manages against
unauthorised access and other external threats.
5. Data Sharing: The DBMS must provide means for managing data sharing among
several applications.
6. Data Independence: An application that handles data using a file system is strongly
dependent on the structure of its data. The application must know how the files are
structured and the method for accessing them. In contrast, a DBMS should allow
applications to be written without the programmer having to worry about the physical
data and the associated access methods. Thus the system can evolve to take account of
new needs without disturbing applications that have already been written. Data
independence is a concept linked with the evolution and maintenance of an application.
Any factor that makes it easier to develop future versions of an application, data
independence in particular, represents a potentially large cost saving; this is one of the
essential benefits of a DBMS.
7. Performance: The above functional requirements or constraints must be realised
without detriment to the system’s overall performance.

4.5 Advantages and Disadvantages of DBMSs

(i) Control of data redundancy. The database approach eliminates redundancy where
possible. However, it does not eliminate redundancy entirely, but controls the amount
of redundancy inherent in the database. For example, it’s normally necessary to
duplicate key data items to model relationships between data, and sometimes it’s
desirable to duplicate some data items to improve performance. The reasons for
controlled duplication will become clearer when you read the Units on database design.

(ii) Data consistency. By eliminating or controlling redundancy, we’re reducing the risk of
inconsistencies occurring. If data is stored only once in the database, any update to its
value has to be performed only once and the new value is immediately available to all
users. If data is stored more than once and the system is aware of this, the system can
ensure that all copies of the data are kept consistent. Unfortunately, many of today’s
DBMSs don’t automatically ensure this type of consistency.

(iii) Sharing of data. In a file-based approach (the predecessor to the DBMS approach),
typically files are owned by the people or departments that use them. On the other hand,
the database belongs to the entire organization and can be shared by all authorized users.
In this way, more users share more of the data. Furthermore, new applications can build
on the existing data in the database and add only data that is not currently stored, rather
than having to define all data requirements again. The new applications can also rely
on the functions provided by the DBMS, such as data definition and manipulation, and
concurrency and recovery control, rather than having to provide these functions
themselves.

(iv) Improved data integrity. As already stated, database integrity is usually expressed in
terms of constraints, which are consistency rules that the database is not permitted to
violate. Constraints may apply to data within a single record or they may apply to
relationships between records. Again, data integration allows users to define, and the
DBMS to enforce, integrity constraints.
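A minimal sketch of both kinds of constraint, assuming SQLite via Python's sqlite3 module: a CHECK constraint is a rule within a single record, and a foreign key is a rule between records. The table layouts and the salary rule are hypothetical, loosely based on the Branch and Staff relations used in Unit 7. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE branch (branchNo TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE staff (
        staffNo  TEXT PRIMARY KEY,
        salary   REAL CHECK (salary > 0),               -- rule on a single record
        branchNo TEXT REFERENCES branch (branchNo)      -- rule between records
    )""")
conn.execute("INSERT INTO branch VALUES ('B001')")

def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"

print(try_insert(("S23", 250000, "B001")))  # accepted
print(try_insert(("S30", -5, "B001")))      # rejected, because it breaks the CHECK rule
print(try_insert(("S31", 90000, "B999")))   # rejected, because branch B999 does not exist
```

The exact wording of the rejection message varies between SQLite versions, which is why only the outcome is noted in the comments.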

(v) Improved maintenance through data independence. Since a DBMS separates the data
descriptions from the applications, it helps make applications immune to changes in the
data descriptions. This is known as data independence and its provision simplifies
database application maintenance.
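One common way a DBMS supports data independence is through views: the application queries a view, and when the stored tables are reorganised, only the view definition changes. The sketch below assumes SQLite via Python's sqlite3 module; the names staff_view, surname and initials are hypothetical. It builds the same view over two different storage designs and runs one unchanged application query against both.

```python
import sqlite3

# The application's query names only the view, never the stored table.
APP_QUERY = "SELECT staffNo, name FROM staff_view"

def build_v1(conn):
    """Original storage design: the name is held in a single column."""
    conn.execute("CREATE TABLE staff (staffNo TEXT PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO staff VALUES ('S23', 'Akinola S. O.')")
    conn.execute("CREATE VIEW staff_view AS SELECT staffNo, name FROM staff")

def build_v2(conn):
    """Reorganised design: the name is split into surname and initials."""
    conn.execute("CREATE TABLE staff (staffNo TEXT PRIMARY KEY, surname TEXT, initials TEXT)")
    conn.execute("INSERT INTO staff VALUES ('S23', 'Akinola', 'S. O.')")
    # Only the view changes; it rebuilds the column the application expects.
    conn.execute("CREATE VIEW staff_view AS "
                 "SELECT staffNo, surname || ' ' || initials AS name FROM staff")

results = []
for build in (build_v1, build_v2):
    conn = sqlite3.connect(":memory:")
    build(conn)
    results.append(conn.execute(APP_QUERY).fetchall())

print(results)  # [[('S23', 'Akinola S. O.')], [('S23', 'Akinola S. O.')]]
```

Both designs return the same result, so the application needs no change when the storage is reorganised.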

Other advantages include: improved security, improved data accessibility and responsiveness,
increased productivity, increased concurrency, and improved backup and recovery services.

There are, however, some disadvantages of the database approach, such as:
(i) Complexity. As already mentioned, a DBMS is an extremely complex piece of
software, and all users (database designers and developers, DBAs, and end-users)
must understand the DBMS’s functionality to take full advantage of it.

(ii) Cost of DBMS. The cost of DBMSs varies significantly, depending on the
environment and functionality provided. For example, a single-user DBMS for a
PC may cost hundreds of thousands of Naira, whereas a large mainframe multi-user DBMS
servicing hundreds of users can be extremely expensive, perhaps millions of Naira.

There is also the recurrent annual maintenance cost, which is typically a percentage
of the list price.

(iii) Cost of conversion. In some situations, the cost of the DBMS and any extra
hardware may be insignificant compared with the cost of converting existing
applications to run on the new DBMS and hardware. This cost also includes the cost
of training staff to use these new systems, and possibly the employment of specialist
staff to help with the conversion and running of the system. This cost is one of the
main reasons why some companies feel tied to their current systems and cannot
switch to more modern database technology. The term legacy system is sometimes
used to refer to an older, and usually inferior, system (such as file-based,
hierarchical, or network systems).

(iv) Performance. Typically, a file-based system is written for a specific application,
such as invoicing. As a result, performance is generally very good. However, a
DBMS is written to be more general, to cater for many applications rather than just
one. The effect is that some applications may not run as fast using a DBMS as they
did before.

(v) Higher impact of a failure. The centralization of resources increases the
vulnerability of the system. Since all users and applications rely on the availability
of the DBMS, the failure of any component can bring operations to a complete halt
until the failure is repaired.

Summary
In this Unit you have been introduced to the meaning of DBMS, its functions in a database
environment and its functional requirements. All access to the database is through the DBMS.
The DBMS provides facilities that allow users to define the database, and to insert, update,
delete, and retrieve data from the database. The DBMS environment consists of hardware (the
computer), software (the DBMS, operating system, and applications programs), data,
procedures, and people. The people include database administrators (DBAs), database
designers, application programmers, and end-users.

Self Assessment Questions (SAQs)

1. What is a DBMS? Give five functions of a DBMS.
2. Give three examples of DBMSs you are familiar with.
3. What are the functional requirements of a DBMS?
4. What are the merits and demerits of using DBMS in organisations?

Unit 5 Data Models
Expected Duration: 1 week or 2 contact hours

Introduction

A database model depicts how the data in a database are stored or arranged. It provides the
technique that supports the conceptualisation of the database. The model defines the
following:
(i) Rules which bind the relationship among data.
(ii) Constraints among data
(iii) Meaning and interpretation of data, and
(iv) The way data is used.

The different types of database models are discussed in this Unit.

Learning Outcomes
When you have studied this session, you should be able to explain:
5.1 The meaning of a data model
5.2 Logical Data Model
5.3 Physical Data Model
5.4 Hierarchical Data Model
5.5 Network Data Model

5.1 What is a data model?

A data model can be defined as a way of thinking about or conceptualising, organising and
relating data. It is an integrated collection of concepts for describing data, relationships between
data, and constraints on the data used by an organization.

A model is a representation of ‘real world’ objects and events, and their associations. It
concentrates on the essential, inherent aspects of an organization and ignores the accidental
properties. A data model attempts to represent the data requirements of the organization, or
the part of the organization, that you wish to model. It should provide the basic concepts and
notations that will allow database designers and end-users to communicate their understanding
of the organizational data unambiguously and accurately. A data model can be thought of as
comprising three components:

(1) a structural part, consisting of a set of rules that define how the database is to be
constructed;
(2) a manipulative part, defining the types of operations (transactions) that are allowed on
the data (this includes the operations that are used for updating or retrieving data and
for changing the structure of the database);
(3) possibly a set of integrity rules, which ensure that the data is accurate.

Each data model is characterised by the set of concepts, definitions or symbols for representing
data as well as the rules that must be followed by anyone who wishes to employ the data model
for organising and relating data.

Essentially, the purpose of a data model is to represent data and to make the data
understandable. If it does this, then it can be easily used to design a database.

5.2 Logical Data Model

It is a model for representing or organising data in a way that is meaningful to the users of
the data, for example as a hierarchy in the hierarchical data model, or as tables (relations) in
the relational database model. This is the model through which users view data logically.

5.3 Physical Data Model

It is a model for organising data on storage media such as disks or tapes; it is concerned with
how data are actually stored on storage devices. It is devised to take advantage of the processing
speed and storage space opportunities, or constraints, of the physical media, as well as the search
requirements and the frequency of update and retrieval of data. The idea is to optimise how data
are stored.

5.4 Hierarchical Data Model

In the hierarchical data model, data with a complex structure are given a strict top-down (tree)
order, and the subclasses in the hierarchy are records. Access to any record is unidirectional: a
record can only be accessed through another record that has a direct relationship with it. For
example, student records can be accessed through a department, faculty or college. From Figure
5.1, we cannot access Microeconomics records directly unless we go through Social Sciences and
then Economics. A strict hierarchy is imposed. One problem here is that lateral access to data is
impossible. A record has only one parent, but the parent may have children and grandchildren.
Other disadvantages of this model are rigidity (only one direction of access to a record), wasted
time and inefficiency.

[Figure: hierarchy with Social Sciences at the root; Economics, Geography and Psychology below it;
Macroeconomics, Microeconomics and Rural Economics at the lowest level]

Figure 5.1: The hierarchical data model
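The one-way, top-down access path can be sketched in Python. This is an illustration only, not how a hierarchical DBMS actually stores records: HIERARCHY and find_path are hypothetical names, and the tree assumes Macroeconomics and Microeconomics both sit under Economics (Rural Economics is left out because its parent is not clear from Figure 5.1). Every search starts at the root and can only move down from parent to child.

```python
# Hypothetical tree built from Figure 5.1: each record has exactly one parent.
HIERARCHY = {
    "Social Sciences": {
        "Economics": {
            "Macroeconomics": {},
            "Microeconomics": {},
        },
        "Geography": {},
        "Psychology": {},
    }
}

def find_path(tree, target, path=()):
    """Return the root-to-record access path for target, or None if absent.

    Only parent-to-child moves are made, mirroring the unidirectional access
    of the hierarchical model; there is no sideways link between siblings.
    """
    for name, children in tree.items():
        current = path + (name,)
        if name == target:
            return current
        found = find_path(children, target, current)
        if found is not None:
            return found
    return None

print(find_path(HIERARCHY, "Microeconomics"))
# ('Social Sciences', 'Economics', 'Microeconomics')
```

Reaching Microeconomics always costs a walk through Social Sciences and Economics, which is the rigidity the text describes.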

5.5 Network Data Model

In this model, records can be accessed in multiple ways. Records can have multiple parents and
parents can have multiple children. Networks are also referred to as multiple hierarchies,
superimposed on one another. Access to a record could be through more than one path.

[Figure: National Budget at the top; Federal Govt and State Govt below it; Local Govt below them;
Towns, Suburbs and Villages at the lowest level]

Figure 5.2: Network Data Model
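By contrast, a network lets a record have several parents, so more than one access path can lead to it. The sketch below assumes, from Figure 5.2, that Local Govt is linked to both Federal Govt and State Govt; LINKS and all_paths are hypothetical names used only for illustration, and the graph is assumed to contain no cycles.

```python
# Hypothetical owner-to-member links read from Figure 5.2.
# Local Govt has two parents, which a strict hierarchy would not allow.
LINKS = {
    "National Budget": ["Federal Govt", "State Govt"],
    "Federal Govt": ["Local Govt"],
    "State Govt": ["Local Govt"],
    "Local Govt": ["Towns", "Suburbs", "Villages"],
}

def all_paths(start, target, path=None):
    """Enumerate every access path from start to target (graph assumed acyclic)."""
    path = (path or []) + [start]
    if start == target:
        return [path]
    paths = []
    for child in LINKS.get(start, []):
        paths.extend(all_paths(child, target, path))
    return paths

for p in all_paths("National Budget", "Villages"):
    print(" -> ".join(p))
# National Budget -> Federal Govt -> Local Govt -> Villages
# National Budget -> State Govt -> Local Govt -> Villages
```

The extra path is the flexibility of the network model; choosing among such paths as the structure grows is the navigation problem mentioned below.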

The challenge with the network data model is that it becomes very complex to design as more
objects become involved. Users find it difficult to follow the relationships between data, so it is
not user friendly. It is also expensive to design, and users face the problem of navigating
through the best access path.

Other classical data models are treated in the next two units.

Summary

Underlying the structure of a database is the data model: a collection of tools for describing
data, data relationships, data semantics and constraints. Some classical data models have been
explained in this unit.

Self Assessment Questions (SAQs)

1. What is a data model? Give 3 examples of data model.


2. Discuss the structure, advantages and disadvantages of hierarchical and network data
models.

Unit 6 The Entity Relationship Data Model
Expected Duration: 1 week or 2 contact hours

Introduction

The Entity Relationship (ER) model was proposed by Chen (1976). It employs three basic
notions: Entity set, relationship set and attributes. The starting point for designing a database
for an organisation is the ER Modelling where all the entities and their properties or attributes
are obtained. An ER diagram (model) is then drawn to depict the relationships existing between
the entities identified. In this Unit, the reader is introduced to the concept of ER modelling.

Learning Outcomes
When you have studied this session, you should be able to explain:
6.1 Basic Concepts of ER Models
6.1.1 Entity Sets
6.1.2 Property
6.1.3 Relationships
6.2 ER Symbols

6.1 Basic Concepts of ER Models

6.1.1 Entity Sets

An entity is any distinguishable object (concrete or abstract) that exists. For instance, a student
is an entity in a school, which is itself an entity. An entity has a set of properties, and the values
of some of these properties may uniquely identify an entity. For example, a student's
matriculation number uniquely identifies the student in a school.

A group consisting of all similar entities forms an entity set, e.g., the set of all persons who are
students in a school can be defined as the entity set student. It is possible for entity sets to
overlap, e.g. a school may have an entity set of its employees (employee) and an entity set of
all its students (student). A person entity may be an employee entity, a student entity, both or
neither.

We have:
(i) Regular entity: an entity that is not weak, i.e., does not depend on any other entity.
(ii) Weak entity: an entity whose existence depends on another entity. For example,
dependants or children of staff of an organisation. The staff of an organisation is a
regular entity. Similarly, next of kin in the context of students is a weak entity.

An entity may have sub-types. For example, students may have undergraduate, postgraduate
or Diploma as sub-types. Note that sub-types are not properties of students; they are
specialised kinds of student. One can never be an undergraduate without being a student.
Similarly, junior staff and senior staff are sub-types of staff.

6.1.2 Property

This is a piece of information that describes an entity. Each type of property draws its value
from a value set or domain (i.e. all possible values). In the ER model:
(i) Property can be simple or composite. A composite property is like the name of a
person (first, middle, last or surname). Contact address can be composite if we have
street, office and telephone addresses.
(ii) Property can be key or non-key. A key property uniquely identifies an instance of the
entity. For example, Matric. Number. An instance of an entity is each member of the
entity. An instance of student is a particular student.
(iii) Property can be single or multi-valued. Multi-valued means the property can assume
more than one value at an instant of time, e.g., somebody may have many aliases or
addresses at the same time.
(iv) A property can be base or derived. A base property holds an original (stored) value,
whereas a derived property is computed from other properties. For instance, Total_pay
is a property derived from other payments such as development levy and school fees paid.
(v) A property can be null. A null is used when an entity does not have a value for the
property or attribute, for example when the property is not applicable to the entity:
the number_of_children for an unmarried employee can be null. Null can also mean
that the property value is unknown. An unknown value may either be missing (the value
exists, but we do not have the information) or not known (we do not know whether or
not the value exists).
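These property kinds map naturally onto a small Python class. The sketch below is only an analogy, not part of the ER notation: the Student class, its field names and the sample values are hypothetical. It shows a key property, a composite name, a multi-valued property, base properties, a derived property computed on demand, and None standing in for null.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Student:
    matric_no: str                       # key property: identifies one instance
    first_name: str                      # first, middle and last name together
    last_name: str                       #   form the composite property "name"
    middle_name: Optional[str] = None    # None stands in for null (not applicable)
    addresses: list[str] = field(default_factory=list)  # multi-valued property
    development_levy: float = 0.0        # base properties: stored values
    school_fees: float = 0.0

    @property
    def total_pay(self) -> float:
        """Derived property: computed from base properties, never stored."""
        return self.development_levy + self.school_fees

# Hypothetical student record for illustration.
s = Student("CSC/001", "Ada", "Okafor", addresses=["Ibadan", "Lagos"],
            development_levy=5000.0, school_fees=45000.0)
print(s.total_pay)    # 50000.0
print(s.middle_name)  # None
```

Because total_pay is recomputed on each access, it can never disagree with the base values it is derived from.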

6.1.3 Relationships

Relationship is a property that links every entity in the ER model together. An entity can be
involved in tow relationships. At an instant of an enterprise, some entities may not relate with
one another at all. Only entities can be linked but not properties in relationship.

A relationship is an association among several entities. A relationship set is a set of
relationships of the same type. Formally, a relationship set is a mathematical relation on n ≥ 2
(possibly non-distinct) entity sets. Given entity sets E1, E2, ..., En, a relationship set R is a subset
of their Cartesian product:

$$R \subseteq \{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1,\ e_2 \in E_2,\ \ldots,\ e_n \in E_n\}$$

where each tuple (e1, e2, ..., en) is a relationship.

Consider two entities student and game. A relationship plays can be defined to denote the
association between students and the games they play. The association between entity sets is
referred to as participation, that is, the entity sets E1, E2, ..... En participates in relationship set
R.
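The subset definition can be checked directly with Python sets. In this sketch the members of the entity sets and the pairs in plays are hypothetical; the code tests whether the relationship set is contained in the Cartesian product of the entity sets, as the formal definition requires.

```python
from itertools import product

# Hypothetical entity sets (members invented for illustration).
students = {"Ade", "Bola", "Chidi"}   # entity set E1
games = {"football", "chess"}         # entity set E2

# Relationship set "plays": each pair (e1, e2) is one relationship.
plays = {("Ade", "football"), ("Ade", "chess"), ("Bola", "chess")}

# R must be a subset of the Cartesian product E1 x E2.
is_valid = plays <= set(product(students, games))
print(is_valid)  # True

# A pair naming something outside E2 breaks the definition.
print({("Bola", "tennis")} <= set(product(students, games)))  # False
```

Each pair has two components, so plays is a binary relationship set (degree 2).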

The function that an entity plays in the relationship is called the entity role. The roles of entity
sets participating in a relationship are not usually specified if the participating entities are
distinct. But in a recursive relationship, where the entities are not distinct, it is necessary to
specify the roles of the participating entities. For example, in the recursive relationship works
for between employee entities, the first employee of a pair takes the role of manager, whereas
the second takes the role of worker.

A relationship may also have descriptive attributes/properties. For example, we could associate
the attributes session and score with the relationship took between the student and course entity sets.

The number of entity sets that participate in a relationship set is referred to as the degree of
the relationship set. A binary relationship is of degree 2; a ternary relationship is of degree 3.
Relationship sets of degree greater than 2 are usually referred to as n-ary relationship sets.
Figure 6.1 shows that one (1) student offers many (M) courses.

[Figure: Student (1) linked through the relationship Offers to Course (M); property ellipses
include Name, DBirth, Gender, Code, Title and Unit]

Figure 6.1: ER model for Student and Course Entities

Participation can be partial or total. It is partial if only some instances of entity set E1
participate in a relationship with instances of E2. For example, not all students may take a
course, i.e., some instances of student may never take a course.

It is total if every instance of the entity set must participate; for example, if every student must
offer a course, then the participation of student in the relationship is total. If some course may
not be taken by any student, then the participation of course in the relationship is partial.
Furthermore, we can have the following different types of relationships:

(i) One-to-One Relation. One member of an entity set E1 participates with only one
member of E2, e.g. one student takes one project and one project is taken by only one
student.

[Diagram: Course 1 : 1 Student]

(ii) One-to-Many Relation. One member of E1 can be related to many members of E2,
e.g. one student can take many courses. If, in addition, one course can be taken by
many students, the relationship is many-to-many (M:N); many-to-many is the general
form, of which one-to-many is a special case.

[Diagram: Course M : 1 Student]
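Participation and cardinality can both be read off a relationship set once it is written as pairs. The sketch below uses hypothetical names (participation, cardinality) and invented data. Keep in mind that it only inspects one snapshot of the data, whereas in ER design these are rules that must hold for every possible state of the database. Cardinality is reported as E1:E2, so "1:M" means one member of E1 relates to many members of E2.

```python
from collections import defaultdict

def participation(entity_set, pairs, position):
    """'total' if every member of entity_set appears at `position` in some pair."""
    involved = {pair[position] for pair in pairs}
    return "total" if entity_set <= involved else "partial"

def cardinality(pairs):
    """Classify the (e1, e2) pairs of one binary relationship snapshot."""
    partners_of_e1 = defaultdict(set)
    partners_of_e2 = defaultdict(set)
    for e1, e2 in pairs:
        partners_of_e1[e1].add(e2)
        partners_of_e2[e2].add(e1)
    many_e2 = any(len(s) > 1 for s in partners_of_e1.values())  # an E1 member has several partners
    many_e1 = any(len(s) > 1 for s in partners_of_e2.values())  # an E2 member has several partners
    if many_e1 and many_e2:
        return "M:N"
    if many_e2:
        return "1:M"
    if many_e1:
        return "M:1"
    return "1:1"

# Hypothetical data for the relationship "offers" between student and course.
students = {"Ade", "Bola", "Chidi"}
courses = {"CSC471", "CSC472", "CSC473"}
offers = {("Ade", "CSC471"), ("Ade", "CSC472"), ("Bola", "CSC472"), ("Chidi", "CSC471")}

print(participation(students, offers, 0))  # prints total (every student offers a course)
print(participation(courses, offers, 1))   # prints partial (nobody offers CSC473)
print(cardinality(offers))                 # prints M:N
print(cardinality({("Ade", "P1"), ("Bola", "P2")}))         # prints 1:1
print(cardinality({("Ade", "CSC471"), ("Ade", "CSC472")}))  # prints 1:M
```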

6.2 ER Symbols

Entity: rectangle for a regular entity, double rectangle for a weak entity.

[Diagram: Regular and Weak entity symbols]

Relationship: diamond; single diamond for a regular relationship and double diamond for a
weak relationship.

[Diagram (a): Relationship between regular and regular entities]

[Diagram (b): Weak relationship between regular and weak entities, e.g. two students of the
same father may have one next of kin.]

Properties: drawn with ellipse shapes. A property can be base or derived, single or composite.
Derived properties are drawn with broken ellipse lines. The key property is underlined.

[Diagram: Student entity with properties Matric and Name; Name is a composite of First Name
and Last Name; base and derived properties are labelled]

Lines link attributes to entity sets and entity sets to relationship sets. A double line indicates
total participation in a relationship.

Reminder: participation of entities in a relationship can be total or partial. If all members of
E1 participate in the relationship with E2, we double the link.

[Diagram: Student M : 1 relationships drawn with a single link (Partial) and a double link (Total)]

Same as above, but participation is partial: only a few members of E1 participate.

Interpret the following relationships:

[Diagrams (i) to (iv): relationships between entity sets E1 and E2]

Some other examples of ER Models


1. Products Manufacturing Company

In a products manufacturing company, the following entities can be obtained:


i. Customer
ii. Product
iii. Company Branch
iv. Employee

The following could depict the ER Model for the database scenario:

[ER diagram: the entities Customer, Product, Branch and Employee, linked by the relationships
Visit, Buys, Works at, Attends and Sell, with cardinalities marked 1 and m]

2. Hospital Database Environment
In a hospital environment, the following entities can be obtained:
i. Patient
ii. Doctor
iii. Drug

The following could depict the ER Model for the database scenario:

[ER diagram: the entities Doctor, Patient and Drug, linked by the relationships Prescribes,
Consults and Purchase, with cardinalities marked 1 and m]

Summary
The concept of Entity Relationship Model was explained in this unit. ER modelling is the
starting point for modern database design.

Self Assessment Questions (SAQs)


Consider developing a Student Information System (SIS) for your institution; identify all the
entities involved and their attributes. Now, draw a full ER model of the SIS.

Unit 7 Relational Database Model
Expected Duration: 1 week or 2 contact hours

Introduction
The Relational Database Management System (often called RDBMS for short) has become the
dominant DBMS in use today. The RDBMS represents the second generation of DBMS and is
based on the relational data model proposed by Dr E.F. Codd in his seminal paper ‘A Relational
Model of Data for Large Shared Data Banks’ in 1970. In the relational model, all data is
logically structured within relations (tables). A great strength of the relational model is this
simple logical structure. Yet, behind this simple structure is a sound theoretical foundation that
is lacking in the first generation of DBMSs (the network and hierarchical DBMSs typified by
systems such as IDMS/R from Computer Associates and IMS from IBM).

The relational model is based on the mathematical concept of a relation, which is physically
represented as a table. Codd, a trained mathematician, used terminology taken from
mathematics, principally set theory and predicate logic. In this unit, we explain the terminology
and structural concepts of the relational model.

Learning Outcomes
When you have studied this session, you should be able to explain:
7.1 Relational Database (RD) Structure
7.1.1 Relation
7.1.2 Attribute
7.1.3 Domain
7.1.4 Tuple
7.1.5 Relational database
7.2 Properties of Relational Tables
7.3 Relational keys
7.3.1 Superkey
7.3.2 Candidate key
7.3.4 Primary key
7.3.5 Foreign key
7.4 Representing Relational Databases

7.1 Relational Database (RD) Structure

The concept of the relation is the fundamental concept of the relational database. A relation
links sets of data, expressing a relationship between them. In a relational database, data are
organised in the form of tables, or relations.

7.1.1 Relation

A relation is a table with columns and rows.

A relational DBMS requires only that the database be perceived by the user as tables. Note that
this perception applies only to the way we view the database; it does not apply to the physical
structure of the database on disk, which we can implement using a variety of storage structures
(such as a heap file or hash file).

The number of entities identified in the ER model determines how many tables are to be created
for storing data in the relational model. Relationships between two or more entities are also
represented by relations (tables) pertaining to those relationships. Each relation comprises rows
and columns. The first row is the heading of the relation, containing its attributes. The body
comprises the tuples, which are like records: one tuple for every instance or member of an entity.
The heading is static, i.e., the number of attributes does not change frequently, but the number of
tuples changes through insertion and deletion. The number N of attributes is called the degree of
the relation, while the number of tuples is the cardinality of the relation, i.e., the number of
time-varying records in the relation.
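Degree and cardinality can be measured on a live table. This sketch assumes SQLite via Python's sqlite3 module and loads the two Staff tuples shown in Figure 7.1; the third tuple (S30) is invented purely to show that inserting a row changes the cardinality but not the degree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staffNo TEXT, name TEXT, position TEXT, "
             "salary REAL, branchNo TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?, ?)", [
    ("S23", "Akinola S. O.", "Manager", 250000, "B001"),
    ("S28", "Hamzat B. S.", "Supervisor", 150000, "B007"),
])

cursor = conn.execute("SELECT * FROM staff")
degree = len(cursor.description)      # number of attributes in the heading
cardinality = len(cursor.fetchall())  # number of tuples in the body
print(degree, cardinality)            # 5 2

# Inserting a hypothetical tuple changes the cardinality but not the degree.
conn.execute("INSERT INTO staff VALUES ('S30', 'New Staff', 'Assistant', 60000, 'B001')")
cursor = conn.execute("SELECT * FROM staff")
print(len(cursor.description), len(cursor.fetchall()))  # 5 3
```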

7.1.2 Attribute
An attribute is a named column of a relation.
In the relational model, we use relations to hold information about the objects that we want to
represent in the database. We represent a relation as a table in which the rows of the table
correspond to individual records called tuples and the table columns correspond to attributes.
Attributes can appear in any order and the relation will still be the same relation, and therefore
convey the same meaning.

[Diagram: a relation whose header holds the attributes A1 to A5 and whose body holds the tuples]

For example, in a video rental company, the information on branches is represented by the
Branch relation, with columns for attributes branchNo (the branch number), street, city, state,
zipCode, and mgrStaffNo (the staff number corresponding to the manager of the branch).
Similarly, the information on staff is represented by the Staff relation, with columns for
attributes staffNo (the staff number), name, position, salary, and branchNo (the number of the
branch the staff member works at). Figure 7.1 shows instances of the Branch and Staff relations.
As you can see from this figure, a column contains values for a single attribute; for example,
the branchNo columns contain only branch numbers.

Branch Relation:
branchNo | street          | city   | state | zipCode | mgrStaffNo
B001     | 2 Ibadan Street | Ibadan | Oyo   | 0020    | S23
B002     | 14 Chuks Avenue | Apapa  | Lagos | 0010    | S29

Staff Relation:
staffNo | name          | position   | salary  | branchNo
S23     | Akinola S. O. | Manager    | 250,000 | B001
S28     | Hamzat B. S.  | Supervisor | 150,000 | B007

[Figure annotations: primary keys; foreign keys; related columns linking branchNo in Branch to
branchNo in Staff]

Figure 7.1: An example of the Branch and Staff relations
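The related branchNo columns are what allow the two relations to be combined. As a sketch, assuming SQLite via Python's sqlite3 module, the code below loads exactly the rows of Figure 7.1 and joins Staff to Branch on branchNo to find the city each staff member works in. zipCode is stored as text so that its leading zeros survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branchNo TEXT PRIMARY KEY, street TEXT, city TEXT, "
             "state TEXT, zipCode TEXT, mgrStaffNo TEXT)")
conn.execute("CREATE TABLE staff (staffNo TEXT PRIMARY KEY, name TEXT, position TEXT, "
             "salary REAL, branchNo TEXT)")
conn.executemany("INSERT INTO branch VALUES (?, ?, ?, ?, ?, ?)", [
    ("B001", "2 Ibadan Street", "Ibadan", "Oyo", "0020", "S23"),
    ("B002", "14 Chuks Avenue", "Apapa", "Lagos", "0010", "S29"),
])
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?, ?)", [
    ("S23", "Akinola S. O.", "Manager", 250000, "B001"),
    ("S28", "Hamzat B. S.", "Supervisor", 150000, "B007"),
])

# Match each staff tuple to its branch through the related branchNo columns.
rows = conn.execute("""
    SELECT s.staffNo, s.name, b.city
    FROM staff AS s JOIN branch AS b ON s.branchNo = b.branchNo
""").fetchall()
print(rows)  # [('S23', 'Akinola S. O.', 'Ibadan')]
```

S28 does not appear because branch B007 is not among the Branch tuples shown in the figure, so the join finds no matching row for it.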


7.1.3 Domain

Domain is the set of allowable values for one or more attributes.

Domains are an important feature of the relational model. Every attribute in a relational
database is associated with a domain of permissible values. Domains may be distinct for each
attribute, or two or more attributes may be associated with the same domain. For example, the
domain of the attribute gender is {male, female}, and the domain of age in a government
establishment is 18 to 60 years. The domain is important for data integrity, as it allows only
permissible values for a particular attribute.

Figure 7.2 shows the domains for some of the attributes of the Branch and Staff relations.

Note that, at any given time, typically there will be values in a domain that don’t currently
appear as values in the corresponding attribute. In other words, a domain describes possible
values for an attribute.

Attribute | Domain name     | Meaning                              | Domain definition
branchNo  | Branch_Numbers  | Set of all possible branch numbers.  | Alphanumeric: size 4, range B001–B999
street    | Street_Names    | Set of all possible street names.    | Alphanumeric: size 60
staffNo   | Staff_Numbers   | Set of all possible staff numbers.   | Alphanumeric: size 5, range S01–S99
position  | Staff_Positions | Set of all possible staff positions. | One of Director, Manager, Supervisor, Assistant, Buyer
salary    | Staff_Salaries  | Possible values of staff salaries.   | Monetary: 10 digits, range N10,000.00 – N500,000.00

Figure 7.2: Domains for some attributes of the Branch and Staff relations.

The domain concept is important because it allows us to define the meaning and source of
values that attributes can hold. As a result, more information is available to the system and it
can (theoretically) reject operations that don’t make sense. For example, it would not be
sensible for us to compare a staff number with a branch number, even though the domain
definitions for both these attributes are character strings. Unfortunately, you’ll find that most
relational DBMSs don’t currently support domains.
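Where a DBMS lacks domain support, column CHECK constraints can approximate some of the domain definitions in Figure 7.2. The sketch below assumes SQLite via Python's sqlite3 module; SQLite has no CREATE DOMAIN statement. The accepts helper and the test rows are hypothetical, and the staffNo pattern follows the S01 to S99 range rather than the stated size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each CHECK clause approximates one domain definition from Figure 7.2.
conn.execute("""
    CREATE TABLE staff (
        staffNo  TEXT PRIMARY KEY
                 CHECK (staffNo GLOB 'S[0-9][0-9]' AND staffNo <> 'S00'),  -- S01 to S99
        position TEXT
                 CHECK (position IN ('Director', 'Manager', 'Supervisor',
                                     'Assistant', 'Buyer')),
        salary   REAL
                 CHECK (salary BETWEEN 10000 AND 500000)  -- N10,000 to N500,000
    )""")

def accepts(row):
    """Return True if the row lies inside every column's domain."""
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(accepts(("S23", "Manager", 250000)))  # True
print(accepts(("S24", "Cleaner", 50000)))   # False: position outside Staff_Positions
print(accepts(("S25", "Buyer", 600000)))    # False: salary above the range
print(accepts(("B001", "Buyer", 50000)))    # False: not a staff number
```

Unlike a true domain, these checks are attached to individual columns, so the DBMS still cannot tell that comparing a staff number with a branch number makes no sense.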

7.1.4 Tuple

A tuple is a record of a relation.

The fundamental elements of a relation are the tuples or records in the table. In the Staff
relation, each record contains five values, one for each attribute. As with attributes, tuples can
appear in any order and the relation will still be the same relation, and therefore convey the
same meaning.

Finally, we have the definition:

7.1.5 Relational database

A relational database is a collection of normalized tables.

A relational database consists of tables that are appropriately structured. This structure is
achieved through the process of normalization, which is studied in a later Unit.
Alternative terminology
The terminology for the relational model can be quite confusing. In this unit, we have
introduced two sets of terms: (relation, attribute, tuple) and (table, column, record). Other terms
that you may encounter are file for table, row for record, and field for column. You may also
find various combinations of these terms, such as table, field, and row.

From now on, we will tend to drop the formal terms of relation, tuple, and attribute, and instead
use the more frequently used terms table, column, and record.

7.2 Properties of Relational Tables

A relational table has the following properties:


(i) The table has a name that is distinct from all other tables in the database.
(ii) Each cell of the table contains exactly one value. (For example, it would be wrong to
store several telephone numbers for a single branch in a single cell. In other words,
tables don’t contain repeating groups of data. A relational table that satisfies this
property is said to be normalized or in first normal form.)
(iii) Each column has a distinct name; no column name is duplicated.
(iv) The values of a column are all from the same domain and are atomic.
(v) The order of columns has no significance. In other words, provided a column name
is moved along with the column values, we can interchange columns.
(vi) Each record is distinct; there are no duplicate records.
(vii) The order of records has no significance, theoretically. (However, in practice, the
order may affect the efficiency of accessing records.)
(viii) A primary key may consist of more than one attribute. If no smaller set of columns
is unique, i.e., if every column is needed to distinguish the records, then all the columns
together can be defined as the primary key.

Consider the Purchase table:

Purchase Table
Date Item Price Qty Totals Supplier InvoiceNo

Suppose we declare a rule that only one supplier may supply a particular item on a particular
date. If the invoice number is not included, then all the attributes together can serve as the
primary key of the table. In practice, a record number, invoice number or serial number,
defined as an AutoNumber field, is used as the primary key instead.
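In SQL-based systems, the AutoNumber idea can be sketched with a system-generated integer key. The example below is a minimal sketch, assuming SQLite (through Python's sqlite3 module), where INTEGER PRIMARY KEY AUTOINCREMENT plays a role similar to an Access AutoNumber field. The record_purchase helper and the item and supplier values are hypothetical, and the Totals column is left out because it can be computed as price * qty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# invoiceNo plays the role of an AutoNumber field: SQLite assigns the
# next integer automatically when no value is supplied.
conn.execute("""
    CREATE TABLE Purchase (
        invoiceNo    INTEGER PRIMARY KEY AUTOINCREMENT,
        purchaseDate TEXT NOT NULL,
        item         TEXT NOT NULL,
        price        NUMERIC NOT NULL,
        qty          INTEGER NOT NULL,
        supplier     TEXT NOT NULL
    )
""")

def record_purchase(purchase_date, item, price, qty, supplier):
    """Insert a purchase (hypothetical helper) and return the generated invoice number."""
    with conn:
        cur = conn.execute(
            "INSERT INTO Purchase (purchaseDate, item, price, qty, supplier) "
            "VALUES (?, ?, ?, ?, ?)",
            (purchase_date, item, price, qty, supplier))
    return cur.lastrowid

# Two otherwise identical purchases are still distinguishable by invoiceNo.
first = record_purchase("2022-01-23", "Printer paper", 2500, 10, "Supplier A")
second = record_purchase("2022-01-23", "Printer paper", 2500, 10, "Supplier A")
print(first, second)  # 1 2
```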

7.3 Relational keys

As just stated, each record in a table must be unique. This means that we need to be able to
identify a column or combination of columns (called relational keys) that provides uniqueness.
In this section, we explain the terminology used for relational keys.

7.3.1 Superkey

A superkey is a column/attribute, or set of columns, that uniquely identifies a record
within a table.

Since a superkey may contain additional columns that are not necessary for unique
identification, we are interested in identifying superkeys that contain only the minimum number
of columns necessary for unique identification.

7.3.2 Candidate key

A candidate key is a superkey that contains only the minimum number of columns
necessary for unique identification.

A candidate key for a table has two properties:


• Uniqueness: In each record, the values of the candidate key uniquely identify that record.
• Irreducibility: No proper subset of the candidate key has the uniqueness property.

Consider the Branch table shown in Figure 7.1. For a given value of city, we would expect to
be able to determine several branches (for example, a particular city can have two branches).
This column, therefore, cannot be selected as a candidate key.

On the other hand, since the company allocates each branch a unique branch number, then for
a given value of the branch number, branchNo, we can determine at most one record, so that
branchNo is a candidate key. Similarly, as no two branches can be located in the same zip code,
zipCode is also a candidate key for the Branch table.

There may be several candidate keys for a table. Consider, for example, a table called Role,
which represents the characters played by actors in videos. The table comprises an actor
number (actorNo), a catalog number (catalogNo), and the name of the character played
(character), as shown in the Table below. For a given actor number, actorNo, there may be
several different videos the actor has starred in. Similarly, for a given catalog number,
catalogNo, there may be several actors who have starred in this video. Therefore, actorNo by
itself or catalogNo by itself cannot be selected as a candidate key. However, the combination
of actorNo and catalogNo identifies at most one record. When a key consists of more than one
column, we call it a composite key.

Role Table

actorNo  catalogNo  character
A1002    207132     Jide Kosoko
A3006    903455     Ogogo
A8401    634523     James
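The uniqueness and irreducibility tests can be expressed as a short search over column combinations. The Python sketch below is illustrative only. The functions is_superkey, is_candidate_key and candidate_keys are hypothetical helpers, and the Role rows are made-up sample data in which actor A1002 plays Brother Jero again in a sequel and two actors play a guard. A sample of data can only rule a key out. Whether a set of columns really is a key depends on the meaning of the data, as with branchNo above.

```python
from itertools import combinations

def is_superkey(rows, cols):
    """Uniqueness: no two rows share the same values in cols."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, cols):
    """A superkey none of whose proper subsets is a superkey (irreducibility)."""
    if not is_superkey(rows, cols):
        return False
    return not any(is_superkey(rows, sub)
                   for size in range(1, len(cols))
                   for sub in combinations(cols, size))

def candidate_keys(rows, columns):
    """Every column combination that passes both tests on this sample."""
    return [cols
            for size in range(1, len(columns) + 1)
            for cols in combinations(columns, size)
            if is_candidate_key(rows, cols)]

# Hypothetical sample rows for the Role table.
role = [
    {"actorNo": "A1002", "catalogNo": "207", "character": "Brother Jero"},
    {"actorNo": "A1002", "catalogNo": "289", "character": "Brother Jero"},
    {"actorNo": "A3006", "catalogNo": "289", "character": "Guard"},
    {"actorNo": "A8401", "catalogNo": "289", "character": "Guard"},
]

print(candidate_keys(role, ["actorNo", "catalogNo", "character"]))
# [('actorNo', 'catalogNo')]
```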

7.3.4 Primary key

The primary key is the candidate key that is selected to identify records uniquely within
the table.

Since a table has no duplicate records, it is always possible to uniquely identify each record.
This means that a table always has a primary key. In the worst case, the entire set of columns
could serve as the primary key, but usually some smaller subset is sufficient to distinguish the
records. The candidate keys that are not selected to be the primary key are called alternate
keys. For the Branch table, if we choose branchNo as the primary key, zipCode would then be
an alternate key. For the Role table, there is only one candidate key, comprising actorNo and
catalogNo, so these columns would automatically form the primary key.
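In SQL, the chosen primary key and any alternate keys are declared separately. The sketch below is a minimal illustration, assuming SQLite through Python's sqlite3 module. The insert_ok helper and the second branch row are hypothetical. zipCode, as an alternate key, is declared UNIQUE NOT NULL, and Role gets a composite PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- branchNo is the chosen primary key; zipCode, the other candidate key,
    -- becomes an alternate key and is declared UNIQUE NOT NULL.
    CREATE TABLE Branch (
        branchNo   TEXT NOT NULL PRIMARY KEY,
        street     TEXT,
        city       TEXT,
        state      TEXT,
        zipCode    TEXT NOT NULL UNIQUE,
        mgrStaffNo TEXT
    );
    -- Role has one candidate key, the composite (actorNo, catalogNo),
    -- so it becomes the primary key.
    CREATE TABLE Role (
        actorNo     TEXT NOT NULL,
        catalogNo   TEXT NOT NULL,
        "character" TEXT,
        PRIMARY KEY (actorNo, catalogNo)
    );
""")

def insert_ok(sql, params):
    """Return True if the INSERT succeeds, False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

BRANCH_SQL = "INSERT INTO Branch VALUES (?, ?, ?, ?, ?, ?)"
ROLE_SQL = "INSERT INTO Role VALUES (?, ?, ?)"

print(insert_ok(BRANCH_SQL, ("B001", "2 Ibadan Street", "Ibadan", "Oyo", "0020", "S23")))  # True
# Hypothetical branch reusing zipCode 0020: rejected by the alternate key.
print(insert_ok(BRANCH_SQL, ("B003", "5 Hypothetical Road", "Ibadan", "Oyo", "0020", None)))  # False
print(insert_ok(ROLE_SQL, ("A1002", "207", "Brother Jero")))  # True
# The same (actorNo, catalogNo) pair again: rejected by the composite primary key.
print(insert_ok(ROLE_SQL, ("A1002", "207", "Another part")))  # False
```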

7.3.5 Foreign key

A foreign key is an attribute (column), or set of columns, within one table that matches
a candidate key of some (possibly the same) table.

When a column appears in more than one table, its appearance usually represents a relationship
between records of the two tables. For example, in Figure 7.1 the inclusion of branchNo in both
the Branch and Staff tables is quite deliberate and links branches to the details of staff working
there. In the Branch table, branchNo is the primary key. However, in the Staff table the
branchNo column exists to match staff to the branch they work in. In the Staff table, branchNo
is a foreign key. We say that the column branchNo in the Staff table targets or references the
primary key column branchNo in the home table, Branch. In this situation, the Staff table is
also known as the child table and the Branch table as the parent table.

You may recall that one of the advantages of the DBMS approach was control of data
redundancy. This is an example of ‘controlled redundancy’ – these common columns play an
important role in modelling relationships, as we’ll see in later Units.
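A foreign key can be declared with a REFERENCES clause so that the DBMS checks every child value against the parent table. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued for the connection. The add_staff helper and the second staff row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only after this pragma is switched on
# for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Branch (
        branchNo TEXT NOT NULL PRIMARY KEY,
        city     TEXT
    );
    -- Staff.branchNo is a foreign key that references the parent table Branch.
    CREATE TABLE Staff (
        staffNo  TEXT NOT NULL PRIMARY KEY,
        name     TEXT,
        branchNo TEXT REFERENCES Branch (branchNo)
    );
    INSERT INTO Branch VALUES ('B001', 'Ibadan'), ('B002', 'Apapa');
""")

def add_staff(staff_no, name, branch_no):
    """Insert a staff record; return False if branch_no matches no branch."""
    try:
        with conn:
            conn.execute("INSERT INTO Staff VALUES (?, ?, ?)",
                         (staff_no, name, branch_no))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_staff("S23", "Akinola S. O.", "B001"))  # True: B001 exists
print(add_staff("S99", "Hypothetical", "B009"))   # False: no such branch
```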

7.4 Representing Relational Databases

A relational database consists of one or more tables. The common convention for representing
a description of a relational database is to give the name of each table, followed by the column
names in parentheses. This is usually called the database schema. Normally, the primary key
is underlined. The description of the relational database for the video rental company is:

Branch (branchNo, street, city, state, zipCode, mgrStaffNo)


Staff (staffNo, name, position, salary, branchNo)
Video (catalogNo, title, category, dailyRental, price, directorNo)
Director (directorNo, directorName)
Actor (actorNo, actorName)
Role (actorNo, catalogNo, character)
Member (memberNo, fName, lName, address)
Registration (branchNo, memberNo, staffNo, dateJoined)
RentalAgreement (rentalNo, dateOut, dateReturn, memberNo, videoNo)
VideoForRent (videoNo, available, catalogNo, branchNo)

The next tables show an instance of the database schema for a company called videoCompany.

Branch
branchNo  street           city    state  zipCode  mgrStaffNo
B001      2 Ibadan Street  Ibadan  Oyo    0020     S23
B002      14 Chuks Avenue  Apapa   Lagos  0010     S29

Staff
staffNo  name           position    salary   branchNo
S23      Akinola S. O.  Manager     250,000  B001
S28      Hamzat B. S.   Supervisor  150,000  B007

Video
catalogNo  title            category  dailyRental  price   directorNo
207        Die Another Day  Action    100.00       150.00  D1001
289        Men in the Dark  Fantasy   100.00       300.00  D7834

Director
directorNo  directorName
D1001       Papa Kay
D7834       Mama Kay

Actor
actorNo  actorName
A1002    Olu Omo
A4006    Jide Kosoko

Role
actorNo  catalogNo  character
A1002    207        Brother Jero
A4006    289        Uncle Saheed

Member
memberNo  fName   lName   address
M100      Alima   Buhari  25 Kolo Street, Kaduna
M200      Kayode  Chuks   10 Malomo Street, Agbowo

Registration
branchNo  memberNo  staffNo  dateJoined
B001      M100      S23      5 July 2018
B002      M200      S28      9 October 2020

RentalAgreement
rentalNo  dateOut     dateReturn   memberNo  videoNo
R100      4 Jan 2015  8 Jan 2015   M100      1234
R200      5 Feb 2018  10 Feb 2018  M200      5678

VideoForRent
videoNo  available  catalogNo  branchNo
1234     Y          207        B001
5678     N          289        B002

Another example: consider the Student–Course ER model drawn earlier in Unit 6:

[Figure: ER model with entities Student and Course linked by the Offers relationship (cardinality labels 1 and M); attribute labels shown include Name, DBirth, Gender, Code, Title and Unit.]

The following could represent the Relational Database schemas for the entities and the
relationship tables.

Student (matricNo, Name, DBirth, Gender, Hall)


Course (courseCode, Title, Unit, Status, Semester)
Offers (matricNo, courseCode, score, session)

Note the foreign keys in the Offers table: matricNo and courseCode refer to the primary keys of
the parent tables Student and Course, respectively.

Summary

The relational database has become the dominant type of database in use today. Relations are
physically represented as tables, with the records corresponding to individual tuples and the
columns to attributes. Properties of relational tables are: each cell contains exactly one value,
column names are distinct, column values come from the same domain, column order is
immaterial, record order is immaterial, and there are no duplicate records.

A superkey is a set of columns that identifies records of a table uniquely, while a candidate
key is a minimal superkey. A primary key is the candidate key chosen for use in identification
of records. A table must always have a primary key. A foreign key is a column, or set of
columns, within one table that matches a candidate key of another (possibly the same) table.

Self Assessment Questions (SAQs)

1. Consider a product manufacturing company. Design a relational database for the
company, modelling the staff, products and customers.
2. Design a relational database for your department with a view to produce students’
results or transcripts.

3. Property Management System

The Department of Computer Science is interested in a web-based information system that
would be employed to manage its hard-earned properties. Your analysis shows that the
properties in the Department consist of furniture, electrical fittings, air conditioners,
fans, computers, printers, typewriters, file cabinets, UPS units and rugs. These properties are
encoded with IDs and are located in different offices and laboratories occupied by
lecturers, administrative staff or technical staff. In addition, each property has a
procurement date, manufacturer’s name, supplier’s name and state (functional or defective).

Draw the ER and RD models for the above data scenario.

Unit 8: The Relational Database Design Process
Expected Duration: 1 week or 2 contact hours

Introduction

Before you build the tables and other objects that will make up your system, it is important to
take time to design it. A good design is the keystone to creating a system that does what you
want it to do effectively, accurately and efficiently. We explore the design principles for
Relational Databases in this Unit.

Learning Outcomes

When you have studied this session, you should be able to explain or determine:
8.1 The Basic Steps in Designing a Database System
8.2 The purpose of the system
8.3 The tables that are needed in the system
8.4 Identification of fields with unique values
8.5 The relationships between tables
8.5.1 Relationship between Tables: Further Explanations
8.6 Refining the design
8.7 Entering data and create other system objects

8.1 The Basic Steps in Designing a Database System

To design a database system for an enterprise or organization, the following steps are
necessary:

• Determine the purpose of the system
• Determine the tables that are needed in the system. These result from the entities
identified at the Entity-Relationship (ER) Modelling phase.
• Determine the fields that are needed in the tables
• Identify fields with unique values, to serve as primary keys
• Determine the relationships between tables. The relationships will have their own
tables as well.
• Refine the design. This is done through the database normalization process.
• Add data (populate tables) and create other system objects

NOTE: Some of the listed steps (determining tables, data fields and relationships) may overlap
and be repeated a few times when designing a relational database.

Building a database is a process of examining the data that is necessary and useful for an
application, then breaking it down into a relatively simple row and column format.

8.2 Determine the purpose of the system

To determine the purpose of the system, the database designer needs to know what information
the potential users would want from the database (a detailed scenario). From that, the designer
can determine which subjects to store facts about (the tables) and which facts to store about
each subject (the data fields).

So, the following questions must be answered:

1. What type of data should the system keep track of?
2. What would the user want to know about the data?
3. What would the user want to do to the data?

The first step in creating a database is creating a plan that serves both as a guide to be used
when implementing the database and as a functional specification for the database after it has
been implemented. The nature and complexity of a database application, as well as the process
of planning it, can vary greatly. For a simple application, the design may be little more than a
few notes on scratch paper. For a complex one, the design may be a formal document with
hundreds of pages that contain every possible detail about the database.

NOTE: Modelling the structure on paper before switching on the computer and starting to code
is highly recommended. Planning may seem time-consuming up front, but not planning is twice as
time-consuming later.

8.3 Determine the tables that are needed in the system

Determining the tables can be the trickiest step in the database design process. That is because
the results that you want from the database (e.g. the reports that you want to print, the forms
that you want to use, the questions that you want answered) don't necessarily provide clues
about the structure of the tables that can produce them. In fact, it may be better to sketch and
rework your design on paper first.

When you design your tables, divide up the pieces of information while keeping the following
fundamental design principles in mind:

1. Each piece of information is stored in only one table
2. A table should not contain duplicate information
3. Each table should contain information about only one subject
4. Information should not be duplicated between tables

Note that all the entities identified at the ER-Modelling stage will form the individual
tables or relations in the Relational Database design, and their attributes will form the
columns of those tables. For instance, the entities Student and Customer may be represented
by the following tables, with their primary keys underlined:

Student Table:

MatricNo Surname OtherNames YrBirth PhoneNo HomeAdd

Customer Table:

CustomerID Surname OtherNames OfficeAdd PhoneNo HomeAdd

8.4 Identify fields with unique values

Next, we have to identify fields with unique values in order to define the table's primary key.
The primary key uniquely identifies each individual record in a table and makes it possible to
relate information stored in separate tables. Because each record has a different primary key
value, we can tell any two records apart. The goal of setting primary keys is to ensure each
record's uniqueness. This is called entity integrity.

The primary key types:

1. Single-field primary keys (an AutoNumber or a user-defined type such as MatricNo or
CustomerID)
2. Multiple-field primary keys, such as MatricNo and CourseCode jointly serving as the
primary key of a table

Notes:

• The power of a relational database system comes from its ability to quickly find and
bring together related information stored in separate tables by using queries, forms
and reports. To do this, each table should include a field or set of fields that uniquely
identifies each record stored in the table. This information is called the primary key of
the table. Once we specify a primary key for a table, the system will prevent any duplicate
or null values from being entered in the primary key fields, to ensure uniqueness.

8.5 Determine the relationships between tables

To set relationships between tables, we must establish a link between fields that contain
common, related information. The linking field in the other table is known as a foreign key.
A relationship is established by linking these key fields between tables: the primary key in
the 'primary' table and a foreign key in the 'related' table.

Consider the following ER Model for Student and Course entities:

[Figure: ER model with entities Student and Course linked by the Offers relationship (cardinality labels 1 and M); attribute labels shown include MatricNo, Name, Gender, CourseCode, Title and Unit.]

The relationship between Student and Course is that one student offers many courses in an
institution (and each course is, in turn, offered by many students). The relationship, Offers,
will have its own table in the Relational Database Design. We can now give it a proper name:
the Student_Course table. The primary keys of the Student and Course tables will jointly form
the primary key of this table. These columns are also foreign keys: they are foreign in the
Student_Course table because they are already defined as primary keys in their base tables.
The Student_Course table therefore serves as a look-up table to the Student and Course tables.

We must find other attributes in the Student_Course Table to augment the primary key
attributes. For example, what may relate a student and course together is the result or score the
student obtained in the course and possibly the Grade Point (GP). In fact, we may rename the
relationship table Exam or Result.

For a relationship between Customer and Product, which we may tag “Buys”, other
attributes in the relationship table could be quantityBought, amountPaid, PurchaseTime and
PurchaseDate.

Student_Course (Result) Table:

MatricNo  CourseCode  Session    Score  GP  Grade
12345     CSC 472     2021/2022  34     0   F
12456     CSC 235     2022/2023  67     3   B
12345     CSC 472     2022/2023  64     3   B

Note that we have also added Session to the table and underlined it, so that it forms part of
the primary key of the Student_Course table. This is because a student who fails a course in
one session must retake it in another session, and both results must be stored in this table.
Consider records 1 and 3 in the table above: MatricNo and CourseCode alone would not suffice
as the primary key, because the two records would then have the same key. Session
differentiates records 1 and 3.
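The effect of adding Session to the key can be checked directly. This is a minimal sketch, assuming SQLite through Python's sqlite3 module, with foreign keys to Student and Course left out for brevity. The add_result helper and the rejected fourth result are hypothetical. The composite primary key accepts the retake in a new session but rejects a second result for the same student, course and session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign keys to Student and Course are omitted to keep the sketch short.
conn.execute("""
    CREATE TABLE Student_Course (
        matricNo   TEXT NOT NULL,
        courseCode TEXT NOT NULL,
        session    TEXT NOT NULL,
        score      INTEGER,
        gp         INTEGER,
        grade      TEXT,
        PRIMARY KEY (matricNo, courseCode, session)
    )
""")

def add_result(row):
    """Insert one result; return False if the composite key already exists."""
    try:
        with conn:
            conn.execute("INSERT INTO Student_Course VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

rows = [
    ("12345", "CSC 472", "2021/2022", 34, 0, "F"),
    ("12456", "CSC 235", "2022/2023", 67, 3, "B"),
    ("12345", "CSC 472", "2022/2023", 64, 3, "B"),  # retake: differs only by session
]
print([add_result(r) for r in rows])  # [True, True, True]

# A second result for the same student, course and session is rejected.
print(add_result(("12345", "CSC 472", "2022/2023", 70, 4, "A")))  # False

# Without session, records 1 and 3 would share the key (12345, CSC 472).
print(len({(m, c) for m, c, *_ in rows}), len(rows))  # 2 3
```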

Customer_Product (Purchase) Table

CustomerID  ProductID  PurchaseTime  PurchaseDate  QtyBought  AmountPaid
C01         P01        12.24         23/1/2022     50         450000
C02         P03        10.30         24/2/2022     100        945000
C01         P01        15.43         23/1/2022     30         270000

The Customer_Product table may also be called the Purchase table. It is a relationship table for
the Customer and Product tables. CustomerID is the primary key coming from the Customer
table, while ProductID is the primary key coming from the Product table; therefore, both of
them are foreign keys in this table. We have also added two more attributes to serve jointly
as part of the primary key of the Purchase table: PurchaseTime and PurchaseDate. Why? A
customer could purchase the same product two or three times on the same day, but the time of
purchase differentiates these records, so duplicate keys are avoided. Consider records 1 and 3:
customer C01 buys product P01 on the same date, 23/1/2022, but at different times.

Sometimes three tables (entities) may be involved in a relationship. For instance, a doctor (ID
= D03) prescribes a drug (ID = DR5) to a patient (ID = P03). Each of these is an entity in its own
right and has its own table. The relationship table between them could look like the one
presented next.

Doctor_Drug_Patient (Consultation) Table:

DoctorID DrugID PatientID ConsultationTime ConsultationDate Dose

D03 DR5 P03 12.25 23/1/2022 450

Can you explain why we have included ConsultationTime and ConsultationDate as part of the
primary key for this table?

Sometimes all the attributes of a table may together serve as the primary key. This is called
an all-key table. It occurs when no smaller group of attributes is sufficient to identify each
record uniquely.

So, every table must have a primary key: one or more data fields whose contents are unique
to each record. When linking tables, we link the primary key field of one (primary or 'parent')
table to a field in another (related or 'child') table that has the same structure and data type,
and usually the same name. By matching the values of the primary key to those of the foreign
key, we can relate records in the two tables.
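In SQL, matching primary key values to foreign key values is what a join does. The sketch below is a minimal illustration, assuming SQLite through Python's sqlite3 module. The transcript function is hypothetical, and the student names and the CSC 235 title are placeholders. The Result rows repeat the Student_Course data shown earlier, and each Result row is brought back together with its parent Student and Course rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (matricNo TEXT NOT NULL PRIMARY KEY, name TEXT);
    CREATE TABLE Course  (courseCode TEXT NOT NULL PRIMARY KEY, title TEXT);
    CREATE TABLE Result (
        matricNo   TEXT NOT NULL REFERENCES Student (matricNo),
        courseCode TEXT NOT NULL REFERENCES Course (courseCode),
        session    TEXT NOT NULL,
        score      INTEGER,
        PRIMARY KEY (matricNo, courseCode, session)
    );
    -- Student names and the CSC 235 title are hypothetical placeholders.
    INSERT INTO Student VALUES ('12345', 'Student A'), ('12456', 'Student B');
    INSERT INTO Course  VALUES ('CSC 472', 'Database Systems'),
                               ('CSC 235', 'Course title');
    INSERT INTO Result  VALUES ('12345', 'CSC 472', '2021/2022', 34),
                               ('12456', 'CSC 235', '2022/2023', 67),
                               ('12345', 'CSC 472', '2022/2023', 64);
""")

def transcript(matric_no):
    """Relate each Result row to its parent Student and Course rows by key."""
    return conn.execute("""
        SELECT s.name, c.title, r.session, r.score
        FROM Result AS r
        JOIN Student AS s ON s.matricNo = r.matricNo      -- foreign key = primary key
        JOIN Course  AS c ON c.courseCode = r.courseCode  -- foreign key = primary key
        WHERE r.matricNo = ?
        ORDER BY r.session
    """, (matric_no,)).fetchall()

print(transcript("12345"))
# [('Student A', 'Database Systems', '2021/2022', 34),
#  ('Student A', 'Database Systems', '2022/2023', 64)]
```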

Tables store data about entities, while columns contain the attributes of the entities.

8.5.1 Relationship between Tables: Further Explanations

Now that we have divided our information into tables and identified primary key fields, we
need a way to tell the system how to bring related information back together in meaningful
ways. To do this, we define relationships between tables. A relationship is an association
between common fields (columns) in two tables. A relationship works by matching data in key
fields. In most cases, these matching fields are the primary key from one table, which provides
a unique identifier for each record, and a foreign key in the other table. The kind of relationship
that the system creates depends on how the related fields are defined.

When we physically join two tables by connecting fields with related information, we create a
relationship that is recognized by the system (like Microsoft Access). The specified relationship
is important. It tells the system how to find and display information from fields in two or more
tables. The system needs to know whether to look for only one record in a table or to look for
several records on the basis of the relationship.

There are 3 relationship types:

1. One-to-one (1:1) - each record in Table A can have only one matching record in Table
B, and each record in Table B can be related to only one record in Table A. For instance,
in many higher institutions each final-year student carries out exactly one project. This
may vary between institutions, but it is the norm.

This type of relationship is not frequently used in database systems, but it can be a very
useful way to link two tables together. However, information related in this way could
usually be stored in a single table. You might use a one-to-one relationship to divide a table
with many fields in order to isolate part of the table for security reasons, to store
information that applies only to a subset of the main table, or for efficient use of space. A
one-to-one relationship is created if both of the related fields are primary keys or have
unique indexes.

2. One-to-many (1:M) - is the most common type of relationship and it is used to relate
one record from the 'primary' table with many records in the 'related' table. In a one-to-
many relationship, a record ('parent') in Table A can have many matching records
('children') in Table B, but a record ('child') in Table B has only one matching record
('parent') in Table A. This kind of relationship is created if only one of the related fields
is a primary key or has a unique index.

3. Many-to-many (M:M) - is used to relate many records in table A with many records in
table B. A record ('parent') in Table A can have many matching records ('children') in
Table B, and a record ('child') in Table B can have many matching records ('parents') in
Table A. It is the hardest relationship to understand, and it cannot be represented directly
between two tables. Breaking it into two one-to-many relationships, with a new junction (link)
table standing between the two existing tables, enables the relationship to be set correctly.
A many-to-many relationship is really two one-to-many relationships with a junction/link
table. NOTE: A link table usually has a composite primary key that consists of the foreign
keys from both tables A and B.
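The three relationship types differ mainly in how the key fields are declared. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module. The Project and StudentPhone tables, the insert_ok helper, and the titles and phone numbers are hypothetical. Making a foreign key UNIQUE gives 1:1, a plain foreign key gives 1:M, and a junction table with a composite primary key of two foreign keys gives M:M.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Student (matricNo TEXT NOT NULL PRIMARY KEY, name TEXT);
    CREATE TABLE Course  (courseCode TEXT NOT NULL PRIMARY KEY, title TEXT);

    -- 1:1 : the foreign key is also UNIQUE, so a student has at most one project.
    CREATE TABLE Project (
        projectNo TEXT NOT NULL PRIMARY KEY,
        title     TEXT,
        matricNo  TEXT NOT NULL UNIQUE REFERENCES Student (matricNo)
    );

    -- 1:M : a plain foreign key, so one student may have many phone numbers.
    CREATE TABLE StudentPhone (
        phoneNo  TEXT NOT NULL PRIMARY KEY,
        matricNo TEXT NOT NULL REFERENCES Student (matricNo)
    );

    -- M:M : a junction table whose composite primary key is made of the
    -- foreign keys of both parent tables.
    CREATE TABLE Offers (
        matricNo   TEXT NOT NULL REFERENCES Student (matricNo),
        courseCode TEXT NOT NULL REFERENCES Course (courseCode),
        PRIMARY KEY (matricNo, courseCode)
    );

    INSERT INTO Student VALUES ('12345', 'Student A'), ('12456', 'Student B');
    INSERT INTO Course  VALUES ('CSC 472', 'Database Systems'), ('CSC 235', NULL);
""")

def insert_ok(sql, params):
    """Return True if the INSERT is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_ok("INSERT INTO Project VALUES (?, ?, ?)", ("P1", "Library system", "12345")))  # True
print(insert_ok("INSERT INTO Project VALUES (?, ?, ?)", ("P2", "Second project", "12345")))  # False (1:1)
print(insert_ok("INSERT INTO StudentPhone VALUES (?, ?)", ("0801", "12345")))  # True
print(insert_ok("INSERT INTO StudentPhone VALUES (?, ?)", ("0802", "12345")))  # True (1:M)
print(insert_ok("INSERT INTO Offers VALUES (?, ?)", ("12345", "CSC 472")))     # True
print(insert_ok("INSERT INTO Offers VALUES (?, ?)", ("12456", "CSC 472")))     # True (M:M)
```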

When tables are linked (joined) together, one table is usually called the 'parent', 'primary' or
‘base’ table (the 'one' end of a 1:M relationship, or the table created first in a 1:1
relationship), and the other is called the 'child' or 'related' table (the 'many' end of a 1:M
relationship, or the table created subsequently in a 1:1 relationship). This is known as a
parent-child relationship between tables. When referential integrity is enforced, a record in
the primary table cannot be modified or deleted if there are related records in the 'child'
table, so there will never be an orphan (related) record without a parent (primary) record.
Also, a new record cannot be added to the related table if there is no associated record in the
primary table. This is one of the database referential integrity rules, which we will discuss in
a later Unit.
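The parent-child rules above can be observed in any DBMS that enforces referential integrity. The sketch below is a minimal illustration, assuming SQLite through Python's sqlite3 module and the hypothetical attempt helper. ON DELETE RESTRICT and ON UPDATE RESTRICT stop a parent key from being deleted or changed while child records still refer to it, and a child with no parent is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Branch (branchNo TEXT NOT NULL PRIMARY KEY, city TEXT);
    -- RESTRICT: a parent key that still has child records cannot be
    -- deleted or changed.
    CREATE TABLE Staff (
        staffNo  TEXT NOT NULL PRIMARY KEY,
        branchNo TEXT NOT NULL REFERENCES Branch (branchNo)
                 ON DELETE RESTRICT ON UPDATE RESTRICT
    );
    INSERT INTO Branch VALUES ('B001', 'Ibadan'), ('B002', 'Apapa');
    INSERT INTO Staff  VALUES ('S23', 'B001');
""")

def attempt(sql, params=()):
    """Run one statement; report whether referential integrity allowed it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

print(attempt("DELETE FROM Branch WHERE branchNo = 'B001'"))                   # rejected: S23 refers to B001
print(attempt("UPDATE Branch SET branchNo = 'B010' WHERE branchNo = 'B001'"))  # rejected
print(attempt("INSERT INTO Staff VALUES ('S30', 'B777')"))                     # rejected: no parent B777
print(attempt("DELETE FROM Branch WHERE branchNo = 'B002'"))                   # ok: no child records
```

In Microsoft Access, the corresponding choices appear as options in the Edit Relationships dialog ("Enforce Referential Integrity", with optional cascading of updates and deletes). SQL offers ON DELETE CASCADE and ON UPDATE CASCADE as the cascading alternatives to RESTRICT.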

8.6 Refine the design

The process of designing a relational database includes making sure that a table contains only
data directly related to the primary key, that each data field contains only one item of data, and
that redundant (duplicated) data is eliminated. The task of the database designer is to structure
the data in a way that eliminates unnecessary duplication and provides a rapid search path to
all necessary information. This process of specifying and defining tables, keys, columns and
relationships in order to create an efficient database is called normalization.

Normalization is part of successful database design. Without normalization, database systems
can be inaccurate, slow and inefficient, and they might not produce the data expected.

We use the normalization process to design efficient and functional databases. By normalizing,
we store data where it logically and uniquely belongs. The normalization process involves a
series of steps, each called a normal form. Forms range from the first normal form (1NF) to the
fifth normal form (5NF). There is also one higher level, called domain key normal form (DK/NF).
We will cover database normalization extensively in a later Unit.

8.7 Enter data and create other system objects

When we are satisfied that the table structures meet the design goals described here, then it's
time to go ahead and add all our existing data to the tables. We can then create any queries,
forms, reports, macros, and modules that we may want.

Summary

In this Unit, we have discussed the basic principles involved in designing relational
database tables. We also discussed extensively the concept of relationships between tables in
a relational database environment.

Self-Assessment Questions

Design a full relational database system showing the parent and relationship tables for the
following database scenario:

1. Students’ Record Management System


2. Alumni Record Management System
3. Hospital Record Management System

Unit 9: Relational Database Integrity
Expected Duration: 1 week or 2 contact hours

Introduction
In Unit 7, we discussed the structural part of the relational data model. As we
mentioned in Section 7.1, a data model has two other parts: a manipulative part, defining the
types of operations that are allowed on the data, and a set of integrity rules, which ensure that
the data is accurate. In this Unit, we discuss relational integrity.

Learning Outcomes
When you have studied this session, you should be able to explain:

9.1 Referential Integrity Concept


9.2 Types of Database Integrity
9.3 Relational Integrity Rules
9.4 Nulls
9.5 Entity integrity
9.6 Referential Integrity
9.7 Other Business Rules

9.1 Referential Integrity Concept

In addition to specifying relationships between two tables in a database, we also set up
referential integrity rules that help to maintain accuracy between tables. Setting referential
integrity rules prevents unwanted and accidental deletions and modifications of 'parent'
records that relate to records in the 'child' table, a type of problem that could be
catastrophic for any database system. The referential integrity rules keep the relationships
between tables intact and unbroken in a relational database management system. Referential
integrity prohibits us from changing existing data in ways that invalidate or harm the links
between tables.

NOTE: Referential integrity operates strictly on the basis of the tables' key fields. It is
checked each time a key field, whether primary or foreign, is added, changed or deleted. If any
of these actions creates an invalid relationship between two tables, it is said to violate
referential integrity. Referential integrity is a system of rules that Database Management
Systems (DBMSs) use to ensure that relationships between records in related tables are valid,
and that we don't accidentally delete or incorrectly change related data.

9.2 Types of Database Integrity

There are 4 types of database integrity:

1. Entity Integrity ensures that each row (record) is a unique instance in a particular table
by enforcing the integrity of the primary key or the identifier column(s) of a table (e.g.
ID, Reference Code, etc).
2. Domain Integrity ensures validity of entries (data input) for a column through the data
type, the data format and the range of possible values (e.g. date, time, age, etc.).

3. Referential Integrity preserves the defined relationships between tables when records
are added, modified or deleted by ensuring that the key values are consistent across
tables; such consistency requires that there are no references to non-existent values and
if a key value changes, all references to it change consistently through database,
otherwise a key value cannot be changed.
4. User-Defined Integrity enables specific (required) business rule(s) to be defined and
established in order to provide correct and consistent control of an application's data
access (e.g. who has permission to modify data, what generated reports should look like,
which data can be modified, etc.).

9.3 Relational Integrity Rules

The integrity of data in the relational database must be enforced. First, a relation (table)
must have a unique primary key; second, the data in the database must be consistent.

The integrity rules in most databases are:

(i) The need for primary key values;
(ii) Uniqueness of the primary key values; and
(iii) The domains of the values of the attributes; e.g. if an attribute holds a state of a
country, then for a Nigerian the domain of that attribute would be the states in Nigeria.

Since every column has an associated domain, there are constraints (called domain constraints)
in the form of restrictions on the set of values allowed for the columns of tables. In addition,
there are two important integrity rules, which are constraints or restrictions that apply to all
instances of the database. The two principal rules for the relational model are known as entity
integrity and referential integrity. Before we define these terms, we need first to understand
the concept of nulls.

9.4 Nulls

Null represents a value for a column that is currently unknown or is not applicable for
this record.

A null can be taken to mean ‘unknown’. It can also mean that a value is not applicable to a
particular record, or it could just mean that no value has yet been supplied. Nulls are a way to
deal with incomplete or exceptional data.

However, a null is not the same as a zero numeric value or a text string filled with spaces; zeros
and spaces are values, but a null represents the absence of a value. Therefore, nulls should be
treated differently from other values. For example, suppose it was possible for a branch to be
temporarily without a manager, perhaps because the manager has recently left and a new
manager has not yet been appointed.

In this case, the value for the corresponding mgrStaffNo column would be undefined. Without
nulls, it becomes necessary to introduce false data to represent this state or to add additional
columns that may not be meaningful to the user. In this example, we may try to represent the
absence of a manager with the value ‘None at present’. Alternatively, we may add a new
column ‘currentManager?’ to the Branch table, which contains a value Y (Yes), if there is a
manager, and N (No), otherwise. Both these approaches can be confusing to anyone using the
database.
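A short sketch of how a DBMS treats the Branch example, written in Python with the standard sqlite3 module (the branch numbers and staff number are made up). A missing manager is stored as NULL, any comparison with NULL is unknown rather than true or false, and the IS NULL test must be used to find such rows. The empty string in B003 is a value, not a null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Branch table from the text; mgrStaffNo may be null while no manager is appointed.
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY NOT NULL, mgrStaffNo TEXT)")
conn.executemany(
    "INSERT INTO Branch VALUES (?, ?)",
    [("B001", "S100"), ("B002", None), ("B003", "")],  # Python None is stored as NULL
)

# NULL equals nothing, not even another NULL: the result is unknown (returned as None).
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
zero_eq_null = conn.execute("SELECT 0 = NULL").fetchone()[0]

# "= NULL" never matches any row; IS NULL is the correct test.
eq_rows = conn.execute("SELECT branchNo FROM Branch WHERE mgrStaffNo = NULL").fetchall()
is_rows = conn.execute("SELECT branchNo FROM Branch WHERE mgrStaffNo IS NULL").fetchall()

print(null_eq_null, zero_eq_null, eq_rows, is_rows)  # None None [] [('B002',)]
```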

Having defined nulls, we are now in a position to define the two relational integrity rules.

9.5 Entity integrity

The first integrity rule applies to the primary keys of base tables.

Entity integrity: In a base or parent table, no column of a primary key can be null.

A base or parent table is a named table whose records are physically stored in the database.
This is in contrast to a view. A view is a ‘virtual table’ that does not actually exist in the
database but is generated by the DBMS from the underlying base/parent tables whenever it’s
accessed.

The null primary key rule states that the primary key values should not be null. From an earlier
definition, we know that a primary key is a minimal identifier that is used to identify records
uniquely. This means that no subset of the primary key is sufficient to provide unique
identification of records. If we allow a null for any part of a primary key, we’re implying that
not all the columns are needed to distinguish between records, which contradicts the definition
of the primary key. For example, as branchNo is the primary key of the Branch table, we should
not be able to insert a record into the Branch table with a null for the branchNo column.
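The sketch below, again Python with the sqlite3 module, shows the DBMS refusing both a null and a duplicate branchNo. The city column and its values are assumptions. Note the explicit NOT NULL: unlike the SQL standard, SQLite otherwise accepts NULL in most PRIMARY KEY columns, so this behaviour should be checked in whichever system is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, for historical reasons, otherwise allows
# NULL in a PRIMARY KEY column that is not an INTEGER PRIMARY KEY.
conn.execute("""
    CREATE TABLE Branch (
        branchNo TEXT NOT NULL PRIMARY KEY,
        city     TEXT
    )
""")
conn.execute("INSERT INTO Branch VALUES ('B003', 'Ibadan')")

def try_insert(branch_no, city):
    """Return True if the row is accepted, False if an integrity rule rejects it."""
    try:
        conn.execute("INSERT INTO Branch VALUES (?, ?)", (branch_no, city))
        return True
    except sqlite3.IntegrityError:
        return False

null_key_ok = try_insert(None, "Lagos")    # violates entity integrity
duplicate_ok = try_insert("B003", "Oyo")   # violates primary key uniqueness
print(null_key_ok, duplicate_ok)           # False False
```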

9.6 Referential Integrity

The second integrity rule applies to foreign keys.

Referential integrity: If a foreign key exists in a table, either the foreign key value
must match a candidate key value of some record in its home/parent table or the foreign
key value must be wholly null.

In Figure 7.1 of Unit 7, branchNo in the Staff table is a foreign key targeting the
branchNo column in the home (parent) table, Branch. It should not be possible to create a staff
record with branch number B300, for example, unless there is already a record for branch
number B300 in the Branch table. However, we should be able to create a new staff record with
a null in the branchNo column to allow for the situation where a new member of staff has joined
the company but has not yet been assigned to a particular branch.
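A minimal sketch of the Staff/Branch rule above, using Python's sqlite3 module. SQLite checks foreign keys only after PRAGMA foreign_keys = ON, and the staff numbers are hypothetical. The B300 row is rejected, while a staff row with a null branchNo is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Branch (branchNo TEXT NOT NULL PRIMARY KEY)")
conn.execute("""
    CREATE TABLE Staff (
        staffNo  TEXT NOT NULL PRIMARY KEY,
        branchNo TEXT REFERENCES Branch (branchNo)  -- nullable foreign key
    )
""")
conn.execute("INSERT INTO Branch VALUES ('B003')")

conn.execute("INSERT INTO Staff VALUES ('SG37', 'B003')")  # matches a Branch row: accepted
conn.execute("INSERT INTO Staff VALUES ('SG99', NULL)")    # wholly null: accepted

try:
    conn.execute("INSERT INTO Staff VALUES ('SG14', 'B300')")  # there is no branch B300
    b300_accepted = True
except sqlite3.IntegrityError:
    b300_accepted = False

staff = conn.execute("SELECT staffNo, branchNo FROM Staff ORDER BY staffNo").fetchall()
print(b300_accepted, staff)  # False [('SG37', 'B003'), ('SG99', None)]
```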

Consider the following relations again:

Student (matricNo, Name, DBirth, Gender, Hall)


Course (courseCode, Title, Unit, Status, Semester)
Offers (matricNo, courseCode, score, session)

If 789 exists as a matric number (the primary key) in the Student table, we can have a
record with this matric number in the Offers table. But we cannot have a record with matric
number 777 in the Offers table if 777 has not been defined in the base/parent table (Student).

Foreign keys are attributes (possibly composite) in one relation R2 whose values are required
to match the primary key values of some other relation R1, where R1 and R2 need not be
distinct (they may be the same relation). R2 is the referencing (child) relation, while
R1 is the parent, base or referenced relation. This requirement is what is referred to as a
referential integrity constraint. In other words, no foreign key (FK) value in R2 is allowed unless
that value is already present in the primary key (PK) of R1.

9.7 Other Business Rules

Business rules: Rules that define or constrain some aspect of the organization.

Examples of business rules include domains, which constrain the values that a particular
column can have, and the relational integrity rules that we have just discussed. Another
example is multiplicity, which defines the number of occurrences of one entity (such as a
branch) that may relate to a single occurrence of an associated entity (such as a member of
staff).

It is also possible for users to specify additional constraints that the data must satisfy. For
example, if our videoCompany database has a rule that a member can only rent a maximum of
10 videos at any one time, then the user must be able to specify this rule and expect the DBMS
to enforce it. In this case, it should not be possible for a member to rent a video if the number
of videos the member currently has rented is 10. Unfortunately, the level of support for business
rules varies from system to system.
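One way to make a DBMS enforce the ten-video rule is a trigger. The sketch below assumes a hypothetical Rental table with one row per video currently on rent, and uses SQLite trigger syntax through Python's sqlite3 module; other systems express such rules differently.

```python
import sqlite3

MAX_RENTALS = 10  # business rule from the text

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per video a member currently has on rent.
conn.execute("CREATE TABLE Rental (memberNo TEXT NOT NULL, videoNo TEXT NOT NULL)")
# The trigger aborts an insert when the member already holds MAX_RENTALS videos.
conn.execute(f"""
    CREATE TRIGGER rental_limit
    BEFORE INSERT ON Rental
    WHEN (SELECT COUNT(*) FROM Rental WHERE memberNo = NEW.memberNo) >= {MAX_RENTALS}
    BEGIN
        SELECT RAISE(ABORT, 'member already has the maximum number of videos');
    END
""")

for i in range(MAX_RENTALS):
    conn.execute("INSERT INTO Rental VALUES ('M1', ?)", (f"V{i}",))

try:
    conn.execute("INSERT INTO Rental VALUES ('M1', 'V99')")  # would be the 11th video
    eleventh_accepted = True
except sqlite3.DatabaseError:  # base class of the error raised by RAISE(ABORT, ...)
    eleventh_accepted = False

count = conn.execute("SELECT COUNT(*) FROM Rental WHERE memberNo = 'M1'").fetchone()[0]
print(eleventh_accepted, count)  # False 10
```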

When a database is created, some things have to be taken into consideration to ensure that the
database stays consistent with respect to the constraints when a referenced record is deleted or
modified. The methods involved in doing this are:

(i) Restricted: Before a record can be deleted or modified, its primary key value must not
be participating in any relationship. Before we delete a student record, we have to check
whether the student's matric number occurs in any other table. One way of doing this is to
add an extra field, called a flag, indicating Active (when the record is entered) and Inactive
(when the student leaves), or a status field (valid/invalid), so that invalid records are not
listed when the table is listed.

(ii) Cascade: As soon as we delete a record with a particular matric number, all other
records in the database with that matric number are also deleted.

(iii) Nullify: The matching foreign key values are set to NULL. When we delete the student with
matric number 100 from the Student table, the matric number in every record of other
relations that referred to 100 is set to NULL; those records themselves are kept.

(iv) A dialog box (a form for user action) is shown to the user, reporting that an
inconsistency would occur and asking, in effect, "What do you want me to do?"

(v) Archive the data before taking action

(vi) Merge data values where appropriate and meaningful

Accordingly, for each foreign key, the database designer should specify:

(a) The attribute or attribute combination that defines the foreign key
(b) The target or referenced primary key
(c) The foreign key rules. These are the NULL, DELETE and UPDATE rules:

NULL: whether the foreign key can accept NULL, in full or in part.
DELETE: what the system should do in the event that a target primary key value is to be deleted
(restrict, cascade, nullify, etc.).
UPDATE: what the system should do when the target primary key value is to be modified
or changed.
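The DELETE rules (restrict, cascade, nullify) correspond to the ON DELETE RESTRICT, CASCADE and SET NULL actions of an SQL foreign key. The sketch below uses Python with sqlite3, and the student name and course code are hypothetical. It deletes student 100 under each rule and reports what happens to the matching Offers record.

```python
import sqlite3

def demo(action):
    """Delete student 100 under the given ON DELETE action; return (deleted?, Offers rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Student (matricNo INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(f"""
        CREATE TABLE Offers (
            matricNo   INTEGER REFERENCES Student (matricNo) ON DELETE {action},
            courseCode TEXT
        )
    """)
    conn.execute("INSERT INTO Student VALUES (100, 'Ada')")
    conn.execute("INSERT INTO Offers VALUES (100, 'CSC472')")
    try:
        conn.execute("DELETE FROM Student WHERE matricNo = 100")
        deleted = True
    except sqlite3.IntegrityError:
        deleted = False  # RESTRICT refuses to delete a referenced record
    offers = conn.execute("SELECT matricNo, courseCode FROM Offers").fetchall()
    return deleted, offers

results = {a: demo(a) for a in ("RESTRICT", "CASCADE", "SET NULL")}
for action, outcome in results.items():
    print(action, outcome)
# RESTRICT (False, [(100, 'CSC472')])
# CASCADE (True, [])
# SET NULL (True, [(None, 'CSC472')])
```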

Summary

A null represents a value for a column that is unknown at the present time or is not defined for
this record. Entity integrity is a constraint that states that in a base table no column of a primary
key can be null. Referential integrity states that foreign key values must match a candidate
key value of some record in the home (parent) table or be wholly null.

Self Assessment Questions (SAQs)

1. Discuss the concept of referential integrity constraints with reference to an example
relational database.
2. What is the importance of Null, Entity Integrity and Referential Integrity in database
design?
3. Find out how referential integrity constraints can be enforced in the MySQL or MS
Access database system.

Unit 10: Relational Calculus – An Introduction
Expected Duration: 1 week or 2 contact hours

Introduction

How do we manipulate databases after creation? Within the relational data model, the data in
the various tables can be manipulated by means of certain primitive operators. Each operator
works on either one table or two tables at a time, and the result of an operation is another,
derived table, to which another operator may be applied, and so on, until the required tables in
the database are updated or the required set of data is retrieved. You will be introduced to the
relational operators in this unit.

Learning Outcomes
When you have studied this session, you should be able to explain:
10.1 The Relational Calculus
10.2 Relational Operators
10.3 One Table Operators
10.3.1 Restrict/Select
10.3.2 Project
10.4 Two-Table Operators
10.4.1 Cartesian Product.
10.4.2 Union.
10.4.3 Intersection
10.4.4 Difference
10.4.4 Divide / Quotient (÷)
10.4.5 Natural Join
10.4.6 Theta Join (θ)
10.4.7 Equi Join

10.1 The Relational Calculus

The Relational Calculus is a formal query language. Instead of having to write a sequence of
relational algebra operations, we simply write a single declarative expression, describing the
results that we want. This is somewhat akin to writing a program in C or Java instead of
assembler, or (in the spirit of real-world examples!) telling the babysitter to call with any
problems instead of detailing how to pick up the phone, dial numbers, etc.

Its expressive power is identical to that of relational algebra. Many commercial databases use
a language like Structured Query Language or even a language like QBE (Query by Example)
or QUEL (similar to SQL and used for the INGRES RDBMS). A specific relational query
language is said to be relationally complete if it can be used to express any query that the
relational calculus supports.

There are two common ways of creating a relational calculus (both are based on First Order
Predicate Calculus, or basic logical operators).

• In a Tuple Relational Calculus, variables range over tuples, i.e., variables can take on
values of individual table rows. This is just what we want for a routine query, such
as selecting all food items (tuples) from a grocery store (table) where all the ingredients
(specific attribute) are organic (value), say.

• In a Domain Relational Calculus, variables range over domain values of the attributes.
This tends to be more complex, and variables are required for each distinct attribute.
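For comparison, assume a relation Loan(loanNumber, branchName, amount) like the one queried in Unit 11. The request "find the loan numbers of loans made at the Ibadan branch for more than 1000" might then be written in the two calculi as follows. This is a sketch in common textbook notation, not the only way to write it.

```latex
% Tuple relational calculus: t ranges over result tuples, s over tuples of Loan
\{\, t \mid \exists s \in \mathit{Loan}\; ( t[\mathit{loanNumber}] = s[\mathit{loanNumber}]
      \wedge s[\mathit{branchName}] = \text{'Ibadan'} \wedge s[\mathit{amount}] > 1000 ) \,\}

% Domain relational calculus: l, b, a range over attribute values
\{\, \langle l \rangle \mid \exists b\, \exists a\; ( \langle l, b, a \rangle \in \mathit{Loan}
      \wedge b = \text{'Ibadan'} \wedge a > 1000 ) \,\}
```

The tuple form needs one variable per tuple, while the domain form needs a variable for each attribute (l, b, a). This is the extra bookkeeping mentioned above.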

10.2 Relational Operators

E.F. Codd's work that inspired RDBMSs was based on mathematical notions, so it is no surprise
that the theory of database operations is based on set theory. The Relational Algebra provides
a collection of operations to manipulate relations. It supports the notion of a query, or request
to retrieve information from a database.

Relational algebra is a procedural query language. It takes one or two relations as input and
produces a new relation as result. The fundamental operations in the relational algebra are select,
project, union, set difference and Cartesian product; others include intersection, join, etc. The
operators used on relational databases are either one-table or two-table operators. One-table
operators are:
(i) Restrict or Select and
(ii) Project

The two table operators are:


(i) Product (Cartesian)
(ii) Union
(iii) Intersection
(iv) Difference
(v) Division
(vi) Natural join
(vii) Theta join
(viii) Equi join

10.3 One Table Operators

10.3.1 Restrict/Select: This operator extracts specified tuples (rows) from a given relation
based on a specified condition. We extract the tuples for which the condition is true, so the
result is usually smaller than the original table. Sigma (σ) is used to denote the selection
operator. The predicate appears as a subscript to σ and the argument relation is given in
parentheses following σ. For example, to select the tuples from Students where Paid_Status is okay:

$\sigma_{\text{Paid\_Status} = \text{"Okay"}}(\text{Students})$

Table 10.1: The Students relation

Name            Fee-Code   Paid_Status
Chuks Nwata     F001       Okay
Mallam Aminat   F001       No

Only the first tuple satisfies the predicate, so the resulting relation contains the single tuple
(Chuks Nwata, F001, Okay).

The comparison operators (=, <, >, ≤, ≥, <>) and the logical operators ∧ (and), ∨ (or) and ¬ (not)
can be used in the selection predicate.
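To make the operator concrete, the sketch below models a relation as a Python list of dictionaries and implements selection as a filter over the rows of Table 10.1. This representation is only an assumption for illustration; it is not how a DBMS stores data.

```python
# A relation modelled as a list of dicts, one dict per tuple.
Students = [
    {"Name": "Chuks Nwata", "Fee-Code": "F001", "Paid_Status": "Okay"},
    {"Name": "Mallam Aminat", "Fee-Code": "F001", "Paid_Status": "No"},
]

def select(relation, predicate):
    """sigma_predicate(relation): keep only the tuples for which the predicate is true."""
    return [t for t in relation if predicate(t)]

paid = select(Students, lambda t: t["Paid_Status"] == "Okay")
# Predicates may combine comparisons with and / or / not.
either = select(Students, lambda t: t["Paid_Status"] == "Okay" or t["Fee-Code"] != "F001")
print([t["Name"] for t in paid])  # ['Chuks Nwata']
```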

10.3.2 Project: This extracts specified attributes from a specified relation in the order in which
the attributes are specified; that is, the operator picks certain attributes (columns) from a
relation. Project is denoted by pi (π). For example, if we want to list only the loan number and
amount of the loans, we write:

$\pi_{\text{loan-number},\,\text{amount}}(\text{Loan})$

Table 10.2: The resulting table

Loan-number   Amount
L17           1000
L19           1400
L29           1300

In summary, $\pi_{attrlist}(R)$ selects the columns with attributes in attrlist from relation R. We might
have a huge employee table with many attributes we don't want to see, so we can look at a more
directed projection of, perhaps, just SSN and salary. (If the projected attributes do not include a
key, the result may contain duplicate tuples; since a relation is a set, such duplicates are discarded.)
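A matching sketch for projection, with tuples as Python tuples. The branch-name column and its values are hypothetical additions to the Loan data shown above, included so that duplicate elimination can be seen.

```python
Loan = [
    ("L17", "Ibadan", 1000),  # (loan-number, branch-name, amount)
    ("L19", "Oyo", 1400),
    ("L29", "Ibadan", 1300),
]
ATTRS = ("loan-number", "branch-name", "amount")

def project(relation, attrs, wanted):
    """pi_wanted(relation): keep the named columns, in the order given."""
    positions = [attrs.index(name) for name in wanted]
    # Building a set removes duplicate tuples, since a relation may not contain duplicates.
    return {tuple(t[i] for i in positions) for t in relation}

numbers_and_amounts = sorted(project(Loan, ATTRS, ["loan-number", "amount"]))
branches = sorted(project(Loan, ATTRS, ["branch-name"]))
print(numbers_and_amounts)  # [('L17', 1000), ('L19', 1400), ('L29', 1300)]
print(branches)             # [('Ibadan',), ('Oyo',)]: three rows shrink to two
```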

10.4 Two-Table Operators

10.4.1 Cartesian Product.


This builds a relation from two specified relations. The resultant relation consists of all
possible concatenations of a tuple from the first relation with a tuple from the second. The
Cartesian product, denoted by ×, allows us to combine two relations. The Cartesian product of
r1 and r2 is written r1 × r2. Since r1 and r2 may have some common attributes, such attributes
are distinguished by attaching the name of the relation from which the attribute originally came.
For example, the relation schema for r = borrower × loan is

(borrower.customer-name, borrower.loan-number, borrower.branch-name, loan.loan-number, loan.loan-amount)

The relation name prefix can be dropped from those attributes that appear in only one of the
relations. If we have n1 tuples in r1 and n2 tuples in r2, then there are n1 * n2 ways of choosing
pairs of tuples, one from each relation, so r1 × r2 contains n1 * n2 tuples.

A        B        A × B
a  c     x  y     a  c  x  y
b  d     z  i     a  c  z  i
         j  k     a  c  j  k
                  b  d  x  y
                  b  d  z  i
                  b  d  j  k

Product is rarely used alone; it is mainly used as a step towards the equi, theta or natural join.

Unlike union, the Cartesian product does not require the two relations to be "union compatible"
(UC, i.e. having the same tuple types). It creates tuples with combined attributes:
R(AttrR1, AttrR2, ..., AttrRi) × S(AttrS1, AttrS2, ..., AttrSj) results in a relation Q with i + j
attributes, Q(AttrR1, ..., AttrRi, AttrS1, ..., AttrSj).
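The A × B table above can be reproduced with itertools.product from Python's standard library. Each result tuple is the concatenation of one tuple from A with one tuple from B.

```python
from itertools import product

A = [("a", "c"), ("b", "d")]
B = [("x", "y"), ("z", "i"), ("j", "k")]

# Concatenate every tuple of A with every tuple of B.
AxB = [a + b for a, b in product(A, B)]
for row in AxB:
    print(*row)
assert len(AxB) == len(A) * len(B)  # n1 * n2 = 2 * 3 = 6
```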

10.4.2 Union.
Given two "union compatible" relations (having the same tuple types; "UC"), union returns a new
relation consisting of their set union. For instance, suppose Akin John is the new Chief
Executive Officer (CEO) of two merged companies A and B and wants to see the total set of his
employees; the Union operation is performed thus: A.EmployeeTable OR B.EmployeeTable.

Union builds a relation consisting of all tuples appearing in either or both of two specified
relations. The union of two relations r and s, denoted by $r \cup s$, is the set of all tuples in r or
s. Duplicates are eliminated so as not to violate the uniqueness rule.

Given U = (A1, A2), if A has three tuples and B has five tuples, then the result relation has at most
8 tuples (exactly 8 only if A and B have no tuple in common). However, two relations of different
degrees cannot be united, and neither their intersection nor their difference can be found.

For a union operation to be valid, the following conditions must therefore be satisfied:
(i) The two relations must be of the same arity (degree).
(ii) The domain of the ith attribute must be the same in both relations, for all i.
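A sketch of union using Python sets, which eliminate duplicates automatically. The employee tables are hypothetical. The degree check mirrors condition (i); condition (ii) on domains is not checked here.

```python
def union(r, s):
    """r ∪ s as a set; only the degree is checked here, not the attribute domains."""
    degrees = {len(t) for t in r} | {len(t) for t in s}
    if len(degrees) > 1:
        raise ValueError("relations are not union compatible")
    return set(r) | set(s)

# Hypothetical (EmpNo, Name) tables of the two merged companies.
A_emp = {(1, "Akin"), (2, "Bisi"), (3, "Chidi")}
B_emp = {(3, "Chidi"), (4, "Dayo"), (5, "Efe"), (6, "Femi"), (7, "Gbenga")}

everyone = union(A_emp, B_emp)
print(len(A_emp), len(B_emp), len(everyone))  # 3 5 7: the shared tuple appears once
```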

10.4.3 Intersection
Intersection creates a new relation by intersecting two UC relations. For instance, Akin John
wants a table of all organizations that are both vegetarian and raw-food in their orientation:
A.VegetarianOrganizations AND B.RawFoodOrganizations

Intersection builds a relation from the tuples common to two relations of the same degree.
The intersection of two relations r and s, denoted by $r \cap s$, is the set of all tuples that are
common to r and s; that is, the result consists of all tuples that occur in both relations. The
intersection can be rewritten with a pair of set difference operations as

$r \cap s = r - (r - s)$

A good example of intersection is listing the customers of a bank who both have an account and
have a loan.
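The identity can be checked on small sets. The customer names below are hypothetical, and each set plays the role of a one-column relation.

```python
# Hypothetical bank data: customers with accounts and customers with loans.
depositors = {"Ade", "Bola", "Chika", "Dapo"}
borrowers = {"Bola", "Dapo", "Emeka"}

both = depositors & borrowers                           # r ∩ s
via_difference = depositors - (depositors - borrowers)  # r - (r - s)

print(sorted(both))            # ['Bola', 'Dapo']
assert both == via_difference  # the identity r ∩ s = r - (r - s)
```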

10.4.4 Difference
The Difference returns the set difference of two UC relations. For instance, Akin John wants
to look at a table of all restaurants in Ibadan that serve vegetarian food but not veal (meat).

This builds a relation consisting of all tuples appearing in the first but not in the second of
two specified relations.

In general, A – B is not the same as B – A; the two are equal only when A and B are the same
relation (and then both are empty). Difference is not commutative.

A        B        A – B    B – A
a  c     x  y     a  c     z  i
b  d     z  i     b  d     j  k
x  y     j  k
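The A and B relations above can be reproduced with Python sets to confirm that the two differences are not the same.

```python
A = {("a", "c"), ("b", "d"), ("x", "y")}
B = {("x", "y"), ("z", "i"), ("j", "k")}

a_minus_b = A - B
b_minus_a = B - A
print(sorted(a_minus_b))  # [('a', 'c'), ('b', 'd')]
print(sorted(b_minus_a))  # [('j', 'k'), ('z', 'i')]
assert a_minus_b != b_minus_a  # difference is not commutative
```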

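10.4.4 Divide / Quotient (÷)
Divide takes two relations, one binary and the other unary (degrees 2 and 1, respectively), and
builds a relation consisting of all values of one attribute of the binary relation that are paired
(in the other attribute) with all values in the unary relation. A Divide can only be done with two
relations, one of degree 2 and the other of degree 1.

A                              B              C
Customer-name  Branch-name     Branch-name    Customer-name
Johnson        Ibadan          Ilorin         Smith
Smith          Ilorin          Oyo            Dejo
Alabi          Lagos
Ayisat         Abuja
Olu            Owerri
Dejo           Oyo
Mutiat         Osogbo

In the table, C is given as the result of A ÷ B. Strictly, by the definition above, A ÷ B keeps only
customers paired with both Ilorin and Oyo; no customer in A satisfies this, so A ÷ B is empty for
this data. C lists the customers paired with at least one of the branches in B.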
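A sketch of division for the binary/unary case described above, using Python sets and the data from the table. With that data the quotient is empty. The two extra tuples in A2 are hypothetical and are added only to show a non-empty result.

```python
def divide(a, b):
    """a ÷ b for binary a(x, y) and unary b(y): the x values paired with every y in b."""
    ys = set(b)
    xs = {x for x, _ in a}
    return {x for x in xs if all((x, y) in a for y in ys)}

A = {("Johnson", "Ibadan"), ("Smith", "Ilorin"), ("Alabi", "Lagos"),
     ("Ayisat", "Abuja"), ("Olu", "Owerri"), ("Dejo", "Oyo"), ("Mutiat", "Osogbo")}
B = {"Ilorin", "Oyo"}

print(divide(A, B))  # set(): nobody is paired with both Ilorin and Oyo

# Hypothetical extra tuples, added only to show a non-empty quotient.
A2 = A | {("Smith", "Oyo"), ("Dejo", "Ilorin")}
print(sorted(divide(A2, B)))  # ['Dejo', 'Smith']
```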

10.4.5 Natural Join


The natural join combines two relations on their common attributes and removes the duplicate
copies of those attributes using the Project operator. First, find the Cartesian product of the two
relations; then select only those tuples whose values agree on the common attributes; finally,
project away the duplicate attribute columns. For example: find the names of all customers who
have a loan at the bank, and the amount of the loan.

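10.4.6 Theta Join (θ)

This joins two relations on a comparison θ between attributes of the two relations; in this unit,
the term is used for comparisons other than equality (e.g. <, >, <>).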

10.4.7 Equi Join


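After a Cartesian product has been performed on the relations, this selects the tuples whose join
attributes (typically the primary key of one relation and the matching foreign key of the other)
have equal values.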

Consider the example relations A and B below:

A (Matric: primary key)     B (Matric: foreign key)
Matric  Age                 Matric  Cleared  Hall
1       20                  2       Y        Zik
2       24                  3       N        Awo
3       44                  4       Y        Kuti

Cartesian product A × B:

RecNo  Matric1  Age  Matric2  Cleared  Hall
1      1        20   2        Y        Zik
2      1        20   3        N        Awo
3      1        20   4        Y        Kuti
*4     2        24   2        Y        Zik
5      2        24   3        N        Awo
6      2        24   4        Y        Kuti
7      3        44   2        Y        Zik
*8     3        44   3        N        Awo
9      3        44   4        Y        Kuti
Equi Join (EJ) occurs where Matric1 = Matric2. RecNos 4 and 8 satisfy the equi join, since the
matrics are the same.

For Natural Join (NJ), restrict the Cartesian product to the tuples with Matric1 = Matric2
(RecNos 4 and 8), and then project away the duplicate attribute: we keep Matric1, Age, Cleared
and Hall, and Matric2 is dropped.

NJ = P(R(CP[A, B])), i.e. perform the Cartesian product first, restrict next and finally project.

Matric  Age  Cleared  Hall
2       24   Y        Zik
3       44   N        Awo

A Theta Join (TJ) goes through the same process, but we specify a condition that is not an
equality, e.g. Matric1 > Matric2:

TJ = P(R_condition(CP[A, B]))

Here only RecNo 7 satisfies Matric1 > Matric2, giving the tuple (3, 44, 2, Y, Zik).
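The three joins can be traced on the A and B relations above with a few lines of Python: build the product, restrict, then project. Tuples are plain Python tuples, and the column positions used below follow the order Matric1, Age, Matric2, Cleared, Hall.

```python
from itertools import product

A = [(1, 20), (2, 24), (3, 44)]                          # (Matric, Age)
B = [(2, "Y", "Zik"), (3, "N", "Awo"), (4, "Y", "Kuti")]  # (Matric, Cleared, Hall)

cp = [a + b for a, b in product(A, B)]               # Cartesian product: 9 rows
equi = [r for r in cp if r[0] == r[2]]               # restrict: Matric1 = Matric2
natural = [(r[0], r[1], r[3], r[4]) for r in equi]   # project: drop Matric2
theta = [r for r in cp if r[0] > r[2]]               # restrict: Matric1 > Matric2

print(len(cp))   # 9
print(equi)      # [(2, 24, 2, 'Y', 'Zik'), (3, 44, 3, 'N', 'Awo')]
print(natural)   # [(2, 24, 'Y', 'Zik'), (3, 44, 'N', 'Awo')]
print(theta)     # [(3, 44, 2, 'Y', 'Zik')]
```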

Summary
In this Unit, the concept of relational algebra was discussed. The different operators in
relational algebra were highlighted and illustrated with examples. The relational algebra forms
the basis for the Structured Query Language (SQL).

Unit 11 Structured Query Language (SQL) – An Introduction
Expected Duration: 1 week or 2 contact hours

Introduction

The relational operators were proposed as a means of manipulating relational databases by
indicating specific subsets of data tables, records or fields in the database for the purpose of:
(a) Retrieval
(b) Update
(c) Defining views or virtual relations
(d) Defining snapshot data
(e) Defining access rights: who should have access, and at what time
(f) Defining integrity constraints, i.e. specific rules that the database or a
section of it must satisfy.
The expressions or operators serve as a high-level, symbolic method of representing the user's
intent. They also provide a high-level representation of the various database
manipulations that are required and, hence, should ideally be supported by a relational data
manipulation language such as SQL. In other words, the expressive power of any such language
can be assessed by how well, and how conveniently for the user, the language
can be used to implement each of the operators.

Of course, if a language does not provide a high-level equivalent of the operators, or of any of
them, then the database administrator would have to provide it, or the user would have
to program at a lower level to achieve that functionality. SQL was developed at IBM by
Donald D. Chamberlin and Raymond F. Boyce.

SQL is both a Data Definition Language (DDL) and a Data Manipulation Language (DML).
As a DDL, it allows a database administrator or database designer to define tables, create views,
etc. As a DML, it allows an end user to retrieve information from tables. It grew out of SEQUEL,
an IBM Research effort whose intent was to create a structured, English-like query language to
interface with the early System R database system. Along with QUEL, SQL was one of the first
high-level declarative database languages.

Learning Outcomes
When you have studied this session, you should be able to explain:
11.1 Few SQL Commands
11.1.1 DDL- Data Definition Language
11.1.2 DML - Data Manipulation Language
11.1.3 DCL- Data Control Language
11.2 Creating Tables
11.3 SELECT STATEMENT (GENERAL FORMAT)

11.1 Few SQL Commands


11.1.1 DDL- Data Definition Language
The DDL component of an RDBMS can be used to:
• CREATE TABLE, e.g. to create the Student and Course tables
• CREATE VIEW
• CREATE INDEX
• DROP TABLE, DROP VIEW, DROP INDEX: to remove a table, view or index completely
• ALTER TABLE, ALTER VIEW, ALTER INDEX: to modify an existing definition

11.1.2 DML - Data Manipulation Language


The DML provides the commands for manipulating the data in the database:
• SELECT (retrieval)
• UPDATE
• DELETE
• INSERT
(UPDATE, DELETE and INSERT are the data update commands.)

11.1.3 DCL- Data Control Language

The DCL pertains to the language elements for controlling access, i.e.:

(1) Access control
(2) Grant Transaction Log in
(3) Concurrency control: if two users want to access the database at the same time,
which has the higher priority?
Note: Specifications such as primary keys and foreign keys belong to the DDL.
11.2 Creating Tables

CREATE TABLE TableName
(field list,
PRIMARY KEY (primary key field(s)),
FOREIGN KEY (field(s)) REFERENCES TableX (primary key of TableX))

CREATE TABLE RESULTS
(CourseNo CHAR(6) NOT NULL,
Matric CHAR(6) NOT NULL,
MarkObtained DECIMAL(6,2),
PRIMARY KEY (CourseNo, Matric),
FOREIGN KEY (CourseNo) REFERENCES Courses,
FOREIGN KEY (Matric) REFERENCES Students)
NOTE: The Courses and Students tables must already have been created.
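The corrected statement can be tried in SQLite through Python's sqlite3 module, which accepts the CHAR and DECIMAL type names shown. The columns of Courses and Students and all data values are hypothetical. The sketch shows that both a duplicate (CourseNo, Matric) key and an unknown course are rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Minimal parent tables (hypothetical columns); they must exist before RESULTS.
conn.execute("CREATE TABLE Courses (CourseNo CHAR(6) NOT NULL PRIMARY KEY, Title TEXT)")
conn.execute("CREATE TABLE Students (Matric CHAR(6) NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE RESULTS (
        CourseNo     CHAR(6) NOT NULL,
        Matric       CHAR(6) NOT NULL,
        MarkObtained DECIMAL(6,2),
        PRIMARY KEY (CourseNo, Matric),
        FOREIGN KEY (CourseNo) REFERENCES Courses,
        FOREIGN KEY (Matric) REFERENCES Students
    )
""")
conn.execute("INSERT INTO Courses VALUES ('CSC472', 'Database Systems')")
conn.execute("INSERT INTO Students VALUES ('123456', 'Ada')")
conn.execute("INSERT INTO RESULTS VALUES ('CSC472', '123456', 71.5)")

errors = []
for row in [("CSC472", "123456", 80),   # same (CourseNo, Matric): duplicate key
            ("CSC999", "123456", 50)]:  # CSC999 is not in Courses
    try:
        conn.execute("INSERT INTO RESULTS VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))
print(len(errors))  # 2
```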

CREATE TABLE customer


(customerName char(20) NOT NULL,
customerStreet char(30),
customerCity char(30),
PRIMARY KEY (customerName))

CREATE TABLE branch
(branchName char(15) NOT NULL,
branchCity char(30),
assets integer,
PRIMARY KEY (branchName),
CHECK (assets >= 0))

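The CHECK clause restricts the domain of a column; here it rejects negative values of assets.
11.2.2 CREATE UNIQUE INDEX STUDENT ON
Students (Matric) CLUSTER;

NOTE: In DB2, a unique index must be created on the primary key of each table created. An index
is created to speed up the search for data, i.e. information retrieval. The primary key (the unique
identification of each record) only prevents duplicate records; it does not keep the records in
order. The index keeps the key values in order; CLUSTER additionally asks DB2 to store the table's
rows in (approximately) index order, and UNIQUE makes the index reject duplicate key values.

NOTE: UNIQUE is optional; without it, the index allows duplicate key values.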
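SQLite has no CLUSTER option, so the sketch below (Python with sqlite3; the index name StudentIdx is hypothetical) illustrates only the UNIQUE part: a second row with the same Matric is rejected. The printed query plan should mention the index, although its exact wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Matric TEXT, Name TEXT)")
# The UNIQUE part of the DB2 statement; SQLite has no CLUSTER clause.
conn.execute("CREATE UNIQUE INDEX StudentIdx ON Students (Matric)")
conn.execute("INSERT INTO Students VALUES ('111', 'A')")

try:
    conn.execute("INSERT INTO Students VALUES ('111', 'B')")  # duplicate Matric
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

# EXPLAIN QUERY PLAN reports whether the index is used for a lookup by Matric.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Name FROM Students WHERE Matric = '111'"
).fetchall()
print(duplicate_accepted)  # False
print(plan)
```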

11.2.3 DROP TABLE Results   (this deletes the table called Results)

ALTER TABLE Students
ADD Gender CHAR(1)   (this adds a Gender column to the Students table)
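The two statements can be tried in SQLite through Python's sqlite3 module. The Students and Results columns are hypothetical. PRAGMA table_info and the sqlite_master table are SQLite-specific ways of inspecting the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Matric TEXT PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Results (Matric TEXT, Mark REAL)")

conn.execute("ALTER TABLE Students ADD Gender CHAR(1)")  # existing rows get NULL
columns = [row[1] for row in conn.execute("PRAGMA table_info(Students)")]

conn.execute("DROP TABLE Results")  # removes the definition and all rows
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

print(columns)  # ['Matric', 'Name', 'Gender']
print(tables)   # ['Students']
```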

11.3 SELECT STATEMENT (GENERAL FORMAT)


The SELECT clause is used to indicate the attributes that are required to be in the resulting
relation from a query. Using the keyword DISTINCT after SELECT eliminates duplicates. In the
general format below, square brackets mark optional parts:
SELECT [DISTINCT] items/fields FROM table(s)
[WHERE condition]
[GROUP BY fields]
[HAVING conditions]
[ORDER BY fields];
DISTINCT means that if two records in the result are the same, only one of them is returned.
Examples of usage are:
(1) SELECT items FROM tables
(2) SELECT DISTINCT items FROM tables
(3) SELECT items (field names) FROM tables, e.g.
SELECT Name, 'Total fee paid =', feespaid, Age/2 FROM Students
(a constant string and a computed value such as Age/2 can appear in the list)
(4) SELECT * FROM Students (the * means select every field from Students)
(5) SELECT Course.Coursecode FROM Course (a field name may be qualified by its table name)
(6) SELECT CourseReg.Matric, Name, Dept FROM Students, CourseReg
WHERE Students.Matric = CourseReg.Matric
This selects the matric numbers from CourseReg and matches each matric number
with the Name and Dept in Students.
(7) SELECT items FROM tables
WHERE condition
ORDER BY items [ASC | DESC]   (to sort the data retrieved in ascending or descending order)
SELECT items FROM tables GROUP BY items HAVING conditions ORDER
BY items
The WHERE clause is used to specify the predicate that a tuple in the named relation/tables
must satisfy before it can be included in the resulting relation. For example,

SELECT loanNumber FROM Loan WHERE branchName = 'Ibadan' AND amount > 1000
This means: find all loan numbers for loans made at the Ibadan branch with an amount greater
than N1000.
The WHERE clause can involve the logical connectives AND, OR and NOT. The operands
of the logical connectives can be expressions involving the comparison operators <, >, =, >=, <=
and <>. A range of values can be tested with the BETWEEN keyword, e.g.
amount BETWEEN 1000 AND 5000, which includes both end points.
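The query, together with a BETWEEN example, run against a small hypothetical Loan table in SQLite via Python's sqlite3 module. String constants are written in single quotes, as standard SQL requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Loan (loanNumber TEXT PRIMARY KEY, branchName TEXT, amount INTEGER)")
conn.executemany("INSERT INTO Loan VALUES (?, ?, ?)", [
    ("L11", "Ibadan", 900), ("L14", "Ibadan", 1500),
    ("L15", "Oyo", 1500), ("L17", "Ibadan", 1000),
])

over_1000 = conn.execute(
    "SELECT loanNumber FROM Loan WHERE branchName = 'Ibadan' AND amount > 1000"
).fetchall()
# BETWEEN includes both end points, so the loan of exactly 1000 qualifies.
in_range = conn.execute(
    "SELECT loanNumber FROM Loan WHERE amount BETWEEN 1000 AND 1500 ORDER BY loanNumber"
).fetchall()
print(over_1000)  # [('L14',)]
print(in_range)   # [('L14',), ('L15',), ('L17',)]
```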
The FROM clause by itself defines a Cartesian product of the relations in the clause. Since
the natural join can be expressed in terms of a Cartesian product, a selection and a projection,
we can easily write an SQL expression for a natural join. An example is:

SELECT DISTINCT customerName, borrower.loanNumber FROM borrower, loan
WHERE borrower.loanNumber = loan.loanNumber

(loanNumber is qualified in the SELECT list because it occurs in both tables.)
Users can determine the order in which tuples in a relation are displayed through the ORDER
BY clause, which causes tuples in the result of a query to appear in sorted order. To list in
alphabetical order all customers who have a loan at the Oyo branch, we write:

SELECT DISTINCT customerName FROM borrower AS B, loan AS L
WHERE B.loanNumber = L.loanNumber AND L.branchName = 'Oyo'
ORDER BY customerName
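Both queries can be run on a small hypothetical borrower/loan data set in SQLite through Python's sqlite3 module. The loanNumber column in the SELECT list is qualified because it occurs in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loanNumber TEXT PRIMARY KEY, branchName TEXT, amount INTEGER)")
conn.execute("CREATE TABLE borrower (customerName TEXT, loanNumber TEXT)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [("L1", "Oyo", 500), ("L2", "Ibadan", 900), ("L3", "Oyo", 700)])
conn.executemany("INSERT INTO borrower VALUES (?, ?)",
                 [("Tunde", "L1"), ("Amaka", "L3"), ("Tunde", "L3"), ("Bayo", "L2")])

# Natural join of borrower and loan on loanNumber.
pairs = conn.execute("""
    SELECT DISTINCT customerName, borrower.loanNumber
    FROM borrower, loan
    WHERE borrower.loanNumber = loan.loanNumber
    ORDER BY customerName, borrower.loanNumber
""").fetchall()

# Customers with a loan at the Oyo branch, in alphabetical order.
oyo = conn.execute("""
    SELECT DISTINCT customerName
    FROM borrower AS B, loan AS L
    WHERE B.loanNumber = L.loanNumber AND L.branchName = 'Oyo'
    ORDER BY customerName
""").fetchall()
print(pairs)
print(oyo)  # [('Amaka',), ('Tunde',)]
```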

The ORDER BY clause lists items in ascending order by default. The sort order can, however,
be set explicitly with DESC for descending order or ASC for ascending order. Ordering
can also be performed on multiple attributes. For example, to list the entire loan table in
descending order of amount, and to order loans that have the same amount by loanNumber, we
write the SQL expression as:

SELECT * FROM loan


ORDER BY amount DESC, loanNumber ASC
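With hypothetical loan data, SQLite (via Python's sqlite3 module) sorts the rows by amount first and breaks ties on loanNumber.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loanNumber TEXT PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?)",
                 [("L20", 1300), ("L17", 1000), ("L29", 1300), ("L19", 1400)])

rows = conn.execute("SELECT * FROM loan ORDER BY amount DESC, loanNumber ASC").fetchall()
print(rows)  # [('L19', 1400), ('L20', 1300), ('L29', 1300), ('L17', 1000)]
```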

GROUP BY partitions the rows that satisfy the WHERE condition into groups, e.g. grouping
courses by department; conditions on whole groups are then imposed with HAVING.

Summary
Structured Query Language (SQL) was presented in this Unit. Every database creation,
modification and other manipulations are carried out using the SQL commands.

Unit 12 Structured Query Language (SQL) – Continuation
Expected Duration: 1 week or 2 contact hours

Introduction
In this unit, we continue our study of SQL queries.

Learning Outcomes
When you have studied this session, you should be able to explain:

12.1 Join, Union, Intersection Queries with Select


12.1.1 A Simple Equijoin
12.1.2 Natural Join
12.1.3 Intersection and Union for two compatible tables
12.1.4 Intersection

12.1 Join, Union, Intersection Queries with Select

The ability to join two or more tables is probably the most powerful feature of relational
database systems.
(1) The FROM ... WHERE ... clause implements restriction (the WHERE condition is applied to
the product of the tables listed in FROM)
(2) The SELECT items clause implements projection

These constructs enable data scattered across separate tables to be concatenated,
compared and selectively updated or retrieved. A join query is one in which data are retrieved
from more than one table. Join queries use the restriction, projection, product, intersection
and other operators of relational database theory.

12.1.1 A Simple Equijoin

SELECT S.*, SP.*   (select all fields of suppliers S and products SP)
FROM S, SP
WHERE S.city = SP.city;

SELECT S.Suppliername, SP.productname FROM S, SP WHERE S.city = SP.city

SELECT Name, Hall FROM Students, Accommodation   (the FROM list forms the Cartesian product)
WHERE Students.Matric = Accommodation.Matric   [equijoin]

Equijoin: a field in one table must be equal to a field in another table. If the comparison is
something other than equality (e.g. <, >, <>), it is a theta join.

Students        Accommodation
Matric  Name    Matric  Hall
111     A       111     Awo
222     B       102     Zik
                116     Bello

Cartesian product:

Matric  Name  Matric  Hall
111     A     111     Awo
111     A     102     Zik
111     A     116     Bello
222     B     111     Awo
222     B     102     Zik
222     B     116     Bello

Result:

Name  Hall
A     Awo
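The example above can be reproduced in SQLite through Python's sqlite3 module, using the data from the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Matric INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Accommodation (Matric INTEGER, Hall TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)", [(111, "A"), (222, "B")])
conn.executemany("INSERT INTO Accommodation VALUES (?, ?)",
                 [(111, "Awo"), (102, "Zik"), (116, "Bello")])

# Without WHERE, the FROM list alone gives the Cartesian product (2 x 3 = 6 rows).
product_rows = conn.execute("SELECT * FROM Students, Accommodation").fetchall()

# The WHERE clause turns the product into an equijoin.
result = conn.execute("""
    SELECT Name, Hall
    FROM Students, Accommodation
    WHERE Students.Matric = Accommodation.Matric
""").fetchall()
print(len(product_rows), result)  # 6 [('A', 'Awo')]
```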

12.1.2 Natural Join


All duplication is removed in this case. We don't want two Matric columns, so we keep only one:

SELECT Name, Hall, Students.Matric FROM Students, Accommodation
WHERE Students.Matric = Accommodation.Matric

By contrast,

SELECT Students.*, Accommodation.*
FROM Students, Accommodation

gives the Cartesian product. When we now add a WHERE clause,

WHERE Students.Matric = Accommodation.Matric

it becomes an equijoin. If instead of = we use <, <> or >, we have a theta join, e.g.
WHERE Students.Matric < Accommodation.Matric

Lastly, when the condition is = and the SELECT list keeps only one copy of each common field
(i.e. it projects away the duplicate), we have a natural join.
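Using the Students and Accommodation data above, the sketch below compares the hand-written natural join with the NATURAL JOIN shorthand that SQLite accepts (through Python's sqlite3 module), and runs the theta join with <.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Matric INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Accommodation (Matric INTEGER, Hall TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)", [(111, "A"), (222, "B")])
conn.executemany("INSERT INTO Accommodation VALUES (?, ?)",
                 [(111, "Awo"), (102, "Zik"), (116, "Bello")])

# Natural join written out: an equality condition plus a single Matric column.
natural = conn.execute("""
    SELECT Students.Matric, Name, Hall
    FROM Students, Accommodation
    WHERE Students.Matric = Accommodation.Matric
""").fetchall()
# The NATURAL JOIN shorthand joins on all same-named columns (here, Matric).
shorthand = conn.execute(
    "SELECT Matric, Name, Hall FROM Students NATURAL JOIN Accommodation"
).fetchall()
# Theta join: a comparison other than equality.
theta = conn.execute("""
    SELECT Name, Hall
    FROM Students, Accommodation
    WHERE Students.Matric < Accommodation.Matric
""").fetchall()
print(natural, shorthand, theta)  # [(111, 'A', 'Awo')] [(111, 'A', 'Awo')] [('A', 'Bello')]
```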

12.1.3 Intersection and Union for two compatible tables: tables of the same degree and
same fields; duplicate records merge

SELECT A.* FROM A
UNION SELECT B.* FROM B   (or INTERSECT in place of UNION; identical records appear only once)

12.1.4 Intersection

X                     Y
Matric  Name  Age     Matric  Name  Age
1       A     20      1       A     20
2       B     20      2       B     20
3       C     30      4       D     40

SELECT X.Matric, X.Name, X.Age
FROM X, Y
WHERE X.Matric = Y.Matric AND
X.Age = Y.Age AND
X.Name = Y.Name

For union and intersection, the columns of the two tables must correspond, i.e. they must have the
same domains. The Cartesian product is:

Matric  Name  Age  Matric  Name  Age
*1      A     20   1       A     20
1       A     20   2       B     20
1       A     20   4       D     40
2       B     20   1       A     20
*2      B     20   2       B     20
2       B     20   4       D     40
3       C     30   1       A     20
3       C     30   2       B     20
3       C     30   4       D     40

Result (the rows marked * satisfy the WHERE condition):

Matric  Name  Age
1       A     20
2       B     20
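The same X and Y tables in SQLite (via Python's sqlite3 module) show that the join-based query and the INTERSECT operator give the same rows. They also show that UNION keeps a single copy of each shared record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("X", "Y"):
    conn.execute(f"CREATE TABLE {name} (Matric INTEGER, Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO X VALUES (?, ?, ?)", [(1, "A", 20), (2, "B", 20), (3, "C", 30)])
conn.executemany("INSERT INTO Y VALUES (?, ?, ?)", [(1, "A", 20), (2, "B", 20), (4, "D", 40)])

# Intersection written as a join that compares every column.
via_join = conn.execute("""
    SELECT X.Matric, X.Name, X.Age FROM X, Y
    WHERE X.Matric = Y.Matric AND X.Age = Y.Age AND X.Name = Y.Name
    ORDER BY X.Matric
""").fetchall()
# The same result using the INTERSECT operator.
via_intersect = conn.execute(
    "SELECT * FROM X INTERSECT SELECT * FROM Y ORDER BY Matric"
).fetchall()
# UNION keeps one copy of each identical record: 3 + 3 - 2 = 4 rows.
union_rows = conn.execute("SELECT * FROM X UNION SELECT * FROM Y").fetchall()

print(via_join)                                    # [(1, 'A', 20), (2, 'B', 20)]
print(via_intersect == via_join, len(union_rows))  # True 4
```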

Summary
Structured Query Language (SQL) was presented in this Unit. Every database creation,
modification and other manipulations are carried out using the SQL commands.

Self Assessment Question

You are interested in developing a product tracking system that will monitor the different
products a company is manufacturing, the sales on the products per day, the product type selling
most and the customers buying the products. Design a suitable RD model for the system.

Unit 13 Structured Query Language (SQL) – Continuation
Expected Duration: 1 week or 2 contact hours

Introduction
In this unit, we continue our study of SQL queries.

Learning Outcomes
When you have studied this session, you should be able to explain:
13.1 String Operations
13.2 Using IN in Nested Queries
13.3 Aggregate Functions in SQL
13.4 Use of GROUP BY
13.5 Use of HAVING Clause
13.6 SQL Update Operations - Putting data into the database
13.7 DELETE Operation
13.8 General Format for Inserting a New Record

13.1 String Operations

Pattern matching is the most commonly used string operation. It is achieved through the
LIKE operator, with two special characters:
(i) Percent (%): the % character matches any substring (including the empty string).
(ii) Underscore (_): the _ character matches exactly one character.

In standard SQL, patterns are case sensitive, although some systems (for example SQLite, and MySQL
with its default collations) compare letters case-insensitively. For example, the query "Find the
names of all customers whose street address includes the substring 'land'" can be expressed thus:

SELECT customerName FROM customer WHERE customerStreet LIKE '%land%'

An escape character is used in patterns that contain a special pattern character (i.e. %, _) to
indicate that the special character should be treated like a normal character. The escape character
is placed before the special character in the pattern and declared with the ESCAPE keyword. For
example, using backslash (\) as the escape character:
(i) LIKE 'ab\%cd' ESCAPE '\' matches the string 'ab%cd'
(ii) LIKE 'ab\\cd%' ESCAPE '\' matches all strings beginning with 'ab\cd'

SQL also permits a variety of functions on character strings, such as concatenation (using ||),
extracting substring, finding length of strings, converting between lowercase and uppercase,
etc.
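A sketch of these operations in SQLite via Python's sqlite3 module; the customer rows are hypothetical. SQLite's LIKE ignores case for ASCII letters by default, so this sketch does not show the case sensitivity that standard SQL specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerName TEXT, customerStreet TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("Ade", "Oakland Road"), ("Bisi", "Ring Road"), ("Chidi", "Highland Close"),
    ("Dayo", "ab%cd Street"), ("Efe", "abXcd Street"),
])

def names(where):
    sql = "SELECT customerName FROM customer WHERE " + where + " ORDER BY customerName"
    return [r[0] for r in conn.execute(sql)]

contains_land = names("customerStreet LIKE '%land%'")  # % matches any substring
four_letter = names("customerName LIKE '____'")        # each _ matches exactly one character
# '\' is declared as the escape character, so '\%' stands for a literal percent sign.
literal_pct = names(r"customerStreet LIKE 'ab\%cd%' ESCAPE '\'")
# String functions: concatenation with ||, UPPER and LENGTH.
labels = [r[0] for r in conn.execute(
    "SELECT UPPER(customerName) || ' (' || LENGTH(customerStreet) || ')' "
    "FROM customer WHERE customerName = 'Bisi'")]

print(contains_land)  # ['Ade', 'Chidi']
print(four_letter)    # ['Bisi', 'Dayo']
print(literal_pct)    # ['Dayo']
print(labels)         # ['BISI (9)']
```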
Suggest the interpretation of the following statements:

(a) SELECT Name FROM Students
WHERE address LIKE "% Ibadan %"
or WHERE address LIKE "Ibadan"
or WHERE address LIKE "*Ibadan"
or WHERE address LIKE "Ibadan*"
or WHERE address LIKE "*Ibadan*"
or WHERE Matric LIKE "19-21"

(b) SELECT Name FROM Students WHERE Hall IS NULL
(i.e. where no hall is allocated), or WHERE Hall IS NOT NULL

13.2 Using IN in Nested Queries

Describe the output that the following query would produce.

SELECT name FROM Students
WHERE matric IN
    (SELECT Matric FROM payment
     WHERE feespaid < 10000 OR UnionDuePaid < 500)
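One way to check an answer to the exercise is to run the query against sample data. This sketch uses Python's sqlite3 module with hypothetical Students and payment rows; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (matric INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payment (Matric INTEGER, feespaid INTEGER, UnionDuePaid INTEGER);
    INSERT INTO Students VALUES (1, 'Akinola'), (2, 'Clara'), (3, 'Aminat');
    -- Student 1 has paid in full; 2 owes fees; 3 owes union dues.
    INSERT INTO payment VALUES (1, 15000, 500), (2, 8000, 500), (3, 20000, 300);
""")

# The inner SELECT yields the set of matric numbers with an outstanding fee or
# union due; the outer query returns the names of students whose matric is in that set.
owing = [row[0] for row in conn.execute("""
    SELECT name FROM Students
    WHERE matric IN (SELECT Matric FROM payment
                     WHERE feespaid < 10000 OR UnionDuePaid < 500)
    ORDER BY name
""")]
print(owing)
```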

13.3 Aggregate Functions in SQL

The aggregate functions are COUNT, SUM, AVG, MIN and MAX. SUM and AVG can be applied only to numeric fields; COUNT, MIN and MAX can also be applied to other types. For example:

SELECT COUNT(*) FROM Students
WHERE condition

SELECT COUNT(DISTINCT Age)
FROM Students

SELECT SUM(Age) FROM Students
WHERE condition

SELECT MIN(Age) FROM Students

SELECT MAX(Score) FROM Students
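A small illustration of the aggregate functions, assuming Python's sqlite3 module and invented Students rows. It also shows a detail worth knowing: COUNT(*) counts every row, while the other aggregates skip NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Matric INTEGER, Age INTEGER, Score INTEGER)")
# Hypothetical sample rows; student 4 has no Score yet (NULL).
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [(1, 19, 70), (2, 21, 55), (3, 19, 82), (4, 23, None)])

count_all, distinct_ages, total_age, youngest, best, avg_score = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT Age), SUM(Age), MIN(Age), MAX(Score), AVG(Score)
    FROM Students
""").fetchone()

# COUNT(*) with a WHERE condition counts only the qualifying rows.
over_20 = conn.execute("SELECT COUNT(*) FROM Students WHERE Age > 20").fetchone()[0]

# COUNT(*) counts every row, but SUM, AVG, MIN and MAX ignore NULLs,
# so AVG(Score) is the mean of three scores, not four.
print(count_all, distinct_ages, total_age, youngest, best, avg_score, over_20)
```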

13.4 Use of GROUP BY

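SELECT Hall, SUM(feespaid) FROM Students
GROUP BY Hall

GROUP BY forms one group for each distinct value of Hall and returns one row per group. Every column in the SELECT list that is not inside an aggregate function must appear in the GROUP BY clause, which is why Hall, not Matric, is selected here.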
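A minimal sketch of grouping, assuming Python's sqlite3 module and hypothetical Hall and feespaid values. Be aware that SQLite is more lenient than standard SQL: it would also accept a non-grouped column such as Matric in the SELECT list, returning a value from an arbitrary row of each group, whereas most other systems reject such a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Matric INTEGER, Hall TEXT, feespaid INTEGER)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    (1, "Awo", 10000), (2, "Awo", 15000), (3, "Idia", 12000),
    (4, "Mellanby", 9000), (5, "Idia", 8000)])

# One output row per distinct Hall; Hall can be selected because it is the
# grouping column, while feespaid and Matric are summarized by aggregates.
per_hall = conn.execute("""
    SELECT Hall, SUM(feespaid), COUNT(Matric) FROM Students
    GROUP BY Hall
    ORDER BY Hall
""").fetchall()
print(per_hall)
```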

13.5 Use of HAVING Clause

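HAVING filters the groups produced by GROUP BY, just as WHERE filters individual rows. Its condition may refer only to grouping columns and aggregate functions.

SELECT Hall, SUM(feespaid) FROM Students
GROUP BY Hall, Hallmaster
HAVING Hallmaster IS NOT NULL

SELECT State, COUNT(*) FROM Students
GROUP BY State, GeogZone
HAVING GeogZone = 'SouthSouth'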
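The difference between WHERE (applied to rows before grouping) and HAVING (applied to groups afterwards) can be seen in this sketch, again using Python's sqlite3 module and invented data; the N20000 and N9000 thresholds are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Matric INTEGER, Hall TEXT, feespaid INTEGER)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    (1, "Awo", 10000), (2, "Awo", 15000), (3, "Idia", 12000),
    (4, "Mellanby", 9000), (5, "Idia", 8000)])

# HAVING keeps only the groups whose total is at least N20000.
big_halls = conn.execute("""
    SELECT Hall, SUM(feespaid) FROM Students
    GROUP BY Hall
    HAVING SUM(feespaid) >= 20000
    ORDER BY Hall
""").fetchall()

# WHERE runs first and drops individual rows (payments below N9000) before grouping,
# so Idia's total falls to 12000 and that group no longer passes HAVING.
big_halls_large_payments = conn.execute("""
    SELECT Hall, SUM(feespaid) FROM Students
    WHERE feespaid >= 9000
    GROUP BY Hall
    HAVING SUM(feespaid) >= 20000
    ORDER BY Hall
""").fetchall()
print(big_halls, big_halls_large_payments)
```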

13.6 SQL Update Operations - Putting data into the database

The UPDATE statement is used to change some values in a tuple without changing all values in the tuple. The tuples to be updated are selected with a WHERE condition, as in a query. The general format is as follows:

UPDATE TableName
SET field = scalar value expression     e.g. SET State = 'OYO'
WHERE condition

For example:

UPDATE STUDENT
SET Hall = 'Akinola'
WHERE Hall = 'AWO'

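Suppose accounts with a balance over N10000 receive 6% interest while all others receive 5%. We write

UPDATE account
SET balance = balance * 1.06
WHERE balance > 10000

UPDATE account
SET balance = balance * 1.05
WHERE balance <= 10000

The order of the two statements matters: if the 5% update were run first, an account just below N10000 could be pushed above it and would then also receive the 6% increase.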
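The effect of statement order can be checked on three sample accounts. This is a sketch using Python's sqlite3 module; the account numbers and balances are hypothetical. It also shows an alternative not used in the text: a single UPDATE with a standard SQL CASE expression, which tests each row's original balance and so avoids the ordering problem altogether.

```python
import sqlite3

def run(statements):
    """Apply the given UPDATE statements, in order, to a fresh copy of sample accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (accountNo TEXT, balance REAL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("A1", 20000.0), ("A2", 9800.0), ("A3", 5000.0)])
    for sql in statements:
        conn.execute(sql)
    return dict(conn.execute("SELECT accountNo, balance FROM account"))

SIX = "UPDATE account SET balance = balance * 1.06 WHERE balance > 10000"
FIVE = "UPDATE account SET balance = balance * 1.05 WHERE balance <= 10000"

right_order = run([SIX, FIVE])   # A2 gets 5% only
wrong_order = run([FIVE, SIX])   # A2 crosses N10000 after 5%, then also gets 6%

# Alternative (standard SQL CASE): one statement, each row tested on its old balance.
one_statement = run(["""
    UPDATE account
    SET balance = CASE WHEN balance > 10000 THEN balance * 1.06
                       ELSE balance * 1.05 END
"""])

for label, result in [("6% then 5%", right_order), ("5% then 6%", wrong_order),
                      ("CASE", one_statement)]:
    print(label, {k: round(v, 2) for k, v in result.items()})
```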

13.7 DELETE Operation

A DELETE request is of the form:

DELETE FROM R
WHERE P

where P is a predicate and R is a relation/table. The DELETE statement deletes the tuples T in R for which P(T) is true. Only whole tuples can be deleted; we cannot delete the values of particular attributes. The WHERE clause predicate may be as complex as the WHERE clause of a SELECT command. Examples of DELETE requests are:

(i) Delete all tuples from the Loan relation/table

DELETE FROM Loan

(ii) Delete all of Smith's account records

DELETE FROM depositor
WHERE customerName = 'Smith'

(iii) Delete all loans with amounts between N1300 and N1500

DELETE FROM loan
WHERE amount BETWEEN 1300 AND 1500

(iv) Delete all accounts at every branch located in Ibadan

DELETE FROM account
WHERE branchName IN (SELECT branchName FROM branch
                     WHERE branchCity = 'Ibadan')

The DELETE command operates on only one relation. To delete tuples from several relations, we must use one DELETE command for each relation.
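Examples (iii) and (iv) can be run as follows. This sketch assumes Python's sqlite3 module, invented branch, account and loan rows, and a branchCity column as in the Branch(branchName, branchCity, assets) schema that appears in Unit 14. It also shows that BETWEEN includes both end values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch  (branchName TEXT PRIMARY KEY, branchCity TEXT, assets INTEGER);
    CREATE TABLE account (accountNumber TEXT, branchName TEXT, balance INTEGER);
    CREATE TABLE loan    (loanNumber TEXT, branchName TEXT, amount INTEGER);
    INSERT INTO branch  VALUES ('Dugbe', 'Ibadan', 900000), ('Bodija', 'Ibadan', 500000),
                               ('Marina', 'Lagos', 2000000);
    INSERT INTO account VALUES ('A-1', 'Dugbe', 500), ('A-2', 'Bodija', 700),
                               ('A-3', 'Marina', 900);
    INSERT INTO loan    VALUES ('L-1', 'Dugbe', 1200), ('L-2', 'Marina', 1400),
                               ('L-3', 'Bodija', 1500);
""")

# (iii) BETWEEN is inclusive: 1400 and 1500 are deleted, 1200 is kept.
conn.execute("DELETE FROM loan WHERE amount BETWEEN 1300 AND 1500")

# (iv) The subquery finds every branch located in Ibadan; their accounts are deleted.
conn.execute("""
    DELETE FROM account
    WHERE branchName IN (SELECT branchName FROM branch WHERE branchCity = 'Ibadan')
""")

remaining_loans = [row[0] for row in conn.execute("SELECT loanNumber FROM loan")]
remaining_accounts = [row[0] for row in conn.execute("SELECT accountNumber FROM account")]
print(remaining_loans, remaining_accounts)
```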

13.8 General Format for Inserting a New Record

The INSERT statement is used to insert one or more tuples (rows) into a relation/table.
The general format is:

INSERT INTO TableName [(field list)]
VALUES (value list)

For example,

INSERT INTO STUDENTS (MATRIC, AGE, SEX, ...)
VALUES (1111, 12, 16, ...)

INSERT INTO STUDENTS
SELECT * FROM notPaid
WHERE feesPaid >= 10000

INSERT INTO account
SELECT branchName, loanNumber FROM loan
WHERE branchName = 'Port Harcourt'

In the last two examples, tuples are inserted into the relation based on the result of a query. It is important that the SELECT statement be evaluated fully before any insertion is carried out. Otherwise, a request such as

INSERT INTO account
SELECT * FROM account

could insert an infinite number of tuples, since each newly inserted tuple would be read by the SELECT and inserted again.
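The following sketch, using Python's sqlite3 module and hypothetical loan rows, runs a single-row insert, an INSERT ... SELECT, and a self-insert of the kind discussed above. SQLite evaluates the SELECT completely before inserting, so the self-insert doubles the table once instead of running forever.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan    (loanNumber TEXT, branchName TEXT, amount INTEGER);
    CREATE TABLE account (branchName TEXT, accountNumber TEXT);
    INSERT INTO loan VALUES ('L-1', 'Port Harcourt', 1000), ('L-2', 'Ibadan', 2000),
                            ('L-3', 'Port Harcourt', 3000);
""")

# Single-row insert with an explicit field list.
conn.execute("INSERT INTO account (branchName, accountNumber) VALUES ('Ibadan', 'A-9')")

# Insert the result of a query: one new row per Port Harcourt loan.
conn.execute("""
    INSERT INTO account (branchName, accountNumber)
    SELECT branchName, loanNumber FROM loan WHERE branchName = 'Port Harcourt'
""")
before = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]

# Self-insert: the SELECT is evaluated in full first, so the table doubles exactly once.
conn.execute("INSERT INTO account SELECT * FROM account")
after = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(before, after)
```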

Summary
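This unit continued the study of Structured Query Language (SQL): string operations, nested queries with IN, aggregate functions, GROUP BY and HAVING, and the UPDATE, DELETE and INSERT statements. In a relational DBMS, database creation, modification and other manipulations are all carried out with SQL commands.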

Self Assessment Questions:

Consider the following data environment:

In the Department of Computer Science, Master's students carry out projects as part of the requirements for their Master's programme in UI. A student is supervised by a lecturer. Only one project is done by a student.

The Department is interested in an information system that will give information about past projects done at Master's level: the titles, abstracts, authors and supervisors.

Using the above data environment,
(i) Identify all the entities involved in the data environment
(ii) Draw an ER Model for the data environment
(iii) Design an RD Model for the Department of Computer Science

Unit 14 Normalization of Relational Databases
Expected Duration: 1 week or 2 contact hours

Introduction

The theory of database normalization seeks to formalize the process of database design and structuring. How do we know whether a particular set of relations in a database has been properly or improperly structured? A structure is proper if it provides efficient, information-loss-free and adequate storage, retrieval and update functionality for the database.

There are a number of objectives in database design:
(1) The database must contain all the information required, no more and no less.
(2) Minimal redundancy of data.
(3) Ease of update, insertion and deletion without unwanted loss of data or additional processing of data.
(4) Minimization of the risk of introducing inconsistencies into the database.

Learning Outcomes

When you have studied this session, you should be able to explain:
14.1 Meaning of Database Normalization
14.2 Basic Concepts
14.3 Database Problems without Normalization
14.3.1 Insertion Anomaly
14.3.2 Updating Anomaly
14.3.3 Deletion Anomaly
14.4 Normalization Rules
14.4.1 First (1st) Normal Form, 1NF
14.4.2 2nd Normal Form, 2NF
14.4.3 3rd Normal Form, 3NF
14.4.4 Boyce Codd Normal Form - BCNF
14.4.5 Fourth Normal Form (4NF)

14.1 Meaning of Database Normalization

Database design is about designing a database schema that remains robust as the database is updated over time. We need to design the component relations of the database in such a way that the database can always be searched and updated easily, without problems. Database design is also about reflecting the semantics of a data situation in the database, so that those semantics become constraints that the database specifies and enforces. For example, the age domain may be 25-55 in one company and 16-60 in another: the semantics of age differ between the two. Put another way, given the set of entities Ei and the attributes Aj of those entities, how do we group the attributes into the headings of relations so as to achieve an efficient database design?

The process of arranging the attributes to form the headings of relations is referred to as the normalization of the relations. Different levels of normalization exist: 1NF, 2NF, 3NF, BCNF, etc. These are formal specifications of increasing rigour, designed to ensure that a database meets the objectives of a good schema design.

When normalizing a database we should achieve four goals:

1. Arranging data into logical groups such that each group describes a part of the whole
2. Minimizing the amount of duplicated data stored in a database
3. Building a database in which we can access and manipulate the data quickly and
efficiently without compromising the integrity of the data storage
4. Organising the data such that, when we modify it, we make the changes in only one
place

Normalization is a complex process with many specific rules and different levels of intensity. In its full definition, normalization is the process of eliminating repeating groups, minimizing redundancy, removing partial dependencies on composite keys, and separating out non-key attributes that depend on one another.

In simple terms, the rules for normalization can be summed up in a single phrase: "Each
attribute (column) must be a fact about the key, the whole key and nothing but the key".
Said another way, each table should describe only one type of entity (information).

A properly normalized design allows us to:

• Use storage space efficiently
• Eliminate redundant data
• Reduce or eliminate inconsistent data
• Ease the database maintenance burden

Relational database theorists have divided normalisation into several rules called normal forms:

• Un-normalised data: repeating groups, inconsistent data, and insertion and deletion anomalies.
• First Normal Form (no repeating groups): each cell of a table must contain a single value, and the table must not contain repeating groups.
• Second Normal Form (each column must depend on the entire primary key): the table must meet all the requirements of 1NF, and any data that depends on only part of the primary key must be moved into another table.
• Third Normal Form (each column must depend directly on the primary key): the table must meet all the requirements of 1NF and 2NF, and any column that depends on another non-key column rather than on the key must be moved into another table.

NOTE: We must be able to reconstruct the original flat view of the data. If we violate this
rule, we will have defeated the purpose of normalizing the database.

14.2 Basic Concepts

(1) Primary key: every relation must have a primary key whose value is unique for every tuple of the relation, and which therefore identifies each tuple.
(2) Candidate key: any attribute (or set of attributes) that could uniquely identify each tuple of a relation. The primary key is chosen from among the candidate keys. The candidate keys of a relation are in one-to-one correspondence with one another.
(3) Functional Dependency: given a relation R, attribute Y is said to be functionally dependent on attribute X (R.X → R.Y) if and only if each X value in R is associated with precisely one Y value in R at any one time. X and Y may be composite. For example, Address may be functionally dependent on matric number: once we know the matric number, we know the address. Matric → Address (Address is functionally dependent on Matric, or Matric determines Address).
(4) Transitive Functional Dependency:
Given Matric → Name and Name → NextOfKin, NextOfKin is transitively dependent on Matric: it is determined by Matric only through Name, which is itself functionally dependent on Matric.
NOTE: Functional dependency depends on the data situation (semantics).
(5) Full Functional Dependency [FFD]
Attribute Y of relation R is said to be fully functionally dependent on attribute X of the same relation if it is functionally dependent on X but not functionally dependent on any proper subset of X.

Tariff(RouteNo, FareType, Price): once we know the RouteNo and FareType, we know the Price. If Price could also be known from FareType alone, then Price would not be fully functionally dependent on the composite primary key (RouteNo, FareType). Price is fully functionally dependent on the key if the two key attributes together determine it but neither does so alone.
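Functional dependencies are statements about the meaning of the data, so sample rows can refute one but never prove it. With that caveat, the definitions of FD and FFD can be tested against a table instance. The sketch below is plain Python; the Tariff rows and prices are invented, and the fare-type attribute is spelt FareType.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if, in this table instance, rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same lhs value, different rhs value: the FD is refuted
    return True

def fully_dependent(rows, lhs, rhs):
    """lhs -> rhs holds, and no proper subset of lhs (including the empty set) determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    return not any(fd_holds(rows, list(subset), rhs)
                   for size in range(len(lhs))
                   for subset in combinations(lhs, size))

# Hypothetical Tariff rows: the price depends on the route AND the fare type.
tariff = [
    {"RouteNo": 1, "FareType": "Adult", "Price": 500},
    {"RouteNo": 1, "FareType": "Child", "Price": 250},
    {"RouteNo": 2, "FareType": "Adult", "Price": 800},
    {"RouteNo": 2, "FareType": "Child", "Price": 400},
]
# Hypothetical flat-fare variant: the price depends on the fare type alone.
flat_tariff = [dict(r, Price=500 if r["FareType"] == "Adult" else 250) for r in tariff]

key = ["RouteNo", "FareType"]
print(fully_dependent(tariff, key, ["Price"]))
print(fully_dependent(flat_tariff, key, ["Price"]))  # FareType -> Price makes it partial
```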

14.3 Database Problems without Normalization

If a table is not properly normalized and has data redundancy, it will not only take up extra storage space but will also make the database difficult to handle and update without data loss. Insertion, updating and deletion anomalies are very frequent if a database is not normalized. To understand these anomalies, let us take the example of a Student table.

Matno  name      Department  HOD        office_tel
401    Akinola   CSC         Dr. Chuks  53337
402    Solomon   CSC         Dr. Chuks  53337
403    Olalekan  CSC         Dr. Chuks  53337
404    Musa      CSC         Dr. Chuks  53337

In the table above, we have the data of 4 Computer Science students. As we can see, the data in the fields Department, HOD (Head of Department) and office_tel is repeated for every student in the same department. This is data redundancy.

14.3.1 Insertion Anomaly

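Now, if we have to insert the data of 100 students of the same department, the department information will be repeated for all 100 students. Moreover, we cannot record a new department, with its HOD and office telephone, until at least one student has been enrolled in it. These situations are insertion anomalies.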

14.3.2 Updating Anomaly

What if Dr. Chuks leaves the college, or is no longer the HOD of the Computer Science department? In that case every student record will have to be updated, and if by mistake we miss any record, the data will become inconsistent. This is an updating anomaly.

14.3.3 Deletion Anomaly

In our Student table, two different kinds of information are kept together: student information and department information. Hence, if the records of all the students in a department are deleted at the end of the academic year, we also lose that department's information. This is a deletion anomaly.
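The usual cure for these anomalies is to store department facts in a table of their own. As a preview, this sketch (Python's sqlite3 module; the new HOD's name, Dr. Bello, is hypothetical) shows a one-row HOD update, recovery of the flat Student view with a join, and department data surviving the deletion of all students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Department facts are stored once, keyed by Department.
    CREATE TABLE Department (Department TEXT PRIMARY KEY, HOD TEXT, office_tel TEXT);
    CREATE TABLE Student (Matno INTEGER PRIMARY KEY, name TEXT,
                          Department TEXT REFERENCES Department(Department));
    INSERT INTO Department VALUES ('CSC', 'Dr. Chuks', '53337');
    INSERT INTO Student VALUES (401, 'Akinola', 'CSC'), (402, 'Solomon', 'CSC'),
                               (403, 'Olalekan', 'CSC'), (404, 'Musa', 'CSC');
""")

# Updating: a new HOD is written to exactly one row.
changed = conn.execute(
    "UPDATE Department SET HOD = 'Dr. Bello' WHERE Department = 'CSC'").rowcount

# The original flat Student view is recovered with a join.
flat = conn.execute("""
    SELECT s.Matno, s.name, d.Department, d.HOD, d.office_tel
    FROM Student AS s JOIN Department AS d ON s.Department = d.Department
    ORDER BY s.Matno
""").fetchall()

# Deletion: removing every student no longer removes the department's details.
conn.execute("DELETE FROM Student")
departments_left = conn.execute("SELECT Department, HOD FROM Department").fetchall()
print(changed, len(flat), departments_left)
```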

14.4 Normalization Rules

Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form

14.4.1 First (1st) Normal Form, 1NF

A relation is in 1NF iff (if and only if) all underlying simple domains contain atomic values only, that is, every attribute of the table has an atomic domain. Atomic values are treated as indivisible units.

The First Normal Form is the first step of the normalization process. It expects us to design our table in such a way that it can easily be extended and that data can easily be retrieved from it whenever required.

We have already seen how data redundancy or repetition can lead to insertion, deletion and updating anomalies, and how normalization can reduce data redundancy and make the data more meaningful. If the tables in a database are not even in 1NF, the design is considered a bad one.

Rules for First Normal Form

The first normal form expects us to follow a few simple rules while designing our database:

Rule 1: Single Valued Attributes
Each column of a table should be single valued, which means it should not contain multiple values.

Rule 2: Attribute Domain should not change
This is more of a "common sense" rule. In each column, the values stored must be of the same kind or type.

For example, if we have a column dob to store the dates of birth of a set of people, we must not store the names of some of them in that column alongside the dates of birth of others. It should hold only dates of birth for all the records/rows.

Rule 3: Unique name for Attributes/Columns
This rule expects each column in a table to have a unique name. This avoids confusion when retrieving data or performing any other operation on the stored data: if two or more columns had the same name, the DBMS could not tell which one a statement refers to.

Rule 4: Order doesn't matter
This rule says that the order in which we store the data in our table doesn't matter.

Consider the following table:

ID/N  Name  Address  TelAdd  Email

Address and Name are not atomic in this case. They could be broken down as follows:

Name: surname, first name, middle name.
Address: postal address, street address.

As another example, the relation person, with attributes idNumber, name and diploma, where diploma is the set of diplomas that a person possesses, is not in 1NF. But the relations person(idNumber, name) and persDiploma(idNumber, diploma) are in 1NF.

Another Example:
Although all the rules are self-explanatory, let us take an example in which we create a table to store student data: each student's Matno, name and the names of the courses they have opted for. Here is our table, with some sample data added to it.

Matno  name     Course
101    Akinola  OS, CN
103    Clara    Java
102    Aminat   C, C++

Our table already satisfies 3 of the 4 rules: all our column names are unique, we have stored data in the order we wanted, and we have not inter-mixed different types of data in columns.

But of the 3 students in our table, two have opted for more than one course, and we have stored the course names in a single column. As per the First Normal Form, each column must contain atomic values.

How to solve this Problem?
All we have to do is break the values into atomic values. Here is our updated table, which now satisfies the First Normal Form.
Matno name Course
101 Akinola OS
101 Akinola CN
103 Clara Java
102 Aminat C
102 Aminat C++

Although some values are now repeated, the values in the Course column are now atomic for each record/row.

Using the First Normal Form, data redundancy increases, as many rows repeat the same data in some columns, but each row as a whole is unique.
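The conversion above is mechanical enough to script. A minimal sketch in plain Python, assuming the multi-valued column uses commas as separators, as in the sample data; the function name to_1nf is illustrative.

```python
def to_1nf(rows, multi_col, sep=","):
    """Split a separator-delimited column into one row per value (one atomic value per cell)."""
    flat = []
    for row in rows:
        for value in row[multi_col].split(sep):
            new_row = dict(row)               # copy the other columns unchanged
            new_row[multi_col] = value.strip()
            flat.append(new_row)
    return flat

students = [
    {"Matno": 101, "name": "Akinola", "Course": "OS, CN"},
    {"Matno": 103, "name": "Clara", "Course": "Java"},
    {"Matno": 102, "name": "Aminat", "Course": "C, C++"},
]
flat = to_1nf(students, "Course")
for r in flat:
    print(r["Matno"], r["name"], r["Course"])
```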

14.4.2 2nd Normal Form, 2NF

A relation is in 2NF if and only if (iff) it is in 1NF and every non-key attribute in the relation is fully functionally dependent on the primary key, i.e., there is no partial dependency. This means that a relation in 2NF is already in 1NF and any attribute not belonging to a key does not depend on only a part of the key.

What is Partial Dependency? First, let us understand what dependency in a table is.

What is Dependency?
Let's take the example of a Student table with columns Matno, Name, PhoneNo, Department and Address (the student's home address).

Matno Name PhoneNo Department Address

In this table, Matno is the primary key and is unique for every row, hence we can use Matno to fetch any row of data from this table. Even when two students have the same name, if we know the Matno we can easily fetch the correct record.

Matno  Name     PhoneNo  Department  Address
10     Akinola  3456     CSC         Ibadan
11     Akinola  2356     IT          Ekiti

Hence, a primary key for a table is the column or group of columns (composite key) that can uniquely identify each record in the table. We can ask for the Department of the student with Matno 10 and get it. Similarly, if we ask for the name of the student with Matno 10 or 11, we will get it. So all we need is Matno, and every other column depends on it, or can be fetched using it. This is dependency, also called functional dependency.
So, What is Partial Dependency?
Now that we know what dependency is, we are in a better position to understand partial dependency. For a simple table like Student, a single column like Matno can uniquely identify all the records in the table. But this is not always true. So let's extend our example to see how more than one column together can act as a primary key.

Let's create another table, Course, which will have Course_id and Course_name fields, with Course_id as the primary key. Now we have a Student table with student information and another table, Course, for storing course information.
Course_id Course_name
1 Java
2 C++
3 Php

Let's create another table, Score, to store the marks obtained by students in the respective courses. We will also store the name of the lecturer who teaches each course along with the marks.
Score Table
score_id Matno Course_id marks Lecturer
1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher

In the Score table we save the Matno to know which student the marks belong to, and the Course_id to know which course the marks are for. Together, Matno + Course_id form a candidate key for this table, which can be the primary key.

Why should this combination be the primary key? If we are asked to get the marks of the student with Matno 10, can we get them from this table? No, because we don't know for which course. And if we are given only the Course_id, we would not know for which student. Hence we need Matno + Course_id to uniquely identify any row.

But Where is Partial Dependency?
Now if we look at the Score table, we have a column named Lecturer which depends only on the course: for Java it's Java Teacher, for C++ it's C++ Teacher, and so on.

As we just discussed, the primary key for this table is a composition of two columns, Matno and Course_id, but the lecturer's name depends only on the course, i.e. on Course_id, and has nothing to do with Matno. This is partial dependency, where an attribute in a table depends on only a part of the primary key and not on the whole key.

How to remove Partial Dependency?
There can be many different solutions for this, but our objective is to remove the lecturer's name from the Score table.

The simplest solution is to remove the column Lecturer from the Score table and add it to the Course table. Hence, the Course table becomes:

Course_id  Course_name  Lecturer
1          Java         Java Teacher
2          C++          C++ Teacher
3          PHP          PHP Teacher

And our Score table is now in the second normal form, with no partial dependency.

score_id  Matno  Course_id  marks
1         10     1          70
2         10     2          75
3         11     1          80
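The decomposition can be reproduced and checked for information loss. This sketch uses Python's sqlite3 module; ScoreWide (the Score table before normalization) and CourseLecturer (standing in for the Lecturer column of the Course table) are hypothetical names. Joining the two tables on Course_id gives back exactly the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Score before normalization: Lecturer depends only on Course_id,
    -- which is part of the key (Matno, Course_id).
    CREATE TABLE ScoreWide (score_id INTEGER, Matno INTEGER, Course_id INTEGER,
                            marks INTEGER, Lecturer TEXT);
    INSERT INTO ScoreWide VALUES (1, 10, 1, 70, 'Java Teacher'),
                                 (2, 10, 2, 75, 'C++ Teacher'),
                                 (3, 11, 1, 80, 'Java Teacher');

    -- Move Lecturer into a table keyed by Course_id (one row per course) ...
    CREATE TABLE CourseLecturer AS
        SELECT DISTINCT Course_id, Lecturer FROM ScoreWide;
    -- ... and keep only attributes that depend on the whole key in Score.
    CREATE TABLE Score AS
        SELECT score_id, Matno, Course_id, marks FROM ScoreWide;
""")

lecturer_rows = conn.execute("SELECT COUNT(*) FROM CourseLecturer").fetchone()[0]

# Joining on Course_id rebuilds the original rows, so no information was lost.
rebuilt = conn.execute("""
    SELECT s.score_id, s.Matno, s.Course_id, s.marks, c.Lecturer
    FROM Score AS s JOIN CourseLecturer AS c USING (Course_id)
    ORDER BY s.score_id
""").fetchall()
original = conn.execute("SELECT * FROM ScoreWide ORDER BY score_id").fetchall()
print(lecturer_rows, rebuilt == original)
```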

Another example: consider the following schema.

Matric, Age, Sex, Hall

Once the Matric is known, the other attributes can be known.

Note: Course is deliberately left out of this relation to minimize data redundancy. Instead, we define another table:

Matric Course
68888 CSC 711
68888 CSC 712
68888 CSC 741
68911 CSC 711

NOTE: Matric and Course together now serve as the primary key; this is an all-key relation. If we put Course in the first table and a student takes 20 courses, that student's Age, Sex and Hall would be repeated 20 times, which is data redundancy.

As another example, the relation stock(prodNo, depotNo, label, quantity), with prodNo and depotNo as joint key and functional dependencies F = {prodNo, depotNo → quantity; prodNo → label}, is not in 2NF, because label depends on prodNo alone. After normalization, we will have stock(prodNo, depotNo, quantity) and product(prodNo, label).

Note: the arrow indicates "determines". For example, A → B means A determines B.

14.4.3 3rd Normal Form, 3NF

A relation is in 3NF iff it is in 2NF and every non-key attribute is non-transitively dependent on the primary key. Another way of stating this is that a relation is in 3NF iff its non-key attributes (if any) are (a) mutually independent and (b) fully dependent on the primary key, i.e. "there is no functional dependency among the non-key attributes": any attribute not belonging to a key does not depend on any other non-key attribute. For example,

suppose R1 = (A1, A2, A3, A4, A5) and A3 → A4, where A3 and A4 are both non-key attributes of the same relation. A4 is then transitively dependent on the key through A3. Move A4 into another relation together with its determinant A3, keeping A3 in R1 so that the two relations can be joined back. We now have two relations of the form

R1 = (A1, A2, A3, A5)
R2 = (A3, A4)

As another example, let us consider the schema

Banker-info(branchName, customerName, bankerName, officeNumber) with the functional dependencies
F = {bankerName → branchName, officeNumber; customerName, branchName → bankerName}

The schema is not in 3NF, because bankerName → officeNumber holds although bankerName is not a superkey and officeNumber is not part of any candidate key. It can therefore be decomposed into:

BankerOffice(bankerName, branchName, officeNumber)
Banker(customerName, branchName, bankerName)

In summary, Third Normal Form is an upgrade of Second Normal Form: when a table is in 2NF and has no transitive dependency, it is in 3NF. Let's use the same example, where we have 3 tables: Student, Course and Score.

Student Table
Matno Name Reg_no Course Address
10 Akinola 07-WY CSC Ibadan
11 Akinola 08-WY IT Ekiti
12 Kelechi 09-WY IT Enugu

Course Table
Course_id Course_name Lecturer
1 Java Java Teacher
2 C++ C++ Teacher
3 Php Php Teacher

Score Table
Score_id Matno Course_id Marks
1 10 1 70
2 10 2 75
3 11 1 80

In the Score table, we need to store some more information, namely the exam name and the total marks, so let's add 2 more columns to the Score table.

Score_id Matno Course_id Marks Exam_name Total_marks

Requirements for Third Normal Form
For a table to be in the third normal form:
1. It should be in the Second Normal Form.
2. It should not have any transitive dependency.

What is Transitive Dependency?

With exam_name and total_marks added, our Score table now stores more data. The primary key for our Score table is a composite key, which means it is made up of two attributes or columns → Matno + Course_id.

Our new column exam_name depends on both the student and the course. For example, a mechanical engineering student will have a Workshop exam but a computer science student won't. And some courses have practical exams while others don't. So we can say that exam_name depends on both Matno and Course_id.

And what about our second new column, total_marks? Does it depend on the Score table's primary key?

The column total_marks depends on exam_name, because the total score changes with the type of exam. For example, practicals carry fewer marks while theory exams carry more.
But exam_name is just another column in the Score table. It is not the primary key or even part of the primary key, yet total_marks depends on it. This is transitive dependency: a non-prime attribute depends on other non-prime attributes rather than on the prime attributes or primary key.

How to remove Transitive Dependency?
Again the solution is very simple. Take the columns exam_name and total_marks out of the Score table, put them in an Exam table, and use exam_id wherever required.

Score Table: In 3rd Normal Form
Score_id Matno Course_id marks exam_id

The new Exam table
exam_id  exam_name   total_marks
1        Workshop    200
2        Mains       70
3        Practicals  30
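Here is one way to set up the Score and Exam tables and read total_marks back through a join, sketched with Python's sqlite3 module. The exam each score belongs to is invented, since the text does not say. Because total_marks is stored once per exam, changing it updates a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Exam (exam_id INTEGER PRIMARY KEY, exam_name TEXT, total_marks INTEGER);
    INSERT INTO Exam VALUES (1, 'Workshop', 200), (2, 'Mains', 70), (3, 'Practicals', 30);

    -- Score in 3NF: exam_id replaces exam_name and total_marks.
    -- Which exam each score belongs to is invented for illustration.
    CREATE TABLE Score (Score_id INTEGER PRIMARY KEY, Matno INTEGER, Course_id INTEGER,
                        marks INTEGER, exam_id INTEGER REFERENCES Exam(exam_id));
    INSERT INTO Score VALUES (1, 10, 1, 70, 2), (2, 10, 2, 75, 1), (3, 11, 1, 80, 1);
""")

def report():
    """Each score with the exam name and total marks looked up through exam_id."""
    return conn.execute("""
        SELECT s.Score_id, s.marks, e.exam_name, e.total_marks
        FROM Score AS s JOIN Exam AS e ON s.exam_id = e.exam_id
        ORDER BY s.Score_id
    """).fetchall()

before = report()

# total_marks for an exam is stored once, so changing it touches one row ...
changed = conn.execute(
    "UPDATE Exam SET total_marks = 250 WHERE exam_name = 'Workshop'").rowcount
after = report()  # ... and every score for that exam sees the new total.
print(before, changed, after, sep="\n")
```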

Advantages of removing Transitive Dependency
The advantages of removing transitive dependency are:
• The amount of data duplication is reduced.
• Data integrity is improved.

14.4.4 Boyce Codd Normal Form - BCNF

A relation is in BCNF iff every determinant is a candidate key. A determinant is any attribute of a relation on which some other attribute is fully functionally dependent. Stated alternatively, for a relation to be in BCNF, each of the non-candidate-key attributes must be fully functionally dependent on each of the candidate keys.

Of course, because each candidate key is in a 1:1 relationship with each of the other candidate keys, once a non-candidate-key attribute is fully functionally dependent on one of the candidate keys, it is by implication also dependent on all the other candidate keys. As an illustration, consider the following relation schemas and their functional dependencies.

(i) Customer(customerName, customerStreet, customerCity)
customerName → customerStreet, customerCity

(ii) Branch(branchName, branchCity, assets)
branchName → branchCity, assets

(iii) LoanInfo(branchName, customerName, loanNumber, amount)
loanNumber → branchName, amount

Note: the arrow indicates "determines". For example, A → B means A determines B.

The Customer and Branch schemas are in BCNF, because in each schema the left-hand side of every nontrivial dependency is a candidate key.

The LoanInfo schema, however, is not in BCNF. First, note that loanNumber is not a superkey for the LoanInfo schema, since we could have a pair of tuples representing a single loan made to two people, for example,
(Ibadan, Mr John, L100, 1000)
(Ibadan, Mrs Janet, L100, 1000)

However, the functional dependency loanNumber → amount is nontrivial, and its left-hand side is not a superkey. Therefore, the LoanInfo schema does not satisfy BCNF. It can, however, be decomposed into two schemas as follows:
Loan(branchName, loanNumber, amount)
Borrower(customerName, loanNumber)
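Whether the left-hand side of a dependency is a superkey can be decided by computing its attribute closure. A minimal sketch in plain Python follows. Checking only the listed dependencies is sufficient for testing the original schema, but testing a decomposed schema in general requires the dependencies projected onto it, which this sketch does not compute (for Loan, every attribute of loanNumber → branchName, amount lies inside the schema, so the listed dependency can be used directly).

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Listed non-trivial FDs whose left-hand side is not a superkey of schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)                    # ignore trivial dependencies
            and not set(schema) <= closure(lhs, fds)]      # lhs does not determine everything

customer = ["customerName", "customerStreet", "customerCity"]
customer_fds = [(["customerName"], ["customerStreet", "customerCity"])]

loan_info = ["branchName", "customerName", "loanNumber", "amount"]
loan_fds = [(["loanNumber"], ["branchName", "amount"])]

loan = ["branchName", "loanNumber", "amount"]  # first schema of the decomposition

print(bcnf_violations(customer, customer_fds))
print(bcnf_violations(loan_info, loan_fds))
print(bcnf_violations(loan, loan_fds))
```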

Further Explanations on BCNF

Boyce-Codd Normal Form is a stricter version of the Third Normal Form. It deals with a certain type of anomaly that is not handled by 3NF. A 3NF table that does not have multiple overlapping candidate keys is automatically in BCNF.
Boyce-Codd Normal Form (BCNF) is an extension of the third normal form and is also known as 3.5 Normal Form.
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:
1. It should be in the Third Normal Form.
2. For every nontrivial dependency A → B, A should be a superkey.
The second point sounds a bit tricky. For a table that is already in 3NF, it rules out one extra case: a dependency A → B in which B is a prime attribute but A is not a superkey (for example, A is a non-prime attribute).

Example:
Below we have a college enrolment table with columns Matno, Course and professor.

Matno  Course  professor
101    Java    Prof. Java
101    C++     Prof. Cpp
102    Java    Prof. Java2
103    C#      Prof. Chash
104    Java    Prof. Java

As can be seen, we have also added some sample data to the table.
In the table above:
• One student can enrol in multiple courses. For example, the student with Matno 101 has opted for the courses Java and C++.
• For each course, a professor is assigned to the student.
• There can be multiple professors teaching one course, as we have for Java.

What do you think should be the Primary Key?

In the table above, Matno and Course together form the primary key, because using Matno and Course we can find all the columns of the table.
One more important point to note here is that one professor teaches only one course, but one course may have two different professors. Hence there is a dependency between Course and professor, where Course depends on the professor name.

This table satisfies the 1st Normal form because all the values are atomic, column names are
unique and all the values stored in a particular column are of same domain. This table also
satisfies the 2nd Normal Form as there is no Partial Dependency. And, there is
no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.

But this table is not in Boyce-Codd Normal Form. Why?

In the table above, Matno and Course form the primary key, which means the Course column is a prime attribute.

But there is one more dependency, professor → Course.

And while Course is a prime attribute, professor is a non-prime attribute and not a superkey, which is not allowed by BCNF.

How to Satisfy BCNF?
To make this relation (table) satisfy BCNF, we decompose it into two tables, a student table and a professor table.

Student Table
Matno Prof_id
101 1
101 2
and so on...

Professor Table
Prof_id Professor Course
1 Prof. Java Java
2 Prof. Cpp C++
and so on...

And now, this relation satisfies Boyce-Codd Normal Form.
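Sample data can be searched for counterexamples to professor → Course with GROUP BY and HAVING, and the decomposition can be checked by joining the tables back. A sketch using Python's sqlite3 module and the sample rows above; the table name Enrol is hypothetical, and the Prof_id values are generated automatically by SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enrol (Matno INTEGER, Course TEXT, professor TEXT)")
conn.executemany("INSERT INTO Enrol VALUES (?, ?, ?)", [
    (101, "Java", "Prof. Java"), (101, "C++", "Prof. Cpp"), (102, "Java", "Prof. Java2"),
    (103, "C#", "Prof. Chash"), (104, "Java", "Prof. Java")])

# professor -> Course is refuted by any professor listed with two different courses.
fd_counterexamples = conn.execute("""
    SELECT professor FROM Enrol
    GROUP BY professor HAVING COUNT(DISTINCT Course) > 1
""").fetchall()

# professor is not a superkey: the same professor appears in more than one row.
repeated = conn.execute("""
    SELECT professor, COUNT(*) FROM Enrol
    GROUP BY professor HAVING COUNT(*) > 1
""").fetchall()

# BCNF decomposition: Professor(Prof_id, Professor, Course) and Student(Matno, Prof_id).
conn.executescript("""
    CREATE TABLE Professor (Prof_id INTEGER PRIMARY KEY, Professor TEXT UNIQUE, Course TEXT);
    INSERT INTO Professor (Professor, Course)
        SELECT DISTINCT professor, Course FROM Enrol ORDER BY professor;
    CREATE TABLE Student (Matno INTEGER, Prof_id INTEGER REFERENCES Professor(Prof_id));
    INSERT INTO Student
        SELECT e.Matno, p.Prof_id FROM Enrol AS e JOIN Professor AS p
        ON e.professor = p.Professor;
""")

# Joining the two tables back must reproduce the original enrolment rows.
rebuilt = conn.execute("""
    SELECT s.Matno, p.Course, p.Professor
    FROM Student AS s JOIN Professor AS p USING (Prof_id)
    ORDER BY 1, 2, 3
""").fetchall()
original = conn.execute("SELECT Matno, Course, professor FROM Enrol ORDER BY 1, 2, 3").fetchall()
print(fd_counterexamples, repeated, rebuilt == original)
```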

14.4.5 Fourth Normal Form (4NF)

Fourth Normal Form comes into the picture when a multi-valued dependency occurs in a relation. For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
1. It should be in the Boyce-Codd Normal Form.
2. The table should not have any multi-valued dependency.

What is Multi-valued Dependency?
A table is said to have a multi-valued dependency if the following conditions are true:
1. For a dependency A →→ B (read "A multi-determines B"), a single value of A is associated with multiple values of B.
2. The table has at least 3 columns.
3. For a relation R(A, B, C), if there is a multi-valued dependency between A and B, then B and C are independent of each other.
If all these conditions are true for a relation (table), it is said to have a multi-valued dependency.

Example:
Below we have a college enrolment table with columns Matno, course and hobby.

Matno  course   hobby
1      Science  Cricket
1      Maths    Hockey
2      C#       Cricket
2      PHP      Hockey

As you can see in the table above, the student with Matno 1 has opted for two courses, Science and Maths, and has two hobbies, Cricket and Hockey.
What problem can this lead to?

The two records for the student with Matno 1 will give rise to two more records, as shown below, because one student has two hobbies, and these hobbies should be recorded along with both courses.

Matno  course   hobby
1      Science  Cricket
1      Maths    Hockey
1      Science  Hockey
1      Maths    Cricket

And in the table above, there is no relationship between the columns course and hobby; they are independent of each other. So there is a multi-valued dependency, which leads to unnecessary repetition of data and other anomalies as well.

How to satisfy 4th Normal Form?
To make the above relation satisfy the 4th normal form, we can decompose the table into 2 tables.

CourseOpted Table
Matno  course
1      Science
1      Maths
2      C#
2      PHP

And the Hobbies Table:
Matno  hobby
1      Cricket
1      Hockey
2      Cricket
2      Hockey

Now this relation satisfies the fourth normal form.
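The multi-valued dependency test can be stated as: for each Matno, the set of (course, hobby) pairs must be every combination of that student's courses and hobbies. A plain-Python sketch follows; it completes the rows of both students (the text shows the completion for Matno 1 only) and checks that joining CourseOpted and Hobbies on Matno gives back exactly the completed table.

```python
from itertools import product

def mvd_holds(rows):
    """Check Matno ->-> course in rows of (Matno, course, hobby): for every Matno,
    the (course, hobby) pairs must be every combination of its courses and hobbies."""
    groups = {}
    for matno, course, hobby in rows:
        groups.setdefault(matno, set()).add((course, hobby))
    for pairs in groups.values():
        courses = {c for c, _ in pairs}
        hobbies = {h for _, h in pairs}
        if pairs != set(product(courses, hobbies)):
            return False
    return True

original = [(1, "Science", "Cricket"), (1, "Maths", "Hockey"),
            (2, "C#", "Cricket"), (2, "PHP", "Hockey")]
# The same data with the "missing" combinations added for both students.
completed = original + [(1, "Science", "Hockey"), (1, "Maths", "Cricket"),
                        (2, "C#", "Hockey"), (2, "PHP", "Cricket")]

# 4NF decomposition into CourseOpted(Matno, course) and Hobbies(Matno, hobby).
course_opted = {(m, c) for m, c, _ in completed}
hobbies = {(m, h) for m, _, h in completed}
# Natural join on Matno recovers exactly the completed table.
rejoined = {(m, c, h) for m, c in course_opted for m2, h in hobbies if m == m2}

print(mvd_holds(original), mvd_holds(completed), rejoined == set(completed))
```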


A table can also have a functional dependency along with a multi-valued dependency. In that case, the functionally dependent columns are moved into a separate table and the multi-valued dependent columns are moved into separate tables. If we design our database carefully, we can easily avoid these issues.

Other normal forms we may encounter are:

5NF (Fifth Normal Form)


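A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any
number of smaller tables without loss of data, that is, every join dependency in it is implied by
its candidate keys. 5NF is also known as Project-Join Normal Form (PJ/NF).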

6NF (Sixth Normal Form) Proposed


6th Normal Form is not yet widely standardized or adopted; however, it has been discussed by
database experts for some time. Hopefully, a clear and standardized definition of 6th Normal
Form will emerge in the near future.

In summary:
• A determinant is any attribute, simple or composite, on which another attribute in a
relation is fully functionally dependent (FFD). Candidate keys are mutually functionally
dependent on each other.
• 1NF: attribute values are atomic.
• 2NF: the relation is in 1NF and every non-key attribute is fully functionally dependent on
the primary key. A 2NF violation (a partial dependency) can only occur when the primary key
is composite; a 1NF relation with a single-attribute key is automatically in 2NF.
• 3NF: the relation is in 2NF and has no transitive dependencies; every non-key attribute must
depend directly on the primary key, not on another non-key attribute.
• BCNF: a stricter form of 3NF in which every determinant must be a candidate key. It differs
from 3NF only when a relation has more than one (overlapping) candidate key, so the difference
between the two is subtle.

(CourseCode, Matric, Mark): this relation has a composite primary key made up of two
attributes, CourseCode and Matric.

Mark must be fully dependent on both key attributes together; this is Full Functional
Dependency (FFD). A partial dependency occurs if Mark depends on only part of the key, for
example on Matric alone, or on CourseCode alone if every student taking a given course were
awarded the same mark.
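Whether a dependency is full or partial can be tested on sample rows. The Python sketch below is a minimal illustration with hypothetical data and helper names (fd_holds, is_full_fd, make_rows). Note that a sample instance can only disprove a dependency, never prove it: dependencies come from the meaning of the data, so the sample rows here are chosen to reflect the two situations just described.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for r in rows:
        key = tuple(r[c] for c in lhs)
        value = tuple(r[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_full_fd(rows, lhs, rhs):
    """lhs -> rhs is full if it holds and no smaller attribute set determines rhs.

    Dropping one attribute at a time is enough: if a smaller subset determined
    rhs, so would every larger subset containing it.
    """
    if not fd_holds(rows, lhs, rhs):
        return False
    return not any(fd_holds(rows, [a for a in lhs if a != dropped], rhs)
                   for dropped in lhs)

def make_rows(data):
    return [dict(zip(("CourseCode", "Matric", "Mark"), t)) for t in data]

# Hypothetical results: marks vary by student and by course.
results = make_rows([("CSC472", "A001", 65), ("CSC472", "A002", 70),
                     ("CSC471", "A001", 58), ("CSC471", "A002", 70)])
# Hypothetical case where every student in a course gets the same mark.
same_mark = make_rows([("CSC472", "A001", 65), ("CSC472", "A002", 65),
                       ("CSC471", "A001", 58), ("CSC471", "A002", 58)])

key = ["CourseCode", "Matric"]
print(is_full_fd(results, key, ["Mark"]))              # True: full dependency
print(is_full_fd(same_mark, key, ["Mark"]))            # False
print(fd_holds(same_mark, ["CourseCode"], ["Mark"]))   # True: partial dependency
```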

(MATRIC, NAME, AGE, HALL, GENDER)

Note: in the schema above, if Name determines Age, then Age is transitively dependent on
Matric through Name. BCNF tries to alleviate this problem of 3NF: if Name is also a candidate
key (that is, no two students share a name), every determinant is a candidate key and the
problem is removed.

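Consider a relation (P, A1, A2, A3) with primary key P, in which A3 depends on A1 rather
than directly on P (a transitive dependency).

Breaking the relation to get 3NF:

(P, A1, A2)

(A1, A3)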
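The decomposition can be tried out in SQL. The sketch below uses Python's built-in sqlite3 module; the sample values are hypothetical, the two smaller relations (P, A1, A2) and (A1, A3) are named R1 and R2 here for convenience, and we assume that A3 depends on A1, which is what this decomposition implies. Because A1 is the primary key of R2, each A3 value is stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Original relation R(P, A1, A2, A3); hypothetical data in which A3 depends on A1.
conn.execute("CREATE TABLE R (P INTEGER PRIMARY KEY, A1 TEXT, A2 TEXT, A3 TEXT)")
conn.executemany("INSERT INTO R VALUES (?, ?, ?, ?)", [
    (1, "x", "a", "X-info"),
    (2, "x", "b", "X-info"),   # A1 = 'x' repeats its A3 value in R
    (3, "y", "c", "Y-info"),
])

# 3NF decomposition: R2(A1, A3) holds the transitive dependency, R1 keeps the rest.
conn.execute("CREATE TABLE R2 (A1 TEXT PRIMARY KEY, A3 TEXT)")
conn.execute("CREATE TABLE R1 (P INTEGER PRIMARY KEY, A1 TEXT REFERENCES R2(A1), A2 TEXT)")
conn.execute("INSERT INTO R2 SELECT DISTINCT A1, A3 FROM R")
conn.execute("INSERT INTO R1 SELECT P, A1, A2 FROM R")

# Joining R1 and R2 on A1 reproduces the original rows (a lossless decomposition).
rejoined = conn.execute(
    "SELECT R1.P, R1.A1, R1.A2, R2.A3 FROM R1 JOIN R2 ON R1.A1 = R2.A1 ORDER BY R1.P"
).fetchall()
original = conn.execute("SELECT P, A1, A2, A3 FROM R ORDER BY P").fetchall()
print(rejoined == original)                                   # True
print(conn.execute("SELECT COUNT(*) FROM R2").fetchone()[0])  # 2
```

If the data broke the assumed dependency (the same A1 with two different A3 values), the INSERT into R2 would fail with a primary key violation, which is a quick way to spot that the decomposition does not fit the data.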

Self Assessment Questions (SAQs)

(1) Given the following data situation in an academic community:

Student(Matric, Name, Hall, Sex, Dept, Faculty, Age, Course1, Mark1, Course2, Mark2,
Course3, Mark3, NextKin1, NextKin2, Allergy1, Allergy2, Sports1, Sports2, FeeType1Paid,
AmountPaid, DatePaid, FeeType2Paid)

Lecturers(IDNO, Name, Salary, BirthDate, Course1Taught, Course2Taught, Course3Taught,
NextKin1, NextKin2, LecturerPublication)

Courses(CourseCode, CourseDescription, Unit, Semester, Session, Dept, LecturerName,
LectID, CoursePrerequisite, RequiredFor, Status)

Develop a database schema for the above data situation that satisfies BCNF.

(2) Consider the following relation:

Emp-ID   Name      Dept           Salary   Course        Date-Completed
100      Johnson   Marketing      10,000   SPSS          19/6/2007
100      Johnson   Marketing      10,000   Surveys       7/10/2007
140      John      Accounting     15,000   Tax Accting   21/5/2007
110      Allen     Info. System   20,000   SPSS          1/12/2007
110      Allen     Info. System   20,000   Surveys       10/12/2007

Explain the concept of database anomalies, citing examples from the relation above.
(3) Give any two reasons why you think database normalization is very important. Discuss
the process of normalizing database tables up to 3rd Normal Form (3NF).

Unit 15 Database Security
Expected Duration: 1 week or 2 contact hours

Introduction

Data is a valuable resource that must be carefully handled and managed, as with any other
economic resource. Some or all of an organization's commercial data may be of strategic
importance and must therefore be kept protected and confidential. A range of computer-based
controls is available as countermeasures to the threats facing this data. In this Unit, you will
learn about the scope of database security.

Learning Outcomes
When you have studied this session, you should be able to explain:
15.1 Meaning of Database Security
15.2 Importance of Data Security
15.3 Threats to Database
15.3.1 Major Threats to Data Security
15.3.2 Types of Threats
15.4 Integrity Controls: Backups
15.5 Aspects of Data Security
15.6 Types of Security Control on Data
15.7 Database Security Best Practices
15.8 Role of Database Administrator in Data Security
15.9 How to Secure a Database Server
15.10 Security in SQLs

15.1 Meaning of Database Security

Database security involves ensuring that users of a database system are allowed to do only what
they are authorized to do on the database, depending on their circumstances. It is the set of
techniques that protects and secures the database against intentional or accidental threats.
Security concerns are not limited to the data residing in an organization's database: a breach of
security may harm other parts of the system, which may ultimately affect the database
structure.

Consequently, database security covers hardware, software, human resources, and data. Using
security efficiently requires appropriate controls, defined according to the specific mission and
purpose of the system. The need for proper security, although often neglected or overlooked in
the past, is now checked more and more thoroughly by organizations.

We consider database security under the following situations:


• Theft and fraud.
• Loss of confidentiality or secrecy.
• Loss of data privacy.
• Loss of data integrity.
• Loss of availability of data.

These circumstances mark the main areas in which an organization should focus on reducing
risk, that is, the chance of incurring loss or damage to data within a database. In some cases
these areas are directly related, so that an activity that leads to a loss in one area may also lead
to a loss in another, since all of the data within an organization are interconnected.

By contrast, database integrity is concerned with ensuring that what users do is correct, in the
sense that it does not damage the accuracy or validity of the data in the database.

15.2 Importance of Data Security

Data security is critical for most businesses and even home computer users. Client information,
payment information, personal files, bank account details: all this information can be hard to
replace and potentially dangerous if it falls into the wrong hands. Losing data to a disaster such
as a flood or fire is devastating, but losing it to hackers or a malware infection can have much
greater consequences.

15.3 Threats to Database

A threat is any situation or event, whether intentional or accidental, that can cause damage and
so adversely affect the database structure and, consequently, the organization. A threat may
arise from a situation or event involving a person, or from actions or circumstances that are
likely to bring harm to an organization and its database.

The extent of the loss an organization suffers as a result of a threat depends on several factors,
such as the existence of countermeasures and contingency plans. For example, if a hardware
failure corrupts secondary storage, all processing activity must cease until the problem is
resolved.

15.3.1 Major Threats to Data Security

Many software vulnerabilities, misconfigurations, or patterns of misuse or carelessness could
result in breaches. Here are some of the best-known causes and types of database security
threats.

a. Accidents can happen due to human error or software/hardware errors.
b. Hackers could steal vital information, and fraud can easily be perpetrated.
c. Loss of data integrity.
d. Improper access to personal or confidential data.
e. Loss of data availability through sabotage, a virus, or a worm.

15.3.2 Types of Threats

(1) Insider Threats

An insider threat is a security risk from one of the following three sources, each of which has
privileged means of entry to the database:
• A malicious insider with ill intent
• A negligent person within the organization who exposes the database to attack through
careless actions
• An outsider who obtains credentials through social engineering or other methods, or
otherwise gains access to the database’s credentials

An insider threat is one of the most common causes of database security breaches, and it often
occurs because too many employees have been granted privileged user access.

(2) Human Error

Weak passwords, password sharing, accidental erasure or corruption of data, and other
undesirable user behaviors are still the cause of almost half of data breaches reported.

(3) Exploitation of Database Software Vulnerabilities

Attackers constantly attempt to isolate and target vulnerabilities in software, and database
management software is a highly valuable target. New vulnerabilities are discovered daily, and
open source database management platforms and commercial database software vendors
issue security patches regularly. However, if we don’t apply these patches quickly, our database
might be exposed to attack. Even if we do apply patches on time, there is always the risk of
zero-day attacks, in which attackers exploit a vulnerability before it has been discovered and
patched by the database vendor.

(4) SQL/NoSQL Injection Attacks

This database-specific threat involves inserting arbitrary SQL or NoSQL attack strings into
database queries. Typically, these are queries built from web application form input or
received via HTTP requests. Any database system is vulnerable to these attacks if developers do
not adhere to secure coding practices and the organization does not carry out regular
vulnerability testing.
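The following sketch, using Python's built-in sqlite3 module, shows how an injection attack works and the standard defense: parameterized queries. The users table, the login functions and the password are hypothetical, and passwords are stored in plain text only to keep the example short; a real system should store password hashes instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; plain-text password only for brevity.
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")

def login_unsafe(username, password):
    # UNSAFE: user input is pasted directly into the SQL text.
    sql = ("SELECT COUNT(*) FROM users WHERE username = '" + username +
           "' AND password = '" + password + "'")
    return conn.execute(sql).fetchone()[0] > 0

def login_safe(username, password):
    # Safe: the ? placeholders pass the input as data, never as SQL code.
    sql = "SELECT COUNT(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(sql, (username, password)).fetchone()[0] > 0

attack = "' OR '1'='1"
print(login_unsafe("admin", attack))  # True: the attack string bypasses the check
print(login_safe("admin", attack))    # False
```

In the unsafe version the WHERE clause becomes `username = 'admin' AND password = '' OR '1'='1'`, which is true for every row. With placeholders, the same string is treated purely as a password value and matches nothing.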

(5) Buffer Overflow Attacks

Buffer overflow takes place when a process tries to write more data to a fixed-length block of
memory than it is permitted to hold. Attackers might use the excess data, stored in adjacent
memory addresses, as the starting point from which to launch attacks.

(6) Denial of Service (DoS/DDoS) Attacks

In a Denial of Service (DoS) attack, the cybercriminal overwhelms the target service, in this
instance the database server, with a large number of fake requests. As a result, the server
cannot carry out genuine requests from actual users, and often crashes or becomes unstable.

In a Distributed Denial of Service Attack (DDoS), fake traffic is generated by a large number
of computers, participating in a botnet controlled by the attacker. This generates very large
traffic volumes, which are difficult to stop without a highly scalable defensive architecture.
Cloud-based DDoS protection services can scale up dynamically to address very large DDoS
attacks.

(7) Malware

Malware is software written to take advantage of vulnerabilities or to cause harm to a database.
Malware could arrive through any endpoint device connected to the database’s network.
Malware protection is important on any endpoint, but especially so on database servers,
because of their high value and sensitivity.

15.4 Integrity Controls: Backups

Backing up is the process of copying and archiving computer data so that the copy may be used
to restore the original after a data loss event.
Backups have two distinct purposes:
• The primary purpose is to recover data after it has been lost, whether through deletion or
corruption.
• The secondary purpose is to recover data from an earlier time, according to a user-defined
data retention policy, typically configured within a backup application, that specifies how long
copies of data must be kept. Backup is just one part of a disaster recovery plan.
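As a small illustration of backing up and restoring, Python's sqlite3 module provides a Connection.backup method that copies one database into another. The sketch below keeps both databases in memory and uses a hypothetical student table; in practice the backup target would be a file kept on separate, secure storage.

```python
import sqlite3

# A live database with some data (in memory, to keep the sketch self-contained).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE student (matric TEXT PRIMARY KEY, name TEXT)")  # hypothetical table
live.execute("INSERT INTO student VALUES ('A001', 'Ade')")
live.commit()

# Take a backup copy of the whole database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# A data-loss event: the rows are deleted by mistake.
live.execute("DELETE FROM student")
live.commit()

# Restore: copy the backup back over the damaged database.
backup.backup(live)
print(live.execute("SELECT * FROM student").fetchall())  # [('A001', 'Ade')]
```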

15.5 Aspects of Data Security

(i) Legal: set by law; a breach attracts a penalty.
(ii) Social: not regulated by law, but social norms prevent certain people from accessing the
data.
(iii) Ethical: certain actions are not acceptable from an ethical point of view.
(iv) Physical controls: protecting the machine itself, e.g. burglary-proofing.
(v) Hardware controls: e.g. one can't start the computer, or one can't steal the computer.
(vi) Operating system controls: e.g. one cannot start or log in to the computer without
authorization.
(vii) Operational controls: e.g. putting a sign on the door that reads "Out of Bounds to
Non-Database Workers".
(viii) Backup: backing up files.
(ix) Policy controls: rules, set by the organization, on who should have access to the database.
(x) DBMS controls: controls set within the database itself.

15.6 Types of Security Control on Data


1. Access Control
This is the selective restriction of access to a place or other resource. The act of accessing
may mean consuming, entering, or using. Permission to access a resource is called
authorization.
The usual way of providing access control in a database system is by granting and
revoking privileges within the database. A privilege allows a user to create or access some
database object or to run specific DBMS utilities. Privileges are granted to users so that
they can accomplish the tasks required by their jobs.
The database provides various types of access control:
• Discretionary Access Control (DAC)
• Mandatory Access Control (MAC)

2. Auditing
Database auditing involves observing a database so as to be aware of the actions of
database users. Database administrators and consultants often set up auditing
for security purposes, for example, to ensure that those without the permission to access
information do not access it.
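A simple form of auditing can be built with triggers. The following sketch, using Python's built-in sqlite3 module and hypothetical staff and audit_log tables, records every salary change together with a timestamp. SQLite has no user accounts, so, unlike the auditing features of multi-user DBMSs, this log cannot record which user made a change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
CREATE TABLE audit_log (
    staff_id INTEGER, old_salary INTEGER, new_salary INTEGER,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Hypothetical audit trigger: record every salary change for later review.
CREATE TRIGGER audit_salary AFTER UPDATE OF salary ON staff
BEGIN
    INSERT INTO audit_log (staff_id, old_salary, new_salary)
    VALUES (OLD.id, OLD.salary, NEW.salary);
END;
INSERT INTO staff VALUES (1, 'Johnson', 10000);
""")

conn.execute("UPDATE staff SET salary = 15000 WHERE id = 1")
print(conn.execute("SELECT staff_id, old_salary, new_salary FROM audit_log").fetchall())
# [(1, 10000, 15000)]
```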
3. Authentication
This is the validation control that verifies a user’s identity before allowing login to a system,
email or blog account, etc. Once logged in, we have various privileges until we log out. Some
systems will end a session if the machine has been idle for a certain amount of time, requiring
us to authenticate again to re-enter. We can log in using multiple factors such as a
password, a smart card or even a fingerprint.
4. Encryption
This security mechanism uses mathematical schemes and algorithms to scramble data into
unreadable text. It can only be decoded or decrypted by a party that possesses the
associated key.
5. Back Up
This is the process of copying and archiving computer data so that, in the event of data loss,
the copy can be used to restore the original data. Every Database Management System
should offer backup facilities to help with the recovery of a database after a failure. It is
always advisable to make backup copies of the database and log files at regular intervals and
to ensure that the copies are kept in a secure location. In the event of a failure that renders the
database unusable, the backup copy and the details captured in the log file are used to restore
the database to the latest possible consistent state.
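The idea of restoring from a backup and then reapplying the log can be sketched in a few lines of Python. This is a simplified, redo-only model under assumed record formats (the function restore and the log layout are hypothetical); real DBMS logs also record before-images, checkpoints and more. Only the writes of transactions that committed before the failure are reapplied, which brings the database to the latest consistent state.

```python
def restore(backup, log):
    """Rebuild the database from a backup copy plus the log written since it.

    backup: dict mapping data item -> value at backup time (hypothetical format).
    log: list of records, either ("write", txn, item, new_value) or ("commit", txn).
    Only writes of committed transactions are reapplied (redone).
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = dict(backup)  # start from a copy so the backup itself is untouched
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, value = rec
            db[item] = value
    return db

backup = {"Akinola": 20000, "Hammed": 5000}
log = [
    ("write", "T1", "Akinola", 15000),
    ("write", "T1", "Hammed", 10000),
    ("commit", "T1"),
    ("write", "T2", "Akinola", 12000),  # T2 had not committed when the failure occurred
]
print(restore(backup, log))  # {'Akinola': 15000, 'Hammed': 10000}
```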

6. Password
This is a sequence of secret characters used to gain access to a file, program, computer
system or other resource.
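Passwords should not be stored in readable form: a common practice is to store a salted, deliberately slow hash of each password and compare hashes at login. The sketch below uses PBKDF2 from Python's standard hashlib module; the function names and the iteration count are assumptions for illustration, and the count should be chosen according to current guidance.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed value; tune to current recommendations

def hash_password(password, salt=None):
    """Return (salt, digest) to store instead of the plain-text password."""
    salt = salt if salt is not None else os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password, salt, stored_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("guess123", salt, stored))                      # False
```

The random salt means two users with the same password get different stored hashes, and the slow hash makes guessing many passwords expensive for an attacker who steals the table.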

15.7 Database Security Best Practices

1. Deploy physical database security


Data centers or servers can be susceptible to physical attacks by outsiders or even insider
threats. If a cybercriminal gets access to a physical database server, they can steal the data,
corrupt it or even insert harmful malware to gain remote access. Without additional security
measures, it is often difficult to detect these types of attacks since they can bypass digital
security protocols.
When choosing a web hosting service, we should make sure it is a company with a known track
record of taking security matters seriously. It is also best to avoid free hosting services because
of the possible lack of security.
If we house our own servers, adding physical security measures such as cameras, locks and
security personnel is strongly recommended. Furthermore, access to the physical servers
should be logged and granted only to specific people in order to mitigate the risk of malicious
activities.

2. Separate database servers
Databases require specialized security measures to keep them safe from cyberattacks.
Furthermore, having our data on the same server as our site also exposes it to different attack
vectors that target websites.
Suppose we run an online store and keep our site, non-sensitive data and sensitive data on the
same server. Sure, we can use website security measures provided by the hosting service and
the eCommerce platform’s security features to protect against cyberattacks and fraud.
However, our sensitive data is now vulnerable to attacks through the site and the online store
platform. Any attack that breaches either our site or the online store platform enables the
cybercriminal to potentially access our database, as well.
To mitigate these security risks, we separate our database servers from everything else.
Additionally, we can use real-time Security Information and Event Management (SIEM) tools
to monitor database security events, allowing organizations to take immediate action in the
event of an attempted breach.
3. Set up an HTTPS proxy server
A proxy server evaluates requests sent from a workstation before accessing the database server.
In a way, this server acts as a gatekeeper that aims to keep out non-authorized requests.
The most common proxy servers are based on HTTP. However, if we are dealing with sensitive
information such as passwords, payment information or personal information, we should set up
an HTTPS proxy server. This way, the data traveling through the proxy server is also encrypted,
giving us an additional layer of security.
4. Avoid using default network ports
TCP and UDP protocols are used when transmitting data between servers, and database
services listen on well-known default ports when they are installed (for example, 3306 for
MySQL and 5432 for PostgreSQL).
Default ports are often targeted in brute force attacks precisely because they are so common.
When the default ports are not used, an attacker who targets the server must find the port by
trial and error, trying different port numbers. The additional work needed could discourage the
attacker from prolonging the attack.
However, when assigning a new port, we should check the Internet Assigned Numbers
Authority’s (IANA) port registry to ensure the new port isn’t used for other services.
5. Use real-time database monitoring
Actively scanning a database for breach attempts bolsters security and allows us to react to
potential attacks. We can use monitoring software such as Tripwire’s real-time File Integrity
Monitoring (FIM) to log all actions taken on the database’s server and alert us of any breaches.
Furthermore, we should set up escalation protocols for potential attacks to keep our sensitive
data even safer.
Another aspect to consider is regularly auditing the database security and organizing
cybersecurity penetration tests. These allow us to discover potential security loopholes and
patch them before a breach occurs.
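One common technique behind file integrity monitoring is to record a cryptographic hash of each monitored file and periodically compare the current hashes with that baseline. The following Python sketch shows only this core idea; it is not how any particular product such as Tripwire is implemented, and the file names are hypothetical.

```python
import hashlib
import os
import tempfile

def snapshot(paths):
    """Record a SHA-256 fingerprint of each monitored file."""
    result = {}
    for path in paths:
        with open(path, "rb") as f:
            result[path] = hashlib.sha256(f.read()).hexdigest()
    return result

def changed_files(baseline, paths):
    """Return the monitored files whose contents differ from the baseline."""
    current = snapshot(paths)
    return [p for p in paths if current[p] != baseline.get(p)]

# Demo: two temporary files stand in for hypothetical database configuration files.
folder = tempfile.mkdtemp()
files = [os.path.join(folder, name) for name in ("db.conf", "users.conf")]
for path in files:
    with open(path, "w") as f:
        f.write("original contents\n")

baseline = snapshot(files)
with open(files[1], "a") as f:
    f.write("attacker_account = yes\n")  # an unauthorized change

print([os.path.basename(p) for p in changed_files(baseline, files)])  # ['users.conf']
```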
6. Use database and web application firewalls
Firewalls are the first layer of defense for keeping out malicious access attempts. On top of
protecting a site, we should also install a firewall to protect the database against different attack
vectors.

There are three types of firewalls commonly used to secure a network:
• Packet filter firewall
• Stateful packet inspection (SPI)
• Proxy server firewall
We must make sure to configure our firewall correctly so that it closes any security loopholes.
It is also essential to keep our firewalls updated, as this protects our site and database against
new cyberattack methods.
7. Deploy data encryption protocols
Encrypting our data isn’t just important for keeping trade secrets; it is also essential when
moving or storing sensitive user information. Setting up data encryption protocols lowers the
risk of a successful data breach: even if cybercriminals get hold of our data, the information
remains unreadable to them without the decryption key.
8. Create regular backups of the database
While it is common to create backups of our website, it is essential to create backups for our
database regularly, as well. This mitigates the risk of losing sensitive information due to
malicious attacks or data corruption.
Backups should be created on whichever server platform hosts the database, such as Windows
or Linux. To further increase security, ensure that the backup is encrypted and stored on a
separate server. This way, our data is recoverable and safe if the primary database server is
compromised or becomes inaccessible.
9. Keep applications up to date
Research shows that nine in 10 applications contain outdated software components.
Furthermore, analysis on WordPress plugins revealed that 17,383 plugins hadn’t been updated
for two years, 13,655 for three years and 3,990 for seven years (reported as at early 2022).
Together, this creates a serious security risk when thinking about software that we use to
manage our database or even run our website.
While we should only use trusted and verified database management software, we should also
keep it updated and install new patches as soon as they become available. The same goes for
widgets, plugins and third-party applications; we should steer clear altogether of those that do
not receive regular updates.
10. Use strong user authentication
Verizon’s Data Breach Investigations Report has found that around 80% of hacking-related
breaches involve weak or compromised passwords. This shows that passwords alone are not a
sufficient security measure, primarily because of human error in creating and managing
strong passwords.
To combat this issue and add another layer of security to a database, we set up a multi-factor
authentication process. This method is not foolproof, since attackers have recently found ways
to bypass some forms of multi-factor authentication, but even if credentials are compromised,
cybercriminals will have a difficult time getting around this security protocol.
Also, we should consider allowing only validated Internet Protocol (IP) addresses to access the
database, to further mitigate the risk of a breach. While IP addresses can be copied or masked,
doing so requires additional effort from the attacker.
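Restricting database access to validated IP addresses amounts to checking each client address against an allowlist. In practice this is configured in the DBMS or the firewall (PostgreSQL's pg_hba.conf file is one example); the following Python sketch, using the standard ipaddress module, only illustrates the check, and the network addresses are hypothetical. Because addresses can be copied or masked, this supplements authentication rather than replacing it.

```python
import ipaddress

# Hypothetical allowlist: the application server subnet and one admin workstation.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.1.0/24"),
    ipaddress.ip_network("192.168.50.10/32"),
]

def is_allowed(client_ip):
    """Return True if client_ip falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)

print(is_allowed("10.0.1.25"))    # True
print(is_allowed("203.0.113.7"))  # False
```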

11. Enhance database security to mitigate the risks of a data breach
Keeping the database secure against malicious attacks is a multi-faceted endeavor, from the
servers’ physical location to mitigating the risk of human error. Even though data breaches are
becoming more frequent, maintaining healthy security protocols lowers the risk of being
targeted and helps to avoid a successful breach attempt.

15.8 Role of Database Administrator in Data Security


A database administrator (DBA) is a person responsible for the installation, configuration,
upgrade, administration, monitoring and maintenance of databases in an organization. The role
includes the development and design of database strategies, system monitoring, improving
database performance and capacity, and planning for future expansion requirements. DBAs may
also plan, coordinate and implement security measures to safeguard the database.
A database administrator’s responsibilities can include the following tasks:
1. Installing and upgrading the database server and application tools.
2. Allocating system storage and planning future storage requirements for the database
system
3. Modifying the database structure, as necessary, from information given by application
developers.
4. Enrolling users and maintaining system security.
5. Ensuring compliance with database vendor license agreement.
6. Controlling and monitoring user access to the database.
7. Monitoring and optimizing the performance of the database.
8. Planning for backup and recovery of database information.
9. Maintaining archive data.
10. Backing up and restoring databases.
11. Contacting the database vendor for technical support.
12. Generating various reports by querying the database as needed.
13. Define the roles of a database administrator in data security
14. Define backup and list its importance in data security

15.9 How to Secure a Database Server


A database server is a physical or virtual machine running the database. Securing a database
server, also known as “hardening”, is a process that includes physical security, network
security, and secure operating system configuration.
1. Ensure Physical Database Security
Refrain from sharing a server for web applications and database applications, if the database
contains sensitive data. Although it could be cheaper, and easier, to host a site and database
together on a hosting provider, we are placing the security of our data in someone else’s hands.

If we do rely on a web hosting service to manage our database, we should ensure that it is a
company with a strong security track record. It is best to stay clear of free hosting services due
to the possible lack of security.
If we manage our database in an on-premises data center, we should keep in mind that our data
center is also prone to attacks from outsiders or insider threats. We must ensure we have
physical security measures, including locks, cameras, and security personnel, in our physical
facility. Any access to physical servers must be logged and only granted to authorized individuals.
In addition, we should not leave database backups in locations that are publicly accessible, such
as temporary partitions, web folders, or unsecured cloud storage buckets.
2. Lock Down Accounts and Privileges
Let us consider the Oracle database server. After the database is installed, the Oracle Database
Configuration Assistant (DBCA) automatically expires and locks most of the default database
user accounts.
If we install an Oracle database manually, this doesn’t happen and default privileged accounts
won’t be expired or locked. Their password stays the same as their username, by default.
An attacker will try to use these credentials first to connect to the database.
It is critical to ensure that every privileged account on a database server is configured with a
strong, unique password. If accounts are not needed, they should be expired and locked.
For the remaining accounts, access has to be limited to the absolute minimum required. Each
account should only have access to the tables and operations (for example, SELECT or
INSERT) required by the user. We should avoid creating user accounts with access to every
table in the database.
3. Regularly Patch Database servers
We must ensure that patches remain current. Effective database patch management is a crucial
security practice because attackers are actively seeking out new security flaws in databases,
and new viruses and malware appear on a daily basis. A timely deployment of up-to-date
versions of database service packs, critical security hotfixes, and cumulative updates will
improve the stability of database performance.
4. Disable Public Network Access
Organizations store their application data in databases. In most real-world scenarios, the end
user doesn’t require direct access to the database. Thus, we should block all public network
access to database servers unless we are a hosting provider. Ideally, an organization should set
up gateway servers (VPN or SSH tunnels) for remote administrators.
5. Encrypt All Files and Backups
Irrespective of how solid our defenses are, there is always a possibility that a hacker may
infiltrate our system. Moreover, attackers are not the only threat to the security of our database:
our employees may also pose a risk to our business. There is always the possibility that a
malicious or careless insider will gain access to a file they don’t have permission to access.
Encrypting our data makes it unreadable to both attackers and employees. Without an
encryption key, they cannot read it; this provides a last line of defense against unwelcome
intrusions. Encrypt all important application files, data files, and backups so that unauthorized
users cannot read our critical data.

15.10 Security in SQLs

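There are two main approaches to shielding data from unauthorized access in SQL. The first
approach is to restrict what users can see by means of views:

CREATE VIEW WhatCanBeSeen AS
SELECT column_list
FROM table_name
WHERE condition
GROUP BY column_list;

(ORDER BY is better applied when querying the view; some DBMSs, such as SQL Server,
restrict its use in a view definition.)

A separate view can be defined for each group of users, exposing only the items that group
needs; the users in that group are then given access to the view rather than to the
underlying table:

CREATE VIEW WhatThisUserCanSee AS
SELECT required_columns
FROM table_name;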
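The effect of a view can be tried out with Python's built-in sqlite3 module. SQLite does not support GRANT, so this sketch only shows the filtering half of the technique: the hypothetical MarketingStaff view exposes marketing staff without their salaries. In a multi-user DBMS, users would be granted SELECT on the view but not on the underlying staff table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
INSERT INTO staff VALUES (1, 'Johnson', 'Marketing', 10000),
                         (2, 'John', 'Accounting', 15000),
                         (3, 'Allen', 'Info. System', 20000);
-- Hypothetical view: marketing users see only marketing staff, and never the salary column.
CREATE VIEW MarketingStaff AS
    SELECT id, name, dept FROM staff WHERE dept = 'Marketing';
""")

print(conn.execute("SELECT * FROM MarketingStaff").fetchall())
# [(1, 'Johnson', 'Marketing')]
```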

The second approach is to use the GRANT and REVOKE statements:

GRANT privilege_type [(column_list)] | ALL PRIVILEGES
ON TABLE table_name(s)
TO user_name(s) | PUBLIC;

If no column list is specified, the privilege applies to all the columns of the table. The
privilege type could be SELECT, UPDATE, DELETE or INSERT. REVOKE withdraws a privilege
that was granted earlier.

Another example, granting SalesAdmin permission to create an index on Staff (the INDEX
privilege is DBMS-specific, e.g. MySQL):

GRANT INDEX
ON TABLE Staff
TO SalesAdmin;

REVOKE INSERT, DELETE
ON TABLE SP
FROM user_name(s);

GRANT SELECT
ON TABLE X
TO user_name(s)
WITH GRANT OPTION;

WITH GRANT OPTION means that the user can also grant the same privilege to other users.
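The effect of GRANT, REVOKE and WITH GRANT OPTION can be modelled with a small privilege table. The following Python class is a hypothetical teaching sketch, not the API of any DBMS, and it simplifies real behavior: it does not check who issues a REVOKE, and it does not cascade a revocation to privileges the user had passed on (in standard SQL this is governed by REVOKE ... CASCADE or RESTRICT). The user names are also hypothetical.

```python
class Privileges:
    """Toy model of SQL discretionary access control (hypothetical, not a DBMS API)."""

    def __init__(self, owners):
        self.owners = owners  # table name -> user who owns (created) it
        self.grants = {}      # (grantee, privilege, table) -> True if WITH GRANT OPTION

    def allowed(self, user, privilege, table):
        """The owner holds every privilege; other users need a matching grant."""
        return self.owners.get(table) == user or (user, privilege, table) in self.grants

    def can_grant(self, user, privilege, table):
        """Only the owner, or a holder WITH GRANT OPTION, may pass a privilege on."""
        return (self.owners.get(table) == user
                or self.grants.get((user, privilege, table), False))

    def grant(self, grantor, privilege, table, grantee, with_grant_option=False):
        if not self.can_grant(grantor, privilege, table):
            raise PermissionError(f"{grantor} may not grant {privilege} on {table}")
        key = (grantee, privilege, table)
        self.grants[key] = self.grants.get(key, False) or with_grant_option

    def revoke(self, privilege, table, grantee):
        # Simplification: no check on who revokes, and no cascading.
        self.grants.pop((grantee, privilege, table), None)


db = Privileges(owners={"X": "dba", "SP": "dba"})
db.grant("dba", "SELECT", "X", "ade", with_grant_option=True)  # GRANT SELECT ON TABLE X TO ade WITH GRANT OPTION
db.grant("ade", "SELECT", "X", "bola")                          # ade passes SELECT on to bola
db.grant("dba", "INSERT", "SP", "bola")                         # GRANT INSERT ON TABLE SP TO bola
db.revoke("INSERT", "SP", "bola")                               # REVOKE INSERT ON TABLE SP FROM bola

print(db.allowed("bola", "SELECT", "X"))   # True
print(db.allowed("bola", "INSERT", "SP"))  # False
try:
    db.grant("bola", "SELECT", "X", "chidi")  # bola was not given the grant option
except PermissionError as err:
    print(err)  # bola may not grant SELECT on X
```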

Summary

While security refers to the protection of data against unauthorized disclosure, alteration or
destruction, integrity refers to the validity or meaningfulness of the data. Both data security and
integrity are ensured through constraints on the access and update operations that different
users may perform on different objects in the database. Tables, queries, views, reports, indexes
and the database itself are objects of the database. The constraints might be provided by the
DBMS or might have to be specified by the DBA in line with an organization's policy on
database security.

Self-Assessment Question (SAQs)

Discuss the measures to take to fully secure a database resource in an organization

Unit 16 Database Transactions and Concurrency Controls

Expected Duration: 1 week or 2 contact hours

Introduction
Earlier, you learned about the functions that a DBMS should provide. Some of these closely
related functions are intended to ensure that a database is reliable and remains in a consistent
state.

These functions are:

• Transaction support
• Concurrency control
• Recovery services

Although each function can be discussed separately, they are mutually dependent. Reliability
and consistency must be maintained in the presence of failures of both hardware and software
components, and when several users are accessing the database.

Many DBMSs allow users to carry out simultaneous operations on the database. If these
operations are not restricted, the accesses may interfere with one another, and the database can
become inconsistent. To overcome this problem, the DBMS implements a concurrency control
protocol that prevents database accesses from interfering with one another. In this Unit, you
will learn about concurrency control and transaction support in a centralized DBMS that
consists of a single database.
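The kind of interference described above can be seen in the classic lost update problem. The following Python sketch simulates two transactions on one balance step by step, rather than with real threads, so the outcome is deterministic; the amounts and the function run_schedule are hypothetical. T1 withdraws N1000 and T2 deposits N500.

```python
def run_schedule(schedule, balance):
    """Execute the read/write steps of two transactions in the given order.

    'read' copies the shared balance into the transaction's local variable;
    'write' stores that local value plus the transaction's change back.
    """
    changes = {"T1": -1000, "T2": +500}  # hypothetical: T1 withdraws, T2 deposits
    local = {}
    for txn, action in schedule:
        if action == "read":
            local[txn] = balance
        else:
            balance = local[txn] + changes[txn]
    return balance

serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run_schedule(serial, 5000))       # 4500: both updates applied
print(run_schedule(interleaved, 5000))  # 5500: T1's withdrawal is lost
```

In the interleaved schedule T2 reads the balance before T1 has written its update, so T2's write overwrites T1's; concurrency control exists to rule out such schedules.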

Learning Outcomes
At the end of this Unit, you should be able to understand and explain:

16.1 Meaning of Database Transaction


16.2 Properties of Transaction
16.3 Concurrency Controls
16.4 The Need for Concurrency Control

16.1 Meaning of Database Transaction


A transaction is an action or series of actions that are being performed by a single user or
application program, which reads or updates the contents of the database.

A transaction can be defined as a logical unit of work on the database. This may be an entire
program, a piece of a program, or a single command (such as the SQL commands INSERT
or UPDATE), and it may involve any number of operations on the database. In the database
context, the execution of an application program can be thought of as one or more transactions
with non-database processing taking place in between.

Example of a Transaction in DBMS


A simple example of a transaction involves the bank accounts of two users, say Akinola and
Hammed. A simple transaction moving an amount of N5000 from Akinola to Hammed involves
many low-level jobs. As the N5000 is transferred from Akinola’s account to Hammed’s
account, a series of tasks is performed behind the scenes.

This straightforward, small transaction includes several steps. To debit N5000 from Akinola's
account:

OpenAccount(Akinola)
OldBal = Akinola.bal
NewBal = OldBal - 5000
Akinola.bal = NewBal
CloseAccount(Akinola)

In other words, the transaction involves many tasks, such as opening Akinola's account,
reading the old balance, deducting N5000 from it, saving the new balance to Akinola's
account, and finally closing the session.
To credit N5000 to Hammed's account, the same sort of tasks needs to be done:

OpenAccount(Hammed)
OldBal = Hammed.bal
NewBal = OldBal + 5000
Hammed.bal = NewBal
CloseAccount(Hammed)
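The same transfer can be written as a single database transaction. The following sketch uses Python's built-in sqlite3 module; the account table and the starting balances are hypothetical. Using the connection as a context manager commits both updates together if they succeed and rolls both back if either fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical account table and starting balances.
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, bal INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("Akinola", 20000), ("Hammed", 5000)])
conn.commit()

def transfer(conn, source, target, amount):
    """Debit source and credit target as one transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE account SET bal = bal - ? WHERE owner = ?", (amount, source))
        conn.execute("UPDATE account SET bal = bal + ? WHERE owner = ?", (amount, target))

transfer(conn, "Akinola", "Hammed", 5000)
print(conn.execute("SELECT owner, bal FROM account ORDER BY owner").fetchall())
# [('Akinola', 15000), ('Hammed', 10000)]
```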

16.2 Properties of Transaction


There are properties that all transactions should possess. The four basic properties are
together termed the ACID properties; the ACID concept of a transaction was put forward by
Haerder and Reuter in 1983. ACID stands for:

• Atomicity: The 'all or nothing' property. A transaction is an indivisible unit that is either performed in its entirety or not performed at all. Ensuring atomicity is the responsibility of the recovery subsystem of the DBMS.

• Consistency: A transaction must transform the database from one consistent state to another consistent state. Ensuring consistency is the joint responsibility of the DBMS and the application developers. The DBMS can enforce consistency by applying all the constraints specified on the database schema, such as integrity and enterprise constraints.

• Isolation: Transactions execute independently of one another. In other words, the partial effects of incomplete transactions should not be visible to other transactions running at the same time. Ensuring isolation is the responsibility of the concurrency control subsystem.

• Durability: The effects of a successfully completed (committed) transaction are permanently recorded in the database and must not be lost because of a subsequent failure. Ensuring durability is the responsibility of the recovery subsystem.
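The consistency property can be made concrete with a schema constraint. The sketch below uses SQLite through Python's sqlite3 module and a hypothetical Account table whose CHECK constraint forbids negative balances. A transfer that would overdraw an account violates the constraint. The DBMS rejects the statement, and rolling back the whole transaction returns the database to its last consistent state.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The schema states the consistency rule: no balance may be negative.
    conn.execute("""CREATE TABLE Account (
                        Owner TEXT PRIMARY KEY,
                        Bal   INTEGER NOT NULL CHECK (Bal >= 0))""")
    conn.executemany("INSERT INTO Account VALUES (?, ?)",
                     [("Akinola", 3000), ("Hammed", 1000)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> str:
    try:
        with conn:
            # Credit first, then debit: if the debit breaks the CHECK
            # constraint, the credit is rolled back together with it.
            conn.execute("UPDATE Account SET Bal = Bal + ? WHERE Owner = ?",
                         (amount, dst))
            conn.execute("UPDATE Account SET Bal = Bal - ? WHERE Owner = ?",
                         (amount, src))
        return "committed"
    except sqlite3.IntegrityError:      # raised when the CHECK constraint fails
        return "rolled back"


def state(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT Owner, Bal FROM Account ORDER BY Owner"))
```

Crediting before debiting is deliberate in this sketch. It shows that the constraint failure undoes a change that had already succeeded, and not only the failing statement.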

16.3 Concurrency Controls

Transactions are closely related to concurrency control. Concurrency control in a database is the process of managing simultaneous operations on the database so that they do not interfere with one another.
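The classic lost update problem shows what such interference looks like. It can be replayed in a few lines of plain Python. This is a simulation rather than a DBMS: the two transactions, the amounts and the (transaction, operation, amount) step format are all hypothetical. Each transaction reads the shared balance into a private copy, changes the copy, and writes it back.

```python
def run_schedule(schedule, initial_balance):
    """Replay read/write steps of concurrent transactions in the given order.

    Each step is (transaction, operation, amount). A "read" copies the shared
    balance into the transaction's private workspace; a "write" adds `amount`
    to that private copy and stores it back into the shared balance.
    """
    shared = initial_balance
    workspace = {}                      # each transaction's private copy
    for txn, op, amount in schedule:
        if op == "read":
            workspace[txn] = shared
        elif op == "write":
            workspace[txn] += amount
            shared = workspace[txn]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return shared


# T1 withdraws N5000 and T2 deposits N2000 on the same account.
serial = [("T1", "read", 0), ("T1", "write", -5000),
          ("T2", "read", 0), ("T2", "write", 2000)]
interleaved = [("T1", "read", 0), ("T2", "read", 0),
               ("T1", "write", -5000), ("T2", "write", 2000)]

if __name__ == "__main__":
    print(run_schedule(serial, 20000))       # 17000
    print(run_schedule(interleaved, 20000))  # 22000: T1's withdrawal is lost
```

Running one transaction after the other gives the correct N17000. In the interleaved order, T2 overwrites T1's result with a value computed from a stale read, so the N5000 withdrawal disappears.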

16.4 The Need for Concurrency Control

A key purpose of a database is to allow multiple users to access shared data in parallel (i.e., at the same time). Concurrent access is comparatively easy when all users are only reading data, because there is no way for them to interfere with one another. However, when multiple users access the database at the same time and at least one of them is updating data, they may interfere with one another, which can result in data inconsistencies.
Concurrency control techniques implement protocols that can be broadly classified into two categories:

1. Lock-based protocol: Database systems that use lock-based protocols employ a mechanism in which a transaction cannot read or write a data item until it has acquired a suitable lock on it.

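2. Timestamp-based protocol: This is one of the most frequently used concurrency control protocols. Each transaction is given a timestamp when it starts, and conflicting operations are ordered by those timestamps. The protocol uses either the system time or a logical counter as the timestamp.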
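Both categories can be sketched in plain Python. In the first part, Python threads stand in for concurrent transactions, and a `threading.Lock` stands in for a DBMS lock manager granting an exclusive lock on one data item. The second part applies the textbook basic timestamp-ordering rules (without refinements such as Thomas's write rule) to a single item, using `itertools.count` as the logical counter. The class and function names are illustrative only.

```python
import itertools
import threading


# --- 1. Lock-based: no read or write without first holding the item's lock ---
class Account:
    def __init__(self, bal: int) -> None:
        self.bal = bal
        self.lock = threading.Lock()    # exclusive lock on this data item


def deposit_many(acct: Account, amount: int, times: int) -> None:
    for _ in range(times):
        with acct.lock:                 # acquire the lock before reading
            old_bal = acct.bal          # read
            acct.bal = old_bal + amount  # write, then the lock is released


def locked_demo(workers: int = 4, times: int = 10_000) -> int:
    acct = Account(0)
    threads = [threading.Thread(target=deposit_many, args=(acct, 1, times))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acct.bal                     # always workers * times


# --- 2. Basic timestamp ordering for a single data item ---
class TimestampedItem:
    def __init__(self) -> None:
        self.read_ts = 0    # largest timestamp of any transaction that read it
        self.write_ts = 0   # largest timestamp of any transaction that wrote it

    def read(self, ts: int) -> str:
        if ts < self.write_ts:          # a younger transaction already wrote it
            return "abort"
        self.read_ts = max(self.read_ts, ts)
        return "ok"

    def write(self, ts: int) -> str:
        if ts < self.read_ts or ts < self.write_ts:
            return "abort"              # a younger transaction already read or wrote it
        self.write_ts = ts
        return "ok"


def timestamp_demo() -> list:
    clock = itertools.count(1)          # logical counter used as timestamps
    t1, t2 = next(clock), next(clock)   # T1 started before T2
    item = TimestampedItem()
    # The lost-update interleaving: T1 read, T2 read, T1 write, T2 write.
    return [item.read(t1), item.read(t2), item.write(t1), item.write(t2)]


if __name__ == "__main__":
    print(locked_demo())      # 40000
    print(timestamp_demo())   # ['ok', 'ok', 'abort', 'ok']
```

With the lock held across each read and write, no deposit is lost. Under timestamp ordering, T1's write is refused because the younger T2 has already read the item. A refused transaction would normally be restarted with a new, later timestamp, which this sketch does not show.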

Practical Tasks on Microsoft Access RDBMS

Creating Your Microsoft Access Database Tables


1. Open MS Access. Blank Database is selected by default. Type the name you want to give the database in the File Name box at the lower right corner, then click Create.
2. Suppose we want to create tables for Customer and Product like the ones below.

Table Customer:

Table Product:

(a) As soon as MS Access opens, a table called Table1 is displayed by default. You can use Table1 for the Customer table. Right-click Table1 in the left pane and click Design View.
(b) In the dialog box that pops up, type Customer as the new name for Table1.
(c) Change the default field name ID to CustID. Observe that this field has already been chosen as the primary key for the table.
(d) Change the Data Type from AutoNumber to Text.
(e) In the lower pane, change the Field Size to 10, or to the maximum number of characters you need. You may leave the Description column empty for now; it is where you type a description of each field (a form of data dictionary).
(f) Click the second row and type CustName as the field name. Click the Data Type cell; Text is selected by default. Change the field size to something like 30.
(g) Continue in this way to create the other fields, then save the table.
(h) To enter records into the table, double-click the Customer table in the Tables pane and add the records.
(i) To create the second table, click the Create tab, then Table Design.
(j) Type all the field names and their data types.
(k) To make PID the primary key, click the PID cell. Look for the key symbol at the left of the window that pops up and click the key icon.

(l) Save the table as Product.

3. The third table will be a relationship (look-up) table linking the two base tables, Customer and Product. CustID and PID from the base tables will jointly form the primary key of this relationship table, so we set them up to "look up" their values from their source tables. We shall call it the Customer_Product table. A sample of it looks like the one below:

(a) Click Create, then Table Design.
(b) Type CustID in the first row and PID in the second row. Leave the data types as they are for now.
(c) Select the two rows and make them the primary key together. Note that the key symbol must appear on both rows, as shown below.

(d) Save the table as Customer_Product.
(e) In the Data Type cell for CustID, choose Lookup Wizard instead of Text or Number.
(f) Click Next and select the Customer table. Then click Next.
(g) Move CustID to the Selected Fields panel and click Next.
(h) Choose CustID as the sort field. Click Next.
(i) If you want to change the label for the field, you may do so here. Tick Enable Data Integrity, and then click Finish.
(j) Save the table; click Yes in the dialog box that appears, and then click OK.
(k) Repeat the same process for PID.
(l) Create the other fields in the table: OrderedDate, Quantity, InvoiceNo, etc.
(m) Add data to the database tables.
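The three tables built through the Access interface above can also be expressed as SQL table definitions. The sketch below runs them in SQLite through Python's sqlite3 module, so the SQL follows SQLite's dialect rather than Access's. SQLite does not enforce declared text sizes, so CHECK constraints stand in for Access's Field Size. REFERENCES clauses plus `PRAGMA foreign_keys = ON` approximate the Lookup Wizard's "Enable Data Integrity" option. ProdName and the sample values are hypothetical.

```python
import sqlite3


def build_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    # Customer: CustID is Text of size 10 and the primary key; CustName is Text(30).
    conn.execute("""CREATE TABLE Customer (
        CustID   TEXT NOT NULL PRIMARY KEY CHECK (length(CustID) <= 10),
        CustName TEXT CHECK (length(CustName) <= 30))""")
    conn.execute("""CREATE TABLE Product (
        PID      TEXT NOT NULL PRIMARY KEY CHECK (length(PID) <= 10),
        ProdName TEXT)""")                    # ProdName is a hypothetical field
    # Relationship table: CustID and PID together form the primary key, and
    # each one must match an existing row in its source table.
    conn.execute("""CREATE TABLE Customer_Product (
        CustID      TEXT NOT NULL REFERENCES Customer (CustID),
        PID         TEXT NOT NULL REFERENCES Product (PID),
        OrderedDate TEXT,
        Quantity    INTEGER,
        InvoiceNo   TEXT,
        PRIMARY KEY (CustID, PID))""")


def try_insert(conn: sqlite3.Connection, sql: str, row: tuple) -> bool:
    """Insert one row; return False if a key or integrity rule rejects it."""
    try:
        with conn:
            conn.execute(sql, row)
        return True
    except sqlite3.IntegrityError:
        return False


ORDER_SQL = "INSERT INTO Customer_Product VALUES (?, ?, ?, ?, ?)"

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    build_schema(db)
    try_insert(db, "INSERT INTO Customer VALUES (?, ?)", ("C001", "Akinola"))
    try_insert(db, "INSERT INTO Product VALUES (?, ?)", ("P001", "Printer"))
    print(try_insert(db, ORDER_SQL, ("C001", "P001", "2024-01-15", 2, "INV1")))  # True
    print(try_insert(db, ORDER_SQL, ("C001", "P001", "2024-02-01", 1, "INV2")))  # False: same key
    print(try_insert(db, ORDER_SQL, ("C999", "P001", "2024-02-01", 1, "INV3")))  # False: unknown customer
```

One consequence of the composite key (CustID, PID) is that a customer can have only one row for a given product. If repeat orders of the same product must be recorded, a field such as InvoiceNo would typically need to become part of the key.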

Where to type SQL statements in Microsoft Access 2007, 2010, 2013 or 2016

http://www.jaffainc.com/SQLStatementsInAccess.htm#Access2007

Follow this procedure:

1. After launching Microsoft Access, either select "More" to open an existing database or click "Blank Database" to create a new database. If you are creating a new database, type a name (any name is fine) for your database in the "File Name" box. Next, click the "Create" button.

Note: If you are opening an existing database (i.e., the downloaded course database), click "More" and then browse to the location where you saved the database on your computer.

2. Once Access opens, click “Create” on the menu running across the top of the screen.

3. Next, Click the “Query Design” button.

4. You'll see a “Show Table” dialog box. Click close on this dialog box without
selecting any tables.

5. Locate the “SQL View” or “SQL” button near the top left of the screen.

6. Use this button to select “SQL View” (click the down arrow on the button to find “SQL View”).

7. Type your SQL commands in this view (SQL View).

To run a command, click the "Run" button.

SQL Tutorials (Microsoft Access SQL)

1. SELECT Statement
Instructs the Microsoft Access database engine to return information from the database as a set
of records.
Syntax
SELECT [predicate] { * | table.* | [table.]field1 [AS alias1] [, [table.]field2 [AS alias2] [,
…]]} FROM tableexpression [, …] [IN externaldatabase] [WHERE… ] [GROUP
BY… ] [HAVING… ] [ORDER BY… ] [WITH OWNERACCESS OPTION]
The SELECT statement has these parts:

Part Description

predicate One of the following predicates: ALL, DISTINCT, DISTINCTROW, or TOP. You use the predicate to restrict the number of records returned. If none is specified, the default is ALL.

* Specifies that all fields from the specified table or tables are selected.

table The name of the table containing the fields from which records are
selected.

field1, field2 The names of the fields containing the data you want to retrieve. If you
include more than one field, they are retrieved in the order listed.

alias1, alias2 The names to use as column headers instead of the original column
names in table.

tableexpression The name of the table or tables containing the data you want to retrieve.

externaldatabase The name of the database containing the tables in tableexpression if they
are not in the current database.

Remarks
To perform this operation, the Microsoft® Jet database engine searches the specified table or
tables, extracts the chosen columns, selects rows that meet the criterion, and sorts or groups the
resulting rows into the order specified.
SELECT statements do not change data in the database.
SELECT is usually the first word in an SQL statement. Most SQL statements are either
SELECT or SELECT…INTO statements.
The minimum syntax for a SELECT statement is:
SELECT fields FROM table

You can use an asterisk (*) to select all fields in a table. The following example selects all of
the fields in the Employees table:

SELECT * FROM Employees;

If a field name is included in more than one table in the FROM clause, precede it with the table
name and the . (dot) operator. In the following example, the Department field is in both the
Employees table and the Supervisors table. The SQL statement selects departments from the
Employees table and supervisor names from the Supervisors table:

SELECT Employees.Department, Supervisors.SupvName FROM Employees INNER JOIN Supervisors ON Employees.Department = Supervisors.Department;

When a Recordset object is created, the Microsoft Jet database engine uses the table's field
name as the Field object name in the Recordset object. If you want a different field name or a
name is not implied by the expression used to generate the field, use the AS reserved word.
The following example uses the title Birth to name the returned Field object in the
resulting Recordset object:

SELECT BirthDate AS Birth FROM Employees;

Whenever you use aggregate functions or queries that return ambiguous or duplicate Field object names, you must use the AS clause to provide an alternate name for the Field object. The following example uses the title HeadCount to name the returned Field object in the resulting Recordset object:

SELECT COUNT(EmployeeID) AS HeadCount FROM Employees;

You can use the other clauses in a SELECT statement to further restrict and organize your
returned data. For more information, see the Help topic for the clause you are using.
Example
Some of the following examples assume the existence of a hypothetical Salary field in an
Employees table. Note that this field does not actually exist in the Northwind database
Employees table.
SELECT Count(PostalCode) AS Tally FROM Customers;
SELECT Count (*) AS TotalEmployees, Avg(Salary) AS AverageSalary, Max(Salary) AS
MaximumSalary FROM Employees;
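The effect of AS on returned column names can be checked outside Access. The sketch below uses SQLite through Python's sqlite3 module. Recordset and Field objects are Access/Jet concepts; here `cursor.description` plays a similar role by exposing each result column's name. The Employees rows and Salary values are hypothetical, matching the hypothetical Salary field assumed above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY, LastName TEXT, BirthDate TEXT, Salary INTEGER)""")
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)", [
        (1, "Akinola", "1980-05-10", 20000),
        (2, "Hammed", "1985-11-02", 25000),
        (3, "King", "1990-01-23", 30000)])
    return conn


def run(conn: sqlite3.Connection, sql: str):
    """Return (column names, rows) so the effect of AS aliases is visible."""
    cur = conn.execute(sql)
    return [col[0] for col in cur.description], cur.fetchall()


if __name__ == "__main__":
    db = make_db()
    print(run(db, "SELECT BirthDate AS Birth FROM Employees"))
    print(run(db, "SELECT COUNT(EmployeeID) AS HeadCount FROM Employees"))
    print(run(db, "SELECT COUNT(*) AS TotalEmployees, AVG(Salary) AS AverageSalary, "
                  "MAX(Salary) AS MaximumSalary FROM Employees"))
```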

2. WHERE Clause
Specifies which records from the tables listed in the FROM clause are affected by a SELECT, UPDATE, or DELETE statement.
Syntax
SELECT fieldlist FROM tableexpression WHERE criteria
A SELECT statement containing a WHERE clause has these parts:

Part Description

fieldlist The name of the field or fields to be retrieved along with any field-name
aliases, selection predicates (ALL, DISTINCT, DISTINCTROW, or
TOP), or other SELECT statement options.

tableexpression The name of the table or tables from which data is retrieved.

criteria An expression that records must satisfy to be included in the query results.

Remarks
The Microsoft Access database engine selects the records that meet the conditions listed in the
WHERE clause. If you do not specify a WHERE clause, your query returns all rows from the
table. If you specify more than one table in your query and you have not included a WHERE
clause or a JOIN clause, your query generates a Cartesian product of the tables.
WHERE is optional, but when included, follows FROM. For example, you can select all employees in the sales department (WHERE Dept = 'Sales') or all customers between the ages of 18 and 30 (WHERE Age Between 18 And 30).
If you do not use a JOIN clause to perform SQL join operations on multiple tables, the
resulting Recordset object will not be updatable.

WHERE is similar to HAVING. WHERE determines which records are selected. Similarly,
once records are grouped with GROUP BY, HAVING determines which records are displayed.
Use the WHERE clause to eliminate records you do not want grouped by a GROUP BY clause.

Use various expressions to determine which records the SQL statement returns. For example,
the following SQL statement selects all employees whose salaries are more than N21,000:

SELECT LastName, Salary FROM Employees WHERE Salary > 21000;

A WHERE clause can contain up to 40 expressions linked by logical operators, such as And and Or.
When you enter a field name that contains a space or punctuation, surround the name with
brackets ([ ]). For example, a customer information table might include information about
specific customers:

SELECT [Customer's Favorite Restaurant]

When you specify the criteria argument, date literals must be in U.S. format, even if you are
not using the U.S. version of the Microsoft® Jet database engine. For example, May 10, 1996,
is written 10/5/96 in the United Kingdom and 5/10/96 in the United States. Be sure to enclose
your date literals with the number sign (#) as shown in the following examples.
To find records dated May 10, 1996 in a United Kingdom database, you must use the following
SQL statement:

SELECT * FROM Orders WHERE ShippedDate = #5/10/96#;

You can also use the DateValue function which is aware of the international settings
established by Microsoft Windows®. For example, use this code for the United States:

SELECT * FROM Orders WHERE ShippedDate = DateValue('5/10/96');

And use this code for the United Kingdom:

SELECT * FROM Orders WHERE ShippedDate = DateValue('10/5/96');

Note

If the column referenced in the criteria string is of type GUID, the criteria expression uses a
slightly different syntax:

WHERE ReplicaID = {GUID {12345678-90AB-CDEF-1234-567890ABCDEF}}

Be sure to include the nested braces and hyphens as shown.

Example
This example selects the LastName and FirstName fields of each record in which the last name is King.

SELECT LastName, FirstName FROM Employees WHERE LastName = 'King';
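The WHERE examples above can be tried in SQLite from Python. The hypothetical Employees rows below include the Dept and Salary fields used in the text. The helper passes values through `?` placeholders instead of pasting them into the SQL string, which is the usual practice with Python's sqlite3 module. The criteria text itself is written by the programmer and is not user input. Access-specific forms, such as #-delimited date literals, are not SQLite syntax and are not shown.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (LastName TEXT, FirstName TEXT, Dept TEXT, Salary INTEGER)")
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)", [
        ("Akinola", "Samuel", "Sales", 20000),
        ("Hammed", "Ade", "Sales", 25000),
        ("King", "Mary", "Accounts", 30000),
        ("Okafor", "Chidi", "Accounts", 18000)])
    return conn


def names_where(conn: sqlite3.Connection, criteria: str, params: tuple = ()) -> list:
    """Return LastName of every record satisfying `criteria` (the WHERE expression).

    `criteria` is fixed text written by the programmer; values go in `params`.
    """
    sql = f"SELECT LastName FROM Employees WHERE {criteria} ORDER BY LastName"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    db = make_db()
    print(names_where(db, "Salary > ?", (21000,)))
    print(names_where(db, "Dept = ? AND Salary BETWEEN ? AND ?", ("Sales", 18000, 30000)))
    print(names_where(db, "Dept = ? OR Salary > ?", ("Sales", 25000)))
    print(names_where(db, "LastName = ?", ("King",)))
```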

3. FROM Clause
Specifies the tables or queries that contain the fields listed in the SELECT statement.
Syntax:
SELECT fieldlist FROM tableexpression [IN externaldatabase]
A SELECT statement containing a FROM clause has these parts:

Part Description

fieldlist The name of the field or fields to be retrieved along with any field-name
aliases, SQL aggregate functions, selection predicates (ALL,
DISTINCT, DISTINCTROW, or TOP), or other SELECT statement
options.

tableexpression An expression that identifies one or more tables from which data is
retrieved. The expression can be a single table name, a saved query
name, or a compound resulting from an INNER JOIN, LEFT
JOIN, or RIGHT JOIN.

externaldatabase The full path of an external database containing all the tables
in tableexpression.

Remarks
FROM is required and follows any SELECT statement. The order of the table names in tableexpression is not important. For improved performance and ease of use, it is recommended that you use a linked table instead of an IN clause to retrieve data from an external database.
The following example shows how you can retrieve data from the Employees table:

SELECT LastName, FirstName FROM Employees;

Example
Some of the following examples assume the existence of a hypothetical Salary field in an
Employees table. Note that this field does not actually exist in the Northwind database
Employees table.

SELECT LastName,FirstName FROM Employees;
This next example counts the number of records that have an entry in the PostalCode field and
names the returned field Tally.
SELECT Count(PostalCode) AS Tally FROM Customers;
This example shows the number of employees and the average and maximum salaries.
SELECT Count (*) AS TotalEmployees, Avg(Salary) AS AverageSalary, Max(Salary) AS
MaximumSalary FROM Employees;
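The tableexpression in FROM can be a join, as described in the table above. The sketch below uses SQLite through Python's sqlite3 module to contrast INNER JOIN, LEFT JOIN, and listing two tables with no join condition at all, which produces the Cartesian product mentioned under the WHERE clause. The Employees and Supervisors rows are hypothetical. Fields are qualified with their table names because Department appears in both tables.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (LastName TEXT, Department TEXT)")
    conn.execute("CREATE TABLE Supervisors (SupvName TEXT, Department TEXT)")
    conn.executemany("INSERT INTO Employees VALUES (?, ?)", [
        ("Akinola", "Sales"), ("Hammed", "Sales"), ("King", "Accounts"), ("Bello", "Audit")])
    conn.executemany("INSERT INTO Supervisors VALUES (?, ?)", [
        ("Adeyemi", "Sales"), ("Okon", "Accounts")])
    return conn


# Only employees whose department has a supervisor.
INNER = """SELECT Employees.LastName, Supervisors.SupvName
           FROM Employees INNER JOIN Supervisors
             ON Employees.Department = Supervisors.Department
           ORDER BY Employees.LastName"""
# Every employee; SupvName is NULL (None in Python) when there is no match.
LEFT = """SELECT Employees.LastName, Supervisors.SupvName
          FROM Employees LEFT JOIN Supervisors
            ON Employees.Department = Supervisors.Department
          ORDER BY Employees.LastName"""
# No join condition: every employee paired with every supervisor (4 x 2 rows).
CARTESIAN = "SELECT Employees.LastName, Supervisors.SupvName FROM Employees, Supervisors"

if __name__ == "__main__":
    db = make_db()
    for sql in (INNER, LEFT, CARTESIAN):
        print(db.execute(sql).fetchall())
```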

4. GROUP BY Clause
This combines records with identical values in the specified field list into a single record. A
summary value is created for each record if you include an SQL aggregate function, such
as Sum or Count, in the SELECT statement.
Syntax
SELECT fieldlist FROM table WHERE criteria [GROUP BY groupfieldlist]
A SELECT statement containing a GROUP BY clause has these parts:

Part Description

fieldlist The name of the field or fields to be retrieved along with any field-name
aliases, SQL aggregate functions, selection predicates (ALL, DISTINCT,
DISTINCTROW, or TOP), or other SELECT statement options.

table The name of the table from which records are retrieved. For more
information, see the FROM clause.

criteria Selection criteria. If the statement includes a WHERE clause, the Microsoft Access database engine groups values after applying the WHERE conditions to the records.

groupfieldlist The names of up to 10 fields used to group records. The order of the field
names in groupfieldlist determines the grouping levels from the highest to
the lowest level of grouping.

Remarks: GROUP BY is optional.


Summary values are omitted if there is no SQL aggregate function in the SELECT statement.
Null values in GROUP BY fields are grouped and are not omitted. However, Null values are
not evaluated in any SQL aggregate function.
Use the WHERE clause to exclude rows you do not want grouped, and use the HAVING clause
to filter records after they have been grouped.
Unless it contains Memo or OLE Object data, a field in the GROUP BY field list can refer to any field in any table listed in the FROM clause, even if the field is not included in the SELECT statement, provided the SELECT statement includes at least one SQL aggregate function. The Microsoft® Jet database engine cannot group on Memo or OLE Object fields.
All fields in the SELECT field list must either be included in the GROUP BY clause or be
included as arguments to an SQL aggregate function.
Example
This example creates a list of unique job titles and the number of employees with each title.
SELECT Title, Count([Title]) AS Tally FROM Employees GROUP BY Title;
For each unique job title, this example calculates the number of employees in the Washington region (Region = 'WA') who have that title.
SELECT Title, Count(Title) AS Tally FROM Employees WHERE Region = 'WA' GROUP BY Title;
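The GROUP BY examples, including the remark about Null values, can be checked in SQLite from Python. In the hypothetical Employees data below, one employee has no title. That employee still forms a group of its own, but COUNT(Title) ignores the Null and reports 0, while COUNT(*) counts the row. In SQLite, Nulls sort before other values in ascending order, which is why the Null group comes first.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (LastName TEXT, Title TEXT, Region TEXT)")
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
        ("Akinola", "Sales Representative", "WA"),
        ("Hammed", "Sales Representative", "WA"),
        ("King", "Sales Manager", "WA"),
        ("Okafor", "Sales Representative", "Ibadan"),
        ("Bello", None, "WA")])          # no title recorded
    return conn


# COUNT(Title) skips Nulls; COUNT(*) counts every row in the group.
ALL_TITLES = """SELECT Title, COUNT(Title) AS Tally, COUNT(*) AS RowsInGroup
                FROM Employees GROUP BY Title ORDER BY Title"""
# WHERE removes rows before they are grouped.
WA_TITLES = """SELECT Title, COUNT(Title) AS Tally, COUNT(*) AS RowsInGroup
               FROM Employees WHERE Region = 'WA' GROUP BY Title ORDER BY Title"""

if __name__ == "__main__":
    db = make_db()
    print(db.execute(ALL_TITLES).fetchall())
    print(db.execute(WA_TITLES).fetchall())
```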

5. HAVING Clause
This specifies which grouped records are displayed in a SELECT statement with a GROUP
BY clause. After GROUP BY combines records, HAVING displays any records grouped by
the GROUP BY clause that satisfy the conditions of the HAVING clause.
Syntax
SELECT fieldlist FROM table WHERE selectcriteria GROUP
BY groupfieldlist [HAVING groupcriteria]
A SELECT statement containing a HAVING clause has these parts:

Part Description

fieldlist The name of the field or fields to be retrieved along with any field-name
aliases, SQL aggregate functions, selection predicates (ALL, DISTINCT,
DISTINCTROW, or TOP), or other SELECT statement options.

table The name of the table from which records are retrieved. For more
information, see the FROM clause.

selectcriteria Selection criteria. If the statement includes a WHERE clause, the Microsoft
Access database engine groups values after applying the WHERE
conditions to the records.

groupfieldlist The names of up to 10 fields used to group records. The order of the field
names in groupfieldlist determines the grouping levels from the highest to
the lowest level of grouping.

groupcriteria An expression that determines which grouped records to display.

Remarks: HAVING is optional

HAVING is similar to WHERE, which determines which records are selected. After records are grouped with GROUP BY, HAVING determines which records are displayed:

SELECT CategoryID, Sum(UnitsInStock) FROM Products GROUP BY CategoryID HAVING Sum(UnitsInStock) > 100;

A HAVING clause can contain up to 40 expressions linked by logical operators, such as And and Or.
Example
This example selects the job titles assigned to more than one employee in the Washington
region.
SELECT Title, Count(Title) as Total FROM Employees WHERE Region = 'WA' GROUP
BY Title HAVING Count(Title) > 1;
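The difference between filtering rows with WHERE and filtering groups with HAVING shows up clearly in a small SQLite example run from Python. The Products rows are hypothetical. The first query keeps only categories whose total stock exceeds 100. The second query first discards individual products with 65 units or fewer, so fewer rows feed each SUM, and only then applies the same HAVING test.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Products (ProductName TEXT, CategoryID INTEGER, UnitsInStock INTEGER)")
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
        ("Rice", 1, 60), ("Beans", 1, 70),
        ("Garri", 2, 40), ("Yam", 2, 30),
        ("Palm oil", 3, 150)])
    return conn


# HAVING filters groups AFTER they are formed.
HAVING_ONLY = """SELECT CategoryID, SUM(UnitsInStock) AS Stock FROM Products
                 GROUP BY CategoryID HAVING SUM(UnitsInStock) > 100
                 ORDER BY CategoryID"""
# WHERE removes rows BEFORE grouping, then HAVING filters the groups.
WHERE_THEN_HAVING = """SELECT CategoryID, SUM(UnitsInStock) AS Stock FROM Products
                       WHERE UnitsInStock > 65
                       GROUP BY CategoryID HAVING SUM(UnitsInStock) > 100
                       ORDER BY CategoryID"""

if __name__ == "__main__":
    db = make_db()
    print(db.execute(HAVING_ONLY).fetchall())        # categories 1 and 3
    print(db.execute(WHERE_THEN_HAVING).fetchall())  # only category 3
```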

6. ORDER BY Clause
This sorts a query's resulting records on a specified field or fields in ascending or descending
order.
Syntax:
SELECT fieldlist FROM table WHERE selectcriteria [ORDER BY field1 [ASC | DESC ][, field2 [ASC | DESC ]][, …]]

A SELECT statement containing an ORDER BY clause has these parts:

Part Description

fieldlist The name of the field or fields to be retrieved along with any field-name
aliases, SQL aggregate functions, selection predicates (ALL, DISTINCT,
DISTINCTROW, or TOP), or other SELECT statement options.

table The name of the table from which records are retrieved. For more
information, see the FROM clause.

selectcriteria Selection criteria. If the statement includes a WHERE clause, the Microsoft
Access database engine orders values after applying the WHERE conditions
to the records.

field1, field2 The names of the fields on which to sort records.

Remarks:
ORDER BY is optional. However, if you want your data displayed in sorted order, then you
must use ORDER BY.

The default sort order is ascending (A to Z, 0 to 9). Both of the following examples sort
employee names in last name order:

SELECT LastName, FirstName FROM Employees ORDER BY LastName;

SELECT LastName, FirstName FROM Employees ORDER BY LastName ASC;

To sort in descending order (Z to A, 9 to 0), add the DESC reserved word to the end of each
field you want to sort in descending order. The following example selects salaries and sorts
them in descending order:

SELECT LastName, Salary FROM Employees ORDER BY Salary DESC, LastName;

If you specify a field containing Memo or OLE Object data in the ORDER BY clause, an error
occurs. The Microsoft Jet database engine does not sort on fields of these types.
ORDER BY is usually the last item in an SQL statement.
You can include additional fields in the ORDER BY clause. Records are sorted first by the first
field listed after ORDER BY. Records that have equal values in that field are then sorted by
the value in the second field listed, and so on.
Example
The SQL statement shown in the following example uses the ORDER BY clause to sort records
by last name in descending order (Z-A).
SELECT LastName,FirstName FROM Employees ORDER BY LastName DESC;
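The sorting rules above (ascending by default, DESC per field, and later fields breaking ties in earlier ones) behave the same way in SQLite. The sketch below runs them from Python on hypothetical Employees rows in which two employees share a salary of 25000, so the second sort key decides their order.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (LastName TEXT, FirstName TEXT, Salary INTEGER)")
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
        ("Hammed", "Ade", 25000), ("King", "Mary", 18000),
        ("Akinola", "Samuel", 25000), ("Bello", "Tunde", 30000)])
    return conn


def last_names(conn: sqlite3.Connection, order_clause: str) -> list:
    """Return LastName values in the order produced by `order_clause`."""
    return [row[0] for row in conn.execute(f"SELECT LastName FROM Employees {order_clause}")]


if __name__ == "__main__":
    db = make_db()
    print(last_names(db, "ORDER BY LastName"))               # ascending by default
    print(last_names(db, "ORDER BY LastName ASC"))           # same result
    print(last_names(db, "ORDER BY Salary DESC, LastName"))  # ties broken by name
    print(last_names(db, "ORDER BY LastName DESC"))          # Z to A
```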

7. ALL, DISTINCT, DISTINCTROW, TOP Predicates


This specifies records selected with SQL queries.
Syntax
SELECT [ALL | DISTINCT | DISTINCTROW | [TOP n [PERCENT]]] FROM table
A SELECT statement containing these predicates has the following parts:

Part Description

ALL Assumed if you do not include one of the predicates. The Microsoft Access database engine selects all of the records that meet the conditions in the SQL statement. The following two examples are equivalent and return all records from the Employees table:

SELECT ALL * FROM Employees ORDER BY EmployeeID;
SELECT * FROM Employees ORDER BY EmployeeID;


DISTINCT Omits records that contain duplicate data in the selected fields.
To be included in the results of the query, the values for each
field listed in the SELECT statement must be unique. For
example, several employees listed in an Employees table may
have the same last name. If two records contain Smith in the
LastName field, the following SQL statement returns only one
record that contains Smith:

SELECT DISTINCT LastName FROM Employees;

If you omit DISTINCT, this query returns both Smith records.


If the SELECT clause contains more than one field, the
combination of values from all fields must be unique for a
given record to be included in the results.
The output of a query that uses DISTINCT is not updatable
and does not reflect subsequent changes made by other users.


DISTINCTROW Omits data based on entire duplicate records, not just duplicate
fields. For example, you could create a query that joins the
Customers and Orders tables on the CustomerID field. The
Customers table contains no duplicate CustomerID fields, but
the Orders table does because each customer can have many
orders. The following SQL statement shows how you can use
DISTINCTROW to produce a list of companies that have at
least one order but without any details about those orders:

SELECT DISTINCTROW CompanyName FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID ORDER BY CompanyName;

If you omit DISTINCTROW, this query produces multiple rows for each company that has more than one order.

DISTINCTROW has an effect only when you select fields
from some, but not all, of the tables used in the query.
DISTINCTROW is ignored if your query includes only one
table, or if you output fields from all tables.


TOP n [PERCENT] Returns a certain number of records that fall at the top or the
bottom of a range specified by an ORDER BY clause. Suppose
you want the names of the top 25 students from the class of
1994:
SELECT TOP 25 FirstName, LastName FROM Students
WHERE GraduationYear = 1994 ORDER BY
GradePointAverage DESC;

If you do not include the ORDER BY clause, the query will return an arbitrary set of 25 records from the Students table that satisfy the WHERE clause.

The TOP predicate does not choose between equal values. In the preceding example, if the twenty-fifth and twenty-sixth highest grade point averages are the same, the query will return 26 records.

You can also use the PERCENT reserved word to return a certain percentage of records that fall at the top or the bottom of a range specified by an ORDER BY clause. Suppose that, instead of the top 25 students, you want the bottom 10 percent of the class:

SELECT TOP 10 PERCENT FirstName, LastName FROM Students WHERE GraduationYear = 1994 ORDER BY GradePointAverage ASC;

The ASC predicate specifies a return of bottom values. The value that follows TOP must be an unsigned Integer.
TOP does not affect whether or not the query is updatable.


table The name of the table from which records are retrieved.

Example
This example creates a query that joins the Customers and Orders tables on the CustomerID
field. The Customers table contains no duplicate CustomerID fields, but the Orders table does
because each customer can have many orders. Using DISTINCTROW produces a list of
companies that have at least one order but without any details about those orders.
SELECT DISTINCTROW CompanyName FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY CompanyName;
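DISTINCT works the same way in SQLite, but SQLite has neither TOP nor DISTINCTROW. Its nearest equivalent to TOP is LIMIT, which, unlike Access's TOP, cuts ties arbitrarily. The sketch below, run from Python on hypothetical Students rows, shows DISTINCT, a plain LIMIT, and one way to emulate TOP's tie-keeping behaviour. It finds the n-th highest grade point average first and then keeps every student at or above it. DISTINCTROW has no SQLite counterpart and is not shown.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE Students (
        FirstName TEXT, LastName TEXT, GraduationYear INTEGER, GradePointAverage REAL)""")
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?)", [
        ("Ada", "Smith", 1994, 3.9), ("Bola", "Smith", 1994, 3.5),
        ("Chi", "Okon", 1994, 3.7), ("Dayo", "Bello", 1994, 3.5),
        ("Efe", "Ani", 1993, 4.0)])
    return conn


def distinct_last_names(conn: sqlite3.Connection) -> list:
    sql = "SELECT DISTINCT LastName FROM Students ORDER BY LastName"
    return [r[0] for r in conn.execute(sql)]


def limit_only(conn: sqlite3.Connection, n: int, year: int) -> list:
    """SQLite's LIMIT: exactly n rows, even if the n-th place is tied."""
    sql = """SELECT FirstName FROM Students WHERE GraduationYear = ?
             ORDER BY GradePointAverage DESC LIMIT ?"""
    return [r[0] for r in conn.execute(sql, (year, n))]


def top_with_ties(conn: sqlite3.Connection, n: int, year: int) -> list:
    """Top n students by GPA for `year`, keeping ties as Access's TOP n does."""
    sql = """SELECT FirstName FROM Students
             WHERE GraduationYear = ?
               AND GradePointAverage >= COALESCE(
                     (SELECT GradePointAverage FROM Students      -- n-th highest GPA
                      WHERE GraduationYear = ?
                      ORDER BY GradePointAverage DESC LIMIT 1 OFFSET ?),
                     (SELECT MIN(GradePointAverage) FROM Students -- fewer than n rows
                      WHERE GraduationYear = ?))
             ORDER BY GradePointAverage DESC, FirstName"""
    return [r[0] for r in conn.execute(sql, (year, year, n - 1, year))]


if __name__ == "__main__":
    db = make_db()
    print(distinct_last_names(db))       # Smith appears once
    print(limit_only(db, 3, 1994))       # 3 rows; which 3.5 student is kept is arbitrary
    print(top_with_ties(db, 3, 1994))    # 4 rows: both 3.5 students are kept
```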

8. DELETE Statement
This creates a delete query that removes records from one or more of the tables listed in
the FROM clause that satisfy the WHERE clause.
Syntax:
DELETE [table.*] FROM table WHERE criteria
The DELETE statement has these parts:

Part Description

table The optional name of the table from which records are deleted.

table The name of the table from which records are deleted.

criteria An expression that determines which records to delete.

Remarks:
DELETE is especially useful when you want to delete many records.

To drop an entire table from the database, you can use the Execute method with
a DROP statement. If you delete the table, however, the structure is lost. In contrast, when you
use DELETE, only the data is deleted; the table structure and all of the table properties, such
as field attributes and indexes, remain intact.
You can use DELETE to remove records from tables that are in a one-to-many relationship
with other tables. Cascade delete operations cause the records in tables that are on the many
side of the relationship to be deleted when the corresponding record in the one side of the
relationship is deleted in the query.
For example, in the relationship between the Customers and Orders tables, the Customers table
is on the one side and the Orders table is on the many side of the relationship. Deleting a record
from Customers results in the corresponding Orders records being deleted if the cascade delete
option is specified.
A delete query deletes entire records, not just data in specific fields. If you want to delete values
in a specific field, create an update query that changes the values to Null.
Important:
• After you remove records using a delete query, you cannot undo the operation. If you want to know which records were deleted, first examine the results of a select query that uses the same criteria, and then run the delete query.
• Maintain backup copies of your data at all times. If you delete the wrong records, you can retrieve them from your backup copies.
Example
This example deletes all records for employees whose title is Trainee. When the FROM clause
includes only one table, you do not have to list the table name in the DELETE statement.
DELETE * FROM Employees WHERE Title = 'Trainee';
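The advice to preview a delete with a select query, and the cascade behaviour between Customers and Orders, can both be sketched in SQLite from Python. Two dialect differences apply. SQLite writes `DELETE FROM` rather than Access's `DELETE * FROM`, and it only enforces foreign keys (including ON DELETE CASCADE) after `PRAGMA foreign_keys = ON`. The customers and orders are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # needed for cascades in SQLite
    conn.execute("CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT)")
    conn.execute("""CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID TEXT REFERENCES Customers (CustomerID) ON DELETE CASCADE)""")
    conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                     [("C1", "Ade Stores"), ("C2", "Bola Ventures")])
    conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, "C1"), (2, "C1"), (3, "C2")])
    conn.commit()
    return conn


def delete_customer(conn: sqlite3.Connection, cust_id: str) -> list:
    criteria = "CustomerID = ?"
    # Step 1: preview what the delete will remove, using the same criteria.
    doomed = conn.execute(
        f"SELECT CustomerID, CompanyName FROM Customers WHERE {criteria}",
        (cust_id,)).fetchall()
    # Step 2: run the delete query; the customer's orders go with it (cascade).
    with conn:
        conn.execute(f"DELETE FROM Customers WHERE {criteria}", (cust_id,))
    return doomed


if __name__ == "__main__":
    db = make_db()
    print(delete_customer(db, "C1"))
    print(db.execute("SELECT OrderID FROM Orders ORDER BY OrderID").fetchall())
```

Only the Customers row is named in the DELETE statement. Orders 1 and 2 disappear because they are on the many side of the relationship.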

9. IN Clause
This identifies tables in any external database to which the Microsoft Access database engine
can connect, such as a dBASE or Paradox database or an external Microsoft® Access database
engine database.
Syntax:
To identify a destination table:
[SELECT | INSERT] INTO destination IN {path | ["path" "type"] | ["" [type; DATABASE = path]]}
To identify a source table:

FROM tableexpression IN {path | ["path" "type"] | ["" [type; DATABASE = path]]}

A SELECT statement containing an IN clause has these parts:

Part Description

destination The name of the external table into which data is inserted.

tableexpression The name of the table or tables from which data is retrieved. This
argument can be a single table name, a saved query, or a compound
resulting from an INNER JOIN, LEFT JOIN, or RIGHT JOIN.

path The full path for the directory or file containing table.

type The name of the database type used to create table if a database is not a
Microsoft Access database engine database (for example, dBASE III,
dBASE IV, Paradox 3.x, or Paradox 4.x).

Remarks:
You can use IN to connect to only one external database at a time.
In some cases, the path argument refers to the directory containing the database files. For
example, when working with dBASE, Microsoft FoxPro®, or Paradox database tables,
the path argument specifies the directory containing .dbf or .db files. The table file name is
derived from the destination or tableexpression argument.
To specify a non-Microsoft Access database engine database, append a semicolon (;) to the
name, and enclose it in single (' ') or double (" ") quotation marks. For example, either 'dBASE
IV;' or "dBASE IV;" is acceptable.
You can also use the DATABASE reserved word to specify the external database. For example,
the following lines specify the same table:

…FROM Table IN "" [dBASE IV; DATABASE=C:\DBASE\DATA\SALES;];

…FROM Table IN "C:\DBASE\DATA\SALES" "dBASE IV;"

For improved performance and ease of use, use a linked table instead of IN.
You can also use the IN reserved word as a comparison operator in an expression. For more
information, see the In operator.

Example
The following table shows how you can use the IN clause to retrieve data from an external
database. In each example, assume the hypothetical Customers table is stored in an external
database.

Microsoft® Access database engine database:
SELECT CustomerID
FROM Customers
IN OtherDB.mdb
WHERE CustomerID Like "A*";

dBASE III or IV (to retrieve data from a dBASE III table, substitute "dBASE III;" for "dBASE IV;"):
SELECT CustomerID
FROM Customer
IN "C:\DBASE\DATA\SALES" "dBASE IV;"
WHERE CustomerID Like "A*";

dBASE III or IV using Database syntax:
SELECT CustomerID
FROM Customer
IN "" [dBASE IV; Database=C:\DBASE\DATA\SALES;]
WHERE CustomerID Like "A*";

Paradox 3.x or 4.x (to retrieve data from a Paradox version 3.x table, substitute "Paradox 3.x;" for "Paradox 4.x;"):
SELECT CustomerID
FROM Customer
IN "C:\PARADOX\DATA\SALES" "Paradox 4.x;"
WHERE CustomerID Like "A*";

Paradox 3.x or 4.x using Database syntax:
SELECT CustomerID
FROM Customer
IN "" [Paradox 4.x;Database=C:\PARADOX\DATA\SALES;]
WHERE CustomerID Like "A*";

A Microsoft Excel worksheet:
SELECT CustomerID, CompanyName
FROM [Customers$]
IN "c:\documents\xldata.xls" "EXCEL 5.0;"
WHERE CustomerID Like "A*"
ORDER BY CustomerID;

A named range in a worksheet:
SELECT CustomerID, CompanyName
FROM CustomersRange
IN "c:\documents\xldata.xls" "EXCEL 5.0;"
WHERE CustomerID Like "A*"
ORDER BY CustomerID;

117
10. INSERT INTO Statement
This adds a record or multiple records to a table. This is referred to as an append query.
Syntax:
Multiple-record append query:
INSERT INTO target [(field1[, field2[, …]])] [IN externaldatabase] SELECT
[source.]field1[, field2[, …]] FROM tableexpression

Single-record append query:

INSERT INTO target [(field1[, field2[, …]])] VALUES (value1[, value2[, …]])
The INSERT INTO statement has these parts:

Part Description

target The name of the table or query to append records to.

field1, field2 Names of the fields to append data to, if following a target argument, or
the names of fields to obtain data from, if following a source argument.

externaldatabase The path to an external database. For a description of the path, see
the IN clause.

source The name of the table or query to copy records from.

tableexpression The name of the table or tables from which records are inserted. This
argument can be a single table name or a compound resulting from
an INNER JOIN, LEFT JOIN, or RIGHT JOIN operation or a saved
query.

value1, value2 The values to insert into the specific fields of the new record. Each value
is inserted into the field that corresponds to the value's position in the
list: value1 is inserted into field1 of the new record, value2 into field2,
and so on. You must separate values with a comma, and enclose text
fields in quotation marks (' ').

Remarks
You can use the INSERT INTO statement to add a single record to a table using the single-
record append query syntax as shown above. In this case, your code specifies the name and
value for each field of the record. You must specify each of the fields of the record that a value
is to be assigned to and a value for that field. When you do not specify each field, the default
value or Null is inserted for missing columns. Records are added to the end of the table.
You can also use INSERT INTO to append a set of records from another table or query by
using the SELECT … FROM clause as shown above in the multiple-record append query
syntax. In this case, the SELECT clause specifies the fields to append to the
specified target table.

The source or target table may specify a table or a query. If a query is specified, the Microsoft
Access database engine appends records to any and all tables specified by the query.
INSERT INTO is optional but when included, precedes the SELECT statement.

If your destination table contains a primary key, make sure you append unique, non-Null values
to the primary key field or fields; if you do not, the Microsoft Access database engine will not
append the records.

If you append records to a table with an AutoNumber field and you want to renumber the
appended records, do not include the AutoNumber field in your query. Do include the
AutoNumber field in the query if you want to retain the original values from the field.
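The AutoNumber behaviour can be sketched with SQLite, used here through Python's sqlite3 module as a stand-in for Access: an INTEGER PRIMARY KEY column is filled in automatically when an inserted row omits it, much like an AutoNumber field. The table names Orders and OldOrders and their rows are hypothetical.

```python
import sqlite3


def append_demo(keep_original_ids):
    """Append OldOrders rows into Orders, with or without the key column.

    INTEGER PRIMARY KEY stands in for an AutoNumber field; all names and
    rows are hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Item TEXT)")
    con.execute("CREATE TABLE OldOrders (OrderID INTEGER PRIMARY KEY, Item TEXT)")
    con.execute("INSERT INTO Orders (Item) VALUES ('Milo')")  # receives OrderID 1
    con.executemany("INSERT INTO OldOrders VALUES (?, ?)",
                    [(10, "Pronto"), (20, "Ovaltine")])
    if keep_original_ids:
        # Including the key column copies the original values (10 and 20).
        con.execute("INSERT INTO Orders (OrderID, Item) "
                    "SELECT OrderID, Item FROM OldOrders")
    else:
        # Leaving it out lets the engine renumber the appended rows.
        con.execute("INSERT INTO Orders (Item) SELECT Item FROM OldOrders")
    ids = [row[0] for row in
           con.execute("SELECT OrderID FROM Orders ORDER BY OrderID")]
    con.close()
    return ids


print(append_demo(False))  # [1, 2, 3]
print(append_demo(True))   # [1, 10, 20]
```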
Use the IN clause to append records to a table in another database.
To create a new table, use the SELECT… INTO statement instead to create a make-table query.
To find out which records will be appended before you run the append query, first execute and
view the results of a select query that uses the same selection criteria.
An append query copies records from one or more tables to another. The tables that contain the
records you append are not affected by the append query.
Instead of appending existing records from another table, you can specify the value for each
field in a single new record using the VALUES clause. If you omit the field list, the VALUES
clause must include a value for every field in the table; otherwise, the INSERT operation will
fail. Use an additional INSERT INTO statement with a VALUES clause for each additional
record you want to create.
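A minimal sketch of the rule about omitting the field list, run on SQLite through Python's sqlite3 module rather than Access (SQLite enforces the same column-count rule, although its error message differs). The second employee record is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employees (FirstName TEXT, LastName TEXT, Title TEXT)")

# With no field list, VALUES must supply every column, in table order.
con.execute("INSERT INTO Employees VALUES ('Harry', 'Washington', 'Trainee')")

# Supplying fewer values than there are columns makes the statement fail.
try:
    con.execute("INSERT INTO Employees VALUES ('Ada', 'Obi')")
    short_insert_failed = False
except sqlite3.OperationalError as err:
    short_insert_failed = True
    print("Rejected:", err)

# Each additional record needs its own INSERT INTO ... VALUES statement.
con.execute("INSERT INTO Employees VALUES ('Ada', 'Obi', 'Clerk')")
row_count = con.execute("SELECT Count(*) FROM Employees").fetchone()[0]
print(row_count)  # 2
```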
Example
This example selects all records in a hypothetical New Customers table and adds them to the
Customers table. When individual columns are not designated, the SELECT table column
names must match exactly those in the INSERT INTO table.
INSERT INTO Customers SELECT * FROM [New Customers];
This example creates a new record in the Employees table.
INSERT INTO Employees (FirstName,LastName, Title) VALUES ('Harry', 'Washington',
'Trainee');
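The multiple-record example above can be tried on SQLite through Python's sqlite3 module, which also accepts [bracketed] table names. The customer rows are hypothetical. One difference to keep in mind: SQLite pairs the columns of SELECT * with the target's columns by position, not by name, so the two tables should list their columns in the same order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (CustomerID TEXT, CompanyName TEXT)")
con.execute("CREATE TABLE [New Customers] (CustomerID TEXT, CompanyName TEXT)")
con.execute("INSERT INTO Customers VALUES ('C001', 'Ajala Stores')")
con.executemany("INSERT INTO [New Customers] VALUES (?, ?)",
                [("C002", "Ifunanya Ventures"), ("C003", "Bodija Mart")])

# The append query from the example, unchanged.
con.execute("INSERT INTO Customers SELECT * FROM [New Customers]")

customers = con.execute("SELECT Count(*) FROM Customers").fetchone()[0]
new_customers = con.execute("SELECT Count(*) FROM [New Customers]").fetchone()[0]
print(customers, new_customers)  # 3 2  (the source table is not affected)
```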

11. SELECT…INTO Statement


This creates a make-table query.
Syntax:
SELECT field1[, field2[, …]] INTO newtable [IN externaldatabase] FROM source
The SELECT…INTO statement has these parts:

Part Description

field1, field2 The names of the fields to be copied into the new table.

newtable The name of the table to be created. It must conform to standard naming
conventions. If newtable is the same as the name of an existing table, a
trappable error occurs.

externaldatabase The path to an external database. For a description of the path, see
the IN clause.

source The name of the existing table from which records are selected. This can
be single or multiple tables or a query.

Remarks
You can use make-table queries to archive records, make backup copies of your tables, or make
copies to export to another database or to use as a basis for reports that display data for a
particular time period. For example, you could produce a Monthly Sales by Region report by
running the same make-table query each month.
• You may want to define a primary key for the new table. When you create the table, the
fields in the new table inherit the data type and field size of each field in the query's
underlying tables, but no other field or table properties are transferred.
• To add data to an existing table, use the INSERT INTO statement instead to create an
append query.
• To find out which records will be selected before you run the make-table query, first
examine the results of a SELECT statement that uses the same selection criteria.
Example
This example selects all records in the Employees table and copies them into a new table named
Emp Backup.
SELECT Employees.* INTO [Emp Backup] FROM Employees;
The following statement then deletes the table, since it was created only for demonstration.
DROP TABLE [Emp Backup];
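SQLite has no SELECT … INTO; its closest equivalent is CREATE TABLE … AS SELECT. The sketch below (Python's sqlite3 module, hypothetical employee rows) copies Employees into [Emp Backup] and then uses PRAGMA table_info to check that, as with a make-table query, the source table's primary key is not carried over.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employees "
            "(EmployeeID INTEGER PRIMARY KEY, LastName TEXT NOT NULL)")
con.executemany("INSERT INTO Employees (LastName) VALUES (?)",
                [("Ajala",), ("Ifunanya",)])

# Make-table step: copy every row of Employees into a new table.
con.execute("CREATE TABLE [Emp Backup] AS SELECT * FROM Employees")

backup_rows = con.execute("SELECT Count(*) FROM [Emp Backup]").fetchone()[0]
# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
info = con.execute("PRAGMA table_info('Emp Backup')").fetchall()
backup_pk_columns = [col[1] for col in info if col[5]]
print(backup_rows, backup_pk_columns)  # 2 []  (define a primary key yourself)

con.execute("DROP TABLE [Emp Backup]")  # clean up, as in the example
```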

12. UNION Operation


This creates a union query, which combines the results of two or more independent queries or
tables.
Syntax
[TABLE] query1 UNION [ALL] [TABLE] query2 [UNION [ALL] [TABLE] queryn [ … ]]
The UNION operation has these parts:

Part Description

query1-n A SELECT statement, the name of a stored query, or the name of a stored table
preceded by the TABLE keyword.

Remarks
You can merge the results of two or more queries, tables, and SELECT statements, in any
combination, in a single UNION operation. The following example merges an existing table
named New Accounts and a SELECT statement:

TABLE [New Accounts] UNION ALL SELECT * FROM Customers WHERE OrderAmount > 1000;

By default, no duplicate records are returned when you use a UNION operation; however, you
can include the ALL predicate to ensure that all records are returned. This also makes the query
run faster.
All queries in a UNION operation must request the same number of fields; however, the fields
do not have to be of the same size or data type.
Use aliases only in the first SELECT statement because they are ignored in any others. In the
ORDER BY clause, refer to fields by what they are called in the first SELECT statement.
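The effect of ALL, and of an alias defined in the first SELECT, can be seen in a small sketch run on SQLite through Python's sqlite3 module. The supplier and customer rows are hypothetical, with one company appearing in both tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Suppliers (CompanyName TEXT, City TEXT)")
con.execute("CREATE TABLE Customers (CompanyName TEXT, City TEXT)")
con.executemany("INSERT INTO Suppliers VALUES (?, ?)",
                [("Bodija Mart", "Ibadan"), ("Eko Foods", "Lagos")])
con.executemany("INSERT INTO Customers VALUES (?, ?)",
                [("Eko Foods", "Lagos"), ("Yaba Stores", "Lagos")])

# The alias Name is defined in the first SELECT and reused by ORDER BY.
union_sql = ("SELECT CompanyName AS Name, City FROM Suppliers WHERE City = 'Lagos' "
             "{op} SELECT CompanyName, City FROM Customers WHERE City = 'Lagos' "
             "ORDER BY Name")

distinct_rows = con.execute(union_sql.format(op="UNION")).fetchall()
all_rows = con.execute(union_sql.format(op="UNION ALL")).fetchall()
print(distinct_rows)  # [('Eko Foods', 'Lagos'), ('Yaba Stores', 'Lagos')]
print(all_rows)       # 'Eko Foods' appears twice: duplicates are kept
```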
Notes
• You can use a GROUP BY or HAVING clause in each query argument to group the
returned data.
• You can use an ORDER BY clause at the end of the last query argument to display the
returned data in a specified order.
Example
This example retrieves the names and cities of all suppliers and customers in Lagos.
SELECT CompanyName, City FROM Suppliers WHERE City = 'Lagos'
UNION SELECT CompanyName, City FROM Customers WHERE City = 'Lagos';

13. UPDATE Statement


This creates an update query that changes values in fields in a specified table based on specified
criteria.
Syntax:
UPDATE table SET field = newvalue[, field2 = newvalue2[, …]] WHERE criteria;
The UPDATE statement has these parts:

Part Description

table The name of the table containing the data you want to modify.

newvalue An expression that determines the value to be inserted into a particular field in
the updated records.

criteria An expression that determines which records will be updated. Only records that
satisfy the expression are updated.

Remarks
UPDATE is especially useful when you want to change many records or when the records that
you want to change are in multiple tables. You can change several fields at the same time. The
following example increases the OrderAmount values by 10 percent and the Freight values by
3 percent for orders shipped to Nigeria:

UPDATE Orders SET OrderAmount = OrderAmount * 1.1, Freight = Freight * 1.03
WHERE ShipCountry = 'NG';

Important
• UPDATE does not generate a result set. Also, after you update records using an update
query, you cannot undo the operation. If you want to know which records were updated,
first examine the results of a select query that uses the same criteria, and then run the
update query.
• Maintain backup copies of your data at all times. If you update the wrong records, you
can retrieve them from your backup copies.
Example
This example changes values in the ReportsTo field to 5 for all employee records that currently
have ReportsTo values of 2.
UPDATE Employees SET ReportsTo = 5 WHERE ReportsTo = 2;
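The advice to preview with a select query before updating can be sketched as follows, using SQLite through Python's sqlite3 module and hypothetical Orders rows. When SQL is run from a program like this, the module opens a transaction before the UPDATE, so the change can still be rolled back until commit() is called; treat that as a programming safety net, not a substitute for backups.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, "
            "ShipCountry TEXT, OrderAmount REAL, Freight REAL)")
con.executemany("INSERT INTO Orders (ShipCountry, OrderAmount, Freight) "
                "VALUES (?, ?, ?)",
                [("NG", 1000.0, 100.0), ("GH", 500.0, 50.0)])
con.commit()

# 1. Preview the records the update will touch, using the same criteria.
preview = con.execute(
    "SELECT OrderID FROM Orders WHERE ShipCountry = 'NG'").fetchall()

# 2. Change several fields in one UPDATE.
con.execute("UPDATE Orders SET OrderAmount = OrderAmount * 1.1, "
            "Freight = Freight * 1.03 WHERE ShipCountry = 'NG'")
after_update = con.execute("SELECT OrderAmount, Freight FROM Orders "
                           "WHERE ShipCountry = 'NG'").fetchone()

# 3. Nothing is committed yet, so the change can still be undone.
con.rollback()
after_rollback = con.execute("SELECT OrderAmount, Freight FROM Orders "
                             "WHERE ShipCountry = 'NG'").fetchone()
print(preview, after_update, after_rollback)
```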

14. SQL Aggregate Functions


Using the SQL aggregate functions, you can determine various statistics on sets of values. You
can use these functions in a query and aggregate expressions in the SQL property of
a QueryDef object or when creating a Recordset object based on an SQL query.

14.1 Avg Function


This calculates the arithmetic mean of a set of values contained in a specified field on a query.
Syntax: Avg(expr)
The expr placeholder represents a string expression identifying the field that contains the
numeric data you want to average or an expression that performs a calculation using the data
in that field. Operands in expr can include the name of a table field, a constant, or a function
(which can be either intrinsic or user-defined but not one of the other SQL aggregate functions).
Remarks
The average calculated by Avg is the arithmetic mean (the sum of the values divided by the
number of values). You could use Avg, for example, to calculate average freight cost.
The Avg function does not include any Null fields in the calculation.
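Because Null fields are skipped, Avg divides by the number of non-Null values rather than by the number of records. A small sketch on SQLite (through Python's sqlite3 module, with hypothetical freight values), whose Avg follows the same rule:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Freight REAL)")
con.executemany("INSERT INTO Orders (Freight) VALUES (?)",
                [(100.0,), (200.0,), (None,)])  # None is stored as Null

avg_freight, total, non_null, all_rows = con.execute(
    "SELECT Avg(Freight), Sum(Freight), Count(Freight), Count(*) FROM Orders"
).fetchone()
print(avg_freight)                         # 150.0 = 300 / 2, not 300 / 3
print(total / non_null, total / all_rows)  # 150.0 100.0
```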
You can use Avg in a query expression and in the SQL property of a QueryDef object or when
creating a Recordset object based on an SQL query.
Example
This example uses the Orders table to calculate the average freight charges for orders with
freight charges over N100.

SELECT Avg(Freight) AS [Average Freight] FROM Orders WHERE Freight > 100;

14.2 Count Function


This calculates the number of records returned by a query.
Syntax: Count(expr)
The expr placeholder represents a string expression identifying the field that contains the data
you want to count or an expression that performs a calculation using the data in the field.
Operands in expr can include the name of a table field or function (which can be either intrinsic
or user-defined but not other SQL aggregate functions). You can count any kind of data,
including text.
Remarks
You can use Count to count the number of records in an underlying query. For example, you
could use Count to count the number of orders shipped to a particular country.
Although expr can perform a calculation on a field, Count simply tallies the number of
records. It does not matter what values are stored in the records.
The Count function does not count records that have Null fields unless expr is the asterisk (*)
wildcard character. If you use an asterisk, Count calculates the total number of records,
including those that contain Null fields. Count(*) is considerably faster than Count([Column
Name]). Do not enclose the asterisk in quotation marks (' '). The following example calculates
the number of records in the Orders table:

SELECT Count(*) AS TotalOrders FROM Orders;

If expr identifies multiple fields, the Count function counts a record only if at least one of the
fields is not Null. If all of the specified fields are Null, the record is not counted. Separate the
field names with an ampersand (&). The following example shows how you can limit the count
to records in which either ShippedDate or Freight is not Null:

SELECT Count(ShippedDate & Freight) AS [Not Null] FROM Orders;

You can use Count in a query expression. You can also use this expression in
the SQL property of a QueryDef object or when creating a Recordset object based on an SQL
query.
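The Null-handling rules for Count can be checked with a sketch on SQLite through Python's sqlite3 module (hypothetical rows). SQLite's concatenation operator || returns Null if either side is Null, so it does not behave like the Access & operator described above; COALESCE(ShippedDate, Freight) is used here instead as the "at least one field is not Null" test. The last column shows why the field expression must not be quoted: a quoted string is a constant, is never Null, and so counts every record.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, "
            "ShippedDate TEXT, Freight REAL)")
con.executemany("INSERT INTO Orders (ShippedDate, Freight) VALUES (?, ?)", [
    ("2021-04-23", 35.0),  # both fields filled
    (None, 12.5),          # only Freight filled
    (None, None),          # both fields Null
])

count_all, count_shipped, count_either, count_literal = con.execute(
    "SELECT Count(*), "                        # every record
    "Count(ShippedDate), "                     # skips Null ShippedDate
    "Count(COALESCE(ShippedDate, Freight)), "  # Null only if both are Null
    "Count('ShippedDate & Freight') "          # a constant: never Null
    "FROM Orders"
).fetchone()
print(count_all, count_shipped, count_either, count_literal)  # 3 1 2 3
```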

Example
This example uses the Orders table to calculate the number of orders shipped to Nigeria.
SELECT Count(ShipCountry) AS [NG Orders] FROM Orders WHERE ShipCountry = 'NG';

14.3 First, Last Functions


These return a field value from the first or last record in the result set returned by a query.
Syntax:
First(expr)
Last(expr)
The expr placeholder represents a string expression identifying the field that contains the data
you want to use or an expression that performs a calculation using the data in that field.
Operands in expr can include the name of a table field, a constant, or a function (which can be
either intrinsic or user-defined but not one of the other SQL aggregate functions).
Remarks
The First and Last functions simply return the value of a specified field in the first or last
record, respectively, of the result set returned by a query. Because records are usually returned
in no particular order (unless the query includes an ORDER BY clause), the records returned
by these functions will be arbitrary.
Example
This example uses the Employees table to return the values from the LastName field of the first
and last records returned from the table.
SELECT First(LastName) as First, Last(LastName) as Last FROM Employees;
The next example contrasts the First and Last functions with the Min and Max functions
for finding the earliest and latest birth dates of employees.
(i) First and Last return the birth dates in the first and last records returned, which are not
necessarily the earliest and latest:
SELECT First(BirthDate) AS FirstBD, Last(BirthDate) AS LastBD FROM Employees;
(ii) Min and Max return the earliest and latest birth dates:
SELECT Min(BirthDate) AS MinBD, Max(BirthDate) AS MaxBD FROM Employees;

14.4 Min, Max Functions


These return the minimum or maximum of a set of values contained in a specified field on a
query.
Syntax:
Min(expr)
Max(expr)
The expr placeholder represents a string expression identifying the field that contains the data
you want to evaluate or an expression that performs a calculation using the data in that field.

Operands in expr can include the name of a table field, a constant, or a function (which can be
either intrinsic or user-defined but not one of the other SQL aggregate functions).
Remarks
You can use Min and Max to determine the smallest and largest values in a field based on the
specified aggregation, or grouping. For example, you could use these functions to return the
lowest and highest freight cost. If there is no aggregation specified, then the entire table is used.
You can use Min and Max in a query expression and in the SQL property of
a QueryDef object or when creating a Recordset object based on an SQL query.
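Grouping is what makes Min and Max return one pair of values per group. A minimal sketch on SQLite through Python's sqlite3 module, with hypothetical freight values for two countries:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (ShipCountry TEXT, Freight REAL)")
con.executemany("INSERT INTO Orders VALUES (?, ?)",
                [("NG", 40.0), ("NG", 120.0), ("GH", 15.0), ("GH", 60.0)])

# With GROUP BY, Min and Max are computed separately for each country.
per_country = con.execute(
    "SELECT ShipCountry, Min(Freight) AS [Low Freight], "
    "Max(Freight) AS [High Freight] "
    "FROM Orders GROUP BY ShipCountry ORDER BY ShipCountry"
).fetchall()

# Without GROUP BY, the whole table is treated as one group.
whole_table = con.execute("SELECT Min(Freight), Max(Freight) FROM Orders").fetchone()
print(per_country)  # [('GH', 15.0, 60.0), ('NG', 40.0, 120.0)]
print(whole_table)  # (15.0, 120.0)
```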
Example
This example uses the Orders table to return the lowest and highest freight charges for orders
shipped to Nigeria.
SELECT Min(Freight) AS [Low Freight], Max(Freight) AS [High Freight]
FROM Orders WHERE ShipCountry = 'NG';

14.5 StDev, StDevP Functions


These return estimates of the standard deviation for a population or a population sample
represented as a set of values contained in a specified field on a query.
Syntax:
StDev(expr)
StDevP(expr)
The expr placeholder represents a string expression identifying the field that contains the
numeric data you want to evaluate or an expression that performs a calculation using the data
in that field. Operands in expr can include the name of a table field, a constant, or a function
(which can be either intrinsic or user-defined but not one of the other SQL aggregate functions).
Remarks
The StDevP function evaluates a population, and the StDev function evaluates a population
sample. If the underlying query contains fewer than two records (or no records, for
the StDevP function), these functions return a Null value (which indicates that a standard
deviation cannot be calculated).
You can use the StDev and StDevP functions in a query expression. You can also use this
expression in the SQL property of a QueryDef object or when creating a Recordset object
based on an SQL query.
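SQLite has no built-in StDev or StDevP, but Python's sqlite3 module can register user-defined aggregates, which makes the sample/population distinction and the Null rules easy to try. The sketch below defines hypothetical StDev and StDevP aggregates on top of Python's statistics module, mirroring the rules described above (Null values skipped; Null returned when too few values remain). It is an illustration, not the Access implementation, and the freight values are hypothetical.

```python
import sqlite3
import statistics


class StDev:
    """Sample standard deviation; Null (None) for fewer than two values."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:  # Null fields are ignored
            self.values.append(value)

    def finalize(self):
        if len(self.values) < 2:
            return None
        return statistics.stdev(self.values)


class StDevP(StDev):
    """Population standard deviation; Null (None) when there are no values."""

    def finalize(self):
        if not self.values:
            return None
        return statistics.pstdev(self.values)


con = sqlite3.connect(":memory:")
con.create_aggregate("StDev", 1, StDev)
con.create_aggregate("StDevP", 1, StDevP)
con.execute("CREATE TABLE Orders (ShipCountry TEXT, Freight REAL)")
ng_freight = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
con.executemany("INSERT INTO Orders VALUES ('NG', ?)", [(f,) for f in ng_freight])
con.execute("INSERT INTO Orders VALUES ('GH', 1.0)")  # a single record

sample_sd, population_sd = con.execute(
    "SELECT StDev(Freight), StDevP(Freight) FROM Orders WHERE ShipCountry = 'NG'"
).fetchone()
single_row_sd = con.execute(
    "SELECT StDev(Freight) FROM Orders WHERE ShipCountry = 'GH'").fetchone()[0]
print(round(population_sd, 4), round(sample_sd, 4), single_row_sd)  # 2.0 2.1381 None
```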
Example
This example uses the Orders table to estimate the standard deviation of the freight charges for
orders shipped to Nigeria.

SELECT StDev(Freight) AS [Freight Deviation] FROM Orders WHERE ShipCountry = 'NG';

14.6 Sum Function
This returns the sum of a set of values contained in a specified field on a query.
Syntax
Sum(expr)
The expr placeholder represents a string expression identifying the field that contains the
numeric data you want to add or an expression that performs a calculation using the data in that
field. Operands in expr can include the name of a table field, a constant, or a function (which
can be either intrinsic or user-defined but not one of the other SQL aggregate functions).
Remarks
The Sum function totals the values in a field. For example, you could use the Sum function to
determine the total cost of freight charges.
The Sum function ignores records that contain Null fields. The following example shows how
you can calculate the sum of the products of UnitPrice and Quantity fields:

SELECT Sum(UnitPrice * Quantity) AS [Total Revenue] FROM [Order Details];

Example
This example uses the Orders table to calculate the total sales for orders shipped to Nigeria.
SELECT Sum(UnitPrice * Quantity) AS [Total NG Sales]
FROM Orders INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID
WHERE (ShipCountry = 'NG');

14.7 Var, VarP Functions


These return estimates of the variance for a population or a population sample represented as a
set of values contained in a specified field on a query.
Syntax:
Var(expr)
VarP(expr)
The expr placeholder represents a string expression identifying the field that contains the
numeric data you want to evaluate or an expression that performs a calculation using the data
in that field. Operands in expr can include the name of a table field, a constant, or a function
(which can be either intrinsic or user-defined but not one of the other SQL aggregate functions).
Remarks
The VarP function evaluates a population, and the Var function evaluates a population
sample. If the underlying query contains fewer than two records, the Var and VarP functions
return a Null value, which indicates that a variance cannot be calculated.
You can use the Var and VarP functions in a query expression or in an SQL statement.
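The two variance estimates differ only in their divisor. Writing x_1, …, x_n for the n non-Null values in the field and \bar{x} for their mean (symbols introduced here for illustration), the usual definitions are:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\mathrm{VarP} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2, \qquad
\mathrm{Var} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```

StDevP and StDev are the square roots of VarP and Var respectively. For example, for the hypothetical freight values 2, 4, 4, 4, 5, 5, 7 and 9, the mean is 5 and the squared deviations sum to 32, so VarP = 32/8 = 4 and Var = 32/7 ≈ 4.571. The n − 1 divisor is why a sample estimate cannot be computed from a single record.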
Example
This example uses the Orders table to estimate the variance of freight costs for orders shipped
to Nigeria.

SELECT Var(Freight) AS [NG Freight Variance] FROM Orders WHERE ShipCountry = 'NG';

Some examples of complex queries


SELECT Customer.CSurname, Customer.COtherNames, Customer_Product.PID,
Customer_Product.qty FROM Customer, Customer_Product WHERE Customer.cid =
Customer_Product.cid;

Query4
CSurname COtherNames PID qty
AJALA AJAYI p01 34
IFUNANYA WUNMI P02 30
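The query above joins the two tables through a WHERE condition. A sketch on SQLite through Python's sqlite3 module reproduces the Query4 result and shows that the same join can be written with INNER JOIN … ON. Only the names, product IDs and quantities come from the result above; the cid values are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customer (cid TEXT PRIMARY KEY, CSurname TEXT, COtherNames TEXT)")
con.execute("CREATE TABLE Customer_Product (cid TEXT, PID TEXT, qty INTEGER)")
con.executemany("INSERT INTO Customer VALUES (?, ?, ?)",
                [("c1", "AJALA", "AJAYI"), ("c2", "IFUNANYA", "WUNMI")])
con.executemany("INSERT INTO Customer_Product VALUES (?, ?, ?)",
                [("c1", "p01", 34), ("c2", "P02", 30)])

# The join written as a WHERE condition, as in the text.
where_join = con.execute(
    "SELECT Customer.CSurname, Customer.COtherNames, Customer_Product.PID, "
    "Customer_Product.qty FROM Customer, Customer_Product "
    "WHERE Customer.cid = Customer_Product.cid ORDER BY Customer.CSurname"
).fetchall()

# The same join spelled with INNER JOIN ... ON.
inner_join = con.execute(
    "SELECT c.CSurname, c.COtherNames, cp.PID, cp.qty "
    "FROM Customer AS c INNER JOIN Customer_Product AS cp ON c.cid = cp.cid "
    "ORDER BY c.CSurname"
).fetchall()
print(where_join == inner_join)  # True
print(where_join)  # [('AJALA', 'AJAYI', 'p01', 34), ('IFUNANYA', 'WUNMI', 'P02', 30)]
```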

SELECT Customer.CSurname, Customer.COtherNames, Customer_Product.PID,
Customer_Product.qty, Customer_Product.purchaseDate FROM Customer, Customer_Product
WHERE Customer.cid = Customer_Product.cid AND Customer_Product.purchaseDate LIKE
"23/4/2021";

Query5
CSurname COtherNames PID qty purchaseDate
AJALA AJAYI p01 34 23/4/2021

More readings from:

HTTPS://MSDN.MICROSOFT.COM/EN-US/LIBRARY/BB208938(V=OFFICE.12).ASPX

Practical Tasks

(1) CSC Nigeria Limited is a product manufacturing company. The company is interested in
keeping track of its customers and the orders they place for its products. As the MIS
Manager of the company, you are requested to create a functional database, using Microsoft
Access or another DBMS, to keep track of the manufactured products, the customers of the
company and the orders. The following attributes are used to describe the products:
ProductID, ProductName, ProductManufactureDate, ProductNAFDACNo and
ProductDescription. Customers are described by the following attributes: CustomerID,
CustomerSurname, CustomerOtherNames, CustomerMobileNo and CustomerOfficeAddress.

(a) Using the concept of database normalization, design MS Access database tables for
Product, Customer and Orders. Insert some hypothetical data into the database tables.
(b) Create data input forms for the three tables.
(c) The CEO of CSC Nig. LTD. is interested in the following reports:
(i) Total number of customers the company currently maintains.
(ii) List of the names and NAFDAC numbers of all products currently
manufactured by the company.
(iii) List of all customers’ names and mobile phone numbers.
(iv) List of the names and office addresses of customers residing in Ibadan only.
(v) The average quantity ordered for a product.
(vi) List of customers’ names, products’ names, quantities ordered and order dates
for all customers.

Using Structured Query Language (SQL), create all the reports requested by the
CEO of CSC Nig. LTD. Save each query with an appropriate name.

(Q2) A local warehouse uses a manual method of recording its operations. Stock is
recorded in books, customer details are also recorded, and stock purchased and supplied is
recorded in a book. The first problem was that the stock managers found it tedious to know the
total number of products available on a daily, weekly or monthly basis. Most of the time, the
available stock recorded in the book was greater than the actual stock in the warehouse, or the
stock had run out. The following schemas were therefore designed for a database to solve the
inventory problem in the warehouse. The primary keys are underlined in the schemas.

PRODUCT (ProductID, ProductName, ProductUnitPrice, SupplierID, StockQTY, CostPrice, SellPrice)
CUSTOMER (CustID, CustFirstName, CustSurname, CustAddress, CustPhoneNo)
SUPPLIER (SupplierID, SupplierName, SupplierAddress, SupplierPhoneNo)
SALESORDER (SalesOrderID, CustID, SalesOrderDate, ProductID, SalesOrderQTY)
PURCHASEORDER (PurchaseID, PurchaseDate, ProductID, PurchaseQTY)

(a) Design the database tables using MS Access. Populate the tables with some hypothetical
tuples.

(b) The CEO of the Warehouse is interested in the following reports:
(i) Total number of customers the company currently maintains.
(ii) List of all products and their quantities currently stored in the Warehouse
(iii) List of all customers’ names and mobile phone numbers.
(iv) List of suppliers’ names and addresses for suppliers located in Bodija only.
(v) The average quantity ordered for a product.
(vi) List of customers’ names, products’ names, quantities ordered and order dates
for all customers.

Using Structured Query Language (SQL), create all the reports requested by the
CEO of the Warehouse. Save each query with an appropriate name.

(Q3) CSC is a business organization located within Ibadan Metropolis. The Company
manufactures beverages such as Milo, Pronto and Ovaltine. Records of customers, staff
and products manufactured have hitherto been kept in manual files. As soon as a customer
patronizes the company, a form is filled in. The form contains vital biodata of the customer,
such as Surname, Other Names, Age, Mobile Number and Home Address. Forms are also filled
in for all sales made on a daily basis. The Company has observed that its manual way of
keeping records of customers and their sales has issues. The management therefore employs
you to assist them in bringing database innovation to the Company. Now, as a full-fledged
database designer:

(a) What do you think are the likely issues the Company is having with its manual
operations?
(b) How will the introduction of database technology assist the Company in this case?
(c) With reference to databases, the Management is interested in understanding some
technical terms you have been using, such as Primary Key, Foreign Key, Candidate Keys
and Referential Integrity Constraint. Using some illustrative tables, explain these terms to
the Management.
(d) Draw an ER Diagram for Staff, Customer and Product entities for the database scenario.
(e) Obtain RD Tables for the Customer, Product and Product_Customer Relations.
(f) Assuming you have created the database on a server, write SQL queries to bring out the
following information:
(i) List of all customers in the age bracket 30 to 45 years.
(ii) List of all customers whose home addresses include Bodija.
(iii) List of customers who bought Milo on 3/1/2022 at 2:30pm and the quantities
bought.
(iv) List of customers who live in Bodija and all the products they bought on
3/1/2022.

(Q4) At CSC Company, located in Ibadan, an Attendance Clocking Machine (ACM) is
installed at the main gate to take the attendance of staff once a day, in the morning. Each
staff member has a clocking card containing a chip on which the staff ID and other
information are recorded. When a staff member arrives at the company premises, the card is
inserted into the ACM, which identifies the staff member, his/her department and the time of
clocking.

A computerized database is to be connected to the ACM on a Local Area Network (LAN). The
database will contain table data about Staff Biodata, Units in the company and Attendance. The
Attendance table is a relationship table between Staff Biodata and Units. It also contains other
data such as DateClocked and TimeClocked. Time is recorded on a 24-hour scale, e.g. 8.00,
13.45, 18.23.

(i) As a Database Manager, draw up the three essential database tables on paper. Each
table should contain about five records.
(ii) Assuming you are to generate reports for the following information from the
database:
(a) List of all staff, including their departments. You may need to create a
relationship table for Staff and Department in this case.
(b) List of all staff names (and their departments) who resumed at exactly 8.00
on 21/1/2021.
(c) List of all staff names (and their departments) who resumed after 8.00 on
21/1/2021.
(d) List of all staff names (and their departments) whose age is 55 years and
above and who resumed before 8.00 on 21/1/2021. The company would like to
reward them for prompt resumption at work on that day.
(e) List of all staff names (and their departments) whose age is less than 55
years and who resumed after 8.00 on 21/1/2021. The company would like to reward
them for prompt resumption at work on that day.

Write out all the database queries to generate all the above information assuming you
are using Microsoft Access as your database engine.

(Q5) MBA Medical Centre is a hospital located in Ibadan. It specializes in treating only fevers
such as Lassa, Malaria, Typhoid and Yellow fevers. Patients consult doctors regularly and
diagnoses are made based on their complaints. Some patients may consult doctors
more than twice a day, especially if their feverish conditions do not change some hours
after taking the drugs prescribed for them. Doctors employed by the Centre are requested to fill
in their details, such as names, year of birth and others, on a form. Case notes are also opened
for patients, containing their names, addresses, hospital registration numbers and others.
The case notes also contain their medical complaints history, doctors’ diagnoses and
prescriptions.

The management is interested in creating an MIS Unit for the Centre so that timely information
could be obtained by the management and doctors on their patients. As a full-fledged MIS
Expert, design and implement a relational database for the Centre. Tables in the database
contain information on doctors, patients and consultations, the latter being the relationship table
for doctor and patient relations. Insert at least five records into the tables.

Using a computer, assist the Management of MBA Medical Centre to create the following
reports via Queries on the database:
(i) All patients and their phone numbers
(ii) Mean age of all patients maintained at the Centre
(iii) All patients living in a particular locality such as Bodija Ibadan.
(iv) All patients, whose age is 60 years and above, with the doctors who saw them and
drugs prescribed for them.

Further Readings

Database Development (2015). databasedev.co.uk. Accessed April 2022.

Database Security. https://www.w3schools.in/dbms/database-security. Accessed April 2022.

Date, C. J. (2004). An Introduction to Database Systems, 8th edition.
https://lc.fie.umich.mx/~rodrigo/BD/An%20Introduction%20to%20Database%20Systems%208e%20By%20C%20J%20Date.pdf

Imperva (2022). Database Security. https://www.imperva.com/learn/data-security/database-security/
Accessed April 2022.

Tuvikene, K. (2022). The State of Security.
https://www.tripwire.com/state-of-security/featured/database-security-best-practices-you-should-know/
Accessed April 2022.

Normalization of Database. https://www.studytonight.com/dbms/database-normalization.php
Accessed March 2022.

Peterson, R. (2022). What is Normalization in DBMS (SQL)? 1NF, 2NF, 3NF, BCNF Database with
Example. https://www.guru99.com/database-normalization.html Updated February 12, 2022;
accessed March 2022.
