CIS 472 Database System
Centre Series
CSC 472
(Database Systems)
By
S. O. Akinola (PhD)
Department of Computer Science,
University of Ibadan,
Ibadan, Nigeria
GENERAL INTRODUCTION AND COURSE OBJECTIVES
CSC 472 (DATABASE SYSTEMS) is a course meant to introduce students to the general
principles of Database design and applications.
In this course, we study databases in an introductory context, covering the meaning of databases,
their types, data modelling, entity relationships and relational principles. In addition, Structured
Query Language (SQL) is introduced as a foundation for students to build upon.
TABLE OF CONTENTS
7.1.5 Relational database
7.2 Properties of Relational Tables
7.3 Relational keys
7.3.1 Superkey
7.3.2 Candidate key
7.3.4 Primary key
7.3.5 Foreign key
7.4 Representing Relational Databases
11.3 SELECT STATEMENT (GENERAL FORMAT)
Unit 16 Database Transactions and Concurrency Controls
16.1 Meaning of Database Transaction
16.2 Properties of Transaction
16.3 Concurrency Controls
16.4 The Need for Concurrency Control
Where to type SQL statements in Microsoft Access "2007", "2010", "2013" or Access
"2016"
Practical Tasks
Further Readings
Unit 1: The Traditional Record Management System
Expected Duration: 1 week or 2 contact hours
Introduction
This Unit introduces you to the meaning and importance of data. The traditional method of
storing data in the computer is discussed, along with the meanings of files, records and fields.
The Unit ends with the problems of file-based systems.
Learning Outcomes
When you have studied this session, you should be able to explain:
1.1 The Importance of Data
1.1.1 Differences between Data and Information
1.1.2 Forms of Data
1.2 The Manual Record Management System
1.3 Automated Data Management
1.4 The Integrated File System
1.5 Components of File Based System
1.6 Problems of File Based System
Data are discrete facts about real-world phenomena. Data and its storage may be considered
to be the heart of any information system. Data has to be up to date, accurate, accessible in the
required form and available to one or perhaps many users at the same time. For data to be of
value, it must be presented in a form which supports the various operational, financial,
managerial, decision-making, administrative and clerical activities within an organization.
To meet these objectives, data needs to be stored efficiently – to avoid lengthy access times –
and with minimal duplication – to avoid lengthy update times and the possibility of
inconsistency and inaccuracy.
For the data stored by a given organization to have any value at all, it must be
Accurate
Consistent
Meaningful
Comprehensive
Relevant
Timely
Suitable
Information is processed data that is meaningful and useful to the user. It is a resource
produced by an information system that is important and essential to the operation and
management of a business or an organization.
1.1.1 Differences Between Data and Information
The following table gives the differences between data and information:
     Data                      Information
1.   Raw facts or figures      Finished facts or figures
2.   Unstructured              Structured
3.   Unprocessed               Processed
4.   What exists               What is required
1.1.2 Forms of Data
(i) Text: This includes series of letters, numbers, and other characters whose combined
meaning does not depend on a pre-specified format. An example is a word-processed
document, which gives information when it is read and interpreted.
(ii) Images: This includes data in the form of pictures. Pictures can be graphs
generated from formatted text, photographs or hand-drawn pictures.
(iii) Audio: This includes data in the form of sounds. An example is the sound that a
doctor hears through a stethoscope. By listening to and interpreting the
sound, the doctor is able to get some information on what the patient's ailment is, if any.
(iv) Video: This includes data that combines the image and audio information types. In
essence, this type of information is imparted through the use of sounds and pictures
by viewing and listening over a period of time. An example of this type of
information is the videoconference.
A typical data processing function is to locate a student record in a large
file based on MatricNo. If this is carried out manually, a lot of time and effort will be
required of the data processing personnel.
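The lookup described above can be sketched in a few lines of Python. This is only an illustration: the record layout and the sample matriculation numbers are hypothetical. The sequential scan mirrors what a clerk does by hand, examining one record after another until the wanted MatricNo is found.

```python
# Hypothetical student file: each record is a dictionary of fields.
student_file = [
    {"MatricNo": "100001", "Name": "Ade", "Department": "Computer Science"},
    {"MatricNo": "100002", "Name": "Bola", "Department": "Economics"},
    {"MatricNo": "100003", "Name": "Chidi", "Department": "Physics"},
]


def find_by_matric_no(records, matric_no):
    """Scan the file in order and return the first matching record, or None."""
    for record in records:
        if record["MatricNo"] == matric_no:
            return record
    return None


found = find_by_matric_no(student_file, "100002")
missing = find_by_matric_no(student_file, "999999")
print(found)
print(missing)
```

The time taken grows with the number of records, which is why manual retrieval from a large file is slow.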
The manual method of business record management involves the use of files, file jackets and
file cabinets with human beings moving files and information manually from one place or
location to another. The manual method of data processing is plagued with certain noticeable
disadvantages when compared with the use of automated systems for the same purpose. In
today’s world, the principal means of office automation among others is the computer system.
A critical look at this manual system reveals that a lot of space is occupied by the cabinets. The
more the files or records, the more difficult it becomes to arrange and keep track of them, thus
making information retrieval slow and laborious. The issues of security against theft, fire and
other physical factors are also of great concern.
Automated processing includes among others, the use of computers and other information
technology equipment, principles and practice for managing the information resources of an
individual or organisation.
In an integrated file system, the data is pooled into a set of interlocking and inter-dependent
files which are accessible by a number of different users. Some of these integrated file systems
today have been tailored to meet the requirements of particular organisations but they still
suffer from the problem of data duplication and therefore lack proper central control.
1.5 Components of File Based System
(i) File: A file is a complete, named collection of information and the basic unit of storage
that enables a computer to distinguish one set of information from another. For
example, a file named “Aircraft” might contain information about the different types of
aircraft used by a particular company.
(ii) Records: The data held within a file are organised into structured groups of related
elements, called records, each of which describes a person or thing. For example, a record
describing an individual aircraft might be composed of the data elements:
“ID_Number, Manufacturer_Name, Description, Classification, Seating_Capacity”
and so on. The aircraft file then contains zero, one or many such records, where each
record describes an individual aircraft.
(iii) Fields: The individual elements of a record are referred to as fields. Hence, from the
example of the aircraft, “ID_Number, Manufacturer_Name, Description,
Classification, Seating_Capacity”, etc, each represents an individual field (element) of
the aircraft record. A field is a character or group of characters that has a specific
meaning for describing data. It is also referred to as the smallest unit of information in a
record. One or more fields make up a record, while one or more records make up a file.
(iv) Data Types: The data to be held within each field of a given record will possess certain
characteristics in terms of size (length measured in characters or digits) and types
(numeric, alphabetic, dates, etc). Each field of a record is allocated a particular data
type which describes the allowed characteristics of the data to be held by the field and
further indicates the range of operations, which can be carried out on the field. For
example, arithmetic operations would be valid on fields containing numeric data but
not on fields containing address or a narrative description.
(v) Keys: A key is a field or combination of fields used to identify a record. When a key
uniquely identifies a record, it is referred to as the primary key.
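The five components above can be tied together in a short Python sketch. It reuses the aircraft field names from the text (written in lower case); the sample values, the `add_record` helper and the choice of a dictionary to hold the file are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class AircraftRecord:
    # Each attribute is a field; its annotation plays the role of the data type.
    id_number: int            # key field: identifies the record
    manufacturer_name: str
    description: str
    classification: str
    seating_capacity: int


# The "Aircraft" file: zero, one or many records, indexed here by the key.
aircraft_file = {}


def add_record(file, record):
    """Add a record to the file, refusing a duplicate primary key."""
    if record.id_number in file:
        raise ValueError(f"duplicate key {record.id_number}")
    file[record.id_number] = record


# Made-up sample records.
add_record(aircraft_file, AircraftRecord(1, "Maker A", "Model X", "Passenger", 150))
add_record(aircraft_file, AircraftRecord(2, "Maker B", "Model Y", "Passenger", 200))

# Arithmetic is meaningful on a numeric field such as seating capacity,
# but not on a narrative field such as description.
total_seats = sum(r.seating_capacity for r in aircraft_file.values())
print(total_seats)
```

Rejecting a second record with the same `id_number` is what makes that field behave as a primary key in this sketch.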
1.6 Problems of File Based System
(i) Even the simplest tasks required extensive programming, and data manipulation involved
highly skilled activities such as data path specification and an understanding of storage
structures.
(ii) There were no provisions for user-friendly system facilities such as queries.
(iii) Centralized control and administration of the system was difficult.
(iv) With each file having its own file management system, data access programs are
subject to change whenever the structure of the files being accessed changes, i.e.
structural dependence.
(v) Unnecessary duplication of data. The same data are stored in different files located in
different units of an organization. For instance, a change in the marital status of a staff
member would mean updating several files and queries across the units in the organisation.
(vi) Data security was virtually non-existent. Changes in file or data characteristics, such as
changing a field from integer to real or decimal, require changes in all programs
accessing the data. This is data dependence.
(vii) Lack of data integrity. Data or information about the same element or staff member in an
organisation could have different values when updates are not done across the board.
Data anomalies arise from inconsistencies and human handling during data transfer.
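The duplication and integrity problems listed above can be illustrated with a small Python sketch. The two departmental files, the staff ID and the helper names are hypothetical; the sketch only shows how a forgotten update leaves two copies of the same fact in disagreement.

```python
# Two departmental files, each holding its own copy of the same staff data.
personnel_file = {"S001": {"name": "A. Bello", "marital_status": "Single"}}
payroll_file = {"S001": {"name": "A. Bello", "marital_status": "Single"}}


def update_marital_status(file, staff_id, status):
    """Update one file only; nothing propagates the change to other files."""
    file[staff_id]["marital_status"] = status


def find_inconsistencies(file_a, file_b, field):
    """Return the staff IDs whose value for `field` differs between two files."""
    return [
        staff_id
        for staff_id in sorted(file_a.keys() & file_b.keys())
        if file_a[staff_id][field] != file_b[staff_id][field]
    ]


# The personnel unit records the change; the payroll unit is forgotten.
update_marital_status(personnel_file, "S001", "Married")
conflicts = find_inconsistencies(personnel_file, payroll_file, "marital_status")
print(conflicts)
```

With a single shared copy of the staff data there would be nothing to reconcile, which is the motivation for the database approach in the next Unit.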
Summary
In Unit 1, you have learnt:
Importance of Data
The Manual Record Management System
The Integrated File System
Components of File Based System
Problems of File Based System
1. What is a file?
2. Discuss the traditional file based system and its problems
Unit 2 Meaning and Importance of Database
Expected Duration: 1 week or 2 contact hours
Introduction
We assume that most people have some notion of "database". We see databases in everyday
life - collections of CDs we can order from a company, a phonebook of phone number and
name entries, parts stocked by a supplier to be supplied to a project, records to be processed by
a program, a general repository that a program acts upon (like a cgi-bin program acting on a
web client's behalf to read and write data to disk). This Unit introduces the reader to the
meaning and functional roles databases play in organisation.
Learning Outcomes
When you have studied this session, you should be able to explain:
2.1 The Meaning of Database
2.2 Advantages of Databases over Integrated File System
2.3 Elements of a Database System
2.4 Contexts, Characteristics and Consequences of Database Environments
2.5 Benefits of the Database Approach
2.6 The Structure of a Database System
One of the underlying ideas in modern information systems is the database, together with the
database management system (DBMS), the software which manages it. The database is the storehouse
of data used by other packages. The data must be well organised so that updating, addition,
deletion, etc. are easier.
All definitions given to database are centred on one basic principle: the collection of related
data that are of importance to an enterprise. It is a central store of independent data and a
description of the data so collected and stored.
Colin Ritchie (1998) emphasized that to be worthy of being a database, it must have two
essential properties:
(i) It holds data as an integrated system of records.
(ii) It contains self describing information, i.e., it contains description of the data held in
the database, sometimes referred to as the database schemas.
Stefan (1990) defines database as a structured collection of operational data together with a
description of that data. Also, according to Claude et al. (1995), “A database can be seen as a
collection of data managed by a computer, which can be accessed by several users at the same
time”.
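The "self-describing" property in these definitions can be observed directly. The sketch below uses SQLite through Python's built-in sqlite3 module, which is an assumption made for convenience (the course itself works with other DBMSs); the `student` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (matric_no TEXT PRIMARY KEY, name TEXT, level INTEGER)"
)

# SQLite keeps the description of every table in its catalogue, sqlite_master.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(tables)
print(columns)
conn.close()
```

The description of the data (the schema) is stored alongside the data and can be queried like any other data.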
records would also be kept in data files and these could be data tables, queries, report formats,
programs or procedures. Moreover, the database system provides facilities for creating,
updating, querying, using and administering the database.
With a bit more precision, when we use the term database, we mean a logically coherent
collection of related data with inherent meaning, built for a certain application, and representing
a "mini-world".
Let’s examine the definitions of a database in detail to understand this concept fully. The
database is a single, possibly large repository of data, which can be used simultaneously by
many departments and users. All data that is required by these users is integrated with a
minimum amount of duplication. And importantly, the database is normally not owned by any
one department or user but is a shared corporate resource.
A database system takes care of the problems identified with the integrated file system
discussed in Unit One because it is a single organised collection of structured data with a
minimum of duplication of data items so as to provide a consistent and controlled pool of data.
Databases are set up in order to meet the information needs of major parts of an organisation.
The data in a database is common to all users of the system but is independent of programs that
use the data. Databases are constructed by sections. During this process, it is possible to do the
following:
(a) Add new files of data
(b) Add new fields to records already present in the database
(c) Create relationships between the items of data.
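The three operations (a) to (c) can be sketched with SQLite through Python's built-in sqlite3 module. The choice of SQLite and the table and column names are assumptions for illustration only; other DBMSs offer equivalent statements with their own syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")

# (a) Add a new file of data (a new table).
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, surname TEXT)")

# (b) Add a new field to records already present in the database.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")

# (c) Create a relationship between items of data (employee -> department).
conn.execute(
    "ALTER TABLE employee ADD COLUMN dept_id INTEGER REFERENCES department(dept_id)"
)

conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute(
    "INSERT INTO employee (emp_id, surname, phone, dept_id) VALUES (10, 'Ade', '000', 1)"
)

employee_columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
joined = conn.execute(
    "SELECT e.surname, d.name FROM employee e "
    "JOIN department d ON e.dept_id = d.dept_id"
).fetchall()

print(employee_columns)
print(joined)
conn.close()
```

The database grows in sections here: the `employee` table gains fields and a link to `department` without being rebuilt.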
2.3 Elements of a Database System
(i) The data, which are often structured into files, records and fields and also integrated
(linked) across different files.
(ii) The hardware for processing and doing all the manipulations on the data and also for
storage and communication in a network.
(iii) Software comprising three main levels – Operating System (single or
network/distributed user); Database Management System (DBMS) and the application
software, e.g. accounting, payroll, inventory control, personnel management systems.
(iv) Users at three levels – End users; application programmers customising applications
for users and the Database Administrator at the lowest level, related but with different
functions.
(v) Policy or organisational framework that would provide the policy concept within which
the database system would operate.
(i) A database may be implemented as a single or multiple data tables, multiple being the
common type.
(ii) Some of the data may be internal within the machine or external – distributed across a
distant network.
(iii) Massive data is the essence of database systems.
(iv) Different data storage media – Hard disks, CD, Tapes for parts of the data in the same
database. A database needs to be stored on a large capacity direct access device, with a
good back up for security purposes.
(v) Usually database has multiple users who may need concurrent (simultaneous) access to
data.
(vi) Data security requirements to protect the investment made in the database and to prevent
cyber war (a situation in which one company destroys another company’s database), virus
attack, etc.
(vii) Data recovery system in case of natural or artificial disaster.
(viii) Balancing of time of processes with storage space requirements must be considered in
designing a database system.
(ix) Consider whether the database has the static or dynamic property. Static means the
database will not need any addition once it is created. Dynamic means we can add to or
delete from it at any time.
(x) Relative frequencies of update (adding, deleting, modifying) and retrieval operations
must be considered also.
(xi) Whether the requirement of processes is real time (instantaneous) or batch.
The concept of a database system becomes important in situations where we have persistent
data that are accumulating and need to be maintained on a long-term basis. Also, the data
must be organised, protected and used at one time or another to support the activities of a
given enterprise. The data must be pooled, integrated and shared among many, often
concurrent, users in a business or an enterprise.
Figure 2.1 shows the structure of a database system. Data from different units are pooled into
a database. Application programs running on machines in different units/departments of the
organisation have access to the database via a Database Management System (DBMS) like
Microsoft Access.
[Figure 2.1: Structure of a database system. Application programs in the Personnel, Sales,
Accounts and Purchasing departments reach the database (employees, customers, sales,
inventory and accounts records) through the DBMS.]
Summary
This Unit briefly introduced you to database systems. The meaning and elements of a database
system were discussed. The advantages of databases over the traditional file-based system were
also explained.
Unit 3 Architectural Design of a Database System
Expected Duration: 1 week or 2 contact hours
Introduction
The components of a database system are discussed in this Unit. The construction of a database
system is also explained.
Learning Outcomes
When you have studied this session, you should be able to explain:
3.1 Components of a Database System
3.2 Functional Roles of the Components
As explained, a database system comprises the users, hardware and software. Each user
interface presents only the part of the whole database applicable to its users, for example
the accounting or personnel aspect of the database. The users see the database arranged in a
way suitable to them, i.e., a specific user's logical view. But inside the database, all the
organisational data are organised in a particular manner. All the units of data that are
meaningful and appropriate are linked together in the database of the organisation. This is
the global logical view.
The Database Administrator (DBA) configures the Database Management System (DBMS) in
such a way that the logical data are arranged in a physical structure (how many bits and bytes,
which storage medium, etc.). These details are not supposed to be known by the end users.
Figure 3.1 shows that there is a mapping between the logical and physical views. The DBMS
does this mapping on the instruction of the DBA.
[Figure 3.1: Mapping between logical and physical data. Users at computer consoles work
through the Accounting and Personnel packages on logical data, which the DBMS software
maps to physical data.]
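The idea of user-specific logical views over one shared body of data can be sketched with SQL views. The example uses SQLite through Python's built-in sqlite3 module as an assumption for illustration; the `staff` table, the view names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One global table holding all the organisational data about staff.
conn.execute(
    "CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, grade TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [(1, "Ade", "GL08", 120000.0), (2, "Bola", "GL10", 180000.0)],
)

# Each package sees only the part of the database applicable to it.
conn.execute("CREATE VIEW accounting_view AS SELECT staff_id, salary FROM staff")
conn.execute("CREATE VIEW personnel_view AS SELECT staff_id, name, grade FROM staff")

accounting_rows = conn.execute(
    "SELECT * FROM accounting_view ORDER BY staff_id"
).fetchall()
personnel_rows = conn.execute(
    "SELECT * FROM personnel_view ORDER BY staff_id"
).fetchall()

print(accounting_rows)
print(personnel_rows)
conn.close()
```

Neither view reveals how or where the rows are physically stored; that mapping is left to the DBMS.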
3.2 Functional Roles of the Components
3. Programmers
(i) Write application programs like stock control, wage bills for end users
(ii) They are application developers
6. Database Users
There are three main categories of database users. They are:
(i) Application Programmers: These are expert programmers who write database
applications. These programs make use of query operations to support the database
end-users.
(ii) End Users: These employ query languages (programs or packages with user-friendly
question-like statements, e.g., LIST ALL EMPLOYEES FROM OYO
STATE or DISPLAY ALL EMPLOYEES HAVING SURNAME BEGINNING
WITH ‘A’) provided as an integrated part of the DBMS. They can also use
written application programs that accept commands from the terminal and in
turn issue requests to the DBMS. End user activities are mostly queries.
(iii) Database Administrators: The persons responsible for the overall control and
maintenance of the database system.
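The two question-like requests quoted under End Users can be expressed in SQL. The sketch below runs them with SQLite through Python's built-in sqlite3 module; the `employee` table, its columns and the sample rows are hypothetical and serve only to make the queries runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, surname TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        (1, "Adeyemi", "Oyo"),
        (2, "Bello", "Kano"),
        (3, "Akande", "Lagos"),
        (4, "Oladele", "Oyo"),
    ],
)

# "List all employees from Oyo State"
from_oyo = conn.execute(
    "SELECT surname FROM employee WHERE state = ? ORDER BY surname", ("Oyo",)
).fetchall()

# "Display all employees having surname beginning with 'A'"
starts_with_a = conn.execute(
    "SELECT surname FROM employee WHERE surname LIKE 'A%' ORDER BY surname"
).fetchall()

print(from_oyo)
print(starts_with_a)
conn.close()
```

An end user states what is wanted; the DBMS decides how to locate the matching records.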
Data as facts only become useful when they are organised, processed and presented in human
understandable forms. Processed business facts provide information the business or
organisation needs to move forward.
A survey of the records an organisation deals with, the form of requests for the records and
the life span of such records covers the most important factors in any database system. The
following points must be noted:
(a) The kinds of records to be maintained. Records come in many sizes and shapes with
different kinds of information. The methods for filing general correspondence will
definitely be different from the system for filing maps, blueprints, adverts, job
applications, Local Purchasing Orders (LPOs), quotations, etc.
(b) The nature of data or record requests. The frequency of requests and the speed of retrieval
are of utmost importance in the filing system. This is so because the ability to retrieve
on time might spell the difference between winning and losing contracts.
(c) The volume of records to be maintained. For records that are in daily use and frequently
enter the organisational record pool, the equipment needed for filing must be
durable and strong enough to handle such voluminous information.
(d) How long the files will be kept. Some records that should be kept permanently for the
rare instances when they may be requested need not be kept in active files, where they
would interfere with the fast retrieval of other, more frequently requested records.
Inactive files would have to be removed over time. This process
is called transferring.
Summary
In this Unit you were introduced to the architectural design of a database system. The
different stakeholders in a database system were also highlighted. The Unit ends with the
factors for consideration when building a database system.
Unit 4 Database Management System (DBMS)
Expected Duration: 1 week or 2 contact hours
Introduction
Learning Outcomes
When you have studied this session, you should be able to explain:
4.1 The Meaning of DBMS
4.2 Functions of the DBMS
4.3 Components of the DBMS environment
4.4 Functional requirements of a DBMS
The Database Management System (DBMS) is a special software package that is used to
interact with the database. It is normally used to define the data, to design and consult with the
database as well as to update it. It contains a variety of facilities including a Data Definition
Language (DDL) to create and modify the database structures – files, users and their privileges;
a query language which supports all forms of retrieval and updating; and numerous interfaces
to liaise with the operating system, telecommunication system, programming languages and
other utility software. It also contains data validation routines and maintains a data dictionary
– a complete description of the database structure and contents. Examples of DBMS are dBase,
MS Access, MySQL, SQL Server, Oracle, etc.
Summarily, the Database Management System is a complex software system that is used to
construct, expand and maintain a database. It regulates data access in the shared database. In
essence, a DBMS is a software system that enables users to define, create, and maintain the
database and also provides controlled access to this database.
(xi) It initiates actual Input / Output (I/O) operations and coordinates them before the host
operating system performs them.
1. Links between Data: A database is based on a data model whose specific aim is to
define the way data items represented in the system are structured and the links that can
be established between those data items.
2. Data Consistency: The stored data must be consistent with reality.
3. Ease of Data Access: A DBMS must allow any data item in the database to be accessed
easily.
4. Data Security: A DBMS must be capable of protecting the data it manages against any
external aggression.
5. Data Sharing: The DBMS must provide means for managing data sharing among
several applications.
6. Data Independence: An application that handles data using a file system is strongly
dependent on its data. The application must know how the files are structured and the
method for accessing them. In contrast, a DBMS should allow applications to be written
without the programmer having to worry about the physical data and the associated
access methods. Thus the system can evolve to take account of new needs without
disturbing applications that have already been written. Data independence is a concept
linked with the evolution and maintenance of an application. Any factors that make it
easier to develop future versions of an application, and particularly data independence
represent possible large-scale cost savings; which is an essence of a DBMS.
7. Performance: The above functional requirements or constraints must be realised
without detriment to the system’s overall performance.
4.5 Advantages and Disadvantages of DBMSs
(i) Control of data redundancy. The database approach eliminates redundancy where
possible. However, it does not eliminate redundancy entirely, but controls the amount
of redundancy inherent in the database. For example, it’s normally necessary to
duplicate key data items to model relationships between data, and sometimes it’s
desirable to duplicate some data items to improve performance. The reasons for
controlled duplication will become clearer when you read the Units on database design.
(ii) Data consistency. By eliminating or controlling redundancy, we’re reducing the risk of
inconsistencies occurring. If data is stored only once in the database, any update to its
value has to be performed only once and the new value is immediately available to all
users. If data is stored more than once and the system is aware of this, the system can
ensure that all copies of the data are kept consistent. Unfortunately, many of today’s
DBMSs don’t automatically ensure this type of consistency.
(iii) Sharing of data. In a file-based approach (the predecessor to the DBMS approach),
typically files are owned by the people or departments that use them. On the other hand,
the database belongs to the entire organization and can be shared by all authorized users.
In this way, more users share more of the data. Furthermore, new applications can build
on the existing data in the database and add only data that is not currently stored, rather
than having to define all data requirements again. The new applications can also rely
on the functions provided by the DBMS, such as data definition and manipulation, and
concurrency and recovery control, rather than having to provide these functions
themselves.
(iv) Improved data integrity. As already stated, database integrity is usually expressed in
terms of constraints, which are consistency rules that the database is not permitted to
violate. Constraints may apply to data within a single record or they may apply to
relationships between records. Again, data integration allows users to define, and the
DBMS to enforce, integrity constraints.
(v) Improved maintenance through data independence. Since a DBMS separates the data
descriptions from the applications, it helps make applications immune to changes in the
data descriptions. This is known as data independence and its provision simplifies
database application maintenance.
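The integrity constraints mentioned under improved data integrity can be sketched as follows. The example uses SQLite through Python's built-in sqlite3 module as an assumption for illustration; the tables, the `try_insert` helper and the sample values are hypothetical. One constraint applies within a single record (salary must not be negative) and one applies to a relationship between records (every employee must belong to an existing department).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE employee (
           emp_id  INTEGER PRIMARY KEY,
           salary  REAL CHECK (salary >= 0),
           dept_id INTEGER NOT NULL REFERENCES department(dept_id)
       )"""
)
conn.execute("INSERT INTO department VALUES (1, 'Accounts')")
conn.execute("INSERT INTO employee VALUES (1, 50000, 1)")


def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted or rejected it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


negative_salary = try_insert((2, -10, 1))      # violates the CHECK constraint
unknown_department = try_insert((3, 100, 99))  # violates the foreign key
valid_row = try_insert((4, 100, 1))            # satisfies both constraints

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(negative_salary)
print(unknown_department)
print(valid_row, row_count)
conn.close()
```

The rules are declared once in the data definition, and the DBMS enforces them for every application that touches the tables.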
Other advantages include: improved security, improved data accessibility and responsiveness,
increased productivity, increased concurrency, and improved backup and recovery services.
There are, however, some disadvantages of the database approach, such as:
(i) Complexity. As already mentioned, a DBMS is an extremely complex piece of
software, and all users (database designers and developers, DBAs, and end-users)
must understand the DBMS’s functionality to take full advantage of it.
(ii) Cost of DBMS. The cost of DBMSs varies significantly, depending on the
environment and functionality provided. For example, a single-user DBMS for a
PC may cost hundreds of thousands of Naira, whereas a large mainframe multi-user DBMS
servicing hundreds of users can be extremely expensive, perhaps millions of Naira.
There is also the recurrent annual maintenance cost, which is typically a percentage
of the list price.
(iii) Cost of conversion. In some situations, the cost of the DBMS and any extra
hardware may be insignificant compared with the cost of converting existing
applications to run on the new DBMS and hardware. This cost also includes the cost
of training staff to use these new systems, and possibly the employment of specialist
staff to help with the conversion and running of the system. This cost is one of the
main reasons why some companies feel tied to their current systems and cannot
switch to more modern database technology. The term legacy system is sometimes
used to refer to an older, and usually inferior, system (such as file-based,
hierarchical, or network systems).
Summary
In this Unit you have been introduced to the meaning of DBMS, its functions in a database
environment and its functional requirements. All access to the database is through the DBMS.
The DBMS provides facilities that allow users to define the database, and to insert, update,
delete, and retrieve data from the database. The DBMS environment consists of hardware (the
computer), software (the DBMS, operating system, and applications programs), data,
procedures, and people. The people include database administrators (DBAs), database
designers, application programmers, and end-users.
Unit 5 Data Models
Expected Duration: 1 week or 2 contact hours
Introduction
A database model depicts how the data in a database are stored or arranged. It provides the
technique, which supports the conceptualisation of the database. The model defines the
following:
(i) Rules which bind the relationship among data.
(ii) Constraints among data
(iii) Meaning and interpretation of data, and
(iv) The way data is used.
Learning Outcomes
When you have studied this session, you should be able to explain:
5.1 The meaning of a data model
5.2 Logical Data Model
5.3 Physical Data Model
5.4 Hierarchical Data Model
5.5 Network Data Model
A data model can be defined as a way of thinking about or conceptualising, organising and
relating data. It is an integrated collection of concepts for describing data, relationships between
data, and constraints on the data used by an organization.
A model is a representation of ‘real world’ objects and events, and their associations. It
concentrates on the essential, inherent aspects of an organization and ignores the accidental
properties. A data model attempts to represent the data requirements of the organization, or
the part of the organization, that you wish to model. It should provide the basic concepts and
notations that will allow database designers and end-users to communicate their understanding
of the organizational data unambiguously and accurately. A data model can be thought of as
comprising three components:
(1) a structural part, consisting of a set of rules that define how the database is to be
constructed;
(2) a manipulative part, defining the types of operations (transactions) that are allowed on
the data (this includes the operations that are used for updating or retrieving data and
for changing the structure of the database);
(3) possibly a set of integrity rules, which ensures that the data is accurate.
Each data model is characterised by the set of concepts, definitions or symbols for representing
data as well as the rules that must be followed by anyone who wishes to employ the data model
for organising and relating data.
Essentially, the purpose of a data model is to represent data and to make the data
understandable. If it does this, then it can be easily used to design a database.
A logical data model is a model for representing or organising data meaningfully to the users
of the data, for example as a hierarchy in the hierarchical data model or as tables (relations)
in the relational database model. This is the model in which users represent data logically.
A physical data model is a model for organising data on storage media such as disks or tapes.
It has to do with how data are actually stored in storage devices. It is devised to take
advantage of the processing speed and storage space opportunities or constraints of the
physical media, as well as search operational requirements and the frequency of update and
retrieval of data. The idea is to optimise how data are stored.
In the hierarchical data model, data in a complex view is given an order. The subclasses are
records. Access to any record is unidirectional: a record can only be accessed through another
record having a direct relationship to it. For example, student records can be accessed through
a department, faculty or college. From Figure 5.1, we cannot access microeconomics records
directly; we must go through social sciences and then economics. A strict hierarchy is imposed.
One problem here is that lateral accessing of data is impossible. A record will have only one
parent, but the parent may have children and grandchildren. Other disadvantages of this model
are rigidity (one-way direction to access a record), time wasting and inefficiency.
[Figure 5.1: A hierarchical data model, with Social Sciences at the top of the hierarchy.]
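The one-way access rule of the hierarchical model can be sketched in Python with nested dictionaries. Social Sciences, Economics and Microeconomics come from the example in the text; the other node names and both helper functions are hypothetical additions for illustration.

```python
# Each record has exactly one parent; children are nested under it.
hierarchy = {
    "Social Sciences": {
        "Economics": {"Microeconomics": {}, "Macroeconomics": {}},
        "Sociology": {},
    }
}


def access(tree, path):
    """Follow a top-down path of record names; KeyError if a step is not a child."""
    node = tree
    for name in path:
        node = node[name]
    return node


def path_to(tree, target, trail=()):
    """Return the single top-down path leading to `target`, or None."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == target:
            return here
        found = path_to(children, target, here)
        if found:
            return found
    return None


micro_path = path_to(hierarchy, "Microeconomics")

# Direct access without going through the parents is not possible.
try:
    access(hierarchy, ["Microeconomics"])
    direct_access_ok = True
except KeyError:
    direct_access_ok = False

print(micro_path)
print(direct_access_ok)
```

Because every record has one parent, there is exactly one path to it, which is the source of the rigidity noted above.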
In the network data model, records can be accessed in multiple ways. Records can have multiple
parents and parents can have multiple children. Networks are also referred to as multiple
hierarchies, superimposed on one another. Access to a record could be through more than one path.
[Figure: network data model example; surviving node labels: National Budget, Local Govt]
The challenge with the network data model is that it becomes very complex to design as more
objects become involved. Users find it difficult to trace relationships between data, so the
model is not user friendly. It is also costly to design, and choosing the best access path
through the network is a problem in itself.
Other classical data models are treated in the next two units.
Summary
Underlying the structure of a database is the data model: a collection of tools for describing
data, data relationships, data semantics and constraints. Some classical data models have been
explained in this unit.
Unit 6 The Entity Relationship Data Model
Expected Duration: 1 week or 2 contact hours
Introduction
The Entity Relationship (ER) model was proposed by Chen (1976). It employs three basic
notions: Entity set, relationship set and attributes. The starting point for designing a database
for an organisation is the ER Modelling where all the entities and their properties or attributes
are obtained. An ER diagram (model) is then drawn to depict the relationships existing between
the entities identified. In this Unit, the reader is introduced to the concept of ER modelling.
Learning Outcomes
When you have studied this session, you should be able to explain:
6.1 Basic Concepts of ER Models
6.1.1 Entity Sets
6.1.2 Property
6.1.3 Relationships
6.2 ER Symbols
6.1.1 Entity Sets
An entity is any distinguishable object (concrete or abstract) that exists. For instance, a
student is an entity in a school, and the school is itself an entity. An entity has a set of
properties, and the values of some of these properties may uniquely identify the entity. For
example, the matriculation number of a student uniquely identifies that student in a school.
A group consisting of all similar entities forms an entity set, e.g., the set of all persons who
are students in a school can be defined as the entity set student. Entity sets may overlap, e.g.
a school may have an entity set of its employees (employee) and an entity set of its students
(student). A person entity may be an employee entity, a student entity, both or neither.
We have:
(i) Regular entity: an entity that is not weak, i.e., does not depend on any other entity.
(ii) Weak entity: an entity whose existence depends on another entity. For example,
dependants or children of staff of an organisation. The staff of an organisation is a
regular entity. Similarly, next of kin in the context of students is a weak entity.
An entity may have sub-types. For example, student may have undergraduate, postgraduate and
diploma as sub-types. Note that sub-types are not properties of student; each is a more specific
kind of student. One can never be an undergraduate without being a student. Similarly, junior
staff and senior staff are sub-types of staff.
6.1.2 Property
This is a piece of information that describes an entity. Each type of property draws its value
from a value set or domain (i.e. all possible values). In the ER model:
(i) Property can be simple or composite. A composite property is like the name of a
person (first, middle, last or surname). Contact address can be composite if we have
street, office and telephone addresses.
(ii) Property can be key or non-key. A key property uniquely identifies an instance of the
entity, for example the matriculation number of a student. An instance of an entity is each
member of the entity; an instance of student is a particular student.
(iii) Property can be single or multi-valued. Multi-valued means the property can assume
more than one value at an instant of time, e.g., somebody may have many aliases or
addresses at the same time.
(iv) A property can be base or derived. Base means that the property holds an original (stored)
value; derived means that its value is computed from other properties. For instance, Total_pay
is a property derived from other payments such as development levy and school fees paid.
(v) A property can be null. A null is used when an entity does not have a value for the
property or attribute, for instance when the property is not applicable to the entity.
For example, the number_of_children for an unmarried employee can be null. Null can
also mean that the property value is unknown. An unknown value may either be missing
(the value exists, but we do not have the information) or not known (we do not know
whether or not the value exists).
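The kinds of property listed above can be illustrated with a small Python class. This is a sketch under stated assumptions: the class, the field names and the sample values are hypothetical and are not taken from the course material; only the idea that Total_pay is derived from school fees and development levy comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Student:
    matric_no: str                      # key property: identifies one instance
    first_name: str                     # part of the composite property "name"
    last_name: str                      # part of the composite property "name"
    school_fees: float                  # base property (stored value)
    development_levy: float             # base property (stored value)
    aliases: List[str] = field(default_factory=list)  # multi-valued property
    middle_name: Optional[str] = None   # null: not applicable or unknown

    @property
    def total_pay(self) -> float:
        """Derived property: computed from base properties, never stored."""
        return self.school_fees + self.development_levy


# Hypothetical instance of the entity student.
s = Student("M001", "Ade", "Bello", school_fees=50000.0,
            development_levy=5000.0, aliases=["AB", "Ade B."])
```

Here s.total_pay is recomputed whenever it is read, so it can never disagree with the base values it depends on, while s.middle_name stays None until a value is supplied.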
6.1.3 Relationships
A relationship is an association that links entities in the ER model. An entity can be
involved in two or more relationships. At a given instant in an enterprise, some entities may
not relate with one another at all. Only entities, not properties, can be linked in a relationship.
Consider two entities student and game. A relationship plays can be defined to denote the
association between students and the games they play. The association between entity sets is
referred to as participation, that is, the entity sets E1, E2, ..., En participate in
relationship set R.
The function that an entity plays in the relationship is called the entity role. The roles of
entity sets participating in a relationship are not usually specified if the participating
entities are distinct. But in a recursive relationship, where the entities are not distinct, it
is necessary to specify the roles of the participating entities. For example, in the recursive
relationship works for between employee entities, the first employee of a pair takes the role of
manager, whereas the second takes the role of worker.
A relationship may also have descriptive attributes/properties. For example, we could associate
the attribute session and score to the relationship took between student and course entity sets.
The number of entity sets that participate in a relationship set is referred to as the degree of
the relationship set. A binary relationship is of degree 2 and a ternary relationship is of
degree 3. Relationship sets of degree greater than 2 are usually referred to as n-ary
relationship sets. Figure 6.1 shows that one (1) student offers many (M) courses.
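[Figure 6.1: ER diagram in which entity Student is linked to entity Course by the relationship
Offers, with 1 on the Student side and M on the Course side. Property labels in the figure:
Name, DBirth, Gender, Code, Title, Unit.]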
Participation in a relationship can be total or partial. It is total if every instance of an
entity must take part in the relationship: if every student must offer a course, then the
participation of student in the relationship is total. If some courses may not be taken by any
student, then the participation of course in the relationship is partial. Furthermore, we can
have the following different types of relationships:
(i) One-to-One Relation. One member of an entity E1 participates with only one instance
of E2, e.g. one student takes one project and one project is taken by only one student.
[Diagram: Course 1 : 1 Student]
(ii) One-to-Many Relation. One member of E1 may relate to many members of E2, e.g. one
student can take many courses. If, in addition, one course can be taken by many students,
the relationship becomes many-to-many, which is the general form of one-to-many.
[Diagram: Course M : 1 Student]
6.2 ER Symbols
Entities: [Figure: symbols for Regular and Weak entities]
Relationship: drawn as a diamond; a single diamond for a regular relationship and a double
diamond for a weak relationship.
[Figure: (a) relationship between regular and regular entities; (b) weak relationship between
regular and weak entities, e.g. two students of the same father may have one next of kin. Both
panels carry M and 1 cardinality labels.]
Properties: drawn as ellipses. A property can be base or derived, simple or composite. Derived
properties are drawn with broken (dashed) ellipse lines. The key property is underlined.
[Figure: entity Student with key property Matric and composite property Name, one of whose
parts is First Name]
Lines link attributes to entity sets and entity sets to relationship sets. Double line indicates total
participation in a relationship.
[Figure: partial participation (single line) and total participation (double line), each drawn
between two Student entities with cardinalities M and 1]
[Figure: four diagrams, (i) to (iv), of participation between entity sets E1 and E2. Surviving
annotation: "Same as above but participation is partial. A few members of E1 participate only
once."]
The following could depict the ER Model for the database scenario:
[Figure: ER diagram with the entities Customer, Branch, Employee and Product and the
relationships Visit, Buys, Attends, Works at and Sell, with m and 1 cardinality labels]
2. Hospital Database Environment
In a hospital environment, the following entities can be obtained:
i. Patient
ii. Doctor
iii. Drug
The following could depict the ER Model for the database scenario:
[Figure: ER diagram with the entities Doctor, Patient and Drug and the relationships
Prescribes, Consults and Purchase, with m and 1 cardinality labels]
Summary
The concept of Entity Relationship Model was explained in this unit. ER modelling is the
starting point for modern database design.
Unit 7 Relational Database Model
Expected Duration: 1 week or 2 contact hours
Introduction
The Relational Database Management System (often called RDBMS for short) has become the
dominant DBMS in use today. The RDBMS represents the second generation of DBMS and is
based on the relational data model proposed by Dr E.F. Codd in his seminal paper ‘A Relational
Model of Data for Large Shared Data Banks’ in 1970. In the relational model, all data is
logically structured within relations (tables). A great strength of the relational model is this
simple logical structure. Yet, behind this simple structure is a sound theoretical foundation that
is lacking in the first generation of DBMSs (the network and hierarchical DBMSs typified by
systems such as IDMS/R from Computer Associates and IMS from IBM).
The relational model is based on the mathematical concept of a relation, which is physically
represented as a table. Codd, a trained mathematician, used terminology taken from
mathematics, principally set theory and predicate logic. In this unit, we explain the terminology
and structural concepts of the relational model.
Learning Outcomes
When you have studied this session, you should be able to explain:
7.1 Relational Database (RD) Structure
7.1.1 Relation
7.1.2 Attribute
7.1.3 Domain
7.1.4 Tuple
7.1.5 Relational database
7.2 Properties of Relational Tables
7.3 Relational keys
7.3.1 Superkey
7.3.2 Candidate key
7.3.4 Primary key
7.3.5 Foreign key
7.4 Representing Relational Databases
The relation is the fundamental concept of the relational database. A relation links sets of
data. In a relational database, data are organised in the form of tables, or relations.
7.1.1 Relation
A relational DBMS requires only that the database be perceived by the user as tables. Note that
this perception applies only to the way we view the database; it does not apply to the physical
structure of the database on disk, which we can implement using a variety of storage structures
(such as a heap file or hash file).
The number of entities identified in the ER model determines how many tables are to be created
for storing data in the RD model. For each relationship between two or more entities, we also
define a relation/table pertaining to that relationship. Each relation comprises rows and
columns. The first row is the header of the relation and contains the attributes of that
relation. The body comprises the tuples, which are like records, with one tuple for every
instance or member of an entity. The header is static, i.e. the number of attributes does not
change frequently, but the number of tuples can change due to updating, insertion or deletion.
The number N of attributes is called the degree of the relation, while the number of tuples is
the cardinality of the relation, i.e., the number of time-varying records in the relation.
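Degree and cardinality can be read directly from a table. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the two Staff rows are the ones shown in Figure 7.1, while the third row (S30) and the variable names are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT, name TEXT, position TEXT, "
    "salary INTEGER, branchNo TEXT)"
)
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?, ?)", [
    ("S23", "Akinola S. O.", "Manager", 250000, "B001"),
    ("S28", "Hamzat B. S.", "Supervisor", 150000, "B007"),
])

cursor = conn.execute("SELECT * FROM Staff")
degree = len(cursor.description)      # number of attributes (columns)
cardinality = len(cursor.fetchall())  # number of tuples (rows)

# Inserting a (hypothetical) tuple changes the cardinality but not the degree.
conn.execute(
    "INSERT INTO Staff VALUES ('S30', 'Hypothetical X.', 'Clerk', 90000, 'B001')"
)
new_cardinality = conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]
```

The header stays fixed at five attributes while the body grows from two tuples to three, which is the sense in which the header is static and the tuples are time-varying.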
7.1.2 Attribute
An attribute is a named column of a relation.
In the relational model, we use relations to hold information about the objects that we want to
represent in the database. We represent a relation as a table in which the rows of the table
correspond to individual records called tuples and the table columns correspond to attributes.
Attributes can appear in any order and the relation will still be the same relation, and therefore
convey the same meaning.
[Figure: a relation with header attributes A1, A2, A3, A4, A5 and the tuples beneath it]
For example, in a video rental company, the information on branches is represented by the
Branch relation, with columns for attributes branchNo (the branch number), street, city, state,
zipCode, and mgrStaffNo (the staff number corresponding to the manager of the branch).
Similarly, the information on staff is represented by the Staff relation, with columns for
attributes staffNo (the staff number), name, position, salary, and branchNo (the number of the
branch the staff member works at). Figure 7.1 shows instances of the Branch and Staff relations.
As you can see from this figure, a column contains values for a single attribute; for example,
the branchNo columns contain only numbers of branches.
Branch Relation:
branchNo  Street           City    State  ZipCode  mgrStaffNo
B001      2 Ibadan street  Ibadan  Oyo    0020     S23
B002      14 Chuks Avenue  Apapa   Lagos  0010     S29
Staff Relation:
staffNo  name           position    Salary   branchNo
S23      Akinola S. O.  Manager     250,000  B001
S28      Hamzat B. S.   Supervisor  150,000  B007
Figure 7.1: Instances of the Branch and Staff relations.
7.1.3 Domain
Domains are an important feature of the relational model. Every attribute in a relational
database is associated with a domain of permissible values. Domains may be distinct for each
attribute, or two or more attributes may be associated with the same domain. The domain of
the attribute gender, for example, is {male, female}, and the domain of age in a government
establishment is 18 to 60 years. The domain is important for data integrity, because it allows
only permissible values for a particular attribute.
Figure 7.2 shows the domains for some of the attributes of the Branch and Staff relations.
Note that, at any given time, typically there will be values in a domain that don’t currently
appear as values in the corresponding attribute. In other words, a domain describes possible
values for an attribute.
Figure 7.2: Domains for some attributes of the Branch and Staff relations.
The domain concept is important because it allows us to define the meaning and source of
values that attributes can hold. As a result, more information is available to the system and it
can (theoretically) reject operations that don’t make sense. For example, it would not be
sensible for us to compare a staff number with a branch number, even though the domain
definitions for both these attributes are character strings. Unfortunately, you’ll find that most
Relational Databases don’t currently support domains.
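Where a system has no true domain support, a column-level CHECK constraint can approximate a domain. The following is a minimal sketch using Python's sqlite3 module; the Employee table, its columns and the sample rows are assumptions made for the example, with the gender and age ranges taken from the domains mentioned in this unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        staffNo TEXT    PRIMARY KEY,
        gender  TEXT    NOT NULL CHECK (gender IN ('male', 'female')),
        age     INTEGER NOT NULL CHECK (age BETWEEN 18 AND 60)
    )
""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


valid_row_ok = try_insert(("S28", "female", 30))     # inside both domains
age_75_ok = try_insert(("S29", "male", 75))          # age outside 18 to 60
bad_gender_ok = try_insert(("S30", "unknown", 30))   # gender outside its domain
```

A CHECK constraint restricts the values of one column only. It does not stop a query from comparing a staff number with a branch number, which is the stronger protection that full domain support would give.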
7.1.4 Tuple
The fundamental elements of a relation are the tuples or records in the table. In the Staff
relation, each record contains five values, one for each attribute. As with attributes, tuples can
appear in any order and the relation will still be the same relation, and therefore convey the
same meaning.
7.1.5 Relational database
A relational database consists of tables that are appropriately structured. The appropriateness
is obtained through the process of normalization, to be studied later.
Alternative terminology
The terminology for the relational model can be quite confusing. In this unit, we have
introduced two sets of terms: (relation, attribute, tuple) and (table, column, record). Other terms
that you may encounter are file for table, row for record, and field for column. You may also
find various combinations of these terms, such as table, field, and row.
From now on, we will tend to drop the formal terms of relation, tuple, and attribute, and instead
use the more frequently used terms table, column, and record.
Purchase Table
Date Item Price Qty Totals Supplier Invoice No
We may declare a rule that only one supplier can supply a given item on a particular date.
If the invoice number is not included, then all the attributes together may be needed to
serve as the primary key for the table. In this case, a record number, invoice number or
serial number, defined as an auto number, is used as the primary key instead.
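An auto number key can be sketched with SQLite, where a column declared INTEGER PRIMARY KEY AUTOINCREMENT plays the role that the AutoNumber data type plays in Microsoft Access. The column name recordNo, the sample row and the supplier name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Purchase (
        recordNo INTEGER PRIMARY KEY AUTOINCREMENT,  -- system-generated key
        date     TEXT,
        item     TEXT,
        price    REAL,
        qty      INTEGER,
        supplier TEXT
    )
""")

# Two purchases that agree on every descriptive attribute (hypothetical data).
row = ("2022-01-23", "Paper", 1500.0, 2, "Hypothetical Supplies Ltd")
ids = []
for _ in range(2):
    cur = conn.execute(
        "INSERT INTO Purchase (date, item, price, qty, supplier) "
        "VALUES (?, ?, ?, ?, ?)",
        row,
    )
    ids.append(cur.lastrowid)  # key value generated for the new record
```

Even though the two records are identical in every other column, each receives its own recordNo, so the table still has no duplicate records.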
7.3 Relational keys
As just stated, each record in a table must be unique. This means that we need to be able to
identify a column or combination of columns (called relational keys) that provides uniqueness.
In this section, we explain the terminology used for relational keys.
7.3.1 Superkey
A superkey is a column, or set of columns, that uniquely identifies a record within a table.
Since a superkey may contain additional columns that are not necessary for unique
identification, we are interested in identifying superkeys that contain only the minimum number
of columns necessary for unique identification.
7.3.2 Candidate key
A candidate key is a superkey that contains only the minimum number of columns
necessary for unique identification.
Consider the Branch table shown in Figure 7.1. For a given value of city, we would expect to
be able to determine several branches (for example, a particular city can have two branches).
This column, therefore, cannot be selected as a candidate key.
On the other hand, since the company allocates each branch a unique branch number, then for
a given value of the branch number, branchNo, we can determine at most one record, so that
branchNo is a candidate key. Similarly, as no two branches can be located in the same zip code,
zipCode is also a candidate key for the Branch table.
There may be several candidate keys for a table. Consider, for example, a table called Role,
which represents the characters played by actors in videos. The table comprises an actor
number (actorNo), a catalog number (catalogNo), and the name of the character played
(character), as shown in the Table below. For a given actor number, actorNo, there may be
several different videos the actor has starred in. Similarly, for a given catalog number,
catalogNo, there may be several actors who have starred in this video. Therefore, actorNo by
itself or catalogNo by itself cannot be selected as a candidate key. However, the combination
of actorNo and catalogNo identifies at most one record. When a key consists of more than one
column, we call it a composite key.
Role Table
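The reasoning about the Role table can be checked mechanically on sample data. The sketch below is an illustration only: the function names and the rows marked hypothetical are assumptions, and a sample of rows can only refute a key, never prove one, because whether a set of columns is a key depends on the meaning of the data rather than on one instance of it.

```python
from itertools import combinations


def is_superkey(rows, columns):
    """True if no two rows agree on all of the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, columns):
    """A candidate key is a superkey none of whose proper subsets is a superkey."""
    if not is_superkey(rows, columns):
        return False
    for size in range(1, len(columns)):
        for subset in combinations(columns, size):
            if is_superkey(rows, subset):
                return False
    return True


# Sample Role rows; the two "Narrator" rows are hypothetical.
role = [
    {"actorNo": "A1002", "catalogNo": "207", "character": "Brother Jero"},
    {"actorNo": "A1002", "catalogNo": "289", "character": "Narrator"},
    {"actorNo": "A4006", "catalogNo": "289", "character": "Uncle Saheed"},
    {"actorNo": "A4006", "catalogNo": "207", "character": "Narrator"},
]

actor_alone = is_superkey(role, ["actorNo"])
catalog_alone = is_superkey(role, ["catalogNo"])
composite = is_candidate_key(role, ["actorNo", "catalogNo"])
all_columns = is_candidate_key(role, ["actorNo", "catalogNo", "character"])
```

On this data neither actorNo nor catalogNo is unique on its own, the pair is a candidate key, and the full set of columns is a superkey but not a candidate key because it is not minimal.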
7.3.4 Primary key
The primary key is the candidate key that is selected to identify records uniquely within
the table.
Since a table has no duplicate records, it is always possible to uniquely identify each record.
This means that a table always has a primary key. In the worst case, the entire set of columns
could serve as the primary key, but usually some smaller subset is sufficient to distinguish the
records. The candidate keys that are not selected to be the primary key are called alternate
keys. For the Branch table, if we choose branchNo as the primary key, zipCode would then be
an alternate key. For the Role table, there is only one candidate key, comprising actorNo and
catalogNo, so these columns would automatically form the primary key.
7.3.5 Foreign key
A foreign key is an attribute (column), or set of columns, within one table that matches
the candidate key of some (possibly the same) table.
When a column appears in more than one table, its appearance usually represents a relationship
between records of the two tables. For example, in Figure 7.1 the inclusion of branchNo in both
the Branch and Staff tables is quite deliberate and links branches to the details of staff working
there. In the Branch table, branchNo is the primary key. However, in the Staff table the
branchNo column exists to match staff to the branch they work in. In the Staff table, branchNo
is a foreign key. We say that the column branchNo in the Staff table targets or references the
primary key column branchNo in the home table, Branch. In this situation, the Staff table is
also known as the child table and the Branch table as the parent table.
You may recall that one of the advantages of the DBMS approach was control of data
redundancy. This is an example of ‘controlled redundancy’ – these common columns play an
important role in modelling relationships, as we’ll see in later Units.
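The parent and child roles of Branch and Staff can be demonstrated with SQLite through Python's sqlite3 module. This is a sketch under stated assumptions: the tables are cut down to a few columns, and the Branch table is assumed to contain only the two branches listed in Figure 7.1 (B001 and B002). Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when on
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT)")
conn.execute("""
    CREATE TABLE Staff (
        staffNo  TEXT PRIMARY KEY,
        name     TEXT,
        branchNo TEXT REFERENCES Branch (branchNo)  -- foreign key to the parent
    )
""")
conn.executemany("INSERT INTO Branch VALUES (?, ?)",
                 [("B001", "Ibadan"), ("B002", "Apapa")])
conn.execute("INSERT INTO Staff VALUES ('S23', 'Akinola S. O.', 'B001')")

# B007 has no matching record in Branch, so this child record is rejected.
try:
    conn.execute("INSERT INTO Staff VALUES ('S28', 'Hamzat B. S.', 'B007')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

# The shared column lets us bring related records back together.
staff_with_city = conn.execute("""
    SELECT s.staffNo, b.city
    FROM Staff AS s JOIN Branch AS b ON b.branchNo = s.branchNo
""").fetchall()
```

Under that assumption the record for S28 is refused, because a foreign key value must match an existing key value in the parent table; the join then returns S23 together with the city of branch B001.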
7.4 Representing Relational Databases
A relational database consists of one or more tables. The common convention for representing
a description of a relational database is to give the name of each table, followed by the column
names in parentheses. This is usually called the database schema. Normally, the primary key
is underlined. The description of the relational database for the video rental company is:
The next tables show an instance of the database schema for a company called videoCompany.
Branch
branchNo  Street           City    State  ZipCode  mgrStaffNo
B001      2 Ibadan street  Ibadan  Oyo    0020     S23
B002      14 Chuks Avenue  Apapa   Lagos  0010     S29
Staff
staffNo  name           position    Salary   branchNo
S23      Akinola S. O.  Manager     250,000  B001
S28      Hamzat B. S.   Supervisor  150,000  B007
Video
catalogNo  title            Category  dailyRental  Price   directorNo
207        Die another day  Action    100.00       150.00  D1001
289        Men in the dark  Fantasy   100.00       300.00  D7834
Director
directorNo  directorName
D1001       Papa Kay
D7834       Mama Kay
Actor
actorNo  actorName
A1002    Olu omo
A4006    Jide Kosoko
Role
actorNo  catalogNo  character
A1002    207        Brother Jero
A4006    289        Uncle Saheed
Member
memberNo  fName   lName   address
M100      Alima   Buhari  25 Kolo street, Kaduna
M200      Kayode  Chuks   10 Malomo street, Agbowo
Registration
branchNo  memberNo  staffNo  dateJoined
B001      M100      S23      5 July 2018
B002      M200      S28      9 October 2020
RentalAgreement
rentalNo  dateOut     dateReturn   memberNo  videoNo
R100      4 Jan 2015  8 Jan 2015   M100      1234
R200      5 Feb 2018  10 Feb 2018  M200      5678
VideoForRent
videoNo  available  catalogNo  branchNo
1234     Y          207        B001
5678     N          289        B002
Another example: Consider the Student Course ER Model drawn earlier in Unit 6
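[Figure: the Student-Course ER diagram of Figure 6.1, in which Student is linked to Course by
the relationship Offers (1 to M). Property labels in the figure: Name, DBirth, Gender, Code,
Title, Unit.]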
The following could represent the Relational Database schemas for the entities and the
relationship tables.
Note the foreign keys in the Offers table. This table looks up to the parent tables: Student and
Course.
Summary
The Relational Database has become the dominant database in use today. Relations are
physically represented as tables, with the records corresponding to individual tuples and the
columns to attributes. Properties of relational tables are: each cell contains exactly one value,
column names are distinct, column values come from the same domain, column order is
immaterial, record order is immaterial, and there are no duplicate records.
A superkey is a set of columns that identifies records of a table uniquely, while a candidate
key is a minimal superkey. A primary key is the candidate key chosen for use in identification
of records. A table must always have a primary key. A foreign key is a column, or set of
columns, within one table that is the candidate key of another (possibly the same) table.
Self Assessment Questions (SAQs)
Unit 8: The Relational Database Design Process
Expected Duration: 1 week or 2 contact hours
Introduction
Before you build the tables and other objects that will make up your system, it is important to
take time to design it. A good design is the keystone to creating a system that does what you
want it to do effectively, accurately and efficiently. We explore the design principles for
Relational Databases in this Unit.
Learning Outcomes
When you have studied this session, you should be able to explain or determine:
8.1 The Basic Steps in Designing a Database System
8.2 The purpose of the system
8.3 The tables that are needed in the system
8.4 Identification of fields with unique values
8.5 The relationships between tables
8.5.1 Relationship between Tables: Further Explanations
8.6 Refining the design
8.7 Entering data and create other system objects
To design a database system for an enterprise or organization, the following steps are
necessary:
NOTE: Some of the listed steps (determining tables, data fields and relationships) may cross
and be repeated a few times when designing a relational database.
Building a database is a process of examining the data that is necessary and useful for an
application, then breaking it down into a relatively simple row and column format.
To determine the purpose of the system, the database designer needs to know what information
the potential users would want from the database (detailed scenario). From that, he can
determine what subjects he needs to store facts about (the tables) and what facts he needs to
store about each subject (the data fields).
So, the following questions must be answered:
The first step in creating a database is creating a plan that serves both as a guide to be used
when implementing the database and as a functional specification for the database after it has
been implemented. The nature and complexity of a database application, as well as the process
of planning it, can vary greatly. For a small, simple application, the database design may be
little more than a few notes on some scratch paper. For a large, complex one, the design may be
a formal document with hundreds of pages that contain every possible detail about the database.
NOTE: Modelling the structure on paper before turning to the computer and starting to code is
highly recommended. Planning may seem time-consuming up front, but not planning is twice as
time-consuming later.
Determining the tables can be the trickiest step in the database design process. That is because
the results that you want from the database (e.g. the reports that you want to print, the forms
that you want to use, the questions that you want answered) don't necessarily provide clues
about the structure of the tables that can produce them. In fact, it may be better to sketch and
rework your design on paper first.
When you design your tables, divide up pieces of information by keeping the following
fundamental design principles in mind:
Note that all the entities identified in the ER modelling stage will form the individual tables
or relations in the Relational Database design, and all their attributes will form the columns
of the tables. For instance, the entities student and customer may be represented by
the following tables, with their primary keys underlined:
Student Table:
Customer Table:
8.4 Identify fields with unique values
Next, we have to identify fields with unique values in order to define the primary key of each
table. The primary key uniquely identifies each individual record in a table and makes it
possible to relate information stored in separate tables. By having a different primary key
value in each record, one can tell two records apart. The goal of setting primary keys is to
ensure each record's uniqueness. This is called entity integrity in database management.
Notes:
The power of a relational database system comes from its ability to quickly find and
bring together (related) information stored in separate tables by using queries, forms
and generating reports. In order to do this, each table should include a field or set of
fields to uniquely identify each record stored in the table. This information is called the
primary key of the table. Once we specify a primary key for the table, to ensure
uniqueness, the system will prevent any duplicated or null values from being entered in
the primary key fields.
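Entity integrity can be observed directly in a small experiment. The sketch below uses SQLite through Python's sqlite3 module; the Student table and its rows are hypothetical. One SQLite-specific point is an assumption worth stating: SQLite accepts NULL in a primary key column that is not an INTEGER PRIMARY KEY unless NOT NULL is also declared, so the constraint is written out explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (matricNo TEXT NOT NULL PRIMARY KEY, name TEXT)"
)
conn.execute("INSERT INTO Student VALUES ('M001', 'Hypothetical Student A')")


def insert_ok(matric_no, name):
    """Return True if the record is stored, False if the key rule rejects it."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?)", (matric_no, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = insert_ok("M001", "Hypothetical Student B")  # key already used
null_ok = insert_ok(None, "Hypothetical Student C")         # key missing
fresh_ok = insert_ok("M002", "Hypothetical Student B")      # new key value
```

Only the record with a new, non-null key value is stored; the duplicate and the null key are both refused by the system.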
To be able to set relationships between tables, we must establish a link between fields that
contain common and related information. The link field in another table is known as a foreign
key data field. A relationship is established by linking these key fields between tables - the
primary key in the 'primary' table and a foreign key in the 'related' table.
[Figure: ER diagram in which Student is linked to Course by the relationship Offers (1 to M).
Property labels in the figure: MatricNo, Name, Gender, CourseCode, Title, Unit.]
The relationship between Course and Student is that one student offers many courses in an
institution. The relationship, Offers, will have its own table in the Relational Database
Design. We can now give it a proper name: the Student_Course Table. The primary keys from the
Student and Course tables will jointly form the primary key of this table. These keys are also
called foreign keys: they are foreign in the Student_Course Table because they have already been
defined as primary keys in their base tables. The Student_Course Table serves as a look-up
table to the Student and Course tables.
We must find other attributes in the Student_Course Table to augment the primary key
attributes. For example, what may relate a student and a course together is the result or score
the student obtained in the course and possibly the Grade Point, GP. In fact, we may change the
name of the relationship table to Exam or Result.
For a relationship between Customer and Product, which we may tag "Buys", other
attributes in the relationship table could be quantityBought, amountPaid, PurchaseTime and
PurchaseDate.
Note that we also add Session to the table and underline it, so that it serves as part of the
primary key for the Student_Course table. This is because a student may fail a course in one
session and must then retake it in another session. Consider records 1 and 3 in the table above:
both results have to be stored in this table. So MatricNo and CourseCode alone will not suffice
as the primary key if duplication of records is to be avoided; the Session differentiates
records 1 and 3 in the above table.
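The effect of including Session in the primary key can be tried out with SQLite through Python's sqlite3 module. This is a minimal sketch: the column names follow the text, but the sample student, course, sessions and scores are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Student (matricNo TEXT NOT NULL PRIMARY KEY, name TEXT);
    CREATE TABLE Course  (courseCode TEXT NOT NULL PRIMARY KEY, title TEXT);
    CREATE TABLE Student_Course (
        matricNo   TEXT NOT NULL REFERENCES Student (matricNo),
        courseCode TEXT NOT NULL REFERENCES Course (courseCode),
        session    TEXT NOT NULL,
        score      INTEGER,
        PRIMARY KEY (matricNo, courseCode, session)  -- composite key
    );
    INSERT INTO Student VALUES ('M001', 'Hypothetical Student');
    INSERT INTO Course  VALUES ('CSC472', 'Database Systems');
    INSERT INTO Student_Course VALUES ('M001', 'CSC472', '2020/2021', 38);
""")


def record_result(matric_no, course_code, session, score):
    """Return True if the result is stored, False if it repeats an existing key."""
    try:
        conn.execute("INSERT INTO Student_Course VALUES (?, ?, ?, ?)",
                     (matric_no, course_code, session, score))
        return True
    except sqlite3.IntegrityError:
        return False


retake_ok = record_result("M001", "CSC472", "2021/2022", 65)        # new session
same_session_ok = record_result("M001", "CSC472", "2021/2022", 70)  # repeated key
```

The retake in a new session is accepted because the three-column key differs, while a second result for the same student, course and session is refused as a duplicate.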
The Customer_Product Table may also be called the Purchase Table. It is a relationship table for
the Customer and Product tables. The CustomerID is the primary key coming from the Customer
Table, while the ProductID is the primary key coming from the Product Table. Therefore the
two of them are foreign keys in this table. Now, we have added two more attributes to this
table to jointly serve as the primary key for the Purchase Table: PurchaseTime and PurchaseDate.
Why? A customer could purchase the same product two or three times on the same day, but the
times of purchase will differentiate these records in the database, so duplication of records
is avoided. Consider records 1 and 3 for the customer with ID C01 buying a product with ID P01
on the same date, 23/1/2022, but at different times.
Sometimes three tables (entities) may be involved in a relationship. For instance, a doctor (ID
= D03) prescribes a drug (ID = DR5) to a patient (ID = P03). Each of these is an entity in its
own right and will have its own table. The relationship table between them could look like the
one presented next.
Can you explain why we have included ConsultationTime and ConsultationDate as part of the
primary key for this table?
Sometimes all the attributes of a table together serve as the primary key. This is called an
all-key table. It may occur when no smaller group of attributes can be clearly shown to suffice
as the primary key for the table.
So, every table must have a primary key - one or more data fields whose contents are unique
to each record. When linking tables we link the primary key field from one (primary or 'parent')
table to a field in another (related or 'child') table that has the same name, structure and data
type. By matching the values from the primary key to the foreign key in both tables, we can
relate two records.
Tables store data about entities, while columns contain the attributes of the entities.
Now that we have divided our information into tables and identified primary key fields, we
need a way to tell the system how to bring related information back together in meaningful
ways. To do this, we define relationships between tables. A relationship is an association
between common fields (columns) in two tables. A relationship works by matching data in key
fields. In most cases, these matching fields are the primary key from one table, which provides
a unique identifier for each record, and a foreign key in the other table. The kind of
relationship that the system creates depends on how the related fields are defined.
When we physically join two tables by connecting fields with related information, we create a
relationship that is recognized by the system (like Microsoft Access). The specified relationship
is important. It tells the system how to find and display information from fields in two or more
tables. The system needs to know whether to look for only one record in a table or to look for
several records on the basis of the relationship.
1. One-to-one (1:1) - each record in Table A can have only one matching record in Table
B, and each record in Table B can be related to only one record in Table A. For instance,
one student executes one project in the final year at a higher institution. This may vary
between institutions, but that is the norm.
This type of relationship is not frequently used in database systems, but it can be a very
useful way to link two tables together. However, the information related in this way
could be in one table. You might use a one-to-one relationship to divide a table with
many fields in order to isolate part of a table for security reasons, or to store information
that applies only to a subset of the main table, or for efficient use of space. A one-to-
one relationship is created if both of the related fields are primary keys or have unique
indexes.
2. One-to-many (1:M) - is the most common type of relationship and it is used to relate
one record from the 'primary' table with many records in the 'related' table. In a one-to-
many relationship, a record ('parent') in Table A can have many matching records
('children') in Table B, but a record ('child') in Table B has only one matching record
('parent') in Table A. This kind of relationship is created if only one of the related fields
is a primary key or has a unique index.
3. Many-to-many (M:M) - is used to relate many records in Table A with many
records in Table B. A record ('parent') in Table A can have many matching records
('children') in Table B, and a record ('child') in Table B can have many matching records
('parents') in Table A. It is the hardest relationship to understand, and it cannot be
implemented directly between two tables. Breaking it into two one-to-many relationships
and creating a new (junction/link) table to stand between the two existing tables gives
a correct and appropriate relationship setting. A many-to-many relationship is really two
one-to-many relationships with a junction/link table. NOTE: The link table usually has a
composite primary key that consists of the foreign keys from both Tables A and B.
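The junction-table idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the Student, Course and Offers tables and their sample rows are hypothetical, chosen to show a link table whose composite primary key is made of the two foreign keys.

```python
import sqlite3

# Hypothetical tables: Student and Course are the two 'one' ends,
# Offers is the junction (link) table standing between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (matric INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Course  (code TEXT PRIMARY KEY, title TEXT);
-- Composite primary key built from the two foreign keys.
CREATE TABLE Offers (
    matric INTEGER NOT NULL REFERENCES Student(matric),
    code   TEXT    NOT NULL REFERENCES Course(code),
    PRIMARY KEY (matric, code)
);
INSERT INTO Student VALUES (1, 'Ade'), (2, 'Bola');
INSERT INTO Course  VALUES ('CSC472', 'Database Systems'), ('CSC411', 'Compilers');
INSERT INTO Offers  VALUES (1, 'CSC472'), (1, 'CSC411'), (2, 'CSC472');
""")

# One student, many courses (Student 1:M Offers).
courses_of_ade = [row[0] for row in conn.execute(
    "SELECT code FROM Offers WHERE matric = 1 ORDER BY code")]

# One course, many students (Course 1:M Offers).
students_on_472 = [row[0] for row in conn.execute(
    "SELECT s.name FROM Student s JOIN Offers o ON o.matric = s.matric "
    "WHERE o.code = 'CSC472' ORDER BY s.name")]
```

Each row of Offers records one student-course pair, so the many-to-many relationship is read as two one-to-many relationships through the link table.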
When tables are linked (joined) together, one table is usually called 'parent' or 'primary' or
‘base’ table ('one end' in the 1:M relationship and 'one end' (primarily created table) in the 1:1
relationship) and another table is called 'child' or 'related' table ('many end' in the 1:M
relationship and 'one end' (subsequently created table) in the 1:1 relationship). This is known
as a parent-child relationship between tables. Records in a primary table cannot be modified
or deleted if there are related records in the 'child' table - there will not be an orphan (related)
record without a parent (primary) record. Also, a new record cannot be added to the related
table if there is no associated record in the primary table. This is one of the concepts of database
referential integrity rules, which we will discuss in a later Unit.
The process of designing a relational database includes making sure that a table contains only
data directly related to the primary key, that each data field contains only one item of data, and
that redundant (duplicated) data is eliminated. The task of the database designer is to structure
the data in a way that eliminates unnecessary duplication and provides a rapid search path to
all necessary information. This process of specifying and defining tables, keys, columns and
relationships in order to create an efficient database is called normalization.
We use the normalization process to design efficient and functional databases. By normalizing,
we store data where it logically and uniquely belongs. The normalization process involves a few
steps, and each step is called a form. Forms range from the first normal form (1NF) to the fifth
normal form (5NF). There is also one higher level, called domain key normal form (DK/NF).
We will cover the normalization of databases extensively in a later Unit.
When we are satisfied that the table structures meet the design goals described here, then it's
time to go ahead and add all our existing data to the tables. We can then create any queries,
forms, reports, macros, and modules that we may want.
Summary
In this Unit, we have discussed the basic principles that are involved in designing relational
database tables. We also discussed extensively, the concept of relationships between tables in
a relational database environment.
Self-Assessment Questions
Design a full relational database system showing the parent and relationship tables for the
following database scenario:
Unit 9: Relational Database Integrity
Expected Duration: 1 week or 2 contact hours
Introduction
In the previous Unit, we discussed the structural part of the relational data model. As we
mentioned in Section 7.1, a data model has two other parts: a manipulative part, defining the
types of operations that are allowed on the data, and a set of integrity rules, which ensure that
the data is accurate. In this Unit, we discuss relational integrity.
Learning Outcomes
When you have studied this session, you should be able to explain:
NOTE: Referential integrity operates strictly on the basis of the tables' key fields. It checks
each time a key field, whether primary or foreign, is added, changed or deleted. If any of these
listed actions creates an invalid relationship between two tables, it is said to violate referential
integrity. Referential integrity is a system of rules that Database Management Systems (DBMS)
use to ensure that relationships between records in related tables are valid, and that we don't
accidentally delete or incorrectly change related data.
1. Entity Integrity ensures that each row (record) is a unique instance in a particular table
by enforcing the integrity of the primary key or the identifier column(s) of a table (e.g.
ID, Reference Code, etc).
2. Domain Integrity ensures validity of entries (data input) for a column through the data
type, the data format and the range of possible values (e.g. date, time, age, etc.).
3. Referential Integrity preserves the defined relationships between tables when records
are added, modified or deleted by ensuring that the key values are consistent across
tables; such consistency requires that there are no references to non-existent values and
that, if a key value changes, all references to it change consistently throughout the
database; otherwise the key value cannot be changed.
4. User-Defined Integrity enables specific (required) business rule(s) to be defined and
established in order to provide correct and consistent control of an application's data
access (e.g. who can have permissions to modify data, what generated reports should
look like, which data can be modified, etc.).
The integrity of data in the relational database must be enforced. One of the rules is that a
relation/table must have a unique primary key and secondly, there must be consistency of data
in the database.
Since every column has an associated domain, there are constraints (called domain constraints)
in the form of restrictions on the set of values allowed for the columns of tables. In addition,
there are two important integrity rules, which are constraints or restrictions that apply to all
instances of the database. The two principal rules for the relational model are known as entity
integrity and referential integrity. Before we define these terms, we need first to understand
the concept of nulls.
9.4 Nulls
Null represents a value for a column that is currently unknown or is not applicable for
this record.
A null can be taken to mean ‘unknown’. It can also mean that a value is not applicable to a
particular record, or it could just mean that no value has yet been supplied. Nulls are a way to
deal with incomplete or exceptional data.
However, a null is not the same as a zero numeric value or a text string filled with spaces; zeros
and spaces are values, but a null represents the absence of a value. Therefore, nulls should be
treated differently from other values. For example, suppose it was possible for a branch to be
temporarily without a manager, perhaps because the manager has recently left and a new
manager has not yet been appointed.
In this case, the value for the corresponding mgrStaffNo column would be undefined. Without
nulls, it becomes necessary to introduce false data to represent this state or to add additional
columns that may not be meaningful to the user. In this example, we may try to represent the
absence of a manager with the value ‘None at present’. Alternatively, we may add a new
column ‘currentManager?’ to the Branch table, which contains a value Y (Yes), if there is a
manager, and N (No), otherwise. Both these approaches can be confusing to anyone using the
database.
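The difference between a null and an ordinary value can be checked with a small sketch using Python's built-in sqlite3 module. The Branch table follows the example above; the branch numbers and staff number are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Branch (branchNo TEXT PRIMARY KEY NOT NULL, mgrStaffNo TEXT)")
# Python's None is stored as SQL NULL; '' is an empty string, which is a value.
conn.executemany("INSERT INTO Branch VALUES (?, ?)",
                 [("B001", "S1500"), ("B002", None), ("B003", "")])

# Branches with no manager recorded: tested with IS NULL.
no_manager = [r[0] for r in conn.execute(
    "SELECT branchNo FROM Branch WHERE mgrStaffNo IS NULL")]

# The empty string is found by an ordinary comparison.
empty_string = [r[0] for r in conn.execute(
    "SELECT branchNo FROM Branch WHERE mgrStaffNo = ''")]

# Comparing with NULL using '=' is never true, so no row qualifies.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM Branch WHERE mgrStaffNo = NULL").fetchone()[0]
```

Here B002 is the branch that is temporarily without a manager; no false value such as 'None at present' is needed to represent it.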
Having defined nulls, we’re now in a position to define the two relational integrity rules.
The first integrity rule applies to the primary keys of base tables.
Entity integrity: In a base or parent table, no column of a primary key can be null.
A base or parent table is a named table whose records are physically stored in the database.
This is in contrast to a view. A view is a ‘virtual table’ that does not actually exist in the
database but is generated by the DBMS from the underlying base/parent tables whenever it’s
accessed.
The null primary key rule states that the primary key values should not be null. From an earlier
definition, we know that a primary key is a minimal identifier that is used to identify records
uniquely. This means that no subset of the primary key is sufficient to provide unique
identification of records. If we allow a null for any part of a primary key, we’re implying that
not all the columns are needed to distinguish between records, which contradicts the definition
of the primary key. For example, as branchNo is the primary key of the Branch table, we should
not be able to insert a record into the Branch table with a null for the branchNo column.
Referential integrity: If a foreign key exists in a table, either the foreign key value
must match a candidate key value of some record in its home/parent table or the foreign
key value must be wholly null.
In Figure 7.1 of Unit 7, branchNo in the Staff table is a foreign key targeting the
branchNo column in the home (parent) table, Branch. It should not be possible to create a staff
record with branch number B300, for example, unless there is already a record for branch
number B300 in the Branch table. However, we should be able to create a new staff record with
a null in the branchNo column to allow for the situation where a new member of staff has joined
the company but has not yet been assigned to a particular branch.
If we have 789 as a matric number in the Student table, where it is the primary key, we can have a
record with this matric number in the Offers table. But we cannot have a record with matric
number 777 in the Offers table if it has not been defined in the base/parent table (Student).
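Both integrity rules can be tried out in a short sketch with Python's built-in sqlite3 module. The Branch and Staff tables follow the example in the text; the staff numbers are hypothetical. Two SQLite-specific details are assumptions of this sketch rather than part of the relational model: foreign keys are only checked after PRAGMA foreign_keys = ON, and the primary key column is declared NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when on
conn.executescript("""
CREATE TABLE Branch (branchNo TEXT PRIMARY KEY NOT NULL);
CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY NOT NULL,
    branchNo TEXT REFERENCES Branch(branchNo)   -- foreign key, may be null
);
INSERT INTO Branch VALUES ('B001');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Referential integrity: the foreign key matches a parent row, or is null.
match_refused = rejected("INSERT INTO Staff VALUES ('S1', 'B001')")
null_refused = rejected("INSERT INTO Staff VALUES ('S2', NULL)")
orphan_refused = rejected("INSERT INTO Staff VALUES ('S3', 'B300')")

# Entity integrity: a null primary key is not accepted.
null_pk_refused = rejected("INSERT INTO Branch VALUES (NULL)")
```

Only the last two statements are refused: branch B300 does not exist in the parent table, and a branch cannot be stored without a branchNo.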
Foreign keys are attributes (possibly composite) in one relation R2, whose values are required
to match those of the primary key values of some other relation R1, where R1 and R2 need not
be distinct (they may be the same relation). R2 is the referencing (child) relation while
R1 is the parent, base or referenced relation. This requirement is referred to as a referential
integrity constraint. In other words, no non-null value of the foreign key (FK) of R2 can be
allowed unless that value is already present in the primary key (PK) of R1.
Business rules: Rules that define or constrain some aspect of the organization.
Examples of business rules include domains, which constrain the values that a particular
column can have, and the relational integrity rules that we have just discussed. Another
example is multiplicity, which defines the number of occurrences of one entity (such as a
branch) that may relate to a single occurrence of an associated entity (such as a member of
staff).
It is also possible for users to specify additional constraints that the data must satisfy. For
example, if our videoCompany database has a rule that a member can only rent a maximum of
10 videos at any one time, then the user must be able to specify this rule and expect the DBMS
to enforce it. In this case, it should not be possible for a member to rent a video if the number
of videos the member currently has rented is 10. Unfortunately, the level of support for business
rules varies from system to system.
When a database is created, some things have to be taken into consideration to ensure that the
database is consistent with respect to the constraints. Methods involved in doing this are:
(i) Restricted: Before a record can be deleted or modified, the record with the
primary key must not be participating in any relationship. Before we delete a student
record, we have to check whether his matric number occurs in any other table. One way
of doing this is to have an extra field, called a flag, indicating Active (when the record
enters the table) and Inactive (when it leaves), or a status field holding valid / invalid,
so that in listing the table, invalid records are not listed.
(ii) Cascade: Immediately we delete a record with a particular matric number, all other
records in the database having that matric number are also deleted.
(iii) Nullifies: All matching foreign key values are set to NULL. When we delete a student with
matric number 100 in the Student table, the foreign key in every record of other relations
that holds matric number 100 is set to NULL.
(iv) A dialog box (form for user action) is shown to the user, telling him that an
inconsistency has occurred and asking, “What do you want me to do?”
Accordingly, for each foreign key, the database designer should specify:
(a) The attribute or attribute combination that defines the foreign key
(b) The target or referenced primary key
(c) The foreign key rules. These are the NULL, DELETE and UPDATE rules.
NULL: whether the foreign key can accept NULL in full or in part.
DELETE: what the system should do in the event that a target primary key is to be deleted
(restricted, cascade, etc.).
UPDATE: what the system should do when the target primary key is to be modified
or changed.
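As a minimal sketch of the DELETE rule, the three behaviours can be compared using Python's built-in sqlite3 module. The Student and Offers tables and matric number 100 follow the text; the id column and the helper function names are hypothetical, and the ON DELETE clause is the SQL spelling assumed here for the restricted, cascade and nullifies methods.

```python
import sqlite3

def build(delete_rule):
    """Create Student/Offers with the given ON DELETE rule and one linked row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
    CREATE TABLE Student (matric INTEGER PRIMARY KEY);
    CREATE TABLE Offers (
        id     INTEGER PRIMARY KEY,
        matric INTEGER REFERENCES Student(matric) ON DELETE {delete_rule}
    );
    INSERT INTO Student VALUES (100);
    INSERT INTO Offers  VALUES (1, 100);
    """)
    return conn

def delete_student(conn):
    """Try to delete student 100; return the Offers rows left afterwards."""
    try:
        conn.execute("DELETE FROM Student WHERE matric = 100")
    except sqlite3.IntegrityError:
        pass  # the delete was refused by the DBMS
    return conn.execute("SELECT id, matric FROM Offers").fetchall()

restricted = delete_student(build("RESTRICT"))  # child row blocks the delete
cascaded = delete_student(build("CASCADE"))     # child row is deleted too
nullified = delete_student(build("SET NULL"))   # child row stays, link set to NULL
```

The designer chooses one rule per foreign key; the same clause exists for updates (ON UPDATE) in SQLite and many other systems.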
Summary
A null represents a value for a column that is unknown at the present time or is not defined for
this record. Entity integrity is a constraint that states that in a base table no column of a primary
key can be null. Referential integrity states that foreign key values must match a candidate
key value of some record in the home (parent) table or be wholly null.
Unit 10: Relational Calculus – An Introduction
Expected Duration: 1 week or 2 contact hours
Introduction
How do we manipulate databases after creation? Within the relational data model, the data in
the various tables can be manipulated by means of certain primitive operators. The operators
work on either one or two tables at a time. Moreover, the result of each operation is another
(derived) table, to which another operator may be applied, and so on, until the required tables
in the database are updated or the required set of data is retrieved. You shall be introduced to the
relational operators in this unit.
Learning Outcomes
When you have studied this session, you should be able to explain:
10.1 The Relational Calculus
10.2 Relational Operators
10.3 One Table Operators
10.3.1 Restrict/Select
10.3.2 Project
10.4 Two-Table Operators
10.4.1 Cartesian Product.
10.4.2 Union.
10.4.3 Intersection
10.4.4 Difference
10.4.4 Divide / Quotient (÷)
10.4.5 Natural Join
10.4.6 Theta Join (θ)
10.4.7 Equi Join
The Relational Calculus is a formal query language. Instead of having to write a sequence of
relational algebra operations, we simply write a single declarative expression, describing the
results that we want. This is somewhat akin to writing a program in C or Java instead of
assembler, or (in the spirit of real world examples!) telling the babysitter to call with any
problems instead of detailing how to pick up the phone, dial numbers, etc.
The expressive power is identical to using relational algebra. Many commercial databases use
a language like Structured Query Language or even a language like QBE (Query by Example)
or QUEL (similar to SQL and used for the INGRES RDBMS). A specific relational query
language is said to be relationally complete if it can be used to express any query that the
relational calculus supports.
There are two common ways of creating a relational calculus (both are based on First Order
Predicate Calculus, or basic logical operators).
In a Tuple Relational Calculus, variables range over tuples, i.e., variables can take on
values of individual table rows. This is just what we want to do a routine query, such
as selecting all food items (tuples) from a grocery store (table) where all the ingredients
(specific attribute) are organic (value), say.
In a Domain Relational Calculus, variables range over domain values of the attributes.
This tends to be more complex, and variables are required for each distinct attribute.
E.F. Codd's work that inspired RDBMSs was based on mathematical notions, so it is no surprise
that the theory of database operations is based on set theory. The Relational Algebra provides
a collection of operations to manipulate relations. It supports the notion of a query, or request
to retrieve information from a database.
Relational algebra is a procedural query language. Each operator takes one or two relations as
input and produces a new relation as its result. The fundamental operations in the relational
algebra are select, project, union, set difference and Cartesian product; others, such as
intersection and join, can be derived from them. The operators used on relational databases
are of the one-table or two-table type. One-table operators are:
(i) Restrict or Select and
(ii) Project
10.3.1 Restrict/Select: This operator extracts specified tuples (rows) from a given relation
based on a specified condition. We extract the tuples for which the condition is true, so the
result has at most as many rows as the original table. Sigma (σ) is used to denote the selection
operator. The predicate appears as a subscript to σ and the argument relation is given in
parentheses following σ. For example, to select tuples from Students where paid_status is okay:
σ paid_status = "okay" (Students)
The comparison operators (=, <, >, ≤, ≥, <>) and logical operators (∧ (and), ∨ (or), ¬ (not))
can be used in the selection predicate.
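A minimal Python sketch of the restrict/select operator follows. The relation is modelled as a list of dictionaries; the Students rows and the function name select are hypothetical, introduced only for illustration.

```python
# A relation modelled as a list of dicts (hypothetical Students data).
students = [
    {"matric": 1, "name": "Ade", "paid_status": "okay"},
    {"matric": 2, "name": "Bola", "paid_status": "owing"},
    {"matric": 3, "name": "Chi", "paid_status": "okay"},
]

def select(relation, predicate):
    """Restrict: keep only the tuples for which the predicate is true."""
    return [row for row in relation if predicate(row)]

# Selection with a single comparison.
paid = select(students, lambda r: r["paid_status"] == "okay")

# Comparisons combined with a logical 'and'.
paid_after_1 = select(
    students, lambda r: r["paid_status"] == "okay" and r["matric"] > 1)
```

The result keeps every column of the qualifying rows; only the number of rows is reduced.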
10.3.2 Project: This extracts specified attributes from a specified relation in the order in which
the attributes are specified. The operator picks certain attributes (columns) from a relation.
Project is denoted by pi (π). For example, if we want to list only the loan number and amount of
the loans, we write:
π loan-number, amount (Loan)
In summary, π attrlist (R) selects the columns with attributes in attrlist from relation R. We might
have a huge employee table with many attributes we don't want to see, so we can look at a more
directed projection of, perhaps, just SSN and salary. (If none of the projected attributes is a key,
potential duplicates are discarded.)
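The project operator, including the removal of duplicates, can be sketched in Python as follows. The Loan rows and the function name project are hypothetical illustrations.

```python
# Hypothetical Loan relation as a list of dicts.
loan = [
    {"loan_number": "L-11", "branch_name": "Ibadan", "amount": 900},
    {"loan_number": "L-14", "branch_name": "Lagos", "amount": 1500},
    {"loan_number": "L-15", "branch_name": "Ibadan", "amount": 1500},
]

def project(relation, attrlist):
    """Keep only the named attributes, in the given order, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attrlist)
        if values not in seen:
            seen.add(values)
            result.append(values)
    return result

# loan_number is a key here, so no duplicates can arise.
numbers_and_amounts = project(loan, ["loan_number", "amount"])

# amount is not a key: the two 1500 rows collapse into one.
amounts = project(loan, ["amount"])
```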
10.4.1 Cartesian Product
The Cartesian product r1 x r2 combines every tuple of r1 with every tuple of r2. Attributes of
the result are named with the relation as a prefix (for example, r1.A); the relation name prefix
can be dropped from those attributes that appear in only one of the relations. If we have n1
tuples in r1 and n2 tuples in r2, then there are n1 * n2 ways of choosing pairs of tuples, one
from each relation.
A        B        A x B
a c      x y      a c x y
b d      z i      a c z i
         j k      a c j k
                  b d x y
                  b d z i
                  b d j k
The product is seldom used alone; it is normally used as a step towards an equi, theta or natural
join. It takes two relations that are not necessarily union compatible (UC, i.e. having the same
tuple types) and creates tuples with combined attributes: R(AttrR1, AttrR2, ... , AttrRi) x
S(AttrS1, AttrS2, ... , AttrSj) results in Q with i + j attributes,
Q(AttrR1, ... , AttrRi, AttrS1, ... , AttrSj).
10.4.2 Union.
Given two "union compatible" (UC, i.e. having the same tuple types) relations, union returns a new
relation consisting of the set union of their tuples. For instance, suppose Akin John is the new
Chief Executive Officer (CEO) of two merged companies A and B and wants to see the total set of his
employees. The union operation is performed thus: A.EmployeeTable OR B.EmployeeTable.
Union builds a relation consisting of all tuples appearing in either or both of two specified
relations. The union of two relations r and s, denoted by r ∪ s, is the set of all tuples from r and
s. Duplicates are eliminated in order not to violate the uniqueness rule.
Given U = A ∪ B, if A has three tuples and B has five tuples, and no tuple appears in both, then
the result relation has 8 tuples. However, two relations of different degrees cannot be combined
by union. Likewise, intersection and difference cannot be applied to relations of different degrees.
For a union operation to be valid the following conditions must therefore be satisfied.
(i) The two relations must be of the same arity (degree).
(ii) The domain of the ith attribute in both relations must be the same for all i.
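A small Python sketch of union is given below. Each relation is modelled as a pair (attribute names, set of rows); the employee data and the function name union are hypothetical. For brevity the sketch checks only condition (i), the degree; checking the domains in condition (ii) is left out.

```python
def union(r, s):
    """Union of two relations, each given as (attribute tuple, set of rows)."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    if len(r_attrs) != len(s_attrs):
        raise ValueError("relations have different degrees")
    return (r_attrs, r_rows | s_rows)  # set union removes duplicates

# Hypothetical employee tables of the two merged companies.
A = (("name", "dept"), {("Ade", "Sales"), ("Bola", "IT"), ("Chi", "IT")})
B = (("name", "dept"), {("Chi", "IT"), ("Dayo", "HR")})

attrs, rows = union(A, B)  # Chi appears in both, so 3 + 2 gives 4 rows, not 5

try:
    union(A, (("name",), {("Eze",)}))  # degree 2 with degree 1
    degree_mismatch_refused = False
except ValueError:
    degree_mismatch_refused = True
```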
10.4.3 Intersection
Intersection creates a new relation by intersecting two UC relations. For instance, Akin John
wants a table of all organizations that are both vegetarian and raw-food in their orientation:
A.VegetarianOrganizations AND B.RawFoodOrganizations
The intersection builds a relation containing the tuples common to two relations of the same
degree. The intersection of two relations r and s, denoted by r ∩ s, is the set of all tuples
that are common to r and s. The intersection can be rewritten with a pair of set difference
operations as
r ∩ s = r – (r – s)
A good example of intersection is listing the customers who have an account and are also on a loan
in a bank.
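The bank example and the identity r ∩ s = r – (r – s) can be checked with Python sets. The customer names are hypothetical sample data; each relation has degree 1.

```python
# Hypothetical unary relations: customers with an account, customers with a loan.
account_holders = {("Ade",), ("Bola",), ("Chi",)}
borrowers = {("Bola",), ("Chi",), ("Dayo",)}

# Intersection computed directly.
both = account_holders & borrowers

# The same result using only set difference: r - (r - s).
via_difference = account_holders - (account_holders - borrowers)
```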
10.4.4 Difference
The Difference returns the set difference of two UC relations. For instance, Akin John wants
to look at a table of all restaurants in Ibadan that serve vegetarian food but not veal (meat).
This builds a relation consisting of all tuples appearing in the first but not in the second of
two specified relations.
In general, A – B is not the same as B – A; the two are equal only when A and B are the same
relation (both results are then empty). Difference is not commutative.
A        B        A – B    B – A
a c      x y      a c      z i
b d      z i      b d      j k
x y      j k
10.4.4 Divide / Quotient (÷)
Divide takes two relations, one binary and the other unary (degrees 2 and 1, respectively), and
builds a relation consisting of all values of one attribute of the binary relation that match (in
the other attribute) all values in the unary relation. In this form, a Divide can only be done
with two relations, one of degree 2 and the other of degree 1.
A                                B               C
Customer-name   Branch-name      Branch-name     Customer-name
Johnson         Ibadan           Ilorin          Smith
Smith           Ilorin           Oyo             Dejo
Alabi           Lagos
Ayisat          Abuja
Olu             Owerri
Dejo            Oyo
Mutiat          Osogbo
C lists the customers of A whose branch appears in B. Under the strict definition above, a
customer appears in the quotient only if A pairs that customer with every branch in B (here,
both Ilorin and Oyo).
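A minimal Python sketch of division under the strict definition (a value qualifies only when it is paired with every value of the unary relation) is given below. The data is hypothetical: it assumes that Smith and Dejo each hold accounts at both Ilorin and Oyo, and the function name divide is introduced for illustration.

```python
def divide(binary, unary):
    """A ÷ B: values x such that (x, y) is in `binary` for every y in `unary`."""
    candidates = {x for x, _ in binary}
    return {x for x in candidates if all((x, y) in binary for y in unary)}

# Hypothetical data: which customer holds an account at which branch.
A = {("Smith", "Ilorin"), ("Smith", "Oyo"),
     ("Dejo", "Oyo"), ("Dejo", "Ilorin"),
     ("Alabi", "Lagos"), ("Olu", "Oyo")}
B = {"Ilorin", "Oyo"}

C = divide(A, B)  # Olu is at Oyo only, so Olu does not qualify
```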
Cartesian Products
RecNo Matric1 Age Matric2 Cleared Hall
1 1 20 2 Y Zik
2 1 20 3 N Awo
3 1 20 4 Y Kuti
*4 2 24 2 Y Zik
5 2 24 3 N Awo
6 2 24 4 Y Kuti
7 3 44 2 Y Zik
*8 3 44 3 N Awo
9 3 44 4 Y Kuti
Equi Join (EJ) occurs where Matric1 = Matric2. RecNos 4 and 8 (marked *) satisfy the equi join,
since the matrics are the same.
For Natural Join (NJ), eliminate duplicate attributes, and then project from the Cartesian
product. So we have Matric1, Age, Cleared and Hall. Matric2 is ignored.
NJ = P(R(CP[A,B])), i.e. perform the Cartesian product first, restrict next and finally project.
A Theta Join (TJ) goes through the same process but we specify a condition that is not an
equality, e.g. Matric1 > Matric2
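The three joins can be derived step by step from the Cartesian product in a short Python sketch. The rows are those of the Cartesian Products table above; the variable names are chosen for illustration.

```python
from itertools import product

A = [(1, 20), (2, 24), (3, 44)]                           # (Matric1, Age)
B = [(2, "Y", "Zik"), (3, "N", "Awo"), (4, "Y", "Kuti")]  # (Matric2, Cleared, Hall)

# Cartesian product: every row of A paired with every row of B (3 x 3 = 9 rows).
cartesian = [a + b for a, b in product(A, B)]

# Equi join: restrict the product to rows where Matric1 = Matric2.
equi_join = [row for row in cartesian if row[0] == row[2]]

# Natural join: the equi join with the duplicate Matric2 column projected away.
natural_join = [(m1, age, cleared, hall)
                for m1, age, _m2, cleared, hall in equi_join]

# Theta join: same process, with a condition other than equality.
theta_join = [row for row in cartesian if row[0] > row[2]]
```

The two equi join rows are records 4 and 8 of the table.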
Summary
In this Unit, the concept of relational algebra was discussed. The different operators in
relational algebra were highlighted and illustrated with examples. The relational algebra forms
the basis for the Structured Query Language (SQL).
Unit 11 Structured Query Language (SQL) – An Introduction
Expected Duration: 1 week or 2 contact hours
Introduction
The relational operators were proposed as a means of manipulating relational databases by
indicating specific subsets of data tables, records or fields in the database for purposes of either:
(a) Retrieval
(b) Update
(c) Defining views or virtual relations
(d) Defining snapshot data
(e) Defining access rights, i.e. who should have access and at what time
(g) Defining integrity constraints, i.e. defining some specific rules that the database or
a section of it must satisfy.
The expressions or operators serve as a high-level and symbolic method of representing the user's
intent. The expressions also provide a high-level representation of the various database
manipulations that are required and, hence, should ideally be supported by a relational data
manipulation language such as SQL. In other words, the expressive power of any such language
can be assessed in terms of how well, and how conveniently for the user, the language
can be used to implement each of the operators.
Of course, if a language does not provide a high-level equivalent of the operators, or of any of
the operators, then the database administrator would have to provide it or the user would have
to program at a lower level to achieve that functionality. SQL was developed
by Donald Chamberlin and Raymond Boyce at IBM.
SQL is both a Data Definition Language (DDL) and a Data Manipulation Language (DML).
As a DDL, it allows a database administrator or database designer to define tables, create views,
etc. As a DML, it allows an end user to retrieve, insert, update and delete information in
tables. It came from an IBM Research project entitled "SEQUEL" where the intent was to create a
structured English-like query language to interface to the early System R database system. Along
with QUEL, SQL was among the first high-level declarative database languages.
Learning Outcomes
When you have studied this session, you should be able to explain:
11.1 Few SQL Commands
11.1.1 DDL- Data Definition Language
11.1.2 DML - Data Manipulation Language
11.1.3 DCL- Data Control Language
11.2 Creating Tables
11.3 SELECT STATEMENT (GENERAL FORMAT)
DROP TABLE, DROP VIEW, DROP INDEX - to totally remove the named table, view or index
ALTER TABLE, ALTER VIEW, ALTER INDEX - for updating
The check clause permits domains to be restricted.
10.2.2 CREATE UNIQUE INDEX STUDENT ON
Students (Matric) Cluster;
NOTE: For each table created, a unique index must be created in DB2. An index is created
to speed up the search for any data, i.e. information retrieval. The primary key (the unique
identification of each record) only enforces that records are not duplicated; it does not enforce
orderliness. The clustered index is what enforces orderliness of the records.
SELECT loanNumber FROM Loan WHERE branchName = 'Ibadan' AND amount > 1000
This means: find all loan numbers for loans made at the Ibadan branch with an amount greater than
N1000.
The WHERE clause can involve logical connectives such as AND, OR and NOT. The operands
of the logical connectives can be expressions involving the comparison operators <, >, =, >=, <=
and <>. A range of values for an operand in a comparison is specified by using the
BETWEEN keyword.
The FROM clause by itself defines a Cartesian product of the relations in the clause. Since
the natural join can be expressed in terms of a Cartesian product, a selection and a projection
we can easily write an SQL expression for natural join. An example is:
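As a minimal sketch of this idea, the query below expresses a natural join as a Cartesian product (FROM), a selection (WHERE) and a projection (SELECT list). It is run through Python's built-in sqlite3 module; the Borrower table, its columns and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Loan (loanNumber TEXT PRIMARY KEY, branchName TEXT, amount INTEGER);
CREATE TABLE Borrower (customerName TEXT, loanNumber TEXT);
INSERT INTO Loan VALUES ('L-11', 'Ibadan', 900), ('L-14', 'Ibadan', 1500);
INSERT INTO Borrower VALUES ('Smith', 'L-11'), ('Dejo', 'L-14');
""")

# FROM lists both tables (Cartesian product), WHERE keeps the matching pairs
# (selection), and the SELECT list keeps one copy of loanNumber (projection).
rows = conn.execute("""
    SELECT Borrower.customerName, Loan.loanNumber, Loan.amount
    FROM Borrower, Loan
    WHERE Borrower.loanNumber = Loan.loanNumber
    ORDER BY Loan.loanNumber
""").fetchall()

# Without the WHERE clause the FROM clause alone yields 2 x 2 = 4 pairs.
pairs_before_where = conn.execute(
    "SELECT COUNT(*) FROM Borrower, Loan").fetchone()[0]
```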
The ORDER BY clause lists items in ascending order by default. The sort order can however
be specified by specifying DESC for descending order or ASC for ascending order. Ordering
can also be performed on multiple attributes. For example, to list the entire loan table in
descending order of amount and to order loans that have the same amount by loanNumber, we
write SQL expression as:
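A sketch of such a query, run through Python's built-in sqlite3 module, is shown below. The Loan columns follow the earlier SELECT example; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Loan (loanNumber TEXT PRIMARY KEY, branchName TEXT, amount INTEGER);
INSERT INTO Loan VALUES ('L-17', 'Ibadan', 1000), ('L-11', 'Lagos', 900),
                        ('L-15', 'Ibadan', 1500), ('L-14', 'Oyo', 1500);
""")

# Descending by amount; loans with the same amount are ordered by loanNumber.
ordered = conn.execute(
    "SELECT * FROM Loan ORDER BY amount DESC, loanNumber ASC"
).fetchall()
```

The two loans of 1500 come first, and the tie between them is broken by loanNumber in ascending order.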
The grouping variable takes precedence over other conditions e.g. Grouping Courses by Depts.
Grouping variable is used for large groups and then imposing other conditions.
Summary
Structured Query Language (SQL) was presented in this Unit. All database creation,
modification and other manipulation is carried out using SQL commands.
Unit 12 Structured Query Language (SQL) – Continuation
Expected Duration: 1 week or 2 contact hours
Introduction
In this unit, we shall continue our study of SQL queries.
Learning Outcomes
When you have studied this session, you should be able to explain:
The ability to join two or more tables is probably the most powerful feature of relational
database systems.
(1) The FROM ... WHERE ... clause implements restriction
(2) The SELECT items clause implements projection
Equijoin - A field in one table must be equal to the corresponding field in another table. If the
comparison is not equality, then it is a theta join.
Result:
Name Hall
A Awo
But if instead of = we put <, <> or >, then we have a theta join:
WHERE Students.matric < Accomodation.matric
Lastly, when the condition is = and the SELECT list keeps only one copy of the matching
field, then we have a natural join.
12.1.3 Intersection and Union for two compatible tables- tables of the same degree and
same fields; duplicate records would merge
12.1.4 Intersection
X                      Y
Matric Name Age        Matric Name Age
1      A    20         1      A    20
2      B    20         2      B    20
3      C    30         4      D    40
SELECT X.matric, Y.Age
FROM X, Y
WHERE X.matric = Y.matric AND
X.Age = Y.Age AND
X.name = Y.name
For union and intersection, all the columns in the two tables must be the same, i.e, same
Domains. The Cartesian product is:
Summary
Structured Query Language (SQL) was presented in this Unit. All database creation,
modification and other manipulation is carried out using SQL commands.
You are interested in developing a product tracking system that will monitor the different
products a company is manufacturing, the sales on the products per day, the product type selling
most and the customers buying the products. Design a suitable RD model for the system.
Unit 13 Structured Query Language (SQL) – Continuation
Expected Duration: 1 week or 2 contact hours
Introduction
In this unit, we shall continue our study of SQL queries.
Learning Outcomes
When you have studied this session, you should be able to explain:
13.1 String Operations
13.2 Using IN in Nested Queries
13.3 Aggregate Functions in SQL
13.4 Use of GROUP BY
13.5 Use of HAVING Clause
13.6 SQL Update Operations - Putting data into the database
13.7 DELETE Operation
13.8 General Format for Inserting a New Record
Pattern matching is the most commonly used string operation. This is achieved through the
LIKE operator. Two special characters are also used:
(i) Percent (%): the % character matches any substring
(ii) Underscore (_): the _ character matches any single character.
It should be noted that patterns are case sensitive in standard SQL, although some systems
compare without regard to case by default. For example, the query “Find the names
of all customers whose street address includes the substring land” can be expressed thus:
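A sketch of such a query, run through Python's built-in sqlite3 module, is given below. The Customer table, its column names and the sample addresses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customerName TEXT, street TEXT);
INSERT INTO Customer VALUES ('Smith', 'Mainland Road'), ('Dejo', 'Ring Road'),
                            ('Alabi', 'Island Close');
""")

# % on both sides: 'land' may appear anywhere in the street address.
names = [r[0] for r in conn.execute(
    "SELECT customerName FROM Customer "
    "WHERE street LIKE '%land%' ORDER BY customerName")]

# _ stands for exactly one character: matches 'Ring Road'.
one_char = [r[0] for r in conn.execute(
    "SELECT customerName FROM Customer WHERE street LIKE '_ing Road'")]
```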
An escape character is used in patterns that contain a special pattern character (i.e. %, _) to
indicate that the special character should be treated like a normal character. The escape
character is placed immediately before the special character in the string. For example, using
backslash (\) as the escape character:
(i) LIKE 'ab\%cd' ESCAPE '\' matches the string “ab%cd”
(ii) LIKE 'ab\\cd%' ESCAPE '\' matches all strings beginning with “ab\cd”
SQL also permits a variety of functions on character strings, such as concatenation (using ||),
extracting substring, finding length of strings, converting between lowercase and uppercase,
etc.
Suggest the interpretation of the following statements:
or WHERE Matric like "19-21"
The UPDATE statement is used to change a value in a tuple without changing all values in the
tuple. Tuples to be updated are selected using a query. The general format is as follows:
UPDATE table
SET field = scalar value expression    e.g. SET State = 'OYO'
WHERE condition
UPDATE STUDENT
SET Hall = 'Akinola'
WHERE Hall = 'AWO'
Suppose accounts with a balance over N10000 receive 6% interest while others receive 5%; we write
UPDATE account
SET balance = balance * 1.06
WHERE balance > 10000
UPDATE account
SET balance = balance * 1.05
WHERE balance <= 10000
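The order of the two UPDATE statements matters: if the 5% statement ran first, an account at or just under N10000 would be pushed above N10000 and would then receive the 6% as well. A small sketch with Python's built-in sqlite3 module shows the difference; the account numbers, opening balances and function name are hypothetical.

```python
import sqlite3

RAISE_6 = "UPDATE account SET balance = balance * 1.06 WHERE balance > 10000"
RAISE_5 = "UPDATE account SET balance = balance * 1.05 WHERE balance <= 10000"

def run(statements):
    """Apply the statements in order to a fresh two-row account table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (accNo TEXT PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("A1", 20000.0), ("A2", 10000.0)])
    for sql in statements:
        conn.execute(sql)
    rows = conn.execute("SELECT accNo, balance FROM account")
    # Round to 2 decimal places to hide floating-point noise.
    return {acc: round(bal, 2) for acc, bal in rows}

as_written = run([RAISE_6, RAISE_5])  # order used in the text
swapped = run([RAISE_5, RAISE_6])     # A2 receives 5% and then 6%
```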
DELETE FROM R
WHERE P
The DELETE command operates on only one relation. To delete tuples from several
relations, we must use one DELETE command for each relation.
13.8 General Format for Inserting a New Record
The INSERT statement is used to insert one or more tuples (records) into a relation/table.
The general format is:
For example,
In the last two examples, tuples are inserted into the relation based on the result of a query. It is
important that the SELECT statement be evaluated fully before any insertion is carried out.
This is necessary to avoid the insertion of an unbounded number of tuples when the query reads
from the same relation that is being inserted into.
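A sketch of both kinds of insertion, run through Python's built-in sqlite3 module, is given below. The account table, its columns and the sample rows are hypothetical. The second statement inserts into the same table that its SELECT reads from; the rows it adds would themselves satisfy the WHERE condition, so the result is finite only because the query is evaluated before the insertion takes place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (accNo TEXT, branchName TEXT, balance INTEGER);
INSERT INTO account VALUES ('A1', 'Ibadan', 500), ('A2', 'Oyo', 700);
""")

# Inserting a single tuple by listing its values.
conn.execute("INSERT INTO account (accNo, branchName, balance) "
             "VALUES ('A3', 'Ibadan', 900)")

# Inserting the result of a query on the same table.
conn.execute("""
    INSERT INTO account
    SELECT accNo || '-copy', branchName, balance
    FROM account
    WHERE branchName = 'Ibadan'
""")

acc_nos = [r[0] for r in conn.execute(
    "SELECT accNo FROM account ORDER BY accNo")]
```

Exactly two copies are added, one for each Ibadan account that existed when the SELECT was evaluated.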
Summary
Structured Query Language (SQL) was presented in this Unit. Database creation, modification and
all other manipulations are carried out using SQL commands.
The Department is interested in an information system that will give information about
past projects done at Masters level, the titles, abstracts, authors and the supervisors.
Unit 14 Normalization of Relational Databases
Expected Duration: 1 week or 2 contact hours
Introduction
The theory of database normalization seeks to formalize the process of database design and
structuring. How do we know whether a particular set of relations in a database has been properly
or improperly structured? Proper structuring is judged by what will provide efficient,
information loss-free and adequate storage, retrieval and update functionality for the
database.
Learning Outcomes
When you have studied this session, you should be able to explain:
14.1 Meaning of Database Normalization
14.2 Basic Concepts
14.3 Database Problems without Normalization
14.3.1 Insertion Anomaly
14.3.2 Updating Anomaly
14.3.3 Deletion Anomaly
14.4 Normalization Rules
14.4.1 First (1st) Normal Form, 1NF
14.4.2 2nd Normal Form, 2NF
14.4.3 3rd Normal Form, 3NF
14.4.4 Boyce Codd Normal Form - BCNF
14.4.5 Fourth Normal Form (4NF)
14.1 Meaning of Database Normalization
Database design is about designing a database schema that remains robust as the database is
updated over time. We need to design the component relations of the database in such a way
that, at all times, the database can be easily searched and updated without problems.
Database design is also about reflecting the semantics of a data situation in the database,
so that the semantics become constraints to be specified or met by the database. For example,
the age domain in one company may be 25-55 while in another it could be 16-60; the semantics
of age differ between the two. Put another way, given a set of entities Ei and the attributes Aj
of each entity, how do we group the attributes into headings of relations to achieve the
objective of efficient database design?
The process of arranging the attributes to form headers of relations is referred to as the
normalization of the relations. Different levels of normalization exist: 1NF, 2NF, 3NF, BCNF,
etc. These are formal specifications of increasing rigour that ensure that a database meets the
objectives of a good schema design. These objectives include:
1. Arranging data into logical groups such that each group describes a part of the whole
2. Minimizing the amount of duplicated data stored in a database
3. Building a database in which we can access and manipulate the data quickly and
efficiently without compromising the integrity of the data storage
4. Organising the data such that, when we modify it, we make the changes in only one
place
Normalization is a complex process with many specific rules and different intensity levels. In
its full definition, normalization is the process of discarding repeating groups, minimizing
redundancy, eliminating composite keys for partial dependency and separating non-key
attributes.
In simple terms, the rules for normalization can be summed up in a single phrase: "Each
attribute (column) must be a fact about the key, the whole key and nothing but the key".
Said another way, each table should describe only one type of entity (information).
Relational database theorists have divided normalisation into several rules called normal
forms:
Un-normalised data = repeating groups, inconsistent data, delete and insert anomalies.
First Normal Form (no repeating groups) = each cell of a table must contain a single
value, and the table must not contain repeating groups.
Second Normal Form (each column must depend on the entire primary key) = the table
must meet all the requirements of the 1st form, and data which does not depend on the
whole of the table's primary key must be moved into another table.
Third Normal Form (each column must depend directly on the primary key) = the table
must meet all the requirements of both the 1st and 2nd forms, and all fields that
can be derived from data contained in other fields and tables must be removed.
NOTE: We must be able to reconstruct the original flat view of the data. If we violate this
rule, we will have defeated the purpose of normalizing the database.
14.2 Basic Concepts
(1) Primary key - Every relation must have a primary key whose value is unique for
every tuple of the relation, and which is therefore used to identify the tuples.
(2) Candidate key - Those keys that have the potential to uniquely identify the tuples of a
relation. The primary key is chosen from the candidate keys. The candidate keys normally
have a 1 to 1 correspondence with one another in a relation.
(3) Functional Dependency: Given a relation R, attribute Y is said to be functionally
dependent on attribute X (R.X → R.Y) if and only if each X value in R is associated with
precisely one Y value in R at any one time. X and Y may be composite. For example, Address
may be functionally dependent on matric number, i.e. once we know the matric number, we
can know the address: Matric → Address (Address is functionally dependent on Matric,
or Matric determines Address).
(4) Transitive Functional Dependency:
Given Matric → Name → Next of kin, i.e. we may not be able to get next of kin directly
through matric but through name, which is functionally dependent on matric.
NOTE: Functional dependency depends on the data situation (semantics).
(5) Full Functional Dependency [FFD]
Attribute Y of relation R is said to be fully functionally dependent on attribute X from
the same relation if it is functionally dependent on X but not functionally dependent on
any proper subset of X.
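The definitions of functional dependency and full functional dependency can be expressed as two short checks over the rows of a relation. This is an illustrative sketch in Python; the function names and the sample rows are made up. Since a functional dependency is a statement about the semantics of the data, a check on sample rows can only show that the rows are consistent with a dependency, or that they contradict it.

```python
from itertools import combinations

def holds_fd(rows, lhs, rhs):
    """True if every lhs value is paired with exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, two different Y values
    return True

def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds but no proper subset of lhs determines rhs."""
    if not holds_fd(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds_fd(rows, subset, rhs):
                return False
    return True

# Made-up sample rows, for illustration only.
RESULTS = [
    {"Matric": 68888, "Course": "CSC 711", "Mark": 62, "Address": "Ibadan"},
    {"Matric": 68888, "Course": "CSC 712", "Mark": 55, "Address": "Ibadan"},
    {"Matric": 68911, "Course": "CSC 711", "Mark": 70, "Address": "Ekiti"},
]
```

In these rows Matric → Address holds, Mark is fully functionally dependent on (Matric, Course), and Address is not, because Matric alone already determines it.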
14.3 Database Problems without Normalization
If a table is not properly normalized and has data redundancy, it will not only take up extra
storage space but will also make the database difficult to handle and update without
data loss. Insertion, Updating and Deletion anomalies are very frequent if a database is not
normalized. To understand these anomalies, let us take an example of a Student table.
In the table above, we have data of 4 Computer Science students. As we can see, the data for the
fields Department, HOD (Head of Department) and office_tel are repeated for the students
who are in the same Department in the college. This is Data Redundancy.
14.3.1 Insertion Anomaly
Now, if we have to insert data of 100 students of the same Department, then the Department
information will be repeated for all those 100 students. These scenarios are nothing
but Insertion anomalies.
14.3.2 Updating Anomaly
What if Dr. Chuks leaves the college, or is no longer the HOD of the Computer Science
department? In that case all the student records will have to be updated, and if by mistake we
miss any record, it will lead to data inconsistency. This is an Updating anomaly.
14.3.3 Deletion Anomaly
In our Student table, two different kinds of information are kept together: Student information
and Department information. Hence, at the end of the academic year, if student records are
deleted, we will also lose the Department information. This is a Deletion anomaly.
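The updating and deletion anomalies can be reproduced on a small un-normalized table. The sketch below assumes SQLite through Python's sqlite3 module; the table name student_flat, the student names and the replacement HOD "Dr. New" are invented for illustration.

```python
import sqlite3

def anomaly_demo():
    """Show an updating anomaly and a deletion anomaly on a flat Student table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student_flat ("
                 "Matno INTEGER PRIMARY KEY, name TEXT, Department TEXT, HOD TEXT)")
    conn.executemany("INSERT INTO student_flat VALUES (?, ?, ?, ?)", [
        (1, "Ade", "Computer Science", "Dr. Chuks"),
        (2, "Bola", "Computer Science", "Dr. Chuks"),
        (3, "Chidi", "Computer Science", "Dr. Chuks"),
    ])
    # Updating anomaly: the HOD changes, but one record is missed by mistake,
    # so the same department now appears to have two different HODs.
    conn.execute("UPDATE student_flat SET HOD = 'Dr. New' WHERE Matno IN (1, 2)")
    distinct_hods = conn.execute(
        "SELECT COUNT(DISTINCT HOD) FROM student_flat").fetchone()[0]
    # Deletion anomaly: removing the student records also removes every
    # trace of the department.
    conn.execute("DELETE FROM student_flat")
    departments_left = conn.execute(
        "SELECT COUNT(DISTINCT Department) FROM student_flat").fetchone()[0]
    conn.close()
    return distinct_hods, departments_left
```

The function returns 2 distinct HODs after the incomplete update and 0 departments after the deletion. Keeping department information in its own table would avoid both problems.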
14.4.1 First (1st) Normal Form, 1NF
A relation is in 1NF iff (if and only if) all underlying simple domains contain atomic values
only, that is, each of the attributes of the table has an atomic domain. Atomic values are
taken as units that are indivisible.
The 1st (First) Normal Form is like Step 1 of the normalization process. It expects us to
design our table in such a way that it can easily be extended and data can easily be retrieved
from it whenever required.
Previously we saw how data redundancy or repetition can lead to several issues such as
Insertion, Deletion and Updating anomalies, and how normalization can reduce data redundancy
and make the data more meaningful. If the tables in a database are not even in the 1st Normal
Form, the database design is considered bad.
For example, if we have a column dob to save the dates of birth of a set of people, then we must
not save the names of some of them in that column along with the dates of birth of others.
The column should hold only a date of birth for every record/row.
Address and Name are not atomic in this case. They could be broken down as follows:
As another example, the relation person with attributes idNumber, name and diploma, where
diploma is the set of diplomas that a person possesses, is not in 1NF. But the relations
person(idNumber, name) and persDiploma(DipID, DipName) are in 1NF.
Another Example:
Although all the rules are self-explanatory, still let's take an example where we will create a
table to store student data which will have student's Matno, their name and the name of Courses
they have opted for. Here is our table, with some sample data added to it.
Our table already satisfies 3 rules out of the 4 rules, as all our column names are unique, we
have stored data in the order we wanted to and we have not inter-mixed different type of data
in columns.
But out of the 3 different students in our table, two have opted for more than one Course. And
we have stored the subject names in a single column. But as per the 1st Normal form each
column must contain atomic value.
Matno   name      Course
101     Akinola   OS
101     Akinola   CN
103     Clara     Java
102     Aminat    C
102     Aminat    C++
By doing so, although a few values are repeated, the values in the Course column are
now atomic for each record/row.
With the First Normal Form, data redundancy increases, as the same data will appear in some
columns across multiple rows, but each row as a whole will be unique.
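The step from a multi-valued Course column to one atomic value per row can be sketched in a few lines of Python. The un-normalized rows below are an assumption: the sketch supposes that the courses were stored as a comma-separated list, which the text does not state explicitly.

```python
def to_first_normal_form(rows):
    """Split a comma-separated Course column into one row per course."""
    flat = []
    for matno, name, courses in rows:
        for course in courses.split(","):
            flat.append((matno, name, course.strip()))
    return flat

# Assumed un-normalized form of the student data (courses packed in one cell).
UNNORMALISED = [
    (101, "Akinola", "OS, CN"),
    (103, "Clara", "Java"),
    (102, "Aminat", "C, C++"),
]
```

Calling to_first_normal_form(UNNORMALISED) produces the five rows of the table in the text, with Matno and name repeated for each course.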
14.4.2 2nd Normal Form, 2NF
A relation is in 2NF if and only if (iff) it is in 1NF and every non-key attribute in the relation
is fully functionally dependent on the primary key, i.e., there should be no Partial Dependency.
This means that a relation in 2NF is already in 1NF and any attribute not belonging to a key
does not depend on a part of the key.
In this table, Matno is the primary key and will be unique for every row, hence we can
use Matno to fetch any row of data from this table. Even where student names are the
same, if we know the Matno, we can easily fetch the correct record.
Hence we can say that a Primary Key for a table is the column or group of columns (composite
key) which can uniquely identify each record in the table. We can ask for the Department of
the student with Matno 10, and we can get it. Similarly, if we ask for the name of the student
with Matno 10 or 11, we will get it. So all we need is Matno, and every other
column depends on it, or can be fetched using it. This is Dependency, and we also call
it Functional Dependency.
So, What is Partial Dependency?
Now that we know what dependency is, we are in a better state to understand what partial
dependency is. For a simple table like Student, a single column like Matno can uniquely
identify all the records in a table. But this is not true all the time. So now let's extend our
example to see if more than 1 column together can act as a primary key.
Let's create another table, Course, which will have Course_id and Course_Title fields, with
Course_id as the primary key. Now we have a Student table with student information and
another table, Course, for storing course information.
Course Table
Course_id   Course_Title
1           Java
2           C++
3           Php
Let's create another table, Score, to store the marks obtained by students in the respective
courses. We will also be saving the name of the lecturer who teaches each course along with
the marks.
Score Table
score_id   Matno   Course_id   marks   Lecturer
1          10      1           70      Java Teacher
2          10      2           75      C++ Teacher
3          11      1           80      Java Teacher
In the Score table we save the Matno to know which student's marks these are,
and the Course_id to know which Course the marks are for. Together, Matno +
Course_id form a Candidate Key for this table, which can be the Primary key.
You may wonder how this combination can be a primary key. If we are asked to get the
marks of the student with Matno 10, can we get them from this table? No, because we don't know
for which Course. And if we are given the Course_id, we would not know for which student.
Hence we need Matno + Course_id to uniquely identify any row.
As just discussed, the primary key for this table is a composition of two columns,
Matno and Course_id, but the Lecturer's name depends only on the Course, hence on
the Course_id, and has nothing to do with Matno. This is Partial Dependency, where
an attribute in a table depends on only a part of the primary key and not on the whole key.
The simplest solution is to remove the Lecturer column from the Score table and add it to the
Course table. Hence, the Course table will become:
And our Score table is now in the second normal form, with no partial dependency.
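The decomposition can be sketched as follows, assuming SQLite through Python's sqlite3 module. The column names follow the tables in the text; dropping score_id and declaring (Matno, Course_id) as the primary key is a simplification made here. The final query shows that the original flat view, with the Lecturer beside each mark, can still be reconstructed by a join.

```python
import sqlite3

def second_normal_form_demo():
    """Store Lecturer once per course and rebuild the flat view with a join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE course (
            Course_id   INTEGER PRIMARY KEY,
            Course_name TEXT,
            Lecturer    TEXT      -- depends on Course_id only
        );
        CREATE TABLE score (
            Matno     INTEGER,
            Course_id INTEGER REFERENCES course(Course_id),
            marks     INTEGER,    -- depends on the whole key
            PRIMARY KEY (Matno, Course_id)
        );
        INSERT INTO course VALUES (1, 'Java', 'Java Teacher'),
                                  (2, 'C++', 'C++ Teacher'),
                                  (3, 'Php', 'Php Teacher');
        INSERT INTO score VALUES (10, 1, 70), (10, 2, 75), (11, 1, 80);
    """)
    rows = conn.execute("""
        SELECT s.Matno, s.Course_id, s.marks, c.Lecturer
        FROM score AS s JOIN course AS c ON c.Course_id = s.Course_id
        ORDER BY s.Matno, s.Course_id
    """).fetchall()
    conn.close()
    return rows
```

Each lecturer's name is now stored once, yet the join returns the same three rows that the original Score table held.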
Matric   Course
68888    CSC 711
68888    CSC 712
68888    CSC 741
68911    CSC 711
NOTE: Matric and Course together now serve as the primary key. It is an all-key relation.
If we put course in the first table and we have 20 courses, it means each of the other
attributes will be repeated 20 times for each student, which is data redundancy.
As another example, the relation stock(prodNo, depotNo, label, quantity) with prodNo and
depotNo as joint key and functional dependencies F = {(prodNo, depotNo) → quantity,
prodNo → label} is not in 2NF. After normalization, we will have
stock(prodNo, depotNo, quantity) and product(prodNo, label).
14.4.3 3rd Normal Form, 3NF
A relation is in 3NF iff it is in 2NF and every non-key attribute is non-transitively dependent
on the primary key. Another way of stating this is that a relation is in 3NF iff the non-key
attributes (if any) are (a) mutually independent and (b) fully dependent on the primary key,
i.e. "There is no functional dependency among the non-key attributes": any attribute not
belonging to a key does not depend on any other non-key attribute. For example, in
R1 = (A1, A2, A3, A4, A5), suppose there is transitivity among A3 and A4, which are different
non-key attributes in the same relation. Remove them to form another relation, so that we now
have two relations of the forms
Summarily, Third Normal Form is an upgrade to Second Normal Form. When a table is in the
Second Normal Form and has no transitive dependency, then it is in the Third Normal Form.
Let's use the same example, where we have 3 tables, Student, Course and Score.
Student Table
Matno   Name      Reg_no   Course   Address
10      Akinola   07-WY    CSC      Ibadan
11      Akinola   08-WY    IT       Ekiti
12      Kelechi   09-WY    IT       Enugu
Course Table
Course_id   Course_name   Lecturer
1           Java          Java Teacher
2           C++           C++ Teacher
3           Php           Php Teacher
Score Table
Score_id   Matno   Course_id   Marks
1          10      1           70
2          10      2           75
3          11      1           80
In the Score table, we need to store some more information, which is the exam name and total
marks, so let's add 2 more columns to the Score table.
With exam_name and total_marks added to our Score table, it saves more data now. Primary key
for our Score table is a composite key, which means it's made up of two attributes or columns
→ Matno + Course_id.
Our new column exam_name depends on both student and Course. For example, a mechanical
engineering student will have Workshop exam but a computer science student won't. And for
some subjects we have Practical exams and for some we don't. So we can say that exam_name is
dependent on both Matno and Course_id.
And what about our second new column total_marks? Does it depend on our Score table's
primary key?
Well, the column total_marks depends on exam_name as with exam type the total score changes.
For example, practicals are of less marks while theory exams are of more marks.
But exam_name is just another column in the Score table. It is not a primary key or even a part
of the primary key, and total_marks depends on it. This is Transitive Dependency: a non-prime
attribute depends on other non-prime attributes rather than on the prime attributes or
primary key.
The new Exam table
exam_id   exam_name    total_marks
1         Workshop     200
2         Mains        70
3         Practicals   30
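A sketch of the resulting design is given below, assuming SQLite through Python's sqlite3 module. The Exam rows are those of the table above; the Score rows, including which exam each score belongs to, are invented for illustration. Because total_marks is stored once per exam, a single UPDATE changes it for every score that refers to that exam.

```python
import sqlite3

def third_normal_form_demo():
    """Keep total_marks in the Exam table and reference it from Score."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE exam (
            exam_id     INTEGER PRIMARY KEY,
            exam_name   TEXT,
            total_marks INTEGER
        );
        CREATE TABLE score (
            Matno     INTEGER,
            Course_id INTEGER,
            exam_id   INTEGER REFERENCES exam(exam_id),
            marks     INTEGER,
            PRIMARY KEY (Matno, Course_id)
        );
        INSERT INTO exam VALUES (1, 'Workshop', 200), (2, 'Mains', 70),
                                (3, 'Practicals', 30);
        -- Hypothetical scores: two Mains exams and one Practicals exam.
        INSERT INTO score VALUES (10, 1, 2, 60), (10, 2, 3, 25), (11, 1, 2, 65);
    """)
    # One change in one place: Mains is now marked out of 100.
    conn.execute("UPDATE exam SET total_marks = 100 WHERE exam_name = 'Mains'")
    rows = conn.execute("""
        SELECT s.Matno, s.Course_id, e.exam_name, e.total_marks
        FROM score AS s JOIN exam AS e ON e.exam_id = s.exam_id
        ORDER BY s.Matno, s.Course_id
    """).fetchall()
    conn.close()
    return rows
```

Both Mains scores see the new total of 100 after the single update, so the updating anomaly cannot occur for total_marks.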
14.4.4 Boyce Codd Normal Form - BCNF
A relation is in BCNF iff every determinant is a candidate key. A determinant is any attribute
of a relation on which some other attributes are fully functionally dependent. Stated
alternatively, for a relation to be in BCNF, each of the non-candidate key attributes must be
fully functionally dependent on each of the candidate keys.
Of course, because each candidate key is in a 1:1 relationship with each of the other candidate
keys, it means that once a non-candidate key attribute is fully functionally dependent on one of
the candidate keys, it is by implication also dependent on all the other candidate keys. As an
illustration, consider the following relation schemas and their functional dependencies.
The Customer and Branch schemas are in BCNF, because the only nontrivial dependencies in
the schemas hold on their candidate keys.
The LoanInfo schema however is not in BCNF. First, note that loanNumber is not a superkey
for LoanInfo schema, since we could have a pair of tuples representing a single loan made to
two people, for example,
(Ibadan, Mr John, L100, 1000)
(Ibadan, Mrs Janet, L100, 1000)
Further Explanations on BCNF
Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals
with certain type of anomaly that is not handled by 3NF. A 3NF table which does not have
multiple overlapping candidate keys is said to be in BCNF.
Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is also known
as 3.5 Normal Form.
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:
1. It should be in the Third Normal Form.
2. And, for any dependency A → B, A should be a super key.
The second point sounds a bit tricky, right? In simple words, it means, that for a dependency
A → B, A cannot be a non-prime attribute, if B is a prime attribute.
Example:
Below we have a college enrolment table with columns Matno, Course and professor.
As can be seen, we have also added some sample data to the table.
In the table above:
One student can enrol for multiple Courses. For example, student with Matno 101, has
opted for Courses - Java & C++
For each Course, a professor is assigned to the student.
And, there can be multiple professors teaching one Course like we have for Java.
Well, in the table above Matno, Course together form the primary key, because using Matno
and Course, we can find all the columns of the table.
One more important point to note here is, one professor teaches only one Course, but one
Course may have two different professors. Hence, there is a dependency
between Course and professor here, where Course depends on the professor name.
This table satisfies the 1st Normal form because all the values are atomic, column names are
unique and all the values stored in a particular column are of same domain. This table also
satisfies the 2nd Normal Form as there is no Partial Dependency. And, there is
no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
In the table above, Matno, Course form primary key, which means Course column is a prime
attribute.
Student Table
Matno   Prof_id
101     1
101     2
and so on...
Professor Table
Prof_id   Professor    Course
1         Prof. Java   Java
2         Prof. Cpp    C++
and so on...
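The BCNF condition "for any dependency A → B, A should be a super key" can be checked mechanically by computing the closure of A under the functional dependencies. The sketch below is in Python; the function names are made up, and the dependencies listed are the ones described for the enrolment table: (Matno, Course) determines professor, and professor determines Course.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """List the non-trivial dependencies whose left side is not a superkey."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                  # skip trivial dependencies
        and not set(schema) <= closure(lhs, fds)     # lhs is not a superkey
    ]

ENROLMENT = ("Matno", "Course", "professor")
ENROLMENT_FDS = [
    (("Matno", "Course"), ("professor",)),
    (("professor",), ("Course",)),
]
```

For the enrolment table the check reports professor → Course as the violation, since professor alone does not determine Matno. For the decomposed Student(Matno, Prof_id) and Professor(Prof_id, Professor, Course) tables, taking Prof_id → Professor, Course as the only dependency, it reports none.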
14.4.5 Fourth Normal Form (4NF)
Example:
Below we have a college enrolment table with columns Matno, course and hobby.
As you can see in the table above, the student with Matno 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.
One must be thinking what problem this can lead to, right?
Well the two records for student with Matno 1, will give rise to two more records, as shown
below, because for one student, two hobbies exists, hence along with both the courses, these
hobbies should be specified.
And, in the table above, there is no relationship between the columns course and hobby. They
are independent of each other. So there is a multi-valued dependency, which leads to
unnecessary repetition of data and other anomalies as well.
CourseOpted Table
Matno   course
1       Science
1       Maths
2       C#
2       Php
And, Hobbies Table,
Matno   hobby
1       Cricket
1       Hockey
2       Cricket
2       Hockey
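The two tables can be loaded and joined to see the repetition that the single table would have had to store. This is a sketch that assumes SQLite through Python's sqlite3 module; the rows are the ones shown in the CourseOpted and Hobbies tables.

```python
import sqlite3

def fourth_normal_form_demo():
    """Store courses and hobbies separately; join only when a combined view is needed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CourseOpted (Matno INTEGER, course TEXT,
                                  PRIMARY KEY (Matno, course));
        CREATE TABLE Hobbies (Matno INTEGER, hobby TEXT,
                              PRIMARY KEY (Matno, hobby));
        INSERT INTO CourseOpted VALUES (1, 'Science'), (1, 'Maths'),
                                       (2, 'C#'), (2, 'Php');
        INSERT INTO Hobbies VALUES (1, 'Cricket'), (1, 'Hockey'),
                                   (2, 'Cricket'), (2, 'Hockey');
    """)
    # Every course of student 1 paired with every hobby of student 1.
    combined = conn.execute("""
        SELECT c.Matno, c.course, h.hobby
        FROM CourseOpted AS c JOIN Hobbies AS h ON h.Matno = c.Matno
        WHERE c.Matno = 1
        ORDER BY c.course, h.hobby
    """).fetchall()
    conn.close()
    return combined
```

For student 1, the join yields four rows (2 courses x 2 hobbies), while the decomposed tables hold only two rows each for that student. The gap widens as a student takes more courses or hobbies.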
Summarily...
A determinant is any attribute, simple or composite, on which another attribute in a
relation is Fully Functionally Dependent (FFD). Candidate keys are mutually functionally
dependent on each other.
1NF - means atomicity of attribute values.
2NF - every non-key attribute is fully functionally dependent on the primary key. 2NF is
only an issue when the primary key is composite.
3NF - no transitivity. Every non-primary key field must be directly functionally dependent
on the primary key.
BCNF = 3NF'. Not substantially different from 3NF. It comes into play when we have
more than one candidate key in a relation. There is a subtle difference between the two.
Mark must be fully dependent on both attributes of the composite primary key. This is Full
Functional Dependency, FFD. Partial dependency occurs if mark depends on only one of them,
e.g. if the same mark is awarded to all students depending only on the course.
NOTE: from the above schema, Age is transitively dependent on matric through Name.
BCNF Tries to alleviate this problem of 3NF by saying that if Name is also candidate key, then
the problem is removed.
P A1 A2 A3
P A1 A2
A1 A3
Student(Matric, Name, Hall, Sex, Dept, faculty, Age, Course1, Mark1, Course2, Mark2,
Course3, Mark3, NextKin1, NextKin2, Allergy1, Allergy2, Sports1, Sports2, FeeType1Paid,
AmountPaid, DatePaid, FeeType2Paid)
(1) Develop a database schema for the above data situation that satisfies BCNF.
(2) Explain the concept of database anomalies, citing examples from the relation above.
(3) Give any two reasons why you think database normalization is very important. Discuss
the process of normalizing database tables up to 3rd Normal Form (3NF).
Unit 15 Database Security
Expected Duration: 1 week or 2 contact hours
Introduction
Data is a valuable resource that must be carefully handled and managed, as with any
economic resource. Some or all of an organization's commercial data may have tactical
importance to the organization and must therefore be kept protected and confidential.
There is a range of computer-based controls that are offered as countermeasures to threats
against data. In this Unit, you will learn about the scope of database security.
Learning Outcomes
When you have studied this session, you should be able to explain:
15.1 Meaning of Database Security
15.2 Importance of Data Security
15.3 Threats to Database
15.3.1 Major Threats to Data Security
15.3.2 Types of Threats
15.4 Integrity Controls: Backups
15.5 Aspects of Data Security
15.6 Types of Security Control on Data
15.7 Database Security Best Practices
15.8 Role of Database Administrator in Data Security
15.9 How to Secure a Database Server
15.10 Security in SQLs
15.1 Meaning of Database Security
Database security involves ensuring that users of a database system are allowed or not allowed
to do what they want on the database depending on their circumstances. It is the technique that
protects and secures the database against intentional or accidental threats. Security concerns
are relevant not only to the data residing in an organization's database: a breach of
security may harm other parts of the system, which may ultimately affect the database
structure.
Consequently, database security covers hardware, software, human resources and data.
Effective security requires appropriate controls, which are defined in terms of a specific
mission and purpose for the system. The need for proper security, often neglected or
overlooked in the past, is now checked more and more thoroughly by organizations.
These listed circumstances mostly signify the areas in which the organization should focus on
reducing the risk, that is, the chance of incurring loss or damage to data within a database. In
some conditions, these areas are directly related such that an activity that leads to a loss in one
area may also lead to a loss in another since all of the data within an organization are
interconnected.
By contrast, database integrity is concerned with ensuring that the things that users want to
do are correct in terms of not damaging the accuracy or validity of the data in the database.
15.2 Importance of Data Security
Data security is critical for most businesses and even home computer users. Client information,
payment information, personal files, bank account details - all this information can be hard to
replace and potentially dangerous if it falls into the wrong hands. Losing data to a disaster such
as a flood or fire is crushing, but losing it to hackers or a malware infection can have much
greater consequences.
15.3 Threats to Database
A threat is any situation or event, whether intentional or accidental, that can cause damage
and adversely affect the database structure and, consequently, the organization. A threat
may arise from a situation or event involving a person, or from an action or circumstance that
is likely to bring harm to an organization and its database.
The degree of harm that an organization suffers as a result of a threat depends on several
factors, such as the existence of countermeasures and contingency plans. For example, if a
hardware failure corrupts secondary storage, all processing activity must cease until the
problem is resolved.
(1) Insider Threats
An insider threat is a security risk from one of the following three sources, each of which has
privileged means of entry to the database:
A malicious insider with ill intent
A negligent person within the organization who exposes the database to attack through
careless actions
An outsider who obtains credentials through social engineering or other methods, or
gains access to the database’s credentials
An insider threat is one of the most typical causes of database security breaches and it often
occurs because a lot of employees have been granted privileged user access.
(2) Human Error
Weak passwords, password sharing, accidental erasure or corruption of data, and other
undesirable user behaviors are still the cause of almost half of data breaches reported.
(3) Exploitation of Database Software Vulnerabilities
Attackers constantly attempt to isolate and target vulnerabilities in software, and database
management software is a highly valuable target. New vulnerabilities are discovered daily, and
all open source database management platforms and commercial database software vendors
issue security patches regularly. However, if we don’t use these patches quickly, our database
might be exposed to attack. Even if we do apply patches on time, there is always the risk
of zero-day attacks, when attackers discover a vulnerability, but it has not yet been discovered
and patched by the database vendor.
(4) SQL/NoSQL Injection Attacks
A database-specific threat involves the insertion of arbitrary SQL and non-SQL attack strings
into database queries. Typically, these are queries created as an extension of web application
forms, or received via HTTP requests. Any database system is vulnerable to these attacks if
developers do not adhere to secure coding practices and if the organization does not carry out
regular vulnerability testing.
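One secure coding practice against injection is to pass user input as query parameters instead of pasting it into the SQL text. The sketch below contrasts the two approaches, assuming SQLite through Python's sqlite3 module. The users table, the account "ada" and the attack string are invented, and the password is stored in plain text only to keep the example short.

```python
import sqlite3

def build_db():
    """Create a throwaway users table with one hypothetical account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('ada', 'secret')")
    return conn

def login_unsafe(conn, username, password):
    # Vulnerable: user input becomes part of the SQL statement itself.
    query = ("SELECT COUNT(*) FROM users WHERE username = '" + username +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchone()[0] > 0

def login_safe(conn, username, password):
    # Parameterized: the ? placeholders keep the input as data, never as SQL.
    query = "SELECT COUNT(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone()[0] > 0

# Attack string typed into the password field of a form.
ATTACK = "' OR '1'='1"
```

With the attack string as the password, login_unsafe accepts the login because the condition becomes ... AND password = '' OR '1'='1', which is true for every row. login_safe rejects it, since the whole string is compared with the stored password.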
(5) Buffer Overflow Attacks
Buffer overflow takes place when a process tries to write a large amount of data to a
fixed-length block of memory, more than it is permitted to hold. Attackers might use the
excess data, kept in adjacent memory addresses, as the starting point from which to launch
attacks.
(6) Denial of Service (DoS/DDoS) Attacks
In a Denial of Service (DoS) attack, the cybercriminal overwhelms the target service - in this
instance, the database server - using a large amount of fake requests. The result is that the server
cannot carry out genuine requests from actual users, and often crashes or becomes unstable.
In a Distributed Denial of Service Attack (DDoS), fake traffic is generated by a large number
of computers, participating in a botnet controlled by the attacker. This generates very large
traffic volumes, which are difficult to stop without a highly scalable defensive architecture.
Cloud-based DDoS protection services can scale up dynamically to address very large DDoS
attacks.
(7) Malware
15.4 Integrity Controls: Backups
Backing up is the process of copying and archiving computer data so that it may be used to
restore the original after a data loss event.
Backups have two distinct purposes:
The primary purpose is to recover data after its loss, be it by data deletion or corruption.
The secondary purpose of backups is to recover data from an earlier time, according to
a user-defined data retention policy, typically configured within a backup application,
which specifies how long copies of data are required. Backup is just one part of a disaster
recovery plan.
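The primary purpose, recovering data after its loss, can be sketched with the backup facility of SQLite as exposed by Python's sqlite3 module (Connection.backup, available from Python 3.7). The student rows are sample data, and in-memory databases are used only to keep the sketch self-contained; a real backup would be written to a file kept in a secure location.

```python
import sqlite3

def backup_and_restore():
    """Copy a database, simulate a data loss, then restore from the copy."""
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE student (Matno INTEGER PRIMARY KEY, name TEXT)")
    live.executemany("INSERT INTO student VALUES (?, ?)",
                     [(10, "Akinola"), (12, "Kelechi")])
    live.commit()

    # Take the backup: copies the whole database into another connection.
    backup = sqlite3.connect(":memory:")
    live.backup(backup)

    # Simulated data loss event on the live database.
    live.execute("DELETE FROM student")
    live.commit()
    live_count = live.execute("SELECT COUNT(*) FROM student").fetchone()[0]

    # Restore into a fresh database from the backup copy.
    restored = sqlite3.connect(":memory:")
    backup.backup(restored)
    restored_count = restored.execute("SELECT COUNT(*) FROM student").fetchone()[0]

    for conn in (live, backup, restored):
        conn.close()
    return live_count, restored_count
```

After the simulated loss the live database holds 0 student rows, while the database restored from the backup holds both.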
2. Auditing
Database auditing involves observing a database so as to be aware of the actions of
database users. Database administrators and consultants often set up auditing
for security purposes, for example, to ensure that those without the permission to access
information do not access it.
3. Authentication
This is the validation control that allows login into a system, email or blog account etc.
Once logged in, we have various privileges until logging out. Some systems will cancel a
session if the machine has been idle for a certain amount of time, requiring that we prove
authentication once again to re-enter. We can log in using multiple factors such as a
password, a smart card or even a fingerprint.
4. Encryption
This security mechanism uses mathematical scheme and algorithms to scramble data into
unreadable text. It can only be decoded or decrypted by the party that possesses the
associated key.
5. Back Up
This is the process of copying and archiving computer data so that the copy can be used to
restore the original data in the event of data loss. Every Database Management
System should offer backup facilities to help with the recovery of a database after a
failure. It is always advisable to make backup copies of the database and log files at
regular intervals and to ensure that the copies are kept in a secure location. In the event of a
failure that renders the database unusable, the backup copy and the details captured in the
log file are used to restore the database to the latest possible consistent state.
6. Password
This is a sequence of secret characters used to enable access to a file,
program, computer system and other resources.
2. Separate database servers
Databases require specialized security measures to keep them safe from cyberattacks.
Furthermore, having our data on the same server as our site also exposes it to different attack
vectors that target websites.
Suppose we run an online store and keep our site, non-sensitive data and sensitive data on the
same server. Sure, we can use website security measures provided by the hosting service and
the eCommerce platform’s security features to protect against cyberattacks and fraud.
However, our sensitive data is now vulnerable to attacks through the site and the online store
platform. Any attack that breaches either our site or the online store platform enables the
cybercriminal to potentially access our database, as well.
To mitigate these security risks, we separate our database servers from everything else.
Additionally, we use real-time Security Information and Event Monitoring (SIEM), which is
dedicated to database security and allows organizations to take immediate action in the event
of an attempted breach.
3. Set up an HTTPS proxy server
A proxy server evaluates requests sent from a workstation before accessing the database server.
In a way, this server acts as a gatekeeper that aims to keep out non-authorized requests.
The most common proxy servers are based on HTTP. However, if we are dealing with sensitive
information such as passwords, payment information or personal information, we set up an
HTTPS server. This way, the data traveling through the proxy server is also encrypted, giving
us an additional security layer.
4. Avoid using default network ports
The TCP and UDP protocols are used when transmitting data between servers. Unless we
configure them otherwise, services set up on these protocols use default network ports.
Because they are so common, default ports are frequent targets of brute force attacks. When
we do not use the default ports, an attacker who targets the server must find the right port
number by trial and error. The additional work needed could discourage the attacker from
prolonging the attack.
However, when assigning a new port, we check the Internet Assigned Numbers
Authority’s port registry to ensure the new port isn’t used for other services.
5. Use real-time database monitoring
Actively scanning a database for breach attempts bolsters (boosts) security and allows us to
react to potential attacks. We can use monitoring software such as Tripwire’s real-time File
Integrity Monitoring (FIM) to log all actions taken on the database’s server and alert us of any
breaches. Furthermore, we set up escalation protocols in case of potential attacks to keep our
sensitive data even safer.
Another aspect to consider is regularly auditing the database security and organizing
cybersecurity penetration tests. These allow us to discover potential security loopholes and
patch them before a potential breach.
6. Use database and web application firewalls
Firewalls are the first layer of defense for keeping out malicious access attempts. On top of
protecting a site, we should also install a firewall to protect the database against different attack
vectors.
There are three types of firewalls commonly used to secure a network:
Packet filter firewall
Stateful packet inspection (SPI)
Proxy server firewall
We make sure that our firewall is configured correctly so that it covers any security loopholes.
It is also essential to keep our firewalls updated, as this protects our site and database against
new cyberattack methods.
7. Deploy data encryption protocols
Encrypting our data isn’t just important when keeping our trade secrets; it is also essential when
moving or storing sensitive user information. Setting up data encryption protocols lowers the
risk of a successful data breach. This means that even if cybercriminals get a hold of our data,
that information remains safe.
8. Create regular backups of the database
While it is common to create backups of our website, it is essential to create backups for our
database regularly, as well. This mitigates the risk of losing sensitive information due to
malicious attacks or data corruption.
The backup procedure depends on the server platform; the most popular are Windows and
Linux. Also, to further increase security, ensure that the backup is encrypted and stored on a
separate server. This way, our data is recoverable and safe if the primary database server gets
compromised or becomes inaccessible.
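The backup step can be sketched as follows. This is only an illustration, assuming Python's built-in sqlite3 module rather than a Windows or Linux database server; the table contents and the file name customer_backup.db are hypothetical, and encrypting or moving the copy to another server is not shown.

```python
import os
import sqlite3
import tempfile


def backup_database(source, target_path):
    """Copy the whole `source` database into a new file at `target_path`."""
    target = sqlite3.connect(target_path)
    try:
        source.backup(target)  # online backup provided by the sqlite3 module
    finally:
        target.close()


# Hypothetical primary database with two customers.
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE Customer (CustID INTEGER PRIMARY KEY, Name TEXT)")
primary.executemany("INSERT INTO Customer (Name) VALUES (?)", [("Akinola",), ("Hammed",)])
primary.commit()

backup_path = os.path.join(tempfile.mkdtemp(), "customer_backup.db")
backup_database(primary, backup_path)

# Open the copy on its own to confirm the data can be recovered from it.
restored = sqlite3.connect(backup_path)
restored_names = [row[0] for row in restored.execute("SELECT Name FROM Customer ORDER BY CustID")]
restored.close()
print(restored_names)
```

In practice the copy would then be encrypted and transferred to a separate machine, as the text recommends.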
9. Keep applications up to date
Research shows that nine in 10 applications contain outdated software components.
Furthermore, an analysis of WordPress plugins revealed that 17,383 plugins hadn’t been updated
for two years, 13,655 for three years and 3,990 for seven years (as reported in early 2022).
Together, these figures point to a serious security risk in the software that we use to manage
our database or even run our website.
While we should only use trusted and verified database management software, we should also
keep it updated and install new patches when they become available. The same goes for
widgets, plugins and third-party applications; we should avoid altogether the ones that have
not received regular updates.
10. Use strong user authentication
According to Verizon’s most recent research, 80% of data breaches are caused by
compromised passwords. This shows that passwords alone are not a great security measure,
primarily because people often fail to create strong passwords.
To combat this issue and add another layer of security to a database, we set up a multi-factor
authentication process. (This method is not perfect either, given recent attack trends.) Even if
credentials get compromised, cybercriminals will have a difficult time getting around this
security protocol.
Also, we consider allowing only validated Internet Protocol (IP) addresses to access the
database, to further mitigate the risk of a potential breach. While IP addresses can be copied or
masked, doing so requires additional effort from the attacker.
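The idea of admitting only validated IP addresses can be sketched as below. This is a minimal illustration using Python's standard ipaddress module; the networks in ALLOWED_NETWORKS and the function name is_allowed are hypothetical, and a real deployment would normally enforce the rule in the firewall or in the DBMS configuration.

```python
import ipaddress

# Hypothetical allowlist: one office subnet and one single administrator host.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.5.0/24"),
    ipaddress.ip_network("192.168.1.10/32"),
]


def is_allowed(client_ip):
    """Return True only if client_ip falls inside one of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


print(is_allowed("10.0.5.17"))    # inside the office subnet
print(is_allowed("203.0.113.9"))  # not on the allowlist
```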
11. Enhance database security to mitigate the risks of a data breach
Keeping the database secure against malicious attacks is a multi-faceted endeavor, from the
servers’ physical location to mitigating the risk of human error. Even though data breaches are
becoming more frequent, maintaining healthy security protocols lowers the risk of being
targeted and helps to avoid a successful breach attempt.
If we do rely on a web hosting service to manage our database, we should ensure that it is a
company with a strong security track record. It is best to steer clear of free hosting services,
which may lack adequate security.
If we manage our database in an on-premise data center, we keep in mind that our data center
is also prone to attacks from outsiders or insider threats. We must ensure we have physical
security measures, including locks, cameras, and security personnel in our physical facility.
Any access to physical servers must be logged and only granted to authorized individuals.
In addition, we should not leave database backups in locations that are publicly accessible, such
as temporary partitions, web folders, or unsecured cloud storage buckets.
2. Lock Down Accounts and Privileges
Let us consider the Oracle database server. After the database is installed, the Oracle Database
Configuration Assistant (DBCA) automatically expires and locks most of the default database
user accounts.
If we install an Oracle database manually, this doesn’t happen and default privileged accounts
won’t be expired or locked. Their password stays the same as their username, by default.
An attacker will try to use these credentials first to connect to the database.
It is critical to ensure that every privileged account on a database server is configured with a
strong, unique password. If accounts are not needed, they should be expired and locked.
For the remaining accounts, access has to be limited to the absolute minimum required. Each
account should only have access to the tables and operations (for example, SELECT or
INSERT) required by the user. We should avoid creating user accounts with access to every
table in the database.
3. Regularly Patch Database servers
We must ensure that patches remain current. Effective database patch management is a crucial
security practice because attackers are actively seeking out new security flaws in databases,
and new viruses and malware appear on a daily basis. A timely deployment of up-to-date
versions of database service packs, critical security hotfixes, and cumulative updates will
improve the stability of database performance.
4. Disable Public Network Access
Organizations store their application data in databases. In most real-world scenarios, the
end user doesn’t require direct access to the database. Thus, we should block all public network
access to database servers unless we are a hosting provider. Ideally, an organization should set
up gateway servers (VPN or SSH tunnels) for remote administrators.
5. Encrypt All Files and Backups
Irrespective of how solid our defenses are, there is always a possibility that a hacker may
infiltrate our system. Yet, attackers are not the only threat to the security of our database. Our
employees may also pose a risk to our business. There is always the possibility that a malicious
or careless insider will gain access to a file they don’t have permission to access.
Encrypting our data makes it unreadable to both attackers and employees. Without an
encryption key, they cannot access it. This provides a last line of defense against
unwelcome intrusions. We encrypt all important application files, data files, and backups so
that unauthorized users cannot read our critical data.
15.10 Security in SQL
One method of shielding data from unauthorized access in SQL is to constrain what each user
may do, by granting rights on tables and views.
REVOKE overrides (withdraws) an existing grant. The right type can be SELECT, UPDATE,
DELETE, or INSERT.
Another example:
GRANT SELECT
ON TABLE X
TO (USERNAMES)
WITH GRANT OPTION
WITH GRANT OPTION means the user can also grant the same right to other users.
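The effect of the grant option can be modelled in a few lines. The sketch below is not how any DBMS stores privileges; it is a toy Python model, with the hypothetical class Privileges and the hypothetical users dba, ade, bola and chidi, showing that only a user who received a right with the grant option may pass it on.

```python
class Privileges:
    """Toy model of GRANT ... WITH GRANT OPTION (illustrative only)."""

    def __init__(self, owner):
        self.owner = owner
        self._grants = {}  # (user, right, table) -> may this user re-grant?

    def has_right(self, user, right, table):
        return user == self.owner or (user, right, table) in self._grants

    def can_grant(self, user, right, table):
        return user == self.owner or self._grants.get((user, right, table), False)

    def grant(self, grantor, grantee, right, table, with_grant_option=False):
        if not self.can_grant(grantor, right, table):
            raise PermissionError(f"{grantor} may not grant {right} on {table}")
        self._grants[(grantee, right, table)] = with_grant_option


privileges = Privileges(owner="dba")
privileges.grant("dba", "ade", "SELECT", "X", with_grant_option=True)
privileges.grant("ade", "bola", "SELECT", "X")  # allowed: ade holds the grant option

try:
    privileges.grant("bola", "chidi", "SELECT", "X")  # bola has no grant option
    bola_could_grant = True
except PermissionError:
    bola_could_grant = False

print(privileges.has_right("bola", "SELECT", "X"), bola_could_grant)
```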
Summary
While security refers to the protection of data against unauthorized disclosure, alteration or
destruction, integrity refers to the validity or meaningfulness of the data. Both data
security and integrity are ensured through constraints on the access and
update operations that different users may perform on different objects in the database. Tables,
queries, views, reports, indexes and the database itself are objects of the database. The
constraints might be provided by the DBMS or might have to be specified by the DBA in line
with an organization's policy on database security.
Self-Assessment Question (SAQs)
Unit 16 Database Transactions and Concurrency Controls
Introduction
Earlier, you learned about the functions that a DBMS should have. Among these, some
closely related functions are meant to ensure that the database is reliable and remains in a
consistent state:
Transaction support
Concurrency Control
Recovery services
Although each function can be discussed separately, they are mutually dependent. Reliability
and consistency must be maintained in the presence of failures of both hardware and
software components, and when several users are accessing the database.
Many DBMSs allow users to carry out simultaneous operations on the database. If these
operations are not controlled, the accesses may interfere with one another, and the database
can become inconsistent. To overcome this problem, the DBMS implements a concurrency
control technique using a protocol that prevents database accesses from interfering with one
another. In this Unit, you will learn about concurrency control and transaction support for
a centralized DBMS that consists of a single database.
Learning Outcomes
At the end of this Unit, you should be able to understand and explain:
A transaction can be defined as a logical unit of work on the database. This may be an entire
program, a piece of a program, or a single command (such as the SQL commands INSERT
or UPDATE), and it may involve any number of operations on the database. In the database
context, the execution of an application program can be thought of as one or more transactions
with non-database processing taking place in between.
Akinola’s account to Hammed’s account, a series of tasks gets performed in the background of
the screen.
This small, straightforward transaction includes several steps. First, decrease Akinola's bank
account by N5000:
OpenAccount(Akinola)
OldBal = Akinola.bal
NewBal = OldBal - 5000
Akinola.bal = NewBal
CloseAccount(Akinola)
In other words, the transaction involves many tasks: opening Akinola's account, reading the
old balance, subtracting the amount of N5000 from that account, saving the new balance to
Akinola's account, and finally closing the transaction session.
To add the amount of N5000 to Hammed's account, the same sort of tasks need to be done:
OpenAccount(Hammed)
OldBal = Hammed.bal
NewBal = OldBal + 5000
Hammed.bal = NewBal
CloseAccount(Hammed)
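The two halves of the transfer only make sense as one unit of work. The sketch below shows this with Python's built-in sqlite3 module instead of Microsoft Access; the Account table, the opening balances and the overdraft check are assumptions made for the illustration.

```python
import sqlite3


def transfer(conn, sender, receiver, amount):
    """Debit sender and credit receiver as a single transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute("UPDATE Account SET bal = bal - ? WHERE name = ?", (amount, sender))
            (new_bal,) = conn.execute(
                "SELECT bal FROM Account WHERE name = ?", (sender,)
            ).fetchone()
            if new_bal < 0:  # assumed business rule: no overdrafts
                raise ValueError("insufficient funds")
            conn.execute("UPDATE Account SET bal = bal + ? WHERE name = ?", (amount, receiver))
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, bal FROM Account"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (name TEXT PRIMARY KEY, bal INTEGER)")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("Akinola", 8000), ("Hammed", 1000)])
conn.commit()

first = transfer(conn, "Akinola", "Hammed", 5000)
after_first = balances(conn)
second = transfer(conn, "Akinola", "Hammed", 5000)  # would overdraw, so it is undone
after_second = balances(conn)
print(first, after_first, second, after_second)
```

The second call fails after the debit has already been executed, yet the balances are unchanged afterwards, because the rollback undoes the partial work.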
Consistency: A transaction must take the database from one consistent state to another
consistent state. Ensuring consistency is the responsibility of both the DBMS and the
application developers. The DBMS can ensure consistency by enforcing all the constraints
that have been specified on the database schema, such as integrity and
enterprise constraints.
Isolation: Transactions execute independently of one another. In other words, the partial
effects of incomplete transactions should not be visible to other transactions running
at the same time. It is the responsibility of the concurrency control sub-system to ensure
isolation.
Note that transactions are very closely related to concurrency control. Concurrency
control in a database means the method of managing concurrent operations on the database
so that they do not interfere with one another.
A key purpose in developing a database is to enable multiple users to access shared data in
parallel (i.e., at the same time). Concurrent access to data is comparatively easy when all
users are only reading data, as there is no way that they can interfere with one another.
However, when multiple users are accessing the database at the same time, and at least one is
updating data, there may be interference, which can result in data inconsistencies.
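One well-known form of such interference is the lost update. The sketch below replays it by hand in plain Python, with no DBMS involved; the account name, the amounts and the two transactions T1 and T2 are hypothetical.

```python
def run_interleaved():
    """T1 and T2 both read the balance before either of them writes it back."""
    balance = {"Akinola": 8000}
    t1_read = balance["Akinola"]         # T1 reads 8000
    t2_read = balance["Akinola"]         # T2 reads 8000 as well
    balance["Akinola"] = t1_read - 5000  # T1 writes 3000
    balance["Akinola"] = t2_read - 1000  # T2 overwrites it with 7000: T1's update is lost
    return balance["Akinola"]


def run_serial():
    """T2 starts only after T1 has finished."""
    balance = {"Akinola": 8000}
    balance["Akinola"] = balance["Akinola"] - 5000  # T1
    balance["Akinola"] = balance["Akinola"] - 1000  # T2
    return balance["Akinola"]


print(run_interleaved(), run_serial())
```

The interleaved run leaves N7000 in the account although N6000 was withdrawn in total, which is the kind of inconsistency concurrency control is meant to prevent.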
Concurrency control techniques implement protocols, which can be broadly classified
into two categories. These are:
1. Lock-based protocol: Database systems that use lock-based protocols employ a
mechanism in which a transaction cannot read or write data until it acquires a
suitable lock on it.
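A lock-based protocol can be pictured with a small lock table. The sketch below is a simplified model in plain Python, not the mechanism of any particular DBMS: it assumes two lock modes, shared ("S") for reading and exclusive ("X") for writing, and the class name LockTable and the transaction names T1 and T2 are hypothetical. A request that cannot be granted returns False, meaning the transaction would have to wait.

```python
class LockTable:
    """Toy lock manager with shared (S) and exclusive (X) locks."""

    def __init__(self):
        self._locks = {}  # item -> (mode, set of transactions holding the lock)

    def acquire(self, txn, item, mode):
        held = self._locks.get(item)
        if held is None:
            self._locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # The only holder may keep its lock or upgrade it to exclusive.
            if mode == "X":
                self._locks[item] = ("X", holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # several readers may share the item
            return True
        return False          # conflicting request: txn must wait

    def release(self, txn, item):
        held = self._locks.get(item)
        if held is None:
            return
        holders = held[1]
        holders.discard(txn)
        if not holders:
            del self._locks[item]


table = LockTable()
t1_reads = table.acquire("T1", "A", "S")
t2_reads = table.acquire("T2", "A", "S")
t2_writes_early = table.acquire("T2", "A", "X")  # refused while T1 still reads A
table.release("T1", "A")
t2_writes_later = table.acquire("T2", "A", "X")  # granted once T2 is the only holder
t1_reads_again = table.acquire("T1", "A", "S")   # refused while T2 writes A
print(t1_reads, t2_reads, t2_writes_early, t2_writes_later, t1_reads_again)
```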
Practical Tasks on Microsoft Access RDBMS
Table Customer:
Table Product:
(l) Save the table as Product.
3. The Third table will be a relationship (look-up) table for the initial tables
Customer and Product. Note that both CustID and PID from the base tables
will jointly form the primary key for this relationship table. So, we set them
up to “look up to their sources”. We shall call it Customer_Product Table. A
sample of it looks like the one below:
Where to type SQL statements in Microsoft Access "2007", "2010", "2013"
or Access "2016"
http://www.jaffainc.com/SQLStatementsInAccess.htm#Access2007
Note: If you are selecting an existing database (i.e the downloaded course
database), browse (locate where you saved the database on your computer) for the
database after you click "more".
2. Once Access opens, Click “Create” from the menu running across the top of the
screen.
4. You'll see a “Show Table” dialog box. Click close on this dialog box without
selecting any tables.
5. Select the “SQL View” or “SQL” button near the top left of the screen.
6. Use the "SQL View" or “SQL” button to select “SQL View”. (Click the down
arrow located on this button to locate “SQL View”).
SQL Tutorials (Microsoft Access SQL)
1. SELECT Statement
Instructs the Microsoft Access database engine to return information from the database as a set
of records.
Syntax
SELECT [predicate] { * | table.* | [table.]field1 [AS alias1] [, [table.]field2 [AS alias2] [,
…]]} FROM tableexpression [, …] [IN externaldatabase] [WHERE… ] [GROUP
BY… ] [HAVING… ] [ORDER BY… ] [WITH OWNERACCESS OPTION]
The SELECT statement has these parts:
Part Description
* Specifies that all fields from the specified table or tables are selected.
table The name of the table containing the fields from which records are
selected.
field1, field2 The names of the fields containing the data you want to retrieve. If you
include more than one field, they are retrieved in the order listed.
alias1, alias2 The names to use as column headers instead of the original column
names in table.
tableexpression The name of the table or tables containing the data you want to retrieve.
externaldatabase The name of the database containing the tables in tableexpression if they
are not in the current database.
Remarks
To perform this operation, the Microsoft® Jet database engine searches the specified table or
tables, extracts the chosen columns, selects rows that meet the criterion, and sorts or groups the
resulting rows into the order specified.
SELECT statements do not change data in the database.
SELECT is usually the first word in an SQL statement. Most SQL statements are either
SELECT or SELECT…INTO statements.
The minimum syntax for a SELECT statement is:
SELECT fields FROM table
You can use an asterisk (*) to select all fields in a table. The following example selects all of
the fields in the Employees table:
SELECT * FROM Employees;
If a field name is included in more than one table in the FROM clause, precede it with the table
name and the . (dot) operator. In the following example, the Department field is in both the
Employees table and the Supervisors table. The SQL statement selects departments from the
Employees table and supervisor names from the Supervisors table:
SELECT Employees.Department, Supervisors.SupvName FROM Employees INNER JOIN
Supervisors ON Employees.Department = Supervisors.Department;
When a Recordset object is created, the Microsoft Jet database engine uses the table's field
name as the Field object name in the Recordset object. If you want a different field name or a
name is not implied by the expression used to generate the field, use the AS reserved word.
The following example uses the title Birth to name the returned Field object in the
resulting Recordset object:
SELECT BirthDate AS Birth FROM Employees;
You can use the other clauses in a SELECT statement to further restrict and organize your
returned data. For more information, see the Help topic for the clause you are using.
Example
Some of the following examples assume the existence of a hypothetical Salary field in an
Employees table. Note that this field does not actually exist in the Northwind database
Employees table.
SELECT Count(PostalCode) AS Tally FROM Customers;
SELECT Count (*) AS TotalEmployees, Avg(Salary) AS AverageSalary, Max(Salary) AS
MaximumSalary FROM Employees;
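The aggregate example above can be tried outside Microsoft Access. The sketch below uses Python's built-in sqlite3 module, which accepts the same statement; the Employees rows and the Salary values are hypothetical test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT, Salary REAL)"
)
conn.executemany(
    "INSERT INTO Employees (LastName, FirstName, Salary) VALUES (?, ?, ?)",
    [("King", "Robert", 30000.0), ("Davolio", "Nancy", 20000.0), ("Fuller", "Andrew", 40000.0)],
)

cursor = conn.execute(
    "SELECT Count(*) AS TotalEmployees, Avg(Salary) AS AverageSalary, "
    "Max(Salary) AS MaximumSalary FROM Employees"
)
column_names = [column[0] for column in cursor.description]  # the AS aliases
summary = cursor.fetchone()
print(column_names, summary)
```

The column names reported by the cursor are the aliases given with AS, not the underlying expressions.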
2. WHERE Clause
Specifies which records from the tables listed in the FROM clause are affected by a
SELECT, UPDATE, or DELETE statement.
Syntax
SELECT fieldlist FROM tableexpression WHERE criteria
A SELECT statement containing a WHERE clause has these parts:
Part Description
fieldlist The name of the field or fields to be retrieved along with any field-name
aliases, selection predicates (ALL, DISTINCT, DISTINCTROW, or
TOP), or other SELECT statement options.
tableexpression The name of the table or tables from which data is retrieved.
criteria An expression that records must satisfy to be included in the query results.
Remarks
The Microsoft Access database engine selects the records that meet the conditions listed in the
WHERE clause. If you do not specify a WHERE clause, your query returns all rows from the
table. If you specify more than one table in your query and you have not included a WHERE
clause or a JOIN clause, your query generates a Cartesian product of the tables.
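The Cartesian-product remark can be checked with a tiny data set. The sketch below uses Python's built-in sqlite3 module rather than Microsoft Access; the two customers and three orders are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Alfreds"), (2, "Bolido")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# No WHERE and no JOIN: every customer is paired with every order.
cartesian = conn.execute("SELECT CompanyName, OrderID FROM Customers, Orders").fetchall()

# The WHERE clause keeps only the pairs that belong together.
matched = conn.execute(
    "SELECT CompanyName, OrderID FROM Customers, Orders "
    "WHERE Customers.CustomerID = Orders.CustomerID"
).fetchall()

print(len(cartesian), len(matched))
```

With 2 customers and 3 orders the unrestricted query returns 2 x 3 rows, while the restricted one returns one row per order.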
WHERE is optional, but when included, follows FROM. For example, you can select all
employees in the sales department (WHERE Dept = 'Sales') or all customers between the ages
of 18 and 30 (WHERE Age Between 18 And 30).
If you do not use a JOIN clause to perform SQL join operations on multiple tables, the
resulting Recordset object will not be updatable.
WHERE is similar to HAVING. WHERE determines which records are selected. Similarly,
once records are grouped with GROUP BY, HAVING determines which records are displayed.
Use the WHERE clause to eliminate records you do not want grouped by a GROUP BY clause.
Use various expressions to determine which records the SQL statement returns. For example,
the following SQL statement selects all employees whose salaries are more than N21,000:
SELECT LastName, Salary FROM Employees WHERE Salary > 21000;
When you specify the criteria argument, date literals must be in U.S. format, even if you are
not using the U.S. version of the Microsoft® Jet database engine. For example, May 10, 1996,
is written 10/5/96 in the United Kingdom and 5/10/96 in the United States. Be sure to enclose
your date literals with the number sign (#) as shown in the following examples.
To find records dated May 10, 1996 in a United Kingdom database, you must use the following
SQL statement:
SELECT * FROM Orders WHERE ShippedDate = #5/10/96#;
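You can also use the DateValue function, which is aware of the international settings
established by Microsoft Windows®. For example, use this code for the United States:
SELECT * FROM Orders WHERE ShippedDate = DateValue('5/10/96');
And use this code for the United Kingdom:
SELECT * FROM Orders WHERE ShippedDate = DateValue('10/5/96');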
Note
If the column referenced in the criteria string is of type GUID, the criteria expression uses a
slightly different syntax:
WHERE ReplicaID = {GUID {12345678-90AB-CDEF-1234-567890ABCDEF}}
Example
The following example assumes the existence of a hypothetical Salary field in an Employees
table. Note that this field does not actually exist in the Northwind database Employees
table. This example selects the LastName and FirstName fields of each record in which the last
name is King.
SELECT LastName, FirstName FROM Employees WHERE LastName = 'King';
3. FROM Clause
Specifies the tables or queries that contain the fields listed in the SELECT statement
Syntax:
SELECT fieldlist FROM tableexpression [IN externaldatabase]
A SELECT statement containing a FROM clause has these parts:
Part Description
fieldlist The name of the field or fields to be retrieved along with any field-name
aliases, SQL aggregate functions, selection predicates (ALL,
DISTINCT, DISTINCTROW, or TOP), or other SELECT statement
options.
tableexpression An expression that identifies one or more tables from which data is
retrieved. The expression can be a single table name, a saved query
name, or a compound resulting from an INNER JOIN, LEFT
JOIN, or RIGHT JOIN.
externaldatabase The full path of an external database containing all the tables
in tableexpression.
Remarks
FROM is required and follows any SELECT statement. The order of the table names
in tableexpression is not important. For improved performance and ease of use, it is
recommended that you use a linked table instead of an IN clause to retrieve data from an
external database.
The following example shows how you can retrieve data from the Employees table:
Example
Some of the following examples assume the existence of a hypothetical Salary field in an
Employees table. Note that this field does not actually exist in the Northwind database
Employees table.
SELECT LastName,FirstName FROM Employees;
This next example counts the number of records that have an entry in the PostalCode field and
names the returned field Tally.
SELECT Count(PostalCode) AS Tally FROM Customers;
This example shows the number of employees and the average and maximum salaries.
SELECT Count (*) AS TotalEmployees, Avg(Salary) AS AverageSalary, Max(Salary) AS
MaximumSalary FROM Employees;
4. GROUP BY Clause
This combines records with identical values in the specified field list into a single record. A
summary value is created for each record if you include an SQL aggregate function, such
as Sum or Count, in the SELECT statement.
Syntax
SELECT fieldlist FROM table WHERE criteria [GROUP BY groupfieldlist]
A SELECT statement containing a GROUP BY clause has these parts:
Part Description
fieldlist The name of the field or fields to be retrieved along with any field-name
aliases, SQL aggregate functions, selection predicates (ALL, DISTINCT,
DISTINCTROW, or TOP), or other SELECT statement options.
table The name of the table from which records are retrieved. For more
information, see the FROM clause.
groupfieldlist The names of up to 10 fields used to group records. The order of the field
names in groupfieldlist determines the grouping levels from the highest to
the lowest level of grouping.
statement, provided the SELECT statement includes at least one SQL aggregate function. The
Microsoft® Jet database engine cannot group on Memo or OLE Object fields.
All fields in the SELECT field list must either be included in the GROUP BY clause or be
included as arguments to an SQL aggregate function.
Example
This example creates a list of unique job titles and the number of employees with each title.
SELECT Title, Count([Title]) AS Tally FROM Employees GROUP BY Title;
For each unique job title, this example calculates the number of employees in Ibadan who have
that title.
SELECT Title, Count(Title) AS Tally FROM Employees WHERE Region = 'Ibadan' GROUP
BY Title;
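The two GROUP BY examples can be reproduced on a small table. The sketch below uses Python's built-in sqlite3 module rather than Microsoft Access; the five Employees rows are hypothetical, the Region value 'Ibadan' follows the description of the second example, and ORDER BY Title is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Title TEXT, Region TEXT)")
conn.executemany(
    "INSERT INTO Employees (Title, Region) VALUES (?, ?)",
    [
        ("Sales Representative", "Ibadan"),
        ("Sales Representative", "Ibadan"),
        ("Sales Representative", "Lagos"),
        ("Sales Manager", "Ibadan"),
        ("Trainee", "Lagos"),
    ],
)

# One output row per distinct title, with the number of employees holding it.
all_titles = conn.execute(
    "SELECT Title, Count(Title) AS Tally FROM Employees GROUP BY Title ORDER BY Title"
).fetchall()

# WHERE removes the non-Ibadan rows before the grouping takes place.
ibadan_titles = conn.execute(
    "SELECT Title, Count(Title) AS Tally FROM Employees "
    "WHERE Region = 'Ibadan' GROUP BY Title ORDER BY Title"
).fetchall()

print(all_titles)
print(ibadan_titles)
```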
5. HAVING Clause
This specifies which grouped records are displayed in a SELECT statement with a GROUP
BY clause. After GROUP BY combines records, HAVING displays any records grouped by
the GROUP BY clause that satisfy the conditions of the HAVING clause.
Syntax
SELECT fieldlist FROM table WHERE selectcriteria GROUP
BY groupfieldlist [HAVING groupcriteria]
A SELECT statement containing a HAVING clause has these parts:
Part Description
fieldlist The name of the field or fields to be retrieved along with any field-name
aliases, SQL aggregate functions, selection predicates (ALL, DISTINCT,
DISTINCTROW, or TOP), or other SELECT statement options.
table The name of the table from which records are retrieved. For more
information, see the FROM clause.
selectcriteria Selection criteria. If the statement includes a WHERE clause, the Microsoft
Access database engine groups values after applying the WHERE
conditions to the records.
groupfieldlist The names of up to 10 fields used to group records. The order of the field
names in groupfieldlist determines the grouping levels from the highest to
the lowest level of grouping.
HAVING is similar to WHERE, which determines which records are selected. After records
are grouped with GROUP BY, HAVING determines which grouped records are displayed.
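The difference between WHERE and HAVING shows up clearly when both appear in one query. The sketch below uses Python's built-in sqlite3 module rather than Microsoft Access; the Employees rows are hypothetical test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Title TEXT, Region TEXT)")
conn.executemany(
    "INSERT INTO Employees (Title, Region) VALUES (?, ?)",
    [
        ("Sales Representative", "Ibadan"),
        ("Sales Representative", "Ibadan"),
        ("Sales Representative", "Lagos"),
        ("Sales Manager", "Ibadan"),
        ("Trainee", "Lagos"),
    ],
)

# HAVING filters the groups: only titles held by more than one employee remain.
shared_titles = conn.execute(
    "SELECT Title, Count(Title) AS Tally FROM Employees "
    "GROUP BY Title HAVING Count(Title) > 1"
).fetchall()

# WHERE filters the rows first, then HAVING filters the groups built from them.
shared_in_ibadan = conn.execute(
    "SELECT Title, Count(Title) AS Tally FROM Employees "
    "WHERE Region = 'Ibadan' GROUP BY Title HAVING Count(Title) > 1"
).fetchall()

print(shared_titles, shared_in_ibadan)
```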
6. ORDER BY Clause
This sorts a query's resulting records on a specified field or fields in ascending or descending
order.
Syntax:
SELECT fieldlist FROM table WHERE selectcriteria [ORDER BY field1 [ASC |
DESC ][, field2 [ASC | DESC ]][, …]]
Part Description
fieldlist The name of the field or fields to be retrieved along with any field-name
aliases, SQL aggregate functions, selection predicates (ALL, DISTINCT,
DISTINCTROW, or TOP), or other SELECT statement options.
table The name of the table from which records are retrieved. For more
information, see the FROM clause.
selectcriteria Selection criteria. If the statement includes a WHERE clause, the Microsoft
Access database engine orders values after applying the WHERE conditions
to the records.
Remarks:
ORDER BY is optional. However, if you want your data displayed in sorted order, then you
must use ORDER BY.
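The default sort order is ascending (A to Z, 0 to 9). Both of the following examples sort
employee names in last name order:
SELECT LastName, FirstName FROM Employees ORDER BY LastName;
SELECT LastName, FirstName FROM Employees ORDER BY LastName ASC;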
To sort in descending order (Z to A, 9 to 0), add the DESC reserved word to the end of each
field you want to sort in descending order. The following example selects salaries and sorts
them in descending order:
SELECT LastName, Salary FROM Employees ORDER BY Salary DESC, LastName;
If you specify a field containing Memo or OLE Object data in the ORDER BY clause, an error
occurs. The Microsoft Jet database engine does not sort on fields of these types.
ORDER BY is usually the last item in an SQL statement.
You can include additional fields in the ORDER BY clause. Records are sorted first by the first
field listed after ORDER BY. Records that have equal values in that field are then sorted by
the value in the second field listed, and so on.
Example
The SQL statement shown in the following example uses the ORDER BY clause to sort records
by last name in descending order (Z-A).
SELECT LastName,FirstName FROM Employees ORDER BY LastName DESC;
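Sorting on more than one field can be checked on a small table. The sketch below uses Python's built-in sqlite3 module rather than Microsoft Access; the four employee names are hypothetical test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)")
conn.executemany(
    "INSERT INTO Employees (LastName, FirstName) VALUES (?, ?)",
    [("King", "Robert"), ("Davolio", "Nancy"), ("King", "Anne"), ("Fuller", "Andrew")],
)

# Last names from Z to A; employees sharing a last name are then sorted by first name.
ordered = conn.execute(
    "SELECT LastName, FirstName FROM Employees ORDER BY LastName DESC, FirstName ASC"
).fetchall()
print(ordered)
```

The two King records come first because of DESC on LastName, and Anne precedes Robert because the second field breaks the tie in ascending order.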
Part Description
ALL Assumed if no predicate is included. The following two statements
are equivalent and return all records from the Employees table:
SELECT ALL * FROM Employees ORDER BY EmployeeID;
SELECT * FROM Employees ORDER BY EmployeeID;
DISTINCT Omits records that contain duplicate data in the selected fields.
To be included in the results of the query, the values for each
field listed in the SELECT statement must be unique. For
example, several employees listed in an Employees table may
have the same last name. If two records contain Smith in the
LastName field, the following SQL statement returns only one
record that contains Smith:
SELECT DISTINCT LastName FROM Employees;
DISTINCTROW Omits data based on entire duplicate records, not just duplicate
fields. For example, you could create a query that joins the
Customers and Orders tables on the CustomerID field. The
Customers table contains no duplicate CustomerID fields, but
the Orders table does because each customer can have many
orders. The following SQL statement shows how you can use
DISTINCTROW to produce a list of companies that have at
least one order but without any details about those orders:
SELECT DISTINCTROW CompanyName
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
ORDER BY CompanyName;
DISTINCTROW has an effect only when you select fields
from some, but not all, of the tables used in the query.
DISTINCTROW is ignored if your query includes only one
table, or if you output fields from all tables.
TOP n [PERCENT] Returns a certain number of records that fall at the top or the
bottom of a range specified by an ORDER BY clause. Suppose
you want the names of the top 25 students from the class of
1994:
SELECT TOP 25 FirstName, LastName FROM Students
WHERE GraduationYear = 1994 ORDER BY
GradePointAverage DESC;
You can also use the PERCENT reserved word to return a percentage of
the records at the top or the bottom of the range. Because the sort is
ascending, the following statement returns the bottom 10 percent of
the class:
SELECT TOP 10 PERCENT FirstName, LastName FROM Students
WHERE GraduationYear = 1994 ORDER BY
GradePointAverage ASC;
table The name of the table from which records are retrieved.
Example
This example creates a query that joins the Customers and Orders tables on the CustomerID
field. The Customers table contains no duplicate CustomerID fields, but the Orders table does
because each customer can have many orders. Using DISTINCTROW produces a list of
companies that have at least one order but without any details about those orders.
SELECT DISTINCTROW CompanyName FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY CompanyName;
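The effect of removing duplicates from a join can be tried on a small data set. The sketch below uses Python's built-in sqlite3 module, which has no DISTINCTROW predicate, so DISTINCT is used in its place; for this particular query, which outputs only CompanyName, it yields the same list of companies. The customers and orders are hypothetical test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Alfreds"), (2, "Bolido"), (3, "Comercio")],  # Comercio has no orders
)
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

join_sql = (
    "SELECT {predicate} CompanyName FROM Customers "
    "INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID "
    "ORDER BY CompanyName"
)

# Without a predicate, a company appears once per order.
with_duplicates = [row[0] for row in conn.execute(join_sql.format(predicate=""))]
# With DISTINCT, each company that has at least one order appears once.
without_duplicates = [row[0] for row in conn.execute(join_sql.format(predicate="DISTINCT"))]

print(with_duplicates, without_duplicates)
```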
8. DELETE Statement
This creates a delete query that removes records from one or more of the tables listed in
the FROM clause that satisfy the WHERE clause.
Syntax:
DELETE [table.*] FROM table WHERE criteria
The DELETE statement has these parts:
Part Description
table The optional name of the table from which records are deleted.
table The name of the table from which records are deleted.
criteria An expression that determines which records to delete.
Remarks:
DELETE is especially useful when you want to delete many records.
To drop an entire table from the database, you can use the Execute method with
a DROP statement. If you delete the table, however, the structure is lost. In contrast, when you
use DELETE, only the data is deleted; the table structure and all of the table properties, such
as field attributes and indexes, remain intact.
You can use DELETE to remove records from tables that are in a one-to-many relationship
with other tables. Cascade delete operations cause the records in tables that are on the many
side of the relationship to be deleted when the corresponding record in the one side of the
relationship is deleted in the query.
For example, in the relationship between the Customers and Orders tables, the Customers table
is on the one side and the Orders table is on the many side of the relationship. Deleting a record
from Customers results in the corresponding Orders records being deleted if the cascade delete
option is specified.
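The cascade delete behaviour can be sketched as follows. The example uses Python's built-in sqlite3 module rather than Microsoft Access, where the option is set on the relationship; in SQLite the equivalent is an ON DELETE CASCADE foreign key, which only takes effect after PRAGMA foreign_keys = ON. The customers and orders are hypothetical test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT)")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, "
    "CustomerID INTEGER REFERENCES Customers(CustomerID) ON DELETE CASCADE)"
)
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Alfreds"), (2, "Bolido")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# Deleting the record on the "one" side removes its records on the "many" side.
conn.execute("DELETE FROM Customers WHERE CustomerID = 1")
conn.commit()

remaining_orders = conn.execute("SELECT OrderID, CustomerID FROM Orders").fetchall()
print(remaining_orders)
```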
A delete query deletes entire records, not just data in specific fields. If you want to delete values
in a specific field, create an update query that changes the values to Null.
Important:
After you remove records using a delete query, you cannot undo the operation. If you
want to know which records were deleted, first examine the results of a select query
that uses the same criteria, and then run the delete query.
Maintain backup copies of your data at all times. If you delete the wrong records, you
can retrieve them from your backup copies.
Example
This example deletes all records for employees whose title is Trainee. When the FROM clause
includes only one table, you do not have to list the table name in the DELETE statement.
DELETE * FROM Employees WHERE Title = 'Trainee';
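The advice to inspect the affected records before deleting them can be sketched as below. The example uses Python's built-in sqlite3 module rather than Microsoft Access; note that SQLite writes the statement as DELETE FROM, without the asterisk. The Employees rows are hypothetical test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT, Title TEXT)")
conn.executemany(
    "INSERT INTO Employees (LastName, Title) VALUES (?, ?)",
    [("King", "Trainee"), ("Davolio", "Sales Representative"), ("Fuller", "Trainee")],
)
conn.commit()

# Step 1: run a select query with the same criteria to see what would be removed.
to_delete = conn.execute(
    "SELECT EmployeeID, LastName FROM Employees WHERE Title = ?", ("Trainee",)
).fetchall()

# Step 2: run the delete query with identical criteria.
cursor = conn.execute("DELETE FROM Employees WHERE Title = ?", ("Trainee",))
deleted_count = cursor.rowcount
conn.commit()

remaining = conn.execute("SELECT LastName FROM Employees").fetchall()
print(to_delete, deleted_count, remaining)
```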
9. IN Clause
This identifies tables in any external database to which the Microsoft Access database engine
can connect, such as a dBASE or Paradox database or an external Microsoft® Access database
engine database.
Syntax:
To identify a destination table:
[SELECT | INSERT] INTO destination IN {path | ["path" "type"] | ["" [type; DATABASE
= path]]}
To identify a source table:
FROM tableexpression IN {path | ["path" "type"] | ["" [type; DATABASE = path]]}
Part             Description
destination      The name of the external table into which data is inserted.
tableexpression  The name of the table or tables from which data is retrieved. This
                 argument can be a single table name, a saved query, or a compound
                 resulting from an INNER JOIN, LEFT JOIN, or RIGHT JOIN.
path             The full path for the directory or file containing table.
type             The name of the database type used to create table if a database is not a
                 Microsoft Access database engine database (for example, dBASE III,
                 dBASE IV, Paradox 3.x, or Paradox 4.x).
Remarks:
You can use IN to connect to only one external database at a time.
In some cases, the path argument refers to the directory containing the database files. For
example, when working with dBASE, Microsoft FoxPro®, or Paradox database tables,
the path argument specifies the directory containing .dbf or .db files. The table file name is
derived from the destination or tableexpression argument.
To specify a non-Microsoft Access database engine database, append a semicolon (;) to the
name, and enclose it in single (' ') or double (" ") quotation marks. For example, either 'dBASE
IV;' or "dBASE IV;" is acceptable.
You can also use the DATABASE reserved word to specify the external database. For example,
the following lines specify the same table:
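The two equivalent forms can be sketched as follows, reusing the dBASE IV path and the Customer table that appear in the examples of this section. Both the path and the table are illustrative assumptions, not files supplied with Access. The lines beginning with -- are annotations only and must be left out when typing the statements into the Access SQL view.

```sql
-- Form 1: the path and the type are given as two quoted strings.
SELECT CustomerID FROM Customer IN "C:\DBASE\DATA\SALES" "dBASE IV;";

-- Form 2: an empty path, then the type and the DATABASE reserved word
-- inside square brackets.
SELECT CustomerID FROM Customer IN "" [dBASE IV; DATABASE=C:\DBASE\DATA\SALES;];
```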
For improved performance and ease of use, use a linked table instead of IN.
You can also use the IN reserved word as a comparison operator in an expression. For more
information, see the In operator.
Example
The following statements show how you can use the IN clause to retrieve data from an external
database. In each example, assume the hypothetical Customers table is stored in an external
database.
Microsoft Access database:
SELECT CustomerID
FROM Customers
IN OtherDB.mdb
WHERE CustomerID Like "A*";
dBASE IV table, path and type form:
SELECT CustomerID
FROM Customer
IN "C:\DBASE\DATA\SALES" "dBASE IV;"
WHERE CustomerID Like "A*";
dBASE IV table, DATABASE form:
SELECT CustomerID
FROM Customer
IN "" [dBASE IV; Database=C:\DBASE\DATA\SALES;]
WHERE CustomerID Like "A*";
Paradox 4.x table, path and type form. To retrieve data from a Paradox version 3.x table,
substitute "Paradox 3.x;" for "Paradox 4.x;".
SELECT CustomerID
FROM Customer
IN "C:\PARADOX\DATA\SALES" "Paradox 4.x;"
WHERE CustomerID Like "A*";
Paradox 4.x table, DATABASE form:
SELECT CustomerID
FROM Customer
IN "" [Paradox 4.x;Database=C:\PARADOX\DATA\SALES;]
WHERE CustomerID Like "A*";
10. INSERT INTO Statement
This adds a record or multiple records to a table. This is referred to as an append query.
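Syntax:
Multiple-record append query:
INSERT INTO target [(field1[, field2[, …]])] [IN externaldatabase] SELECT
[source.]field1[, field2[, …]] FROM tableexpression
Single-record append query:
INSERT INTO target [(field1[, field2[, …]])] VALUES (value1[, value2[, …]])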
Part              Description
field1, field2    Names of the fields to append data to, if following a target argument, or
                  the names of fields to obtain data from, if following a source argument.
externaldatabase  The path to an external database. For a description of the path, see
                  the IN clause.
tableexpression   The name of the table or tables from which records are inserted. This
                  argument can be a single table name or a compound resulting from
                  an INNER JOIN, LEFT JOIN, or RIGHT JOIN operation or a saved
                  query.
value1, value2    The values to insert into the specific fields of the new record. Each value
                  is inserted into the field that corresponds to the value's position in the
                  list: value1 is inserted into field1 of the new record, value2 into field2,
                  and so on. You must separate values with a comma, and enclose text
                  fields in quotation marks (' ').
Remarks
You can use the INSERT INTO statement to add a single record to a table using the single-
record append query syntax as shown above. In this case, your code specifies the name and
value for each field of the record. You must specify each of the fields of the record that a value
is to be assigned to and a value for that field. When you do not specify each field, the default
value or Null is inserted for missing columns. Records are added to the end of the table.
You can also use INSERT INTO to append a set of records from another table or query by
using the SELECT … FROM clause as shown above in the multiple-record append query
syntax. In this case, the SELECT clause specifies the fields to append to the
specified target table.
The source or target table may specify a table or a query. If a query is specified, the Microsoft
Access database engine appends records to any and all tables specified by the query.
INSERT INTO is optional but when included, precedes the SELECT statement.
If your destination table contains a primary key, make sure you append unique, non-Null values
to the primary key field or fields; if you do not, the Microsoft Access database engine will not
append the records.
If you append records to a table with an AutoNumber field and you want to renumber the
appended records, do not include the AutoNumber field in your query. Do include the
AutoNumber field in the query if you want to retain the original values from the field.
Use the IN clause to append records to a table in another database.
To create a new table, use the SELECT… INTO statement instead to create a make-table query.
To find out which records will be appended before you run the append query, first execute and
view the results of a select query that uses the same selection criteria.
An append query copies records from one or more tables to another. The tables that contain the
records you append are not affected by the append query.
Instead of appending existing records from another table, you can specify the value for each
field in a single new record using the VALUES clause. If you omit the field list, the VALUES
clause must include a value for every field in the table; otherwise, the INSERT operation will
fail. Use an additional INSERT INTO statement with a VALUES clause for each additional
record you want to create.
Example
This example selects all records in a hypothetical New Customers table and adds them to the
Customers table. When individual columns are not designated, the SELECT table column
names must match exactly those in the INSERT INTO table.
INSERT INTO Customers SELECT * FROM [New Customers];
This example creates a new record in the Employees table.
INSERT INTO Employees (FirstName,LastName, Title) VALUES ('Harry', 'Washington',
'Trainee');
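When only some fields and some records are to be appended, the field list and a WHERE clause can be combined with the preview step recommended in the remarks. The sketch below assumes a hypothetical [New Employees] table with matching FirstName, LastName and Title fields, and assumes that the primary key of Employees is an AutoNumber field, which is left out so that Access numbers the new records itself. Run each statement as a separate query and omit the -- lines, since the Access SQL view does not accept comments.

```sql
-- 1. Preview the records that would be appended.
SELECT FirstName, LastName, Title
FROM [New Employees]
WHERE Title = 'Trainee';

-- 2. Append them. The target field list and the SELECT list pair up by position.
INSERT INTO Employees (FirstName, LastName, Title)
SELECT FirstName, LastName, Title
FROM [New Employees]
WHERE Title = 'Trainee';
```

The source table is left unchanged by the append, so the same preview query still returns the same records afterwards.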
The SELECT…INTO statement, which creates a make-table query, has these parts:
Part              Description
field1, field2    The name of the fields to be copied into the new table.
newtable          The name of the table to be created. It must conform to standard naming
                  conventions. If newtable is the same as the name of an existing table, a
                  trappable error occurs.
externaldatabase  The path to an external database. For a description of the path, see
                  the IN clause.
source            The name of the existing table from which records are selected. This can
                  be single or multiple tables or a query.
Remarks
You can use make-table queries to archive records, make backup copies of your tables, or make
copies to export to another database or to use as a basis for reports that display data for a
particular time period. For example, you could produce a Monthly Sales by Region report by
running the same make-table query each month.
You may want to define a primary key for the new table. When you create the table, the
fields in the new table inherit the data type and field size of each field in the query's
underlying tables, but no other field or table properties are transferred.
To add data to an existing table, use the INSERT INTO statement instead to create an
append query.
To find out which records will be selected before you run the make-table query, first
examine the results of a SELECT statement that uses the same selection criteria.
Example
This example selects all records in the Employees table and copies them into a new table named
Emp Backup.
SELECT Employees.* INTO [Emp Backup] FROM Employees;
The following query deletes the table because this is a demonstration.
DROP TABLE [Emp Backup];
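A make-table query can also copy a subset of records, which is the usual pattern for archiving. The following sketch previews the selection and then writes it to a new table; [Trainee Archive] is a hypothetical table name that must not already exist. As elsewhere, run each statement as its own query and omit the -- lines when typing into the Access SQL view.

```sql
-- 1. Preview the records that the make-table query would copy.
SELECT * FROM Employees WHERE Title = 'Trainee';

-- 2. Copy those records into a new table (hypothetical name).
SELECT Employees.* INTO [Trainee Archive]
FROM Employees
WHERE Title = 'Trainee';
```

The new table receives the data types and field sizes of the source fields but no primary key, so one may need to be defined afterwards.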
The UNION operation, which combines the results of two or more queries or tables, has this part:
Part      Description
query1-n  A SELECT statement, the name of a stored query, or the name of a stored table
          preceded by the TABLE keyword.
Remarks
You can merge the results of two or more queries, tables, and SELECT statements, in any
combination, in a single UNION operation. The following example merges an existing table
named New Accounts and a SELECT statement:
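A sketch of such a statement is given here. It assumes that New Accounts and Customers have the same number of fields and that Customers has a field named OrderAmount; both are assumptions made for illustration. The -- lines are annotations and must be omitted in the Access SQL view.

```sql
-- TABLE [New Accounts] returns every record of the stored table.
-- UNION ALL keeps duplicate records instead of removing them.
TABLE [New Accounts]
UNION ALL
SELECT * FROM Customers WHERE OrderAmount > 1000;
```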
By default, no duplicate records are returned when you use a UNION operation; however, you
can include the ALL predicate to ensure that all records are returned. This also makes the query
run faster.
All queries in a UNION operation must request the same number of fields; however, the fields
do not have to be of the same size or data type.
Use aliases only in the first SELECT statement because they are ignored in any others. In the
ORDER BY clause, refer to fields by what they are called in the first SELECT statement.
Notes
You can use a GROUP BY or HAVING clause in each query argument to group the
returned data.
You can use an ORDER BY clause at the end of the last query argument to display the
returned data in a specified order.
Example
This example retrieves the names and cities of all suppliers and customers in Lagos.
SELECT CompanyName, City FROM Suppliers WHERE City = 'Lagos'
UNION SELECT CompanyName, City FROM Customers WHERE City = 'Lagos';
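The rules on aliases and sorting can be illustrated with a small variation. In this sketch the alias Company is an arbitrary name chosen for illustration; it is declared only in the first SELECT statement and is then used in the single ORDER BY clause at the end. Omit the -- lines when typing the query into the Access SQL view.

```sql
-- The alias is declared once, in the first SELECT statement.
SELECT CompanyName AS Company, City FROM Suppliers
UNION
SELECT CompanyName, City FROM Customers
-- One ORDER BY at the very end sorts the combined result.
ORDER BY Company;
```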
The UPDATE statement, which changes field values in the records of a table that satisfy given
criteria, has these parts:
Part      Description
table     The name of the table containing the data you want to modify.
newvalue  An expression that determines the value to be inserted into a particular field in
          the updated records.
criteria  An expression that determines which records will be updated. Only records that
          satisfy the expression are updated.
Remarks
UPDATE is especially useful when you want to change many records or when the records that
you want to change are in multiple tables. You can change several fields at the same time. The
following example increases the Order Amount values by 10 percent and the Freight values by
3 percent for shippers in Nigeria:
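A minimal sketch of that statement, assuming that the Orders table stores these values in fields named OrderAmount and Freight and records the destination in ShipCountry with the code 'NG' for Nigeria. The field names and the country code are assumptions carried over from the other examples in this unit. Omit the -- line in the Access SQL view.

```sql
-- Raise OrderAmount by 10 percent and Freight by 3 percent in one statement.
UPDATE Orders
SET OrderAmount = OrderAmount * 1.1, Freight = Freight * 1.03
WHERE ShipCountry = 'NG';
```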
Important
UPDATE does not generate a result set. Also, after you update records using an update
query, you cannot undo the operation. If you want to know which records were updated,
first examine the results of a select query that uses the same criteria, and then run the
update query.
Maintain backup copies of your data at all times. If you update the wrong records, you
can retrieve them from your backup copies.
Example
This example changes values in the ReportsTo field to 5 for all employee records that currently
have ReportsTo values of 2.
UPDATE Employees SET ReportsTo = 5 WHERE ReportsTo = 2;
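This example uses the Avg function on the Orders table to calculate the average freight charge
for orders whose freight is greater than 100.
SELECT Avg(Freight) AS [Average Freight] FROM Orders WHERE Freight > 100;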
If expr identifies multiple fields, the Count function counts a record only if at least one of the
fields is not Null. If all of the specified fields are Null, the record is not counted. Separate the
field names with an ampersand (&). The following example shows how you can limit the count
to records in which either ShippedDate or Freight is not Null:
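One way to write such a query is sketched below, assuming the Orders table has ShippedDate and Freight fields. The & operator returns Null only when both operands are Null, so Count skips exactly those records in which neither field has a value. The alias [Not Null] is an arbitrary label. Omit the -- lines in the Access SQL view.

```sql
-- ShippedDate & Freight is Null only if both fields are Null,
-- so such records are not counted.
SELECT Count(ShippedDate & Freight) AS [Not Null] FROM Orders;
```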
You can use Count in a query expression. You can also use this expression in
the SQL property of a QueryDef object or when creating a Recordset object based on an SQL
query.
Example
This example uses the Orders table to calculate the number of orders shipped to Nigeria.
SELECT Count(ShipCountry) AS [NG Orders] FROM Orders WHERE ShipCountry = 'NG';
Operands in expr can include the name of a table field, a constant, or a function (which can be
either intrinsic or user-defined but not one of the other SQL aggregate functions).
Remarks
You can use Min and Max to determine the smallest and largest values in a field based on the
specified aggregation, or grouping. For example, you could use these functions to return the
lowest and highest freight cost. If there is no aggregation specified, then the entire table is used.
You can use Min and Max in a query expression and in the SQL property of
a QueryDef object or when creating a Recordset object based on an SQL query.
Example
This example uses the Orders table to return the lowest and highest freight charges for orders
shipped to Nigeria.
SELECT Min(Freight) AS [Low Freight], Max(Freight) AS [High Freight]
FROM Orders WHERE ShipCountry = 'NG';
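When a grouping is specified, Min and Max are evaluated once per group instead of once for the whole table. The sketch below reuses the Orders table and returns one row for each destination country. Omit the -- line in the Access SQL view.

```sql
-- One row per ShipCountry, with its lowest and highest freight charge.
SELECT ShipCountry, Min(Freight) AS [Low Freight], Max(Freight) AS [High Freight]
FROM Orders
GROUP BY ShipCountry;
```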
14.6 Sum Function
This returns the sum of a set of values contained in a specified field on a query.
Syntax
Sum(expr)
The expr placeholder represents a string expression identifying the field that contains the
numeric data you want to add or an expression that performs a calculation using the data in that
field. Operands in expr can include the name of a table field, a constant, or a function (which
can be either intrinsic or user-defined but not one of the other SQL aggregate functions).
Remarks
The Sum function totals the values in a field. For example, you could use the Sum function to
determine the total cost of freight charges.
The Sum function ignores records that contain Null fields. The following example shows how
you can calculate the sum of the products of UnitPrice and Quantity fields:
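A sketch of that calculation, assuming the UnitPrice and Quantity fields are held in an [Order Details] table, as in the joined example of this section. The alias [Total Revenue] is an arbitrary label. Omit the -- lines in the Access SQL view.

```sql
-- Multiply the two fields for each record, then add up the products.
-- A record in which either field is Null yields a Null product and is ignored.
SELECT Sum(UnitPrice * Quantity) AS [Total Revenue] FROM [Order Details];
```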
Example
This example uses the Orders table to calculate the total sales for orders shipped to Nigeria.
SELECT Sum(UnitPrice * Quantity) AS [Total NG Sales] FROM Orders INNER JOIN [Order
Details] ON Orders.OrderID = [Order Details].OrderID WHERE (ShipCountry = 'NG');
Query4
CSurname    COtherNames   PID   qty
AJALA       AJAYI         p01   34
IFUNANYA    WUNMI         P02   30
Query5
CSurname    COtherNames   PID   qty   purchaseDate
AJALA       AJAYI         p01   34    23/4/2021
Practical Tasks
(Q1) CSC Nigeria Limited is a product manufacturing company. The company is interested in
keeping track of its customers and the orders they place for its products. As the MIS
Manager of the company, you are requested to create a functional database, using Microsoft
Access or otherwise, to keep track of the manufactured products, the customers of the
company and the orders. The following attributes are used to describe the products:
ProductID, ProductName, ProductManufactureDate, ProductNAFDACNo and
ProductDescription. Customers are described by the following attributes: CustomerID,
CustomerSurname, CustomerOtherNames, CustomerMobileNo, CustomerOfficeAddress.
(a) Using the concept of database normalization, design MS Access database tables for
Product, Customer and Orders. Insert some hypothetical data into the tables.
(b) Create data input forms for the three tables.
(c) The CEO of CSC Nig. Ltd. is interested in the following reports:
(i) Total number of customers the company currently maintains.
(ii) List of the names and NAFDAC numbers of all products currently
manufactured by the company.
(iii) List of the names and mobile phone numbers of all the company's customers.
(iv) List of the names and office addresses of customers residing in Ibadan only.
(v) The average quantity ordered for a product.
(vi) List of customers' names, products' names, quantity ordered and date ordered
for all customers.
Using Structured Query Language (SQL), create all the reports requested by the
CEO of CSC Nig. Ltd. Save each query under a suitable name.
(Q2) A local warehouse records its operations manually. Stock is recorded in books, customer
details are also recorded, and stock purchased and supplied is recorded in a book. The first
problem is that the stock managers find it tedious to determine the total number of products
available on a daily, weekly or monthly basis. Most of the time, the available stock recorded in
the book is greater than the actual stock in the warehouse, or the stock has run out. The
following are therefore the schemas designed for the database to solve the inventory problem
in the warehouse. The primary keys are underlined in the schemas.
(a) Design the database tables using MS Access. Populate the tables with some hypothetical
tuples.
(b) The CEO of the warehouse is interested in the following reports:
(i) Total number of customers the company currently maintains.
(ii) List of all products and their quantities currently stored in the warehouse.
(iii) List of the names and mobile phone numbers of all the company's customers.
(iv) List of the names and addresses of suppliers residing in Bodija only.
(v) The average quantity ordered for a product.
(vi) List of customers' names, products' names, quantity ordered and date ordered
for all customers.
Using Structured Query Language (SQL), create all the reports requested by the
CEO of the warehouse. Save each query under a suitable name.
(Q3) CSC is a business organization located within Ibadan Metropolis. The Company
manufactures beverages such as Milo, Pronto and Ovaltine. Records of customers, staff
and products manufactured have hitherto been kept in manual files. As soon as a customer
patronizes the Company, a form is filled in. The form contains some vital information on the
biodata of the customer, such as Surname, Other Names, Age, Mobile Number and Home
Address. Forms are also filled in for all the sales made each day. The Company has
observed that it has issues with its manual way of keeping records of the customers and
their sales. The management has therefore employed you to assist in bringing database
innovation to the Company. Now, as a full-fledged database designer:
(a) What do you think are the likely issues that the Company is having
with its manual operations?
(b) How will the introduction of database technology assist the Company in this case?
(c) With reference to databases, the Management is interested in understanding some
technical terms that you are using with them, such as Primary Key, Foreign Key,
Candidate Key and Referential Integrity Constraint. Using some illustrative tables,
explain the terms to the Management.
(d) Draw an ER Diagram for Staff, Customer and Product entities for the database scenario.
(e) Obtain RD Tables for the Customer, Product and Product_Customer Relations.
(f) Assuming you have created the database on a server, write SQL queries to bring out the
following information:
(i) List of all customers in the age bracket 30 – 45 years in the database.
(ii) List of all customers that have Bodija as part of their home addresses.
(iii) List of customers that bought Milo on 3/1/2022 at 2:30pm and the quantities
bought.
(iv) List of customers that live in Bodija and all the products they bought on
3/1/2022.
(Q4) A computerized database is to be connected to the ACM on a Local Area Network (LAN). The
database will contain tables holding data on Staff Biodata, Units in the company and Attendance.
The Attendance table is a relationship table between Staff Biodata and Units. It also contains other
data such as DateClocked and TimeClocked. Time is recorded on a 24-hour scale, such as 8.00,
13.45, 18.23, etc.
(i) As a Database Manager, draw up the three essential database tables on paper. Let
each table contain about five records.
(ii) Assume you are to generate reports for the following information from the
database:
(a) List of all staff, including their departments, from the database. You may
need to create a relationship table for Staff and Department in this case.
(b) List of all staff names (and their departments) that resumed at exactly 8.00
on 21/1/2021.
(c) List of all staff names (and their departments) that resumed after 8.00 on
21/1/2021.
(d) List of all staff names (and their departments) whose age is 55 years and
above and who resumed before 8.00 on 21/1/2021. The company would like to
reward them for prompt resumption at work on that day.
(e) List of all staff names (and their departments) whose age is less than 55
years and who resumed after 8.00 on 21/1/2021. The company would like to reward
them for prompt resumption at work on that day.
Write out all the database queries needed to generate the above information, assuming you
are using Microsoft Access as your database engine.
(Q5) MBA Medical Centre is a hospital located in Ibadan. It specializes in treating only fevers
such as Lassa, Malaria, Typhoid and Yellow fevers. Patients consult doctors regularly and
diagnoses are conducted on them based on their complaints. Some patients may consult doctors
more than twice in a day, especially if their feverish conditions do not change some hours
after taking the drugs prescribed for them. Doctors employed by the Centre are requested to fill
in their details, such as names, year of birth and others, on a form. Case notes are also opened
for patients, which contain their names, addresses, hospital registration numbers and others.
The case notes also contain their medical complaints history, doctors' diagnoses and
prescriptions.
The management is interested in creating an MIS Unit for the Centre so that timely information
could be obtained by the management and doctors on their patients. As a full-fledged MIS
Expert, design and implement a relational database for the Centre. Tables in the database
contain information on doctors, patients and consultations, the latter being the relationship table
for doctor and patient relations. Insert at least five records into the tables.
Using a computer, assist the Management of MBA Medical Centre to create the following
reports via Queries on the database:
(i) All patients and their phone numbers.
(ii) Mean age of all patients maintained at the Centre.
(iii) All patients living in a particular locality, such as Bodija, Ibadan.
(iv) All patients aged 60 years and above, with the doctors who saw them and the
drugs prescribed for them.
Further Readings
Richard Peterson (2022). What is Normalization in DBMS (SQL)? 1NF, 2NF, 3NF, BCNF
Database with Example. https://www.guru99.com/database-normalization.html. Updated
February 12, 2022; accessed March 2022.