Database

Faculty of Science and Technology

CBDB3403
Database

Copyright © Open University Malaysia (OUM)


CBDB3403
DATABASE
Assoc Prof Zaidah Ibrahim

Project Directors: Prof Dato’ Dr Mansor Fadzil
Assoc Prof Dr Norlia T. Goolamally
Open University Malaysia

Module Writer: Assoc Prof Zaidah Ibrahim


Universiti Teknologi MARA (UiTM)

Moderator: Rohaizak Omar @ Abd Rahim


Open University Malaysia

Developed by: Centre for Instructional Design and Technology


Open University Malaysia

First Edition, August 2014 (rs)

Copyright © Open University Malaysia (OUM), August 2014, CBDB3403


All rights reserved. No part of this work may be reproduced in any form or by any means without
the written permission of the President, Open University Malaysia (OUM).

Table of Contents
Course Guide

Topic 1 Introduction to Database

1.1 Introduction to Databases
1.1.1 Basic Concepts and Definitions
1.2 Traditional File-based Systems
1.2.1 File-based Approach
1.2.2 Limitations of File-based Approach
1.3 Database Approach
1.3.1 The Database
1.3.2 The Database Management System (DBMS)
1.4 Roles in the Database Environment
Summary
Key Terms
Self-Test
References

Topic 2 Relational Data Model

2.1 Terminology
2.1.1 Relational Data Structure
2.1.2 Relational Keys
2.1.3 Representing Relational Database Schemas
2.2 Integrity Constraints
2.2.1 Nulls
2.2.2 Entity Integrity
2.2.3 Referential Integrity
2.3 Views
2.3.1 Base Relations and Views
2.3.2 Purpose of Views
Summary
Key Terms
Self-Test
References

Topic 3 Structured Query Language (SQL): Data Manipulation

3.1 Introduction to Structured Query Language (SQL)
3.1.1 History of SQL
3.1.2 Importance of SQL
3.2 Writing SQL Commands
3.3 Data Manipulation
3.3.1 Simple Queries
3.3.2 Sorting Results
3.3.3 Using the SQL Aggregate Functions
3.3.4 Grouping Results
3.3.5 Subqueries
3.4 Database Updates
3.4.1 INSERT
3.4.2 UPDATE
3.4.3 DELETE
Summary
Key Terms
Self-Test
References

Topic 4 SQL: Data Definition

4.1 The ISO SQL Data Types
4.1.1 SQL Identifiers
4.1.2 SQL Data Types
4.2 Integrity Enhancement Feature
4.2.1 Required Data
4.2.2 Domain Constraints
4.2.3 Entity Integrity
4.2.4 Referential Integrity
4.3 Data Definition
4.3.1 Creating a Database
4.3.2 Creating a Table
4.3.3 Changing a Table Definition
4.3.4 Removing a Table
4.4 Views
4.4.1 Creating a View
4.4.2 Removing a View
Summary
Key Terms
Self-Test
References

Topic 5 Entity-Relationship (ER) Modelling

5.1 Entity
5.2 Attributes
5.3 Relationships
5.3.1 Relationship Cardinality
5.3.2 Classification of Cardinalities
5.3.3 Degree of Relationship
5.3.4 Recursive Relationship
5.3.5 Resolving Many-to-Many Relationships
5.4 Strong and Weak Entities
5.5 Generalisation Hierarchies
5.5.1 Disjointness and Completeness Constraints
Summary
Key Terms
Self-Test
References

Topic 6 Normalisation

6.1 The Purpose of Normalisation
6.2 How Normalisation Supports Database Design
6.3 Data Redundancy and Update Anomalies
6.3.1 Insertion Anomalies
6.3.2 Deletion Anomalies
6.3.3 Modification Anomalies
6.4 Functional Dependencies
6.4.1 Characteristics of Functional Dependencies
6.4.2 Identifying Functional Dependencies
6.4.3 Identifying the Primary Key for a Relation Using Functional Dependencies
6.5 The Process of Normalisation
6.5.1 First Normal Form
6.5.2 Second Normal Form
6.5.3 Third Normal Form
Summary
Key Terms
Self-Test
References

Topic 7 Database Design Methodology

7.1 Introduction to Database Design Methodology
7.1.1 What is Design Methodology?
7.1.2 Critical Success Factors in Database Design
7.2 Conceptual Database Design Methodology
7.3 Logical Database Design for Relational Model
7.4 Physical Database Design for Relational Model
Summary
Key Terms
Self-Test
References

Topic 8 Database Security

8.1 Threats to a Database
8.2 Computer-based Controls
8.2.1 Authorisation
8.2.2 Access Controls
8.2.3 Views
8.2.4 Backup and Recovery
8.2.5 Encryption
8.2.6 Redundant Array of Independent Disks (RAID)
8.3 Security in Microsoft Office Access Database Management System (DBMS)
8.4 DBMS and Web Security
8.4.1 Proxy Servers
8.4.2 Firewalls
8.4.3 Digital Signatures
8.4.4 Digital Certificates
Summary
Key Terms
Self-Test
References

Topic 9 Transaction Management

9.1 Database Transactions
9.1.1 Transaction Example
9.1.2 Transaction Properties
9.2 Concurrency Control
9.2.1 Interference Problems
9.2.2 Concurrency Control Tools
9.3 Recovery Management
9.3.1 Database Failures
9.3.2 Recovery Tools
9.3.3 Recovery Techniques
Summary
Key Terms
Self-Test
References

Topic 10 Web Technology and Database Management System (DBMS)

10.1 Types of Databases
10.2 The Web
10.2.1 Requirements for Web-DBMS Integration
Summary
Key Terms
Self-Test
References

COURSE GUIDE

COURSE GUIDE DESCRIPTION
You must read this Course Guide carefully from the beginning to the end. It tells
you briefly what the course is about and how you can work your way through
the course material. It also suggests the amount of time you are likely to spend in
order to complete the course successfully. Please keep referring to the Course
Guide as you go through the course material, as it will help you to clarify
important study components or points that you might otherwise miss or overlook.

INTRODUCTION
CBDB3403 Database is one of the courses offered by the Faculty of Information
Technology and Multimedia Communication at Open University Malaysia
(OUM). This course is worth 3 credit hours and should be covered over 8 to
15 weeks.

COURSE AUDIENCE
This course is targeted at all IT students specialising in Information Systems.
Students enrolled in other IT-related specialisations will also find this course
useful, as it will answer many of their questions regarding database
system development.

As an open and distance learner, you should be accustomed to learning
independently and able to make the most of the learning modes and environment
available to you. Before you begin this course, please confirm the course material,
the course requirements and how the course is conducted.

STUDY SCHEDULE
It is a standard OUM practice that learners accumulate 40 study hours for every
credit hour. As such, for a three-credit hour course, you are expected to spend
120 study hours. Table 1 gives an estimation of how the 120 study hours could be
accumulated.

Table 1: Estimation of Time Accumulation of Study Hours

Study Activities | Study Hours
Briefly go through the course content and participate in initial discussion | 3
Study the module | 60
Attend 3 to 5 tutorial sessions | 10
Online participation | 12
Revision | 15
Assignment(s), Test(s) and Examination(s) | 20
TOTAL STUDY HOURS | 120

COURSE OUTCOMES
By the end of this course, you should be able to:

1. Explain the concepts and technology of databases, basic database
terminology, architectural aspects, data models and database software;

2. Describe the techniques and methodology that support the database design
process;

3. Apply the standard methodology to design a relational database;

4. Demonstrate the application of SQL and its importance;

5. Point out the meaning of a transaction and its benefits; and

6. Summarise the advantages and disadvantages of the Web as a database
platform.

COURSE SYNOPSIS
This course is divided into 10 topics. The synopsis for each topic is presented
below:

Topic 1 introduces the field of database management, examining the problems
with the precursor to the database system, the file-based system, and the
advantages offered by the database approach.

Topic 2 introduces the concepts behind the relational model, the most popular
data model at present and the one most often chosen for standard business
applications. After introducing the terminology and showing the relationship
with mathematical relations, the topic discusses the relational integrity rules,
entity integrity and referential integrity. The topic concludes with an overview of
views, which is expanded on later in Topic 4.

Topic 3 introduces the data manipulation statements of the SQL standard:
SELECT, INSERT, UPDATE and DELETE. The topic is presented as a tutorial,
giving a series of worked examples that demonstrate the main concepts of these
statements.

Topic 4 covers the main data definition facilities of the SQL standard. Again, the
topic is presented as a worked tutorial. The topic introduces the SQL data types,
the data definition statements, the Integrity Enhancement Feature (IEF) and
more advanced features of the data definition statements, including the access
control statements GRANT and REVOKE. It also examines views and how they
can be created in SQL.

Topic 5 covers the concepts of the entity-relationship (ER) model. ER modelling
is an important technique for any database designer to master and forms the
basis of the methodology presented in Topic 7. You will also be introduced to UML
as a notation for representing ER diagrams.

Topic 6 examines the concepts behind normalisation, which is another important
technique used in the logical database design methodology. Using a series of
worked examples drawn from the integrated case study, the topic demonstrates how
to transition a design from one normal form to another and shows the advantages of
having a logical database design that conforms to particular normal forms, up to
third normal form (3NF).

Topic 7 presents a methodology for database design. The methodology is divided
into three parts covering conceptual, logical and physical database design.

Topic 8 considers database security, not just in the context of DBMS security but
also in the context of security of the DBMS environment. The topic also examines
the security problems that can arise in the Web environment and presents some
approaches to overcome them.

Topic 9 concentrates on three functions that a Database Management System
(DBMS) should provide, namely transaction management, concurrency control
and recovery. These functions are intended to ensure that the database is reliable
and remains in a consistent state when multiple users are accessing the database
and in the presence of failures of both hardware and software components. The
topic also discusses advanced transaction models that are more appropriate for
transactions that may be of a long duration.

Topic 10 examines the integration of the DBMS into the Web environment. After
providing a brief introduction to the Internet and Web technology, this topic
examines the appropriateness of the Web as a database application platform and
discusses the advantages and disadvantages of this approach. It then considers
a number of different approaches to integrating DBMSs into the Web
environment.

TEXT ARRANGEMENT GUIDE


Before you go through this module, it is important that you note the text
arrangement. Understanding the text arrangement will help you to organise your
study of this course in a more objective and effective way. Generally, the text
arrangement for each topic is as follows:

Learning Outcomes: This section refers to what you should achieve after you
have completely covered a topic. As you go through each topic, you should
frequently refer to these learning outcomes. By doing this, you can continuously
gauge your understanding of the topic.

Self-Check: This component of the module is inserted at strategic locations
throughout the module. It may be inserted after one subsection or a few
subsections. It usually comes in the form of a question. When you come across this
component, try to reflect on what you have already learnt thus far. By attempting
to answer the question, you should be able to gauge how well you have
understood the sub-section(s). Most of the time, the answers to the questions can
be found directly from the module itself.

Activity: Like Self-Check, the Activity component is also placed at various
locations or junctures throughout the module. This component may require you
to solve questions, explore short case studies, or conduct an observation or
research. It may even require you to evaluate a given scenario. When you come
across an Activity, you should try to reflect on what you have gathered from the
module and apply it to real situations. You should, at the same time, engage
yourself in higher order thinking where you might be required to analyse,
synthesise and evaluate instead of only having to recall and define.

Summary: You will find this component at the end of each topic. This component
helps you to recap the whole topic. By going through the summary, you should
be able to gauge your knowledge retention level. Should you find points in the
summary that you do not fully understand, it would be a good idea for you to
revisit the details in the module.

Key Terms: This component can be found at the end of each topic. You should go
through this component to remind yourself of important terms or jargon used
throughout the module. Should you find terms here that you are not able to
explain, you should look for the terms in the module.

PRIOR KNOWLEDGE
Knowledge of the Windows operating system and the Microsoft Access application
is required for this course.

ASSESSMENT METHOD
Please refer to myINSPIRE.

REFERENCES
Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to
design, implementation and management (5th ed.). Boston, MA: Addison-
Wesley.

Hoffer, J. A., Prescott, M. B., & Topi, H. (2008). Modern database management
(9th ed.). New Jersey, NJ: Prentice-Hall.

Mannino, M. (2011). Database design, application development and
administration (5th ed.). Scottsdale, AZ: Ediyu.

Post, G. V. (2004). Database management systems: Designing and building
business applications (3rd ed.). New York, NY: McGraw-Hill.

Pratt, P. J., & Last, M. Z. (2008). A guide to SQL (8th ed.). Mason, OH: Cengage
Learning.

Rob, P., & Coronel, C. (2001). Database systems: Design, implementation and
management (8th ed.). Stamford, CT: Cengage Learning.

Shelly, G. B. (2011). Discovering computers. Stamford, CT: Cengage Learning.

TAN SRI DR ABDULLAH SANUSI (TSDAS) DIGITAL LIBRARY
The TSDAS Digital Library has a wide range of print and online resources for the
use of its learners. This comprehensive digital library, which is accessible
through the OUM portal, provides access to more than 30 online databases
comprising e-journals, e-theses, e-books and more. Examples of databases
available are EBSCOhost, ProQuest, SpringerLink, Books247, InfoSci Books,
Emerald Management Plus and Ebrary Electronic Books. As an OUM learner,
you are encouraged to make full use of the resources available through this
library.

Topic 1 Introduction to Database

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Identify the characteristics of file-based systems;
2. Describe four limitations of file-based systems;
3. Define database and database management system (DBMS);
4. Describe four advantages and two disadvantages of DBMS;
5. Identify four features of DBMS; and
6. Classify types of people involved in the DBMS environment.

INTRODUCTION
Have you heard of the term "database" or "database system"? If you have, then
you will gain a better understanding of these terms by taking this course.
However, if you have not heard of them, do not worry. This course will guide
you until you know and understand them and are able to apply them to
real-world problems.

You might ask yourself why we need to study database systems. This is
similar to asking yourself why you need to study programming, operating systems
or other IT-related subjects. The answer is that database systems have become an
important component of successful businesses and organisations. Since you
probably intend to be a manager, entrepreneur or IT professional, it is
vital to have a basic understanding of database systems.

This topic introduces the area of database management systems, examines the
problems with traditional file-based systems and discusses what a database
management system (DBMS) can offer. The first subtopic explains some uses of
database systems that we find in our everyday lives. The next subtopic compares
the file-based system with database systems. We will then discuss the roles that
people perform in the database environment and, lastly, the advantages and
disadvantages of database management systems.

1.1 INTRODUCTION TO DATABASES


Now, let us start by considering some of your most common activities. You
probably go to the supermarket to buy goods and to the automated teller machine
(ATM) to withdraw or deposit money. Have you ever wondered where all this data
comes from or how it is stored? Have you also wondered whether your account is
being balanced correctly?

All these activities are made possible by DBMSs, which means that our lives are
affected by database technology. Computerised databases are important to the
functioning of modern organisations. Before we proceed further, let us look at
some definitions.

(a) What is a Database System?

A database system is a collection of application programs that interact
with the database along with the DBMS and the database itself.
Connolly & Begg (2009)

(b) What is a Database?

A database is a shared collection of logically related data and a
description of this data, designed to meet the information needs of an
organisation.
Connolly & Begg (2009)

(c) What is a Database Application?

A database application is a program that manages and controls access to
the database.
Connolly & Begg (2009)

(d) What is DBMS?

DBMS is a software system that enables users to define, create, maintain
and control access to the database.
Connolly & Begg (2009)
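To make the four verbs in this definition concrete, here is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in DBMS; the Client table and its values are hypothetical. SQLite has no user accounts, so "control access" is approximated with its `query_only` pragma; server DBMSs typically use GRANT and REVOKE instead (see Topic 4).

```python
import sqlite3

# An in-memory SQLite database; SQLite plays the role of the DBMS here.
conn = sqlite3.connect(":memory:")

# Define: describe the structure of the data (hypothetical table).
conn.execute("CREATE TABLE Client (clientNo TEXT PRIMARY KEY, fName TEXT, lName TEXT)")

# Create: store new data in the database.
conn.execute("INSERT INTO Client VALUES ('C001', 'Aminah', 'Yusof')")

# Maintain: change data that is already stored.
conn.execute("UPDATE Client SET lName = 'Yusoff' WHERE clientNo = 'C001'")
conn.commit()

# Control access: as a simple stand-in for user permissions, make this
# connection read-only; the DBMS then rejects any attempt to write.
conn.execute("PRAGMA query_only = ON")
try:
    conn.execute("DELETE FROM Client")
    write_blocked = False
except sqlite3.DatabaseError:
    write_blocked = True

rows = conn.execute("SELECT clientNo, fName, lName FROM Client").fetchall()
print(rows)           # [('C001', 'Aminah', 'Yusoff')]
print(write_blocked)  # True
```

The application never touches the storage directly. Each request is a statement that the DBMS checks and carries out on the program's behalf.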

The number of database applications has increased tremendously over
the past two decades (Jeffrey et al., 2011). The use of databases to support
customer relationship management, online shopping and employee
relationship management is growing.

However, before we discuss this matter any further, let us examine some
applications of database systems that you use in your daily life, often
without realising that you are accessing a database system, such as:

(i) Purchase from the Supermarket

When you purchase goods from a supermarket, you will notice that the
checkout assistant scans the bar codes of your purchases and the total
payment is calculated. Basically, the bar code reader is linked to an
application program that uses the bar code to find the price of the item
in the database, and the price is displayed on the cash register. The
program then reduces the number of such items in stock. If the quantity
in stock falls below a predefined reorder level, the database system may
automatically place an order to obtain more stock of that item. This also
allows the sales manager to keep track of the items that were sold and
the items that need to be ordered.
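The checkout steps just described (look up the price, reduce the stock, reorder when stock is low) can be sketched as follows. This is a minimal illustration that assumes Python's built-in sqlite3 module as the DBMS. The tables, the bar code, the prices and the `reorder_qty` parameter are hypothetical and do not describe any real point-of-sale system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (
    barcode       TEXT PRIMARY KEY,
    name          TEXT NOT NULL,
    price         REAL NOT NULL,
    qty_in_stock  INTEGER NOT NULL,
    reorder_level INTEGER NOT NULL
);
CREATE TABLE PurchaseOrder (
    barcode  TEXT REFERENCES Item(barcode),
    quantity INTEGER NOT NULL
);
-- Hypothetical item: 11 in stock, reorder when stock drops below 10.
INSERT INTO Item VALUES ('9555000000017', 'Milk 1L', 6.50, 11, 10);
""")

def scan_item(conn, barcode, reorder_qty=50):
    """Look up the price, reduce the stock by one and place an order if it is low."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        price, qty, level = conn.execute(
            "SELECT price, qty_in_stock, reorder_level FROM Item WHERE barcode = ?",
            (barcode,),
        ).fetchone()  # assumes the bar code exists; a real system would check
        qty -= 1
        conn.execute("UPDATE Item SET qty_in_stock = ? WHERE barcode = ?", (qty, barcode))
        if qty < level:
            conn.execute("INSERT INTO PurchaseOrder VALUES (?, ?)", (barcode, reorder_qty))
    return price

# Two cartons of milk pass through the checkout.
total = scan_item(conn, "9555000000017") + scan_item(conn, "9555000000017")
stock = conn.execute("SELECT qty_in_stock FROM Item").fetchone()[0]
orders = conn.execute("SELECT COUNT(*) FROM PurchaseOrder").fetchone()[0]
print(total, stock, orders)  # 13.0 9 1
```

This version is simplified. Every later scan below the reorder level would place another order, whereas a real system would normally check for an outstanding order first.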

(ii) Purchase Using Your Credit Card

When you purchase an item using a credit card, your card is swiped
using a card reader that is linked to a database containing information
about the purchases you have made with that card. The database
application program uses your credit card number to check whether the
price of the item, together with the total of the purchases you have
made that month, is within your credit limit. Once the purchase is
confirmed, the information about your recent purchase is added to the
database.
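The credit-limit check above can be sketched in a few lines. This minimal version assumes Python's built-in sqlite3 module as the DBMS, and the card number, limit and amounts are hypothetical. Real card authorisation involves far more (banks, networks, fraud checks). The sketch shows only the database lookup and update the paragraph describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Card (card_no TEXT PRIMARY KEY, credit_limit REAL NOT NULL);
CREATE TABLE Purchase (
    card_no TEXT REFERENCES Card(card_no),
    month   TEXT NOT NULL,   -- billing month, e.g. '2014-08'
    amount  REAL NOT NULL
);
-- Hypothetical card with a limit of 1000 and 850 already spent this month.
INSERT INTO Card VALUES ('4000-0001', 1000.0);
INSERT INTO Purchase VALUES ('4000-0001', '2014-08', 850.0);
""")

def authorise(conn, card_no, month, amount):
    """Record the purchase and return True only if the month's total stays within the limit."""
    with conn:
        (limit,) = conn.execute(
            "SELECT credit_limit FROM Card WHERE card_no = ?", (card_no,)
        ).fetchone()
        (spent,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM Purchase WHERE card_no = ? AND month = ?",
            (card_no, month),
        ).fetchone()
        if spent + amount > limit:
            return False  # would exceed the credit limit: decline
        conn.execute("INSERT INTO Purchase VALUES (?, ?, ?)", (card_no, month, amount))
        return True

first = authorise(conn, "4000-0001", "2014-08", 100.0)   # 850 + 100 = 950, within limit
second = authorise(conn, "4000-0001", "2014-08", 100.0)  # 950 + 100 = 1050, over limit
print(first, second)  # True False
```

Because the approved purchase is written back to the database, the second request sees the updated monthly total and is declined.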

So, do you now realise that you are a user of database systems? Database
technology improves not only the daily operations of organisations but also
the quality of the decisions they make. For instance, with a database system,
a supermarket can keep track of its inventory and sales very quickly. This
may lead to faster decisions about ordering new stock, so that products are
always available for customers. Thus, the business may grow because its
customers remain satisfied. In other words, organisations that collect, manage
and interpret information effectively have an advantage in today's world.

1.1.1 Basic Concepts and Definitions


Notice that in the previous discussion, sometimes the word data is used and
sometimes the word information is used. Do you think that there is a difference
between data and information? If your answer is yes, then you are correct. What
is the difference between data and information?

(a) What is Data?

Data is a collection of unprocessed items that may consist of text,
numbers, images and video.
Shelly et al. (2011)

Today, data can be represented in various forms like sound, images and
videos. For instance, you can record your speech into a computer using the
computer's microphone. Images taken using a digital camera or scanned
using a scanner can also be transferred into a computer. So, actually there
are so many different types of data around us. Can you name some other
data that you might have used or produced before?

Now, the next thing that we will discuss is how we can make our data
meaningful and useful. This can be done by processing it.

(b) What is Information?

Information refers to the data that have been processed in such a way
that the knowledge of the person who uses the data is increased.
Jeffrey et al. (2011)

For instance, the speech that you have recorded and the images that you have
stored in a computer could be used as part of a presentation created with
any presentation software. The speech may contain definitions of some of the
terms included in your presentation slides. By including it in your
presentation, the recorded speech gains meaning and becomes more useful.
The images could also be sent to your friends by e-mail for them to view.
This means that once you have done something with the data that you have
stored, you have transformed it into information. In other words, computers
process data into information.
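The same idea applies to business records. This minimal sketch assumes Python's built-in sqlite3 module and a hypothetical table of checkout transactions. Individual sales rows are raw data. Grouping and totalling them produces a summary a manager can act on, which is information.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sale (item TEXT, quantity INTEGER, unit_price REAL)")

# Raw, unprocessed data: individual checkout transactions (hypothetical values).
conn.executemany("INSERT INTO Sale VALUES (?, ?, ?)", [
    ("Milk 1L", 2, 6.50),
    ("Bread", 1, 3.00),
    ("Milk 1L", 1, 6.50),
    ("Bread", 3, 3.00),
])

# Processing: summarise units sold and revenue per item, highest revenue first.
report = conn.execute("""
    SELECT item, SUM(quantity) AS units, SUM(quantity * unit_price) AS revenue
    FROM Sale
    GROUP BY item
    ORDER BY revenue DESC
""").fetchall()

for item, units, revenue in report:
    print(f"{item}: {units} units sold, revenue {revenue:.2f}")
```

The four transaction rows become two summary lines, one per item, ordered by revenue.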

In this course, we are concerned with the organisation of data and
information and how it can be used in analysis and decision making.
Generally, the more relevant data and information you have, the better
informed your analysis and decision making can be. However, how can you
store these large volumes of data and information? This is where a database
comes in.

The next subtopic discusses the traditional file-based system, examines its
limitations and explains why database systems are needed.

SELF-CHECK 1.1

1. Define database system and give one example of where a database
system is used in your daily life.

2. Name a software system that enables users to define, create,
maintain and control access to the database.

1.2 TRADITIONAL FILE-BASED SYSTEMS


Now, let us talk about the traditional file-based system.

1.2.1 File-based Approach


What is a file-based system?

A file-based system is a collection of application programs that perform
services for the end-users, such as students' reports for the academic office and
lecturers' reports for the dean's office. Each program defines and manages its
own data.
Connolly & Begg (2009)

Traditionally, manual files are used to store all internal and external data within
an organisation. These files are stored in cabinets which, for security purposes,
are locked or located in a secure area. When any information is needed, you may
have to search from the first page until you find the information that you are
looking for. To speed up the search, you may create an indexing system to help
you locate the information faster. You may have such a system for storing your
results or important documents.

The manual filing system works well if the number of items stored is not large.
However, this kind of system may fail if you want to cross-reference or process
any of the information in the files. Computer-based data processing then
emerged, and traditional filing systems were replaced with computer-based data
processing systems, or file-based systems. However, instead of having a
centralised store for the organisation's operational data, a decentralised
approach was taken. In this approach, each department has its own file-based
system, which it monitors and controls separately.

Let us refer to the following example.

Example 1.1:
File Processing System at Make-Believe Real Estate Company
Make-Believe Real Estate company has three departments: sales, contract and
personnel. Each of these departments is physically located in the same building,
but on separate floors, and each has its own file-based system. The function of the
sales department is to sell and rent properties. The function of the contract
department is to handle lease agreements associated with properties for rent.
The function of the personnel department is to store information about staff.
Figure 1.1 illustrates the file-based system for Make-Believe Real Estate company.
Each department has its own application program that handles similar
operations such as data entry, file maintenance and report generation.

Figure 1.1: File-based system for Make-Believe Real Estate company

By looking at Figure 1.1, we can see that the sales executive can store and retrieve
information from the sales files through the sales application program. The sales
files may consist of information regarding properties, owners and clients. Figure 1.2
illustrates examples of the content of these three files. Figure 1.3 shows the
content of the Contract files while Figure 1.4 is for the Personnel files. Notice that
the Client files in the sales and contract departments are the same. This means
that data is duplicated when a decentralised file-based system is used.

Figure 1.2: Property, Owner and Client files used by the sales department

Figure 1.3: Lease, Property and Client files used by the contract department

Figure 1.4: Personnel file used by the personnel department

By referring to Figures 1.2, 1.3 and 1.4, we can see that:

A file is simply a collection of records, while a record is a collection of fields
and a field is a collection of alphanumeric characters.

Thus, the personnel file in Figure 1.4 consists of two records and each record
consists of nine fields. Can you list the number of records and fields in the client
file as shown in Figure 1.3?
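The file, record and field hierarchy can be shown with a small sketch that assumes a personnel file stored as comma-separated text. The staff data below is hypothetical and is not the content of Figure 1.4. Python's standard csv module splits the file into records (lines) and each record into fields (values).

```python
import csv
import io

# Hypothetical personnel file (not the data in Figure 1.4).
# Each line is a record; each comma-separated value is a field.
PERSONNEL_FILE = """\
S001,Aminah,Yusof,Manager,F,1970-03-12,6000
S002,Kumar,Raj,Assistant,M,1985-07-25,3200
"""

records = list(csv.reader(io.StringIO(PERSONNEL_FILE)))  # file -> list of records

print(len(records))     # number of records: 2
print(len(records[0]))  # number of fields in a record: 7
print(records[1][1])    # a single field, a string of characters: Kumar
```

Counting records and fields this way is the same exercise the question above asks you to do by hand for the client file.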

Now, let us discuss the limitations of the file-based system which we mentioned
earlier. Undoubtedly, file-based systems proved to be a great improvement over
manual filing systems. However, a few problems remain with this approach,
especially as the volume of data and information increases.

1.2.2 Limitations of File-based Approach


What are the disadvantages or limitations of the file-based system? Can you
identify one? There are four main limitations associated with the conventional
file-based system (see Figure 1.5).

Figure 1.5: Four limitations of file-based approach

Let us discuss the limitations further.

(a) Separation and Isolation of Data


Suppose you want to match the requirements of your clients with the
available properties. How are you going to do this? You would have to go
to the sales department and access the property and client files to match
the clients' requirements with the properties available. How are you going
to access the information from these two files?

You could create a temporary file of the clients who have "house" as
the preferred type and search for available houses in the property file.
Then, you may create another temporary file of the clients who have
"apartment" as the preferred type and do the search again. The search
would be more complex if you had to access more than two files, or files
from different departments. In other words, the separation and isolation
of data makes the retrieval process time consuming.
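The temporary-file procedure above can be sketched in plain Python, assuming the property and client files are held as separate lists of records. The field names and values, including the `maxRent` requirement, are hypothetical. The point is that the program must spell out every step of the matching itself.

```python
# Hypothetical sales-department files, held separately as lists of records.
property_file = [
    {"propertyNo": "P01", "type": "house", "rent": 1200},
    {"propertyNo": "P02", "type": "apartment", "rent": 800},
    {"propertyNo": "P03", "type": "house", "rent": 1500},
]
client_file = [
    {"clientNo": "C01", "prefType": "house", "maxRent": 1300},
    {"clientNo": "C02", "prefType": "apartment", "maxRent": 900},
]

def match_clients(property_file, client_file):
    """Match clients to suitable properties, one preferred type at a time."""
    matches = []
    for pref_type in ("house", "apartment"):
        # Step 1: build a 'temporary file' of clients who prefer this type.
        temp_clients = [c for c in client_file if c["prefType"] == pref_type]
        # Step 2: scan the whole property file again for each of those clients.
        for client in temp_clients:
            for prop in property_file:
                if prop["type"] == pref_type and prop["rent"] <= client["maxRent"]:
                    matches.append((client["clientNo"], prop["propertyNo"]))
    return matches

matches = match_clients(property_file, client_file)
print(matches)  # [('C01', 'P01'), ('C02', 'P02')]
```

In a database system, a result like this would usually come from a single query that joins the two tables, a technique covered in Topic 3.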

(b) Duplication of Data


If you look back at Figures 1.2 and 1.3, you will notice that both the sales and
contract departments have the property and client files. This duplication
wastes time because the same data is entered twice, in two different
departments. The data may also be entered incorrectly in one of them, which
leads to the two departments holding different information. In addition,
duplicated data uses more storage, which increases cost.

Another disadvantage of data duplication is that the files may become
inconsistent when they are updated. Suppose that the rental cost is
updated in the property file of the sales department but not in that of the
contract department. Problems may then occur because the client may be
quoted two different costs. You can imagine the problems that may arise
from this.
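The rental-cost scenario can be reproduced in a few lines, assuming each department keeps its own copy of the same property record. The property number and rents are hypothetical. Nothing links the two copies, so an update to one silently leaves the other out of date.

```python
# Each department keeps its own copy of the same property record (hypothetical data).
sales_property_file = {"P01": {"type": "house", "rent": 1200}}
contract_property_file = {"P01": {"type": "house", "rent": 1200}}

# The sales department updates the rent, but the contract department's copy is untouched.
sales_property_file["P01"]["rent"] = 1350

print(sales_property_file["P01"]["rent"])     # 1350
print(contract_property_file["P01"]["rent"])  # 1200: the two files now disagree
```

Keeping the files consistent would require every program that updates one copy to remember to update all the others.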

(c) Program Data Dependence


The physical structure of the files, such as the length of each text field,
is defined in the application programs. Thus, if the sales department
decides to increase the length of the client's first name from 10 characters
to 20 characters, then the file description of the first name must be
modified in all the affected files. This means that the length of the first
name in the owner and client files of the sales department also needs to be
changed. It is often difficult to locate all the programs affected by such
changes. Imagine having many files in your file-based system and having to
check each one for such a modification; this would be very time consuming.
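Program-data dependence is easiest to see with fixed-width records. In this minimal sketch the program itself holds a hard-coded record layout: `clientNo` 4 characters, `fName` 10 and `lName` 10. The layout and data are hypothetical. When the file is reorganised so that `fName` is 20 characters wide, the unchanged program silently reads the wrong fields until its layout is edited as well.

```python
# The record layout is hard-coded inside the program (hypothetical layout).
OLD_FIELDS = [("clientNo", 4), ("fName", 10), ("lName", 10)]
NEW_FIELDS = [("clientNo", 4), ("fName", 20), ("lName", 10)]

def parse_record(line, fields=OLD_FIELDS):
    """Slice a fixed-width record into named fields using the given layout."""
    record, pos = {}, 0
    for name, width in fields:
        record[name] = line[pos:pos + width].rstrip()
        pos += width
    return record

old_record = "C001" + "Aminah".ljust(10) + "Yusof".ljust(10)
print(parse_record(old_record))
# {'clientNo': 'C001', 'fName': 'Aminah', 'lName': 'Yusof'}

# The file is reorganised so that fName is 20 characters wide ...
new_record = "C001" + "Aminah".ljust(20) + "Yusof".ljust(10)

# ... but the unchanged program still assumes 10, so lName comes out empty.
print(parse_record(new_record))
# {'clientNo': 'C001', 'fName': 'Aminah', 'lName': ''}

# Only after the program's own layout is edited does it read the file correctly.
print(parse_record(new_record, NEW_FIELDS))
# {'clientNo': 'C001', 'fName': 'Aminah', 'lName': 'Yusof'}
```

The same edit would have to be repeated in every program that reads this file, which is exactly the maintenance burden described above.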

(d) Limited Data Sharing


Looking back at Figures 1.2 and 1.3, we can see that, unlike the sales
department, the contract department does not have an owner file. This
means that if the contract department would like to access information
regarding the owner of a property, no direct access is allowed. The request
may need to go through the management of both departments, and again,
the overall process may be time consuming.

Now that you understand the limitations of the file-based system, let us
discuss a solution to them: the database system. This is explained in the
next subtopic.

SELF-CHECK 1.2
1. What is a file-based system?

2. Explain two limitations of the file-based system.


1.3 DATABASE APPROACH


How can the database approach overcome the limitations of the file-based
system? Can you identify at least one advantage of the database approach
compared to the file-based approach? The database approach emphasises the
integration and sharing of data throughout the organisation, which means that
all departments should be able to integrate and share the same data. Three
advantages of the database approach are shown in Figure 1.6.

Figure 1.6: Three advantages of database approach

Now, we shall see the details of the listed advantages.

(a) Program Data Independence


With the database approach, data descriptions are stored in a central
location called the repository, separately from the application programs.
This allows an organisation's data to change and evolve without changing the
application programs that process the data, so changes to the data are
easier and faster to make.

(b) Planned Data Redundancy and Improved Data Consistency


Ideally, each data item should be recorded in only one place in the database.
A good database design therefore integrates redundant data files into a
single logical structure. Updates to the data then become easier and faster,
and the storage space wasted by redundant copies is avoided. By controlling
data redundancy, the data also remains consistent.


(c) Increased Productivity of Application Development


A database approach reduces the cost and time for developing new
database applications. What this means is that with the same database,
different applications can be developed. Thus, there is no need to design
and develop a new database for different applications (Hoffer et al., 2008).

Meanwhile, there are two disadvantages of the database approach. These are
shown in Figure 1.7.

Figure 1.7: Two disadvantages of database approach

ACTIVITY 1.1
Search the Internet for details of the two disadvantages listed in
Figure 1.7. Discuss your findings with your coursemates in the forum.

1.3.1 The Database


Can you recall the definition of a database? It is a shared collection of logically
related data and a description of this data, designed to meet the information
needs of an organisation (Connolly & Begg, 2009). In other words, it is a large
repository of data that can be used by many users at the same time.

A database is also defined as a self-describing collection of integrated records,
because it contains a description of the data it holds. This description of the
data is called the system catalogue (or data dictionary, or metadata).
Connolly & Begg (2009)
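The self-describing nature of a database can be seen in a small sketch that uses Python's built-in sqlite3 module as a stand-in DBMS. SQLite keeps its catalogue in a table named sqlite_master and also describes columns through PRAGMA table_info; other DBMSs expose their catalogues differently (for example, through INFORMATION_SCHEMA views), so treat this as one illustration rather than the general mechanism. The Staff table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (StaffNo TEXT PRIMARY KEY, Name TEXT)")

# The catalogue is itself queried like ordinary data.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['Staff']

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Staff)")]
print(columns)  # [('StaffNo', 'TEXT'), ('Name', 'TEXT')]
```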


How about data abstraction?

Data abstraction is a database approach that separates the structure of data
from application programs.

Thus, we can change the internal definition of an object in the database without
affecting the users of the object, provided that the external definition remains the
same. For instance, if we were to add a new field to a record or create a new file,
then the existing applications are unaffected. More examples of this will be
shown in the next topic.
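A minimal sketch of this point, again with Python's sqlite3 module standing in for the DBMS: an existing application that selects only the columns it needs keeps working after a new field is added to the table. The Staff table, the Email column and the list_staff function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (StaffNo TEXT PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Staff VALUES ('S01', 'Aminah')")


def list_staff(connection):
    """An 'existing application': it names only the columns it needs."""
    return connection.execute("SELECT StaffNo, Name FROM Staff").fetchall()


before = list_staff(conn)

# Change the internal definition of the table: add a new field to every record.
conn.execute("ALTER TABLE Staff ADD COLUMN Email TEXT")

after = list_staff(conn)
print(before == after)  # True
```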

Some other terms that you need to understand are shown in Table 1.1.

Table 1.1: Database Terms

Term          Definition
Entity        A specific object (for example, a department, place or event) in the
              organisation that is to be represented in the database
Attribute     A property that describes some characteristic of the object that
              we wish to record
Relationship  An association between entities

Source: Connolly & Begg (2009)

Figure 1.8 illustrates an example of an entity-relationship (ER) diagram for part
of a department in an organisation.

Figure 1.8: An example of entity-relationship diagram

Referring to Figure 1.8, we can see that the ER diagram consists of two entities
(rectangles), namely Department and Staff. It has one relationship, which
indicates that a department has many staff. Each entity has one attribute:
DepartmentNo for Department and StaffNo for Staff. In other words, the database
holds data that are logically related. This is explained further in later topics.


SELF-CHECK 1.3

1. What is data abstraction?

2. Define entity, attribute and relationship.

1.3.2 The Database Management System (DBMS)


Now, what about the DBMS? Can you recall its definition from the earlier
explanation? A DBMS is software that interacts with the users' application
programs and the database (Connolly & Begg, 2009). Initially, DBMSs provided
efficient storage and retrieval of data. However, as marketplace and
innovation demands increased, DBMSs evolved to provide a broad range of
features for data acquisition, storage, dissemination, maintenance, retrieval and
formatting, which has made DBMSs more complex.

Now, let us discuss in detail five common features of a DBMS (see Figure 1.9).

Figure 1.9: Five features of DBMS

(a) Database Definition


In defining a database, the entities stored in tables (an entity is defined as a
cluster of data, usually about a single item or object, that can be accessed)
and the relationships that indicate the connections among tables must be
specified. Most DBMSs provide several tools to define a database. The
Structured Query Language (SQL) is an industry standard language,
supported by most DBMSs, that can be used to define tables and the
relationships among tables (Mannino, 2011). SQL is discussed further in
later topics.
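As a hedged illustration of database definition, the sketch below sends SQL through Python's sqlite3 module to define the two entities of Figure 1.8 as tables, together with the relationship between them. The DepartmentNo column in Staff, which records the relationship, is an assumption: Figure 1.8 shows only one attribute per entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (
    DepartmentNo TEXT PRIMARY KEY
);
CREATE TABLE Staff (
    StaffNo      TEXT PRIMARY KEY,
    -- Assumed column: each staff row refers to the department it belongs to.
    DepartmentNo TEXT REFERENCES Department(DepartmentNo)
);
""")

# PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match).
links = [(row[2], row[3], row[4])
         for row in conn.execute("PRAGMA foreign_key_list(Staff)")]
print(links)  # [('Department', 'DepartmentNo', 'DepartmentNo')]
```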

(b) Non-procedural Access


The most important feature of a DBMS is the ability to answer queries. A
query is a request to extract useful data. For instance, in a learner database
in which a few tables have been defined, such as personal information
and result tables, a query might be a request to list the names of the
learners who will be graduating next semester. Non-procedural access
allows users to submit queries by specifying what parts of a database to
retrieve, rather than coding the procedure for retrieving them (Mannino, 2011).
We will continue our discussion of queries in later topics.
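The difference between procedural and non-procedural access can be sketched with Python's sqlite3 module. The Learner table, its columns and the semester value '2024/2' are hypothetical. The loop mimics file-based processing by reading every record and testing it in the program, while the SQL query states only what is wanted and leaves the how to the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Learner (LearnerNo TEXT PRIMARY KEY, Name TEXT, GradSemester TEXT)")
conn.executemany("INSERT INTO Learner VALUES (?, ?, ?)", [
    ("L1", "Aisyah", "2024/2"),
    ("L2", "Kumar", "2025/1"),
    ("L3", "Mei Ling", "2024/2"),
])

# Procedural style: the program spells out HOW, reading and testing every record.
procedural = []
for learner_no, name, semester in conn.execute("SELECT * FROM Learner"):
    if semester == "2024/2":
        procedural.append(name)

# Non-procedural style: the query states only WHAT is wanted.
declarative = [row[0] for row in conn.execute(
    "SELECT Name FROM Learner WHERE GradSemester = ? ORDER BY LearnerNo", ("2024/2",))]
print(declarative)  # ['Aisyah', 'Mei Ling']
```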

(c) Application Development


Most DBMSs provide graphical tools for building complete applications
using forms and reports. For instance, data entry forms provide an easy
way to enter and edit data. Report forms make it easy to view results of a
query (Mannino, 2011).

(d) Transaction Processing


Transaction processing allows a DBMS to process large volumes of repetitive
work. A transaction is a unit of work that should be processed reliably,
without interference from other users and without loss of data due to
failures. An example of a transaction is making an airline reservation. The
user does not need to know the details of transaction processing, only that
the process is reliable and safe (Mannino, 2011).
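A minimal sketch of the airline reservation as a transaction, assuming hypothetical Flight and Booking tables in SQLite (through Python's sqlite3 module). The booking and the seat update either both take effect or, if any step fails, neither does. Concurrency control, the other half of transaction processing, is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Flight (FlightNo TEXT PRIMARY KEY, SeatsLeft INTEGER CHECK (SeatsLeft >= 0));
CREATE TABLE Booking (FlightNo TEXT, Passenger TEXT);
INSERT INTO Flight VALUES ('MH001', 1);
""")


def reserve(connection, flight_no, passenger):
    """Book a seat as one transaction: both changes are kept, or neither is."""
    try:
        with connection:  # commits on success, rolls back if an exception occurs
            connection.execute("INSERT INTO Booking VALUES (?, ?)", (flight_no, passenger))
            connection.execute(
                "UPDATE Flight SET SeatsLeft = SeatsLeft - 1 WHERE FlightNo = ?", (flight_no,))
        return True
    except sqlite3.IntegrityError:  # the CHECK fails when no seats are left
        return False


ok1 = reserve(conn, "MH001", "Ali")   # True
ok2 = reserve(conn, "MH001", "Siti")  # False: the booking insert is rolled back too
bookings = conn.execute("SELECT COUNT(*) FROM Booking").fetchone()[0]
print(ok1, ok2, bookings)  # True False 1
```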

(e) Database Tuning


Database tuning includes monitoring processes that can improve
performance. Utility programs can be used to reorganise a database,
select physical structures for better performance and repair damaged parts
of a database. This feature is important for DBMSs that support large
databases with many simultaneous users, usually known as enterprise
DBMSs. On the other hand, desktop DBMSs run on personal computers and
small servers, support limited transaction processing features and are usually
used by small businesses (Mannino, 2011).
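Selecting a better physical structure can be as simple as adding an index, so that the DBMS can search instead of scanning the whole table. The sketch below, assuming a hypothetical Employee table in SQLite, compares the query plan before and after creating an index. The exact wording of EXPLAIN QUERY PLAN output varies between SQLite versions, so the code only checks whether the index name appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Name TEXT)")

QUERY = "SELECT EmpNo FROM Employee WHERE Name = ?"


def plan(connection):
    """Return the human-readable detail text (last column) of each plan row."""
    return [row[-1] for row in
            connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("Lina Hassan",))]


before_plan = plan(conn)  # no index on Name yet: a full scan of Employee
conn.execute("CREATE INDEX idx_employee_name ON Employee(Name)")
after_plan = plan(conn)   # the plan now searches using idx_employee_name

print(before_plan)
print(after_plan)
```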


1.4 ROLES IN THE DATABASE ENVIRONMENT


Now, let us look at the people involved in the DBMS environment. There
are four types of people involved, as shown in Figure 1.10.

Figure 1.10: Four types of people involved in DBMS environment

Now, let us look at them in detail.

(a) Data and Database Administrators


The data and database administrators are those who manage the data
resources in a DBMS environment. This includes database planning,
development and maintenance of standards, policies and procedures, as
well as conceptual or logical database design, for which they work together
with senior managers. Their work is made more demanding by:

(i) The proliferation of proprietary and open source technologies and
databases on diverse platforms that must be managed simultaneously
in many organisations;

(ii) Rapid growth in the size of databases; and

(iii) Expansion of applications that require linking corporate databases to
the Internet.


(b) Database Designers


There are two types of database designers:

(i) Logical Database Designer


The logical database designer is responsible for identifying the data,
the relationships between the data and the constraints on the data that
are to be stored in the database. He or she needs to have a thorough
understanding of the organisation's data.

(ii) Physical Database Designer


A physical database designer decides how the logical database design
can be physically implemented. He or she is responsible for mapping the
logical database design into a set of tables, selecting specific storage
structures and access methods for the data to achieve good performance,
and designing the security measures needed for the data (Connolly & Begg, 2009).

(c) Application Developers


An application developer is responsible for implementing the application
programs that provide the required functionality for the end users. Usually,
an application developer works from the specification produced by the
systems analysts. The applications may be written in a third generation or
fourth generation programming language.

(d) End Users


The end users are the customers for the database, which has been designed
to serve their information needs. End users can be categorised as:

(i) Naive Users


Naive users are usually unaware of the details of the DBMS; they only
use simple commands or select from a list of options provided by the
application.

(ii) Sophisticated Users


Sophisticated users usually have some knowledge of the structure
and facilities offered by the DBMS. They use a high-level query language to
retrieve the data they need. Some may even write their own application
programs.


SELF-CHECK 1.4
Who are the people involved in the database environment? Briefly
explain their responsibilities.

Summary

• The database management system (DBMS) is an important component of an
information system and has changed the way many organisations operate.

• The predecessor to the DBMS was the file-based system, in which each program
defines and manages its own data. Thus, data redundancy and data
dependence became major problems.

• File-based systems have four limitations, namely, separation and isolation of
data, duplication of data, program data dependence and limited data
sharing.

• A DBMS provides five features: database definition, non-procedural access,
application development, transaction processing and database tuning.

• The database approach was introduced to resolve the problems with the
file-based system. All access to the database is made through the DBMS.

• Some advantages of the database approach are control of data redundancy,
data consistency, sharing of data and improvement of security and integrity.
Meanwhile, two disadvantages are complexity and cost.

• There are four types of people involved in the DBMS environment, namely
data and database administrators, database designers, application developers
and end users.


Key Terms

Data
Database
Database application
Database management system (DBMS)
Database system
Entity
File-based system
Information
Metadata
Program
Relationship
Structured query language (SQL)

Self-Test

1. Define each of the following key terms:

(a) Data

(b) Information

(c) Database

(d) Database application

(e) Database system

(f) Database management system

2. List two disadvantages of file-based systems.

3. List two examples of database systems other than those discussed
in this topic.

4. Discuss the main components of the DBMS environment and how they are
related to each other.


5. Discuss the roles of the following personnel in the database environment:

(a) Database administrator

(b) Logical database designer

(c) Physical database designer

(d) Application developer

(e) End user

6. Study the University Student Affairs case study presented below. In what
ways would a DBMS help this organisation? What data can you identify
that needs to be represented in the database? What relationships exist
between the data items?

Case study: University Student Affairs

Data requirements:

(a) Student

(i) Student identification number

(ii) First and last name

(iii) Home address

(iv) Date of birth

(v) Sex

(vi) Semester of study

(vii) Nationality

(viii) Program of study

(ix) Recent Cumulative Grade Point Average (CGPA)


(b) College (a college is accommodation provided for the students.
Each college in the university has the following information):

(i) College name

(ii) College address

(iii) College office number

(iv) College manager

(v) Number of rooms

(vi) Room number

(c) Sample query transactions:

(i) List the names of students who are staying in the colleges

(ii) List the number of empty rooms in the colleges

(iii) List the names of students within a specific CGPA range

References

Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to
design, implementation and management (5th ed.). Boston, MA: Addison-Wesley.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2008). Modern database
management (8th ed.). New Jersey, NJ: Prentice Hall.

Hoffer, J. A., Prescott, M. B., & Topi, H. (2008). Modern database management
(9th ed.). New Jersey, NJ: Prentice Hall.

Mannino, M. (2011). Database design, application development and
administration (5th ed.). Scottsdale, AZ: Ediyu.

Shelly, G. B. (2011). Discovering computers. Stamford, CT: Cengage Learning.

Topic 2 Relational Data Model

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Explain terminologies on relational database;
2. Discuss how tables are used to represent data;
3. Identify superkeys, candidate keys, primary keys and foreign keys;
4. Describe the meaning of entity integrity and referential integrity; and
5. Explain the concept and purpose of views in relational systems.

INTRODUCTION
Topic 1 was a starting point for your study of database technology. You learned
about the database characteristics and the database management system (DBMS)
features. This topic focuses on the relational data model. Let us begin with a
brief introduction to the model. The relational model was developed by
E. F. Codd in 1970. Its simplicity and familiarity made it hugely popular
compared to the other data models that existed at that time. Since then,
relational DBMSs have dominated the market for business DBMSs
(Mannino, 2011).

This topic explores the relational data model. You will discover that the
strength of this data model lies in its simple logical structure, in which
relations are treated as independent elements. You will then see how these
independent elements can be related to one another.


To ensure that the data in the database are accurate and meaningful,
integrity rules are then explained. Two important integrity rules are described:
entity integrity and referential integrity. Finally, the topic ends with the
concept and purpose of views.

2.1 TERMINOLOGY
First of all, let us start with the definitions of some of the pertinent terminology.
The relational data model was developed because of its simplicity and its familiar
terminology. The model is based on the concept of a relation which is physically
represented as a table (Connolly & Begg, 2009). This subtopic presents the basic
terminology and structural concepts of the relational model.

2.1.1 Relational Data Structure


Now, let us learn about relational data structure.

(a) Relation

A relation is a table with columns and rows. A relation is represented as
a two-dimensional table in which columns correspond to attributes and
rows correspond to tuples. Another set of terms describes a relation as a
file, tuples as records and attributes as fields.
Connolly & Begg (2009)

The alternative terminology for a relation is summarised in Table 2.1.

Table 2.1: Alternative Terminology

Formal Term   Alternative 1   Alternative 2
Relation      Table           File
Tuple         Row             Record
Attribute     Column          Field


The relation must have a name that is distinct from other relation names in
the same database. Table 2.2 shows a listing of the two-dimensional table
named Employee, consisting of seven columns and six rows. The heading
part consists of the table name and the column names. The body shows the
rows of the table.

Table 2.2: Employee Table

EmpNo  Name          MobileTelNo  Position           Gender  DOB          Salary
E1708  Shan Dass     012-5463344  Administrator      F       19-Feb-1975  980
E1214  Tan Ai Lee    017-6697123  Salesperson        M       23-Dec-1969  1500
E1090  Mat Zulkifli  013-6710899  Manager            M       07-May-1960  3000
E3211  Lim Kim Hock  017-5667110  Assistant Manager  M       15-Jun-1967  2600
E4500  Lina Hassan   012-6678190  Clerk              F       31-May-1980  750
E5523  Mohd Firdaus  013-3506711  Clerk              M       14-Feb-1979  600

(b) Attribute

An attribute is a named column of a relation.

In the Employee table (see Table 2.2), the columns for attributes are EmpNo
(Employee Number), Name, MobileTelNo (Mobile Telephone Number),
Position, Gender, DOB (Date Of Birth) and Salary.

Note that every column-row intersection contains a single atomic data
value. For example, in each row the EmpNo column contains only the
number of a single existing employee.

Attributes can be classified into several types. To make these types easier
to understand, the attributes are paired according to opposite meanings.


• Simple vs. Composite Attributes

A simple attribute has only one component, exists independently and
cannot be broken up. Independence means that the component does not
have to rely on other attributes. For example, a name or a gender cannot
be divided into two or more sections. This is in contrast to a composite
attribute, which comprises many components, each existing
independently. An example is an address with sub-attributes
such as house number, road name, postcode, district, state and country.

• Single-Value vs. Multiple-Value Attributes

A single-value attribute is an attribute that holds only one value.
Matric number and identity card number are examples of single-valued
attributes. A multiple-value attribute is an attribute that can hold many
values. For example, an individual can have several telephone numbers,
such as a house telephone number, an office telephone number and a mobile
telephone number.

• Derived Attribute
A derived attribute is an attribute whose value is derived from the
value of a related attribute or a set of other attributes. An example is the
age attribute, which is derived from the date of birth.
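The attribute types above can be sketched as a Python data structure. This is only an illustration in program code, not how a relational DBMS stores attributes (a relation holds atomic values only); the Address sub-attributes and the sample address are hypothetical, while the name, gender, date of birth and phone number come from Table 2.2.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:              # composite attribute: made up of sub-attributes
    house_no: str
    road: str
    postcode: str
    state: str


@dataclass
class Person:
    name: str               # simple, single-valued attribute
    gender: str             # simple, single-valued attribute
    dob: date               # stored attribute
    address: Address        # composite attribute
    phones: list = field(default_factory=list)  # multiple-value attribute

    def age(self, today):
        """Derived attribute: computed from dob rather than stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


p = Person("Lina Hassan", "F", date(1980, 5, 31),
           Address("12", "Jalan Kebun", "40000", "Selangor"),
           ["012-6678190"])
print(p.age(date(2014, 8, 1)))  # 34
```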

Data Types
Data types indicate the kind of data for the column (character, numeric,
yes or no, etc.) and permissible operations (numeric operations, string
operations) for the column. Table 2.3 lists the five common data types.

Table 2.3: Five Common Data Types

Data Type           Description
Numeric             Data on which you can perform the arithmetic operations of
                    addition, subtraction, multiplication and division
Character           Fixed-length text, which can contain any character (space
                    included) or symbol that is not intended for mathematical operations
Variable Character  Variable-length text, which can contain any character (space
                    included) or symbol not intended for mathematical operations
Date                Calendar dates stored using the year, month and day fields.
                    The allowable operations include comparing two dates and
                    generating a date by adding or subtracting a number of days
                    from a given date
Logical             Attributes containing data with two values, such as true or
                    false, or yes or no


In the Employee relation in Table 2.2, Salary is a numeric attribute, so
arithmetic operations can be performed on it. For example, you can sum the
salaries to get the total salary of the employees and determine the annual
salary of each employee by multiplying the employee's salary by 12.
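The two calculations just described can be written as SQL queries. This sketch loads the EmpNo and Salary values from Table 2.2 into an in-memory SQLite table through Python's sqlite3 module; the other columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Salary NUMERIC)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [
    ("E1708", 980), ("E1214", 1500), ("E1090", 3000),
    ("E3211", 2600), ("E4500", 750), ("E5523", 600),
])

# Total salary of all employees.
total = conn.execute("SELECT SUM(Salary) FROM Employee").fetchone()[0]
print(total)  # 9430

# Annual salary of each employee: monthly salary multiplied by 12.
annual = conn.execute(
    "SELECT EmpNo, Salary * 12 FROM Employee ORDER BY EmpNo").fetchall()
print(annual[0])  # ('E1090', 36000)
```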

The attributes EmpNo, MobileTelNo and Gender are fixed-length character
attributes: each value must contain the maximum number of characters. You
will notice that every value in the EmpNo column consists of five characters,
while every value in the MobileTelNo column consists of 11 characters. The
Gender attribute consists of only one character, that is, F for Female or M for
Male.

The Name and Position attributes are variable-length character attributes.
These columns store only the actual number of characters, not the maximum
length. As you can see from the Employee relation, the number of characters
in the Name column varies from 9 to 12, while the number of characters in the
Position column varies from 5 to 17. Finally, DOB is a date attribute; in
Table 2.2 its values are displayed in the format DD-Mon-YYYY (for example,
19-Feb-1975).

Every attribute is defined on a domain.

Domain is a set of allowable values for one or more attributes.


Connolly & Begg (2009)

For example, in the MobileTelNo attribute, the first three digits are limited to
012, 013, 016, 017 or 019, which correspond to the mobile telecommunications
service operators in Malaysia. Similarly, Gender is limited to the characters F or
M. Table 2.4 summarises the domains for the Employee relation.


Table 2.4: Domains for the Employee Relation

Attribute    Domain Name        Meaning                                   Domain Definition
EmpNo        Employee Numbers   The set of all possible employee numbers  Character; size 5, range E0001 to E9999
Name         Names              The set of all employee names             Character; size 20
MobileTelNo  Telephone Numbers  The set of possible hand phone numbers    Fixed character; size 11, first 3 digits
                                in Malaysia                               012/013/016/017/019
Position     Positions          The set of possible positions for         Variable character; size 15
                                employees
Gender       Genders            Gender of the employee                    Character; size 1, value M or F
DOB          Dates of Birth     Possible values of staff birth dates      Date; range from 1-Jan-1950, format dd-mm-yy
Salary       Salaries           Possible values of staff salaries         Numeric; 7 digits, range 8400.00 to 50000.00

The domain concept is important because it allows the user to define the
meaning and source of values that the attributes can hold.
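Some DBMSs let you enforce part of a domain with CHECK constraints. The sketch below, using SQLite through Python's sqlite3 module, encodes three of the domains in Table 2.4. The constraint expressions are one possible encoding, assumed for illustration; other DBMSs may offer different tools, such as CREATE DOMAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Employee (
    -- 'E' followed by four digits, E0001 to E9999 (GLOB is case-sensitive).
    EmpNo       TEXT PRIMARY KEY
                CHECK (EmpNo GLOB 'E[0-9][0-9][0-9][0-9]' AND EmpNo <> 'E0000'),
    -- First three characters must be one of the listed operator prefixes.
    MobileTelNo TEXT CHECK (substr(MobileTelNo, 1, 3) IN ('012','013','016','017','019')),
    Gender      TEXT CHECK (Gender IN ('M', 'F'))
)""")


def try_insert(row):
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:  # raised when a CHECK constraint fails
        return "rejected"


results = [
    try_insert(("E1708", "012-5463344", "F")),  # accepted
    try_insert(("E1709", "012-1234567", "X")),  # rejected: Gender outside {M, F}
    try_insert(("E1710", "011-1234567", "M")),  # rejected: prefix not in the domain
]
print(results)  # ['accepted', 'rejected', 'rejected']
```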

(c) Tuple
What does a tuple mean?

A tuple is a row of a relation.

Each row in the Employee relation represents one employee's information.
For example, the fourth row in the Employee relation describes an
employee named Lim Kim Hock. The Employee relation contains six
distinct rows, so you can describe the Employee table as consisting of six
records.


SELF-CHECK 2.1

1. What is a relation?

2. What do a column, a row and an intersection represent?

2.1.2 Relational Keys


Relational keys can be divided into four categories, as shown in Figure 2.1.

Figure 2.1: Four types of relational keys

Let us learn about them in detail.

(a) Superkey

A superkey is a column or combination of columns that uniquely
identifies a row within a relation.

The combination of every column in a table is always a superkey because
rows in a table must be unique (Mannino, 2011). Given the listing of the
Employee relation in Table 2.2, a superkey can be any of the following:

(i) EmpNo;

(ii) EmpNo, Name; and

(iii) EmpNo, Name, MobileTelNo.


(b) Candidate Key

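A candidate key is described as a superkey without redundancies.

Rob & Coronel (2011)

A relation can have several candidate keys. When a key consists of more
than one attribute, it is known as a composite key. For example, the
combination of EmpNo and Name is a composite superkey, but it is not a
candidate key, because EmpNo alone already identifies each row and Name is
therefore redundant.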

A listing of a relation cannot be used to prove that an attribute or
combination of attributes is a candidate key. The fact that there are no
duplicates currently in the Employee relation does not guarantee that
duplicates will not occur in the future. For example, if we look at
the rows in our Employee relation, we could also pick the attribute Name as a
candidate key, because all names are unique at this particular moment.

However, we cannot discount the possibility that someone who shares a
name with a current employee joins the organisation in the future. This
makes Name an unwise choice of candidate key. On the other hand, the
attributes EmpNo and MobileTelNo are suitable candidate keys. EmpNo is
suitable because an employee's identification number in any organisation is
unique, and MobileTelNo because no two people share the same hand phone
number.
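The superkey and candidate key definitions can be checked against the current rows of a relation with a short Python sketch. As the text explains, such a check can only rule a key out: passing it on today's rows does not prove that the attribute will stay unique. The helper functions are hypothetical, and only three columns of Table 2.2 are used.

```python
from itertools import combinations

# Rows from Table 2.2 (three columns kept for brevity).
rows = [
    {"EmpNo": "E1708", "Name": "Shan Dass",    "Position": "Administrator"},
    {"EmpNo": "E1214", "Name": "Tan Ai Lee",   "Position": "Salesperson"},
    {"EmpNo": "E1090", "Name": "Mat Zulkifli", "Position": "Manager"},
    {"EmpNo": "E3211", "Name": "Lim Kim Hock", "Position": "Assistant Manager"},
    {"EmpNo": "E4500", "Name": "Lina Hassan",  "Position": "Clerk"},
    {"EmpNo": "E5523", "Name": "Mohd Firdaus", "Position": "Clerk"},
]


def is_superkey(cols, rows):
    """True if no two rows share the same values for cols (in this instance)."""
    values = [tuple(r[c] for c in cols) for r in rows]
    return len(set(values)) == len(values)


def is_candidate_key(cols, rows):
    """A superkey with no redundant column: no proper subset is a superkey."""
    if not is_superkey(cols, rows):
        return False
    return not any(is_superkey(sub, rows)
                   for k in range(1, len(cols))
                   for sub in combinations(cols, k))


print(is_superkey(("EmpNo", "Name"), rows))       # True
print(is_candidate_key(("EmpNo", "Name"), rows))  # False: EmpNo alone suffices
print(is_candidate_key(("EmpNo",), rows))         # True
print(is_superkey(("Position",), rows))           # False: two Clerks
```

Note that is_candidate_key(("Name",), rows) also returns True for these six rows, which is exactly why a listing alone cannot justify choosing Name as a key.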

(c) Primary Key

A primary key is the candidate key selected to uniquely identify rows
within the relation.

Note that a primary key is also a superkey. In our Employee
table, EmpNo can be chosen as the primary key, in which case MobileTelNo
becomes an alternate key.


(d) Foreign Key

A foreign key is an attribute or a set of attributes in one table whose
values must match the candidate key of another relation.

When an attribute is used in more than one relation, it represents a
relationship between the two relations. Consider the relations of Product
and Supplier in Figure 2.2.

Figure 2.2: Relations between Supplier and Product

The addition of SuppNo to both the Supplier and Product tables links each
supplier to details of the products that it supplies. In the Supplier
relation, SuppNo is the primary key. In the Product relation, the SuppNo
attribute exists to match each product to its supplier, so SuppNo is a
foreign key. Notice that each data value of SuppNo in Product matches a
SuppNo in Supplier. The reverse need not be true: a supplier may not yet
supply any product.
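The link that the foreign key provides can be followed with a join. In this sketch the supplier names and product rows are hypothetical, loaded into SQLite through Python's sqlite3 module; supplier S2 has no products, showing that the reverse match need not hold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier (SuppNo TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Product (
    ProductNo TEXT PRIMARY KEY,
    Name      TEXT,
    SuppNo    TEXT REFERENCES Supplier(SuppNo)   -- foreign key
);
INSERT INTO Supplier VALUES ('S1', 'Alpha Trading'), ('S2', 'Beta Supplies');
INSERT INTO Product VALUES ('P1', 'Pen', 'S1'), ('P2', 'Ruler', 'S1');
""")

# Matching Product.SuppNo to Supplier.SuppNo links each product to its supplier.
links = conn.execute("""
    SELECT p.ProductNo, s.Name FROM Product p
    JOIN Supplier s ON p.SuppNo = s.SuppNo
    ORDER BY p.ProductNo""").fetchall()
print(links)  # [('P1', 'Alpha Trading'), ('P2', 'Alpha Trading')]

# Suppliers that no product refers to.
unused = conn.execute(
    "SELECT SuppNo FROM Supplier WHERE SuppNo NOT IN (SELECT SuppNo FROM Product)"
).fetchall()
print(unused)  # [('S2',)]
```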


SELF-CHECK 2.2
Explain the following:

(a) Superkey

(b) Candidate key

(c) Primary key

(d) Foreign key

2.1.3 Representing Relational Database Schemas


A relational database consists of any number of relations. The relational schema
for part of the Order Entry Database is shown in Table 2.5.

Table 2.5: Relational Database Schemas of Order Entry Database

Relation     Attributes
Customer     CustNo, Name, Street, City, Postcode, TelNo, Balance
Employee     EmpNo, Name, TelNo, Position, Gender, DOB, Salary
Invoice      InvoiceNo, Date, DatePaid, OrderNo
Order        OrderNo, OrderDate, OrderStreet, OrderCity, OrderPostcode, CustNo, EmpNo
OrderDetail  OrderNo, ProductNo, QtyOrdered
Product      ProductNo, Name, UnitPrice, QtyOnHand, ReorderLevel, SuppNo
Delivery     DeliveryNo, DeliveryDate, OrderNo, ProductNo, EmpNo
Supplier     SuppNo, Name, Street, City, Postcode, TelNo, ContactPerson

The standard way of representing a relation schema is to give the name of the
relation followed by the attribute names in parentheses, with the primary key
underlined. An instance of this relational database schema is shown in Figure 2.3.


Figure 2.3: Instance of Order Entry Database


2.2 INTEGRITY CONSTRAINTS


This subtopic discusses the integrity constraints that help keep all data
accurate. You have already been exposed to the attribute domain, a form of
constraint that limits the set of values allowed for an attribute. Before we
go on to explain two integrity constraints (namely entity integrity and referential
integrity), it is essential to understand the concept of nulls.

2.2.1 Nulls

Null represents an unknown attribute value, a known but missing attribute
value, or a value that is "not applicable" for a row.
Rob & Coronel (2011)

Nulls are not the same as zeros or spaces, as a null represents the absence of a
value (Connolly & Begg, 2009). For example, in the Invoice relation of the Order
Entry Database, the DatePaid attribute in the second row is null until the
customer pays for the order.
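In SQL, a null is tested with IS NULL rather than with the = operator, and it is distinct from an empty string. A small sketch with a hypothetical Invoice table in SQLite, through Python's sqlite3 module (Python's None is stored as NULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (InvoiceNo TEXT PRIMARY KEY, DatePaid TEXT)")
conn.executemany("INSERT INTO Invoice VALUES (?, ?)",
                 [("I1", "2014-08-01"), ("I2", None), ("I3", "")])

# NULL is not equal to anything, not even NULL, so '= NULL' matches no rows.
eq_null = conn.execute(
    "SELECT COUNT(*) FROM Invoice WHERE DatePaid = NULL").fetchone()[0]

# IS NULL is the correct test, and it does not match the empty string in I3.
is_null = conn.execute(
    "SELECT InvoiceNo FROM Invoice WHERE DatePaid IS NULL").fetchall()

print(eq_null, is_null)  # 0 [('I2',)]
```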

2.2.2 Entity Integrity

Entity integrity requires that a relation has a primary key and that no
primary key attribute is null.

This guarantees that every row can be identified and ensures that foreign keys
can accurately reference primary key values. In the Employee table, EmpNo is the
primary key, so we cannot insert new employee details into the table with a null
EmpNo. The OrderDetail relation has the composite primary key OrderNo and
ProductNo, so to insert a new row, both values must be known.
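A sketch of entity integrity for the composite primary key of OrderDetail, using SQLite through Python's sqlite3 module. One SQLite-specific point matters here: for historical reasons, SQLite allows nulls in most PRIMARY KEY columns unless NOT NULL is declared, so the sketch declares it explicitly; many other DBMSs imply NOT NULL for primary key columns. The order and product numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite would otherwise accept NULL here.
conn.execute("""
CREATE TABLE OrderDetail (
    OrderNo    TEXT NOT NULL,
    ProductNo  TEXT NOT NULL,
    QtyOrdered INTEGER,
    PRIMARY KEY (OrderNo, ProductNo)
)""")

conn.execute("INSERT INTO OrderDetail VALUES ('O1', 'P1', 5)")

rejected = []
for row in [("O1", None, 2),    # part of the primary key is null
            ("O1", "P1", 3)]:   # duplicate primary key value
    try:
        conn.execute("INSERT INTO OrderDetail VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)

print(len(rejected))  # 2
```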


2.2.3 Referential Integrity

Referential integrity means that a foreign key value in a relation must match
a primary key value of some tuple in the referenced relation, or else the
foreign key value must be null.
Connolly & Begg (2009)

For example, in the Order Entry Database, the Product table has the foreign key
SuppNo. You will notice that every entry of SuppNo in the rows of the Product
table matches the SuppNo of the referenced table Supplier. However, we can
create a new product record with a null SuppNo, if currently no suppliers have
been identified to supply the product.
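Referential integrity for Product.SuppNo can be sketched in SQLite through Python's sqlite3 module. Note the assumption that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection; most other DBMSs enforce them by default. The product and supplier numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Supplier (SuppNo TEXT PRIMARY KEY);
CREATE TABLE Product (ProductNo TEXT PRIMARY KEY,
                      SuppNo    TEXT REFERENCES Supplier(SuppNo));
INSERT INTO Supplier VALUES ('S1');
""")


def add_product(product_no, supp_no):
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?)", (product_no, supp_no))
        return "accepted"
    except sqlite3.IntegrityError:  # raised on a foreign key violation
        return "rejected"


results = [
    add_product("P1", "S1"),  # accepted: S1 exists in Supplier
    add_product("P2", None),  # accepted: null foreign key, no supplier identified yet
    add_product("P3", "S9"),  # rejected: no Supplier row with SuppNo S9
]
print(results)  # ['accepted', 'accepted', 'rejected']
```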

SELF-CHECK 2.3

1. What is a null?
2. Can a primary key value have a null value?
3. What values can a foreign key take?

2.3 VIEWS

A view is a virtual or derived relation that may be derived from one or more
base relations.
Connolly & Begg (2009)

This subtopic briefly discusses views.


2.3.1 Base Relations and Views


The relations in the Order Entry Database are base relations. A base relation is a
relation whose tuples are physically stored in the database. A view is a
virtual relation that does not exist in the database but is produced upon request,
as the result of one or more operations on the base relations. A view appears
to exist for the user but is not held in storage as base relations are.
Views are dynamic: changes made to the base relations that affect the view are
automatically reflected in it (Connolly & Begg, 2009).

2.3.2 Purpose of Views


Views are beneficial for the following reasons:

(a) They allow users to customise the data according to their needs, so that the same data can be seen by different users in different ways at the same time; and

(b) They hide part of the database from selected users, providing a powerful security mechanism. These users are not aware of the tuples and attributes that the view hides from them (Connolly & Begg, 2009).
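Both the hiding of attributes and the dynamic nature of views can be shown with a short sketch. It assumes Python's built-in sqlite3 module as the DBMS. StaffDirectory is a hypothetical view that leaves out Salary, and the employee E9001 ("New Hire") is invented to show that a change to the base relation appears in the view immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EmpNo TEXT NOT NULL PRIMARY KEY,
        Name TEXT, TelNo TEXT, Position TEXT, Salary INTEGER
    )""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    ("E1090", "Ahmad Zulkifli", "013-6710899", "Manager", 3000),
    ("E4500", "Lina Hassan", "012-6678190", "Clerk", 750),
    ("E5523", "Mohd Firdaus", "013-3506711", "Clerk", 600),
])

# Hypothetical view: a staff phone list that hides the Salary column.
conn.execute("CREATE VIEW StaffDirectory AS SELECT EmpNo, Name, TelNo FROM Employee")

cur = conn.execute("SELECT * FROM StaffDirectory")
view_columns = [col[0] for col in cur.description]
rows_before = len(cur.fetchall())

# The view stores only its defining query, so a change to the base
# relation is visible through the view straight away.
conn.execute("INSERT INTO Employee VALUES ('E9001', 'New Hire', '012-0000000', 'Clerk', 700)")
rows_after = conn.execute("SELECT COUNT(*) FROM StaffDirectory").fetchone()[0]

print(view_columns)             # ['EmpNo', 'Name', 'TelNo']
print(rows_before, rows_after)  # 3 4
```

A user given access only to StaffDirectory can read names and telephone numbers but never sees the Salary attribute.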

SELF-CHECK 2.4
1. What is a view?

2. What can you do with a view?

Summary

• The relational data model was developed because of its simplicity and familiar terminology. The model is based on the concept of a relation, which is physically represented as a table (Connolly & Begg, 2009).

• A relation is represented as a two-dimensional table in which the columns correspond to attributes and the rows correspond to tuples.

• The intersection of a column and a row holds a single atomic value. All values in a column must be of the same data type and are drawn from the same attribute domain. The order of the rows and columns has no significance.

• A superkey is a column or combination of columns that uniquely identifies a row within a relation. A candidate key is a superkey without redundancies, that is, a minimal superkey. A primary key is the candidate key that is selected to uniquely identify rows within the relation. A foreign key is an attribute or set of attributes in one table whose values must match the candidate key values of another relation.

• Null represents the absence of a value. A primary key value cannot be null. A foreign key value must either match a primary key value in the related table or be null.

• A view is a virtual or derived relation that may be derived from one or more base relations. Views allow users to customise data according to their needs and hide parts of the database from certain users, which provides security for the database.

Key Terms

Attribute
Attribute domain
Base relation
Candidate key
Column
Composite key
Domain
Entity integrity
Field
File
Foreign key
Null
Primary key
Record
Referential integrity
Relation
Relational database
Relational schema
Rows
Superkey
Table
Tuples
Views

Self-Test

1. How is creating a table similar to writing a chapter of a book?

2. What is the difference between a primary key and a candidate key? Give an
example.

3. The following forms part of a database held in a relational DBMS:

Resort (resortNo, resortName, city, country)

Room (roomNo, resortNo, type, cost, bedQty, bedType)

Booking (bookingNo, resortNo, guestNo, dateFrom, dateTo, roomNo)

Guest (guestNo, guestName, guestAddress)

Resort consists of resort details and resortNo is the primary key. Room
contains room details for each resort and roomNo is the primary key.
Booking contains details of bookings and bookingNo is the primary key.
Guest contains guest details and guestNo is the primary key.

(i) Identify the foreign keys in this schema. Explain how the entity and
referential integrity rules apply to these relations; and

(ii) Produce four sample tables for these relations that observe the
relational integrity rules.

References

Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to design, implementation and management (5th ed.). Boston, MA: Addison-Wesley.

Mannino, M. (2011). Database design, application development and administration (5th ed.). Scottsdale, AZ: Ediyu.

Rob, P., & Coronel, C. (2011). Database systems: Design, implementation and management (8th ed.). Stamford, CT: Cengage Learning.



Topic  Structured Query
Language (SQL):
3 Data
Manipulation
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Discuss the purpose and importance of structured query language
(SQL);
2. Identify the main features of SQL; and
3. Describe the basic features of data manipulation language (DML).

INTRODUCTION
In this topic, you will learn the basic features and functions of structured query language (SQL). SQL is simple and relatively easy to learn. It is the standard language of relational databases, used both for data administration (creating tables, indexes and views, and controlling access) and for data manipulation (adding, modifying, deleting and retrieving data). In this topic, the focus is on data manipulation.


3.1 INTRODUCTION TO STRUCTURED QUERY LANGUAGE (SQL)
Have you ever wondered how a database application works? Most of them use SQL. The front end translates your mouse clicks and text entries into SQL and then "speaks" to the database in the universal language of SQL.

In this subtopic, we will provide a description of what SQL is, give the
background and history of SQL and discuss the importance of SQL to the
database application.

(a) What is SQL?

Structured query language (SQL) is a language used to communicate with a database.

It allows us to perform tasks such as retrieving and updating data in a database, and it also allows us to create and define the database itself. SQL is widely used and is supported by most database vendors, with small variations in syntax and features. In other words, if you learn how to use SQL, you can apply this knowledge to MS Access, SQL Server, Oracle, Ingres and many other databases. According to the American National Standards Institute (ANSI), it is the standard language for relational database management systems.

(b) SQL Commands

SQL commands can be divided into two main sublanguages:

(i) Data Definition Language (DDL)
Used to define the database structure and control access to the database; and

(ii) Data Manipulation Language (DML)
Used to retrieve and update data in existing tables within the database.

In this topic, we focus only on the DML commands while Topic 4 will continue
with the discussion on DDL.


3.1.1 History of SQL


The history of SQL is explained in Table 3.1.

Table 3.1: History of Structured Query Language

Year Description
1970 The relational model, from which SQL draws much of its conceptual core, was formally defined by Dr E. F. Codd, a researcher at IBM
1974 IBM began the System/R project and developed the Structured English Query Language (SEQUEL)
1974-1975 System/R was implemented on an IBM prototype called SEQUEL-XRM
1976-1977 System/R was completely rewritten to include multi-table and multi-user features. During the revision, the language was briefly called "SEQUEL/2" and was then renamed "SQL" for legal reasons
1983 IBM began to develop commercial products that implement SQL based on its System/R prototype, including DB2

Several other software vendors recognised the rise of the relational model and announced SQL-based products. These included Oracle, Sybase and Ingres (based on the University of California's Berkeley Ingres project).

3.1.2 Importance of SQL


Table 3.2 summarises three reasons why SQL is important.

Table 3.2: Three Reasons Why SQL Is Important

Importance Description
Standard language for relational databases It has been globally accepted
A powerful data management tool Almost all major database vendors support SQL
Easy to learn SQL is a non-procedural language. You only need to specify what is to be done, not how it is to be done
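The "what, not how" point can be made concrete by solving one task twice. The sketch below is illustrative and assumes Python's built-in sqlite3 module as the DBMS: first a procedural loop that spells out each step, then a single declarative SQL statement that only states the condition.

```python
import sqlite3

employees = [
    ("E1708", "Shan Dass", 980), ("E1214", "Tan Haut Lee", 1500),
    ("E1090", "Ahmad Zulkifli", 3000), ("E3211", "Lim Kim Hock", 2600),
    ("E4500", "Lina Hassan", 750), ("E5523", "Mohd Firdaus", 600),
]

# Procedural style: we spell out HOW to get the answer (loop, test, collect).
procedural = []
for emp_no, name, salary in employees:
    if salary > 1000:
        procedural.append(name)

# Non-procedural style: we state WHAT we want; the DBMS decides how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Name TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", employees)
declarative = [row[0] for row in conn.execute(
    "SELECT Name FROM Employee WHERE Salary > 1000")]

# Row order is not guaranteed without ORDER BY, so compare sorted lists.
print(sorted(procedural) == sorted(declarative))  # True
print(sorted(declarative))  # ['Ahmad Zulkifli', 'Lim Kim Hock', 'Tan Haut Lee']
```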


SELF-CHECK 3.1
1. Briefly explain SQL.

2. Explain one importance of SQL.

3.2 WRITING SQL COMMANDS


Before we introduce the SQL commands, let us look at the following rules for
writing an SQL statement:

(a) SQL is a keyword-based language. It consists of:

(i) Reserved Words
A reserved word has a fixed meaning and must be spelled exactly as required.

(ii) User-defined Words
User-defined words are defined by the user to represent the names of various database objects, including tables, columns and indexes.

(b) In most DBMSs, SQL keywords and unquoted object names are not case sensitive, so they can be typed in either lower-case or upper-case letters. The exception is literal character data: 'Clerk' and 'clerk' are different values.

(c) SQL is a free-format language. However, to make statements more readable, it is advisable to use indentation and alignment.

(d) The SQL notation used throughout this module follows the Backus-Naur Form (BNF), which is described as follows:

(i) Upper-case letters are used to represent reserved words;

(ii) Lower-case letters are used to represent user-defined words;

(iii) A vertical bar ( | ) indicates a choice among alternatives;

(iv) Curly brackets ( { } ) indicate a required element;

(v) Square brackets ( [ ] ) indicate an optional element; and

(vi) An ellipsis ( ... ) indicates optional repetition of an item zero or more times.
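The difference between case-insensitive keywords and case-sensitive data values can be checked with a minimal sketch, assuming Python's built-in sqlite3 module as the DBMS. The same query is written once in upper case and once in lower case, then the string literal's case is changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Name TEXT, Position TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("E1090", "Ahmad Zulkifli", "Manager"),
    ("E4500", "Lina Hassan", "Clerk"),
    ("E5523", "Mohd Firdaus", "Clerk"),
])

# Reserved words and table/column names: upper or lower case both work.
upper = conn.execute(
    "SELECT Name FROM Employee WHERE Position = 'Clerk' ORDER BY Name").fetchall()
lower = conn.execute(
    "select name from employee where position = 'Clerk' order by name").fetchall()

# Literal character data: case matters, so 'clerk' matches nothing here.
wrong_case = conn.execute(
    "SELECT Name FROM Employee WHERE Position = 'clerk'").fetchall()

print(upper == lower, len(upper), len(wrong_case))  # True 2 0
```

Whether string comparison ignores case depends on the DBMS and its collation settings; SQLite's default `=` comparison is case sensitive, as shown.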


SELF-CHECK 3.2
1. What does case-sensitive mean?

2. What is BNF and how is it used to describe SQL syntax?

3.3 DATA MANIPULATION


In this topic, we will only focus on the DML commands as shown in Table 3.3.

Table 3.3: Four Commands of Data Manipulation Language

Command Details
SELECT Extracts data from a database table
UPDATE Updates data in a database table
DELETE Deletes data from a database table
INSERT INTO Inserts new data into a database table
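The four commands in Table 3.3 can be seen working together in one short session. This is a minimal sketch assuming Python's built-in sqlite3 module as the DBMS; the CREATE TABLE line is DDL, needed only so that there is a table to manipulate, and the RM50 raise is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL (Topic 4): define the table before any data manipulation.
conn.execute("""CREATE TABLE Employee (
    EmpNo TEXT PRIMARY KEY, Name TEXT, Position TEXT, Salary INTEGER)""")

# INSERT INTO: add new rows.
conn.execute("INSERT INTO Employee VALUES ('E4500', 'Lina Hassan', 'Clerk', 750)")
conn.execute("INSERT INTO Employee VALUES ('E5523', 'Mohd Firdaus', 'Clerk', 600)")

# UPDATE: change existing data (a hypothetical RM50 raise for one clerk).
conn.execute("UPDATE Employee SET Salary = Salary + 50 WHERE EmpNo = 'E5523'")

# DELETE: remove rows that satisfy a condition.
conn.execute("DELETE FROM Employee WHERE EmpNo = 'E4500'")

# SELECT: retrieve what is left.
remaining = conn.execute("SELECT EmpNo, Name, Salary FROM Employee").fetchall()
print(remaining)  # [('E5523', 'Mohd Firdaus', 650)]
```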

As mentioned earlier, SQL keywords are not case sensitive. In other words, SELECT is the same as select. In our discussion and illustration of SQL commands, we use the Order Entry Database from the previous topic; its schema, shown there as Table 2.5, is reproduced here as Table 3.4.

Table 3.4: Relational Database Schemas of Order Entry Database

Relation Attributes
Customer CustNo, Name, Street, City, Postcode, TelNo, Balance
Employee EmpNo, Name, TelNo, Position, Gender, DOB, Salary
Invoice InvoiceNo, Date, DatePaid, OrderNo
Order OrderNo, OrderDate, OrderStreet, OrderCity, OrderPostcode, CustNo, EmpNo
OrderDetail OrderNo, ProductNo, QtyOrdered
Product ProductNo, Name, UnitPrice, QtyOnHand, ReorderLevel, SuppNo
Delivery DeliveryNo, DeliveryDate, OrderNo, ProductNo, EmpNo
Supplier SuppNo, Name, Street, City, Postcode, TelNo, ContactPerson


3.3.1 Simple Queries


The SQL SELECT statement allows you to retrieve and display selected data
from one or more tables in your database. The SELECT statement also allows you
to group and sort the result in a specified order. The following is the general
form of a SELECT statement:

Syntax
SELECT [DISTINCT | ALL] {* | column_expression [AS new_name] [, ...]}
FROM tablename [alias] [, ...]
[WHERE condition]
[GROUP BY column_list] [HAVING condition]
[ORDER BY column_list]

The meanings of the clauses used in the SELECT statement are listed in Table 3.5.

Table 3.5: Six Clauses and Meanings in the SELECT Statement

Clause Meaning
SELECT Specifies the columns and/or expressions that should appear in the output
FROM Indicates the table(s) from which data will be obtained
WHERE Specifies the rows to be used. If omitted, all table rows are used
GROUP BY Groups rows that have the same value in the specified column(s)
HAVING Indicates the conditions under which a group will be included
ORDER BY Sorts the result according to the specified criteria

The order of these clauses cannot be changed. The SELECT and FROM clauses are mandatory; the other clauses are optional. The result of a SELECT statement is a table. Next, you will learn several variations of the SELECT statement.

(a) Retrieve All Rows


In this section, we illustrate the variations of the SELECT statement, using
SELECT and FROM clauses.

Example 3.1: To retrieve all columns and all rows

Query 1: Provide a list of all information about all employees.

This query requires us to select all columns and all rows from the Employee
table.


Let us take a look at how to write this query.

SELECT EmpNo, Name, TelNo, Position, Gender, DOB, Salary
FROM Employee;

For queries that list all columns, the SELECT clause can be shortened by using an asterisk (*). Therefore, you may write the query above as:

SELECT *
FROM Employee;

Both statements produce the same result as shown in Table 3.6.

Table 3.6: Result Table for Query 1

EmpNo Name TelNo Position Gender DOB Salary
E1708 Shan Dass 012-5463344 Administrator F 19-Feb-1975 980
E1214 Tan Haut Lee 017-6697123 Salesperson M 23-Dec-1969 1500
E1090 Ahmad Zulkifli 013-6710899 Manager M 07-May-1960 3000
E3211 Lim Kim Hock 017-5667110 Assistant Manager M 15-Jun-1967 2600
E4500 Lina Hassan 012-6678190 Clerk F 31-May-1980 750
E5523 Mohd Firdaus 013-3506711 Clerk M 14-Feb-1979 600

Example 3.2: To retrieve specific columns, all rows

Query 2: Display names, salary and position for all employees.

This query requires selecting only specific columns from the Employee
table.

Let us take a look at how to write this query:

SELECT Name, Salary, Position
FROM Employee;


As mentioned earlier, the result of an SQL statement is a relation, or table. The columns in the result table appear in the order in which they are listed in the SELECT clause. Thus, in this example, the columns of the result table shown in Table 3.7 are listed in the order Name, Salary and Position.

Table 3.7: Results Table for Query 2

Name Salary Position


Shan Dass 980 Administrator
Tan Haut Lee 1500 Salesperson
Ahmad Zulkifli 3000 Manager
Lim Kim Hock 2600 Assistant Manager
Lina Hassan 750 Clerk
Mohd Firdaus 600 Clerk

Example 3.3: Use of DISTINCT

The keyword DISTINCT is used in the SELECT clause for retrieving non-
duplicate data from a column or columns.

Query 3: Display a list of positions that are recorded in the Employee table.

This query can be written as follows and the result is as shown in Table 3.8:

SELECT Position
FROM Employee;

Table 3.8: Result Table for Query 3 without DISTINCT Keyword

Position
Administrator
Salesperson
Manager
Assistant Manager
Clerk
Clerk


The result above contains a duplicate: Clerk is listed twice. What if we want each position to appear only once? This is easy to accomplish in SQL: we simply use the DISTINCT keyword after SELECT.

The syntax is as follows:

SELECT DISTINCT column_name
FROM table_name;

Therefore, we rewrite the query as:

SELECT DISTINCT Position
FROM Employee;

With the statement above, the duplicate is eliminated and we get the result
table as shown in Table 3.9.

Table 3.9: Result Table for Query 3 with DISTINCT Keyword

Position
Administrator
Salesperson
Manager
Assistant Manager
Clerk
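The effect of DISTINCT can be counted directly. The sketch below assumes Python's built-in sqlite3 module as the DBMS and loads the six Employee positions from Table 3.6. It adds ORDER BY only to make the output order predictable, because DISTINCT by itself does not promise any particular row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Name TEXT, Position TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("E1708", "Shan Dass", "Administrator"),
    ("E1214", "Tan Haut Lee", "Salesperson"),
    ("E1090", "Ahmad Zulkifli", "Manager"),
    ("E3211", "Lim Kim Hock", "Assistant Manager"),
    ("E4500", "Lina Hassan", "Clerk"),
    ("E5523", "Mohd Firdaus", "Clerk"),
])

all_positions = [r[0] for r in conn.execute("SELECT Position FROM Employee")]
distinct_positions = [r[0] for r in conn.execute(
    "SELECT DISTINCT Position FROM Employee ORDER BY Position")]

print(len(all_positions), len(distinct_positions))  # 6 5
print(distinct_positions)
# ['Administrator', 'Assistant Manager', 'Clerk', 'Manager', 'Salesperson']
```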

(b) Row Selection (WHERE Clause)


In our prior examples of SELECT statements, we retrieved all rows of the specified columns from a table. To select only some rows, that is, to specify a selection criterion, we use the WHERE clause. The WHERE clause filters the rows of the table(s) listed in the FROM clause. If the WHERE clause is omitted, all rows are used.


There are five basic search conditions that can be used in a query as shown
in Table 3.10.

Table 3.10: Five Search Conditions in a Query

Search Condition Description
Comparison Compares the value of an expression to the value of another expression
Range Tests whether the value of an expression falls within a specified range of values
Set membership Tests whether a value matches any value in a set of values
Pattern match Tests whether a string matches a specified pattern
Null Tests a column for a null (unknown) value

Source: Connolly & Begg (2009)

Each type of these search conditions will be presented in this section.

Example 3.4: Comparison search condition

Query 4: List all employees with a salary greater than RM1000.

SELECT EmpNo, Name, TelNo, Position, Salary
FROM Employee
WHERE Salary > 1000;

This statement keeps only the rows in which the salary is greater than 1000. The result returned by this statement is shown in Table 3.11.

Table 3.11: Result Table for Example 3.4

EmpNo Name TelNo Position Salary
E1214 Tan Haut Lee 017-6697123 Salesperson 1500
E1090 Ahmad Zulkifli 013-6710899 Manager 3000
E3211 Lim Kim Hock 017-5667110 Assistant Manager 2600


Figure 3.1 shows a list of comparison operators that can be used in the
WHERE clause. In addition, a more complex condition can be generated
using the logical operators AND, OR and NOT.

Operator Description
= Equal
<> or != Not equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal

Figure 3.1: Comparison operators

Example 3.5: Compound comparison search condition

Query 5: List all employees whose position is as Clerk or Salesperson.

SELECT EmpNo, Name, TelNo, Position, Gender
FROM Employee
WHERE Position = 'Clerk' OR Position = 'Salesperson';

This statement uses the logical operator OR in the WHERE clause to find employees whose position is Clerk or Salesperson. Table 3.12 shows the result of executing this statement.

Table 3.12: Result Table for Example 3.5

EmpNo Name TelNo Position Gender


E1214 Tan Haut Lee 017-6697123 Salesperson M
E4500 Lina Hassan 012-6678190 Clerk F
E5523 Mohd Firdaus 013-3506711 Clerk M

Example 3.6: Range search condition (BETWEEN)

Query 6: Find employees with a salary between RM1000 and RM3000.

SELECT EmpNo, Name, TelNo, Position, Salary
FROM Employee
WHERE Salary BETWEEN 1000 AND 3000;

The BETWEEN keyword, used in the statement above, offers a convenient way to define a range in the WHERE clause. The BETWEEN test is inclusive: it includes both endpoints of the range.

So, in this example, the condition in the WHERE clause can also be written as:

WHERE Salary >= 1000 AND Salary <= 3000;

Both versions return the result shown in Table 3.13.

Table 3.13: Result Table for Example 3.6

EmpNo Name TelNo Position Salary


E1214 Tan Haut Lee 017-6697123 Salesperson 1500
E1090 Ahmad Zulkifli 013-6710899 Manager 3000
E3211 Lim Kim Hock 017-5667110 Assistant Manager 2600
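The claim that BETWEEN includes its endpoints can be checked against the salary data. This sketch assumes Python's built-in sqlite3 module as the DBMS; `emp_nos` is a hypothetical helper, and it pastes its condition into the SQL text, which is acceptable only for fixed strings written by the programmer, never for user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [
    ("E1708", 980), ("E1214", 1500), ("E1090", 3000),
    ("E3211", 2600), ("E4500", 750), ("E5523", 600),
])


def emp_nos(where):
    """EmpNo values satisfying a fixed, trusted WHERE condition, sorted."""
    sql = f"SELECT EmpNo FROM Employee WHERE {where} ORDER BY EmpNo"
    return [row[0] for row in conn.execute(sql)]


between = emp_nos("Salary BETWEEN 1000 AND 3000")
explicit = emp_nos("Salary >= 1000 AND Salary <= 3000")
exclusive = emp_nos("Salary > 1000 AND Salary < 3000")  # drops the endpoint 3000

print(between)              # ['E1090', 'E1214', 'E3211']
print(between == explicit)  # True
print(exclusive)            # ['E1214', 'E3211']
```

E1090 earns exactly 3000, so it appears with BETWEEN but disappears once strict comparisons are used.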

Example 3.7: Set membership search condition (IN/NOT IN)

Query 7: List all salespersons and clerks.

SELECT EmpNo, Name, TelNo, Position, Gender
FROM Employee
WHERE Position IN ('Clerk', 'Salesperson');

The set membership condition (IN) tests whether a value matches any value in a set of values. In this query, it finds rows in the Employee table whose position is Clerk or Salesperson. The statement returns the result shown in Table 3.14, which is identical to the result of Example 3.5.

Table 3.14: Results Table for Example 3.7

EmpNo Name TelNo Position Gender


E1214 Tan Haut Lee 017-6697123 Salesperson M
E4500 Lina Hassan 012-6678190 Clerk F
E5523 Mohd Firdaus 013-3506711 Clerk M


The negated form, NOT IN, returns the rows whose value does not match any value in the list. For instance, the following query finds employees who are neither clerks nor salespersons; its result is shown in Table 3.15.

SELECT EmpNo, Name, TelNo, Position, Gender
FROM Employee
WHERE Position NOT IN ('Clerk', 'Salesperson');

Table 3.15: Result Table for a Query Using NOT IN Keyword

EmpNo Name TelNo Position Gender
E1708 Shan Dass 012-5463344 Administrator F
E1090 Ahmad Zulkifli 013-6710899 Manager M
E3211 Lim Kim Hock 017-5667110 Assistant Manager M
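The equivalence between IN and a chain of OR comparisons, and the complementary role of NOT IN, can be confirmed on the Employee positions. A minimal sketch, assuming Python's built-in sqlite3 module as the DBMS; `emp_nos` is a hypothetical helper used only with fixed condition strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Position TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [
    ("E1708", "Administrator"), ("E1214", "Salesperson"), ("E1090", "Manager"),
    ("E3211", "Assistant Manager"), ("E4500", "Clerk"), ("E5523", "Clerk"),
])


def emp_nos(where):
    # Only fixed, trusted condition strings are interpolated here.
    return [row[0] for row in conn.execute(
        f"SELECT EmpNo FROM Employee WHERE {where} ORDER BY EmpNo")]


with_or = emp_nos("Position = 'Clerk' OR Position = 'Salesperson'")
with_in = emp_nos("Position IN ('Clerk', 'Salesperson')")
not_in = emp_nos("Position NOT IN ('Clerk', 'Salesperson')")

print(with_or == with_in)  # True
print(with_in)             # ['E1214', 'E4500', 'E5523']
print(not_in)              # ['E1090', 'E1708', 'E3211']
```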

(c) Use of LIKE


In our earlier examples, we have looked at conditions that involve exact matches. However, in some cases, exact matches will not work. For example, you might know only part of the desired value, such as its first few characters. In such cases, you use the LIKE operator with a wildcard symbol, as shown in Figure 3.2.

Wildcard Symbol Description
% (percent sign) Any sequence of zero or more characters
_ (underscore) Any single character

Figure 3.2: Wildcard symbol

Example 3.8: Pattern match search condition (LIKE)

Query 8: Find all employees who have Celcom prepaid numbers. In other
words, their hand phone numbers must start with 013.

SELECT EmpNo, Name, TelNo
FROM Employee
WHERE TelNo LIKE '013%';

This statement lists all employees whose phone numbers start with 013, regardless of which characters follow.

The result of executing this statement is shown in Table 3.16.

Table 3.16: Result Table from Example 3.8

EmpNo Name TelNo


E1090 Ahmad Zulkifli 013-6710899
E5523 Mohd Firdaus 013-3506711
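The two wildcards behave differently: % absorbs any run of characters, while _ stands for exactly one. The sketch below assumes Python's built-in sqlite3 module as the DBMS; the patterns "%99" and "01_-66%" are illustrative additions to the module's '013%', and each pattern is passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Name TEXT, TelNo TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("E1708", "Shan Dass", "012-5463344"),
    ("E1214", "Tan Haut Lee", "017-6697123"),
    ("E1090", "Ahmad Zulkifli", "013-6710899"),
    ("E3211", "Lim Kim Hock", "017-5667110"),
    ("E4500", "Lina Hassan", "012-6678190"),
    ("E5523", "Mohd Firdaus", "013-3506711"),
])


def names_matching(pattern):
    """Names of employees whose TelNo matches a LIKE pattern (bound as a parameter)."""
    return [row[0] for row in conn.execute(
        "SELECT Name FROM Employee WHERE TelNo LIKE ? ORDER BY Name", (pattern,))]


starts_013 = names_matching("013%")        # % : zero or more characters
ends_99 = names_matching("%99")            # anything, then 99 at the end
any_prefix_66 = names_matching("01_-66%")  # _ : exactly one character

print(starts_013)     # ['Ahmad Zulkifli', 'Mohd Firdaus']
print(ends_99)        # ['Ahmad Zulkifli']
print(any_prefix_66)  # ['Lina Hassan', 'Tan Haut Lee']
```

Note that DBMSs differ on whether LIKE ignores letter case; SQLite, for instance, ignores case for ASCII letters by default. That does not affect these all-digit patterns.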

SELF-CHECK 3.3
1. What is the purpose of using the SELECT statement?
2. Explain the function of each of the clauses in the SELECT
statement.
3. By referring to Table Employee, write the SELECT statement to:
(a) Display the names of all employees;
(b) Display the names of the employees whose salary is less
than 1000; and
(c) Display the names of the salespersons and their salary.

3.3.2 Sorting Results


Now, we will look at how to sort the rows of the result table using the ORDER
BY clause.

Example 3.9: Single column ordering

Query 9: List salaries for all employees, arranged in ascending order of salary.

SELECT EmpNo, Name, TelNo, Position, Salary
FROM Employee
ORDER BY Salary;

Ascending order is the default, so the keyword ASC may be omitted.


If you want to sort the list in descending order, the word DESC must be specified
in the ORDER BY clause after the column name, as seen here:

SELECT EmpNo, Name, TelNo, Position, Salary
FROM Employee
ORDER BY Salary DESC;

Executing these two statements produces Table 3.17 (ascending order) and Table 3.18 (descending order), respectively.

Table 3.17: Result Table for Sorting Salary in Ascending Order

EmpNo Name TelNo Position Salary


E5523 Mohd Firdaus 013-3506711 Clerk 600
E4500 Lina Hassan 012-6678190 Clerk 750
E1708 Shan Dass 012-5463344 Administrator 980
E1214 Tan Haut Lee 017-6697123 Salesperson 1500
E3211 Lim Kim Hock 017-5667110 Assistant Manager 2600
E1090 Ahmad Zulkifli 013-6710899 Manager 3000

Table 3.18: Result Table for Sorting Salary in Descending Order

EmpNo Name TelNo Position Salary


E1090 Ahmad Zulkifli 013-6710899 Manager 3000
E3211 Lim Kim Hock 017-5667110 Assistant Manager 2600
E1214 Tan Haut Lee 017-6697123 Salesperson 1500
E1708 Shan Dass 012-5463344 Administrator 980
E4500 Lina Hassan 012-6678190 Clerk 750
E5523 Mohd Firdaus 013-3506711 Clerk 600


Example 3.10: Multicolumn ordering

Query 10: List the employees sorted by position and in each position sort the list
in descending order by salary.

This query requires using two sort keys. The Position is the primary sort key and
the Salary is the secondary or minor sort key. The primary sort key has to be
written first in the list and followed by minor keys. You may have more than one
minor key.

SELECT EmpNo, Name, TelNo, Position, Salary
FROM Employee
ORDER BY Position, Salary DESC;

This statement will provide us with the table as shown in Table 3.19.

Table 3.19: Result Table for Example 3.10

EmpNo Name TelNo Position Salary


E1708 Shan Dass 012-5463344 Administrator 980
E3211 Lim Kim Hock 017-5667110 Assistant Manager 2600
E4500 Lina Hassan 012-6678190 Clerk 750
E5523 Mohd Firdaus 013-3506711 Clerk 600
E1090 Ahmad Zulkifli 013-6710899 Manager 3000
E1214 Tan Haut Lee 017-6697123 Salesperson 1500
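Single-key and multi-key sorting can be compared side by side in a short sketch, assuming Python's built-in sqlite3 module as the DBMS. The last query reproduces the row order of Table 3.19: Position decides the order first, and Salary DESC only breaks ties between the two clerks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Position TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("E1708", "Administrator", 980), ("E1214", "Salesperson", 1500),
    ("E1090", "Manager", 3000), ("E3211", "Assistant Manager", 2600),
    ("E4500", "Clerk", 750), ("E5523", "Clerk", 600),
])

ascending = [r[0] for r in conn.execute("SELECT Salary FROM Employee ORDER BY Salary")]
descending = [r[0] for r in conn.execute("SELECT Salary FROM Employee ORDER BY Salary DESC")]

# Position is the major sort key; Salary DESC only breaks ties within a position.
by_position = conn.execute(
    "SELECT EmpNo, Position, Salary FROM Employee ORDER BY Position, Salary DESC"
).fetchall()

print(ascending)   # [600, 750, 980, 1500, 2600, 3000]
print(descending)  # [3000, 2600, 1500, 980, 750, 600]
print([row[0] for row in by_position])
# ['E1708', 'E3211', 'E4500', 'E5523', 'E1090', 'E1214']
```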

SELF-CHECK 3.4
1. Write the SELECT statement to display all the information about the employees, sorted by employee name in descending order.

2. Write the SELECT statement to display the employees sorted by position and, within each position, in ascending order of salary.


3.3.3 Using the SQL Aggregate Functions


In this subtopic, we are going to discuss five important aggregate functions as
shown in Table 3.20. They are called aggregate functions because they summarise
the results of a query, rather than listing all of the rows.

Table 3.20: Five Aggregate Functions

Aggregate Function Description


COUNT Gives the number of rows satisfying the conditions
SUM Gives the total of the values in a specified column
AVG Gives the average of a specified column
MIN Gives the smallest value in a specified column
MAX Gives the largest value in a specified column

In this subtopic, we use Product and Delivery tables shown in Table 3.21 and
Table 3.22 to illustrate the use of these aggregate functions.

Table 3.21: Product

ProductNo Name UnitPrice QtyOnHand ReorderLevel SuppNo
P2344 17 inch Monitor 200 20 15 S8843
P2346 19 inch Monitor 250 15 10 S8843
P4590 Laser Printer 650 5 10 S9888
P5443 Colour Laser Printer 750 8 5 S9898
P6677 Colour Scanner 350 15 10 S9995

Table 3.22: Delivery

DeliveryNo DeliveryDate OrderNo ProductNo EmpNo


D5505 27-Jan-2013 1120 P4590 E5523
D5600 28-Jan-2013 1120 P6677 E4500
D5601 28-Jan-2013 1120 P2344 E4500
D5650 23-Feb-2013 4399 P2344 E5523
D5651 23-Feb-2013 4399 P5443 E5523
D5700 20-Apr-2013 6234 P2346 E4500
D5710 08-May-2013 9503 P2344 E4500


Example 3.11: Use of COUNT(*)

Query 11: Find the number of products supplied by supplier number S8843.

SELECT COUNT(*) AS NumOfProduct
FROM Product
WHERE SuppNo = 'S8843';

This statement counts only the products supplied by the supplier with supplier number S8843. In Example 3.11, the statement returns 2, as shown in Table 3.23.

Table 3.23: Result Table from Example 3.11

NumOfProduct
2

Example 3.12: Use of COUNT(DISTINCT)

Query 12: How many different products were delivered from January to April in
the year 2013?

SELECT COUNT(DISTINCT ProductNo) AS NumOfProduct
FROM Delivery
WHERE DeliveryDate BETWEEN '1-Jan-2013' AND '30-Apr-2013';

The keyword DISTINCT eliminates duplicate product numbers, so a product delivered more than once during this period is counted only once. The result returned by this statement is shown in Table 3.24. (The exact format accepted for date literals varies between DBMSs.)

Table 3.24: Result Table from Example 3.12

NumOfProduct
5


Example 3.13: Use of COUNT and SUM

Query 13: Count the number of products priced at less than RM500 per unit and find the total of their QtyOnHand.

SELECT COUNT(ProductNo) AS NumOfProduct, SUM(QtyOnHand) AS TotalStock
FROM Product
WHERE UnitPrice < 500;

This statement counts the products whose unit price is less than 500 and sums their QtyOnHand values. The result is shown in Table 3.25.

Table 3.25: Result Table for Example 3.13

NumOfProduct TotalStock
3 50

Example 3.14: Use of MIN, MAX, AVG

Query 14: Find the minimum, maximum and average price per unit.

SELECT MIN(UnitPrice) AS Minimum, MAX(UnitPrice) AS Maximum,
AVG(UnitPrice) AS Average
FROM Product;

The result table for the above statement is shown in Table 3.26.

Table 3.26: Result Table for Example 3.14

Minimum Maximum Average


200 750 440
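Examples 3.11 to 3.14 can be re-run against the data in Tables 3.21 and 3.22 to confirm the figures in Tables 3.23 to 3.26. This sketch assumes Python's built-in sqlite3 module as the DBMS. Because SQLite has no separate DATE type, the delivery dates are stored as ISO-8601 text (an assumption that differs from the DD-Mon-YYYY display in Table 3.22), which makes BETWEEN compare them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Product (
    ProductNo TEXT PRIMARY KEY, UnitPrice INTEGER, QtyOnHand INTEGER, SuppNo TEXT)""")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("P2344", 200, 20, "S8843"), ("P2346", 250, 15, "S8843"),
    ("P4590", 650, 5, "S9888"), ("P5443", 750, 8, "S9898"),
    ("P6677", 350, 15, "S9995"),
])
# ISO-8601 date strings sort (and therefore compare) in calendar order.
conn.execute("CREATE TABLE Delivery (DeliveryNo TEXT PRIMARY KEY, DeliveryDate TEXT, ProductNo TEXT)")
conn.executemany("INSERT INTO Delivery VALUES (?, ?, ?)", [
    ("D5505", "2013-01-27", "P4590"), ("D5600", "2013-01-28", "P6677"),
    ("D5601", "2013-01-28", "P2344"), ("D5650", "2013-02-23", "P2344"),
    ("D5651", "2013-02-23", "P5443"), ("D5700", "2013-04-20", "P2346"),
    ("D5710", "2013-05-08", "P2344"),
])

period = "DeliveryDate BETWEEN '2013-01-01' AND '2013-04-30'"
q11 = conn.execute("SELECT COUNT(*) FROM Product WHERE SuppNo = 'S8843'").fetchone()
q12 = conn.execute(f"SELECT COUNT(DISTINCT ProductNo) FROM Delivery WHERE {period}").fetchone()
q12_all = conn.execute(f"SELECT COUNT(ProductNo) FROM Delivery WHERE {period}").fetchone()
q13 = conn.execute(
    "SELECT COUNT(ProductNo), SUM(QtyOnHand) FROM Product WHERE UnitPrice < 500").fetchone()
q14 = conn.execute(
    "SELECT MIN(UnitPrice), MAX(UnitPrice), AVG(UnitPrice) FROM Product").fetchone()

print(q11, q12, q12_all)  # (2,) (5,) (6,)
print(q13, q14)           # (3, 50) (200, 750, 440.0)
```

Without DISTINCT, P2344 is counted twice within the period, which is why the plain COUNT gives 6 rather than 5.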

3.3.4 Grouping Results


In the previous examples of aggregate functions (COUNT, SUM, MIN, MAX and AVG), we have seen that the output of these functions is summarised into a single row of the result table. Sometimes, however, we want these summaries grouped according to the values of a specified column. This grouping can be done by using the GROUP BY clause.


(a) GROUP BY clause


Example 3.15 illustrates the use of the GROUP BY clause. Note that a column can appear in the SELECT list only if it also appears in the GROUP BY clause or is used inside an aggregate function.

Example 3.15: Use of GROUP BY

Query 15: Find the number of products supplied by each supplier.

SELECT SuppNo, COUNT(ProductNo) AS NumOfProduct
FROM Product
GROUP BY SuppNo;

This statement groups the rows of Product by SuppNo and uses the aggregate function COUNT to find the number of products for each supplier. The result table is shown in Table 3.27.

Table 3.27: Result Table for Example 3.15

SuppNo NumOfProduct
S8843 2
S9888 1
S9898 1
S9995 1

(b) Restricting Grouping (HAVING Clause)


Just as the WHERE clause filters rows, the HAVING clause filters the groups formed by the GROUP BY clause. For this reason, HAVING is normally used together with GROUP BY. Typically, the condition in the HAVING clause uses an aggregate function.

Example 3.16: Use of HAVING

Query 16: Find the order numbers of orders that have more than one product delivered.

SELECT OrderNo, COUNT(ProductNo) AS NumOfProduct
FROM Delivery
GROUP BY OrderNo
HAVING COUNT(ProductNo) > 1;


This operation groups the Delivery data based on OrderNo and lists only
those groups that have more than one product. The output of this operation
is shown in Table 3.28.

Table 3.28: Result table in Example 3.16

OrderNo NumOfProduct
1120 3
4399 2
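Examples 3.15 and 3.16 can be reproduced with a minimal sketch, assuming Python's built-in sqlite3 module as the DBMS and only the columns the two queries need. ORDER BY is added solely to make the row order predictable; it does not change which groups are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductNo TEXT PRIMARY KEY, SuppNo TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [
    ("P2344", "S8843"), ("P2346", "S8843"), ("P4590", "S9888"),
    ("P5443", "S9898"), ("P6677", "S9995"),
])
conn.execute("CREATE TABLE Delivery (DeliveryNo TEXT PRIMARY KEY, OrderNo INTEGER, ProductNo TEXT)")
conn.executemany("INSERT INTO Delivery VALUES (?, ?, ?)", [
    ("D5505", 1120, "P4590"), ("D5600", 1120, "P6677"), ("D5601", 1120, "P2344"),
    ("D5650", 4399, "P2344"), ("D5651", 4399, "P5443"), ("D5700", 6234, "P2346"),
    ("D5710", 9503, "P2344"),
])

# GROUP BY: one output row per supplier.
per_supplier = conn.execute("""
    SELECT SuppNo, COUNT(ProductNo) AS NumOfProduct
    FROM Product
    GROUP BY SuppNo
    ORDER BY SuppNo""").fetchall()

# HAVING: keep only the groups whose aggregate satisfies the condition.
multi_product_orders = conn.execute("""
    SELECT OrderNo, COUNT(ProductNo) AS NumOfProduct
    FROM Delivery
    GROUP BY OrderNo
    HAVING COUNT(ProductNo) > 1
    ORDER BY OrderNo""").fetchall()

print(per_supplier)          # [('S8843', 2), ('S9888', 1), ('S9898', 1), ('S9995', 1)]
print(multi_product_orders)  # [(1120, 3), (4399, 2)]
```

Orders 6234 and 9503 form groups too, but each has only one delivered product, so HAVING removes them.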

3.3.5 Subqueries
In this subtopic, you will learn how to use subqueries. The examples here place one SELECT statement inside another SELECT statement, which is sometimes referred to as a nested SELECT. In terms of the order of execution, the inner SELECT is performed first and its result is used in the filter condition of the outer SELECT statement.

(a) Use of Subqueries


Examples 3.17 and 3.18 illustrate the use of subqueries.

Example 3.17: Using a subquery with equality

Query 17: List the names and unit prices of the products that are supplied by ABX Technics.

SELECT Name AS ProductNames, UnitPrice
FROM Product
WHERE SuppNo = (SELECT SuppNo
                FROM Supplier
                WHERE Name = 'ABX Technics');

First, the inner SELECT statement is executed to get the supplier number of ABX Technics. Its output is then used in the search condition in the WHERE clause of the outer SELECT statement. Note that the "=" sign can be used in the WHERE clause of the outer SELECT because the inner SELECT returns only one value.


The final result table from this query is shown in Table 3.29.

Table 3.29: Result table in Example 3.17


ProductNames UnitPrice
17 inch Monitor 200
19 inch Monitor 250

Example 3.18: Nested subquery: Use of IN

List the supplier number, product names and its price per unit for products
that are supplied by the supplier from Petaling Jaya.

SELECT SuppNo AS SupplierNo, Names AS ProductName, UnitPrice


FROM Product
WHERE SuppNo IN { SELECT SuppNo
FROM Supplier
WHERE City = „Petaling Jaya‰};

This time, the inner SELECT statement can return more than one value, because
more than one supplier may be located in Petaling Jaya. Therefore, the IN
keyword, rather than '=', is used in the search condition of the WHERE clause
in the outer SELECT statement. The result table for the above statement is
shown in Table 3.30.

Table 3.30: Result Table for Example 3.18

SupplierNo ProductName UnitPrice
S9898 Colour Laser Printer 200
S9990 Colour Scanner 250
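A similar sqlite3 sketch, with hypothetical sample rows built from Tables 3.31 and 3.34 (the ProductNo values and the Laser Printer price are made up), shows why IN is needed: the inner SELECT on its own returns two supplier numbers, one for each Petaling Jaya supplier. With these particular sample rows only one of those suppliers has a product, so a single row comes back.

```python
import sqlite3

# In-memory database with hypothetical sample rows (see the note above)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (SuppNo CHAR(5) PRIMARY KEY, Name VARCHAR(20), City VARCHAR(15));
    CREATE TABLE Product (ProductNo CHAR(4) PRIMARY KEY, Name VARCHAR(25),
                          UnitPrice DECIMAL(7,2), SuppNo CHAR(5));
    INSERT INTO Supplier VALUES
        ('S8843', 'ABX Technics', 'Subang Jaya'), ('S9884', 'Soft System', 'Shah Alam'),
        ('S9898', 'ID Computers', 'Petaling Jaya'), ('S9990', 'ITN Suppliers', 'Subang Jaya'),
        ('S9995', 'FAST Delivery', 'Petaling Jaya');
    INSERT INTO Product VALUES
        ('P001', '17 inch Monitor', 200, 'S8843'), ('P002', '19 inch Monitor', 250, 'S8843'),
        ('P003', 'Laser Printer', 300, 'S9884'), ('P004', 'Colour Laser Printer', 200, 'S9898'),
        ('P005', 'Colour Scanner', 250, 'S9990');
""")

# The inner query alone returns more than one value, so '=' would be inappropriate
inner = conn.execute(
    "SELECT SuppNo FROM Supplier WHERE City = 'Petaling Jaya' ORDER BY SuppNo"
).fetchall()
print(inner)  # [('S9898',), ('S9995',)]

# IN tests SuppNo against every value the subquery returns
rows = conn.execute("""
    SELECT SuppNo AS SupplierNo, Name AS ProductName, UnitPrice
    FROM Product
    WHERE SuppNo IN (SELECT SuppNo
                     FROM Supplier
                     WHERE City = 'Petaling Jaya')
""").fetchall()
print(rows)  # [('S9898', 'Colour Laser Printer', 200)]
```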

(b) Multitable Queries

So far, we have learnt how to retrieve data from only one table using the
SELECT statement. Sometimes, however, we need results that contain columns
from more than one table. In this case, we perform a join operation to
combine these columns into one result table. To perform a join, we list the
tables to be used in the FROM clause, and we write the join condition, which
specifies the matching (common) column(s) of the tables to be joined, in the
WHERE clause. Examples 3.19, 3.20 and 3.21 illustrate how to join tables.

Example 3.19: Simple join

Query 19: List the supplier names for each product.

SELECT p.Name AS ProductName, s.Name AS SupplierName
FROM Product p, Supplier s
WHERE s.SuppNo = p.SuppNo;

This statement joins the Product and Supplier tables. Since SuppNo is the
column common to both tables, it is used in the join condition in the WHERE
clause. The output of this simple join is shown in Table 3.31.

Table 3.31: Result Table for Example 3.19

ProductName SupplierName
17 inch Monitor ABX Technics
19 inch Monitor ABX Technics
Laser Printer Soft System
Colour Laser Printer ID Computers
Colour Scanner ITN Suppliers
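The role of the join condition can be checked with a small sqlite3 sketch (Python standard library; the sample rows are hypothetical and built from Tables 3.31 and 3.34). Without the WHERE condition, listing two tables in the FROM clause pairs every product with every supplier, giving a Cartesian product of 5 x 5 = 25 rows; the join condition keeps only the pairs whose SuppNo values match.

```python
import sqlite3

# In-memory database with hypothetical sample rows (see the note above)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (SuppNo CHAR(5) PRIMARY KEY, Name VARCHAR(20));
    CREATE TABLE Product (ProductNo CHAR(4) PRIMARY KEY, Name VARCHAR(25), SuppNo CHAR(5));
    INSERT INTO Supplier VALUES
        ('S8843', 'ABX Technics'), ('S9884', 'Soft System'), ('S9898', 'ID Computers'),
        ('S9990', 'ITN Suppliers'), ('S9995', 'FAST Delivery');
    INSERT INTO Product VALUES
        ('P001', '17 inch Monitor', 'S8843'), ('P002', '19 inch Monitor', 'S8843'),
        ('P003', 'Laser Printer', 'S9884'), ('P004', 'Colour Laser Printer', 'S9898'),
        ('P005', 'Colour Scanner', 'S9990');
""")

# No join condition: every product is paired with every supplier
cross = conn.execute("SELECT p.Name, s.Name FROM Product p, Supplier s").fetchall()

# The join condition on the common column keeps only the matching pairs
joined = conn.execute("""
    SELECT p.Name AS ProductName, s.Name AS SupplierName
    FROM Product p, Supplier s
    WHERE s.SuppNo = p.SuppNo
""").fetchall()
print(len(cross), len(joined))  # 25 5
```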

Example 3.20: Sorting a join

Query 20: Sort the list of products by supplier name and, for each supplier,
sort the products by product name in descending order.

SELECT p.Name AS ProductName, s.Name AS SupplierName
FROM Product p, Supplier s
WHERE s.SuppNo = p.SuppNo
ORDER BY s.Name, p.Name DESC;

This statement is similar to the previous example, except that it includes the
ORDER BY clause for sorting. The result is sorted in ascending order of
supplier name and, for suppliers that have more than one product, by product
name in descending order (refer to Table 3.32).

Table 3.32: Result Table for Example 3.20

ProductName SupplierName
19 inch Monitor ABX Technics
17 inch Monitor ABX Technics
Colour Laser Printer ID Computers
Colour Scanner ITN Suppliers
Laser Printer Soft System
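Running Example 3.20 in SQLite from Python, on hypothetical sample rows built from Tables 3.31 and 3.34, makes the two sort keys visible: rows are ordered by supplier name first, and DESC applies only to p.Name, so within ABX Technics the 19 inch Monitor comes before the 17 inch Monitor.

```python
import sqlite3

# In-memory database with hypothetical sample rows (see the note above)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (SuppNo CHAR(5) PRIMARY KEY, Name VARCHAR(20));
    CREATE TABLE Product (ProductNo CHAR(4) PRIMARY KEY, Name VARCHAR(25), SuppNo CHAR(5));
    INSERT INTO Supplier VALUES
        ('S8843', 'ABX Technics'), ('S9884', 'Soft System'), ('S9898', 'ID Computers'),
        ('S9990', 'ITN Suppliers'), ('S9995', 'FAST Delivery');
    INSERT INTO Product VALUES
        ('P001', '17 inch Monitor', 'S8843'), ('P002', '19 inch Monitor', 'S8843'),
        ('P003', 'Laser Printer', 'S9884'), ('P004', 'Colour Laser Printer', 'S9898'),
        ('P005', 'Colour Scanner', 'S9990');
""")

# First sort key ascending (the default); second sort key descending
rows = conn.execute("""
    SELECT p.Name AS ProductName, s.Name AS SupplierName
    FROM Product p, Supplier s
    WHERE s.SuppNo = p.SuppNo
    ORDER BY s.Name, p.Name DESC
""").fetchall()
for product, supplier in rows:
    print(product, "|", supplier)
```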

Example 3.21: Three table join

Query 21: Find the supplier names and product names for products that were
delivered in January 2013. Sort the list by supplier name.

SELECT s.Name AS SupplierName, p.Name AS ProductName, d.DeliveryDate
FROM Supplier s, Product p, Delivery d
WHERE s.SuppNo = p.SuppNo AND p.ProductNo = d.ProductNo AND
      (d.DeliveryDate >= '1-Jan-13' AND d.DeliveryDate <= '31-Jan-13')
ORDER BY s.Name;

This query requires us to join three tables, and all the join conditions are
listed in the WHERE clause. As noted earlier, the column common to each pair
of tables being joined is used in the join condition: the supplier number
joins the Supplier and Product tables, and the product number joins the
Product and Delivery tables. The result from this join is shown in
Table 3.33.

Table 3.33: Result Table for Example 3.21

SupplierName ProductName DeliveryDate


ABX Technics 17 inch Monitor 27-Jan-2013
ITN Suppliers Colour Scanner 28-Jan-2013
Soft System Laser Printer 27-Jan-2013
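A sqlite3 sketch of the three-table join follows, with hypothetical sample rows: suppliers and products follow Tables 3.31 and 3.34, while the Delivery rows (including an extra February delivery that the date condition should exclude) are made up. SQLite has no separate date storage class, so the dates are stored as ISO 'YYYY-MM-DD' text, which compares correctly as strings; the '1-Jan-13' style literals in the query above rely on the DBMS converting them to dates.

```python
import sqlite3

# In-memory database with hypothetical sample rows (see the note above)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (SuppNo CHAR(5) PRIMARY KEY, Name VARCHAR(20));
    CREATE TABLE Product (ProductNo CHAR(4) PRIMARY KEY, Name VARCHAR(25), SuppNo CHAR(5));
    CREATE TABLE Delivery (DeliveryNo CHAR(3) PRIMARY KEY, ProductNo CHAR(4),
                           DeliveryDate DATE);
    INSERT INTO Supplier VALUES
        ('S8843', 'ABX Technics'), ('S9884', 'Soft System'), ('S9898', 'ID Computers'),
        ('S9990', 'ITN Suppliers'), ('S9995', 'FAST Delivery');
    INSERT INTO Product VALUES
        ('P001', '17 inch Monitor', 'S8843'), ('P002', '19 inch Monitor', 'S8843'),
        ('P003', 'Laser Printer', 'S9884'), ('P004', 'Colour Laser Printer', 'S9898'),
        ('P005', 'Colour Scanner', 'S9990');
    INSERT INTO Delivery VALUES
        ('D01', 'P001', '2013-01-27'), ('D02', 'P005', '2013-01-28'),
        ('D03', 'P003', '2013-01-27'), ('D04', 'P002', '2013-02-05');
""")

# Two join conditions (Supplier-Product, Product-Delivery) plus a date range
rows = conn.execute("""
    SELECT s.Name AS SupplierName, p.Name AS ProductName, d.DeliveryDate
    FROM Supplier s, Product p, Delivery d
    WHERE s.SuppNo = p.SuppNo AND p.ProductNo = d.ProductNo
      AND (d.DeliveryDate >= '2013-01-01' AND d.DeliveryDate <= '2013-01-31')
    ORDER BY s.Name
""").fetchall()
for row in rows:
    print(row)
```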

3.4 DATABASE UPDATES


The contents of existing tables can be changed in various ways. The simplest is
to update one or more attribute values within a row; UPDATE statements use
WHERE clauses to identify which row(s) to update.

In this subtopic, you will learn the SQL commands used to modify the contents
of a table in a database. The three most commonly used commands, INSERT,
UPDATE and DELETE, are shown in Figure 3.3.

Figure 3.3: Three SQL commands commonly used

3.4.1 INSERT
INSERT is used to add new records (rows) to an existing database table. The
syntax of the INSERT command is as follows:

INSERT INTO TableName [(columnList)]
VALUES (dataValueList)

(a) columnList is optional; if it is omitted, SQL assumes a list of all the
columns in the order in which they were specified when the table was
created;

(b) Any column omitted from the list must have been declared as allowing NULL
when the table was created, unless a DEFAULT was specified for that column;
and

(c) dataValueList must match columnList as follows:

(i) Both lists must have the same number of items;

(ii) The items in the two lists must correspond directly by position; and

(iii) The data type of each item in dataValueList must be compatible with
the data type of the corresponding column.
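Rules (a) to (c) can be observed in a short sketch using Python's sqlite3 module. The cut-down Supplier table, its DEFAULT city and the supplier numbers S9998 and S9999 are hypothetical. SQLite uses flexible type affinity rather than strict type checking, so rule (c)(iii) is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down Supplier table: Name is mandatory, City has a DEFAULT
conn.execute("""
    CREATE TABLE Supplier (
        SupNo CHAR(5) NOT NULL PRIMARY KEY,
        Name VARCHAR(20) NOT NULL,
        City VARCHAR(15) DEFAULT 'Kuala Lumpur',
        TelNo VARCHAR(10))
""")

# Rule (a): no column list, so the values follow the CREATE TABLE column order
conn.execute("INSERT INTO Supplier VALUES ('S9996', 'NR Tech', 'Kuala Lumpur', '23456677')")

# Rule (b): City is omitted and takes its DEFAULT; TelNo is omitted and becomes NULL
conn.execute("INSERT INTO Supplier (SupNo, Name) VALUES ('S9997', 'Total System')")

# Rule (b) broken: Name is NOT NULL and has no DEFAULT, so the row is rejected
not_null_error = None
try:
    conn.execute("INSERT INTO Supplier (SupNo) VALUES ('S9998')")
except sqlite3.IntegrityError as exc:
    not_null_error = str(exc)

# Rule (c)(i) broken: two columns listed but only one value supplied
count_error = None
try:
    conn.execute("INSERT INTO Supplier (SupNo, Name) VALUES ('S9999')")
except sqlite3.OperationalError as exc:
    count_error = str(exc)

rows = conn.execute("SELECT * FROM Supplier ORDER BY SupNo").fetchall()
print(rows)
print(not_null_error)
print(count_error)
```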

In this subtopic, we illustrate variations of the INSERT statement using the
Supplier table shown in Table 3.34.

Table 3.34: Supplier Table

SupNo Name Street City Postcode TelNo ContactPerson
S8843 ABX Technics 12, Jalan Subang Subang Jaya 45600 56334532 Teresa Ng
S9884 Soft System 239, Jalan 2/2 Shah Alam 40450 55212233 Fatimah
S9898 ID Computers 70, Jalan Hijau Petaling Jaya 41700 77617709 Larry Wong
S9990 ITN Suppliers 45, Jalan Maju Subang Jaya 45610 56345505 Tang Lee Huat
S9995 FAST Delivery 3, Lahad Lane Petaling Jaya 41760 77553434 Henry

Example 3.22: To add a new row

Query 22: Add a new record as given below to the Supplier table.

Supplier Number: S9996
Supplier Name: NR Tech
Supplier Address: 20 Jalan Selamat, 62000 Kuala Lumpur
Supplier Tel No: 23456677
Contact Person: Nick

This query can be written as:

INSERT INTO Supplier (SupNo, Name, Street, City, Postcode, TelNo,
                      ContactPerson)
VALUES ('S9996', 'NR Tech', '20 Jalan Selamat', 'Kuala Lumpur', 62000,
        23456677, 'Nick');

Since you are inserting values for all the columns in the table, you may omit
the column list, provided the values are listed in the same order as the
columns in the table. Thus, you may also write the SQL statement as:

INSERT INTO Supplier
VALUES ('S9996', 'NR Tech', '20 Jalan Selamat', 'Kuala Lumpur', 62000,
        23456677, 'Nick');

Note that values for non-numeric columns must be enclosed in single quotation
marks, such as 'Kuala Lumpur' for City. Executing either of these statements
gives the result in Table 3.35.

Table 3.35: Result Table for Example 3.22

SupNo Name Street City Postcode TelNo ContactPerson


S8843 ABX Technics 12, Jalan Subang Subang Jaya 45600 56334532 Teresa Ng
S9884 Soft System 239, Jalan 2/2 Shah Alam 40450 55212233 Fatimah
S9898 ID Computers 70, Jalan Hijau Petaling Jaya 41700 77617709 Larry Wong
S9990 ITN Suppliers 45, Jalan Maju Subang Jaya 45610 56345505 Tang Lee Huat
S9995 FAST Delivery 3, Lahad Lane Petaling Jaya 41760 77553434 Henry
S9996 NR Tech 20 Jalan Selamat Kuala Lumpur 62000 23456677 Nick

Example 3.23: Insert a row into a specified column

You may insert a new record that supplies values for only some of the columns
of a table. However, every mandatory column, that is, every column defined as
NOT NULL in the CREATE TABLE statement, must be supplied with a value.

Query 23: Add a new record as given below to the Supplier table.

Supplier Number: S9997
Supplier Name: Total System
Supplier Address: 25 Jalan Tanjung, Kuala Lumpur
Supplier Tel No: 43385667

In this example, the data provided is incomplete: the postcode and contact
person are missing. In this case, you need to list only the columns for which
you are supplying values. Alternatively, you may omit the column list, but you
must then supply NULL explicitly for each column that has no value.

INSERT INTO Supplier (SupNo, Name, Street, City, TelNo)
VALUES ('S9997', 'Total System', '25 Jalan Tanjung', 'Kuala Lumpur',
        43385667);

You may also write it as:

INSERT INTO Supplier
VALUES ('S9997', 'Total System', '25 Jalan Tanjung', 'Kuala Lumpur',
        NULL, 43385667, NULL);

The result of this INSERT operation is given in Table 3.36.

Table 3.36: Result Table for Example 3.23

SupNo Name Street City Postcode TelNo ContactPerson


S8843 ABX Technics 12, Jalan Subang Subang Jaya 45600 56334532 Teresa Ng
S9884 Soft System 239, Jalan 2/2 Shah Alam 40450 55212233 Fatimah
S9898 ID Computers 70, Jalan Hijau Petaling Jaya 41700 77617709 Larry Wong
S9990 ITN Suppliers 45, Jalan Maju Subang Jaya 45610 56345505 Tang Lee Huat
S9995 FAST Delivery 3, Lahad Lane Petaling Jaya 41760 77553434 Henry
S9996 NR Tech 20 Jalan Selamat Kuala Lumpur 62000 23456677 Nick
S9997 Total System 25 Jalan Tanjung Kuala Lumpur 43385667
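The two INSERT forms in Example 3.23 should store the same row. The sketch below checks this with Python's sqlite3 module; the column types are assumptions (the text does not show the Supplier CREATE TABLE statement), and the telephone number follows Table 3.36.

```python
import sqlite3

# Hypothetical column types for the Supplier table
SUPPLIER_DDL = """
    CREATE TABLE Supplier (
        SupNo CHAR(5) NOT NULL PRIMARY KEY, Name VARCHAR(20) NOT NULL,
        Street VARCHAR(20), City VARCHAR(15), Postcode INTEGER,
        TelNo INTEGER, ContactPerson VARCHAR(20))
"""

def insert_and_fetch(insert_sql):
    """Run one INSERT against a fresh copy of the table and return the stored row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SUPPLIER_DDL)
    conn.execute(insert_sql)
    return conn.execute("SELECT * FROM Supplier").fetchone()

# Form 1: list only the columns that have values
with_list = insert_and_fetch(
    "INSERT INTO Supplier (SupNo, Name, Street, City, TelNo) "
    "VALUES ('S9997', 'Total System', '25 Jalan Tanjung', 'Kuala Lumpur', 43385667)")

# Form 2: no column list, so NULL is written out for the missing values
without_list = insert_and_fetch(
    "INSERT INTO Supplier VALUES ('S9997', 'Total System', '25 Jalan Tanjung', "
    "'Kuala Lumpur', NULL, 43385667, NULL)")

print(with_list)
print(with_list == without_list)  # True
```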

3.4.2 UPDATE
The UPDATE statement is used to change the values in existing records. The
WHERE clause determines which records are changed, so it must be constructed
carefully.

The syntax of the UPDATE statement is as follows:

UPDATE TableName
SET columnName1 = dataValue1
[, columnName2 = dataValue2...]
[WHERE searchCondition]

(a) TableName is the name of a table;

(b) SET clause specifies the names of one or more columns that are to be
updated;

(c) WHERE clause is optional;

(d) If omitted, named columns are updated for all rows in the table;

(e) If specified, only those rows that satisfy searchCondition are updated; and

(f) New dataValue(s) must be compatible with the data type of the
corresponding column.
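Points (c) to (e) can be seen in the number of rows each UPDATE touches. In this sketch using Python's sqlite3 module, the Employee rows are cut down to EmpNo, Position and Salary, with the pre-update salaries implied by Table 3.38; Cursor.rowcount reports how many rows a statement changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down Employee table; positions follow Table 3.37
conn.executescript("""
    CREATE TABLE Employee (EmpNo CHAR(5) NOT NULL PRIMARY KEY, Position VARCHAR(20),
                           Salary DECIMAL(7,2));
    INSERT INTO Employee VALUES
        ('E1708', 'Administrator', 980), ('E1214', 'Salesperson', 1500),
        ('E1090', 'Manager', 3000), ('E3211', 'Assistant Manager', 2600),
        ('E4500', 'Clerk', 750), ('E5523', 'Clerk', 600);
""")

# WHERE omitted: the SET clause is applied to every row in the table
all_rows = conn.execute("UPDATE Employee SET Salary = Salary * 1.10").rowcount

# WHERE given: only rows satisfying the search condition change
# ('Assistant Manager' is not equal to 'Manager', so that row is left alone)
managers = conn.execute(
    "UPDATE Employee SET Salary = Salary * 1.05 WHERE Position = 'Manager'").rowcount

print(all_rows, managers)  # 6 1
```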

Let us look at variations in the use of the UPDATE statement for modifying
values in a table.

Example 3.24: Update all rows

Updating may involve modifying a particular column for all records in a table.

Query 24: Give every employee a 10% pay rise.

The UPDATE statement will be as follows:

UPDATE Employee
SET Salary = Salary*1.10;

The result table from this operation is shown in Table 3.37.

Table 3.37: Result Table for Example 3.24

EmpNo Name TelNo Position Gender DOB Salary
E1708 Shan Dass 012-5463344 Administrator F 19-Feb-1975 1078
E1214 Tan Haut Lee 017-6697123 Salesperson M 23-Dec-1969 1650
E1090 Ahmad Zulkifli 013-6710899 Manager M 07-May-1960 3300
E3211 Lim Kim Hock 017-5667110 Assistant Manager M 15-Jun-1967 2860
E4500 Lina Hassan 012-6678190 Clerk F 31-May-1980 825
E5523 Mohd Firdaus 013-3506711 Clerk M 14-Feb-1979 660

Example 3.25: Update specified rows

Query 25: Increase the salary only for managers by 5%.

If the change applies only to rows that meet specified criteria, a WHERE
clause must be used in the statement. This can be written as follows:

UPDATE Employee
SET Salary = Salary*1.05
WHERE Position = 'Manager';

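The result of this operation, applied to the original Employee table (that is,
before the update in Example 3.24), is given in Table 3.38. Only the Manager
row changes; the Assistant Manager row is not updated because 'Assistant
Manager' is not equal to 'Manager'.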

Table 3.38: Result Table for Example 3.25

EmpNo Name TelNo Position Gender DOB Salary


E1708 Shan Dass 012-5463344 Administrator F 19-Feb-1975 980
E1214 Tan Haut Lee 017-6697123 Salesperson M 23-Dec-1969 1500
E1090 Ahmad Zulkifli 013-6710899 Manager M 07-May-1960 3150
E3211 Lim Kim Hock 017-5667110 Assistant Manager M 15-Jun-1967 2600
E4500 Lina Hassan 012-6678190 Clerk F 31-May-1980 750
E5523 Mohd Firdaus 013-3506711 Clerk M 14-Feb-1979 600
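The figures in Tables 3.37 and 3.38 can be checked with exact decimal arithmetic, the kind a DECIMAL column uses. This Python sketch assumes the pre-update salaries implied by Table 3.38 (the unchanged rows, plus 3150 / 1.05 = 3000 for the manager) and applies each UPDATE to those original values.

```python
from decimal import Decimal

# Pre-update rows (EmpNo, Position, Salary), inferred from Table 3.38
employees = [
    ("E1708", "Administrator", Decimal("980")),
    ("E1214", "Salesperson", Decimal("1500")),
    ("E1090", "Manager", Decimal("3000")),
    ("E3211", "Assistant Manager", Decimal("2600")),
    ("E4500", "Clerk", Decimal("750")),
    ("E5523", "Clerk", Decimal("600")),
]

# Example 3.24: SET Salary = Salary*1.10 with no WHERE clause, so every row changes
after_324 = [salary * Decimal("1.10") for _, _, salary in employees]

# Example 3.25: SET Salary = Salary*1.05 WHERE Position = 'Manager'
after_325 = [salary * Decimal("1.05") if position == "Manager" else salary
             for _, position, salary in employees]

print([str(s) for s in after_324])  # ['1078.00', '1650.00', '3300.00', '2860.00', '825.00', '660.00']
print([str(s) for s in after_325])  # ['980', '1500', '3150.00', '2600', '750', '600']
```

Both result tables are reproduced only when each update starts from the same original salaries, which is why Table 3.38 does not build on the values in Table 3.37.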

Example 3.26: Update specified row and specific column

Query 26: Set the contact person for Total System to Ahmad.

Sometimes, we need to update only one column of a specific row. For
instance, this query requires us to set the contact person of the supplier
named Total System in the Supplier table to Ahmad. Thus, the UPDATE statement
for this query would be as follows:

UPDATE Supplier
SET ContactPerson = 'Ahmad'
WHERE Name = 'Total System';

The result of this operation is shown in Table 3.39.

Table 3.39: Result table in Example 3.26

SupNo Name Street City Postcode TelNo ContactPerson
S8843 ABX Technics 12, Jalan Subang Subang Jaya 45600 56334532 Teresa Ng
S9884 Soft System 239, Jalan 2/2 Shah Alam 40450 55212233 Fatimah
S9898 ID Computers 70, Jalan Hijau Petaling Jaya 41700 77617709 Larry Wong
S9990 ITN Suppliers 45, Jalan Maju Subang Jaya 45610 56345505 Tang Lee Huat
S9995 FAST Delivery 3, Lahad Lane Petaling Jaya 41760 77553434 Henry
S9996 NR Tech 20, Jalan Selamat Kuala Lumpur 62000 23456677 Nick
S9997 Total System 25, Jalan Tanjung Kuala Lumpur 43385667 Ahmad

3.4.3 DELETE
The DELETE statement is used to delete records or rows from an existing table.

The syntax for the DELETE statement is as follows:

DELETE FROM TableName
[WHERE searchCondition]

(a) TableName can be the name of a base table or an updatable view; and

(b) searchCondition is optional; if it is omitted, all rows are deleted from
the table, but the table itself is not deleted. If searchCondition is
specified, only those rows that satisfy the condition are deleted.

Examples 3.27 and 3.28 show the use of the DELETE command:

Example 3.27: Delete specified records or rows

Query 27: Delete the supplier named 'Total System' from the Supplier table.

You need to use the WHERE clause when you want to delete only specified
records. Thus, the statement would be as follows:

DELETE FROM Supplier
WHERE Name = 'Total System';

Table 3.40 shows the Supplier table after deleting records of the supplier named
Total System.

Table 3.40: Result Table for Example 3.27

SupNo Name Street City Postcode TelNo ContactPerson
S8843 ABX Technics 12, Jalan Subang Subang Jaya 45600 56334532 Teresa Ng
S9884 Soft System 239, Jalan 2/2 Shah Alam 40450 55212233 Fatimah
S9898 ID Computers 70, Jalan Hijau Petaling Jaya 41700 77617709 Larry Wong
S9990 ITN Suppliers 45, Jalan Maju Subang Jaya 45610 56345505 Tang Lee Huat
S9995 FAST Delivery 3, Lahad Lane Petaling Jaya 41760 77553434 Henry
S9996 NR Tech 20 Jalan Selamat Kuala Lumpur 62000 23456677 Nick

Example 3.28: Delete all records or rows

Query 28: Delete all records in the Shipping table.

If you want to delete all records from the Shipping table, omit the WHERE
clause. Thus, the statement would be written as:

DELETE FROM Shipping;

This command will delete all rows in the table Shipping but it does not delete the
table. This means that the table structure, attributes and indexes will still be
intact.
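The difference between deleting rows and removing the table can be seen in a short sketch using Python's sqlite3 module. The Shipping table's columns, rows and index are hypothetical, since the text does not define the table; sqlite_master is SQLite's own catalogue of schema objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Shipping table with one index
conn.executescript("""
    CREATE TABLE Shipping (ShipNo CHAR(4) NOT NULL PRIMARY KEY, OrderNo CHAR(4));
    CREATE INDEX ShippingOrderIdx ON Shipping (OrderNo);
    INSERT INTO Shipping VALUES ('H001', '1120'), ('H002', '4399'), ('H003', '1120');
""")

# With a WHERE clause only the matching row is removed
removed = conn.execute("DELETE FROM Shipping WHERE OrderNo = '4399'").rowcount

# Without a WHERE clause every remaining row is removed
conn.execute("DELETE FROM Shipping")
remaining = conn.execute("SELECT COUNT(*) FROM Shipping").fetchone()[0]

# The table and its index are still defined in the schema catalogue
objects = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE tbl_name = 'Shipping'").fetchall()
print(removed, remaining)  # 1 0
print(objects)
```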

Summary

• SQL is the standard language for relational database management systems.
SQL is divided into two categories: data definition language (DDL) and data
manipulation language (DML).

• DML allows you to retrieve, add, modify and delete data in the table(s).
The basic DML commands are SELECT, INSERT, UPDATE and DELETE.

• The SELECT statement is the most important statement for retrieving data
from an existing database. The result of each SELECT query is a table. A
SELECT statement has the following syntax:

SELECT [DISTINCT | ALL] {* | column_expression [AS new_name] [, ...]}
FROM tablename [alias] [, ...]
[WHERE condition]
[GROUP BY column_list] [HAVING condition]
[ORDER BY column_list]

• A SELECT statement can produce a result table from one table or from more
than one table. When more than one table is involved, a join operation must
be used by specifying the names of the tables in the FROM clause and the join
condition in the WHERE clause.

• The other SQL DML commands used for data manipulation are INSERT, UPDATE
and DELETE. INSERT is used to insert new row(s) into an existing table.
UPDATE is used to modify the value(s) of one or more columns in all rows or
in specified rows of an existing table. DELETE is used to delete row(s) from
an existing table.

Key Terms

Data definition language (DDL)
Data manipulation language (DML)
DELETE
INSERT
Multitable queries
Queries
SELECT
SQL command
Subqueries
UPDATE

Self-Test

1. What are the two major components of SQL and what functions do they
serve?

2. Identify two advantages and two disadvantages of SQL.

3. What restrictions apply to the use of the aggregate functions within the
SELECT statement?

4. Explain how the GROUP BY clause works. Identify one difference between
the WHERE and HAVING clauses.

5. Identify one difference between a subquery and a join.

References

Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to
design, implementation and management (5th ed.). Boston, MA: Addison-Wesley.

Pratt, P. J., & Last, M. Z. (2008). A guide to SQL (8th ed.). Mason, OH: Cengage
Learning.

Rob, P., & Coronel, C. (2011). Database systems: Design, implementation and
management (8th ed.). Stamford, CT: Cengage Learning.

Topic 4 SQL: Data Definition

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Identify the data types supported by SQL;
2. Define the integrity constraints using SQL;
3. Use the integrity enhancement feature in the CREATE TABLE
statement; and
4. Create and delete views using SQL.

INTRODUCTION
In Topic 3, we examined the structured query language (SQL) in detail,
particularly its data manipulation features. By now, you should be
comfortable with the SELECT statement.

In this topic, we will explore the main SQL data definition facilities. We begin
this topic by examining the ISO SQL data types. The integrity enhancement
feature (IEF) improves the functionality of SQL and allows the constraint
checking to be standardised. We will examine required data, domains, entity
integrity and referential integrity constraints. Then, we will discuss the main SQL
data definition features which include the database and table creation as well as
the altering and deleting of a table. This topic concludes with the creation and the
removal of views.

TOPIC 4 SQL: DATA DEFINITION  75

4.1 THE ISO SQL DATA TYPES


We begin this subtopic by defining the valid identifiers in SQL and proceed with
the SQL data types.

4.1.1 SQL Identifiers


Now, let us learn about SQL identifiers.

(a) SQL identifiers are used to identify the following items in the database:

(i) Table names;

(ii) View names; and

(iii) Attributes (columns).

(b) The characters that can be used are:

(i) Upper-case letters (A to Z);

(ii) Lower-case letters (a to z);

(iii) Digits (0 to 9); and

(iv) The underscore ( _ ) character.

(c) The identifiers have the following restrictions:

(i) It cannot be more than 128 characters;

(ii) It must start with a letter; and

(iii) It cannot contain spaces.
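The identifier rules in (b) and (c) can be expressed as a small check. The function below is a hypothetical helper, not part of any DBMS; it does not check reserved words such as ORDER, which a DBMS also refuses as ordinary (undelimited) identifiers.

```python
import re

# A letter first, then only letters, digits or underscores (rules (b) and (c))
IDENTIFIER_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_LENGTH = 128

def is_valid_identifier(name: str) -> bool:
    """Hypothetical helper: True if name satisfies the identifier rules above."""
    return len(name) <= MAX_LENGTH and IDENTIFIER_PATTERN.fullmatch(name) is not None

for candidate in ["OrderDetail", "Branch_No", "2ndBranch", "Order Detail"]:
    print(candidate, is_valid_identifier(candidate))
```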

4.1.2 SQL Data Types

Character data types are referred to as string data types, while exact numeric
and approximate numeric data types are referred to as numeric data types.

Figure 4.1 shows the ISO SQL data types.

Figure 4.1: ISO SQL data types


Source: Connolly & Begg (2009)

(a) Boolean Data


Boolean or logical data consists of two distinct values:

(i) TRUE; or

(ii) FALSE.

(b) Character Data


A character string can be defined in terms of:

(i) Fixed Length


To define a fixed-length character string, we declare the column as
CHAR.

For example, in our Order Entry Database, the EmpNo attribute in the
Employee table has a fixed length of five characters. It is declared as:

EmpNo CHAR(5)

This column always stores five characters: when we insert fewer than
five characters, the string is padded with blanks to a length of five.

(ii) Variable Length


Variable-length character strings are declared as VARCHAR.

The column Name in the Employee relation has a variable length of
up to 15 characters. It is declared as:

Name VARCHAR(15)

This column can hold up to 15 characters; if we enter fewer than
15 characters, only the characters entered are stored.

(c) Exact Numeric Data


The exact numeric data type is used to define a number with an exact
representation.

(i) Decimal Number


The decimal number representation is declared as:

columnName DECIMAL(T, R)

The T value indicates the total number of digits and the R value
indicates the number of digits to the right of the decimal point.

For example, the column Salary in the Employee relation can be declared
as:

Salary DECIMAL(7,2)

This column can hold values of up to 99,999.99.

(ii) Positive or Negative Number


For a whole number, positive or negative, that is, a number without a
decimal point, we declare the column as INTEGER.

For example, the column QtyOnHand in the Product table can be
declared as:

QtyOnHand INTEGER

(d) Approximate Numeric Data


Real numbers used for scientific calculations are of the approximate
numeric data type and are declared using FLOAT(p), where p is the
precision parameter, which indicates the number of significant digits.

The precision limits are as follows (these figures depend on the DBMS):

(i) FLOAT: Can vary up to 38 digits; and

(ii) REAL: Fixed at 18 digits.

(e) Date
The date data type is defined in columns such as the DOB (date of birth)
column in the Employee table. This is declared in the SQL as:

DOB DATE

A default display format can be specified, for example DD-MON-YY as used
in our examples.
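The storage rules for CHAR, VARCHAR and DECIMAL described above can be modelled in a few lines of Python. The helpers below are hypothetical and only imitate the behaviour described in the text; they simply reject values that are too long, whereas a real DBMS may handle such values in its own way.

```python
from decimal import Decimal

def store_char(value: str, length: int) -> str:
    """Model of CHAR(length): shorter values are padded with blanks."""
    if len(value) > length:
        raise ValueError(f"{value!r} does not fit CHAR({length})")
    return value.ljust(length)

def store_varchar(value: str, max_length: int) -> str:
    """Model of VARCHAR(max_length): only the characters entered are kept."""
    if len(value) > max_length:
        raise ValueError(f"{value!r} does not fit VARCHAR({max_length})")
    return value

def decimal_max(t: int, r: int) -> Decimal:
    """Largest value of DECIMAL(t, r): (t - r) nines before the point, r nines after."""
    return Decimal(10) ** (t - r) - Decimal(10) ** -r

print(repr(store_char("E17", 5)))            # 'E17  '
print(repr(store_varchar("Shan Dass", 15)))  # 'Shan Dass'
print(decimal_max(7, 2))                     # 99999.99
```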

SELF-CHECK 4.1

1. Describe SQL identifiers.

2. Identify the data types in SQL.

4.2 INTEGRITY ENHANCEMENT FEATURE


In this subtopic, we consider the features provided by SQL for integrity
control, which help to keep the database consistent. There are four types of
integrity constraints, namely required data, domain constraints, entity
integrity and referential integrity, as shown in Figure 4.2.

Figure 4.2: Four types of integrity constraints

4.2.1 Required Data

A null is not a blank or zero and is used to represent data that is not available
or not applicable.
Connolly & Begg (2009)

However, some columns must always contain a valid value. For example, every
employee in the Employee relation must have a position, whether they are a
salesperson, manager or clerk. SQL provides the NOT NULL clause in the
CREATE TABLE statement to enforce the required data constraint.

To ensure that the Position column of the Employee table cannot be null, we
define the column as:

Position VARCHAR(15) NOT NULL

When NOT NULL is specified, the Position column must have a data value.

4.2.2 Domain Constraints


Every column has a set of allowable values, known as its domain. For instance,
the gender of an employee is either male (M) or female (F), so the Gender
column has the domain 'M' or 'F'. In SQL, we can use the CHECK clause to
enforce this domain constraint on a column in the CREATE TABLE and ALTER
TABLE statements.

To ensure that the gender can only be specified as 'M' or 'F', we define the
domain constraint in the Gender column as:

Gender CHAR NOT NULL CHECK (Gender IN ('M', 'F'))
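Both the required-data constraint (NOT NULL) and the domain constraint (CHECK) can be tried with Python's sqlite3 module. The cut-down Employee table and the test rows are hypothetical; SQLite raises sqlite3.IntegrityError when a constraint rejects a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down Employee table using the two column definitions shown above
conn.execute("""
    CREATE TABLE Employee (
        EmpNo CHAR(5) NOT NULL PRIMARY KEY,
        Position VARCHAR(15) NOT NULL,
        Gender CHAR NOT NULL CHECK (Gender IN ('M', 'F')))
""")
conn.execute("INSERT INTO Employee VALUES ('E1708', 'Administrator', 'F')")

def rejection(sql):
    """Return the error message if a constraint rejects the statement, else None."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None

null_position = rejection("INSERT INTO Employee VALUES ('E1214', NULL, 'M')")         # required data
bad_gender = rejection("INSERT INTO Employee VALUES ('E1214', 'Salesperson', 'X')")   # domain
accepted = rejection("INSERT INTO Employee VALUES ('E1214', 'Salesperson', 'M')")     # valid row
print(null_position)
print(bad_gender)
print(accepted)  # None
```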

4.2.3 Entity Integrity

Entity integrity requires that the primary key value of each row in a table
be unique and not null.

For example, every EmpNo in the Employee relation is unique and identifies the
employee.

To support the entity integrity, SQL provides the PRIMARY KEY clause in the
CREATE and ALTER TABLE statements. For example, to declare EmpNo as the
primary key, we use the clause as:

PRIMARY KEY (EmpNo)
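A minimal sketch of entity integrity using Python's sqlite3 module follows, with a hypothetical cut-down Employee table. NOT NULL is written out explicitly because SQLite, unlike the ISO standard, allows NULL in a PRIMARY KEY column that is not an INTEGER PRIMARY KEY unless NOT NULL is declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite does not imply it for this primary key
conn.execute("""
    CREATE TABLE Employee (
        EmpNo CHAR(5) NOT NULL,
        Name VARCHAR(15),
        PRIMARY KEY (EmpNo))
""")
conn.execute("INSERT INTO Employee VALUES ('E1708', 'Shan Dass')")

errors = []
# First row repeats an existing key; second row has no key at all
for row in [("E1708", "Tan Haut Lee"), (None, "Lina Hassan")]:
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))
print(errors)
```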

4.2.4 Referential Integrity


SQL supports the referential integrity constraint with the FOREIGN KEY clause
in the CREATE and ALTER TABLE statements. For example, to specify the
foreign key SuppNo of the Product table, which references the Supplier table,
we state it as:

FOREIGN KEY (SuppNo) REFERENCES Supplier

A foreign key value in a relation must match a candidate key value of the
tuple in the referenced relation or the foreign key value can be null.
Connolly & Begg (2009)

In the Order Entry Database, the Product table has the foreign key SuppNo.
You will notice that every SuppNo value in the rows of the Product table (child
table) matches a SuppNo value in the referenced Supplier table (parent table).

(a) By default, SQL rejects any attempt to:

(i) Delete a referenced row.

What should happen to the related rows in the child table with the
matching foreign key value if the referenced row in the parent table is
deleted? The action to take is specified using the ON DELETE clause.

(ii) Update the primary key of a referenced row.

What should happen to the related rows when the primary key of the
referenced row in the parent table is updated (Mannino, 2011)? The
action to take is specified using the ON UPDATE clause.

(b) SQL supports four actions


SQL supports four actions when a user attempts to delete a referenced row
from the parent table and there are one or more matching rows in the child
table. The same four actions can be applied when the primary key in the
parent table is updated (Mannino, 2011; Connolly & Begg, 2009).

(i) CASCADE
Perform the same action on the related rows. For example, if a SuppNo in
the Supplier table is deleted, then the related rows in the Product table
are deleted in a cascading manner.

Similarly, if a SuppNo in the Supplier table is updated, then the SuppNo
in the related rows of the Product table is updated in a cascading
manner. We specify this in SQL as:

FOREIGN KEY (SuppNo) REFERENCES Supplier ON UPDATE CASCADE

(ii) SET NULL


Delete the row from the parent table and set the foreign key value in
the child table to NULL. For example, if a SuppNo in the Supplier
table is deleted, then SuppNo in the related rows of the Product table
is set to NULL. This is valid only if the foreign key column is not
declared as NOT NULL.

(iii) NO ACTION
Reject the delete operation on the parent table. For example, do not
allow a SuppNo in the Supplier table to be deleted if there are
related rows in the Product table. We can specify this in SQL as:

FOREIGN KEY (SuppNo) REFERENCES Supplier ON DELETE NO ACTION

(iv) SET DEFAULT


Delete the row from the parent table and set the foreign key in the
child table to its default value. This is valid only if a default value
has been declared for the foreign key column. For example, if a SuppNo
in the Supplier table is deleted, then set SuppNo in the Product table
to a default value such as 'TENDERS SOON'.

You must also consider the impact of referenced rows on insert operations.
A referenced row (in the parent table) must be inserted before its related
rows (in the child table). For example, before inserting a row in the Product
table, the referenced row in the Supplier table must exist.
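The CASCADE and NO ACTION rules, and the insert-order rule, can be observed in SQLite from Python. This is a sketch with hypothetical rows (the new supplier number S8800 and the unknown S7777 are made up); SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. SET NULL and SET DEFAULT would be declared in the same way (for example, ON DELETE SET NULL) but are not exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only once this pragma is switched on
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Supplier (SuppNo CHAR(5) NOT NULL PRIMARY KEY, Name VARCHAR(20));
    CREATE TABLE Product (
        ProductNo CHAR(4) NOT NULL PRIMARY KEY,
        SuppNo CHAR(5),
        FOREIGN KEY (SuppNo) REFERENCES Supplier
            ON UPDATE CASCADE
            ON DELETE NO ACTION);
    INSERT INTO Supplier VALUES ('S8843', 'ABX Technics'), ('S9884', 'Soft System');
    INSERT INTO Product VALUES ('P001', 'S8843'), ('P002', 'S8843');
""")

# CASCADE: changing the parent key is carried through to the child rows
conn.execute("UPDATE Supplier SET SuppNo = 'S8800' WHERE SuppNo = 'S8843'")
child_keys = [r[0] for r in conn.execute("SELECT SuppNo FROM Product ORDER BY ProductNo")]

def rejected(sql):
    """True if SQLite refuses the statement because a constraint is violated."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# NO ACTION: a supplier that still has products cannot be deleted
delete_refused = rejected("DELETE FROM Supplier WHERE SuppNo = 'S8800'")
# A supplier without products can be deleted
delete_ok = not rejected("DELETE FROM Supplier WHERE SuppNo = 'S9884'")
# Insert order: a product whose supplier does not exist is refused
orphan_refused = rejected("INSERT INTO Product VALUES ('P003', 'S7777')")

print(child_keys, delete_refused, delete_ok, orphan_refused)
# ['S8800', 'S8800'] True True True
```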

SELF-CHECK 4.2

1. How do you define Primary and Foreign keys?

2. Identify the actions that SQL supports. Briefly explain them.

4.3 DATA DEFINITION


The SQL data definition language (DDL) allows database objects such as tables
and views to be created and deleted.

4.3.1 Creating a Database


The authority to create a database lies with the database administrator. A schema
is a named collection of related database objects, such as tables, views and
domains.

(a) SQL provides for the definition of a schema as:

CREATE SCHEMA [SchemaName] AUTHORIZATION CreatorIdentifier

(b) Therefore, if the schema is OrderProcessing and the creator is Lim, the SQL
statement is:

CREATE SCHEMA OrderProcessing AUTHORIZATION Lim;

(c) A schema can be deleted using the following command:

DROP SCHEMA OrderProcessing

4.3.2 Creating a Table


We now create the table structure using the CREATE TABLE statement which
has the following syntax (Connolly & Begg, 2009):

CREATE TABLE tablename
( columnName dataType [NOT NULL]
  [DEFAULT defaultOption] [CHECK (searchCondition)] [, ...]
  [PRIMARY KEY (listofcolumns)]
  [FOREIGN KEY (listofForeignKeyColumns)
     REFERENCES ParentTableName [(listOfCandidateKeyColumns)]
     [ON UPDATE referentialAction]
     [ON DELETE referentialAction]] )

The CREATE TABLE statement creates a table consisting of one or more columns
of the defined data type.

The optional DEFAULT clause provides a default value for a column: whenever an
INSERT statement does not specify a value for that column, SQL uses the default
value.

The NOT NULL clause is specified to ensure that the column must have a data
value.

The remaining clauses are constraints, each of which may optionally be named by
preceding it with the clause:

CONSTRAINT constraintname

The PRIMARY KEY clause specifies the column(s) that comprise the primary
key. It is assumed by default that the primary key value is NOT NULL.

The FOREIGN KEY clause specifies a foreign key in the child table and its
relationship to the parent table. This clause specifies the following:

(a) A listofForeignKeyColumns, the column(s) that form the foreign key;

(b) A REFERENCES subclause indicating the parent table that holds the
matching primary key;

(c) An optional ON UPDATE clause to specify the action taken on the foreign
key value of the child table, if the matching primary key in the parent table
is updated. These actions were discussed in the previous Subtopic 4.2.4;
and

(d) An optional ON DELETE clause to specify the action taken on the child
table if the row(s) in the parent table whose primary key values match the
foreign key value in the child table are deleted. These actions were
discussed in the previous Subtopic 4.2.4.

The following three examples show the CREATE TABLE statements for the
Order Entry Database using the tables Customer, Order and OrderDetail.

Example 4.1:
Creating the Customer table using the features of the CREATE TABLE statement:

CREATE TABLE Customer
( CustNo CHAR(5),
  Name VARCHAR(20) NOT NULL,
  Street VARCHAR(15) NOT NULL,
  City VARCHAR(15) NOT NULL,
  Postcode CHAR(5) NOT NULL,
  MobileTelNo CHAR(11) NOT NULL,
  Balance DECIMAL(6,2) NOT NULL,
  CONSTRAINT PKCustomer PRIMARY KEY (CustNo) );

Example 4.2:
Creating the Order table. Because ORDER is a reserved word in SQL, the table
name is written as the delimited identifier "Order", in double quotation marks:

CREATE TABLE "Order"
( OrderNo CHAR(4),
  OrderDate DATE NOT NULL,
  OrderStreet VARCHAR(15) NOT NULL,
  OrderCity VARCHAR(15) NOT NULL,
  OrderPostcode CHAR(5) NOT NULL,
  CustNo CHAR(5) NOT NULL,
  EmpNo CHAR(5) NOT NULL,
  CONSTRAINT PKOrder PRIMARY KEY (OrderNo),
  CONSTRAINT FKCustNo FOREIGN KEY (CustNo) REFERENCES Customer
    ON DELETE NO ACTION
    ON UPDATE CASCADE,
  CONSTRAINT FKEmpNo FOREIGN KEY (EmpNo) REFERENCES Employee
    ON DELETE NO ACTION
    ON UPDATE CASCADE );

Example 4.3:
Creating the OrderDetail table:

CREATE TABLE OrderDetail
( OrderNo CHAR(4) NOT NULL,
  ProductNo CHAR(5) NOT NULL,
  QtyOrdered INTEGER NOT NULL,
  CONSTRAINT PKOrderDetail PRIMARY KEY (OrderNo, ProductNo),
  CONSTRAINT FKOrderNo FOREIGN KEY (OrderNo) REFERENCES "Order"
    ON DELETE NO ACTION
    ON UPDATE CASCADE,
  CONSTRAINT FKProductNo FOREIGN KEY (ProductNo) REFERENCES Product
    ON DELETE NO ACTION
    ON UPDATE CASCADE );

You can now create the rest of the tables in the Order Entry Database as an
exercise.
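As a quick check of Examples 4.1 to 4.3, the three statements can be run in SQLite through Python's built-in sqlite3 module, used here only as a convenient stand-in DBMS. This is a sketch under stated assumptions: the Employee and Product tables are hypothetical one-column placeholders (their full definitions are left as the exercise above), the order dates are invented sample data, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. It shows the FOREIGN KEY clause rejecting an order for an unknown customer and ON UPDATE CASCADE carrying a changed customer number into the Order table.

```python
import sqlite3

# SQLite (via Python's sqlite3) is only a stand-in DBMS for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY clauses only when enabled

conn.executescript("""
-- Hypothetical placeholder parent tables (full designs are left as an exercise).
CREATE TABLE Employee (EmpNo CHAR(5) PRIMARY KEY);
CREATE TABLE Product  (ProductNo CHAR(5) PRIMARY KEY);

CREATE TABLE Customer
( CustNo      CHAR(5)      NOT NULL,
  Name        VARCHAR(20)  NOT NULL,
  Street      VARCHAR(15)  NOT NULL,
  City        VARCHAR(15)  NOT NULL,
  Postcode    CHAR(5)      NOT NULL,
  MobileTelNo CHAR(11)     NOT NULL,
  Balance     DECIMAL(6,2) NOT NULL,
  CONSTRAINT PKCustomer PRIMARY KEY (CustNo) );

CREATE TABLE "Order"
( OrderNo       CHAR(4)     NOT NULL,
  OrderDate     DATE        NOT NULL,
  OrderStreet   VARCHAR(15) NOT NULL,
  OrderCity     VARCHAR(15) NOT NULL,
  OrderPostcode CHAR(5)     NOT NULL,
  CustNo        CHAR(5)     NOT NULL,
  EmpNo         CHAR(5)     NOT NULL,
  CONSTRAINT PKOrder PRIMARY KEY (OrderNo),
  CONSTRAINT FKCustNo FOREIGN KEY (CustNo) REFERENCES Customer
    ON DELETE NO ACTION ON UPDATE CASCADE,
  CONSTRAINT FKEmpNo FOREIGN KEY (EmpNo) REFERENCES Employee
    ON DELETE NO ACTION ON UPDATE CASCADE );

CREATE TABLE OrderDetail
( OrderNo    CHAR(4) NOT NULL,
  ProductNo  CHAR(5) NOT NULL,
  QtyOrdered INTEGER NOT NULL,
  CONSTRAINT PKOrderDetail PRIMARY KEY (OrderNo, ProductNo),
  CONSTRAINT FKOrderNo FOREIGN KEY (OrderNo) REFERENCES "Order"
    ON DELETE NO ACTION ON UPDATE CASCADE,
  CONSTRAINT FKProductNo FOREIGN KEY (ProductNo) REFERENCES Product
    ON DELETE NO ACTION ON UPDATE CASCADE );

INSERT INTO Employee VALUES ('E0001');
INSERT INTO Customer VALUES
  ('C8542', 'Lim Ah Kow', '12, Jalan Baru', 'Ipoh', '34501', '012-5672314', 500);
INSERT INTO "Order" VALUES
  ('O001', '2014-08-01', '12, Jalan Baru', 'Ipoh', '34501', 'C8542', 'E0001');
""")

# FOREIGN KEY: an order for a customer number that does not exist is rejected.
try:
    conn.execute("""INSERT INTO "Order" VALUES
        ('O002', '2014-08-02', '12, Jalan Baru', 'Ipoh', '34501', 'C9999', 'E0001')""")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# ON UPDATE CASCADE: a changed customer number is carried into the matching order.
conn.execute("UPDATE Customer SET CustNo = 'C8543' WHERE CustNo = 'C8542'")
cascaded = conn.execute("""SELECT CustNo FROM "Order" WHERE OrderNo = 'O001'""").fetchone()[0]
print(fk_rejected, cascaded)
```

Other DBMSs accept the same statements with minor differences, for example in how they treat CHAR lengths, which SQLite does not enforce.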

4.3.3 Changing a Table Definition


The ALTER TABLE statement supports modification of a table definition. It
provides options for:

(a) Adding a new column to a table and dropping an existing column;

(b) Adding a new table constraint and dropping an existing table constraint;
and

(c) Setting a default for a column and dropping an existing default for a
column.


The basic syntax of the statement is:

ALTER TABLE TableName
[ADD [COLUMN] columnName dataType [NOT NULL] [DEFAULT defaultOption]]
[DROP [COLUMN] columnName [RESTRICT | CASCADE]]
[ADD [CONSTRAINT [ConstraintName]] tableConstraintDefinition]
[DROP CONSTRAINT ConstraintName [RESTRICT | CASCADE]]
[ALTER [COLUMN] columnName SET DEFAULT defaultOption]
[ALTER [COLUMN] columnName DROP DEFAULT]

A tableConstraintDefinition is a PRIMARY KEY, UNIQUE, FOREIGN KEY or CHECK
clause.

The ADD COLUMN clause is the same as the definition of a column in the
CREATE TABLE statement.

The DROP COLUMN clause defines the name of the column to be dropped and
has the following options (Connolly & Begg, 2009):

(a) RESTRICT
The DROP operation is rejected if the column is referenced by another
database object.

(b) CASCADE
The DROP operation proceeds and automatically drops the column from any
database objects that reference it.

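For example, to add an extra column, Branch_No, to the Employee table, the SQL
statement would be:

ALTER TABLE Employee
ADD Branch_No CHAR(4) NOT NULL;

If the Employee table already contains rows, many DBMSs will reject this
statement unless a DEFAULT value is also specified for the new NOT NULL column,
because the existing rows would otherwise have no value for Branch_No.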
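Adding a NOT NULL column to a table that already holds rows raises the question of what value those rows receive. The sketch below tries both forms in SQLite through Python's sqlite3 module, used only as a stand-in DBMS. The cut-down Employee table, the employee name and the default branch number 'B001' are hypothetical. The exception class raised for the rejected statement differs between SQLite versions, so the sketch catches the common base class sqlite3.DatabaseError.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit: each statement stands alone
# Hypothetical, cut-down Employee table that already holds one row.
conn.execute("CREATE TABLE Employee (EmpNo CHAR(5) PRIMARY KEY, Name VARCHAR(20) NOT NULL)")
conn.execute("INSERT INTO Employee VALUES ('E0001', 'Aminah')")

# The existing row would have no value for a NOT NULL column, so SQLite rejects this.
try:
    conn.execute("ALTER TABLE Employee ADD Branch_No CHAR(4) NOT NULL")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

# With a DEFAULT, the existing row receives 'B001' and the column is added.
conn.execute("ALTER TABLE Employee ADD COLUMN Branch_No CHAR(4) NOT NULL DEFAULT 'B001'")
columns = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]  # row[1] is the column name
branch = conn.execute("SELECT Branch_No FROM Employee WHERE EmpNo = 'E0001'").fetchone()
print(rejected, columns, branch)
```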


4.3.4 Removing a Table


We can remove a table from the database by using the DROP TABLE statement
which has the following syntax:

DROP TABLE TableName

For example, to remove the OrderDetail table we specify it as:

DROP TABLE OrderDetail;

The DROP TABLE statement should be used with care: it removes both the table
definition and all the rows in the table, and the effect can be damaging to
other tables that depend on it (for example, through foreign keys). A typical
use is to remove a table that was created with an incorrect structure, so that
the table can then be created again with the correct structure.
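To illustrate why DROP TABLE needs care, the sketch below uses SQLite (through Python's sqlite3 module) as a stand-in DBMS, with cut-down hypothetical versions of Customer and Order. With foreign keys enabled, SQLite rejects dropping a parent table while child rows still reference it, so the child table has to be dropped first. Behaviour in other DBMSs may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (CustNo CHAR(5) PRIMARY KEY);
CREATE TABLE "Order" (
  OrderNo CHAR(4) PRIMARY KEY,
  CustNo  CHAR(5) NOT NULL REFERENCES Customer ON DELETE NO ACTION );
INSERT INTO Customer VALUES ('C8542');
INSERT INTO "Order" VALUES ('O001', 'C8542');
""")


def table_names():
    """Names of the tables that currently exist in the database."""
    return {r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}


# Dropping the parent while an order still references it fails with a foreign key error.
try:
    conn.execute("DROP TABLE Customer")
    parent_drop_rejected = False
except sqlite3.IntegrityError:
    parent_drop_rejected = True

# Dropping the child table first, then the parent, succeeds.
conn.execute('DROP TABLE "Order"')
conn.execute("DROP TABLE Customer")
print(parent_drop_rejected, table_names())
```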

SELF-CHECK 4.3
1. What does the CREATE TABLE statement do?
2. What does the ALTER TABLE statement do?
3. How can we remove a table from the database?

4.4 VIEWS
What is a view?

A view is a virtual relation that is derived from one or more base relations.

A view does not physically exist in the database as a stored set of rows; only
its definition is stored, and its rows are derived from the base relations
whenever the view is used. Views allow users to customise the data according to
their needs and hide part of the database from certain users. Let us look at how
views are created.


4.4.1 Creating a View


The format of the CREATE VIEW is:

CREATE VIEW ViewName [(newColumnName [, ...])]
AS subselect [WITH [CASCADED | LOCAL] CHECK OPTION]

A view is defined by specifying an SQL SELECT statement. A name may optionally
be assigned to each column in the view. If the column names are omitted, each
column in the view takes the name of the corresponding column in the subselect
statement. The subselect is known as the defining query (Connolly & Begg, 2009).

Now, we will examine the different types of views.

(a) Creating a Horizontal View


Create a view so that the sales manager in the Order Entry Database can see
the details of customers whose city is Ipoh. Because it restricts the rows,
but not the columns, of the Customer table, this is a horizontal view. It is
created in SQL as:

CREATE VIEW CustomerIpoh
AS SELECT *
FROM Customer
WHERE City = 'Ipoh';

This will give us a view known as CustomerIpoh with the same column
names as the Customer table but only those rows where the City is Ipoh.
This view is shown below in Table 4.1.

Table 4.1: CustomerIpoh View

CustNo  Name          Street           City  Postcode  MobileTelNo  Balance
C8542   Lim Ah Kow    12, Jalan Baru   Ipoh  34501     012-5672314  500
C1010   Fong Kim Lee  54, Main Street  Ipoh  34570     012-5677118  350

This gives the manager a customised view of the Customer table.
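The horizontal view can be tried out in SQLite through Python's sqlite3 module, used as a stand-in DBMS for this sketch. The two Ipoh customers come from Table 4.1; the third customer, located in Kuala Lumpur, is hypothetical and is there only to show that the WHERE clause filters rows out. Note that SQL string literals use straight single quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer
( CustNo CHAR(5) PRIMARY KEY, Name VARCHAR(20) NOT NULL, Street VARCHAR(15) NOT NULL,
  City VARCHAR(15) NOT NULL, Postcode CHAR(5) NOT NULL, MobileTelNo CHAR(11) NOT NULL,
  Balance DECIMAL(6,2) NOT NULL );

INSERT INTO Customer VALUES
  ('C8542', 'Lim Ah Kow',   '12, Jalan Baru',  'Ipoh',         '34501', '012-5672314', 500),
  ('C1010', 'Fong Kim Lee', '54, Main Street', 'Ipoh',         '34570', '012-5677118', 350),
  -- Hypothetical customer outside Ipoh: the view should leave this row out.
  ('C2001', 'Siti Aishah',  '8, Jalan Ampang', 'Kuala Lumpur', '50450', '012-3456789', 120);

-- Horizontal view: every column of Customer, but only the rows for Ipoh.
CREATE VIEW CustomerIpoh
AS SELECT *
   FROM Customer
   WHERE City = 'Ipoh';
""")

rows = conn.execute("SELECT CustNo, Name, City FROM CustomerIpoh ORDER BY CustNo").fetchall()
print(rows)
```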


(b) Creating a Vertical View


If the manager now wants to see only the customer numbers, names and
balances of these customers, we can create a vertical view, which restricts
the columns, on top of CustomerIpoh:

CREATE VIEW CustomerIpohBal
AS SELECT CustNo, Name, Balance
FROM CustomerIpoh;

This view, called CustomerIpohBal, is created from the CustomerIpoh view. It
has the columns CustNo, Name and Balance, which are displayed in Table 4.2.

Table 4.2: CustomerIpohBal View

CustNo  Name          Balance
C8542   Lim Ah Kow    500
C1010   Fong Kim Lee  350
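A similar sketch for the vertical view, again using SQLite through Python's sqlite3 module as a stand-in, with the rows of Table 4.1. Because a view is defined by a query rather than stored as a table, the final UPDATE to the base Customer table is visible through CustomerIpohBal straight away; the new balance of 600 is a hypothetical value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer
( CustNo CHAR(5) PRIMARY KEY, Name VARCHAR(20) NOT NULL, Street VARCHAR(15) NOT NULL,
  City VARCHAR(15) NOT NULL, Postcode CHAR(5) NOT NULL, MobileTelNo CHAR(11) NOT NULL,
  Balance DECIMAL(6,2) NOT NULL );
INSERT INTO Customer VALUES
  ('C8542', 'Lim Ah Kow',   '12, Jalan Baru',  'Ipoh', '34501', '012-5672314', 500),
  ('C1010', 'Fong Kim Lee', '54, Main Street', 'Ipoh', '34570', '012-5677118', 350);

CREATE VIEW CustomerIpoh AS SELECT * FROM Customer WHERE City = 'Ipoh';

-- Vertical view defined on another view: keeps only three of the columns.
CREATE VIEW CustomerIpohBal AS SELECT CustNo, Name, Balance FROM CustomerIpoh;
""")

cur = conn.execute("SELECT * FROM CustomerIpohBal ORDER BY CustNo")
columns = [d[0] for d in cur.description]  # column names of the view
rows = cur.fetchall()

# The view stores no rows of its own: a change to the base table shows up immediately.
conn.execute("UPDATE Customer SET Balance = 600 WHERE CustNo = 'C8542'")
new_balance = conn.execute(
    "SELECT Balance FROM CustomerIpohBal WHERE CustNo = 'C8542'").fetchone()[0]
print(columns, rows, new_balance)
```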

4.4.2 Removing a View


A view is removed from the database with a DROP VIEW statement:

DROP VIEW ViewName [RESTRICT | CASCADE]

DROP VIEW causes the definition of the view to be deleted from the schema. For
example, to remove the view CustomerIpoh, we specify it in SQL as:

DROP VIEW CustomerIpoh;

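If CASCADE is specified, DROP VIEW also deletes all dependent objects, that is,
all objects that reference the view, including any views defined on it. If
RESTRICT is specified and any other database object depends on the existence of
the view, the command is rejected. For example, because CustomerIpohBal is
defined on CustomerIpoh, DROP VIEW CustomerIpoh RESTRICT would be rejected,
whereas DROP VIEW CustomerIpoh CASCADE would remove both views.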


SELF-CHECK 4.4

1. Briefly explain a VIEW.

2. How do you create a VIEW?

SUMMARY

• SQL identifiers for table, view and column names can use the letters a-z
(upper and lower case), the digits 0-9 and the underscore character (_). An
identifier can be no more than 128 characters long, must start with a letter
and cannot contain spaces.

• The SQL data types covered in this topic include Boolean, character, exact
numeric, approximate numeric and date.

• Required data are specified in SQL by the NOT NULL clause. Domain
constraints are specified using the CHECK clause.

• Primary keys are defined using the PRIMARY KEY clause.

• Foreign keys are specified using the FOREIGN KEY clause. The actions taken
when referenced rows are updated or deleted are specified by the ON UPDATE
and ON DELETE subclauses.

• The CREATE TABLE statement creates a table consisting of one or more
columns of the defined data types. The ALTER TABLE statement supports
modification of a table definition.

• We can remove a table from the database by using the DROP TABLE
statement.

• A view is a derived relation that does not physically exist in the database
as a stored table. It allows users to customise the data according to their
needs.

• A view is created by the CREATE VIEW statement. Its definition is stored
once, so it does not need to be recreated each time it is referenced; its rows
are derived from the base tables whenever it is used. The types of views
covered here include horizontal and vertical views.


KEY TERMS

ALTER TABLE
Approximate numeric
Boolean
CASCADE
Character data
CHECK
CREATE DOMAIN
CREATE SCHEMA
CREATE TABLE
CREATE VIEW
Data types
Date
DROP DOMAIN
DROP SCHEMA
DROP TABLE
DROP VIEW
Entity integrity
Exact numeric
FOREIGN KEY
NO ACTION
NOT NULL
PRIMARY KEY
Referential integrity
SET DEFAULT
SET NULL
SQL identifiers

SELF-TEST

1. Discuss each of the clauses of the CREATE TABLE statement.

2. Explain how the process of VIEW resolution works.

3. Discuss how the access control mechanisms of SQL work.

4. Consider the following table:

Component (ComponentNo, Contract, ComponentPrice)

which represents the price negotiated under each contract for a component
(a component may have a different price under each contract). Now consider
the following view, CheapComponent, which contains the distinct component
numbers of components that cost less than RM100:

CREATE VIEW CheapComponent (ComponentNo)
AS SELECT DISTINCT ComponentNo
FROM Component
WHERE ComponentPrice < 100;

5. Discuss how you would maintain this as a materialised view and under
what circumstances you would be able to maintain the view without
having to access the underlying base table Component.

REFERENCES

Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to
design, implementation and management (5th ed.). Boston, MA: Addison-Wesley.

Mannino, M. (2011). Database design, application development and
administration (5th ed.). Scottsdale, AZ: Ediyu.

Topic 5 Entity-Relationship (ER) Modelling
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Define the basic concepts of the entity-relationship (ER) model;
2. Recognise one-to-one, one-to-many and many-to-many
relationships; and
3. Describe generalisation hierarchies.

INTRODUCTION
In order to design a database, there must be a clear understanding of how the
business operates, so that the design produced will meet user requirements. The
Entity-Relationship (ER) model allows database designers, programmers and end
users to give their input on the nature of the data and how it is used in the
business. Therefore, the ER model is a means of communication that is non-
technical and easily understood.

In the ER model, we begin by identifying the significant data, known as
entities, and the relationships between these entities. Information about the
entities, known as attributes, is then added, and any constraints on the
entities, relationships and attributes are identified (Connolly & Begg, 2009).


In this topic, you are provided with the basic concepts of the ER model, which
enable you to understand the notation of ER diagrams. The Crow's Foot notation
is used here to represent the ER diagrams.

5.1 ENTITY
What is an entity?

An entity is a collection of objects of interest in an application.
Mannino (2011)

Entities can represent physical things such as people, places or objects, as
well as events and concepts such as a reservation or a course. Examples of each
type are given in Table 5.1:

Table 5.1: Five Entity Types

Entity Type  Examples
Persons      DOCTOR, CUSTOMER, EMPLOYEE, STUDENT, SUPPLIER
Places       BUILDING, OFFICE, FACULTY
Objects      STATIONERY, MACHINE, BOOK, PRODUCT, VEHICLE
Events       TOURNAMENT, AWARD, FLIGHT, ORDER, RESERVATION
Concepts     COURSE, FUND, QUALIFICATION

In Crow's Foot notation, an entity is represented as a rectangle with a singular
noun name inside. For example, the notation for the Customer entity is shown in
Figure 5.1.

Figure 5.1: Example of a Crow's Foot entity


5.2 ATTRIBUTES
What is an attribute?

An attribute is a descriptive property or characteristic of an entity.

The attributes of the entity Customer are CustNo, Name, Street, City, Postcode,
TelNo and Balance.

For example, the notation for entity Customer with the stated attributes is
represented in Figure 5.2. The primary key CustNo is underlined.

Figure 5.2: Example of attributes

5.3 RELATIONSHIPS
How can we define a relationship?

A relationship is a set of business associations that exist between one or more
entities.
Connolly & Begg (2009)


Each relationship is given a name that describes its function. An example of a
relationship is shown in Figure 5.3.

Figure 5.3: Example of a relationship

Consider the example in Figure 5.3 between the Customer entity and the Order
entity. In the Crow's Foot notation, relationship names appear on the line
connecting the entities involved in the relationship. In Figure 5.3, the Makes
relationship shows that the Customer and Order entities are directly related. The
Makes relationship is binary because it involves two entities.

5.3.1 Relationship Cardinality

Cardinalities constrain the number of objects that participate in a relationship.
Mannino (2011)

The meaning of cardinalities can be shown in an instance diagram (see
Figure 5.4):

Figure 5.4: Instance diagram of the Makes relationship


Figure 5.4 shows a set of customers {Customer1, Customer2, Customer3}, a
set of orders {Order1, Order2, Order3, Order4} and the connections between the
two sets. In this figure, Customer1 is related to Order1, Order2 and Order3,
Customer2 is not related to any Order entity and Customer3 is related to Order4.

Similarly, Order1, Order2 and Order3 are each related to Customer1, and Order4
is related to Customer3. From this diagram, we see that each order is related to
exactly one customer. Note that in the other direction, each customer is related
to zero or more orders.
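Cardinalities can be related to an instance diagram by counting connections. The short Python sketch below, a hypothetical helper rather than part of any ER tool, takes the Makes pairs described for Figure 5.4 and reports the minimum and maximum number of related objects on each side. Keep in mind that one instance diagram shows only one possible state of the data; the cardinalities in an ER diagram are rules that must hold for every state.

```python
# Instance data described for Figure 5.4: each pair is one (customer, order) connection.
customers = ["Customer1", "Customer2", "Customer3"]
orders = ["Order1", "Order2", "Order3", "Order4"]
makes = [("Customer1", "Order1"), ("Customer1", "Order2"),
         ("Customer1", "Order3"), ("Customer3", "Order4")]


def cardinality_range(entities, pairs, position):
    """Return (minimum, maximum) number of connections per entity.

    position 0 counts the pairs per customer; position 1 counts them per order.
    Entities with no connection are counted as 0.
    """
    counts = {entity: 0 for entity in entities}
    for pair in pairs:
        counts[pair[position]] += 1
    return min(counts.values()), max(counts.values())


orders_per_customer = cardinality_range(customers, makes, 0)   # zero or more
customers_per_order = cardinality_range(orders, makes, 1)      # exactly one
print("orders per customer:", orders_per_customer)
print("customers per order:", customers_per_order)
```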

There are three main types of relationship that can exist between entities:

(a) One-to-One Relationship


An Order generates only one invoice and an Invoice is generated by only one
order (see Figure 5.5).

Figure 5.5: One-to-one relationship

(b) One-to-Many Relationship


Each Customer can make one or more orders and an Order is from one
customer (see Figure 5.6).

Figure 5.6: One-to-many relationship

(c) Many-to-Many Relationship


An Order has one or more products and a Product can be in one or more
orders (see Figure 5.7).

Figure 5.7: Many-to-many relationship


5.3.2 Classification of Cardinalities


Cardinalities are classified by common values for minimum and maximum
cardinality.

A minimum cardinality of one or more shows a mandatory relationship. A
minimum cardinality of zero indicates an optional relationship, as shown in
Figure 5.8.

Figure 5.8: Optional relationship

In this relationship, an Employee processes zero, one or more orders and each
Order is processed by one employee. The Processes relationship is optional for
the Employee entity because an Employee entity can be stored without being
related to an Order entity. It is mandatory for the Order entity, however,
because an order has to be processed by one employee.
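One common way to carry these cardinalities into table definitions (an assumption about the later design step, not a rule stated in this topic) is a NOT NULL foreign key on the mandatory side, as in Example 4.2, where EmpNo in the Order table is NOT NULL. The sketch uses SQLite via Python's sqlite3 module as a stand-in DBMS, with cut-down hypothetical tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (EmpNo CHAR(5) PRIMARY KEY);
-- NOT NULL makes the relationship mandatory for Order: every order needs an employee.
CREATE TABLE "Order" (
  OrderNo CHAR(4) PRIMARY KEY,
  EmpNo   CHAR(5) NOT NULL REFERENCES Employee );
""")

# Optional for Employee: an employee can be stored without processing any order.
conn.execute("INSERT INTO Employee VALUES ('E0001')")
idle_employees = conn.execute("""
  SELECT COUNT(*) FROM Employee
  WHERE EmpNo NOT IN (SELECT EmpNo FROM "Order")
""").fetchone()[0]

# Mandatory for Order: an order that is not processed by any employee is rejected.
try:
    conn.execute("""INSERT INTO "Order" VALUES ('O001', NULL)""")
    order_without_employee_rejected = False
except sqlite3.IntegrityError:
    order_without_employee_rejected = True

print(idle_employees, order_without_employee_rejected)
```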

The Crow's Foot notation uses three symbols to represent cardinalities:

(a) The Crow's Foot symbol shows many related entities. The Crow's Foot
symbol near the Order entity type means that a customer can be related to
many orders;

(b) The circle means a cardinality of zero; and

(c) A line perpendicular to the relationship line indicates a cardinality of one.

To show minimum and maximum cardinality, the symbols are placed next to
each entity type in a relationship. In Figure 5.9, a customer is related to a
minimum of zero orders (circle in the inside position) and a maximum of
many orders (Crow's Foot in the outside position). In the same way, an order is
related to exactly one (one and only one) customer, as shown by the single
vertical lines in both the inside and outside positions. Table 5.2 shows a
summary of cardinality classifications using Crow's Foot notation.


Figure 5.9: Entity-relationship diagram with cardinalities

Table 5.2: Summary of Cardinality Classifications

Cardinality Interpretation       Minimum Instances  Maximum Instances
Exactly one (one and only one)   1                  1
Zero or one                      0                  1
One or more                      1                  Many (>1)
Zero, one or more                0                  Many (>1)


5.3.3 Degree of Relationship


What is the degree of a relationship?

The degree of a relationship is the number of entities participating in the
relationship.

A relationship of degree two is called binary. An example of a binary relationship
is the Makes relationship in the previous Figure 5.6, with two entities, Customer
and Order. Other examples of binary relationships are shown in the previous
Figures 5.5, 5.7 and 5.8. The binary relationship is the most common type of
relationship.

A relationship of degree three is known as a ternary relationship. An example of
such a relationship is shown in Figure 5.10.

Figure 5.10: An example of a ternary relationship

5.3.4 Recursive Relationship

A recursive relationship is a relationship in which the same entity participates
more than once in different roles.
Connolly & Begg (2009)


In a recursive or unary (degree = 1) relationship, there is only one entity
involved. For example, an employee is supervised by a supervisor who is also an
employee.

The Employee entity participates twice in this relationship:

(a) As a supervisor; and

(b) As an employee who is supervised.

This recursive relationship is shown in Figure 5.11.

Figure 5.11: Recursive relationship

This relationship is read as "Each supervisor supervises one or more employees"
and "An employee is supervised by one supervisor".
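A recursive relationship is often implemented as a foreign key that refers back to its own table, and a self-join then lists each employee with their supervisor. The sketch below uses SQLite via Python's sqlite3 module as a stand-in DBMS; the column SupervisorNo and the employee names are hypothetical. SupervisorNo is allowed to be NULL for the employee at the top of the hierarchy, who has no supervisor, which is a design assumption beyond Figure 5.11.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- SupervisorNo refers back to the same Employee table (recursive relationship).
CREATE TABLE Employee (
  EmpNo        CHAR(5) PRIMARY KEY,
  Name         VARCHAR(20) NOT NULL,
  SupervisorNo CHAR(5) REFERENCES Employee (EmpNo) );
INSERT INTO Employee VALUES ('E0001', 'Aminah',   NULL);     -- top of the hierarchy
INSERT INTO Employee VALUES ('E0002', 'Raju',     'E0001');
INSERT INTO Employee VALUES ('E0003', 'Mei Ling', 'E0001');
""")

# Self-join: the same table plays both roles, employee (e) and supervisor (s).
pairs = conn.execute("""
  SELECT e.Name, s.Name
  FROM Employee AS e JOIN Employee AS s ON e.SupervisorNo = s.EmpNo
  ORDER BY e.EmpNo
""").fetchall()
print(pairs)
```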

5.3.5 Resolving Many-to-Many Relationships


We examine many-to-many relationships again by looking at our previous
example in Figure 5.7. This is reproduced in Figure 5.12, this time with
attributes and primary keys.

Figure 5.12: Many-to-many relationship


For each many-to-many relationship, create a new relation to represent the
relationship. In this case, we create a new relation known as OrderDetail,
which has the attribute QtyOrdered. The primary keys of the entities Order and
Product are copied into the new relation OrderDetail. Hence, OrderDetail has a
composite primary key made up of OrderNo and ProductNo, and each of these
attributes also acts as a foreign key. This is depicted in Figure 5.13.

Figure 5.13: Resolving many-to-many relationships
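The resolved design corresponds to the OrderDetail table of Example 4.3. The SQLite sketch below (Python's sqlite3 module used as a stand-in, with cut-down hypothetical Order and Product tables and invented sample rows) shows the two effects of the composite key: the same (OrderNo, ProductNo) pair cannot be recorded twice, and each half must match an existing order or product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE "Order"  (OrderNo CHAR(4) PRIMARY KEY);
CREATE TABLE Product  (ProductNo CHAR(5) PRIMARY KEY);
-- Associative relation: composite primary key, each part also a foreign key.
CREATE TABLE OrderDetail (
  OrderNo    CHAR(4) NOT NULL REFERENCES "Order",
  ProductNo  CHAR(5) NOT NULL REFERENCES Product,
  QtyOrdered INTEGER NOT NULL,
  PRIMARY KEY (OrderNo, ProductNo) );
INSERT INTO "Order" VALUES ('O001'), ('O002');
INSERT INTO Product VALUES ('P0001'), ('P0002');
-- One order has many products; one product appears in many orders.
INSERT INTO OrderDetail VALUES ('O001', 'P0001', 2), ('O001', 'P0002', 1), ('O002', 'P0001', 5);
""")


def rejected(sql):
    """Run an INSERT and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_pair = rejected("INSERT INTO OrderDetail VALUES ('O001', 'P0001', 9)")   # same key pair
unknown_product = rejected("INSERT INTO OrderDetail VALUES ('O002', 'P9999', 1)")  # no such product
orders_for_p0001 = [r[0] for r in conn.execute(
    "SELECT OrderNo FROM OrderDetail WHERE ProductNo = 'P0001' ORDER BY OrderNo")]
print(duplicate_pair, unknown_product, orders_for_p0001)
```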

5.4 STRONG AND WEAK ENTITIES


How do we define a strong entity?

A strong entity is an entity that is not existence-dependent on some other entity.

Strong entities have their own primary keys. Examples of strong entities are
Product, Employee, Customer, Order, Invoice, etc. Strong entity types are known
as parent or dominant entities.


What about a weak entity?

A weak entity is existence-dependent on some other entity type.

Weak entities borrow all or part of their primary keys from another (strong)
entity. As an example, see Figure 5.14, where the existence of the Room entity
is dependent on the Building entity. You can only refer to a room by providing
its associated building identifier. The underlined attribute in Room is part of
the primary key but not the entire primary key. Therefore, the primary key of
Room is the combination of BuildingId and RoomNo.

Figure 5.14: Weak entity room with strong entity building


Source: Mannino (2011)
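In SQL, one way to express the Room weak entity is a composite primary key that includes the borrowed BuildingId, which is also a foreign key to Building. The sketch uses SQLite through Python's sqlite3 module as a stand-in DBMS; the building and room values are hypothetical, and ON DELETE CASCADE is a design choice made here to mirror existence dependency (deleting a building deletes its rooms).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for REFERENCES and ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE Building (BuildingId CHAR(4) PRIMARY KEY);

-- Weak entity: BuildingId is borrowed from the owner and is part of the primary key.
CREATE TABLE Room (
  BuildingId CHAR(4) NOT NULL REFERENCES Building ON DELETE CASCADE,
  RoomNo     CHAR(4) NOT NULL,
  PRIMARY KEY (BuildingId, RoomNo) );

INSERT INTO Building VALUES ('B1'), ('B2');
-- Room number 101 appears in both buildings; only the full key must be unique.
INSERT INTO Room VALUES ('B1', '101'), ('B1', '102'), ('B2', '101');
""")
rooms_before = conn.execute("SELECT COUNT(*) FROM Room").fetchone()[0]

# Existence dependency: deleting building B1 also deletes its rooms.
conn.execute("DELETE FROM Building WHERE BuildingId = 'B1'")
remaining = conn.execute("SELECT BuildingId, RoomNo FROM Room").fetchall()
print(rooms_before, remaining)
```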

5.5 GENERALISATION HIERARCHIES


What is a generalisation hierarchy?

A generalisation hierarchy is a technique in which attributes that are common
to several types of entity are grouped into their own entity, called a supertype.

What is a supertype?

A supertype is an entity that stores the attributes that are common to one or
more entity subtypes. A subtype is an entity that inherits the common attributes
from its supertype and then adds other attributes that are unique to instances
of the subtype.


Figure 5.15 shows a generalisation hierarchy that classifies employees according
to their job types. The Employee entity is the supertype (parent) and the entities
Pilot, Mechanic and Accountant are the subtypes (children). Because every
instance of a subtype is also an instance of the supertype, a Pilot, Mechanic or
Accountant is an Employee.

Figure 5.15: Generalisation hierarchy for employees


Source: Rob & Coronel (2011)

Inheritance is the sharing of attributes between supertypes and subtypes.


Inheritance means that the attributes of a supertype are automatically part of its
subtypes, that is, each subtype inherits the attributes of the supertype. For
example, the attributes of the Pilot entity are its inherited attributes, EmpNo,
Name and HireDate, as well as its direct attributes, PilotLicence and
PilotRatings. This is because Pilot is a subtype of Employee. These direct
attributes are known as specialised attributes.
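Attribute inheritance behaves much like class inheritance in a programming language. As an analogy only (an ER subtype is not a Python class), the sketch below models Employee and two of its subtypes as Python dataclasses: each subtype automatically gets EmpNo, Name and HireDate and then adds its specialised attributes. The attribute values are hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Employee:            # supertype: attributes shared by every employee
    EmpNo: str
    Name: str
    HireDate: date


@dataclass
class Pilot(Employee):     # subtype: inherits EmpNo, Name and HireDate
    PilotLicence: str
    PilotRatings: str


@dataclass
class Mechanic(Employee):  # another subtype with its own specialised attributes
    EmpRate: float
    OverTimeRate: float


pilot = Pilot("E0001", "Aminah", date(2010, 1, 4), "LIC-001", "Rating A")
print([f.name for f in fields(Pilot)])
print(isinstance(pilot, Employee))  # a Pilot is an Employee
```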

Specialisation is the process of maximising differences between members of
an entity by identifying their distinguishing characteristics.
Connolly & Begg (2009)

When we identify a set of subtypes of an entity, we assign to each subtype the
attributes that are specific to it. In our example, employees have the different
job roles of Pilot, Mechanic and Accountant, so they have attributes specialised
for their job roles. The Pilot entity has the specialised attributes PilotLicence
and PilotRatings, which are not applicable to the job roles of Mechanic and
Accountant. The Mechanic entity has its own specialised attributes, EmpRate
and OverTimeRate, while the Accountant entity has the specialised attributes
Qualification and ProfOrg.

5.5.1 Disjointness and Completeness Constraints


What does disjointness (D) mean?

Disjointness (D) means that the subtypes in a generalisation hierarchy do not
have any entities in common.

Based on the example in Figure 5.15, the generalisation hierarchy is disjoint
(non-overlapping) because an Employee cannot be a Pilot and at the same time a
Mechanic. An employee can belong to at most one of the subtypes Pilot,
Mechanic and Accountant. To show the disjoint constraint, D is used as shown in
Figure 5.16.


Figure 5.16: Generalisation hierarchy with disjoint constraint

The generalisation hierarchy in Figure 5.17 is non-disjoint (overlapping) because
a member of a Faculty can be both an Academic Staff member and a Student. The
absence of D indicates that the generalisation hierarchy is not disjoint.

Figure 5.17: Generalisation hierarchy with non-disjoint constraint


What does completeness (C) mean?

Completeness (C) means that every entity of a supertype must be an entity in
one of the subtypes in the generalisation hierarchy.

In Figure 5.18, the completeness constraint means that every Staff member must
be employed as either FullTime or PartTime Staff. To show the completeness
constraint, C is used as shown in Figure 5.18.

Figure 5.18: Generalisation hierarchy with the completeness constraint

In contrast, a generalisation hierarchy is not complete if an entity of the
supertype does not have to belong to any of the subtypes. Consider the Employee
generalisation hierarchy shown in Figure 5.16, where the employee roles are
pilot, mechanic and accountant. If an employee's job role is office administrator,
this entity would not fall into any of the subtypes, as it would not have any
specialised attributes. Therefore, the office administrator would remain only in
the supertype entity, Employee. The absence of C indicates that the
generalisation hierarchy is not complete.


Some generalisation hierarchies have both the disjoint and complete constraints
as shown in Figure 5.19.

Figure 5.19: Generalisation hierarchy with disjointness and completeness constraints

The disjointness and completeness constraints give us four categories of
generalisation hierarchy, as shown in Table 5.3.

Table 5.3: Four Categories of Generalisation Hierarchy

Generalisation Hierarchy Category   Indication
Disjoint and complete               Presence of D and C
Non-disjoint and complete           Presence of C only
Disjoint and not complete           Presence of D only
Non-disjoint and not complete       Absence of both D and C
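The two constraints can be checked mechanically against a given set of entities. The hypothetical Python function below takes the identifiers in a supertype and in each subtype and returns the Table 5.3 category that this particular data satisfies. A single snapshot can only show that a constraint is broken; whether D or C applies is a business rule decided by the designer. All identifiers are invented, with E4 standing for the office administrator discussed above.

```python
def classify(supertype, subtypes):
    """Classify one snapshot of a generalisation hierarchy using Table 5.3's categories.

    supertype -- set of identifiers of all entities in the supertype
    subtypes  -- dict mapping each subtype name to the set of identifiers it contains
    """
    members = list(subtypes.values())
    # Disjoint: no entity appears in two different subtypes.
    disjoint = all(a.isdisjoint(b)
                   for i, a in enumerate(members) for b in members[i + 1:])
    # Complete: every supertype entity appears in at least one subtype.
    complete = supertype <= set().union(*members)
    if disjoint and complete:
        return "Disjoint and complete"
    if complete:
        return "Non-disjoint and complete"
    if disjoint:
        return "Disjoint and not complete"
    return "Non-disjoint and not complete"


# Hypothetical employees: E4 is an office administrator with no subtype.
employee_category = classify({"E1", "E2", "E3", "E4"},
                             {"Pilot": {"E1"}, "Mechanic": {"E2"}, "Accountant": {"E3"}})
# Hypothetical staff: everyone is either full time or part time, never both.
staff_category = classify({"S1", "S2"}, {"FullTime": {"S1"}, "PartTime": {"S2"}})
# Hypothetical faculty members: M2 is both academic staff and student; M3 is neither.
faculty_category = classify({"M1", "M2", "M3"},
                            {"AcademicStaff": {"M1", "M2"}, "Student": {"M2"}})
print(employee_category, staff_category, faculty_category, sep="\n")
```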


ACTIVITY 5.1
1. Consider the Company database, which keeps track of a
company's employees, departments and projects:

(a) The company is organised into departments. Each
department has a unique name, a unique number and a
particular employee who manages the department;

(b) A department controls a number of projects, each of which
has a unique name and a unique number;

(c) We store each employee's name, national card ID, address,
salary and birth date. An employee is assigned to one
department but may work on several projects, which are not
necessarily controlled by the same department. We keep track
of the number of hours per week that the employee works on
each project. A project may involve more than one employee;
and

(d) We want to keep track of the dependents of each employee
for insurance purposes. We keep each dependent's first
name, birth date and relationship to the employee.

Answer the following questions about the Company database and
state clearly any assumptions that you make:

(a) List all entities with their attributes. Underline the primary
key. Identify all weak entities; and

(b) Draw the ER diagram for the Company database.

2. Given the many-to-many relationship in Figure 5.20, resolve the
relationship. In your solution, use your own attributes and define
primary keys.

Figure 5.20: ER diagram for Question 2


3. You are going to develop a database that will store information
about journals. Each journal has a journal identification number
and a name. Each journal may have any number of issues (for
example, monthly or quarterly issues). Each issue is identified by
its number and date of issue. Each issue contains a number of
articles. The length of each article, in number of words, is kept,
together with the number of diagrams in the article. Each article
may be written by one or more writers. The writer's name and
address, as well as the fee paid to a writer for an article, are also
recorded. A writer may contribute any number of articles to any
journal.

Draw an ER diagram to represent the given information.

4. Create an ER model for a video store using the following rules:

(a) Customers of the video store are assigned a unique customer
number when they make their first rental. In addition to the
customer number, other information such as name and
address is also recorded;

(b) Each videotape that the store owns is identified by a unique
code. Other information about the video includes the date of
purchase;

(c) When a customer selects a video to rent, the store needs to
record the rent date, rent time, return date and total charges.
The cost for each video rented is RM3. A customer can rent
several videos at a time;

(d) The store owns several videos with the same movie title.
A unique identifier is assigned to each movie title. Other
information on movies includes the title and the year
produced; and

(e) Each movie title is associated with a list of actors and one or
more directors. The store has a unique code to identify each
actor and director. In addition to the actor and director codes,
other basic information on actors and directors is stored.
Using this information, the store can easily find movies by
actor or director.


5. You are assigned to design a Hospital Management System for
Klang Valley Medical Centre. The following are the requirements
of the system:

There are three types of employees in this hospital: physicians
(medical doctors), nurses and administrators. Unlike
administrators, physicians and nurses have special attributes. A
physician has a qualification and an area of expertise. A nurse has
a position and the ward_id of the ward where he or she is placed. A
physician treats many patients and a patient can be treated by
more than one physician. Each treatment has prescriptions. A
prescription has a prescription_id, date, product_code, dosage
and amount. A patient can be placed in a ward. A ward is serviced
by several nurses. The ward information includes the ward
number, building, ward_type and number of beds.

Draw an ER diagram to represent the given information.

SUMMARY

• The Crow's Foot notation is a notation for entity-relationship diagrams. This
topic described its symbols, important relationship patterns and
generalisation hierarchies.

• The basic symbols are entities, relationships, attributes and cardinalities,
which show the number of entities participating in a relationship.

• The relationship patterns described here are many-to-many relationships,
identifying relationships that provide part of the primary key of weak
entities, and self-referencing (recursive) relationships.

• Generalisation hierarchies classify entities by grouping their similarities in
a supertype and their differences (specialisations) in subtypes.

• The ER diagram notations provide a solid background for applying them to
business problems. To master data modelling, you need to understand the
ER diagram notations and obtain sufficient practice in developing ER
diagrams.


KEY TERMS

Attribute
Cardinalities
Completeness
Constraint
Crow's Foot notation
Entity
Entity-relationship (ER) model
Relationship

SELF-TEST

1. Discuss what an entity represents in an ER model and give examples of
entities with a physical existence.

2. Discuss how relationships are represented in an ER model and give
examples of unary, binary and ternary relationships.

3. Discuss how multiplicity can be represented as both cardinality and
participation constraints on a relationship type.

4. Create an ER diagram for each of the following descriptions:

(a) Each company operates three divisions and each division belongs
to one company;

(b) Each division in (a) employs one or more employees and each
employee works for one division;

(c) Each of the employees in (b) may or may not have one or more
dependents, and each dependent belongs to one employee;

(d) Each employee in (c) may or may not have an employment history;
and

(e) Represent all the ER diagrams described in (a), (b), (c) and (d) as a
single ER diagram.


REFERENCES

Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to
design, implementation and management (5th ed.). Boston, MA: Addison-Wesley.

Mannino, M. (2011). Database design, application development and
administration (5th ed.). Scottsdale, AZ: Ediyu.

Rob, P., & Coronel, C. (2011). Database systems: Design, implementation and
management (8th ed.). Stamford, CT: Cengage Learning.

Topic 6 Normalisation
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Identify the importance of normalisation in the database design;
2. Discuss the problems related to data redundancy;
3. Explain the characteristics of functional dependency which
describes the relationship between attributes;
4. Identify the functional dependency concept in normalisation;
5. Discuss the characteristics of three normal forms; and
6. Describe the normalisation process up to the third normal form in the
design of a database.

INTRODUCTION
In this topic, we introduce the concept of normalisation and explain its
importance in database design. Next, we present the potential problems in
database design, which are also referred to as update anomalies. One of the
main goals of normalisation is to produce a set of relations that is free from
update anomalies. Then we examine functional dependency, the key concept
underlying the normalisation process. Normalisation is a step-by-step process
that works through a series of normal forms. This topic covers the
normalisation process up to the third normal form.


6.1 THE PURPOSE OF NORMALISATION


What is normalisation?

Normalisation is a multi-step process aimed at reducing data redundancy and helping to eliminate the data anomalies that can result from such redundancy.

Normalisation is a technique used when designing a database. It works through a series of stages, described as normal forms.

The first three stages are referred to as:

(a) First normal form (1NF);

(b) Second normal form (2NF); and

(c) Third normal form (3NF).

The concept of normalisation was first developed and documented by E. F. Codd (1972). There are two goals of the normalisation process:

(a) Eliminating redundant data (for example, storing the same data in more
than one table); and

(b) Ensuring data dependencies make sense (only storing related data in a
table).

Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is stored logically. If our database is not normalised, it can be inaccurate and inefficient, and it might not produce the data we expect. A normalised database also makes queries, forms and reports much easier to design.

SELF-CHECK 6.1
1. Define normalisation.

2. Identify two purposes of normalisation.


6.2 HOW NORMALISATION SUPPORTS DATABASE DESIGN
Normalisation involves the analysis of functional dependencies between attributes (or data items). It helps us decide which attributes should be grouped together in a relation.

Why normalisation?

Normalisation is about designing a "good" database, that is, a set of related tables with a minimum of redundant data and no update, delete or insert anomalies.

Normalisation is a "bottom-up" approach to database design. The designer interviews users and collects documents such as reports. The data on a report can be listed and then normalised to produce the required tables and attributes.

Normalisation is also used to repair a "bad" database design: given a set of tables that exhibit update, delete and insert anomalies, the normalisation process can change them into a set of tables that does not have these problems.

SELF-CHECK 6.2

1. Briefly explain how normalisation supports database design.

2. Is normalisation a "bottom-up" or "top-down" approach to database design? Briefly explain.

6.3 DATA REDUNDANCY AND UPDATE ANOMALIES
What does data redundancy mean?

Data redundancy refers to the unnecessary duplication of data within a database.


Redundant data wastes storage space and can also cause problems when the database is updated. These problems, called update anomalies, may lead to data inconsistency and inaccuracy.

As mentioned earlier, a main aim of database design is to eliminate data redundancy. To do so, you must take special care to organise the data in your database. Normalisation is a method of organising your data, as it helps you decide which attributes should be grouped together in a relation.

To illustrate how data redundancy causes update anomalies, let us compare the Supplier and Product relations shown in Figure 6.1 with an alternative format that combines these relations into a single relation called the Product-Supplier relation, as shown in Figure 6.2. For the Supplier relation, the supplier number (SuppNo) is the primary key, and for the Product relation, the product number (ProductNo) is the primary key. For the Product-Supplier relation, ProductNo is chosen as the primary key.

Figure 6.1: Supplier and product relation


Figure 6.2: Product-Supplier relation

You should notice that in the Product-Supplier relation, details of the supplier are included for each product. These supplier details (the SName, TelNo and ContactPerson attributes) are repeated unnecessarily for every product supplied by the same supplier, which leads to data redundancy. For instance, products P2344 and P2346 have the same supplier, so the same supplier details are repeated for both products. These repeated supplier detail attributes are also considered a repeating group.

On the other hand, in the Product relation, only the supplier number is repeated, in order to link each product to a supplier, and in the Supplier relation, the details of each supplier appear only once.

A relation with data redundancy, such as the one shown in Figure 6.2, may suffer from update anomalies, which comprise insertion, deletion and modification anomalies. In the following subtopics, we illustrate each of these anomalies using the Product-Supplier relation.

6.3.1 Insertion Anomalies


How do insertion anomalies exist?

Insertion anomalies exist when adding a new record causes unnecessary data redundancy, or when unnecessary constraints are placed on the task of adding new records.


There are two examples of insertion anomalies for the Product-Supplier relation in Figure 6.2:

(a) Adding a New Supplier

Since the information about products and suppliers is combined in a single relation, it is not possible to add a new supplier without entering values for the product attributes, such as the product number. This is because the product number is the primary key of the relation, and based on the entity integrity rule, a null value is not allowed for a primary key. In other words, we cannot add a new supplier unless we assign a product to that new supplier. This kind of problem is an example of an insertion anomaly.

(b) Adding a New Product for an Existing Supplier

When we want to insert a new product that is supplied by an existing supplier, we need to ensure that the details of the supplier (the repeating group) are entered accurately and are consistent with the existing stored values. For instance, to insert a new product supplied by S9990, we must ensure that the details of supplier S9990 are entered accurately and are consistent with the values for supplier S9990 in other tuples of the Product-Supplier relation. In a properly normalised database, this insertion anomaly can be avoided, as we only need to enter the supplier number for each product in the Product relation, and the details of the supplier are entered only once in the Supplier relation.
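The first insertion anomaly can be reproduced in a minimal sketch using Python's built-in sqlite3 module. Figure 6.2 is not reproduced in this text, so apart from the product and supplier numbers mentioned above, the sample rows are hypothetical: the supplier names, telephone numbers, contact persons, the assignment of products to suppliers and the new supplier S1234 are all placeholders. In SQLite, a TEXT primary key column must also be declared NOT NULL for a null key to be rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Combined Product-Supplier relation; ProductNo is the primary key.
# SQLite needs the explicit NOT NULL to enforce entity integrity on a TEXT key.
conn.execute("""
    CREATE TABLE ProductSupplier (
        ProductNo     TEXT NOT NULL PRIMARY KEY,
        ProdName      TEXT,
        SuppNo        TEXT,
        SName         TEXT,
        TelNo         TEXT,
        ContactPerson TEXT
    )""")
# Hypothetical sample rows (placeholder supplier details).
conn.executemany(
    "INSERT INTO ProductSupplier VALUES (?, ?, ?, ?, ?, ?)",
    [("P2344", "17 inch Monitor", "S9990", "Supplier A", "03-1111111", "Contact A"),
     ("P2346", "19 inch Monitor", "S9990", "Supplier A", "03-1111111", "Contact A")])

def add_supplier_without_product(conn):
    """Try to record a new (hypothetical) supplier S1234 that supplies no product yet."""
    try:
        conn.execute(
            "INSERT INTO ProductSupplier (ProductNo, SuppNo, SName, TelNo, ContactPerson) "
            "VALUES (NULL, 'S1234', 'Supplier B', '03-2222222', 'Contact B')")
        return True
    except sqlite3.IntegrityError:
        # Rejected: the primary key ProductNo cannot be null.
        return False

print(add_supplier_without_product(conn))  # False
```

The insert fails with sqlite3.IntegrityError, mirroring the entity integrity rule: the new supplier cannot be stored until a product number is supplied.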

6.3.2 Deletion Anomalies


How does a deletion anomaly exist?

A deletion anomaly exists when deleting a record also removes data that was not intended for deletion.

In this case, when we want to delete a product from the Product-Supplier relation, details about its supplier would also be removed from the database. There is a possibility that we are deleting the only tuple that we have for a particular supplier. For instance, if we delete product P5443, the details of supplier "S9898" would also be removed from the database.


As a result, we lose all the information about this supplier, because supplier S9898 appears only in the tuple that we removed. In a properly normalised database, this deletion anomaly can be avoided, as the information about suppliers and products is stored in separate relations that are linked by the supplier number. Therefore, when we delete product P5443 from the Product relation, the details of supplier S9898 in the Supplier relation are not affected.
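A similar hypothetical sketch contrasts the two designs (only P2344, P5443, S9990 and S9898 come from the text; the supplier names are placeholders). Deleting P5443 from the combined relation loses supplier S9898 entirely, while the separate Supplier relation keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Combined design: supplier details live only inside product tuples.
conn.execute("CREATE TABLE ProductSupplier "
             "(ProductNo TEXT NOT NULL PRIMARY KEY, SuppNo TEXT, SName TEXT)")
conn.executemany("INSERT INTO ProductSupplier VALUES (?, ?, ?)",
                 [("P2344", "S9990", "Supplier A"), ("P5443", "S9898", "Supplier B")])

# Decomposed design: supplier details stored once and linked by SuppNo.
conn.execute("CREATE TABLE Supplier (SuppNo TEXT NOT NULL PRIMARY KEY, SName TEXT)")
conn.execute("CREATE TABLE Product (ProductNo TEXT NOT NULL PRIMARY KEY, "
             "SuppNo TEXT REFERENCES Supplier(SuppNo))")
conn.executemany("INSERT INTO Supplier VALUES (?, ?)",
                 [("S9990", "Supplier A"), ("S9898", "Supplier B")])
conn.executemany("INSERT INTO Product VALUES (?, ?)",
                 [("P2344", "S9990"), ("P5443", "S9898")])

def supplier_known(table, supp_no):
    """True if the given table still holds any tuple for supp_no."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE SuppNo = ?"  # table is a fixed literal here
    return conn.execute(sql, (supp_no,)).fetchone()[0] > 0

# Delete product P5443 in both designs.
conn.execute("DELETE FROM ProductSupplier WHERE ProductNo = 'P5443'")
conn.execute("DELETE FROM Product WHERE ProductNo = 'P5443'")

print(supplier_known("ProductSupplier", "S9898"))  # False: supplier details lost
print(supplier_known("Supplier", "S9898"))         # True: supplier details kept
```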

6.3.3 Modification Anomalies


How does a modification anomaly exist?

A modification anomaly exists when modifying a specific value requires the same modification to be made in other records or tables.

Redundant information not only wastes storage but also makes updates more difficult. This difficulty is called a modification anomaly.

For example, changing the name of the contact person for supplier S9990 requires every tuple containing supplier S9990 to be updated. If, for some reason, not all of these tuples are updated, the database may end up holding two different contact person names for supplier S9990.

Since our example is only dealing with a small relation, it does not seem to be a
big problem. However, its effect would be very significant when we are dealing
with a very large database.
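The following sketch, again on hypothetical rows, shows how an incomplete update leaves two contact persons recorded for supplier S9990. That S9990 supplies these three products, and the contact names, are assumptions made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ProductSupplier (
    ProductNo TEXT NOT NULL PRIMARY KEY, SuppNo TEXT, ContactPerson TEXT)""")
# Hypothetical rows: S9990's contact person is stored once per product.
conn.executemany("INSERT INTO ProductSupplier VALUES (?, ?, ?)",
                 [("P2344", "S9990", "Contact A"),
                  ("P2346", "S9990", "Contact A"),
                  ("P4590", "S9990", "Contact A")])

def contact_names(supp_no):
    """Distinct contact persons currently recorded for one supplier."""
    rows = conn.execute(
        "SELECT DISTINCT ContactPerson FROM ProductSupplier "
        "WHERE SuppNo = ? ORDER BY ContactPerson", (supp_no,)).fetchall()
    return [r[0] for r in rows]

# Incomplete update: only the tuple for P2344 is changed.
conn.execute("UPDATE ProductSupplier SET ContactPerson = 'Contact B' "
             "WHERE ProductNo = 'P2344'")
print(contact_names("S9990"))  # ['Contact A', 'Contact B'] -> inconsistent

# A correct update must touch every tuple that holds S9990's details.
cur = conn.execute("UPDATE ProductSupplier SET ContactPerson = 'Contact B' "
                   "WHERE SuppNo = 'S9990'")
print(cur.rowcount, contact_names("S9990"))  # 3 ['Contact B']
```

In a separate Supplier relation, the same change would be a single-row update.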

Similar to insertion and deletion anomalies, we may avoid the modification anomaly by having a properly normalised database. In our examples, these update anomalies arise primarily because the Product-Supplier relation holds information about both products and suppliers. One solution to this problem is to decompose the relation into two smaller relations, Product and Supplier.
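The decomposition can be sketched in SQL, run here through Python's sqlite3 module on hypothetical rows. Supplier details are copied out once per supplier with SELECT DISTINCT, and Product keeps only SuppNo as the link. Joining the two relations on SuppNo returns exactly the original tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductSupplier (ProductNo TEXT NOT NULL PRIMARY KEY, "
             "ProdName TEXT, SuppNo TEXT, SName TEXT, ContactPerson TEXT)")
# Hypothetical rows: supplier details repeat for every product of the same supplier.
conn.executemany("INSERT INTO ProductSupplier VALUES (?, ?, ?, ?, ?)", [
    ("P2344", "17 inch Monitor", "S9990", "Supplier A", "Contact A"),
    ("P2346", "19 inch Monitor", "S9990", "Supplier A", "Contact A"),
    ("P5443", "Colour Laser Printer", "S9898", "Supplier B", "Contact B"),
])

# Supplier: one tuple per supplier (DISTINCT removes the repeated details).
conn.execute("CREATE TABLE Supplier AS "
             "SELECT DISTINCT SuppNo, SName, ContactPerson FROM ProductSupplier")
# Product: keeps only SuppNo as the link to Supplier.
conn.execute("CREATE TABLE Product AS "
             "SELECT ProductNo, ProdName, SuppNo FROM ProductSupplier")

# Joining the two relations on SuppNo gives back the original tuples.
rejoined = conn.execute(
    "SELECT p.ProductNo, p.ProdName, s.SuppNo, s.SName, s.ContactPerson "
    "FROM Product p JOIN Supplier s ON p.SuppNo = s.SuppNo "
    "ORDER BY p.ProductNo").fetchall()
original = conn.execute("SELECT * FROM ProductSupplier ORDER BY ProductNo").fetchall()

print(conn.execute("SELECT COUNT(*) FROM Supplier").fetchone()[0])  # 2
print(rejoined == original)  # True
```

Two caveats apply. CREATE TABLE ... AS SELECT does not copy primary key or foreign key constraints, so a real schema would declare them explicitly. Also, SELECT DISTINCT yields one tuple per supplier only because SName and ContactPerson depend on SuppNo; if the combined relation already held inconsistent details for a supplier, the new Supplier relation would contain more than one tuple for it.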

Before we discuss the details of the normalisation process, let us look at the
functional dependency concept, which is an important concept in the
normalisation process.


SELF-CHECK 6.3
1. Briefly explain data redundancy.
2. Give one example of how data redundancy can cause update
anomalies.
3. Briefly differentiate between insertion anomalies, deletion
anomalies and modification anomalies.

6.4 FUNCTIONAL DEPENDENCIES


Functional dependency is an important concept underlying the normalisation
process.

Functional dependency describes the relationship between attributes (columns) in a relation.

In this subtopic, we explain the characteristics and types of functional dependency that are important for the normalisation process. For our discussion of this concept, we will refer to the CustomerOrdering relational schema shown in Figure 6.3.

Figure 6.3: CustomerOrdering relation


The details of this relation are shown in Table 6.1.

Table 6.1: CustomerOrdering Relation

CustNo  CustName      TelNo        OrderNo  OrderDate    ProductNo  ProdName              UnitPrice  QtyOrdered
C3340   Bakar Nordin  017-6891122  6234     16-Apr-2013  P2346      19 inch Monitor       250        4
C1010   Fong Kim Lee  012-5677118  1120     23-Jan-2013  P4590      Laser Printer         650        2
C1010   Fong Kim Lee  012-5677118  1120     23-Jan-2013  P6677      Colour Scanner        350        2
C1010   Fong Kim Lee  012-5677118  1120     23-Jan-2013  P2344      17 inch Monitor       200        3
C2388   Jaspal Singh  013-3717071  4399     19-Feb-2013  P2344      17 inch Monitor       200        2
C2388   Jaspal Singh  013-3717071  4399     19-Feb-2013  P5443      Colour Laser Printer  750        5
C4455   Daud Osman    017-7781256  9503     02-May-2013  P2344      17 inch Monitor       200        10

6.4.1 Characteristics of Functional Dependencies


Before we look into the normalisation process, let us first understand the concept and characteristics of functional dependency, which are crucial to understanding the normalisation process. As mentioned earlier, functional dependency describes the relationship between attributes in a relation, in which one attribute or group of attributes determines the value of another.

For a simple illustration of this concept, let us use a relation with attributes A and B. B is functionally dependent on A if each value of A is associated with exactly one value of B. This dependency between A and B is written as "A → B".
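The definition can be checked mechanically against sample data. The minimal Python sketch below (the function name fd_holds and the toy rows are illustrative assumptions) tests whether A → B holds in a list of rows. Sample data can only disprove a functional dependency; whether a dependency holds in general depends on the meaning of the attributes, as discussed in Subtopic 6.4.2.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, each value of the lhs attributes is
    associated with exactly one value of the rhs attributes (lhs -> rhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False  # the same lhs value appears with two different rhs values
    return True

rows = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"},
        {"A": 1, "B": "x"}, {"A": 3, "B": "y"}]
print(fd_holds(rows, ["A"], ["B"]))  # True: each A value has exactly one B value
print(fd_holds(rows, ["B"], ["A"]))  # False: B = "y" appears with A = 2 and A = 3
```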


We may think of how to determine functional dependency like this: given a value for attribute A, can we determine the single value of B? If B relies on A in this way, then A is said to functionally determine B. The functional dependency between attributes A and B is represented diagrammatically in Figure 6.4.

Figure 6.4: Functional dependency between A and B

An attribute or group of attributes on the left-hand side of the arrow of a functional dependency is referred to as a determinant.

In Figure 6.4, A is the determinant. Thus, we may say, "A functionally determines B".

Now, let us look at the CustomerOrdering relation shown in Figure 6.3 to find the functional dependencies. First, we consider the attributes CustNo and CustName. A specific CustNo can only be associated with one value of CustName. In other words, the relationship between CustNo and CustName is one-to-one (for each customer number, there is only one name). Thus, we can say that CustNo determines CustName, or that CustName is functionally dependent on CustNo. This dependency can be written as CustNo → CustName.

Another example is the relationship between CustNo and OrderNo. Based on the
CustomerOrdering relation, a customer may make more than one order. Thus, a
CustNo may be associated with more than one OrderNo. In other words, the
relationship between CustNo and OrderNo is one-to-many as illustrated in
Figure 6.5. In this case, we can say that OrderNo is not functionally dependent on
CustNo.

Figure 6.5: OrderNo is not functionally dependent on CustNo


Now, let us examine the opposite direction of the relationship. Is CustNo functionally dependent on OrderNo? That is, is a specific OrderNo associated with only one value of CustNo? Here, each OrderNo is associated with only one CustNo, as illustrated in Figure 6.6. Thus, OrderNo determines CustNo, or CustNo is functionally dependent on OrderNo, which can be written as OrderNo → CustNo.

Figure 6.6: CustNo is functionally dependent on OrderNo
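The same check (the helper fd_holds is an illustrative assumption, repeated so the sketch is self-contained) can be applied to the CustNo, CustName and OrderNo columns of Table 6.1. It also exposes a limitation: because every customer in this small sample happens to have placed only one order, the data alone does not refute CustNo → OrderNo. A hypothetical second order (OrderNo 1121, not part of Table 6.1) is added to show the one-to-many case of Figure 6.5.

```python
def fd_holds(rows, lhs, rhs):
    """True if each lhs value is associated with only one rhs value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

COLS = ("CustNo", "CustName", "OrderNo")
# CustNo, CustName and OrderNo columns of Table 6.1.
table_6_1 = [dict(zip(COLS, r)) for r in [
    ("C3340", "Bakar Nordin", 6234),
    ("C1010", "Fong Kim Lee", 1120),
    ("C1010", "Fong Kim Lee", 1120),
    ("C1010", "Fong Kim Lee", 1120),
    ("C2388", "Jaspal Singh", 4399),
    ("C2388", "Jaspal Singh", 4399),
    ("C4455", "Daud Osman", 9503),
]]

print(fd_holds(table_6_1, ["CustNo"], ["CustName"]))  # True
print(fd_holds(table_6_1, ["OrderNo"], ["CustNo"]))   # True
# Each customer here happens to have one order, so the sample cannot refute this.
print(fd_holds(table_6_1, ["CustNo"], ["OrderNo"]))   # True (misleading)

# Hypothetical second order for C1010, allowed by the business rule.
extra = dict(zip(COLS, ("C1010", "Fong Kim Lee", 1121)))
print(fd_holds(table_6_1 + [extra], ["CustNo"], ["OrderNo"]))  # False
```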

Additional characteristics of functional dependency that are important for the normalisation process are as follows:

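(a) Full Functional Dependency

Indicates that if A and B are attributes (columns) of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A. For example, OrderNo → CustNo is a full functional dependency because its determinant is a single attribute, and (OrderNo, ProductNo) → QtyOrdered is full because neither OrderNo nor ProductNo alone determines QtyOrdered.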

(b) Partial Functional Dependency

Indicates that if A and B are attributes of a relation, B is partially dependent on A if there is some attribute that can be removed from A and the dependency still holds.

Take, for example, the following functional dependency that exists in the CustomerOrdering relation: (OrderNo, ProductNo) → CustNo. CustNo is functionally dependent on a proper subset of (OrderNo, ProductNo), namely OrderNo.

(c) Transitive Functional Dependency

A condition where A, B and C are attributes of a relation such that if B is functionally dependent on A (A → B) and C is functionally dependent on B (B → C), then C is transitively dependent on A via B.


Consider the following functional dependencies that exist in the CustomerOrdering relation:

OrderNo → CustNo
CustNo → CustName
OrderNo → CustName

So, the OrderNo attribute functionally determines CustName via the CustNo attribute; that is, CustName is transitively dependent on OrderNo.
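The three characteristics can be sketched as checks on the relevant columns of Table 6.1. The helper names fd_holds, is_partial and is_transitive are illustrative assumptions. is_partial removes one attribute at a time from the determinant, which is sufficient: if some proper subset of A determines B, then so does A with one attribute removed, because any superset of a determinant is also a determinant. is_transitive mirrors the simple definition given above; fuller textbook definitions also require that A is not functionally dependent on B, which a small sample cannot establish.

```python
def fd_holds(rows, lhs, rhs):
    """True if each lhs value is associated with only one rhs value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_partial(rows, lhs, rhs):
    """lhs -> rhs holds and still holds after removing some attribute from lhs."""
    return fd_holds(rows, lhs, rhs) and any(
        fd_holds(rows, [a for a in lhs if a != dropped], rhs) for dropped in lhs)

def is_transitive(rows, a, b, c):
    """c depends on a via b, in the simple form used above: a -> b and b -> c."""
    return fd_holds(rows, a, b) and fd_holds(rows, b, c)

COLS = ("OrderNo", "ProductNo", "CustNo", "CustName", "QtyOrdered")
# Relevant columns of Table 6.1.
rows = [dict(zip(COLS, r)) for r in [
    (6234, "P2346", "C3340", "Bakar Nordin", 4),
    (1120, "P4590", "C1010", "Fong Kim Lee", 2),
    (1120, "P6677", "C1010", "Fong Kim Lee", 2),
    (1120, "P2344", "C1010", "Fong Kim Lee", 3),
    (4399, "P2344", "C2388", "Jaspal Singh", 2),
    (4399, "P5443", "C2388", "Jaspal Singh", 5),
    (9503, "P2344", "C4455", "Daud Osman", 10),
]]
key = ["OrderNo", "ProductNo"]
print(is_partial(rows, key, ["CustNo"]))      # True: OrderNo alone determines CustNo
print(is_partial(rows, key, ["QtyOrdered"]))  # False: a full functional dependency
print(is_transitive(rows, ["OrderNo"], ["CustNo"], ["CustName"]))  # True
```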

6.4.2 Identifying Functional Dependencies


Identifying functional dependencies can be difficult and confusing if we do not understand the meaning of each attribute and the relationships between the attributes. This information should first be gathered from the users or owners of the system to be built before the functional dependencies can be identified. The user's requirements specification, the business model and the rules of the enterprise are good sources of this information.

Now, let us list all the possible functional dependencies for the CustomerOrdering relation. We will get the list of functional dependencies shown in Figure 6.7.

CustNo → CustName, TelNo
OrderNo → CustNo, CustName, TelNo, OrderDate
ProductNo → ProdName, UnitPrice
OrderNo, ProductNo → QtyOrdered, CustNo, CustName, TelNo

Figure 6.7: List of functional dependencies

We may write the functional dependencies by grouping them together based on their determinants, as in Figure 6.7, or we may list each of them separately (for example, CustNo → CustName and CustNo → TelNo).

There are four determinants in the CustomerOrdering relation: CustNo, OrderNo, ProductNo and (OrderNo, ProductNo). We have to ensure that, for each functional dependency, the left-hand side determinant is associated with only a single value of the right-hand side attribute(s).


6.4.3 Identifying the Primary Key for a Relation Using Functional Dependencies
In our previous discussion, we identified a list of functional dependencies for the CustomerOrdering relation by analysing the relationships between attributes. Besides identifying the determinants, functional dependencies also help us specify integrity constraints and thus identify the primary key for a relation. Before we can select a primary key, we need to identify the possible candidate keys.

In order to find the candidate key(s), we must identify the attribute (or group of
attributes) that uniquely identifies each tuple in a relation. Therefore, to identify
the possible choices of candidate keys, we should examine the determinants for
each functional dependency. Then, we select one of them (if more than one) as
the primary key. All attributes that are not the primary key attribute are referred
to as non-key attributes. These non-key attributes must be functionally
dependent on the key.

Now, let us identify the candidate keys for the CustomerOrdering relation. We have identified the functional dependencies for this relation, as given in Figure 6.7. The determinants for these functional dependencies are CustNo, OrderNo, ProductNo and (OrderNo, ProductNo). From this list of determinants, (OrderNo, ProductNo) is the only determinant that uniquely identifies each tuple in the relation. It is also true that all attributes other than OrderNo and ProductNo are functionally dependent on the combined determinant (OrderNo, ProductNo). Thus, it is the candidate key and the primary key for the CustomerOrdering relation.
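The candidate key search can be sketched with the standard attribute closure algorithm, used here as a mechanical check rather than taken from the text: starting from a determinant, repeatedly add every attribute that a functional dependency from Figure 6.7 allows, until nothing changes. A determinant whose closure contains every attribute of the relation is a superkey. The names FDS, ALL_ATTRS and closure are illustrative.

```python
# Functional dependencies from Figure 6.7, as (determinant, dependent attributes).
FDS = [
    ({"CustNo"}, {"CustName", "TelNo"}),
    ({"OrderNo"}, {"CustNo", "CustName", "TelNo", "OrderDate"}),
    ({"ProductNo"}, {"ProdName", "UnitPrice"}),
    ({"OrderNo", "ProductNo"}, {"QtyOrdered", "CustNo", "CustName", "TelNo"}),
]
ALL_ATTRS = {"CustNo", "CustName", "TelNo", "OrderNo", "OrderDate",
             "ProductNo", "ProdName", "UnitPrice", "QtyOrdered"}

def closure(attrs, fds):
    """All attributes functionally determined by attrs (attribute closure)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already in the result, add its dependents.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# A determinant is a superkey if its closure covers every attribute.
for lhs, _ in FDS:
    print(sorted(lhs), closure(lhs, FDS) == ALL_ATTRS)
# ['CustNo'] False
# ['OrderNo'] False
# ['ProductNo'] False
# ['OrderNo', 'ProductNo'] True
```

(OrderNo, ProductNo) is also minimal, since neither OrderNo nor ProductNo alone reaches every attribute, so it is a candidate key and not just a superkey.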

In this subtopic, we have shown how functional dependency helps us identify the primary key for a given relation. Understanding this concept is fundamental to the normalisation process, which is discussed next.


6.5 THE PROCESS OF NORMALISATION


Firstly, what does database normalisation mean?

Database normalisation is the process of organising and decomposing an inefficiently structured relation into smaller, more efficiently structured relations.

In other words, the process of normalisation involves determining what data should be stored in each relation, with the aim of minimising data redundancy and update anomalies. It analyses relations using the functional dependencies that exist in them and their primary key or candidate keys.

The normalisation process involves a series of steps, and each step is called a normal form. Three normal forms were initially proposed: first normal form (1NF), second normal form (2NF) and third normal form (3NF).

Subsequently, Boyce and Codd introduced a stronger definition of 3NF called Boyce-Codd Normal Form (BCNF). With the exception of 1NF, all these normal forms are based on functional dependencies among the attributes of a relation.

Higher normal forms that go beyond BCNF were introduced later, such as fourth
normal form (4NF) and fifth normal form (5NF). However, these later normal
forms deal with situations that are very rare. In this topic, we will only cover the
first three normal forms. Figure 6.8 illustrates the process of normalisation up to
the third normal form.


Figure 6.8: Diagrammatic illustration of the normalisation process


Source: Connolly & Begg (2009)

As illustrated in Figure 6.8, all the information gathered about attributes is first transferred into a table format. This table is described as being in unnormalised form (UNF). From here, we test the table against each normal form step by step until we produce a set of relations that fulfils the requirements of 3NF. The definitions of the unnormalised form (UNF) and of the first, second and third normal forms can be found in Table 6.2.


Table 6.2: Definition for Normal Forms

Normal Form                Definition
Unnormalised Form (UNF)    A table that contains one or more repeating groups
First Normal Form (1NF)    A relation in which the intersection of each row and column contains one and only one value (atomic value)
Second Normal Form (2NF)   A relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on the primary key
Third Normal Form (3NF)    A relation that is in first and second normal form and in which no non-primary-key attribute is transitively dependent on the primary key

The details of the process are discussed in the following subtopics. Let us assume that we have transferred all the required attributes from the user requirements specification into a table format, which we refer to as the CustomerOrdering table, as shown in Table 6.3. We are going to use the CustomerOrdering table to illustrate the normalisation process.

Table 6.3: Unnormalised CustomerOrdering Relation

CustNo  CustName      TelNo        OrderNo  OrderDate    ProductNo  ProdName              UnitPrice  QtyOrdered
C3340   Bakar Nordin  017-6891122  6234     16-Apr-2013  P2346      19 inch Monitor       250        4
C1010   Fong Kim Lee  012-5677118  1120     23-Jan-2013  P4590      Laser Printer         650        2
                                                         P6677      Colour Scanner        350        2
                                                         P2344      17 inch Monitor       200        3
C2388   Jaspal Singh  013-3717071  4399     19-Feb-2013  P2344      17 inch Monitor       200        2
                                   4399     19-Feb-2013  P5443      Colour Laser Printer  750        5
C4455   Daud Osman    017-7781256  9503     02-May-2013  P2344      17 inch Monitor       200        10


6.5.1 First Normal Form


First, let us check the CustomerOrdering table and identify whether it is unnormalised or already in 1NF. Based on the definition given in Table 6.2, a table is unnormalised if it contains one or more repeating groups, that is, multi-valued attributes: an attribute or group of attributes that has more than one value for a single instance of an entity.

In order to transform the unnormalised table into a normalised table, two steps need to be performed:

(a) Nominate an attribute or group of attributes to act as the key for the unnormalised table; and

(b) Identify the repeating group(s) in the unnormalised table that repeat for the key attribute(s).

If the table contains repeating groups or multi-valued attributes, we need to remove them. This can be done using either of two approaches:

(a) By entering appropriate data into the empty columns of rows containing the repeating data (filling in the blanks by duplicating the non-repeating data where required); or

(b) By placing the repeating data, along with a copy of the original key attribute(s), into a separate relation.

Then, after performing one of the above approaches, we need to check whether
the relation is in the 1NF or not. In order to do so, we have to follow these rules:

(a) Identify the key attribute;

(b) Identify the repeating groups; and

(c) Place the repeating groups into a separate relation along with a copy of its
determinants.

The process above must be repeated for all the new relations created for the
repeating attributes to ensure that all relations are in 1NF.


For example, let us use the first approach by entering the appropriate value into each cell of the table. Then, we select a primary key for the relation and check for repeating groups. If there is a repeating group, we move it into a new relation.

The first step is to check whether the table is unnormalised or already in 1NF. Using the CustomerOrdering table to illustrate the normalisation process, we first select a key for the table, which is CustNo. Next, we look for repeating groups or multi-valued attributes. We can see that ProductNo, ProdName, UnitPrice and QtyOrdered have more than one value for CustNo = "C1010" and "C2388". So, these attributes form a repeating group, and thus the table is unnormalised.

As illustrated in Figure 6.8, our next step is to transform this unnormalised table into 1NF. First, we need to turn the table into a normalised relation. Let us apply the first approach, in which we fill all the empty cells with the relevant values, as shown in Table 6.4. Each cell in the table now holds an atomic value.

Table 6.4: A Normalised Table

CustNo  CustName      TelNo        OrderNo  OrderDate    ProductNo  ProdName              UnitPrice  QtyOrdered
C3340   Bakar Nordin  017-6891122  6234     16-Apr-2013  P2346      19 inch Monitor       250        4
C1010   Fong Kim Lee  012-5677118  1120     23-Jan-2013  P4590      Laser Printer         650        2
C1010   Fong Kim Lee  012-5677118  1120     23-Jan-2013  P6677      Colour Scanner        350        2
C1010   Fong Kim Lee  012-5677118  1120     23-Jan-2013  P2344      17 inch Monitor       200        3
C2388   Jaspal Singh  013-3717071  4399     19-Feb-2013  P2344      17 inch Monitor       200        2
C2388   Jaspal Singh  013-3717071  4399     19-Feb-2013  P5443      Colour Laser Printer  750        5
C4455   Daud Osman    017-7781256  9503     02-May-2013  P2344      17 inch Monitor       200        10
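The first approach (filling in the blanks) can be sketched in Python by copying the last non-empty value down into each empty cell. None marks an empty cell of Table 6.3; the function name fill_blanks is an illustrative assumption. This simple forward fill is only safe because, in Table 6.3, every blank cell belongs to the same customer and order as the row above it.

```python
COLS = ["CustNo", "CustName", "TelNo", "OrderNo", "OrderDate",
        "ProductNo", "ProdName", "UnitPrice", "QtyOrdered"]
# Table 6.3: None marks an empty cell in the unnormalised table.
UNF = [
    ["C3340", "Bakar Nordin", "017-6891122", 6234, "16-Apr-2013", "P2346", "19 inch Monitor", 250, 4],
    ["C1010", "Fong Kim Lee", "012-5677118", 1120, "23-Jan-2013", "P4590", "Laser Printer", 650, 2],
    [None, None, None, None, None, "P6677", "Colour Scanner", 350, 2],
    [None, None, None, None, None, "P2344", "17 inch Monitor", 200, 3],
    ["C2388", "Jaspal Singh", "013-3717071", 4399, "19-Feb-2013", "P2344", "17 inch Monitor", 200, 2],
    [None, None, None, 4399, "19-Feb-2013", "P5443", "Colour Laser Printer", 750, 5],
    ["C4455", "Daud Osman", "017-7781256", 9503, "02-May-2013", "P2344", "17 inch Monitor", 200, 10],
]

def fill_blanks(rows):
    """Copy the value from the row above into each empty (None) cell."""
    filled, previous = [], [None] * len(COLS)
    for row in rows:
        current = [prev if cell is None else cell for cell, prev in zip(row, previous)]
        filled.append(current)
        previous = current  # the next row fills its blanks from this completed row
    return filled

table_6_4 = fill_blanks(UNF)
print(table_6_4[2][:5])  # ['C1010', 'Fong Kim Lee', '012-5677118', 1120, '23-Jan-2013']
```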

The next step is to check whether the table we have just created is in 1NF. First, we need to identify the primary key for this table and then check for repeating groups. The best approach is to look at the list of functional dependencies that we have identified. From this list, we can see that the combination of OrderNo and ProductNo, (OrderNo, ProductNo), functionally determines all the non-key attributes in the table.


This means that each value of (OrderNo, ProductNo) is associated with only a single value of every other attribute in the table, and (OrderNo, ProductNo) also uniquely identifies each tuple in the relation. Thus, we can conclude that (OrderNo, ProductNo) is the best choice of primary key, since with this key the relation has no repeating group. Therefore, this relation is in 1NF (refer to Table 6.5).

Table 6.5: First Normal Form CustomerOrdering Relation

CustNo  CustName      TelNo        OrderNo  OrderDate    ProductNo  ProdName              UnitPrice  QtyOrdered
C3340   Bakar Nordin  017-6891122  6234     16-Apr-2013  P2346      19 inch Monitor       250        4
C1010   Fong Kim Lee  012-5677118  1120     23-Jan-2013  P4590      Laser Printer         650        2
C1010   Fong Kim Lee  012-5677118  1120     23-Jan-2013  P6677      Colour Scanner        350        2
C1010   Fong Kim Lee  012-5677118  1120     23-Jan-2013  P2344      17 inch Monitor       200        3
C2388   Jaspal Singh  013-3717071  4399     19-Feb-2013  P2344      17 inch Monitor       200        2
C2388   Jaspal Singh  013-3717071  4399     19-Feb-2013  P5443      Colour Laser Printer  750        5
C4455   Daud Osman    017-7781256  9503     02-May-2013  P2344      17 inch Monitor       200        10
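As a quick sanity check on the choice of primary key, the sketch below (is_unique is an illustrative helper) tests which column combinations have no duplicate values in Table 6.5. Uniqueness in one sample is necessary but not sufficient for a key; the functional dependencies remain the deciding evidence.

```python
def is_unique(rows, cols, key):
    """True if no two rows share the same values for the key columns."""
    idx = [cols.index(c) for c in key]
    values = [tuple(row[i] for i in idx) for row in rows]
    return len(values) == len(set(values))

COLS = ["CustNo", "OrderNo", "ProductNo"]
# Key-related columns of Table 6.5.
ROWS = [
    ["C3340", 6234, "P2346"], ["C1010", 1120, "P4590"], ["C1010", 1120, "P6677"],
    ["C1010", 1120, "P2344"], ["C2388", 4399, "P2344"], ["C2388", 4399, "P5443"],
    ["C4455", 9503, "P2344"],
]
for key in (["CustNo"], ["OrderNo"], ["ProductNo"], ["OrderNo", "ProductNo"]):
    print(key, is_unique(ROWS, COLS, key))
# ['CustNo'] False
# ['OrderNo'] False
# ['ProductNo'] False
# ['OrderNo', 'ProductNo'] True
```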

6.5.2 Second Normal Form


For relations to be in 2NF, they must first be in 1NF. They must also have no partial dependencies. A partial dependency occurs when the primary key is made up of more than one attribute (that is, it is a composite primary key) and there exists a non-primary-key attribute that is functionally dependent on only part of the primary key.

These partial dependencies can be removed by moving all of the partially dependent attributes into another relation, along with a copy of their determinant (which is part of the primary key in the original relation).


Let us now transform Table 6.5 into 2NF. The first step is to examine whether the relation has any partial dependency. Since the primary key chosen for the relation is a combination of two attributes, we should check for partial dependency. From the list of functional dependencies, the attributes ProdName and UnitPrice are fully functionally dependent on part of the primary key, namely ProductNo, while CustNo, CustName, TelNo and OrderDate are fully functionally dependent on another part of the primary key, namely OrderNo.

Thus, this relation is not in 2NF, and we need to move these partially dependent attributes into new relations along with a copy of their determinants. Therefore, we move ProdName and UnitPrice into a new relation, along with their determinant, ProductNo. We also move CustNo, CustName, TelNo and OrderDate into another new relation, along with their determinant, OrderNo. After performing this process, the 1NF CustomerOrdering relation is broken down into three relations, which can be named Product, Order and OrderProduct, as listed in Figure 6.9.

Figure 6.9: Second normal form relations derived from the CustomerOrdering relation
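The 2NF decomposition amounts to three projections of Table 6.5, each keeping a determinant together with the attributes that depend on it. In the minimal sketch below, project (an illustrative helper) keeps the named columns and removes duplicate tuples, as a relational projection does. The Product relation then stores each product's details once, even though P2344 appears in three tuples of Table 6.5.

```python
COLS = ["CustNo", "CustName", "TelNo", "OrderNo", "OrderDate",
        "ProductNo", "ProdName", "UnitPrice", "QtyOrdered"]
# Table 6.5: the 1NF CustomerOrdering relation.
ROWS = [
    ("C3340", "Bakar Nordin", "017-6891122", 6234, "16-Apr-2013", "P2346", "19 inch Monitor", 250, 4),
    ("C1010", "Fong Kim Lee", "012-5677118", 1120, "23-Jan-2013", "P4590", "Laser Printer", 650, 2),
    ("C1010", "Fong Kim Lee", "012-5677118", 1120, "23-Jan-2013", "P6677", "Colour Scanner", 350, 2),
    ("C1010", "Fong Kim Lee", "012-5677118", 1120, "23-Jan-2013", "P2344", "17 inch Monitor", 200, 3),
    ("C2388", "Jaspal Singh", "013-3717071", 4399, "19-Feb-2013", "P2344", "17 inch Monitor", 200, 2),
    ("C2388", "Jaspal Singh", "013-3717071", 4399, "19-Feb-2013", "P5443", "Colour Laser Printer", 750, 5),
    ("C4455", "Daud Osman", "017-7781256", 9503, "02-May-2013", "P2344", "17 inch Monitor", 200, 10),
]

def project(rows, cols, keep):
    """Relational projection: keep the named columns, drop duplicate tuples."""
    idx = [cols.index(c) for c in keep]
    return sorted({tuple(row[i] for i in idx) for row in rows})

# Partially dependent attributes move out with a copy of their determinant.
product = project(ROWS, COLS, ["ProductNo", "ProdName", "UnitPrice"])
order = project(ROWS, COLS, ["OrderNo", "CustNo", "CustName", "TelNo", "OrderDate"])
order_product = project(ROWS, COLS, ["OrderNo", "ProductNo", "QtyOrdered"])

print(len(product), len(order), len(order_product))  # 5 4 7
print(product[0])  # ('P2344', '17 inch Monitor', 200)
```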


Since we made changes to the original relation and created two new relations, we need to check that each of these relations is in 2NF. Based on the definition of 2NF, each relation must first pass the 1NF test for repeating groups and then be checked for partial dependency. All these relations are in 1NF, as none of them has repeating groups.

For relations Order and Product, we may skip the partial dependency test as
their primary key only has one attribute. Thus, both of the relations are already
in 2NF. For the OrderProduct relation, there is only one non-key attribute which
is QtyOrdered and this attribute is fully functionally dependent on (OrderNo,
ProductNo). Thus, this relation is also in 2NF.

6.5.3 Third Normal Form


Getting a relation into 3NF involves removing any transitive dependencies. Therefore, a relation in 3NF must be in 1NF and 2NF, and it must have no non-primary-key attributes that are transitively dependent on the primary key. In other words, we must check for functional dependencies between two non-key attributes. Thus, we may conclude that if a 2NF relation has only one non-key attribute, then it is also in 3NF.

If there is a transitive dependency, we must move the transitively dependent attribute(s), that is, the attribute(s) with a non-key determinant, into a new relation along with a copy of their determinant.

Now, let us look at the three 2NF relations shown in Figure 6.9. Since we are looking for a functional dependency between two non-key attributes, we can say that the OrderProduct relation is already in 3NF, because it has only one non-key attribute, QtyOrdered. We still need to check the Product and Order relations, as both of them have more than one non-key attribute.

Let us check the Product relation first. There is no transitive dependency in this relation, so it is also in 3NF. Next, we check the Order relation. Based on our functional dependency list, we can see that CustNo functionally determines CustName and TelNo. Thus, CustName and TelNo are transitively dependent on OrderNo via CustNo, and they need to be moved from the Order relation into a new relation along with a copy of their determinant, CustNo.

By completing this process, we derive one additional relation, named the Customer relation. For the newly created relation, we need to restart the process and check for 1NF. The primary key for this new relation is normally the determinant of the transitive attribute(s), which is CustNo. The relation has no

repeating group, so it is in 1NF. It is also in 2NF, since its primary key consists of only one attribute. It also has no transitive dependency, so the Customer relation is already in 3NF.

Now, let us check the other three relations again. None of them has a transitive dependency. Therefore, we conclude that these relations are in 3NF, as shown in Figure 6.10.

Figure 6.10: Third normal form relations derived from the CustomerOrdering relation
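The final 3NF relations can be checked in a short sketch: project Table 6.5 onto the four relations, then join them back on their common attributes (a natural join) and compare the result with the original. Getting exactly the original tuples back, with none missing and none extra, illustrates that this decomposition is lossless for this data. The helpers project and natural_join are illustrative, not from the text.

```python
COLS = ["CustNo", "CustName", "TelNo", "OrderNo", "OrderDate",
        "ProductNo", "ProdName", "UnitPrice", "QtyOrdered"]
# Table 6.5: the 1NF CustomerOrdering relation.
ROWS = [
    ("C3340", "Bakar Nordin", "017-6891122", 6234, "16-Apr-2013", "P2346", "19 inch Monitor", 250, 4),
    ("C1010", "Fong Kim Lee", "012-5677118", 1120, "23-Jan-2013", "P4590", "Laser Printer", 650, 2),
    ("C1010", "Fong Kim Lee", "012-5677118", 1120, "23-Jan-2013", "P6677", "Colour Scanner", 350, 2),
    ("C1010", "Fong Kim Lee", "012-5677118", 1120, "23-Jan-2013", "P2344", "17 inch Monitor", 200, 3),
    ("C2388", "Jaspal Singh", "013-3717071", 4399, "19-Feb-2013", "P2344", "17 inch Monitor", 200, 2),
    ("C2388", "Jaspal Singh", "013-3717071", 4399, "19-Feb-2013", "P5443", "Colour Laser Printer", 750, 5),
    ("C4455", "Daud Osman", "017-7781256", 9503, "02-May-2013", "P2344", "17 inch Monitor", 200, 10),
]

def project(rows, cols, keep):
    """Relational projection: keep the named columns, drop duplicate tuples."""
    idx = [cols.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(r, r_cols, s, s_cols):
    """Join two relations on their common attribute names."""
    common = [c for c in r_cols if c in s_cols]
    extra = [c for c in s_cols if c not in common]
    out = set()
    for a in r:
        for b in s:
            if all(a[r_cols.index(c)] == b[s_cols.index(c)] for c in common):
                out.add(a + tuple(b[s_cols.index(c)] for c in extra))
    return out, r_cols + extra

# 3NF relations: Customer is split off from Order because CustNo -> CustName, TelNo.
CUSTOMER = ["CustNo", "CustName", "TelNo"]
ORDER = ["OrderNo", "CustNo", "OrderDate"]
PRODUCT = ["ProductNo", "ProdName", "UnitPrice"]
ORDER_PRODUCT = ["OrderNo", "ProductNo", "QtyOrdered"]
relations = {name: project(ROWS, COLS, cols) for name, cols in
             [("Customer", CUSTOMER), ("Order", ORDER),
              ("Product", PRODUCT), ("OrderProduct", ORDER_PRODUCT)]}
print({name: len(r) for name, r in relations.items()})
# {'Customer': 4, 'Order': 4, 'Product': 5, 'OrderProduct': 7}

# Joining the four relations back together should reproduce Table 6.5 exactly.
joined, cols = natural_join(relations["OrderProduct"], ORDER_PRODUCT, relations["Order"], ORDER)
joined, cols = natural_join(joined, cols, relations["Customer"], CUSTOMER)
joined, cols = natural_join(joined, cols, relations["Product"], PRODUCT)
print(project(joined, cols, COLS) == set(ROWS))  # True
```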

SUMMARY

• Normalisation is a process of organising data and breaking it into smaller relations that are easier to manage. The primary reason we normalise a database is to minimise redundant data, which can cause update anomalies.

• Data redundancy refers to the unnecessary duplication of data within a database. Redundant data wastes storage space and may also cause problems when the database is updated (called update anomalies), which can lead to inconsistent and inaccurate data.

• One of the most important concepts underlying the normalisation process is functional dependency. Functional dependency describes the relationship between attributes (columns) in a relation.

• Normalisation works through a series of stages, described as normal forms. The first three stages are:

  - First normal form (1NF);

  - Second normal form (2NF); and

  - Third normal form (3NF).

• 1NF eliminates duplicate attributes from the same relation, creates separate relations for each group of related data and identifies each tuple with a unique attribute or set of attributes (the primary key).

• 2NF removes subsets of data that apply to multiple rows of a table (partial dependencies), places them in separate relations and links these new relations to the original relation by copying the determinants of the partially dependent attributes into the new relations.

• 3NF removes attributes that depend on another non-key attribute rather than directly on the primary key; that is, it removes transitive dependencies (functional dependencies between two non-key attributes).

KEY TERMS

Data redundancy
First normal form (1NF)
Functional dependency
Normalisation
Normalisation process
Second normal form (2NF)
Third normal form (3NF)
Unnormalised form (UNF)
Update anomalies

SELF-TEST

Refer to the following figure and convert this user view into a set of 3NF relations.

XYZ COLLEGE
CLASS LIST
SEMESTER SEPT 2013

COURSE CODE: IT123
COURSE TITLE: INTRODUCTION TO DATABASE
LECTURER'S NAME: MR ALEX LIM
LECTURER'S LOCATION: A 203

STUDENT ID   NAME    MAJOR       GRADE
200701       SAM     COMP SC     A
200702       LINDA   INFO TECH   B
200703       ANNE    COMP SC     B
200704       BOB     COMP SC     A

Assume the following:

(a) A lecturer has a unique location;

(b) A student has a unique major; and

(c) A course has a unique title.

REFERENCES

Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to design, implementation and management (5th ed.). Boston, MA: Addison-Wesley.

Rob, P., & Coronel, C. (2011). Database systems: Design, implementation and management (8th ed.). Stamford, CT: Cengage Learning.

Topic 7 Database Design Methodology

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Discuss the purpose of design methodology;
2. Explain the three main phases of design methodology; and
3. Apply the methodology for designing relational databases.

INTRODUCTION
In this topic, we will describe the three main phases of database design methodology for relational databases, namely conceptual, logical and physical database design. The conceptual design phase focuses on building a conceptual data model which is independent of software and hardware implementation details.

The logical design phase maps the conceptual model onto a logical model based on a specific data model (for our purposes, the relational model) but independent of the particular DBMS and other physical considerations. Finally, the physical design phase is tailored to a specific database management system (DBMS) and focuses on how the database is implemented on the hardware, such as its storage structures and access methods. The detailed activities associated with each of these phases are discussed in the following subtopics.


7.1 INTRODUCTION TO DATABASE DESIGN METHODOLOGY
The database design methodology presented in this topic is based on the guidelines proposed by Connolly and Begg (2009). They introduced three main phases of database design methodology, namely conceptual, logical and physical database design.

In this subtopic, we describe what a design methodology is and give a brief overview of these three main phases of database design.

7.1.1 What is Design Methodology?

Design methodology is an approach taken in designing or building things; it serves as a guideline on how things are done.

Normally, a design methodology is broken down into phases or stages. For each phase, the detailed steps are outlined and appropriate tools and techniques are specified. A design methodology supports designers in planning, modelling and managing a database development project in a structured and systematic manner. Validation is one of the key aspects of a design methodology, as it helps to ensure that the produced models accurately represent the user requirements specification.

As mentioned earlier, we adopt the database design methodology proposed by Connolly and Begg (2009) for our discussion in this topic.

The methodology consists of three main phases as follows:

(a) Conceptual
The conceptual database design aims to produce a conceptual representation of the required database. The core activity in this phase involves the use of entity-relationship (ER) modelling, in which the entities, relationships and attributes are defined.

(b) Logical
In the logical design phase, the aim is to map the conceptual model, which is represented by the ER model, to the logical structure of the database. One of the activities in this phase is the use of the normalisation process to derive and validate relations.


(c) Physical
In the physical design phase, the emphasis is on translating the logical structure into a physical implementation of the database using the chosen database management system.

Besides these three main phases, the methodology outlines eight core steps. Step 1 focuses on the conceptual database design phase, Step 2 focuses on the logical database design phase and Steps 3 to 8 focus on the physical database design phase. This topic covers only Steps 1 to 6 (refer to Table 7.1).

Table 7.1: Six Steps of the Design Methodology

Step Description
1 Build a Conceptual Data Model
(a) Identify the entity
(b) Identify the relationship
(c) Identify and associate the attributes with entity or relationship
(d) Determine the attribute domains
(e) Determine the candidate, primary and alternate key attributes
(f) Consider the use of enhanced modelling concepts (optional step)
(g) Check the model for redundancy
(h) Validate the conceptual model against user transactions
(i) Review the conceptual data model with user
2 Build and Validate a Logical Data Model
(a) Derive the relations for logical data model
(b) Validate the relations using normalisation
(c) Validate the relations against user transactions
(d) Check the integrity constraints
(e) Review the logical data model with user
(f) Merge the logical data models into a global model (optional step)
3 Translate a Logical Database Design for a Target DBMS
(a) Design the base tables
(b) Design the representation of derived data
(c) Design the remaining business rules
4 Design the File Organisations and Indexes
(a) Analyse the transactions
(b) Choose the file organisation
(c) Choose the indexes
(d) Estimate the disk space requirements
5 Design the User Views
6 Design a Security Mechanism


Detailed descriptions of these steps are presented phase by phase, starting with the conceptual design phase in Subtopic 7.2.

7.1.2 Critical Success Factors in Database Design

Connolly and Begg (2009) outline nine critical success factors for database design:

(a) Work interactively with others as much as possible;

(b) Follow a structured methodology throughout the data modelling process;

(c) Employ a data-driven approach;

(d) Incorporate structural and integrity considerations into the data models;

(e) Combine conceptualisation, normalisation and transaction validation techniques into the data modelling methodology;

(f) Use diagrams to represent as much of the data models as possible;

(g) Use a database design language (DBDL) to represent additional data semantics;

(h) Build a data dictionary to supplement the data model diagrams; and

(i) Be willing to repeat steps.

These factors serve as guidelines for designers and should be incorporated into the database design methodology.

SELF-CHECK 7.1

1. Briefly explain what a design methodology is.

2. Identify the phases of design methodology.

3. Identify three critical success factors in database design.


7.2 CONCEPTUAL DATABASE DESIGN METHODOLOGY
In this subtopic, we present the steps involved in the conceptual database design phase. The main focus of this phase is to produce a conceptual data model that fulfils the enterprise requirements. Our discussion of the design methodology is based on Connolly and Begg's (2009) guidelines. We use the Product Ordering case study for our discussion.

Step 1: Build a Conceptual Data Model

The key elements that need to be identified for a conceptual model include:

(a) Entity;

(b) Relationship;

(c) Attributes and attribute domains;

(d) Primary and alternate keys; and

(e) Integrity constraints.

Depending on the size of the database application to be built, we may produce one local conceptual data model for every user view. In our discussion, we assume that we only need to build one conceptual data model. The following are the steps that we need to perform to build the conceptual data model.

(a) Step 1a: Identify the Entity

Identifying the main objects required for the model, also referred to as entities, is the first step to be performed. This information can be obtained from the user's requirements specification.

We have identified seven entities in the conceptual data model:

(i) Customer;

(ii) Employee;

(iii) Product;


(iv) Order;

(v) Invoice;

(vi) Delivery; and

(vii) Supplier.

(b) Step 1b: Identify the Relationship

Next, we need to determine the important relationships that exist between the entities that have been identified.

How do we identify relationships?

Relationships are identified by examining the transactions that are needed by the users in the requirements specification.

The relationships are typically described using a verb. Using ER diagrams helps to visualise the relationships and model them more effectively. We also need to include the cardinality and participation constraints of each relationship in the diagram. This information needs to be documented for refinement and validation purposes.

For our product ordering case study, we have identified the following
relationships:

(i) Between Customer and Order: Customer makes Order;

(ii) Between Product and Order: Order has Product;

(iii) Between Supplier and Product: Supplier supplies Product;

(iv) Between Order and Invoice: Order has Invoice;

(v) Between Employee and Order: Employee takes Order; and

(vi) Between Order and Delivery: Order sends for Delivery.

To visualise the relationships between the entities, we use an ER diagram based on UML notation, as shown in Figure 7.1.

Figure 7.1: Initial ER diagram showing entity and relationship

(c) Step 1c: Identify and Associate the Attributes with Entity or Relationship
After identifying the entities and relationships, the next step is to identify their attributes. It is important to determine the type of each attribute.

As discussed in Topic 2, attributes can be categorised as:

(i) Simple or composite;

(ii) Single or multi-valued; and

(iii) Derived attributes.


Again, we need to document the details of each identified attribute. For our case study, the attributes of the defined entities are as follows:

Customer   CustNo, Name, CustAddress, TelNo, Balance
Employee   EmpNo, Name, TelNo, Position, Gender, DOB, Salary
Order      OrderNo, OrderDate, OrderAddress
Invoice    InvoiceNo, Date, DatePaid, OrderNo
Product    ProductNo, Name, UnitPrice, QtyOnHand, ReorderLevel, SuppNo
Delivery   DeliveryNo, DeliveryDate, OrderNo, ProductNo
Supplier   SuppNo, Name, SuppAddress, TelNo, ContactPerson

(d) Step 1d: Determine the Attribute Domains

Next, we need to determine the domain of each attribute, that is, its set of allowable values, and document the details of each domain. If we have more than one user view, the domain of an attribute might differ from one user view to another.
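One DBMS-independent way to document domains is a small data dictionary of checks, sketched below in Python. The domains themselves (an EmpNo made of "E" plus four digits, Gender limited to M or F, a DOB in the past, a non-negative Salary) are hypothetical; the case study does not specify them.

```python
from datetime import date

# Hypothetical domains for some Employee attributes. Each entry maps an
# attribute name to a predicate that returns True for allowable values.
EMPLOYEE_DOMAINS = {
    "EmpNo": lambda v: isinstance(v, str) and len(v) == 5
                       and v[0] == "E" and v[1:].isdigit(),
    "Gender": lambda v: v in {"M", "F"},
    "DOB": lambda v: isinstance(v, date) and v < date.today(),
    "Salary": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def domain_violations(row, domains):
    """List the attributes whose value lies outside its domain.

    Attributes missing from the row are skipped here; whether they are
    required is a separate (required data) constraint.
    """
    return [name for name, allowed in domains.items()
            if name in row and not allowed(row[name])]


good = {"EmpNo": "E0001", "Gender": "F", "DOB": date(1990, 5, 1), "Salary": 4500}
bad = {"EmpNo": "123", "Gender": "X", "DOB": date(1990, 5, 1), "Salary": -10}
print(domain_violations(good, EMPLOYEE_DOMAINS))  # []
print(domain_violations(bad, EMPLOYEE_DOMAINS))   # ['EmpNo', 'Gender', 'Salary']
```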

(e) Step 1e: Determine the Candidate, Primary and Alternate Key Attributes
As we mentioned in Topic 2, a relation must have a key that can uniquely identify each of its tuples. In order to identify the primary key, we first need to determine the candidate keys of each entity. One candidate key is chosen as the primary key; any remaining candidate keys become alternate keys.

The primary key of each entity is marked (PK) as follows:

Customer   CustNo (PK), Name, CustAddress, TelNo, Balance
Employee   EmpNo (PK), Name, TelNo, Position, Gender, DOB, Salary
Order      OrderNo (PK), OrderDate, OrderAddress, CustNo
Invoice    InvoiceNo (PK), Date, DatePaid, OrderNo
Product    ProductNo (PK), Name, UnitPrice, QtyOnHand, ReorderLevel, SuppNo
Delivery   DeliveryNo (PK), DeliveryDate, OrderNo, ProductNo
Supplier   SuppNo (PK), Name, SuppAddress, TelNo, ContactPerson
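A candidate key must be unique (no two tuples share its value) and irreducible (no proper subset of it is also unique). The Python sketch below tests both properties on sample rows; the Customer rows are invented for illustration. Sample data can only show that a set of attributes is not a key. Whether it is a key in general is decided by the business rules.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """Unique and irreducible: no proper, non-empty subset is also unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical sample rows for the Customer entity.
customers = [
    {"CustNo": "C001", "Name": "Ali", "TelNo": "03-1111"},
    {"CustNo": "C002", "Name": "Mei", "TelNo": "03-2222"},
    {"CustNo": "C003", "Name": "Ali", "TelNo": "03-3333"},
]
print(is_candidate_key(customers, ["CustNo"]))          # True
print(is_candidate_key(customers, ["Name"]))            # False: duplicate name
print(is_candidate_key(customers, ["CustNo", "Name"]))  # False: CustNo alone suffices
```

In this invented sample, TelNo would also pass the test, which is how a possible alternate key shows up; the designer would still need to confirm from the requirements that telephone numbers are never shared.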

(f) Step 1f: Consider the Use of Enhanced Modelling Concepts (Optional Step)
This step involves the use of enhanced modelling concepts such as specialisation or generalisation, aggregation and composition. These concepts are beyond the scope of our discussion.

(g) Step 1g: Check the Model for Redundancy

Checking the model for any redundancy is an important step. A common check is to re-evaluate the one-to-one relationships. If the entities in such a relationship are similar (for example, they represent the same object), then we need to merge them into one entity and may need to redefine the primary key. This type of problem typically arises when there is more than one user view.

(h) Step 1h: Validate the Conceptual Model Against User Transactions
We have to ensure that the conceptual model supports the transactions
required by the user view.

(i) Step 1i: Review the Conceptual Data Model with User
User involvement during the review of the model is important to ensure that the model is a "true" representation of the user's view of the enterprise.

SELF-CHECK 7.2

Identify the steps involved in building a conceptual data model.

7.3 LOGICAL DATABASE DESIGN FOR RELATIONAL MODEL
In this subtopic, we describe the activities involved in the logical database design phase, which is Step 2 of the design methodology. The main focus of this phase is to map the conceptual data model created in Step 1 onto the logical structure and to validate it. The steps are described as follows:

Step 2: Build and Validate the Logical Data Model

The main objective of this step is to translate the conceptual data model into the logical data model. The activities involved in this process include defining the relations, relationships and integrity constraints. The ER model represents the conceptual data model, and normalisation is an important validation technique in the logical design phase. The following are the activities involved in this phase:

(a) Step 2a: Derive the Relations for Logical Data Model
First, we create a set of relations for the logical data model, based on the ER model produced in the previous design phase, to represent the entities, relationships and key attributes.

Each entity is classified as a strong or weak entity type. Each relationship is examined for its type (one-to-one, one-to-many or many-to-many), its cardinality (minimum and maximum occurrences) and participation (optional or mandatory).

Examining our ER diagram from the previous phase, shown in Figure 7.1, we found that two of the relationships are many-to-many: the relationship between Order and Product, and the relationship between Order and Delivery. A many-to-many relationship needs to be converted into two one-to-many relationships by introducing an intermediate entity between the two original entities. As a result of these changes, our new ER diagram is as shown in Figure 7.2.

Figure 7.2: Entity-relationship diagram after converting the many-to-many relationships
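In SQL terms, resolving the many-to-many relationship between Order and Product means adding an intermediate relation whose primary key combines the two foreign keys. The sketch below uses Python's built-in sqlite3 module. The OrderProduct name and QtyOrdered attribute follow Topic 6, while the other columns are trimmed and the sample values are invented. Order is an SQL reserved word, so it is written in double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE "Order" (
    OrderNo   TEXT PRIMARY KEY,
    OrderDate TEXT
);
CREATE TABLE Product (
    ProductNo TEXT PRIMARY KEY,
    Name      TEXT
);
-- Intermediate relation: one Order has many OrderProduct rows and one
-- Product appears in many OrderProduct rows (two one-to-many relationships).
CREATE TABLE OrderProduct (
    OrderNo    TEXT REFERENCES "Order"(OrderNo),
    ProductNo  TEXT REFERENCES Product(ProductNo),
    QtyOrdered INTEGER,
    PRIMARY KEY (OrderNo, ProductNo)
);
""")
conn.execute("""INSERT INTO "Order" VALUES ('O1', '2014-08-01')""")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [("P1", "Pen"), ("P2", "Ink")])
conn.executemany("INSERT INTO OrderProduct VALUES (?, ?, ?)",
                 [("O1", "P1", 10), ("O1", "P2", 3)])

# Which products, and how many of each, are in order O1?
rows = conn.execute("""
    SELECT p.Name, op.QtyOrdered
    FROM OrderProduct op JOIN Product p ON p.ProductNo = op.ProductNo
    WHERE op.OrderNo = 'O1'
    ORDER BY p.Name
""").fetchall()
print(rows)  # [('Ink', 3), ('Pen', 10)]
```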

(b) Step 2b: Validate the Relations Using Normalisation

To make sure that the relations have an adequate number of attributes yet minimal data redundancy, we need to validate that all relations are at least in third normal form (3NF). Please refer to Topic 6 for the details of the normalisation process.

(c) Step 2c: Validate the Relations Against User Transactions

In this step, it is important to ensure that the derived relations support the transactions required by the user requirements specification.

(d) Step 2d: Check the Integrity Constraints

This step is crucial to protect the database from becoming incomplete, inaccurate or inconsistent. In this step, we identify what integrity constraints are needed. This includes identifying:

(i) Required data: Identify the attributes that cannot be null;

(ii) Attribute domain constraints: Define a set of allowable values for each attribute;

(iii) Multiplicity: Define the constraints on each relationship;

(iv) Entity integrity: Constraint for primary key;

(v) Referential integrity: Constraint for foreign key; and

(vi) General constraints to implement business rules.
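Most of these constraints correspond directly to SQL column and table constraints. The sketch below, using Python's built-in sqlite3 module, expresses required data (NOT NULL), an attribute domain (CHECK), multiplicity and referential integrity (a mandatory FOREIGN KEY) and entity integrity (PRIMARY KEY). The particular rules, such as a non-negative Balance, are assumptions for illustration, and the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Customer (
    CustNo  TEXT NOT NULL PRIMARY KEY,  -- entity integrity (SQLite needs the NOT NULL)
    Name    TEXT NOT NULL,              -- required data
    Balance REAL NOT NULL DEFAULT 0
            CHECK (Balance >= 0)        -- attribute domain
);
CREATE TABLE "Order" (
    OrderNo TEXT NOT NULL PRIMARY KEY,
    CustNo  TEXT NOT NULL               -- multiplicity: every order has a customer
            REFERENCES Customer(CustNo) -- referential integrity
);
""")


def accepted(sql, params):
    """Run one INSERT; return False if the DBMS rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


add_customer = "INSERT INTO Customer VALUES (?, ?, ?)"
add_order = 'INSERT INTO "Order" VALUES (?, ?)'
print(accepted(add_customer, ("C1", "Ali", 100)))  # True
print(accepted(add_customer, ("C2", None, 0)))     # False: Name is required
print(accepted(add_customer, ("C3", "Mei", -5)))   # False: outside Balance domain
print(accepted(add_customer, ("C1", "Lim", 0)))    # False: duplicate primary key
print(accepted(add_order, ("O1", "C9")))           # False: no customer C9
print(accepted(add_order, ("O2", "C1")))           # True
```

General constraints that express business rules are handled later, in Step 3c.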

(e) Step 2e: Review the Logical Data Model with User
In this step, we let the user review the logical data model to ensure that the model is a true representation of the data requirements of their enterprise. This ensures that the user is satisfied before we continue to the next step.

(f) Step 2f: Merge the Logical Data Models into a Global Model (Optional Step)
This step is important when there are multiple user views. Since each user view has its own conceptual model (referred to as a local conceptual model), each of these models is mapped to a separate local logical data model. During this step, all the local logical models are merged into one global logical model. Since we treat our case study as a single user view, this step is skipped.

SELF-CHECK 7.3

Identify the steps involved in building and validating a logical data model.

7.4 PHYSICAL DATABASE DESIGN FOR RELATIONAL MODEL
Physical database design is the process of producing a description of the implementation of a database on secondary storage using a defined DBMS. This description includes information on the base relations, storage structures, access methods and security mechanisms. The key focus of the physical database design phase is performance, in terms of efficiency and simplicity. The steps taken in this phase aim to ensure that all key functions perform well and are simple to implement. Changes to the logical data model may be required if the implementation is too complex and/or to improve performance.

The outputs of the logical design phase, that is, all the documents describing the logical data model such as the ER diagram, relational schema and data dictionary, are important sources for the physical design process. Unlike the logical phase, which is independent of the DBMS and implementation considerations, the physical phase is tailored to a specific DBMS and depends on the implementation details.

For the physical phase, Connolly and Begg (2009) outline six steps, from Step 3 to Step 8. In our discussion of this phase, we present only Steps 3 to 6, as follows:

Step 3: Translate the Logical Database Design for a Target DBMS

This step is concerned with mapping the logical data model to the target DBMS. Our focus is to produce a relational database schema that can be implemented in the target DBMS. All the processes performed in every design step need to be documented for easy maintenance.

(a) Step 3a: Design the Base Tables

We begin by designing, in the target DBMS, the base relations that have been identified in the logical data model. For each of these relations, we need to define the attributes, the entity integrity constraint on the primary key and the referential integrity constraints on the foreign keys. For each attribute, the information that we need to define in the target DBMS includes its domain, data type and default value.
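A minimal sketch of a base table definition for the Product relation, written for SQLite through Python's sqlite3 module. Each attribute gets a data type, NOT NULL where the data is required, a CHECK for its domain and, where sensible, a default value. The data types and the default ReorderLevel of 10 are assumptions, not values from the case study. PRAGMA table_info, a SQLite-specific command, then reads the definition back, which can help when documenting the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Supplier (
    SuppNo TEXT NOT NULL PRIMARY KEY,
    Name   TEXT NOT NULL
);
CREATE TABLE Product (
    ProductNo    TEXT    NOT NULL PRIMARY KEY,
    Name         TEXT    NOT NULL,
    UnitPrice    NUMERIC NOT NULL CHECK (UnitPrice > 0),
    QtyOnHand    INTEGER NOT NULL DEFAULT 0,
    ReorderLevel INTEGER NOT NULL DEFAULT 10,  -- assumed default
    SuppNo       TEXT    REFERENCES Supplier(SuppNo)
);
""")
conn.execute("INSERT INTO Supplier VALUES ('S1', 'ABC Trading')")
# Columns left out of the INSERT take their declared default values.
conn.execute("INSERT INTO Product (ProductNo, Name, UnitPrice, SuppNo) "
             "VALUES ('P1', 'Pen', 1.50, 'S1')")
print(conn.execute("SELECT QtyOnHand, ReorderLevel FROM Product").fetchone())  # (0, 10)

# Each table_info row gives: column id, name, declared type, NOT NULL flag,
# default value (as text) and position in the primary key (0 if not a key).
for cid, name, col_type, notnull, default, pk in conn.execute("PRAGMA table_info(Product)"):
    print(name, col_type, "NOT NULL" if notnull else "NULL", default, "PK" if pk else "")
```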


(b) Step 3b: Design the Representation of Derived Data

It is also important at this stage to decide how to represent derived attributes, which normally should not be stored in the base relations.
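One common option for derived data is to leave it out of the base tables and compute it on demand in a view. The sketch below, using Python's sqlite3 module, derives an order total from OrderProduct and Product. The order total is an illustrative derived attribute of our own choosing, not one listed in the case study, and the prices are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductNo TEXT PRIMARY KEY, UnitPrice REAL NOT NULL);
CREATE TABLE OrderProduct (
    OrderNo TEXT, ProductNo TEXT, QtyOrdered INTEGER NOT NULL,
    PRIMARY KEY (OrderNo, ProductNo)
);
-- The derived attribute (order total) is not stored in any base table;
-- the view recomputes it from the base data every time it is queried.
CREATE VIEW OrderTotal AS
    SELECT op.OrderNo, SUM(op.QtyOrdered * p.UnitPrice) AS Total
    FROM OrderProduct op JOIN Product p ON p.ProductNo = op.ProductNo
    GROUP BY op.OrderNo;
INSERT INTO Product VALUES ('P1', 2.5), ('P2', 5.0);
INSERT INTO OrderProduct VALUES ('O1', 'P1', 10), ('O1', 'P2', 3), ('O2', 'P2', 1);
""")
totals = conn.execute("SELECT * FROM OrderTotal ORDER BY OrderNo").fetchall()
print(totals)  # [('O1', 40.0), ('O2', 5.0)]
```

Storing the total in a base table instead would make reads cheaper, but every change to OrderProduct or UnitPrice would then have to update it; weighing that trade-off is the decision this step asks the designer to make.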

(c) Step 3c: Design the Remaining Business Rules

Besides the entity and referential integrity constraints, designing the business rules as general constraints in the target DBMS is also important to ensure the accuracy of the information system's functionality.
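A business rule that cannot be written as a simple column constraint can often be enforced with a trigger. The sketch below uses a SQLite trigger, through Python's sqlite3 module, to enforce a hypothetical rule: an order line may not ask for more units than the product's QtyOnHand. Trigger syntax differs between DBMSs, so this shows one possible approach rather than a general one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductNo TEXT PRIMARY KEY, QtyOnHand INTEGER NOT NULL);
CREATE TABLE OrderProduct (
    OrderNo TEXT, ProductNo TEXT, QtyOrdered INTEGER NOT NULL,
    PRIMARY KEY (OrderNo, ProductNo)
);
-- Hypothetical business rule: reject an order line that asks for more
-- units than are currently on hand.
CREATE TRIGGER check_stock
BEFORE INSERT ON OrderProduct
WHEN NEW.QtyOrdered > (SELECT QtyOnHand FROM Product
                       WHERE ProductNo = NEW.ProductNo)
BEGIN
    SELECT RAISE(ABORT, 'quantity ordered exceeds quantity on hand');
END;
INSERT INTO Product VALUES ('P1', 20);
""")


def place_line(order_no, product_no, qty):
    """Insert one order line; return False if the trigger rejects it."""
    try:
        conn.execute("INSERT INTO OrderProduct VALUES (?, ?, ?)",
                     (order_no, product_no, qty))
        return True
    except sqlite3.DatabaseError:  # RAISE(ABORT, ...) surfaces as an error here
        return False


print(place_line("O1", "P1", 5))   # True
print(place_line("O2", "P1", 50))  # False: only 20 on hand
```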

Step 4: Design the File Organisations and Indexes

Since one of the key focuses of the physical design phase is performance efficiency, determining the optimal file organisations and indexes is a crucial task. The steps that need to be taken are as follows:

(a) Step 4a: Analyse the Transactions

Understanding the functionality of all the transactions that will run on the database is vital.

(b) Step 4b: Choose the File Organisation

There are many types of file structures. Thus, we need to analyse and determine the best file organisation and access method.

(c) Step 4c: Choose the Indexes

We need to decide whether to use indexes to improve performance.
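Whether an index helps depends on the transactions identified in Step 4a. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN, a SQLite-specific command (other DBMSs have their own EXPLAIN facilities). It shows a hypothetical transaction, listing the products of one supplier, switching from a full table scan to an index search once an index on SuppNo exists. The data and the index name are invented, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductNo TEXT PRIMARY KEY, Name TEXT, SuppNo TEXT)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(f"P{i}", f"Item {i}", f"S{i % 50}") for i in range(1000)],  # invented data
)

# Hypothetical transaction: list the products supplied by one supplier.
query = "SELECT ProductNo, Name FROM Product WHERE SuppNo = ?"


def plan(sql):
    """Return the detail text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("S7",))]


before = plan(query)
conn.execute("CREATE INDEX idx_product_suppno ON Product (SuppNo)")
after = plan(query)
print(before)  # a full SCAN of Product
print(after)   # a SEARCH using idx_product_suppno
```

The index speeds up this query but must be maintained on every insert, update and delete of Product rows, which is why indexes are usually chosen only for attributes that important transactions actually search on.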

(d) Step 4d: Estimate the Disk Space Requirements

The size of the storage space for the database affects performance. Thus, estimating the space correctly is important.
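A first, rough estimate multiplies the average row size by the expected number of rows, allowing for growth and for overhead such as indexes and free space. Every figure in the Python sketch below (the column sizes, 10,000 products, 20% yearly growth over two years and 30% overhead) is an assumption for illustration; a real estimate should use the storage sizes documented for the target DBMS.

```python
def estimate_bytes(column_sizes, rows, growth_rate, years, overhead):
    """Rough table size: row size x projected rows x (1 + overhead)."""
    row_size = sum(column_sizes.values())
    projected_rows = rows * (1 + growth_rate) ** years
    return row_size * projected_rows * (1 + overhead)


product_columns = {  # hypothetical sizes in bytes
    "ProductNo": 8, "Name": 40, "UnitPrice": 8,
    "QtyOnHand": 4, "ReorderLevel": 4, "SuppNo": 8,
}
# 72 bytes/row x 14,400 rows x 1.3 = about 1,347,840 bytes
size = estimate_bytes(product_columns, rows=10_000, growth_rate=0.20,
                      years=2, overhead=0.30)
print(f"{size / 1024:.0f} KB")  # 1316 KB
```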

Step 5: Design the User Views

This step is important in a multi-user environment. Its objective is to design the user views that were identified during the requirements collection and analysis stage of the system development lifecycle.
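A user view can be implemented as an SQL view that exposes only the attributes a group of users needs. The sketch below, using Python's sqlite3 module, defines a hypothetical StaffDirectory view over Employee that shows names and telephone numbers but hides Position and Salary; the view name and the rows are invented. SQLite has no user accounts or GRANT statement, so in a multi-user DBMS access would additionally be granted on the view rather than on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    EmpNo TEXT PRIMARY KEY, Name TEXT, TelNo TEXT,
    Position TEXT, Salary REAL
);
INSERT INTO Employee VALUES
    ('E1', 'Aminah', '03-1111', 'Clerk', 3000),
    ('E2', 'Kumar',  '03-2222', 'Manager', 8000);
-- Hypothetical user view for staff who only need a phone list:
-- it exposes names and numbers but hides Position and Salary.
CREATE VIEW StaffDirectory AS
    SELECT EmpNo, Name, TelNo FROM Employee;
""")
cur = conn.execute("SELECT * FROM StaffDirectory ORDER BY EmpNo")
columns = [col[0] for col in cur.description]
rows = cur.fetchall()
print(columns)  # ['EmpNo', 'Name', 'TelNo']
print(rows)
```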

Step 6: Design a Security Mechanism

Security is one of the important aspects of database design. The objective of this step is to implement the security measures required by the users. The designer must investigate the security features provided by the selected DBMS.

SUMMARY

• The database design methodology provided in this topic is based on the guidelines proposed by Connolly and Begg (2009). They introduced three main phases of database design methodology, namely conceptual, logical and physical database design.

• The conceptual database design aims to produce a conceptual data model that accurately represents the user requirements and the enterprise business model. The core activity in this phase involves the use of ER modelling, in which the entities, relationships and attributes are defined.

• In the logical design phase, the aim is to map the conceptual model, which is represented by the ER model, to the logical structure of the database. One of the activities in this phase is the use of the normalisation process to derive and validate relations.

• The physical database design involves the processes for producing a description of the implementation of a database on secondary storage using a defined DBMS. This description includes information on the base relations, storage structures, access methods and security mechanisms.

• Documentation is crucial in database design. The details of each process need to be documented. It is impossible to maintain a database with an undocumented design.

KEY TERMS

Conceptual database design
Design methodology
Entity-relationship (ER) modelling
Logical database design
Physical database design
Secondary indexes

SELF-TEST

1. Discuss the important role played by users in the process of database design.

2. How would you check a data model for redundancy? Give an example to illustrate your answer.

3. Briefly explain the difference between conceptual, logical and physical database design. Why might these tasks be carried out by different people?

REFERENCES

Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to design, implementation and management (5th ed.). Boston, MA: Addison-Wesley.

Post, G. V. (2004). Database management systems: Designing and building business applications (3rd ed.). New York, NY: McGraw-Hill.

Rob, P., & Coronel, C. (2001). Database systems: Design, implementation and management (8th ed.). Stamford, CT: Cengage Learning.

Topic 8 Database Security

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Discuss the importance of database security to an organisation;
2. Identify five types of threats that can affect a database system;
3. Describe six methods to protect a computer system using
computer-based controls; and
4. Identify four methods for securing a Database Management System
(DBMS) on the Web.

INTRODUCTION
In this topic, we will discuss database security. What do you think about security in general? Do you feel safe at home or on the road? What about database security? Do you think that database security is important? What is the value of the data? What if your personal data or your financial data is stolen? Do you think that harm could come to you? No doubt some of you have watched spy movies in which hackers break into computer systems to access confidential data and then misuse that information. These are some of the questions that you might need to think about and consider.

Database security involves protecting a database from unauthorised access, malicious destruction and even any accidental loss or misuse. Due to the high value of data in corporate databases, there is strong motivation for unauthorised users, for instance competitors or dissatisfied employees, to gain access to it.

Competitors may have strong motivation to access confidential information about product development plans, cost-saving initiatives and customer profiles. Some may want to access information regarding unannounced financial results, business transactions and even customers' credit card numbers. Not only may they steal valuable information; if they gain access to the database, they may even destroy it and cause havoc (Mannino, 2011). Furthermore, the database environment has grown more complex as access to data has become more open through the Internet and mobile computing. Thus, you can imagine the importance of database security.

Security is a broad subject and involves many legal and ethical issues. There are several approaches that can be applied to maintain database security. However, before discussing ways to protect our database, let us first look in more detail at the various threats to a database in the next subtopic.

ACTIVITY 8.1

Visit the following website that discusses the balance between the roles
and rights regarding database security:

http://databases.about.com/od/security/a/databaseroles.htm

Write a one-page report discussing your opinion about the article.

8.1 THREATS TO A DATABASE


What does threat mean?

A threat is any situation or event, either intentional or unintentional, that may adversely affect a system and, consequently, the organisation.

Whether the threat is intentional or unintentional, the impact may be the same.
The threats may be caused by a situation or event that involves a person, action
or circumstance that is likely to produce harm to someone or to an organisation.

The harm may be:

(a) Tangible: Loss of hardware, software or data; and

(b) Intangible: Loss of credibility or client confidence and trust.



Threats to data security may be direct and intentional threats to the database. For instance, people who gain unauthorised access to a database, such as computer hackers, may steal or change the data in it; they would need special knowledge to do so. Table 8.1 illustrates five types of threats and 12 examples of threats (Connolly & Begg, 2009).

Table 8.1: Five Types and Twelve Examples of Threats


Threat   Theft and Fraud   Loss of Confidentiality   Loss of Privacy   Loss of Integrity   Loss of Availability
Using another person's
“ “ “
means of access
Unauthorised amendment
“ “
or copying of data
Program alteration “ “ “
Inadequate policies and
procedures that allow a mix
“  
of confidential and normal
output
Wire tapping  “ 
Illegal entry by hacker “ “ “
Creating "trapdoor" into
 “ “
system
Theft of data, program and
“ ““ “ “
equipment
Failure of security
mechanisms, giving greater “ “ “
access than normal
Staff shortages or strikes  
Inadequate staff training “ “ “ “
Viewing and disclosing
““ “ “
unauthorised data
Electronic interference and
“ “
radiation
Data corruption owing to
“ “
power loss or surge
Fire (electrical fault,
lightning strike, arson), ““ ““
flood, bomb
Physical damage to
““ ““
equipment
Breaking cables or
“ “
disconnection of cables
Introduction of viruses “ “


However, focusing on database security alone will not ensure a secure database, because all parts of the system must be secure. This includes the buildings in which the database is physically stored, the networks, the operating systems and the personnel who have authorised access to the system (Hoffer et al., 2008). Figure 8.1 illustrates the possible locations of data security threats.

Figure 8.1: Summary of potential threats to computer systems


Source: Connolly & Begg (2009)

SELF-CHECK 8.1

1. Define a threat.

2. Differentiate between tangible and intangible harms. Give two examples of each.


8.2 COMPUTER-BASED CONTROLS

By now you should understand the various types of threats that may attack a database. Now, it is time to discuss the various ways in which we can secure our system. Countermeasures to threats against computer systems range from physical controls to administrative policies and procedures; Figure 8.2 shows six methods of computer-based control.

Figure 8.2: Six methods of computer-based security controls

8.2.1 Authorisation
What does authorisation mean?

Authorisation is the granting of a right or privilege that enables a subject to have legitimate access to a system or a system's object.
Connolly & Begg (2009)

The process of authorisation involves authenticating the subject, that is, the person requesting access to objects or systems.


How about authentication?

Authentication is a mechanism that determines whether a user is who he or she claims to be.
Connolly & Begg (2009)

Usually, a user or subject gains access to a system through an individual user account. Each user is given a unique identifier, which the operating system uses to determine who they are and whether they are authorised to use the system. Creating user accounts is usually the responsibility of a system administrator. Each user account is associated with a password chosen by the user and known to the operating system.
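The account-and-password check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular operating system or DBMS works: the `accounts` store and the `register` and `authenticate` functions are hypothetical, and instead of keeping the password itself, the sketch keeps a salted PBKDF2 hash of it, which is a common practice for storing passwords.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory account store: user identifier -> (salt, password hash)
accounts: dict[str, tuple[bytes, bytes]] = {}

ITERATIONS = 100_000  # assumed PBKDF2 work factor, for illustration only


def _hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password and salt with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(user_id: str, password: str) -> None:
    """Create an account; each user identifier must be unique."""
    if user_id in accounts:
        raise ValueError(f"user identifier {user_id!r} already exists")
    salt = secrets.token_bytes(16)  # random salt stored next to the hash
    accounts[user_id] = (salt, _hash_password(password, salt))


def authenticate(user_id: str, password: str) -> bool:
    """Return True only if the account exists and the password matches."""
    record = accounts.get(user_id)
    if record is None:
        return False
    salt, stored_hash = record
    # compare_digest compares in constant time, so timing does not reveal how much matched
    return hmac.compare_digest(stored_hash, _hash_password(password, salt))


register("ali", "s3cret-Pass")
print(authenticate("ali", "s3cret-Pass"))   # True
print(authenticate("ali", "wrong-guess"))   # False
print(authenticate("siti", "s3cret-Pass"))  # False: no such account
```

Storing only a hash means that even someone who reads the account store cannot directly recover the passwords.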

A separate but similar process is applied to give an authorised user access to a database management system (DBMS). This authorisation is the responsibility of a database administrator. Hence, a user who is authorised to use a system does not necessarily have access to the DBMS or to any associated application program (Connolly & Begg, 2009).

Authorisation rules are controls incorporated in the data management system that restrict access to the data and the actions that clients or personnel may take when they access the data. Table 8.2 illustrates an example of authorisation rules represented as a table.

Table 8.2: Sample of Authorisation Rules

Action    Personnel with password "SUMMER"    Personnel with password "SPRING"
Read      Y                                   Y
Insert    N                                   Y
Modify    N                                   Y
Delete    N                                   N

Source: Hoffer et al. (2008)


Referring to Table 8.2, personnel with the password "SUMMER" can only read the data, while personnel with the password "SPRING" can read, insert and modify the data; neither group can delete data. Notice, however, that the authorisation table containing these rules itself holds highly sensitive data, so it should be protected by stringent security rules. Usually, only the person selected as the data administrator has the authority to access and modify the table (Hoffer et al., 2008).
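Table 8.2 can be held as data that the system consults before every action. The sketch below is hypothetical (the names `AUTHORISATION_RULES` and `is_permitted` are illustrative), and it assumes that any action or password group not listed in the table is denied. In a real DBMS, such rules would live in protected system tables rather than in application code.

```python
# Hypothetical encoding of Table 8.2: action -> {password group: allowed?}
AUTHORISATION_RULES = {
    "Read":   {"SUMMER": True,  "SPRING": True},
    "Insert": {"SUMMER": False, "SPRING": True},
    "Modify": {"SUMMER": False, "SPRING": True},
    "Delete": {"SUMMER": False, "SPRING": False},
}


def is_permitted(password_group: str, action: str) -> bool:
    """Look up the rule; anything not listed in the table is assumed to be denied."""
    return AUTHORISATION_RULES.get(action, {}).get(password_group, False)


for action in ("Read", "Insert", "Modify", "Delete"):
    print(action, is_permitted("SUMMER", action), is_permitted("SPRING", action))
```

Defaulting to "deny" means that a forgotten rule blocks access rather than granting it.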

8.2.2 Access Controls


Access control in a database system is usually based on the granting and revoking of privileges.

A privilege allows a user to create or access (that is, read, write or modify) a database object or to execute a DBMS utility.

The DBMS keeps track of how these privileges are granted to users and possibly
revoked, and ensures that at all times, only users with necessary privileges can
access an object.

Most commercial DBMSs manage privileges using structured query language (SQL) and discretionary access control (DAC). The SQL standard supports DAC through the GRANT and REVOKE commands: the GRANT command gives privileges to users, while the REVOKE command takes privileges away (Connolly & Begg, 2009). Since this module focuses on Microsoft Office Access, subtopic 8.3 explains how security is handled in that DBMS.
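In standard SQL, statements such as `GRANT SELECT, UPDATE ON Staff TO ali` and `REVOKE UPDATE ON Staff FROM ali` change what a user may do. The Python sketch below is a simplified model of how a DBMS might record such grants and revocations and check them before each access. The `PrivilegeCatalog` class, the `Staff` table and the user names are hypothetical, and features such as `WITH GRANT OPTION` and cascading revocation are deliberately left out.

```python
from collections import defaultdict


class PrivilegeCatalog:
    """Hypothetical record of privileges kept by a DBMS under DAC."""

    def __init__(self) -> None:
        # (user, database object) -> set of privileges, e.g. {"SELECT", "UPDATE"}
        self._privileges: defaultdict[tuple[str, str], set[str]] = defaultdict(set)

    def grant(self, privileges: set[str], obj: str, user: str) -> None:
        """Roughly corresponds to: GRANT <privileges> ON <obj> TO <user>."""
        self._privileges[(user, obj)] |= privileges

    def revoke(self, privileges: set[str], obj: str, user: str) -> None:
        """Roughly corresponds to: REVOKE <privileges> ON <obj> FROM <user>."""
        self._privileges[(user, obj)] -= privileges

    def check(self, user: str, obj: str, privilege: str) -> bool:
        """Allow an operation only if the privilege is currently held."""
        return privilege in self._privileges.get((user, obj), set())


catalog = PrivilegeCatalog()
catalog.grant({"SELECT", "UPDATE"}, "Staff", "ali")
print(catalog.check("ali", "Staff", "UPDATE"))   # True
catalog.revoke({"UPDATE"}, "Staff", "ali")
print(catalog.check("ali", "Staff", "UPDATE"))   # False
print(catalog.check("ali", "Staff", "SELECT"))   # True
print(catalog.check("siti", "Staff", "SELECT"))  # False: never granted
```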

8.2.3 Views
What does a view mean?

A view is the dynamic result of one or more relational operations operating on the base relations to produce another relation. It is a virtual relation that does not actually exist in the database but is produced upon request by a particular user, at the time of the request.
Connolly & Begg (2009)


In other words, a view is created by querying one or more of the base tables,
producing a dynamic result table for the user at the time of the request (Hoffer
et al., 2008).

A user may be allowed to access the view but not the base tables on which the view is based. The view mechanism hides parts of the database from certain users, who are not aware of the existence of any attributes or rows that are missing from the view. Thus, users see only what they need to see. Several users may share the same view, but only some of them may be given the authority to update the data.
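The following sketch uses Python's built-in `sqlite3` module to show a view hiding rows and columns. The `Staff` table, its data and the `KLStaff` view are hypothetical. SQLite has no GRANT statement, so the sketch shows only the hiding side of the view mechanism; in a DBMS that supports GRANT, the user would be given access to `KLStaff` but not to `Staff`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (StaffNo TEXT PRIMARY KEY, Name TEXT, Branch TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?, ?)",
    [("S1", "Aminah", "KL", 5200.0), ("S2", "Ravi", "Penang", 4800.0), ("S3", "Mei Ling", "KL", 6100.0)],
)

# The view exposes only StaffNo and Name for one branch; Salary and other branches stay hidden.
conn.execute("CREATE VIEW KLStaff AS SELECT StaffNo, Name FROM Staff WHERE Branch = 'KL'")

cursor = conn.execute("SELECT * FROM KLStaff ORDER BY StaffNo")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)  # ['StaffNo', 'Name']
print(rows)     # [('S1', 'Aminah'), ('S3', 'Mei Ling')]

# The view is virtual: a change to the base table is visible through it immediately.
conn.execute("UPDATE Staff SET Name = 'Aminah Yusof' WHERE StaffNo = 'S1'")
print(conn.execute("SELECT Name FROM KLStaff WHERE StaffNo = 'S1'").fetchone())
```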

8.2.4 Backup and Recovery


What is a backup?

Backup is the process of periodically taking a copy of the database and log file onto offline storage media.
Connolly & Begg (2009)

Backups are essential if the DBMS is to recover the database following a failure or damage.

A DBMS should provide four basic facilities for backup and recovery of a
database. They are:

(a) Backup Facilities


Backup facilities provide periodic backup copies of the database. Typically,
a backup copy is produced at least once per day. The copy should be stored
in a secured location where it is protected from loss or damage.

There are three types of backup:

(i) Regular backup: A periodic copy of the database; for large databases, this may be time consuming;

(ii) Cold backup: The database is shut down during the backup; this is appropriate for small databases; and

(iii) Hot backup: Only a selected portion of the database is shut down from use; this is more practical for large databases.

Thus, the backup strategy must be chosen according to the demands placed on the database system.


(b) Journalising Facilities


These facilities maintain an audit trail of transactions and database changes.
In the event of failure, a consistent database state can be re-established
using the information in the journals, together with the most recent backup.

(c) Checkpoint Facilities


The DBMS periodically suspends all processing and synchronises its files to establish a recovery point. The checkpoint record stores the information needed to restart the system. A DBMS may perform checkpoints automatically or in response to commands in application programs. When failures occur, it is often possible to resume processing from the most recent checkpoint. In this case, only a few minutes of processing work may need to be repeated, compared to several hours for a complete restart of the day's processing.

(d) Recovery Manager


The recovery manager allows the DBMS to restore the database to a correct condition and restart processing transactions (Hoffer et al., 2008).
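The backup facility and the recovery step can be sketched with the online backup API of Python's `sqlite3` module (`Connection.backup`, available since Python 3.7). The `Account` table, the file name and the simulated damage are hypothetical. The sketch does not replay a journal; SQLite keeps its own rollback journal or write-ahead log internally for crash recovery, so only the backup copy and the restore are shown here.

```python
import os
import sqlite3
import tempfile


def take_backup(live: sqlite3.Connection, backup_path: str) -> None:
    """Backup facility: copy the entire live database to an offline file."""
    live.commit()  # only committed data should go into the backup
    target = sqlite3.connect(backup_path)
    try:
        live.backup(target)  # SQLite online backup API
    finally:
        target.close()


def restore_from_backup(backup_path: str, live: sqlite3.Connection) -> None:
    """Recovery: overwrite the live database with the most recent backup copy."""
    live.commit()  # the destination of a backup must not be inside a transaction
    source = sqlite3.connect(backup_path)
    try:
        source.backup(live)
    finally:
        source.close()


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE Account (AccNo TEXT PRIMARY KEY, Balance INTEGER)")
live.executemany("INSERT INTO Account VALUES (?, ?)", [("A1", 500), ("A2", 800)])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "nightly_backup.db")  # hypothetical backup file name
    take_backup(live, path)

    # Simulated damage after the backup was taken: all rows are lost.
    live.execute("DELETE FROM Account")
    live.commit()

    restore_from_backup(path, live)

print(live.execute("SELECT * FROM Account ORDER BY AccNo").fetchall())
```

Any work committed between the backup and the failure would be lost in this sketch; recovering it is the job of the journalising facility described in (b).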

8.2.5 Encryption
What does encryption mean?

Encryption is the encoding of data by a special algorithm that renders the data unreadable by any program without the decryption key.
Connolly & Begg (2009)

Data encryption can be used to protect highly sensitive data such as customer credit card numbers or user passwords. Some DBMS products include encryption routines that automatically encode sensitive data when it is stored or transmitted over communication channels. For instance, encryption is usually used in electronic funds transfer systems. If the original data, or plain text, is RM5000, an encryption algorithm might change it into something unreadable such as XTezzz. The encrypted data is called cipher text. Any system that provides an encryption facility must also provide a decryption facility to decode the encrypted data.

These decoding schemes must also be protected; otherwise, the advantages of encryption are lost. Encryption and decryption also usually require significant computing resources.


There are two common forms of encryption (Hoffer et al., 2008):

(a) One-Key
With the one-key approach, also known as symmetric encryption, both the sender and the receiver need to know the key that is used to scramble the transmitted or stored data. The data encryption standard (DES) is an example of this approach.

(b) Two-Key
A two-key approach, also known as asymmetric encryption, employs a
private and a public key. This approach is popular in e-commerce
applications for transmission security and database storage of payment
data such as credit card numbers.
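The difference between the two forms can be sketched in Python. Both halves are toy examples for illustration only: Python's standard library has no block cipher such as DES, so the one-key half uses an XOR one-time pad, and the two-key half uses textbook RSA with two well-known Mersenne primes and no padding. Neither is suitable for protecting real data; production systems rely on vetted cryptographic libraries.

```python
import secrets

# ---- One-key (symmetric): the same secret key encrypts and decrypts ----
# Toy one-time pad (XOR with a random key as long as the message); this is NOT DES.
def xor_cipher(data: bytes, key: bytes) -> bytes:
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plain_text = b"RM5000"
shared_key = secrets.token_bytes(len(plain_text))  # sender and receiver both need this key
cipher_text = xor_cipher(plain_text, shared_key)
print(xor_cipher(cipher_text, shared_key))         # b'RM5000': XOR with the same key again

# ---- Two-key (asymmetric): the public key encrypts, the private key decrypts ----
# Textbook RSA with two known Mersenne primes and no padding: illustration only, NOT secure.
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                          # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (modular inverse, Python 3.8+)
public_key, private_key = (e, n), (d, n)


def rsa_encrypt(message: bytes, key: tuple[int, int]) -> int:
    exponent, modulus = key
    m = int.from_bytes(message, "big")
    if m >= modulus:
        raise ValueError("message too long for this toy key")
    return pow(m, exponent, modulus)


def rsa_decrypt(c: int, key: tuple[int, int]) -> bytes:
    exponent, modulus = key
    m = pow(c, exponent, modulus)
    # Toy conversion back to bytes: any leading zero bytes of the message would be lost.
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


c = rsa_encrypt(plain_text, public_key)  # anyone may encrypt with the public key
print(rsa_decrypt(c, private_key))       # b'RM5000': only the private-key holder can decrypt
```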

8.2.6 Redundant Array of Independent Disks (RAID)


The DBMS should continue to operate even if one of the hardware components fails. This is very important, especially for real-time processing, where a one-second delay in processing results could affect system performance or cause financial loss. Thus, the hardware that the DBMS runs on must be fault-tolerant, meaning that the DBMS continues operating and processing even if there is a hardware failure.

The five hardware components that should be fault-tolerant are (Connolly &
Begg, 2009):

(a) Disk drives;

(b) Disk controllers;

(c) Central processing unit (CPU);

(d) Power supplies; and

(e) Cooling fans.

One way to provide fault-tolerant hardware is to use RAID.

Redundant array of independent disks (RAID) works by having a large disk array containing an arrangement of several independent disks.


These disks are organised to improve reliability and, at the same time, performance. Reliability comes from storing redundant information, such as mirrored copies or parity, so that the data on a failed disk can be reconstructed. Performance can be increased through data striping, where the data is segmented into equal-size partitions that are distributed across multiple disks. The data appears to be stored on a single large disk but, in fact, it is distributed across several smaller disks that are processed in parallel (Connolly & Begg, 2009).
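Striping and redundancy can be sketched in Python with in-memory lists standing in for disks. This is a simplified, hypothetical model: it places all parity on one extra disk, as RAID level 4 does (RAID 5 spreads parity across the disks), and it does not perform parallel I/O. It shows how data is split into equal-size units across disks and how the contents of one failed disk can be rebuilt from the others using XOR parity.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte; used to compute and to use parity."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def write_striped(data: bytes, n_disks: int, unit: int) -> tuple[list[list[bytes]], list[bytes]]:
    """Stripe data over n_disks in equal-size units and keep one parity unit per stripe."""
    stripe_size = n_disks * unit
    padded = data + b"\x00" * (-len(data) % stripe_size)  # pad so the last stripe is full
    disks: list[list[bytes]] = [[] for _ in range(n_disks)]
    parity: list[bytes] = []                               # the extra (parity) disk
    for start in range(0, len(padded), stripe_size):
        units = [padded[start + i * unit:start + (i + 1) * unit] for i in range(n_disks)]
        for disk, block in zip(disks, units):
            disk.append(block)            # consecutive units land on different disks
        parity.append(xor_blocks(units))  # enough to recompute any ONE lost unit
    return disks, parity


def read_striped(disks: list[list[bytes]], length: int) -> bytes:
    """Read stripe by stripe, disk by disk, and drop the padding."""
    n_stripes = len(disks[0])
    return b"".join(disks[i][s] for s in range(n_stripes) for i in range(len(disks)))[:length]


def rebuild_disk(disks: list[list[bytes]], parity: list[bytes], failed: int) -> list[bytes]:
    """Recompute the failed disk by XOR-ing the surviving disks with the parity disk."""
    survivors = [disk for i, disk in enumerate(disks) if i != failed]
    return [xor_blocks([disk[s] for disk in survivors] + [parity[s]]) for s in range(len(parity))]


data = b"Customer balances: A1=500, A2=800"  # hypothetical sample data
disks, parity = write_striped(data, n_disks=3, unit=4)
original_disk_1 = list(disks[1])
disks[1] = []                                # simulate failure of disk 1
disks[1] = rebuild_disk(disks, parity, failed=1)
print(disks[1] == original_disk_1)             # True
print(read_striped(disks, len(data)) == data)  # True
```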

SELF-CHECK 8.2

1. Define authorisation and authorisation rules.

2. Identify the backup facilities.

3. Briefly explain how encryption can secure the data in a database.

8.3 SECURITY IN MICROSOFT OFFICE ACCESS DATABASE MANAGEMENT SYSTEM (DBMS)

The SQL GRANT and REVOKE statements discussed earlier are not available in Microsoft Office Access. Instead, a database can be secured in Microsoft Office Access by setting a password that must be entered to open the database. The password can be assigned by clicking the Tools menu, then Security.

Thus, only users who key in the correct password can open the database.
However, once a database is open, all the objects in the database can be accessed.
Therefore, it is advisable to change the password regularly.

SELF-CHECK 8.3

How do you set the password to open an existing database in Microsoft Office Access?


8.4 DBMS AND WEB SECURITY


The explosion of websites that make current data accessible to viewers over the Internet raises many security issues. The challenge is to transmit and receive information over the Internet while ensuring that:

(a) It is accessible only to the sender and receiver;

(b) It has not been changed during transmission;

(c) The receiver can be certain that the data came from the sender;

(d) The sender can be certain that the receiver is genuine; and

(e) The sender cannot deny that he or she sent the data.

Another issue that needs to be considered in the web environment is that the information being transmitted may have executable content. Executable content can perform malicious actions such as the following (Connolly & Begg, 2009):

(a) Destroy data or programs;

(b) Reformat complete disks;

(c) Shut down the system; and

(d) Collect and download confidential data.

Nowadays, malicious software (malware) such as computer viruses, as well as spam, is widespread. What do computer viruses and spam mean?

Computer viruses are unauthorised computer code created to destroy data or corrupt the computer.

Spam is unwanted e-mail that we receive from an unknown sender or without having asked to receive it.


Spam can fill up e-mail inboxes, and we waste time deleting it. The following subtopics discuss four methods of securing a DBMS in a web environment (refer to Figure 8.3).

Figure 8.3: Four methods for securing database management system (DBMS)

8.4.1 Proxy Servers


What is a proxy server?

A proxy server is a computer that is located between a web browser and a web server. It intercepts all requests to the web server to determine whether it can fulfil the requests itself.

If it cannot fulfil a request itself, it passes the request on to the web server. One of its main purposes is to improve performance. For instance, assume that User 1 and User 2 access the web through a proxy server. When User 1 requests a certain web page and User 2 later requests the same page, the proxy server simply returns the copy held in its cache instead of fetching the page from the web server again. Thus, retrieval is faster.

Proxy servers can also be used to filter requests. For instance, an organisation might use a proxy server to prevent its employees or clients from accessing certain websites. In this case, known bad or insecure websites can be identified and access to them denied (Connolly & Begg, 2009).
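Both purposes of a proxy server, caching and filtering, can be sketched in a few lines of Python. The `CachingProxy` class, the blocked host and the `fake_web_server` function, which stands in for a real web server, are all hypothetical. A real proxy would also have to decide when cached pages expire, which is omitted here.

```python
from typing import Callable


class CachingProxy:
    """Hypothetical proxy server that caches pages and filters requests."""

    def __init__(self, fetch_from_web_server: Callable[[str, str], str], blocked_hosts: set[str]) -> None:
        self._fetch = fetch_from_web_server  # actually contacts the web server
        self._blocked = blocked_hosts
        self._cache: dict[str, str] = {}

    def get(self, host: str, path: str) -> str:
        if host in self._blocked:  # filtering: known bad sites are refused
            raise PermissionError(f"access to {host} is denied")
        url = host + path
        if url not in self._cache:  # cache miss: pass the request on to the web server
            self._cache[url] = self._fetch(host, path)
        return self._cache[url]  # cache hit: answered by the proxy itself


web_server_calls: list[tuple[str, str]] = []


def fake_web_server(host: str, path: str) -> str:
    """Stand-in for a real web server; records every request it receives."""
    web_server_calls.append((host, path))
    return f"<html>{host}{path}</html>"


proxy = CachingProxy(fake_web_server, blocked_hosts={"bad.example"})
proxy.get("www.example.com", "/news")  # User 1: fetched from the web server
proxy.get("www.example.com", "/news")  # User 2: served from the cache
print(len(web_server_calls))           # 1
try:
    proxy.get("bad.example", "/")
except PermissionError as err:
    print(err)                          # access to bad.example is denied
```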


8.4.2 Firewalls
What does a firewall mean?

A firewall is a system designed to prevent unauthorised access to or from a private network.
Connolly & Begg (2009)

Firewalls can be implemented in hardware, software or a combination of both. All messages or requests entering or leaving the private network (intranet) pass through the firewall, which examines them and blocks those that do not meet the specified security criteria.
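A simple packet-filtering firewall can be sketched as a list of rules checked in order, with anything that matches no rule blocked. The rules, addresses and the DBMS port number 5432 below are hypothetical, chosen only for illustration; the sketch uses Python's standard `ipaddress` module to test whether a source address falls within a network.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str          # "allow" or "deny"
    source: str          # source network in CIDR notation
    port: Optional[int]  # destination port, or None for any port


# Hypothetical policy: only the internal network may reach the DBMS port (assumed 5432).
RULES = [
    Rule("allow", "10.0.0.0/8", 5432),
    Rule("deny", "0.0.0.0/0", 5432),
    Rule("allow", "0.0.0.0/0", 443),  # public HTTPS web server
]


def filter_request(src_ip: str, dst_port: int, rules: list[Rule] = RULES) -> str:
    """Return the action of the first matching rule; block anything unmatched."""
    address = ipaddress.ip_address(src_ip)
    for rule in rules:
        if address in ipaddress.ip_network(rule.source) and rule.port in (None, dst_port):
            return rule.action
    return "deny"  # default: the request does not meet the security criteria


print(filter_request("10.1.2.3", 5432))     # allow
print(filter_request("203.0.113.7", 5432))  # deny
print(filter_request("203.0.113.7", 443))   # allow
print(filter_request("203.0.113.7", 22))    # deny: no rule matches
```

Because rules are checked in order, the narrow "allow" rule for the internal network must come before the broad "deny" rule for the same port.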

8.4.3 Digital Signatures


Do you know what a digital signature is used for?

A digital signature can be used to verify that the data comes from the authorised sender.

It consists of a string of bits computed from two pieces of information (Connolly & Begg, 2009):

(a) The data that is being signed, processed by a signature algorithm; and

(b) The private key or password of the individual wishing to sign the data.
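The idea can be sketched with textbook RSA in Python. This is a toy illustration, not a secure implementation: the key is built from two known Mersenne primes, the SHA-256 digest is reduced to fit the small modulus, and no padding scheme is used. The signer combines the digest of the data with the private key; anyone holding the public key can then check that the signature matches the data.

```python
import hashlib

# Toy RSA key pair from two known Mersenne primes: illustration only, NOT secure.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                          # public exponent: anyone can verify
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent: only the signer knows it


def digest(data: bytes) -> int:
    """String of bits computed from the data (SHA-256), reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Combine the digest of the data with the signer's private key."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Anyone holding the public key can check that the signature matches the data."""
    return pow(signature, E, N) == digest(data)


message = b"Transfer RM5000 to account 1234"  # hypothetical message
signature = sign(message)
print(verify(message, signature))                             # True
print(verify(b"Transfer RM9000 to account 1234", signature))  # False: data was altered
```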

8.4.4 Digital Certificates


What does a digital certificate mean?

A digital certificate is an attachment to an electronic message used to verify that a user sending a message is who he or she claims to be.


It also provides the receiver with the means to encode a reply. A digital certificate can be obtained by applying to a Certificate Authority (CA). The CA issues an encrypted digital certificate that contains the applicant's public key and various other identification information. The receiver of an encrypted message uses the CA's public key to decode the digital certificate attached to the message (Connolly & Begg, 2009).

ACTIVITY 8.2

1. For each of the following situations, identify the appropriate computer-based control and discuss one reason why such a control is chosen:

(a) A national brokerage firm uses an electronic funds transfer (EFT) system to transmit sensitive financial data between locations;

(b) An organisation has set up an off-site computer-based training centre and wishes to restrict access to the site to authorised employees only; and

(c) A small manufacturing firm uses a simple password system to protect its database but realises that it needs a more comprehensive system to grant different privileges, such as read, view, delete or update, to different users.

2. What concerns would you have if you accepted a job as a database administrator and discovered that the database users are entering one common password to log on to the database?

3. An organisation has a database server with three disk devices. The accounting and payroll applications share one of these devices and are experiencing performance problems. You have been asked to investigate the problem. What would you suggest to overcome it?

Summary

• Database security is the mechanism that protects the database against intentional or unintentional threats.

• A threat is any situation or event, whether intentional or unintentional, that may adversely affect a system and, consequently, an organisation.

• There are five types of threats:

  - Theft and fraud;

  - Loss of confidentiality;

  - Loss of privacy;

  - Loss of integrity; and

  - Loss of availability.

• Computer-based security controls for the multi-user environment include authorisation, access controls, views, backup and recovery, encryption and RAID technology.

• The security measures associated with DBMSs on the web include proxy servers, firewalls, digital signatures and digital certificates.

Key Terms

Authentication
Authorisation
Cold backup
Decryption
Digital certificates
Digital signatures
Encryption
Firewalls
Hot backup
Recovery
Redundant array of independent disks (RAID)
Threat


Self-Test

1. Discuss the importance of database security.

2. Discuss the security measures provided by Microsoft Office Access.

3. Explain the approaches to securing a DBMS on the web.

References

About.com. (2014). Database security issues. Retrieved from http://databases.about.com/od/security/Database_Security_Issues.htm

Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to design, implementation and management (4th ed.). Boston, MA: Addison Wesley.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2008). Modern database management (8th ed.). New Jersey, NJ: Prentice-Hall.

Mannino, M. (2011). Database design, application development and administration (5th ed.). Scottsdale, AZ: Ediyu.

Topic 9 Transaction Management

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Identify the ACID properties of transactions;
2. Discuss the concepts of concurrency transparency;
3. Describe the concepts of recovery management;
4. Explain the role of locks to prevent interference problems among
multiple users; and
5. Summarise the role of recovery tools in dealing with database failures.

INTRODUCTION
Keeping the data in a database consistent and reliable is an essential function of the database management system (DBMS). This topic focuses on transaction management, which consists of concurrency control and recovery management, both of which are functions of the DBMS.

In this topic, you will look at the properties of transactions. Under concurrency control, you will study its objectives, the types of interference problems and the tools used to prevent interference problems caused by multiple accesses. You will also find out about the different types of failures that can occur in the database, the recovery tools used by the DBMS and the recovery process.


9.1 DATABASE TRANSACTIONS


What does a database transaction mean?

A database transaction is a set of operations that must be processed as one unit of work.
Mannino (2011)

According to Connolly and Begg (2009), a transaction may be an entire program or a single command and may involve any number of operations on the database.

9.1.1 Transaction Example


How do you define a transaction?

A transaction is a logical unit of work on the database.


Connolly & Begg (2009)

For example, to give all employees a pay rise of 15%, the operations shown in Figure 9.1 are performed for each employee record.

Begin_Transaction
Read EmpNo, Salary
Salary = Salary * 1.15
Write EmpNo, Salary
Commit

Figure 9.1: A sample transaction


This is a simple transaction to give all employees a pay rise of 15%. The Begin_Transaction and Commit statements delimit the transaction: any other structured query language (SQL) statements between them are part of the transaction. In Figure 9.1, the transaction consists of two database operations, Read and Write. If a transaction completes successfully, as in Figure 9.1, it commits and the database reaches a new consistent state.
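A transaction like the one in Figure 9.1 can be sketched with Python's built-in `sqlite3` module. The `Employee` table and its data are hypothetical. Figure 9.1 shows the Read and Write for one employee; a single SQL UPDATE applies them to every row. In `sqlite3`'s default mode, a transaction is opened implicitly before the UPDATE, and the `with conn:` block commits it if no error occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Salary REAL)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [("E1", 2000.0), ("E2", 3000.0)])
conn.commit()


def give_pay_rise(conn: sqlite3.Connection, rate: float) -> None:
    """Read and write every Salary as one unit of work."""
    with conn:  # commits if the block succeeds, rolls back if it raises an exception
        conn.execute("UPDATE Employee SET Salary = Salary * ?", (1 + rate,))


give_pay_rise(conn, 0.15)
for emp_no, salary in conn.execute("SELECT EmpNo, Salary FROM Employee ORDER BY EmpNo"):
    print(emp_no, round(salary, 2))
```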

Besides the Begin_Transaction and Commit statements, the Rollback statement may be used. The Rollback statement removes all the effects of a transaction that does not execute successfully. It can be used in several contexts, for example, to cancel a transaction or to respond to errors. A sample transaction with the Rollback statement is shown in Figure 9.2.

Begin_Transaction
Read ProductNo(X), QtyOnHand
QtyOnHand = QtyOnHand + 10
Write ProductNo(X), QtyOnHand
......
......
Rollback

Figure 9.2: A sample transaction with Rollback

The transaction in Figure 9.2 increments the quantity on hand (QtyOnHand) of product number (ProductNo) X by 10, but before the transaction can commit, it encounters an error and issues a Rollback, thereby removing the effects of the transaction. Since the transaction did not execute successfully, it is aborted and the database must be restored to the consistent state it was in before the transaction started.

The three commands, Begin_Transaction, Commit and Rollback, provide the delimiters of a transaction.
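The Rollback in Figure 9.2 can be sketched with Python's `sqlite3` module in the same way. The `Product` table and the starting quantity of 40 are hypothetical, and the error is simulated by raising an exception; when the `with conn:` block exits with an exception, `sqlite3` rolls the transaction back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductNo TEXT PRIMARY KEY, QtyOnHand INTEGER)")
conn.execute("INSERT INTO Product VALUES ('X', 40)")
conn.commit()

try:
    with conn:  # transaction opened implicitly; commit on success, rollback on exception
        conn.execute("UPDATE Product SET QtyOnHand = QtyOnHand + 10 WHERE ProductNo = 'X'")
        # A later step fails (simulated), so the transaction cannot commit.
        raise RuntimeError("simulated error before commit")
except RuntimeError as err:
    print("Rolled back:", err)

qty = conn.execute("SELECT QtyOnHand FROM Product WHERE ProductNo = 'X'").fetchone()[0]
print(qty)  # 40: the increment of 10 was undone
```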


9.1.2 Transaction Properties


There are four properties that all transactions should follow. They are known as the ACID properties, as shown in Table 9.1.

Table 9.1: Four Transaction Properties (ACID)

Property Description
Atomicity Although a transaction is conceptually atomic, a transaction would
usually consist of a number of steps. It is necessary to make sure that
either all actions of a transaction are completed or the transaction has
no effect on the database. Therefore, a transaction is either completed
successfully or rolled back. This is sometimes called the all-or-nothing
property.
Consistency Although a database may become inconsistent during the execution of
a transaction, it is assumed that a completed transaction preserves the
consistency of the database. For example, if a person withdraws RM100
from an ATM, the person's account is balanced before the withdrawal
(transaction). After the withdrawal, the account must also be balanced.
Isolation No other transactions should view any partial results of the actions of a
transaction since intermediate states may violate consistency. Each
transaction must be executed as if it was the only transaction being
carried out.
Durability Once the transaction has completed successfully, its effects must persist
and a transaction must complete before its effects can be made
permanent. A committed transaction cannot be aborted.

Source: Connolly & Begg (2009)

As stated earlier, the DBMS provides two services to ensure that transactions follow the ACID properties (Mannino, 2011). These two services are:

(a) Concurrency Management


Concurrency management allows users to think that the database is a single-user system when, in actual fact, there are many simultaneous users. For example, there may be many users trying to book airline tickets on a single flight, but concurrency control ensures that the flight is not overbooked. The consistency and isolation properties of transactions are ensured by the concurrency control manager.


(b) Recovery Management


Recovery management ensures that the database returns to a consistent or correct state after a hardware or software failure. For example, if there is a communication failure during an ATM transaction, the effects of the transaction are removed from the database. It is the responsibility of the recovery services to ensure the atomicity and durability properties of transactions.

9.2 CONCURRENCY CONTROL


Many businesses such as banks or airlines have many users accessing the
databases concurrently. Multiple users accessing the database simultaneously
cannot be permitted to interfere with one another. What is the objective of
concurrency control?

The objective of concurrency control is to maximise transaction throughput while preventing interference among multiple users.

Transaction throughput is the number of transactions processed per unit of time; it is a measure of the amount of work performed by the DBMS. Benchmark results of more than one million transactions per second have been reported by the Transaction Processing Performance Council (TPC). Transaction throughput is related to response time: higher transaction throughput means lower response time (Mannino, 2011).

Concurrent transactions manipulating common data may cause interference problems, for example, on the seat-remaining column of a flight table. Many users may want to book seats on a particular flight at the same time, so it is crucial that the DBMS controls concurrent access to the seat-remaining column while users are booking simultaneously (Mannino, 2011).


9.2.1 Interference Problems


There are three interference problems that can be caused by concurrent access.
These are shown in Figure 9.3.

Figure 9.3: Three interference problems

(a) Lost Update Problem


This type of problem occurs when one user's update overwrites another user's update. This is a serious problem, as updates to the database are lost forever. We show an example of this problem using a bank account transaction.

Consider TransactionA and TransactionB (see Table 9.2), in which TransactionA is executing concurrently with TransactionB. TransactionA is withdrawing RM100 from an account with BalanceX (BX), initially RM500, and TransactionB is depositing RM100 into the same account.

Table 9.2: The Lost Update Problem

Time  TransactionA       TransactionB       BX
T1                       Begin_transaction  500
T2    Begin_transaction  Read (BX)          500
T3    Read (BX)          BX = BX + 100      500
T4    BX = BX - 100      Write (BX)         600
T5    Write (BX)         Commit             400
T6    Commit                                400

Source: Connolly & Begg (2009)


TransactionA and TransactionB both read the balance as RM500. TransactionB increases the balance to RM600 and writes it to the database. TransactionA reduces the balance by RM100 and writes the value of RM400 as the balance. TransactionA has therefore overwritten TransactionB's update, and RM100 has been lost in the process. Since one transaction withdraws RM100 and the other deposits RM100, the correct value of BX should be RM500.

(b) Uncommitted Dependency Problem


An uncommitted dependency problem occurs when one transaction reads data written by another transaction before the other transaction commits (Mannino, 2011). This type of interference problem is also known as a dirty read because it is caused by the transaction reading dirty (uncommitted) data.

In Table 9.3, TransactionB updates BalanceX (BX) to RM600 but then aborts the transaction. This could be due to an error, or perhaps because it was updating the wrong account. It aborts the transaction by issuing a Rollback, which causes the value of BX to revert to its original value of RM500. However, TransactionA has already read the value of BX as RM600 at time T5. It then decrements it by RM100, giving BX an incorrect value of RM500, and goes on to commit it. The correct value of BX is RM400.

Table 9.3: Uncommitted Dependency Problem

Time  TransactionA       TransactionB       Balance
T1                       Begin_transaction  500
T2                       Read (BX)          500
T3                       BX = BX + 100      500
T4    Begin_transaction  Write (BX)         600
T5    Read (BX)          ...                600
T6    BX = BX - 100      Rollback           500
T7    Write (BX)                            500
T8    Commit                                500

Source: Connolly & Begg (2009)


(c) Inconsistent Analysis Problem


This problem occurs when a transaction calculating a summary function reads some values before another transaction changes them, but reads other values after that transaction has changed them.

In Table 9.4, TransactionA is transferring RM100 from BalanceX (BX) to BalanceY (BY), while TransactionB is summing BX and BY. TransactionB reads the value of BX after TransactionA has updated it to RM400, but it reads the value of BY as RM500 before TransactionA has incremented it by RM100. This results in an incorrect value of RM900 in Sum at time T9. TransactionB should therefore wait for TransactionA to update BY so that it reads the new value of RM600. The correct value of Sum written to the database should be RM1000.

Table 9.4: Inconsistent Analysis Problem

Time  TransactionA       TransactionB       BX   BY   SUM
T1    Begin_transaction                     500  500  0
T2    Read (BX)                             500  500  0
T3    BX = BX - 100                         500  500  0
T4    Write (BX)         Begin_transaction  400  500  0
T5                       Read (BX)          400  500  0
T6                       Sum = Sum + BX     400  500  400
T7                       Read (BY)          400  500  400
T8                       Sum = Sum + BY     400  500  900
T9    Read (BY)          Write Sum          400  500  900
T10   BY = BY + 100      Commit             400  500  900
T11   Write (BY)                            400  600  900
T12   Commit                                400  600  900
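The three schedules in Tables 9.2 to 9.4 can be replayed step by step in Python to confirm the values shown. In this hypothetical sketch, a dictionary stands in for the database and local variables hold each transaction's working copy; no real DBMS or threads are involved, so the interleaving is fixed exactly as in the tables.

```python
def lost_update() -> int:
    """Replays Table 9.2: TransactionA's write overwrites TransactionB's update."""
    db = {"BX": 500}
    b = db["BX"]  # T2: TransactionB reads BX (500)
    a = db["BX"]  # T3: TransactionA reads BX (500)
    b = b + 100   # T3: TransactionB computes 600
    a = a - 100   # T4: TransactionA computes 400
    db["BX"] = b  # T4: TransactionB writes 600
    db["BX"] = a  # T5: TransactionA writes 400, overwriting B's update
    return db["BX"]


def uncommitted_dependency() -> int:
    """Replays Table 9.3: TransactionA reads a value that TransactionB later rolls back."""
    db = {"BX": 500}
    before_b = db["BX"]  # value restored if TransactionB rolls back
    b = db["BX"] + 100   # T2-T3: TransactionB reads BX and computes 600
    db["BX"] = b         # T4: TransactionB writes 600 (not yet committed)
    a = db["BX"]         # T5: TransactionA reads the dirty value 600
    a = a - 100          # T6: TransactionA computes 500 ...
    db["BX"] = before_b  # T6: ... while TransactionB rolls back to 500
    db["BX"] = a         # T7: TransactionA writes 500 and then commits
    return db["BX"]


def inconsistent_analysis() -> int:
    """Replays Table 9.4: TransactionB sums BX after A's update but BY before it."""
    db = {"BX": 500, "BY": 500}
    db["BX"] = db["BX"] - 100  # T2-T4: TransactionA takes RM100 out of BX
    total = 0
    total += db["BX"]          # T5-T6: TransactionB adds the new BX (400)
    total += db["BY"]          # T7-T8: TransactionB adds the old BY (500)
    db["BY"] = db["BY"] + 100  # T9-T11: TransactionA puts RM100 into BY
    return total


print(lost_update())             # 400, but the correct value is 500
print(uncommitted_dependency())  # 500, but the correct value is 400
print(inconsistent_analysis())   # 900, but the correct sum is 1000
```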


9.2.2 Concurrency Control Tools


Here, we look at the tools used by most DBMSs to prevent interference problems, namely locks and the two-phase locking protocol (2PL).
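Both tools can be modelled in a short Python sketch. The `LockManager` and `TwoPhaseTransaction` classes below are hypothetical and single-threaded: the lock manager applies the shared/exclusive compatibility rules, returning False where a real DBMS would make the transaction wait, and the transaction class enforces the 2PL rule that no new lock may be acquired after a lock has been released. Reads and writes themselves are not modelled, and a real lock manager would also queue waiting transactions and handle deadlocks.

```python
class LockManager:
    """Hypothetical lock table for shared (S) and exclusive (X) locks."""

    def __init__(self) -> None:
        self.holders: dict[str, dict[str, str]] = {}  # item -> {transaction: "S" or "X"}

    def request(self, txn: str, item: str, mode: str) -> bool:
        """Return True if the lock is granted, False if the transaction must wait."""
        others = [m for t, m in self.holders.get(item, {}).items() if t != txn]
        # Only S with S is compatible; any request involving an X lock must wait.
        if others and (mode == "X" or "X" in others):
            return False
        held = self.holders.setdefault(item, {})
        if held.get(txn) != "X":  # an S lock may be upgraded; an X lock is kept as is
            held[txn] = mode
        return True

    def release(self, txn: str, item: str) -> None:
        self.holders.get(item, {}).pop(txn, None)


class TwoPhaseTransaction:
    """Enforces the growing and shrinking phases of 2PL on top of the lock manager."""

    def __init__(self, name: str, manager: LockManager) -> None:
        self.name = name
        self.manager = manager
        self.locked: set[str] = set()
        self.shrinking = False  # False: growing phase; True: shrinking phase

    def lock(self, item: str, mode: str) -> bool:
        # 2PL rule: after releasing any lock, no new locks may be acquired.
        if self.shrinking:
            raise RuntimeError(f"{self.name} violates 2PL: lock on {item} requested after a release")
        granted = self.manager.request(self.name, item, mode)
        if granted:
            self.locked.add(item)
        return granted

    def unlock(self, item: str) -> None:
        self.shrinking = True  # the first release starts the shrinking phase
        self.manager.release(self.name, item)
        self.locked.discard(item)

    def commit(self) -> None:
        # Holding every lock until commit, as in Tables 9.6 and 9.7, is a stricter form of 2PL.
        for item in list(self.locked):
            self.unlock(item)


manager = LockManager()
a = TwoPhaseTransaction("TransactionA", manager)
b = TwoPhaseTransaction("TransactionB", manager)
print(b.lock("BX", "X"))  # True: TransactionB gets the exclusive lock on BX
print(a.lock("BX", "X"))  # False: TransactionA must wait
b.commit()                # TransactionB commits and releases BX
print(a.lock("BX", "X"))  # True: TransactionA can now proceed
print(manager.request("T3", "BY", "S"), manager.request("T4", "BY", "S"))  # True True
try:
    b.lock("BY", "S")     # TransactionB is already in its shrinking phase
except RuntimeError as err:
    print(err)
```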

(a) Locks
Locks prevent other users from accessing a database item in use. A
database item can be a column, row or even an entire table. A transaction
must acquire a lock before accessing a database item. There are two types of
locks:

(i) Shared Lock (S)


A shared lock must be obtained before reading a database item. A transaction holding a shared lock can read the data item but not update it. Any number of transactions can hold a shared lock on the same data item at the same time.

(ii) Exclusive Lock (X)


An exclusive lock must be obtained before writing to a data item. The transaction can both read and write the data item. An exclusive lock gives a transaction exclusive access to the data item: as long as the transaction holds the lock, no other transaction can read or update that data item (Connolly & Begg, 2009; Mannino, 2011). This is summarised in Table 9.5.

Table 9.5: Locking Conflicts

                User 2 requests
User 1 holds    S lock          X lock
S lock          Lock granted    User 2 waits
X lock          User 2 waits    User 2 waits

Source: Mannino (2011)

Some DBMSs allow locks to be upgraded or downgraded. If a transaction holds a shared lock on a database item but wants to update it, it can request that the lock be upgraded to an exclusive lock. In the same way, an exclusive lock held by a transaction can be downgraded to a shared lock.


(b) Two-Phase Locking Protocol (2PL)


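The concurrency control manager ensures that all transactions follow 2PL. Under this protocol, every transaction is divided into two phases, as shown in Figure 9.4: a growing phase, in which the transaction acquires locks but releases none, and a shrinking phase, in which it releases locks but acquires no new ones.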

Figure 9.4: Two phases of 2PL

A transaction need not request all its locks at the same time. It usually acquires some locks, does some processing and continues to acquire more locks as needed. However, it releases locks only when it needs no new locks (Connolly & Begg, 2009).

The two rules of the 2PL are as follows:

(i) Before reading or writing a data item, the transaction must acquire an S or X lock on that data item; and

(ii) After releasing a lock, the transaction does not acquire any new locks.


We now look at how 2PL is used to overcome the three interference problems described in subtopic 9.2.1.

(i) Preventing the Lost Update Problem Using 2PL


The 2PL can be used to solve the lost update problem as shown in
Table 9.6.

Table 9.6: Preventing the Lost Update Problem

Time  TransactionA        TransactionB        BX
T1                        Begin_transaction   500
T2    Begin_transaction   XLock (BX)          500
T3    XLock (BX)          Read (BX)           500
T4    Wait                BX = BX + 100       500
T5    Wait                Write (BX)          600
T6    Wait                Commit/Unlock (BX)  600
T7    Read (BX)                               600
T8    BX = BX - 100                           600
T9    Write (BX)                              500
T10   Commit/Unlock (BX)                      500

TransactionB first obtains an exclusive (X) lock on BX. It then increments BX by RM100 and writes this value to the database. TransactionA requests an exclusive lock on BX, but the request is not granted because the lock is held by TransactionB. As this is an exclusive lock, TransactionA has to wait for TransactionB to release it, and TransactionB releases the lock only when it commits. When TransactionA obtains the exclusive lock on BX, it reads the value of BX as RM600, decrements it by RM100 and commits BX as RM500, instead of the incorrect value of RM400 obtained when 2PL was not applied.
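The effect of the exclusive lock in Table 9.6 can be imitated with Python threads. In this sketch, a `threading.Lock` stands in for the X lock on BX (an assumption for illustration; a DBMS lock manager is far more elaborate), and the `time.sleep` call only widens the window in which the two updates could interleave. Because each read-modify-write runs while the lock is held, the final value is RM500 whichever transaction runs first.

```python
import threading
import time

balance = {"BX": 500}        # shared data item BX, initially RM500
bx_lock = threading.Lock()   # stands in for the exclusive (X) lock on BX


def update(amount):
    with bx_lock:                        # XLock(BX): the other thread must wait
        value = balance["BX"]            # Read(BX)
        time.sleep(0.01)                 # widen the window for interference
        balance["BX"] = value + amount   # Write(BX)
    # Leaving the with-block corresponds to Commit/Unlock(BX).


tb = threading.Thread(target=update, args=(100,))    # TransactionB: +RM100
ta = threading.Thread(target=update, args=(-100,))   # TransactionA: -RM100
tb.start()
ta.start()
tb.join()
ta.join()
print(balance["BX"])  # 500
```

If the `with bx_lock:` line is removed, both threads can read RM500 before either writes, and one of the two updates can then be lost.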

(ii) Preventing the Uncommitted Dependency Problem Using 2PL


The 2PL can be used to solve the uncommitted dependency problem
as shown in Table 9.7.

Table 9.7: Preventing the Uncommitted Dependency Problem

Time  TransactionA        TransactionB          BX
T1                        Begin_transaction     500
T2                        XLock(BX)             500
T3                        Read(BX)              500
T4    Begin_transaction   BX = BX + 100         600
T5    XLock(BX)           Write(BX)             600
T6    Wait                Rollback/Unlock(BX)   500
T7    Read(BX)                                  500
T8    BX = BX - 100                             500
T9    Write(BX)                                 400
T10   Commit/Unlock(BX)                         400

TransactionB first obtains an exclusive lock on BX. It then increments BX by RM100, giving RM600, but before it commits this value, a rollback is issued. The rollback removes the updates made by TransactionB, so the original value of BX, RM500, remains unchanged. Meanwhile, TransactionA starts and requests an exclusive lock on BX, but the request is not granted because the exclusive lock on BX is held by TransactionB. When TransactionA finally obtains the exclusive lock, it reads BX as RM500, decrements it by RM100 and commits BX as RM400. This is the correct value, unlike the result obtained previously when 2PL was not used.
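The sketch below replays Table 9.7 step by step with a hypothetical `Item` class that keeps the before-image of an uncommitted write. TransactionA cannot obtain the X lock while TransactionB holds it, so it never reads the uncommitted RM600; after the rollback restores RM500, TransactionA's update produces RM400.

```python
class Item:
    """Hypothetical data item with a single X lock and a before-image."""

    def __init__(self, value):
        self.value = value
        self.before_image = None
        self.locked_by = None

    def xlock(self, txn):
        # Grant the lock only if it is free or already held by txn.
        if self.locked_by not in (None, txn):
            return False          # the caller must wait
        self.locked_by = txn
        return True

    def write(self, txn, new_value):
        if self.locked_by != txn:
            raise RuntimeError("an X lock is required to write")
        if self.before_image is None:
            self.before_image = self.value   # remember the value for undo
        self.value = new_value

    def end(self, txn, commit):
        # Commit keeps the new value; rollback restores the before-image.
        if self.locked_by != txn:
            raise RuntimeError("only the lock holder can end its transaction")
        if not commit and self.before_image is not None:
            self.value = self.before_image
        self.before_image = None
        self.locked_by = None                # Unlock


bx = Item(500)
bx.xlock("B")
bx.write("B", bx.value + 100)     # T4-T5: BX = 600, not yet committed
waited = not bx.xlock("A")        # T5: request refused, so A waits
bx.end("B", commit=False)         # T6: rollback restores BX = 500 and unlocks
granted = bx.xlock("A")           # A now obtains the X lock
bx.write("A", bx.value - 100)     # T7-T9: A reads 500 and writes 400
bx.end("A", commit=True)          # T10: commit/unlock
print(bx.value)  # 400
```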

(iii) Preventing the Inconsistent Analysis Problem Using 2PL


The 2PL can be used to solve the inconsistent analysis problem as
shown in Table 9.8.

Table 9.8: Preventing the Inconsistent Analysis Problem

Time  TransactionA           TransactionB           BX    BY    Sum
T1    Begin_transaction                             500   500   0
T2    XLock(BX)                                     500   500   0
T3    Read(BX)                                      500   500   0
T4    BX = BX - 100          Begin_transaction      400   500   0
T5    Write(BX)              SLock(BX)              400   500   0
T6    XLock(BY)              Wait                   400   500   0
T7    Read(BY)               Wait                   400   500   0
T8    BY = BY + 100          Wait                   400   500   0
T9    Write(BY)              Wait                   400   600   0
T10   Commit/Unlock(BX, BY)  Wait                   400   600   0
T11                          Read(BX)               400   600   0
T12                          Sum = Sum + BX         400   600   400
T13                          SLock(BY)              400   600   400
T14                          Read(BY)               400   600   400
T15                          Sum = Sum + BY         400   600   1000
T16                          Write(Sum)             400   600   1000
T17                          Commit/Unlock(BX, BY)  400   600   1000

TransactionA obtains exclusive locks on both BX and BY. It then updates BX and BY, releasing the locks only when the changes to BX and BY are committed. Meanwhile, when TransactionB requests a shared lock on BX, it must wait, as TransactionA holds the exclusive lock. When the shared lock on BX is granted after TransactionA commits, TransactionB adds BX to Sum and then requests a shared lock on BY, which is granted. TransactionB then adds BY to Sum, giving a total value of RM1000, unlike Table 9.4, which gave an incorrect value of RM900 when 2PL was not used.

(iv) Deadlocks
The use of locks to solve interference problems can lead to deadlocks. A deadlock occurs when two transactions are each waiting for a lock that is held by the other. It is the problem of mutual waiting (Connolly & Begg, 2009; Mannino, 2011).

Table 9.9 illustrates the problem of deadlock. It shows two transactions that are deadlocked because each is waiting for the other to release a lock on an item it holds.

Table 9.9: Deadlock between Two Transactions

Time  TransactionA        TransactionB
T1    Begin_transaction
T2    XLock(BX)           Begin_transaction
T3    Read(BX)            XLock(BY)
T4    BX = BX + 100       Read(BY)
T5    Write(BX)           BY = BY - 100
T6    XLock(BY)           Write(BY)
T7    Wait                XLock(BX)
T8    Wait                Wait
T9    Wait                Wait
T10   ...                 Wait
T11   ...                 ...
T12   ...                 ...

Source: Connolly & Begg (2009)

TransactionA has an exclusive lock on BX and, at time T6, requests an exclusive lock on BY, but has to wait because that lock is held by TransactionB. Meanwhile, at time T7, TransactionB requests an exclusive lock on BX, but has to wait because that lock is held by TransactionA. Neither transaction can continue, because each is waiting for a lock it cannot obtain until the other completes (Connolly & Begg, 2009).

To control deadlocks, most DBMSs use a simple time-out policy. In this method, the concurrency control manager aborts any transaction that has been waiting for more than a specified time. However, the policy may also abort transactions that are not in a deadlock, so the time-out interval should be long enough that mainly deadlocked transactions are affected (Mannino, 2011).
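The time-out policy can be illustrated with Python's `threading.Lock.acquire(timeout=...)`, which returns `False` if the lock is not obtained in time. In this sketch, the two threads reproduce the interleaving of Table 9.9 (a `threading.Barrier` makes sure each holds its first lock before requesting the second). The two time-out values are hypothetical and deliberately different so that the outcome is predictable: TransactionA times out first, is treated as aborted and releases its lock on BX, which lets TransactionB finish. A real DBMS would also roll back and restart the aborted transaction.

```python
import threading

bx_lock = threading.Lock()           # X lock on BX
by_lock = threading.Lock()           # X lock on BY
both_locked = threading.Barrier(2)   # forces the interleaving of Table 9.9
outcome = {}


def transaction(name, first, second, timeout):
    with first:                  # XLock on the first item
        both_locked.wait()       # each transaction now holds one lock
        # Request the other item's X lock, giving up after `timeout` seconds.
        if second.acquire(timeout=timeout):
            second.release()
            outcome[name] = "committed"
        else:
            outcome[name] = "aborted"   # time-out: abort and release locks
    # Leaving the with-block releases the first lock.


ta = threading.Thread(target=transaction, args=("A", bx_lock, by_lock, 0.2))
tb = threading.Thread(target=transaction, args=("B", by_lock, bx_lock, 5.0))
ta.start()
tb.start()
ta.join()
tb.join()
print(outcome)
```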

(v) Locking Granularity


Table 9.10 defines locking granularity, coarse granularity and finer locks. The coarser the lock, the more transactions it blocks: if a transaction obtains a lock on the entire database, no other user can access the database and everyone must wait.

Table 9.10: Locking Granularity, Coarse Granularity and Finer Locks

Item                  Definition
Locking granularity   Refers to the size of the database item locked
Coarse granularity    Refers to large items such as the entire database or an entire table
Finer locks           Refers to small items such as a row or a column
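A lock manager must decide whether two locks at different granularities conflict. In this simplified sketch, each lockable item is written as a hypothetical path such as `("db",)` for the whole database, `("db", "Acct")` for a table and `("db", "Acct", 1000)` for a row; two items overlap when one contains the other, and two locks conflict when their items overlap and at least one lock is exclusive. Real DBMSs use more elaborate schemes than this.

```python
def overlaps(item_a, item_b):
    """True if one item contains the other (or they are the same item)."""
    shorter = min(len(item_a), len(item_b))
    return item_a[:shorter] == item_b[:shorter]


def conflicts(lock_a, lock_b):
    """Two locks conflict if their items overlap and either lock is X."""
    (item_a, mode_a), (item_b, mode_b) = lock_a, lock_b
    return overlaps(item_a, item_b) and "X" in (mode_a, mode_b)


database_lock = (("db",), "X")                # coarse: the entire database
table_lock = (("db", "Acct"), "S")            # coarse: an entire table
row_1000_lock = (("db", "Acct", 1000), "X")   # finer: a single row
row_1514_lock = (("db", "Acct", 1514), "X")   # finer: another row

print(conflicts(database_lock, row_1514_lock))   # True: everyone must wait
print(conflicts(table_lock, row_1000_lock))      # True: the row is inside the table
print(conflicts(row_1000_lock, row_1514_lock))   # False: different rows
```

The last line shows why finer locks allow more concurrency: two transactions can update different rows of the same table at the same time.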

(vi) Optimistic Approaches


The use of locks and 2PL is a pessimistic approach to concurrency control: locking assumes that every transaction may conflict. Optimistic concurrency control approaches assume that conflicts are rare, so it is more efficient to check for conflicts than to prevent them with locks. In this approach, transactions access the database without obtaining locks. The concurrency control manager then checks for conflicts, and if a conflict has occurred, it issues a rollback and restarts the problematic transaction (Mannino, 2011).
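The optimistic approach can be sketched with a version number on each data item (an assumption; version checking is only one way to detect conflicts). The transaction reads without locking, and at commit time checks whether the version it read is still current; if another transaction has committed a change in the meantime, the work is discarded and the transaction restarts. The hypothetical function `withdraw_100` simulates such an interfering commit during the first attempt. In a real concurrent system, the validation and the write must be performed atomically.

```python
class VersionedItem:
    """Hypothetical data item with a version number for validation."""

    def __init__(self, value):
        self.value = value
        self.version = 0


def run_optimistic(item, compute, max_restarts=3):
    """Read without locks, validate at commit, restart on conflict.
    Returns the number of restarts that were needed."""
    for restarts in range(max_restarts + 1):
        seen_version = item.version        # read phase: no lock is taken
        new_value = compute(item.value)
        if item.version == seen_version:   # validation: nobody else committed
            item.value = new_value         # write phase
            item.version += 1
            return restarts
        # Conflict detected: discard the work and restart the transaction.
    raise RuntimeError("too many restarts")


bx = VersionedItem(500)
interfered = False


def withdraw_100(value):
    global interfered
    if not interfered:                 # simulate another transaction that
        interfered = True              # commits +RM100 during the first attempt
        bx.value, bx.version = bx.value + 100, bx.version + 1
    return value - 100


restarts = run_optimistic(bx, withdraw_100)
print(bx.value, restarts)  # 500 1
```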

9.3 RECOVERY MANAGEMENT


What does recovery management mean?

Recovery management is a service provided by the DBMS to restore the database to a correct state after a failure.

Here, we look at the different types of failures, the tools used by recovery management and the recovery process that uses these tools.

9.3.1 Database Failures


There are many types of failures that can affect database systems, affecting main memory as well as secondary storage. The five causes of failure are (Connolly & Begg, 2009):

(a) System crashes due to hardware, software or network errors, affecting main memory;

(b) Media failures, such as disk crashes, affecting parts of secondary storage;

(c) Application software errors, such as logical errors in the program accessing the database;

(d) Natural physical disasters, such as power failures, fires or earthquakes; and

(e) Carelessness or unintentional destruction of data or facilities by users.

9.3.2 Recovery Tools


The DBMS is equipped with the following four tools to support the recovery process:

(a) Log File

The log file contains information about all changes made to the database. It provides a history of the database changes made by transactions. The log file contains the following information:

(i) Transaction records. They contain (Connolly & Begg, 2009):

• Transaction identifiers;

• Type of log record (transaction start, insert, update, delete, abort and commit);

• Identifier of the data item (table, row, column) affected by the database action (insert, delete and update operations);

• Before-image of the data item, that is, its value before the change (update and delete operations only);

• After-image of the data item, that is, its value after the change (insert and update operations only); and

• Log management information, such as pointers to the previous and next log records for the transaction.

(ii) Checkpoint records, which are described under (b) Checkpoint. Table 9.11 shows a section of a log file, reproduced from Mannino (2011).

Table 9.11: A Section of the Log File

Tid  Time   Operation  Table  Row   Column   Before Image  After Image
T1   10:12  Start
T1   10:13  Update     Acct   1000  AcctBal  100           200
T1   10:14  Update     Acct   1514  AcctBal  500           400
T1   10:15  Commit

The recovery manager can perform the following two operations using the log:

• Undo operation: The database reverts to its previous state by substituting the old value (before-image) for whatever value is stored in the database (after-image).

• Redo operation: The database establishes a new state by substituting the new value (after-image) for whatever value is stored in the database (before-image).
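The undo and redo operations can be sketched using the two update records of Table 9.11. In this sketch, the database is a Python dictionary keyed by (table, row, column), which is an assumption for illustration. Redo applies after-images in log order; undo applies before-images in reverse log order, so that when an item was changed more than once, the oldest value is the one that remains.

```python
# The two update records of Table 9.11:
# (Tid, Table, Row, Column, Before Image, After Image)
log = [
    ("T1", "Acct", 1000, "AcctBal", 100, 200),
    ("T1", "Acct", 1514, "AcctBal", 500, 400),
]


def redo(db, records):
    # Re-establish the new state: apply after-images in log order.
    for _tid, table, row, column, _before, after in records:
        db[(table, row, column)] = after


def undo(db, records):
    # Revert to the previous state: apply before-images in reverse order.
    for _tid, table, row, column, before, _after in reversed(records):
        db[(table, row, column)] = before


db = {("Acct", 1000, "AcctBal"): 100, ("Acct", 1514, "AcctBal"): 500}
redo(db, log)
after_redo = dict(db)   # balances 200 and 400
undo(db, log)           # balances back to 100 and 500
print(after_redo, db)
```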

(b) Checkpoint
What does checkpoint mean?

A checkpoint is a point in time at which all transaction activity stops while the database and the log file are synchronised.

At this point, a checkpoint record is written to the log and the database buffers are written to disk.

Checkpoints are taken at periodic intervals and involve the following operations (Connolly & Begg, 2009; Mannino, 2011):

(i) Writing all log records in main memory to secondary storage;

(ii) Writing all modified blocks in the database buffers to secondary storage; and

(iii) Writing a checkpoint record to the log file.
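The three checkpoint operations can be sketched with a toy in-memory model; the class `ToyDBMS` and its attributes are hypothetical names for illustration. Note that the checkpoint record is written only after the log records and the modified blocks have reached secondary storage.

```python
class ToyDBMS:
    """Hypothetical in-memory model of what a checkpoint writes out."""

    def __init__(self):
        self.log_buffer = []    # log records still in main memory
        self.dirty_blocks = {}  # modified blocks still in database buffers
        self.log_file = []      # log file on secondary storage
        self.disk = {}          # database on secondary storage

    def checkpoint(self):
        # (i) Write all log records in main memory to secondary storage.
        self.log_file.extend(self.log_buffer)
        self.log_buffer.clear()
        # (ii) Write all modified blocks in the database buffers to disk.
        self.disk.update(self.dirty_blocks)
        self.dirty_blocks.clear()
        # (iii) Write a checkpoint record to the log file.
        self.log_file.append(("CHECKPOINT",))


dbms = ToyDBMS()
dbms.log_buffer.append(("T1", "Update", "Acct", 1000, "AcctBal", 100, 200))
dbms.dirty_blocks[("Acct", 1000)] = {"AcctBal": 200}
dbms.checkpoint()
print(dbms.log_file[-1], dbms.disk)
```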

(c) Backup Mechanism


Backup copies of the database and the log file are made at regular intervals and are used if the database has been damaged or destroyed. A backup can be a complete copy of the database or an incremental backup consisting only of the modifications made since the last complete or incremental backup (Connolly & Begg, 2009).
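Restoring from a complete backup followed by incremental backups can be sketched as follows. The database is modelled as a Python dictionary (an assumption), each incremental backup keeps only the items whose values changed since the previous backup, and deletions are not handled in this simplified version. The incremental backups must be applied in the order in which they were taken.

```python
def incremental_backup(current, previous_backup_state):
    """Keep only the items whose values changed since the previous backup.
    (Deleted items are not handled in this simplified sketch.)"""
    return {key: value for key, value in current.items()
            if previous_backup_state.get(key) != value}


def restore(full_backup, incrementals):
    """Start from the complete backup, then apply incrementals in order."""
    db = dict(full_backup)
    for changes in incrementals:
        db.update(changes)
    return db


db = {"E101": 4000, "E102": 3500}   # hypothetical salaries
full = dict(db)                      # complete backup
db["E101"] = 4200
inc1 = incremental_backup(db, full)
state_at_inc1 = dict(db)
db["E102"] = 3675
inc2 = incremental_backup(db, state_at_inc1)
print(inc1, inc2)  # {'E101': 4200} {'E102': 3675}
restored = restore(full, [inc1, inc2])
print(restored == db)  # True
```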

(d) Transactions and Recovery


It is the responsibility of the recovery manager to ensure that the 'atomicity' and 'durability' properties of transactions are maintained if a failure occurs. Let us consider the read and write operations in the database.

For example, suppose we want to give an employee a 5% salary rise:

(i) To perform the read operation, the database does the following:

• Finds the address of the disk block that contains the employee record;

• Transfers the disk block into a database buffer in main memory; and

• Copies the salary data from the database buffer into a variable.

(ii) To perform the write operation, the database does the following:

• Finds the address of the disk block that contains the employee record;

• Transfers the disk block into a database buffer in main memory;

• Copies the salary data from a variable into the database buffer; and

• Writes the database buffer back to disk.

Database buffers are areas of main memory through which data is transferred to and from secondary storage. Buffers are flushed to secondary storage when they are full or when the transaction commits, and only then are update operations considered permanent. If a failure occurs between writing to the buffers and flushing the buffers to secondary storage, the recovery manager must determine whether the transaction has committed.

If the transaction has committed, then to ensure durability, the recovery manager has to redo the transaction's updates to the database.

If the transaction has not committed, then the recovery manager has to undo the effects of the transaction on the database to ensure atomicity (Connolly & Begg, 2009).
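The read and write steps for the 5% salary rise can be sketched with dictionaries standing in for the disk, the block directory and the database buffers; all names and values here are hypothetical. The point to notice is that after the write operation, the new salary exists only in the buffer until the buffer is flushed, which is exactly the window in which a failure forces the recovery manager to decide between redo and undo.

```python
disk = {"block7": {"E101": 4000}}   # hypothetical disk block: employee -> salary
block_address = {"E101": "block7"}  # where each employee record is stored
buffers = {}                        # database buffers in main memory


def fetch(block):
    # Transfer the disk block into a database buffer (if not already there).
    if block not in buffers:
        buffers[block] = dict(disk[block])
    return buffers[block]


def read_salary(emp):
    block = block_address[emp]       # find the address of the disk block
    return fetch(block)[emp]         # copy the salary into a variable


def write_salary(emp, salary):
    block = block_address[emp]       # find the address of the disk block
    fetch(block)[emp] = salary       # copy the variable into the buffer


def flush(block):
    disk[block] = dict(buffers[block])   # write the buffer back to disk


salary = read_salary("E101")
write_salary("E101", salary + salary * 5 // 100)   # 5% rise
on_disk_before_flush = disk["block7"]["E101"]      # still 4000
flush("block7")                                    # e.g. when the transaction commits
print(on_disk_before_flush, disk["block7"]["E101"])  # 4000 4200
```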

SELF-CHECK 9.1
1. What are the causes of database failures?

2. Differentiate between the four types of recovery tools.

9.3.3 Recovery Techniques


The recovery technique used depends on the extent of damage to the database:

(a) If the damage to the database is massive, the last backup copy of the database is restored and the update operations of committed transactions are reapplied using the log file; and

(b) If the database is not physically damaged but is in an inconsistent state, there are two techniques available to the DBMS to help recover the database:

(i) Deferred Update


In the deferred update approach, updates are written to the database only when a transaction reaches its commit point. At checkpoint time, only the updates of committed transactions are written to the database. Therefore, undo operations are not necessary and are not used in the deferred update technique, because transactions that have not committed have not changed the database. However, the recovery manager has to redo the committed transactions to ensure that their changes are permanently recorded in the database.

After a failure, the log records are examined to identify the transactions that were active at the time the failure occurred. Starting at the last entry in the log file, we go back to the most recent checkpoint record (Connolly & Begg, 2009):

• Any transaction with both transaction start and transaction commit log records should be redone, using the after-image log records for the transaction; and

• For any transaction with both transaction start and transaction abort log records, nothing needs to be done.
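A restart for deferred update can be sketched as follows, using a hypothetical log format of Python tuples. The scan runs backwards from the last entry to the most recent checkpoint record to find transactions that committed after the checkpoint; their after-images are then reapplied in log order. Transactions without a commit record are ignored, because under deferred update they never changed the database.

```python
def deferred_update_restart(log, db):
    """Hypothetical restart for deferred update.

    Log records are tuples: ("START", tid), ("UPDATE", tid, item, before, after),
    ("COMMIT", tid), ("ABORT", tid) or ("CHECKPOINT",).
    """
    # Scan back from the last entry to the most recent checkpoint record,
    # collecting the transactions that committed in that part of the log.
    redo_set = set()
    for record in reversed(log):
        if record[0] == "CHECKPOINT":
            break
        if record[0] == "COMMIT":
            redo_set.add(record[1])
    # Redo: reapply their after-images in log order. Aborted or unfinished
    # transactions never changed the database, so nothing is undone.
    for record in log:
        if record[0] == "UPDATE" and record[1] in redo_set:
            _, _, item, _before, after = record
            db[item] = after
    return redo_set


log = [
    ("START", "T1"), ("UPDATE", "T1", "BX", 500, 400), ("COMMIT", "T1"),
    ("START", "T2"), ("UPDATE", "T2", "BY", 500, 600),
    ("CHECKPOINT",),
    ("COMMIT", "T2"),
    ("START", "T3"), ("UPDATE", "T3", "BX", 400, 300),   # never committed
]
db = {"BX": 400, "BY": 500}   # T2's committed update was lost in the failure
redone = deferred_update_restart(log, db)
print(sorted(redone), db)  # ['T2'] {'BX': 400, 'BY': 600}
```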

(ii) Immediate Update


In the immediate update approach, updates are written to the
database as they occur without waiting to reach the commit point.
Database writes also occur at checkpoint time. If the transaction has
committed, then the recovery manager has to redo the transaction
updates to the database. If the transaction has not committed, then the
recovery manager has to undo the effects of the transaction.

Database writes must occur after the corresponding writes to the log file. This is known as the write-ahead log protocol.

If updates were made to the database first and a failure occurred before the log records were written, the recovery manager would not be able to identify which transactions need to be undone or redone.

The recovery manager examines the log file after a failure to determine whether each transaction needs to be undone or redone (Connolly & Begg, 2009):

(a) Any transaction with both transaction start and transaction commit log records should be redone, using the after-image log records for the transaction; and

(b) Any transaction with a transaction start log record but no transaction commit log record should be undone, using the before-image log records.
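The sketch below shows both the write-ahead log protocol and restart for immediate update, using a hypothetical log format of Python tuples. The `write` function appends the log record before changing the database. On restart, uncommitted transactions are undone with before-images (newest record first) and committed transactions are redone with after-images (oldest record first). For simplicity, the whole log is scanned rather than stopping at the most recent checkpoint.

```python
log = []
db = {"BX": 500, "BY": 500}


def write(tid, item, value):
    # Write-ahead log protocol: log the before- and after-images first,
    # and only then change the database.
    log.append(("UPDATE", tid, item, db[item], value))
    db[item] = value


def immediate_update_restart(log, db):
    committed = {record[1] for record in log if record[0] == "COMMIT"}
    # Undo unfinished transactions with before-images, newest record first.
    for record in reversed(log):
        if record[0] == "UPDATE" and record[1] not in committed:
            db[record[2]] = record[3]
    # Redo committed transactions with after-images, oldest record first.
    for record in log:
        if record[0] == "UPDATE" and record[1] in committed:
            db[record[2]] = record[4]


log.append(("START", "T1"))
write("T1", "BX", 400)
log.append(("COMMIT", "T1"))
log.append(("START", "T2"))
write("T2", "BY", 600)        # the failure occurs before T2 commits
immediate_update_restart(log, db)
print(db)  # {'BX': 400, 'BY': 500}
```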

To help you understand recovery from a system failure, Figure 9.5 shows a number of transactions with their commit times, the most recent checkpoint (tc) and the system failure (tf).

Figure 9.5: Transaction Timeline


Source: Mannino (2011)

A summary of recovery operations for the transaction timeline in Figure 9.5 for
deferred update is shown in Table 9.12 and a summary of the operations for
immediate update is shown in Table 9.13.

Table 9.12: Summary of Restart Work for the Deferred Update Technique

Transaction ID  Description                                 Operation
T1              Finished before CP                          None
T2              Started before CP; finished before failure  Redo
T3              Started before CP; not yet finished         None
T4              Started after CP; finished before failure   Redo
T5              Started after CP; not yet finished          None

Source: Mannino (2011)

Table 9.13: Summary of Restart Work for the Immediate Update Technique

Transaction ID  Description                                 Operation
T1              Finished before CP                          None
T2              Started before CP; finished before failure  Redo
T3              Started before CP; not yet finished         Undo
T4              Started after CP; finished before failure   Redo
T5              Started after CP; not yet finished          Undo

Source: Mannino (2011)
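The decisions in Tables 9.12 and 9.13 follow a simple rule, sketched below. The commit times are hypothetical values chosen to match the five cases, with the checkpoint at time 10 and the failure at time 20; `None` means the transaction had not finished when the failure occurred.

```python
def restart_work(commit_time, checkpoint, failure, technique):
    """Restart operation for one transaction, as in Tables 9.12 and 9.13.
    commit_time is None if the transaction had not finished at the failure."""
    if commit_time is not None and commit_time < checkpoint:
        return "None"   # finished before the checkpoint: already on disk
    if commit_time is not None and commit_time < failure:
        return "Redo"   # finished after the checkpoint, before the failure
    # Not yet finished: only immediate update may have written its changes.
    return "Undo" if technique == "immediate" else "None"


checkpoint, failure = 10, 20   # hypothetical times for tc and tf
commit_times = {"T1": 5, "T2": 15, "T3": None, "T4": 18, "T5": None}
for tid, commit_time in commit_times.items():
    print(tid,
          restart_work(commit_time, checkpoint, failure, "deferred"),
          restart_work(commit_time, checkpoint, failure, "immediate"))
```

The start times do not appear in the rule: whether a transaction started before or after the checkpoint does not change the operation, which is why T2 and T4 (and T3 and T5) receive the same treatment in both tables.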

SUMMARY

• Concurrency control ensures that the database remains consistent when multiple users access it at the same time.

• Recovery management ensures that the database returns to a consistent or correct state after a hardware or software failure.

• A database transaction is a set of operations that must be processed as one unit of work (Mannino, 2011). A transaction may be an entire program or a single command and may involve any number of operations on the database (Connolly & Begg, 2009).

• All transactions should follow four properties, known as the ACID properties: atomicity, consistency, isolation and durability.

• The objective of concurrency control is to maximise transaction throughput while preventing interference among multiple users.

• Three examples of interference problems that can be caused by concurrent access are the lost update problem, the uncommitted dependency problem and the inconsistent analysis problem.

• Locks and the two-phase locking protocol are used by most DBMSs to prevent interference problems.

• The use of locks to solve interference problems can lead to deadlocks. To control deadlocks, most DBMSs use a simple time-out policy.

• Locking granularity refers to the size of the database item locked. Coarse granularity refers to large items such as the entire database or an entire table. Finer locks refer to small items such as a row or a column.

• The causes of failure are system crashes, media failures, application software errors, natural physical disasters and carelessness or unintentional destruction of data.

• The DBMS is equipped with tools to support the recovery process, which include the log file and the checkpoint table.

• There are two techniques available to the DBMS to help recover the database: deferred update and immediate update.

• In deferred update, only the redo operation is used, whereas in immediate update, both the redo and undo operations are applied.

KEY TERMS

ACID properties
Checkpoint table
Concurrency control
Database transaction
Deadlocks
Deferred update
Growing phase
Immediate update
Interference problems
Locks
Log file
Recovery management
Shrinking phase
Two-phase locking protocol (2PL)

SELF-TEST

1. Discuss the meaning of a transaction and explain why transactions are important in a DBMS.

2. Discuss the types of failure that might occur in a database environment.

3. Explain the mechanism for concurrency control that can be used in a multi-user environment.

4. Explain why and how the log file is an important feature in any recovery mechanism.

5. Discuss the similarities and differences between the deferred update and immediate update recovery protocols.

REFERENCES

Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to design, implementation and management (5th ed.). Boston, MA: Addison-Wesley.

Mannino, M. (2011). Database design, application development and administration (5th ed.). Scottsdale, AZ: Ediyu.

Rob, P., & Coronel, C. (2011). Database systems: Design, implementation and management (8th ed.). Stamford, CT: Cengage Learning.

Topic 10 Web Technology and Database Management System (DBMS)
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Identify five types of databases;
2. Describe the approaches for integrating databases into the web
environment; and
3. Summarise the advantages and disadvantages of database
platforms.

INTRODUCTION
In this topic, we will discuss web technology and database management systems (DBMSs). As the use of the World Wide Web (the web) has increased, the importance of databases has become evident. What do you think is the reason for this? Have you noticed how readily available e-commerce (business conducted over the Internet) and information retrieval have become? Such services rely on databases because large amounts of information must be stored and retrieved. Almost every business and government agency has taken up the challenge of adapting its operations to take advantage of the global network called the Internet.

Many websites today are file-based, with each web document stored in a separate file. This approach may be suitable for small websites, but for large websites it can lead to problems. Thus, the aim of this topic is to examine some of the current technologies for web-DBMS integration.

ACTIVITY 10.1

Visit the following website:

http://businessfinancemag.com/technology/8-factors-choosing-database

Identify three factors that need to be considered in choosing a DBMS.

10.1 TYPES OF DATABASES


First, let us discuss the various types of databases. DBMSs have evolved over several decades. In this subtopic, we will discuss five types of databases (see Figure 10.1).

Figure 10.1: Five types of databases

(a) Hierarchical
IBM introduced the first generation of database technology, known as the hierarchical database, in the mid-1960s. In a hierarchical database, records are grouped in a logical hierarchy, connected in a branching structure similar to an organisational chart. An application retrieves data by first finding the primary record and then following the pointers stored in that record to other connected records.
For example, under a customer's name (the parent) would be stored a child containing a description of the last purchase and its date. A child under that would hold the individual items purchased, the cost per item and a description of each item. Another child under that would hold the name of the item's manufacturer. A hierarchical database allows only one parent segment per child; in other words, it only allows one-to-many relationships.
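The parent-child structure described above can be sketched in Python as follows. The `Segment` class and the sample data are hypothetical. Each segment stores pointers to its children and a single parent reference, and `add_child` refuses to give a segment a second parent, which is what restricts the model to one-to-many relationships. The `manufacturers` function retrieves data by starting at the root record and following the stored pointers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """A record (segment) in a hierarchical database."""
    data: dict
    children: list = field(default_factory=list)
    # Back-reference to the single parent; excluded from repr/compare to avoid cycles.
    parent: Optional["Segment"] = field(default=None, repr=False, compare=False)

    def add_child(self, child):
        if child.parent is not None:
            raise ValueError("a segment can have only one parent")
        child.parent = self
        self.children.append(child)   # pointer from parent to child
        return child


# Hypothetical data following the example in the text.
customer = Segment({"name": "Aminah"})
purchase = customer.add_child(Segment({"last_purchase": "Office supplies", "date": "2014-08-01"}))
item = purchase.add_child(Segment({"item": "Printer", "cost": 450, "description": "Laser printer"}))
item.add_child(Segment({"manufacturer": "ExampleCorp"}))


def manufacturers(root):
    # Start at the root record and follow the stored child pointers down.
    for purchase_seg in root.children:
        for item_seg in purchase_seg.children:
            for maker in item_seg.children:
                yield maker.data["manufacturer"]


print(list(manufacturers(customer)))  # ['ExampleCorp']

other_customer = Segment({"name": "Ravi"})
try:
    other_customer.add_child(item)   # item already has a parent
except ValueError as error:
    message = str(error)
print(message)  # a segment can have only one parent
```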

(b) Network
The network database, which was introduced in the 1970s, allows complex data structures to be built but is inflexible and requires careful design. It is very fast and storage-efficient, which suits applications such as airline booking systems. It allows many-to-many relationships.

(c) Relational
A relational database allows the definition of data structures, storage and
retrieval operations as well as integrity constraints. The data and relations
between them are organised in tables. A table is a collection of records and
each record in a table contains the same fields, as mentioned in earlier topics.

Certain fields may be designated as keys, which means that searches for specific values of those fields can use indexing to speed them up. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching the values in those fields. Often, but not always, the fields have the same name in both tables. Table 10.1 shows the advantages and disadvantages of relational databases.

Table 10.1: Advantages and Disadvantages of Relational Databases

Advantages:
• There are many popular relational DBMSs in use; as a result, the technical development effort ensures that advances such as object-orientation and web serving appear quickly and reliably.
• There are many third-party tools, such as report writers, that are tuned to work with the popular relational DBMSs via standards such as Open Database Connectivity (ODBC).
• Relational DBMSs offer distributed database and distributed processing options, which might be advantageous for some large organisations.

Disadvantages:
• Some restrictions on field lengths can lead to occasional practical problems.
• Structured Query Language (SQL) does not provide an efficient way to browse alphabetically through an index.
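Keys, indexing and joins can be tried with Python's built-in `sqlite3` module and an in-memory database. The tables `Customer` and `Orders` and their contents are hypothetical. The index on `Orders(CustNo)` lets searches on that field use an index, and the join selects related rows by matching `CustNo` values in the two tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customer (CustNo INTEGER PRIMARY KEY, CustName TEXT);
    CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, CustNo INTEGER, Amount REAL);
    CREATE INDEX idx_orders_custno ON Orders(CustNo);  -- speeds up searches on CustNo
    INSERT INTO Customer VALUES (1, 'Aminah'), (2, 'Ravi');
    INSERT INTO Orders VALUES (101, 1, 250.0), (102, 2, 80.0), (103, 1, 40.0);
""")

# Join: match rows whose CustNo values are equal in both tables.
rows = con.execute("""
    SELECT c.CustName, o.OrderNo, o.Amount
    FROM Customer AS c JOIN Orders AS o ON c.CustNo = o.CustNo
    ORDER BY o.OrderNo
""").fetchall()
print(rows)  # [('Aminah', 101, 250.0), ('Ravi', 102, 80.0), ('Aminah', 103, 40.0)]
```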

(d) Object-Oriented
An object-oriented database can store and retrieve objects, not merely data. It adds database functionality to object-oriented programming languages, so that applications require less code, use more natural data modelling and have code bases that are easier to maintain.

In contrast to a relational DBMS, where a complex data structure must be flattened out to fit into tables, or joined together from those tables to form the in-memory structure, an object DBMS has no such performance overhead when storing or retrieving a web or hierarchy of interrelated objects. Object DBMSs are therefore better suited to applications that have complex relationships between data, such as financial portfolio risk analysis systems, telecommunications service applications, web document structures, design and manufacturing systems, and hospital patient record systems.

(e) The Future
The development pace of computing appears to accelerate year on year. The future will call for efficient handling of objects and sophisticated web serving. This will be discussed in more detail in the next subtopic.

SELF-CHECK 10.1

Identify the different types of databases and briefly explain one advantage and one disadvantage of each.

10.2 THE WEB

First, let us define the web.

The web is a hypermedia-based system that provides a means of browsing information on the Internet in a non-sequential way by using hyperlinks.
Connolly & Begg (2009)

It provides a simple one-stop centre that allows users to explore the large volume of pages of information residing on the Internet. The information is presented on web pages, which consist of a collection of text, graphics, pictures, sound, video and hyperlinks to other web pages. The hyperlinks allow users to navigate to other web pages in a non-sequential way (Connolly & Begg, 2009).

The web consists of a network of computers that play two roles: as servers that provide information and as clients that request information. Examples of web servers are Apache HTTP Server and Microsoft Internet Information Server (IIS). Examples of clients, or web browsers, are Microsoft Internet Explorer and Mozilla (Connolly & Begg, 2009). There are three basic components of a web environment, as shown in Table 10.2.

Table 10.2: Three Basic Components of a Web Environment

Acronym  Component                    Description
HTML     Hypertext markup language    The information on the web is stored in documents using HTML
HTTP     Hypertext transfer protocol  HTTP governs the exchange of information between web server and web browser
URL      Uniform resource locator     URL is the address that identifies documents and their locations

Source: Connolly & Begg (2009)

A web page can be either static or dynamic. The content of a static web page does not change unless the file itself is changed. On the other hand, the content of a dynamic web page is generated each time it is accessed.

Thus, the features of a dynamic web page can be summarised as follows:

(a) It can respond to user input from the browser; and

(b) It can be customised by and for each user.

As a database is dynamic and changes as users create, insert, update and delete data, dynamic web pages are much more suitable than static web pages for presenting database content. Dynamic web pages need hypertext that can be generated by servers. To achieve this, scripts can be written to convert data from different formats into HTML (Connolly & Begg, 2009).
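The idea of generating HTML from a database each time a page is requested can be sketched as follows, using Python's built-in `sqlite3` and `html` modules. The `Staff` table and the `render_staff_page` function are hypothetical. Because the page is built from the current data on every call, a change in the database appears in the next page produced; `html.escape` is used so that data values cannot inject markup into the page.

```python
import html
import sqlite3


def render_staff_page(con):
    """Generate HTML from the current database contents on every call."""
    rows = con.execute("SELECT name, branch FROM Staff ORDER BY name").fetchall()
    items = "\n".join(
        f"  <li>{html.escape(name)} ({html.escape(branch)})</li>" for name, branch in rows
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>"


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Staff (name TEXT, branch TEXT)")
con.execute("INSERT INTO Staff VALUES ('Aminah', 'KL')")
first = render_staff_page(con)
con.execute("INSERT INTO Staff VALUES ('Ravi', 'Penang')")   # the data changes...
second = render_staff_page(con)                              # ...and so does the page
print(second)
```

In a real deployment, a function like this would be called by the web server (for example through CGI or a server extension) whenever a browser requests the page.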

10.2.1 Requirements for Web-DBMS Integration


The requirements for the integration of database applications with the Web are as
follows (Connolly & Begg, 2009):

(a) The ability to access valuable corporate data in a secure manner;

(b) Data- and vendor-independent connectivity, to allow freedom of choice in the selection of the DBMS now and in the future;

(c) The ability to interface to the database independently of any proprietary web browser or web server;

(d) A connectivity solution that takes advantage of all the features of an organisation's DBMS;

(e) An open architecture approach to allow interoperability with a variety of systems and technologies;

(f) A cost-effective solution that allows for scalability, growth and changes in strategic directions and helps reduce the costs of developing and maintaining applications;

(g) Support for transactions that span multiple HTTP requests;

(h) Support for session and application-based authentication;

(i) Acceptable performance;

(j) Minimal administration overhead; and

(k) A set of high-level productivity tools to allow applications to be developed, maintained and deployed with relative ease and speed.

How do we integrate the web and DBMSs? The approaches are as follows:

(a) Scripting languages such as JavaScript and VBScript;

(b) Common gateway interface (CGI);

(c) HTTP cookies;

(d) Extensions to the web server, such as Microsoft's Internet Information Server API (ISAPI);

(e) Java, J2EE, JDBC, SQLJ, JDO, Servlets and JavaServer Pages (JSP);

(f) Microsoft's web solution platform: .NET, Active Server Pages (ASP) and ActiveX Data Objects (ADO); and

(g) Oracle's Internet platform.

You might wonder why we need web-DBMS integration. The reason lies in its eight advantages, shown in Table 10.3.

Table 10.3: Eight Advantages of Web-DBMS Integration

Simplicity: HTML as a markup language is easy for both developers and end-users to learn, since its functionality is not overly complex.

Platform independence: Most web browsers are platform-independent; thus, applications do not need to be modified to run on different operating systems or windows-based environments.

Graphical user interface (GUI): Web browsers provide a common, easy-to-use GUI that can be used to access databases. With a common interface, training costs for end-users can be reduced.

Standardisation: An HTML document on one machine can be read by users on any machine in the world that has an Internet connection and a web browser.

Cross-platform support: Web browsers are available for almost every type of computer platform, which allows users on most types of computers to access a database from anywhere in the world. Thus, information can be accessed with minimum time and effort.

Transparent network access: The built-in support for networking in web browsers simplifies database access, without users having to purchase separate, expensive networking software.

Scalable deployment: By storing the application on a separate server, the web eliminates the time and cost associated with deploying the application on each client. Thus, it simplifies data maintenance and the management of multiple platforms across different offices.

Innovation: The web allows organisations to provide new services and connect to new customers through globally accessible applications.

What about the disadvantages of the web-DBMS approach? They are discussed
in Table 10.4.

Table 10.4: Three Disadvantages of Web-DBMS Integration

Reliability: Difficulties in sending and receiving data may arise when information on a server is accessed at peak times, because the server is overloaded by users' requests.

Security: Once you have sent or received data, there is no 100% guarantee that the data is secure. User authentication and secure data transmission are critical because of the large number of anonymous users.

Limited functionality of HTML: Even though HTML provides an easy-to-use interface, some highly interactive database applications may not be converted easily to web-based applications. Extra functionality may be added, but it may then be too complex for some users, and there may also be some performance overhead in downloading and executing the extra code.

Source: Connolly & Begg (2009)

ACTIVITY 10.2

Visit the following website:

http://www.databasejournal.com/sqletc/article.php/1428721

Get some ideas on how to develop a simple website with dynamic pages and a database.

SELF-CHECK 10.2

1. Identify the requirements for web-DBMS integration.

2. Explain how web-DBMS integration can be performed.

3. Identify three advantages and three disadvantages of web-DBMS integration.


SUMMARY

• The Internet is a worldwide collection of interconnected computer networks.

• The web is a hypermedia-based system that provides a simple means of exploring the information on the Internet in a non-sequential way.

• Databases can be categorised into five types, namely hierarchical, network, relational, object-oriented and future databases.

• Information on the web is stored in documents written in HTML. A web browser exchanges information with a web server using HTTP.

• The advantages of the web as a database platform include simplicity, platform independence, a graphical user interface (GUI), standardisation, cross-platform support and transparent network access.

• The disadvantages include lack of reliability, poor security and the limited functionality of HTML.

• Approaches for integrating databases into the web environment include scripting languages, CGI, HTTP cookies and Oracle's Internet platform.
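HTTP is stateless, so the web server cannot tell on its own that two requests come from the same user. HTTP cookies solve this by having the server send a small value that the browser returns with every later request. The server can then use that value to look up the user's session data in the database. The sketch below uses Python's standard `http.cookies.SimpleCookie` and `secrets` modules. The cookie name `session_id` and the two helper functions are hypothetical choices for illustration.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple


def make_session_cookie() -> Tuple[str, str]:
    """Create a random session id and the Set-Cookie header line that carries it."""
    session_id = secrets.token_hex(16)  # 32 hex characters, hard to guess
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True  # hide the cookie from page scripts
    cookie["session_id"]["path"] = "/"
    return session_id, cookie.output()


def read_session_id(cookie_header: str) -> Optional[str]:
    """Extract the session id from the Cookie header sent back by the browser."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    sid, header = make_session_cookie()
    print(header)  # sent by the server in its HTTP response
    print(read_session_id("session_id=" + sid) == sid)  # True
```

Only a random identifier travels in the cookie. Data such as the user's identity or shopping basket stays on the server, keyed by that identifier, which limits what an attacker learns by reading the cookie.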

KEY TERMS

Hierarchical database
Hypertext markup language (HTML)
Hypertext transfer protocol (HTTP)
Network database
Object-oriented database
Relational database
Uniform resource locator (URL)


SELF-TEST

1. Using any web browser, look at the websites of government, private and educational institutions and write a report on their GUI and security.

2. Discuss how web services can be used to integrate business applications and data effectively. Find one example from industry and discuss the web services that it uses.

REFERENCES

Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to design, implementation and management (5th ed.). Boston, MA: Addison-Wesley.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2008). Modern database management (8th ed.). Upper Saddle River, NJ: Prentice Hall.

Mannino, M. (2005). Database design, application development and administration (3rd ed.). New York, NY: McGraw-Hill.



MODULE FEEDBACK

If you have any comment or feedback, you are welcome to:

1. E-mail your comment or feedback to [email protected]

OR

2. Fill in the Print Module online evaluation form available on myINSPIRE.

Thank you.

Centre for Instructional Design and Technology
(Pusat Reka Bentuk Pengajaran dan Teknologi)
Tel No.: 03-27732578
Fax No.: 03-26978702
