CBDB3403 Database
INTRODUCTION
CBDB3403 Database is one of the courses offered by Faculty of Information
Technology and Multimedia Communication at Open University Malaysia
(OUM). This course is worth 3 credit hours and should be covered over 8 to
15 weeks.
COURSE AUDIENCE
This course is targeted at all IT students specialising in Information Systems.
Students enrolled in other IT-related specialisations will also find this course
useful, as it answers many of their questions regarding database system
development.
STUDY SCHEDULE
It is a standard OUM practice that learners accumulate 40 study hours for every
credit hour. As such, for a three-credit hour course, you are expected to spend
120 study hours. Table 1 gives an estimation of how the 120 study hours could be
accumulated.
Table 1: Estimation of Study Hours

Study Activities                                                    Study Hours
Briefly go through the course content and participate in
initial discussion                                                            3
Study the module                                                             60
Attend 3 to 5 tutorial sessions                                              10
Online participation                                                         12
Revision                                                                     15
Assignment(s), Test(s) and Examination(s)                                    20
TOTAL STUDY HOURS                                                           120
COURSE OUTCOMES
By the end of this course, you should be able to:
COURSE SYNOPSIS
This course is divided into 10 topics. The synopsis for each topic is presented
below:
Topic 2 introduces the concepts behind the relational model, currently the most
popular data model and the one most often chosen for standard business
applications. After introducing the terminology and showing the relationship
with mathematical relations, the relational integrity rules, entity integrity and
referential integrity, are discussed. The topic concludes with an overview of
views, which is expanded later in Topic 4.
Topic 4 covers the main data definition facilities of the SQL standard. Again, the
topic is presented as a worked tutorial. The topic introduces the SQL data types
and data definition statements, the Integrity Enhancement Feature (IEF) and
more advanced features of the data definition statements, including the access
control statements GRANT and REVOKE. It also examines views and how they
can be created in SQL.
Topic 8 considers database security, not just in the context of DBMS security but
also in the context of security of the DBMS environment. The topic also examines
the security problems that can arise in the Web environment and presents some
approaches to overcome them.
Topic 10 examines the integration of the DBMS into the Web environment. After
providing a brief introduction to the Internet and the Web technology, this topic
examines the appropriateness of the Web as a database application platform and
discusses the advantages and disadvantages of this approach. It then considers
a number of the different approaches to integrating DBMSs into the Web
environment.
Learning Outcomes: This section refers to what you should achieve after you
have completely covered a topic. As you go through each topic, you should
frequently refer to these learning outcomes. By doing this, you can continuously
gauge your understanding of the topic.
Summary: You will find this component at the end of each topic. This component
helps you to recap the whole topic. By going through the summary, you should
be able to gauge your knowledge retention level. Should you find points in the
summary that you do not fully understand, it would be a good idea for you to
revisit the details in the module.
Key Terms: This component can be found at the end of each topic. You should go
through this component to remind yourself of important terms or jargon used
throughout the module. Should you find terms here that you are not able to
explain, you should look for the terms in the module.
PRIOR KNOWLEDGE
Knowledge of the Windows operating system and the Microsoft Access
application is required for this course.
ASSESSMENT METHOD
Please refer to myINSPIRE.
REFERENCES
Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to
design, implementation and management (5th ed.). Boston, MA: Addison-
Wesley.
Hoffer, J. A., Prescott, M. B., & Topi, H. (2008). Modern database management
(9th ed.). Upper Saddle River, NJ: Prentice Hall.
Pratt, P. J., & Last, M. Z. (2008). A guide to SQL (8th ed.). Mason, OH: Cengage
Learning.
Rob, P., & Coronel, C. (2001). Database systems: Design, implementation and
management (8th ed.). Stamford, CT: Cengage Learning.
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Identify the characteristics of file-based systems;
2. Describe four limitations of file-based systems;
3. Define the database and database management systems (DBMS);
4. Describe four advantages and two disadvantages of DBMS;
5. Identify four features of DBMS; and
6. Classify types of people involved in the DBMS environment.
INTRODUCTION
Have you heard of the word "database" or "database system"? If you have, then
you will have a better understanding of these words by taking this course.
However, if you have not heard of them, do not worry. By taking this course, you
will be guided until you know, understand and are able to apply them to real-
world problems.
You might ask yourself why we need to study database systems. Well, this is
similar to asking yourself why you need to study programming, operating
systems or other IT-related subjects. The answer is that database systems have
become an important component of successful businesses and organisations.
Since you probably intend to become a manager, entrepreneur or an IT
professional, it is vital to have a basic understanding of database systems.
This topic introduces the area of database management systems, examines the
problems with the traditional file-based systems and discusses what database
management system (DBMS) can offer. In the first subtopic, there will be an
explanation of some uses of database systems that we can find in our everyday
lives. Then, in the next subtopic, we will compare the file-based system with
database systems. Next, we will discuss the roles that people perform in the
database environment and lastly, we will discuss the advantages and
disadvantages of database management systems.
For your information, all these activities are possible with the existence of
DBMSs. It means that our life is affected by database technology. Computerised
databases are important to the functioning of modern organisations. Before we
proceed further, let us take a look at its definition.
However, before we discuss this matter any further, let us examine some
applications of database systems that you have used without realising that
you are accessing a database system in your daily life such as:
So, now do you realise that you are a user of database systems? The database
technology does not only improve the daily operations of organisations but also
the quality of decisions made. For instance, with the database systems, a
supermarket can keep track of its inventory and sales in a very short time. This
may lead to a fast decision in terms of making new orders of products. In this
case, the products will always be available for the customers. Thus, the business
may grow as the customer's satisfaction is always met. In other words, it would
be an advantage to those who collect, manage and interpret information
effectively in today's world.
Today, data can be represented in various forms like sound, images and
videos. For instance, you can record your speech into a computer using the
computer's microphone. Images taken using a digital camera or scanned
using a scanner can also be transferred into a computer. So, actually there
are so many different types of data around us. Can you name some other
data that you might have used or produced before?
Now, the next thing that we will discuss is how we can make our data
meaningful and useful. This can be done by processing it.
Information refers to the data that have been processed in such a way
that the knowledge of the person who uses the data is increased.
Jeffrey et al. (2011)
For instance, the speech that you have recorded and images that you have
stored in a computer, could be used as part of your presentation by using
any of presentation software. The speech may represent some definitions of
the terms that are included in your presentation slides. Thus, by including
it into your presentation, the recorded speech has more meaning and is
more useful. The images could also be sent to your friends through e-mails
for them to view. What this means is that you have transformed the data
that you have stored into information, once you have done something with
it. In other words, computers process data into information.
The next subtopic discusses the traditional file-based system, examines its
limitations and explains why database systems are needed.
SELF-CHECK 1.1
Traditionally, manual files are used to store all internal and external data within
an organisation. These files are stored in cabinets and, for security purposes, the
cabinets are locked or kept in a secure area. When any
information is needed, you may have to search starting from the first page until
you find the information that you are looking for. To speed up the searching
process, you may create an indexing system to help you locate the information
that you are looking for faster. You may have such a system that stores all your
results or important documents.
The manual filing system works well if the number of items stored is not large.
However, this kind of system may fail if you want to do a cross-reference or
process any of the information in the file. Computer-based data processing then
emerged, and the traditional filing system was replaced by computer-based data
processing systems, or file-based systems. However, instead of having a
centralised store for the organisation's operational data, a decentralised
approach was taken. In this approach, each department has its own file-based
system, which it monitors and controls separately.
Example 1.1:
File Processing System at Make-Believe Real Estate Company
Make-Believe Real Estate company has three departments: sales, contract and
personnel. These departments are located in the same building but on separate
floors, and each has its own file-based system. The function of the
sales department is to sell and rent properties. The function of the contract
department is to handle lease agreements associated with properties for rent.
The function of the personnel department is to store information about staff.
Figure 1.1 illustrates the file-based system for Make-Believe Real Estate company.
Each department has its own application program that handles operations such
as data entry, file maintenance and report generation.
By looking at Figure 1.1, we can see that the sales executive can store and retrieve
information from the sales files through sales application program. The sales files
may consist of information regarding properties, owners and clients. Figure 1.2
illustrates examples of the content of these three files. Figure 1.3 shows the
content of the Contract files while Figure 1.4 is for the Personnel files. Notice that
the Client files in the sales and contract departments are the same. This means
that duplication occurs when using a decentralised file-based system.
Figure 1.2: Property, Owner and Client files used by the sales department
Figure 1.3: Lease, Property and Client files used by the contract department
The personnel file in Figure 1.4 consists of two records and each record
consists of nine fields. Can you list the number of records and fields in the client
file as shown in Figure 1.3?
Now, let us discuss the limitations of the file-based system which we have
mentioned earlier. No doubt, file-based systems have proved to be a great
improvement over manual filing systems. However, a few problems still occur
with this system, especially if the volume of the data and information increases.
Well, you can create a temporary file of those clients who have a "house" as
the preferred type and search for the available house from the property file.
Then, you may create another temporary file of those clients who have an
"apartment" as the preferred type and do the searching again. The search
would be more complex if you have to access more than two files and from
different departments. In other words, the separation and isolation of data
would make the retrieval process time consuming.
SELF-CHECK 1.2
1. What is a file-based system?
Meanwhile, there are two disadvantages of the database approach. These are
shown in Figure 1.7.
ACTIVITY 1.1
Search the Internet for details of the two disadvantages listed in
Figure 1.7. Discuss your findings with your coursemates in the forum.
Thus, we can change the internal definition of an object in the database without
affecting the users of the object, provided that the external definition remains the
same. For instance, if we were to add a new field to a record or create a new file,
then the existing applications are unaffected. More examples of this will be
shown in the next topic.
Some other terms that you need to understand are shown in Table 1.1.
Term          Definition
Entity        A specific object (for example, a department, place or event) in
              the organisation that is to be represented in the database
Attribute     A property that describes some characteristic of the object that
              we wish to record
Relationship  An association between entities
By referring to Figure 1.8, we can see that the ER diagram consists of two entities
(rectangles) namely Department and Staff. It has one relationship, where it
indicates that a department has many staff. For each entity, there is one attribute,
that is, DepartmentNo and StaffNo. In other words, the database holds data that
are logically related. More explanations on this will be discussed in later topics.
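The Department–Staff relationship in Figure 1.8 can be sketched as two related tables. The following snippet is illustrative only: the table and attribute names (Department, Staff, DepartmentNo, StaffNo) come from the figure, while the sample rows are invented. It uses Python's sqlite3 to stand in for a DBMS.

```python
import sqlite3

# One department, many staff: the relationship becomes a DepartmentNo
# column in Staff that refers back to Department (sample rows invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (DepartmentNo TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE Staff (StaffNo TEXT PRIMARY KEY, "
             "DepartmentNo TEXT REFERENCES Department)")
conn.execute("INSERT INTO Department VALUES ('D1')")
conn.executemany("INSERT INTO Staff VALUES (?, ?)", [("S1", "D1"), ("S2", "D1")])

# Count the staff attached to each department through the relationship
staff_per_dept = conn.execute(
    "SELECT Department.DepartmentNo, COUNT(*) FROM Department "
    "JOIN Staff ON Staff.DepartmentNo = Department.DepartmentNo "
    "GROUP BY Department.DepartmentNo").fetchall()
print(staff_per_dept)  # one department with two staff
```

The point of the sketch is that "logically related" data means the DepartmentNo value in a Staff row identifies exactly one Department row.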
SELF-CHECK 1.3
Now, let us discuss in detail five common features of a DBMS (see Figure 1.9).
SELF-CHECK 1.4
Who are the people involved in the database environment? Briefly
explain their responsibilities.
The predecessor to the DBMS was the file-based system, where each program
defines and manages its own data. Thus, data redundancy and data
dependence became major problems.
The database approach was introduced to resolve the problems with the file-
based system. All access to the database is made through the DBMS.
There are four types of people involved in the DBMS environment which are
data and database administrators, database designers, application designers
and end users.
(a) Data
(b) Information
(c) Database
3. List two examples of database systems other than those discussed in this
topic.
4. Discuss the main components of the DBMS environment and how they are
related to each other.
6. Study the University Student Affairs case study presented below. In what
ways would a DBMS help this organisation? What data can you identify
that needs to be represented in the database? What relationships exist
between the data items?
Data requirements:
(a) Student
(v) Sex
(vii) Nationality
(i) List the names of students who are staying in the colleges
Jeffrey, A. H., Prescott, M., & Topi, H. (2008). Modern database management
(9th ed.). Upper Saddle River, NJ: Prentice Hall.
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Explain the terminology of the relational data model;
2. Discuss how tables are used to represent data;
3. Identify the candidate, primary, superkey and foreign keys;
4. Describe the meaning of entity integrity and referential integrity;
and
5. Explain the concept and purpose of views in relational systems.
INTRODUCTION
Topic 1 was a starting point for your study of database technology. You learned
about the database characteristics and the database management system (DBMS)
features. This topic will focus on the relational data model but before that, let us
look at a brief introduction of the model. The relational model was developed by
E. F. Codd in 1970. The simplicity and familiarity of the model made it hugely
popular, compared to the other data models that existed during that time. Since
then, relational DBMSs have dominated the market for business DBMS
(Mannino, 2011).
This topic provides you with an exploration of the relational data model. You
will discover that the strength of this data model lies in its simple logical
structure, whereby these relations are treated as independent elements. You will
then see how these independent elements can be related to one another.
In order to ensure that the data in the database are accurate and meaningful,
integrity rules are explained. Two important integrity rules will be described:
entity integrity and referential integrity. Finally, we will end the topic with the
concept of views and their purpose.
2.1 TERMINOLOGY
First of all, let us start with the definitions of some of the pertinent terminology.
The relational data model was developed because of its simplicity and its familiar
terminology. The model is based on the concept of a relation which is physically
represented as a table (Connolly & Begg, 2009). This subtopic presents the basic
terminology and structural concepts of the relational model.
(a) Relation
The relation must have a name that is distinct from other relation names in
the same database. Table 2.2 shows a listing of the two-dimensional table
named Employee, consisting of seven columns and six rows. The heading
part consists of the table name and the column names. The body shows the
rows of the table.
(b) Attribute
In the Employee table (see Table 2.2), the columns for attributes are EmpNo
(Employee Number), Name, MobileTelNo (Mobile Telephone Number),
Position, Gender, DOB (Date Of Birth) and Salary.
You must take note that every column-row intersection contains a single
atomic data value. For example, the EmpNo column contains only the
number of a single existing employee.
Derived Attribute
A derived attribute is an attribute where its value is derived from the
value of related attributes or set of other attributes. An example is the
age attribute which is derived from the date of birth.
Data Types
Data types indicate the kind of data for the column (character, numeric,
yes or no, etc.) and permissible operations (numeric operations, string
operations) for the column. Table 2.3 lists the five common data types.
The Name and Position attributes are of variable-length character type. These
columns store only the actual number of characters and not the
maximum length. As you can see from the Employee relation, the number
of characters in the Name attribute column varies from 9 to 13, while the
number of characters in the Position attribute column varies from 5 to 13.
Finally, the DOB attribute column consists of 10 characters in the format
DD/MM/YYYY.
For example, in the MobileTelNo attribute, the first three digits are limited to
012/3/6/7/9 which corresponds to the mobile telecommunications service
operators in Malaysia. Similarly, the Gender is limited to the characters F or
M. Table 2.4 summarises the domains for the Employee relation.
The domain concept is important because it allows the user to define the
meaning and source of values that the attributes can hold.
(c) Tuple
What does a tuple mean?
SELF-CHECK 2.1
1. What is a relation?
(a) Superkey
(i) EmpNo;
A relation can have several candidate keys. When a candidate key consists of
more than one attribute, it is known as a composite key. For example, the
combination of EmpNo and Name could serve as a composite key.
However, we cannot discount the possibility that someone who shares the
same name as listed above becomes an employee in the future. This may
make the Name attribute an unwise choice as a candidate key because of
duplicates. However, attributes EmpNo and MobileTelNo are suitable
candidate keys as an employee's identification in any organisation is
unique. MobileTelNo can be picked to be the candidate key because we
know that no duplicate hand phone numbers exist, thereby making it
unique.
You may note that a primary key is a superkey as well. In our Employee
table, the EmpNo can be chosen as the primary key, while MobileTelNo
then becomes an alternate key.
The addition of SuppNo in both the Supplier and Product tables links each
supplier to details of the products that are supplied. In the Supplier
relation, SuppNo is the primary key. In the Product relation, SuppNo
attribute exists to match the product to the supplier. In the Product relation,
SuppNo is the foreign key. Notice that each data value of SuppNo in
Product matches the SuppNo in Supplier. The reverse need not necessarily
be true.
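The linkage through SuppNo can be sketched as follows. The snippet uses Python's sqlite3; only the table names and the SuppNo key come from the text, while the supplier and product rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplier (SuppNo TEXT PRIMARY KEY, SuppName TEXT)")
# SuppNo in Product is the foreign key matching Supplier's primary key
conn.execute("CREATE TABLE Product (ProdNo TEXT PRIMARY KEY, ProdName TEXT, "
             "SuppNo TEXT REFERENCES Supplier(SuppNo))")
conn.execute("INSERT INTO Supplier VALUES ('S8843', 'Supplier A')")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [("P1", "Keyboard", "S8843"), ("P2", "Mouse", "S8843")])

# Because every SuppNo in Product matches a SuppNo in Supplier, a join
# recovers the supplier details for each product.
rows = conn.execute(
    "SELECT Product.ProdName, Supplier.SuppName FROM Product "
    "JOIN Supplier ON Product.SuppNo = Supplier.SuppNo "
    "ORDER BY ProdNo").fetchall()
print(rows)
```

Note the asymmetry described in the text: every product row points at some supplier, but a supplier may exist with no matching product rows.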
SELF-CHECK 2.2
Explain the following:
The standard way of representing a relation schema is to give the name of the
relation followed by attribute names in parentheses. The primary key is
underlined. An instance of this relational database schema is shown in Figure 2.3.
2.2.1 Nulls
Nulls are not the same as zeros or spaces, as null represents the absence of a
value (Connolly & Begg, 2009). For example, in the Invoice relation of the Order
Entry Database, the DatePaid attribute in the second row is null until the
customer pays for the order.
Entity integrity ensures that a relation must have a primary key and that the
primary key attributes cannot be null.
This guarantees that primary key values are unique and ensures that foreign keys
can accurately reference primary key values. In the Employee table, the EmpNo is the
primary key. We cannot insert new employee details into the table with a null
EmpNo. The OrderDetail has the composite primary key OrderNo and
ProductNo, so to insert a new row, both values must be known.
For example, in the Order Entry Database, the Product table has the foreign key
SuppNo. You will notice that every entry of SuppNo in the rows of the Product
table matches the SuppNo of the referenced table Supplier. However, we can
create a new product record with a null SuppNo, if currently no suppliers have
been identified to supply the product.
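Both integrity rules can be observed in a small experiment. The sketch below uses Python's sqlite3 with foreign-key enforcement switched on; the sample values are invented, and note one SQLite quirk assumed here: a non-integer primary key needs an explicit NOT NULL for entity integrity to be enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
# Entity integrity: explicit NOT NULL needed on a TEXT primary key in SQLite
conn.execute("CREATE TABLE Supplier (SuppNo TEXT PRIMARY KEY NOT NULL)")
conn.execute("CREATE TABLE Product (ProdNo TEXT PRIMARY KEY NOT NULL, "
             "SuppNo TEXT REFERENCES Supplier(SuppNo))")
conn.execute("INSERT INTO Supplier VALUES ('S8843')")

# A null primary key violates entity integrity
try:
    conn.execute("INSERT INTO Supplier VALUES (NULL)")
    null_pk_allowed = True
except sqlite3.IntegrityError:
    null_pk_allowed = False

# A foreign key matching no supplier violates referential integrity
try:
    conn.execute("INSERT INTO Product VALUES ('P1', 'S9999')")
    dangling_fk_allowed = True
except sqlite3.IntegrityError:
    dangling_fk_allowed = False

# A null foreign key is allowed: no supplier identified for this product yet
conn.execute("INSERT INTO Product VALUES ('P2', NULL)")
print(null_pk_allowed, dangling_fk_allowed)  # False False
```

Both illegal inserts are rejected by the DBMS, while the product with an as-yet-unknown supplier is accepted, exactly as the text describes.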
SELF-CHECK 2.3
1. What is a null?
2. Can a primary key value have a null value?
3. What is the value for a foreign key?
2.3 VIEWS
A view is a virtual or derived relation that may be derived from one or more
base relations.
Connolly & Begg (2009)
(a) Allow users to customise data according to their needs, so that the same
data can be seen by different users in different ways at the same time; and
(b) Hide part of the database from selected users, hence providing a powerful
security system. These users will not be aware of the existence of all the
tuples and attributes in the database (Connolly & Begg, 2009).
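Both purposes can be seen in a short sketch. Below, a view over a hypothetical Employee table exposes names and positions while hiding the Salary attribute from anyone who queries the view (Python/sqlite3; all sample data invented).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Name TEXT, "
             "Position TEXT, Salary REAL)")
conn.execute("INSERT INTO Employee VALUES ('E1', 'Aina', 'Clerk', 1200)")

# The view is a derived relation: it stores no data of its own and
# hides the Salary attribute from its users.
conn.execute("CREATE VIEW EmployeePublic AS "
             "SELECT Name, Position FROM Employee")

cur = conn.execute("SELECT * FROM EmployeePublic")
rows = cur.fetchall()
cols = [d[0] for d in cur.description]
print(cols, rows)
```

Users of EmployeePublic see only Name and Position; the Salary column is invisible to them, which is the security purpose described above.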
SELF-CHECK 2.4
1. What is a view?
The relational data model was developed because of its simplicity and
familiar terminology. The model is based on the concept of a relation which is
physically represented as a table (Connolly & Begg, 2009).
Null represents the absence of a value. A primary key value cannot be null. A
foreign key value must match a primary key value in the related table, or it
can be null.
A view is a virtual or derived relation that may be derived from one or more
base relations. Views allow users to customise data according to their needs
and hide part of the database from certain users providing security to the
database.
2. What is the difference between a primary key and a candidate key? Give an
example.
Resort consists of resort details and resortNo is the primary key. Room
contains room details for each resort and roomNo is the primary key.
Booking contains details of bookings and bookingNo is the primary key.
Guest contains guest details and guestNo is the primary key.
(i) Identify the foreign keys in this schema. Explain how the entity and
referential integrity rules apply to these relations; and
(ii) Produce four sample tables for these relations that observe the
relational integrity rules.
Rob, P., & Coronel, C. (2011). Database systems: Design, implementation and
management (8th ed.). Stamford, CT: Cengage Learning.
INTRODUCTION
In this topic, you will learn the basic features and functions of structured query
language (SQL). SQL is simple and relatively easy to learn. It is the standard
language for relational database models for data administration (for creating
tables, indexes and views including control access) and data manipulation (to
add, modify, delete and retrieve data). In this topic, the focus is on data
manipulation.
In this subtopic, we will provide a description of what SQL is, give the
background and history of SQL and discuss the importance of SQL to the
database application.
In this topic, we focus only on the DML commands while Topic 4 will continue
with the discussion on DDL.
Year        Description
1970        The relational model, from which SQL draws much of its conceptual
            core, was formally defined by Dr E. F. Codd, a researcher at IBM
1974        IBM began the System/R project and developed the Structured
            English Query Language (SEQUEL)
1974–1975   System/R was implemented on an IBM prototype called SEQUEL-XRM
1976–1977   System/R was completely rewritten to include multi-table and
            multi-user features. When the system was revised, it was briefly
            called "SEQUEL/2" and then renamed "SQL" for legal reasons
1983        IBM began to develop commercial products that implement SQL based
            on their System/R prototype, including DB2
Several other software vendors accepted the rise of the relational model and
announced SQL-based products. These included Oracle, Sybase and Ingres
(based on the University of California's Berkeley Ingres project).
Importance                                  Description
Standard language for relational databases  It has been globally accepted
A powerful data management tool             Almost all major database vendors
                                            support SQL
Easy to learn                               SQL is a non-procedural language.
                                            You just need to know what is to be
                                            done; you do not need to know how
                                            it is to be done
SELF-CHECK 3.1
1. Briefly explain SQL.
(b) The SQL syntax is not case sensitive. Thus, keywords can be typed in either
lower case or upper case.
(d) The SQL notation used throughout this module follows the Backus Naur
Form (BNF) which is described as follows:
SELF-CHECK 3.2
1. What does case-sensitive mean?
Command Details
SELECT Extracts data from a database table
UPDATE Updates data in a database table
DELETE Deletes data from a database table
INSERT INTO Inserts new data into a database table
As mentioned earlier, SQL statements are not case sensitive. In other words,
SELECT is the same as select. In our discussion and illustration of SQL
commands, we will use the Employee table from the previous topic (Table 2.5),
renamed as Table 3.4 in this topic.
Syntax
SELECT [DISTINCT | ALL] {* | [column_expression [AS new_name]] [, ...]}
FROM table_name [alias] [, ...]
[WHERE condition]
[GROUP BY column_list] [HAVING condition]
[ORDER BY column_list]
The meanings of clauses used in the SELECT statement are listed in Table 3.5.
Clause Meaning
SELECT Specifies the columns or/and expressions that should be in the output
FROM Indicates the table(s) from which data will be obtained
WHERE Specifies the rows to be used. If not included, all table rows are used
GROUP BY Indicates categorisation of results
HAVING Indicates the conditions under which a category (group) will be included
ORDER BY Sorts the result according to specified criteria
The order of these clauses cannot be changed. The SELECT and FROM clauses
are mandatory in the SELECT statement; the others are optional. The
result of this statement is a table. Next, you are going to learn the variations of
the SELECT statement.
This query requires us to select all columns and all rows from the Employee
table.
For queries that require listing all columns, the SELECT clause can be
shortened by using an asterisk (*). Therefore, you may write the query above
as:
SELECT *
FROM Employee;
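To see this end to end, the sketch below creates a small Employee table with Python's sqlite3 and runs the same query; the two rows are invented stand-ins for the data in Table 3.4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Name TEXT, "
             "Position TEXT, Salary REAL)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    ("E1", "Aina", "Manager", 3500),    # sample rows invented
    ("E2", "Badrul", "Clerk", 1200),
])

# SELECT * returns every column of every row in the table
cur = conn.execute("SELECT * FROM Employee")
rows = cur.fetchall()
cols = [d[0] for d in cur.description]
print(cols)
print(rows)
```

The result is itself a table: all four columns appear, and both rows are returned.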
This query requires selecting only specific columns from the Employee
table.
The keyword DISTINCT is used in the SELECT clause for retrieving non-
duplicate data from a column or columns.
This query can be written as follows and the result is as shown in Table 3.8:
SELECT Position
FROM Employee;
Position
Administrator
Salesperson
Manager
Assistant Manager
Clerk
Clerk
The result above contains duplicates: Clerk appears twice. What if we want
to select only the distinct positions? This is easy to accomplish in SQL. All
we need to do is use the DISTINCT keyword after SELECT.
With the statement above, the duplicate is eliminated and we get the result
table as shown in Table 3.9.
Position
Administrator
Salesperson
Manager
Assistant Manager
Clerk
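The effect of DISTINCT can be reproduced with the same six positions. In this sketch (Python/sqlite3, single-column table invented for illustration), the plain SELECT returns six rows while SELECT DISTINCT returns five.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Position TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?)",
                 [("Administrator",), ("Salesperson",), ("Manager",),
                  ("Assistant Manager",), ("Clerk",), ("Clerk",)])

# Without DISTINCT, Clerk appears twice; with DISTINCT it appears once
all_rows = [r[0] for r in conn.execute("SELECT Position FROM Employee")]
distinct = [r[0] for r in conn.execute("SELECT DISTINCT Position "
                                       "FROM Employee")]
print(len(all_rows), len(distinct))  # 6 5
```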
There are five basic search conditions that can be used in a query as shown
in Table 3.10.
This statement filters all rows based on the condition that the salary is
greater than 1000. The result returned from this statement is shown in
Table 3.11.
Figure 3.1 shows a list of comparison operators that can be used in the
WHERE clause. In addition, a more complex condition can be generated
using the logical operators AND, OR and NOT.
Operator Description
= Equal
<> or != Not equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
This statement uses the logical operator OR in the WHERE clause to find
employees with a position of Clerk or Salesperson. Table 3.12 shows the
result returned from executing this statement.
So, in this example, the condition in the WHERE clause can also be written
as:
The results returned from executing both statements are shown in Table 3.13.
Set membership condition (IN) tests whether a value matches any value in
a set of values. In this query, it finds rows in the Employee table with
positions of clerk or salesperson. This statement returns the result
shown in Table 3.14, which is similar to the result for the query in
Example 5.
There is also a negated version (NOT IN) that can be used to list all rows
excluded from the IN list, for instance, if we want to find employees that
are not clerks or salespersons. This query can be expressed as follows and
the result table is shown in Table 3.15.
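That the IN form and the OR form are interchangeable can be checked directly. The sketch below runs both conditions against the same small Employee table (rows invented; Python/sqlite3) and compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, "
             "Name TEXT, Position TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("E1", "Aina", "Clerk"), ("E2", "Badrul", "Manager"),
    ("E3", "Chong", "Salesperson")])

# Set membership: Position IN (...) matches any listed value
in_rows = conn.execute("SELECT Name FROM Employee "
                       "WHERE Position IN ('Clerk', 'Salesperson') "
                       "ORDER BY Name").fetchall()
# Equivalent condition written with OR
or_rows = conn.execute("SELECT Name FROM Employee "
                       "WHERE Position = 'Clerk' OR Position = 'Salesperson' "
                       "ORDER BY Name").fetchall()
print(in_rows == or_rows, in_rows)
```

Both statements return the same rows; IN is simply the more compact spelling when the set of values grows.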
Query 8: Find all employees who have Celcom prepaid numbers. In other
words, their hand phone numbers must start with 013.
This statement lists all phone numbers starting with 013, regardless of the
digits or characters that follow.
The result table returned from executing this statement is shown in
Table 3.16.
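The pattern match can be sketched as follows with Python's sqlite3. The hand phone numbers here are invented; only the 013 prefix rule comes from the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, MobileTelNo TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [
    ("Aina", "0131234567"), ("Badrul", "0179876543"),
    ("Chong", "0135551234")])  # sample numbers invented

# The '%' wildcard matches any sequence of characters after the 013 prefix
rows = conn.execute("SELECT Name FROM Employee "
                    "WHERE MobileTelNo LIKE '013%' "
                    "ORDER BY Name").fetchall()
print(rows)
```

Only the two numbers beginning with 013 are returned; the 017 number is filtered out.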
SELF-CHECK 3.3
1. What is the purpose of using the SELECT statement?
2. Explain the function of each of the clauses in the SELECT
statement.
3. By referring to Table Employee, write the SELECT statement to:
(a) Display the names of all employees;
(b) Display the names of the employees whose salary is less
than 1000; and
(c) Display the names of the salespersons and their salary.
Query 9: List salaries for all employees, arranged in ascending order of salary.
If you want to sort the list in descending order, the word DESC must be specified
in the ORDER BY clause after the column name, as seen here:
Executing this statement will produce results as in Table 3.17 for ascending list
and Table 3.18 for descending list.
Query 10: List the employees sorted by position and in each position sort the list
in descending order by salary.
This query requires using two sort keys. The Position is the primary sort key and
the Salary is the secondary or minor sort key. The primary sort key has to be
written first in the list and followed by minor keys. You may have more than one
minor key.
This statement will provide us with the table as shown in Table 3.19.
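The two-key sort can be sketched as follows (Python/sqlite3, rows invented): Position is the primary sort key, and Salary, in descending order, orders the rows within each position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Position TEXT, Salary REAL)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("Aina", "Clerk", 1300), ("Badrul", "Clerk", 1200),
    ("Chong", "Manager", 3500)])  # sample rows invented

# Primary sort key first (Position), then the minor key (Salary DESC)
rows = conn.execute("SELECT Name, Position, Salary FROM Employee "
                    "ORDER BY Position, Salary DESC").fetchall()
print(rows)
```

Within the Clerk group, the higher salary comes first because of DESC, while the groups themselves follow the ascending order of Position.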
SELF-CHECK 3.4
1. Write the SELECT statement to display all the information about
the employees, sorted by the names of employees, in descending
order.
In this subtopic, we use Product and Delivery tables shown in Table 3.21 and
Table 3.22 to illustrate the use of these aggregate functions.
Query 11: Find the number of products supplied by supplier number S8843.
This statement counts only the products that are supplied by the supplier
with supplier number S8843. In Example 3.11, the return value of this
statement is 2, as shown in Table 3.23.
NumOfProduct
2
Query 12: How many different products were delivered from January to April in
the year 2013?
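A sketch of this query, assuming the Delivery table records a DeliveryDate column; COUNT(DISTINCT ...) ensures each product is counted only once:

```sql
SELECT COUNT(DISTINCT ProductNo) AS NumOfProduct
FROM Delivery
WHERE DeliveryDate BETWEEN '2013-01-01' AND '2013-04-30';
```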
NumOfProduct
5
Query 13: Count the number of products priced at less than RM500 per unit and
total their QuantityOnHand.
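A sketch combining COUNT and SUM (the price column name, PricePerUnit, is an assumption):

```sql
SELECT COUNT(*) AS NumOfProduct,
       SUM(QuantityOnHand) AS TotalStock
FROM Product
WHERE PricePerUnit < 500;
```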
This statement counts the number of products whose price is less than 500 and
sums up their QuantityOnHand. The result is shown in Table 3.25.
NumOfProduct TotalStock
3 50
Query 14: Find the minimum, maximum and the average price per unit.
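A sketch using the MIN, MAX and AVG aggregate functions (PricePerUnit is assumed as the price column name):

```sql
SELECT MIN(PricePerUnit) AS MinPrice,
       MAX(PricePerUnit) AS MaxPrice,
       AVG(PricePerUnit) AS AvgPrice
FROM Product;
```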
The result table for the above statement is shown in Table 3.26.
SuppNo NumOfProduct
S8843 2
S9888 1
S9898 1
S9995 1
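The query text for what appears to be Query 15 is missing from this copy. Judging from the result table above, it groups products by supplier; a likely form is:

```sql
SELECT SuppNo, COUNT(*) AS NumOfProduct
FROM Product
GROUP BY SuppNo;
```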
Query 16: Find the OrderNo that has more than one product.
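A sketch using GROUP BY with a HAVING clause on the Delivery table:

```sql
SELECT OrderNo, COUNT(ProductNo) AS NumOfProduct
FROM Delivery
GROUP BY OrderNo
HAVING COUNT(ProductNo) > 1;
```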
This operation groups the Delivery data based on OrderNo and lists only
those groups that have more than one product. The output of this operation
is shown in Table 3.28.
OrderNo NumOfProduct
1120 3
4399 2
3.3.5 Subqueries
In this subtopic, we are going to learn how to use subqueries. Here, we provide
examples of subqueries that involve the use of the SELECT statement within
another SELECT statement which is also sometimes referred to as nested
SELECT. In terms of the order of the execution, the inner SELECT will be
performed first and the result of the inner SELECT will be used for the filter
condition in the outer SELECT statement.
Query 17: List the product names and their price per unit for products that are
supplied by ABX Technics.
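A sketch of the nested SELECT (the column names PricePerUnit and Name are assumptions):

```sql
SELECT ProductName, PricePerUnit
FROM Product
WHERE SuppNo = (SELECT SuppNo
                FROM Supplier
                WHERE Name = 'ABX Technics');
```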
First, the inner SELECT statement is executed to get the supplier number of
ABX Technics. The output from this statement is tested as part of the search
condition in the WHERE clause of the outer SELECT statement. Note that
the "=" sign has been used in the WHERE clause of the outer SELECT since
the result of the inner SELECT contains only one value.
The final result table from this query is shown in Table 3.29.
Query 18: List the supplier number, product names and their price per unit for
products that are supplied by suppliers from Petaling Jaya.
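A sketch of the subquery using IN, since several suppliers may be located in Petaling Jaya:

```sql
SELECT SuppNo, ProductName, PricePerUnit
FROM Product
WHERE SuppNo IN (SELECT SuppNo
                 FROM Supplier
                 WHERE City = 'Petaling Jaya');
```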
Here, the result of the inner SELECT statement can contain more than one
value. Therefore, the IN keyword is used in the search condition in the
WHERE clause of the outer SELECT statement. The result table for the above
statement is shown in Table 3.30.
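The query text and statement for Query 19 appear to be missing from this copy. Consistent with the explanation that follows, a simple join of Product and Supplier on SuppNo might look like:

```sql
SELECT ProductName, Supplier.Name AS SupplierName
FROM Product, Supplier
WHERE Product.SuppNo = Supplier.SuppNo;
```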
This statement joins two tables, Product and Supplier. Since the common
column in both tables is SuppNo, this column is used for the join condition in
the WHERE clause. The output of this simple join statement is shown in
Table 3.31.
ProductName SupplierName
17 inch Monitor ABX Technics
19 inch Monitor ABX Technics
Laser Printer Soft System
Colour Laser Printer ID Computers
Colour Scanner ITN Suppliers
Query 20: Sort the list of products based on supplier name and for each
supplier name, sort the list based on Product names in descending order.
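A sketch extending the join with a two-key sort (SupplierName here is an alias for Supplier.Name):

```sql
SELECT ProductName, Supplier.Name AS SupplierName
FROM Product, Supplier
WHERE Product.SuppNo = Supplier.SuppNo
ORDER BY SupplierName, ProductName DESC;
```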
ProductName SupplierName
19 inch Monitor ABX Technics
17 inch Monitor ABX Technics
Laser Printer Soft System
Colour Laser Printer ID Computers
Colour Scanner ITN Suppliers
Query 21: Find the supplier names of the products that were delivered in Jan
2013. Sort the list based on supplier name.
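A sketch of the three-table join (the DeliveryDate column name and date format are assumptions):

```sql
SELECT DISTINCT Supplier.Name
FROM Supplier, Product, Delivery
WHERE Supplier.SuppNo = Product.SuppNo
  AND Product.ProductNo = Delivery.ProductNo
  AND Delivery.DeliveryDate BETWEEN '2013-01-01' AND '2013-01-31'
ORDER BY Supplier.Name;
```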
This query requires us to join three tables. All the join conditions are
listed in the WHERE clause. As noted earlier, the common column of each
pair of tables to be joined is used as the join condition. To join Supplier
and Product, the supplier number is used, and to join Product and Delivery,
the product number is used. The result from this join is shown in Table 3.33.
In this subtopic, you are going to learn SQL commands that are used for
modifying the contents of a table in a database. The SQL commands that are
commonly used are as shown in Figure 3.3.
3.4.1 INSERT
INSERT is used to add new records or data into an existing database table. The
syntax for INSERT command is as follows:
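The syntax shown in the original appears to have been lost in this copy; the standard form is:

```sql
INSERT INTO TableName [(columnList)]
VALUES (dataValueList);
```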
(a) columnList is optional; if omitted, SQL assumes the column list and its
order are the same as the column names specified when the table was first
created;
(b) Any columns omitted must have been declared as NULL when the table
was created, unless DEFAULT was specified when creating the column; and
(c) The data type of each item in dataValueList must be compatible with the
data type of the corresponding column.
In this subtopic, we illustrate the variation of the INSERT statement using the
table of Supplier as in Table 3.34.
Query 22: Add a new record as given below to the Supplier table.
Since you want to insert values for all the columns in the table, you may omit
the column list. Thus, you may write the SQL statement as:
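The statement itself is missing here; a sketch consistent with the new supplier row (S9996, NR Tech) shown in the table later in this subtopic:

```sql
INSERT INTO Supplier
VALUES ('S9996', 'NR Tech', '20, Jalan Selamat',
        'Kuala Lumpur', '62000', '23456677', 'Nick');
```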
Note that you must enclose the values of non-numeric columns in quotation
marks, such as 'Kuala Lumpur' for the City. Executing this statement will
give us the result in Table 3.35.
You may insert a new record with only specific columns into a table. However,
every mandatory column, that is, a column defined as NOT NULL in the
CREATE TABLE statement, must be supplied with a value.
Query 23: Add a new record as given below to the Supplier table.
In this example, the data provided is incomplete; some information, such as the
postcode and contact person, is missing. In this case, you only need to specify
the column names that you are going to use. You may also omit the column list,
but a NULL value is then required for each column that has no value.
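A sketch consistent with the incomplete supplier row (S9997, Total System, with no postcode or contact person) shown later in this subtopic; only the named columns receive values:

```sql
INSERT INTO Supplier (SuppNo, Name, Street, City, TelNo)
VALUES ('S9997', 'Total System', '25, Jalan Tanjung',
        'Kuala Lumpur', '43385667');
```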
3.4.2 UPDATE
The UPDATE statement is used to update or change records that match specified
criteria. This is accomplished by carefully constructing a WHERE clause.
UPDATE TableName
SET columnName1 = dataValue1
[, columnName2 = dataValue2...]
[WHERE searchCondition]
(a) SET clause specifies the names of one or more columns that are to be
updated;
(b) WHERE clause is optional: if omitted, the named columns are updated for
all rows in the table; if specified, only those rows that satisfy
searchCondition are updated; and
(c) New dataValue(s) must be compatible with the data type of the
corresponding column.
Let us look at the variance in the use of the UPDATE statement for modifying
values in a table.
Updating may involve modifying a particular column for all records in a table.
Query 24: Give each employee a 10% pay rise.
UPDATE Employee
SET Salary = Salary*1.10;
If the changes are only for particular rows with specified criteria, then the
WHERE clause needs to be used in the statement.
Query 25: Give all managers a 5% pay rise. This can be written as follows:
UPDATE Employee
SET Salary = Salary*1.05
WHERE Position = 'Manager';
Query 26: Update the contact person to Ahmad for Total System.
We may also sometimes only need to update one column for a specific row. For
instance, this query requires us to update the contact person in the Supplier
table to Ahmad for the supplier named Total System. Thus, the UPDATE
statement for this query would be as follows:
UPDATE Supplier
SET ContactPerson = 'Ahmad'
WHERE Name = 'Total System';
SuppNo  Name           Street             City           Postcode  TelNo     ContactPerson
S8843   ABX Technics   12, Jalan Subang   Subang Jaya    45600     56334532  Teresa Ng
S9884   Soft System    239, Jalan 2/2     Shah Alam      40450     55212233  Fatimah
S9898   ID Computers   70, Jalan Hijau    Petaling Jaya  41700     77617709  Larry Wong
S9990   ITN Suppliers  45, Jalan Maju     Subang Jaya    45610     56345505  Tang Lee Huat
S9995   FAST Delivery  3, Lahad Lane      Petaling Jaya  41760     77553434  Henry
S9996   NR Tech        20, Jalan Selamat  Kuala Lumpur   62000     23456677  Nick
S9997   Total System   25, Jalan Tanjung  Kuala Lumpur             43385667  Ahmad
3.4.3 DELETE
The DELETE statement is used to delete records or rows from an existing table.
(a) TableName can be the name of a base table or an updatable view; and
(b) searchCondition is optional; if omitted, all rows are deleted from the table
(this does not delete the table itself). If searchCondition is specified, only
those rows that satisfy the condition are deleted.
Examples 3.27 and 3.28 show the use of the DELETE command:
Query 27: Delete supplier name 'Total System' from the Supplier table.
You need to use the WHERE clause when you want to delete only a specified
record. Thus, the statement would be as follows:
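The statement itself appears to be missing from this copy; it would be:

```sql
DELETE FROM Supplier
WHERE Name = 'Total System';
```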
Table 3.40 shows the Supplier table after deleting records of the supplier named
Total System.
If you want to delete all records from the Shipping table, then you skip the
WHERE clause. Thus, the statement would be written as:
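A sketch of the statement without a WHERE clause:

```sql
DELETE FROM Shipping;
```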
This command will delete all rows in the table Shipping but it does not delete the
table. This means that the table structure, attributes and indexes will still be
intact.
DML allows you to retrieve, add, modify and delete data from the table(s).
The basic DML commands are SELECT, INSERT, UPDATE and DELETE.
The SELECT statement is the most important statement for retrieving data
from the existing database. The result from each query of a SELECT
statement is in the form of a table. A SELECT statement has the following
syntax:
The SELECT statement can produce result tables not only from one table but
also from more than one table. When more than one table is involved, a join
operation must be used by specifying the names of the tables in the FROM
clause and the join condition in the WHERE clause.
The other SQL DML commands used for data manipulation are the INSERT,
UPDATE and DELETE commands. INSERT is used to insert new row(s) into
an existing table. UPDATE is used to modify value(s) in all or specified rows
of an existing table. DELETE is used to delete row(s) from an existing table.
1. What are the two major components of SQL and what functions do they
serve?
3. What restrictions apply to the use of the aggregate functions within the
SELECT statement?
4. Explain how the GROUP BY clause works. Identify one difference between
the WHERE and HAVING clauses.
Pratt, P. J., & Last, M. Z. (2008). A guide to SQL (8th ed.). Mason, OH: Cengage
Learning.
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Identify the data types supported by SQL;
2. Define the integrity constraints using SQL;
3. Use the integrity enhancement feature in the Create Table
statement; and
4. Create and delete views using SQL.
INTRODUCTION
In Topic 3, we examined in detail the structured query language (SQL)
particularly the SQL data manipulation features. By now, you should be
comfortable with the SELECT statement.
In this topic, we will explore the main SQL data definition facilities. We begin
this topic by examining the ISO SQL data types. The integrity enhancement
feature (IEF) improves the functionality of SQL and allows the constraint
checking to be standardised. We will examine required data, domains, entity
integrity and referential integrity constraints. Then, we will discuss the main SQL
data definition features which include the database and table creation as well as
the altering and deleting of a table. This topic concludes with the creation and the
removal of views.
(a) SQL identifiers are used to identify the following items in the database:
The data type character is referred to as a string data type while exact numeric
and approximate numeric data types are referred to as numeric data types.
(i) TRUE; or
(ii) FALSE.
For example, in our Order Entry Database, the EmpNo attribute in the
Employee table has a fixed length of five characters. It is declared as:
EmpNo CHAR(5)
This column has a fixed length of five characters; when we insert fewer
than five characters, the string is padded with blanks to make up five
characters.
Name VARCHAR(15)
The T value indicates the total number of digits and the R value
indicates the number of digits to the right of the decimal point.
Salary DECIMAL(7,2)
QtyOnHand INTEGER(4)
(e) Date
The date data type is defined in columns such as the DOB (date of birth)
column in the Employee table. This is declared in the SQL as:
DOB DATE
SELF-CHECK 4.1
A null is not a blank or zero and is used to represent data that is not available
or not applicable.
Connolly & Begg (2009)
However, some columns must contain some valid data. For example, every
employee in the Employee relation must have a position, whether they are a
salesperson, manager or a clerk. SQL provides the NOT NULL clause in the
CREATE TABLE statement to enforce the required data constraint.
To ensure that the column position of the Employee table cannot be null, we
define the column as:
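The column definition appears to be missing here; a sketch (the VARCHAR length is an assumption):

```sql
Position VARCHAR(12) NOT NULL
```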
When NOT NULL is specified, the Position column must have a data value.
To ensure that the gender can only be specified as 'M' or 'F', we define the
domain constraint on the Gender column as:
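A sketch of the domain constraint using a CHECK clause:

```sql
Gender CHAR NOT NULL
    CHECK (Gender IN ('M', 'F'))
```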
Entity integrity means that the primary key value of a table must be unique
and cannot be null.
For example, every EmpNo in the Employee relation is unique and identifies the
employee.
To support the entity integrity, SQL provides the PRIMARY KEY clause in the
CREATE and ALTER TABLE statements. For example, to declare EmpNo as the
primary key, we use the clause as:
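A sketch of the clause as it would appear in a CREATE TABLE statement:

```sql
PRIMARY KEY (EmpNo)
```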
A foreign key value in a relation must match a candidate key value of the
tuple in the referenced relation or the foreign key value can be null.
Connolly & Begg (2009)
As for the Order Entry Database, the Product table has the foreign key SuppNo.
You will notice that every entry of SuppNo in the rows of the Product table (child
table) matches the SuppNo of the referenced table Supplier (parent table).
(i) CASCADE
Perform the same action to related rows. For example, if a SuppNo in
the Supplier table is deleted, then the related rows in the Product table
will be deleted in a cascading manner.
(iii) NO ACTION
Reject the delete operation from the parent table. For example, do not
allow the SuppNo in the Supplier table to be deleted if there are
related rows in the Product table.
You must also consider the impact of referenced rows on insert operations.
A referenced row (in the parent table) must be inserted before its related
rows (in the child table). For example, before inserting a row in the Product
table, the referenced row in the Supplier must exist.
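Putting this together, the foreign key in the Product table might be declared as follows (the referential actions chosen here are illustrative):

```sql
FOREIGN KEY (SuppNo) REFERENCES Supplier(SuppNo)
    ON UPDATE CASCADE
    ON DELETE NO ACTION
```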
SELF-CHECK 4.2
Therefore, if the schema is OrderProcessing and the creator is Lim, the SQL
statement is:
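The statement itself appears to be missing; the ISO SQL form would be:

```sql
CREATE SCHEMA OrderProcessing AUTHORIZATION Lim;
```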
The CREATE TABLE statement creates a table consisting of one or more columns
of the defined data type.
The NOT NULL is specified to ensure that the column must have a data value.
The remaining clauses are constraints and are headed by the clause:
CONSTRAINT constraintname
The PRIMARY KEY clause specifies the column(s) that comprise the primary
key. It is assumed by default that the primary key value is NOT NULL.
The FOREIGN KEY clause specifies a foreign key in the child table and its
relationship to the parent table. This clause specifies the following:
(b) A REFERENCES subclause indicating the parent table that holds the
matching primary key;
(c) An optional ON UPDATE clause to specify the action taken on the foreign
key value of the child table, if the matching primary key in the parent table
is updated. These actions were discussed in the previous Subtopic 4.2.4;
and
(d) An optional ON DELETE clause to specify the action taken on the child
table if row(s) in the parent table whose primary key values match foreign
key values in the child table are deleted. These actions were discussed in
the previous Subtopic 4.2.4.
The following three examples show the CREATE TABLE statements for the
Order Entry Database using the tables Customer, Order and OrderDetail.
Example 4.1:
Creating the Customer table using the features of the CREATE TABLE statement:
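The CREATE TABLE statement itself is missing from this copy. A sketch based on the Customer attributes used elsewhere in this module (CustNo, Name, Street, City, Postcode, TelNo, Balance); the data type lengths are assumptions:

```sql
CREATE TABLE Customer (
    CustNo    CHAR(5)      NOT NULL,
    Name      VARCHAR(30)  NOT NULL,
    Street    VARCHAR(30),
    City      VARCHAR(15),
    Postcode  CHAR(5),
    TelNo     CHAR(8),
    Balance   DECIMAL(7,2),
    CONSTRAINT CustomerPK PRIMARY KEY (CustNo)
);
```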
Example 4.2:
Creating the Order table:
Example 4.3:
Creating the OrderDetail table:
You can now create the rest of the tables in the Order Entry Database as an
exercise.
(b) Adding a new table constraint and dropping an existing table constraint;
and
(c) Setting a default for a column and dropping an existing default for a
column.
The ADD COLUMN clause is the same as the definition of a column in the
CREATE TABLE statement.
The DROP COLUMN clause defines the name of the column to be dropped and
has the following options (Connolly & Begg, 2009):
(a) RESTRICT
The DROP operation is rejected if the column is referenced by another
database object.
(b) CASCADE
The DROP operation proceeds, and the column is also removed from any
database objects that reference it.
For example, if we want to add an extra column, that is, Branch_No to the
Employee table, the SQL statements would be:
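A sketch of the statement (the CHAR(4) data type for Branch_No is an assumption):

```sql
ALTER TABLE Employee
ADD Branch_No CHAR(4);
```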
The DROP TABLE statement should be carried out with care as the total effect
can be damaging to the rest of the database tables. It is recommended that this
clause be used if a table is created with an incorrect structure. Then, the DROP
TABLE clause can be used to delete this table and the structure can be created
again.
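For reference, the general form of the statement is:

```sql
DROP TABLE TableName [RESTRICT | CASCADE];
```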
SELF-CHECK 4.3
1. What does the CREATE TABLE statement do?
2. What does the ALTER TABLE statement do?
3. How can we remove a table from the database?
4.4 VIEWS
What does a view mean?
A view is a virtual or derived relation that may be derived from one or more
base relations.
Views do not physically exist in the database. They allow users to customise the
data according to their needs and hide part of the database from certain users.
Let us look at how views are created.
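The statement is missing from this copy; consistent with the description that follows, the view would be created as:

```sql
CREATE VIEW CustomerIpoh
AS SELECT *
   FROM Customer
   WHERE City = 'Ipoh';
```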
This will give us a view known as CustomerIpoh with the same column
names as the Customer table but only those rows where the City is Ipoh.
This view is shown below in Table 4.1.
DROP VIEW causes the definition of the view to be deleted from the schema. For
example, to remove the view CustomerIpoh, we specify it in SQL as:
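The statement itself appears to be missing; it would be:

```sql
DROP VIEW CustomerIpoh;
```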
If CASCADE is specified, DROP VIEW deletes all objects that reference the view.
If the RESTRICT option is chosen and other database items that depend on the
existence of the view are being dropped, then the command is rejected.
SELF-CHECK 4.4
SQL identifiers can use the letters a-z (upper and lower), numbers and the (_)
for table, view and column names. The identifiers cannot be more than 128
characters, must start with a letter and cannot contain spaces.
The available data types identified in SQL are Boolean, character, exact
numeric, approximate numeric and date.
Required data in SQL are specified by the NOT NULL clause. Domain
constraint is specified using the CHECK clause.
Foreign keys are specified using the FOREIGN KEY clause. The update and
delete actions on referenced rows are specified by the ON DELETE and ON
UPDATE subclauses.
We can remove a table from the database by using the DROP TABLE
statement.
A view is a derived relation that does not physically exist in the database. It
allows users to customise the data according to their needs.
A view is created by the CREATE VIEW statement. It is not a stored table and
it is not necessary to recreate the view each time it is referenced. The different
types of views that can be created include the horizontal and vertical views.
5. Discuss how you would maintain this as a materialised view and under
what circumstances you would be able to maintain the view without
having to access the underlying base table Component.
INTRODUCTION
In order to design a database, there must be a clear understanding of how the
business operates, so that the design produced will meet user requirements. The
Entity-Relationship (ER) model allows database designers, programmers and end
users to give their input on the nature of the data and how it is used in the
business. Therefore, the ER model is a means of communication that is non-
technical and easily understood.
In this topic, you are provided with the basic concepts of the ER model which
enable you to understand the notation of ER diagrams. The Crow's Foot notation
is used here to represent the ER diagrams.
5.1 ENTITY
What does entity mean?
An entity can be physical, such as people, places or objects, as well as events
and concepts, such as a reservation or a course. A full list is given in Table 5.1:
Entity Objects
Persons DOCTOR, CUSTOMER, EMPLOYEE, STUDENT, SUPPLIER
Places BUILDING, OFFICE, FACULTY
Objects STATIONERY, MACHINE, BOOK, PRODUCT, VEHICLE
Events TOURNAMENT, AWARD, FLIGHT, ORDER, RESERVATION
Concepts COURSE, FUND, QUALIFICATION
5.2 ATTRIBUTES
What does attribute stand for?
The attributes of the entity Customer are CustNo, Name, Street, City, Postcode,
TelNo and Balance.
For example, the notation for entity Customer with the stated attributes is
represented in Figure 5.2. The primary key CustNo is underlined.
5.3 RELATIONSHIPS
What can you say to define relationship?
Consider the example in Figure 5.3 between the Customer entity and the Order
entity. In the Crow's Foot notation, relationship names appear on the line
connecting the entities involved in the relationship. In Figure 5.3, the Makes
relationship shows that the Customer and Order entities are directly related.
The Makes relationship is binary because it involves two entities.
There are three main types of relationship that can exist between entities:
In this relationship, an Employee processes zero, one or more orders and each
Order is processed by one employee. The Processes relationship is optional to
the Employee entity because an Employee entity can be stored without being
related to an Order entity. However, it is mandatory for the Order entity
because an order has to be processed by one employee.
(a) The Crow's Foot symbol shows many related entities. The Crow's Foot
symbol near the Order entity type means that a customer can be related to
many orders;
To show minimum and maximum cardinality, the symbols are placed next to
each entity type in a relationship. In Figure 5.9, a customer is related to a
minimum of zero orders (circle in the inside position) and a maximum of
many orders (Crow's Foot in the outside position). In the same way, an order is
related to exactly one (one and only one) customer, as shown by the single
vertical lines in both the inside and outside positions. Table 5.2 shows a
summary of cardinality classifications using Crow's Foot notation.
Strong entities have their own primary keys. Examples of strong entities are
Product, Employee, Customer, Order, Invoice, etc. Strong entity types are known
as parent or dominant entities.
Weak entities borrow all or part of their primary keys from another (strong)
entity. As an example, see Figure 5.14, whereby the Room entity's existence
depends on the Building entity. You can only refer to a room by providing its
associated building identifier. The underlined attribute in Room is part of the
primary key but not the entire primary key. Therefore, the primary key of
Room is the combination of BuildingId and RoomNo.
A supertype is an entity that stores attributes common to one or more entity
subtypes, while a subtype is an entity that inherits some common attributes
from an entity supertype and then adds other attributes that are unique to
instances of the subtype.
Inheritance means that the attributes of a supertype are automatically part of its
subtypes, that is, each subtype inherits the attributes of the supertype. For
example, the attributes of the Pilot entity are its inherited attributes, EmpNo,
Name and HireDate, as well as its direct attributes, PilotLicence and
PilotRatings. This is because Pilot is a subtype of Employee. These direct
attributes are known as specialist attributes.
Based on the example in Figure 5.15, the generalisation hierarchy is disjoint (non-
overlapping) because an Employee cannot be a Pilot and at the same time a
Mechanic. The employee must be a Pilot or a Mechanic or an Accountant. To
show the disjoint constraint, D is used as shown in Figure 5.16.
In Figure 5.18, the completeness constraint means every Staff must either be
employed as FullTime or PartTime Staff. To show the completeness constraint, C
is used as shown in Figure 5.18.
In contrast, a generalisation hierarchy is not complete if an entity need not fall
into any of the subtype entities. If we consider our previous example of the
Employee generalisation hierarchy shown in Figure 5.16, we note that the
employee roles are pilot, mechanic and accountant. However, if the job role is
office administrator, then this entity would not fall into any of the subtypes, as
it would not have any special attributes. Therefore, an office administrator
would remain in the supertype entity, Employee. The absence of C indicates
that the generalisation hierarchy is not complete.
Some generalisation hierarchies have both the disjoint and complete constraints
as shown in Figure 5.19.
ACTIVITY 5.1
1. Consider the Company database which keeps track of a
company's employees, departments and projects:
(a) List all entities with their attributes. Underline the primary
key. Identify all weak entities; and
(d) The store owns several videos with the same movie title. A
unique identifier will be assigned to each movie title. Other
information on movies includes the title and the year produced;
and
(e) Each movie title is associated with a list of actors and one or
more directors. The store has a unique code to identify each
actor and director. In addition to the actor and director
records, other basic information on actors and directors is
stored. Using this information, the store can easily find
movies according to actor or director.
There are three types of employees in this hospital: physician
(medical doctor), nurse and administrator. Unlike administrative
staff, physicians and nurses have special attributes. A physician
has a qualification and an expert area. A
nurse has a position and a ward_id where he or she is placed. A
physician treats many patients and a patient can be treated by
more than one physician. Each treatment has prescriptions. The
prescription has a prescription_id, date, product_code, dosage
and amount. A patient can be placed in a ward. A ward is serviced
by several nurse staff. The ward information includes ward
number, building, ward_type and a number of beds.
(a) Each company operates three divisions and each division belongs
to one company;
(b) Each division in (a) employs one or more employees and each
employee works for one division;
(c) Each of the employees in (b) may or may not have one or more
dependents, and each dependent belongs to one employee;
(d) Each employee in (c) may or may not have an employment history;
and
(e) Represent all the ER diagrams described in (a), (b), (c) and (d) as a
single ER diagram.
Rob, P., & Coronel, C. (2011). Database systems: Design, implementation and
management (8th ed.). Stamford, CT: Cengage Learning.
INTRODUCTION
In this topic, we introduce the concept of normalisation and explain its
importance in database design. Next, we will present the potential problems
in database design, which are also referred to as update anomalies. One of the
main goals of normalisation is to produce a set of relations that is free from
update anomalies. Then, we go into the key concept of the normalisation
process, which is functional dependency. Normalisation involves a step-by-step
process through a series of normal forms. Thus, this topic will cover the
normalisation process up to the third normal form.
(a) Eliminating redundant data (for example, storing the same data in more
than one table); and
(b) Ensuring data dependencies make sense (only storing related data in a
table).
Both of these are worthy goals as they reduce the amount of space a database
consumes and ensure that data is logically stored. If our database is not
normalised, it can be inaccurate, slow, inefficient and it might not produce the
data we expect. Also, if we have a normalised database, queries, forms and
reports are much easier to design!
SELF-CHECK 6.1
1. Define normalisation.
Why normalisation?
Normalisation is also used to repair a "bad" database design, that is, a set of
tables that exhibits update, delete and insert anomalies. The normalisation
process can be used to change this set of tables to a set that does not have
these problems.
SELF-CHECK 6.2
Redundant data utilises a lot of unnecessary space and may also create
problems when updating the database, called update anomalies, which may
lead to data inconsistency and inaccuracy.
To illustrate the problem associated with data redundancy that causes update
anomalies, let us compare the Supplier and Product relation shown in Figure 6.1
with the alternative format that combines these relations into a single relation
called Product-Supplier relation as shown in Figure 6.2. For the Supplier relation,
supplier number (SuppNo) is the primary key and for Product relation, product
number (ProductNo) is the primary key. For the Product-Supplier relation,
ProductNo is chosen as the primary key.
You should notice that in the Product-Supplier relation, details of the supplier
are included for each product. These supplier details (SName, TelNo and
ContactPerson attributes) are unnecessarily repeated for every product that is
supplied by the same supplier and this leads to data redundancy. For instance,
products P2344 and P2346 have the same supplier, so the same supplier
details are repeated for both products. These supplier detail attributes are
also considered a repeating group.
On the other hand, in the Product relation, only the supplier number is
repeated, for the purpose of linking each product to a supplier, and in the
Supplier relation, details of each supplier appear only once.
A relation with data redundancy, as shown in Figure 6.2, may result in
problems called update anomalies, comprising insertion, deletion and
modification anomalies. In the following subtopic, we illustrate each of these
anomalies using the Product-Supplier relation.
Insertion anomalies exist when adding a new record causes unnecessary
data redundancy, or when unnecessary constraints are placed on the task
of adding new records.
There are two examples of insertion anomalies for the Product-Supplier relation
in Figure 6.2:
A deletion anomaly exists when deleting a record also removes other data
not intended for deletion.
As a result, we lose the whole information about this supplier because supplier
S9898 only appears in the tuple that we removed. In a properly normalised
database, this deletion anomaly can be avoided as the information about supplier
and product is stored in separate relations and they are linked together using
supplier number. Therefore, when we delete product number P5443 from
Product relation, details about supplier S9898 from Supplier relation are not
affected.
Redundant information not only wastes storage but makes updates more
difficult. This difficulty is called modification anomaly.
For example, changing the name of the contact person for supplier S9990 would
require that all tuples containing supplier S9990 need to be updated. If for some
reason, all tuples are not updated, we might have a database that has two
different names of contact persons for supplier S9990.
Since our example is only dealing with a small relation, it does not seem to be a
big problem. However, its effect would be very significant when we are dealing
with a very large database.
Before we discuss the details of the normalisation process, let us look at the
functional dependency concept, which is an important concept in the
normalisation process.
SELF-CHECK 6.3
1. Briefly explain data redundancy.
2. Give one example of how data redundancy can cause update
anomalies.
3. Briefly differentiate between insertion anomalies, deletion
anomalies and modification anomalies.
For a simple illustration of this concept, let us use a relation with attributes A
and B. B is functionally dependent on A if each value of A is associated with
exactly one value of B. This dependency between A and B is written as
"A → B".
We may think of how to determine functional dependency like this: given a value
for attribute A, can we determine the single value for B? If B relies on A, then A is
said to functionally determine B. The functional dependency between attribute A
and B is represented diagrammatically in Figure 6.4.
Another example is the relationship between CustNo and OrderNo. Based on the
CustomerOrdering relation, a customer may make more than one order. Thus, a
CustNo may be associated with more than one OrderNo. In other words, the
relationship between CustNo and OrderNo is one-to-many as illustrated in
Figure 6.5. In this case, we can say that OrderNo is not functionally dependent on
CustNo.
Take, for example, the following functional dependency that exists in the
CustomerOrdering relation: (OrderNo, ProductNo) → CustNo. Here, CustNo is
also functionally dependent on a subset of the key (OrderNo, ProductNo),
namely OrderNo; this is known as a partial dependency.
OrderNo → CustNo,
CustNo → CustName,
therefore OrderNo → CustName.
So, the OrderNo attribute functionally determines CustName via the CustNo
attribute; this is known as a transitive dependency.
Now, let us list down all the possible functional dependencies for the
CustomerOrdering relation. We will get a list of functional dependencies as listed
in Figure 6.7.
In order to find the candidate key(s), we must identify the attribute (or group of
attributes) that uniquely identifies each tuple in a relation. Therefore, to identify
the possible choices of candidate keys, we should examine the determinants for
each functional dependency. Then, we select one of them (if more than one) as
the primary key. All attributes that are not the primary key attribute are referred
to as non-key attributes. These non-key attributes must be functionally
dependent on the key.
Now, let us identify the candidate keys for relation CustomerOrdering. We have
identified the functional dependencies for this relation as given in the previous
Figure 6.7. The determinants for these functional dependencies are: CustNo,
OrderNo, ProductNo and (OrderNo, ProductNo). From this list of determinants,
the (OrderNo, ProductNo) is the only determinant that uniquely identifies each
tuple in the relation. It is also true that all attributes (besides the OrderNo and
ProductNo) are functionally dependent on the determinants with combination of
attributes OrderNo and ProductNo (OrderNo, ProductNo). Thus, it is the
candidate key and the primary key for CustomerOrdering relation.
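The uniqueness test described above can be made concrete: a determinant is a candidate key only if its values identify each tuple uniquely. The sample rows are illustrative, not copied from the module's tables.

```python
# A (set of) attribute(s) is a candidate key if its value combination
# is unique across all tuples of the relation.
def is_candidate_key(rows, attrs):
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))

ordering = [
    {"OrderNo": "O1", "ProductNo": "P1", "QtyOrdered": 2},
    {"OrderNo": "O1", "ProductNo": "P2", "QtyOrdered": 5},
    {"OrderNo": "O2", "ProductNo": "P1", "QtyOrdered": 1},
]
# Only the composite (OrderNo, ProductNo) identifies every tuple uniquely.
```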
The normalisation process involves a series of steps and each step is called a
normal form. Three normal forms were initially proposed called first normal
form (1NF), second normal form (2NF) and third normal form (3NF).
Higher normal forms that go beyond BCNF were introduced later, such as fourth
normal form (4NF) and fifth normal form (5NF). However, these later normal
forms deal with situations that are very rare. In this topic, we will only cover the
first three normal forms. Figure 6.8 illustrates the process of normalisation up to
the third normal form.
The details of the process will be discussed in the following subtopic. Let us
assume that we have transferred all the required attributes from the user
specification requirement into the table format and referred to it as
CustomerOrdering table as shown in Table 6.3. We are going to use the
CustomerOrdering table to illustrate the normalisation process.
(a) Nominate an attribute or group of attributes to act as the key for the
unnormalised table; and
(b) Identify the repeating groups(s) in the unnormalised table which repeats
for the key attribute(s).
(a) By entering appropriate data into the empty columns of rows containing
the repeating data (fill in the blanks by duplicating the non-repeating data
where required); or
(b) By placing the repeating data along with a copy of the original key
attribute(s) into a separate relation.
Then, after performing one of the above approaches, we need to check whether
the relation is in 1NF or not. If any repeating group remains, place it into a
separate relation along with a copy of its determinant.
The process above must be repeated for all the new relations created for the
repeating attributes to ensure that all relations are in 1NF.
For example, let us use the first approach by entering the appropriate value to
each cell of the table. Then, we will select a primary key for the relation and
check for repeating groups. If there is a repeating group, then we have to remove
the repeating group to a new relation.
The first step is to check whether the table is unnormalised or is already in the
1NF. Using the CustomerOrdering table to illustrate the normalisation process,
we then select a primary key for the table, which is CustNo. Next, we need to
find repeating groups or multi-valued attributes. We can see that ProductNo,
ProductName, UnitPrice and QtyOrdered have more than one value for CustNo
= "C1010" and "C2388". So, these attributes are repeating groups and thus, the
table is unnormalised.
As illustrated in Figure 6.8, our next step is to transform this unnormalised table
into 1NF. First, we need to make the table into a normalised relation. Let us apply
the first approach in which we need to fill up all the empty cells with a relevant
value as shown in Table 6.4. Each cell in the table now has an atomic value.
The next step is to check if the table we just created is in 1NF. Firstly, we need to
identify the primary key for this table and then check for repeating groups. The
best choice would be to look at the list of functional dependencies that you have
identified. From the functional dependency list, we can say that the
combination of OrderNo and ProductNo, written (OrderNo, ProductNo),
functionally determines all the non-key attributes in the table.
This means that the value of each (OrderNo, ProductNo) is associated with only
a single value of all other attributes in the table and (OrderNo, ProductNo) also
uniquely identifies each of the tuples in the relation. Thus, we can conclude that
(OrderNo, ProductNo) is the best choice as the primary key, since the relation
will not have any repeating group. Therefore, this relation is in 1NF (refer to
Table 6.5).
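The "fill in the blanks" route to 1NF can be sketched as flattening each repeating group into its own atomic row. The miniature data set below is an assumption for illustration, not the module's Table 6.4.

```python
# Unnormalised form: one (CustNo, OrderNo) entry holds a repeating group
# of (ProductNo, QtyOrdered) pairs (sample data is illustrative).
unnormalised = {
    ("C1010", "O100"): [("P1", 2), ("P2", 5)],
}

# 1NF: every repeating value becomes its own row, duplicating the
# non-repeating attributes so that each cell holds an atomic value.
first_nf = [
    {"CustNo": c, "OrderNo": o, "ProductNo": p, "QtyOrdered": q}
    for (c, o), items in unnormalised.items()
    for (p, q) in items
]
```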
Let us now transform Table 6.5 to 2NF. The first step is to examine whether the
relation has partial dependency. Since the primary key chosen for the relation is a
combination of two attributes, therefore we should check for partial dependency.
From the list of functional dependencies, the attributes ProductName and
UnitPrice are fully functionally dependent on part of the primary key, namely
ProductNo, while CustNo, CustName, TelNo and OrderDate are fully
functionally dependent on the other part of the primary key, namely OrderNo.
Thus, this relation is not in 2NF and we need to remove these partial dependent
attributes into a new relation along with a copy of their determinants. Therefore,
we have to remove ProductName and UnitPrice into a new relation, along with
its determinant which is ProductNo. We also need to remove CustNo,
CustName, TelNo and OrderDate into another new relation along with the
determinant OrderNo. After performing this process, 1NF Customer Ordering
relation is now broken down into three relations, which can be named as
Product, Order and OrderProduct as listed in Figure 6.9.
Figure 6.9: Second normal form relations derived from the CustomerOrdering relation
Since we made changes to the original relation and have created two new
relations, we need to check and ensure that each of these relations is in 2NF.
Based on the definition of 2NF, these relations must first be checked for 1NF test
for repeating groups, then be checked for partial dependency. All these relations
are in 1NF as none of them have repeating groups.
For relations Order and Product, we may skip the partial dependency test as
their primary key only has one attribute. Thus, both of the relations are already
in 2NF. For the OrderProduct relation, there is only one non-key attribute which
is QtyOrdered and this attribute is fully functionally dependent on (OrderNo,
ProductNo). Thus, this relation is also in 2NF.
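The three 2NF relations named above can be sketched as DDL, run here through SQLite. The column lists follow the attributes discussed in the text, but the data types are assumptions; Order is quoted because it is an SQL reserved word.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Partial dependencies removed: product details live with ProductNo,
-- order details with OrderNo, and only QtyOrdered stays with the
-- composite key (OrderNo, ProductNo).
CREATE TABLE Product (ProductNo TEXT PRIMARY KEY, ProductName TEXT, UnitPrice REAL);
CREATE TABLE "Order" (OrderNo TEXT PRIMARY KEY, CustNo TEXT, CustName TEXT,
                      TelNo TEXT, OrderDate TEXT);
CREATE TABLE OrderProduct (OrderNo TEXT, ProductNo TEXT, QtyOrdered INTEGER,
                           PRIMARY KEY (OrderNo, ProductNo));
""")
tables = {r[0] for r in con.execute("SELECT name FROM sqlite_master WHERE type='table'")}
```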
Now, let us look at all the three 2NF relations as shown in the previous
Figure 6.9. Since we are looking for a functional dependency between two non-
key attributes, we can say that the relation OrderProduct is already in 3NF. This
is because this relation only has one non-key attribute which is the QtyOrdered.
We need to check for relation Product and Order, as both of these relations have
more than one non-key attribute.
For the Order relation, CustName and TelNo are determined by CustNo, which
is not the primary key, so OrderNo determines them only transitively via
CustNo. We therefore remove CustName and TelNo into a new Customer
relation, along with a copy of their determinant, CustNo. The new Customer
relation has no repeating group, thus it is in 1NF. It is also in 2NF since its
primary key consists of only one attribute. It also has no transitive
dependency and thus, the Customer relation is already in 3NF.
Now, let us check the other three relations. All of them have no transitive
dependencies. Therefore, we conclude that these relations are in 3NF, as shown
in Figure 6.10.
Figure 6.10: Third normal form relations derived from the CustomerOrdering relation
The 1NF eliminates duplicate attributes from the same relation, creates
separate relations for each group of related data and identifies each tuple
with a unique attribute or set of attributes (the primary keys).
The 2NF will remove subsets of data that apply to multiple rows of a table,
place them in separate tables and create relationships between these new
relations and the original relation by copying the determinants of the partial
dependency attributes to the new relations.
The 3NF will remove columns that are not dependent upon the primary key,
which is the functional dependency between the two non-key attributes.
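Putting it all together, the final 3NF relations can be sketched as DDL, with a join showing that a customer's details are now stored once. The sample row values and column types are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Transitive dependency removed: CustName and TelNo now live in
-- Customer, reached from an order via CustNo (types and data assumed).
CREATE TABLE Customer (CustNo TEXT PRIMARY KEY, CustName TEXT, TelNo TEXT);
CREATE TABLE "Order"  (OrderNo TEXT PRIMARY KEY, OrderDate TEXT,
                       CustNo TEXT REFERENCES Customer(CustNo));
CREATE TABLE Product  (ProductNo TEXT PRIMARY KEY, ProductName TEXT, UnitPrice REAL);
CREATE TABLE OrderProduct (OrderNo TEXT, ProductNo TEXT, QtyOrdered INTEGER,
                           PRIMARY KEY (OrderNo, ProductNo));
INSERT INTO Customer VALUES ('C1010', 'Ahmad', '012-3456789');
INSERT INTO "Order"  VALUES ('O100', '2013-09-01', 'C1010');
""")

# The customer's name is stored once and recovered by a join.
name = con.execute("""
    SELECT c.CustName
    FROM "Order" o JOIN Customer c ON o.CustNo = c.CustNo
    WHERE o.OrderNo = 'O100'
""").fetchone()[0]
```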
Refer to the following figure and convert this user view to a set of 3NF relations.
XYZ COLLEGE
CLASS LIST
SEMESTER SEPT 2013
Rob, P., & Coronel, C. (2011). Database systems: Design, implementation and
management (8th ed.). Stamford, CT: Cengage Learning.
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Discuss the purpose of design methodology;
2. Explain the three main phases of design methodology; and
3. Apply the methodology for designing relational databases.
INTRODUCTION
In this topic, we will describe three main phases of database design methodology
for relational databases. These phases are namely conceptual, logical and
physical database designs. The conceptual design phase focuses on building a
conceptual data model which is independent of software and hardware
implementation details.
The logical design phase maps the conceptual model on to a logical model of a
specific data model but independent of the software and physical consideration.
Last but not least, the physical design phase is tailored to a specific database
management system (DBMS) and focuses on the hardware requirements. The
detailed activities associated with each of these phases will be discussed in
the next subtopics.
Normally, a design methodology is broken down into phases or stages and for
each phase, the detailed steps are outlined and appropriate tools and techniques
are specified. Design methodology is able to support and facilitate designers in
planning, modelling and managing a database development project in a
structured and systematic manner. Validation is one of the key aspects in the
design methodology as it helps to ensure that the produced models accurately
represent the user requirement specifications.
(a) Conceptual
The conceptual database design is aimed at producing a conceptual
representation of the required database. The core activity in this phase
involves the use of entity-relationship (ER) modelling in which the entities,
relationship and attributes are defined.
(b) Logical
In the logical design phase, the aim is to map the conceptual model which is
represented by the ER model to the logical structure of the database.
Among the activities involved in this phase is the use of the normalisation
process to derive and validate relations.
(c) Physical
In the physical design phase, the emphasis is to translate the logical
structure of the physical implementation of the database using the defined
database management system.
Besides the stated three main phases, this methodology has also outlined eight
core steps. Step 1 focuses on the conceptual database design phase, Step 2 focuses
on the logical database design phase and Step 3 to Step 8 focus on the physical
database design phase. This topic will only cover Step 1 to Step 6 (refer to Table 7.1).
Step Description
1 Build a Conceptual Data Model
(a) Identify the entity
(b) Identify the relationship
(c) Identify and associate the attributes with entity or relationship
(d) Determine the attribute domains
(e) Determine the candidate, primary and alternate key attributes
(f) Consider the use of enhanced modelling concepts (optional step)
(g) Check the model for redundancy
(h) Validate the conceptual model against user transactions
(i) Review the conceptual data model with user
2 Build and Validate a Logical Data Model
(a) Derive the relations for logical data model
(b) Validate the relations using normalisation
(c) Validate the relations against user transactions
(d) Check the integrity constraints
(e) Review the logical data model with user
(f) Merge the logical data models into a global model (optional step)
3 Translate a Logical Database Design for a Target DBMS
(a) Design the base tables
(b) Design the representation of derived data
(c) Design the remaining business rules
4 Design the File Organisations and Indexes
(a) Analyse the transactions
(b) Choose the file organisation
(c) Choose the indexes
(d) Estimate the disk space requirements
5 Design the User Views
6 Design a Security Mechanism
(d) Incorporate structural and integrity considerations into the data models;
(h) Build a data dictionary to supplement the data model diagrams; and
These factors serve as a guideline for designers and they need to be incorporated
into the database design methodology.
SELF-CHECK 7.1
(a) Entity;
(b) Relationship;
(i) Customer;
(ii) Employee;
(iii) Product;
(iv) Order;
(v) Invoice;
(vii) Supplier.
For our product ordering case study, we have identified the following
relationships:
(c) Step 1c: Identify and Associate the Attributes with Entity or Relationship
After identifying the entity and relationship, the next step is to identify
their attributes. It is important to determine the type of these attributes.
Again, we need to document the details of each identified attribute. For our
case study, the list of attributes for the defined entities is as follows:
(e) Step 1e: Determine the Candidate, Primary and Alternate Key Attributes
As we have mentioned in Topic 2, a relation must have a key that can
uniquely identify each of the tuples. In order to identify the primary key,
we need to first determine the candidate key for each of the entities.
The primary key for each of the entities are underlined as follows:
(f) Step 1f: Consider the Use of Enhanced Modelling Concepts (Optional Step)
This step is involved with the use of enhanced modelling concepts such as
specialisation or generalisation, aggregation and composition. These
concepts are beyond the scope of our discussion.
(h) Step 1h: Validate the Conceptual Model Against User Transactions
We have to ensure that the conceptual model supports the transactions
required by the user view.
(i) Step 1i: Review the Conceptual Data Model with User
User involvement during the review of the model is important to ensure
that the model is a "true" representation of the user's view of the enterprise.
SELF-CHECK 7.2
(a) Step 2a: Derive the Relations for Logical Data Model
Firstly, we create a set of relations for the logical data model based on the
ER model produced in the prior design phase to represent the entities,
relationships and key attributes.
(ii) Attribute domain constraints: Define a set of allowable values for each
attribute;
(e) Step 2e: Review the Logical Data Model with User
In this step, we need to let the user review the logical data model to ensure
that the model is the true representation of the data requirements of their
enterprise. This is to ensure that the user is satisfied and we can continue to
the next step.
(f) Step 2f: Merge the Logical Data Models into a Global Model (Optional Step)
This step is important for multi-user views. Since each user view will have
its own conceptual model (referred to as a local conceptual model), each of
these models will be mapped to a separate local logical data model.
During this step, all the local logical models will be merged into one global
logical model. Since we consider our case study as a single user view, this
step is skipped.
SELF-CHECK 7.3
The outputs from the logical design phase, consisting of all the documents
that describe the logical data model such as the ER diagram, relational
schema and data dictionary, are important sources for the physical design
process. Unlike the logical phase, which is independent of the DBMS and
implementation considerations, the physical phase is tailored to a specific
DBMS and is dependent on the implementation details.
In the physical phase, Connolly and Begg (2009) have outlined six steps, starting
with Step 3 until Step 8. For our discussion of this phase, we only present Step 3
to Step 6 as follows:
In the logical design phase, the aim is to map the conceptual model which is
represented by the ER model to the logical structure of the database. Among
the activities involved in this phase is the use of the normalisation process to
derive and validate relations.
2. How would you check a data model for redundancy? Give an example to
illustrate your answer.
Rob, P., & Coronel, C. (2011). Database systems: Design, implementation and
management (8th ed.). Stamford, CT: Cengage Learning.
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Discuss the importance of database security to an organisation;
2. Identify five types of threats that can affect a database system;
3. Describe six methods to protect a computer system using
computer-based controls; and
4. Identify four methods for securing a Database Management System
(DBMS) on the Web.
INTRODUCTION
In this topic, we will discuss database security. What do you think about security
in general? Do you feel safe at home or on the road? What about database
security? Do you think that database security is important? What is the value of
the data? What if your personal data or your financial data is stolen? Do you
think that harm could come to you? For sure, some of you have watched spy
movies where computer hackers hack the computer system to access confidential
data and what they do with this information. These are some of the questions
that you might need to think of and consider.
Security is a broad subject and involves many legal and ethical issues. Of course,
there are a few approaches that can be applied in order to maintain database
security. However, before talking about the ways to protect our database, let us
first discuss in more detail the various threats to a database in the next subtopic.
ACTIVITY 8.1
Visit the following website that discusses the balance between the roles
and rights regarding database security:
http://databases.about.com/od/security/a/databaseroles.htm
Whether the threat is intentional or unintentional, the impact may be the same.
The threats may be caused by a situation or event that involves a person, action
or circumstance that is likely to produce harm to someone or to an organisation.
Threats to data security may be a direct and intentional threat to the database.
For instance, those who gain unauthorised access to a database like computer
hackers may steal or change the data in the database. They would have to have
special knowledge in order to do so. Table 8.1 illustrates five types of threats and
12 examples of threats (Connolly & Begg, 2009).
SELF-CHECK 8.1
1. Define a threat.
8.2.1 Authorisation
What does authorisation mean?
Usually, a user or subject can gain access to a system through individual user
accounts where each user is given a unique identifier which is used by the
operating system to determine that they have the authorisation to do so. The
process of creating the user accounts is usually the responsibility of a system
administrator. Each user account is given a unique password chosen by the user
and known to the operating system.
A separate but similar process would be applied to give the authorised user
access to a database management system (DBMS). This authorisation is the
responsibility of a database administrator. In this case, an authorised user to a
system may not necessarily have access to a DBMS or any associated application
program (Connolly & Begg, 2009).
Referring to Table 8.2, we can see that the personnel whose password is
"SUMMER" can only read the data, while the personnel with the password
"SPRING" can read, insert and modify the data. However, notice that the
authorisation table that holds the authorisation rules contains highly
sensitive data. It should be protected by stringent security rules. Usually, only
the one selected as the data administrator has the authority to access and
modify the table (Hoffer et al., 2008).
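An authorisation table like Table 8.2 can be modelled as a simple lookup. The passwords and privilege sets below echo the example in the text but are otherwise illustrative.

```python
# Toy authorisation rules keyed by password, in the spirit of Table 8.2.
AUTH_RULES = {
    "SUMMER": {"read"},
    "SPRING": {"read", "insert", "modify"},
}

def authorised(password, action):
    """Return True only if the password's privilege set includes the action."""
    return action in AUTH_RULES.get(password, set())
```

Unknown passwords map to the empty privilege set, so every action is denied by default.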
The DBMS keeps track of how these privileges are granted to users and possibly
revoked, and ensures that at all times, only users with necessary privileges can
access an object.
8.2.3 Views
What does view mean?
In other words, a view is created by querying one or more of the base tables,
producing a dynamic result table for the user at the time of the request (Hoffer
et al., 2008).
The user may be allowed to access the view but not the base tables which the
view is based. The view mechanism hides some parts of the database from
certain users and the user is not aware of the existence of any attributes or rows
that are missing from the view. Thus, a user is allowed to see what they need to
see only. Several users may share the same view but only restricted ones may be
given the authority to update the data.
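A view that hides sensitive columns can be sketched in SQLite. The Employee table, its columns and its rows are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Name TEXT, Salary REAL);
INSERT INTO Employee VALUES ('E1', 'Ali', 5000), ('E2', 'Siti', 6000);
-- Users granted access only to this view never see the Salary column.
CREATE VIEW EmployeePublic AS SELECT EmpNo, Name FROM Employee;
""")
cur = con.execute("SELECT * FROM EmployeePublic")
visible_columns = [d[0] for d in cur.description]
```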
Backup is very important for a DBMS to recover the database following a failure
or damage.
A DBMS should provide four basic facilities for backup and recovery of a
database. They are:
(ii) Cold backup: Database is shut down and appropriate for small
database; and
(iii) Hot backup: Only a selected portion of the database is shut down
from use and is more practical for large databases.
8.2.5 Encryption
What does encryption stand for?
Data encryption can be used to protect highly sensitive data like customer credit
card numbers or user passwords. Some DBMS products include encryption
routines that would automatically encode the sensitive data when they are stored
or transmitted over communication channels. For instance, encryption is usually
used in electronic funds transfer systems. So, if the original data or plain text is
RM5000, it may be encrypted using a special encryption algorithm that would be
changed to XTezzz. Any system that provides an encryption facility must also
provide the decryption facility to decode the data that has been encrypted. The
encrypted data is called cipher text.
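The plain-text/cipher-text round trip can be illustrated with a deliberately simple XOR scheme. This is a teaching toy only, not DES or any real cipher; production systems use vetted algorithms such as AES.

```python
# Toy symmetric cipher: XOR with a repeating key. The same function both
# encrypts and decrypts, mirroring the encrypt/decrypt pairing described
# in the text, but it offers no real security.
def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

cipher_text = xor_cipher(b"RM5000", b"secret")   # plain text -> cipher text
plain_text = xor_cipher(cipher_text, b"secret")  # cipher text -> plain text
```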
(a) One-Key
With the one-key approach, also known as data encryption standard (DES),
both the sender and the receiver need to know the key that is used to
scramble the transmitted or stored data.
(b) Two-Key
A two-key approach, also known as asymmetric encryption, employs a
private and a public key. This approach is popular in e-commerce
applications for transmission security and database storage of payment
data such as credit card numbers.
The five hardware components that should be fault-tolerant are (Connolly &
Begg, 2009):
SELF-CHECK 8.2
Thus, only users who key in the correct password can open the database.
However, once a database is open, all the objects in the database can be accessed.
Therefore, it is advisable to change the password regularly.
SELF-CHECK 8.3
(c) The receiver can be certain that the data came from the sender;
(d) The sender can be certain that the receiver is genuine; and
Another issue that needs to be considered in the web environment is that the
information being transmitted may have executable content. An executable
content can perform the following malicious actions (Connolly & Begg, 2009)
such as:
Nowadays, malware or malicious software like computer viruses and spam are
widely spread. What do computer viruses and spam mean?
Spam is unwanted e-mail that we receive without knowing who the sender is
or without wanting to receive it.
Their presence could fill up the inbox of e-mails and we would be just wasting
our time deleting them. Thus, the next subtopic will discuss some of the methods
on how to secure the database in a web environment (refer to Figure 8.3).
Figure 8.3: Four methods for securing database management system (DBMS)
If it cannot fulfil the requests itself, then it will pass the request to the web server.
Thus, its main purpose is to improve performance. For instance, assume
that User 1 and User 2 access the web through a proxy server. When User 1
requests a certain web page and later User 2 requests the same, the proxy server
would just fetch the page that has been residing in the cache page. Thus, the
retrieval process would be faster.
Besides that, proxy servers can also be used to filter requests. For instance, an
organisation might use a proxy server to prevent its employees or clients from
accessing certain websites. In this case, the known bad websites or insecure
websites could be identified and access to it could be denied (Connolly & Begg,
2009).
8.4.2 Firewalls
What does firewall mean?
A digital signature could be used to verify that the data comes from the
authorised sender.
(a) A string of bits that is computed from the data that is being signed using
signature algorithms; and
It also provides the receiver with the ways to decode a reply. A digital certificate
could be applied from a Certificate Authority. The Certificate Authority issues an
encrypted digital certificate that consists of the applicant's public key and
various other identification information. The receiver of an encrypted message
uses the Certificate Authority's public key to decode the digital certificate
attached to the message (Connolly & Begg, 2009).
ACTIVITY 8.2
Loss of confidentiality;
Loss of privacy;
Loss of availability.
The security measures associated with DBMSs on the web include proxy
servers, firewalls, digital signatures and digital certificates.
Authorisation Encryption
Authentication Firewalls
Cold backup Hot backup
Decryption Redundant array of independent disks (RAID)
Digital certificates Recovery
Digital signatures Threat
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Identify the ACID properties of transactions;
2. Discuss the concepts of concurrency transparency;
3. Describe the concepts of recovery management;
4. Explain the role of locks to prevent interference problems among
multiple users; and
5. Summarise the role of recovery tools to deal with database failures.
INTRODUCTION
The data in a database must always be consistent and reliable; ensuring this is
one of the essential functions of the database management system (DBMS). This topic
focuses on transaction management which consists of concurrency control and
recovery management, both of which are functions of DBMS.
In this topic, you will look at the properties of transactions and under
concurrency control, you will study its objectives, types of interference problems
and tools to prevent interference problems caused by multiple accesses. You will
also find out about the different types of failures that can occur in the database,
recovery tools used by the DBMS and also the recovery process.
For example, if we want to give all employees a pay rise of 15%, operations to
perform this action are shown in Figure 9.1.
Begin_Transaction
Read EmpNo, Salary
Salary = Salary * 1.15
Write EmpNo, Salary
Commit
This is a simple transaction to give all employees a pay rise of 15%. The
Begin_Transaction and Commit statements define the boundaries of the
transaction. Any other structured query language (SQL) statements between
them are part of the transaction. In Figure 9.1, the transaction consists of two
database operations, namely Read and Write. If a transaction completes
successfully (refer to
Figure 9.1), it will commit and the database will return to a new consistent state.
Besides the Begin Transaction and Commit statements, the Rollback statement
may be used. The Rollback statement will remove all the effects of a transaction if
it does not execute successfully. A Rollback statement can be used in several
contexts for example to cancel a transaction or to respond to errors. A sample
transaction with the Rollback statement is shown in Figure 9.2.
Begin_Transaction
Read ProductNo(X), QtyOnHand
QtyOnHand = QtyOnHand + 10
Write ProductNo(X), QtyOnHand
......
......
Rollback
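The commit/rollback behaviour in Figures 9.1 and 9.2 can be observed with Python's sqlite3 module. The table name and starting quantity are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Product (ProductNo TEXT PRIMARY KEY, QtyOnHand INTEGER)")
con.execute("INSERT INTO Product VALUES ('X', 40)")
con.commit()  # make the starting state permanent

# Update inside a transaction, then roll back: the change leaves no trace.
con.execute("UPDATE Product SET QtyOnHand = QtyOnHand + 10 WHERE ProductNo = 'X'")
con.rollback()
qty = con.execute("SELECT QtyOnHand FROM Product WHERE ProductNo = 'X'").fetchone()[0]
```

Had the transaction ended with `con.commit()` instead, the new quantity would have become permanent.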
Property Description
Atomicity Although a transaction is conceptually atomic, a transaction would
usually consist of a number of steps. It is necessary to make sure that
either all actions of a transaction are completed or the transaction has
no effect on the database. Therefore, a transaction is either completed
successfully or rolled back. This is sometimes called the all-or-nothing
property.
Consistency Although a database may become inconsistent during the execution of
a transaction, it is assumed that a completed transaction preserves the
consistency of the database. For example, if a person withdraws RM100
from an ATM, the person's account is balanced before withdrawal
(transaction). After withdrawal, the account must also be balanced.
Isolation No other transactions should view any partial results of the actions of a
transaction since intermediate states may violate consistency. Each
transaction must be executed as if it was the only transaction being
carried out.
Durability Once the transaction has completed successfully, its effects must persist
and a transaction must complete before its effects can be made
permanent. A committed transaction cannot be aborted.
As stated earlier, the DBMS provides two services to ensure that transactions
follow the ACID properties: recovery management and concurrency
management (Mannino, 2011). These two services are:
In Table 9.3, TransactionB updates BalanceX (BX) to RM600 but aborts the
transaction. This could be due to an error or maybe because it was updating
the wrong account. It aborts the transaction by issuing a Rollback which
causes the value in BX to revert back to its original value of RM500.
However, TransactionA incorrectly reads the value in BX as RM600 at
time T5. It then decrements it by RM100, giving BX an incorrect value of
RM500 and goes on to commit it. As you would have figured, the correct
value in BX is RM400.
(a) Locks
Locks prevent other users from accessing a database item in use. A
database item can be a column, row or even an entire table. A transaction
must acquire a lock before accessing a database item. There are two types of
locks:
The transaction need not request for all locks at the same time. The
transaction would usually acquire some locks, do some processing and
continue to get more locks as needed. However, the transaction would only
release all the locks when no new locks are needed (Connolly & Begg,
2009).
(i) Before reading or writing a data item, the transaction must acquire an
S (shared) or X (exclusive) lock on that data item; and
(ii) After releasing a lock, the transaction does not acquire any new locks.
Let us take a look at how 2PL is used to overcome the three interference
problems stated earlier in Subtopic 9.2.1.
TransactionA obtains the exclusive locks for both BX and BY. It then
proceeds to update BX and BY, releasing the locks only when the transaction
commits. Meanwhile, when TransactionB requests the shared lock for BX, it
must wait, as TransactionA holds the exclusive lock. When the shared lock
for BX is granted after TransactionA commits, TransactionB totals BX and
then requests the shared lock for BY, which is granted. TransactionB then
goes on to add BY to Sum, giving a correct total of RM1000; unlike
Table 9.4, which gave an incorrect value of RM900 when 2PL was not used.
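The spirit of exclusive locking can be sketched with a Python threading.Lock held across an entire read-modify-write. This is an analogy to the DBMS behaviour described above, not DBMS code; the balance and deposit amounts are illustrative.

```python
import threading

balance = {"BX": 500}
lock = threading.Lock()  # plays the role of an exclusive (X) lock

def deposit(amount):
    # Acquire before reading, release only after writing: no other
    # thread can interleave with the partial update, so no lost updates.
    with lock:
        current = balance["BX"]
        balance["BX"] = current + amount

threads = [threading.Thread(target=deposit, args=(100,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads could both read the same old balance and one deposit would be lost.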
(iv) Deadlocks
The use of locks to solve interference problems can lead to deadlocks.
A deadlock occurs when two transactions are each waiting for locks
held by the other to be released. It is the problem of
mutual waiting (Connolly & Begg, 2009; Mannino, 2011).
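One common way to prevent this mutual waiting is to make every transaction acquire its locks in the same global order, so a circular wait can never form. The sketch below simulates two concurrent transactions on BX and BY with Python threads; the balances follow the running example, and everything else is assumed for illustration.

```python
import threading

# One lock per database item; every transaction must acquire them
# in the same global order (BX before BY), which prevents the
# circular wait that causes deadlock.
lock_bx = threading.Lock()
lock_by = threading.Lock()
balances = {"BX": 500, "BY": 500}

def transfer(amount):
    # Acquire locks in the agreed order, then move money from BX to BY.
    with lock_bx:
        with lock_by:
            balances["BX"] -= amount
            balances["BY"] += amount

t1 = threading.Thread(target=transfer, args=(100,))
t2 = threading.Thread(target=transfer, args=(50,))
t1.start(); t2.start()
t1.join(); t2.join()
print(balances)  # {'BX': 350, 'BY': 650}
```

If one transaction instead locked BY first while the other locked BX first, each could end up holding one lock and waiting forever for the other: a deadlock.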
Locking granularity: Refers to the size of the database item locked.
Coarse granularity: Refers to large items such as the entire database or an entire table.
Finer locks: Refer to a row or a column.
(a) System crashes, affecting the main memory and the database buffers;
(b) Media failures such as disk crashes, affecting parts of secondary storage;
(c) Application software errors such as logical errors in the program accessing
the database;
(d) Natural physical disasters such as power failures, fires or earthquakes; and
(e) Carelessness or unintentional destruction of data.
(i) Transaction records, which contain:
Transaction identifiers;
Before-image of the data item, the value before the change (update and delete operations only);
After-image of the data item, the value after the change (insert and update operations only); and
(ii) Checkpoint records, which are described next. A section of the log file,
reproduced from Mannino (2011), is shown in Table 9.11.
Tid Time Operation Table Row Column Before Image After Image
T1 10:12 Start
T1 10:13 Update Acct 1000 AcctBal 100 200
T1 10:14 Update Acct 1514 AcctBal 500 400
T1 10:15 Commit
(b) Checkpoint
What does a checkpoint do? It is a point of synchronisation between the
database and the log file. Performing a checkpoint involves:
(i) Writing all log records in the main memory to secondary storage;
(ii) Writing the modified database buffers to disk; and
(iii) Writing a checkpoint record to the log file.
(i) To perform the read operation, the database does the following:
Finds the address of the disk block that contains the employee record;
Transfers the disk block into a database buffer in the main memory; and
Copies the salary data from the database buffer into a variable.
(ii) To perform the write operation, the database does the following:
Finds the address of the disk block that contains the employee record;
Transfers the disk block into a database buffer in the main memory;
Copies the salary data from a variable into the database buffer; and
Writes the database buffer back to secondary storage at some later point.
Database buffers are in the main memory where data is transferred to and
from secondary storage. Buffers are flushed to secondary storage when they
are full or when the transaction commits. It is only then that update
operations are considered permanent. If a failure occurs between writing
to the buffers and flushing the buffers to secondary storage, the recovery
manager must determine if the transaction has committed or not.
If the transaction has not committed, then the recovery manager has to undo
the effects of the transaction on the database to ensure atomicity (Connolly &
Begg, 2009).
SELF-CHECK 9.1
1. What are the causes of database failures?
(a) If the damage to the database is massive, then the last backup copy of the
database will be restored and the update operations of committed
transactions are reapplied using the log file; and
Database writes must occur after the corresponding writes to the log file. This
is known as the write-ahead log protocol.
If updates are made to the database first and a failure occurs before the log
records are written, the recovery manager would not be able to identify which
transactions need to be undone or redone.
The recovery manager examines the log file after a failure to determine if the
transaction needs to be undone or redone (Connolly & Begg, 2009):
(a) Any transaction with both the transaction start and the transaction commit
log records should be redone. The records will be redone using the after-image
log records for the transactions; and
(b) Any transaction with the transaction start but not the transaction commit
log record should be undone. The records will be undone using the before-image
log records.
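The redo/undo rules above can be sketched in Python. The log records below mirror the shape of Table 9.11 (transaction id, operation, item, before-image, after-image); the crash-time disk state and the uncommitted transaction T2 are assumptions for illustration.

```python
# Hypothetical log file at the moment of the crash:
# (tid, operation, item, before_image, after_image)
log = [
    ("T1", "start", None, None, None),
    ("T1", "update", "AcctBal@1000", 100, 200),
    ("T1", "commit", None, None, None),
    ("T2", "start", None, None, None),
    ("T2", "update", "AcctBal@1514", 500, 400),
    # T2 never wrote a commit record before the failure.
]

# Assumed state on disk after the crash: T1's update was lost from the
# buffers, while T2's uncommitted update was already flushed.
db = {"AcctBal@1000": 100, "AcctBal@1514": 400}

committed = {tid for tid, op, *_ in log if op == "commit"}
started = {tid for tid, op, *_ in log if op == "start"}

# Rule (a): redo committed transactions using their after-images.
for tid, op, item, before, after in log:
    if op == "update" and tid in committed:
        db[item] = after

# Rule (b): undo uncommitted transactions using their before-images,
# scanning the log backwards.
for tid, op, item, before, after in reversed(log):
    if op == "update" and tid in started - committed:
        db[item] = before

print(db)  # {'AcctBal@1000': 200, 'AcctBal@1514': 500}
```

After recovery, T1's committed update is restored and T2's uncommitted update has been rolled back, preserving atomicity and durability.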
To help you understand recovery from a system failure, Figure 9.5 shows a
number of transactions with the commit time, most recent checkpoint tc and
system failure tf.
A summary of recovery operations for the transaction timeline in Figure 9.5 for
deferred update is shown in Table 9.12 and a summary of the operations for
immediate update is shown in Table 9.13.
Table 9.12: Summary of Restart Work for the Deferred Update Technique
Table 9.13: Summary of Restart Work for the Immediate Update Technique
All transactions should follow four properties, known as the ACID
properties: atomicity, consistency, isolation and durability.
Locks and the two-phase locking protocol are used by most DBMSs to
prevent interference problems.
Locking granularity refers to the size of the database item locked. Coarse
granularity refers to large items such as the entire database or an entire table.
Finer locks refer to the row or a column.
The causes of failure are system crashes, media failures, application software
errors, natural physical disasters and carelessness or unintentional
destruction of data.
The DBMS is equipped with some tools to support the recovery process
which include the log file and checkpoint table.
There are two techniques available to the DBMS to help recover the database.
These two techniques are known as deferred update and immediate update.
In the deferred update technique, only the redo operator is used, whereas in
the immediate update technique, both the redo and undo operators are applied.
3. Explain the mechanism for concurrency control that can be used in a multi-
user environment.
4. Explain why and how the log file is an important feature in any recovery
mechanism.
Rob, P., & Coronel, C. (2011). Database systems: Design, implementation and
management (8th ed.). Stamford, CT: Cengage Learning.
INTRODUCTION
In this topic, we will discuss web technology and database management systems
(DBMS). As the use of the World Wide Web (the web) has increased, the
importance of databases has become evident. What do you think is the reason for
this? Have you noticed that e-commerce, business conducted over the Internet
and information retrieval are easily available right now? These are some of the
reasons for the importance of databases, as a lot of information needs to be stored
and retrieved. Almost every business and government agency has
stepped up to the challenge of adapting its operations to take advantage of the
global network called the Internet.
Many websites today are file-based, where each web document is stored in a
separate file. This approach is suitable for small websites. However, for
large websites, it may lead to problems. Thus, the aim of this topic is
to examine some of the current technologies for web-DBMS integration.
ACTIVITY 10.1
http://businessfinancemag.com/technology/8-factors-choosing-database
(a) Hierarchical
IBM introduced the first generation of database technology, known as
hierarchical database, in the mid-1960s. In a hierarchical database, records
are grouped in a logical hierarchy, connecting in a branching structure
similar to an organisational chart. An application retrieves data by first
finding the primary record and then, follows the pointers that are stored in
the record to other connected records.
For example, under a customer's name (parent) would be stored the child, a
description of the last purchase and its date. A child under that would be the
individual items purchased, the cost per item and a description of each item.
Another child under that would be the item's manufacturer's name. Hierarchical
databases only allow for one parent segment per child. In other words, they
only allow for one-to-many relationships.
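The parent-child navigation described above can be pictured with nested records. This is only a rough Python illustration (a real hierarchical DBMS stores these links as physical pointers); all names and values are hypothetical.

```python
# Hierarchical structure: each parent record holds references (here,
# nested lists) to its child records, giving one-to-many links.
customer = {
    "name": "Aminah",                    # parent segment
    "purchases": [                       # children of the customer
        {
            "date": "2023-01-10",
            "items": [                   # children of the purchase
                {"desc": "Pen", "cost": 2.50, "manufacturer": "Acme"},
            ],
        },
    ],
}

# Retrieval starts at the parent record and follows the pointers downwards.
first_item = customer["purchases"][0]["items"][0]
print(first_item["manufacturer"])  # Acme
```

Note that each child has exactly one parent, which is why the model can only express one-to-many relationships.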
(b) Network
The network database, introduced in the 1970s, allows complex data
structures to be built but is inflexible and requires careful design. It is very
fast and efficient in storage, and is used in applications such as airline
booking systems. It allows for many-to-many relationships.
(c) Relational
A relational database allows the definition of data structures, storage and
retrieval operations as well as integrity constraints. The data and relations
between them are organised in tables. A table is a collection of records and
each record in a table contains the same fields, as mentioned in earlier topics.
Certain fields may be designated as keys, which means that searches for
specific values of that field will use indexing to speed them up. Where
fields in two different tables take values from the same set, a join operation
can be performed to select related records in the two tables by matching
values in those fields. Often but not always, the fields will have the same
name in both tables. Table 10.1 shows the advantages and disadvantages of
relational databases.
Advantages:
There are many popular types of DBMS in use and as a result, technical
development effort ensures that advances like object-orientation and web
serving appear quickly and reliably.
There are many third party tools such as report writers that are tuned to
work with the popular relational DBMS via standards such as Open Database
Connectivity (ODBC).
They offer distributed databases and distributed processing options which
might be advantageous for some large organisations.

Disadvantages:
Some restrictions on field lengths, which can lead to occasional practical
problems.
Structured Query Language (SQL) does not provide an efficient way to browse
alphabetically through an index.
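The join operation described above, which matches values in common fields of two tables, can be tried out directly in SQLite. The table and column names here are assumptions for illustration; the records are matched on the shared cust_id field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, item TEXT)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Aminah"), (2, "Chong")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, "Pen"), (11, 2, "Book"), (12, 1, "Ink")])

# The join selects related records by matching cust_id in both tables.
rows = conn.execute(
    "SELECT c.name, o.item FROM customer c "
    "JOIN orders o ON c.cust_id = o.cust_id ORDER BY o.order_id"
).fetchall()
print(rows)  # [('Aminah', 'Pen'), ('Chong', 'Book'), ('Aminah', 'Ink')]
```

Here cust_id is the key in customer and the matching field in orders, so the join pairs each order with its owner without duplicating customer data in the orders table.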
(d) Object-Oriented
For a database, object-oriented means the capability of storing and
retrieving objects in addition to mere data. It adds database functionality to
object programming languages, so that applications require less code, use
more natural data modelling and have code bases that are easier to maintain.
SELF-CHECK 10.1
The web provides a simple one-stop centre that allows users to explore the large volume
of pages of information residing on the Internet. The information is presented on
web pages that consist of a collection of text, graphics, pictures, sound, video and
hyperlinks to other web pages. The hyperlinks allow users to navigate to other
web pages in a non-sequential approach (Connolly & Begg, 2009).
The web consists of a network of computers that play two roles: as
servers that provide information and as clients that request information.
Examples of web servers are Apache HTTP Server and Microsoft Internet
Information Server (IIS). Examples of clients or web browsers are Microsoft
Internet Explorer, and Mozilla (Connolly & Begg, 2009). There are three basic
components of a web environment, as shown in Table 10.2.
A web page can be either static or dynamic. A static web page is one where the
content of the document does not change unless the file itself is changed. On the
other hand, the content of a dynamic web page is generated each time it is
accessed.
As a database is dynamic and changes as users create, insert, update and delete
data, then using dynamic web pages would be much more suitable than static
web pages. Dynamic web pages need hypertext that can be generated by servers.
To achieve this, scripts can be written that perform the conversion from different
data formats into HTML (Connolly & Begg, 2009).
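Such a conversion script can be sketched in a few lines. This is a minimal illustration, not a production approach: the product table, its contents and the render function are all assumptions, and a real site would use a web framework and proper HTML escaping.

```python
import sqlite3

# A small database backing the dynamic page.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("Pen", 2.50), ("Book", 12.00)])

def render_page():
    # Generated on every access, so the page always reflects the
    # current contents of the database.
    rows = conn.execute(
        "SELECT name, price FROM product ORDER BY name"
    ).fetchall()
    items = "".join(f"<li>{name}: RM{price:.2f}</li>" for name, price in rows)
    return f"<html><body><ul>{items}</ul></body></html>"

print(render_page())
```

If a row is inserted or updated, the next call to render_page() reflects the change immediately, which is exactly what distinguishes a dynamic page from a static file.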
(b) Data- and vendor-independent connectivity to allow freedom of choice
in the selection of a DBMS now and in the future;
(c) The ability to interface to the database independent of any proprietary web
browser or web server;
(f) A cost-effective solution that allows for scalability, growth and changes in
strategic directions and helps reduce the costs of developing and
maintaining applications;
How do we integrate the web and DBMSs? The approaches are as follows:
(d) Extension to the web server like Microsoft's Internet Information Server
API (ISAPI);
(e) Java, J2EE, JDBC, SQLJ, JDO, Servlets and JavaServer Pages (JSP);
(f) Microsoft's web solution platform: .NET, Active Server Pages (ASP) and
ActiveX Data Objects (ADO); and
You might wonder why we need web-DBMS integration. This is because of these
eight advantages (Table 10.3).
Simplicity: HTML as a markup language is easy for both developers and
end-users to learn, since it does not have overly complex functionality.
Platform independence: Most web browsers are platform-independent, thus
applications do not need to be modified to run on different operating
systems or windows-based environments.
Graphical user interface (GUI): Web browsers provide a common, easy-to-use
GUI that can be used to access databases. With a common interface, training
costs for the end-users can be reduced.
Standardisation: An HTML document on one machine can be read by users on any
machine in the world with an Internet connection and a web browser.
Cross-platform support: Web browsers are available for almost every type of
computer platform and this allows users on most types of computers to access
a database from anywhere in the world. Thus, information can be accessed
with minimum time and effort.
Transparent network access: The web's built-in support for networking
simplifies database access, without requiring users to purchase separate,
expensive networking software.
Scalable deployment: By storing the application on a separate server, the
web eliminates the time and cost associated with application deployment.
Thus, it simplifies the handling of data maintenance and the management of
multiple platforms across different offices.
Innovation: The web allows organisations to provide new services and connect
to new customers through globally accessible applications.
What about the disadvantages of the web-DBMS approach? They are discussed
in Table 10.4.
Reliability: Difficulties in sending and receiving data may arise when access
to information on a server occurs at peak times, due to overloading from
user access.
Security: Once you have sent or received data, there is no 100% guarantee
that the data is secure. User authentication and secure data transmission
are critical due to the large number of anonymous users.
Limited functionality of HTML: Even though HTML provides an easy-to-use
interface, some highly interactive database applications may not be
converted easily to web-based applications. These extra functionalities may
be added but then, they may be too complex for some users. There may also be
some performance overhead in downloading and executing these codes.
ACTIVITY 10.2
SELF-CHECK 10.2
A few approaches for integrating databases into the web environment are
scripting languages, CGI, HTTP cookies and Oracle's Internet platform.
1. Using any web browser, take a look at government, private and educational
institution websites and write a report on the GUI and security of the
websites.
OR