CDDB3103
Introduction to Database
INTRODUCTION
CDDB3103 Introduction to Database is one of the courses offered by the Faculty
of Information Technology and Multimedia Communication at Open University
Malaysia (OUM). This course is worth three credit hours and should be covered
over 8 to 15 weeks.
COURSE AUDIENCE
This course is targeted at all IT students specialising in Information Systems.
Students enrolled in other IT-related specialisations will also find this course
useful, as it answers many of their questions regarding database system
development.
STUDY SCHEDULE
It is a standard OUM practice that learners accumulate 40 study hours for every
credit hour. As such, for a three-credit hour course, you are expected to spend
120 study hours. Table 1 gives an estimation of how the 120 study hours could be
accumulated.
Study Activities        Study Hours
Online participation    12
Revision                15
COURSE OUTCOMES
By the end of this course, you should be able to:
1. Explain the concepts and technology of database, basic database
terminologies, architectural aspects, data models and database software;
2. Describe the technique and methodology that support the process of database
designing;
3. Apply the skill of using the standard methodology to design a relational
database; and
4. Describe the meaning and application of SQL and its importance.
COURSE SYNOPSIS
This course is divided into 8 topics. The synopsis for each topic is presented
below:
Topic 2 introduces the concepts behind the relational model, the most popular
data model at present, and the model most often chosen for standard business
applications.
Topic 4 covers the main data definition facilities of the SQL standard. Again, the
topic is presented as a worked tutorial. The topic introduces the SQL data types
and data definition statements, the Integrity Enhancement Feature (IEF) and
more advanced features of the data definition statements, including the access
control statements GRANT and REVOKE. It also examines views and how they
can be created in SQL.
Topic 8 considers database security, not just in the context of DBMS security but
also in the context of security of the DBMS environment. The topic also examines
the security problems that can arise in the Web environment and presents some
approaches to overcome them.
Learning Outcomes: This section refers to what you should achieve after you
have completely covered a topic. As you go through each topic, you should
frequently refer to these learning outcomes. By doing this, you can continuously
gauge your understanding of the topic.
Summary: You will find this component at the end of each topic. This component
helps you to recap the whole topic. By going through the summary, you should
be able to gauge your knowledge retention level. Should you find points in the
summary that you do not fully understand, it would be a good idea for you to
revisit the details in the module.
Key Terms: This component can be found at the end of each topic. You should go
through this component to remind yourself of important terms or jargon used
throughout the module. Should you find terms here that you are not able to
explain, you should look for the terms in the module.
References: This component lists sources of additional reading, which may
appear within the text (or in a separate section), at the end of every topic or at
the back of the module. You are encouraged to read or refer to the suggested
sources to obtain the additional information needed and to enhance your overall
understanding of the course.
PRIOR KNOWLEDGE
Knowledge of the Windows operating system and Microsoft Access application
is required for this course.
ASSESSMENT METHOD
Please refer to myINSPIRE.
REFERENCES
Connolly, T. M., & Begg, C. E. (2009). Database systems: A practical approach to
design, implementation and management (5th ed.). Boston: Addison-Wesley.
Hoffer, J. A., Prescott, M. B., & Topi, H. (2008). Modern database management
(9th ed.). New Jersey: Prentice-Hall.
Mannino, M. V. (2011). Database design, application development and
administration (5th ed.). Ediyu.
Post, G. V. (2004). Database management systems: Designing and building
business applications (3rd ed.). New York: McGraw-Hill.
Pratt, P. J., & Last, M. Z. (2008). A guide to SQL (8th ed.). Cengage Learning.
Rob, P., & Coronel, C. (2001). Database systems: Design, implementation and
management (8th ed.). Cengage Learning.
Shelly, G. B. (2011). Discovering computers. Cengage Learning.
INTRODUCTION
Hi there everyone! Have you heard of the word "database" or "database
system"? If you have, then you will have a better understanding of these words
by taking this course. However, if you have not heard of them, then do not
worry. By taking this course, you will be guided towards understanding and
being able to apply these concepts to real-world problems.
This topic introduces the area of database management systems, examining the
problems with the traditional file-based systems and discussing what Database
Management System (DBMS) can offer. In the first subtopic, we will discuss
some uses of database systems in everyday life. Then, in the next subtopic, we
will compare the file-based system with database systems. Next, we will discuss
the roles that people perform in the database environment and lastly, we will
discuss the advantages and disadvantages of database management systems.
For your information, all these activities are made possible with the existence of
DBMS. This means that our life is affected by database technology. Computerised
databases are important to the functioning of modern organisations. Well, before
we proceed further, let us take a look at some important definitions.
The number of database applications has increased tremendously over the past two
decades (Jeffrey et al., 2011). The use of databases to support customer relationship
management, online shopping and employee relationship management is growing.
However, before we discuss this topic any further, let us examine some applications
of database systems that you have probably used in your daily life without realising
that you were accessing a database system.
So, do you now realise that you are a user of database systems? Database
technology does not only improve the daily operations of organisations but also
the quality of decisions made. For instance, with the database systems, a
supermarket can keep track of its inventory and sales very quickly. This may
lead to a fast decision in terms of making new orders of products. In this case,
products will always be available for customers. Thus, the business may grow as
customer satisfaction is always met. In other words, there are great advantages to
collecting, managing and interpreting information effectively in today's world.
Today, data can be represented in various forms like sound, images and
video. For instance, you can record your speech into a computer using the
computer's microphone. Images taken using a digital camera or scanned
using a scanner can also be transferred into a computer. Actually, there are
so many different types of data around us. Can you name some other data
that you might have used or produced?
Now, the next thing that we will discuss is how we can make our data
meaningful and useful. This can be done by processing it.
Information refers to the data that have been processed in such a way
that the knowledge of the person who uses the data is increased.
Jeffrey et al. (2011)
For instance, the speech that you have recorded and images that you have
stored in a computer, could be used as part of your presentation by using
any presentation software. The speech may include definitions of some
terms that are mentioned in your presentation slides. Thus, by including the
recorded speech in your presentation, it gains more meaning and usefulness.
The images could also be sent to your friends through e-mail for them to view.
What this means is that you have transformed the data that you have stored
into information, once you have done something with it. In other words,
computers process data into information.
The next section will discuss the traditional file-based system, to examine its
limitations, and also to understand why database systems are needed.
SELF-CHECK 1.1
Traditionally, manual files were used to store all internal and external data within
an organisation. These files were stored in cabinets and, for security purposes, the
cabinets were locked or located in a secure area. When any information was needed,
one had to search from the first page until the required information was located. To
speed up the searching process, you may create an indexing system to help you
locate information more quickly. You may have such a system that stores all your
results or important documents.
The manual filing system works well if the number of items stored is not large.
However, this kind of system may fail if you want to cross-reference or process
any of the information in the files. Later, computer-based data processing
emerged, and the traditional filing system was replaced with computer-based
data processing systems, or file-based systems. However, instead of having a
centralised store for the organisation's operational data, a decentralised
approach was taken. In this approach, each department has its own
file-based system, which it monitors and controls separately.
By looking at Figure 1.1, we can see that the sales executive can store and retrieve
information from the sales files through sales application programmes. The sales
files may consist of information regarding properties, owners and clients. Figure
1.2 illustrates examples of the contents of these three files. Figure 1.3 shows the
contents of the Contract files while Figure 1.4 is for the Personnel files. Notice
that the Client files in the sales and contract departments are the same. What this
means is that duplication occurs when using decentralised file-based systems.
Figure 1.2: Property, Owner and Client files used by the sales department
Figure 1.3: Lease, Property and Client files used by the contract department
The Personnel file in Figure 1.4 consists of two records, and each record
consists of nine fields. Can you list the number of records and fields in the Client
file as shown in Figure 1.3?
Now, let us discuss the limitations of the file-based system. No doubt, file-based
systems proved to be a great improvement over manual filing systems. However,
a few problems still occur with this system, especially if the volume of the data
and information increases.
Well, you can create a temporary file of those clients who have a "house" as
the preferred type and search for the available houses from the property
file. Then, you may create another temporary file of those clients who have
an "apartment" as the preferred type and do another search. The search
would be more complex if you have to access more than two files, and from
different departments. In other words, the separation and isolation of data
makes the retrieval process time consuming.
SELF-CHECK 1.2
1. What is a file-based system?
2. Explain two limitations of file-based systems.
The database approach promotes the sharing of data throughout the organisation,
which means that all departments should be able to integrate and share the same
data. The three advantages of the database approach are explained next (refer to
Figure 1.6).
Meanwhile, two disadvantages of the database approach are shown in Figure 1.7.
ACTIVITY 1.1
Search the Internet to find out more details on the two disadvantages
listed in Figure 1.7. Discuss your findings with your coursemates in
the forum.
Thus, we can change the internal definition of an object in the database without
affecting the users of the object, provided that the external definition remains the
same. For instance, if we were to add a new field to a record or create a new file,
existing applications would not be affected. More examples of this will be shown
in the next topic.
Some other terms that you need to understand are shown in Table 1.1.
Term          Definition
Entity        A specific object (for example, a department, place or event) in the organisation that is to be represented in the database
Attribute     A property that explains some characteristic of the object that we wish to record
Relationship  An association between entities
By referring to Figure 1.8, we can see that the ER Diagram consists of two entities
(rectangles), namely Department and Staff. It has one relationship, where it indicates
that a department has staff. For each entity, there is one attribute, that is,
DepartmentNo and StaffNo. In other words, the database holds data that is logically
related. More explanations on this will be discussed in later topics.
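The Department and Staff example above can be sketched as two related tables. Below is a minimal sketch using Python's built-in sqlite3 module; the entity and attribute names follow Figure 1.8, while the sample values ('D01', 'S100') are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each entity becomes a table; the "has" relationship becomes a foreign key.
conn.execute("CREATE TABLE Department (DepartmentNo TEXT NOT NULL PRIMARY KEY)")
conn.execute("""
    CREATE TABLE Staff (
        StaffNo      TEXT NOT NULL PRIMARY KEY,
        DepartmentNo TEXT REFERENCES Department(DepartmentNo)
    )""")

conn.execute("INSERT INTO Department VALUES ('D01')")
conn.execute("INSERT INTO Staff VALUES ('S100', 'D01')")

# Staff S100 is logically related to department D01 through DepartmentNo.
row = conn.execute("""
    SELECT s.StaffNo, d.DepartmentNo
    FROM Staff s JOIN Department d ON s.DepartmentNo = d.DepartmentNo
""").fetchone()
```

The relationship lives in the shared DepartmentNo column: the database can reconnect the two independent tables whenever needed.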
SELF-CHECK 1.3
1. What is database abstraction?
2. Define entity, attribute and relationships.
Now, let us discuss in detail five common features of DBMS (see Figure 1.9).
Data entry forms provide a convenient way to enter and edit data. Report
forms provide easy-to-view results of a query (Mannino, 2011).
SELF-CHECK 1.4
Who are the people involved in the database environment? Briefly
explain their responsibilities or roles.
The predecessor to the DBMS was the file-based system, where each
programme defines and manages its own data. Thus, data redundancy and
data dependence became major problems.
The database approach was introduced to resolve the problems with file-
based system. All access to the database can be made through DBMS.
There are four types of people involved in the DBMS environment: data and
database administrators, database designers, application designers, and end
users.
3. List two examples of database systems other than those discussed in this
topic.
4. Discuss the main components of the DBMS environment and how they are
related to each other.
6. Study the University Student Affairs case study presented below. In what
ways would a DBMS help this organisation? What data can you identify
that needs to be represented in the database? What relationships exist
between the data items?
Data requirements:
(a) Student
(i) Student identification number
(ii) First and last name
(iii) Home address
(iv) Date of birth
(v) Sex
(vi) Semester of study
(vii) Nationality
(viii) Programme of study
(ix) Recent Cumulative Grade Point Average (CGPA)
INTRODUCTION
Topic 1 was a starting point for your study of database technology. You learned about
the database characteristics and the Database Management System (DBMS) features.
In this topic, you will focus on the relational data model, but before that, let us have a
brief introduction to the model. The relational model was developed by E. F. Codd in
1970. The simplicity and familiarity of the model made it hugely popular, compared to
the other data models that existed during that time. Since then, relational DBMSs have
dominated the market for business DBMS (Mannino, 2011).
In this topic we will explore the relational data model. You will discover that the
strength of this data model lies in its simple logical structure, whereby these
relations are treated as independent elements. You will then see how these
independent elements can be related to one another.
In order to ensure that the data in the database is accurate and meaningful,
integrity rules are explained. We describe to you two important integrity rules,
entity integrity and referential integrity. Finally, we will end the topic with the
concept of views and its purpose.
2.1 TERMINOLOGY
First of all, let us start with the definitions of some of the pertinent terminology.
The relational data model was developed because of its simplicity and its familiar
terminology. The model is based on the concept of a relation, which is physically
represented as a table (Connolly and Begg, 2009). This section presents the basic
terminology and structural concepts of the relational model.
(a) Relation
The relation must have a name that is distinct from other relation names in
the same database. Table 2.2 shows a listing of the two-dimensional table
named Employee, consisting of seven columns and six rows. The heading
part consists of the table name and the column names. The body shows the
rows of the table.
(b) Attribute
In the Employee table (see Table 2.2), the columns for attributes are EmpNo
(Employee Number), Name, MobileTelNo (Mobile Telephone Number),
Position, Gender, DOB (date of birth) and Salary.
You must take note that every column-row intersection contains a single
atomic data value. For example, the EmpNo column contains only the
number of a single existing employee.
Data types indicate the kind of data for the column (character, numeric, yes
or no, etc) and permissible operations (numeric operations, string
operations) for the column. Table 2.3 lists the five common data types.
Date     Used to store calendar dates with year, month and day fields. For dates, the allowable operations include comparing two dates and generating a date by adding or subtracting a number of days from a given date.
Logical  For attributes containing data with two values, such as true or false, or yes or no.
The Name and Position attributes are of variable length. These columns
contain only the actual number of characters not the maximum length. As
you can see from the Employee relation, the number of characters in the
Name attribute column varies from 9 to 13, while the number of characters
in the Position attribute column varies from 5 to 13. Finally, the DOB
attribute column consists of 10 characters in the format DD/MM/YYYY.
The domain concept is important because it allows the user to define the
meaning and source of values that the attributes can hold.
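The attributes and data types discussed above can be sketched as a table definition. Here is a minimal sketch using Python's built-in sqlite3 module: the column names follow Table 2.2 and the declared types loosely mirror the common data types in Table 2.3 (note that SQLite accepts these declarations but does not strictly enforce lengths); the sample row is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EmpNo       CHAR(5) NOT NULL PRIMARY KEY,
        Name        VARCHAR(30),   -- variable length: stores only the actual characters
        MobileTelNo CHAR(10),
        Position    VARCHAR(30),
        Gender      CHAR(1),
        DOB         DATE,
        Salary      DECIMAL(8,2)
    )""")
conn.execute(
    "INSERT INTO Employee VALUES "
    "('E1001', 'Aminah', '0131234567', 'Clerk', 'F', '12/04/1990', 1500.00)")

# The catalogue records each attribute's name and declared data type.
columns = [c[1] for c in conn.execute("PRAGMA table_info(Employee)")]
```

Each declared type acts as a simple domain, describing the kind of values the attribute may hold.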
(c) Tuple
SELF-CHECK 2.1
1. What is a relation?
2. What do a column, a row and an intersection represent?
(a) Superkey
A relation can have several candidate keys. When a key consists of more
than one attribute, it is known as a composite key. Therefore, (EmpNo, Name)
is a composite key.
However, we cannot discount the possibility that someone who shares the
same name as listed above will become an employee in the future. This may
make the Name attribute an unwise choice as a candidate key because of
duplicates. However, the attributes EmpNo and MobileTelNo are suitable
candidate keys, as an employee's identification number in any organisation
is unique. MobileTelNo can be picked as a candidate key because we
know that no duplicate hand phone numbers exist, thereby making it
unique.
You may note that a primary key is a superkey as well. In our Employee
table, EmpNo can be chosen as the primary key; MobileTelNo then
becomes an alternate key.
The addition of SuppNo in both the Supplier and Product tables links each
supplier to the details of the products it supplies. In the Supplier relation,
SuppNo is the primary key. In the Product relation, the SuppNo attribute exists
to match each product to its supplier; there, SuppNo is a foreign key. Notice
that every data value of SuppNo in Product matches a SuppNo in Supplier.
The reverse need not necessarily be true.
SELF-CHECK 2.2
The standard way of representing a relation schema is to give the name of the relation
followed by the attribute names in parentheses. The primary key is underlined. An
instance of this relational database schema is shown in Figure 2.3.
2.2.1 Nulls
Nulls are not the same as zeros or spaces, as a null represents the absence of a
value (Connolly & Begg, 2009). For example, in the Invoice relation of the
Order Entry Database, the DatePaid attribute in the second row is null until the
customer pays for the order.
This guarantees that the primary key is unique and ensures that foreign keys can
accurately reference primary key values. In the Employee table, EmpNo is the
primary key. We cannot insert new employee details into the table with a null
EmpNo. The OrderDetail table has the composite primary key (OrderNo,
ProductNo), so to insert a new row, both values must be known.
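Entity integrity can be demonstrated directly. A minimal sketch using Python's built-in sqlite3 module (the EmpNo and Name values are invented; NOT NULL is spelled out because SQLite, unlike the SQL standard, would otherwise permit nulls in a non-integer primary key):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity integrity: no attribute of a primary key may be null.
conn.execute("CREATE TABLE Employee (EmpNo TEXT NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Employee VALUES ('E1001', 'Aminah')")  # accepted

rejected = False
try:
    conn.execute("INSERT INTO Employee VALUES (NULL, 'Unknown')")  # null primary key
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refuses the row
```

The second insert never reaches the table: the DBMS itself, not the application, enforces the rule.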
For example, in the Order Entry Database, the Product table has the foreign key
SuppNo. You will notice that every entry of SuppNo in the rows of the Product
table matches the SuppNo of the referenced table Supplier. However, we can
create a new product record with a null SuppNo, if currently no suppliers have
been identified to supply the product.
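Referential integrity can be demonstrated in the same way. Below is a sketch using Python's built-in sqlite3 module, with the Supplier and Product tables from the Order Entry Database (the product numbers are invented; note SQLite only enforces foreign keys after the PRAGMA is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE Supplier (SuppNo TEXT NOT NULL PRIMARY KEY)")
conn.execute("""
    CREATE TABLE Product (
        ProductNo TEXT NOT NULL PRIMARY KEY,
        SuppNo    TEXT REFERENCES Supplier(SuppNo)
    )""")
conn.execute("INSERT INTO Supplier VALUES ('S8843')")

conn.execute("INSERT INTO Product VALUES ('P1', 'S8843')")  # matches a supplier: OK
conn.execute("INSERT INTO Product VALUES ('P2', NULL)")     # no supplier yet: also OK

rejected = False
try:
    conn.execute("INSERT INTO Product VALUES ('P3', 'S9999')")  # unknown supplier
except sqlite3.IntegrityError:
    rejected = True
```

A foreign key value must either match an existing primary key value or be null; any other value is rejected.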
SELF-CHECK 2.3
1. What is a null?
2. Can a primary key value have a null value?
3. What is the value for a foreign key?
2.3 VIEWS
A view is a virtual or derived relation that may be derived from one or more
base relations.
Views are dynamic: changes made to the base relations are automatically
reflected in the views (Connolly & Begg, 2009).
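The dynamic nature of views can be seen in a short experiment. A sketch using Python's built-in sqlite3 module (the view name Clerks and the sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT NOT NULL PRIMARY KEY, Position TEXT, Salary REAL)")
conn.execute("INSERT INTO Employee VALUES ('E1', 'Clerk', 1200)")

# A view is a derived relation: it stores no rows of its own.
conn.execute(
    "CREATE VIEW Clerks AS SELECT EmpNo, Salary FROM Employee WHERE Position = 'Clerk'")
before = conn.execute("SELECT COUNT(*) FROM Clerks").fetchone()[0]

# A change to the base relation shows up in the view immediately.
conn.execute("INSERT INTO Employee VALUES ('E2', 'Clerk', 1100)")
after = conn.execute("SELECT COUNT(*) FROM Clerks").fetchone()[0]
```

The view also hides the Position column from its users, illustrating how views can restrict what part of the database a user sees.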
SELF-CHECK 2.4
1. What is a view?
2. What can you do with a view?
The relational data model was developed because of its simplicity and
familiar terminology. The model is based on the concept of a relation, which
is physically represented as a table (Connolly & Begg, 2009).
A null represents the absence of a value. The primary key value cannot be
null. A foreign key value must match the primary key value in the related
table or it can be null.
A view is a virtual or derived relation that may be derived from one or more
base relations. Views allow users to customise the data according to their
needs. Views also hide part of the database from certain users, providing
security to the database.
2. What is the difference between a primary key and a candidate key? Give an
example.
Resort consists of resort details, and resortNo is the primary key. Room
contains room details for each resort, and (roomNo, resortNo) forms the primary
key. Booking contains details of bookings, and (resortNo, guestNo, dateFrom)
forms the primary key. Guest contains guest details, and guestNo is the
primary key.
(i) Identify the foreign keys in this schema. Explain how the entity and
referential integrity rules apply to these relations; and
(ii) Produce four sample tables for these relations that observe the relational
integrity rules.
INTRODUCTION
In this topic, you will learn the basic features and functions of Structured Query
Language (SQL). SQL is simple and relatively easy to learn. It is the standard
language for the relational database model for data administration (creating tables,
indexes and views, controlling access) and data manipulation (adding, modifying,
deleting and retrieving data). In this topic, the focus is on data manipulation.
In this subtopic, we provide an outline description of what SQL is, give the
background and history of SQL, and discuss the importance of SQL to the
database application.
In this topic, we focus only on the DML commands while the DDL commands
will be covered in Topic 4.
Year         Description
1970         The relational model, from which SQL draws much of its conceptual core, was formally defined by Dr E. F. Codd, a researcher at IBM.
1974         The System/R project began and Structured English Query Language (SEQUEL) was developed.
1974 – 1975  System/R was implemented on an IBM prototype called SEQUEL-XRM.
1976 – 1977  System/R was completely rewritten to include multi-table and multi-user features. When the system was revised, it was briefly called "SEQUEL/2" and then renamed "SQL" for legal reasons.
1983         IBM began to develop commercial products that implemented SQL based on their System/R prototype, including DB2.
Several other software vendors accepted the rise of the relational model and
announced SQL-based products. These included Oracle, Sybase and Ingres
(based on the University of California's Berkeley Ingres project).
Importance                                 Description
Standard language for relational database  It has been globally accepted.
A powerful data management tool            Almost all major database vendors support SQL.
Easy to learn                              SQL is a non-procedural language. You just need to know what is to be done; you do not need to know how it is to be done.
SELF-CHECK 3.1
1. Briefly explain SQL.
2. Explain one reason behind the importance of SQL.
(b) The SQL syntax is not case sensitive. Thus, words can be typed in either
small or capital letters.
(d) The SQL notation used throughout this book follows the Backus Naur Form
(BNF) which is described as follows:
(i) Uppercase letters are used to represent reserved words;
(ii) Lower-case letters are used to represent user-defined words;
(iii) A vertical bar ( | ) indicates a choice among alternatives;
(iv) Curly braces ({}) indicate a required element; and
(v) Square brackets ( [ ] ) indicate an optional element.
SELF-CHECK 3.2
1. What does 'case-sensitive' mean?
2. What is BNF and how is it used in SQL?
Command Details
SELECT Extracts data from a database table
UPDATE Updates data in a database table
DELETE Deletes data from a database table
INSERT INTO Inserts new data into a database table
As mentioned earlier, SQL statements are not case sensitive. In other words,
SELECT is the same as select. In our discussion and illustration of SQL
commands, we will use tables from the previous topic; for example, Table 2.5
from Topic 2 is renamed as Table 3.4 in this topic.
Syntax
SELECT [DISTINCT | ALL] {* | column_expression [AS new_name] [, ...]}
FROM table_name [alias] [, ...]
[WHERE condition]
[GROUP BY column_list] [HAVING condition]
[ORDER BY column_list]
The meanings of clauses used in the SELECT statement are listed in Table 3.5.
Clause Meaning
SELECT Specifies the columns or/and expressions that should be in the output
FROM Indicates the table(s) from which data will be obtained
WHERE Specifies the rows to be used. If not included, all table rows are used
GROUP BY Indicates categorisation of results
HAVING Indicates the conditions under which a category (group) will be
included
ORDER BY Sorts the results according to specified criteria
The order of these clauses cannot be changed. The SELECT and FROM clauses
must be used in the SELECT statement, while the other clauses are optional. The
results from this statement are in the form of a table. Next, you are going to learn
variations of the SELECT statement.
This query requires us to select all columns and all rows from the Employee
table.
For queries that require listing all columns, the SELECT clause can be shortened
by using asterisks (*). Therefore, you may write the query above as:
SELECT *
FROM Employee;
This query requires selecting all rows but only specific columns from the
Employee table.
This query can be written as follows and the results are as shown in Table 3.8:
SELECT Position
FROM Employee;
Position
Administrator
Salesperson
Manager
Assistant Manager
Clerk
Clerk
The result above contains duplicates, in which the word Clerk is written
twice. What if we only want to select each distinct position? This
is easy to accomplish in SQL. All we need to do is to use the DISTINCT
keyword after SELECT.
With the statement above, the duplicate is eliminated and we get the results
table as shown in Table 3.9.
Position
Administrator
Salesperson
Manager
Assistant Manager
Clerk
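The effect of DISTINCT can be verified by running both queries side by side. A sketch using Python's built-in sqlite3 module, with rows that reproduce the positions in Table 3.8 (the EmpNo values are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT NOT NULL PRIMARY KEY, Position TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [
    ('E1', 'Administrator'), ('E2', 'Salesperson'), ('E3', 'Manager'),
    ('E4', 'Assistant Manager'), ('E5', 'Clerk'), ('E6', 'Clerk'),
])

# Without DISTINCT: one row per employee, so Clerk appears twice.
plain = [r[0] for r in conn.execute("SELECT Position FROM Employee")]

# With DISTINCT: duplicate positions are eliminated.
distinct = [r[0] for r in conn.execute("SELECT DISTINCT Position FROM Employee")]
```

Six rows come back without DISTINCT, five with it, matching Tables 3.8 and 3.9.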
There are five basic search conditions that can be used in a query, as shown
in Table 3.10.
Table 3.10: Five Search Conditions in a Query
Search Description
Condition
Comparison Compares the value of an expression to the value of another
expression
Range Tests whether the value of an expression falls within a specified
range of values
Set membership Tests whether a value matches any value in a set of values
This statement filters all rows based on the condition where the salary is greater
than 1000. The results returned by this statement are shown in Table 3.11.
Table 3.11: Results Table for Example 4
Figure 3.1 shows a list of comparison operators that can be used in the
WHERE clause. In addition, more complex conditions can be generated
using the logical operators AND, OR and NOT.
Operator Description
= Equal
<> or != Not equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
This statement uses the logical operator OR in the WHERE clause to find
employees with the position of Clerk or Salesperson. Table 3.12 shows the
results returned from executing this statement.
So, in this example, the condition in the WHERE clause can also be written
as:
The results from executing both statements are shown in Table 3.13.
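The equivalence of the OR condition and its alternatives can be checked by executing both forms. A sketch using Python's built-in sqlite3 module (the employee rows and salaries are invented, loosely following the Employee table used in this topic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT NOT NULL PRIMARY KEY, Position TEXT, Salary REAL)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ('E1', 'Administrator', 2000), ('E2', 'Salesperson', 1500),
    ('E3', 'Clerk', 900), ('E4', 'Clerk', 1100),
])

# Comparison search condition: filters rows on Salary.
high_paid = conn.execute("SELECT EmpNo FROM Employee WHERE Salary > 1000").fetchall()

# Logical operator OR combining two comparisons, and the same test written with IN.
by_or = conn.execute(
    "SELECT EmpNo FROM Employee WHERE Position = 'Clerk' OR Position = 'Salesperson'").fetchall()
by_in = conn.execute(
    "SELECT EmpNo FROM Employee WHERE Position IN ('Clerk', 'Salesperson')").fetchall()
```

Both forms return the same rows, which is why the IN condition in the next example is interchangeable with the OR condition here.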
The set membership condition (IN) tests whether a value matches any value
in a set of values. In this query, it finds rows in the Employee table
with the position Clerk or Salesperson. This statement returns results as shown
in Table 3.14, which are similar to the results for the query in Example 5.
There is also a negated version (NOT IN) that can be used to list all rows
excluded from the IN list. For instance, we may want to find employees who
are not clerks or supervisors. This query can be expressed as follows and
the results table is shown in Table 3.15.
Query 8: Find all employees who have Celcom prepaid numbers. In other
words, their hand phone numbers must start with 013.
This statement lists all phone numbers starting with 013, and it does not
matter what numbers or characters follow this.
The results table returned from executing this statement is shown in the
Table 3.16.
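The pattern-match condition can be tried out directly. A sketch using Python's built-in sqlite3 module (the phone numbers are invented; '%' matches any sequence of characters, so '013%' means "starts with 013"):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT NOT NULL PRIMARY KEY, MobileTelNo TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [
    ('E1', '0131234567'), ('E2', '0198765432'), ('E3', '0139998888'),
])

# LIKE '013%' keeps only numbers beginning with 013, whatever follows.
celcom = [r[0] for r in conn.execute(
    "SELECT EmpNo FROM Employee WHERE MobileTelNo LIKE '013%'")]
```

Only the two numbers beginning with 013 survive the filter; the 019 number is excluded.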
SELF-CHECK 3.3
Query 9: List salaries for all employees, arranged in ascending order of salary.
If you want to sort the list in descending order, the word DESC must be specified
in the ORDER BY clause after the column name, as shown below.
Executing this statement will produce results as in Table 3.17 for the ascending
list and Table 3.18 for the descending list.
Query 10: List the employees sorted by position and in each position sort the list
in descending order by salary.
This query requires using two sort keys. The Position is the primary sort key and
the Salary is the secondary or minor sort key. The primary sort key has to be
written first in the list and followed by minor keys. You may have more than one
minor key.
This statement will provide us with the table as shown in Table 3.19.
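The two-key sort described above can be sketched as follows, using Python's built-in sqlite3 module (the three sample rows are invented). Position is the primary sort key, ascending; Salary is the minor key, descending within each position:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT NOT NULL PRIMARY KEY, Position TEXT, Salary REAL)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ('E1', 'Clerk', 900), ('E2', 'Manager', 3000), ('E3', 'Clerk', 1100),
])

# The primary key (Position) is listed first, followed by the minor key (Salary DESC).
ordered = conn.execute(
    "SELECT Position, Salary FROM Employee ORDER BY Position ASC, Salary DESC").fetchall()
```

The two Clerk rows stay together and are ordered by falling salary, then the Manager row follows.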
SELF-CHECK 3.4
1. Write the SELECT statement to display all the information about the
employees, sorted by the names of employees in descending order.
2. Write the SELECT statement to display the employees sorted by
position and in each position, sort the list in ascending order by salary.
In this subtopic, we use Product and Delivery tables, shown in Table 3.21 and
Table 3.22, to illustrate the use of these aggregate functions.
Query 11: Find the number of products supplied by supplier number S8843.
This statement will only count the number of products supplied by the supplier with
supplier number S8843. The label of the column in the output table can be changed
with the use of the AS keyword immediately after the column name. In Query 11,
the returned value of this statement is 2, as shown in Table 3.23.
NumOfProduct
2
Query 12: How many different products were delivered from January to April in
the year 2013?
The use of the DISTINCT keyword eliminates duplicate products delivered within
the period given in the search condition. The results returned by this statement are
shown in Table 3.24:
NumOfProduct
5
Query 13: Count the number of products which cost less than RM500 per unit
and total their QuantityOnHand.
This statement counts the number of products that cost less than RM500 and
sums up their QuantityOnHand. The result is shown in Table 3.25.
NumOfProduct TotalStock
3 50
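The combination of COUNT and SUM under one WHERE condition can be sketched as follows, using Python's built-in sqlite3 module. The product rows are invented, chosen so the output reproduces the shape of Table 3.25 (3 products, total stock of 50):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (ProductNo TEXT NOT NULL PRIMARY KEY,"
    " PricePerUnit REAL, QuantityOnHand INTEGER)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    ('P1', 450, 20), ('P2', 300, 10), ('P3', 499, 20), ('P4', 900, 5),
])

# Both aggregate functions operate only on the rows that satisfy the WHERE condition.
num_products, total_stock = conn.execute("""
    SELECT COUNT(*) AS NumOfProduct, SUM(QuantityOnHand) AS TotalStock
    FROM Product
    WHERE PricePerUnit < 500
""").fetchone()
```

The RM900 product is filtered out before either aggregate is computed, so it contributes to neither the count nor the total.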
Query 14: Find the minimum, maximum and the average price per unit
The results table for the above statement is shown in the Table 3.26.
This statement finds the number of products using the aggregate function
COUNT for each Supplier based on the SuppNo. The results table is shown
in Table 3.27.
SuppNo NumOfProduct
S8843 2
S9888 1
S9898 1
S9995 1
Query 16: Find the OrderNo that has more than one product.
This operation groups the Delivery data based on OrderNo and lists only
those groups that have more than one product. The output of this operation
is shown in Table 3.28.
OrderNo NumOfProduct
1120 3
4399 2
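The GROUP BY and HAVING combination above can be reproduced with Python's built-in sqlite3 module. The delivery rows are invented, arranged so the output matches Table 3.28 (ORDER BY is added only to make the result order deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Delivery (OrderNo INTEGER, ProductNo TEXT)")
conn.executemany("INSERT INTO Delivery VALUES (?, ?)", [
    (1120, 'P1'), (1120, 'P2'), (1120, 'P3'),
    (4399, 'P1'), (4399, 'P4'),
    (5001, 'P2'),
])

# GROUP BY forms one group per OrderNo; HAVING then filters whole groups,
# keeping only orders with more than one product.
groups = conn.execute("""
    SELECT OrderNo, COUNT(ProductNo) AS NumOfProduct
    FROM Delivery
    GROUP BY OrderNo
    HAVING COUNT(ProductNo) > 1
    ORDER BY OrderNo
""").fetchall()
```

Order 5001 is grouped like the others but discarded by HAVING, since its group contains only one product. This is the key difference from WHERE, which filters individual rows before grouping.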
3.3.5 Subqueries
In this subtopic, we are going to learn how to use subqueries. Here, we provide
examples of subqueries that involve the use of the SELECT statement within
another SELECT statement, which is also sometimes referred to as nested
SELECT. In terms of the order of execution, the inner SELECT will be performed
first and the result of the inner SELECT will be used for the filter condition in the
outer SELECT statement.
Query 17: List the product names and price per unit for products that are
supplied by ABX Technics.
First, the inner SELECT statement is executed to get the supplier number of
ABX Technics. The output from this statement is tested as part of the search
condition in the WHERE clause of the outer SELECT statement. Note that
the "=" sign has been used in the WHERE clause of the outer SELECT since
the result of the inner SELECT contains only one value.
The final result table from this query is shown in Table 3.29.
Query 18: List the supplier number, product names and price per unit for
products that are supplied by suppliers from Petaling Jaya.
This time, the result of the inner SELECT statement can contain more than one
value. Therefore, the IN keyword is used in the search condition for the WHERE clause in
the outer SELECT statement. The results table for the above statement is
shown in Table 3.30.
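A minimal sketch of the nested SELECT in Query 17, using sqlite3 with hypothetical Supplier and Product rows. The inner query returns a single supplier number, so "=" is a valid comparison in the outer WHERE clause.

```python
import sqlite3

# Hypothetical table contents for the Order Entry Database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplier (SuppNo TEXT, Name TEXT, City TEXT)")
conn.execute(
    "CREATE TABLE Product (ProductNo TEXT, Name TEXT, UnitPrice REAL, SuppNo TEXT)")
conn.executemany("INSERT INTO Supplier VALUES (?, ?, ?)", [
    ("S8843", "ABX Technics", "Petaling Jaya"),
    ("S9888", "Soft System", "Kuala Lumpur"),
])
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("P2344", "17 inch Monitor", 410.0, "S8843"),
    ("P2346", "19 inch Monitor", 550.0, "S8843"),
    ("P3455", "Laser Printer", 820.0, "S9888"),
])

# The inner SELECT runs first and returns exactly one supplier number,
# which the outer SELECT then uses as its filter value.
rows = conn.execute("""
    SELECT Name, UnitPrice FROM Product
    WHERE SuppNo = (SELECT SuppNo FROM Supplier WHERE Name = 'ABX Technics')
""").fetchall()
print(sorted(rows))  # the two ABX Technics products
```

If the inner SELECT could return several supplier numbers, as in Query 18, the comparison would have to use IN instead of "=".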
Query 19: List each product name together with its supplier name.
This statement joins two tables, which are Product and Supplier. Since the
common column in both tables is SuppNo, this column is used for the join
condition in the WHERE clause. The output of this simple join statement is
shown in Table 3.31.
ProductName SupplierName
17 inch Monitor ABX Technics
19 inch Monitor ABX Technics
Laser Printer Soft System
Colour Laser Printer ID Computers
Colour Scanner ITN Suppliers
Query 20: Sort the list of products based on supplier name and for each
supplier name, sort the list based on Product names in descending order.
This statement is similar to the previous example, except it includes the ORDER
BY clause for sorting purposes. The results are sorted in ascending order by
supplier name and for those suppliers that have more than one product, the
product name is sorted in descending order (refer to Table 3.32).
ProductName SupplierName
19 inch Monitor ABX Technics
17 inch Monitor ABX Technics
Colour Laser Printer ID Computers
Colour Scanner ITN Suppliers
Laser Printer Soft System
Query 21: Find the supplier names of products that were delivered in January
2013. Sort the list by supplier name.
SELECT s.Name AS SupplierName, p.Name AS ProductName, d.DeliveryDate
FROM Supplier s, Product p, Delivery d
WHERE s.SuppNo = p.SuppNo AND p.ProductNo = d.ProductNo AND
(d.DeliveryDate >= '1-Jan-13' AND d.DeliveryDate <= '31-Jan-13')
ORDER BY s.Name;
This query requires that we join three tables. All the join conditions are
listed in the WHERE clause. As noted earlier, the common column of each
pair of tables to be joined is used as the join condition. To join Supplier and
Product, the supplier number is used; to join Product and Delivery, the
product number is used. The results from this join are shown in Table 3.33.
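The three-table join can be sketched as follows. The rows are hypothetical, and ISO-format dates ('YYYY-MM-DD') are used so that string comparison in sqlite3 behaves correctly, unlike the '1-Jan-13' literals shown in the text.

```python
import sqlite3

# Hypothetical rows for the three tables involved in Query 21.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (SuppNo TEXT, Name TEXT);
    CREATE TABLE Product (ProductNo TEXT, Name TEXT, SuppNo TEXT);
    CREATE TABLE Delivery (OrderNo INTEGER, ProductNo TEXT, DeliveryDate TEXT);
    INSERT INTO Supplier VALUES ('S8843', 'ABX Technics'), ('S9888', 'Soft System');
    INSERT INTO Product VALUES ('P2344', '17 inch Monitor', 'S8843'),
                               ('P3455', 'Laser Printer', 'S9888');
    INSERT INTO Delivery VALUES (1120, 'P2344', '2013-01-15'),
                                (4399, 'P3455', '2013-03-02');
""")

# Supplier joins Product on SuppNo; Product joins Delivery on ProductNo.
rows = conn.execute("""
    SELECT s.Name AS SupplierName, p.Name AS ProductName, d.DeliveryDate
    FROM Supplier s, Product p, Delivery d
    WHERE s.SuppNo = p.SuppNo AND p.ProductNo = d.ProductNo
      AND d.DeliveryDate BETWEEN '2013-01-01' AND '2013-01-31'
    ORDER BY s.Name
""").fetchall()
print(rows)  # [('ABX Technics', '17 inch Monitor', '2013-01-15')]
```

The March delivery is excluded by the date range, so only the January row survives the join.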
In this subtopic, you are going to learn SQL commands that are used for
modifying the contents of a table in a database. The SQL commands that are
commonly used are as shown in Figure 3.3.
3.4.1 INSERT
INSERT is used to add new records or data into an existing database table. The
syntax for INSERT command is as follows:
(a) columnList is optional; if omitted, SQL assumes a list of all the columns, in
the order in which you specified them when you first created the table;
(b) Any columns omitted must allow NULL (that is, they must not have been
declared NOT NULL) when the table was created, unless DEFAULT was
specified when creating the column; and
Query 22: Add a new record as given below to the Supplier table.
INSERT INTO Supplier (SuppNo, Name, Street, City, Postcode, TelNo, ContactPerson)
VALUES ('S9996', 'NR Tech', '20 Jalan Selamat', 'Kuala Lumpur', 62000,
23456677, 'Nick');
Since you want to insert values for all the columns in the table, you may omit the
column list. Thus, you may write the SQL statement as below:
Note that you must enclose the values of non-numeric columns in quotation
marks, such as 'Kuala Lumpur' for the City. Executing either of these
statements will give us the result in Table 3.35.
Query 23: Add a new record as given below to the Supplier table.
In this example, the data provided is incomplete; some information, such as the
postcode and contact person, is missing. In this case, you only need to specify
the column names that you are going to use. You may also omit the column list,
but then you are required to supply the NULL value for each column that has no
value.
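The two INSERT variants in Queries 22 and 23 can be sketched with sqlite3 as follows; the second supplier's details are hypothetical, added only to show NULL standing in for missing values.

```python
import sqlite3

# A Supplier table matching the column list used in Query 22.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Supplier
    (SuppNo TEXT, Name TEXT, Street TEXT, City TEXT,
     Postcode INTEGER, TelNo INTEGER, ContactPerson TEXT)""")

# Query 22: the column list is given explicitly.
conn.execute("""INSERT INTO Supplier
    (SuppNo, Name, Street, City, Postcode, TelNo, ContactPerson)
    VALUES ('S9996', 'NR Tech', '20 Jalan Selamat', 'Kuala Lumpur',
            62000, 23456677, 'Nick')""")

# Query 23 style (hypothetical row): the column list is omitted, so the
# values must follow the CREATE TABLE order, with NULL standing in for
# the missing Postcode and ContactPerson.
conn.execute("""INSERT INTO Supplier
    VALUES ('S9997', 'BT Hardware', '5 Jalan Indah', 'Ipoh',
            NULL, 5556677, NULL)""")

total = conn.execute("SELECT COUNT(*) FROM Supplier").fetchone()[0]
print(total)  # 2
```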
3.4.2 UPDATE
The update statement is used to update or change records that match specified
criteria. This is accomplished by carefully constructing a WHERE clause.
UPDATE TableName
SET columnName1 = dataValue1
[, columnName2 = dataValue2...]
[WHERE searchCondition]
Let us look at the variance in the use of UPDATE statements for modifying
values in a table.
Updating may involve modifying a particular column for all records in a table.
Query 24: Increase the salary of each employee according to a 10% pay raise.
UPDATE Employee
SET Salary = Salary*1.10;
Query 25: Give a 5% pay raise to all employees who hold the position of Manager.
If the changes are only for particular rows with specified criteria, then the
WHERE clause needs to be used in the statement. This can be written as follows:
UPDATE Employee
SET Salary = Salary*1.05
WHERE Position = 'Manager';
Query 26: Update the contact person to Ahmad for the supplier Total System.
We may also sometimes need to update only one column for a specific row. For
instance, this query requires us to update the contact person in the Supplier
table to Ahmad for the supplier named Total System. Thus, the UPDATE statement for this
query would be as below:
UPDATE Supplier
SET ContactPerson = 'Ahmad'
WHERE Name = 'Total System';
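The UPDATE variants above can be sketched with sqlite3. The employees and their salaries below are hypothetical.

```python
import sqlite3

# Hypothetical Employee rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT, Position TEXT, Salary REAL)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("E1001", "Manager", 5000.0),
    ("E1002", "Clerk", 2000.0),
])

# Query 24: no WHERE clause, so every row gets the 10% raise.
conn.execute("UPDATE Employee SET Salary = Salary * 1.10")

# Query 25: the WHERE clause restricts the extra 5% raise to Managers.
conn.execute(
    "UPDATE Employee SET Salary = Salary * 1.05 WHERE Position = 'Manager'")

salaries = [(p, round(s, 2)) for p, s in
            conn.execute("SELECT Position, Salary FROM Employee ORDER BY EmpNo")]
print(salaries)  # [('Manager', 5775.0), ('Clerk', 2200.0)]
```

The Manager's salary is raised twice (5000 × 1.10 × 1.05 = 5775), while the Clerk is touched only by the unconditional update (2000 × 1.10 = 2200).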
3.4.3 DELETE
The DELETE statement is used to delete records or rows from an existing table.
(a) TableName can be the name of a base table or an updatable view; and
(b) searchCondition is optional; if omitted, all rows are deleted from the table.
This does not delete the table itself. If searchCondition is specified, only
those rows that satisfy the condition are deleted.
Query 27: Delete the supplier named 'Total System' from the Supplier table.
You need to use the WHERE clause when you want to delete only a specified
record. Thus, the statement would be as given below:
Table 3.40 shows the Supplier table after deleting records of the supplier named
Total System.
If you want to delete all records from the Shipping table, then you skip the
WHERE clause. Thus, the statement would be written as:
This command will delete all rows in the Shipping table but it does not delete
the table itself. This means that the table structure, attributes and indexes will
remain intact.
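Both DELETE variants can be sketched with sqlite3; the two supplier rows below are hypothetical.

```python
import sqlite3

# Hypothetical Supplier rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplier (SuppNo TEXT, Name TEXT)")
conn.executemany("INSERT INTO Supplier VALUES (?, ?)", [
    ("S9995", "Total System"),
    ("S8843", "ABX Technics"),
])

# Query 27: with a WHERE clause, only the matching record is deleted.
conn.execute("DELETE FROM Supplier WHERE Name = 'Total System'")
remaining = conn.execute("SELECT COUNT(*) FROM Supplier").fetchone()[0]
print(remaining)  # 1

# Without a WHERE clause, all rows go, but the table itself remains
# and can still be queried (and refilled with INSERT).
conn.execute("DELETE FROM Supplier")
after_all = conn.execute("SELECT COUNT(*) FROM Supplier").fetchone()[0]
print(after_all)  # 0
```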
DML allows you to retrieve, add, modify and delete data from table(s). The
basic DML commands are SELECT, INSERT, UPDATE and DELETE.
The SELECT statement is the most important statement for retrieving data
from the existing database. The result from each query of a SELECT
statement is in the form of a table. A SELECT statement has the following
syntax:
SELECT statements allow us to produce a results table, not only from one
table but from more than one table. When more than one table is involved,
the join operation must be used by specifying the names of tables in the
FROM clause and the join condition in the WHERE clause.
The other SQL DML commands used for data manipulation are the INSERT,
UPDATE and DELETE commands. INSERT is used to insert new row(s) into
the existing table. UPDATE is used to modify value(s) for all columns or a
specified column of an existing table. DELETE is used to delete row(s) from
an existing table.
1. What are the two major components of SQL and what function do they serve?
3. What restrictions apply to the use of the aggregate functions within the
SELECT statement?
4. Explain how the GROUP BY clause works. Identify one difference between
the WHERE and HAVING clauses.
INTRODUCTION
In Topic 3, we have examined Structured Query Language (SQL), particularly the
SQL data manipulation features. By now, you should be comfortable with the
SELECT statement.
In this topic, we will explore the main SQL data definition facilities. We begin
this topic by examining the ISO SQL data types. The Integrity Enhancement
Feature (IEF) improves the functionality of SQL and allows the constraint
checking to be standardised. We will examine required data, domains, entity
integrity and referential integrity constraints. Then, we will discuss the main SQL
data definition features, which include database and table creation, and altering and
deleting of a table. This topic concludes with the creation and removal of views.
(a) SQL identifiers are used to identify the following items in the database:
(i) Table names;
(ii) View names; and
(iii) Attributes (columns).
The character data type is referred to as a string data type while exact numeric
and approximate numeric data types are referred to as numeric data types.
For example, in our Order Entry Database, the EmpNo attribute in the
Employee table has a fixed length of five characters. It is declared as:
EmpNo CHAR(5)
This column has a fixed length of five characters; when we insert
fewer than five characters, the string is padded with blanks to make it
up to five characters.
Name VARCHAR(15)
The T value indicates the total number of digits and the R value
indicates the number of digits to the right of the decimal point.
For example, column Salary in the Employee relation can be declared as:
Salary DECIMAL(7,2)
QtyOnHand INTEGER(4)
(e) Date
The date data type is defined in columns such as the DOB (Date of Birth)
column in the Employee table. This is declared in the SQL as:
DOB DATE
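The declarations above can be gathered into one table as a sketch. SQLite (used here through Python's sqlite3 module) accepts these ISO-style type names but maps them to its own storage classes; stricter engines enforce the declared lengths and precision. The row values are hypothetical.

```python
import sqlite3

# One table exercising the data types discussed: CHAR, VARCHAR,
# DECIMAL, and DATE.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EmpNo  CHAR(5),
        Name   VARCHAR(15),
        Salary DECIMAL(7,2),
        DOB    DATE
    )
""")

# Hypothetical employee row; the DATE value uses ISO format.
conn.execute(
    "INSERT INTO Employee VALUES ('E1001', 'Ahmad', 3500.50, '1985-04-12')")
row = conn.execute("SELECT Name, Salary FROM Employee").fetchone()
print(row)  # ('Ahmad', 3500.5)
```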
SELF-CHECK 4.1
1. Describe SQL identifiers.
2. Identify the data types in SQL.
A null is not a blank or zero and is used to represent data that is not available
or not applicable.
However, some columns must contain some valid data. For example, every
employee in the Employee relation must have a position, whether they are a
salesperson, manager or a clerk. SQL provides the NOT NULL clause in the
CREATE TABLE statement to enforce the required data constraint.
To ensure that the column position of the Employee table cannot be null, we
define the column as:
When NOT NULL is specified, the Position column must have a data value.
To ensure that the gender can only be specified as 'M' or 'F', we define the
domain constraint in the Gender column as:
To support the entity integrity, SQL provides the PRIMARY KEY clause in the
CREATE and ALTER TABLE statements. For example, to declare EmpNo as the
primary key, we use the clause as:
A foreign key value in a relation must match a candidate key value of the
tuple in the referenced relation or the foreign key value can be null.
For example, in the Order Entry Database, the Product table has the foreign key
SuppNo. You will notice that every entry of SuppNo in the rows of the Product
table (child table) matches the SuppNo of the referenced table Supplier (parent
table).
SQL supports the referential integrity constraint with the FOREIGN KEY clause
in the CREATE and ALTER TABLE statements. For example, to specify the
foreign key SuppNo of the Product table, we state it as:
(i) CASCADE
Perform the same action to related rows. For example, if a SuppNo in
the Supplier table is deleted, then the related rows in the Product table
will be deleted in a cascading manner.
Copyright © Open University Malaysia (OUM)
(iii) NO ACTION
Reject the delete operation from the parent table. For example, do not
allow the SuppNo in the Supplier table to be deleted if there are
related rows in the Product table.
You must also consider the impact of referenced rows on insert operations.
A referenced row (in the parent table) must be inserted before its related
rows (in the child table). For example, before inserting a row in the Product
table, the referenced row in the Supplier must exist.
SELF-CHECK 4.2
1. How do you define Primary key and Foreign key?
2. Identify the actions supported by SQL. Briefly explain these actions.
(b) Therefore, if the schema is OrderProcessing and the creator is Lim, the SQL
statement is:
The CREATE TABLE statement creates a table consisting of one or more columns
of the defined data type.
The optional DEFAULT clause provides for default values in a column.
Whenever an INSERT statement fails to specify a column value, SQL will use the
default value.
The NOT NULL is specified to ensure that the column must have a data value.
The remaining clauses are constraints and are headed by the clause:
CONSTRAINT constraintname
The PRIMARY KEY clause specifies the column(s) that comprise the primary
key. It is assumed by default that the primary key value is NOT NULL.
The FOREIGN KEY clause specifies a foreign key in the child table and its
relationship to the parent table. This clause specifies the following:
(a) A listofForeignKeyColumns, containing the column(s) that form the foreign
key;
(b) A REFERENCES subclause, indicating the parent table that holds the
matching primary key;
(c) An optional ON UPDATE clause to specify the action taken on the foreign
key value of the child table if the matching primary key in the parent table
is updated. These actions were discussed in subtopic 4.2.4; and
(d) An optional ON DELETE clause to specify the action taken on the child
table if the row(s) in the parent table are deleted, where the primary key
values match the foreign key value in the child table. These actions were
discussed in subtopic 4.2.4.
The following three examples show the CREATE TABLE statements for the
Order Entry Database using the tables Customer, Order and OrderDetail.
(a) Creating the Customer table using the features of the CREATE TABLE
statement:
(d) You can now create the rest of the tables in the Order Entry Database as an
exercise.
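As a runnable sketch of the CREATE TABLE features discussed above (NOT NULL, DEFAULT, CHECK, PRIMARY KEY, and FOREIGN KEY with a referential action), the example below uses sqlite3. The exact column lists are assumptions based on the Order Entry examples, not the textbook's full schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Supplier (
        SuppNo CHAR(5)     NOT NULL,
        Name   VARCHAR(15) NOT NULL,
        CONSTRAINT SupplierPK PRIMARY KEY (SuppNo)
    );
    CREATE TABLE Product (
        ProductNo CHAR(5)      NOT NULL,
        Name      VARCHAR(15)  NOT NULL,
        UnitPrice DECIMAL(7,2) DEFAULT 0,
        SuppNo    CHAR(5),
        CONSTRAINT ProductPK PRIMARY KEY (ProductNo),
        CONSTRAINT PriceCheck CHECK (UnitPrice >= 0),
        CONSTRAINT ProductFK FOREIGN KEY (SuppNo)
            REFERENCES Supplier (SuppNo) ON DELETE CASCADE
    );
    INSERT INTO Supplier VALUES ('S8843', 'ABX Technics');
    INSERT INTO Product VALUES ('P2344', '17 inch Monitor', 410.00, 'S8843');
""")

# ON DELETE CASCADE: deleting the parent row removes its child rows too.
conn.execute("DELETE FROM Supplier WHERE SuppNo = 'S8843'")
left = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
print(left)  # 0
```

Note that the parent (Supplier) row must be inserted before the child (Product) row, exactly as described in the discussion of referential integrity.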
The DROP COLUMN clause defines the name of the column to be dropped and
has the following options (Connolly and Begg, 2009):
(a) RESTRICT
The DROP operation is rejected if the column is referenced by another
database object.
(b) CASCADE
The DROP operation proceeds and drops the column from any database
items it is referenced by.
For example, if we want to add an extra column, that is, Branch_No to the
Employee table, the SQL statements would be:
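The ALTER TABLE statement referred to above can be sketched as follows (a minimal sqlite3 sketch; the CHAR(4) type chosen for Branch_No is an assumption).

```python
import sqlite3

# A reduced Employee table, then ALTER TABLE to add the Branch_No column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo CHAR(5), Name VARCHAR(15))")
conn.execute("ALTER TABLE Employee ADD COLUMN Branch_No CHAR(4)")

# Confirm the new column is in place (PRAGMA table_info lists columns).
cols = [r[1] for r in conn.execute("PRAGMA table_info(Employee)")]
print(cols)  # ['EmpNo', 'Name', 'Branch_No']
```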
The DROP TABLE statement should be carried out with care as the total effect
can be damaging to the rest of the database tables. It is recommended that this
clause be used if a table is created with an incorrect structure. Then, the DROP
TABLE clause can be used to delete this table and the structure created again.
SELF-CHECK 4.3
1. What does the CREATE TABLE statement do?
2. What does the ALTER TABLE statement do?
3. How can we remove a table from the database?
4.4 VIEWS
A view is a virtual or derived relation that may be derived from one or more
base relations.
Views do not physically exist in the database. They allow users to customise the
data according to their needs and hide part of the database from certain users.
Let us look at how views are created.
This will give us a view known as CustomerIpoh with the same column
names as the Customer table but only those rows where the City is Ipoh.
This view is shown below in Table 4.1.
DROP VIEW causes the definition of the view to be deleted from the schema. For
example, to remove the view CustomerIpoh, we specify it in SQL as:
If CASCADE is specified, DROP VIEW deletes all objects that reference the view.
If the RESTRICT option is chosen and there are other database objects that
depend on the existence of the view, then the command is rejected.
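The CustomerIpoh example can be sketched with sqlite3; the Customer rows are hypothetical. Note that SQLite's DROP VIEW supports neither CASCADE nor RESTRICT, unlike the full ISO standard described above.

```python
import sqlite3

# Hypothetical Customer rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustNo TEXT, Name TEXT, City TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)", [
    ("C1010", "Ahmad", "Ipoh"),
    ("C2388", "Mei Ling", "Kuala Lumpur"),
])

# The view stores only its defining query, not a copy of the rows.
conn.execute("""CREATE VIEW CustomerIpoh AS
    SELECT * FROM Customer WHERE City = 'Ipoh'""")
ipoh = conn.execute("SELECT CustNo FROM CustomerIpoh").fetchall()
print(ipoh)  # [('C1010',)]

# DROP VIEW deletes the view definition from the schema.
conn.execute("DROP VIEW CustomerIpoh")
```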
SELF-CHECK 4.4
1. Briefly explain a VIEW.
2. How do you create a VIEW?
SQL identifiers can use the letters a-z (upper and lower), numbers and the ( _ )
character, for table, view and column names. The identifiers cannot be more
than 128 characters, must start with a letter and cannot contain spaces.
The available data types identified in SQL are Boolean, character, exact
numeric, approximate numeric and date.
Required data in SQL are specified by the NOT NULL clause. Domain
constraint is specified using the CHECK clause.
Foreign keys are specified using the FOREIGN KEY clause. The update and
delete actions on referenced rows are specified by the ON DELETE and ON
UPDATE subclauses.
We can remove a table from the database by using the DROP TABLE
statement.
A "view" is a derived relation that does not physically exist in the database. It
allows users to customise the data according to their needs.
(c) Discuss how you would maintain this as a materialised view and under
what circumstances you would be able to maintain the view without
having to access the underlying base table Component.
INTRODUCTION
In order to design a database, there must be a clear understanding of how the
business operates, so that the design produced will meet user requirements. The
Entity-Relationship (ER) model allows database designers, programmers and end
users to give their input on the nature of the data and how it is used in the
business. Therefore, the ER model is a means of communication that is non-
technical and easily understood.
In this topic, you are provided with the basic concepts of the ER model, which
enable you to understand the notation of ER diagrams. The Crow's Foot notation
is used here to represent the ER diagrams.
Mannino (2011)
Entity types can be physical, such as people, places or objects, as well as events
and concepts, such as a reservation or a course. A full list is given in Table 5.1:
5.2 ATTRIBUTES
The attributes of the entity Customer are CustNo, Name, Street, City, Postcode,
TelNo and Balance.
For example, the notation for the entity Customer with the above attributes is
represented in Figure 5.2. The primary key CustNo is underlined.
5.3 RELATIONSHIPS
Each relationship type is given a name that describes its function. An example of
a relationship type is shown in Figure 5.3.
Consider the example in Figure 5.3, showing the relationship between the
Customer entity and Order entity. In the Crow's Foot notation, relationship
names appear on the line connecting the entity types involved in the relationship.
In Figure 5.3, the Makes relationship shows that the Customer and Order entity
types are directly related. The Makes relationship is binary because it involves
two entity types.
Mannino (2011)
Figure 5.4 shows a set of customers {Customer1, Customer2, and Customer3}, a set
of orders {Order1, Order2, Order3, Order4} and connections between the two sets.
In this figure, Customer1 is related to Order1, Order2 and Order3, Customer2 is
not related to any Order entities and Customer3 is related to Order4.
There are three main types of relationship that can exist between entities:
In this relationship, an Employee processes zero, one or more orders and each
Order is processed by one employee. The above Process relationship is optional
to the Employee entity because an Employee entity can be stored without being
related to an Order entity. However, it is mandatory for the Order entity because
an order has to be processed by one employee.
To show minimum and maximum cardinality, the symbols are placed next to
each entity type in the relationship. In Figure 5.9, a customer is related to a
minimum of zero offerings (circle in the inside position) and a maximum of
many offerings (Crow's Foot in the outside position). In the same way, an order is
related to exactly one (one and only one) customer, as shown by the single
vertical lines in both inside and outside positions. Table 5.2 shows a summary of
cardinality classifications using Crow's Foot notation.
In a recursive or unary (degree = 1) relationship, there is only one entity involved.
For example, an employee is supervised by a supervisor who is also an
employee.
Strong entities have their own primary key. Examples of strong entities are
Product, Employee, Customer, Order, Invoice, etc. Strong entity types are known
as parent or dominant entities.
Weak entities borrow all or part of the primary keys from another (strong) entity.
As an example, see Figure 5.14, whereby the Room entity is existence-dependent
on the Building entity type. You can only refer to a room by providing its associated
building identifier. The underlined attribute in the Room entity is part of the
primary key but not the entire primary key. Therefore, the primary key of Room
entity is the combination of BuildingId and RoomNo.
Supertype is an entity that stores attributes that are common to one or more
entity subtypes. Meanwhile, subtype is an entity that inherits some common
attributes from an entity supertype and then adds other attributes that are unique
to an instance of the subtype.
In the example above, the attributes Empno, Name and HireDate also apply to
subtypes. For example, every entity of Pilot has an Empno, Name and HireDate
because it is a subtype of Employee. Inheritance means that the attributes of a
supertype are automatically part of its subtypes. That is, each subtype will inherit
the attributes of the supertype. For example, the attributes of Pilot entity are its
inherited attributes that are Empno, Name and HireDate, as well as its direct
attributes that are PilotLicence and Pilot Ratings. These direct attributes are
known as specialist attributes.
Based on the example in Figure 5.15, the generalisation hierarchy is disjoint (non-
overlapping) because an Employee cannot be a Pilot and at the same time a
Mechanic. The employee must be a Pilot or a Mechanic or an Accountant. To
show the disjoint constraint, D is used as shown in Figure 5.16.
In Figure 5.18, the completeness constraint means every member of Staff must
either be employed as FullTime or PartTime Staff. To show the completeness
constraint, C is used as shown in Figure 5.18.
In contrast, the generalisation hierarchy is not complete if an entity does not fall
into any of the subtype entities. If we consider our previous example of the
Employee generalisation hierarchy as shown in Figure 5.16, we note that the
employee roles are pilot, mechanic and accountant. However, if the job role
is 'office administrator', then this entity would not fall into any of the
subtypes as it would not have any special attributes. Therefore, the entity type
office administrator would remain in the superclass entity as an employee. The
absence of C indicates that the generalisation hierarchy is not complete.
Some generalisation hierarchies have both the disjoint and complete constraints
as shown in Figure 5.19.
ACTIVITY 5.1
ACTIVITY 5.2
The basic symbols are entity types, relationships, attributes and cardinalities
to show the number of entities participating in a relationship.
1. Discuss the entity types that can be represented in an ER model and give
examples of entities with a physical existence.
2. Discuss what relationship types can be represented in an ER model and give
examples of unary, binary, ternary and quaternary relationships.
INTRODUCTION
In this topic, we introduce the concept of normalisation and explain its
importance in database design. Next, we will present the potential problems in
database design, which are also referred to as update anomalies. One of the main
goals of normalisation is to produce a set of relations that is free from update
anomalies. Then, we go on to discuss the key concept that is fundamental to
understanding the normalisation process, which is functional dependency.
Normalisation involves a step-by-step process or normal forms. This topic will
cover discussion of the normalisation process up to the third normal form.
What is normalisation?
Both of these are worthy goals as they reduce the amount of space a database
consumes and ensure that data is logically stored. If our database is not
normalised, it can be inaccurate, slow, inefficient and it might not produce the
data we expect. Not to mention, if we have a normalised database, queries, forms
and reports are much easier to design!
SELF-CHECK 6.1
1. Define normalisation.
2. Identify two purposes of normalisation.
Why normalisation?
Normalisation is also used to repair a "bad" database design, i.e. a set of tables
that exhibit update, delete and insert anomalies. The normalisation process can
be used to change this set of tables to a set that does not have problems.
SELF-CHECK 6.2
1. Briefly explain how normalisation supports database design.
2. Is normalisation a "bottom-up" or "top-down" approach to
database design? Briefly explain.
The redundant data utilises a lot of unnecessary space and also may create
problems when updating the database, also called update anomalies, which
may lead to data inconsistency and inaccuracy.
To illustrate the problem associated with data redundancy that causes update
anomalies, let us compare the Supplier and Product relations shown in Figure 6.1
with the alternative format that combines these relations into a single relation
called Product-Supplier as shown in Figure 6.2. For the Supplier relation,
supplier number (SuppNo) is the primary key and for Product relation, product
number (ProductNo) is the primary key. For the Product-Supplier relation,
ProductNo is chosen as primary key.
You should notice that in the Product-Supplier relation, details of the supplier
are included for every product. These supplier details (the SName, TelNo and
ContactPerson attributes) are unnecessarily repeated for every product that is
supplied by the same supplier and this leads to data redundancy. For instance,
products P2344 and P2346 have the same supplier, so the same supplier details
are repeated for both products. These supplier detail attributes are also
considered a repeating group. On the other hand, in the Product relation, only
the supplier number is repeated, to link each product to a supplier, and in the
Supplier relation, details of each supplier appear only once.
A relation with data redundancy as shown in Figure 6.2 may result in a problem
called update anomalies, comprising insertion, deletion and modification
anomalies. In the following subtopic, we illustrate each of these anomalies using
the Product-Supplier relation.
Insert anomalies exist when adding a new record that will cause unnecessary
data redundancy or when there is unnecessary constraint placed on the task
of adding new records.
There are two examples of insertion anomalies for the product-supplier relation
in Figure 6.2:
A deletion anomaly exists when deleting a record that would remove a record
not intended for deletion.
As a result, we will lose all information about this supplier because supplier
S9898 only appears in the tuple that we have removed. In a properly normalised
database, this deletion anomaly can be avoided as the information about supplier
and product is stored in separate relations and they are linked together using
supplier number. Therefore, when we delete product number P5443 from the
Product relation, details about supplier S9898 from the Supplier relation are not
affected.
Redundant information not only wastes storage but makes updates more
difficult. This difficulty is called modification anomaly.
For example, changing the name of the contact person for supplier S9990 would
require that all tuples containing supplier S9990 be updated. If for some reason,
all tuples are not updated, we might have a database that has two different
names of contact persons for supplier S9990.
Since our example only involves a small relation, this does not seem to be a big
problem. However, its effect would be very significant when we are dealing with
a very large database.
Before we discuss the details of the normalisation process, let us look at the
functional dependency concept, which is an important concept in the
normalisation process.
SELF-CHECK 6.3
1. Briefly explain data redundancy.
2. Give one example on how data redundancy can cause update
anomalies.
3. Briefly differentiate between insertion anomalies, deletion
anomalies and modification anomalies.
For a simple illustration of this concept, let us use, for instance, a relation with
attributes A and B. B is functionally dependent on A, if each value of A is
associated with exactly one value of B. This dependency between A and B is
written as "A → B".
We may think of how to determine functional dependency like this: given a value
for attribute A, can we determine the single value for B? If B relies on A, then A is
said to functionally determine B. The functional dependency between attribute A
and B is represented diagrammatically in Figure 6.4.
Now, let us look at the CustomerOrdering relation as shown in Figure 6.3 to find
the functional dependencies. First, we consider the attributes CustNo and
CustName. It is true that for a specific CustNo, it can only be associated with one
value of CustName. In other words, the relationship between CustNo and
CustName is one to one (for each customer number, there is only one name).
Thus, we can say that CustNo determines CustName or CustName is
functionally dependent on CustNo. This dependency can be written as
CustNo → CustName.
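This definition can be checked mechanically. The small checker below and its CustomerOrdering rows are illustrative assumptions, not the textbook's full Table 6.3: a dependency A → B holds when no two tuples share the same A value but differ on B.

```python
def holds(rows, a, b):
    """Return True if the dependency a -> b holds in rows (a list of dicts)."""
    seen = {}
    for r in rows:
        # If this A value was seen before with a different B value,
        # the dependency is violated.
        if r[a] in seen and seen[r[a]] != r[b]:
            return False
        seen[r[a]] = r[b]
    return True

# Hypothetical CustomerOrdering tuples.
rows = [
    {"CustNo": "C1010", "CustName": "Ahmad", "OrderNo": 1120},
    {"CustNo": "C1010", "CustName": "Ahmad", "OrderNo": 4399},
    {"CustNo": "C2388", "CustName": "Mei Ling", "OrderNo": 5634},
]

print(holds(rows, "CustNo", "CustName"))  # True: CustNo -> CustName
print(holds(rows, "CustNo", "OrderNo"))   # False: one customer, many orders
```

Strictly, a functional dependency is a property of the business rules, not of one sample of data; a check like this can only refute a dependency, never prove it.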
Let us try another example, the relationship between CustNo and OrderNo.
Based on the CustomerOrdering relation, a customer may make more than one
order. Thus, a CustNo may be associated with more than one OrderNo. In other
words, OrderNo is not functionally dependent on CustNo.
Say, for example, the following functional dependency exists in the
CustomerOrdering relation: (OrderNo, ProductNo) → CustNo.
Say, for example, consider the following functional dependencies that exist in the
CustomerOrdering relation:
OrderNo → CustNo,
OrderNo → CustName,
CustNo → CustName
Therefore, the OrderNo attribute functionally determines the CustName via the
CustNo attribute.
Now, let us list down all the possible functional dependencies for the
CustomerOrdering relation. We will get a list of functional dependencies as listed
in Figure 6.7.
In order to find the candidate key(s), we must identify the attribute (or group of
attributes) that uniquely identifies each tuple in a relation. Therefore, to identify
the possible choices of candidate keys, we should examine the determinants for
each functional dependency. Then, we select one of them (if more than one) as
the primary key. All attributes that are not the primary key attribute are referred
to as non-key attributes. These non-key attributes must be functionally
dependent on the key.
Now, let us identify the candidate keys for the CustomerOrdering relation. We
have identified the functional dependencies for this relation as given in Figure
6.8. The determinants for these functional dependencies are: CustNo, OrderNo,
ProductNo and (OrderNo, ProductNo). From this list of determinants, the
(OrderNo, ProductNo) is the only determinant that uniquely identifies each tuple
in the relation. It is also true that all attributes (besides the OrderNo and
ProductNo) are functionally dependent on this determinant, the combination of
attributes (OrderNo, ProductNo). Thus, it is the candidate key and the primary
key for the CustomerOrdering relation.
The normalisation process involves a series of steps and each step is called a
normal form. Three normal forms were initially proposed called First Normal
Form (1NF), Second Normal Form (2NF) and Third Normal Form (3NF).
Higher normal forms that go beyond 3NF were introduced later, such as
Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF) and Fifth
Normal Form (5NF). However, these later normal forms deal with situations
that are rare in practice. In this topic, we will only
cover the first three normal forms. Figure 6.8 illustrates the process of
normalisation up to third normal form.
The details of the process will be discussed in the following subtopic. Let us
assume that we have transferred all the required attributes from the user
requirements specification into the table format and referred it as
CustomerOrdering table as shown in Table 6.3. We are going to use the
CustomerOrdering table to illustrate the normalisation process.
In order for us to transform the unnormalised table into a normalised table, two
steps need to be performed, which are:
(a) Nominate an attribute or group of attributes to act as the key for the
unnormalised table; and
(b) Identify the repeating group(s) in the unnormalised table which repeat
for the key attribute(s).
Then, after performing these steps, we need to check whether the relation is in
the 1NF. We have to follow these rules:
(a) Identify the key attribute;
(b) Identify the repeating groups; and
(c) Place the repeating groups into a separate relation along with a copy of its
determinants.
The process above must be repeated for all new relations created for the
repeating groups, to ensure that all relations are in 1NF.
For example, let us use the first approach by entering the appropriate value into
each cell of the table. Then, we will select a primary key for the relation and
check for repeating groups. If there is a repeating group, then we have to remove
the repeating group and place it into a new relation.
The first step is to check whether the table is unnormalised or already in the 1NF.
We will use CustomerOrdering table to illustrate the normalisation process. First,
we select a primary key for the table, which is CustNo. Then, we need to look for
repeating groups or multi-valued attributes. We can see that ProductNo,
ProductName, UnitPrice and QtyOrdered have more than one value for CustNo
= 'C1010' and 'C2388'. So, these attributes form repeating groups and thus, the
table is unnormalised.
As illustrated in Figure 6.8, our next step is to transform this unnormalised table
into 1NF. First, we need to make the table into a normalised relation. Let us apply
the first approach in which we need to fill up all the empty cells with a relevant
value as shown in Table 6.4. Each cell in the table now has an atomic value.
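The "fill up the empty cells" step can be pictured with a short sketch. Here, nested (unnormalised) customer records are flattened so that every cell holds an atomic value; the data values are invented for illustration:

```python
# Unnormalised form: each customer row carries a repeating group of products.
unnormalised = [
    {"CustNo": "C1010", "CustName": "Ali",
     "products": [("P1", "Pen", 2), ("P2", "Ink", 1)]},
    {"CustNo": "C2388", "CustName": "Siti",
     "products": [("P1", "Pen", 5)]},
]

# Flatten: repeat the customer values on every row so that each cell is atomic.
flat = [
    {"CustNo": c["CustNo"], "CustName": c["CustName"],
     "ProductNo": p, "ProductName": n, "QtyOrdered": q}
    for c in unnormalised
    for (p, n, q) in c["products"]
]

for row in flat:
    print(row)   # three rows, every value atomic
```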
The next step is to check if the table we just created is in 1NF. Firstly, we need to
identify the primary key for this table and then, check for repeating groups. The
best choice would be to look at the list of functional dependencies that you have
identified. From the functional dependency list, we can say that the combination
of OrderNo and ProductNo (OrderNo, ProductNo) functionally determines all
the non-key attributes in the table.
This means that the value of each (OrderNo, ProductNo) is associated with only
a single value of all other attributes in the table and (OrderNo, ProductNo) also
uniquely identifies each of the tuples in the relation. Thus, we can conclude that
(OrderNo, ProductNo) is the best choice as the primary key, since the relation
will not have any repeating groups. Therefore, this relation is in 1NF (refer to
Table 6.5).
Let us now transform the data in Table 6.5 to 2NF. The first step is to examine
whether the relation has partial dependency. Since the primary key chosen for
the relation is a combination of two attributes, therefore we should check for
partial dependency. From the list of functional dependencies, the attributes
ProductName and UnitPrice are functionally dependent on only part of the
primary key, namely ProductNo, while CustNo, CustName, TelNo and
OrderDate are functionally dependent on only part of the primary key, namely
OrderNo.
Thus, this relation is not in 2NF and we need to remove these partial dependent
attributes into a new relation along with a copy of their determinants. Therefore,
we have to remove ProductName and UnitPrice into a new relation, along with
their determinant which is ProductNo. We also need to remove CustNo,
CustName, TelNo and OrderDate into another new relation, along with the
determinant OrderNo. After performing this process, the 1NF CustomerOrdering
relation is broken down into three relations, which we name Product, Order and
OrderProduct, as listed in Figure 6.9.
Figure 6.9: Second Normal Form relations derived from the CustomerOrdering relation
Since we made changes to the original relation and have created two new
relations, we need to check and ensure that each of these relations is in 2NF.
Based on the definition of 2NF, these relations must first be checked with the 1NF
test for repeating groups, and then checked for partial dependency. All these
relations are in 1NF as none of them has any repeating groups.
For relations Order and Product, we may skip the partial dependency test as
their primary key only has one attribute. Thus, both of the relations are already
in 2NF. For the OrderProduct relation, there is only one non-key attribute which
is QtyOrdered and this attribute is fully functionally dependent on (OrderNo,
ProductNo). Thus, this relation is also in 2NF.
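The decomposition can also be expressed directly as SQL table definitions. The sketch below (run through Python's built-in sqlite3 module; the column types are assumptions, since the module does not specify them) creates the three 2NF relations with their keys:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Product: ProductNo determines ProductName and UnitPrice.
    CREATE TABLE Product (
        ProductNo   TEXT PRIMARY KEY,
        ProductName TEXT,
        UnitPrice   REAL
    );
    -- "Order" is quoted because ORDER is a reserved word in SQL.
    CREATE TABLE "Order" (
        OrderNo   TEXT PRIMARY KEY,
        CustNo    TEXT,
        CustName  TEXT,
        TelNo     TEXT,
        OrderDate TEXT
    );
    -- OrderProduct keeps the composite key and its fully dependent attribute.
    CREATE TABLE OrderProduct (
        OrderNo    TEXT REFERENCES "Order"(OrderNo),
        ProductNo  TEXT REFERENCES Product(ProductNo),
        QtyOrdered INTEGER,
        PRIMARY KEY (OrderNo, ProductNo)
    );
""")
print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
```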
Now, let us look at the three 2NF relations shown in Figure 6.9. For 3NF, we
look for functional dependencies between non-key attributes, known as
transitive dependencies. The relation OrderProduct is already in 3NF because it
has only one non-key attribute, QtyOrdered, so no such dependency can exist.
We still need to check the relations Product and Order, as both have more than
one non-key attribute.
Now, let us check the remaining relations. None of them has any transitive
dependencies. Therefore, we conclude that these relations are in 3NF, as shown
in Figure 6.10.
Figure 6.10: Third Normal Form relations derived from the CustomerOrdering relation
The 1NF eliminates duplicate attributes from the same relation, creates a
separate relation for each group of related data and identifies each tuple with
a unique attribute or set of attributes (the primary key).
The 2NF will remove subsets of data that apply to multiple rows of a table,
place them in separate tables, and create relationships between these new
relations and the original relation by copying the determinants of the partial
dependency attributes to the new relations.
The 3NF removes attributes that are not directly dependent upon the primary
key, that is, attributes involved in a functional dependency between two
non-key attributes (a transitive dependency).
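The 3NF check can be made mechanical: if a determinant contains no key attribute yet determines other non-key attributes, the relation is not in 3NF. A small sketch, using a hypothetical Order relation before the customer details are separated out (the attribute names are illustrative):

```python
def violates_3nf(fds, key_attrs):
    """Return the FDs whose determinant is wholly non-key: transitive dependencies.

    fds is a list of (determinant, dependents) pairs, each a set of attributes.
    """
    return [
        (det, dep) for det, dep in fds
        if not det & key_attrs          # determinant contains no key attribute
    ]

# Hypothetical Order relation that still carries customer details:
fds = [
    ({"OrderNo"}, {"CustNo", "OrderDate"}),
    ({"CustNo"}, {"CustName", "TelNo"}),   # non-key -> non-key: transitive
]
print(violates_3nf(fds, key_attrs={"OrderNo"}))
```

Removing the flagged dependency into its own relation (a Customer relation keyed on CustNo) is exactly the 3NF step described above.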
Refer to the following figure and convert this user view to a set of 3NF relations.
XYZ COLLEGE
CLASS LIST
SEMESTER SEPT 2013
INTRODUCTION
In this topic, we will describe three main phases of database design methodology
for relational databases. These phases are namely conceptual, logical and
physical database design. The conceptual design phase focuses on building a
conceptual data model which is independent of software and hardware
implementation details.
The logical design phase maps the conceptual model onto a logical model based
on a specific data model, but independently of any particular DBMS and other
physical considerations. The physical design phase is tailored to a specific
Database Management System (DBMS) and focuses on the hardware and storage
requirements. The detailed activities associated with each of these phases will
be discussed.
Normally, a design methodology is broken down into phases or stages and for
each phase, the detailed steps are outlined, and appropriate tools and techniques
are specified. Design methodology is able to support and facilitate designers in
planning, modelling and managing a database development project in a
structured and systematic manner. Validation is one of the key aspects in the
design methodology as it helps to ensure that the produced models are
accurately representing the user requirements specification.
(a) Conceptual
The conceptual database design aims to produce a conceptual
representation of the required database. The core activity in this phase
involves the use of Entity-Relationship (ER) modelling in which the entities,
relationship and attributes are defined.
(b) Logical
In the logical design phase, the aim is to map the conceptual model which is
represented by the ER model to the logical structure of the database.
Among the activities involved in this phase is the use of the normalisation
process to derive and validate relations.
(c) Physical
In the physical design phase, the emphasis is on translating the logical
structure into a physical implementation of the database using the chosen
database management system.
Besides the three main phases above, this methodology has also outlined the
eight core steps. Step 1 focuses on the conceptual database design phase; Step 2
focuses on the logical database design phase and Step 3 to Step 8 focus on the
physical database design phase. This topic will only cover Step 1 to Step 6 (refer
to Table 7.1).
Table 7.1: Six Steps of Design Methodology
Step Description
1 Build a Conceptual Data Model
(a) Identify the entity types
(b) Identify the relationship types
(c) Identify and associate the attributes with entity or relationship types
(d) Determine the attribute domains
(e) Determine the candidate, primary and alternate key attributes
(f) Consider the use of enhanced modelling concepts (optional step)
(g) Check the model for redundancy
(h) Validate the conceptual model against user transactions
(i) Review the conceptual data model with user
2 Build and Validate a Logical Data Model
(a) Derive the relations for logical data model
(b) Validate the relations using normalisation
(c) Validate the relations against user transactions
(d) Check the integrity constraints
(e) Review the logical data model with user
(f) Merge the logical data models into global model (optional step)
3 Translate a Logical Database Design for a Target DBMS
(a) Design the base tables
(b) Design the representation of derived data
(c) Design the remaining business rules
4 Design the File Organisations and Indexes
(a) Analyse the transactions
(b) Choose the file organisation
(c) Choose the indexes
(d) Estimate the disk space requirements
5 Design the User Views
6 Design a Security Mechanism
These factors serve as a guideline for designers and they need to be incorporated
into the database design methodology.
SELF-CHECK 7.1
1. Briefly explain what design methodology is.
2. Identify the phases of design methodology.
3. Identify three critical success factors in database design.
For our product ordering case study, we have identified the following
relationships:
(i) Between Customer and Order: Customer makes Order;
(ii) Between Product and Order: Order has Product;
(iii) Between Supplier and Product: Supplier supplies Product;
(iv) Between Order and Invoice: Order has Invoice;
(v) Between Employee and Order: Employee takes Order; and
(vi) Between Order and Delivery: Order is sent for Delivery.
(c) Step 1c: Identify and Associate the Attributes with Entity or Relationship Types
After identifying the entity and relationship types, the next step is to
identify their attributes. It is important to determine the type of these
attributes.
Again, we need to document the details of each identified attribute. For our
case study, the list of attributes for the defined entities is as follows:
(e) Step 1e: Determine the Candidate, Primary and Alternate Key Attributes
As we have mentioned in Topic 2, a relation must have a key that can
uniquely identify each of the tuples. In order to identify the primary key,
we need to first determine the candidate key for each of the entity types.
The primary keys for each of the entity types are underlined in the list below.
(f) Step 1f: Consider the Use of Enhanced Modelling Concepts (Optional Step)
This step involves the use of enhanced modelling concepts such as
specialisation or generalisation, aggregation and composition. These
concepts are beyond the scope of our discussion.
(h) Step 1h: Validate the Conceptual Model Against User Transactions
We have to ensure that the conceptual model supports the transactions
required by the user view.
(i) Step 1i: Review the Conceptual Data Model with User
User involvement during the review of the model is important to ensure
that the model is a "true" representation of the user's view of the enterprise.
SELF-CHECK 7.2
Normalisation is the technique used for validation in the logical design phase.
The following are the activities involved in this phase.
(a) Step 2a: Derive the Relations for the Logical Data Model
Firstly, we create a set of relations for the logical data model based on the
ER model produced in the prior design phase to represent the entities,
relationships and key attributes.
Examining our ER diagram from the previous phase, as shown in Figure 7.1, we
find that two of the relationships are many-to-many: between Order and
Product, and between Order and Delivery. Each many-to-many relationship
needs to be decomposed into two one-to-many relationships. As a result of these
changes, our new ER diagram is as shown in Figure 7.2.
(e) Step 2e: Review the Logical Data Model with Users
In this step, we need to let the users review the logical data model to ensure
that the model is the true representation of the data requirements of their
enterprise. This is to ensure that the users are satisfied and we can continue
to the next step.
(f) Step 2f: Merge the Logical Data Models into the Global Model (Optional
Step)
This step is important for multi-user views. Since each user view will have
its own conceptual model, also referred to as a local conceptual model, each
of these models will be mapped to a separate local logical data model.
During this step, all the local logical models will be merged into one global
logical model. Since we consider our case study as a single user view, this
step is skipped.
SELF-CHECK 7.3
The output from the logical design phase consists of all the documents that
provide a description of the process of the logical data model such as ER
diagrams, relational schema and data dictionary. These are important sources for
the physical design process. Unlike the logical phase, which is independent of the
DBMS and implementation considerations, the physical phase is tailored to a
specific DBMS and is dependent on the implementation details.
Connolly and Begg (2009) have outlined six steps of the physical phase, from
Step 3 to Step 8. For our discussion of this phase, we will only present Step 3 to
Step 6 as follows:
In the logical design phase, the aim is to map the conceptual model, which is
represented by the ER model, to the logical structure of the database. Among
the activities involved in this phase is the use of the normalisation process to
derive and validate relations.
1. Discuss the important role played by users in the process of database design.
2. How would you check a data model for redundancy? Give an example to
illustrate your answer.
INTRODUCTION
In this topic, we will discuss database security. What do you think about security
in general? Do you feel safe at home or on the road? What about database
security? Do you think that database security is important? What is the value of
the data? What if your personal data or your financial data is stolen? Do you
think that harm could come to you? I am sure that some of you have watched spy
movies where computer hackers hack a computer system to access confidential
data and use this data for various purposes. These are some of the questions that
you might need to think and consider.
Well, now let us focus on our topic. Database security involves protecting a
database from unauthorised access, malicious destruction and even any
accidental loss or misuse. Due to the high value of data in corporate databases,
there is strong motivation for unauthorised users to gain access to the data, for
instance, competitors or dissatisfied employees.
Corporate databases may hold valuable information such as business
transactions and even customers' credit card numbers. They may not
only steal the valuable information, but in fact, if they have access to the
database, they may even destroy it and great havoc may occur (Mannino, 2011).
Furthermore, the database environment has grown more complex where access
to data has become more open through the Internet and mobile computing. Thus,
you can imagine the importance of having database security.
Security is a broad subject and involves many issues like legal and ethical issues.
Of course, there are a few approaches that can be applied in order to maintain
database security. However, before talking about the ways to protect our
database, let us first discuss various threats to a database in more detail in the
next subtopic.
ACTIVITY 8.1
Visit the following website that discusses the balance between the roles
and rights related to database security:
http://databases.about.com/od/security/a/databaseroles.htm
Write a one-page report discussing your opinion about the article.
Whether the threat is intentional or unintentional, the impact may be the same.
The threats may be caused by a situation or event that involves a person, action
or circumstance that is likely to produce harm to someone or to an organisation.
Threats to data security may be a direct and intentional threat to the database.
For instance, those who gain unauthorised access to a database like computer
hackers, may steal or change the data in the database. They would have to have
special knowledge in order to do so. Table 8.1 illustrates five types of threats and
12 examples of threats (Connolly and Begg, 2009).
SELF-CHECK 8.1
1. Define a threat.
2. Differentiate between tangible and intangible harm. Give two
examples of each.
8.2.1 Authorisation
Usually, a user or subject gains access to a system through an individual user
account, where each user is given a unique identifier, used by the operating
system to determine that they have the authorisation to do so. The process of
creating the user accounts is usually the responsibility of a system administrator.
Associated with each unique user account is a password, chosen by the user and
known to the operating system.
A separate but similar process would be applied to give the authorised user
access to a Database Management System (DBMS). This authorisation is the
responsibility of a database administrator. In this case, an authorised user to a
system may not necessarily have access to a DBMS or any associated application
programs (Connolly and Begg, 2009).
The DBMS keeps track of how these privileges are granted to users and possibly
revoked, and ensures that at all times, only users with necessary privileges can
access an object.
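The bookkeeping described above can be pictured with a minimal sketch (this is a toy model, not any real DBMS's implementation): the DBMS records (user, privilege, object) grants and consults them on every access.

```python
class PrivilegeRegistry:
    """Toy model of GRANT/REVOKE bookkeeping inside a DBMS."""

    def __init__(self):
        self._grants = set()            # {(user, privilege, obj)}

    def grant(self, user, privilege, obj):
        self._grants.add((user, privilege, obj))

    def revoke(self, user, privilege, obj):
        self._grants.discard((user, privilege, obj))

    def can(self, user, privilege, obj):
        return (user, privilege, obj) in self._grants

reg = PrivilegeRegistry()
reg.grant("alice", "SELECT", "Customer")
print(reg.can("alice", "SELECT", "Customer"))  # True
reg.revoke("alice", "SELECT", "Customer")
print(reg.can("alice", "SELECT", "Customer"))  # False
```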
8.2.3 Views
In other words, a view is created by querying one or more of the base tables,
producing a dynamic result table for the user at the time of the request (Hoffer et
al., 2008).
The user may be allowed to access the view but not the base tables which the
view is based on. The view mechanism hides some parts of the database from
certain users and the user is not aware of the existence of any attributes or rows
that are missing from the view. Thus, a user is allowed to see what they need to
see only. Several users may share the same view but only limited users may be
given the authority to update the data.
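The hiding effect of a view can be demonstrated with SQLite, run through Python's sqlite3 module (the table and column names here are invented for the example). The view exposes only the non-sensitive columns of the base table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        CustNo TEXT PRIMARY KEY,
        Name   TEXT,
        CardNo TEXT              -- sensitive: must stay hidden
    );
    INSERT INTO Customer VALUES ('C1', 'Ali', '4111-0000-0000-0000');
    -- The view omits CardNo; users of the view never see that column.
    CREATE VIEW PublicCustomer AS
        SELECT CustNo, Name FROM Customer;
""")
cols = [d[0] for d in conn.execute("SELECT * FROM PublicCustomer").description]
print(cols)  # ['CustNo', 'Name'] - CardNo is not visible through the view
```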
Backup is very important for a DBMS to recover the database following a failure
or damage.
A DBMS should provide four basic facilities for backup and recovery of a
database as follows:
8.2.5 Encryption
Data encryption can be used to protect highly sensitive data like customer credit
card numbers or user passwords. Some DBMS products include encryption
routines that would automatically encode the sensitive data when they are stored
or transmitted over communication channels. For instance, encryption is usually
used in electronic funds transfer systems. Therefore, if the original data or plain
text is RM5000, it may be encrypted using a special encryption algorithm that
would be changed to XTezzz. Any system that provides encryption facility must
also provide the decryption facility to decode the data that has been encrypted.
There are two common forms of encryption, as follows (Hoffer et al., 2008):
(a) One-Key
With the one-key approach, also known as symmetric encryption, of which
the Data Encryption Standard (DES) is a well-known example, both the
sender and the receiver need to know the key that is used to scramble the
transmitted or stored data.
(b) Two-Key
A two-key approach, also known as asymmetric encryption, employs a
private and a public key. This approach is popular in e-commerce
applications for transmission security and database storage of payment
data such as credit card numbers.
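To make the one-key idea concrete, here is a deliberately toy XOR cipher in Python. It is NOT secure and is not DES; it only illustrates that, in symmetric encryption, the same key both scrambles and recovers the data:

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with the repeating key.

    Applying it twice with the same key restores the original data.
    For illustration only - never use this for real security.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"secret"
plain = b"RM5000"
cipher = xor_cipher(plain, key)
print(cipher != plain)                    # True: the data is scrambled
print(xor_cipher(cipher, key) == plain)   # True: the same key decrypts
```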
The five hardware components that should be fault-tolerant are (Connolly and
Begg, 2009):
(a) Disk drives;
(b) Disk controllers;
(c) Central Processing Unit (CPU);
(d) Power supplies; and
(e) Cooling fans.
One way to provide fault-tolerance is the use of RAID (Redundant Array of
Independent Disks). These disks are organised to improve performance.
Performance can be increased through data striping, where the data is
segmented into equal-size partitions distributed across multiple disks. The data
appears to be stored on a single large disk, but it is in fact distributed across
several smaller disks that are processed in parallel (Connolly and Begg, 2009).
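Data striping can be sketched in a few lines: a sequence of fixed-size chunks is dealt round-robin across several disks (here, byte arrays stand in for disks; the chunk size is an arbitrary choice for the example):

```python
def stripe(data: bytes, n_disks: int, chunk: int = 4):
    """Split data into equal-size chunks and deal them round-robin to disks."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks].extend(data[i:i + chunk])
    return disks

def unstripe(disks, chunk: int = 4) -> bytes:
    """Reassemble the original byte sequence from the striped disks."""
    out = bytearray()
    parts = [[d[i:i + chunk] for i in range(0, len(d), chunk)] for d in disks]
    for round_no in range(max(len(p) for p in parts)):
        for p in parts:
            if round_no < len(p):
                out.extend(p[round_no])
    return bytes(out)

data = b"ABCDEFGHIJKLMNOP"   # 16 bytes -> four 4-byte chunks
disks = stripe(data, n_disks=2)
print(disks)                  # chunks alternate between the two disks
print(unstripe(disks) == data)  # True: the logical view is unchanged
```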
SELF-CHECK 8.2
Thus, only users who key in the correct password can open the database.
However, once a database is open, all the objects in the database can be accessed.
Therefore, it is advisable to change the password regularly.
SELF-CHECK 8.3
How do you set the password to open an existing database in Microsoft
Office Access?
Another issue that needs to be considered in the web environment is that the
information being transmitted may have executable content. An executable
content can perform the following malicious actions (Connolly and Begg, 2009):
(a) Destroy data or programs;
(b) Reformat a disk completely;
(c) Shut down the system.
Nowadays, malware (malicious software) such as computer viruses, as well as
spam, is widespread.
Spam is unwanted e-mail that we receive without knowing who the sender is
or without wanting to receive the e-mail.
Spam can fill up e-mail inboxes, and we waste time deleting it. Thus, the next
subtopic will discuss some of the methods to secure the database in a web
environment (refer to Figure 8.3).
If a proxy server cannot fulfil a request itself, it passes the request on to the
web server. Its main purpose is to improve performance. For instance, assume
that User 1 and User 2 access the web through a proxy server. When User 1
requests a certain web page and later User 2 requests the same page, the proxy
server simply fetches the page from its cache rather than from the web server.
Thus, the retrieval process is faster.
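The caching behaviour just described can be sketched as a small wrapper, where `origin_fetch` is a stand-in for a real request to the web server:

```python
class CachingProxy:
    """Minimal model of a caching proxy server."""

    def __init__(self, origin_fetch):
        self._fetch = origin_fetch      # function that contacts the web server
        self._cache = {}
        self.origin_hits = 0            # how often the origin was contacted

    def get(self, url):
        if url not in self._cache:      # only a cache miss reaches the server
            self._cache[url] = self._fetch(url)
            self.origin_hits += 1
        return self._cache[url]

proxy = CachingProxy(lambda url: f"<page for {url}>")
proxy.get("http://example.com/a")      # User 1: fetched from the origin
proxy.get("http://example.com/a")      # User 2: served from the cache
print(proxy.origin_hits)                # 1
```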
Besides that, proxy servers can also be used to filter requests. For instance, an
organisation might use a proxy server to prevent its employees or clients from
accessing certain websites. In this case, the known bad websites or insecure
websites can be identified and access to these sites can be denied (Connolly and
Begg, 2009).
8.4.2 Firewalls
A digital signature can be used to verify that data comes from authorised
senders.
It also provides the receiver with a way to decode a reply. A digital certificate
can be applied for from a Certificate Authority. The Certificate Authority issues
an encrypted digital certificate that contains the applicant's public key and
various other identification information. The receiver of an encrypted message
uses the Certificate Authority's public key to decode the digital certificate
attached to the message (Connolly and Begg, 2009).
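True digital signatures rely on public-key cryptography, which is outside Python's standard library. The sketch below therefore uses an HMAC, a shared-key stand-in, to show only the verification idea: the receiver recomputes a tag over the message and compares it with the tag that arrived, detecting any tampering:

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag over the message (shared-key stand-in)."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(key, message), tag)

key = b"shared-secret"
msg = b"transfer RM5000"
tag = sign(key, msg)
print(verify(key, msg, tag))                  # True: message unaltered
print(verify(key, b"transfer RM9000", tag))   # False: tampering detected
```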
ACTIVITY 8.2
The security measures associated with DBMS on the web include proxy
servers, firewalls, digital signatures and digital certificates.
Authorisation
Authentication
Cold backup
Decryption
Digital certificates
Digital signatures
Encryption
Firewalls
Hot backup
Recovery
Redundant Array of Independent Disks (RAID)
Threat