8 Database Hodders
Key terms
Database – a structured collection of items of data that can be accessed by different
applications programs.
Relational database – a database where the data items are linked by internal pointers.
Table – a group of similar data, in a database, with rows for each instance of an entity and
columns for each attribute.
Record (database) – a row in a table in a database.
Field – a column in a table in a database.
Tuple – one instance of an entity, which is represented by a row in a table.
Entity – anything that can have data stored about it, for example, a person, place, event, thing.
Attribute (database) – an individual data item stored for an entity, for example, for a person,
attributes could include name, address, date of birth.
Candidate key – an attribute or smallest set of attributes in a table where no tuple has the same
value.
Primary key – a unique identifier for a table. It is a special case of a candidate key.
Secondary key – a candidate key that is an alternative to the primary key.
Foreign key – a set of attributes in one table that refer to the primary key in another table.
Relationship – situation in which one table in a database has a foreign key that refers to a
primary key in another table in the database.
Referential integrity – property of a database that does not contain any values of a foreign key
that are not matched to the corresponding primary key.
Index (database) – a data structure built from one or more columns in a database table to
speed up searching for data.
Entity-relationship (E-R) model or E-R diagram – a graphical representation of a database
and the relationships between the entities.
Normalisation (database) – the process of organising data to be stored in a database into two
or more tables and relationships between the tables, so that data redundancy is minimised.
First normal form (1NF) – the status of a relational database in which entities do not contain
repeated groups of attributes.
Second normal form (2NF) – the status of a relational database in which entities are in 1NF
and every non-key attribute depends upon the whole primary key.
Third normal form (3NF) – the status of a relational database in which entities are in 2NF
and all non-key attributes are independent.
Composite key – a set of attributes that form a primary key to provide a unique identifier for a
table.
8.1.1 The limitations of a file-based approach
A file is a collection of items of data. It can be structured as a collection of records, where each
record is made up of fields containing data about the same ‘thing’. Individual elements of data
can be called data items.
When a program is used for data processing, the organisation of any records used depends on
how the program is written. Records can be fixed or variable in length and each record may also
contain information about its structure, for example, the number of fields or the length of the
record. If these records are to be processed by another program, that program must be written to
use the exact same record structure. If the structure is changed by one program, the other
program must be rewritten as well. This can cause problems if updating programs is not carefully
managed.
For example, a business keeps separate payroll files and sales files. Each file is used by a
different application.
Several problems have occurred using this file-based approach. The name of a member of staff
and their staff number are stored twice. The way the staff name is stored is different for each
program. If the staff number was changed by the payroll program and not by the sales program,
these fields may contain different values for the same member of staff. The fields in the two files
are also in a different order: the staff number is the fifth field in the payroll record and the second
field in the sales file.
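The payroll and sales problem can be sketched in a few lines of Python. This is only an illustration: the field names and sample values are hypothetical, chosen so that the staff number is the fifth field in the payroll record and the second field in the sales record, as described above.

```python
# Hypothetical record layouts: each program hard-codes its own field order.
PAYROLL_FIELDS = ["name", "address", "pay_grade", "bank_account", "staff_number"]
SALES_FIELDS = ["region", "staff_number", "staff_name", "sales_total"]

# The same member of staff, stored twice with the name in two different styles.
payroll_record = "Ali Khan,12 High St,C,00112233,S104"
sales_record = "North,S104,A. Khan,15200"


def parse(record, fields):
    """Split a comma-separated record and label each value by its position."""
    return dict(zip(fields, record.split(",")))


payroll = parse(payroll_record, PAYROLL_FIELDS)
sales = parse(sales_record, SALES_FIELDS)

# The payroll program changes the staff number; the sales file is not updated.
payroll["staff_number"] = "S200"
print(payroll["staff_number"], sales["staff_number"])  # the two copies now disagree

# A program that reads a sales record using the payroll layout gets the wrong fields.
misread = parse(sales_record, PAYROLL_FIELDS)
print(misread["name"])  # the region, not a name
```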
A file-based approach is limited because
• storage space is wasted when data items are duplicated by the separate applications and some
data is redundant
• data can be altered by one application and not by another; it then becomes inconsistent
• enquiries available can depend on the structure of the data and the software used so the data is
not independent.
ACTIVITY 8A
Match the problems with the payroll and sales system to the limitations of a file-based
approach set out above.
8.1.2 The advantages of a relational database over a file-based approach
What is a database? There are many different definitions of a database, such as:
… A (large) collection of data items and links between them, structured in a way that allows it to be accessed by a number of
different applications programs. The term is also used loosely to describe any collection of data.
BCS Glossary of Computing, 14th Edition
… An electronic filing cabinet which allows the user to perform various tasks including: adding new empty files, inserting
data into existing files, retrieving data from existing files, updating data in existing files and cross-referencing data in files.
An Introduction to Database Systems (sixth edition) by CJ Date
When the payroll and sales data is stored in a database, the problems that occurred using the
file-based approach are solved. The name of a
member of staff and their staff number are only stored once. So, any changes made to the data by
the payroll application will be seen by the sales processing application and vice versa. The fields
are the same and in the same order.
A database approach is beneficial because
• storage space is not wasted as data items are only stored once, meaning little or no redundant
data
• data altered in one application is available in another application, so the data is consistent
• enquiries available are not dependent on the structure of the data and the software used, so the
data is independent.
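A minimal sketch of the database approach, assuming Python's built-in sqlite3 module stands in for the DBMS. The table names, column names and values are hypothetical. The staff name is stored once, so a change made on behalf of payroll is seen by a sales query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("CREATE TABLE Staff (StaffNumber TEXT PRIMARY KEY, StaffName TEXT)")
conn.execute(
    "CREATE TABLE Sales (SaleID INTEGER PRIMARY KEY, StaffNumber TEXT, Amount REAL)"
)
conn.execute("INSERT INTO Staff VALUES ('S104', 'Ali Khan')")
conn.execute("INSERT INTO Sales (StaffNumber, Amount) VALUES ('S104', 250.0)")

# The payroll application corrects the name in the one place it is stored.
conn.execute("UPDATE Staff SET StaffName = 'Ali R. Khan' WHERE StaffNumber = 'S104'")

# The sales application looks the name up through the staff number.
sales_view = conn.execute(
    "SELECT Staff.StaffName, Sales.Amount FROM Sales "
    "INNER JOIN Staff ON Sales.StaffNumber = Staff.StaffNumber"
).fetchall()
print(sales_view)
```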
8.1.3 Relational database model terminology
In order to rigorously define the structure of a relational database we need to be able to
understand and use the terminology associated with a relational database.
A relational database data structure can look similar to a file-based structure as it also consists of
records and fields. A table is a group of similar data, in a database, with rows for each instance
of an entity and columns for each attribute. A record is a row in a table in a database. A field is a
column in a table in a database.
For example, a database of students in a school could contain the table Student with a record for
each student that contains the fields First Name, Second Name, Date of Birth and Class ID.
Now data is independent of the program processing it. The terms record and field are also used in
file processing, so there is more rigorous terminology used specifically for relational databases.
Files of data are replaced by tables, with each row of a table representing a record (a tuple,
sometimes called a logical record or an occurrence of an entity). Each column of the table is an
attribute that can also be referred to as a field.
An entity is anything that can have data stored about it, such as a person, place, event or object.
An attribute is an individual data item stored for an entity; to use the same example as before, for
a student attributes could include first name, second name, date of birth and class. As stated
before, a table is a group of similar data, in a database, with rows for each instance of an entity
and columns for each attribute. A tuple is one instance of an entity, which is represented by a
row in a table.
Data is shared between applications using the database. To ensure the consistency of data,
updating is controlled or automatic, so that any copies of a data item are changed to the new
value. Also, in order to reduce the number of copies of a data item to a minimum, a relational
database uses pointers between tables. These pointers are keys that provide relationships between
tables.
There are different types of keys.
• A candidate key is an attribute or smallest set of attributes in a table where no tuple has the
same value.
• A primary key is a unique identifier for a table; it is a special case of a candidate key.
• A secondary key is a candidate key that is an alternative to the primary key.
• A foreign key is a set of attributes in one table that refer to the primary key in another table.
For example, a database of chemical elements contains a table Elements with attributes Symbol,
Name and Atomic Weight. As all these attributes are unique to each element, all are candidate
keys. One of these could be chosen as the primary key, for example Symbol. Then the other two
attributes, Name and Atomic Weight, would be secondary keys.
Most tables have only one candidate key, which is used as the primary key. For example, the
student table could have an extra attribute Student ID, which is unique to each student.
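The Elements example can be sketched as follows, assuming Python's sqlite3 module as the DBMS. Symbol is declared as the primary key and the two secondary keys are declared UNIQUE. The helper try_insert is hypothetical and only reports whether a row was accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Elements (
        Symbol       TEXT PRIMARY KEY,       -- chosen primary key
        Name         TEXT NOT NULL UNIQUE,   -- secondary key
        AtomicWeight REAL NOT NULL UNIQUE    -- secondary key
    )
""")
conn.execute("INSERT INTO Elements VALUES ('H', 'Hydrogen', 1.008)")
conn.execute("INSERT INTO Elements VALUES ('He', 'Helium', 4.0026)")


def try_insert(row):
    """Attempt an insert and report whether the key constraints allowed it."""
    try:
        conn.execute("INSERT INTO Elements VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_primary = try_insert(("H", "Hydrogen-2", 2.014))   # Symbol already used
duplicate_secondary = try_insert(("D", "Hydrogen", 2.014))   # Name already used
new_element = try_insert(("Li", "Lithium", 6.94))            # all keys are new
print(duplicate_primary, duplicate_secondary, new_element)
```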
Relationships
A relationship is formed when one table in a database has a foreign key that refers to a primary
key in another table in the database. In order to ensure referential integrity the database must
not contain any values of a foreign key that are not matched to the corresponding primary key.
Most databases include more than one table. For example, a school database could contain the
table Student and another table Class that contains the Class ID, the Teacher Name and
Location of classroom. Only values for Class ID that are stored in the Class table can be used as
the foreign key in the Student table.
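A minimal sketch of this rule, assuming Python's sqlite3 module as the DBMS. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which is a detail of that product rather than of relational databases in general. The student names, dates and room are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE Class (ClassID TEXT PRIMARY KEY, TeacherName TEXT, Location TEXT)"
)
conn.execute("""
    CREATE TABLE Student (
        StudentID   INTEGER PRIMARY KEY,
        FirstName   TEXT,
        SecondName  TEXT,
        DateOfBirth TEXT,
        ClassID     TEXT REFERENCES Class(ClassID)   -- foreign key
    )
""")
conn.execute("INSERT INTO Class VALUES ('7A', 'Mr Khan', 'Room 12')")
conn.execute("INSERT INTO Student VALUES (1, 'Noor', 'Baig', '2012-03-14', '7A')")

# Class '9Z' is not in the Class table, so this row would break referential integrity.
try:
    conn.execute("INSERT INTO Student VALUES (2, 'Ahmad', 'Sayed', '2012-07-02', '9Z')")
    outcome = "accepted"
except sqlite3.IntegrityError as err:
    outcome = f"rejected: {err}"
print(outcome)
```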
Table 8.5 Part of a table for student entity
Relationships may be mandatory or optional. For example, in a workroom with desks, each
employee has one desk, but there could be spare desks. The relationship between desk and
employee is zero or one, so this relationship is optional. The relationship between mother and
child is mandatory because every mother must have at least one child, so the relationship is one
or many. The type of relationship and whether it is mandatory or optional gives the cardinality of
the relationship. The cardinality of relationships is shown in Figure 8.4.
ACTIVITY 8B
The School database will also include the following details about each teacher:
• teaching licence number
• date of birth
• address.
A teacher can have more than one class. A table Teacher is to be added.
List the attributes for this new table. Show the change that should be made to the attributes in
the Class table.
Draw the new E-R diagram for the three tables in the database.
EXTENSION ACTIVITY 8A
In small groups, identify suitable entity relationships for each example of cardinality shown
above. Explain your findings to another group or the whole class.
8.1.5 The normalisation process
Normalisation is used to construct a relational database that has integrity and in which data
redundancy is reduced. Tables that are not normalised will be larger. As more data is stored, it
will be harder to update the database when changes are made and more difficult to extract the
required data to answer queries.
For example, if the School database is held in a single table it could look like this:
Table 8.7
This could cause problems when alterations are made to the records. Every time a new student is
added, the teacher’s name, address, licence number, date of birth, and the location of the
classroom need to be added as well. If Mr Khan leaves the school and is replaced by another
teacher, then every record containing his name and other details needs to be changed. If all the
students from Class 7B leave, then all the details about Class 7B will be lost.
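These problems can be reproduced with a cut-down, hypothetical version of the single table, again assuming Python's sqlite3 module. Only four columns are kept and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE School (StudentName TEXT, ClassID TEXT, TeacherName TEXT, Location TEXT)"
)
conn.executemany("INSERT INTO School VALUES (?, ?, ?, ?)", [
    ("Noor Baig", "7A", "Mr Khan", "Room 12"),
    ("Ahmad Sayed", "7A", "Mr Khan", "Room 12"),
    ("Tahir Hassan", "7B", "Ms Lee", "Room 14"),
])

# Replacing Mr Khan means changing every row that repeats his name.
rows_changed = conn.execute(
    "UPDATE School SET TeacherName = 'Mrs Patel' WHERE TeacherName = 'Mr Khan'"
).rowcount

# When the last student in 7B leaves, the details of class 7B go too.
conn.execute("DELETE FROM School WHERE ClassID = '7B'")
remaining_7b = conn.execute(
    "SELECT COUNT(*) FROM School WHERE ClassID = '7B'"
).fetchone()[0]
print(rows_changed, remaining_7b)
```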
The rules for normalisation are set out as follows.
1 First normal form (1NF) – entities do not contain repeated groups of attributes.
2 Second normal form (2NF) – entities are in 1NF and every non-key attribute depends upon the
whole primary key. There are no partial dependencies.
3 Third normal form (3NF) – entities are in 2NF and all non-key attributes are independent.
The table contains no non-key dependencies.
When the database is in 3NF, all attributes in a table depend upon the key, the whole key and
nothing but the key.
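As a rough illustration of what normalisation achieves, the sketch below moves the class details into their own table, so that each fact depends only on the key of the table that holds it. It assumes Python's sqlite3 module and uses hypothetical columns and sample rows. It shows a single decomposition step, not the complete school design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Class (ClassID TEXT PRIMARY KEY, TeacherName TEXT, Location TEXT);
    CREATE TABLE Student (
        StudentID   INTEGER PRIMARY KEY,
        StudentName TEXT,
        ClassID     TEXT REFERENCES Class(ClassID)
    );
    INSERT INTO Class VALUES ('7A', 'Mr Khan', 'Room 12'), ('7B', 'Ms Lee', 'Room 14');
    INSERT INTO Student VALUES (1, 'Noor Baig', '7A'), (2, 'Ahmad Sayed', '7A'),
                               (3, 'Tahir Hassan', '7B');
""")

# A change of teacher is now a change to one row, however many students there are.
rows_changed = conn.execute(
    "UPDATE Class SET TeacherName = 'Mrs Patel' WHERE TeacherName = 'Mr Khan'"
).rowcount

# Removing every student in 7B no longer removes the details of class 7B.
conn.execute("DELETE FROM Student WHERE ClassID = '7B'")
class_7b = conn.execute(
    "SELECT TeacherName, Location FROM Class WHERE ClassID = '7B'"
).fetchone()
print(rows_changed, class_7b)
```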
The School database also includes subject choices for each student. For this database to be
normalised, the process is:
Table 8.8
ACTIVITY 8C
Construct an E-R diagram to represent the database structure of the fully normalised school
database shown above.
EXTENSION ACTIVITY 8B
Discuss any other possible problems that could occur with this database.
Hint: look at the subject table and think about subjects that could have more than one teacher
or different levels. Identify an improved database structure that could solve the problem.
The School database example showed at each stage why the database was not normalised. Here
is another example for you to try.
A database has been set up as a single table to store employees of a business and their contacts.
Part of the database is shown below.
Table 8.12 Un-normalised employee database
This table is not in 1NF because there are repeating attributes and the table is not in 3NF because
there are non-key dependencies. The employee database can be represented as:
EMPLOYEE(EmployeeNumber, EmployeeName, Position, ContactNumber, ContactName,
ContactEmailAddress).
where EmployeeNumber is the primary key, and ContactNumber, ContactName and
ContactEmailAddress may be repeated as often as required.
ACTIVITY 8D
Normalise the Employee database and show the new tables. Draw the E-R diagram for the
normalised database.
ACTIVITY 8E
1 a) i) Describe the limitations of a file-based approach to storage and retrieval of data.
ii) Give two benefits of using a database management system.
b) A new relational database is to be developed. The developer needs to produce a
normalised database design.
i) Explain what is meant by normalisation.
ii) Describe the process of normalisation.
2 A warehouse stores parts for cars for several manufacturers. A database stores the following
data for each part:
Part number, part description, date last ordered, minimum order level, manufacturer name,
manufacturer address, manufacturer contact details, position in warehouse, number in stock
a) Design a fully normalised database for the parts.
b) Draw the E-R diagram.
8.2 Database management systems (DBMSs)
WHAT YOU SHOULD ALREADY KNOW
Try these two questions before you read the second part of this chapter.
1 a) Name a database management system (DBMS) you have used.
b) Describe three tasks that you have used the DBMS for.
2 Most DBMSs include back-up procedures and access rights to keep the data secure.
a) Describe what is meant by back-up.
b) Describe what is meant by access rights.
c) How do these features help to keep data secure?
Key terms
Database management system (DBMS) – systems software for the definition, creation and
manipulation of a database.
Data management – the organisation and maintenance of data in a database to provide the
information required.
Data dictionary – a set of data that contains metadata (data about other data) for a database.
Data modelling – the analysis and definition of the data structures required in a database in
order to produce a data model.
Logical schema – a data model for a specific database that is independent of the DBMS used
to build that database.
Access rights (database) – the permissions given to database users to access, modify or delete
data.
Developer interface – feature of a DBMS that provides developers with the commands
required for definition, creation and manipulation of a database.
Structured query language (SQL) – the standard query language used with relational
databases for data definition and data modification.
Query processor – feature of a DBMS that processes and executes queries written in
structured query language (SQL).
8.2.1 How a DBMS addresses the limitations of a file-based approach
Data redundancy issue
This is solved by storing data in separate linked tables, which reduces the duplication of data as
most items of data are only stored once. Items of data used to link tables by the use of foreign
keys are stored more than once. The DBMS will flag a possible error when an attempt is
made to delete this type of item, so that it is not removed by accident.
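How a DBMS flags such a deletion can be sketched with Python's sqlite3 module, used here as an assumed stand-in for a full DBMS. The tables and values are hypothetical. Deleting a class that a student still refers to raises an error and the row stays in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE Class (ClassID TEXT PRIMARY KEY, Location TEXT)")
conn.execute(
    "CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, SecondName TEXT, "
    "ClassID TEXT REFERENCES Class(ClassID))"
)
conn.execute("INSERT INTO Class VALUES ('7A', 'Room 12')")
conn.execute("INSERT INTO Student VALUES (1, 'Baig', '7A')")

# '7A' is still used as a foreign key value in Student, so the delete is refused.
try:
    conn.execute("DELETE FROM Class WHERE ClassID = '7A'")
    delete_result = "deleted"
except sqlite3.IntegrityError as err:
    delete_result = f"flagged: {err}"

class_rows = conn.execute("SELECT COUNT(*) FROM Class").fetchone()[0]
print(delete_result, class_rows)
```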
Query processor
The query processor takes a query written in SQL and processes it. The query processor includes
a DDL interpreter, a DML compiler and a query evaluation engine. Any DDL statements are
interpreted and recorded in the database’s data dictionary. DML statements are compiled into
low level instructions that are executed by the query evaluation engine. The DML compiler will
also optimise the query.
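SQLite, reached here through Python's sqlite3 module, gives a small view of both ideas, although its internals are not organised exactly as described above. It records each DDL statement in a built-in catalogue table, sqlite_master, which plays a role similar to a data dictionary. EXPLAIN QUERY PLAN reports how it intends to evaluate a query. The table and index names are hypothetical, and the wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, SecondName TEXT, ClassID TEXT)"
)
conn.execute("CREATE INDEX idx_student_class ON Student (ClassID)")

# The DDL above has been recorded as metadata in SQLite's schema catalogue.
catalogue = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(catalogue)

# Ask how a DML query would be evaluated, without actually running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SecondName FROM Student WHERE ClassID = '7A'"
).fetchall()
for step in plan:
    print(step[-1])  # the last column is a readable description of the step
```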
ACTIVITY 8F
1 a) Describe how a DBMS overcomes the limitations of a file-based approach to the storage
and retrieval of data.
b) Describe how a DBMS ensures that data stored in a database is secure.
2 a) Describe three features provided by a DBMS.
b) A school stores timetabling data for all pupils and classes.
Which features could a DBMS use to ensure that the administrators, teachers and pupils
can only see the information available to them?
8.3 Data definition language (DDL) and data manipulation language (DML)
WHAT YOU SHOULD ALREADY KNOW
Try this exercise before you read the third part of this chapter.
Using a DBMS with a graphical user interface (GUI), create the student database used in
Section 8.1.5.
Write the following queries using a query-by-example form.
1 A list of all the teachers and their subjects.
2 A list of the pupils in class 7A in alphabetical order of second name.
3 A list of the students studying each subject.
You may want to save this database to practise your SQL commands.
Key terms
Data definition language (DDL) – a language used to create, modify and remove the data
structures that form a database.
Data manipulation language (DML) – a language used to add, modify, delete and retrieve the
data stored in a relational database.
SQL script – a list of SQL commands that perform a given task, often stored in a file for
reuse.
8.3.1 Industry standard methods for building and modifying a database
DBMSs use a data definition language (DDL) to create, modify and remove the data structures
that form a relational database. DDL statements are written as a script that uses syntax similar to
a computer program.
DBMSs use a data manipulation language (DML) to add, modify, delete and retrieve the data
stored in a relational database. DML statements are written in a script that is similar to a
computer program.
These languages have different functions: DDL is used for working on the relational database
structure, whereas DML is used to work with the data stored in the relational database.
Most DBMSs use structured query language (SQL) for both data definition and data
manipulation. SQL was developed in the 1970s and since then it has been adopted as an industry
standard.
8.3.2 SQL (DDL) commands and scripts
In order to be able to understand and write SQL, you should have practical experience of writing
SQL scripts. There are many applications that allow you to do this. For example, MySQL and
SQLite are freely available ones. When using any SQL application it is important that you check
the commands available to use as these may differ slightly from those listed below.
You will need to be able to understand and use the following DDL commands.
You also need to be familiar with the following data types used for attributes in SQL.
Here are some examples of DDL that could have been used when the school database was
created.
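As a hedged illustration, DDL of this kind can be run as a script from Python's sqlite3 module. The column names and sizes are assumptions, not a definitive version of the school database. SQLite creates the database when the connection is opened, so no separate command for creating the database appears, and SQLite treats a declared type such as VARCHAR(30) loosely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the database is created on connection
conn.executescript("""
    CREATE TABLE Class (
        ClassID     VARCHAR(5) PRIMARY KEY,
        TeacherName VARCHAR(30),
        Location    VARCHAR(30)
    );

    CREATE TABLE Student (
        StudentID   INTEGER PRIMARY KEY,
        FirstName   VARCHAR(30),
        SecondName  VARCHAR(30),
        ClassID     VARCHAR(5),
        FOREIGN KEY (ClassID) REFERENCES Class(ClassID)
    );

    -- Change the structure of an existing table by adding an attribute.
    ALTER TABLE Student ADD COLUMN DateOfBirth DATE;
""")

# PRAGMA table_info is SQLite-specific; the second item in each row is the column name.
student_columns = [row[1] for row in conn.execute("PRAGMA table_info(Student)")]
print(student_columns)
```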
ACTIVITY 8G
Create the Teacher table and add the Licence Number as a foreign key to the Class table.
8.3.3 SQL (DML) commands and scripts
In order to be able to understand and write SQL, you should have practical experience of writing
SQL scripts and queries. There are many applications that allow you to do this. Again, MySQL
and SQLite are freely available ones. You can also write SQL commands in Access. When using
any SQL application, it is important that you check the commands available to use as these may
differ slightly from those listed below.
You will need to be able to understand and use the following DML commands.
Here are some examples of DML that could have been used to query and update the school
database.
This query will show, in alphabetical order of second name, the first and second names of all
students in class 7A:
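A sketch of such a query, run through Python's sqlite3 module against a hypothetical Student table with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, FirstName TEXT, "
    "SecondName TEXT, ClassID TEXT)"
)
conn.executemany(
    "INSERT INTO Student (FirstName, SecondName, ClassID) VALUES (?, ?, ?)",
    [
        ("Noor", "Baig", "7A"),
        ("Tahir", "Hassan", "7B"),
        ("Ahmad", "Sayed", "7A"),
        ("Leila", "Ahmed", "7A"),
    ],
)

# SELECT chooses the columns, WHERE picks the rows, ORDER BY sorts the result.
class_7a = conn.execute("""
    SELECT FirstName, SecondName
    FROM Student
    WHERE ClassID = '7A'
    ORDER BY SecondName
""").fetchall()
print(class_7a)
```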
This query will show the teacher’s name and the subject taught:
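A sketch of a query of this kind, assuming a Teacher table and a Subject table linked by a licence number. These table layouts and the sample rows are assumptions made for illustration. The tables are joined on the key they share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teacher (LicenceNumber TEXT PRIMARY KEY, TeacherName TEXT);
    CREATE TABLE Subject (
        SubjectName   TEXT PRIMARY KEY,
        LicenceNumber TEXT REFERENCES Teacher(LicenceNumber)
    );
    INSERT INTO Teacher VALUES ('L001', 'Mr Khan'), ('L002', 'Ms Lee');
    INSERT INTO Subject VALUES ('Maths', 'L001'), ('Science', 'L002');
""")

# INNER JOIN pairs each subject with the teacher whose licence number it holds.
teacher_subjects = conn.execute("""
    SELECT Teacher.TeacherName, Subject.SubjectName
    FROM Teacher
    INNER JOIN Subject ON Teacher.LicenceNumber = Subject.LicenceNumber
    ORDER BY Subject.SubjectName
""").fetchall()
print(teacher_subjects)
```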
ACTIVITY 8H
Create a query to show each student’s First Name, Second Name and the subjects studied by
each student.
If the values for all the columns are not known, then the table columns need to be specified
before the values are inserted:
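A sketch of such an insert, using Python's sqlite3 module and a hypothetical Student table. The columns that are not listed are left empty (NULL, shown as None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, FirstName TEXT, "
    "SecondName TEXT, DateOfBirth TEXT, ClassID TEXT)"
)

# Only three of the five columns are known, so they are named before VALUES.
conn.execute(
    "INSERT INTO Student (StudentID, FirstName, SecondName) VALUES (1, 'Noor', 'Baig')"
)

new_row = conn.execute("SELECT * FROM Student WHERE StudentID = 1").fetchone()
print(new_row)
```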
These statements will delete the specified row(s) from the Student table (take care: DELETE FROM
Student with no WHERE clause will delete every row in the table!):
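A sketch of DELETE statements, run through Python's sqlite3 module against a hypothetical Student table. In SQLite a DELETE with no WHERE clause removes every row but leaves the empty table in place. Removing the table itself takes the DDL command DROP TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, SecondName TEXT, ClassID TEXT)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "Baig", "7A"), (2, "Sayed", "7A"), (3, "Hassan", "7B"), (4, "Ahmed", "7B")],
)


def row_count():
    """Return the number of rows currently in the Student table."""
    return conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]


conn.execute("DELETE FROM Student WHERE StudentID = 3")    # one specified row
after_one = row_count()

conn.execute("DELETE FROM Student WHERE ClassID = '7A'")   # every row that matches
after_class = row_count()

conn.execute("DELETE FROM Student")                        # no WHERE: every row
after_all = row_count()

table_still_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'Student'"
).fetchone()[0] == 1
print(after_one, after_class, after_all, table_still_exists)
```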
Table: Programmer
FirstName Team
c) i) A relationship between the two tables has been implemented.
Explain how this has been done.
[2]
ii) Explain why the Program table is not in third normal form (3NF).
[2]
iii) Write the table definitions to give the database in 3NF.
[2]
2 A school stores a large amount of data. This includes student attendance, qualification, and
contact details. The school’s software uses a file-based approach to store this data.
a) The school is considering changing to a DBMS.
i) State what DBMS stands for.
[1]
ii) Describe two ways in which the Database Administrator (DBA) could use the DBMS
software to ensure the security of the student data.
[4]
iii) A feature of the DBMS software is a query processor.
Describe how the school secretary could use this software.
[2]
iv) The DBMS has replaced software that used a file-based approach with a relational
database.
Describe how using a relational database has overcome the previous problems
associated with a file-based approach.
[3]
b) The database design has three tables to store the classes that students attend.
STUDENT(StudentID, FirstName, LastName, Year, TutorGroup)
CLASS(ClassID, Subject)
CLASS-GROUP(StudentID, ClassID)
Primary keys are not shown.
There is a one-to-many relationship between CLASS and CLASS-GROUP.
i) Describe how this relationship is implemented.
[2]
ii) Describe the relationship between CLASS-GROUP and STUDENT.
[1]
iii) Write an SQL script to display the StudentID and FirstName of all students who are
in the tutor group 10B.
Display the list in alphabetical order of LastName.
[4]
iv) Write an SQL script to display the LastName of all students who attend the class
whose ClassID is CS1.
[4]
Cambridge International AS & A Level Computer Science 9608
Paper 12 Q8 June 2016