What Is Data
A. What is Data?
1. Database - an Explanation
2. Keys
3. Summary
B. The Relational Database Model
1. Database Example
2. Database Queries
C. The Need for Normalisation
1. Complex Entities
2. Un-Normalised Form (UNF)
3. Converting to First Normal Form (1NF)
4. Converting to Second Normal Form (2NF)
5. Convert to Third Normal Form (3NF)
6. Normalisation Activity
D. Entity Model Diagrams
1. Drawing the Entity Model
E. The Data Dictionary
There are numerous examples of data we use in our everyday lives. We can think of data as
the information that we need in order to do our job, organise ourselves, or even carry out our
daily activities.
However, for data to be easily usable, it needs to be organised, and this is the purpose of
creating and using databases. Let us take an example. Suppose you own a collection of DVDs:
what information would we hold about each DVD? It may be as follows:
Title
Year
Genre (e.g. comedy, drama, SciFi)
Director
Runtime
Certificate
Principal actors
If we were to create a record like this for each DVD in our collection then we have, in fact,
created a database.
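As a rough illustration, the record described above could be written in Python as follows. This is only a sketch: the class name DVD, the field names and the sample values are assumptions made for this example and do not come from the course material.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DVD:
    """One record in the collection, with one field per item of information."""
    title: str
    year: int
    genre: str
    director: str
    runtime_minutes: int
    certificate: str
    principal_actors: List[str] = field(default_factory=list)


# Sample values for illustration only (the certificates in particular are assumed).
collection = [
    DVD("Alien", 1979, "SciFi", "Ridley Scott", 117, "18", ["Sigourney Weaver"]),
    DVD("Jaws", 1975, "Thriller", "Steven Spielberg", 124, "PG", ["Roy Scheider"]),
]

for dvd in collection:
    print(dvd.title, dvd.year, dvd.director)
```

Each DVD object plays the role of one card, and the list plays the role of the whole set of cards.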
Activity
What information does your telephone account provider need in order to create your phone bill
or statement? List all the items you can think of.
Database - an Explanation
Let us continue with the DVD example introduced above. Imagine you have created,
perhaps on individual cards, a record for each DVD in your collection. There is a lot of
information held in that set of cards and it would be very useful if we could interrogate it easily.
For example we could "ask" our set of cards some questions such as:
"Who directed Alien?"
"Give me a list of all films with runtime greater than 120 minutes"
Activity
See if you can come up with another set of queries similar to the above.
The problem we have is accessing the information in our set of cards quickly and easily. You will
appreciate that to answer the queries above we would probably have to read every card in the
collection in order to ensure that the query was successful. This could be very time-consuming if
you have a large DVD collection.
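The cost of answering queries from a pile of cards can be sketched in Python. The list of dictionaries below stands in for the cards; the field names and the helper functions who_directed and longer_than are hypothetical names chosen for this example.

```python
# Each dictionary stands in for one card. Sample values for illustration.
cards = [
    {"title": "Alien", "director": "Ridley Scott", "runtime": 117},
    {"title": "Jaws", "director": "Steven Spielberg", "runtime": 124},
    {"title": "The Matrix", "director": "The Wachowskis", "runtime": 136},
]


def who_directed(cards, title):
    """Read the cards one by one until the title is found.

    Returns the director and the number of cards that had to be read.
    """
    cards_read = 0
    for card in cards:
        cards_read += 1
        if card["title"] == title:
            return card["director"], cards_read
    return None, cards_read


def longer_than(cards, minutes):
    """A list query has to look at every card in the collection."""
    return [card["title"] for card in cards if card["runtime"] > minutes]


print(who_directed(cards, "The Matrix"))
print(longer_than(cards, 120))
```

With three cards this is trivial, but the number of cards read grows in step with the size of the collection, which is the problem the text describes.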
Keys
In addition, it would be a good idea to give each DVD a serial number so that each one can be
identified uniquely. This is because it is possible for more than one film to have the same name
(for example Cape Fear, made in 1962 and remade in 1991). We usually refer to this unique number
as the "Key". Here are some other examples of keys:
Each item in the supermarket has a bar code. The bar code represents the product-code.
The product-code is the key of the item in the database.
Your telephone has a unique number which no-one else has. This is the key of your
phone's record held by your service provider.
At work you will usually be known by your name. But it is essential to identify each
employee uniquely since more than one person can have the same name. This is
usually implemented by allocating each employee an Employee Number. This is the key
of your employee record.
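The idea of a key can be sketched with a Python dictionary, which allows each key to appear only once. The serial numbers 1001 to 1003 and the function name add_dvd are assumptions made for this example.

```python
# Serial numbers (the keys) are hypothetical values chosen for this example.
dvds_by_serial = {
    1001: {"title": "Cape Fear", "year": 1962},
    1002: {"title": "Cape Fear", "year": 1991},
}


def add_dvd(collection, serial_no, record):
    """Add a record, refusing any serial number that is already in use."""
    if serial_no in collection:
        raise ValueError(f"Serial number {serial_no} is already in use")
    collection[serial_no] = record


add_dvd(dvds_by_serial, 1003, {"title": "Alien", "year": 1979})

# The title alone is ambiguous, but each key identifies exactly one record.
cape_fear_keys = [
    serial_no
    for serial_no, record in dvds_by_serial.items()
    if record["title"] == "Cape Fear"
]
print(cape_fear_keys)
```

Looking up "Cape Fear" by title finds two records, whereas looking up serial number 1002 can only ever find one.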
Activity
For the following items list the information we would store for each and identify the key for each.
We hold information about objects such as Product, Employee, DVD, etc. These items of
information (such as Product-Code, Product-Name) we actually call Attributes.
One of the attributes is usually designed to uniquely identify an occurrence of the object.
We call this attribute the Key. For example Employee-Number, DVD-Serial-Number,
Product-Code.
However, holding data on cards or some similar medium is not efficient in today's technological
world and nowadays we hold our sets of data in a product known as a Database package.
(In fact our DVD records, discussed above, also form a database and the medium is cards).
As far as electronic databases are concerned there are several architectures in existence.
However in this unit our design and implementation of databases will use what is called
the Relational Database Model.
The Relational Database Model
This is an approach where we consider our collections of information about objects as Tables. In
relational terminology a table is also called a Relation, hence the term Relational database.
Here is our DVD collection held as a table:
This is a database comprising a single table representing the data we hold on all of our DVDs.
You can think of a table as a template or place-holder. A database may contain more than one
table. As we will see, relational database packages such as Microsoft Access provide facilities for
us to create tables such as this.
You will see that we can ask this database questions (queries) such as "Give me a list of all
films directed by Steven Spielberg" which would return the result Jaws and Jurassic Park.
Another query might be "Display a list of SciFi films with runtime greater than 120 minutes"
which would produce the result The Matrix and Jurassic Park.
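The two queries above can be tried out with Python's built-in sqlite3 module. This is a sketch rather than the Access approach used later in the unit: the table name dvd, the column names, the serial numbers and the four sample rows are all assumptions, chosen so that the results match those quoted in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a temporary database held in memory
conn.execute("""
    CREATE TABLE dvd (
        serial_no INTEGER PRIMARY KEY,  -- the key
        title     TEXT NOT NULL,
        year      INTEGER,
        genre     TEXT,
        director  TEXT,
        runtime   INTEGER               -- minutes
    )
""")

# Sample rows for illustration; the table in the course material may differ.
conn.executemany("INSERT INTO dvd VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Alien", 1979, "SciFi", "Ridley Scott", 117),
    (2, "Jaws", 1975, "Thriller", "Steven Spielberg", 124),
    (3, "Jurassic Park", 1993, "SciFi", "Steven Spielberg", 127),
    (4, "The Matrix", 1999, "SciFi", "The Wachowskis", 136),
])

# "Give me a list of all films directed by Steven Spielberg"
spielberg = [row[0] for row in conn.execute(
    "SELECT title FROM dvd WHERE director = ? ORDER BY title",
    ("Steven Spielberg",),
)]

# "Display a list of SciFi films with runtime greater than 120 minutes"
long_scifi = [row[0] for row in conn.execute(
    "SELECT title FROM dvd WHERE genre = 'SciFi' AND runtime > 120 ORDER BY title"
)]

print(spielberg)
print(long_scifi)
```

The database package does the searching for us, so we state what we want rather than reading every record ourselves.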
Database Example
Here is an example of a database with two tables. This example represents a trip to the
supermarket! The first table, PURCHASES, represents the items the customers are buying and
the second table, PRODUCT, contains details of the items.
There are three customers in the store at the moment (note the three transaction numbers).
The customers are buying different numbers of items.
In order to produce a customer's till receipt we need to use both of the tables.
Both tables contain Product Code.
Database Queries
In the next section we will begin to learn how to design databases like this. For the moment,
however, we will concentrate on looking at how queries are handled. Let us take the query
"Produce the till receipt for customer transaction CF235692".
We then search for the next occurrence of Transaction Code CF235692 and find that the
customer has also ordered Product JL225. Using this information to access the Product table we
search down until we find the associated product details: "Chemofed Farm Assured Chicken
£2.99".
There are no further items ordered so the customer's bill will be calculated and the till receipt
finalised. (We will learn how to do calculations later).
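The lookup described above, from PURCHASES to PRODUCT by way of Product Code, is what a relational database calls a join. The sketch below uses Python's sqlite3 module. Only transaction CF235692, product JL225 and its details come from the text; the column names, the second transaction code, the placeholder descriptions and prices, and the function name till_receipt are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_code TEXT PRIMARY KEY,
        description  TEXT NOT NULL,
        price_pence  INTEGER NOT NULL
    );
    CREATE TABLE purchases (
        transaction_code TEXT NOT NULL,
        product_code     TEXT NOT NULL REFERENCES product (product_code)
    );
""")

# Only JL225 and its details come from the text; other rows are placeholders.
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("AF123", "Placeholder item A", 150),
    ("JL225", "Chemofed Farm Assured Chicken", 299),
    ("ZE228", "Placeholder item B", 80),
])
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [
    ("CF235692", "AF123"),  # assumed first item for this customer
    ("CF235692", "JL225"),
    ("XX000001", "AF123"),  # hypothetical second customer
    ("XX000001", "ZE228"),
])


def till_receipt(conn, transaction_code):
    """Return (description, price in pence) for each item in one transaction."""
    return conn.execute(
        """
        SELECT product.description, product.price_pence
        FROM purchases
        JOIN product ON product.product_code = purchases.product_code
        WHERE purchases.transaction_code = ?
        ORDER BY purchases.rowid
        """,
        (transaction_code,),
    ).fetchall()


receipt = till_receipt(conn, "CF235692")
for description, price_pence in receipt:
    print(f"{description:<32} {price_pence / 100:.2f}")
```

Prices are held as whole pence here to avoid rounding problems. Product AF123 is bought in two transactions but its details appear only once, in the product table.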
Activity
Show the processing required to answer the query: "Which customer ordered the Talkia T600
Mobile Phone?"
Note that, even though more than one customer is ordering items AF123 and ZE228, the details
of these products are only stored once in the database. This is because we have carefully
designed the database using a technique called Normalisation which we will learn in the next
section.
The Need for Normalisation
We need to ensure that all of our data structures can be implemented in a relational
database. This is not a problem if the structure is what we term Simple - for example Customer:
Customer Number
Name
Address
Telephone
Credit Limit
Item Ordered
Quantity
Price
This can easily be implemented in a database table. However a customer may order several
items and each customer in our database may order a different number of items. This situation
makes it difficult for us to implement the data in a relational database since we do not know how
many Order entries to allow.
For example we may have the situation where most of our customers only order a single item,
several are ordering 2 or 3 and one is ordering 20! In this case we would have to allow for up to
20 orders in our table for every customer but most are only ordering one product. This is very
wasteful of space and would make the database perform very poorly.
The document above represents a College's record of all the courses it offers - one document for
each course. Students may take several courses and Tutors may be in charge of more than one
course. You will notice that certain data repeats more than once (Student No, Student Name,
Date of Birth, Gender and Last Attendance Date) - this is therefore a Complex entity since
different courses will have different numbers of students. The structure needs to be converted
into its Simple form. The structure will be converted in stages - these stages are called Normal
Forms. Here is a summary of the Normal Forms:
The 3NF version is the simple version of the structure which can be implemented in the
database.
Un-Normalised Form (UNF)
To produce the Un-Normalised Form (UNF) of an entity we must:
We carry out the process using a special document, as shown below. The keys are shown by a
tick (✓) and the repeating group of attributes is shown bracketed.
Converting to First Normal Form (1NF)
To convert the entity to First Normal Form (1NF) we must:
We now have two entities linked by the Course Code key. This means that each student in the
class is represented by an entry in the second table so now there is no problem with varying
numbers of students.
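The effect of the 1NF step can be sketched in Python: the repeating group of students is taken out of the course document and each student becomes a row of its own, carrying the Course Code so that it stays linked to the course. All values and the function name to_first_normal_form are hypothetical, and only attributes named in the text are used.

```python
# All values below are hypothetical; only the attribute names follow the text.
course_document = {
    "course_code": "C1",
    "tutor_id": "T1",
    "tutor_name": "Tutor One",
    "students": [  # the repeating group
        {"student_no": "S1", "student_name": "Student One",
         "date_of_birth": "2000-01-01", "gender": "F",
         "last_attendance_date": "2024-05-01"},
        {"student_no": "S2", "student_name": "Student Two",
         "date_of_birth": "2001-02-02", "gender": "M",
         "last_attendance_date": "2024-05-08"},
    ],
}


def to_first_normal_form(document):
    """Split one course document into a course row and one row per student."""
    course_row = {k: v for k, v in document.items() if k != "students"}
    course_student_rows = [
        {"course_code": document["course_code"], **student}
        for student in document["students"]
    ]
    return course_row, course_student_rows


course_row, course_student_rows = to_first_normal_form(course_document)
print(course_row)
for row in course_student_rows:
    print(row)
```

A course with two students produces two rows and a course with twenty produces twenty, so no space has to be reserved in advance.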
Converting to Second Normal Form (2NF)
We have solved the problem of being unable to store the document in database tables but have
introduced a new problem. Consider the second table we produced in the 1NF conversion above.
Note that the student's details (Student No, Name, Date of Birth, Gender and Last
Attendance Date) are stored in this table. But if the student enrolled on another course then
the same information would be stored again. We only need to store a student's details once in
our database. This situation occurs because some of the non-key attributes in this table are
referenced by (that is, depend on) only part of the table's key, not the whole key. This problem can be dealt with by
applying the following rules:
In our example you will note that Student Name, Date of Birth and Gender are referenced
by Student No while Last Attendance Date is referenced by Course Code and Student
No together (because it is the date for that student on that particular course). The document below
shows this stage of Normalisation:
We now have three tables in our database. Note that the first table is already in Second Normal
Form because it only has a single-part key.
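The 2NF split can be sketched in Python as follows. Attributes that depend on Student No alone move to a student table, while Last Attendance Date stays with the two-part key. The sample rows and the function name to_second_normal_form are hypothetical.

```python
# Hypothetical 1NF rows: student S1 is enrolled on two courses.
course_student_rows = [
    {"course_code": "C1", "student_no": "S1", "student_name": "Student One",
     "date_of_birth": "2000-01-01", "gender": "F",
     "last_attendance_date": "2024-05-01"},
    {"course_code": "C2", "student_no": "S1", "student_name": "Student One",
     "date_of_birth": "2000-01-01", "gender": "F",
     "last_attendance_date": "2024-05-03"},
]


def to_second_normal_form(rows):
    """Move attributes that depend only on Student No into their own table."""
    students = {}   # keyed by Student No alone
    classlist = []  # keyed by Course Code and Student No together
    for row in rows:
        students[row["student_no"]] = {
            "student_name": row["student_name"],
            "date_of_birth": row["date_of_birth"],
            "gender": row["gender"],
        }
        classlist.append({
            "course_code": row["course_code"],
            "student_no": row["student_no"],
            "last_attendance_date": row["last_attendance_date"],
        })
    return students, classlist


students, classlist = to_second_normal_form(course_student_rows)
print(students)
print(classlist)
```

The student's name, date of birth and gender are now held once, however many courses the student takes.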
Convert to Third Normal Form (3NF)
Sometimes within an entity we can find that there exists a "key" and "dependent" relationship
between a group of non-key attributes. In our example above it is obvious in table 1 that this
relationship exists between Tutor Id and Tutor Name. In this case they are removed to form a
new table. If we did not perform the 3NF conversion then the course tutor's details (in this case,
Name only) would be repeated each time this tutor's courses were stored. Here is the process:
3NF Solution
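One way to picture the resulting four entities is as table definitions. The sketch below uses Python's sqlite3 module rather than Access; the column names follow the attributes mentioned in the text, the column types are assumptions, and the sample rows are hypothetical. The course material may include further attributes that are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
    CREATE TABLE tutor (
        tutor_id   TEXT PRIMARY KEY,
        tutor_name TEXT NOT NULL
    );
    CREATE TABLE course (
        course_code TEXT PRIMARY KEY,
        tutor_id    TEXT NOT NULL REFERENCES tutor (tutor_id)
    );
    CREATE TABLE student (
        student_no    TEXT PRIMARY KEY,
        student_name  TEXT NOT NULL,
        date_of_birth TEXT,
        gender        TEXT
    );
    CREATE TABLE classlist (
        course_code          TEXT NOT NULL REFERENCES course (course_code),
        student_no           TEXT NOT NULL REFERENCES student (student_no),
        last_attendance_date TEXT,
        PRIMARY KEY (course_code, student_no)  -- the two-part key
    );
""")

# Hypothetical data: one tutor in charge of two courses.
conn.execute("INSERT INTO tutor VALUES ('T1', 'Tutor One')")
conn.executemany("INSERT INTO course VALUES (?, ?)", [("C1", "T1"), ("C2", "T1")])

courses_with_tutor = conn.execute("""
    SELECT course.course_code, tutor.tutor_name
    FROM course
    JOIN tutor ON tutor.tutor_id = course.tutor_id
    ORDER BY course.course_code
""").fetchall()
print(courses_with_tutor)
```

The tutor's name is stored once in the tutor table and is picked up for each course through the Tutor Id held in the course table.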
Note that the entities are related to each other mainly by sharing keys (main keys and foreign
keys). This is an important feature which enables us to navigate through the database and
process queries. For any sizable database it is helpful to show these relationships in a diagram
known as the Entity Model. The diagram has two symbols:
The connection shown above is called a one-to-many relationship, the left end being
the one and the right the many.
The diagram is constructed by linking together entities that are related to each other by way of
their keys. It is normal to construct the diagram in a top-down left-to-right manner so that it
reads naturally just as if we were reading a page of text or screen. We would normally begin with
any entity which has foreign keys and link it to its partner. In our example this means that we
would begin with Tutor and Course and link them as shown in the adjacent diagram.
This tells us two things: firstly it states that "one Tutor <has some relationship to> many
Courses". Translating this we could say "A Tutor teaches many Courses". (As we go through this
process you will note that as we create the relationships we are also creating factual statements
about our system as it exists).
The second thing this diagram tells us is that Course contains the key of Tutor since the "many"
symbol is attached to the Course entity. This is a crucial concept and it is recommended
that you use this method for constructing the diagram.
We can now complete the diagram, linking entities according to this rule and remembering that we
navigate top-to-bottom and left-to-right.
Note that the Classlist entity contains the keys of Student and Course (check this from the 3NF
description).
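The rule that the "many" end belongs to the entity holding the other entity's key can be sketched in Python. The dictionary layout and the function name one_to_many_links are assumptions made for this example; the entity and attribute names follow the text.

```python
# Each entity lists its own key and any foreign keys it holds.
entities = {
    "Tutor": {"key": ["Tutor Id"], "foreign_keys": {}},
    "Course": {"key": ["Course Code"], "foreign_keys": {"Tutor Id": "Tutor"}},
    "Student": {"key": ["Student No"], "foreign_keys": {}},
    "Classlist": {
        "key": ["Course Code", "Student No"],
        "foreign_keys": {"Course Code": "Course", "Student No": "Student"},
    },
}


def one_to_many_links(entities):
    """Return (one, many) pairs: the entity holding the foreign key is the 'many' end."""
    links = []
    for name, description in entities.items():
        for parent in description["foreign_keys"].values():
            links.append((parent, name))
    return links


for one, many in one_to_many_links(entities):
    print(f"one {one} <has some relationship to> many {many}")
```

Each printed line corresponds to one connection in the diagram and can be turned into a factual statement about the system, such as "A Tutor teaches many Courses".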
Activity
Using the Mosspark Surgery case study you completed in the last section, create the associated
Entity Model from the 3NF descriptions.
The Data Dictionary
We have now created the 3NF descriptions of our data. We called these descriptions Entities.
However if we are going to implement them in a Relational Database we usually call
them Tables as you will recall from earlier. The main reason we call them Entities at all is that
they may be implemented in a variety of applications including programming languages such as
Java, Visual Basic or C# and these languages use different terms for the implementations.
Regardless of what package you use we will have to tell it some more information about the
entity so that it can be implemented properly. The typical information we usually have to specify
for each attribute is as follows:
Type refers to the kind of data the attribute will hold, for example numeric, text, currency or
date, although there are many others.
Size refers to the number of characters required for the attribute. For example a National
Insurance Number is 9 characters long.
Range refers to the allowable set of values. For example our range of valid Product-Codes
may be AA001-ZZ999.
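To show how Type, Size and Range might be used to check a value, here is a minimal Python sketch. The names AttributeSpec and is_valid are hypothetical, and the range AA001-ZZ999 is interpreted here as two capital letters followed by a number from 001 to 999, which is an assumption about what the range means.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttributeSpec:
    """One Data Dictionary entry: the attribute's name, type, size and range."""
    name: str
    data_type: type
    size: int                                            # maximum number of characters
    in_range: Optional[Callable[[object], bool]] = None  # None means any value is allowed


def is_valid(spec, value):
    """Check a value against the Type, Size and Range of its attribute."""
    if not isinstance(value, spec.data_type):
        return False
    if len(str(value)) > spec.size:
        return False
    if spec.in_range is not None and not spec.in_range(value):
        return False
    return True


def product_code_in_range(value):
    # Assumed reading of AA001-ZZ999: two capital letters, then 001 to 999.
    return re.fullmatch(r"[A-Z]{2}[0-9]{3}", value) is not None and value[2:] != "000"


product_code = AttributeSpec("Product-Code", str, 5, product_code_in_range)

for candidate in ["AA001", "ZZ999", "AA000", "A1234"]:
    print(candidate, is_valid(product_code, candidate))
```

A database package performs checks of this kind for us once the type, size and range of each attribute have been entered.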
The collection of entities we are going to implement along with each one's list of attributes with
all the relevant information is called the Data Dictionary. This is the stage directly before
implementation. This is usually recorded on a special document. Click on the link below to see an
example of this using the entity Student:
Activity
1. Complete the Data Dictionary entries for the other entities in the Mosspark Community
College case study.
2. Complete the Data Dictionary entries for the Mosspark Surgery case study.
In the next section we will create the tables in Microsoft Access and learn how to enter the data
into the table.
Creating the Database Tables
We are now going to create the database tables using Microsoft Access. It is assumed that you
know how to start the package so you should create a new database
called MossparkCommunityCollege and navigate to Create Table in the Design View page.
Create the four tables from the 3NF descriptions. Click on the following link to see what the
database information page should look like when this is done:
Click the following link to see the design view of the Classlist table. Note the two-part key:
After the tables have been created you should create the relevant Relationships. Base this on
the Entity Model you created earlier. Click the link below to see what it should look like:
Relationships