What Is Data

This document provides an overview of relational databases and normalization. It discusses what data is and how it can be organized into databases and tables. Key concepts explained include database structure, primary keys, queries, and normalization to break entities into multiple tables in first, second, and third normal form to avoid data redundancy and anomalies. Examples of database tables are provided for a DVD collection and supermarket purchases to illustrate relational databases and how queries can retrieve related data from multiple tables.

Uploaded by Viraj G.
Copyright © All Rights Reserved

Table of Contents

A. What is Data?
1. Database - an Explanation
2. Keys
3. Summary
B. The Relational Database Model
1. Database Example
2. Database Queries
C. The Need for Normalisation
1. Complex Entities
2. Un-Normalised Form (UNF)
3. Converting to First Normal Form (1NF)
4. Converting to Second Normal Form (2NF)
5. Convert to Third Normal Form (3NF)
6. Normalisation Activity
D. Entity Model Diagrams
1. Drawing the Entity Model
E. The Data Dictionary

F. Creating the Database Tables


What is Data?

There are numerous examples of data we use in our everyday lives. We can think of data as
the information that we need in order to do our job, organise ourselves, or even carry out our
daily activities.

Some examples of data that we use are:

• details of the products available at the supermarket
• our mobile phone account details
• bills such as gas, electricity, etc.
• our DVD movie collection

However, for data to be easily useable, it needs to be organised and this is the purpose of
creating and using databases. Let us take an example. For the DVD collection mentioned above,
what information would we hold about each DVD? It may be as follows:

• Title
• Year
• Genre (e.g. comedy, drama, sci-fi)
• Director
• Runtime
• Certificate
• Principal actors

If we were to create a record like this for each DVD in our collection then we have, in fact,
created a database.
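To make the idea of a record concrete, here is a minimal Python sketch of one DVD "card" as a data structure. The class name `DVD` and its field names are illustrative assumptions chosen to mirror the list above; the principal actors are held as a tuple because one film can have several.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DVD:
    """One record (card) in the collection. Field names are hypothetical."""
    title: str
    year: int
    genre: str
    director: str
    runtime: int              # minutes
    certificate: str
    actors: tuple[str, ...]   # principal actors


# A collection of such records is, in effect, a very simple database.
collection = [
    DVD("Alien", 1979, "SciFi", "Ridley Scott", 117, "18",
        ("John Hurt", "Sigourney Weaver")),
    DVD("Jaws", 1975, "Drama", "Steven Spielberg", 124, "15",
        ("Robert Shaw", "Richard Dreyfuss")),
]
```

Using `frozen=True` simply makes each record read-only once created, which suits a catalogue entry; it is a design choice, not a requirement.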

Activity

What information does your telephone account provider need in order to create your phone bill
or statement? List all the items you can think of.
Database - an Explanation
Let us continue with the DVD example from the previous section. Imagine you have created,
perhaps on individual cards, a record for each DVD in your collection. There is a lot of
information held in that set of cards and it would be very useful if we could interrogate it easily.
For example we could "ask" our set of cards some questions such as:

"Give me a list of all movie titles in my collection".

"Give me a list of all movies directed by Stanley Kubrick".

"Who directed Alien?"

"What films has Marlon Brando appeared in?"

"Give me a list of all films with runtime greater than 120 minutes"

"What science fiction films directed by Ridley Scott do I have?"

Activity

See if you can come up with another set of queries similar to the above.

The problem we have is accessing the information in our set of cards quickly and easily. You will
appreciate that to answer the queries above we would probably have to read every card in the
collection in order to ensure that the query was successful. This could be very time-consuming if
you have a large DVD collection.
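Reading every card corresponds to a linear scan: each record is checked in turn against the query's condition. A minimal Python sketch, assuming each card is a dictionary with the hypothetical keys `title`, `director` and `runtime`:

```python
# Each card is a dictionary; the keys are illustrative assumptions.
cards = [
    {"title": "Alien", "director": "Ridley Scott", "runtime": 117},
    {"title": "Jaws", "director": "Steven Spielberg", "runtime": 124},
    {"title": "The Matrix", "director": "Andy Wachowski", "runtime": 136},
]


def query(cards, condition):
    """Return the title of every card that satisfies `condition`.

    Like reading a box of cards by hand, this inspects every record,
    so the work grows in proportion to the size of the collection.
    """
    return [card["title"] for card in cards if condition(card)]


long_films = query(cards, lambda c: c["runtime"] > 120)
scott_films = query(cards, lambda c: c["director"] == "Ridley Scott")
```

Here `long_films` answers "all films with runtime greater than 120 minutes" and `scott_films` answers a director query; both have to look at every card.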
Keys
In addition, it would be a good idea to give each DVD a serial number that identifies it uniquely.
This is because it is possible for more than one film to have the same name. (Check out Cape
Fear, made in 1962 and remade in 1991). This unique number we usually refer to as the "Key".

This is an important concept. Let us look at several examples:

• Each item in the supermarket has a bar-code, which represents the product-code.
The product-code is the key of the item in the database.

• Your telephone has a unique number which no-one else has. This is the key of your
phone's record held by your service provider.

• At work you will usually be known by your name, but it is essential to identify each
employee uniquely, since more than one person can have the same name. This is
usually done by allocating each employee an Employee Number. This is the key
of your employee record.
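One way to picture a key in code is as the lookup key of a dictionary: each serial number maps to exactly one record, so two films with the same title can still be told apart. In this Python sketch the serial numbers and the `add_dvd` helper are hypothetical; the two Cape Fear entries use the 1962 and 1991 release years mentioned above.

```python
# Serial number -> record. The serial numbers are made up for illustration.
dvds = {
    "0101": {"title": "Cape Fear", "year": 1962},
    "0102": {"title": "Cape Fear", "year": 1991},
}

# Looking up by title is ambiguous: two records match.
matches = [serial for serial, d in dvds.items() if d["title"] == "Cape Fear"]

# Looking up by key is not: the serial number identifies exactly one record.
remake = dvds["0102"]


def add_dvd(dvds, serial, record):
    """Add a record, refusing a serial number that is already in use,
    because a key must stay unique."""
    if serial in dvds:
        raise KeyError(f"serial {serial} already used")
    dvds[serial] = record
```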

Activity

For the following items list the information we would store for each and identify the key for each.

• Each product in the supermarket
• Each employee in the organisation
• Patient details at your doctor's surgery
• Student details for each student in a college

Summary
We have learned several facts in this section:

• We hold information about objects such as Product, Employee, DVD, etc. We call these items
of information (such as Product-Code and Product-Name) Attributes.

• One of the attributes is usually designed to uniquely identify an occurrence of the object.
We call this attribute the Key. For example Employee-Number, DVD-Serial-Number,
Product-Code.

• We can ask questions of our set of data.

However, holding data on cards or some similar medium is not efficient in today's technological
world and nowadays we hold our sets of data in a product known as a Database package.

(In fact our DVD records, discussed above, also form a database and the medium is cards).

As far as electronic databases are concerned there are several architectures in existence.
However in this unit our design and implementation of databases will use what is called
the Relational Database Model.
The Relational Database Model

This is an approach where we consider our collections of information about objects as Tables. In
relational terms a table is called a Relation, hence the term Relational Database. Here is our DVD
collection held as a table:

Serial No  Title          Year  Director              Genre   Runtime  Certificate  Actor 1         Actor 2
0001       Alien          1979  Ridley Scott          SciFi   117      18           John Hurt       Sigourney Weaver
0002       The Godfather  1972  Francis Ford Coppola  Drama   175      18           Marlon Brando   Al Pacino
0003       Jaws           1975  Steven Spielberg      Drama   124      15           Robert Shaw     Richard Dreyfuss
0004       The Matrix     1999  Andy Wachowski        SciFi   136      15           Keanu Reeves    Laurence Fishburne
0005       Jurassic Park  1993  Steven Spielberg      SciFi   127      PG           Sam Neill       Laura Dern
0006       Life of Brian  1979  Terry Jones           Comedy  94       15           Graham Chapman  John Cleese

This is a database comprising a single table representing the data we hold on all of our DVDs.
You can think of a table as a template or place-holder. A database may contain more than one
table. As we will see, relational database packages such as Microsoft Access provide facilities
for us to create tables such as this.

You will see that we can ask this database questions (queries) such as "Give me a list of all
films directed by Steven Spielberg", which would return the result Jaws and Jurassic Park.

Another query might be "Display a list of SciFi films with runtime greater than 120 minutes",
which would produce the result The Matrix and Jurassic Park.
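The same table and both queries can be tried in SQL. This sketch uses Python's built-in `sqlite3` module with an in-memory SQLite database rather than Microsoft Access; the table name `DVD` and the column names are assumptions, and only the columns needed for the two queries are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE DVD (
        SerialNo  TEXT PRIMARY KEY,   -- the key: unique for each DVD
        Title     TEXT NOT NULL,
        Year      INTEGER,
        Director  TEXT,
        Genre     TEXT,
        Runtime   INTEGER             -- minutes
    )
""")
conn.executemany(
    "INSERT INTO DVD VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("0001", "Alien", 1979, "Ridley Scott", "SciFi", 117),
        ("0002", "The Godfather", 1972, "Francis Ford Coppola", "Drama", 175),
        ("0003", "Jaws", 1975, "Steven Spielberg", "Drama", 124),
        ("0004", "The Matrix", 1999, "Andy Wachowski", "SciFi", 136),
        ("0005", "Jurassic Park", 1993, "Steven Spielberg", "SciFi", 127),
        ("0006", "Life of Brian", 1979, "Terry Jones", "Comedy", 94),
    ],
)

# "Give me a list of all films directed by Steven Spielberg"
spielberg = [row[0] for row in conn.execute(
    "SELECT Title FROM DVD WHERE Director = ? ORDER BY SerialNo",
    ("Steven Spielberg",),
)]

# "Display a list of SciFi films with runtime greater than 120 minutes"
long_scifi = [row[0] for row in conn.execute(
    "SELECT Title FROM DVD WHERE Genre = 'SciFi' AND Runtime > 120 "
    "ORDER BY SerialNo"
)]
```

Unlike the card version, the database package does the searching; the query only states *what* is wanted.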
Database Example
Here is an example of a database with two tables. This example represents a trip to the
supermarket! The first table, PURCHASES, represents the items the customers are buying and
the second table, PRODUCT, contains details of the items.

We can see several facts regarding this database:

• There are three customers in the store at the moment (note the three transaction
numbers)
• Customers are buying different numbers of items
• In order to produce the customer's till receipt we need to use both of the tables
• Both tables contain Product Code, which links each purchase to the details of the product
Database Queries
In the next section we will begin to learn how to design databases like this. For the moment,
however, we will concentrate on looking at how queries are handled. Let us take the query
"Produce the till receipt for customer transaction CF235692".

First we go to the Purchases table and search down until we find Transaction Code CF235692.

From there we can see that this customer has bought item AF123. We then use this information
to access the Product table - we search down until we find AF123 and access the Product
Name "Stuffalot Dog Treats 1Kg" and the Product Price "£1.99". This information can then be
added to the customer's bill and printed on the receipt.

We then search for the next occurrence of Transaction Code CF235692 and find that the
customer has also ordered Product JL225. Using this information to access the Product table we
search down until we find the associated product details: "Chemofed Farm Assured Chicken
£2.99".

There are no further items ordered so the customer's bill will be calculated and the till receipt
finalised. (We will learn how to do calculations later).
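The two-table look-up just described is what a relational join does. The sketch below, again using Python's `sqlite3`, contains only the rows mentioned in the text (transaction CF235692 and products AF123 and JL225); the table layouts, column names and the choice to store prices in pence are assumptions, since the full tables are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (
        ProductCode TEXT PRIMARY KEY,
        ProductName TEXT NOT NULL,
        PricePence  INTEGER NOT NULL   -- whole pence avoid rounding errors
    );
    CREATE TABLE Purchases (
        TransactionCode TEXT NOT NULL,
        ProductCode     TEXT NOT NULL REFERENCES Product(ProductCode)
    );
    INSERT INTO Product VALUES
        ('AF123', 'Stuffalot Dog Treats 1Kg', 199),
        ('JL225', 'Chemofed Farm Assured Chicken', 299);
    INSERT INTO Purchases VALUES
        ('CF235692', 'AF123'),
        ('CF235692', 'JL225');
""")

# For each purchase line in the transaction, fetch the matching product details.
receipt = conn.execute("""
    SELECT p.ProductName, p.PricePence
    FROM Purchases AS pu
    JOIN Product AS p ON p.ProductCode = pu.ProductCode
    WHERE pu.TransactionCode = ?
    ORDER BY p.ProductCode
""", ("CF235692",)).fetchall()

total_pence = sum(price for _, price in receipt)
for name, price in receipt:
    print(f"{name:<32} £{price / 100:.2f}")
print(f"{'TOTAL':<32} £{total_pence / 100:.2f}")
```

The `JOIN ... ON p.ProductCode = pu.ProductCode` clause performs the "search down the Product table" step for every matching purchase line at once.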

Activity

Show the processing required to answer the query: "Which customer ordered the Talkia T600
Mobile Phone?"

Note that, even though more than one customer is ordering items AF123 and ZE228, the details
of these products are only stored once in the database. This is because we have carefully
designed the database using a technique called Normalisation which we will learn in the next
section.
The Need for Normalisation

We need to ensure that all of our data structures can be implemented in a relational
database. This is not a problem if the structure is what we term Simple - for example Customer:

• Customer Number
• Name
• Address
• Telephone
• Credit Limit
• Item Ordered
• Quantity
• Price

This can easily be implemented in a database table. However, a customer may order several
items and each customer in our database may order a different number of items. This situation
makes it difficult for us to implement the data in a relational database since we do not know how
many Order entries to allow.

For example we may have the situation where most of our customers only order a single item,
several are ordering 2 or 3 and one is ordering 20! In this case we would have to allow for up to
20 orders in our table for every customer but most are only ordering one product. This is very
wasteful of space and would make the database perform very poorly.

A structure like this is not Simple; it is called Complex. A complex entity must be
converted to simple entities before it can be implemented in the database. This conversion
process is called Normalisation.
Complex Entities

The document above represents a College's record of all the courses it offers - one document for
each course. Students may take several courses and Tutors may be in charge of more than one
course. You will notice that certain data repeats more than once (Student No, Student Name,
Date of Birth, Gender and Last Attendance Date) - this is therefore a Complex entity since
different courses will have different numbers of students. The structure needs to be converted
into its Simple form. The structure will be converted in stages - these stages are called Normal
Forms. Here is a summary of the Normal Forms:

• Un-Normalised Form (UNF)
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)

The 3NF version is the simple version of the structure which can be implemented in the
database.
Un-Normalised Form (UNF)
To produce the Un-Normalised Form (UNF) of an entity we must:

• list the attributes of the entity
• identify the main key
• identify the repeating group of attributes
• identify the key of the repeating group

We carry out the process using a special document, as shown below. The keys are shown by a
tick (✓) and the repeating group of attributes is shown bracketed.
Converting to First Normal Form (1NF)
To convert the entity to First Normal Form (1NF) we must:

• remove the repeating group of attributes to form a new entity
• add to it the original key

The following document shows this stage of Normalisation

We now have two entities linked by the Course Code key. Each student in the class is
represented by an entry in the second table, so a varying number of students is no longer a
problem.
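The 1NF rule (remove the repeating group, then add the original key to it) can be sketched in Python. Here one course document in UNF is a dictionary whose `students` list is the repeating group. All field names and values are hypothetical, and any other course attributes in the original document are omitted.

```python
# One course document in Un-Normalised Form: the student details repeat.
unf_course = {
    "course_code": "C101",
    "tutor_id": "T07",
    "tutor_name": "J. Smith",
    "students": [  # repeating group: one entry per student on the course
        {"student_no": "900001", "student_name": "A. Brown",
         "date_of_birth": "2004-03-01", "gender": "F",
         "last_attendance": "2024-05-10"},
        {"student_no": "900002", "student_name": "B. Green",
         "date_of_birth": "2003-11-20", "gender": "M",
         "last_attendance": "2024-05-09"},
    ],
}


def to_first_normal_form(doc):
    """Split a UNF document into two flat tables (lists of rows).

    1. Remove the repeating group to form a new entity.
    2. Add the original key (course_code) to each of its rows.
    """
    course_row = {k: v for k, v in doc.items() if k != "students"}
    student_rows = [{"course_code": doc["course_code"], **s}
                    for s in doc["students"]]
    return [course_row], student_rows


courses, course_students = to_first_normal_form(unf_course)
```

Every row of `course_students` now has the same fixed set of attributes, whatever the number of students on the course.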
Converting to Second Normal Form (2NF)
We have solved the problem of the inability to store the document in database tables but have
introduced a new problem. Consider the table we produced on the previous page. Click on the
following link to see it again:

First Normal Form (1NF)

Note that the student's details (Student No, Name, Date of Birth, Gender and Last
Attendance Date) are stored in this table. But if the student enrolled on another course then
the same information would be stored again. We only need to store a student's details once in
our database. This situation occurs because some of the non-key attributes in this table are
referenced by part of the table's key - not the whole key. This problem can be dealt with by
applying the following rules:

• Examine tables with a composite key (a key made up of two parts)
• For each non-key attribute, determine whether it is referenced by the first part of the key,
the second part, or (if neither) both parts
• Remove the partial key and its dependents to form a new table

In our example you will note that Student Name, Date of Birth and Gender are referenced
by Student No while Last Attendance Date is referenced by Course Code and Student
Number (because it is the date for that student on that particular course). The document below
shows this stage of Normalisation:

Second Normal Form

We now have three tables in our database. Note that the first table is already in Second Normal
Form because it only has a single-part key.
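Continuing in Python, the 2NF step for the composite-key table (Course Code + Student No) moves the attributes referenced only by Student No into a separate Student table, while Last Attendance Date stays with the whole key. The rows and field names are made up for illustration; student 900001 takes two courses, so in 1NF their details are stored twice.

```python
# 1NF rows with composite key (course_code, student_no).
course_students = [
    {"course_code": "C101", "student_no": "900001", "student_name": "A. Brown",
     "date_of_birth": "2004-03-01", "gender": "F", "last_attendance": "2024-05-10"},
    {"course_code": "C102", "student_no": "900001", "student_name": "A. Brown",
     "date_of_birth": "2004-03-01", "gender": "F", "last_attendance": "2024-05-08"},
    {"course_code": "C101", "student_no": "900002", "student_name": "B. Green",
     "date_of_birth": "2003-11-20", "gender": "M", "last_attendance": "2024-05-09"},
]

# Attributes referenced by only part of the key (student_no).
STUDENT_ATTRS = ("student_name", "date_of_birth", "gender")


def to_second_normal_form(rows):
    """Remove the partial dependency on student_no into its own table."""
    students = {}
    classlist = []
    for r in rows:
        # Each student's details are kept once, keyed by student_no.
        students[r["student_no"]] = {a: r[a] for a in STUDENT_ATTRS}
        # last_attendance depends on the whole key, so it stays here.
        classlist.append({"course_code": r["course_code"],
                          "student_no": r["student_no"],
                          "last_attendance": r["last_attendance"]})
    return students, classlist


students, classlist = to_second_normal_form(course_students)
```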
Convert to Third Normal Form (3NF)
Sometimes within an entity we can find that there exists a "key" and "dependent" relationship
between a group of non-key attributes. In our example above it is obvious in table 1 that this
relationship exists between Tutor Id and Tutor Name. In this case they are removed to form a
new table. If we did not perform the 3NF conversion then the course tutor's details (in this case,
Name only) would be repeated each time this tutor's courses were stored. Here is the process:

• Identify any dependencies between non-key attributes within each table
• Remove them to form a new table
• Promote one of the attributes to be the key of the new table
• This becomes the Foreign Key link in the original table (shown with a *).
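The 3NF steps above can be sketched the same way. In this hypothetical Python example the Course rows carry Tutor Id and Tutor Name, and Tutor Name depends on Tutor Id (a non-key attribute), so the pair is moved to a new Tutor table, while Tutor Id stays in Course as the foreign key. Field names and values are assumptions.

```python
# Course rows: tutor_name depends on tutor_id, not on the key course_code.
courses = [
    {"course_code": "C101", "tutor_id": "T07", "tutor_name": "J. Smith"},
    {"course_code": "C102", "tutor_id": "T07", "tutor_name": "J. Smith"},
    {"course_code": "C103", "tutor_id": "T09", "tutor_name": "K. Patel"},
]


def to_third_normal_form(rows, determinant="tutor_id", dependents=("tutor_name",)):
    """Move a non-key dependency (determinant -> dependents) into a new table.

    The determinant is promoted to be the key of the new table and remains
    in the original rows as the foreign key (marked * in these notes).
    """
    new_table = {}
    remaining = []
    for r in rows:
        new_table[r[determinant]] = {d: r[d] for d in dependents}
        remaining.append({k: v for k, v in r.items() if k not in dependents})
    return remaining, new_table


courses_3nf, tutors = to_third_normal_form(courses)
```

After the split, tutor T07's name is stored once instead of once per course.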

The following document shows this stage of Normalisation:


Normalisation Activity
Now try an example yourself. Convert the data structure below to Third Normal Form:

Please Note: Patients may only have ONE visit per day

Entity Model Diagrams

Drawing the Entity Model
Click on the link below to take another look at the Third Normal Form (3NF) solution to the
Normalisation example we looked at in the last section:

3NF Solution

Note that the entities are related to each other mainly by sharing keys (main keys and foreign
keys). This is an important feature which enables us to navigate through the database and
process queries. For any sizable database it is helpful to show these relationships in a diagram
known as the Entity Model. The diagram has two symbols:

A box to represent each entity, and a connector to represent each relationship.

The connection shown above is called a one-to-many relationship, the left end being
the one and the right the many.

The diagram is constructed by linking together entities that are related to each other by way of
their keys. It is normal to construct the diagram in a top-down left-to-right manner so that it
reads naturally just as if we were reading a page of text or screen. We would normally begin with
any entity which has foreign keys and link it to its partner. In our example this means that we
would begin with Tutor and Course and link them as shown in the adjacent diagram.
This tells us two things: firstly it states that "one Tutor <has some relationship to> many
Courses". Translating this we could say "A Tutor teaches many Courses". (As we go through this
process you will note that as we create the relationships we are also creating factual statements
about our system as it exists).

The second thing this diagram tells us is that Course contains the key of Tutor since the "many"
symbol is attached to the Course entity. This is a crucial concept and it is recommended
that you use this method for constructing the diagram.
We can now complete the diagram linking entities according to this rule remembering we
navigate top-to-bottom and left-to-right.

From this diagram we can make several statements:

• a Tutor teaches many Courses
• a Student appears on many Classlists
• the Course entity contains the key of Tutor
• the Classlist entity contains the keys of Student and Course (check this from the 3NF
description)
• a Course has many Classlist entries (remember - "many" means 1, 2, 3, ...)
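A one-to-many relationship such as "a Tutor teaches many Courses" is implemented by placing the key of the "one" entity in the "many" entity as a foreign key. A minimal sketch using Python's `sqlite3`, with hypothetical table and column names; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Tutor (
        TutorId   TEXT PRIMARY KEY,
        TutorName TEXT NOT NULL
    );
    CREATE TABLE Course (
        CourseCode TEXT PRIMARY KEY,
        TutorId    TEXT NOT NULL REFERENCES Tutor(TutorId)  -- foreign key (*)
    );
    INSERT INTO Tutor VALUES ('T07', 'J. Smith');
    INSERT INTO Course VALUES ('C101', 'T07'), ('C102', 'T07');
""")

# One tutor, many courses: the "many" side holds the tutor's key.
taught = [row[0] for row in conn.execute(
    "SELECT CourseCode FROM Course WHERE TutorId = ? ORDER BY CourseCode",
    ("T07",),
)]

# A course cannot refer to a tutor that does not exist.
try:
    conn.execute("INSERT INTO Course VALUES ('C999', 'T99')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
```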

Activity

Using the Mosspark Surgery case study you completed in the last section, create the associated
Entity Model from the 3NF descriptions.
The Data Dictionary

We have now created the 3NF descriptions of our data. We called these descriptions Entities.
However if we are going to implement them in a Relational Database we usually call
them Tables as you will recall from earlier. The main reason we call them Entities at all is that
they may be implemented in a variety of applications including programming languages such as
Java, Visual Basic or C# and these languages use different terms for the implementations.
Regardless of what package you use we will have to tell it some more information about the
entity so that it can be implemented properly. The typical information we usually have to specify
for each attribute is as follows:

• Type refers to the kind of data the attribute will hold, for example numeric, text, currency
or date, although there are many others.

• Size refers to the number of characters required for the attribute. For example a National
Insurance Number is 9 characters long.

• Range refers to the allowable set of values. For example our range of valid Product-Codes
may be AA001-ZZ999.

• Comments refers to any relevant information we wish to add.

The collection of entities we are going to implement along with each one's list of attributes with
all the relevant information is called the Data Dictionary. This is the stage directly before
implementation. This is usually recorded on a special document. Click on the link below to see an
example of this using the entity Student:

Data Dictionary Entry for Student

This shows the entity Student. It has four attributes. For example, the Student Number
attribute is text, size 6 characters, with a valid range of 900001 to 999999. There are no
comments for any of the attributes.

Activity

1. Complete the Data Dictionary entries for the other entities in the Mosspark Community
College case study.

2. Complete the Data Dictionary entries for the Mosspark Surgery case study.
In the next section we will create the tables in Microsoft Access and learn how to enter the data
into the table.
Creating the Database Tables

We are now going to create the database tables using Microsoft Access. It is assumed that you
know how to start the package, so create a new database called MossparkCommunityCollege and
navigate to Create Table in the Design View page. Create the four tables from the 3NF
descriptions. Click on the following link to see what the database information page should look
like when this is done:

Database Information Page

Click the following link to see the design view of the Classlist table. Note the two-part key:

Design View of Classlist Table
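If you want to see the same design outside Access, the Classlist table's two-part key can be expressed in SQL. This sketch uses SQLite through Python's `sqlite3` (not Access), and the column names are assumptions based on the 2NF description: the combination of Course Code and Student No must be unique, while either part on its own may repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Classlist (
        CourseCode         TEXT NOT NULL,
        StudentNo          TEXT NOT NULL,
        LastAttendanceDate TEXT,
        PRIMARY KEY (CourseCode, StudentNo)   -- two-part (composite) key
    )
""")

# The same student on two courses, and two students on one course: allowed.
conn.executemany(
    "INSERT INTO Classlist VALUES (?, ?, ?)",
    [("C101", "900001", "2024-05-10"),
     ("C102", "900001", "2024-05-08"),
     ("C101", "900002", "2024-05-09")],
)

# The same student twice on the same course: rejected by the key.
try:
    conn.execute("INSERT INTO Classlist VALUES ('C101', '900001', NULL)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

row_count = conn.execute("SELECT COUNT(*) FROM Classlist").fetchone()[0]
```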

After the tables have been created you should create the relevant Relationships. Base these on
the Entity Model you created earlier. Click the link below to see what it should look like:

Relationships

Congratulations, you've now reached the end of this section. You can go back to any topic by
using the menu on the left-hand side of the screen, or return to the home page for the unit by
clicking the HN Computing logo in the top left-hand corner.
