Entity Relationships To Normal Forms

Database modelling involves creating entity-relationship (ER) diagrams and normalising data to reduce redundancy and inconsistencies. ER diagrams use entities (analogous to tables), attributes, and relationships to represent the structure and interactions of data in a database. Normalisation determines a suitable structure of tables and relationships through a series of normal forms. Together, ER diagrams and normalisation form the basis for database analysis and design prior to physical implementation in a database system such as SQL Server or Oracle.

Uploaded by Sarah Mathibe
Copyright © All Rights Reserved

DATABASE MODELLING

FROM ENTITY RELATIONSHIPS (ER) TO NORMAL FORMS (NF)
INTRODUCTION
• In the late 1960s the mathematician Dr E F Codd, at the IBM San Jose
Research Laboratory, suggested using principles of mathematics
to design, create and manage a database system.
• Codd's ideas were first published in 1970 in a seminal paper 'A Relational Model of
Data for Large Shared Data Banks'.
• This research gave birth to the relational model on which relational
databases are based.
• It also led to the development of a database analysis and design
methodology known as normalisation.
• This methodology addresses the problems (anomalies) associated with data
duplication, insertion, deletion and updates.
• This design methodology determines the contents of a database and
is independent of the constraints of the final physical database.
INTRODUCTION…
• One of the rules proposed by Codd was that a
relational database should include a common
language that is used to:
• create the database
• store and manipulate the data within it
• manage security.
• The language that was widely adopted
was Structured Query Language (SQL)
Database Defined
• A database is a collection of records stored on some type of
media.
• Storage in the past has included punch cards, paper tape,
magnetic tapes and disks.
• Previously, different departments in a company would design
their own databases with their own copies of data.
• For example, multiple copies of employee details would be
held in various systems and departments, such as Human
Resources, Pensions and Project Management.
What problems could be posed by this example?
              Information held by    Information held by
              Human Resources:       IT Department:
Surname:      Jones                  Jones
Forename:     Billy                  Billie
DOB:          28/08/1980             28/08/1980
Department:   IT                     IT
Address:      1 Bath Street          23 Hope Street
Gender:       Female                 Female
Salary:       £23,000                £27,000
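One way to see the problem is to compare the two copies field by field. The sketch below is only an illustration: the values come from the two records above, while the dictionary layout and the helper name `conflicting_fields` are assumptions made for this example.

```python
# The two departmental copies of the same employee record.
hr_record = {
    "surname": "Jones", "forename": "Billy", "dob": "28/08/1980",
    "department": "IT", "address": "1 Bath Street",
    "gender": "Female", "salary": 23000,
}
it_record = {
    "surname": "Jones", "forename": "Billie", "dob": "28/08/1980",
    "department": "IT", "address": "23 Hope Street",
    "gender": "Female", "salary": 27000,
}


def conflicting_fields(a, b):
    """Return the names of the fields whose values differ between two copies."""
    return sorted(field for field in a if a[field] != b.get(field))


print(conflicting_fields(hr_record, it_record))
# prints ['address', 'forename', 'salary']
```

With two independent copies there is no way to tell which forename, address or salary is the correct one.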
Advantages
• Reduced data redundancy
• Reduced updating errors and increased consistency
• Greater data integrity and independence from
applications programs
• Improved data access to users through use of host
and query languages
• Improved data security
• Reduced costs for data entry, storage, and retrieval
Disadvantages/ Limitations
• Database systems are complex, difficult, and time-consuming
to design
• Substantial hardware and software start-up costs
• Damage to database affects virtually all applications
programs
• Extensive conversion costs in moving from a file-based
system to a database system
• Initial training required for all programmers and users
• IN SUMMARY – DBMS IMPLEMENTATION IS A COSTLY
EXERCISE
Stages in Creating a Database
• The process of creating a database can be broadly divided into two
main stages:
• Data analysis involves using a formalised methodology to create a
database design. Two widely used methods are
• Entity Relationship Modelling (ER) and
• Normalisation.
• A database design is independent of the final database system chosen.
• The same design can be physically implemented in different types of
databases.
• Physical implementation of that design in a database system.
• Examples include
• Oracle - (RDBMS) from Oracle Corporation.
• MySQL (open source)
• DB2 (IBM)
• SQL Server (Microsoft).
TERMINOLOGY
ENTITY
• An entity is any object in the system that we want to model
and store information about.
• Entities are usually recognisable concepts, either concrete or
abstract, such as people, places, things, or events which have
relevance to the database.
• Some specific examples of entities are Employee, Student,
Lecturer.
• An entity is analogous to a table in the relational model.
• An entity occurrence is an instance of an entity.
• For example: Billy Jones (i.e. SN12345, Billy, Jones,
18/08/1950).
• An entity occurrence can also be referred to as a record.
ENTITY REPRESENTATION
• Entities are represented by rectangles (either with round or square
corners):
Attribute
• An attribute is an item of information which is stored about an
entity.
• For example, the entity 'lecturer' could have attributes:
• staff id, surname, forename, date of birth, telephone number, etc.
• An attribute can only appear in one entity, unless it is the key
attribute in another entity.
• In a traditional filing system an attribute equates to a field in a
record.
• For your projects:
• make a list of entities involved in that system.
• List possible attributes for entities you have identified.
• Remember that the convention for naming an entity is to use a singular name,
e.g. student.
ATTRIBUTE REPRESENTATION
RELATIONSHIP
• A relationship is an association (link) of entities, where the
association includes one entity from each participating
entity type.
• A relationship is established by a foreign key in one entity
linking to the primary key in another.
• For example, an employee works in a department.
• A department code could be included as a foreign key in the
employee entity to link to the department entity.
REPRESENTATION
Degrees of Relationship (Cardinality)
• The degree of relationship is the number of
occurrences in one entity which are associated
(or linked) to the number of occurrences in
another.
• There are three degrees of relationship, known
as:
• One-to-One (1:1)
• One-to-Many (1:M)
• Many-to-Many (M:N)
One-to-One (1:1)
• One-to-one is where one occurrence of an entity relates to only one
occurrence in another entity,
• e.g. if a man only marries one woman and a woman only marries one
man, it is a one-to-one (1:1) relationship.
• In database systems one-to-one relationships rarely exist in practice,
but they can.
• However, you may consider combining them into one entity.
One-to-many (1:M)
• A one-to-many relationship is where one occurrence in an
entity relates to many occurrences in another entity.
• For example, one manager manages many employees, but each
employee has only one manager, so this is a one-to-many (1:M)
relationship.
Many-to-Many (M:N)
• A many-to-many relationship is where many occurrences in an entity
relate to many occurrences in another entity.
• One lecturer teaches many students and a student is taught by many
lecturers.
• Many-to-many relationships rarely exist in databases.
• Normally they occur because an entity has been missed.
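The "missed" entity is usually a link entity that sits between the other two and turns one M:N relationship into two 1:M relationships. The following is a minimal sketch using Python's built-in sqlite3 module. The table name `teaching`, the column names and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lecturer (lecturer_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE student  (student_id  TEXT PRIMARY KEY, name TEXT NOT NULL);
    -- Link entity (hypothetical name): one row per lecturer/student pairing.
    CREATE TABLE teaching (
        lecturer_id TEXT NOT NULL REFERENCES lecturer(lecturer_id),
        student_id  TEXT NOT NULL REFERENCES student(student_id),
        PRIMARY KEY (lecturer_id, student_id)
    );
    INSERT INTO lecturer VALUES ('L1', 'Adams'), ('L2', 'Brown');
    INSERT INTO student  VALUES ('S1', 'Jones'), ('S2', 'Smith');
    INSERT INTO teaching VALUES ('L1', 'S1'), ('L1', 'S2'), ('L2', 'S1');
""")


def students_of(lecturer_id):
    """Names of the students taught by one lecturer."""
    rows = conn.execute(
        "SELECT s.name FROM teaching t JOIN student s ON s.student_id = t.student_id "
        "WHERE t.lecturer_id = ? ORDER BY s.name", (lecturer_id,))
    return [name for (name,) in rows]


def lecturers_of(student_id):
    """Names of the lecturers who teach one student."""
    rows = conn.execute(
        "SELECT l.name FROM teaching t JOIN lecturer l ON l.lecturer_id = t.lecturer_id "
        "WHERE t.student_id = ? ORDER BY l.name", (student_id,))
    return [name for (name,) in rows]


print(students_of("L1"))   # prints ['Jones', 'Smith']
print(lecturers_of("S1"))  # prints ['Adams', 'Brown']
```

Each row of the link table is one occurrence of the relationship, so both directions can be queried without storing lists inside either entity.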
• Rectangles represent entity sets.
• Diamonds represent relationship sets.
• Lines link attributes to entity sets and entity sets to relationship sets.
• Ellipses represent attributes.
• Double ellipses represent multivalued attributes.
• Dashed ellipses denote derived attributes.
• Underline indicates primary key attributes (will study later).
Keys
• A key is a data item that allows us to uniquely identify individual
occurrences of an entity type.
• A key can consist of one or more fields (i.e. attributes).
• Keys facilitate sorting and quick retrieval of information from a
database.
• For instance, in a student table, a combination of the last name and
first name fields (or perhaps last name, first name and date of birth, to
ensure you identify each student uniquely) could serve as a key field.
• There are several types of key field:
• Primary Key
• Secondary Key
• Foreign key
• Simple key
• Compound key
• Composite key
Primary Key
• A primary key consists of one or more attributes that
distinguish a specific record from any other.
• In the table only one UNIQUE number exists and acts as the
primary key for each record.
• A primary key is mandatory. That is, each entity occurrence
must have a value for its primary key.
• For example:
• Your student number is a primary key as this uniquely identifies you
within the college student records system.
• An employee number uniquely identifies a member of staff within a
company.
• An IP address uniquely addresses a PC on the internet.
Secondary Key
• An entity may have one or more choices for the primary key.
• Collectively these are known as candidate keys.
• One is selected as the primary key.
• Those not selected are known as secondary keys.
• For example, a student has a student number, an identity (ID)
number and an email address.
• If the student number is chosen as the primary key then the ID number
and email address are secondary keys.
• However, it is important to note that if any student does not have an ID
number or email address (i.e. the attribute is not mandatory) then it
cannot be chosen as a primary key.
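The two conditions above (unique, and present for every occurrence) can be checked mechanically on sample data. The sketch below is illustrative only: the function name `is_candidate_key`, the attribute names and the sample student rows are hypothetical. A sample can show that an attribute fails as a key, but it cannot prove that it will always be unique.

```python
def is_candidate_key(rows, attributes):
    """True if every row has a value for the attributes and no combination repeats."""
    seen = set()
    for row in rows:
        value = tuple(row.get(attribute) for attribute in attributes)
        if any(part is None for part in value):
            return False  # a key is mandatory, so a missing value rules it out
        if value in seen:
            return False  # a repeated value means the key is not unique
        seen.add(value)
    return True


# Hypothetical sample data: one student has no email address.
students = [
    {"student_no": "S21", "id_no": "ID-1001", "email": "jones@example.org"},
    {"student_no": "S24", "id_no": "ID-1002", "email": None},
    {"student_no": "S30", "id_no": "ID-1003", "email": "richards@example.org"},
]

print(is_candidate_key(students, ["student_no"]))  # prints True
print(is_candidate_key(students, ["id_no"]))       # prints True
print(is_candidate_key(students, ["email"]))       # prints False
```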
Foreign Key
• A foreign key is one or more attributes in one entity, which enables a
link (or relationship) to another entity.
• A foreign key in one entity links to a primary key in another entity.
• However, if business rules permit, a foreign key may be optional.
• For example, an employee works in a department. The department
number column in the employee entity is a foreign key, which links to
the department entity.
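As a rough illustration of the employee and department example, the sketch below uses Python's built-in sqlite3 module. The table and column names and the sample values are assumptions made for this example. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE department (
        dept_no TEXT PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_no  TEXT PRIMARY KEY,
        surname TEXT NOT NULL,
        dept_no TEXT REFERENCES department(dept_no)  -- optional: NULL is allowed
    );
    INSERT INTO department VALUES ('D01', 'IT');
    INSERT INTO employee VALUES ('E001', 'Jones', 'D01');
""")


def try_insert_employee(emp_no, surname, dept_no):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_no, surname, dept_no))
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = try_insert_employee("E002", "Smith", "D99")    # no such department
unassigned_accepted = try_insert_employee("E003", "Lewis", None)  # foreign key left empty

print(orphan_accepted, unassigned_accepted)  # prints False True
```

The second insert succeeds because the column is declared without NOT NULL, which models the case where business rules allow the foreign key to be optional.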
Simple Keys vs Compound Keys
• A simple key consists of a single attribute to uniquely identify an
entity occurrence.
• For example, a student number uniquely identifies a particular student.
• No two students would have the same student number.
• A compound key consists of more than one attribute to uniquely
identify an entity occurrence.
• Each attribute which makes up the compound key is also a simple key in
its own right.
• For example, we have an entity named enrolment, which holds the
courses on which a student is enrolled. In this scenario a student is
allowed to enrol in more than one course.
• This has a compound key of both student number and course
number, which is required to uniquely identify a student on a
particular course.
Simple Keys vs Compound Keys
• Student number and course number combined is a compound primary
key for the enrolment entity.
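The enrolment example can be sketched with Python's built-in sqlite3 module. The column names and sample rows below are assumptions for illustration. The point is that neither attribute is unique on its own, but the combination is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrolment (
        student_no TEXT NOT NULL,
        course_no  TEXT NOT NULL,
        PRIMARY KEY (student_no, course_no)  -- compound primary key
    )
""")
conn.execute("INSERT INTO enrolment VALUES ('S21', '9201')")
conn.execute("INSERT INTO enrolment VALUES ('S21', '9267')")  # same student, another course
conn.execute("INSERT INTO enrolment VALUES ('S24', '9267')")  # same course, another student


def try_enrol(student_no, course_no):
    """Return True if the enrolment is accepted, False if the key already exists."""
    try:
        conn.execute("INSERT INTO enrolment VALUES (?, ?)", (student_no, course_no))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_enrol("S21", "9201")  # the same student on the same course again
print(duplicate_accepted)  # prints False
```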
Composite Key
• A composite key consists of more than one attribute to uniquely
identify an entity occurrence.
• This differs from a compound key in that one or more of the
attributes, which make up the key, are not simple keys in their own
right.
• For example, you have a database holding your CD collection. One of
the entities is called tracks, which holds details of the tracks on a CD.
• This has a composite key of CD name, track number.
The Process of Normalisation
• Normalisation is a data analysis technique to design a
database system.
• It allows the database designer to understand the current
data structures in an organisation.
• It aids any future changes and enhancements to the system.
• Normalisation is a technique for producing relational schema
with the following properties:
• No Information Redundancy
• No Update Anomalies
• The end result of normalisation is a set of entities which removes
unnecessary redundancy (i.e. duplication of data) and
avoids the anomalies which will be discussed next.
Anomalies
• Anomalies are inconvenient or error-prone situations arising
when we process the tables.
• There are three types of anomalies:
• Update Anomalies
• Delete Anomalies
• Insert Anomalies
Update Anomalies
• An Update Anomaly exists when one or more instances of duplicated
data are updated, but not all.
• For example, consider Jones moving address: you need to update all
instances of Jones's address.
StudentNum  CourseNum  Student Name  Address     Course
S21         9201       Jones         Edinburgh   Accounts
S21         9267       Jones         Edinburgh   Accounts
S24         9267       Smith         Glasgow     Physics
S30         9201       Richards      Manchester  Computing
S30         9322       Richards      Manchester  Maths
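The update anomaly can be reproduced in a few lines. In this sketch the rows are copied from the table above, while the new address "Aberdeen" and the helper name `addresses_by_student` are hypothetical and used only for illustration.

```python
# Rows from the table: (StudentNum, CourseNum, Student Name, Address, Course)
enrolments = [
    ("S21", "9201", "Jones", "Edinburgh", "Accounts"),
    ("S21", "9267", "Jones", "Edinburgh", "Accounts"),
    ("S24", "9267", "Smith", "Glasgow", "Physics"),
    ("S30", "9201", "Richards", "Manchester", "Computing"),
    ("S30", "9322", "Richards", "Manchester", "Maths"),
]


def addresses_by_student(rows):
    """Collect every address recorded for each student number."""
    found = {}
    for student_num, _course_num, _name, address, _course in rows:
        found.setdefault(student_num, set()).add(address)
    return found


# Jones moves (hypothetical new address), but only the first row is updated.
enrolments[0] = ("S21", "9201", "Jones", "Aberdeen", "Accounts")

inconsistent = {
    student: sorted(addresses)
    for student, addresses in addresses_by_student(enrolments).items()
    if len(addresses) > 1
}
print(inconsistent)  # prints {'S21': ['Aberdeen', 'Edinburgh']}
```

Because the address is stored once per enrolment rather than once per student, a partial update leaves the table contradicting itself.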


Delete Anomalies
• A Delete Anomaly exists when certain attributes are lost because of
the deletion of other attributes.
• For example, consider what happens if Student S30 is the last student
to leave the course: all information about the course is lost.
StudentNum  CourseNum  Student Name  Address     Course
S21         9201       Jones         Edinburgh   Accounts
S21         9267       Jones         Edinburgh   Accounts
S24         9267       Smith         Glasgow     Physics
S30         9201       Richards      Manchester  Computing
S30         9322       Richards      Manchester  Maths
Insert Anomalies
• An Insert Anomaly occurs when certain attributes cannot be inserted
into the database without the presence of other attributes.
• This is the converse of the delete anomaly: we cannot add a
new course unless we have at least one student enrolled on the
course.
StudentNum  CourseNum  Student Name  Address     Course
S21         9201       Jones         Edinburgh   Accounts
S21         9267       Jones         Edinburgh   Accounts
S24         9267       Smith         Glasgow     Physics
S30         9201       Richards      Manchester  Computing
S30         9322       Richards      Manchester  Maths
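The delete and insert anomalies can be sketched on the same rows. The data is taken from the table above. The helper names, the rule that both parts of the key (StudentNum, CourseNum) must be present, and the new course 9400 "History" are assumptions made for this illustration.

```python
# Rows from the table: (StudentNum, CourseNum, Student Name, Address, Course)
enrolments = [
    ("S21", "9201", "Jones", "Edinburgh", "Accounts"),
    ("S21", "9267", "Jones", "Edinburgh", "Accounts"),
    ("S24", "9267", "Smith", "Glasgow", "Physics"),
    ("S30", "9201", "Richards", "Manchester", "Computing"),
    ("S30", "9322", "Richards", "Manchester", "Maths"),
]


def courses_on_record(rows):
    """Course numbers that the table still knows about."""
    return {course_num for _student, course_num, *_rest in rows}


# Delete anomaly: removing student S30 also removes the only trace of course 9322.
after_delete = [row for row in enrolments if row[0] != "S30"]
lost_courses = courses_on_record(enrolments) - courses_on_record(after_delete)
print(lost_courses)  # prints {'9322'}


def insert_enrolment(rows, student_num, course_num, name, address, course):
    """Add a row, insisting that both parts of the key are present."""
    if student_num is None or course_num is None:
        raise ValueError("both StudentNum and CourseNum are required")
    return rows + [(student_num, course_num, name, address, course)]


# Insert anomaly: a new course with no students yet cannot be recorded.
try:
    insert_enrolment(enrolments, None, "9400", None, None, "History")
    course_recorded = True
except ValueError:
    course_recorded = False
print(course_recorded)  # prints False
```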
Normalisation Stages
• The process involves applying a series of tests to a relation to determine
whether it satisfies or violates the requirements of a given normal
form.
• When a test fails, the relation is decomposed into simpler relations
that individually meet the normalisation tests.
• The higher the normal form the less vulnerable to update anomalies
the relations become.
• Three Normal forms: 1NF, 2NF and 3NF were initially proposed by
Codd.
• All these normal forms are based on the functional dependencies
among the attributes of a relation.
Steps of Normalisation
• Normalisation follows a staged process that obeys a set of
rules. The steps of normalisation are:
• Step 1: Select the data source and convert into an
unnormalised table (UNF)
• Step 2: Transform the unnormalised data into first normal
form (1NF)
• Step 3: Transform data in first normal form (1NF) into second
normal form (2NF)
• Step 4: Transform data in second normal form (2NF) into
third normal form (3NF)
Steps of Normalisation
• Occasionally, the data may still be subject to
anomalies in third normal form.
• In this case, we may have to perform further
transformations.
• Transform third normal form to Boyce-Codd normal
form (BCNF)
• Transform Boyce-Codd normal form to fourth normal
form (4NF)
• Transform fourth normal form to fifth normal form
(5NF)
Normalisation by Example
• We will demonstrate the process of normalisation (to 3NF)
by use of an example.
• Normalisation is a bottom-up technique for database design,
normally based on an existing system (which may be paper-
based).
• We start by analysing the documentation, e.g. reports,
screen layouts from that system.
• We will begin with the Project Management Report, which
describes projects being worked upon by employees.
• This report is to be 'normalised'.
• Each of the first four normalisation steps is explained.
Step 1
• Select the data source (i.e. the report from the previous
page) and convert into an unnormalised table (UNF).
• The process is as follows:
• Create column headings for the table for each data item on the
report (ignoring any calculated fields).
• A calculated field is one that can be derived from other
information on the form. In this case total staff and average
hourly rate.
• Enter sample data into the table. This data is not simply the data on
the report but a representative sample.
• In this example it shows several employees working on several
projects. In this company the same employee can work on
different projects and at a different hourly rate.
Step 1, continued…
• Identify a key for the table (and underline it).
• Remove duplicate data. In this example, for the chosen
key of Project Code, the values for Project Code, Project
Title, Project Manager and Project Budget are duplicated
if there are two or more employees working on the same
project.
• Project Code is chosen as the key, and the duplicate data
associated with each project code is removed.
• Do not confuse duplicate data with repeating attributes,
which are described in the next step.
UNF: unnormalised table
Step 2
• Transform a table of unnormalised data into first normal form (1NF)
by moving any repeating attributes to a new table.
• A repeating attribute is a data field within the UNF relation that may
occur with multiple values for a single value of the key. The process is
as follows:
• Identify repeating attributes.
• Remove these repeating attributes to a new table together with
a copy of the key from the UNF table.
• Assign a key to the new table (and underline it).
• The key from the original unnormalised table always becomes part of
the key of the new table.
• A compound key is created.
• The value for this key must be unique for each entity occurrence.
Important to Note that:
• After removing the duplicate data, the repeating attributes are easily
identified.
• In the previous table the Employee No, Employee Name, Department
No, Department Name and Hourly Rate attributes are repeating.
• That is, there is potential for more than one occurrence of these
attributes for each project code.
• These are the repeating attributes and have been moved to a new
table together with a copy of the original key (i.e. Project Code).
• A key of Project Code and Employee No has been defined for this new
table.
• This combination is unique for each row in the table.
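Step 2 can be sketched as a small transformation. The values below are a subset of the rows shown in the 1NF tables (two of the three projects, to keep the example short). The nested layout used for the unnormalised data and the function name `to_1nf` are assumptions made for this illustration.

```python
# Unnormalised form: each project carries a repeating group of employees.
# Employee tuple: (Employee No, Employee Name, Department No, Department Name, Hourly Rate)
unf = [
    {"project_code": "PC010", "title": "Pensions System",
     "manager": "M Phillips", "budget": 24500,
     "employees": [
         ("S10001", "A Smith", "L004", "IT", 22.00),
         ("S10030", "L Jones", "L023", "Pensions", 18.50),
         ("S21010", "P Lewis", "L004", "IT", 21.00),
     ]},
    {"project_code": "PC064", "title": "HR System",
     "manager": "K Lewis", "budget": 12250,
     "employees": [
         ("S31002", "T Gilbert", "L028", "Database", 23.25),
         ("S21010", "P Lewis", "L004", "IT", 17.50),
         ("S10034", "B James", "L009", "HR", 16.50),
     ]},
]


def to_1nf(unf_rows):
    """Move the repeating attributes to a new table, copying the original key."""
    projects, project_employees = [], []
    for row in unf_rows:
        projects.append((row["project_code"], row["title"], row["manager"], row["budget"]))
        for employee in row["employees"]:
            # New key is (Project Code, Employee No): a compound key.
            project_employees.append((row["project_code"], *employee))
    return projects, project_employees


projects, project_employees = to_1nf(unf)
print(len(projects), len(project_employees))  # prints 2 6
```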
1NF Tables: Repeating Attributes Removed
Project Code  Project Title    Project Manager  Project Budget
PC010         Pensions System  M Phillips       24500
PC045         Salaries System  H Martin         17400
PC064         HR System        K Lewis          12250


1NF Tables: Repeating Attributes Removed
Project Code  Employee No.  Employee Name  Department No.  Department Name  Hourly Rate
PC010         S10001        A Smith        L004            IT               22.00
PC010         S10030        L Jones        L023            Pensions         18.50
PC010         S21010        P Lewis        L004            IT               21.00
PC045         S10010        B Jones        L004            IT               21.75
PC045         S10001        A Smith        L004            IT               18.00
PC045         S31002        T Gilbert      L028            Database         25.50
PC045         S13210        W Richards     L008            Salary           17.00
PC064         S31002        T Gilbert      L028            Database         23.25
PC064         S21010        P Lewis        L004            IT               17.50
PC064         S10034        B James        L009            HR               16.50
Step 3: 1NF → 2NF
• Remove any non-key attributes that depend on only part of the
table key (partial dependencies) to a new table.
• What has to be determined "is field A dependent upon field B or vice
versa?"
• This means: "Given a value for A, do we then have only one possible
value for B, and vice versa?"
• If the answer is yes, A and B should be put into a new relation with A
becoming the primary key.
• A should be left in the original relation and marked as a foreign key.
• Ignore tables with (a) a simple key or (b) with no non-key attributes
(these go straight to 2NF with no conversion).
Step 3: 1NF → 2NF
• The process is as follows:
• Take each non-key attribute in turn and ask the question: is this
attribute dependent on one part of the key?
• If yes, remove the attribute to a new table with a copy of
the part of the key it is dependent upon.
• The key it is dependent upon becomes the key in the new table.
Underline the key in this new table.
• If no, check against other part of the key and repeat above
process
• If still no, i.e. not dependent on either part of the key, keep
attribute in current table.
Important to Note that:
• The first table went straight to 2NF as it has a simple key
(Project Code).
• Employee name, Department No and Department Name are
dependent upon Employee No only.
• Therefore, they were moved to a new table with Employee
No being the key.
• However, Hourly Rate is dependent upon both Project Code
and Employee No as an employee may have a different
hourly rate depending upon which project they are working
on.
• Therefore it remained in the original table.
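The question "given a value for A, do we have only one possible value for B?" can be tested against sample rows. In the sketch below the rows are those of the second 1NF table, while the function name `determines` and the column labels are assumptions for illustration. A sample can only disprove a dependency. Whether it always holds is a business rule.

```python
COLUMNS = ("project_code", "employee_no", "employee_name",
           "dept_no", "dept_name", "hourly_rate")

rows_1nf = [
    ("PC010", "S10001", "A Smith", "L004", "IT", 22.00),
    ("PC010", "S10030", "L Jones", "L023", "Pensions", 18.50),
    ("PC010", "S21010", "P Lewis", "L004", "IT", 21.00),
    ("PC045", "S10010", "B Jones", "L004", "IT", 21.75),
    ("PC045", "S10001", "A Smith", "L004", "IT", 18.00),
    ("PC045", "S31002", "T Gilbert", "L028", "Database", 25.50),
    ("PC045", "S13210", "W Richards", "L008", "Salary", 17.00),
    ("PC064", "S31002", "T Gilbert", "L028", "Database", 23.25),
    ("PC064", "S21010", "P Lewis", "L004", "IT", 17.50),
    ("PC064", "S10034", "B James", "L009", "HR", 16.50),
]


def determines(rows, a_columns, b_column):
    """True if, in these rows, each value of A appears with only one value of B."""
    a_index = [COLUMNS.index(name) for name in a_columns]
    b_index = COLUMNS.index(b_column)
    seen = {}
    for row in rows:
        a_value = tuple(row[i] for i in a_index)
        if seen.setdefault(a_value, row[b_index]) != row[b_index]:
            return False
    return True


# Partial dependencies: these attributes depend on Employee No alone.
print(determines(rows_1nf, ["employee_no"], "employee_name"))  # prints True
print(determines(rows_1nf, ["employee_no"], "dept_no"))        # prints True
# Hourly Rate needs the whole key.
print(determines(rows_1nf, ["employee_no"], "hourly_rate"))    # prints False
print(determines(rows_1nf, ["project_code"], "hourly_rate"))   # prints False
print(determines(rows_1nf, ["project_code", "employee_no"], "hourly_rate"))  # prints True
```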
2NF Tables: Partial Key Dependencies Removed

Project Code  Project Title    Project Manager  Project Budget
PC010         Pensions System  M Phillips       24500
PC045         Salaries System  H Martin         17400
PC064         HR System        K Lewis          12250


2NF Tables: Partial Key Dependencies Removed

Project Code  Employee No.  Hourly Rate
PC010         S10001        22.00
PC010         S10030        18.50
PC010         S21010        21.00
PC045         S10010        21.75
PC045         S10001        18.00
PC045         S31002        25.50
PC045         S13210        17.00
PC064         S31002        23.25
PC064         S21010        17.50
PC064         S10034        16.50

Employee No.  Employee Name  Department No.  Department Name
S10001        A Smith        L004            IT
S10030        L Jones        L023            Pensions
S21010        P Lewis        L004            IT
S10010        B Jones        L004            IT
S31002        T Gilbert      L028            Database
S13210        W Richards     L008            Salary
S10034        B James        L009            HR
Step 4
• Move relations in second normal form (2NF) into third normal
form (3NF).
• Move to a new table any non-key attributes that are more
dependent on other non-key attributes than the table key.
• What has to be determined is "is field A dependent upon field B
or vice versa?"
• This means: "Given a value for A, do we then have only one
possible value for B, and vice versa?"
• If the answer is yes, then A and B should be put into a new
relation, with A becoming the primary key.
• A should be left in the original relation and marked as a foreign
key.
• Ignore tables with zero or only one non-key attribute (these go
straight to 3NF with no conversion).
Step 4
• The process is as follows: If a non-key attribute is more
dependent on another non-key attribute than the table
key:
• Move the dependent attribute, together with a copy of
the non-key attribute upon which it is dependent, to a
new table.
• Make the non-key attribute, upon which it is dependent,
the key in the new table. Underline the key in this new
table.
• Leave the non-key attribute, upon which it is dependent,
in the original table and mark it a foreign key (*).
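Applied to the 2NF employee table, Step 4 moves Department Name out, because it depends on Department No rather than on Employee No. The sketch below uses the rows of that table. The function name `split_transitive` is an assumption for illustration.

```python
# 2NF employee table: (Employee No, Employee Name, Department No, Department Name)
employees_2nf = [
    ("S10001", "A Smith", "L004", "IT"),
    ("S10030", "L Jones", "L023", "Pensions"),
    ("S21010", "P Lewis", "L004", "IT"),
    ("S10010", "B Jones", "L004", "IT"),
    ("S31002", "T Gilbert", "L028", "Database"),
    ("S13210", "W Richards", "L008", "Salary"),
    ("S10034", "B James", "L009", "HR"),
]


def split_transitive(rows):
    """Split off Department Name, keeping Department No behind as a foreign key."""
    employees = [(emp_no, name, dept_no) for emp_no, name, dept_no, _dept_name in rows]
    departments = sorted({(dept_no, dept_name) for _e, _n, dept_no, dept_name in rows})
    return employees, departments


employees_3nf, departments_3nf = split_transitive(employees_2nf)
print(len(employees_3nf), len(departments_3nf))  # prints 7 5

# Joining on the foreign key gives back the original rows, so nothing is lost.
department_name = dict(departments_3nf)
rejoined = [(e, n, d, department_name[d]) for e, n, d in employees_3nf]
print(rejoined == employees_2nf)  # prints True
```

The department name "IT" is now stored once instead of three times.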
3NF Tables: Non-Key Dependencies Removed
Project Code  Project Title    Project Manager  Project Budget
PC010         Pensions System  M Phillips       24500
PC045         Salaries System  H Martin         17400
PC064         HR System        K Lewis          12250


3NF Tables: Non-Key Dependencies Removed
Project Code  Employee No.  Hourly Rate
PC010         S10001        22.00
PC010         S10030        18.50
PC010         S21010        21.00
PC045         S10010        21.75
PC045         S10001        18.00
PC045         S31002        25.50
PC045         S13210        17.00
PC064         S31002        23.25
PC064         S21010        17.50
PC064         S10034        16.50
3NF Tables: Non-Key Dependencies Removed
Employee No.  Employee Name  Department No. *
S10001        A Smith        L004
S10030        L Jones        L023
S21010        P Lewis        L004
S10010        B Jones        L004
S31002        T Gilbert      L028
S13210        W Richards     L008
S10034        B James        L009
3NF Tables: Non-Key Dependencies Removed
Department No. Department Name
L004 IT
L023 Pensions
L028 Database
L008 Salary
L009 HR
Summary of Normalisation Rules
• That is the complete process.
• Having started off with an unnormalised table we finished
with four normalised tables in 3NF.
• You will notice that duplication has been removed (apart
from the keys needed to establish the links between those
tables).
• The process may look complicated.
• However, if you follow the rules completely, and do not miss
out any steps, then you should arrive at the correct solution.
• If you omit a rule there is a high probability that you will end
up with too few tables or incorrect keys.
Summary of Normalisation Rules
• The following normal forms were discussed in this section:
• First normal form: A table is in the first normal form if it contains
no repeating columns.
• Second normal form: A table is in the second normal form if it is
in the first normal form and contains only columns that are
dependent on the whole (primary) key.
• Third normal form: A table is in the third normal form if it is in
the second normal form and all the non-key columns are
dependent only on the primary key. If the value of a non-key
column is dependent on the value of another non-key column we
have a situation known as transitive dependency. This can be
resolved by removing the columns dependent on non-key items
to another table.
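As a closing illustration, the four 3NF tables can be declared and queried with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table and column names are hypothetical, and only the rows for project PC010 are loaded. It also shows why the calculated fields (total staff and average hourly rate) were left out in Step 1: they can be derived by a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (
        project_code TEXT PRIMARY KEY, title TEXT, manager TEXT, budget INTEGER);
    CREATE TABLE department (
        dept_no TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        employee_no TEXT PRIMARY KEY, name TEXT,
        dept_no TEXT REFERENCES department(dept_no));
    CREATE TABLE project_employee (
        project_code TEXT REFERENCES project(project_code),
        employee_no  TEXT REFERENCES employee(employee_no),
        hourly_rate  REAL,
        PRIMARY KEY (project_code, employee_no));

    INSERT INTO project VALUES ('PC010', 'Pensions System', 'M Phillips', 24500);
    INSERT INTO department VALUES ('L004', 'IT'), ('L023', 'Pensions');
    INSERT INTO employee VALUES
        ('S10001', 'A Smith', 'L004'),
        ('S10030', 'L Jones', 'L023'),
        ('S21010', 'P Lewis', 'L004');
    INSERT INTO project_employee VALUES
        ('PC010', 'S10001', 22.00),
        ('PC010', 'S10030', 18.50),
        ('PC010', 'S21010', 21.00);
""")

# Rebuild the report lines for one project by joining on the keys.
report = conn.execute("""
    SELECT e.employee_no, e.name, d.dept_name, pe.hourly_rate
    FROM project_employee pe
    JOIN employee e   ON e.employee_no = pe.employee_no
    JOIN department d ON d.dept_no = e.dept_no
    WHERE pe.project_code = 'PC010'
    ORDER BY e.employee_no
""").fetchall()

# Calculated fields are derived on demand rather than stored.
total_staff, average_rate = conn.execute(
    "SELECT COUNT(*), AVG(hourly_rate) FROM project_employee WHERE project_code = 'PC010'"
).fetchone()

print(report[0])                  # prints ('S10001', 'A Smith', 'IT', 22.0)
print(total_staff, average_rate)  # prints 3 20.5
```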
