Data Modeling and Relational Database Design Oracle Course
Data Modeling and Relational Database Design Oracle Course
Database Design
Volume 1 • Student Guide
...........................................................................................
®
Authors Copyright Oracle Corporation, 1998, 1999,2001. All rights reserved.
The information in this document is subject to change without notice. If you find
any problems in the documentation, please report them in writing to Education
Products, Oracle Corporation, 500 Oracle Parkway, Box 659806, Redwood
Publishers Shores, CA 94065. Oracle Corporation does not warrant that this document is
error-free.
Avril Price-Budgen
Fiona Simpson Oracle, SQL*Plus, SQL*Net, Oracle Developer, Oracle7, Oracle8, Oracle
Designer and PL/SQL are trademarks or registered trademarks of Oracle
Don Griffin Corporation.
All other products or company names are used for identification purposes only,
and may be trademarks of their respective owners.
Contents
.....................................................................................................................................................
Contents
Lesson 1: Introduction to Entities, Attributes, and Relationships
Introduction 1-2
Why Conceptual Modeling? 1-4
Entity Relationship Modeling 1-7
Goals of Entity Relationship Modeling 1-8
Database Types 1-9
Entities 1-10
Entities and Sets 1-12
Attributes 1-13
Relationships 1-15
Entity Relationship Models and Diagrams 1-17
Representation 1-18
Attribute Representation 1-19
Relationship Representation 1-20
Data and Functionality 1-23
Types of Information 1-24
Other Graphical Elements 1-27
Summary 1-28
Practice 1—1: Instance or Entity 1-29
Practice 1—2: Guest 1-30
Practice 1—3: Reading 1-31
Practice 1—4: Read and Comment 1-32
Practice 1—5: Hotel 1-33
Practice 1—6: Recipe 1-34
General Instructor Notes 1-35
Practices 1-38
Suggested Timing 1-41
Workshop Interviewing 1-42
.....................................................................................................................................................
®
iii
Contents
.....................................................................................................................................................
.....................................................................................................................................................
iv Data Modeling and Relational Database Design
Contents
.....................................................................................................................................................
Lesson 4: Constraints
Introduction 4-2
Identification 4-4
Unique Identifier 4-6
Arcs 4-12
Arc or Subtypes 4-16
More About Arcs and Subtypes 4-17
Hidden Relationships 4-18
Domains 4-19
Some Special Constraints 4-20
Summary 4-24
Practice 4—1: Identification Please 4-25
Practice 4—2: Identification 4-26
Practice 4—3: Moonlight UID 4-28
Practice 4—4: Tables 4-29
Practice 4—5: Modeling Constraints 4-30
.....................................................................................................................................................
®
v
Contents
.....................................................................................................................................................
.....................................................................................................................................................
vi Data Modeling and Relational Database Design
Contents
.....................................................................................................................................................
.....................................................................................................................................................
®
vii
Contents
.....................................................................................................................................................
Appendix A: Solutions
Introduction to Solutions A-2
Practice 1—1 Instance or Entity: Solution A-4
Practice 1—2 Guest: Solution A-5
Practice 1—3 Reading: Solution A-6
Practice 1—4 Read and Comment: Solution A-7
Practice 1—5 Hotel: Solution A-8
Practice 1—6 Recipe: Solution A-9
Practice 2—1 Books: Solution A-11
Practice 2—2 Moonlight: Solution A-12
Practice 2—3 Shops: Solution A-13
Practice 2—4 Subtypes: Solution A-14
Practice 2—5 Schedule: Solution A-15
Practice 2—6 Address: Solution A-16
Practice 3—1 Read the Relationship: Solution A-18
Practice 3—2 Find a Context: Solution A-19
Practice 3—3 Name the Intersection Entity: Solution A-20
Practice 3—4 Receipt: Solution A-21
Practice 3—5 Moonlight P&O: Solution A-23
Practice 3—6 Price List: Solution A-27
Practice 3—7 E-mail: Solution A-28
Practice 3—8 Holiday: Solution A-30
Practice 3—9: Normalize an ER Model: Solution A-32
Practice 4—1 Identification Please: Solution A-34
Practice 4—2 Identification: Solution A-36
Practice 4—3 Moonlight UID: Solution A-39
Practice 4—4 Tables: Solution A-40
Practice 4—5 Constraints: Solution A-41
.....................................................................................................................................................
viii Data Modeling and Relational Database Design
Contents
.....................................................................................................................................................
Appendix B: Normalization
Introduction B-2
Normalization and its Benefits B-3
First Normal Form B-7
Second Normal Form B-9
Third Normal Form B-11
Summary B-13
.....................................................................................................................................................
®
ix
Contents
.....................................................................................................................................................
.....................................................................................................................................................
x Data Modeling and Relational Database Design
.................................
Introduction to
Entities, Attributes, and
Relationships
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
Introduction
Lesson Aim
This lesson explains the reasons for conceptual modeling and introduces the key role
players: entities, attributes, and relationships.
Overview
1-2
.............................................................................................................................................
1-2 Data Modeling and Relational Database Design
Introduction
..........................................................................................................................................
Objectives
At the end of this lesson, you should be able to do the following:
• Explain why conceptual modeling is important
• Describe what an entity is and give examples
• Describe what an attribute is and give examples
• Describe what a relationship is and give examples
• Draw a simple diagram
• Read a simple diagram
..........................................................................................................................................
®
1-3
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
1-3
This list shows the reasons for creating a conceptual model. The most important
reason is that a conceptual model facilitates the discussion on the shape of the future
system. It helps communication between you and your sponsor as well as you and your
colleagues. A model also forms a basis for the default design of the physical database.
Last but not least, it is relatively cheap to make and very cheap to change.
.............................................................................................................................................
1-4 Data Modeling and Relational Database Design
Why Conceptual Modeling?
..........................................................................................................................................
1-4
A building contractor needs a solid plan, a set of blueprints of the house with a
description of the materials to be used, the size of the roof beams, the capacity of the
plumbing and many, many other things. The contractor follows the plan, and has the
knowledge to construct what is on the blueprint. But how do the ideas of the home
owner become the blueprint for contractor? This is where the architect becomes
involved.
..........................................................................................................................................
®
1-5
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
The Architect
The architects are the intermediary between sponsor and constructor. They are trained
in the skills of translating ideas into models. The architect listens to the description of
the ideas and asks all kinds of questions. The architect’s skills in extracting the ideas,
putting it down in a format that allows discussion and analysis, giving advice,
describing sensible options, documenting it, and confirming it with the home owners,
are the cornerstones to providing the future home-owner with a plan of the home they
want.
Sketches
The architect’s understanding of the dreams is transformed into sketches of the new
house—only sketches! These consist of floor plans and several artist’s impressions,
and show the functional requirements of the house, not the details of the construction.
This is a conceptual model, the first version.
Easy Change
If parts of the model are not satisfactory or are misunderstood, the model can easily be
changed. Such a change would only need a little time and an eraser, or a fresh sheet of
paper. Remember, it is only changing a model. The cost of change at this stage is very
low. Certainly it is far less costly than making changes to the floor plan or roof
dimensions after construction has started. The house model is then reviewed again,
and further changes are made. The architect continues to explore and clarify the
dreams and make alternative suggestions until all controversial issues are settled, and
the model is stable and ready for the final approval by the sponsor.
Technical Design
Then the architect converts the model into a technical design, a plan the contractor can
use to build the house. Calculations are made to determine, for example, the number of
doors, how thick the walls and floor beams must be, the dimensions of the plumbing,
and the exact construction of the roof. These are technical issues that need not involve
the customer.
.............................................................................................................................................
1-6 Data Modeling and Relational Database Design
Entity Relationship Modeling
..........................................................................................................................................
defined by applied to
part of
ORGANIZATION
•
o EMAIL
* NAME
Models business,
o POSTAL CODE
at o
o
REGION
STREET
o TOWN parent organization o
o TELEPHONE NUMBER
TITLE MOVIE o CONTACT NAME
# PRODUCT CODE* CATEGORY o CONTACT EXTENSION
* TITLE o AGE RATING
o DESCRIPTION * DURATION
* MONOCHROME GAME
not implementation
o AUDIO * CATEGORY
o PREVIEW * MEDIUM SUPPLIER
o MINIMUM MEMOR # SUPPLIER CODE
o EMAIL
available * APPROVED
for * REFERENCE
the source of
PRICE HISTORY
# EFFECTIVE DATE reviewed inavailable as on
* PRICE
•
* DEFAULT DAYS
* OVERDUE RATE
Is a well-established
OTHER ORGANIZATION
PUBLICATION
# REFERENCE
* TITLE
CATALOG the holder o
# REFERENCE
technique
o VOLUME o CATALOG DAT
o ISSUE o DESCRIPTION CUSTOMER
o PUBLISH DATE o EMAIL
managed b
* DESIGNATION
EMPLOYEE the manager o * FIRST NAME
the source of the source of * POSITION
* LAST NAME
o OTHER INITIALS
* LAST NAME * STREET
o FIRST NAME * TOWN
o OTHER INITIALS * POSTAL CODE
o EMAIL * REGION
o HOME PHONE
•
acquired fro o WORK EXTENSION
the cancellor of responsible
COPY
* ACQUIRE DATE
* PURCHASE COST
* SHELF CODE
the holder of
o CONDITION responsible for
o CUSTOMER REMARKS
•
... MEMBERSHIP TYPE
# CODE
Results in easy-to-
* DESCRIPTION
rented on reserved on * DISCOUNT PERCENTAGE
o STANDARD FEE
# SEQUENCE
* ARTICLE
* HOT
in
of
approved by
MEMBERSHIP
# NUMBER
o TERMINATION REASON
o TERMINATION DATE
of
o AUTHOR
o URL
renewed fo used fo
the reservation for r r
cancelled by
for
requested authorized by of
approved by
part of
# LINE NO
* RENTAL PERIOD
* PRICE PAID
o RETURN DATE
o STAFF REMARKS
1-5
..........................................................................................................................................
®
1-7
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
1-6
.............................................................................................................................................
1-8 Data Modeling and Relational Database Design
Database Types
..........................................................................................................................................
Database Types
Database Types
ER Model
Hierarchical Network
Relational
1-7
..........................................................................................................................................
®
1-9
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
Entities
This section gives definitions and examples.
Entity
• An Entity is:
– “Something” of significance to the business
about which data must be known.
– A name for the things that you can list.
– Usually a noun.
• Examples: objects, events
• Entities have instances.
1-8
Definition of an Entity
There are many definitions and descriptions of an entity. Here are a few; some are
quite informal, some are very precise.
• An entity is something of interest.
• An entity is a category of things that are important for a business, about which
information must be kept.
• An entity is something you can make a list of, and which is important for the
business.
• An entity is a class or type of things.
• An entity is a named thing, usually a noun.
Two important aspects of an entity are that it has instances and that the instances of the
entity somehow are of interest to the business.
Note the difference between an entity and an instance of an entity.
.............................................................................................................................................
1-10 Data Modeling and Relational Database Design
Entities
..........................................................................................................................................
More on Entities
The illustration shows examples of entities and examples of instances of those entities.
Note:
• There are many entities.
• Some entities have many instances, some have only a few.
• Entities can be:
– Tangible, like PERSON or PRODUCT.
– Non-tangible, like REQUIRED SKILL LEVEL.
– An event, like ELECTION.
• An instance of one entity may be an entity in its own right: the instance “violinist”
of entity JOB could be the name of another entity with instances like “David
Oistrach”, “Kyung-Wha Chung.”
..........................................................................................................................................
®
1-11
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
JOB
manager
cook
waitress
dish washer
financial controller
porter
waiter
piano player
1-10
You can regard entities as sets. The illustration shows a set JOB and the set shows
some of its instances. At the end of the entity modeling process entities are
transformed into tables; the rows of those tables represent an individual instance.
During entity modeling you look for properties and rules that are true for the whole
set. Often you can decide on the rules by thinking about example instances. The
following lessons contain many examples of this.
Set Theory
Entity relationship modeling and the theory of relational databases are both based on a
sound mathematical theory, that is, set theory.
.............................................................................................................................................
1-12 Data Modeling and Relational Database Design
Attributes
..........................................................................................................................................
Attributes
Attribute
1-11
What is an Attribute?
An attribute is a piece of information that in some way describes an entity. An attribute
is a property of the entity, a small detail about the entity.
..........................................................................................................................................
®
1-13
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
Attribute Examples
Attribute Examples
Entity Attribute
EMPLOYEE Family Name, Age, Shoe Size,
Town of Residence, Email, ...
CAR Model, Weight, Catalog Price, …
ORDER Order Date, Ship Date, …
JOB Title, Description, ...
TRANSACTION Amount, Transaction Date, …
EMPLOYMENT Start Date, Salary, ...
CONTRACT
1-12
Note:
• Attribute Town of Residence for EMPLOYEE is an example of an attribute that is
quite likely to change, but is probably single valued at any point in time.
• Attribute Shoe Size may seem to be of no importance, but that depends on the
business: if the business supplies industrial clothing to its employees, this may be a
very sensible attribute to take.
• Attribute Family Name may not seem to be single-valued for someone with a
double name. This double name, however, can be regarded as a single string of
characters that forms just one name.
Volatile Attributes
Some attributes are volatile (unstable). An example is the attribute Age. Always look
for nonvolatile, stable, attributes. If there is a choice, use the nonvolatile one. For
example, use the attribute Birth Date instead of Age.
.............................................................................................................................................
1-14 Data Modeling and Relational Database Design
Relationships
..........................................................................................................................................
Relationships
Relationships
1-13
Relationship Examples
1-14
..........................................................................................................................................
®
1-15
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
Numerical observation:
• All EMPLOYEES have a JOB
• No EMPLOYEE has more than one JOB
• Not all JOBS are held by an EMPLOYEE
• Some JOBS are held by more than one EMPLOYEE
1-15
Based on what you know about instances of the entities, you can decide on four
questions:
• Must every employee have a job?
In other words, is this a mandatory or optional relationship for an employee?
• Can employees have more than one job?
and
• Must every job be done by an employee?
In other words, is this a mandatory or optional relationship for a job?
• Can a job be done by more than one employee?
Later on we will see why these questions are important and why (and how) the
answers have an impact on the table design.
.............................................................................................................................................
1-16 Data Modeling and Relational Database Design
Entity Relationship Models and Diagrams
..........................................................................................................................................
Graphical Elements
Entity Relationship diagramming uses a number of graphical elements. These are
discussed in the next pages.
Unfortunately, there is no ISO standard representation of ER diagrams. Oracle has its
own convention. In this course we use the Oracle diagramming technique, which is
built into the Oracle Designer tool.
..........................................................................................................................................
®
1-17
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
Representation
Entity
• Drawn as a “softbox”
• Name singular
EMPLOYEE JOB
• Name inside
ELECTION
• Neither size,
nor position
has a special TICKET
meaning ORDER
RESERVATION
JOB ASSIGNMENT
1-16
In an ER diagram entities are drawn as soft boxes with the entity name inside. Borders
of the entity boxes never cross each other. Entity boxes are always drawn upright.
Throughout this book, entity names are printed in capitals. Entity names are preferably
in the singular form; you will find that diagrams are easier to read this way.
Box Size
Neither the size of an entity, nor its position, has a special meaning. However, a reader
might construe a larger entity to be of more importance than a smaller one.
Where Entities Lead
During the design for a relational database, an entity usually leads to a table.
.............................................................................................................................................
1-18 Data Modeling and Relational Database Design
Attribute Representation
..........................................................................................................................................
Attribute Representation
Attributes in Diagrams
EMPLOYEE JOB
* Family Name * Title
* Address o Description
o Birth Date
o Shoe Size
o Email
Attributes are listed within the entity box. They may be preceded by a * or an O. These
symbols mean that the attribute is mandatory or optional, respectively. Throughout
this book attributes are printed in Initial Capital format.
* Mandatory: It is realistic to assume that for every instance of the entity the
attribute value is known and available when the entity instance is recorded and that
there is a business need to record the value.
o Optional: The value of the attribute for an instance of the entity may be unknown
or unavailable when that instance is recorded or the value may be known but of no
importance.
Not all attributes of an entity need to be present in the diagram, but all attributes must
be known before making the table design. Often only a few attributes are shown in a
diagram, for reasons of clarity and readability. Usually you choose those attributes that
help understanding of what the entity is about and which more or less “define” the
entity.
Where Attributes Lead
During design an attribute usually leads to a column. A mandatory attribute leads to a
not null column.
..........................................................................................................................................
®
1-19
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
Relationship Representation
Relationships are represented by a line, connecting the entities. The name of the
relationship, from either perspective, is printed near the starting point of the
relationship line.
The shape of the end of the relationship line represents the degree of the relationship.
This is either one or many. One means exactly one; many means one or more.
Relationship in Diagrams
EMPLOYEE JOB
has
held by
1-18
In the above example, it is assumed that JOBS are held by one or more EMPLOYEES.
This is shown by the tripod (or crowsfoot), at EMPLOYEE.
An EMPLOYEE, on the other hand, is assumed here to have exactly one JOB. This is
represented by the single line at JOB.
The relationship line may be straight, but may also be curved; curves have no special
meaning, nor does the position of the starting point of the relationship line. The
diagram below represents exactly the same model, but arguably less clearly.
has
JOB
EMPLOYEE
held by
.............................................................................................................................................
1-20 Data Modeling and Relational Database Design
Relationship Representation
..........................................................................................................................................
mandatory: optional:
held by
When you read the relationship, imagine it split into two perspectives:
held by
held by
..........................................................................................................................................
®
1-21
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
P split into Q
part of
“Each “QEachmust
Q
be
must be partpart
may be
of
of exactly
exactly one P
one P”
one or more Ps
”
1-24
.............................................................................................................................................
1-22 Data Modeling and Relational Database Design
Data and Functionality
..........................................................................................................................................
1-25
..........................................................................................................................................
®
1-23
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
Types of Information
Weather Forecast
-DQXDU\
.¡EHQKDYQ
%UHPHQ
%HUOLQ
0QFKHQ
$PVWHUGDP
%UX[HOOHV
3DULV
%RUGHDX[
1-26
.............................................................................................................................................
1-24 Data Modeling and Relational Database Design
Types of Information
..........................................................................................................................................
DK København
(Copenhagen)
IR UK
NL Bremen
Amsterdam
Berlin
BE Bruxelles DE
(Brussels)
LU München
Paris (Munich)
FR
CH
Bordeaux IT
1-27
You may notice that the cities in the weather forecast are not printed in a random order.
The German cities (Bremen, Berlin and Munchen) are grouped together, just as the
French cities are. Moreover, the cities are not ordered alphabetically by name but seem
to be ordered North-South. Apparently this report “knows” something to facilitate the
grouping and sorting. This could be:
• Country of the city
• Geographical position of the city
and maybe even
• Geographical position of the country
Next Step
Try to identify which of the above types of information is probably an entity, which is
an attribute and which is a relationship.
City and Country are easy. These are entities, both with, at least, attribute Name and
Geographical Position. Weather Type could also be an entity as there is an attribute
available: Icon. For the same reason there could be an entity Wind Direction. Now,
where does this leave the temperatures and forecast date? These cannot be attributes of
City as the forecast date is not single value for a City: there can be many forecast dates
for a city. This is how you discover that there is still one entity missing, such as
Forecast, with attributes Date, Minimum and Maximum Temperature, Wind Force.
..........................................................................................................................................
®
1-25
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
subject of
about
1-28
In this entity relationship diagram some assumptions are made about the relationships:
• Every FORECAST must be about one CITY, and
not all CITIES must be in a FORECAST—but may be in many
• Every CITY is located in a COUNTRY, and
every COUNTRY has one or more CITIES
• A FORECAST must not always contain a WEATHER TYPE, and
not all WEATHER TYPES are in a FORECAST—but may be in many
• A FORECAST must not always contain a WIND DIRECTION, and
not all WIND DIRECTIONS are in a FORECAST—but may be in many
The rationale behind these assumptions is that we consider an incomplete FORECAST
still to be a FORECAST, unless we do not know the date or the CITY the FORECAST
refers to.
.............................................................................................................................................
1-26 Data Modeling and Relational Database Design
Other Graphical Elements
..........................................................................................................................................
Entity
Attribute ** **
o
Relationship
Subtype
Unique identifier
Arc
Nontransferability
#o #
1-29
The illustration shows all graphical elements you can encounter in a ER diagram. You
saw earlier how to represent an entity, an attribute, and a relationship.
The lessons following this one discuss the remaining four types of elements:
• Subtype, represented as an entity within the boundary of another entity
• Unique identifier, represented as a # in front of an attribute or as a bar across a
relationship line
• Arc, represented as an arc-shaped line across two or more relationship lines
• Nontransferability symbol, represented as a diamond across a relationship line
..........................................................................................................................................
®
1-27
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
Summary
Conceptual models are created to model the functional and information needs of a
business. These models may be based on the current needs but can also be a reflection
of future needs. This course is about modeling the information needs. Functional
needs cannot be ignored while modeling data, as these form the only legitimate basis
for the data model. Ideally, the conceptual models are created free of any consideration
of the possible technical problems during implementation. Consequently the model is
only concerned with what the business does and needs and not with how it can be
realized.
Summary
1-30
.............................................................................................................................................
1-28 Data Modeling and Relational Database Design
Practice 1—1: Instance or Entity
..........................................................................................................................................
Your Assignment
List which of the following concepts you think is an Entity, Attribute, or Instance. If
you mark one as an entity, then give an example instance. If you mark one as an
attribute or instance, give an entity. For the last three rows, find a concept that fits.
1-32
..........................................................................................................................................
®
1-29
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
Scenario
On the left side of the illustration are three entities that play a role in a hotel
environment: GUEST, HOTEL, and ROOM. On the right is a choice of attributes.
Your Assignment
Draw a line between the attribute and the entity or entities it describes.
Practice: Guest
Address
Arrival Date
Family Name
GUEST
Room Number
HOTEL
Floor Number
ROOM
Number of Beds
Number of Parking Lots
Price
TV set available?
1-33
.............................................................................................................................................
1-30 Data Modeling and Relational Database Design
Practice 1—3: Reading
..........................................................................................................................................
Your Assignment
Which text corresponds to the diagram?
Practice: Reading
1-34
..........................................................................................................................................
®
1-31
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
living in
home town of
visitor of
visited by
mayor of
with mayor
1-35
.............................................................................................................................................
1-32 Data Modeling and Relational Database Design
Practice 1—5: Hotel
..........................................................................................................................................
Practice: Hotel
HOTEL
* Address
the lodging host of
for
ROOM
* Room Number
with
in in guest in
STAY of PERSON
* Arrival Date with * Name
1-36
2 Make up two more possible relationships between PERSON and HOTEL that
might be of some use for the hotel business.
..........................................................................................................................................
®
1-33
Lesson 1: Introduction to Entities, Attributes, and Relationships
..........................................................................................................................................
Scenario
You work as an analyst for a publishing company that wants to make recipes available
on the Web. It wants the public to be able to search for recipes in a very easy way.
Your ideas about easy ways are highly esteemed.
Your Assignment
1 Analyze the example page from Ralph’s famous Raving Recipes book and list as
many different types of information that you can find that seem important.
.............................................................................................................................................
1-34 Data Modeling and Relational Database Design
.................................
Introduction
Lesson Aim
This lesson provides you with a detailed discussion about entities and attributes and
how you can track these in various sources of information. The lesson looks at the
evolution of an entity definition and the concept of subtype and supertype entity. The
lesson also introduces the imaginary business of ElectronicMail Inc.which is used in
many examples throughout this book.
Overview
2-2
.............................................................................................................................................
2-2 Data Modeling and Relational Database Design
Introduction
..........................................................................................................................................
Objectives
At the end of this lesson, you should be able to do the following:
• Track entities from various sources
• Track attributes from various sources
• Decide when you should model a piece of information as an entity or an attribute
• Model subtypes and supertypes
..........................................................................................................................................
®
2-3
Lesson 2: Entities and Attributes in Detail
..........................................................................................................................................
• Data
– Facts given from which other facts may be
inferred
– Raw material
Example: Telephone Directory
• Information
– Knowledge, intelligence
Example: Telephone number of florist
2-3
The words data and information are often used as if they are synonyms. Nevertheless,
they have a different meaning.
Data: Raw material, from which you can draw conclusions. Facts from which you
can infer new facts. A typical example is a telephone directory. This is a huge
collection of facts with some internal structure.
.............................................................................................................................................
2-4 Data Modeling and Relational Database Design
Data
..........................................................................................................................................
Data
Data
Data~
• Modeling, Conceptual
Structuring data concepts into logical, coherent,
and mutually related groups
• Modeling, Physical
Modeling the structure of the (future) physical
database
• Base
A set of data, usually in a variety of formats, such
as paper and electronically-based
• Warehouse
A huge set of organized information
2-4
..........................................................................................................................................
®
2-5
Lesson 2: Entities and Attributes in Detail
..........................................................................................................................................
Database
A database is a set of data. The various parts of the data are usually available in
different forms, such as paper and electronic-based. The electronic-based data may
reside, for example, in spreadsheets, in all kinds of files, or in a regular data base.
Today, relational databases are very common; but many older systems are still around.
The older systems are mostly hierarchical databases and network databases. Systems
of more recent date are semantic databases and object oriented databases.
Data Warehouse
A data warehouse is composed of data from multiple sources placed into one logical
database. This data warehouse database, (or, more correctly, this database structure), is
optimized for Online Analytical Processing (OLAP) actions.
Often a data warehouse contains summarized data from day-to-day transaction
systems with additional information from other sources. An example is a phone
company that tracks the traffic load for a routing system. The system does not store the
individual telephone calls, but stores the data summarized by hour.
From a data analysis point of view a data warehouse is just a database, like any other,
only with very specific and characteristic functional requirements.
.............................................................................................................................................
2-6 Data Modeling and Relational Database Design
Tracking Entities
..........................................................................................................................................
Tracking Entities
The nouns in, for example, the texts, notes, brochures, and screens you see concerning
a business often refer to entities, attributes of entities, or instances of entities.
Entities
• Be aware of homonyms
• Check entity names and descriptions regularly
• Avoid use of reserved words
• Remove relationship name from entity name
2-5
Be Aware of Synonyms
In many business contexts one and the same concept is known under different names.
Select one and mention the synonyms in the description: “...also known as ...”.
..........................................................................................................................................
®
2-7
Lesson 2: Entities and Attributes in Detail
..........................................................................................................................................
Avoid Homonyms
Often in a business one word is used for different concepts. Sometimes even the same
person will use the same word but with different meanings as you can see in the next
example.
“The data modeling course you attend now was written in 1999 and requires modeling
skills to teach.” In this sentence the word “course” refers to three different concepts: a
course event (like the one you are attending today), a course text (which was written in
1999) and the course type (that apparently needs particular skills).
GUEST HOTEL
guest of
host of
PERSON guest of
ACCOMMODATION
host of
The second model is more general in its naming. There a guest is seen as a PERSON
playing the role of being a guest.
As a rule, if there is choice take the more general name. It allows, for example, for the
addition of a second relationship between the same entities that shows, for example,
person is working for or is owning shares in the accommodation. The first model
would require new entities.
This subject is closely related to the concept of subtypes and roles. You find more on
this later in this lesson and when we discuss Patterns.
.............................................................................................................................................
2-8 Data Modeling and Relational Database Design
Electronic Mail Example
..........................................................................................................................................
2-7
(0
ORJR DGYHUWLVHPHQWDUHD s
&RPSRVH &RPSRVH ge
7HPSODWH default
sa
)ROGHUV
es
il m
6XEMHFW test 6HQG
ma
$GGUHVVHV
7R bipi, [email protected] 6DYH'UDIW
3UHIHUHQFHV
&F
s e
po
myself 6DYH7HPSODWH
*HW1HZ0DLO
%FF
m
co
&DQFHO
([LW
0HVVDJH
to
this is a test
n
ree
DGYHUWLVHPHQW
h .HHS
e tc &RS\
sk
$WWDFKPHQWV 7\SH
$GG
abc.html Hypertext
6LJQDWXUH
xyz.doc Word document
2-8
..........................................................................................................................................
®
2-9
Lesson 2: Entities and Attributes in Detail
..........................................................................................................................................
The screenshots may give an idea of how the Compose a Mail Message screen and the
Maintain Addresses screen will look like.
(0
ORJR DGYHUWLVHPHQWDUHD
&RPSRVH $GGUHVVHV s se
s
dre
)ROGHUV
1LFNQDPHV
ma
*HW1HZ0DLO
joe [email protected]
myself [email protected]
o
nt
([LW
*URXS
ree
DGYHUWLVHPHQW
c
o fs
IULHQGV
h bipi
e tc
DUHD
joe
sk [email protected]
[email protected]
2-9
2-10
.............................................................................................................................................
2-10 Data Modeling and Relational Database Design
Evolution of an Entity Definition
..........................................................................................................................................
Must every message contain text? Should it not be possible to send a message that
only transports an attachment, without additional text?
And what about a message that comes from an external source and is received by an
EM user? Should those not be kept as well?
The thinking process shown here is typical for the change of a definition from the first
idea to something that is much more well thought-out—though this does not mean that
the definition is final.
..........................................................................................................................................
®
2-11
Lesson 2: Entities and Attributes in Detail
..........................................................................................................................................
Creating a Message
When I type in some text in the compose screen, is that text a message? You will
probably agree that it does not make much sense to consider it as a message until some
fields are completed, such as the To or Subject field. The checks must take place after
I hit the send key. Only after all checks have been made is the message born.
Removing a Message
When can the system remove a message? When a user hits the delete key? But what
should the system do when there are other receivers of that same message? It is better
to consider the deleting of a message as the signal to the system that you no longer
need the right to read the message. When all users that did receive the same message
have done this, then the message can be deleted. Apparently, for a message to exist it
must have receivers that still need the message.
Changing a Message
Changing a message? As long as the text is not sent, it is no problem as it is not yet
considered to be a message. Changing it after sending it? Changing something that is
history? This cannot be done. Changing the text should lead to a new message.
Draft
What about a message that is not yet ready for sending? Suppose a user wants to finish
a message at a later date. Is there a place for this? Do we want an unsent, or draft,
message in the system? Is a DRAFT a special case of entity MESSAGE, or should we
treat a DRAFT as a separate entity?
Template
What about the templates? A template is about everything a message can be, but a
template is only used as a kind of stamp for a message. Templates are named,
messages are not. Is TEMPLATE a special case of entity MESSAGE, or should we
look upon it as a separate entity?
.............................................................................................................................................
2-12 Data Modeling and Relational Database Design
Functionality
..........................................................................................................................................
Functionality
In the previous evolution of the entity definition, the definition changes were invoked
by thinking and rethinking the functionality of the system around messaging. This
illustrates the statement made earlier: functions drive the conceptual data model.
Business Functions
2-11
The first idea of the functionality of a system, or desired functionality, can be derived
from the verbs in, for example, descriptive texts and interview notes. In the above text
the functionality is expressed at a high level, without much detail. Nevertheless, you
can probably imagine more detailed functionality.
In this course functionality is always present, often implicitly assumed, sometimes in
detail.
..........................................................................................................................................
®
2-13
Lesson 2: Entities and Attributes in Detail
..........................................................................................................................................
Tracking Attributes
An Attribute...
2-13
As discussed earlier, the nouns in, for example, the texts, notes, brochures, and screens
you see used in a business often refer to entities, attributes of entities, or instances of
entities. You can usually easily recognize attributes by asking the questions “Of
what?” and “Of what format?”. Attributes describe, quantify, qualify, classify, specify
or give a status of the entity they belong to. We define an attribute as a property of an
entity; this implies there is no concept of a standalone attribute.
In the background information text on ElectronicMail that is shown below, the first
occurrence of the (probable) entities are capitalized, the attributes are boxed and
instances are shown in italics.
.............................................................................................................................................
2-14 Data Modeling and Relational Database Design
Tracking Attributes
..........................................................................................................................................
List the types of information, distinguish the probable entities and attributes and group
them. Add attributes, if necessary, (like Name of COUNTRY) in the example. Distill
one or more attributes from the instances (like Name of FOLDER).
Naming Attributes
Attribute names become the candidate column names at a later stage. Column names
must follow conventions. Try to name attributes avoiding the use of reserved words.
Do not use abbreviations, unless these were decided beforehand. Examples of
frequently-used abbreviations are Id, No, Descr, Ind(icator).
Do not use attribute names like Amount, Value, Number. Always add an explanation
of the meaning of the attribute name: Amount Paid, Estimated Value, Licence No.
Always put frequently-used name components, such as “date” or “indicator”, of
attribute names in the same position, for example, at the end—Start Date, Creation
Date, and Purchase Date.
Do not use underscores in attribute names that consist of more than one word. Keep in
mind that attribute names, like entity names, must be as clear and understandable as
possible.
..........................................................................................................................................
®
2-15
Lesson 2: Entities and Attributes in Detail
.....................................................................................................................................................
GARMENT
Name
Price
GARMENT
2-16
Redundancy
You should take special care to prevent using redundant attributes, that is, attribute
values that can be derived from the values of others. An example is shown below.
Using derivable information is typically a physical design decision. This is also true
for audit type attributes such as Date Instance Created, and User Who Modified.
COMMODITY
* Name
* Price exclusive VAT
* Price inclusive VAT
* VAT %
.....................................................................................................................................................
2-16 Data Modeling and Relational Database Design
Subtypes and Supertypes
..........................................................................................................................................
A Subtype ...
2-18
Subtypes have all properties of X and usually have additional ones. In the example,
supertype ADDRESS is divided into two subtypes, USER and LIST. One thing USER
and LIST have in common is an attribute NAME and the functional fact that they can
both be used in the To field when writing a message.
Inheritance
In the next illustration, is a new entity, COMPOSITION, as a supertype of
MESSAGE, DRAFT, and TEMPLATE. The subtypes have several attributes in
common. These common attributes are listed at the supertype level. The same applies
to relationships. Subtypes inherit all attributes and relationships of the supertype
entity.
..........................................................................................................................................
®
2-17
Lesson 2: Entities and Attributes in Detail
.....................................................................................................................................................
COMPOSITION
o Subject
o Cc
o Bcc DRAFT
o Text * Name
MESSAGE TEMPLATE
* Name
Subtype: Rules
Name subtypes A
adequately:
B C NON B OTHER A
2-20
.....................................................................................................................................................
2-18 Data Modeling and Relational Database Design
Subtypes and Supertypes
..........................................................................................................................................
Nested Subtypes
You can nest Subtypes. For readability, you would not usually subtype to more than
two levels, but there is no major reason not to do so. Reconsider the placement of the
attributes and relationships after creating a new level.
COMPOSITION OTHER
o Subject COMPOSITION
o Cc * Name
o Bcc DRAFT
o Text *DRAFT
Name
MESSAGE TEMPLATE
TEMPLATE
* Name
More on Subtypes
EMPLOYEE
CURRENT OTHER
EMPLOYEE EMPLOYEE
EMPLOYEE
EMPLOYEE WITH OTHER
SHOE SIZE > 45 EMPLOYEE
2-22
Implementing Subtypes
You can implement subtype entities in various ways, for example, as separate tables or
as a single table, based on the super entity.
..........................................................................................................................................
®
2-19
Lesson 2: Entities and Attributes in Detail
..........................................................................................................................................
Summary
Entities can often be recognized as nouns in texts that functionally describe a business.
Entities can be tangible, intangible, and events. Subtypes of an entity share all
attributes and relationships of that entity, but may have additional ones.
Summary
• Entities
– Nouns in texts
– Tangible, intangible, events
• Attributes
– Single-valued qualifiers of entities
• Subtypes
– Inherit all attributes and relationships of
supertype
– May have their own attributes and relationships
2-23
.............................................................................................................................................
2-20 Data Modeling and Relational Database Design
Practice 2—1: Books
..........................................................................................................................................
Your Assignment
1 In this text the word book is used with several meanings. These meanings are
different entities in the context of a publishing company or a book reseller. Try to
distinguish the various entities, all referred to as book. Give more adequate names
for these entities and make up one or two attributes to distinguish them.
1. I have just finished writing a book. It’s a novel about justice and
power.
2. We have just published this book. The hard cover edition is available
now.
3. Did you read that new book on Picasso? I did. It's great!
4. If you like you can borrow my book.
5. I have just started translating this book into Spanish. I use the modern
English text as a basis and not the original, which is 16th century.
6. I ordered that book for my parents.
7. Yes, we have that book available. You should find it in Art books.
8. A second printing of the book War and Peace is very rare.
9. I think My name is Asher Lev is one of the best books ever written.
Mine is autographed.
10. I want to write a book on entity relationship modeling when I retire.
2-25
2 Create an ER model based on the text. Put the most general entity at the top of your
page and the most specific one at the bottom. Fit the others in between. Do not
worry about the relationship names.
..........................................................................................................................................
®
2-21
Lesson 2: Entities and Attributes in Detail
.....................................................................................................................................................
Your Assignment
1 Make a list of about 15 different entities that you think are important for
Moonlight Coffees. Use your imagination and common sense and, of course, use
what you find in the summary that is printed below.
Moonlight Coffees
Summary
Moonlight Coffees is a fast growing chain of high quality coffee shops with currently
over 500 shops in 12 countries of the world. Shops are located at first-class
locations, such as major shopping, entertainment and business areas, airports,
railway stations, museums. Moonlight Coffees has some 9,000 employees.
Products
All shops serve coffees, teas, soft drinks, and various kinds of pastries. Most shops
sell nonfoods, like postcards and sometimes even theater tickets.
Financial
Shop management reports sales figures on a daily basis to Headquarters, in local
currency. Moonlight uses an internal exchange rates list that is changed monthly.
Since January 1, 1999, the European Community countries must report in Euros.
Stock
Moonlight Coffees is a public company; stock is traded at NASDAQ, ticker symbol
MLTC. Employees can participate in a stock option plan.
2-26
.....................................................................................................................................................
2-22 Data Modeling and Relational Database Design
Practice 2—3: Shops
.....................................................................................................................................................
Your Assignment
Use the information from the list as a basis for an ER model. Pay special attention to
find all attributes.
Shop List
Moonlight Coffees
181 The Flight, JFK Airport terminal 2, New York, USA, 212.866.3410, Airport, 12-oct-97
182 Hara, Kita Shinagawa,Tokyo, JP, 3581.3603/4, Museum, 25-oct-97
183 Phillis, 25 Phillis Rd, Atlanta, USA, 405.867.3345, Shopping Centre, 1-nov-97
184 JFK, JFK Airport terminal 4, New York, USA, 212.866.3766, Airport, 1-nov-97
185 VanGogh, Museumplein 24, Amsterdam, NL, 76.87.345, Museum, 10-nov-97
186 The Queen, 60 Victoria Street, London, UK, 203.75.756, Railway Station, 25-nov-97
187 Wright Bros, JFK Airport terminal 1, New York, USA, 212.866.9852, Airport, 6-jan-98
188 La Lune, 10 Mont Martre, Paris, FR, 445 145 20, Entertainment, 2-feb-98
189
2-27
.....................................................................................................................................................
®
2-23
Lesson 2: Entities and Attributes in Detail
..........................................................................................................................................
Your Assignment
Find all incorrect subtyping in the illustration. Explain why you think the subtyping is
incorrect. Adjust the model to improve it.
Subtypes
DISABLED CAR
PERSON
STATION WAGON
DEAF
SEDAN
BLIND
OTHER DISABLED
PERSON BUILDING HOUSE
HOTEL DOG
ROOM WITH BATH DOMESTIC
ANIMAL
OTHER ROOM MAMMAL
2-28
.............................................................................................................................................
2-24 Data Modeling and Relational Database Design
Practice 2—5: Schedule
.....................................................................................................................................................
Your Assignment
Use the schedule that is used in one of the shops in Amsterdam as
a basis for an entity relationship model. The schedule shows, for example, that in the
week of 12 to 18 October Annet B is scheduled for the first shift on Monday, Friday,
and Saturday.
2-29
The scheme suggests there is only one shift per person per day.
.....................................................................................................................................................
®
2-25
Lesson 2: Entities and Attributes in Detail
..........................................................................................................................................
Your Assignment
An entity, possibly PERSON (or ADDRESS) may have attributes that describe the
address as in the examples below.
1 How would you model the address information if the future system is required to
produce accurate international mailings?
.............................................................................................................................................
2-26 Data Modeling and Relational Database Design
Practice 2—6: Address (continued)
..........................................................................................................................................
3 Check if your model would be different if the system is also required to have
facilities to search addresses in the following categories. Make the necessary
changes, if any.
All addresses:
• In Kirkland
• With postal code 53111 in Bonn
• That are P.O. Boxes
• On:
– Oxford Road or
– Oxford Rd or
– OXFORD ROAD or
– OXFORD RD
in Reading
..........................................................................................................................................
®
2-27
Lesson 2: Entities and Attributes in Detail
..........................................................................................................................................
.............................................................................................................................................
2-28 Data Modeling and Relational Database Design
.................................
Relationships
in Detail
Lesson 3: Relationships in Detail
..........................................................................................................................................
Introduction
Lesson Aim
This lesson discusses in detail how to establish a relationship between two entities.
You meet the ten types of relationship and examples of the less frequent types. This
lesson looks at nontransferable relationships and discusses the differences and
similarities between relationships and attributes. It also provides a solution for the
situation where a relationship seems to have an attribute. Finally, the rules of
normalization are discussed in the context of conceptual models.
Overview
• Relationships
• Ten different relationship types
• Nontransferability
• Relationships that seem to have attributes
• Rules of Normalization
3-2
.............................................................................................................................................
3-2 Data Modeling and Relational Database Design
Introduction
..........................................................................................................................................
Objectives
At the end of this lesson, you should be able to do the following:
• Create a well-defined relationship between entities
• Identify which relationship types are common and which are not
• Give real-life examples of uncommon relationship types
• Choose between using an attribute or a relationship to model particular
information
• Resolve a m:m relationship into an intersection entity and two relationships
• Resolve other relationships and know when to do so
• Rules of Normalization
..........................................................................................................................................
®
3-3
Lesson 3: Relationships in Detail
..........................................................................................................................................
Establishing a Relationship
Establishing a Relationship
3-3
receiving
replying
.............................................................................................................................................
3-4 Data Modeling and Relational Database Design
Establishing a Relationship
..........................................................................................................................................
Relationship Names
sender
MESSAGE sent by USER
of
sent to
receiver
reply of of
replied
to by
3-5
Are sent to and receiver of really opposite? If so, the assumption is that if a
MESSAGE is sent to a USER, it also arrives. Maybe it is safer to name the
relationship received by / receiver of...
..........................................................................................................................................
®
3-5
Lesson 3: Relationships in Detail
..........................................................................................................................................
Optionality
author USER
MESSAGE written by of
received by
receiver
reply of of
replied
to by
3-7
.............................................................................................................................................
3-6 Data Modeling and Relational Database Design
Establishing a Relationship
..........................................................................................................................................
A split into B
part of
• An optional “many” relationship end means zero, one or more. In the e-mail
example a USER can be author of 0,1 or more MESSAGES.
• Sometimes the degree is a fixed value, or there is a maximum number. Assume a
MESSAGE may be containing one or more ATTACHMENTS, but for some
business reason, the number of ATTACHMENTS per MESSAGE may not exceed
4. The degree then is <5. The diagram, however, shows a crowsfoot.
Degree
author
MESSAGE written by of USER
received by
receiver
reply of of
containing replied
to by
with <5
ATTACHMENT
3-10
..........................................................................................................................................
®
3-7
Lesson 3: Relationships in Detail
..........................................................................................................................................
Nontransferability
FOLDER
containing
filed in
author
MESSAGE written by USER
of
received by
receiver
reply of of
replied
to by
3-12
• Not all relationships are nontransferable. Assume the mail system allows a user to
file a MESSAGE in a FOLDER. This is only a valuable functionality if the user is
allowed to change the FOLDER in which a MESSAGE is filed.
.............................................................................................................................................
3-8 Data Modeling and Relational Database Design
Relationship Types
..........................................................................................................................................
Relationship Types
There are three main groups of relationships, named after their degrees:
• One to many (1:m)
• Many to many (m:m)
• One to one (1:1)
This paragraph discusses the various types and gives some examples of their variants.
Relationships—1:m
The various types of 1:m relationships are most common in an ER Model. You have
seen several examples already.
Relationship Types
1:m
(a)
(b)
(c)
(d)
3-13
a Mandatory at both ends. This type of relationship typically models entities that
cannot exist without each other. Often the existence of mandatory details for a
master is more wishful thinking than a strict business rule. Often the
relationship expresses that an entity is always split into details. Seen from the
other perspective, it often expresses an entity that is always classified,
assigned.
..........................................................................................................................................
®
3-9
Lesson 3: Relationships in Detail
..........................................................................................................................................
PRODUCT
part
of BUNDLE
consists
of
.............................................................................................................................................
3-10 Data Modeling and Relational Database Design
Relationship Types
..........................................................................................................................................
Relationships—m:m
The various types of m:m relationships are common in a first version of an ER Model.
In later stages of the model most m:m relationships, and possibly all, will disappear.
Relationship Types
m:m
(e)
(f)
(g)
3-15
..........................................................................................................................................
®
3-11
Lesson 3: Relationships in Detail
..........................................................................................................................................
USER
part
of LIST
consists
of
.............................................................................................................................................
3-12 Data Modeling and Relational Database Design
Relationship Types
..........................................................................................................................................
Relationships—1:1
Usually you will find just a few of the various types of 1:1 relationships in every ER
Model.
Relationship Types
1:1
(h)
( i)
(j)
3-17
h A 1:1 relationship, mandatory at both ends, tightly connects two entities: when
you create an instance of one entity there must be exactly one dedicated
instance for the other simultaneously; for example, entity PERSON and entity
BIRTH. This leads to the question why you want to make a distinction between
the two entities anyway. The only acceptable answer is: only if there is a
functional need.
If you have this relationship in your model, it is often, possibly always, part of
an arc.
i Mandatory at one end is often in a model where roles are modeled, for
example, in this hospital model.
See Page 46
PERSON acting as PATIENT
* Name role of * Blood Type
acting as EMPLOYEE
role of * Job
Note: These role-based relationships are often named is/is type of or simply
is/is.
..........................................................................................................................................
®
3-13
Lesson 3: Relationships in Detail
..........................................................................................................................................
MESSAGE
DRAFT
basis for
result of
.............................................................................................................................................
3-14 Data Modeling and Relational Database Design
Relationship Types
..........................................................................................................................................
Redundancy
Like attributes, relationships can be redundant.
Redundant Relationships
COUNTRY COUNTRY
location of location of birth
of of located of
located
in in
TOWN TOWN
hometown hometown
of living living of living born
in in in in
PERSON PERSON
3-20
In the left-hand example you can derive the relationship from PERSON to COUNTRY
from the other two relationships and you should remove them from the model.
This is a semantic issue and cannot be concluded from the structure alone, as the right-
hand example shows.
..........................................................................................................................................
®
3-15
Lesson 3: Relationships in Detail
..........................................................................................................................................
See Page 48
Relationships and Attributes
ATTACHMENT TYPE
* Name
of
with
ATTACHMENT
* Type ATTACHMENT
* Content * Content
3-21
.............................................................................................................................................
3-16 Data Modeling and Relational Database Design
Relationships and Attributes
..........................................................................................................................................
ATTACHMENT TYPE
* Name
of
with
ATTACHMENT
* Type ATTACHMENT
* Content * Content
3-22
The table based on entity ATTACHMENT would contain the same columns in both
situations, but the Attachment Type Name column would be a foreign key column in
the second implementation. This would mean that an Attachment Type Name entered
for an ATTACHMENT can only be taken from the types listed in the table based on
entity ATTACHMENT TYPE. The list serves as a pick list and spelling check.
There are advantages and disadvantages for both models.
The one entity model is somewhat easier to read because it is less packed with lines. In
the table implementation you would need no joins to get the required information.
However, a two-entity model is usually far more flexible. It leaves the option open to
create relationships from other entities to the new entity. You would have control over
the values entered as they are checked against a given set. Usually, the two-table
implementation takes less (sometimes even much less) space in the database.
Use your common sense when you select the attributes and entities.
..........................................................................................................................................
®
3-17
Lesson 3: Relationships in Detail
..........................................................................................................................................
placed in
MESSAGE
* Message Id
* Text
* Folder Name
3-24
.............................................................................................................................................
3-18 Data Modeling and Relational Database Design
Relationship Compared to Attribute
..........................................................................................................................................
MESSAGE USER
* Addressee
3-25
..........................................................................................................................................
®
3-19
Lesson 3: Relationships in Detail
..........................................................................................................................................
CUSTOMER PRODUCT
* Id buyer of * Code
* Name * Name
bought by
3-26
Suppose you make a model for a retail company that sells PRODUCTS. A
CUSTOMER buys PRODUCTS. Suppose future customers are accepted into the
system as well. This would mean:
A CUSTOMER may buy one or more PRODUCTS
A PRODUCT may be bought by one or more CUSTOMERS
A typical event for this company would be customer Nick Sanchez buying two shirts.
“Nick Sanchez” is a CUSTOMER Name, “shirt” is a PRODUCT Name. This leaves
the question of where to put the “two”, the quantity information.
.............................................................................................................................................
3-20 Data Modeling and Relational Database Design
m:m Relationships May Hide Something
..........................................................................................................................................
CUSTOMER
? PRODUCT
buyer of
* Id * Code
* Name bought by * Name
Quantity
CUSTOMER
? PRODUCT
buyer of
* Id * Code
* Name bought by * Name
Quantity
3-27
CUSTOMER PRODUCT
buyer of
* Id * Code
* Name bought by * Name
Quantity
..........................................................................................................................................
®
3-21
Lesson 3: Relationships in Detail
..........................................................................................................................................
CUSTOMER
* Id with
* Name of
ORDER
3-29
The table design here is the default design for implementing the model. Note the two
foreign key columns in the ORDERS table, Ctr_id (foreign key to CUSTOMERS) and
Pdt_code (to PRODUCTS).
Now suppose Pepe Yomita enters the store and buys one pair of jeans, two shirts, and
one silk tie. Given the current model this would mean that Pepe places three orders:
one for the jeans, one for the shirts and one for the tie. Three orders, all at the same
time, from one and the same customer. No problem so far as the model allows for this.
Now suppose the store wants to automate the billing of the orders. (This is probably
one of the reasons for making the model anyway.) Using the above model, this would
mean three orders and, as a consequence, three bills, as the system has no way of
knowing these three orders somehow belong to each other.
It is better to change the model in such a way that one order can be for more than one
product. That means we should have a m:m relationship between ORDER and
PRODUCT, which we should investigate next.
.............................................................................................................................................
3-22 Data Modeling and Relational Database Design
m:m Relationships May Hide Something
..........................................................................................................................................
CUSTOMER
* Id with
* Name of
ORDER
* Id
PRODUCT with
* Date
* Code
for
* Name
Qu
ant
?
ity
3-30
Then there is the question again: where do you put quantity? Quantity can now no
longer be an attribute of an order because the attribute must be single-valued and
cannot contain three values 1, 2 and 1 at the same time. Quantity has become a
property of the m:m relationship between PRODUCT and ORDER.
..........................................................................................................................................
®
3-23
Lesson 3: Relationships in Detail
..........................................................................................................................................
CUSTOMER
* Id with
ORDER
* Name of HEADER
* Id
PRODUCT
* Date
* Code
* Name
with
with
for for
ORDER ITEM
*Quantity Sold
3-31
Tables
CUSTOMERS
Id Name ORDER_HEADERS
1 Sanchez Id Ctr_id Date_ordered
2 Lowitch 1 1 25-MAY-1999
3 Yomita 2 2 25-MAY-1999
4 3 1 25-MAY-1999
4
ORDER_ITEMS PRODUCTS
Ohd_id Pdt_code Quantity_sold Code Name
1 2 2 1 Jeans
2 2 2 2 Shirt
3 1 1 3 Tie
4 4
3-32
.............................................................................................................................................
3-24 Data Modeling and Relational Database Design
Resolving Relationships
..........................................................................................................................................
Resolving Relationships
Relationships and Intersection Entities
Earlier in this lesson you saw a typical example of relationships seeming to have
attributes. The relationships in the example were many-to-many relationships. You
deal with the situation by creating a new entity, an intersection entity, that replaces the
relationship and can hold attributes.
This leads to the following questions:
• What are the steps in resolving a relationship in general?
• Should every m:m relationship be resolved?
• Can other relationships than m:m be resolved?
Resolving a Relationship
Suppose we want to resolve the m:m relationship between entities A and B.
yyy
B B of
in
1 First create a new intersection entity. You will experience that sometimes there is
no suitable word available for the concept you are modeling. The new entity can
always be named with the neologism “A/B COMBINATION”, or a name that is
somehow derived from the name of the original m:m relationship. Do not let the
unavailability of a proper name for the entity stop you from modeling it.
2 Next create two new m:1 relationships from entity A/B COMBINATION, one to
A and one to B. Initially, draw these as mandatory at A/B COMBINATION, as
you will probably only be interested in complete pairs of A and B. If the original
..........................................................................................................................................
®
3-25
Lesson 3: Relationships in Detail
..........................................................................................................................................
m:m relationship was optional (or mandatory) at A’s side, then the new
relationship from A to A/B COMBINATION is also optional (or mandatory).
3 Name the relationships. You can often name both relationships “in / of”.
4 The next step is to remove the m:m relationship you started with.
5 Finally, reconsider the newly-drawn relationships. They may be optional at the A/
B COMBINATION side. Also, they may turn out to be of type m:m and require
resolving, as you have seen in the example of customers buying products.
No Purely from a conceptual data modeling point of view, there is no need to resolve
these genuine m:m relationships. The model is rich enough to be the basis for table
design. A m:m relationship will transform into a binary table; this is a table that
consists of the columns of two foreign keys only. This is exactly the same table as the
one that would result from the intersection entity when you resolved the m:m
relationship.
A m:m relationship in a conceptual data diagram needs less space than a separate
entity plus two relationships. For this reason a diagram with unresolved m:m
relationships is more transparent and easier to read.
Yes From a function modeling point of view the answer is different. If your model
contains a true m:m relationship there is apparently a business need to keep
information on the combinations of, say, entity A and B. In other words, the system
would contain at least one business function that creates the relationship. This “create
relationship” cannot be expressed as a usage of entities of attributes, although this is
usually what design tools require of the functional model. Oracle Designer is no
exception. This means that when you create an ER model in Oracle Designer you
would always resolve the m:m relationships in order to create a fully-defined
functional model with all data usages included.
.............................................................................................................................................
3-26 Data Modeling and Relational Database Design
Resolving Relationships
..........................................................................................................................................
Suppose you need your system to create a m:1 relationship from external entity
PERSON to CUSTOMER TYPE, one of your internal entities (as in the diagram
below):
external
classified
PERSON as
CUSTOMER
classification
of TYPE
internal
3-35
This would result later on in a change of the table structure of the third-party
PERSONS table. This is undesirable (third parties often ask you to you sign a contract
that simply forbids you to do that) and sometimes even impossible if you have no
authority over that table.
external
PERSON CUSTOMER
TYPE
with in
for with
CLASSIFICATION
internal
The above model leaves the external entity PERSON as is and does the referencing
from inside. The m:1 relationship is replaced by an entity CLASSIFICATION and two
relationships.
..........................................................................................................................................
®
3-27
Lesson 3: Relationships in Detail
..........................................................................................................................................
Normalization Rules
Normal Form Rule Description
.............................................................................................................................................
3-28 Data Modeling and Relational Database Design
Normalization During Data Modeling
..........................................................................................................................................
RECEIVED
MESSAGE received by USER
# Receive Date # Name
o Subject receiver * Person Name
o Text of
3-38
..........................................................................................................................................
®
3-29
Lesson 3: Relationships in Detail
..........................................................................................................................................
3-39
.............................................................................................................................................
3-30 Data Modeling and Relational Database Design
Normalization During Data Modeling
..........................................................................................................................................
3-40
..........................................................................................................................................
®
3-31
Lesson 3: Relationships in Detail
..........................................................................................................................................
Summary
Summary
3-41
Relationships connect entities and express how they are connected. There are ten types
of relationships, 4 of type1:m, 3 of type m:m and 3 of type 1:1.
The m:1 relationship that is optional at the 1 side is by far the most common type in
finished ER models. This one is very easy to implement in a relational database.
At the beginning of the process of creating an ER model there are often many m:m
relationships. Many of these disappear after closer investigation.
Relationships cannot have attributes. If this seems to be the case, you need to resolve
the relationship into an intersection entity plus two relationships.
The other types are less common—some express more a desired situation rather than
reality, such as the m:1 relationship that is mandatory at both ends.
A normalized data model yields a normalized relational database design. Third normal
form is the generally accepted standard.
.............................................................................................................................................
3-32 Data Modeling and Relational Database Design
Practice 3—1: Read the Relationship
..........................................................................................................................................
Your Assignment
Read the diagrams aloud, from both perspectives. Make sentences that can be
understood and verified by people who know the business area, but do not know how
to read ER models.
ALU of
BRY
with
PUR bazooned in
YOK
bazooned by
KLO bilought in
HAR
glazoed with
3-39
..........................................................................................................................................
®
3-33
Lesson 3: Relationships in Detail
..........................................................................................................................................
Your Assignment
Given the following ER diagrams, find a context that fits the model.
1
.............................................................................................................................................
3-34 Data Modeling and Relational Database Design
Practice 3—3: Name the Intersection Entity
..........................................................................................................................................
Your Assignment
1 Resolve the following m:m relationships. Find an acceptable name for the
intersection entity.
3-44
2 Invent at least one attribute per intersection entity that could make sense in some
serious business context. Give it a clear name.
..........................................................................................................................................
®
3-35
Lesson 3: Relationships in Detail
.....................................................................................................................................................
Scenario
You work as a contractor for Moonlight Coffees. Your task is to create a conceptual
data model for their business. You have collected all kinds of documents about
Moonlight. Below you see an example of a receipt given at one of the shops.
Your Assignment
Use the information from the receipt and make a list of entities and attributes.
.....................................................................................................................................................
3-36 Data Modeling and Relational Database Design
Practice 3—5: Moonlight P&O
.....................................................................................................................................................
Scenario
You are still working as a contractor for Moonlight Coffees—apparently you are doing
very well!
Your Assignment
1 Create a entity relationship model based on the following personnel and
organization information:
3 And again:
.....................................................................................................................................................
®
3-37
Lesson 3: Relationships in Detail
..........................................................................................................................................
4 Change the model—if necessary and if possible—to allow for the following new
information.
a Jan takes shifts in two different shops in Prague.
b Last year Tess resigned in Brazil as a shop manager and moved to Toronto.
Recently she joined the shop at Toronto Airport.
c To reduce the number of direct reports, departments and country organizations
may also report to another department instead of Headquarters.
d The shops in Luxembourg report to Belgium.
e To prevent conflicting responsibilities, employees are not allowed to work for
a department and for a shop at the same time.
5 Would your model be able to answer the next questions?
a Who is currently working for Operations?
b Who is currently working for Moonlight La Lune at the Mont Martre, France?
c Are there currently any employees working for Marketing in France?
d What is the largest country in terms of number of employees? In terms of
managers? In terms of part-timers?
e When can we celebrate Lynn’s fifth year with the company? When can we do
the same with Tess’ fifth year with Moonlight?
f What country has the lowest number of resignations?
.............................................................................................................................................
3-38 Data Modeling and Relational Database Design
Practice 3—6: Price List
.....................................................................................................................................................
Scenario
You work as a contractor for Moonlight Coffees.
Your Assignment
Make a ER model based on the pricelist from one of the Moonlight Coffee Stores.
visit us at www.moonlight.com
3-47
.....................................................................................................................................................
®
3-39
Lesson 3: Relationships in Detail
..........................................................................................................................................
Scenario.
FOLDER
containing
placed in
author
USER
COMPOSITION written by
of
part
MESSAGE received by of LIST
receiver consists
OTHER of of
COMPOSITION
reply of
containing replied
to by
with <5
3-48
Your Assignment
Take the given model as starting point. Add, delete, or change any entities, attributes,
and relationships so that you can facilitate the following functionality:
1 A user must be able to create nick names (aliases) for other users.
2 A folder may contain other folders.
3 A user must be able to forward a composition. A forward is a new message that is
automatically sent together with the forwarded message.
4 All folders and lists are owned by a user.
Challenge:
5 A mail list may contain both users and other lists.
6 A mail list may contain external addresses, like “[email protected]”.
7 A nickname may be an alias for an external address.
.............................................................................................................................................
3-40 Data Modeling and Relational Database Design
Practice 3—8: Holiday
..........................................................................................................................................
Scenario
“Paul and I hiked in the USA. Eric and I hiked in France and we rented a car in the
USA last year”.
Your Assignment
Comment on the model given below that was based on the scenario text.
COUNTRY TRANSPORT
France Boots
USA Boots
USA Car
RT
C TRANSPORT
F r OU COUNTRY
O
US an NT
US A ce RY C ots SP
Bo AN
A
TR
C
Er OM
Bo ar
COMPANION s
ot
ON
E ic PA
Pa ric NI
NI
O
E ic P A
ul N
Er OM
C
Pa ric
ul
3-49
..........................................................................................................................................
®
3-41
Lesson 3: Relationships in Detail
..........................................................................................................................................
Your Assignment
1 For the following ER Model, evaluate each entity against the rules of
normalization, identify the misplaced attributes and explain what rule of
normalization each misplaced attribute violates.
2 Optionally, redraw the ER diagram in third normal form.
assigned
STUDENT
#* student id
last name
first name
.............................................................................................................................................
3-42 Data Modeling and Relational Database Design
.................................
Constraints
Lesson 4: Constraints
..........................................................................................................................................
Introduction
This lesson is about constraints that apply to a business. Constraints are also known as
business rules. Some of these constraints can be easily modeled. Some can be
diagrammed but the resulting decreased clarity may not be acceptable. Some
constraints cannot be modeled at all. These should be listed in a separate document.
Overview
• Unique Identifiers
• Arcs
• Domains
• Various other constraints
4-2
.............................................................................................................................................
4-2 Data Modeling and Relational Database Design
Introduction
..........................................................................................................................................
Objectives
At the end of this lesson, you should be able to do the following:
• Describe the problem of identification in the real world
• Add unique identifiers to your model and know how they are represented
• Recognize correct and incorrect unique identifiers
• Decide when an arc is needed in your model
• Describe the similarities between arcs and subtypes
• Describe various types of business constraints that cannot be represented in an ER
diagram
..........................................................................................................................................
®
4-3
Lesson 4: Constraints
..........................................................................................................................................
Identification
What Are We Talking About?
It is not unreasonable to assume everybody knows Rembrandt was born in the
Netherlands. What most people probably do not know is that Rembrandt was born on a
farm as the son of Pajamas and an unknown father. Rembrandt had a twin sister.
Although Rembrandt never married, he was the father of numerous children. You can
easily recognize Rembrandt and his offspring as they all have four white stripes at the
end of their tails.
Identification is about knowing what or who you are talking about. Obviously, the
name Rembrandt is not unique to the famous painter; other human beings and even
cats have the same name.
In day-to-day conversations, you can usually assume that you and the people you talk
to share enough of the same context and know enough about each other’s jobs and
interests, to understand what you are both talking about. Language is always a rather
nonspecific way to communicate, with lots of ambiguities, but people are very capable
of interpretation. Computers must communicate in a more specific way that is not
open to much interpretation. It would help a system to be told “Rembrandt the painter”
or “Rembrandt van Rijn, born in 1606” or maybe even the combination of all:
“Rembrandt van Rijn, the painter, born in 1606”, to distinguish this Rembrandt from
the other famous creatures with the same name.
.............................................................................................................................................
4-4 Data Modeling and Relational Database Design
Identification
..........................................................................................................................................
Identification in the Real World Many things in the real world are difficult, if not
impossible, to identify—distinguishing between two cabs, two customers, two versions
of a contract, or two performances of the fourth string quartet by Shostakovich. As a
general rule, real world things cannot be identified with certainty. You have to live
with a substantial level of ambiguity. For example, how can I be sure that the car at the
other side of the street with license plate MN4606 is the same car as the one I saw last
week with that number? I cannot even be sure it is the same license plate. In normal
circumstances there in no reason for doubt, but that is not the same as certainty.
Sometimes people have their reasons for creating confusion.
Fortunately, some things in the real world are easier as they are within your reach.
There you can define the rules. When a company sends out, for example, invoices, it
can give every single invoice a unique number. When a business lets people create
ElectronicMail usernames (identities), they can force these names to be unique.
Identification Within a Database Usually, database systems can make sure that a
row is not stored twice, or, to be more exact, that a particular combination of values is
not stored twice, within the same table. The technical problem is solved for you by the
standard software you use.
Representation The remaining problem is to make sure that you can always know
what real world thing is represented by a particular row in a table. The solution to this
problem depends highly on the context. How likely do you consider it to be that two
different employees for the same company have the same family name, or the same
family name plus initials, or the same family name plus initials plus birthdate?
G. Papini, please?
EMPLOYEES
Name Initials Birthdate
PAPINI G. 02-FEB-1954
HIDE T.M. 11-JUN-1961
PAPINI G. 02-FEB-1945
BAKER S.J.T. 24-SEP-1958
Clearly, the answer could be different when your company employs five or 50,000
employees.
Be aware that adding a new identifying attribute for EMPLOYEE, say, Id, only
partially solves the above problem. It would be very useful within the database. It
would not help much in the real world where employees usually would not know their
IDs, let alone the IDs of others. This kind of Id attribute often works only as an
internal, but not as an external identification.
..........................................................................................................................................
®
4-5
Lesson 4: Constraints
..........................................................................................................................................
Unique Identifier
To know what you are talking about, you need to find, for every entity, a value, or a
combination of values, that uniquely identifies the entity instance. This value or
combination is called the Unique Identifier for the entity.
JOB Name
COMPUTER IN NETWORK IP Address
TELEPHONE Country code,
Area code,
Telephone number
EMPLOYEE Employee number or
Name,
Initials,
Birth Date
MAIL LIST Name,
Owner
4-5
The MAIL LIST example shows that a unique identifier is not necessarily a
combination of attributes: the owner of a MAIL LIST is actually represented by a
relationship.
UID Representation
In an ER diagram, the components of the UID of an entity are marked:
• # for attributes.
• With a small bar across the relationship end for relationships (a barred
relationship).
.............................................................................................................................................
4-6 Data Modeling and Relational Database Design
Unique Identifier
..........................................................................................................................................
USER
# Name
part of owner
of
contains owned by
MAIL LIST
# Name
Composed UID
A MAIL LIST, illustrated above, is identified by the Name of the LIST plus the USER
that owns the LIST. That means that the combination of OWNER and a Name of a list
must be a unique pair.
This means that every USER must name their LIST instances uniquely, but need not
worry about names given by other users. It also means that the system may have many
LIST instances with the same name, as long as they are owned by different USERS.
You may argue that a USER also has a composed UID, as the Name must be unique,
within this mail system. To show this, you could add an extra high level entity, MAIL
PROVIDER, plus a relationship form USER to PROVIDER. The relationship then is
part of the UID of a USER.
..........................................................................................................................................
®
4-7
Lesson 4: Constraints
..........................................................................................................................................
USER USER
# Name # Name
owner is owner
part of of
of referred to
contains owned by owned by
LIST LIST
# Name # Name
contains
referring
contained
to
in
LIST ITEM
4-8
Indirect Identification
Identification regularly takes place using an indirect construction, that is, when the
instance of an entity is identified only by the instance of another entity it refers to.
Examples
• In many office buildings employees are identified by their badge, which is
identified by a code.
• Around the world a person is identified by the picture on their passport.
.............................................................................................................................................
4-8 Data Modeling and Relational Database Design
Unique Identifier
..........................................................................................................................................
• All cows in the European Community are identified by the number of the tag they
are supposed to wear in their ear.
• When you park a car at Amsterdam International Airport you enter the parking lot
by inserting a credit card into a slot at the gate. The parking event is identified by
the credit card of the person that parked the car. This is a double indirect
identification.
Clearly, these identification constructions are not 100% reliable, but are probably as
far as you can go in a situation.
The model of these indirect identifications is shown in the next illustration, at the right
bottom corner. An instance of S is identified by the single instance of T it refers to. In
other words, the UID consists of one relationship only.
Multiple UIDs
Entities may have multiple UIDs. Earlier, you saw the example of entity EMPLOYEE
that can be identified by an Employee Number, and possibly by a combination of, for
example, Name, Initials and Birth Date.
At some point in time, usually at the end of your analysis, you promote one of the
UIDs to be the primary UID. All the other UIDs are called secondary UIDs.
You would usually select the UID that is most compact or easy to remember to
become primary UID. The reason, of course, is that the UID leads to one or more
foreign key columns in related tables. These columns should not be too sizeable.
Preferably, the primary UID of an entity does not consist of optional elements.
UID in Diagram
Only the primary UID is shown in ER Diagrams.
..........................................................................................................................................
®
4-9
Lesson 4: Constraints
..........................................................................................................................................
Z Q P
# Z1 # Q1 # P1
o Z2
o Z3 Y
# Y1 K
# Z4
# Y2 L
# L1
X
# X1 M
# M1
XY R T S
# R1 # T1
4-9
L F G
K # G1
# L1 # F1
# K1
P R
# P1 # R1
KL
T
o # T1 Q
# Q1
G H
# G1
4-10
.............................................................................................................................................
4-10 Data Modeling and Relational Database Design
Unique Identifier
..........................................................................................................................................
Information-Bearing Identifiers
When things in the real world are coded, you need to be especially careful. Codes that
have been used for some time are often information bearing. An example is a company
that uses product codes like 54.0.093.81, where 54 refers to the product group, 0
shows that the product is still in production, 093 identifies the factory where the
product is made and 81 is a sequence number. These codes come from the time when a
maximum amount of information had to be squeezed into a minimum number of bits.
The example above would be modeled conceptually:
Information-Bearing Codes
54.0.093.81
Product Group
In Production?
Factory
Sequence Number
4-11
The Code attribute would contain the same codes, for reasons of compatibility, but
now without meaning, as the old meaning is transferred to the attributes and
relationships. Product 54.0.093.81 may now be produced by factory 123 and may no
longer be in Product Group 54.
..........................................................................................................................................
®
4-11
Lesson 4: Constraints
..........................................................................................................................................
Arcs
Suppose ElectronicMail rents the Advertisement Areas that are located in their various
mail screens on the Web. This renting is controlled by contracts; contracts consist of
one or more standard conditions and customized conditions. This can be modeled with
four entities: CONTRACT, CONTRACT COMPONENT, STANDARD
CONDITION and CUSTOMIZED CONDITION. See the model below. How do we
model the following constraint: every instance of CONTRACT COMPONENT refers
to either a STANDARD CONDITION or a CUSTOMIZED CONDITION, but not to
both at the same time?
An arc is a constraint about two or more relationships of an entity. An arc indicates
that any instance of that entity can have only one valid relationship of the relationships
in the arc at any one time. An arc models an exclusive or across the relationships. An
arc is therefor also called exclusive arc.
There is no similar constraint construct for attributes of an entity.
Arcs
4-12
Arc Representation
The arc is drawn as an arc-shaped line, around an entity. Where the arc crosses a
relationship line a small circle is drawn, but only if the relationship participates in the
arc.
.............................................................................................................................................
4-12 Data Modeling and Relational Database Design
Arcs
..........................................................................................................................................
Exclusive Arc
USER owner
of
owned
by
LIST
is referred to container is referred
of to
contained
referring to in referring to
LIST ITEM
4-13
Suppose a MAIL LIST may contain USERS as well as other MAIL LISTS. This
means that a particular LIST ITEM may refer to a USER or a LIST. To be more
precise, it must be a reference to a USER or to a LIST, but not to both at the same
time.
Note
• The relationship contained in/container of from LIST ITEM to LIST (the one that
is printed in gray) is not part of the arc as there is no small circle at the intersection
with the arc.
• A relationship that is part of a UID may also be part of an arc.
• The constraint that a LIST may only contain LISTS other than itself cannot be
shown in the model.
..........................................................................................................................................
®
4-13
Lesson 4: Constraints
..........................................................................................................................................
See Page 36
Possible Arc Constructs
4-14
.............................................................................................................................................
4-14 Data Modeling and Relational Database Design
Arcs
..........................................................................................................................................
Incorrect Arcs
4-15
You cannot capture all possible relationship constraints with arcs. For example, if two
out of three relationships must be valid, this cannot be represented. The table below
shows what an arc can express.
}n n n
}n 1 1
}n 0 n
}n 0 1
4-16
..........................................................................................................................................
®
4-15
Lesson 4: Constraints
..........................................................................................................................................
Arc or Subtypes
Relationships within an arc are often of a very similar nature. They frequently carry
exactly the same names. If that is the case, an arc can often be replaced by a subtype
construction, as the illustration shows. On the left you see the arc that contains both
referring to relationships of LIST ITEM. In the model on the right there is only one
relationship left, now connected to an entity ADDRESS, a new supertype entity of
USER and LIST.
Both models are equivalent.
Arc or Subtype
USER ADDRESS
owner owner
of USER of
owned owned
by by
LIST LIST
is referred is referred contains
to contains to
referring referring is referred
to to to
in referring to in
LIST ITEM LIST ITEM
4-17
The model on the left emphasizes the difference between USER and LIST, which
clearly exists; the other model emphasizes the commonality. This commonality is
mainly a functional issue. Both USERS and LISTS can be part of a LIST and both can
be used as the address in the To, Cc or Bcc field in the screen for composing a
message.
Generally speaking, you can replace every arc with a supertype/subtype construction
and every supertype/subtype construction with an arc.
.............................................................................................................................................
4-16 Data Modeling and Relational Database Design
More About Arcs and Subtypes
..........................................................................................................................................
A A
1 2
R
Q P Q
P
A C
A C B
A C B
B
3 4 5
R R
Q Q
P Q P P
4-18
Note that only model 5 does not present the same information. In model 5, an instance
of B may be related to an instance of Q, unlike that which is modeled in 3 and 4.
..........................................................................................................................................
®
4-17
Lesson 4: Constraints
..........................................................................................................................................
Hidden Relationships
Every subtype hides a relationship between the subtype and its supertype. Moreover,
the relationships are in an arc, as the next illustration shows. Both relationships are
mandatory 1:1 is/is relationships.
A A
is B
B is
is C
C
is
4-19
.............................................................................................................................................
4-18 Data Modeling and Relational Database Design
Domains
..........................................................................................................................................
Domains
A very common type of attribute constraint is a set of values that shows the possible
values an attribute can have. Such a set is called a domain.
Very common domains are, for example:
• Yesno: Yes, No
• Gender: Male, Female, Unknown
• Weekday: Sun, Mon, Tue, Wed, Thu, Fri, Sat
In a conceptual data model you can recognize these as entities with, usually, only two
attributes: Code and Description. These domain entities are referred to frequently but
do not have any “many” relationships of their own, (see model A below). Typically,
you would know all the values before the system is built. The number of values is
normally low. Often you would deliver such a system with non-empty code tables
An alternative model for the (sometimes many) code entities is a more generic, two-
entity approach: CODE and CODE TYPE, model B.
Model A has the advantage of fewer relationships per entity as well as easy-to-
understand entities; B has obviously fewer entities and therefore will lead to fewer
tables.
.
Value sets
CODE TYPE
# Id
YESNO * Name
# Code
A * Max Length
* Description of Description
B
GENDER
# Code
* Description CODE
# Code
WEEKDAY * Description
# Code
* Description
4-20
Domains that have a large number of values, such as all positive integers up to a
particular value, are usually not modeled.
You should list and describe such a constraint in a separate document.
..........................................................................................................................................
®
4-19
Lesson 4: Constraints
..........................................................................................................................................
Categories: Examples
• Conditional domain: The domain for an attribute depends on the value of one or
more attributes of the same entity.
• State value transition: The set of values an attribute may be changed to depends
on the current value of that attribute.
• Range check: A numeric attribute must be between attribute values of a related
instance.
• Front door check: A valid relationship must only exist at creation time.
• Conditional relationship: A relationship must exist or may not exist, if an
attribute (of a related entity) has a special value.
• State value triggered check: A check must take place when an attribute is given a
value that indicates a certain state.
• There are also combinations of the above.
See Page 37
EMPLOYEE JOB
* Name * Title
* Address * Minimum Salary between
with * Maximum Salary
of
for referring to
EMPLOYMENT
* Start Date
o End Date
* Salary
Constraint: Employee salary must be within the salary range of the job of the
employee.
.............................................................................................................................................
4-20 Data Modeling and Relational Database Design
Some Special Constraints
..........................................................................................................................................
Possible
Wid
Mar
to
Sin
Div
DP
Marital Status
EMPLOYEE
Transitions from
* Name
* Address Single
* Current Marital Status Married
Widowed
Divorced
Domestic Partnership
Constraint: Marital Status of employees cannot change from any value to all other
values.
CONTRACT
# Id
* Standard Indicator STANDARD basis for
CONDITION based
consists on
in
of CUSTOMIZED
CONDITION
in
part of
referring to referring to
CONTRACT COMPONENT
4-23
..........................................................................................................................................
®
4-21
Lesson 4: Constraints
..........................................................................................................................................
Derived Attribute?
You may argue that the attribute Standard Indicator of CONTRACT is derivable. If
the contract contains CUSTOMIZED CONDITIONS, it is, by consequence, not a
standard CONTRACT. This may be true, but it is not necessarily so. Suppose the
contract is created in various steps, by various people with different responsibilities.
Then, the creation of a CONTRACT is a process that may take days. The Standard
Indicator, then, is an attribute of that process. Only when the CONTRACT is finalized,
should a check be made that the Indicator corresponds with the actual STANDARD
and CUSTOMIZED CONDITIONS. In those situations, the entity CONTRACT will
usually have an attribute Completed Indicator that triggers the check when set to Yes.
.............................................................................................................................................
4-22 Data Modeling and Relational Database Design
Some Special Constraints
..........................................................................................................................................
Boundaries
More than once the checking of constraints or special rules needs to use information
that is not directly related to one of the entities in the model.
Typical examples are rules and boundaries set by external sources, like a mother
company or national legislation. If reasonably possible, these rules should be part of
your conceptual data model, and should not be hard coded in your programs. The
reason is obvious: if the rule changes, which is beyond your power, there is a chance
you do not have to make changes to your programs. Only an update of a value in a
table would be necessary. The time spent developing a complete model is fully
justified by the programming time saved.
Boundaries
EXTERNAL
unrelated entity # Id
* Description
* Value
4-24
..........................................................................................................................................
®
4-23
Lesson 4: Constraints
..........................................................................................................................................
Summary
Entities in the real world must be individually identified before they can be
represented in a database. You would not know what you are talking about, otherwise.
Some entities are really difficult to identify, such as people and paintings. Some are
more easy, especially when they are part of the domain as you can make up the rules,
such as a unique number for each of the invoices you send to your customers. Some
unique identifiers are already present in the real world, often as a combination of
attributes and relationships of the entity.
Summary
• Identification
– Can be a real problem in the real world
– Models cannot overcome this
• Entities must have at least one Unique Identifier
• Unique Identifiers consist of attributes or
relationships or both
• Arcs
• Many types of constraint are not represented
in ER model
4-25
Arcs in a diagram represent a particular type of constraint for the relationships of one
entity.
Many business constraints cannot be represented in a diagram and must be listed
separately. This way the model remains clear and not too full of graphical elements.
.............................................................................................................................................
4-24 Data Modeling and Relational Database Design
Practice 4—1: Identification Please
..........................................................................................................................................
• A city
• A contact person for a customer
• A train
• A road
• A financial transaction
• An Academy Award (Oscar)
• A painting
• A T.V. show
4-27
..........................................................................................................................................
®
4-25
Lesson 4: Constraints
..........................................................................................................................................
A B C
# Xx * Yy # Zz
A B
C # Id
# Code
A
* Xx
B C with D
# Yy # Zz # Id
of
P Q
# Id
.............................................................................................................................................
4-26 Data Modeling and Relational Database Design
Practice 4—2: Identification
..........................................................................................................................................
P
# Name
6
Note: the next model describes a context that may be different from the world you are
familiar with.
PERSON
FEMALE
MALE son of
# Name
# Seqno mother of # Birth Date
partner in partner in
..........................................................................................................................................
®
4-27
Lesson 4: Constraints
.....................................................................................................................................................
Scenario
Moonlight Coffees, organization model.
Your Assignment
Use what you know about Moonlight Coffees by now, and, most importantly, use your
imagination.
1 Given the model below, indicate UIDs for the various entities. Add whatever
attributes you consider appropriate. Country organizations have a unique “tax
registration number” in their countries.
2 Are there any arcs missing?
reporting to report of
DEPARTMENT report of
HQ
reporting to
OTHER
DEPARTMENT COUNTRY in COUNTRY
ORGANIZATION
with of
with with with
EMPLOYEE belongs to in
with SHOP
of for for
PAYROLL with
ENTRY
to for to
ASSIGNMENT as JOB
in
4-34
.....................................................................................................................................................
4-28 Data Modeling and Relational Database Design
Practice 4—4: Tables
..........................................................................................................................................
Your Assignment
Read the text on ISO Relational tables.
Do a quality check on the ER model based on the quoted text and what you know
about this subject. Also list constraints that are mentioned in the text but not modeled.
Practice: Table 1
from with
FOREIGN KEY TABLE KEY
# Name # Name # Name
with
to PRIMARY
referenced of
with in with UNIQUE
with
for in
from in COLUMN for
ASSOCIATION in USAGE
# Seqno # Name
in * Data Type of # Seqno
to o Not Null
..........................................................................................................................................
®
4-29
Lesson 4: Constraints
..........................................................................................................................................
Your Assignment
Change the diagrams to model the constraint given.
EMPLOYEE
# Name managed by
manager of
1 Every EMPLOYEE must have a manager, except the Chief Executive Officer.
owned
USER by LIST
owner of
# Name # Name
owned
owner of by NICKNAME
# Alias
2 A user may not use the same name for both NICKNAME and LIST name.
with
subfolder USER
# Name
FOLDER within
Name owner
of
owned
by
3 A top level FOLDER must have a unique name per user; sub folders must have a
unique name within the folder where they are located.
.............................................................................................................................................
4-30 Data Modeling and Relational Database Design
.................................
Modeling Change
Lesson 5: Modeling Change
..........................................................................................................................................
Introduction
Every update of an attribute or transfer of a relationship means loss of information.
Often that information is no longer of use, but some systems need to keep track of
some or all of the old values of an attribute. This may lead to an explicit time
dimension in the model which is usually quite a complicated issue.
Lesson Aim
Time is often present in a business context, as many entities are in fact a representation
of an event. This lesson discusses the possibilities and difficulties that arise when you
incorporate time in your entity model.
Overview
5-2
.............................................................................................................................................
5-2 Data Modeling and Relational Database Design
Introduction
..........................................................................................................................................
Objectives
At the end of this lesson, you should be able to do the following:
• Make a well considered decision about using entity DATE or attribute Date
• Model life cycle attributes to all entities that need them
• List all constraints that arise from using a time dimension
• Cope with journalling
..........................................................................................................................................
®
5-3
Lesson 5: Modeling Change
..........................................................................................................................................
Time
Modeling Time
In many models time plays a role. Often entities that are essentially events are part of a
model, for example, PURCHASE, ASSIGNMENT. One of the properties you record
about these entities is the date or date and time of the event. Often the date and time
are part of a unique identifier.
.............................................................................................................................................
5-4 Data Modeling and Relational Database Design
Date as Opposed to Day
..........................................................................................................................................
PURCHASE
on
Single attribute entity without m:1 relationships
is usually replaced by attribute
of
PURCHASE
DAY * Date
# Date
5-4
A day, however, is not just a date. My great-grand father was born on a day in 1852,
but the exact date is unknown. A Genealogical Register System should therefore be
able to store fragments of a date, such as “1852”, or even a description, such as
“around 1765”.
Systems that store historical information often have to deal with several dates for one
event, according to multiple sources with nonidentical information.
Some systems have to take dates in conjunction with the reliability of that date.
Clearly, in these cases a simple attribute would not suffice.
Loosely speaking, when you are interested in the date only, and these dates are known
to the user, model an attribute; on the other hand, when you are interested in the day,
model it as an entity with attribute Date, which is possibly a unique identifying
attribute.
..........................................................................................................................................
®
5-5
Lesson 5: Modeling Change
..........................................................................................................................................
Entity DAY
It is not only systems that deal with historical information that struggle with dates.
Sometimes a system needs to know more about a day than can be derived from its
date. A planning system, for example, often needs to know if a particular day is a
public holiday. Many data warehouse systems use a calendar that is different from the
normal one, for example, where a year is divided into four-week periods or 30 day
Months or Quarters where Q1 starts in the middle of May.
Some warehouses need weather information about days in order to do statistical
analysis about the influence of the weather on, for example, their sales. In these cases
a day has attributes or relationships of its own and should be modeled as entity DAY.
Entity DAY
DAY
# Date
* Public Holiday Indicator
first day of
starts on
5-5
The above model shows part of a planning system where tasks are assigned to
employees. Tasks may take from a few hours to, at maximum, several days.
Based on this model, table TASK_ASSIGNMENTS will contain a date column that is
a foreign key column to the DAYS table.
.............................................................................................................................................
5-6 Data Modeling and Relational Database Design
Modeling Changes Over Time
..........................................................................................................................................
EMPLOYEE COUNTRY
# Id # Name
of in
for as
ASSIGNMENT
# Start Date
o End Date
..........................................................................................................................................
®
5-7
Lesson 5: Modeling Change
..........................................................................................................................................
COUNTRY
# Name
EMPLOYEE # Start Date
# Id * End Date
of in
life cycle
for
attributes
as
ASSIGNMENT
# Start Date
o End Date
5-7
Time-related Constraints
Be aware of the numerous constraints that result from the time dimension! Here is a
selection:
• An ASSIGNMENT may only refer to a COUNTRY that is valid at the Start Date
of the ASSIGNMENT.
• The obvious one: End Date must be past Start Date.
• A business rule: ASSIGNMENT periods may not overlap. The Start Date of an
ASSIGNMENT for an EMPLOYEE may not be between any Start Date and End
Date of an other ASSIGNMENT for the same EMPLOYEE.
• As for the previous constraint, but for End Date.
.............................................................................................................................................
5-8 Data Modeling and Relational Database Design
Modeling Changes Over Time
..........................................................................................................................................
Referential Logic
Note that, except for two, these constraints result from referential logic only. There
may be more additional business constraints.
Imagine the sheer number of constraints if a time-affected entity is related to several
other time-affected entities! Fortunately, these constraints all have a similar pattern;
these result from the referential, time related, logic.
Not in Diagram
You cannot model any of these constraints in the diagram as they all have to be listed
separately.
Implementation
In an Oracle environment, one of these constraints can be implemented as a check
constraint, (End Date must be later than Start Date). All the others will be
implemented as database triggers.
..........................................................................................................................................
®
5-9
Lesson 5: Modeling Change
..........................................................................................................................................
PRODUCT
# Id
* Name
with
PRICE =
of PRICED PRODUCT=
PRICE HISTORICAL PRICE
* Price in $
# Start date
o End Date
5-8
Products have a price. Prices change. Old prices are probably of interest. That leads to
a model with entities PRODUCT and PRICE. The latter entity contains the prices and
the time periods they are applicable. In real-life situations you find the concept of
PRICE also named PRICED PRODUCT, HISTORICAL PRICE (and less appropriate:
price list or price history); all these names more or less describe the concept.
You may argue the need for an End Date attribute. If the various periods of a product
price are contiguous, End Date is obsolete. If, on the other hand, the products are not
always available, as in the fruit and vegetable market, the periods should have an
explicit End Date.
.............................................................................................................................................
5-10 Data Modeling and Relational Database Design
A Time Example: Prices
..........................................................................................................................................
See Page 27
What Price to Pay?
ORDER HEADER
PRODUCT referred # Id
# Id by
* Order Date
* Name
n
ee
with
with
tw
be
of
of
ORDER ITEM
PRICE
* Quantity Ordered
* Price in $ referring
# Start date to
o End Date
5-9
Here, entities ORDER HEADER and ORDER ITEM are introduced. An ORDER
HEADER holds the information that applies to all items, like the Order Date and the
relationship to the CUSTOMER that placed the order or the EMPLOYEE that handled
it. (For clarity, these relationships are not drawn here.) The ORDER ITEM holds the
Quantity Ordered and refers to the PRODUCT ordered. The price that must be paid
can be found by matching the Order Date between Start Date and End Date of PRICE.
Note that you cannot model this “between relationship”.
This model is a fairly straightforward product pricing model and is often used.
Order
Note that the concept of an order in this model is composed of ORDER HEADER and
ORDER ITEM.
To find the order total for an order, it would need a join over four tables.
..........................................................................................................................................
®
5-11
Lesson 5: Modeling Change
..........................................................................................................................................
Price List
A variant on the above model is often used when prices as a group are usually changed
at the same time. The period that prices are valid is the same for many prices; that
would lead to this model:
between
PRICE LIST
ORDER HEADER
# Id
PRODUCT # Id
* Start Date
o End Date
# Id * Order Date
* Name referred with
with with by
on of
of
ORDER ITEM
PRICED PRODUCT
* Quantity
* Price in $ referring
Ordered
to
5-10
Entity PRICE LIST represents the set of prices for the various products; PRICED
PRODUCT represents the price list items. To know the price paid for an ordered item,
you take the Order Date of the ORDER HEADER, and take the PRICE LIST that is
applicable at that date. Next, you go from ORDER ITEM to the PRODUCT that is
referred to and from there to the PRICED PRODUCT of the PRICE LIST you have
just found. To find the order total for an order, it would need a join over five tables.
.............................................................................................................................................
5-12 Data Modeling and Relational Database Design
A Time Example: Prices
..........................................................................................................................................
PRICE LIST
ORDER HEADER
# Id
PRODUCT # Id
* Start Date
o End Date
# Id * Order Date
* Name with
with with
on of
of
referred by ORDER ITEM
PRICED PRODUCT
* Quantity
* Price in $ referring
Ordered
to
5-11
Here an ORDER ITEM refers directly to a PRICED PRODUCT. At create time of the
ORDER ITEM the constraint is applied that the Order Date must mach the correct
PRICE LIST period. To find the order total for an order now only requires three tables.
..........................................................................................................................................
®
5-13
Lesson 5: Modeling Change
..........................................................................................................................................
Negotiated Prices
Negotiated Prices
PRICE LIST
ORDER HEADER
# Id
PRODUCT # Id
* Start Date
o End Date
# Id * Order Date
* Name referred by with
with with
on of
of
ORDER ITEM
PRICED PRODUCT
* Quantity Ordered
* Price in $ referring * Negotiated Price
to
5-12
When prices are subject to negotiation, the model becomes simpler. Negotiated Price
is now an attribute of entity ORDER ITEM; ORDER ITEM refers to PRODUCT.
Every referential constraint can be modeled.
This model may seem to hold derivable information, but this is not true. Even in the
case that almost all Negotiated Prices are equal to the current product price, you have
to model Negotiated Price at ORDER ITEM level, just because of the small chance of
an exception. To find the order total you require only two tables. You can imagine that
many analysts choose this variant of the model as a safeguard, even if there is nothing
to negotiate at present.
.............................................................................................................................................
5-14 Data Modeling and Relational Database Design
A Time Example: Prices
..........................................................................................................................................
..........................................................................................................................................
®
5-15
Lesson 5: Modeling Change
..........................................................................................................................................
Current Price
Current Prices
of of of
PRICE PRICE PRICE
* Price in $ * Price in $ * Price in $
# Start Date # Start Date # Start date
* End Date o End Date o End Date
o Current Indicator
5-13
These models are variants on the PRODUCT-PRICE model you have seen before.
In the left-hand model the 1:m relationship between PRODUCT and PRICE shows the
real historical prices only. You can guess that only historical prices are kept because
attribute End Date is mandatory; an additional constraint is that this value should
always be in the past. The Current Price of a PRODUCT is represented as an attribute.
This model does not have any redundancies.
In many situations it would be a good design decision to keep the current product
prices as well as the old prices in one table based on entity PRICE. The middle model
is an ER representation of that situation. Note that End Date is now optional.
The right-hand model is another model that contains a subtle redundancy. See more on
this type of redundancy in the lesson on Denormalized Data.
.............................................................................................................................................
5-16 Data Modeling and Relational Database Design
Journalling
..........................................................................................................................................
Journalling
When a system allows a user to modify or remove particular information, the question
should arise if the old values must be kept on record. This is called logging or
journalling. You will often encounter this when the information is of a financial
nature.
Journalling
by
PAYMENT to
by o Date Paid
PAYMENT to * Amount in $
*Date Paid
with
*Amount in $
of
AMOUNT
MODIFICATION
* Old Amount in $
* Modified by
* Date Modification
5-14
A journal usually consists of both the modified value and the information about who
did the modification and when it was done. This extra information can, of course, be
expanded if you wish.
Apart from the consequences for the conceptual data model, the system needs special
journalling functionality: any business function that allows an update of Amount In
should result in the requested update, plus the creation of an entity instance AMOUNT
MODIFICATION with the proper values. Of course, the system would need special
functions as well in order to do something with the logged data.
No Journal Entity
When several, or all, attributes of an entity need to be journalled, it is often
implemented by maintaining a full shadow table that has the same columns as the
original plus some extra to store information about the who, when, and what of the
change. This table does not result from a separate entity; it is just a second, special,
implementation of one and the same entity.
..........................................................................................................................................
®
5-17
Lesson 5: Modeling Change
..........................................................................................................................................
.............................................................................................................................................
5-18 Data Modeling and Relational Database Design
Summary
..........................................................................................................................................
Summary
Every update in a system means loss of information. To avoid that you can create your
model to keep a history of the old situations. Sometimes relationships refer to a time-
dependent state of an entity. In other words, the updated entity is in fact a new instance
of the entity and not an updated existing instance. If this is the case, the time-
dependent referential constraints cannot be modeled by a relationship only.
Time in your model is a complicated issue. Many models have some time-related
entities.
Summary
5-15
..........................................................................................................................................
®
5-19
Lesson 5: Modeling Change
.....................................................................................................................................................
Scenario
Some shops are open 24 hours a day, seven days a week. Others
close at night. Employees work in shifts. Shifts are subject to local legislation. Below
you see the shifts that are defined in one of the shops in Amsterdam.
Your Assignment
List the various date/time elements you find in this Shift scheme and make a
conceptual data model.
Practice: Shift
Museumplein, Amsterdam, March 21
Shift 1 2 3 4 5
5-17
.....................................................................................................................................................
5-20 Data Modeling and Relational Database Design
Practice 5—2: Strawberry Wafer
.....................................................................................................................................................
Your Assignment
Revisit your model and make changes, if necessary, given this extra information.
Prices are at the same level within a country; prices are determined
by the Global Pricing Department. Usually the prices for regular,
global products are re-established once a year.
Prices and availability for local specialties are determined by the
individual shops. For example, the famous Norwegian Vafler med
Jordbær (a delicious wafer with fresh strawberries) is only available
in summer. Its price depends on the current local market price of
fresh strawberries.
5-19
.....................................................................................................................................................
®
5-21
Lesson 5: Modeling Change
.....................................................................................................................................................
Scenario
As a test, Moonlight sells bundled products in some shops, for a special price. Here are
some examples.
Bundles sell very well; all kinds of new bundles are expected to come.
The system should know how all these products are composed, in order to complete
various calculations.
Your Assignment
1 Modify the product part of the model in such a way that the desired calculations
can be completed.
PRODUCT GROUP
# Name
classification
for
classified
as
PRODUCT
# Id
* Name
.....................................................................................................................................................
5-22 Data Modeling and Relational Database Design
Practice 5—3: Bundles
.....................................................................................................................................................
.....................................................................................................................................................
®
5-23
Lesson 5: Modeling Change
.....................................................................................................................................................
Scenario
Moonlight needs to make sales information available as a tool to
optimize its business. A hierarchical product structure is being developed to be able to
report on different summary levels. This hierarchical structure should replace the
single level product group classification. Below you see the current idea about a
product structure. This structure is far from complete, but it should give you an idea of
the shape the structure will take. The + signs mean that the structure will be expanded
at that point.
Your Assignment
1 Create a model for a product classification structure.
+ Products
+ Drinks
+ Coffees
Regular
Cappuccino
Café Latte
+ Special Coffee
Teas
+ Black
Chinese
Indian
English
+ Infusions
+ Herbal
Soft drinks
Juices
Orange
Grape
+ Waters
+ Sodas
+ Dairy Products
+Foods
+ Pastry
+ Candy Bars
+ Local Specialties
+Non Foods
Merchandise
CDs
+ Stationary
Other
+ Tickets
+ Art
5-22
.....................................................................................................................................................
5-24 Data Modeling and Relational Database Design
.................................
Advanced Modeling
Topics
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Introduction
Lesson Aim
This lesson gives an overview of patterns you can discover in data models. This lesson
introduces some generic models. You can use these to make your model withstand
future changes that are predictable but not yet known.
Objectives
Overview
• Patterns
• Drawing conventions
• Generic modeling
6-2
.............................................................................................................................................
6-2 Data Modeling and Relational Database Design
Introduction
..........................................................................................................................................
..........................................................................................................................................
®
6-3
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Patterns
Similar Structure
Many models contain parts that have a similar structure, although the context may be
completely different. For example, the structure of a conceptual data model in the
context of a dictionary that deals with concepts such as headword, entry, meaning,
synonym is, surprisingly, almost identical to the structure of a railroad with track,
station, connection, and also to the structure of a baseball or soccer competition.
Easier to see are the similarities between, for example, ORDER HEADER with
ORDER ITEM and QUOTATION HEADER with QUOTATION ITEM, or between
MARRIAGE and JOB ASSIGNMENT.
• Similar structure
• Similar rules and constraints?
.............................................................................................................................................
6-4 Data Modeling and Relational Database Design
Master Detail
..........................................................................................................................................
Master Detail
Patterns: Master–Detail
consists
of
part of
B
• Characteristic: consists of
An instance of B only exists in the context of an A
• Metaphor: Master–Detail
6-3
Master-detail constructions are very common, as 1:m relationships are very common.
Distinguish between a 1:m relationship that is typically directed from the 1 to the
many and a relationship that is directed the other way around (see below). Master-
detail is characterized by the fact that the master A is divided into B’s. B’s do not exist
alone; they are always in the context of an A.
It is very rare that these relationships are transferable; if an instance of B is connected
to the wrong instance of A, it is far more likely that the instance of B is deleted and
then recreated in the context of the correct A.
Typical master-detail relationship names:
• Consists of
• Divided into
• Made of
• (Exists) With
Often a master A is of no value when it has no B’s, for example, the relationship is
mandatory at the 1 side. This mandatory relationship end can usually be circumvented,
as you have seen before.
Implementation
The tables that come from this master-detail pattern should be considered as clustered.
..........................................................................................................................................
®
6-5
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Basket
Pattern: Basket
A X
A Y
Z
X B
consists
of Y
part of
B Z
• Characteristic:
container for various types of items
• Items may be of different types
• Metaphor: Shopping Basket
6-4
.............................................................................................................................................
6-6 Data Modeling and Relational Database Design
Classification
..........................................................................................................................................
Classification
Patterns: Classification
classifying
classified by
P
6-5
This is again a 1:m relationship, but now the main orientation is from P to Q.
This is typically the case when Q can exist independently from P. Q acts as a class for
P, something with which to group P’s.
Usually entities in a conceptual data model have several of these classes.
Typical classification-type relationship names:
• Classified by
• Grouped by
• Assigned to
• (Exists) In
The relationship is usually transferable as classifications may change over time.
..........................................................................................................................................
®
6-7
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Hierarchy
Patterns: Hierarchy
A
# Id
6-6
Most hierarchical structures have a known limit for the maximum number of levels. If
that is the case and the limit is a low number of 5, for example, then usually the best
model is the one that is shown in the left of the illustration, one entity per level.
Model the structure with the recursive relationship if:
• The structure has no known level limit.
• The structure has a level limit, but the limit is high, say six or more.
• An instance of the structure can easily have a change of position, thus changing its
level.
• You like maintaining constraints.
.............................................................................................................................................
6-8 Data Modeling and Relational Database Design
Hierarchy
..........................................................................................................................................
Also the hierarchical structure of a FILE SYSTEM with files and folders, which are
files of a particular type, is a disputable hierarchy when you think of the concept of a
shortcut in Windows (or a Link in UNIX). These shortcuts transform the hierarchy
conceptually into a network although technically a shortcut and a link are just files
with a special role.
Implementation
The first constraint, A1 may not refer to A1, and you can easily check this with an
Oracle check constraint. The others need some programming and lead to database
triggers.
Possibly you may have to check extra business rules, for example, when the number of
levels may not exceed a given value.
..........................................................................................................................................
®
6-9
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Chain
Patterns: Chain
preceded
by B
BEAD
# Id
followed CHAIN
by
A
BEAD
# Seqno
6-7
.............................................................................................................................................
6-10 Data Modeling and Relational Database Design
Network
..........................................................................................................................................
Network
Patterns: Network
A
A A
• Characteristic: pairs
Every A can be connected to every A
(sometimes: to every other A)
• Metaphor: Web Document with Hyperlinks
6-8
Network structures typically describe pairs of things of the same type, for example,
marriage, railroad track (pair of start and end stations), synonyms (two words with the
same meaning), and Web documents with hyperlinks to other Web documents.
Characteristics
Often:
• The m:m relationship must be resolved to hold specific information about the pair
such as the date of the marriage, or the length of the railroad track.
• The two relationships of the intersection entity form the unique identifier.
• Time-related constraints apply in networks that must guard, for example, the kind
of rules that deal with “sequentially monogamous”.
• The two relationships refer to different subtypes of the entity:
Note that a hierarchy is a network where a particular set of business rules apply.
..........................................................................................................................................
®
6-11
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Bill of Material
A special example of a network structure is a Bill of Material (BOM). A BOM
describes the way things are composed of other things, and how many of these other
things (here it is instances of PRODUCT) are needed. Entity COMPOSITION is the
intersection entity with attribute Quantity Needed.
Bill of Material
product of
PRODUCT COMPOSITION
# Code with * Quantity Needed
part in
in
PRODUCTS COMPOSITIONS
Code Name Prod_code Part_code Quantity
914.53 AAAAAAAAA 854.01 604.18 1
914.54 AA 854.01 604.19 1
914.55 BBBBBBBBB 854.01 914.54 2
914.56 CCCCCCC 914.54 914.55 1
DDDDD 914.54 914.56 1
934.76 915.12 3
6-10
854.01
914.54 914.54
604.18 914.55
914.56
604.19
6-11
.............................................................................................................................................
6-12 Data Modeling and Relational Database Design
Symmetric Relationships
..........................................................................................................................................
Symmetric Relationships
Symmetric recursive relationships cause a very special kind of problem which is more
complex than you would assume.
In most contexts a record of a pair (A1, A2) has a different meaning when referred to
as (A2, A1). For example, if the model is about entity PERSON and the relationship is
mother of /daughter of, then the existence of person pair (P1, P2) would mean the
exclusion of the possibility of pair (P2, P1).
The recursive relationship of PERSON and family of / family of. Here, if (P1, P2) is
true, then (P2, P1) is equally true. This is called a symmetric relationship. There are
other symmetric recursive relationships such as: STATION directly connected by rail
with STATION,
GROUP Group_id S
# Id GROUP 1 S1
Group_id S
# Id
consists of 2 1 S2 1 S1
in consists of 2 2 S3 1 S2
2 S4 2 S3
S in
3 S5 2 S4
S
3 S6 3 S5
3 S6
..........................................................................................................................................
®
6-13
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Roles
Patterns: Roles
A P
6-13
Roles often occur when a system needs to know more about people than the basic
Name/Address/City information. Modeling the roles as separate entities offers the
possibility to show which attributes are mandatory for a particular role, and, if
necessary, to show relationships between the various roles. The example below shows
that a person in their role as president of a country can appoint a person in the role of
minister of a department. Possibly the words “presidency” and “ministership” are
closer to the concepts than the ones in the diagram.
PERSON ROLE
TYPE
ROLE
roles
PERSON PRESIDENT COUNTRY
appointing
appointed by
MINISTER DEPARTMENT
PARTY PARTY
LEADER
.............................................................................................................................................
6-14 Data Modeling and Relational Database Design
Fan Trap
..........................................................................................................................................
Fan Trap
Fan Trap
A B
A B
C
AB
AC BC
6-15
A Fan Trap (named after the characteristic shape of the solution) occurs when three or
more entities are related through m:m relationships and form a ring. Usually you
should replace the relationships with a central entity having several m:1 relationships.
Preventing a fan trap is similar to resolving a m:m relationship between two entities.
A B C A B C A B C
AB BC
..........................................................................................................................................
®
6-15
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Data Warehouse
July
B C
A X D
F E
A data warehouse system can be modeled as any system. Data warehouses contain the
same sort of information as any straightforward transaction processing information
system. Data warehouses usually contain less detailed, summarized, information as
warehouses are mainly built for overview and statistical analysis. However, Data
warehouses in general receive the input from online transaction systems that do
contain details.
Data warehouses often have a star-shaped model: this is made up of one central entity
(the facts) containing the condensed, summarized, information, and several
dimensions that classify and group the details.
Common dimensions represent entities such as:
• Time
• Geography
• Actor (for example, salesperson, patient, customer, instructor)
• Product (for example, article, medical treatment, course)
Often the dimensions are classified as well. Time may be structured in day, week,
month, quarter, year. You can classify products in various ways as you have seen in
earlier examples. If this is the case, the model is usually described as the Snowflake
model, as it looks like the crystal shape of a snowflake.
.............................................................................................................................................
6-16 Data Modeling and Relational Database Design
Drawing Conventions
..........................................................................................................................................
Drawing Conventions
Drawing Conventions
high volumes
high volumes
6-18
Two drawing conventions are widely in use: one that positions the entities with the
high volumes at the top of the paper and one that does the opposite. Both try to avoid
crossing relationship lines, partially overlapping entities, and relationship lines that
cross entities. Whatever convention you choose, choose one and use it consistently.
This will prevent errors and make the reading of large diagrams much easier.
Keep the overall structure of the layout unchanged during the modeling project as
many people are disoriented when you change the structure.
Make separate diagrams for every business area. These may have a different layout;
these diagrams are mainly used for communication with subject matter experts.
At the end of this course, you should be able to read models created in any drawing
convention, and you should be able to complete a model following any convention
used.
..........................................................................................................................................
®
6-17
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
But:
Readability first
The major goal of creating the diagram (but not the model) is to give a representation
of the model that can be used for communication purposes. This means that you must
never let a convention interfere with readability and clarity. Do not be concerned that
readability takes space. Usually an entity model is represented by several diagrams
that show only the entities and relationships that deal with a particular functional part
of the future system. Splitting the model over various diagrams adds to the readability.
Model Readability
B A B
A
C
E C
D
F E
D
• Takes space
• Subject to taste F
6-20
.............................................................................................................................................
6-18 Data Modeling and Relational Database Design
Generic Modeling
..........................................................................................................................................
Generic Modeling
Generic Modeling
MANUFACTURER MANUFACTURER ARTICLE
* Name * Name TYPE
FILM ARTICLE
* AsaTRIPOD o Weight
* Height
LENS o Focal Distance
CAMERA o Height
* Focal
BODY o Asa Number
Distance
* Weight o ...
6-21
..........................................................................................................................................
®
6-19
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Generic Models
More generic models are shown below. They may be useful in particular situations.
ARTICLE TYPE
ARTICLE
* Definition Prop1 o Property1
o Definition Prop2
o Property2
o Definition Prop3
o Property3
o Definition Prop4
o Property4
...
o Property5
o Property6
o Property7
MANUFACTURER
o Property8
* Name
Recycling of Attributes You can use this model if it is safe to assume the articles
will have a limited number of attributes. This limit may be a high number but must be
set beforehand. Property1 may contain the Asa Number for instances of ARTICLE of
TYPE Film and may contain Weight for instances of ARTICLE of TYPE Camera
Body and so on. The major advantage of this model is the possibility of adding new
instances of ARTICLE TYPE without the need to change the model.
The type of information that should be entered for Property1, Property2, and so on can
be described by using, for example, the Definition Prop1, attributes of ARTICLE
TYPE. Here you can also store information about the data type of these properties.
ARTICLE TYPE
ARTICLE PROPERTY
.............................................................................................................................................
6-20 Data Modeling and Relational Database Design
More Generic Models
..........................................................................................................................................
THING
having some kind of
relationship with
THING
ASSOCIATION
THING
TYPE ASSOCIATION
TYPE
THING
ASSOCIATION
6-26
This is a rather generic model. In fact, it is a model of the universe and beyond. Note
that the number of attributes for entity THING may be substantial.
..........................................................................................................................................
®
6-21
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
THING
ASSOCIATION
PROPERTY
6-27
This model combines the concepts of “thing” and the property/property value and thus
allows everything to be represented with a free number of properties per type.
CUSTOMER ‘generic’
ARTICLE TYPE
ORDER HEADER
ARTICLE PROPERTY
ORDER ITEM
.............................................................................................................................................
6-22 Data Modeling and Relational Database Design
Summary
..........................................................................................................................................
Summary
Summary
• Patterns
– Show similarities
– Invent your wheel only once
• Generic models
– Reduce the number of entities dramatically
– Are more complex to implement
– Are very flexible
– Are usually the best choice in unstable
situations
6-29
..........................................................................................................................................
®
6-23
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Your Assignment
What pattern do you expect to find in the given contexts? If you do not see it, make a
quick sketch of the model. Use your imagination and common sense.
Practice: Patterns
6-31
.............................................................................................................................................
6-24 Data Modeling and Relational Database Design
Practice 6—2: Data Warehouse
.....................................................................................................................................................
Scenario
Moonlight wants to build a data warehouse based on the detailed sales figures the
shops report back on a daily basis. Examples of questions Moonlight wants the data
warehouse to answer are printed below.
•What is the sales volume in $ of coffee last month compared with the coffee sales
volume same month last year?
•What is the sales volume in $ of coffee per head in Japan compared with the
average coffee sales volume in the Moonlight countries around the world?
•What is the growth of the sales volume in $ of coffee in Sweden compared with the
growth of sales volume of all products in the same geographical area? What is the
growth in local currency?
•What was the total sales volume in $ of coffee last month, compared with the total
coffee sales volume in the same month last year, for the shops that have been open
for at least 18 months?
•What is the growth of the sales volume in $ of nonfoods compared to that of foods?
•What is the best day of the week for total sales in the various countries? How is that
related to the average? Is the best day of the week dependent on the type of
location?
•What products are most profitable per country? Globally?
•Does the service level (#employees per 1000 items sold) have influence on sales?
6-32
Your Assignment
1 Check the Moonlight models you created so far. Do they cater for answering the
listed questions. If not, make the appropriate changes.
2 For a data warehouse data model, suggest the central “facts” entity.
.....................................................................................................................................................
®
6-25
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
Scenario
The scenario for this practice is Stranger in a Strange Land. Lost in Darkness. The
Wanderer in the Mist. You name it!
Your Assignment
Make a conceptual data model based on the information in the text. Mark all the pieces
in the diagram that can be confirmed from the text.
"Erats have names that are unique. Erats can have argos.
Argos have names as well. The name of an argo must be
unique within the erat it belongs to. Erats mutually have
rondels. There are only a few different types of rondels. Erats
can have one or more ubins. A ubin always consists of one or
more argos of the erat, one or more rondels of the erat, or
combinations of the two."
.............................................................................................................................................
6-26 Data Modeling and Relational Database Design
Practice 6—4: Synonym
..........................................................................................................................................
practice - exercise
order - command
entity - being
order - sequence
order - arrangement
Command - demand
Your Assignment
Make a conceptual data model that could be the basis for a dictionary of synonyms.
..........................................................................................................................................
®
6-27
Lesson 6: Advanced Modeling Topics
..........................................................................................................................................
.............................................................................................................................................
6-28 Data Modeling and Relational Database Design
.................................
Introduction
Lesson Aim
This lesson describes some principles of relational databases and presents the various
techniques that you can use to transform your Entity Relationship model into a
physical database design.
Overview
7-2
......................................................................................................................................................
7-2 Data Modeling and Relational Database Design
Introduction
......................................................................................................................................................
Objectives
At the end of this lesson, you should be able to do the following:
• Explain the need of a physical database design
• Know the concepts of the relational model
• Agree on the necessity of naming rules
• Perform a basic mapping
• Decide how to transform complex concepts
......................................................................................................................................................
7-3
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
7-3
......................................................................................................................................................
7-4 Data Modeling and Relational Database Design
Why Create a Database Design?
......................................................................................................................................................
Presenting Tables
Tables are supported by integrity rules that protect the data and the structures of the
database. Integrity rules require each table to have a primary key and each foreign key
to be consistent with its corresponding primary key.
Presenting Tables
Table: EMPLOYEES
columns
EMPLOYEES (EPE)
pk * Id
Table diagram: EMPLOYEES uk1 * Name
o Address foreign
uk1 * Birth_date key
fk * Dpt_id
7-4
Tables A table is a very simple structure in which data is organized and stored.
Tables have columns and rows. Each column is used to store a specific type of value.
In the above example, the EMPLOYEES table is the structure used to store
employees’ information.
Rows Each row describes an occurrence of an employee. In the example, each row
describes in full all properties required by the system.
Columns Each column holds information of a specific type like Id, Name, Address,
Birth Date, and the Id of the department the employee is assigned to.
Primary keys The Id column is a primary key, that is, every employee has a unique
identification number in this table which distinguishes each individual row.
Unique keys Both columns Name and Birth_date are associated with a Unique key
constraint which means that the system does not allow two rows with the same name
and Birth_date. This restriction defines the limits of the system.
Foreign keys The foreign key column enables the use of the Dpt_id value to retrieve
the department properties for which a specific employee is working.
......................................................................................................................................................
7-5
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
Transformation Process
Using transformation rules you create a new model based on the conceptual model.
Transformation Process
Conceptual Model
Relational Model
7-5
Conceptual Model The way you can describe requirements for the data business
requires using a semantically rich syntax through graphical representation. As you
have seen in previous chapters, you can describe many of the business rules with
graphical elements such as subtypes, arcs, and relationships (barred and
nontransferable ones). The only constraints in expressing business complexity that you
have encountered so far are the graphical limitations. We know that this model acts as
a generic one, because it is not related to any physical considerations. Therefore you
can use it for any type of database. Nevertheless, it may be that the DBMS type you
want to use (relational or others) does not support all of the semantic rules graphically
expressed in your ER model.
Relational Model The Relational model is based on mathematical rules. This means
that when you try to fit all of the syntax from the ER model into the physical database
model, some of it may not have any correspondence in the relational model. To
preserve these specified rules, you have to keep track of them and find the correct way
to implement them.
......................................................................................................................................................
7-6 Data Modeling and Relational Database Design
Transformation Process
......................................................................................................................................................
Terminology Mapping
Terminology Mapping
ANALYSIS DESIGN
Entity Table
Attribute Column
7-6
......................................................................................................................................................
7-7
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
Naming Convention
Before transforming the ER diagram you probably need to define a naming convention
so that people working on the project use the same standards and produce the same
model from the same source. Rules explained here are the ones used within Oracle.
Even though they are efficient, they are not the only ones that you can use. You or
your company can provide the company’s own standard as part of its method.
7-7
Naming of Tables
The plural of the entity name is used as the corresponding table name. The idea is that
the Entity is the concept of an abstract thing—you can talk about EMPLOYEE,
CUSTOMER, and so on, so singular is a good naming rule, but a table is made up of
rows (the EMPLOYEES table, or CUSTOMERS table) where the plural is more
appropriate.
Naming of Columns
Column names are identical to the attribute names, with a few exceptions. Replace
special characters with an underscore character. In particular, remove the spaces from
attribute names, as SQL does not allow spaces in the names of relational elements.
Attribute Start Date converts to column Start_date; attribute Delivered Y/N transforms
to Delivered_y_n (or preferably Delivered_Ind). Often column names use more
abbreviations than attribute names.
......................................................................................................................................................
7-8 Data Modeling and Relational Database Design
Naming Convention
......................................................................................................................................................
Short Names
A unique short name for every table is a very useful element for the naming of foreign
key columns or foreign key constraints. A suggested way to make these short names is
based on the following rules:
• For entity names of more than one word, take the:
– First character of the first word.
– First character of the second word.
– Last character of the last word.
For example entity PRICED PRODUCT produces PPT as a short table name.
• For entity names of one word but more than one syllable, take the:
– First character of the first syllable.
– First character of the second syllable.
– Last character of the last syllable.
For example EMPLOYEE gives EPE as a short name.
• For entity names of one syllable, but more than one character, take the:
– First character.
– Second character.
– Last character.
For example FLIGHT gives FLT.
This short name construction rule does not guarantee uniqueness among short names
but experience has proved that duplicated names are relatively rare.
In case two short names happen to be the same, just add a number to the one that is
used less often giving, for example, CTR for the most frequently used one and then
CTR1 for the second one.
......................................................................................................................................................
7-9
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
7-8
• You can use any alpha-numeric character for naming tables and columns as long
as the name:
– Starts with a letter.
– Is up to 30 characters long.
– Does not include special characters such as “!” but “$”,’#” and “_” permitted.
......................................................................................................................................................
7-10 Data Modeling and Relational Database Design
Naming Convention
......................................................................................................................................................
• Table names must be unique within the schema that is shared with views and
synonyms.
• Within the same table two columns cannot have the same name.
• Be aware also of the reserved programming language words that are not allowed
for naming objects. Avoid names like:
– Number
– Sequence
– Values
– Level
– Type
for naming tables or columns. Refer to the RDBMS reference books for these.
......................................................................................................................................................
7-11
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
Basic Mapping
Entity Mapping
Before going into complex transformation we will look at the way to transform simple
entities.
Basic Mapping
1 - Entities
2 - Attributes
3 - Unique identifiers Table Name: EMPLOYEES
Short Name: EPE
7-11
1 Transform entities into tables using your own naming convention or the one
previously described.
In this example the entity EMPLOYEE produces a table name EMPLOYEES and
a short name EPE.
Use a box to represent tables on a diagram.
2 Each attribute creates a column in the table and the characteristics such as
mandatory or optional have to be kept for each column. Using the same notation
“*” or “o” facilitates recognition of these characteristics on a diagram.
3 All unique identifiers are transformed. A primary unique identifier is transformed
into a Primary key. The notation “pk” next to the column name indicates the
Primary key property. If more than one column is part of the primary key, use the
“pk” notation for each column.
You need to implement secondary unique identifiers, even if they do not appear on
your ER diagram. To preserve this property, secondary UIDs are transformed as
unique keys. In the above example, the values for the combination of two columns
must be unique. They belong to the same unique key and each column has a uk1
notation to indicate this. If, in future, another unique key comes to exist for that table,
it would be notated as uk2.
......................................................................................................................................................
7-12 Data Modeling and Relational Database Design
Basic Mapping
......................................................................................................................................................
EMPLOYEE
# Id DEPARTMENT
* Name # Id
o Address * Name
* Birth Date
fk2 = epe_epe_fk
EMPLOYEES (EPE)
pk * Id
* Name DEPARTMENTS (DPT)
fk1 * Dpt_id pk * Id
fk2 o Epe_id fk1 = epe_dpt_fk uk * Name
7-12
Foreign Key Columns: A relationship creates one or more foreign key columns in
the table at the many side. Using previous naming rules, the name of this foreign key
column is Dpt_id for the relationship with Department and Epe_id for the recursive
relationship. This ensures that column names such as Id, coming from different tables,
still provide a unique column name in the table.
Depending on whether or not the relationship is required, the foreign key column is
mandatory or optional.
Foreign Key Constraints: The foreign key constraints between EMPLOYEES and
DEPARTMENTS is epe_dpt_fk. The recursive one between EMPLOYEES and
EMPLOYEES is called epe_epe_fk.
......................................................................................................................................................
7-13
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
Relationship Mapping
Mapping of One-to-Many Relationships
As previously mentioned, some of the meaning that is expressed in an ERD cannot be
reproduced in the physical database design.
.
XS
fk o Y_id
XS
fk * Y_id
7-13
A relationship in an ER Diagram expresses the rules that apply between two entities,
from two points of view. The notation used in the ERD is rich enough to tell, for
example, that the relationship is mandatory on both sides. The illustration shows that
the 1:m relationships that are mandatory at the one side are implemented in exactly the
same way as the ones that are optional at the one side. This means that part of the
content of the ER model is lost during transformation, due to the relational model
limitations. You need to keep track of these incomplete transformations; they must be
implemented using a mechanism other than a declarative constraint.
......................................................................................................................................................
7-14 Data Modeling and Relational Database Design
Relationship Mapping
......................................................................................................................................................
You can implement code to check this on the server side or on the client side. In an
Oracle environment this was usually done at the client side. Since Oracle 8, on the
server side Oracle offers implementation possibilities that were not available in
previous releases.
X Y
# Id
* C1 # Id
* C2
XS (X) YS (Y)
pk * Id fk = y_x_fk pk * Id
* C1 pk, fk * X_id
* C2
7-14
This relationship property does not migrate to the physical database design because it
has no natural counterpart in an RDBMS, although you can code a solution at the
server side. In the example, you would create an update trigger at table YS that fails
when the foreign key column X_id is updated.
......................................................................................................................................................
7-15
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
A B C D
# Id # Id # Id # Id
* C1 * C2 * C3 * C4
To avoid column names that could end up with more than 30 characters, the suggested
convention is never to use more than two table prefixes.
The usual choice for the foreign key column names is:
<nearest by table short name> _ <farthest table short name> _ <column name>
In the above example the foreign key column in DS that comes all the way from AS
through BS and CS is named C_a_id instead of C_b_a_id.
As the short names are usually three characters long, this rule explains why attribute
names should not have more than 22 characters.
......................................................................................................................................................
7-16 Data Modeling and Relational Database Design
Relationship Mapping
......................................................................................................................................................
X Y
# Id # Id
* C1 * C2
XS YS
pk * Id pk * Id
* C1 X_YS * C2
pk,fk1 * X_id
pk,fk2 * Y_id
fk1 = xy_x_fk fk2 = xy_y_fk
7-16
The intersection table contains all the combinations that exist between XS and YS.
• This table has no columns other than foreign key columns. These columns together
form the primary key.
• The rule for naming this table is short name of the first table (in alphabetical order)
and full name of the second one. This would give a many-to-many relationship
between tables EMPLOYEES and PROJECTS an intersection table named
EPE_PROJECTS.
• Whether the relationship was mandatory or not, the foreign key columns are
always mandatory.
Note this table is identical (except, possibly, for its name) to the table that would result
from an intersection entity that could replace the m:m relationship.
......................................................................................................................................................
7-17
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
X Y
# Id # Id
* C1 * C2
YS (Y)
XS (X)
fk = y_x_fk pk * Id
pk * Id * C2
* C1 fk,uk * X_id
7-17
When transforming a one-to-one relationship, you create a foreign key and a unique
key. All columns of this foreign key are also part of a unique key.
If the relationship is mandatory on one side, the foreign key is created at the
corresponding table. If the relationship is mandatory on both sides or optional on both
sides, you can choose on which table you want to create the foreign key. There is no
absolute rule for deciding on which side to implement it.
If the relationship is optional on both sides you may decide to implement the foreign
key in the table with fewer numbers of rows, as this would save space.
If the relationship is mandatory at both ends, we are facing the same RDBMS
limitation you saw earlier. Therefore, you need to write code to check the mandatory
one at the other side, just as you did to implement m:1 relationships that are mandatory
at the one end.
Alternative Implementations
A 1:1 relationship between two entities can be implemented by a single table. This is
probably the first implementation to consider. It would not need a foreign key
constraint.
A third possible implementation is to create an intersection table, as if the relationship
was of type m:m. The columns of each of the foreign keys of the intersection table
would be part of unique keys as well.
......................................................................................................................................................
7-18 Data Modeling and Relational Database Design
Relationship Mapping
......................................................................................................................................................
Mapping of Arcs
Mapping Arcs
Explicit implementation
USER
# Id
LIST ITEM * Name
ALIAS
# Id
fk1 = lim_x_fk
USERS (USR)
LIST_ITEMS (LIM)
fk2 = lim_usr_fk pk * Id
pk,fk1 * X_id * Name
fk2 o Usr_id
fk3 o Als_id ALIASES (ALS)
fk3 = lim_als_fk pk * Id
+ check constraint
7-18
The first solution illustrated above shows that there are as many foreign keys created
as there are relationships. Therefore a rule must be set to verify that if one of the
foreign keys is populated, the others must not be populated (which is the exclusivity
principle of the relationships in an arc) and that one foreign key value must always
exist (to implement the mandatory condition).
From a diagram point of view, all foreign keys must be optional, but additional code
will perform the logical control. One solution on the server side is to create a check
constraint at LIST_ITEMS as is:
CHECK ( usr_id IS NOT NULL
AND als_id IS NULL)
OR ( usr_id IS NULL
AND als_id IS NOT NULL).
This controls the exclusivity of mandatory relationships.
In case the relationships are optional, you need to add:
OR (usr_id IS NULL AND als_id IS NULL)
An other syntax that is often used:
DECODE (usr_id,NULL,0,1)
+ DECODE (als_id,NULL,0,1)=1;
(or =<1 for optional relationship).
You can also map arcs in a different way using the generic arc implementation. This is
a historical solution that you may encounter in old systems. You should not use it in
new systems. It is discussed in the lesson on Design Considerations.
......................................................................................................................................................
7-19
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
Mapping of Subtypes
In mapping subtypes, you must make a choice between three different types of
implementations. All three are discussed in detail.
Mapping Subtypes
P K
# Id # Id
* Xxx
A • Supertype
Q
Yyy # Id
o • Subtype
R B
• Both Supertype
* Zzz # Id and Subtype (“Arc”)
L
# Id
7-19
Supertype Implementation
This choice produces one single table for the implementation of the entities P, Q, and
R. The supertype implementation is also called single (or one) table implementation.
Rules
1 Tables:
– Independent of the number of subtypes, only one single table is created.
2 Columns:
– The table gets a column for all attributes of the supertype, with the original
optionality.
– The table also gets a column for each attribute belonging to the subtype but the
columns are all switched to optional.
– Additionally, a mandatory column should be created to act as a discriminator
column to distinguish between the different subtypes of the entity. The value it
can take is from the set of all the subtype short names (DBE, DBU in the
example). This discriminator column is usually called <table_short_ name> _
type, in the example Dba_type.
3 Identifiers:
– Unique identifiers translate into primary and unique keys.
......................................................................................................................................................
7-20 Data Modeling and Relational Database Design
Mapping of Subtypes
......................................................................................................................................................
– Unique identifiers at subtype level usually translate into a unique key or check
constraint only.
Supertype Implementation
P K
# Id # Id
* Xxx
A PS (P)
Q
o Yyy # Id
pk * Id
* Xxx
R B o Yyy
* Zzz # Id o Zzz
fk1 * A_id
fk2 o B_id
• Mandatory
L discriminator * P_type
# Id column
• Additional
constraints
7-20
4 Relationships:
–Relationships at the supertype level transform as usual. Relationships at
subtype level are implemented as foreign keys, but the foreign key columns all
become optional.
5 Integrity constraints:
– For each particular subtype, all columns that come from mandatory attributes
must be checked to be NOT NULL.
– For each particular subtype, all columns that come from attributes or
relationships of other subtypes must be checked to be NULL.
Note: You may avoid the use of the discriminator column if you have one
mandatory attribute in each subtype. The check is done directly on these columns to
find out what type a specific row belongs to.
......................................................................................................................................................
7-21
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
• The access path to the data of the various types is the same.
• Business rules are globally the same for the subtypes.
• The number of instances per subtype does not differ too much, for example, one
type having more than, say, 1000 times the number of instances of the other.
• An instance of one subtype can become an instance of another, for example,
imagine an entity ORDER with subtypes OPEN ORDER and PROCESSED
ORDER, each subtype having its own properties. An OPEN ORDER may
eventually become a PROCESSED ORDER.
Additional Objects
Usually you would create a view for every subtype, showing only the columns that
belong to that particular subtype. The correct rows are selected using a condition based
on the discriminator column. These views are used for all data operations, including
inserts and updates. All applications can be based on the view, without loss of
performance.
The supertype table plus subtype views is an elegant and appropriate implementation
and should be considered as first choice.
......................................................................................................................................................
7-22 Data Modeling and Relational Database Design
Subtype Implementation
......................................................................................................................................................
Subtype Implementation
This subtype table implementation (often loosely referred to as two-table
implementation) produces one table for each of the subtypes, assuming there are only
two subtypes, such as Q and R.
Subtype Implementation
P K QS (Q)
# Id # Id pk * Id q_a_fk
* Xxx * Xxx
A o Yyy
Q fk * A_id
o Yyy # Id
R B
* Zzz # Id RS (R)
fk1=r_a_fk
pk * Id
* Xxx
* Zzz fk2=r_b_fk
L
fk1 * A_id
# Id fk2 * B_id
7-21
Rules
1 Tables:
– One table per first level subtype.
2 Columns:
– Each table gets a column for all attributes of the supertype, with the original
optionality.
– Each table also gets a column for each attribute belonging to the subtype, also
with the original optionality.
3 Identifiers:
– The primary unique identifier at the supertype level creates a primary key for
each of the tables. Alternatively, if the subtypes had their own UID, this one
are used as the basis for the primary key.
– Secondary identifiers of the supertype become unique keys within each table.
4 Relationships:
– All tables get a foreign key for a relationship at the supertype level with the
original optionality.
......................................................................................................................................................
7-23
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
–For the relationships at the subtype levels, the foreign key is implemented in
the table it is mapped to. The original optionality is retained.
5 Integrity constraints:
– No specific additional checks are required. Only when the Id values must be
unique across all subtypes would it need further attention.
Additional Objects
Usually you would create an additional view that represents the supertype showing all
columns of the supertype and various subtypes. The view select statement must use the
union operator. The view can be used for queries only, not for data manipulation.
......................................................................................................................................................
7-24 Data Modeling and Relational Database Design
Subtype Implementation
......................................................................................................................................................
R B
* Zzz # Id fk1 = fk2 =
p_q_fk p_r_fk
QS (Q) RS (R)
L pk Id pk * Id
* r_b_fk
# Id o Yyy * Zzz
fk * B_id
7-22
This choice produces one table for every entity, linked to foreign keys in an exclusive
arc at the PS side. It is the implementation of the model as if the subtypes were
modeled as standalone entities with each one having an is subtype of / is supertype of
relationship to the supertype. These relationships are in an arc. Therefore this
implementation is also called Arc Implementation. See also the chapter on Constraints
for more details about subtypes compared to the arc.
Rules
1 Tables:
– As many tables are created as there are subtypes, as well as one for the
supertype.
2 Columns:
– Each table gets a column for all attributes of the entity it is based on, with the
original optionality.
3 Identifiers:
– The primary UID at the supertype level creates a primary key for each of the
tables.
– All other unique identifiers transform to unique keys in their corresponding
tables.
4 Relationships:
– All tables get a foreign key for a relevant relationship at the entity level with
......................................................................................................................................................
7-25
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
Additional Objects
Although you would hardly use them, you could consider creating additional views
that represent the supertype and various subtypes in full.
......................................................................................................................................................
7-26 Data Modeling and Relational Database Design
Subtype Implementation
......................................................................................................................................................
Storage Implication
The illustrations show the differences between the one, two, and three table
implementations. In most database systems empty column values do take some bytes
of database space (although this sounds contradictory). In Oracle this is very low when
the empty columns are at the end of the table and when the data type is of variable size.
Supertype Implementation All rows for both types are in one table. Note the
empty space in the Q rows at the R columns and vice-versa.
Storage Implication
Supertype Implementatioin
discriminator column
cols cols cols
P
P Q R
Q
rows Q
R
rows R
7-24
......................................................................................................................................................
7-27
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
Storage Implication
Subtype Implementation
rows Q
rows R
7-25
......................................................................................................................................................
7-28 Data Modeling and Relational Database Design
Subtype Implementation
......................................................................................................................................................
Arc Implementation In this three table implementation the one table is sliced
vertically into a P-columns-only portion. The remaining part is horizontally split into
the Q and R columns and rows. An additional foreign key column at P, or a foreign
key column at both Q and R is needed to connect all the pieces together.
Storage Implication
Supertype and Subtype (Arc) Implementation
cols
P
fk cols fk cols
rows Q Q R
rows Q
rows R rows R
7-26
......................................................................................................................................................
7-29
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
Summary
Summary
• Relational concepts
• Naming rules convention
• Basic mapping
• Complex mapping
7-27
Relational databases implement the relational theory they are based on.
A coherent naming rule can prevent many errors and frustrations and adds to the
understanding of the structure of the database schema.
You have seen how to map basic elements from an ER model such as entities and
relationships. You can do this very simply. There are also complex structures which
require decisions on how to transform them. Some ER model elements can only be
implemented by coding check constraints or database triggers. These are specific to
Oracle and not part of the ISO standard for relational databases.
......................................................................................................................................................
7-30 Data Modeling and Relational Database Design
Practice 7—1: Mapping basic Entities, Attributes and Relationships
..........................................................................................................................................................................
Scenario
The following is part of the simple Moonlight ER model showing the entities of
DEPARTMENT and EMPLOYEE. Map the entities, attributes, relationships,
optionality, and keys of the following diagram.
EMPLOYEE
# Id assigned DEPARTMENT
to # Id
* First Name
* Last Name * Name
* Date of Birth * Location
responsible
o Home Phone for
EMPLOYEES ( ) DEPARTMENTS ( )
Your Assignment
1 Map both entities to tables and all attributes to columns.
2 Map relationships to foreign keys columns and mark as (fk).
3 Map all optionality tags to not nulls (*).
4 Map UID tags to primary keys (pk).
5 On the table diagram, name all the elements that must be created following this
implementation. Use the naming convention as described in this lesson, or use
your own rules. Give proper names to the columns and foreign key constraints.
EMPLOYEES ( ) DEPARTMENTS ( )
..........................................................................................................................................................................
7-31
®
Lesson 7: Mapping the ER Model
..........................................................................................................................................................................
Scenario
Here is part of the Moonlight ER model showing the entity DEPARTMENT. One of
the analysts has decided to implement the DEPARTMENT entity and its subtypes as a
single table.
Practice: Mapping Supertype
reporting to report of
DEPARTMENT report
# Id HQ COUNTRY
of
* Name * Address ORGANIZATION
* Head Count reporting # Tax Id Number
to
OTHER DEPARTMENT
Your Assignment
1 What would have been the rationale of this choice?
2 On the table diagram, name all the elements that must be created following this
supertype implementation. Use the naming convention as described in this lesson,
or use your own rules. Give proper names to the columns and foreign key
constraints and identify check constraints, if any.
DEPARTMENTS ( )
7-29
..........................................................................................................................................................................
7-32 Data Modeling and Relational Database Design
Practice 7—3: Quality Check Subtype Implementation
..........................................................................................................................................................................
Scenario
Here is a part of the Moonlight ER model.
COUNTRY
# Code
with PRODUCT GROUP with
# Name in
with SHOP
with
in # No
* Name
PRODUCT * Address
for
GLOBAL of * City
PRICE LIST # Code LOCAL
# Start Date o Size # Name
* End Date
with with
in of
GLOBAL PRICE
* Amount
7-30
Your Assignment
Perform a quality check on the proposed subtype implementation of entity
PRODUCT.
lpt_shop_fk
..........................................................................................................................................................................
7-33
®
Lesson 7: Mapping the ER Model
..........................................................................................................................................................................
Scenario
This practice is based on the same ER diagram as the previous practice.
GLOBAL_PRODUCTS (GPT)
gpt_pgp_fk
pk * Code
o Size
LOCAL_PRODUCTS (LPT)
fk1=shp_lpt_fk
pk * Name
pk, fk1 o Shp_no fk2=pgp_lpt_fk
fk1 * Pgp_name
7-32
Your Assignment
Perform a quality check on the proposed supertype and subtype implementation of the
entity PRODUCT and its subtypes. Also, check the selected names.
..........................................................................................................................................................................
7-34 Data Modeling and Relational Database Design
Practice 7—5: Mapping Primary Keys and Columns
..........................................................................................................................................................................
Scenario
This practice is based on the same model that was used in the previous practice.
Your Assignment
Identify the Primary key columns and names resulting from the transformation of the
GLOBAL PRICE entity. Give the short name.
GLOBAL_PRICES ( )
..........................................................................................................................................................................
7-35
®
Lesson 7: Mapping the ER Model
......................................................................................................................................................
......................................................................................................................................................
7-36 Data Modeling and Relational Database Design
.................................
Denormalized Data
Lesson 8: Denormalized Data
..........................................................................................................................................
Introduction
Lesson aim
This lesson shows you the most common types of denormalization with examples.
Overview
• Denormalization
• Benefits
• Types of denormalization
8-2
.............................................................................................................................................
8-2 Data Modeling and Relational Database Design
Introduction
..........................................................................................................................................
Objectives
..........................................................................................................................................
®
8-3
Lesson 8: Denormalized Data
..........................................................................................................................................
Denormalization Overview
Denormalization
• Starts with a “normalized” model
• Adds “redundancy” to the design
• Reduces the “integrity” of the design
• Application code added to compensate
8-3
.............................................................................................................................................
8-4 Data Modeling and Relational Database Design
Why and When to Denormalize
..........................................................................................................................................
Denormalization Techniques
8-4
..........................................................................................................................................
®
8-5
Lesson 8: Denormalized Data
..........................................................................................................................................
A
After pk * Id
* X
* Total_quantity
8-5
Appropriate:
• When the source values are in multiple records or tables
• When derivable values are frequently needed and when the source values are not
• When the source values are infrequently changed
Advantages:
• Source values do not need to be looked up every time the derivable value is
required
• The calculation does not need to be performed during a query or report
Disadvantages:
• DML against the source data will require recalculation or adjustment of the
derivable data
• Data duplication introduces the possibility of data inconsistencies
.............................................................................................................................................
8-6 Data Modeling and Relational Database Design
Storing Derivable Values
..........................................................................................................................................
MESSAGES (MSE)
After pk * Id
* Subject
* Text
* Number_of_times_received
8-6
When a message is delivered to a recipient, the user only receives a pointer to that
message, which is recorded in RECEIVED_MESSAGES. The reason for this, of
course, is to prevent the mail system from storing a hundred copies of the same
message when one message is sent to a hundred recipients.
Then, when someone deletes a message from their account, only the entry in the
RECEIVED_MESSAGES table is removed. Only after all RECEIVED_MESSAGE
entries, for a specific message, have been deleted, the should the actual message be
deleted too.
We could consider adding a denormalized column to the MESSAGES table to keep
track of the total number of RECEIVED_MESSAGES that are still kept for a
particular message. Then each time users delete a row in RECEIVED_MESSAGES,
in other words, they delete a pointer to the message, the Number_of_times_received
column can be decremented. When the value of the denormalized column equals zero,
then we know the message can also be deleted from the MESSAGES table.
..........................................................................................................................................
®
8-7
Lesson 8: Denormalized Data
..........................................................................................................................................
Pre-Joining Tables
You can pre-join tables by including a nonkey column in a table, when the actual value
of the primary key, and consequentially the foreign key, has no business meaning. By
including a nonkey column that has business meaning, you can avoid joining tables,
thus speeding up specific queries.
You must include application code that updates the denormalized column, each time
the “master” column value changes in the referenced record.
Pre-Joining Tables
Before
B
A
pk pk * Id
* Id
fk * A_id
* Col_a
Add the non_key column to the table with the foreign key.
B
After
pk * Id
fk * A_id
* A_col_a
8-7
Appropriate:
• When frequent queries against many tables are required
• When slightly stale data is acceptable
Advantages
• Time-consuming joins can be avoided
• Updates may be postponed when stale data is acceptable
Disadvantages
• Extra DML needed to update original nondenormalized column
• Extra column and possibly larger indices require more working space and disk
space
.............................................................................................................................................
8-8 Data Modeling and Relational Database Design
Pre-Joining Tables
..........................................................................................................................................
* pk,fk * Mse_id
pk Id
* pk,fk * Flr_id
Name
* Date_received
RECEIVED_MESSAGES (RME)
After pk,fk * Mse_id
pk,fk * Flr_id
* Date_received
* Fdr_Name
8-8
Example
Suppose users often need to query RECEIVED_MESSAGES, using the name of the
folder where the received message is filed. In this case it saves time when the name of
the folder is available in the RECEIVED_MESSAGES table.
Now, if a user needs to find all messages in a particular folder, only a query on
RECEIVED_MESSAGES is needed.
Clearly, the disadvantage is extra storage space for the extra column in a, potentially,
very large table.
..........................................................................................................................................
®
8-9
Lesson 8: Denormalized Data
..........................................................................................................................................
Hard-Coded Values
If a reference table contains records that remain constant, then you can consider hard-
coding those values into the application code. This will mean that you will not need to
join tables to retrieve the list of reference values. This is a special type of
denormalization, when values are kept outside a table in the database. In the example,
you should consider creating a check constraint to the B table in the database that will
validate values against the allowable reference values. Note that a check constraint,
though it resides in the database, is still a form of hardcoding.
Whenever a new value of A is needed the constraint must be rewritten.
Hard-Coded Values
Before
B
A
pk Id pk * Id
*
Type fk * A_id
*
Remove the foreign key and hard code the allowable values and
validation in the application.
B
After pk Id
*
* A_Type
8-9
Appropriate
• When the set of allowable values can reasonably be considered to be static during
the life cycle of the system
• When the set of possible values is small, say, less than 30
Advantages
• Avoids implementing a look-up table
• Avoids joins to a look-up table
Disadvantages
• Changing look-up values requires recoding and retesting
.............................................................................................................................................
8-10 Data Modeling and Relational Database Design
Hard-Coded Values
..........................................................................................................................................
8-10
Example
ElectronicMail would like to know some background information about their users,
such as the type of business they work in. Therefore EM have created a table to store
all the valid BUSINESS_TYPES they want to distinguish. The values in this table are
set up front and not likely to change.
This is a candidate for hard-coding the allowable values. You could consider placing a
check constraint on the column in the database. In addition to that, or instead of that,
you could build the check into the field validation for the screen application where
users can sign in to the EM service.
..........................................................................................................................................
®
8-11
Lesson 8: Denormalized Data
..........................................................................................................................................
Keeping
Keeping Details
Details with
with Master
Master
Before
Before
B B
A A
pk,fk * A_id
pk,fk A_id
*
pk pk* *Id Id pk pk * * TypeType
Amount
* * Amount
Add
Add thethe repeating
repeating detail
detail columns
columns to the
to the master
master table.
table.
A A
After pk pk*
After
*Id Id
Type1
*Amount_1
*
Amount_1
*Amount_2
*
Type2
*Amount_3
*
Amount_2
*Amount_4
*
Type3
*Amount_5
*
Amount_3
*Amount_6
*
8-11
Appropriate
• When the number of detail records for all masters is fixed and static
• When the number of detail records multiplied by the number of columns of the
detail is small, say less than 30
Advantages
• No joins are required
• Saves space, as keys are not propagated
Disadvantages
• Increases complexity of data manipulation language (DML) and SELECTs across
detail values
• Checks for Amount column must be repeated for Amount1, Amount2 and so on
• Table name A might no longer match the actual content of the table
.............................................................................................................................................
8-12 Data Modeling and Relational Database Design
Keeping Details With Master
..........................................................................................................................................
USERS (USR)
After pk * Id
* Name
* Message_Quota_Allocated
* Message_Quota_Available
* File_Quota_Allocated
* File_Quota_Available
8-12
Example
Suppose each e-mail user is assigned two quotas—one for messages and one for files.
The amount of each quota is different, so both have to be tracked individually. The
quota does not change very frequently. To be relationally pure, we would create a two-
record STORAGE_TYPES table and a STORAGE_QUOTAS table with records for
each user, one for each quota type. Instead, we can create the following denormalized
columns in the USER table:
• Message_Quota_Allocated
• Message_Quota_Available
• File_Quota_Allocated
• File_Quota_Available
Note that the name of table USERS does not really match the data in the denormalized
table.
..........................................................................................................................................
®
8-13
Lesson 8: Denormalized Data
..........................................................................................................................................
A
After
pk * Id
* Current_price
8-13
Appropriate
• When detail records per master have a property such that one record can be
considered “current” and others “historical”
• When queries frequently need this specific single detail, and only occasionally
need the other details
• When the Master often has only one single detail record
Advantages
• No join is required for queries that only need the specific single detail
Disadvantages
• Detail value must be repeated, with the possibility of data inconsistencies
Additional code must be written to maintain the duplicated single detail value at the
master record.
.............................................................................................................................................
8-14 Data Modeling and Relational Database Design
Repeating Single Detail with Master
..........................................................................................................................................
MESSAGES (MSE)
After
pk * Id
* First_attachment_name
* Subject
* Text
8-14
Example
Any time a message is sent, it can be sent with attachments included. Messages can
have more than one attachment. Suppose in the majority of the messages that there is
no or only one attachment. To avoid a table join, you could store the attachment name
in the MESSAGES table. For those messages containing more than one attachment,
only the first attachment would be taken. The remaining attachments would be in the
ATTACHMENTS table.
..........................................................................................................................................
®
8-15
Lesson 8: Denormalized Data
..........................................................................................................................................
Short-Circuit Keys
For database designs that contain three (or more) levels of master detail, and there is a
need to query the lowest and highest level records only, consider creating short-circuit
keys. These new foreign key definitions directly link the lowest level detail records to
higher level grandparent records. The result can produce fewer table joins when
queries execute.
Short-Circuit Keys
Before
B C
A
pk * Id pk * Id
pk * Id fk A_id fk * B_id
*
A B C
After pk Id
pk * Id * pk * Id
fk * A_id fk * B_id
fk * A_id
8-15
Appropriate
• When queries frequently require values from a grandparent and grandchild, but not
from the parent
Advantages
• Queries join fewer tables together
Disadvantages
• Extra foreign keys are required
• Extra code is required to make sure that the value of the denormalized column
A_id is consistent with the value you would find after a join with table B.
.............................................................................................................................................
8-16 Data Modeling and Relational Database Design
Short-Circuit Keys
..........................................................................................................................................
Create a new foreign key from the lowest detail to the highest
master.
After
RECEIVED_
FOLDERS (FDR) MESSAGES (RME)
USERS (USR)
pk * Name pk * Id
pk * Id fk * Usr_id fk * Fdr_name
uk * Name fk * Usr_name
8-16
Example
Suppose frequent queries are submitted that require data from the
RECEIVED_MESSAGES table and the USERS table, but not from the FOLDERS
table. To avoid having to join USERS and FOLDERS, the primary or a unique key of
the USERS table can been migrated to the RECEIVED_MESSAGES table, to provide
information about USERS and RECEIVED_MESSAGES with one less, or no, table
join.
..........................................................................................................................................
®
8-17
Lesson 8: Denormalized Data
..........................................................................................................................................
8-17
Appropriate
• When queries are needed from tables with long lists or records that are historical
and you are interested in the most current record
Advantages
• Can use the between operator for date selection queries instead of potentially time-
consuming synchronized subquery
Disadvantages
• Extra code needed to populate the end date column with the value found in the
previous start date record
.............................................................................................................................................
8-18 Data Modeling and Relational Database Design
End Date Columns
..........................................................................................................................................
8-18
Example
When a business wishes to track the price history of a product, they may use a PRICES
table that contains columns for the price and its start date and a foreign key to the
PRODUCTS table. To avoid using a subquery when looking for the price on a specific
date, you could consider adding an end date column. You should then write some
application code to update the end date each time a new price is inserted.
Compare:
...WHERE pdt_id = ...
AND start_date = ( SELECT max(start_date)
FROM prices
WHERE start_date <= sysdate
AND pdt_id = ...
)
and
...WHERE pdt_id = ...
AND sysdate between start_date and nvl(end_date, sysdate)
Note that the first table structure presupposes that products always have a price since
the first price start date of that product. This may very well be desirable but not always
the case in many business situations.
Note also that you would need code to make sure periods do not overlap.
..........................................................................................................................................
®
8-19
Lesson 8: Denormalized Data
..........................................................................................................................................
A B
pk,fk * A_id
pk * Id pk * Start_date
After B
pk,fk * A_Id
pk * Start_date
o Current_indicator
8-19
Appropriate
• When the situation requires retrieving the most current record from a long list
Advantages
• Less complicated queries or subqueries
Disadvantages
• Extra column and application code to maintain it
• The concept of “current” makes it impossible to make data adjustments ahead of
time
.............................................................................................................................................
8-20 Data Modeling and Relational Database Design
Current Indicator Column
..........................................................................................................................................
8-20
Example
In the first table structure, when the current price of a product is needed, you need to
query the PRICES table using:
...WHERE pdt_id = ...
AND start_date = ( SELECT max(start_date)
FROM prices
WHERE start_date <= sysdate
AND pdt_id = ...
)
..........................................................................................................................................
®
8-21
Lesson 8: Denormalized Data
..........................................................................................................................................
After A
pk * Id
fk * A_id
* Level_no
8-21
Appropriate
• When there are limits to the number of levels within a hierarchy, and you do not
want to use a connect-by search to see if the limit has been reached
• When you want to find records located at the same level in the hierarchy
• When the level value is often used for particular business reasons
Advantages
• No need to use the connect-by clause in query code
Disadvantages
• Each time a foreign key is updated, the level indicator needs to be recalculated,
and you may need to cascade the changes
.............................................................................................................................................
8-22 Data Modeling and Relational Database Design
Hierarchy Level Indicator
..........................................................................................................................................
8-22
Example
Imagine that because of storage limitations, a limit has been placed on the number of
nested folders. Each time a user wants to create a new instance of a folder within an
existing folder instance, code must decide if that limit has been reached. This can be a
slow process.
If you add a column to indicate at what nested level a FOLDER is, then when you
create a new folder in it, you can decide immediately if this is allowed. If it is, the level
of the new folder is simply one more than the level of the folder it resides in.
..........................................................................................................................................
®
8-23
Lesson 8: Denormalized Data
..........................................................................................................................................
Denormalization Summary
Denormalization is a structured process and should not be done lightly. Every
denormalization step will require additional application code. Be confident you do
want to introduce this redundant data.
Denormalization Summary
Denormalization Techniques
• Storing Derivable Information
– End Date Column
– Current Indicator
– Hierarchy Level Indicator
• Pre-Joining Tables
• Hard-Coded Values
• Keeping Detail with Master
• Repeating Single Detail with Master
• Short-Circuit Keys
8-23
.............................................................................................................................................
8-24 Data Modeling and Relational Database Design
Practice 8—1: Name that Denormalization
.....................................................................................................................................................
Your Assignment
For the following table diagrams, decide what type of
denormalization is used and explain why the diagram depicts the denormalization you
have listed.
Use one of:
• Storing derivable information
• Pre-Joining Tables
• Hard-Coded Values
• Keeping Details with Master
• Repeating Single Detail with Master
• Short-Circuit Keys
1
SHIFTS (SFT)
WEEKDAYS (WDY)
pk * No
pk * Code fk * Wdy_code
* Name Start_time
*
* End_time
* Wdy_name
PRICE_LISTS (PLT)
COUNTRIES (CTY)
pk,fk * Cty_code
pk * Code pk * Start_date
* Name o End_date
* Current_price_ind
.....................................................................................................................................................
®
8-25
Lesson 8: Denormalized Data
..........................................................................................................................................
Your Assignment
1 Indicate which triggers are needed and what they should do to handle the
denormalized column Order_total of ORDER_HEADERS.
* Order_total * Item_total
8-29
.............................................................................................................................................
8-26 Data Modeling and Relational Database Design
Practice 8—2: Triggers
..........................................................................................................................................
2 Indicate which triggers are needed and what they should do to handle the
denormalized column Lcn_address of EMPLOYEES.
8-31
..........................................................................................................................................
®
8-27
Lesson 8: Denormalized Data
..........................................................................................................................................
3 Indicate which triggers are needed and what they should do to handle the
denormalized column Curr_price_ind of table PRICES.
* Curr_price_ind
8-33
.............................................................................................................................................
8-28 Data Modeling and Relational Database Design
Practice 8—3: Denormalize Price Lists
.....................................................................................................................................................
Scenario
End users have started to complain about query performance. One of the areas where
this is particularly noticeable is when querying the price of a global product. Since
there is a large list of records in the GLOBAL_PRICES table, and it needs to be joined
with the PRICE_LISTS table, it is not surprising the queries can take a long time.
Optimizing the queries using other techniques have failed to result in acceptable
response times.Therefore the decision is to use some denormalization to correct this
problem.
The corporate office also has another concern. They would like to notify the local
shops of any new price list changes of global products, prior to their effective date.
They would like to enter the new price list information when it is decided, not when
the start date is reached. You need to add provision to alleviate this restriction.
Your Assignment
Describe what type of denormalization you would implement and what code you
would add to ensure the database does not lose any integrity. The next diagram shows
the current table schema. Consider both issues described above when deciding which
types of denormalization to implement.
.....................................................................................................................................................
®
8-29
Lesson 8: Denormalized Data
.....................................................................................................................................................
Scenario
The corporate office has decided to formalize English as the
corporate language. Headquarters has asked the IS department to arrange for all global
products to store their names in English. On the other hand, countries must be able to
store their native language equivalent.
Your Assignment
Using the design below, denormalize the table design and describe the additional code
that will allow this requirement to be implemented.
LANGUAGES (LGE)
pk * Code
* Name
PRODUCT_NAMES (PNE)
PRODUCTS (PDT)
pk * Code pk,fk * Pdt_code
o Size pk,fk * Lge_code
* Name
.....................................................................................................................................................
8-30 Data Modeling and Relational Database Design
.................................
Database Design
Considerations
Lesson 9: Database Design Considerations
......................................................................................................................................................
Introduction
Lesson Aim
This lesson illustrates some principles of the Oracle RDBMS and presents the various
techniques that can be used to refine the physical design.
Overview
9-2
......................................................................................................................................................
9-2 Data Modeling and Relational Database Design
Introduction
......................................................................................................................................................
Objectives
At the end of this lesson, you should be able to do the following:
• Describe which data types to use for columns
• Evaluate the quality of the Primary key
• Use artificial keys and sequences where appropriate
• Define rules for referential integrity
• Explain the use of indexes
• Discuss partitioning and views
• Recognize old-fashioned database techniques
• Explain the principle of distributed databases
• Describe the Oracle database model
......................................................................................................................................................
9-3
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
• User Expectations
• Volumes
Adapted
• Hardware
Initial design Physical
• Network
Design
• O.S.
• Oracle specifics
9-3
You have to analyze a large number of parameters to obtain a correct adapted physical
design from the initial design. Note the “a correct”, not “the correct”. Like many
design issues, there is no absolute truth here.
The points noted here are the most important ones—there are others.
• The expected volume of tables, the hardware characteristics like CPU speed,
memory size, number of disks and corresponding space, the architecture—client/
server or three tier, the network bandwidth, speed, and the operating systems are
determinants.
• User requirements are an other big issue. Depending on the response time, the GUI
and the frequency of use of modules, they influence the objects that can be used in
Oracle to cope with user expectations.
• Depending on the version of Oracle you are using, some elements may or may not
exist.
......................................................................................................................................................
9-4 Data Modeling and Relational Database Design
Oracle Data Types
......................................................................................................................................................
• Depending on:
– Domains
– Storage issue
– Performance
– Use
• Select a data type for columns:
– Character
– Number
– Date
– Large Objects
9-4
When you create a table or cluster, you must specify an internal data type for each of
its columns. These data types define a generic domain of values that each column can
contain.
• Some data types have a narrow focus, like number and date. Some data types are
general purpose data types, like the various character data types.
• Some data types allow for variable length, some do not.
Choosing a large fixed length for a column to store very few bytes for most of the
rows can result in a huge table size. This may affect performance as a row may
actually contain only a few bytes and yet be stored on multiple blocks, resulting in
a great number of I/O’s, and therefore decreasing performance.
• One cannot search against the Large Object Data Types; they cannot be used in a
where clause. They are only retrievable by searching against other columns.
......................................................................................................................................................
9-5
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
......................................................................................................................................................
9-6 Data Modeling and Relational Database Design
Column Sequence
......................................................................................................................................................
Column Sequence
The sequence of columns in a table is relevant, although any column sequence would
allow all table operations. The column sequence can influence, in particular, the
performance of data manipulation operations. It may also influence the size of a table.
9-5
* Incases where the table contains a LONG or LONG RAW column, even if it is a
mandatory column, make it the last column of the table.
The rationale is that null columns should be at the end of the table; columns that are
often used in search conditions should be up front. This is for both storage and
performance reasons.
......................................................................................................................................................
9-7
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
Primary Keys
9-6
Primary Keys
They are a strong concept that is usually enforced for every table.
• They can be made up of one or more columns; each has to be mandatory.
• They are declarative as a constraint and can be named. When creating a primary
key constraint, Oracle automatically creates a unique index in association with it.
• A foreign key usually refers to the primary key of a table, but may also refer to a
unique key.
Tables that do not have a primary key should have a unique key.
Note: Although Oracle allows a primary key to be updated, relational theory strongly
advises against this.
Unique Keys
A unique key is a key that for some reason was not selected to be the primary key. The
reasons may have been:
• Allowed nulls. Nulls may be allowed in Unique keys columns.
• Updatable. Unique key values may change but still need to remain unique. For
example, the home phone number of an employee or the license plate for a car.
There may be more than one unique key for each table.
Note: A Unique index is the additional structure Oracle uses to check the uniqueness
of values for primary keys and unique keys. Creating a unique key results
automatically in the creation of a unique index.
......................................................................................................................................................
9-8 Data Modeling and Relational Database Design
Primary Keys and Unique Keys
......................................................................................................................................................
Primary Keys
9-7
Easy to Use: Primary keys are normally used in join statements, so a primary key
should be easy to use. Writing a SQL statement to create a join between two tables is
easier if two columns only, rather than a large number, are involved in the join
predicate.
Does Not Kill Performance: A join operation using a single key usually performs
much better than a join using four key columns.
Small Size: Large-sized primary keys lead to large-sized foreign keys referencing
them. In general, the referencing table contains far more rows than the referenced
table. An oversized primary key can lead to a multiple of unnecessary bytes.
......................................................................................................................................................
9-9
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
Meaningless: You could, for example, choose to use the name of a country as a
primary key, but even recent history has shown that countries may change their names.
Opt for numeric values rather than character values, and if using numbers, avoid
numbers with any particular meaning.
Stable: You should try to avoid selecting a primary key that is likely to be updated.
Bear in mind that it is very rare for real world things to stay stable for ever.
......................................................................................................................................................
9-10 Data Modeling and Relational Database Design
Artificial Keys
......................................................................................................................................................
Artificial Keys
An artificial key is a meaningless, usually numeric, value that is assigned to a record
which functions as the primary key for the table. Artificial keys provide an interesting
alternative to complex primary keys. Artificial keys are also called surrogate keys.
Artificial Keys
u
pk ,fk1 * A_id
u
pk ,fk2 * B_id
XS (X) pk ,fk3 * C_id
u
pk Id * C4
* pk * Id
fk1 * D_a_id fk = x_d_fk
fk1
fk ** D_b_id
D_id
fk1 * D_c_id
o C5
9-8
Advantages
Artificial keys have the following advantages over composed keys:
• The extra space that is needed for the artificial key column and index is less, often
far less, than the space you save for the foreign key columns of referring tables.
• Join conditions consist of a single equation.
• The joins perform better.
• Internal references, which are completely invisible to the user, can be managed.
The modeled UID can than be implemented as a unique key, and made updatable
without needing cascade updates.
• Because they are meaningless, it is difficult to memorize them. Users will not even
attempt this.
• Some people really like them.
Disadvantages
Disadvantages of artificial keys are:
• Because they are meaningless, they always require joins to collect the meaning of
the foreign key column.
......................................................................................................................................................
9-11
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
• More space is required for the indexes, if you decide to create an additional unique
key that consists of the original primary key columns.
• Because they are meaningless, it is difficult to memorize them. Users always need
a list of values or other help for entering the foreign key values.
• Some people really hate them.
Before Design
Negative: It would corrupt your data model, as you would add elements that have
no business meaning.
Positive: There is a close mapping between the conceptual and technical model
that reduces the chances of misunderstanding.
After Design
Positive: It really is a design decision based on current performance
considerations.
Tools like Oracle Designer let you decide about artificial keys during the initial
mapping of the ER model. This is a nice compromise.
......................................................................................................................................................
9-12 Data Modeling and Relational Database Design
Sequences
......................................................................................................................................................
Sequences
Sequences
225
224
223
9-9
......................................................................................................................................................
9-13
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
Foreign Key
By definition, Foreign Keys must refer to primary key or unique key values. You
should consider what should happen if the primary key (or unique key) value changes.
Delete Update
Restrict
Cascade
Default / Nullify
9-10
Referential Integrity
There are two aspects to consider:
• The rules you want to implement to support business constraints
• The functionalities Oracle provides for these rules
Relational theory describes four possible kinds of behavior for a foreign key. For every
foreign key decide what kind of behavior you want it to have.
The behaviors describe what the foreign key should do when the value of the key it
refers to changes.
Restrict Delete
Restrict delete means that no deletes of a primary (or unique) key value are allowed
when referencing values exist. This is supported by Oracle. This is the most
commonly used foreign key behavior.
Restrict Update
Restrict update means that no updates of a primary (or unique) key value are allowed
when referencing values exist. This is supported by Oracle. Note that this behavior is
unnecessary in the case of artificial keys as these are probably never updated.
......................................................................................................................................................
9-14 Data Modeling and Relational Database Design
Sequences
......................................................................................................................................................
Note that restrict update is not the same concept as nontransferability. Restrict update
prevents the update of a referenced primary key value. Nontransferability means that
the foreign key columns are not updatable.
Cascade Delete
Cascade delete means that deletion of a row causes all rows that reference that row
through a foreign key marked as “cascade” will be deleted automatically. Cascade
delete is an option that Oracle supports.
The complete delete operation will fail if, during the cascade, there is a record
somewhere that cannot be deleted. This may happen if the record to be deleted is
referred to through a restrict delete foreign key.
Cascade delete is a very powerful mechanism that should be used with care.
Cascade Update
Cascade update means that after a primary key value is updated, this change is
propagated to all the foreign key columns referencing it.
Cascade update and nontransferability often come together.
Typical Use
Usually, many foreign keys are defined as restrict delete. This does not prevent the
referred record being deleted; it just forces the user to consciously remove or transfer
all referring rows.
Of course, when you use artificial keys you can set all foreign key update properties to
“restrict” as there will never be a good reason for updating an artificial key value.
......................................................................................................................................................
9-15
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
Indexes
Indexes are database structures that are stored separately from the tables they depend
on. In a relational database you can query any column, independently of the existence
of an index on that column.
Indexes
• Performance
Name Phone
b
c
ALBERT 2655
d
ALFRED 3544 ef
gh
ij
ALICE 7593 kl
m
ALLISON 3456
no
pq
ALVIN 8642
rs
tu
ALPHONSO 2841 vw
xyz
• Uniqueness
9-11
Performance
Indexes are created to provide a fast method to retrieve values. However, indexes can
slow down performance on DML statements.
Oracle provides a wide range of index types. You must choose the type which is
suitable for its intended use.
Uniqueness
A unique index is an efficient structure to ensure that the values are not duplicated
within the set of columns included in the index. Unique indexes are automatically
created when you create a primary or unique key. The name of the index in that case is
the same as the name of the key constraint.
......................................................................................................................................................
9-16 Data Modeling and Relational Database Design
Indexes
......................................................................................................................................................
Index Types
See page 38
Choosing Indexes
B*tree
Bitmap
aba .1.2.5 X Y Z
abb .1.4.5
Reverse 0 1 0
aba .1.2.5 1 0 0
abb .1.3.5 bba .1.3.5 0 0 1
abc .1.1.5 cba .1.1.5 1 0 0
bba .1.4.5 ... 0 0 1
...
C1 C2 C1 C2
I.O.Table abc Y
aba X
abb Z aba X
abc Y abb Z
bba Z bba X
bbc X bbc Z
9-12
B*Tree
The classical structure of an index, if not explicitly specified otherwise, is the B*Tree
(also known as Tree balanced) index. It is specially designed for online transaction
processing systems. They have a proven efficiency and Oracle has offered them for
some time. They easily support insert, update, and delete.
Reverse Key
Based on that classical structure of the B*Tree, Oracle offers a reverse key index
which has most of the properties of the B*Tree but in which the bytes of each indexed
column are reversed.
Bitmap
A bitmap index stores for each individual value of the indexed column, if a row
contains this value or not.
Typical use: Data warehouse environment. Bitmap indexes have a proven efficiency
in On Line Analytical Process systems when ad-hoc queries can be intensive and the
number of distinct values for the indexed column is not high.
......................................................................................................................................................
9-17
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
Bitmap indexes require less space than a B*Tree index but they do not support inserts,
updates, and deletes as well as a B*Tree.
Typical use: Tables that are always accessed through exactly the same path, in
particular when storing large objects.
Concatenated Index
You can create an index that includes more than one column. These are called
concatenated indexes. The order in which you specify the columns has a strong impact
on the way Oracle can use the index. Set the column that is always in a Where clause
as the first column of the index. This is called the leading part of the index.
Typical use: Create an index on the first three characters of a name using the substr
function or the year component of a date using the to_char function.
......................................................................................................................................................
9-18 Data Modeling and Relational Database Design
Choosing Columns to Index
......................................................................................................................................................
! Avoid indexing:
• Small tables
• Columns frequently updated
9-13
......................................................................................................................................................
9-19
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
Temporary Indexes
• Indexes can be created and dropped for a particular incidental use. For example,
you can decide to create an index right before a report is run and then drop it
afterwards.
General Recommendations
• Limit the number of indexes per table. Although a table can have any number of
indexes this does not necessarily improve performance; the more indexes, the
more overhead is incurred when there are updates or deletes.
• As a rule of thumb, if there is any doubt, do not create the index. You can always
create it later.
• It is very likely that the initial set of indexes will have to change after some time,
because of changes of the characteristics of the system. Typically, the number of
different values in a column can initially be very low but increase during the life
cycle of a system. Initially, an index would not be of value but it would be later.
......................................................................................................................................................
9-20 Data Modeling and Relational Database Design
When Are Indexes Used?
......................................................................................................................................................
9-14
You may have created an index to improve performance but without seeing any
benefits.
For Oracle to use them, indexed columns need to be referenced in the Where clause of
a SQL statement, or in the order by, while the Where clause must not include the
following:
• IS NULL
• IS NOT NULL
• !=
• LIKE
• When the column is affected by an operation or function (unless you use a
function-based index and the condition uses the same function)
For example, suppose column X contains many nulls and a few numeric, positive
values. Suppose queries often select all rows having a NOT NULL value. Finally,
suppose an index is created on X.
In this case, the condition WHERE X > 0 is preferable to WHERE X IS NOT NULL
because in the first situation Oracle would use an index on X and in the second Oracle
would not.
Yet, even if it was written in this way, it is the optimizer’s choice to decide whether to
use indexes or not. The decision is based on rules or on statistics.You can stimulate the
optimizer to use indexes using hints in your SQL statements.
......................................................................................................................................................
9-21
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
Table Partitioning
Oracle provides an interesting feature to solve performance and administration
problems on tables with a large number of rows.
CUSTOMERS_R1
Col1 Col2 Col3 Region
CUSTOMERS_R2
Col1 Col2 Col3 Region
9-15
Partitioned Table
Since Oracle8, when creating a table, you can specify the criteria on which you want
to divide the table and make a horizontal partitioning. There are then as many
partitioned tables as there are distinct values in the column. Each partitioned table has
a specific name but access is made referring to the global name of the table. The
optimizer then decides which partition to access, depending on the value of the Where
clause.
The main issue of this feature is to manipulate considerably smaller pieces of data and
then improve the speed of SQL statements. Suppose you want to query on customers
located in a specific region, Oracle does not need to access all rows of the
CUSTOMERS table but can limit its search to the piece holding all customers of this
region only.
Logically, the table behaves as one object; physically, data is stored in different places.
Partitioned Index
Using the same idea, an index may be partitioned. It does not need to match with the
table partitioning. It may have different partitioning criteria and have a different
number of partitions to the table. This may be useful in the situation where the answer
to particular queries can always be found in the partitioned index.
......................................................................................................................................................
9-22 Data Modeling and Relational Database Design
Views
......................................................................................................................................................
Views
A view is a window onto the database. It is defined by a SELECT statement which is
named and stored in the database. Therefore a view has no data of its own—it relays
information from underlying tables.
Views
T1 T2 T3 T4
V1 V2 V3 V4
• Restricting access
• Presentation of data
• Isolate applications from data structure
• Save complex queries
• Simplify user commands
9-16
Usages of Views
• Restricting access: The view mechanism is one of the possible ways to hide
columns and rows from the tables it is based on.
• Presenting data: A view can be used to present data in a more understandable way
to end-users. For example, a view can present calculated data built from
elementary information that is stored in tables.
• Isolating application from data structures: Applications may be based on views
rather than tables, where there is a high risk that the structure might change. If a
view is used, the application would need no maintenance providing the view
remains untouched, even though the underlying tables were modified.
• Saving complex queries and simplifying commands: Views can be used to hide the
complexity of the data structure, allowing users to create queries over multiple
tables without having to know how to join the tables together.
• Simplifying user commands.
......................................................................................................................................................
9-23
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
Use of Views
• Advantages
– Dynamic views
– Present denormalized data from normalized
tables
– Simplify SQL statements
• Disadvantages
– May affect performances
– Restricted DML in some cases
9-17
Advantages
• You can use a view to present derived data to end users without having to store
them in the database. Typically, you would show completely denormalized, pre-
joined information in views that would allow end users to write simple SELECT
statements like SELECT * FROM ... WHERE ...
• Views can be made dynamic, for example, showing data that depend on which user
you are or what day it is.
For example, you could create a view that shows localized help messages.
According to the user name, the system can find the preferred language in a
PREFERENCES table and next return a message in this language. A single view
returns different values depending on the name of the user.
Another example type of view can be used to allow a user to access data between
8:00 am and 6:00 pm on weekdays only.
Disadvantages
• Views are always somewhat slower, which is due to the fact that the parse time is
slightly longer. Once a table and its columns are found, the query can be
immediately executed. Query criteria are linked with “and” to the criteria of the
view. This can affect the execution plan generated by the optimizer.
• Even if views behave almost like tables, there are still some restrictions when
using views for insert, update, and delete statements.
......................................................................................................................................................
9-24 Data Modeling and Relational Database Design
Old-Fashioned Design
......................................................................................................................................................
Old-Fashioned Design
Going through existing systems, you may find some old-fashioned design techniques.
These techniques were used at the time the RDBMS features were not so advanced.
See page 40
• Unique index
• Views with “Check option” clause
• Generic Arc implementation
9-18
Unique Index
Unique Indexes used to be created manually on the primary key columns because the
primary key constraint could not be declared up to Oracle7.
......................................................................................................................................................
9-25
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
X
A # Id
# Id * Name
* Name
Y
# Id
* Name
AS (A)
...
* Table_name (X or Y)
* Fk_id
9-19
The generic arc implementation is a fossil construction you may find in old systems.
In the implementation of the arc of entity A in the example, the three relationships in
the arc were merged into one generic foreign key column Fk_id. Added to table AS is
a NOT NULL column that keeps the information about which table the foreign key
value refers to. This used to be a popular technique because it could make use of a
NOT NULL constraint on Fk_id when the arc was mandatory.
This solution for implementing arcs should now be avoided for the following
limitations:
• Since Oracle7 the arc can now be implemented by simply declaring two foreign
keys and writing one check constraint.
• The joins may be very inefficient as, in many cases, you would need the time-
consuming union operator:
select A.Name, X.Name, ’X’ Type
from AS A, XS X
where ...
union
select A.Name, Y.Name, ’Y’
from AS A, YS Y
where ...
• Foreign key constraint for the foreign key column cannot be declared since it
cannot reference more than one primary key.
......................................................................................................................................................
9-26 Data Modeling and Relational Database Design
Distributed Design
......................................................................................................................................................
Distributed Design
This is characterized as many physical databases, located at different nodes, but
appearing to be a single “logical database”.
Distributed Database
9-20
Characteristics
• Multiple physical databases
• One logical database view
• Possibly dissimilar processors
• Kernel runs wherever a part of the database exists
The multiple physical databases are not necessarily copies of each other or part of each
other.
You can decide on how to spread the individual table content across the different
databases on the different partitioning principles. You can decide for a vertical or
horizontal technique, or a combination of both.
......................................................................................................................................................
9-27
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
• Resilience
• Reduced line traffic
• Location transparency
• Local autonomy
• Easier growth path
but
• Increased, distributed, complexity
9-21
• Improved flexibility and resilience. Access to data is not dependent on only one
machine or link. If there is any failure then some data is still accessible on the local
nodes. A failing link can automatically be rerouted via alternative links.
• Improved response time by having the data close to the usual users of the data.
This may reduce the line traffic dramatically. For example, in the model of
ElectronicMail, it is very likely that each country will have its own database. This
database will store in its own messages table the messages that belong to the
people registered in that country.
• Location transparency allows the physical data to be moved without the need to
change applications or notify users.
• Local autonomy allows each of the physical databases:
– To be managed independently.
– To have definitions and access rights created and controlled locally.
• An easier growth path is achieved:
– More processes can be added to the network
– More databases can be included on a node.
– Software update is independent of physical structure.
Disadvantage
A major disadvantage of distributed design is the often very complex configuration:
with the data the complexity is also distributed. System maintenance is complicated.
......................................................................................................................................................
9-28 Data Modeling and Relational Database Design
Oracle Database Structure
......................................................................................................................................................
DATABASE
Database Structure
consists
of part of
TABLESPACE consists
of
resides container part of
in SEGMENT
of
OTHER TABLE INDEX
SEGMENT SEGMENT SEGMENT
sliced in sliced in consists
of
located in part of part of
residence
of TABLE OR INDEX PARTITION part of
DATA FILE EXTENT USED FREE
consists resides in
of part of residence of
DATA BLOCK
Tablespaces
The diagram shows the structure of a Oracle database.
An Oracle database consists of one or more tablespaces. Each tablespace can
hold a number of segments, and each segment must be wholly contained in
its tablespaces. The SYSTEM tablespace is created as part of the database
creation, and should be reserved for the Oracle Data Dictionary and related
tables only. You should not create application data structures in this
tablespace. You are advised to create separate tablespaces for different types
of segments.
Segments
A segment is the space occupied by a database object. There are three types
of segments: a table segment, an index segment or an other segment, that is
used for clusters. Only the other segments must be part of one tablespace.
Partitions
Usually, a segment is assigned to a single tablespace. However, with Oracle8
it is possible to spread a table or index segment into more than one
tablespace. This technique is called partitioning. A partition is the part of a
table segment (or index segment) that resides in one tablespace.
......................................................................................................................................................
9-29
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
Extents
Each time more space is needed by a segment, a number of contiguous
blocks is allocated as an extent. There is no maximum limit on the number
of extents that can be allocated to a segment. It is usually preferable to avoid
an excessive number of small extents by ensuring that the segment has a
sufficiently large initial extent.
Data Files
Data files are the operating system files that physically contain the database data. Data
files consist of data blocks.
Data Blocks
A data block is the smallest amount of data Oracle reads in one read operation. A data
block always contains information from one extent only.
There is a distinction between the logical table, made up of rows with columns, and
the physical table, taking space that is made up of database blocks organized in extents
and located in data files.
......................................................................................................................................................
9-30 Data Modeling and Relational Database Design
Summary
......................................................................................................................................................
Summary
Summary
• Data Types
• Primary, Foreign, and Artificial Keys
• Indexes
• Partitioning
• Views
• Distributed design
9-23
• Oracle provides a large choice of data types for the columns of the tables.
• Primary keys are needed for tables. Artificial keys can be a good solution to
implement complex primary keys.
• Indexes improve performance of queries and provide a mechanism for
guaranteeing unique values.
• Partitioning tables can also be a solution to performance problems.
• Views are a flexible, secure, and convenient object for users.
• Distributed Design is a complex technique. It allows data to be located closer to
the user.
......................................................................................................................................................
9-31
®
Lesson 9: Database Design Considerations
..........................................................................................................................................................................
Scenario
Use the model that illustrates Moonlight pricing.
CURRENCY of COUNTRY
# Code # Code Moonlight Pricing
with
in in with with
PRODUCT GROUP in
# Name with SHOP
from to with # No
EXCHANGE in * Name
RATE * Address
PRODUCT
# Month of * City
* Rate for
GLOBAL LOCAL
PRICELIST # Code # Name
# Start Date o Size
* End Date with
PRODUCT NAME
* Name
9-25
..........................................................................................................................................................................
9-32 Data Modeling and Relational Database Design
Practice 9—1: Data Types
......................................................................................................................................................
Your Assignment
1 Here you see table names and column names and the suggested data type. Do a
quality check on these. If you think it is appropriate, suggest an alternative.
2 Suggest data types for the following columns. They are all based on previous
practices.
3 What data type would you use for a column that contains times only?
......................................................................................................................................................
9-33
®
Lesson 9: Database Design Considerations
..........................................................................................................................................................................
Scenario
You need to make decisions on possible artificial keys for some of the Moonlight
tables. The model is the same as the one used in the previous practice.
Your Assignment
1 Indicate for each table if you see benefits of creating an artificial key and why.
COUNTRIES
GLOBAL_PRICES
PRICE_LISTS
2 For which tables (if any) based on the Moonlight model does it not make any sense
at all to create artificial keys?
..........................................................................................................................................................................
9-34 Data Modeling and Relational Database Design
Practice 9—3: Product Pictures
..........................................................................................................................................................................
Scenario
This is your last task for Moonlight coffees. Tomorrow you are free to forget all about
Moonlight and only drink coffee!
The decision has been made to make the first steps into the e-commerce market. One
objective is to allow customers to consult Moonlight’s website. This site should
provide product information. For each product at least two additional attributes have
been identified.
The first is the attribute Picture for images of the products. The second is an attribute
HTML Document that holds the product description that can be displayed with a
browser. Other attributes may follow.
Your Assignment
1 Decide what data type you would advise to be used for each column.
2 You have heard that an old Oracle version would not accept more than one long
type column per table. You are not sure if this is still a limitation. Advise about the
implementation.
..........................................................................................................................................................................
9-35
®
Lesson 9: Database Design Considerations
......................................................................................................................................................
......................................................................................................................................................
9-36 Data Modeling and Relational Database Design
.................................
Normalization
Appendix B: Normalization
..........................................................................................................................................
Introduction
Lesson aim
This lesson describes the steps involved in order to normalize table data to the third
normal form for cases when there is no possibility of performing a full data analysis.
Overview
• Table Normalization
• Normal Forms of Tables
B-2
Objectives
.............................................................................................................................................
B-2
Normalization and its Benefits
..........................................................................................................................................
History of Normalization
Normalization is a technique established by the originator of the relational model, E.F.
Codd. The complete set of normalization techniques, include twelve rules that
databases need to follow in order to be described as truly normalized. It is a technique
that was created in support of relational theory, years before entity relationship
modeling was developed. The entity relationship modeling process has incorporated
many of the normalization techniques to produce a normalized entity relationship
diagram.
Two terms that have their origins in the normalization technique are still widely in use.
One is normalized data, the other is denormalization.
Objective of Normalization
The major objective of normalization is to remove redundant data from an existing set
of tables or table definitions, thereby increasing the integrity of the database design
and to maximize flexibility of data storage. Removing redundant data helps to
eliminate update anomalies. The first three normal forms progress in a systematic
manner to achieve this objective.
There are many other normal forms in addition to the first three, and they deal with
more subtle anomalies. In general, the IT industry considers normalization to the Third
form an acceptable level to remove redundancy. With a few exceptions, higher
normalization levels are not widely used.
The major subject of normalization is tables, not entities.
..........................................................................................................................................
®
B-3
Appendix B: Normalization
..........................................................................................................................................
Why Normalize?
B-3
.............................................................................................................................................
B-4
Normalization and its Benefits
..........................................................................................................................................
B-4
Unnormalized Data
Data that has not been “normalized” is considered to be “unnormalized” data or data in
zero-normal form. This data is not to be confused with data that is denormalized. If no
ER Model was created at the start of a database design project, you are likely to have
unnormalized data, not denormalized data. If you want to add redundancy, for faster
performance or other reasons, you follow the rules defined during the process of
denormalization. But, to denormalize data you must start with normalized data. You
cannot denormalize an unnormalized design, just as you cannot de-ice your car, if
there is no ice on it.
..........................................................................................................................................
®
B-5
Appendix B: Normalization
..........................................................................................................................................
Normalization
Normalization consists of a series of rules that must be applied to move from a
supposedly unnormalized set of data to a normalized structure. The process is
described in various steps which lead to a “higher” level of normalization. These
levels are called normal forms.
Normalization Rules
Normal Form Rule Description
.............................................................................................................................................
B-6
First Normal Form
..........................................................................................................................................
B-6
..........................................................................................................................................
®
B-7
Appendix B: Normalization
..........................................................................................................................................
7773 Walsh 54101 05/07 Meeting Today There is.. 9988 EMEA01
0022 Patel 54101 05/07 Meeting Today There is.. 9988 EMEA01
First create a second table to contain the repeating group columns. Then create a
primary key composed of the primary key from the unnormalized table and another
column that is unique. Finally create a foreign key to link back to the first table.
B-8
.............................................................................................................................................
B-8
Second Normal Form
..........................................................................................................................................
B-9
..........................................................................................................................................
®
B-9
Appendix B: Normalization
..........................................................................................................................................
USERS
B-10
.............................................................................................................................................
B-10
Third Normal Form
..........................................................................................................................................
B-12
..........................................................................................................................................
®
B-11
Appendix B: Normalization
..........................................................................................................................................
B-13
USERS
SRVR
ID NAME _ID
USERS ---- ----- ----
USER USER SRVR SERVER 2301 Smith 3786
_ID _NAME _ID _NAME 5607 Jones 6001
---- ----- ---- ------ 7773 Walsh 9988
2301 Smith 3786 IMAP05 0022 Patel 9988
5607 Jones 6001 IMAP08
7773 Walsh 9988 EMEA01
0022 Patel 9988 EMEA01 MAIL_ ID NAME
SERVER ---- ------
3786 IMAP05
6001 IMAP08
9988 EMEA01
B-14
The theory of normalization goes further than the third normal form to cater for
several problematic constructions that may remain. Those normal forms are outside
the scope of this lesson.
.............................................................................................................................................
B-12
Summary
..........................................................................................................................................
Summary
Summary
B-15
..........................................................................................................................................
®
B-13
Appendix B: Normalization
..........................................................................................................................................
.............................................................................................................................................
B-14
Index
.....................................................................................................................................................
.....................................................................................................................................................
®
Index-1
Index
.....................................................................................................................................................
.....................................................................................................................................................
®
Index-2
Index
.....................................................................................................................................................
F
I
fan trap
pattern 6-15 identification 4-4
first normal form B-7 in database 4-5
foreign key indirect 4-8
cascade delete 9-15 problems 4-4
cascade update 9-15 real world 4-5
columns 7-13 identifiers
constraints 7-13 information-bearing 4-11
default and nullify 9-15 incorrect arcs 4-15
incorrect UIDs 4-10
.....................................................................................................................................................
®
Index-3
Index
.....................................................................................................................................................
.....................................................................................................................................................
®
Index-4
Index
.....................................................................................................................................................
.....................................................................................................................................................
®
Index-5
Index
.....................................................................................................................................................
.....................................................................................................................................................
®
Index-6
Index
.....................................................................................................................................................
U
UID
cascade composed 4-7
composed 4-7
multiple attribute 4-7
primary 4-9
relationships 4-8
secondary 4-9
single attribute 4-7
unique identifier 1-27, 4-6
primary 3-18
unique index 9-8
unique key 7-18
unique keys 7-5, 9-8
unnormalized data B-5
update
cascade 9-15
restrict 9-14
V
values 1-13
derivable
storing 8-6
hard-coded 8-10
VARCHAR2 9-6
views
usage 9-23
volatile attributes 1-14
W
words
reserved 2-15
.....................................................................................................................................................
®
Index-7
Index
.....................................................................................................................................................
.....................................................................................................................................................
Index-8