Chapter 2 Database
This chapter introduces data modeling using the Entity-Relationship (ER) approach. The basic
techniques described are applicable to the development of microcomputer-based relational database
applications as well as those that use relational database servers such as MS SQL Server or Oracle.
The aims of this chapter are:
To define the terms entity type, entity, attribute, attribute value, primary key, relationship,
relationship type
To describe the terms unary, binary, ternary, degree, cardinality and optionality with regard to
relationship types
To show how many-many relationship types can be split into one-many relationship types
3. logical design
4. physical design
5. implementation
The data model is one part of the conceptual design process. The other is typically the functional
model. The data model focuses on what data should be stored in the database while the functional
model deals with how the data is processed. To put this in the context of the relational database, the
data model is used to design the relational tables. The functional model is used to design the queries
which will access and perform operations on those tables.
Components of A Data Model
The data model gets its inputs from the planning and analysis stage. Here the modeler, along with
analysts, collects information about the requirements of the database by reviewing existing
documentation and interviewing end-users.
The data model has two outputs. The first is an entity-relationship diagram, which represents the data
structures in a pictorial form. Because the diagram is easily learned, it is a valuable tool to communicate
the model to the end-user. The second component is a data document. This is a document that describes
in detail the data objects, relationships, and rules required by the database. This document provides
the detail required by the database developer to construct the physical database.
Fig. 2.1 An entity type CUSTOMER and one of its attributes Cus_no
Attributes
The data that we want to keep about each entity within an entity type is contained in attributes. An
attribute is some quality about the entities that we are interested in and want to hold on the database.
In fact we store the value of the attributes on the database. Each entity within the entity type will
have the same set of attributes, but in general different attribute values. For example, the value of the
attribute ADDRESS for a customer J. Smith in a CUSTOMER entity type might be '10 Downing St.,
London' whereas the value of the attribute ADDRESS for another customer J. Major might be '22
Railway Cuttings, Cheam'.
There will be the same number of attributes for each entity within an entity type. That is one of the
characteristics of entity-relationship modelling and relational databases. We store the same type of
facts (attributes) about every entity within the entity type. If you knew that one of your customers
happened to be your cousin, there would be no attribute to store that fact in, unless you wanted to
have a 'cousin-yes-no' attribute, in which case nearly every customer would be a 'no', which would be
considered a waste of space.
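As a rough illustration of the point that every entity in an entity type has the same attributes but its own attribute values, the sketch below models CUSTOMER as a Python dataclass. The attribute `name` and the customer numbers 1 and 2 are assumptions made for the example; the addresses are the ones used in the text.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """Entity type CUSTOMER: each entity has the same set of attributes."""
    cus_no: int    # customer number (values below are made up)
    name: str      # assumed attribute, not shown in Fig. 2.1
    address: str


# Two entities of the same entity type: same attributes, different values.
smith = Customer(cus_no=1, name="J. Smith", address="10 Downing St., London")
major = Customer(cus_no=2, name="J. Major", address="22 Railway Cuttings, Cheam")

# The attribute names belong to the entity type, not to any one entity.
attribute_names = [f.name for f in fields(Customer)]
print(attribute_names)
print(smith.address, "|", major.address)
```

There is deliberately no place to record a one-off fact such as 'is my cousin': adding it would mean adding an attribute to every CUSTOMER entity.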
Primary Key
Attributes can be shown on the entity-relationship diagram in an oval. In Fig. 2.1, one of the attributes
of the entity type CUSTOMER is shown. It is up to you which attributes you show on the diagram. In
many cases an entity type may have ten or more attributes. There is often not room on the diagram
to show all of the attributes, but you might choose to show an attribute that is used to identify each
entity from all the others in the entity type. This attribute is known as the primary key. In some cases
you might need more than one attribute in the primary key to identify the entities.
In Fig. 2.1, the attribute CUS_NO is shown. Assuming the organization storing the data ensures that
each customer is allocated a different cus_no, that attribute could act as the primary key, since it
identifies each customer; it distinguishes each customer from all the rest. No two customers have the
same value for the attribute cus_no. Some people would say that an attribute is a candidate for being
a primary key because it is 'unique'. They mean that no two entities within that entity type can have
the same value of that attribute. In practice it is best not to use that word because it has other
connotations.
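The requirement that no two customers share a cus_no can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table layout, the column `name`, and the sample customer numbers are assumptions, not part of the original model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " cus_no INTEGER PRIMARY KEY,"   # the identifying attribute
    " name TEXT NOT NULL,"
    " address TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'J. Smith', '10 Downing St., London')")
conn.execute("INSERT INTO customer VALUES (2, 'J. Major', '22 Railway Cuttings, Cheam')")


def try_insert(cus_no, name):
    """Return True if the row was stored, False if the key was already in use."""
    try:
        conn.execute(
            "INSERT INTO customer (cus_no, name) VALUES (?, ?)", (cus_no, name)
        )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert(1, "A. Other")   # cus_no 1 is already taken
new_accepted = try_insert(3, "A. Other")         # cus_no 3 is free
print(duplicate_accepted, new_accepted)
```

The database refuses the second customer with cus_no 1, which is exactly the property that lets cus_no distinguish each customer from all the rest.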
As already mentioned, you may need to have a group of attributes to form a primary key, rather than
just one attribute, although the latter is more common. For example, if the organization using the
CUSTOMER entity type did not allocate a customer number to its customers, then it might be
necessary to use a composite key, for example one consisting of the attributes SURNAME and
INITIALS together, to distinguish between customers with common surnames such as Smith. Even this
may not be sufficient in some cases.
Apart from serving as an identifier for each entity within an entity type, the primary key also serves as
the method of representing relationships between entities. The primary key becomes a foreign key in
all those entity types to which it is related in a one-one or one-many relationship type.
Relationship Types
The first two major elements of entity-relationship diagrams are entity types and attributes. The final
element is the relationship type. Sometimes, the word 'type' is dropped and relationship types are
called simply 'relationships', but since there is a difference between the terms, one should really use
the term relationship type.
Real-world entities have relationships between them, and relationships between entities on the entity-relationship diagram are shown where appropriate. An entity-relationship diagram consists of a
network of entity types and connecting relationship types. A relationship type is a named association
between entities. Individual entities have individual relationships of the type between them. An
individual person (entity) occupies (relationship) an individual house (entity). In an entity-relationship
diagram, this is generalized into entity types and relationship types. The entity type PERSON is related
to the entity type HOUSE by the relationship type OCCUPIES. There are lots of individual persons, lots
of individual houses, and lots of individual relationships linking them.
Fig. 2.2 shows a single relationship type 'Received' and its inverse relationship type 'Was_sent_to'
between the two entity types CUSTOMER and INVOICE. It is very important to name all relationship
types. The reader of the diagram must know what the relationship type means, and it is up to you, the
designer, to make the meaning clear from the relationship type name. The direction of both the
relationship type and its inverse should be shown to aid clarity and immediate readability of the
diagram. The tense of the relationship type should also be clear from its name.
Fig. 2.5 There can be one, two, three or more entity types involved in a relationship.
Notice that the schema changes the semantics of the original relation to
employees may be given assignments to projects
and projects must be done by more than one employee assignment.
Transform Complex Relationships into Binary Relationships
Complex relationships are classified as ternary, an association among three entities, or n-ary, an
association among more than three, where n is the number of entities involved. For example, Figure
2.12 shows the relationship
Employees can use different skills on any one or more projects.
Each project uses many employees with various skills.
Complex relationships cannot be directly implemented in the relational model so they should be
resolved early in the modeling process. The strategy for resolving complex relationships is similar to
resolving many-to-many relationships.
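One possible way to resolve the ternary relationship among employees, skills and projects is sketched below using Python's sqlite3 module: a new entity sits between the three original entities and has one binary relationship to each. The table name `skill_use`, all column names, and the sample rows are hypothetical and chosen only for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (emp_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE skill    (skill_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project  (proj_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- New entity replacing the ternary relationship: three binary
-- relationships, one to each of the original entities.
CREATE TABLE skill_use (
    emp_id   INTEGER NOT NULL REFERENCES employee (emp_id),
    skill_id INTEGER NOT NULL REFERENCES skill (skill_id),
    proj_id  INTEGER NOT NULL REFERENCES project (proj_id),
    PRIMARY KEY (emp_id, skill_id, proj_id)
);

INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO skill    VALUES (1, 'Design'), (2, 'Testing');
INSERT INTO project  VALUES (1, 'Payroll'), (2, 'Billing');
INSERT INTO skill_use VALUES (1, 1, 1), (1, 2, 1), (2, 1, 1), (1, 1, 2);
""")

# Which employee uses which skill on project 1?
skills_on_project_1 = conn.execute(
    "SELECT e.name, s.name"
    " FROM skill_use u"
    " JOIN employee e ON e.emp_id = u.emp_id"
    " JOIN skill s ON s.skill_id = u.skill_id"
    " WHERE u.proj_id = 1"
    " ORDER BY e.name, s.name"
).fetchall()
print(skills_on_project_1)
```

Each row of `skill_use` records one fact of the form 'this employee used this skill on this project', so an employee can use different skills on several projects and a project can draw on many employees.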
identify and define the primary key attributes for each entity
validate primary keys and relationships
3. the values must not change or become null during the life of each entity instance
In some instances, an entity will have more than one attribute that can serve as a primary key. Any
attribute or minimal set of attributes that could be a primary key is called a candidate key. Once candidate
keys are identified, choose one, and only one, primary key for each entity. Choose the identifier most
commonly used by the user as long as it conforms to the properties listed above. Candidate keys
which are not chosen as the primary key are known as alternate keys.
An example of an entity that could have several possible primary keys is Employee. Let's assume that
for each employee in an organization there are three candidate keys: Employee ID, Social Security
Number, and Name.
Name is the least desirable candidate. While it might work for a small department where it would be
unlikely that two people would have exactly the same name, it would not work for a large organization
that had hundreds or thousands of employees. Moreover, there is the possibility that an employee's
name could change because of marriage. Employee ID would be a good candidate as long as each
employee is assigned a unique identifier at the time of hire. Social Security Number would work best since
every employee is required to have one before being hired.
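The reasoning about candidate keys can be tried out on sample data. The sketch below, which is an illustration rather than a formal method, searches a handful of made-up Employee rows for minimal sets of attributes whose values never repeat. The helper names and all the data (including the obviously fictitious Social Security Numbers) are assumptions. Note that a sample can only rule a key out; whether an attribute is truly a key is a rule of the business, not a property of a few rows.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same combination of values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute sets that are unique within the sample rows."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A superset of a key already found is unique but not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_unique(rows, attrs):
                found.append(attrs)
    return found


employees = [
    {"emp_id": 1, "ssn": "111-11-1111", "name": "Pat Lee"},
    {"emp_id": 2, "ssn": "222-22-2222", "name": "Pat Lee"},  # same name as employee 1
    {"emp_id": 3, "ssn": "333-33-3333", "name": "Sam Roy"},
]

keys = candidate_keys(employees, ["emp_id", "ssn", "name"])
print(keys)
```

In this sample the name is ruled out because two employees share it, while the employee ID and the Social Security Number both remain as candidates; one of them would be chosen as the primary key and the other would become an alternate key.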
Composite Keys
Sometimes more than one attribute is required to uniquely identify an entity. A primary key that is made
up of more than one attribute is known as a composite key. Figure 2.15 shows an example of a
composite key. Each instance of the entity Work can be uniquely identified only by a composite key
composed of Employee ID and Project ID.
Figure 2.15: Example of Composite Key
WORK
Employee ID   Project ID   Hours_Worked
01            01           200
01            02           120
02            01           50
02            03           120
03            03           100
03            04           200
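The WORK entity of Figure 2.15 can be sketched as a table whose primary key is the pair (Employee ID, Project ID). The example uses Python's sqlite3 module; the column names and data types are assumptions, while the rows are those of the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work ("
    " employee_id TEXT NOT NULL,"
    " project_id TEXT NOT NULL,"
    " hours_worked INTEGER NOT NULL,"
    " PRIMARY KEY (employee_id, project_id))"   # composite key
)
rows = [
    ("01", "01", 200), ("01", "02", 120),
    ("02", "01", 50),  ("02", "03", 120),
    ("03", "03", 100), ("03", "04", 200),
]
conn.executemany("INSERT INTO work VALUES (?, ?, ?)", rows)

# Neither column is unique on its own, but the pair must be.
try:
    conn.execute("INSERT INTO work VALUES ('01', '01', 10)")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

# Both parts of the key are needed to pick out a single instance.
hours = conn.execute(
    "SELECT hours_worked FROM work WHERE employee_id = '02' AND project_id = '03'"
).fetchone()[0]
print(duplicate_pair_rejected, hours)
```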
Artificial Keys
An artificial key is one that has no meaning to the business or organization. Artificial keys are
permitted when 1) no attribute has all the primary key properties, or 2) the primary key is large and
complex.
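A minimal sketch of an artificial key, using Python's sqlite3 module: when SURNAME and INITIALS cannot be relied on to identify a customer, the database can assign a meaningless integer instead. The column name `customer_key` and the sample data are hypothetical; the automatic numbering shown is SQLite's behaviour for an INTEGER PRIMARY KEY column and other database systems provide their own mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_key INTEGER PRIMARY KEY,"   # artificial key, no business meaning
    " surname TEXT NOT NULL,"
    " initials TEXT NOT NULL)"
)

# Two different customers who happen to share surname and initials.
first_key = conn.execute(
    "INSERT INTO customer (surname, initials) VALUES ('Smith', 'J')"
).lastrowid
second_key = conn.execute(
    "INSERT INTO customer (surname, initials) VALUES ('Smith', 'J')"
).lastrowid

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(first_key, second_key, customer_count)
```

Both customers are stored, and only the artificial key tells them apart.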
Primary Key Migration
Dependent entities, entities that depend on the existence of another entity for their identification,
inherit the entire primary key from the parent entity. Every entity within a generalization hierarchy
inherits the primary key of the root generic entity.
Define Key Attributes
Once the keys have been identified for the model, it is time to name and define the attributes that
have been used as keys.
There is no standard method for representing primary keys in ER diagrams. For this document, the
name of the primary key followed by the notation (PK) is written inside the entity box. An example is
shown below.
Figure 2.16: Entities with Key Attributes
Every entity in the data model shall have a primary key whose values uniquely identify entity
instances.
The primary key attribute cannot be optional (i.e., have null values).
The primary key cannot have repeating values. That is, the attribute may not have more than
one value at a time for a given entity instance. This is known as the No Repeat
Rule.
Entities with compound primary keys cannot be split into multiple entities with simpler primary
keys. This is called the Smallest Key Rule.
Two entities may not have identical primary keys with the exception of entities within
generalization hierarchies.
The entire primary key must migrate from parent entities to child entities and from supertype,
generic entities, to subtypes, category entities.
Foreign Keys
A foreign key is an attribute that completes a relationship by identifying the parent entity. Foreign
keys provide a method for maintaining integrity in the data (called referential integrity) and for
navigating between different instances of an entity. Every relationship in the model must be supported
by a foreign key.
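As an illustration of referential integrity, the sketch below uses Python's sqlite3 module with the CUSTOMER and INVOICE entity types of Fig. 2.2. The primary key of the parent (cus_no) appears in the child as a foreign key. The column `inv_no` and the sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customer (cus_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoice ("
    " inv_no INTEGER PRIMARY KEY,"
    " cus_no INTEGER NOT NULL REFERENCES customer (cus_no))"  # foreign key to the parent
)
conn.execute("INSERT INTO customer VALUES (1, 'J. Smith')")
conn.execute("INSERT INTO invoice VALUES (100, 1)")
conn.execute("INSERT INTO invoice VALUES (101, 1)")

# An invoice for a customer that does not exist violates referential integrity.
try:
    conn.execute("INSERT INTO invoice VALUES (102, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key is also how we navigate from a customer to its invoices.
invoices_for_smith = [
    row[0]
    for row in conn.execute(
        "SELECT inv_no FROM invoice WHERE cus_no = 1 ORDER BY inv_no"
    )
]
print(orphan_rejected, invoices_for_smith)
```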
Identifying Foreign Keys
Every dependent and category (subtype) entity in the model must have a foreign key for each
relationship in which it participates. Foreign keys are formed in dependent and subtype entities by
migrating the entire primary key from the parent or generic entity. If the primary key is composite, it
may not be split.
Foreign Key Ownership
Foreign key attributes are not considered to be owned by the entities to which they migrate, because
they are reflections of attributes in the parent entities. Thus, each attribute in an entity is either
owned by that entity or belongs to a foreign key in that entity.
If the primary key of a child entity contains all the attributes in a foreign key, the child entity is said to
be "identifier dependent" on the parent entity, and the relationship is called an "identifying
relationship." If any attributes in a foreign key do not belong to the child's primary key, the child is not
identifier dependent on the parent, and the relationship is called "non-identifying."
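The distinction between the two kinds of relationship reduces to a simple set test, sketched below in Python. The function name and the example entities (TASK with a key of Proj_ID plus Task_ID, INVOICE with a key of inv_no) are assumptions chosen for illustration.

```python
def relationship_kind(child_primary_key, foreign_key):
    """Classify a relationship from the child's point of view.

    Identifying: every foreign key attribute is part of the child's primary key.
    Non-identifying: at least one foreign key attribute lies outside it.
    """
    if set(foreign_key) <= set(child_primary_key):
        return "identifying"
    return "non-identifying"


# TASK is identified partly by its parent PROJECT.
task_kind = relationship_kind(
    child_primary_key=("proj_id", "task_id"), foreign_key=("proj_id",)
)

# INVOICE has its own key; cus_no only records which customer received it.
invoice_kind = relationship_kind(
    child_primary_key=("inv_no",), foreign_key=("cus_no",)
)
print(task_kind, invoice_kind)
```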
Non-key Attributes
Non-key attributes describe the entities to which they belong. In this section, we discuss the rules for
assigning non-key attributes to entities and how to handle multivalued attributes.
Relate attributes to entities
Non-key attributes can be in only one entity. Unlike key attributes, non-key attributes never migrate
from parent to child entities.
The process of relating attributes to the entities begins by the modeler, with the assistance of the end-users, placing attributes with the entities that they appear to describe. You should record your
decisions in the entity attribute matrix discussed in the previous section. Once this is completed, the
assignments are validated by the formal method of normalization.
Before beginning formal normalization, the rule is to place non-key attributes in entities where the
value of the primary key determines the values of the attributes. In general, entities with the same
primary key should be combined into one entity. Some other guidelines for relating attributes to
entities are given below.
Parent-Child Relationships
With parent-child relationships, place attributes in the parent entity where it makes sense to
do so (as long as the attribute is dependent upon the primary key)
If a parent entity has no non-key attributes, combine the parent and child entities.
Multivalued Attributes
If an attribute is dependent upon the primary key but is multivalued (has more than one value for a
particular value of the key), reclassify the attribute as a new child entity. If the multivalued attribute is
unique within the new entity, it becomes the primary key. If not, migrate the primary key from the
original, now parent, entity.
For example, assume an entity called PROJECT with the attributes Proj_ID (the key), Proj_Name,
Task_ID, and Task_Name.
Figure 2.17
PROJECT
Proj_ID   Task_ID   Task_Name
01        01        Analysis
01        02        Design
01        03        Programming
01        04        Tuning
02        01        Analysis
Task_ID and Task_Name have multiple values for the key attribute. The solution is to create a new
entity, let's call it TASK and make it a child of PROJECT. Move Task_ID and Task_Name from PROJECT
to TASK. Since neither attribute uniquely identifies a task, the final step would be to migrate Proj_ID
to TASK.
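The result of this split can be sketched with Python's sqlite3 module. TASK becomes a child of PROJECT, and Proj_ID migrates into its primary key so that task 01 of project 01 can be told apart from task 01 of project 02. The task rows are those of Figure 2.17; the project names 'Project A' and 'Project B' are placeholders, since the figure does not give them, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE project (
    proj_id   TEXT NOT NULL PRIMARY KEY,
    proj_name TEXT NOT NULL
);

-- TASK is the new child entity; Proj_ID migrates into its primary key.
CREATE TABLE task (
    proj_id   TEXT NOT NULL REFERENCES project (proj_id),
    task_id   TEXT NOT NULL,
    task_name TEXT NOT NULL,
    PRIMARY KEY (proj_id, task_id)
);

INSERT INTO project VALUES ('01', 'Project A'), ('02', 'Project B');
INSERT INTO task VALUES
    ('01', '01', 'Analysis'),
    ('01', '02', 'Design'),
    ('01', '03', 'Programming'),
    ('01', '04', 'Tuning'),
    ('02', '01', 'Analysis');
""")

# Each project is now stored once, however many tasks it has.
project_rows = conn.execute("SELECT COUNT(*) FROM project").fetchone()[0]
tasks_for_project_01 = [
    row[0]
    for row in conn.execute(
        "SELECT task_name FROM task WHERE proj_id = '01' ORDER BY task_id"
    )
]
print(project_rows, tasks_for_project_01)
```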
Attributes That Describe Relations
In some cases, it appears that an attribute describes a relationship rather than an entity (in the Chen
notation of ER diagrams this is permissible). For example,
a MEMBER borrows BOOKS.
Possible attributes are the date the books were checked out and when they are due. Typically, such a
situation will occur with a many-to-many relationship and the solution is the same. Reclassify the
relationship as a new entity which is a child to both original entities. In some methodologies, the
newly created entity is called an associative entity. See Figure 2.18 for an example of converting a
relationship into an associative entity.
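A possible shape for such an associative entity is sketched below with Python's sqlite3 module. The many-to-many relationship 'MEMBER borrows BOOK' becomes an entity of its own, here called `loan`, which is a child of both MEMBER and BOOK and holds the check-out and due dates. The name `loan`, the column names, the choice of primary key, and all sample data are assumptions for the illustration and are not taken from Figure 2.18.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE member (member_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE book   (book_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Associative entity: a child of both MEMBER and BOOK that owns the
-- attributes describing the relationship itself.
CREATE TABLE loan (
    member_id INTEGER NOT NULL REFERENCES member (member_id),
    book_id   INTEGER NOT NULL REFERENCES book (book_id),
    date_out  TEXT NOT NULL,
    date_due  TEXT NOT NULL,
    PRIMARY KEY (member_id, book_id, date_out)
);

INSERT INTO member VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO book   VALUES (10, 'Book X'), (11, 'Book Y');
INSERT INTO loan VALUES
    (1, 10, '2024-01-05', '2024-01-26'),
    (1, 11, '2024-01-05', '2024-01-26'),
    (2, 10, '2024-02-01', '2024-02-22');
""")

# One member, many books ...
books_borrowed_by_ann = [
    row[0]
    for row in conn.execute(
        "SELECT b.title FROM loan l JOIN book b ON b.book_id = l.book_id"
        " WHERE l.member_id = 1 ORDER BY b.title"
    )
]
# ... and one book, many members: two one-to-many relationships.
members_who_borrowed_book_x = [
    row[0]
    for row in conn.execute(
        "SELECT m.name FROM loan l JOIN member m ON m.member_id = l.member_id"
        " WHERE l.book_id = 10 ORDER BY m.name"
    )
]
print(books_borrowed_by_ann, members_who_borrowed_book_x)
```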
EXERCISES/ASSIGNMENTS:
Exercise 1 - CARS
Identify all entity types, attributes, relationship types and their degrees in the following case. Hence
draw an entity-relationship diagram.
An organization makes many models of cars, where a model is characterized by a name and a suffix
(such as GL or XL which indicates the degree of luxury) and an engine size.
Each model is made up from many parts and each part may be used in the manufacture of more than
one model. Each part has a description and an id code. Each model of car is produced at just one of
the firm's factories, which are located in London, Birmingham, Bristol, Wolverhampton and Manchester
- one in each city. A factory produces many models of car and many types of part although each type
of part is produced at one factory only.
Exercise 3 - MORTGAGES
Draw an entity-relationship diagram for the following. Produce also a list of questions you would have
to have answered in order to complete the model.
In a case study of this kind, and in particular in exam questions, there is not usually the space to
completely specify a problem. Remember also that not all the information given in a case study of this
type is necessarily relevant. Some information, while relevant to the organization concerned, might
not be relevant as far as database design is concerned.
Members of a friendly society invest money in any one of the society's branches. A member may hold
a number of investment accounts. Each investment account is associated with the branch where it was
opened, but money may be paid in or withdrawn at any branch. For each account, the member holds
an account book to record all transactions. A member may also have one mortgage account. All
mortgage accounts are associated with the Head Office. Payments may be transferred from any
investment account into the mortgage account.
References
Batini, C., S. Ceri, S. Kant, and B. Navathe. Conceptual Database Design: An Entity Relational
Approach. The Benjamin/Cummings Publishing Company, 1991.