Chapter-2 Modelling Data-Edid 451 Sem
1 MODELING DATA
Objectives
• Define terms
• Understand importance of data modeling
• Write good names and definitions for entities, relationships, and attributes
• Distinguish unary, binary, and ternary relationships
Entity relationship (ER) diagrams include rectangles representing entities, and lines between the entities
representing relationships. Relationships have cardinalities, which can be one-to-one, one-to-many, or many-
to-many. In addition, on each side of the relationship you can specify that it is mandatory or optional.
Basic E-R notation
In addition to cardinalities, relationships also have degrees. A unary degree represents a relationship between
entities of the same entity type. A binary degree represents a relationship between entities of two different
entity types. A ternary degree represents a relationship between entities of three different entity types. In
principle you can have relationships between any number of entity types, so the term for this degree is “n-ary”.
Business Rules
- Are statements that define or constrain some aspect of the business
- Are derived from policies, procedures, events, functions
- Assert business structure
- Control/influence business behavior
- Are expressed in terms familiar to end users
- Are automated through DBMS software
A business rule is a statement that defines or constrains some aspect of the business. It is intended to assert
business structure or to control or influence the behavior of the business.
Not all business rules are implemented in a database, and it is the responsibility of the database analyst to
determine which business rules can be expressed through ER models and which cannot.
Business rules appear (possibly implicitly) in descriptions of business functions, events, policies, units,
stakeholders, and other objects. These descriptions can be found in interview notes from individual and group
information systems requirements collection sessions, organizational documents (e.g., personnel manuals,
policies, contracts, marketing brochures, and technical instructions), and other sources. Rules are identified by
asking questions about the who, what, when, where, why, and how of the organization.
Gathering business rules requires good interviewing and listening skills. Initial statements of rules are often
vague or imprecise, so as a database analyst you will need to be persistent in clarifying them. This is an
iterative inquiry process.
Data objects must be named and defined before they can be used unambiguously in a model of organizational
data. Data names refer to the names of entities, their attributes, and their relationships, which are the data
objects. These names should be meaningful to the business interests and operations. In a sense, data names
should be “self-documenting”, which means they should “obviously” capture the essence of the data object.
Data Definitions:
Explanation of a term or fact
Term–word or phrase with specific meaning
Fact–association between two or more terms
Guidelines for good data definition
A concise description of essential data meaning
Gathered in conjunction with systems requirements
Accompanied by diagrams
Achieved by consensus, and iteratively refined
It is difficult to obtain a universally agreed-upon data definition. You may therefore want to use multiple
definitions to cover the various situations, or alternatively use a very general definition that will cover most
situations. With data definitions, as with most organizational knowledge, the person who controls the meaning of
data controls the data. Thus, the definition of data is a source of organizational power.
Entities:
Entity – a person, a place, an object, an event, or a concept in the user environment about which the
organization wishes to maintain data
Entity type – a collection of entities that share common properties or characteristics
Entity instance – A single occurrence of an entity type
It is important to distinguish an entity instance from an entity type. For example, an entity may be John Doe, a
particular person. But the entity type is “Person” as a concept. When you develop ER diagrams, the boxes
represent entity types, not entity instances. Although we use the word “entity” when describing ER diagrams,
what we are really talking about is “entity types”.
Entity Type and Entity Instances:
Here we see the distinction between an entity type and an entity instance. The entity type is represented in the first
two columns of this figure. It includes the names of the various attributes (remember what we talked about
regarding data names), as well as the types of data. By contrast, the third and fourth columns represent two entity
instances. These would be actual records (or rows) in the final database table that implements this entity type.
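The same distinction can be sketched in Python. This is only an illustration: the EMPLOYEE entity type, its
three attributes, and the sample values are hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:
    """Entity type: the attribute names and their data types (hypothetical)."""
    employee_number: str
    name: str
    city: str


# Entity instances: two specific occurrences of the entity type,
# comparable to two rows in the table that implements it.
first = Employee("E-001", "Pat Example", "Springfield")
second = Employee("E-002", "Lee Sample", "Riverton")

# The type describes structure; the instances carry values.
attribute_names = [f.name for f in fields(Employee)]
print(attribute_names)
print(first)
print(second)
```

The class plays the role of the box in the ER diagram, while each object plays the role of one record.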
An Entity…
SHOULD BE:
An object that will have many instances in the database
An object that will be composed of multiple attributes
An object that we are trying to model
SHOULD NOT BE:
A user of the database system
An output of the database system (e.g., a report)
A common mistake people make when they are learning to draw E-R diagrams, especially if they are already
familiar with data process modeling (such as data flow diagramming), is to confuse data entities with other
elements of an overall information systems model. A simple rule to avoid such confusion is that a true data
entity will have many possible instances, each with a distinguishing characteristic, as well as one or more other
descriptive pieces of data.
Figure 2-4 Example of inappropriate entities
This figure illustrates a mistake many novices make. The treasurer is a user of the system, and the expense
report is an output of the system. Neither of these is an entity that should be represented in the database or the
ER model. The ER model should represent the objects that are of interest to the user and that will be displayed in
the system output.
Attributes:
Attribute–property or characteristic of an entity or relationship type
Classifications of attributes:
Required versus Optional Attributes
Simple versus Composite Attribute
Single-Valued versus Multivalued Attribute
Stored versus Derived Attributes
Identifier Attributes
In naming attributes, we use an initial capital letter followed by lowercase letters. If an attribute name consists
of more than one word, we use a space between the words and start each word with a capital letter, for example,
Employee Name or Student Home Address. In E-R diagrams, we represent an attribute by placing its name in the
entity it describes.
Required – must have a value for every entity (or relationship) instance with which it is associated
Optional – may not have a value for every entity (or relationship) instance with which it is associated
This figure illustrates the various properties of an entity’s attributes. Required attributes must have a value,
whereas optional attributes could be null. Note that the identifier is ALWAYS required.
In this case, the student’s major is optional because a student may not yet have declared a major. All the other
attributes, however, are required.
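One way a required/optional distinction might be carried into a database is with NOT NULL constraints. The
sketch below uses Python's built-in sqlite3 module; the table and column names are assumptions made for
illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id   TEXT PRIMARY KEY NOT NULL,  -- identifier: always required
        student_name TEXT NOT NULL,              -- required attribute
        major        TEXT                        -- optional attribute: may be NULL
    )
    """
)

# A student who has not yet declared a major is accepted.
conn.execute("INSERT INTO student VALUES ('S1', 'Pat Example', NULL)")

# A student without a name violates the required-attribute rule.
try:
    conn.execute("INSERT INTO student VALUES ('S2', NULL, 'History')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, row_count)
```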
Simple vs. Composite Attributes:
Composite attribute – An attribute that has meaningful component parts (attributes)
Sometimes many attributes are related to each other, such as the elements of an address. In this case they can be
grouped into a composite attribute. For simplicity, we can refer to the “employee address”, but if we want more
detail we can break it into street, city, state, and postal code. So, this way we have the option to describe the
attribute at a macro or at a micro level. Note the use of parentheses for encompassing the components of a
composite attribute.
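As a rough sketch, a composite attribute can be pictured as a nested structure. The class names and sample
values below are hypothetical; only the address components (street, city, state, postal code) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Component parts of the composite attribute."""
    street: str
    city: str
    state: str
    postal_code: str


@dataclass
class Employee:
    employee_id: str
    address: Address  # composite attribute: Employee Address (Street, City, State, Postal Code)


emp = Employee("E-001", Address("1 Main St", "Springfield", "IL", "62701"))

macro = emp.address        # refer to the whole address
micro = emp.address.city   # or to one component
print(macro)
print(micro)
```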
A multivalued attribute is not the same as a composite attribute, although novices may confuse these terms. A
composite attribute is one that has many parts, such as an address composed of street, city, state, and zip. By
contrast, a multivalued attribute is one that can have many different values for a single entity instance, such as
the several skills one employee may have.
Note that a derived attribute is not one that is physically stored in the database, but rather one that is calculated
based on the value of another. The length of time employed, or a person’s age, are classic examples, as they are
calculated based on a fixed starting point (date hired or birthdate).
These are distinct concepts, so a single attribute could be both composite and multivalued, and even derived as well.
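A small Python sketch may help separate the ideas. It assumes a hypothetical employee with a stored hire date,
a multivalued list of skills, and a derived years-employed value that is computed on request rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Employee:
    employee_id: str
    date_employed: date                              # stored attribute
    skills: List[str] = field(default_factory=list)  # multivalued attribute

    def years_employed(self, today: date) -> int:
        """Derived attribute: calculated from date_employed, never stored."""
        years = today.year - self.date_employed.year
        # Subtract one if this year's anniversary has not been reached yet.
        if (today.month, today.day) < (self.date_employed.month, self.date_employed.day):
            years -= 1
        return years


emp = Employee("E-001", date(2015, 6, 1), ["Welding", "Forklift"])
print(emp.skills)
print(emp.years_employed(date(2020, 5, 31)))
print(emp.years_employed(date(2020, 6, 1)))
```

Because the derived value depends on the current date, storing it would make it stale; computing it from the
fixed starting point avoids that.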
Identifiers (Keys):
Identifier (Key)–an attribute (or combination of attributes) that uniquely identifies individual instances of
an entity type
Simple versus Composite Identifier
Candidate Identifier–an attribute that could be an identifier…satisfies the requirements for being an
identifier
Every entity type should have an identifier attribute. No two instances of the entity type may have the same
value for the identifier attribute. For example, a person (employee, student, etc.) cannot rely on the first and
last name to be an identifier, because many people could have the same name. Rather, the identifier should be
something like an employee ID, a social security number, or some other absolutely unique value.
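The uniqueness requirement can be sketched with a primary key in SQLite (through Python's sqlite3 module). The
table layout and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id TEXT PRIMARY KEY,"  # identifier: no two rows may share a value
    " first_name TEXT,"
    " last_name TEXT)"
)

conn.execute("INSERT INTO employee VALUES ('E-001', 'John', 'Doe')")
# Same name, different identifier: allowed, which is why a name is a poor identifier.
conn.execute("INSERT INTO employee VALUES ('E-002', 'John', 'Doe')")

# Reusing an identifier value is rejected.
try:
    conn.execute("INSERT INTO employee VALUES ('E-001', 'Jane', 'Roe')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

name_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE first_name = 'John' AND last_name = 'Doe'"
).fetchone()[0]
print(duplicate_rejected, name_count)
```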
In the ER diagram, an identifier will be underlined. Note also that required attributes are typically boldfaced, so
all identifiers will be boldfaced as well. If an identifier is composite, then all its component parts are required.
Naming Attributes:
Name should be a singular noun or noun phrase
Name should be unique
Name should follow a standard format
e.g. [Entity type name { [ Qualifier ] } ] Class
Similar attributes of different entity types should use the same qualifiers and classes
As with all other data objects, there are guidelines for naming and defining attributes. The naming guidelines are
listed above, and the guidelines for defining attributes follow.
A common naming format is [Entity type name { [ Qualifier ] } ] Class, where [ . . . ] is an optional clause, and
{ . . . } indicates that the clause may repeat. Entity type name is the name of the entity with which the attribute
is associated. The entity type name may be used to make the attribute name explicit. It is almost always used for
the identifier attribute (e.g., Customer ID) of each entity type.
Class is a phrase from a list of phrases defined by the organization that are the permissible characteristics or
properties of entities (or abbreviations of these characteristics). For example, permissible values (and associated
approved abbreviations) for Class might be Name (Nm), Identifier (ID), Date (Dt), or Amount (Amt). Class is
required.
Qualifier (optional) is a phrase from a list of phrases defined by the organization that are used to place constraints
on classes.
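The naming format can be sketched as a small helper function. The function name, its parameters, and the
example qualifier "Hire" are hypothetical; the list of classes is limited to those mentioned in the text, using
the abbreviation ID for Identifier as in Customer ID.

```python
from typing import Sequence

# Classes named in the text; a real organization would define its own list.
APPROVED_CLASSES = ("Name", "ID", "Date", "Amount")


def attribute_name(class_name: str, entity: str = "", qualifiers: Sequence[str] = ()) -> str:
    """Build '[Entity type name { [ Qualifier ] } ] Class'; only Class is required."""
    if class_name not in APPROVED_CLASSES:
        raise ValueError(f"{class_name!r} is not an approved class")
    parts = [entity, *qualifiers, class_name]
    # Skip the optional parts that were left empty.
    return " ".join(part for part in parts if part)


print(attribute_name("ID", entity="Customer"))
print(attribute_name("Name", entity="Employee"))
print(attribute_name("Date", "Employee", ["Hire"]))
print(attribute_name("Amount"))
```

Checking the class against an approved list reflects the idea that classes come from a list defined by the
organization, while the entity name and qualifiers remain optional.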
Defining Attributes:
✓ State what the attribute is and possibly why it is important
✓ Make it clear what is and is not included in the attribute’s value
✓ Include aliases in documentation
✓ State source of values
✓ State whether attribute value can change once set
✓ Specify required vs. optional
✓ State min and max number of occurrences allowed
✓ Indicate relationships with other attributes
2.2 Modeling Data in the Organization
Database Development Process
Objectives
Model different types of attributes, entities, relationships, and cardinalities
Draw E-R diagrams for common business situations
Convert many-to-many relationships to associative entities
Model time-dependent data using time stamps
Modeling Relationships
Relationship Types vs. Relationship Instances
The relationship type is modeled as lines between entity types…the instance is between specific
entity instances
Relationships can have attributes
These describe features pertaining to the association between the entities in the relationship
Two entities can have more than one type of relationship between them (multiple relationships)
Associative Entity–combination of relationship and entity
This figure illustrates the difference between relationship types and relationship instances. The ER diagram
depicts types. It depicts both entity types and relationship types. The actual data that would be in the database
constitutes instances, both relationship and entity instances.
Degree of Relationships
Degree of a relationship is the number of entity types that participate in it
Unary Relationship
Binary Relationship
Ternary Relationship
Most relationships are of binary degree, but it is possible to have any number of entity types involved in a
relationship. “Ternary” refers to three. If you have more than that, it is sometimes referred to generically as an
“n-ary” relationship.
Cardinality of Relationships
One-to-One
Each entity in the relationship will have a maximum of one related entity
One-to-Many
An entity on one side of the relationship can have many related entities, but an entity on the other side
will have a maximum of one related entity
Many-to-Many
Entities on both sides of the relationship can have many related entities on the other side
c) Ternary relationships
The cardinality of this ternary relationship is many-to-many-to-many. In other words, each vendor could supply
many parts to many warehouses. Each part could come from many vendors and be housed in many warehouses. Each
warehouse could have many parts from many vendors.
The dashed line is a way of representing the attributes of the relationship. For a given vendor supplying a given
part to a given warehouse, there is a shipping mode and a unit cost. Each of these ternary relationship instances
could have its own shipping mode and unit cost.
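A minimal sketch of such a ternary relationship keys each relationship instance by a (vendor, part, warehouse)
combination and attaches the two relationship attributes to it. All identifiers and values here are made up for
illustration.

```python
from typing import Dict, NamedTuple, Tuple


class SupplyTerms(NamedTuple):
    """Attributes of the relationship itself, not of any one entity."""
    shipping_mode: str
    unit_cost: float


# Key: (vendor, part, warehouse). Each key is one ternary relationship instance.
supplies: Dict[Tuple[str, str, str], SupplyTerms] = {
    ("V1", "P1", "W1"): SupplyTerms("Ground", 4.50),
    ("V1", "P1", "W2"): SupplyTerms("Air", 6.25),
    ("V2", "P1", "W1"): SupplyTerms("Ground", 4.75),
    ("V1", "P2", "W1"): SupplyTerms("Rail", 1.10),
}

# Many-to-many-to-many: a part has many vendors, a vendor ships to many warehouses.
vendors_for_p1 = {vendor for (vendor, part, _) in supplies if part == "P1"}
warehouses_for_v1 = {warehouse for (vendor, _, warehouse) in supplies if vendor == "V1"}

print(sorted(vendors_for_p1))
print(sorted(warehouses_for_v1))
print(supplies[("V1", "P1", "W2")])
```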
Cardinality Constraints
Cardinality Constraints—the number of instances of one entity that can or must be associated with each
instance of another entity
Minimum Cardinality
If zero, then optional
If one or more, then mandatory
Maximum Cardinality
The maximum number
Sometimes it is required for an entity to have its related entity, and sometimes not. Also, it is possible for there to
be a limit to how many related entities a given entity could be related to.
Note the hatch mark vs. the circle. The hatch mark indicates mandatory cardinalities, whereas the circle
represents optional cardinalities. This figure indicates that each patient must have had at least one visit
(mandatory), and could have many more (many). By contrast, each patient history (visit) record must be
associated with exactly one patient.
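Minimum and maximum cardinalities can be thought of as bounds on a count. The following sketch checks sample
data against such bounds; the function name and the patient and visit labels are hypothetical.

```python
from typing import Dict, List, Optional


def violations(related: Dict[str, List[str]], minimum: int, maximum: Optional[int]) -> List[str]:
    """Return the instances whose number of related instances is outside the bounds.

    maximum=None stands for "many" (no upper limit).
    """
    bad = []
    for instance, others in related.items():
        count = len(others)
        if count < minimum or (maximum is not None and count > maximum):
            bad.append(instance)
    return bad


visits_by_patient = {"Patient A": ["Visit 1", "Visit 2"], "Patient B": []}
patients_by_visit = {"Visit 1": ["Patient A"], "Visit 2": ["Patient A"]}

# Patient side: mandatory many (minimum 1, no maximum).
print(violations(visits_by_patient, minimum=1, maximum=None))
# Visit side: mandatory one (minimum 1, maximum 1).
print(violations(patients_by_visit, minimum=1, maximum=1))
```

Here Patient B breaks the mandatory rule because no visit is recorded, while every visit satisfies the
exactly-one-patient rule.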
Note that in all these ER diagrams cardinality is represented using “crow’s-foot” notation. The three prongs on
the many side of the relationship are called a “crow’s foot”. There are other possible notations for ER diagrams.
For example, Microsoft Visio by default shows an arrow from the many side to the one side, although you can change
it to crow’s-foot notation.
b) One optional, one mandatory
This figure shows a binary many-to-many relationship. In this case one side is mandatory and the other is optional.
Here every project must have at least one employee assigned to it, but an employee could possibly not be assigned
to any projects.
c) Optional cardinalities
This is a unary one-to-one relationship. According to this, a person could be married to one or no other person.
This figure rules out polygamy. Can you see why? How would we be able to allow polygamy in this ER diagram?
(Answer: make it many-to-many).
Dawn and Fred are single. Shirley is married to Ellis and Mack is married to Kathy.
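The instances listed above can be sketched as a mapping from each person to at most one spouse. The function
name is hypothetical, and the check shown is just one way to express the optional one-to-one rule.

```python
from typing import Dict, Optional

# Each person maps to at most one spouse (None means single).
married_to: Dict[str, Optional[str]] = {
    "Dawn": None,
    "Fred": None,
    "Shirley": "Ellis",
    "Ellis": "Shirley",
    "Mack": "Kathy",
    "Kathy": "Mack",
}


def is_consistent(marriages: Dict[str, Optional[str]]) -> bool:
    """A marriage must point back: if A is married to B, then B is married to A."""
    return all(
        spouse is None or marriages.get(spouse) == person
        for person, spouse in marriages.items()
    )


print(is_consistent(married_to))

# Fred claiming Kathy as a spouse conflicts with Kathy's marriage to Mack.
broken = dict(married_to, Fred="Kathy")
print(is_consistent(broken))
```

A single value per person enforces the maximum of one, and allowing None enforces the optional minimum of zero.
Allowing several spouses would require a collection of values per person, that is, a many-to-many relationship.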
Figure 2-21 Examples of multiple relationships
Here, we see a one-to-many unary relationship between employees. It shows that a given employee MUST have
exactly one supervisor and could supervise any number of other employees (or none at all).
We also see two binary relationships between employees and departments. First, each department must have at
least one, and possibly many, employees. Each employee must work in exactly one department. Also, each
department has exactly one employee as a manager, and each employee can manage at most one department, or
possibly none at all.
The figure illustrates that there could be multiple types of relationships between entities.
Again, we see multiple relationships between two entities, this time between professors and courses. The “Is
Qualified” relationship is of binary degree and mandatory many-to-many cardinality. A professor must be qualified
to teach at least one course. And a course must have at least two qualified professors.
The other relationship is actually implemented via what is called an “associative entity” called Schedule, which
has an identifier attribute called Semester. We will shortly talk about associative entities in more detail. This
associative entity implements a many-to-many relationship between professors and courses, indicating that a
particular professor may be scheduled to teach many courses during a particular semester, and that a particular
course may be scheduled with many professors.
Figure 2-15a and 2-15b Multivalued attributes can be represented as relationships
In this figure we see two examples of multivalued attributes on the left. On the right we see instead separate
entities with relationships.
The top figure shows a simple multivalued attribute, whereas the bottom figure shows a composite multivalued
attribute. Note that on the right, it is explicit that there are many-to-many relationships. For example, although
the left side shows that a course can have many prerequisites, there is nothing explicit showing that a course could
itself be a prerequisite for multiple other courses. Similarly, on the left it is explicitly shown that an employee
can have many skills, but it is not explicitly shown that many employees can share the same skill. The figures on
the right, however, do make these facts explicit.
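The employee and skill case can be sketched with three SQLite tables, created here through Python's sqlite3
module. The table names, column names, and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (employee_id TEXT PRIMARY KEY);
    CREATE TABLE skill    (skill_code  TEXT PRIMARY KEY);

    -- The multivalued attribute becomes a many-to-many relationship.
    CREATE TABLE has_skill (
        employee_id TEXT NOT NULL REFERENCES employee (employee_id),
        skill_code  TEXT NOT NULL REFERENCES skill (skill_code),
        PRIMARY KEY (employee_id, skill_code)
    );

    INSERT INTO employee VALUES ('E1'), ('E2');
    INSERT INTO skill VALUES ('WELD'), ('PAINT');
    INSERT INTO has_skill VALUES ('E1', 'WELD'), ('E1', 'PAINT'), ('E2', 'WELD');
    """
)

# One employee, many skills.
skills_of_e1 = [row[0] for row in conn.execute(
    "SELECT skill_code FROM has_skill WHERE employee_id = 'E1' ORDER BY skill_code")]

# One skill, many employees: the direction that the attribute form leaves implicit.
employees_with_weld = [row[0] for row in conn.execute(
    "SELECT employee_id FROM has_skill WHERE skill_code = 'WELD' ORDER BY employee_id")]

print(skills_of_e1)
print(employees_with_weld)
```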
The right side of each figure is in Microsoft Visio notation.
Associative Entities
An entity–has attributes
A relationship–links entities together
When should a relationship with attributes instead be an associative entity?
✓ All relationships for the associative entity should be many
✓ The associative entity could have meaning independent of the other entities
✓ The associative entity preferably has a unique identifier, and should also have other attributes
✓ The associative entity may participate in relationships with entities other than those of the
associated relationship
✓ Ternary relationships should be converted to associative entities
Here, the date completed attribute pertains specifically to the employee’s completion of a course…it is an attribute
of the relationship.
Here, the relationship simply states that an employee completed a course on a particular date. The completion is
represented as a relationship, and is not an entity unto itself.
An associative entity is like a relationship with an attribute, but it is also considered to be an entity in its
own right.
Note that the many-to-many cardinality between entities in Figure 2-11a has been replaced by two one-
to-many relationships with the associative entity.
Here, the simple relationship has been replaced with an associative entity. A certificate is considered to be an
entity unto itself, and in fact even has a unique identifier attribute.
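A sketch of the certificate idea in Python follows. The field names, identifiers, and dates are hypothetical;
the point is only that each certificate has its own identifier and refers to exactly one employee and one course.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Certificate:
    """Associative entity between EMPLOYEE and COURSE."""
    certificate_number: str  # the associative entity's own identifier
    employee_id: str         # each certificate belongs to exactly one employee
    course_id: str           # each certificate is for exactly one course
    date_completed: date     # formerly an attribute of the relationship


certificates = [
    Certificate("C-100", "E1", "CRS-A", date(2023, 3, 1)),
    Certificate("C-101", "E1", "CRS-B", date(2023, 9, 15)),
    Certificate("C-102", "E2", "CRS-A", date(2024, 1, 20)),
]

# Two one-to-many relationships together recover the original many-to-many.
courses_of_e1 = sorted(c.course_id for c in certificates if c.employee_id == "E1")
employees_of_crs_a = sorted(c.employee_id for c in certificates if c.course_id == "CRS-A")

print(courses_of_e1)
print(employees_of_crs_a)
```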
Figure 2-13c An associative entity – bill of materials structure
Here we see another example of an associative entity representing a bill of materials structure. If not for an
associative entity, the BOM structure would be represented as a unary many-to-many relationship between items.
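A bill of materials structure can be sketched as pairs of items with a quantity. The item names and quantities
are invented, and the recursive helper assumes the structure contains no cycles.

```python
from typing import Dict, Tuple

# Each entry is one BOM structure instance: (assembly item, component item) -> quantity.
bom: Dict[Tuple[str, str], int] = {
    ("Bike", "Wheel"): 2,
    ("Bike", "Frame"): 1,
    ("Wheel", "Spoke"): 32,
    ("Cart", "Wheel"): 4,
}

# Many-to-many among items: an item has many components and is used in many assemblies.
components_of_bike = sorted(comp for (asm, comp) in bom if asm == "Bike")
wheel_used_in = sorted(asm for (asm, comp) in bom if comp == "Wheel")


def total_quantity(assembly: str, part: str) -> int:
    """Total number of `part` needed for one `assembly`, across all levels."""
    total = 0
    for (parent, component), quantity in bom.items():
        if parent == assembly:
            if component == part:
                total += quantity
            else:
                total += quantity * total_quantity(component, part)
    return total


print(components_of_bike)
print(wheel_used_in)
print(total_quantity("Bike", "Spoke"))
```

The quantity belongs to the pairing of two items rather than to either item alone, which is why the structure is
modeled as an associative entity.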
Here is another example of an associative entity, this time as the center of a ternary relationship.
Figure 2-19 Simple example of time-stamping
Figure 2-20c E-R diagram with associative entity for product assignment to product line
over time
Modeling time-dependent data has become more important due to regulations such as HIPAA and Sarbanes-Oxley.
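Time-stamping can be sketched as keeping every value together with the date from which it applies, instead of
overwriting the old value. The price history below is hypothetical, as is the helper function name.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

# (effective date, price), kept sorted by effective date.
price_history: List[Tuple[date, float]] = [
    (date(2023, 1, 1), 100.0),
    (date(2023, 7, 1), 110.0),
    (date(2024, 1, 1), 120.0),
]


def price_as_of(history: List[Tuple[date, float]], when: date) -> Optional[float]:
    """Return the price in effect on `when`, or None if no price applied yet."""
    effective_dates = [effective for effective, _ in history]
    # Index of the first entry that takes effect after `when`.
    position = bisect_right(effective_dates, when)
    return history[position - 1][1] if position else None


print(price_as_of(price_history, date(2023, 6, 30)))
print(price_as_of(price_history, date(2023, 7, 1)))
print(price_as_of(price_history, date(2022, 12, 31)))
```

Keeping the history makes it possible to answer what value applied at an earlier time, which a single
overwritten attribute cannot do.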
Figure 2-22 Data model for Pine Valley Furniture Company in Microsoft Visio notation
Different modeling software tools may have different notation for the same constructs.