Data Models
Lesson 2
Rueda Street, Calbayog City, Samar, Philippines | +63 (055) 533 9857 | [email protected] | www.nwssu.edu.ph
Data Modeling and Data Models
Data modeling, the first step in designing a database, refers to the process of creating a specific data
model for a determined problem domain. A problem domain is a clearly defined area within the real-world
environment, with a well-defined scope and boundaries, that will be systematically addressed.
A data model is a relatively simple representation, usually graphical, of more complex real-world data
structures. In general terms, a model is an abstraction of a more complex real-world object or event. A
model’s main function is to help you understand the complexities of the real-world environment.
Within the database environment, a data model represents data structures and their characteristics,
relations, constraints, transformations, and other constructs with the purpose of supporting a specific
problem domain.
Data modeling is an iterative, progressive process. You start with a simple understanding of the problem
domain, and as your understanding increases, so does the level of detail of the data model. When done
properly, the final data model effectively is a “blueprint” with all the instructions to build a database that
will meet all end-user requirements. This blueprint is narrative and graphical in nature, meaning that it
contains both text descriptions in plain, unambiguous language and clear, useful diagrams depicting the
main data elements.
The Importance of Data Models
Data models can facilitate interaction among the designer, the applications programmer, and the end user.
A well-developed data model can even foster improved understanding of the organization for which the
database design is developed. In short, data models are a communication tool.
The importance of data modeling cannot be overstated. Data constitutes the most basic information
employed by a system. Applications are created to manage data and to help transform data into
information, but data is viewed in different ways by different people. For example, contrast the view of a
company manager with that of a company clerk. Although both work for the same company, the manager
is more likely to have an enterprise-wide view of company data than the clerk.
Even different managers view data differently. For example, a company president is likely to take a
universal view of the data because he or she must be able to tie the company’s divisions to a common
(database) vision. A purchasing manager in the same company is likely to have a more restricted view of
the data, as is the company’s inventory manager. In effect, each department manager works with a subset
of the company’s data. The inventory manager is more concerned about inventory levels, while the
purchasing manager is more concerned about the cost of items and about relationships with the suppliers
of those items.
Applications programmers have yet another view of data, being more concerned with data location,
formatting, and specific reporting requirements. Basically, applications programmers translate company
policies and procedures from a variety of sources into appropriate interfaces, reports, and query screens.
A sound data environment requires an overall database blueprint based on an appropriate data model.
When a good database blueprint is available, it does not matter that an applications programmer’s view of
the data is different from that of the manager or the end user. Conversely, when a good database blueprint
is not available, problems are likely to ensue. For instance, an inventory management program and an order
entry system may use conflicting product-numbering schemes, thereby costing the company thousands or
even millions of dollars.
The data model is an abstraction; you cannot draw the required data out of the data model. Just as you are
not likely to build a good house without a blueprint, you are equally unlikely to create a good database
without first creating an appropriate data model.
Data Model Basic Building Blocks
The basic building blocks of all data models are entities, attributes, relationships, and constraints.
An entity is a person, place, thing, or event about which data will be collected and stored. An entity
represents a particular type of object in the real world, which means an entity is “distinguishable”—that is,
each entity occurrence is unique and distinct. For example, a CUSTOMER entity would have many
distinguishable customer occurrences, such as John Smith, Pedro Dinamita, and Tom Strickland. Entities
may be physical objects, such as customers or products, but entities may also be abstractions, such as
flight routes or musical concerts.
An attribute is a characteristic of an entity. For example, a CUSTOMER entity would be described by
attributes such as customer last name, customer first name, customer phone number, customer address,
and customer credit limit. Attributes are the equivalent of fields in file systems.
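To make the entity and attribute ideas concrete, the following is a minimal Python sketch, not a prescribed design. It treats CUSTOMER as a class whose fields are the attributes listed above and creates the three occurrences named in the text. The cus_code identifier and all phone, address, and credit limit values are placeholders assumed for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """One occurrence of the CUSTOMER entity; each field is an attribute."""
    cus_code: int            # assumed identifier, not given in the text
    cus_lname: str           # customer last name
    cus_fname: str           # customer first name
    cus_phone: str           # customer phone number
    cus_address: str         # customer address
    cus_credit_limit: float  # customer credit limit


# Three distinguishable occurrences; contact values are placeholders.
customers = [
    Customer(1, "Smith", "John", "000-0001", "Address 1", 1000.0),
    Customer(2, "Dinamita", "Pedro", "000-0002", "Address 2", 1000.0),
    Customer(3, "Strickland", "Tom", "000-0003", "Address 3", 1000.0),
]

# Each occurrence is unique: no two customers share the same code.
codes = {c.cus_code for c in customers}
```

Note that two customers may share attribute values (here, the same credit limit) and still be distinct occurrences, because the assumed cus_code differs.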
A relationship describes an association among entities. For example, a relationship exists between
customers and agents that can be described as follows: an agent can serve many customers, and each
customer may be served by one agent. Data models use three types of relationships: one-to-many, many-
to-many, and one-to-one. Database designers usually use the shorthand notations 1:M or 1..*, M:N or *..*,
and 1:1 or 1..1, respectively. (Although the M:N notation is a standard label for the many-to-many
relationship, the label M:M may also be used.) The following examples illustrate the distinctions among the
three relationships.
One-to-many (1:M or 1..*) relationship. A painter creates many different paintings, but each is painted
by only one painter. Thus, the painter (the “one”) is related to the paintings (the “many”). Therefore,
database designers label the relationship “PAINTER paints PAINTING” as 1:M. Note that entity names
are often capitalized as a convention, so they are easily identified. Similarly, a customer (the “one”)
may generate many invoices, but each invoice (the “many”) is generated by only a single customer.
The “CUSTOMER generates INVOICE” relationship would also be labeled 1:M.
Many-to-many (M:N or *..*) relationship. An employee may learn many job skills, and each job skill
may be learned by many employees. Database designers label the relationship “EMPLOYEE learns
SKILL” as M:N. Similarly, a student can take many classes and each class can be taken by many
students, thus yielding the M:N label for the relationship expressed by “STUDENT takes CLASS.”
One-to-one (1:1 or 1..1) relationship. A retail company’s management structure may require that each
of its stores be managed by a single employee. In turn, each store manager, who is an employee,
manages only a single store. Therefore, the relationship “EMPLOYEE manages STORE” is labeled 1:1.
The preceding discussion identified each relationship in both directions; that is, relationships are
bidirectional:
One CUSTOMER can generate many INVOICEs.
Each of the many INVOICEs is generated by only one CUSTOMER.
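The three relationship types can be sketched with plain Python data structures. This is only one possible illustration; the painter, painting, employee, skill, and store names are invented, and the helper function names are hypothetical.

```python
# 1:M  PAINTER paints PAINTING: each painting maps to exactly one painter,
# while one painter may appear as the value for many paintings.
painting_painter = {
    "Painting A": "Painter 1",
    "Painting B": "Painter 1",
    "Painting C": "Painter 2",
}


def paintings_by(painter):
    """Read the relationship from the 'one' side to the 'many' side."""
    return sorted(t for t, p in painting_painter.items() if p == painter)


# M:N  EMPLOYEE learns SKILL: a set of (employee, skill) pairs, so either
# side can be associated with many occurrences of the other.
employee_skill = {("Ana", "welding"), ("Ana", "drafting"), ("Ben", "welding")}


def skills_of(employee):
    return sorted(s for e, s in employee_skill if e == employee)


def employees_with(skill):
    return sorted(e for e, s in employee_skill if s == skill)


# 1:1  EMPLOYEE manages STORE: each store has one manager, and no employee
# manages more than one store.
store_manager = {"Store 1": "Ana", "Store 2": "Ben"}


def is_one_to_one(mapping):
    """True when no value is shared by two different keys."""
    return len(set(mapping.values())) == len(mapping)
```

Reading each structure in both directions mirrors the bidirectional statements above: paintings_by answers "which paintings belong to this painter?", while a lookup in painting_painter answers "who painted this painting?".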
A constraint is a restriction placed on the data. Constraints are important because they help to ensure data
integrity. Constraints are normally expressed in the form of rules:
An employee’s salary must have values that are between 6,000 and 350,000.
A student’s GPA must be between 0.00 and 4.00.
Each class must have one and only one teacher.
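The three constraint rules above can be expressed as simple checks. The sketch below assumes the ranges are inclusive at both ends, which the text implies but does not state; the function names are hypothetical.

```python
def valid_salary(salary):
    """An employee's salary must be between 6,000 and 350,000."""
    return 6_000 <= salary <= 350_000


def valid_gpa(gpa):
    """A student's GPA must be between 0.00 and 4.00."""
    return 0.00 <= gpa <= 4.00


def valid_class(teachers):
    """Each class must have one and only one teacher."""
    return len(teachers) == 1
```

In a real database these rules would normally be enforced by the DBMS itself rather than by application code; the functions only show what each rule accepts and rejects.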
How do you properly identify entities, attributes, relationships, and constraints? The first step is to clearly
identify the business rules for the problem domain you are modeling.
Business Rules
From a database point of view, the collection of data becomes meaningful only when it reflects properly
defined business rules.
A business rule is a brief, precise, and unambiguous description of a policy, procedure, or principle within a
specific organization. In a sense, business rules are misnamed: they apply to any organization, large or
small—a business, a government unit, a religious group, or a research laboratory—that stores and uses data
to generate information.
Business rules derived from a detailed description of an organization’s operations help to create and
enforce actions within that organization’s environment. Business rules must be rendered in writing and
updated to reflect any change in the organization’s operational environment.
Properly written business rules are used to define entities, attributes, relationships, and constraints. Any
time you see relationship statements such as “an agent can serve many customers, and each customer can
be served by only one agent,” business rules are at work.
To be effective, business rules must be easy to understand and widely disseminated to ensure
that every person in the organization shares a common interpretation of the rules. Business
rules describe, in simple language, the main and distinguishing characteristics of the data as
viewed by the company.
Examples of business rules are as follows:
A customer may generate many invoices.
An invoice is generated by only one customer.
A training session cannot be scheduled for fewer than 10 employees or for more than 30
employees.
Discovering Business Rules
The main sources of business rules are company managers, policy makers, department managers, and
written documentation such as a company’s procedures, standards, and operations manuals. A faster
and more direct source of business rules is direct interviews with end users. Unfortunately, because
perceptions differ, end users are sometimes a less reliable source.
The process of identifying and documenting business rules is essential to database design for several
reasons:
It helps to standardize the company’s view of data.
It can be a communication tool between users and designers.
It allows the designer to understand the nature, role, and scope of the data.
It allows the designer to understand business processes.
It allows the designer to develop appropriate relationship participation rules and constraints and to create an
accurate data model.
Translating Business Rules into Data Model Components
As a general rule, a noun in a business rule will translate into an entity in the model, and a verb (active
or passive) that associates the nouns will translate into a relationship among the entities. For example,
the business rule “a customer may generate many invoices” contains two nouns (customer and
invoices) and a verb (generate) that associates the nouns. From this business rule, you could deduce
the following:
Customer and invoice are objects of interest for the environment and should be represented by their
respective entities.
There is a generate relationship between customer and invoice.
To properly identify the type of relationship, you should consider that relationships are bidirectional;
that is, they go both ways. For example, the business rule “a customer may generate many invoices” is
complemented by the business rule “an invoice is generated by only one customer.” In that case, the
relationship is one-to-many (1:M). Customer is the “1” side, and invoice is the “many” side.
As a general rule, to properly identify the relationship type, you should ask two
questions:
How many instances of B are related to one instance of A?
How many instances of A are related to one instance of B?
For example, you can assess the relationship between student and class by asking two questions:
In how many classes can one student enroll? Answer: many classes.
How many students can enroll in one class? Answer: many students.
Therefore, the relationship between student and class is many-to-many (M:N).
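The two-question test can be written as a small function. This is a sketch under the assumption that each question is answered with either "one" or "many"; the function name relationship_type is hypothetical.

```python
def relationship_type(b_per_a, a_per_b):
    """Classify a relationship from the answers to the two questions.

    b_per_a: how many instances of B are related to one instance of A?
    a_per_b: how many instances of A are related to one instance of B?
    Each answer must be the string "one" or "many".
    """
    answers = (b_per_a, a_per_b)
    if any(answer not in ("one", "many") for answer in answers):
        raise ValueError("each answer must be 'one' or 'many'")
    if answers == ("one", "one"):
        return "1:1"
    if answers == ("many", "many"):
        return "M:N"
    return "1:M"  # exactly one side answered "many"


student_class = relationship_type("many", "many")    # STUDENT takes CLASS
customer_invoice = relationship_type("many", "one")  # CUSTOMER generates INVOICE
employee_store = relationship_type("one", "one")     # EMPLOYEE manages STORE
```

When the result is 1:M, the entity whose question was answered "one" in the reverse direction is the "1" side; for example, CUSTOMER is the "1" side because each invoice is generated by only one customer.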
Naming Conventions
During the translation of business rules to data model components, you identify entities, attributes,
relationships, and constraints. This identification process includes naming the object in a way that
makes it unique and distinguishable from other objects in the problem domain. Therefore, it is
important to pay special attention to how you name the objects you are discovering.
Entity names should be descriptive of the objects in the business environment and use terminology
that is familiar to the users. An attribute name should also be descriptive of the data represented by
that attribute. It is also a good practice to prefix the name of an attribute with the name or
abbreviation of the entity in which it occurs. For example, in the CUSTOMER entity, the customer’s
credit limit may be called CUS_CREDIT_LIMIT. The CUS indicates that the attribute is descriptive of the
CUSTOMER entity, while CREDIT_LIMIT makes it easy to recognize the data that will be contained in
the attribute.
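The prefixing convention can be sketched as a helper that builds an attribute name from an entity name and a plain-language description. Taking the first three letters of the entity name as the abbreviation is an assumption made here to reproduce the CUS example; an organization may choose its abbreviations differently. The function name attribute_name is hypothetical.

```python
def attribute_name(entity, description, prefix_len=3):
    """Build a prefixed attribute name such as CUS_CREDIT_LIMIT.

    The prefix is the first prefix_len letters of the entity name
    (an assumed abbreviation rule); the words of the description are
    upper-cased and joined with underscores.
    """
    prefix = entity[:prefix_len].upper()
    body = "_".join(description.upper().split())
    return f"{prefix}_{body}"


credit_limit = attribute_name("CUSTOMER", "credit limit")
last_name = attribute_name("CUSTOMER", "last name")
```

Here credit_limit holds "CUS_CREDIT_LIMIT", matching the example in the text.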
Network Model
Hierarchical Model
The Relational Model
Developed by E. F. Codd of IBM in 1970, the relational model is based on mathematical set theory and
represents data as independent relations. Each relation (table) is conceptually represented as a two-
dimensional structure of intersecting rows and columns. The relations are related to each other through
the sharing of common entity characteristics (values in columns).
The relational model’s foundation is a mathematical concept known as a relation. To avoid the
complexity of abstract mathematical theory, you can think of a relation (sometimes called a table) as a
two-dimensional structure composed of intersecting rows and columns. Each row in a relation is called a
tuple. Each column represents an attribute. The relational model also describes a precise set of data
manipulation constructs based on advanced mathematical concepts.
The relational data model is implemented through a very sophisticated relational database management
system (RDBMS) - a collection of programs that manages a relational database. The RDBMS software
translates a user’s logical requests (queries) into commands that physically locate and retrieve the
requested data.
Tables are related to each other through the sharing of a common attribute (a value in a column). For
example, the CUSTOMER table in Figure 2.1 might contain a sales agent’s number that is also contained
in the AGENT table.
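The idea of two tables linked through a common attribute can be tried out with Python's built-in sqlite3 module. The sketch below is not the content of Figure 2.1: the column names (AGENT_CODE, AGENT_LNAME, CUS_CODE, CUS_LNAME) and every row value are assumptions chosen for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AGENT (
        AGENT_CODE  INTEGER PRIMARY KEY,
        AGENT_LNAME TEXT NOT NULL
    );
    CREATE TABLE CUSTOMER (
        CUS_CODE   INTEGER PRIMARY KEY,
        CUS_LNAME  TEXT NOT NULL,
        -- AGENT_CODE is the common attribute shared with the AGENT table.
        AGENT_CODE INTEGER REFERENCES AGENT (AGENT_CODE)
    );
    INSERT INTO AGENT VALUES (501, 'Agent A'), (502, 'Agent B');
    INSERT INTO CUSTOMER VALUES
        (10010, 'Smith', 501),
        (10011, 'Dinamita', 501),
        (10012, 'Strickland', 502);
""")

# Match each customer row (tuple) to its agent through the shared value.
rows = conn.execute("""
    SELECT CUSTOMER.CUS_LNAME, AGENT.AGENT_LNAME
    FROM CUSTOMER
    JOIN AGENT ON CUSTOMER.AGENT_CODE = AGENT.AGENT_CODE
    ORDER BY CUSTOMER.CUS_CODE
""").fetchall()
conn.close()
```

The query names only the tables and columns wanted; the RDBMS decides how to locate and retrieve the matching rows. Because agent 501 appears in two customer rows, the data also shows the 1:M relationship between AGENT and CUSTOMER.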
The Entity Relationship Model
The entity relationship (ER) model, or ERM, has become a widely accepted standard for data
modeling.
ER models are normally represented in an entity relationship diagram (ERD), which uses graphical
representations to model database components.
The ER model is based on the following components: entity, attributes for each entity, and
relationships.
Figure 2.3 shows the different types of relationships using three ER notations: the original Chen
notation, the Crow’s Foot notation, and the newer class diagram notation, which is part of the
Unified Modeling Language (UML).
The Object-Oriented Model
Objects that share similar characteristics are grouped in classes. A class is a collection of similar
objects with shared structure (attributes) and behavior (methods). In a general sense, a class
resembles the ER model’s entity set. However, a class is different from an entity set in that it
contains a set of procedures known as methods. A class’s method represents a real-world action
such as finding a selected PERSON’s name, changing a PERSON’s name, or printing a PERSON’s
address. In other words, methods are the equivalent of procedures in traditional programming
languages. In OO terms, methods define an object’s behavior.
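A class with methods can be sketched in Python as follows. The PERSON actions are the ones named above; the method names, the two attributes, and the sample values are assumptions for illustration.

```python
class Person:
    """A class: shared structure (attributes) plus behavior (methods)."""

    def __init__(self, name, address):
        self.name = name
        self.address = address

    def get_name(self):
        """Find the selected PERSON's name."""
        return self.name

    def change_name(self, new_name):
        """Change the PERSON's name."""
        self.name = new_name

    def print_address(self):
        """Print the PERSON's address."""
        print(self.address)


person = Person("Name A", "Address A")  # placeholder values
person.change_name("Name B")
current_name = person.get_name()
```

An entity set in the ER model would hold only the name and address; the three methods are what make this a class in the OO sense.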
Object-oriented data models are typically depicted using Unified Modeling Language (UML) class
diagrams. UML is a language based on OO concepts that describes a set of diagrams and symbols
you can use to graphically model a system.
UML class diagrams are used to represent data and its relationships within the larger UML object-
oriented system’s modeling language. For a more complete description of UML, see Appendix H,
Unified Modeling Language (UML).
As you examine Figure 2.4, note that the object representation of the INVOICE includes all related objects
within the same object box. Note that the connectivities (1 and M) indicate the relationship of the
related objects to the INVOICE. For example, the “1” next to the CUSTOMER object indicates that
each INVOICE is related to only one CUSTOMER. The “M” next to the LINE object indicates that each
INVOICE contains many LINEs.
The UML class diagram uses three separate object classes (CUSTOMER, INVOICE, and LINE) and
two relationships to represent this simple invoicing problem. Note that the relationship
connectivities are represented by the 1..1, 0..*, and 1..* symbols, and that the relationships are
named at both ends to represent the different “roles” that the objects play in the relationship.
The ER model also uses three separate entities and two relationships to represent this simple
invoice problem.
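The invoicing example can be sketched as three Python classes. This is an illustration of the connectivities only, not a reproduction of Figure 2.4: the attribute names, the total method, and all values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    name: str


@dataclass
class Line:
    description: str
    quantity: int
    unit_price: float


@dataclass
class Invoice:
    number: int
    customer: Customer                               # "1": exactly one CUSTOMER
    lines: List[Line] = field(default_factory=list)  # "M": many LINEs

    def total(self):
        """Behavior attached to the object: sum of all line amounts."""
        return sum(line.quantity * line.unit_price for line in self.lines)


invoice = Invoice(1001, Customer("Customer A"))
invoice.lines.append(Line("Item A", 2, 10.0))
invoice.lines.append(Line("Item B", 1, 5.5))
invoice_total = invoice.total()
```

The INVOICE object carries its related CUSTOMER and LINE objects inside it, which is the "same object box" idea described above; the ER version would instead keep three separate entities linked by two relationships.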