Data Model Patterns
Section 1: Introduction
Definitions
Model: an abstraction of some aspect of a problem.
Data model: a model that lets you understand the structure of data.
  Do not model problems literally. Instead, search for the deep, inner essence of a problem.
  Such a model accommodates change and is less costly to develop.
  It is straightforward to implement a data model.
Pattern: a model fragment that is profound and recurring.
  Patterns focus on structure (classes and relationships).
  Attributes provide fine details that vary for specific applications.
Our focus here is on patterns for data models and databases.
Drawbacks of Patterns
Sporadic coverage. You cannot build a model by just combining patterns. Typically you will use only a few patterns, but they often embody key insights.
Pattern discovery. It can be difficult to find a pattern, especially if your idea is ill-formed.
Complexity. Patterns are an advanced topic and can be difficult to understand.
Inconsistencies. There has been a real effort in the literature to cross reference other work and build on it. However, inconsistencies still happen.
Immature technology. The patterns literature is active but the field is still evolving.
Section 2: Aspects of Pattern Technology
Mathematical template: an abstract model fragment that is devoid of application content.
  Driven by deep data structures that often arise in database models.
  Notation: Angle brackets denote parameters that are placeholders.
Antipattern: a characterization of a common software flaw.
  Shows what not to do and how to fix it.
Archetype: a deep concept that is prominent and cuts across problem domains.
Identity: the means for denoting individual objects, so that they can be found.
Canonical model: a submodel that provides a useful service for many applications.
The remainder of the lecture partially covers the above topics.
Section 3: Mathematical Template Tree
Tree: a term from graph theory. A tree is a set of nodes that connect from child to parent.
  Each node has one parent node except for the node at the tree's top.
  A node can have many (zero or more) child nodes.
  There are no cycles: at most one path connects any two nodes.
An example of a tree...
Hardcoded Tree
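Hardcoded tree template
[Diagram: <Tree> has one root <Level 1 class>; each <Level 1 class> has many <Level 2 class> children, each <Level 2 class> has many <Level 3 class> children, and so on. The example labels levels as Division and Department.]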
Use when: The structure of a tree is well known and it is important to enforce the sequence of types in the levels of the hierarchy. In practice, used for examples, but seldom for code.
2010 Michael R. Blaha
Patterns of Data Modeling 10
Simple Tree
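Simple tree template
[Diagram: <Tree> has one root <Node>; each <Node> has at most one parent and can have child nodes. The example uses subordinate as the child role.]
Template constraints: {All nodes have a parent except the root node.} {There cannot be any cycles.}
Example constraints: {Every person has a manager, except the CEO.} {The management hierarchy must be acyclic.}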
Use when: Tree decomposition is merely a matter of data structure. Node names can be globally unique or unique within the context of a parent.
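The two constraints of the simple tree template can be sketched in application code. This is a minimal illustration, not part of the original slides; the class name Node and the method add_child are hypothetical, and the sketch assumes that links are only ever added, never removed.

```python
class Node:
    """A <Node>: at most one parent, zero or more children."""

    def __init__(self, name):
        self.name = name
        self.parent = None   # None marks a root (or a not-yet-linked node)
        self.children = []

    def add_child(self, child):
        # Constraint: a node has at most one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        # Constraint: no cycles. Walk up from self; child must not be an ancestor.
        ancestor = self
        while ancestor is not None:
            if ancestor is child:
                raise ValueError("link would create a cycle")
            ancestor = ancestor.parent
        child.parent = self
        self.children.append(child)


# Management hierarchy example: every person has a manager, except the CEO.
ceo = Node("CEO")
manager = Node("Manager")
engineer = Node("Engineer")
ceo.add_child(manager)
manager.add_child(engineer)
```

Attempting engineer.add_child(ceo) would raise ValueError, because the CEO is already an ancestor of the engineer.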
Structured Tree
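Structured tree template
[Diagram: <Tree> has one root <Node>; <Node> is specialized into <Leaf> and <Branch>; a <Branch> is the parent of many child <Node>s, and each <Node> has at most one parent. The example labels are Text, GeometricObject, and Group.]
{All nodes have a parent except the root node.} {There cannot be any cycles.}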
Use when: Branch nodes and leaf nodes have different attributes, relationships, and/or behavior. Node names can be globally unique or unique within the context of a parent.
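A structured tree can be sketched with one class per kind of node, so that only branches can hold children. This is an illustrative sketch under assumptions: the class names mirror the template parameters, while the payload attribute and the leaf_names helper are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None  # a Branch, or None for the root


class Leaf(Node):
    """Leaf nodes carry leaf-only data (here a hypothetical payload)."""

    def __init__(self, name, payload):
        super().__init__(name)
        self.payload = payload


class Branch(Node):
    """Only branch nodes can have children."""

    def __init__(self, name):
        super().__init__(name)
        self.children = []

    def add(self, child):
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        node = self
        while node is not None:  # reject links that would form a cycle
            if node is child:
                raise ValueError("link would create a cycle")
            node = node.parent
        child.parent = self
        self.children.append(child)
        return child


def leaf_names(node):
    """Collect leaf names depth first, in insertion order."""
    if isinstance(node, Leaf):
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


drawing = Branch("drawing")
group = drawing.add(Branch("group"))
drawing.add(Leaf("title", "Hello"))
group.add(Leaf("circle", "r=1"))
```

Because Leaf has no add method, the type structure itself prevents a leaf from acquiring children.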
Overlapping Trees
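Overlapping trees template
[Diagram: <Tree> and node classes with many-to-many membership; each node has at most one parent (0..1) within a tree. The example uses Part.]
Template constraints: {All nodes have a parent except the root node.} {There cannot be any cycles.} {A parent must only have children for trees to which the parent belongs.}
Example constraint: {Each BOM must be acyclic.}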
Use when: A node can belong to multiple trees. Example: A part can have several bills of materials, such as one for manufacturing, another for engineering, and another for service. Motivated by [Fowler, page 21], but this is a more powerful template that captures the constraint that a child has at most one parent for a tree.
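The constraint that a child has at most one parent for a tree can be sketched by keying each parent link on the pair (tree, child). The class name OverlappingTrees and its methods are hypothetical, and the sketch does not enforce the constraint that a parent must belong to the tree.

```python
class OverlappingTrees:
    def __init__(self):
        # (tree, child) -> parent; the key allows one parent per child per tree
        self._parent = {}

    def add_link(self, tree, parent, child):
        if (tree, child) in self._parent:
            raise ValueError(f"{child} already has a parent in {tree}")
        # Each tree must be acyclic: child must not be an ancestor of parent.
        node = parent
        while node is not None:
            if node == child:
                raise ValueError("link would create a cycle")
            node = self._parent.get((tree, node))
        self._parent[(tree, child)] = parent

    def parent_of(self, tree, child):
        return self._parent.get((tree, child))


bom = OverlappingTrees()
bom.add_link("engineering", "bike", "wheel")
bom.add_link("service", "wheel kit", "wheel")  # same part, different tree
```

The same part has a different parent in each bill of materials, yet a second parent within one bill of materials is rejected.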
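[Diagram: <Tree> with a root; parent (1) and child (1) roles connect nodes through a link that can occur many times.]
{All nodes have a parent except the root node. There cannot be any cycles.} {A child has at most one parent at a time.}
[Diagram: example with Person and Position, parent and child roles, a root, and effectiveDate and expirationDate attributes.]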
Use when: The grouping of a parent and its children must be described.
Additional Templates
There are additional templates.
Directed graph.
  Simple DG. Treats all nodes the same.
  Structured DG. Differentiates leaf nodes from branch nodes.
  Node-edge DG. Treats nodes and edges as peers.
  Connection DG. Promotes a node-edge connection to a class.
  Simple DG changing over time. Stores variants of a simple DG over time.
  Node-edge DG changing over time. Stores variants of a node-edge DG over time.
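[Diagram: management hierarchy example with Manager and IndividualContributor; subordinate is the child role.]
{Every person has a manager, except the CEO.} {The management hierarchy must be acyclic.}
[Diagram: example with attributes name and title, Department (name), Manager, and IndividualContributor.]
[Diagram: example with Person and Position, parent and child roles, a root, and effectiveDate and expirationDate attributes.]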
The model provides matrix management. This is because the model does not enforce a tree, that is, that a child can have only a single parent at a time. Application code would need to provide such a constraint if it were desired.
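[Diagram: management graph example with a many-to-many subordinate relationship.]
{Every person has a manager, except the CEO.} {The management graph must be acyclic.}
[Diagram: example with attributes name and title, Department (name), Manager, and IndividualContributor.]
[Diagram: example with Person and Position, parent and child roles, a root, and effectiveDate and expirationDate attributes.]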
Section 6: Antipatterns
Antipattern: a characterization of a software flaw.
  When you find an antipattern, substitute the correction.
  Universal antipattern: avoid for all applications.
  Non-data-warehouse antipattern: acceptable for data warehouses, but avoid otherwise.
Patterns are good ideas that can be reused. In contrast, antipatterns look at what can go wrong.
The literature focuses on antipatterns for programming code, but antipatterns also apply to data models.
[Brown-98]: An antipattern is some repeated practice that initially appears to be beneficial, but ultimately produces more bad consequences than beneficial results.
[Diagram: original model with a self relationship on Contract (role RelatedContract); improved model in which many Contracts refer to at most one ContractRelationship.]
Observation: There is a self relationship with the same multiplicity and role names on each end. Symmetric relationships are always troublesome for relational databases.
Which column is first? Which column is second? Double entry or double searching of data.
Improved model: Promote the relationship to a class in its own right. The improved model is often more expressive.
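Promoting the relationship to a class can be sketched as follows. The class names Contract and ContractRelationship come from the diagram labels; the functions relate and related are hypothetical, and the sketch assumes that being related is transitive, so that all related contracts share one group.

```python
class ContractRelationship:
    """A group of related contracts; members have no first or second position."""

    def __init__(self):
        self.contracts = set()


class Contract:
    def __init__(self, name):
        self.name = name
        self.relationship = None  # at most one ContractRelationship


def relate(a, b):
    """Put two contracts in the same relationship, merging groups if needed."""
    group = a.relationship or b.relationship or ContractRelationship()
    for contract in (a, b):
        other = contract.relationship
        if other is not None and other is not group:
            # Merge the other group into the surviving one.
            for member in other.contracts:
                member.relationship = group
            group.contracts |= other.contracts
        contract.relationship = group
        group.contracts.add(contract)


def related(contract):
    """All contracts related to the given one, found with a single lookup."""
    if contract.relationship is None:
        return set()
    return contract.relationship.contracts - {contract}


c1, c2, c3 = Contract("C1"), Contract("C2"), Contract("C3")
relate(c1, c2)
relate(c3, c2)
```

There is no need to decide which contract goes in which column, and no double entry or double search: related(c1) and related(c3) both read the same group.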
[Diagram: original model with a fixed hierarchy of Supervisor and IndividualContributor levels; improved model with Employee (employeeType, / reportingLevel) and a boss (0..1) to subordinate (*) relationship.]
Observation: There is a fixed hierarchy with little difference between the levels. Contrast with the hardcoded tree template where there is a material difference between the levels. Improved model: Abstract and consolidate the levels. Use one of the tree patterns to relate the levels.
[Diagram: improved model in which each FinancialData (quantity) record refers to one Organization (name), one Metric (name), and one Product (name).]
Observation: A class has groups of similar attributes. Such a model can be brittle, verbose, and awkward to extend. Exceptions: OK for data warehouses. Improved model: Abstract and factor out commonality. The improved model can handle new products and financial metrics.
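The factored model can be sketched with one record per organization, product, and metric combination. The field names follow the diagram labels; the sample rows and the total helper are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinancialData:
    organization: str
    product: str
    metric: str
    quantity: float


# Hypothetical sample data: each fact is a row, not a dedicated column.
rows = [
    FinancialData("Acme", "widgets", "revenue", 120.0),
    FinancialData("Acme", "widgets", "cost", 80.0),
    FinancialData("Acme", "gadgets", "revenue", 50.0),
]


def total(rows, metric):
    """Sum one metric across all organizations and products."""
    return sum(r.quantity for r in rows if r.metric == metric)


# A new metric needs only new rows, not a change to the class.
rows.append(FinancialData("Acme", "gadgets", "profit", 10.0))
```

With parallel attributes, the last line would instead require a new attribute for every product.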
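[Diagram: original model with several relationships between Movie and Person (actor, actress, director, producer, writer); improved model in which MovieRole links Movie (name) and Person (name), and each MovieRole has one RoleType (name).]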
Observation: Two classes have several (at least three) similar relationships. Exceptions: OK for data warehouses. Improved model: Abstract and factor out commonality.
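The factored relationships can be sketched with a single MovieRole record that names its role type. The class name follows the diagram labels; the sample data and the people_in helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovieRole:
    movie: str
    person: str
    role_type: str  # a RoleType name such as "actor" or "director"


# Hypothetical sample data.
roles = [
    MovieRole("Movie A", "Pat", "director"),
    MovieRole("Movie A", "Pat", "writer"),
    MovieRole("Movie A", "Lee", "actor"),
    MovieRole("Movie B", "Lee", "producer"),
]


def people_in(roles, movie, role_type):
    """Names of people who hold the given role type in a movie, sorted."""
    return sorted(r.person for r in roles
                  if r.movie == movie and r.role_type == role_type)
```

A new role type, such as editor, becomes a new RoleType value rather than another relationship between Movie and Person.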
[Diagram: improved model with Customer (accountNumber, customerName, customerType, customerStatus).]
Observation: A class has disparate attributes and lacks cohesion. The contact position and contact phone depend on the contact name which in turn depends on the customer. Several customer records could have the same contact name with inconsistent positions and phones. Exceptions: OK for data warehouses. Improved model: Make each concept its own class.
Section 8: Archetypes
Archetype: an abstraction that often occurs and transcends individual applications.
Archetypes are similar in style to the seed models emphasized in other data pattern books. The difference is that archetypes emphasize the core concepts and omit application details.
The term archetype is taken from [Arlow-2004].
Archetype: Account
Account: a label for recording and reporting a quantity of something.
  An owner can have multiple accounts for an account type.
  Some accounts can be unwanted duplicates and remain undetected.
  AccountEquivalence can logically combine accounts without having to move data (see the Symmetric Relationship Antipattern).
[Diagram: AccountType (name {unique}) and Account.]
Archetype: Actor
Actor: someone or something that is notable in terms of data or relationships.
  Useful as a hook for permissions, approvals, logging...
  This archetype is consistent with the literature but is more robust, adding roles and applications.
[Diagram: Actor (name, effectiveDate, expirationDate), TangibleActor, ActorRole, ActorRoleType, Person, Application, and Organization.]
Archetype: Part
Part: a specific good that can be described.
PhysicalPart: a tangible thing.
  Customer service records refer to physical cars.
  Occurrences form a collection of trees.
CatalogPart: a description of a similar group of things.
  Design documents describe car models.
  Occurrences form a directed acyclic graph.
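Because catalog parts form a directed acyclic graph, the same component can appear under several assemblies. The sketch below rolls up quantities through such a graph; the dictionary layout, the sample data, and the function exploded_quantity are all hypothetical, and the sketch assumes the containment data is acyclic.

```python
# assembly -> {component: quantity}; "bolt" is shared by two assemblies,
# which makes the structure a directed acyclic graph rather than a tree.
contains = {
    "car": {"wheel": 4, "engine": 1},
    "wheel": {"bolt": 5},
    "engine": {"bolt": 20},
}


def exploded_quantity(contains, assembly, part):
    """Total count of part inside one assembly, across all levels."""
    total = 0
    for component, quantity in contains.get(assembly, {}).items():
        if component == part:
            total += quantity
        # Multiply nested counts by how many of this component are used.
        total += quantity * exploded_quantity(contains, component, part)
    return total
```

For the sample data, a car contains 4 wheels with 5 bolts each plus an engine with 20 bolts, so exploded_quantity(contains, "car", "bolt") returns 40.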
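[Diagram: CatalogPart (modelNumber) with a Contains relationship (quantity, role) between assembly and component; PhysicalPart (serialNumber[0..1]) with a Contains relationship between assembly (0..1) and component; a Describes relationship between CatalogPart and PhysicalPart.]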
Additional Archetypes
Address: a means for communicating with an actor.
Asset: something of value.
Contract: an agreement for the supply of products.
Course: a series of lessons about a subject.
Customer: someone involved in the purchase of products.
Document: a physical or electronic representation of a body of information.
Event: an occurrence at some point in time.
Flight: the travel by an airplane between airports.
Item: a part or a service.
Location: a physical place in space.
Section 9: Pattern Literature
Jim Arlow and Ila Neustadt. Enterprise Patterns and MDA: Building Better Software with Archetype Patterns and UML. Boston: Addison-Wesley, 2004.
  Their archetype models are large and more like seed models. Small archetype models are more likely to be application independent and reusable.
  They distinguish between client and supplier. This is a modeling error: the distinction is completely unnecessary, given that they have roles.
[Diagram: Party, PartyRole, PartyRoleType, and PartyRelationship, with client and supplier roles between PartyRole and PartyRelationship.]
The book focuses on design and programming. Data modeling notation: UML class model.