DAMA-DMBOK - Ch-1
Data Management
1. INTRODUCTION
Many organizations recognize that their data is a vital enterprise asset.
Data and information can give them insight about their customers,
products, and services. It can help them innovate and reach strategic goals.
Despite that recognition, few organizations actively manage data as an
asset from which they can derive ongoing value (Evans and Price, 2012).
Deriving value from data does not happen in a vacuum or by accident. It
requires intention, planning, coordination, and commitment. It requires
management and leadership.
Data Management is the development, execution, and supervision of
plans, policies, programs, and practices that deliver, control, protect, and
enhance the value of data and information assets throughout their
lifecycles.
A Data Management Professional is any person who works in any facet of
data management (from technical management of data throughout its
lifecycle to ensuring that data is properly utilized and leveraged) to meet
strategic organizational goals. Data management professionals fill
numerous roles, from the highly technical (e.g., database administrators,
network administrators, programmers) to strategic business (e.g., Data
Stewards, Data Strategists, Chief Data Officers).
Data management activities are wide-ranging. They include everything
from the ability to make consistent decisions about how to get strategic
value from data to the technical deployment and performance of
databases. Thus, data management requires both technical and
non-technical (i.e., ‘business’) skills. Responsibility for managing data must be
shared between business and information technology roles, and people in
both areas must be able to collaborate to ensure an organization has high
quality data that meets its strategic needs.
Data and information are not just assets in the sense that organizations
invest in them in order to derive future value. Data and information are
also vital to the day-to-day operations of most organizations. They have
been called the ‘currency’, the ‘life blood’, and even the ‘new oil’ of the
information economy.1 Whether or not an organization gets value from its
analytics, it cannot even transact business without data.
To support the data management professionals who carry out the work,
DAMA International (The Data Management Association) has produced
this book, the second edition of The DAMA Guide to the Data
Management Body of Knowledge (DMBOK2). This edition builds on the
first one, published in 2009, which provided foundational knowledge on
which to build as the profession advanced and matured.
This chapter outlines a set of principles for data management. It discusses
challenges related to following those principles and suggests approaches
for meeting these challenges. The chapter also describes the DAMA Data
Management Framework, which provides the context for the work carried
out by data management professionals within various Data Management
Knowledge Areas.
1.2 Goals
Within an organization, data management goals include:
2. Essential Concepts
2.1 Data
Long-standing definitions of data emphasize its role in representing facts
about the world.2 In relation to information technology, data is also
understood as information that has been stored in digital form (though data
is not limited to information that has been digitized and data management
principles apply to data captured on paper as well as in databases). Still,
because today we can capture so much information electronically, we call
many things ‘data’ that would not have been called ‘data’ in earlier times
– things like names, addresses, birthdates, what one ate for dinner on
Saturday, the most recent book one purchased.
Such facts about individual people can be aggregated, analyzed, and used
to make a profit, improve health, or influence public policy. Moreover, our
technological capacity to measure a wide range of events and activities
(from the repercussions of the Big Bang to our own heartbeats) and to
collect, store, and analyze electronic versions of things that were not
previously thought of as data (videos, pictures, sound recordings,
documents) is close to surpassing our ability to synthesize these data into
usable information.3 To take advantage of the variety of data without
being overwhelmed by its volume and velocity requires reliable,
extensible data management practices.
Most people assume that, because data represents facts, it is a form of
truth about the world and that the facts will fit together. But ‘facts’ are not
always simple or straightforward. Data is a means of representation. It
stands for things other than itself (Chisholm, 2010). Data is both an
interpretation of the objects it represents and an object that must be
interpreted (Sebastian-Coleman, 2013). This is another way of saying that
we need context for data to be meaningful. Context can be thought of as
data’s representational system; such a system includes a common
vocabulary and a set of relationships between components. If we know the
conventions of such a system, then we can interpret the data within it.4
These conventions are often documented in a specific kind of data referred
to as Metadata.
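To make the role of context concrete, here is a minimal Python sketch in which the same raw value means different things depending on the Metadata that describes its field. The field names, definitions, code lists, and the `interpret` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical Metadata: each field carries a definition and a code list.
# None of these names come from the DMBOK; they are illustrative only.
METADATA = {
    "marital_status": {
        "definition": "Marital status of the customer",
        "codes": {"S": "Single", "M": "Married"},
    },
    "shirt_size": {
        "definition": "Shirt size ordered",
        "codes": {"S": "Small", "M": "Medium", "L": "Large"},
    },
}


def interpret(field: str, raw_value: str) -> str:
    """Translate a raw code into its meaning using the field's Metadata."""
    codes = METADATA[field]["codes"]
    if raw_value not in codes:
        raise ValueError(f"{raw_value!r} is not a valid code for {field!r}")
    return codes[raw_value]
```

Here the raw value "M" is interpreted as "Married" in one field and "Medium" in the other. Without the conventions recorded in the Metadata, the value alone cannot be interpreted.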
However, because people often make different choices about how to
represent concepts, they create different ways of representing the same
concepts. From these choices, data takes on different shapes. Think of the
range of ways we have to represent calendar dates, a concept about which
there is an agreed-to definition. Now consider more complex concepts
(such as customer or product), where the granularity and level of detail of
what needs to be represented is not always self-evident, and the process of
representation grows more complex, as does the process of managing that
information over time. (See Chapter 10.)
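The calendar date example can be sketched in a few lines of Python. The sample values and the format assigned to each one are assumptions made for illustration; in practice, the format would come from documentation about the source system.

```python
from datetime import date, datetime

# Hypothetical sample: one calendar date as written by different systems.
# Each value is paired with the format its (assumed) source system uses.
RAW_DATES = [
    ("2017-03-04", "%Y-%m-%d"),    # ISO 8601
    ("03/04/2017", "%m/%d/%Y"),    # month first
    ("04/03/2017", "%d/%m/%Y"),    # day first
    ("4 March 2017", "%d %B %Y"),  # long form (English month names)
    ("20170304", "%Y%m%d"),        # compact form
]


def to_date(raw: str, fmt: str) -> date:
    """Parse a raw string using the format documented for its source."""
    return datetime.strptime(raw, fmt).date()


# All five representations collapse to a single date once each one is
# read with the correct convention.
normalized = {to_date(raw, fmt) for raw, fmt in RAW_DATES}
```

The string "03/04/2017" is the interesting case: read month first it is 4 March, read day first it is 3 April. The concept is agreed upon, but the representation is only interpretable when its convention is known.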
Even within a single organization, there are often multiple ways of
representing the same idea. Hence the need for Data Architecture,
modeling, governance, and stewardship, and Metadata and Data Quality
management, all of which help people understand and use data. Across
organizations, the problem of multiplicity multiplies. Hence the need for
industry-level data standards that can bring more consistency to data.
Organizations have always needed to manage their data, but changes in
technology have expanded the scope of this management need as they
have changed people’s understanding of what data is. These changes have
enabled organizations to use data in new ways to create products, share
information, create knowledge, and improve organizational success. But
the rapid growth of technology and with it human capacity to produce,
capture, and mine data for meaning has intensified the need to manage
data effectively.
As these costs and benefits imply, managing Data Quality is not a
one-time job. Producing high quality data requires planning, commitment, and
a mindset that builds quality into processes and systems. All data
management functions can influence Data Quality, for good or bad, so all
of them must account for it as they execute their work. (See Chapter 13.)
Creation and usage are the most critical points in the data
lifecycle: Data management must be executed with an
understanding of how data is produced, or obtained, as well as
how data is used. It costs money to produce data. Data is
valuable only when it is consumed or applied. (See Chapters 5,
6, 8, 11, and 14.)
While the DAMA Wheel presents the set of Knowledge Areas at a high
level, the Hexagon recognizes components of the structure of Knowledge
Areas, and the Context Diagrams present the detail within each
Knowledge Area. None of the pieces of the existing DAMA Data
Management framework describe the relationship between the different
Knowledge Areas. Efforts to address that question have resulted in
reformulations of the DAMA Framework, which are described in the next
two sections.
3.4 DMBOK Pyramid (Aiken)
If asked, many organizations would say that they want to get the most
out of their data – they are striving for that golden pyramid of advanced
practices (data mining, analytics, etc.). But that pyramid is only the top of
a larger structure, a pinnacle on a foundation. Most organizations do not
have the luxury of defining a data management strategy before they start
having to manage data. Instead, they build toward that capability, most
times under less than optimal conditions.
Peter Aiken’s framework uses the DMBOK functional areas to describe
the situation in which many organizations find themselves. An
organization can use it to define a way forward to a state where they have
reliable data and processes to support strategic business goals. In trying to
reach this goal, many organizations go through a similar logical
progression of steps (See Figure 8):
Aiken’s pyramid draws from the DAMA Wheel, but also informs it by
showing the relation between the Knowledge Areas. They are not all
interchangeable; they have various kinds of interdependencies. The
Pyramid framework has two drivers. First, the idea of building on a
foundation, using components that need to be in the right places to support
each other. Second, the somewhat contradictory idea that these may be put
in place in an arbitrary order.
3.5 DAMA Data Management Framework Evolved
Aiken’s pyramid describes how organizations evolve toward better data
management practices. Another way to look at the DAMA Knowledge
Areas is to explore the dependencies between them. Developed by Sue
Geuens, the framework in Figure 9 recognizes that Business Intelligence
and Analytic functions have dependencies on all other data management
functions. They depend directly on Master Data and data warehouse
solutions. But those, in turn, are dependent on feeding systems and
applications. Reliable Data Quality, data design, and data interoperability
practices are at the foundation of reliable systems and applications. In
addition, data governance, which within this model includes Metadata
Management, data security, Data Architecture and Reference Data
Management, provides a foundation on which all other functions are
dependent.
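The dependencies described in this paragraph can be sketched as a small directed graph. The Python below is an illustrative reading of the text, not a reproduction of Figure 9: the node names paraphrase the paragraph, and the grouping into layers, along with the `DEPENDS_ON` and `all_dependencies` names, is an assumption.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends directly on the functions in its set.
# The layering is an assumed simplification of the paragraph above.
DEPENDS_ON = {
    "BI and Analytics": {"Master Data", "Data Warehouse"},
    "Master Data": {"Systems and Applications"},
    "Data Warehouse": {"Systems and Applications"},
    "Systems and Applications": {
        "Data Quality", "Data Design", "Data Interoperability",
    },
    "Data Quality": {"Data Governance"},
    "Data Design": {"Data Governance"},
    "Data Interoperability": {"Data Governance"},
    "Data Governance": set(),
}


def all_dependencies(function: str) -> set[str]:
    """Collect every function that the given one relies on, directly or not."""
    seen: set[str] = set()
    stack = [function]
    while stack:
        for dep in DEPENDS_ON[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# A topological order lists each function after everything it depends on,
# so the foundation comes first and the most dependent function comes last.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

In this sketch, `order` begins with "Data Governance" and ends with "BI and Analytics", and `all_dependencies("BI and Analytics")` returns every other node, which matches the statement that Business Intelligence and Analytic functions depend on all other data management functions.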
1. Introduction
    Business Drivers
    Goals and Principles
    Essential Concepts
2. Activities
3. Tools
4. Techniques
5. Implementation Guidelines
6. Relation to Data Governance
7. Metrics
Knowledge Areas describe the scope and context of sets of data
management activities. Embedded in the Knowledge Areas are the
fundamental goals and principles of data management. Because data
moves horizontally within organizations, Knowledge Area activities
intersect with each other and with other organizational functions.
Data Handling Ethics describes the central role that data ethics
plays in making informed, socially responsible decisions about
data and its uses. Awareness of the ethics of data collection,
analysis, and use should guide all data management
professionals. (Chapter 2)
Big Data and Data Science describes the technologies and
business processes that emerge as our ability to collect and
analyze large and diverse data sets increases. (Chapter 14)
Data Management Maturity Assessment outlines an approach
to evaluating and improving an organization’s data management
capabilities. (Chapter 15)
Data Management Organization and Role Expectations
provide best practices and considerations for organizing data
management teams and enabling successful data management
practices. (Chapter 16)