
An Introduction To Data Catalogs The Future of Data Management


Introduction to Data Catalogs

Research Sponsored by

By Dave Wells

This publication may not be reproduced or distributed without Eckerson Group’s prior permission.
INTRODUCTION TO DATA CATALOGS 2

About the Author

Dave Wells is a Senior Analyst leading the Data Management Practice at Eckerson Group, a business intelligence and analytics research and consulting organization. He brings a unique perspective to data management based on five decades of working with data in both technical and business roles. Dave works at the intersection of information management and business management, where real value is derived from data assets. He is an industry analyst, consultant, and educator dedicated to building meaningful and enduring connections throughout the path from data to business value. Knowledge sharing and skills development are Dave’s passions, carried out through consulting, speaking, teaching, and writing. He is a continuous learner – fascinated with understanding how we think – and a student and practitioner of systems thinking, critical thinking, design thinking, divergent thinking, and innovation.

Table of Contents

Introduction
Chapter 1: What is a Data Catalog?
    Starting the Data Cataloging Journey
    Data Catalog Defined
    Why Do We Need a Data Catalog?
    What Does a Data Catalog Do?
Chapter 2: What is Data Curation?
    More Than Shared Databases
    What is Curation?
    What is Data Curation?
    Who Are the Data Curators?
    What About Data Stewards?
Chapter 3: Data Catalogs and Metadata Management
    Single Source for Shared Metadata
    A New Approach to Metadata Management
    Metadata in the Age of Self-Service
    Metadata and the Data Catalog
Chapter 4: Collaboration and Crowdsourcing
    Participative and Collaborative Data Management
    Why Collaboration and Crowdsourcing: A Macro View
    Why Collaboration and Crowdsourcing: An In-The-Trenches View
Chapter 5: Driving Data Catalog Adoption
    Getting People Involved
    Understanding the Adoption Challenges
Closing Thoughts
About Eckerson Group

© Eckerson Group 2019 www.eckerson.com



Introduction
The difficulties of data management have intensified at a steady pace over the past several years. The management complexities of big data, cloud hosting, self-service analytics, and data science can’t be ignored. Effective data management has become a top priority for most organizations, but getting there is challenging. A data catalog has an essential role in overcoming these challenges.

Data catalogs were introduced to help data analysts find and understand data. Before data catalogs, most data analysts worked blind, without visibility into existing data sets, their contents, or their quality and usefulness. As a result, analysts spent much of their time finding data, understanding data, and recreating data sets that already existed. Data catalogs were designed to address these issues.

From modest beginnings as a means to manage data inventory and expose data sets to analysts, the data catalog has grown in functionality, popularity, and importance. Modern data catalogs still meet the needs of data analysts, but have expanded their reach. They are now central to data stewardship, data curation, and data governance. Data catalogs touch nearly everyone who works with data.

Success with data cataloging begins with fundamental knowledge of data catalog basics. You’ll need to understand the what and why of data cataloging, the role and purpose of data curation, how data catalogs are a game-changer for metadata management, and the importance of collaboration and crowdsourcing. Ultimately, you’ll need to plan for and drive data catalog adoption: getting all data stakeholders to participate in curation and cataloging processes and practices.


Chapter 1: What is a Data Catalog?


The What and Why of Data Cataloging


Starting the Data Cataloging Journey

Data catalogs have quickly become a core component of modern data management. Organizations with successful data catalog implementations see remarkable changes in the speed and quality of data analysis, and in the engagement and enthusiasm of people who need to perform data analysis. By contrast, organizations without a data catalog often have these questions: What is a data catalog? Why do we need a data catalog? What does a data catalog do? These are all good questions and a logical place to start your data cataloging journey.

Data Catalog Defined

A Data Catalog is a collection of metadata, combined with data management and search tools, that helps analysts and other data users to find the data that they need, serves as an inventory of available data, and provides information to evaluate the fitness of data for intended uses.

This brief definition makes several points about data catalogs (data management, searching, data inventory, and data evaluation), but all depend on the central capability to provide a collection of metadata.

Data catalogs have become the standard for metadata management in the age of big data and self-service analytics. The metadata that we need today is more expansive than metadata in the BI era. A data catalog focuses first on datasets (the inventory of available data) and connects those datasets with rich information to inform people who work with data. Figure 1 illustrates the typical metadata subjects contained in a data catalog.

Datasets are the files and tables that data workers need to find and access. They may reside in a data lake, warehouse, master data repository, or any other shared data resource. People metadata describes those who work with data: consumers, curators, stewards, subject matter experts, etc. Search metadata supports tagging and keywords to help people to find data. Processing metadata describes transformations and derivations that are applied as data is managed through its lifecycle. Supplier metadata is especially important for data acquired from external sources, informing about sources and subscription or licensing constraints. We’ll look more closely at catalog metadata in Chapter 3: Data Catalogs and Metadata Management.

Figure 1. Data Catalog Metadata Subjects
[Figure 1 shows five metadata subjects: datasets, people, searching, processing, and suppliers.]

Why Do We Need a Data Catalog?

The data management benefits of a data catalog become apparent by reflecting on the value of metadata and the capabilities that are created with comprehensive metadata. The greatest value, however, is often seen in the impact on analysis activities. We work in an age of self-service analytics. IT organizations can’t provide all of the data needed by the ever-increasing numbers of people who analyze data. But today’s business and data analysts are often working blind, without visibility into the datasets that exist, the
contents of those datasets, and the quality and usefulness of each. They spend too much time finding and understanding data, often recreating datasets that already exist. They frequently work with inadequate datasets, resulting in inadequate and incorrect analysis. Figure 2 illustrates how analysis processes change when analysts work with a data catalog.

Figure 2. Analysis Without and With a Data Catalog
[Without Data Catalog: starting from available documentation and tribal knowledge, the analyst must find the data, get the data, evaluate the data (try it), understand the data, prepare the data, analyze the data, and share the analysis, looping back on “not a fit” and “need more data.” Result: trial & error, waste & rework.]
[With Data Catalog: find the data, evaluate the data, get the data, understand the data, prepare the data, analyze the data, and share the analysis, looping back only on “need more data.” Result: speed, efficiency, confidence.]

Without a catalog, analysts look for data by sorting through documentation, talking to colleagues, relying on tribal knowledge, or simply working with familiar datasets because they know about them. The process is fraught with trial and error, waste and rework, and repeated dataset searching that often leads to working with “close enough” data as time is running out. With a data catalog the analyst is able to search and find data quickly, see all of the available datasets, evaluate and make informed choices for which data to use, and perform data preparation and analysis efficiently and with confidence. It is common to shift from 80% of time spent finding data and only 20% on analysis to 20% finding and preparing data with 80% for analysis. Quality of analysis is substantially improved and organizational analysis capacity increases without adding more analysts.

What Does a Data Catalog Do?

A data catalog includes many features and functions that all depend on the core capability of cataloging data: collecting the metadata that identifies and describes the inventory of shareable data. It is impractical to attempt cataloging as a manual effort. Automated discovery of datasets, both for the initial catalog build and for ongoing discovery of new datasets, is essential. Use of AI and machine learning for metadata collection, semantic inference, and tagging is important to get maximum value from automation and minimize manual effort.

With robust metadata as the core of the data catalog, many other features and functions are supported, the most essential including:
• Dataset Searching: Robust search capabilities include search by facets, keywords, and business terms. Natural language search capabilities are especially valuable for non-technical users. Ranking of search results by relevance and by frequency of use is a particularly useful feature.
• Dataset Evaluation: Choosing the right datasets depends on the ability to evaluate their suitability for an analysis use case without needing to download or acquire data first. Important evaluation features include capabilities to preview a dataset, view data profiles, see user ratings, read user reviews and curator annotations, and view data quality information.
• Data Access: The path from search to evaluation and then to data access should be a seamless user experience with the catalog knowing access protocols and providing access directly or interoperating with access technologies. Data access functions include access protections for security, privacy, and compliance sensitive data.

A robust data catalog provides many other capabilities including support for data curation and collaborative data management, data usage tracking, intelligent dataset recommendations, and a variety of data governance features.
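
The dataset searching feature described in this chapter can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how any particular catalog product works: the `CatalogEntry` fields, the `search` function, and the sample datasets are all hypothetical. It filters by facets, scores relevance as the number of keywords matched, and breaks ties by frequency of use.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; the field names are illustrative only."""
    name: str
    description: str
    tags: set[str] = field(default_factory=set)            # search metadata
    facets: dict[str, str] = field(default_factory=dict)   # e.g. {"domain": "customer"}
    use_count: int = 0                                      # usage-tracking metadata


def search(entries, keywords, facets=None):
    """Return entries matching every facet, ranked by keyword hits, then by use."""
    facets = facets or {}
    wanted = {k.lower() for k in keywords}
    results = []
    for entry in entries:
        # Facet filter: every requested facet must match exactly.
        if any(entry.facets.get(k) != v for k, v in facets.items()):
            continue
        # Relevance: distinct keywords found in the name, description, or tags.
        words = f"{entry.name} {entry.description}".lower().split()
        searchable = set(words) | {t.lower() for t in entry.tags}
        relevance = len(wanted & searchable)
        if relevance:
            results.append((relevance, entry.use_count, entry))
    # Sort by relevance first, then by frequency of use, both descending.
    results.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return [entry for _, _, entry in results]


# Made-up sample catalog for demonstration.
catalog = [
    CatalogEntry("customer_orders", "orders placed by each customer",
                 {"sales"}, {"domain": "customer"}, use_count=40),
    CatalogEntry("customer_master", "customer reference data",
                 {"mdm"}, {"domain": "customer"}, use_count=120),
    CatalogEntry("product_orders", "orders by product",
                 {"sales"}, {"domain": "product"}, use_count=300),
]
hits = search(catalog, ["customer", "orders"], facets={"domain": "customer"})
print([e.name for e in hits])
```

Relevance is ranked ahead of popularity here so that a heavily used but poorly matching dataset does not crowd out a better fit. A real catalog would likely weigh the two differently and add business-term and natural language matching.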


Chapter 2: What is Data Curation?


Managed Data Sharing


More Than Shared Databases

Data curation is a term that has recently become a common part of data management vocabulary. Data curation is important in today’s world of data sharing and self-service analytics, but I think it is a frequently misused term. When speaking and consulting I often hear people refer to data in their data lakes and data warehouses as curated data, believing that it is curated because it is stored as shareable data. Curating data involves much more than storing data in a shared database.

What is Curation?

Let’s set data aside for a moment and consider the meaning and the activities of curating. The word “curated” is used frequently today. The traditional use of the word is associated with collections of artifacts in a museum and works of art in a gallery. More recently we’ve started to use the term to describe managed collections of many kinds such as curated content at a website, curated music and videos available through streaming services, and curated apps through download services. Wired.com has described Apple’s App Store as “curated computing.”

Curation is the work of organizing and managing a collection of things to meet the needs and interests of a specific group of people. Collecting things is only the beginning. Organizing and managing are the critical elements of curation: making things easy to find, understand, and access.

What is Data Curation?

If “curated” describes collections of things that are selected and managed to meet the needs of a specific group, then “curated data” is a collection of datasets that is selected and managed to meet the needs and interests of a specific group of people. Note that the focus here is datasets – files, tables, etc. – that can be accessed and analyzed. The distinction between “collections of data” and “collections of datasets” is subtle but significant.

Data curation, then, is the work of organizing and managing a collection of datasets to meet the needs and interests of a specific group of people. Collecting datasets is only the beginning. That is what we do when we store data in data warehouses or data lakes. But organizing and managing are the essence of data curation. Making datasets easy to find, understand, and access is the purpose of data curation, a purpose that demands well-described datasets. Data curation is a metadata management activity and data catalogs are essential data curation technology.

Who Are the Data Curators?

A typical organization has many people doing data curation work (see figure 3) with varying degrees of responsibility and time commitment. Everyone who works with data has the opportunity to curate by sharing their knowledge and experiences. Crowdsourcing of tribal knowledge is an important part of curation practice. Collaborative data management is a necessity in the self-service world and knowledge sharing is the first step in creating collaborative culture. Curation collaborators will be large in number with a modest level of responsibility and time commitment.

Domain curators have subject expertise in specific data domains such as customer, product, finance, etc. Domain curators record and share data domain knowledge that helps data analysts to understand the nature of the data that they work with. The number of domain curators is substantially smaller than the number of collaborative curators, with a greater level of responsibility and time commitment.


Figure 3. Curators Throughout the Organization
[Figure 3 plots three curator groups by number of people against level of responsibility. Collaborative Curators: crowdsourcing of tribal knowledge; reviews and ratings; everyone who works with data can curate. Domain Curators: recording and sharing data domain knowledge; customer, product, etc. Lead Curators: moderating the catalog; ensuring metadata quality.]

Most organizations will have one or very few lead curators who are responsible for moderating data catalog contents much as wiki moderators manage content. Lead curators have a high level of responsibility for metadata and catalog quality – responsibilities that require substantial time commitment.

What About Data Stewards?

I am frequently asked about the differences between data curators and data stewards: Are they two names for the same role? Can data stewards be your data curators? Why do we need both stewards and curators? These are good questions that are important when considering how to fit data curation into your organization. It is practical for the same individual to have both curation and stewardship responsibilities, especially at the level of domain curators. It is important, however, to recognize curation and stewardship as distinctly different roles, each with a unique perspective about managing data. Some of the key differences are shown in the table below.

          DATA STEWARD                              DATA CURATOR
Focus     entities and attributes                   categories and analysis variables
          relationships                             data collections
          databases                                 datasets
          data elements                             data pipelines and lineage
          business data requirements                analytic data requirements
Goals     priorities and data roadmap               finding data when needed
          data-to-business alignment                data-to-value alignment
          shaping data management policies          tracking data usage practices
          data quality improvement                  data quality evaluation
          security and privacy policy compliance    security and privacy monitoring
          data names and definitions                descriptions, profiles, annotations
          business and technical metadata           search and select metadata
          data-to-system mapping                    data-to-user tracking

The roles of data steward and data curator are related and somewhat overlapping. Stewards and curators working together is a combination that maximizes the value of data across all use cases from enterprise reporting to analytics and data science. Stewardship and curation are both metadata management activities and
data governance roles. Data curation and data cataloging are important elements of modern data governance. They are complementary disciplines that are both essential in the age of self-service analytics.

Ultimately, data curation is a metadata management activity, and data cataloging is metadata management technology. But both approach metadata very differently from metadata management practices of the past. Chapter 3: Data Catalogs and Metadata Management looks at metadata management in greater depth.


Chapter 3: Data Catalogs and Metadata Management


Knowledge Sharing for Data Management


Single Source for Shared Metadata

Recall that we previously defined a data catalog as “a collection of metadata, combined with data management and search tools, that helps analysts and other data users to find the data that they need, serves as an inventory of available data, and provides information to evaluate the fitness of data for intended uses.” Although accurate, this definition overlooks one very important point: The data catalog serves as a resource of shared metadata. Everyone who has knowledge about data can share it through the catalog, and anyone seeking knowledge about data can find it in the catalog.

From modest beginnings as a means to manage data inventory and expose data sets to analysts, the data catalog has grown in functionality, popularity, and importance. Modern data catalogs, originated to help data analysts find and evaluate data, continue to meet the needs of analysts, but they have expanded their reach. They are now central to data analysis, data stewardship, data curation, and data governance, all metadata dependent activities.

A New Approach to Metadata Management

It seems that everyone wants data management but most want to avoid metadata management. The distaste for metadata management is an artifact of past metadata approaches with disparate metadata collected by a variety of tools using proprietary formats and without integration. Metadata management in the BI era was painful, but we can’t avoid the reality that metadata is essential to data management. Just as you need data about finances for effective financial management, you need data about data (metadata) for effective data management. You can’t manage data without metadata.

As data management becomes more complex with data lakes, big data, self-service analytics, and data science, the role of metadata changes and the importance of metadata increases exponentially. Metadata that is current, accurate, and readily accessible is an imperative. Metadata disparity is not workable and metadata management as an afterthought is hazardous. We must actively manage metadata, and a data catalog is the right tool for the job. The data catalog has become the new gold standard for metadata and a cornerstone of data curation.

Metadata in the Age of Self-Service

The real value of metadata is found in the answers it can provide. People who depend on data have questions about trustworthiness, latency, lineage, sensitivity, preparation, and much more. Sometimes they want to find others who know or have worked with the data to get human perspective. And they need to know about access, privacy and security constraints, cost, etc. Robust metadata ranging from data set names and properties to usage, access, licensing, and subject experts is the key to answering the many questions that data users and data managers will ask.

In today’s self-service world, metadata is essential for three distinct groups of data management stakeholders:
• Data consumers need metadata to help them find data for reporting, analysis, and data science work, and to evaluate that data to ensure that they work with the right datasets.
• Data curators need metadata to observe data usage, understand the needs and interests of data consumers, and effectively manage the collection of shared data.
• Data governors (owners and stewards) need metadata to identify and protect sensitive data, trace data lineage, and establish trust in data.

Metadata and the Data Catalog

Metadata is the core of a data catalog. Every catalog collects data about the data inventory and also about processes, people, and platforms related to data. Metadata tools of the past collected business, process, and technical metadata, and data catalogs continue that practice. But data catalogs do much more. They collect metadata about datasets, metadata about processing, metadata for searching, and metadata for and about people. Figure 4 shows a logical data model that represents typical metadata content of a data catalog.

Figure 4. Metadata in a Catalog
[Figure 4 is a logical data model relating entities such as license, supplier, process, transformation, review, annotation, dataset, data element, analysis variable, business metric, database, business term, search term, business fact, person, steward, business SME, platform, service provider, and key / index. Color legend: blue – data about datasets; black – data about processing; green – data for searching; red – data by and about people.]

Data catalogs change the game and elevate best practices for metadata management with:
• Automated metadata discovery. Organizations with massive data holdings (literally tens of thousands of databases) simply don’t know about all of the data they have. It is impossible to catalog a petabyte data estate without automated discovery.
• Crowdsourced metadata. Much of catalog metadata is collected automatically by applying algorithms and machine learning. But sometimes the most valuable metadata is the knowledge and experiences of individuals and groups. Collecting that knowledge as user ratings, reviews, tips, and techniques enriches the metadata collection and converts tribal knowledge into a shared and enduring data management resource.
• Data about people. Data management and data analysis are ultimately human activities. Knowing which people have data roles and relationships and the nature of those roles is valuable. Data catalogs capture metadata to identify data users, data creators, data stewards, and data subject matter experts.

Automated metadata discovery is an important part of data cataloging. But much of the metadata in a data catalog is a result of crowdsourcing and collaboration. That is the subject of the next chapter.
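
To make the three kinds of catalog metadata in this chapter concrete (automatically discovered, crowdsourced, and about people), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Person`, `Review`, and `DatasetRecord` classes, the `discover_csv_datasets` function, and the choice to treat CSV files in a directory as the "data estate." Real catalogs discover far more sources and infer far richer metadata.

```python
import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Person:
    """Data about people (hypothetical shape)."""
    name: str
    role: str  # e.g. "consumer", "curator", "steward", "SME"


@dataclass
class Review:
    """Crowdsourced metadata contributed by a person (hypothetical shape)."""
    author: Person
    rating: int  # 1 (poor) to 5 (excellent)
    comment: str


@dataclass
class DatasetRecord:
    """Data about a dataset, filled in by automated discovery."""
    path: str
    columns: list[str]
    row_count: int
    reviews: list[Review] = field(default_factory=list)


def discover_csv_datasets(root: Path) -> list[DatasetRecord]:
    """Scan a directory tree and collect basic metadata for each CSV file."""
    records = []
    for csv_path in sorted(root.rglob("*.csv")):
        with csv_path.open(newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])            # first row is assumed to be the header
            row_count = sum(1 for _ in reader)   # remaining rows are data
        records.append(DatasetRecord(str(csv_path), header, row_count))
    return records


# Demonstration against a throwaway directory with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "customers.csv"
    sample.write_text("id,name,region\n1,Ada,EMEA\n2,Lin,APAC\n")
    catalog = discover_csv_datasets(Path(tmp))

# Crowdsourced metadata: someone who has used the dataset adds a review.
analyst = Person("Pat", "consumer")
catalog[0].reviews.append(Review(analyst, 4, "Region codes are reliable."))
```

The point of the split is that the machine fills in what it can observe (location, columns, row counts) while people supply what it cannot (whether the data is trustworthy for a given use).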


Chapter 4: Collaboration and Crowdsourcing


People and Culture in Data Cataloging


Participative and Collaborative Data Management

A core element of business today is the desire to become a data-driven organization. Most organizations aspire to that goal and many of them struggle. The key to data-driven success and maturity is data culture, and strong data culture begins with participation. The difficulty of getting people at all levels, from chief data officer to self-service data consumer, to actively participate in data management activities is a barrier to building a strong and healthy data culture. A data catalog can be the catalyst that helps to break through the barrier with collaboration and crowdsourcing.

Why Collaboration and Crowdsourcing: A Macro View

Collaboration is central to data-driven culture, creating an environment where no data stakeholders work in isolation, and where working together and sharing knowledge and experience is the norm. A robust and full-featured data catalog encourages collaboration and crowdsourcing with capabilities such as ratings, reviews, annotations, and deprecations. This is the human side of data cataloging that breaks down organizational silos and fosters a culture of sharing: knowledge sharing, data sharing, process sharing (data preparation), and analysis sharing. (See figure 5.) The data catalog becomes the centerpiece connecting people, data, and use cases in a way that improves both speed and quality of analysis.

Actively sharing knowledge, data, and experiences elevates data literacy and competencies of everyone involved. Working together exposes every individual to new information and different perspectives, often generating new ideas and sometimes sparking innovation.

Figure 5. Sharing via the Data Catalog
[Figure 5 places the data catalog (dataset searching, data understanding, collaboration, data curation, metadata) between data sources (data lake, data warehouse, MDM / RDM, SaaS applications, ERP systems, legacy systems), data consumers (data scientist, data engineer, data analyst, report writer), and data use cases (data science, business analytics, business intelligence, reporting), connected by knowledge sharing, data sharing, data prep sharing, and analysis sharing.]

Why Collaboration and Crowdsourcing: An In-The-Trenches View

Everyone with a role in data management and everyone with data knowledge has the opportunity and responsibility to collaborate in the processes and activities that make a data catalog valuable and informative. Data consumers, data curators, and data governors must all participate to create a culture of data sharing, metadata sharing, and knowledge sharing.

Analysis and Reporting: Finding the right data for a self-service reporting or analysis project is typically a difficult and time-consuming task filled
with unanswered questions. Users of data have questions about quality, trustworthiness, latency, lineage, and more. Sometimes they want to find others who know or have worked with the data to get a human perspective. Through collaboration the network of people willing to share their data knowledge rapidly expands. The effect is amplified with a data catalog that identifies data stewards, data coaches, data subject matter experts, and frequent users of datasets.

Data Curation: Data curation, as previously described, is the work of organizing and managing a collection of datasets to meet the needs and interests of those who work with data. Recall the earlier description of three levels of data curators: lead, domain, and collaborative. Collaborative curators are the largest group, sharing and formalizing tribal knowledge and posting reviews and ratings to share their experiences when working with data. Crowdsourcing of tribal knowledge enriches catalog metadata and elevates the user experience for everyone who works with data. Crowdsourced knowledge from people who have worked with the data, consumer reviews, and usage tracking metadata help to evaluate and select the best-fit datasets for each unique analysis and reporting use case. Collaboration within and among the three levels of curators is an effective way to supercharge the richness and value of catalog metadata.

Data Governance: Adoption of self-service analytics has challenged conventional data governance practices. The top-down, command-and-control governance techniques of the past are at odds with the agility and autonomy interests of the self-service community. In the self-service world, collaborative data governance is an emerging and important practice. We must govern with the belief that most people want to do the right thing. The primary role of governance is to help them to know what is the right thing. Participation and collaboration are essential to fulfilling that role. (See figure 6.)

Figure 6. Rethinking Data Governance
[Old Style Command-and-Control Data Governance: define policies, monitor compliance, enforce policies. Modern Collaborative Data Governance: facilitate policy definition; prevention, intervention, enforcement; foster policy compliance.]

The data catalog is a core component of collaborative data governance. It provides a single point of reference for everyone who works with data. Everyone from chief data officers to self-service consumers sees the same metadata, and all have the opportunity to share their knowledge, experiences, and perspectives about data. Crowdsourced, participative data governance is a natural fit for self-service organizations.
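
As a small illustration of how crowdsourced reviews might feed dataset evaluation, the sketch below ranks datasets by a smoothed average rating. The smoothing (pulling the average toward a neutral prior when a dataset has few reviews) is my own choice of technique for the example, not something prescribed above, and the dataset names, ratings, and the `prior_mean` and `prior_weight` values are all made up.

```python
def smoothed_rating(ratings, prior_mean=3.0, prior_weight=5):
    """Average rating, pulled toward prior_mean when there are few reviews."""
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))


# Hypothetical review data: dataset name -> list of 1-5 ratings.
reviews = {
    "sales_2019_raw": [5],                      # one enthusiastic review
    "sales_curated": [4, 5, 4, 4, 5, 4, 4, 5],  # many consistent reviews
    "sales_legacy": [],                         # no reviews yet
}

# Highest smoothed rating first.
ranked = sorted(reviews, key=lambda name: smoothed_rating(reviews[name]), reverse=True)
print(ranked)
```

With these numbers the well-reviewed `sales_curated` ranks above `sales_2019_raw`, even though the latter has a perfect raw average, because a single review carries little weight. That is the behavior you usually want when many people contribute a few ratings each.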


Chapter 5: Driving Data Catalog Adoption


Engagement and Participation in Data Cataloging


Getting People Involved

Chapter 4: Collaboration and Crowdsourcing discussed the importance of participation by all data stakeholders as a key to getting maximum value from your data catalog. Many organizations, however, find data catalog adoption (getting people to participate) to be among the biggest challenges to data catalog success. Adoption is challenging, but understanding the causes of resistance and developing an adoption plan help to overcome those challenges.

Understanding the Adoption Challenges

When planning for implementation, the human and cultural dimensions of data cataloging are often overlooked or subordinated to the process and technology dimensions. A typical data catalog implementation process begins by defining the business and technical case, proceeds through technology selection and installation, then moves on to data discovery and populating the metadata catalog. (See figure 7.) This build-it-and-they-will-come approach fails to engage people in actively using the catalog.

[Figure 7. Data Catalog Implementation: a sequence of steps running from the business and technical case through tool selection, catalog configuration, dataset discovery and cataloging, and metadata creation, enrichment, and maintenance, ending with "use the catalog."]

The final step, use the catalog, often doesn't happen at the level expected for a variety of reasons. (See figure 8.) Predominant among those reasons:

• New Methods as a Barrier: It is human nature to be anchored by "the way that we've always done it." The shift to new ways of doing things pushes people away from the familiar and comfortable. Self-service data consumers may resist the data catalog and continue to rely on personal networks and tribal knowledge because it is what they know how to do. Using the data catalog requires them to learn new things, which can seem time-consuming and disruptive for busy people.

• Culture Shift: Data cataloging is most successful in a culture of data sharing, knowledge sharing, and collaboration. Behaviors such as a "my data" mentality, territorialism, and knowledge hoarding are signs of an unhealthy culture that is a barrier to becoming a data-driven organization. A healthy data culture encourages collaboration and sharing, and discourages the unhealthy behaviors. Participation at all levels is a key element of data culture. Leadership visibly invests in data management and in growing data literacy throughout the organization. Staff are encouraged and incentivized to access and analyze data, and to share both their knowledge about working with data and the insights that they derive from it.

• Data Literacy: Many line-of-business people have responsibilities that depend on data analysis but have not been trained to work with data. The skills to get from data to useful information (data selection, data understanding, data preparation, data analysis, data visualization, and data storytelling) are not native and natural for them. Their tendency is to do just enough data work to get by, and to do that work primarily in Excel spreadsheets. The data catalog and access to abundant data often feel more like a hazard than an opportunity to these people.

• Motivation: Changing how you work and learning to use the data catalog can seem intimidating, time-consuming, or simply outside the comfort zone. Most people will resist change until they see how it benefits them personally. What's-in-it-for-me (WIIFM) is a typical response, especially when people are asked to do new things such as participate in metadata crowdsourcing and post ratings and reviews of datasets. WIIFM is a major influence in resistance to data sharing, resistance to knowledge sharing, reluctance to participate in collaborative curation, and reluctance to post ratings and reviews.

[Figure 8. Barriers to Data Catalog Adoption: the implementation steps from figure 7, annotated "This is the easy part," "This is a bit more difficult," and "This is the hard part!", with four barriers listed:
New Methods: breaking away from "the way we've always done it"
Culture Shift: knowledge sharing and collaboration
Data Literacy: the skills to get from data to useful information
Motivation: using the catalog isn't hard, but motivating people may be difficult]

Closing Thoughts
Data catalogs are positioned to be an enduring part of the future of data management. They fill critical roles for data analysis, data curation, data governance, and data science. Effective use of a data catalog increases the effectiveness of, and the value derived from, all of the other tools in your data and analytics technology stack. Data preparation, data analysis, and data science tools all see marked ROI increases when coupled with data cataloging. To realize the benefits of data cataloging, begin with the business and technical case: know the what and why of data cataloging. Then put data curation practices into action to manage metadata, and encourage collaboration and crowdsourcing to enrich the metadata. Systematically and incrementally expand the reach of the data catalog, ultimately extending to all data consumers and stakeholders. With this approach to data cataloging you'll experience real business impact through increased capacity for data analysis, accelerated analysis, and improved quality and reliability of analysis results.

About Eckerson Group

Wayne Eckerson, a globally known author, speaker, and advisor, formed Eckerson Group to provide data-driven leaders a cocoon of support during every step of their journey toward data and analytics excellence.

Today, Eckerson Group has three main divisions:

• Eckerson Research publishes insights so you and your team can stay abreast of the latest tools, techniques, and technologies in the field.
• Eckerson Consulting provides strategy, design, and implementation assistance to meet your organization’s current and future needs.
• Eckerson Education keeps your data analytics team current on the latest developments in the field through three- and six-hour workshops and public seminars.

Unlike other firms, Eckerson Group focuses solely on data analytics. Our veteran practitioners each have more than 25 years of experience in the field. They specialize in every facet of data analytics, from data architecture and data governance to business intelligence and artificial intelligence. Their primary mission is to share their hard-won lessons with you.

Our clients say we are hard-working, insightful, and humble. We take the compliment! It all stems from our love of data and desire to serve. We see ourselves as a family of continuous learners, interpreting the world of data for you and others.

Accelerate your data journey. Put an expert on your side. Learn what Eckerson Group can do for you!


About Alation
Alation, the data catalog company, is building a data-fluent world by changing
the way people find, understand, trust, use and reuse data. The first to bring
a data catalog to market, Alation combines machine learning and human
collaboration to bring confidence to data-driven decisions. More than 150
organizations, including eBay, Exelon, Munich Re and Pfizer, leverage the Alation
Data Catalog. Headquartered in Silicon Valley, Alation is funded by Costanoa
Ventures, DCVC (Data Collective), Harmony Partners, Icon Ventures, Salesforce
Ventures, and Sapphire Ventures.
For more information, visit alation.com.
