Building A Unified Data Infrastructure O'Reilly Ebook PDF
Building a
Unified Data
Infrastructure
Access, Govern, and Share All
Data with Greater Consistency
and Control
Alice LaPlante
REPORT
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Building a Unified
Data Infrastructure, the cover image, and related trade dress are trademarks of
O’Reilly Media, Inc.
The views expressed in this work are those of the author, and do not represent the
publisher’s views. While the publisher and the author have used good faith efforts to
ensure that the information and instructions contained in this work are accurate, the
publisher and the author disclaim all responsibility for errors or omissions, including
without limitation responsibility for damages resulting from the use of or reliance
on this work. Use of the information and instructions contained in this work is
at your own risk. If any code samples or other technology this work contains or
describes is subject to open source licenses or the intellectual property rights of others,
it is your responsibility to ensure that your use thereof complies with such licenses
and/or rights.
This work is part of a collaboration between O’Reilly and TIBCO Software, Inc. See
our statement of editorial independence.
978-1-492-06322-3
[LSI]
Building a Unified
Data Infrastructure
• 71.7% of firms report that they have yet to forge a data culture.
• 69.0% of firms report that they have not created a data-driven
organization.
• 53.1% of firms state they are not yet treating data as a business
asset.
• 52.4% of firms declare they are not competing on data and
analytics.
Companies are gathering and storing large volumes of data, but
according to Prevedere’s 2019 Executive Survey, the biggest obstacle
to becoming data-driven is that they lack the right software tools to
collect and synthesize this data.
Capturing value from data requires excellence in your operating
model: the people, processes, and systems used to manage your
data. For systems, this means having access to the right data
infrastructure.
Data infrastructure is a digital infrastructure that enables seamless
sharing and consumption of data. Similar to other types of
infrastructures, it provides the structure needed for an organization to
operate in a data-centric economy.
This report aims to help chief data officers, enterprise architects,
and line-of-business (LOB) executives learn the importance of taking
a unified and holistic approach to data infrastructure. It highlights
the ways in which business objectives are met by the following
solutions:
The report discusses these solutions, explains why you need them,
and explores the benefits of combining them. It also shares best
practices on how to build a unified data infrastructure using all
these technologies.
Each of these areas has its own unique data requirements and
challenges, yet you somehow have to build a data infrastructure that
fulfills the needs of all of them.
Operations
The operations domain includes anything that involves the running
of your business. This encompasses everything from ordering raw
Analytics
The analytics data context includes anything that involves using data
to guide business decisions. This encompasses everything
from factory supervisors deciding how many widgets to
manufacture next month, to marketing executives figuring out how to
price them, to CEOs determining optimal markets in which to
expand distribution.
These particular uses of data raise specific data infrastructure
requirements and challenges. First, the data silo problem is the same
as it is with operations. Whether for business intelligence and
reporting, self-service analytics, or data science experiments, getting
access to all the data needed to perform a particular analysis is
frequently a struggle. The problem, again, is that the necessary data
resides in multiple systems.
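
As a minimal sketch of the silo problem described above, suppose customer details live in one system and orders in another. The system names, fields, and values below are hypothetical and are not taken from the report; the point is only that even a simple question such as "revenue by region" requires data from both sources.

```python
# Hypothetical records from two separate systems (all names and values are illustrative).
crm_customers = {
    "C001": {"name": "Acme Corp", "region": "West"},
    "C002": {"name": "Globex", "region": "East"},
}
erp_orders = [
    {"customer_id": "C001", "amount": 1200.0},
    {"customer_id": "C001", "amount": 300.0},
    {"customer_id": "C003", "amount": 450.0},  # no matching CRM record
]


def revenue_by_region(customers, orders):
    """Sum order amounts per region; orders without a CRM match go to 'unknown'."""
    totals = {}
    for order in orders:
        customer = customers.get(order["customer_id"])
        region = customer["region"] if customer else "unknown"
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals


report = revenue_by_region(crm_customers, erp_orders)
print(report)
```

The "unknown" bucket shows what tends to happen when the two systems disagree about which customers exist: the analysis still runs, but part of the revenue cannot be attributed.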