Datamesh Ebook
Fall 2021
Introduction
Data Mesh is an emerging hot topic for enterprise software that puts focus on new ways of thinking about data. Data Mesh aims to improve business outcomes of data-centric solutions, as well as to drive adoption of modern data architectures.
From the business point of view, Data Mesh introduces new ideas around ‘data product thinking’ and how it can help to drive a more cross-functional approach to business domain modeling and creating high-value data products.
From the technology side, there are three important and new focus areas for data-driven architecture:
1. distributed, decentralized data architecture that helps organizations move away from monolithic architectures
2. event-driven data ledgers for enterprise data in motion
3. streaming-centric pipelines to replace legacy batch type tooling, handle real-time events, and provide more timely analytics
Oracle’s focus on the Data Mesh has been in providing a platform that can address these emerging technology requirements, including tools for data products, decentralized event-driven architectures, and streaming patterns for data in motion.
Investing in a Data Mesh can yield impressive benefits, including:
• total clarity into data’s value chain, through applied ‘data product thinking’ best practices
• >99.999% operational data availability 1, using microservices based data pipelines for data consolidation and data migrations
• 10x faster innovation cycles 2, shifting away from ETL, to continuous transformation and loading (CTL)
• ~70% reduction in data engineering 3, gains in CI/CD, no-code and self-serve data pipeline tooling, and agile development
Read on for a look at some impressive case studies and positive results from early adopters of this approach.
Beware the hype…
Since Data Mesh is a rising hot topic and still in the early days of maturity,
there may be some marketing content that uses the words “data mesh” but
the described solutions do not actually fit the core approach.
Decentralized architecture
• an architecture built for decentralized data, services and clouds
Event-driven data ledgers
• designed to handle events of all types, formats and complexity
Streaming-centric data pipelines
• stream processing by default, centralized batch processing by exception
Self-service, governed platform
• built to empower developers and directly connect data consumers to data producers
• security, validation, provenance and explainability built-in
“By integrating real-time operational data and analytics, companies can make better operational and strategic decisions.” 11
Chapter 1
Data Availability
Integration Data Pipelines
Page 7 - Enterprise Data Mesh: Solutions, Use Cases and Case Studies
Data Mesh Use Case
APPLICATION MODERNIZATION
Looking beyond ‘lift and shift’ migrations of monoliths to the cloud, many organizations also seek to retire their monolithic applications of the past and move towards a more modern microservices application architecture for the future.
• Sub-domain offloading of DB transactions
• Bi-directional transaction replication for phased migrations
[Figure: Application Monoliths and Application Microservices (services)]
Data Mesh Use Case
DATA AVAILABILITY & CONTINUITY
Business-critical applications require very high KPIs and SLAs
around resiliency and continuity. Regardless of whether these
applications are monolithic, microservices or something in
between, they can’t go down!
Data Mesh Use Case
EVENT SOURCING & TRANSACTION OUTBOX
A modern ‘service mesh’ style platform uses events for data interchange. Rather than depending on batch processing in the data tier, data payloads flow continuously when events happen in the application or data store.
A Data Mesh can supply the foundation tech for microservices centric data interchange. For example:
Microservices patterns like Event Sourcing, CQRS, and Transaction Outbox 12 are commonly understood solutions – a Data Mesh provides the tooling and frameworks to make these patterns repeatable and reliable at scale.
Figure 2: generic pattern for Transaction Outbox (note: there are Data Mesh variations/optimizations for this pattern)
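The generic Transaction Outbox pattern can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module only to keep the example self-contained. The `orders` and `outbox` table names, the JSON payload shape, and the `publish` callback are hypothetical choices made for illustration, not part of any product described here.

```python
import json
import sqlite3


def init_db(conn):
    """Create a business table and an outbox table (both names are illustrative)."""
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn, order_id, item, qty):
    """Write the business row and its event in one local transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO orders (id, item, qty) VALUES (?, ?, ?)",
            (order_id, item, qty),
        )
        event = {"type": "OrderPlaced", "order_id": order_id, "item": item, "qty": qty}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_outbox(conn, publish):
    """Forward unpublished events in insertion order, then mark each as sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # hypothetical hand-off to a broker or ledger
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because the event is marked as published only after `publish` returns, a crash between the two steps would resend the event on the next relay run. In this sketch, consumers are therefore assumed to tolerate duplicates (at-least-once delivery).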
Data Mesh Use Case
EVENT-DRIVEN INTEGRATION
Beyond microservice design patterns, the need for enterprise integration extends to other IT systems such as DBs, business processes, applications and physical devices of all types. A Data Mesh provides the foundation for integrating data in motion.
Data Mesh Use Case
STREAMING INGEST (FOR ANALYTICS)
Analytic data stores may include data marts, data warehouses, OLAP cubes, data lakes and data lake house technologies.
Generally speaking, there are only two ways to bring data into these analytic data stores:
1. Batch / Micro-batch loading – on a time scheduler
Data Mesh Use Case
STREAMING DATA PIPELINES
Once ingested into the analytic data stores, there is usually a need for ‘data pipelines’ to prepare and transform the data across different data stages or data zones. This is a process of data refinement often needed for the downstream analytic data products.
A Data Mesh can provide an independently governed data pipeline layer that works with the analytic data stores, providing the following core services:
• Self-service data discovery and data preparation
• Governance of data resources across domains
• Data transformation into required data product formats
  • Eg; streaming ETL
• Data verification, by policy, to assure consistency
These data pipelines should be able to work across different physical data stores (such as marts, warehouses, lakes etc) or as a “pushdown data stream” within analytic data platforms that
Figure 1: a data mesh can create, execute and govern streaming pipelines within a Data Lake
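As a rough illustration of two of the core services listed above (streaming transformation and data verification by policy), the following sketch processes events one at a time with a Python generator. The event fields (`device`, `temp_f`), the output format, and the policy functions are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]
Policy = Callable[[Event], bool]


def verify(event: Event, policies: List[Policy]) -> bool:
    """Return True only if the event satisfies every policy rule."""
    return all(rule(event) for rule in policies)


def streaming_etl(raw_events: Iterable[Event], policies: List[Policy]) -> Iterator[Event]:
    """Transform raw events into a prepared format, one record at a time.

    Events that fail verification are skipped here; a real pipeline would
    more likely route them to a quarantine zone for inspection.
    """
    for event in raw_events:
        if not verify(event, policies):
            continue
        yield {
            "device": str(event["device"]).upper(),
            # Convert Fahrenheit to Celsius as a stand-in for any transformation.
            "temp_c": round((float(event["temp_f"]) - 32) * 5 / 9, 1),
        }


if __name__ == "__main__":
    policies = [lambda e: "device" in e, lambda e: "temp_f" in e]
    raw = [{"device": "a1", "temp_f": 212}, {"device": "b2"}]
    for record in streaming_etl(raw, policies):
        print(record)
```

Because the pipeline is a generator, each record is emitted as soon as its source event arrives, rather than waiting for a batch to complete.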
Data Mesh Use Case
STREAMING ANALYTICS
Events are continuously happening. The analysis of events in a
stream can be crucial for understanding what is happening from
moment to moment.
DATA MESH USE CASES APPLY TO
OPERATIONAL & ANALYTIC SYSTEMS
BENEFIT FROM A DATA MESH ON POINT-PROJECTS…
(Operational & Analytic use cases)
[Figure: Application Monoliths, Application Microservices, Consumer Interfaces]
…ACHIEVE MAXIMUM VALUE BY OPERATING A
COMMON MESH ACROSS THE WHOLE DATA ESTATE
(a real-time mesh for both Operational & Analytic data)
[Figure: Consumer Interfaces on the DATA MESH: Data Visualization, Data / Event Services, SQL Access, Notebooks / ML]
Chapter 2
1 DATA PRODUCT THINKING
different from commonplace solutions that have already been around for decades.
Data Mesh Attribute
1a.) DATA PRODUCTS
Products of any kind, from raw commodities to items at your local
store are produced as assets of value, intended to be consumed
and with a specific ‘job to be done.’
Data Mesh Attribute
1b.) CROSS-FUNCTIONAL DATA DOMAINS
The ‘wicked problem’ is often in aligning different cross-functional teams to common data domains – domains that require
shared data sets, data models, business policies and business rules.
Data Mesh Attribute
2.) DECENTRALIZED DATA ARCHITECTURE
Decentralized IT systems are a modern reality, and with the rise of SaaS applications and public cloud infrastructure (IaaS), decentralization of applications and data is here to stay.
Application software architectures are shifting away from centralized monoliths and towards distributed microservices (a service mesh).
Data architecture will follow the same trend towards decentralization, with data becoming more distributed across a wider variety of physical sites and across many networks. We call this a Data Mesh.
Distributed software is hard. Just as nobody does microservices architecture because it is easy, nobody should try Data Mesh believing it is simple. There are many good reasons and many benefits to having modular, decentralized data, but a monolithic and centralized data architecture is often simpler.
When the business benefits from decentralized data, Data Mesh patterns can keep the solution manageable.
Figure annotations:
• decentralization may be across physical sites, cloud networks, applications or edge gateways
• data zones may reside in different physical data stores (obj store, databases, etc)
• data consumers might consume data products from any site/zone in the mesh
Data Mesh Attribute
2a.) MESH
The word ‘mesh’ means something specific – in tech, it is a particular kind of network topology setup so that a large group of non-hierarchical nodes can collaboratively work together.
[Callout: defining pattern of a mesh is non-hierarchical, collaborative network]
Some common tech examples include:
• WiFi Mesh – many nodes working together for better coverage
• ZWave/Zigbee – low-energy smart home device networks
• 5G Mesh – more reliable and resilient cell connections
• Starlink – satellite broadband mesh at global scale
• Service Mesh – a way to provide unified controls over
decentralized microservices (application software)
Data Mesh is aligned to these mesh concepts, and provides a
decentralized way of distributing data across virtual/physical networks
and across vast distances.
Legacy data integration monoliths (such as ETL tools, data federation
tools etc.) and even more recent public cloud services (such as AWS
Glue) require highly centralized infrastructure.
A complete Data Mesh solution should be capable of operating in a
multi-cloud framework, potentially spanning from on-premises, multiple
public clouds, and even to the edge networks.
Data Mesh Attribute
2b.) DISTRIBUTED SECURITY
In a world where data is highly distributed and decentralized, the role of
information security is paramount. Unlike highly centralized monoliths,
distributed systems must delegate out the activities necessary to
authenticate and authorize various users to different levels of access.
Securely delegating trust across networks is hard to do well.
Some considerations include:
• Encryption at rest – as data/events are written to storage
• Distributed authentication – for services and data stores
• Eg; mTLS, Certificates, SSO, Secret stores and data vaults
• Encryption in motion – as data/events are flowing in-memory
• Identity management – LDAP/IAM type services, cross-platform
• Distributed authorizations – for service end-points to redact data
  • For example: Open Policy Agent (OPA) 15 sidecar to place Policy Decision Point (PDP) within the container/K8S cluster where the microservice end point is processing. LDAP/IAM may be any JWT capable service.
• Deterministic masking – to reliably and consistently obfuscate PII data
Security within any IT system can be difficult, and it is even more difficult to provide high security within distributed systems. However, these are
Figure 1: distributed authorizations using OPA sidecar in microservices
Figure 2: distributed mTLS authentication using secure certificates
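One of the considerations listed above, deterministic masking, can be sketched with a keyed hash: the same input and key always produce the same token, so masked values still join consistently across distributed data stores. This is only one possible approach, shown with Python's standard hmac and hashlib modules. The function name, the normalization step, and the 16-character token length are assumptions made for illustration.

```python
import hashlib
import hmac


def mask_pii(value: str, secret_key: bytes, length: int = 16) -> str:
    """Return a stable, non-reversible token for a PII value.

    The value is normalized (trimmed, lower-cased) before hashing so that
    trivially different spellings of the same value map to the same token.
    That normalization is an assumption of this sketch, not a requirement.
    """
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    key = b"example-only-key"  # in practice this would come from a secret store
    print(mask_pii("alice@example.com", key))
    print(mask_pii("Alice@Example.com ", key))  # same token as the line above
```

Using a secret key rather than a plain hash makes it harder to reverse tokens by hashing guessed values. Truncating the digest shortens tokens at the cost of a higher collision probability.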
Data Mesh Attribute
3.) EVENT-DRIVEN DATA LEDGERS
Ledgers are a fundamental component of making a distributed data architecture function. Just as with an accounting ledger, a data ledger records the transactions as they happen.
When we distribute the ledger, the data events become ‘replayable’ in any location. Some ledgers are a bit like an airplane flight recorder, used for high availability and disaster recovery.
Unlike centralized and monolithic data stores, distributed ledgers are purpose-built to keep track of atomic events and/or transactions that happen in other (external) systems.
A Data Mesh is not just one single kind of ledger, and can make use of different types of event-driven data ledgers, depending on the use cases and requirements:
• General Purpose Event Ledger – such as Kafka or Pulsar
• Data Event Ledger – distributed CDC/Replication tools
• Messaging Middleware – including ESB, MQ, JMS, and AQ
• Blockchain Ledger – for secure, multi-party transactions
Together, these ledgers can act as a sort of durable event log for the whole enterprise…providing a running list of data events happening on systems of record and systems of analytics.
General Purpose Event Ledger
• optimized for high volumes
• simple payload semantics
• pub/sub interfaces
Data Event Ledger
• optimized for DB transactions
• ACID level Tx semantics
• point-to-point / point-to-broker
Messaging Ledgers
• optimized for guaranteed Tx’s
• transaction processing system semantics
• pub/sub interfaces, transient payloads
Blockchain Ledger
• optimized for multi-party transparency
• immutable transaction semantics
• API based interfaces (differs by type)
*included for completeness, but not discussed in depth
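The idea that ledger events are ‘replayable’ can be shown with a small in-memory sketch: an append-only log hands out offsets, and any consumer can rebuild its own state by replaying from a chosen offset. The `EventLedger` class and the account/amount event shape are hypothetical. Real ledgers such as Kafka add durability, partitioning and distribution that this example leaves out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple


@dataclass
class EventLedger:
    """A toy append-only ledger: events are never updated or deleted."""

    _log: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, event: Dict[str, Any]) -> int:
        """Store a copy of the event and return its offset in the log."""
        self._log.append(dict(event))
        return len(self._log) - 1

    def replay(self, from_offset: int = 0) -> Iterator[Tuple[int, Dict[str, Any]]]:
        """Yield (offset, event) pairs in the order they were recorded."""
        for offset in range(from_offset, len(self._log)):
            yield offset, dict(self._log[offset])


def rebuild_balances(ledger: EventLedger) -> Dict[str, int]:
    """Derive current state purely by replaying the ledger from the start."""
    balances: Dict[str, int] = {}
    for _, event in ledger.replay():
        balances[event["account"]] = balances.get(event["account"], 0) + event["amount"]
    return balances
```

Since state is derived only from the log, a second site that receives the same events in the same order arrives at the same balances. This is the property that makes such ledgers useful for high availability and disaster recovery.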
Data Mesh Attribute
4.) POLYGLOT DATA STREAMS
Data streams may vary by event types, payloads and different transaction semantics. A Data Mesh should support the necessary stream types for a variety of enterprise data workloads.
Simple Events:
• Base64 / JSON – raw, schemaless events
• Raw Telemetry etc. – sparse events
Basic App Logging / IoT Events:
• JSON / Protobuf – may have schema
• MQTT etc. – IoT specific protocols
Application Business Process Events:
• SOAP/REST Events – XML/XSD, JSON etc.
• B2B etc. – exchange protocols & standards
Data Events / Transactions:
• Logical Change Records – LCR, SCN, URID etc.
Example payloads by stream type:
Telemetry Events (devices & things) – simple, flat & record at a time
    T3VyIG1pc3Npb24gaXMgdG8gaGVscCBwZW9wbGUgc2VlIGRhdGEgaW4gbmV3IHdh
    eXMsIGRpc2NvdmVyIGluc2lnaHRzLCB1bmxvY2sgZW5kbGVzcyBwb3NzaWJpbGl0
    aWVzLg==
App/Process Events (biz process & logging) – record at a time, records have simple schema
    syntax = "proto3";
    package moviecatalog;
    message MovieItem {
      string name = 1;
      double price = 2;
      bool inStock = 3;
    }
App/Process Events – may be deeply nested, complex schemas
    <?xml version="1.0" encoding="utf-8"?>
    <Root xmlns="http://www.acme.com">
      <Customers>
        <Customer CustomerID="GREAL">
          <ContactName>Howard</ContactName>
          <ContactTitle>Manager</ContactTitle>
Data Events (ACID transactions) – follows DB log / transaction boundaries
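To make the ‘polyglot’ idea concrete, the sketch below normalizes three of the payload styles listed above (Base64, JSON and XML) into plain Python dictionaries using only the standard library. The `decode_event` function, its `kind` labels, and the flat XML handling are assumptions made for illustration. Protobuf and change-record formats are omitted because they need schema-specific tooling.

```python
import base64
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict

# The Base64 telemetry payload shown in this section, joined into one string.
SAMPLE_TELEMETRY = (
    "T3VyIG1pc3Npb24gaXMgdG8gaGVscCBw"
    "ZW9wbGUgc2VlIGRhdGEgaW4gbmV3IHdh"
    "eXMsIGRpc2NvdmVyIGluc2lnaHRzLCB1"
    "bmxvY2sgZW5kbGVzcyBwb3NzaWJpbGl0"
    "aWVzLg=="
)


def decode_event(kind: str, payload: str) -> Dict[str, Any]:
    """Decode one event into a dictionary, dispatching on a declared kind."""
    if kind == "base64":
        # Schemaless event: the only structure is the decoded text itself.
        return {"text": base64.b64decode(payload).decode("utf-8")}
    if kind == "json":
        return json.loads(payload)
    if kind == "xml":
        # Flat handling only: maps each direct child tag to its text.
        root = ET.fromstring(payload)
        return {child.tag: child.text for child in root}
    raise ValueError(f"unsupported event kind: {kind}")


if __name__ == "__main__":
    print(decode_event("base64", SAMPLE_TELEMETRY)["text"])
    print(decode_event("json", '{"name": "Up", "price": 9.5, "inStock": true}'))
```

Deeply nested or namespaced XML, like the truncated example above, would need recursive handling and namespace-aware tag names, which this flat version does not attempt.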
Data Mesh Attribute
4a.) STREAM DATA PROCESSING
Stream processing is how data is manipulated within an event stream. Unlike ‘lambda functions’ the stream processor maintains statefulness of data flows within a particular time window.
[Figure: (1) systems of record produce raw data events; (2) data processing in one or more data pipelines / streams; (3) data loaded to data services, ledgers, storage or DWs]
Basic Data Filtering:
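The statement that a stream processor keeps state within a time window can be illustrated with a minimal tumbling-window counter. This is a sketch under simplifying assumptions: events arrive as `(timestamp_seconds, key)` tuples in timestamp order, and a window is emitted as soon as an event from a later window arrives. The function name and event shape are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def tumbling_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Count events per key in fixed, non-overlapping time windows.

    State (the counts for the open window) is carried between events,
    which is what distinguishes this from a stateless per-event function.
    """
    current_start = None
    counts: Dict[str, int] = {}
    for timestamp, key in events:
        start = (timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            # An event from a later window closes the open window.
            yield current_start, counts
            counts = {}
        current_start = start
        counts[key] = counts.get(key, 0) + 1
    if current_start is not None:
        yield current_start, counts  # flush the final open window


if __name__ == "__main__":
    stream = [(0, "a"), (3, "a"), (7, "b"), (12, "a"), (25, "b")]
    for window_start, window_counts in tumbling_counts(stream, 10):
        print(window_start, window_counts)
```

Out-of-order or late events are not handled here. Production stream processors typically add watermarks or allowed-lateness settings for that purpose.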
Data Mesh
PATTERN ARCHETYPE
[Figure: pattern archetype. Labels: Operational Systems of Record & Microservices; OLTP; SaaS / App Events & Sys Logs; IoT Events, Edge; MOM / IPaaS; General Purpose Event Ledger; Database Event Ledger; Stream Processing & Analytics; Multi-model Database/s; Data Lake (house); Data Warehouse; Data / Event Services; Systems of Analysis/Engagement; seconds; milliseconds]
Governance – Security (distributed), Data Verification, Data Catalog, Registry, Policies
[Figure: deployment examples. Labels: OCI Data Catalog & GoldenGate Stream Analytics; Glue/Azure Catalog & GoldenGate Stream Analytics; Oracle Cloud Infrastructure Compute or Container Services; Enterprise Applications; Stream Processing; Ingest; Athena; Exadata Cloud@Customer; Data Lake House; RDS; Redshift]
Chapter 3
Data Mesh
CASE STUDY CRITERIA
There is no single ‘perfect’ example of a Data Mesh. Other software development and data architecture patterns, or technology categories exist and there remains substantial overlap among the most common concepts like Data Fabrics, Microservices Service Mesh, and Data Lake Houses.
For this document, we are considering Data Mesh as a type of Data Fabric. Case Studies should have ‘significant solution focus’ using technology with the following attributes:
• Data Products – driving cultural and process changes that affect cross-organizational data domains, and institutionalize strong management practices around data assets
• Distributed Architecture – decentralized, microservices-based software architecture patterns
• Event Driven Ledgers – durable running log of events to drive cross-domain integrations
• ACID Support – for polyglot streams, empowering correct and trusted data transactions
People, Process and Methods:
Data Product Focus
Technical Architecture Attributes:
Distributed Architecture
Event Driven Ledgers
ACID Support
Stream Oriented
[Figure: related categories: Data Mesh, Data Integration, Meta-Catalog, Microservices, Messaging, Data Lake, Distributed DW]
Case Study
INTUIT – DATA PRODUCT THINKING
Intuit has been an early proponent and leader in applying data product thinking to their
enterprise data estate. 16 Cross-organizational alignment means that data products include
people stakeholders, business processes, data pipelines, and well-defined APIs for consumption.
Different domains may have internal or external data consumers, and data can take the
form/shape required by the end consumer (eg; data lake tables vs. event bus topics, etc).
People, Process and Methods:
Distributed Architecture
Event Driven Ledgers
ACID Support
Stream Oriented
Analytic Data Focus
Operational Data Focus adjacent/GG
Physical & Logical Mesh
GoldenGate Usage data event ledger, ingest to cloud and event bus
[Figure: conceptual solution framework; materialized example…in analytic data lake]
Case Study
NETFLIX – APPLICATION MODERNIZATION
Netflix has frequently been at the cutting edge of new IT innovation and investment in data mesh
is no different. Before the rise of popularity of the term, Netflix was already using a data mesh
approach to perform online migration of operational apps (to the cloud), to avoid any outages
that would affect customers. 17
After an exhaustive review, Netflix chose an approach with several key data mesh attributes including a distributed architecture and event-based data ledgers. The target architecture was a modern microservices based application, and the real-time migration approach enabled a phased cutover approach to new platforms (infrastructure & DBs) without any downtime.
This is a good example of an operationally focused data mesh for a point-project.
People, Process and Methods:
Data Product Focus yes – adjacent 18
Technical Architecture Attributes:
Distributed Architecture
Event Driven Ledgers
ACID Support
Stream Oriented n/a
Analytic Data Focus n/a
[Figure: On Prem Platform (Monolith) and Cloud Data Platform]
Distributed Architecture distributed across multiple sites
Event Driven Ledgers data event ledger (real-time events)
ACID Support
Stream Oriented n/a
Analytic Data Focus yes - adjacent
[Figure: microservices deployments in a service mesh; Event Ledger]
Case Study
PAYPAL – MICROSERVICES PATTERNS
PayPal uses a modern microservices application architecture (distributed), and needed fast, 100%
correct transactions moved asynchronously among services. They used a data mesh approach 21
with event-driven data ledgers to de-centralize transactions with zero data loss or corruption.
They considered alternatives like ‘event sourcing’ and ‘multi-phase commits’ but could not guarantee zero data loss. The event-driven data ledgers from GoldenGate provided the trust, correctness, and performance needed for a distributed data mesh.
People, Process and Methods:
Data Product Focus
Technical Architecture Attributes:
Distributed Architecture
Event Driven Ledgers
ACID Support
Stream Oriented yes - adjacent
Analytic Data Focus yes - adjacent
Case Study
WESTERN DIGITAL – INTEGRATION
Western Digital has been continuously investing in digital transformation goals for several years,
including the shift towards cloud-first and data-driven business practices. As a part of that
journey, they make extensive use of event-driven integration tech from Oracle – including
Integration Cloud and GoldenGate. 22
These data mesh capabilities provide a distributed, event-driven architecture that helps in both operational and analytic use cases. Operationally, the integration tech is used to modernize the ERP platforms and ultimately reduce operating costs. For analytics, the shift to a real-time cloud business means being able to continuously stream data events from applications into reporting data marts, data warehouses and data lakes.
People, Process and Methods:
Data Product Focus yes - adjacent
Technical Architecture Attributes:
Distributed Architecture
Event Driven Ledgers
ACID Support
Stream Oriented n/a
Operational: drive cost reductions and operating efficiency by consolidating ERP and moving applications to cloud 23
Edge and Analytics: align core operations data to Fast Data and Big Data initiatives – impacting customer systems of engagement as well as data science / AI initiatives 24
Case Study
LINKEDIN – STREAMING INGEST
LinkedIn created and operates one of the world’s largest Apache Kafka implementations, using
the tech for both operational and analytic data events. 25 For 100’s of applications that produce
database events, those (billions per day) data events are captured and ingested to Kafka using
GoldenGate data ledger. 26
A modern distributed data mesh must work with raw data events as they happen. When DBs commit transactions, those data events become the source/provider data from the systems of record (SoR). Downstream stream processing tools (eg; Samza, Flink, GoldenGate Stream Analytics) can then process these data events within milliseconds of their origin.
People, Process and Methods:
Data Product Focus
Technical Architecture Attributes:
Distributed Architecture
Event Driven Ledgers
ACID Support
Stream Oriented
Analytic Data Focus
Operational Data Focus
Physical & Logical Mesh
GoldenGate Use Case data event ledger, stream DB events (DML & DDL) into Apache Kafka
[Figure: Before: data mess! After: GoldenGate for 100% correct data transaction events; Apache Kafka for simple events]
Case Study
SAILGP – STREAMING ANALYTICS
SailGP runs one of the most exciting race venues in the world, with high tech and high-speed sail boats. Live race data and analytics are provided within milliseconds using data mesh tech. 27
Distributed edge technology links race boat, support boat and race helicopter data into streaming pipelines.
Telemetry data is streamed into nearby clouds for real-time ETL, analytics and ingest to cloud data warehouse.
Data mesh tech uses GoldenGate and Kafka (Oracle Streaming). Stream analytics are used in real-time on race day to assist with support crews and broadcast networks.
People, Process and Methods:
Data Product Focus
Technical Architecture Attributes:
Distributed Architecture
Event Driven Ledgers
ACID Support ingest to DW
Stream Oriented
Analytic Data Focus
Operational Data Focus
Physical & Logical Mesh physical data
GoldenGate Use Case stream analytics, real-time event correlation, ETL, analysis and ingest to DW
Data Mesh
COMPARE AND CONTRAST
[Table: ratings from best to worst. Columns: Data Mesh, Data Integration, Meta-Catalog, Microservices, Messaging, Data Lake House, Distributed DW. Rows: Distributed Architecture, ACID Support, Stream Oriented]
Data Mesh
BUSINESS OUTCOMES
Overall Benefits
Faster, Data-Driven Innovation Cycles
Reduce Costs for Mission-Critical Data Operations
Operational Outcomes
Multi-cloud data liquidity
• unlock data capital to flow freely
Real-time data sharing
• Ops-to-Ops & Ops-to-Analytics
Edge, location-based data services
• correlate IRL device/data events
Trusted microservices data interchange
• “event sourcing” with correct data
• DataOps and CI/CD for Data
Uninterrupted continuity
• >99.999% up-time SLAs
• cloud migrations
Analytic Outcomes
Automate and simplify data products
• multi-model data sets
Time series data analysis
• deltas / changed records
• event-by-event fidelity
Eliminate full data copies for ODS’
• log-based ledgers and pipelines
Distributed data lakes & warehouses
• hybrid / multi-cloud / global
• streaming integration / ETL
Predictive Analytics
• Data monetization, new ‘data services’ for sale