
Enterprise Data Mesh

SOLUTIONS, USE CASES & CASE STUDIES

Fall 2021
Introduction
Data Mesh is an emerging hot topic for enterprise software that puts focus on new ways of thinking about data. Data Mesh aims to improve business outcomes of data-centric solutions, as well as to drive adoption of modern data architectures.

From the business point of view, Data Mesh introduces new ideas around 'data product thinking' and how it can help to drive a more cross-functional approach to business domain modeling and creating high-value data products.

From the technology side, there are three important new focus areas for data-driven architecture:

1. distributed, decentralized data architectures that help organizations move away from monolithic architectures
2. event-driven data ledgers for enterprise data in motion
3. streaming-centric pipelines to replace legacy batch-type tooling, handle real-time events, and provide more timely analytics

Oracle's focus on the Data Mesh has been in providing a platform that can address these emerging technology requirements, including tools for data products, decentralized event-driven architectures, and streaming patterns for data in motion.

Investing in a Data Mesh can yield impressive benefits, including:
• total clarity into data's value chain, through applied 'data product thinking' best practices
• >99.999% operational data availability 1, using microservices-based data pipelines for data consolidation and data migrations
• 10x faster innovation cycles 2, shifting away from ETL to continuous transformation and loading (CTL)
• ~70% reduction in data engineering 3, via gains in CI/CD, no-code and self-serve data pipeline tooling, and agile development

Read on for a look at some impressive case studies and positive results from early adopters of this approach.
Beware the hype…
Since Data Mesh is a rising hot topic still in the early days of maturity, there may be some marketing content that uses the words "data mesh" even though the described solutions do not actually fit the core approach.

A proper Data Mesh is a mindset, an organizational model and an enterprise data architecture approach… it should have some mix of data product thinking, decentralized data architecture, event-driven actions and a streaming-centric 'service mesh' style of microservices design.

A Data Mesh is not a…


• Single Cloud Data Lake …even with ‘domains,’ catalogs & SQL access
• Data Catalog / Graph …a data mesh needs a physical implementation
• Point-Product …no vendor has a singular product for Data Mesh
• IT Consulting Project …strategy/tactics still require platforms and tools
• Data Fabric …which is broadly inclusive of monolithic data architectures
• Self-Service Analytics …easy-to-use UX can front a mesh or a monolith
As the popularity of Data Mesh continues to increase, there will be many
bandwagon vendors/consultants, so it’s important to beware of the hype!
Why data mesh?
Because the old ways are not working well. Most business transformation initiatives fail. Most of the time and cost of digital platforms is sunk into 'integration' efforts 4. Monolithic tech architectures of the past are cumbersome, expensive, and inflexible. Additionally:

• 70-80% of digital transformations fail 5
• Distributed architectures are on the rise – app, data and cloud architectures are all becoming less centralized and less monolithic
• Cloud lock-in is real 6, and can become more costly 7
• Data Lakes rarely succeed 8, and are only analytics focused
• Organizational silos exacerbate data sharing issues 9
• Everything is speeding up – the pace of innovation, the speed of IT events, and your competition are all moving faster than ever
• The cost of an operational data outage is rising 10

Data Mesh is no silver bullet, but its principles, practices and technologies have been aligned to focus on solving some of the most pressing, unaddressed modernization objectives for data-driven business initiatives.
Note: an excellent primer on why Data Mesh is needed is Zhamak Dehghani's 2019 paper, "…from Monoliths to Data Mesh" 9
New concept for data
Data Mesh approach:

1. Emphasizes cultural change, as a mindset shift towards thinking of data 'as a product' – which in turn can prompt organizational and process changes to manage data as a tangible, real capital asset of the business.

2. Calls for alignment across operational and analytic data domains. A Data Mesh aims to link data producers directly to data consumers and remove the IT middleman from the processes that ingest, prepare and transform data resources.

3. Technology platform built for 'data in motion' is a key indicator of success – a two-sided platform that links enterprise data producers and consumers. The Data Mesh core is a distributed architecture for on-prem and multi-cloud data.
Defining the data mesh

1.) Outcomes-focused Data Products
Data product thinking – a mindset shift to the data consumer's point of view
• data domain owners responsible for KPIs/SLAs of data products
Alignment for Ops & Analytics – no more 'throwing data over the wall'
• same technology mesh and data domain semantics for all
Data in motion – as a core competency for producing data products
• Remove the 'man in the middle' – by making data events directly accessible from systems of record and providing self-service real-time data pipelines to get the data where needed

2.) Rejects monolithic IT architecture
Decentralized architecture
• an architecture built for decentralized data, services and clouds
Event-driven data ledgers
• designed to handle events of all types, formats and complexity
Streaming-centric data pipelines
• stream processing by default, centralized batch processing by exception
Self-service, governed platform
• built to empower developers and directly connect data consumers to data producers
• security, validation, provenance and explainability built-in
Chapter 1

Seven Data Mesh Use Case Examples

"By integrating real-time operational data and analytics, companies can make better operational and strategic decisions." 11

A successful Data Mesh fulfills use cases for Operational as well as Analytic Data domains. The following seven use cases illustrate the breadth of capabilities that a Data Mesh brings to enterprise data:

• App Modernization
• Data Availability
• Event Sourcing
• Integration
• Streaming Ingest
• Data Pipelines
• Stream Analytics
Page 7 - Enterprise Data Mesh: Solutions, Use Cases and Case Studies
Data Mesh Use Case
APPLICATION MODERNIZATION

Looking beyond 'lift and shift' migrations of monoliths to the cloud, many organizations also seek to retire their monolithic applications of the past and move towards a more modern microservices application architecture for the future.

But legacy app monoliths typically depend on big monolithic databases, raising the question, "how to phase the migration plan to decrease disruption, risks, and costs?"

Figure 1: data mesh foundation for monolith migrations – a bi-directional Transaction Outbox bridges application monoliths and application microservices during the migration period

A Data Mesh can provide an important operational IT capability for customers doing phased transitions from monoliths to mesh service architecture. For example:

• Sub-domain offloading of DB transactions
• Eg; filtering data by 'bounded context'
• Bi-directional transaction replication for phased migrations
• Cross-platform sync (eg; mainframe to DBaaS)

In the lingo of microservices architects, this approach is using a bi-directional Transaction Outbox 12 to enable the Strangler Fig 13 migration pattern, one Bounded Context 14 at a time.

Figure 2: strangler fig pattern for monolith decomposition and phased migrations – over time, the monolith is replaced by a growing set of services
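The sub-domain offloading described above can be sketched in a few lines: change events from the monolith's database are routed to whichever extracted microservice owns that bounded context. This is an illustrative sketch only – the table names and context mappings below are hypothetical, not from any specific migration.

```python
# Sketch of sub-domain offloading during a strangler-fig migration:
# change events from the monolith's database are routed to the
# microservice that owns each bounded context. Table names and
# context mappings are illustrative, not a real schema.

# Hypothetical mapping of monolith tables to bounded contexts.
BOUNDED_CONTEXTS = {
    "orders": {"orders", "order_lines"},
    "inventory": {"stock", "warehouses"},
}

def route_event(change_event):
    """Return the bounded context that should receive a change event,
    or None if the table has not yet been carved out of the monolith."""
    table = change_event["table"]
    for context, tables in BOUNDED_CONTEXTS.items():
        if table in tables:
            return context
    return None

events = [
    {"table": "orders", "op": "INSERT", "row": {"id": 1}},
    {"table": "stock", "op": "UPDATE", "row": {"sku": "A", "qty": 5}},
    {"table": "invoices", "op": "INSERT", "row": {"id": 9}},  # still owned by the monolith
]
routed = {route_event(e) for e in events}
```

As more bounded contexts are extracted, the mapping grows and the `None` cases shrink – that is the strangler-fig progression in miniature.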

Data Mesh Use Case
DATA AVAILABILITY & CONTINUITY

Business-critical applications require very high KPIs and SLAs around resiliency and continuity. Regardless of whether these applications are monolithic, microservices or something in between, they can't go down!

For mission-critical systems a distributed eventual-consistency data model is usually not acceptable. However, these apps must operate across many data centers. This begs the question, "how can I run my Apps across more than one data center while still guaranteeing correct and consistent data?"

A Data Mesh can provide the foundation for decentralized yet 100% correct data across sites. For example:
• Very low latency logical transactions (cross-platform)
• ACID-capable guarantees for correct data
• Multi-active, bi-directional and conflict resolution

Regardless of whether the monoliths are using 'sharded data sets' or the microservices are being set up for cross-site HA, the Data Mesh can provide high-speed, correct data at any distance.

Figure 1: data mesh for geographically distributed data events
Data Mesh Use Case
EVENT SOURCING & TRANSACTION OUTBOX

A modern 'service mesh' style platform uses events for data interchange. Rather than depending on batch processing in the data tier, data payloads flow continuously when events happen in the application or data store.

For some architectures, microservices need to exchange data payloads with each other. Other patterns require interchange between monolithic applications or data stores. This begs the question, "how can I reliably exchange microservice data payloads among my apps and data stores?"

Figure 1: event-based interop among various Apps, Microservices and DBs

A Data Mesh can supply the foundation tech for microservices-centric data interchange. For example:
• Microservice to Microservice (w/in Context)
• Microservice to Microservice (across Contexts)
• Monolith to/from Microservice

Microservices patterns like Event Sourcing, CQRS, and Transaction Outbox 12 are commonly understood solutions – a Data Mesh provides the tooling and frameworks to make these patterns repeatable and reliable at scale.

Figure 2: generic pattern for Transaction Outbox – the app writes to outbox tables; a message relay forwards JSON payloads through a message broker to the receiving service
(note: there are Data Mesh variations/optimizations for this pattern)
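The core of the Transaction Outbox pattern is that the business write and the event write commit in one local database transaction, and a separate relay publishes the events afterward. The following is a minimal sketch using sqlite3; the table names, topic, and in-memory broker are illustrative placeholders.

```python
# Minimal sketch of the Transaction Outbox pattern with sqlite3.
# The service writes the business row and the event row in ONE local
# transaction; a separate relay polls the outbox and hands events to a
# broker. Names here are illustrative, not a prescribed schema.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id, item):
    # Business write and event write commit atomically together, so the
    # event can never be lost (or exist without its order).
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, item))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("orders", json.dumps({"order_id": order_id, "item": item})))

def relay(broker):
    # The message relay: read unpublished events, publish, mark done.
    with db:
        rows = db.execute(
            "SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
        for row_id, topic, payload in rows:
            broker.append((topic, json.loads(payload)))
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))

broker = []
place_order(1, "widget")
relay(broker)
```

Running the relay again publishes nothing new, which is the property that makes the pattern safe to retry after a crash.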

Data Mesh Use Case
EVENT-DRIVEN INTEGRATION

Beyond microservice design patterns, the need for enterprise integration extends to other IT systems such as DBs, business processes, applications and physical devices of all types. A Data Mesh provides the foundation for integrating data in motion.

Data in motion is typically event-driven. A user action, a device event, a process step or a data store commit can all initiate an event with a data payload. These data payloads are crucial for integrating IoT systems, business processes, databases, data warehouses and data lakes.

Figure 1: IoT/device telemetry, application process events from monoliths, and data events from microservice applications all flow through the data mesh to data analytics

A Data Mesh supplies the foundation tech for real-time integration across the enterprise. For example:
• Connecting real-world device events to IT systems
• Integrating business processes across ERP systems
• Aligning operational DBs with analytic data stores

Large organizations will naturally have a mix of old and new systems, monoliths and microservices, operational and analytic data stores – a Data Mesh can help to unify these resources across differing business and data domains.
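One way such mixed sources integrate is by wrapping every event – device telemetry, process step, data store commit – in a common envelope so all consumers can dispatch on the same fields. This is a hedged sketch under assumed field names (`source_type`, `domain`, `ts`, `payload`); they are illustrative, not a standard.

```python
# Sketch: wrapping heterogeneous events (device telemetry, business
# process steps, database commits) in one common envelope so they can
# travel through the same mesh. Field names are illustrative.
import json
import time

def envelope(source_type, domain, payload, ts=None):
    return {
        "source_type": source_type,   # e.g. "device", "process", "data"
        "domain": domain,             # owning data domain
        "ts": ts if ts is not None else time.time(),
        "payload": payload,
    }

device_evt = envelope("device", "plant-floor", {"sensor": "t1", "temp_c": 71.5}, ts=100.0)
db_evt = envelope("data", "orders", {"table": "orders", "op": "INSERT", "id": 42}, ts=101.0)

# Every consumer can now dispatch on the same two fields:
wire = [json.dumps(e) for e in (device_evt, db_evt)]
kinds = [json.loads(w)["source_type"] for w in wire]
```

The envelope is what lets one mesh carry ERP process events and DB change events side by side without per-source integration code.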

Data Mesh Use Case
STREAMING INGEST (FOR ANALYTICS)

Analytic data stores may include data marts, data warehouses, OLAP cubes, data lakes and data lake house technologies.

Generally speaking, there are only two ways to bring data into these analytic data stores:

1. Batch / Micro-batch loading – on a time scheduler
2. Streaming Ingest – continuously loading data events

A Data Mesh provides the foundation tech for a streaming data ingest capability. For example:
• Data events, from databases, data stores etc.
• Device events, from physical device telemetry
• Application events, logging or business events

Ingesting events by stream often reduces the impact on the source systems, improves the fidelity of the data (important for data science) and can empower real-time analytics use cases where valuable to the data product owners.

Figure 1: leveraging a Data Mesh for common data ingest across Data Lakes, Data Warehouses, and Data Marts
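The operational difference between the two ingest styles above comes down to offsets: a streaming consumer drains events continuously and checkpoints its position, so a restart resumes rather than re-running a batch window. A minimal sketch, with all names (`lake`, `checkpoint`) being illustrative stand-ins for a real store and offset registry:

```python
# Sketch of streaming ingest: a consumer continuously drains events
# into an analytic store, checkpointing its offset so a restart resumes
# where it left off instead of re-running a scheduled batch.

def ingest(source_events, lake, checkpoint):
    """Append events after the last checkpointed offset; return new offset."""
    start = checkpoint.get("offset", 0)
    for offset, event in enumerate(source_events):
        if offset < start:
            continue  # already ingested before a restart
        lake.append(event)
        checkpoint["offset"] = offset + 1
    return checkpoint.get("offset", 0)

source = [{"id": i} for i in range(5)]
lake, checkpoint = [], {}
ingest(source[:3], lake, checkpoint)   # first run sees only 3 events
ingest(source, lake, checkpoint)       # later run resumes at offset 3, no duplicates
```

This is also why stream ingest is gentler on source systems: each event is read once, near the time it happens, instead of being re-scanned by a scheduler.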

Data Mesh Use Case
STREAMING DATA PIPELINES

Once ingested into the analytic data stores, there is usually a need for 'data pipelines' to prepare and transform the data across different data stages or data zones. This is a process of data refinement often needed for the downstream analytic data products.

A Data Mesh can provide an independently governed data pipeline layer that works with the analytic data stores, providing the following core services:

• Self-service data discovery and data preparation
• Governance of data resources across domains
• Data transformation into required data product formats
• Eg; streaming ETL
• Data verification, by policy, to assure consistency

Figure 1: a data mesh can create, execute and govern streaming pipelines within a Data Lake – raw data zone, curated data, master data, and prepared data feeding data visualization, data/event services, SQL access, the data warehouse, marts, and notebooks/ML

These data pipelines should be capable of working across different physical data stores (such as marts, warehouses, lakes etc) or as a "pushdown data stream" within analytic data platforms that support streaming data, such as Apache Spark and other data lake house technologies.
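Zone-to-zone refinement is, at its smallest, a record-at-a-time transform with a verification policy. The sketch below promotes raw records to a curated shape (type coercion plus a policy check); the zone names, fields, and rejection rules are illustrative assumptions, not a prescribed pipeline.

```python
# Sketch of a streaming pipeline promoting records between data zones:
# raw events are validated and transformed record-by-record into a
# curated shape. Zone and field names are illustrative.

def curate(raw_record):
    """Raw zone -> curated zone: validate, coerce types, drop rejects."""
    try:
        amount = float(raw_record["amount"])
    except (KeyError, ValueError):
        return None  # fails the data-verification policy
    if amount < 0:
        return None  # fails a business-rule policy
    return {"order_id": str(raw_record["id"]).strip(), "amount": round(amount, 2)}

raw_zone = [
    {"id": " 7 ", "amount": "19.991"},
    {"id": "8", "amount": "oops"},   # rejected by verification
    {"id": "9", "amount": "-1"},     # rejected by policy
]
curated_zone = [r for r in (curate(rec) for rec in raw_zone) if r is not None]
```

In a real mesh the same transform would run continuously against the event stream rather than over an in-memory list, but the per-record logic is unchanged.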

Data Mesh Use Case
STREAMING ANALYTICS

Events are continuously happening. The analysis of events in a stream can be crucial for understanding what is happening from moment to moment.

This kind of time-series based analysis of real-time event streams may be important for real-world IoT device data, but also for understanding what is happening in your IT data centers or across financial transactions (eg; fraud monitoring).

A full-featured Data Mesh will include the foundation capabilities to analyze events of all kinds, across many different types of event time windows. For example:
• Simple event stream analysis (eg; Web events)
• Business activity monitoring (eg; SOAP/REST events)
• Complex event processing (eg; multi-stream correlation)
• Data event analysis (eg; on DB/ACID transactions)

Figure 1: events of all types (IoT, DB, etc) can be analyzed in real-time streams

Like data pipelines, the stream analytics may be capable of running within established data lake house infrastructure, or separately – as native cloud services, for example.
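The simplest event time window is the tumbling window: fixed, non-overlapping buckets of time with an aggregate per bucket. A minimal sketch follows, assuming events arrive as `(timestamp, payload)` pairs; the window size and event shape are illustrative.

```python
# Sketch of time-series stream analytics: bucketing timestamped events
# into tumbling windows and computing a per-window aggregate.
from collections import defaultdict

def tumbling_counts(events, window_seconds):
    """Count events per fixed, non-overlapping time window."""
    windows = defaultdict(int)
    for ts, _payload in events:
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start] += 1
    return dict(windows)

# Six web events with timestamps in seconds:
clicks = [(0, "a"), (3, "b"), (9, "c"), (10, "d"), (19, "e"), (20, "f")]
counts = tumbling_counts(clicks, window_seconds=10)
```

Sliding, session, and custom windows follow the same idea – only the rule assigning an event to its window(s) changes.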

DATA MESH USE CASES APPLY TO OPERATIONAL & ANALYTIC SYSTEMS

Figure: the seven use cases – App Modernization, Data Availability, Event Sourcing, Integration, Streaming Ingest, Data Pipelines, Stream Analytics – span Systems of Record (sources of truth, data providers/producers, core business processes, systems of engagement) and Systems of Analysis (decision support, data science, predictive analytics, data visualization), connected by systems of interchange.
BENEFIT FROM A DATA MESH ON POINT-PROJECTS…
(Operational & Analytic use cases)

Figure: individual use cases (App Modernization, Event Sourcing, Geo-Distributed Data, Integration, Data Availability, Streaming Ingest, Data Pipelines, Stream Analytics) each link sources – application monoliths, application microservices, ODS, edge devices – through the data lake (house), data warehouse, and marts to consumer interfaces such as data visualization, data/event services, SQL access, and notebooks/ML.
…ACHIEVE MAXIMUM VALUE BY OPERATING A COMMON MESH ACROSS THE WHOLE DATA ESTATE
(a real-time mesh for both Operational & Analytic data)

Figure: a single data mesh – with modern service mesh, multi-cloud deployment options – feeds all consumer interfaces: data visualization, data/event services, SQL access, and notebooks/ML.
Chapter 2

Four Key Attributes of a Data Mesh

1. DATA PRODUCT THINKING
2. DECENTRALIZED DATA ARCHITECTURE
3. EVENT-DRIVEN DATA LEDGERS
4. POLYGLOT DATA STREAMING

A Data Mesh should not be just a new buzzword on top of an old tech architecture. As Data Mesh aims to bring unique value, it must have unique attributes that are distinctly different from commonplace solutions that have already been around for decades. These are the four key attributes to be aware of.
Data Mesh Attribute
1.) DATA PRODUCT THINKING

A mindset shift is the most important first step towards a Data Mesh. The willingness to embrace the learned practices of innovation is the springboard towards successful modernization of data architecture.

These learned practice areas include:
• Design Thinking – for solving 'wicked problems'
• Jobs to be Done Theory – customer-focused innovation, and the Outcome-Driven Innovation process

Design Thinking methodologies bring proven techniques that help break down the organizational silos frequently blocking cross-functional innovation. The Jobs to be Done Theory is the critical foundation for designing data products that fulfil specific end-consumer goals, or jobs to be done – it defines the product's purpose.

The data product approach initially emerged from the data science community but is now going mainstream, being applied to all aspects of the data management discipline. It keeps the focus on the business outcomes and the data consumers… rather than the IT tech.

Data product thinking can be applied to other data architectures, but it is an essential part of a data mesh.
Data Mesh Attribute
1a.) DATA PRODUCTS

Products of any kind, from raw commodities to items at your local store, are produced as assets of value, intended to be consumed and with a specific 'job to be done.'

Data products can take a variety of forms, depending on the business domain or problem to be solved, and may include:
• Analytics – historic/real-time reports & dashboards
• Data Sets – data collections in different shapes/formats
• Models – domain objects, data models, ML features
• Algorithms – ML models, scoring, business rules
• Data Services & APIs – docs, payloads, topics, REST APIs…

A data product is created for consumers, requiring tracking of additional attributes such as:
• Stakeholder Map – who creates and consumes this product?
• Packaging, Documentation – how is it consumed?
• Purpose & Value – implicit/explicit value? depreciation?
• Quality, Consistency – KPIs and SLAs of usage?
• Provenance, Lifecycle & Governance – trust & explainability?
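The product-management attributes above can be made machine-readable as a data product descriptor. The sketch below is one possible shape – the field names follow the bullet list, and the concrete product ("customer-churn-scores") and its values are purely illustrative.

```python
# Sketch: the data product attributes listed above captured as a
# machine-readable descriptor. All concrete values are illustrative.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    owner_domain: str                               # accountable data domain owner
    consumers: list = field(default_factory=list)   # stakeholder map
    docs_url: str = ""                              # packaging & documentation
    purpose: str = ""                               # purpose & value
    slas: dict = field(default_factory=dict)        # quality/consistency KPIs & SLAs
    provenance: list = field(default_factory=list)  # lineage, for trust & explainability

churn = DataProduct(
    name="customer-churn-scores",
    owner_domain="marketing",
    consumers=["retention-team"],
    purpose="weekly churn risk per account",
    slas={"freshness_hours": 24},
    provenance=["crm.accounts", "billing.invoices"],
)
```

A descriptor like this is what a data catalog or registry would index so consumers can discover the product and hold its owner to the stated SLAs.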

Data Mesh Attribute
1b.) CROSS-FUNCTIONAL DATA DOMAINS

The 'wicked problem' is often in aligning different cross-functional teams to common data domains – domains that require shared data sets, data models, business policies and business rules.

Figure: data domains (A, B, C) cut across data refinement zones (Zone 1 through Zone 4). Zones are data refinement levels of curation that may span clouds, object store buckets, DB schemas, etc. Domains are business/logical boundaries that may be ontology categories, data catalog tags, DDD bounded contexts, etc. Data products may be sourced from any zone and may exist at different refinement levels (eg; raw, curated, master, etc).
Data Mesh Attribute
2.) DECENTRALIZED DATA ARCHITECTURE

Decentralized IT systems are a modern reality, and with the rise of SaaS applications and public cloud infrastructure (IaaS), decentralization of applications and data is here to stay. Decentralization may be across physical sites, cloud networks, or edge gateways.

Application software architectures are shifting away from centralized monoliths and towards distributed microservices (a service mesh). Data architecture will follow the same trend towards decentralization, with data becoming more distributed across a wider variety of physical sites and across many networks. We call this a Data Mesh.

Distributed software is hard. Just as nobody does microservices architecture because it is easy, nobody should try Data Mesh believing it is simple. There are many good reasons and many benefits to having modular, decentralized data, but a monolithic and centralized data architecture is often simpler. When the business benefits from decentralized data, Data Mesh patterns can keep the solution manageable.

Note: data zones may reside in different physical data stores (obj store, databases, etc), and data consumers might consume data products from any site/zone in the mesh.
Data Mesh Attribute
2a.) MESH

The word 'mesh' means something specific – in tech, it is a particular kind of network topology set up so that a large group of non-hierarchical nodes can collaboratively work together. The defining pattern of a mesh is a non-hierarchical, collaborative network.

Some common tech examples include:
• WiFi Mesh – many nodes working together for better coverage
• ZWave/Zigbee – low-energy smart home device networks
• 5G Mesh – more reliable and resilient cell connections
• Starlink – satellite broadband mesh at global scale
• Service Mesh – a way to provide unified controls over decentralized microservices (application software)

Data Mesh is aligned to these mesh concepts, and provides a decentralized way of distributing data across virtual/physical networks and across vast distances.

Legacy data integration monoliths (such as ETL tools, data federation tools etc.) and even more recent public cloud services (such as AWS Glue) require highly centralized infrastructure. A complete Data Mesh solution should be capable of operating in a multi-cloud framework, potentially spanning on-premises, multiple public clouds, and even edge networks.
Data Mesh Attribute
2b.) DISTRIBUTED SECURITY

In a world where data is highly distributed and decentralized, the role of information security is paramount. Unlike highly centralized monoliths, distributed systems must delegate out the activities necessary to authenticate and authorize various users to different levels of access. Securely delegating trust across networks is hard to do well.

Some considerations include:
• Encryption at rest – as data/events are written to storage
• Distributed authentication – for services and data stores
• Eg; mTLS, Certificates, SSO, Secret stores and data vaults
• Encryption in motion – as data/events are flowing in-memory
• Identity management – LDAP/IAM type services, cross-platform
• Distributed authorizations – for service end-points to redact data
• For example: an Open Policy Agent (OPA) 15 sidecar to place a Policy Decision Point (PDP) within the container/K8S cluster where the microservice end point is processing. LDAP/IAM may be any JWT-capable service.
• Deterministic masking – to reliably and consistently obfuscate PII data

Figure 1: distributed authorizations using an OPA sidecar in microservices
Figure 2: distributed mTLS authentication using secure certificates

Security within any IT system can be difficult, and it is even more difficult to provide high security within distributed systems. However, these are solved problems with known solutions.
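Deterministic masking, mentioned above, is worth a concrete sketch: the same PII value must always map to the same opaque token (so joins across data stores still work), while the token stays irreversible without the key. A common approach is keyed hashing (HMAC); the key below is a placeholder – in practice it would come from a secret store or data vault.

```python
# Sketch of deterministic masking via HMAC: identical PII values yield
# identical tokens (joins keep working), but tokens cannot be reversed
# without the secret key. The hard-coded key is illustrative only; in
# practice the key lives in a secret store / data vault.
import hashlib
import hmac

SECRET_KEY = b"demo-key-from-a-vault"  # illustrative placeholder

def mask(pii_value: str) -> str:
    digest = hmac.new(SECRET_KEY, pii_value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # fixed-width opaque token

a = mask("alice@example.com")
b = mask("alice@example.com")  # same input -> same token, so joins survive masking
c = mask("bob@example.com")
```

Keyed hashing (rather than plain hashing) is what prevents a dictionary attack from reversing the tokens: without the key, recomputing candidate masks is not possible.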

Data Mesh Attribute
3.) EVENT-DRIVEN DATA LEDGERS

Ledgers are a fundamental component of making a distributed data architecture function. Just as with an accounting ledger, a data ledger records the transactions as they happen.

When we distribute the ledger, the data events become 'replayable' in any location. Some ledgers are a bit like an airplane flight recorder, used for high availability and disaster recovery.

Unlike centralized and monolithic data stores, distributed ledgers are purpose-built to keep track of atomic events and/or transactions that happen in other (external) systems.

A Data Mesh is not just one single kind of ledger, and can make use of different types of event-driven data ledgers, depending on the use cases and requirements:

• General Purpose Event Ledger – such as Kafka or Pulsar; optimized for high volumes, with simple payload semantics and pub/sub interfaces
• Data Event Ledger – distributed CDC/Replication tools; optimized for DB transactions, with ACID-level Tx semantics, point-to-point / point-to-broker
• Messaging Middleware – including ESB, MQ, JMS, and AQ; optimized for guaranteed Tx's, with transaction processing system semantics and pub/sub interfaces with transient payloads
• Blockchain Ledger – for secure, multi-party transactions; optimized for multi-party transparency, with immutable transaction semantics and API-based interfaces (included for completeness, but not discussed in depth)

Together, these ledgers can act as a sort of durable event log for the whole enterprise… providing a running list of data events happening on systems of record and systems of analytics.
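The 'replayable' property described above is simple to state in code: the log is append-only, and each consumer reads independently from its own offset. A minimal sketch, loosely modeled on log-style ledgers such as Kafka (the API here is illustrative, not any real client library):

```python
# Sketch of a 'replayable' event ledger: appends are immutable and every
# consumer reads independently from its own offset, so events can be
# replayed in any location after a failure. API is illustrative.

class EventLedger:
    def __init__(self):
        self._log = []  # append-only: records are never updated in place

    def append(self, event):
        self._log.append(event)
        return len(self._log) - 1  # offset of the new record

    def replay(self, from_offset=0):
        """Yield every event at or after the given offset."""
        yield from self._log[from_offset:]

ledger = EventLedger()
for evt in ({"op": "INSERT", "id": 1},
            {"op": "UPDATE", "id": 1},
            {"op": "INSERT", "id": 2}):
    ledger.append(evt)

full_history = list(ledger.replay())       # disaster recovery: rebuild everything
tail = list(ledger.replay(from_offset=2))  # a late consumer catches up from its offset
```

This is the flight-recorder behavior: because nothing is ever overwritten, any replica or downstream store can be reconstructed by replaying from offset zero.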

Data Mesh Attribute
4.) POLYGLOT DATA STREAMS

Data streams may vary by event types, payloads and different transaction semantics; a Data Mesh should support the necessary stream types for a variety of enterprise data workloads.

Simple Events:
• Base64 / JSON – raw, schemaless events
• Raw Telemetry etc. – sparse events
(eg; telemetry events from devices & things: simple, flat, a record at a time)

Basic App Logging / IoT Events:
• JSON / Protobuf – may have schema
• MQTT etc. – IoT-specific protocols
(eg; app/process events from business processes & logging: record at a time, records have a simple schema such as a proto3 message definition)

Application Business Process Events:
• SOAP/REST Events – XML/XSD, JSON etc.
• B2B etc. – exchange protocols & standards
(may be deeply nested, complex schemas, such as XML business documents)

Data Events / Transactions:
• Logical Change Records – LCR, SCN, URID etc.
• Consistent Boundaries – commits vs. operations
(ACID transactions: follow DB log / transaction boundaries)
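A polyglot consumer ends up dispatching on payload type – decoding raw Base64 telemetry one way, schemaless JSON another, and passing structured data events through. A minimal sketch; the `kind` keys and payload shapes are illustrative assumptions, not a wire standard.

```python
# Sketch: one consumer handling polyglot payloads - raw Base64
# telemetry, schemaless JSON app events, and structured data events.
# The dispatch keys and payload shapes are illustrative.
import base64
import json

def decode(event):
    kind, payload = event["kind"], event["payload"]
    if kind == "telemetry.base64":
        return {"raw": base64.b64decode(payload).decode("utf-8")}
    if kind == "app.json":
        return json.loads(payload)
    if kind == "db.change":
        return payload  # already structured: table, op, commit boundary
    raise ValueError(f"unknown payload kind: {kind}")

events = [
    {"kind": "telemetry.base64", "payload": base64.b64encode(b"temp=71").decode()},
    {"kind": "app.json", "payload": '{"user": "howard", "action": "login"}'},
    {"kind": "db.change", "payload": {"table": "orders", "op": "COMMIT"}},
]
decoded = [decode(e) for e in events]
```

In practice the `db.change` branch is where transaction semantics matter: the mesh must preserve commit boundaries rather than treating each operation as an independent event.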

Data Mesh Attribute
4a.) STREAM DATA PROCESSING

Stream processing is how data is manipulated within an event stream. Unlike 'lambda functions,' the stream processor maintains statefulness of data flows within a particular time window.

Basic Data Filtering:
• Thresholds, alerts, telemetry monitoring etc.

Simple ETL:
• RegEx functions, math/logic, concatenation
• Record-by-record, substitutions, masking

CEP & Complex ETL:
• Complex Event Processing (CEP)
• DML (ACID) processing, groups of tuples
• Aggregates, lookups, complex joins etc.

Stream Analytics:
• Time series analytics, custom time windows
• Geospatial, machine learning and embedded AI

Figure: (1) systems of record produce raw data events; (2) data processing happens in one or more pipelines/streams – filter, aggregate, correlate/enrich, thresholds, joins; queries, data patterns, windowing, data policies, business rules; time series, spatial analytics, anomalies, classification, scoring models; (3) data is loaded to data services, enterprise data ledgers, storage or DWs for systems of analysis/engagement.
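The statefulness that separates CEP from a stateless lambda can be shown with a two-event correlation: flag any payment not followed by a matching shipment within a time window. The processor must remember pending payments between events – that memory is its state. Event shapes and the window length below are illustrative.

```python
# Sketch of stateful complex event processing (CEP): correlate two
# event types within a time window - flag payments with no shipment
# inside the window. Event shapes and window length are illustrative.

def unshipped_payments(events, window):
    """Return order ids paid but not shipped within `window` seconds."""
    pending = {}  # order_id -> payment timestamp: the processor's state
    alerts = []
    for ts, kind, order_id in sorted(events):
        # Expire pending payments whose window has elapsed.
        for oid, paid_ts in list(pending.items()):
            if ts - paid_ts > window:
                alerts.append(oid)
                del pending[oid]
        if kind == "payment":
            pending[order_id] = ts
        elif kind == "shipment":
            pending.pop(order_id, None)  # matched within the window
    alerts.extend(pending)  # anything still pending at end of stream
    return alerts

events = [(0, "payment", "A"), (2, "shipment", "A"),
          (5, "payment", "B"), (30, "payment", "C")]
alerts = unshipped_payments(events, window=10)
```

Order "A" ships within the window and is never flagged; "B" expires unshipped and "C" is still pending when the stream ends, so both are flagged. A production engine adds watermarks and out-of-order handling, but the state-per-window idea is the same.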

Data Mesh
PATTERN ARCHETYPE

Figure: operational systems of record – IoT events and edge platforms, SaaS app events and sys logs, multi-model databases, and OLTP databases – feed two ledgers: a general-purpose event ledger (via MOM / IPaaS and stream processing, at seconds latency) and a database event ledger (at milliseconds latency). Stream processing populates the data lake (house), data warehouse, and data/event services for systems of analysis/engagement and analytics. Governance spans the whole mesh – security (distributed), data verification, data catalog, registry, policies – over serverless or service mesh (multi-cloud) deployments.
Data Mesh
CONCRETE EXAMPLES

Oracle Cloud: OCI Streaming, OCI Data Integration, OCI GoldenGate, GoldenGate Stream Analytics, Autonomous Data Warehouse, Analytics Cloud, IoT Cloud, API Platform, OCI Data Science, OCI Data Catalog – on Oracle Cloud Infrastructure Compute or Container Services.

Oracle & Hybrid (multi-cloud): ExaCS & ExaCC, GoldenGate on compute, GoldenGate Stream Analytics (in EMR or ADLS), Kinesis -or- Event Hub, Confluent, AWS SQS etc., EMR, ADLS, Delta Lake etc., Athena, Cosmos etc., Redshift, Synapse, Snowflake etc., PowerBI, Glue/Azure Catalog.

Open Source: MySQL, Postgres etc., WS02 / RabbitMQ, Debezium etc., Apache Spark, Apache Hive, Open Policy Agent, Egeria…

Noteworthy Technology Layers:
• IoT/Edge – for gateways, edge nodes, telemetry collection
• Message-oriented Middleware – for event-driven business process integrations
• Data Events/CDC – DB transaction events, full ACID consistency etc.
• Event Streams – scale-out and partitioned event store
• Stream Processing, Analytics – stateful, windowed complex stream processing
• Security – distributed authentication/authorization across VCNs
• Data Catalog / Registry – semantic alignment of entities, schemas, and registries
• Data Verification – auditable verification of data consistency across data stores
• Serverless / Service Mesh – depending on public cloud or self-operated
Data Mesh
SINGLE CLOUD OR MULTI-CLOUD

Single Cloud:
• Simpler – fewer networks, identity domains, etc
• Serverless – more opportunities to use 'pay per use' services
• Homogeneous – major commitment to single-vendor solutions

Multi-Cloud:
• Decentralized – introduces greater complexity (events, security, networking, etc)
• Service Mesh & Serverless – IT may have to operate 'as a service' containers & K8S
• Heterogeneous – empowers best-of-breed and greater portability, reuse of services

Figure: a single-cloud deployment (self-service GUI, stream processing ingest, data lake house, analytics) vs. a multi-cloud deployment spanning edge gateways, enterprise applications, Exadata Cloud@Customer, and services such as Athena, RDS, and Redshift.
Chapter 3

Seven Case Studies

In consideration of the 7 use cases and 4 key technology attributes, there are good public examples of Data Mesh success.

Each of these examples is using distributed, decentralized, event-driven, real-time tech. Several examples also leverage data product thinking, microservices, service mesh and stream processing architecture.
Data Mesh
CASE STUDY CRITERIA

There is no single ‘perfect’ example of a Data Mesh. Other software development and data architecture patterns or technology categories exist, and there remains substantial overlap among the most common concepts like Data Fabrics, Microservices, Service Mesh, and Data Lake Houses.

For this document, we are considering Data Mesh as a type of Data Fabric. Case studies should have a ‘significant solution focus’ using technology with the following attributes:

People, Process and Methods:
• Data Products – driving cultural and process changes that affect cross-organizational data domains, and institutionalize strong management practices around data assets

Technical Architecture Attributes:
• Distributed Architecture – decentralized, microservices-based software architecture patterns
• Event Driven Ledgers – durable running log of events to drive cross-domain integrations
• ACID Support – for polyglot streams, empowering correct and trusted data transactions
• Stream-Oriented – data processing on ‘data in motion’ to drive solution outcomes
• Analytic Data Focus – data pipelines or data products in the analytics domain (e.g. OLAP)
• Operational Data Focus – solution focus on operational data outcomes (e.g. OLTP)
• Physical & Logical Mesh – data is both physically and logically ‘meshed’ together
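The ‘Event Driven Ledgers’ attribute is worth making concrete. Below is a minimal sketch (Python, all names hypothetical, not any vendor’s API): an append-only log where each consumer tracks its own read offset, so operational and analytic consumers can share one durable stream of events without coordinating with each other.

```python
from dataclasses import dataclass, field

@dataclass
class EventLedger:
    """Append-only event log; each consumer tracks its own read offset."""
    _events: list = field(default_factory=list)
    _offsets: dict = field(default_factory=dict)

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1  # position of the new event

    def read(self, consumer: str, max_events: int = 10) -> list:
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)  # advance this consumer only
        return batch

ledger = EventLedger()
ledger.append({"op": "INSERT", "table": "orders", "id": 1})
ledger.append({"op": "UPDATE", "table": "orders", "id": 1})

# Two independent consumers each see the full event stream at their own pace.
analytics = ledger.read("analytics-pipeline")
ops = ledger.read("ops-replica")
```

Because offsets are per-consumer, adding a new downstream domain never disturbs existing readers – the decoupling property the criteria above rely on.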
Case Study
INTUIT – DATA PRODUCT THINKING

Intuit has been an early proponent and leader in applying data product thinking to their enterprise data estate. 16 Cross-organizational alignment means that data products include people stakeholders, business processes, data pipelines, and well-defined APIs for consumption. Different domains may have internal or external data consumers, and data can take the form/shape required by the end consumer (e.g. data lake tables vs. event bus topics, etc).

People, Process and Methods:
• Data Product Focus: ✓

Technical Architecture Attributes:
• Distributed Architecture: ✓
• Event Driven Ledgers: ✓
• ACID Support: ✓
• Stream Oriented: ✓
• Analytic Data Focus: ✓
• Operational Data Focus: adjacent / GG materialized example
• Physical & Logical Mesh: ✓

GoldenGate Usage: data event ledger; ingest to cloud and event bus; analytic data lake

[Figure: conceptual solution framework]
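Data product thinking can be made concrete with a small sketch: one owned dataset exposed through multiple consumer-facing ‘ports’ (a lake table and an event bus topic, per the text above). All names here are hypothetical illustrations, not Intuit’s actual design:

```python
from dataclasses import dataclass

# Hypothetical descriptor for a 'data product': the same owned dataset is
# published in whatever shape each consumer needs (lake table, event topic).
@dataclass(frozen=True)
class OutputPort:
    kind: str        # e.g. "lake_table" or "event_topic"
    address: str     # where consumers read it

@dataclass(frozen=True)
class DataProduct:
    name: str
    owner_team: str  # the cross-functional stakeholder group
    ports: tuple     # one dataset, many consumption shapes

clickstream = DataProduct(
    name="clickstream-sessions",
    owner_team="growth-analytics",
    ports=(
        OutputPort("lake_table", "lake.growth.sessions"),
        OutputPort("event_topic", "events.growth.sessions"),
    ),
)
port_kinds = {p.kind for p in clickstream.ports}
```

The point of the descriptor is organizational, not technical: ownership and consumption contracts travel with the data, rather than living in a central IT backlog.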
Case Study
NETFLIX – APPLICATION MODERNIZATION

Netflix has frequently been at the cutting edge of new IT innovation, and investment in data mesh is no different. Before the term rose to popularity, Netflix was already using a data mesh approach to perform online migration of operational apps (to the cloud), to avoid any outages that would affect customers. 17

After an exhaustive review, Netflix chose an approach with several key data mesh attributes, including a distributed architecture and event-based data ledgers. The target architecture was a modern microservices-based application, and the real-time migration approach enabled a phased cutover to new platforms (infrastructure & DBs) without any downtime. This is a good example of an operationally focused data mesh for a point project.

People, Process and Methods:
• Data Product Focus: yes – adjacent 18

Technical Architecture Attributes:
• Distributed Architecture: ✓
• Event Driven Ledgers: ✓
• ACID Support: ✓
• Stream Oriented: n/a
• Analytic Data Focus: n/a
• Operational Data Focus: ✓
• Physical & Logical Mesh: physical only

GoldenGate Use Case: data event ledger; bi-directional, Tx-safe and fully consistent events

[Figure: on-prem monolith platform and cloud data platform linked by an event ledger]
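The phased, zero-downtime cutover described above can be sketched abstractly. This is not Netflix’s implementation – just a minimal illustration, with hypothetical names, of the dual-write / flagged-read pattern that an event-ledger-backed migration enables:

```python
# Hypothetical sketch of a phased-cutover router: during the migration window,
# writes go to both stores (kept consistent by a replication ledger in a real
# system); reads are served by the legacy store until the cutover flag flips,
# so consumers never see downtime.
class MigrationRouter:
    def __init__(self):
        self.legacy = {}   # stands in for the on-prem monolith's store
        self.modern = {}   # stands in for the cloud data platform
        self.cutover = False  # flip once the new platform is verified

    def write(self, key, value):
        # dual-write keeps both stores in step during the migration
        self.legacy[key] = value
        self.modern[key] = value

    def read(self, key):
        store = self.modern if self.cutover else self.legacy
        return store.get(key)

router = MigrationRouter()
router.write("account:42", {"plan": "premium"})
before = router.read("account:42")   # served by legacy
router.cutover = True                # phased cutover: reads flip to the new store
after = router.read("account:42")    # same answer, new platform
```

Because both stores hold identical state at the moment of the flip, the cutover is reversible – a key property for migrating mission-critical apps.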
Case Study
WELLS FARGO – DATA CONTINUITY

Wells Fargo is one of the largest banks in the world and has been heavily investing in data-driven digital transformation for many years. At the heart of its data strategy is the need to ensure 100% continuity of data operations, and Wells Fargo has spoken about its use of GoldenGate microservices for these continuity use cases. 19

Combining operational data events with analytics and data lakes simplifies the data architecture by reducing the number of ‘hops’ that the data must take prior to being prepared for analytics. A data mesh approach aims to reduce friction and wasted IT resources when joining up operational and analytic data.

People, Process and Methods:
• Data Product Focus: yes – adjacent 20

Technical Architecture Attributes:
• Distributed Architecture: ✓ (distributed across multiple sites)
• Event Driven Ledgers: ✓ (microservices data event ledger, real-time events)
• ACID Support: ✓
• Stream Oriented: n/a
• Analytic Data Focus: yes – adjacent
• Operational Data Focus: ✓
• Physical & Logical Mesh: mainly physical

GoldenGate Use Case: data event ledger; fully consistent data events; microservices deployments in a service mesh
Case Study
PAYPAL – MICROSERVICES PATTERNS

PayPal uses a modern microservices application architecture (distributed), and needed fast, 100% correct transactions moved asynchronously among services. They used a data mesh approach 21 with event-driven data ledgers to de-centralize transactions with zero data loss or corruption.

They considered alternatives like ‘event sourcing’ and ‘multi-phase commits’, but these could not guarantee zero data loss; the event-driven data ledgers from GoldenGate provided the trust, correctness, and performance needed for a distributed data mesh.

People, Process and Methods:
• Data Product Focus: ✓

Technical Architecture Attributes:
• Distributed Architecture: ✓
• Event Driven Ledgers: ✓
• ACID Support: ✓
• Stream Oriented: yes – adjacent
• Analytic Data Focus: yes – adjacent
• Operational Data Focus: ✓
• Physical & Logical Mesh: domain driven

GoldenGate Use Case: data event ledger; fully consistent data events; transaction outbox
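The ‘transaction outbox’ named in the GoldenGate use case is a well-known microservices pattern (see endnote 12). A minimal sketch using SQLite – table and column names are illustrative, not from any PayPal system: the business row and its outgoing event commit in one ACID transaction, so a downstream relay or CDC ledger can publish the event later with zero loss.

```python
import sqlite3
import json

# Transactional-outbox sketch: the payment and its event land atomically.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount REAL)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, "
           "published INTEGER DEFAULT 0)")

def record_payment(amount: float) -> None:
    with db:  # one ACID transaction covers BOTH inserts (commit on exit)
        cur = db.execute("INSERT INTO payments (amount) VALUES (?)", (amount,))
        event = {"type": "PaymentReceived",
                 "payment_id": cur.lastrowid, "amount": amount}
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps(event),))

def relay_unpublished() -> list:
    """The step a relay/CDC process performs: drain and mark outbox rows."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    events = [json.loads(payload) for _, payload in rows]
    db.executemany("UPDATE outbox SET published = 1 WHERE id = ?",
                   [(row_id,) for row_id, _ in rows])
    db.commit()
    return events

record_payment(19.99)
events = relay_unpublished()
```

If the process crashes between the commit and the relay, the event is still in the outbox and gets published on the next pass – at-least-once delivery with no lost transactions, which is the guarantee the multi-phase-commit alternatives could not provide here.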
Case Study
WESTERN DIGITAL – INTEGRATION

Western Digital has been continuously investing in digital transformation goals for several years, including the shift towards cloud-first and data-driven business practices. As a part of that journey, they make extensive use of event-driven integration tech from Oracle – including Integration Cloud and GoldenGate. 22

These data mesh capabilities provide a distributed, event-driven architecture that helps in both operational and analytic use cases. Operationally, the integration tech is used to modernize the ERP platforms and ultimately reduce operating costs. For analytics, the shift to a real-time cloud business means being able to continuously stream data events from applications into reporting data marts, data warehouses and data lakes.

Operational: drive cost reductions and operating efficiency by consolidating ERP and moving applications to cloud. 23
Edge and Analytics: align core operations data to Fast Data and Big Data initiatives – impacting customer systems of engagement as well as data science / AI initiatives. 24

People, Process and Methods:
• Data Product Focus: yes – adjacent

Technical Architecture Attributes:
• Distributed Architecture: ✓
• Event Driven Ledgers: ✓
• ACID Support: ✓
• Stream Oriented: n/a
• Analytic Data Focus: ✓
• Operational Data Focus: ✓
• Physical & Logical Mesh: ✓

GoldenGate Use Case: data event ledger; stream SaaS data events into business reporting tools
Case Study
LINKEDIN – STREAMING INGEST

LinkedIn created and operates one of the world’s largest Apache Kafka implementations, using the tech for both operational and analytic data events. 25 For 100s of applications that produce database events, those data events (billions per day) are captured and ingested into Kafka using a GoldenGate data ledger. 26

A modern distributed data mesh must work with raw data events as they happen. When DBs commit transactions, those data events become the source/provider data from the systems of record (SoR). Downstream stream processing tools (e.g. Samza, Flink, GoldenGate Stream Analytics) can then process these data events within milliseconds of their origin.

People, Process and Methods:
• Data Product Focus: ✓

Technical Architecture Attributes:
• Distributed Architecture: ✓
• Event Driven Ledgers: ✓
• ACID Support: ✓
• Stream Oriented: ✓
• Analytic Data Focus: ✓
• Operational Data Focus: ✓
• Physical & Logical Mesh: ✓

GoldenGate Use Case: data event ledger; stream DB events (DML & DDL) into Apache Kafka

[Figure: ‘before’ – tangled point-to-point integrations (a data mess!) vs. ‘after’ – GoldenGate for 100% correct data transaction events alongside Apache Kafka for simple events]
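As a minimal illustration of what a downstream processor does with such database events, the sketch below applies CDC-style change records to a local materialized view. The event shape (op/key/after) is a simplified assumption for illustration, not GoldenGate’s or Kafka’s actual wire format:

```python
# Apply CDC-style change events to a materialized view, as a downstream
# stream processor (e.g. one consuming a Kafka topic) might do.
def apply_change(view: dict, event: dict) -> None:
    op, key = event["op"], event["key"]
    if op in ("INSERT", "UPDATE"):
        view[key] = event["after"]   # upsert the new row image
    elif op == "DELETE":
        view.pop(key, None)          # drop the row from the view

members = {}
stream = [
    {"op": "INSERT", "key": 1, "after": {"name": "Ada", "title": "Engineer"}},
    {"op": "UPDATE", "key": 1, "after": {"name": "Ada", "title": "Principal"}},
    {"op": "INSERT", "key": 2, "after": {"name": "Grace", "title": "Admiral"}},
    {"op": "DELETE", "key": 2},
]
for event in stream:
    apply_change(members, event)
```

Replaying the ordered log always reproduces the same view – the property that lets many independent consumers stay consistent with the system of record.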
Case Study
SAILGP – STREAMING ANALYTICS

SailGP runs one of the most exciting race venues in the world, with high-tech and high-speed sail boats. Live race data and analytics are provided within milliseconds using data mesh tech. 27

Distributed edge technology links race boat, support boat and race helicopter data into streaming pipelines. Telemetry data is streamed into nearby clouds for real-time ETL, analytics and ingest to a cloud data warehouse. The data mesh tech uses GoldenGate and Kafka (Oracle Streaming). Stream analytics are used in real time on race day to assist support crews and broadcast networks.

People, Process and Methods:
• Data Product Focus: ✓

Technical Architecture Attributes:
• Distributed Architecture: ✓
• Event Driven Ledgers: ✓
• ACID Support: ingest to DW
• Stream Oriented: ✓
• Analytic Data Focus: ✓
• Operational Data Focus: ✓
• Physical & Logical Mesh: physical data

GoldenGate Use Case: stream analytics; real-time event correlation, ETL, analysis and ingest to DW
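The windowed stream-analytics step can be sketched simply. Assuming (hypothetically) telemetry events that carry a timestamp and a boat speed, a tumbling window groups events into fixed time buckets and aggregates each bucket – the same shape of computation a stream-analytics engine performs continuously:

```python
from collections import defaultdict

# Tumbling-window aggregation sketch: average boat speed per 5-second window.
# Field names and the window size are illustrative assumptions.
WINDOW_SECONDS = 5

def windowed_avg_speed(events):
    buckets = defaultdict(list)
    for e in events:
        # each event belongs to exactly one non-overlapping (tumbling) window
        window_start = (e["ts"] // WINDOW_SECONDS) * WINDOW_SECONDS
        buckets[window_start].append(e["speed_knots"])
    return {w: sum(v) / len(v) for w, v in sorted(buckets.items())}

telemetry = [
    {"ts": 0, "speed_knots": 40.0},
    {"ts": 3, "speed_knots": 44.0},
    {"ts": 6, "speed_knots": 50.0},
    {"ts": 9, "speed_knots": 52.0},
]
averages = windowed_avg_speed(telemetry)
```

A production engine would evaluate each window incrementally as events arrive rather than batching them, but the window/aggregate logic is the same.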
Data Mesh
COMPARE AND CONTRAST

[Table: the eight case study criteria – Data Product Focus; Distributed Architecture; Event Driven Ledgers; ACID Support; Stream Oriented; Analytic Data Focus; Operational Data Focus; Physical & Logical Mesh – rated on a best-to-worst scale for Data Mesh vs. Data Fabric (Data Integration, Meta-Catalog), App-Dev-Integration (Microservices, Messaging), and Analytic Data Stores (Data Lake House, Distributed DW).]
Data Mesh
BUSINESS OUTCOMES

Overall Benefits:
• Faster, data-driven innovation cycles
• Reduced costs for mission-critical data operations – DataOps and CI/CD for Data; streaming integration / ETL

Operational Outcomes:
• Multi-cloud data liquidity – unlock data capital to flow freely
• Real-time data sharing – Ops-to-Ops & Ops-to-Analytics
• Edge, location-based data services – correlate IRL device/data events
• Trusted microservices data interchange – “event sourcing” with correct data
• Uninterrupted continuity – >99.999% up-time SLAs; cloud migrations

Analytic Outcomes:
• Automate and simplify data products – multi-model data sets
• Time series data analysis – deltas / changed records; event-by-event fidelity
• Eliminate full data copies for ODS’ – log-based ledgers and pipelines
• Distributed data lakes & warehouses – hybrid / multi-cloud / global
• Predictive analytics – data monetization; new ‘data services’ for sale
Copyright © 2021, Oracle and/or its affiliates


Bringing it all together

Digital transformation is very, very hard and most will fail at it. 5 Technology, software design and data architecture are becoming increasingly more distributed, as modern techniques move away from highly-centralized and monolithic styles.

Data Mesh is a new concept for data. It is, at its core, a cultural mindset shift to put the needs of data consumers first. It is also a real technology shift, elevating the platforms and services that empower a decentralized data architecture. Data Mesh is a deliberate shift towards highly distributed and real-time data events, as opposed to monolithic, centralized and batch-style data processing. Four important attributes of Data Mesh include:

1. Data Product Thinking – data consumer needs ahead of IT
2. Decentralized Data Architectures – a distributed mesh topology
3. Event-Driven Data Ledgers – logs as the principal interchange
4. Polyglot Data Streaming – real-time processing for all data types

Use cases for Data Mesh encompass both operational data and analytic data, which is one key difference from conventional Data Lakes/Lakehouses and Data Warehouses. This alignment of operational and analytic data domains is a critical enabler for the need to drive more self-service for the data consumer. Modern data platform technology can help to remove the middleman, connecting data producers directly to data consumers.

Oracle has long been the industry leader in mission-critical data solutions, and has fielded some of the most modern capabilities to empower a trusted Data Mesh:

• Gen2 public cloud infrastructure with >33 active regions
• Multi-model database for ‘shape-shifting’ data products
• Microservices-based data event ledger for all data stores
• Multi-cloud stream processing for real-time trusted data
• API platform, modern app-dev and self-service tools
• Analytics, data visualization and cloud-native data science

For more information, download the Data Mesh Tech Paper:
https://www.oracle.com/a/ocom/docs/techbrief-enterprisedatameshandgoldengate.pdf
Our mission is to help people see data in new ways, discover insights, unlock endless possibilities.
Endnotes
1. 99.999% availability: https://www.oracle.com/a/tech/docs/maa-goldengate-hub.pdf
2. 10x faster innovation cycles, shifting away from batch ETL (eliminate batch windows), to continuous transformation and loading (CTL) via streaming ingest
3. Data derived from real world discussions with customers who have adopted the methodologies and tools described in this document
4. Most time and cost go into integration efforts: https://www.gartner.com/smarterwithgartner/use-a-hybrid-integration-approach-to-empower-digital-transformation/
5. 70-80% of digital transformations fail: https://www.bcg.com/publications/2020/increasing-odds-of-success-in-digital-transformation
6. Cloud lock in is real: https://www.infoworld.com/article/3623721/cloud-lock-in-is-real.html
7. Cloud can become more costly: https://a16z.com/2021/05/27/cost-of-cloud-paradox-market-cap-cloud-lifecycle-scale-growth-repatriation-optimization/
8. Data lakes rarely succeed: https://www.datanami.com/2021/05/07/drowning-in-a-data-lake-gartner-analyst-offers-a-life-preserver/
9. From Monolithic Data Lake to Distributed Data Mesh: https://martinfowler.com/articles/data-monolith-to-mesh.html
10. Cost of operational outages are rising: https://www.nextgov.com/ideas/2021/03/commercial-cloud-outages-are-wake-call/172731/
11. Integrate ops and analytics for better decisions: https://mitsloan.mit.edu/ideas-made-to-matter/digital-transformation-has-evolved-heres-whats-new
12. Transaction Outbox pattern: https://microservices.io/patterns/data/transactional-outbox.html
13. Strangler Fig migrations: https://martinfowler.com/bliki/StranglerFigApplication.html
14. Bounded Context in microservices: https://martinfowler.com/bliki/BoundedContext.html
15. Open Policy Agent (OPA): https://www.openpolicyagent.org/
16. Intuit Data Products: https://medium.com/intuit-engineering/intuits-data-mesh-strategy-778e3edaa017
17. Netflix Migration of Billing App: https://medium.com/netflix-techblog/netflix-billing-migration-to-aws-part-iii-7d94ab9d1f59
18. Netflix Data Productization / Inverting: https://canvas.stanford.edu/files/5342788/download?download_frd=1
19. Wells Fargo Data Continuity: Oracle OpenWorld 2018, GoldenGate Microservices joint presentation with Wells Fargo, Joe DiCario
20. Wells Fargo Data Productization: https://medium.com/@kshi/data-transformation-of-wells-fargo-en-f025843f5e2d
21. PayPal Microservices CDC Events: https://www.slideshare.net/r39132/big-data-fast-data-paypal-yow-2018
22. Western Digital Real-time Integrations: https://blogs.oracle.com/cloud-platform/western-digital-achieves-faster-integration-with-oracle-integration-cloud
23. Western Digital on ERP Consolidation: https://www.house-listing.com/technology/201903/298492.html
24. Western Digital Fast Data and Big Data graphics: https://gestaltit.com/tech-field-day/gestalt/western-digitals-data-vision-at-sfd18/
25. LinkedIn Use of Apache Kafka: https://www.confluent.io/blog/event-streaming-platform-1/
26. LinkedIn Use of Oracle GoldenGate for Streaming Ingest: Oracle OpenWorld 2019, GoldenGate joint presentation with LinkedIn
27. SailGP Stream Analytics: https://www.oracle.com/news/announcement/sailgp-launches-second-season-with-oracle-cloud-041521.html

