
Architect's Open-Source Guide For A Data Mesh Architecture: Lena Hall Microsoft

The document discusses the concept of a data mesh architecture as an alternative to a monolithic data lake, outlining core principles of data mesh including decentralized data ownership, data products powered by domain-driven design, and self-serve shared infrastructure; it provides an example of how a data mesh could be implemented for a drone delivery service and discusses challenges and considerations for technology choices in building a data mesh.


Architect’s Open-Source Guide for a Data Mesh Architecture
Lena Hall
Microsoft
Lena Hall
Director at Microsoft
Azure Engineering

✓ Architecture
✓ Cloud
✓ Data
✓ ML/AI

lenadroid
Entry Point
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
https://martinfowler.com/articles/data-monolith-to-mesh.html

Data Mesh Principles and Logical Architecture
https://martinfowler.com/articles/data-mesh-principles.html

Slack for Data-Mesh-Learning
https://launchpass.com/data-mesh-learning

Talk Snapshot
• What is Data Mesh
• When is Data Mesh a Good Idea
• Core Principles and Concepts
• Example: Drone Delivery Service
• Challenges
• OSS and Open Standards

When and Why Data Mesh

Data Mesh is Not For Everyone
Challenges Indicating Data Mesh May Be Considered

Drone Delivery Service

WHYs

• Ambiguity in Ownership and Responsibility
• Slow Change due to Coupling to Monolithic System
• Data Engineering Resources Bottleneck

Ideas Composing Data Mesh Concept

Core Ideas

✓ Decentralized teams and data ownership
✓ Data Products powered by Domain Driven Design
✓ Self-serve Shared Data Infrastructure
✓ Global Federated Governance

High-Level View of a Data Product

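Global federated governance means each domain owns its data products while every domain applies the same organization-wide rules. Below is a minimal Python sketch of that split, assuming domain teams publish a small descriptor per product; the fields (`owner`, `output_format`, `pii_classified`), the rules, and the product names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataProductDescriptor:
    """Hypothetical metadata a domain team publishes alongside its data product."""
    name: str
    domain: str
    owner: str
    output_format: str
    pii_classified: bool


# A global rule returns an error message, or None when the product complies.
Rule = Callable[[DataProductDescriptor], Optional[str]]

# Illustrative organization-wide choice of open output formats.
OPEN_FORMATS = {"parquet", "avro", "json", "csv"}


def has_owner(p: DataProductDescriptor) -> Optional[str]:
    return None if p.owner else "missing owner"


def uses_open_format(p: DataProductDescriptor) -> Optional[str]:
    if p.output_format.lower() in OPEN_FORMATS:
        return None
    return f"non-open format: {p.output_format}"


def pii_reviewed(p: DataProductDescriptor) -> Optional[str]:
    return None if p.pii_classified else "PII classification not done"


GLOBAL_RULES: List[Rule] = [has_owner, uses_open_format, pii_reviewed]


def check(products: List[DataProductDescriptor],
          rules: List[Rule] = GLOBAL_RULES) -> Dict[str, List[str]]:
    """Apply the same global rules to every domain's product; report only violators."""
    report: Dict[str, List[str]] = {}
    for p in products:
        errors = [msg for rule in rules if (msg := rule(p)) is not None]
        if errors:
            report[p.name] = errors
    return report


products = [
    DataProductDescriptor("drone-telemetry", "fleet", "fleet-team", "parquet", True),
    DataProductDescriptor("delivery-orders", "orders", "", "xlsx", False),
]
print(check(products))
# {'delivery-orders': ['missing owner', 'non-open format: xlsx', 'PII classification not done']}
```

Domain teams still decide what their products contain; only the short rule list is shared, which is the "federated" part of the idea.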
Drone Delivery Service Data Products

Core Principles for Data Products

• DISCOVERABLE
• SELF-DESCRIBING
• ADDRESSABLE
• SECURE
• TRUSTWORTHY
• INTEROPERABLE

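The six principles can be read as properties a data product exposes about itself. Here is a minimal Python sketch, assuming a hypothetical in-memory catalog and a made-up `mesh://` address scheme, in which each principle maps to a field or method a consumer can inspect. The names (`DataProduct`, `OutputPort`, `Catalog`, `drone-telemetry`) are illustrative, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OutputPort:
    address: str            # ADDRESSABLE: stable, unique location (hypothetical mesh:// scheme)
    data_format: str        # INTEROPERABLE: an open format other products can read
    schema: Dict[str, str]  # SELF-DESCRIBING: column name -> type
    readers: List[str]      # SECURE: consumers allowed to read this port


@dataclass
class DataProduct:
    name: str
    description: str
    freshness_slo_minutes: int  # TRUSTWORTHY: a published service-level objective
    ports: List[OutputPort] = field(default_factory=list)


class Catalog:
    """DISCOVERABLE: hypothetical catalog where data products register themselves."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def search(self, keyword: str) -> List[str]:
        """Return names of products whose name or description mentions the keyword."""
        keyword = keyword.lower()
        return sorted(
            name for name, p in self._products.items()
            if keyword in name.lower() or keyword in p.description.lower()
        )


telemetry = DataProduct(
    name="drone-telemetry",
    description="Hourly drone battery and position readings",
    freshness_slo_minutes=60,
    ports=[OutputPort(
        address="mesh://fleet/drone-telemetry/v1",
        data_format="parquet",
        schema={"drone_id": "string", "battery_pct": "double", "ts": "timestamp"},
        readers=["delivery-planning"],
    )],
)

catalog = Catalog()
catalog.register(telemetry)
print(catalog.search("battery"))  # ['drone-telemetry']
```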
Cheat Sheet for Planning Data Products

Input Ports Questions
• Data Source - Where is the data coming from? An external dataset or another data product?
• Data Format - What is the format of the source input?
• Rate of Updates - How frequently does the input need to be updated?

Data Product Action Questions
• What action needs to happen to produce the outcomes for the end-users?
• What adjustments, transformations, filters, updates, or quality improvements does the input data require?

Output Ports Questions
• End-consumers - Who are the end-users of the data product?
• Data purpose - What are they planning to do with the data outputs?
• Data access - Who needs to have access? How do they prefer to access the data output?
• Data address - Where do they prefer to access the data output?
• Data Format - What format of the data do they expect?

Operational Questions
• How can this data product be discovered, and how should it be described to other data products that might want to consume it?
• Which metadata and information should it make available to the end-users?
• Where and how should data product versioning be managed during updates to ensure consistency with how the end-users consume it?
• Which SLAs or SLOs does the data product provide?
• Which product success metrics (adoption, usage, quality) can this data product expose and keep track of?
• Is the automation/resource orchestration logic stored in the same package?

Identity and Permission Policies Questions
• Which resources is this data product allowed to access?
• Which data products or users can read which output ports of this data product?
• Are all sensitive resources this data product offers protected according to their required privacy standards and regulations (e.g. HIPAA, GDPR, CCPA, PII handling rules)?
• Is the permissions policy stored and managed in the same package as the data product?

Other Questions
• Is this product free of tight coupling to any other data source, data product, or resource that would make it non-interoperable?
• Does this data product follow the global governance standards and practices defined by the organization?
• Does this data product have any implementation details that could interfere with its portability?
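The Identity and Permission Policies questions can be answered with a small deny-by-default allow-list kept in the same package as the data product. This is a sketch under assumed names: the output ports `daily-deliveries` and `customer-addresses` and the consumers are hypothetical, and a production platform would more likely enforce such a policy in its storage or query layer than in application code.

```python
from typing import Dict, List, Set

# Hypothetical policy stored in the same package as the data product:
# output port -> data products or users allowed to read it.
READ_POLICY: Dict[str, Set[str]] = {
    "daily-deliveries": {"finance-reporting", "delivery-planning"},
    "customer-addresses": {"delivery-planning"},  # holds PII, so a narrower allow-list
}


def can_read(principal: str, port: str,
             policy: Dict[str, Set[str]] = READ_POLICY) -> bool:
    """Deny by default: unknown ports and unlisted principals get no access."""
    return principal in policy.get(port, set())


def readable_ports(principal: str,
                   policy: Dict[str, Set[str]] = READ_POLICY) -> List[str]:
    """The same question from the consumer's side: which ports may this principal read?"""
    return sorted(port for port, readers in policy.items() if principal in readers)


print(can_read("finance-reporting", "customer-addresses"))  # False
print(readable_ports("delivery-planning"))  # ['customer-addresses', 'daily-deliveries']
```

Keeping the policy next to the product means it is versioned and deployed together with the output ports it protects.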
Self-Serve Shared Infrastructure

Types of Workloads Within a Data Product

[Diagram components: real-time data, ingestion, processing, object storage, columnar storage, incoming request, web service.]

It can look like this

[Diagram: the same workloads built on Azure Data Lake, with processing and a web service.]

Or, it can look like this

[Diagram: the same workloads built on Google Storage, with a web service.]

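The two "It can look like this" slides differ mainly in which provider backs the storage. A self-serve platform can hide that difference by letting a data product declare logical workloads and resolving them per provider. The sketch below assumes a hypothetical platform catalog; only the object storage names (Azure Data Lake, Google Storage, Amazon S3) come from this deck, and the provider keys and `resolve` function are placeholders.

```python
from typing import Dict, List

# Hypothetical self-serve platform catalog: logical workload -> service per provider.
# Only the object storage entries are named in the slides; the rest is up to the platform team.
PLATFORM_CATALOG: Dict[str, Dict[str, str]] = {
    "object-storage": {
        "azure": "Azure Data Lake",
        "gcp": "Google Storage",
        "aws": "Amazon S3",
    },
}


def resolve(workloads: List[str], provider: str) -> Dict[str, str]:
    """Map a data product's logical workloads to concrete services on one provider.

    Raises ValueError naming every workload the platform cannot offer there, so gaps
    in the shared platform show up when the product is planned, not when it is deployed.
    """
    missing = [w for w in workloads if provider not in PLATFORM_CATALOG.get(w, {})]
    if missing:
        raise ValueError(f"no {provider} offering for: {', '.join(missing)}")
    return {w: PLATFORM_CATALOG[w][provider] for w in workloads}


print(resolve(["object-storage"], "azure"))  # {'object-storage': 'Azure Data Lake'}
print(resolve(["object-storage"], "gcp"))    # {'object-storage': 'Google Storage'}
```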
Self-Serve Shared Infrastructure

• Shared platform for streaming ingestion
• Shared platform for raw data storage
• Shared platform for columnar data storage
• Data catalogue
• Shared platform for container workloads
• Shared platform for continuous delivery
• Shared platform for observability
• And more…, depending on the organization

Data Mesh

[Diagram: data products that are discoverable, self-describing, addressable, secure, trustworthy, and interoperable, running on shared platforms for streaming ingestion, raw data storage, columnar data storage, container workloads, continuous delivery, and observability, plus a shared data catalogue.]
Wait, What About the OSS Tools for Data Mesh??

Challenges with Data Mesh

Challenges
• Cost questions
• Lack of end-to-end examples
• Effort required to shift from centralized architecture to decentralization-friendly techniques
• Automation required to enable creating data products
• Underestimating the importance of organizational aspects

Considerations for Technology Choices

• Workload sharing and multi-tenancy
• No-copy data and compute mobility support
• Granularity of access-control
• Richness of automation and extensibility capabilities
• Flexibility and elasticity
• Provider-agnostic/multi-cloud operations support
• Variety of limitations (quotas, data volume, resource count, etc.)
• Open Standards, Open Protocols, Open-Source Integrations

Examples of Data Mesh-friendly Technologies

[Figure: logos of example technologies, grouped into the categories below.]

• Data Catalogue, Data Lineage, Data Governance
• Data Ingestion, Streaming
• OSS Data Analytics, Processing, Data Querying
• Data Visualization and BI Tools
• Data Products for Data Analytics and Processing
• Cross-Platform Concepts and Tools
• Data Orchestration, Workflows
• Open Formats
• Cloud Storage (e.g. Amazon S3, Azure Data Lake, Google Storage)
• OSS Storage
• Infrastructure Automation
• Data Experimentation
• Multi and Hybrid Cloud Tools (e.g. Anthos, Azure Arc)

Data Governance Systems

• Metadata
• Data lineage
• Data schemas
• Data relationships
• Data classification
• Data security
• Data catalog
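Data lineage is essentially a graph of which data products read from which inputs. A minimal sketch, assuming lineage metadata is available as a hypothetical adjacency map with made-up product names, shows how a governance system could answer "where does this come from?" (upstream) and "what is affected if this changes?" (downstream) with a breadth-first search.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage metadata: each data product -> the inputs it reads from.
LINEAGE: Dict[str, List[str]] = {
    "delivery-kpis": ["daily-deliveries", "drone-telemetry"],
    "daily-deliveries": ["orders-raw"],
    "drone-telemetry": ["telemetry-stream"],
    "orders-raw": [],
    "telemetry-stream": [],
}


def upstream(product: str, lineage: Dict[str, List[str]] = LINEAGE) -> Set[str]:
    """All direct and transitive inputs of `product` (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque(lineage.get(product, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(lineage.get(node, []))
    return seen


def downstream(product: str, lineage: Dict[str, List[str]] = LINEAGE) -> Set[str]:
    """Products affected if `product` changes: invert the edges, then search."""
    reverse: Dict[str, List[str]] = {}
    for consumer, inputs in lineage.items():
        for src in inputs:
            reverse.setdefault(src, []).append(consumer)
    return upstream(product, reverse)


print(sorted(upstream("delivery-kpis")))
# ['daily-deliveries', 'drone-telemetry', 'orders-raw', 'telemetry-stream']
print(sorted(downstream("orders-raw")))
# ['daily-deliveries', 'delivery-kpis']
```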
Open Formats

• Open standard
• Atomic updates, serializable isolation, transactions
• Concurrent operations
• Versioning, rollbacks, time-travel
• Schema Evolution
• Scale, Efficiency, Data Volumes
• Compatibility with existing data stores and languages
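Several of the Open Formats properties (atomic updates, concurrent operations, versioning, rollbacks, time-travel) follow from one idea: keep an append-only log of commits and derive the table state by replaying it. The toy below illustrates only that idea in Python; the class and file names are hypothetical, and it is not the on-disk protocol of any real open table format.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set, Tuple

# One commit = (files added, files removed), recorded as a single log entry.
Commit = Tuple[Tuple[str, ...], Tuple[str, ...]]


@dataclass
class VersionedTable:
    """Toy table whose current state is a replay of an append-only commit log."""
    log: List[Commit] = field(default_factory=list)

    @property
    def version(self) -> int:
        return len(self.log) - 1  # -1 means no commits yet

    def commit(self, added: Iterable[str], removed: Iterable[str] = (),
               expected_version: Optional[int] = None) -> int:
        # Optimistic concurrency: reject the write if another writer committed first.
        if expected_version is not None and expected_version != self.version:
            raise RuntimeError(
                f"conflict: table is at version {self.version}, expected {expected_version}")
        # A single append makes the whole change visible at once (atomic in this toy).
        self.log.append((tuple(added), tuple(removed)))
        return self.version

    def files(self, as_of: Optional[int] = None) -> Set[str]:
        """Time travel: replay the log up to and including version `as_of`."""
        end = len(self.log) if as_of is None else as_of + 1
        state: Set[str] = set()
        for added, removed in self.log[:end]:
            state |= set(added)
            state -= set(removed)
        return state

    def rollback(self, to_version: int) -> int:
        """Restore an old version by committing the difference; history is kept."""
        target, current = self.files(to_version), self.files()
        return self.commit(added=sorted(target - current),
                           removed=sorted(current - target))


t = VersionedTable()
t.commit(["part-0.parquet"])                              # version 0
t.commit(["part-1.parquet"], expected_version=0)          # version 1
t.commit(["part-2.parquet"], removed=["part-0.parquet"])  # version 2: rewrite
print(sorted(t.files(as_of=1)))  # ['part-0.parquet', 'part-1.parquet']
t.rollback(1)                     # version 3 has the same files as version 1
print(sorted(t.files()))          # ['part-0.parquet', 'part-1.parquet']
```

Rolling back by writing a new commit, instead of truncating the log, keeps every earlier version readable for time-travel.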

Data Platforms (Cloud or OSS)

• Separation of storage and compute
• Support for no-copy data sharing
• Bringing compute to data
• Fine-tuned granularity of permissions for access
• Support for automation and resource management
• Open standards and interoperability with other platforms and tools for governance, visualization, analytics, etc.

Multi-Cloud Infrastructure Management

• Terraform
Open-source infrastructure as code software tool that enables you to safely and
predictably create, change, and improve infrastructure.
• Pulumi
Open-source infrastructure as code SDK that enables you to create, deploy, and
manage infrastructure on any cloud, using your favorite languages.
• Crossplane
Assemble infrastructure from multiple vendors, and expose higher level self-service
APIs for application teams to consume, without having to write any code.

Multi-Cloud Workload Portability

• Azure Arc
Build cloud-native apps anywhere, at scale. Run Azure services in any Kubernetes
environment, whether it’s on-premises, multi-cloud, or at the edge

• Google Anthos
A modern application management platform that provides a consistent development
and operations experience for cloud and on-premises environments

Kubernetes Open-Standard Technologies
NOT AN EXHAUSTIVE LIST

• Open Application Model
An open standard for defining cloud native apps.
KubeVela - https://kubevela.io/docs/concepts
• Open Policy Agent
Declarative policy-as-code; enables portability and combination with infrastructure-as-code.
https://www.openpolicyagent.org/docs/latest
• Service Catalog
Provision managed services and make them available within a Kubernetes cluster.
https://kubernetes.io/docs/concepts/extend-kubernetes/service-catalog/
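Open Policy Agent evaluates policies through a REST Data API: a client POSTs `{"input": ...}` to `/v1/data/<policy path>` and reads the `result` field of the JSON reply. The sketch below only builds that request and interprets a reply with Python's standard library. The policy path `datamesh/products/allow` and the input fields are hypothetical, the Rego policy behind them is not shown, and port 8181 is assumed because it is OPA's default server port.

```python
import json
import urllib.request
from typing import Any, Dict


def build_opa_query(base_url: str, policy_path: str,
                    input_doc: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST to OPA's Data API: /v1/data/<policy path> with body {"input": ...}."""
    url = f"{base_url.rstrip('/')}/v1/data/{policy_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def is_allowed(response_body: bytes) -> bool:
    """OPA leaves out "result" when the rule is undefined; treat anything but true as deny."""
    return json.loads(response_body).get("result") is True


# Hypothetical policy path and input; the Rego policy behind it is not shown here.
req = build_opa_query("http://localhost:8181", "datamesh/products/allow",
                      {"principal": "finance-reporting", "port": "daily-deliveries"})
print(req.full_url)  # http://localhost:8181/v1/data/datamesh/products/allow

# Sending it needs a running OPA server, for example:
# with urllib.request.urlopen(req) as resp:
#     print(is_allowed(resp.read()))
print(is_allowed(b'{"result": true}'), is_allowed(b"{}"))  # True False
```

Because the policy lives outside the data product's code, the same decision endpoint can serve every domain, which fits the portability point on the slide.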

Benefits Brought by Data Mesh
• Data Quality
• Tailored resource and focus allocation
• Organizational cohesion while allowing flexibility
• Reducing complexity
• Democratizing value creation
• Better understanding of value and innovation opportunities
• Enabling more consistent and faster change

Important Focus Areas for Technology Providers
• Open Standards, Open Protocols, Open-Source Integrations
• Workload sharing and multi-tenancy
• No-copy data and compute mobility support
• Granularity of access-control
• Richness of automation and extensibility capabilities
• Flexibility and elasticity
• Provider-agnostic/multi-cloud operations support
• Variety of limitations (quotas, data volume, resource count, etc.)

Data Mesh will drive better Interoperability, Open Standards, and Data Quality in the Industry

Thank you!

Follow lenadroid for more insights
