Architect's Open-Source Guide For A Data Mesh Architecture: Lena Hall Microsoft
Director at Microsoft
Azure Engineering
• Architecture
• Cloud
• Data
• ML/AI
lenadroid
Entry Point
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
https://martinfowler.com/articles/data-monolith-to-mesh.html
Talk Snapshot
• What is Data Mesh
• When is Data Mesh a Good Idea
• Core Principles and Concepts
• Example: Drone Delivery Service
• Challenges
• OSS and Open Standards
When and Why
Data Mesh
Data Mesh is Not For Everyone
Challenges Indicating Data Mesh
May Be Considered
Drone Delivery Service
WHYs
Ideas Composing Data Mesh Concept
Core Ideas
High-Level View of a Data Product
Drone Delivery Service Data Products
Core Principles for Data Products
• DISCOVERABLE
• SELF-DESCRIBING
• ADDRESSABLE
• SECURE
• TRUSTWORTHY
• INTEROPERABLE
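One way to make the six principles concrete is to treat each as a field on a data product descriptor. The sketch below is illustrative only: the `DataProductDescriptor` class, its field names, and the `mesh://` address scheme are assumptions made for this example, not something defined in the talk or by any specific tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    """Hypothetical descriptor; each field maps to one principle."""
    name: str                       # DISCOVERABLE: searchable name
    description: str                # DISCOVERABLE: human-readable summary
    schema: dict                    # SELF-DESCRIBING: column name -> type
    address: str                    # ADDRESSABLE: stable, unique location
    allowed_roles: set = field(default_factory=set)  # SECURE
    freshness_slo_minutes: int = 0  # TRUSTWORTHY: published SLO (0 = none)
    output_format: str = ""         # INTEROPERABLE: agreed exchange format

    def missing_principles(self) -> list:
        """Return the principles this descriptor does not yet satisfy."""
        checks = {
            "discoverable": bool(self.name and self.description),
            "self-describing": bool(self.schema),
            "addressable": bool(self.address),
            "secure": bool(self.allowed_roles),
            "trustworthy": self.freshness_slo_minutes > 0,
            "interoperable": bool(self.output_format),
        }
        return [name for name, ok in checks.items() if not ok]


# Example based on the drone delivery service used in the talk.
deliveries = DataProductDescriptor(
    name="drone-deliveries",
    description="Completed drone deliveries, one row per delivery",
    schema={"delivery_id": "string", "delivered_at": "timestamp"},
    address="mesh://delivery/drone-deliveries/v1",  # made-up address scheme
    allowed_roles={"analyst"},
    freshness_slo_minutes=15,
    output_format="parquet",
)
print(deliveries.missing_principles())  # prints []
```

A check like `missing_principles()` could run in a shared delivery pipeline so that a data product is only published once every principle has at least a declared answer.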
Cheat Sheet for Planning Data Products
Types of Workloads Within a Data Product
Diagram labels: REAL-TIME DATA, INGESTION, PROCESSING, OBJECT STORAGE, COLUMNAR STORAGE, INCOMING REQUEST, WEB SERVICE
It can look like this
Diagram labels: Azure Data Lake, WEB SERVICE, PROCESSING
Or, it can look like this
Diagram labels: Google Storage, WEB SERVICE
Self-Serve Shared Infrastructure
• SHARED PLATFORM FOR CONTAINER WORKLOADS
• SHARED PLATFORM FOR CONTINUOUS DELIVERY
• SHARED PLATFORM FOR OBSERVABILITY
• AND MORE…
Data Mesh
Data product principles:
• DISCOVERABLE
• SELF-DESCRIBING
• ADDRESSABLE
• SECURE
• TRUSTWORTHY
• INTEROPERABLE
Shared infrastructure:
• SHARED PLATFORM FOR STREAMING INGESTION
• SHARED PLATFORM FOR RAW DATA STORAGE
• SHARED PLATFORM FOR COLUMNAR DATA STORAGE
• SHARED PLATFORM FOR CONTAINER WORKLOADS
• SHARED PLATFORM FOR CONTINUOUS DELIVERY
• SHARED PLATFORM FOR OBSERVABILITY
• DATA CATALOGUE
Wait, What About the OSS Tools for Data Mesh??
Challenges with Data Mesh
Challenges
• Cost questions
• Lack of end-to-end examples
• Effort required to shift from a centralized architecture to decentralization-friendly techniques
• Automation required to enable the creation of data products
• Underestimating the importance of organizational aspects
Considerations for Technology Choices
Examples of Data Mesh-friendly Technologies
• Data Catalogue, Data Lineage, Data Governance
• OSS Data Analytics, Data Ingestion, Streaming Processing, Data Querying
• Data Visualization and BI Tools
• Data Products for Data Analytics and Processing
• Cross-Platform Concepts and Tools
• Open Formats
• Cloud Storage (Amazon S3, Google Storage)
• OSS Storage
• Infrastructure Automation
• Data Experimentation
• Multi and Hybrid Cloud Tools (Anthos)
Data Governance Systems
• Metadata
• Data lineage
• Data schemas
• Data relationships
• Data classification
• Data security
• Data catalog
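To illustrate how several of these concerns fit together, here is a minimal sketch of an in-memory catalog that stores schemas and classifications as metadata and answers lineage questions by walking upstream relationships. The `Catalog` class and the dataset names are hypothetical, and a real governance system would persist this information and enforce access policies, which this toy does not attempt.

```python
from collections import defaultdict


class Catalog:
    """Toy in-memory catalog: metadata per dataset plus lineage edges."""

    def __init__(self):
        self.metadata = {}                # dataset name -> metadata dict
        self.upstream = defaultdict(set)  # dataset name -> direct sources

    def register(self, name, schema, classification, sources=()):
        """Record a dataset's schema, classification, and direct sources."""
        self.metadata[name] = {"schema": schema, "classification": classification}
        self.upstream[name].update(sources)

    def lineage(self, name):
        """Return every dataset `name` depends on, directly or transitively."""
        seen, stack = set(), list(self.upstream[name])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.upstream[current])
        return seen


catalog = Catalog()
catalog.register("drone_telemetry", {"drone_id": "string"}, "internal")
catalog.register("deliveries", {"delivery_id": "string"}, "confidential",
                 sources=["drone_telemetry"])
catalog.register("delivery_report", {"day": "date"}, "internal",
                 sources=["deliveries"])
print(sorted(catalog.lineage("delivery_report")))
# prints ['deliveries', 'drone_telemetry']
```

Keeping lineage as explicit edges means a question such as "which confidential sources feed this report?" becomes a lookup over the returned set.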
Open Formats
• Open standard
• Atomic updates, serializable isolation, transactions
• Concurrent operations
• Versioning, rollbacks, time-travel
• Schema Evolution
• Scale, Efficiency, Data Volumes
• Compatibility with existing data stores and languages
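The properties listed above (atomic commits, concurrent writers, versioning, rollback, time travel) can be pictured with a small snapshot log. The following is a rough sketch of the idea only, not how any particular open format works: `VersionedTable` and its methods are invented for this example, and everything lives in memory.

```python
class VersionedTable:
    """Toy append-only snapshot log; not a real table format."""

    def __init__(self):
        self._snapshots = [()]  # version 0 is the empty table

    @property
    def version(self):
        return len(self._snapshots) - 1

    def commit(self, new_rows, expected_version):
        """Add rows as one new snapshot, failing if another writer got in first."""
        if expected_version != self.version:
            raise RuntimeError("concurrent commit detected; retry on latest version")
        self._snapshots.append(self._snapshots[-1] + tuple(new_rows))
        return self.version

    def read(self, version=None):
        """Read the latest snapshot, or an older one (time travel)."""
        chosen = self.version if version is None else version
        return list(self._snapshots[chosen])

    def rollback(self, version):
        """Make an older snapshot current again by committing it as a new version."""
        self._snapshots.append(self._snapshots[version])
        return self.version


table = VersionedTable()
table.commit([{"id": 1}], expected_version=0)  # creates version 1
table.commit([{"id": 2}], expected_version=1)  # creates version 2
print(table.read(version=1))                   # prints [{'id': 1}]
table.rollback(1)                              # version 3 repeats version 1
print(table.read())                            # prints [{'id': 1}]
```

Because old snapshots are never modified, readers of an earlier version are unaffected by later commits, which is what makes time travel and rollback cheap in this model.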
Data Platforms (Cloud or OSS)
Multi-Cloud Infrastructure Management
• Terraform
Open-source infrastructure as code software tool that enables you to safely and
predictably create, change, and improve infrastructure.
• Pulumi
Open-source infrastructure as code SDK that enables you to create, deploy, and
manage infrastructure on any cloud, using your favorite languages.
• Crossplane
Assemble infrastructure from multiple vendors, and expose higher level self-service
APIs for application teams to consume, without having to write any code.
Multi-Cloud Workload Portability
• Azure Arc
Build cloud-native apps anywhere, at scale. Run Azure services in any Kubernetes
environment, whether it’s on-premises, multi-cloud, or at the edge
• Google Anthos
A modern application management platform that provides a consistent development
and operations experience for cloud and on-premises environments
Kubernetes Open-Standard Technologies
NOT AN EXHAUSTIVE LIST
Benefits Brought by Data Mesh
• Data Quality
• Tailored resource and focus allocation
• Organizational cohesion while allowing flexibility
• Reducing complexity
• Democratizing value creation
• Better understanding of value and innovation opportunities
• Enabling faster and more consistent change
Important Focus Areas for Technology Providers
• Open Standards, Open Protocols, Open-Source Integrations
• Workload sharing and multi-tenancy
• No-copy data and compute mobility support
• Granularity of access-control
• Richness of automation and extensibility capabilities
• Flexibility and elasticity
• Provider-agnostic/multi-cloud operations support
• Variety of limitations (quotas, data volume, resource count, etc.)
Data Mesh will drive better Interoperability, Open
Standards, and Data Quality in the Industry
Thank you!