Data Lakehouse, Data Mesh, and Data Fabric - SqlBits

The document provides an overview of various data architectures including Data Lakehouse, Data Mesh, and Data Fabric, highlighting their definitions, purposes, and use cases. It discusses the importance of data warehouses and lakes, their complementary roles, and the evolution towards modern data solutions that address challenges in data management and analytics. Additionally, it outlines the principles of Data Mesh and its potential integration with existing data solutions, emphasizing the need for a cultural shift and organizational readiness for successful implementation.


Data Lakehouse, Data Mesh, and Data Fabric
(the alphabet soup of data architectures)

James Serra
Data & AI Solution Architect
Microsoft
[email protected]
Blog: JamesSerra.com
About Me
▪ Microsoft, Data & AI Solution Architect in Microsoft Consulting Services (MCS), now called Industry
Solutions Delivery (ISD)
▪ At Microsoft for most of the last eight years, with a brief stop at EY
▪ Was previously a Data & AI Architect at Microsoft for seven years
▪ In IT for 35 years, worked on many BI and DW projects
▪ Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM
architect, PDW/APS developer
▪ Been perm employee, contractor, consultant, business owner
▪ Presenter at PASS Summit, SQLBits, Enterprise Data World conference, Big Data Conference
Europe, SQL Saturdays
▪ Blog at JamesSerra.com
▪ Former SQL Server MVP
▪ Author of book “Reporting with Microsoft SQL Server 2012”
Agenda

▪ Data Warehouse
▪ Data Lake
▪ Modern Data Warehouse
▪ Data Fabric
▪ Data Lakehouse
▪ Data Mesh
I tried to figure out all these data platform buzzwords on my own…

And it did not turn out well:

Let’s prevent that from happening…

The views in this deck are my own and not those of Microsoft!
What is a Data Warehouse and why use one?
(or, why do we need a copy of the source data?)
A data warehouse is where you store data from multiple data sources to be used for historical and trend analysis
reporting. It acts as a central repository for many subject areas and contains the "single version of truth". It is
NOT to be used for OLTP applications.

Reasons for a data warehouse:

▪ Reduce stress on production system
▪ Optimized for read access, sequential disk scans
▪ Integrate many sources of data
▪ Keep historical records (no need to save hardcopy reports)
▪ Restructure/rename tables and fields, model data
▪ Protect against source system upgrades
▪ Use Master Data Management, including hierarchies
▪ No IT involvement needed for users to create reports
▪ Improve data quality and plug holes in source systems
▪ One version of the truth
▪ Easy to create BI solutions on top of it (e.g., Power BI tabular model)
▪ Don’t need to provide security access for many users to the production systems
▪ Make better business decisions by getting greater insights into your company
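
To make a few of these reasons concrete (integrating many sources, modeling the data, historical and trend reporting), here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are illustrative assumptions, not taken from the deck, and a real warehouse would use a dedicated engine rather than SQLite.

```python
import sqlite3

# Hypothetical rows from two operational systems (illustrative data only).
web_orders = [("2023-01-15", "widget", 3, 30.0), ("2024-02-10", "widget", 5, 50.0)]
store_orders = [("2023-06-01", "gadget", 1, 20.0), ("2024-03-05", "widget", 2, 20.0)]


def build_warehouse(conn):
    """Create a tiny star schema: one dimension table and one fact table."""
    conn.execute(
        "CREATE TABLE dim_product ("
        "product_key INTEGER PRIMARY KEY, product_name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE fact_sales (order_date TEXT, product_key INTEGER, "
        "source_system TEXT, quantity INTEGER, amount REAL)"
    )


def load(conn, rows, source_system):
    """Integrate rows from one source system into the shared model."""
    for order_date, product, quantity, amount in rows:
        # Add the product to the dimension only if it is not there yet.
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (product_name) VALUES (?)", (product,)
        )
        (key,) = conn.execute(
            "SELECT product_key FROM dim_product WHERE product_name = ?", (product,)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
            (order_date, key, source_system, quantity, amount),
        )


def yearly_sales(conn):
    """Trend query: total sales amount per year across all sources."""
    return conn.execute(
        "SELECT substr(order_date, 1, 4) AS year, SUM(amount) "
        "FROM fact_sales GROUP BY year ORDER BY year"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
load(conn, web_orders, "web")
load(conn, store_orders, "store")
print(yearly_sales(conn))  # [('2023', 50.0), ('2024', 70.0)]
```

The reporting query never touches the source systems, which is the "reduce stress on production system" point in miniature.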

Why You Need a Data Warehouse

Two Approaches to getting value out of data: Top-Down + Bottoms-Up
[Figure: the top-down path runs Theory → Hypothesis → Observation → Confirmation; the bottom-up path runs Observation → Pattern → Hypothesis → Theory]
▪ Descriptive Analytics: What happened?
▪ Diagnostic Analytics: Why did it happen?
▪ Predictive Analytics: What will happen?
▪ Prescriptive Analytics: How can we make it happen?
Data Warehousing Uses A Top-Down Approach
[Figure: Understand Corporate Strategy → Gather Requirements → Implement Data Warehouse, with Data sources feeding the warehouse]
▪ Gather Requirements: Business Requirements, Technical Requirements
▪ Implement Data Warehouse: Reporting & Analytics Design → Reporting & Analytics Development; Dimension Modelling → Physical Design; ETL Design → ETL Development; Setup Infrastructure → Install and Tune
The “data lake” Uses A Bottoms-Up Approach
▪ Ingest all data regardless of requirements
▪ Store all data in native format without schema definition
▪ Do analysis using analytic engines like Hadoop
[Figure labels: Devices, Batch queries, Interactive queries, Real-time analytics, Machine Learning, Data warehouse]
Data Lake + Data Warehouse Better Together
[Figure label: Data sources]
▪ Descriptive Analytics: What happened?
▪ Diagnostic Analytics: Why did it happen?
▪ Predictive Analytics: What will happen?
▪ Prescriptive Analytics: How can we make it happen?
What is a data lake and why use one?
A schema-on-read storage repository that holds a vast amount of raw data in its native format until it is needed.

Reasons for a data lake:

• Inexpensively store unlimited data

• Centralized place for multiple subjects (single version of the truth)
• Collect all data “just in case” (data hoarding). The data lake is a good place for data that you “might” use down the road
• Easy integration of differently-structured data
• Store data with no modeling – “Schema on read”
• Complements enterprise data warehouse (EDW)
• Frees up expensive EDW resources for queries instead of using EDW resources for transformations (avoiding user contention)
• Use technologies/tools (e.g., Databricks) that refine/filter data quicker or better than your EDW
• Quick user access to data for power users/data scientists (allowing for faster ROI)
• Data exploration to see if data is valuable before writing ETL and schema for a relational database, or use for a one-time report
• Allows use of Hadoop tools such as ETL and extreme analytics
• Place to land IoT streaming data
• Online archive or backup for data warehouse data (e.g., keep three years of data in the DW and keep older data in the data lake with an external table pointing to it)
• With Hadoop/ADLS, high availability and disaster recovery built in
• It can ingest large files quickly and provide data redundancy
• ELT jobs on EDW are taking too long because of increasing data volumes and increasing rate of ingesting (velocity), so offload some of them to the Hadoop data lake
• Have a backup of the raw data in case you need to load it again due to an ETL error (and not have to go back to the source). You can keep a long history of raw data
• Allows for data to be used many times for different analytic needs and use cases
• Cost savings and faster transformations: storage tiers with lifecycle management; separation of storage and compute resources allowing multiple instances of different
sizes working with the same data simultaneously vs scaling data warehouse; low-cost storage for raw data saving space on the EDW
• Extreme performance for transformations by having multiple compute options each accessing different folders containing data
• The ability for an end-user or product to easily access the data from any location
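
As a rough illustration of “schema on read”, the sketch below stores raw JSON lines untouched and applies a schema only when a consumer reads them. The field names, the sample events, and the `read_with_schema` helper are hypothetical; in practice an engine such as Spark would play this role.

```python
import json

# Raw events landed in the lake exactly as received (illustrative data only).
raw_lines = [
    '{"device": "t-1", "temp_c": "21.5", "ts": "2024-01-01T00:00:00"}',
    '{"device": "t-2", "temp_c": "19.0", "ts": "2024-01-01T00:05:00", "firmware": "1.2"}',
    '{"device": "t-3", "ts": "2024-01-01T00:10:00"}',
]

# The schema belongs to the reader, not to the stored data.
temperature_schema = {"device": str, "temp_c": float}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and apply a schema at read time.

    Fields outside the schema are ignored; records missing a schema
    field are skipped instead of failing the whole read.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        if not all(name in record for name in schema):
            continue
        rows.append({name: cast(record[name]) for name, cast in schema.items()})
    return rows


readings = read_with_schema(raw_lines, temperature_schema)
print(readings)
# [{'device': 't-1', 'temp_c': 21.5}, {'device': 't-2', 'temp_c': 19.0}]
```

A different consumer can read the same raw lines with a different schema, for example `{"device": str, "ts": str}`, which is the "data used many times for different analytic needs" point above.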
Data Warehouse
Serving, Security & Compliance
• Business people
• Low latency
• Complex joins
• Interactive ad-hoc query
• High number of users
• Additional security
• Large support for tools
• Dashboards
• Easily create reports (Self-service BI)
• Known questions
Enterprise Data Maturity Stages
[Figure: four stages on a spectrum from “rear-view mirror” to “real-time intelligence”]
▪ STAGE 1: Reactive. Structured data is transacted and locally managed. Data used reactively
▪ STAGE 2: Informative. Structured data is managed and analyzed centrally and informs the business
▪ STAGE 3: Predictive. Data capture is comprehensive and scalable and leads business decisions based on advanced analytics
▪ STAGE 4: Transformative. Data transforms business to drive desired outcomes. Any data, any source, anywhere at scale
Modern Data Warehouse
Data Fabric
Data Fabric adds to a modern data warehouse:
• Data access
• Data policies
• Metadata catalog/Lineage
• Master Data Management (MDM)
• Data virtualization
• Real-time processing
• Data scientist tools
• APIs
• Building blocks/Services
• Products

Bottom line: Additional technology to source more data, secure it, and make it available
Data Fabric defined
Data Lakehouse
Delta Lake
Top features:

• ACID transactions
• Time travel (data versioning enables rollbacks, audit trail)
• Streaming and batch unification
• Schema enforcement
• Upserts and deletes (MERGE)
• Performance improvement
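
To show what upserts, all-or-nothing commits, and time travel mean in practice, here is a toy in-memory sketch. It is not the Delta Lake API (Delta Lake implements these features with a transaction log over Parquet files); the `VersionedTable` class and its methods are hypothetical names used only for illustration.

```python
import copy


class VersionedTable:
    """Toy in-memory table that keeps every committed snapshot."""

    def __init__(self, key):
        self.key = key
        self._versions = [{}]  # version 0 is the empty table

    @property
    def latest_version(self):
        return len(self._versions) - 1

    def merge(self, rows):
        """Upsert rows by key and commit them as one new version.

        The new snapshot is built on a copy, so a failure part-way
        through leaves the committed versions untouched.
        """
        snapshot = copy.deepcopy(self._versions[-1])
        for row in rows:
            snapshot[row[self.key]] = dict(row)
        self._versions.append(snapshot)
        return self.latest_version

    def delete(self, key_value):
        """Remove one row by key and commit the result as a new version."""
        snapshot = copy.deepcopy(self._versions[-1])
        snapshot.pop(key_value, None)
        self._versions.append(snapshot)
        return self.latest_version

    def read(self, version=None):
        """Return rows as of a version (time travel); default is the latest."""
        if version is None:
            version = self.latest_version
        return sorted(self._versions[version].values(), key=lambda r: r[self.key])


table = VersionedTable(key="id")
v1 = table.merge([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
v2 = table.merge([{"id": 2, "name": "B"}, {"id": 3, "name": "c"}])  # update + insert
v3 = table.delete(1)

print(table.read())            # latest: ids 2 and 3
print(table.read(version=v1))  # as of the first commit: ids 1 and 2
```

Keeping old snapshots is what makes rollbacks and audit trails possible; the cost is extra storage, which real systems manage by cleaning up old versions.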

Databricks Delta Lake

Use cases for Data Lakehouse
Today’s data architectures commonly suffer from four problems:

• Reliability: Keeping the data lake and warehouse consistent

• Data staleness: Data in warehouse is older
• Limited support for advanced analytics: Top ML systems don’t
work well on warehouses
• Total cost of ownership: Extra cost for data copied to warehouse

Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
Concerns with skipping the relational database
• Speed: Relational databases faster, especially Massively Parallel Processing
(MPP)
• Security: No RLS, column-level, dynamic data masking
• Complexity: Metadata separate from data, file-based world
• Concurrency: Multiple reads of a file at the same time
• Missing features: Referential integrity, TDE, workload management; other
features lock you into Spark
• Having to use Spark SQL instead of T-SQL
• People used to using a relational database

Azure Synapse: data-lake-only solutions are starting to appear because you can use T-SQL and
Power BI (speed, RLS), with cost savings from Serverless
Data Lakehouse & Synapse
Data Mesh
Data Mesh in theory
Lots of things sound great in theory…
Data Mesh - Overview

Data mesh is an intentionally designed distributed data architecture, under centralized governance and standardization for
interoperability, enabled by a shared and harmonized self-serve data infrastructure

Data Mesh Principles

#1) Domain Ownership
Decentralize and distribute responsibility to people who are closest to the data in order to support continuous change and scalability (e.g., manufacturing, sales, supplier)

#2) Data as a product
Analytical data provided by the domains is treated as a product and the consumers of that data are treated as customers (domain teams, API code, data and metadata, infrastructure)

#3) Self-serve data infrastructure as a platform
High-level abstraction of diverse infrastructure that removes complexity and friction of provisioning and managing the lifecycle of data products (e.g., storage, compute, data pipeline, access control)

#4) Federated computational governance
Architect global decisions and standards for interoperability, while respecting autonomy of local domains, and implement global policies effectively (e.g., data quality, data security, regulations, data modeling)
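
One way to picture how “data as a product” and “federated computational governance” fit together: each domain publishes a small descriptor for its data product, and a set of global policies is checked automatically against every descriptor. The sketch below is only an illustration; the `DataProduct` fields and the three policies are assumptions, not part of any data mesh standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its data product."""

    name: str
    domain: str
    owner: str
    schema: dict = field(default_factory=dict)
    classification: str = ""


# Global policies defined once and applied to every domain (illustrative rules).
GLOBAL_POLICIES = {
    "has an owner": lambda p: bool(p.owner),
    "publishes a schema": lambda p: len(p.schema) > 0,
    "declares a classification": lambda p: p.classification
    in {"public", "internal", "confidential"},
}


def check_governance(product):
    """Return the names of the global policies the product violates."""
    return [name for name, rule in GLOBAL_POLICIES.items() if not rule(product)]


orders = DataProduct(
    name="orders",
    domain="sales",
    owner="sales-data-team",
    schema={"order_id": "int", "amount": "decimal"},
    classification="internal",
)
suppliers = DataProduct(name="suppliers", domain="supplier", owner="")

print(check_governance(orders))     # []
print(check_governance(suppliers))  # all three policies fail
```

The domains keep ownership of their products, while the policy set stays global, which is the balance principle #4 describes.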
Data Mesh

Data Mesh is a concept, not a product

It’s a mindset shift where you go from:

• Centralized ownership to decentralized ownership
• Pipelines as first-class concern to domain data as first-class concern
• Data as a by-product to data as a product
• A siloed data engineering team to cross-functional domain-data teams
• A centralized data lake/warehouse to an ecosystem of data products

Credit to Zhamak Dehghani, Slack: data-mesh-learning.slack.com

Use cases for Data Mesh
Data mesh tries to solve four challenges with a centralized data lake/warehouse:

• Lack of ownership: who owns the data – the data source team or the infrastructure team?
• Lack of quality: the infrastructure team is responsible for quality but does not know the data
well
• Organizational scaling: the central team becomes the bottleneck, such as with an enterprise
data lake/warehouse
• Technical scaling: current big data solutions can’t keep up with additional data requirements
Data Mesh – Logical Architecture
Domain design can be a long and complicated process!
[Figure labels: Data Sources; Manufacturing (source-aligned), Sales (source-aligned), Supplier (source-aligned); Customer 360; Supplier; P&L; (aggregate); (consumer-aligned); Consumers]
Concerns with Data Mesh
• No standard definition of a data mesh
• Huge investment in organizational change and technical implementation
• Performance of combining data from multiple domains
• Duplication of data for performance reasons
• Getting quality engineering people for each domain
• Inconsistent technical implementations for the domains
• Domains don’t want to wait for a data mesh
• Need incentives for each domain to counter extra work
• Self-serve approach of data requests could be challenging
• Duplication of data and ingestion platform
• Creation of data silos for domains not able to join data mesh
• Not seeing the big picture for combining data

Data Mesh: Centralized vs decentralized data architecture

Data Mesh: Centralized ownership vs decentralized ownership
Key for a successful Data Mesh
• Have current pain points
• A company culture open to change
• Experienced people
• Be aware of Data Mesh concerns
• Don’t just jump on the latest buzzword
• Don’t listen to vendors
• Don’t go strictly “by the data mesh book”
• Have a very long runway
Should you adopt data mesh today?

Need to score medium or high in ALL categories

From book “Data Mesh – Delivering Data-Driven Value at Scale” by Zhamak Dehghani
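
The “medium or high in ALL categories” rule can be written down as a small check. The category names in the example are placeholders, since the assessment figure itself is not reproduced in this text; only the pass rule comes from the slide.

```python
PASSING = {"medium", "high"}


def ready_for_data_mesh(scores):
    """Return (ready, failing_categories).

    ready is True only when every category scores medium or high.
    """
    if not scores:
        raise ValueError("at least one category score is required")
    failing = sorted(
        category for category, score in scores.items() if score.lower() not in PASSING
    )
    return (not failing, failing)


# Hypothetical category names, for illustration only.
assessment = {
    "organizational complexity": "high",
    "data-oriented strategy": "medium",
    "executive support": "low",
}
print(ready_for_data_mesh(assessment))  # (False, ['executive support'])
```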
Data Mesh on Azure
Enterprise Scale Analytics and AI (ESA)
Enterprise-scale is an architecture approach and reference implementation that enables effective construction and operationalization of landing
zones on Azure, at scale and aligned with Azure Roadmap and Cloud Adoption Framework.

What is Enterprise Scale Analytics and AI?

A scalable analytics framework designed to enable customers to build a data platform.

• Supports multiple topologies ranging across Data Centric, Lakehouse, Data Fabric and Data Mesh
• Based on inputs from PG and a diverse international group of specialists working with a range of customers
• Separate guidance tailored to Small-Medium and Large enterprises
• ~80% prescribed viewpoint with 20% client customization

Enterprise Scale Landing Zones is a prerequisite for Enterprise Scale Analytics since it is built on the core foundation of Enterprise Scale Landing
Zones. Consisting of:
• Prescriptive architecture
• Designed by Subject Matter Experts
• Documented End to End Technical Solution
• Deployment Templates
• Operational Usage Model
Data Mesh on Azure Resources
• Piethein Strengholt: Blog - Implementing Data Mesh on Azure (public), Blog – Data Mesh topologies
(public), Book - Data Management at Scale: Best Practices for Enterprise Architecture (public)
• Cloud Adoption Framework: Azure data management and analytics scenario (public)
• Data Management & Analytics Scenario - Data Management Zone: Github (public)
• Data Management & Analytics Scenario - Data Landing Zone: Github (public)
• Enterprise-Scale - Reference Implementation: Github (public)
• Microsoft doc: A financial institution scenario for data mesh (public)
Governance Topologies: Different Approaches
[Figure: spectrum from Centralised (Control) to Distributed (Agility)]

Mesh Type 1
• Domains use the same technology
• Data is kept in one enterprise data lake with each domain getting its own container/folder

Mesh Type 2
• Domains use the same technology
• Each domain has its own storage that is the same technology

Mesh Type 3
• Domains can use any technology they want
• Each domain has its own storage that can be any technology
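
The three mesh types differ on just two questions: do the domains share a technology, and do they share one enterprise data lake? A minimal sketch of that decision, with a hypothetical function name and parameter names:

```python
def mesh_type(same_technology, shared_enterprise_lake):
    """Classify a topology as Mesh Type 1, 2, or 3 from the two questions above."""
    if same_technology and shared_enterprise_lake:
        return 1  # one enterprise data lake, a container/folder per domain
    if same_technology:
        return 2  # each domain has its own storage, all on the same technology
    if shared_enterprise_lake:
        # This combination is not one of the three types described here.
        raise ValueError("mixed technologies on one shared lake is not a listed type")
    return 3  # each domain has its own storage on any technology


print(mesh_type(same_technology=True, shared_enterprise_lake=True))    # 1
print(mesh_type(same_technology=True, shared_enterprise_lake=False))   # 2
print(mesh_type(same_technology=False, shared_enterprise_lake=False))  # 3
```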
Data Fabric vs Data Mesh
If Data Fabric uses data virtualization, how is it different from Data Mesh?

• Usually only some of the data is virtualized, so still mostly centralized

• Not treating data as a product (no contract with domains)
• Still have siloed data engineering team in IT

Or more specifically, violates 3 of the 4 Data Mesh principles:

1) Domain ownership
2) Data as a product
3) Self-serve data infrastructure as a platform
4) Federated computational governance
Future
This view is my own and not that of Microsoft!

In the end, I predict data mesh will become an extension to a centralized data solution for a small percentage of solutions.

There will be a very small percentage of solutions that are 100% true to the data mesh concept (assuming mesh type 1 and 2 are true to
the data mesh concept). Ask ten people what a data mesh is and you will get eleven answers!

Always ask: What do your users need?

[Figure: a Centralized Data Solution alongside Domain A and Domain B, with the Data Mesh principles annotated:]
1) Domain ownership (90%)
2) Data as a product (50%)
3) Self-serve data infrastructure as a platform (1%)
4) Federated computational governance (25%)

Rethinking the Data Mesh Architecture: Apply it Piecemeal (eckerson.com)

Feedback:

https://sqlb.it/?7106

Q&A
James Serra, Microsoft, Data & AI Solution Architect
?
Email me at: [email protected]
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com
Comparisons of Data Fabric and Data Mesh

Framework
• Data Mesh: Focus on data architecture
• Data Fabric: Focus on data architecture, semantic consumption, through the wide use of Ontologies

Governance
• Data Mesh: Multiple governance layers
• Data Fabric: Unified governance layer

Security
• Data Mesh: Data Products owning the domain data and applying security and governance applicable to the domain
• Data Fabric: Focuses on a comprehensive Unified Security model across the entire Data Ecosystem

Consistency
• Data Mesh: Complex mechanics to ensure consistency of data
• Data Fabric: Focused on enabling and ensuring trust by applying automatic consistency

Implementation
• Data Mesh: Is complex, even to start a small implementation due to the need of understanding and segregating domain data
• Data Fabric: By far simpler, due to the inherent use of Data Virtualization, meta data and knowledge graphs
