Data Driven Transformations Fabric
Piethein Strengholt
CDO Microsoft Netherlands
[Figure: a data marketplace sits between source-oriented domains and consumption-oriented domains. On the source side, each data provider publishes a data product. On the consumption side, each data consumer applies a consumer-specific transformation to serve its use cases.]
• Teams operate independently, treating data as a product with clear goals.
• Teams, also known as product teams, squads, pods, cells, or agile teams.
• Teams handle their capability, business processes, value addition, and expertise.
• In line with Agile and Scrum, teams efficiently manage changes (DevOps).

• Teams manage their applications and data.
• Teams handle data ingestion and processing in data platforms.
• Teams ensure the quality of their data products and managed data.
• Teams contribute vital information to the data catalog.
• Teams oversee their use cases and related items like reports, dashboards, models, and new data products.
Zoom out and look at what your organization does: most companies
organize themselves by business capabilities, workflows or journeys
Terminal and landside management
• Customer and information management
• Airport and lounges management
• Safety and incident management
• Advertising and marketing
• Ground and operations management
• Asset management
• Real estate management
• Facilities management
[Figure: domain architecture built on standard platform services (data integration services, data product services). Operational systems are ingested as events or in batch through a Blob landing zone into tables in three layers, then served as data products in the Gold layer through virtualized access. Tools shown: Great Expectations, DBT, Azure Databricks. Governance uses Microsoft Purview: DQ results are logged, lineage is published, and data quality reporting is available. Roles shown: domain owner, application owner, data owner, data product owner, data steward, DevOps engineer, data analyst, data engineer. Orchestration uses Data Factory with a policy config store.]
• Bronze layer: typically, raw and different file formats
• Silver layer: typically, cleansed and standardized file formats
• Gold layer: typically, data that is ready for consumption
Medallion architecture
Although the three-layered design is common and well known, there is much
discussion about the scope, purpose, and best practices for each of these layers.
Bronze layer: typically raw, "as-is"
• Maintains the raw state in the structure "as-is"
• Data is immutable (read-only)
• Delivery-based partitioned tables, i.e., YYYYMMDD
• Mostly Parquet or Delta. Sometimes other formats
• Can be any combination of streaming and batch transactions
• May include extra metadata (schema)
• May be fed from a "mediation layer"
• Used for debugging, testing

Silver layer: cleaned, filtered
• Uses data quality rules for validation
• Usually only functional data
• Historization is merged (SCD2)
• Efficient storage format; Delta
• Versioning for rolling back
• Handles missing or incorrect data
• Usually enriched with reference data
• Source-oriented, although queryable and clustered around subject areas
• Usually used by operational analytical teams

Gold layer: refined, business-level
• What enterprises call data products: consumer-ready / user-friendly data
• Data is highly governed and well-documented
• Historization is applied only for the set of use cases or consumers
• Contains complex business rules, such as calculations and enrichments
• Efficient storage format; Delta
• Versioning for rolling back
• Might contain additional sub layers for sharing or distributing data
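The layer responsibilities above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the flight and bag fields, the sample deliveries, and the function names load_bronze, build_silver, and build_gold are hypothetical. A real implementation would more likely use Delta tables on a Spark or Fabric runtime.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw deliveries; field names and values are illustrative only.
RAW_DELIVERIES = [
    {"delivery": date(2024, 1, 1), "rows": [
        {"flight": "AB100", "bags": "120"},
        {"flight": "AB101", "bags": "n/a"},  # fails the quality rule below
    ]},
    {"delivery": date(2024, 1, 2), "rows": [
        {"flight": "AB100", "bags": "95"},
    ]},
]


def load_bronze(deliveries):
    """Bronze: keep every delivery as-is under a YYYYMMDD partition key."""
    bronze = {}
    for delivery in deliveries:
        key = delivery["delivery"].strftime("%Y%m%d")
        # A tuple of copies signals that bronze data is treated as read-only.
        bronze[key] = tuple(dict(row) for row in delivery["rows"])
    return bronze


def build_silver(bronze):
    """Silver: validate and standardize; rejected rows are kept aside."""
    valid, rejected = [], []
    for partition, rows in sorted(bronze.items()):
        for row in rows:
            if row["bags"].isdigit():  # simple data quality rule
                valid.append({"flight": row["flight"],
                              "bags": int(row["bags"]),
                              "delivery": partition})
            else:
                rejected.append({**row, "delivery": partition})
    return valid, rejected


def build_gold(silver):
    """Gold: apply a business-level calculation (total bags per flight)."""
    totals = defaultdict(int)
    for row in silver:
        totals[row["flight"]] += row["bags"]
    return dict(totals)


bronze = load_bronze(RAW_DELIVERIES)
silver, rejected = build_silver(bronze)
gold = build_gold(silver)
```

Keeping rejected rows in their own collection, rather than dropping them, mirrors the idea that the Silver layer handles missing or incorrect data while Bronze stays available for debugging.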
[Figure: three governance topologies, each drawn with contextual boundaries between providers and consumers: (1) manufacturing and supply chain, (2) typical multinational companies, (3) combined or hybrid approach. Elements shown include a guild (a community for sharing knowledge and best practices), a data governance board or guild, governance and security, platform owner, platform engineers, data engineers, data consultants, data advisors, data governance lead, data owner, data steward, policy enforcement, shared blueprints, policies and security, Purview metadata, central logging, and group policies.]
Step 1: Define business strategy and outcomes
Step 2: Identify underlying solutions
Step 3: Define new or refined use case requirements
Step 4: Plan and execute
[Figure: a business domain (customer journey, product, or core business process) has core business outcomes / goals. Solutions, that is, applications providing value, add value toward those goals. Each solution is typically composed of several use cases: use cases 1.1 to 1.3 for the first application, through use cases 4.1 to 4.3 for the fourth.]
Mapping use cases by showcasing feasibility and business value
[Figure: use case map. Examples shown: noise monitoring, intelligent airport maintenance, real-time baggage tracking.]
2. Identify the genuine and unique data within all golden source
systems because systems often enrich themselves with data
from other systems.
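One way to make this step concrete is an attribute inventory that records, for each system, whether an attribute is created there or copied from elsewhere. The sketch below assumes such an inventory exists. The system names, attribute names, and the functions golden_sources and unique_data are hypothetical.

```python
# Hypothetical attribute inventory: for every system, where each attribute
# came from. None means the system creates the attribute itself; a string
# names the upstream system it was copied from.
ATTRIBUTE_ORIGINS = {
    "crm": {"customer_name": None, "customer_email": None,
            "loyalty_tier": "loyalty"},
    "loyalty": {"loyalty_tier": None, "customer_name": "crm"},
    "billing": {"invoice_total": None, "customer_email": "crm"},
}


def golden_sources(origins):
    """Return {attribute: system} for attributes a system originates itself."""
    result = {}
    for system, attributes in origins.items():
        for attribute, copied_from in attributes.items():
            if copied_from is None:
                if attribute in result:
                    # Two systems claim the same attribute: needs a decision.
                    raise ValueError(
                        f"{attribute!r} claimed by {result[attribute]!r} "
                        f"and {system!r}")
                result[attribute] = system
    return result


def unique_data(origins, system):
    """List only the attributes that a system should publish as its own."""
    return sorted(attribute
                  for attribute, owner in golden_sources(origins).items()
                  if owner == system)
```

Under these assumptions, unique_data(ATTRIBUTE_ORIGINS, "loyalty") returns only loyalty_tier, because customer_name was enriched from the CRM system.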
[Figure: operational systems feed tables in OneLake, which are consumed in Power BI.]
[Figure: operating model spanning an enterprise level and a domain level. Labels shown: program, quarterly alignment, business, platform leads, domain, data, feedback.]
Domain level:
• Data steward: support role; coordination, sharing knowledge
• Data owner: business user; accountable for DQ, business terms
• Application owner: manages pipeline; accountable for technical issues
Enabling domain:
• Data services: catalog, metadata databases, governance tools
• Data advisors: advise, support, best practices
• Platform engineers: develop, train, coach
• Chief Data Officer: sets the overall vision and strategy
• Architects: set standards; oversee and guide; control, enforce
[Figure: domain architecture with a data management landing zone. Data is ingested with Azure Data Factory through a Blob landing zone into a Bronze layer (ADLS gen2) and a Silver layer (ADLS gen2), then into a Warehouse Gold layer for domain-specific use cases and for sharing to other domains. Data processing runs on Azure Databricks with Delta Live Tables and is governed through Unity Catalog. Consumption paths: Databricks SQL (ad-hoc, debugging), AutoML, citizen data science and ad-hoc SQL, DirectQuery, and Import mode (high performance), with Microsoft Fabric OneLake (Workspace). Roles shown: governance advisor, platform engineer, data steward, application owner, data owner, data product owner, business user, data scientist. Orchestration uses Azure Data Factory.]
What is the maturity after the next phases?
Expect the following maturity and user experience
After the foundation has been put in place, expect the following (improved) maturity and user experience:
• The new governance and operating models are defined. These include new roles and responsibilities.
• Domain teams own their data lifecycle from start to finish: from the application that creates the original data to the
data products that are consumed by other teams. Throughout the lifecycle, domain teams own all data models and
determine what data is suitable for sharing with others.
• Data product development is supported with tooling and data modelling best practices. Guidance is published in
wikis for better collaboration.
• Development teams have visibility into pipelines and how they impact downstream consumers. Developers are
supported by data quality services, which proactively alert users. So when data breaks, all involved users are informed.
• Manual processes and pipelines are replaced with templates and services. Instead of hardcoding locations and
parameters, developers store their information as source code in version control. These repositories may include other
artifacts as well, such as test and deployment scripts, libraries/packages, configuration information, and so on.
• Data access policies, which specify who can access what data products, are stored as code or configuration. At this
stage, I still expect your central team to be in full control of provisioning data access.
• You still use one single landing zone. There's limited variation of what services you offer. All of your domain teams
use the same blueprint configuration(s).
• There's a consistent metamodel that uses a data product and data domain. This metamodel is managed in a data
governance service and ensures that all domain teams know which data products are owned by which domains.
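Two of the points above, access policies stored as configuration and a metamodel linking data products to domains, can be sketched as follows. This is a minimal illustration, not a description of any specific governance tool. The classes DataProduct and Metamodel, the function is_allowed, and the product and domain names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str  # the owning data domain


@dataclass
class Metamodel:
    """Minimal registry: which domain owns which data product."""
    products: dict = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        if product.name in self.products:
            raise ValueError(f"data product {product.name!r} already registered")
        self.products[product.name] = product

    def owner_of(self, product_name: str) -> str:
        return self.products[product_name].domain


# Access policies kept as plain configuration that can live in version control.
ACCESS_POLICIES = [
    {"product": "baggage_events", "consumer": "ground_operations", "access": "read"},
    {"product": "baggage_events", "consumer": "customer_information", "access": "read"},
]


def is_allowed(policies, product, consumer, access="read"):
    """Deny by default; allow only when an explicit policy entry matches."""
    return any(
        p["product"] == product
        and p["consumer"] == consumer
        and p["access"] == access
        for p in policies
    )


model = Metamodel()
model.register(DataProduct(name="baggage_events", domain="ground_operations"))
```

Because the policies are plain data, a central team can review changes to them through the same version control process used for pipeline code.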
Example model for data literacy curriculum
Organizations should start to build a curriculum to develop data literacy across their organizations. Data literacy is a
fundamental skill that helps individuals to make informed decisions, solve complex problems, and communicate effectively
with others.
Foundational for everyone: fundamentals of corporate data strategy, benefits of data management and analytics. Suggested as optional for employees that don't directly work with information.
Responsibilities of data user:
• Support on closing gaps and data quality issues
• Share requirements
• Provide feedback
• Support fixing constraints on usage
• Deliver data usage information for data contract
• Share AI use cases
• Share knowledge with data owners and other data users
[Figure: domain architecture on Microsoft Fabric. Operational systems handle commands and strong consistent reads behind an API gateway with GraphQL, alongside an analytical read store for consuming. Data is copied using Data Factory from operational systems through a Blob landing zone into a landing area (temporary storage location), then into Lakehouse layers: Bronze (typically, raw and different file formats), Silver (typically, cleansed and standardized file formats), and Lakehouse Gold for distribution to other domains. A Warehouse Gold serves domain-specific use cases. Orchestration uses Data Factory.]
Now that we have high-quality data products from our providing domains,
what should we do about our consumer domains?
[Figure: a data provider and a data consuming domain inside a larger data domain. Data sources feed a data platform instance that holds raw, latest, and historical data and publishes highly reusable data products. The consuming domain (decentralized ownership) aggregates data products and can publish newly created data as a data product of a new data domain.]
What is the maturity after the last phase?
Expect the following maturity and user experience
After the foundation has been put in place, expect the following (improved) maturity and user experience:
• Your architecture supports different patterns for data distribution and application integration. Preferably, guidance for
data products, APIs, and events would be aligned.
• New data domains can be onboarded quickly.
• Guidance has been formalized, playbooks have been created, and teams are trained.
• Governance and access control policies are automated. For this, consider setting up a small workflow, an application,
and programmatic access APIs for registration.
• Data usage and processes for turning data into value are standardized. Updated blueprint templates are offered to
your domain teams, from which they can choose what to use for individual use cases.
• Master data management services have been added to your architecture for ensuring end-to-end and cross-domain
consistency.
• Interaction between stakeholders is optimized through data governance bodies. Teams come together on a periodic
basis for triage and overall planning.
• Data products have multiple endpoints, also known as output ports, to cater for a large variety of use cases.
• Extra landing zones or domains can be easily added.
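The bullets on automated access control and output ports can be illustrated with a small registration workflow. This is a sketch under assumptions: the Registry class, its methods, the port names, and the locations are hypothetical and do not correspond to any particular product's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class RequestStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DataProduct:
    name: str
    owner: str
    output_ports: dict = field(default_factory=dict)  # port type -> location


@dataclass
class AccessRequest:
    product: str
    port: str
    consumer: str
    status: RequestStatus = RequestStatus.PENDING


class Registry:
    """Hypothetical registration service for products and access requests."""

    def __init__(self):
        self.products = {}
        self.requests = []

    def register_product(self, product):
        self.products[product.name] = product

    def request_access(self, product_name, port, consumer):
        product = self.products[product_name]
        if port not in product.output_ports:
            raise KeyError(f"{product_name!r} has no output port {port!r}")
        request = AccessRequest(product_name, port, consumer)
        self.requests.append(request)
        return request

    def decide(self, request, approver, approve):
        # Only the owner of the data product may approve or reject.
        if approver != self.products[request.product].owner:
            raise PermissionError("only the data product owner may decide")
        request.status = (RequestStatus.APPROVED if approve
                          else RequestStatus.REJECTED)
        return request
```

A usage example with illustrative values:

```python
registry = Registry()
registry.register_product(DataProduct(
    name="flight_delays",
    owner="ground_operations",
    output_ports={"table": "gold.flight_delays", "events": "flight-delays-topic"},
))
request = registry.request_access("flight_delays", "table", "customer_information")
registry.decide(request, approver="ground_operations", approve=True)
```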
Think big, start small, scale fast. An implementation roadmap that
supports business transformation
(Strategy) Assessment
• Assessment of current state (architecture, services, deployments, projects)
• Baseline architecture
• Harvest business use cases
• Evolution to future state (leverage reference architectures)
• Evaluate transition scenarios
• Gather recommendations
• Define future program board and governance
• Start up communication
• Gather top-down commitment

Launch first business use cases
• Implement first set of small data & AI use cases
• Define foundation: domain zones, lake templates and other templates.
• Deploy (data management) landing zones
• Define data product definitions
• Setup structure of catalog
• Implement first set of controls (data quality, schema validation)
• Setup 'just-enough' governance

Increase maturity
• Demonstrate tangible results
• Realize consuming pipeline (taking input as output)
• Establish data ownership
• Host first data governance body sessions
• Define next metadata requirements
• Implement data consumption processes
• Setup a data contract repository
• Define your target data quality and master data management plans

Add more business domains
• Onboard more use cases, deploy more data product services
• Enable self-service and data discovery
• Offer additional transformation patterns (metadata-driven transformation framework, ETL tools, etc.)
• Enrich controls for Governance (glossary, lineage, linkage)
• Implement consuming process: approvals, use case metadata, deploy secure views by hand
• Establish data governance control board

Scale throughout the organization
• Apply automation: automatic secure view provisioning
• Deploy strong data governance, setup dispute body
• Finalize data product guidelines
• Define additional interoperability standard
• Automated data consumption processes
• Develop data query, self-service, catalogue, lineage capabilities, etc.
• Data usage dashboards
• Develop additional data marketplace capabilities.
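The first set of controls named in the roadmap (data quality, schema validation) and the later data contract repository can be illustrated together. The sketch below assumes a contract is a simple mapping of column names to expected types. The CONTRACT contents and the functions validate_row and validate_batch are hypothetical.

```python
# Hypothetical data contract: column name -> expected Python type and
# whether null values are allowed.
CONTRACT = {
    "flight": {"type": str, "nullable": False},
    "bags": {"type": int, "nullable": False},
    "gate": {"type": str, "nullable": True},
}


def validate_row(row, contract):
    """Return a list of human-readable violations for one row."""
    violations = []
    for column, rule in contract.items():
        if column not in row:
            violations.append(f"missing column: {column}")
            continue
        value = row[column]
        if value is None:
            if not rule["nullable"]:
                violations.append(f"null not allowed: {column}")
        elif not isinstance(value, rule["type"]):
            expected = rule["type"].__name__
            violations.append(f"wrong type for {column}: expected {expected}")
    for column in row:
        if column not in contract:
            violations.append(f"unexpected column: {column}")
    return violations


def validate_batch(rows, contract):
    """Map row index -> violations, only for rows that break the contract."""
    report = {}
    for index, row in enumerate(rows):
        violations = validate_row(row, contract)
        if violations:
            report[index] = violations
    return report
```

Storing contracts like this in a repository lets providers and consumers review schema changes before a pipeline breaks downstream.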