
Data Driven Transformations Fabric

Piethein Strengholt discusses the latest trends in data management, emphasizing the importance of data marketplaces, operational and analytical process integration, and treating data as a product. He outlines a shift towards decentralized data architectures that empower business domains to manage their own data pipelines while ensuring strong governance and quality. The document also highlights the need for agile teams and a structured implementation roadmap to support data-driven transformations in organizations.


Data-driven transformations

Piethein Strengholt
CDO Microsoft Netherlands

• Former Chief Data Architect @ ABN AMRO
• O'Reilly author: Data Management at Scale
• Prolific blogger:
  https://www.linkedin.com/in/pietheinstrengholt
  https://piethein.medium.com
Latest data management trends and developments

• Data marketplace envisioning: develop a future where data is easily accessible and quickly available to anyone who needs it.
• Blending operational and analytical processes: emergence of new data management architectures, such as the data lakehouse, which seek to integrate operational and analytical systems into a single platform.
• Embracing data product thinking: treating data as a valuable asset that can be leveraged to create innovative products and services.
• Architectures for specific business units: scalability by domain orientation to support the needs of specific business domains.
• Staying ahead of AI-driven innovation: prepare for the next wave of AI, which is characterized by more advanced algorithms and increased use of foundational models.
• Computational and community-based data governance: strong governance to improve data quality, ownership, and accountability by establishing standardized governance policies and procedures.
Consider the following situation as a starting point

(Diagram: business teams utilizing applications for their operations, and other business teams that rely on data for their business activities, all connected through central IT.)

Typical observations:
• Large backlog: changes to the environment take very long
• Poor data quality because IT owns all applications, not the business
• Many point-to-point interfaces: prioritizing quick value over sustainable architecture
• Inconsistency issues due to data duplication and incorrect ownership
• Limited insights or overview; hard to find data
• Several large (enterprise) data warehouses for group reporting and analytics
• Hidden (self-service) environments
Decentralisation is not a desired state, but an inevitable future of data

Tomorrow's paradigm shift is a new type of ecosystem architecture: a distributed architecture that treats domain-specific data as a product ("data-as-a-product"), enabling each domain to handle its own data pipelines.

(Diagram: source-oriented domains, where data providers publish data products, connect through a data marketplace to consumption-oriented domains, where data consumers apply consumer-specific transformations for their use cases.)

Foundational team: providing data management capabilities, supporting governance and managing domain-agnostic platform infrastructure.
New and emerging approaches shouldn't be seen as rigid or standalone

(Diagram: the same provider-to-consumer flow through a data marketplace, positioned among lakehouses, generative AI, data mesh, and data fabric. Supporting governance and domain-agnostic platform infrastructure underpin all of these approaches.)


Small, agile and independent product teams with clear goals and treating data as a product are the foundation

A data-driven transformation requires a change in mindset. It impacts team roles and responsibilities and requires frameworks such as agile and scrum.

• Teams operate independently, treating data as a product with clear goals.
• Teams are also known as product teams, squads, pods, cells, or agile teams.
• Teams handle their capability, business processes, value addition, and expertise.
• In line with Agile and Scrum, teams efficiently manage changes (DevOps).
• Teams manage their applications and data.
• Teams handle data ingestion and processing in data platforms.
• Teams ensure the quality of their data products and managed data.
• Teams contribute vital information to the data catalog.
• Teams oversee their use cases and related items like reports, dashboards, models, and new data products.
Zoom out and look at what your organization does: most companies organize themselves by business capabilities, workflows or journeys

Terminal and landside management: customer and information management, airport and lounges management, safety and incident management, advertising and marketing, ground operations management, asset management, real estate management, facilities management.

Airflight and baggage management: baggage handling management, flight optimization management, safety and planning management, airflight information management, trunk transportation management, terminal management, commercial planning, operations management.

Staff and personnel management: recruitment & employee management, personnel support management, groundcrew personnel management, terminal personnel management, hall personnel management, landside personnel management, security personnel management, productivity management.

Supporting services management: assets and financing management, cost management, IT services, car leasing and pick-up services, income and taxes management, partnership and fare communication, emission trading management, regulatory procedures management.
Many business domains generate value via digital feedback loops

(Diagram: domains such as asset management, passenger and security flow management, service center management, and baggage handling management each combine standard platform services, data integration services, and data products. Each domain is run by a multi-disciplinary team with product owner(s), data owner(s), data product owner(s), engineers, scientists, business users, etc.)
All business leaders should understand the high-level architecture while architects and engineers should master the details

(Diagram: a medallion architecture. Operational systems are ingested, as events and batches, via a blob landing zone into a bronze layer (typically raw and different file formats), processed with tools such as Great Expectations, DBT and Azure Databricks into a silver layer (typically cleansed and standardized file formats) and a gold layer (typically data that is ready for consumption, served as data products through virtualized tables with domain access). Governance uses Microsoft Purview: data quality results, lineage publishing, alerting, logging and DQ reporting. Orchestration uses Data Factory, backed by a policy config store. Roles involved: domain owner, application owner, data owner, data steward, data product owner, DevOps engineer, data engineer, and data analyst.)
Medallion architecture
Although the 3-layered design is common and well-known, there are many discussions on the scope, purpose, and best practices of each of these layers.

Bronze layer (typically raw, "as-is"):
• Maintains the raw state in the structure "as-is"
• Data is immutable (read-only)
• Delivery-based partitioned tables, i.e., YYYYMMDD
• Mostly Parquet or Delta; sometimes other formats
• Can be any combination of streaming and batch transactions
• May include extra metadata (schema)
• May be fed from a "mediation layer"
• Used for debugging and testing

Silver layer (cleaned, filtered):
• Uses data quality rules for validation
• Usually only functional data
• Historization is merged (SCD2)
• Efficient storage format; Delta
• Versioning for rolling back
• Handles missing or incorrect data
• Usually enriched with reference data
• Source-oriented, although queryable and clustered around subject areas
• Usually used by operational analytical teams

Gold layer (refined, business-level):
• What enterprises call data products: consumer-ready, user-friendly data
• Data is highly governed and well-documented
• Historization is applied only for the set of use cases or consumers
• Contains complex business rules, such as calculations and enrichments
• Efficient storage format; Delta
• Versioning for rolling back
• Might contain additional sub-layers for sharing or distributing data
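The bronze-to-silver promotion described above can be sketched in plain Python. This is a minimal, stdlib-only illustration of the validation, standardization and deduplication step; in practice the deck points at Delta, Databricks and frameworks such as Great Expectations, and the rule and field names here are purely illustrative:

```python
# Illustrative data quality rules for the silver layer; real rules would
# typically come from a framework such as Great Expectations.
def is_valid(record):
    return (
        record.get("passenger_id") is not None   # completeness check
        and record.get("bag_count", -1) >= 0     # plausibility check
    )

def promote_to_silver(bronze_records):
    """Promote raw bronze records: validate, standardize, deduplicate.

    Bronze stays immutable; silver receives cleansed, standardized records,
    and records that fail validation are set aside for remediation.
    """
    seen, silver, rejected = set(), [], []
    for rec in bronze_records:
        if not is_valid(rec):
            rejected.append(rec)                 # handle incorrect data
            continue
        pid = str(rec["passenger_id"]).strip().upper()  # standardize key
        if pid in seen:                          # deduplicate
            continue
        seen.add(pid)
        silver.append({"passenger_id": pid, "bag_count": int(rec["bag_count"])})
    return silver, rejected
```

The same shape applies whether the records come from batch files or a stream: bronze keeps everything as delivered, silver keeps only what passes the rules.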

The right architecture depends on your business needs

(Diagram: six example domain topologies, each a different arrangement of domains and distribution domains: fully meshed domains, often seen at start-ups; a governed topology with a central distribution domain, for (financially) regulated companies; a source-aligned distribution domain, seen at governments or legacy companies; a chained topology for manufacturing and supply chain; a hierarchical topology for typical multinational companies; and a combined or hybrid approach.)
Enabling team

(Diagram: the enabling team and two example domains.

Enabling team: governance + security consultants, platform engineers, data consultants and data engineers, organized with a guild (a community for sharing knowledge and best practices) and a data governance board or guild with a data governance lead, data advisors, data stewards and a chief data scientist. The enabling team enforces policies, shares blueprints, and manages shared services: Purview, central metadata (lineage + classifications), central logging, group policies and a group repo + security.

Provider domain: operational systems feed data products (tables, events, APIs) via an integration runtime; the team consists of a data product owner, application owner, data engineers and an IT platform owner, supported by a GitHub repo.

Consumer domain: consumes data products into analytical stores, reporting, machine learning, streaming, functions and OpenAI-based products; the team consists of a data product owner, application owner, data steward, data engineers, data scientists and an IT platform owner.)
Practical implementation of scalable data management

Think big, start small, scale fast. An implementation roadmap that supports business transformation

1. (Strategy) Assessment
• Assessment of current state (architecture, services, deployments, projects)
• Baseline architecture
• Harvest business use cases
• Evolution to future state (leverage reference architectures)
• Evaluate transition scenarios
• Gather recommendations
• Define future program board and governance
• Start up communication
• Gather top-down commitment

2. Launch first business use cases
• Implement first set of small data & AI use cases
• Define foundation: domain zones, lake templates and other templates
• Deploy (data management) landing zones
• Define data product definitions
• Setup structure of catalog
• Implement first set of controls (data quality, schema validation)
• Setup 'just-enough' governance

3. Add more business domains
• Demonstrate tangible results
• Realize consuming pipeline (taking input as output)
• Establish data ownership
• Host first data governance body sessions
• Define next metadata requirements
• Implement data consumption processes
• Setup a data contract repository
• Define your target data quality and master data management plans

4. Increase maturity
• Onboard more use cases, deploy more data product services
• Enable self-service and data discovery
• Offer additional transformation patterns (metadata-driven transformation framework, ETL tools, etc.)
• Enrich controls for governance (glossary, lineage, linkage)
• Implement consuming process: approvals, use case metadata, deploy secure views by hand
• Establish data governance control board

5. Scale throughout the organization
• Apply automation: automatic secure view provisioning
• Deploy strong data governance, setup dispute body
• Finalize data product guidelines
• Define additional interoperability standards
• Automated data consumption processes
• Develop data query, self-service, catalogue and lineage capabilities
• Data usage dashboards
• Develop additional data marketplace capabilities
Start by identifying domains and use cases (capabilities map)

1. Define business strategy and outcomes
2. Identify underlying solutions
3. Define new or refined use case requirements
4. Plan and execute

(Diagram: a customer journey, product or core business process maps to a business domain with core business outcomes/goals. Applications are the solutions providing value, and each solution is typically composed of several use cases.)
Mapping use cases by showcasing feasibility and business value

(Diagram: use cases plotted by business value (vertical axis) and feasibility (horizontal axis) to find the low-hanging fruit. Examples: predictive terminal cleaning, retail & concessions growth, improve customer search efficiency with AI, passenger and security flow, streamline airport operations and insights with automated detection, sustainability reporting, optimize terminal workforce buffers, workforce management, improving baggage lifting-aid use, noise monitoring, intelligent airport maintenance, real-time baggage tracking, and boost employee collaboration with AI-based search on company data.)
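The value-versus-feasibility mapping above can be turned into a simple ranking exercise. A minimal sketch, assuming illustrative scores between 0 and 1 (the use case names come from the slide; the scores are invented for the example):

```python
# Candidate use cases with (name, business_value, feasibility) scores.
# Scores are illustrative assumptions, not from the original deck.
USE_CASES = [
    ("Predictive terminal cleaning", 0.9, 0.8),
    ("Real-time baggage tracking", 0.5, 0.4),
    ("Noise monitoring", 0.4, 0.7),
]

def low_hanging_fruit(cases, top=2):
    """Rank use cases by combined value x feasibility, highest first."""
    ranked = sorted(cases, key=lambda c: c[1] * c[2], reverse=True)
    return [name for name, value, feasibility in ranked][:top]
```

Multiplying the two scores is one of several possible choices; a weighted sum works just as well. The point is to make the prioritization explicit and repeatable rather than a one-off whiteboard exercise.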


The goal is to have clean, relevant, and available data so that agile teams can use it to make better decisions and build better data-enabled solutions

Prioritize and focus your efforts on the most valuable sources of information, because everyone wants rapid access to high-quality data using the original domain context.

1. Identify which golden source systems are part of your core business processes. A golden source is the authoritative application where all authentic data is managed within a particular context.

2. Identify the genuine and unique data within all golden source systems, because systems often enrich themselves with data from other systems.

3. Assign data ownership: align business ownership with applications and data.

4. Data owners deliver high-quality data products that can be repeatedly utilized.
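The four steps above can be captured in a small registry. This is a hedged sketch: the system names, unique data elements and owners are hypothetical, and a real organization would keep this in its data catalog rather than in code:

```python
# Hypothetical golden-source registry: each authoritative system, the data
# elements it genuinely masters, and the business owner assigned to it.
GOLDEN_SOURCES = {
    "crm": {"owner": "customer-domain", "unique_elements": {"customer_id", "email"}},
    "billing": {"owner": "finance-domain", "unique_elements": {"invoice_id", "amount"}},
}

def authoritative_owner(element):
    """Return (system, owner) for the golden source mastering a data element.

    Returns None when no system masters the element, i.e. a data gap
    or an element that is only enriched/copied from elsewhere.
    """
    for system, meta in GOLDEN_SOURCES.items():
        if element in meta["unique_elements"]:
            return system, meta["owner"]
    return None
```

Keeping uniqueness explicit per system makes step 2 testable: an element mastered by two systems, or by none, is immediately visible.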
The alignment between application domains and business domains isn't always perfect

• An application domain can be fully aligned with a single business domain, fulfilling the needs of that business domain.
• An application domain can involve multiple business domains.
• A business domain might have multiple localized application instances, or may use multiple underlying application domains.
• Similarly, application domains may or may not align with business domain boundaries.
Implementation of the first domain using Microsoft Fabric
Best practice is leveraging the Lakehouse Medallion Architecture

(Diagram: operational systems are copied via Data Factory into a blob landing zone (temporary storage), then ingested into OneLake within a Microsoft Fabric workspace. Bronze layer: typically raw (technical) and unprocessed data. Silver layer: typically cleaned, filtered, with light modifications, processed with Synapse Engineering. Gold layer: meets functional consumption requirements, served with Synapse Warehouse. Orchestration uses Data Factory.)
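The landing-to-bronze step in this flow can be sketched as a small path builder. A stdlib-only illustration: the delivery-date partition (YYYYMMDD) follows the bronze-layer convention described earlier, while the root path and file names are assumptions, not real OneLake paths:

```python
from datetime import date
from pathlib import PurePosixPath

def bronze_target_path(source_file, delivery_date=None, root="onelake/bronze"):
    """Compute a delivery-partitioned bronze path for a landed file.

    Mirrors the landing zone -> bronze pattern: the bronze layer keeps
    data "as-is", partitioned by delivery date (YYYYMMDD).
    """
    d = delivery_date or date.today()
    partition = d.strftime("%Y%m%d")        # delivery-based partition
    name = PurePosixPath(source_file).name  # keep the original file name
    return f"{root}/{partition}/{name}"
```

In a real pipeline this path computation would live in the Data Factory copy activity's sink configuration; the value of making it a pure function is that the partitioning convention can be unit-tested on its own.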
Implementation of the first domain using Databricks and Fabric
Best practice is leveraging the Lakehouse Medallion Architecture

(Diagram: data is copied from operational systems via Azure Data Factory into a blob landing zone, then processed with Azure Databricks. ADLS holds the bronze layer (typically raw and different file formats) and the silver layer (typically cleansed and standardized file formats); OneLake in Microsoft Fabric (SaaS) holds the gold layer, ready for consumption via Power BI and Synapse Data Science. Orchestration uses Azure Data Factory.)


Typical dimensions and key elements that are important for the first stage of a data-driven transformation plan

Data:
1. List out all data sources
2. Find unique data sets
3. Define data product guidelines and standards
4. Build ingestion pipelines
5. Add data quality rules
6. Describe and publish data products in the catalog

Operating model and culture:
1. Define and initiate initial target operating model
2. Define domain responsibilities
3. Define platform team responsibilities
4. Define future target operating model
5. Define transition plan
6. Organize governance meetings

People:
1. Define personas
2. Define roles and responsibilities
3. Define what training is needed
4. Define processes and underlying steps
5. Develop playbooks and wikis
6. Setup awareness sessions
7. Train and educate the organization

Change management:
1. Setup transformation team
2. Create high-level planning
3. Setup cadence for alignment and communication
4. Organize walk-in sessions
5. Plan organizational awareness activities

Technology and architecture:
1. Develop solution architecture for first use case(s)
2. Build blueprints or templates
3. Provision platform architecture
4. Implement monitoring, controls and policies
5. Develop DevOps / DataOps guidance
6. Develop transition plan towards the future state
What is the maturity after the first phase?
Expect the following maturity and user experience

During the first, preliminary phase, your capabilities won't have a high maturity. It's also likely that you have many manual processes:
• The operating model is defined as central. Your platform and governance teams are in the lead for setting standards.
• Configuration of data pipelines is done by the central team. In parallel, this team coaches and trains other teams.
• Data products can initially be perceived as too raw or technical when guidance on data modelling is missing. Allow this to happen, but require iterations and improvements before new consumers are onboarded.
• Domain consumers and users are given access to data through manual interfaces. Data access during the initial stage of building the architecture is typically coarse-grained, so on a service, container or folder level.
• Your teams use a single landing zone and a single set of resources.
• Data lineage is either absent or lives in islands.
• Your data catalog isn't yet open for domains to self-service publish and maintain metadata.
• Services for interactively querying the data are absent. Instead, consuming domains duplicate data into their own environments.

When progressing to the next stages, your manual processes must be replaced with automation or self-service.
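The coarse-grained access mentioned above (grants at the service, container or folder level rather than per table or column) can be sketched as a simple prefix check. All team names, paths and scopes here are illustrative assumptions:

```python
# Hypothetical coarse-grained grants: access is given per container/folder,
# not per table or column, matching the early-phase maturity described above.
GRANTS = {
    ("marketing-team", "container:gold/marketing"): "read",
}

def can_read(team, resource_path):
    """Allow a read when any granted container/folder is a prefix of the path."""
    for (grantee, scope), right in GRANTS.items():
        prefix = scope.split(":", 1)[1]
        if grantee == team and right == "read" and resource_path.startswith(prefix):
            return True
    return False
```

The coarseness is visible in the model itself: granting `gold/marketing` exposes everything under it, which is exactly why later phases move to finer-grained, automated policies.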
Managing a transformation requires a real program with executive support

(Diagram, enterprise level: program management (CDO, scrum master, lead architect, executive and domain sponsors) owns the data strategy: mission statement, objectives and key results (OKRs), and roadmap. It handles portfolio requirements, architecture requirements and escalations, with quarterly alignment between business and platform leads.

Domain level: domain needs, data products and feedback flow into sprint planning, sprint goals, sprint reviews, user stories, tasks and a definition of done/complete. Involved: domain business lead, product owners, application owners, scrum masters, domain architect, domain users and other domains.)
Typical organizational building blocks

• Domain teams (also called squads, cells, agile teams or pods): developing journeys and products, building domain-specific use cases, developing data products, providing metadata for lineage, etc.
• Program management with executive and domain sponsors.
• Data office team: coordination, education, ethics, vision creation, alignment, community management, planning, etc.
• Data platform team: deliver domain-agnostic services so domains can distribute data and turn data into value, deliver the ETL component, deliver the catalog, and provision central storage with containers for each domain. Capabilities include integration and distribution, reporting, governance, security, and analytics.
• Infrastructure platform team: monitor, create subscriptions and resource groups, define policies, allocate costs, and deliver non-data-related services to other teams.
Example model for data governance and data management roles

Providing domain:
• Business owner: prioritisation, backlog management
• Data creator: data entry, data quality corrections
• Data owner: business user; accountable for DQ and business terms
• Data steward: support role; coordination, sharing knowledge
• Application owner: manages the pipeline; accountable for technical issues

Consuming domain:
• Business owner: prioritisation, backlog management
• Data user: accessing and using data
• Data owner: business user; accountable for DQ and business terms
• Data steward: support role; coordination, sharing knowledge
• Application owner: manages the pipeline; accountable for technical issues

The providing application delivers a data product to the consuming application under a data contract.

Enabling domain:
• Data services: catalog, metadata databases, governance tools
• Data advisors: advise, support, best practices
• Platform engineers: develop, train, coach
• Chief Data Officer: sets the overall vision and strategy
• Architects: set standards, oversee and guide, control and enforce
Data management landing zone

(Diagram: the landing zone in detail, Synapse variant. Operational systems are copied via Data Factory into a blob landing zone (temporary storage) and ingested into lakehouse bronze (typically raw and different file formats) and silver (typically cleansed and standardized file formats) layers. Gold tables in the lakehouse handle distribution to other domains, while a warehouse gold layer serves domain-specific use cases, reporting and data science. Real-Time Analytics, Synapse Engineering and Synapse Warehouse do the processing; Data Factory orchestrates. Governance uses Microsoft Purview with a metadata repo, logging, DQ results and DQ reporting. Roles: governance advisor, platform engineer, application owner, data owner, data product owner, data steward, DevOps engineer, data engineer, data analyst, data scientist and business user.)


Data management landing zone

(Diagram: the same landing zone implemented with Azure Databricks and Fabric. Ingestion via Azure Data Factory into a blob landing zone; bronze and silver layers on ADLS gen2; processing with Azure Databricks and Delta Live Tables; Unity Catalog for governance; AutoML for citizen data science; Databricks SQL for ad-hoc SQL and debugging via DirectQuery. The gold layer lives in a lakehouse for distribution to other domains and in a warehouse for domain-specific use cases, on OneLake in a Microsoft Fabric workspace, consumed via Direct Lake or import mode (high performance). Governance uses Microsoft Purview with a metadata repo, logging, DQ results and DQ reporting. Roles: governance advisor, platform engineer, application owner, data owner, data product owner, data steward, DevOps engineer, data engineer, data analyst, data scientist and business user.)
What is the maturity after the next phases?
Expect the following maturity and user experience

After the foundation has been put in place, expect the following (improved) maturity and user experience:
• The new governance and operating models are defined. These include new roles and responsibilities.
• Domain teams own their data lifecycle from start to end: from the application that creates the original data to the data products that are consumed by other teams. Throughout the lifecycle, domain teams own all data models and determine what data is suitable for sharing with others.
• Data product development is supported with tooling and data modelling best practices. Guidance is published in wikis for better collaboration.
• Development teams have visibility into pipelines and how they impact downstream consumers. Developers are supported by data quality services, which proactively alert users; when data breaks, all involved users are informed.
• Manual processes and pipelines are replaced with templates and services. Instead of hardcoding locations and parameters, developers store their information as source code in version control. These repositories may include other artifacts as well, such as test and deployment scripts, libraries/packages, configuration information, and so on.
• Data access policies, which specify who can access what data products, are stored as code or configuration. At this stage, I still expect your central team to be in full control of provisioning data access.
• You still use one single landing zone. There's limited variation in what services you offer. All of your domain teams use the same blueprint configuration(s).
• There's a consistent metamodel that relates data products to data domains. This metamodel is managed in a data governance service and ensures that all domain teams know which data products are owned by which domains.
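The "access policies stored as code or configuration" point can be sketched as follows. A minimal illustration under stated assumptions: the policy fields (data product, consumer, purpose, expiry) are a plausible shape, not a prescribed schema, and in practice the list would be loaded from a versioned file rather than defined inline:

```python
from datetime import date

# Access policies as configuration; in practice this would live in version
# control (e.g. a YAML file) and be reviewed like any other code change.
POLICIES = [
    {"data_product": "flight-information", "consumer": "ops-analytics",
     "purpose": "dashboarding", "expires": "2031-01-01"},
]

def access_allowed(data_product, consumer, on=None):
    """Check whether a consumer may access a data product on a given date."""
    on = on or date.today()
    return any(
        p["data_product"] == data_product
        and p["consumer"] == consumer
        and date.fromisoformat(p["expires"]) > on   # policies expire
        for p in POLICIES
    )
```

Storing the expiry date in the policy makes the periodic review described later in the lifecycle model enforceable: an unreviewed grant simply stops working.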
Example model for data literacy curriculum

Organizations should start to build a curriculum to develop data literacy across the organization. Data literacy is a fundamental skill that helps individuals make informed decisions, solve complex problems, and communicate effectively with others.

Level 3 (essential for all pros): intro on architecture, deep-dive on architecture, leadership and employee attributes, deep-dive on governance, deep-dive on ML & AI, self-service, and deep-dive on data engineering.

Level 2 (essentials for all data co-workers): intro to governance (workshop on the data marketplace, processes, roles and responsibilities), data products (course on creation and distribution of data products), Data & AI (course for executives), and Data Ethics & AI (course on principles for responsible AI).

Level 1 (data literacy for everyone): an introduction course covering the basics of data management, the importance of quality, key concepts and the way data management is done within the organization; citizen development; and the fundamentals of corporate data strategy and the benefits of data management and analytics, suggested as optional for employees that don't directly work with information.

(Courses are marked as optional, essential, advanced, or workshop.)
Example reference model for data management life cycle management

The life cycle stages are data creation, data product creation, data publish, data request, and data usage. Detailed wikis or playbooks guide users through each stage.

Responsibilities of the data owner:
• Data creation: coordinate addressing data quality issues; fix data gaps
• Data product creation: coordinate data needs with data consumers; provide guidance for data & AI use cases
• Data publish: provide policies and restrictions on usage
• Data request: approve data usage; periodic (quarterly) review of data contracts and sign-off; remediate constraints
• Data usage: collect and coordinate feedback on usage

Responsibilities of the application owner:
• Data creation: maintain and describe application data models; process application changes
• Data product creation: develop the data pipeline in close collaboration with data engineers; deliver technical metadata
• Data publish: deliver metadata for data product creation; implement controls for meeting SLAs; deliver lineage
• Data request: collaborate with data consumer application owners on technical requirements
• Data usage: collect feedback on application usage; monitor and remediate application issues

Responsibilities of the data steward:
• Data creation: coordinate addressing data quality issues; fix data gaps
• Data product creation: submit business terms and definitions; link business terms to technical attributes; classify sensitive data
• Data publish: improve documentation; share knowledge; define policies with the data owner
• Data request: accomplish consistent data usage by providing feedback to the data owner
• Data usage: coach and train data users on the consuming side

Responsibilities of the data user:
• Data creation: support closing gaps and data quality issues; share requirements
• Data product creation: provide feedback
• Data publish: provide feedback; support fixing constraints on usage
• Data request: deliver data usage information for the data contract
• Data usage: provide feedback; share AI use cases; share knowledge with data owners and other data users
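The data contract that sits between the publish and request stages can be made checkable. A minimal sketch, assuming an illustrative contract shape (product name plus pinned columns and types); real contract specifications carry far more, such as SLAs, owners and usage restrictions:

```python
# Hypothetical data contract: the schema a consumer depends on is pinned,
# so publishing a new version can be validated against it automatically.
CONTRACT = {
    "data_product": "baggage-events",
    "columns": {"bag_id": str, "weight_kg": float},
}

def violates_contract(record, contract=CONTRACT):
    """Return a list of violations for one record (empty list = compliant)."""
    issues = []
    for col, typ in contract["columns"].items():
        if col not in record:
            issues.append(f"missing column: {col}")
        elif not isinstance(record[col], typ):
            issues.append(f"wrong type for {col}: expected {typ.__name__}")
    return issues
```

Run against a sample of each new publish, this gives the data owner's quarterly contract review something concrete to sign off on.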
Commands and strong consistent reads

(Diagram: commands arrive via an API gateway and GraphQL; writes are strongly consistent, while a change feed and event streaming asynchronously populate an analytical read store for eventually consistent reads. The same medallion pipeline as before (blob landing zone, lakehouse bronze/silver/gold with distribution to other domains, warehouse gold for domain-specific use cases, Real-Time Analytics, Synapse Engineering and Warehouse, orchestrated with Data Factory) serves consuming (web) apps, ML models, IoT processing, SQL models and transformations on events.)
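The split in this diagram between strongly consistent writes and eventually consistent reads can be sketched minimally. All class and field names are illustrative; a real implementation would use an event broker and a managed change feed (such as the one the diagram implies) rather than in-memory lists:

```python
class WriteStore:
    """Handles commands; writes here are strongly consistent."""
    def __init__(self):
        self.state, self.change_feed = {}, []

    def handle_command(self, key, value):
        self.state[key] = value                 # strongly consistent write
        self.change_feed.append((key, value))   # emitted for downstream readers

class ReadStore:
    """Analytical read store, updated asynchronously from the change feed."""
    def __init__(self):
        self.view = {}

    def apply(self, feed):
        for key, value in feed:                 # replay the change feed
            self.view[key] = value
```

Until `apply` runs, the read store lags the write store; that window is exactly the eventual consistency the diagram labels on the read path.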
Now that we have high-quality data products from our providing domains, what should we do about our consuming domains?

(Diagram: three patterns: point-to-point distribution; dedicated (larger or new) data domains for managing overlapping requirements; and combined approaches, such as consumer-supplier models.)
Logical architecture for building aggregates

(Diagram: providing domains (decentralized ownership) deliver data products from their data sources via their own data platform instances: raw data (technical, unstructured, various file types) becomes read-optimized, immutable data products organized around subject areas, kept as an active archive with latest and historical versions. Consuming domains (decentralized ownership) pull these data products through an anti-corruption layer into pipelines that build aggregates and highly reusable data, dimensional models (DWH) and data marts for data science, reporting and ad-hoc querying, and can publish newly created data as data products again.)
What is the maturity after the last phase?
Expect the following maturity and user experience

After the foundation has been put in place, expect the following (improved) maturity and user experience:
• Your architecture supports different patterns for data distribution and application integration. Preferably, guidance for data products, APIs, and events is aligned.
• New data domains can be onboarded quickly.
• Guidance has been formalized, playbooks have been created, and teams are trained.
• Governance and access control policies are automated. For this, consider setting up a small workflow, an application, and programmatic access APIs for registration.
• Data usage and processes for turning data into value are standardized. Updated blueprint templates are offered to your domain teams, from which they can choose what to use for individual use cases.
• Master data management services have been added to your architecture for ensuring end-to-end and cross-domain consistency.
• Interaction between stakeholders is optimized through data governance bodies. Teams come together on a periodic basis for triage and overall planning.
• Data products have multiple endpoints, also known as output ports, to cater for a large variety of use cases.
• Extra landing zones or domains can be easily added.
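The idea of multiple endpoints ("output ports") on one data product can be illustrated with a small sketch: one governed dataset, exposed in several consumption formats. The formats and field names below are illustrative; real output ports might also include SQL endpoints, APIs or event streams:

```python
import csv, io, json

def output_ports(rows):
    """Expose the same data product through several output ports."""
    ports = {"json": json.dumps(rows)}   # port for programmatic consumers
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
    writer.writeheader()                 # port for spreadsheet/BI consumers
    writer.writerows(rows)
    ports["csv"] = buf.getvalue()
    return ports
```

The point of the pattern is that both ports derive from the same governed data: consumers pick the shape they need without the provider duplicating the product.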
Think big, start small, scale fast. An implementation roadmap that supports business transformation

(Recap of the five-phase roadmap presented earlier: (strategy) assessment, launch first business use cases, add more business domains, increase maturity, and scale throughout the organization.)
