
ProMoTe: A Data Product Model Template for Data Meshes

Stefan Driessen¹, Willem-Jan van den Heuvel¹, and Geert Monsieur²

¹ Tilburg University, Sint-Janssingel 92, ’s-Hertogenbosch, The Netherlands
[email protected]
² Eindhoven University of Technology, Sint-Janssingel 92, ’s-Hertogenbosch, The Netherlands

Abstract. As the shortcomings of monolithic data platforms such as data lakes are quickly becoming more grave and evident, many organisations are struggling to transition to data meshes, making data available for consumption in a decentralised manner. However, the emerging data mesh paradigm fails to provide sufficient (modelling) support to effectively create, manage, and describe data products, the architectural quanta of a data mesh. In this work, we introduce the data Product Model Template (ProMoTe): a formal meta-model of data products that is fully aligned with a data mesh. ProMoTe was devised, explored and partially validated based on industry requirements in tandem with academic literature and is currently being used by a major Dutch Telecom company to enable their data mesh transition.

Keywords: Data Product · Data Mesh · Modelling · Industry Report

1 Introduction

Despite the promises of big data to revolutionise the way companies do business, many organisations still grossly fail to fully capitalise on the data they are generating. Data meshes are being developed as alternatives to the traditional monolithic architectures, e.g., data lakes and warehouses, that are the norm for dealing with big data and which critics have pointed to as bottlenecks in big data management [7,9]. The main downside of these monolithic approaches is that they fail to scale with the number of data sources on the one hand and data science and analytics use cases on the other [9,19]. Data meshes, which are domain-oriented alternatives that revolve around data products, in theory provide a solution to this problem of scalability, because each data product offered in a data mesh is provided by an owner responsible for optimising the data for consumption. Data products can be defined as the combination of all responsibilities and functionalities required to optimally exchange data on a platform. When viewed from this perspective, data products mirror the successful transition in software development from monolithic software solutions to (micro)services [3].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. P. A. Almeida et al. (Eds.): ER 2023, LNCS 14320, pp. 125–142, 2023.
https://doi.org/10.1007/978-3-031-47262-6_7

Because of these shortcomings of monolithic platforms, data meshes have attracted significant interest from industry, as can be observed from the amount of grey literature that is becoming available online on the topic [6]. However, a persistent theme in these sources is that there are no instances of completed data meshes populated with mature data products. We find four main reasons for this lack of maturity. First is the topic’s novelty: the term data mesh was first coined by Zhamak Dehghani in a 2019 blog post [19] and has only recently gained widespread attention in the scientific and industrial community. Second, the greatest challenges with transitioning from a monolithic data platform to a data mesh are probably organisational rather than technical [2]: shifting the necessary responsibilities and capabilities from centralised teams to the domains that generate the data is an immense task that comes on top of the technical challenges. A third problem we see is that building data products, and consequently data meshes, is a holistic problem. Existing data mesh (reference) architectures include a self-service layer for creating data products with at least a dozen components [6]. This complexity leads to a chicken-and-egg problem: on the one hand, well-developed self-service components are desired for creating and maintaining data products; on the other hand, these same data products are necessary for the agile development of self-service components. Moreover, the functionality of the different components frequently depends on each other. Finally, there is ambiguity surrounding the concept of data products, and a clear definition appears to be missing. The term data product has been around much longer than data meshes and has meant different things in different contexts. For example, some authors take the view of a data product as data that can be bought or sold on a market (see, e.g., [8]). However, such a view does not suffice for data products in a data mesh, which are generally exchanged within an organisation¹.

¹ Although they can be extended to external data markets or data spaces [12].

At the same time, data products in data meshes bear an undeniable resemblance to the concept of data as a service (DaaS), which has been around as a concept since around 2010 [11]. One could argue that data products are the natural successors to DaaS, with one crucial difference being that they live in a completely different environment, i.e., one that is designed for data (the mesh) and not as a generic software service in a service-oriented architecture.
In this paper, we introduce the data Product Model Template (ProMoTe), which is a technology-agnostic meta-model to specify and define data products in a data mesh (like) environment whilst embracing and extending existing standards for modelling (meta-)data. The rest of this paper is organised as follows. The next section introduces the study design that has been systematically applied for the development and application of ProMoTe. In Sect. 3, we then discuss data products and the non-functional requirements they should meet. Based on these requirements, Sect. 4 discusses existing standards and their suitability for describing data products in a data mesh. Then, in Sect. 5, we introduce the main concepts of ProMoTe and illustrate these with an example data product. Afterwards, Sect. 6 discusses our validation efforts and lessons learned by highlighting how ProMoTe is leveraged in an industrial setting at a major Telco company. Finally, in Sect. 7, we discuss the paper’s main takeaways and propose future work.

2 Study Design
From a high-level perspective, our research methodology for the development and evaluation of ProMoTe consists of four phases, as shown in Fig. 1. A fifth phase, whereby a standardised methodology for building data products uses ProMoTe to design blueprints of data products before instantiating them, is currently being executed and will be presented in future work. As the first step, we established (non-functional) requirements from both literature and industry. These guided the development of ProMoTe, because a useful meta-model is such that data products that comply with it meet these requirements. Then, we looked at existing (metadata) model standards to evaluate their suitability for describing and modelling data products that meet the functional requirements. Since we concluded that none of these models was a good fit for data products, we then based ProMoTe on the Data Catalog Vocabulary (DCAT), one of the more general, well-established standards, paying special attention to new industry-proposed standards for comparison. Finally, we demonstrate the applicability of ProMoTe in two ways: first, by explicitly linking its components to the established non-functional requirements, and second, by constructing technical prototypes based on ProMoTe in an industrial context at a large Telco provider in the Netherlands.

Fig. 1. Methodology overview: (1) establish non-functional requirements (from literature, from industry); (2) evaluate existing standards (from literature, from industry); (3) develop ProMoTe (adapt from the DCAT standard, cross-reference with industry standards); (4) validate ProMoTe (link components to requirements, validation through construction); (5) (future) develop a methodology for creating ProMoTe-compliant data products.

2.1 Establish Non-functional Requirements

One of the goals for the development of ProMoTe is to make it easier for organisations to formulate relevant functional requirements for their data products by relating non-functional requirements to architectural components. As a starting point for the non-functional requirements for data products, we used the so-called DAUTNIVS usability properties proposed by Dehghani [2], described below in Sect. 3. To ensure relevance for industry and to make our standard easy to use for metadata management, we extended these requirements through collaborations with two industrial partners, where we conducted interviews with various stakeholder experts.

Our first collaboration was with a major German automotive company and yielded a new set of industry-driven requirements for data products [3]. More recently, in order to extend the external validity of these requirements, we set up a new collaboration with a major Dutch Telco provider, where we interviewed 20 expert stakeholders. The interviews were semi-structured and followed the same methodology described in our previous work [3]. Both companies operate with over a billion euros in revenue and have thousands of employees organised across different departments with their own IT systems and data landscape. Furthermore, both companies are in the early stages of transitioning away from their monolithic data landscape towards a data product-based, data mesh-like architecture.

2.2 Comparative Analysis of Existing Standards


After establishing the requirements for data products, we examined existing
standards to assess their applicability for describing data products in a corporate
data market or data mesh. Section 4 discusses the models that were considered
and their potential for describing data products.

2.3 Developing ProMoTe
We selected the DCAT² ontology as a basis for creating ProMoTe. DCAT offers two advantages for our purposes:

– DCAT is a well-established standard for describing data catalogues. One of the explicit steps of creating a data product is to ensure that it is well-described in a data catalogue for potential consumers to find it.
– Many existing standards for describing data product-like entities are DCAT-compliant. By ensuring that ProMoTe is DCAT-compliant, we promote its interoperability with other standards that exist, e.g., to describe data sets [21] or data contracts [16].
We extended the DCAT concepts of resources, distributions and datasets
with the new concepts and relations necessary for describing data products that
meet the requirements established in step 1. This led to the inclusion of input-,
output- and control ports, as well as explicit use case modelling. Finally, we
created a mapping from the entities in the meta-model to the requirements they
address. This mapping makes it easier for data providers to understand which
components to build for their data product and how to prioritise them according
to the product’s context. For example, for a data product with unknown value,
it can be relevant to focus first on establishing and describing use cases, whereas
a data product resulting from a new business process might concentrate on
discoverability first and focus on accessibility and interoperability later.
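
For illustration, such a mapping can be encoded in RDF directly alongside the meta-model. The Turtle sketch below is purely illustrative: the pmt: namespace URI, the property pmt:addressesRequirement and the literal requirement labels are placeholder choices of ours and do not necessarily match the published specification.

@prefix pmt: <https://example.org/promote#> .   # hypothetical namespace URI

# Illustrative mapping from meta-model entities to the non-functional
# requirements they primarily address.
pmt:OutputPort   pmt:addressesRequirement "Natively Accessible" , "Interoperable" .
pmt:UseCase      pmt:addressesRequirement "Valuable" , "Feedback-Driven" .
pmt:DataContract pmt:addressesRequirement "Trustworthy & Truthful" , "Secure" .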

2.4 Derive and Implement Metadata Template from ProMoTe


In order to evaluate the usefulness of ProMoTe, we collaborated with KPN³, a major Dutch Telco company. The company has started a transition to a data mesh and is in the process of setting up a data catalogue in which data from all of its many different data platforms will be available for discovery. The data catalogue is implemented in DataHub⁴, which allows both push-based and pull-based metadata ingestion from a wide variety of sources such as data lakes, DBMSs, data warehouses, etc.

In addition to making data sets discoverable on the data catalogue, a concerted effort is made to promote the creation of data products, which should be registered on the same data catalogue. For this process, we have created a metadata template based on ProMoTe and implemented it as a proof of concept within the DataHub business glossary, where it is used by new data providers. Filled-out instantiations of this template then populate the data catalogue and feed into the workflow of the centralised data governance team. Section 6 discusses these applications in more detail.

² https://www.w3.org/TR/vocab-dcat-2/
³ https://www.kpn.com/
⁴ https://datahubproject.io/

3 Data Product Requirements


Considering the lack of well-defined data products in literature that can be used as a reference, we consider good data products to be those that meet the needs of their stakeholders. To ensure that our model can be used as a reference for such data products⁵, we used the DAUTNIVS usability attributes defined by Zhamak Dehghani [2] to evaluate potential models for describing data products, which we verified through interviews as described below. These attributes are the gold standard for data product requirements in industry and academia. DAUTNIVS is an acronym standing for Discoverable, Addressable, Understandable, Trustworthy & Truthful, Natively Accessible, Interoperable, Valuable and Secure (see Fig. 2). An extensive description of these requirements is beyond the scope of this work and can be found in the original source [2].

Fig. 2. A Data Product is a combination of data, code, metadata, and infrastructure. Data products are exposed through ports and aim to achieve several non-functional requirements.

To ensure the grounding of our work both in academia and industry, we additionally interviewed 30 stakeholders from our two industrial partners transitioning from a centralised monolithic data architecture to a decentralised, data product-driven architecture. Through these interviews, several additional requirements were identified, as shown in Table 1, which extends our previous work that established requirements for metadata management for data products [3].

⁵ And, consequently, for describing such data products.

Table 1. Eight industry-driven requirements for any practical formal data product meta-model. All of these can be related to the DAUTNIVS+ non-functional requirements (mapping given in brackets).

R1. The model should serve as a baseline for creating standardised data products or assets and, consequently, provide a complete overview of different data product components with direct relations to DAUTNIVS+. [D, A, U, T, N, I, V, S, Feedback-Driven]
R2. Data products should be related semantically, even when crossing domain- or organisational boundaries. The model should incorporate relations with other (existing) business ontologies. [D, U, I]
R3. Data in data products should be related on a technical level whenever possible. The model should incorporate schema relations to reflect this. [D, U, I]
R4. The model should show the lineage of the data assets. [D, A, U]
R5. The model should incorporate the promises and agreements between the data provider and the consumer, either as separate promises or in a data contract. [T, S]
R6. Data consumer feedback should be an explicit part of the model. [V, Feedback-Driven]
R7. Data products should shorten the lines between providers and consumers. The model should demonstrate this relation by containing both actors. [Feedback-Driven]
R8. The model should be applicable for data products at different levels of maturity. [D, A, U, T, N, I, V, S, Feedback-Driven]

We found that most of the requirements in Table 1 can be directly explained by trying to achieve the DAUTNIVS. For example, R2 states that data should be related to (existing) business ontologies; this requirement can easily be explained as wanting to make the data product more Discoverable, Understandable, and Interoperable.

One interesting conclusion that we drew from our interviews was a clear need to establish and model one or more feedback loops between data providers and data consumers. This feedback can be part of the effort required for establishing data product value (R6), but more importantly, it can help organisations to prioritise which data assets to turn into data products (R8) and to improve existing data products (R7). Moreover, stakeholders expressed concerns that the expected resource investments required to build data products might not weigh up against their uncertain value. We believe an agile, feedback-driven approach works best when developing and maintaining data products, similar to best practices in software development [1]. For these reasons, we take as non-functional requirements the DAUTNIVS usability attributes plus being feedback-driven; in the rest of this paper, we refer to these as the DAUTNIVS+ requirements. Figure 2 shows a visualisation of the data product as defined by Dehghani: a combination of data, code, metadata and infrastructure that aims to achieve several non-functional requirements.

4 Related Standards

This section discusses two types of related literature. First, we briefly introduce existing academic coverage of data mesh and data markets, which has focused mainly on architectural aspects. Afterwards, we discuss tangentially related standards for describing and defining data that can be exchanged, similar to data products in a data mesh.

4.1 Data Markets and Data Mesh

Data markets, which facilitate the exchange of data products between independent parties, have received significant attention from the academic community [4]. However, internal data marketplaces and data meshes appear more obscure, and most of the academic work related to these platforms focuses on establishing architectures and architectural patterns [5,9,10]. Additionally, there is the work by Dehghani [2], who first coined the term data mesh and who both provides an excellent conceptual overview of the topic and notes the need for standardised description models for data products. As far as the authors are aware, only one (grey) literature survey exists, which is in pre-print at the time of writing and extensively covers the data mesh topic [6]. In this survey, Goedegebuure et al. identify research challenges for data mesh, which include a need for: 1) standardisation, 2) tools for data mesh development and operation, and 3) data product lifecycle management. In this work, we make steps towards addressing these requirements by providing a meta-model for describing data products that can help describe and develop data products in a mesh and facilitate the collection of information needed for data product lifecycle management.

4.2 Work on Data Standards

Standards and vocabularies for describing data are well-researched and understood in today’s age of big data. Among the most comprehensive of these standards are the Data Catalog Vocabulary (DCAT)⁶, Dublin Core Terms (DCT)⁷ and the Simple Knowledge Organization System (SKOS)⁸, which can serve as the basis for describing almost any type of data. Even though these standards generally do not consider data products, the DCAT vocabulary is especially interesting because it focuses on describing data in the context of data catalogues, which are a crucial part of data mesh architectures [6].

Other standards have been developed for describing data specifically in data markets. These are often specific to the field in which they were developed, such as the FIESTA-IoT ontology [18], the Common Vehicle Information Model (CVIM) [14] and the spatial standard developed by Sakr [15]. However, like most of the standards above, these focus heavily on describing only the data in a standardised manner rather than the product aspects of data products.

Finally, we note that there have been previous initiatives to describe data as well as the context in which it can be exchanged. An excellent standard for describing data as a service is DEMODS, which was introduced by Vu et al. in 2012 [20]. One of the main benefits of DEMODS is that it explicitly combines the service aspects of data as a service, such as API descriptions, with the data aspects, such as the descriptions of different fields. These aspects are, of course, still very relevant when describing data products. However, as previously argued, data products are more than just data as a service, and DaaS standards are insufficient for our purposes. More recently, there have been some attempts to describe data products on commercial data markets (e.g., [13,17]). These standards add descriptions of formal agreements and prices to data products; however, they often neglect the service aspects and consider almost no aspects of describing the data.

In addition to academic work, several industry standards exist. Besides vendor-specific standards (e.g., Google⁹, Amazon¹⁰, Microsoft¹¹), two open-source, generalised data product standards aim to describe data products in a data mesh environment. The Data Product Specification (DPS) was developed by agile-lab¹² to provide a technology-independent standard for defining data products, much like ProMoTe. However, it is unclear how the standard was developed, and some crucial entities, such as input ports, are missing. Additionally, the data product descriptor specification (DPDS)¹³ is an excellent standard that builds on the OpenAPI initiative. However, while the DPDS is more extensive than the DPS, how it was developed and whether or how it relates to any non-functional properties (such as DAUTNIVS+) is unclear. More importantly, DPDS was not built to be interoperable or compliant with existing ontologies such as DCAT and DCT, which could prove to be a crucial benefit for ProMoTe in terms of extensibility and interoperability with other existing standards.

Although each of these standards offers valuable perspectives on describing data in specific environments, we can conclude that no standard currently exists that: 1) describes and defines data products in a data mesh (like) environment, 2) is technology-independent, and 3) extends existing standards for modelling (meta-)data. ProMoTe addresses all these points and refers explicitly to the DAUTNIVS+ non-functional requirements for data products. This means it can be used to define and describe data products in a data mesh, as illustrated in the next section.

⁶ https://www.w3.org/TR/vocab-dcat-2/
⁷ https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
⁸ https://www.w3.org/TR/skos-reference/
⁹ https://cloud.google.com/architecture/describe-organize-data-products-resources-data-mesh#the_data_product_template
¹⁰ https://docs.aws.amazon.com/marketplace/latest/userguide/data-products.html
¹¹ https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/cloud-scale-analytics/architectures/what-is-data-product
¹² https://github.com/agile-lab-dev/Data-Product-Specification
¹³ https://dpds.opendatamesh.org/resources/specifications/1.0.0-DRAFT/

5 ProMoTe

In this section, we introduce ProMoTe by discussing a hypothetical yet realistic data product use case in a Telco company. Figure 3 shows an overview in UML of the (meta)classes and relations in ProMoTe. A full specification and explicit linkage to the DAUTNIVS+ requirements can be found in our online repository¹⁴.

¹⁴ https://github.com/Stefan-Driessen/ProMoTe

5.1 Overview

ProMoTe extends the dcat:Resource class with a subclass: pmt:Resource. These pmt:Resources come in three varieties: the pmt:Dataset, which is a subclass of dct:Dataset; the pmt:DataProduct, which is the architectural quantum of a data mesh and the main focus of ProMoTe; and the pmt:UseCase, which describes how the data is consumed. Data products make available one or more datasets. Each dataset has one or more physical representations (distributions), which are exposed through output ports.

Each resource is managed within a pmt:Domain that maintains semantic domain knowledge in pmt:InstitutionalKnowledge. Data products ingest data through one or more pmt:InputPorts and are governed through pmt:policies managed through pmt:ControlPorts. Finally, data products make available one or more dcat:Distributions of pmt:Datasets through an associated pmt:OutputPort. For each output port, an associated pmt:DataContract establishes the conditions that apply when consuming the underlying data.

Fig. 3. A UML representation of ProMoTe.
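
As a rough Turtle sketch of how these classes and alignments might look (the pmt: namespace URI and the pmt:has... property names are illustrative assumptions of ours; the authoritative axioms are in the online specification¹⁴):

@prefix pmt:  <https://example.org/promote#> .   # hypothetical namespace URI
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# ProMoTe resources specialise DCAT resources.
pmt:Resource    rdfs:subClassOf dcat:Resource .
pmt:Dataset     rdfs:subClassOf pmt:Resource , dct:Dataset .
pmt:DataProduct rdfs:subClassOf pmt:Resource .   # the architectural quantum
pmt:UseCase     rdfs:subClassOf pmt:Resource .   # describes how data is consumed

# A data product ingests data, is governed, and exposes distributions
# under the conditions of a data contract.
pmt:hasInputPort        rdfs:domain pmt:DataProduct ; rdfs:range pmt:InputPort .
pmt:hasControlPort      rdfs:domain pmt:DataProduct ; rdfs:range pmt:ControlPort .
pmt:hasOutputPort       rdfs:domain pmt:DataProduct ; rdfs:range pmt:OutputPort .
pmt:exposesDistribution rdfs:domain pmt:OutputPort ; rdfs:range dcat:Distribution .
pmt:hasDataContract     rdfs:domain pmt:OutputPort ; rdfs:range pmt:DataContract .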

5.2 Modelling Data Products with ProMoTe

The customer data product is created and maintained by the company’s customer service department and instantiates the ProMoTe meta-model depicted in Fig. 3. The customer service department handles the onboarding of customers who subscribe to a product and keeps this information in a pmt:Dataset consisting of three tables. Figure 4 shows a simple pmt:logicalSchema with these three tables and their internal relations.

Following ProMoTe, the data product needs to be described with the relevant aspects of pmt:Resource and pmt:DataProduct. This begins by assigning “Alice” from the customer service pmt:Domain as the pmt:dataProvider. Alice then chooses the dct:title “Subscription Data Product” and fills out relevant metadata for the data catalogue, such as a short dct:description and some dcat:keywords, and assigns the dct:language “English”. Moreover, in their description, Alice explicitly references the business glossary of the customer service domain through pmt:institutionalKnowledge, which contains standardised semantic definitions of what customers, products, and subscriptions mean in the customer service pmt:Domain.

The dataset itself is stored physically in a SQL-based database during the customer onboarding process. Every dcat:Distribution of the pmt:Dataset that will eventually be offered by the pmt:DataProduct will source its data from this database. Alice describes this using ProMoTe in the pmt:sourceSystem property when describing their data product’s pmt:InputPort.
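
An instance-level Turtle sketch of this first part of the description (all ex: identifiers and both namespace URIs are hypothetical, as is the pmt:hasInputPort property; the other terms follow the prose above):

@prefix pmt:  <https://example.org/promote#> .   # hypothetical namespace URI
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/telco/> .     # hypothetical instance namespace

ex:SubscriptionDP a pmt:DataProduct ;
    dct:title       "Subscription Data Product"@en ;
    dct:description "Customer, product and subscription data gathered during onboarding."@en ;
    dcat:keyword    "customer" , "subscription" ;
    dct:language    "English" ;
    pmt:dataProvider ex:Alice ;
    pmt:institutionalKnowledge ex:CustomerServiceGlossary ;
    pmt:hasInputPort ex:OnboardingInputPort .

# The input port records where the data comes from: the SQL database
# written to during customer onboarding.
ex:OnboardingInputPort a pmt:InputPort ;
    pmt:sourceSystem ex:OnboardingSQLDatabase .
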
Fig. 4. The pmt:logicalSchema of the dataset in the customer data product.

It quickly becomes apparent that the customer data product has three potential pmt:UseCases with corresponding consumers. These use cases are the first major components that make the data product sensitive to feedback (from the consumers). Use case A is presented by Bob, who is from a different team in the customer service pmt:Domain and wants to use the data to allow customers to cancel their subscriptions. Use case B comes from Charlie, in the marketing pmt:Domain, who wants to run a targeted advertisement campaign and needs to perform customer segmentation. Finally, use case C is presented by Dave from the website pmt:Domain, which builds and maintains the company’s website. Here, customers can log in, find information about their subscriptions, and update the information they provided when they subscribed. Based on these pmt:UseCases, Alice builds and describes her data product using ProMoTe. Figure 5 illustrates this process as an instantiation of the ProMoTe meta-model.

Fig. 5. The subscription data product, described with ProMoTe. Some aspects, such as data contracts, have been abbreviated for improved legibility.

Use Case A. Normally, there would be no need to create a data product for a consumer from the same domain; presumably, Bob is familiar with the pmt:sourceSystem (i.e., the SQL database) of the customer service pmt:Domain. However, since Alice knows there are other consumers, they decide to put in the effort of creating an output port that can be reused for future use cases. They describe the dcat:Distribution of this pmt:Dataset through the pmt:physicalSchema as it exists on their SQL database. Moreover, they create a pmt:OutputPort that pmt:exposesDistribution this distribution through an API that data consumers can call if they follow the pmt:consumeInstructions. In addition to the pmt:OutputPort, they create a pmt:DataContract. In this data contract, they describe the terms of service in a pmt:SLA and any quality checks already performed in the customer service pmt:Domain’s database as pmt:providerPromises. Moreover, the company has a pmt:policy that any personally identifiable information (PII) may only be shared in compliance with the GDPR. Therefore, Alice has to add a clause to the pmt:DataContract, in the form of a pmt:consumerPromise, that this data may only be consumed for purposes for which the data subject has given consent.

At the same time, Alice asks Bob to describe their use case. This is useful for improving the Discoverability and Understandability of the data product by providing useful information for other potential data consumers. However, Alice is also interested in feedback from their consumers in the form of the pmt:estimatedValue of Bob’s use case, which justifies their effort in creating the data product. In the same vein, Alice wants to know how long Bob plans to consume the data product. Since customers should always be able to cancel their subscriptions, Bob tells Alice that their intended pmt:plannedEndDate is “indefinite”.
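
Continuing the sketch for use case A (same hypothetical namespaces; the pmt:consumesFrom linking property and all literal values are illustrative):

@prefix pmt: <https://example.org/promote#> .   # hypothetical namespace URI
@prefix ex:  <https://example.org/telco/> .

ex:OutputPortA a pmt:OutputPort ;
    pmt:exposesDistribution ex:SubscriptionSQLDistribution ;   # described via its pmt:physicalSchema
    pmt:consumeInstructions "Call the REST API with a valid access token."@en ;
    pmt:hasDataContract ex:ContractA .

ex:ContractA a pmt:DataContract ;
    pmt:SLA "Available on working days, 99.5% uptime."@en ;              # terms of service
    pmt:providerPromises "Quality checks are performed on the source database."@en ;
    pmt:consumerPromise  "PII is only used for purposes the data subject consented to."@en .

# Bob's use case, including the feedback-oriented properties.
ex:CancellationUseCase a pmt:UseCase ;
    pmt:consumesFrom ex:OutputPortA ;      # hypothetical linking property
    pmt:estimatedValue "high" ;
    pmt:plannedEndDate "indefinite" .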

Use Case B. For use case B, Alice realises Charlie cannot use the same output port as Bob because of the aforementioned pmt:policy, which only allows the processing of customer data for marketing purposes if customers have specifically opted in. Because of this, Alice creates another dcat:Distribution of the pmt:Dataset in their SQL-based database. In this distribution, customers who have not opted in to targeted marketing or segmentation are anonymised by removing all values in their columns. Alice makes sure to describe how the pmt:physicalSchema includes anonymised data and creates another pmt:OutputPort to expose this distribution at the same pmt:endpointURL (e.g., an API) as for use case A, but with different access rights. The access rights are captured in the pmt:DataContract. Moreover, the data contract notes in a pmt:consumerPromise that this output port is suitable for any use case that consumes the data for marketing purposes. Finally, Charlie describes their pmt:UseCase in much the same manner as Bob did for use case A.
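
A corresponding sketch for this second output port (same hypothetical namespaces; the endpoint URL is illustrative):

@prefix pmt: <https://example.org/promote#> .   # hypothetical namespace URI
@prefix ex:  <https://example.org/telco/> .

ex:OutputPortB a pmt:OutputPort ;
    # Same endpoint as output port A, but with different access rights,
    # which are captured in the data contract.
    pmt:endpointURL <https://api.example.org/subscriptions> ;
    pmt:exposesDistribution ex:AnonymisedSQLDistribution ;   # opted-out customers anonymised
    pmt:hasDataContract ex:ContractB .

ex:ContractB a pmt:DataContract ;
    pmt:consumerPromise "Data is consumed for marketing purposes only, for customers who opted in."@en .
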
Use Case C. Based on customer interviews, Dave tells Alice that timeliness is an essential pmt:SLO for their use case: it is more important that data is quickly available, even if it might take a while to update the customer information. Therefore, Alice decides to create a Kafka dcat:Distribution for streaming that prioritises speed and completeness over accuracy. The pmt:endpointURL of the corresponding output port refers to a topic that pmt:exposesDistribution this distribution, and Dave (or any other data consumer) can subscribe to this topic following the pmt:consumeInstructions. Alice describes all the pmt:providerPromises they make over this output port, such as the timeliness pmt:SLO constraint, in a pmt:DataContract. Additionally, since the customer data contains the same personally identifiable information (PII) as in use case A, Alice includes the same limitation in a pmt:consumerPromise. Finally, just like in the previous use cases, Alice asks Dave to describe their use case for improved discoverability and understandability of the data product and to establish its pmt:estimatedValue.
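
And a sketch for the streaming port of use case C (topic URL and SLO wording are illustrative):

@prefix pmt: <https://example.org/promote#> .   # hypothetical namespace URI
@prefix ex:  <https://example.org/telco/> .

ex:OutputPortC a pmt:OutputPort ;
    pmt:endpointURL <kafka://broker.example.org/subscription-events> ;  # refers to a Kafka topic
    pmt:exposesDistribution ex:SubscriptionKafkaDistribution ;
    pmt:consumeInstructions "Subscribe to the topic with the company-provided Kafka client."@en ;
    pmt:hasDataContract ex:ContractC .

ex:ContractC a pmt:DataContract ;
    pmt:SLO "New subscription events are published within five minutes."@en ;  # timeliness
    pmt:consumerPromise "PII is only used for purposes the data subject consented to."@en .
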
Throughout the process of creating the customer data product, Alice is supported by the infrastructure-as-a-service that the platform providers of their company provide as part of the data mesh ecosystem, such as a data catalogue for registering the dcat:CatalogRecord of their data product and access management tools for their output ports. In particular, the company offers tools that help measure and enforce the pmt:policies captured in the pmt:DataContract. Updating and managing these tools is done through the pmt:ControlPorts, which can be accessed both by Alice as the data provider and by members of the federated data governance team. This makes the control port the second major component that makes Alice’s data product sensitive to feedback (from the platform providers).

6 Validation

In order to validate that ProMoTe can accurately, consistently and robustly describe data products, we employed validation through formalisation, validation through experimentation and validation through construction.

Firstly, we have formally specified and verified ProMoTe in UML (see, for example, Fig. 3 and its instantiation in Fig. 5) and in RDF (see our online repository¹⁴). This has allowed us to ascertain internal and construct validity. Moreover, we have formalised the relation between the components of ProMoTe and the non-functional requirements in the formal specification provided online¹⁴.

As mentioned above, the fact that ProMoTe is technology-independent is an advantage, as it allows organisations to implement its logic to fit their own architecture, organisational structure, and technical infrastructure. We envision different organisations having different physical implementations of data policies, data contracts, data storage infrastructure, metadata storage, etc. To validate that ProMoTe can help develop such physical implementations, we created several technical prototypes within the company based on ProMoTe. Specifically, metadata entities were created for data products, use cases, and output ports on the company’s data catalogue. These entities came with corresponding metadata templates that aspiring data product providers can fill out to help them describe new (and existing) data products. These metadata entries then enabled the development of new prototypes, such as an early data governance dashboard for tracking ownership of the various data products.

Fig. 6. A screenshot of the ProMoTe-based technical prototype data product metadata template implemented in the KPN data catalogue.

The company uses the DataHub⁴ data catalogue to gather metadata about its datasets and distributions from various source platforms across the company. For each dataset and distribution, metadata is ingested and stored in an entry within the DataHub model, which runs on a GraphQL backend. To implement the metadata entries for data products, output ports and use cases, we used the business glossary functionality of DataHub; Fig. 6 shows a screenshot of the template in DataHub. This approach had several advantages over editing DataHub’s GraphQL model: 1) it allowed for rapid changes based on user feedback; 2) it resulted in an intuitive, interactive environment for users to fill out the template and create their own metadata entries; 3) whenever entities were not yet implemented in the catalogue or existed on external platforms (such as the company’s institutional knowledge), hyperlinks could easily be leveraged to reference these external resources.

The data product metadata template also came with instructions on how to relate it to other metadata entries, such as owners, domains, datasets, use cases, output ports and institutional knowledge. Figure 7 shows an example data product and its relations to a dataset metadata entry, an owner and a domain. Relations to other entries, such as use cases, output ports and institutional knowledge, are captured in the same manner on different tabs of the data product metadata entry.

Fig. 7. An anonymised screenshot of a filled-out metadata template in DataHub and how it relates to distributions.

Another application of ProMoTe that feeds directly from the metadata template is the construction of a prototype governance dashboard. The dashboard, shown in Fig. 8, runs in Power BI and feeds directly off the GraphQL backend of the data catalogue. Having a formal meta-model of data products in ProMoTe allows the company’s governance team to keep track of the status of important data product characteristics. In the prototype, this translates to keeping track of (domain- and individual-level) ownership and encouraging new data providers to provide, at minimum, a description of their data product.

Fig. 8. An anonymised screenshot of the Power BI dashboard technical prototype used by the centralised data governance team in the company.

6.1 Lessons Learned

Based on the implementations described above, we present a brief overview of the lessons learned when applying ProMoTe in practice in industrial settings, and how these lessons affected (the use of) ProMoTe.

Lesson 1. Set maturity levels for developing and describing data products. The industrial partners wanted to categorise existing data entities and data products under construction into different maturity levels. We addressed this by relating maturity levels to the non-functional requirements: level 1 focuses on Discoverable and Addressable, level 2 on Understandable, level 3 on Natively Accessible, Valuable and Feedback-Driven, and level 4 on Truthfulness & Trustworthiness, Interoperability and Security. Moreover, we added a pmt:maturityLevel property to the ProMoTe meta-model so that a product’s maturity level can be described.
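
Encoded in the meta-model, this is a single additional statement; for example (hypothetical namespaces and value encoding):

@prefix pmt: <https://example.org/promote#> .   # hypothetical namespace URI
@prefix ex:  <https://example.org/telco/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Level 2: the data product is Discoverable, Addressable and Understandable.
ex:SubscriptionDP pmt:maturityLevel "2"^^xsd:integer .
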
Lesson 2. Emphasise relevant components first. When describing and developing the first data products, the full ProMoTe meta-model came across as overwhelming to new data providers. To address this, we organised information sessions with relevant stakeholders and only used a subset of the fields in ProMoTe for the metadata template described above.

Lesson 3. Address interoperability top-down and bottom-up. Achieving interoperability between data products has proven challenging. Nevertheless, we have found two ways to address this problem. The first relies on interoperability between pmt:InstitutionalKnowledge entities, e.g., through the use of knowledge graphs; the second relies on traditional techniques for achieving interoperability between datasets, such as the use of foreign keys [3].

Lesson 4. Integrate ProMoTe with the data mesh architecture. Despite demonstrating which components must be built, ProMoTe cannot be used out of the box for building data products. For this, a clear overview of the various tools and infrastructure-as-a-service components (e.g., as described by Goedegebuure et al. [6]) and how they relate to the individual components of ProMoTe is necessary.

7 Conclusion

In this paper, we have introduced ProMoTe¹⁴, a technology-agnostic meta-model for specifying, developing and managing data products in a data mesh. ProMoTe is DCAT-compliant and can easily be combined with existing data catalogues for describing data. Moreover, ProMoTe is explicitly linked to non-functional requirements gathered from academia and industry, making it more likely to describe valuable data products. We believe ProMoTe can be used to instantiate the different components of data products in various organisational settings. To validate this, we instantiated the metadata entries of data products and their components in a data catalogue in an industrial environment at a large Telco company.

The results in this paper are core results; more extensions and refinements are needed in various directions. Firstly, we wish to establish external validity by testing our approach in other industrial settings. Moreover, we intend to demonstrate the applicability of ProMoTe by developing instantiations of all its components, not just metadata entries on a data catalogue. Eventually, we hope to extend this approach to define a method and/or patterns that assist data product developers in effectively creating, maintaining and improving data products. Finally, another avenue for future work is the integration with the larger data mesh architecture, considering the different architectural components (e.g., from the reference architecture provided by Goedegebuure et al. [6]) and their relation to the data product.

References

1. Beck, K., Beedle, M., Bennekum, A.V., Cockburn, A.: The agile manifesto (2001). https://www.agilealliance.org/wp-content/uploads/2019/09/agile-manifesto-download-2019.pdf
2. Dehghani, Z.: Data Mesh: Delivering Data-Driven Value at Scale, 1st edn. O’Reilly (2022)
3. Driessen, S., Monsieur, G., van den Heuvel, W.J.: Data product metadata management: an industrial perspective (2022)
4. Driessen, S., Monsieur, G., van den Heuvel, W.J.: Data market design: a systematic literature review. IEEE Access 10, 1 (2022). https://doi.org/10.1109/access.2022.3161478
5. Eichler, R., Gröger, C., Hoos, E., Schwarz, H., Mitschang, B.: From data asset to data product - the role of the data provider in the enterprise data marketplace. In: Barzen, J., Leymann, F., Dustdar, S. (eds.) Service-Oriented Computing, SummerSOC 2022. Communications in Computer and Information Science, vol. 1603, pp. 119–138. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-18304-1_7
6. Goedegebuure, A., et al.: Data mesh: a systematic gray literature review (2023)
7. Hooshmand, Y., Resch, J., Wischnewski, P., Patil, P.: From a monolithic PLM landscape to a federated domain and data mesh, pp. 713–722 (2022)
8. Kennedy, J., Subramaniam, P., Galhotra, S., Fernandez, R.C.: Revisiting online data markets in 2022. ACM SIGMOD Rec. 51, 30–37 (2022). https://doi.org/10.1145/3572751.3572757
9. Loukiala, A., Joutsenlahti, J.-P., Raatikainen, M., Mikkonen, T., Lehtonen, T.: Migrating from a centralized data warehouse to a decentralized data platform architecture. In: Ardito, L., Jedlitschka, A., Morisio, M., Torchiano, M. (eds.) PROFES 2021. LNCS, vol. 13126, pp. 36–48. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-91452-3_3
10. Machado, I.A., Costa, C., Santos, M.Y.: Data mesh: concepts and principles of a paradigm shift in data architectures. Procedia Computer Science, vol. 196, pp. 263–271. Elsevier (2021). https://doi.org/10.1016/j.procs.2021.12.013
11. Olson, J.A.: Data as a service: are we in the clouds? J. Map Geography Libr. 6, 76–78 (2009). https://doi.org/10.1080/15420350903432739
12. Otto, B., Steinbuß, S., Teuscher, A., Lohmann, S.: IDSA reference architecture model. International Data Spaces Association (2019). https://internationaldataspaces.org/download/16630/
13. Ozyilmaz, K.R., Dogan, M., Yurdakul, A.: IDMoB: IoT data marketplace on blockchain. In: Proceedings - 2018 Crypto Valley Conference on Blockchain Technology, CVCBT 2018, pp. 11–19 (2018). https://doi.org/10.1109/CVCBT.2018.00007
14. Pillmann, J., Sliwa, B., Schmutzler, J., Ide, C., Wietfeld, C.: Car-to-cloud communication traffic analysis based on the common vehicle information model, pp. 1–5 (2018)
15. Sakr, M.: A data model and algorithms for a spatial data marketplace. Int. J. Geograph. Inf. Sci. 32, 2140–2168 (2018). https://doi.org/10.1080/13658816.2018.1484124
16. Shakeri, S., et al.: Modeling and matching digital data marketplace policies. In: Proceedings - IEEE 15th International Conference on eScience, eScience 2019, pp. 570–577 (2019). https://doi.org/10.1109/eScience.2019.00078
17. Spiekermann, M., Tebernum, D., Wenzel, S., Otto, B.: A metadata model for data goods. In: MKWI 2018 - Multikonferenz Wirtschaftsinformatik, pp. 326–337 (2018)
18. Sánchez, L., et al.: Federation of internet of things testbeds for the realization of a semantically-enabled multi-domain data marketplace. Sensors 18, 3375 (2018). https://doi.org/10.3390/s18103375
19. Dehghani, Z. (Thoughtworks): How to move beyond a monolithic data lake to a distributed data mesh (2019). https://martinfowler.com/articles/data-monolith-to-mesh.html
20. Vu, Q.H., Pham, T.V., Truong, H.L., Dustdar, S., Asal, R.: DEMODS: a description model for data-as-a-service. In: Proceedings - International Conference on Advanced Information Networking and Applications, AINA, pp. 605–612 (2012). https://doi.org/10.1109/AINA.2012.91
21. Yuan, J., Li, H.: Research on the standardization model of data semantics in the knowledge graph construction of oil & gas industry. Comput. Stand. Interfaces 84, 103705 (2023). https://doi.org/10.1016/j.csi.2022.103705
