
Article

A Framework for Current and New Data Quality Dimensions: An Overview
Russell Miller 1, Harvey Whelan 1,2, Michael Chrubasik 1, David Whittaker 1, Paul Duncan 1 and João Gregório 1,*

1 Informatics, Data Science Department, National Physical Laboratory, Glasgow G1 1RD, UK


2 Department of Natural Sciences, University of Bath, Bath BA2 7AX, UK
* Correspondence: [email protected]

Abstract: This paper presents a comprehensive exploration of data quality terminology, revealing a
significant lack of standardisation in the field. The goal of this work was to conduct a comparative
analysis of data quality terminology across different domains and structure it into a hierarchical data
model. We propose a novel approach for aggregating disparate data quality terms used to describe
the multiple facets of data quality under common umbrella terms with a focus on the ISO 25012
standard. We introduce four additional data quality dimensions: governance, usefulness, quantity,
and semantics. These dimensions enhance specificity, complementing the framework established
by the ISO 25012 standard, as well as contributing to a broad understanding of data quality aspects.
The ISO 25012 standard, a general standard for managing data quality in information systems,
offers a foundation for the development of our proposed Data Quality Data Model. This is due to the
prevalent nature of digital systems across a multitude of domains. In contrast, frameworks such as
ALCOA+, which were originally developed for specific regulated industries, can be applied beyond
their original context but do not always generalise well. Ultimately, the model we propose aggregates
and classifies data quality terminology, facilitating seamless communication of data quality between
different domains when collaboration is required to tackle cross-domain projects or challenges. By
establishing this hierarchical model, we aim to improve understanding and implementation of data
quality practices, thereby addressing critical issues in various domains.

Keywords: data quality; data model; data quality dimensions; data traceability; confidence in data; data metrology; data uncertainty; data structures; big data; IoT

Citation: Miller, R.; Whelan, H.; Chrubasik, M.; Whittaker, D.; Duncan, P.; Gregório, J. A Framework for Current and New Data Quality Dimensions: An Overview. Data 2024, 9, 151. https://doi.org/10.3390/data9120151

Academic Editor: Panagiotis Karras

Received: 12 September 2024; Revised: 12 November 2024; Accepted: 13 December 2024; Published: 18 December 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

In this digital age, we are experiencing an unparalleled increase in the generation of
data [1]. There is an increasing dependence on data for informing decision-making processes
and influencing future plans within organisations [2–4]. The use of data is readily apparent
across many different sectors, from pharmaceutical manufacturing and healthcare [5–14] to
engineering and education [15–18]. This is exemplified by the frequent adoption of data-
driven initiatives by businesses aiming to sustain a competitive advantage. In healthcare,
for example, the strategic use of personalised and detailed patient data enables professionals
to design and administer tailored therapies [19]. In the manufacturing industry, sensors
generate large volumes of data, assuming a vital role in the monitoring and enhancement of
manufacturing processes [20]. As we increase our reliance on data to drive processes and
decision making, we must also be vigilant about its quality since the results obtained from
using data-driven methods are dependent on the quality of the data used.
Lower-quality data, such as measurements captured by a sensor with insufficient
precision, lack the necessary requirements to support processes and can have adverse con-
sequences for organisations. In data-driven operations, it is imperative to evaluate the data
to determine its suitability for the intended task. This prompts a clear and fundamental
question: what is data quality and what methodologies can be used for evaluating it? As
outlined in ISO standard 25012 [21], data quality relates to the extent to which data meet
the specifications established by an organisation responsible for developing a product.
The standard presents a comprehensive data model that, in the context of data manage-
ment within software systems, facilitates the assessment of data quality by considering
characteristics such as correctness, completeness, and consistency.
Prior studies on data quality have typically focused on specific sectors, as evidenced by
existing literature reviews [19,22–34]. There has been a growing need in the healthcare indus-
try, namely in electronic health records, to define and evaluate important dimensions of data
quality [19,35,36]. Research has also been conducted in the field of Architecture, Engineering,
and Construction (AEC) [37], the autonomous vehicles industry [38], and in data analytics
for smart factories [1]. Although these studies have made valuable contributions towards
enhancing the comprehension of data quality in specific areas, they have also reinforced the
issue of conflicting data quality definitions and terminology across sectors. This is further
aggravated by disparities within a given sector. For instance, an investigation conducted on
electronic health records revealed the existence of 44 unique dimensions employed across
different pockets of the healthcare sector [19]. This highlights the need for a standardised
approach to data quality that transcends sector-specific definitions and applications.
In a literature review conducted by Ibrahim et al. [32], data quality issues affecting
the master data—data related to business entities that provide sufficient context for
transactions—of businesses and organisations were explored. The methodology used aimed
to identify key factors influencing the quality of master data. Analysis revealed that data
governance emerged as the most frequently addressed term within business operations,
with 11 out of 15 studies highlighting its significance. This heightened emphasis on data
governance can be attributed to increased GDPR compliance needs and digitalisation,
which requires defined roles and responsibilities in data quality management. Elements
that impact master data quality were compared for three distinct sectors: business, account-
ing, and healthcare. They identified terminology discrepancies in these elements, which
impact the quality of the data. It follows that there is a significant need to conduct a more
overarching analysis in order to capture the diversity observed throughout the different
sectors. Additionally, such an analysis could provide insights into common data quality
dimensions that are applicable across various domains.
Certain sectors, such as AEC, have not given due attention to data quality metrics.
A literature review by Morwood et al. [37] highlighted the insufficient consideration of
data quality in monitoring the energy performance of buildings. This concern prompted an
examination of recent literature through the lens of data quality, revealing that only 9 out of
162 articles explicitly addressed the subject, and, among those that did discuss data quality,
an average of only 3.23 data quality dimensions were mentioned. While specific data
quality issues, such as low spatio-temporal granularity, data loss, and high measurement
uncertainty have been sporadically addressed, the primary observation underscores the
fragmented nature of data quality approaches in building energy monitoring studies.
Addressing these gaps could lead to more effective data quality practices in the AEC sector.
A study conducted by Mansouri et al. [33] aimed to identify the dimensions of data
quality relevant for Internet of Things (IoT) applications. A constraint encountered in this
study was the limited number of data quality dimensions that were explicitly referenced
for the IoT domain. The researchers discovered that a lack of agreement existed on the
dimensions that could be applied to various data categories. They proposed the incorpo-
ration of additional dimensions, informed by insights from specialists in the IoT domain,
as a potential solution to tackle this issue. An additional literature review focused on IoT
has also been published, containing a comparative analysis of the relationship between data
types and data quality dimensions to refine available options for managing IoT data [39].
Findings indicate that the majority of IoT solutions have the capability to process structured
or semi-structured data, whereas their ability to handle unstructured data is limited. The di-
mensions of data quality that have been identified as particularly significant in this context
include completeness, correctness, consistency, and timeliness. This highlights the need for
a comprehensive framework that can accommodate the diverse data types encountered in
IoT applications.
Another work, conducted by Firmani et al. [40], focused on data quality related to big
data. The complexity of big data, which comprises large volumes of unstructured data,
presents unique challenges in terms of data accessibility, which is itself an aspect of data
quality. Firmani’s research provides perspective on the subject and adds to the establishment
of methodologies for assessing the quality of large datasets. Big data are also connected
with machine learning (ML) by leveraging ML’s ability to discern data characteristics and
enhance data processing. The effectiveness and efficiency of ML techniques are tied to the
datasets, including their volume, used in the training process, thus further emphasising
the importance of data quality as a factor influencing model performance [3]. This relation-
ship highlights the need for robust data quality frameworks to support effective machine
learning applications.
Big data and IoT are revolutionising how data are used across different sectors. In busi-
ness, big data analytics informs decision making and optimises operations, while, in health-
care, IoT devices can be used to analyse patient data for creating personalised therapies [27].
The common denominator is the role of data and its quality. As these technologies progress,
challenges surface that highlight the need for standardised data quality frameworks. Apply-
ing such frameworks will be essential for ensuring data quality across diverse applications
and sectors.
These reviews show that, despite ongoing efforts to establish standardised data quality
metrics and assessment procedures, different industries have unique data needs that must
be considered. As a result, inconsistencies can emerge when attempting to define data
quality. These inconsistencies stem from a lack of standardisation, with most sectors
struggling to keep up and varying in their understanding and implementation of data
quality to improve their processes and decision making. This lack of standardisation creates
a large diversity in the terminology used to describe data quality, resulting in further
communication challenges. While the need for unambiguous data quality terminology has
been recognised, a notable lack of established methodologies or standardised definitions of
data quality persists. The goal of
this paper is to conduct a comparative analysis of data quality terminology across different
domains and structure it into a hierarchical data model. This structured glossary has the
potential to act as a translation layer, enabling different domains to communicate with each
other using a common language framework when addressing the issue of data quality.
The remainder of this paper is structured as follows: Section 2 (Methods) details the
literature search and terminology mapping processes used; Section 3 (Results) presents the
identified data quality dimensions and their classification; Section 4 (Discussion on Results)
analyses the classification of data quality dimensions into the following categories: inherent
data quality, contextual data quality, and system-dependent data quality; Section 5 (Discus-
sion) provides a higher-level perspective on data quality and its dimensions; and Section 6
(Conclusions) summarises the findings of this work and outlines future perspectives.

2. Methods
In this section, we outline the methodologies used in this study to ensure a compre-
hensive and rigorous analysis of data quality terminology. Our approach was twofold:
First, we conducted a literature search, as described in Section 2.1, to identify relevant
academic papers that discuss data quality. Second, to align the terms used in these papers
with the data quality dimensions specified by the ISO 25012 standard, we performed a
terminology mapping exercise, as detailed in Section 2.2. This enabled us to capture the
broad spectrum of data quality descriptors, thereby enriching our understanding of the
subject. This also allowed us to condense the heterogeneous terminology into a structured,
hierarchical vocabulary.

2.1. Literature Search


The main literature search was conducted using an independent multiple-review
approach and is highlighted in Figure 1. The search was carried out using Web of Science
(WoS) as the designated search engine. The search query included “data quality” as both
the search term and the only required keyword. The scope of the search was limited to
recently published academic papers between 2019 and 2023. This range allowed for optimal
use of resources available to the authors, while focusing on the latest developments in data
quality research. This resulted in a total of 10,855 papers being identified.

Figure 1. The literature screening process used in this study (does not account for research papers
found outside of the main Web of Science query). [Flow diagram — Identification: records identified
from WoS, research papers (n = 10,855); records removed before screening after the 500-paper
maximum limit (n = 10,355). Screening: records screened (n = 500); records excluded (n = 327);
reports sought for retrieval (n = 173); reports not retrieved (n = 57); reports assessed for eligibility
(n = 116); reports excluded (n = 0). Included: studies included in review (n = 116).]

Only publications written in English and found using the outlined search query were
considered, and only the first 500 papers from the total found were reviewed. This cut-off
was decided upon cross-referencing results from the multiple-review approach: it was
around the 500-paper mark that all researchers, independently of each other, stopped
finding literature they deemed relevant for this study. This cap of 500 also ensured that the
number of papers to be individually reviewed was manageable.
Each researcher individually reviewed these 500 papers based on title, abstract,
and keywords. Subjective reviewing criteria, based on research expertise, were used to
decide whether the paper was relevant or not, in accordance with the independent multiple-
review approach. Papers identified as relevant by at least one researcher proceeded through
the remainder of the screening phase for a more in-depth review.
This approach highlighted a total of 173 papers, already accounting for the removal of
duplicate papers found by multiple researchers. Of these, 57 papers were inaccessible
behind closed access. This resulted in a total of 116 relevant papers being captured for
further review and discussion.
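To make the merge step concrete, the following minimal sketch (in Python) reproduces the union-and-deduplication logic of the independent multiple-review approach; the reviewer names and paper identifiers are hypothetical placeholders, not data from this study.

# Each reviewer's selections, stored as sets of paper identifiers (illustrative only).
reviewer_a = {"paper_001", "paper_017", "paper_042"}
reviewer_b = {"paper_017", "paper_033"}
reviewer_c = {"paper_042", "paper_033", "paper_099"}

# A paper proceeds if at least one reviewer marked it as relevant; the set
# union also removes duplicates found by multiple reviewers.
relevant = reviewer_a | reviewer_b | reviewer_c
print(f"{len(relevant)} unique papers proceed to in-depth review")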

In addition to the primary literature search methodology, additional publications were
discovered through casual conversations during the research process. These are discussed
on a case-by-case basis and are not captured by Figure 1.

2.2. Terminology Mapping


Each paper included in this study was manually and individually screened for any and
all terminology mentioned, described, or used to assess data quality. The terms considered
were those that primarily described (or attempted to describe) any of the data quality
dimensions specified by the ISO 25012 standard. Terms that did not match the ISO 25012
standard but were used in the context of assessing data quality were also considered.
The outcome was a list of 262 terms used to describe some aspect of data quality or one
of its dimensions. The definition of each individual term, as stated in its source material,
was used to align the term with one of ISO 25012’s existing dimensions. The list included
antonyms, such as “inaccuracy” being used to describe the accuracy of data. Terms that did not fit
into any of the existing dimensions were grouped together based on their similarities to help
define new core data quality dimensions, in addition to ISO 25012’s existing terminology.
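As a hedged illustration of this mapping exercise, the Python sketch below aligns a handful of terms with dimensions via a lookup table; the specific term-to-dimension pairs are abbreviated examples drawn from Tables 3–5, and the function name is an assumption for illustration.

# Hypothetical term-to-dimension lookup built from each term's definition
# in its source paper (abbreviated; the full list contains 262 terms).
TERM_TO_DIMENSION = {
    "correctness": "Accuracy",
    "inaccuracy": "Accuracy",      # antonyms align with the dimension they negate
    "variety": "Completeness",
    "timeliness": "Currentness",
    "auditability": "Governance",  # one of the new dimensions proposed here
}

def map_term(term):
    # Terms that fit no existing dimension are grouped for later definition
    # of new core dimensions (governance, usefulness, quantity, semantics).
    return TERM_TO_DIMENSION.get(term.lower(), "unmapped: candidate for a new dimension")

print(map_term("Timeliness"))  # -> Currentness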

3. Results
3.1. Data Quality Dimensions
The ISO 25012 standard offers a detailed framework consisting of 15 different di-
mensions of data quality that determine the features related to the management and
improvement of data quality. These dimensions are defined in Table 1 according to their
ISO definitions. While this standard was originally developed in the context of software
development [21], it has relevance to the field of data science, given the close alignment
between both practices. They both share a focus on the importance of high-quality, reliable
data for successful outcomes, and they have cross-applicability over multiple domains.

Table 1. Data quality dimensions, as described by ISO 25012. This table presents the dimensions that
represent the fundamental properties of data quality.

Accuracy: The degree to which data have attributes that correctly represent the true value of the intended attribute of a concept or event in a specific context of use.
Completeness: The degree to which subject data associated with an entity have values for all expected attributes and related entity instances in a specific context of use.
Consistency: The degree to which data have attributes that are free from contradiction and are coherent with other data in a specific context of use. It can be either or both among data regarding one entity and across similar data for comparable entities.
Credibility: The degree to which data have attributes that are regarded as true and believable by users in a specific context of use. Credibility includes the concept of authenticity (the truthfulness of origins, attributions, commitments, etc.).
Currentness: The degree to which data have attributes that are of the right age in a specific context of use.
Accessibility: The degree to which data can be accessed in a specific context of use, particularly by people who need supporting technology or special configuration because of some disability.
Compliance: The degree to which data have attributes that adhere to standards, conventions, or regulations in force and have similar rules relating to data quality in a specific context of use.
Confidentiality: The degree to which data have attributes that ensure that they are only accessible and interpretable by authorised users in a specific context of use.
Efficiency: The degree to which data have attributes that can be processed and provide the expected levels of performance using the appropriate amounts and types of resources in a specific context of use.
Precision: The degree to which data have attributes that are exact or that provide discrimination in a specific context of use.
Traceability: The degree to which data have attributes that provide an audit trail of access to the data and of any changes made to the data in a specific context of use.
Understandability: The degree to which data have attributes that enable them to be read and interpreted by users, as well as are expressed in appropriate languages, symbols, and units in a specific context of use.
Availability: The degree to which data have attributes that enable them to be retrieved by authorised users and/or applications in a specific context of use.
Portability: The degree to which data have attributes that enable them to be installed, replaced, or moved from one system to another preserving the existing quality in a specific context of use.
Recoverability: The degree to which data have attributes that enable them to maintain and preserve a specified level of operations and quality, even in the event of failure, in a specific context of use.

While this standard can be interpreted as an all-encompassing framework to assess
data quality, its specificity to software development still imposes limitations to its use across
different domains. Certain terms could not be associated with existing data quality dimen-
sions. As a result, this work considers the addition of four new dimensions: governance,
usefulness, quantity, and semantics. These are defined, by this work, in Table 2. The addi-
tion of these dimensions expands the ISO 25012 standard into a standardised framework for
categorising data quality dimensions that applies more reliably across different domains.

Table 2. Additional data quality dimensions defined by this work to complement the ISO 25012
data model.

Governance: The degree to which data have attributes that adhere to the formalised frameworks of authority and accountability that support harmonised data activities across an organisation.
Usefulness: The usefulness of data is determined by the extent to which their attributes meet the specific requirements of users or applications. This includes the data’s adaptability across various contexts, recognising their potential for diverse applicability due to aspects such as reusability and interoperability.
Quantity: The degree to which data have attributes that represent the sufficient amount or volume, providing a comprehensive view of the intended attribute of a concept or event in a specific context of use.
Semantics: The degree to which data accurately and consistently represent the intended meaning, interpretation, and real-world concepts within a specific context of use, ensuring correct semantic understanding by users and applications.

In the process of developing this work, the ALCOA+ framework, a data integrity
framework that is widely adopted in life sciences and endorsed by the US Food and Drug
Administration (FDA), was also considered [8,41]. However, compared to ISO 25012, it
provides less detail in defining data quality dimensions, which poses challenges for its
adaptation into a universal data quality framework. A notable limitation of ALCOA+ is
its inability to distinguish between accuracy and precision. While it effectively addresses
the data quality needs specific to life sciences, its applicability to other sectors is limited
due to its lack of generalisability. This limitation reinforces the rationale for selecting ISO
25012, a data quality framework with relevance to the software development domain, as the
foundation for the more versatile data quality framework proposed in this study.

3.2. Classification of Data Quality Dimensions


ISO 25012 also assigns each data quality dimension a position in a spectrum that ranges
from “Inherent” to “System-dependent”. Inherent dimensions are those that represent
the fundamental, intrinsic properties of data that hold true regardless of context or user
requirements. Consider a hospital database that records patient information. The accuracy
of these data, such as correct names and addresses, is an inherent dimension. If a patient’s
name is recorded incorrectly, it could lead to serious errors in patient care.
System-dependent dimensions underscore the role of the system in data quality, and
they include aspects such as data availability, portability, and recoverability. For example,
consider an online banking system. The availability of the system (i.e., the system is
up and running when a customer wants to check their account balance) is a system-
dependent dimension.

In addition to this placement, this work also considers dimensions in the middle
of this spectrum to be “Contextual” as they share both inherent and system-dependent
characteristics. Contextual dimensions are those that emphasise the importance of the
context in which data are used. For instance, in a marketing campaign, customer data
needs to be unique, accurate, and consistent across all engagement channels. The relevancy
of the data to the specific marketing campaign is a contextual dimension.
Tables 3–5 list all the inherent, contextual, and system-dependent data quality di-
mensions, respectively. These tables include all dimensions from the original ISO 25012
standard, as well as the additions proposed by this work. They also list all the 262 asso-
ciated terms found in the literature search, as described in Section 2.1, which are used to
describe each of the core data quality dimensions. Lastly, Figure 2 showcases how the
262 associated terms are organised into a structured glossary for supporting efficient data
quality communication across different domains.

Figure 2. Structured glossary of data quality terminology. The first concentric ring categorises data
quality into three domains: inherent, contextual, and system dependent. Each domain is further split
into core data quality dimensions in the second ring. The outermost ring aligns all associated terms
found in the literature, and they are presented in Tables 3–5 with their corresponding core terms.

Table 3. Inherent data quality dimensions based on ISO 25012. This table presents the dimensions
that represent the fundamental, intrinsic properties of data that hold true regardless of the context or
user requirements. Each dimension is defined and associated with specific terms from the literature,
providing a comprehensive overview of the inherent aspects of data quality [1,2,5,6,8,10–17,19,23,26,28,29,31–33,36–140].

Accuracy: Accuracy, Accurate, Closely match a real-state, Coincidence, Correct, Corrections made, Correctness, Data value out of range, Errors, Free of error, Free of mistakes, Inaccurate, Incorrect, Positive predicted values, Value accuracy
Completeness: Complete, Completeness, Comprehensiveness, Diversity, Entity heterogeneity, Heterogeneity, Incompleteness, Coverage, Min. data capture, Min. sample points, Min. time coverage, Missing values, Representativeness, Study representativeness, Variety, Areas covered, Geographical coverage, Handling of null values, Homogeneity, Missing information, Missingness, Omission, Representativity, Scope, Technological cover, Time-related coverage
Consistency: Coherence, Cohesiveness, Comparability, Consistency, Consistent, Consistent representation, Constant representation, Comparable, Duplication, Incompatibility, Inconsistency, Redundancy, Representational Consistency, Reproducibility, Spatial stability, Structural consistency, Syntactic accuracy, Thematic accuracy, Variability
Credibility: Agreement, Authenticity, Believability, Bias, Coding reliability, Confidence, Corroboration, Credibility, Freedom of bias, Impartiality, Incorrect information, Integrity, Misleading, Plausibility, Popularity, Quality, Reliability, Reputability, Reputation, Robustness, Status, Trust, Trustworthiness, Unambiguity, Unbiased, Valid, Validity, Veracity
Currentness: Actuality, Currency, Currentness, Freshness, Outdated information, Rate of recording, Recency, Timeliness, Timely, Up-to-date, Velocity, Vitality, Volatility, Volatability

Table 4. Contextual data quality dimensions based on ISO 25012. This table presents the dimensions
that emphasise the importance of the context in which data are used. Each dimension is defined and
associated with terms from the literature, highlighting the multifaceted nature of contextual data
quality. * The table also introduces two newly added dimensions: governance and usefulness [1,2,5,6,8–17,19,23,26,28,29,31–33,36–105,107–126,128–146].

Accessibility: Accessibility, Clear definition, Discoverability, Ease of access, Findability
Compliance: Compliance, Concordance, Conformance, Conformity, Licensing, Model conformance, Privacy preservation, Value data type, Appropriate use
Confidentiality: Confidentiality, Data protection, Privacy, Security, Sensitivity, Statistical disclosure control, Vulnerability
Efficiency: Costs effectiveness, Ease of manipulation, Efficiency, Expediency, Minimality, Optimal use of resources, Performance, Viscosity
Governance *: Accountability, Alignment, Auditability, Authority, Authorisation, Enduring, Management, Risks
Precision: Attribute granularity, Brief representation, Concise representation, Conciseness, Detection limit, Distribution bias, Format precision, Imprecise, Intrinsic Approximation, Intrinsic uncertainty, Intrinsic variability, Level of detail, Noiseness, Objectivity, Outliers, Precision, Precision of domains, Representational conciseness, Spatial resolution, Time resolution, Unambiguous, Uncertainty, Variation
Traceability: Attributable, Capture, Contemporaneous, Documentation, Fairness, Identifiability, Lineage, Meta-data, Original, Provenance, Quality of methodology, Source, Traceability, Translatability, Transparency, Verifiability
Understandability: Characteristic series structure, Clarity, Clean, Complexity, Comprehensibility, Content, Ease of interpretation, Ease of understanding, Format, Formats, Information-to-noise ratio, Informativeness, Interpretable, Interpretability, Legible, Presentation, Presentation quality, Quantitativeness, Readability, Semiotic, Structure, Transformation, Understandability, Understandable, Understanding, Visualisation
Usefulness *: Applicability, Appropriateness, Artificiality, Definition, Essentialness, Expandability, Fitness, Fitness for purpose, Fitness for use, Flexibility, Importance, Interoperability, Irrelevant, Meaningful, Naturalness, Relevance, Relevancy, Relevant, Reusability, Suitability, Uniqueness, Useableness, Usability, Usefulness, Utility, Valuation, Value, Value added, Versatility

Table 5. System-dependent data quality dimensions based on ISO 25012. This table details the
dimensions that underscore the role of the system in data quality, including aspects related to data
availability, portability, recoverability, and quantity. Each dimension is defined and associated with
terms from the literature, illustrating how system characteristics can impact data quality. * The table
also introduces two newly added dimensions: quantity and semantics [5,6,8,10,12,19,23,28,29,31,33,36,37,39,41,43,44,46–49,51,54–58,60,62,67,68,71,72,75,79,84,85,87,91–93,98,103,105,107,108,110,114,121–123,125,126,129,131,132,135–137,139,141].

Availability: Access security, Adequacy, Attainability, Availability, Available, Obtainability, Visibility
Portability: Controllability, Mobility, Portability, Use of Storage
Quantity *: Amount of data, Appropriate amount, Compactness, Data volume, Scalability, Sufficiency, Suitable amount, Volume
Recoverability: Back up, Decay, Recoverability
Semantics *: Interlinking, Language, Semantic accuracy, Semantic consistency, Syntactic validity, Syntax
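A minimal Python sketch of the hierarchical structure implied by Figure 2 and Tables 3–5 follows: category, then core dimension, then associated terms. The term lists are heavily abbreviated, and the lookup function is an illustrative assumption rather than part of the published model.

# Category -> core dimension -> associated terms (abbreviated for illustration).
GLOSSARY = {
    "inherent": {
        "Accuracy": ["accuracy", "correctness", "free of error"],
        "Currentness": ["timeliness", "freshness", "volatility"],
    },
    "contextual": {
        "Governance": ["accountability", "auditability", "authority"],
        "Usefulness": ["fitness for use", "relevance", "reusability"],
    },
    "system-dependent": {
        "Quantity": ["data volume", "scalability", "sufficiency"],
        "Semantics": ["semantic accuracy", "syntax", "interlinking"],
    },
}

def locate(term):
    # Translate a domain-specific term into its category and core dimension.
    for category, dimensions in GLOSSARY.items():
        for dimension, terms in dimensions.items():
            if term.lower() in terms:
                return category, dimension
    return None

print(locate("Timeliness"))  # -> ('inherent', 'Currentness')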

4. Discussion on Results
4.1. Inherent Data Quality
According to ISO 25012, inherent data quality is defined as the degree to which data’s
quality characteristics have the intrinsic potential to meet stated and implied needs when
data are used under specified conditions. It refers to how well the intrinsic attributes and
values of data follow constraints and rules that make them fit for use.
The importance of maintaining robust inherent quality cannot be overstated as the
accuracy of downstream processes is only as good as the quality of the source data. Errors in
these areas can cause bigger problems later when the data are used for reporting, analytics,
and decision making.
For instance, invalid product codes in a purchase order database (domain values) can
result in shipment and accounting complications. Similarly, contradictory order statuses
over time (logical consistency) can disrupt plans for fulfillment prioritisation. Additionally,
a lack of descriptions for order status code meanings (metadata) can inhibit accurate
analysis, leading to shipping errors.
Therefore, it is crucial to use robust validation against business rules, governance
oversight, and thorough documentation to tackle quality issues at their root. By examining
key downstream data quality dimensions, such as accuracy, completeness, credibility, and
consistency, we can verify that inherent data integrity measures, such as comprehensive
validation rules, governance policies, and descriptive metadata, enable the data to meet
quality expectations across intended usage requirements.
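As a hedged illustration of tackling such issues at their root, the Python sketch below validates the purchase-order example against business rules; the product catalogue, the allowed status transitions, and the field names are assumptions introduced for illustration.

# Hypothetical business rules for the purchase-order example.
VALID_PRODUCT_CODES = {"SKU-1001", "SKU-1002"}
VALID_STATUS_TRANSITIONS = {("placed", "shipped"), ("shipped", "delivered")}

def validate_order(order, previous_status):
    issues = []
    # Domain values: product codes must come from the approved catalogue.
    if order["product_code"] not in VALID_PRODUCT_CODES:
        issues.append("invalid product code")
    # Logical consistency: status changes must follow the allowed sequence.
    if previous_status and (previous_status, order["status"]) not in VALID_STATUS_TRANSITIONS:
        issues.append("contradictory order status")
    return issues

print(validate_order({"product_code": "SKU-9999", "status": "placed"}, None))
# -> ['invalid product code']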

4.1.1. Accuracy
Two closely related dimensions of data quality are accuracy and precision. While they
are often cited together, they serve distinct roles in data quality and should not be confused.
Accuracy is defined as “the degree to which data has attributes that correctly represent
the true value of the intended attribute of a concept or event in a specific context of use”,
while precision is defined as “the degree to which data has attributes that are exact or that
provide discrimination in a specific context of use” [21].
Accuracy is one of the most frequently cited data quality dimensions. It specifically relates
to how closely data values match the true value concerned. With this formal definition in
mind, all correctness [17,19,26,36,44,46,59,69,72,75,77,88,106,110,123,140], free of error [59,110,
125,137], and accuracy-oriented terminology were categorised under the accuracy dimension.
For instance, the term closely match a real state [17], i.e., a state depicting the true values of
the data being represented, logically aligns with accurately reflecting ground truth.
Additionally, out-of-range violations directly contradict defined accuracy bounds, sig-
nalling deviation from expectations rather than natural variation [100]. Such breaches can
significantly impact accuracy when judgements are based on preset expectations. For exam-
ple, a sensor fault could result in a temperature reading outside the expected range, leading
a home heating system’s controller to set the thermostat’s set point much lower or higher
than required. This would compromise both the comfort and heating efficiency of the home.
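A minimal sketch of such an out-of-range check, assuming hypothetical plausibility bounds for an indoor temperature sensor, could look as follows in Python.

# Assumed plausible bounds for an indoor temperature sensor, in degrees Celsius.
EXPECTED_RANGE = (-10.0, 45.0)

def is_plausible(reading):
    low, high = EXPECTED_RANGE
    return low <= reading <= high

# A faulty sensor value is flagged instead of being passed to the controller.
for reading in (21.5, 250.0):
    print(reading, "accepted" if is_plausible(reading) else "rejected: out of range")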

4.1.2. Completeness
Completeness is defined as “The degree to which subject data associated with an
entity has values for all expected attributes and related entity instances in a specific context
of use” [21]. Discussions on the scope of completeness highlighted aspects of heterogeneity,
such as diversity [82,88] and variety [23,28,58,60,62,75,87,108,110], which both capture the
idea of data spanning a broad scope of attributes and characteristics ensuring a compre-
hensive representation of the domain. Representativeness [67,102,132,135] goes beyond
simply filling gaps in the data and involves adequately capturing the full spectrum of
variability within a dataset.
Coverage [75,85,110,132] refers to whether the available data encompass information
from all the necessary elements required for a comprehensive overall representation of
the concept. For example, survey data could have responses from only a small subset of
people in a population. In that case, it has limited coverage to make conclusions about
behaviours and traits of the overall population. Sufficient coverage thus requires that
the data should capture information, metrics, and perspectives from key segments that
cumulatively contribute to meaningfully conveying the complete phenomenon.
Minimum data capture rates and coverage [75,85,110,132] across important sampling
dimensions such as time and geography are also directly tied to completeness as they
ensure the necessary representation. The core argument is that true completeness requires
not only addressing missing information, but also accurately portraying the entire scope
of the domain, including the anticipated diversity. For example, consider a dataset about
customer preferences for a product. Completeness would involve not only ensuring that
each customer record has all the necessary attributes filled in (e.g., age, gender, and location),
but also making sure that the dataset includes a representative sample of customers from
various demographics, regions, and preference categories. This way, the dataset can provide
a comprehensive and accurate picture of the customer landscape, enabling more reliable
insights and decision making.
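The Python sketch below illustrates, under assumed field names and segments, the two complementary completeness checks described here: attribute-level missingness and coverage of expected customer segments.

customers = [
    {"age": 34, "gender": "F", "location": "Glasgow"},
    {"age": None, "gender": "M", "location": "Bath"},
    {"age": 52, "gender": "F", "location": "Glasgow"},
]

# Attribute completeness: fraction of records with every expected field filled.
expected_fields = ("age", "gender", "location")
filled = sum(all(c[f] is not None for f in expected_fields) for c in customers)
print(f"record completeness: {filled / len(customers):.0%}")

# Coverage: which expected segments are actually represented in the sample.
expected_locations = {"Glasgow", "Bath", "London"}
print("uncovered segments:", expected_locations - {c["location"] for c in customers})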

4.1.3. Consistency
Consistency is defined as “the degree to which data has attributes that are free from
contradiction and are coherent with other data in a specific context of use. It can be either or
both among data regarding one entity and across similar data for comparable entities” [21].
Two terms containing “accuracy” were classified under the dimension of consistency.
These are thematic accuracy and syntactic accuracy [68]. Thematic accuracy refers to the
correct classification of entities. However, it aligns more with the application of standard-
ised rules for integration than with representing an absolute truth [68]. When datasets
are coded or classified for grouped analysis, the accuracy of individual data points may
be compromised to some extent. For example, encoding detailed satellite imagery into
classified map layers or land cover types for geospatial analysis may result in loss of accu-
racy for localised regions. A vegetation mapping system that categorises hyper-local flora
into broad biome categories like forests, grasslands, and deserts, among others, may lose
precision on individual plant species details. However, this approach enables aggregated
analytics about vegetation distribution. The benefits of unified coding frameworks for
cross-regional agricultural pattern insights outweigh the localised accuracy loss. In this
context, consistency in applying the classification rules holds more importance than the
accuracy of individual data points, and that is what is described by thematic accuracy.
Building on this idea of prioritising consistency, syntactic accuracy requires various
datasets to adhere to a common structural schema or standard format for interoperability,
such as XML (eXtensible Markup Language) or JSON (JavaScript Object Notation) [68].
These are widely used formats for structuring, exchanging, or representing data. This
adherence may not perfectly mirror the native representations within each source system,
meaning it might not match the original format in which the data were stored or created.
Some characteristics that make the data useful or meaningful may need to be compromised
to fit uniform syntactical mandates, or the rules about how the data should be structured
may need to be changed. However, adherence to the schema, the structure, and the
organisation of the data ensures reliable interchange through expected consistency. This
improves consolidation (the process of combining data from multiple sources) and migration
capabilities (the ability to move data between systems or formats), despite the potential
impacts on accuracy for fringe data elements, which are data elements not commonly used.
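As a hedged sketch of a syntactic-accuracy check, the Python fragment below tests whether a JSON payload conforms to a shared structural schema; the schema itself is a hypothetical example enforced with the standard library rather than a schema engine.

import json

SCHEMA = {"order_id": str, "quantity": int}  # field name -> required type (assumed)

def conforms(payload):
    try:
        record = json.loads(payload)
    except json.JSONDecodeError:
        return False  # not even well-formed JSON
    # Every schema field must be present with the mandated type.
    return all(isinstance(record.get(k), t) for k, t in SCHEMA.items())

print(conforms('{"order_id": "A-17", "quantity": 3}'))    # True
print(conforms('{"order_id": "A-17", "quantity": "3"}'))  # False: wrong type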
Just as thematic accuracy and syntactic accuracy prioritise consistency by generalising
observations into shared categorical frameworks, thematic conformity also focuses less on
accuracy than on enabling unified analysis through abstraction semantics. In the context of
digital systems, it can also be described as the degree to which a data type can be used
in its intended domain, with consistency of data types being a requirement. Abstraction
semantics refers to the process of simplifying complex data into more understandable
or manageable formats. Similarly, while syntactic accuracy matters for data interchange,
shared schema adherence provides consistency but does not inherently ensure the accuracy
or truthfulness of the data itself. Inherent truth correspondence refers to data accurately
representing the true value of the intended attribute of a concept or event. Mandating
compatible structural representations, even if not completely accurate, enables integrat-
ing data and deriving unified meaning across different contexts. Hence, both thematic
accuracy and syntactic accuracy qualify as consistency dimensions by prioritising coherent
interpretations under standardised constraints.
Though merely inefficient at first, true duplicates—that is, duplication [12,89,90], or the repli-
cation of identical data or attributes for the same entity—allow the representations of
the same entity to diverge over time, resulting in inconsistency. It is important to note that
repeat measurements of the same item, which may vary due to factors such as measurement
error or changes in the item over time, would not be considered true duplicates. As such,
duplication should be considered a potential issue related to consistency.
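A minimal Python sketch of detecting diverged true duplicates follows; the entity key and attribute names are assumptions for illustration.

from collections import defaultdict

records = [
    {"customer_id": "C1", "address": "12 High St"},
    {"customer_id": "C1", "address": "98 Low Rd"},  # diverged duplicate
    {"customer_id": "C2", "address": "5 Mill Ln"},
]

# Group representations by entity key; more than one distinct value for the
# same entity signals contradiction, i.e., an inconsistency issue.
by_entity = defaultdict(set)
for r in records:
    by_entity[r["customer_id"]].add(r["address"])

print([k for k, v in by_entity.items() if len(v) > 1])  # -> ['C1']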

4.1.4. Credibility
Credibility is defined as “the degree to which data has attributes that are regarded
as true and believable by users in a specific context. Credibility includes the concept of
authenticity (the truthfulness of origins, attributions, commitments)” [21].
Integrity [55] is primarily an ethical attribute that directly impacts perceived credibility,
as it implies researcher honesty and mitigates doubts from questionable practices that
would undermine believability. Similarly, the absence of bias provides more objective
evidence to support credibility, as impartiality aligns with factual accuracy.
Although semantically related, veracity [23,58,60,75,87,108,110,128] fundamentally
differs from credibility. It represents comprehensive quality assurance across datasets,
and it encompasses attributes of accuracy, completeness, and freedom from distortion.
When handling vast volumes of data, there are risks of missing components, inaccurate
elements, or an inability to provide meaningful insight if they are left unchecked.
Data veracity thus indicates the level of confidence placed on extensive information pools
through supplemental reliability checks and controls beyond limited samples. It constitutes an
applied practice of instilling trust by promoting centralised policies that enable effective data
governance and quality control to enhance the integrity across diverse datasets. In contrast,
credibility intrinsically centres on inherent conformity to facts. However, while they are
distinct concepts, veracity quantification can be considered a measure of the trustworthi-
ness [46,48,67,84,86,90,91,97,102,116,132] or credibility of a very large data provider.

4.1.5. Currentness
Currentness is defined as “the degree to which data has attributes that are of the
right age in a specific context of use” [21]. This definition aligns directly with the focus on
timeliness [2,14,15,17,28,31–33,36,37,42–44,46,49–51,53–57,59,61–63,65,68,69,73,74,80,85,87,
88,91–94,96,99,102,105,107,109,110,112,114,116,118,119,123–125,131–135,137,139]. The term
vitality [108] encapsulates the concept of maintaining relevance over time as opposed to
becoming obsolete or out of date. Volatility [17,59,133], on the other hand, signals time
sensitivity. Highly volatile data can quickly become outdated if not refreshed promptly,
thus necessitating timely maintenance to preserve its usability. By accounting for volatility
exposure, proactive planning for currentness can be achieved rather than resorting to
reactive measures to combat obsolescence.
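The volatility-aware planning described here can be sketched in Python as an age check whose threshold depends on how volatile the data are; the thresholds below are assumptions for illustration.

from datetime import datetime, timedelta, timezone

# Assumed maximum acceptable ages per volatility level.
MAX_AGE = {"high": timedelta(minutes=5), "low": timedelta(days=30)}

def is_current(last_updated, volatility):
    return datetime.now(timezone.utc) - last_updated <= MAX_AGE[volatility]

reading_time = datetime.now(timezone.utc) - timedelta(hours=2)
print(is_current(reading_time, "high"))  # False: already stale for volatile data
print(is_current(reading_time, "low"))   # True: still current for stable data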

4.2. Contextual Data Quality


The ISO 25012 standard outlines several dimensions. As described in Section 3, we
identified a need to further broaden the scope to encompass additional contextual con-
siderations. Consequently, we propose the addition of two new dimensions, governance
and usefulness, under the contextual dimensions. The rationale behind their introduction
stems from the absence of existing dimensions that adequately capture the associated terms
identified during our literature review.
Contextual data quality is an important concept that emphasises evaluating the quality
of data within the context it is used in. For example, dimensions like accessibility, compliance,
and precision should be considered when evaluating contextual data quality. Assessing
data along these types of contextual dimensions highlights the multifaceted nature of data
quality—whether data are “good” depends significantly on the context of its application.
To illustrate, let us consider the dimension of accessibility. In a healthcare setting, data
must be readily accessible to healthcare providers for timely decision making, but stringent
controls must be in place to prevent unauthorised access, highlighting the interplay between
accessibility and compliance.
Considering contextual dimensions in a comprehensive manner allows a more com-
plete assessment of overall fitness for use. Understanding contextual data quality, in turn,
leads to better data-driven decisions because it highlights how key quality issues relate to
the environment in which data are used. Ignoring context can undermine data value by
overlooking crucial contextual factors within specific applications; hence, evaluating these
factors is crucial for having the right data in the right format at the right time to fully derive
value from it.

4.2.1. Accessibility
The dimension of accessibility pertains to how readily data can be accessed and
obtained within a specific context. This dimension encompasses several related terms iden-
tified in the literature review, all of which focus on data access in the face of user constraints
dictated by the context. Discoverability (or findability) [107] and clear definition [54] are
closely tied to accessibility [6,17,29,33,46,49,54–56,59,61,68,70,73,78,84,98,102,107,110,122,
123,125,126,131–133,135–137]. They highlight the user’s ability to understand and locate
data within context-dependent settings.

4.2.2. Compliance
Compliance refers to the degree to which data align with externally relevant regula-
tions, rules, standards, and policies within a specific context. This dimension is centred
around fulfilling these governance requirements. Conformance [19,96,123,140] and con-
formity [12,36,44,129,133] are associated terms that denote the alignment of data with
applicable standards or conventions. Model conformance [142] specifically pertains to
adherence when dealing with structured or schematic data models.
While there is an overlap between compliance and confidentiality [44,98,122,136] in
the context of privacy preservation, compliance specifically necessitates adherence to
regulations and policies to restrict access and uphold confidentiality via suitable security
controls. Thus, it is more about enforcing confidentiality rather than the concept itself, which
is a component of compliance. Lastly, licensing [91,132] is about complying with any terms
related to data usage access rights.

4.2.3. Confidentiality
The confidentiality dimension is concerned with the extent to which data attributes
limit access to only authorised users. This dimension involves controls and policies that
restrict access permissions to legitimate, authenticated users.
As such, data protection [82] involves examining the safeguards actively used to
secure data in accordance with confidentiality constraints. While privacy pertains to the
appropriate access to any sensitive or personal information, security [6,33,51,59,75,91,93,
110,123,125,132,137] focuses on technical controls like encryption that prevent unauthorised
user access (even in the event of an external system compromise).
Sensitivity [46,110] refers to levels such as high, medium, or low that dictate the degree
of confidentiality required. Vulnerability [33,108], on the other hand, assesses potential
exposures that could compromise existing access controls if exploited by an unauthorised
party. Statistical disclosure control [54,70] is associated with the anonymisation of data to
minimise the risks of identifying individuals in a dataset.
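One common form of statistical disclosure control is a check in the spirit of k-anonymity: every combination of quasi-identifiers should be shared by at least k individuals before release. The Python sketch below flags groups below that threshold; the value of k and the chosen quasi-identifiers are assumptions for illustration.

from collections import Counter

K = 2  # assumed minimum group size before release
rows = [
    ("1980s", "Glasgow"),
    ("1980s", "Glasgow"),
    ("1990s", "Bath"),  # unique combination: a re-identification risk
]

counts = Counter(rows)
print("quasi-identifier groups below k:", [c for c, n in counts.items() if n < K])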

4.2.4. Efficiency
The efficiency dimension addresses whether data attributes allow for the achievement of
expected system performance levels and objectives through appropriate resource use. Terms
in this category span economic factors, computational efficiency, and resource optimisation.
Viscosity refers to the resistance that slows or inhibits the movement and transforma-
tion of data. When data are used across different sources or processing pipelines, friction
and barriers that reduce flow can introduce inefficiencies. High viscosity [108] implies
that data do not integrate or stream smoothly to where they need to go next. Essentially, it
describes how efficiently data flows to enable tasks to be completed.

4.2.5. Governance
The proposed dimension of governance encompasses organisational structures and
policies that guide data activities. The relationship between governance and compliance in
data management is complex and often overlaps, leading to potential confusion about their
distinct roles and objectives. Whilst both concepts are critical for effective data manage-
ment, it is essential to understand their differences and how they complement each other.
Governance focuses on the internal organisational structures, policies, and processes that
guide data activities within an enterprise. It establishes the internal rules, responsibilities,
and authorities for data management, ensuring that data are handled consistently and in
alignment with the goals and values of the organisation. On the other hand, compliance is
primarily concerned with adhering to external regulations, standards, and policies imposed
by regulatory bodies or authorities. It ensures that internal practices and procedures align
with external mandates and requirements, such as legal regulations, industry standards,
and contractual obligations. Although compliance often necessitates the existence of inter-
nal rules and policies, which are aspects of governance, it encompasses a broader range of
external requirements beyond enterprise-specific rules. Treating governance and compli-
ance as separate allows for a clearer definition of their respective scopes and emphasises the
importance of considering both internal governance and external compliance requirements
in data management. By understanding and addressing both governance and compliance,
organisations can establish a robust framework for managing data effectively, ensuring
internal consistency and alignment with external requirements.
Authority [75] and authorisation [49] are manifestations of governance policies that
designate permitted data actions across users. These concepts focus on the rules that deter-
mine who or what has been officially approved to access and interact with data in specific
ways. This is distinct from actual physical controls and system security that enable real-
time access, where authorisation is pre-approved based on roles, responsibilities, and data
sensitivity. For instance, an employee may be authorised to view sales records but lack the
authority to modify them; thus, the policy allows read but not write privileges.

Accountability [75] and management [38] directly implement governance principles
through coordination oversight. Alignment [105] falls under governance and assesses
behavioural consistency and adherence to centralised policies across an organisation. Au-
ditability [49,132] enables accountability monitoring through tracking, serving as a pillar
of functional governance. While there is overlap between the dimension of traceability and
the associated term of auditability, traceability refers to the ability to trace the change or
state of uniquely identified data points across time in a meaningful sequence.
Auditability specifically pertains to the processes and controls in place to store trans-
actional records of how data have been accessed and modified. For example, retaining
details like timestamps, data field changes, and the source user or program making edits
supports comprehensive audit trails. Implementing such auditability makes ongoing data
usage and alterations transparent. This allows for factual verification of compliance with
policies and procedural benchmarks, thereby upholding accountability for proper data
handling. Lastly, risk [85] assessment is directly tied to governance’s risk mitigation duties
as stewards of institutional data assets.
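A minimal Python sketch of the audit-trail records just described, with hypothetical field names, stores each modification with a timestamp, the acting user, and the field-level change.

from datetime import datetime, timezone

audit_log = []

def record_change(user, field, old, new):
    # Append-only entry supporting later verification of policy compliance.
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "field": field,
        "old": old,
        "new": new,
    })

record_change("jsmith", "order_status", "placed", "shipped")
print(audit_log[-1])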

4.2.6. Traceability
The traceability dimension pertains to the provision of audit trails, capturing data
access, and modifications. Although it is a distinct concept from auditability, traceability
aids in providing the necessary infrastructure that auditability uses. At its core, traceability
involves preserving attribution and documenting events to enable tracing lineage back to
original sources. Lineage [57] directly maps as a crucial component within the broader
scope of traceability based on its role in preserving and conveying chains of upstream data
provenance sources, pointing to origination events that led to current states.
Quality of methodology [70] and transparency [82] are essential for enabling others
to appraise the credibility of tracing procedures. Quality of methodology ensures that the
methods used are scientifically justified or internationally recognised. This includes using
approved quality procedures, regularly monitoring and enhancing data collection and
processing, applying suitable data correction techniques, validating statistical indicators,
and conducting audits. Transparency involves providing clear documentation and disclosure
of the steps, assumptions, data sources, and tools used.
Identifiability [54] mechanisms link activities directly to the actors involved by relying
on policies that avoid obfuscation and the violation of user rights. Similarly, verifiability, which refers
to the ability to confirm the accuracy and reliability of traceability information, is doubly
essential. It enables post hoc reconstruction following incidents, allowing for the examination
and validation of data traceability to understand what happened and why. Moreover, it
supports proactive conformity checks, confirming that policies are implemented as intended.
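Lineage, as described above, can be sketched as a chain of upstream pointers that is walked back to the original source; the dataset names below are hypothetical.

# Each derived dataset records its immediate upstream source (None = origin).
lineage = {
    "monthly_report": "cleaned_readings",
    "cleaned_readings": "raw_sensor_export",
    "raw_sensor_export": None,
}

def trace(dataset):
    chain = [dataset]
    while lineage.get(chain[-1]):
        chain.append(lineage[chain[-1]])
    return chain

print(" <- ".join(trace("monthly_report")))
# -> monthly_report <- cleaned_readings <- raw_sensor_export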

4.2.7. Precision
Precision refers to the exactness and the ability to discern minor variations between
data points across a distribution. In ISO 25012, it is referred to as discrimination, which
is described as the ability to distinguish subtle differences between phenomena rather
than making coarse generalisations [21]. Following the distinction between accuracy and
precision, associated terms describing uncertainty can be defined to directly shape data
discrimination, aligning with precision’s remit regarding the exactness of attributes.
Initially, both the presence of outliers [16,122,131] (values that significantly differ from
the rest of the dataset) and data values out of range [100] (values outside the defined range for
a dataset) were classified under the accuracy dimension. However, we later moved outliers
to the precision dimension. This is because outliers often represent real-world variation,
not necessarily inaccuracy. While it is acknowledged that measurement outliers caused by
errors do reduce truth representation, the argument remains that outliers generally relate to
distributions. Capturing them ensures that variability in real-life phenomena is represented.

4.2.8. Understandability
The understandability dimension emphasises interpretability [29,33,36,43,51,54,56,
59,61,70,74,89,93,110,125], which is the extent to which users can accurately derive meaning
from data attributes within specific contexts.
Visualisation [75,108] plays a crucial role in transforming data into consumable for-
mats, determining whether insights are accessible to target users without requiring expertise
in intermediate representation languages. Purposeful symbol and icon selection, along with
deliberate graphical arrangement, shape the conveyance of meaning, thereby improving
clarity and comprehension.
Characteristic series structure [95] examines how, when presenting a series of data
points (e.g., in a line graph or chart), the way the x-axis values are ordered and the spacing
between consecutive ordinal values can influence how well users can perceive patterns,
trends, or missing data within that series. This refers to assessing concepts like spacing
between points on a visual chart and whether data increments and scales progress evenly
as this impacts how understandable and continuous data trends appear to those interpreting
meaning from information flows. For instance, in a line graph tracking expenses over
time, having inconsistent gaps between temporal data points, or uneven jumps between
measurement values on the vertical scale axis (uneven ordinal spacing), would distort the
perception of the trend for viewers. Judgment becomes more challenging when formal
spacing between sequences or scalar comparisons is misaligned.
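A minimal sketch of such a spacing check is given below; it flags consecutive points whose gap deviates from the typical gap in the series (the data and tolerance are illustrative assumptions):

```python
import statistics

def uneven_gaps(x_values, tolerance=0.01):
    """Flag consecutive points whose spacing deviates from the typical
    (median) gap, which can distort the perceived trend of a series."""
    gaps = [b - a for a, b in zip(x_values, x_values[1:])]
    typical = statistics.median(gaps)
    return [(a, b) for a, b in zip(x_values, x_values[1:])
            if abs((b - a) - typical) > tolerance * typical]

# Monthly expense series with one missing month (hypothetical data):
months = [1, 2, 3, 5, 6]    # month 4 is absent -> uneven temporal gap
print(uneven_gaps(months))  # [(3, 5)]
```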
Information-to-noise ratio [75] measures the amount of useful, meaningful content
compared to irrelevant data. Higher ratios improve understandability by allowing users to
focus on informative patterns rather than misleading fluctuations. Semiotic [123] content
is directly related to understandability based on its focus on analysing how encoded signs
and structural formatting choices either enhance or inhibit accurate audience interpretation.
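One simple way to operationalise such a ratio, assuming a caller-supplied relevance criterion, is sketched below; real notions of information and noise are, of course, richer than a record count:

```python
def info_to_noise(records, is_informative):
    """Ratio of informative records to noise; higher ratios aid understanding."""
    useful = sum(1 for r in records if is_informative(r))
    noise = len(records) - useful
    return useful / noise if noise else float("inf")

# Hypothetical sensor log where empty or sentinel readings count as noise:
log = [21.3, None, 21.5, -999, 21.4, 21.6, None]
ratio = info_to_noise(log, lambda r: r is not None and r != -999)
print(f"information-to-noise ratio: {ratio:.2f}")  # 1.33
```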

4.2.9. Usefulness
The usefulness dimension represents the ability of data attributes to effectively enable
purpose-driven applications and analyses within relevant contexts. Fitness dimensions such
as fitness for use and fitness for purpose [105,118] were classified under this dimension
as they describe whether data align with functional requirements. Versatility [91,132]
involves the flexibility that enables sustained usefulness across varying constraints
and evolving requirements.
The relevance of interoperability [63,67,91,99,105,107,116,128,132,134,135,137,144] stems from the fact that combined data sources empower more capable analyses than isolated datasets alone; integrating across systems and content types unlocks expanded utility. Reusing data assets for distinct applications beyond their original purpose increases overall usefulness. However, as data assets are reused, the need for governance grows, since sensitive information must be protected and dependability upheld.
Cross-purpose use of data requires greater scrutiny to confirm that data validity, integrity, and value carry over into broader decision contexts, while security and privacy safeguards evolve appropriately. Conscientious data governance ensures that new uses align with original intent and reliability standards. Therefore, reusability [107] is considered
an associated term of usefulness. However, it is important to note that the concept of
reusability in this context differs from that of its namesake provided by the FAIR principles.
Uniqueness [37,44,50,74,81,83,96,110,124,132,133,146] directly enables usefulness by
revealing previously inaccessible insights through rare and novel data. Assets with one-of-a-kind properties intrinsically expand analytical scope into opportunities others cannot explore. However, highly unique data may form smaller, narrower datasets, which can limit the potential scope of applications if analytics require a large amount of data. There is also the concern, of course, that such data could be spurious. As such, uniqueness carries an inherent trade-off between novelty and constraints on wide applicability when data subsets become too sparse.
Expandability [110] directly relates to usefulness because data assets that enable
additional capacities through potential expansion increase overall utility. Planning for
scalable data growth allows meeting evolving objectives over time.
In applications such as machine learning, if datasets are not large enough, “real data” can be used to generate meaningful “fake” data, commonly known as synthetic data. Artificially or synthetically generated data can increase usefulness by enabling scenario modelling, augmentation for underrepresented domains, and privacy-preserving analysis while retaining informative patterns.
However, unlike raw captures of reality, artificial constructs represent simulated approximations of true underlying dynamics. Without safeguards ensuring fidelity and accurate representation, the use of synthetic data can, in some applications, be misleading and result in ill-informed decisions. Thus, artificial data are useful in situations where representative real data prove inadequate, provided that evaluative rigour ensures accuracy.
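As a deliberately crude sketch, synthetic values can be drawn from a simple distribution fitted to a small real sample; production-grade generators are far more sophisticated and, as noted above, must be validated for fidelity before informing decisions:

```python
import random
import statistics

def synthesise(real_values, n):
    """Draw synthetic values from a normal fit of the real data; a crude
    stand-in for proper synthetic-data generation techniques."""
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    return [random.gauss(mu, sigma) for _ in range(n)]

real = [4.9, 5.1, 5.0, 5.2, 4.8, 5.3]   # small "real" sample (hypothetical)
fake = synthesise(real, n=100)          # augmented set for modelling
print(statistics.mean(fake), statistics.stdev(fake))
```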

4.3. System-Dependent Data Quality
System-dependent data quality refers to the aspects of data quality that are influenced
by and specific to the technological or organisational system that collects, stores, manages,
and provides access to the data. As formally defined in the ISO 25012 standard, system-
dependency includes availability, recoverability, and portability—dimensions that can vary
substantially based on capabilities and constraints of interconnected data sources, pipelines,
and storage options. Two additional dimensions are proposed: quantity and semantics.
The complex configuration of different technologies in modern data environments necessitates closer scrutiny and careful governance of these system-dependent data quality factors. Such scrutiny and governance are critical not just for reliable analytics, but also for operational stability, risk minimisation, and continuous data quality improvement across interconnected systems, particularly in big data and IoT settings.
Consider an organisation with various databases across different regions. When attempting to integrate these databases, the system-dependent nature of data quality becomes apparent.
Data availability may be high within each local system, but it may be limited when trying
to access it from a different region due to network constraints or data sovereignty laws.
Addressing system-dependent data quality requires a comprehensive understand-
ing of the system landscape. This includes establishing clear data definitions (to ensure
consistent semantics), implementing reliable backup and recovery processes (for data
recoverability), and designing flexible data architectures (to support data portability).

4.3.1. Availability
The fundamental notion captured by the availability dimension is whether users
can access the data they need when they require them. This means that the data should
be readily obtainable and usable by authorised individuals or systems whenever it is
necessary for their specific purpose. The terms assigned under this dimension, including
access security [29,33,56,93,135], adequacy [122], attainability [68], obtainability [54],
usability [28,49,75,79,84,85,110,114,129,131,132,141], and visibility [132], all impact the
degree to which data meet these criteria of availability. This category includes terms that
address both the user’s ability to find, access, and retrieve the data, as well as the adequacy
of the data available to a given user.

4.3.2. Portability
The key requirement captured by the portability dimension is in preserving the utility
and meaning of data when moving across storage, software, and hardware environments.
Portability directly denotes the ease of data transition across different systems and en-
vironments without any loss of quality or integrity. It ensures data can be seamlessly
moved between multiple platforms and applications whilst retaining its original meaning.
Portability enables data to be used and reused across multiple contexts without the need
for extensive transformations or adaptations, saving time and effort. It also minimises the
risk of data corruption, inconsistencies, or loss during the transition process. By prioritising
portability, organisations can ensure that their data remain accessible, usable, and valuable
regardless of the specific technology stack or infrastructure in use.
Building on these ideas, the associated terms mobility, controllability, and use of storage further refine how data portability can be effectively managed and optimised across different technical frameworks, enhancing the overall utility and integrity of data during transfers. Mobility [44] refers to how easily whole datasets can be ported across these frameworks. Controllability [44] corresponds to standardised mechanisms governing validation, transport, and backup to enable error-free porting. Lastly, use of storage [54] aligns with leveraging portable storage formats, protocols, and abstraction layers that prevent system dependence and degradation risks.
In this context, abstraction layers refer to functional tiers that hide complex implemen-
tation details behind simplified interfaces. Each layer provides services to the layer above it
whilst using capabilities from the layer below. This enables modular design by decoupling
high-level business needs from low-level technical realisations. Abstraction creates porta-
bility across different systems through well-defined application programming interfaces
(APIs)—mechanisms that enable multiple software components to communicate—and
protocols between abstracted layers instead of actual implementations. It enables interoper-
ability, ease of modification, and reuse across diverse deployment environments. In essence,
appropriately layered abstractions minimise external dependencies whilst revealing only
essential functions.
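A minimal sketch of such an abstraction layer is shown below; callers depend only on the interface, so storage backends can be swapped without touching application code (the interface and class names are hypothetical):

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Abstraction layer: callers depend only on this interface, so data
    can be ported across backends without changing application code."""
    @abstractmethod
    def put(self, key: str, value: str) -> None: ...
    @abstractmethod
    def get(self, key: str) -> str: ...

class InMemoryStorage(Storage):
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data[key]

# Application code is written against Storage, not a concrete backend;
# swapping in, say, a file- or cloud-backed implementation is a local change.
store: Storage = InMemoryStorage()
store.put("patient-17/blood_type", "A+")
print(store.get("patient-17/blood_type"))
```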

4.3.3. Quantity
The quantity dimension assesses whether the amount and coverage of available data are sufficient to completely and accurately capture information for its intended application. It
also considers if existing data assets have enough detail, breadth, and granularity in terms
of volumes and varieties to support downstream applications and decisions. Amount of
data [33,93], data volume [80,110,135], and volume [23,28,31,33,57,58,60,62,75,87,108,110]
directly quantify the absolute or proportional magnitude of data.
Scalability [75] examines the ability to expand or down-sample data quantities to
meet application requirements. Sufficiency [43], suitable amount [51], and appropri-
ate amount [29,33,56,59,88,105,110,137] evaluate if quantities meet adequacy thresholds
for intended analytic tasks and decisions. Compactness [75] evaluates storage optimisa-
tion through compression and minimising redundancy to retain necessary details while
maximising efficiency. Quantifying this compactness trade-off enables retaining the most
relevant information and reducing data volumes enough to enable efficient large-scale
processing and storage. Having appropriate data volumes is crucial for machine learning
and big data analytics to uncover insights.
Thus, these associated terms assess if different quantitative needs around comprehen-
siveness, adequacy, scalability, and storage efficiency are fulfilled so that data quantities
can provide a satisfactory informational picture.
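A threshold-style adequacy check of this kind might be sketched as follows; the minimum row count and required fields are application-dependent assumptions:

```python
def quantity_report(records, min_rows, required_fields):
    """Check data volume against a task-specific threshold and report
    per-field coverage; both criteria are application-dependent."""
    coverage = {f: sum(1 for r in records if r.get(f) is not None) / len(records)
                for f in required_fields}
    return {"rows": len(records),
            "volume_ok": len(records) >= min_rows,
            "coverage": coverage}

data = [{"age": 34, "region": "north"}, {"age": 51, "region": None}]
print(quantity_report(data, min_rows=1000, required_fields=["age", "region"]))
# {'rows': 2, 'volume_ok': False, 'coverage': {'age': 1.0, 'region': 0.5}}
```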

4.3.4. Recoverability
The recoverability dimension requires attributes that enable data to withstand dis-
ruptive events and restore original fidelity, operability, and utility. The terms backup [123],
decay [85], and recoverability [44,98,122,136,139] directly bolster these necessities. Backup
refers to maintaining redundant, secondary data copies using archival and snapshot tech-
niques for reinstating compromised data after outages. Decay corresponds to safeguarding information integrity over long retention cycles against deterioration or distortion through ageing. Recoverability denotes mechanisms that rapidly invoke fail-safe
points, redeploy historical instances, and repair corrupted elements via backups to resume
services with minimal data loss.
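A minimal backup-and-restore sketch is given below; real recoverability mechanisms add scheduling, integrity checks, and redundancy across locations:

```python
import json
from pathlib import Path

def snapshot(data: dict, backup_dir: Path, name: str) -> Path:
    """Write a redundant secondary copy for later reinstatement."""
    backup_dir.mkdir(exist_ok=True)
    path = backup_dir / f"{name}.json"
    path.write_text(json.dumps(data))
    return path

def restore(path: Path) -> dict:
    """Reinstate data from the latest fail-safe point after corruption."""
    return json.loads(path.read_text())

live = {"order-1042": "shipped"}
backup = snapshot(live, Path("backups"), "orders-snapshot")
live.clear()            # simulate a disruptive event
live = restore(backup)  # resume with minimal data loss
print(live)             # {'order-1042': 'shipped'}
```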

4.3.5. Semantics
The semantics quality dimension requires data to have enough contextual attributes
to convey interpretable meaning to users or applications within specific contexts. Contex-
tual attributes are additional pieces of information that provide background, descriptive
details, or related characteristics about the main data points. These attributes help to clar-
ify the meaning, significance, and implications of the data within a particular setting or
use case. Semantic accuracy [68,91,132] ensures information objectively represents real-
world entities, properties, and relationships without distortion or ambiguity. Semantic
consistency [54,71] requires “persistent”, “unified”, and “coherent” meaning per stan-
dard definitions despite usage and modifications over time. Thus, semantic accuracy and
consistency quantify preservation and stability of meaning over data modification cycles.
For example, consider an electronic health record (EHR) system used in a hospital.
Semantic accuracy ensures that each patient’s demographic information, medical history,
diagnoses, and treatment plans are accurately recorded and reflect their real-world health
status without any errors or misinterpretations. If the blood type of a patient is recorded as “A+” in one section of the EHR but as “A−” in another, this reflects a lack of semantic accuracy and could lead to serious medical errors.
Building on the same example, semantic consistency in this context ensures that the meaning and interpretation of the patient data remain the same across different healthcare providers, applications, and time periods. If the “diagnosis” field in the EHR initially represents the primary diagnosis but later includes secondary diagnoses without clear labelling,
it could lead to confusion and inconsistencies in understanding the health condition of
the patient. Maintaining semantic consistency ensures that all healthcare providers can
correctly interpret and use the patient data for effective treatment and decision making.
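A simple consistency check of this kind is sketched below; it flags entities whose supposedly stable attribute takes conflicting values across records (the EHR fields are illustrative):

```python
from collections import defaultdict

def semantic_conflicts(records, key_field, value_field):
    """Flag entities whose supposedly stable attribute takes conflicting
    values across sections or sources (e.g., a patient's blood type)."""
    seen = defaultdict(set)
    for record in records:
        seen[record[key_field]].add(record[value_field])
    return {k: v for k, v in seen.items() if len(v) > 1}

ehr_sections = [
    {"patient": "P-17", "blood_type": "A+"},  # admissions record
    {"patient": "P-17", "blood_type": "A-"},  # lab record -> conflict
    {"patient": "P-23", "blood_type": "O+"},
]
print(semantic_conflicts(ehr_sections, "patient", "blood_type"))
# {'P-17': {'A+', 'A-'}} (set ordering may vary)
```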
Syntax and syntactic validity govern the structuring of information elements and the validation of schematic rules to enable processing. Language [54] denotes representing information
per conventions and vocabularies suited for sharing understanding across different user
groups. Interlinking [91,126,132] establishes explicit linkages across data sources and
elements to enrich insights from semantic connections. By governing faithful representation
and relation integrity, these associated terms ensure data generates intended meaning and
responds reliably during analytic processing.
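As a minimal sketch, syntactic validity can be checked by matching values against a schematic rule, here an assumed ISO-8601-style date pattern:

```python
import re

# Syntactic validity: values must match the schema's structural rules
# before any semantic processing; the date pattern is an assumption.
DATE_RULE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def syntactically_valid(value: str) -> bool:
    """Validate a date field against a schematic formatting rule."""
    return bool(DATE_RULE.match(value))

print(syntactically_valid("2024-06-01"))  # True
print(syntactically_valid("01/06/2024"))  # False: breaks the rule
```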
While the argument could be made that most of the terms the authors have consid-
ered as part of the semantics data quality dimension could be aggregated into existing
dimensions (such as semantic accuracy and semantic consistency under accuracy and
consistency, respectively), the fact is that semantic-driven technologies (such as ontologies
and knowledge graphs) are becoming increasingly relevant and bring a very specific set of
requirements and regulations, which merit the addition of semantics as a standalone data
quality dimension.
For example, consider a knowledge graph used in a biomedical research platform.
The knowledge graph integrates data from various sources, such as scientific literature,
clinical trials, and patient records, to enable researchers to discover new insights and
relationships between diseases, drugs, and genes. In this context, the semantics dimension
becomes crucial as it ensures that the data are not only accurate and consistent, but also
semantically rich and meaningful.
In this scenario, a well-defined ontology captures the complex relationships between biomedical entities. By leveraging the ontology, the knowledge graph can accu-
rately represent the relationships between a particular gene and its associated diseases,
enabling researchers to make informed hypotheses and decisions. The semantic richness
provided by the ontology goes beyond simple accuracy and consistency as it allows for the
inference of new knowledge and the identification of previously unknown connections.
On the other hand, a knowledge graph that suffers from semantic inconsistencies and
inaccuracies can lead to false conclusions and misdirected research efforts. For instance,
if the knowledge graph incorrectly associates a gene with a disease due to conflicting or
outdated information from different sources, researchers may draw erroneous conclusions
based on this inaccurate semantic representation. Similarly, if the relationships between
entities are inconsistently represented across different parts of the knowledge graph, it
could lead to confusion and hinder the ability to draw meaningful insights. In these cases,
the lack of semantic accuracy and consistency undermines the reliability and usefulness of
the knowledge graph, even if the data satisfy other quality dimensions.

5. Discussion
The comparative analysis of data quality terminology across various domains high-
lights the critical need for a standardised framework that can facilitate effective cross-
domain communication and collaboration. When the quality of data can be assessed across its diverse dimensions via a standardised framework, communication between stakeholders becomes simpler and more efficient. In addition, the use of a
standardised framework also holds the potential to streamline regulation-heavy processes
as the framework can be used to justify the use of specific data management practices. This
perspective on a generalisable and standardised framework for data quality assessment
not only showcases the relevance of this work, but also positions its findings within the
broader context of ongoing challenges in data quality assessment.
As such, the contributions of this work are significant in advancing the understanding
of data quality dimensions. By integrating traditional data quality dimensions from estab-
lished frameworks, such as ISO 25012 [21], with newly proposed dimensions in the form of
governance, usefulness, quantity, and semantics, we provide a more comprehensive per-
spective on data quality. This approach not only aligns with the existing literature by taking into account specific frameworks [28,34,45,50,104,108,124,126,133,147,148] and mapping the dimensions used to a common terminology, but it also introduces nuanced distinctions
that are increasingly relevant for data practitioners. For instance, the addition of governance
as a dimension does create some overlap with the existing dimension of compliance [8,21,41];
however, the differentiation between the two highlights the multifaceted nature of data man-
agement, where internal policies and external regulations must be navigated effectively and
are represented by these two dimensions, respectively [47,50,72,105,110]. Another example
comes from the relationship between semantics and understandability as data quality dimensions. While these two dimensions also share an overlap, the distinction is relevant given the rise in the use of semantic-based technologies, which have very specific data suitability requirements [71,79,149]. These insights contribute to
the ongoing discourse in data quality research, offering a framework that can adapt to the
evolving needs of various domains while retaining a common language basis.
From a practical point of view, this work gives stakeholders, such as data management professionals, policymakers, and business leaders, the benefits of a structured classification of data quality terminology. The model proposed in this work
can be consulted at the beginning of projects to construct and agree on the terminology
to use when communicating data quality for data management. By adopting a common
language framework for data quality, organisations can improve their data management
practices and enhance decision-making processes when performing cross-organisational
and cross-domain projects. This work serves as a foundational step towards establishing a
standardised set of data quality metrics and assessments that can be applied universally,
leading to more effective data management strategies.

6. Conclusions
In this paper, we surveyed recent literature focusing on data quality and found a severe lack of standardised terminology in the field. This led us to explore the aggregation of disparate data quality terms under common umbrella terms, with an initial focus on the ISO 25012 standard, the data quality model of the “Software Product Quality Requirements and Evaluation (SQuaRE)” series. The choice of this standard as the starting point for this work was deliberate, given that data science, as an area of research and application, shares many common elements with software development.
These umbrella terms for describing dimensions of data quality can be classified as
inherent, contextual, and system dependent. While the framework proposed in ISO 25012
does a commendable job at capturing these dimensions, we discovered that many of the
terms prevalent in recent literature did not fit into any of the umbrella terms described
in the standard. To address this gap, we proposed the addition of four additional data
quality dimensions: governance, usefulness, quantity, and semantics. The first two are con-
textual dimensions, while the latter two are system dependent. The addition of these four
dimensions to those already established by the ISO 25012 standard increases complexity,
but also enhances specificity, enabling end users to fully understand the aspect of data
quality captured by each dimension.
This creates a consistent representation of the multifaceted aspects of data quality and
enables the design of the Data Quality Data Model, which can serve as a common and
generalisable framework for assessing data quality, irrespective of the intended application.
In conclusion, our research underscores the need for a more comprehensive, adaptable,
and sector-sensitive approach to data quality assessment, aiming to facilitate collaboration
and communication of data quality terminology and assessment across different domains.

Future Directions
The classification of data quality dimensions performed to develop the data quality
framework proposed by this work has a taxonomical hierarchy as the basis for its structure.
This structure limits the assignment of data quality terms to more than one parent dimension,
preventing the parent dimensions from overlapping each other. As such, the classification
exercise has the literature research outlined in Section 2.1 as its basis, and, while this process
was comprehensive, it was by no means exhaustive due to the abundance of the available
literature in the field.
Moving forwards, the next step in this research will be to operationalise the Data Quality Data Model proposed in this paper. This is planned through the development of a data quality ontology, in which the overlap between classes and the relationships between the data quality dimensions can be detailed further. It is our expectation that this can then serve as a platform for data practitioners across diverse domains to consult, as well as to submit their own findings and lessons learned related to data quality.
This has the potential to support the long-term aim of developing a mechanism that
can quantify data quality according to each dimension. This will also involve creating
a robust metric or set of metrics that can accurately measure the quality of data in each
dimension. Another key area of future work will be the introduction of a weighting factor.
This factor will be adjustable depending on industry needs, allowing for a more tailored
approach to data quality assessment. This will ensure that the model remains flexible and
adaptable to various industry contexts.

Author Contributions: Conceptualisation: J.G. and P.D.; methodology: J.G., R.M. and H.W.; software:
R.M. and H.W.; validation: J.G. and P.D.; formal analysis: J.G., R.M. and H.W.; investigation: J.G., R.M.,
H.W., D.W. and M.C.; resources: J.G. and P.D.; data curation: J.G., R.M. and H.W.; writing—original
draft preparation: R.M. and H.W.; writing—review and editing: M.C., D.W. and J.G.; visualisation:
R.M. and H.W.; supervision: J.G.; project administration: P.D.; funding acquisition: P.D. All authors
have read and agreed to the published version of the manuscript.
Funding: This work was funded by the UK Government Department for Science, Innovation and
Technology through the UK’s National Measurement System.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The original contributions presented in this study are included in the
article; further enquiries can be directed to the corresponding authors.
Acknowledgments: Thanks are given to Moulham Alsuleman and Louise Wright for providing feedback on the manuscript.
Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Liu, C.; Peng, G.; Kong, Y.; Li, S.; Chen, S. Data Quality Affecting Big Data Analytics in Smart Factories: Research Themes, Issues
and Methods. Symmetry 2021, 13, 1440. [CrossRef]
2. Günther, L.C.; Colangelo, E.; Wiendahl, H.H.; Bauer, C. Data quality assessment for improved decision-making: A methodology
for small and medium-sized enterprises. Procedia Manuf. 2019, 29, 583–591. [CrossRef]
3. Fenza, G.; Gallo, M.; Loia, V.; Orciuoli, F.; Herrera-Viedma, E. Data set quality in machine learning: Consistency measure based
on group decision making. Appl. Soft Comput. 2021, 106, 107366. [CrossRef]
4. Ferencek, A.; Borstnar, M. Data quality assessment in product failure prediction models. J. Decis. Syst. 2020, 29, 79–86. [CrossRef]
5. Durá, M.; Leal, F.; Sánchez-García, Á.; Sáez, C.; García-Gómez, J.M.; Chis, A.E.; González-Vélez, H. Blockchain for data originality
in pharma manufacturing. J. Pharm. Innov. 2023, 18, 1745–1763. [CrossRef]
6. Alosert, H.; Savery, J.; Rheaume, J.; Cheeks, M.; Turner, R.; Spencer, C.; Farid, S.S.; Goldrick, S. Data integrity within the
biopharmaceutical sector in the era of Industry 4.0. Biotechnol. J. 2022, 17, 2100609. [CrossRef]
7. Wang, Z.; He, D.; Hou, Y. Data-Driven Adaptive Quality Control Under Uncertain Conditions for a Cyber-Pharmaceutical-
Development System. IEEE Trans. Ind. Inform. 2021, 17, 3165–3175. [CrossRef]
8. Kavasidis, I.; Lallas, E.; Leligkou, H.C.; Oikonomidis, G.; Karydas, D.; Gerogiannis, V.C.; Karageorgos, A. Deep Transformers
for Computing and Predicting ALCOA+ Data Integrity Compliance in the Pharmaceutical Industry. Appl. Sci. 2023, 13, 7616.
[CrossRef]
9. Arden, N.S.; Fisher, A.C.; Tyner, K.; Lawrence, X.Y.; Lee, S.L.; Kopcha, M. Industry 4.0 for pharmaceutical manufacturing:
Preparing for the smart factories of the future. Int. J. Pharm. 2021, 602, 120554. [CrossRef] [PubMed]
10. Hock, S.C.; Tay, V.; Sachdeva, V.; Wah, C.L. Pharmaceutical Data Integrity: Issues, challenges and proposed solutions for
manufacturers and inspectors. Generics Biosimilars Initiat. J. 2020, 9, 171–183. [CrossRef]
11. Boukouvala, F.; Muzzio, F.J.; Ierapetritou, M.G. Predictive modeling of pharmaceutical processes with missing and noisy data.
AIChE J. 2010, 56, 2860–2872. [CrossRef]
12. Hart, R.; Kuo, M. Better Data Quality for Better Healthcare Research Results—A Case Study. Stud. Health Technol. Inform. 2017,
234, 161–166. [PubMed]
13. Liu, C.; Talaei-Khoei, A.; Zowghi, D.; Daniel, J. Data Completeness in Healthcare: A Literature Survey. Pac. Asia J. Assoc. Inf. Syst.
2017, 9, 75–100. [CrossRef]
14. Hickey, D.; Connor, R.; McCormack, P.; Kearney, P.; Rosti, R.; Brennan, R. The Data Quality Index: Improving Data Quality in
Irish Healthcare Records. In Proceedings of the 24th International Conference Enterprise Information Systems (ICEIS ’21), Virtual
Event, 25–27 April 2021; pp. 625–636.
15. Kong, X. Evaluation of Flight Test Data Quality Based on Rough Set Theory. In Proceedings of the 2020 13th International Congress
on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Chengdu, China, 17–19 October 2020;
pp. 1053–1057.
16. Valverde, C.; Marotta, A.; Panach, J.; Vallespir, D. Towards a model and methodology for evaluating data quality in software
engineering experiments. Inf. Softw. Technol. 2022, 151, 107029. [CrossRef]
17. Zulkiffli, P.; Akshir, E.; Azis, N.; Cox, K. The development of data quality metrics using thematic analysis. Int. J. Innov. Technol.
Explor. Eng. 2019, 8, 304–310.
18. Uddin, M.F.; Gupta, N. Seven V’s of Big Data understanding Big Data to extract value. In Proceedings of the 2014 Zone 1 Conference
of the American Society for Engineering Education, Bridgeport, CT, USA, 3–5 April 2014; pp. 1–5.
19. Iturry, M.; Alves-Souza, S.; Ito, M. Data Quality in health records: A literature review. In Proceedings of the 2021 16th Iberian
Conference on Information Systems and Technologies (CISTI), Chaves, Portugal, 23–26 June 2021.
20. Burkhardt, A.; Berryman, S.; Brio, A.; Ferkau, S.; Hubner, G.; Lynch, K.; Mittman, S.; Sonderer, K. Measuring Manufacturing Test Data
Analysis Quality. In Proceedings of the 2018 IEEE AUTOTESTCON, National Harbor, MD, USA, 17–20 September 2018; pp. 359–364.
21. ISO/IEC 25012:2008; Software Engineering—Software Product Quality Requirements and Evaluation (SQuaRE)—Data Quality
Model. Technical Report. International Organization for Standardization: Geneva, Switzerland, 2008.
22. Chen, H.; Hailey, D.; Wang, N.; Yu, P. A Review of Data Quality Assessment Methods for Public Health Information Systems. Int.
J. Environ. Res. Public Health 2014, 11, 5170–5207. [CrossRef] [PubMed]
23. Liu, J.; Li, J.; Li, W.; Wu, J. Rethinking big data: A review on the data quality and usage issues. ISPRS J. Photogramm. Remote Sens.
2016, 115, 134–142. [CrossRef]
24. Ekegren, C.; Gabbe, B.; Finch, C. Sports Injury Surveillance Systems: A Review of Methods and Data Quality. Sport. Med. 2016,
46, 49–65. [CrossRef]
25. Abdullah, M.; Arshah, R. A Review of Data Quality Assessment: Data Quality Dimensions from User’s Perspective. Adv. Sci.
Lett. 2018, 24, 7824–7829. [CrossRef]
26. Stausberg, J.; Nasseh, D.; Nonnemacher, M. Measuring Data Quality: A Review of the Literature between 2005 and 2013. Stud.
Health Technol. Inform. 2015, 210, 712–716.
27. Wang, X.; Williams, C.; Liu, Z.; Croghan, J. Big data management challenges in health research—A literature review. Briefings
Bioinform. 2019, 20, 156–167. [CrossRef] [PubMed]
28. Ijab, M.T.; Surin, E.S.M.; Nayan, N.M. Conceptualizing big data quality framework from a systematic literature review perspective.
Malays. J. Comput. Sci. 2019, 25–37. [CrossRef]
29. Liu, G. Data quality problems troubling business and financial researchers: A literature review and synthetic analysis. J. Bus.
Financ. Librariansh. 2020, 25, 315–371. [CrossRef]
30. Teh, H.; Kempa-Liehr, A.; Wang, K. Sensor data quality: A systematic review. J. Big Data 2020, 7, 11. [CrossRef]
31. Salih, F.; Ismail, S.; Hamed, M.; Yusop, O.; Azmi, A.; Azmi, N. Data Quality Issues in Big Data: A Review. In Recent Trends in Data
Science and Soft Computing. IRICT 2018; Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2019;
Volume 843, pp. 105–116.
32. Ibrahim, A.; Mohamed, I.; Satar, N. Factors Influencing Master Data Quality: A Systematic Review. Int. J. Adv. Comput. Sci. Appl.
2021, 12, 181–192. [CrossRef]
33. Mansouri, T.; Moghadam, M.; Monshizadeh, F.; Zareravasan, A. IoT Data Quality Issues and Potential Solutions: A Literature
Review. Comput. J. 2023, 66, 615–625. [CrossRef]
34. Engsig-Karup, T.; Doupi, P.; Makinen, M.; Launa, R.; Estupinan-Romero, F.; Bernal-Delgado, E.; Kristiansen, N. Review of data
quality assessment frameworks experiences around Europe. Eur. J. Public Health 2022, 32, ii202–ii203. [CrossRef]
35. Ozonze, O.; Scott, P.; Hopgood, A. Automating Electronic Health Record Data Quality Assessment. J. Med Syst. 2023, 47, 23.
[CrossRef]
36. Mashoufi, M.; Ayatollahi, H.; Khorasani-Zavareh, D.; Boni, T. Data Quality in Health Care: Main Concepts and Assessment
Methodologies. Methods Inf. Med. 2023, 62, 5–18. [CrossRef] [PubMed]
37. Morewood, J. Building energy performance monitoring through the lens of data quality: A review. Energy Build. 2023, 279, 112701.
[CrossRef]
38. Pradhan, S.; Heyn, H.; Knauss, E. Identifying and managing data quality requirements: A design science study in the field of
automated driving. Softw. Qual. J. 2023, 32, 313–360. [CrossRef]
39. Zhang, L.; Jeong, D.; Lee, S. Data Quality Management in the Internet of Things. Sensors 2021, 21, 5834. [CrossRef]
40. Firmani, D.; Mecella, M.; Scannapieco, M.; Batini, C. On the Meaningfulness of Big Data Quality (Invited Paper). Data Sci. Eng.
2016, 1, 6–20. [CrossRef]
41. Durá, M.; Sánchez-García, Á.; Sáez, C.; Leal, F.; Chis, A.E.; González-Vélez, H.; García-Gómez, J.M. Towards a computational
approach for the assessment of compliance of ALCOA+ Principles in pharma industry. Stud. Health Technol. Inform. 2022, 294, 755–759.
42. Jaya, I.; Sidi, F.; Ishak, I.; Affendey, L.; Jabar, M.A. A review of data quality research in achieving high data quality within
organization. J. Theor. Appl. Inf. Technol. 2017, 95, 2647–2657.
43. Wand, Y.; Wang, R.Y. Anchoring Data Quality Dimensions in Ontological Foundations. Commun. ACM 1996, 39, 86–95. [CrossRef]
44. Efimova, O.V.; Igolnikov, B.V.; Isakov, M.P.; Dmitrieva, E.I. Data Quality and Standardization for Effective Use of Digital Platforms.
In Proceedings of the 2021 International Conference on Quality Management, Transport and Information Security, Information
Technologies (IT&QM&IS), Yaroslavl, Russia, 6–10 September 2021; pp. 282–285.
45. Arts, D.G.; De Keizer, N.F.; Scheffer, G.J. Defining and improving data quality in medical registries: A literature review, case
study, and generic framework. J. Am. Med Inform. Assoc. 2002, 9, 600–611. [CrossRef]
46. Weiskopf, N.G.; Weng, C. Methods and dimensions of electronic health record data quality assessment: Enabling reuse for clinical
research. J. Am. Med Inform. Assoc. 2013, 20, 144–151. [CrossRef] [PubMed]
47. Tabersky, D.; Woelfle, M.; Ruess, J.A.; Brem, S.; Brombacher, S. Recent regulatory trends in pharmaceutical manufacturing and
their impact on the industry. Chimia 2018, 72, 146–150. [CrossRef]
48. Leal, F.; Chis, A.E.; Caton, S.; González-Vélez, H.; García-Gómez, J.M.; Durá, M.; Sánchez-García, A.; Sáez, C.; Karageorgos, A.;
Gerogiannis, V.C.; et al. Smart pharmaceutical manufacturing: Ensuring end-to-end traceability and data integrity in medicine
production. Big Data Res. 2021, 24, 100172. [CrossRef]
49. Cai, L.; Zhu, Y. The challenges of data quality and data quality assessment in the big data era. Data Sci. J. 2015, 14, 2. [CrossRef]
50. Government Data Quality Hub. The Government Data Quality Framework; Technical Report; Government Digital Service: London, UK, 2020.
51. Botha, M.; Botha, A.; Herselman, M. Compiling a Prioritized List of Health Data Quality Challenges in Public Healthcare Systems.
In Proceedings of the IST-Africa 2014 Conference Proceedings, Pointe aux Piments, Mauritius, 7–9 May 2014.
52. Heinrich, B.; Klier, M. Metric-based data quality assessment—Developing and evaluating a probability-based currency metric.
Decis. Support Syst. 2015, 72, 82–96. [CrossRef]
53. Cappiello, C.; Pernici, B.; Villani, L. Strategies for Data Quality Monitoring in Business Processes. In Web Information Systems
Engineering. WISE 2014; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9051, pp. 226–238.
54. Jesilevska, S. Data quality aspects in latvian innovation system. In Proceedings of the New Challenges of Economic and Business
Development, Riga, Latvia, 12–14 May 2016; pp. 307–320.
55. Ortega-Ruiz, L.; Caro, A.; Rodriguez, A. Identifying the Data Quality terminology used by Business People. In Proceedings of
the 2015 34th International Conference of the Chilean Computer Science Society (SCCC), Santiago, Chile, 9–13 November 2015.
56. Laranjeiro, N.; Soydemir, S.; Bernardino, J. A Survey on Data Quality: Classifying Poor Data. In Proceedings of the 2015 IEEE
21st Pacific Rim International Symposium on Dependable Computing (PRDC), Zhangjiajie, China, 18–20 November 2015.
57. Becker, D.; McMullen, B.; King, T. Big data, big data quality problem. In Proceedings of the 2015 IEEE International Conference
on Big Data (Big Data), Santa Clara, CA, USA, 29 October–1 November 2015; pp. 2644–2653.
58. Rao, D.; Gudivada, V.; Raghavan, V. Data Quality Issues in Big Data. In Proceedings of the 2015 IEEE International Conference
on Big Data (Big Data), Santa Clara, CA, USA, 29 October–1 November 2015; pp. 2654–2660.
59. Juddoo, S. Overview of data quality challenges in the context of Big Data. In Proceedings of the 2015 International Conference on
Computing, Communication and Security (ICCCS), Pointe aux Piments, Mauritius, 4–5 December 2015.
60. Taleb, I.; El Kassabi, H.; Serhani, M.; Dssouli, R.; Bouhaddioui, C. Big Data Quality: A Quality Dimensions Evaluation. In
Proceedings of the 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing,
Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress
(UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 18–21 July 2016; pp. 759–765.
61. Jiang, H.; Liang, L.; Zhang, Y. An Exploration of Data Quality Management Based on Allocation Efficiency Model. In Proceedings
of the 20th International Conference on Industrial Engineering and Engineering Management: Theory and Apply of Industrial
Management, Singapore, 6–9 December 2015; pp. 313–318.
62. Haug, F. Bad Big Data Science. In Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington,
DC, USA, 5–8 December 2016; pp. 2863–2871.
63. Karkouch, A.; Mousannif, H.; Al Moatassime, H.; Noel, T. A Model-Driven Architecture-based Data Quality Management
Framework for the Internet of Things. In Proceedings of the 2016 2nd International Conference on Cloud Computing Technologies
and Applications (CloudTech), Marrakech, Morocco, 24–26 May 2016; pp. 252–259.
64. Rivas, B.; Merino, J.; Caballero, I.; Serrano, M.; Piattini, M. Towards a service architecture for master data exchange based on ISO
8000 with support to process large datasets. Comput. Stand. Interfaces 2017, 54, 94–104. [CrossRef]
65. Aljumaili, M.; Karim, R.; Tretten, P. Metadata-based data quality assessment. VINE J. Inf. Knowl. Manag. Syst. 2016, 46, 232–250.
[CrossRef]
66. Heinrich, B.; Hristova, D.; Klier, M.; Schiller, A.; Szubartowicz, M. Requirements for Data Quality Metrics. J. Data Inf. Qual. 2018,
9, 1–32. [CrossRef]
67. Edelen, A.; Ingwersen, W. The creation, management, and use of data quality information for life cycle assessment. Int. J. Life
Cycle Assess. 2018, 23, 759–772. [CrossRef] [PubMed]
68. Fu, Q.; Easton, J. Understanding Data Quality Ensuring Data Quality by Design in the Rail Industry. In Proceedings of the 2017
IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 11–14 December 2017; pp. 3792–3799.
69. Lim, Y.; Yusof, M.; Sivasampu, S. Assessing primary care data quality. Int. J. Health Care Qual. Assur. 2018, 31, 203–213. [CrossRef]
[PubMed]
70. Jesilevska, S.; Skiltere, D. Analysis of deficiencies of data quality dimensions. In Proceedings of the New Challenges of Economic
and Business Development, Riga, Latvia, 18–20 May 2017; pp. 236–246.
71. Heinrich, B.; Klier, M.; Schiller, A.; Wagner, G. Assessing data quality—A probability-based metric for semantic consistency.
Decis. Support Syst. 2018, 110, 95–106. [CrossRef]
72. Koltay, T. Data governance, data literacy and the management of data quality. IFLA J. 2016, 42, 303–312. [CrossRef]
73. Cichy, C.; Rass, S. An Overview of Data Quality Frameworks. IEEE Access 2019, 7, 24634–24648. [CrossRef]
74. Gyulgyulyan, E.; Ravat, F.; Astsatryan, H.; Aligon, J. Data Quality Impact in Business Intelligence. In Proceedings of the 2018
Ivannikov Memorial Workshop (IVMEM), Yerevan, Armenia, 3–4 May 2018; pp. 47–51.
75. Abdallah, M. Big Data Quality Challenges. In Proceedings of the 2019 International Conference on Big Data and Computational
Intelligence (ICBDCI), Le Meridian, Mauritius, 8–9 February 2019.
76. Rajan, N.; Gouripeddi, R.; Mo, P.; Madsen, R.; Facelli, J. Towards a content agnostic computable knowledge repository for data
quality assessment. Comput. Methods Programs Biomed. 2019, 177, 193–201. [CrossRef]
77. Bronselaer, A.; Nielandt, J.; Boeckling, T.; De Tre, G. Operational Measurement of Data Quality. In Information Processing and
Management of Uncertainty in Knowledge-Based Systems. Applications. IPMU 2018; Communications in Computer and Information
Science; Springer: Cham, Switzerland, 2018; Volume 855, pp. 517–528.
78. Barsi, A.; Kugler, Z.; Juhasz, A.; Szabo, G.; Batini, C.; Abdulmuttalib, H.; Huang, G.; Shen, H. Remote sensing data quality model:
From data sources to lifecycle phases. Int. J. Image Data Fusion 2019, 10, 280–299. [CrossRef]
79. Liu, Y.; Wang, Y.; Zhou, K.; Yang, Y.; Liu, Y. Semantic-aware data quality assessment for image big data. Future Gener. Comput.
Syst. 2020, 102, 53–65. [CrossRef]
80. Liu, C.; Nitschke, P.; Williams, S.; Zowghi, D. Data quality and the Internet of Things. Computing 2020, 102, 573–599. [CrossRef]
81. Cristalli, E.; Serra, F.; Marotta, A. Data Quality Evaluation in Document Oriented Data Stores. In Advances in Conceptual Modeling.
ER 2018; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11158, pp. 309–318.
82. Firmani, D.; Tanca, L.; Torlone, R. Ethical Dimensions for Data Quality. J. Data Inf. Qual. 2020, 12, 1–5. [CrossRef]
83. Grueneberg, K.; Calo, S.; Dewan, P.; Verma, D.; O’Gorman, T. A Policy-based Approach for Measuring Data Quality. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 4025–4031.
84. Mustapha, J.C.; Mokhtar, S.A.; Jaffar, J.; Boursier, P. Measurement of Data Consumer Satisfaction with Data Quality for Improvement
of Data Utilization. In Proceedings of the 2019 13th International Conference on Mathematics, Actuarial Science, Computer Science
and Statistics (MACS), Karachi, Pakistan, 14–15 December 2019; pp. 1–7.
85. Ceravolo, P.; Bellini, E. Towards Configurable Composite Data Quality Assessment. In Proceedings of the 2019 IEEE 21st Conference
on Business Informatics (CBI), Moscow, Russia, 15–17 July 2019; Volume 1, pp. 249–257.
86. Ehrlinger, L.; Haunschmid, V.; Palazzini, D.; Lettner, C. A DaQL to Monitor Data Quality in Machine Learning Applications. In
Database and Expert Systems Applications. DEXA 2019; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019;
pp. 227–237.
87. Ridzuan, F.; Zainon, W.M.N.W. A Review on Data Cleansing Methods for Big Data. Procedia Comput. Sci. 2019, 161, 731–738.
[CrossRef]
88. Li, A.; Zhang, L.; Qian, J.; Xiao, X.; Li, X.; Xie, Y. TODQA: Efficient Task-Oriented Data Quality Assessment. In Proceedings of
the 2019 15th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN), Shenzhen, China, 11–13 December 2019;
pp. 81–88.
89. Souibgui, M.; Atigui, F.; Zammali, S.; Cherfi, S.; Yahia, S.B. Data quality in ETL process: A preliminary study. Procedia Comput.
Sci. 2019, 159, 676–687. [CrossRef]
90. Nikiforova, A. Definition and Evaluation of Data Quality: User-Oriented Data Object-Driven Approach to Data Quality Assessment.
Balt. J. Mod. Comput. 2020, 8, 391–432. [CrossRef]
91. Albertoni, R.; Isaac, A. Introducing the Data Quality Vocabulary (DQV). Semant. Web 2021, 12, 81–97. [CrossRef]
92. Mulgund, P.; Sharman, R.; Anand, P.; Shekhar, S.; Karadi, P. Data Quality Issues with Physician-Rating Websites: Systematic
Review. J. Med Internet Res. 2020, 22, e15916. [CrossRef]
93. Valencia-Parra, A.; Parody, L.; Varela-Vaca, A.; Caballero, I.; Gomez-Lopez, M. DMN4DQ: When data quality meets DMN. Decis.
Support Syst. 2021, 141, 113450. [CrossRef]
94. Onyeabor, G.; Ta’a, A. A Model for Addressing Quality Issues in Big Data. In Recent Trends in Data Science and Soft Computing.
IRICT 2018; Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2019; Volume 843, pp. 65–73.
95. Marev, M.; Compatangelo, E.; Vasconcelos, W. Intrinsic Indicators for Numerical Data Quality. In Proceedings of the 5th International
Conference on Internet of Things, Big Data and Security, IoTBDS 2020, Prague, Czech Republic, 7–9 May 2020; pp. 341–348.
96. Sarafidis, M.; Tarousi, M.; Anastasiou, A.; Pitoglou, S.; Lampoukas, E.; Spetsarias, A.; Matsopoulos, G.; Koutsouris, D. Data
Quality Challenges in a Learning Health System. Stud. Health Technol. Inform. 2020, 270, 143–147. [PubMed]
97. Musto, J.; Dahanayake, A. Integrating data quality requirements to citizen science application design. In Proceedings of the 11th
International Conference on Management of Digital EcoSystems, Limassol, Cyprus, 12–14 November 2019; pp. 166–173.
98. Musto, J.; Dahanayake, A. Improving Data Quality, Privacy and Provenance in Citizen Science Applications. In Information
Modelling and Knowledge Bases XXXI; Frontiers in Artificial Intelligence and Applications; IOS Press: Amsterdam, The Netherlands,
2020; Volume 321, pp. 141–160.
99. Weatherburn, C. Data quality in primary care, Scotland. Scott. Med. J. 2021, 66, 66–72. [CrossRef]
100. Gadde, M.; Wang, Z.; Zozus, M.; Talburt, J.; Greer, M. Rules Based Data Quality Assessment on Claims Database. Stud. Health
Technol. Inform. 2020, 272, 350–353. [PubMed]
101. Foscarin, F.; Rigaux, P.; Thion, V. Data quality assessment in digital score libraries The GioQoso Project. Int. J. Digit. Libr. 2021,
22, 159–173. [CrossRef]
102. Piscopo, A.; Simperl, E. What we talk about when we talk about Wikidata quality: A literature survey. In Proceedings of the 15th
International Symposium on Open Collaboration, Skövde, Sweden, 20–22 August 2019.
103. Gualo, F.; Rodriguez, M.; Verdugo, J.; Caballero, I.; Piattini, M. Data quality certification using ISO/IEC 25012: Industrial experiences.
J. Syst. Softw. 2021, 176, 110938. [CrossRef]
104. Schmidt, C.; Struckmann, S.; Enzenbach, C.; Reineke, A.; Stausberg, J.; Damerow, S.; Huebner, M.; Schmidt, B.; Sauerbrei, W.;
Richter, A. Facilitating harmonized data quality assessments. A data quality framework for observational health research data
collections with software implementations in R. BMC Med. Res. Methodol. 2021, 21, 63. [CrossRef] [PubMed]
105. Wong, K.; Wong, R. Big data quality prediction informed by banking regulation. Int. J. Data Sci. Anal. 2021, 12, 147–164.
[CrossRef]
106. Lettner, C.; Stumptner, R.; Fragner, W.; Rauchenzauner, F.; Ehrlingera, L. DaQL 2.0: Measure Data Quality based on Entity Models.
Procedia Comput. Sci. 2021, 180, 772–777. [CrossRef]
107. Kong, L.; Xi, Y.; Lang, Y.; Wang, Y.; Zhang, Q. A Data Quality Evaluation Index for Data Journals. In Big Scientific Data Management.
BigSDM 2018; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11473, pp. 291–300.
108. Taleb, I.; Serhani, M.; Bouhaddioui, C.; Dssouli, R. Big data quality framework: A holistic approach to continuous quality management.
J. Big Data 2021, 8, 76. [CrossRef]
109. Akgul, M. Data Quality: Success Factors Emergent Research Forum (ERF). In Proceedings of the AMCIS 2021, Virtual Conference,
9–13 August 2021.
110. Juddoo, S.; George, C.; Duquenoy, P.; Windridge, D. Data Governance in the Health Industry: Investigating Data Quality Dimensions
within a Big Data Context. Appl. Syst. Innov. 2018, 1, 43. [CrossRef]
111. Bronselaer, A. Data Quality Management: An Overview of Methods and Challenges. In Flexible Query Answering Systems. FQAS
2021; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; Volume 12871, pp. 127–141.
112. Bogdanov, A.; Degtyarev, A.; Shchegoleva, N.; Khvatov, V. Data Quality in a Decentralized Environment. In Computational
Science and Its Applications. ICCSA 2020; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12251,
pp. 58–71.
113. Valencia-Parra, A.; Parody, L.; Varela-Vaca, A.; Caballero, I.; Gomez-Lopez, M. DMN for Data Quality Measurement and Assessment.
In Business Process Management Workshops. BPM 2019; Lecture Notes in Business Information Processing; Springer: Cham, Switzerland,
2019; Volume 362, pp. 362–374.
114. Fang, Z.; Liu, Y.; Lu, Q.; Pitt, M.; Hanna, S.; Tian, Z. BIM-integrated portfolio-based strategic asset data quality management.
Autom. Constr. 2022, 134, 104070. [CrossRef]
115. Jain, A.; Patel, H.; Nagalapatti, L.; Gupta, N.; Mehta, S.; Guttula, S.; Mujumdar, S.; Afzal, S.; Mittal, R.; Munigala, V. Overview
and Importance of Data Quality for Machine Learning Tasks. In Proceedings of the 26th ACM SIGKDD International Conference
on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 3561–3562.
116. Shenoy, K.; Ilievski, F.; Garijo, D.; Schwabe, D.; Szekely, P. A study of the quality of Wikidata. J. Web Semant. 2022, 72, 100679.
[CrossRef]
117. Talha, M.; Kalam, A. Big Data: Towards a Collaborative Security System at the Service of Data Quality. In Hybrid Intelligent
Systems. HIS 2021; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2022; Volume 420, pp. 595–606.
118. Ehrlinger, L.; Woess, W. A Survey of Data Quality Measurement and Monitoring Tools. Front. Big Data 2022, 5, 850611. [CrossRef]
[PubMed]
119. AbuHalimeh, A. Improving Data Quality in Clinical Research Informatics Tools. Front. Big Data 2022, 5, 871897. [CrossRef]
120. Azeroual, O. Proof of Concept to Secure the Quality of Research Data. In Proceedings of the Fourteenth International Conference
on Machine Vision (ICMV 2021), Virtual Conference, 8–12 November 2022; Volume 12084.
121. Caballero, I.; Gualo, F.; Rodriguez, M.; Piattini, M. BR4DQ: A methodology for grouping business rules for data quality evaluation.
Inf. Syst. 2022, 109, 102058. [CrossRef]
122. Nakajima, S.; Nakatani, T. AI Extension of SQuaRE Data Quality Model. In Proceedings of the 2021 IEEE 21st International
Conference on Software Quality, Reliability and Security Companion (QRS-C), Sanya, China, 6–10 December 2021; pp. 306–313.
123. Reda, O.; Zellou, A. SMDQM- Social Media Data Quality Assessment Model. In Proceedings of the 2022 2nd International Conference
on Innovative Research in Applied Science, Engineering and Technology (IRASET), Meknes, Morocco, 3–4 March 2022; pp. 733–739.
124. Mohammed, M.; Talburt, J.; Dagtas, S.; Hollingsworth, M. A Zero Trust Model Based Framework For Data Quality Assessment.
In Proceedings of the 2021 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas,
NV, USA, 15–17 December 2021; pp. 305–307.
125. Iyengar, A.; Patel, D.; Shrivastava, S.; Zhou, N.; Bhamidipaty, A. Real-Time Data Quality Analysis. In Proceedings of the 2020 IEEE
Second International Conference on Cognitive Machine Intelligence (CogMI), Atlanta, GA, USA, 28–31 October 2020; pp. 101–108.
126. To, A.; Meymandpour, R.; Davis, J.; Jourjon, G.; Chan, J. A Linked Data Quality Assessment Framework for Network Data.
In Proceedings of the 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and
Network Data Analytics (NDA), Amsterdam, The Netherlands, 30 June 2019.
127. Wurl, A.; Falkner, A.; Haselbock, A.; Mazak, A. Using Signifiers for Data Integration in Rail Automation. In Proceedings of the
6th International Conference on Data Science, Technology and Applications, Madrid, Spain, 24–26 July 2017; pp. 172–179.
128. Kuban, M.; Gabaj, S.; Aggoune, W.; Vona, C.; Rigamonti, S.; Draxl, C. Similarity of materials and data-quality assessment by
fingerprinting. MRS Bull. 2022, 47, 991–999. [CrossRef]
129. Brajkovic, H.; Jaksic, D.; Poscic, P. Data Warehouse and Data Quality—An Overview. In Proceedings of the Central European
Conference on Information and Intelligent Systems, Varaždin, Croatia, 7–9 October 2020; pp. 17–24.
130. Serra, F.; Peralta, V.; Marotta, A.; Marcel, P. Modeling Context for Data Quality Management. In Conceptual Modeling. ER 2022;
Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; Volume 13607, pp. 325–335.
131. Nesca, M.; Katz, A.; Leung, C.; Lix, L. A scoping review of preprocessing methods for unstructured text data to assess data
quality. Int. J. Popul. Data Sci. 2022, 7, 1757. [CrossRef]
132. Ben Hassine, S.; Clement, D. Open Data Quality Dimensions and Metrics: State of the Art and Applied Use Cases. In Business
Information Systems Workshops. BIS 2020; Lecture Notes in Business Information Processing; Springer: Cham, Switzerland, 2020;
Volume 394, pp. 311–323.
133. Elouataoui, W.; El Alaoui, I.; El Mendili, S.; Gahi, Y. An Advanced Big Data Quality Framework Based on Weighted Metrics. Big
Data Cogn. Comput. 2022, 6, 153. [CrossRef]
134. Mashoufi, M.; Ayatollahi, H.; Khorasani-Zavareh, D.; Boni, T. Data quality assessment in emergency medical services: An objective
approach. BMC Emerg. Med. 2023, 23, 10. [CrossRef]
135. Buelvas, J.; Munera, D.; Tobon, V.; Aguirre, J.; Gaviria, N. Data Quality in IoT-Based Air Quality Monitoring Systems: A
Systematic Mapping Study. Water Air Soil Pollut. 2023, 234, 248. [CrossRef]
136. Guerra-Garcia, C.; Nikiforova, A.; Jimenez, S.; Perez-Gonzalez, H.; Ramirez-Torres, M.; Ontanon-Garcia, L. ISO/IEC 25012-based
methodology for managing data quality requirements in the development of information systems: Towards Data Quality by
Design. Data Knowl. Eng. 2023, 145, 102152. [CrossRef]
137. Krishna, C.; Ruikar, K.; Jha, K. Determinants of Data Quality Dimensions for Assessing Highway Infrastructure Data Using
Semiotic Framework. Buildings 2023, 13, 944. [CrossRef]
138. Mirzaie, M.; Behkamal, B.; Allahbakhsh, M.; Paydar, S.; Bertino, E. State of the art on quality control for data streams: A systematic
literature review. Comput. Sci. Rev. 2023, 48, 100554. [CrossRef]
139. Bertrand, Y.; Van Belle, R.; De Weerdt, J.; Serral, E. Defining Data Quality Issues in Process Mining with IoT Data. In Process
Mining Workshops. ICPM 2022; Lecture Notes in Business Information Processing; Springer: Cham, Switzerland, 2023; Volume 468,
pp. 422–434.
140. Lewis, A.; Weiskopf, N.; Abrams, Z.; Foraker, R.; Lai, A.; Payne, P.; Gupta, A. Electronic health record data quality assessment and
tools: A systematic review. J. Am. Med. Inform. Assoc. 2023, 30, 1730–1740. [CrossRef]
141. Perez-Castillo, R.; Carretero, A.G.; Rodriguez, M.; Caballero, I.; Piattini, M.; Mate, A.; Kim, S.; Lee, D. Data Quality Best Practices in
IoT Environments. In Proceedings of the 2018 11th International Conference on the Quality of Information and Communications
Technology (QUATIC), Coimbra, Portugal, 4–7 September 2018; pp. 272–275.
142. Huser, V.; Li, X.; Zhang, Z.; Jung, S.; Park, R.W.; Banda, J.; Razzaghi, H.; Londhe, A.; Natarajan, K. Extending Achilles Heel Data
Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison. Stud. Health Technol. Inform. 2019, 264, 1488–1489.
[PubMed]
143. Heine, F.; Kleiner, C.; Oelsner, T. A DSL for Automated Data Quality Monitoring. In Database and Expert Systems Applications.
DEXA 2020; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12391, pp. 89–105.
144. Montana, P.; Marotta, A. Data Quality Management oriented to the Electronic Medical Record. In Proceedings of the 2021 XLVII
Latin American Computing Conference (CLEI), Cartago, Costa Rica, 25–29 October 2021.
145. Strozyna, M.; Filipiak, D.; Wecel, K. Data Quality Assessment—A Use Case from the Maritime Domain. In Business Information
Systems Workshops. BIS 2020; Lecture Notes in Business Information Processing; Springer: Cham, Switzerland, 2020; Volume 394,
pp. 5–20.
146. Ji, R.; Hou, H.; Sheng, G.; Jiang, X. Data Quality Assessment for Electrical Equipment Condition Monitoring. In Proceedings of
the 2022 9th International Conference on Condition Monitoring and Diagnosis (CMD), Kitakyushu, Japan, 13–18 November 2022;
pp. 259–262.
147. Kapsner, L.; Kampf, M.; Seuchter, S.; Kamdje-Wabo, G.; Gradinger, T.; Ganslandt, T.; Mate, S.; Gruendner, J.; Kraska, D.; Prokosch, H.
Moving Towards an EHR Data Quality Framework: The MIRACUM Approach. Stud. Health Technol. Inform. 2019, 267, 247–253.
[PubMed]
148. Nguyen, T.L. A framework for five big v’s of big data and organizational culture in firms. In Proceedings of the 2018 IEEE
International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 5411–5413.
149. Qiu, H.; Ayara, A.; Glimm, B. Ontology-Based Map Data Quality Assurance. In The Semantic Web. ESWC 2021; Lecture Notes in
Computer Science Lecture Notes in Computer Science (LNCS); Springer: Cham, Switzerland, 2021; Volume 12731, pp. 73–89.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.