Multi-Dimensional Data Quality
*Correspondence:
[email protected]
Abstract
Recently, the widespread adoption of artificial intelligence (AI) has given rise to a significant trust crisis, stemming from the persistent emergence of issues in practical applications. As a crucial component of AI, data has a profound impact on the trustworthiness of AI. Nevertheless, researchers have struggled to assess data quality rationally, primarily due to the scarcity of versatile and effective evaluation methods. To address this problem, a multi-dimensional hierarchical evaluation system (MDHES) is proposed to estimate data quality. Initially, multiple key dimensions are devised to evaluate specific data conditions separately through the calculation of individual scores, which provides a clearer understanding of the strengths and weaknesses among the various dimensions. Furthermore, a comprehensive evaluation method, incorporating a fuzzy evaluation model, is developed to synthetically evaluate the data quality. This method achieves a dynamic balance among dimensions and a harmonious integration of subjective and objective criteria to ensure a more precise assessment result. Finally, rigorous experimental verification and comparison on both benchmark problems and real-world applications demonstrate the effectiveness of the proposed MDHES, which can accurately assess data quality to provide strong data support for the development of trustworthy AI.
Affiliations: 1 Research Institute of China Mobile Communications Corporation, Beijing 100032, China; 2 Future Institute, Beijing 100032, China; 3 Research Institute of Safety Technology, Beijing, China
Keywords: Data quality, Assessment dimension, Multi-dimensional hierarchical
evaluation system, Trustworthy AI
Introduction
With the rapid development of computer software and hardware technology, artificial intelligence (AI) has made significant breakthroughs and is increasingly applied in many fields of human production and life [1–3]. AI has proven particularly proficient at predicting stock prices and stock trends in the financial field [4]. In the medical field, AI can assist doctors in diagnosing diseases and performing surgery [5]. Moreover, AI can identify real-time environmental information for path planning, thus enhancing the likelihood of early arrival of unmanned vehicles in automated driving [6]. However, with the widespread application of AI, persistent issues, such as the under-representation of data or the unfairness of model outputs, have become obstacles to deploying AI in actual scenarios [7, 8].
Zhang et al. Journal of Big Data (2024) 11:136 Page 2 of 26
Good data can guide AI towards goodness and trustworthiness. Therefore, more scholars are participating in the evaluation and improvement of data quality. In [29], Liang et al. primarily explored the influence of the data pipeline on the credibility of AI, encompassing data design (sourcing and documentation), data sculpting (selection, cleaning and annotation), and data strategies for model testing and monitoring. Their work offers valuable theoretical insights into the significance of data quality; however, a specific implementation was not given. Additionally, a data protection strategy using backdoor watermarking was proposed to authenticate data ownership [30]. This protection method employs poison backdoor attacks for data watermarking, and a hypothesis-test-guided method is then utilized for verification. The experiments demonstrated that this protection method can effectively prevent data theft and thereby enhance the reliability of AI. Caballero proposed a 3Cs model composed of three data quality dimensions for assessing the quality of data: contextual consistency, operational consistency and temporal consistency [31]. Nevertheless, it focuses only on consistency, potentially overlooking other crucial aspects. A multi-dimensional assessment could provide a more complete understanding of AI credibility. For example, the significance of data integrity and consistency as pivotal quality assurance dimensions was emphasized jointly in [32]. In addition, a comprehensive assessment methodology encompassing multiple dimensions was introduced to estimate the integrity, redundancy, accuracy, timeliness, intelligence and consistency of power data [33]. The results demonstrated that this assessment method can provide a foundation for data analysis and data mining to facilitate trustworthy AI in power systems.
However, the following problems should be further discussed.
1. Multiple key dimensions are devised to evaluate specific data conditions separately with respective formulas, which can provide a clearer understanding of the strengths and weaknesses among these dimensions by calculating the individual scores. Moreover, the effects of data improvements can also be explicitly measured, offering constructive guidance and feedback for researchers implementing enhancements.
2. A comprehensive evaluation method, incorporating a fuzzy evaluation model, is developed to synthetically evaluate the data quality based on the score value of each dimension. This method focuses on the interactions of dimensions to achieve a dynamic balance. Furthermore, the adoption of the fuzzy evaluation model achieves a harmonious integration of subjective and objective criteria to more accurately reflect the data quality.
Ten key dimensions, which encompass the entire data pipeline (data processing, data usage, data storage and more), are meticulously designed to evaluate data conditions separately by calculating the individual score of each dimension. Then, a comprehensive evaluation method, integrating the individual scores, is developed to provide a synthetic evaluation of the data quality. Next, the proposed MDHES will be introduced in detail.
θ11 = min(1, Ω11/℧11) × 100%, (1)
where ℧11 is the number of features in the benchmark data, while Ω11 represents the number of features in the training data. Benchmark data is meticulously gathered by measurement organizations tailored to specific scenarios, resulting in a comprehensive data set that meets specific requirements. However, acquiring such data demands considerable effort and financial investment.
The presence of null values will affect the availability of data, undermining the decision-making capabilities of AI, especially when the training data contains too many null values. Therefore, it is imperative to consider the fullness of feature values, which can be defined as
θ12 = (S − Ω12)/S × 100%, (2)
where Ω12 signifies the number of training data samples with null values, and S indicates the total number of training data samples. Having an adequate amount of training data is also crucial, as it helps the AI model learn the underlying patterns
and essential laws, thereby enhancing its generalization ability. It can be described
as
where vi is the target size of the ith category in the training data, γ denotes the z-score associated with the confidence level ε, and pi is the proportion of the ith category in the training data.
The score of data size can be calculated as
θ13 = (1/k) Σ_{i=1}^{k} min(1, si/vi) × 100%, (4)
where si indicates the actual size of the ith category and k is the number of categories. Based on the above analysis, the training data completeness is
θ1 = w11 θ11 + w12 θ12 + w13 θ13, (5)
where w11, w12 and w13 are the corresponding weights of the sub-dimensions, respectively.
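To make the completeness scoring concrete, a minimal Python sketch is given below; the function names, the example counts and the equal sub-weights are illustrative assumptions, not part of the original system.

```python
# Illustrative sketch of the completeness dimension: the null-value
# sub-score of Eq. (2) and the weighted combination of Eq. (5).
# Function names and example values are assumptions.

def theta_12(num_null_samples: int, total_samples: int) -> float:
    """Share of training samples without null values (Eq. 2), in %."""
    return (total_samples - num_null_samples) / total_samples * 100.0

def theta_1(sub_scores, weights) -> float:
    """Completeness as a weighted sum of its sub-scores (Eq. 5)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(w * s for w, s in zip(weights, sub_scores))

# Example: 50 of 1000 samples contain nulls, equal sub-weights.
null_score = theta_12(50, 1000)                      # 95.0
total = theta_1([90.0, null_score, 80.0], [1/3, 1/3, 1/3])
```

In practice the weights w11, w12 and w13 would come from the weighting procedure described later, not from an equal split.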
2. Accuracy: Data accuracy is the degree of closeness between the collected data and the actual true values [35]. Perturbation data, such as false data produced by deep fake algorithms or adversarial samples in the dataset, may hinder the AI model from recognizing the correct patterns, ultimately affecting the accuracy of AI models. General perturbation data can be discovered by outlier detection technology, such as the 3-sigma criterion
xij ∈ [x̄j − 3κj, x̄j + 3κj], (6)
where xij is the value in the ith row and jth column of the training data, and x̄j and κj represent the average value and standard deviation of the jth column, respectively. The score for outlier detection can be given as
θ21 = (S − Ω21)/S × 100%, (7)
where Ω21 is the number of samples falling outside this interval. Moreover, adversarial samples can be identified by attack testing, giving
θ22 = (S − Ω22)/S × 100%, (8)
where Ω22 represents the number of successful adversarial sample attacks, x_i^adv is the ith adversarial sample and yi is the corresponding label value. Therefore, the accuracy of training data θ2 can be given as
θ2 = (S − Ω21 − Ω22)/S × 100%, (9)
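As a sketch of how the 3-sigma criterion of Eq. (6) can be applied in practice, the snippet below flags rows containing any out-of-interval value; aggregating the result into a percentage score following the (S − Ω)/S pattern of the other dimensions is our assumption.

```python
# Hypothetical sketch of column-wise 3-sigma outlier screening (Eq. 6);
# the score is the share of rows whose every value lies inside
# [mean - 3*sd, mean + 3*sd], following the (S - Omega)/S pattern.
from statistics import mean, stdev

def three_sigma_score(rows):
    cols = list(zip(*rows))
    bounds = [(mean(c) - 3 * stdev(c), mean(c) + 3 * stdev(c)) for c in cols]
    ok = sum(
        all(lo <= v <= hi for v, (lo, hi) in zip(row, bounds))
        for row in rows
    )
    return ok / len(rows) * 100.0

# Ten identical rows plus one extreme row: only the extreme row is flagged.
rows = [[0.0]] * 10 + [[1000.0]]
```

With these eleven rows the score is 10/11 × 100 ≈ 90.9%, since only the extreme row falls outside the interval.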
3. Consistency: The format consistency score can be given as
θ31 = (S − Ω31)/S × 100%, (10)
where θ31 is the score for format consistency and Ω31 indicates the number of samples with different formats for a feature. Moreover, by comparing the values of features and labeled values between samples, the content consistency for the same matter can be given as
θ32 = (S − Ω32)/S × 100%, (11)
where Ω32 denotes the number of samples containing conflicting content. In addition, after the content consistency evaluation, adversarial samples with targeted attacks will also be detected. Therefore, the consistency of training data θ3 is
where Ω41 is the number of data sources of training data and ℧41 is the number of data
sources of benchmark data. Additionally, Ω42 denotes the number of categories of train-
ing data, while ℧42 is the number of categories of benchmark data.
Therefore, the variousness of training data can be defined as
where ζmax and ζmin are the maximum and minimum values of the number of categories, respectively.
6. Logicality: Logicality can be utilized to evaluate whether the relationships between features in the training data align with factual or commonsense knowledge, since certain data errors may remain undetected during the data accuracy assessment [40]. Logical relationships between features encompass comparisons such as 'greater than', 'less than', 'equal to', and the like. For example, the product of feature A and feature B may be expected to be greater than or equal to feature C; if their product falls short of feature C, it indicates a logical inconsistency. Based on prior knowledge, logicality can be given as
θ6 = (S − Ω61)/S × 100%, (16)
where Ω61 is the number of samples violating the logical rules.
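The rule-based check described above can be sketched as follows; the two rules and the field names are hypothetical examples, and the score is the share of samples violating no rule, as in Eq. (16).

```python
# Minimal sketch of the logicality check: domain rules between features
# are expressed as predicates; samples violating any rule count towards
# Omega_61 in Eq. (16). The rules and field names are hypothetical.

RULES = [
    lambda s: s["a"] * s["b"] >= s["c"],  # product of a and b must reach c
    lambda s: s["a"] >= 0,                # a must be non-negative
]

def theta_6(samples, rules=RULES) -> float:
    violating = sum(1 for s in samples if not all(rule(s) for rule in rules))
    return (len(samples) - violating) / len(samples) * 100.0

data = [
    {"a": 2, "b": 3, "c": 5},  # 2 * 3 >= 5 and 2 >= 0: consistent
    {"a": 1, "b": 1, "c": 5},  # 1 * 1 <  5: logical inconsistency
]
# theta_6(data) -> 50.0
```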
and latest samples [41]. The evaluation formula for quantifying these fluctuations is expressed as
θ7 = min(1, |℧71 − Ω71| / ℧71) × 100%, (17)
where ℧71 is the sum of historical samples, Ω71 signifies the sum of the latest samples, and |·| denotes the absolute value operation. It is worth noting that, to ensure computational fairness, the number of historical samples should be equal to the number of latest samples.
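Equation (17), as extracted, can be transcribed directly; note that under this form a smaller value indicates a smaller relative fluctuation between the two sample totals.

```python
# Direct transcription of Eq. (17): relative change between the
# historical sample total and the latest sample total, capped at 1.

def theta_7(historical_sum: float, latest_sum: float) -> float:
    return min(1.0, abs(historical_sum - latest_sum) / historical_sum) * 100.0

# theta_7(1000, 900) is about 10.0, while theta_7(100, 500) saturates at 100.0
```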
8. Uniqueness: Repetitive samples have an influence on the outputs of AI models, often causing overfitting [42]. Nevertheless, simply eliminating these samples from the training data could compromise the generalization ability of the models. Therefore, a data uniqueness assessment is required as a basis for deletion without affecting the performance of models:
θ8 = (S − Ω81)/S × 100%, (18)
where
Δ = (1/(100n)) Σ_{i=1}^{n} (Ŝi(t) − Ŝi(t − τ))², (20)
Ŝi is the average value of the ith feature of dataset at time t, and τ indicates the time
interval. Furthermore, the update frequency of data is
θ9,2 = (TB − TN)/TB × 100%, (21)
where TB represents the ideal number of update time period and TN is the time period
that has not been updated. Thus, the timeliness of the data can be described as
the standard of the collection process and data storage means that the data is protected by privacy-preserving techniques such as differential privacy and federated learning. The standard of data training refers to the standard in the process of using online or offline data to train AI models, such as homogeneous or heterogeneous computing. Based on manual judgement, the standard score θ10 can be given as
θ10 = w10,1 θ10,1 + w10,2 θ10,2 + w10,3 θ10,3 + w10,4 θ10,4 + w10,5 θ10,5 + w10,6 θ10,6, (23)
where θ10,1 is the format standard score, θ10,2 is the naming standard score, and θ10,3, θ10,4, θ10,5 and θ10,6 indicate the scores for data source channel, collection process, data storage channel and data training, respectively; w10 = [w10,1, w10,2, …, w10,6] represents the weight set.
Remark Ten key dimensions have been elaborately designed to evaluate data quality. By calculating the score of each dimension, researchers gain a clearer understanding of the data condition. This approach not only offers valuable guidance but also provides feedback on the effectiveness of training data enhancement. Moreover, the use of simple naming conventions for dimensions, such as serial numbering, simplifies the expansion of dimensions or the addition of sub-items within each dimension in future studies. It is worth noting that, to better align with real-world scenarios, certain dimensions incorporate subjective evaluation criteria. In the following discussion, we will explore strategies to minimize the influence of subjectivity.
dimensions can be defined as θ = {θ1, θ2, θ3, …, θn}, with n ∈ [1, 10]. The evaluation grades can be given as v = {v1, v2, v3, …, vm}, where m is the number of evaluation grades.
For example, when m is four, set v is {excellent, good, pass, poor}.
where rij is the membership degree of the ith dimension for the jth evaluation grade, obtained by adopting a membership function. It is important to highlight that distinct membership functions should be employed in different scenarios. After the evaluation matrix is calculated, the weight vector also needs to be determined.
Each dimension has a different role and position in the evaluation of data quality. Therefore, the weight vector a = (a1, a2, …, an), with ai ≥ 0 and ∑ai = 1, needs to be designed, which expresses the importance of each dimension relative to the problem to be evaluated. Since the requirements for each dimension differ across scenarios, the analytic hierarchy process (AHP) [45] is suitable for determining the weights in this paper; it is an effective analytical method that harmoniously combines qualitative and quantitative approaches for solving complex problems with multiple objectives. To save space, the working principle of AHP will not be presented, while the computational steps will be given in detail in the Section Results and discussion.
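The AHP computation used later can be sketched in a few lines; the power-iteration routine and Saaty's random index table are standard, but the helper names are our own.

```python
# Sketch of the AHP steps: weights from a pairwise judgment matrix via
# power iteration, plus the consistency check CI = (lmax - n)/(n - 1),
# CR = CI/RI with Saaty's random index RI. Helper names are our own.

def ahp_weights(C, iters=500):
    n = len(C)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(C[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # since w sums to 1, the column sum of C @ w estimates lambda_max
    lmax = sum(sum(C[i][j] * w[j] for j in range(n)) for i in range(n))
    return w, lmax

RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def consistency_ratio(C):
    n = len(C)
    _, lmax = ahp_weights(C)
    return ((lmax - n) / (n - 1)) / RI[n]

# A perfectly consistent 3x3 matrix: weights 4/7, 2/7, 1/7 and CR close to 0.
C3 = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
```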
b = a ∘ R, (25)
where vector b = [b1, b2, …, bm] and the symbol ∘ represents the fuzzy operator. Moreover, in order to make full use of the information in R, the bounded weighted operator can be adopted:
bj = min(1, Σ_{i=1}^{n} min(ai, rij)), j = 1, 2, …, m. (26)
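The operator of Eq. (26) can be written directly; the small a and R below are toy values for illustration only.

```python
# The bounded operator of Eq. (26): each grade membership b_j sums
# min(a_i, r_ij) over the dimensions and is capped at 1.

def fuzzy_compose(a, R):
    m = len(R[0])
    return [
        min(1.0, sum(min(ai, row[j]) for ai, row in zip(a, R)))
        for j in range(m)
    ]

# Toy example with two dimensions and two grades:
a = [0.6, 0.4]
R = [[0.2, 0.8],
     [0.5, 0.5]]
# fuzzy_compose(a, R) is about [0.6, 1.0]; the second grade is capped at 1
```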
Table 1 The evaluation process of the proposed MDHES for data quality
where ηj is the score of the jth evaluation grade. Moreover, the detailed evaluation processes are summarized in Table 1.
Remark The data quality evaluation is complex, because many dimensions must be considered simultaneously, and each dimension has a different level of importance. Therefore, the proposed comprehensive evaluation method, incorporating the fuzzy comprehensive evaluation model, can effectively focus on the interactions of dimensions to achieve a dynamic balance. Furthermore, expert knowledge can be employed in the evaluation, while the influence of subjectivity on the entire evaluation process is mitigated by leveraging fuzzy mathematics. Thus, this comprehensive evaluation method will accurately reflect the data quality to increase the credibility of AI.
Table 2 The distribution details of training data, testing data, benchmark data and validation data
Table 3 The Evaluation results of each dimension for training data of intrusion detection
was labeled as normal and four abnormal categories. Furthermore, anomaly data and
another 17 attack types appeared in the test data.
Dimension      δ1 (poor)  δ2 (pass)  δ3 (good)  δ4 (excellent)
Completeness   80         85         95         98
Accuracy       85         90         95         99
Consistency    90         95         98         99
Variousness    50         60         70         80
Equalization   35         55         75         85
Logicality     85         90         96         98
Uniqueness     25         35         45         60
No contradiction has been discovered between the training data and benchmark data by comparing each sample, so the score of consistency is 100. Since the benchmark data are randomly selected from the training and testing sets, the multi-source score of the training data is 50. Moreover, there are 23 categories in the training data and 39 categories in the benchmark data, so the diversity is 56. Therefore, the variousness score is 53. Intuitively, there is a significant difference in the number of categories in the training data, and the equalization score is only 1 according to Eq. (13). It can be determined that the logicality score is 100 by calibration. Additionally, there are 348,435 duplicate entries in the training data, and the score for uniqueness is calculated to be 30.
Comprehensive evaluation for training data: Based on the above discussion of each dimension, the data quality can be comprehensively evaluated in this section. In our experiment, data quality is divided into four evaluation grades: poor, pass, good, excellent. The evaluation matrix R, which contains the membership degrees of the dimensions for each evaluation grade, needs to be calculated. First, the membership function must be determined. For the "poor" grade, the lower the score of a dimension, the greater the degree of belonging to it; thus, "poor" should be a gradually decreasing function. Similarly, "excellent" is a gradually rising function. Both "pass" and "good" are functions that rise and then fall. Based on the above analysis, the trapezoidal membership function is adopted for its simple calculation. The distribution can be given as
ri1 = 1 for x ≤ δ1; (δ2 − x)/(δ2 − δ1) for δ1 < x < δ2; 0 for x ≥ δ2,
ri2 = (x − δ1)/(δ2 − δ1) for δ1 < x < δ2; (δ3 − x)/(δ3 − δ2) for δ2 ≤ x < δ3; 0 for x ≤ δ1 or x ≥ δ3,
ri3 = (x − δ2)/(δ3 − δ2) for δ2 < x < δ3; (δ4 − x)/(δ4 − δ3) for δ3 ≤ x < δ4; 0 for x ≤ δ2 or x ≥ δ4,
ri4 = 0 for x ≤ δ3; (x − δ3)/(δ4 − δ3) for δ3 < x ≤ δ4; 1 for x ≥ δ4, (28)
where δ1, δ2, δ3, and δ4 are the critical values at the four levels of poor, pass, good, excel-
lent, respectively.
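The four trapezoidal functions of Eq. (28) can be implemented compactly with a single clipped linear ramp; for instance, with critical values (80, 85, 95, 98) a score of 92.24 yields the fuzzy set [0, 0.276, 0.724, 0].

```python
# The trapezoidal membership functions of Eq. (28), expressed through a
# single clipped linear ramp; d1 < d2 < d3 < d4 are the critical values
# of the four grades poor, pass, good and excellent.

def ramp(x, lo, hi):
    """Linear rise from 0 at lo to 1 at hi, clipped to [0, 1]."""
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))

def memberships(x, d1, d2, d3, d4):
    r_poor = 1.0 - ramp(x, d1, d2)
    r_pass = min(ramp(x, d1, d2), 1.0 - ramp(x, d2, d3))
    r_good = min(ramp(x, d2, d3), 1.0 - ramp(x, d3, d4))
    r_excellent = ramp(x, d3, d4)
    return [r_poor, r_pass, r_good, r_excellent]

# Example with critical values (80, 85, 95, 98):
# memberships(92.24, 80, 85, 95, 98) is approximately [0, 0.276, 0.724, 0]
```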
For the intrusion detection scenario, completeness, accuracy, consistency and logicality are critical, as they have a direct impact on the FNN model. However, equalization is hard to guarantee, due to the difficulty and cost of attacking.
Taking completeness as an example, with critical values (80, 85, 95, 98) the membership functions become
ri3 = (x − 85)/(95 − 85) for 85 < x < 95; (98 − x)/(98 − 95) for 95 ≤ x < 98; 0 for x ≤ 85 or x ≥ 98,
ri4 = 0 for x ≤ 95; (x − 95)/(98 − 95) for 95 < x ≤ 98; 1 for x ≥ 98, (29)
thus, the fuzzy set of completeness is [0, 0.276, 0.724, 0]. Sequentially, the evaluation matrix R can be given as

R = [ 0     0.276  0.724  0
      0     0      0.67   0.33
      0     0      0      1
      0.55  0.45   0      0
      1     0      0      0
      0     0      0      1
      0.5   0.5    0      0 ],  (30)
Next, the weight vector a will be determined by adopting AHP. First, the judgment matrix C is constructed, and its elements cij are given using the 1–9 scale method proposed by Saaty. Based on Table 5, the judgment matrix C, obtained by a pairwise comparison of completeness, accuracy, consistency, variousness, equalization, logicality and uniqueness, can be defined as
C = [ 1    1/4  1/6  1/2  7  1/4  6
      4    1    1/3  5    9  1    9
      6    3    1    5    9  1    9
      2    1/5  1/5  1    5  1/6  5
      1/7  1/9  1/9  1/5  1  1/8  1/2
      4    1    1    7    8  1    8
      1/6  1/9  1/9  1/5  2  1/9  1 ],  (31)
The consistency of C is checked by
CR = CI/RI, CI = (λmax − τ)/(τ − 1), (32)
where CR is the consistency ratio, λmax is the absolute value of the largest eigenvalue of matrix C, and τ is the order of the matrix. For matrix C, λmax is 7.588 and τ is 7. Additionally, when the order is 7, the corresponding RI is 1.32, which is directly given by Saaty. Therefore, the value of CR is 0.074, and C passes the consistency test. The eigenvector corresponding to the maximum eigenvalue λmax is [−0.15, −0.43, −0.69, −0.17, −0.04, −0.53, −0.048]. Then, after normalization, the weight vector is [0.18, 0.25, 0.25, 0.04, 0.04, 0.12, 0.12]. Therefore, the fuzzy transformation result is [0.1, 0.1, 0.47, 0.31]. The scores corresponding to the four evaluation grades are [40, 60, 80, 90]. Hence, the comprehensive evaluation result for the training data of intrusion detection is 75.5.
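The final defuzzification step is not spelled out in the extracted text; a plain weighted sum of the grade scores by the fuzzy result b, a common choice, reproduces the reported 75.5.

```python
# Defuzzification as the weighted sum of the grade scores by the fuzzy
# result b; this common choice reproduces the reported score of 75.5.

def defuzzify(b, grade_scores):
    return sum(bi * g for bi, g in zip(b, grade_scores))

b = [0.1, 0.1, 0.47, 0.31]   # fuzzy transformation result
grades = [40, 60, 80, 90]    # scores of poor, pass, good, excellent
# defuzzify(b, grades) is approximately 75.5
```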
In addition, to better demonstrate the advantages of the fuzzy comprehensive evaluation method, it is compared with the weighted average evaluation method, in which each dimension is equally weighted. The calculated score for data quality is then 71.2, approximately 4 points lower than the score of the fuzzy comprehensive evaluation. The fuzzy comprehensive evaluation method produces the final evaluation result by setting unequal weights for each evaluation dimension. This weight setting can reflect the relative importance of the evaluation dimensions, making the results more scientific and reasonable. Besides, it can effectively deal with various kinds of vague and uncertain information. These advantages make the fuzzy comprehensive evaluation method prominent and one of the effective means of solving practical evaluation problems.
Based on the above discussion, equalization and uniqueness are the main reasons for the unsatisfactory data quality score. Therefore, some data processing is required to improve the data quality. After de-duplication and simple random sampling [46], the distributions of all data sets are shown in Figs. 2 and 3. It can be seen that, to improve the equalization, the number of samples in category R2L is decreased, while the other categories are increased. Moreover, the evaluation results of the improved training data are summarized in Table 6, and the comparison with the raw training data is displayed in Fig. 4.
Based on Fig. 4 and Table 6, the accuracy is improved by 2% compared with the raw training data. Additionally, the scores of variousness, equalization and uniqueness have been improved by at least 50%. It is worth noting that we do not completely remove duplicates, to maintain a balance between equalization and uniqueness, due to the
Fig. 4 The evaluation result comparison of raw training data and improved training data for intrusion
detection
Table 6 The evaluation results of each dimension for processed training data of intrusion detection
Dimension Completeness Accuracy Consistency Variousness Equalization Logicality Uniqueness CER
(%)
FNN-P 97.01 1.66 0.84 95.93 3.62 2.51 90.05 7.50 4.31 88.10 7.10 6.20 82.60 10.80 8.31
FNN-R 89.36 8.51 2.3 73.67 24.6 6.94 65.87 32.87 11.43 48.64 36.65 24.68 43.81 32.41 23.79
SVM + ELM + MK [47] 95.79 4.68 2.05 98.10 0.95 0.46 87.20 11.32 2.40 21.93 57.41 22.69 31.35 48.46 21.92
BP 91.12 5.92 3.38 91.94 5.24 2.86 89.4 6.32 4.28 72.60 23.36 13.31 64.20 28.27 15.54
ILSTM [48] 97.79 1.84 1.27 96.37 3.74 2.83 91.12 8.39 1.27 80.61 15.48 8.45 75.48 21.39 9.24
ISSDA [49] 99.72 0.18 0.11 98.13 1.10 0.77 95.02 5.32 3.61 82.50 23.67 9.82 78.13 16.69 5.19
training data file with 40 million samples to the proposed MDHES (see Fig. 8). Then, by clicking the 'Create' button, an evaluation task can be created. Next, nine dimensions are selected, and the task starts by clicking the 'Start' button. The evaluation results are shown in Fig. 9. It can be seen that the data size score in completeness is 58.16 and the variousness is 75. Therefore, the dimensions completeness, consistency, uniqueness and timeliness have unsatisfactory scores, which need to be further improved before repeating the evaluation process. Additionally, in order to clearly demonstrate the changes in the scores of the raw and improved data samples, the comparisons are displayed in Fig. 10, and the details are summarized in Table 7.
Fig. 10 The evaluation result comparison of raw training data and improved training data for cyber-telecoms
fraud identification
Table 8 The evaluation comparisons of the raw and improved data samples for telephone network fraud
Dimension (%) Completeness Accuracy Consistency Variousness Equalization Logicality Uniqueness Timeliness Standard CER
Fig. 11 The average performance comparison of DNNs with the raw training data and improved training
data
It can be seen from Fig. 10 and Table 8 that the completeness of the training data is improved by 10%. Consistency and timeliness are two important dimensions for fraud identification DNNs, due to the replacement of objects. For example, a previously normal URL may have become a malicious URL, which leads to a contradiction in the label value. Therefore, the consistency is enhanced by correcting contradictory samples based on the latest fraud data, and the timeliness is also increased. Moreover, the comprehensive score for data quality is 92.6, which is enhanced by approximately 8.3% compared with the original training data. To reveal the impact of the data quality change on DNNs, SDA and LSTM are adopted, where the number of basic building units of the SDA is 32 and that of the LSTM is 21. The performances on the validation data are shown in Fig. 11.
As shown in Fig. 11, the average accuracy, average false positive rate and average false negative rate of the DNNs all show positive improvement. The average accuracy of the DNNs is 89.46%, an increase of 13% compared to the original data. Furthermore, the average false positive rate is reduced to 8.19% and the average false negative rate to 2.61%. Based on the evaluation results and performance, this intelligent identification model meets the actual application requirements and has been deployed in the business of cyber-telecoms fraud prevention.
The actual application is shown in Fig. 12.
The display shows the total number of fraudulent websites today, the total number of fraudulent websites within a week, the number of interceptions, and the total number of interceptions within a week. Moreover, the top twelve types of fraudulent websites are given, and blocked websites with higher rankings are also displayed. In the first quarter of 2024, the cyber-telecoms fraud prevention system effectively identified over 3 million fraudulent websites and blocked over 46 billion illegal visits, effectively reducing the incidence of fraud and protecting citizens' lives and property.
Conclusion
In this paper, an MDHES is proposed to estimate data quality. In this evaluation system, multiple crucial dimensions are designed to evaluate data conditions separately, which can provide a clearer understanding for improvement. Then, a comprehensive evaluation method, incorporating a fuzzy evaluation model, is developed to synthetically evaluate the data quality, achieving a dynamic balance while mitigating the impact of subjectivity on the comprehensive evaluation result. Finally, experimental results and comparisons, on an intrusion detection benchmark problem and a real intelligent identification application for cyber-telecoms fraud, demonstrate the effectiveness of the proposed MDHES, which achieves an accurate and thorough data quality assessment to provide strong data support for trustworthy AI. In addition, when there are many dimensions (more than 9), the workload of AHP scaling is too large, which can easily cause confusion in judgment. Therefore, in future research, on the one hand, we will focus on improving the comprehensive evaluation method to better assess the quality of data; on the other hand, more dimensions and sub-items for large models will be considered to improve our evaluation system, promoting the application of AI in more practical scenarios in a safe and efficient manner.
Abbreviations
AI Artificial intelligence
MDHES Multi-dimensional hierarchical evaluation system
FDA Fuzzy denoising autoencoder
BANG Batch adjusted network gradients
SPAM Scalable polynomial additive model
SE-BPER Semantic-enhanced bayesian personalized explanation ranking
AutoML Automated machine learning
FNN Fuzzy neural network
AHP Analytic hierarchy process
FNN-P Processed training data
FNN-R Raw training data
SVM +ELM+MK Support vector machine and extreme learning machine based on modified k-means
BP Back propagation
ILSTM Improved long-short term memory
ISSDA Improved sparse denoising autoencoder
Acknowledgements
Zhang Yusheng is acknowledged for his consulting assistance and silent company provided to the first author of this manuscript.
Author contributions
H.J. is the main designer of the proposed evaluation system and was also a major contributor in writing the manuscript.
C.C., P.R. and K.Y. coordinates with other parties to obtain and analyze the fraud data. Z.Y., J.C., Q.C. and J.K. are mainly
responsible for the web development of this multi-dimensional comprehensive evaluation system and real-time display
screen of cyber-telecoms fraud prevention. All authors read and approved the final manuscript.
Funding
This work was supported by Research and Standardization of Key Technologies for 6G General Computing and Intelligent Integration Grant R241149BC03, Research on 6G Trusted Endogenous Security Architecture and Key Technologies Grant R24113V7, and Research and Standardization of Key Technologies for 6G General Computing and Intelligent Integration Grant R241149B.
Data availability
The KDDCup99 dataset was derived from a simulated US Air Force LAN with attacks lasting 7 weeks and can be obtained at https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. The fraud dataset is owned by a third party. The data underlying this paper were provided by [third party] under licence/by permission. Data will be shared on request to the corresponding author with permission of [third party].
Declarations
Competing interests
The authors declare no competing interests.
References
1. Ahmed I, Jeon G, Piccialli F. From artificial intelligence to explainable artificial intelligence in industry 4.0: a survey on
what, how, and where. IEEE Trans Ind Inform. 2022;18(8):5031–42.
2. Putra MA, Ahmad T, Hostiadi DP. B-CAT: a model for detecting botnet attacks using deep attack behavior analysis on
network traffic flows. J Big Data. 2024;11(1):49.
3. Zhang HJ, He S, Chen J. A hierarchical authentication system for access equipment in internet of things. Int J Intell
Syst. 2023;1:1–11.
4. Furman J, Seamans R. AI and the economy. Innov Policy Econ. 2019;19(1):1–191.
5. Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng. 2018;2:719–31.
6. Khan A. Role of artificial intelligence in car-following and lane change models for autonomous driving. Adv Hum
Asp Transp. 2018;9:307–17.
7. Schetinin V, Li D, Maple C. An evolutionary-based approach to learning multiple decision models from underrepre-
sented data. In: Schetinin V, editor. 2008 Fourth international conference on natural computation, vol. 1. Jinan: IEEE;
2008.
8. Vavra P, Baar JV, Sanfey A. The neural basis of fairness. Interdiscip Perspect Fairness Equity Justice. 2017;5:9–31.
9. Liu HC, Wang YQ, Fan WQ, et al. Trustworthy AI: a computational perspective. ACM Trans Intell Syst Technol.
2022;14(1):1–59.
10. Chatila R, Dignum V, Fisher M, et al. Trustworthy AI. Reflect Artif Intell Hum. 2021;12600:13–39.
11. Malchiodi D, Raimondi D, Fumagalli G, et al. The role of classifiers and data complexity in learned Bloom filters:
insights and recommendations. J Big Data. 2024;11(45):1–26.
12. Han HG, Zhang HJ, Qiao JF. Robust deep neural network using fuzzy denoising autoencoder. Int J Fuzzy Syst.
2020;22(6):1356–75.
13. Rozsa A, Gunther M, Boult TE. Towards robust deep neural networks with BANG. In: IEEE winter conference on applications of computer vision (WACV); 2018.
14. Yampolskiy R. Unexplainability and Incomprehensibility of AI. J Artif Intell Conscious. 2020;7(2):1–15.
15. Guidotti R, Monreale A, Ruggieri S, et al. A survey of methods for explaining black box models. ACM Comput Surv.
2019;51(5):1–42.
16. Dubey A, Radenovic F, Mahajan D. Scalable interpretability via polynomials. Neural Inform Process Syst. 2022;1:1–26.
17. Meng ZL, Wang MH, Bai JJ, et al. Interpreting deep learning-based networking systems. IEEE Commun Surv Tutor.
2019;21(3):2702–33.
18. Nwafor O, Okafor E, Aboushady AA, et al. Explainable artificial intelligence for prediction of non-technical losses in
electricity distribution networks. IEEE Access. 2023;11:73104–15.
19. McClure P, Moraczewski D, Lam KC, et al. Improving the interpretability of fMRI decoding using deep neural net-
works and adversarial robustness. Apert Neuro. 2023;3:1–17.
20. Fernandes FE, Yen GG. Automatic searching and pruning of deep neural networks for medical imaging diagnostic.
IEEE Trans Neural Netw Learn Syst. 2021;32(12):5664–74.
21. Barreiro E, Munteanu CR, Monteagudo MC, et al. Net–net auto machine learning (AutoML) prediction of complex
ecosystems. Sci Rep. 2018;8(12340):2685–96.
22. Elliott A. What data scientists tell us about AI model training today. Alegion; 2019. p. 1–10.
23. Forrester Consulting. Overcome obstacles to get to AI at scale. IBM. 2020; 1–12.
24. Kortylewski A. Analyzing and reducing the damage of dataset bias to face recognition with synthetic data. In: IEEE
Conference on Computer Vision and Pattern Recognition Workshops. 2019; pp. 2261–2268.
25. Jackson A. The state of open data science 2020. Digital Science; 2020. p. 1–30.
26. Ng A. A chat with Andrew on MLOps: from model-centric to data-centric AI. 2022. https://cloud.google.com/
solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning.
27. Zhang D, Lai H. Data-centric artificial intelligence: a survey. 2023; 1–39.
28. Zhu SC. Making mathematical models for the humanities: Chinese thought from the perspective of artificial general
intelligence. J Mod Stud. 2024;3(1):42–66.
29. Liang WX, Tadesse GA, Ho D, et al. Advances, challenges and opportunities in creating data for trustworthy AI. Nat
Mach Intell. 2022;4:669–77.
30. Artamonov I, Deniskina A, Filatov V, et al. Quality management assurance using data integrity model. Matec Web
Conf. 2019. https://doi.org/10.1051/matecconf/201926507031.
31. Caballero I, Serrano M, Piattini M. A data quality in use model for big data. Adv Concept Model. 2014;8823:65.
32. Cai L, Zhu YY. The challenges of data quality and data quality assessment in the big data era. Data Sci J.
2015;14(2):78–92.
33. Hongxun T, Honggang W, Kun Z. Data quality assessment for on-line monitoring and measuring system of power
quality based on big data and data provenance theory. In: Hongxun T, editor. 2018 IEEE 3rd international conference
on cloud computing and big data analysis. Chengdu: IEEE; 2018. p. 248–52.
34. Cai L, Zhu YY. The challenges of data quality and data quality assessment in the big data era. Data Sci J.
2015;14:69–87.
35. Barchard KA, Verenikina Y. Improving data accuracy: selecting the best data checking technique. Comput Hum
Behav. 2013;29(5):1917–22.
36. Li ZT, Sun JB, Yang KW, Xiong DH. A review of adversarial robustness evaluation for image classification. J Comput
Res Dev. 2022;59(10):2164–89.
37. Khalfi B, de Runz C, Faiz S, Akdag H. A new methodology for storing consistent fuzzy geospatial data in big data
environment. IEEE Trans Big Data. 2021;7(2):468–82.
38. Wang S, Yao X. Relationships between diversity of classification ensembles and single-class performance measures.
IEEE Trans Knowl Data Eng. 2013;25(1):206–19.
39. Chae JH, Jeong YU, Kim S. Data-dependent selection of amplitude and phase equalization in a quarter-rate trans-
mitter for memory interfaces. IEEE Trans Circuits Syst. 2020;67(9):2972–83.
40. Yao W. Research on static software defect prediction algorithm based on big data technology. In: Yao W, editor. 2020
International conference on virtual reality and intelligent systems (ICVRIS). Zhangjiajie: IEEE; 2020. p. 610–3.
41. Kim KY, Park BG. Effect of random dopant fluctuation on data retention time distribution in DRAM. IEEE Trans Elec-
tron Devices. 2021;68(11):5572–7.
42. Widad E, Saida E, Gahi Y. Quality anomaly detection using predictive techniques: an extensive big data quality
framework for reliable data analysis. IEEE Access. 2023;11:103306–18.
43. Xia Q, Xu Z, Liang W, Yu S, et al. Efficient data placement and replication for QoS-aware approximate query evalua-
tion of big data analytics. IEEE Trans Parallel Distrib Syst. 2019;30(12):2677–91.
44. Lee D. Big data quality assurance through data traceability: a case study of the national standard reference data
program of Korea. IEEE Access. 2019;7:36294–9.
45. Ge Z, Liu Y. Analytic hierarchy process based fuzzy decision fusion system for model prioritization and process
monitoring application. IEEE Trans Industr Inf. 2019;15(1):357–65.
46. Antal E, Tillé Y. Simple random sampling with over-replacement. J Stat Plann Inference. 2011;141(1):597–601.
47. Al-Yaseen WL, Othman ZA, Nazri MZA. Multi-level hybrid support vector machine and extreme learning machine
based on modified K-means for intrusion detection system. Expert Syst Appl. 2017;67:296–303.
48. Zhang L, Yan H, Zhu Q. An improved LSTM network intrusion detection method. In: Zhang L, editor. 2020 IEEE 6th
international conference on computer and communications (ICCC). Chengdu: IEEE; 2020.
49. Guo XD, Li XM, Jing RX, et al. Intrusion detection based on improved sparse denoising autoencoder. J Comput Appl.
2019;39(3):769–73.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.