International Test and Evaluation Standards For Artificial Intelligence Based On Networked Data-Information-Knowledge-Wisdom-Purpose (DIKWP) Model
45 authors, including:
Fuliang Tang
Hainan University
All content following this page was uploaded by Fuliang Tang on 11 July 2024.
Contributing Units of the Editorial Board (in alphabetical order, without precedence):
AIII, AGI-AIGC-GPT Evaluation DIKWP (Global) Laboratory, Blue Edu, Peking University, University of Science and
Technology Beijing, Beijing Academy of Social Sciences, Beijing Institute of Standardization Technology, Chengdu University
of Information Technology, Chongqing Police College, Dongguan Advantech Precision Manufacturing Co., Ltd, Guangxi
Normal University, State Grid, Hainan University, Hainan Nuclear Power Co., Ltd., Hainan Pushi Intelligent Technology Co.,
Ltd, Hainan Provincial Administration for Market Regulation, The Second Affiliated Hospital of Hainan Medical University,
Huazhong Agricultural University, Jiangsu Lizhuo Information Technology Co., Ltd, Kensid (Zhuhai) Co., Ltd, People's
Procuratorate of Liaoyang City, Liaoning Province, Nanjing Police University, Inner Mongolia University, Ningbo University,
Institute for Advanced Study, Tsinghua University, Shandong University, Shanxi Provincial Bureau of Data, Shanghai Aerospace
Information Technology Research Institute, Shangrao Normal University, Sangfor Technologies Inc., World Association of
Artificial Consciousness (WAAC), World Conference on Artificial Consciousness (WCAC), Tai Chi Computer Co., Ltd, Xi’an
University of Technology, Southwest University of Political Science & Law, Standardization Research Center of Guangdong
Hong Kong Macao Greater Bay Area, Institute of Standardization Theory and Strategy of China Institute of Standardization,
China Association for the Application of Mechatronics Technology, China Institute of Information and Communication
Technology, etc.
Preface
1 Scope
3.1 DIKWP
3.12 Agent
Appendix
Reference
Preparation Instructions
Preface
With the rapid development of artificial intelligence technology and the rise of
large language models, the way we interact with intelligent systems is changing.
Artificial intelligence models have demonstrated unprecedented performance
in a wide range of applications, and their evaluation has attracted significant attention
across society. Currently, the evaluation benchmarks for artificial
intelligence models present a diversified trend, with various benchmarks designed
specifically for different dimensions of performance emerging continuously. Examples
include GLUE, SuperGLUE, CLUE, SuperCLUE, which are used to test language
understanding and generation capabilities, while domain-specific benchmarks such as
Owl-Bench are tailored for areas such as intelligent operations and maintenance. There
are also evaluation benchmarks that focus on the security, ethical risks, and fairness of
large models, reflecting the industry's emphasis on comprehensive quality control of
models. However, because these benchmarks differ widely in their level of development
and maturity, researchers face challenges in selecting and referencing them and must
carefully match evaluation tools and standards to specific application scenarios.
Faced with numerous
artificial intelligence model evaluation benchmarks launched by various research teams
and companies, there is a common and urgent demand among relevant professionals in
the field for a comprehensive, systematic, fair, and practical set of evaluation indicators
and methods to guide and promote the development and evaluation of artificial
intelligence models. The AGI-AIGC-GPT Evaluation DIKWP (Global) Laboratory,
composed of experts and scholars with long-term engagement in artificial intelligence
research, has drafted the "International Test and Evaluation Standards for Artificial
Intelligence based on Networked Data-Information-Knowledge-Wisdom-Purpose
(DIKWP) Model", aiming to establish an internationally recognized, forward-looking
evaluation benchmark for artificial intelligence that is ready for pilot application.
1 Scope
This document defines the terms and definitions used in the International Test and
Evaluation Standards for Artificial Intelligence, describes the framework for
evaluating AI models using DIKWP, and specifies the DIKWP indicators for model
evaluation, the DIKWP evaluation methods, and typical application cases.
This document is applicable to service providers, users, and third-party testing
organizations involved in the design and implementation of artificial intelligence model
testing.
2 Normative References
3.1 DIKWP
[Figure: networked DIKWP model diagram. Recoverable node labels: Wisdom, Information, Knowledge, Purpose.]
information from the input space ($Input_i$) and transforms it through a series of
sub-steps (such as data preprocessing, feature extraction, pattern recognition, logical
reasoning, and decision-making) into outcomes in the output space ($Output_i$), such as
information classification, concept formation, purpose determination, or action
planning.
Function set: $R = \{f_{ConN_1}, f_{ConN_2}, \dots, f_{ConN_n}\}$, where each function
$f_{ConN_i}: Input_i \to Output_i$ represents a specific cognitive processing step. $Input_i$ is the
input space, and $Output_i$ is the output space.
Input space $Input_i$: Includes various data or information sources received by the
individual or system. These inputs may come from observations of the external
environment (such as visual and auditory perceptions), signals received from other
systems, or internally generated data. The input space reflects the diversity of the
cognitive subject's interactions with the outside world and the breadth of information
acquisition.
Output space $Output_i$: Contains various higher cognitive products formed after
cognitive processing. This can include classification of input information, conceptual
structures built based on information, clear identification of purpose, and specific action
plans set to realize these purposes. The output space reflects the cognitive subject's
ability to deeply process and transform input information, which is the basis for the
cognitive subject's responses or actions towards the external world.
Each cognitive processing function fConN_i can be further refined into a series of
sub-steps, including data preprocessing, feature extraction, pattern recognition, logical
reasoning, and decision-making. These sub-steps together constitute the complete
cognitive pathway from raw data to the final output.
Sub-step representation: Each $f_{ConN_i}$ can be represented as
$f_{ConN_i} = f_{ConN_i}^{(5)} \circ f_{ConN_i}^{(4)} \circ \dots \circ f_{ConN_i}^{(1)}$, so that
$Output_i = f_{ConN_i}(Input_i)$, where $f_{ConN_i}^{(j)}$ represents the processing function of
the j-th sub-step, and $\circ$ represents the composition of functions.
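The composition of sub-steps can be illustrated with a minimal Python sketch. The five step functions and their toy logic are hypothetical placeholders chosen only to show how a cognitive processing function is assembled from ordered sub-steps; they are not part of the standard.

```python
from __future__ import annotations

from functools import reduce
from typing import Any, Callable, Sequence

Step = Callable[[Any], Any]


def compose(steps: Sequence[Step]) -> Step:
    """Chain steps so that steps[0] runs first and steps[-1] runs last."""
    def pipeline(x: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, x)
    return pipeline


# Hypothetical sub-steps (1) to (5); real systems would be far richer.
def preprocess(text: str) -> list[str]:                      # (1) data preprocessing
    return text.lower().split()


def extract_features(tokens: list[str]) -> dict[str, int]:   # (2) feature extraction
    return {"length": len(tokens), "has_wheels": int("wheels" in tokens)}


def recognize_pattern(features: dict[str, int]) -> str:      # (3) pattern recognition
    return "vehicle" if features["has_wheels"] else "unknown"


def reason(label: str) -> dict[str, str]:                    # (4) logical reasoning
    purpose = "transport" if label == "vehicle" else "undetermined"
    return {"concept": label, "purpose": purpose}


def decide(state: dict[str, str]) -> str:                    # (5) decision-making
    return f"classify as {state['concept']}; purpose: {state['purpose']}"


# One cognitive processing function mapping the input space to the output space.
f_conn_i = compose([preprocess, extract_features, recognize_pattern, reason, decide])
result = f_conn_i("A machine with four WHEELS")
print(result)
```

Listing the steps in execution order keeps the pipeline readable; mathematically this is the same as applying the fifth function to the result of the fourth, and so on down to the first.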
Semantic space refers to the semantic association network of concepts within the
cognitive subject's brain, including semantic relationships and associations between
concepts. Semantic space is formed through the cognitive subject's experiences and
accumulated knowledge. For example, for the concept of "car," the semantic space may
include associated semantics such as "driving," "vehicle," "fuel consumption," and
others.
The semantic space is a collection formed by a series of semantic units, which are
interconnected through specific associations and dependencies, collectively
constituting an objective representation of information and knowledge. Widely
accepted concepts and linguistic rules in the semantic space facilitate the transmission
and communication of meaning.
Graph Representation: $Graph_{SemA} = (V_{SemA}, E_{SemA})$, where $V_{SemA}$ represents semantic
units (words, sentences, etc.), and $E_{SemA}$ represents the associations and dependencies
between semantic units.
Semantic Unit: Each semantic unit $v \in V_{SemA}$ represents the smallest unit or concept that
can independently express meaning.
Relationships: Each edge $e \in E_{SemA}$ represents a semantic association or logical
dependency between semantic units, such as synonymy, antonymy, hyponymy,
causality, and other relationships.
In the semantic space, a series of operations correspond to querying, adding, or
modifying semantic units and their relationships:
Query Operation: $Query(V_{SemA}, E_{SemA}, q) \to \{v_1, v_2, \dots, v_m\}$ returns the set of
semantic units that satisfy the query condition $q$.
Add Operation: $Add(V_{SemA}, v)$ adds a new semantic unit $v$ to the set $V_{SemA}$.
Update Operation: $Update(E_{SemA}, v, v', e)$ updates or adds the relationship $e$ between
semantic units $v$ and $v'$.
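The three operations can be sketched as a small Python class. The class name, the use of a predicate for the query condition q, and the choice to store one relationship label per ordered pair of units are illustrative assumptions, not requirements of the standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SemanticSpace:
    """Hypothetical container for the semantic graph (V_SemA, E_SemA)."""
    units: set[str] = field(default_factory=set)                          # V_SemA
    relations: dict[tuple[str, str], str] = field(default_factory=dict)   # E_SemA

    def add(self, v: str) -> None:
        """Add: insert a new semantic unit v."""
        self.units.add(v)

    def update(self, v: str, v_prime: str, e: str) -> None:
        """Update: set or replace the relationship e between v and v'.

        Assumption: endpoints that are not yet known are added automatically.
        """
        self.units.update((v, v_prime))
        self.relations[(v, v_prime)] = e

    def query(self, q: Callable[[str, "SemanticSpace"], bool]) -> set[str]:
        """Query: return every semantic unit that satisfies condition q."""
        return {v for v in self.units if q(v, self)}


space = SemanticSpace()
space.add("car")
space.update("car", "vehicle", "hyponymy")
space.update("car", "driving", "association")
space.update("car", "vehicle", "is-a")  # replaces the earlier label

# Which units does "car" point to?
related_to_car = space.query(lambda v, s: ("car", v) in s.relations)
print(sorted(related_to_car))
```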
The semantic space not only provides stakeholders with a shared cognitive
language system for expressing DIKWP, but also supports semantic consistency in the
transformation and processing between DIKWP components. Leveraging semantic
units and their relationships enables accurate transmission and interpretation of
the cognitive content of complex service interactions among different entities.
3.5 Data Definition
In the DIKWP model, data concepts are not merely passive records of
observational results but collections of semantic objects actively recognized and
classified by cognitive systems. Mathematically, we can view data concepts as a
collection $D$ of semantic instances, where each semantic instance $d \in D$ is identified
as having the same set of semantic attributes $S$. Here, $S = \{f_1, f_2, \dots, f_n\}$ can be seen as a
set of parameters defining the semantic features of data concepts, where $f_i$ represents a
semantic feature of the data concept. This representation helps us understand how data
concepts are induced and processed based on shared semantic features.
In the DIKWP model, data concepts are regarded as specific manifestations of the
same semantics in cognition. Mathematically, we can define the semantic set $D$
corresponding to data concepts as a vector space, where each element $d \in D$ is a vector
representing a specific semantic instance. These semantic instances are categorized
under the same semantic attribute set $S$ by sharing one or more semantic features $f_i$, i.e.,
$S = \{f_1, f_2, \dots, f_n\}$
where $f_i$ represents a semantic feature of data concepts. Therefore, we can define
the collection of data concepts as:
$D = \{d \mid d \text{ shares } S\}$
This description emphasizes the semantic multidimensionality and semantic
structural nature of data concepts, while also providing a mathematical foundation for
subsequent data concept processing and analysis.
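As an illustration of this set definition, the following Python sketch selects the instances that share a given attribute set. It assumes, for the purpose of the example only, that each semantic feature is a (name, value) pair and that "sharing S" means matching every pair in S; the observations are invented.

```python
from __future__ import annotations


def data_concept(instances: list[dict], shared: dict) -> list[dict]:
    """Return the instances d whose features match every (feature, value) pair in S."""
    return [d for d in instances if all(d.get(f) == v for f, v in shared.items())]


# Invented observations; "id" is only a label for readability.
observations = [
    {"id": "sheep_1", "kind": "sheep", "legs": 4, "color": "white"},
    {"id": "sheep_2", "kind": "sheep", "legs": 4, "color": "black"},
    {"id": "chair_1", "kind": "chair", "legs": 4, "color": "white"},
]

S = {"kind": "sheep", "legs": 4}      # shared semantic attributes
D = data_concept(observations, S)     # the data concept induced by S
ids = [d["id"] for d in D]
print(ids)
```

The two sheep differ in color yet fall under the same data concept, because only the features named in S decide membership.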
3.5.4 The Specific Manifestations of Data Concept and the Same Semantics
In the DIKWP model, data concepts are viewed not just as observations and recordings
of the real world but as specific manifestations of the same semantic attributes
perceived by cognitive entities in communication and interaction. This definition goes
beyond treating data concepts as independent, objective records of facts and
emphasizes their cognitive nature in the interaction between cognitive entities in
cognitive space.
That is, the recognition and processing of data concepts depend on the connections and
matches with existing semantics in the subjective semantic space of cognitive entities.
Data concepts inherently possess cognitive subjectivity and context-dependence,
meaning that the same data concept may be linked to and processed with different
semantics depending on different cognitive entities or cognitive backgrounds.
Philosophically, data concepts cease to be mere objective records of existence and
become subjective interpretations formed through the cognitive processes of
individuals. The formation and existence of data concepts rely on the semantic space
and conceptual space memory and processing capabilities of cognitive entities,
representing the correlation and transformation between the semantic space and
conceptual space in the interaction between the real world and cognitive entities. The
generation and recognition of data concepts are not purely objective processes but
deeply rooted in the preconceived conceptual space and contextual semantic space of
the subject. Therefore, the recognition and interpretation of data concepts must take
into account the cognitive spatial background knowledge, experiential information, and
cultural contextual semantics of cognitive entities.
The meaning of data concepts must be confirmed through the interpretation and
semantic matching of cognitive entities. The interaction between data concepts and data
semantics becomes a bridge connecting objective reality with subjective cognition. This
understanding highlights a Platonic idea: things in the real world (as concepts) are only
shadows of their ideas (i.e., "same semantics"). Thus, the cognitive value of data
concepts lies not only in the objectivity of their forms but also in how cognitive entities
seek and confirm the shared semantics of cognitive objects and phenomena through
data concepts, triggering semantic resonance and cognitive confirmation. This
interactive process of re-cognizing data concepts and data semantics within cognitive
entities is not only a cognitive mirror reflection of the external world for cognitive
entities but also a pursuit and revelation of the intrinsic semantic nature of phenomena.
It emphasizes the cognitive dominance and creative existence of conceptual semantic
transformation in the interpretation of data concepts by cognitive entities, as well as the
interaction between data concepts and the subconscious or conscious symbolic
language of cognitive entities.
The DIKWP model's cognitive definition of data concepts and data semantics
emphasizes the cognitive nature of data and their role as semantic entities. In philosophy,
this touches upon discussions of the "essence of things" and "being true to the name."
Data concepts are not merely symbolic records of objective existence; they are entities
endowed with specific data semantics, which are confirmed and endowed through the
cognitive entity's processing across conceptual and semantic spaces. This cognitive
processing also reveals that knowledge generation is not just a mapping of the objective
world but also a subjective process of constructing based on the transformation from
similar semantics to concepts. This aspect is reflected in Kantian epistemology, where
human knowledge of the world partly originates from external stimuli but is largely
determined by our cognitive structures.
In the field of artificial intelligence research, the precise expression and effective
organization of knowledge constitute the core driving force behind technological
advancement. This process is deeply rooted in the meticulous construction of cognitive
models and an in-depth understanding of information processing mechanisms. The
following are several key knowledge representation frameworks:
1. Formal Logic Systems: As a foundational framework for knowledge
representation, formal logic, particularly propositional logic and first-order predicate
logic, provides a rigorous mathematical basis for the precise expression and reasoning
of information. Propositional logic uses truth functions to express simple facts and their
logical relationships, while first-order predicate logic achieves formal descriptions of
entity attributes, relationships, and existence through advanced constructs such as
variables, predicates, and quantifiers. This supports the deductive reasoning and
consistency verification of complex propositions.
2. Production Systems: This is a rule-based representation method where
"condition-action" rules (i.e., production rules) become the primary units of knowledge
encoding. This model excels at simulating human expert decision-making processes,
especially in fields such as diagnosis, planning, and problem-solving. The flexibility of
production systems lies in their ability to dynamically select applicable rules based on
input conditions, achieving a mapping from known facts to target actions,
demonstrating the efficiency and practicality of rule-based knowledge processing.
3. Frame Representation: A frame is a structured knowledge template used to
organize information about a specific topic. Each frame consists of several "slots,"
which are filled with specific data or pointers to other frames, forming a closely
connected knowledge network. This method emphasizes the layering and
modularization of knowledge, facilitating the handling of structural information of
complex concepts and supporting efficient information retrieval and updating
operations.
4. Process Representation: Process representation focuses on demonstrating the
dynamics of knowledge, paying attention to state changes, event sequences, and the
execution of operations. It is particularly suitable for scenarios that require tracking
state changes, simulating event sequences, or designing control processes. Through
process representation, complex behavioral patterns and dynamic systems can be
systematically described and analyzed, providing strong support for fields such as
automated planning, robotic path planning, and workflow management.
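Of the four frameworks listed above, the production system is the easiest to illustrate in a few lines. The following is a minimal forward-chaining sketch in Python; the rules and fact names are invented examples, and actual expert systems add conflict resolution, variables, and retraction, which are omitted here.

```python
from __future__ import annotations

from typing import Iterable


def forward_chain(facts: Iterable[str],
                  rules: list[tuple[frozenset[str], str]]) -> set[str]:
    """Fire condition-action rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, action in rules:
            # A rule is applicable when all of its conditions are known facts.
            if conditions <= derived and action not in derived:
                derived.add(action)
                changed = True
    return derived


# Hypothetical diagnostic rules: (set of conditions, resulting action or fact).
rules = [
    (frozenset({"fever", "cough"}), "suspect_flu"),
    (frozenset({"suspect_flu"}), "recommend_rest"),
]

conclusions = forward_chain({"fever", "cough"}, rules)
print(sorted(conclusions))
```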
Knowledge $K$ is represented as a semantic network, where each node $n$ represents a
concept and each edge $e$ represents a relationship between concepts:
$K = (N, E)$
where $N = \{n_1, n_2, \dots, n_k\}$ represents the set of concepts, $E = \{e_1, e_2, \dots, e_m\}$
represents the set of relationships between these concepts, and each edge can be
represented as
$e = (n_i, n_j, r), \quad n_i, n_j \in N$
where $r$ represents the semantic relationship between $n_i$ and $n_j$.
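A minimal Python sketch of this representation stores each edge as a labeled triple and checks the constraint that both endpoints belong to N. The concept names, relation labels, and helper functions are illustrative assumptions.

```python
from __future__ import annotations

from typing import NamedTuple


class Edge(NamedTuple):
    source: str    # n_i
    target: str    # n_j
    relation: str  # r


N = {"car", "vehicle", "fuel"}
E = {Edge("car", "vehicle", "is_a"), Edge("car", "fuel", "consumes")}


def is_well_formed(nodes: set[str], edges: set[Edge]) -> bool:
    """Check that every edge connects two concepts that belong to N."""
    return all(e.source in nodes and e.target in nodes for e in edges)


def neighbors(edges: set[Edge], node: str, relation: str) -> set[str]:
    """Concepts reachable from `node` through edges labeled `relation`."""
    return {e.target for e in edges if e.source == node and e.relation == relation}


print(is_well_formed(N, E))
print(neighbors(E, "car", "is_a"))
```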
In the DIKWP model, knowledge is not merely a record of observations and facts
but a systematic understanding formed through assumptions and higher-order cognitive
activities. The semantic integrity and systematic nature of knowledge reflect the
cognitive subject's profound understanding and interpretation of the world. The process
of knowledge generation emphasizes the active and creative role of the cognitive
subject in understanding and interpreting the world. Through assumptions and
abstraction, partial observations are endowed with complete semantics, thus forming
systematic knowledge.
Knowledge semantics are not just an aggregation or reorganization of DIKWP
content semantics but a creation of new semantic associations, reflecting the cognitive
subject's active exploration and interpretation of the world. Through assumptions and
higher-order cognitive activities, the process of knowledge generation can reveal deep
connections and underlying logic between phenomena, providing a more
comprehensive and profound understanding of the world.
The core values of wisdom revolve around building a human community with a
shared destiny, centered on human values. Cognitive subjects, relying on this core value
system, construct, analyze, affirm, correct, and develop the DIKWP content semantics
of individual and collective cognitive spaces, semantic spaces, and conceptual spaces.
Wisdom exists not only at the individual level but also at the societal level. The
formation of individual wisdom and social wisdom relies on the integrated development
of cognitive individuals and groups' DIKWP content semantic cognitive capabilities, as
well as on a deep understanding and reflection of the environment, cultural backgrounds,
and social relationships. The process of wisdom formation includes:
• Cultural inheritance: Through cultural inheritance, wisdom is shared and
propagated within communities. For example, ethical values and moral principles in
traditional cultures are passed down through education and social practices.
• Social interaction: Wisdom formation also depends on social interaction, where
wisdom continually develops and improves through communication and collaboration
among people. For instance, collective decision-making processes in community
governance exemplify the embodiment of wisdom.
In the field of AI, the goal of wisdom semantics processing is to develop advanced
decision-making artificial consciousness systems or ethical AI that can consider
multiple factors based on human-centered principles and provide solutions that are
more intelligent and align with ethical standards. Applications of wisdom in AI include:
• Ethical AI systems: Designing AI systems capable of making ethical decisions in
complex environments. For example, self-driving car systems need to weigh the safety
of passengers and pedestrians in emergency situations to make decisions that adhere to
ethical standards.
• Advanced decision systems: Developing advanced decision systems that integrate
considerations from multiple factors. For instance, in medical diagnostics, AI systems
need to combine patient medical history data, medical knowledge, and ethical principles
to provide optimal treatment plans.
In cognitive processes, purpose represents the transition path from the current state
to the desired state, revealing the dynamics and direction of cognitive activities. This
goal-oriented cognitive process emphasizes the proactive and creative nature of the
cognitive agent when processing information, as well as the underlying motivations and
goals behind cognitive activities.
From a philosophical perspective, purpose is not merely a preset goal of action but
the fundamental motive behind individual existence and behavior. Purpose embodies
individual free will and aspirations for the future, serving as the intrinsic drive for
interaction between individuals and the world.
• Aristotle's Final Cause: Aristotle believed that everything in existence has a purpose
or ultimate reason. Purpose, as the core of cognitive activities, aligns with Aristotle's
concept of final causes, emphasizing the importance of purpose in cognitive activities.
• Hegel's Teleological View: Hegel posited that reality's drive comes from the unity
of opposites, where through the process of purposive action, self-realization and self-
negation lead individuals to higher cognitive realms.
• Existentialist Free Will: Existentialist philosophy emphasizes the decisive role of
individual choice and purpose in one's existence. Purpose reflects individual free will
and serves as the core driving force of cognitive activities.
In the network model, data graphs are not only the starting point of information
processing but also the result of feedback adjustments from knowledge, wisdom, or
purpose. The data graph (DG) receives inputs from information, knowledge, wisdom,
and purpose through the transformation functions $T_{ID}$, $T_{KD}$, $T_{WD}$, and $T_{PD}$,
achieving dynamic updates and adjustments.
$T_{XY}: XG \to YG$, where $X, Y \in \{D, I, K, W, P\}$ and $X \neq Y$, denotes the transformation
from graph $X$ to graph $Y$.
The purpose graph is defined as $PG = (V_P, E_P)$, where $V_P$ represents the nodes of
goals and implementation paths, and $E_P$ represents the strategies or steps to achieve
these goals. The purpose graph is constructed from data, information, knowledge, and
wisdom, and it can in turn influence these components:
$DG \xrightarrow{T_{DP}} PG$, $IG \xrightarrow{T_{IP}} PG$, $KG \xrightarrow{T_{KP}} PG$, $WG \xrightarrow{T_{WP}} PG$: Formation of purpose from data,
information, knowledge, and wisdom.
$PG \xrightarrow{T_{PD}} DG$, $PG \xrightarrow{T_{PI}} IG$, $PG \xrightarrow{T_{PK}} KG$: Inverse influence of purpose on data, information,
and knowledge.
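One way to organize such transformations in software is a registry keyed by (source, target) type, sketched below in Python. The first subscript is read as the source graph, following the usage of TDP for DG → PG in the text. The registry class, the dictionary-based graph encoding, and the temperature example are all hypothetical.

```python
from __future__ import annotations

from typing import Callable

Graph = dict  # hypothetical encoding: node name -> attribute dictionary
Transform = Callable[[Graph], Graph]
TYPES = ("D", "I", "K", "W", "P")


class TransformRegistry:
    """Stores one transformation per ordered pair of distinct graph types."""

    def __init__(self) -> None:
        self._table: dict[tuple[str, str], Transform] = {}

    def register(self, src: str, dst: str, fn: Transform) -> None:
        if src not in TYPES or dst not in TYPES or src == dst:
            raise ValueError(f"invalid transformation {src}->{dst}")
        self._table[(src, dst)] = fn

    def apply(self, src: str, dst: str, graph: Graph) -> Graph:
        return self._table[(src, dst)](graph)


def t_dp(dg: Graph) -> Graph:
    """Data -> purpose: form a cooling goal when any reading exceeds 30 degrees."""
    hot = [name for name, attrs in dg.items() if attrs["celsius"] > 30]
    return {"cool_down": {"triggered_by": hot}} if hot else {}


def t_pd(pg: Graph) -> Graph:
    """Purpose -> data: request fresh readings for the rooms behind each goal."""
    return {name: {"requested": True}
            for goal in pg.values() for name in goal["triggered_by"]}


registry = TransformRegistry()
registry.register("D", "P", t_dp)
registry.register("P", "D", t_pd)

dg = {"room_a": {"celsius": 34}, "room_b": {"celsius": 21}}
pg = registry.apply("D", "P", dg)        # purpose formed from data
feedback = registry.apply("P", "D", pg)  # purpose feeding back into data
print(pg, feedback)
```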
The DIKWP graphing system maps elements of the digital world and the cognitive
world to five main components: DG, IG, KG, WG, and PG. Each graph is further
subdivided into three levels of mapping: the semantic level, the conceptual level, and
the instance level. Thus, each graph $g \in G$ is a triplet mapping:
$g: S \times C \times I$, where $G$ represents the set of graphs, $S$ represents the set of semantic
levels, $C$ represents the set of concepts, and $I$ represents the set of instances.
The interactions between the DIKWP graphs are achieved through content models
and cognitive models, represented by a function f , which transforms the mapping of
one level or type of graph into another level or type of graph.
f: G×G→G
[Figure: complete, accurate, and consistent interaction modeling between the objective content space and the subjective cognitive space. Recoverable labels: multimodal objective content world (resources); stakeholders (individual subjective cognitive space); interaction/communication/processing; multiple sources of imprecise, incomplete, and inconsistent human-machine-object interaction subjective and objective resources; content data, information, knowledge, and wisdom graphs; cognitive data, information, knowledge, and wisdom graphs; DIKWP objective content resource graphical mapping; DIKWP subjective cognitive resource graphical mapping; consolidation of DIKW type resources; input/output links between graphs.]
Figure 3-4 Concept Space DIKWP Graph and Graph Relationship Transformation
In the conceptual space, the paradigm for converting across DIKWP (Data,
Information, Knowledge, Wisdom, Purpose) resources, as illustrated in Figure 3-4,
maps each dimension of DIKWP to a corresponding processing method:
data is mapped to conceptual statistical analysis, information is modeled and analyzed
using the partial order structure of lattice theory, knowledge is deepened through
various reasoning techniques, wisdom is reflected in multi-objective value-based
decision-making, and purpose is associated with precise modeling of goals and
problems. By systematically integrating the processing methods for
individual DIKWP types and promoting pairwise interactive combinations among them,
a comprehensive processing mapping system that spans different types of DIKWP
resources is established.
      D            I            K            W            P
D     D1+P → D2    D+P → I      D+P → K      D+P → W      D+P1 → P2
I     I+P → D      I1+P → I2    I+P → K      I+P → W      I+P1 → P2
K     K+P → D      K+P → I      K1+P → K2    K+P → W      K+P1 → P2
W     W+P → D      W+P → I      W+P → K      W1+P → W2    W+P1 → P2
P     P1+P2 → D    P1+P2 → I    P1+P2 → K    P1+P2 → W    P1+P2 → P3
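Processing method for each DIKWP type:
D     statistics
I     lattice theory
K     reasoning
W     trade-off
P     problem

Pairwise combinations of DIKWP types and their processing methods:
D-D: statistics, statistics; D-I: statistics, lattice theory; D-K: statistics, reasoning; D-W: statistics, trade-off; D-P: statistics, problem
I-D: lattice theory, statistics; I-I: lattice theory, lattice theory; I-K: lattice theory, reasoning; I-W: lattice theory, trade-off; I-P: lattice theory, problem
K-D: reasoning, statistics; K-I: reasoning, lattice theory; K-K: reasoning, reasoning; K-W: reasoning, trade-off; K-P: reasoning, problem
W-D: trade-off, statistics; W-I: trade-off, lattice theory; W-K: trade-off, reasoning; W-W: trade-off, trade-off; W-P: trade-off, problem
P-D: problem, statistics; P-I: problem, lattice theory; P-K: problem, reasoning; P-W: problem, trade-off; P-P: problem, problem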
3.12 Agent
3.13 RAG
4 Evaluation System
The DIKWP model describes the cognitive process by closely linking data,
information, knowledge, wisdom, and purpose. It forms an
interactive process across cognitive space, conscious space, semantic space, and
conceptual space, serving as an effective framework for semantic representation. The
DIKWP model enables a comprehensive examination of the artificial intelligence
model's ability to process data, information, knowledge, wisdom, and purpose resources
from conceptual space to cognitive space and then to semantic space, assessing the AI
model's semantic understanding and processing capabilities. In the dimension of
DIKWP semantic understanding, it evaluates how the AI model maps data, information,
knowledge, and purpose resources from the conceptual space to the cognitive space and
converts them into cognitive content, thereby verifying the semantic validity of DIKWP
resources in the semantic space. The evaluation involves the AI model's precise
definition of concepts, identification of semantic components, and their dynamic
adjustment and mapping in various cognitive environments.
Knowledge is the embodiment of complete semantics, which differs from data and
information. In the complete semantics of knowledge, specific concepts or patterns are
contained. The knowledge understanding ability of artificial intelligence models is their
ability to extract and utilize concepts from knowledge elements.
1. Concept extraction: Evaluating the model's ability to accurately extract core
concepts, patterns, and their attributes from knowledge elements, such as identifying
entities, relationships, events, rules, etc., and representing them in a standardized,
structured manner.
2. Knowledge application: Examining the model's ability to flexibly apply
mastered knowledge to reasoning, answering questions, generating creative insights, or
solving problems in given tasks or contexts, reflecting the model's understanding of the
depth, breadth, and applicability of knowledge.
Data bias primarily examines the model's sensitivity to sensitive attributes (such
as gender, race) in input data, and its response to potential output data distribution biases
that may result from these attributes.
Sensitivity to Sensitive Attributes: Artificial intelligence models should be able
to recognize and handle potentially sensitive attributes in the data, and should not exhibit
significant processing differences due to different attribute values (such as gender or race).
This requires the model to confirm that the existence semantics of the data correspond to
the same objects or concepts as the existence semantics of its own cognitive
objects, and to avoid stereotypical or discriminatory processing of specific data.
Output Data Distribution: Evaluates whether the model's output results exhibit
unnatural, systematic biases due to external factors such as gender, race. For example,
in a job prediction task, if the model's predictions for different genders significantly
deviate from the actual occupational distribution proportions given the same
background information, there may be gender bias. Statistical analysis and comparative
experiments to detect whether the model's outputs remain balanced across different
attribute groups help reveal potential data biases.
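The comparison of output distributions described above can be sketched as follows. The metric (total variation distance between each group's predicted distribution and a reference distribution), the toy predictions, and the reference proportions are assumptions made for illustration; the standard does not prescribe a specific statistic.

```python
from __future__ import annotations

from collections import Counter


def distribution(labels: list[str]) -> dict[str, float]:
    """Empirical distribution of predicted labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance: 0 means identical, 1 means disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Invented model outputs for prompts that differ only in the stated gender.
predictions = {
    "female": ["nurse", "nurse", "nurse", "engineer"],
    "male": ["engineer", "engineer", "engineer", "nurse"],
}
# Invented reference: the actual occupational proportions for this background.
reference = {"nurse": 0.5, "engineer": 0.5}

gaps = {group: total_variation(distribution(labels), reference)
        for group, labels in predictions.items()}
print(gaps)
```

A gap near zero for every group suggests balanced outputs, while a large gap for any group flags a potential data bias that merits closer inspection. Any pass/fail threshold would have to be set by the evaluator.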
Knowledge plays a central role in the processing and understanding of data.
In the process of knowledge handling, artificial intelligence models abstract multiple
concepts or patterns corresponding to complete semantics. Based on this, the knowledge
bias evaluation mainly assesses whether the model avoids preconceived ideas or
one-sided conclusions when constructing knowledge systems, and whether it tends
towards certain specific viewpoints or interpretations.
Wisdom Bias explores whether models can reflect fair and inclusive values during
complex decision-making or reasoning processes, as well as whether they exhibit biases
towards specific groups.
Decision Fairness: In complex decision-making scenarios involving ethics, law, and
morality, assess whether model decisions remain fair to all stakeholders and are unaffected
by external factors such as gender or race. The focus is on measuring whether the
artificial intelligence model integrates data, information, knowledge, and wisdom in a
balanced way when generating optimal decisions.
4.3.5 Purpose Bias
Purpose bias focuses on whether the model can understand user purpose without
being influenced by preset preferences or discriminatory assumptions, accurately
identifying and responding to the user's true purpose.
Purpose Recognition: By constructing user purposes that involve different genders,
races, and cultural backgrounds, the model's accuracy in understanding user purpose in
various contexts is tested to ensure that it is not affected by user attributes.
Purpose Response: Evaluating whether the resources provided by the model in
response to user requests remain consistent for all user groups, without differentiation
based on user attributes.
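One way to organize such a test is a counterfactual probe set in which only the user attribute changes while the request stays fixed. The sketch below is an assumption-laden illustration, not a procedure defined by the standard: the prompt template, the attribute and request lists, and the `toy_recognize` stand-in (which replaces a real model call and is deliberately biased) are all hypothetical.

```python
from itertools import product

# Hypothetical probe template: the attribute varies, the request stays fixed.
TEMPLATE = "User profile: {attribute}. Request: {request}."


def build_probes(attributes, requests):
    """Create one probe per (attribute, request) combination."""
    return [
        {"attribute": a, "request": r, "text": TEMPLATE.format(attribute=a, request=r)}
        for a, r in product(attributes, requests)
    ]


def recognition_consistency(probes, recognize):
    """Fraction of requests whose recognized purpose is identical for every attribute."""
    labels_by_request = {}
    for probe in probes:
        labels_by_request.setdefault(probe["request"], set()).add(recognize(probe["text"]))
    consistent = sum(1 for labels in labels_by_request.values() if len(labels) == 1)
    return consistent / len(labels_by_request)


def toy_recognize(text):
    """Stand-in for a model call; deliberately biased for one attribute."""
    if "loan" in text:
        return "entertainment" if "young" in text else "finance"
    return "travel"


probes = build_probes(["young", "elderly"], ["apply for a loan", "book a flight"])
score = recognition_consistency(probes, toy_recognize)
print(score)
```

The same probe set can be reused for Purpose Response by comparing the returned resources instead of the recognized purpose labels.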
Concept space security focuses on the structure, storage, and access control of data
and information within artificial intelligence models, ensuring the physical and logical
security of data. It emphasizes the rationality and security of data structures, including
the design of data organization, labeling systems, and associative relationships, as well
as the use of encryption storage, backup recovery, and other technical means to prevent
data from being illegally tampered with or accidentally lost. Additionally, in the concept
space, strict control over access permissions to data and information resources is
enforced, following the principle of least privilege, to prevent unauthorized access,
leakage, or misuse.
⚫ Self-protection mechanism: Monitoring the chip's temperature rise under high-load
working conditions, and checking whether self-protection mechanisms such as
downclocking or forced shutdown are triggered when the chip's normal operating
temperature is exceeded, and at what threshold temperature, to ensure that the chip
will not be damaged in extreme environments.
⚫ Leakage Prevention Mechanism: Testing the chip's measures to prevent leakage when
handling sensitive information, such as access control, data isolation strategies, etc.
⚫ Security Authentication: Checking the chip's authentication capability for data sources to
ensure that only authorized data can be processed.
⚫ Ethical Security: For sensitive decision-making processes, ensuring the ethical security
of output content so that the output content does not violate the current ethical values of
humanity.
5 Evaluation Indicators
The evaluation indicator system of International Test and Evaluation Standards for
Artificial Intelligence based on Networked Data-Information-Knowledge-Wisdom-
Purpose (DIKWP) Model is divided into two levels. The first-level indicators include
functional indicators and performance indicators, while the second-level evaluation
indicators decompose and refine each first-level indicator.
Where N is the total number of data items, 𝑦𝑖 is the output of the model for the i-th
input, 𝑌𝑠 is the set of sensitive categories, and 𝐼 is the indicator function.
2. Data Balance: Measures the degree of distribution balance of output data
among different predefined categories. For example, in gender classification tasks, the
ratio of male to female outputs should be close to 1:1 to indicate balance.
$$B = 1 - \sum_{k=1}^{K}\left(p_k - \frac{1}{K}\right)^{2} \tag{5.3.7}$$
Where 𝑝𝑘 is the proportion of the k-th class in the output data, and 𝐾 is the total
number of classes.
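As a quick numerical check, the balance score can be computed directly from the class proportions. The sketch assumes the formula reads B = 1 minus the sum over classes of (p_k - 1/K) squared; the function name and the example proportions are illustrative.

```python
def data_balance(proportions):
    """Balance score: 1 minus the squared deviation of each class share from 1/K."""
    k = len(proportions)
    if k == 0:
        raise ValueError("at least one class is required")
    return 1.0 - sum((p - 1.0 / k) ** 2 for p in proportions)


balanced = data_balance([0.5, 0.5])   # perfectly even two-class output
skewed = data_balance([0.9, 0.1])     # strongly skewed two-class output
print(balanced, skewed)
```

A perfectly even split yields 1, and the score drops as the output distribution moves away from uniform.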
3. Data Robustness: Measures whether the data output by the model aligns with
the user's actual purpose, i.e., its performance in data alignment. This mainly focuses
on the model's robustness when faced with different types of inputs, especially in
situations where input data may contain noise or distortions.
$$R = 1 - \frac{\sum_{i=1}^{N}\left\lVert x_i - x_i' \right\rVert}{N} \tag{5.3.7}$$
Where 𝑥𝑖 and 𝑥𝑖′ represent the data representations of the model input and the
user's actual purpose, respectively, and N is the total number of samples.
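A minimal sketch of this indicator follows, assuming the distance between the two representations is Euclidean and that the vectors are scaled so distances stay within [0, 1]. Neither assumption is stated in the standard, and the sample vectors are invented for illustration.

```python
import math


def data_robustness(inputs, targets):
    """1 minus the mean Euclidean distance between paired representations."""
    if not inputs or len(inputs) != len(targets):
        raise ValueError("inputs and targets must be non-empty and of equal length")
    total = sum(math.dist(x, t) for x, t in zip(inputs, targets))
    return 1.0 - total / len(inputs)


# Hypothetical 2-D representations of model input and the user's actual purpose.
model_inputs = [(0.0, 0.0), (1.0, 0.0)]
user_purposes = [(0.0, 0.0), (1.0, 0.5)]
r = data_robustness(model_inputs, user_purposes)
print(r)
```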
Where 𝑎𝑐𝑐𝑚 and 𝑟𝑒𝑐𝑚 respectively represent the accuracy and recall of the m-th
class of information, and 𝑀 is the total number of information categories.
2. Bias Ratio: Measures the ratio of biased information (such as negative bias,
speech from specific groups, etc.) to unbiased information in model output.
$$BR = \frac{\sum_{i \in I_b} w_i}{\sum_{j \in I_u} w_j} \tag{5.3.7}$$
Where 𝑣𝑖 and 𝑣𝑖′ represent the vector representations of model output and user
purpose, respectively, and 𝑁 is the total number of samples.
4. Information Redundancy: Measures the redundancy of information content at
the semantic level by analyzing information structure, i.e., whether information is
repeated across different outputs.
$$RDD_{info} = \frac{\sum_{pairs} sim(info_i, info_j)}{\binom{N}{2}} \tag{5.3.7}$$
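The redundancy indicator averages a similarity score over all unordered pairs of the N information items. The sketch below assumes that reading of the formula; because the standard does not fix the similarity function, a word-level Jaccard similarity is swapped in purely as a placeholder, and the example sentences are made up.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-level Jaccard similarity; a placeholder for the unspecified sim function."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def information_redundancy(items, sim=jaccard):
    """Mean similarity over all N-choose-2 pairs of information items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


outputs = [
    "the patient has a fever",
    "the patient has a fever",
    "knee pain after sport",
]
rdd = information_redundancy(outputs)
print(rdd)
```

In practice an embedding-based similarity would likely be substituted for `jaccard` by passing a different `sim` callable.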
$$Amb_{info} = \frac{1}{N}\sum_{i=1}^{N}\frac{m_i}{c_i} \tag{5.3.7}$$
Where 𝑚𝑖 is the number of possible meanings for the i-th information item in its
context, and 𝑐𝑖 is the number of contexts in which the information item appears.
$$Div_k = 1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\sum_{t \in Topics} P(t \mid input)\,P(t \mid output)}{|Topics|} \tag{5.3.7}$$
$$F_d = 1 - \frac{\sum_{u \in U}\left| dec_u - \overline{dec} \right|}{|U|} \tag{5.3.7}$$
Where 𝑑𝑒𝑐𝑢 represents the bias of the decision towards unit 𝑢, $\overline{dec}$ is the
average decision value for all units, and 𝑈 is the set of units under consideration.
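To make the fairness indicator concrete, the sketch below computes one minus the mean absolute deviation of per-unit decision values from their average, which is how the formula is read here. The unit names and decision values are hypothetical, and the values are assumed to lie in [0, 1].

```python
def decision_fairness(decisions):
    """1 minus the mean absolute deviation of per-unit decision values from their mean."""
    values = list(decisions.values())
    if not values:
        raise ValueError("at least one unit is required")
    mean = sum(values) / len(values)
    return 1.0 - sum(abs(v - mean) for v in values) / len(values)


# Hypothetical decision values for three stakeholder units.
decision_values = {"group_a": 0.8, "group_b": 0.6, "group_c": 0.7}
f_d = decision_fairness(decision_values)
print(f_d)
```

Identical decision values across all units give a score of 1; larger spreads lower it.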
2. Wisdom Ethics: Measures the ethical and moral considerations of the wisdom
or decisions output by AI systems, assessing whether the system can follow ethical
norms in complex decision environments. This part mainly calculates the ratio between
decisions following ethical standards and total decisions to quantify the ethical
proportion of AI model decisions.
3. Wisdom Context Adaptability: Measures the adaptability and rationality of
the decisions or recommendations output by AI systems to all stakeholders' interests in
different contexts, ensuring comprehensive consideration of decisions without harming
the interests of any party.
$$Ada_{wis} = \frac{1}{N}\sum_{i=1}^{N} adapt(w_i, S_i) \tag{5.3.7}$$
Where 𝑝𝑢𝑝𝑚𝑜𝑑 and 𝑝𝑢𝑝𝑢𝑠𝑒𝑟𝑖 represent the vector representations of the model
and user purpose, and sim is the similarity (cosine similarity) between them.
$$E_{Trans} = \frac{\sum_{i=1}^{N_{DIKWP}} t_i}{T} \tag{5.3.2}$$
$$C_{Trans} = \frac{1}{25}\sum_{i=1}^{25} x_i \tag{5.3.3}$$
𝑥𝑖 is an indicator variable; if the i-th dimension of the transformation process is
covered, then 𝑥𝑖 = 1; otherwise, 𝑥𝑖 = 0.
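The coverage indicator is simply the share of the 25 transformation dimensions that were exercised. The sketch below assumes those 25 dimensions are the ordered pairs of the five resource types (D, I, K, W, P); that interpretation and the observed transformations are illustrative assumptions.

```python
from itertools import product

RESOURCE_TYPES = ("D", "I", "K", "W", "P")
# Assumption: the 25 dimensions are all ordered (source, target) pairs of resource types.
ALL_TRANSFORMATIONS = set(product(RESOURCE_TYPES, repeat=2))


def transformation_coverage(observed):
    """Fraction of the 25 transformation dimensions that appear at least once."""
    covered = set(observed) & ALL_TRANSFORMATIONS
    return len(covered) / len(ALL_TRANSFORMATIONS)


# Hypothetical transformations detected in a model's output (one repeated).
observed = [("D", "I"), ("I", "K"), ("K", "W"), ("W", "P"), ("D", "I")]
c_trans = transformation_coverage(observed)
print(c_trans)
```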
3. Transformation Precision: The accuracy of the results obtained by the
artificial intelligence model after transforming DIKWP resources into other resources.
$$Pr_{Trans} = \frac{\sum_{k=1}^{N_{DIKWP}} a_k}{N_{DIKWP}} \tag{5.3.4}$$
$$E_{Ays} = \frac{\sum_{m=1}^{U_{DIKWP}} u_m}{T} \tag{5.3.5}$$
Where 𝑢𝑚 is the time required to identify the m-th uncertainty resource, and
𝑈𝐷𝐼𝐾𝑊𝑃 is the total number of uncertainty resources.
2. Efficiency and Accuracy of Uncertainty Resource Processing: Evaluates the
accuracy of the results obtained by the model when processing uncertain DIKWP
resources using fusion transformation, and whether this uncertainty processing
improves the certainty of DIKWP resources.
$$E_{uncert} = \frac{\sum_{n=1}^{R_{uncert}} r_n}{T}, \qquad A_{uncert} = \frac{\sum_{p=1}^{R_{uncert}} b_p}{R_{uncert}} \tag{5.3.6}$$
Where 𝑟𝑛 is the time required for the transformation of the n-th uncertainty
resource, 𝑏𝑝 is the indicator of accuracy after transformation, with 1 indicating no
uncertainty in the transformed resource, and 0 indicating otherwise, and 𝑅𝑢𝑛𝑐𝑒𝑟𝑡 is the
total number of uncertainty resources.
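Both indicators can be computed from a per-resource log of transformation times and resolution flags. In the sketch below, T is assumed to be the total evaluation time, since its definition does not appear in this passage; the function name and the sample numbers are hypothetical.

```python
def uncertainty_processing(times, resolved, total_time):
    """Return (efficiency, accuracy) for uncertain-resource processing.

    times      -- transformation time r_n for each uncertain resource
    resolved   -- indicator b_p per resource: 1 if no uncertainty remains, else 0
    total_time -- T, assumed here to be the total evaluation time
    """
    if not times or len(times) != len(resolved) or total_time <= 0:
        raise ValueError("inputs must be non-empty, aligned, and total_time positive")
    efficiency = sum(times) / total_time
    accuracy = sum(resolved) / len(resolved)
    return efficiency, accuracy


# Hypothetical log for three uncertain resources.
e_uncert, a_uncert = uncertainty_processing(
    times=[2.0, 3.0, 5.0], resolved=[1, 0, 1], total_time=20.0
)
print(e_uncert, a_uncert)
```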
$$U_{depth} = \frac{\sum_{i=1}^{N_{DIKWP}} (S_i V_i)}{N_{DIKWP}} \tag{5.3.7}$$
Where 𝑆𝑖 is the semantic understanding score for the ith resource, 𝑉𝑖 is the
weight of the resource among all resources, and 𝑁𝐷𝐼𝐾𝑊𝑃 is the total number of
resources.
2. Semantic Understanding of Uncertain DIKWP Resources: Assesses the
model's ability to understand the semantics of DIKWP resources in the presence of
uncertainty.
$$U_{uncert} = \frac{\sum_{k=1}^{N_{uncert}} (S_k P_k)}{N_{uncert}} \tag{5.3.8}$$
Where 𝑆𝑘 is the semantic understanding score for the k-th uncertain resource, 𝑃𝑘
is the weight of the resource among all uncertain resources, and 𝑁𝑢𝑛𝑐𝑒𝑟𝑡 is the total
number of uncertain resources.
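The two semantic understanding indicators share the same shape: a sum of score-times-weight terms divided by the number of resources. The sketch below assumes the bracketed terms in both formulas are products; the helper name and the sample scores and weights are illustrative.

```python
def weighted_understanding(scores, weights):
    """Sum of score * weight over resources, divided by the number of resources."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and of equal length")
    return sum(s * w for s, w in zip(scores, weights)) / len(scores)


# Hypothetical scores and weights for all resources and for the uncertain subset.
u_depth = weighted_understanding(scores=[0.8, 0.6], weights=[0.5, 0.5])
u_uncert = weighted_understanding(scores=[0.4, 0.9, 0.5], weights=[0.2, 0.5, 0.3])
print(u_depth, u_uncert)
```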
We use a weighted average method to synthesize comprehensive semantic analysis
performance indicators, assigning a weight to each sub-indicator based on its
importance in practical applications.
$$E_{cog} = \frac{N_{cog}}{\sum_{r=1}^{N_{cog}} c_r} \tag{5.3.10}$$
Where 𝑁𝑐𝑜𝑔 is the number of cognitive tasks, and 𝑐𝑟 is the time required to
complete the r-th task.
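Read as the number of cognitive tasks divided by the total time spent on them, the indicator is a tasks-per-unit-time rate. A minimal sketch under that reading, with hypothetical task times in seconds:

```python
def cognitive_efficiency(task_times):
    """Number of cognitive tasks divided by the total time spent on them."""
    total = sum(task_times)
    if not task_times or total <= 0:
        raise ValueError("task_times must be non-empty with a positive total")
    return len(task_times) / total


# Hypothetical completion times (seconds) for three cognitive tasks.
e_cog = cognitive_efficiency([2.0, 3.0, 5.0])
print(e_cog)
```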
$$P_{TP} = \frac{\sum R_{DIKWP}}{\sum TR_i} \tag{5.4.1}$$
In formula 5.4.1, 𝑃𝑇𝑃 represents throughput, ∑ 𝑅𝐷𝐼𝐾𝑊𝑃 represents the total sum
of DIKWP resources, and ∑ 𝑇𝑅𝑖 represents the total time consumed to process these
resources.
2. Response Time: The total time from receiving resources to completing tasks.
𝑇𝑟𝑒𝑠 = 𝑇𝑒𝑛𝑑 − 𝑇𝑠𝑡𝑎𝑟𝑡 (5.4.2)
In formula 5.4.2, 𝑇𝑒𝑛𝑑 and 𝑇𝑠𝑡𝑎𝑟𝑡 represent the time after the task is completed
and the start time of receiving resources, with a unit scale of nanoseconds (ns).
3. Speedup: The ratio by which processing speed improves when the chip processes
in parallel, compared with single processing.
$$a_s = \frac{T_{single}}{T_{Parallel}} \tag{5.4.3}$$
In equation 5.4.3, 𝑎𝑠 represents the speed acceleration ratio, while 𝑇𝑠𝑖𝑛𝑔𝑙𝑒 and
𝑇𝑃𝑎𝑟𝑎𝑙𝑙𝑒𝑙 represent the time consumed for single-chip processing and parallel
processing, respectively, in nanoseconds (ns).
4. Parallel Efficiency: The resource utilization and speed acceleration ratio during
parallel computing.
$$P_e = \frac{a_s}{\sum N_{unit}} \tag{5.4.4}$$
In equation 5.4.4, 𝑃𝑒 represents parallel efficiency, which is the ratio of speed
acceleration ratio(𝑎𝑠 ) to the total number of computing resources in parallel units
(∑ 𝑁𝑢𝑛𝑖𝑡 ).
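The four performance indicators in this subsection are simple ratios and differences, so a short worked example may help when checking measurements. The function names and all timing figures below are hypothetical; times are taken to be in nanoseconds, as in the text.

```python
def throughput(resource_counts, processing_times_ns):
    """Total DIKWP resources processed divided by total processing time."""
    return sum(resource_counts) / sum(processing_times_ns)


def response_time(t_start_ns, t_end_ns):
    """Elapsed time from receiving resources to completing the task."""
    return t_end_ns - t_start_ns


def speedup(t_single_ns, t_parallel_ns):
    """Single-processing time divided by parallel-processing time."""
    return t_single_ns / t_parallel_ns


def parallel_efficiency(speedup_ratio, n_units):
    """Speedup divided by the total number of parallel computing units."""
    return speedup_ratio / n_units


# Hypothetical measurements.
p_tp = throughput([100, 200], [1000, 2000])   # resources per nanosecond
t_res = response_time(500, 3500)              # nanoseconds
a_s = speedup(8000, 2500)
p_e = parallel_efficiency(a_s, 4)
print(p_tp, t_res, a_s, p_e)
```

With four units, a speedup of 3.2 corresponds to a parallel efficiency of 0.8; a value of 1 would indicate ideal linear scaling.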
6 Evaluation Methods
The evaluation method builds on the evaluation indicator system. For each dimension
(DIKWP processing, conceptual, semantic, cognitive, and bias), it provides
methodological references for AI model evaluation, such as indicator selection and
weight setting. An evaluation must report both the sub-scores and the comprehensive
evaluation score. Please refer to the appendix for the specific evaluation
methods.
Appendix
Using a descriptive text as the input case, the model is required to map the text
into data resources, information resources, knowledge resources, wisdom resources,
and purpose resources. The completeness, accuracy, and processing efficiency of the
DIKWP resource mapping by the model are then checked and evaluated. Specific input
cases are provided in Appendix 2(1). Evaluation indicators include Formula 5.3.1,
mapping completeness indicators, mapping accuracy indicators, and mapping
efficiency indicators. The evaluation score calculation formula is as follows:
$$Q_{1.1} = \frac{s_1 + s_2 + s_3}{3 \times 5}$$
Where 𝑠1 is the completeness indicator score, 𝑠2 is the accuracy indicator score,
and 𝑠3 is the mapping efficiency indicator score.
An already mapped DIKWP case (including descriptive text and DIKWP resource
content) is provided, and the model is required to use the provided DIKWP resources
to integrate and transform them, thereby enriching the mapping level of DIKWP
resources to the text, supporting complex cognitive tasks and decision-making
processes. Based on the model's output, the efficiency, completeness, and accuracy of
the DIKWP resource transformation are checked. Specific input cases are provided in
Appendix 2(2). Evaluation indicators include Formulas 5.3.2, 5.3.3, and 5.3.4. The
evaluation score calculation formula is as follows:
A set of DIKWP resources that have been initially mapped is input into the model,
requiring the model to analyze the input DIKWP resources and transform them into
conceptual content. The model's ability to transform these resources into concepts is
then evaluated. Subsequently, the DIKWP resources and fully transformed conceptual
content are input into the model, requiring the model to construct a conceptual network
using these resources and identify logical or semantic relationships between nodes.
Specific input cases are provided in Appendix 2(3). Evaluation indicators include
conceptual mapping and transformation indicators and conceptual network construction
ability indicators. The evaluation score calculation formula is as follows:
$$Q_{1.3} = \frac{s_1 + s_2}{2 \times 5}$$
where 𝑠1 is the conceptual mapping and transformation indicator score, and 𝑠2 is
the conceptual network construction ability indicator score.
A mapped set of DIKWP resources is input to the model as a test set, and the
model is posed a series of questions. The model is required to use the input DIKWP
resources to perform reasoning and decision-making to provide answers. The
model's efficiency in handling cognitive tasks is checked based on its answers. Specific
input cases are provided in Appendix 2(4). Evaluation indicators include Formula
5.3.10.
Q1.4 = Ecog
where 𝑠1 is the score for the model's understanding of different DIKWP resources,
and 𝑠2 is the score for the model's semantic understanding of uncertain DIKWP
resources.
Prompt:
The doctor says traditional Chinese medicine (TCM) treatment is also helpful, but it
requires a professional TCM practitioner to prescribe it. Lupus is a special condition,
and using only Chinese medicine might not be very effective. The doctor recommends
considering TCM after the condition stabilizes, but I feel it's very troublesome. The
medication period is long, and follow-up visits are needed, so I'm not sure if I have
enough time.
Comparison Standard:
Data: TCM, helpful, need, professional, practitioner, prescribe, lupus, systemic disease,
rely, Chinese medicine, might, effect, not very ideal, recommend, condition, stabilize,
reconsider, TCM therapy, plan, troublesome, many, medication
Wisdom: TCM treatment requires a professional practitioner, TCM is helpful but not
very effective, TCM therapy can be considered after the condition stabilizes
Prompt:
Text content: The patient feels that the proposed plan is cumbersome and has a long
duration with a lot of medication. Due to work reasons, the patient might not have
enough time. However, I believe that starting treatment early can prevent the disease
from worsening and reduce complications, which is beneficial to the patient's health.
DIKWP Resource Mapping:
Data: plan, cumbersome, every day, many, medication, regular, hospital, follow-up,
work, busy, no, time, lupus, long-term, early-stage, treatment, prevention
Knowledge: patient: busy with work, no time, finds it cumbersome; lupus: requires
long-term treatment, early treatment can prevent worsening
Wisdom: the patient is not satisfied with the doctor's TCM treatment plan, early
treatment of lupus can prevent worsening
Question: Please integrate and transform the DIKWP resources based on the provided
text content and DIKWP resource mapping to generate new data, information,
knowledge, wisdom, and purpose resources.
Prompt:
Data: most, limbs, joint pain, fatigue, fever, symptoms, no, right, knee, pain
Information: no: limbs, fatigue, joint, fever; joint: pain; knee: right, pain
Purpose: Answer the doctor's questions, provide more detailed symptom information,
hope the doctor understands the symptoms more comprehensively
Question 1: Please further transform these DIKWP resources into conceptual content.
For example, correctly map "red apple" to a more abstract conceptual level like "fruit"
or "red object."
Prompt:
Data: mouth ulcers, hair loss, condition, sun exposure, itchy face, evening, also, lupus
Information: mouth ulcers: none; hair loss: normal, not much; face: itchy, sun
exposure, evening, also
Knowledge: suspect: lupus; understand: symptoms; confirm: idea; lupus: hair loss,
mouth ulcers, sun exposure discomfort
Purpose: provide more symptom information, get explanations and suggestions, worry
about having lupus
Question 1: Based on the provided DIKWP resources, analyze who is the subject of
the test resources?
Question 2: Analyze the cognitive content received by the subject from these resources.
Question 3: Based on the provided purpose, use the DIKW resources to fulfill the
purpose.
Prompt-1:
Text:
The doctor asked about the location of the back pain, whether it is related to activity,
and any changes in fever and weight, indicating that my symptoms might be related to
these factors. I need to respond seriously. It seems my back pain worsened after the
basketball game, but there is no specific location of pain. I haven't had a fever recently,
and my weight is stable.
Data: doctor, back, location, activity, weight, I, symptoms, basketball game, location,
time
Information: asked, pain, whether related to, activity, and, fever, changes, indicating,
might be, related to, seriously, respond, seems, after, worse, but, no, specific, pain, this
period, stable
Knowledge: body symptoms, weight changes, symptoms helpful for diagnosis, fatigue
diagnosis
Wisdom: correctly answering questions can help with a quick diagnosis, doctor's
questions are related to the disease
Purpose: relieve pain -> cooperate with the doctor -> answer the doctor's questions
Question: Please analyze the semantic situation of each DIKWP resource within the
entire DIKWP collection.
Prompt-2:
Data: doctor, back, location, activity, weight, I, symptoms, basketball game, location,
time
Information: asked, pain, whether related to, activity, and, fever, changes, indicating,
might be, related to, seriously, respond, seems, after, worse, but, no, specific, pain, this
period, stable
Knowledge: body symptoms, weight changes, symptoms helpful for diagnosis, fatigue
diagnosis
Wisdom: correctly answering questions can help with a quick diagnosis, doctor's
questions are related to the disease
Purpose: relieve pain -> cooperate with the doctor -> answer the doctor's questions
Question: Please analyze the semantic situation of each DIKWP resource within the
entire DIKWP collection.