
KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs

Qi Zhao¹, Hongyu Yang¹, Qi Song¹*, Xinwei Yao², Xiangyang Li¹
¹ University of Science and Technology of China, Hefei, Anhui, China
² Zhejiang University of Technology, Hangzhou, Zhejiang, China
{zq2021, hongyuyang}@mail.ustc.edu.cn
[email protected], {qisong09, xiangyangli}@ustc.edu.cn
* Corresponding author

arXiv:2502.12029v1 [cs.AI] 17 Feb 2025

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in various complex tasks, yet they still suffer from hallucinations. Introducing external knowledge, such as knowledge graphs (KGs), can enhance the LLMs' ability to provide factual answers. LLMs also have the ability to interactively explore knowledge graphs. However, most approaches have been affected by insufficient internal knowledge excavation in LLMs, limited generation of trustworthy knowledge reasoning paths, and a vague integration between internal and external knowledge. Therefore, we propose KnowPath, a knowledge-enhanced large model framework driven by the collaboration of internal and external knowledge. It relies on the internal knowledge of the LLM to guide the exploration of interpretable directed subgraphs in external knowledge graphs, better integrating the two knowledge sources for more accurate reasoning. Extensive experiments on multiple real-world datasets confirm the superiority of KnowPath.

Figure 1 (illustrated with the query "What language is spoken in Netherlands and Belgium?"): (a.) The LLMs-only approach suffers from severe hallucinations. (b.) The LLMs-with-KGs approach provides insufficient information, and their graph-based reasoning with KGs is often inaccurate. (c.) We first mine the internal knowledge of LLMs, offering more information for external KG reasoning and achieving better integration of internal and external knowledge in LLMs.

1 Introduction

Large language models (LLMs) are increasingly being applied in various Natural Language Processing (NLP) tasks, such as text generation [Wang et al., 2024; Dong et al., 2023], knowledge-based question answering [Luo et al., 2024a; Zhao et al., 2024], and tasks over specific domains [Alberts et al., 2023; Jung et al., 2024]. In most scenarios, LLMs serve as intermediary agents for implementing various functions [Hu et al., 2024; Huang et al., 2024; Guo et al., 2024]. However, due to the characteristics of generative models, LLMs still suffer from hallucination issues, often generating incorrect answers that can lead to uncontrollable and severe consequences [Li et al., 2024]. Introducing knowledge graphs (KGs) to mitigate this phenomenon is promising [Yin et al., 2022], because knowledge graphs store a large amount of structured factual knowledge, which can provide large models with accurate knowledge dependencies. At the same time, correcting the knowledge stored in large models typically requires fine-tuning their parameters, which inevitably incurs high computational costs [Sun et al., 2024]. In contrast, updating knowledge graphs is relatively simple and incurs minimal overhead.

The paradigms for combining LLMs with KGs can be classified into three main categories. The first is knowledge injection during pre-training or fine-tuning [Luo et al., 2024b; Cao et al., 2023; Jiang et al., 2022; Yang et al., 2024]. While the model's ability to grasp knowledge improves, these methods introduce high computational costs and catastrophic forgetting. The second entails using LLMs as agents to reason over knowledge retrieved from the KGs. This approach does not require fine-tuning or retraining, significantly reducing overhead [Jiang et al., 2023; Yang et al., 2023]. However, it heavily relies on the completeness of external KGs and underutilizes the internal knowledge of the LLMs. The third enables LLMs to participate in the process of knowledge exploration within external KGs [Ma et al., 2024]. In this case, the LLMs can engage in the selection of knowledge nodes at each step [Sun et al., 2024; Chen et al., 2024; Xu et al., 2024], thereby leveraging the advantages of the internal knowledge of the LLMs to some extent.

The existing patterns for introducing KGs into LLMs still have limitations. 1) Insufficient exploration of internal knowledge in LLMs. When exploring KGs, most approaches primarily treat LLMs as agents that select relevant relationships and entities, overlooking the potential of the internal knowledge. 2) Constrained generation of trustworthy reasoning paths. Some methods have attempted to generate highly interpretable reasoning paths, but they limit the scale of path exploration and require additional memory. The generated paths also lack intuitive visual interpretability. 3) Ambiguous fusion of internal and external knowledge. How to better integrate the internal knowledge of LLMs with the external knowledge in KGs still requires further exploration.

To overcome the above limitations, we propose KnowPath, a knowledge-enhanced large model framework driven by the collaboration of internal and external knowledge. Specifically, KnowPath consists of three stages. 1) Inference paths generation. To fully exploit the internal knowledge of LLMs and remain effective in zero-shot scenarios, this stage employs a prompt-driven approach to extract the knowledge triples most relevant to the topic entities, and then generates reasoning paths based on these knowledge triples to attempt answering the question. 2) Trustworthy directed subgraph exploration. In this stage, the LLM combines the previously generated knowledge reasoning paths to select entities and relationships, and then responds based on the subgraph formed by these selections. This stage enables the LLMs to fully participate in the effective construction of external knowledge, while providing a clear process for constructing subgraphs. 3) Evaluation-based answering. At this stage, external knowledge primarily guides KnowPath, while internal knowledge assists in generating the answer. Our contributions can be summarized as follows:

• We focus on a new view, emphasizing the importance of the LLMs' powerful internal knowledge in knowledge question answering, via a prompt-based internal knowledge reasoning path generation method for LLMs.

• We build a knowledge-enhanced large model framework driven by the collaboration of internal and external knowledge. It not only better integrates the internal and external knowledge of the LLMs, but also provides clearer and more trustworthy reasoning paths.

• Extensive experiments conducted on multiple knowledge question answering datasets demonstrate that our KnowPath significantly mitigates the hallucination problem in LLMs and outperforms the existing best methods.

Figure 2: The workflow of KnowPath. It contains: (a) Inference Paths Generation to exploit the internal knowledge of LLMs, (b) Subgraph Exploration to generate a trustworthy directed subgraph, (c) Evaluation-based Answering to integrate internal and external knowledge.

2 KnowPath

2.1 Preliminary

Topic Entities represent the main entities in a query $Q$, denoted as $e_0$. Each $Q$ contains $N$ topic entities $\{e_0^1, \dots, e_0^N\}$.

Inference Paths are a set of paths $P = \{p_1, \dots, p_L\}$ generated from the LLM's own knowledge, where $L \in [1, N]$ is dynamically determined by the LLM agent. Each path $p$ starts from a topic entity $e_0 \in \{e_0^1, \dots, e_0^N\}$ and can be represented as $p = e_0 \rightarrow r_1 \rightarrow e_1 \rightarrow \dots \rightarrow r_n \rightarrow e_n$, where $e_i$ and $r_i$ represent entities and relationships, respectively.

Knowledge Graph (KG) is composed of many structured knowledge triples: $K = \{(e_h, r, e_t) \mid r \in R,\ e_h, e_t \in E\}$, where $E$ represents all entities in the knowledge graph, $R$ represents all relationships, and $e_h$ and $e_t$ represent the head and tail entities, respectively.

KG Subgraph refers to a connected subgraph extracted from the knowledge graph $K$, where the entities and relationships are entirely derived from $K$, i.e., $K_s \subseteq K$.

2.2 Inference Paths Generation

Due to the extensive world knowledge stored within their parameters, LLMs can be considered a complementary representation of KGs. To fully excavate the internal knowledge of LLMs and guide the exploration of KGs, we propose a prompt-driven method to extract the internal knowledge of LLMs effectively. It can retrieve reasoning paths from the model's internal knowledge and clearly display the reasoning process, and it is particularly effective in zero-shot scenarios. Specifically, given a query $Q$, we first guide the LLM to extract the most relevant topic entities $\{e_0^1, \dots, e_0^N\}$ through a specially designed prompt. Then, based on these topic entities, the large model is instructed to generate a set of knowledge triples associated with them. The number of triples $n$ is variable. Finally, the LLM attempts to answer based on the previously generated knowledge triples and provides a specific reasoning path from entities and relations to the answer. Each path is of the form $p = e_0 \rightarrow r_1 \rightarrow e_1 \rightarrow \dots \rightarrow r_n \rightarrow e_n$. The details of the Inference Paths Generation process are presented in Appendix A.1.
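To make the three prompt steps concrete, here is a hedged sketch of what this stage could look like in code. The `call_llm` callable, the JSON output format, and the prompt wording are our placeholder assumptions; the paper's actual prompts are given in its Appendix A.1 and are not reproduced here.

```python
import json

def generate_inference_paths(question: str, call_llm) -> dict:
    """Stage-1 sketch: mine the LLM's internal knowledge in three prompt steps.

    `call_llm(prompt: str) -> str` is an assumed black-box chat call; the
    prompts below are illustrative stand-ins, not the paper's prompts.
    """
    # 1) Extract the topic entities {e_0^1, ..., e_0^N} from the query.
    entities = json.loads(call_llm(
        f"List the topic entities of this question as a JSON array:\n{question}"))

    # 2) Generate knowledge triples associated with the topic entities
    #    (the number of triples n is left to the model).
    triples = json.loads(call_llm(
        "Using only your own knowledge, produce a JSON array of "
        f"[head, relation, tail] triples about {entities} relevant to:\n{question}"))

    # 3) Answer tentatively and emit explicit reasoning paths
    #    e_0 -> r_1 -> e_1 -> ... -> r_n -> e_n.
    answer = call_llm(
        f"Question: {question}\nTriples: {triples}\n"
        "Answer the question, and give each reasoning path in the form "
        "'entity -> relation -> entity -> ...'.")
    return {"topic_entities": entities, "triples": triples, "paths": answer}
```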
2.3 Subgraph Exploration

Exploration Initialization. KnowPath performs subgraph exploration for a maximum of $D$ rounds. Each round corresponds to an additional hop in the knowledge graph $K$, and the $j$-th round contains $N$ subgraphs $\{K_{s,j}^1, \dots, K_{s,j}^N\}$. Each subgraph $K_{s,j}^i$ is composed of a set of knowledge graph reasoning paths, i.e., $K_{s,j}^i = \{p_{1,j}^i \cup \dots \cup p_{l,j}^i\}$, $i \in [1, N]$. The number of reasoning paths $l$ is flexibly determined by the LLM agent. Taking the $D$-th round and the $z$-th path as an example, exploration starts from one topic entity $e_0^i$ and ultimately forms a connected subgraph of the KG, denoted as $p_{z,D}^i = \{e_0^i, e_{1,z}^i, r_{1,z}^i, e_{2,z}^i, r_{2,z}^i, \dots, r_{D,z}^i, e_{D,z}^i\}$. At the start of the first round of subgraph exploration ($D = 0$), each path corresponds to the current topic entity, i.e., $p_{z,0}^i = \{e_0^i\}$.

Algorithm 1 Subgraph Exploration
Require: entityDict, entityName, question, maxWidth, depth, path
 1: Set originalPath as path
 2: if depth = 0 then
 3:     Initialize path as [ ] * maxWidth
 4: end if
 5: for eid in entityDict do
 6:     Find relevantRelations
 7:     for relation in relevantRelations do
 8:         Find entities linked by relation
 9:     end for
10: end for
11: Extract relevantEntities using candidate entities
12: Update path and entityDict based on relevance
13: extraPath ← (path − originalPath)
14: return extraPath, entityDict
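Algorithm 1 treats the KG access layer as a black box. As a point of reference, here is a minimal in-memory sketch of the structures from Section 2.1 that such an implementation could sit on; the class and method names are our assumptions, not the authors' released code.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    head: str       # e_h
    relation: str   # r
    tail: str       # e_t

class KnowledgeGraph:
    """Minimal in-memory triple store K with the two single-hop lookups
    that subgraph exploration needs: (e, r, ?) and (?, r, e)."""

    def __init__(self, triples):
        self.out_edges = defaultdict(list)     # (e_h, r) -> [e_t, ...]
        self.in_edges = defaultdict(list)      # (e_t, r) -> [e_h, ...]
        self.relations_of = defaultdict(set)   # e -> relations touching e
        for t in triples:
            self.out_edges[(t.head, t.relation)].append(t.tail)
            self.in_edges[(t.tail, t.relation)].append(t.head)
            self.relations_of[t.head].add(t.relation)
            self.relations_of[t.tail].add(t.relation)

    def query(self, head=None, relation=None, tail=None):
        """(e, r, ?) when head is given; (?, r, e) when tail is given."""
        if head is not None:
            return self.out_edges.get((head, relation), [])
        return self.in_edges.get((tail, relation), [])
```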
Relation Exploration. Relation exploration aims to expand the subgraphs obtained in each round of exploration, enabling deep reasoning. Specifically, for the $i$-th subgraph and the $j$-th round of subgraph exploration, the candidate entities are denoted as $E_j^i = \{e_{j-1,1}^i, \dots, e_{j-1,l}^i\}$, where $e_{j-1,1}^i$ is the tail entity of the reasoning path $p_{1,j-1}^i$. Based on these candidates $E_j^i$, we search for all corresponding single-hop relations in the knowledge graph $K$, denoted as $R_{a,j}^i = \{r_1, \dots, r_M\}$, where $M$ is determined by the specific knowledge graph $K$. Finally, the LLM agent relies on the query $Q$, the inference paths $P$ generated through the LLM's internal knowledge (Section 2.2), and all topic entities $e_0$ to select the most relevant candidate relations from $R_{a,j}^i$, denoted as $R_j^i \subseteq R_{a,j}^i$, whose size is dynamically determined by the LLM agent.
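In practice, the single-hop relation set $R_{a,j}^i$ and the follow-up $(e, r, ?)$ / $(?, r, e)$ lookups are served by the KG backend before the LLM prunes anything. Since the experiments use Freebase (Section 3.3), a sketch against a self-hosted Freebase SPARQL endpoint might look as follows; the endpoint URL, the `ns:` prefix, and the use of SPARQLWrapper are our assumptions about a typical setup, not details given in the paper.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:8890/sparql"  # assumed local Virtuoso instance

def single_hop_relations(entity_mid: str):
    """All single-hop relations touching a Freebase entity (the set R_a)."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(f"""
        PREFIX ns: <http://rdf.freebase.com/ns/>
        SELECT DISTINCT ?rel WHERE {{
            {{ ns:{entity_mid} ?rel ?x . }} UNION {{ ?x ?rel ns:{entity_mid} . }}
        }}""")
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [r["rel"]["value"] for r in rows]

def linked_entities(entity_mid: str, relation: str, as_head: bool = True):
    """Answer (e, r, ?) when as_head is True, otherwise (?, r, e)."""
    pattern = (f"ns:{entity_mid} <{relation}> ?e ." if as_head
               else f"?e <{relation}> ns:{entity_mid} .")
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(
        "PREFIX ns: <http://rdf.freebase.com/ns/>\n"
        f"SELECT DISTINCT ?e WHERE {{ {pattern} }}")
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [r["e"]["value"] for r in rows]
```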
Entity Exploration. Entity exploration depends on the already determined candidate entities and candidate relations. Taking the $i$-th subgraph and the $j$-th round of subgraph exploration as an example, relying on $E_j^i$ and $R_j^i$, we perform queries like $(e, r, ?)$ or $(?, r, e)$ on the knowledge graph $K$ to retrieve the corresponding entities $E_{a,j}^i = \{e_1, \dots, e_N\}$, where $N$ varies depending on the knowledge graph $K$. Then, the agent again considers the query $Q$, the inference paths $P$ from Section 2.2, the topic entity $e_0^i$, and the candidate relation set $R_j^i$ to generate, from $E_{a,j}^i$, the most relevant entity set $E_{j+1}^i = \{e_{j,1}^i, \dots, e_{j,l}^i\} \subseteq E_{a,j}^i$. Note that $e_{j,1}^i$ is the tail entity of the reasoning path $p_{1,j}^i$.
Subgraph Update. Relation exploration determines entity exploration, and we update the subgraph only after completing the entity exploration. Specifically, for the $i$-th subgraph and the $j$-th round of subgraph exploration, we append the exploration result $(\cdot, r, e_{j,1}^i)$ to the path $p_{1,j}^i$ in the subgraph $K_{s,j}^i$. This path update algorithm not only considers the directionality of entities and relations, but also automatically determines and updates the paths. Its detailed process is described in Algorithm 2. The final subgraph can be flexibly expanded thanks to the variable number of paths $l$.

Algorithm 2 Update Reasoning Path in Subgraph
Require: path, pathIsHead, isHead, r, e
 1: if not pathIsHead then
 2:     if not isHead then
 3:         newPath ← path + [←, r, ←, e]
 4:     else
 5:         newPath ← path + [→, r, →, e]
 6:     end if
 7: else
 8:     if not isHead then
 9:         newPath ← [e, →, r, →] + path
10:     else
11:         newPath ← [e, ←, r, ←] + path
12:     end if
13: end if
14: Append newPath to path
15: return path
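Read as code, Algorithm 2 is a small direction-aware append/prepend. A Python transliteration might look like the following; returning the new path directly, rather than appending it to a pool as in lines 14-15, is our simplification.

```python
def update_reasoning_path(path, path_is_head: bool, is_head: bool,
                          relation: str, entity: str):
    """Direction-aware path update following Algorithm 2: attach the new
    (relation, entity) hop, with arrows recording the edge direction."""
    if not path_is_head:
        # The existing path keeps its start; extend at the tail end.
        if not is_head:
            new_path = path + ["<-", relation, "<-", entity]
        else:
            new_path = path + ["->", relation, "->", entity]
    else:
        # The new entity becomes the head; prepend it.
        if not is_head:
            new_path = [entity, "->", relation, "->"] + path
        else:
            new_path = [entity, "<-", relation, "<-"] + path
    return new_path
```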
2.4 Evaluation-based Answering

After completing the subgraph update for each round, the agent attempts to answer the query through the subgraphs $\{K_{s,j}^1, \dots, K_{s,j}^N\}$. If it cannot answer, the next round of subgraph exploration is executed, until the maximum exploration depth $D$ is reached. Otherwise, it outputs the final answer along with the corresponding interpretable directed subgraph. Unlike previous work [Chen et al., 2024], even if no answer is found at the maximum exploration depth, our KnowPath will rely on the inference paths $P$ to respond. The framework of KnowPath is shown in Figure 2.
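Putting the three stages together, the overall loop could be sketched as follows, reusing the earlier sketches. The `agent` object bundles the assumed LLM calls (`call`, `try_answer`, `answer_from_paths`), and `max_depth=3` mirrors the $D_{max} = 3$ used in the experiments; none of these names come from the paper.

```python
def knowpath_answer(question, topic_entities, kg, agent, max_depth=3):
    """End-to-end sketch of KnowPath (Sections 2.2-2.4)."""
    internal = generate_inference_paths(question, agent.call)  # Section 2.2
    frontier = list(topic_entities)
    paths = [[e] for e in topic_entities]          # p_{z,0} = {e_0}
    for _ in range(max_depth):                     # Section 2.3
        frontier, paths = explore_one_round(question, internal["paths"],
                                            frontier, paths, kg, agent)
        verdict = agent.try_answer(question, paths)             # Section 2.4
        if verdict.answerable:
            return verdict.answer, paths   # answer + interpretable subgraph
    # Even at maximum depth with no answer, fall back on the internal
    # inference paths rather than returning nothing (unlike prior work).
    return agent.answer_from_paths(question, internal["paths"]), paths
```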
Method                                  CWQ           WebQSP        Simple Questions   WebQuestions

LLM only
IO prompt [Brown et al., 2020]          37.6 ± 0.8    63.3 ± 1.2    20.0 ± 0.5         48.7 ± 1.4
CoT [Wei et al., 2022]                  38.8 ± 1.5    62.2 ± 0.7    20.5 ± 0.4         49.1 ± 0.9
RoG w/o planning [Luo et al., 2024b]    43.0 ± 0.9    66.9 ± 1.3    -                  -
SC [Wang et al., 2022]                  45.4 ± 1.1    61.1 ± 0.5    18.9 ± 0.6         50.3 ± 1.2

Fine-Tuned KG Enhanced LLM
UniKGQA [Jiang et al., 2022]            51.2 ± 1.0    75.1 ± 0.8    -                  -
RE-KBQA [Cao et al., 2023]              50.3 ± 1.2    74.6 ± 1.0    -                  -
ChatKBQA [Luo et al., 2024a]            76.5 ± 1.3    78.1 ± 1.1    85.8 ± 0.9         55.1 ± 0.6
RoG [Luo et al., 2024b]                 64.5 ± 0.7    85.7 ± 1.4    73.3 ± 0.8         56.3 ± 1.0

Prompting KG Enhanced LLM with GPT-3.5
StructGPT [Jiang et al., 2023]          54.3 ± 1.0    72.6 ± 1.2    50.2 ± 0.5         51.3 ± 0.9
ToG [Sun et al., 2024]                  57.1 ± 1.5    76.2 ± 0.8    53.6 ± 1.0         54.5 ± 0.7
PoG [Chen et al., 2024]                 63.2 ± 1.0    82.0 ± 0.9    58.3 ± 0.6         57.8 ± 1.2
KnowPath (Ours)                         67.9 ± 0.6    84.1 ± 1.3    61.5 ± 0.8         60.0 ± 1.0

Prompting KG Enhanced LLM with DeepSeek-V3
ToG [Sun et al., 2024]                  60.9 ± 0.7    82.6 ± 1.0    59.7 ± 0.9         57.9 ± 0.8
PoG [Chen et al., 2024]                 68.3 ± 1.1    85.3 ± 0.9    63.9 ± 0.5         61.2 ± 1.3
KnowPath (Ours)                         73.5 ± 0.9    89.0 ± 0.8    65.3 ± 1.0         64.0 ± 0.7

Table 1: Hits@1 scores (%) of different models on four datasets under various knowledge-enhanced methods. We use GPT-3.5 Turbo and DeepSeek-V3 as the primary backbones. The last row of each prompting group reports the results achieved by our method.

3 Experimental Setup

3.1 Baselines

We chose corresponding advanced baselines for comparison based on the three main paradigms of existing knowledge-based question answering. 1) The first is the LLM-only paradigm, including the standard prompt (IO prompt [Brown et al., 2020]), the chain-of-thought prompt (CoT [Wei et al., 2022]), self-consistency (SC [Wang et al., 2022]), and RoG without planning (RoG w/o planning [Luo et al., 2024b]). 2) The second is the KG-enhanced fine-tuned LLMs, which include ChatKBQA [Luo et al., 2024a], RoG [Luo et al., 2024b], UniKGQA [Jiang et al., 2022], and RE-KBQA [Cao et al., 2023]. 3) The third is the KG-enhanced prompt-based LLMs, including Think-on-Graph (ToG [Sun et al., 2024]), Plan-on-Graph (PoG [Chen et al., 2024]), and StructGPT [Jiang et al., 2023]. Unlike the second paradigm, this scheme no longer requires fine-tuning and has become a widely researched mode today.

3.2 Datasets and Metrics

Datasets. We adopt four knowledge-based question answering datasets: the single-hop Simple Questions [Bordes et al., 2015], the complex multi-hop CWQ [Talmor and Berant, 2018] and WebQSP [Yih et al., 2016], and the open-domain WebQuestions [Berant et al., 2013].

Metrics. Following previous research [Chen et al., 2024], we apply exact match accuracy (Hits@1) for evaluation.

3.3 Experiment Details

Following previous research [Chen et al., 2024], to control the overall costs, the maximum subgraph exploration depth $D_{max}$ is set to 3. Since FreeBase [Bollacker et al., 2008] supports all the aforementioned datasets, we apply it as the base graph for subgraph exploration, and we apply GPT-3.5-turbo-1106 and DeepSeek-V3 as the base models. All experiments are deployed on four NVIDIA A800-40G GPUs.
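For reference, exact-match Hits@1 can be computed as in the sketch below; the string normalization is our assumption, since the paper only states that it follows the evaluation protocol of [Chen et al., 2024].

```python
def hits_at_1(predictions: list[str], gold_answers: list[set[str]]) -> float:
    """Exact-match Hits@1 (%): a prediction scores 1 if its normalized form
    is among the gold answers for that question."""
    norm = lambda s: s.strip().lower()  # assumed normalization
    hits = sum(norm(pred) in {norm(g) for g in golds}
               for pred, golds in zip(predictions, gold_answers))
    return 100.0 * hits / len(predictions)
```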
4 Results

4.1 Main Results

We conducted comprehensive experiments on four widely used knowledge-based question answering datasets. The experimental results are presented in Table 1, and the key findings are outlined as follows:

KnowPath performs the best. Our KnowPath outperforms all the prompting-driven KG-enhanced methods. For instance, on the multi-hop CWQ, regardless of the base model used, KnowPath achieves a maximum improvement of about 13% in Hits@1. In addition, KnowPath outperforms the LLM-only methods by a clear margin and surpasses the majority of fine-tuned KG-enhanced LLM methods. On the most challenging open-domain question answering dataset, WebQuestions, KnowPath achieves the best performance compared to strong baselines from other paradigms (e.g., PoG 61.2% vs. ours 64.0%). This demonstrates KnowPath's ability to enhance the factuality of LLMs in open-domain question answering, which is an intriguing phenomenon worth further exploration.

KnowPath excels at complex multi-hop tasks. On both CWQ and WebQSP, KnowPath outperforms the latest strong baseline PoG, achieving average improvements of approximately 5% and 2.9%, respectively. On WebQSP, DeepSeek-V3 with KnowPath not only outperforms all prompting-based KG-enhanced LLMs but also surpasses the strongest baseline, RoG, among fine-tuned KG-enhanced LLMs (85.7% vs. 89%). On the more challenging multi-hop CWQ, the improvement of KnowPath over PoG is significantly greater than the improvement on the simpler single-hop Simple Questions (5.2% vs. 1.4%). These results collectively indicate that KnowPath is particularly strong at deep reasoning.
Knowledge enhancement greatly aids factual question answering. When question answering is based solely on LLMs, performance is poor across multiple tasks. For example, CoT achieves only about 20.5% Hits@1 on Simple Questions. This is caused by the hallucinations inherent in LLMs. Whatever method is applied to introduce the KGs, it significantly outperforms LLM-only; the maximum improvements across the four tasks are 35.9%, 27.9%, 46.4%, and 15.3%. These results further emphasize the importance of introducing knowledge graphs for generating correct answers.

The stronger the base model, the higher the performance. As DeepSeek-V3 is stronger than GPT-3.5, even though both are prompting-based and knowledge-enhanced, their performance on all tasks shows a significant difference after incorporating our KnowPath. Replacing GPT-3.5 with DeepSeek-V3, KnowPath achieved a maximum improvement from 67.9% to 73.5% on CWQ, and on Simple Questions it improved by at least 3.8%. These findings indicate that improvements in base model capability directly drive improvements in knowledge-based question answering.

KnowPath is a more flexible plugin. Compared to fine-tuned knowledge-enhanced LLMs, our KnowPath does not require fine-tuning of the LLM, yet it outperforms most of the fine-tuned methods. In addition, on the CWQ dataset, KnowPath with DeepSeek-V3 achieves performance that is very close to the strongest baseline, ChatKBQA, which requires fine-tuning for knowledge enhancement. On the WebQSP dataset, it outperforms ChatKBQA by about 11% (78.1% vs. 89.0%). Overall, the resource consumption of KnowPath is significantly lower than that of fine-tuned KG-enhanced LLMs. This is because KnowPath improves performance by optimizing inference paths and enhancing knowledge integration, making it a more flexible, plug-and-play framework.

Figure 3: Comparison of KnowPath, its individual components, and strong baseline methods (ToG and PoG) across four commonly used knowledge-based question answering datasets.

4.2 Ablation Study

We validate the effectiveness of each component of KnowPath and quantify their contributions to performance. The results are presented in Table 2 and visualized in Figure 3.

Method      CWQ     WebQSP   SimpleQA   WebQ
KnowPath    73.5    89.0     65.3       64.0
 -w/o IPG   67.3    84.5     63.1       61.0
 -w/o SE    64.7    83.1     60.4       60.7
Base        39.2    66.7     23.0       53.7

Table 2: Ablation experiment results on four knowledge-based question answering tasks. IPG stands for the Inference Paths Generation module, while SE stands for the Subgraph Exploration module.

Each component contributes to the overall remarkable performance. After removing each module, performance on the different datasets declines. However, compared to the base model, the addition of these modules still significantly improves the overall performance.

It is necessary to focus on the powerful internal knowledge of LLMs. Eliminating the Subgraph Exploration and relying solely on the internal knowledge mining of LLMs to generate reasoning paths and provide answers proves to be highly effective: it yields significant improvement across all four datasets, with an average performance enhancement of approximately 21.6%. The most notable improvement was observed on SimpleQA, where performance leaped from 23% to 60.4%. This indicates that even without incorporating external knowledge graphs, the model's ability to generate factual responses can be enhanced to a certain extent through internal mining methods. However, without the guidance of internal knowledge reasoning paths, KnowPath sees some performance decline across all tasks, especially on the complex multi-hop CWQ and WebQSP.

The trustworthy directed Subgraph Exploration is the most critical component and is depth-sensitive. Removing the subgraph exploration leads to a significant decline for KnowPath across all tasks, averaging a drop of approximately 5.7%. This performance dip is particularly pronounced in complex multi-hop tasks. For instance, on the CWQ, KnowPath without subgraph exploration experiences a nearly 9% decrease.
Figure 4: Visualization of the cost-effectiveness analysis on four public knowledge-based question-answering datasets.

4.3 Cost-effectiveness Analysis

To explore the cost-effectiveness of KnowPath while maintaining high accuracy, we conducted a cost-benefit analysis. In this experiment, we tracked the primary sources of cost, including the LLM Call, Input Token, and Total Token usage. The results are presented in Table 3 and visualized in Figure 4. Our key findings are described as follows:

Method      LLM Call   Total Token   Input Token
ToG         22.6       9669.4        8182.9
PoG         16.3       8156.2        7803.0
KnowPath    9.9        2742.4        2368.9

Table 3: Cost-effectiveness analysis on the CWQ dataset between our KnowPath and the strong prompt-driven knowledge-enhanced benchmarks (ToG and PoG). The Total Token count includes two parts: the total number of tokens from the multiple input prompts and the total number of tokens from the intermediate results returned by the LLM. The Input Token count represents only the total number of tokens from the multiple input prompts. The LLM Call count refers to the total number of accesses to the LLM agent.

The number of accesses to the LLM agent is significantly reduced. Specifically, the LLM calls for ToG and PoG are 2.28x and 1.64x those of our KnowPath, respectively. This exceptionally low cost is achieved even though the Subgraph Exploration does not limit the scale of the path search, and it can be broken down into three key reasons. First, in each round of subgraph exploration, only one relation exploration and one entity exploration are conducted. Second, the Evaluation-based Answering accesses the LLM only once after each round of subgraph exploration, to judge whether the current subgraph can answer the question; if it cannot, the next round is performed. Third, if the largest explored subgraph still cannot answer the question, KnowPath relies on the Inference Paths Generation.

The number of tokens used is several times smaller. Whether in Total Tokens or Input Tokens, KnowPath saves approximately 4.0x compared to ToG and PoG. This is mainly because all the prompts used in KnowPath follow a carefully designed zero-shot approach, rather than the in-context learning used by previous methods, which requires providing a large context to ensure the factuality of the answers. We explored the reasons behind this difference. First, previous methods rely on more contextual information for in-context learning to ensure the correctness of the output. Second, KnowPath fully leverages the LLM's powerful internal relevant knowledge and uses it as the input signal for the agent. This not only provides more contextual reference but also significantly improves the accuracy and efficiency of relation and entity exploration in subgraph exploration, ensuring that the generated subgraph is highly relevant while enabling the most effective reasoning toward potential answers.

4.4 Parameter Analysis

We analyze the key parameters that affect the performance of KnowPath on WebQSP, and discuss the following issues:

Figure 5: Analysis of key parameters. (a) Exploration temperature. (b) The count of triples.

What is the impact of the temperature in Subgraph Exploration? We explore the optimal temperature from 0.2 to 1, and the relation between temperature and Hits@1 is shown in Figure 5a. During subgraph exploration, variations in the temperature affect the divergence of the model's generated answers. A lower temperature negatively impacts KnowPath's performance, as the model generates overly conservative answers with insufficient knowledge while relying on its internal knowledge to explore and select entities and relationships. A higher temperature also harms KnowPath, as the divergent answers may deviate from the given candidates. Extensive experiments show that 0.4 is the optimal temperature, consistent with other existing works [Chen et al., 2024].

How is the count of knowledge triples determined in Inference Paths Generation? We explored it with a step size of 15, and the relationship between the count of knowledge triples and Hits@1 is shown in Figure 5b. When the count is 0, KnowPath's performance is poor due to the lack of internal knowledge exploration. When the count is too large, such as 45, its performance is also suboptimal, as excessive exploration introduces irrelevant knowledge as interference. Extensive experiments show that 15 is optimal.
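The three cost metrics of Table 3 are easy to instrument with a thin wrapper around the LLM call. The sketch below is our illustration only: whitespace token counting stands in for the backbone's real tokenizer, the `temperature` keyword is an assumption about the underlying API, and its default of 0.4 mirrors the optimum found in Figure 5a.

```python
class CostTracker:
    """Wraps an LLM call to record the cost metrics of Table 3:
    LLM calls, input tokens, and total (input + output) tokens."""

    def __init__(self, call_llm, temperature: float = 0.4):
        self.call_llm = call_llm        # assumed: call_llm(prompt, temperature=...)
        self.temperature = temperature  # 0.4 per the parameter analysis
        self.llm_calls = 0
        self.input_tokens = 0
        self.total_tokens = 0

    def __call__(self, prompt: str) -> str:
        self.llm_calls += 1
        reply = self.call_llm(prompt, temperature=self.temperature)
        n_in = len(prompt.split())          # crude approximation
        self.input_tokens += n_in
        self.total_tokens += n_in + len(reply.split())
        return reply
```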
Figure 6: The case study on the multi-hop CWQ and open-domain WebQuestions datasets. (a) A case from CWQ: "What text in the religion which includes Zhang Jue as a key figure is considered to be sacred?" (b) A case from WebQuestions: "Who won the league cup in 2002?" To provide a clear and vivid comparison with the strong baselines (ToG and PoG), we visualize the execution process of KnowPath.

4.5 Case Study

To provide a clear and vivid comparison with the strong baselines, we visualized the execution process of KnowPath, as shown in Figure 6. On the CWQ case, ToG and PoG can only extract context from the question, failing to gather enough accurate knowledge for a correct answer, and thus produce the incorrect answer "Taiping Jing." In contrast, KnowPath uncovers large model reasoning paths that provide additional, sufficient information. This enables key nodes, such as "Taoism," to be identified during subgraph exploration, ultimately leading to the correct answer, "Zhuang Zhou." On the WebQuestions case, ToG is unable to answer the question due to insufficient information. Although PoG provides a reasoning chain, the knowledge derived from the reasoning process is inaccurate, and the final answer still relies on the reasoning of the large model, resulting in the incorrect answer "Blackburn Rovers." In contrast, guided by its inference paths, KnowPath accurately identifies the relationship "time.event.instance_of_recurring_event" and, through reasoning over the node "2002-03 Football League Cup," ultimately arrives at the correct result node, "Liverpool F.C." Overall, KnowPath not only provides answers but also generates directed subgraphs, which serve as the foundation for trustworthy reasoning and significantly enhance the interpretability of the results.

5 Related Work

Prompt-driven LLM inference. Chain-of-Thought (CoT) [Wei et al., 2022] effectively improves the reasoning ability of large models, enhancing performance on complex tasks with minimal contextual prompts. Self-Consistency (SC) [Wang et al., 2022] samples multiple reasoning paths to select the most consistent answer, with further improvements seen in DIVERSE [Li et al., 2022] and Vote Complex [Fu et al., 2022]. Other methods have explored CoT enhancements in zero-shot scenarios [Kojima et al., 2022; Chung et al., 2024]. However, reasoning solely based on the model's own knowledge still faces significant hallucination issues, which remain unresolved.

KG-enhanced LLM inference. Early works enhanced model knowledge understanding by injecting KGs into model parameters through fine-tuning or retraining [Cao et al., 2023; Jiang et al., 2022; Yang et al., 2024]. ChatKBQA [Luo et al., 2024a] and RoG [Luo et al., 2024b] utilize fine-tuned LLMs to generate logical forms. StructGPT [Jiang et al., 2023], based on the RAG approach, retrieves information from KGs for question answering. ToG [Sun et al., 2024] and PoG [Chen et al., 2024] involve LLMs in knowledge graph reasoning, using them as agents to assist in selecting entities and relationships during exploration. Despite achieving strong performance, these methods still face challenges such as insufficient internal knowledge mining and the inability to generate trustworthy reasoning paths.

6 Conclusion

In this paper, to enhance the ability of LLMs to provide factual answers, we propose the knowledge-enhanced reasoning framework KnowPath, driven by the collaboration of internal and external knowledge. It focuses on leveraging the reasoning paths generated from the extensive internal knowledge of LLMs to guide the trustworthy directed subgraph exploration of knowledge graphs. Extensive experiments show that: 1) our KnowPath is optimal and excels at complex multi-hop tasks; 2) it demonstrates remarkable cost-effectiveness, with a 55% reduction in the number of LLM calls and a 75% decrease in the number of tokens consumed compared to the strong baselines; and 3) KnowPath can explore directed subgraphs of the KGs, providing an intuitive and trustworthy reasoning process and greatly enhancing the overall interpretability.
References

[Alberts et al., 2023] Ian L. Alberts, Lorenzo Mercolli, Thomas Pyka, George Prenosil, Kuangyu Shi, Axel Rominger, and Ali Afshar-Oromieh. Large language models (LLM) and ChatGPT: what will the impact on nuclear medicine be? European Journal of Nuclear Medicine and Molecular Imaging, 50(6):1549–1552, 2023.

[Berant et al., 2013] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544, 2013.

[Bollacker et al., 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250, 2008.

[Bordes et al., 2015] Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075, 2015.

[Brown et al., 2020] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.

[Cao et al., 2023] Yong Cao, Xianzhi Li, Huiwen Liu, Wen Dai, Shuai Chen, Bin Wang, Min Chen, and Daniel Hershcovich. Pay more attention to relation exploration for knowledge base question answering. arXiv preprint arXiv:2305.02118, 2023.

[Chen et al., 2024] Liyi Chen, Panrong Tong, Zhongming Jin, Ying Sun, Jieping Ye, and Hui Xiong. Plan-on-Graph: Self-correcting adaptive planning of large language model on knowledge graphs. CoRR, abs/2410.23875, 2024.

[Chung et al., 2024] Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. Scaling instruction-finetuned language models. Journal of Machine Learning Research, 25(70):1–53, 2024.

[Dong et al., 2023] Xiangjue Dong, Yibo Wang, Philip S. Yu, and James Caverlee. Probing explicit and implicit gender bias through LLM conditional text generation. arXiv preprint arXiv:2311.00306, 2023.

[Fu et al., 2022] Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. Complexity-based prompting for multi-step reasoning. In The Eleventh International Conference on Learning Representations, 2022.

[Guo et al., 2024] Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, pages 8048–8057. ijcai.org, 2024.

[Hu et al., 2024] Yuxuan Hu, Gemju Sherpa, Lan Zhang, Weihua Li, Quan Bai, Yijun Wang, and Xiaodan Wang. An LLM-enhanced agent-based simulation tool for information propagation. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, pages 8679–8682. ijcai.org, 2024.

[Huang et al., 2024] Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang, and Enhong Chen. Understanding the planning of LLM agents: A survey. arXiv preprint arXiv:2402.02716, 2024.

[Jiang et al., 2022] Jinhao Jiang, Kun Zhou, Wayne Xin Zhao, and Ji-Rong Wen. UniKGQA: Unified retrieval and reasoning for solving multi-hop question answering over knowledge graph. arXiv preprint arXiv:2212.00959, 2022.

[Jiang et al., 2023] Jinhao Jiang, Kun Zhou, Zican Dong, Keming Ye, Wayne Xin Zhao, and Ji-Rong Wen. StructGPT: A general framework for large language model to reason over structured data. arXiv preprint arXiv:2305.09645, 2023.

[Jung et al., 2024] Sung Jae Jung, Hajung Kim, and Kyoung Sang Jang. LLM based biological named entity recognition from scientific literature. In 2024 IEEE International Conference on Big Data and Smart Computing (BigComp), pages 433–435. IEEE, 2024.

[Kojima et al., 2022] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35:22199–22213, 2022.

[Li et al., 2022] Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, and Weizhu Chen. On the advance of making language models better reasoners. arXiv preprint arXiv:2206.02336, 2022.

[Li et al., 2024] Johnny Li, Saksham Consul, Eda Zhou, James Wong, Naila Farooqui, Yuxin Ye, Nithyashree Manohar, Zhuxiaona Wei, Tian Wu, Ben Echols, et al. Banishing LLM hallucinations requires rethinking generalization. arXiv preprint arXiv:2406.17642, 2024.

[Luo et al., 2024a] Haoran Luo, Haihong E, Zichen Tang, Shiyao Peng, Yikai Guo, Wentai Zhang, Chenghao Ma, Guanting Dong, Meina Song, Wei Lin, Yifan Zhu, and Anh Tuan Luu. ChatKBQA: A generate-then-retrieve framework for knowledge base question answering with fine-tuned large language models. In Findings of the Association for Computational Linguistics, ACL 2024, pages 2039–2056. Association for Computational Linguistics, 2024.

[Luo et al., 2024b] Linhao Luo, Yuan-Fang Li, Gholamreza Haffari, and Shirui Pan. Reasoning on graphs: Faithful and interpretable large language model reasoning. In The Twelfth International Conference on Learning Representations, ICLR 2024. OpenReview.net, 2024.

[Ma et al., 2024] Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, and Jian Guo. Think-on-Graph 2.0: Deep and interpretable large language model reasoning with knowledge graph-guided retrieval. arXiv e-prints, arXiv:2407, 2024.

[Sun et al., 2024] Jiashuo Sun, Chengjin Xu, Lumingyuan Tang, Saizhuo Wang, Chen Lin, Yeyun Gong, Lionel M. Ni, Heung-Yeung Shum, and Jian Guo. Think-on-Graph: Deep and responsible reasoning of large language model on knowledge graph. In The Twelfth International Conference on Learning Representations, ICLR 2024. OpenReview.net, 2024.

[Talmor and Berant, 2018] Alon Talmor and Jonathan Berant. The web as a knowledge-base for answering complex questions. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 641–651, 2018.

[Wang et al., 2022] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022.

[Wang et al., 2024] Ziao Wang, Xiaofeng Zhang, and Hongwei Du. Beyond what if: Advancing counterfactual text generation with structural causal modeling. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, pages 6522–6530. ijcai.org, 2024.

[Wei et al., 2022] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.

[Xu et al., 2024] Yao Xu, Shizhu He, Jiabei Chen, Zihao Wang, Yangqiu Song, Hanghang Tong, Guang Liu, Kang Liu, and Jun Zhao. Generate-on-Graph: Treat LLM as both agent and KG in incomplete knowledge graph question answering. arXiv preprint arXiv:2404.14741, 2024.

[Yang et al., 2023] Linyao Yang, Hongyang Chen, Zhao Li, Xiao Ding, and Xindong Wu. ChatGPT is not enough: Enhancing large language models with knowledge graphs for fact-aware language modeling. arXiv preprint arXiv:2306.11489, 2023.

[Yang et al., 2024] Linyao Yang, Hongyang Chen, Zhao Li, Xiao Ding, and Xindong Wu. Give us the facts: Enhancing large language models with knowledge graphs for fact-aware language modeling. IEEE Transactions on Knowledge and Data Engineering, 2024.

[Yih et al., 2016] Wen-tau Yih, Matthew Richardson, Chris Meek, Ming-Wei Chang, and Jina Suh. The value of semantic parse labeling for knowledge base question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 201–206, 2016.

[Yin et al., 2022] Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, and Jianfeng Gao. A survey of knowledge-intensive NLP with pre-trained language models. arXiv preprint arXiv:2202.08772, 2022.

[Zhao et al., 2024] Ruilin Zhao, Feng Zhao, Long Wang, Xianzhi Wang, and Guandong Xu. KG-CoT: Chain-of-thought prompting of large language models over knowledge graphs for knowledge-aware question answering. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, pages 6642–6650. ijcai.org, 2024.
