
Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation

Aadharsh Aadhithya A∗, Sachin Kumar S∗ and Soman K.P∗
Amrita School of Artificial Intelligence, Coimbatore
Amrita Vishwa Vidyapeetham, India

arXiv:2406.06124v1 [cs.CL] 10 Jun 2024
Drafted: December 18, 2023. arXiv submission: June 11, 2024.

Abstract

Large language models (LLMs) have limited context capacity, hindering reasoning over long conversations. We propose the Hierarchical Aggregate Tree (HAT) memory structure to recursively aggregate relevant dialogue context through conditional tree traversals. HAT encapsulates information from children nodes, enabling broad coverage with depth control. We formulate finding the best context as an optimal tree traversal. Experiments show HAT improves dialogue coherence and summary quality over baseline contexts, demonstrating the technique's effectiveness for multi-turn reasoning without exponential parameter growth. HAT balances information breadth and depth for long-form dialogues. This memory augmentation enables more consistent, grounded long-form conversations from LLMs.
1 Introduction

Large language models (LLMs) like ChatGPT are having an impact across various areas and applications. One of the most straightforward applications is using LLMs as personalized chat agents. There have been several efforts to develop chatbots for various applications, both generic and domain-specific, particularly after the advent of LLMs and associated Python libraries, which have made it very easy for people to develop their own chatbots.

Customizing and aligning these LLMs is still an active area of research. One basic alignment we would want is for chatbots to behave according to our expectations, particularly when the context is specific and requires some information to be highlighted that is not necessarily in the model's pretraining corpus. While LLMs are considered snapshots of the internet, one limitation is that they are closed systems, and providing external information to LLMs is an active research area.

Two primary ways of providing external data to LLMs are a) finetuning and b) retrieval-augmented generation (RAG). Finetuning requires access to model weights and can only be done with open models. RAG relies on strategies to retrieve information from a datastore given a user query, without needing internal model information, allowing it to be used with more model types. However, RAG is limited by the model's context length budget. Although very large context LLMs exist, the budget remains limited. Hence, how and what data is retrieved given a user query is an important research task. With the advent of "LLM agents", a separate memory management module is often required. Some solutions train models for this task, while others take a retrieval-based approach. Still others use the LLM itself to accomplish it. However, current approaches tend to rely solely on either providing a summary or retrieving from a datastore, with little in between. We hence propose a method that combines both worlds using a new data structure called the "Hierarchical Aggregate Tree".

1.1 Recent works

There has been growing interest in developing techniques to enhance LLMs' capabilities for long-term multi-session dialogues. (Xu et al., 2021a) collected a dataset of human conversations across multiple chat sessions and showed that existing models perform poorly in maintaining consistency, necessitating long-context models with summarization and retrieval abilities. (Xu et al., 2021b) proposed a model to extract and update conversational personas over long-term dialogues. (Bae et al.) presented a task and dataset for memory management in long chats, dealing with outdated information. (Wang et al.) proposed recursive summarization for equipping LLMs with long-term memory. (Lee et al.) showed prompted LLMs can match fine-tuned models for consistent open-domain chats. (Zhang et al.) encoded conversation history hierarchically to generate informed responses. Developing more sophisticated strategies for information integration remains an open challenge.
2 Task Definition

The task at hand is straightforward. Given a history of conversations between a user and an assistant (system), we are to predict the response of the system. In other words, given the history of the conversation at time step $t$, $H_t = \{u_1, a_1, u_2, a_2, \cdots, a_{t-1}\}$ (where $u_i$ represents a user utterance and $a_i$ an assistant response) and a user query $u_t$, our task is to find a relevant function $f$ such that

$$a_t \approx \mathrm{LLM}(u_t, f(H_t \mid u_t))$$

Here $f$ can be thought of as some function mapping the entire history of the conversation to a condensed space, conditioned on the user query at time step $t$. Note that $f$ can be a selection function, a trained neural network acting as a memory agent, or even a simple summary function. In our experiments, the dataset is organized as sessions and episodes. An episode consists of multiple consecutive dialogue sessions with a specific user, oftentimes requiring information from previous sessions to respond. Hence, in addition to the history $H$, at the end of every session we also have a memory $M_s^e$ for session $s$ and episode $e$, constructed by combining $H_s^e$ and $M_{s-1}^e$, where $H_s^e$ represents the history of session $s$ and episode $e$. Therefore, we also have the auxiliary task of finding $M_s^e$ given $M_{s-1}^e$ and $H_s^e$.
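To make this formulation concrete, here is a minimal sketch of the generation loop and the session-level memory update. The `llm`, `build_context`, and `summarize` callables are illustrative placeholders (any selector, summarizer, or the HAT-based module of Section 3 could play the role of $f$); this is not code released with the paper.

```python
from typing import Callable, List, Tuple

# Illustrative type: a history is a list of (speaker, utterance) pairs.
History = List[Tuple[str, str]]

def respond(history: History,
            user_query: str,
            llm: Callable[[str, str], str],
            build_context: Callable[[History, str], str]) -> str:
    """Predict the assistant reply a_t ~ LLM(u_t, f(H_t | u_t)).

    `build_context` plays the role of f: it condenses the full history
    conditioned on the current query (a selector, a summarizer, or the
    HAT-based memory module described later).
    """
    context = build_context(history, user_query)   # f(H_t | u_t)
    return llm(context, user_query)                # a_t

def update_session_memory(prev_memory: str,
                          session_history: History,
                          summarize: Callable[[str], str]) -> str:
    """Auxiliary task: build M_s by combining M_{s-1} and H_s, here via a summarizer."""
    transcript = "\n".join(f"{speaker}: {utt}" for speaker, utt in session_history)
    return summarize(prev_memory + "\n" + transcript)
```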
3 Methodology

An overview of our methodology is depicted in Figure 1. The memory module is tasked with retrieving a relevant context that has the necessary information to respond to the user query. The module does so by leveraging two primary components:

1. Hierarchical Aggregate Tree (HAT): We propose the use of a novel data structure called HAT to manage memory, particularly in long-form texts like open-domain conversations. It has some important properties, such as resolution retention, as detailed in Section 3.1. At a high level, the intended characteristic of HAT is that we get more resolution as we move top-down and more recent information as we move left-right.

2. Memory Agent: The memory agent is tasked with finding the best traversal in HAT, conditioned on the user query, such that at the end of the traversal we are in a node that contains relevant context to answer the user query.
Figure 1: Overview of our approach. Given a user query, the memory module is responsible for giving a relevant context by traversing the HAT. The LLM then generates a response for the user query, given the context.

Figure 2: Illustration of HAT, with the example aggregation function being simple concatenation and a memory length of 2.

3.1 Hierarchical Aggregate Tree (HAT)

A Hierarchical Aggregate Tree (HAT) is defined as $\mathrm{HAT} = (L, M, A, \Sigma)$, where $L = \{l_0, l_1, \ldots, l_n\}$ is a finite set of layers, $M$ is the memory length (a positive integer), $A$ is an aggregation function, and $\Sigma$ is a set of nodes. The layers in $L$ are hierarchical, with $l_0$ being the root layer. Each layer $l_i \in L$ is a set of nodes $\Sigma_i$, where $\Sigma_i \subseteq \Sigma$. A node $\sigma \in \Sigma$ can have children nodes and contains a text element. A node recalculates its text property whenever there is a change in the node's children. The text property of a node $\sigma \in \Sigma_i$, $i \neq |L|$, is given by $A(C(\sigma))$, where $C(\sigma) = \{\tau \in \Sigma \mid \tau \text{ is a child of } \sigma\}$.
Aggregate Function

The aggregate function $A$ maps the text of child nodes to produce the text stored at a parent node:

$$A : \mathcal{P}(\Sigma) \to \mathrm{Text}$$

where $\mathcal{P}(\Sigma)$ represents the power set of nodes. The exact implementation of $A$ can vary depending on the use case; Figure 2, for example, depicts a HAT with concatenation as the aggregate function. It is important that the aggregate function be designed to produce concise summaries reflecting key information from child nodes, so that the HAT is meaningful. It executes whenever new child nodes are inserted, to maintain consistency up the tree. For our implementation, we use GPT as the aggregate function: we ask GPT to summarize personas from the children's text. The exact prompt given can be found in the appendix.
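As one illustration of the two options mentioned above, the sketch below shows concatenation (the Figure 2 variant) alongside a wrapper that turns an LLM call into an aggregation function. The `summarize_with_llm` callable is a placeholder for the GPT prompt described in the appendix, not the authors' exact implementation.

```python
from typing import Callable, List

def concat_aggregate(child_texts: List[str]) -> str:
    """Aggregation as in Figure 2: the parent text is the concatenation
    of its children's texts."""
    return " ".join(child_texts)

def make_llm_aggregate(summarize_with_llm: Callable[[str], str]) -> Callable[[List[str]], str]:
    """Wrap an LLM call (e.g. a prompt asking GPT to summarize personas)
    into an aggregation function A: children's texts -> parent text."""
    def aggregate(child_texts: List[str]) -> str:
        joined = "\n".join(child_texts)
        return summarize_with_llm(joined)
    return aggregate
```

Either function can be passed wherever an aggregator $A$ is expected; only the concatenation variant is fully specified by the paper.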
Node

A node $\sigma \in \Sigma$ represents a single element in the HAT structure. Whenever the set of children nodes for $\sigma$ changes, the update_text() method is called to update $\sigma$'s text. This text is given by applying the aggregator function to the set of texts from the current child nodes. The previously aggregated texts for different combinations of children are cached in previous_complete_state to enable reuse instead of recomputation. After updating, $\sigma$ triggers the parent node to also update, thereby propagating changes upwards in the HAT structure.

Each node $\sigma$ contains the following components:

• id: a unique identifier
• text: the node's aggregated text content
• previous_complete_state: a dictionary mapping hashes of the node's children to the previously aggregated text when the node had those children
• parent: the parent node in the HAT (None for the root node)
• aggregator: the aggregation function $A$
• children: the set of child nodes $C(\sigma)$

The HAT data structure satisfies the invariant that $\sigma_{k,i} \in \Sigma_k$ is a child of $\tau_{k-1,j} \in \Sigma_{k-1} \iff j = \lfloor i/M \rfloor$, where $\sigma_{k,i}$ is the $i$-th node in the $k$-th layer. This connects child nodes to parent nodes between layers based on the memory length $M$. When inserting a new node $\phi \in \Sigma_y$, the aggregation function $A$ is recursively applied to update ancestor nodes. That is, for all $\sigma_{y-1,z}$ that are parents of $\phi$, $\sigma_{y-1,z}.\mathrm{text} = A(C(\sigma_{y-1,z}))$, where $C(\sigma_{y-1,z}) = \{\tau \in \Sigma \mid \tau \text{ is a child of } \sigma_{y-1,z}\}$. This maintains the invariant while propagating updated information through the tree. The number of layers $|L|$ and nodes $|\Sigma|$ changes dynamically based on $M$ and node insertions; $|L|$ represents the depth of the tree.

For brevity, we restrict further detailing of the data structure here; all necessary details to replicate it are given in the appendix.
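Since the full specification is deferred to the appendix, the following is only a minimal sketch, under our own simplifying assumptions, of how nodes with the components listed above and the $\lfloor i/M \rfloor$ parent invariant could be implemented. It fixes the tree depth up front (whereas the paper grows $|L|$ dynamically) and attaches any overflow directly to the single root node; it is not the authors' reference implementation.

```python
import itertools
from typing import Callable, List, Optional

class HATNode:
    """A single HAT element with the components listed above."""
    _ids = itertools.count()

    def __init__(self, aggregator: Callable[[List[str]], str],
                 parent: "Optional[HATNode]" = None, text: str = ""):
        self.id = next(HATNode._ids)          # unique identifier
        self.text = text                      # aggregated text content
        self.previous_complete_state = {}     # hash of children's texts -> cached aggregate
        self.parent = parent                  # None for the root node
        self.aggregator = aggregator          # aggregation function A
        self.children: List["HATNode"] = []   # C(sigma)

    def update_text(self) -> None:
        """Recompute this node's text from its children, reusing cached
        aggregates, then propagate the change upwards."""
        key = hash(tuple(child.text for child in self.children))
        if key not in self.previous_complete_state:
            self.previous_complete_state[key] = self.aggregator(
                [child.text for child in self.children])
        self.text = self.previous_complete_state[key]
        if self.parent is not None:
            self.parent.update_text()

class HAT:
    """Fixed-depth sketch: layer 0 is the root layer, the last layer holds
    the leaves, and node i of layer k hangs under node floor(i / M) of
    layer k - 1. Overflow at the root is attached to the single root node."""

    def __init__(self, memory_length: int, depth: int,
                 aggregator: Callable[[List[str]], str]):
        assert depth >= 2, "need at least a root layer and a leaf layer"
        self.M = memory_length
        self.aggregator = aggregator
        self.layers: List[List[HATNode]] = [[HATNode(aggregator)]]
        self.layers += [[] for _ in range(depth - 1)]

    def _node(self, layer: int, index: int) -> HATNode:
        """Return node `index` of `layer`, creating it and any missing
        ancestors so the floor(i / M) invariant holds."""
        if layer == 0:
            return self.layers[0][0]
        while len(self.layers[layer]) <= index:
            i = len(self.layers[layer])
            parent = self._node(layer - 1, i // self.M)
            node = HATNode(self.aggregator, parent=parent)
            parent.children.append(node)
            self.layers[layer].append(node)
        return self.layers[layer][index]

    def insert_leaf(self, text: str) -> None:
        """Append a dialogue chunk as the new rightmost leaf and
        re-aggregate every ancestor up to the root."""
        leaf = self._node(len(self.layers) - 1, len(self.layers[-1]))
        leaf.text = text
        leaf.parent.update_text()
```

For example, `HAT(memory_length=2, depth=3, aggregator=concat_aggregate)` reproduces the Figure 2 setup with concatenation as $A$.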
3.2 Memory Agent

The memory agent is tasked with finding the optimal traversal in HAT conditioned on a user query $q$. This can be mathematically formulated as:

$$a^*_{0:T} = \arg\max_{a_{0:T}} R(s_{0:T}, a_{0:T} \mid q)$$

where $s_{0:T}$ is the state sequence from the root node to a leaf node, $a_{0:T}$ is the action sequence, and $R$ is the total reward over the traversal, dependent on $q$. The reward, in our case, is the quality of the response the model gives. This can essentially be posed as a Markov Decision Process (MDP). The agent starts at the root node $s_0$ and takes an action $a_t \in A$ at each time step, transitioning to a new state $s_{t+1} \sim \mathcal{P}(\cdot \mid s_t, a_t)$. For cases like this, it is difficult to design a reward function, and we would require annotated training data to train the agent. Hence, we resort to GPT and ask it to act as a traversal agent. GPT is well-suited for conditional text generation, which allows it to traverse HAT by generating an optimal sequence of actions based on the text representation at each node and the user query. The exact prompt used for the memory agent can be found in the appendix.
The MDP is defined by the tuple $(S, A, \mathcal{P}, R, \gamma)$, where:

• $S$ - the set of tree nodes
• $A = \{U, D, L, R, S, O, U\}$ - the set of actions:
  – U - move up the tree
  – D - move down
  – L - move left
  – R - move right
  – S - reset to the root node
  – O - sufficient context for query $q$
  – U - insufficient context for query $q$
• $\mathcal{P}$ - the state transition probabilities
• $R : S \times A \to \mathbb{R}$ - the reward function
• $\gamma \in (0, 1)$ - the discount factor

GPT is well suited for this kind of conditional generation, but it is important to note that our proposed framework is open and generic: the memory agent can be anything from a neural network or an RL agent to a GPT-based approximation.
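To illustrate how such an agent could drive the traversal, here is a sketch of the control loop over the action set above, written against the fixed-depth HAT sketch from Section 3.1. The `choose_action` callable stands in for the prompted GPT call (whose actual prompt is in the appendix); the action letters, the step limit, and the handling of unrecognized replies are our own assumptions.

```python
from typing import Callable

def traverse(hat, query: str,
             choose_action: Callable[[str, str], str],
             max_steps: int = 20) -> str:
    """Walk the HAT under the control of a prompted model.

    `choose_action(node_text, query)` is assumed to return one of
    'U' (up), 'D' (down), 'L' (left), 'R' (right), 'S' (reset to root),
    or 'O' (the current node is sufficient context for the query).
    Any other reply (including an "insufficient context" signal) simply
    keeps the loop going from the current position.
    """
    layer, index = 0, 0                                   # start at the root
    for _ in range(max_steps):
        node_text = hat.layers[layer][index].text
        action = choose_action(node_text, query)
        if action == "O":                                 # sufficient context
            return node_text
        if action == "S":
            layer, index = 0, 0
        elif action == "U" and layer > 0:
            layer, index = layer - 1, index // hat.M      # parent: floor(i / M)
        elif (action == "D" and layer + 1 < len(hat.layers)
              and index * hat.M < len(hat.layers[layer + 1])):
            layer, index = layer + 1, index * hat.M       # leftmost child
        elif action == "L" and index > 0:
            index -= 1
        elif action == "R" and index + 1 < len(hat.layers[layer]):
            index += 1
    return hat.layers[layer][index].text                  # fall back to current node
```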
4 Experiments

4.1 Dataset

We use the multi-session-chat dataset from (Xu et al., 2022). The dataset contains two speakers who chat online in a series of sessions, as is common, for example, on messaging platforms. Each individual chat session is not especially long before it is "paused". Then, after a certain amount of (simulated) time has transpired, typically hours or days, the speakers resume chatting, either continuing to talk about the previous subject, bringing up some other subject from their past shared history, or sparking up conversation on a new topic. The number of episodes per session is given in Table 1. We utilize the 501 episodes from the test set whose session 5 is available.

Data Type   Train   Valid   Test
Session 1   8939    1000    1015
Session 2   4000    500     501
Session 3   4000    500     501
Session 4   1001    500     501
Session 5   -       500     501

Table 1: Number of episodes across sessions.

4.2 Evaluation Metrics

We evaluate the dialogue generation performance of our model using automatic metrics. We report BLEU-1/2, F1 score compared to human-annotated responses, and DISTINCT-1/2. BLEU (Bilingual Evaluation Understudy) measures overlap between machine-generated text and human references, with values between 0 and 1 (higher is better); we use BLEU-1 and BLEU-2, which compare 1-grams and 2-grams respectively. F1 score measures overlap between generated responses and human references, and we report it to assess the relevance of the content. DISTINCT-1/2 quantifies diversity and vocabulary usage based on the number of distinct unigrams and bigrams in the generated responses, normalized by the total number of generated tokens (higher is better).
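For reference, the sketch below shows how DISTINCT-n and a token-level F1 are commonly computed; the paper does not specify the exact tokenization or scoring scripts used, so this is only an assumed implementation.

```python
from collections import Counter
from typing import List

def distinct_n(responses: List[str], n: int) -> float:
    """DISTINCT-n: distinct n-grams divided by total n-grams over all
    generated responses (higher means more diverse)."""
    ngrams = []
    for resp in responses:
        tokens = resp.split()
        ngrams += [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

def unigram_f1(hypothesis: str, reference: str) -> float:
    """Token-level F1 between a generated response and a human reference."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```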
4.3 Baselines

We benchmark against three trivial methods: 1) All Context: the LLM generates dialogues with all previous dialogue in context. 2) Part Context: the LLM generates dialogues with only the current session's context. 3) Gold Memory: the LLM generates dialogues with the gold memory from the dataset as context. Further, we also evaluate different traversal methods, including BFS, DFS, and GPTAgent. In BFS and DFS, we follow a naive BFS or DFS traversal and at every step ask GPT whether the information seen so far is enough to answer the user question; if it says yes, we stop there and return the context.
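As a sketch of the BFS variant of this baseline (the DFS variant would simply swap the queue for a stack), with `is_sufficient` standing in for the per-step GPT check described above:

```python
from collections import deque
from typing import Callable, Optional

def bfs_context(root, query: str,
                is_sufficient: Callable[[str, str], bool]) -> Optional[str]:
    """Breadth-first baseline: visit nodes level by level and stop at the
    first node whose text the checker deems enough to answer the query."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if is_sufficient(node.text, query):
            return node.text
        queue.extend(node.children)
    return None   # no node was judged sufficient
```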
5 Results

Table 2 compares GPTAgent to breadth-first search (BFS) and depth-first search (DFS) traversal methods. Across the BLEU-1/2 and DISTINCT-1/2 metrics, GPTAgent significantly outperforms both in quality and diversity of dialogues. This supports our approach of learning to traverse conditioned on query relevance over hand-designed heuristics.

            BLEU-1/2        DISTINCT-1/2
BFS         0.652 / 0.532   0.072 / 0.064
DFS         0.624 / 0.501   0.064 / 0.058
GPTAgent    0.721 / 0.612   0.092 / 0.084

Table 2: Dialogue generation comparison between traversal methods.

Next, Table 3 benchmarks GPTAgent against contexts with complete, partial, or gold dialogue history. GPTAgent achieves the highest scores, demonstrating the benefit of our focused memory retrieval. Access to the full history or gold references improves over just the current context, but lacks the efficiency of precisely identifying relevant information. Finally, Table 4 evaluates the fidelity of memories generated by GPTAgent compared to dataset ground-truth references. We again see strong results, surpassing 0.8 on the word-overlap measures (BLEU-1 and F1).

              BLEU-1/2        DISTINCT-1/2
All Context   0.612 / 0.492   0.051 / 0.042
Part Context  0.592 / 0.473   0.043 / 0.038
Gold Memory   0.681 / 0.564   0.074 / 0.064
GPTAgent      0.721 / 0.612   0.092 / 0.084

Table 3: Dialogue generation comparison between baselines.

      BLEU-1/2        DISTINCT-1/2    F1
GPT   0.842 / 0.724   0.102 / 0.094   0.824

Table 4: Memory generation scores.

In summary, the experiments validate the effectiveness of our method in extracting salient dialogue context in long-form conversations. Both the conversations and the summarized memories demonstrate quality and relevance gains over alternate approaches. The query conditioning provides efficiency over exhaustive history while retaining enough specificity for the current need.

6 Limitations

While the proposed method has potential and could work with long-form texts, the current implementation takes longer than the usual time taken by a dialogue agent to respond. Moreover, making HTTP API calls to GPT causes additional overhead on the time taken. These limitations could be overcome by turning to heuristic-based tree searches or Monte-Carlo Tree Search-like methods in the future. Further, a coupled HAT, with one HAT holding textual information and another holding dense vector representations, would be more efficient; combined with hybrid retrieval techniques, this could give a much more efficient way of doing conditional retrieval. Another limitation of this kind of retrieval system is that, as the number of leaf nodes grows, the memory footprint might become larger than expected. Optimizations on this front could also be potential future work in this direction.

7 Conclusion

In this work, we have presented the Hierarchical Aggregate Tree (HAT), a new data structure designed specifically for memory storage and retrieval for long-form, text-based conversational agents. Rather than solely providing a summary or retrieving raw excerpts, our key innovation is the recursive aggregation of salient points by traversing conditional paths in this tree. We formulate the tree traversal as an optimization problem using a GPT-based memory agent. Our experiments demonstrate significant gains over alternate traversal schemes and baseline methods. HAT introduces a flexible memory structure for dialogue agents that balances extraction breadth versus depth through hierarchical aggregation. Our analysis confirms the viability and advantages of conditional traversal over existing limited-budget solutions, opening up further avenues for augmented language model research.

References

Sanghwan Bae, Donghyun Kwak, Soyoung Kang, Min Young Lee, Sungdong Kim, Yuin Jeong, Hyeri Kim, Sang-Woo Lee, Woomyoung Park, and Nako Sung. Keep me updated! Memory management in long-term conversations.

Gibbeum Lee, Volker Hartmann, Jongho Park, Dimitris Papailiopoulos, and Kangwook Lee. Prompted LLMs as chatbot modules for long open-domain conversation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 4536-4554.

Qingyue Wang, Liang Ding, Yanan Cao, Zhiliang Tian, Shi Wang, Dacheng Tao, and Li Guo. Recursively summarizing enables long-term dialogue memory in large language models.
Jing Xu, Arthur Szlam, and Jason Weston. 2021a. Beyond goldfish memory: Long-term open-domain conversation. arXiv preprint arXiv:2107.07567.

Jing Xu, Arthur Szlam, and Jason Weston. 2022. Beyond goldfish memory: Long-term open-domain conversation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5180-5197, Dublin, Ireland. Association for Computational Linguistics.

Xinchao Xu, Zhibin Gou, Wenquan Wu, Zheng-Yu Niu, Hua Wu, Haifeng Wang, and Shihang Wang. 2021b. Long time no see! Open-domain conversation with long-term persona memory.

Tong Zhang, Yong Liu, Boyang Li, Zhiwei Zeng, Pengwei Wang, Yuan You, Chunyan Miao, and Lizhen Cui. History-aware hierarchical transformer for multi-session open-domain dialogue system.
