Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation

Drafted: December 18, 2023
arXiv submission: June 11, 2024
Abstract

We propose the Hierarchical Aggregate Tree (HAT) memory structure to recursively aggregate relevant dialogue context through conditional tree traversals. HAT encapsulates information from children nodes, enabling broad coverage with depth control. We formulate finding the best context as an optimal tree traversal. Experiments show HAT improves dialogue coherence and summary quality over baseline contexts, demonstrating the technique's effectiveness for multi-turn reasoning without exponential parameter growth. HAT balances information breadth and depth for long-form dialogues. This memory augmentation enables more consistent, grounded long-form conversations from LLMs.

1 Introduction
Large language models (LLMs) like ChatGPT are having an impact across various areas and applications. One of the most straightforward applications is using LLMs as personalized chat agents. There have been several efforts to develop chatbots for various applications, both generic and domain-specific, particularly after the advent of LLMs and associated Python libraries, which have made it very easy for people to develop their own chatbots.

Customizing and aligning these LLMs is still an active area of research. One basic alignment we would want is to make chatbots behave according to our expectations, particularly when the context is specific and requires some information to be highlighted that is not necessarily in the model's pretraining corpus. While LLMs are considered snapshots of the internet, one limitation is that this knowledge is fixed at training time and does not cover new or user-specific information. The two primary ways of customizing LLMs are through: a) finetuning and b) retrieval-augmented generation (RAG). Finetuning requires access to model weights and can only be done with open models. RAG relies on strategies to retrieve information from a datastore given a user query, without needing internal model information, allowing it to be used with more model types. However, RAG is limited by the model's context length budget. Although very large context LLMs exist, the budget remains limited. Hence, how and what data is retrieved given a user query is an important research task.

With the advent of "LLM agents", a separate memory management module is often required. Some solutions train models for this task, while others take a retrieval-based approach. Still others use the LLM itself to accomplish it. However, current approaches tend to rely solely on either providing a summary or retrieving from a datastore, with little in between. We hence propose a method that combines both worlds using a new data structure called the "Hierarchical Aggregate Tree".

1.1 Recent works

There has been growing interest in developing techniques to enhance LLMs' capabilities for long-term multi-session dialogues. (Xu et al., 2021a) collected a dataset of human conversations across multiple chat sessions and showed that existing models perform poorly in maintaining consistency, necessitating long-context models with summarization and retrieval abilities. (Xu et al., 2021b) proposed a model to extract and update conversational personas over long-term dialogues. (Bae et al.) presented a task and dataset for memory management in long chats, dealing with outdated information.
(Wang et al.) proposed recursive summarization for equipping LLMs with long-term memory. (Lee et al.) showed prompted LLMs can match finetuned models for consistent open-domain chats. (Zhang et al.) encoded conversation history hierarchically to generate informed responses. Developing more sophisticated strategies for information integration remains an open challenge.

2 Task Definition
The task at hand is straightforward. Given a history of conversations between a user and an assistant (system), we are to predict the response of the system. In other words, given the history of conversation at time step $t$, $H_t = \{u_1, a_1, u_2, a_2, \dots, a_{t-1}\}$ (where $u_i$ represents a user utterance and $a_i$ represents an assistant response) and a user query $u_t$, our task is to find a relevant function $f$ such that

$$a_t \approx \mathrm{LLM}(u_t, f(H_t \mid u_t)),$$

where $f$ can be thought of as some function mapping the entire history of the conversation to a condensed space, conditioned on the user query at time step $t$. Note that $f$ can be a selection function, a trained neural network acting as a memory agent, or even a simple summary function.
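To make this formulation concrete, the sketch below shows how a context function $f$ plugs into response generation. It is our own illustration rather than the paper's released code: respond, llm_generate, and the truncation-based stand-in for $f$ are hypothetical placeholders.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance), speaker in {"user", "assistant"}

def truncation_f(history: List[Turn], query: str, max_turns: int = 10) -> str:
    """Trivial stand-in for f(H_t | u_t): keep only the most recent turns.

    In the paper's framing, f could instead be a selection function, a trained
    memory agent, or a summary function.
    """
    recent = history[-max_turns:]
    return "\n".join(f"{speaker}: {text}" for speaker, text in recent)

def respond(history: List[Turn], query: str,
            f: Callable[[List[Turn], str], str],
            llm_generate: Callable[[str], str]) -> str:
    """Approximates a_t ~ LLM(u_t, f(H_t | u_t))."""
    context = f(history, query)
    prompt = f"Context:\n{context}\n\nUser: {query}\nAssistant:"
    return llm_generate(prompt)
```

In our approach, $f$ is realized by the memory module, which traverses the HAT introduced in Section 3 to produce the context.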
In our experiments, the dataset is organized as sessions and episodes. An episode consists of multiple consecutive dialogue sessions with a specific user, oftentimes requiring information from previous sessions in order to respond. Hence, in addition to the history $H$, at the end of every session we also have a memory $M_s^e$ for session $s$ and episode $e$, constructed by combining $H_s^e$ and $M_{s-1}^e$, where $H_s^e$ represents the history of session $s$ in episode $e$. Therefore, we also have the auxiliary task of finding $M_s^e$ given $M_{s-1}^e$ and $H_s^e$.

3 Methodology

An overview of our methodology is depicted in Figure 1. The memory module is tasked with retrieving a relevant context that contains the necessary information to respond to the user query. The module does so by leveraging two primary components:
1. Hierarchical Aggregate Tree (HAT): We propose the use of a novel data structure called HAT to manage memory, particularly in long-form texts like open-domain conversations. It has some important properties, such as resolution retention, as detailed in Section 3.1. At a high level, the intended characteristic of HAT is that we gain more resolution as we move top-down and more recent information as we move left-right.

2. Memory Agent: The memory agent is tasked with finding the best traversal in HAT, conditioned on the user query, such that at the end of the traversal we are at a node that contains relevant context to answer the user query (a minimal traversal sketch follows this list).
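As one illustration of what such a query-conditioned traversal could look like, the sketch below descends the tree greedily, asking a scoring function which child is most relevant to the query at each step. The greedy policy, the relevance_score callable, and the threshold are our own simplifying assumptions; the paper formulates the choice of context as an optimal tree traversal, whose exact policy is not reproduced here.

```python
from typing import Callable, List, Optional

class HATNode:
    """Duck-typed HAT node: a text summary plus (possibly empty) children."""
    def __init__(self, text: str, children: Optional[List["HATNode"]] = None):
        self.text = text
        self.children = children or []

def traverse(root: HATNode, query: str,
             relevance_score: Callable[[str, str], float],
             threshold: float = 0.5) -> str:
    """Greedy top-down traversal conditioned on the query (illustrative only).

    Starts at the root (broad, low-resolution summary) and descends toward the
    child whose text looks most relevant, stopping at a leaf or when no child
    clears the relevance threshold.
    """
    node = root
    while node.children:
        best = max(node.children, key=lambda c: relevance_score(c.text, query))
        if relevance_score(best.text, query) < threshold:
            break  # the current node already holds the most useful context
        node = best
    return node.text
```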
3.1 Hierarchical Aggregate Tree (HAT)

A Hierarchical Aggregate Tree is defined as $\mathrm{HAT} = (L, M, A, \Sigma)$, where $L = \{l_0, l_1, \dots, l_n\}$ is a finite set of layers, $M$ is the memory length (a positive integer), $A$ is an aggregation function, and $\Sigma$ is a set of nodes. The layers in $L$ are hierarchical, with $l_0$ being the root layer. Each layer $l_i \in L$ is a set of nodes $\Sigma_i$, where $\Sigma_i \subseteq \Sigma$. A node $\sigma \in \Sigma$ can have children nodes and contains a text element. A node recalculates its text property whenever there is a change in the node's children. The text property of a node $\sigma \in \Sigma_i$, $i \neq |L|$, is given by $A(C(\sigma))$, where $C(\sigma) = \{\tau \in \Sigma \mid \tau \text{ is a child of } \sigma\}$.
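A minimal sketch of this structure, assuming a plain Python implementation, is given below. The class and method names (HATNode.add_child, update_text) are illustrative, and the memory length $M$ is carried only as a parameter because its exact role in insertion is not spelled out in this section; the key behaviour shown is that any change to a node's children recomputes its text via the aggregate function $A$ and propagates the change up to the root.

```python
from typing import Callable, List, Optional

AggregateFn = Callable[[List[str]], str]  # A : P(Sigma) -> Text

class HATNode:
    def __init__(self, text: str = "", parent: Optional["HATNode"] = None):
        self.text = text
        self.parent = parent
        self.children: List["HATNode"] = []

    def add_child(self, child: "HATNode", aggregate: AggregateFn) -> None:
        """Attach a child; the parent's text becomes A(C(sigma))."""
        child.parent = self
        self.children.append(child)
        self.update_text(aggregate)

    def update_text(self, aggregate: AggregateFn) -> None:
        """Recalculate this node's text from its children, then propagate upward."""
        if self.children:
            self.text = aggregate([c.text for c in self.children])
        if self.parent is not None:
            self.parent.update_text(aggregate)

class HAT:
    """HAT = (L, M, A, Sigma): a root layer plus an aggregate A and a memory length M."""
    def __init__(self, aggregate: AggregateFn, memory_length: int):
        self.aggregate = aggregate
        self.memory_length = memory_length  # M; its use in insertion is not detailed here
        self.root = HATNode()

    def insert_leaf(self, parent: HATNode, text: str) -> HATNode:
        """Add a new leaf (e.g., a dialogue turn); ancestor summaries refresh automatically."""
        leaf = HATNode(text)
        parent.add_child(leaf, self.aggregate)
        return leaf
```

For example, appending a new dialogue turn as a leaf automatically refreshes every ancestor's text up to the root, which is what gives the tree coarse summaries near the top and fine-grained detail near the leaves.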
Aggregate Function

The aggregate function $A$ maps the text of child nodes to produce the text stored at a parent node:

$$A : \mathcal{P}(\Sigma) \to \mathrm{Text},$$

where $\mathcal{P}(\Sigma)$ represents the power set of nodes. The exact implementation of $A$ can vary depending on the use case; Figure 2, for example, depicts an HAT with concatenation as the aggregate function. The aggregate function should be designed to produce concise summaries reflecting key information from the child nodes for a meaningful HAT, and it executes whenever new child nodes are inserted, to maintain consistency up the tree. For our implementation, we use GPT as the aggregate function: we ask GPT to summarize personas from the children's text. The exact prompt given can be found in the appendix.

Node

A node $\sigma \in \Sigma$ represents a single element in the HAT structure. Whenever the set of children nodes for $\sigma$ changes, the update_text() operation is triggered, recomputing the node's text from its children via the aggregate function $A$.
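Two possible choices of $A$ are sketched below: plain concatenation (as in the Figure 2 example) and an LLM-based summarizer in the spirit of the GPT aggregate described above. The openai client call, model name, and summarization prompt are placeholders for illustration, not the exact prompt from the appendix.

```python
from typing import List

def concat_aggregate(child_texts: List[str]) -> str:
    """A(C(sigma)) as simple concatenation of the children's text."""
    return "\n".join(child_texts)

def llm_aggregate(child_texts: List[str]) -> str:
    """A(C(sigma)) as an LLM summary of the children's text (illustrative prompt)."""
    from openai import OpenAI  # assumes the openai package and an API key are available
    client = OpenAI()
    prompt = (
        "Summarize the key personas and facts in the following dialogue "
        "snippets, concisely:\n\n" + "\n---\n".join(child_texts)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```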
Figure 1: Overview of our approach. Given a user query, the memory module is responsible for providing a relevant context by traversing the HAT. The LLM then generates a response for the user query, given the context.