[Figure 1 content: prompt “Based on the historical profiles provided, please generate a title for the given user’s input text.” with the input from user [10000041]: “She called his foreign policy ‘dangerously incoherent’ and said he was ‘temperamentally unfit’ to serve as president.” Ground truth: “Hillary Clinton Eviscerates Donald Trump In Her Best Speech Yet”. RAG, retrieving only from user [10000041]’s history, reaches ROUGE-1 0.1905; CFRAG, which also retrieves from the histories of users [10000423] and [10000275], generates “Hillary Clinton Slams Donald Trump’s Foreign Policy” and reaches ROUGE-1 0.4211 (LLM: Llama3).]
Figure 1: An example from the LaMP-4 dataset [32]. The task of LaMP-4 is to generate personalized news headlines based on user input. This example illustrates the benefit of collaborative information for LLM personalization: (a) The top shows results retrieved by the existing RAG method from the current user’s history, where we can only infer that “She” in the user’s input refers to “Hillary Clinton”. (b) The bottom shows results retrieved by our method from similar users’ histories, allowing us to further infer that “his” in the user’s input refers to “Donald Trump”, thus enabling the generation of a more accurate result.
our method, which retrieves documents from the histories of similar users. In this case, we can further infer that “his” in the user’s input refers to “Donald Trump”, leading to a better generation result. From this example, we can see that incorporating collaborative information allows the retrieval of more diverse documents, helping the LLM generate results that better meet the user’s needs.

Inspired by the application of collaborative filtering in recommender systems [11, 40, 46], we propose to adapt collaborative information into RAG to personalize LLMs. However, adapting collaborative filtering to personalized RAG presents two challenges. Challenge 1: How to incorporate collaborative information. Without explicit labels indicating which users are similar, which users’ information should be selected to help personalize generation for the current user? Challenge 2: How to retrieve documents that support personalized LLM generation, rather than relying on traditional semantic relevance? Pre-trained dense retrieval models [54] only retrieve based on the semantic relevance between the query and the document. Directly using these models for retrieval may not necessarily result in content that allows the LLM to generate outputs that meet the user’s needs [25, 35].

To address the above challenges, this paper proposes a method named CFRAG, which adapts Collaborative Filtering to personalized Retrieval Augmented Generation. Firstly, to address Challenge 1, since there are no explicit user similarity labels, we use contrastive learning [15, 44] to train user embeddings for retrieving similar users to introduce collaborative information. Specifically, we apply different data augmentation methods to the user’s history to obtain different views, and treat different views of the same user’s history as positive samples for each other. We then use contrastive learning on these views to train the user embeddings. Secondly, for Challenge 2, we designed a personalized retriever and reranker to retrieve the top-𝑘 documents from the histories of the retrieved users. In both retrieval and reranking, in addition to the semantic relevance between the query and documents, we also considered the user’s preferences for different documents to enable personalized retrieval. Additionally, we further fine-tune the retriever and reranker based on the feedback from the LLM to ensure that the retrieved documents better support the personalized LLM generation. Finally, the top-𝑘 documents are concatenated with the user’s input query to form a prompt, which is then fed into the LLM for personalized generation.
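To illustrate the contrastive-learning idea, a minimal sketch is given below; the augmentation operators (augment_a, augment_b) and the user_encoder are hypothetical placeholders for the components detailed in Section 4.1, so this is an illustration of the training objective rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def user_contrastive_loss(view_a, view_b, tau=0.1):
    """In-batch InfoNCE: the two augmented views of the same user's history
    are positives; views of the other users in the batch act as negatives."""
    za = F.normalize(view_a, dim=-1)                  # (B, d) user embeddings, view 1
    zb = F.normalize(view_b, dim=-1)                  # (B, d) user embeddings, view 2
    logits = za @ zb.t() / tau                        # (B, B) similarity matrix
    labels = torch.arange(za.size(0), device=za.device)
    return F.cross_entropy(logits, labels)            # positives lie on the diagonal

# Hypothetical usage:
# view_a = user_encoder(augment_a(history_batch))
# view_b = user_encoder(augment_b(history_batch))
# loss = user_contrastive_loss(view_a, view_b)
```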
The major contributions of the paper are summarized as follows:
• We analyzed the necessity of introducing collaborative filtering into RAG for LLM personalization and identified the challenges: how to introduce collaborative information and how to retrieve documents that support personalized LLM generation.
• We proposed a method called CFRAG, which uses contrastive learning to train user embeddings for retrieving similar users and incorporating collaborative information. It leverages LLM feedback to train the personalized retriever and reranker, enabling them to retrieve documents that support personalized LLM generation.
• Experimental results on the Language Model Personalization (LaMP) [32] benchmark validate the effectiveness of CFRAG. The experimental analysis also demonstrates the importance of leveraging collaborative information.

2 Related Work

Personalization of LLMs. Large Language Models (LLMs) [55] have demonstrated remarkable capabilities in various fields, such as text generation [22], information retrieval [56], recommender systems [5, 41], and so on. However, since LLMs are typically designed to serve all tasks with a single model and are trained on broad, domain-agnostic data, they face challenges in adapting to the personalized needs of individual users [4, 32]. Therefore, LLM personalization has attracted widespread attention [16, 31, 57].
Figure 2: The architecture of CFRAG. From left to right: (a) User Retrieval retrieves similar users (Section 4.1); (b) Retriever
retrieves the top-𝑘 documents from each user’s history (Section 4.2); (c) Reranker reranks the 𝑚 × 𝑘 documents to get the
final top-𝑘 documents, which are then concatenated with the query and input into the LLM for personalized text generation
(Section 4.3).
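To make the flow in Figure 2 concrete, the sketch below strings the three stages together; retrieve_users, retrieve_docs, and rerank are hypothetical stand-ins for the components of Sections 4.1–4.3, and the prompt template is simplified.

```python
def cfrag_generate(user, query, llm, retrieve_users, retrieve_docs, rerank, m=3, k=5):
    """Sketch of the CFRAG pipeline in Figure 2: user retrieval ->
    per-user document retrieval -> reranking -> LLM generation."""
    similar_users = retrieve_users(user, top_m=m)                     # stage (a)
    candidates = []                                                   # m * k documents
    for u in similar_users:
        candidates.extend(retrieve_docs(query, u.history, top_k=k))   # stage (b)
    top_docs = rerank(user, query, candidates, top_k=k)               # stage (c)
    prompt = "\n".join(d.text for d in top_docs) + "\n" + query
    return llm.generate(prompt)
```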
Existing works on LLM personalization mainly include the following types of methods: (1) Fine-tuning a personalized LLM for each user [36, 37, 42]; Tan et al. [37] fine-tuned the LLM using LoRA [12] to get personalized LoRA parameters for each user. (2) Aligning LLMs with user-specific preferences through Reinforcement Learning from Human Feedback (RLHF) [16, 23, 43]; Jang et al. [16] first trained different parameters for various objectives using RLHF, then merged these parameters based on users’ personalized needs. (3) Incorporating user-specific context into the prompt [21, 27, 29, 31, 32, 57]. Richardson et al. [29] used instruction-tuned LLMs to summarize user history and then incorporated it into prompts for generation. Salemi et al. [31, 32] used RAG to retrieve relevant documents from user history based on the input query and incorporated them into the prompt.

This paper further introduces collaborative filtering for personalization based on the RAG framework. Collaborative filtering has already been applied in fields such as recommender systems [33, 34, 38, 48–52] and has been proven effective. It assumes that users who have interacted with similar items share similar preferences, and recommending items from similar users to the current user can meet their needs. Some works [11, 46] learn the collaborative information between users and items through matrix factorization [19], while others [10, 40] further explore higher-order collaborative information between users and items using graph neural networks. The application of collaborative filtering in LLM personalization remains under-explored.

Retrieval Augmented Generation. Retrieval Augmented Generation [7, 8] introduces external knowledge through document retrieval, alleviating issues such as LLM hallucinations [53] and enhancing LLMs’ capabilities in knowledge-intensive tasks [17] such as open-domain question answering [14, 20]. Some works [3, 13] encode retrieved documents using separate encoders and then fuse the results with the language model using cross-attention. A more common approach is to directly include the retrieved documents in the prompt of the LLM [2, 9, 20, 25, 35]. In recent years, this in-context RAG framework has also been applied to LLM personalization, where personalization is achieved by retrieving documents from the user’s history [31, 32, 57]. This paper introduces collaborative filtering by retrieving similar users’ histories for better personalization.

3 Problem Formulation

Let U = {𝑢_1, 𝑢_2, . . . , 𝑢_𝑀} denote the set of all users, where 𝑀 is the number of users. Each user 𝑢 ∈ U has a chronologically ordered history H_𝑢 = [𝑑_1, 𝑑_2, . . . , 𝑑_𝑁], which includes all her historical documents, where 𝑁 is the number of documents in the history. The personalized text generation dataset is D = {(𝑢, 𝑞, 𝑦)_𝑖}_{𝑖=1}^{|D|}. For each instance, 𝑞 is the query input by the user 𝑢 to the LLM, and 𝑦 is the target output. Our goal is first to introduce collaborative information by retrieving the top-𝑚 most similar users for user 𝑢:

U_retrieved = {𝑢_1, 𝑢_2, . . . , 𝑢_𝑚}.

Then, we use a retriever to retrieve the top-𝑘 documents from each of the 𝑚 users’ histories, resulting in a total of 𝑚 × 𝑘 documents:

D_retrieved = {𝑑_{𝑖,𝑗} | 𝑖 ∈ {1, . . . , 𝑚}, 𝑗 ∈ {1, . . . , 𝑘}}.

Finally, we use a reranker to rerank these 𝑚 × 𝑘 documents and obtain the final top-𝑘 documents:

D_reranked = {𝑑_𝑖 | 𝑖 ∈ {1, . . . , 𝑘}}.

These top-𝑘 documents will be concatenated with the user’s query 𝑞 as a prompt and input into the LLM, enabling it to generate a response that aligns with the target output 𝑦. This paper primarily focuses on how to retrieve U_retrieved to introduce collaborative information, and how to train the retriever and reranker so that they can effectively retrieve documents that support the personalized LLM generation.

4 Our Approach

This section introduces our method CFRAG. CFRAG’s overall architecture is shown in Figure 2. As mentioned in Section 1, to address
Figure 4: The method of training the retriever and reranker using LLM feedback.

between the query and the candidate documents:

𝑆^{retriever}_{𝑞,𝑑} = cos(Encoder_𝑞(𝑞), Encoder_𝑑(𝑑)),   (3)

where Encoder_𝑞(·) → R^𝑑 and Encoder_𝑑(·) → R^𝑑 are the encoders for the query and the document in the retrieval model, respectively. Pre-trained retrieval models typically use 𝑆^{retriever}_{𝑞,𝑑} directly for retrieval. However, 𝑆^{retriever}_{𝑞,𝑑} only considers the semantic relevance between the query and the document. Since different users might input the same query but expect different outputs due to their varying preferences, we further account for user personalization by calculating the preference score of the user for the document as follows:

𝑆^{retriever}_{𝑢,𝑑} = cos(MLP_1(e_𝑢), Encoder_𝑑(𝑑)),   (4)

where MLP_1 : R^𝑑 → R^𝑑 is a multi-layer perceptron that maps the user embedding to the space where the cosine similarity is computed, and e_𝑢 is the embedding obtained in Section 4.1.1. The total score for retrieval is computed as follows:

𝑆^{retriever}_{𝑢,𝑞,𝑑} = (1 − 𝛼) 𝑆^{retriever}_{𝑞,𝑑} + 𝛼 𝑆^{retriever}_{𝑢,𝑑},   (5)

where 𝛼 is a hyper-parameter that controls the weight of personalization.

4.2.2 Training. Since the pre-trained dense retrieval model is not fine-tuned for our specific task, the retrieved results may not necessarily lead to LLM responses that better match the target output 𝑦 [25, 35]. However, there is no ground truth indicating which documents are better. Therefore, we evaluate the difference between the LLM’s output and the target output 𝑦, using this as a label to train the retrieval model. Figure 4 shows the process of training the retriever using LLM feedback.

Specifically, we first use the pre-trained retrieval model to retrieve the top-𝑘 documents from each of the 𝑚 users’ histories based on 𝑆^{retriever}_{𝑞,𝑑} in Eq. (3), resulting in a total of 𝑚 × 𝑘 candidate documents. These documents are then concatenated with the query one by one and used as prompts for the LLM, producing 𝑚 × 𝑘 outputs:

{𝑂_{𝑞,𝑑_{𝑖,𝑗}} = LLM(𝑞, 𝑑_{𝑖,𝑗}) | 𝑖 ∈ {1, . . . , 𝑚}, 𝑗 ∈ {1, . . . , 𝑘}},

where LLM(𝑞, 𝑑_{𝑖,𝑗}) represents the output generated by inputting the concatenated query 𝑞 and document 𝑑_{𝑖,𝑗} into the LLM. Then, based on the quality of these outputs, we can calculate the distribution of these candidate documents:

𝑝_{LLM}(𝑑_{𝑖,𝑗} | 𝑞, 𝑦) = exp(eval(𝑂_{𝑞,𝑑_{𝑖,𝑗}}, 𝑦)) / Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑘} exp(eval(𝑂_{𝑞,𝑑_{𝑖,𝑗}}, 𝑦)),   (6)

where eval(·) measures the difference between the target output 𝑦 and the LLM’s output, using metrics such as the ROUGE [24] score. A larger value returned by eval(·) indicates a better-generated result. Similarly, we can also calculate the score distribution of the candidate documents by the retrieval model based on 𝑆^{retriever}_{𝑢,𝑞,𝑑} in Eq. (5):

𝑝_{retriever}(𝑑_{𝑖,𝑗} | 𝑞, 𝑢) = exp(𝑆^{retriever}_{𝑢,𝑞,𝑑_{𝑖,𝑗}}) / Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑘} exp(𝑆^{retriever}_{𝑢,𝑞,𝑑_{𝑖,𝑗}}).   (7)

We aim for the retrieval model to retrieve documents that lead to better LLM-generated results, which means making the distribution 𝑝_{retriever}(𝑑 | 𝑞, 𝑢) in Eq. (7) closer to the distribution 𝑝_{LLM}(𝑑 | 𝑞, 𝑦) in Eq. (6). Therefore, we compute the KL divergence between the two distributions as the loss to optimize the retriever:

L_{retriever} = KL(𝑝_{retriever}(𝑑 | 𝑞, 𝑢) || 𝑝_{LLM}(𝑑 | 𝑞, 𝑦)).   (8)
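As a concrete illustration of Eqs. (3)–(8), the following sketch shows the personalized retrieval score and the feedback-based KL loss. It is a minimal sketch, not the released implementation: encoder outputs, the user embedding e_u, and the eval(·) scores (e.g., ROUGE against 𝑦) are assumed to be precomputed tensors, and mlp1 is any R^𝑑 → R^𝑑 module.

```python
import torch
import torch.nn.functional as F

def retrieval_score(q_emb, d_emb, e_u, mlp1, alpha=0.5):
    """Total retrieval score of Eq. (5): a weighted sum of the
    query-document relevance (Eq. 3) and the user preference (Eq. 4)."""
    s_qd = F.cosine_similarity(q_emb, d_emb, dim=-1)        # Eq. (3)
    s_ud = F.cosine_similarity(mlp1(e_u), d_emb, dim=-1)    # Eq. (4)
    return (1 - alpha) * s_qd + alpha * s_ud                # Eq. (5)

def retriever_feedback_loss(retriever_scores, eval_scores):
    """KL(p_retriever || p_LLM) of Eq. (8) over the m*k candidate documents.

    retriever_scores: tensor of shape (m*k,), S^retriever_{u,q,d} from Eq. (5)
    eval_scores:      tensor of shape (m*k,), eval(O_{q,d}, y) from Eq. (6)
    """
    p_ret = F.softmax(retriever_scores, dim=-1)             # Eq. (7)
    log_p_ret = F.log_softmax(retriever_scores, dim=-1)
    log_p_llm = F.log_softmax(eval_scores, dim=-1)          # Eq. (6)
    # KL(p_ret || p_llm) = sum_d p_ret * (log p_ret - log p_llm)
    return (p_ret * (log_p_ret - log_p_llm)).sum()
```

The same loss form, with the reranker scores of Eq. (10) in place of the retriever scores, yields the reranker objective in Eq. (12).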
4.3 Document Rerank

After retrieving D_retrieved through the retriever, in this section we further refine the results by reranking D_retrieved to obtain the final top-𝑘 ranked results D_reranked = {𝑑_𝑖 | 𝑖 ∈ {1, . . . , 𝑘}}.

4.3.1 Reranker. We use a pre-trained cross-encoder (such as the BGE reranker [45]) to encode the query and document, obtaining the hidden state corresponding to the [CLS] token from the last layer:

h_{𝑞,𝑑} = CrossEncoder(𝑞, 𝑑),   (9)

where h_{𝑞,𝑑} ∈ R^𝑑. Similarly, when reranking, in addition to considering the semantic relevance between the query and the document, we also take into account the user’s personalized preferences. However, since the cross-encoder does not encode documents separately, it cannot compute the cosine similarity between users and documents as in Eq. (4) to express the user preference score. Therefore, we directly concatenate the user embedding to the output of the cross-encoder to account for the influence of user preferences. The overall score used for reranking is calculated as follows:

𝑆^{reranker}_{𝑢,𝑞,𝑑} = MLP_3(CONCAT(h_{𝑞,𝑑}, MLP_2(e_𝑢))),   (10)

where MLP_2 : R^𝑑 → R^𝑑 and MLP_3 : R^{2𝑑} → R are two multi-layer perceptrons, and CONCAT(·) denotes the concatenation operation.
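A minimal sketch of this reranker scoring is shown below; the cross_encoder is assumed to return the last-layer [CLS] state of Eq. (9), and the internal structure of MLP_2 and MLP_3 beyond their input and output dimensions is our own choice, not specified by the paper.

```python
import torch
import torch.nn as nn

class PersonalizedReranker(nn.Module):
    """Score of Eq. (10): concatenate the [CLS] state h_{q,d} with the
    projected user embedding MLP2(e_u), then map to a scalar with MLP3."""

    def __init__(self, cross_encoder, d=768):
        super().__init__()
        self.cross_encoder = cross_encoder   # (query, doc) -> h_{q,d} in R^d, Eq. (9)
        self.mlp2 = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.mlp3 = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, query, doc, e_u):
        h_qd = self.cross_encoder(query, doc)              # (batch, d) [CLS] state
        fused = torch.cat([h_qd, self.mlp2(e_u)], dim=-1)  # (batch, 2d), e_u: (batch, d)
        return self.mlp3(fused).squeeze(-1)                # S^reranker_{u,q,d}
```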
4.3.2 Training. Similar to the retriever’s training in Section 4.2.2, we also want the reranker to assign higher scores to the documents that lead to better LLM-generated results. Therefore, we train the reranker using a similar approach.

We use the trained retrieval model from Section 4.2.2 to retrieve the top-𝑘 documents from the history of each of the 𝑚 users, resulting in a total of 𝑚 × 𝑘 candidate documents. These documents are concatenated with the query 𝑞 and used as prompts for the LLM, producing 𝑚 × 𝑘 outputs. Similar to Eq. (6), we can obtain the distribution 𝑝_{LLM}(𝑑 | 𝑞, 𝑦) of these candidate documents.
Based on 𝑆^{reranker}_{𝑢,𝑞,𝑑} in Eq. (10), we can also get the score distribution of the candidate documents by the reranker:

𝑝_{reranker}(𝑑_{𝑖,𝑗} | 𝑞, 𝑢) = exp(𝑆^{reranker}_{𝑢,𝑞,𝑑_{𝑖,𝑗}}) / Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑘} exp(𝑆^{reranker}_{𝑢,𝑞,𝑑_{𝑖,𝑗}}).   (11)

We compute the KL divergence between the distributions 𝑝_{reranker}(𝑑 | 𝑞, 𝑢) and 𝑝_{LLM}(𝑑 | 𝑞, 𝑦) as the loss to optimize the reranker:

L_{reranker} = KL(𝑝_{reranker}(𝑑 | 𝑞, 𝑢) || 𝑝_{LLM}(𝑑 | 𝑞, 𝑦)).   (12)

This loss allows the reranker to assign higher scores to documents that enable better personalized generation by the LLM.

4.4 Discussion

Computational Efficiency. CFRAG comprises three modules. The User Encoder is a lightweight, single-layer Transformer whose inputs are derived from a frozen BGE embedding (dimension 768), resulting in minimal parameter overhead. The retriever and reranker are comparable in size to BERT (approximately 100M parameters). Overall, the training cost is low due to the modest parameter size. During inference, user and document embeddings can be precomputed, requiring only similarity calculations for retrieval, ensuring minimal computational cost. This efficiency enables our method to generalize quickly to new datasets.

Table 1: Statistics of the datasets used in this paper.

Dataset   LaMP-1   LaMP-2   LaMP-3   LaMP-4   LaMP-5   LaMP-7
#Users    6,542    929      20,000   1,643    14,682   13,437
#Train    6,542    5,073    20,000   12,500   14,682   13,437
#Dev      1,500    1,410    2,500    1,500    1,500    1,498
#Test     1,500    1,557    2,500    1,800    1,500    1,500

5.1.3 Baselines. In this work, we compare CFRAG with the following methods.

No Personalization: We directly input the user’s query into the LLM without retrieving from user history, using this as the non-personalized baseline. We refer to this method as Zero Shot.

Personalized Baselines: We compared CFRAG with methods that personalize by retrieving from user history using different retrieval models, including: (1) Random selects 𝑘 items randomly from the user’s history; (2) Recency selects the most recent 𝑘 items from the user’s history; (3) BM25 [30] retrieves the top-𝑘 items from the user’s history using BM25; (4) BGE [45] retrieves the top-𝑘 items from the user’s history using the BGE retriever; (5) ROPG [31] optimizes the dense retrieval model based on the results generated by the LLM.

5.1.4 Implementation Details. We conducted experiments on two LLMs: Llama3-8B-Instruct [1] and Qwen2-7B-Instruct [47]. In this paper, we do not fine-tune the LLM because fine-tuning is costly and could cause the LLM to retain user information, potentially compromising user privacy. To ensure a fair comparison, we use greedy search for text generation. The dense retrieval model used in all methods is bge-base-en-v1.5 [45]. The cross-encoder used for the reranker in Section 4.3.1 is bge-reranker-base [45]. All hyper-parameters for the baselines are searched according to the settings in the original papers. The embedding dimension 𝑑 is set to 768. The number of retrieved documents 𝑘 is set to 5, and the number of retrieved users 𝑚 is tuned among {2, 3, 4, 5, 6}. The Trm(·) encoder in Eq. (1) has 1 layer and 2 heads. The hyper-parameters 𝐿_𝑐, 𝐿_𝑚, and 𝐿_𝑟 used for data augmentation in Section 4.1.2 are set to 0.7, 0.3, and 0.3, respectively. The temperature parameter 𝜏_1 in Eq. (2) is tuned among {0.01, 0.1, 1}. The weight 𝛼 in Eq. (5) is tuned within [0.01, 1.0]. The learning rate is tuned among {1e-3, 1e-4, 1e-5}. Adam [18] is used to conduct the optimization. The data input and output formats are provided in Appendix A.
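For convenience, the settings reported above can be collected in one place; list values are the stated search ranges and single values are the fixed choices. This summary only restates Section 5.1.4.

```python
# Summary of the reported CFRAG hyper-parameters (Section 5.1.4).
CFRAG_CONFIG = {
    "llms": ["Llama3-8B-Instruct", "Qwen2-7B-Instruct"],
    "dense_retriever": "bge-base-en-v1.5",
    "cross_encoder": "bge-reranker-base",
    "embedding_dim_d": 768,
    "retrieved_docs_k": 5,
    "retrieved_users_m": [2, 3, 4, 5, 6],          # tuned
    "trm_encoder": {"layers": 1, "heads": 2},       # Trm(.) in Eq. (1)
    "augmentation": {"L_c": 0.7, "L_m": 0.3, "L_r": 0.3},
    "temperature_tau1": [0.01, 0.1, 1],             # tuned, Eq. (2)
    "alpha_range": (0.01, 1.0),                     # tuned, Eq. (5)
    "learning_rate": [1e-3, 1e-4, 1e-5],            # tuned
    "optimizer": "Adam",
}
```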
Table 2: Comparison of the performance of CFRAG with other approaches on the LaMP benchmark. ↑ indicates that a higher
value for the corresponding metric is better, while ↓ indicates that a lower value is better. The best and the second-best methods
are highlighted in bold and underlined fonts, respectively. “*” indicates improvements over the second-best methods are
statistically significant (𝑡-test, 𝑝-value< 0.05).
Table 3: Ablation Study of CFRAG on LaMP based on Llama3. “MEAN” represents using the average of user history document
embeddings as the user embedding. “w/o” indicates the corresponding module in CFRAG is removed.
can retrieve the documents that meet the personalized generation needs of the LLM is crucial.
Figure 6: Results using different retrievers and rerankers. “BM25” indicates using BM25 as both the retriever and reranker, while “w/o Tuning” refers to using pre-trained retrievers and rerankers without LLM feedback fine-tuning.

Figure 8: Performance under different numbers of retrieved users. The performance is the worst since no collaborative information is introduced when 𝑚 = 1.
Table 4: The format of input, output, and user history for different datasets in the LaMP [32] benchmark. In the input, {history𝑖 }
will be replaced by the retrieved 𝑖-th history, and each history is represented as shown in the “User History” column. The other
italicized text in the input is replaced with the user’s input. For text generation tasks, to ensure that the LLM does not generate
irrelevant information, we instruct the LLM in the input to generate in JSON format, and then we extract the LLM’s prediction
from the JSON-formatted output.
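Where the prompt requests JSON output, the prediction can be pulled out with a small parser such as the sketch below; the key name "title" is only an illustrative placeholder, since the actual field depends on the task template in Table 4.

```python
import json
import re

def extract_prediction(llm_output: str, key: str = "title") -> str:
    """Extract the prediction field from a JSON-formatted LLM output.
    The key name is task-dependent (see Table 4); "title" is only an
    illustrative default. Falls back to the raw text if parsing fails."""
    match = re.search(r"\{.*\}", llm_output, flags=re.DOTALL)
    if match is None:
        return llm_output.strip()
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return llm_output.strip()
    if isinstance(parsed, dict) and key in parsed:
        return str(parsed[key])
    return llm_output.strip()
```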
References
[1] AI@Meta. 2024. Llama 3 Model Card. https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md
[2] Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. [n. d.]. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. In The Twelfth International Conference on Learning Representations.
[3] Sebastian Borgeaud, Arthur Mensch, et al. 2022. Improving language models by retrieving from trillions of tokens. In International Conference on Machine Learning. PMLR, 2206–2240.
[4] Jin Chen, Zheng Liu, et al. 2024. When large language models meet personalization: Perspectives of challenges and opportunities. World Wide Web 27, 4 (2024), 42.
[5] Sunhao Dai, Ninglu Shao, et al. 2023. Uncovering ChatGPT's capabilities in recommender systems. In Proceedings of the 17th ACM Conference on Recommender Systems. 1126–1132.
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).
[7] Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. 2024. A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 6491–6501.
[8] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997 (2023).
[9] Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In International Conference on Machine Learning. PMLR, 3929–3938.
[10] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 639–648.
[11] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web. 173–182.
[12] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. [n. d.]. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations.
[13] Gautier Izacard and Édouard Grave. 2021. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 874–880.
[14] Gautier Izacard, Patrick Lewis, et al. 2022. Few-shot learning with retrieval augmented language models. arXiv preprint arXiv:2208.03299 1, 2 (2022), 4.
[15] Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, and Fillia Makedon. 2020. A survey on contrastive self-supervised learning. Technologies 9, 1 (2020), 2.
[16] Joel Jang, Seungone Kim, et al. 2023. Personalized soups: Personalized large language model alignment via post-hoc parameter merging. arXiv preprint arXiv:2310.11564 (2023).
[17] Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel. 2023. Large language models struggle to learn long-tail knowledge. In International Conference on Machine Learning. PMLR, 15696–15707.
[18] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[19] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
[20] Patrick Lewis, Ethan Perez, et al. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474.
[21] Cheng Li, Mingyang Zhang, Qiaozhu Mei, Yaqing Wang, Spurthi Amba Hombaiah, Yi Liang, and Michael Bendersky. 2023. Teach LLMs to Personalize–An Approach inspired by Writing Education. arXiv preprint arXiv:2308.07968 (2023).
[22] Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. 2024. Pre-trained language models for text generation: A survey. Comput. Surveys 56, 9 (2024), 1–39.
[23] Xinyu Li, Zachary C Lipton, and Liu Leqi. 2024. Personalized language modeling from personalized human feedback. arXiv preprint arXiv:2402.05133 (2024).
[24] Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out. 74–81.
[25] Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Richard James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, et al. [n. d.]. RA-DIT: Retrieval-Augmented Dual Instruction Tuning. In The Twelfth International Conference on Learning Representations.
[26] Yinhan Liu. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[27] Sheshera Mysore, Zhuoran Lu, et al. 2023. Pearl: Personalizing large language model writing assistants with generation-calibrated retrievers. arXiv preprint arXiv:2311.09180 (2023).
[28] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018).
[29] Chris Richardson, Yao Zhang, Kellen Gillespie, Sudipta Kar, Arshdeep Singh, Zeynab Raeesy, Omar Zia Khan, and Abhinav Sethy. 2023. Integrating summarization and retrieval for enhanced personalization via large language models. arXiv preprint arXiv:2310.20081 (2023).
[30] Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. Nist Special Publication Sp 109 (1995), 109.
[31] Alireza Salemi, Surya Kallumadi, and Hamed Zamani. 2024. Optimization methods for personalizing large language models through retrieval augmentation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 752–762.
[32] Alireza Salemi, Sheshera Mysore, Michael Bendersky, and Hamed Zamani. 2023. LaMP: When large language models meet personalization. arXiv preprint arXiv:2304.11406 (2023).
[33] Chenglei Shen, Xiao Zhang, Teng Shi, Changshuo Zhang, Guofu Xie, and Jun Xu. 2024. A survey of controllable learning: Methods and applications in information retrieval. arXiv preprint arXiv:2407.06083 (2024).
[34] Teng Shi, Zihua Si, Jun Xu, Xiao Zhang, Xiaoxue Zang, Kai Zheng, Dewei Leng, Yanan Niu, and Yang Song. 2024. UniSAR: Modeling User Transition Behaviors between Search and Recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1029–1039.
[35] Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Richard James, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. 2024. REPLUG: Retrieval-Augmented Black-Box Language Models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 8364–8377.
[36] Zhaoxuan Tan, Zheyuan Liu, and Meng Jiang. 2024. Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts. arXiv preprint arXiv:2406.10471 (2024).
[37] Zhaoxuan Tan, Qingkai Zeng, Yijun Tian, Zheyuan Liu, Bing Yin, and Meng Jiang. 2024. Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning. arXiv:2402.04401 [cs.CL] https://arxiv.org/abs/2402.04401
[38] Jiakai Tang, Sunhao Dai, Teng Shi, Jun Xu, Xu Chen, Wen Chen, Wu Jian, and Yuning Jiang. 2025. Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation. arXiv:2503.22675 [cs.IR] https://arxiv.org/abs/2503.22675
[39] A Vaswani. 2017. Attention is all you need. Advances in Neural Information Processing Systems (2017).
[40] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural graph collaborative filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 165–174.
[41] Likang Wu, Zhi Zheng, Zhaopeng Qiu, Hao Wang, Hongchao Gu, Tingjia Shen, Chuan Qin, Chen Zhu, Hengshu Zhu, Qi Liu, et al. 2024. A survey on large language models for recommendation. World Wide Web 27, 5 (2024), 60.
[42] Xinghao Wu, Xuefeng Liu, Jianwei Niu, Haolin Wang, Shaojie Tang, and Guogang Zhu. 2024. FedLoRA: When Personalized Federated Learning Meets Low-Rank Adaptation. (2024).
[43] Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A Smith, Mari Ostendorf, and Hannaneh Hajishirzi. 2024. Fine-grained human feedback gives better rewards for language model training. Advances in Neural Information Processing Systems 36 (2024).
[44] Zhuofeng Wu, Sinong Wang, Jiatao Gu, Madian Khabsa, Fei Sun, and Hao Ma. 2020. CLEAR: Contrastive learning for sentence representation. arXiv preprint arXiv:2012.15466 (2020).
[45] Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff. 2023. C-Pack: Packaged Resources To Advance General Chinese Embedding. arXiv:2309.07597 [cs.CL]
[46] Hong-Jian Xue, Xinyu Dai, Jianbing Zhang, Shujian Huang, and Jiajun Chen. 2017. Deep matrix factorization models for recommender systems. In IJCAI, Vol. 17. Melbourne, Australia, 3203–3209.
[47] An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, et al. 2024. Qwen2 technical report. arXiv preprint arXiv:2407.10671 (2024).
[48] Changshuo Zhang, Teng Shi, Xiao Zhang, Qi Liu, Ruobing Xie, Jun Xu, and Ji-Rong Wen. 2024. Modeling Domain and Feedback Transitions for Cross-Domain Sequential Recommendation. arXiv preprint arXiv:2408.08209 (2024).
[49] Changshuo Zhang, Teng Shi, Xiao Zhang, Yanping Zheng, Ruobing Xie, Qi Liu, Jun Xu, and Ji-Rong Wen. 2024. QAGCF: Graph Collaborative Filtering for Q&A Recommendation. arXiv preprint arXiv:2406.04828 (2024).
[50] Changshuo Zhang, Xiao Zhang, Teng Shi, Jun Xu, and Ji-Rong Wen. 2025. Test-Time Alignment for Tracking User Interest Shifts in Sequential Recommendation. arXiv:2504.01489 [cs.IR] https://arxiv.org/abs/2504.01489
[51] Kepu Zhang, Teng Shi, Sunhao Dai, Xiao Zhang, Yinfeng Li, Jing Lu, Xiaoxue Zang, Yang Song, and Jun Xu. 2024. SAQRec: Aligning Recommender Systems to User Satisfaction via Questionnaire Feedback. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 3165–3175.
[52] Xiao Zhang, Teng Shi, Jun Xu, Zhenhua Dong, and Ji-Rong Wen. 2024. Model-Agnostic Causal Embedding Learning for Counterfactually Group-Fair Recommendation. IEEE Transactions on Knowledge and Data Engineering (2024).
[53] Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, et al. 2023. Siren's song in the AI ocean: A survey on hallucination in large language models. arXiv preprint arXiv:2309.01219 (2023).
[54] Wayne Xin Zhao, Jing Liu, Ruiyang Ren, and Ji-Rong Wen. 2024. Dense text retrieval based on pretrained language models: A survey. ACM Transactions on Information Systems 42, 4 (2024), 1–60.
[55] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).
[56] Yutao Zhu, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Haonan Chen, Zhicheng Dou, and Ji-Rong Wen. 2023. Large language models for information retrieval: A survey. arXiv preprint arXiv:2308.07107 (2023).
[57] Yuchen Zhuang, Haotian Sun, Yue Yu, Qifan Wang, Chao Zhang, and Bo Dai. 2024. HYDRA: Model Factorization Framework for Black-Box LLM Personalization. arXiv preprint arXiv:2406.02888 (2024).