
Integrating Summarization and Retrieval for Enhanced Personalization via Large Language Models

Christopher Richardson∗, Georgia Institute of Technology, Atlanta, GA, USA
Yao Zhang∗, Amazon Alexa AI, Austin, TX, USA
Kellen Gillespie, Amazon Alexa AI, Seattle, WA, USA
Sudipta Kar, Amazon Alexa AI, Seattle, WA, USA
Arshdeep Singh, Amazon Alexa AI, Seattle, WA, USA
Zeynab Raeesy, Amazon Alexa AI, Seattle, WA, USA
Omar Zia Khan, Amazon Alexa AI, Seattle, WA, USA
Abhinav Sethy, Amazon Alexa AI, Seattle, WA, USA

ABSTRACT
Personalization, the ability to tailor a system to individual users, is an essential factor in user experience with natural language processing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences. To personalize a language model's output, a straightforward approach is to incorporate past user data into the language model prompt, but this approach can result in lengthy inputs that exceed input length limitations and incur latency and cost issues. Existing approaches tackle these challenges by selectively extracting relevant user data (i.e., selective retrieval) to construct a prompt for downstream tasks. However, retrieval-based methods are limited by potential information loss, a lack of deeper user understanding, and cold-start challenges. To overcome these limitations, we propose a novel summary-augmented approach that extends retrieval-augmented personalization with task-aware user summaries generated by LLMs. The summaries can be generated and stored offline, enabling real-world systems with runtime constraints, such as voice assistants, to leverage the power of LLMs. Experiments show that our method, using 75% less retrieved user data, is on par with or outperforms retrieval augmentation on most tasks in the LaMP personalization benchmark. We demonstrate that offline summarization via LLMs combined with runtime retrieval enables better personalization performance on a range of tasks under practical constraints.

CCS CONCEPTS
• Computing methodologies → Natural language generation; Information extraction; • Information systems → Personalization.

KEYWORDS
LLM, Personalization, Summarization, NLP, Chatbot, Voice Assistant, Conversational AI

ACM Reference Format:
Christopher Richardson, Yao Zhang, Kellen Gillespie, Sudipta Kar, Arshdeep Singh, Zeynab Raeesy, Omar Zia Khan, and Abhinav Sethy. 2023. Integrating Summarization and Retrieval for Enhanced Personalization via Large Language Models. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

∗ Both authors contributed equally to this research.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Conference'17, July 2017, Washington, DC, USA
© 2023 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
As virtual assistants and other natural language processing (NLP) systems become increasingly integrated into our daily lives, personalization has become an essential factor in user experience. Tailoring virtual assistant interactions and NLP model outputs to individual users' preferences, styles, needs, and contexts is essential to improving the performance of these systems and making them more natural and conversational.

Traditional personalization methods, such as collaborative filtering [17], deep neural networks [8], deep interest networks [22], and their variations [4, 12], have enhanced user experiences in recommendation systems. These methods leverage historical user behavior data to make personalized recommendations, offering a practical and effective solution for various domains. Despite their success, these methods still struggle with the cold-start problem, where new users lack sufficient behavior history, leading to suboptimal recommendations. The cold-start problem highlights the need for alternative approaches.

Large Language Models (LLMs) represent a promising avenue for advancing personalization techniques.
LLMs have demonstrated remarkable capabilities in understanding context and generating coherent text [2]. By incorporating knowledge about users, LLMs can potentially enhance personalization by capturing subtle user preferences, but how to capture the full spectrum of user preferences in a personalized manner remains a challenge. To personalize a language model's output, a straightforward approach is to incorporate user data into the language model prompt. However, incorporating a comprehensive view of customer preferences with long-term historical user data into the prompt may exceed the input length limitations of language models and result in considerable increases in inference cost. Further, language models tend to degrade with lengthy contexts [3]. To address these concerns, a personalized retrieval augmentation framework was proposed [16]. This framework selectively extracts relevant user data to construct prompts for downstream language models. Recent work has also shown promise in combining retrieval approaches with LLMs to improve performance in recommender systems [5, 11, 20], as well as general NLP tasks [9, 10, 13, 14, 19]. However, retrieval-based methods suffer from potential information loss, lack the ability to comprehend user data at a deeper level, and may suffer from the cold-start problem.

Our research aims to address the aforementioned limitations of both traditional personalization methods and retrieval-based methods with LLMs by proposing the hybrid approach shown in Figure 1. By integrating retrieval techniques with LLM-generated summaries of user data, we intend to create a more robust personalized system. To prevent information loss, the user summary offers contextual information at a higher level of abstraction for the downstream task. To understand user data at a deeper level, summary generation is task-aware: information about the task is incorporated into the prompt used for summary generation. For example, for a personalized paraphrase generation task, the summary model is instructed by a prompt to pay attention to the user's writing style in addition to the semantic content. This hybrid model can also overcome the cold-start problem and provide personalized outputs even in data-sparse scenarios by producing summaries for new users from user data available from other applications or from the user's self-description. The summaries in our approach can be generated offline and stored, ensuring negligible added runtime latency and enabling systems with runtime constraints to leverage the power of LLMs in real-world online applications, such as voice assistant scenarios.

[Figure 1: Personalization is achieved by combining runtime-retrieved samples with an offline-generated user summary. Given a textual input x that describes a task in natural language, the goal is to generate a personalized output y for users. The retrieval model identifies the most relevant items from user data, and the retrieved items, along with the offline user summary and x, form the basis for creating a prompt. This prompt is constructed using a prompt construction function φ_p.]

We demonstrate our method of integrating summarization and retrieval on the publicly available Language Model Personalization (LaMP) benchmark [16], which includes both text classification and generation tasks across a variety of domains. Experiments show that our method achieves comparable or better performance than retrieval augmentation on most tasks. With our method, the retrieval component can use 75% less retrieved user data without sacrificing performance on five out of six tasks, and achieves superior performance on two tasks.

In summary, our main contributions are as follows. First, we propose augmenting traditional retrieval-based personalization methods with LLM summarization of user data to address the limitations of existing methods: potential information loss, the inability to understand user data at a high level, and the cold-start challenge. Our method enables powerful LLMs to provide comprehensive information about users with no additional runtime latency. Second, we implemented our proposed approach and conducted experiments on the LaMP language model personalization benchmark with six public tasks. With the promise shown in our experimental results, we envision a personalized system that better caters to individual user preferences, especially for new users, by integrating summarization via LLMs and retrieval.

2 METHODOLOGY

2.1 Problem Formulation
The problem formulation follows LaMP [16]: given a textual input x that describes a task in natural language, we want to generate a personalized output y for user u. The goal is thus to learn the distribution p(y|x, u).

2.1.1 Baseline. Our baseline is a retrieval-augmented method that follows a retrieve-then-model paradigm [16]. For retrieval, a manually defined query generation function φ_q(x) is first used to extract salient information from x as input to the retrieval model R(φ_q(x), P_u, k). The retrieval model returns the top-k relevant items from the user profile P_u using the retrieval query φ_q(x_i), and the returned items are used to construct a prompt for a downstream model using a prompt construction function φ_p. The input to the downstream language model is as follows:

x̄_i = φ_p(x_i, R(φ_q(x_i), P_u, k))    (1)

The downstream language model is fine-tuned on the dataset {x̄_i, y_i}.

2.2 Integrating Summarization and Retrieval
Runtime constraints can limit the number of user data items retrieved by R that can be utilized. In this work, we consider scenarios with both latency considerations and input length limits for the downstream model. To improve performance without adding runtime latency, we introduce a summary of the user data, s_u, to augment the retrieved data.
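To make the runtime step concrete, the following stdlib-only sketch scores profile items with BM25 (the retrieval algorithm used in our experiments) and assembles a summary-augmented prompt in the spirit of φ_p. The BM25 parameters, the whitespace tokenization, and the prompt wording are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def bm25_scores(query, profile, k1=1.5, b=0.75):
    """Score every user-profile item against a query with BM25.
    k1 and b are the common defaults, an assumption rather than
    the settings used in the paper."""
    docs = [item.lower().split() for item in profile]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # document frequency of each term across the profile
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def build_prompt(x, profile, user_summary, k=1):
    """phi_p: concatenate the top-k retrieved profile items, the
    offline user summary s_u, and the task input x. The exact
    prompt format here is an illustrative assumption."""
    scores = bm25_scores(x, profile)
    top_k = sorted(range(len(profile)), key=scores.__getitem__, reverse=True)[:k]
    context = "; ".join([profile[i] for i in top_k] + [user_summary])
    return f"{context}\n{x}"
```

Here the summary string stands in for s_u = LLM(P_u), generated once offline; at request time only BM25 scoring and string concatenation run, which is what keeps the added latency negligible.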
The input to the downstream language model then becomes:

x̄_i = φ_p(x_i, R(φ_q(x_i), P_u, k), s_u)    (2)

Our approach uses LLMs to summarize salient information from u as it relates to optimizing p(y|x, u). We use instruction-tuned models to generate an abstractive summary of the user data:

s_u = LLM(P_u)    (3)

An overview of our method is shown in Figure 1. The summaries can be generated offline and stored along with the user data itself. At runtime, the retrieval algorithm retrieves the top-k profile entries and concatenates them with the task input and the summary to create the full context for the downstream language model, which is fine-tuned using the standard language modeling loss against the output y.

3 EXPERIMENTS

3.1 Datasets and Evaluation Metrics
LaMP is a public benchmark dataset for training and evaluating methods for personalization with language models [16]. It consists of seven personalization NLP tasks, including three classification tasks and four text generation tasks. Data for each task include input text, reference completion text as ground-truth output, and a user profile consisting of an array of items with textual data. A brief description of each task and its evaluation metrics is shown in Table 1. We have excluded task LaMP-6 from our study as it relies on private data to which we do not have access. The LaMP benchmark organizes the data with both user-based separation and time-based separation. In our study, we utilize the user-based separation approach to address cold-start issues for new users.

Table 1: Description of LaMP tasks and data.

LaMP-1
  Task: Citation Identification (binary choice)
  User Profile: Scholarly article titles and abstracts
  Metric(s): Accuracy

LaMP-2
  Task: News Categorization (classification)
  User Profile: Categorized articles published
  Metric(s): Accuracy and F1

LaMP-3
  Task: Product Rating (classification)
  User Profile: Product reviews and scores
  Metric(s): Mean Absolute Error (MAE) and Root Mean Square Error (RMSE)

LaMP-4
  Task: News Headline Generation (text generation)
  User Profile: News articles and their headlines
  Metric(s): ROUGE-1 and ROUGE-L

LaMP-5
  Task: Scholarly Title Generation (text generation)
  User Profile: Scholarly article titles and abstracts
  Metric(s): ROUGE-1 and ROUGE-L

LaMP-7
  Task: Tweet Paraphrasing (text generation)
  User Profile: Tweets
  Metric(s): ROUGE-1 and ROUGE-L

3.2 Experimental Setup
Following LaMP, we used FlanT5-base [7] as our downstream model. This model demonstrated satisfactory runtime performance in our experiments (approximately 125 milliseconds per sample when we included as many user data items as possible within the 512-token input length limit) and, as shown in the LaMP experiments, it achieved superior performance to zero-shot FlanT5-XXL and ChatGPT [16]. For all experiments, we used the same settings reported in LaMP: a learning rate of 5 × 10^-5, weight decay of 10^-4, a warmup ratio of 0.05, and a beam size of 4, and we trained for 10 epochs for text classification (tasks 1-3) and 20 epochs for text generation (tasks 4, 5, and 7).

In our experiments, we utilized the BM25 retrieval algorithm [15] due to its speed and performance. We found neural methods like Contriever [1] too slow for voice assistant scenarios, inducing approximately 10-30 seconds of latency per sample while not significantly outperforming BM25 on many LaMP tasks.

We experimented with two instruction-tuned models for generating summaries. The first is Vicuna [21], a 13-billion-parameter model distilled from LLaMA [18]. The second is ChatGPT via OpenAI's API with the gpt-3.5-turbo-16k model. Vicuna has a context length of 2,048 tokens, while ChatGPT's limit is 16,384. The prompts used for generating summaries are shown in Table 2. For tasks 2 and 3, to achieve good performance given the simplicity of the tasks, we constrained the summarization model to output according to a strict template shown in Table 2; therefore, ChatGPT summaries were not included for these tasks.

We compare our methods to retrieval-only baselines with k values of 0, 1, and 4. We observed k = 4 to be the limit for some tasks given the context length of FlanT5 (512 tokens). Thus, we had to reduce the number of retrieved samples to fit the summaries into the input of FlanT5 without truncation, and chose k = 1 (for direct comparison with the baseline) as well as k = 0 to investigate the impact of summaries alone (no retrieval). We report means of three repeated runs of each experiment for comparison with statistical significance.

4 RESULTS AND DISCUSSION
Table 3 shows the results of our methods for both summary models alongside the baselines across various NLP tasks. Our experiments show that our summary-augmented method with k = 1 is on par with or outperforms the retrieval-only baseline with k = 4 on most tasks, reducing the amount of retrieved user data by 75%. Our methods (Vicuna Summ. and GPT-3.5 Summ.) consistently outperform the baselines on tasks 1 and 2 at a statistical significance level of p < 0.05, and we achieve comparable performance on tasks 3, 5, and 7 with no statistically significant difference.

On the LaMP-1 task, our GPT-3.5 Summ. method outperforms the baseline with k = 4 using only offline-generated summaries (k = 0). Furthermore, our GPT-3.5 Summ. method yields better results than the k = 1 baseline on all tasks. It is worth noting that ChatGPT summaries mostly outperform those provided by Vicuna, likely due to the disparity in model size.

We observed a gap between Vicuna Summ. and GPT-3.5 Summ., attributed to the differing quality of the offline summaries.
Despite studies showing Vicuna achieving up to 90% of the performance of ChatGPT [6], our results suggest that Vicuna did not perform as well as GPT-3.5 on tasks LaMP-1 and LaMP-4. To assess the summary quality, we provide examples in Appendix A.

While we have shown promise in combining offline summaries with runtime retrieval for personalization, there are a few limitations to this work. For one, the data and tasks provided in the LaMP benchmark are simplistic and narrow in scope. More work is needed to assess the potential of our method on more realistic user data. Also, the benefits of summarization could be improved by fine-tuning a larger language model and by end-to-end training for the tasks.

Table 2: Prompts used for summarization. Additional tokens were used for Vicuna summaries to match the expected prompt format for that model, but the content was the same for Vicuna and ChatGPT.

LaMP-1: Write a summary, in English, of the research interests and topics of a researcher who has published the following papers. Only generate the summary, no other text.
LaMP-2: Look at the following past articles this journalist has written and determine the most popular category they write in. Answer in the following form: most popular category: <category>
LaMP-3: Based on this user's past reviews, what are the most common scores they give for positive and negative reviews? Answer in the following form: most common positive score: <most common positive score>, most common negative score: <most common negative score>
LaMP-4: Given this author's previous articles, try to describe a template for their headlines. I want to be able to accurately predict the headline given one of their articles. Be specific about their style and wording, don't tell me anything generic.
LaMP-5: Given this author's previous publications, try to describe a template for their titles. I want to be able to accurately predict the title of one of the papers from the abstract. Only generate the template description, nothing else.
LaMP-7: Given this person's previous tweets, try to describe a template for their tweets. I want to take a generic sentence and rephrase it to sound like one of their tweets, with the same style/punctuation/capitalization/wording/tone/etc. as them. Only give me the template description, nothing else.

Table 3: Results for the FlanT5-base model fine-tuned on LaMP benchmark tasks. Baseline: retrieval of k user data entries; Vicuna Summ.: Baseline plus a summary of user data generated by Vicuna; GPT-3.5 Summ.: Baseline plus a summary of user data generated by GPT-3.5. For all metrics, higher is better, except for MAE and RMSE used for LaMP-3.

                                          Baseline              Vicuna Summ.    GPT-3.5 Summ.
Task                          Metric      k=0    k=1    k=4     k=0    k=1      k=0    k=1
LaMP-1: Personalized          Accuracy    0.516  0.650  0.709   0.704  0.728    0.738  0.743
Citation Identification
LaMP-2: Personalized          Accuracy    0.731  0.782  0.807   0.801  0.814    N/A    N/A
News Categorization           F1          0.511  0.573  0.574   0.550  0.601    N/A    N/A
LaMP-3: Personalized          MAE         0.311  0.284  0.280   0.305  0.277    N/A    N/A
Product Rating                RMSE        0.626  0.595  0.593   0.632  0.594    N/A    N/A
LaMP-4: Personalized          ROUGE-1     0.152  0.177  0.188   0.157  0.173    0.170  0.181
News Headline Generation      ROUGE-L     0.137  0.162  0.173   0.142  0.159    0.155  0.166
LaMP-5: Personalized          ROUGE-1     0.424  0.447  0.448   0.426  0.447    0.424  0.448
Scholarly Title Generation    ROUGE-L     0.382  0.408  0.409   0.386  0.408    0.383  0.409
LaMP-7: Personalized          ROUGE-1     0.510  0.502  0.513   0.510  0.512    0.510  0.512
Tweet Paraphrasing            ROUGE-L     0.455  0.448  0.459   0.455  0.459    0.456  0.460

5 CONCLUSION
This paper introduces a novel method for augmenting retrieval with offline summarization to improve personalization across NLP tasks. We implemented our method and achieved comparable or better performance on most NLP tasks in the LaMP personalization benchmark while reducing the amount of retrieved user data by 75%. In some cases, we even achieved superior performance after removing retrieval entirely, showing an advantage in sparse-data scenarios such as the cold-start problem. Our method efficiently leverages offline summary generation and is suitable for runtime-constrained applications such as voice assistants.
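The generation tasks above (LaMP-4, LaMP-5, and LaMP-7) are scored with ROUGE. As a reference for how those numbers are produced, here is a simplified, stdlib-only ROUGE-1 F1 over whitespace tokens; the benchmark itself uses the standard ROUGE implementation (with stemming), so treat this only as an illustration:

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    """Unigram-overlap ROUGE-1 F1 between a generated text and a
    reference. Simplified: lowercased whitespace tokens, no stemming."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

ROUGE-L, also reported above, differs in that it scores the longest common subsequence rather than unigram overlap.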
ACKNOWLEDGMENTS
Thanks to Camden Parsons, Farideh Tavazoee, and Xingzhi (Jacky) Guo for reviewing and providing feedback on our initial paper draft.

REFERENCES
[1] Akari Asai, Timo Schick, Patrick Lewis, Xilun Chen, Gautier Izacard, Sebastian Riedel, Hannaneh Hajishirzi, and Wen-tau Yih. 2022. Task-aware Retrieval with Instructions. arXiv preprint arXiv:2211.09260 (2022).
[2] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1877–1901. https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
[3] Jiuhai Chen, Lichang Chen, Chen Zhu, and Tianyi Zhou. 2023. How Many Demonstrations Do You Need for In-context Learning? arXiv:2303.08119 [cs.AI].
[4] Qiwei Chen, Changhua Pei, Shanshan Lv, Chao Li, Junfeng Ge, and Wenwu Ou. 2021. End-to-End User Behavior Retrieval in Click-Through Rate Prediction Model. http://arxiv.org/abs/2108.04468 arXiv:2108.04468 [cs].
[5] Zheng Chen. 2023. PALR: Personalization Aware LLMs for Recommendation. arXiv preprint arXiv:2305.07622 (2023).
[6] Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez, Ion Stoica, and Eric P. Xing. 2023. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. https://lmsys.org/blog/2023-03-30-vicuna/
[7] Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. 2022. Scaling Instruction-Finetuned Language Models. arXiv preprint arXiv:2210.11416 (2022).
[8] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. New York, NY, USA.
[9] Shiran Dudy. 2022. Personalization and Relevance in NLG. In Companion Proceedings of the Web Conference 2022. 1178–1178.
[10] Lucie Flek. 2020. Returning the N to NLP: Towards Contextually Personalized Classification Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 7828–7838.
[11] Jinming Li, Wentao Zhang, Tian Wang, Guanglei Xiong, Alan Lu, and Gerard Medioni. 2023. GPT4Rec: A Generative Framework for Personalized Recommendation and User Interests Interpretation. arXiv preprint arXiv:2304.03879 (2023).
[12] Pi Qi, Xiaoqiang Zhu, Guorui Zhou, Yujing Zhang, Zhe Wang, Lejian Ren, Ying Fan, and Kun Gai. 2020. Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction. https://doi.org/10.48550/arXiv.2006.05639 arXiv:2006.05639 [cs, stat].
[13] Hongjin Qian, Xiaohe Li, Hanxun Zhong, Yu Guo, Yueyuan Ma, Yutao Zhu, Zhanliang Liu, Zhicheng Dou, and Ji-Rong Wen. 2021. Pchatbot: A Large-Scale Dataset for Personalized Chatbot. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2470–2477.
[14] Omid Rafieian and Hema Yoganarasimhan. 2022. AI and Personalization. Available at SSRN 4123356 (2022).
[15] Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP 109 (1995), 109.
[16] Alireza Salemi, Sheshera Mysore, Michael Bendersky, and Hamed Zamani. 2023. LaMP: When Large Language Models Meet Personalization. arXiv preprint arXiv:2304.11406 (2023).
[17] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. 2007. Collaborative Filtering Recommender Systems. In The Adaptive Web: Methods and Strategies of Web Personalization, Peter Brusilovsky, Alfred Kobsa, and Wolfgang Nejdl (Eds.). Springer, Berlin, Heidelberg, 291–324. https://doi.org/10.1007/978-3-540-72079-9_9
[18] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. LLaMA: Open and Efficient Foundation Language Models. https://arxiv.org/abs/2302.13971
[19] Yuwei Wu, Xuezhe Ma, and Diyi Yang. 2021. Personalized Response Generation via Generative Split Memory Network. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1956–1970.
[20] Jiajing Xu, Andrew Zhai, and Charles Rosenberg. 2022. Rethinking Personalized Ranking at Pinterest: An End-to-End Approach. In Proceedings of the 16th ACM Conference on Recommender Systems. 502–505.
[21] Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. 2023. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. arXiv preprint arXiv:2306.05685 (2023).
[22] Guorui Zhou, Chengru Song, Xiaoqiang Zhu, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep Interest Network for Click-Through Rate Prediction. https://doi.org/10.48550/arXiv.1706.06978 arXiv:1706.06978 [cs, stat].

A SUMMARY EXAMPLES
We present examples in which the LaMP baseline model exhibited errors while our methods demonstrated superior performance (Example 1), and in which both the LaMP baseline model and Vicuna Summ. exhibited errors while GPT-3.5 Summ. demonstrated superior performance (Example 2).

A.1 Example 1
Task: LaMP-1
Input: For an author who has written the paper with the title "QuickSense: Fast and energy-efficient channel sensing for dynamic spectrum access networks", which reference is related? Just answer with [1] or [2] without explanation. [1]: "A first look at cellular network performance during crowded events" [2]: "On the complexity of scheduling in wireless networks"
Output: [1]
Retrieved user data:
On the Stability of Rational, Heterogeneous Interdomain Route Selection
A decision-theoretic approach to resource allocation in wireless multimedia networks
A general algorithm for interference alignment and cancellation in wireless networks
Enterprise social network analysis and modeling: A tale of two graphs
Baseline Prediction: [2]
User Summary by Vicuna: (The original output is in Chinese; here is the translation:)
Network Traffic Analysis
Wireless Communication
Software-Defined Networking
Traffic Routing
Low-Latency Communication
Multimedia Networks
Network Security
Error Handling in Networks
Sensor Networks
Multi-Path Routing
IP Routing
Traffic Analysis Tools
Vicuna Summ. Prediction: [1]
User Summary by GPT-3.5: The research interests of this researcher span several areas in computer networking and wireless communications. They have published papers on topics such as unified cellular and ad-hoc network architecture, optimization of wireless mesh networks, networking analytics, incentive-compatible routing protocols, software-defined networking, and network security. They have expertise in optimizing network throughput, designing scalable network architectures, developing game-theoretic and cryptographic techniques for wireless networks, and leveraging
machine learning and data analytics for network monitoring and diagnosis. Their recent works also focus on addressing the latency and energy efficiency challenges in wireless multimedia networks and leveraging software-defined networking for next-generation cellular networks. In summary, this researcher's work revolves around developing novel algorithms, protocols, and systems that improve the performance, resilience, and security of networked systems.
GPT-3.5 Summ. Prediction: [1]

A.2 Example 2
Task: LaMP-1
Input: For an author who has written the paper with the title "Visual-audio integration for user authentication system of partner robots", which reference is related? Just answer with [1] or [2] without explanation. [1]: "Pronunciation Modeling for Improved Spelling Correction" [2]: "Path following algorithm for highly redundant manipulators"
Output: [1]
Retrieved user data:
Advancing Matrix Completion by Modeling Extra Structures beyond Low-Rankness
Improved Asymmetric Locality Sensitive Hashing (ALSH) for Maximum Inner Product Search (MIPS)
Real-Time Implementation of Improved State-Space MPC for Air Supply in a Coke Furnace
On Practical Algorithms for Entropy Estimation and the Improved Sample Complexity of Compressed Counting
Baseline Prediction: [2]
User Summary by Vicuna:
1. Nystrom Method for Approximating the GMM Kernel.
2. Very sparse random projections.
3. Very sparse stable random projections for dimension reduction in l_alpha (0 < alpha <= 2) norm.
4. A Comparison Study of Nonlinear Kernels.
5. One sketch for all: Theory and Application of Conditional Random Sampling.
6. Collaborative Multi-objective Ranking.
7. Accurate Estimators for Improving Minwise Hashing and b-Bit Minwise Hashing.
8. A new space for comparing graphs.
9. Stability and Risk Bounds of Iterative Hard Thresholding.
10. Hashing Algorithms for Large-Scale Learning.
11. Variational Flow Graphical Model.
Vicuna Summ. Prediction: [2]
User Summary by GPT-3.5: The researcher's primary research interests revolve around the development of practical algorithms for entropy estimation, compressed counting, and matrix completion. They explore various techniques for improving the efficiency and accuracy of these methods. Additionally, the researcher focuses on advancing hashing algorithms, particularly in the context of maximum inner product search (MIPS) and locality-sensitive hashing (LSH). They also investigate sketch-based sampling techniques, especially for sparse data, and propose novel approaches for estimating the entropy of data streams. Furthermore, the researcher has contributions in optimization methods, classification, and dimensionality reduction using stable random projections. Their work demonstrates a strong emphasis on efficient and scalable solutions for a wide range of machine learning and data analysis tasks.
GPT-3.5 Summ. Prediction: [1]
