Developing a Recommender System Using Large Language Models
Abstract
Despite the ubiquity and success of recommendation systems, they still face significant challenges such as the explainability problem. To address this, we fine-tuned a 13-billion-parameter language model to take user preferences expressed in natural language as input, with instructions to return the recommendation choices that best match those preferences and to provide explanations during inference. Our experimental results demonstrate that our solution performs well in addressing the cold start and explainability problems encountered in modern recommendation systems.
Acknowledgements
I would like to acknowledge the efforts and guidance of my supervisor, Professor Irena Spasic, whose insights and feedback helped shape this project into a comprehensive and impactful piece of work. I would also like to acknowledge my family, whose support made all of this possible.
Table of Contents
Abstract
Acknowledgements
Table of Contents
1. Introduction
1.1 Introduction
1.2 Problem
1.2.1 Cold start problem
1.2.2 Sparsity problem
1.2.3 Explainability problem
1.3 Research objectives
1.4 Scope and Limitations
1.5 Organization
1.6 Conclusion
2. Literature Review
2.1 Introduction
2.2 Recommender Systems
2.3 Large language models
3. Architecture
3.1 Introduction
3.2 Architectural goals and objectives
3.2.1 Goals
3.2.2 Architectural objectives
3.3 Architectural styles and patterns
3.4 System components and modules
3.4.1 The web service component
3.4.2 The AI service component
3.4.3 The LLM model component
3.5 Data flow and communication
3.6 Conclusion
4. Implementation
4.1 Introduction
4.2 Software requirements
4.3 Software Process Model
4.4 System Overview
4.5 High-Level Design
4.5.1 User Interaction Flow
4.5.2 Language model Integration
4.5.3 Postgres Database Integration
4.6 Implementation
4.6.1 Preferences page
4.6.2 Recommendation results page
4.7 Conclusion
5. Results and Evaluation
5.1 Introduction
5.2 Experimental setup
5.2.1 Dataset
5.2.2 Metrics
5.2.3 Results
6. Reflection
6.1 Introduction
6.2 Motivations
6.3 Challenges
6.4 Lessons Learned
7. Conclusion and future work
7.1 Introduction
7.2 Contributions
7.3 Future work
References
1. Introduction
1.1 Introduction
Since the early 1990s, researchers have been looking for ways to harness the opinions of people online in
an effort to distinguish useful information from noise (Jannach et al, 2010).
This has become particularly relevant given the amount of data generated in this century. According to IBM,
we currently generate 2.5 quintillion bytes of data daily (IBM, 2020).
Every minute, 13 new songs are added to Spotify, 4.2 million videos are watched on YouTube, 46,700 photos are posted on Instagram, and 3.6 million searches are made on Google (Schrage, 2020), and these numbers continue to rise.
This is why companies are increasingly adopting recommendation systems as the first line of defense against consumer over-choice, to great success: 80% of what people watch on Netflix comes from its recommendation engine (Gomez-Uribe et al., 2016), Alibaba tripled its gross merchandise volume (GMV) within a year thanks to its machine-learning-enhanced recommenders, and TikTok is the fastest-growing social media platform because of its recommendation engine (Schrage, 2020).
Modern recommender systems have been leveraging the advances made in deep learning to improve their
performance by overcoming problems faced in conventional models (Zhang et al, 2019).
YouTube leverages a deep neural network-based recommendation algorithm for video recommendation (Covington et al., 2016), Yahoo! News uses an RNN-based recommender system (Okura et al., 2017), and Facebook uses a state-of-the-art deep learning recommendation model (DLRM) to tackle personalization and recommendation tasks (Naumov et al., 2019).
1.2 Problem
Despite the success of various recommendation methods such as collaborative filtering (CF) (Herlocker et al., 1999), content-based methods (Balabanović and Shoham, 1997), and deep learning-based methods (Covington et al., 2016), modern recommender engines still face several challenges: the cold start problem (Lika et al., 2014), the sparsity problem (Wang et al., 2018), and the explainability problem (Chen et al., 2022).
Serendipitous recommendations are not sufficient; people need to be able to see and grasp the underlying "why." "Recommendation explainability" has consequently become one of the most active areas in recommender systems research (Schrage, 2020).
1.3 Research objectives
This research investigates the following questions:
• Can a large language model-based recommendation engine solve the cold start problem by providing relevant recommendations without knowing the user's movie history?
• Can a large language model-based recommendation engine solve the sparsity problem by providing relevant recommendations without knowing the relationships between users and movies, such as ratings and reviews?
• Can a large language model-based recommendation engine solve the explainability problem by generating a relevant explanation for every recommendation choice it makes?
1.5 Organization
The dissertation is structured into seven chapters, each focusing on a distinct aspect of building a recommendation engine based on a large language model. Chapter 1 introduces the problem, research
questions, and objectives of the study. Chapter 2 delves into the literature review, presenting an overview
of related work on recommendation systems and large language models. Chapter 3 elaborates on the
architectural design and components of the proposed recommendation engine. Chapter 4 discusses the
implementation details. Chapter 5 discusses the results and evaluations of the research. Chapter 6
discusses the reflections and lessons learnt working on this project and lastly, Chapter 7 concludes the
dissertation by summarizing the findings, discussing contributions, and highlighting potential avenues for
future research.
1.6 Conclusion
This dissertation seeks to leverage the reasoning capabilities of large language models to generate
effective content recommendations by providing acceptable explanations for recommendation choices. By
devising an architecture that seamlessly integrates these two domains, we aim to contribute to the field of
recommendation systems and software development alike. Through this journey, we endeavor to create a recommendation engine that is robust in design and transformative in its impact on the field.
2. Literature Review
2.1 Introduction
Recommender systems have become a mainstay of breakthrough research in machine learning and natural language processing, with applications in e-commerce, internet music and video, news, and other fields that generate enormous amounts of data (Amatriain & Basilico, 2016).
The breakthrough in the use of transformer-based architectures in building large language models has led
to impressive performance in the application of large language models in generative tasks in fields like
education (Milano et al, 2023), finance (Wu et al, 2023), and healthcare (Sallam, 2023).
This chapter summarizes the work and progress made in recommendation systems and the application of
deep learning technologies in building recommender systems.
This chapter also briefly explores large language models, their main categories, and how they work, and highlights earlier research on the application of large language models to the development of recommendation systems.
According to their architecture, there are three main categories of large language models:
I. Encoder-only models use bidirectional attention to process token sequences, considering both the left and right context of each token to learn contextual representations. Examples include BERT (Devlin et al., 2018).
II. Decoder-only models use a self-attention mechanism for unidirectional, left-to-right word sequence processing. Examples include GPT (Brown et al., 2020) and CPM (Zhang et al., 2021).
III. Encoder-decoder models handle any text-to-text task by converting every natural language
processing problem into a text generation problem. Examples include T5 (Raffel et al, 2020) and
ERNIE 3.0 (Sun et al, 2021).
These models are trained on enormous amounts of text data and use a combination of techniques such as
word embeddings, attention mechanisms, and layered neural networks to learn the relationships between
words and the context in which they are used.
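To make the three categories concrete, the sketch below, assuming the Hugging Face `transformers` library and publicly available checkpoints (none of which are the model used in this project), shows how each family is typically instantiated:

```python
# Illustrative only: one public checkpoint per architectural family.
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Encoder-only: bidirectional attention, used for contextual representations.
encoder_only = AutoModel.from_pretrained("bert-base-uncased")

# Decoder-only: left-to-right (causal) attention, used for text generation.
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")

# Encoder-decoder: maps an input sequence to an output sequence.
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```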
In the following chapter, we will delve into the core elements of the software architecture for the
recommendation engine, examining the architectural styles and patterns that underpin its successful
implementation.
We will explore the intricacies of data handling, model deployment, and user interaction, all within the
context of our pursuit to craft a recommendation engine that leverages the transformative capabilities of
large language models.
By addressing these architectural goals and objectives, the proposed recommendation engine will leverage the power of large language models to deliver personalized recommendations and a reliable, scalable user experience.
3.3 Architectural styles and patterns
Like many recommendation systems, the proposed recommendation platform is server-based, and a client is necessary to access its functionality. The system can support simple clients, such as a web-based interface, mobile applications, or clients tailored to support specific features of the system.
At a high level, the proposed platform uses the n-tier architecture (Varma, 2009) in its design. This architecture is the de facto design pattern for web applications due to its simplicity, familiarity, and low cost (Richards & Ford, 2020).
Accordingly, the proposed recommendation engine is a collection of layers, with each layer performing a specific role in the application:
i. Presentation layer: This layer handles all user interface and browser communication logic. Users
interact with the application using this layer and its main purpose is to provide an interface where
users can view recommended content, provide feedback, and engage with the system. This layer is
typically developed using HTML, CSS, JavaScript, Kotlin, Swift or Java languages.
ii. Application layer: This layer is responsible for executing specific business rules associated with
the request made by the user through the presentation layer. This layer processes user interactions,
coordinates recommendation generation, and manages real-time updates. This layer also leverages insights from user profiles and the language model engine to produce tailored recommendations. The application layer can also add, delete, or modify data in the data layer. The application layer is
typically developed using Python, Java, Perl, PHP, or Ruby, and communicates with the data layer
using API calls such as REST (Wilde and Pautasso, 2011), SOAP (Box et al, 2000) or RPC
(Remote Procedure Call) (Srinivasan, 1995).
iii. Database layer: The data management layer handles data storage and retrieval. It interacts with
databases, data sources, and external services to save user profiles, behavior logs, content
metadata, and other relevant data. This can be a relational database management system (Jatana et al., 2012) such as PostgreSQL, MySQL, MariaDB, Oracle, DB2, Informix, or Microsoft SQL Server, or a NoSQL database server (Gajendran, 2012) such as Cassandra, CouchDB, or MongoDB.
Figure 2: System Architecture for proposed recommendation engine
The web service component also interacts with the AI service component to generate recommendations by providing data such as the movie library and user preferences to the AI service component, which in turn interacts with the AI model component to generate recommendations based on the data provided.
The web service component receives the refined recommendations and processes them before releasing them as a final recommendation to the user. This component also tracks users' interactions with the system and the recommendations they receive.
The AI service component serves as an ETL pipeline (Vassiliadis, 2009) that extracts, transforms, and loads the user data into a machine-readable format and transfers it to the LLM model, which ingests the data and generates recommendations based on the provided format.
The AI service component takes the set of candidate recommendations generated by the LLM model component, then reorders the set and removes items from it to produce a refined recommendation. The refined recommendation is passed back to the web service component, which passes it to the presentation layer for display to the user.
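As a minimal sketch of this reorder-and-remove step (with hypothetical field names, since the source does not specify the candidate format):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    title: str
    score: float       # relevance score attached to the LLM's suggestion
    explanation: str   # the LLM's stated reason for suggesting the item

def refine(candidates: list[Candidate], min_score: float = 0.5, top_k: int = 10) -> list[Candidate]:
    """Reorder by score, drop weak or duplicate items, and keep the top k."""
    seen: set[str] = set()
    refined: list[Candidate] = []
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if c.score >= min_score and c.title not in seen:
            seen.add(c.title)
            refined.append(c)
    return refined[:top_k]
```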
3.4.3 The LLM model component
The LLM model component is the large language model engine responsible for generating
recommendations by picking a set of reasonable candidates for a recommendation based on user data
provided. Items are picked by applying one or more heuristics and adding the identified candidates to a set
which is then returned to the AI service component.
The LLM model component is responsible for tasks such as pre-training, fine-tuning, prompting and
completion (Fan et al, 2023).
Pre-training helps the LLM recognize and generate coherent responses by training on diverse, unlabeled data. Through this process, the LLM model engine learns grammar, syntax, semantics, and human-like reasoning patterns.
Fine-tuning helps the LLM model engine acquire movie-specific domain knowledge by training the model on task-specific datasets such as movie information. This process improves the engine's performance in the recommendation domain.
Prompting helps the LLM achieve better results by applying text templates to the model input to adapt it to user-specific scenarios (Fan et al., 2023). This allows the LLM model engine to provide recommendations specific to the current user.
The LLM model engine returns recommendations in the form of completions (OpenAI, 2023) based on the
data and the prompt provided.
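A minimal sketch of this prompting step follows; the template wording and fields are illustrative, not the exact prompt used by the system:

```python
PROMPT_TEMPLATE = """You are a movie recommendation assistant.
The user likes these genres: {genres}.
The user prefers movies from: {countries}.
Recommend five movies that best match these preferences, and for each
movie explain in one sentence why it was chosen."""

def build_prompt(genres: list[str], countries: list[str]) -> str:
    # Fill the template with the current user's preferences.
    return PROMPT_TEMPLATE.format(genres=", ".join(genres), countries=", ".join(countries))

print(build_prompt(["Drama", "Crime"], ["South Korea"]))
```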
3.5 Data flow and communication
Data moves through the system in five stages:
1. Data ingestion and collection: The data flow journey begins with the ingestion and collection of data sources, including user preferences, movie content metadata, and textual content. These data can originate from the user, databases, and external sources. The data is aggregated, and metadata is extracted to form a comprehensive foundation for generating recommendations.
2. User interaction capture: User interactions, such as reviews, views, and ratings, are captured in
real time. These interactions provide valuable signals that shape the user profiles and influence the
recommendations. Capturing user feedback and engagement is an ongoing process that feeds into
the system's learning loop.
3. Preprocessing and cleansing: The data collected undergoes preprocessing to ensure its quality
and uniformity. This involves tasks such as data cleaning, normalization, and transformation.
Textual content is tokenized, and linguistic features are extracted to facilitate analysis by the
language model.
4. Large language model integration: The language model integration phase involves
communicating the preprocessed data to the language model for semantic analysis. This interaction
enables the model to understand context, sentiment, and nuances within the data. The language
model's outputs are then utilized in the recommendation process.
5. Real-time recommendation generation: Based on the insights gained from the data, the language model generates recommendations, which are assessed and matched against user preferences.
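Tying the five stages together, the pipeline can be pictured as the following sketch, in which every helper is a stand-in stub for the component described above rather than real project code:

```python
def collect_data(user_id: str) -> dict:
    # 1. Ingestion and collection (stub).
    return {"user": user_id, "preferences": ["Drama"], "catalog": ["Movie A", "Movie B"]}

def capture_interactions(user_id: str) -> list[dict]:
    # 2. User interaction capture (stub).
    return [{"movie": "Movie A", "rating": 4}]

def preprocess(raw: dict) -> dict:
    # 3. Preprocessing and cleansing (stub): normalize the preference strings.
    raw["preferences"] = [g.strip().lower() for g in raw["preferences"]]
    return raw

def query_language_model(data: dict) -> str:
    # 4. Language model integration (stub).
    return "Movie B: matches your taste for drama."

def recommend(user_id: str) -> list[str]:
    raw = collect_data(user_id)
    raw["interactions"] = capture_interactions(user_id)
    clean = preprocess(raw)
    completion = query_language_model(clean)
    # 5. Real-time recommendation generation: match output to preferences.
    return [line for line in completion.splitlines() if line]

print(recommend("user-42"))
```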
3.6 Conclusion
In this chapter, we have delved into the software architecture of the proposed recommendation platform.
The architectural decisions made are pivotal in shaping the effectiveness, scalability, and user satisfaction
of the recommendation platform. Through the exploration of architectural goals, design principles, and
specific components, we have laid the foundation for a robust and sophisticated system that leverages the
capabilities of modern AI technology.
4.2 Software requirements
The platform must satisfy the following requirements:
• Users should be able to request movie recommendations without specifying their watch history.
• Users should be able to request movie recommendations by specifying the genres of movies that they like.
• Users should be able to request movie recommendations by specifying the countries they prefer.
• Users should be able to see the reasons why a movie was recommended.
• The software should be able to handle enormous amounts of data and have low latency.
4.3 Software Process Model
I utilized the waterfall process model for this project for the following reasons:
1. It provides a structured and sequential approach to software development.
2. Each phase is well-defined and follows a logical progression which provides simplicity and clarity.
3. Its clear-cut structure allows for thorough planning and documentation and focuses on understanding the project requirements.
4. It provides a level of predictability in the development process that makes it easy to establish
realistic timelines.
4.4 System Overview
Frontend: The frontend, built with HTML, CSS, and JavaScript within the Ruby on Rails framework, provides an intuitive and visually appealing interface for users to interact with the platform. Through this interface, users can specify the genres and countries of origin of movies and receive personalized movie recommendations.
Backend: The backend, implemented in Python and Ruby on Rails, serves as the engine that processes
user requests, handles business logic, and communicates with the machine learning model.
The backend is powered by two different APIs:
• Ruby API: This API processes user requests, handles business logic, and saves and queries user data in the database. It connects the front end to the database and handles all user interactions. I chose Ruby as the language and Ruby on Rails as the framework for this API due to its built-in protection against common web vulnerabilities such as SQL injection, Cross-Site Scripting (XSS), and Cross-Site Request Forgery (CSRF), and its encouragement of RESTful API design through clean and predictable URL structures.
• Python API: This API handles the interaction between the platform and the machine learning engine that powers the movie recommendations. It drives the ETL pipeline that extracts, transforms, and loads data from the database into the model engine, and performs vector storage and retrieval. I chose Python for this API due to its rich ecosystem and ease of use for machine learning projects. I also chose the FastAPI web framework for its ability to handle multiple requests asynchronously, which is important to ensure the system can serve multiple requests with low latency.
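A minimal sketch of an asynchronous FastAPI endpoint of this kind (the route path and request fields are illustrative):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PreferenceRequest(BaseModel):
    genres: list[str]
    countries: list[str]

@app.post("/recommendations")
async def recommendations(req: PreferenceRequest) -> dict:
    # The real service would build a prompt and call the language model engine;
    # a stub response keeps this sketch self-contained.
    return {"genres": req.genres, "countries": req.countries, "results": []}
```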
Machine Learning Service: The movie recommendations are generated by the machine learning service powered by Meta's Llama 2 language model.
This service takes user data such as movie preferences, viewing history, and movie profiles, and converts it to embedding vectors that can be easily consumed by machine learning models and algorithms. It also returns predicted completions and the probabilities of alternatives when given a prompt.
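The embedding step can be sketched as follows, assuming the `sentence-transformers` library and a public checkpoint rather than the project's actual pipeline:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative public checkpoint
profiles = [
    "A slow-burn crime drama set in 1970s New York.",
    "A lighthearted romantic comedy about two rival chefs.",
]
embeddings = model.encode(profiles)  # one 384-dimensional vector per profile
print(embeddings.shape)              # (2, 384)
```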
There are several popular language models available, including GPT-3 (Brown et al., 2020), GPT-4 (OpenAI, 2023), LaMDA (Thoppilan et al., 2022), Llama (Touvron et al., 2023), and PaLM 2 (Anil et al., 2023). Each model has its strengths and weaknesses, and the choice between them depends on the specific use case and technical requirements.
Llama 2 (Touvron et al., 2023) is a recent language model, released in July 2023, that has gained popularity due to its high performance and efficiency.
I chose Llama 2 over the other available language models for the following reasons:
1. Architecture: Llama 2 adopts the grouped-query attention technique (Ainslie et al., 2023) in its larger variants to speed up the inference process.
2. Efficiency: Llama 2 has been designed with efficiency in mind, using less computation and memory. This makes it particularly useful for deployment on resource-constrained devices or for applications where computational resources are limited.
3. Open source: Llama 2 is open source, which allows researchers and developers to modify and customize the model to fit their specific needs.
4. Pre-training: Llama 2 is pre-trained on 2 trillion tokens of data (Touvron et al., 2023), drawn largely from publicly available online sources, which helps it learn a rich set of semantic and syntactic features.
Postgres Database: Postgres is used to store the user data, movie information, and other relevant data necessary for generating accurate recommendations. The Postgres database also stores the embedding vectors generated by the machine learning service.
There are several popular databases available, including MySQL, MongoDB, Oracle, Microsoft SQL Server, MariaDB, SQLite, and many more.
I decided to use the Postgres database because of its support for vector similarity search (via the pgvector extension) and because of my previous experience working with it.
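A minimal sketch of storing and searching embeddings with pgvector, assuming the `psycopg2` driver and illustrative table and column names (a real embedding column would have many more dimensions than the three used here):

```python
import psycopg2

conn = psycopg2.connect("dbname=recommender")
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("CREATE TABLE IF NOT EXISTS movies "
            "(id serial PRIMARY KEY, title text, embedding vector(3));")
cur.execute("INSERT INTO movies (title, embedding) VALUES (%s, %s)",
            ("Example Movie", "[0.1, 0.2, 0.3]"))
# "<->" is pgvector's Euclidean distance operator: nearest neighbours first.
cur.execute("SELECT title FROM movies ORDER BY embedding <-> %s LIMIT 5;",
            ("[0.1, 0.2, 0.3]",))
print(cur.fetchall())
conn.commit()
```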
4.5 High-Level Design
The high-level design outlines the interactions between the application components:
4.6 Implementation
The major pages in the platform are the preferences page and the recommendations result page.
The movie dropdown is dynamically populated with movie data from the TVMaze API. I implemented it by writing a typeahead function in JavaScript that takes the user's input, searches the TVMaze API with it, and returns the matching results.
Figure 9: TV show typeahead function.
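Figure 9 shows the JavaScript implementation; as a language-neutral sketch, the same lookup against TVMaze's public search endpoint can be expressed in Python (the endpoint is real, the function shape is illustrative):

```python
import requests

def search_shows(query: str, limit: int = 10) -> list[str]:
    """Return up to `limit` show names matching the typed query."""
    resp = requests.get("https://api.tvmaze.com/search/shows",
                        params={"q": query}, timeout=5)
    resp.raise_for_status()
    # Each result wraps the show record under the "show" key.
    return [result["show"]["name"] for result in resp.json()[:limit]]

print(search_shows("breaking"))
```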
The genre dropdown is populated from a static array of genres such as Drama, Crime, and Espionage.
The countries dropdown is also populated by a typeahead function that automatically fetches a list of all
countries using the countriesnow (https://countriesnow.space/api/v0.1/countries/iso) API.
Figure 11: Countries typeahead function
Once a user chooses their preferences, they click the "Give me recommendations" button, and the API fetches the recommendations from the language model (Llama 2) service.
Figure 12: Preferences page (User selects preferences)
• Prompt generation: The Ruby API generates a prompt using the user data and a pre-generated
prompt template.
• Completion: The Ruby API sends the generated prompt to the language model service through the
Python API. The language model generates a recommendation based on the prompt provided and
then streams the response back to the Python API.
Figure 14: Python API method interacting with the Language model service.
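Figure 14 is not reproduced here; as a minimal sketch of such a method, assuming the `replicate` Python client and a hosted Llama 2 chat model (the model slug and input parameters below are illustrative, not the project's actual deployment):

```python
import replicate

def generate_recommendation(prompt: str) -> str:
    """Stream a completion from the hosted language model and join the chunks."""
    chunks = replicate.run(
        "meta/llama-2-13b-chat",           # illustrative model slug
        input={"prompt": prompt, "max_new_tokens": 500},
    )
    return "".join(chunks)                 # the client yields the text as a stream
```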
The recommendation generated by the language model service is parsed by the Python API and then
returned to the Ruby API which displays the recommendation result to the user.
Figure 15: Recommendations result page
4.7 Conclusion
This chapter provided a comprehensive overview of the system implementation for the movie
recommendation platform. By harnessing the power of Llama 2, Python, Ruby on Rails, JavaScript, and
Postgres, the platform delivers personalized movie recommendations, enhancing the movie-watching
experience for users.
5. Results and Evaluation
5.1 Introduction
The major objective of this research is to implement a recommendation engine that leverages large language models to solve the problems of sparsity, cold start, and explainability in recommender systems. In this chapter, we revisit these objectives and assess the extent to which they have been met.
To evaluate our recommendation engine, we conducted an extensive usage experiment. Through this, we aim to answer the following research questions:
• Can the recommendation engine solve the cold start problem by providing relevant recommendations without knowing the user's movie history?
• Can the recommendation engine solve the sparsity problem by providing relevant recommendations without knowing the relationships between users and movies, such as ratings and reviews?
• Can the recommendation engine solve the explainability problem by generating a relevant explanation for every recommendation choice it makes?
5.2.2 Metrics
To evaluate the performance of the recommendation engine, I used the following metrics:
• User coverage: This is the percentage of potential users for whom the recommender was able to generate a recommendation list (Massa & Avesani, 2009). This is a good measure of how well the recommender solves the cold start problem, since it counts the cold start users who received a relevant recommendation.
• Explanation coverage: This is the percentage of recommendations for which explanations are provided. This is a good measure of how well the recommender solves the explainability problem, since it counts how many recommendation choices come with a good explanation of why they were chosen.
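Expressed as formulas (notation mine, following the definitions above):

```latex
\[
\text{User coverage} = \frac{|U_{\mathrm{rec}}|}{|U|} \times 100\%,
\qquad
\text{Explanation coverage} = \frac{|R_{\mathrm{expl}}|}{|R|} \times 100\%,
\]
```

where \(U_{\mathrm{rec}}\) is the set of users who received a recommendation list, \(U\) the set of all potential users, \(R_{\mathrm{expl}}\) the set of recommendations accompanied by an explanation, and \(R\) the set of all recommendations generated.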
This means that each of the 10 users without an account received a recommendation list every time they used the recommender, and every choice in the list came with a relevant explanation of why it was recommended.
This demonstrates the effectiveness of the recommender in generating recommendations for new users and in explaining the reasoning behind every recommended choice.
6. Reflection
6.1 Introduction
In this chapter, I discuss the motivations behind choosing this project, the challenges faced during its
implementation, and the lessons learned throughout the process.
6.2 Motivations
The main motivation behind choosing this project was to explore the potential of using large language
models for recommendation systems. I have seen how recommendation systems have become part of our
everyday life with their application in shopping, entertainment, finance, and health.
Traditional methods of recommendation, such as collaborative filtering and content-based filtering, have
been successful but limited in their ability to help users understand the rationale behind their
recommendations. I wanted to investigate whether large language models could be used to improve the
explainability of recommendations.
6.3 Challenges
The implementation of a large language model-based recommendation system posed several challenges,
including data quality, training time, and time management.
Data Quality: The first challenge was collecting and preprocessing a dataset that captures relevant details about movies for fine-tuning the language model. The dataset needed to be comprehensive, diverse, and representative of various countries and genres. Moreover, the data had to be cleaned and normalized to ensure consistency and quality. I addressed this challenge by writing a Python script to fetch data from a movie API provider and create a JSONL file containing a training dataset of over 60,000 movie descriptions.
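A condensed sketch of such a script, using TVMaze's public show index as an illustrative provider (the actual provider and fields may differ):

```python
import json
import re
import requests

def build_dataset(pages: int, out_path: str) -> None:
    """Fetch show records page by page and write title/description pairs as JSON Lines."""
    with open(out_path, "w", encoding="utf-8") as f:
        for page in range(pages):
            shows = requests.get("https://api.tvmaze.com/shows",
                                 params={"page": page}, timeout=10).json()
            for show in shows:
                # Basic cleaning: strip HTML tags and skip incomplete records.
                summary = re.sub(r"<[^>]+>", "", show.get("summary") or "").strip()
                if show.get("name") and summary:
                    f.write(json.dumps({"title": show["name"], "description": summary}) + "\n")

build_dataset(pages=2, out_path="movies.jsonl")
```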
Training Time: The second challenge was fine-tuning the large language model. The model required significant computational resources and a substantial amount of time to fine-tune. To address this challenge, I used Replicate, a web platform that provides a high-performance computing cluster to train and fine-tune open-source language models using an optimized architecture that reduces training time.
Time management: The third challenge was balancing the demands of research, data gathering, technical implementation, and writing while working to a tight deadline. To address this challenge, I broke the project down into modules, which were easier to manage and track using a project plan (a Gantt chart in this case).
6.4 Lessons Learned
While working on this project, I learnt how to manage a project using Gantt charts. This helped me to manage my time effectively and ensure that I completed my research on schedule.
I also learnt how to conduct independent research: identifying research gaps, formulating research questions, and designing and implementing a research study. I learnt how to critically evaluate existing research, identify patterns and trends, and synthesize information to inform my own work.
I learnt different ways to evaluate the performance of a recommendation system and how to choose
appropriate evaluation metrics.
7. Conclusion and future work
7.1 Introduction
In this research, we implemented a recommendation engine based on large language models to solve the cold start and explainability problems, two crucial problems faced by current recommendation algorithms (Lika et al., 2014; Wang et al., 2018).
In this chapter, we discuss the contributions of this research to the recommender systems field and highlight areas where further research and exploration can continue to advance the field.
7.2 Contributions
In implementing a recommender system based on large language models, this project has made several noteworthy contributions to the domain of recommender systems:
1. Novel architecture: This project developed and rigorously tested a novel recommendation architecture tailored to integrating large language models into recommender systems, with a focus on scalability and real-time responsiveness.
2. Cold start problem mitigation: By integrating large language models, this project shows how to mitigate the cold start problem in recommender systems. Our solution enables personalized recommendations even for users with no historical interaction data.
3. Enhanced explainability: This project shows how integrating a large language model can help recommender systems provide clear and understandable explanations for their recommendations. This not only increases user trust but also fosters a deeper understanding of the underlying recommendation process.
4. Real-world applicability: While much of the earlier research (Kang et al., 2023; Bao et al., 2023; Chen, 2023; Gao et al., 2023; Deng et al., 2023) focuses on the theoretical implications of leveraging language models in recommender systems, this project demonstrates the practical implications by successfully deploying a large language model-based recommender system for movies, showing tangible benefits for the user experience.
These contributions collectively reflect the depth and breadth of this project's impact, from advancing the
theoretical foundations of language model backed recommender systems to providing practical solutions
that enhance user experiences and address critical challenges.
The significance of this work extends beyond this project, serving as a catalyst for continued research and
development in the quest for more effective, user-centric, and reliable recommendation systems in an ever-
evolving digital landscape.
In summary, this research project explored the possibility of solving the cold start and explainability problems in the recommender systems field by applying large language models. I hope that this work can inspire researchers to explore more opportunities for applying large language models to similar tasks and to further improve their performance in existing scenarios.
References
Schrage, M. 2020. Recommendation Engines. MIT Press.
Ricci, F., Rokach, L., Shapira, B. 2011. Introduction to Recommender Systems Handbook. Springer,
Boston, MA.
Covington, P., Adams, J. and Sargin, E., 2016, September. Deep neural networks for YouTube
recommendations. In Proceedings of the 10th ACM conference on recommender systems (pp. 191-198).
Adomavicius, G. and Tuzhilin, A., 2005. Toward the next generation of recommender systems: A survey of
the state-of-the-art and possible extensions. IEEE transactions on knowledge and data engineering, 17(6),
pp.734-749.
Pope, R., Douglas, S., Chowdhery, A., Devlin, J., Bradbury, J., Heek, J., Xiao, K., Agrawal, S. and Dean, J.,
2023. Efficiently scaling transformer inference. Proceedings of Machine Learning and Systems, 5.
Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. 2000. Explaining collaborative filtering
recommendations. In Proceedings of the 2000 ACM conference on Computer supported cooperative work
(CSCW '00). Association for Computing Machinery, New York, NY, USA, 241–250.
Thorat, P.B., Goudar, R.M. and Barve, S., 2015. Survey on collaborative filtering, content-based filtering
and hybrid recommendation system. International Journal of Computer Applications, 110(4), pp.31-36.
Sharma, S., Rana, V. and Malhotra, M., 2022. Automatic recommendation system based on hybrid filtering
algorithm. Education and Information Technologies, pp.1-16.
Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. 2019. Deep Learning Based Recommender System: A Survey and New Perspectives. ACM Computing Surveys 52, 1, Article 5, 38 pages.
Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural
Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web (WWW
'17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva,
CHE, 173–182.
Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep
structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM
international conference on Information & Knowledge Management (CIKM '13). Association for
Computing Machinery, New York, NY, USA.
Wang, H., Wang, N., and Yeung, D.Y., 2015, August. Collaborative deep learning for recommender
systems. In Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and
data mining (pp. 1235-1244).
Ying, H., Chen, L., Xiong, Y. and Wu, J., 2016. Collaborative deep ranking: A hybrid pair-wise
recommendation algorithm with implicit feedback. In Pacific-Asia conference on knowledge discovery and
data mining (pp. 555-567). Springer, Cham.
Kipf, T.N. and Welling, M., 2016. Semi-supervised classification with graph convolutional networks. arXiv
preprint arXiv:1609.02907.
Lika, B., Kolomvatsos, K. and Hadjiefthymiades, S., 2014. Facing the cold start problem in recommender
systems. Expert systems with applications, 41(4), pp.2065-2073.
Afchar, D., Melchiorre, A., Schedl, M., Hennequin, R., Epure, E. and Moussallam, M., 2022. Explainability in
music recommender systems. AI Magazine, 43(2), pp.190-208.
Liu, J., Liu, C., Lv, R., Zhou, K. and Zhang, Y., 2023. Is ChatGPT a good recommender? A preliminary study. arXiv preprint arXiv:2304.10149.
Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P.,
Sastry, G., Askell, A. and Agarwal, S., 2020. Language models are few-shot learners. Advances in neural
information processing systems, 33, pp.1877-1901.
Zhang, Z., Han, X., Zhou, H., Ke, P., Gu, Y., Ye, D., Qin, Y., Su, Y., Ji, H., Guan, J. and Qi, F., 2021. CPM:
A large-scale generative Chinese pre-trained language model. AI Open, 2, pp.93-99.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W. and Liu, P.J., 2020.
Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine
Learning Research, 21(1), pp.5485-5551.
Sun, Y., Wang, S., Feng, S., Ding, S., Pang, C., Shang, J., Liu, J., Chen, X., Zhao, Y., Lu, Y. and Liu, W.,
2021. Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation.
arXiv preprint arXiv:2107.02137.
Thoppilan, R., De Freitas, D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H.T., Jin, A., Bos, T., Baker,
L., Du, Y. and Li, Y., 2022. Lamda: Language models for dialog applications. arXiv preprint
arXiv:2201.08239.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozière, B., Goyal, N., Hambro,
E., Azhar, F. and Rodriguez, A., 2023. Llama: Open and efficient foundation language models. arXiv
preprint arXiv:2302.13971.
Anil, R., Dai, A.M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., Shakeri, S., Taropa, E., Bailey, P.,
Chen, Z. and Chu, E., 2023. Palm 2 technical report. arXiv preprint arXiv:2305.10403.
Wei, J., Wei, J., Tay, Y., Tran, D., Webson, A., Lu, Y., Chen, X., Liu, H., Huang, D., Zhou, D. and Ma, T.,
2023. Larger language models do in-context learning differently. arXiv preprint arXiv:2303.03846.
Kim, H.J., Cho, H., Kim, J., Kim, T., Yoo, K.M. and Lee, S.G., 2022. Self-generated in-context learning:
Leveraging auto-regressive language models as a demonstration generator. arXiv preprint
arXiv:2206.08082.
Rubin, O., Herzig, J. and Berant, J., 2021. Learning to retrieve prompts for in-context learning. arXiv
preprint arXiv:2112.08633.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V. and Zhou, D., 2022. Chain-of-
thought prompting elicits reasoning in large language models. Advances in Neural Information Processing
Systems, 35, pp.24824-24837
Zelikman, E., Wu, Y., Mu, J. and Goodman, N., 2022. Star: Bootstrapping reasoning with reasoning.
Advances in Neural Information Processing Systems, 35, pp.15476-15488.
Fei, H., Li, B., Liu, Q., Bing, L., Li, F. and Chua, T.S., 2023. Reasoning Implicit Sentiment with Chain-of-
Thought Prompting. arXiv preprint arXiv:2305.11255.
Jin, Z. and Lu, W., 2023. Tab-CoT: Zero-shot Tabular Chain of Thought. arXiv preprint arXiv:2305.17812.
Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., Kambadur, P., Rosenberg, D. and
Mann, G., 2023. Bloomberggpt: A large language model for finance. arXiv preprint arXiv:2303.17564.
Sallam M. (2023). ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on
the Promising Perspectives and Valid Concerns. Healthcare (Basel, Switzerland), 11(6), 887.
https://doi.org/10.3390/healthcare11060887
Milano, S., McGrane, J.A. & Leonelli, S. (2023). Large language models challenge the future of higher
education. Nat Mach Intell 5, 333–334. https://doi.org/10.1038/s42256-023-00644-2
Kang, W.C., Ni, J., Mehta, N., Sathiamoorthy, M., Hong, L., Chi, E. and Cheng, D.Z., 2023. Do LLMs
Understand User Preferences? Evaluating LLMs On User Rating Prediction. arXiv preprint
arXiv:2305.06474.
Bao, K., Zhang, J., Zhang, Y., Wang, W., Feng, F. and He, X., 2023. Tallrec: An effective and efficient
tuning framework to align large language model with recommendation. arXiv preprint arXiv:2305.00447.
Chen, Z., 2023. PALR: Personalization Aware LLMs for Recommendation. arXiv preprint arXiv:2305.07622.
Gao, Y., Sheng, T., Xiang, Y., Xiong, Y., Wang, H. and Zhang, J., 2023. Chat-rec: Towards interactive and
explainable llms-augmented recommender system. arXiv preprint arXiv:2303.14524.
Deng, Y., Zhang, W., Xu, W., Lei, W., Chua, T.S. and Lam, W., 2023. A unified multi-task learning
framework for multi-goal conversational recommender systems. ACM Transactions on Information
Systems, 41(3), pp.1-25.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I.,
2017. Attention is all you need. Advances in neural information processing systems, 30.
Camacho-Collados, J. and Pilehvar, M.T., 2017. On the role of text preprocessing in neural network
architectures: An evaluation study on text categorization and sentiment analysis. arXiv preprint
arXiv:1707.01780.
Congcong Wang, Paul Nulty, and David Lillis. 2021. A Comparative Study on Word Embeddings in Deep
Learning for Text Classification. In Proceedings of the 4th International Conference on Natural Language
Processing and Information Retrieval (NLPIR '20). Association for Computing Machinery, New York, NY,
USA, 37–46. https://doi.org/10.1145/3443279.3443304
Phuong, M. and Hutter, M., 2022. Formal algorithms for transformers. arXiv preprint arXiv:2207.09238.
Han, X., Zhang, Z., Ding, N., Gu, Y., Liu, X., Huo, Y., Qiu, J., Yao, Y., Zhang, A., Zhang, L. and Han, W.,
2021. Pre-trained models: Past, present and future. AI Open, 2, pp.225-250.
Radford, A., Narasimhan, K., Salimans, T. and Sutskever, I., 2018. Improving language understanding by
generative pre-training.
Xavier Amatriain and Justin Basilico. 2016. Past, Present, and Future of Recommender Systems: An
Industry Perspective. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys
'16). Association for Computing Machinery, New York, NY, USA, 211–214.
https://doi.org/10.1145/2959100.2959144
Vasudeva Varma, 2009. Software architecture: A case-based approach. New Delhi, India: Dorling
Kindersley.
Mark Richards, Neal Ford, 2020. Fundamentals of Software Architecture: An Engineering Approach.
O’Reilly Media.
Srinivasan, R., 1995. RPC: Remote procedure call protocol specification version 2 (No. rfc1831).
Box, D., Ehnebuske, D., Kakivaya, G., Layman, A., Mendelsohn, N., Nielsen, H.F., Thatte, S. and Winer,
D., 2000. Simple object access protocol (SOAP).
Wilde, E. and Pautasso, C. eds., 2011. REST: from research to practice. Springer Science & Business
Media.
Jatana, N., Puri, S., Ahuja, M., Kathuria, I. and Gosain, D., 2012. A survey and comparison of relational and
non-relational database. International Journal of Engineering Research & Technology, 1(6), pp.1-5.
Fan, W., Zhao, Z., Li, J., Liu, Y., Mei, X., Wang, Y., Tang, J. and Li, Q., 2023. Recommender systems in the
era of large language models (llms). arXiv preprint arXiv:2307.02046
Herlocker, J.L., Konstan, J.A., Borchers, A. and Riedl, J., 1999, August. An algorithmic framework for
performing collaborative filtering. In Proceedings of the 22nd annual international ACM SIGIR conference
on Research and development in information retrieval (pp. 230-237).
Wang, J., Huang, P., Zhao, H., Zhang, Z., Zhao, B. and Lee, D.L., 2018, July. Billion-scale commodity
embedding for e-commerce recommendation in alibaba. In Proceedings of the 24th ACM SIGKDD
international conference on knowledge discovery & data mining (pp. 839-848).
Chen, X., Zhang, Y. and Wen, J.R., 2022. Measuring" why" in recommender systems: A comprehensive
survey on the evaluation of explainable recommendation. arXiv preprint arXiv:2202.06466.
Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. 2010. Recommender Systems: An Introduction. Cambridge University Press.
IBM Newsroom, 2020. 5 Things to Know About IBM’s New Tape Storage World Record. Available at
https://newsroom.ibm.com/IBM-research?item=32682.
Carlos A. Gomez-Uribe and Neil Hunt. 2016. The Netflix recommender system: Algorithms, business value, and innovation. ACM Transactions on Management Information Systems 6, 4 (2016).
Naumov, M., Mudigere, D., Shi, H.J.M., Huang, J., Sundaraman, N., Park, J., Wang, X., Gupta, U., Wu,
C.J., Azzolini, A.G. and Dzhulgakov, D., 2019. Deep learning recommendation model for personalization
and recommendation systems. arXiv preprint arXiv:1906.00091.
Shumpei Okura, Yukihiro Tagami, Shingo Ono, and Akira Tajima. 2017. Embedding-based news
recommendation for millions of users. In Proceedings of the SIGKDD. ACM, Halifax, NS, 1933–1942
Bozyiğit, F., Aktaş, Ö. and Kılınç, D., 2021. Linking software requirements and conceptual models: A
systematic literature review. Engineering Science and Technology, an International Journal, 24(1), pp.71-
82.
Nayan B. Ruparelia. 2010. Software development lifecycle models. SIGSOFT Softw. Eng. Notes 35, 3 (May
2010), 8–13. https://doi.org/10.1145/1764810.1764814
Ainslie, J., Lee-Thorp, J., de Jong, M., Zemlyanskiy, Y., Lebrón, F. and Sanghai, S., 2023. GQA: Training
Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. arXiv preprint
arXiv:2305.13245.
Massa, P. and Avesani, P., 2009. Trust metrics in recommender systems. In Computing with social trust
(pp. 259-285). London: Springer London.
Kunaver, M. and Požrl, T., 2017. Diversity in recommender systems–A survey. Knowledge-based systems,
123, pp.154-162.
Kotkov, D., Wang, S. and Veijalainen, J., 2016. A survey of serendipity in recommender systems.
Knowledge-Based Systems, 111, pp.180-192.