Two Tower LLM Recommendation

The document describes the development of a Two-Tower Recommendation System using Graph Neural Networks (GNNs), Large Language Models (LLMs), and Reinforcement Learning (RL) to enhance personalized recommendations for Yelp business data. It outlines the architecture, embedding enhancements, and training workflow, focusing on how user and item profiles are encoded and optimized for better recommendation accuracy. The system leverages advanced embeddings and contextual reasoning to improve user-item similarity and utilizes a Q-network for reinforcement learning to refine recommendation strategies.

Uploaded by

Noor Uddin
Copyright
© © All Rights Reserved

Two_Tower_Final

December 15, 2024

0.1 A Two-Tower Recommendation System Powered by GNNs, LLMs, and RL for Yelp Business Data

This notebook outlines the development of a personalized recommendation system that combines
Graph Neural Networks (GNNs) and Large Language Models (LLMs). We integrate Reinforcement
Learning (RL) to optimize recommendations and use sentence-level text embeddings for richer
user-item profiles. The pipeline focuses on the following:
1. Constructing a Two-Tower Model with GNN-based user and item embeddings.
2. Leveraging LLMs to enhance user-item similarity via contextual reasoning.
3. Applying RL with an LLM-based reward estimator and chain-of-thought reasoning.

0.2 Technical Explanation

0.2.1 Two-Tower Architecture

The Two-Tower model consists of two independent towers:
1. User Tower: Encodes user profiles using GNNs and user embeddings.
2. Item Tower: Encodes item attributes using GNNs and item embeddings.
These towers generate latent representations that are combined to compute similarity scores. An
additional LLM Adapter refines the embeddings with contextual language features by mapping LLM
embeddings into the same space as the GNN embeddings.
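
As a minimal sketch of how two tower outputs can be combined into similarity scores, the example below takes the inner product of one user vector with each item vector. The vectors and the helper names `dot` and `score_items` are hypothetical and chosen only for illustration; in the notebook the vectors come from the GNN towers and the LLM Adapter.

```python
# Hypothetical 3-dimensional tower outputs; real towers would produce learned vectors.
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_items(user_vec, item_vecs):
    """Return one similarity score per item for a single user vector."""
    return [dot(user_vec, item) for item in item_vecs]


user_vec = [1.0, 0.0, 2.0]
item_vecs = [[0.5, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 3.0]]
scores = score_items(user_vec, item_vecs)
print(scores)
```

A higher score means the item vector points in a direction closer to the user vector, so sorting items by score yields a ranking.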

0.2.2 Embedding Enhancements with LLMs

We employ pre-trained Sentence Transformers and GPT-2 for:
- Semantic Understanding: Sentence Transformers generate rich embeddings of user reviews and
item descriptions.
- Chain-of-Thought Reasoning: GPT-2 evaluates recommendations using contextual reasoning, which
improves semantic alignment between users and items.
The LLM embeddings are further transformed to a GNN-compatible space using a custom
TransformerAdapter.

0.2.3 Reinforcement Learning Optimization

Reinforcement learning fine-tunes the recommendation strategy. A Q-network learns to predict
optimal actions (recommendations) by maximizing long-term rewards based on user feedback.
Rewards are derived using:
1. Similarity Scores: From Sentence Transformers.
2. LLM-Generated Scores: Chain-of-thought reasoning provides detailed explanations and
suitability scores.
3. Metadata Signals: Additional contextual signals refine the reward mechanism.
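
The three reward sources can be blended into a single number. The sketch below follows the arithmetic used later in this notebook's RL loop (average the similarity with the 1 to 5 score rescaled to [0, 1], then add a small metadata bonus). The function name `combine_reward` and the range check are assumptions added for illustration.

```python
def combine_reward(similarity, cot_score, metadata_signal=0.0):
    """Blend a cosine similarity, a 1-5 LLM score, and a small metadata bonus."""
    if not 1.0 <= cot_score <= 5.0:
        raise ValueError("cot_score is expected to lie in [1, 5]")
    # Rescale the 1-5 score to at most 1.0 so both terms are on a comparable scale.
    blended = (similarity + cot_score / 5.0) / 2.0
    return blended + metadata_signal


reward = combine_reward(similarity=0.4, cot_score=3.0, metadata_signal=0.05)
print(round(reward, 4))
```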

0.2.4 Dataset and Graph Construction

Yelp data containing user reviews, ratings, and business details forms the core dataset. Users and
items are represented as nodes, while interactions (reviews and ratings) form edges in the graph.
Self-loops and bi-directional edges are added to enhance graph connectivity.
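
A small sketch of this construction, assuming a single shared node index space and a hypothetical helper `build_edge_list`: every interaction is stored in both directions and each node receives a self-loop. Unlike the notebook's tensor-based version, this sketch also removes duplicate edges.

```python
def build_edge_list(interactions, num_nodes):
    """Return a sorted list of directed (source, target) edges.

    Each interaction is stored in both directions, and every node
    receives a self-loop, mirroring the connectivity described above.
    """
    edges = set()
    for src, dst in interactions:
        edges.add((src, dst))
        edges.add((dst, src))
    for node in range(num_nodes):
        edges.add((node, node))
    return sorted(edges)


edges = build_edge_list([(0, 1), (1, 2)], num_nodes=3)
print(edges)
```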

0.2.5 Training Workflow

1. Two-Tower Training: The model minimizes prediction errors between generated scores and
actual user ratings using MSE loss.
2. RL Training: An ε-greedy policy balances exploration (discovering new preferences) and
exploitation (prioritizing learned preferences).
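
The ε-greedy rule can be sketched in a few lines: with probability ε a random action is chosen, otherwise the action with the largest Q-value. The function name `epsilon_greedy` and the injectable `rng` argument are assumptions made so the example is reproducible.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)   # always exploits
random_action = epsilon_greedy(q_values, epsilon=1.0, rng=rng)   # always explores
print(greedy_action, random_action)
```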

0.2.6 Query Mechanism

For a given user query, the system:
1. Encodes the query using the LLM Adapter.
2. Generates item embeddings via the item GNN.
3. Scores items using the Q-network and ranks them by predicted Q-values.
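
The final ranking step can be sketched as a top-k selection over Q-values. The item names and the helper `top_k_items` below are hypothetical; the notebook performs the same selection with `torch.topk`.

```python
import heapq


def top_k_items(q_values, id_to_item, k):
    """Return the k item IDs with the largest Q-values, best first."""
    best = heapq.nlargest(k, range(len(q_values)), key=lambda i: q_values[i])
    return [id_to_item[i] for i in best]


id_to_item = {0: "cafe_a", 1: "diner_b", 2: "sushi_c", 3: "pizza_d"}
ranked = top_k_items([0.2, 0.7, 0.1, 0.5], id_to_item, k=2)
print(ranked)
```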

0.3 Step 1: Import libraries

The first step is to import all the necessary libraries. We use PyTorch for deep learning, Sentence
Transformers for generating embeddings, and Transformers for LLMs.
[ ]: !pip install transformers torch_geometric sentence_transformers

Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-
packages (4.46.3)
Requirement already satisfied: torch_geometric in
/usr/local/lib/python3.10/dist-packages (2.6.1)
Requirement already satisfied: sentence_transformers in
/usr/local/lib/python3.10/dist-packages (3.2.1)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-
packages (from transformers) (3.16.1)
Requirement already satisfied: huggingface-hub<1.0,>=0.23.2 in
/usr/local/lib/python3.10/dist-packages (from transformers) (0.26.5)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-
packages (from transformers) (1.26.4)
Requirement already satisfied: packaging>=20.0 in
/usr/local/lib/python3.10/dist-packages (from transformers) (24.2)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-
packages (from transformers) (6.0.2)
Requirement already satisfied: regex!=2019.12.17 in
/usr/local/lib/python3.10/dist-packages (from transformers) (2024.9.11)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-
packages (from transformers) (2.32.3)
Requirement already satisfied: tokenizers<0.21,>=0.20 in
/usr/local/lib/python3.10/dist-packages (from transformers) (0.20.3)
Requirement already satisfied: safetensors>=0.4.1 in
/usr/local/lib/python3.10/dist-packages (from transformers) (0.4.5)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-
packages (from transformers) (4.66.6)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-
packages (from torch_geometric) (3.11.10)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages
(from torch_geometric) (2024.10.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages
(from torch_geometric) (3.1.4)
Requirement already satisfied: psutil>=5.8.0 in /usr/local/lib/python3.10/dist-
packages (from torch_geometric) (5.9.5)
Requirement already satisfied: pyparsing in /usr/local/lib/python3.10/dist-
packages (from torch_geometric) (3.2.0)
Requirement already satisfied: torch>=1.11.0 in /usr/local/lib/python3.10/dist-
packages (from sentence_transformers) (2.5.1+cu121)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-
packages (from sentence_transformers) (1.5.2)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages
(from sentence_transformers) (1.13.1)
Requirement already satisfied: Pillow in /usr/local/lib/python3.10/dist-packages
(from sentence_transformers) (11.0.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in
/usr/local/lib/python3.10/dist-packages (from huggingface-
hub<1.0,>=0.23.2->transformers) (4.12.2)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-
packages (from torch>=1.11.0->sentence_transformers) (3.4.2)
Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.10/dist-
packages (from torch>=1.11.0->sentence_transformers) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in
/usr/local/lib/python3.10/dist-packages (from
sympy==1.13.1->torch>=1.11.0->sentence_transformers) (1.3.0)
Requirement already satisfied: aiohappyeyeballs>=2.3.0 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->torch_geometric) (2.4.4)
Requirement already satisfied: aiosignal>=1.1.2 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->torch_geometric) (1.3.1)
Requirement already satisfied: async-timeout<6.0,>=4.0 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->torch_geometric) (4.0.3)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-
packages (from aiohttp->torch_geometric) (24.2.0)
Requirement already satisfied: frozenlist>=1.1.1 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->torch_geometric) (1.5.0)
Requirement already satisfied: multidict<7.0,>=4.5 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->torch_geometric) (6.1.0)
Requirement already satisfied: propcache>=0.2.0 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->torch_geometric) (0.2.1)
Requirement already satisfied: yarl<2.0,>=1.17.0 in
/usr/local/lib/python3.10/dist-packages (from aiohttp->torch_geometric) (1.18.3)
Requirement already satisfied: MarkupSafe>=2.0 in
/usr/local/lib/python3.10/dist-packages (from jinja2->torch_geometric) (3.0.2)
Requirement already satisfied: charset-normalizer<4,>=2 in
/usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-
packages (from requests->transformers) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in
/usr/local/lib/python3.10/dist-packages (from requests->transformers) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in
/usr/local/lib/python3.10/dist-packages (from requests->transformers)
(2024.8.30)
Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.10/dist-
packages (from scikit-learn->sentence_transformers) (1.4.2)
Requirement already satisfied: threadpoolctl>=3.1.0 in
/usr/local/lib/python3.10/dist-packages (from scikit-
learn->sentence_transformers) (3.5.0)

[ ]: import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torch_geometric.nn import GCNConv
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
from sklearn.model_selection import train_test_split
import random
from sentence_transformers import SentenceTransformer, util
from collections import defaultdict
import re

0.4 Step 2: Set device configuration

We configure the device to use GPU if available. This ensures faster computation for training and
inference.
[ ]: # Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

0.5 Step 3: Load pre-trained models

We load pre-trained models for:
1. Sentence Transformer embeddings for semantic representation.
2. BERT for tokenization and embeddings.
3. GPT-2 for chain-of-thought reasoning and reward estimation.
[ ]: # Load a sentence-transformer model for embeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"
st_model = SentenceTransformer(model_name)
st_model.to(device)

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94:
UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab
(https://huggingface.co/settings/tokens), set it as secret in your Google Colab
and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access
public models or datasets.
warnings.warn(

[ ]: SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with
Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token':
False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False,
'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens':
False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)

[ ]: # Define tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
llm_model_uncased = AutoModel.from_pretrained("bert-base-uncased").eval().to(device)

[ ]: # Load a generative LLM for chain-of-thought reasoning (we use GPT-2 here,
# but a more advanced model should give better results)
llm_generator_tokenizer = AutoTokenizer.from_pretrained("gpt2")
llm_generator_model = AutoModelForCausalLM.from_pretrained("gpt2").eval().to(device)

[ ]: # We add a pad token for GPT-2
if llm_generator_tokenizer.pad_token_id is None:
    llm_generator_tokenizer.pad_token = llm_generator_tokenizer.eos_token

0.6 Step 4: Adaptive embedding Transformer

The TransformerAdapter class maps embeddings from LLMs to a space compatible with GNN
embeddings. This alignment ensures smooth integration of language-based features with graph-
based features.
[ ]: class TransformerAdapter(nn.Module):
    def __init__(self, input_dim, embed_dim, num_heads=2, ff_dim=256, num_layers=1):
        super(TransformerAdapter, self).__init__()
        encoder_layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                                   dim_feedforward=ff_dim)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.input_proj = nn.Linear(input_dim, embed_dim)

    def forward(self, x):
        x = self.input_proj(x).unsqueeze(1)  # [batch_size, 1, embed_dim]
        x = x.permute(1, 0, 2)               # [1, batch_size, embed_dim]
        x = self.transformer_encoder(x)      # [1, batch_size, embed_dim]
        x = x.permute(1, 0, 2).squeeze(1)    # [batch_size, embed_dim]
        return x

0.7 Step 5: Dataset preparation

The dataset is loaded from a CSV file of Yelp reviews. Each row contains a business ID, a user
ID, the review text, a star rating, the review date, and vote counts (cool, useful, funny). We
define the YelpDataset class to process user and item data. The dataset aggregates user histories
and computes contextual embeddings using the Sentence Transformer model.
[ ]: data_path = 'https://raw.githubusercontent.com/MPAghababa/llms/main/two_tower/yelp.csv'
data = pd.read_csv(data_path)
data = data.dropna()
data.head(3)

[ ]:               business_id        date               review_id  stars  \
0  9yKzy9PApeiPPOUJEtnvkg  2011-01-26  fWKvX83p0-ka4JS3dc6E5A      5
1  ZRJwVLyzEJq1VAihDhYiow  2011-07-27  IjZ33sJrzXqU-0X6U8NwyA      5
2  6oRAC4uyJCsJl1X0WZpVSA  2012-06-14  IESLBzqUCLdSzSqm0eCSxQ      4

                                                text    type  \
0  My wife took me here on my birthday for breakf…  review
1  I have no idea why some people give bad review…  review
2  love the gyro plate. Rice is so good and I als…  review

                  user_id  cool  useful  funny
0  rLtl8ZkDX5vH5nAx9C3q5Q     2       5      0
1  0a2KyEL0d3Yb1V6aivbIuQ     0       0      0
2  0hT2KtfLiobPvh6cDC8JQg     0       1      0

[ ]: class YelpDataset(Dataset):
    def __init__(self, data, tokenizer, llm_model, max_length=128):
        self.data = data
        self.tokenizer = tokenizer
        self.llm_model = llm_model
        self.max_length = max_length

        # We build user profile embeddings by averaging all their reviews
        user_texts = defaultdict(list)
        for i, row in self.data.iterrows():
            user_texts[row['user_id']].append(row['text'])

        self.user_profile_embeddings = {}

        # We encode user histories using sentence-transformer (this will give
        # us contextualized user profiles)
        for uid, texts in user_texts.items():
            embeddings = st_model.encode(texts, convert_to_tensor=True, device=device)
            self.user_profile_embeddings[uid] = embeddings.mean(dim=0)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        user_id = row['user_id']
        item_id = row['business_id']
        text = row['text']
        stars = row['stars']

        tokens = self.tokenizer(
            text,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors="pt"
        )

        with torch.no_grad():
            embedding = st_model.encode([text], convert_to_tensor=True, device=device).squeeze(0)

        user_profile_embedding = self.user_profile_embeddings[user_id]
        embedding = (embedding + user_profile_embedding) / 2.0

        return (torch.tensor(user_id, dtype=torch.long),
                torch.tensor(item_id, dtype=torch.long),
                embedding,
                torch.tensor(stars, dtype=torch.float))
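
The profile logic in YelpDataset reduces to two averaging steps: a user profile is the mean of that user's review embeddings, and each sample blends one review embedding with the profile. The sketch below shows the arithmetic on hypothetical 2-dimensional vectors; `mean_vector` and `blend` are illustrative names, not part of the notebook.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def blend(review_vec, profile_vec):
    """Average a single review embedding with the user's profile embedding."""
    return [(r + p) / 2.0 for r, p in zip(review_vec, profile_vec)]


review_vecs = [[1.0, 0.0], [3.0, 2.0]]    # two reviews written by one user
profile = mean_vector(review_vecs)        # user profile embedding
blended = blend(review_vecs[0], profile)  # what one dataset sample would carry
print(profile, blended)
```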

0.8 Step 6: GNN towers

The GNNTower class defines a simple Graph Convolutional Network to process user and item
embeddings. It generates graph-based features for the Two-Tower Model.
[ ]: class GNNTower(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(GNNTower, self).__init__()
        self.conv1 = GCNConv(input_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, output_dim)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x
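
To give a feel for what a graph convolution computes, here is a simplified sketch of the symmetric-normalised aggregation step from the GCN formulation (each node sums its neighbours' features, including its own, scaled by 1/sqrt(deg_i * deg_j)). The learned weight matrix and bias are deliberately left out, and `gcn_aggregate` is a hypothetical helper, not the GCNConv implementation.

```python
import math


def gcn_aggregate(features, edges, num_nodes):
    """Symmetric-normalised neighbourhood sum over a graph with self-loops.

    `edges` holds undirected pairs; the learned weight matrix of a real
    GCN layer is left out so only the aggregation step is visible.
    """
    neighbours = {i: {i} for i in range(num_nodes)}  # start with self-loops
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    degree = {i: len(neighbours[i]) for i in range(num_nodes)}
    dim = len(features[0])
    out = []
    for i in range(num_nodes):
        row = [0.0] * dim
        for j in neighbours[i]:
            norm = 1.0 / math.sqrt(degree[i] * degree[j])
            for d in range(dim):
                row[d] += norm * features[j][d]
        out.append(row)
    return out


features = [[1.0], [2.0], [3.0]]  # one scalar feature per node
aggregated = gcn_aggregate(features, edges=[(0, 1), (1, 2)], num_nodes=3)
print(aggregated)
```

Stacking two such steps, as GNNTower does with two GCNConv layers, lets information travel two hops across the graph.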

0.9 Step 7: Two-Tower model

The Two-Tower Model integrates the GNN embeddings for users and items with LLM-based
embeddings processed by the TransformerAdapter. The similarity scores between user and item
embeddings are used for recommendations.
[ ]: class TwoTowerModel(nn.Module):
    def __init__(self, user_gnn, item_gnn, num_users, num_items, embed_dim, llm_dim):
        super(TwoTowerModel, self).__init__()
        self.user_gnn = user_gnn
        self.item_gnn = item_gnn
        self.user_embed = nn.Embedding(num_users, embed_dim)
        self.item_embed = nn.Embedding(num_items, embed_dim)
        self.llm_adapter = TransformerAdapter(input_dim=llm_dim, embed_dim=embed_dim)

    def forward(self, user_ids, item_ids, llm_embeddings, user_edge_index, item_edge_index):
        user_ids = user_ids.to(device)
        item_ids = item_ids.to(device)
        llm_embeddings = llm_embeddings.to(device)
        user_edge_index = user_edge_index.to(device)
        item_edge_index = item_edge_index.to(device)

        user_features = self.user_gnn(self.user_embed.weight, user_edge_index)
        item_features = self.item_gnn(self.item_embed.weight, item_edge_index)
        llm_features = self.llm_adapter(llm_embeddings)

        user_vectors = user_features[user_ids] + llm_features
        item_vectors = item_features[item_ids] + llm_features

        scores = (user_vectors * item_vectors).sum(dim=1)
        return scores

0.10 Step 8: Reward estimation

The LLMRewardEstimator class is responsible for calculating rewards using semantic similarity and
chain-of-thought reasoning. This reward mechanism will be used in reinforcement learning.
[ ]: class LLMRewardEstimator:
    def __init__(self, tokenizer, llm_model):
        self.tokenizer = tokenizer
        self.llm_model = llm_model

    def estimate_reward(self, query, recommendation_text):
        # Cosine similarity between the query and the recommended text
        with torch.no_grad():
            query_embedding = st_model.encode([query], convert_to_tensor=True, device=device)
            rec_embedding = st_model.encode([recommendation_text], convert_to_tensor=True,
                                            device=device)
            similarity = util.cos_sim(query_embedding, rec_embedding).item()
        return similarity

    def estimate_reward_cot(self, query, recommendation_text):
        prompt = (f"User query: {query}\n"
                  f"Recommended text: {recommendation_text}\n"
                  f"Explain why this recommendation is suitable for the user query."
                  f"Provide reasoning focusing on details like ambiance, food, and service."
                  f"End with a score between 1 and 5 based on suitability:\nReasoning:")

        input_ids = llm_generator_tokenizer.encode(prompt, return_tensors='pt').to(device)

        # Note: do_sample is not set, so generate() decodes greedily and
        # temperature/top_p have no effect (transformers warns about this).
        with torch.no_grad():
            outputs = llm_generator_model.generate(
                input_ids=input_ids,
                max_new_tokens=150,
                temperature=0.7,
                top_p=0.9
            )

        response = llm_generator_tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Use the last standalone digit 1-5 as the score, falling back to 3.0.
        # The decoded response still contains the prompt, so its digits can match too.
        scores = re.findall(r'\b[1-5]\b', response)
        final_score = float(scores[-1]) if scores else 3.0

        return final_score, response
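
One possible refinement, which is not what the notebook does: because the decoded response begins with the prompt, a regex over the full text can pick up the "1" and "5" from the instruction itself. The sketch below strips the prompt before searching; `parse_score` is a hypothetical helper and the sample strings are made up.

```python
import re


def parse_score(full_text, prompt, default=3.0):
    """Extract the last standalone 1-5 digit from the text after the prompt."""
    generated = full_text[len(prompt):] if full_text.startswith(prompt) else full_text
    matches = re.findall(r"\b[1-5]\b", generated)
    return float(matches[-1]) if matches else default


prompt = "End with a score between 1 and 5 based on suitability:\nReasoning:"
with_score = parse_score(prompt + " Cozy and quiet. Score: 4", prompt)
without_score = parse_score(prompt + " Cozy and quiet.", prompt)
# Passing an empty prompt searches the whole text, like the notebook's regex.
naive = parse_score(prompt + " Cozy and quiet.", prompt="")
print(with_score, without_score, naive)
```

When the model produces no score, the prompt-aware version falls back to the neutral default, while the whole-text search returns the "5" from the instruction.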

0.11 Step 9: Reinforcement learning training

Reinforcement Learning is applied to fine-tune the model for better recommendations. A Q-Network
learns to optimize actions (recommendations) based on rewards. RL ensures the system balances
recommending safe options with discovering new user preferences.
[ ]: class QNetwork(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super(QNetwork, self).__init__()
        self.fc1 = nn.Linear(state_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, action_dim)

    def forward(self, state):
        state = state.to(device)
        x = torch.relu(self.fc1(state))
        x = torch.relu(self.fc2(x))
        q_values = self.fc3(x)
        return q_values

[ ]: # RL Training
def train_rl_llm(model, q_network, optimizer, replay_buffer, batch_size, gamma=0.99):
    # Wait until the buffer holds at least one full batch
    if len(replay_buffer) < batch_size:
        return

    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states = zip(*batch)

    states = torch.stack(states).to(device)
    actions = torch.tensor(actions, dtype=torch.long).to(device)
    rewards = torch.tensor(rewards, dtype=torch.float).to(device)
    next_states = torch.stack(next_states).to(device)

    q_values = q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        next_q_values = q_network(next_states).max(1)[0]
        target_q_values = rewards + gamma * next_q_values

    loss = nn.MSELoss()(q_values, target_q_values)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
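
The update above regresses each predicted Q-value toward a one-step target, reward + gamma * max Q(next_state). The sketch below reproduces that arithmetic on hypothetical numbers without any neural network; `td_targets` and `mse` are illustrative helper names.

```python
def td_targets(rewards, next_q_values, gamma=0.99):
    """One-step targets: reward plus discounted best next-state Q-value."""
    return [r + gamma * max(q_next) for r, q_next in zip(rewards, next_q_values)]


def mse(predictions, targets):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


targets = td_targets(rewards=[1.0, 0.0],
                     next_q_values=[[0.0, 2.0], [1.0, 0.5]],
                     gamma=0.5)
loss = mse(predictions=[2.0, 1.5], targets=targets)
print(targets, loss)
```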

0.12 Step 10: Querying the Two-Tower model

We query the Two-Tower Model to generate recommendations for a user query. The LLM adapter
processes the query, and the item GNN produces graph-based item features. To incorporate RL
into the recommendation process, we rank items by the Q-values predicted by the trained
Q-network, so the learned reward signal guides which items are recommended. In the function
below, the item features are computed but the final ranking relies on the Q-values alone.
[ ]: def query_two_tower_model_rl(model, q_network, query, tokenizer, llm_model,
                             item_edge_index, item_embeddings, id_to_item, k=5):
    model.eval()
    with torch.no_grad():
        query_embedding = st_model.encode([query], convert_to_tensor=True,
                                          device=device).squeeze(0)
        query_embedding = model.llm_adapter(query_embedding.unsqueeze(0)).squeeze(0)

        item_edge_index = item_edge_index.to(device)
        item_embeddings = item_embeddings.to(device)

        item_features = model.item_gnn(item_embeddings, item_edge_index)

        q_values = q_network(query_embedding.unsqueeze(0)).squeeze(0)

        top_k_indices = torch.topk(q_values, k=k).indices.cpu().numpy()

        top_k_items = [id_to_item[idx] for idx in top_k_indices]

    return top_k_items

0.13 Step 11: Main execution for training and testing

[ ]: # We use a subset of data for quick experimentation; .copy() avoids pandas
# SettingWithCopyWarning when the ID columns are overwritten below
data = data[:100].copy()

# We map user and item IDs to indices
user_ids = data['user_id'].unique()
item_ids = data['business_id'].unique()
user_map = {uid: idx for idx, uid in enumerate(user_ids)}
item_map = {iid: idx for idx, iid in enumerate(item_ids)}
id_to_item = {idx: iid for iid, idx in item_map.items()}

# We filter and map IDs in the dataset
data = data[data['user_id'].isin(user_map.keys()) & data['business_id'].isin(item_map.keys())]
data['user_id'] = data['user_id'].map(user_map)
data['business_id'] = data['business_id'].map(item_map)
data = data.dropna(subset=['user_id', 'business_id']).reset_index(drop=True)

num_users = len(user_map)
num_items = len(item_map)
[ ]: # We create edge indices for graph representation
def create_edge_index(data, num_nodes):
    edges = torch.tensor(
        [[row['user_id'], row['business_id']] for _, row in data.iterrows()],
        dtype=torch.long
    )
    # Keep only edges whose endpoints are valid node indices
    valid_edges = edges[(edges[:, 0] < num_nodes) & (edges[:, 1] < num_nodes)]
    # Add the reversed edges to make the graph bi-directional
    edge_index = torch.cat([valid_edges, valid_edges.flip(1)], dim=0).t()
    # Add one self-loop per node
    self_loops = torch.arange(num_nodes, dtype=torch.long).unsqueeze(0).repeat(2, 1)
    edge_index = torch.cat([edge_index, self_loops], dim=1)
    return edge_index

user_edge_index = create_edge_index(data[['user_id', 'business_id']], num_users)
item_edge_index = create_edge_index(data[['business_id', 'user_id']], num_items)

user_edge_index = user_edge_index.to(device)
item_edge_index = item_edge_index.to(device)

[ ]: # Initialize reward estimator and model components
reward_estimator = LLMRewardEstimator(tokenizer, llm_model_uncased)

embed_dim = 128
llm_dim = 384
user_gnn = GNNTower(embed_dim, 64, embed_dim).to(device)
item_gnn = GNNTower(embed_dim, 64, embed_dim).to(device)
model = TwoTowerModel(user_gnn, item_gnn, num_users, num_items, embed_dim, llm_dim).to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/transformer.py:379:
UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False
because encoder_layer.self_attn.batch_first was not True(use batch_first for
better inference performance)
warnings.warn(

[ ]: # Split data and create datasets


train_data, test_data = train_test_split(data, test_size=0.2, random_state=27)
train_dataset = YelpDataset(train_data, tokenizer, llm_model_uncased)
test_dataset = YelpDataset(test_data, tokenizer, llm_model_uncased)

[ ]: # We define data loader and collation function
def collate_fn(batch):
    user_ids_batch = []
    item_ids_batch = []
    llm_embeddings = []
    stars_batch = []
    for (u, i, e, s) in batch:
        user_ids_batch.append(u)
        item_ids_batch.append(i)
        llm_embeddings.append(e)
        stars_batch.append(s)
    user_ids_batch = torch.stack(user_ids_batch)
    item_ids_batch = torch.stack(item_ids_batch)
    llm_embeddings = torch.stack(llm_embeddings)
    stars_batch = torch.stack(stars_batch)
    return user_ids_batch, item_ids_batch, llm_embeddings, stars_batch

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn)
test_loader = DataLoader(test_dataset, batch_size=32, collate_fn=collate_fn)

[ ]: # Train the Two-Tower model
for epoch in range(2):  # You should use more epochs
    model.train()
    total_loss = 0
    for user_ids_batch, item_ids_batch, llm_embeddings, stars in train_loader:
        user_ids_batch = user_ids_batch.to(device)
        item_ids_batch = item_ids_batch.to(device)
        llm_embeddings = llm_embeddings.to(device)
        stars = stars.to(device)
        optimizer.zero_grad()
        predictions = model(user_ids_batch, item_ids_batch, llm_embeddings,
                            user_edge_index, item_edge_index)
        loss = nn.MSELoss()(predictions, stars.float())
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {total_loss:.4f}")

Epoch 1, Loss: 36187.9502


Epoch 2, Loss: 16775.4653

[ ]: #Initialize Q-Learning components


q_network = QNetwork(state_dim=embed_dim, action_dim=num_items).to(device)
q_optimizer = optim.Adam(q_network.parameters(), lr=0.001)
replay_buffer = []

[ ]: # Train RL model with LLM-based rewards
query_pool = [
    "I love cozy cafes with great coffee",
    "Looking for a family-friendly restaurant with vegan options",
    "Find me a cozy diner with a romantic ambiance and live music",
    "I want a budget-friendly Italian restaurant that serves gluten-free pasta",
    "Recommend a sushi place nearby with great reviews",
]  # You should add more queries

for episode in range(2):  # You should increase the number of episodes for better training
    query = random.choice(query_pool)
    item_texts = data['text'].tolist()

    query_embedding = st_model.encode([query], convert_to_tensor=True, device=device).squeeze(0)
    query_embedding = model.llm_adapter(query_embedding.unsqueeze(0)).squeeze(0)

    q_values = q_network(query_embedding.unsqueeze(0))
    if random.random() < 0.1:  # ε-greedy exploration
        action = random.randint(0, len(item_texts) - 1)
    else:
        action = torch.argmax(q_values).item()

    if action >= len(item_texts):
        action = len(item_texts) - 1

    reward_base = reward_estimator.estimate_reward(query, item_texts[action])
    cot_score, cot_reasoning = reward_estimator.estimate_reward_cot(query, item_texts[action])
    reward = (reward_base + (cot_score / 5.0)) / 2.0

    metadata_signal = random.uniform(0.0, 0.1)
    reward += metadata_signal

    next_state = query_embedding.detach()
    replay_buffer.append((query_embedding.detach(), action, reward, next_state))

    train_rl_llm(model, q_network, q_optimizer, replay_buffer, batch_size=4)

    print(f"Episode {episode+1}: Query: {query}, Action: {action}, "
          f"Reward: {reward:.4f}, CoT Reasoning: {cot_reasoning}")

/usr/local/lib/python3.10/dist-
packages/transformers/generation/configuration_utils.py:590: UserWarning:
`do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this
flag is only used in sample-based generation modes. You should set
`do_sample=True` or unset `temperature`.
warnings.warn(
/usr/local/lib/python3.10/dist-
packages/transformers/generation/configuration_utils.py:595: UserWarning:
`do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is
only used in sample-based generation modes. You should set `do_sample=True` or
unset `top_p`.
warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may
observe unexpected behavior. Please pass your input's `attention_mask` to obtain
reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad
token is same as eos token. As a consequence, you may observe unexpected
behavior. Please pass your input's `attention_mask` to obtain reliable results.
Episode 1: Query: I love cozy cafes with great coffee, Action: 94, Reward:
0.6985, CoT Reasoning: User query: I love cozy cafes with great coffee
Recommended text: I grew up on Empanadas in Panama and I have been hard pressed
to find anything close to them in the U.S.. today I found them!

A perfectly crunchy crust and the beef was beautifully spiced. Usually Empanadas
are bland and soggy. They did a great job on these.

I usually don't like rice, but the rice and black beans were wonderful.

Service was great! I'll be back!


Explain why this recommendation is suitable for the user query.Provide reasoning
focusing on details like ambiance, food, and service.End with a score between 1
and 5 based on suitability:
Reasoning: 1. The service was good. 2. The food was good. 3. The ambiance was
good. 4. The service was good. 5. The ambiance was good. 6. The service was
good. 7. The ambiance was good. 8. The service was good. 9. The service was
good. 10. The service was good. 11. The service was good. 12. The service was
good. 13. The service was good. 14. The service was good. 15. The service was
good. 16. The service was good. 17. The service was good. 18. The service was
good. 19. The service was good. 20. The service was good. 21. The service was
good.
The attention mask and the pad token id were not set. As a consequence, you may
observe unexpected behavior. Please pass your input's `attention_mask` to obtain
reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Episode 2: Query: I love cozy cafes with great coffee, Action: 80, Reward:
0.6014, CoT Reasoning: User query: I love cozy cafes with great coffee
Recommended text: The vibe exuding from this place is pure awesomeness.
Reminiscent of a trendy hipster coffee joint, this is actually a casual vegan
restaurant.

I am a pescatarian and unless I am eating seafood, I steer clear of meat, even
the mock kind usually however once in a while it's delicious… so why try it
here at Green I thought. The menu's style is comfort food, which as we all know
traditionally is heavy on the meat, sauces, and fat content… so at least if
you're going to be bad, you can do it with organic and pure ingredients.

So as I wanted to sample as much of the menu as possible, my lovely friend, her
fiance, and I shared a few items:

*Artichoke Gratine: The corn chips were amazing, lightly salted and crisp. The
dip was a bit too garlicky and runny for my liking. Ate a few bites of this but
could not see myself eating the entire thing solo.

*Spicy Buffalo "Wings": first things first… do not let looks dismay you…
true it looks gross but they taste legit! The flavor of the buffalo sauce was
perfect, although could have been spicier. And the cucumber ranch dipping sauce
was perfectly creamy and lightly flavored as to not overpower the "wings". This
dish is a must try!!

*Vegan Chili Fries: the fries are thin cut and tasty. The chili sauce was good,
at first, but I quickly got sick of the flavor. This could be because I was
never a huge chili fan even back when I ate meat. Hmm, I think you are better
off ordering the thyme fries.

*Crab Puffs: Perfectly crisp with a delicious creamy filling. Another must try!

Lastly my friend's fiance ordered that day's special which was a green chili
burrito… delicious and huge. A bit too much rice but besides that a great
option. It came with a side, which he ordered the curry pasta salad, mmm.

Green serves bowls, sandwiches, pizzas, salads… next time I am back in the
area I will be checking out more of the menu when craving "meat" and not my
usual tofu, seafood, veggie diet.

Might I add the service is friendly. Perfect place for a casual friend date.
Explain why this recommendation is suitable for the user query.Provide reasoning
focusing on details like ambiance, food, and service.End with a score between 1
and 5 based on suitability:
Reasoning: 1) The service is good, the food is good, and the service is good. 2)
The service is good, the food is good, and the service is good. 3) The service
is good, the food is good, and the service is good. 4) The service is good, the
food is good, and the service is good. 5) The service is good, the food is good,
and the service is good. 6) The service is good, the food is good, and the
service is good. 7) The service is good, the food is good, and the service is
good. 8) The service is good, the food is good, and the service is good. 9) The
service is good

Step 12: Query Two-Tower model for recommendations

[ ]: # Example 1
user_query_1 = "I love quiet coffee shops with excellent Wi-Fi and great desserts."
top_recommendations = query_two_tower_model_rl(
    model=model,
    q_network=q_network,
    query=user_query_1,
    tokenizer=tokenizer,
    llm_model=llm_model_uncased,
    item_edge_index=item_edge_index,
    item_embeddings=model.item_embed.weight,
    id_to_item=id_to_item,
    k=5
)
print("Top Recommendations for Query 1 are:", top_recommendations)

Top Recommendations for Query 1 are: ['QGeliKMObpVZ3jP89--ZIg',
'_1QQZuf4zZOyFCvXc0o6Vg', '8ZwO9VuLDWJOXmtAdc7LXQ', '7SO_rX1F6rQEl-5s3wZxgQ',
'znBnrQNq1FdUt5aIGAbyuQ']

[ ]: # Example 2
user_query_2 = "Looking for a family-friendly restaurant with vegan options."
top_recommendations_2 = query_two_tower_model_rl(
    model=model,
    q_network=q_network,
    query=user_query_2,
    tokenizer=tokenizer,
    llm_model=llm_model_uncased,
    item_edge_index=item_edge_index,
    item_embeddings=model.item_embed.weight,
    id_to_item=id_to_item,
    k=5
)
print("Top Recommendations for Query 2 are:", top_recommendations_2)

Top Recommendations for Query 2 are: ['QGeliKMObpVZ3jP89--ZIg',
'8ZwO9VuLDWJOXmtAdc7LXQ', '_1QQZuf4zZOyFCvXc0o6Vg', '7SO_rX1F6rQEl-5s3wZxgQ',
'puy0PzIcCgR3KWJI7llBFQ']

0.14 Remarks and suggestions for next steps

1. Scalability: Expand to full datasets and incorporate larger LLMs like GPT-4 or specialized
fine-tuned models.
2. Fine-Grained Evaluation: Evaluate the performance of the model using metrics such as
Precision@k, Recall@k, and NDCG@k. Introduce precision-recall curves and deeper
analysis of explainability metrics.
3. API Integration: Develop a FastAPI-based interface for real-time querying and evaluation.
4. Explainability Enhancements: Integrate visual tools to display reasoning paths and graph
relationships.
5. Enhanced Graph Learning: Incorporate advanced GNN architectures like GraphSAGE
or Graph Attention Networks.
6. User-Centric Feedback Loop: Add mechanisms for users to rate recommendations, improving
RL reward signals.
7. Model Deployment: Use edge AI or containerization for scalable deployment.

0.14.1 Let’s connect and let me know if you have any comments.

https://www.linkedin.com/in/mpaghababa/
