
This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3346918

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2023.0322000

A Flexible Two-tower Model for Item Cold-start Recommendation
WON-MIN LEE¹, YOON-SIK CHO¹
¹Department of Artificial Intelligence, Chung-Ang University, Seoul 06974, South Korea
Corresponding author: Yoon-Sik Cho ([email protected])
This work was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded
by the Korean government (MSIT) (No. 2021-0-01341, Artificial Intelligence Graduate School Program of Chung-Ang Univ.), and the
National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2021R1F1A1063389)

ABSTRACT One of the main challenges in recommendation systems is the item cold-start problem, where the absence of historical interactions or ratings for new items makes recommendation difficult. To solve the cold-start problem, hybrid neural network models that use the metadata of items as features are widely used. However, existing cold-start models tend to focus too much on utilizing the side information of items, and may not be flexible enough to capture the interaction information of users. In this study, we propose a flexible framework for better capturing the interaction information of users. Specifically, we incorporate the multiple choice learning scheme into the two-tower neural network, a popular recommendation model that consists of two towers, one for users and one for items. In our proposed framework, we construct two encoders. One of the two encoders, the tightly-coupled encoder, focuses on the side information of items with which the user has interacted; the other, the loosely-coupled encoder, focuses on the user’s interaction information. We utilize Gumbel-Softmax to stochastically select the encoder, enhancing flexibility so that the model considers not only item features but also user interaction information. We evaluate our proposed framework on two datasets: the MLIMDb dataset, which combines the widely used MovieLens and IMDb datasets based on their common movies, and the CiteULike dataset. The experimental results show that our proposed framework achieves state-of-the-art results on cold-start recommendation. In the Recall@150 experiments on the CiteULike dataset, we achieved an improvement of approximately 2.7% compared to the base model. In the Recall@150 experiments on the MLIMDb dataset, we achieved an improvement of approximately 5.2% compared to the base model. We further show that our proposed model improves performance in the warm-start setting. In the Recall@100 experiments on the CiteULike dataset, we observed an improvement of approximately 1.3% compared to the base model. In the Recall@100 experiments on the MLIMDb dataset, we observed an improvement of approximately 3.9% compared to the base model. Our proposed framework provides a flexible approach for capturing the diverse aspects of users in recommendation systems, even for cold-start items. As demonstrated through extensive experiments, our proposed model outperforms several state-of-the-art models on both datasets.

INDEX TERMS recommendation system, cold-start problem, hybrid neural network

VOLUME 11, 2023
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4

I. INTRODUCTION
The Recommendation System (RS) is used in various fields and is beneficial in many ways [1]. It generates revenue by providing high-quality products, saves users’ time and effort when finding items of interest, and facilitates access to new products. RS in industry has attracted a great deal of attention in the past few years, and consequently many algorithms have been proposed. Research in this field has been active in recent years. In [2], [3], and [4], the authors propose algorithms to understand user needs within the context of user-system dialogues and provide recommendations. These algorithms require sophisticated technology that can grasp the context of user conversations while delivering appropriate recommendations. In [5], [6], and [7], the authors propose algorithms to recommend the next item for user interaction, relying on information about the sequence in which users have interacted with items. However, one of the main challenges in RS is recommending new items, for which no previous interaction or rating records exist. This is called the ‘cold-start problem’, and many algorithms [8]–[13] have been proposed to address it.
In various domains of RS, work on solving the cold-start

problem using additional features [14]–[17] is actively under way. One of the most popular approaches is the hybrid method, which combines the Collaborative Filtering (CF) and Content-Based Filtering (CBF) methods. CF predicts unseen interactions between users and items using only feedback history, but its performance degrades when feedback is sparse. CBF relies on user and item side information for predictions, making it applicable when such information is available. However, it cannot capture user interaction patterns. To address these constraints, hybrid methods combine both approaches, mapping side information and feedback to separate low-dimensional representations and combining them to predict the final interaction. When implemented in neural networks, the two-tower approach [17]–[21] is favored for constructing hybrid models. The hybrid method has also been successfully applied in many real-world recommendation systems, such as Amazon and Netflix. One of the main advantages of the hybrid method is that it can effectively utilize side information for handling the cold-start problem.
In two-tower neural networks, two encoders are employed separately for learning representations of users and items. In the item encoder, the features of the item are used as input and transformed into a lower-dimensional representation. In the user encoder, the information of the user is used as input and transformed into a lower-dimensional representation. Under this framework, information about the item or the user can be effectively exploited, and thus two-tower models have proven to be effective in cold-start scenarios.

FIGURE 1: The recommended flow of cold-start item in our proposed method. Other aspects besides item information should also be considered for cold-start recommendation. [Diagram labels: interaction items of user A; interaction items of user B; new items; selected new items of user A; selected new items of user B; high similarity; low similarity; ①: tightly-coupled method; ②: loosely-coupled method.]

The recently introduced method [18] based on this two-tower RS has further improved the performance for item cold-start recommendation. In [18], the item representation was directly shared into the user encoder in an attempt to better unify the two representations, which achieved state-of-the-art results for item cold-start recommendation. While this approach might be effective in recommending items with similar features, we believe this over-constrained setting is limited in capturing preferences beyond item features. As pointed out in a previous study [15], a user who only reads sports news may still be interested in crime news, a case in which we believe the method in [18], with its tightly coupled towers, underperforms.
In this paper, we study the item cold-start problem and address the aforementioned limitation by improving the flexibility of the user representation, which is achieved through an additional user encoder for capturing loosely coupled representations. Our proposed scheme has an item encoder and two user encoders, where one of the user encoders shares the item embeddings as in [18], and the other user encoder does not share the item embeddings but focuses more on the interaction information of users. We refer to this scheme as the ‘Flexible Two-Tower Model’. We take inspiration from the multiple choice learning scheme [22], which can choose the best performing user encoder at each user-item interaction. Specifically, we use Gumbel-Softmax for stochastically selecting the promising method at each interaction. By selecting between the two user encoders, we can focus on item feature information or interaction data, depending on the user, which allows us to make better recommendations. Thus, in the training phase, we expect the tightly coupled user encoder to focus on capturing

the item features, while the loosely coupled user encoder can focus on the relationship between users and items. We conduct our experiments on completely cold-start items, which have not been seen in the training phase. Figure 1 illustrates how our proposed scheme can capture these two scenarios. In Figure 1, items in the blue color range, whether among the new items or the interaction items, are similar to each other, while the item in the green color range is dissimilar to them. Through the lists of interaction items of user A and user B, we can see that the common items that both users interacted with have features closer to the blue color range than to the green color range. However, in reality, user B only selects items with blue features, similar to the previous interaction pattern, while user A deviates from the previous interaction pattern and selects not only the item with blue features but also the item with green features among the new items. As shown in this example, users may actually select new items with features that they did not interact with before. If a recommendation system only recommends tightly coupled items with high similarity, it may miss potential recommendations for users like user A. In our proposed scheme, we can use a loosely-coupled method that considers the relationship between users and items, in addition to the tightly coupled method.
The evaluation is conducted by comparing recall with previous cold-start models. Our experimental results reveal that our proposed method improves on the state-of-the-art model in cold-start item recommendation. Moreover, in warm-start settings, our proposed method achieves competitive performance, outperforming the previous model [18].
The main contributions of our work can be summarized as follows:
• We propose a flexible design of the two-tower recommendation model that can better capture users’ diverse behavioral patterns.
• Our proposed training scheme with Gumbel-Softmax stochastically selects the most relevant encoder for each interaction between user and item. In turn, in the training phase, each encoder learns an attentive representation.
• Our experimental results support our theoretical claims and also demonstrate that our method empirically achieves state-of-the-art results for cold-start item recommendation.

II. RELATED WORK
RS is used in various fields, and many applications suffer from the cold-start problem [14]–[16], [23], [24]. To address this problem, researchers have proposed various approaches [25]. Specifically, hybrid models, which leverage side information of users or items, have been widely used in the field of deep neural networks [17]–[21], [26]–[28]. The two-tower model is a popular hybrid model in neural networks [17]–[21].

A. COLD START PROBLEM IN VARIOUS DOMAINS
The cold start problem is present in recommender systems from various domains, where side information is often utilized. In the music domain, the authors in [14] leveraged music and artist side information in a graph autoencoder architecture to effectively incorporate the genre, country, and mood of music into node embedding representations learned from the graph. Moreover, they tackled the cold start artist problem by automatically ranking the top-k most similar neighbors of new artists using a gravity-inspired mechanism. In the news domain, the authors in [15] proposed a framework to perform recommendation with personalized user interest and also incorporated news popularity to alleviate the cold-start problem. In their work, entities obtained from news titles, content embedding vectors such as words, and click-through rates are used as features. The personalization score is calculated from the embedding vector obtained from the news title, and the news popularity score is calculated from the news content and click-through rate. These two scores are combined for news recommendations. In the movie domain, [16] proposed an attentive graph neural network model to leverage movie side information such as categories and directors in a graph neural network. It highlights the importance of exploiting the attribute graph rather than the interaction graph in addressing the strict cold-start problem in neural graph RS. In the online store domain, [24] addresses the cold-start problem in the Next Basket Recommendation System (NBRS). To solve this problem, the authors proposed a model that incorporates various couplings between users and items, ensuring the effective transmission of user/item information. The model improves the Particle Swarm Optimization algorithm to optimize the weights and biases of the Deep Auto-Regressive network for learning heterogeneous couplings across baskets, ultimately recommending the next basket. Furthermore, it integrates the Adaptive Response to Particle Adjustment Strategy (ARPAS) into their framework. [23] addresses the cold-start problem by leveraging item feedback information from various domains, including Movies, Books, Games, and Perfumes, where users have interacted with items. The authors propose a cross-domain recommendation network that integrates a sparse local sensitivity mechanism into geometric deep learning algorithms. Additionally, they introduce a local sensitivity adapter to capture crucial local geometric information within the recommendation system’s structure.

B. TWO-TOWER NETWORK
In the deep neural network literature, hybrid models have been actively used for solving cold-start problems. In [21], [17], [19], and [20], both user and item features are utilized to address the cold start problem. The authors in [17] combined two two-tower networks, where one is specialized for computing the inner product of user and item embeddings, and the other is specialized for computing the prediction score using user and item features. In each tower, the GMF and MLP structures are applied to compute the prediction score, and they are connected in the NeuMF layer to obtain the final prediction score, which addresses the cold start problem. In [19], [21], both models use a user tower with the user’s ID and feature information, and an item tower with the item’s ID and feature information. In [21], two stacked

denoising autoencoders are used to learn the user and item features. The extracted user feature vector and embedded user ID, as well as the extracted item feature vector and embedded item ID, are concatenated to form the user and item latent vectors. These latent vectors are then fed into a neural network to learn the user-item relationship and generate a predicted rating. The hybrid model in [19] combines content-based and collaborative filtering approaches, and can effectively integrate content and past feedback information. The attention mechanism is employed to control the proportion of the two types of information for each user-item pair. Additionally, an adaptive learning strategy called ‘cold sampling’ is performed to address the cold start problem. The model in [20] consists of a user tower that takes the user’s preference and side information as inputs to generate the user’s latent representation, and an item tower that takes the item’s preference and side information as inputs to generate the item’s latent representation. This model focuses on integrating preference and side information, and thus exhibits performance degradation when past interaction data is less available. The shared attention model [18] is also based on the two-tower framework. One of the encoders takes users’ past interactions with the items, and the other takes all item features as inputs. This model is called a shared model because the item representation is shared with the first layer of the user encoder. The shared attention model currently achieves state-of-the-art results in the item cold-start setting. However, as the recommendations mainly rely on the feature similarity between items, items beyond this feature similarity cannot be recommended effectively. Items that are not similar in features tend to be ignored in recommendations, which may limit the ability to provide novel recommendations [29]. To address this problem, in addition to the method which only uses item feature similarities in a tightly shared manner, we propose a flexible model that can capture user behaviors beyond item similarities in side information.

III. PROPOSED MODEL
A. PRELIMINARIES
In this paper, a set of $M$ users is denoted as $U = \{u_1, \ldots, u_M\}$ and a set of $N$ items is denoted as $V = \{v_1, \ldots, v_N\}$. $R$ denotes the $M \times N$ interaction matrix with implicit feedback; when user $m$ interacted with item $n$ in the past, $R_{mn} = 1$, otherwise $R_{mn} = 0$. Side information, which can come from items or users, can be exploited for recommendation with implicit feedback. Due to privacy concerns, item side information is often used in the literature. We use the $N \times D$ item side information $X$ as the feature in our study for RS, where $D$ is the dimension of the feature of each item. In item cold-start recommendation, the goal is to predict whether users will like a new item, for which no past interaction information exists, based on the available data; the two-tower model is often used for this purpose.
Hybrid approaches can effectively combine the advantages of collaborative filtering and content-based RS. In neural networks, the two-tower model is an intuitive structure for feeding side information of users or items. A two-tower network consists of two encoders for generating user representations and item representations, where we denote the item encoder as $g^{\text{item}}(\cdot)$ and the user encoder as $g^{\text{user}}(\cdot)$. Following the implementation in [18], each encoder is a Multi-Layer Perceptron (MLP) using the hyperbolic tangent (tanh) function as the activation function. Once the representations are generated, the dot product of the user representation and the item representation is compared with the ground-truth positive or negative pairs in the training phase.
Using the two-tower framework, [18] proposed a shared model to effectively solve cold-start item recommendation. In their approach, the item representation from the item encoder is shared with the hidden item representation of the user encoder. The user encoder in [18] takes the user’s past interactions with the items as its input, instead of the user information; the item encoder takes the item features as its input. The item representation and user representation can be obtained as follows:
$$z^{\text{item}}_n = g^{\text{item}}(X_n), \tag{1}$$
$$z^{\text{user}}_m = g^{\text{user}}(R_{m,:}), \tag{2}$$
where the first layer of $g^{\text{user}}$ is directly borrowed from the item representations in the shared model. [18] applied an attention mechanism to the input of the user encoder for further performance improvement. However, the shared model tends to focus too much on item feature similarities, and thus can lose flexibility. Throughout the paper, we refer to the model proposed in [18] as the tightly coupled model.

B. PROPOSED SCHEME FOR ENHANCING FLEXIBILITY
Motivated by the aforementioned limitations, we propose a novel ensemble approach to the two-tower network by adding an additional user encoder. We resort to the multiple-choice learning [22] scheme for training an ensemble of two (multiple) models, where the overall structure of our model is provided in Figure 2. To overcome the limitations of the tightly coupled model from [18], we add an additional model, namely, the loosely coupled model. As the name implies, the layers of the user encoder in the loosely coupled model do not share any information with the item representations. We expect the user encoder in the loosely coupled model to further capture the user representations that can be missed under the strict assumption of the tightly coupled model. As can be seen in Figure 2, while each item generates its own item representation, each user generates a tight user representation and a loose user representation from different encoders. At each user-item interaction, the multiple-choice learning scheme selects the more relevant of the two user representations. We defer the details of our multiple-choice learning scheme to the following section.
As in [18], we use a Multi-Layer Perceptron (MLP) with the hyperbolic tangent (tanh) activation function for all encoders in our model. While the item encoder is denoted as $g^{\text{item}}(\cdot)$, we differentiate the user encoders as user encoder 1, $g^{\text{user1}}(\cdot)$, for the tightly coupled model, and user encoder 2, $g^{\text{user2}}(\cdot)$, for the loosely coupled model. The item encoder takes D-dimensional item features as input and obtains a low-dimensional item representation

through $g^{\text{item}}(\cdot)$ in Equation 1. For the user representations, we use the following encoders:
$$z^{\text{user1}}_m = g^{\text{user1}}(R_{m,:}), \tag{3}$$
$$z^{\text{user2}}_m = g^{\text{user2}}(R_{m,:}), \tag{4}$$
where the encoder $g^{\text{user1}}(\cdot)$ is borrowed from the encoder $g^{\text{user}}(\cdot)$ in the shared model [18], and the encoder $g^{\text{user2}}(\cdot)$ is a three-layered MLP with all trainable weights in each layer.

FIGURE 2: Overall structure of proposed model. [Diagram labels: interaction vector; item side information; shared embedding; user representation (tightly-coupled); user representation (loosely-coupled); item representation; MLP layer; logits1; logits2; σ; Gumbel Softmax; ŷ1 / ŷ2.]

TABLE 1: Performance comparison between Softmax and Gumbel-Softmax in our proposed model. Gumbel-Softmax always performs better than Softmax.

(a) R@100 of item cold-start under standard setting
Gating Mechanism    CiteULike    MLIMDb
Gumbel-Softmax      67.5         55.2
Softmax             67.4         53.8

(b) R@100 of item cold-start under challenging setting
Gating Mechanism    CiteULike    MLIMDb
Gumbel-Softmax      61.3         53.1
Softmax             60.5         52.1

From the tightly coupled model, the output $\hat{y}_1 \in \{0, 1\}$ is predicted using the sigmoid function:
$$\Pr(\hat{y}_1 = 1) = \sigma(z^{\text{user1}} \cdot z^{\text{item}}), \tag{5}$$
where $\sigma$ is the sigmoid function, and the logit in the sigmoid is the inner product of the user and item representations. Likewise, the output of the loosely coupled model can be obtained as below:
$$\Pr(\hat{y}_2 = 1) = \sigma(g^{r}(z^{\text{user2}} \cdot z^{\text{item}})), \tag{6}$$
where the MLP layer $g^{r}(\cdot)$ is additionally added for capturing the complex relationship between user-item interactions. We empirically found that the extra MLP layer in the loosely coupled model contributes to further performance improvement.
Finally, the model generates the final output from $\hat{y}_1$ and $\hat{y}_2$, where the final output is selected from one of the two. This ensemble approach offers great flexibility by allowing us to combine different assumptions. Thus, recommendation beyond item side information can still be conducted, and vice versa. In the following, we present the details of our ensemble scheme.
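As an illustration of Equations 1 to 6, the following is a minimal sketch in plain Python rather than the PyTorch implementation used in the paper. It assumes one possible reading of the shared first layer, in which the item representations themselves serve as the first-layer weights of the tightly coupled user encoder. The helper names (`tanh_layer`, `random_layer`, `tight_user_repr`, `score_pair`), the toy layer sizes, and the shape of the relation MLP $g^{r}$ are hypothetical and chosen only for readability.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def tanh_layer(weights, bias, x):
    """One dense layer with tanh activation; weights is a list of rows."""
    return [math.tanh(dot(row, x) + b) for row, b in zip(weights, bias)]


def mlp(layers, x):
    """Apply a stack of (weights, bias) tanh layers to the vector x."""
    for weights, bias in layers:
        x = tanh_layer(weights, bias, x)
    return x


def random_layer(rng, n_in, n_out, scale=0.5):
    """Hypothetical initializer: small uniform weights, zero bias."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def tight_user_repr(r_m, z_items, upper_layers):
    """Tightly coupled encoder (assumed form of the shared first layer)."""
    dim = len(z_items[0])
    # hidden[k] = tanh(sum_n R_mn * z_item_n[k]): the item representations
    # act as the first-layer weights of the user encoder.
    hidden = [math.tanh(sum(r * z[k] for r, z in zip(r_m, z_items)))
              for k in range(dim)]
    return mlp(upper_layers, hidden)


def score_pair(r_m, z_items, n, upper_layers, loose_layers, relation_layers):
    """Return (P1, P2) for user row r_m and item index n (Equations 5 and 6)."""
    z_user1 = tight_user_repr(r_m, z_items, upper_layers)
    z_user2 = mlp(loose_layers, r_m)  # independent weights, no sharing
    logit1 = dot(z_user1, z_items[n])
    logit2 = mlp(relation_layers, [dot(z_user2, z_items[n])])[0]
    return sigmoid(logit1), sigmoid(logit2)


rng = random.Random(0)
N, D, d = 4, 3, 2  # toy sizes: items, feature dimension, representation size
X = [[rng.random() for _ in range(D)] for _ in range(N)]
item_layers = [random_layer(rng, D, d)]
z_items = [mlp(item_layers, x) for x in X]  # Equation 1 for every item

upper_layers = [random_layer(rng, d, d)]
loose_layers = [random_layer(rng, N, 5), random_layer(rng, 5, d)]
relation_layers = [random_layer(rng, 1, 3), random_layer(rng, 3, 1)]

r_m = [1.0, 0.0, 1.0, 0.0]  # user m interacted with items 0 and 2
p1, p2 = score_pair(r_m, z_items, 3, upper_layers, loose_layers, relation_layers)
print(f"P1 (tight) = {p1:.3f}, P2 (loose) = {p2:.3f}")
```

Item 3 is scored although the user has never interacted with it: its representation comes from the side information alone, which is what makes both scores available for cold-start items. Because every layer in this sketch ends in tanh, the loose logit is bounded; a real implementation may well use a linear output for $g^{r}$.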

C. MULTIPLE CHOICE LEARNING WITH GUMBEL-SOFTMAX
We combine the tightly coupled model and the loosely coupled model through the multiple-choice learning [22] scheme, where the item encoder is fixed and the user encoder is selected stochastically. Multiple-choice learning, unlike traditional learning methods, outputs multiple hypotheses and chooses the most plausible solution for each data instance (or user-item interaction in our context). Specifically, we select a user encoder using Gumbel-Softmax [30], which can be viewed as a channel selector. Here, the Gumbel-Softmax trick [30] makes the decision process differentiable. We let the two logits from the two encoders compete with each other through the equation below. As a result, we proceed with the training by stochastically selecting the encoder. Gumbel-Softmax aids in encoder selection, contributing to the performance improvement of our model. It yields better results compared to using a regular Softmax, as demonstrated in Table 1.
First, for every interaction between user $m$ and item $n$, we obtain the representations of the tightly-coupled and loosely-coupled encoders through Equations 1, 2, 3, and 4. Second, based on [22], we select the encoder through Equation 7:
$$c_{mn} = \text{Gumbel\_Softmax}\left(z^{\text{user1}}_m \cdot z^{\text{item}}_n,\; g^{r}(z^{\text{user2}}_m \cdot z^{\text{item}}_n)\right), \tag{7}$$
where $c_{mn}$ is a one-hot indicator vector for the interaction between user $m$ and item $n$ and describes the sampled model.¹

Algorithm 1 Training process
Input: $R$, $X$
Output: $\hat{y}$
Initialize: user encoders $g^{\text{user1}}(\cdot)$, $g^{\text{user2}}(\cdot)$, and item encoder $g^{\text{item}}(\cdot)$
$z^{\text{user1}}$: representation of user in encoder 1 (tight)
$z^{\text{user2}}$: representation of user in encoder 2 (loose)
$z^{\text{item}}$: representation of item
for epoch do
    for Iters I = (number of trainset) / batch size do
        1. Representation
            $z^{\text{user1}} = g^{\text{user1}}(R)$
            $z^{\text{user2}} = g^{\text{user2}}(R)$
            $z^{\text{item}} = g^{\text{item}}(X)$
        2. Probability
            $P_1 = \sigma(z^{\text{user1}} \cdot z^{\text{item}})$
            $P_2 = \sigma(g^{r}(z^{\text{user2}} \cdot z^{\text{item}}))$
        3. User encoder selection
            $c = \text{Gumbel-Softmax}(z^{\text{user1}} \cdot z^{\text{item}},\; g^{r}(z^{\text{user2}} \cdot z^{\text{item}}))$
            $P(y = 1) = c\,[P_1, P_2]^{\top}$
        4. Backpropagation
            update $g^{\text{user1}}(\cdot)$, $g^{\text{user2}}(\cdot)$, $g^{\text{item}}(\cdot)$
    end for
end for
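The selection step in Equation 7 can be sketched as follows. This is a forward-pass illustration in plain Python, not the paper's PyTorch code: it computes no gradients, and taking the arg-max of the relaxed sample to obtain the one-hot indicator is an assumption about how the hard selection is realized. The function names `sample_gumbel`, `gumbel_softmax`, and `select_output` are hypothetical; the default temperature 0.7 is the value reported in the paper.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via the inverse CDF."""
    u = max(rng.random(), 1e-12)  # guard against log(0)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, tau, rng, hard=True):
    """Perturb the logits with Gumbel noise, then apply a temperature softmax."""
    perturbed = [(logit + sample_gumbel(rng)) / tau for logit in logits]
    top = max(perturbed)  # subtract the maximum for numerical stability
    exps = [math.exp(p - top) for p in perturbed]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    winner = soft.index(max(soft))
    return [1.0 if i == winner else 0.0 for i in range(len(soft))]


def select_output(logit1, logit2, tau=0.7, rng=None):
    """Sample the indicator c and return (c, P(y = 1)) with P = c [P1, P2]^T."""
    rng = rng or random.Random()
    c = gumbel_softmax([logit1, logit2], tau, rng, hard=True)
    p1, p2 = sigmoid(logit1), sigmoid(logit2)
    return c, c[0] * p1 + c[1] * p2


rng = random.Random(0)
c, p = select_output(1.2, -0.4, rng=rng)
print(c, round(p, 3))
```

With hard selection, the temperature does not change which encoder wins on the forward pass, since dividing the perturbed logits by a positive constant preserves their order. It only shapes the relaxed (soft) sample, which is what a differentiable implementation would backpropagate through.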
Gumbel-Softmax has proven to be effective in many domains when sharp mapping is preferred [31]–[34]. As such, we use Gumbel-Softmax as the gating mechanism between the two user encoders: tight vs. loose. Thus, the final output of our model is as below:
$$\hat{y} = \begin{cases} \hat{y}_1, & \text{if } c = [1, 0] \\ \hat{y}_2, & \text{if } c = [0, 1] \end{cases} \tag{8}$$
Through this proposed framework, we can capture users’ diverse behavioral aspects. In the forward process (or generative process), Gumbel-Softmax stochastically selects one of the user encoders for generating the user-item interaction. The generated signal can be triggered by one of multiple assumptions, of which we have two in this study.
¹ In our experiments, we set the Gumbel-Softmax temperature τ to 0.7.

D. TRAINING PROCESS
The overall training process of our proposed model is provided in Algorithm 1. Each user obtains two user representations, one from the tightly coupled model and one from the loosely coupled model. Each item also obtains its representation through the item tower. The probabilities of a user-item interaction are predicted independently under the two scenarios and are reserved as $P_1$ and $P_2$. The final probability of a given user-item interaction is selected from the two reserved predictions with respect to the sampled indicator $c$. The final output is compared with the ground truth, and each user encoder and the item encoder are updated through backpropagation. The Gumbel-Softmax trick allows the model to be trained in an end-to-end fashion.

IV. EXPERIMENTS
In this section, we explain the experimental setup and baseline models used in the experiments conducted to evaluate the effectiveness of our proposed ‘Flexible Two-tower’ model in addressing the item cold-start problem.

A. EXPERIMENTAL SETUP
1) Dataset
For the evaluation, we use two datasets:
1) CiteULike [35]. This dataset has been widely used in previous studies [19], [20], [36]–[38] on the cold-start problem. The dataset was originally introduced in collaborative topic modeling [35], where probabilistic topic modeling was introduced for recommending scientific articles to users. The dataset contains 204,987 user-article (or item in the RS context) interactions from 5,551 users across 16,980 articles (items), where the interaction matrix has a sparsity of 99.8%. In the interaction matrix $R$, $R_{mn} = 1$ means that user $m$ has saved article $n$ in their internet library, and $R_{mn} = 0$ otherwise. Along with the interaction matrix, the title and abstract have also been used for the task. The authors in [35] concatenated the title and abstract of each article, and applied tf-idf to choose a vocabulary of the top 8,000 words. Afterwards, the dimension of item features was fixed at 300 and has been widely used in many studies. Out of 16,980 items, 3,396 items have been reserved for cold-item recommendation in [20],

and we use the same train-test split in our study as in [18], [20], [35]. The items reserved for
testing are completely cold items, for which no previous interaction records are available.
These cold items can also be regarded as new items to the RS. In this study, we additionally
conduct an experiment on warm items, for which we have 2,264 items for testing. The number of
warm-start items was chosen to leave a reasonable number of items for both training and testing.

2) MLIMDb. This dataset consists of MovieLens and IMDb data, which are commonly used for
researching and developing recommendation systems, including the cold-start problem [39]–[44].
The MovieLens 100K dataset contains rating information for 1,682 movies evaluated by 943 users.
The IMDb dataset contains information on over 300,000 movies. In previous studies [16], [45],
[46], the two datasets, MovieLens and IMDb, are combined to obtain side information. Similarly,
we combine the two datasets to obtain an interaction matrix and item (movie) side information.
We name our dataset MLIMDb. From the MovieLens dataset, user IDs and movie IDs were used for the
interaction matrix, while from the IMDb dataset, information on movie directors, writers,
actors, and genres was used for the item feature matrix. We used the director, writer, actor,
and genre information of each movie as item features, treating the unique id of each entity in
each category as a single word in a sentence. For example, in the MLIMDb dataset, the movie
‘Toy Story’ has John Lasseter as the director; Pete Docter, Andrew Stanton, Joe Ranft, Joss
Whedon, Joel Cohen, and Alec Sokolow as writers; Tom Hanks, Tim Allen, Don Rickles, and Jim
Varney as actors; and Adventure, Animation, and Comedy as genres. Their unique ids are
nm0005124, nm0230032, nm0004056, nm0710020, nm0923736, nm0169505, nm0812513, nm0000158,
nm0000741, nm0725543, nm0001815, Adventure, Animation, and Comedy, respectively, so the item
feature for Toy Story would be ‘nm0005124 nm0230032 nm0004056 nm0710020 nm0923736 nm0169505
nm0812513 nm0000158 nm0000741 nm0725543 nm0001815 Adventure Animation Comedy’. Then, we applied
tf-idf to the bag-of-words from the item feature. The dimension is 1303, using only words that
appear more than twice. The combined dataset, based on the movies common to both sources,
includes 943 users and 1,146 items, and the user-item interaction matrix is 91.0% sparse. In the
interaction matrix R, Rmn = 1 means that user m has interacted with movie n, and Rmn = 0
otherwise. Out of 1,146 items, 229 items were reserved for cold item recommendation, and 187
items were reserved for warm item recommendation. The number of cold and warm start items was
chosen to leave a reasonable number of items for both training and testing.

Throughout our set of experiments, we fix the testing data used to evaluate performance on cold
and warm start items across all the models for fair comparison in each dataset.

2) Implementation Details
We implement our proposed model using PyTorch 1.6.0. We set the mini-batch size to 32 and the
maximum number of epochs to 50. The learning rate is 0.001 for the CiteULike dataset and 0.0001
for the MLIMDb dataset. The Gumbel-Softmax temperature was fixed to 0.7 throughout training. For
the CiteULike dataset, the item encoder and the tightly coupled user encoder use MLPs with three
layers of dimensions [250, 200, 100], and the loosely coupled user encoder uses [1000, 500,
100]. For the MLIMDb dataset, the MLP layer dimensions for all encoders are set to [1000, 500,
100]. We tried dimensions suited to the size of the feature dimension. As previously mentioned,
we use the same train-test split as in [18]. In the training set, the ratio of positive (1) to
negative (0) interactions between users and items is 1:10. When the training data is
unbalanced, the model can be biased toward the larger class, which adversely affects training
[47]. To address this problem, we use a re-weighting scheme [48], which uses the effective
number of samples for each class to re-balance the loss. In our experiments, we integrate it
into the Binary Cross Entropy Loss (BCELoss). Depending on whether the ith sample of the batch
belongs to the positive or negative class, the sample weight is calculated as follows:

$$
W_i =
\begin{cases}
\dfrac{1}{\dfrac{1-\beta^{n^{+}}}{1-\beta}} & \text{if sample } i \text{ belongs to a positive interaction} \\[2ex]
\dfrac{1}{\dfrac{1-\beta^{n^{-}}}{1-\beta}} & \text{if sample } i \text{ belongs to a negative interaction}
\end{cases}
\tag{9}
$$

where n+ and n− are the numbers of positive and negative interactions in a batch, respectively,
and β is a hyperparameter, which we fixed to 0.99. All experiments with our model are conducted
using an NVIDIA RTX A5000 GPU.

3) Evaluation
For our evaluation, we use the CiteULike and MLIMDb datasets. CiteULike has been used widely in
the literature, while MLIMDb is newly introduced in our study. The CiteULike dataset consists of
16,980 items, of which 3,396 are reserved for cold-start evaluation. This 4:1 split is the
standard split, in which the training and test sets have been fixed across all studies for fair
comparison. To tackle the problem in a more challenging scenario, we additionally evaluate in a
3:1 setting by adding 849 more items to the previous testing data. We also evaluate our model in
the warm-start setting, where we test on the 2,264 fixed warm items to compare with [18]. We
constructed the MLIMDb dataset in a similar way to the CiteULike dataset. As with the CiteULike
dataset, we perform item cold-start evaluation on two sets of test data: a 4:1 split and a 3:1
split. For the 4:1 split, we have 229 movies for testing, and for the 3:1 split, we have 286
movies for testing. To perform evaluation on warm-start items, 187 items were used as test data.
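The re-weighting of Equation (9) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the weighted loss is averaged over the batch, and the weights are left unnormalized because the text does not specify a normalization.

```python
import math


def effective_number(n: int, beta: float = 0.99) -> float:
    """Effective number of samples for a class of size n: (1 - beta^n) / (1 - beta)."""
    return (1.0 - beta ** n) / (1.0 - beta)


def sample_weights(labels, beta: float = 0.99):
    """Weight each sample by the inverse effective number of its class in the batch (Eq. 9)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # A class that is absent from the batch never receives a weight; the guards avoid 1/0.
    w_pos = 1.0 / effective_number(n_pos, beta) if n_pos > 0 else 0.0
    w_neg = 1.0 / effective_number(n_neg, beta) if n_neg > 0 else 0.0
    return [w_pos if y == 1 else w_neg for y in labels]


def weighted_bce(probs, labels, weights, eps: float = 1e-12) -> float:
    """Binary cross entropy with per-sample weights, averaged over the batch (assumption)."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Example: one mini-batch of 32 samples with 3 positives and 29 negatives.
labels = [1] * 3 + [0] * 29
weights = sample_weights(labels)
probs = [0.5] * 32
loss = weighted_bce(probs, labels, weights)
```

With β = 0.99, a class of 3 samples has an effective number of about 2.97 and a class of 29 samples about 25.3, so each positive in this batch is weighted roughly 8.5 times as heavily as each negative.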

We use Recall as our evaluation metric. It is one of the most commonly used metrics in
recommendation system research for assessing how well a model predicts the items that users
actually interact with. We demonstrate the effectiveness of our proposed model by evaluating the
top 20, 50, 100, and 150 recommended items.
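As an illustration of the metric, the sketch below computes Recall@K for one user and averages it over users. The helper names and the toy scores are hypothetical, and averaging per user (skipping users with no held-out items) is an assumption; the paper only states that the average Recall@K is reported.

```python
def recall_at_k(scores, relevant, k):
    """Fraction of a user's held-out items that appear among the k highest-scored candidates."""
    if not relevant:
        return None  # users without held-out interactions are skipped (assumption)
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def average_recall_at_k(scores_by_user, relevant_by_user, k):
    """Mean of the per-user Recall@K values."""
    values = []
    for user, scores in scores_by_user.items():
        value = recall_at_k(scores, relevant_by_user.get(user, set()), k)
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else 0.0


# Toy example: predicted scores over four candidate test items for two users.
scores_by_user = {
    "u1": {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05},
    "u2": {"a": 0.2, "b": 0.7, "c": 0.6, "d": 0.1},
}
relevant_by_user = {"u1": {"a", "d"}, "u2": {"c"}}
recall_2 = average_recall_at_k(scores_by_user, relevant_by_user, 2)
recall_4 = average_recall_at_k(scores_by_user, relevant_by_user, 4)
```

In the cold-start setting, the candidate set for each user would be the reserved test items, and K would take the values 20, 50, 100, and 150.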

B. BASELINES
We compare the proposed model with the following strong baseline models for cold-start RS.
• shared model (-attention) [18] is a two-tower network for RS, where the item embedding from
the item encoder is shared with the user encoder for cold-start item predictions.
• shared model (+attention) [18] further improves the shared model by applying an attention
mechanism to the objective function.
• SimPDO [49] uses its objective function to train a one-class recommendation system model,
which effectively solves the item cold start problem.
• NeuMF [50] combines Generalized Matrix Factorization (GMF) and a Multi Layer Perceptron (MLP).
By combining the linearity of MF and the non-linearity of DNN, the correlation between user and
item can be learned.
• DropoutNet [20] tackles the cold-start problem by training the model to reconstruct the input
from a corrupted version.
• ACCM [19] tries to take advantage of both content-based and collaborative filtering in an
attention-based unified approach.
• DeepMusic [36] uses a latent factor model for recommendation, and predicts the latent factors
from music audio when they cannot be obtained from usage data.
• CTR [37] is a collaborative filtering method based on probabilistic topic modeling, where the
probabilistic topic modeling allows interpretability of the item content information.
• CDL [38] is a hierarchical Bayesian deep learning model, which jointly performs deep
representation learning on item content information and collaborative filtering for the
feedback.

V. RESULTS
We evaluate our proposed model in several ways. First, we compare the performance of our
proposed model to the previous State-Of-The-Art (SOTA) model. Second, we justify our use of
Gumbel-Softmax through empirical results. Third, we use the CiteULike dataset and present the
full performance comparison with the strong baselines. Fourth, we show how our model performs in
the warm-start setting. Finally, we show that our multiple-choice learning scheme with
Gumbel-Softmax fully uses the two modules rather than focusing on only one of the two.

FIGURE 3: Comparison of Convergence Graphs for training up to epoch 50 on Recall@100 for the
CiteULike dataset.

A. COMPARISON AGAINST SOTA MODELS
We use offline test Recall@K as our evaluation metric and report the average Recall@K to
evaluate the proposed model, where K is set to 20, 50, 100, and 150. Table 4 provides the
performance comparison between our proposed model and other base models, including the previous
SOTA model (shared model). We also report the results of the Non-shared model, which is the
separate loosely coupled model. Two sets of experiments were conducted, one under the standard
setting and the other under the challenging setting. For the standard setting, we have a 4:1
(train:test) split. For the challenging setting, we have a 3:1 (train:test) split. These two
sets are reported in Table 4a and Table 4b. We also found that the previous SOTA model can be
tuned further to achieve higher performance than reported in the original paper [18], and we
report the higher performance we obtained. In Table 4a, we report the results under the standard
setting. Specifically, the CiteULike dataset contains 3,396 cold-start items out of 16,980 total
items, and the MLIMDb dataset contains 229 cold-start items out of 1,146 total items. The
experimental results show that our proposed model outperforms all other models, including the
state-of-the-art shared (+attention) model, in all evaluation metrics. In Table 4b, we report
the results under the challenging setting. Specifically, the CiteULike dataset contains 4,245
cold-start items out of 16,980 total items, and the MLIMDb dataset contains 286 cold-start items
out of 1,146 total items. Although the overall performance decreases in general as we have fewer
samples in the training set, our proposed model still outperforms the other models in nearly all
cases (the exception in Table 4b is R@50 on CiteULike) with less performance degradation. On the
MLIMDb dataset, the previous SOTA model drops significantly, while our model holds up well even
in the challenging setting. This demonstrates the robustness of our model in various cold-start
settings.
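The cold-start item counts quoted for the two settings follow directly from the split ratios. The short check below is only a worked piece of arithmetic; the helper name is hypothetical, and rounding down to a whole item is an assumption that happens to reproduce the reported counts.

```python
def cold_start_items(n_items: int, train_parts: int, test_parts: int = 1) -> int:
    """Number of held-out items for a train:test item split, rounded down (assumption)."""
    return n_items * test_parts // (train_parts + test_parts)


datasets = {"CiteULike": 16980, "MLIMDb": 1146}
splits = {
    name: {"4:1": cold_start_items(n, 4), "3:1": cold_start_items(n, 3)}
    for name, n in datasets.items()
}
# Items moved from training to testing when going from the standard to the challenging setting.
extra_citeulike = splits["CiteULike"]["3:1"] - splits["CiteULike"]["4:1"]
```

For CiteULike this gives 3,396 and 4,245 cold-start items, a difference of 849, and for MLIMDb it gives 229 and 286, matching the figures stated in the text.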


TABLE 2: Comparison of Convergence Graphs for training up to epoch 50 on Recall@100.

                 CiteULike                                         MLIMDb
Learning Rate    Flexible Model   Shared Model (+attention) [18]   Flexible Model   Shared Model (+attention) [18]
0.01             65.7             64.8                             52.1             44.1
0.001            67.5             67.1                             52.7             44.8
0.0005           67.3             66.6                             53.1             45.3
0.0001           65.0             64.8                             53.8             45.9
0.00001          60.5             57.0                             49.8             42.5

TABLE 3: Performance Comparison Based on Item Feature Dimension Variations in the MLIMDb
Dataset.

Item feature dimension   Flexible Model   Shared Model (+attention) [18]
5696                     48.8             46.3
1303                     55.2             52.7
546                      54.8             50.4
263                      54.2             49.1

a: The strengths of our model through additional experiment results analysis
We provide results that show how our model differs from the SOTA model during the learning
process. In Figure 3, we plot Recall@100 during the training of the Shared model (+attention)
[18] and our model on the CiteULike dataset. The plot confirms that our model reaches peak
performance faster than the Shared model. Table 2 demonstrates the robustness of our model. We
conducted experiments on both the CiteULike dataset and the MLIMDb dataset, setting the learning
rate to 0.01, 0.001, 0.0005, 0.0001, and 0.00001. The experimental results show that our model
outperforms the Shared model (+attention) at all learning rates. Consequently, we fixed the
learning rate at 0.001 for the CiteULike dataset and 0.0001 for the MLIMDb dataset, which
yielded the best results. Table 3 compares the performance across item feature dimensions. We
applied tf-idf to the bag-of-words from the item feature and compared dimensions of 5696, 1303,
546, and 263. These dimensions correspond to using only words that appear more than once, twice,
thrice, and four times, respectively. When conducting the experiments, we changed the dimensions
of the three-layer MLPs of the tightly coupled user encoder and the item encoder to [500, 300,
100] for the item feature dimension of 546 and to [200, 150, 100] for the item feature dimension
of 263. The item feature dimensions of 5696 and 1303 keep three-layer MLPs with [1000, 500,
100].

b: Case Study: Movie Recommendation
Our model outperforms the previous SOTA model on the MLIMDb dataset. Here, we conduct a case
study for further analysis. In Figure 4, we compare the performance based on feature categories
of movies. Specifically, we select a single category from the IMDb dataset and use it as a
smaller feature set for movies. This way, we can study how each category contributes to the
cold-start predictions. ‘All’ refers to using all categories as feature values, as in our
results in Tables 4a and 4b. ‘Directors’, ‘Writers’, ‘Actors’, and ‘Genres’ refer to using each
category as a standalone feature value. The dimension of each individual category is as follows:
director is 197, writer is 340, actor is 727, and genre is 23. When conducting the experiments,
we changed the dimensions of the three-layer MLPs of the tightly coupled user encoder and the
item encoder to [300, 200, 100] for the writer and actor categories, [100, 80, 60] for the
director category, and [20, 20, 20] for the genre category. In the Shared (+attention) model,
which uses the tightly coupled method where feature similarity is important, using directors as
a single feature resulted in the best performance. This indicates that directors have a
significant influence on movie recommendation. The next best performance in the shared model was
achieved by using writer and then actor as standalone features, indicating that they also have a
significant influence on movie recommendations. Using each of these three categories as a
standalone feature yielded higher performance than using all four categories together, with
improvements of 5.8, 3.5, and 1.3, respectively. However, when using genre as a single feature,
the performance is slightly lower than when using all categories as features. From these
observations, we can infer that the lower performance when using all categories, compared to
using director, writer, and actor individually, is due to the low similarity of genres. This can
have an impact on recommending cold start items. Movie genres tend to be more divisive than
other categories, meaning that users may interact more with movies of their preferred genres. If
movies are recommended solely based on genre similarity, then movies from the genres with more
interactions will be recommended more often, potentially leading to a problem where genres with
fewer interactions are ignored and diverse recommendations are not made. Table 5 also supports
this analysis. Table 5 shows the movies recommended to a selected user in the MLIMDb dataset by
the tightly coupled method (the shared (+attention) model) and by our Flexible Two-Tower Model.
Judging from the user’s previously watched movies, the user has interacted at least once or
twice with the directors, writers, and actors who appear in ‘Cosi’, ‘Reality Bites’, and ‘He
Walked by Night’. The user has interacted with 43 Dramas, 38 Comedies, 19 Crime, 11 Action, and
11 Adventure films. In addition, the user has interacted with one Film-Noir and two Thriller
movies. However, a similarity-based recommendation model only recommends movies that belong to
the genre with which the user has interacted the most, which is ‘Cosi’ in this case. On the
other hand, our model utilizes a loosely-coupled method that reflects the user-item interaction
relationship, which allows it to recommend

TABLE 4: Performance comparison using R@20, R@50, R@100, and R@150 on cold-start items in each
dataset. The best performance is in bold font. The previously reported SOTA results (Shared
models) have been further improved in our experimentation for stringent comparison.

(a) Standard setting: 4:1 split

                                  CiteULike                        MLIMDb
Method                            R@20   R@50   R@100  R@150       R@20   R@50   R@100  R@150
Proposed model                    32.9   53.7   67.5   75.0        19.8   37.8   55.2   76.2
Shared model [18] (+attention)    31.5   50.0   67.1   69.3        19.5   33.5   52.7   71.0
Shared model [18] (-attention)    30.2   51.4   65.7   72.3        18.9   31.9   46.3   63.2
Non-shared model                  28.6   47.2   61.7   69.0        18.3   30.5   45.8   62.3
SimPDO [49]                       27.9   44.6   59.2   67.2        17.5   31.5   50.6   67.7

(b) Challenging setting: 3:1 split

                                  CiteULike                        MLIMDb
Method                            R@20   R@50   R@100  R@150       R@20   R@50   R@100  R@150
Proposed model                    25.9   46.6   61.3   69.3        13.8   34.7   53.8   72.5
Shared model [18] (+attention)    23.9   48.3   60.6   67.3        12.8   30.2   45.9   67.4
Shared model [18] (-attention)    23.2   41.7   54.6   61.6        12.5   29.5   45.1   64.5
Non-shared model                  20.8   36.2   48.6   55.6        11.4   27.9   42.5   63.9
SimPDO [49]                       22.5   41.0   53.2   61.2        11.9   28.7   45.7   66.0

movies from genres that may not have high similarity but are still relevant to the user’s past
interactions, such as ‘Reality Bites’ and ‘He Walked by Night’. Furthermore, in Figure 4, our
model shows significantly better performance than the shared model. When using all four movie
features, our model performs 0.6% better than when using only the directors as the feature, 1.1%
better than when using only the writers, 3.0% better than when using only the actors, and 6.5%
better than when using only the genres. This demonstrates the flexibility and effectiveness of
our model, which overcomes the limitations of similarity-based recommendations by incorporating
user-item interactions and considers various movie features to provide more personalized
recommendations.

B. GUMBEL-SOFTMAX AS GATING MECHANISM
Our model makes its choice by stochastically selecting between the tightly coupled and loosely
coupled user encoders. We compare two gating mechanisms, Softmax and Gumbel-Softmax. In Table 1,
we report the R@100 results for the two datasets, each under two settings. The experiments were
conducted on subsets of CiteULike and MLIMDb defined by the cold-start ratio: standard and
challenging. The experimental results reveal the effectiveness of hard gating with the
Gumbel-Softmax, which always achieves higher performance than Softmax. We believe the regular
Softmax produces smoothed representations and thus underperforms compared to the Gumbel-Softmax.
Although there was a slight performance difference depending on the selection method, both
methods outperformed the state-of-the-art models in Table 4. This confirms the strength of our
flexible model and demonstrates its practical value for cold-start recommendation systems.

FIGURE 4: Performance comparison between models using item features on given category. Director
turns out to be the most effective feature, while genres contribute the least.

TABLE 5: Comparison example of cold-start movies recommendation between the shared (+attention)
model and the proposed model.

Columns: main features of interaction movies (10↑) | Interaction cold start item list |
Recommendation of tightly-coupled method | Recommendation of Flexible Two-Tower Model

1. ‘Drama’ • ‘Reality bites’ • ‘Reality bites’ • ‘Reality bites’
2. ‘Comedy’ • ‘Cosi’ • ‘Cosi’
3. ‘Romance’ • ‘He Walked by Night’ • ‘He Walked by Night’
4. ‘Crime’
5. ‘Action’
6. ‘Adventure’

Cold start movies | directors | writers | actors | genres
‘Reality bites’ | Ben Stiller | Helen Childress | Winona Ryder, Ethan Hawke, Janeane Garofalo, Steve Zahn | Comedy, Drama, Romance
‘Cosi’ | Mark Joffe | Louis Nowra | Ben Mendelsohn, Barry Otto, Toni Collette, Rachel Griffiths | Comedy, Drama, Music
‘He Walked by Night’ | Alfred L. Werker, Anthony Mann | John C. Higgins, Crane Wilbur, Harry Essex | Richard Basehart, Scott Brady, Roy Roberts, Whit Bissell | Crime, Film-Noir, Thriller

C. COMPARISON WITH STRONG BASELINE MODELS
As we want to validate how our proposed model performs compared to the state-of-the-art model
[18], we use the same evaluation metric as in [18], which is R@100. In Table 6, the methods
marked with * are those whose results are taken directly from [18]2. Both shared models, with
and without the attention mechanism, focus only on the side information of the items and have
been proven to be effective for new item recommendations. By comparing the Shared model with and
without the attention mechanism, we also observe that the attention mechanism is effective. The
Shared [18] and SimPDO [49] models can also be viewed as our model with the loosely coupled
model never activated. The Non-shared model has also been included as a baseline, and it can be
viewed as our model with the tightly coupled model never activated. In Table 6, our results
show an improvement of 0.4 percentage points over the best performing Shared model (+attention),
and even greater improvements over the other strong base models. This is meaningful because the
result we achieve is not a smoothed result of the two, but is based on selecting the better
representation for each interaction through our proposed scheme. It is also worth noting that
the train-test split has been fixed across different models for fair comparison, and thus the
0.4-point improvement is not negligible. Overall, our proposed model achieves strong performance
in cold-start item recommendation, as evidenced by its superior performance compared to the
other base models.

2 The results of the Shared model are reproduced by our experimentation with fine-tuning on the
original code, and they exhibit better performance than the results reported in [18].

TABLE 6: Performance comparison of cold-start recommendation in R@100 using the CiteULike
dataset. Results with * denote that the results are directly taken from [18]. The previously
reported SOTA results (Shared models) have been further improved in our experimentation for
stringent comparison.

Method                                  Test Recall (%)
our model                               67.5
Shared model [18] (+attention)          67.1
Shared model [18] (-attention)          65.7
DN-WMF* (DropoutNet, retrained) [20]    65.2
DN-WMF* (DropoutNet) [20]               63.6
ACCM* [19]                              63.1
DN-CDL* (DropoutNet) [20]               62.9
Non-shared model                        61.7
DeepMusic* [36]                         60.1
SimPDO [49]                             59.2
CTR* [37]                               58.9
CDL* [38]                               57.3

D. PERFORMANCE IN WARM-START SETTINGS
To make the RS practical, the model should have competitive performance not only in the
cold-start setting, but also in warm-start settings. Here, we compare the performance of our
model to the shared model with attention mechanism and the Non-shared model, where we only
compare the performance in R@100. As shown in Table 7, our model achieves improvements of 1.3
and 3.9 percentage points over the shared model with attention, as well as improvements of 13.0
and 9.6 points over the Non-shared model. Compared to SimPDO, we achieve improvements of 7.8 and
4.0 points for the CiteULike and MLIMDb datasets, respectively. These results show that our
flexible model, which stochastically selects between the tightly coupled and loosely coupled
modes, achieves better performance than the models using each mode separately in every case:
cold and warm.
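The stochastic selection between the two modes (Section V-B) can be illustrated with a small pure-Python sketch of hard Gumbel-Softmax gating. It shows only the forward sampling step: the indicator c, the predictions P1 and P2, and the temperature 0.7 follow the text, while the function names, the gate logits, and the example probabilities are hypothetical. The gradient path that makes end-to-end training possible is not reproduced here; in PyTorch, `torch.nn.functional.gumbel_softmax` with `hard=True` offers such a relaxation, though the text does not state whether the authors rely on it.

```python
import math
import random


def gumbel_softmax(logits, tau=0.7, rng=random):
    """Return (soft, hard): relaxed probabilities and the one-hot indicator of their arg-max."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both log() calls finite
        gumbel = -math.log(-math.log(u))                 # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]           # numerically stable softmax
    total = sum(exps)
    soft = [e / total for e in exps]
    winner = soft.index(max(soft))
    hard = [1 if i == winner else 0 for i in range(len(soft))]
    return soft, hard


def select_prediction(p1, p2, gate_logits, tau=0.7, rng=random):
    """Pick the tightly coupled (p1) or loosely coupled (p2) prediction with the indicator c."""
    _, c = gumbel_softmax(gate_logits, tau, rng)
    return c[0] * p1 + c[1] * p2, c


def selection_ratio(gate_logits, n_samples=20000, tau=0.7, seed=0):
    """Share of samples in which each mode is selected, obtained by counting indicators."""
    rng = random.Random(seed)
    counts = [0, 0]
    for _ in range(n_samples):
        _, c = gumbel_softmax(gate_logits, tau, rng)
        counts[0] += c[0]
        counts[1] += c[1]
    return [count / n_samples for count in counts]


p_final, c = select_prediction(0.82, 0.35, [0.0, 0.0], rng=random.Random(1))
ratio_equal = selection_ratio([0.0, 0.0])
ratio_skewed = selection_ratio([math.log(3.0), 0.0])
```

Because the arg-max of Gumbel-perturbed logits is distributed according to the softmax of the logits, equal gate logits select each mode about half of the time, and logits of log 3 and 0 select the first mode about three quarters of the time. Counting the sampled indicators in this way is the same kind of bookkeeping that a selection ratio such as the one in Table 8 requires.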

TABLE 7: Comparison of average recall values of R@100 with warm-start items

Method                            Test Recall (CiteULike) (%)   Test Recall (MLIMDb) (%)
our model                         74.8                          61.9
Shared model [18] (+attention)    73.5                          58.0
Non-shared model                  61.8                          52.3
SimPDO [49]                       67.0                          57.9

TABLE 8: The ratio of selection between tight and loose for each dataset. Results are from
standard split.

             tightly-coupled (%)   loosely-coupled (%)
CiteULike    54.1                  45.8
MLIMDb       50.1                  49.9

E. SELECTION RATIO
Our model showed better performance than using a single tightly coupled or loosely coupled
approach alone. We therefore provide insights into the effects of flexibility. To verify the
effect of flexibility, we checked whether our model uses both the tightly coupled and loosely
coupled approaches by simply counting the selected indicator vectors in Equation 8. Table 8
shows the tower selection ratio averaged over epochs. In the CiteULike dataset, the average
selection probabilities of the tightly coupled and loosely coupled methods are 54.1% and 45.8%,
respectively. In the MLIMDb dataset, the average selection probabilities of the tightly coupled
and loosely coupled methods are 50.1% and 49.9%, respectively. Based on the results, it can be
observed that both the tightly and loosely coupled encoders are fully used in both datasets. In
the MLIMDb dataset, we can observe an increased impact of the loosely coupled method compared to
the CiteULike dataset. This shift in influence, shown in Table 8, explains one of the reasons
why the gain of our flexible method over using either the tightly coupled or loosely coupled
method alone is significantly larger in the MLIMDb dataset than in the CiteULike dataset.
Furthermore, we can verify the effectiveness of the flexible method through the actual examples
shown in Table 5 and through Figure 4, which shows the experimental results by categories of
movie features. This result allowed us to compare the performance by category, demonstrating how
our Flexible Two-Tower model can overcome the limitations of recommendation systems that only
focus on similarity.

VI. DISCUSSION AND FUTURE WORK
Recommendation is important not only for accurately suggesting items that users want, but also
for providing diverse recommendations. For cold-start item predictions, recommendations rely
heavily on feature similarity, as there is little to no interaction information available
between the user and the item. However, relying solely on feature similarity for recommending
cold start items can hinder diverse recommendations, as it may overlook the possibility that the
user may prefer an item even if its similarity with their past interactions is low. Therefore,
we propose a flexible model that can address such issues by incorporating the relationship
between users and items. This is demonstrated by Table 8, which shows the flexibility of our
approach, and by various results that confirm its effectiveness. In particular, for the movie
dataset, we observed that our model’s flexibility, reflected in how its performance changes
across feature categories, supplements similarity-based models. Despite the varying impact of
different categories, we constructed the MLIMDb dataset with a similar structure to the original
CiteULike dataset by setting each feature at the same level for the sake of fairness in the
experiment. However, we expect to achieve better performance by further leveraging the
information on the categories, such as assigning different weights to different categories. We
leave this as future research.

VII. CONCLUSION
One of the main challenges in RS is the item cold-start problem, where the absence of previous
interactions or ratings for new items makes them difficult to predict. To solve this problem,
hybrid neural network models using side information of items as a feature have been favored in
the literature. These models recommend items mainly based on the similarity between the item
features. However, focusing too much on item feature similarities can miss other signals for
RS. We proposed a flexible model for better capturing the diverse aspects of users. Our proposed
framework stochastically selects between the tightly coupled user encoder, which focuses on
capturing the item feature, and the loosely coupled user encoder, which captures the patterns
beyond item features. The effectiveness of this flexibility has been demonstrated through
extensive experiments. The experimental results reveal that our proposed model achieves SOTA
results for item cold-start recommendation. Moreover, in the warm-start settings, our proposed
model achieves competitive performance, outperforming the previous models. These two results
reflect the practical application value of our proposed model.

ACKNOWLEDGMENT
This work was partly supported by the Institute of Information & Communications Technology
Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2021-0-01341,
Artificial Intelligence Graduate School Program of Chung-Ang Univ.), and the National Research
Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2021R1F1A1063389).

REFERENCES
[1] Hyeyoung Ko, Suyeon Lee, Yoonseo Park, and Anna Choi. A survey of recommendation systems:
Recommendation models, techniques, and application fields. Electronics, 11(1), 2022.


[2] Xiaolei Wang, Kun Zhou, Ji-Rong Wen, and Wayne Xin Zhao. Towards Knowledge Management, CIKM ’18, page 127–136, New York, NY, USA,
unified conversational recommender systems via knowledge-enhanced 2018. Association for Computing Machinery.
prompt learning. In Proceedings of the 28th ACM SIGKDD Conference [20] Maksims Volkovs, Guangwei Yu, and Tomi Poutanen. Dropoutnet: Ad-
on Knowledge Discovery and Data Mining, KDD ’22, page 1929–1937, dressing cold start in recommender systems. In I. Guyon, U. Von Luxburg,
New York, NY, USA, 2022. Association for Computing Machinery. S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,
[3] Philipp Christmann, Rishiraj Roy, and Gerhard Weikum. Explainable Advances in Neural Information Processing Systems, volume 30. Curran
conversational question answering over heterogeneous sources via iterative Associates, Inc., 2017.
graph neural networks, 05 2023. [21] Yu Liu, Shuai Wang, M. Shahrukh Khan, and Jieyu He. A novel deep hybrid
[4] Dongding Lin, Jian Wang, and Wenjie Li. Cola: Improving conversational recommender system based on auto-encoder with neural collaborative
recommender systems by collaborative augmentation. Proceedings of the filtering. Big Data Mining and Analytics, 1(3):211–221, 2018.
AAAI Conference on Artificial Intelligence, 37(4):4462–4470, Jun. 2023. [22] Abner Guzman-Rivera, Dhruv Batra, and Pushmeet Kohli. Multiple choice
[5] Kun Zhou, Hui Yu, Wayne Xin Zhao, and Ji-Rong Wen. Filter-enhanced learning: Learning to produce multiple structured outputs. In Proceedings
mlp is all you need for sequential recommendation. In Proceedings of the of the 25th International Conference on Neural Information Processing
ACM Web Conference 2022, WWW ’22, page 2388–2399, New York, NY, Systems - Volume 2, NIPS’12, page 1799–1807, Red Hook, NY, USA, 2012.
USA, 2022. Association for Computing Machinery. Curran Associates Inc.
[6] Huiyuan Chen, Yusan Lin, Menghai Pan, Lan Wang, Chin-Chia Michael [23] John Kingsley Arthur, Conghua Zhou, Eric Appiah Mantey, Jeremiah Osei-
Yeh, Xiaoting Li, Yan Zheng, Fei Wang, and Hao Yang. Denoising self- Kwakye, and Yaru Chen. A discriminative-based geometric deep learning
attentive sequential recommendation. In Proceedings of the 16th ACM model for cross domain recommender systems. Applied Sciences, 12(10),
Conference on Recommender Systems, RecSys ’22, page 92–101, New 2022.
York, NY, USA, 2022. Association for Computing Machinery. [24] John Kingsley Arthur, Conghua Zhou, Jeremiah Osei-Kwakye, Eric Ap-
[7] Xuewei Li, Aitong Sun, Mankun Zhao, Jian Yu, Kun Zhu, Di Jin, Mei Yu, piah Mantey, and Yaru Chen. A heterogeneous couplings and persuasive
and Ruiguo Yu. Multi-intention oriented contrastive learning for sequential user/item information model for next basket recommendation. Engineering
recommendation. In Proceedings of the Sixteenth ACM International Applications of Artificial Intelligence, 114:105132, 2022.
Conference on Web Search and Data Mining, WSDM ’23, page 411–419, [25] Deepak Panda and Sanjog Ray. Approaches and algorithms to mitigate
New York, NY, USA, 2023. Association for Computing Machinery. cold start problems in recommender systems: a systematic literature review.
[8] Yutao Ma, Xiao Geng, and Jian Wang. A deep neural network with Journal of Intelligent Information Systems, 59:1–26, 04 2022.
multiplex interactions for cold-start service recommendation. IEEE Trans- [26] Gaowei Xin, Jiwei Qin, and Jiong Zheng. A hybrid recommendation
actions on Engineering Management, 68(1):105–119, 2021. algorithm with co-embedded item attributes and ratings. In 2022 4th
[9] Chieh-Yuan Tsai, Yi-Fan Chiu, and Yu-Jen Chen. A two-stage neural International Conference on Applied Machine Learning (ICAML), pages
network-based cold start item recommender. Applied Sciences, 11(9), 1–7, 2022.
2021.
[27] Sheng Li, Jaya Kawale, and Yun Fu. Deep collaborative filtering via
[10] Shameem A Puthiya Parambath and Sanjay Chawla. Simple and effective
marginalized denoising auto-encoder. In Proceedings of the 24th ACM
neural-free soft-cluster embeddings for item cold-start recommendations.
International on Conference on Information and Knowledge Management,
Data Mining and Knowledge Discovery, 34, 09 2020.
CIKM ’15, page 811–820, New York, NY, USA, 2015. Association for
[11] Ignacio Fernández-Tobías, Iván Cantador, Paolo Tomeo, Vito Walter Computing Machinery.
Anelli, and Tommaso Noia. Addressing the user cold start with cross-
[28] Xin Dong, Lei Yu, Zhonghuo Wu, Yuxia Sun, Lingfeng Yuan, and Fangxi
domain collaborative filtering: Exploiting item metadata in matrix factor-
Zhang. A hybrid collaborative filtering model with deep structure for
ization. User Modeling and User-Adapted Interaction, 29(2):443–486, apr
recommender systems. Proceedings of the AAAI Conference on Artificial
2019.
Intelligence, 31(1), Feb. 2017.
[12] Keyvan Rodpysh, Seyed Mirabedini, and Touraj Banirostam. Employing
[29] Marius Kaminskas and Francesco Ricci. Contextual music information
singular value decomposition and similarity criteria for alleviating cold
retrieval and recommendation: State of the art and challenges. Computer
start and sparse data in context-aware recommender systems. Electronic
Science Review, 6(2):89–119, 2012.
Commerce Research, 05 2021.
[13] Joy Jeevamol and V. G. Renumol. An ontology-based hybrid e-learning [30] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization
content recommender system for alleviating the cold-start problem. Edu- with gumbel-softmax. In 5th International Conference on Learning Rep-
cation and Information Technologies, 26(4):4993–5022, jul 2021. resentations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference
[14] Guillaume Salha-Galvan, Romain Hennequin, Benjamin Chapus, Viet-Anh Track Proceedings. OpenReview.net, 2017.
Tran, and Michalis Vazirgiannis. Cold start similar artists ranking with [31] Chen Shen, Guo-Jun Qi, Rongxin Jiang, Zhongming Jin, Hongwei Yong,
gravity-inspired graph autoencoders. In Proceedings of the 15th ACM Yaowu Chen, and Xian-Sheng Hua. Sharp attention network via adaptive
Conference on Recommender Systems, RecSys ’21, page 443–452, New sampling for person re-identification. IEEE Transactions on Circuits and
York, NY, USA, 2021. Association for Computing Machinery. Systems for Video Technology, 29(10):3016–3027, 2018.
[15] Tao Qi, Fangzhao Wu, Chuhan Wu, and Yongfeng Huang. PP-rec: News [32] Yue Wang and Justin M Solomon. Prnet: Self-supervised learning for
recommendation with personalized user interest and time-aware news partial-to-partial registration. Advances in neural information processing
popularity. In Proceedings of the 59th Annual Meeting of the Association systems, 32, 2019.
for Computational Linguistics and the 11th International Joint Conference [33] Shiyang Yan, Jeremy S Smith, Wenjin Lu, and Bailing Zhang. Hierarchical
on Natural Language Processing (Volume 1: Long Papers), pages 5457– multi-scale attention networks for action recognition. Signal Processing:
5467, Online, August 2021. Association for Computational Linguistics. Image Communication, 61:73–84, 2018.
[16] Tieyun Qian, Yile Liang, Qing Li, and Hui Xiong. Attribute graph neural [34] Pengsheng Guo, Chen-Yu Lee, and Daniel Ulbricht. Learning to branch
networks for strict cold start recommendation. IEEE Transactions on for multi-task learning. In International Conference on Machine Learning,
Knowledge and Data Engineering, 34(8):3597–3610, 2022. pages 3854–3863. PMLR, 2020.
[17] Ravi Nahta, Yogesh Kumar Meena, Dinesh Gopalani, and Ganpat Singh [35] Chong Wang and David M. Blei. Collaborative topic modeling for rec-
Chauhan. Embedding metadata using deep collaborative filtering to ad- ommending scientific articles. In Proceedings of the 17th ACM SIGKDD
dress the cold start problem for the rating prediction task. Multimedia Tools International Conference on Knowledge Discovery and Data Mining, KDD
Appl., 80(12):18553–18581, may 2021. ’11, page 448–456, New York, NY, USA, 2011. Association for Computing
[18] Ramin Raziperchikolaei, Guannan Liang, and Young-joo Chung. Shared Machinery.
neural item representations for completely cold start problem. In Humberto [36] Aäron van den Oord, Sander Dieleman, and Benjamin Schrauwen. Deep
Jesús Corona Pampín, Martha A. Larson, Martijn C. Willemsen, Joseph A. content-based music recommendation. In Proceedings of the 26th Interna-
Konstan, Julian J. McAuley, Jean Garcia-Gathright, Bouke Huurnink, and tional Conference on Neural Information Processing Systems - Volume 2,
Even Oldridge, editors, RecSys ’21: Fifteenth ACM Conference on Rec- NIPS’13, page 2643–2651, Red Hook, NY, USA, 2013. Curran Associates
ommender Systems, Amsterdam, The Netherlands, 27 September 2021 - 1 Inc.
October 2021, pages 422–431. ACM, 2021. [37] Hao Wang, Binyi Chen, and Wu-Jun Li. Collaborative topic regression
[19] Shaoyun Shi, Min Zhang, Yiqun Liu, and Shaoping Ma. Attention- with social regularization for tag recommendation. In Proceedings of
based adaptive model to unify warm and cold starts recommendation. In the Twenty-Third International Joint Conference on Artificial Intelligence,
Proceedings of the 27th ACM International Conference on Information and IJCAI ’13, page 2719–2725. AAAI Press, 2013.

VOLUME 11, 2023 13

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3346918

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

[38] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. Collaborative deep learning YOON-SIK CHO received the B.S. degree in elec-
for recommender systems. In Proceedings of the 21th ACM SIGKDD Inter- trical engineering from Seoul National University,
national Conference on Knowledge Discovery and Data Mining, KDD ’15, South Korea, in 2003, and the Ph.D. degree in elec-
page 1235–1244, New York, NY, USA, 2015. Association for Computing trical engineering from the University of Southern
Machinery. California, USA, in 2014. He was an Academic
[39] Yinwei Wei, Xiang Wang, Qi Li, Liqiang Nie, Yan Li, Xuanping Li, and Mentor for RIPS Program at the Institute for Pure
Tat-Seng Chua. Contrastive learning for cold-start recommendation. In and Applied Mathematics, University of Califor-
Proceedings of the 29th ACM International Conference on Multimedia,
nia, Los Angeles, and a Postdoctoral Scholar at
MM ’21, page 5382–5390, New York, NY, USA, 2021. Association for
the Information Sciences Institute, University of
Computing Machinery.
[40] Hoyeop Lee, Jinbae Im, Seongwon Jang, Hyunsouk Cho, and Sehee Chung. Southern California. He is currently an Assistant
Melu: Meta-learned user preference estimator for cold-start recommenda- Professor with the Department of AI, Chung-Ang University, South Korea.
tion. In Proceedings of the 25th ACM SIGKDD International Conference His research interests include large-scale data science, social network analy-
on Knowledge Discovery & Data Mining, KDD ’19, page 1073–1082, New sis, and cloud computing.
York, NY, USA, 2019. Association for Computing Machinery.
[41] Runsheng Yu, Yu Gong, Xu He, Yu Zhu, Qingwen Liu, Wenwu Ou, and
Bo An. Personalized adaptive meta learning for cold-start user preference
prediction. Proceedings of the AAAI Conference on Artificial Intelligence,
35(12):10772–10780, May 2021.
[42] Narges Heidari, Parham Moradi, and Abbas Koochari. An attention-based
deep learning method for solving the cold-start and sparsity issues of
recommender systems. Know.-Based Syst., 256(C), nov 2022.
[43] Mehrnaz Mirhasani and Reza Ravanmehr. Alleviation of cold start in movie
recommendation systems using sentiment analysis of multi-modal social
networks. Journal of Advances in Computer Engineering and Technology,
6(4):251–264, 2020.
[44] Keyvan Vahidy Rodpysh, Seyed Javad Mirabedini, and Touraj Banirostam.
Model-driven approach running route two-level svd with context infor-
mation and feature entities in recommender system. Comput. Stand.
Interfaces, 82(C), aug 2022.
[45] Lu Gan, Diana Nurbakova, Léa Laporte, and Sylvie Calabretto. En-
hancing recommendation diversity using determinantal point processes
on knowledge graphs. In Proceedings of the 43rd International ACM
SIGIR Conference on Research and Development in Information Retrieval,
SIGIR ’20, page 2001–2004, New York, NY, USA, 2020. Association for
Computing Machinery.
[46] Pham Minh Thu Do and Thi Thanh Sang Nguyen. Semantic-enhanced
neural collaborative filtering models in recommender systems. Knowledge-
Based Systems, 257:109934, 2022.
[47] Ryota Shimizu, Kosuke Asako, Hiroki Ojima, Shohei Morinaga, Motot-
sugu Hamada, and Tadahiro Kuroda. Balanced mini-batch training for
imbalanced image data classification with neural network. In 2018 First
International Conference on Artificial Intelligence for Industries (AI4I),
pages 27–30, 2018.
[48] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie.
Class-balanced loss based on effective number of samples. In 2019
IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR), pages 9260–9269, 2019.
[49] Ramin Raziperchikolaei and Young joo Chung. One-class recommendation
systems with the hinge pairwise distance loss and orthogonal representa-
tions, 2022.
[50] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-
Seng Chua. Neural collaborative filtering. In Proceedings of the 26th
International Conference on World Wide Web, WWW ’17, page 173–182,
Republic and Canton of Geneva, CHE, 2017. International World Wide
Web Conferences Steering Committee.

WON-MIN LEE received the B.S. degree in computer software engineering
from Kunsan University, Kunsan, South Korea, in 2021. She is currently
pursuing the M.S. degree in artificial intelligence with Chung-Ang
University, Seoul, South Korea. Her research interests include
recommendation systems and natural language processing.

