A Study of Different Similarity Metrics and Prediction Approaches in Collaborative Filtering for Recommendation
User preferences for books on a scale of 1-5 and titles of the books
Similarity Algorithms in Collaborative Filtering
$r_{iu}$ and $r_{ju}$ denote the ratings of user $u$ on items $i$ and $j$, respectively.
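The formula itself was on the slide image and did not survive extraction; for reference, the standard cosine similarity that the worked example below follows is:

$$s(i,j) = \frac{\sum_{u} r_{iu}\, r_{ju}}{\sqrt{\sum_{u} r_{iu}^{2}}\,\sqrt{\sum_{u} r_{ju}^{2}}}$$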
Worked example (cosine similarity between items 1 and 2):

$$s(1,2) = \frac{4 \times 3 + 5 \times 0 + 4 \times 0 + 0 \times 3 + 0 \times 4 + 0 \times 0}{\sqrt{4^2 + 5^2 + 4^2} \times \sqrt{3^2 + 3^2 + 4^2}} = 0.272$$
Average rating of each user:
U1: 4, U2: 4.33, U3: 4, U4: 4, U5: 4, U6: 3.67
$$\text{Sim}(1,2) = \frac{(4-4)(3-4) + (5-4.33)(0-4.33) + (4-4)(0-4) + (0-4)(3-4) + (0-4)(4-4) + (0-3.67)(0-3.67)}{\sqrt{(4-4)^2 + (5-4.33)^2 + (4-4)^2 + (0-4)^2 + (0-4)^2 + (0-3.67)^2} \times \sqrt{(3-4)^2 + (0-4.33)^2 + (0-4)^2 + (3-4)^2 + (4-4)^2 + (0-3.67)^2}} = 0.3032$$
$R_{u,i}$ and $R_{u,j}$ denote the rating of user $u$ on item $i$ and item $j$.
$$\text{Sim}(1,2) = \frac{(4-4.33)(3-3.33) + (5-4.33)(0-3.33) + (4-4.33)(0-3.33) + (0-4.33)(3-3.33) + (0-4.33)(4-3.33) + (0-4.33)(0-3.33)}{\sqrt{(4-4.33)^2 + (5-4.33)^2 + (4-4.33)^2 + (0-4.33)^2 + (0-4.33)^2 + (0-4.33)^2} \times \sqrt{(3-3.33)^2 + (0-3.33)^2 + (0-3.33)^2 + (3-3.33)^2 + (4-3.33)^2 + (0-3.33)^2}} = 0.2725$$
Rank matrix: $\text{Sim}(1,2) = \frac{1}{5} = 0.20$
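To make the three metrics concrete, here is a minimal NumPy sketch reproducing the values above; it mirrors the slides' convention of keeping unrated entries (0s) in the sums, and the variable names are ours, not from the deck:

```python
import numpy as np

# Ratings of items 1 and 2 by users U1..U6; 0 means "not rated".
# Following the slides, unrated entries are kept in the sums.
i1 = np.array([4, 5, 4, 0, 0, 0], dtype=float)
i2 = np.array([3, 0, 0, 3, 4, 0], dtype=float)
user_mean = np.array([4.0, 4.33, 4.0, 4.0, 4.0, 3.67])  # per-user averages

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"cosine:          {cos(i1, i2):.4f}")  # ~0.2726, the 0.272 above

# Adjusted cosine: subtract each user's average rating first.
print(f"adjusted cosine: {cos(i1 - user_mean, i2 - user_mean):.4f}")  # ~0.303

# Pearson-style: subtract each item's average over its rated entries.
m1, m2 = i1[i1 > 0].mean(), i2[i2 > 0].mean()  # 4.33 and 3.33
print(f"pearson:         {cos(i1 - m1, i2 - m2):.4f}")  # ~0.2726
```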
• Z-Score (ZS)
$$\hat{r}_{ui} = \bar{r}_i + \sigma_i \cdot \frac{\sum_{j \in N_u(i)} \text{sim}(i,j)\,(r_{ju} - \bar{r}_j)/\sigma_j}{\sum_{j \in N_u(i)} |\text{sim}(i,j)|}$$
Average rating of each book:
B1: 4.333, B2: 3.333, B3: 3.667, B4: 3.50, B5: 4.333, B6: 4.667
$$\hat{r}_{13} = 3.667 + \frac{0.79(4-4.333) + 0 + 1(0-3.667) + 0.69(0-3.50) + 0.71(5-4.333) + 0.18(0-4.667)}{0.79 + 0.69 + 0 + 1 + 0.71 + 0.18} = 1.67$$
Standard deviation of each book:
B1: 0.5774, B2: 0.5774, B3: 1.5275, B4: 0.7071, B5: 0.5774, B6: 0.5774

$\hat{r}_{ui} = 1.67$
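A short sketch of the prediction step, assuming the neighbor similarities and the two observed ratings (4 on B1, 5 on B5) read off the arithmetic above; it reproduces the mean-centered 1.67 and also applies the z-score formula for comparison:

```python
import numpy as np

# Similarities sim(B3, Bj) and user 1's ratings (0 = unrated), as assumed
# from the worked arithmetic above; book means/stds from the tables above.
sims      = np.array([0.79, 0.0, 1.0, 0.69, 0.71, 0.18])
ratings   = np.array([4.0, 0.0, 0.0, 0.0, 5.0, 0.0])
book_mean = np.array([4.333, 3.333, 3.667, 3.50, 4.333, 4.667])
book_std  = np.array([0.5774, 0.5774, 1.5275, 0.7071, 0.5774, 0.5774])
t = 2  # target book B3

# Mean centering: weighted average of (rating - book mean) deviations.
dev = sims @ (ratings - book_mean) / np.abs(sims).sum()
print(f"mean-centered: {book_mean[t] + dev:.3f}")  # ~1.675 (1.67 above)

# Z-score: also divide each deviation by the neighbor's std, then rescale
# by the target book's std, per the formula above.
zdev = sims @ ((ratings - book_mean) / book_std) / np.abs(sims).sum()
print(f"z-score:       {book_mean[t] + book_std[t] * zdev:.3f}")
```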
[Figure: comparison of prediction approaches: Weighted Average, Mean Centering, Z-Score]
Task and Eval (1): Rating Prediction
Task: predict P(U, T) for each pair in the testing set.

Training set (User, Item, Rating):
U1 T1 4; U1 T2 3; U1 T3 3; U2 T2 4; U2 T3 5; U2 T4 5; U3 T4 4

Testing set (User, Item, Rating):
U1 T4 3; U2 T1 2; U3 T1 3; U3 T2 3; U3 T3 4

1. Build a model, e.g., P(U, T) = Avg(T)
2. Process of Rating Prediction:
P(U1, T4) = Avg(T4) = (5+4)/2 = 4.5
P(U2, T1) = Avg(T1) = 4/1 = 4
P(U3, T1) = Avg(T1) = 4/1 = 4
P(U3, T2) = Avg(T2) = (3+4)/2 = 3.5
P(U3, T3) = Avg(T3) = (3+5)/2 = 4
3. Evaluation by Metrics:
Mean Absolute Error (MAE) = $\frac{1}{n}\sum_i |e_i|$, where $e_i$ = R(U, T) - P(U, T)
MAE = (|3 - 4.5| + |2 - 4| + |3 - 4| + |3 - 3.5| + |4 - 4|) / 5 = 1
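The same evaluation, as a small self-contained sketch (train/test triples copied from the slide):

```python
from collections import defaultdict

# Train the item-average model P(U, T) = Avg(T) on the training triples,
# then compute MAE on the testing triples.
train = [("U1","T1",4), ("U1","T2",3), ("U1","T3",3),
         ("U2","T2",4), ("U2","T3",5), ("U2","T4",5), ("U3","T4",4)]
test  = [("U1","T4",3), ("U2","T1",2), ("U3","T1",3),
         ("U3","T2",3), ("U3","T3",4)]

by_item = defaultdict(list)
for _, item, r in train:
    by_item[item].append(r)
avg = {item: sum(rs) / len(rs) for item, rs in by_item.items()}

mae = sum(abs(r - avg[item]) for _, item, r in test) / len(test)
print(f"MAE = {mae}")  # 1.0, matching the slide
```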
Task and Eval (2): Top-N Recommendation
Task: recommend Top-N items to user U3.

Training set (User, Item, Rating):
U1 T1 4; U1 T2 3; U1 T3 3; U2 T2 4; U2 T3 5; U2 T4 5; U3 T4 4

Testing set (User, Item, Rating):
U1 T4 3; U2 T1 2; U3 T1 3; U3 T2 3; U3 T3 4

Predicted Rank: T3, T1, T4, T2
Real Rank: T3, T2, T1

Then compare the two lists:
Precision@N = # of hits / N

Other evaluation metrics:
• Recall
• Mean Average Precision (MAP)
• Normalized Discounted Cumulative Gain (NDCG)
• Mean Reciprocal Rank (MRR)
• and more …
Task and Eval (2): Top-N Recommendation
Task: recommend Top-N items to user U3 (same training/testing split as above).

1. Build a model, e.g., P(U, T) = Avg(T)
2. Process of Rating Prediction:
P(U3, T1) = Avg(T1) = 4/1 = 4
P(U3, T2) = Avg(T2) = (3+4)/2 = 3.5
P(U3, T3) = Avg(T3) = (3+5)/2 = 4
P(U3, T4) = Avg(T4) = 3.5

Predicted Rank: T3, T1, T4, T2
Real Rank: T3, T2, T1

3. Evaluation based on the two lists:
Precision@N = # of hits / N
Precision@1 = 1/1
Precision@2 = 2/2
Precision@3 = 2/3
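A tiny sketch of the Precision@N computation on the two lists above:

```python
# Precision@N: the fraction of the top-N predicted items that appear
# in the real (test) list for U3. Lists copied from the slide.
predicted = ["T3", "T1", "T4", "T2"]
real = {"T3", "T2", "T1"}

for n in (1, 2, 3):
    hits = sum(item in real for item in predicted[:n])
    print(f"Precision@{n} = {hits}/{n}")
# Precision@1 = 1/1, Precision@2 = 2/2, Precision@3 = 2/3
```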
Context and Context-awareness
Outline
Factors Influencing Holiday Decisions
Example of Contexts
Example of Contexts Beyond Time & Locations
What is Context?
Outline
Context-Awareness
Example: Smart Home with Context-awareness
https://www.youtube.com/watch?v=UQWYRsXkbAM
Content vs Context
• Content-Based Approaches
Content-Based Approaches
Context-Driven Applications and Approaches
When Contexts Take Effect?
[Diagram spanning Historical Data or Knowledge, Most Applications, and Ubiquitous Computing]
Outline
How to Collect Contexts
• Sensors
e.g., the application of smart homes
• User Inputs
e.g., survey or user interactions
• Inference
e.g., from user reviews
How to Collect Contexts
[Examples inferred from user reviews: "Family Trip", "Early Arrival"]
Short Summary
• Information Overload
• Solution: Information Retrieval (IR)
e.g., Google Search Engine
• Solution: Recommender Systems (RecSys)
e.g., Movie recommender by Netflix
• Context and Context-awareness
e.g., Mobile computing and smart home devices
Next
• Coffee Break
– Time: 3:00 PM to 3:30 PM
– Location: Outside Merchants
• Context-awareness in IR and RecSys
• Extended Topics: Trends, Challenges and Future
Context-awareness in IR
Context-awareness in IR: Examples
Context in IR
Context-awareness in IR
Terminology in IR
Context-awareness in IR: Interactive Applications
Interactive app
• User-driven approach
• Users explicitly issue a request (along with context information) to retrieve relevant documents
• Example: "What are the comfortable hotels near the Omaha Zoo?" (assuming there are no automatic location detectors or sensors)
• Contexts are included in the query, or a finer-grained query can be derived from related keywords
Context-awareness in IR: Proactive Applications
Proactive app
• Author-driven approach
• Each document is associated with a trigger context. Documents are retrieved for the user when the trigger context matches the user's current context.
• Example 1: (Location, Time) are the trigger contexts for each restaurant; the user opens Yelp and enters "Chinese dish", and Yelp returns a list of nearby Chinese restaurants with valid opening hours at the current moment. [search with queries]
• Example 2: (Location, Time) are the trigger contexts for each restaurant; the user opens Yelp, and Yelp delivers a list of nearby Chinese restaurants with valid opening hours at the current moment. [retrieval or recommendation without queries]
Context-awareness in IR
Interactive
• The user must give a query
• Context information is included in the user's inputs
• It is a process from user contexts to relevant documents

Proactive
• The user may or may not give a query
• Contexts are captured automatically
• It is a process of matching trigger contexts against user contexts
• It is a process from documents to users
Context-awareness in RecSys
Outline
• Context-aware Recommendation
Intro: Does context matter?
Definition: What is Context in RecSys?
Collection: Context Acquisition
Selection: How to identify the relevant context?
Context Incorporation
Context Filtering
Context Modeling
Other Challenges and CARSKit
Non-context vs Context
• Examples:
Travel destination: in winter vs in summer
Movie watching: with children vs with partner
Restaurant: quick lunch vs business dinner
Music: workout vs study
What is Context?
CARS With Representative Context
• Observed Context:
Contexts are the variables that may change when the same activity is performed again and again.
• Examples:
Watching a movie: time, location, companion, etc.
Listening to music: time, location, emotions, occasions, etc.
Party or restaurant: time, location, occasion, etc.
Travel: time, location, weather, transportation conditions, etc.
What is Representative Context?
Activity structure:
1) Subjects: a group of users
2) Objects: a group of items/users
3) Actions: the interactions within the activities
Context-aware RecSys (CARS)
Terminology in CARS
Context Acquisition
How to collect contexts and user preferences in context?
• By user surveys or explicitly asking for user inputs
Predefine contexts and ask users to rate items in these situations, or directly ask users about their contexts in the user interface.
• By usage data
Log data usually contains time and location (at least); context signals can also be inferred from user behavior.
• By user reviews
Text mining or opinion mining can help infer context information from user reviews.
Examples: Context Acquisition (RealTime)
Examples: Context Acquisition (Explicit)
Mobile App: South Tyrol Suggests
[Screenshots: personality collection and context collection in the app]
Examples: Context Acquisition (Implicit)
[Examples inferred from user reviews: "Family Trip", "Early Arrival"]
Examples: Context Acquisition (PreDefined)
Examples: Context Acquisition (User Behavior)
Context Relevance and Context Selection
Context Incorporation
Context-aware RecSys (CARS)
Contextual Filtering
Differential Context Modeling
• Context Relaxation
[Table: contextual ratings with columns User, Movie, Time, Location, Companion, Rating]
Differential Context Modeling
• Context Weighting
[Table: contextual ratings with columns User, Movie, Time, Location, Companion, Rating]
Differential Context Modeling
• Notion of “differential”
1. Neighbor selection  2. Neighbor contribution
Differential Context Modeling
• Workflow
Step-1: Decompose the algorithm into different components;
Step-2: Find the optimal context relaxation/weighting:
In context relaxation, we select the optimal context dimensions
In context weighting, we find the optimal weight for each dimension
• Optimization Problem
Assume there are 4 components and 3 context dimensions, giving 12 positions; a candidate solution assigns one value per position (see the sketch below):

Position  1    2    3    4    5    6    7    8    9    10   11   12
DCR       1    0    0    0    1    1    1    1    0    1    1    1
DCW       0.2  0.3  0    0.1  0.2  0.3  0.5  0.1  0.2  0.1  0.5  0.2

• Optimization Approach
Particle Swarm Optimization (PSO)
Genetic Algorithms
Other non-linear approaches
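As a toy illustration of the search space (not the PSO/GA used in practice), the sketch below runs plain random search over binary DCR vectors against a stand-in fitness function; in a real run the fitness would be the validation MAE of the decomposed algorithm under that relaxation:

```python
import numpy as np

# A candidate DCR solution is a binary vector with one bit per
# (component, context dimension) pair; lower fitness is better.
rng = np.random.default_rng(7)
n_components, n_dims = 4, 3

def fitness(bits):
    # Hypothetical stand-in for "validation MAE under this relaxation".
    target = np.array([1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1])
    return float(np.abs(bits - target).mean())

best, best_score = None, float("inf")
for _ in range(500):
    cand = rng.integers(0, 2, size=n_components * n_dims)
    score = fitness(cand)
    if score < best_score:
        best, best_score = cand, score
print(best.reshape(n_components, n_dims), best_score)
```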
Differential Context Modeling
• Summary
Pros: alleviates the data sparsity problem in CARS
Cons: computational complexity of the optimization
Cons: non-linear optimizers may return a local optimum
Our suggestion:
Run these optimizations offline to find the optimal context relaxation or context weighting solutions, and refresh those optimal solutions periodically.
Contextual Modeling
Independent Contextual Modeling (Tensor Factorization)
Independent Contextual Modeling
• Tensor Factorization
Multi-dimensional space: Users × Items × Contexts → Ratings
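A minimal CP-style tensor factorization sketch under assumed toy data (dimensions, ratings, and hyperparameters are all made up); it learns latent vectors for users, items, and contexts and predicts a rating as the sum over factors of their element-wise product:

```python
import numpy as np

# Predict r[u, i, c] = sum_f U[u,f] * V[i,f] * C[c,f], trained by SGD
# on the observed (user, item, context, rating) entries only.
rng = np.random.default_rng(0)
n_users, n_items, n_ctx, k = 5, 4, 3, 2
obs = [(0, 1, 2, 4.0), (1, 0, 0, 3.0), (2, 3, 1, 5.0)]  # hypothetical data

U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_items, k))
C = rng.normal(scale=0.1, size=(n_ctx, k))

lr, reg = 0.05, 0.01
for _ in range(300):
    for u, i, c, r in obs:
        e = r - np.sum(U[u] * V[i] * C[c])           # prediction error
        du, dv, dc = U[u].copy(), V[i].copy(), C[c].copy()
        U[u] += lr * (e * dv * dc - reg * du)
        V[i] += lr * (e * du * dc - reg * dv)
        C[c] += lr * (e * du * dv - reg * dc)

u, i, c, r = obs[0]
print(f"observed {r}, predicted {np.sum(U[u] * V[i] * C[c]):.2f}")  # ~4.0
```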
Independent Contextual Modeling
• Tensor Factorization
Cons: 1) ignores the dependencies between contexts and the user/item dimensions;
2) increased computational cost as more context dimensions are added
Dependent Contextual Modeling (Deviation-Based vs. Similarity-Based)
Dependent Contextual Modeling
Deviation-Based Contextual Modeling
User-personalized model: $F(U, T, C) = P(U, T) + \sum_{i=0}^{N} \text{CRD}(i, U)$
Item-personalized model: $F(U, T, C) = P(U, T) + \sum_{i=0}^{N} \text{CRD}(i, T)$
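A minimal sketch of the deviation-based idea, with hypothetical CRD values (the real CRDs are learned from data):

```python
# Adjust the context-free prediction P(U, T) by contextual rating
# deviations (CRD), one per active context condition.
crd = {"time=weekend": 0.3, "location=home": -0.2, "companion=family": 0.1}

def f_utc(p_ut, contexts):
    """F(U, T, C) = P(U, T) + sum of CRDs of the active conditions."""
    return p_ut + sum(crd.get(c, 0.0) for c in contexts)

print(f"{f_utc(3.5, ['time=weekend', 'location=home']):.1f}")  # 3.6
```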
Similarity-Based Contextual Modeling
Dimension-by-dimension comparison of two contexts:

          Dimension 1   Dimension 2
c1        Weekend       Home
c2        Weekday       Cinema
sim(Di)   0.5           0.1

$$\text{Sim}(c1, c2) = \prod_{i=1}^{N} \text{sim}(D_i) = 0.5 \times 0.1 = 0.05$$

Generally, in ICS: $\text{Sim}(c1, c2) = \prod_{i=1}^{N} \text{sim}(D_i)$

Similarity of context conditions (a, b, c, d are learned entries):

          Weekend   Weekday   Home   Cinema
Weekend   1         b         -      -
Weekday   a         1         -      -
Home      -         -         1      c
Cinema    -         -         d      1
Similarity-Based Contextual Modeling
Vector representation of context conditions (latent factors f1 … fN):

          f1     f2     …    fN
home      0.1    -0.01  …    0.5
work      0.01   0.2    …    0.01
cinema    0.3    0.25   …    0.05

Generally, in LCS: $\text{Sim}(c1, c2) = \prod_{i=1}^{N} \text{sim}(D_i)$, where $\text{sim}(D_i) = \text{dotProduct}(V_{i1}, V_{i2})$
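A small sketch contrasting ICS and LCS with hypothetical numbers, following the product-of-dimension-similarities form above:

```python
import numpy as np

# ICS: per-dimension similarities come from a learned lookup table,
# and the context similarity is their product (0.5 x 0.1 = 0.05 above).
ics = {("Weekend", "Weekday"): 0.5, ("Home", "Cinema"): 0.1}
print(f"{ics[('Weekend', 'Weekday')] * ics[('Home', 'Cinema')]:.2f}")  # 0.05

# LCS: each condition has a latent vector and sim(Di) is a dot product,
# which also covers condition pairs never observed together.
vec = {"home":   np.array([0.10, -0.01, 0.50]),
       "work":   np.array([0.01,  0.20, 0.01]),
       "cinema": np.array([0.30,  0.25, 0.05])}
print(f"{vec['home'] @ vec['cinema']:.4f}")  # 0.0525
```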
Similarity-Based Contextual Modeling
Similarity-Based CSLIM:
CARSKit: Recommendation Library
Recommendation Library
There are many recommendation libraries for traditional recommendation:
Users × Items → Ratings
CARSKit: A Java-based Open-source Context-aware Recommendation Library
CARSKit: https://github.com/irecsys/CARSKit
Users × Items × Contexts → Ratings
CARSKit: A Short User Guide
3. Setting: setting.conf
CARSKit: A Short User Guide
Sample of Outputs:
1) Results for the rating prediction task:
Final Results by CAMF_C, MAE: 0.714544, RMSE: 0.960389, NMAE: 0.178636, rMAE: 0.683435, rRMSE: 1.002392, MPE: 0.000000, numFactors: 10, numIter: 100, lrate: 2.0E-4, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
2) Results for the top-N recommendation task:
Final Results by CAMF_C, Pre5: 0.048756, Pre10: 0.050576, Rec5: 0.094997, Rec10: 0.190364, AUC: 0.653558, MAP: 0.054762, NDCG: 0.105859, MRR: 0.107495, numFactors: 10, numIter: 100, lrate: 2.0E-4, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Example of Experimental Results
Extended Topics: Trends, Challenges & Future
Challenges
Challenges: Numeric Context
Challenges: Explanation
Challenges: User Interface
Challenges: Cold-Start and Data Sparsity
• Cold-start Problems
Cold-start user: no rating history by this user
Cold-start item: no rating history on this item
Cold-start context: no rating history within this context
• Solution: a hybrid method by Matthias Braunhofer et al.
Challenges: User Intent
Trends and Future
Context Suggestion
Context Suggestion
Context Rec
Context Suggestion: Motivations
Zoo Parks in San Diego, USA
Context Suggestion: Motivations
References
L. Baltrunas, M. Kaminskas, F. Ricci, et al. Best Usage Context Prediction for Music Tracks. CARS@ACM RecSys, 2010.
Y. Zheng, B. Mobasher, R. Burke. Context Recommendation Using Multi-label Classification. IEEE/WIC/ACM WI, 2014.
Y. Zheng. Context Suggestion: Solutions and Challenges. ICDM Workshop, 2015.
Y. Zheng. Context-Driven Mobile Apps Management and Recommendation. ACM SAC, 2016.
Y. Zheng, B. Mobasher, R. Burke. User-Oriented Context Suggestion. ACM UMAP, 2016.
Tutorial: Context-Awareness in Information Retrieval and Recommender Systems
Yong Zheng
School of Applied Technology
Illinois Institute of Technology, Chicago