13 Recsys 2
Recommender Systems 2
EE412: Foundation of Big Data Analytics
Fall 2024

• Homeworks:
  • HW2 (due extended: 11/08)
    • Due to maintenance at Haedong Lounge (11/01 18:00 – 11/04 09:00).
    • Enjoy the Netflix challenge!
  • HW3 (will be posted on 11/06)
• Midterm:
  • Claims are finished; thank you for your hard work!
• Classes:
  • No in-person class on 11/04; a video will be uploaded.
Recap
• Recommender Systems 1
• Content-based Recommendation
• Collaborative Filtering
• The Netflix Challenge
[Figure: the long-tail distribution of item popularity.]
[Figure: a user who likes "Touching the Void" is recommended the similar item "Into Thin Air", which other users with the same taste also like.]

Outline
1. UV Decomposition
2. UV Decomposition: Computation
3. UV Decomposition: Variants
UV Decomposition
[Figure: the m × n matrix R approximated as the product of an m × k matrix U and a k × n matrix 𝑉ᵀ, analogous to the SVD R = U Σ 𝑉ᵀ.]
• We measure the quality of the approximation by RMSE(R, U𝑉ᵀ) over the known entries of R.
• We call 𝑈, 𝑉 parameters.
• Other choices are hyperparameters.
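As a minimal sketch of the idea above, the RMSE over only the known entries of R could be computed like this (the toy ratings and the `rmse` helper are illustrative, not from the slides; unknown ratings are marked with `np.nan`):

```python
import numpy as np

def rmse(R, U, V):
    """RMSE between the known entries of R and the reconstruction U @ V.T.

    Unknown ratings are marked as np.nan and excluded from the error.
    """
    known = ~np.isnan(R)                  # mask of observed ratings
    errors = (R - U @ V.T)[known]         # residuals on known entries only
    return np.sqrt(np.mean(errors ** 2))

# Toy 3x4 utility matrix with two missing ratings; rank k = 2 factors.
R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, 1.0, 1.0],
              [1.0, 1.0, 5.0, 4.0]])
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 2))
V = rng.normal(size=(4, 2))
print(rmse(R, U, V))
```

Minimizing this quantity over U and V is exactly the computation problem discussed in the rest of the section.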
Overfitting
• This is a classical example of overfitting:
  • Model starts fitting noise with too many free parameters.
  • Model is not generalizing well to unseen test data.
• We should carefully control the model complexity.
  • E.g., the number of clusters in 𝑘-means.

Regularization
• Regularization is a possible way to prevent overfitting:
  • Allows a rich model when there is sufficient data.
  • Shrinks the model aggressively where data is scarce.
• The new objective function with regularization is

  J(⋅) = Σ_{(i,j)∈R} (r_ij − u_i · v_j)² + λ₁ Σ_i ‖u_i‖² + λ₂ Σ_j ‖v_j‖²
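A sketch of how the regularized objective above could be evaluated in numpy (the function name and toy values are illustrative; unknown entries of R are `np.nan`):

```python
import numpy as np

def objective(R, U, V, lam1, lam2):
    """Regularized squared error:
    sum over known (i, j) of (r_ij - u_i . v_j)^2
      + lam1 * sum_i ||u_i||^2 + lam2 * sum_j ||v_j||^2
    """
    known = ~np.isnan(R)
    fit = np.sum((R - U @ V.T)[known] ** 2)              # data-fitting term
    reg = lam1 * np.sum(U ** 2) + lam2 * np.sum(V ** 2)  # shrinkage terms
    return fit + reg
```

Note how the penalty terms grow with the norms of the factors, so entries of U and V stay near zero unless the data-fitting term gives a strong reason to move them.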
Effect of regularization
[Figure: error on the training, validation, and test datasets vs. iteration/step.]
[Figure: movies such as The Lion King, Dumb and Dumber, and Independence Day placed in a two-dimensional latent factor space, with one direction corresponding to "funny".]
• A factor goes to the center unless the signal is really strong.
SGD with Mini-batches
• In practice, people do not apply SGD to individual samples.
• Instead, they create (mini-)batches of several samples.
  • GD: 1 step using 𝑁 samples.
  • (True) SGD: 𝑁 steps using 1 sample.
  • Batch SGD: 𝑁/𝐵 steps using a batch of 𝐵 samples.
• Makes a better balance between speed and stability in training.
• 𝐵 is a hyperparameter to choose.
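A minimal sketch of one epoch of batch SGD for the UV model, assuming the regularized squared-error objective from earlier; the learning rate, batch size, and regularization weight here are hypothetical choices, not values from the slides:

```python
import numpy as np

def batch_sgd_epoch(R, U, V, lr=0.01, lam=0.1, B=32, rng=None):
    """One epoch of mini-batch SGD on the known entries of R.

    Each of the ~N/B steps updates U and V with the gradient of the
    regularized squared error over a batch of B (i, j, r_ij) samples.
    """
    rng = rng if rng is not None else np.random.default_rng()
    rows, cols = np.nonzero(~np.isnan(R))       # the N known entries
    order = rng.permutation(len(rows))          # shuffle samples each epoch
    for start in range(0, len(order), B):
        idx = order[start:start + B]
        i, j = rows[idx], cols[idx]
        err = R[i, j] - np.sum(U[i] * V[j], axis=1)   # r_ij - u_i . v_j
        # Batch gradients; the lam terms shrink the factors toward zero.
        gU = -2 * err[:, None] * V[j] + 2 * lam * U[i]
        gV = -2 * err[:, None] * U[i] + 2 * lam * V[j]
        np.add.at(U, i, -lr * gU)   # handles repeated row indices in a batch
        np.add.at(V, j, -lr * gV)
    return U, V
```

With B = 1 this reduces to true SGD, and with B = N to full gradient descent, which is exactly the trade-off between speed and stability described above.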
BCE Loss
• If r_ij = 1, we minimize −log σ(u_iᵀv_j) by pushing σ(u_iᵀv_j) toward 1.
• If r_ij = 0, we minimize −log(1 − σ(u_iᵀv_j)) by pushing σ(u_iᵀv_j) toward 0.
• BCE makes the model focus more on wrong samples,
  • Not already accurate ones.
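A sketch of the per-sample BCE loss, assuming a binary label r ∈ {0, 1} and the raw score u_iᵀv_j as input (the function name is illustrative):

```python
import numpy as np

def bce_loss(r, score):
    """Binary cross-entropy for one implicit-feedback sample.

    r is the 0/1 label and score is the inner product u_i^T v_j;
    minimizing the loss pushes sigma(score) toward r.
    """
    p = 1.0 / (1.0 + np.exp(-score))   # sigma(u_i^T v_j)
    return -(r * np.log(p) + (1 - r) * np.log(1 - p))

# A confidently wrong prediction incurs a much larger loss than a
# nearly correct one, so gradients concentrate on the wrong samples.
print(bce_loss(1, -3.0))   # sigma ~ 0.047: large loss
print(bce_loss(1, 3.0))    # sigma ~ 0.953: small loss
```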
Summary
1. UV Decomposition
• Latent factor models
2. UV Decomposition: Computation
• Overfitting and regularization
• Stochastic gradient descent
3. UV Decomposition: Variants
• Modeling biases
• Dealing with implicit feedback
• BCE and BPR losses