Precept 6
$M$ has some missing entries (not everyone has seen every movie)

$\sum_{(i,j)\in\Omega} \left(M_{ij} - (AB)_{ij}\right)^2$, where $\Omega$ is the set of non-missing indices in $M$
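A minimal sketch of this masked objective in NumPy, using a boolean mask for $\Omega$ (the matrix values and names here are illustrative, not from the slides):

```python
import numpy as np

def masked_sq_error(M, A, B, mask):
    """Sum of squared errors over observed entries only.
    mask[i, j] is True when (i, j) is in Omega, i.e. M[i, j] is observed."""
    diff = (M - A @ B)[mask]
    return np.sum(diff ** 2)

# Tiny example: a 2x2 ratings matrix with one missing entry.
M = np.array([[5.0, 3.0], [4.0, 0.0]])
mask = np.array([[True, True], [True, False]])  # bottom-right entry unseen
A = np.array([[1.0], [1.0]])   # rank-1 factors
B = np.array([[4.0, 3.0]])
print(masked_sq_error(M, A, B, mask))  # (5-4)^2 + (3-3)^2 + (4-4)^2 = 1.0
```

The mask keeps the missing entry out of the loss entirely, matching the sum over $\Omega$.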
Gradient Time!
Building Intuition
Strategies we use:
1. Take derivative wrt one entry, then generalize to vectors, matrices
2. Pretend the problem has only 1D inputs, then try to build up
If $f(X, Y)$ outputs a scalar and $X, Y \in \mathbb{R}^{m \times n}$, what shape should $\frac{\partial f}{\partial X}$ be?
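One way to see the answer: the gradient stacks one partial derivative per entry of $X$, so it has $X$'s shape, $m \times n$. A quick sketch with an illustrative choice of $f$ (not from the slides):

```python
import numpy as np

# For scalar-valued f(X, Y) = sum(X * Y), the analytic gradient
# wrt X is simply Y: each entry df/dX[i, j] = Y[i, j].
m, n = 3, 4
X = np.random.randn(m, n)
Y = np.random.randn(m, n)
grad = Y  # gradient of sum(X * Y) wrt X
print(grad.shape)  # (3, 4), the same shape as X
```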
Matrix Multiplication!
$f(A, B) = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} \left(M_{ij} - (AB)_{ij}\right)^2$
Let $\hat{M} = AB$ (our current approximation for $M$), so that $\hat{M}_{ij} = \sum_{z=1}^{r} A_{iz} B_{zj}$

[Figure: block diagram of the product $A \times B = \hat{M}$, with $A \in \mathbb{R}^{7 \times 4}$, $B \in \mathbb{R}^{4 \times 10}$, $\hat{M} = AB \in \mathbb{R}^{7 \times 10}$]
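The shapes in the diagram can be checked directly (a sketch using the slide's dimensions, with random placeholder values):

```python
import numpy as np

r = 4                       # latent dimension
A = np.random.randn(7, r)   # A in R^{7x4}
B = np.random.randn(r, 10)  # B in R^{4x10}
M_hat = A @ B               # M_hat = AB in R^{7x10}
print(M_hat.shape)  # (7, 10)

# Entry-wise definition: M_hat[i, j] = sum over z of A[i, z] * B[z, j]
i, j = 2, 5
assert np.isclose(M_hat[i, j], sum(A[i, z] * B[z, j] for z in range(r)))
```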
Back to the objective!
Single Entry Derivative
$f(A, B) = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} \left(M_{ij} - \hat{M}_{ij}\right)^2 \qquad \hat{M}_{ij} = \sum_{z=1}^{r} A_{iz} B_{zj} = A_{i*} \cdot B_{*j}$

$\frac{\partial f}{\partial A_{ik}} = \sum_{j:(i,j)\in\Omega} \frac{\partial f}{\partial \hat{M}_{ij}} \cdot \frac{\partial \hat{M}_{ij}}{\partial A_{ik}}$

What is $\frac{\partial f}{\partial \hat{M}_{ij}}$? What is $\frac{\partial \hat{M}_{ij}}{\partial A_{ik}}$?
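As a sanity check on the second factor (a sketch, not from the slides): since $A_{ik}$ multiplies only $B_{kj}$ in the sum $\hat{M}_{ij} = \sum_z A_{iz} B_{zj}$, we should get $\frac{\partial \hat{M}_{ij}}{\partial A_{ik}} = B_{kj}$. A finite-difference check with random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
r = 3
A = rng.standard_normal((4, r))
B = rng.standard_normal((r, 5))
i, j, k = 1, 2, 0

# dM_hat[i,j] / dA[i,k] should equal B[k,j]; M_hat is linear in A[i,k],
# so a finite difference recovers it almost exactly.
eps = 1e-6
A_plus = A.copy()
A_plus[i, k] += eps
numeric = ((A_plus @ B)[i, j] - (A @ B)[i, j]) / eps
assert np.isclose(numeric, B[k, j], atol=1e-4)
```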
An Alternative Representation!
$f(A, B) = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} \left(M_{ij} - \hat{M}_{ij}\right)^2 = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} E_{ij}^2$, where $E$ is a matrix of errors and we measure loss on non-missing indices

$\frac{\partial f}{\partial A_{ik}} = \sum_{j:(i,j)\in\Omega} \frac{\partial f}{\partial E_{ij}} \cdot \frac{\partial E_{ij}}{\partial A_{ik}}$

What is $\frac{\partial f}{\partial E_{ij}}$? What is $\frac{\partial E_{ij}}{\partial A_{ik}}$?
Matrix Time
$\frac{\partial f}{\partial A_{ik}} = -\frac{2}{|\Omega|} \sum_{j:(i,j)\in\Omega} \left(M_{ij} - \hat{M}_{ij}\right) B_{kj}$ (note the minus sign from differentiating through $-\hat{M}_{ij}$). Assume no entries are missing.
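With no missing entries, the per-entry sums collect into a single matrix product, $\frac{\partial f}{\partial A} = -\frac{2}{|\Omega|}\, E\, B^{\top}$ with $E = M - \hat{M}$. A sketch with a finite-difference check (all names and values illustrative):

```python
import numpy as np

def grad_A(M, A, B):
    """Gradient of f = (1/|Omega|) * sum((M - AB)^2) wrt A,
    assuming no missing entries, so |Omega| = M.size."""
    E = M - A @ B
    return -(2.0 / M.size) * E @ B.T

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 6))
A = rng.standard_normal((5, 2))
B = rng.standard_normal((2, 6))

# Finite-difference check of one entry, (i, k) = (0, 1).
f = lambda A_: np.mean((M - A_ @ B) ** 2)
eps = 1e-6
A2 = A.copy()
A2[0, 1] += eps
numeric = (f(A2) - f(A)) / eps
assert np.isclose(grad_A(M, A, B)[0, 1], numeric, atol=1e-4)
```

Collecting the sum into $E B^{\top}$ is what makes the gradient cheap to compute: one matrix multiply instead of a double loop over entries.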
[Figure: neural network diagram with input $x_2$, hidden units $h_2$, $h_3$, output $o$, and edge weights 3, 1, 0, -2, -1]

If $g$ is the ReLU activation, what is $g((h_1, h_2, h_3))$?