Lecture-03 - Vectors and Matrices
Goal: Learn a model that predicts a label $\hat{y}$ given a feature vector $x$ (training data),
i.e., from training data, learn how to predict the label $\hat{y}$ for a new sample $x_0$.
Linear Model
Let $w = (w_1, w_2, \dots, w_p)^T \in \mathbb{R}^p$ be the weight vector.
Let $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})^T \in \mathbb{R}^p$ be the feature vector.
Once we have two vectors, we can think about taking their inner product.
For example, with p = 2 and $w = (-2, 1)^T$: when is $\hat{y} = \langle w, x_i \rangle > 0$ and when is $\hat{y} < 0$?
To include a bias term, prepend a constant 1 to the features and a weight $w_0$ to $w$:
$x_i = (1, x_{i1}, x_{i2}, \dots, x_{ip})^T$, $w = (w_0, w_1, w_2, \dots, w_p)^T \in \mathbb{R}^{p+1}$
The prediction is then
$\hat{y} = \langle w, x_i \rangle = w^T x_i = (w_1, \dots, w_p) \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{pmatrix} = x_i^T w = \langle x_i, w \rangle$
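As a quick illustration, here is a minimal NumPy sketch of this prediction (the feature values and weights below are made up):

```python
import numpy as np

# Hypothetical feature vector x_i (p = 3 features) and weight vector w.
x_i = np.array([0.5, -1.2, 3.0])
w   = np.array([2.0, 0.4, -0.3])

# Prediction without a bias term: y_hat = <w, x_i> = w^T x_i
y_hat = w @ x_i
print(y_hat)                       # 0.5*2.0 + (-1.2)*0.4 + 3.0*(-0.3) = -0.38

# With a bias term w_0, prepend a constant 1 to x_i so both vectors live in R^(p+1).
w0    = 1.5
x_aug = np.concatenate(([1.0], x_i))
w_aug = np.concatenate(([w0], w))
print(w_aug @ x_aug)               # w_0 + <w, x_i> = 1.12
```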
Ultimately, we need to use training data to learn the “best” weight vector.
To express this objective more compactly, we can use a matrix representation. Define the data matrix $X \in \mathbb{R}^{n \times p}$ whose $i$-th row is $x_i^T$,
e.g., $x_{21}$ = 1st feature of the 2nd training sample; $x_{12}$ = 2nd feature of the 1st training sample,
i.e., rows correspond to different samples and columns correspond to different features.
Then, we can write our linear model using matrix representation as:
$\hat{y} = Xw$
Computing $Xw$ means taking the inner product of each row of $X$ with $w$ and storing the results in the vector $\hat{y}$.
Note that the dimensions should always match: $X \in \mathbb{R}^{n \times p}$, $w \in \mathbb{R}^{p}$, so $\hat{y} \in \mathbb{R}^{n}$.
Example:
$X = \begin{pmatrix} 1 & 0 \\ 2 & 0 \\ 0 & 3 \end{pmatrix}$, n = 3 training samples, p = 2 features
$x_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $x_2 = \begin{pmatrix} 2 \\ 0 \end{pmatrix}$, $x_3 = \begin{pmatrix} 0 \\ 3 \end{pmatrix}$
If $w = \begin{pmatrix} 2 \\ 4 \end{pmatrix}$, then $Xw = \begin{pmatrix} 1 & 0 \\ 2 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \\ 12 \end{pmatrix}$
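The same example in NumPy, using the row-wise inner-product view of $Xw$:

```python
import numpy as np

X = np.array([[1, 0],
              [2, 0],
              [0, 3]])   # n = 3 samples, p = 2 features
w = np.array([2, 4])

y_hat = X @ w            # inner product of each row of X with w
print(y_hat)             # [ 2  4 12]
```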
Another perspective: Xw is a weighted sum of the columns of X, where w gives the weights
$Xw = 2\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + 4\begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 12 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \\ 12 \end{pmatrix}$
$\hat{y} = Xw$
• Thus, our predicted outputs $\hat{y}$ are formed by taking this linear combination of the feature columns – that’s the core idea behind a linear model.
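The column perspective gives the same result; a small sketch with the same X and w:

```python
import numpy as np

X = np.array([[1, 0],
              [2, 0],
              [0, 3]])
w = np.array([2, 4])

# Weighted sum of the columns of X, with the entries of w as weights.
y_hat_cols = w[0] * X[:, 0] + w[1] * X[:, 1]
print(y_hat_cols)                          # [ 2  4 12]

# Identical to the row-wise inner-product view X @ w.
print(np.array_equal(y_hat_cols, X @ w))   # True
```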
This data does not look like a straight line, but linear models can still help!
If p = 4 and the features of sample $i$ are $(1, x_i, x_i^2, x_i^3)$, then $\hat{y} = Xw$ implies
$\hat{y}_i = \langle w, x_i \rangle = w_1 \cdot 1 + w_2 x_i + w_3 x_i^2 + w_4 x_i^3$
= a cubic polynomial that fits the training samples perfectly!
A matrix with this special structure is called a Vandermonde matrix.
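A small sketch of this idea, assuming four made-up scalar samples x with labels y; np.vander builds the Vandermonde matrix and np.linalg.solve finds the weights of the cubic that passes through all four points:

```python
import numpy as np

# Four hypothetical training samples (scalar inputs) and their labels.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 0.0, 3.0, 2.0])

# Vandermonde matrix: columns are 1, x, x^2, x^3 (p = 4 features).
X = np.vander(x, N=4, increasing=True)

# With n = p = 4, X is square and invertible (distinct x values),
# so the cubic fits every training point exactly.
w = np.linalg.solve(X, y)
print(np.allclose(X @ w, y))   # True
```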
Matrix-Matrix Multiplication
• Recommendation system: a system that predicts user preferences and suggests relevant content
• Recommendation systems often use user-item interaction matrices
• Matrix operations help in predicting missing ratings
• Key technique: Matrix factorization (e.g., singular value decomposition)
User-Item Rating Matrix
$R \approx U \times V^T$
Dimensionality Reduction:
Instead of storing full ratings, store only latent features of users and items
$\begin{pmatrix} 5 & ? & 3 \\ ? & 4 & 2 \\ 1 & ? & 5 \end{pmatrix} \approx \begin{pmatrix} 0.8 & 0.6 \\ 0.7 & 0.5 \\ 0.3 & 0.9 \end{pmatrix} \times \begin{pmatrix} 0.9 & 0.4 & 0.7 \\ 0.5 & 0.8 & 0.6 \end{pmatrix}$
(the 3 × 2 factor is $U$, the user latent features, and the 2 × 3 factor is $V^T$, the item latent features)
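A sketch of how this factorization produces predictions, using the illustrative factor matrices above; every entry of U @ Vt, including those at the '?' positions, becomes a predicted rating:

```python
import numpy as np

# User latent features (3 users x 2 factors) and item latent features
# (2 factors x 3 items), taken from the illustrative example above.
U  = np.array([[0.8, 0.6],
               [0.7, 0.5],
               [0.3, 0.9]])
Vt = np.array([[0.9, 0.4, 0.7],
               [0.5, 0.8, 0.6]])

R_hat = U @ Vt          # 3 x 3 matrix of predicted ratings
print(R_hat)

# Entries at the '?' positions serve as predictions,
# e.g. user 0's predicted rating for item 1:
print(R_hat[0, 1])      # 0.8*0.4 + 0.6*0.8 = 0.8
```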
• Inner product:
If u and v are column vectors of the same size, then $u^T v$ is the inner product of u and v.
This results in a single scalar value.
• Outer product:
If u and v are column vectors (not necessarily of the same size), then $uv^T$ is the outer product of u and v.
This results in a matrix whose $(i, j)$ entry is the product $u_i v_j$.
Matrix-Matrix Multiplication: $X = UV$
Example:
$u = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}$, $v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}$,
$u^T v = (u_1 \;\; u_2 \;\; u_3) \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = u_1 v_1 + u_2 v_2 + u_3 v_3$
$\|u\| = (u^T u)^{1/2} = \sqrt{u_1^2 + u_2^2 + u_3^2}$ → the norm, giving the length of the vector
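In NumPy, for concrete vectors (the numbers are arbitrary):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# Inner product u^T v: a single scalar.
print(u @ v)                # 1*4 + 2*5 + 3*6 = 32.0

# Norm of u: (u^T u)^(1/2), the length of the vector.
print(np.sqrt(u @ u))       # sqrt(14) ≈ 3.742
print(np.linalg.norm(u))    # same value
```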
Example:
$u = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}$, $v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}$,
$uv^T = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} (v_1 \;\; v_2 \;\; v_3) = \begin{pmatrix} u_1 v_1 & u_1 v_2 & u_1 v_3 \\ u_2 v_1 & u_2 v_2 & u_2 v_3 \\ u_3 v_1 & u_3 v_2 & u_3 v_3 \end{pmatrix}$
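And the outer product of the same two vectors, a 3 × 3 matrix whose (i, j) entry is u_i v_j:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# Outer product u v^T: entry (i, j) equals u[i] * v[j].
print(np.outer(u, v))
# [[ 4.  5.  6.]
#  [ 8. 10. 12.]
#  [12. 15. 18.]]
```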
Matrix-Matrix Multiplication as the Sum of Outer Products
Matrix-matrix multiplication can be interpreted as the sum of outer products between the columns of the first matrix and the corresponding rows of the second matrix.
Example:
$U = \begin{pmatrix} 1 & 6 \\ 3 & 4 \\ 5 & 2 \end{pmatrix}$, $V = \begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix}$, $UV = \begin{pmatrix} 13 & 5 \\ 11 & 1 \\ 9 & -3 \end{pmatrix}$
$UV = \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix} (1 \;\; {-1}) + \begin{pmatrix} 6 \\ 4 \\ 2 \end{pmatrix} (2 \;\; 1) = \begin{pmatrix} 1 & -1 \\ 3 & -3 \\ 5 & -5 \end{pmatrix} + \begin{pmatrix} 12 & 6 \\ 8 & 4 \\ 4 & 2 \end{pmatrix} = \begin{pmatrix} 13 & 5 \\ 11 & 1 \\ 9 & -3 \end{pmatrix}$
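A quick NumPy check of the example above: the direct product UV agrees with the sum of outer products of the columns of U with the rows of V:

```python
import numpy as np

U = np.array([[1, 6],
              [3, 4],
              [5, 2]])
V = np.array([[1, -1],
              [2,  1]])

print(U @ V)
# [[13  5]
#  [11  1]
#  [ 9 -3]]

# Same result as the sum of outer products: (k-th column of U) x (k-th row of V).
outer_sum = sum(np.outer(U[:, k], V[k, :]) for k in range(U.shape[1]))
print(np.array_equal(outer_sum, U @ V))   # True
```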
Further Readings
• Any linear algebra book should be fine!