Recommender Systems Assignment
• Advantages:
o Simple and easy to compute.
o Works well for continuous numeric data.
• Disadvantages:
o Sensitive to the scale of the data.
o May not perform well with high-dimensional data.
3. Cosine Similarity:
• Measures the cosine of the angle between two vectors. It is commonly
used for text data and is insensitive to differences in vector length.
• cos θ = (A · B) / (||A|| * ||B||)
• Advantages:
o Effective for text data and high-dimensional spaces.
o Insensitive to the magnitude of the vectors.
• Disadvantages:
o Ignores non-linear relationships in the data.
o Assumes that the data is represented as vectors.
4. Jaccard Similarity:
• Calculates the size of the intersection divided by the size of the union of
two sets. Commonly used for comparing sets.
• J(A, B) = |A∩B| / |A∪B|
• Advantages:
o Suitable for comparing sets, especially in binary or categorical
data.
o Handles sparsity well.
• Disadvantages:
o Ignores the magnitude of the elements in the sets.
o Not suitable for cases where the order of elements matters.
5. Hamming Distance:
• Measures the number of positions at which corresponding bits are
different in two binary strings of equal length.
• d(a, b) = number of positions at which a and b differ; for binary strings,
this is the number of 1s in a ⊕ b.
• Advantages:
o Specifically designed for binary data.
o Simple and easy to interpret.
• Disadvantages:
o Only applicable to data of equal length.
o Limited to binary data.
1. Cosine Similarity
# import required libraries
import numpy as np
from numpy.linalg import norm
# example vectors (assumed values for illustration)
A = np.array([2, 1, 2, 3, 2, 9])
B = np.array([3, 4, 2, 4, 5, 5])
print("A:", A)
print("B:", B)
# cosine similarity = (A · B) / (||A|| * ||B||)
print("Cosine Similarity:", np.dot(A, B) / (norm(A) * norm(B)))
2. Jaccard Similarity
A = {1,2,3,4,6}
B = {1,2,5,8,9}
C = A.intersection(B)
D = A.union(B)
print('A∩B = ', C)
print('A∪B = ', D)
# Jaccard similarity = |A∩B| / |A∪B|
print('J(A,B) = ', float(len(C))/float(len(D)))
3. Hamming Distance
# function that counts the positions where two equal-length strings differ
def hammingDist(str1, str2):
    return sum(c1 != c2 for c1, c2 in zip(str1, str2))
# Driver code
str1 = "geekspractice"
str2 = "nerdspractise"
# function call
print(hammingDist(str1, str2))
4. Manhattan Distance
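A minimal sketch, assuming two example points p1 and p2 chosen for illustration:
import numpy as np
# assumed example points
p1 = np.array((1, 2, 3))
p2 = np.array((1, 1, 1))
# Manhattan distance = sum of absolute coordinate differences
print("Manhattan Distance:", np.sum(np.abs(p1 - p2)))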
5. Euclidean Distance
import numpy as np
# initializing points in numpy arrays
point1 = np.array((1, 2, 3))
point2 = np.array((1, 1, 1))
# Euclidean distance = L2 norm of the difference vector
print("Euclidean Distance:", np.linalg.norm(point1 - point2))
6. Pearson Correlation
import pandas as pd
from scipy.stats import pearsonr
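Continuing from the imports above, a minimal sketch with example series x and y whose values are assumed for illustration:
# assumed example data
x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])
# Pearson correlation coefficient and two-sided p-value
corr, p_value = pearsonr(x, y)
print("Pearson correlation:", corr)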
• A = UΣVᵀ
• Here, U and V are orthogonal matrices (i.e., UUᵀ = I and VVᵀ = I), and Σ is a
diagonal matrix with singular values on its diagonal.
• Let's break down each component of the SVD:
1. Matrix U:
o The columns of U are called left singular vectors.
o The columns of U form an orthonormal basis for the column space of A.
o If A is an m×n matrix, U is m×m.
2. Diagonal Matrix Σ:
o The diagonal elements of Σ are the singular values of A, denoted as
σ1, σ2, …, σr, where r is the rank of A.
o The singular values are always non-negative and represent the
magnitude of the singular vectors in U and V.
o The remaining elements of Σ are zero.
o If A is m×n, Σ is m×n with zeros outside the main diagonal.
3. Matrix Vᵀ:
o The rows of Vᵀ are called right singular vectors.
o The columns of V form an orthonormal basis for the row space of A.
o If A is m×n, Vᵀ is n×n.
The SVD provides a powerful way to represent and analyze a matrix. The
singular values in Σ indicate the importance of each singular vector in
capturing the overall structure of the data. Higher singular values
correspond to more significant contributions to the matrix.
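A minimal sketch of computing the decomposition with NumPy, assuming a small example matrix A chosen for illustration:
import numpy as np
# assumed example matrix
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
# full SVD: A = U @ Sigma @ Vt
U, S, Vt = np.linalg.svd(A, full_matrices=True)
print("U shape:", U.shape)    # m×m, columns are left singular vectors
print("Singular values:", S)  # non-negative, in descending order
print("Vt shape:", Vt.shape)  # n×n, rows are right singular vectors
# rebuild the m×n Σ matrix and verify the reconstruction
Sigma = np.zeros(A.shape)
Sigma[:len(S), :len(S)] = np.diag(S)
print("Reconstruction close to A:", np.allclose(A, U @ Sigma @ Vt))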
Advantages of SVD:
1. Dimensionality Reduction:
- SVD allows for dimensionality reduction by selecting a subset of the most
significant singular values and corresponding vectors (see the rank-k sketch
after this list). This is useful in reducing storage requirements and
computational complexity.
2. Noise Reduction:
- In the context of data analysis, retaining only the most significant
singular values can help filter out noise and focus on the most essential
features of the data.
3. Data Compression:
- SVD is used in data compression techniques, where it helps represent
data in a more compact form by capturing the dominant patterns and
relationships.
4. Numerical Stability:
- SVD is a numerically stable method for decomposing matrices, making it
robust in various numerical applications.
5. Unique Representation:
- SVD provides a unique and optimal decomposition for any matrix,
allowing for a clear representation of its structure.
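A minimal sketch of the rank-k truncation referenced above, assuming an example matrix A and k = 2 chosen for illustration:
import numpy as np
# assumed example matrix
A = np.array([[5.0, 4.0, 0.0, 1.0],
              [4.0, 5.0, 1.0, 0.0],
              [0.0, 1.0, 5.0, 4.0]])
U, S, Vt = np.linalg.svd(A, full_matrices=False)
# keep only the k largest singular values and their vectors
k = 2
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
print("Singular values:", np.round(S, 3))
print("Rank-2 approximation:\n", np.round(A_k, 2))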
Disadvantages of SVD:
1. Computational Complexity:
- The computational cost of performing the full SVD can be high, especially
for large matrices. Efficient algorithms and approximations are often used to
address this issue.
2. Storage Requirements:
- Storing the entire decomposition, especially for large matrices, may
require significant memory. However, in many applications, only a subset of
the singular values and vectors needs to be retained.
3. Interpretability:
- While SVD provides a unique decomposition, interpreting the meaning of
the singular values and vectors in real-world terms may not always be
straightforward, especially in high-dimensional spaces.
4. Sensitivity to Outliers:
- SVD can be sensitive to outliers in the data, potentially affecting the
accuracy of the decomposition.
5. Limited Applicability to Sparse Matrices:
- SVD is not directly applicable to sparse matrices, which have a large
number of zero entries. However, there are variants of SVD designed for
sparse matrices.
6. Assumes Linearity:
- SVD assumes that relationships in the data are linear. In cases where
non-linear relationships dominate, other techniques may be more
appropriate.
Applications of SVD:
1. Image Compression:
o SVD compresses images by capturing essential features with fewer
singular values and vectors.
2. Recommendation Systems:
o SVD factorizes user-item matrices for collaborative filtering,
enabling personalized recommendations.
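A minimal sketch of this idea, assuming a small example user-item ratings matrix (0 = unrated) and two latent factors; the reconstructed matrix gives predicted scores for unrated items:
import numpy as np
# assumed user-item ratings matrix: rows = users, columns = items, 0 = unrated
R = np.array([[5.0, 4.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [1.0, 0.0, 4.0, 4.0]])
U, S, Vt = np.linalg.svd(R, full_matrices=False)
# keep k latent factors and reconstruct to estimate missing ratings
k = 2
R_hat = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
print("Predicted scores:\n", np.round(R_hat, 2))
# recommend the unrated item with the highest predicted score for user 0
unrated = np.where(R[0] == 0)[0]
best_item = unrated[np.argmax(R_hat[0, unrated])]
print("Recommend item", best_item, "to user 0")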