Recommender Systems Assignment

DATA SIMILARITY

Data similarity measures are techniques used to quantify the similarity or
dissimilarity between two sets of data. These measures are commonly
employed in various fields, including data mining, machine learning, pattern
recognition, and information retrieval. The goal is to assess the degree of
resemblance or closeness between two data sets, which could be vectors, time
series, images, text documents, or any other type of data.

Some commonly used data similarity measures are:


1. Euclidean Distance:
• Measures the straight-line distance between two points in Euclidean
space.
• d = √[(x₂ − x₁)² + (y₂ − y₁)²]

• Advantages:
o Simple and easy to compute.
o Works well for continuous numeric data.
• Disadvantages:
o Sensitive to the scale of the data.
o May not perform well with high-dimensional data.

2. Manhattan Distance (L1 Norm):


• Computes the sum of absolute differences between corresponding
elements of two vectors.
• Distance(x, y) = ∑ᵢ₌₁ⁿ |xᵢ − yᵢ| = |x₁ − y₁| + |x₂ − y₂| + ... + |xₙ − yₙ|
• Advantages:
o Similar to Euclidean distance but less sensitive to outliers.
o Suitable for data with different scales.
• Disadvantages:
o May not be appropriate for datasets with complex structures.

3. Cosine Similarity:
• Measures the cosine of the angle between two vectors. It is commonly
used for text data and is robust to the vector's length.
• cos θ = (A · B) / (||A|| * ||B||)
• Advantages:
o Effective for text data and high-dimensional spaces.
o Insensitive to the magnitude of the vectors.
• Disadvantages:
o Ignores non-linear relationships in the data.
o Assumes that the data is represented as vectors.

4. Jaccard Similarity:
• Calculates the size of the intersection divided by the size of the union of
two sets. Commonly used for comparing sets.
• J(A, B) = |A∩B| / |A∪B|
• Advantages:
o Suitable for comparing sets, especially in binary or categorical
data.
o Handles sparsity well.
• Disadvantages:
o Ignores the magnitude of the elements in the sets.
o Not suitable for cases where the order of elements matters.

5. Hamming Distance:
• Measures the number of positions at which corresponding bits are
different in two binary strings of equal length.
• d(a, b) = number of 1s in a ⊕ b (the bitwise XOR of a and b)
• Advantages:
o Specifically designed for binary data.
o Simple and easy to interpret.
• Disadvantages:
o Only applicable to data of equal length.
o Limited to binary data.

6. Pearson Correlation Coefficient:


• Measures the linear correlation between two variables, providing a value
between −1 and 1.
• ρ(X, Y) = cov(X, Y) / (σX · σY)
• Advantages:
o Captures linear relationships between variables.
o Invariant to changes in the scale and location of the variables.
• Disadvantages:
o Assumes a linear relationship and may not capture non-linear
patterns.
o Affected by outliers.
The choice of similarity measure depends on the nature of the data and the
specific task at hand. Each measure has its strengths and weaknesses, and
selecting the appropriate one is crucial for meaningful comparisons.

IMPLEMENTING DATA SIMILARITY MEASURES USING PYTHON

1. Cosine Similarity
# import required libraries
import numpy as np
from numpy.linalg import norm

# define two arrays


A = np.array([2,1,2,3,2,9])
B = np.array([3,4,2,4,5,5])

print("A:", A)
print("B:", B)

# compute cosine similarity


cosine = np.dot(A,B)/(norm(A)*norm(B))
print("Cosine Similarity:", cosine)

2. Jaccard Similarity

A = {1,2,3,4,6}
B = {1,2,5,8,9}
C = A.intersection(B)
D = A.union(B)
print('AnB = ', C)
print('AUB = ', D)
print('J(A,B) = ', float(len(C))/float(len(D)))

3. Hamming Distance

# function to calculate Hamming distance
def hammingDist(str1, str2):
    i = 0
    count = 0
    while i < len(str1):
        if str1[i] != str2[i]:
            count += 1
        i += 1
    return count

# Driver code
str1 = "geekspractice"
str2 = "nerdspractise"

# function call
print(hammingDist(str1, str2))
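As a complement to the string version above, the XOR form of the formula (d(a, b) = number of 1s in a ⊕ b) can be applied directly to integers; a minimal sketch:

def hammingDistInt(a, b):
    # positions where the bits differ are exactly the 1-bits of a ^ b
    return bin(a ^ b).count("1")

# 0b1011 and 0b1001 differ only in the second-lowest bit
print(hammingDistInt(0b1011, 0b1001))  # prints 1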

4. Manhattan Distance

#create function to calculate Manhattan distance
def manhattan(a, b):
    return sum(abs(val1 - val2) for val1, val2 in zip(a, b))

#define vectors
A = [2, 4, 4, 6]
B = [5, 5, 7, 8]

#calculate and print Manhattan distance between vectors
print(manhattan(A, B))

5. Euclidean Distance

# Python code to find Euclidean distance
# using linalg.norm()

import numpy as np

# initializing points in
# numpy arrays
point1 = np.array((1, 2, 3))
point2 = np.array((1, 1, 1))

# calculating Euclidean distance
# using linalg.norm()
dist = np.linalg.norm(point1 - point2)

# printing Euclidean distance
print(dist)

6. Pearson Correlation Coefficient

import pandas as pd
from scipy.stats import pearsonr

# Import your data into Python


df = pd.read_csv("Auto.csv")

# Convert dataframe into series


list1 = df['weight']
list2 = df['mpg']

# Apply the pearsonr()


corr, _ = pearsonr(list1, list2)
print("Pearson's correlation: %.3f" % corr)
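If Auto.csv is not at hand, the same call can be checked on small in-memory lists (a minimal sketch; the values below are made up and nearly linear, so the coefficient should be close to 1):

from scipy.stats import pearsonr

x = [2, 4, 6, 8, 10]
y = [1, 3, 5, 7, 11]

corr, _ = pearsonr(x, y)
print("Pearson's correlation: %.3f" % corr)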

SINGULAR VALUE DECOMPOSITION (SVD)

• Singular Value Decomposition (SVD) is a mathematical technique used
in linear algebra to decompose a matrix into three other matrices. It has
wide applications in various fields, including signal processing, image
compression, data analysis, and machine learning. For a given matrix A,
the SVD is represented as:

• A = UΣVᵀ
• Here, U and V are orthogonal matrices (i.e., UUᵀ = I and
VVᵀ = I), and Σ is a diagonal matrix with singular values on its
diagonal.
• Let's break down each component of the SVD:
1. Matrix U:
o The columns of U are called left singular vectors.
o The columns of U form an orthonormal basis for the column
space of A.
o If A is an m×n matrix, U is m×m.
2. Diagonal Matrix Σ:
o The diagonal elements of Σ are the singular values of A, denoted
as σ₁, σ₂, …, σᵣ, where r is the rank of A.
o The singular values are always non-negative and represent the
magnitude of the singular vectors in U and V.
o The remaining elements of Σ are zero.
o If A is m×n, Σ is m×n with zeros outside the main diagonal.
3. Matrix Vᵀ:
o The rows of Vᵀ are called right singular vectors.
o The columns of V form an orthonormal basis for the row space of
A.
o If A is m×n, Vᵀ is n×n.

The SVD provides a powerful way to represent and analyze a matrix. The
singular values in Σ indicate the importance of each singular vector in
capturing the overall structure of the data. Higher singular values
correspond to more significant contributions to the matrix.
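A minimal sketch of computing and verifying the decomposition with NumPy (the matrix below is an arbitrary example):

import numpy as np

A = np.array([[3., 1.],
              [1., 3.],
              [0., 2.]])

# full SVD: for a 3x2 matrix, U is 3x3 and Vt is 2x2
U, s, Vt = np.linalg.svd(A)

# rebuild the 3x2 diagonal matrix Sigma from the singular values
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)

print("singular values:", s)                   # non-negative, in descending order
print("A recovered:", np.allclose(U @ Sigma @ Vt, A))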

ADVANTAGES OF SINGULAR VALUE DECOMPOSITION (SVD):

1. Dimensionality Reduction:
- SVD allows for dimensionality reduction by selecting a subset of the most
significant singular values and corresponding vectors (see the truncated-SVD
sketch after this list). This is useful in reducing storage requirements and
computational complexity.

2. Noise Reduction:
- In the context of data analysis, retaining only the most significant
singular values can help filter out noise and focus on the most essential
features of the data.

3. Data Compression:
- SVD is used in data compression techniques, where it helps represent
data in a more compact form by capturing the dominant patterns and
relationships.

4. Numerical Stability:
- SVD is a numerically stable method for decomposing matrices, making it
robust in various numerical applications.

5. Unique Representation:
- SVD provides an essentially unique and optimal decomposition for any
matrix, allowing for a clear representation of its structure.

6. Applications in Signal Processing and Image Compression:
- SVD is widely used in signal processing and image compression, providing
efficient representations for these types of data.

7. Solving Linear Systems:
- SVD can be used to solve systems of linear equations and find solutions
to overdetermined or underdetermined systems through the use of the
pseudoinverse (see the sketch after this list).

8. Principal Component Analysis (PCA):
- PCA, which relies on SVD, is a powerful technique for identifying and
analyzing the principal components in a dataset.
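A minimal sketch of two of these uses with NumPy (the matrix, rank k, and right-hand side below are arbitrary examples):

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))

# dimensionality reduction: keep only the k largest singular triplets
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of A
print("rank-k approximation error:", np.linalg.norm(A - A_k))

# solving an overdetermined system Ax = b in the least-squares sense
# via the SVD-based pseudoinverse
b = rng.random(6)
x = np.linalg.pinv(A) @ b
print("least-squares solution:", x)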

DISADVANTAGES OF SINGULAR VALUE DECOMPOSITION (SVD):

1. Computational Complexity:
- The computational cost of performing the full SVD can be high, especially
for large matrices. Efficient algorithms and approximations are often used to
address this issue.

2. Storage Requirements:
- Storing the entire decomposition, especially for large matrices, may
require significant memory. However, in many applications, only a subset of
the singular values and vectors needs to be retained.

3. Interpretability:
- While SVD provides a unique decomposition, interpreting the meaning of
the singular values and vectors in real-world terms may not always be
straightforward, especially in high-dimensional spaces.

4. Sensitivity to Outliers:
- SVD can be sensitive to outliers in the data, potentially affecting the
accuracy of the decomposition.
5. Limited Applicability to Sparse Matrices:
- SVD is not directly applicable to sparse matrices, which have a large
number of zero entries. However, there are variants of SVD designed for
sparse matrices (see the sketch after this list).

6. Assumes Linearity:
- SVD assumes that relationships in the data are linear. In cases where
non-linear relationships dominate, other techniques may be more
appropriate.
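One such variant is SciPy's truncated SVD for sparse matrices, which computes only the k largest singular triplets without densifying the matrix; a minimal sketch (the size, density, and k below are arbitrary):

from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# a 100x50 sparse matrix with ~5% non-zero entries
A = sparse_random(100, 50, density=0.05, format='csr', random_state=0)

k = 5
U, s, Vt = svds(A, k=k)

# svds returns the singular values in ascending order
print("largest singular values:", s[::-1])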

APPLICATIONS OF SINGULAR VALUE DECOMPOSITION (SVD)

Singular Value Decomposition (SVD) finds applications in various fields:

1. Image Compression:
o SVD compresses images by capturing essential features with fewer
singular values and vectors.

2. Recommendation Systems:
o SVD factorizes user-item matrices for collaborative filtering,
enabling personalized recommendations (see the sketch after this list).

3. Principal Component Analysis (PCA):


o SVD aids PCA, reducing data dimensionality while preserving
variability.

4. Latent Semantic Analysis (LSA) in NLP:
o SVD uncovers hidden relationships in document-term matrices for
tasks like clustering and topic modeling.

5. Signal Processing and System Identification:
o SVD analyzes signals, identifying dominant frequencies, and aids
system identification in control theory.
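A minimal sketch of the recommendation use case, using a made-up user-item rating matrix (0 marks an unrated item; real systems treat missing entries more carefully, but this illustrates the factorization step):

import numpy as np

# rows = users, columns = items; the ratings are invented
R = np.array([[5., 4., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])

# rank-k factorization of the rating matrix
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# R_hat fills every cell with a predicted score,
# including the previously unrated (zero) entries
print(np.round(R_hat, 2))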

These applications highlight the versatility of Singular Value Decomposition in


extracting meaningful information from diverse types of data, making it a
valuable tool in fields ranging from computer vision and natural language
processing to recommendation systems and signal processing.
