
VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY

HO CHI MINH UNIVERSITY OF TECHNOLOGY


FACULTY OF APPLIED SCIENCE

GROUP PROJECT
LINEAR ALGEBRA (MT1007)
CC09 – Group 2

PROJECT 6 & 7
INNER PRODUCT, NORMS, ANGLES

Lecturer: Hoang Hai Ha, MSc

No. Student Name Student ID


1 Trịnh Gia Hiệp 2352343
2 Đoàn Quang Đăng 2352249
3 Hồ Gia An 2352005
4 Lê Nghiêm Ngọc Bảo 2352092
5 Phạm Thuỳ Ngân 2053260
6 Nguyễn Đình Khôi Nguyên 2053278

Ho Chi Minh City, May 2024


TABLE OF CONTENTS

I. INTRODUCTION
II. THEORETICAL BACKGROUND
    1. Inner product
    2. Norm of a vector
    3. Angle and Orthogonality in Inner Product Spaces
III. PROBLEM SOLVING
    PROJECT 6: CONVOLUTION, INNER PRODUCT, AND IMAGE PROCESSING REVISITED
        Code
        Questions
    PROJECT 7: NORMS, ANGLES, AND YOUR MOVIE CHOICES
        Code
        Questions
IV. CONCLUSION

I. INTRODUCTION
This report serves as a comprehensive summary of linear algebra theories,
specifically focusing on the topics of Inner Product, Norms, and Angles of vectors. It
provides an overview of these fundamental concepts along with their associated
formulas. The inner product, also known as the dot product or scalar product, is a
mathematical operation that computes a single value from two sequences of equal
length, typically vectors. Furthermore, the report elucidates the notion of vector norms,
which are measures of the size or length of vectors, each defined in a distinct manner.
Additionally, it elaborates on calculating the angle between two vectors in a vector
space using the dot product and trigonometric principles.
Following the theoretical background, Python programming is employed to
exemplify and address the problem statements outlined in Project 6 and Project 7 of the
“Applied Projects for an Introductory Linear Algebra Class”. Project 6 concerns
convolution, the inner product, and their application in image processing. These
operations are fundamental in various image processing tasks, such as filtering, feature
extraction, and similarity measurement, forming the basis for advanced techniques in
computer vision and image analysis. In turn, the objective of Project 7 is to employ
vector norms and angles to discern similarities among users' preferences and construct
a rudimentary recommender system. Both projects merge linear algebra principles with
datasets and scenarios drawn from the realm of movies, furnishing practical experience
in applying theoretical concepts to real-world problems.
Subsequently, the report concludes by summarizing the process of solving the
given problems and related questions, as well as the most important linear algebra
knowledge gained from the project.


II. THEORETICAL BACKGROUND


1. Inner product
In mathematics, an inner product space is a vector space with a binary operation
called an inner product. This operation associates each pair of vectors in the space with
a scalar quantity known as the inner product of the vectors, often denoted using angle
brackets. Inner products allow the rigorous introduction of intuitive geometrical
notions, such as the length of a vector or the angle between two vectors.
In linear algebra, the inner product (also known as the dot product or scalar
product) is a fundamental operation that takes two vectors and produces a scalar. It
quantifies the geometric relationship between vectors, providing a measure of their
similarity or orthogonality. The inner product is widely used in various mathematical
fields, including geometry, physics, and engineering.
Let u and v be vectors in Rⁿ. The inner product of u and v, denoted by ⟨u, v⟩ or
u·v, is defined as
⟨u, v⟩ = u₁v₁ + u₂v₂ + ... + uₙvₙ
This inner product is commonly called the Euclidean inner product (or the
standard inner product) on Rⁿ to distinguish it from other possible inner products that
might be defined on Rⁿ. We call Rⁿ with the Euclidean inner product Euclidean n-space.
An inner product on a real vector space V is a function that associates a real
number ⟨u, v⟩ with each pair of vectors in V in such a way that the following axioms are
satisfied for all vectors u, v, and w in V and all scalars k.
1. ⟨u, v⟩ = ⟨v, u⟩ [Symmetry axiom]
2. ⟨u+v,w⟩ = ⟨u,w⟩ + ⟨v,w⟩ [Additivity axiom]
3. ⟨ku, v⟩ = k⟨u, v⟩ [Homogeneity axiom]
4. ⟨u,u⟩ ≥ 0 and ⟨u,u⟩ = 0 if and only if u = 0 [Positivity axiom]
A real vector space with an inner product is called a real inner product space.
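As a quick numerical illustration (an addition to this write-up, not part of the original project), the Euclidean inner product and the axioms above can be checked with NumPy; the vectors below are arbitrary examples:
import numpy as np

u = np.array([1.0, 2.0, -1.0])
v = np.array([3.0, 0.0, 4.0])
w = np.array([-2.0, 1.0, 5.0])
k = 2.5

print(np.dot(u, v))                                                # <u, v> = -1.0
print(np.isclose(np.dot(u, v), np.dot(v, u)))                      # symmetry axiom
print(np.isclose(np.dot(u + v, w), np.dot(u, w) + np.dot(v, w)))   # additivity axiom
print(np.isclose(np.dot(k * u, v), k * np.dot(u, v)))              # homogeneity axiom
print(np.dot(u, u) >= 0)                                           # positivity axiom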
2. Norm of a vector
If V is a real inner product space, then the norm (or length) of a vector v in V is
denoted by ∥v∥ and is defined by
∥v∥ = √⟨v, v⟩


and the distance between two vectors is denoted by d(u, v) and is defined by
d(u, v) = ∥u − v∥ = √⟨u − v, u − v⟩
A vector with a norm of 1 is termed a unit vector. These vectors serve to specify
direction without considering their length in the problem context. To acquire a unit
vector in a specified direction, one can take any non-zero vector v in that direction and
multiply it by the reciprocal of its norm. For instance, if v is a vector with a length of 2
in R² or R³, then (1/2)v is a unit vector in the same direction as v. More generally, for
any non-zero vector v in Rⁿ, a unit vector in its direction is obtained as
u = (1/∥v∥) v

If v is a vector in Rⁿ, and if k is any scalar, then:


(a) ∥v∥ ≥ 0
(b) ∥v∥ = 0 if and only if v = 0
(c) ∥kv∥ = |k|∥v∥
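A minimal NumPy sketch of the norm, distance, and normalization formulas above (the vectors are arbitrary examples):
import numpy as np

v = np.array([3.0, 4.0])
u = np.array([1.0, 1.0])

norm_v = np.linalg.norm(v)     # ||v|| = sqrt(<v, v>) = 5.0
dist = np.linalg.norm(u - v)   # d(u, v) = ||u - v|| = sqrt(13)
unit_v = v / norm_v            # unit vector in the direction of v
print(np.linalg.norm(unit_v))  # 1.0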
• The Standard Unit Vectors

When a rectangular coordinate system is introduced in R² or R³, the unit vectors
in the positive directions of the coordinate axes are called the standard unit vectors.
Moreover, we can generalize this to Rⁿ by defining the standard unit vectors in Rⁿ to be
e₁ = (1, 0, 0, …, 0), e₂ = (0, 1, 0, …, 0), …, eₙ = (0, 0, 0, …, 1)
in which case every vector v = (v₁, v₂, …, vₙ) in Rⁿ can be expressed as
v = (v₁, v₂, …, vₙ) = v₁e₁ + v₂e₂ + ··· + vₙeₙ
• Distance in Rⁿ
If P₁ and P₂ are points in R² or R³, then the length of the vector P₁P₂ is equal to
the distance between the two points. Specifically, if P₁(x₁, y₁) and P₂(x₂, y₂) are points
in R², then the distance between these points is
d(P₁, P₂) = ∥P₁P₂∥ = √((x₂ − x₁)² + (y₂ − y₁)²)
3. Angle and Orthogonality in Inner Product Spaces
In the realm of inner product spaces, the notions of angle and orthogonality play
pivotal roles in understanding the geometric properties of vectors and subspaces.


The angle θ between two nonzero vectors u and v in Rⁿ is defined by the formula
cos θ = (u · v) / (∥u∥ ∥v∥),  0 ≤ θ ≤ π
Two nonzero vectors u and v in Rⁿ are said to be orthogonal (or perpendicular)
if u·v = 0. We will also agree that the zero vector in Rⁿ is orthogonal to every vector in
Rⁿ.
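A short NumPy sketch of this angle formula (arbitrary example vectors; the clip call guards against floating-point round-off pushing the cosine slightly outside [−1, 1]):
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
print(np.degrees(theta))  # 45.0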
• Orthogonal projection
In many applications, it is necessary to “decompose” a vector u into a sum of
two terms, one term being a scalar multiple of a specified nonzero vector a and the other
term being orthogonal to a; call these terms w₁ and w₂, respectively.
Since the vector w₁ is to be a scalar multiple of a, it must have the form
w₁ = ka (1)
Our goal is to find a value of the scalar k and a vector w₂ that is orthogonal to a such
that
u = w₁ + w₂ (2)
We can determine k by using (1) to rewrite (2) as
u = w₁ + w₂ = ka + w₂
and then applying the algebraic properties of the inner product to obtain
u·a = (ka + w₂)·a = k∥a∥² + (w₂·a) (3)
Since w₂ is to be orthogonal to a, the last term in (3) must be 0, and hence k must satisfy
the equation
u·a = k∥a∥²
from which we obtain
k = (u·a) / ∥a∥²
as the only possible value for k. The proof can be completed by rewriting (2) as
w₂ = u − w₁ = u − ka = u − ((u·a) / ∥a∥²) a
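A minimal NumPy sketch of this orthogonal decomposition (arbitrary example vectors):
import numpy as np

u = np.array([2.0, 3.0])
a = np.array([4.0, 0.0])

k = np.dot(u, a) / np.dot(a, a)  # k = (u . a) / ||a||^2
w1 = k * a                       # projection of u onto a
w2 = u - w1                      # component of u orthogonal to a
print(w1, w2, np.dot(w2, a))     # [2. 0.] [0. 3.] 0.0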


III. PROBLEM SOLVING


PROJECT 6: CONVOLUTION, INNER PRODUCT, AND IMAGE
PROCESSING REVISITED
Code
Task 1: Use the size function to find the dimensions m, n of the matrix ImJPG.
Variables: ImJPG, m, n
#PROJECT 6
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from scipy.signal import convolve2d

# Task 1
# Load the image and save the resulting matrix as ImJPG
ImJPG = np.array(Image.open("einstein.jpg").convert("L"))
m, n = ImJPG.shape
print(f"Image size: {m} x {n}")

Task 2: Introduce some noise into the image by adding random fluctuations of color to
each point:
ImJPG_Noisy=double(ImJPG)+50*(rand(m,n)-0.5);
Observe that the command rand(m,n) produces a matrix of the dimension m × n filled
with pseudorandom numbers within the interval (0, 1). The amplitude of the noise is
equal to ±25 shades of gray. Also, notice the function double which converts the
variables from the type uint8 to the type double.
# Task 2
# Introduce some noise into the image by adding random fluctuations of color to each point
ImJPG_Noisy = ImJPG.astype(float) + 50 * (np.random.rand(m, n) - 0.5)

Task 3: Two of the most common operations on images done with convolution filters
include smoothing and sharpening. Let us start by using two average smoothing filters
given by the matrices:
Kernel_Average1 = (1/5)·[[0, 1, 0], [1, 1, 1], [0, 1, 0]],   Kernel_Average2 = (1/9)·[[1, 1, 1], [1, 1, 1], [1, 1, 1]]


Task 4: Using the appropriate Matlab syntax, type in the matrices Kernel_Average1 and
Kernel_Average2.
Variables: Kernel_Average1, Kernel_Average2
# Task 3 & 4
# Define the average smoothing filters
Kernel_Average1 = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]]) / 5
Kernel_Average2 = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]]) / 9

Task 5: Apply the filters to the noisy image by using the commands
ImJPG_Average1=conv2(Kernel_Average1,ImJPG_Noisy);
ImJPG_Average2=conv2(Kernel_Average2,ImJPG_Noisy);
Display the resulting images in separate figure windows and observe the result. Don’t
forget to convert the results back to the integer format by using uint8 function.
Variables: ImJPG_Average1, ImJPG_Average2
# Task 5
# Apply the filters to the noisy image
ImJPG_Average1 = np.clip(np.round(convolve2d(ImJPG_Noisy, Kernel_Average1, mode='same')), 0, 255).astype(np.uint8)
ImJPG_Average2 = np.clip(np.round(convolve2d(ImJPG_Noisy, Kernel_Average2, mode='same')), 0, 255).astype(np.uint8)

Task 6: An alternative blurring filter, Gaussian blur, is given by the matrix:


Kernel_Gauss = (1/8)·[[0, 1, 0], [1, 4, 1], [0, 1, 0]]
which assigns a higher weight to the pixel color in the center. Type in the matrix
Kernel_Gauss in your Matlab file using the appropriate syntax.
Variables: Kernel_Gauss
# Task 6
# Define the Gaussian blur filter
Kernel_Gauss = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 8

Task 7: Perform the convolution using the function conv2 and Kernel_Gauss and save
the resulting array as ImJPG_Gauss. Display the result in a new window.
Variables: ImJPG_Gauss


# Task 7
# Apply the Gaussian blur filter to the noisy image
ImJPG_Gauss = np.clip(np.round(convolve2d(ImJPG_Noisy, Kernel_Gauss, mode='same')), 0, 255).astype(np.uint8)

Task 8: Observe that we can “layer” filter effects. Perform another convolution with
the Gaussian kernel on the image ImJPG_Gauss, save the result as ImJPG_Gauss2, and
display the result in a new window.
Variables: ImJPG_Gauss2
# Task 8
# Apply the Gaussian blur filter again to the blurred image
ImJPG_Gauss2 = np.clip(np.round(convolve2d(ImJPG_Gauss, Kernel_Gauss, mode='same')), 0, 255).astype(np.uint8)

Task 9: It is also possible to blur the image over a larger area, for instance with the
5 × 5 kernel
Kernel_Large = (1/80)·[[0, 1, 2, 1, 0], [1, 4, 8, 4, 1], [2, 8, 16, 8, 2], [1, 4, 8, 4, 1], [0, 1, 2, 1, 0]]
Apply the kernel Kernel_Large to the matrix ImJPG, save the result as ImJPG_Large,
and display the figure in a new figure window.
Variables: ImJPG_Large
# Task 9
# Apply a larger blur kernel
Kernel_Large = np.array([[0, 1, 2, 1, 0], [1, 4, 8, 4, 1], [2, 8, 16, 8, 2],
                         [1, 4, 8, 4, 1], [0, 1, 2, 1, 0]]) / 80
ImJPG_Large = np.clip(np.round(convolve2d(ImJPG, Kernel_Large, mode='same')), 0, 255).astype(np.uint8)

Task 10: An opposite action to blurring is the sharpening of an image with a
convolution filter. Type the following kernels into your Matlab file:
Kernel_Sharp1 = [[0, −1, 0], [−1, 5, −1], [0, −1, 0]],   Kernel_Sharp2 = [[−1, −1, −1], [−1, 9, −1], [−1, −1, −1]]
Variables: Kernel_Sharp1, Kernel_Sharp2


# Task 10
# Define sharpening kernels
Kernel_Sharp1 = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
Kernel_Sharp2 = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])

Task 11: Perform the convolution of the original image ImJPG with the kernels
Kernel_Sharp1, Kernel_Sharp2 using the function conv2 and save the resulting arrays
as ImJPG_Sharp1 and ImJPG_Sharp2. Display the results in new figure windows.
Variables: ImJPG_Sharp1, ImJPG_Sharp2
# Task 11
# Apply sharpening kernels to the original image
ImJPG_Sharp1 = np.clip(convolve2d(ImJPG, Kernel_Sharp1, mode='same'), 0, 255).astype(np.uint8)
ImJPG_Sharp2 = np.clip(convolve2d(ImJPG, Kernel_Sharp2, mode='same'), 0, 255).astype(np.uint8)

Task 12: Finally, it is possible to use convolution to detect edges in the image. Edge
detection is used for image segmentation and data extraction in areas such as image
processing, computer vision, and machine vision. Two of the most common filters used
for these are the Sobel horizontal and vertical filters:
Kernel_Sobel1 = [[−1, 0, 1], [−2, 0, 2], [−1, 0, 1]],   Kernel_Sobel2 = [[−1, −2, −1], [0, 0, 0], [1, 2, 1]]
Sobel filters can be interpreted as discrete derivatives in the horizontal and vertical
directions. Type in the matrices Kernel_Sobel1, Kernel_Sobel2.
Variables: Kernel_Sobel1, Kernel_Sobel2
# Task 12
# Define Sobel kernels for edge detection
Kernel_Sobel1 = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
Kernel_Sobel2 = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

Task 13: Perform the convolution of the original image ImJPG with the Sobel kernels
using the function conv2 and save the resulting arrays as ImJPG_Sobel1 and
ImJPG_Sobel2. Display the results in new figure windows.
Variables: ImJPG_Sobel1, ImJPG_Sobel2


# Task 13
# Apply Sobel kernels to the original image
ImJPG_Sobel1 = np.clip(convolve2d(ImJPG, Kernel_Sobel1, mode='same'), 0, 255).astype(np.uint8)
ImJPG_Sobel2 = np.clip(convolve2d(ImJPG, Kernel_Sobel2, mode='same'), 0, 255).astype(np.uint8)

Task 14: Create a combined image with both horizontal and vertical edges by summing
up the matrices ImJPG_Sobel1, ImJPG_Sobel2. Display the result in a new figure
window by using the code:
figure;
imshow(uint8(ImJPG_Sobel1+ImJPG_Sobel2));
# Task 14
# Create a combined image with both horizontal and vertical edges
# Cast to int first: adding two uint8 arrays would wrap around modulo 256 before the clip
ImJPG_Sobel_Combined = np.clip(ImJPG_Sobel1.astype(int) + ImJPG_Sobel2.astype(int), 0, 255).astype(np.uint8)
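As an optional aside beyond what the task asks for, edge strength is often combined as the gradient magnitude of the raw, unclipped Sobel responses rather than as a sum of the clipped images; a minimal sketch using the kernels already defined above:
# Gradient magnitude from the signed (unclipped) Sobel responses
Gx = convolve2d(ImJPG, Kernel_Sobel1, mode='same')
Gy = convolve2d(ImJPG, Kernel_Sobel2, mode='same')
ImJPG_Sobel_Mag = np.clip(np.hypot(Gx, Gy), 0, 255).astype(np.uint8)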

Task 15: Alternatively, Laplacian edge detection can be used with the following filter:
Kernel_Laplace = [[0, −1, 0], [−1, 4, −1], [0, −1, 0]]
Type in the matrix Kernel_Laplace. The Laplace kernel is a discrete analogue of the
continuous Laplacian and may be interpreted as a sum of two discrete second-order
partial derivatives.
Variables: Kernel_Laplace
# Task 15
# Define Laplacian kernel for edge detection
Kernel_Laplace = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])

Task 16: Perform the convolution of the original image ImJPG with the Laplace kernel
using the function conv2 and save the resulting array as ImJPG_Laplace. Display the
results in a new window.
Variables: ImJPG_Laplace
# Task 16
# Apply Laplacian kernel to the original image
ImJPG_Laplace = np.clip(convolve2d(ImJPG, Kernel_Laplace, mode='same'), 0, 255).astype(np.uint8)


Result Displayed
# Display the results
plt.figure(figsize=(12, 12))
plt.subplot(4, 4, 1), plt.imshow(ImJPG, cmap='gray'), plt.title('Original')
plt.subplot(4, 4, 2), plt.imshow(ImJPG_Noisy, cmap='gray'), plt.title('Noisy')
plt.subplot(4, 4, 3), plt.imshow(ImJPG_Average1, cmap='gray'), plt.title('Average Filter 1')
plt.subplot(4, 4, 4), plt.imshow(ImJPG_Average2, cmap='gray'), plt.title('Average Filter 2')
plt.subplot(4, 4, 5), plt.imshow(ImJPG_Gauss, cmap='gray'), plt.title('Gaussian Blur')
plt.subplot(4, 4, 6), plt.imshow(ImJPG_Gauss2, cmap='gray'), plt.title('Gaussian Blur (2x)')
plt.subplot(4, 4, 7), plt.imshow(ImJPG_Large, cmap='gray'), plt.title('Large Blur')
plt.subplot(4, 4, 8), plt.imshow(ImJPG_Sharp1, cmap='gray'), plt.title('Sharpening Filter 1')
plt.subplot(4, 4, 9), plt.imshow(ImJPG_Sharp2, cmap='gray'), plt.title('Sharpening Filter 2')
plt.subplot(4, 4, 10), plt.imshow(ImJPG_Sobel1, cmap='gray'), plt.title('Sobel Filter 1')
plt.subplot(4, 4, 11), plt.imshow(ImJPG_Sobel2, cmap='gray'), plt.title('Sobel Filter 2')
plt.subplot(4, 4, 12), plt.imshow(ImJPG_Sobel_Combined, cmap='gray'), plt.title('Combined Sobel Edges')
plt.subplot(4, 4, 13), plt.imshow(ImJPG_Laplace, cmap='gray'), plt.title('Laplacian Edge')
plt.tight_layout()
plt.show()


LAB 6 Result: (figure: a 4 × 4 grid showing the original, noisy, averaged, Gaussian-blurred, large-blurred, sharpened, Sobel, combined-Sobel, and Laplacian images produced by the code above)


Questions
Task 4:
Q1: Why are the fractions 1/5 and 1/9 necessary?
The fractions 1/5 and 1/9 normalize the filter kernels. They ensure that the entries
of each kernel sum to 1, so that the overall brightness of the image doesn't change when
the filter is applied.
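This can be checked directly on the kernels defined in Task 4:
# Each kernel's entries should sum to 1 so that mean brightness is preserved
print(Kernel_Average1.sum())  # 1.0
print(Kernel_Average2.sum())  # 1.0 (up to floating-point round-off)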

Task 5:
Q1: What is the size of the matrices ImJPG Average1, ImJPG_Average2?
The size of the matrices ImJPG_Average1 and ImJPG_Average2 would be the
same as the original image matrix ImJPG.
Q2: What is the size of the original matrix ImJPG? Use the size function.
In Python, we can use the "shape" attribute to get the size of the matrix. If ImJPG
is a numpy array, you can use "ImJPG.shape" to get its size. This will return a tuple
where the first element is the number of rows (height of the image) and the second
element is the number of columns (width of the image).
Q3: Which filter blurs more? Why?
The filter Kernel_Average2 blurs more because it averages over a larger
neighbourhood: all 9 pixels of its 3 × 3 window, versus only the 5 pixels of the
cross-shaped window of Kernel_Average1. This means that each pixel in the output
image is influenced by more pixels from the input image, resulting in more blurring.

Task 8:
Q4: Devise a matrix for a filter which is equivalent to applying Gaussian
convolution twice.
To find a filter matrix that is equivalent to applying the Gaussian convolution
twice, we can convolve the Gaussian kernel with itself. This is based on the associative
property of convolution.


Let's denote the Gaussian kernel as
G = (1/8)·[[0, 1, 0], [1, 4, 1], [0, 1, 0]]
To convolve G with itself, we perform a full 2D convolution operation:
G * G = (1/64)·[[0, 0, 1, 0, 0], [0, 2, 8, 2, 0], [1, 8, 20, 8, 1], [0, 2, 8, 2, 0], [0, 0, 1, 0, 0]]
Therefore, the filter matrix equivalent to applying the Gaussian convolution twice is
this 5 × 5 kernel, denoted Kernel_Gauss_2x.
Q5: What is the size of the matrix for this filter?


The size of the matrix Kernel_Gauss_2x is 5x5. When convolving two matrices,
the size of the resulting matrix is determined by the sizes of the input matrices. In this
case, since both input matrices (Gaussian kernels) are 3x3, the resulting matrix after
convolution is 5x5.
In general, if we convolve an m x n matrix with a p x q matrix, the size of the
output matrix will be (m+p-1) x (n+q-1).
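The 5 × 5 kernel derived above can also be double-checked numerically with the Kernel_Gauss already defined in Task 6 (convolve2d with no mode argument performs a 'full' convolution):
Kernel_Gauss_2x = convolve2d(Kernel_Gauss, Kernel_Gauss)
print(Kernel_Gauss_2x.shape)           # (5, 5)
print(np.round(Kernel_Gauss_2x * 64))  # [[0,0,1,0,0],[0,2,8,2,0],[1,8,20,8,1],[0,2,8,2,0],[0,0,1,0,0]]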
Q6: Compare the large blur kernel from Task 9 with the Gaussian blur applied twice. Which one blurs more?
The larger blur kernel (Kernel_Large) blurs the image more than applying the
Gaussian blur twice (ImJPG_Gauss2). This can be explained by looking at the sizes and
values of the kernels:
1. Kernel_Large is a 5x5 matrix with larger values in the center and gradually
decreasing values towards the edges. The larger size and distribution of values
in this kernel lead to a stronger blurring effect.


2. Applying the Gaussian blur twice (ImJPG_Gauss2) is equivalent to convolving
the image with Kernel_Gauss_2x, which is also a 5 × 5 matrix. However, the
values in Kernel_Gauss_2x are more concentrated around the center compared
to Kernel_Large.
The larger values in Kernel_Large spread out the pixel intensities more, resulting
in a stronger blurring effect compared to the Gaussian blur applied twice.
In summary, the larger blur kernel (Kernel_Large) has a more pronounced
blurring effect due to its larger size and the distribution of its values compared to
applying the Gaussian blur twice (ImJPG_Gauss2).


PROJECT 7: NORMS, ANGLES, AND YOUR MOVIE CHOICES


Code
Task 1: Load the arrays movies, users_movies, users_movies_sort, index_small,
trial_user from the file users_movies.mat using the load command. The matrix
users_movies should be a 6040 × 3952 matrix containing integer values between 0 and
5, with 1 meaning “strongly dislike” and 5 meaning “strongly like”. A 0 in the matrix
means that the user did not rate the movie. The array movies contains all the titles of
the movies. The matrix users_movies_sort contains an extract from the matrix
users_movies with ratings for the 20 most popular movies. The indexes of these popular
movies are recorded in the array index_small. Finally, ratings of these popular movies
by yet another user (not any of the users contained in the database) are given by the
vector trial_user. It is suggested to view all the variables and their dimensions by using
the “Workspace” window of the Matlab environment:
%% Load the data
clear;
load('users_movies.mat','movies','users_movies','users_movies_sort',...
    'index_small','trial_user');
[m,n]=size(users_movies);
Variables: movies, users_movies, users_movies_sort, index_small, trial_user, m, n
#PROJECT 7
import numpy as np
from scipy.io import loadmat

# Load data from MATLAB file
data = loadmat('users_movies.mat')

# Extract data from the loaded dictionary
movies = data['movies'].squeeze()
users_movies = data['users_movies']
users_movies_sort = data['users_movies_sort']
index_small = data['index_small'].squeeze()
trial_user = data['trial_user'].squeeze()

Task 2: Print the titles of the 20 most popular movies by using the following code:
fprintf('Rating is based on movies:\n')
for j=1:length(index_small)
    fprintf('%s \n',movies{index_small(j)})
end;
fprintf('\n')


Observe that the movie titles are called from the cell array movies (notice the curly
parentheses) by using an intermediate array index small.
# Print movie titles
print('Rating is based on movies:')
for idx in index_small:
    print(movies[idx - 1])  # MATLAB indices start from 1, Python from 0
print()

Task 3: Now let us select the users we will compare the trial user to. Here, we want to
select the people who rated all of the 20 movies under consideration. This means that
there should not be any zeros in the corresponding rows of the matrix
users_movies_sort. This can be accomplished by the following code:
%% Select the users to compare to
[m1,n1]=size(users_movies_sort);
ratings=[];
for j=1:m1
    if prod(users_movies_sort(j,:))~=0
        ratings=[ratings; users_movies_sort(j,:)];
    end;
end;
Observe the use of the command prod to compute the product of the elements of the
rows of the array users_movies_sort. The array ratings contains all the users who have
rated all the popular movies from the array index_small. In general, it may happen that
this array is empty or too small to create any meaningful comparison; we won't consider
these cases in this project.
Variables: ratings, m1, n1
# Select users who rated all 20 movies
ratings = []
for user_ratings in users_movies_sort:
    if np.prod(user_ratings) != 0:
        ratings.append(user_ratings)
ratings = np.array(ratings)
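Equivalently, the loop can be replaced by a single vectorized selection (an optional stylistic alternative, not required by the task):
# Keep only the rows of users_movies_sort that contain no zeros
ratings = users_movies_sort[np.all(users_movies_sort != 0, axis=1)]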

Task 4: Next, we can look at the similarity metric based on the Euclidean distance. The
idea here is that we treat the array trial_user and the rows of the array ratings as vectors
in the 20-dimensional real space R²⁰. Assume that all the vectors have the origin as the
beginning point. We can find the distance between the end points of the vector trial_user


and each of the vectors ratings(j,:). In other words, we are looking for the user with the
closest ratings to the trial user. This can be accomplished by the following code:
%% Find the Euclidean distances
[m2,n2]=size(ratings);
for i=1:m2
    eucl(i)=norm(ratings(i,:)-trial_user);
end;
The vector eucl contains all the Euclidean distances between trial_user and the rows of
the matrix ratings.
Variables: eucl
# Find Euclidean distances
eucl_dist = []
for user_ratings in ratings:
    eucl_dist.append(np.linalg.norm(user_ratings - trial_user))
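The same distances can also be computed in one vectorized line (an optional alternative to the loop):
# Row-wise Euclidean distances via broadcasting
eucl_dist = np.linalg.norm(ratings - trial_user, axis=1)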

Task 5: Now let us select the user from the database with the smallest Euclidean
distance from the trial user. Instead of using the usual function min we will use a slightly
more complicated approach. Let us sort the elements of the vector eucl by using the
function sort. The advantage of this is that it allows us to find the second closest user,
the third closest user, etc. There may only be a small difference between the several
closest users and we might want to use their data as well.
[MinDist,DistIndex]=sort(eucl,'ascend');
closest_user_Dist=DistIndex(1)
Variables: MinDist, DistIndex, closest_user_Dist
eucl_dist = np.array(eucl_dist)
closest_user_dist = np.argsort(eucl_dist)[0]

Task 6: The similarity metric above is one of the simplest ones which can be used to
compare two objects. However, when it comes to user ratings it has certain
disadvantages. For instance, what if the users have similar tastes, but one of them
consistently judges movies more harshly than the other one? The metric above would
rate those two users as dissimilar, since the Euclidean distance between the vectors of
their opinions might be pretty large. To rectify this problem, we can look at a different
similarity metric known in statistics as the Pearson correlation coefficient, which can
be defined as
r(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (√(Σᵢ (xᵢ − x̄)²) · √(Σᵢ (yᵢ − ȳ)²))
where x̄ and ȳ denote the means of the entries of x and y.


To compute the Pearson correlation coefficient, let us first center the rows of the
matrix ratings and the vector trial_user:
ratings_cent=ratings-mean(ratings,2)*ones(1,n2);
trial_user_cent=trial_user-mean(trial_user);

# Compute Pearson correlation coefficients
ratings_cent = ratings - np.mean(ratings, axis=1, keepdims=True)
trial_user_cent = trial_user - np.mean(trial_user)

Task 7: Next, use a for ... end loop to compute the Pearson correlation coefficients
between the rows of the matrix ratings and the vector trial_user. Save the result as a
vector pearson.
Variables: pearson
pearson_corr = []
for user_ratings in ratings_cent:
    numerator = np.sum(user_ratings * trial_user_cent)
    denominator = np.linalg.norm(user_ratings) * np.linalg.norm(trial_user_cent)
    pearson_corr.append(numerator / denominator)
pearson_corr = np.array(pearson_corr)
closest_user_pearson = np.argsort(-pearson_corr)[0]
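For reference, the same coefficients can be obtained without a loop (an optional vectorized alternative):
# All Pearson coefficients at once: centered dot products over products of norms
pearson_corr = (ratings_cent @ trial_user_cent) / (
    np.linalg.norm(ratings_cent, axis=1) * np.linalg.norm(trial_user_cent))
closest_user_pearson = np.argmax(pearson_corr)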

Task 8: Finally, observe that the value r(x, y) belongs to the interval [−1, 1]. The closer
the coefficient is to 1, the more similar the tastes of the users are. Now let us sort the
vector pearson as before using the sort function. Save the results of this function as
[MaxPearson,PearsonIndex] and find the maximal correlation coefficient, which will be
the first element of the array MaxPearson. Save this element as closest_user_Pearson.
Variables: MaxPearson, PearsonIndex, closest_user_Pearson

Task 9: Compare the elements of the vectors DistIndex, PearsonIndex.


Task 10: Now let us display the recommendations on the screen. Use the following code
to create the list of movies which the trial user has liked and the lists of
recommendations for him/her based on the distance criterion and the Pearson
correlation coefficient criterion:


%% Recommendations
recommend_dist=[];
for k=1:n
    if (users_movies(closest_user_Dist,k)==5)
        recommend_dist=[recommend_dist; k];
    end;
end;
recommend_Pearson=[];
for k=1:n
    if (users_movies(closest_user_Pearson,k)==5)
        recommend_Pearson=[recommend_Pearson; k];
    end;
end;
liked=[];
for k=1:20
    if (trial_user(k)==5)
        liked=[liked; index_small(k)];
    end;
end;
We use the rating equal to 5 both as the criterion for liking the movie and the criterion
to recommend the movie. Of course, you can broaden this up and also include the
movies ranked as 4.
Variables: liked, recommend_dist, recommend_Pearson
# Print recommendations
print('Movies you liked:')
for idx in index_small[trial_user == 5]:
    print(movies[idx - 1])  # index_small holds 1-based MATLAB indices
print()

print('Recommendations based on Euclidean distance:')
for idx in np.flatnonzero(users_movies[closest_user_dist] == 5):
    print(movies[idx])  # flatnonzero already returns 0-based column indices
print()

print('Recommendations based on Pearson correlation:')
for idx in np.flatnonzero(users_movies[closest_user_pearson] == 5):
    print(movies[idx])
print()

Task 11: Finally, display the titles of the movies from the arrays liked, recommend_dist,
recommend_Pearson on the screen by using a procedure similar to the one used in step 2.


Task 12: Take a look at the list of the popular movies displayed in step 2. Chances are
you might have seen the majority of them before. Create your own vector of ratings and
call it myratings. Ratings should be integers between 1 and 5. Again, assign the rating
1 if you really disliked the movie, and 5 if you really liked it. If you haven’t seen a
particular movie, pick its rating at random. The vector of ratings myratings should be a
row-vector with 20 elements.
Variables: myratings

Task 13: Create a new code cell. In this cell, repeat steps 4-11 of this project and
substitute the vector trial_user by the vector myratings. This should produce a personal
recommendation based on your own ratings.
Variables: liked, recommend_dist, recommend_Pearson
# Personal recommendations
myratings = np.array([4, 5, 3, 4, 5, 2, 4, 3, 5, 4, 3, 5, 4, 3, 4, 5, 3, 4, 5, 3])

# m2 and n correspond to size(ratings) and size(users_movies) in the MATLAB code;
# defined here so this cell runs standalone
m2 = ratings.shape[0]
n = users_movies.shape[1]

eucl_personal = np.zeros(m2)
for i in range(m2):
    eucl_personal[i] = np.linalg.norm(ratings[i, :] - myratings)

MinDist_personal, DistIndex_personal = np.sort(eucl_personal), np.argsort(eucl_personal)
closest_user_Dist_personal = DistIndex_personal[0]

ratings_cent_personal = ratings - np.mean(ratings, axis=1, keepdims=True)
myratings_cent = myratings - np.mean(myratings)

pearson_personal = np.zeros(m2)
for i in range(m2):
    pearson_personal[i] = (np.dot(ratings_cent_personal[i, :], myratings_cent) /
                           (np.linalg.norm(ratings_cent_personal[i, :]) * np.linalg.norm(myratings_cent)))

MaxPearson_personal, PearsonIndex_personal = np.sort(pearson_personal)[::-1], np.argsort(pearson_personal)[::-1]
closest_user_Pearson_personal = PearsonIndex_personal[0]

recommend_dist_personal = []
for k in range(n):

    if users_movies[closest_user_Dist_personal, k] == 5:
        recommend_dist_personal.append(k)

recommend_Pearson_personal = []
for k in range(n):
    if users_movies[closest_user_Pearson_personal, k] == 5:
        recommend_Pearson_personal.append(k)

liked_personal = []
for k in range(20):
    if myratings[k] == 5:
        liked_personal.append(index_small[k])

print("Movies you liked:")
for j in range(len(liked_personal)):
    print(movies[liked_personal[j] - 1][0])
print()

print("Movies recommended for you based on Euclidean distance:")
for j in range(len(recommend_dist_personal)):
    print(movies[recommend_dist_personal[j]][0])
print()

print("Movies recommended for you based on Pearson correlation coefficient:")
for j in range(len(recommend_Pearson_personal)):
    print(movies[recommend_Pearson_personal[j]][0])
print()

LAB 7 Result:
Rating is based on movies:
['E.T. the Extra-Terrestrial (1982)']
['Star Wars Episode IV - A New Hope (1977)']
['Star Wars Episode V - The Empire Strikes Back (1980)']
['Star Wars Episode VI - Return of the Jedi (1983)']
['Jurassic Park (1993)']
['Saving Private Ryan (1998)']
['Terminator 2']
['Matrix, The (1999)']
['Back to the Future (1985)']
['Silence of the Lambs, The (1991)']
['Star Wars Episode I - The Phantom Menace (1999)']
['Raiders of the Lost Ark (1981)']
['Fargo (1996)']
['Sixth Sense, The (1999)']


['Braveheart (1995)']
['Shakespeare in Love (1998)']
['Princess Bride, The (1987)']
["Schindler's List (1993)"]
['Shawshank Redemption, The (1994)']
['Groundhog Day (1993)']

Movies liked by the trial user:


['Star Wars Episode IV - A New Hope (1977)']
['Star Wars Episode V - The Empire Strikes Back (1980)']
['Star Wars Episode VI - Return of the Jedi (1983)']
['Matrix, The (1999)']
['Silence of the Lambs, The (1991)']
['Raiders of the Lost Ark (1981)']
['Groundhog Day (1993)']

Movies recommended based on Euclidean distance:


['Taxi Driver (1976)']
["Schindler's List (1993)"]
['Fargo (1996)']
['Godfather, The (1972)']
['North by Northwest (1959)']
['Casablanca (1942)']
['Citizen Kane (1941)']
['Mr. Smith Goes to Washington (1939)']
['Bonnie and Clyde (1967)']
['Bob Roberts (1992)']
['Paris Is Burning (1990)']
['12 Angry Men (1957)']
['To Kill a Mockingbird (1962)']
['Title not available']
['Grand Day Out, A (1992)']
['Raging Bull (1980)']
['Annie Hall (1977)']
['Stand by Me (1986)']
['Killing Fields, The (1984)']
['My Life as a Dog (Mitt liv som hund) (1985)']
['Tickle in the Heart, A (1996)']
['Boys, Les (1997)']
["There's Something About Mary (1998)"]
['On the Waterfront (1954)']
['Ordinary People (1980)']
['Chariots of Fire (1981)']
['Rain Man (1988)']
['Saving Private Ryan (1998)']
['Life Is Beautiful (La Vita è bella) (1997)']
['Risky Business (1983)']


['Brief Encounter (1946)']


['Shower (Xizhao) (1999)']

Movies recommended based on Pearson correlation coefficient:
['Taxi Driver (1976)']
["Schindler's List (1993)"]
['Fargo (1996)']
['Godfather, The (1972)']
['North by Northwest (1959)']
['Casablanca (1942)']
['Citizen Kane (1941)']
['Mr. Smith Goes to Washington (1939)']
['Bonnie and Clyde (1967)']
['Bob Roberts (1992)']
['Paris Is Burning (1990)']
['12 Angry Men (1957)']
['To Kill a Mockingbird (1962)']
['Title not available']
['Grand Day Out, A (1992)']
['Raging Bull (1980)']
['Annie Hall (1977)']
['Stand by Me (1986)']
['Killing Fields, The (1984)']
['My Life as a Dog (Mitt liv som hund) (1985)']
['Tickle in the Heart, A (1996)']
['Boys, Les (1997)']
["There's Something About Mary (1998)"]
['On the Waterfront (1954)']
['Ordinary People (1980)']
['Chariots of Fire (1981)']
['Rain Man (1988)']
['Saving Private Ryan (1998)']
['Life Is Beautiful (La Vita è bella) (1997)']
['Risky Business (1983)']
['Brief Encounter (1946)']
['Shower (Xizhao) (1999)']

Movies you liked:


['Star Wars Episode IV - A New Hope (1977)']
['Jurassic Park (1993)']
['Back to the Future (1985)']
['Raiders of the Lost Ark (1981)']
['Shakespeare in Love (1998)']
['Shawshank Redemption, The (1994)']

Movies recommended for you based on Euclidean distance:


['Star Wars Episode IV - A New Hope (1977)']


['Pulp Fiction (1994)']
['Shawshank Redemption, The (1994)']
["Schindler's List (1993)"]
['Dr. Strangelove or']
['Godfather, The (1972)']
['Infinity (1996)']
['Paris Is Burning (1990)']
['Star Wars Episode V - The Empire Strikes Back (1980)']
['Apocalypse Now (1979)']
['Third Man, The (1949)']
['Title not available']
['Cool Hand Luke (1967)']
['Citizen Ruth (1996)']
['Life Is Beautiful (La Vita è bella) (1997)']

Movies recommended for you based on Pearson correlation coefficient:
['Star Wars Episode IV - A New Hope (1977)']
['Pulp Fiction (1994)']
['Shawshank Redemption, The (1994)']
["Schindler's List (1993)"]
['Dr. Strangelove or']
['Godfather, The (1972)']
['Infinity (1996)']
['Paris Is Burning (1990)']
['Star Wars Episode V - The Empire Strikes Back (1980)']
['Apocalypse Now (1979)']
['Third Man, The (1949)']
['Title not available']
['Cool Hand Luke (1967)']
['Citizen Ruth (1996)']
['Life Is Beautiful (La Vita è bella) (1997)']


Questions
Task 3:
Q1: What does the command ratings=[] do?
ratings=[] creates an empty list named ratings. This list will be used to store the
ratings of users who have rated all 20 movies:
- Ratings is initialized as an empty list using ratings=[].
- A for loop is used to iterate through each row user_ratings of users_movies_sort.
- Inside the loop, np.prod(user_ratings) != 0 checks if the product of all elements
in the current row user_ratings is non-zero. If this condition is true, it means none
of the elements in the row are zero.
- If the condition np.prod(user_ratings) != 0 is true, the current row user_ratings
is appended to the ratings list using ratings.append(user_ratings).
- After the loop completes, ratings is converted to a NumPy array using ratings
= np.array(ratings). This array contains all the rows from users_movies_sort that
have no zero entries.
So, in summary, the command ratings=[] initializes an empty list to store the
relevant user ratings, which are then used for further analysis and recommendation
generation.

Task 9:
Q2: Are the variables closest_user_Pearson and closest_user_Dist the same?
closest_user_Pearson represents the index of the user in the ratings matrix who
is most similar to the trial_user based on the Pearson correlation coefficient. The
Pearson correlation measures the linear correlation between two variables. In this case,
it looks at the ratings of the trial_user and each other user, and finds the user whose
rating pattern has the strongest positive linear relationship with the trial_user's ratings.
This takes into account not just the absolute values of the ratings, but also how the
ratings vary in relation to each other.


On the other hand, closest_user_Dist represents the index of the user who is
closest to the trial_user based on Euclidean distance. Euclidean distance is the straight-
line distance between two points in Euclidean space. Here, it's used to find the user
whose vector of movie ratings is geometrically closest to the trial_user's vector of
ratings. This metric considers the absolute difference between the rating values.
Because the Pearson correlation and Euclidean distance are measuring similarity
in different ways, closest_user_Pearson and closest_user_Dist may not always be the
same. The Pearson correlation is more sensitive to the pattern of ratings, while
Euclidean distance is more sensitive to the magnitude of the ratings.
For example, consider the following users:
Trial user ratings: [4, 5, 3, 5, 4]
User 1 ratings: [2, 3, 1, 3, 2]
User 2 ratings: [4, 4, 4, 5, 4]
User 1 rates every movie exactly two points harsher than the trial user, so its Euclidean
distance is large (√20 ≈ 4.47), yet its Pearson correlation is exactly 1 because the rating
pattern is identical. User 2 is much closer in Euclidean distance (√2 ≈ 1.41) but its
correlation is only about 0.53. The distance criterion would therefore pick User 2, while
the correlation criterion would pick User 1.
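A short NumPy check of these numbers (the ratings are hypothetical, chosen only for illustration):
import numpy as np

trial = np.array([4, 5, 3, 5, 4], dtype=float)
user1 = np.array([2, 3, 1, 3, 2], dtype=float)  # same pattern, two points harsher
user2 = np.array([4, 4, 4, 5, 4], dtype=float)

def pearson(x, y):
    xc, yc = x - x.mean(), y - y.mean()
    return xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(np.linalg.norm(user1 - trial), pearson(user1, trial))  # ~4.47, 1.0
print(np.linalg.norm(user2 - trial), pearson(user2, trial))  # ~1.41, ~0.53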
Therefore, the users identified as most similar by these two methods can be
different, depending on the specific rating data. The choice of similarity metric depends
on what aspects of similarity are most important for the recommender system.


IV. CONCLUSION
The report has successfully addressed and solved the given problems using the
mentioned theories on the inner product, norms, and angles of vectors. Project 6 applied
matrix operations, linear transformations, and inner product computations to arrive at
solutions: convolution involves applying a filter (also called a kernel or mask) to an
image by sliding it over the image's pixels and computing the weighted sum of the pixel
values under the filter at each position. On the other hand, Project 7 presents a different
challenge that can be effectively tackled through the use of vector norms and angles.
Here, linear algebra concepts, including vector spaces and inner products, are
instrumental in constructing a simple movie recommendation system. In more detail,
the project utilizes norms to gauge the disparity between user preferences and potential
movie choices. Moreover, other techniques, such as sorting the similarity scores,
support the decision-making process.
Besides, the project also gave the authors valuable lessons, improving their
knowledge and capabilities in solving problems with mathematical tools, specifically
matrices and inner product spaces. Firstly, the project helped the authors learn how
convolution applies a filter to an image by sliding it over the image's pixels and
computing weighted sums, and to understand the importance of convolution in image
processing tasks such as blurring, sharpening, edge detection, and feature extraction.
Secondly, the project clarified the importance of mathematical algorithms in data
analysis and manipulation. In this project's context, the application of norms and angles
makes the problem more approachable and thereby helps extract meaningful insights,
such as identifying common trends, preferences, and patterns among users and movies.
