KNN Recommendation

In [2]: # This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from IPython.display import display, Math

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/dataset-1/u.item
/kaggle/input/dataset-1/u.data
/kaggle/input/dataset-1/u.user

Introduction
I re-created an experiment that explores collaborative filtering techniques, specifically using the
KNNBasic algorithm for recommendation. I loaded a dataset from a plain text file and then
trained the model on it. To assess its performance, I employed cross-validation, evaluating
metrics such as RMSE and MAE.

Throughout the process, I followed several steps, including data loading, model training,
prediction making, and recommendation generation. My algorithm is based on user-based
collaborative filtering with cosine similarity. Additionally, I visualized the model errors and
retrieved nearest neighbors of an item.

In [3]: # Load Surprise libraries


from surprise import KNNBasic
from surprise import Reader
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import cross_validate

In [4]: # Load Plotting libraries


%matplotlib inline
import matplotlib.pyplot as plt

1. Loading data
In [5]: # Path to dataset file
file_path = os.path.expanduser('/kaggle/input/dataset-1/u.data')

In [6]: # Read the data into a Surprise dataset


reader = Reader(line_format = 'user item rating timestamp', sep = '\t', rating_scale = (1, 5))
data = Dataset.load_from_file(file_path, reader = reader)
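
As a quick sanity check (not part of the original run), the raw file can also be previewed with pandas; the column names below are my own, taken from the Reader's line_format, since u.data itself has no header row.

import pandas as pd

# Peek at the tab-separated ratings file; the names list is assumed from the
# Reader's line_format ('user item rating timestamp'), not read from the file.
raw_ratings = pd.read_csv(file_path, sep='\t',
                          names=['user', 'item', 'rating', 'timestamp'])
print(raw_ratings.head())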

In [7]: #!pip install latexify-py

In [8]: import math


import latexify

2. Train the model and measure its error


In [9]: # LaTeX representation of the KNNBasic prediction formula
latex_formula = r"\hat{r}_{ui} = \frac{\sum_{v \in N_i^k(u)} \text{sim}(u, v) \cdot r_{vi}}{\sum_{v \in N_i^k(u)} \text{sim}(u, v)}"

# Display the formula

display(Math(latex_formula))

\hat{r}_{ui} = \frac{\sum_{v \in N_i^k(u)} \mathrm{sim}(u, v) \cdot r_{vi}}{\sum_{v \in N_i^k(u)} \mathrm{sim}(u, v)}
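
To make the formula concrete, here is a minimal sketch (not Surprise's internal code) of the similarity-weighted average that KNNBasic computes for a single (user, item) pair, using made-up neighbour similarities and ratings:

import numpy as np

# Hypothetical data for one prediction: sim(u, v) for the k neighbours of user u
# that rated item i, and those neighbours' ratings r_vi.
sims = np.array([0.9, 0.7, 0.4])
ratings = np.array([5.0, 4.0, 3.0])

# Weighted average: neighbours who are more similar to u count for more.
r_hat = np.sum(sims * ratings) / np.sum(sims)
print(round(r_hat, 2))  # 4.25, pulled towards the most similar neighbours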

In [10]: # Use k-NN algorithm with user-based collaborative filtering and cosine similarity
kk = 50
sim_options = {'name': 'cosine', 'user_based': True}
algo = KNNBasic(k = kk, sim_options = sim_options, verbose = True)

We're setting up a recommendation system using a k-Nearest Neighbors (k-NN) algorithm for
collaborative filtering. Collaborative filtering is a method for making predictions about what a
user might like based on preferences from similar users.

Here's why we've chosen this specific setup:

Firstly, we're using the k-NN algorithm. It's a straightforward but powerful approach where we
find the 'k' nearest neighbors to a target user based on their past ratings. Then, we use these
neighbors' ratings to predict what the target user might like. It's a method that's often used in
recommendation systems because of its simplicity and effectiveness.

Next, we're opting for user-based collaborative filtering. This means we're recommending items
to a user based on the preferences of users who are similar to them. If two users have similar
tastes, they're likely to enjoy similar items. It's intuitive and often yields good results.

For measuring similarity between users, we're using cosine similarity. This metric calculates the
cosine of the angle between two vectors, providing a measure of similarity that's unaffected by
the magnitude of the vectors. It's suitable for recommendation systems because it focuses on
the direction of preferences rather than their magnitude.

The specific parameters we've chosen, like kk = 50 and sim_options, allow us to fine-tune the
algorithm. For example, kk = 50 specifies that we'll consider the 50 nearest neighbors, and
sim_options configures the similarity measure to cosine similarity. These parameters help
balance prediction accuracy and computational efficiency.
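
For intuition, here is a tiny illustration (with made-up rating vectors, not taken from the dataset) of the cosine similarity the algorithm relies on; Surprise computes it over the items two users have co-rated:

import numpy as np

# Ratings by two users on the same three items (hypothetical values).
u = np.array([5.0, 3.0, 4.0])
v = np.array([4.0, 2.0, 5.0])

# Cosine of the angle between the two rating vectors (1.0 = same direction).
cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(round(cos_sim, 3))  # about 0.97, so these users look quite similar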
In [11]: # Run 5-fold cross-validation and print results
cv = cross_validate(algo, data, measures = ['RMSE', 'MAE'], cv = 5, verbose = True)

Computing the cosine similarity matrix...


Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std
RMSE (testset)  1.0222  1.0101  1.0133  1.0190  1.0198  1.0169  0.0045
MAE (testset)   0.8073  0.7983  0.8009  0.8078  0.8068  0.8042  0.0039
Fit time        0.69    0.71    0.66    0.69    0.73    0.70    0.02
Test time       5.95    6.19    6.58    6.17    6.08    6.19    0.21

It's nice to see that the algorithm's performance doesn't vary much across different parts of the
dataset. The standard deviations of RMSE and MAE are small, which tells us it gives consistent
results across all five folds.

The errors themselves are moderate, so the model makes reasonable guesses about what users
might like: the RMSE averages around 1.0169 and the MAE hovers around 0.8042.
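
As a reminder of what these two metrics measure, here is a small hand-computed sketch on made-up (actual, predicted) rating pairs; RMSE squares the errors before averaging, so it penalises large misses more than MAE does:

import numpy as np

# Hypothetical actual vs. predicted ratings for four test interactions.
actual = np.array([4.0, 3.0, 5.0, 2.0])
predicted = np.array([3.5, 3.8, 4.2, 3.1])

rmse = np.sqrt(np.mean((actual - predicted) ** 2))  # root mean squared error
mae = np.mean(np.abs(actual - predicted))           # mean absolute error
print(round(rmse, 3), round(mae, 3))  # 0.828 0.8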

In [12]: # Get data


rmse = cv['test_rmse']
mae = cv['test_mae']
x = np.arange(len(rmse))

# Set up the matplotlib figure


fig, ax = plt.subplots(figsize = (10, 5))
plt.xticks(np.arange(min(x), max(x) + 1, 1.0))
plt.ylim(0.5, 1.3)
ax.plot(x, rmse, marker='o', label="rmse")
ax.plot(x, mae, marker='o', label="mae")

# Chart setup
plt.title("Model Errors", fontsize = 12)
plt.xlabel("CV", fontsize = 10)
plt.ylabel("Error", fontsize = 10)
plt.legend()
plt.show()
3. Make some predictions
In [13]: # Without real rating
p1 = algo.predict(uid = '13', iid = '181', verbose = True)

user: 13         item: 181        r_ui = None   est = 4.04   {'actual_k': 50, 'was_impossible': False}

In [14]: # With real rating


p2 = algo.predict(uid = '196', iid = '302', r_ui = 4, verbose = True)

user: 196        item: 302        r_ui = 4.00   est = 4.02   {'actual_k': 50, 'was_impossible': False}

4. Get the k nearest neighbors of an item


In [15]: import os
import io

In [16]: # Return two mappings to convert raw ids into movie names and movie names into raw ids
def read_item_names(file_path):
    rid_to_name = {}
    name_to_rid = {}

    with io.open(file_path, 'r', encoding = 'ISO-8859-1') as f:
        for line in f:
            line = line.split('|')
            rid_to_name[line[0]] = line[1]
            name_to_rid[line[1]] = line[0]

    return rid_to_name, name_to_rid

# We are using the above function as it allows for easy conversion between raw movie ids and movie names.
In [17]: # Read the mappings raw id <-> movie name
item_filepath = '/kaggle/input/dataset-1/u.item'
rid_to_name, name_to_rid = read_item_names(item_filepath)

In [18]: # Target movie


target_movie = 'Toy Story (1995)'

In [19]: # Retrieve inner id of the movie Toy Story


toy_story_raw_id = name_to_rid[target_movie]
toy_story_inner_id = algo.trainset.to_inner_iid(toy_story_raw_id)
print(target_movie + ':', toy_story_inner_id)

Toy Story (1995): 111

In [20]: # Retrieve inner ids of the nearest neighbors of Toy Story


toy_story_neighbors = algo.get_neighbors(toy_story_inner_id, k = 10)
toy_story_neighbors

Out[20]: [13, 44, 54, 91, 96, 100, 102, 106, 117, 148]

In [21]: # The 10 nearest neighbors of Toy Story are:

print("The movies most similar to '" + target_movie + "' are:")

for inner_id in toy_story_neighbors:
    raw_id = algo.trainset.to_raw_iid(inner_id)
    movie = rid_to_name[raw_id]
    print(raw_id, '-', movie)

The movies most similar to 'Toy Story (1995)' are:


177 - Good, The Bad and The Ugly, The (1966)
434 - Forbidden Planet (1956)
606 - All About Eve (1950)
1052 - Dracula: Dead and Loving It (1995)
194 - Sting, The (1973)
656 - M (1931)
174 - Raiders of the Lost Ark (1981)
685 - Executive Decision (1996)
607 - Rebecca (1940)
16 - French Twist (Gazon maudit) (1995)

5. Get the top-N recommendations


In [22]: # Return the top-N recommendation for each user from a set of predictions.
from collections import defaultdict

def get_top_n(predictions, n = 10):

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n
The get_top_n function generates personalized recommendations from the predictions made by the
recommendation algorithm. It first groups the predictions by user ID, mapping each user to a list
of (item, estimated rating) pairs. It then sorts each user's list by estimated rating in descending
order and keeps only the n highest-rated items, which become that user's recommendations.
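
A tiny illustration of that grouping and sorting logic, using a few hypothetical prediction tuples in the same (uid, iid, true_r, est, details) shape that Surprise returns:

from collections import defaultdict

# Made-up predictions for two users.
fake_predictions = [
    ('196', '242', None, 3.9, {}),
    ('196', '302', None, 4.6, {}),
    ('186', '377', None, 2.8, {}),
]

top = defaultdict(list)
for uid, iid, _, est, _ in fake_predictions:
    top[uid].append((iid, est))                       # group estimates by user
for uid in top:
    top[uid].sort(key=lambda x: x[1], reverse=True)   # best-rated items first

print(dict(top))
# {'196': [('302', 4.6), ('242', 3.9)], '186': [('377', 2.8)]}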

In [23]: # Create train_set and test_set


train_set = data.build_full_trainset()
test_set = train_set.build_anti_testset()

# First train a KNN algorithm on the whole dataset


algo.fit(train_set)
predictions = algo.test(test_set)

# RMSE should be low as we are biased


accuracy.rmse(predictions, verbose = True)

Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 0.9255

Out[23]: 0.9254759689787458

With an RMSE of around 0.9255, the recommendation system appears to be performing reasonably
well: on average, the algorithm's predicted ratings are about 0.9255 units away from the actual
ratings in the dataset. As the comment above notes, this figure is optimistic, since the model
was trained on the full dataset before making these predictions.

In [24]: from collections import defaultdict

In [25]: # Then predict ratings for all pairs (u, i) that are NOT in the training set
top_n = 10
top_pred = get_top_n(predictions, n = top_n)
# User raw Id
uid_list = ['196']

# Print the recommended items for a specific user


for uid, user_ratings in top_pred.items():
    if uid in uid_list:
        for (iid, rating) in user_ratings:
            movie = rid_to_name[iid]
            print('Movie:', iid, '-', movie, ', rating:', str(rating))

Movie: 1189 - Prefontaine (1997) , rating: 5


Movie: 1500 - Santa with Muscles (1996) , rating: 5
Movie: 814 - Great Day in Harlem, A (1994) , rating: 5
Movie: 1536 - Aiqing wansui (1994) , rating: 5
Movie: 1293 - Star Kid (1997) , rating: 5
Movie: 1599 - Someone Else's America (1995) , rating: 5
Movie: 1653 - Entertaining Angels: The Dorothy Day Story (1996) , rating: 5
Movie: 1467 - Saint of Fort Washington, The (1993) , rating: 5
Movie: 1122 - They Made Me a Criminal (1939) , rating: 5
Movie: 1201 - Marlene Dietrich: Shadow and Light (1996) , rating: 5

In [26]: # import networkx as nx


#import matplotlib.pyplot as plt
# Create a new graph
#G = nx.Graph()

# Add nodes for users and items


#for uid, iid, _, _, _ in predictions:
# G.add_node(uid, type='user')
# G.add_node(iid, type='item')

# Add edges representing interactions (ratings) between users and items


#for uid, iid, _, _, _ in predictions:
# G.add_edge(uid, iid)

# Plot the graph


#plt.figure(figsize=(12, 8))
#pos = nx.spring_layout(G, seed=42) # Position nodes using a spring layout algorithm
#nx.draw(G, pos, with_labels=False, node_size=50, node_color='skyblue', edge_color='gr
#plt.title('Interconnected Graph of Testing Dataset')
#plt.show()
