Lecture7.2 After Large

The document discusses recommendation systems and their development. It provides an overview of recommendation systems, including examples like Pandora, YouTube, and Netflix recommendations. It also discusses the two main types of recommendation systems: collaborative filtering, which recommends items to a user based on preferences of similar users, and content-based filtering, which recommends items based on their features and a user's past preferences. The document explains how each approach works at a high level and also discusses challenges of recommendation systems.

Uploaded by

jinho baek

INTRODUCTION TO COMPUTING SCIENCE AND PROGRAMMING


Lecture 7.2: Dictionaries and Files – Continue
Exploration Topic: Recommendation Systems

CMPT 120, Spring 2023, Mohammad Tayebi


Class Agenda

• Last Time
  • Dictionary Operations
  • Dictionary Methods
  • Finding a File on your Disk
  • Reading Files
  • Writing Files

• Today
  • Recommendation Systems
  • Recommendation System Development

• Reading
  • cspy ch. 11 & 12



Exploration Topic

Recommendation Systems
Related Course: CMPT 353

* The next slides are adapted from Subbarao Kambhampati.


Recommendation Systems
• Recommenders are instances of personalization software.
• Personalization concerns adapting to the individual needs, interests, and preferences of each user.
  • Music, book, restaurant, product, content, …
• From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
• Recommenders have been shown to substantially increase sales at online stores.
Netflix Prize

• Task
  • Given users' ratings on some movies
  • Predict user ratings on other movies
  • If John rates
    • “Mission Impossible”: 5
    • “Over the Hedge”: 3, and
    • “Back to the Future”: 4,
  • how would he rate “Harry Potter”, … ?
• Performance
  • Error rate (accuracy)
• Grand Prize
  • $1M
  • 10% improvement over the existing system
  • by 2011 (in 5 years)
• Participants
  • 51K contestants
  • 41K teams
  • 186 countries

Image source: https://www.cnbc.com/
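The Netflix Prize scored submissions by root-mean-square error (RMSE) between predicted and actual ratings. The sketch below shows how such an error could be computed; the function name `rmse` and the predicted values are made up for illustration, reusing John's three ratings from the slide as the "actual" values.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# John's ratings from the slide, and hypothetical predictions from some system.
actual = [5, 3, 4]
predicted = [4, 3, 5]

error = rmse(predicted, actual)
print(round(error, 3))
```

A lower RMSE means the predictions are closer to the real ratings, so "10% improvement" means reducing this error by 10%.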
Recommendation Systems - Examples

• Pandora music recommendation
  • Finding songs similar to a user's favorite songs

• YouTube video recommendation
  • Suggesting videos based on a user's previous activities

• Netflix movie recommendation
  • Offering videos that share characteristics with movies that a user rated highly



Types of Recommendation Systems

• Collaborative recommendation
  • Recommending items to a user based on the preferences of other, similar users

• Content-based recommendation
  • Recommending items based on item features, considering the user's previous actions



Content-Based Recommending
Content/Profile-based
• Recommend items to customer C similar to previous items rated highly by C.
• E.g., in movie recommendation, recommend movies with the same actors, director, genre, and so on.
• Uses machine learning algorithms to induce a profile of the user's preferences based on a featural description of content.

[Figure labels: Red Mars, Foundation, Jurassic Park, Lost World, 2001, Neuromancer, 2010, Difference Engine; Machine Learning; User Profile]

Advantages
• No need for data on other users.
• Able to recommend to users with unique tastes
• Able to recommend new and unpopular items

Challenges
• Finding meaningful features
• Learning users’ tastes as a function of these content features
• Overspecialization by not being able to recommend outside users’ profiles
• Recommendation for new users (cold starts)
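The idea of a content-based profile can be sketched in a few lines. This is only an illustration under stated assumptions: the movie names, features, and ratings are made up, the helper names `build_profile` and `recommend` are hypothetical, and simple feature counting stands in for the machine learning step mentioned on the slide.

```python
# Hypothetical catalogue: each movie is described by a set of content features.
features = {
    "Movie A": {"sci-fi", "action"},
    "Movie B": {"sci-fi", "drama"},
    "Movie C": {"comedy", "romance"},
    "Movie D": {"sci-fi", "action", "comedy"},
}

def build_profile(ratings, features, threshold=4):
    """Count how often each feature appears in the items the user rated highly."""
    profile = {}
    for item, rating in ratings.items():
        if rating >= threshold:
            for f in features[item]:
                profile[f] = profile.get(f, 0) + 1
    return profile

def recommend(ratings, features):
    """Rank unrated items by how strongly their features match the profile."""
    profile = build_profile(ratings, features)
    scores = []
    for item in features:
        if item not in ratings:
            score = sum(profile.get(f, 0) for f in features[item])
            scores.append((score, item))
    scores.sort(reverse=True)
    return scores

user_ratings = {"Movie A": 5, "Movie B": 4}   # made-up ratings for one user
ranked = recommend(user_ratings, features)
print(ranked)
```

Movie C shares no feature with the profile and scores 0, which is the overspecialization challenge in miniature: the system has no basis for recommending outside what the user already likes.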



Collaborative Filtering Recommendation
• Finding a set of users whose interests are “similar” to a given user's interests
  • How to measure users' similarity?
  • Jaccard index, Cosine similarity, …
• Normalize ratings and compute a prediction from a weighted combination of the selected users’ ratings
• Present items with the highest predicted ratings as recommendations

Jaccard index (for sets A and B): J(A, B) = |A ∩ B| / |A ∪ B|
Cosine similarity (for rating vectors x and y): cos(x, y) = (x · y) / (‖x‖ ‖y‖)
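The steps above can be sketched as follows. The slide does not say which normalization or weighting is used, so this sketch assumes one common choice: subtract each neighbour's mean rating, then take a similarity-weighted average. The function names and all ratings are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict(target_mean, neighbours):
    """Predict a rating from (similarity, rating, neighbour_mean) triples.

    Each neighbour's rating is normalized by subtracting that neighbour's
    mean, and the normalized ratings are combined with similarity weights.
    """
    weighted = sum(sim * (rating - mean) for sim, rating, mean in neighbours)
    total = sum(abs(sim) for sim, rating, mean in neighbours)
    return target_mean + weighted / total

# Made-up ratings of three users on the same four items.
alice = [5, 3, 4, 4]
bob = [4, 2, 3, 3]
carol = [1, 5, 2, 1]

sim_bob = cosine(alice, bob)
sim_carol = cosine(alice, carol)
print(sim_bob > sim_carol)   # Bob's tastes are closer to Alice's than Carol's

# Hypothetical neighbours: (similarity, rating of the new item, neighbour's mean).
prediction = predict(4.0, [(1.0, 4, 3.0), (0.5, 2, 2.5)])
print(prediction)
```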



Item-User Matrix

• The input to the collaborative filtering algorithm is a matrix where rows are items and columns are users
• Can think of users as vectors in the space of items (or vice versa)
• Computing similarity between users
  • Finding who are the most similar users
• Computing similarity between items
  • Finding what are the most correlated items

Image source: https://buomsoo-kim.github.io/
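To make the "users as vectors" idea concrete, here is a small sketch of an item-user matrix stored as a list of lists. The users, items, and 0/1 purchase values are made up, and the helper names `item_vector` and `user_vector` are hypothetical.

```python
# Hypothetical 0/1 purchase matrix: rows are items, columns are users.
users = ["u1", "u2", "u3"]
items = ["guitar", "tuner", "strings"]
matrix = [
    [1, 1, 0],   # guitar
    [1, 0, 0],   # tuner
    [1, 1, 1],   # strings
]

def item_vector(matrix, items, item):
    """A row of the matrix: one item described by the users who bought it."""
    return matrix[items.index(item)]

def user_vector(matrix, users, user):
    """A column of the matrix: one user described by the items they bought."""
    col = users.index(user)
    return [row[col] for row in matrix]

print(item_vector(matrix, items, "guitar"))
print(user_vector(matrix, users, "u2"))
```

Comparing two rows gives item-item similarity; comparing two columns gives user-user similarity.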





Recommendation System - Implementation

• We want to develop a collaborative-filtering-based recommendation system.
• We use Amazon users’ reviews of musical instruments. In the table below you can see the data columns of this dataset.
• We can first detect the most similar users to a given user C, and then recommend new items to C based on the list of items purchased by similar users, or
• We can detect the most similar items to every item purchased by user C and then recommend new items to user C.
• We use the Jaccard index to compute the similarity.

Dataset: https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Musical_Instruments_v1_00.tsv.gz

[Table: Amazon Reviews Dataset Columns]
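The code in the following slides implements the item-based option. For comparison, the first option (find users similar to C, then recommend what they bought) might look like the sketch below. The purchase data is made up, and the names `jaccard_sets` and `recommend_from_similar_users` are hypothetical.

```python
def jaccard_sets(s1, s2):
    """Jaccard index of two sets: size of intersection over size of union."""
    return len(s1 & s2) / len(s1 | s2)

def recommend_from_similar_users(items_per_user, target, k=2):
    """Recommend items bought by the k users most similar to target."""
    target_items = items_per_user[target]
    scored = []
    for user, bought in items_per_user.items():
        if user != target:
            scored.append((jaccard_sets(target_items, bought), user))
    scored.sort(reverse=True)
    recommendations = set()
    for similarity, user in scored[:k]:
        if similarity > 0:
            # Keep only items the target user does not already have.
            recommendations |= items_per_user[user] - target_items
    return sorted(recommendations)

# Made-up purchase histories (user -> set of product names).
items_per_user = {
    "C": {"guitar", "strings"},
    "u1": {"guitar", "strings", "tuner"},
    "u2": {"guitar", "capo"},
    "u3": {"drum sticks"},
}
recs = recommend_from_similar_users(items_per_user, "C")
print(recs)
```

User u3 shares nothing with C, so nothing from u3's purchases is recommended.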



Recommendation System Functions
• def read_dataset()
  • Reading dataset from file
• def data_preparation()
  • Preparing data for processing
• def similar_items()
  • Finding most similar items to a given item
• def jaccard()
  • Computing Jaccard index for two given lists
• def main()
  • Main function to initiate the program

Image source: https://medium.com/web-mining-is688-spring-2021



The file is opened with UTF-8 encoding to handle non-ASCII characters in the file.

def read_dataset(path):
    # Read a tab-separated file into a list of dictionaries, one per row.
    dataset = []
    with open(path, 'r', encoding='utf-8') as myfile:
        # The first line holds the column names.
        header = myfile.readline().strip().split('\t')
        for line in myfile:
            fields = line.strip().split('\t')
            # Pair each column name with the value in the same position.
            dict_fields = dict(zip(header, fields))
            dataset.append(dict_fields)
    return dataset

The zip() function in Python is used to iterate over two lists in parallel. It takes corresponding elements from the two lists and combines each pair into a tuple.

countries = ["UK", "Italy", "Canada"]
capitals = ["London", "Rome", "Ottawa"]
country_capital = zip(countries, capitals)
print(list(country_capital))  # [('UK', 'London'), ('Italy', 'Rome'), ('Canada', 'Ottawa')]
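The dataset linked earlier is distributed as a compressed .tsv.gz file, while read_dataset expects plain text. One way to read the compressed file directly is the standard gzip module. The sketch below is an assumption about how that could be done, not part of the original program: the name `read_dataset_gz` is hypothetical, and a tiny made-up sample file is written first so the example runs on its own.

```python
import gzip
import os
import tempfile

def read_dataset_gz(path):
    """Read a gzip-compressed, tab-separated file into a list of dictionaries."""
    dataset = []
    # Mode 'rt' opens the compressed file as text, so lines are str, not bytes.
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        header = f.readline().strip().split('\t')
        for line in f:
            fields = line.strip().split('\t')
            dataset.append(dict(zip(header, fields)))
    return dataset

# Build a tiny made-up sample file so the sketch can be run on its own.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.tsv.gz')
with gzip.open(sample_path, 'wt', encoding='utf-8') as f:
    f.write('customer_id\tproduct_id\n')
    f.write('101\tB001\n')
    f.write('102\tB002\n')

rows = read_dataset_gz(sample_path)
print(rows[0])
```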
def data_preparation(dataset):
    # Map each product ID to the list of customers who reviewed it.
    users_per_item = {}
    for d in dataset:
        user = d['customer_id']
        item = d['product_id']
        if item not in users_per_item:
            # First review seen for this item: start a new list.
            users_per_item[item] = [user]
        else:
            users_per_item[item].append(user)
    return users_per_item
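To see what data_preparation produces, here is a sketch that runs it on three made-up rows. It uses dict.setdefault as a compact equivalent of the if/else in the slide; the result should be the same dictionary either way.

```python
def data_preparation(dataset):
    """Map each product_id to the list of customer_ids that reviewed it."""
    users_per_item = {}
    for d in dataset:
        # setdefault returns the existing list, or stores and returns a new one.
        users_per_item.setdefault(d['product_id'], []).append(d['customer_id'])
    return users_per_item

# Made-up rows in the shape produced by read_dataset.
dataset = [
    {'customer_id': '101', 'product_id': 'B001'},
    {'customer_id': '102', 'product_id': 'B001'},
    {'customer_id': '101', 'product_id': 'B002'},
]
users_per_item = data_preparation(dataset)
print(users_per_item)
```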



def similar_items(upi, itemid):
    # Return the 10 items most similar to itemid as (similarity, item) tuples.
    similarities = []
    users = upi[itemid]
    for i in upi:
        if i == itemid:
            continue  # skip the query item itself
        similarity = jaccard(users, upi[i])
        similarities.append((similarity, i))
    # Sorting the list in descending order.
    similarities.sort(reverse=True)
    return similarities[:10]
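The sorting step in similar_items works because Python compares tuples element by element. A small sketch with made-up (similarity, item) pairs shows the effect:

```python
# Made-up (similarity, item) pairs, as built inside similar_items.
similarities = [(0.2, 'B002'), (0.5, 'B003'), (0.2, 'B004'), (0.0, 'B005')]

# Tuples compare element by element: similarity first, then item ID on ties.
similarities.sort(reverse=True)
top2 = similarities[:2]

print(similarities)
print(top2)
```

Putting the similarity first in each tuple is what makes a plain sort rank by similarity; with (item, similarity) the list would be ordered by item ID instead.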



• The values in a set are unique: a set cannot contain duplicates.
• When we convert a list to a set, all duplicates are removed.
• The intersection() method returns a set that contains the items common to both sets.
• The union() method returns a set that contains all items from both of the sets.

def jaccard(k1, k2):
    # Jaccard index: shared unique values divided by all unique values.
    s1 = set(k1)
    s2 = set(k2)
    numerator = len(s1.intersection(s2))
    denominator = len(s1.union(s2))
    return numerator / denominator
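A worked example may help check the arithmetic. The sketch below repeats the function with one added assumption that is not in the slide: if both lists are empty, the union is empty too, and the similarity is defined as 0 to avoid dividing by zero. The user IDs are made up.

```python
def jaccard(k1, k2):
    """Jaccard index of two lists, treating each as a set of unique values."""
    s1 = set(k1)
    s2 = set(k2)
    union = s1.union(s2)
    if not union:
        # Assumption: define the similarity of two empty lists as 0.
        return 0.0
    return len(s1.intersection(s2)) / len(union)

a = ['u1', 'u2', 'u3', 'u3']   # the duplicate 'u3' is counted once
b = ['u2', 'u3', 'u4']

# Shared users: u2, u3 (2). All users: u1, u2, u3, u4 (4). Index: 2 / 4.
score = jaccard(a, b)
print(score)
```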



def main():
    path = 'music.txt'
    data = read_dataset(path)
    item_user = data_preparation(data)
    # Use the product from the third review as the query item.
    query = data[2]['product_id']
    sim = similar_items(item_user, query)
    print(sim)

main()

Output:

[(0.028446389496717725, 'B00006I5SD'), (0.01694915254237288, 'B00006I5SB'),
(0.015065913370998116, 'B000AJR482'), (0.014204545454545454, 'B00E7MVP3S'),
(0.008955223880597015, 'B001255YL2'), (0.008849557522123894, 'B003EIRVO8'),
(0.008333333333333333, 'B0015VEZ22'), (0.00821917808219178, 'B00006I5UH'),
(0.008021390374331552, 'B00008BWM7'), (0.007656967840735069, 'B000H2BC4E')]
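The program above finds items similar to one query item. To turn that into recommendations for a user C, as described in the implementation plan, one could repeat the search for every item C already has. The sketch below is one possible way to do this, not the lecture's own code: `recommend_for_user` is a hypothetical name and the data is made up.

```python
def jaccard(k1, k2):
    """Jaccard index of two non-empty lists, treated as sets."""
    s1, s2 = set(k1), set(k2)
    return len(s1 & s2) / len(s1 | s2)

def recommend_for_user(users_per_item, user, k=3):
    """Recommend up to k items most similar to the items the user already has."""
    owned = [item for item, users in users_per_item.items() if user in users]
    best = {}  # candidate item -> highest similarity to any owned item
    for item in owned:
        for other, users in users_per_item.items():
            if other in owned:
                continue
            sim = jaccard(users_per_item[item], users)
            if sim > best.get(other, 0):
                best[other] = sim
    ranked = sorted(((sim, item) for item, sim in best.items()), reverse=True)
    return ranked[:k]

# Made-up data in the shape returned by data_preparation (item -> users).
users_per_item = {
    'guitar': ['C', 'u1', 'u2'],
    'strings': ['C', 'u1'],
    'tuner': ['u1', 'u2'],
    'capo': ['u2'],
    'drum sticks': ['u3'],
}
recs = recommend_for_user(users_per_item, 'C')
print(recs)
```

Items with zero similarity to everything C owns (here, the drum sticks) are never added to the candidates, so they are not recommended.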



Next Lecture

Midterm Exam Preparation


Pre-reading: cspy ch. 1-12
