2020 Dse Bds Assign3

This document provides instructions for an assignment to build a book recommendation engine using collaborative filtering on a GoodReads book rating dataset. Students are asked to analyze the dataset to determine the number of unique users and books as well as the percentage of books rated 3 or less. They then need to tune parameters of the recommendation model to minimize the RMSE and use the model to provide top 5 book recommendations for each user and top 5 user recommendations for each book. The model recommendations for user 1 should also be compared to that user's actual "to read" list to evaluate the model.

Uploaded by

surajpb1989

DSE BIG DATA SYSTEMS ASSIGNMENT 3

Submission Date: 20 May 2020 11.55 PM

Weightage: 10%

You have probably visited GoodReads to check the ratings of books you are interested in, or to find
an interesting book to read next. You tell GoodReads which titles or genres you have enjoyed in the
past, and it gives you surprisingly insightful recommendations. Now it is your turn to develop such a
recommendation system!

You have been given a GoodReads book rating dataset (link provided in the references section). Using
Spark's MLlib module and other related libraries / modules (additional references provided at the end of
this document), you are to build a recommendation engine.

Collaborative filtering (CF) is a technique used by recommender systems. The two questions it most
commonly answers are:

• For a given user, what are the top recommended products?
• For a given product, what are the recommended users?
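
The two questions above can be illustrated with a minimal sketch in plain Python. It assumes that some
model has already produced a predicted score for every (user, book) pair; the `predicted` dictionary, its
values, and the two helper function names are hypothetical and only serve the illustration.

```python
from heapq import nlargest

# Hypothetical predicted scores: (user_id, book_id) -> predicted rating.
predicted = {
    (1, 10): 4.6, (1, 11): 3.1, (1, 12): 4.9,
    (2, 10): 2.8, (2, 11): 4.4, (2, 12): 3.7,
    (3, 10): 4.1, (3, 11): 4.0, (3, 12): 2.2,
}

def top_books_for_user(user_id, n):
    """Return the n book ids with the highest predicted score for user_id."""
    scores = {b: s for (u, b), s in predicted.items() if u == user_id}
    return nlargest(n, scores, key=scores.get)

def top_users_for_book(book_id, n):
    """Return the n user ids with the highest predicted score for book_id."""
    scores = {u: s for (u, b), s in predicted.items() if b == book_id}
    return nlargest(n, scores, key=scores.get)

print(top_books_for_user(1, 2))   # [12, 10]
print(top_users_for_book(11, 2))  # [2, 3]
```

In Spark, the fitted `ALSModel` from `pyspark.ml.recommendation` offers `recommendForAllUsers` and
`recommendForAllItems` for the same two views, so a helper like this would not be needed there.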

With the help of the given dataset and the recommendation model you have built, answer the following
questions:

Q1. How many unique users and unique books are there in the dataset?

Q2. What percentage of books have received a rating of 3 or less?

Q3. After tuning parameters such as rank, maxIter and regParam, what is the best RMSE that you
obtained?

Q4. Using the recommendation engine based on the best RMSE obtained,

a) What are the top 5 book titles recommended for each user?
b) What are the top 5 users recommended for each book title?

Q5. For user 1, which of the book titles recommended by your model actually appear in the user's
"to read" list? What do you conclude from this?
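
As a rough orientation for the quantities asked for above, the following plain-Python sketch shows how
they are defined on a tiny sample. It is not a solution: the `ratings` rows are made up, the helper names
`rmse` and `overlap` are hypothetical, and Q2 is computed under one possible reading only (a book counts
if it has at least one rating of 3 or less). In the actual assignment these values would come from Spark,
where `ALS` in `pyspark.ml.recommendation` takes `rank`, `maxIter` and `regParam`, and
`RegressionEvaluator` can report RMSE.

```python
import math

# Hypothetical sample of (user_id, book_id, rating) rows, for illustration only.
ratings = [
    (1, 10, 5), (1, 11, 3), (2, 10, 4),
    (2, 12, 2), (3, 11, 4), (3, 12, 1),
]

# Q1: count distinct users and distinct books.
n_users = len({user for user, _, _ in ratings})
n_books = len({book for _, book, _ in ratings})

# Q2, under one possible reading: the share of books that received
# at least one rating of 3 or less.
low_rated = {book for _, book, rating in ratings if rating <= 3}
pct_low = 100.0 * len(low_rated) / n_books

def rmse(actual, predicted):
    """Q3: root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def overlap(recommended, to_read):
    """Q5: recommended titles that also appear in the user's to-read list."""
    wanted = set(to_read)
    return [title for title in recommended if title in wanted]

print(n_users, n_books)                      # 3 3
print(round(pct_low, 1))                     # 66.7
print(rmse([3, 5], [1, 3]))                  # 2.0
print(overlap(["A", "B", "C"], ["B", "D"]))  # ['B']
```

A lower RMSE on held-out ratings means the predicted ratings sit closer to the actual ones, which is why
Q3 asks for the best (lowest) value found while tuning.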

Notes:
• This is a take-home assignment to be carried out by each learner group independently.
• This is a programming exercise, to be carried out on the given dataset in a Jupyter notebook /
Apache Zeppelin notebook environment.
• You may consult / discuss peripheral aspects such as the environment with other learners, but not
the solution of the specific problems in terms of design or implementation.
• You have to write the appropriate Python code in a Jupyter / Zeppelin notebook to support your
answers and submit it with the following nomenclature:
Final document - BDS_Assignment3_<Group_ID>.ipynb / Zeppelin notebook
• Provide appropriate justification when processing the data or arriving at conclusions.
• For any further queries: if they are generic ones, learners are encouraged to use the discussion
forums; otherwise they can reach out to me at [email protected].
• Manage your efforts properly, as there is no scope to shift the deadline announced above.

References:
1) Collaborative Filtering
2) Apache Spark Collaborative Filtering documentation
3) ALS algorithm
4) Large-scale Parallel Collaborative Filtering for the Netflix Prize
5) GoodReads Dataset
6) Apache Zeppelin