2020 Dse Bds Assign3

This document provides instructions for an assignment to build a book recommendation engine using collaborative filtering on a GoodReads book rating dataset. Students are asked to analyze the dataset to determine the number of unique users and books as well as the percentage of books rated 3 or less. They then need to tune parameters of the recommendation model to minimize the RMSE and use the model to provide top 5 book recommendations for each user and top 5 user recommendations for each book. The model recommendations for user 1 should also be compared to that user's actual "to read" list to evaluate the model.

Uploaded by

surajpb1989

DSE BIG DATA SYSTEMS ASSIGNMENT 3

Submission Date: 20 May 2020 11.55 PM

Weightage: 10%

You have probably visited GoodReads to check the ratings of books you are interested in, or to look for
an interesting book. If you are deciding what to read next, you are in the right place: you tell
GoodReads which titles or genres you have enjoyed in the past, and it gives you surprisingly
insightful recommendations. Now it is your turn to develop such a recommendation system!

You have been given a GoodReads book rating dataset (link provided in the references section). Using
Spark's MLlib module and other related libraries / modules (additional references provided at the end of
this document), you are to build a recommendation engine.

Collaborative filtering (CF) is a technique used by recommender systems. The two questions most
commonly answered by this technique are:

• For a given user, what are the top recommended products?

• For a given product, what are the recommended users?
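The two questions can be illustrated with a minimal plain-Python sketch. It assumes a table of predicted scores is already available; the `predicted` dictionary, its IDs, and both helper functions are hypothetical and only stand in for the output of a trained model. In Spark's DataFrame-based API, the fitted ALS model offers `recommendForAllUsers` and `recommendForAllItems` for the same purpose.

```python
# Hypothetical predicted scores: (user_id, product_id) -> predicted rating.
# In the assignment these would come from the trained model.
predicted = {
    ("u1", "bookA"): 4.5,
    ("u1", "bookB"): 3.0,
    ("u1", "bookC"): 4.9,
    ("u2", "bookA"): 2.0,
    ("u2", "bookB"): 4.1,
    ("u2", "bookC"): 4.8,
}


def top_products_for_user(scores, user, n):
    """Return the n products with the highest predicted score for `user`."""
    candidates = [(p, s) for (u, p), s in scores.items() if u == user]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [p for p, _ in candidates[:n]]


def top_users_for_product(scores, product, n):
    """Return the n users with the highest predicted score for `product`."""
    candidates = [(u, s) for (u, p), s in scores.items() if p == product]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [u for u, _ in candidates[:n]]


print(top_products_for_user(predicted, "u1", 2))
print(top_users_for_product(predicted, "bookC", 1))
```

Both helpers rank the same score table, once along the user axis and once along the product axis, which is why a single model can answer both questions.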

With the help of the given dataset and the recommendation model you have built, answer the following
questions:

Q1. How many unique users and unique books are there?

Q2. What percentage of books have received a rating of 3 or less?

Q3. After tuning parameters such as rank, maxIter and regParam, what is the best RMSE that you have
obtained?

Q4. Using the recommendation engine based on the best RMSE obtained,

a) What are the top 5 book title recommendations made for each user?
b) What are the top 5 user recommendations made for each book title?

Q5. For user 1, which of the book titles recommended by your model actually appear in the
user's "to read" list? What do you conclude from this?
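The calculations behind Q1, Q2, Q3 and Q5 can be sketched in plain Python on a toy dataset. Everything below is an assumption made for illustration: the `ratings` rows, the helper names, and in particular the reading of Q2 as "share of books with at least one rating of 3 or less" (the question could also be read as the share of individual ratings that are 3 or less). In the actual assignment the same logic would run on Spark DataFrames, with `rank`, `maxIter` and `regParam` passed to `pyspark.ml.recommendation.ALS` and the RMSE obtained from `RegressionEvaluator(metricName="rmse")`.

```python
import math

# Hypothetical (user_id, book_id, rating) rows standing in for the real dataset.
ratings = [
    (1, 10, 5),
    (1, 11, 3),
    (2, 10, 4),
    (2, 12, 2),
    (3, 11, 1),
]

# Q1: number of unique users and unique books.
n_users = len({user for user, _, _ in ratings})
n_books = len({book for _, book, _ in ratings})

# Q2, under one possible reading: books with at least one rating <= 3.
low_rated_books = {book for _, book, rating in ratings if rating <= 3}
pct_low_rated_books = 100.0 * len(low_rated_books) / n_books


def rmse(actual, predicted):
    """Q3 metric: square root of the mean squared prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def overlap(recommended, to_read):
    """Q5: recommended titles that also appear in the to-read list."""
    to_read_set = set(to_read)
    return [title for title in recommended if title in to_read_set]


print(n_users, n_books, pct_low_rated_books)
print(rmse([5, 3], [4, 3]))
print(overlap(["A", "B", "C"], ["C", "A"]))
```

A small or empty overlap in Q5 does not by itself mean the model is poor, since a "to read" list reflects intent rather than ratings; this is the kind of justification the conclusion should spell out.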

Notes:
• This is a take-home assignment to be carried out by each learner group independently.
• This is a programming exercise that requires the given dataset to be used in a Jupyter notebook
or Apache Zeppelin notebook environment.

• You may consult / discuss with other learners peripheral aspects such as the environment, but not
the solution of the specific problems in terms of design or implementation.
• You have to write appropriate Python code in a Jupyter / Zeppelin notebook to support your
answers, and submit it using the following naming convention:
Final document: BDS_Assignment3_<Group_ID>.ipynb / Zeppelin notebook
• Provide appropriate justification when processing the data or arriving at conclusions.
• For any further queries, learners are encouraged to use the discussion forums if the queries are
generic; otherwise they can reach out to me at [email protected].
• Plan your effort carefully, as the deadline announced above cannot be shifted.

References:
1) Collaborative Filtering
2) Apache Spark Collaborative Filtering documentation
3) ALS algorithm
4) Large-scale Parallel Collaborative Filtering for the Netflix Prize
5) GoodReads Dataset
6) Apache Zeppelin
