
PROJECT REPORT

Movie recommendation system in R

VAC (Data Science using R)

GROUP PROJECT

BACHELOR OF ENGINEERING
IN COMPUTER SCIENCE ENGINEERING

SUBMITTED BY : Rahul Chauhan-23BCS12035

Partner- Anustubh Mishra-23BCS13106

SUPERVISED BY : Rashmi Dhawan

My VAC Section : 3 Semester – 3rd

My Section – 708-B

Movie Recommendation System in R

Preface
The R Markdown code used to generate this report and the PDF version are available on GitHub.
The HTML version is available on RPubs.

1 Introduction
Recommendation systems play an important role in e-commerce and online streaming services,
such as Netflix, YouTube and Amazon. Making the right recommendation for the next product,
song or movie increases user retention and satisfaction, leading to sales and profit growth.
Companies competing for customer loyalty invest in systems that capture and analyse the
user’s preferences, and offer products or services with a higher likelihood of purchase.
The economic impact of such a company-customer relationship is clear: Amazon is the largest
online retail company by sales and part of its success comes from the recommendation system
and marketing based on user preferences. In 2006 Netflix offered a one million dollar prize [2] for
the person or group that could improve their recommendation system by at least 10%.
Recommendation systems are usually based on a rating scale from 1 to 5 grades or stars, with 1
indicating the lowest satisfaction and 5 the highest. Other indicators can also be used as
predictors, such as comments posted on previously used items; videos, music or links shared with
friends; percentage of a movie watched or music listened to; web pages visited and time spent on
each page; product category; and any other interaction with the company’s web site or
application.
The primary goal of recommendation systems is to help users find what they want based on their
preferences and previous interactions, and to predict the rating for a new item. In this document,
we create a movie recommendation system using the MovieLens dataset and applying the
lessons learned during HarvardX’s Data Science Professional Certificate [3] program.
This document is structured as follows. Chapter 1 describes the dataset and summarizes the
goal of the project and key steps that were performed. In chapter 2 we explain the process and
techniques used, such as data cleaning, data exploration and visualization, any insights gained,
and the modeling approach. In chapter 3 we present the modeling results and discuss the model
performance. We conclude in chapter 4 with a brief summary of the report, its limitations and
future work.

1.1 MovieLens Dataset

GroupLens [4] is a research lab at the University of Minnesota that has collected and made
available rating data for movies through the MovieLens web site [5].
The complete MovieLens dataset [6] consists of 27 million ratings of 58,000 movies by 280,000
users. The research presented in this paper is based on a subset [7] of this dataset with 10 million
ratings on 10,000 movies by 72,000 users.

1.2 Model Evaluation
The evaluation of machine learning algorithms consists of comparing the predicted value with the
actual outcome. The loss function measures the difference between the two values. Other
metrics exist, but they go beyond the scope of this document.
The most common loss functions in machine learning are the mean absolute error (MAE), mean
squared error (MSE) and root mean squared error (RMSE).
Regardless of the loss function, when the user consistently selects the predicted movie, the error
is equal to zero and the algorithm is perfect.
The goal of this project is to create a recommendation system with RMSE lower than 0.8649.
There’s no specific target for MSE and MAE.

1.2.1 Mean Absolute Error - MAE

The mean absolute error is the average of absolute differences between the predicted value and
the true value. The metric is linear, which means that all errors are equally weighted. Thus, when
predicting ratings on a scale of 1 to 5, the MAE assumes that an error of 2 is twice as bad as an
error of 1.
The MAE is given by this formula:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|$$
where $N$ is the number of observations, $\hat{y}_i$ is the predicted value and $y_i$ is the true value.

1.2.2 Mean Squared Error - MSE

The MSE is the average squared error of the predictions. Larger errors are weighted more than
smaller ones: if an error doubles, its squared contribution increases four times. On the other hand,
errors smaller than 1 shrink when squared; for example, an error of 0.5 contributes a squared error
of 0.25.

$$\mathrm{MSE} = \frac{1}{N}\sum_{u,i}\left(\hat{y}_{u,i} - y_{u,i}\right)^2$$

1.2.3 Root Mean Squared Error - RMSE

The Root Mean Squared Error, RMSE, is the square root of the MSE. It is the typical metric to
evaluate recommendation systems, and is defined by the formula:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{u,i}\left(\hat{y}_{u,i} - y_{u,i}\right)^2}$$
where $N$ is the number of ratings, $y_{u,i}$ is the rating of movie $i$ by user $u$ and $\hat{y}_{u,i}$ is the
prediction for movie $i$ by user $u$.
Similar to the MSE, the RMSE penalizes large errors more heavily and is appropriate in cases
where small errors are less relevant. Unlike the MSE, the error has the same unit as the
measurement.
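
To make the three definitions concrete, the short sketch below evaluates them by hand on four made-up ratings. The vectors `true_ratings` and `predicted_ratings` are hypothetical values chosen for easy arithmetic, not data from MovieLens.

```r
# Hypothetical observed ratings and predictions (illustration only)
true_ratings      <- c(4, 3, 5, 2)
predicted_ratings <- c(3.5, 3, 4, 3)

errors <- predicted_ratings - true_ratings   # -0.5, 0, -1, 1

mae  <- mean(abs(errors))   # (0.5 + 0 + 1 + 1) / 4 = 0.625
mse  <- mean(errors^2)      # (0.25 + 0 + 1 + 1) / 4 = 0.5625
rmse <- sqrt(mse)           # 0.75

c(MAE = mae, MSE = mse, RMSE = rmse)
```

The two errors of size 1 dominate the MSE, while the error of 0.5 contributes only 0.25; this is the weighting behaviour described above.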

1.3 Process and Workflow

The main steps in a data science project include:

1. Data preparation: download, parse, import and prepare the data to be processed and
analysed.
2. Data exploration and visualization: explore data to understand the features and the
relationship between the features and the outcome.
3. Data cleaning: the dataset may contain unnecessary information that needs to be
removed.
4. Data analysis and modeling: create the model using the insights gained during
exploration. Also test and validate the model.
5. Communicate: create the report and publish the results.

First we download the dataset from the MovieLens website and split it into two subsets used for
training and validation. The training subset is called edx and the validation subset is
called validation. The edx set is split again into two subsets used for training and testing.
When the model reaches the RMSE target in the testing set, we train the model on the entire edx
set and use the validation set for final validation. We pretend the validation set is new data with
unknown outcomes.
In the next step we create charts, tables and summary statistics to understand how the features
can impact the outcome. The information and insights obtained during exploration help to
build the machine learning model.
Creating a recommendation system involves identifying the most important features that
help to predict the rating any given user will give to any movie. We start by building a very simple
model, which is just the mean of the observed values. Then, the user and movie effects are
included in the linear model, improving the RMSE. Finally, the user and movie effects receive
a regularization parameter that penalizes estimates based on few ratings.
Although the linear model with regularization achieves the desired RMSE, matrix factorization
using the LIBMF [8] algorithm is also evaluated and provides a better prediction. LIBMF is available
through the R package recosystem [9].

2 Methods and Analysis

2.1 Data Preparation
In this section we download and prepare the dataset to be used in the analysis. We split the
dataset into two parts, the training set called edx and the evaluation set called validation, with
90% and 10% of the original dataset respectively.
Then, we split the edx set into two parts, the train set and the test set, with 90% and 10% of the
edx set respectively. The model is created and trained on the train set and tested on the test set
until the RMSE target is achieved; finally we train the model again on the entire edx set and
validate it on the validation set. This approach is a simple hold-out form of cross-validation.

if(!require(tidyverse))
  install.packages("tidyverse", repos = "http://cran.us.r-project.org")
if(!require(caret))
  install.packages("caret", repos = "http://cran.us.r-project.org")
if(!require(data.table))
  install.packages("data.table", repos = "http://cran.us.r-project.org")

# Attach the packages (require() does not attach a package it has just installed)
library(tidyverse)
library(caret)
library(data.table)

# MovieLens 10M dataset:
# https://grouplens.org/datasets/movielens/10m/
# http://files.grouplens.org/datasets/movielens/ml-10m.zip

dl <- tempfile()
download.file("http://files.grouplens.org/datasets/movielens/ml-10m.zip", dl)

ratings <- fread(text = gsub("::", "\t",
                             readLines(unzip(dl, "ml-10M100K/ratings.dat"))),
                 col.names = c("userId", "movieId", "rating", "timestamp"))

movies <- str_split_fixed(readLines(unzip(dl, "ml-10M100K/movies.dat")), "\\::", 3)
colnames(movies) <- c("movieId", "title", "genres")
# as.character() first, so the conversion works whether movieId is a factor
# (R < 4.0) or a character column (R >= 4.0)
movies <- as.data.frame(movies) %>%
  mutate(movieId = as.numeric(as.character(movieId)),
         title = as.character(title),
         genres = as.character(genres))

movielens <- left_join(ratings, movies, by = "movieId")

# 'Validation' set will be 10% of MovieLens data
set.seed(1, sample.kind="Rounding")
test_index <- createDataPartition(y = movielens$rating,
                                  times = 1, p = 0.1, list = FALSE)
edx <- movielens[-test_index,]
temp <- movielens[test_index,]

# Make sure userId and movieId in 'validation' set are also in 'edx' set
validation <- temp %>%
  semi_join(edx, by = "movieId") %>%
  semi_join(edx, by = "userId")

# Add rows removed from 'validation' set back into 'edx' set
removed <- anti_join(temp, validation)
edx <- rbind(edx, removed)

rm(dl, ratings, movies, test_index, temp, movielens, removed)

The edx set is used for training and testing, and the validation set is used for final validation to
simulate the new data.
Here, we split the edx set in 2 parts: the training set and the test set.
The model building is done in the training set, and the test set is used to test the model. When
the model is complete, we use the validation set to calculate the final RMSE.
We use the same procedure used to create edx and validation sets.
The training set will be 90% of edx data and the test set will be the remaining 10%.

set.seed(1, sample.kind="Rounding")
test_index <- createDataPartition(y = edx$rating, times = 1, p = 0.1, list = FALSE)
train_set <- edx[-test_index,]
temp <- edx[test_index,]

# Make sure userId and movieId in test set are also in train set
test_set <- temp %>%
  semi_join(train_set, by = "movieId") %>%
  semi_join(train_set, by = "userId")

# Add rows removed from test set back into train set
removed <- anti_join(temp, test_set)
train_set <- rbind(train_set, removed)

rm(test_index, temp, removed)
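
The semi_join/anti_join step can be hard to follow on nine million rows, so here is a minimal sketch of the same idea on toy data. The tibbles `train_toy` and `temp_toy` and their values are hypothetical; only the dplyr verbs match the code above.

```r
library(dplyr)

# Toy training and hold-out sets (made-up values, for illustration only)
train_toy <- tibble(userId = c(1, 1, 2), movieId = c(10, 20, 10), rating = c(4, 3, 5))
temp_toy  <- tibble(userId = c(1, 2, 3), movieId = c(20, 30, 10), rating = c(2, 4, 1))

# Keep only hold-out rows whose movie AND user both appear in the training set
test_toy <- temp_toy %>%
  semi_join(train_toy, by = "movieId") %>%
  semi_join(train_toy, by = "userId")

# Rows dropped from the hold-out set are returned to the training set
removed_toy <- anti_join(temp_toy, test_toy, by = c("userId", "movieId", "rating"))
train_toy <- bind_rows(train_toy, removed_toy)

nrow(test_toy)    # rows that can be predicted from the training data
nrow(train_toy)   # original training rows plus the rows moved back
```

Here movie 30 and user 3 never occur in the training data, so those two rows cannot be predicted by a model built from user and movie effects and are moved back into the training set instead of being discarded.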

2.2 Data Exploration

Before we start building the model, we need to understand the structure of the data, the distribution
of ratings and the relationship between the predictors. This information will help build a better model.

str(edx)
## 'data.frame': 9000055 obs. of 6 variables:
##  $ userId   : int 1 1 1 1 1 1 1 1 1 1 ...
##  $ movieId  : num 122 185 292 316 329 355 356 362 364 370 ...
##  $ rating   : num 5 5 5 5 5 5 5 5 5 5 ...
##  $ timestamp: int 838985046 838983525 838983421 838983392 838983392 838984474 838983653 838984885 838983707 838984596 ...
##  $ title    : chr "Boomerang (1992)" "Net, The (1995)" "Outbreak (1995)" "Stargate (1994)" ...
##  $ genres   : chr "Comedy|Romance" "Action|Crime|Thriller" "Action|Drama|Sci-Fi|Thriller" "Action|Adventure|Sci-Fi" ...

From this initial exploration, we discover that edx has 6 columns:

userId     integer
movieId    numeric
rating     numeric
timestamp  integer
title      character
genres     character

How many rows and columns are there in the edx dataset?

dim(edx)
## [1] 9000055 6

The next table shows the structure and content of the edx dataset.
The dataset is in tidy format, i.e. each row has one observation and the column names are the
features. The rating column is the desired outcome. The user information is stored in userId;
the movie information is in both the movieId and title columns. The rating date is available
in timestamp, measured in seconds since January 1st, 1970. Each movie is tagged with one or
more genres in the genres column.

head(edx)

  userId movieId rating timestamp title                          genres
1 1      122     5      838985046 Boomerang (1992)               Comedy|Romance
2 1      185     5      838983525 Net, The (1995)                Action|Crime|Thriller
4 1      292     5      838983421 Outbreak (1995)                Action|Drama|Sci-Fi|Thriller
5 1      316     5      838983392 Stargate (1994)                Action|Adventure|Sci-Fi
6 1      329     5      838983392 Star Trek: Generations (1994)  Action|Adventure|Drama|Sci-Fi
7 1      355     5      838984474 Flintstones, The (1994)        Children|Comedy|Fantasy
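
Since timestamp is stored as seconds since January 1st, 1970, it has to be converted before it can be read as a date. The sketch below does this in base R for the first timestamp in the rows shown above; the variable names `ts` and `rated_at` are illustrative, and UTC is assumed as the time zone.

```r
# First timestamp in the edx rows shown above
ts <- 838985046

# Seconds since 1970-01-01 00:00:00 (UTC assumed) converted to a date-time
rated_at <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")

format(rated_at, "%Y-%m-%d %H:%M:%S")
```

The lubridate function as_datetime(), used later in this report, performs the same conversion.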

The next sections explore each feature and the outcome in more detail.

2.2.1 Genres
Along with the movie title, MovieLens provides the list of genres for each movie. Although this
information can be used to make better predictions, this research doesn’t use it. However it’s
worth exploring this information as well.
The data set contains 797 different combinations of genres. Here is the list of the first six.

edx %>% group_by(genres) %>%
  summarise(n=n()) %>%
  head()

Genres
(no genres listed)
Action
Action|Adventure
Action|Adventure|Animation|Children|Comedy

The table above shows that several movies are classified in more than one genre. The number of
genres in each movie is listed in this table, sorted in descending order.

tibble(count = str_count(edx$genres, fixed("|")), genres = edx$genres) %>%
  group_by(count, genres) %>%
  summarise(n = n()) %>%
  arrange(-count) %>%
  head()

count genres
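
Note that the count column above counts the "|" separators, so the number of genres is count + 1. A minimal base R sketch on three hypothetical genre strings (the vector `genres_toy` is made up, in the same format as edx$genres) shows the relationship:

```r
# Hypothetical genre strings in the same pipe-separated format as edx$genres
genres_toy <- c("Comedy|Romance", "Action|Crime|Thriller", "Drama")

# Number of "|" separators in each string
n_sep <- lengths(regmatches(genres_toy, gregexpr("|", genres_toy, fixed = TRUE)))

# Number of genres is the number of separators plus one
n_genres <- n_sep + 1

data.frame(genres = genres_toy, separators = n_sep, n_genres = n_genres)
```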

2.2.2 Date
The ratings were collected over a period of almost 14 years.

library(lubridate)
tibble(`Initial Date` = date(as_datetime(min(edx$timestamp), origin="1970-01-01")),
       `Final Date` = date(as_datetime(max(edx$timestamp), origin="1970-01-01"))) %>%
  mutate(Period = duration(max(edx$timestamp)-min(edx$timestamp)))

Initial Date  Final Date  Period
1995-01-09    2009-01-05  441479727s

if(!require(ggthemes))
  install.packages("ggthemes", repos = "http://cran.us.r-project.org")
if(!require(scales))
  install.packages("scales", repos = "http://cran.us.r-project.org")

edx %>% mutate(year = year(as_datetime(timestamp, origin="1970-01-01"))) %>%
  ggplot(aes(x=year)) +
  geom_histogram(color = "white") +
  ggtitle("Rating Distribution Per Year") +
  xlab("Year") +
  ylab("Number of Ratings") +
  scale_y_continuous(labels = comma) +
  theme_economist()
Rating distribution per year.
The following table lists the day and movie combinations with the most ratings. Not surprisingly,
the movies are well-known blockbusters.

edx %>% mutate(date = date(as_datetime(timestamp, origin="1970-01-01"))) %>%
  group_by(date, title) %>%
  summarise(count = n()) %>%
  arrange(-count) %>%
  head(10)

Date        title
1998-05-22  Chasing Amy (1997)
2000-11-20  American Beauty (1999)
1999-12-11  Star Wars: Episode IV - A New Hope (a.k.a. Star Wars) (1977)
1999-12-11  Star Wars: Episode V - The Empire Strikes Back (1980)
1999-12-11  Star Wars: Episode VI - Return of the Jedi (1983)
2005-03-22  Lord of the Rings: The Two Towers, The (2002)
2005-03-22  Lord of the Rings: The Fellowship of the Ring, The (2001)
2000-11-20  Terminator 2: Judgment Day (1991)
1999-12-11  Matrix, The (1999)
2000-11-20  Jurassic Park (1993)

2.2.3 Ratings
Users can choose a rating from 0.5 to 5.0 in steps of 0.5, totaling 10 possible values. This
is an unusual scale, and most ratings are whole numbers, as shown in the chart below.
Count the number of ratings for each value:

edx %>% group_by(rating) %>% summarize(n=n())

rating
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0

How many ratings are in edx?

edx %>% group_by(rating) %>%
  summarise(count=n()) %>%
  ggplot(aes(x=rating, y=count)) +
  geom_line() +
  geom_point() +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  ggtitle("Rating Distribution", subtitle = "Higher ratings are prevalent.") +
  xlab("Rating") +
  ylab("Count") +
  theme_economist()
Round values receive more ratings than decimals.

2.2.4 Movies
There are 10677 different movies in the edx set. We know from intuition that some of them are
rated more than others, since many movies are watched by few users and blockbusters tend to
have more ratings.

edx %>% group_by(movieId) %>%
  summarise(n=n()) %>%
  ggplot(aes(n)) +
  geom_histogram(color = "white") +
  scale_x_log10() +
  ggtitle("Distribution of Movies",
          subtitle = "The distribution is almost symmetric.") +
  xlab("Number of Ratings") +
  ylab("Number of Movies") +
  theme_economist()
10.7% of movies have fewer than 10 ratings.

2.2.5 Users
There are 69878 different users in the edx set.
The majority of users rate few movies, while a few users rate more than a thousand movies.
5% of users rated fewer than 20 movies.

edx %>% group_by(userId) %>%
  summarise(n=n()) %>%
  arrange(n) %>%
  head()

userId  n
62516   10
22170   12
15719   13
50608   13
901     14
1833    14

The user distribution is right skewed.

edx %>% group_by(userId) %>%

summarise(n=n()) %>%
ggplot(aes(n)) +
geom_histogram(color = "white") +
scale_x_log10() +
ggtitle("Distribution of Users",
subtitle="The distribution is right skewed.") +
xlab("Number of Ratings") +
ylab("Number of Users") +
scale_y_continuous(labels = comma) +
theme_economist()

5% of users rated fewer than 20 movies.

Show the heatmap of users x movies.
This user-movie matrix is sparse, with the vast majority of cells empty. Notice that four movies
have more ratings, and one or two users are more active.

users <- sample(unique(edx$userId), 100)

edx %>% filter(userId %in% users) %>%
  select(userId, movieId, rating) %>%
  mutate(rating = 1) %>%
  spread(movieId, rating) %>%
  select(sample(ncol(.), 100)) %>%
  as.matrix() %>% t(.) %>%
  image(1:100, 1:100, . , xlab="Movies", ylab="Users")
abline(h=0:100+0.5, v=0:100+0.5, col = "grey")
title("User x Movie Matrix")
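
The sparsity can also be quantified from the counts reported in this chapter (9000055 ratings, 69878 users and 10677 movies in edx). The following sketch is plain arithmetic on those figures; the variable names are illustrative.

```r
# Counts reported in this chapter for the edx set
n_ratings <- 9000055
n_users   <- 69878
n_movies  <- 10677

# Share of user-movie cells that actually hold a rating
density <- n_ratings / (n_users * n_movies)

round(100 * density, 2)   # percentage of filled cells
```

Only about 1.2% of the cells of the full user x movie matrix contain a rating, which is why the heatmap above is mostly empty.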

2.3 Data Cleaning

As previously discussed, several features can be used to predict the rating for a given user. However,
using many predictors increases the model complexity and requires more computing resources, so in
this research the estimated rating uses only movie and user information.

train_set <- train_set %>% select(userId, movieId, rating, title)

test_set <- test_set %>% select(userId, movieId, rating, title)

2.4 Modeling
2.4.1 Random Prediction

A very simple model is to randomly predict the rating using the probability distribution observed
during the data exploration. For example, if we know that the probability of any user giving a movie
a rating of 3 is 10%, then we may guess that 10% of the ratings will be 3.

Such a prediction sets the worst error we may get, so any other model should provide a better result.

2.4.2 Linear Model

The simplest model predicts that all users will give the same rating to all movies and assumes the
movie to movie variation is randomly distributed error. Although the predicted rating can be any value,
statistical theory says that the average minimizes the RMSE, so the initial prediction is just the
average of all observed ratings, as described in this formula:

$$\hat{Y}_{u,i} = \mu + \epsilon_{u,i}$$

where $\hat{Y}_{u,i}$ is the predicted rating, $\mu$ is the mean of the observed data and $\epsilon_{u,i}$ is the error
distribution. Any value other than the mean increases the RMSE, so this is a good initial estimate.
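
The claim that the mean minimizes the RMSE can be checked numerically. The sketch below uses five hypothetical ratings (`y_toy`) and a grid of candidate constant predictions; it is an illustration under these made-up values, not part of the project pipeline.

```r
# Hypothetical ratings (illustration only)
y_toy <- c(1, 3, 4, 4, 5)

# RMSE obtained when every rating is predicted by the same constant
rmse_const <- function(const, y) sqrt(mean((y - const)^2))

# Try every constant from 1 to 5 in steps of 0.1
candidates <- seq(1, 5, by = 0.1)
rmses_toy  <- sapply(candidates, rmse_const, y = y_toy)

best <- candidates[which.min(rmses_toy)]
c(best_constant = best, mean_rating = mean(y_toy))
```

The constant with the lowest RMSE on the grid coincides with the mean of the toy ratings, because the mean squared error of a constant prediction equals the variance of the data plus the squared distance between the constant and the mean.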

Part of the movie to movie variability can be explained by the fact that different movies have
different rating distributions. This is easy to understand, since some movies are more popular than
others and public preference varies. This is called the movie effect or movie bias, and is expressed
as $b_i$ in this formula:

$$\hat{Y}_{u,i} = \mu + b_i + \epsilon_{u,i}$$

The movie effect can be calculated as the mean of the difference between the observed rating $y$ and
the mean $\mu$, taken over the $n_i$ ratings of movie $i$:

$$\hat{b}_i = \frac{1}{n_i}\sum_{u=1}^{n_i}\left(y_{u,i} - \hat{\mu}\right)$$

Similar to the movie effect, different users have different rating patterns or distributions. For example,
some users like most movies and consistently rate 4 or 5, while other users dislike most movies,
rating 1 or 2. This is called the user effect or user bias and, over the $n_u$ ratings of user $u$, is
expressed in this formula:

$$\hat{b}_u = \frac{1}{n_u}\sum_{i=1}^{n_u}\left(y_{u,i} - \hat{b}_i - \hat{\mu}\right)$$

The prediction model that includes the user effect becomes:

$$\hat{Y}_{u,i} = \mu + b_i + b_u + \epsilon_{u,i}$$

Movies can be grouped into categories or genres, with different distributions. In general, movies in
the same genre get similar ratings. In this project we won’t evaluate the genre effect.

2.4.3 Regularization

The linear model provides a good estimate of the ratings, but doesn’t consider that many movies
have very few ratings, and some users rate very few movies. This means that the sample
size is very small for these movies and these users. Statistically, this leads to a large estimation error.

The estimated value can be improved by adding a factor that penalizes small sample sizes and has
little or no impact otherwise. Thus, the estimated movie and user effects can be calculated with
these formulas:

$$\hat{b}_i = \frac{1}{n_i + \lambda}\sum_{u=1}^{n_i}\left(y_{u,i} - \hat{\mu}\right)$$

$$\hat{b}_u = \frac{1}{n_u + \lambda}\sum_{i=1}^{n_u}\left(y_{u,i} - \hat{b}_i - \hat{\mu}\right)$$

For sample sizes $n_i$ and $n_u$ smaller than or similar to $\lambda$, $\hat{b}_i$ and $\hat{b}_u$ are smaller than the original
values, whereas for sample sizes much larger than $\lambda$, $\hat{b}_i$ and $\hat{b}_u$ change very little.

An effective method to choose the $\lambda$ that minimizes the RMSE is to run simulations with several
values of $\lambda$.
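
The shrinking behaviour is easiest to see on a small worked example. In the sketch below, the overall mean `mu_toy`, the penalty `lambda_toy` and the two rating vectors are hypothetical values chosen for illustration; `shrunk_effect` is a helper written for this example that applies the regularized formula for the movie effect.

```r
mu_toy     <- 3.5   # hypothetical overall mean rating
lambda_toy <- 3     # hypothetical penalty

# Movie A has a single 5-star rating; movie B has one hundred 5-star ratings
ratings_a <- 5
ratings_b <- rep(5, 100)

# Regularized effect: sum of residuals divided by (n + lambda)
shrunk_effect <- function(ratings, mu, lambda) {
  sum(ratings - mu) / (length(ratings) + lambda)
}

b_a_plain <- mean(ratings_a - mu_toy)                         # 1.5
b_a_reg   <- shrunk_effect(ratings_a, mu_toy, lambda_toy)     # 1.5 / 4 = 0.375
b_b_reg   <- shrunk_effect(ratings_b, mu_toy, lambda_toy)     # 150 / 103, about 1.456

c(b_a_plain = b_a_plain, b_a_reg = b_a_reg, b_b_reg = b_b_reg)
```

With a single rating the effect shrinks from 1.5 to 0.375, while with one hundred ratings it stays close to 1.5, so the penalty mainly affects estimates built on little data.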

2.4.4 Matrix Factorization

Matrix factorization is a widely used machine learning tool for predicting ratings in recommendation
systems. This method became widely known during the Netflix Prize challenge [10].

The data can be converted into a matrix such that each user is in a row, each movie is in a column
and the rating is in the cell; the algorithm then attempts to fill in the missing values. The table below
provides a simple example of a 4×5 matrix.

        movie 1  movie 2  movie 3  movie 4  movie 5
user 1  ?        ?        4        ?        3
user 2  2        ?        ?        4        ?
user 3  ?        3        ?        ?        5
user 4  3        ?        2        ?        ?

The concept is to approximate a large rating matrix $R_{m \times n}$ by the product of two lower-dimension
matrices $P_{k \times m}$ and $Q_{k \times n}$, such that

$$R \approx P'Q$$

The R recosystem package provides methods to decompose the rating matrix and estimate the user
rating, using parallel matrix factorization.
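
The dimensions involved can be illustrated in base R. The sketch below fills P and Q with random numbers rather than fitted factors, so the resulting values are meaningless as ratings; it only shows how a k x m matrix and a k x n matrix combine into an m x n matrix, and how one predicted cell is an inner product. The sizes k, m and n are arbitrary choices for this example.

```r
set.seed(1)
k <- 2   # number of latent factors
m <- 4   # number of users
n <- 5   # number of movies

# Random (not fitted) factor matrices, for illustration only
P <- matrix(rnorm(k * m), nrow = k, ncol = m)   # user factors,  k x m
Q <- matrix(rnorm(k * n), nrow = k, ncol = n)   # movie factors, k x n

# Approximated rating matrix, m x n
R_hat <- t(P) %*% Q
dim(R_hat)

# The entry for user 2 and movie 3 is the inner product of their factor columns
pred_2_3 <- sum(P[, 2] * Q[, 3])
all.equal(R_hat[2, 3], pred_2_3)
```

In an actual factorization the entries of P and Q are chosen so that the product matches the observed ratings as closely as possible, and the remaining cells of the product serve as predictions.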

3 Results

This section presents the code and results of the models.

3.1 Model Evaluation Functions

Here we define the loss functions.

# Define Mean Absolute Error (MAE)
MAE <- function(true_ratings, predicted_ratings){
  mean(abs(true_ratings - predicted_ratings))
}

# Define Mean Squared Error (MSE)
MSE <- function(true_ratings, predicted_ratings){
  mean((true_ratings - predicted_ratings)^2)
}

# Define Root Mean Squared Error (RMSE)
RMSE <- function(true_ratings, predicted_ratings){
  sqrt(mean((true_ratings - predicted_ratings)^2))
}

3.2 Random Prediction

The first model randomly predicts the ratings using the observed probabilities in the training set.
First, we calculate the probability of each rating in the training set, then we predict the rating for the
test set and compare with actual rating. Any model should be better than this one.

Since the training set is a sample of the entire population and we don’t know the real distribution of
ratings, the Monte Carlo simulation with replacement provides a good approximation of the rating
distribution.

set.seed(4321, sample.kind = "Rounding")

# Create the probability of each rating
p <- function(x, y) mean(y == x)
rating <- seq(0.5,5,0.5)

# Estimate the probability of each rating with Monte Carlo simulation
B <- 10^3
M <- replicate(B, {
  s <- sample(train_set$rating, 100, replace = TRUE)
  sapply(rating, p, y= s)
})
prob <- sapply(1:nrow(M), function(x) mean(M[x,]))

# Predict random ratings
y_hat_random <- sample(rating, size = nrow(test_set),
                       replace = TRUE, prob = prob)

# Create a table with the error results
result <- tibble(Method = "Project Goal", RMSE = 0.8649, MSE = NA, MAE = NA)
result <- bind_rows(result,
                    tibble(Method = "Random prediction",
                           RMSE = RMSE(test_set$rating, y_hat_random),
                           MSE = MSE(test_set$rating, y_hat_random),
                           MAE = MAE(test_set$rating, y_hat_random)))

The RMSE of random prediction is very high.

result

Method             RMSE      MSE       MAE
Project Goal       0.864900  NA        NA
Random prediction  1.501245  2.253735  1.167597

3.3 Linear Model

We’re building the linear model based on the formula:

$$\hat{y}_{u,i} = \mu + b_i + b_u + \epsilon_{u,i}$$

3.3.1 Initial Prediction

The initial prediction is just the mean of the ratings, $\mu$.

$$\hat{y}_{u,i} = \mu + \epsilon_{u,i}$$

# Mean of observed values
mu <- mean(train_set$rating)

# Update the error table
result <- bind_rows(result,
                    tibble(Method = "Mean",
                           RMSE = RMSE(test_set$rating, mu),
                           MSE = MSE(test_set$rating, mu),
                           MAE = MAE(test_set$rating, mu)))

# Show the RMSE improvement
result

Method             RMSE      MSE       MAE
Project Goal       0.864900  NA        NA
Random prediction  1.501245  2.253735  1.1675974
Mean               1.060054  1.123714  0.8551636

3.3.2 Include Movie Effect (bi)

bi is the movie effect (bias) for movie i.

$$\hat{y}_{u,i} = \mu + b_i + \epsilon_{u,i}$$

# Movie effects (bi)
bi <- train_set %>%
  group_by(movieId) %>%
  summarize(b_i = mean(rating - mu))

head(bi)

movieId  b_i
1         0.4150040
2        -0.3064057
3        -0.3613952
4        -0.6372808
5        -0.4416058
6         0.3018943

The distribution of the movie effect is left skewed.

bi %>% ggplot(aes(x = b_i)) +
  geom_histogram(bins=10, col = I("black")) +
  ggtitle("Movie Effect Distribution") +
  xlab("Movie effect") +
  ylab("Count") +
  scale_y_continuous(labels = comma) +
  theme_economist()

# Predict the rating with mean + bi

y_hat_bi <- mu + test_set %>%
left_join(bi, by = "movieId") %>%
.$b_i
# Calculate the RMSE
result <- bind_rows(result,
tibble(Method = "Mean + bi",
RMSE = RMSE(test_set$rating, y_hat_bi),
MSE = MSE(test_set$rating, y_hat_bi),
MAE = MAE(test_set$rating, y_hat_bi)))

# Show the RMSE improvement

result

Method             RMSE       MSE
Project Goal       0.8649000  NA
Random prediction  1.5012445  2.2537350
Mean               1.0600537  1.1237139
Mean + bi          0.9429615  0.8891764

3.3.3 Include User Effect (bu)

bu is the user effect (bias) for user u.

$$\hat{y}_{u,i} = \mu + b_i + b_u + \epsilon_{u,i}$$

Predict the rating with mean + bi + bu:

# User effect (bu)

bu <- train_set %>%
left_join(bi, by = 'movieId') %>%
group_by(userId) %>%
summarize(b_u = mean(rating - mu - b_i))

# Prediction
y_hat_bi_bu <- test_set %>%
left_join(bi, by='movieId') %>%
left_join(bu, by='userId') %>%
mutate(pred = mu + b_i + b_u) %>%
.$pred
# Update the results table
result <- bind_rows(result,
tibble(Method = "Mean + bi + bu",
RMSE = RMSE(test_set$rating, y_hat_bi_bu),
MSE = MSE(test_set$rating, y_hat_bi_bu),
MAE = MAE(test_set$rating, y_hat_bi_bu)))

# Show the RMSE improvement

result

Method             RMSE       MSE
Project Goal       0.8649000  NA
Random prediction  1.5012445  2.2537350
Mean               1.0600537  1.1237139
Mean + bi          0.9429615  0.8891764
Mean + bi + bu     0.8646843  0.7476789

The user effect is normally distributed.

train_set %>%
  group_by(userId) %>%
  # Keep users with at least 100 ratings; n() must be evaluated while the
  # data is still grouped, i.e. before summarize()
  filter(n()>=100) %>%
  summarize(b_u = mean(rating)) %>%
  ggplot(aes(b_u)) +
  geom_histogram(color = "black") +
  ggtitle("User Effect Distribution") +
  xlab("User Bias") +
  ylab("Count") +
  scale_y_continuous(labels = comma) +
  theme_economist()

3.3.4 Evaluating the model result
The RMSE improved from the initial estimation based on the mean. However, we still need to
check if the model makes good ratings predictions.
Check the 10 largest residual differences

train_set %>%
left_join(bi, by='movieId') %>%
mutate(residual = rating - (mu + b_i)) %>%
arrange(desc(abs(residual))) %>%
slice(1:10)

userId  movieId  rating  title                             b_i
26423   6483     5.0     From Justin to Kelly (2003)       -2.638139
5279    6371     5.0     Pokémon Heroes (2003)             -2.472133
57863   6371     5.0     Pokémon Heroes (2003)             -2.472133
2507    318      0.5     Shawshank Redemption, The (1994)   0.944111
7708    318      0.5     Shawshank Redemption, The (1994)   0.944111
9214    318      0.5     Shawshank Redemption, The (1994)   0.944111
9568    318      0.5     Shawshank Redemption, The (1994)   0.944111
9975    318      0.5     Shawshank Redemption, The (1994)   0.944111
10749   318      0.5     Shawshank Redemption, The (1994)   0.944111
13496   318      0.5     Shawshank Redemption, The (1994)   0.944111

titles <- train_set %>%

select(movieId, title) %>%
distinct()

Best movies ranked by bi (the first six are shown). These are little-known movies:

bi %>%
  inner_join(titles, by = "movieId") %>%
  arrange(-b_i) %>%
  select(title) %>%
  head()

title
Hellhounds on My Trail (1999)
Satan’s Tango (Sátántangó) (1994)
Shadows of Forgotten Ancestors (1964)
Fighting Elegy (Kenka erejii) (1966)
Sun Alley (Sonnenallee) (1999)
Blue Light, The (Das Blaue Licht) (1932)

Worst movies ranked by bi (the first six are shown), also little-known movies:

bi %>%
  inner_join(titles, by = "movieId") %>%
  arrange(b_i) %>%
  select(title) %>%
  head()

title
Besotted (2001)
Hi-Line, The (1999)
Accused (Anklaget) (2005)
Confessions of a Superhero (2007)
War of the Worlds 2: The Next Wave (2008)
SuperBabies: Baby Geniuses 2 (2004)

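Number of ratings for 10 best movies:

train_set %>%
  left_join(bi, by = "movieId") %>%
  arrange(desc(b_i)) %>%
  group_by(title) %>%
  # summarise() returns one row per title, sorted by title, so the b_i
  # ordering from arrange() is lost and the rows below are alphabetical
  summarise(n = n()) %>%
  slice(1:10)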

title
’burbs, The (1989)
’night Mother (1986)
’Round Midnight (1986)
’Til There Was You (1997)
“Great Performances” Cats (1998)
*batteries not included (1987)
…All the Marbles (a.k.a. The California Dolls) (1981)
…And God Created Woman (Et Dieu… créa la femme) (1956)
…And God Spoke (1993)
…And Justice for All (1979)

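Counting by movieId and then ordering by b_i gives the number of ratings of the ten best-ranked
movies:

train_set %>% count(movieId) %>%
  left_join(bi, by="movieId") %>%
  arrange(desc(b_i)) %>%
  slice(1:10) %>%
  pull(n)
## [1] 1 1 1 1 1 1 4 2 4 4

Most of these movies have a single rating, so their estimated effects are unreliable.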

3.4 Regularization
Now, we regularize the user and movie effects by adding a penalty factor λ, which is a tuning
parameter. We define a number of values for λ and use the regularization function to pick
the value that minimizes the RMSE.

regularization <- function(lambda, trainset, testset){

  # Mean
  mu <- mean(trainset$rating)

  # Movie effect (bi)
  b_i <- trainset %>%
    group_by(movieId) %>%
    summarize(b_i = sum(rating - mu)/(n()+lambda))

  # User effect (bu)
  b_u <- trainset %>%
    left_join(b_i, by="movieId") %>%
    filter(!is.na(b_i)) %>%
    group_by(userId) %>%
    summarize(b_u = sum(rating - b_i - mu)/(n()+lambda))

  # Prediction: mu + bi + bu
  predicted_ratings <- testset %>%
    left_join(b_i, by = "movieId") %>%
    left_join(b_u, by = "userId") %>%
    filter(!is.na(b_i), !is.na(b_u)) %>%
    mutate(pred = mu + b_i + b_u) %>%
    pull(pred)

  return(RMSE(predicted_ratings, testset$rating))
}

# Define a set of lambdas to tune

lambdas <- seq(0, 10, 0.25)

# Tune lambda
rmses <- sapply(lambdas,
regularization,
trainset = train_set,
testset = test_set)

# Plot the lambda vs RMSE
tibble(Lambda = lambdas, RMSE = rmses) %>%
  ggplot(aes(x = Lambda, y = RMSE)) +
  geom_point() +
  ggtitle("Regularization",
          subtitle = "Pick the penalization that gives the lowest RMSE.") +
  theme_economist()
Choose the lambda that minimizes RMSE.

Next, we apply the best λ to the linear model.

# We pick the lambda that returns the lowest RMSE.
lambda <- lambdas[which.min(rmses)]

# Then, we calculate the predicted rating using the best parameters

# achieved from regularization.
mu <- mean(train_set$rating)

# Movie effect (bi)

b_i <- train_set %>%
group_by(movieId) %>%
summarize(b_i = sum(rating - mu)/(n()+lambda))

# User effect (bu)

b_u <- train_set %>%
left_join(b_i, by="movieId") %>%
group_by(userId) %>%
summarize(b_u = sum(rating - b_i - mu)/(n()+lambda))

# Prediction
y_hat_reg <- test_set %>%
left_join(b_i, by = "movieId") %>%
left_join(b_u, by = "userId") %>%
mutate(pred = mu + b_i + b_u) %>%
pull(pred)

# Update the result table

result <- bind_rows(result,
tibble(Method = "Regularized bi and bu",
RMSE = RMSE(test_set$rating, y_hat_reg),
MSE = MSE(test_set$rating, y_hat_reg),
MAE = MAE(test_set$rating, y_hat_reg)))

# Regularization made a small improvement in RMSE.

result

Method                 RMSE       MSE
Project Goal           0.8649000  NA
Random prediction      1.5012445  2.2537350
Mean                   1.0600537  1.1237139
Mean + bi              0.9429615  0.8891764
Mean + bi + bu         0.8646843  0.7476789
Regularized bi and bu  0.8641362  0.7467313

3.5 Matrix Factorization

Matrix factorization approximates a large user-movie matrix as the product of two
lower-dimensional matrices. Information in the train set is stored in tidy format, with one
observation per row, so it must be converted to a user-movie matrix before matrix factorization
can be applied. This code performs the transformation.

train_data <- train_set %>%

select(userId, movieId, rating) %>%
spread(movieId, rating) %>%
as.matrix()
The code above needs more memory than a commodity laptop has available, so we use an
alternative method: the recosystem package, which provides a complete solution for a
recommendation system using matrix factorization.
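As a small illustration of the idea of writing a rating matrix as a product of two smaller matrices, the sketch below factorizes a toy matrix with base R's svd(). This is a swapped-in technique chosen for brevity, not the method recosystem uses, and the matrix values and variable names are hypothetical.

```r
# Toy 4-user x 3-movie rating matrix with exact rank-1 structure (made-up values)
user_factor  <- c(1, 2, 3, 4)
movie_factor <- c(1.0, 0.5, 1.5)
R <- outer(user_factor, movie_factor)    # 4 x 3 matrix

# Rank-k factorization via SVD: R is approximated by P %*% t(Q)
k <- 1
s <- svd(R, nu = k, nv = k)
P <- s$u %*% diag(s$d[1:k], nrow = k)    # 4 x k user factors
Q <- s$v                                 # 3 x k movie factors
R_hat <- P %*% t(Q)

# Largest absolute reconstruction error (near zero, since R has rank 1)
max_error <- max(abs(R - R_hat))
```

Here the 12 entries of R are recovered from only 4 + 3 factor values. Real rating matrices are sparse and not exactly low rank, so the product is only an approximation and the factors are fitted on the observed ratings.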
The package vignette (footnote 12) describes how to use recosystem:
Usage of recosystem
The usage of recosystem is quite simple, mainly consisting of the following steps:

1. Create a model object (a Reference Class object in R) by calling Reco().

2. (Optionally) call the $tune() method to select best tuning parameters along a set of
candidate values.
3. Train the model by calling the $train() method. A number of parameters can be set
inside the function, possibly coming from the result of $tune().
4. (Optionally) export the model via $output(), i.e. write the factorization matrices P and Q
into files or return them as R objects.
5. Use the $predict() method to compute predicted values.

if(!require(recosystem))
install.packages("recosystem", repos = "http://cran.us.r-project.org")
set.seed(123, sample.kind = "Rounding") # This is a randomized algorithm

# Convert the train and test sets into recosystem input format
train_data <- with(train_set, data_memory(user_index = userId,
item_index = movieId,
rating = rating))
test_data <- with(test_set, data_memory(user_index = userId,
item_index = movieId,
rating = rating))

# Create the model object

r <- recosystem::Reco()

# Select the best tuning parameters

opts <- r$tune(train_data, opts = list(dim = c(10, 20, 30),
lrate = c(0.1, 0.2),
costp_l2 = c(0.01, 0.1),
costq_l2 = c(0.01, 0.1),
nthread = 4, niter = 10))

# Train the algorithm

r$train(train_data, opts = c(opts$min, nthread = 4, niter = 20))
## iter tr_rmse obj
## 0 0.9825 1.1036e+007
## 1 0.8740 8.9621e+006
## 2 0.8419 8.3303e+006
## 3 0.8208 7.9554e+006
## 4 0.8047 7.6948e+006
## 5 0.7921 7.5024e+006
## 6 0.7817 7.3538e+006
## 7 0.7728 7.2387e+006
## 8 0.7652 7.1393e+006
## 9 0.7587 7.0615e+006
## 10 0.7530 6.9936e+006
## 11 0.7479 6.9347e+006
## 12 0.7435 6.8850e+006
## 13 0.7394 6.8436e+006
## 14 0.7356 6.8037e+006
## 15 0.7322 6.7692e+006
## 16 0.7292 6.7380e+006
## 17 0.7262 6.7104e+006
## 18 0.7235 6.6837e+006
## 19 0.7212 6.6625e+006
# Calculate the predicted values
y_hat_reco <- r$predict(test_data, out_memory())
head(y_hat_reco, 10)
## [1] 4.851838 3.720369 3.274899 2.896342 3.550605 3.879312 3.637041
## [8] 3.427570 4.030577 3.193461

Matrix factorization substantially improved the RMSE.

result <- bind_rows(result,

tibble(Method = "Matrix Factorization - recosystem",
RMSE = RMSE(test_set$rating, y_hat_reco),
MSE = MSE(test_set$rating, y_hat_reco),
MAE = MAE(test_set$rating, y_hat_reco)))
result

Method                              RMSE       MSE
Project Goal                        0.8649000  NA
Random prediction                   1.5012445  2.2537350
Mean                                1.0600537  1.1237139
Mean + bi                           0.9429615  0.8891764
Mean + bi + bu                      0.8646843  0.7476789
Regularized bi and bu               0.8641362  0.7467313
Matrix Factorization - recosystem   0.7851291  0.6164277

3.6 Final Validation

As the result table shows, regularization and matrix factorization achieved the target
RMSE. So, finally, we train both models on the complete edx set and calculate the RMSE on
the validation set. The project goal is achieved if the RMSE stays below the target.

3.6.1 Linear Model With Regularization

During the training and testing phases, the linear model with regularization achieved the target
RMSE by a small margin. Here we do the final validation with the validation set.

mu_edx <- mean(edx$rating)

# Movie effect (bi)

b_i_edx <- edx %>%
group_by(movieId) %>%
summarize(b_i = sum(rating - mu_edx)/(n()+lambda))

# User effect (bu)

b_u_edx <- edx %>%
left_join(b_i_edx, by="movieId") %>%
group_by(userId) %>%
summarize(b_u = sum(rating - b_i - mu_edx)/(n()+lambda))

# Prediction
y_hat_edx <- validation %>%
left_join(b_i_edx, by = "movieId") %>%
left_join(b_u_edx, by = "userId") %>%
mutate(pred = mu_edx + b_i + b_u) %>%
pull(pred)

# Update the results table

result <- bind_rows(result,
tibble(Method = "Final Regularization (edx vs validation)",
RMSE = RMSE(validation$rating, y_hat_edx),
MSE = MSE(validation$rating, y_hat_edx),
MAE = MAE(validation$rating, y_hat_edx)))

# Show the RMSE improvement

result

Method                                     RMSE       MSE
Project Goal                               0.8649000  NA
Random prediction                          1.5012445  2.2537350
Mean                                       1.0600537  1.1237139
Mean + bi                                  0.9429615  0.8891764
Mean + bi + bu                             0.8646843  0.7476789
Regularized bi and bu                      0.8641362  0.7467313
Matrix Factorization - recosystem          0.7851291  0.6164277
Final Regularization (edx vs validation)   0.8648177  0.7479097

As expected, the RMSE calculated on the validation set (0.8648177) is lower than the target of
0.8649 and slightly higher than the RMSE of the test set (0.8641362).
Top 10 best movies

validation %>%
left_join(b_i_edx, by = "movieId") %>%
left_join(b_u_edx, by = "userId") %>%
mutate(pred = mu_edx + b_i + b_u) %>%
arrange(-pred) %>%
group_by(title) %>%
select(title) %>%
head(10)

title

Usual Suspects, The (1995)

Shawshank Redemption, The (1994)

Eternal Sunshine of the Spotless Mind (2004)

Star Wars: Episode IV - A New Hope (a.k.a. Star Wars) (1977)

Schindler’s List (1993)

Donnie Darko (2001)

Star Wars: Episode VI - Return of the Jedi (1983)

Schindler’s List (1993)
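The listing above ranks individual predictions, so the same title can appear more than once. If one row per title is preferred, a possible variant keeps each title's highest prediction before ranking. The sketch below runs on a hypothetical toy tibble; the titles, values, and the column name best_pred are assumptions, and this is not the code used to produce the list above.

```r
library(dplyr)

# Hypothetical predictions: "Movie A" is predicted for two different users
preds <- tibble(
  title = c("Movie A", "Movie B", "Movie A", "Movie C"),
  pred  = c(4.9, 4.7, 4.8, 3.0)
)

# Keep the highest prediction per title, then take the top entries
top_unique <- preds %>%
  group_by(title) %>%
  summarize(best_pred = max(pred)) %>%
  arrange(desc(best_pred)) %>%
  head(2)

top_unique
```

Using arrange(best_pred) instead of arrange(desc(best_pred)) would give the lowest-ranked titles in the same way.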

Top 10 worst movies

validation %>%
left_join(b_i_edx, by = "movieId") %>%
left_join(b_u_edx, by = "userId") %>%
mutate(pred = mu_edx + b_i + b_u) %>%
arrange(pred) %>%
group_by(title) %>%
select(title) %>%
head(10)

title

Battlefield Earth (2000)

Police Academy 4: Citizens on Patrol (1987)

Karate Kid Part III, The (1989)

Turbo: A Power Rangers Movie (1997)

Kazaam (1996)

Free Willy 3: The Rescue (1997)

Shanghai Surprise (1986)

Steel (1997)

3.6.2 Matrix Factorization

The initial test shows that matrix factorization gives the best RMSE. Now it’s time to validate with
the entire edx and validation sets.

set.seed(1234, sample.kind = "Rounding")

# Convert 'edx' and 'validation' sets to recosystem input format

edx_reco <- with(edx, data_memory(user_index = userId,
item_index = movieId,
rating = rating))
validation_reco <- with(validation, data_memory(user_index = userId,
item_index = movieId,
rating = rating))

# Create the model object

r <- recosystem::Reco()

# Tune the parameters

opts <- r$tune(edx_reco, opts = list(dim = c(10, 20, 30),
lrate = c(0.1, 0.2),
costp_l2 = c(0.01, 0.1),
costq_l2 = c(0.01, 0.1),
nthread = 4, niter = 10))

# Train the model

r$train(edx_reco, opts = c(opts$min, nthread = 4, niter = 20))
## iter tr_rmse obj
## 0 0.9728 1.2026e+007
## 1 0.8723 9.8942e+006
## 2 0.8404 9.1864e+006
## 3 0.8187 8.7792e+006
## 4 0.8026 8.4855e+006
## 5 0.7908 8.2918e+006
## 6 0.7809 8.1354e+006
## 7 0.7725 8.0127e+006
## 8 0.7654 7.9133e+006
## 9 0.7591 7.8295e+006
## 10 0.7538 7.7614e+006
## 11 0.7491 7.7028e+006
## 12 0.7448 7.6521e+006
## 13 0.7410 7.6053e+006
## 14 0.7374 7.5660e+006
## 15 0.7343 7.5314e+006
## 16 0.7314 7.5003e+006
## 17 0.7288 7.4738e+006
## 18 0.7263 7.4475e+006
## 19 0.7242 7.4260e+006
# Calculate the prediction
y_hat_final_reco <- r$predict(validation_reco, out_memory())

# Update the result table

result <- bind_rows(result,
tibble(Method = "Final Matrix Factorization - recosystem",
RMSE = RMSE(validation$rating, y_hat_final_reco),
MSE = MSE(validation$rating, y_hat_final_reco),
MAE = MAE(validation$rating, y_hat_final_reco)))

The final RMSE with matrix factorization is 0.7826974, 9.5% better than the linear model with
regularization (0.8648177).
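The 9.5% figure is the relative reduction in RMSE. A short check of the arithmetic, using the two RMSE values quoted above (the variable names are only illustrative):

```r
# RMSE values reported for the validation set
rmse_linear <- 0.8648177   # linear model with regularization
rmse_mf     <- 0.7826974   # matrix factorization

# Relative improvement of matrix factorization over the linear model, in percent
improvement_pct <- 100 * (rmse_linear - rmse_mf) / rmse_linear
round(improvement_pct, 1)
```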

# Show the RMSE improvement

result

Method                                     RMSE       MSE
Project Goal                               0.8649000  NA
Random prediction                          1.5012445  2.2537350
Mean                                       1.0600537  1.1237139
Mean + bi                                  0.9429615  0.8891764
Mean + bi + bu                             0.8646843  0.7476789
Regularized bi and bu                      0.8641362  0.7467313
Matrix Factorization - recosystem          0.7851291  0.6164277
Final Regularization (edx vs validation)   0.8648177  0.7479097
Final Matrix Factorization - recosystem    0.7826974  0.6126152

Now, let’s check the best and worst movies predicted with matrix factorization.
Top 10 best movies:

tibble(title = validation$title, rating = y_hat_final_reco) %>%

arrange(-rating) %>%
group_by(title) %>%
select(title) %>%
head(10)

title

Lord of the Rings: The Return of the King, The (2003)

Schindler’s List (1993)

Shawshank Redemption, The (1994)

Star Wars: Episode IV - A New Hope (a.k.a. Star Wars) (1977)

Annie Hall (1977)

Boys Life 2 (1997)

Pulp Fiction (1994)

Top 10 worst movies:

tibble(title = validation$title, rating = y_hat_final_reco) %>%

arrange(rating) %>%
group_by(title) %>%
select(title) %>%
head(10)

title

Time Walker (a.k.a. Being From Another Planet) (1982)

Dead Girl, The (2006)

Giant Gila Monster, The (1959)

Phish: Bittersweet Motel (2000)

Bug (2007)

Blue Kite, The (Lan feng zheng) (1993)

Dirty Love (2005)

Mass Transit (1998)

Kiss Them for Me (1957)

4 Conclusion
We started by collecting and preparing the dataset for analysis, then explored the data for
insights that might help during model building.
Next, we created a random model that predicts the rating based on the probability distribution of
each rating. This model gives the worst result.
We started the linear model with a very simple model: the mean of the observed ratings. From
there, we added movie and user effects, which model the movie distribution and user behavior.
With regularization we added a penalty for movies and users with few ratings. The linear model
achieved an RMSE of 0.8648177, passing the target of 0.8649.
Finally, we evaluated the recosystem package, which implements the LIBMF algorithm, and
achieved an RMSE of 0.7826974.

4.1 Limitations
Some machine learning algorithms are computationally expensive to run on a commodity laptop, so
we were unable to test them. The required memory far exceeded what was available on a commodity
laptop, even with increased virtual memory.
Only two predictors are used, the movie and user information; other features are not considered.
Modern recommendation system models use many predictors, such as genres, bookmarks,
playlists, etc.
The model works only for existing users, movies and rating values, so the algorithm must run
every time a new user or movie is included, or when a rating changes. This is not an issue for
a small client base and a few movies, but may become a concern for large data sets. The model
should consider these changes and update the predictions as information changes.
There is no initial recommendation for a new user or for users who usually don’t rate movies.
Algorithms that use several features as predictors can overcome this issue.
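One simple way to return some prediction for an unseen user or movie in the linear model is to treat the missing effect as zero, so that the prediction falls back toward the overall mean. The sketch below only illustrates that idea and is not part of the models evaluated in this report; the values of mu, b_i and b_u and the helper predict_rating are hypothetical.

```r
# Hypothetical fitted values: overall mean, movie effects, user effects
mu  <- 3.5
b_i <- c(m1 = 0.4, m2 = -0.3)
b_u <- c(u1 = 0.2)

# Predict mu + b_i + b_u, using 0 for any effect that was never estimated
predict_rating <- function(user, movie) {
  bi <- if (movie %in% names(b_i)) b_i[[movie]] else 0
  bu <- if (user %in% names(b_u)) b_u[[user]] else 0
  mu + bi + bu
}

predict_rating("u1", "m1")               # known user and movie
predict_rating("new_user", "m2")         # unseen user: mean plus movie effect
predict_rating("new_user", "new_movie")  # nothing known: overall mean
```

This fallback is not personalized, so it does not replace models that use additional features for new users.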

4.2 Future Work

This report briefly describes simple models that predict ratings. Two other widely adopted
approaches are not discussed here: content-based and collaborative filtering.
The recommenderlab package implements these methods and provides an environment to build
and test recommendation systems.
Besides recommenderlab, other packages for building recommendation systems are available on
the Comprehensive R Archive Network (CRAN) website (footnote 13).
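As a starting point for such future work, a minimal sketch of user-based collaborative filtering with recommenderlab could look like the following. It assumes the recommenderlab package is installed and uses the small MovieLense sample data shipped with the package rather than the edx set; the split into 500 training users and two test users is arbitrary.

```r
library(recommenderlab)

# MovieLense is a small MovieLens sample included in recommenderlab
data(MovieLense)

# Train user-based collaborative filtering (UBCF) on the first 500 users
rec <- Recommender(MovieLense[1:500], method = "UBCF")

# Top-5 recommendations for two users that were not used for training
top5 <- predict(rec, MovieLense[501:502], n = 5)
as(top5, "list")
```

Other methods registered in the package can be selected through the method argument in the same way.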

References
1. Rafael A. Irizarry (2019), Introduction to Data Science: Data Analysis and Prediction
Algorithms with R
2. Yixuan Qiu (2017), recosystem: Recommender System Using Parallel Matrix
Factorization
3. Michael Hahsler (2019), recommenderlab: Lab for Developing and Testing
Recommender Algorithms. R package version 0.2-5.
4. Georgios Drakos, How to select the Right Evaluation Metric for Machine Learning
Models: Part 1 Regression Metrics

1. https://www.edx.org/professional-certificate/harvardx-data-science
2. https://www.netflixprize.com/
3. https://www.edx.org/professional-certificate/harvardx-data-science
4. https://grouplens.org/
5. https://movielens.org/
6. https://grouplens.org/datasets/movielens/latest/
7. https://grouplens.org/datasets/movielens/10m/
8. A Matrix-factorization Library for Recommender Systems -
https://www.csie.ntu.edu.tw/~cjlin/libmf/
9. https://cran.r-project.org/web/packages/recosystem/index.html
10. https://www.netflixprize.com/
11. https://cran.r-project.org/web/packages/recosystem/vignettes/introduction.html
12. https://cran.r-project.org/web/packages/recosystem/vignettes/introduction.html
13. https://cran.r-project.org/web/packages/available_packages_by_name.html
