How To Make The Best Use Of Live Sessions
• Please log in 10 minutes before the class starts and check your internet connection to avoid any network issues during the LIVE session
• All participants will be on mute by default to avoid any background noise. However, you will be unmuted by the instructor if required. Please use the “Questions” tab on your webinar tool to interact with the instructor at any point during the class
• Feel free to ask and answer questions to make your learning interactive. The instructor will address your queries at the end of the ongoing topic
• Raise a ticket through your LMS in case of any queries. Our dedicated support team is available 24 x 7 for your assistance
• Your feedback is very much appreciated. Please share feedback after each class, which will help us enhance your learning experience
Copyright © edureka and/or its affiliates. All rights reserved.
Course Outline
▪ Introduction to Python
▪ Sequences and File Operations
▪ Deep Dive - Functions, OOPS, Modules, Errors and Exceptions
▪ Introduction to Numpy, Pandas and Matplotlib
▪ Data Manipulation
▪ Introduction to Machine Learning with Python
▪ Supervised Learning - I
▪ Dimensionality Reduction
▪ Supervised Learning - II
▪ Unsupervised Learning
▪ Association Rules Mining and Recommendation Systems
▪ Reinforcement Learning
▪ Time Series Analysis
▪ Model Selection and Boosting
Association Rule Mining and
Recommendation Systems
Topics
The topics covered in this module are:
▪ Association Rule Mining
▪ Apriori Algorithm
▪ Recommendation Engines
▪ Building a Recommender System
Objectives
After completing this module, you should be able to:
▪ Define Association Rules
▪ Understand Apriori Algorithm
▪ Define Recommendation Engine
▪ Discuss types of Recommendation Engines
❖ Collaborative Filtering
❖ Content-Based Filtering
▪ Illustrate steps to build Recommendation Engines
Association Rule Mining
Association Rule Mining
▪ Association rule mining is a method for discovering interesting relations between variables in large databases
▪ It captures patterns of the form: when one event occurs, another event occurs with a certain probability
Example: Customers who purchase a keyboard have a 60% likelihood of also purchasing a mouse for their PC
Association Rule Mining
An example of an association rule is given below:
X → Y
It means that if a person buys item X, then he will also buy item Y.
Let’s dive a bit deeper into Association Rules.
Association Rule Mining: Parameters
Association rule mining is evaluated using the following parameters:
▪ Support: gives the fraction of transactions which contain both item X and item Y
▪ Confidence: gives how often items X and Y occur together, given the number of times X occurs
▪ Lift: indicates the strength of a rule over the random co-occurrence of X and Y
Calculating Support, Confidence & Lift
Support(X → Y) = (no. of transactions containing both X and Y) / (total number of transactions) = P(X ∪ Y)

Confidence(X → Y) = (no. of transactions containing both X and Y) / (no. of transactions containing X) = P(Y | X) = Support(X ∪ Y) / Support(X)

Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y)) = Confidence(X → Y) / Support(Y)
Goal: Find all rules with user-specified minimum support (minsup) and minimum confidence (minconf)
Association Rule Mining
▪ Let’s take an example:
▪ Suppose we have five transactions T1,T2,T3,T4,T5 as given below:
T1 : A, B, C
T2 : A, C, D
T3 : B, C, D
T4 : A, D, E
T5 : B, C, E
▪ Here,
❖ A, B, C, D, E are items in a store, I = {A, B, C, D, E}
❖ The set of all transactions is T = {T1, T2, T3, T4, T5}
❖ Each transaction Ti is a set of items: Ti ⊆ I
Association Rule Mining
▪ Suppose, you made some association rules using our transaction database as given below:
A → D
C → A
A → C
B & C → D
▪ Now we can find support, confidence and lift for these rules using the formula explained earlier:
Rule | Support | Confidence | Lift
A → D | 2/5 | 2/3 | 10/9
C → A | 2/5 | 2/4 | 5/6
A → C | 2/5 | 2/3 | 5/6
B & C → D | 1/5 | 1/3 | 5/9
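The table above can be reproduced with a short Python sketch (a minimal illustration; the transaction list and helper functions below are our own, and lift uses the standard definition Confidence(X → Y) / Support(Y)):

```python
# Support / confidence / lift for the five-transaction example.
transactions = [
    {'A', 'B', 'C'},   # T1
    {'A', 'C', 'D'},   # T2
    {'B', 'C', 'D'},   # T3
    {'A', 'D', 'E'},   # T4
    {'B', 'C', 'E'},   # T5
]
N = len(transactions)

def support(items):
    """Fraction of transactions containing every item in `items`."""
    return sum(items <= t for t in transactions) / N

def confidence(lhs, rhs):
    """How often lhs and rhs occur together, given that lhs occurs."""
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    """Strength of the rule over the random co-occurrence of lhs and rhs."""
    return confidence(lhs, rhs) / support(rhs)

print(support({'A', 'D'}))        # 2/5
print(confidence({'A'}, {'D'}))   # 2/3
print(lift({'A'}, {'D'}))         # (2/3) / (3/5) = 10/9
```

Note that a lift above 1 means X and Y co-occur more often than expected under independence; below 1, less often.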
Now, let’s understand how the Apriori algorithm is used for generating association rules.
Apriori Algorithm
The Apriori algorithm uses frequent itemsets to generate association rules. It is based on the principle:
“A subset of a frequent itemset must also be a frequent itemset”
For example, if {A, B} is a frequent itemset, then {A} and {B} must also be frequent itemsets.
Note: A frequent itemset is an itemset whose support value is greater than a threshold value.
Let’s understand Apriori with an example.
Apriori Algorithm
Consider the following transaction dataset. We are using a minimum support count = 2 for this example:

TID | Items
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5
500 | 1 3 5

The first step is to build a list of itemsets of size one using this dataset.
Apriori Algorithm – First Iteration
A list of itemsets of size one is made, and their support values are calculated.

CI1 (candidate itemsets):
Itemset | Support
{1} | 3
{2} | 3
{3} | 4
{4} | 1
{5} | 4

FI1 (frequent itemsets):
Itemset | Support
{1} | 3
{2} | 3
{3} | 4
{5} | 4

Since our threshold value is 2, any itemset with support less than 2 is omitted.
Apriori Algorithm – Second Iteration
▪ In this iteration we extend the length of our itemsets by one, i.e. k = k + 1
▪ All combinations of the itemsets in FI1 are used in this iteration

CI2:
Itemset | Support
{1,2} | 1
{1,3} | 3
{1,5} | 2
{2,3} | 2
{2,5} | 3
{3,5} | 3

FI2:
Itemset | Support
{1,3} | 3
{1,5} | 2
{2,3} | 2
{2,5} | 3
{3,5} | 3
Apriori Algorithm – Third Iteration
▪ In this iteration we again extend the length of our itemsets. All combinations of the itemsets in FI2 are used.

CI3 itemsets: {1,2,3}, {1,2,5}, {1,3,5}, {2,3,5}

Before finding the support values, we will do some pruning of the candidate set.
Apriori Algorithm – Pruning
▪ After the combinations are made, split each candidate itemset into its subsets of size two and check whether any subset is missing from FI2 (i.e., whether it has already been found to be infrequent)

CI3 | 2-item subsets | All in FI2?
{1,2,3} | {1,2}, {1,3}, {2,3} | No
{1,2,5} | {1,2}, {1,5}, {2,5} | No
{1,3,5} | {1,3}, {1,5}, {3,5} | Yes
{2,3,5} | {2,3}, {2,5}, {3,5} | Yes

If any subset of an itemset is not in FI2, remove that itemset.
Apriori Algorithm – Fourth Iteration
▪ Using the itemsets of CI3, you will create the new candidate set CI4

FI3:
Itemset | Support
{1,3,5} | 2
{2,3,5} | 2

CI4:
Itemset | Support
{1,2,3,5} | 1

Since the support of the only itemset in CI4 is less than 2, you stop and return to the previous frequent itemset list, i.e. FI3
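The iterations above can be sketched as a compact Apriori loop (a minimal illustration of the algorithm on this example dataset, not the mlxtend implementation used later in the module; itemsets are frozensets and the minimum support count is 2):

```python
from itertools import combinations

# Transaction dataset from the example (TID -> items)
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
min_support = 2

def support_count(itemset):
    # Number of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions)

# k = 1: frequent single items (FI1)
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]): support_count(frozenset([i]))
             for i in items
             if support_count(frozenset([i])) >= min_support}]

k = 2
while frequent[-1]:
    prev = list(frequent[-1])
    # Candidate generation: unions of frequent (k-1)-itemsets that have size k
    candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
    # Pruning: every (k-1)-subset of a candidate must itself be frequent
    candidates = {c for c in candidates
                  if all(frozenset(s) in frequent[-1]
                         for s in combinations(c, k - 1))}
    frequent.append({c: support_count(c) for c in candidates
                     if support_count(c) >= min_support})
    k += 1

print(frequent[2])  # the two frequent 3-itemsets, each with support 2
```

Running this reproduces FI1, FI2 and FI3 from the slides; the loop stops once no candidate of the next size survives.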
Apriori Algorithm – Subset Creation
▪ Now you have the list of frequent itemsets (FI3):

Itemset | Support
{1,3,5} | 2
{2,3,5} | 2

Let’s assume the minimum confidence value is 60%.
▪ Using this, you generate all non-empty proper subsets for each frequent itemset:
❖ For I = {1,3,5}, the subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}
❖ For I = {2,3,5}, the subsets are {2,3}, {2,5}, {3,5}, {2}, {3}, {5}
▪ For every subset S of I, you output the rule
❖ S → (I - S) (meaning S recommends I - S)
❖ if support(I) / support(S) >= the min_conf value
Apriori Algorithm – Applying Rules
▪ Now we will apply our rules to the itemsets of FI3:
1. {1,3,5}
❖ Rule 1: {1,3} → ({1,3,5} - {1,3}), i.e. 1 & 3 → 5
   Confidence = support(1,3,5) / support(1,3) = 2/3 = 66.66% > 60%
   Rule 1 is selected
❖ Rule 2: {1,5} → ({1,3,5} - {1,5}), i.e. 1 & 5 → 3
   Confidence = support(1,3,5) / support(1,5) = 2/2 = 100% > 60%
   Rule 2 is selected
❖ Rule 3: {3,5} → ({1,3,5} - {3,5}), i.e. 3 & 5 → 1
   Confidence = support(1,3,5) / support(3,5) = 2/3 = 66.66% > 60%
   Rule 3 is selected
Apriori Algorithm – Applying Rules
▪ Continuing with the itemset {1,3,5}:
❖ Rule 4: {1} → ({1,3,5} - {1}), i.e. 1 → 3 & 5
   Confidence = support(1,3,5) / support(1) = 2/3 = 66.66% > 60%
   Rule 4 is selected
❖ Rule 5: {3} → ({1,3,5} - {3}), i.e. 3 → 1 & 5
   Confidence = support(1,3,5) / support(3) = 2/4 = 50% < 60%
   Rule 5 is rejected
❖ Rule 6: {5} → ({1,3,5} - {5}), i.e. 5 → 1 & 3
   Confidence = support(1,3,5) / support(5) = 2/4 = 50% < 60%
   Rule 6 is rejected
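The rule-generation step can be sketched as follows (a minimal illustration on the example transactions; the helper names are our own):

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]

def support_count(itemset):
    # Number of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions)

def rules_from(itemset, min_conf):
    """Yield (S, I-S, confidence) for every non-empty proper subset S of I
    whose rule S -> (I-S) meets the minimum confidence."""
    I = frozenset(itemset)
    for r in range(1, len(I)):
        for S in map(frozenset, combinations(I, r)):
            conf = support_count(I) / support_count(S)
            if conf >= min_conf:
                yield (set(S), set(I - S), conf)

selected = list(rules_from({1, 3, 5}, min_conf=0.6))
for lhs, rhs, conf in selected:
    print(f"{lhs} -> {rhs}  (confidence = {conf:.2%})")
# Four rules survive; {3} -> {1, 5} and {5} -> {1, 3} fall below 60%
```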
Now let’s learn how association rules are used in Market Basket Analysis in Python.
Market Basket Analysis
We will be using the following online transactional data of a retail store for generating association rules
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 01-12-2010 08:26 2.55 17850 United Kingdom
536365 71053 WHITE METAL LANTERN 6 01-12-2010 08:26 3.39 17850 United Kingdom
536365 84406B CREAM CUPID HEARTS COAT HANGER 8 01-12-2010 08:26 2.75 17850 United Kingdom
536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 01-12-2010 08:26 3.39 17850 United Kingdom
536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 01-12-2010 08:26 3.39 17850 United Kingdom
536365 22752 SET 7 BABUSHKA NESTING BOXES 2 01-12-2010 08:26 7.65 17850 United Kingdom
536365 21730 GLASS STAR FROSTED T-LIGHT HOLDER 6 01-12-2010 08:26 4.25 17850 United Kingdom
536366 22633 HAND WARMER UNION JACK 6 01-12-2010 08:28 1.85 17850 United Kingdom
536366 22632 HAND WARMER RED POLKA DOT 6 01-12-2010 08:28 1.85 17850 United Kingdom
536367 84879 ASSORTED COLOUR BIRD ORNAMENT 32 01-12-2010 08:34 1.69 13047 United Kingdom
536367 22745 POPPY'S PLAYHOUSE BEDROOM 6 01-12-2010 08:34 2.1 13047 United Kingdom
536367 22748 POPPY'S PLAYHOUSE KITCHEN 6 01-12-2010 08:34 2.1 13047 United Kingdom
536367 22749 FELTCRAFT PRINCESS CHARLOTTE DOLL 8 01-12-2010 08:34 3.75 13047 United Kingdom
536367 22310 IVORY KNITTED MUG COSY 6 01-12-2010 08:34 1.65 13047 United Kingdom
Data can be downloaded from the LMS
Market Basket Analysis: Step 1
First, import the pandas and MLxtend libraries and read in the data
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
df = pd.read_excel('Online_Retail.xlsx')
df.head()
Market Basket Analysis: Step 2
In this step, you will be doing:
▪ Data clean-up, which includes removing spaces from some of the descriptions
▪ Dropping the rows that don’t have invoice numbers and removing the credit transactions
df['Description'] = df['Description'].str.strip()
df.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
df['InvoiceNo'] = df['InvoiceNo'].astype('str')
df = df[~df['InvoiceNo'].str.contains('C')]
df
Market Basket Analysis: Step 3
▪ After the clean-up, we need to consolidate the items into one transaction per row, with one column per product
▪ For the sake of keeping the dataset small, we are only looking at sales for France
basket = (df[df['Country'] =="France"]
.groupby(['InvoiceNo', 'Description'])['Quantity']
.sum().unstack().reset_index().fillna(0)
.set_index('InvoiceNo'))
basket
Market Basket Analysis: Step 4
▪ There are a lot of zeros in the data, but we also need to make sure any positive values are converted to a 1 and anything less than or equal to 0 is set to 0
def encode_units(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

basket_sets = basket.applymap(encode_units)
basket_sets.drop('POSTAGE', inplace=True, axis=1)
basket_sets
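As a quick sanity check, the encoding above can be exercised on a toy DataFrame (the column names below are hypothetical stand-ins for real product descriptions):

```python
import pandas as pd

def encode_units(x):
    # Convert raw quantities to 0/1 purchase flags
    return 0 if x <= 0 else 1

# Toy basket: rows are invoices, columns are products
basket = pd.DataFrame({'MOUSE': [6, 0, -2], 'LANTERN': [0, 3, 1]})
basket_sets = basket.applymap(encode_units)
print(basket_sets)
```

Negative quantities (returns) and zeros both become 0, so only genuine purchases are counted by the apriori step.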
Market Basket Analysis: Step 4 (O/P)
Now, you have structured the data properly
Market Basket Analysis: Step 5
In this step, you will:
▪ Generate frequent itemsets that have a support of at least 7% (this threshold is chosen so that enough itemsets qualify to generate useful rules)
▪ Generate the rules with their corresponding support, confidence and lift
frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.head()
Market Basket Analysis: Step 5 (O/P)
Observations:
▪ A few rules have a high lift value, which means that the combination occurs more frequently than would be expected given the number of transactions and product combinations
▪ In most cases the confidence is high as well
Market Basket Analysis: Step 6
▪ Filter the dataframe using standard pandas code, for a large lift (6) and high confidence (.8)
rules[(rules['lift'] >= 6) & (rules['confidence'] >= 0.8)]
For association rules, the granularity lies at the transaction level. They use transactions as the central entity and hence do not provide user-specific insights.

For that, we will use Recommendation Engines.
Recommendation Engines
▪ A recommendation engine (sometimes referred to as a recommender system) is a tool that allows algorithm developers to predict what a user may or may not like among a list of given items
▪ It helps users discover products or content that they may not come across otherwise
This makes recommendation engines a great part of websites and services such as Facebook, YouTube, Amazon, and more
Recommendation Engine Types
Recommendation engines typically work in one of two ways:

▪ User-based (collaborative) filtering: builds a model from a user’s past behavior as well as similar decisions made by other users. This model is then used to predict items that the user may have an interest in.
▪ Content-based filtering: utilizes a series of discrete characteristics of an item in order to recommend additional items with similar properties to the user.

It is also possible to combine both of these methods to build a much more robust recommendation engine (a hybrid recommender system).
Hybrid Recommender System – Example
A hybrid recommender system is based on both user-based and content-based filtering.
Netflix, for example, uses a hybrid recommender system for movie/series recommendations to its users.
User-Based Collaborative Filtering (UBCF)
▪ This algorithm searches a large group of people and finds a smaller set with tastes similar to yours
▪ It looks at other things they like and combines them to create a ranked list of suggestions
▪ Many algorithms have been used to measure user similarity or item similarity, for example:
❖ The k-nearest neighbours (k-NN) approach
❖ Pearson correlation

User-based collaborative filtering does not rely on machine-analyzable content, and is therefore capable of accurately recommending complex items such as drinks without requiring an “understanding” of the item itself.
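As an illustration of Pearson correlation as a user-similarity measure, consider two hypothetical users rating the same five movies (the ratings below are made up for this example):

```python
import numpy as np

# Hypothetical ratings by two users for the same five movies (1-5 scale)
alice = np.array([5, 4, 1, 2, 3])
bob   = np.array([4, 5, 2, 1, 3])

# Pearson correlation: +1 = identical taste, -1 = opposite taste
similarity = np.corrcoef(alice, bob)[0, 1]
print(round(similarity, 3))  # 0.8
```

Because Pearson correlation centers each user’s ratings around their own mean, it is insensitive to users who rate systematically high or low.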
UBCF Working: Step 1
Consider an example of movie recommendation.
Suppose Sarah has just watched the movie Inside Out. Let’s see how the recommendation engine works and which movies it thinks she would like to see next.
First step:
1. Generate a list of users who have seen the following movies
UBCF Working: Step 2
2. Here we have four users who have watched the following movies (titles other than Inside Out and Avengers are not named in the slides):

User | Inside Out | Movie A | Avengers | Movie B
John | Yes | Yes | Yes | Yes
Dave | No | Yes | No | Yes
Stuart | Yes | No | Yes | No
Sam | No | No | Yes | Yes
UBCF Working: Step 3
3. Now, we find users similar to Sarah:

User | Inside Out | Movie A | Avengers | Movie B
John | Yes | Yes | Yes | Yes
Dave | No | Yes | No | Yes
Stuart | Yes | No | Yes | No
Sam | No | No | Yes | Yes
Sarah | Yes | ?? | ?? | ??
UBCF Working: Step 4
4. Based on the data we have, John and Stuart have also watched the movie Inside Out, so they are similar to Sarah:

User | Inside Out | Movie A | Avengers | Movie B
John | Yes | Yes | Yes | Yes
Dave | No | Yes | No | Yes
Stuart | Yes | No | Yes | No
Sam | No | No | Yes | Yes
Sarah | Yes | ?? | ?? | ??
UBCF Working: Step 5
▪ Using the data of these similar users, we can see that the movie Avengers gets the most votes, so it is recommended to Sarah

User | Inside Out | Movie A | Avengers | Movie B
John | Yes | Yes | Yes | Yes
Dave | No | Yes | No | Yes
Stuart | Yes | No | Yes | No
Sam | No | No | Yes | Yes
Sarah | Yes | ?? | ?? | ??
Votes (from John & Stuart) | | 1 vote | 2 votes | 1 vote
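The five steps above can be sketched in Python (a minimal illustration; 'Movie A' and 'Movie B' are placeholder titles, since only Inside Out and Avengers are named in the example):

```python
# Watch history from the example ('Movie A'/'Movie B' are placeholder titles)
history = {
    'John':   {'Inside Out': 1, 'Movie A': 1, 'Avengers': 1, 'Movie B': 1},
    'Dave':   {'Inside Out': 0, 'Movie A': 1, 'Avengers': 0, 'Movie B': 1},
    'Stuart': {'Inside Out': 1, 'Movie A': 0, 'Avengers': 1, 'Movie B': 0},
    'Sam':    {'Inside Out': 0, 'Movie A': 0, 'Avengers': 1, 'Movie B': 1},
}

watched = 'Inside Out'  # Sarah's only watched movie

# Users similar to Sarah: those who also watched Inside Out
similar = [u for u, seen in history.items() if seen[watched]]

# Tally votes from similar users for every other movie
votes = {}
for user in similar:
    for movie, seen in history[user].items():
        if movie != watched and seen:
            votes[movie] = votes.get(movie, 0) + 1

recommendation = max(votes, key=votes.get)
print(similar)          # ['John', 'Stuart']
print(votes)            # {'Movie A': 1, 'Avengers': 2, 'Movie B': 1}
print(recommendation)   # Avengers
```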
Pros & Cons of User-based Filtering
Pros:
▪ Data not a constraint: works in a consumer-item scenario without any user or item feature data
▪ Easy to comprehend: the overall mathematical logic is easy to explain
▪ Differentiated output: produces more differentiated output than association rules

Cons:
▪ Cold start: needs enough users or items to find a match; does not work for a new user or item
▪ Sparsity: the user/ratings matrix is sparse, so it is hard to find users that have rated the same items
▪ Popularity bias: tends to recommend popular items; cannot recommend items to someone with a unique taste
Content Based Filtering
Content Based Filtering
Content-based filtering:
01. Has the content as the central entity
02. Works with data that the user provides, either explicitly (ratings) or implicitly (clicking on a link, purchase history)
03. Based on that data, a user profile is generated to make suggestions to the user
04. As the user provides more and more input, the engine’s accuracy increases
Content Based Filtering – An Example
If Sam buys DUFF consumer merchandise, content-based filtering considers the DUFF beer can as an entity and recommends other DUFF merchandise, such as a tee shirt, to the buyer.
CBF Working: Step 1
Consider the same example of movie recommendation.
Suppose we have watched the movie Inside Out. Let’s see how the recommendation engine works and which movies it thinks we would like to see next.
1. Generate a list of features about the movies, such as actors, directors, themes, etc.
CBF Working: Step 2
2. Compare the feature columns of each movie with those of the movie Inside Out and see which of them match:

Feature | Inside Out | Minions | Movie C | Movie D
Animated | Yes | Yes | No | No
Marvel | No | No | Yes | Yes
Super Villain | No | Yes | Yes | Yes
IMDB rating 8+ | Yes | No | Yes | No
Comedy | Yes | Yes | No | Yes
CBF Working: Step 3
3. The column with the most matches is that of Minions, so the system will recommend it:

Feature | Inside Out | Minions | Movie C | Movie D
Animated | Yes | Yes | No | No
Marvel | No | No | Yes | Yes
Super Villain | No | Yes | Yes | Yes
IMDB rating 8+ | Yes | No | Yes | No
Comedy | Yes | Yes | No | Yes
Matches | | 3 | 1 | 1
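The match-counting logic above can be sketched as follows (a minimal illustration; 'Movie C' and 'Movie D' are placeholder titles for the two unnamed movies):

```python
# Feature profiles from the example, encoded as 1 = Yes, 0 = No
features = ['Animated', 'Marvel', 'Super Villain', 'IMDB rating 8+', 'Comedy']
profiles = {
    'Inside Out': [1, 0, 0, 1, 1],
    'Minions':    [1, 0, 1, 0, 1],
    'Movie C':    [0, 1, 1, 1, 0],
    'Movie D':    [0, 1, 1, 0, 1],
}

target = profiles['Inside Out']

# Count matching feature values against Inside Out for every other movie
matches = {
    movie: sum(a == b for a, b in zip(profile, target))
    for movie, profile in profiles.items()
    if movie != 'Inside Out'
}

print(matches)                        # {'Minions': 3, 'Movie C': 1, 'Movie D': 1}
print(max(matches, key=matches.get))  # Minions
```

Counting exact feature matches is the simplest similarity measure; real systems typically use weighted features or cosine similarity over the same kind of profile vectors.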
Pros & Cons of Content Based Filtering
Pros:
▪ Only user data needed: no need for data on other users
▪ No differentiation problem: able to recommend to users with unique tastes
▪ No first-rater problem: able to recommend new and unpopular items

Cons:
▪ Finding important features: finding the appropriate features is hard; e.g. for movies and images, which features are important?
▪ Over-specialization: never recommends items outside the user’s content profile
▪ No quality judgements: unable to exploit the quality judgements of other users
Use-Case: E-Commerce Sites
Many of the largest e-commerce websites are already using recommender systems to help their customers find products to purchase.
▪ Products can be recommended based on the top overall sellers on a site, or based on an analysis of the customer’s past buying behavior as a prediction of future buying behavior.
Use-Case : Social Networks
Social networking sites employ recommendation systems to help provide better user experiences.
▪ Facebook and LinkedIn focus on link recommendation, where friend recommendations are presented to users.
▪ Most friend-suggestion mechanisms rely on pre-existing user relationships to pick friend candidates.
Building a Recommender System
Use Case: Scenario
▪ Consider the ratings dataset below, containing data on UserID, MovieID, Rating and Timestamp
▪ Each line of this file represents one rating of one movie by one user, and has the following format:
UserID::MovieID::Rating::Timestamp
▪ Ratings are made on a 5-star scale
❖ UserID: represents the ID of the user
❖ MovieID: represents the ID of the movie
❖ Rating: represents the rating given by the user to the movie
❖ Timestamp: represents seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970

Sample data:
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
6 86 3 883603013
62 257 2 879372434
286 1014 5 879781125
200 222 5 876042340
210 40 3 891035994
224 29 3 888104457

Data can be downloaded from the LMS
Use Case: Tasks To Do
1. Predict recommendations based on user-movie collaborative filtering
2. Estimate the user-movie model validation using root mean squared error
3. Predict recommendations based on movie-movie collaborative filtering
4. Estimate the movie-movie model validation using root mean squared error
Use Case Solution: Step 1
Load the ratings data into pandas with column labels

import pandas as pd

df = pd.read_csv('Recommend.csv', names=['user_id', 'movie_id', 'rating', 'timestamp'])
df
Use Case Solution: Step 2
Declare the number of users and movies and create a 75/25 train/test split

from sklearn.model_selection import train_test_split

n_users = df.user_id.unique().shape[0]
n_movies = df.movie_id.unique().shape[0]
train_data, test_data = train_test_split(df, test_size=0.25)

The data now gets split into train and test sets, such that the train data is 75% of the total data
Use Case Solution: Step 3
Populate the train matrix (user_id × movie_id) with ratings, such that [user_id index, movie_id index] = given rating

import numpy as np

train_data_matrix = np.zeros((n_users, n_movies))
for line in train_data.itertuples():
    # [user_id index, movie_id index] = given rating
    train_data_matrix[line[1]-1, line[2]-1] = line[3]
train_data_matrix
Use Case Solution: Step 4
Populate the test matrix (user_id × movie_id) with ratings, such that [user_id index, movie_id index] = given rating

test_data_matrix = np.zeros((n_users, n_movies))
for line in test_data.itertuples():
    # [user_id index, movie_id index] = given rating
    test_data_matrix[line[1]-1, line[2]-1] = line[3]
test_data_matrix
Use Case Solution: Step 5
Create cosine similarity matrices for users and movies, and predict a user-movie recommendation model (based on the difference from the mean rating, as it is a better indicator than the absolute rating)

from sklearn.metrics import pairwise_distances

user_similarity = pairwise_distances(train_data_matrix, metric='cosine')
movie_similarity = pairwise_distances(train_data_matrix.T, metric='cosine')

mean_user_rating = train_data_matrix.mean(axis=1)[:, np.newaxis]
ratings_diff = train_data_matrix - mean_user_rating
user_pred = mean_user_rating + user_similarity.dot(ratings_diff) / np.array([np.abs(user_similarity).sum(axis=1)]).T
user_pred
Use Case Solution: Step 6
Predict the same for the movie-based recommendation model

movie_pred = train_data_matrix.dot(movie_similarity) / np.array([np.abs(movie_similarity).sum(axis=1)])
movie_pred
Use Case Solution: Step 7
Define a root mean squared error (RMSE) function to check the validity of the user-based and movie-based recommendation models

from math import sqrt
from sklearn.metrics import mean_squared_error

def rmse(pred, test):
    # Compare predictions only at positions where the test matrix has ratings
    pred = pred[test.nonzero()].flatten()
    test = test[test.nonzero()].flatten()
    return sqrt(mean_squared_error(pred, test))
Use Case Solution: Step 8
Pass the user-based model that you have recently created into the rmse function
rmse(user_pred, test_data_matrix)
The error obtained is approximately 3.12. The lower the RMSE, the better the model fits the held-out ratings, so this value serves as the baseline for comparing the two models.
Use Case Solution: Step 9
Pass the movie-based model that you have recently created into the rmse function
rmse(movie_pred, test_data_matrix)
The error obtained is approximately 3.45, slightly higher than that of the user-based model.
Summary
▪ Association Rule Mining
▪ Support, Confidence & Lift Evaluation
▪ Apriori Algorithm
▪ Implementing Market Basket Analysis
▪ Recommendation Engines