M4: Recommender System
Introduction to Recommender Systems
Definition: Recommender systems are algorithms that suggest relevant
products or content to users based on their preferences or past behavior.
Importance: They help increase user satisfaction and boost sales by
providing personalized experiences.
Examples:
• Amazon’s “Customers who bought this item also bought”
• Netflix’s “Recommended for you”
Types of Recommender Systems
• Association Rule Mining
• It identifies patterns in large datasets by discovering rules that show the
relationship between items. For example, in a market basket analysis,
if many customers buy milk and bread together, the rule might be: If a
customer buys milk, they are likely to buy bread.
• Collaborative Filtering
• It predicts a user's preferences based on the preferences of similar
users. It can be user-based (similar users like similar items) or
item-based (similar items are liked by similar users). Widely used in
recommendation systems such as Netflix and Amazon.
• Matrix Factorization
• It breaks down a large matrix (e.g., user-item ratings matrix) into
lower-dimensional matrices to reveal latent factors. Commonly
used in collaborative filtering to find hidden relationships between
users and items, improving recommendations.
Dataset Overview
• Grocery Dataset: Contains transactions from a grocery store, where each record lists
items purchased together. It is commonly used for association rule mining to find
frequent itemsets (e.g., bread and milk) and generate rules (e.g., if bread is
purchased, milk is likely bought too). It helps businesses understand customer
buying patterns.
• MovieLens Dataset: Contains user ratings for movies, with the largest version
having over 20 million ratings. It is used for building and evaluating
recommendation systems, applying techniques like collaborative filtering and
matrix factorization to predict user preferences based on past ratings and user
similarities.
Association Rule (Association Rule Mining)
Association rule mining finds combinations of items that frequently
occur together in orders or baskets (in a retail context).
The items that frequently occur together are called itemsets.
Itemsets help to discover relationships between items that
people buy together, and they can serve as a basis for
strategies such as combining products into a combo offer or
placing products next to each other on retail shelves to attract
customer attention.
An application of association rule mining is in Market Basket
Analysis (MBA).
MBA is a technique used mostly by retailers to find
associations between items purchased by customers.
• The primary objective of a recommender system is to predict items that a
customer may purchase in the future based on his/her purchases so far. If a
customer buys beer, can we predict what he/she is most likely to
buy along with it? To predict this, we need to find out which items have
shown a strong association with beer in previously purchased baskets. We
can use the association rule mining technique to find this out.
• Association rule mining considers all possible combinations of items in the previous
baskets and computes various measures such as support, confidence, and lift
to identify rules with stronger associations. One of the challenges in association
rule mining is the number of combinations of items that need to be considered;
as the number of unique items sold by the seller increases, the number
of associations can grow exponentially. And in today's world,
retailers sell millions of items. Thus, association rule mining may require huge
computational power to go through all possible combinations. (Refer to the figure
in the previous slide.)
• One solution to this problem is to eliminate items that
cannot possibly be part of any frequent itemset. One such
algorithm is the Apriori algorithm, proposed by Agrawal
and Srikant (1994). The rules generated are represented as
• {diapers} → {beer}
• which means that customers who purchased diapers also
purchased beer in the same basket. {diapers, beer} together
is called an itemset. {diapers} is called the antecedent and
{beer} is called the consequent. Both antecedents and
consequents can have multiple items, e.g., {diapers,
milk} → {beer, bread} is also a valid rule. Each rule is
measured with a set of metrics.
Metrics Used in Association Rule Mining
Concepts such as support, confidence, and lift are used to
generate association rules.
Support indicates the frequency of items appearing together in baskets, relative
to all possible baskets being considered.
Confidence indicates how likely the consequent is to be purchased when the antecedent is
purchased, i.e., the conditional probability of buying Y given X.
Lift can be interpreted as the degree of association between two items. A lift value of 1 indicates that
the items are independent (no association); a lift value of less than 1 implies that the products are
substitutes (purchasing one product decreases the probability of purchasing the other); and a lift
value greater than 1 indicates that purchasing Product X increases the probability of purchasing
Product Y. A lift value greater than 1 is a necessary condition for generating association rules.
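These three metrics can be computed by simple counting. The following sketch uses hypothetical baskets (not taken from the grocery dataset) to work through one rule, {milk} → {bread}:

```python
# Toy baskets (hypothetical data) to illustrate support, confidence, and lift.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"beer"},
]
n = len(baskets)

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / n

# Rule: {milk} -> {bread}
sup_xy = support({"milk", "bread"})       # P(X and Y) = 3/5
confidence = sup_xy / support({"milk"})   # P(Y | X)   = 0.6 / 0.8 = 0.75
lift = confidence / support({"bread"})    # P(Y | X) / P(Y) = 0.75 / 0.6 = 1.25
```

Since lift is greater than 1 here, buying milk increases the probability of buying bread, so the rule is a candidate for generation.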
Generating Association Rules
1. Tools and Library
Python’s mlxtend library: This library provides efficient implementations of data mining
algorithms like Apriori and functions for generating association rules. It is widely used for
market basket analysis.
Step 1: Data Preprocessing
• One-hot encoding: Each transaction is represented in a binary format, where items
purchased are marked as ‘1’ and those not purchased as ‘0’.
• This format helps in efficient itemset generation using the Apriori algorithm.
Step 2: Applying Apriori Algorithm
• Setting a minimum support threshold: The minimum support is a user-defined value
(e.g., 0.01) that filters out infrequent itemsets.
• Only itemsets meeting this threshold are considered, reducing the computation time.
Generating Association Rules
Step 3: Extracting Frequent Itemsets
• Frequent Itemsets: Apriori generates itemsets that appear frequently together based on
the support threshold.
• These itemsets provide insights into common purchasing patterns.
Step 4: Generating Association Rules
• Rules Generation: Using metrics like confidence (likelihood of buying Y given X) and
lift (strength of the rule), association rules are derived from the frequent itemsets.
• Example: A rule like “If a customer buys bread, they are 70% likely to buy milk”.
6. Summary and Application
• Business Insights: These rules help businesses in product placement, targeted
promotions, and inventory management by revealing relationships between items.
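In practice, the steps above map to mlxtend's apriori and association_rules functions. As a library-free sketch of the same pipeline (transactions, support, and confidence thresholds here are all hypothetical):

```python
from itertools import combinations

# Hypothetical transactions; in practice these come from the grocery dataset.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]
min_support = 0.4
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

# Steps 2-3: frequent itemsets of size 1 and 2 meeting the support threshold.
items = sorted({i for t in transactions for i in t})
frequent_singles = {i for i in items if support({i}) >= min_support}
frequent_pairs = {
    frozenset(p) for p in combinations(items, 2) if support(set(p)) >= min_support
}

# Step 4: derive rules X -> Y from frequent pairs, keeping confidence >= 0.6.
rules = []
for pair in frequent_pairs:
    x, y = tuple(pair)
    for ante, cons in ((x, y), (y, x)):
        conf = support({ante, cons}) / support({ante})
        if conf >= 0.6:
            rules.append((ante, cons, round(conf, 2)))
```

A real implementation would also prune candidate itemsets using the Apriori property (every subset of a frequent itemset must itself be frequent), which is what keeps the combinatorial explosion in check.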
Pros and Cons of Association Rule Mining
The following are advantages of using association rules:
• 1. Transaction data, which is used for generating rules, is
always available and mostly clean.
• 2. The rules generated are simple and can be interpreted.
The following are disadvantages of using association rules:
• Association rules do not take into account the preferences or
ratings given by customers, which is important information
for generating rules. If customers have bought two items but
disliked one of them, then the association should not be
considered.
Collaborative Filtering
Definition: Collaborative filtering is a recommendation technique that suggests items to users based
on patterns of behavior and preferences, leveraging similarities either between users or items.
Variations of Collaborative Filtering
User-Based Collaborative Filtering:
• Identifies users with similar tastes or preferences based on their past ratings or interactions.
Recommends items liked by similar users that the target user has not yet rated or interacted with.
• Example: If User A and User B both liked similar movies, User B might be recommended a
movie that User A enjoyed but User B has not yet watched.
Item-Based Collaborative Filtering:
• Focuses on finding items that are similar based on user interactions. Recommends items that are
frequently rated similarly by many users.
• Example: If many users who liked Movie X also liked Movie Y, then Movie Y is recommended to
users who liked Movie X.
How to Find Similarity between Users?
The picture in Figure 9.2 depicts three users, Rahul, Purvi, and Gaurav, and the books they have
bought and rated. The users are represented using their ratings in the Euclidean space in Figure 9.3.
Here the dimensions are represented by the two books Into Thin Air and Missoula, which are the two
books commonly bought by Rahul, Purvi, and Gaurav.
Rahul's preferences are similar to Purvi's rather than to Gaurav's. So the other book, Into the Wild, which
Rahul has bought and rated highly, can now be recommended to Purvi.
• Collaborative filtering comes in two variations:
1. User-Based Similarity: Finds K similar users based on
common items they have bought.
2. Item-Based Similarity: Finds K similar items based on
common users who have bought those items.
Calculating Cosine Similarity
Definition
• Cosine similarity measures the cosine of the angle between two vectors (e.g., rating vectors of users
or items). It calculates how similar two users or items are based on their ratings or interactions.
Range
• Cosine similarity values range from -1 to 1:
• 1 indicates perfect similarity (the vectors are identical).
• 0 indicates no similarity (the vectors are orthogonal or unrelated).
• Negative values may indicate opposite preferences, though they are less common in practical recommendation
systems.
Application
• Used in Collaborative Filtering to identify similar users or items, helping to make accurate
recommendations based on shared preferences or behaviors. It is especially useful when comparing
user-item interaction patterns.
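A quick worked sketch of this measure, using two hypothetical rating vectors over the same five items (0 marks an unrated item):

```python
import math

# Hypothetical rating vectors for two users over the same five items.
user_a = [5, 3, 0, 4, 4]
user_b = [4, 3, 0, 5, 3]

# cosine(a, b) = (a . b) / (||a|| * ||b||)
dot = sum(a * b for a, b in zip(user_a, user_b))
norm_a = math.sqrt(sum(a * a for a in user_a))
norm_b = math.sqrt(sum(b * b for b in user_b))
cos_sim = dot / (norm_a * norm_b)  # close to 1: very similar rating patterns
```

Because both users rate the common items similarly, the angle between the vectors is small and the cosine similarity is close to 1.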
Calculating Cosine Similarity between Users
(Refer to Table 9.5, User-Based Similarity.)
Challenges with User-Based Similarity
• Finding user similarity does not work for new users.
• We need to wait until the new user has bought and rated a few items.
Only then can users with similar preferences be found and
recommendations be made based on that. This is called the cold start
problem in recommender systems.
• This can be overcome by using item-based similarity. Item-based
similarity is based on the notion that if two items have been bought by
many users and rated similarly, then there must be some inherent
relationship between these two items. In other terms, in future, if a user
buys one of those two items, he or she will most likely buy the other one.
Item-Based Similarity
• If two movies, movie A and movie B, have been watched by several users and rated very similarly, then
movie A and movie B can be similar in taste. In other words, if a user watches movie A, then he or she is
very likely to watch B and vice versa.
USING SURPRISE LIBRARY
• For real-world implementations, we need a more extensive library
which hides all the implementation details and provides abstract
Application Programming Interfaces (APIs) to build recommender
systems. Surprise is a Python library for accomplishing this.
• 1. Various ready-to-use prediction algorithms, such as neighborhood
methods (user similarity and item similarity) and matrix
factorization-based algorithms. It also has built-in similarity measures
such as cosine, mean squared difference (MSD), and Pearson
correlation coefficient.
• 2. Tools to evaluate, analyze, and compare the performance of the
algorithms. It also provides methods to make recommendations.
User-Based Similarity Algorithm
• The surprise.prediction_algorithms.knns.KNNBasic provides the
collaborative filtering algorithm and takes the following parameters:
1. k: The (max) number of neighbors to take into account for
aggregation.
2. min_k: The minimum number of neighbors to take into account for
aggregation, if there are not enough neighbors.
3. sim_options (dict): A dictionary of options for the similarity measure.
(a) name: Name of the similarity measure to use, e.g., cosine, msd, or
pearson. (b) user_based: True for user-based similarity and False for
item-based similarity.
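Conceptually, such a predictor scores an unseen item as the similarity-weighted average of the k most similar users' ratings of that item. The following is a library-free sketch of that idea on hypothetical users and ratings; it is not Surprise's actual implementation:

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
    "dave":  {"m1": 5, "m2": 4},  # has not rated m3 yet
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item, k=2):
    """Predict user's rating of item as the similarity-weighted
    average over the k most similar users who rated the item."""
    neighbors = sorted(
        ((cosine(ratings[user], ratings[v]), ratings[v][item])
         for v in ratings if v != user and item in ratings[v]),
        reverse=True)[:k]
    num = sum(sim * r for sim, r in neighbors)
    den = sum(sim for sim, _ in neighbors)
    return num / den if den else None
```

Here dave's predicted rating for m3 lands between the ratings of his two most similar neighbors, alice and bob.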
Finding the Best Model
Sparse Matrix
• In real-world datasets, most users have rated only a few items, making the matrix sparse (i.e., containing
many NaN values).
• To simplify computation, these NaN entries are often replaced with 0s, assuming the user has not interacted
with the item.
• This helps when applying techniques like matrix factorization, which require numerical data.
MATRIX FACTORIZATION
• Matrix factorization is a matrix decomposition technique. Matrix
decomposition is an approach for reducing a matrix into its constituent
parts. Matrix factorization algorithms decompose the user-item matrix
into the product of two lower dimensional rectangular matrices.
• In Figure 9.4, (next slide) the original matrix contains users as rows,
movies as columns, and rating as values. The matrix can be decomposed
into two lower dimensional rectangular matrices.
• The Users–Movies matrix contains the ratings of 3 users (U1, U2, U3)
for 5 movies (M1 through M5). This Users–Movies matrix is factorized
into a (3, 3) Users–Factors matrix and a (3, 5) Factors–Movies matrix.
Multiplying the Users–Factors and Factors–Movies matrices will result in
the original Users–Movies matrix.
The idea behind matrix factorization is that there are latent factors that determine why a
user rates a movie, and the way he/she rates. The factors could be the story or actors or
any other specific attributes of the movies. But we may never know what these factors
actually represent. That is why they are called latent factors. A matrix with size (n, m),
where n is the number of users and m is the number of movies, can be factorized into (n,
k) and (k, m) matrices, where k is the number of factors.
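One standard way to obtain such a factorization is singular value decomposition (SVD). The rating values below are hypothetical, and k is set equal to the number of users so the reconstruction is exact, matching the (3, 3) × (3, 5) shapes described for Figure 9.4; in practice k is chosen smaller than both n and m to compress the matrix and surface latent factors:

```python
import numpy as np

# A small Users x Movies rating matrix (hypothetical values):
# 3 users (rows) rating 5 movies (columns).
R = np.array([
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [4.0, 3.0, 5.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 5.0, 4.0],
])

k = 3  # number of latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Users-Factors matrix (3, k) and Factors-Movies matrix (k, 5).
user_factors = U[:, :k] * s[:k]
movie_factors = Vt[:k, :]

# Multiplying them back reproduces the original ratings matrix.
R_hat = user_factors @ movie_factors
```

With k smaller than the matrix rank, R_hat becomes an approximation, and the unfilled entries of a sparse ratings matrix can be read off from it as predicted ratings.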
Real-World Applications
Retail (Product recommendations based on purchase history):
• Amazon uses a collaborative filtering recommendation system, where it suggests products like "Customers who bought this
also bought" based on users' browsing and purchase patterns
• Sephora, a beauty retailer, leverages product recommendations on product pages based on customer ratings and purchase
history. They suggest complementary products like makeup tools and skincare along with the primary product
Streaming Services (Movie or music recommendations):
• Netflix uses collaborative filtering to recommend movies and TV shows based on the user’s viewing history and similarities
to other users
• Spotify recommends playlists and songs using machine learning algorithms that analyze a user's listening habits, favorite
genres, and even the time of day
E-commerce (Personalized suggestions for cross-selling and upselling):
• eBay utilizes cross-selling recommendations such as "Frequently Bought Together" and upselling like "This item is part of
a more expensive version" to drive higher value purchases
• Best Buy offers upselling suggestions, like recommending a more advanced version of a tech product (e.g., higher-end
laptops) or additional accessories based on the user's interests
Part 2: Text Analytics
TEXT ANALYTICS OVERVIEW
Definition: Text analytics is the process of transforming
unstructured text into structured data for analysis.
Applications:
Sentiment Analysis: Used by businesses to understand customer
sentiment in reviews.
Spam Detection: Identifies spam emails by recognizing patterns
in words.
Topic Extraction: Clusters news articles into topics like politics,
sports, and technology.
Language Identification: Recognizes the language of a text (e.g.,
English, Spanish) for translation services.
Tools: Natural Language Processing (NLP), machine learning,
statistical analysis.
TEXT CLASSIFICATION AND SENTIMENT ANALYSIS
Text Classification: Assigns predefined categories to text data based on its
content.
● Examples: Classifying customer reviews as “positive” or “negative,” or
categorizing emails as “spam” or “not spam.”
Sentiment Analysis:
● Definition: Sentiment analysis determines the emotion (positive,
negative, neutral) expressed in text.
● Example: "The movie was fantastic!" (positive sentiment); "I wasted my
time on this movie" (negative sentiment).
● Importance: Companies use sentiment analysis to gauge public opinion
on products, events, and services
Exploring the Dataset
• Loading the data.
• Getting the positive sentiments (output shown on slide).
• Exploring the sentiment data using matplotlib (output shown on slide).
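The slide's code and output are not reproduced here. The following sketch illustrates the same exploration steps on a small hypothetical set of labeled comments; a real workflow would load the dataset with pandas and visualize the class counts with a matplotlib bar chart:

```python
from collections import Counter

# Hypothetical labeled comments (text, sentiment): 1 = positive, 0 = negative.
# In the slides these records are loaded from a sentiment dataset file.
data = [
    ("The movie was fantastic!", 1),
    ("I wasted my time on this movie", 0),
    ("Great acting and story", 1),
    ("Terrible plot", 0),
    ("Loved it", 1),
]

# "Getting positive sentiments": filter the records labeled 1.
positive = [text for text, label in data if label == 1]

# Class distribution: the quantity the matplotlib bar chart visualizes.
counts = Counter(label for _, label in data)
```

Checking the class distribution early matters because a heavily imbalanced dataset can make a sentiment classifier look more accurate than it really is.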
Text Pre-processing
One way is to consider each word as a feature and find a measure to capture whether
a word exists or does not exist in a sentence. This is called the bag-of-words (BoW)
model. That is, each sentence (comment on a movie or a product) is treated as a bag
of words. Each sentence (record) is called a document and collection of all documents
is called corpus.
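A minimal bag-of-words sketch over a two-document hypothetical corpus, where every unique word in the corpus becomes a feature and each document becomes a vector of word counts:

```python
from collections import Counter

# Two hypothetical documents (comments); together they form the corpus.
corpus = ["the movie was great", "the movie was boring"]

# Vocabulary: every unique word across the corpus, in sorted order.
vocab = sorted({w for doc in corpus for w in doc.split()})

# Each document becomes a vector of word counts over the vocabulary.
bow = [[Counter(doc.split())[w] for w in vocab] for doc in corpus]
```

The two vectors differ only in the positions for "great" and "boring", which is exactly the signal a sentiment classifier trained on these features would pick up.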
2. Informal Language
- Language on social media is often informal, includes a mix of languages, and
uses emoticons.
- Training data should include similar examples to help the model learn from these
variations.