
Data Mining in Recommender

The document discusses different data mining techniques that can be used to build recommender systems, including clustering, classification, and association rule mining. Clustering involves grouping users with similar preferences and making recommendations based on the opinions of others in the user's cluster. Classification uses machine learning algorithms like neural networks to classify users and items in order to recommend items to users. Both clustering and classification can analyze large datasets to infer recommendation rules.

Uploaded by Srinivas

The Application of Data-Mining to Recommender Systems

of the opinions of a set of neighbors for that item. As applied in recommender systems, neighbors are often generated online on a query-by-query basis rather than through the off-line construction of a more thorough model. As such, they have the advantage of being able to rapidly incorporate the most up-to-date information, but the search for neighbors is slow in large databases. Practical algorithms use heuristics to search for good neighbors and may use opportunistic sampling when faced with large populations.

Both nearest-neighbor and correlation-based recommenders provide a high level of personalization in their recommendations, and most early systems using these techniques showed promising accuracy rates. As such, CF-based systems have continued to be popular in recommender applications and have provided the benchmarks against which more recent applications have been compared.

DATA MINING IN RECOMMENDER APPLICATIONS

The term data mining refers to a broad spectrum of mathematical modeling techniques and software tools that are used to find patterns in data and use these to build models. In the context of recommender applications, the term data mining is used to describe the collection of analysis techniques used to infer recommendation rules or build recommendation models from large data sets. Recommender systems that incorporate data mining techniques make their recommendations using knowledge learned from the actions and attributes of users. These systems are often based on the development of user profiles that can be persistent (based on demographic or item “consumption” history data), ephemeral (based on the actions during the current session), or both. The algorithms used include clustering, classification techniques, the generation of association rules, and the production of similarity graphs through techniques such as Horting.

Clustering techniques work by identifying groups of consumers who appear to have similar preferences. Once the clusters are created, a prediction for an individual can be made by averaging the opinions of the other consumers in her cluster. Some clustering techniques represent each user with partial participation in several clusters; the prediction is then an average across the clusters, weighted by degree of participation.

Clustering techniques usually produce less-personal recommendations than other methods, and in some cases the clusters have worse accuracy than CF-based algorithms (Breese, Heckerman, & Kadie, 1998). Once the clustering is complete, however, performance can be very good, since the size of the group that must be analyzed is much smaller. Clustering techniques can also be applied as a “first step” for shrinking the candidate set in a CF-based algorithm or for distributing neighbor computations across several recommender engines. While dividing the population into clusters may hurt the accuracy of recommendations to users near the fringes of their assigned cluster, pre-clustering may be a worthwhile trade-off between accuracy and throughput.

Classifiers are general computational models for assigning a category to an input. The inputs may be vectors of features for the items being classified or data about relationships among the items. The category is a domain-specific classification such as malignant/benign for tumor classification, approve/reject for credit requests, or intruder/authorized for security checks.

One way to build a recommender system using a classifier is to use information about a product and a customer as the input and to have the output category represent how strongly to recommend the product to the customer. Classifiers may be implemented using many different machine-learning strategies, including rule induction, neural networks, and Bayesian networks. In each case, the classifier is trained using a training set in which ground-truth classifications are available. It can then be applied to classify new items for which the ground truths are not available. If subsequent ground truths become available, the classifier may be retrained over time.

For example, Bayesian networks create a model based on a training set, with a decision tree at each node and edges representing user information. The model can be built off-line over a matter of hours or days. The resulting model is very small, very fast, and essentially as accurate as CF methods (Breese, Heckerman, & Kadie, 1998). Bayesian networks may prove practical for environments in which knowledge of consumer preferences changes slowly with respect to the time needed to build the model, but they are not suitable for environments in which consumer preference models must be updated rapidly or frequently.

Classifiers have been quite successful in a variety of domains ranging from the identification of fraud

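As a minimal sketch of the correlation-based, query-by-query neighbor search described above, the Python below predicts a rating as the user's own mean rating plus a correlation-weighted average of the neighbors' deviations from their own means. The Pearson weighting, the mean-centering, the cutoff `k`, the function names `pearson` and `predict`, and the toy ratings are all assumptions chosen for illustration (one common formulation), not the specific algorithm of any system cited here.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple

# Hypothetical ratings store: user -> {item: rating}
Ratings = Dict[str, Dict[str, float]]


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation between two users over the items both have rated."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings: Ratings, user: str, item: str, k: int = 20) -> Optional[float]:
    """Predict `user`'s rating of `item` from up to k positively correlated neighbors.

    Neighbors are searched at query time, so the newest ratings are always used,
    but every query scans all users who rated the item.
    """
    target = ratings[user]
    mean_u = sum(target.values()) / len(target)
    scored: List[Tuple[float, str]] = []
    for other, theirs in ratings.items():
        if other != user and item in theirs:
            w = pearson(target, theirs)
            if w > 0:  # ignore uncorrelated or negatively correlated users
                scored.append((w, other))
    neighbors = sorted(scored, reverse=True)[:k]
    if not neighbors:
        return None
    num = 0.0
    for w, other in neighbors:
        theirs = ratings[other]
        mean_o = sum(theirs.values()) / len(theirs)
        num += w * (theirs[item] - mean_o)  # neighbor's deviation from their own mean
    den = sum(w for w, _ in neighbors)
    return mean_u + num / den


ratings: Ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 4},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
}
# carol is negatively correlated with alice, so only bob's opinion is used
print(round(predict(ratings, "alice", "d"), 2))  # 4.75
```

Because neighbors are found at query time, a rating added to `ratings` affects the very next prediction, but each call loops over every user, which is the scaling cost the text attributes to neighbor search in large databases.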

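The cluster-based prediction with partial participation described above can be sketched in the same spirit. This sketch assumes the clusters and each user's degree of participation were already computed offline (for example by a fuzzy clustering method, not shown); the names `cluster_profiles` and `predict`, the choice to weight each member's rating by their participation when forming a cluster's opinion, and the toy data are all hypothetical. With hard clustering, each user has a single membership of 1.0 and the prediction reduces to the plain average of her cluster.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Optional

Ratings = Dict[str, Dict[str, float]]     # user -> {item: rating}
Membership = Dict[str, Dict[str, float]]  # user -> {cluster: degree of participation}


def cluster_profiles(ratings: Ratings, membership: Membership) -> Dict[str, Dict[str, float]]:
    """Offline step: each cluster's opinion of each item.

    The opinion is the average rating of the cluster's members, with each member
    weighted by how strongly they participate in that cluster.
    """
    sums: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    weights: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    for user, items in ratings.items():
        for cluster, degree in membership.get(user, {}).items():
            for item, rating in items.items():
                sums[cluster][item] += degree * rating
                weights[cluster][item] += degree
    return {
        c: {i: sums[c][i] / weights[c][i] for i in sums[c] if weights[c][i] > 0}
        for c in sums
    }


def predict(profiles: Dict[str, Dict[str, float]], membership: Membership,
            user: str, item: str) -> Optional[float]:
    """Online step: average the cluster opinions, weighted by the user's participation."""
    num = den = 0.0
    for cluster, degree in membership[user].items():
        opinion = profiles.get(cluster, {}).get(item)
        if opinion is not None:
            num += degree * opinion
            den += degree
    return num / den if den else None


ratings: Ratings = {
    "u1": {"x": 5, "y": 1},
    "u2": {"x": 4, "y": 2},
    "u3": {"x": 1, "y": 5},
    "u4": {"y": 4},  # u4 has not rated x yet
}
membership: Membership = {
    "u1": {"A": 1.0},
    "u2": {"A": 1.0},
    "u3": {"B": 1.0},
    "u4": {"A": 0.25, "B": 0.75},  # partial participation in two clusters
}
profiles = cluster_profiles(ratings, membership)
print(predict(profiles, membership, "u4", "x"))  # 0.25 * 4.5 + 0.75 * 1.0 = 1.875
```

Only the short loop in `predict` runs per query, which reflects the point that performance can be very good once clustering is complete. The trade-off is also visible: users with identical memberships always receive identical predictions, which is why cluster-based recommendations tend to be less personal.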

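To make the classifier approach concrete, the sketch below trains a categorical naive Bayes classifier on (customer, product) features labeled with ground-truth outcomes and outputs a "recommend" or "skip" category. Naive Bayes is a deliberately simple stand-in chosen for brevity; it is not the Bayesian-network-with-decision-trees model of Breese, Heckerman, and Kadie (1998), nor rule induction or a neural network. The class name `NaiveBayesRecommender`, the feature names, the labels, and the smoothing parameter `alpha` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log
from typing import DefaultDict, Dict, List, Optional, Set, Tuple

Features = Dict[str, str]  # hypothetical categorical features of a (customer, product) pair


class NaiveBayesRecommender:
    """Categorical naive Bayes classifier with Laplace (add-alpha) smoothing.

    Input: customer and product features. Output: a category such as
    "recommend" or "skip" saying whether to recommend the product.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        # (label, feature name) -> counts of each feature value seen with that label
        self.value_counts: DefaultDict[Tuple[str, str], Counter] = defaultdict(Counter)
        self.feature_values: DefaultDict[str, Set[str]] = defaultdict(set)

    def train(self, examples: List[Tuple[Features, str]]) -> None:
        """Add labeled (ground-truth) examples; calling again with new ones retrains the model."""
        for features, label in examples:
            self.class_counts[label] += 1
            for name, value in features.items():
                self.value_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)

    def classify(self, features: Features) -> Optional[str]:
        """Return the label with the highest posterior log-probability (None if untrained)."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            score = log(n / total)  # log prior
            for name, value in features.items():
                k = max(len(self.feature_values.get(name, ())), 1)
                count = self.value_counts[(label, name)][value]
                score += log((count + self.alpha) / (n + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


history = [
    ({"segment": "student", "category": "textbook"}, "recommend"),
    ({"segment": "student", "category": "textbook"}, "recommend"),
    ({"segment": "student", "category": "toy"}, "recommend"),
    ({"segment": "parent", "category": "textbook"}, "recommend"),
    ({"segment": "parent", "category": "toy"}, "skip"),
    ({"segment": "parent", "category": "toy"}, "skip"),
]
model = NaiveBayesRecommender()
model.train(history)
print(model.classify({"segment": "parent", "category": "textbook"}))  # recommend
# New ground truths arrive: parents now ignore textbook suggestions.
model.train([({"segment": "parent", "category": "textbook"}, "skip")] * 3)
print(model.classify({"segment": "parent", "category": "textbook"}))  # skip
```

Because this model is just a set of counts, calling `train` with newly available ground truths updates it in place, mirroring the retraining step described in the text; the Bayesian-network models the text describes are instead built offline over hours or days.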