
Content-Based Recommendation
Content-based?

• Uses item descriptions to identify items that are of particular interest to the user
Example
Comparing with Non-content-based

• User-based CF: searches for similar users in the user-item rating matrix (items vs. users, filled with ratings)
• Content-based: no ratings needed; works from the item-feature matrix
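The comparison can be made concrete in a few lines. Below is a minimal sketch of the content-based side in Python, with an invented item-feature matrix (items, features, and scores are purely illustrative): similarity between items is computed from their features alone, with no ratings from other users.

import numpy as np

# Rows: items, columns: content features (e.g. keyword weights).
# All values are invented for illustration.
items = ["Item A", "Item B", "Item C"]
features = np.array([
    [1.0, 0.0, 0.5],
    [0.9, 0.1, 0.4],
    [0.0, 1.0, 0.8],
])

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

liked = 0  # suppose the user liked "Item A"
scores = sorted(
    ((items[j], cosine(features[liked], features[j]))
     for j in range(len(items)) if j != liked),
    key=lambda s: s[1], reverse=True)
print(scores)  # "Item B" comes out most similar to "Item A" by content

User-based CF would instead need the user-item rating matrix and other users' ratings to find similar users first.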
Item Representation

• Structured
• Unstructured
• Semi-structured
Structured
• Attribute-value pairs
Unstructured
• Full text
• No formally defined attributes
• Other complications, such as synonymy and polysemy
Semi-structured

• Structured + unstructured
• Well-defined attributes/values + free text
Conversion of Unstructured Data
• Need to convert to structured form
• IR techniques
• VSM, TF-IDF, stemming, etc.
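As a hedged illustration of this conversion, the sketch below uses scikit-learn's TfidfVectorizer to turn free text into VSM vectors; the three documents are invented, and stemming is omitted for brevity.

from sklearn.feature_extraction.text import TfidfVectorizer

# Invented example documents (unstructured full text).
docs = [
    "fire breaks out on alpine train",
    "train service resumes in the alps",
    "new phone released this week",
]
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(docs)          # (3 documents) x (vocabulary size)
print(vectorizer.get_feature_names_out())   # the extracted attributes (terms)
print(X.toarray().round(2))                 # TF-IDF weight of each term per doc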
User Profile
• A model of the user's preferences
• A history of the user's interactions
• Recently viewed items
• Already viewed items
• Training data for machine learning
User Profile
(Figure: an example profile as a decision tree over the contact-lens attributes Tear Rate, Age, Prescription, and Astigmatic, ending in Yes/No leaves)
User Profiling
(Diagram: the user modeling side collects information about the individual user into an adaptive user model; the adaptation side uses that model to provide the adaptation effect in the system)

• User profiling == user modeling


User Profile
• Customization
• Manual
• Limitations: requires user effort (interests change), no ordering
• Rule-based recommendation: rules derived from the user's history
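A minimal sketch of such a history-based rule, with invented categories and items: if one category dominates the user's history, recommend unseen items from that category.

from collections import Counter

# Invented viewing history and catalog; the "category:item" naming is hypothetical.
history = ["news:fire", "news:train", "sports:ski"]
catalog = {"news": ["news:alps"], "sports": ["sports:run"]}

top_category, _ = Counter(i.split(":")[0] for i in history).most_common(1)[0]
recommendations = [i for i in catalog[top_category] if i not in history]
print(recommendations)  # -> ['news:alps'], because "news" dominates the history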
User Profile
User Model Learning
• A classification problem
• Classifying items as Like or Dislike
• Training data: user feedback
• Probability of the classification (confidence)
• Unstructured data conversion
• Feature selection (high vs. low dimensionality)
User Model Learning
(Diagram: train/learn a user model from the training data, then classify/recommend on the target data)
Feedbacks
• Implicit feedback
• Indirect interaction
• Opened document, Reading time, etc.
• Large data, high uncertainty
UM Learning
Feedback
• Explicit feedback
• Obtained directly from users
• No noise, but hard to obtain
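To connect the two feedback types to the training data mentioned earlier, here is a hedged sketch of deriving Like/Dislike labels from implicit signals; the 30-second reading-time threshold is an invented heuristic, which is exactly why implicit feedback carries high uncertainty.

# Deriving Like/Dislike training labels from implicit feedback.
# The reading-time threshold is an invented heuristic, not a standard value.
events = [
    {"doc": "d1", "opened": True,  "reading_time_s": 120},
    {"doc": "d2", "opened": True,  "reading_time_s": 3},
    {"doc": "d3", "opened": False, "reading_time_s": 0},
]

def implicit_label(event, min_seconds=30):
    # Long reading time is treated as interest; inherently noisy.
    if event["opened"] and event["reading_time_s"] >= min_seconds:
        return "Like"
    return "Dislike"

labels = {e["doc"]: implicit_label(e) for e in events}
print(labels)  # {'d1': 'Like', 'd2': 'Dislike', 'd3': 'Dislike'}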
User Model Learning
Feature Selection
• Problem of high-dimensional input vectors
• Overfitting (especially when the dataset is small)
• Document frequency thresholding, information gain, mutual information, chi-square statistic, term strength
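One of the listed methods, the chi-square statistic, is available off the shelf in scikit-learn; a minimal sketch with invented documents and labels:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["fire on the train", "train fire in austria",
        "football match today", "new football season"]
labels = [1, 1, 0, 0]  # 1 = accident-related, 0 = not (labels invented)

vec = CountVectorizer()
X = vec.fit_transform(docs)                       # high-dimensional term counts
selector = SelectKBest(chi2, k=3).fit(X, labels)  # keep the 3 strongest terms
print(vec.get_feature_names_out()[selector.get_support()])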
Overfitting
(Figure: overfitted vs. underfitted decision boundaries)
User Model Learning
Feature Selection
• Mutual Information
• A = number of times t and c co-occur
• B = number of times t occurs without c
• C = number of times c occurs without t
• N = total number of documents

I(t, c) = log( (A × N) / ((A + C) × (A + B)) )
User Model Learning
Feature Selection
• Example: Austrian train fire accident
• After learning from 5 documents

      Fire    Train   Alps   Austria   People
A     5       5       2      5         5
B     5873    8092    93     974       34501
C     0       0       3      0         0
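Plugging the table into the formula above; N (the total number of documents in the corpus) is not given on the slide, so an assumed value is used here purely for illustration.

import math

# (A, B, C) counts per term, taken from the table above.
counts = {
    "Fire":    (5, 5873,  0),
    "Train":   (5, 8092,  0),
    "Alps":    (2, 93,    3),
    "Austria": (5, 974,   0),
    "People":  (5, 34501, 0),
}
N = 50000  # assumed corpus size, for illustration only

for term, (A, B, C) in counts.items():
    I = math.log((A * N) / ((A + C) * (A + B)))
    print(f"{term:8s} I = {I:5.2f}")
# Rare, class-specific terms like "Alps" and "Austria" score far higher
# than the very common "People", which is the point of the measure.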
Decision Tree

• Recursively partitions the dataset into a tree

• Ideal for structured, small datasets
• Performance, simplicity, understandability
• ID3 (Iterative Dichotomiser 3)
Decision Tree
Example
• Using WEKA
• https://www.cs.waikato.ac.nz/ml/weka/
• Machine learning algorithm package
• Java API, interactive UI
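The slides use WEKA's interactive UI; as a stand-in, here is a comparable sketch with scikit-learn's DecisionTreeClassifier (which implements CART rather than ID3) on a few invented contact-lens-style rows.

from sklearn.tree import DecisionTreeClassifier, export_text

# Encoded attributes: [age, prescription, astigmatic, tear_rate]; rows invented.
X = [
    [0, 0, 0, 0],  # young, myope, not astigmatic, reduced tear rate
    [0, 0, 0, 1],  # young, myope, not astigmatic, normal tear rate
    [1, 1, 1, 1],  # pre-presbyopic, hypermetrope, astigmatic, normal tears
    [2, 0, 1, 0],  # presbyopic, myope, astigmatic, reduced tear rate
]
y = ["No", "Yes", "Yes", "No"]  # recommend lenses?

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(
    tree, feature_names=["age", "prescription", "astigmatic", "tear_rate"]))
print(tree.predict([[0, 1, 0, 1]]))  # classify a new, unseen case

On this toy data the learned tree splits on tear_rate first, mirroring the root of the tree on the following slides.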
Decision Tree
Example - dataset
(Figure: the dataset split into training and testing sets; attributes serve as inputs, with one attribute to be estimated)
Decision Tree
Example - tree
(Figure: the learned tree splits on Tear Rate at the root, then on Astigmatic, Prescription, and Age, ending in Yes/No leaves)
Decision Tree
Example - evaluation
k Nearest Neighbor

• Prepare training data (classification labels)

• Extract the k most similar items
• Decide the label of the test data by looking at its k nearest neighbors
k Nearest Neighbor
Example
(Figure: the same test point classified with k = 3 and with k = 5)
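The same effect in code: a scikit-learn sketch on invented 2-D points where the predicted label flips between k = 3 and k = 5.

from sklearn.neighbors import KNeighborsClassifier

# Invented 2-D feature vectors with Like/Dislike labels.
X = [[0.5, 0.0], [0.0, 0.6],               # two Dislikes nearest the test point
     [0.7, 0.0], [0.0, 0.8], [-0.9, 0.0]]  # three Likes slightly farther away
y = ["Dislike", "Dislike", "Like", "Like", "Like"]

test = [[0.0, 0.0]]
for k in (3, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, knn.predict(test))  # k=3 -> Dislike, k=5 -> Like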
Linear Classifier

• Tries to find a hyperplane that best separates the classes
(Figure: training data separated by candidate hyperplanes)
Linear Classifier
SVM
• Support Vector Machine
• Maximizes the margin between the decision boundary and the support vectors (the closest training instances)
• Avoids overfitting
Linear Classifier
SVM
(Figure: maximum-margin hyperplane and its support vectors)
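A minimal sketch of a maximum-margin linear SVM with scikit-learn, on invented, linearly separable 2-D data; the support vectors it reports are exactly the closest training instances described above.

from sklearn.svm import SVC

# Invented, linearly separable training data.
X = [[0, 0], [1, 1], [0, 1], [4, 4], [5, 5], [4, 5]]
y = ["Dislike"] * 3 + ["Like"] * 3

svm = SVC(kernel="linear").fit(X, y)
print(svm.support_vectors_)           # the instances that define the margin
print(svm.predict([[1, 2], [5, 4]]))  # -> ['Dislike' 'Like']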
Naive Bayes

• VSM lacks theoretical justification

• A probabilistic text classification method
• Naive = term independence assumption
• Computes the probability that document d is classified to category c: P(c | d) ∝ P(c) × ∏ P(t | c) over the terms t of d
Naive Bayes
• Multivariate Bernoulli
• Document probability = product of (binary) term probabilities
• The term independence assumption is what makes it "naive"
Naive Bayes
• Multinomial
• Non-binary: uses term counts rather than term presence
Naive Bayes
Example
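A sketch of both variants with scikit-learn on invented documents: BernoulliNB models binary term occurrence (multivariate Bernoulli), while MultinomialNB uses non-binary term counts.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

docs = ["fire on the train", "train fire in the alps",
        "great new phone", "phone battery review"]
y = ["accident", "accident", "gadget", "gadget"]  # invented categories

vec = CountVectorizer()
X = vec.fit_transform(docs)
test = vec.transform(["fire near the alps"])

for model in (BernoulliNB(), MultinomialNB()):
    model.fit(X, y)
    # predict_proba is the probability that the document belongs to each category.
    print(type(model).__name__, model.predict(test),
          model.predict_proba(test).round(2))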
Conclusions
• The user model learns from the content itself (description, full text, etc.)
• Implicit and explicit feedback methods
• Classification into like/dislike
• Limitation: not enough content
• Hybrid approaches: content + collaborative
