Feature Reduction
● For a given test data instance, we have to find nearby
instances; for this we need a distance function.
● This distance function is computed in terms of the features.
● If the number of features is large, there is a problem:
the computed distance may not represent the actual distance.
● Features contain information about the target.
● More features mean more information and potentially better
classification.
● But we have to handle properly:
● - Irrelevant features: they create noise.
● - Redundant features: they lead to degradation of the
performance of the algorithm.
Feature Reduction
Feature Reduction: Type 1
● Feature Selection:
Let F = {X1, X2, X3, …, Xn}
F' ⊂ F = {X1', X2', X3', …, Xm'}
where F' is a subset of F.
● Here, the goal is to find a subset of the features that
optimizes a certain criterion.
Feature Reduction: Type 2
● Feature Extraction:
● It is a transformation of the original set of features
into a new set, which has a smaller
number of dimensions.
Feature Reduction
● Forward Selection:
● We start with an empty feature set and add one
feature at a time.
● Try each of the remaining features in turn.
● Estimate the classification/regression error after adding each
feature.
● Select the feature that gives the maximum improvement.
● Stop when there is no significant improvement.
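The steps above can be sketched as a greedy loop. This is a minimal illustration, not the lecture's code: `error_fn` is a stand-in for cross-validated classification/regression error of a model trained on the given subset, and `toy_error` is a made-up scoring function.

```python
def forward_selection(features, error_fn, min_improvement=1e-3):
    """Start from the empty set; repeatedly add the feature that most
    reduces the error; stop when no addition helps significantly."""
    selected = []
    best_error = error_fn(selected)
    while True:
        remaining = [f for f in features if f not in selected]
        if not remaining:
            break
        # Estimate the error of adding each remaining feature.
        new_error, best_f = min((error_fn(selected + [f]), f) for f in remaining)
        if best_error - new_error < min_improvement:
            break  # no significant improvement -> stop
        selected.append(best_f)
        best_error = new_error
    return selected

# Toy error function (hypothetical): pretend only "x1" and "x3" carry signal.
def toy_error(subset):
    useful = {"x1": 0.3, "x3": 0.2}
    return 0.9 - sum(useful.get(f, 0.0) for f in subset)

print(forward_selection(["x1", "x2", "x3", "x4"], toy_error))  # ['x1', 'x3']
```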
Feature Reduction
● Backward Selection:
● We start with the full feature set.
● Try removing each of the features in turn.
● Drop the feature whose removal has the smallest impact on
the error.
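Backward selection is the mirror image of the forward procedure. Again a hedged sketch with a made-up `toy_error` standing in for real model error:

```python
def backward_selection(features, error_fn, tolerance=1e-3):
    """Start from the full set; repeatedly drop the feature whose removal
    has the smallest impact on the error, while the error does not grow
    by more than `tolerance`."""
    selected = list(features)
    best_error = error_fn(selected)
    while len(selected) > 1:
        # Estimate the error after removing each feature in turn.
        new_error, worst_f = min(
            (error_fn([g for g in selected if g != f]), f) for f in selected
        )
        if new_error - best_error > tolerance:
            break  # every removal degrades the error too much
        selected.remove(worst_f)
        best_error = new_error
    return selected

# Toy error function (hypothetical): only "x1" and "x3" carry signal.
def toy_error(subset):
    useful = {"x1": 0.3, "x3": 0.2}
    return 0.9 - sum(useful.get(f, 0.0) for f in subset)

print(backward_selection(["x1", "x2", "x3", "x4"], toy_error))  # ['x1', 'x3']
```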
Feature Reduction
Filter Methods
1. IG - Information Gain Method
2. Chi-square Test
3. Correlation Coefficient
Wrapper Methods
1. Recursive feature elimination
2. Genetic Algorithms
Embedded Methods
1. Decision Tree
Selection of Optimal Features / Correlation Coefficient Method
A B C D E T
Step 1: Find the correlation of each attribute, e.g. A, with the target attribute T.
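Step 1 can be sketched as ranking the columns by their absolute Pearson correlation with T. The data below is made-up for illustration; in practice each list would be a real attribute column.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: column A tracks T exactly, column B is noise.
data = {
    "A": [1, 2, 3, 4, 5],
    "B": [5, 1, 4, 2, 3],
}
T = [2, 4, 6, 8, 10]

# Rank attributes by |correlation with the target|, strongest first.
ranking = sorted(data, key=lambda col: abs(pearson(data[col], T)), reverse=True)
print(ranking)  # ['A', 'B'] -- A is perfectly correlated with T
```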
Wrapper Methods
A B C D E T
Here, we have to make combinations (wrapping) of attributes with the target
attribute.
Let us consider: attribute A and target attribute T are provided to a machine
learning algorithm, which develops model M1.
Now consider attributes A and B combined with T and provided to the machine
learning algorithm, which develops model M2.
Next, A, B and C are combined with T and provided to the machine learning
algorithm, which develops model M3, and so on.
Then find the model that best matches the objective.
With N features, there are 2^N possible feature subsets!
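The exhaustive version of this idea can be sketched as follows. `fit_and_score` is a hypothetical stand-in for training models M1, M2, … and measuring how well each matches the objective; the scores are made-up. The loop makes the 2^N cost concrete, which is why real wrappers use greedy search instead.

```python
from itertools import combinations

def exhaustive_wrapper(features, fit_and_score):
    """Try every non-empty subset of features and keep the best-scoring one."""
    best_subset, best_score = (), float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            score = fit_and_score(subset)  # stand-in for train + evaluate
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset

def fit_and_score(subset):  # hypothetical objective: A and C help, others hurt
    useful = {"A": 2.0, "C": 1.5}
    return sum(useful.get(f, -0.5) for f in subset)

features = ["A", "B", "C", "D", "E"]
print(exhaustive_wrapper(features, fit_and_score))  # ('A', 'C')
print(2 ** len(features), "possible subsets")       # 32 possible subsets
```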
Feature Reduction Steps
Feature Reduction Steps (cont'd)
Evaluating feature subset
Feature selection
Pearson correlation coefficient
Signal to noise ratio
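One common form of the signal-to-noise score for a single feature in a two-class problem is |μ₊ − μ₋| / (σ₊ + σ₋): larger values suggest the feature separates the classes better. The sketch below uses made-up feature values; this is an assumed formulation, not taken from the slides.

```python
import statistics

def snr(pos_values, neg_values):
    """Signal-to-noise ratio of one feature: |mean difference between the
    two classes| divided by the sum of the class standard deviations."""
    mu_p, mu_n = statistics.mean(pos_values), statistics.mean(neg_values)
    sd_p, sd_n = statistics.pstdev(pos_values), statistics.pstdev(neg_values)
    return abs(mu_p - mu_n) / (sd_p + sd_n)

# Values of one attribute, split by class label (illustrative data):
positive = [5.0, 6.0, 5.5]
negative = [1.0, 2.0, 1.5]
print(round(snr(positive, negative), 2))  # 4.9 -- well-separated classes
```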
Multivariate feature selection
Collaborative Filtering Based Recommendation
System
● It is a form of instance-based learning.
Recommendation Systems
Why is there a need?
Types of Recommendation System
Techniques : Data Acquisition
1. Explicit Data
- Customer Ratings
- Feedback
- Demographics
- Psychographics
2. Implicit Data
- Purchase History
- Click or Browse History
3. Product Information
- Product Taxonomy
- Product Attributes
- Product Descriptions
Techniques : Recommendation Generation
1. Collaborative Filtering: this method finds a subset of users who have
similar tastes and preferences to the target user and uses this
subset for offering recommendations.
Basic Assumptions :
- Users with similar interests have common preferences.
- A sufficiently large number of user preferences is available.
Main Approaches :
- User Based
- Item Based
Techniques : Recommendation Generation
Types of Recommendation System
User Based Collaborative Filtering
● Advantage :
- No knowledge about item features needed
● Problems :
- New user cold start problem
- New item cold start problem: items with few ratings cannot easily be
recommended
- Sparsity problem: if there are many items to be recommended, the
user/rating matrix is sparse and it is hard to find users who have
rated the same items. The sparsity problem occurs when transactional or
feedback data is sparse and insufficient for identifying neighbors; it is a
major issue limiting the quality of recommendations and the applicability
of collaborative filtering in general.
- Popularity bias: tends to recommend only popular items.
e.g. RINGO, GroupLens
Types of Recommendation System
Item Based Collaborative Filtering
● Advantage :
- No knowledge about item features needed
- Better scalability, because correlations are computed between a limited
number of items instead of a very large number of users
- Reduced sparsity problem
● Problems :
- New user cold start problem
- New item cold start problem
e.g. Amazon, eBay
Types of Recommendation System
2. Content Based Filtering: recommendations are based on the content of items
rather than on other users' opinions.
User Profiles: create user profiles to describe the types of items that a user
prefers.
e.g. User1 likes sci-fi, action and comedy.
Recommendations on the basis of keywords are also classified as content
based.
e.g. IMDB, Last.fm
Types of Recommendation System
Content Based Systems Cont'd...
Advantage :
- No need for data on other users; no cold start and no sparsity problem.
- Able to recommend to users with unique tastes.
- Able to recommend new and unpopular items.
- Can provide an explanation for a recommendation.
Limitations:
- Data should be in a structured format.
- Unable to use quality judgments from other users.
Types of Recommendation System
Collaborative filtering based recommendation system
Let U be a set of users and S a set of items.
P is a utility function that gives the rating of an item by a user:
P: U × S → R
It indicates, for a user u and an item s, the rating of the
user for that item.
Now, learn P from data,
where the training data set consists of users' past ratings (for the rating
prediction problem) or their past purchase history.
Based on P, predict the utility value of each item for each user.
Collaborative filtering based recommendation system
In the previous phase, we found the similar users using the KNN
algorithm; now we use their ratings to make predictions.
Collaborative filtering based recommendation system
● One drawback of a user based
recommendation system is that when the number of
users increases, it becomes very difficult to handle.
● Amazon has millions of users.
Collaborative filtering based recommendation system
Collaborative Filtering
Finding Similar Users
Jaccard Similarity
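A minimal sketch of Jaccard similarity between two users: treat each user as the set of items they have rated (the rating values are ignored), and divide the overlap by the union. The item names below are made-up.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

user_a = {"HP1", "HP2", "TW"}
user_b = {"HP1", "SW1"}
print(jaccard(user_a, user_b))  # 1 shared item out of 4 total -> 0.25
```

Because it ignores the rating values, Jaccard treats a 1-star and a 5-star rating of the same item as equal agreement; that limitation motivates the cosine-based measures on the next slides.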
Cosine Similarity
Cosine Similarity: To Solve
Cosine Similarity
● In the above example, there is only one common rating between A and B
Cosine Similarity
● In this example, let us consider A and C; they have two common ratings, i.e. WT1 and TM1.
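A sketch of cosine similarity over rating vectors, where a missing rating is treated as 0; this zero-filling is one reason raw cosine can be misleading when users share only one or two common ratings. The item names WT1 and TM1 come from the slides, but the rating values here are made-up.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse rating vectors (dicts item -> rating);
    a missing rating counts as 0."""
    items = set(u) | set(v)
    dot = sum(u.get(i, 0) * v.get(i, 0) for i in items)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

A = {"WT1": 4, "TM1": 5, "HP1": 1}
C = {"WT1": 2, "TM1": 4}
print(round(cosine(A, C), 3))  # 0.966
```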
Conclusion From Results
Centered Cosine
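Centered cosine (Pearson-style) subtracts each user's mean rating before computing cosine, so a "tough" rater and a "generous" rater become comparable. A sketch with made-up ratings: B rates everything exactly 2 stars below A, and after centering their similarity is a perfect 1.0.

```python
import math

def centered(u):
    """Subtract the user's mean rating from each of their ratings."""
    mean = sum(u.values()) / len(u)
    return {i: r - mean for i, r in u.items()}

def cosine(u, v):
    items = set(u) | set(v)
    dot = sum(u.get(i, 0.0) * v.get(i, 0.0) for i in items)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

A = {"WT1": 4, "TM1": 5, "HP1": 3}
B = {"WT1": 2, "TM1": 3, "HP1": 1}  # same taste, just a harsher rater
print(round(cosine(centered(A), centered(B)), 3))  # 1.0
```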
Rating Prediction
In the first condition, we find the
average rating from the neighbors.
Rating Prediction
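The two usual prediction rules can be sketched as follows: the simple version is a plain average of the neighbors' ratings, and a common refinement weights each neighbor's rating by their similarity to the target user. The numbers below are illustrative.

```python
def predict_average(neighbor_ratings):
    """Plain average of the neighbors' ratings for the item."""
    return sum(neighbor_ratings) / len(neighbor_ratings)

def predict_weighted(neighbors):
    """Similarity-weighted average; `neighbors` is a list of
    (similarity, rating) pairs."""
    num = sum(sim * r for sim, r in neighbors)
    den = sum(sim for sim, _ in neighbors)
    return num / den

print(predict_average([4, 5, 3]))              # 4.0
print(predict_weighted([(0.9, 5), (0.1, 1)]))  # 4.6 -- leans toward the closer neighbor
```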
Item-Item CF
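A hedged sketch of the item-item idea: compute similarity between item columns of the rating matrix, then predict a user's rating for an item as a similarity-weighted average of that user's own ratings of other items. The tiny rating matrix is made-up, and computing the dot product over common raters only is one of several conventions.

```python
import math

ratings = {               # user -> {item: rating}; missing means unrated
    "u1": {"i1": 4, "i2": 5},
    "u2": {"i1": 5, "i2": 4, "i3": 2},
    "u3": {"i1": 1, "i3": 5},
}

def item_vector(item):
    """Column of the rating matrix: user -> rating for this item."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    common = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in common)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(user, item):
    """Weight the user's own ratings by item-item similarity."""
    num = den = 0.0
    for other, r in ratings[user].items():
        sim = cosine(item_vector(item), item_vector(other))
        num += sim * r
        den += abs(sim)
    return num / den if den else 0.0

print(round(predict("u1", "i3"), 2))
```

Only item columns are compared, so this scales with the number of items rather than the number of users, which is the scalability advantage claimed for item-based CF above.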
Thank You