CH 2
BTCOC503
Lecture Number    Topic to be covered
2                 ➢ Feature reduction
3                 ➢ Collaborative filtering
6                 ➢ Naïve Classifier
Submitted by: Prof. S. B. Mehta
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Nutan College Of Engineering & Research, Talegaon Dabhade, Pune - 410507
Machine Learning
Techniques of Instance-Based Learning:
1. K-Nearest Neighbor Learning
2. Locally Weighted Regression
3. Case-Based Reasoning

1. K-Nearest Neighbor (K-NN) Learning:
● K-Nearest Neighbors is one of the simplest Machine Learning algorithms based on Supervised
Learning technique.
● K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
● K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
● K-NN algorithm can be used for regression as well as for classification, but it is mostly used for classification problems.
● K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data.
● It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs the computation at the time of classification.
● The KNN algorithm simply stores the dataset during the training phase, and when it gets new data, it classifies that data into the category that is most similar to the new data.
● Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will compare the features of the new image with those of the cat and dog images, and based on the most similar features it will put the image in either the cat or the dog category.
The working of K-NN can be explained on the basis of the algorithm below:
• Step-1: Select the number K of the neighbors.
• Step-2: Calculate the Euclidean distance between the new data point and the existing data points.
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step-4: Among these k neighbors, count the number of the data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step-6: Our model is ready.
Example:
Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1; in which of these categories will this data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Suppose we have a new data point and we need to put it in the required category.
Consider the below image:
Firstly, we will choose the number of neighbors; here we choose k = 5.
• Next, we will calculate the Euclidean distance between the new data point and the existing data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated as:
d = √((x2 − x1)² + (y2 − y1)²)
• By calculating the Euclidean distance, we get the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the below image:
• As we can see, the three nearest neighbors are from category A; hence this new data point must belong to category A.
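As a rough illustration of these steps, here is a minimal from-scratch sketch in Python; the small two-feature dataset, the category labels and the value of k are illustrative assumptions, not data from these notes:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 1-2: compute the Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: take the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Step 4-5: count the categories among the neighbors and assign the majority category
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Illustrative data: two features, categories 'A' and 'B'
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 0.8], [5.0, 8.0], [6.0, 9.0], [5.5, 8.5]])
y_train = np.array(['A', 'A', 'A', 'B', 'B', 'B'])
print(knn_predict(X_train, y_train, np.array([1.4, 1.5]), k=5))   # -> 'A'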
2. Locally Weighted Regression:
The blue dots in the figure are the training data. We have a test point, and we want to predict its target value. Obviously, fitting one line to this whole dataset will lead to a value that is far off the real one. Let us use this weighting concept and only look at a few nearby points, performing the regression using those nearby points only. That is significantly better: the predicted value is now something we would expect, given how the curve looks. Let us now go over the math for this, and see how standard linear regression is changed into locally weighted regression.
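The standard formulation weights each training point by its distance to the query point and then solves a weighted least-squares problem. Below is a minimal sketch under that assumption; the Gaussian kernel, the bandwidth tau and the toy sine data are illustrative choices, not taken from these notes:

import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    # Weight each training point by its closeness to the query point (Gaussian kernel)
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))
    # Solve the weighted least-squares problem: minimise sum_i w_i * (y_i - theta0 - theta1*x_i)^2
    A = np.c_[np.ones_like(X), X]          # design matrix with an intercept column
    W = np.diag(w)
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return theta[0] + theta[1] * x_query   # evaluate the local line at the query point

# Illustrative non-linear data: one global line would fit this very poorly
X = np.linspace(0, 10, 50)
y = np.sin(X) + 0.1 * np.random.randn(50)
print(lwr_predict(2.0, X, y, tau=0.5))     # prediction from a line fitted only to points near x = 2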
3. Case-Based Reasoning (CBR) Classifier:
• Case-based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. It deals with very specific data from previous situations, and reuses results and experience to fit a new problem situation.
• CBR is a problem-solving technique that matches a new case with previously solved cases and their solutions, both of which are stored in a database.
How does CBR work?
When a new case arises to classify, a Case-based Reasoner (CBR) will first check if an identical training
case exists. If one is found, then the accompanying solution to that case is returned. If no identical case is
found, then the CBR will search for training cases having components that are similar to those of the new
case. Conceptually, these training cases may be considered as neighbors of the new case. If cases are
represented as graphs, this involves searching for subgraphs that are similar to subgraphs within the new
case. The CBR tries to combine the solutions of the neighboring training cases to propose a solution for
the new case. If incompatibilities arise with the individual solutions, then backtracking to search for other
solutions may be necessary. The CBR may employ background knowledge and problem-solving
strategies to propose a feasible solution.
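A minimal sketch of this retrieve-and-reuse loop is given below; the representation of cases as attribute dictionaries, the overlap-based similarity measure and the majority-vote combination of solutions are illustrative assumptions:

def similarity(case_a, case_b):
    # Illustrative similarity: fraction of shared attributes on which the two cases agree
    keys = case_a.keys() & case_b.keys()
    return sum(case_a[k] == case_b[k] for k in keys) / max(len(keys), 1)

def cbr_solve(new_case, case_base, k=3):
    # 1. If an identical training case exists, return its stored solution
    for case, solution in case_base:
        if case == new_case:
            return solution
    # 2. Otherwise retrieve the most similar training cases (the "neighbors") ...
    neighbors = sorted(case_base, key=lambda cs: similarity(cs[0], new_case), reverse=True)[:k]
    # 3. ... and combine their solutions (here by majority vote) to propose a solution
    solutions = [s for _, s in neighbors]
    return max(set(solutions), key=solutions.count)

# Illustrative case base: (case attributes, solution/class)
case_base = [({"fever": True, "cough": True}, "flu"),
             ({"fever": False, "cough": True}, "cold"),
             ({"fever": True, "cough": False}, "flu")]
print(cbr_solve({"fever": True, "cough": True, "headache": True}, case_base))   # -> 'flu'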
• Remembering past experiences helps learners avoid repeating previous mistakes, and the reasoner can
discern what features of a problem are significant and focus on them.
• CBR is intuitive because it reflects the way people work. Because no knowledge must be elicited to create rules or methods, development is easier.
• Another benefit is that systems learn by acquiring new cases through use, which makes maintenance easier.
Disadvantages of CBR
Recommender System:
A recommender system makes predictions based on users' historical behaviors; specifically, it predicts a user's preference for a set of items based on past experience.
During the last few decades, with the rise of YouTube, Amazon, Netflix and many other such web services, recommender systems have taken a larger and larger place in our lives. From e-commerce (suggest
to buyers articles that could interest them) to online advertisement (suggest to users the right contents,
matching their preferences), recommender systems are today unavoidable in our daily online journeys.
In a very general way, recommender systems are algorithms aimed at suggesting relevant items to users
(items being movies to watch, text to read, products to buy or anything else depending on industries).
The two most popular approaches are:
1. Content-Based Recommender System
2. Collaborative Filtering Recommender System
• Collaborative Filtering is a technique which is widely used in recommendation systems and is a rapidly advancing research area.
• Collaborative filtering models try to find similarities between items/users through commonly rated/owned items.
• Collaborative Filtering is the process of filtering or evaluating items using the opinions of other
people. This filtering is done by using profiles. Collaborative filtering techniques collect and establish
profiles, and determine the relationships among the data according to similarity models. The possible
categories of the data in the profiles include user preferences, user behavior patterns, or item
properties.
• For each user, recommender systems recommend items based on how similar users liked the item.
• Example: Alice and Bob are users who have similar interests in video games, so games that Alice liked can be recommended to Bob.
• Collaborative filtering is an unsupervised learning technique in which we make predictions from ratings supplied by people. In the user-item rating matrix, each row represents the ratings given by one person and each column represents the ratings received by one movie.
• Collaborative filtering is a technique that can filter out items that a user might like on the basis of
reactions by similar users.
• It works by searching a large group of people and finding a smaller set of users with tastes similar to
a particular user. It looks at the items they like and combines them to create a ranked list of
suggestions.
• The functionalities of a collaborative filtering recommendation system can be stated as: (a) predict the rating that a given user would assign to a given item, and (b) recommend a ranked list of items to a user. A recommendation request must score many candidate items, while a prediction request scores only one; therefore, a single prediction request can afford a more expensive prediction calculation than a recommendation request.
Collaborative filtering uses different methods to calculate the similarity between two products or two
users. In an item-based approach, a product is compared to other products. The more similar the
interactions of customers between these two products are, the more they fit together. With the user-based
approach, the same happens, but instead of products, customers are compared with each other. With the
help of the similarity matrix, a predict function can be used to create a predicted rating for each product
with which a customer has not yet interacted. Based on these predicted ratings, products can then be
recommended.
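As a rough sketch of that pipeline (the tiny rating matrix, the use of cosine similarity and the simple weighted average are illustrative assumptions, not a prescribed implementation):

import numpy as np

# Rows = users, columns = items; 0 means "not yet rated" (illustrative data)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Item-based similarity matrix: compare item columns through the users' interactions
n_items = R.shape[1]
S = np.array([[cosine_sim(R[:, i], R[:, j]) for j in range(n_items)] for i in range(n_items)])

def predict(user, item):
    # Predicted rating = similarity-weighted average of the ratings this user has already given
    rated = np.where(R[user] > 0)[0]
    weights = S[item, rated]
    return weights @ R[user, rated] / (weights.sum() + 1e-9)

print(predict(user=1, item=2))   # predicted rating for an item user 1 has not interacted with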
The two most popular collaborative filtering algorithms are categorized as:
1. Memory-based
2. Model-based

1. Memory-based:
• Memory-based algorithms approach the collaborative filtering problem by using the entire database. They use the data you have (likes, votes, clicks, etc.) to establish correlations (similarities) between either users (user-based filtering) or items (item-based filtering) in order to recommend an item i to a user u who has never seen it before.
• Memory-based models calculate the similarities between users / items based on user-item rating
pairs.
• Memory-based recommendation generalizes from the stored data at the time a prediction is needed, which is why it is also referred to as lazy learning. In memory-based learning, users are divided into groups based on their interests. When a new user comes into the system, we determine the neighbors of that user in order to make predictions for him. Memory-based recommendation uses the entire user-item database, or a sample of it, to make predictions.
• The main idea behind user-based collaborative filtering (UB-CF) is that people with similar characteristics share similar tastes. For example, suppose I want to recommend a movie to my friend Bob, and Bob and I have seen many movies together and rated them almost identically. It makes sense to think that in the future we would continue to like similar movies, so this similarity can be used to recommend movies.
The two approaches are:
User-based:
• In the user-based approach, users who have given similar ratings to similar items are found, and then the target user's rating for an item with which the target user has never interacted is predicted.
• The user-based approach finds similar users and gives recommendations based on what other people with similar consumption patterns appreciated.
• The "nearest neighbors" approach for recommendations looks at the user's rating patterns and finds the "nearest neighbors", i.e. users with ratings similar to yours. The algorithm then proceeds to give you recommendations based on the ratings of these neighbors.
• For a user U, with a set of similar users determined based on rating vectors consisting of given
item ratings, the rating for an item I, which hasn’t been rated, is found by picking out N users from
the similarity list who have rated the item I and calculating the rating based on these N ratings.
Item-based:
• Item based collaborative filtering finds similarity patterns between items and recommends them
to users based on the computed information
• Item-based collaborative filtering was introduced in 1998 by Amazon [6]. Unlike user-based collaborative filtering, item-based filtering looks at the similarity between different items, and does this by taking note of how many users who bought item X also bought item Y. If the correlation is high enough, the two items can be presumed to be similar to one another. Item Y will from then on be recommended to users who bought item X, and vice versa.
The picture depicts a graph of how users' ratings affect their recommendations.
Amazon currently uses item-to-item collaborative filtering, which scales to massive data sets and
produces high-quality recommendations in real time. This type of filtering matches each of the user's
purchased and rated items to similar items, then combines those similar items into a recommendation list
for the user.
For an item I, with a set of similar items determined based on rating vectors consisting of received user
ratings, the rating by a user U, who hasn’t rated it, is found by picking out N items from the similarity list
that have been rated by U and calculating the rating based on these N ratings.
2. Model-based:
• Model-based recommendation systems involve building a model based on the dataset of ratings.
In other words, we extract some information from the dataset, and use that as a "model" to make
recommendations without having to use the complete dataset every time. This approach
potentially offers the benefits of both speed and scalability.
• Model-based collaborative filtering is a two-stage process for recommendations: in the first stage a model is learned offline; in the second stage a recommendation is generated for a new user based on the learned model.
• Model-based techniques on the other hand try to further fill out this matrix. They tackle the task
of “guessing” how much a user will like an item that they did not encounter before. For that they
utilize several machine learning algorithms to train on the vector of items for a specific user, then
they can build a model that can predict the user’s rating for a new item that has just been added to
the system.
• Popular model-based techniques are Bayesian Networks, Singular Value Decomposition, and
Probabilistic Latent Semantic Analysis (or Probabilistic Latent Semantic Indexing). For some
reason, all model-based techniques do not enjoy particularly happy-sounding names.
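A minimal sketch of the model-based idea using a truncated SVD of the user-item matrix is given below; the toy matrix and the number of latent factors are illustrative assumptions (real systems typically use iterative matrix factorization that ignores the missing entries rather than treating them as zeros):

import numpy as np

# Same kind of user-item rating matrix as before; 0 means "not rated" (illustrative data)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

# Stage 1 (offline): learn a low-rank model of the rating matrix with SVD
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                          # number of latent factors (assumed)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k reconstruction

# Stage 2 (online): read predicted ratings for unseen items off the reconstructed matrix
print(R_hat[1, 2])   # the model's guess for user 1, item 2, which was never rated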
Feature Reduction:
• Feature reduction, also known as dimensionality reduction, is the process of reducing the number of features in a resource-heavy computation without losing important information.
• Reducing the number of features means the number of variables is reduced, making the computer's work easier and faster.
• In machine learning classification problems, there are often too many factors on the basis of which
the final classification is done. These factors are basically variables called features.
• The higher the number of features, the harder it gets to visualize the training set and then work on
it. Sometimes, most of these features are correlated, and hence redundant. This is where
dimensionality reduction algorithms come into play. Dimensionality reduction is the process of
reducing the number of random variables under consideration, by obtaining a set of principal
variables.
Feature reduction can be divided into two processes:
1. Feature selection:
Feature selection is the process of reducing the number of input variables when developing a predictive
model.
It is desirable to reduce the number of input variables to both reduce the computational cost of modeling
and, in some cases, to improve the performance of the model.
In this, we try to find a subset of the original set of variables, or features, to get a smaller subset which
can be used to model the problem.
It usually involves three ways:
1. Filter: Select subsets of features based on their relationship with the target (e.g. statistical tests or feature importance methods).
Filter methods are generally used as a preprocessing step. The selection of features is independent of any machine learning algorithm. Instead, features are selected on the basis of their scores in various statistical tests for their correlation with the outcome variable; the appropriate correlation measure depends on whether the feature and the outcome are continuous or categorical (for example Pearson's correlation, ANOVA, or the Chi-Square test).
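A minimal filter-method sketch is shown below: each feature is scored against the target independently of any learning algorithm and only the best-scoring ones are kept. The synthetic data and the use of the absolute Pearson correlation as the score are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                           # 5 candidate features (synthetic)
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)    # target really depends on features 0 and 3

# Score every feature by its correlation with the outcome variable
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
selected = np.argsort(scores)[::-1][:2]                 # keep the 2 highest-scoring features
print(selected, scores.round(2))                        # expected to pick features 0 and 3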
2. Wrapper: Search for well-performing subsets of features
In wrapper methods, we try to use a subset of features and train a model using them. Based on the inferences that we draw from the previous model, we decide to add or remove features from the subset. The problem is essentially reduced to a search problem. These methods are usually computationally very expensive.
Some common examples of wrapper methods are forward feature selection, backward feature
elimination, recursive feature elimination, etc.
• Forward Selection: Forward selection is an iterative method in which we start with having no
feature in the model. In each iteration, we keep adding the feature which best improves our model
till an addition of a new variable does not improve the performance of the model.
• Backward Elimination: In backward elimination, we start with all the features and remove the least significant feature at each iteration, which improves the performance of the model. We repeat this until no improvement is observed on removal of features.
• Recursive Feature elimination: It is a greedy optimization algorithm which aims to find the best
performing feature subset. It repeatedly creates models and keeps aside the best or the worst
performing feature at each iteration. It constructs the next model with the left features until all the
features are exhausted. It then ranks the features based on the order of their elimination.
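As a sketch of a wrapper method in practice, recursive feature elimination is available in scikit-learn; the synthetic dataset, the logistic-regression estimator and the choice of keeping 3 features are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic classification data with 10 features, only 3 of which are informative
X, y = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=0)

# Repeatedly fit the model and drop the least important feature until 3 remain
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)
print(selector.support_)    # boolean mask of the features that were kept
print(selector.ranking_)    # rank 1 = kept; higher ranks were eliminated earlier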
3. Embedded: Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection methods.
The key difference between feature selection and extraction is that feature selection keeps a subset of the
original features while feature extraction creates brand new ones.
Top reasons to use feature selection are:
• It enables the machine learning algorithm to train faster.
• It reduces the complexity of a model and makes it easier to interpret.
• It improves the accuracy of a model if the right subset is chosen.
• It reduces overfitting.
2. Feature extraction:
Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones (and then discarding the original features). This new, reduced set of features should be able to summarize most of the information contained in the original set of features.
It commonly uses the following technique:
Principal Component Analysis (PCA):
• Principal Component Analysis (PCA) is a common feature extraction method in data science. PCA is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the large set.
• PCA is a statistical procedure that orthogonally transforms the original n coordinates of a data set
into a new set of n coordinates called principal components.
• PCA is a standard tool in modern data analysis in diverse fields, from neuroscience to computer graphics.
• It is a very useful method for extracting relevant information from confusing data sets.
• Technically, PCA finds the eigenvectors of the covariance matrix with the highest eigenvalues and then uses those to project the data into a new subspace of equal or fewer dimensions. Practically, PCA converts a matrix of n features into a new dataset of (hopefully) fewer than n features. That is, it reduces the number of features by constructing a new, smaller set of variables which capture a significant portion of the information found in the original features.
• The goals of PCA are to identify patterns in data and to detect the correlation between variables; it attempts to reduce the dimensionality.
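A minimal sketch of exactly that recipe (centre the data, compute the covariance matrix, keep the eigenvectors with the largest eigenvalues, project); the synthetic 3-feature data and the choice of keeping 2 components are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.1])   # synthetic data: 3 features with very different spread

# 1. Centre the data and compute the covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# 2. Eigen-decompose and sort the eigenvectors by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]      # keep the top 2 principal components

# 3. Project the data onto the new, lower-dimensional subspace
X_reduced = Xc @ components
print(X_reduced.shape)                  # (100, 2)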
Naive Bayes’ Classifiers
• Naive Bayes Classifier is a supervised machine-learning algorithm that uses the Bayes’ Theorem,
which assumes that features are statistically independent. The theorem relies on the naive
assumption that input variables are independent of each other, i.e. there is no way to know
anything about other variables when given an additional variable. Regardless of this assumption,
it has proven itself to be a classifier with good results.
• Naive Bayes’ Classifiers are a set of probabilistic classifiers based on the Bayes’ Theorem. The
underlying assumption of these classifiers is that all the features used for classification are
independent of each other. That’s where the name ‘naive’ comes in since it is rare that we obtain
a set of totally independent features.
• The Naïve Bayes classifier algorithm is used for classification. This algorithm learns the probability of an object with certain features belonging to a particular group/class.
• For instance, if you are trying to identify a fruit based on its color, shape and taste, then an orange-colored, spherical and tangy fruit would most likely be an orange.
• All of these properties individually contribute to the probability that this fruit is an orange, and because each property is assumed to contribute independently, the classifier is known as naïve.
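A minimal from-scratch sketch of this idea on the fruit example is given below; the tiny training table, the three categorical features and the use of Laplace (+1) smoothing are illustrative assumptions:

from collections import Counter, defaultdict

# (color, shape, taste) -> fruit   (illustrative training data)
data = [(("orange", "spherical", "tangy"), "orange"),
        (("orange", "spherical", "sweet"), "orange"),
        (("red",    "spherical", "sweet"), "apple"),
        (("yellow", "long",      "sweet"), "banana")]

classes = Counter(label for _, label in data)          # class frequencies -> prior P(class)
counts = defaultdict(Counter)                          # counts[(class, feature_index)][value]
for features, label in data:
    for i, value in enumerate(features):
        counts[(label, i)][value] += 1

def classify(features):
    scores = {}
    for c, n_c in classes.items():
        p = n_c / len(data)                            # prior P(class)
        for i, value in enumerate(features):
            n_values = len({f[i] for f, _ in data})    # distinct values of this feature
            # naive assumption: features are independent given the class (+1 Laplace smoothing)
            p *= (counts[(c, i)][value] + 1) / (n_c + n_values)
        scores[c] = p
    return max(scores, key=scores.get)

print(classify(("orange", "spherical", "tangy")))      # -> 'orange'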
What Is the Bayes’ Theorem?
Naive Bayes Classifiers rely on the Bayes’ Theorem, which is based on conditional probability or in
simple terms, the likelihood that an event (A) will happen given that another event (B) has already
happened. Essentially, the theorem allows a hypothesis to be updated each time new evidence is
introduced.
Bayes’ Theorem is used for calculating the probability of a hypothesis (H) being true (i.e. having the
disease) given that a certain event (E) has happened (being diagnosed positive of this disease in the test).
This calculation is described using the following formulation, which finds the probability of event A when event B is given:
P(A|B) = P(B|A) × P(A) / P(B)
• P(A|B) is the posterior probability of event A given the evidence B, P(B|A) is the likelihood of observing the evidence B when A is true, and P(A) is the prior probability of A.
• P(B) = The probability of event B (the evidence) occurring. P(B) is called the marginal likelihood; this is the total probability of observing the evidence.
Example: Picnic Day
You are planning a picnic today, but the morning is cloudy
• Oh no! 50% of all rainy days start off cloudy!
• But cloudy mornings are common (about 40% of days start cloudy)
• And this is usually a dry month (only 3 of 30 days tend to be rainy, or 10%)
What is the chance of rain during the day?
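Using Bayes' Theorem with the numbers above, with A = Rain and B = Cloud (P(Rain) = 0.1, P(Cloud | Rain) = 0.5, P(Cloud) = 0.4):
P(Rain | Cloud) = P(Rain) × P(Cloud | Rain) / P(Cloud) = (0.1 × 0.5) / 0.4 = 0.125
So there is about a 12.5% chance of rain during the day, and the picnic may well go ahead.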
(Subject In-charge)
(Prof.S.B.Mehta)