
NUTAN MAHARASHTRA VIDYA PRASARAK MANDAL’S

NUTAN COLLEGE OF ENGINEERING & RESEARCH (NCER)


Department of Computer Science & Engineering
----------------------------------------------------------------------------------------------------------------------------------------------------

BTCOC503

Lecture Number    Topic to be covered

Unit 2: Instance Based Learning (06 Hrs)

1 ➢ Instance based learning
2 ➢ Feature reduction
3 ➢ Collaborative filtering
4 ➢ Collaborative filtering-based recommendation
5 ➢ Probability & Bayes Theorem
6 ➢ Naïve Bayes Classifier

Submitted by:
Prof. S. B. Mehta

Machine Learning

Unit 2: Instance Based Learning

Instance Based Learning


• Instance: An instance is an example in the training data. An instance is described by a number of attributes, and one attribute can be the class label. Attribute/Feature: An attribute is an aspect of an instance (e.g. temperature, humidity). Attributes are often called features in machine learning.
• In machine learning, instance-based learning (sometimes called memory-based learning) is a family
of learning algorithms that, instead of performing explicit generalization, compares new problem
instances with instances seen in training, which have been stored in memory.
• Instance-based methods are sometimes referred to as lazy learning methods because they delay
processing until a new instance must be classified.
• Also known as memory-based learning, instance-based learning is a supervised classification learning approach that performs its work only after comparing the current instance with the previously seen training instances, which have been stored in memory. Its name is derived from the fact that it builds its hypotheses directly from the training data instances.
• Time complexity of Instance based learning algorithm depends upon the size of training data. Time
complexity of this algorithm in worst case is O (n), where n is the number of training items to be used
to classify a single new instance.
• To improve the efficiency of the instance-based learning approach, a preprocessing phase is often required. In this phase a data structure (such as an index over the stored instances) is built that enables efficient retrieval when a test instance must be modelled at run time.
• Advantage of using Instance based learning over others is that it has the ability to adapt to previously
unseen data, which means that one can store a new instance or drop the old instance.
Example: Spam email. The system uses previously flagged emails to measure the similarity between two mails. A similarity measure between two emails could be to count the number of words they have in common. The system would flag an email as spam if it has many words in common with a known spam email.
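As a rough illustration of this idea, the following Python sketch (an illustrative assumption, not part of the original notes; the stored emails and the threshold are made up) flags a message as spam when it shares many words with a stored spam example:

# Instance-based spam flagging by word overlap (illustrative sketch).
def word_overlap(a: str, b: str) -> int:
    # Similarity = number of distinct words the two emails share.
    return len(set(a.lower().split()) & set(b.lower().split()))

stored_spam = ["win a free prize now", "claim your free lottery prize"]

def looks_like_spam(email: str, threshold: int = 2) -> bool:
    # Compare the new instance against every stored (training) instance.
    return any(word_overlap(email, s) >= threshold for s in stored_spam)

print(looks_like_spam("you win a free prize"))   # True
print(looks_like_spam("meeting at noon today"))  # False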

Techniques of Instance Based Learning
1. K-Nearest Neighbor Learning
2. Locally Weighted Regression
3. Case-Based Reasoning

1. K-Nearest Neighbor Learning

● K-Nearest Neighbors is one of the simplest Machine Learning algorithms based on Supervised
Learning technique.

● The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.

● The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category by using the K-NN algorithm.

● K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the
Classification problems.

● K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying
data.

● It is also called a lazy learner algorithm because it does not learn from the training set immediately
instead it stores the dataset and at the time of classification, it performs an action on the dataset.

● KNN algorithm at the training phase just stores the dataset and when it gets new data, then it classifies
that data into a category that is much similar to the new data.

● Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, but we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new image that are most similar to the stored cat and dog images and, based on the most similar features, will put it in either the cat or the dog category.

How does K-NN work?

The K-NN working can be explained on the basis of the below algorithm:

• Step-1: Select the number K of the neighbors

• Step-2: Calculate the Euclidean distance from the new data point to the training points.

• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

• Step-4: Among these K neighbors, count the number of data points in each category.

• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.

• Step-6: Our model is ready.

Example:

Suppose there are two categories, Category A and Category B, and we have a new data point x1: in which of these categories will it lie? To solve this type of problem we need the K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.

[Figure: scatter plot of Category A and Category B points, with the new data point x1 to be classified.]

• Firstly, we choose the number of neighbors; here we choose K = 5.
• Next, we calculate the Euclidean distance between the new point and the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. For two points (x1, y1) and (x2, y2) it can be calculated as:

d = √((x2 − x1)² + (y2 − y1)²)

• By calculating the Euclidean distance we obtain the nearest neighbors: three nearest neighbors in Category A and two nearest neighbors in Category B.

• As three of the five nearest neighbors are from Category A, the new data point must belong to Category A.

How to select the value of K in the K-NN Algorithm?


Below are some points to remember while selecting the value of K in the K-NN algorithm:
• There is no particular way to determine the best value for "K", so we need to try some values to find
the best out of them. The most preferred value for K is 5.
• A very low value for K such as K=1 or K=2, can be noisy and lead to the effects of outliers in the
model.
• Large values for K are generally good, but a value that is too large can include points from other categories and blur the class boundaries.

Advantages of KNN Algorithm:


• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
• The value of K always needs to be determined, which may be complex at times.
• The computation cost is high because of calculating the distance between the data points for all the
training samples.
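As a brief sketch of the procedure described above, the following example uses scikit-learn's KNeighborsClassifier on a small invented 2-D dataset (the data points and K = 3 are assumptions for illustration only):

# K-NN classification sketch (hypothetical 2-D data).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1, 1], [1, 2], [2, 1],      # Category A
                    [6, 6], [7, 6], [6, 7]])     # Category B
y_train = np.array(["A", "A", "A", "B", "B", "B"])

# Steps 1-2: choose K and use the Euclidean distance (the default metric).
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)            # lazy learner: training just stores the data

# Steps 3-5: the K nearest neighbours vote on the new point's category.
print(knn.predict([[2, 2]]))         # ['A']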
2. Locally weighted regression
• Locally weighted regression (LWR) attempts to fit the training data only in a region around the location of a query example. LWR is a type of lazy learning; therefore the processing of training data is often postponed until the target value of a query example needs to be predicted.
• Locally weighted regression is also called LOESS or LOWESS. It is inspired by cases where linear regression, which simply fits a line, is not sufficient, but we do not want to overfit either.
• Locally weighted linear regression is a non-parametric algorithm; that is, the model does not learn a fixed set of parameters as is done in ordinary linear regression.
• LWR depends on the distance function used to recover the nearest neighbours of a given query example. However, the distance function does not need to satisfy the formal mathematical requirements for a distance metric. LWR enables several ways to use a distance function, for instance: (I) one distance function is used in all parts of the input space (global distance function), (II) the parameters of a distance function are set for each query example by an optimization process (query-based local distance function), or (III) each training example has a distance function and its corresponding parameter values (point-based local distance function).

[Figure: training data (blue dots) with a query point; a single line fitted to the whole dataset misses the query point badly, while a locally weighted fit using only nearby points tracks the curve.]

Fitting one line to the whole dataset would give a prediction that is far off the real value at the query point. Using the weighting idea, we instead look only at a few nearby points and perform a regression using those nearby points; the resulting prediction is close to what we would expect given the shape of the curve. The sketch below shows how standard linear regression is modified with these weights.
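A minimal numpy sketch of this idea, assuming a Gaussian kernel for the weights (the bandwidth tau and the toy data are illustrative choices, not part of the original notes):

# Locally weighted (linear) regression sketch with a Gaussian kernel.
import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    # Fit a weighted least-squares line around x_query and predict there.
    Xb = np.c_[np.ones(len(X)), X]                    # add a bias column
    w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))  # nearby points weigh more
    W = np.diag(w)
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return np.array([1.0, x_query]) @ theta

X = np.linspace(0, 6, 30)
y = np.sin(X) + 0.1 * np.random.randn(30)             # noisy nonlinear data
print(lwr_predict(3.0, X, y))                         # local prediction near sin(3.0)

With a small bandwidth tau, only points close to the query carry noticeable weight, so the fit is effectively local even though each individual fit is still a straight line.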
3. Case-Based Reasoning (CBR) Classifier:
• Case-based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. It deals with very specific data from previous situations, and reuses results and experience to fit a new problem situation.
• CBR is a problem-solving technique that matches a new case with previously solved cases and their solutions, both of which are stored in a case database.
How CBR works?
When a new case arises to classify, a Case-based Reasoner (CBR) will first check if an identical training
case exists. If one is found, then the accompanying solution to that case is returned. If no identical case is
found, then the CBR will search for training cases having components that are similar to those of the new
case. Conceptually, these training cases may be considered as neighbors of the new case. If cases are
represented as graphs, this involves searching for subgraphs that are similar to subgraphs within the new
case. The CBR tries to combine the solutions of the neighboring training cases to propose a solution for the new case. If incompatibilities arise with the individual solutions, then backtracking to search for other solutions may be necessary. The CBR may employ background knowledge and problem-solving strategies to propose a feasible solution.

Case-based reasoning consists of a cycle of the following four steps:


1. Retrieve: Gathering from memory an experience closest to the current problem. i.e. Given a new
case, retrieve similar cases from the case base.
2. Reuse: Suggesting a solution based on that experience and adapting it to meet the demands of the new situation, i.e. adapt the retrieved cases to fit the new case.
3. Revise: Evaluate the solution and revise it based on how well it works.
4. Retain: Storing this new problem-solving method in the memory system.
If the case retrieved works for the current situation, it should be used. Otherwise, it may need
to be adapted. The revision may involve other reasoning techniques, such as using the proposed
solution as a starting point to search for a solution, or a human could do the adaptation in an interactive
system. The new case and the solution can then be saved if retaining it will help in the future.
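The Retrieve and Reuse steps of this cycle can be sketched roughly as follows; the help-desk cases, the feature encoding and the similarity measure are illustrative assumptions only:

# Rough sketch of the Retrieve and Reuse steps of a CBR cycle.
# Each case: (feature dictionary, stored solution) - hypothetical help-desk data.
case_base = [
    ({"no_power": 1, "screen_flicker": 0}, "check the power cable"),
    ({"no_power": 0, "screen_flicker": 1}, "update the display driver"),
]

def similarity(a, b):
    # Count matching feature values between two cases.
    return sum(1 for k in a if a[k] == b.get(k))

def retrieve_and_reuse(new_case):
    best_case, best_solution = max(case_base, key=lambda c: similarity(new_case, c[0]))
    return best_solution          # Reuse: propose the retrieved case's solution

print(retrieve_and_reuse({"no_power": 1, "screen_flicker": 0}))  # check the power cable

The Revise and Retain steps would then evaluate the proposed solution and, if it works, append the new (case, solution) pair to case_base.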
Applications of CBR includes:
1. Problem resolution for customer service help desks, where cases describe product-related diagnostic
problems.
2. It is also applied to areas such as engineering and law, where cases are either technical designs or
legal rulings, respectively.
3. Medical education, where patient case histories and treatments are used to help diagnose and treat
new patients.
Advantages of CBR

• Remembering past experiences helps learners avoid repeating previous mistakes, and the reasoner can
discern what features of a problem are significant and focus on them.
• CBR is intuitive because it reflects how people work. Because no knowledge must be elicited to create rules or methods, development is easier.
• Systems learn by acquiring new cases through use, which makes maintenance easier.

Disadvantages of CBR

• Can take large storage space for all the cases


• Can take large processing time to find similar cases in case-base
• Cases may need to be created by hand
• Adaptation may be difficult
• Needs case-base, case selection algorithm, and possibly case-adaptation algorithm

Recommender System:
A recommender system makes predictions based on users' historical behaviour; specifically, it predicts a user's preference for a set of items based on past experience.
During the last few decades, with the rise of YouTube, Amazon, Netflix and many other such web
services, recommender systems have taken more and more place in our lives. From e-commerce (suggest
to buyers articles that could interest them) to online advertisement (suggest to users the right contents,
matching their preferences), recommender systems are today unavoidable in our daily online journeys.
In a very general way, recommender systems are algorithms aimed at suggesting relevant items to users
(items being movies to watch, text to read, products to buy or anything else depending on industries).

The two most popular approaches are:
1. Content-based recommender systems
2. Collaborative filtering recommender systems

1. Content-based Recommender System


• Content-based filtering algorithms are based on the assumption that users are going to give similar ratings to objects with similar objective features.
• A Content-based recommendation system tries to recommend items to users based on their profile.
The user's profile revolves around that user's preferences and tastes. It is shaped based on user ratings,
including the number of times that user has clicked on different items or perhaps even liked those
items.
• Content-based approaches use additional information about users and/or items. If we consider the example of a movie recommender system, this additional information can be, for example, the age, the sex, the job or any other personal information for users, as well as the category, the main actors, the duration or other characteristics of the movies (items).

2. Collaborative Filtering Recommender System:

• Collaborative filtering is a technique that is widely used in recommendation systems and is a rapidly advancing research area.
• Collaborative filtering models try to find similarities between items / users through commonly rated
/owned items.
• Collaborative Filtering is the process of filtering or evaluating items using the opinions of other
people. This filtering is done by using profiles. Collaborative filtering techniques collect and establish
profiles, and determine the relationships among the data according to similarity models. The possible
categories of the data in the profiles include user preferences, user behavior patterns, or item
properties.
• For each user, recommender systems recommend items based on how similar users liked the item.
• Example: Alice and Bob are users who have similar interests in video games.
• Collaborative filtering is an unsupervised learning setting in which we make predictions from ratings supplied by people. In the rating matrix, each row represents the ratings of movies given by one person and each column contains the ratings given to one movie.
• Collaborative filtering is a technique that can filter out items that a user might like on the basis of
reactions by similar users.
• It works by searching a large group of people and finding a smaller set of users with tastes similar to
a particular user. It looks at the items they like and combines them to create a ranked list of
suggestions.
• The functionalities of a collaborative filtering recommendation system can be stated as:

A. Recommendations and predictions

1) Recommendation
The recommendation functionality displays a list of items to a user. The items are listed in order of usefulness to the user. For example, Amazon's recommendation algorithm aggregates items similar to a user's purchases and ratings without ever computing a predicted rating.
2) Prediction
In prediction, a predicted rating is calculated for a particular item. Prediction is more demanding than recommendation because, in order to make predictions, the system must be able to say something about the required item. Some algorithms take advantage of this distinction to be more scalable, saving memory and computation time.
B. Prediction versus Recommendation
• Prediction and recommendation tasks place different requirements on a CF system. To recommend items, information regarding all items is not required.
• To provide predictions for a particular item, information regarding every item, even rarely rated ones, is required.
• The algorithms used for recommendations have lower memory and computation time requirements compared to the algorithms used for making predictions.
• Recommendation tasks require the calculation of predictions or some scoring function for many (if not all) items. Therefore, a single prediction request can afford a more expensive prediction calculation than a recommendation request.

User and Item based collaborative filtering

Collaborative filtering uses different methods to calculate the similarity between two products or two
users. In an item-based approach, a product is compared to other products. The more similar the
interactions of customers between these two products are, the more they fit together. With the user-based
approach, the same happens, but instead of products, customers are compared with each other. With the
help of the similarity matrix, a predict function can be used to create a predicted rating for each product
with which a customer has not yet interacted. Based on these predicted ratings, products can then be
recommended.

The two most popular collaborative filtering algorithms are categorized as:
1. Memory-based
2. Model-based

1. Memory-based:
• Memory-based algorithms approach the collaborative filtering problem by using the entire database. Memory-based techniques use the data you have (likes, votes, clicks, etc.) to establish correlations (similarities) between either users (collaborative filtering) or items (content-based recommendation) in order to recommend an item i to a user u who has never seen it before.
• Memory-based models calculate the similarities between users / items based on user-item rating
pairs.
• Memory-based recommendation generalizes from the stored data only at the time a recommendation is made, which is why it is also referred to as lazy learning. In memory-based learning, users are divided into groups based on their interests; when a new user comes into the system, we determine the neighbours of that user in order to make predictions for him or her. Memory-based recommendation uses the entire user-item database, or a sample of it, to make predictions.
• The main idea behind user-based collaborative filtering (UB-CF) is that people with similar characteristics share similar tastes. For example, suppose we want to recommend a movie to our friend Bob, and that Bob and I have seen many movies together and rated them almost identically. It makes sense to think that in the future we would continue to like similar movies, and to use this similarity to recommend movies.
The two approaches
User-based:
• In the user-based approach, similar users (users who have given similar ratings to similar items) are found, and then the target user's rating for an item the target user has never interacted with is predicted.
• User-based filtering finds similar users and gives them recommendations based on what other people with similar consumption patterns appreciated.
• A common way to do this is the "nearest neighbours" approach, which looks at a user's rating patterns and finds the "nearest neighbours", i.e. users with ratings similar to theirs. The algorithm then gives recommendations based on the ratings of these neighbours.

• For a user U, with a set of similar users determined based on rating vectors consisting of given
item ratings, the rating for an item I, which hasn’t been rated, is found by picking out N users from
the similarity list who have rated the item I and calculating the rating based on these N ratings.
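A small numpy sketch of this user-based procedure (cosine similarity between users' rating vectors, then a similarity-weighted average of the neighbours' ratings); the tiny rating matrix is invented and 0 marks "not rated":

# User-based collaborative filtering sketch (hypothetical ratings; 0 = not rated).
import numpy as np

R = np.array([[5, 4, 0, 1],      # user 0
              [4, 5, 1, 0],      # user 1 (taste similar to user 0)
              [1, 0, 5, 4]])     # user 2

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

def predict(user, item):
    sims, ratings = [], []
    for other in range(len(R)):
        if other != user and R[other, item] > 0:      # neighbours who rated the item
            sims.append(cosine(R[user], R[other]))
            ratings.append(R[other, item])
    sims, ratings = np.array(sims), np.array(ratings)
    return sims @ ratings / (np.abs(sims).sum() + 1e-9)  # similarity-weighted average

print(predict(0, 2))   # predicted rating of user 0 for item 2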

Item-based:
• Item based collaborative filtering finds similarity patterns between items and recommends them
to users based on the computed information
• Item-based collaborative filtering was introduced in 1998 by Amazon [6]. Unlike user-based collaborative filtering, item-based filtering looks at the similarity between different items, and does this by taking note of how many users who bought item X also bought item Y. If the correlation is high enough, a similarity can be presumed to exist between the two items, and they can be assumed to be similar to one another. Item Y will from then on be recommended to users who bought item X, and vice versa.

[Figure: graph of how users' ratings affect their recommendations.]
Amazon currently uses item-to-item collaborative filtering, which scales to massive data sets and produces high-quality recommendations in real time. This type of filtering matches each of the user's purchased and rated items to similar items, then combines those similar items into a recommendation list for the user.
For an item I, with a set of similar items determined based on rating vectors consisting of received user
ratings, the rating by a user U, who hasn’t rated it, is found by picking out N items from the similarity list
that have been rated by U and calculating the rating based on these N ratings.

2. Model-based:
• Model-based recommendation systems involve building a model based on the dataset of ratings.
In other words, we extract some information from the dataset, and use that as a "model" to make
recommendations without having to use the complete dataset every time. This approach
potentially offers the benefits of both speed and scalability.
• Model-based collaborative filtering is a two-stage process for recommendations: in the first stage a model is learned offline; in the second stage a recommendation is generated for a new user based on the learned model.
• Model-based techniques, on the other hand, try to further fill out the user-item rating matrix. They tackle the task of "guessing" how much a user will like an item that they did not encounter before. For that they utilize several machine learning algorithms to train on the vector of items for a specific user, and then they can build a model that predicts the user's rating for a new item that has just been added to the system.
• Popular model-based techniques are Bayesian Networks, Singular Value Decomposition, and
Probabilistic Latent Semantic Analysis (or Probabilistic Latent Semantic Indexing). For some
reason, all model-based techniques do not enjoy particularly happy-sounding names.
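As a rough illustration of the matrix-factorisation idea behind techniques such as SVD, the sketch below reconstructs a tiny rating matrix from its two strongest latent factors (the data, the crude mean-filling of missing entries and the rank k = 2 are all assumptions for illustration):

# Model-based CF sketch: low-rank reconstruction of a rating matrix via SVD.
import numpy as np

R = np.array([[5., 4., 0., 1.],
              [4., 5., 1., 0.],
              [1., 0., 5., 4.]])                 # 0 = unknown rating (toy data)

filled = np.where(R > 0, R, R[R > 0].mean())     # crude fill for missing entries
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2                                            # keep the 2 strongest latent factors
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # the learned "model"

print(np.round(R_hat, 2))                        # predicted ratings, incl. unknown cells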

Feature Reduction:
• Feature reduction, also known as dimensionality reduction, is the process of reducing the number of features used in a resource-heavy computation without losing important information.
• Reducing the number of features means the number of variables is reduced making the computer’s
work easier and faster.
• In machine learning classification problems, there are often too many factors on the basis of which
the final classification is done. These factors are basically variables called features.
• The higher the number of features, the harder it gets to visualize the training set and then work on
it. Sometimes, most of these features are correlated, and hence redundant. This is where
dimensionality reduction algorithms come into play. Dimensionality reduction is the process of
reducing the number of random variables under consideration, by obtaining a set of principal
variables.
Feature reduction can be divided into two processes:

1. Feature selection:
Feature selection is the process of reducing the number of input variables when developing a predictive
model.
It is desirable to reduce the number of input variables to both reduce the computational cost of modeling
and, in some cases, to improve the performance of the model.
In this, we try to find a subset of the original set of variables, or features, to get a smaller subset which
can be used to model the problem.
It usually involves three ways:
1. Filter: Select subsets of features based on their relationship with the target, e.g. using feature importance methods or statistical scores.

Filter methods are generally used as a preprocessing step. The selection of features is independent of any machine learning algorithm; instead, features are selected on the basis of their scores in various statistical tests for their correlation with the outcome variable. The correlation is a subjective term here, and which correlation coefficient is appropriate depends on the types of the input and output variables.
2. Wrapper: Search for well-performing subsets of features

In wrapper methods, we try to use a subset of features and train a model using them. Based on the inferences that we draw from the previous model, we decide to add or remove features from the subset. The problem is essentially reduced to a search problem. These methods are usually computationally very expensive. (A short code sketch of this idea appears at the end of this feature-selection subsection.)
Some common examples of wrapper methods are forward feature selection, backward feature
elimination, recursive feature elimination, etc.

• Forward Selection: Forward selection is an iterative method in which we start with having no
feature in the model. In each iteration, we keep adding the feature which best improves our model
till an addition of a new variable does not improve the performance of the model.
• Backward Elimination: In backward elimination, we start with all the features and remove the least significant feature at each iteration, which improves the performance of the model. We repeat this until no improvement is observed on removal of features.
• Recursive Feature elimination: It is a greedy optimization algorithm which aims to find the best
performing feature subset. It repeatedly creates models and keeps aside the best or the worst
performing feature at each iteration. It constructs the next model with the left features until all the
features are exhausted. It then ranks the features based on the order of their elimination.

3. Embedded: Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection methods.

The key difference between feature selection and extraction is that feature selection keeps a subset of the
original features while feature extraction creates brand new ones.
Top reasons to use feature selection are:
• It enables the machine learning algorithm to train faster.
• It reduces the complexity of a model and makes it easier to interpret.
• It improves the accuracy of a model if the right subset is chosen.
• It reduces overfitting.
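A brief scikit-learn sketch of the wrapper style mentioned above, using recursive feature elimination on synthetic data (the dataset, the logistic-regression estimator and the number of features to keep are assumptions for illustration only):

# Recursive feature elimination (wrapper method) sketch.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

print(selector.support_)    # boolean mask of the 3 retained features
print(selector.ranking_)    # rank 1 = selected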
2. Feature extraction:
Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones (and then discarding the original features). This new, reduced set of features should then be able to summarize most of the information contained in the original set of features.
A commonly used technique is Principal Component Analysis (PCA):
• Principal Component Analysis (PCA) is a common feature extraction method in data science. It is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the large set.
• PCA is a statistical procedure that orthogonally transforms the original n coordinates of a data set
into a new set of n coordinates called principal components.
• PCA is a standard tool in modern data analysis in diverse fields from neuroscience to computer graphics.
• It is a very useful method for extracting relevant information from confusing data sets.
• Technically, PCA finds the eigenvectors of the covariance matrix with the highest eigenvalues and then uses them to project the data into a new subspace of equal or fewer dimensions. Practically, PCA converts a matrix of n features into a new dataset of (hopefully) fewer than n features; that is, it reduces the number of features by constructing a new, smaller number of variables which capture a significant portion of the information found in the original features.
• The goals of PCA are to identify patterns in data and to detect the correlation between variables; it attempts to reduce the dimensionality while preserving most of that information.
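A short scikit-learn sketch of the projection described above (the random 100 × 4 dataset is an assumption; real data would be used in practice):

# PCA sketch: project 4-dimensional data onto its 2 principal components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.RandomState(0).rand(100, 4)        # hypothetical 100 x 4 dataset

X_std = StandardScaler().fit_transform(X)        # PCA is sensitive to feature scale
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)                  # the new, smaller feature set

print(X_2d.shape)                                # (100, 2)
print(pca.explained_variance_ratio_)             # variance captured per component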

Advantages of Dimensionality Reduction


• It helps in data compression, and hence reduced storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.

Disadvantages of Dimensionality Reduction


• It may lead to some amount of data loss.
• PCA tends to find linear correlations between variables, which is sometimes undesirable.
• PCA fails in cases where mean and covariance are not enough to define datasets.
• We may not know how many principal components to keep; in practice, some rules of thumb are applied.
Probability and Bayes learning.
• Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior
probability, the probabilities of observing various data given the hypothesis, and the observed data
itself.
• Bayes' theorem is a formula that describes how to update the probabilities of hypotheses when
given evidence. It follows simply from the axioms of conditional probability, but can be used to
powerfully reason about a wide range of problems involving belief updates.

• Bayes' theorem, named after 18th-century British mathematician Thomas Bayes, is a mathematical formula for determining conditional probability. Conditional probability is the
likelihood of an outcome occurring, based on a previous outcome occurring. Bayes' theorem
provides a way to revise existing predictions or theories (update probabilities) given new or
additional evidence. In finance, Bayes' theorem can be used to rate the risk of lending money to
potential borrowers.
• Bayes' theorem is also called Bayes' Rule or Bayes' Law and is the foundation of the field of
Bayesian statistics.

Naive Bayes’ Classifiers

• Naive Bayes Classifier is a supervised machine-learning algorithm that uses the Bayes’ Theorem,
which assumes that features are statistically independent. The theorem relies on the naive
assumption that input variables are independent of each other, i.e. there is no way to know
anything about other variables when given an additional variable. Regardless of this assumption,
it has proven itself to be a classifier with good results.
• Naive Bayes’ Classifiers are a set of probabilistic classifiers based on the Bayes’ Theorem. The
underlying assumption of these classifiers is that all the features used for classification are
independent of each other. That’s where the name ‘naive’ comes in since it is rare that we obtain
a set of totally independent features.
• The Naïve Bayes Classifier algorithm is used for classification. The algorithm learns the probability of an object with certain features belonging to a particular group/class.
• For instance, if you are trying to identify a fruit based on its colour, shape and taste, then an orange-coloured, spherical and tangy fruit would most likely be an orange.
• All of these properties individually contribute to the probability that this fruit is an orange, and because each property is treated as contributing independently, the method is known as naïve.
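A minimal scikit-learn sketch in the spirit of the fruit example above; the numeric encoding of colour, shape and taste and the training examples are invented for illustration only:

# Gaussian Naive Bayes sketch on hypothetical fruit features:
# [colour score, roundness, tanginess] -> fruit label.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[0.9, 0.9, 0.8],   # orange-coloured, spherical, tangy
              [0.8, 0.8, 0.9],
              [0.2, 0.3, 0.1],   # long, not tangy
              [0.3, 0.2, 0.2]])
y = np.array(["orange", "orange", "banana", "banana"])

model = GaussianNB().fit(X, y)   # treats each feature as independent given the class
print(model.predict([[0.85, 0.9, 0.7]]))   # ['orange']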
What Is the Bayes’ Theorem?
Naive Bayes Classifiers rely on the Bayes’ Theorem, which is based on conditional probability or in
simple terms, the likelihood that an event (A) will happen given that another event (B) has already
happened. Essentially, the theorem allows a hypothesis to be updated each time new evidence is
introduced.
Bayes’ Theorem is used for calculating the probability of a hypothesis (H) being true (i.e. having the
disease) given that a certain event (E) has happened (being diagnosed positive of this disease in the test).
This calculation is described using the following formulation (finding the probability of event A when event B is given):

P(A | B) = P(B | A) × P(A) / P(B)

Let’s explain what each of these terms means.


• “P” is the symbol to denote probability.
• P(A | B) = The probability of event A (hypothesis) occurring given that B (evidence) has occurred.
P(A|B) is called the posterior; this is what we are trying to estimate.
• P(B | A) = The probability of the event B (evidence) occurring given that A (hypothesis) has
occurred. P(B|A) is called the likelihood; this is the probability of observing the new evidence, given
our initial hypothesis.
• P(A) = The probability of event A (hypothesis) occurring. P(A) is called the prior; this is the probability of our hypothesis without any additional information.

• P(B) = The probability of event B (evidence) occurring. P(B) is called the marginal likelihood; this is the total probability of observing the evidence.
Example: Picnic Day
You are planning a picnic today, but the morning is cloudy
• Oh no! 50% of all rainy days start off cloudy!
• But cloudy mornings are common (about 40% of days start cloudy)
• And this is usually a dry month (only 3 of 30 days tend to be rainy, or 10%)
What is the chance of rain during the day?
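Working the numbers through Bayes' theorem (a worked step added here for clarity, using the document's notation):

P(Rain | Cloud) = P(Rain) × P(Cloud | Rain) / P(Cloud)
                = 0.1 × 0.5 / 0.4
                = 0.125

So there is roughly a 12.5% chance of rain during the day, and the picnic can probably go ahead.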

(Subject In-charge)
(Prof. S. B. Mehta)

