L6 - SLM Notes (Bayes Algorithm)
TEXTBOOKS/LEARNING RESOURCES:
a) Masashi Sugiyama, Introduction to Statistical Machine Learning (1st ed.), Morgan Kaufmann, 2017. ISBN 978-0128021217.
b) T. M. Mitchell, Machine Learning (1st ed.), McGraw Hill, 2017. ISBN 978-1259096952.
Probability Theory
Python Implementation
Real-time Problems
Naive Bayes is a supervised learning algorithm that is based on Bayes' theorem and is used for solving classification problems.
Bayes Theorem
Bayes' theorem lets us find the probability of event A happening, given that event B has already occurred:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
[Figure: Thomas Bayes]
Naive Bayes assumes that the predictors/features are independent; that is, the presence of one particular feature does not affect the others. Hence it is called "Naive".
Where A, B = events
P(A) = probability of event A occurring
P(B) = probability of event B occurring
P(A|B) = probability of event A occurring, given that event B has occurred
P(B|A) = probability of event B occurring, given that event A has occurred
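The formula translates directly into a one-line function. This is a minimal sketch; the name `bayes_posterior` is hypothetical, and the usage line plugs in the "Sunny" / "Play = Yes" numbers from the weather example in these notes (A = "Play = Yes", B = "Sunny").

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# A = "Play = Yes", B = "Sunny": P(Sunny|Yes) = 3/9, P(Yes) = 9/14, P(Sunny) = 5/14
p = bayes_posterior(p_b_given_a=3 / 9, p_a=9 / 14, p_b=5 / 14)
print(round(p, 2))  # 0.6
```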
Example:
Suppose we have a dataset of weather conditions and the corresponding target variable "Play". Given the weather condition, we want to decide whether to play or not.
[Figure: weather dataset with target "Play" (Play / Don't play)]
Problem 1: If the weather is sunny, should the player play or not?
Steps to solve this problem:
Step 1: Frequency table for the weather conditions:
Weather    Yes  No
Sunny      3    2
Overcast   4    0
Rainy      2    3
Total      9    5
Step 2: Likelihood table for the weather conditions (14 days in total):
Weather    P(Weather)
Sunny      5/14 = 0.36
Overcast   4/14 = 0.29
Rainy      5/14 = 0.36
P(Yes) = 9/14 = 0.64    P(No) = 5/14 = 0.36
Step 3: Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/9 = 0.33
P(Sunny) = 5/14 = 0.36
P(Yes) = 9/14 = 0.64
So P(Yes|Sunny) = ((3/9) * (9/14)) / (5/14) = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/5 = 0.5
P(Sunny) = 5/14 = 0.36
P(No) = 5/14 = 0.36
So P(No|Sunny) = ((2/5) * (5/14)) / (5/14) = 0.40
Since P(Yes|Sunny) = 0.60 > P(No|Sunny) = 0.40, on a sunny day the player can play the game.
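The three steps can be reproduced in a few lines of Python, starting from the Step 1 frequency table. This is a hedged sketch: the dictionary layout and the helper name `posterior` are illustrative choices, not part of the original notes.

```python
# Frequency table from Step 1: weather -> (count of Yes, count of No)
freq = {
    "Sunny": (3, 2),
    "Overcast": (4, 0),
    "Rainy": (2, 3),
}

total_yes = sum(yes for yes, _ in freq.values())  # 9
total_no = sum(no for _, no in freq.values())     # 5
total = total_yes + total_no                      # 14


def posterior(weather: str) -> dict:
    """P(Yes|weather) and P(No|weather) via Bayes' theorem."""
    yes, no = freq[weather]
    p_weather = (yes + no) / total                # evidence P(weather)
    p_yes = (yes / total_yes) * (total_yes / total) / p_weather
    p_no = (no / total_no) * (total_no / total) / p_weather
    return {"Yes": p_yes, "No": p_no}


post = posterior("Sunny")
decision = max(post, key=post.get)                # pick the class with the larger posterior
print({k: round(v, 2) for k, v in post.items()}, decision)
# {'Yes': 0.6, 'No': 0.4} Yes
```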
Example 2
Problem 5: Find the probability of playing golf when today = (Sunny, Hot, Normal, False).
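With several features, Naive Bayes multiplies the per-feature likelihoods P(x_i|class) with the prior P(class) and then normalises over the classes. The golf table for Problem 5 is not reproduced in these notes, so the sketch below uses a small hypothetical six-row toy dataset with the same four features (Outlook, Temperature, Humidity, Windy); its numbers are NOT the lecture's answer. The function names are hypothetical, and no smoothing is applied, so an unseen feature value gives a zero likelihood.

```python
from collections import Counter, defaultdict


def train_categorical_nb(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def predict_proba(x, class_counts, value_counts):
    """P(class|x) under the naive independence assumption (no smoothing)."""
    n = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / n                                 # prior P(c)
        for i, value in enumerate(x):
            score *= value_counts[(i, c)][value] / n_c  # likelihood P(x_i|c)
        scores[c] = score
    evidence = sum(scores.values())                     # P(x) = sum over classes of P(c) * P(x|c)
    if evidence == 0:
        raise ValueError("every class has a zero likelihood for this input")
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical toy data (Outlook, Temperature, Humidity, Windy); NOT the lecture's table
rows = [
    ("Sunny", "Hot", "High", "False"),
    ("Sunny", "Mild", "Normal", "True"),
    ("Overcast", "Hot", "Normal", "False"),
    ("Rainy", "Mild", "High", "True"),
    ("Sunny", "Hot", "Normal", "False"),
    ("Rainy", "Cool", "Normal", "False"),
]
labels = ["No", "Yes", "Yes", "No", "Yes", "No"]

model = train_categorical_nb(rows, labels)
proba = predict_proba(("Sunny", "Hot", "Normal", "False"), *model)
print({c: round(p, 2) for c, p in proba.items()})  # {'No': 0.08, 'Yes': 0.92}
```

Here P(Yes) * P(Sunny|Yes) * P(Hot|Yes) * P(Normal|Yes) * P(False|Yes) = 0.5 * 2/3 * 2/3 * 1 * 2/3 = 12/81, and the same product for "No" is 1/81, so after normalising P(Yes|today) = 12/13.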
Advantages of Naïve Bayes Classifier:
• Naïve Bayes is one of the fastest and simplest ML algorithms for predicting the class of a data sample.
Disadvantages of Naïve Bayes Classifier:
• Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the
relationship between features.
Types of Naïve Bayes Model:
• Gaussian: It assumes that features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, the model assumes that these values are sampled from a Gaussian distribution.
• Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed (for example, word counts). It is primarily used for document classification problems, that is, deciding which category a particular document belongs to, such as Sports, Politics, Education, etc.
• Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present or not in a document. This model is also popular for document classification tasks.
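The three variants differ only in how they model P(x_i|class). The sketch below shows one per-class likelihood function for each; the function names and the tiny "goal"/"vote" vocabulary are hypothetical. In practice, scikit-learn provides these variants as `GaussianNB`, `MultinomialNB` and `BernoulliNB` in `sklearn.naive_bayes`.

```python
import math


def gaussian_likelihood(x, mean, var):
    """Gaussian NB: density of continuous value x under N(mean, var) for one class."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def multinomial_log_likelihood(word_counts, word_probs):
    """Multinomial NB: log P(document|class), dropping the multinomial coefficient.

    word_counts[w]: how often word w occurs in the document.
    word_probs[w]:  class-conditional probability of word w (sums to 1 over words).
    The dropped coefficient depends only on the document, so it does not change
    which class scores highest.
    """
    return sum(n * math.log(word_probs[w]) for w, n in word_counts.items() if n > 0)


def bernoulli_likelihood(present, word_probs):
    """Bernoulli NB: P(document|class) from word presence/absence flags.

    word_probs[w]: probability that word w appears in a document of this class.
    Unlike the multinomial model, an absent word contributes a factor (1 - p).
    """
    likelihood = 1.0
    for w, p in word_probs.items():
        likelihood *= p if present.get(w, False) else 1.0 - p
    return likelihood


print(round(gaussian_likelihood(0.0, 0.0, 1.0), 4))                      # 0.3989
print(round(multinomial_log_likelihood({"goal": 2, "vote": 0},
                                       {"goal": 0.5, "vote": 0.5}), 3))  # -1.386
print(round(bernoulli_likelihood({"goal": True},
                                 {"goal": 0.8, "vote": 0.1}), 2))        # 0.72
```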
Bayesian Classifier:
P(class|data) = (P(data|class) * P(class)) / P(data)
Where is Naive Bayes Used?
• News Classification: With the help of a Naive Bayes classifier, Google News recognizes whether the news is political, world news, and so on.
• Real-time Prediction: Naive Bayes is an eager learning classifier and it is fast. Thus, it could be used for making predictions in real time.
• Recommendation System
Python Code
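The original slide's code is not reproduced in these notes. As a hedged stand-in, here is a minimal scikit-learn sketch that assumes the Iris dataset (our choice, not necessarily the lecture's data). Iris has continuous features, so the Gaussian variant is used; `test_size=0.3` and `random_state=0` are arbitrary illustrative settings.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris: 150 samples, 4 continuous features, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = GaussianNB()          # assumes each feature is Gaussian within each class
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))  # 3x3: rows = true class, columns = predicted class
```

Swapping `GaussianNB` for `MultinomialNB` or `BernoulliNB` is the usual choice when the features are word counts or word-presence flags instead.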
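Confusion Matrix
Performance Measures:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall (Sensitivity)} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{F1-Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$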
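These measures can be computed directly from predicted and actual labels. A minimal stdlib sketch; the helper names and the ten example labels are hypothetical, and a zero denominator is mapped to 0.0 by assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def performance_measures(tp, fp, fn, tn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 1, 1, 5)
print({k: round(v, 2) for k, v in performance_measures(*counts).items()})
# {'precision': 0.75, 'recall': 0.75, 'specificity': 0.83, 'f1': 0.75}
```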
Precision: the proportion of samples actually belonging to the positive class out of all the samples that the model predicted to be of the positive class.
F1-Score: the harmonic mean of the precision and recall scores obtained for the positive class.
How should I balance sensitivity with specificity?
• Sensitivity: How many diseased individuals does the model correctly identify as diseased?
• Specificity: How many healthy individuals does the model correctly identify as healthy?
Generally, a screening test should be highly sensitive, whereas a follow-up confirmatory test should be highly specific.
Confusion Matrix for Multiclass Problems
[Figure: confusion matrices for a 3-class and a 4-class problem]
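For more than two classes, each class is treated as "positive" in turn: its diagonal cell is TP, the rest of its column is FP and the rest of its row is FN. The per-class F1 scores can then be averaged per class (macro) or computed from pooled counts (micro). A hedged stdlib sketch with hypothetical labels and helper names; it uses F1 = 2TP / (2TP + FP + FN), which is algebraically the same harmonic mean of precision and recall.

```python
def multiclass_confusion(y_true, y_pred, labels):
    """Square matrix: rows = true class, columns = predicted class."""
    index = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def f1_scores(m):
    """Per-class F1, macro-averaged F1 and micro-averaged F1."""
    k = len(m)
    per_class = []
    tp_sum = fp_sum = fn_sum = 0
    for i in range(k):
        tp = m[i][i]
        fp = sum(m[r][i] for r in range(k)) - tp  # predicted i, actually another class
        fn = sum(m[i]) - tp                       # actually i, predicted another class
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    macro = sum(per_class) / k                    # unweighted mean over classes
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)
    return per_class, macro, micro


# Hypothetical 3-class example
labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "bird", "bird"]

m = multiclass_confusion(y_true, y_pred, labels)
per_class, macro, micro = f1_scores(m)
print(m)                                # [[2, 1, 0], [0, 1, 1], [0, 0, 1]]
print(round(macro, 3), round(micro, 3))  # 0.656 0.667
```

For single-label multiclass data, micro-averaged F1 equals the overall accuracy (here 4 correct out of 6).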