BSC ML CH2
Unit 2
BSc (Data Science)
*Few figures/content have been prepared/referred from internet sources/books.
Unit 2
1) Prior and posterior probabilities
2) Naive Bayesian algorithm
3) Laplacian correction
4) Logistic Regression: The Logistic model
5) Estimating the regression coefficients
6) Making predictions
7) Multiple logistic regression
Posterior Probability
A posterior probability is the probability of assigning observations to groups given
the data. The posterior is a probability distribution representing your
uncertainty over θ after you have sampled data – denoted π(θ|X). It is a
conditional distribution because it conditions on the observed data.
Suppose now that a subject has tested positive for HIV. It is known that the prevalence of HIV in the general population (the prior probability of infection) is 1%, the specificity of the test is 95%, and the sensitivity of the test is 99%.
What is the probability that the subject is HIV-infected? In other words, what is the conditional probability that a subject is HIV-infected given that he/she has tested positive?
The following table summarizes the calculations. (For the sake of simplicity you may consider the fractions (probabilities) as proportions of the general population.)
                 Infected (0.01)          Not infected (0.99)       Total
Test positive    0.01 × 0.99 = 0.0099     0.99 × 0.05 = 0.0495      0.0594
Test negative    0.01 × 0.01 = 0.0001     0.99 × 0.95 = 0.9405      0.9406
• Thus, the average proportion of positive tests overall is 0.0594, and the proportion of actually
infected among them is 0.0099/0.0594 or 0.167 = 16.7%. So, the posterior (i.e. after the test has
been carried out and turns out to be positive) probability that the subject is really HIV-infected is
0.167.
• The difference between prior and posterior probabilities characterizes the information gained from the experiment or measurement. In this example the probability changed from 0.01 (prior) to 0.167 (posterior).
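A minimal Python sketch of this calculation; the prior, sensitivity and specificity are taken from the example above:

# Prior prevalence, test sensitivity and specificity from the HIV example
prior = 0.01
sensitivity = 0.99   # P(test positive | infected)
specificity = 0.95   # P(test negative | not infected)

# Total probability of a positive test
p_positive = prior * sensitivity + (1 - prior) * (1 - specificity)   # 0.0594

# Posterior probability of infection given a positive test
posterior = prior * sensitivity / p_positive                         # 0.0099 / 0.0594
print(round(p_positive, 4), round(posterior, 3))                     # 0.0594 0.167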
Bayes' theorem: P(h|D) = (P(D|h) * P(h)) / P(D)
Where,
P(h): the probability of hypothesis h being true (regardless of the data). This is known as the prior probability of h.
P(D): the probability of the data (regardless of the hypothesis). This is known as the prior probability of the data (the evidence).
P(h|D): the probability of hypothesis h given the data D. This is known as the posterior probability.
P(D|h): the probability of the data D given that hypothesis h is true. This is known as the likelihood.
• Basic assumptions
• Use all the attributes
• Attributes are assumed to be:
• equally important: all attributes have the same relevance to the classification task.
• statistically independent (given the class value): knowledge about the value of a particular attribute tells us nothing about the value of another attribute once the class is known (see the factorization below).
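Under these assumptions, the class-conditional probability of a full attribute vector factorizes into a product of per-attribute probabilities:
P(x1, x2, ..., xn | c) = P(x1 | c) * P(x2 | c) * ... * P(xn | c)
and Naïve Bayes predicts the class c that maximizes P(c) * P(x1 | c) * ... * P(xn | c).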
• Why is it called Naïve Bayes?
• The name Naïve Bayes is made up of the two words Naïve and Bayes, which can be described as:
• Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature individually contributes to identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem
Properties
1. The Naive Bayes Algorithm is one of the popular classification machine learning algorithms; it classifies data based on the computation of conditional probability values.
2. It implements Bayes' theorem for the computation and uses class labels represented as feature values or vectors of predictors for classification.
3. Naive Bayes Algorithm is a fast algorithm for classification problems.
4. This algorithm is a good fit for real-time prediction, multi-class prediction, recommendation
system, text classification, and sentiment analysis use cases.
5. Naive Bayes Algorithm can be built using Gaussian, Multinomial and Bernoulli distribution.
6. This algorithm is scalable and easy to implement for a large data set.
7. It helps to calculate the posterior probability P(c|x) using the prior probability of class P(c), the prior
probability of predictor P(x), and the probability of predictor given class, also called as
likelihood P(x|c).
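Written as a formula, this is Bayes' theorem applied to a class c and a predictor (attribute) vector x:
P(c|x) = (P(x|c) * P(c)) / P(x)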
Steps
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
• Solution: To solve this, first consider the dataset:
     Outlook    Play
0    Rainy      Yes
1    Sunny      Yes
2    Overcast   Yes
3    Overcast   Yes
4    Sunny      No
...
9    Sunny      No
10   Sunny      Yes
11   Rainy      No
12   Overcast   Yes
13   Overcast   Yes
• Likelihood table for the weather condition (counts of No/Yes for each Outlook value):
Applying Bayes' theorem:
P(Yes|Sunny) = (P(Sunny|Yes) * P(Yes)) / P(Sunny)
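As a rough illustration of the three steps, the sketch below builds the frequency table and applies Bayes' theorem with pandas. Only the rows listed above are used (rows 5-8 are not shown in the table), so the resulting numbers apply to this subset only:

import pandas as pd

# Subset of the weather dataset shown above (rows 5-8 omitted in the source)
data = pd.DataFrame({
    "Outlook": ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny",
                "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"],
    "Play":    ["Yes", "Yes", "Yes", "Yes", "No",
                "No", "Yes", "No", "Yes", "Yes"],
})

# Step 1: frequency table of Outlook vs Play
freq = pd.crosstab(data["Outlook"], data["Play"])

# Steps 2-3: likelihoods and Bayes' theorem for P(Yes | Sunny)
p_yes = (data["Play"] == "Yes").mean()                             # P(Yes)
p_sunny = (data["Outlook"] == "Sunny").mean()                      # P(Sunny)
p_sunny_given_yes = freq.loc["Sunny", "Yes"] / freq["Yes"].sum()   # P(Sunny | Yes)

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny            # P(Yes | Sunny)
print(freq)
print(p_yes_given_sunny)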
• Gaussian: The Gaussian model assumes that features follow a normal distribution. This means if
predictors take continuous values instead of discrete, then the model assumes that these values are
sampled from the Gaussian distribution.
• Multinomial: The Multinomial Naïve Bayes classifier is used when the data are multinomially distributed. It is primarily used for document classification problems, i.e. deciding which category a particular document belongs to, such as Sports, Politics, Education, etc. The classifier uses the frequency of words as the predictors.
• Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present or not in a document. This model is also well known for document classification tasks.
Types of the Naive Bayes Model:
• Gaussian Naive Bayes: used for continuous data; the data should follow a Gaussian (normal) distribution.
• Multinomial Naive Bayes: used for discrete data; typical use cases are text classification and document categorization.
• Bernoulli Naive Bayes: used for binary data; typical use cases are spam detection and sentiment analysis.
• Note: a classifier that goes through all the possibilities is very slow and time-consuming.
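A minimal scikit-learn sketch of the three variants; the toy feature matrices below are made up purely for illustration:

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Made-up toy data: 4 samples, 3 features, binary labels
X_continuous = np.array([[5.1, 3.5, 1.4], [4.9, 3.0, 1.4], [6.3, 3.3, 6.0], [5.8, 2.7, 5.1]])
X_counts     = np.array([[2, 0, 1], [0, 3, 0], [1, 1, 4], [0, 0, 2]])   # e.g. word counts
X_binary     = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 1]])   # e.g. word present/absent
y = np.array([0, 0, 1, 1])

gnb = GaussianNB().fit(X_continuous, y)    # continuous features
mnb = MultinomialNB().fit(X_counts, y)     # discrete counts (text classification)
bnb = BernoulliNB().fit(X_binary, y)       # binary features (spam detection)
print(gnb.predict([[6.0, 3.0, 5.0]]))      # predict the class of a new sample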
Advantages of Naïve Bayes Classifier:
• Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
• It can be used for Binary classification.
• It performs well in Multi-class predictions as compared to the other
Algorithms.
• It is the most popular choice for text classification problems.
Disadvantages of Naive Bayes:
• Naive Bayes assumes that all predictors (or features) are independent, which rarely happens in real life. This limits the applicability of the algorithm in real-world use cases.
• This algorithm faces the ‘zero-frequency problem’ where it assigns
zero probability to a categorical variable whose category in the test
data set wasn’t available in the training dataset. It would be best if
you used a smoothing technique to overcome this issue.
• Its estimations can be wrong in some cases, so you shouldn’t take its
probability outputs very seriously.
Applications
of Naïve Bayes Classifier:
Example 1: Probabilities of weather data
• Answer : Play=No
Example
• Using this data, we have to identify the species of an entity with the following attributes:
• X = {Color=Green, Legs=2, Height=Tall, Smelly=No}
• To predict the class label for the above attribute set, we will first
calculate the probability of the species being M or H in total.
• P(Species=M)=4/8=0.5
• P(Species=H)=4/8=0.5
• Next, we will calculate the conditional probability of each attribute
value for each class label.
• P(Color=White/Species=M) = 2/4 = 0.5      P(Color=White/Species=H) = 3/4 = 0.75
• P(Color=Green/Species=M) = 2/4 = 0.5      P(Color=Green/Species=H) = 1/4 = 0.25
• P(Legs=2/Species=M) = 1/4 = 0.25          P(Legs=2/Species=H) = 4/4 = 1
• P(Legs=3/Species=M) = 3/4 = 0.75          P(Legs=3/Species=H) = 0/4 = 0
• P(Height=Tall/Species=M) = 3/4 = 0.75     P(Height=Tall/Species=H) = 2/4 = 0.5
• P(Height=Short/Species=M) = 1/4 = 0.25    P(Height=Short/Species=H) = 2/4 = 0.5
• P(Smelly=Yes/Species=M) = 3/4 = 0.75      P(Smelly=Yes/Species=H) = 1/4 = 0.25
• P(Smelly=No/Species=M) = 1/4 = 0.25       P(Smelly=No/Species=H) = 3/4 = 0.75
• Now that we have calculated the conditional probabilities, we will use them to calculate the probability of the
new attribute set belonging to a single class.
• P(M/X)=P(Species=M)*P(Color=Green/Species=M)*P(Legs=2/Species=M)*P(Height=Tall/Species=M)*P(Smelly=
No/Species=M)
• =0.5*0.5*0.25*0.75*0.25
• =0.0117
• Similarly, the probability of X belonging to Species H will be calculated as
follows.
• P(H/X)=P(Species=H)*P(Color=Green/Species=H)*P(Legs=2/Species=H)*P(
Height=Tall/Species=H)*P(Smelly=No/Species=H)
• = 0.5*0.25*1*0.5*0.75
• = 0.0469
• So, the probability of X belonging to Species M is 0.0117 and to Species H is 0.0469. Hence, we assign the entity X with attributes {Color=Green, Legs=2, Height=Tall, Smelly=No} to Species H.
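A quick Python sketch that redoes this arithmetic, with the conditional probabilities copied from the values computed above:

# Prior probabilities of the two species
p_m, p_h = 0.5, 0.5

# P(Color=Green, Legs=2, Height=Tall, Smelly=No | species), from the tables above
p_x_given_m = 0.5 * 0.25 * 0.75 * 0.25
p_x_given_h = 0.25 * 1.0 * 0.5 * 0.75

score_m = p_m * p_x_given_m   # ≈ 0.0117
score_h = p_h * p_x_given_h   # ≈ 0.0469
print("Predicted species:", "M" if score_m > score_h else "H")   # H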
• P(A|B) = (P(B|A) * P(A)) / P(B)
• Mango:
• P(X | Mango) = P(Yellow | Mango) * P(Sweet | Mango) * P(Long | Mango)
• a) P(Yellow | Mango) = (P(Mango | Yellow) * P(Yellow)) / P(Mango)
• = ((350/800) * (800/1200)) / (650/1200)
• P(Yellow | Mango) = 0.53 → 1
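The same calculation in Python; the counts below (1200 fruits in total, 800 yellow, 650 mangoes, 350 that are both yellow and mango) are an assumption inferred from the fractions shown above:

# Counts assumed from the fractions in the slide
p_mango_given_yellow = 350 / 800   # P(Mango | Yellow)
p_yellow = 800 / 1200              # P(Yellow)
p_mango = 650 / 1200               # P(Mango)

p_yellow_given_mango = p_mango_given_yellow * p_yellow / p_mango
print(p_yellow_given_mango)        # ≈ 0.538 (shown as 0.53 above)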
So, in the case that we don't have a particular ingredient (feature value) in our training set, the posterior probability comes out to 1 / (N + k) instead of zero.
Using Laplace smoothing, we can represent P(x'|positive) as:
P(x'|positive) = (number of positive training examples containing x' + α) / (N + α*k)
where N is the number of positive training examples, k is the number of distinct values the feature can take, and α = 1 for Laplace smoothing.
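A small sketch of the smoothed estimate; the counts and the value of alpha below are illustrative:

def laplace_smoothed_prob(count_value_in_class, count_class, n_distinct_values, alpha=1.0):
    # (count of this feature value within the class + alpha) / (class count + alpha * number of distinct values)
    return (count_value_in_class + alpha) / (count_class + alpha * n_distinct_values)

# A feature value never seen with the positive class gets 1 / (N + k) instead of 0
print(laplace_smoothed_prob(0, count_class=10, n_distinct_values=3))   # 1/13 ≈ 0.077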
https://colab.research.google.com/drive/1DX8wOrqeWyNCOBWlN9dQzJL_IccX8wiS?authuser=2
Email spam detection
Logistic Regression: The Logistic model
1. Logistic regression becomes a classification technique only when a decision threshold is brought into
the picture.
2. The setting of the threshold value is a very important aspect of Logistic regression and is dependent
on the classification problem itself.
3. The decision for the value of the threshold value is majorly affected by the values of precision and
recall.
4. Ideally, we want both precision and recall to be 1, but this seldom is the case.
5. Logistic Regression is a “Supervised machine learning” algorithm that can be used to model the
probability of a certain class or event.
6. It is used when the data is linearly separable and the outcome is binary in nature.
7. That means Logistic regression is usually used for Binary classification problems.
The sigmoid function, σ(z) = 1 / (1 + e^(−z)), maps any real value to a value between 0 and 1. In machine learning, we use the sigmoid to map predictions to probabilities.
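A tiny sketch of the sigmoid and a 0.5 decision threshold; the coefficients b0 and b1 below are made-up values for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -4.0, 0.8                 # made-up coefficients: z = b0 + b1*x
x = np.array([2.0, 5.0, 8.0])
p = sigmoid(b0 + b1 * x)           # probabilities in (0, 1)
y_pred = (p >= 0.5).astype(int)    # apply the decision threshold
print(p, y_pred)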
When to use Logistic Regression?
• Logistic Regression is used when the input needs to be separated into “two regions” by a linear boundary.
• The data points are separated by a straight line (linear decision boundary), as shown:
• Based on the number of categories, Logistic regression can be classified as:
• binomial: target variable can have only 2 possible types: “0” or “1” which
may represent “win” vs “loss”, “pass” vs “fail”, “dead” vs “alive”, etc.
• multinomial: target variable can have 3 or more possible types which are
not ordered(i.e. types have no quantitative significance) like “disease A” vs
“disease B” vs “disease C”.
• ordinal: it deals with target variables with ordered categories. For example,
a test score can be categorized as:“very poor”, “poor”, “good”, “very good”.
Here, each category can be given a score like 0, 1, 2, 3.
Interpretation of Regression Coefficients
Odds of success
• Odds (𝜃) = Probability of an event happening / Probability of the event not happening
• 𝜃 = p / (1 − p)
• The values of the odds range from 0 to ∞, while the values of a probability lie between 0 and 1.
• Consider the equation of a straight line:
𝑦 = 𝛽0 + 𝛽1 * 𝑥
• We transform the model from linear regression to logistic regression by applying the logistic (sigmoid) function to this linear combination.
• Now, to predict the odds of success, we use the following formula:
log(odds) = log(p / (1 − p)) = 𝛽0 + 𝛽1 * 𝑥, which gives p = 1 / (1 + e^−(𝛽0 + 𝛽1 * 𝑥))
# Making predictions with a fitted LogisticRegression model (logreg) on the test set
y_pred = logreg.predict(X_test)
# import the metrics class and compute the confusion matrix
from sklearn import metrics
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix
# array([[115,   8],
#        [ 30,  39]])
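For context, a minimal end-to-end sketch that would produce objects like logreg, X_test and y_test used above; the synthetic dataset and the split are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Illustrative synthetic binary-classification data
X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)                    # estimate the regression coefficients

y_pred = logreg.predict(X_test)                 # make predictions
print(metrics.confusion_matrix(y_test, y_pred))
print(metrics.accuracy_score(y_test, y_pred))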
Example
• The dataset of pass/fail in an exam for 5 students is given in the table below. Suppose we use Logistic Regression as the classifier, and the model suggested by the optimizer for the odds of passing the course is:
• log(Odds) = −64 + 2 × hours
• 1) How do we calculate the probability of passing for a student who studied 33 hours?
• 2) At least how many hours should the student study to be sure of passing the course with a probability of more than 95%?
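A sketch of how both questions can be worked out from log(Odds) = −64 + 2 × hours, using the log-odds/probability relationship above:

import math

# 1) Probability of passing after 33 hours of study
log_odds = -64 + 2 * 33                  # = 2
p = 1 / (1 + math.exp(-log_odds))
print(round(p, 3))                       # ≈ 0.881

# 2) Hours needed so that P(pass) > 0.95
#    p > 0.95  =>  odds > 0.95 / 0.05 = 19  =>  log(odds) > ln(19)
hours = (math.log(19) + 64) / 2
print(round(hours, 2))                   # ≈ 33.47 hours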