Unit-IV: PROBABILISTIC LEARNING & ENSEMBLE

• Bayesian Learning, Bayes Optimal Classifier, Naive Bayes Classifier, Bayesian Belief
Networks.
• Ensemble methods: Bagging & Boosting, C5.0 boosting, Random Forest, Gradient Boosting
Machines, and XGBoost
Conditional Probability:

• Conditional probability is a concept in probability theory that deals with the probability of an event occurring
given that another event has already occurred.
• It's denoted as P(A | B), which reads as "the probability of event A occurring given that event B has occurred."
• In other words, it quantifies the likelihood of event A happening under the condition that we already know
event B has taken place.
• The formula for conditional probability is:

P(A | B) = P(A and B) / P(B)

Where:
• P(A | B) is the conditional probability of event A given event B.
• P(A and B) is the joint probability of both events A and B occurring together.
• P(B) is the probability of event B happening.
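
As a quick illustration of this formula, the following Python snippet computes P(A | B) from raw event counts; the counts themselves are invented purely for the example.

```python
# A minimal sketch of the conditional probability formula, using invented event counts.
n_total = 100      # total number of observations
n_A_and_B = 12     # observations where both A and B occur
n_B = 30           # observations where B occurs

p_A_and_B = n_A_and_B / n_total   # P(A and B)
p_B = n_B / n_total               # P(B)

p_A_given_B = p_A_and_B / p_B     # P(A | B) = P(A and B) / P(B)
print(p_A_given_B)                # 0.4
```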
Conditional Probability applications:
Conditional probability is useful in various real-world applications, such as:

1. Medical Diagnosis: The probability of a patient having a particular disease given the results of a medical test.

2. Weather Forecasting: The probability of rain tomorrow given today's weather conditions.

3. Finance: Assessing the probability of a stock price increasing given certain economic indicators.

4. Quality Control: The probability of a product being defective given a specific manufacturing process.

5. Information Retrieval: Recommender systems in e-commerce websites calculate the probability of a user liking a
product based on their past preferences.

Conditional probability is a fundamental concept in probability theory and is essential for making informed decisions in
situations where events depend on each other.
Bayes Theorem:
We should already have prerequisite knowledge of conditional probability, since Bayes' theorem builds directly on it.

• Bayes' theorem is also known by other names, such as Bayes' rule or Bayes' law.

• Bayes' theorem helps determine the probability of an event based on prior knowledge of conditions related to it.

• It is used to calculate the probability of one event occurring given that another event has already occurred.

• It relates conditional probability and marginal probability.

• Bayes' theorem is extensively applied in finance, health and medicine, research and survey work, the aeronautical
sector, etc.

• A simplified application of Bayes' theorem (the Naïve Bayes classifier) is also used to reduce computation time and
cost in such projects.
Bayes Theorem:
Bayes' Theorem, named after the 18th-century statistician and philosopher Thomas Bayes, is a fundamental principle in probability theory

and statistics. It provides a way to update the probability for a hypothesis (or event) based on new evidence. The theorem can be stated as

follows:

P(A | B) = [P(B | A) * P(A)] / P(B)

Where: P(A | B) is called the posterior, which is what we need to calculate. It is the updated probability of the hypothesis after considering the evidence.

P(B | A) is called the likelihood. It is the probability of the evidence given that the hypothesis is true.

P(A) is the prior probability of event A: the probability of the hypothesis before considering the evidence.

P(B) is called the marginal probability. It is the overall probability of the evidence under all possible hypotheses.

Hence, Bayes Theorem can be written as:

posterior = likelihood * prior / evidence


Bayes Theorem:
Bayes' Theorem allows you to update your prior beliefs (prior probability) by incorporating new evidence (likelihood) to calculate the
revised probability (posterior probability) of the hypothesis of interest. The marginal probability represents the overall probability of
observing the evidence, taking into account all possible hypotheses.

P(A) is prior probability : Probability of hypothesis before observing the evidence.

P(B | A) is likelihood probability : Probability of the evidence B given that the probability of a hypothesis A is true.

P(A | B) is posterior probability: Probability of hypothesis A on the observed event B.

P(B) is marginal probability: The marginal probability is the overall probability of observing evidence B.

P(B) = Σ [P(Ai) * P(B | Ai)], where the sum runs over all mutually exclusive and exhaustive hypotheses Ai.
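
To make the pieces concrete, the snippet below works through a hypothetical medical-test example; the prior, likelihood, and false-positive rate are invented numbers, and the marginal P(B) is computed with the total-probability sum above.

```python
# A hedged worked example of Bayes' Theorem; all numbers are hypothetical.
p_disease = 0.01                # P(A): prior probability of having the disease
p_pos_given_disease = 0.95      # P(B | A): likelihood, probability of a positive test if diseased
p_pos_given_healthy = 0.05      # P(B | not A): probability of a positive test if healthy

# Marginal probability of the evidence: P(B) = sum over hypotheses Ai of P(Ai) * P(B | Ai)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior: P(A | B) = P(B | A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))    # 0.161: even a positive test leaves a fairly low posterior
```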


Naïve Bayes Classifier:
• Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and used for solving

classification problems.

• It is mainly used in text classification, which typically involves a high-dimensional training dataset.

• The Naïve Bayes classifier is one of the simplest and most effective classification algorithms, and it helps in building fast
machine learning models that can make quick predictions.

• It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to each class.

• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Why is it called Naïve Bayes?
• The Naïve Bayes algorithm combines two words, Naïve and Bayes, which can be described as:

• Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of
other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet
fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple without depending
on the others.

• Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.


How Do Naive Bayes Algorithms Work?
• Consider a training data set of weather conditions and the corresponding target variable 'Play' (indicating whether a match is played).

• We need to classify whether players will play or not based on the weather condition.

• Let's follow the steps below.

1. Convert the data set into a frequency table: count how often each weather condition occurs together with each value of 'Play'.

2. Create a likelihood table by finding the probabilities: for example, P(Overcast) = 0.29 and P(Play = Yes) = 0.64.


How Do Naive Bayes Algorithms Work?
• Now, use the Naive Bayes equation to calculate the posterior probability for each class. The class with the
highest posterior probability is the outcome of the prediction.

• Problem: Players will play if the weather is sunny. Is this statement correct?
• We can solve it using the above-discussed method of posterior probability.
P(Yes | Sunny) = P( Sunny | Yes) * P(Yes) / P (Sunny)

• Here we have P (Sunny |Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P( Yes)= 9/14 = 0.64

• Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 ≈ 0.60, which is higher than P(No | Sunny) = 0.40, so the prediction is that players will play when it is sunny.
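
The same calculation can be written out in a few lines of Python, using the counts quoted above (3 of the 9 "Yes" days are Sunny, 5 of the 14 days are Sunny, 9 of the 14 days are "Yes"):

```python
# A minimal sketch of the P(Yes | Sunny) calculation from the weather example.
p_sunny_given_yes = 3 / 9    # likelihood  P(Sunny | Yes)
p_yes = 9 / 14               # prior       P(Yes)
p_sunny = 5 / 14             # evidence    P(Sunny)

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = 1 - p_yes_given_sunny     # only two classes, so the posteriors sum to 1

print(round(p_yes_given_sunny, 2))   # 0.6  -> predict "Play = Yes" on a sunny day
print(round(p_no_given_sunny, 2))    # 0.4
```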
Pros and Cons of Naive Bayes:
Pros:
• It is easy and fast to predict the class of a test data set, and it also performs well in multi-class prediction.
• When the assumption of independence holds, the classifier performs better than other machine learning
models such as logistic regression or decision trees, and requires less training data.
• It performs well with categorical input variables compared to numerical variables. For numerical
variables, a normal (bell-curve) distribution is assumed, which is a strong assumption.
Cons:
• If a categorical variable has a category in the test data set that was not observed in the training data set, the model
will assign it a zero probability and will be unable to make a prediction. This is often known as the "Zero
Frequency" problem. To solve it, we can use a smoothing technique; one of the simplest smoothing techniques is
Laplace estimation (a sketch follows this list).
• Naive Bayes is also known to be a poor probability estimator, so the probability outputs from
predict_proba should not be taken too seriously.
• Another limitation of this algorithm is the assumption of independent predictors. In real life, it is almost
impossible to get a set of predictors that are completely independent.
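
The zero-frequency point above can be illustrated with a small sketch of Laplace (add-one) smoothing; the counts are invented, and scikit-learn's discrete Naive Bayes estimators (e.g., MultinomialNB) expose the same idea through their alpha parameter.

```python
# A minimal sketch of Laplace (add-one) smoothing for the "Zero Frequency" problem.
# Invented counts of how often each weather value co-occurred with one class in training.
counts = {"sunny": 3, "overcast": 4, "rainy": 2, "snowy": 0}   # "snowy" never seen with this class
total = sum(counts.values())   # 9 observations for this class
k = len(counts)                # 4 possible feature values

# Without smoothing, P(snowy | class) = 0/9 = 0, which zeroes out the whole Naive Bayes product.
# Add-one smoothing adds 1 to every count (and k to the denominator):
smoothed = {value: (count + 1) / (total + k) for value, count in counts.items()}
print(smoothed["snowy"])   # 1/13 ≈ 0.077 instead of 0
```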
Applications of Naive Bayes Algorithms:
• Real-time Prediction: Naive Bayesian classifier is an eager learning classifier and it is super fast. Thus, it
could be used for making predictions in real time.

• Multi-class Prediction: This algorithm is also well known for multi class prediction feature. Here we can
predict the probability of multiple classes of target variable.

• Text classification / Spam Filtering / Sentiment Analysis: Naive Bayesian classifiers are widely used in text
classification (they give good results on multi-class problems under the independence assumption) and often achieve a
higher success rate than other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and
sentiment analysis (in social media analysis, to identify positive and negative customer sentiment); see the sketch after this list.

• Recommendation System: A Naive Bayes classifier combined with collaborative filtering can build a
recommendation system that uses machine learning and data mining techniques to filter unseen information
and predict whether a user would like a given resource or not.
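
As a rough sketch of the spam-filtering application, the snippet below trains a multinomial Naive Bayes model on a tiny invented corpus with scikit-learn; the texts, labels, and pipeline choices are illustrative assumptions, not a prescribed setup.

```python
# A hedged sketch of Naive Bayes text classification (spam filtering) with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "lowest price guaranteed today",
         "meeting at 10am tomorrow", "please review the attached report"]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words features feed a multinomial Naive Bayes model (alpha=1.0 is Laplace smoothing).
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(texts, labels)

print(model.predict(["free prize waiting for you"]))   # most likely ['spam'] on this toy corpus
```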
Ensemble Learning:
• Ensemble learning is a machine learning technique that enhances accuracy and resilience in forecasting by merging
predictions from multiple models.

• It aims to mitigate errors or biases that may exist in individual models by leveraging the collective intelligence of the
ensemble.

• The underlying concept behind ensemble learning is to combine the outputs of diverse models to create a more precise
prediction. By considering multiple perspectives and utilizing the strengths of different models, ensemble learning
improves the overall performance of the learning system.

• This approach not only enhances accuracy but also provides resilience against uncertainties in the data.

• By effectively merging predictions from multiple models, ensemble learning has proven to be a powerful tool in
various domains, offering more robust and reliable forecasts.
Ensemble Learning Prediction Techniques:
• Max Voting

• Averaging

• Weighted Averaging
Ensemble Learning Prediction Techniques:
Max Voting: A Voting Classifier is an ensemble machine learning technique that combines the predictions from multiple
individual classifiers (also known as base classifiers or estimators) to make a final prediction.

• It’s a type of model averaging approach where each base classifier contributes its prediction, and the final prediction
is determined by a majority vote (for classification) or an average (for regression).

• The Voting Classifier can be used for both binary and multiclass classification tasks.
Types of Voting Classifiers:
• Hard Voting: In hard voting, each base classifier’s prediction is treated as a vote, and the final prediction is the
majority vote among the predictions of the individual classifiers. This is commonly used for classification tasks.

• Soft Voting: In soft voting, each base classifier’s predicted probabilities for each class are averaged, and the class with
the highest average probability is chosen as the final prediction. Soft voting often produces better results than hard
voting because it takes into account the confidence levels of the classifiers.
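
A minimal scikit-learn sketch of both voting modes is shown below; the choice of base estimators and the synthetic data are arbitrary, purely to demonstrate the voting="hard" and voting="soft" options.

```python
# A hedged sketch of hard vs. soft voting with scikit-learn's VotingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)  # toy data

estimators = [("lr", LogisticRegression(max_iter=1000)),
              ("nb", GaussianNB()),
              ("dt", DecisionTreeClassifier(max_depth=3))]

hard = VotingClassifier(estimators=estimators, voting="hard").fit(X, y)  # majority vote of labels
soft = VotingClassifier(estimators=estimators, voting="soft").fit(X, y)  # average of predicted probabilities

print(hard.predict(X[:5]))
print(soft.predict(X[:5]))
```

Soft voting requires every base estimator to implement predict_proba, which all three classifiers above do.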
Ensemble Learning Prediction Techniques:
Max Voting:

• For example, suppose you ask 5 of your colleagues to rate your movie (out of 5): three of them rate it a 4
while two of them give it a 5. Since the majority gave a rating of 4, the final rating is taken as 4. You can
think of this as taking the mode of all the predictions.

• The result of max voting would be something like this:

         Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Rating   5             4             5             4             4             4
Averaging:
Similar to the max voting technique, multiple predictions are made for each data point in averaging.

In this method, we take an average of predictions from all the models and use it to make the final prediction.

Averaging can be used for making predictions in regression problems or while calculating probabilities for classification
problems.

For example, in the below case, the averaging method would take the average of all the values.

i.e. (5+4+5+4+4)/5 = 4.4

         Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Rating   5             4             5             4             4             4.4
Weighted Average:
This is an extension of the averaging method. All models are assigned different weights defining the importance of each
model for prediction.

For instance, if two of your colleagues are critics while the others have no prior experience in this field, then the answers
from these two colleagues are given more importance than the answers from the others.

The result is calculated as [(5*0.23) + (4*0.23) + (5*0.18) + (4*0.18) + (4*0.18)] = 4.41.

         Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Weight   0.23          0.23          0.18          0.18          0.18
Rating   5             4             5             4             4             4.41
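
The three combination rules can be reproduced directly from the ratings and weights in the examples above; the short NumPy sketch below is only for illustration.

```python
# A minimal sketch of max voting, averaging, and weighted averaging on the movie ratings above.
import numpy as np

ratings = np.array([5, 4, 5, 4, 4])
weights = np.array([0.23, 0.23, 0.18, 0.18, 0.18])

max_vote = int(np.bincount(ratings).argmax())     # 4: the most frequent rating (the mode)
average = float(ratings.mean())                   # 4.4: plain average
weighted = float(np.sum(ratings * weights))       # 4.41: weighted average

print(max_vote, average, round(weighted, 2))
```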
Ensemble learning technique classification:
Ensemble learning is a machine learning technique where multiple models are combined to improve the overall
performance of the system. The basic idea is that by combining multiple models, each model capturing different aspects
of the data, the ensemble model can often achieve better predictive performance than any individual model.

There are two main types of ensemble learning techniques: bagging and boosting.
Bagging (Bootstrap Aggregating):
• Bagging involves training multiple instances of the same base learning algorithm on different subsets of the training
data.

• Each subset is sampled with replacement from the original dataset, which means that some instances may be repeated
while others may be left out.

• The final prediction is typically made by averaging the predictions of all the models (for regression) or by taking a
majority vote (for classification).

• Random Forest is a popular algorithm based on bagging, where decision trees are the base learners.
Bagging (Bootstrap Aggregating):
• Step 1: Bootstrap samples of the same size as the training set are repeatedly drawn (with replacement) from the training
data, so that each record has an equal probability of being selected.

• Step 2: A classification or estimation model is trained on each bootstrap sample drawn in Step 1, and a prediction is
recorded for each sample.

• Step 3: The bagging ensemble prediction is then defined to be the class with the most votes in Step 2 (for
classification models) or the average of the predictions made in Step 2 (for estimation models)
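
A minimal sketch of these three steps using scikit-learn is given below; the synthetic data and the choice of 50 estimators are illustrative assumptions. BaggingClassifier uses decision trees as its default base learner, and Random Forest can be seen as bagging of decision trees with extra feature randomness at each split.

```python
# A hedged sketch of bagging with scikit-learn; data and hyperparameters are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)  # toy data

# Steps 1-2: draw 50 bootstrap samples (with replacement) and fit one base model
# (a decision tree by default) on each sample.
bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0).fit(X, y)

# Step 3: predict() aggregates the 50 base models by majority vote (or averaging for regression).
print(bagging.predict(X[:5]))

# Random Forest: bagged decision trees plus random feature selection at each split.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(forest.predict(X[:5]))
```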
Boosting:
• Boosting involves sequentially training multiple weak learners (models that are only slightly better than
random guessing) to correct the errors made by the previous models in the sequence.
• Each subsequent model focuses more on the instances that were misclassified by the previous models, thereby
reducing the overall error.
• Unlike bagging, the base learners are trained sequentially, and each subsequent model tries to correct the
mistakes of the previous ones.
• Some popular boosting algorithms include AdaBoost (Adaptive Boosting), Gradient Boosting Machines
(GBM), and XGBoost.
Boosting:
Step 1: All observations have equal weight in the original training data set D1. An initial “base” classifier h1 is determined.

Step 2: The observations that were incorrectly classified by the previous base classifier have their weights increased, while
the observations that were correctly classified have their weights decreased.
This gives us data distribution Dm, m=2, … , M.
A new base classifier hm, m = 2, … , M is determined, based on the new weights. This step is repeated until the desired
number of iterations M is achieved.

Step 3: The final boosted classifier is the weighted sum of the M base classifiers.
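
The re-weighting procedure in Steps 1-3 is essentially AdaBoost; a hedged scikit-learn sketch is given below (synthetic data, illustrative hyperparameters). Gradient Boosting Machines follow the same sequential idea but fit each new learner to the errors of the current ensemble, and XGBoost offers a comparable interface in its own xgboost package (not shown here).

```python
# A hedged sketch of boosting with scikit-learn; data and settings are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)  # toy data

# AdaBoost: M = 100 weak learners (decision stumps by default) trained sequentially on
# re-weighted data; the final classifier is a weighted sum of the base classifiers.
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

# Gradient boosting: each new tree is fitted to the errors of the current ensemble.
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

print(ada.predict(X[:5]))
print(gbm.predict(X[:5]))
```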

Generalization & Test Error:


Generalization error refers to the performance on examples that were not seen during training; it is the probability that
the classifier misclassifies a new example.

Test error is the fraction of mistakes on a newly sampled test set.


Bagging vs Boosting:
Bagging does not improve results when the main challenge is bias, i.e., when the individual models underperform. Boosting,
by contrast, concentrates on strengthening individual models and correcting the weaknesses of each single model in
sequence, so a boosted model typically produces output with lower error in that situation.

When the challenge is overfitting in a single model, bagging outperforms boosting, since the boosting model itself is prone
to overfitting.