Unit 4
● Multistage combination
○ Multistage combination methods use a serial approach where
■ The next combination base-learner is trained with or tested on only
the instances where the previous base-learners are not accurate
enough.
■ The idea is that the base-learners (or the different representations
they use) are sorted in increasing complexity so that a complex
base-learner is not used (or its complex representation is not extracted)
unless the preceding simpler base-learners are not confident.
○ An example is cascading; a minimal sketch of this idea is given below.
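The following is a minimal Python sketch of the cascading idea, assuming scikit-learn is available; the two base-learners (logistic regression followed by a random forest), the 0.9 confidence threshold, and the synthetic data are illustrative choices, not part of the original notes.

```python
# Cascading sketch: a simple learner handles confident cases,
# a more complex learner is trained only on the unconfident ones.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, class_sep=0.5, random_state=0)

THETA = 0.9  # confidence threshold (illustrative choice)

# Stage 1: a simple base-learner is trained on all instances.
simple = LogisticRegression(max_iter=1000).fit(X, y)
conf = simple.predict_proba(X).max(axis=1)

# Stage 2: a more complex base-learner is trained only on the
# instances where stage 1 is not confident enough.
unsure = conf < THETA
complex_model = RandomForestClassifier(random_state=0).fit(X[unsure], y[unsure])

def cascade_predict(x_new):
    """Use the simple learner where it is confident, otherwise fall back."""
    p = simple.predict_proba(x_new)
    confident = p.max(axis=1) >= THETA
    out = simple.predict(x_new)
    if (~confident).any():
        out[~confident] = complex_model.predict(x_new[~confident])
    return out

print(cascade_predict(X[:5]))
```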
------------------------------------------
K-Nearest Neighbor (KNN)
● K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on
Supervised Learning technique.
● K-NN algorithm
○ assumes the similarity between the new case/data and available cases and
○ puts the new case into the category that is most similar to the available categories.
● K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a
well-suited category using the K-NN algorithm.
● K-NN algorithm can be used for Regression as well as for Classification, but it is mostly
used for Classification problems.
● K-NN is a non-parametric algorithm, which means it does not make any assumptions
about the underlying data.
● It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and performs an action on it at the time of
classification.
● At the training phase, the KNN algorithm just stores the dataset, and when it gets new data,
it classifies that data into the category that is most similar to it.
● Example: Suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification, we can use the KNN
algorithm, as it works on a similarity measure. Our KNN model will compare the features of
the new image with those of the cat and dog images and, based on the most similar features,
put it in either the cat or the dog category.
● Suppose we have a new data point x1 and we need to determine which of these categories
it belongs to. With the help of K-NN, we can easily identify its category, as illustrated in the
steps below:
○ Firstly, we will choose the number of neighbors, so we will choose the k=5.
○ Next, we will calculate the Euclidean distance between the data points. The
Euclidean distance is the distance between two points, which we have already
studied in geometry. For two points (x1, y1) and (x2, y2) it is calculated as
d = √((x2 − x1)² + (y2 − y1)²).
○ By calculating the Euclidean distances, we find the five nearest neighbors: three
nearest neighbors in category A and two nearest neighbors in category B.
○ Since the majority of the 5 nearest neighbors (3 out of 5) are from category A, the new
data point is assigned to category A.
○ A very low value for K such as K=1 or K=2, can be noisy and lead to the effects of
outliers in the model.
○ Larger values of K smooth out the effect of noise, but if K is too large the
neighborhood may include points from other classes and the computation becomes
more expensive.
Advantages of KNN Algorithm:
○ It is simple to implement.
○ It is easy to understand and interpret.
Disadvantages of KNN Algorithm:
○ The value of K always needs to be determined, which may be complex at times.
○ The computation cost is high, because the distance to every training sample must be
calculated for each prediction.
Example-
Height (in cms) Weight (in kgs) T Shirt Size
158 58 M
158 59 M
158 63 M
160 59 M
160 60 M
163 60 M
163 61 M
160 64 L
163 64 L
165 61 L
165 62 L
165 65 L
168 62 L
168 63 L
168 66 L
170 63 L
170 64 L
170 68 L
Given Height = 161 cm and Weight = 61 kg, predict the T-shirt size (a worked sketch follows below).
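A small worked sketch of this prediction in plain Python, using k = 5 as in the steps above; the helper name knn_predict is purely illustrative.

```python
# K-NN on the T-shirt data above: predict the size for Height=161, Weight=61.
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

data = [
    (158, 58, "M"), (158, 59, "M"), (158, 63, "M"), (160, 59, "M"),
    (160, 60, "M"), (163, 60, "M"), (163, 61, "M"), (160, 64, "L"),
    (163, 64, "L"), (165, 61, "L"), (165, 62, "L"), (165, 65, "L"),
    (168, 62, "L"), (168, 63, "L"), (168, 66, "L"), (170, 63, "L"),
    (170, 64, "L"), (170, 68, "L"),
]

def knn_predict(query, k=5):
    # Sort the training points by Euclidean distance to the query ...
    neighbours = sorted(data, key=lambda row: dist(query, row[:2]))[:k]
    # ... and return the majority class among the k nearest ones.
    return Counter(label for _, _, label in neighbours).most_common(1)[0][0]

print(knn_predict((161, 61), k=5))
```

With k = 5, four of the five nearest neighbors of (161, 61) are labelled M, so the predicted T-shirt size is M.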
Ensemble techniques
● Ensemble techniques are methods that use multiple learning algorithms or models to
produce one optimal predictive model.
● The model produced has better performance than the base learners taken alone.
● Other applications of ensemble learning also include selecting the important features,
data fusion, etc.
● Ensemble techniques can be primarily classified into Bagging, Boosting, and Stacking.
In Fig 1., m represents a weak learner; d1, d2, d3, d4 are the random samples from data D; d’, d”,
d”’ are the updated training sets based on the results of the previous weak learner.
1. Bagging:
● Bagging is mainly applied in supervised learning problems.
● It involves two steps, i.e., bootstrapping and aggregation.
● Bootstrapping is a random sampling method in which samples are drawn from the data
with replacement.
● In Fig 1., the first step in bagging is bootstrapping, where random data samples are fed
to each base learner.
○ The base learning algorithm is run on the samples to complete the procedure.
● In Aggregation, the outputs from the base learners are combined.
● The goal is to increase accuracy while reducing variance to a large extent.
● E.g., Random Forest, where the predictions from decision trees (the base learners) are
made in parallel.
● In the case of regression problems, these predictions are averaged to give the final
prediction, and
● in the case of classification problems, the mode (majority class) is selected as the
predicted class (see the sketch below).
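A minimal sketch of bagging by hand, assuming scikit-learn decision trees as the base learners; the synthetic data, the 25 trees, and the helper name bagging_predict are illustrative.

```python
# Bagging sketch: bootstrap samples -> one decision tree per sample -> majority vote.
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)

n_trees = 25
trees = []
for _ in range(n_trees):
    # Bootstrapping: sample indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def bagging_predict(x_new):
    # Aggregation: each tree votes; the mode (most common class) wins.
    votes = np.array([t.predict(x_new) for t in trees])   # shape (n_trees, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

pred = bagging_predict(X)
print("training accuracy:", (pred == y).mean())
```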
2. Boosting:
● It is an ensemble method in which each predictor learns from the preceding predictor's
mistakes to make better predictions in the future.
● The technique combines several weak base learners that are arranged in a sequential
(Fig 1.) manner such that weak learners learn from the previous weak learner’s errors to
create a better predictive model.
● Hence one strong learner is formed, significantly improving the predictive performance of
the model.
● E.g., XGBoost, AdaBoost (a small AdaBoost sketch follows below).
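A short hedged sketch using scikit-learn's AdaBoostClassifier; the synthetic data and the 50 estimators are illustrative choices, not prescribed by the notes.

```python
# Boosting sketch: AdaBoost builds weak learners sequentially, each one
# focusing on the training instances the previous learners got wrong.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

booster = AdaBoostClassifier(n_estimators=50, random_state=0)
booster.fit(X_train, y_train)
print("test accuracy:", booster.score(X_test, y_test))
```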
Bagging vs. Boosting
● Bagging: Various training data subsets are randomly drawn with replacement from the whole
training dataset. Boosting: Each new subset contains the instances that were misclassified by
the previous models.
● Bagging: It attempts to tackle the over-fitting issue. Boosting: It tries to reduce bias.
● Bagging: Every model receives an equal weight. Boosting: Models are weighted according to
their performance.
● Bagging: The objective is to decrease variance, not bias. Boosting: The objective is to
decrease bias, not variance.
● Bagging: It is the easiest way of combining predictions that belong to the same type.
Boosting: It is a way of combining predictions that belong to different types.
● Bagging: Every model is constructed independently. Boosting: New models are influenced by
the performance of the previously developed model.
3. Stacking:
The number of weak learners in the stack is variable.
● While bagging and boosting use homogeneous weak learners for the ensemble,
● stacking often considers heterogeneous weak learners, learns them in parallel, and
combines them by training a meta-learner to output a prediction based on the different
weak learners' predictions.
● The meta-learner takes the base learners' predictions as input features, with the ground-truth
values in data D (Fig 2.) as the target; it attempts to learn how to best combine the input
predictions to make a better output prediction.
In an averaging ensemble, e.g. Random Forest, the model combines the predictions from multiple
trained models. A limitation of this approach is that each model contributes the same amount to
the ensemble prediction, irrespective of how well it performed. An alternative approach is a
weighted average ensemble, which weighs the contribution of each ensemble member by the
trust placed in its predictions. The weighted average ensemble provides an improvement over
the model averaging ensemble.
A further generalization of this approach is to replace the linear weighted sum with Linear
Regression (for regression problems) or Logistic Regression (for classification problems), or
more generally with any learning algorithm, to combine the predictions of the sub-models.
This approach is called Stacking.
In stacking, an algorithm takes the outputs of sub-models as input and attempts to learn how to
best combine the input predictions to make a better output prediction, as sketched below.
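A minimal sketch of this idea with scikit-learn's StackingClassifier, assuming three heterogeneous base learners and a logistic-regression meta-learner on synthetic data; all of these choices are illustrative.

```python
# Stacking sketch: heterogeneous base learners combined by a logistic-regression meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner trained on the base predictions
)
stack.fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))
```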
Voting
Hard voting
● Hard voting is also known as majority voting.
● The base classifiers are each fed with the training data individually.
● The models predict the output class independently of each other.
● The final output class is the class predicted by the majority of the models.
In the accompanying figure, Pf is the class predicted by the majority of the classifiers Cm.
Soft voting
● In Soft voting, classifiers or base models are fed with the training data to predict the
classes out of m possible classes.
● Each base model classifier independently assigns a probability of occurrence to
each class.
● In the end, the average of the probabilities of each class is calculated, and the
final output is the class having the highest average probability (see the sketch below).
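The sketch below illustrates both schemes with scikit-learn's VotingClassifier; the three base classifiers and the synthetic data are illustrative choices.

```python
# Hard vs. soft voting sketch with three heterogeneous classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(random_state=0)),
]

hard = VotingClassifier(estimators, voting="hard").fit(X_train, y_train)  # majority of predicted classes
soft = VotingClassifier(estimators, voting="soft").fit(X_train, y_train)  # average of predicted probabilities
print("hard voting accuracy:", hard.score(X_test, y_test))
print("soft voting accuracy:", soft.score(X_test, y_test))
```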
Gaussian mixture models
● Suppose there are set of data points that need to be grouped into several parts or
clusters based on their similarity. In machine learning, this is known as Clustering.
● There are several methods available for clustering:
○ K Means Clustering
○ Hierarchical Clustering
○ Gaussian Mixture Models
● Normal or Gaussian Distribution
○ In real life, many datasets can be modeled by Gaussian Distribution (Univariate
or Multivariate).
○ So it is quite natural and intuitive to assume that the clusters come from different
Gaussian Distributions.
○ Hence, the dataset is modeled as a mixture of several Gaussian Distributions.
Gaussian Mixture Model
Suppose there are K clusters (for the sake of simplicity, the number of clusters is assumed
to be known and equal to K). So the mean μk and the covariance Σk also have to be
estimated for each cluster k. Had it been only one distribution, they would have been
estimated by the maximum-likelihood method. But since there are K such clusters, the
probability density is defined as a linear combination of the densities of all these K
distributions, i.e.
p(X) = π1 G(X | μ1, Σ1) + π2 G(X | μ2, Σ2) + ... + πK G(X | μK, ΣK) = Σk πk G(X | μk, Σk),
where G(X | μk, Σk) is the k-th Gaussian density and πk is its mixing coefficient
(πk ≥ 0 and Σk πk = 1). A sketch of fitting such a model is given below.
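As a hedged illustration, the following sketch fits such a mixture with scikit-learn's GaussianMixture (which is trained with the EM algorithm described in the next section); the three synthetic 2-D blobs and K = 3 are assumptions made only for the example.

```python
# Gaussian Mixture Model sketch: fit K Gaussians to unlabeled 2-D data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data drawn from three Gaussian "blobs" (illustrative).
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[4, 4], scale=0.7, size=(100, 2)),
    rng.normal(loc=[0, 5], scale=0.6, size=(100, 2)),
])

K = 3
gmm = GaussianMixture(n_components=K, random_state=0).fit(X)  # fitted with the EM algorithm
print("mixing coefficients:", gmm.weights_)
print("cluster means:\n", gmm.means_)
labels = gmm.predict(X)        # hard cluster assignments
probs = gmm.predict_proba(X)   # soft (posterior) responsibilities
```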
Expectation maximization
The Expectation-Maximization (EM) algorithm is an iterative way to find
maximum-likelihood estimates for model parameters when the data is incomplete
or has some missing data points or has some hidden variables. EM chooses
some random values for the missing data points and uses them to estimate a
first set of parameters. These estimates are then used recursively to fill in better
values for the missing points, and the process repeats until the values converge.
These are the two basic steps of the EM algorithm, namely the E Step (Expectation or
Estimation Step) and the M Step (Maximization Step):
1. Given a set of incomplete data, start with an initial set of parameter values.
2. Expectation step (E-step): Using the observed available data of the dataset,
estimate (guess) the values of the missing data.
3. Maximization step (M-step): The complete data generated after the E-step is used
to update the parameter values.
4. Repeat steps 2 and 3 until the values converge.
Advantages of EM algorithm –
● The E-step and M-step are often pretty easy for many problems in
terms of implementation.
Disadvantages of EM algorithm –
● It has slow convergence.
● It converges to a local optimum only.
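Below is a minimal numpy-only sketch of the E-step / M-step loop for a one-dimensional mixture of two Gaussians; the synthetic data, the starting values, and the fixed 100 iterations are illustrative assumptions rather than part of the notes.

```python
# A compact E-step / M-step loop for a 1-D mixture of two Gaussians (numpy only).
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 1.5, 150)])  # hidden labels unknown

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Step 1 (initialization): rough starting guesses for the parameters.
pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(100):
    # Step 2 (E-step): responsibilities = posterior probability of each component.
    dens = pi * np.column_stack([gauss(x, mu[k], sigma[k]) for k in (0, 1)])
    resp = dens / dens.sum(axis=1, keepdims=True)

    # Step 3 (M-step): re-estimate the parameters from the responsibilities.
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print("weights:", pi, "means:", mu, "std devs:", sigma)
```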
Voting
The simplest way to combine multiple classifiers is by voting, which corresponds to taking a
linear combination of the learners:
yi = Σj wj dji, where wj ≥ 0 and Σj wj = 1.
This is also known as ensembles and linear opinion pools. In the simplest case, all learners are
given equal weight and we have simple voting that corresponds to taking an average,
yi = (1/L) Σj dji. Still, taking a (weighted) sum is only one of the possibilities and there are also
other combination rules, as shown in table 17.1 (Kittler et al. 1998). If the outputs are not
posterior probabilities, these rules require that the outputs be normalized to the same scale.
In the weighted sum, dji is the vote of learner j for class Ci and wj is the weight of its vote.
Simple voting is a special case where all voters have equal weight, namely, wj = 1/L. In
classification, this is called plurality voting, where the class having the maximum number of
votes is the winner. When there are two classes, this is majority voting, where the winning
class gets more than half of the votes. A small sketch of the weighted-sum combination is
given below.
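As an illustration of the weighted-sum rule yi = Σj wj dji, the numpy sketch below combines the posterior outputs of three learners for a single test instance; the weights and posterior values are made up purely for the example.

```python
# Weighted-sum combination of classifier outputs: y_i = sum_j w_j * d_ji.
import numpy as np

# d[j, i] = posterior assigned by learner j to class C_i for one test instance (illustrative values).
d = np.array([
    [0.7, 0.2, 0.1],   # learner 1
    [0.4, 0.5, 0.1],   # learner 2
    [0.6, 0.3, 0.1],   # learner 3
])

w = np.array([0.5, 0.3, 0.2])   # weights: w_j >= 0 and they sum to 1
y = w @ d                       # weighted sum per class
print("combined scores:", y, "-> predicted class:", y.argmax())

w_simple = np.full(3, 1 / 3)    # simple voting: equal weights w_j = 1/L
print("simple voting:", w_simple @ d)
```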