ML Unit 3
Q) VOTING CLASSIFIERS?
A Voting Classifier is a machine learning model that trains an ensemble of several models and predicts an output class by combining the predictions of those models.
It simply aggregates the predictions of each classifier passed into the Voting Classifier and outputs the class that receives the majority of the votes. The idea is that, instead of creating separate dedicated models and measuring the accuracy of each one, we create a single model that trains these base models and predicts the output based on their combined votes for each output class.
A Voting Classifier supports two types of voting.
1. Hard Voting: In hard voting, the predicted output class is the class with the highest majority of votes, i.e. the class predicted most often by the individual classifiers. Suppose three classifiers predict the output classes (A, A, B); since the majority predicted A, A will be the final prediction.
Example:
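Below is a minimal hard-voting sketch using scikit-learn's VotingClassifier; the dataset and the three base estimators are illustrative choices, not part of the original notes.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hard voting: each classifier casts one vote and the majority class wins.
hard_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)
hard_clf.fit(X_train, y_train)
print("Hard-voting accuracy:", hard_clf.score(X_test, y_test))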
2. Soft Voting: In soft voting, the output class is the prediction based on the average of the probabilities given to that class. Suppose, for some input, the prediction probabilities of the three models are, for class A, (0.30, 0.47, 0.53) and, for class B, (0.20, 0.32, 0.40). The average for class A is 0.4333 and for class B it is 0.3067, so the winner is class A, because it has the highest probability averaged over the classifiers.
Example:
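A small sketch reproducing the averaging above; the probabilities are the ones from the text, and the closing note assumes the hard_clf ensemble from the previous example.

import numpy as np

# Per-classifier predicted probabilities for class A and class B (from the text).
p_A = np.array([0.30, 0.47, 0.53])
p_B = np.array([0.20, 0.32, 0.40])

print("Average P(A):", round(p_A.mean(), 4))   # 0.4333
print("Average P(B):", round(p_B.mean(), 4))   # 0.3067
print("Soft-voting winner:", "A" if p_A.mean() > p_B.mean() else "B")

# With scikit-learn, the same ensemble becomes a soft-voting classifier by
# passing voting="soft" (every base estimator must support predict_proba).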
……………… end………
Q) Bagging and pasting?
When sampling is performed with replacement, this method is called bagging
(short for bootstrap aggregating). When sampling is performed without
replacement, it is called pasting.
Both bagging and pasting allow training instances to be sampled several times
across multiple predictors, but only bagging allows training instances to be
sampled several times for the same predictor.
Once all predictors are trained, the ensemble can make a prediction for a new instance by simply aggregating the predictions of all predictors. The aggregation function is typically the statistical mode for classification or the average for regression.
Predictors can all be trained in parallel, via different CPU cores. Similarly,
predictions can be made in parallel. This is one of the reasons bagging and
pasting scale very well.
Pasting is an ensemble technique similar to bagging, except that in pasting sampling is done without replacement, i.e. an observation can be present in only one subset. Since pasting limits the diversity of the models, its performance is suboptimal compared to bagging, particularly on small datasets. However, pasting is preferred over bagging for very large datasets, owing to its computational efficiency.
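A minimal sketch of both methods with scikit-learn's BaggingClassifier, assuming decision trees as the base predictor and a synthetic dataset; only the bootstrap flag changes between bagging and pasting.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# bootstrap=True  -> bagging  (sampling with replacement)
# bootstrap=False -> pasting  (sampling without replacement)
common = dict(n_estimators=100, max_samples=0.8, n_jobs=-1, random_state=42)
bagging = BaggingClassifier(DecisionTreeClassifier(), bootstrap=True, **common)
pasting = BaggingClassifier(DecisionTreeClassifier(), bootstrap=False, **common)

bagging.fit(X_train, y_train)   # predictors can be trained in parallel (n_jobs=-1)
pasting.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))
print("Pasting accuracy:", pasting.score(X_test, y_test))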
Q) Random forests?
Random Forest is an ensemble learning method that combines multiple decision trees, each trained on a random subset of the data, and predicts the class chosen by the majority of the trees. The below diagram explains the working of the Random Forest algorithm:
Note: To better understand the Random Forest Algorithm, you should have
knowledge of the Decision Tree Algorithm.
Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct output,
while others may not. But together, all the trees predict the correct output.
Therefore, below are two assumptions for a better Random forest classifier:
o There should be some actual values in the feature variable of the dataset
so that the classifier can predict accurate results rather than a guessed
result.
o The predictions from each tree must have very low correlations.
Below are some points that explain why we should use the Random Forest algorithm:
o It takes less training time compared to many other algorithms.
o It predicts output with high accuracy, and it runs efficiently even on large datasets.
o It can maintain accuracy even when a large proportion of the data is missing.
The working process can be explained in the below steps and diagram:
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step-1 and Step-2 until N trees have been built.
Step-5: For a new data point, find the prediction of each decision tree, and assign the new data point to the category that wins the majority of the votes.
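The same workflow can be reproduced with scikit-learn's RandomForestClassifier; the dataset and hyperparameters below are illustrative.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators is the number N of trees; each tree is trained on a bootstrap
# sample, and a random subset of features is considered at each split.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# A new data point is assigned to the class that wins the majority of votes.
print("Predicted class:", rf.predict(X_test[:1]))
print("Test accuracy:", rf.score(X_test, y_test))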
The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images. So,
this dataset is given to the Random forest classifier. The dataset is divided into
subsets and given to each decision tree. During the training phase, each
decision tree produces a prediction result, and when a new data point occurs,
then based on the majority of results, the Random Forest classifier predicts
the final decision. Consider the below image:
Applications of Random Forest
There are mainly four sectors where the Random Forest is mostly used:
o Banking: to identify the risk of loans.
o Medicine: to identify disease trends and risks.
o Land use: to identify areas of similar land use.
o Marketing: to identify marketing trends.
Q) Boosting?
Boosting is an ensemble technique that additively combines many weak (high-bias, low-variance) learners into one strong learner, with each new learner focusing on the mistakes of the previous ones.
Step 1: The base algorithm reads the data and assigns equal weight to each
sample observation.
Step 2: False predictions made by the base learner are identified. In the next iteration, these misclassified observations are passed to the next base learner with a higher weight assigned to them.
Step 3: Repeat step 2 until the algorithm can correctly classify the output.
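The reweighting loop in Steps 1-3 is what AdaBoost implements; below is a minimal sketch with scikit-learn, using decision stumps as the weak learners (all choices here are illustrative).

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each round fits a shallow "stump" and then increases the weight of the
# samples it misclassified, so the next stump focuses on them.
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=50, learning_rate=1.0, random_state=42)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))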
Types Of Boosting
There are three main ways through which boosting can be carried out:
1. Adaptive Boosting (AdaBoost)
2. Gradient Boosting
3. XGBoost
Adaptive Boosting
In AdaBoost, the weights of the misclassified observations are increased after every iteration so that the next weak learner concentrates on them, and the final prediction is a weighted combination of all the weak learners.
Gradient Boosting
The difference in this type of boosting is that the weights of misclassified outcomes are not incremented; instead, Gradient Boosting tries to optimize the loss function of the previous learner by adding, at each step, a new weak learner that reduces the loss.
The main idea here is to overcome the errors in the previous learner's predictions. This type of boosting has three main components:
o A loss function that needs to be optimized.
o A weak learner used to make the predictions.
o An additive model that adds the weak learners one at a time so as to minimize the loss function.
Like AdaBoost, Gradient Boosting can also be used for both classification and
regression problems.
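A minimal gradient-boosting sketch with scikit-learn; each new tree is fitted to reduce the loss left by the current ensemble. The dataset and settings are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# learning_rate shrinks each tree's contribution; more trees with a smaller
# rate usually generalizes better at the cost of training time.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=42)
gb.fit(X_train, y_train)
print("Gradient boosting accuracy:", gb.score(X_test, y_test))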
……………….. end………..
Q) STACKING?
Stacking (stacked generalization) is an ensemble technique in which several different base learners are trained on the same data, and their predictions are then used as the input features of a final meta-learner (also called a blender), which learns how best to combine them. Unlike voting, where the combination rule is fixed, in stacking the combination is itself learned from data, typically using out-of-fold predictions of the base learners.
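A minimal stacking sketch with scikit-learn's StackingClassifier; the choice of base learners and meta-learner here is illustrative.

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base learners produce predictions; the final_estimator (meta-learner/blender)
# learns how to combine those predictions.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC(probability=True, random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))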
Q) Naive Bayes?
Naive Bayes is a probabilistic classifier based on Bayes' theorem, with the "naive" assumption that the features are independent of each other given the class.
Conditional Probability (Bayes' theorem):
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A): The probability of hypothesis A being true. This is known as the prior probability.
P(B): The probability of the evidence.
P(A|B): The probability of the hypothesis being true given the evidence. This is known as the posterior probability.
P(B|A): The probability of the evidence given that the hypothesis is true. This is known as the likelihood.
Naive Bayes classifiers assume that all the variables or features are unrelated to each other: the presence or absence of one variable does not impact the presence or absence of any other variable.
Advantages:
o It is simple, fast, and can make real-time predictions.
o It performs well on high-dimensional data such as text (e.g. spam filtering and document classification).
o It requires relatively little training data to estimate its parameters.
Disadvantages:
o The assumption that the features are independent rarely holds in real data, which can limit accuracy.
o If a categorical value appears in the test data but never appeared in the training data, the model assigns it zero probability (the "zero-frequency" problem) unless smoothing is applied.
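A minimal Naive Bayes sketch using scikit-learn's GaussianNB (the Gaussian variant and the dataset are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# GaussianNB applies Bayes' theorem assuming the features are independent
# and normally distributed within each class.
nb = GaussianNB()
nb.fit(X_train, y_train)

# Posterior probabilities P(class | features) for the first test sample.
print("Posterior:", nb.predict_proba(X_test[:1]))
print("Accuracy:", nb.score(X_test, y_test))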
Q) SUPPORT VECTOR MACHINE (SVM)?
Support Vector Machine (SVM) is a supervised learning algorithm used for classification as well as regression problems. The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put a new data point in the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. Since the support vector machine creates a decision boundary between the two classes (cat and dog) using the extreme cases (support vectors), it will look at the extreme cases of cats and dogs and, on the basis of the support vectors, classify the creature as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
Hyperplane:
There can be multiple lines or decision boundaries that segregate the classes in n-dimensional space, but we need to find the best decision boundary, and this best boundary is known as the hyperplane. Its dimensionality depends on the number of features: with two features it is a straight line, and with three features it is a two-dimensional plane. SVM always creates the hyperplane with the maximum margin, i.e. the maximum distance to the nearest data points of each class.
Support Vectors:
The data points or vectors that are the closest to the hyperplane and which
affect the position of the hyperplane are termed as Support Vector. Since these
vectors support the hyperplane, hence called a Support vector.
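A minimal sketch showing how the support vectors can be inspected after fitting a linear SVM with scikit-learn; the toy points below are illustrative.

import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters.
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear")
svm.fit(X, y)

# The points closest to the hyperplane, which determine its position.
print("Support vectors:", svm.support_vectors_)
print("Prediction for [4, 4]:", svm.predict([[4, 4]]))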
Linear SVM:
When the data is linearly separable, SVM simply finds the straight line (hyperplane) that separates the two classes with the maximum margin, i.e. the largest possible distance to the nearest points (support vectors) of each class.
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but
for non-linear data, we cannot draw a single straight line. Consider the below
image:
So to separate these data points, we need to add one more dimension. For
linear data, we have used two dimensions x and y, so for non-linear data, we
will add a third dimension z. It can be calculated as:
z = x² + y²
By adding the third dimension, the sample space will become as below image:
So now, SVM will divide the datasets into classes in the following way.
Consider the below image:
Since we are in 3-D space, the decision boundary looks like a plane parallel to the x-y plane. If we convert it back to 2-D space by taking z = 1, it becomes:
Hence, we get a circular decision boundary of radius 1 in the case of non-linear data.
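A minimal sketch of the idea above: adding the feature z = x² + y² makes circularly arranged data separable by a plane. The synthetic dataset and the comparison with an RBF kernel are illustrative.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes arranged as concentric circles (not separable by a straight line).
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=42)

# Third dimension z = x^2 + y^2 (squared distance from the origin).
z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)
X3d = np.hstack([X, z])

# A linear SVM separates the classes in the lifted 3-D space.
print("Linear SVM on (x, y, z):", SVC(kernel="linear").fit(X3d, y).score(X3d, y))

# In practice, the RBF kernel performs an equivalent lifting implicitly.
print("RBF SVM on (x, y):", SVC(kernel="rbf").fit(X, y).score(X, y))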