Statistics Project

The AdaBoost algorithm builds models sequentially and focuses more on misclassified examples by adjusting example weights. It uses decision stumps as weak learners and combines them into a single model. The SVM algorithm finds the optimal hyperplane that separates classes with the maximum margin and can perform linear and non-linear classification using kernels.


AdaBoost Algorithm:

AdaBoost, also called Adaptive Boosting, is a Machine Learning technique used as an Ensemble Method. The most common weak learner used with AdaBoost is a decision tree with one level, that is, a decision tree with only one split. These trees are also called Decision Stumps.

What this algorithm does is build a model that gives equal weight to all the data points, and then assign higher weights to the points that were wrongly classified. All the points with higher weights are given more importance in the next model. It keeps training models until the error becomes sufficiently low.
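
For concreteness, here is a minimal sketch of this setup using scikit-learn's AdaBoostClassifier with one-level decision trees as the weak learners. The synthetic dataset is only a stand-in for real data, and the parameter names assume a recent scikit-learn version.

    # Minimal AdaBoost-with-decision-stumps sketch (scikit-learn)
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # max_depth=1 makes each weak learner a decision stump (a single split);
    # in scikit-learn versions before 1.2 the argument is named base_estimator.
    ada = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),
        n_estimators=50,
        random_state=0,
    )
    ada.fit(X_train, y_train)
    print("test accuracy:", ada.score(X_test, y_test))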

How AdaBoost Works:

Step 1: A weak classifier (e.g. a decision stump) is made on top of the training data based on the weighted samples. Here, the weight of each sample indicates how important it is for it to be correctly classified. Initially, for the first stump, we give all the samples equal weights.

Step 2: We create a decision stump for each variable and see how well each stump classifies samples to their target classes. For example, we might check Age, Eating Junk Food, and Exercise, and look at how many samples each individual stump classifies correctly or incorrectly as Fit or Unfit.

Step 3: More weight is assigned to the incorrectly classified samples so that they're classified
correctly in the next decision stump. Weight is also assigned to each classifier based on the accuracy
of the classifier, which means high accuracy = high weight!

Step 4: Reiterate from Step 2 until all the data points have been correctly classified, or the maximum
iteration level has been reached.
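
These steps can also be written out as a short training loop. The sketch below is only illustrative (it assumes labels coded as -1/+1 and uses scikit-learn's DecisionTreeClassifier for the weighted stump fit), but it follows Steps 1-4 directly.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_train(X, y, n_rounds=10):
        y = np.asarray(y)                            # expected to hold -1/+1 labels
        n = len(y)
        w = np.full(n, 1.0 / n)                      # Step 1: equal sample weights
        stumps, alphas = [], []
        for _ in range(n_rounds):
            # Step 2: fit a one-split stump on the weighted samples
            stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = w[pred != y].sum()                 # weighted error of this stump
            alpha = 0.5 * np.log((1.0 - err) / (err + 1e-10))   # classifier influence
            # Step 3: raise the weights of misclassified points, lower the rest
            w = w * np.exp(-alpha * y * pred)
            w = w / w.sum()                          # keep the weights summing to 1
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas                        # Step 4: repeat for n_rounds

    def adaboost_predict(X, stumps, alphas):
        # weighted vote of all stumps, each weighted by its influence alpha
        scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
        return np.sign(scores)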

AdaBoost Formula:

Here comes the hair-tugging part. Let's break AdaBoost down, step-by-step and equation-by-
equation so that it's easier to comprehend.

Let's start by considering a dataset with N points, or rows.

In this case,

n is the number of attributes in our dataset, so each data point x lies in the n-dimensional space of real numbers

x is the set of data points

y is the target variable, which is either -1 or 1 since this is a binary classification problem, denoting the first or the second class (e.g. Fit vs Not Fit)

We calculate the sample weight for each data point. AdaBoost assigns a weight to each training example to determine its significance in the training dataset. When the assigned weight is high, that training data point has a larger say in training the model. Similarly, when the assigned weight is low, it has minimal influence on the training.

Initially, all the data points will have the same sample weight w:

w = 1 / N

where N is the total number of data points.

The sample weights always sum to 1, so the value of each individual weight will always lie between 0 and 1. After this, we calculate the actual influence of a classifier in classifying the data points using the formula:

alpha = (1/2) * ln((1 - Total Error) / Total Error)

Alpha is how much influence this stump will have in the final classification. Total Error is the sum of the weights of the misclassified samples for that stump (for the first stump, with equal weights, this is simply the number of misclassifications divided by the training set size). We can plot a graph for Alpha by plugging in various values of Total Error ranging from 0 to 1.
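
A quick way to see the shape of this relationship is to evaluate the formula for a few Total Error values; the snippet below just plugs them in.

    import numpy as np

    # Alpha is large and positive for small error, 0 at error = 0.5
    # (no better than guessing), and negative when error exceeds 0.5.
    for err in (0.1, 0.3, 0.5, 0.7, 0.9):
        alpha = 0.5 * np.log((1 - err) / err)
        print(f"Total Error = {err:.1f}  ->  alpha = {alpha:+.3f}")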

After plugging in the actual values of Total Error for each stump, it's time for us to update the sample weights, which we had initially taken as 1/N for every data point. We'll do this using the following formula:

new sample weight = old sample weight * e^(± alpha)

In other words, the new sample weight is equal to the old sample weight multiplied by Euler's number raised to plus or minus alpha (which we just calculated in the previous step).

The two cases for the exponent (plus or minus alpha) are:

The exponent is minus alpha when the predicted and the actual output agree (the sample was classified correctly). In this case we decrease the sample weight from what it was before, since this point is already being handled well.
The exponent is plus alpha when the predicted output does not agree with the actual class (i.e. the sample is misclassified). In this case we increase the sample weight so that the same misclassification is penalised more heavily in the next stump. This is how the stumps are dependent on their predecessors.
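
A tiny numeric example of one such update, with made-up labels and predictions for five samples, makes the two cases concrete.

    import numpy as np

    y    = np.array([ 1, -1,  1,  1, -1])    # actual classes (made-up data)
    pred = np.array([ 1, -1, -1,  1, -1])    # stump predictions: the third sample is wrong
    w    = np.full(5, 1 / 5)                 # current sample weights (1/N)
    alpha = 0.5 * np.log((1 - 0.2) / 0.2)    # Total Error = 1/5 for this stump

    # e^(-alpha) for correctly classified samples, e^(+alpha) for the misclassified one
    w_new = w * np.exp(-alpha * y * pred)
    w_new = w_new / w_new.sum()              # renormalise so the weights sum to 1
    print(w_new)                             # the misclassified sample's weight has grown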

Support Vector Machine (SVM):


“Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used in classification problems. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyper-plane that differentiates the two classes very well.

Support Vectors are simply the coordinates of the individual observations closest to the boundary. The SVM classifier is the frontier that best segregates the two classes (a hyper-plane, or a line in two dimensions).
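
As a minimal sketch, this is what fitting such a maximum-margin classifier looks like with scikit-learn's SVC; the blob data is synthetic and stands in for any two-class problem.

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=200, centers=2, random_state=0)

    clf = SVC(kernel="linear")       # find the maximum-margin separating hyper-plane
    clf.fit(X, y)

    print("number of support vectors:", len(clf.support_vectors_))
    print("training accuracy:", clf.score(X, y))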

Let’s understand:

Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B, and C). Now,
identify the right hyper-plane to classify stars and circles.

We need to remember a rule of thumb to identify the right hyper-plane: “Select the hyper-plane which segregates the two classes better”. In this scenario, hyper-plane “B” has performed this job excellently.

Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B, and C) and all
are segregating the classes well.

Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the Margin.

The margin for hyper-plane C is high as compared to both A and B. Hence, we name C as the right hyper-plane. Another compelling reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane with a low margin, there is a high chance of misclassification.
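
For a linear SVM this margin can also be read off the fitted model: the width of the band between the two classes is 2 divided by the norm of the learned weight vector w. A small sketch with synthetic data and scikit-learn's SVC:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=200, centers=2, random_state=0)
    clf = SVC(kernel="linear", C=1000.0).fit(X, y)   # large C approximates a hard margin

    w = clf.coef_[0]
    print("margin width:", 2 / np.linalg.norm(w))    # the distance SVM maximizes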

Identify the right hyper-plane (Scenario-3): Use the rules discussed in the previous scenarios to identify the right hyper-plane.

Some of us may have selected hyper-plane B, as it has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error while A has classified everything correctly. Therefore, the right hyper-plane is A.

Can we classify two classes (Scenario-4)?: Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.

As I have already mentioned, the one star at the other end is like an outlier for the star class. The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say that SVM classification is robust to outliers.
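
In practice this tolerance is controlled by a regularization parameter; in scikit-learn's SVC it is called C, and a smaller C lets the model ignore a stray point in exchange for a wider margin. A small sketch with made-up points:

    from sklearn.svm import SVC

    # Two clusters plus one stray "star" sitting inside the circle cluster
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5],
         [5, 5], [6, 5], [5, 6], [6, 6]]
    y = [0, 0, 0, 0, 1,     # the fifth point is the outlying star
         1, 1, 1, 1]

    # A small C gives a soft margin: the outlier is tolerated and the separating
    # line stays between the two main clusters.
    clf = SVC(kernel="linear", C=0.1).fit(X, y)
    print(clf.predict([[0.5, 0.5]]))   # likely still labelled 0, i.e. the outlier is ignored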

Find the hyper-plane to segregate two classes (Scenario-5): In the scenario below, we can't have a linear hyper-plane between the two classes, so how does SVM classify these two classes? Till now, we have only looked at linear hyper-planes.
SVM can solve this problem easily! It solves it by introducing an additional feature. Here, we will add a new feature z = x^2 + y^2. Now, let's plot the data points on the x and z axes (a small code sketch of this step follows the list below):

In the above plot, the points to consider are:

 All values for z will always be positive, because z is the squared sum of both x and y
 In the original plot, the red circles appear close to the origin of the x and y axes, leading to a lower value of z, while the stars are relatively far from the origin, resulting in a higher value of z.
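
Here is a small sketch of that lifting step, with synthetic rings standing in for the circles and stars: after adding z = x^2 + y^2, a plain linear SVM separates the two classes.

    import numpy as np
    from sklearn.svm import SVC

    # Synthetic stand-in data: circles on an inner ring, stars on an outer ring
    rng = np.random.default_rng(0)
    angles = rng.uniform(0, 2 * np.pi, 100)
    inner = np.c_[0.5 * np.cos(angles[:50]), 0.5 * np.sin(angles[:50])]   # circles
    outer = np.c_[2.0 * np.cos(angles[50:]), 2.0 * np.sin(angles[50:])]   # stars
    X = np.vstack([inner, outer])
    y = np.array([0] * 50 + [1] * 50)

    # Add the new feature z = x^2 + y^2: small for the circles, large for the stars
    z = (X ** 2).sum(axis=1, keepdims=True)
    X_lifted = np.hstack([X, z])

    clf = SVC(kernel="linear").fit(X_lifted, y)
    print("accuracy with the extra z feature:", clf.score(X_lifted, y))   # now separable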

In the SVM classifier, it is now easy to have a linear hyper-plane between these two classes. But another question arises: do we need to add this feature manually to obtain such a hyper-plane? No, the SVM algorithm has a technique called the kernel trick. An SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable one. It is mostly useful in non-linear separation problems. Simply put, it does some extremely complex data transformations and then finds the way to separate the data based on the labels or outputs you’ve defined.
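
As a minimal sketch of the same idea with scikit-learn, using make_circles as stand-in data: a linear kernel fails on the two rings, while an RBF kernel (one common form of the kernel trick) separates them in the original x, y space.

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    linear = SVC(kernel="linear").fit(X, y)
    rbf    = SVC(kernel="rbf").fit(X, y)
    print("linear kernel accuracy:", linear.score(X, y))   # poor: no straight line separates the rings
    print("RBF kernel accuracy:", rbf.score(X, y))         # close to 1.0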

When we look at the hyper-plane in the original input space, it looks like a circle.
