Ensemble Learning


• Inductive Learning, also known as Concept Learning, is the process by which an A.I. system derives a generalized rule from specific observations.
• Inductive Learning Algorithms (ILAs) are used to generate a set of classification rules. These generated rules are in the "If this, then that" format.
• There are four processes a learner goes through in an inductive learning activity: (1) observe, (2) hypothesize, (3) collect evidence, and (4) generalize.
• There are two methods for obtaining knowledge in the real world: first,
from domain experts, and second, from machine learning.
• Domain experts are not very useful or reliable for large amounts of data.
• Machine learning replicates the logic of 'experts' in algorithms, but this work may be very complex, time-consuming, and expensive.
• Computational Learning Theory (CoLT) is a branch of Artificial Intelligence research that focuses on formal studies of the design of computer programs that can learn.
• Essentially, it uses mathematical frameworks to evaluate which problems can be learned and to understand the theoretical foundations of learning algorithms while improving their accuracy.
• Computational learning theory is a subfield of artificial intelligence
(AI) that is concerned with how computers can learn from data. The
main goals of computational learning theory are to develop algorithms
that can learn from data and to understand the limits of what can be
learned from data.

• Computational Learning Theory is used in many fields such as statistics, calculus and geometry, information theory, programming optimization, and many others.
• It is very similar to Statistical Learning Theory (SLT) as they both use
Mathematical Analysis.
• The basic difference between the two is that CoLT is primarily concerned with the "learnability" of machines and the steps required to make a given task comprehensible for an algorithm.
• SLT is more focused on studying and improving the accuracy of already
existing programs.
• The theory of computation provides the foundational understanding of
what can and cannot be computed, which is essential for developing AI
algorithms and systems. AI also draws on concepts from mathematical
logic, probability theory, and optimization, all of which have
connections to the theory of computation.
• Computational Learning Theory is concerned with Supervised Learning,
which is a type of inductive learning in the field of Machine Learning that
maps an input to an output on the basis of existing input-output pairs.
• It provides a formal framework for accurately formulating and answering
questions about the performance of various learning algorithms, allowing
for thorough comparisons of both the predictive capacity and the
computational efficiency of alternative learning algorithms.
Ensemble learning helps enhance the performance of machine learning models.
Multiple machine learning models are combined to obtain a more accurate
model.
Bagging, boosting and stacking are the three most popular ensemble learning
techniques.
Each of these techniques offers a unique approach to improving predictive
accuracy.
A common problem in machine learning is that individual models tend to perform poorly; that is, they tend to have low prediction accuracy. To mitigate this problem, we combine multiple models to obtain one with better performance.

The individual models that we combine are known as weak learners. We call
them weak learners because they either have a high bias or high variance.
Because they either have high bias or variance, weak learners cannot learn
efficiently and perform poorly.
• Both high bias and high variance models thus cannot generalize properly.
• Weak learners will either make incorrect generalizations or fail to generalize altogether. Because of this, the predictions of weak learners cannot be relied on by themselves.
• According to the bias-variance trade-off, an underfit model has high bias and low variance, while an overfit model has high variance and low bias. In either case, there is no balance between bias and variance. For there to be a balance, both the bias and the variance need to be low. Ensemble learning tries to balance this bias-variance trade-off by reducing either the bias or the variance.
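
For squared-error loss, this trade-off can be summarised by the standard error decomposition below (a textbook result; here f denotes the true function, f̂ the trained model, and σ² the irreducible noise, symbols introduced only for this formula):

```latex
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^{2}\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^{2}}_{\text{Bias}^{2}}
  + \underbrace{\mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^{2}\right]}_{\text{Variance}}
  + \underbrace{\sigma^{2}}_{\text{Irreducible error}}
```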
• A machine learning model analyses the data, finds patterns in it, and makes predictions. While training, the model learns these patterns in the dataset and applies them to test data for prediction. While making predictions, a difference occurs between the values predicted by the model and the actual/expected values, and this difference is known as bias error, or error due to bias.
• Low Bias: A low bias model will make fewer assumptions about the form of
the target function.
• High Bias: A model with a high bias makes more assumptions, and the
model becomes unable to capture the important features of our dataset. A
high bias model also cannot perform well on new data.
• Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest Neighbours and Support Vector Machines. At the same time, algorithms with high bias include Linear Regression, Linear Discriminant Analysis and Logistic Regression.
• Ways to reduce high bias (a brief sketch follows this list):
• High bias mainly occurs because the model is too simple. Some ways to reduce it are:
• Increase the input features, as the model is underfitted.
• Decrease the regularization term.
• Use more complex models, for example by including some polynomial features.
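
As a minimal sketch of the last point (using scikit-learn and made-up toy data, purely for illustration), replacing a plain linear model with one that includes polynomial features reduces the bias of an underfit model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data with a quadratic relationship (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

# A plain linear model is too simple for this data (high bias, underfitting)
linear = LinearRegression().fit(X, y)

# Adding polynomial features makes the model more complex and reduces the bias
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("Linear R^2:    ", linear.score(X, y))   # low: underfitting
print("Polynomial R^2:", poly.score(X, y))     # higher: bias reduced
```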

Variance specifies how much the prediction would change if different training data were used. In simple words, variance tells how much a random variable differs from its expected value. Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at capturing the hidden mapping between input and output variables. Variance errors are classified as either low variance or high variance.
• Low variance means there is a small variation in the prediction of the
target function with changes in the training data set.
• High variance shows a large variation in the prediction of the target
function with changes in the training dataset.
• A model that shows high variance learns a lot and performs well on the training dataset, but does not generalize well to unseen data. As a result, such a model gives good results on the training dataset but shows high error rates on the test dataset.
• Since, with high variance, the model learns too much from the dataset, it leads to overfitting. A model with high variance has the following problems:
• A high variance model leads to overfitting.
• It increases model complexity.
• Some examples of machine learning algorithms with low variance are Linear Regression, Logistic Regression, and Linear Discriminant Analysis. At the same time, algorithms with high variance are Decision Trees, Support Vector Machines, and k-Nearest Neighbours.
• Ways to reduce high variance (a brief sketch follows this list):
• Reduce the input features or the number of parameters, as the model is overfitted.
• Do not use an overly complex model.
• Increase the training data.
• Increase the regularization term.
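
A small sketch of the regularization point, assuming scikit-learn and synthetic data: increasing the regularization term (alpha in Ridge) shrinks the coefficients and lowers the variance of the fitted model.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data with many features relative to the sample size
X, y = make_regression(n_samples=100, n_features=50, noise=10.0, random_state=0)

# Larger alpha = stronger regularization = lower variance (at the cost of some bias)
for alpha in [0.01, 1.0, 100.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print(f"alpha={alpha:>6}: mean cross-validated R^2 = {scores.mean():.3f}")
```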
• Low-Bias, Low-Variance:
The combination of low bias and low variance characterizes an ideal machine learning model. However, it is practically impossible to achieve.
• Low-Bias, High-Variance: With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns a large number of parameters and hence leads to overfitting.
• High-Bias, Low-Variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn well from the training dataset or uses few parameters. It leads to underfitting problems in the model.
• High-Bias, High-Variance:
With high bias and high variance, predictions are inconsistent and also
inaccurate on average.
• Ensemble learning tries to balance this bias-variance trade-off by reducing
either the bias or the variance.
• Ensemble learning will aim to reduce the bias if we have a weak model
with high bias and low variance. Ensemble learning will aim to reduce the
variance if we have a weak model with high variance and low bias.
• Ensemble learning improves a model’s performance in mainly three ways:
• By reducing the variance of weak learners
• By reducing the bias of weak learners,
• By improving the overall accuracy of strong learners.
• Bagging is used to reduce the variance of weak learners. Boosting is used to reduce the bias of weak learners. Stacking is used to improve the overall accuracy of strong learners (a brief stacking sketch follows).
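
Since stacking is not expanded on below, here is a minimal sketch using scikit-learn's StackingClassifier; the dataset, base learners, and final estimator are illustrative choices, not prescribed ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Heterogeneous strong learners whose predictions are fed to a meta-model
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
                ("svc", SVC(probability=True, random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```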
• Bagging aims to produce a model with lower variance than the individual weak models. These weak learners are homogeneous, meaning they are of the same type.
• Bagging is also known as bootstrap aggregating. It consists of two steps: bootstrapping and aggregation.
• Bootstrapping: subsets of data are taken from the initial dataset. These subsets of data are called bootstrapped datasets or, simply, bootstraps. They are resampled 'with replacement', which means an individual data point can be sampled multiple times. Each bootstrap dataset is used to train a weak learner.
• Aggregating
• The individual weak learners are trained independently from each other.
Each learner makes independent predictions. The results of those
predictions are aggregated at the end to get the overall prediction. The
predictions are aggregated using either max voting or averaging.
• Max Voting is commonly used for classification problems. It consists of
taking the mode of the predictions (the most occurring prediction). It is
called voting because like in election voting, the premise is that ‘the
majority rules’. Each model makes a prediction. A prediction from each
model counts as a single ‘vote’. The most occurring ‘vote’ is chosen as the
representative for the combined model.
• Averaging is generally used for regression problems. It involves taking the
average of the predictions. The resulting average is used as the overall
prediction for the combined model.
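
A tiny NumPy sketch of these two aggregation rules, using hypothetical predictions from three models:

```python
import numpy as np

# Hypothetical class predictions from three classifiers for four samples
clf_preds = np.array([[0, 1, 1, 0],
                      [0, 1, 0, 0],
                      [1, 1, 1, 0]])

# Max voting: the most frequent prediction (the mode) per sample wins
majority = np.array([np.bincount(col).argmax() for col in clf_preds.T])
print("Max voting:", majority)                # -> [0 1 1 0]

# Hypothetical numeric predictions from three regressors for two samples
reg_preds = np.array([[2.1, 3.9],
                      [1.9, 4.2],
                      [2.0, 4.0]])

# Averaging: the mean prediction per sample is the ensemble output
print("Averaging: ", reg_preds.mean(axis=0))  # -> [2.0, 4.033...]
```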
Steps of Bagging
• The steps of bagging are as follows:
• We have an initial training dataset containing n instances.
• We create m subsets of data from the training set, taking a sample of n points from the initial dataset for each subset. Each subset is sampled with replacement, which means that a specific data point can be sampled more than once.
• For each subset of data, we train the corresponding weak learners
independently. These models are homogeneous, meaning that they
are of the same type.
• Each model makes a prediction.
• The predictions are aggregated into a single prediction using either max voting or averaging (a scikit-learn sketch of these steps follows).
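
These steps map directly onto scikit-learn's BaggingClassifier. A minimal sketch follows; the dataset and base estimator are illustrative, and the parameter name `estimator` assumes scikit-learn 1.2 or later (older versions call it `base_estimator`).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# m homogeneous weak learners (decision trees), each trained independently
# on a bootstrap sample drawn with replacement from the training set
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the weak learner type
    n_estimators=50,                     # m subsets / models
    bootstrap=True,                      # sample with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)

# Predictions of the individual trees are aggregated by voting
print("Bagging accuracy:", bagging.score(X_test, y_test))
```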
• Boosting is used for combining weak learners with high bias. Boosting aims to produce a model with a lower bias than that of the individual models.
• Boosting involves sequentially training weak learners.
• Each subsequent learner improves on the errors of the previous learners in the sequence.
• A sample of data is first taken from the initial dataset. This sample is
used to train the first model, and the model makes its prediction. The
samples can either be correctly or incorrectly predicted. The samples
that are wrongly predicted are reused for training the next model. In this
way, subsequent models can improve on the errors of previous models.
• Unlike bagging, which aggregates prediction results at the end, boosting
aggregates the results at each step. They are aggregated using weighted
averaging.
• Weighted averaging involves giving all models different weights
depending on their predictive power. In other words, it gives more weight
to the model with the highest predictive power. This is because the learner
with the highest predictive power is considered the most important.
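
A short NumPy sketch of weighted averaging, with hypothetical predictions and hypothetical weights reflecting each learner's predictive power:

```python
import numpy as np

# Hypothetical predictions from three learners for two samples
preds = np.array([[0.60, 0.20],
                  [0.75, 0.35],
                  [0.90, 0.10]])

# Hypothetical weights: the stronger the learner, the larger its weight
weights = np.array([0.2, 0.3, 0.5])

# Weighted average: stronger learners contribute more to the final prediction
combined = np.average(preds, axis=0, weights=weights)
print(combined)  # -> [0.795 0.195]
```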
• Boosting works with the following steps:
• We sample m-number of subsets from an initial training dataset.
• Using the first subset, we train the first weak learner.
• We test the trained weak learner using the training data. As a result
of the testing, some data points will be incorrectly predicted.
• Each data point with the wrong prediction is sent into the second
subset of data, and this subset is updated.
• Using this updated subset, we train and test the second weak learner.
• We continue with the following subset until the total number of
subsets is reached.
• We now have the total prediction. The overall prediction has already been aggregated at each step, so there is no need to calculate it separately (a scikit-learn sketch follows these steps).
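
AdaBoost is one concrete instance of this scheme (it reweights the misclassified points rather than literally copying them into a new subset). A minimal scikit-learn sketch, with an illustrative dataset and hyperparameters; the `estimator` parameter name assumes scikit-learn 1.2 or later:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Decision stumps (depth-1 trees) are high-bias weak learners; each new stump
# focuses on the samples the previous ones got wrong, and the stumps are
# combined by weighted voting according to their predictive power.
boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=0.5,
    random_state=42,
)
boost.fit(X_train, y_train)
print("AdaBoost accuracy:", boost.score(X_test, y_test))
```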
Support Vector Machine (SVM)

• Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression.
• Support Vector Machine (SVM) is a powerful machine learning algorithm
used for linear or nonlinear classification, regression, and even outlier
detection tasks. SVMs can be used for a variety of tasks, such as text
classification, image classification, spam detection, handwriting
identification, gene expression analysis, face detection, and anomaly
detection.
• The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that can separate the data points of different classes in the feature space. The hyperplane is chosen so that the margin between the closest points of different classes is as large as possible. The dimension of the hyperplane depends upon the number of features.
• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future. This
best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
• Example: SVM can be understood with the example that we used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn the different features of cats and dogs, and then we test it with this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of each class. On the basis of the support vectors, it will classify the new creature as a cat.
• SVM algorithm can be used for Face detection, image
classification, text categorization, etc.
• Types of SVM
• SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
• The SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane.
• The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin.
• The goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
• Non-Linear SVM:
• If data is linearly arranged, we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line. In that case we add a third dimension, for example z = x² + y², so that the data becomes linearly separable in the new space.
• Since we are then in 3-D space, the decision boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, it becomes a circular boundary around the data.
• Advantages of SVM
• Effective in high-dimensional cases.
• It is memory efficient, as it uses a subset of training points (the support vectors) in the decision function.
• Different kernel functions can be specified for the decision function, and it is possible to specify custom kernels (see the sketch below).
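
A brief scikit-learn sketch of the linear and non-linear (kernel) SVM classifiers described above; the dataset and hyperparameter values are illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a single straight line
X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Linear SVM: finds the maximum-margin straight-line boundary
linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# Non-linear SVM: the RBF kernel implicitly maps the data to a higher-dimensional
# space where a separating hyperplane exists (the kernel trick)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

print("Linear SVM accuracy:", linear_svm.score(X_test, y_test))
print("RBF SVM accuracy:   ", rbf_svm.score(X_test, y_test))
```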
Random forest
• Random Forest is a popular machine learning algorithm that belongs to
the supervised learning technique. It can be used for both Classification
and Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.
• Random Forest is a classifier that contains a number of decision trees built on various subsets of the given dataset and takes their average to improve the predictive accuracy on that dataset. Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, predicts the final output.
• A greater number of trees in the forest leads to higher accuracy and helps prevent the problem of overfitting.
• Assumptions for Random Forest
• Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct output,
while others may not. But together, all the trees predict the correct output.
Therefore, below are two assumptions for a better Random forest classifier:
• There should be some actual values in the feature variable of the dataset so
that the classifier can predict accurate results rather than a guessed result.
• The predictions from each tree must have very low correlations.
• Reasons to use the Random Forest algorithm:
• It takes less training time compared to other algorithms.
• It predicts output with high accuracy, and it runs efficiently even on large datasets.
• It can also maintain accuracy when a large proportion of data is missing.
• Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make a prediction with each tree created in the first phase.
• The Working process can be explained in the below steps and diagram:
• Step-1: Select random K data points from the training set.
• Step-2: Build the decision trees associated with the selected data points
(Subsets).
• Step-3: Choose the number N for decision trees that you want to build.
• Step-4: Repeat Step 1 & 2.
• Step-5: For a new data point, find the prediction of each decision tree and assign the new data point to the category that wins the majority vote (a scikit-learn sketch follows these steps).
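
These steps correspond to scikit-learn's RandomForestClassifier. A minimal sketch with an illustrative dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# N decision trees, each grown on a bootstrap sample of the training data with
# a random subset of features considered at each split; class predictions are
# combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)

print("Random Forest accuracy:", forest.score(X_test, y_test))
print("Predicted class for a new point:", forest.predict(X_test[:1]))
```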
• Suppose there is a dataset that contains multiple fruit images, and this dataset is given to the Random Forest classifier. The dataset is divided into subsets and given to each decision tree. During the training phase, each decision tree produces a prediction result, and when a new data point occurs, the Random Forest classifier predicts the final decision based on the majority of those results.
• Applications of Random Forest
• There are mainly four sectors where Random Forest is mostly used:
• Banking: Banking sector mostly uses this algorithm for the identification
of loan risk.
• Medicine: With the help of this algorithm, disease trends and risks of the
disease can be identified.
• Land Use: We can identify the areas of similar land use by this algorithm.
• Marketing: Marketing trends can be identified using this algorithm.
• Advantages of Random Forest
• Random Forest is capable of performing both Classification and
Regression tasks.
• It is capable of handling large datasets with high dimensionality.
• It enhances the accuracy of the model and prevents the overfitting issue.
• Disadvantages of Random Forest
• Although Random Forest can be used for both classification and regression tasks, it is not as well suited to regression tasks.
Deep Learning
• Deep Learning is a subfield of Machine Learning that involves the use of
deep neural networks to model and solve complex problems. Deep
Learning has achieved significant success in various fields, and its use is
expected to continue to grow as more data becomes available, and more
powerful computing resources become available.
• Deep learning is the branch of machine learning which is based on
artificial neural network architecture. An artificial neural network or ANN
uses layers of interconnected nodes called neurons that work together to
process and learn from the input data.
• Deep learning can be used for supervised, unsupervised, as well as reinforcement machine learning. It uses a variety of approaches to process these.
• Deep learning algorithms like Convolutional Neural Networks and Recurrent Neural Networks are used for many supervised tasks like image classification and recognition, sentiment analysis, language translation, etc.
• Deep learning algorithms like autoencoders and generative models are
used for unsupervised tasks like clustering, dimensionality reduction, and
anomaly detection.
• Deep learning can be used to learn policies, or sets of actions, that maximize the cumulative reward over time. Deep reinforcement learning algorithms like Deep Q-Networks and Deep Deterministic Policy Gradient (DDPG) are used for reinforcement learning tasks like robotics and game playing.
• Applications of Deep Learning :
• The main applications of deep learning can be divided into computer
vision, natural language processing (NLP), and reinforcement learning.
• Computer vision
• In computer vision, Deep learning models can enable machines to identify
and understand visual data. Some of the main applications of deep learning
in computer vision include:
• Object detection and recognition: Deep learning models can be used to identify and locate objects within images and videos, making it possible for machines to perform tasks such as self-driving, surveillance, and robotics.
• Image classification: Deep learning models can be used to classify images
into categories such as animals, plants, and buildings. This is used in
applications such as medical imaging, quality control, and image retrieval.
• Image segmentation: Deep learning models can be used to segment images into different regions, making it possible to identify specific features within images.
• Natural language processing (NLP):
• In NLP, deep learning models can enable machines to understand and generate human language. Some of the main applications of deep learning in NLP include:
• Automatic text generation: Deep learning models can learn from a corpus of text, and new text such as summaries and essays can be automatically generated using these trained models.
• Language translation: Deep learning models can translate text from one
language to another, making it possible to communicate with people from
different linguistic backgrounds.
• Sentiment analysis: Deep learning models can analyze the sentiment of a
piece of text, making it possible to determine whether the text is positive,
negative, or neutral. This is used in applications such as customer service,
social media monitoring, and political analysis.
• Speech recognition: Deep learning models can recognize and transcribe
spoken words, making it possible to perform tasks such as speech-to-text
conversion, voice search, and voice-controlled devices.
Reinforcement learning:
In reinforcement learning, deep learning is used to train agents to take actions in an environment so as to maximize a reward. Some of the main applications of deep learning in reinforcement learning include:
Game playing: Deep reinforcement learning models have been able to beat
human experts at games such as Go, Chess, and Atari.
Robotics: Deep reinforcement learning models can be used to train robots to
perform complex tasks such as grasping objects, navigation, and manipulation.
Control systems: Deep reinforcement learning models can be used to control
complex systems such as power grids, traffic management, and supply chain
optimization.
• Challenges in Deep Learning
• Deep learning has made significant advancements in various fields, but
there are still some challenges that need to be addressed. Here are some
of the main challenges in deep learning:
• Data availability: Deep learning requires large amounts of data to learn from, and gathering enough data for training is a big concern.
• Computational resources: Training a deep learning model is computationally expensive because it requires specialized hardware like GPUs and TPUs.
• Time-consuming: When working on sequential data, training can take a very long time, even days or months, depending on the computational resources available.
• Interpretability: Deep learning models are complex and work like a black box, which makes their results very difficult to interpret.
• Overfitting: When the model is trained again and again, it becomes too specialized for the training data, leading to overfitting and poor performance on new data.
• Advantages of Deep Learning:
• High accuracy: Deep Learning algorithms can achieve state-of-the-art
performance in various tasks, such as image recognition and natural
language processing.
• Automated feature engineering: Deep Learning algorithms can
automatically discover and learn relevant features from data without the
need for manual feature engineering.
• Scalability: Deep Learning models can scale to handle large and complex
datasets, and can learn from massive amounts of data.
• Flexibility: Deep Learning models can be applied to a wide range of tasks
and can handle various types of data, such as images, text, and speech.
• Continual improvement: Deep Learning models can continually improve
their performance as more data becomes available.
• Disadvantages of Deep Learning:
• High computational requirements: Deep Learning models require large
amounts of data and computational resources to train and optimize.
• Requires large amounts of labeled data: Deep Learning models often
require a large amount of labeled data for training, which can be expensive
and time- consuming to acquire.
• Interpretability: Deep Learning models can be challenging to interpret,
making it difficult to understand how they make decisions.
• Overfitting: Deep Learning models can sometimes overfit to the training data, resulting in poor performance on new and unseen data.
• Black-box nature: Deep Learning models are often treated as black boxes,
making it difficult to understand how they work and how they arrived at
their predictions.
Machine Learning vs Deep Learning

• Machine Learning: Applies statistical algorithms to learn the hidden patterns and relationships in the dataset.
  Deep Learning: Uses artificial neural network architectures to learn the hidden patterns and relationships in the dataset.

• Machine Learning: Can work on smaller amounts of data.
  Deep Learning: Requires a larger volume of data compared to machine learning.

• Machine Learning: Better for low-label tasks.
  Deep Learning: Better for complex tasks like image processing, natural language processing, etc.

• Machine Learning: Takes less time to train the model.
  Deep Learning: Takes more time to train the model.

• Machine Learning: A model is created from relevant features that are manually extracted from images to detect an object in the image.
  Deep Learning: Relevant features are automatically extracted from images; it is an end-to-end learning process.

• Machine Learning: Less complex and easier to interpret the results.
  Deep Learning: More complex; it works like a black box, and interpretation of the results is not easy.

• Machine Learning: Can work on a CPU or requires less computing power compared to deep learning.
  Deep Learning: Requires a high-performance computer with a GPU.
• Deep Learning models are able to automatically learn features from the data,
which makes them well-suited for tasks such as image recognition, speech
recognition, and natural language processing. The most widely used
architectures in deep learning are feedforward neural networks, convolutional
neural networks (CNNs), and recurrent neural networks (RNNs).
• Feedforward neural networks (FNNs) are the simplest type of ANN, with a
linear flow of information through the network. FNNs have been widely used
for tasks such as image classification, speech recognition, and natural
language processing.
• Convolutional Neural Networks (CNNs) are designed specifically for image and video recognition tasks. CNNs are able to automatically learn features from images, which makes them well suited for tasks such as image classification, object detection, and image segmentation (a minimal CNN sketch follows this list).
• Recurrent Neural Networks (RNNs) are a type of neural network that is able
to process sequential data, such as time series and natural language. RNNs
are able to maintain an internal state that captures information about the
previous inputs, which makes them well-suited for tasks such as speech
recognition, natural language processing, and language translation.
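
As a minimal illustration of the CNN architecture mentioned above, here is a small Keras sketch (assuming TensorFlow is installed; the input shape, layer sizes, and number of classes are illustrative, not prescriptive):

```python
import tensorflow as tf

# A tiny CNN for 28x28 grayscale images, showing the typical
# Conv -> Pool -> Flatten -> Dense pattern.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),  # learn local image features
    tf.keras.layers.MaxPooling2D((2, 2)),                    # downsample feature maps
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),         # 10 output classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # prints the layer-by-layer architecture
```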
