
# Notes on Three Machine Learning Algorithms

## Random Forest
Random Forest is a versatile and powerful supervised ML
algorithm used for both classification and regression tasks. It
operates by constructing a multitude of decision trees during
the training phase and outputs the mode (for classification) or
mean (for regression) prediction of the individual trees.
Random Forest works in the following manner:

1) Random Sampling with Replacement (Bootstrapping): RF builds multiple decision trees by sampling the training data with replacement, so each tree in the forest is trained on a bootstrapped subset of the original dataset. This random sampling introduces diversity among the trees.

IMPORTANT NOTE ABOUT BOOTSTRAPPING:


Bootstrapping is a resampling technique used in statistics to
estimate the sampling distribution of a statistic by repeatedly
resampling with replacement from the observed dataset. It is
particularly useful when the underlying distribution of the data
is unknown or when the sample size is small. It is performed
through:
Sampling with Replacement: Bootstrapping involves randomly
sampling observations from the original dataset with
replacement to create new "bootstrap samples." This means
that each observation in the original dataset has the same
chance of being selected for a bootstrap sample at each
iteration, and some observations may be selected multiple
times while others may not be selected at all.

Estimation: After creating multiple bootstrap samples (typically thousands or more), the statistic of interest (e.g., mean, median, standard deviation, regression coefficient) is computed for each bootstrap sample.

Calculating the Sampling Distribution: The collection of statistics computed from the bootstrap samples forms the "bootstrap distribution" or "sampling distribution" of the statistic of interest. From this distribution, properties such as confidence intervals, standard errors, and hypothesis test statistics can be estimated.
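
As a minimal numerical sketch of this procedure (the data here is synthetic, generated purely for illustration), bootstrapping the mean of a small sample with NumPy might look like this:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.normal(loc=50, scale=10, size=30)  # a small illustrative sample

# Sampling with replacement: draw many bootstrap samples and compute the
# statistic of interest (here, the mean) for each one.
n_bootstrap = 10_000
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(n_bootstrap)
])

# The bootstrap distribution yields a standard error and a 95% confidence interval.
standard_error = boot_means.std(ddof=1)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={data.mean():.2f}, SE={standard_error:.2f}, "
      f"95% CI=({ci_low:.2f}, {ci_high:.2f})")
```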

2) Random Feature Selection: At each node of a decision tree, instead of considering all features for splitting, Random Forest considers only a random subset of features. This adds further randomness to the trees and helps prevent overfitting.
3) Decision Tree Building: Each decision tree is grown to its maximum depth or until it reaches a stopping criterion (e.g., a minimum number of samples in a leaf node). The trees are typically constructed using techniques such as CART (Classification and Regression Trees).

4) Voting (Classification) or Averaging (Regression): For classification tasks, the prediction of each tree is considered, and the class with the most votes across all trees is chosen as the final prediction. For regression tasks, the predictions of all trees are averaged to produce the final prediction.
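
As a minimal sketch of these four steps in practice (assuming scikit-learn and its bundled iris toy dataset; the hyperparameter values are illustrative, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small benchmark dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# n_estimators sets the number of trees; max_features="sqrt" is the random
# feature subset considered at each split; each tree is trained on a
# bootstrap sample of the training data.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            random_state=42)
rf.fit(X_train, y_train)

# Prediction is a majority vote across the trees.
print("test accuracy:", rf.score(X_test, y_test))
```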

### Advantages of Random Forest


a) Robust to Overfitting: By aggregating predictions from
multiple trees and introducing randomness, Random Forest is
less prone to overfitting compared to individual decision trees.
b) Handles Large Datasets: Random Forest can efficiently handle
large datasets with a large number of features and instances.
c) Implicit Feature Selection: Random Forest provides a
measure of feature importance, allowing users to identify the
most relevant features in the dataset (a short snippet follows below).
Overall, Random Forest is a popular choice for many machine
learning tasks due to its robustness, flexibility, and ease of use.
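
Continuing from the sketch above, the fitted forest exposes impurity-based importances through `feature_importances_` (the names come from the same iris dataset):

```python
from sklearn.datasets import load_iris

# `rf` is the forest fitted in the earlier sketch.
for name, importance in zip(load_iris().feature_names, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```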
## Support Vector Machines (SVM)
A Support Vector Machine (SVM) is a supervised ML algorithm
primarily used for classification tasks, although it can also be
adapted for regression tasks. SVMs are effective in high-
dimensional spaces and are particularly well-suited for
problems where the number of features exceeds the number of
samples. SVM works by:

1) Separating Hyperplane: SVM aims to find the optimal hyperplane that best separates the data points into different classes. In two dimensions this hyperplane is a line, in three dimensions it is a plane, and in higher dimensions it is a hyperplane.

2) Maximizing Margin: The optimal hyperplane is the one that maximizes the margin, which is the distance between the hyperplane and the nearest data points from each class; these nearest points are known as support vectors. Maximizing the margin helps improve the generalization ability of the classifier.

3) Kernel Trick: SVMs can efficiently handle nonlinear relationships between features through the kernel trick. By implicitly mapping the input features into a higher-dimensional space (without ever computing the mapping explicitly), SVM can find a hyperplane that effectively separates the data points.
4) Soft Margin SVM: In situations where the data is not linearly
separable, or there are outliers, a soft-margin SVM can be used.
This allows for some misclassification of data points, controlled
by a regularization parameter (C), to find a better separating
hyperplane.

5) Kernel Functions: SVMs can utilize different types of kernel functions (e.g., linear, polynomial, radial basis function (RBF), sigmoid) to map the data into a higher-dimensional space. The choice of kernel function depends on the problem at hand and the characteristics of the data.

6) Decision Function: Once trained, the SVM constructs a decision function based on the support vectors and their associated weights. This decision function is used to classify new, unseen data points into one of the predefined classes.
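
A minimal classification sketch tying these pieces together (again assuming scikit-learn and the iris dataset; kernel="rbf" and C=1.0 are illustrative choices, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# SVMs are sensitive to feature scale, so standardize first. kernel="rbf"
# applies the kernel trick; C is the soft-margin regularization parameter.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)

# The fitted decision function classifies new, unseen points.
print("test accuracy:", svm.score(X_test, y_test))
```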

### Advantages of Support Vector Machines

a) Effective in High-Dimensional Spaces: SVMs perform well
even in cases where the number of dimensions exceeds the
number of samples.
b) Versatile: SVMs can be adapted for various tasks, including
classification, regression, and outlier detection.
c) Regularization: SVMs offer regularization parameters to
control overfitting and handle noise effectively.
d) Kernel Trick: SVMs can handle nonlinear relationships
between features using kernel functions, enabling them to
model complex decision boundaries.

Support Vector Machines are widely used in various fields, including image classification, text classification, bioinformatics, and finance, owing to their effectiveness and versatility in handling both linear and nonlinear data.

## Artificial Neural Network (ANN)


ANN is a computational model inspired by the structure and
functioning of the human brain's biological neural networks.
ANNs are composed of interconnected nodes called artificial
neurons or perceptrons, organized in layers. These networks are
used for solving various machine learning tasks, including
classification, regression, clustering, and pattern recognition.

### Structure of an Artificial Neural Network

1) Input Layer - This consists of neurons that receive the input data. Each neuron corresponds to a feature in the input data.
2) Hidden Layers - Hidden layers are layers between the input
and output layers. They contain one or more layers of neurons
that perform intermediate computations. Deep neural networks
have multiple hidden layers.

3) Output Layer - The output layer produces the network's final predictions. The number of neurons in the output layer depends on the task; for example, in classification tasks, the number of output neurons corresponds to the number of classes.

4) Connections and Weights - Each connection between neurons has an associated weight, which determines the strength of the connection. During training, these weights are adjusted to minimize the difference between the predicted and actual outputs.

5) Activation Functions - These introduce nonlinearity into the network, allowing it to learn complex relationships in the data. Common activation functions include sigmoid, tanh, ReLU (Rectified Linear Unit), and softmax.
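
As a small sketch of how this structure maps onto code (using scikit-learn's MLPClassifier; the layer sizes here are arbitrary, chosen only for illustration):

```python
from sklearn.neural_network import MLPClassifier

# Two hidden layers with 32 and 16 neurons; the input and output layer
# sizes are inferred from the data when fit() is called. activation="relu"
# applies the ReLU nonlinearity to each hidden neuron.
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                    max_iter=1000, random_state=42)
```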

### How Does an ANN Work?


1) Forward Propagation - During forward propagation, input data is fed into the network, and computations are performed layer by layer until the output layer produces predictions. Each neuron in a layer receives inputs from the previous layer, applies a weighted sum to those inputs, adds a bias term, and passes the result through an activation function (see the sketch after this list).

2) Backpropagation - This is the process of updating the network's weights based on the difference between predicted and actual outputs (the loss). It involves computing gradients of the loss function with respect to each weight and adjusting the weights using gradient descent or its variants.

3) Training - Training an ANN involves iteratively feeding training data through the network, adjusting the weights through backpropagation to minimize the loss function, and thereby optimizing network performance.

4) Inference - Once trained, the ANN can make predictions on new, unseen data by simply performing forward propagation.
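
A single-neuron NumPy sketch of forward propagation and one backpropagation update (the data, weights, and learning rate are invented for illustration; the gradient formula below is the standard one for a sigmoid output with binary cross-entropy loss):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(seed=0)
x = rng.normal(size=3)   # one input example with 3 features
y_true = 1.0             # its label
w = rng.normal(size=3)   # connection weights
b = 0.0                  # bias term

# Forward propagation: weighted sum of inputs, plus bias, through an activation.
z = w @ x + b
y_pred = sigmoid(z)

# Backpropagation: for sigmoid + binary cross-entropy, dLoss/dz = y_pred - y_true,
# so the gradients with respect to the weights and bias follow directly.
grad_w = (y_pred - y_true) * x
grad_b = y_pred - y_true

# One gradient-descent update; training repeats this over many examples.
learning_rate = 0.1
w -= learning_rate * grad_w
b -= learning_rate * grad_b
```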

### Advantages of Artificial Neural Networks

a) Ability to Learn Complex Patterns: ANNs can learn complex patterns and relationships in data, making them suitable for various tasks.
b) Adaptability: ANNs can adapt and learn from new data, allowing them to generalize well to unseen examples.
c) Feature Learning: Deep neural networks can automatically learn useful features from raw data, reducing the need for manual feature engineering.
d) Parallel Processing: ANNs can be efficiently implemented on parallel architectures, making them suitable for large-scale computations.

Artificial Neural Networks are widely used in fields such as image recognition, natural language processing, speech recognition, autonomous vehicles, and many more, owing to their effectiveness and flexibility in solving diverse problems.

## Comparing the Prediction Outputs of the Three Algorithms

Comparing Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN) models for classification tasks involves evaluating their performance based on various metrics:
1) Accuracy - measures the proportion of correctly classified
instances out of the total instances. Higher accuracy indicates
better performance.

2) Confusion Matrix - The confusion matrix provides a detailed breakdown of the model's predictions, including true positives, true negatives, false positives, and false negatives. It helps in understanding the types of errors made by the model.

3) Precision and Recall - Precision measures the proportion of true positive predictions out of all positive predictions, while recall measures the proportion of true positive predictions out of all actual positive instances. Higher precision and recall values indicate better performance.

4) F1-score - This is the harmonic mean of precision and recall, providing a balance between the two metrics. It is useful when there is an imbalance between the classes.

5) ROC Curve and AUC (if applicable) - For binary classification tasks, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) can be used to evaluate the trade-off between the true positive rate and the false positive rate across different threshold values.
6) Computational Complexity and Training Time - Consider the
computational resources required and the training time of each
model. Some models, like SVM with large datasets or complex
neural networks, may require more computational resources
and time for training.

7) Robustness and Generalization - Evaluate how well each model generalizes to unseen data and how robust it is to noise and outliers.

To compare RF, SVM, and ANN models, you can assess each
model's performance using these metrics on a common
validation or test dataset. Additionally, consider the
interpretability, ease of implementation, and suitability for the
specific problem domain when making comparisons. It's also a
good practice to perform cross-validation to ensure the
robustness of the evaluation. Ultimately, the choice of the best
model depends on the specific characteristics of the dataset
and the requirements of the problem at hand.
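
As a minimal sketch of such a comparison (reusing the `rf`, `svm`, and `mlp` models and the train/test split from the earlier sketches, all assumed to be in scope):

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

mlp.fit(X_train, y_train)  # rf and svm were fitted in earlier sketches
for name, model in [("Random Forest", rf), ("SVM", svm), ("ANN", mlp)]:
    y_pred = model.predict(X_test)
    print(f"--- {name} ---")
    print("accuracy:", accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
    # precision, recall, and F1-score per class:
    print(classification_report(y_test, y_pred))
```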
