Comparing ML Algorithms - Anjali Garg

A systematic approach for a better understanding of:

- How to compare algorithms effectively
- How to optimize models for specific tasks
[Diagram: the six core components of a machine learning algorithm: Training Data, Loss / Cost Function, Optimization Algorithm, Model Weights and Hyperparameters, Training Process, Output Data]

These six components are crucial for understanding the core differences between
machine learning algorithms.
Different algorithms are designed to handle different types of data.

Data Structure:
- Algorithms such as decision trees, regression models, and k-means work with structured (i.e., tabular) data.
- Neural network models can also work with unstructured data such as images and text.
- Time series or sequential data is handled by models like ARIMA and LSTMs.

Labeled vs. Unlabeled Data:
- Labeled data is used in supervised learning algorithms such as decision tree classifiers, logistic regression, and linear regression.
- Unlabeled data is used in unsupervised algorithms such as K-Means Clustering.
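To make the distinction concrete, here is a minimal sketch (assuming scikit-learn is installed; the toy arrays are invented for illustration) contrasting a supervised fit, which needs labels y, with an unsupervised fit, which does not:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])  # features
y = np.array([0, 0, 1, 1])                                      # labels

# Supervised: the model learns a mapping from X to the provided labels y.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.2, 1.9]]))  # -> predicted class label

# Unsupervised: K-Means groups the rows of X into clusters without any labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # -> cluster assignment for each row of X
```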
The loss function quantifies the difference between predicted and actual outcomes.
The choice of loss function determines how the model learns from the data, because the
model tries to minimize this loss in order to capture the patterns in the data.

Common Loss Functions:


Classification: the loss function measures how well or poorly the model's predicted labels match the true labels. Example: cross entropy loss for multi-class classification,

L = -Σ_{c=1}^{C} y_ic · log(p_ic)

where C is the number of classes, y_ic is the true label (1 if class c is the correct one, 0 otherwise), and p_ic is the predicted probability for class c.

Regression: the loss function measures the difference between the predicted values and the actual values. Example: MSE measures the average of the squares of the errors,

MSE = (1/n) · Σ_{i=1}^{n} (y_i - p_i)²

where y_i is the actual value and p_i is the predicted value.
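As a hedged illustration (plain NumPy, with example arrays invented for the demonstration), both losses can be computed directly from their definitions above:

```python
import numpy as np

# Regression: Mean Squared Error, MSE = (1/n) * sum((y_i - p_i)^2)
y_true = np.array([3.0, -0.5, 2.0])   # actual values y_i
y_pred = np.array([2.5,  0.0, 2.0])   # predicted values p_i
mse = np.mean((y_true - y_pred) ** 2)

# Classification: cross entropy, L = -sum_c y_ic * log(p_ic), for one sample
y_onehot = np.array([0.0, 1.0, 0.0])  # true label, one-hot over C = 3 classes
p = np.array([0.1, 0.8, 0.1])         # predicted class probabilities
cross_entropy = -np.sum(y_onehot * np.log(p))

print(f"MSE: {mse:.4f}, cross entropy: {cross_entropy:.4f}")
```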
Common Loss Functions by Algorithm:

Algorithm Name | Algorithm Type | Loss Function Name
Linear Regression | Regression | Mean Squared Error (MSE)
Logistic Regression | Classification | Cross Entropy Loss
Support Vector Machine (SVM) | Classification | Hinge Loss
Robust Regression | Regression | Huber Loss
Poisson Regression | Regression | Poisson Loss
AdaBoost | Classification | Exponential Loss
Linear SVM | Classification | Squared Hinge Loss
Lasso Regression | Regression | Mean Squared Error with L1 penalty
Gradient Boosting (for binary classification) | Classification | Log Loss


Optimization algorithms find the best set of parameters (weights, biases, etc.) that
reduce the error as much as possible by minimizing the loss function, improving the model's
accuracy or predictive performance.

Key Types of Optimization Algorithms in ML:

- Gradient Descent: minimizes the cost function by iteratively moving in the direction of steepest descent (the negative gradient) with respect to the parameters.
- Adaptive Learning Rate Algorithms: adjust the learning rate dynamically to make the algorithm more efficient by modifying how fast or slow it learns.
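A minimal sketch of plain gradient descent (the one-parameter toy problem is invented for illustration): the parameter repeatedly steps in the direction of the negative gradient, scaled by the learning rate:

```python
# Minimize f(w) = (w - 3)^2; its gradient is f'(w) = 2 * (w - 3).
w = 0.0             # initial parameter value
learning_rate = 0.1

for step in range(50):
    grad = 2 * (w - 3)          # gradient of the loss at the current w
    w -= learning_rate * grad   # step in the direction of steepest descent

print(w)  # converges toward the minimizer w = 3
```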
Model Parameters: These are internal variables learned from the data during training
(e.g., weights in a neural network, coefficients and bias in linear regression). They directly
affect the model's predictions and are adjusted by optimization algorithms like gradient
descent.

Hyperparameters: These are external configurations set before training (e.g., learning
rate, number of layers, number of neurons in a neural network). They guide the learning
process and affect the model's performance and generalization ability. Tuning them is
key to improving model accuracy.

By tuning hyperparameters based on the validation data, the model is more likely to
generalize well to unseen data (i.e., test data), ensuring better performance in real-world
scenarios.
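A minimal sketch of that idea with scikit-learn (the candidate alpha values and the synthetic data are invented for illustration): each hyperparameter setting is trained on the training split and scored on the validation split, and the best-scoring setting is kept:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

best_alpha, best_score = None, -np.inf
for alpha in [0.01, 0.1, 1.0, 10.0]:                   # hyperparameter candidates
    model = Ridge(alpha=alpha).fit(X_train, y_train)   # model parameters learned here
    score = model.score(X_val, y_val)                  # R^2 on the validation data
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)
```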
Algorithm | Model Parameters | Hyperparameters
Linear Regression | Coefficients (weights), Intercept (bias) | Regularization strength (L2: Ridge, L1: Lasso), Learning rate
Logistic Regression | Coefficients (weights), Intercept (bias) | Regularization strength, Solver type
Decision Tree | Node splits, Leaf nodes | Max depth, Min samples split, Min samples leaf
Random Forest | Decision tree parameters (per tree) | Number of trees, Max depth, Max features, Min samples split
Support Vector Machine | Support vectors, Coefficients | Kernel type, Regularization parameter (C), Gamma
K-Nearest Neighbors | N/A | Number of neighbors (K), Distance metric
Neural Networks | Weights, Biases | Learning rate, Number of layers, Number of neurons, Activation function, Batch size, Epochs
K-Means Clustering | Cluster centroids | Number of clusters (K), Initialization method, Max iterations
XGBoost | Tree parameters (weights) | Learning rate, Max depth, Number of estimators, Subsample ratio
The training process is the backbone of building an effective machine learning model.
Here's a breakdown of the key steps involved.

1. Data Preprocessing: Prepare the data (e.g., normalization, feature selection, lemmatization for text)
to ensure it is in a suitable format for the model.
2. Model Initialization: Set initial values for the model's weights and hyperparameters.
3. Forward Pass: The model makes predictions using the initial weights on the training data.
4. Compute Loss: The difference between predicted and actual values is measured using a loss
function (e.g., MSE for regression, Cross Entropy for classification).
5. Backpropagation: The error is propagated back through the network to adjust the weights using an
optimization algorithm like Gradient Descent.
6. Parameter Update: The model's parameters (weights, biases) are updated to reduce the loss.
7. Repeat: Steps 3 to 6 are repeated for multiple iterations (epochs) until the model converges.

Different models differ in their data preprocessing steps, initialization process, loss functions,
and optimization algorithms. Choosing effective methods is crucial for the model's performance
and accuracy.
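As a hedged end-to-end sketch of steps 2 through 7 (plain NumPy, with synthetic data invented for illustration), here is linear regression trained with gradient descent on the MSE loss:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # 1. preprocessed training data
y = X @ np.array([2.0, -1.0]) + 0.5      #    true relationship (unknown to the model)

w = np.zeros(2)                          # 2. initialize weights
b = 0.0                                  #    and bias
lr = 0.1                                 #    learning rate (a hyperparameter)

for epoch in range(200):                 # 7. repeat for multiple epochs
    y_pred = X @ w + b                   # 3. forward pass
    error = y_pred - y
    loss = np.mean(error ** 2)           # 4. compute MSE loss
    grad_w = 2 * X.T @ error / len(y)    # 5. gradients of the loss w.r.t. parameters
    grad_b = 2 * error.mean()
    w -= lr * grad_w                     # 6. parameter update
    b -= lr * grad_b

print(w, b)  # should approach [2.0, -1.0] and 0.5
```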
Algorithms can also be categorized by their output, depending on the nature and type of the data they produce.
Here are some key ways algorithms differ based on their output:

Output | Algorithm Type | Common Algorithms
Categorical label or class | Classification | Logistic Regression, Decision Trees, SVM, k-NN
Continuous value | Regression | Linear Regression, Ridge Regression, Neural Networks
Group or cluster | Clustering | k-Means, DBSCAN, Hierarchical Clustering
Generated data resembling input | Generative | GANs, VAEs
Reduced data | Dimensionality Reduction | PCA, t-SNE, UMAP
Suggested items | Recommendation | Collaborative Filtering, Content-based Filtering
Optimal solution | Optimization | Gradient Descent, Genetic Algorithms


Aspect | Linear Regression | Decision Tree
Training Data | Uses a labeled and structured dataset, assuming a linear relationship between features and target. | Also requires a labeled and structured dataset, but splits data recursively without assuming linearity.
Loss Function | Minimizes Mean Squared Error (MSE). | Uses MSE (regression) or Gini/Entropy (classification) to determine splits.
Optimization Algorithm | Direct computation (normal equation) or gradient descent to minimize loss. | Greedy algorithm that selects the feature providing the best split.
Model Parameters and Hyperparameters | Parameters: coefficients and intercept. Hyperparameters: learning rate for gradient descent. | Parameters: tree structure. Hyperparameters: depth, split criteria, etc.
Training Process | Fits a linear equation by minimizing the error, i.e., the loss function. | Builds a tree by recursively splitting data on features until stopping criteria are met.
Output Data | Continuous values (regression). | Continuous values (regression) or class labels (classification).
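The contrast can be seen in a few lines of scikit-learn (the synthetic data and the max_depth value are invented for illustration): the linear model fits global coefficients, while the tree recursively partitions the feature space:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)   # non-linear target

lin = LinearRegression().fit(X, y)                      # learns coefficients + intercept
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)     # learns a tree structure

print("linear R^2:", lin.score(X, y))    # limited by the linearity assumption
print("tree   R^2:", tree.score(X, y))   # can capture the non-linear shape
```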
Aspect | Logistic Regression | SVM (Support Vector Machine)
Training Data | Requires labeled data for binary or multiclass classification. | Requires labeled data; works well with small or large datasets.
Loss Function | Uses Log Loss (Cross Entropy Loss) to measure prediction error. | Uses Hinge Loss to maximize the margin between classes.
Optimization Algorithm | Optimized via gradient descent or its variants. | Uses Quadratic Programming (QP) to maximize the margin, or gradient-based methods for non-linear cases.
Model Parameters and Hyperparameters | Parameters: weights and bias. Hyperparameters: regularization strength (L1/L2). | Parameters: support vectors and weights. Hyperparameters: C (regularization), kernel type.
Training Process | Adjusts weights to minimize Log Loss using gradient descent. | Finds a hyperplane that maximizes the margin, with kernel options for non-linear cases.
Output Data | Outputs probabilities for class membership (via the sigmoid function). | Outputs class labels based on distance from the separating hyperplane (no probabilities).
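A minimal sketch of that output difference with scikit-learn (the toy data is invented for illustration): logistic regression exposes class probabilities, while a default SVC returns only labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.2, 0.9]])
y = np.array([0, 0, 1, 1])

logreg = LogisticRegression(C=1.0).fit(X, y)   # minimizes log loss
svm = SVC(kernel="linear", C=1.0).fit(X, y)    # maximizes the margin (hinge loss)

print(logreg.predict_proba([[0.6, 0.5]]))  # class membership probabilities
print(svm.predict([[0.6, 0.5]]))           # class label only (no probabilities
                                           # unless probability=True is set)
```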
Thanks for watching!
