Comparing ML Algorithms - Anjali Garg
These six components are crucial for understanding the core differences between
machine learning algorithms.
Different algorithms are designed to handle different types of data.
Data Structure: Some algorithms work with structured (i.e., tabular) data, such as decision trees, regression models, and k-means.
Gradient Descent: Minimizes the cost function by iteratively moving in the direction of steepest descent (the negative gradient) with respect to the parameters.
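In symbols, one update step moves the parameters θ against the gradient of the cost function J, scaled by the learning rate α (this notation is a standard convention, not from the original slides):

θ ← θ − α ∇J(θ)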
Hyperparameters: These are external configurations set before training (e.g., learning
rate, number of layers, number of neurons in a neural network). They guide the learning
process and affect the model's performance and generalization ability. Tuning them is
key to improving model accuracy.
By tuning hyperparameters based on the validation data, the model is more likely to
generalize well to unseen data (i.e., test data), ensuring better performance in real-world
scenarios.
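A common way to put this into practice is a grid search over candidate hyperparameter values, scored on held-out validation folds. A minimal sketch using scikit-learn (the synthetic data and alpha grid are illustrative assumptions):

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, for illustration only
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}    # candidate regularization strengths (illustrative)
search = GridSearchCV(Ridge(), param_grid, cv=5)  # 5-fold cross-validation acts as the validation data
search.fit(X, y)                                  # trains one model per candidate, scores each on held-out folds
print(search.best_params_)                        # the setting most likely to generalize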
Model | Parameters | Hyperparameters
Linear Regression | Coefficients (weights), Intercept (bias) | Regularization strength (L2: Ridge, L1: Lasso), Learning rate
Logistic Regression | Coefficients (weights), Intercept (bias) | Regularization strength, Solver type
Decision Tree | Node splits, Leaf nodes | Max depth, Min samples split, Min samples leaf
Random Forest | Decision tree parameters (per tree) | Number of trees, Max depth, Max features, Min samples split
Support Vector Machine | Support vectors, Coefficients | Kernel type, Regularization parameter (C), Gamma
K-Means Clustering | Cluster centroids | Number of clusters (K), Initialization method, Max iterations
XGBoost | Tree parameters (weights) | Learning rate, Max depth, Number of estimators, Subsample ratio
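The distinction in the table shows up directly in code: hyperparameters are chosen before training, while parameters are learned by fitting. A minimal scikit-learn sketch (the data and the alpha value are illustrative assumptions):

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression data, for illustration only
X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

model = Ridge(alpha=1.0)              # hyperparameter: regularization strength, set before training
model.fit(X, y)                       # training learns the parameters from the data
print(model.coef_, model.intercept_)  # parameters: coefficients (weights) and intercept (bias)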
The training process is the backbone of building an effective machine learning model.
Here's a breakdown of the key steps involved.
1. Data Preprocessing: Prepare the data (e.g., normalization, feature selection, lemmatization for text)
to ensure it is in a suitable format for the model.
2. Model Initialization: Set initial values for the model's weights and hyperparameters.
3. Forward Pass: The model makes predictions using the initial weights on the training data.
4. Compute Loss: The difference between predicted and actual values is measured using a loss
function (e.g., MSE for regression, Cross Entropy for classification).
5. Backpropagation: The error is propagated back through the network to adjust the weights using an
optimization algorithm like Gradient Descent.
6. Parameter Update: The model's parameters (weights, biases) are updated to reduce the loss.
7. Repeat: Steps 3 to 6 are repeated for multiple iterations (epochs) until the model converges.
Different models differ in their data preprocessing steps, initialization process, loss functions, and
optimization algorithms. Choosing effective methods is crucial for the model's performance and
accuracy.
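As a concrete illustration of steps 3 to 6 above, here is a minimal gradient-descent training loop for linear regression in NumPy; for this linear model the gradient computation stands in for backpropagation, and the data, learning rate, and epoch count are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # training data (synthetic, for illustration)
y = X @ np.array([3.0, -2.0]) + 1.0    # targets generated from known weights and bias

w = np.zeros(2)                        # step 2: initialize weights
b = 0.0                                #         and bias
lr = 0.1                               # hyperparameter: learning rate

for epoch in range(200):               # step 7: repeat for multiple epochs
    y_pred = X @ w + b                 # step 3: forward pass
    error = y_pred - y
    loss = np.mean(error ** 2)         # step 4: compute loss (MSE)
    grad_w = 2 * X.T @ error / len(y)  # step 5: gradient of the loss w.r.t. weights
    grad_b = 2 * error.mean()          #         and bias
    w -= lr * grad_w                   # step 6: parameter update
    b -= lr * grad_b

print(w, b)                            # should approach the true values [3.0, -2.0] and 1.0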
Algorithms can be categorized by their output, i.e., the nature and type of data they produce. For example:

Output Data | Task Type | Example Algorithms
Categorical label or class | Classification | Logistic Regression, Decision Trees, SVM, k-NN
Aspect | Linear Regression | Decision Tree
Training Data | Uses labeled and structured dataset, assuming a linear relationship between features and target. | Also requires labeled and structured dataset but splits data recursively without assuming linearity.
Optimization Algorithm | Direct computation (normal equation) or gradient descent to minimize loss. | Greedy algorithm that selects the feature providing the best split.
Model Parameters and Hyperparameters | Parameters: coefficients and intercept. Hyperparameters: learning rate for gradient descent. | Parameters: tree structure. Hyperparameters: depth, split criteria, etc.
Training Process | Fits a linear equation by minimizing error, i.e., the loss function. | Builds a tree by recursively splitting data on features until stopping criteria are met.
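Both models can be fit to the same labeled data; what differs are the hyperparameters set up front and the parameters learned. A minimal scikit-learn sketch (the data and max_depth value are illustrative assumptions):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, for illustration only
X, y = make_regression(n_samples=200, n_features=4, noise=8.0, random_state=0)

lin = LinearRegression().fit(X, y)                   # learns coefficients and an intercept
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)  # hyperparameter: max depth; learns a tree structure

print(lin.coef_, lin.intercept_)                     # linear model parameters
print(tree.get_depth(), tree.get_n_leaves())         # shape of the learned tree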
Aspect | Logistic Regression | Support Vector Machine
Training Data | Requires labeled data for binary or multiclass classification. | Requires labeled data; works well with small or large datasets.
Model Parameters and Hyperparameters | Parameters: weights and bias. Hyperparameters: regularization strength (L1/L2). | Parameters: support vectors and weights. Hyperparameters: C (regularization), kernel type.
Training Process | Adjusts weights to minimize Log Loss using gradient descent. | Finds a hyperplane that maximizes the margin, with kernel options for non-linear cases.
Output Data | Outputs probabilities for class membership (via sigmoid function). | Outputs class labels based on distance from the separating hyperplane (no probabilities).
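This difference in output is visible in code: logistic regression exposes class-membership probabilities, while a standard SVM exposes signed distances from the hyperplane. A minimal scikit-learn sketch (the synthetic data is an illustrative assumption):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

log_reg = LogisticRegression().fit(X, y)
svm = SVC(kernel="rbf").fit(X, y)     # probability estimates are off by default

print(log_reg.predict_proba(X[:3]))   # probabilities for class membership (via sigmoid)
print(svm.decision_function(X[:3]))   # signed distance from the separating hyperplane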
Thanks For
Watching