Understanding Machine Learning Algorithms - in Depth
Notes by RaviTeja G
Table of Contents
1. What is Machine Learning?
5. Reinforcement Learning
6. Semi-Supervised Learning
7. Steps in ML Project
- Data Preprocessing
- Feature Engineering
10. Exploring Step 3 - Train Model on Dataset
- Types of Learning
- Underfitting and Overfitting
- Regularization techniques
- Hyperparameter Tuning
11. Exploring Step 4 - Evaluation of a Model
- Evaluation Metrics
- Confusion Matrix
- Recall/Sensitivity
- Precision
- Specificity
- F1 Score
- AUC and ROC Curve
- Analysis of a Model
12. Supervised Learning
- Linear Regression
- Regularization Techniques
- Logistic Regression
- Decision Trees
- Ensemble Techniques
- Random Forests
- AdaBoost
- Gradient Boost
- XGBoost
- K-Nearest Neighbours
- Support Vector Machines
- Naive Bayes Classifiers
13. Unsupervised Learning
- Clustering Techniques
- K-Means Clustering
- Hierarchical Clustering
- DBSCAN Clustering
- Evaluation of Clustering Models
- Curse of Dimensionality
- Principal Component Analysis
14. Cheat Sheet of Supervised and Unsupervised Algorithms
Understanding Linear Regression
Table of Contents
1. What is Linear Regression
2. Understanding with an example
3. Evaluating the fitness of the model
4. Understanding Gradient descent
5. Understanding Loss Function
6. Measuring Model Strength
7. Another Approach for LR - OLS
Understanding Regularization Techniques
Table Of Contents
1. Understanding Multicollinearity
2. Variance Inflation Factor
3. Regularization
4. Lasso - L1 Form
5. Ridge - L2 Form
6. Elastic Net
7. Difference Between Ridge and Lasso
8. When to use Ridge/Lasso/Elastic Net
9. Polynomial Regression
Understanding Decision Trees
Table Of Contents
1. Why do we need Decision Trees
2. How it works
3. How do we select a root node
4. Understanding Entropy, Information Gain
5. Solving an Example on Entropy
6. Understanding Gini Impurity
7. Solving an Example on Gini Impurity
8. Decision tree for Regression
9. Why Decision Trees are a Greedy Approach
10. Understanding Pruning
Understanding Boosting
Table of Contents
1. Understanding Boosting
2. Understanding AdaBoost
3. Solving an Example on AdaBoost
4. Understanding Gradient Boosting
5. Solving an Example on Gradient Boosting
6. AdaBoost vs Gradient Boosting
Understanding K-Nearest Neighbours
Table Of Contents
1. How does K-Nearest Neighbours work
2. How is Distance Calculated
- Euclidean Distance
- Hamming Distance
- Manhattan Distance
3. Why is KNN a Lazy Learner
4. Effects of Choosing the value of K
5. Different ways to perform KNN
6. Understanding KD-Tree
7. Solving an Example of KD Tree
8. Understanding Ball Tree
Understanding Support Vector Machines
Table Of Contents
1. Understanding Concept of SVC
2. What are Support Vectors
3. What is Margin
4. Hard Margin and Soft Margin
5. Kernelized SVC
6. Types of Kernels
7. Understanding SVR
Understanding Naive Bayes Classifiers
Table Of Contents
1. Why do we need Naive Bayes
2. Concept of how it works
3. Mathematical Intuition of Naive Bayes
4. Solving an Example on Naive Bayes
5. Other Bayes Classifiers
- Gaussian Naive Bayes Classifier
- Multinomial Naive Bayes Classifier
- Bernoulli Naive Bayes Classifier
Understanding Clustering Techniques
Table of Contents
1. How clustering is different from classification
2. Applications of Clustering
3. What are density based methods
4. What are Hierarchical based methods
5. What are partitioning methods
6. What are Grid Based methods
7. Main Requirements for Clustering Algorithms
Understanding K-Means Clustering
Table Of Contents
1. Concept of K-Means Clustering
2. Math Intuition Behind K-Means
3. Cluster Building Process
4. Edge Case Scenarios of K-Means
5. Challenges and Improvements in K-Means
Understanding Principal Component Analysis
Table Of Contents
1. Idea Behind PCA
2. What are Principal Components
3. Eigen Decomposition Approach
4. Singular Value Decomposition Approach
5. Why do we maximize Variance
6. What is Explained Variance Ratio
7. How to select the optimal number of Principal Components
8. Understanding Scree plot
9. Issues with PCA
10. Understanding Kernel PCA
– Supervised Algorithms –
Regression Models
Linear Regression
Description & Application: Linear Regression models a linear relationship between input variables and a continuous numerical output variable. The default loss function is the mean squared error (MSE).
Advantages: 1. Fast training because there are few parameters. 2. Interpretable/explainable results through its output coefficients.
Disadvantages: 1. Assumes a linear relationship between input and output variables. 2. Sensitive to outliers. 3. Typically generalizes worse than ridge or lasso regression.
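A minimal sketch of fitting a linear regression, assuming scikit-learn is available (the toy data is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # single input variable
y = np.array([2.1, 4.0, 6.2, 7.9])           # continuous numerical target

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)          # interpretable coefficients
print(model.predict([[5.0]]))                 # prediction for a new point
```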
Support Vector Regression
Description & Application: Support Vector Regression (SVR) uses the same principle as SVMs but optimizes the cost function to fit the best straight line (or plane) through the data points. With the kernel trick it can efficiently perform non-linear regression by implicitly mapping the inputs into high-dimensional feature spaces.
Advantages: 1. Robust against outliers. 2. Effective learning and strong generalization performance. 3. Different kernel functions can be specified for the decision function.
Disadvantages: 1. Does not perform well with large datasets. 2. Tends to underfit in cases where the number of variables is much smaller than the number of observations.
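A minimal sketch of kernelized SVR, assuming scikit-learn (the noisy sine data is made up for illustration):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)

# The RBF kernel maps inputs into a high-dimensional feature space implicitly.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
print(model.predict(X[:5]))
```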
Gaussian Process Regression
Description & Application: Gaussian Process Regression (GPR) uses a Bayesian approach that infers a probability distribution over the possible functions that fit the data. The Gaussian process is a prior that is specified as a multivariate Gaussian distribution.
Advantages: 1. Provides uncertainty measures on the predictions. 2. A flexible and usable non-linear model which fits many datasets well. 3. Performs well on small datasets, as the GP kernel allows specifying a prior on the function space.
Disadvantages: 1. A poor choice of kernel can make convergence slow. 2. Specifying specific kernels requires deep mathematical understanding.
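A minimal sketch of GPR with an RBF kernel prior, assuming scikit-learn (toy data made up for illustration):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[1.0], [3.0], [5.0], [6.0]])
y = np.sin(X).ravel()

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3).fit(X, y)
mean, std = gpr.predict([[4.0]], return_std=True)   # prediction plus uncertainty estimate
print(mean, std)
```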
Classification Models
Logistic Regression (and its extensions)
Description & Application: Logistic regression models a linear relationship between the input variables and the response variable. It models the output as binary values (0 or 1) rather than numeric values.
Advantages: 1. Explainable & interpretable. 2. Less prone to overfitting when using regularization. 3. Applicable for multi-class predictions.
Disadvantages: 1. Makes a strong assumption about the relationship between input and response variables. 2. Multicollinearity can cause the model to easily overfit without regularization.
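A minimal sketch of regularized logistic regression on a made-up binary problem, assuming scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression(C=1.0).fit(X, y)   # C controls the strength of L2 regularization
print(clf.predict(X[:5]))                    # predicted 0/1 labels
print(clf.predict_proba(X[:5])[:, 1])        # predicted probabilities for class 1
```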
Linear Discriminant Analysis
Description & Application: The linear decision boundary maximizes the separability between the classes by finding a linear combination of features.
Advantages: 1. Explainable & interpretable. 2. Applicable for multi-class predictions.
Disadvantages: 1. Multicollinearity can cause the model to overfit. 2. Assumes that all classes share the same covariance matrix. 3. Sensitive to outliers. 4. Doesn't work well with small class sizes.
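A minimal sketch of LDA on the Iris dataset, assuming scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)   # finds linear combinations of features separating the classes
print(lda.score(X, y))                         # training accuracy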
Both Regression and Classification Models
Decision Trees
Description & Application: Decision Tree models learn on the data by making decision rules on the variables to separate the classes in a flowchart-like tree data structure. They can be used for both regression and classification.
Advantages: 1. Explainable and interpretable. 2. Can handle missing values.
Disadvantages: 1. Prone to overfitting. 2. Can be unstable with minor data drift. 3. Sensitive to outliers.
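A minimal sketch of a depth-limited decision tree classifier, assuming scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))   # the learned decision rules printed as a flowchart-like text tree
```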
Random Forest
Description & Application: Random Forest models learn using an ensemble of decision trees. The output of the random forest is based on a majority vote of the different decision trees.
Advantages: 1. Effective learning and better generalization performance. 2. Can handle moderately large datasets. 3. Less prone to overfitting than decision trees.
Disadvantages: 1. A large number of trees can slow down performance. 2. Predictions are sensitive to outliers. 3. Hyperparameter tuning can be complex.
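A minimal sketch of a random forest classifier on made-up data, assuming scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:5]))   # each prediction is a majority vote across the individual trees
```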
Ridge Regression
Description & Application: Ridge Regression penalizes variables with low predictive power by shrinking their coefficients towards zero. It can be used for classification and regression.
Advantages: 1. Less prone to overfitting. 2. Best suited when data suffers from multicollinearity. 3. Explainable & interpretable.
Disadvantages: 1. All the predictors are kept in the final model. 2. Doesn't perform feature selection.
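A minimal sketch of ridge regression on made-up, strongly correlated predictors, assuming scikit-learn; alpha controls how strongly coefficients shrink towards zero:

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 6.0], [4.0, 8.2]])  # columns are highly correlated
y = np.array([3.0, 6.1, 9.0, 12.2])

ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)   # all predictors are kept, but with shrunken coefficients
```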
AdaBoost
Description & Application: Adaptive Boosting uses an ensemble of weak learners that is combined into a weighted sum representing the final output of the boosted classifier.
Advantages: 1. Explainable & interpretable. 2. Less need for tweaking parameters. 3. Less prone to overfitting, as the input variables are not jointly optimized. 4. Usually outperforms Random Forest.
Disadvantages: 1. Sensitive to noisy data and outliers.
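A minimal sketch of AdaBoost on made-up data, assuming scikit-learn; by default the weak learner is a depth-1 decision tree (a "stump"):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, random_state=0)
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)  # weighted sum of 50 weak learners
print(ada.score(X, y))
```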
– Unsupervised Algorithms –
Clustering and Other Unsupervised Algorithms
K-Means
Description & Application: The most common clustering approach. It assumes that the closer data points are to each other, the more similar they are. It determines K clusters based on Euclidean distances.
Advantages: 1. Scales to large datasets. 2. Interpretable & explainable results. 3. Can generate tight clusters.
Disadvantages: 1. Requires defining the expected number of clusters in advance. 2. Not suitable for identifying clusters with non-convex shapes.
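A minimal sketch of K-Means on made-up blob data, assuming scikit-learn; note that K must be chosen in advance:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)  # K = 3 set up front
print(km.cluster_centers_)   # the learned cluster centroids
print(km.labels_[:10])       # cluster assignment of the first few points
```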
t-SNE
Description & Application: t-distributed Stochastic Neighbor Embedding is a non-linear dimensionality reduction method that converts similarities between data points to joint probabilities using the Student t-distribution in the low-dimensional space.
Advantages: 1. Helps preserve the relationships seen in high dimensionality. 2. Easy to visualize the structure of high-dimensional data in 2 or 3 dimensions. 3. Very effective for visualizing clusters or groups of data points and their relative proximities.
Disadvantages: 1. The cost function is not convex: different initializations can give different results. 2. Computationally intensive for large datasets. 3. Default parameters do not always achieve the best results.
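A minimal sketch of embedding the 64-dimensional digits dataset into 2 dimensions with t-SNE, assuming scikit-learn:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)                    # 64-dimensional digit images
emb = TSNE(n_components=2, random_state=0).fit_transform(X)
print(emb.shape)                                        # (1797, 2) - ready for a scatter plot
```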
Apriori algorithm
Description & Application: The Apriori algorithm uses the join and prune steps iteratively to identify the most frequent itemsets in the given dataset. Prior knowledge (apriori) of frequent itemset properties is used in the process.
Advantages: 1. Explainable & interpretable results. 2. Exhaustive approach based on confidence and support.
Disadvantages: 1. Computationally expensive, since it repeatedly scans the dataset and can generate a very large number of candidate itemsets.
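A minimal sketch of mining frequent itemsets with Apriori, assuming the mlxtend library is available (the transactions are made up for illustration):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

transactions = [["milk", "bread"], ["milk", "diapers"], ["milk", "bread", "diapers"], ["bread"]]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)
print(apriori(onehot, min_support=0.5, use_colnames=True))   # itemsets meeting the support threshold
```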