Machine Learning Algorithm Cheat Sheet (Interview Focused)

| Algorithm | Advantages | Disadvantages | Use Cases | Key Parameters | Bias-Variance | Complexity (Class) | Outlier Sensitivity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Linear Regression | Simple, interpretable, fast | Assumes linearity, sensitive to outliers | Predictive analytics, trend analysis | Learning rate, L1/L2 regularization | High bias, low variance | O(n) (Low) | High |
| Logistic Regression | Probabilistic output, efficient | Requires feature scaling, linear boundaries | Binary classification | Learning rate, L1/L2 regularization | High bias, low variance | O(n) (Low) | Moderate |
| Decision Trees | Easy interpretation, handles non-linearity | Prone to overfitting, sensitive to noise | Classification, regression, feature importance | Max depth, min samples split | Low bias, high variance | O(n log n) (Medium) | High |
| Random Forest | Reduces overfitting, handles missing values | Computationally expensive, less interpretable | Classification, regression, feature importance | Number of estimators, max depth | Moderate bias, moderate variance | O(n log n) (Medium) | Low |
| Support Vector Machines (SVM) | Effective in high dimensions, robust | Slow training, memory-intensive | Classification, regression, outlier detection | Kernel type, C, gamma | Low bias, high variance | O(n^2) to O(n^3) (High) | High |
| K-Nearest Neighbors (KNN) | Simple, non-parametric, adaptable | Computationally expensive, sensitive to noise | Classification, regression, recommendation | Number of neighbors (K), distance metric | Low bias, high variance | O(n) per query (Low) | High |
| Naive Bayes | Fast, requires little data, good for text | Assumes feature independence, limited complexity | Text classification, spam filtering, sentiment analysis | Smoothing parameter | High bias, low variance | O(n) (Low) | Low |
| Gradient Boosting (XGBoost, LightGBM, CatBoost) | High accuracy, handles missing data | Computationally expensive, prone to overfitting | Classification, regression, ranking | Learning rate, number of trees, max depth | Low bias, high variance | O(n*m*d) (High) | Moderate |
| Neural Networks (Deep Learning) | Learns complex patterns, handles large data | Requires large data, computationally expensive | Image/speech recognition, NLP | Layers, learning rate, activation | Low bias, high variance | Varies (High) | Moderate to High |
| K-Means Clustering | Simple, scalable | Sensitive to initialization, assumes spherical clusters | Clustering, segmentation, anomaly detection | Number of clusters (K), distance metric | N/A | O(n*k*i) (Medium) | High |
| Hierarchical Clustering | No need to choose K, interpretable dendrogram | Computationally expensive, difficult for large datasets | Taxonomy, grouping, bioinformatics | Linkage criterion (single, complete, average) | N/A | O(n^2 log n) (High) | Moderate |
| Principal Component Analysis (PCA) | Reduces dimensionality, removes multicollinearity | Loss of interpretability, requires standardization | Dimensionality reduction, feature extraction, noise reduction | Number of components | N/A | O(n*m^2) (Medium) | High |
| Reinforcement Learning (Q-Learning, DDPG, PPO) | Adapts to environment, long-term optimization | Requires extensive training, complex tuning | Robotics, game playing, finance | Discount factor, learning rate, exploration rate | Varies | Varies greatly (High) | Varies |
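
As a quick bridge from the table to working code, here is a minimal scikit-learn sketch showing how the "Key Parameters" column maps onto constructor arguments. The synthetic dataset and the specific parameter values are illustrative assumptions, not tuned recommendations.

```python
# Minimal sketch: key parameters from the table as scikit-learn arguments.
# Dataset and parameter values are illustrative assumptions, not recommendations.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for a real dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    # L2 regularization via `penalty`; C is the inverse regularization strength.
    "logistic_regression": LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
    # Max depth and min samples split rein in the low-bias/high-variance tree.
    "decision_tree": DecisionTreeClassifier(max_depth=5, min_samples_split=10),
    # Number of estimators and max depth, as in the Random Forest row.
    "random_forest": RandomForestClassifier(n_estimators=200, max_depth=8),
    # Kernel type, C, and gamma are the headline SVM knobs.
    "svm": SVC(kernel="rbf", C=1.0, gamma="scale"),
    # K (number of neighbors) and the distance metric drive KNN behavior.
    "knn": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    # Learning rate, number of trees, and max depth, as in the boosting row.
    "gradient_boosting": GradientBoostingClassifier(
        learning_rate=0.1, n_estimators=100, max_depth=3
    ),
}

# Cross-validation gives a quick, variance-aware comparison across models.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```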

Evaluation Metrics

| Metric Type | Metric Name | Description and Formula | Use Case |
| --- | --- | --- | --- |
| Classification | Accuracy | $\frac{TP+TN}{TP+TN+FP+FN}$ | Overall performance, simple cases |
| Classification | Precision | $\frac{TP}{TP+FP}$ | Minimizing false positives (e.g., spam detection) |
| Classification | Recall (Sensitivity) | $\frac{TP}{TP+FN}$ | Minimizing false negatives (e.g., medical diagnosis) |
| Classification | Specificity | $\frac{TN}{TN+FP}$ | Evaluating negative case performance |
| Classification | F1-Score | $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ | Balanced precision and recall |
| Classification | Confusion Matrix | $\begin{pmatrix} TP & FP \\ FN & TN \end{pmatrix}$ | Detailed error analysis, all classifications |
| Classification | ROC AUC | Area under the curve of TPR vs. FPR across thresholds | Binary classification, imbalanced datasets, deep learning |
| Regression | Mean Absolute Error (MAE) | $\frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$ | Robust to outliers, easy interpretation |
| Regression | Mean Squared Error (MSE) | $\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ | Penalizes large errors, common optimization target |
| Regression | Root Mean Squared Error (RMSE) | $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$ | Same units as target, common for regression |
| Regression | R-squared (Coefficient of Determination) | $1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$ | Model fit, variance explained |
| Clustering | Silhouette Score | Score in $[-1, 1]$ measuring similarity to own cluster vs. other clusters | Cluster quality, validation |
| Clustering | Davies-Bouldin Index | Ratio of within-cluster scatter to between-cluster separation; lower is better | Cluster quality, validation |
| Dimensionality Reduction (PCA) | Explained Variance Ratio | Fraction of total variance captured along each principal component | Feature reduction, data understanding |
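
The formulas above map almost one-to-one onto helpers in sklearn.metrics. The sketch below computes the classification, regression, and clustering metrics from the table; the labels, predictions, and blob data are synthetic assumptions for illustration. One caveat worth knowing for interviews: scikit-learn's confusion_matrix orders binary counts as [[TN, FP], [FN, TP]], which differs from the TP/FP/FN/TN layout shown above.

```python
# Sketch: computing the metrics from the table with sklearn.metrics.
# All inputs below are synthetic assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    accuracy_score, confusion_matrix, davies_bouldin_score, f1_score,
    mean_absolute_error, mean_squared_error, precision_score, r2_score,
    recall_score, roc_auc_score, silhouette_score,
)

# --- Classification: binary labels plus predicted probabilities ---
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1])
y_pred = (y_prob >= 0.5).astype(int)  # hard labels at a 0.5 threshold

# scikit-learn orders binary counts as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   :", accuracy_score(y_true, y_pred))   # (TP+TN)/(TP+TN+FP+FN)
print("precision  :", precision_score(y_true, y_pred))  # TP/(TP+FP)
print("recall     :", recall_score(y_true, y_pred))     # TP/(TP+FN)
print("specificity:", tn / (tn + fp))                   # TN/(TN+FP); no direct helper
print("f1         :", f1_score(y_true, y_pred))
print("roc_auc    :", roc_auc_score(y_true, y_prob))    # needs scores, not hard labels

# --- Regression ---
y_true_r = np.array([3.0, 5.0, 2.5, 7.0])
y_pred_r = np.array([2.5, 5.0, 3.0, 8.0])
mse = mean_squared_error(y_true_r, y_pred_r)
print("MAE :", mean_absolute_error(y_true_r, y_pred_r))
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))                            # same units as the target
print("R^2 :", r2_score(y_true_r, y_pred_r))

# --- Clustering: score a toy k-means fit ---
X_c, _ = make_blobs(n_samples=200, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_c)
print("silhouette    :", silhouette_score(X_c, labels))      # in [-1, 1]; higher is better
print("davies_bouldin:", davies_bouldin_score(X_c, labels))  # lower is better
```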
