# [ Python Essential Methods In Machine Learning ] [ cheatsheet ]
Data Preprocessing:
● train_test_split(): Split data into training and testing sets.
● StandardScaler(): Standardize features by removing the mean and scaling
to unit variance.
● MinMaxScaler(): Scale features to a specified range (default: [0, 1]).
● MaxAbsScaler(): Scale features by their maximum absolute value.
● RobustScaler(): Scale features using statistics that are robust to
outliers.
● Normalizer(): Normalize samples individually to unit norm.
● Binarizer(): Binarize data (set feature values to 0 or 1) according to a
threshold.
● PolynomialFeatures(): Generate polynomial and interaction features.
● FunctionTransformer(): Construct a transformer from an arbitrary
callable.
● KBinsDiscretizer(): Bin continuous data into intervals.
● LabelEncoder(): Encode target labels with integer values between 0 and
n_classes-1.
● OneHotEncoder(): Encode categorical features as a one-hot numeric array.
● OrdinalEncoder(): Encode categorical features as an integer array.
● LabelBinarizer(): Binarize labels in a one-vs-all fashion.
● MultiLabelBinarizer(): Transform between iterable of iterables and a
multilabel format.
● SimpleImputer(): Impute missing values using specified strategy (e.g.,
mean, median, most_frequent).
● IterativeImputer(): Impute missing values by modeling each feature with
missing values as a function of other features (experimental; first run
from sklearn.experimental import enable_iterative_imputer).
● KNNImputer(): Impute missing values using k-Nearest Neighbors.
● MissingIndicator(): Transform a dataset into corresponding binary matrix
indicating the presence of missing values.
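As a rough illustration of how a few of the preprocessing tools above fit together, the sketch below splits a small made-up array, imputes one missing value, standardizes the features, and one-hot encodes a toy categorical column. The data and variable names are assumptions chosen for the example, not part of the cheatsheet.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (made up for illustration): 8 samples, 2 features, one missing value.
X = np.array([
    [1.0, 10.0], [2.0, 20.0], [3.0, np.nan], [4.0, 40.0],
    [5.0, 50.0], [6.0, 60.0], [7.0, 70.0], [8.0, 80.0],
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Hold out 25% of the samples, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the imputer and scaler on the training part only, then reuse the
# learned statistics on the test part to avoid leaking test information.
preprocess = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_train_scaled = preprocess.fit_transform(X_train)
X_test_scaled = preprocess.transform(X_test)

# One-hot encode a categorical column; categories are sorted alphabetically,
# so the output columns correspond to "green" and "red".
colors = np.array([["red"], ["green"], ["red"]])
onehot = OneHotEncoder().fit_transform(colors).toarray()
```

Fitting the transformers on the training split only is a common convention; the same pattern applies to the other scalers and imputers listed above.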
Feature Selection:
● SelectKBest(): Select features according to the k highest scores.
● SelectPercentile(): Select features according to a percentile of the
highest scores.
● SelectFpr(): Select features based on a false positive rate test.
By: Waleed Mousa
● SelectFdr(): Select features based on an estimated false discovery rate.
● SelectFromModel(): Select features based on importance weights.
● SequentialFeatureSelector(): Select features sequentially based on a
specified criterion.
● RFE(): Feature ranking with recursive feature elimination.
● RFECV(): Feature ranking with recursive feature elimination and
cross-validated selection of the best number of features.
● VarianceThreshold(): Feature selector that removes low-variance
features.
● GenericUnivariateSelect(): Univariate feature selector with configurable
strategy.
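The selectors above share the usual fit/transform interface. A minimal sketch, assuming the bundled iris dataset and the ANOVA F-test (f_classif) as the scoring function, both chosen here only for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-scores against the labels.
selector = SelectKBest(score_func=f_classif, k=2)
X_best = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)  # column indices that survived

# Append a constant column, then drop zero-variance features
# (the default threshold of VarianceThreshold is 0.0).
X_with_constant = np.hstack([X, np.ones((X.shape[0], 1))])
X_var = VarianceThreshold().fit_transform(X_with_constant)
```

get_support() is available on all of the selectors listed in this section and is a convenient way to map the reduced matrix back to the original column names.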
Dimensionality Reduction:
● PCA(): Perform principal component analysis (PCA) for dimensionality
reduction.
● IncrementalPCA(): Perform incremental PCA on a large dataset.
● KernelPCA(): Perform kernel PCA for non-linear dimensionality reduction.
● SparsePCA(): Perform PCA with sparsity constraints.
● TruncatedSVD(): Perform dimensionality reduction using truncated SVD
(aka LSA).
● FastICA(): Perform Independent Component Analysis (ICA) for blind source
separation.
● NMF(): Perform non-negative matrix factorization (NMF) for
dimensionality reduction.
● MiniBatchNMF(): Perform mini-batch non-negative matrix factorization.
● LatentDirichletAllocation(): Perform Latent Dirichlet Allocation (LDA)
for topic modeling.
● TSNE(): Perform t-distributed Stochastic Neighbor Embedding for
dimensionality reduction.
● Isomap(): Perform Isomap embedding for non-linear dimensionality
reduction.
● LocallyLinearEmbedding(): Perform Locally Linear Embedding for
non-linear dimensionality reduction.
● MDS(): Perform Multidimensional Scaling (MDS) for dimensionality
reduction.
● SpectralEmbedding(): Perform spectral embedding for non-linear
dimensionality reduction.
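A short sketch of the typical usage pattern, assuming the iris dataset and a projection onto two components (both are illustrative choices). Features are standardized first because PCA is sensitive to feature scale.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize, then project the 4 original features onto 2 components.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

# Fraction of the total variance captured by each retained component.
ratio = pca.explained_variance_ratio_
```

Most of the other estimators in this section (for example TruncatedSVD, KernelPCA, or Isomap) expose the same fit_transform() call, though not all of them provide explained_variance_ratio_.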
Model Training and Evaluation:
● fit(): Train a model on the given training data.
● predict(): Make predictions using a trained model.
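● score(): Return the estimator's default metric on the given test data
and labels (mean accuracy for classifiers, R^2 for regressors).
● cross_val_score(): Evaluate a model by cross-validation and return one
score per fold (the estimator's default scorer unless scoring is set).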
● cross_val_predict(): Generate cross-validated estimates for each input
data point.
● cross_validate(): Evaluate a model using cross-validation.
● learning_curve(): Compute training and validation scores for increasing
training-set sizes.
● validation_curve(): Compute training and validation scores across the
values of a single hyperparameter.
● permutation_test_score(): Perform a permutation test for model
evaluation.
● check_cv(): Determine the cross-validation splitting strategy.
● train_test_split(): Split data into training and testing sets.
● KFold(): K-Folds cross-validator.
● StratifiedKFold(): Stratified K-Folds cross-validator.
● LeaveOneOut(): Leave-One-Out cross-validator.
● LeavePOut(): Leave-P-Out cross-validator.
● ShuffleSplit(): Random permutation cross-validator.
● StratifiedShuffleSplit(): Stratified ShuffleSplit cross-validator.
● TimeSeriesSplit(): Time Series cross-validator.
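The calls above are usually combined as follows. This is a minimal sketch assuming the iris dataset and a LogisticRegression classifier, neither of which is prescribed by the cheatsheet.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# fit / predict / score on a single hold-out split.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
test_accuracy = model.score(X_test, y_test)  # mean accuracy for a classifier

# 5-fold stratified cross-validation: the model is re-fitted on each fold,
# and one accuracy value per fold is returned.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
```

Any of the splitters listed above (KFold, ShuffleSplit, TimeSeriesSplit, and so on) can be passed as the cv argument in the same way.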
Model Selection and Hyperparameter Tuning:
● GridSearchCV(): Perform grid search over specified parameter values for
an estimator.
● RandomizedSearchCV(): Perform randomized search over specified parameter
distributions for an estimator.
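● HalvingGridSearchCV(): Perform successive halving with grid search
(experimental; first run from sklearn.experimental import
enable_halving_search_cv).
● HalvingRandomSearchCV(): Perform successive halving with randomized
search (experimental; same import required).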
● BayesSearchCV(): Perform Bayesian optimization for hyperparameter
tuning (provided by the separate scikit-optimize package, skopt, not by
scikit-learn).
● validation_curve(): Compute training and validation scores across the
values of a single hyperparameter.
● learning_curve(): Compute training and validation scores for increasing
training-set sizes.
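A minimal grid-search sketch, assuming a k-nearest-neighbors classifier on the iris dataset and a small hand-picked parameter grid (all illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 4 x 2 = 8 candidate settings, each evaluated with 5-fold cross-validation.
param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(
    KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy"
)
search.fit(X, y)

best_params = search.best_params_  # setting with the highest mean CV score
best_score = search.best_score_    # that mean cross-validated accuracy
best_model = search.best_estimator_  # re-fitted on all of X, y (refit=True)
```

RandomizedSearchCV follows the same pattern but takes param_distributions and an n_iter budget instead of an exhaustive grid.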
Model Evaluation Metrics:
● accuracy_score(): Compute the accuracy score.
● balanced_accuracy_score(): Compute the balanced accuracy score.
● average_precision_score(): Compute the average precision score.
● brier_score_loss(): Compute the Brier score loss.
● classification_report(): Build a text report showing the main
classification metrics.
● cohen_kappa_score(): Compute the Cohen's kappa score.
● confusion_matrix(): Compute the confusion matrix to evaluate the
accuracy of a classification.
● dcg_score(): Compute the Discounted Cumulative Gain (DCG) score.
● det_curve(): Compute the Detection Error Tradeoff (DET) curve.
● f1_score(): Compute the F1 score, which is the harmonic mean of
precision and recall.
● fbeta_score(): Compute the F-beta score, which is the weighted harmonic
mean of precision and recall.
● hamming_loss(): Compute the Hamming loss.
● hinge_loss(): Compute the average (non-regularized) hinge loss.
● jaccard_score(): Compute the Jaccard similarity coefficient score.
● log_loss(): Compute the logarithmic loss.
● matthews_corrcoef(): Compute the Matthews correlation coefficient (MCC).
● multilabel_confusion_matrix(): Compute a confusion matrix for each class
or sample.
● ndcg_score(): Compute the Normalized Discounted Cumulative Gain (NDCG)
score.
● precision_recall_curve(): Compute precision-recall pairs for different
probability thresholds.
● precision_recall_fscore_support(): Compute precision, recall, F-measure,
and support for each class.
● precision_score(): Compute the precision score.
● recall_score(): Compute the recall score.
● roc_auc_score(): Compute the Area Under the Receiver Operating
Characteristic Curve (ROC AUC) score.
● roc_curve(): Compute Receiver Operating Characteristic (ROC) curve.
● top_k_accuracy_score(): Compute the Top-k accuracy score.
● zero_one_loss(): Compute the Zero-One classification loss.
● explained_variance_score(): Compute the explained variance score.
● max_error(): Compute the maximum residual error.
● mean_absolute_error(): Compute the mean absolute error.
● mean_squared_error(): Compute the mean squared error.
● mean_squared_log_error(): Compute the mean squared logarithmic error.
● median_absolute_error(): Compute the median absolute error.
● r2_score(): Compute the coefficient of determination (R^2) score.
● mean_poisson_deviance(): Compute the mean Poisson deviance.
● mean_gamma_deviance(): Compute the mean Gamma deviance.
● mean_tweedie_deviance(): Compute the mean Tweedie deviance.
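To make a few of these metrics concrete, the sketch below evaluates small hand-made label and prediction vectors (assumed values, chosen so the results can be checked by hand). For the classification part there are 3 true positives, 1 false negative, 0 false positives, and 2 true negatives.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
)

# Binary classification: made-up ground truth and predictions.
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

acc = accuracy_score(y_true, y_pred)    # 5 correct out of 6
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 3
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)           # harmonic mean of the two
cm = confusion_matrix(y_true, y_pred)   # rows: true class, cols: predicted

# Regression: made-up targets and predictions.
y_true_r = [3.0, -0.5, 2.0, 7.0]
y_pred_r = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true_r, y_pred_r)  # mean of |error|
mse = mean_squared_error(y_true_r, y_pred_r)   # mean of error ** 2
r2 = r2_score(y_true_r, y_pred_r)
```

All metric functions in this section take the true values first and the predictions (or scores) second.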
Model Interpretability:
● permutation_importance(): Compute feature importances using permutation
importance.
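● partial_dependence(): Compute the partial dependence of a model's
predictions on selected features.
● plot_partial_dependence(): Plot partial dependence (removed in
scikit-learn 1.2; use PartialDependenceDisplay.from_estimator() instead).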
● plot_tree(): Plot a decision tree.
● export_graphviz(): Export a decision tree in DOT format.
● export_text(): Export a decision tree in text format.
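● inspect: Python standard-library module for inspecting objects and
callables (e.g., inspect.signature()); it is not part of scikit-learn,
whose inspection tools live in sklearn.inspection.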
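A brief sketch of two of the tools above, assuming a shallow decision tree fitted on the iris dataset. Importances are computed on the training data here only to keep the example short; a held-out set is usually the better choice.

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Human-readable if/else rules of the fitted tree.
rules = export_text(tree, feature_names=list(data.feature_names))

# Shuffle each feature 5 times and record how much the score drops.
result = permutation_importance(tree, X, y, n_repeats=5, random_state=0)
mean_importance = result.importances_mean  # one value per feature
```

Calling print(rules) shows the split thresholds, and mean_importance can be paired with data.feature_names to rank the features.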
Model Persistence:
● pickle.dump(): Save a trained model to a file using pickle.
● pickle.load(): Load a trained model from a file using pickle.
● joblib.dump(): Save a trained model to a file using joblib.
● joblib.load(): Load a trained model from a file using joblib.
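A minimal round-trip sketch for both options, assuming a LogisticRegression model and hypothetical file names inside a temporary directory. Both formats can execute arbitrary code on load, so only files from a trusted source should be loaded.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # pickle works on an open binary file object.
    pickle_path = os.path.join(tmp_dir, "model.pkl")  # hypothetical name
    with open(pickle_path, "wb") as f:
        pickle.dump(model, f)
    with open(pickle_path, "rb") as f:
        model_from_pickle = pickle.load(f)

    # joblib accepts a file path directly.
    joblib_path = os.path.join(tmp_dir, "model.joblib")  # hypothetical name
    joblib.dump(model, joblib_path)
    model_from_joblib = joblib.load(joblib_path)

# The restored models should reproduce the original predictions.
same_pickle = np.array_equal(model.predict(X), model_from_pickle.predict(X))
same_joblib = np.array_equal(model.predict(X), model_from_joblib.predict(X))
```

A saved model is generally expected to be loaded with the same scikit-learn version that produced it.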
Multiclass and Multilabel Classification:
● OneVsRestClassifier(): One-vs-the-rest (OvR) multiclass/multilabel
strategy.
● OneVsOneClassifier(): One-vs-one (OvO) multiclass strategy.
● OutputCodeClassifier(): (Error-Correcting) Output-Code multiclass
strategy.
● ClassifierChain(): A multi-label model that arranges binary classifiers
into a chain.
● MultiOutputClassifier(): Multi-target classification that fits one
classifier per target.
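The difference between these strategies shows up in how many underlying classifiers get fitted. The sketch below assumes a synthetic 4-class dataset and a LogisticRegression base estimator; the two-column target used for MultiOutputClassifier is derived from the class labels purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Synthetic data with 4 classes (an assumption for this example).
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, random_state=0,
)

base = LogisticRegression(max_iter=1000)

# One-vs-rest fits one binary classifier per class.
ovr = OneVsRestClassifier(base).fit(X, y)
n_ovr = len(ovr.estimators_)

# One-vs-one fits one binary classifier per pair of classes.
ovo = OneVsOneClassifier(base).fit(X, y)
n_ovo = len(ovo.estimators_)

# Two binary targets built from y, one classifier fitted per column.
Y_multi = np.column_stack([y % 2, (y >= 2).astype(int)])
multi = MultiOutputClassifier(base).fit(X, Y_multi)
Y_pred = multi.predict(X)
```

With k classes, one-vs-rest trains k classifiers and one-vs-one trains k(k-1)/2, which is why the two counts differ here.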
Clustering:
● KMeans(): K-Means clustering algorithm.
● MiniBatchKMeans(): Mini-Batch K-Means clustering algorithm.
● AffinityPropagation(): Affinity Propagation clustering algorithm.
● MeanShift(): Mean Shift clustering algorithm.
● SpectralClustering(): Spectral clustering algorithm.
● AgglomerativeClustering(): Agglomerative Hierarchical Clustering
algorithm.
● DBSCAN(): Density-Based Spatial Clustering of Applications with Noise
(DBSCAN) algorithm.
● OPTICS(): Ordering Points To Identify the Clustering Structure (OPTICS)
algorithm.
● Birch(): Balanced Iterative Reducing and Clustering using Hierarchies
(BIRCH) algorithm.
● FeatureAgglomeration(): Merge similar features using agglomerative
(hierarchical) clustering.
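A small sketch contrasting a centroid-based and a density-based algorithm from the list above, using a handful of made-up 2-D points: two tight groups plus one distant outlier. The eps and min_samples values are assumptions tuned to this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Two tight groups of three points each, plus one far-away outlier.
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
    [50.0, 50.0],
])

# K-Means needs the number of clusters up front; run it on the two groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
km_labels = kmeans.fit_predict(points[:6])

# DBSCAN infers the number of clusters from density and labels
# points that belong to no dense region as noise (-1).
db_labels = DBSCAN(eps=1.0, min_samples=2).fit_predict(points)
```

K-Means assigns every point to some cluster, so an outlier would be absorbed into one of them, whereas DBSCAN can leave it unassigned.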