# [ Python Essential Methods In Machine Learning ] [ cheatsheet ]

This document is a cheatsheet of essential Python (scikit-learn) methods
used in machine learning, covering data preprocessing, feature selection,
dimensionality reduction, model training and evaluation, model selection
and hyperparameter tuning, evaluation metrics, model interpretability,
model persistence, multiclass and multilabel classification, and
clustering. Each section lists functions and classes along with their
primary purposes, serving as a quick reference for practitioners.

Data Preprocessing:

● train_test_split(): Split data into training and testing sets.
● StandardScaler(): Standardize features by removing the mean and scaling
to unit variance.
● MinMaxScaler(): Scale features to a specified range (default: [0, 1]).
● MaxAbsScaler(): Scale features by their maximum absolute value.
● RobustScaler(): Scale features using statistics that are robust to
outliers.
● Normalizer(): Normalize samples individually to unit norm.
● Binarizer(): Binarize data (set feature values to 0 or 1) according to a
threshold.
● PolynomialFeatures(): Generate polynomial and interaction features.
● FunctionTransformer(): Construct a transformer from an arbitrary
callable.
● KBinsDiscretizer(): Bin continuous data into intervals.
● LabelEncoder(): Encode target labels with integer values between 0 and
n_classes-1.
● OneHotEncoder(): Encode categorical features as a one-hot numeric array.
● OrdinalEncoder(): Encode categorical features as an integer array.
● LabelBinarizer(): Binarize labels in a one-vs-all fashion.
● MultiLabelBinarizer(): Transform between iterable of iterables and a
multilabel format.
● SimpleImputer(): Impute missing values using specified strategy (e.g.,
mean, median, most_frequent).
● IterativeImputer(): Impute missing values by modeling each feature with
missing values as a function of other features.
● KNNImputer(): Impute missing values using k-Nearest Neighbors.
● MissingIndicator(): Transform a dataset into a binary matrix indicating
the presence of missing values.
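
As a quick illustration of how these utilities combine, here is a minimal
sketch (on made-up toy data) that splits the data, imputes missing values,
and standardizes features, fitting each transformer on the training split
only:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data with a few missing values.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0], [4.0, np.nan]])
y = np.array([0, 1, 0, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

imputer = SimpleImputer(strategy="mean")   # fill NaNs with column means
X_train = imputer.fit_transform(X_train)   # learn statistics on train only
X_test = imputer.transform(X_test)         # reuse them on the test split

scaler = StandardScaler()                  # zero mean, unit variance
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

Note that IterativeImputer is experimental and must be enabled first with
`from sklearn.experimental import enable_iterative_imputer`.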

Feature Selection:

● SelectKBest(): Select features according to the k highest scores.
● SelectPercentile(): Select features according to a percentile of the
highest scores.
● SelectFpr(): Select features based on a false positive rate test.
● SelectFdr(): Select features based on an estimated false discovery rate.
● SelectFromModel(): Select features based on importance weights.
● SequentialFeatureSelector(): Select features sequentially based on a
specified criterion.
● RFE(): Feature ranking with recursive feature elimination.
● RFECV(): Feature ranking with recursive feature elimination and
cross-validated selection of the best number of features.
● VarianceThreshold(): Feature selector that removes low-variance
features.
● GenericUnivariateSelect(): Univariate feature selector with configurable
strategy.
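
A minimal sketch of univariate selection, using the built-in iris dataset
and ANOVA F-scores (f_classif is one of several available score functions):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print(X_new.shape)                         # (150, 2)
print(selector.get_support(indices=True))  # indices of the kept features
```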

Dimensionality Reduction:

● PCA(): Perform principal component analysis (PCA) for dimensionality
reduction.
● IncrementalPCA(): Perform incremental PCA on a large dataset.
● KernelPCA(): Perform kernel PCA for non-linear dimensionality reduction.
● SparsePCA(): Perform PCA with sparsity constraints.
● TruncatedSVD(): Perform dimensionality reduction using truncated SVD
(aka LSA).
● FastICA(): Perform Independent Component Analysis (ICA) for blind source
separation.
● NMF(): Perform non-negative matrix factorization (NMF) for
dimensionality reduction.
● MiniBatchNMF(): Perform mini-batch non-negative matrix factorization.
● LatentDirichletAllocation(): Perform Latent Dirichlet Allocation (LDA)
for topic modeling.
● TSNE(): Perform t-distributed Stochastic Neighbor Embedding for
dimensionality reduction.
● Isomap(): Perform Isomap embedding for non-linear dimensionality
reduction.
● LocallyLinearEmbedding(): Perform Locally Linear Embedding for
non-linear dimensionality reduction.
● MDS(): Perform Multidimensional Scaling (MDS) for dimensionality
reduction.
● SpectralEmbedding(): Perform spectral embedding for non-linear
dimensionality reduction.
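
A minimal PCA sketch, projecting the 4-dimensional iris data onto its
first 2 principal components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # variance captured per component
```

The other reducers listed above follow the same fit/fit_transform pattern,
so swapping in, say, TruncatedSVD or TSNE mostly means changing the import
and the constructor arguments.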

Model Training and Evaluation:

● fit(): Train a model on the given training data.
● predict(): Make predictions using a trained model.
● score(): Return the default score on the given test data and labels
(mean accuracy for classifiers, R^2 for regressors).
● cross_val_score(): Perform cross-validation and return an array of
scores, one per fold.
● cross_val_predict(): Generate cross-validated estimates for each input
data point.
● cross_validate(): Evaluate a model using cross-validation.
● learning_curve(): Compute training and validation scores for increasing
training-set sizes.
● validation_curve(): Compute training and validation scores over a range
of values for one hyperparameter.
● permutation_test_score(): Perform a permutation test for model
evaluation.
● check_cv(): Determine the cross-validation splitting strategy.
● train_test_split(): Split data into training and testing sets.
● KFold(): K-Folds cross-validator.
● StratifiedKFold(): Stratified K-Folds cross-validator.
● LeaveOneOut(): Leave-One-Out cross-validator.
● LeavePOut(): Leave-P-Out cross-validator.
● ShuffleSplit(): Random permutation cross-validator.
● StratifiedShuffleSplit(): Stratified ShuffleSplit cross-validator.
● TimeSeriesSplit(): Time Series cross-validator.
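
A minimal sketch tying these together: fit a classifier on a training
split, score it on the held-out split, and cross-validate with stratified
5-fold splits:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)                    # train
print(clf.score(X_test, y_test))             # accuracy on held-out data

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X, y, cv=cv)   # one score per fold
print(scores.mean(), scores.std())
```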

Model Selection and Hyperparameter Tuning:

● GridSearchCV(): Perform grid search over specified parameter values for
an estimator.
● RandomizedSearchCV(): Perform randomized search over specified parameter
distributions for an estimator.
● HalvingGridSearchCV(): Perform successive halving with grid search.
● HalvingRandomSearchCV(): Perform successive halving with randomized
search.
● BayesSearchCV(): Perform Bayesian optimization for hyperparameter
tuning (from the separate scikit-optimize package, not scikit-learn).
● validation_curve(): Compute training and validation scores over a range
of values for one hyperparameter.

● learning_curve(): Compute training and validation scores for increasing
training-set sizes.
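
A minimal grid-search sketch, tuning an SVM's C and kernel with 5-fold
cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)              # fits one model per parameter combo per fold

print(search.best_params_)    # best parameter combination found
print(search.best_score_)     # its mean cross-validated accuracy
```

RandomizedSearchCV has the same interface but samples from parameter
distributions; the halving variants are experimental and must be enabled
with `from sklearn.experimental import enable_halving_search_cv` before
they can be imported.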

Model Evaluation Metrics:

● accuracy_score(): Compute the accuracy score.
● balanced_accuracy_score(): Compute the balanced accuracy score.
● average_precision_score(): Compute the average precision score.
● brier_score_loss(): Compute the Brier score loss.
● classification_report(): Build a text report showing the main
classification metrics.
● cohen_kappa_score(): Compute Cohen's kappa score.
● confusion_matrix(): Compute the confusion matrix to evaluate the
accuracy of a classification.
● dcg_score(): Compute the Discounted Cumulative Gain (DCG) score.
● det_curve(): Compute the Detection Error Tradeoff (DET) curve.
● f1_score(): Compute the F1 score, which is the harmonic mean of
precision and recall.
● fbeta_score(): Compute the F-beta score, which is the weighted harmonic
mean of precision and recall.
● hamming_loss(): Compute the Hamming loss.
● hinge_loss(): Compute the hinge loss for binary classification.
● jaccard_score(): Compute the Jaccard similarity coefficient score.
● log_loss(): Compute the logarithmic loss.
● matthews_corrcoef(): Compute the Matthews correlation coefficient (MCC).
● multilabel_confusion_matrix(): Compute a confusion matrix for each class
or sample.
● ndcg_score(): Compute the Normalized Discounted Cumulative Gain (NDCG)
score.
● precision_recall_curve(): Compute precision-recall pairs for different
probability thresholds.
● precision_recall_fscore_support(): Compute precision, recall, F-measure,
and support for each class.
● precision_score(): Compute the precision score.
● recall_score(): Compute the recall score.
● roc_auc_score(): Compute the Area Under the Receiver Operating
Characteristic Curve (ROC AUC) score.
● roc_curve(): Compute Receiver Operating Characteristic (ROC) curve.
● top_k_accuracy_score(): Compute the Top-k accuracy score.
● zero_one_loss(): Compute the Zero-One classification loss.
● explained_variance_score(): Compute the explained variance score.
● max_error(): Compute the maximum residual error.
● mean_absolute_error(): Compute the mean absolute error.
● mean_squared_error(): Compute the mean squared error.
● mean_squared_log_error(): Compute the mean squared logarithmic error.
● median_absolute_error(): Compute the median absolute error.
● r2_score(): Compute the coefficient of determination (R^2) score.
● mean_poisson_deviance(): Compute the mean Poisson deviance.
● mean_gamma_deviance(): Compute the mean Gamma deviance.
● mean_tweedie_deviance(): Compute the mean Tweedie deviance.
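
A minimal metrics sketch on made-up binary labels; note that ranking
metrics such as roc_auc_score need scores or probabilities, not hard
predictions:

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    roc_auc_score,
)

y_true = [0, 1, 1, 0, 1, 1, 0, 0]                  # made-up ground truth
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]                  # made-up predictions
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.7, 0.3, 0.6]  # made-up scores

print(accuracy_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(roc_auc_score(y_true, y_prob))   # uses scores, not hard labels
print(classification_report(y_true, y_pred))
```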

Model Interpretability:

● permutation_importance(): Compute feature importances using permutation
importance.
● partial_dependence(): Compute partial dependence plots.
● plot_partial_dependence(): Plot partial dependence (deprecated and
removed in recent scikit-learn; use PartialDependenceDisplay.from_estimator()).
● plot_tree(): Plot a decision tree.
● export_graphviz(): Export a decision tree in DOT format.
● export_text(): Export a decision tree in text format.
● sklearn.inspection: Module that groups these inspection utilities
(permutation_importance, partial_dependence, and the display classes).
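
A minimal interpretability sketch: permutation importance on a fitted
random forest (scored on the training data here for brevity; a held-out
set is preferable in practice):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42).fit(X, y)

# Shuffle each feature 10 times and measure the drop in score.
result = permutation_importance(model, X, y, n_repeats=10, random_state=42)
print(result.importances_mean)   # mean importance per feature
```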

Model Persistence:

● pickle.dump(): Save a trained model to a file using pickle.
● pickle.load(): Load a trained model from a file using pickle.
● joblib.dump(): Save a trained model to a file using joblib.
● joblib.load(): Load a trained model from a file using joblib.
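
A minimal persistence sketch with joblib (generally preferred over plain
pickle for models holding large NumPy arrays). Only load files from
sources you trust, since unpickling can execute arbitrary code:

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(model, "model.joblib")        # write the fitted model to disk
restored = joblib.load("model.joblib")    # read it back
print(restored.predict(X[:3]))
```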

Multiclass and Multilabel Classification:

● OneVsRestClassifier(): One-vs-the-rest (OvR) multiclass/multilabel
strategy.
● OneVsOneClassifier(): One-vs-one (OvO) multiclass strategy.
● OutputCodeClassifier(): (Error-Correcting) Output-Code multiclass
strategy.
● ClassifierChain(): A multi-label model that arranges binary classifiers
into a chain.
● MultiOutputClassifier(): Multi-target classification (fits one
classifier per target).
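
A minimal one-vs-rest sketch: wrap a binary classifier so it handles the
3-class iris problem by fitting one binary model per class:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

ovr = OneVsRestClassifier(LinearSVC())   # one binary LinearSVC per class
ovr.fit(X, y)

print(len(ovr.estimators_))              # 3 fitted binary classifiers
print(ovr.predict(X[:5]))
```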

Clustering:

● KMeans(): K-Means clustering algorithm.
● MiniBatchKMeans(): Mini-Batch K-Means clustering algorithm.
● AffinityPropagation(): Affinity Propagation clustering algorithm.
● MeanShift(): Mean Shift clustering algorithm.
● SpectralClustering(): Spectral clustering algorithm.
● AgglomerativeClustering(): Agglomerative Hierarchical Clustering
algorithm.
● DBSCAN(): Density-Based Spatial Clustering of Applications with Noise
(DBSCAN) algorithm.
● OPTICS(): Ordering Points To Identify the Clustering Structure (OPTICS)
algorithm.
● Birch(): Balanced Iterative Reducing and Clustering using Hierarchies
(BIRCH) algorithm.
● FeatureAgglomeration(): Agglomerate features.
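
A minimal clustering sketch: K-Means with 3 clusters on the iris features
(the labels are ignored, since clustering is unsupervised):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)

km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)

print(labels[:10])            # cluster assignment per sample
print(km.cluster_centers_)    # coordinates of the 3 centroids
```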

By: Waleed Mousa
