This document is a comprehensive cheatsheet for essential Python methods used in machine learning, covering data preprocessing, feature selection, dimensionality reduction, model training and evaluation, model selection and hyperparameter tuning, evaluation metrics, model interpretability, persistence, multiclass and multilabel classification, and clustering. Each section lists various functions and classes along with their primary purposes. The cheatsheet serves as a quick reference for practitioners in the field of machine learning.
By: Waleed Mousa
Data Preprocessing:
● train_test_split(): Split data into training and testing sets.
● StandardScaler(): Standardize features by removing the mean and scaling to unit variance.
● MinMaxScaler(): Scale features to a specified range (default: [0, 1]).
● MaxAbsScaler(): Scale features by their maximum absolute value.
● RobustScaler(): Scale features using statistics that are robust to outliers.
● Normalizer(): Normalize samples individually to unit norm.
● Binarizer(): Binarize data (set feature values to 0 or 1) according to a threshold.
● PolynomialFeatures(): Generate polynomial and interaction features.
● FunctionTransformer(): Construct a transformer from an arbitrary callable.
● KBinsDiscretizer(): Bin continuous data into intervals.
● LabelEncoder(): Encode target labels with integer values between 0 and n_classes-1.
● OneHotEncoder(): Encode categorical features as a one-hot numeric array.
● OrdinalEncoder(): Encode categorical features as an integer array.
● LabelBinarizer(): Binarize labels in a one-vs-all fashion.
● MultiLabelBinarizer(): Transform between an iterable of iterables and a multilabel format.
● SimpleImputer(): Impute missing values using a specified strategy (e.g., mean, median, most_frequent).
● IterativeImputer(): Impute missing values by modeling each feature with missing values as a function of the other features (experimental; requires importing enable_iterative_imputer from sklearn.experimental).
● KNNImputer(): Impute missing values using k-Nearest Neighbors.
● MissingIndicator(): Transform a dataset into a binary matrix indicating the presence of missing values.
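A minimal sketch of a typical flow with these tools, assuming a toy numeric array with one missing value (the data and parameter choices are illustrative, not from the source):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

# Toy matrix with a missing entry (illustrative assumption)
X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 150.0], [4.0, 300.0]])
y = np.array([0, 1, 0, 1])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Impute missing values with the column mean, then standardize
imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()
X_train_clean = scaler.fit_transform(imputer.fit_transform(X_train))
X_test_clean = scaler.transform(imputer.transform(X_test))  # reuse training statistics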
Feature Selection:
● SelectKBest(): Select features according to the k highest scores.
● SelectPercentile(): Select features according to a percentile of the highest scores.
● SelectFpr(): Select features based on a false positive rate test.
● SelectFdr(): Select features based on an estimated false discovery rate.
● SelectFromModel(): Select features based on importance weights.
● SequentialFeatureSelector(): Select features sequentially based on a specified criterion.
● RFE(): Feature ranking with recursive feature elimination.
● RFECV(): Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.
● VarianceThreshold(): Feature selector that removes low-variance features.
● GenericUnivariateSelect(): Univariate feature selector with configurable strategy.
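As a quick illustration, SelectKBest() can score each feature against the target and keep the top k; the dataset and k below are arbitrary choices for demonstration:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-scores
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print(selector.get_support())  # boolean mask of the retained features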
Dimensionality Reduction:
● PCA(): Perform principal component analysis (PCA) for dimensionality reduction.
● IncrementalPCA(): Perform incremental PCA on a large dataset.
● KernelPCA(): Perform kernel PCA for non-linear dimensionality reduction.
● SparsePCA(): Perform PCA with sparsity constraints.
● TruncatedSVD(): Perform dimensionality reduction using truncated SVD (aka LSA).
● FastICA(): Perform Independent Component Analysis (ICA) for blind source separation.
● NMF(): Perform non-negative matrix factorization (NMF) for dimensionality reduction.
● MiniBatchNMF(): Perform mini-batch non-negative matrix factorization.
● LatentDirichletAllocation(): Perform Latent Dirichlet Allocation (LDA) for topic modeling.
● TSNE(): Perform t-distributed Stochastic Neighbor Embedding for dimensionality reduction.
● Isomap(): Perform Isomap embedding for non-linear dimensionality reduction.
● LocallyLinearEmbedding(): Perform Locally Linear Embedding for non-linear dimensionality reduction.
● MDS(): Perform Multidimensional Scaling (MDS) for dimensionality reduction.
● SpectralEmbedding(): Perform spectral embedding for non-linear dimensionality reduction.
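A short sketch of PCA() in practice, assuming the built-in digits dataset as sample input:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# Project 64-dimensional digit images onto their first 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # variance captured by each component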
Model Training and Evaluation:
● fit(): Train a model on the given training data.
● predict(): Make predictions using a trained model.
● score(): Return the estimator's default score (e.g., mean accuracy for classifiers) on the given test data and labels.
● cross_val_score(): Perform cross-validation and return the score for each fold.
● cross_val_predict(): Generate cross-validated estimates for each input data point.
● cross_validate(): Evaluate a model using cross-validation, optionally with multiple metrics.
● learning_curve(): Compute the learning curve to assess model performance.
● validation_curve(): Compute the validation curve to assess model performance.
● permutation_test_score(): Perform a permutation test for model evaluation.
● check_cv(): Determine the cross-validation splitting strategy.
● train_test_split(): Split data into training and testing sets.
● KFold(): K-Folds cross-validator.
● StratifiedKFold(): Stratified K-Folds cross-validator.
● LeaveOneOut(): Leave-One-Out cross-validator.
● LeavePOut(): Leave-P-Out cross-validator.
● ShuffleSplit(): Random permutation cross-validator.
● StratifiedShuffleSplit(): Stratified ShuffleSplit cross-validator.
● TimeSeriesSplit(): Time Series cross-validator.
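A sketch combining a few of these: 5-fold stratified cross-validation of a classifier (the estimator and fold count are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold stratified cross-validation; scores holds one accuracy per fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores.mean(), scores.std())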
Model Selection and Hyperparameter Tuning:
● GridSearchCV(): Perform grid search over specified parameter values for an estimator.
● RandomizedSearchCV(): Perform randomized search over specified parameter distributions for an estimator.
● HalvingGridSearchCV(): Perform successive halving with grid search (experimental; requires importing enable_halving_search_cv from sklearn.experimental).
● HalvingRandomSearchCV(): Perform successive halving with randomized search (experimental; see above).
● BayesSearchCV(): Perform Bayesian optimization for hyperparameter tuning (from the scikit-optimize package, not scikit-learn).
● validation_curve(): Compute the validation curve to assess model performance.
● learning_curve(): Compute the learning curve to assess model performance.
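A minimal sketch of GridSearchCV(), assuming an SVC estimator and a small illustrative parameter grid:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Exhaustively try each parameter combination with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)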
Model Evaluation Metrics:
● accuracy_score(): Compute the accuracy score.
● balanced_accuracy_score(): Compute the balanced accuracy score.
● average_precision_score(): Compute the average precision score.
● brier_score_loss(): Compute the Brier score loss.
● classification_report(): Build a text report showing the main classification metrics.
● cohen_kappa_score(): Compute Cohen's kappa score.
● confusion_matrix(): Compute the confusion matrix to evaluate the accuracy of a classification.
● dcg_score(): Compute the Discounted Cumulative Gain (DCG) score.
● det_curve(): Compute the Detection Error Tradeoff (DET) curve.
● f1_score(): Compute the F1 score, the harmonic mean of precision and recall.
● fbeta_score(): Compute the F-beta score, the weighted harmonic mean of precision and recall.
● hamming_loss(): Compute the Hamming loss.
● hinge_loss(): Compute the hinge loss for binary classification.
● jaccard_score(): Compute the Jaccard similarity coefficient score.
● log_loss(): Compute the logarithmic loss.
● matthews_corrcoef(): Compute the Matthews correlation coefficient (MCC).
● multilabel_confusion_matrix(): Compute a confusion matrix for each class or sample.
● ndcg_score(): Compute the Normalized Discounted Cumulative Gain (NDCG) score.
● precision_recall_curve(): Compute precision-recall pairs for different probability thresholds.
● precision_recall_fscore_support(): Compute precision, recall, F-measure, and support for each class.
● precision_score(): Compute the precision score.
● recall_score(): Compute the recall score.
● roc_auc_score(): Compute the Area Under the Receiver Operating Characteristic Curve (ROC AUC) score.
● roc_curve(): Compute the Receiver Operating Characteristic (ROC) curve.
● top_k_accuracy_score(): Compute the Top-k accuracy score.
● zero_one_loss(): Compute the Zero-One classification loss.
● explained_variance_score(): Compute the explained variance score.
● max_error(): Compute the maximum residual error.
● mean_absolute_error(): Compute the mean absolute error.
● mean_squared_error(): Compute the mean squared error.
● mean_squared_log_error(): Compute the mean squared logarithmic error.
● median_absolute_error(): Compute the median absolute error.
● r2_score(): Compute the coefficient of determination (R^2) score.
● mean_poisson_deviance(): Compute the mean Poisson deviance.
● mean_gamma_deviance(): Compute the mean Gamma deviance.
● mean_tweedie_deviance(): Compute the mean Tweedie deviance.
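A small example with hypothetical labels showing how a few of the classification metrics are called:

from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical true and predicted labels for a binary classifier
y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

print(accuracy_score(y_true, y_pred))    # fraction of correct predictions
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted class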
Model Interpretability:
● permutation_importance(): Compute feature importances using permutation importance.
● partial_dependence(): Compute partial dependence of features for a fitted model.
● plot_partial_dependence(): Plot partial dependence (deprecated in recent scikit-learn versions in favor of PartialDependenceDisplay.from_estimator()).
● plot_tree(): Plot a decision tree.
● export_graphviz(): Export a decision tree in DOT format.
● export_text(): Export a decision tree in text format.
● sklearn.inspection: Module that groups scikit-learn's model inspection utilities (permutation_importance, partial_dependence, etc.).
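A brief sketch of permutation_importance(), assuming a random forest fitted on the iris dataset for illustration:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the resulting drop in score
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)  # mean importance per feature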
Model Persistence:
● pickle.dump(): Save a trained model to a file using pickle.
● pickle.load(): Load a trained model from a file using pickle.
● joblib.dump(): Save a trained model to a file using joblib.
● joblib.load(): Load a trained model from a file using joblib.
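A minimal persistence round trip with joblib (the filename is an arbitrary choice):

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Persist the fitted model to disk, then reload and reuse it
joblib.dump(model, "model.joblib")
restored = joblib.load("model.joblib")
print(restored.score(X, y))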
Multiclass and Multilabel Classification:
● OneVsRestClassifier(): One-vs-the-rest (OvR) multiclass strategy.
● OneVsOneClassifier(): One-vs-one (OvO) multiclass strategy.
● OutputCodeClassifier(): (Error-Correcting) Output-Code multiclass strategy.
● ClassifierChain(): A multi-label model that arranges binary classifiers into a chain.
● MultiOutputClassifier(): Multi-target classification.
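A short sketch of the one-vs-rest strategy, assuming LinearSVC as the illustrative base estimator:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Fit one binary LinearSVC per class; prediction picks the most confident one
ovr = OneVsRestClassifier(LinearSVC())
ovr.fit(X, y)
print(ovr.predict(X[:5]))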
Clustering:
● KMeans(): K-Means clustering algorithm.
● MiniBatchKMeans(): Mini-Batch K-Means clustering algorithm.
● AffinityPropagation(): Affinity Propagation clustering algorithm.
● MeanShift(): Mean Shift clustering algorithm.
● SpectralClustering(): Spectral clustering algorithm.
● AgglomerativeClustering(): Agglomerative Hierarchical Clustering algorithm.
● DBSCAN(): Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm.
● OPTICS(): Ordering Points To Identify the Clustering Structure (OPTICS) algorithm.
● Birch(): Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm.
● FeatureAgglomeration(): Agglomerate features.
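A minimal KMeans() sketch on synthetic blob data (all parameters below are illustrative assumptions):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with 3 well-separated blobs
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)       # cluster assignment for each sample
print(kmeans.cluster_centers_)       # coordinates of the 3 learned centroids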