Comprehensive Guide To Multiclass Classification With Sklearn
Bex T.
Learn how to tackle any multiclass classification problem with Sklearn. The tutorial covers how to choose a model selection strategy, several multiclass evaluation metrics, and how to use them, finishing off with hyperparameter tuning to optimize for user-defined metrics.
Introduction
Even though multiclass classification is not as common as binary classification, it certainly poses a much bigger challenge.
You can literally take my word for it because this article has been the most challenging post I have ever written (I have written close to 70).
I found that the topic of multiclass classification is deep and full of nuances. I have read so many articles, read multiple StackOverflow
threads, created a few of my own, and spent several hours exploring the Sklearn user guide and doing experiments. The core topics of
multiclass classification such as
choosing a classification strategy that suits your problem,
filtering out a single metric that solves your business problem and customizing it
and finally putting all the theory into practice with Sklearn
have all been scattered in the dark, sordid corners of the Internet. This was enough to conclude that no single resource shows an end-to-
end workflow of dealing with multiclass classification problems on the Internet (maybe, I missed it).
For this reason, this article will be a comprehensive tutorial on how to solve any multiclass supervised classification problem using
Sklearn. You will learn both the theory and the implementation of the above core concepts. It is going to be a long and technical read, so
get a coffee!
The first and the biggest group of estimators are the ones that support multi-class classification natively:
naive_bayes.BernoulliNB
tree.DecisionTreeClassifier
tree.ExtraTreeClassifier
ensemble.ExtraTreesClassifier
naive_bayes.GaussianNB
neighbors.KNeighborsClassifier
For an N-class problem, they produce an N-by-N confusion matrix, and most of the evaluation metrics are derived from it:
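As a minimal sketch of the idea, here is one of the natively multiclass estimators fitted on a small 3-class dataset (the wine dataset and the train/test split are purely illustrative), with the confusion matrix derived from its predictions:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# The wine dataset has 3 classes, so this is a multiclass problem out of the box
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# DecisionTreeClassifier handles multiclass targets natively
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# For an N-class problem, the confusion matrix is N by N (here, 3 by 3)
print(confusion_matrix(y_test, clf.predict(X_test)))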
Not all estimators support multiclass classification natively. For those that don't, Sklearn offers two main strategies: One-vs-One (OVO) and One-vs-Rest (OVR).
OVO splits a multi-class problem into a single binary classification task for each pair of classes. In other words, for each pair, a single binary classifier is built. For example, a target with 4 classes (brain, lung, breast, and kidney cancer) uses 6 individual classifiers to binarize the problem:
Classifier 1: brain vs. lung
Classifier 2: brain vs. breast
Classifier 3: brain vs. kidney
Classifier 4: lung vs. breast
Classifier 5: lung vs. kidney
Classifier 6: breast vs. kidney
The following Sklearn estimators use the OVO strategy under the hood:
svm.NuSVC
svm.SVC
Sklearn also provides a wrapper estimator for the above models under sklearn.multiclass.OneVsOneClassifier :
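A minimal sketch of the wrapper in use (the wrapped estimator and dataset are illustrative):

from sklearn.datasets import load_wine
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# One binary SVC is trained for every pair of classes
ovo = OneVsOneClassifier(SVC()).fit(X, y)
print(len(ovo.estimators_))  # 3 classes -> 3 pairwise classifiers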
A major downside of this strategy is its computational workload. As each pair of classes requires a separate binary classifier, targets with high cardinality may take too long to train. To compute the number of classifiers that will be built for an N-class problem, the following formula is used:
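number_of_classifiers = N * (N - 1) / 2

For the 4-class cancer example above, that is 4 * 3 / 2 = 6 classifiers; for a 100-class target, it would already be 4950.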
In practice, the One-vs-Rest (OVR) strategy is much preferred because of this disadvantage. OVR splits the multi-class problem into one binary classification task per class: that class is treated as positive and all the other classes as negative. For our 4-class cancer example, that gives:
Classifier 1: lung vs. [breast, kidney, brain] — (lung cancer, not lung cancer)
Classifier 2: breast vs. [lung, kidney, brain] — (breast cancer, not breast cancer)
Classifier 3: kidney vs. [lung, breast, brain] — (kidney cancer, not kidney cancer)
Classifier 4: brain vs. [lung, breast, kidney] — (brain cancer, not brain cancer)
Sklearn suggests that these classifiers work best with the OVR approach:
ensemble.GradientBoostingClassifier
linear_model.SGDClassifier
linear_model.Perceptron
Alternatively, you can use the above models with the OneVsRestClassifier wrapper:
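A minimal sketch of the wrapper (again, the wrapped estimator and dataset are illustrative):

from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_wine(return_X_y=True)

# One binary classifier per class: that class vs. all the others
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ovr.estimators_))  # 3 classes -> 3 classifiers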
Even though this strategy significantly lowers the computational cost, the fact that only one class is considered positive and the rest negative makes each binary problem an imbalanced classification task. This issue is even more pronounced for classes with low proportions in the target.
In both approaches, depending on the passed estimator, the results of all binary classifiers can be summarized in two ways:
majority vote: each binary classifier predicts one class, and the class that gets the most votes from all classifiers is chosen
argmax of class membership probability scores: classifiers such as LogisticRegression compute probability scores for each class ( .predict_proba() ). Then, the argmax of the sum of the scores is chosen.
We will talk more about how to score each of these strategies later in the tutorial.
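As a toy illustration of the second approach (the wrapper estimators handle this aggregation internally, and the scores below are made up):

import numpy as np

# Made-up probability scores for one sample: each one-vs-rest classifier
# reports how likely the sample is to belong to "its" class
per_class_scores = np.array([0.15, 0.62, 0.40, 0.08])

# The class whose classifier is most confident is the final prediction
print(np.argmax(per_class_scores))  # -> 1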
Throughout the rest of the tutorial, we will work with the Diamonds dataset:

import pandas as pd

diamonds = pd.read_csv("data/diamonds.csv").drop("Unnamed: 0", axis=1)
diamonds.head()

>>> diamonds.shape
(53940, 10)

>>> diamonds.describe().T.round(3)
The above output shows the features are on different scales, suggesting we use some type of normalization. This step is essential for many
linear-based models to perform well.
Let's also look at the distribution of the cut column:

>>> diamonds.cut.value_counts()

Ideal        21551
Premium      13791
Very Good    12082
Good          4906
Fair          1610
Name: cut, dtype: int64
The dataset contains a mixture of numeric and categorical features. I covered preprocessing steps for binary classification in my last article
in detail. You can easily apply the ideas to the multi-class case, so I will keep the explanations here nice and short.
The target is ‘cut’, which has 5 classes: Ideal, Premium, Very Good, Good, and Fair (descending quality). We will encode the textual
features with OneHotEncoder.
Let’s take a quick look at the distributions of each numeric feature to decide what type of normalization to use:
Price and carat show skewed distributions. We will use a logarithmic transformer to make them as normally distributed as possible. For
the rest, simple standardization is enough. If you are not familiar with numeric transformations, check out my article on the topic. Also,
the below code contains an example of Sklearn pipelines, and you can learn all about them from here.
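Here is a minimal sketch of what such a pipeline might look like, assuming a log transform for carat and price, standardization for the remaining numeric columns, one-hot encoding for the categorical columns, and a RandomForestClassifier step named 'base' (the step name referenced later during hyperparameter tuning). The column lists and parameters are illustrative:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Feature groups (illustrative)
log_cols = ["carat", "price"]
num_cols = ["depth", "table", "x", "y", "z"]
cat_cols = ["color", "clarity"]

preprocess = ColumnTransformer(
    transformers=[
        ("log", FunctionTransformer(np.log), log_cols),
        ("scale", StandardScaler(), num_cols),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ]
)

pipeline = Pipeline(
    steps=[
        ("preprocess", preprocess),
        ("base", RandomForestClassifier(random_state=42)),
    ]
)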
The first version of our pipeline uses RandomForestClassifier . Let's look at its confusion matrix by generating predictions:
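A sketch of generating the predictions and plotting the matrix (variable names carried over from the pipeline sketch above):

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split

X = diamonds.drop("cut", axis=1)
y = diamonds["cut"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# Build the matrix and plot it with the class names taken from the pipeline
cm = confusion_matrix(y_test, y_pred, labels=pipeline.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=pipeline.classes_)
disp.plot()
plt.show()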
In the snippet, we create the matrix with confusion_matrix and plot it with a dedicated Sklearn helper. ConfusionMatrixDisplay also has a display_labels argument, to which we pass the class names accessed through the pipeline.classes_ attribute.
Interpreting an N-by-N confusion matrix
If you read my other article on binary classification, you know that confusion matrices are the holy grail of supervised classification
problems. In a 2 by 2 matrix, the matrix terms are easy to interpret and locate.
Even though it gets more difficult to interpret the matrix as the number of classes increases, there are sure-fire ways to find your way
around any matrix of any shape.
The first step is always identifying your positive and negative classes. This depends on the problem you are trying to solve. As a jewelry
store owner, I may want my classifier to differentiate Ideal and Premium diamonds better than other types, making these types of
diamonds my positive class. Other classes will be considered negative.
Establishing positive and negative classes early on is very important in evaluating model performance and in hyperparameter tuning. After
doing this, you should define your true positives, true negatives, false positives, and false negatives. In our case:
True Positives: actual value is either Ideal or Premium, and the prediction is also either Ideal or Premium
True Negatives: actual value belongs to any of the 3 negative classes, and the prediction is also one of the 3 negative classes
False Positives: actual value belongs to any of the 3 negative classes but is predicted as either Ideal or Premium
False Negatives: actual value is either Ideal or Premium but is predicted as any of the 3 negative classes.
Always list out the terms of your matrix in this manner, and the rest of your workflow will be much easier, as you will see in the next
section.
The first metric we will discuss is the ROC AUC score, or area under the receiver operating characteristic curve. It is mostly used when we want to measure how well a classifier differentiates between the classes. This means that ROC AUC is better suited for balanced classification tasks.
In essence, the ROC AUC score is designed for binary classification and for models that can generate class membership probabilities, from which predictions are made at some threshold. Here is a brief overview of the steps to calculate ROC AUC for binary classification:
1. A binary classifier that can generate class membership probabilities, such as LogisticRegression with its predict_proba method, is chosen.
2. An initial decision threshold close to 0 is chosen. For example, if the probability is higher than 0.1, the class is predicted positive; otherwise, negative.
3. Class labels are predicted for all samples using this threshold.
4. True positive rate (TPR) and false positive rate (FPR) are found.
5. The resulting (FPR, TPR) pair gives a single point on the curve.
6. Steps 2-5 are repeated for various thresholds between 0 and 1 to create a set of TPRs and FPRs.
7. All TPRs are plotted against FPRs to generate the receiver operating characteristic curve. The ROC AUC score is the area under this curve.
For multiclass classification, you can calculate the ROC AUC for all classes using either OVO or OVR strategies. Since we agreed that OVR
is a better option, here is how ROC AUC is calculated for OVR classification:
1. Each binary classifier created using OVR finds the ROC AUC score for its own class using the above steps.
2. ROC AUC scores of all classifiers are then averaged using either of these 2 methods:
"macro": a simple, unweighted mean of the per-class ROC AUC scores, treating every class equally regardless of its size.
"weighted": this takes class imbalance into account by finding a weighted average. Each ROC AUC is multiplied by its class weight (the number of samples in that class) and summed, then divided by the total number of samples.
As an example, let’s say there are 100 samples in the target — class 1 (45), class 2 (30), class 3 (25). OVR creates 3 binary classifiers, 1 for
each class, and their ROC AUC scores are 0.75, 0.68, 0.84, respectively. The weighted ROC AUC score across all classes will be:
ROC AUC (weighted): ((45 * 0.75) + (30 * 0.68) + (25 * 0.84)) / 100 = 0.7515
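Roughly, the call looks like this (variable names follow the earlier sketches):

from sklearn.metrics import roc_auc_score

# Probability scores for every class are required, not hard predictions
y_proba = pipeline.predict_proba(X_test)

roc_auc_score(y_test, y_proba, multi_class="ovr", average="weighted")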
Above, we calculated ROC AUC for our diamond classification problem and got an excellent score. Don’t forget to set the multi_class and
average parameters properly when using roc_auc_score . If you want to generate the score for a particular class, here is how you do it:
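One way to do it is to binarize the true labels and score a single class's probability column, reusing y_proba from the previous snippet (the class chosen here is illustrative):

from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Binarize the true labels: one column per class, in pipeline.classes_ order
y_test_bin = label_binarize(y_test, classes=pipeline.classes_)

# ROC AUC for the "Ideal" class only
ideal_idx = list(pipeline.classes_).index("Ideal")
roc_auc_score(y_test_bin[:, ideal_idx], y_proba[:, ideal_idx])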
The ROC AUC score is only a good metric for seeing how well the classifier differentiates between classes. A higher ROC AUC score does not necessarily mean a better model. On top of that, we care more about our model's ability to classify Ideal and Premium diamonds, so a metric like ROC AUC is not a good option for our case.
In a multiclass case, these 3 metrics (precision, recall, and F1) are calculated on a per-class basis. For example, let's look again at the confusion matrix we plotted earlier.
Precision tells us what proportion of predicted positives is truly positive. If we want to calculate precision for Ideal diamonds, true positives would be the number of Ideal diamonds predicted correctly (the center of the matrix, 6626). False positives would be any cells that count the number of times our classifier predicted other types of diamonds as Ideal. These would be the cells above and below the center of the matrix (1013 + 521 + 31 + 8 = 1573). Using the formula of precision, we calculate it to be 6626 / (6626 + 1573) ≈ 0.81.
Recall is calculated similarly. We know the number of true positives: 6626. False negatives would be any cells that count the number of times the classifier predicted an actual Ideal diamond as belonging to any other, negative class. These would be the cells to the right and left of the center of the matrix (3 + 9 + 363 + 111 = 486). Using the formula of recall, we calculate it to be 6626 / (6626 + 486) ≈ 0.93.
So, how do we choose between recall and precision for the Ideal class? It depends on the type of problem you are trying to solve. If you
want to minimize the instances where other, cheaper types of diamonds are predicted as Ideal, you should optimize precision. As a jewelry
store owner, you might be sued for fraud for selling cheaper diamonds as expensive Ideal diamonds.
On the other hand, if you want to minimize the instances where you accidentally sell Ideal diamonds for a lower price, you should
optimize for recall of the Ideal class. Indeed, you won’t get sued, but you might lose money.
The third option is to have a model that is equally good at the above 2 scenarios. In other words, a model with high precision and recall.
Fortunately, there is a metric that measures just that: the F1 score. F1 score takes the harmonic mean of precision and recall and produces
a value between 0 and 1:
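F1 = 2 * (precision * recall) / (precision + recall)

For the Ideal class, plugging in the values computed above gives roughly 2 * (0.81 * 0.93) / (0.81 + 0.93) ≈ 0.87.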
Up to this point, we calculated the 3 metrics only for the Ideal class. But in multiclass classification, Sklearn computes them for all classes.
You can use classification_report to see this:
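Using the test predictions from earlier, the call is simply:

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))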
You can check that our calculations for the Ideal class were correct. The last column of the table, support, shows how many samples there are for each class. Also, the last 2 rows show averaged scores for the 3 metrics. We already covered what macro and weighted averages are in the example of ROC AUC.
For imbalanced classification tasks such as this one, you rarely choose averaged precision, recall, or F1 scores. Again, choosing one metric to optimize for a particular class depends on your business problem. For our case, we will choose to optimize the F1 score of the Ideal and Premium classes (yes, you can choose multiple classes simultaneously). First, let's see how to calculate the weighted F1 across all classes:
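A sketch of the call, reusing the earlier predictions:

from sklearn.metrics import f1_score

f1_score(y_test, y_pred, average="weighted")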
The above is consistent with the output of classification_report . To choose the F1 scores for Ideal and Premium classes, specify the
labels parameter:
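A sketch restricting the scoring to the two classes we care about (passing average=None returns one score per requested label):

from sklearn.metrics import f1_score

# Per-class F1 for just the Ideal and Premium classes
f1_score(y_test, y_pred, labels=["Ideal", "Premium"], average=None)

# Or a single weighted score across just these two classes
f1_score(y_test, y_pred, labels=["Ideal", "Premium"], average="weighted")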
Finally, let’s see how to optimize these metrics with hyperparameter tuning.
Up until now, we were using the RandomForestClassifier pipeline, so we will create a hyperparameter grid for this estimator:
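An illustrative grid (the values are placeholders; the 'base__' prefix targets the RandomForestClassifier step, as explained below):

# Illustrative hyperparameter values for the 'base' RandomForestClassifier step
param_grid = {
    "base__n_estimators": [100, 300, 500],
    "base__max_depth": [None, 10, 20],
    "base__min_samples_split": [2, 5, 10],
    "base__max_features": ["sqrt", "log2"],
}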
Don't forget to prepend each hyperparameter name with the step name you chose for your estimator in the pipeline. When we created our pipeline, we named the RandomForestClassifier step 'base'. See this discussion for more info.
We will use HalvingGridSearchCV (HGS), which is much faster than a regular GridSearchCV. You can read this article to see my experiments:
Before we feed the above grid to HGS, let's create a custom scoring function. In the binary case, we could pass string values as the names of the metrics we wanted to use, such as 'precision' or 'recall'. But in the multiclass case, those functions accept additional parameters, which we cannot provide if we pass the metric names as strings. To solve this, Sklearn provides the make_scorer function:
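A sketch of the scorer, reusing the average and labels values from the previous section (extra keyword arguments are forwarded to f1_score at scoring time):

from sklearn.metrics import f1_score, make_scorer

# Keyword arguments are passed through to f1_score whenever the scorer is called
custom_scorer = make_scorer(
    f1_score, average="weighted", labels=["Ideal", "Premium"]
)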
As we did in the last section, we passed custom values for the average and labels parameters.
Finally, let’s initialize the HGS and fit it to the full data with 3-fold cross-validation:
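A sketch of the search (HalvingGridSearchCV must be enabled via its experimental import; the remaining arguments follow the earlier sketches):

# HalvingGridSearchCV still sits behind an experimental flag
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV

hgs = HalvingGridSearchCV(
    pipeline, param_grid, scoring=custom_scorer, cv=3, n_jobs=-1
)
hgs.fit(X, y)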
After the search is done, you can get the best score and estimator with .best_score_ and .best_estimator_ attributes, respectively.
Your model is only as good as the metric you choose to evaluate it with. Hyperparameter tuning is time-consuming, but assuming you did everything right up to this point and provided a good enough parameter grid, everything will turn out as expected. If not, it is an iterative process: take your time tweaking the preprocessing steps, take a second look at your chosen metrics, and maybe widen your search grid. Thank you for reading!
Related Articles
Multi-Class Metrics Made Simple, Part I: Precision and Recall
Discussions
How to choose between ROC AUC and the F1 score?