
Speed up classification_report #26808

@fingoldo

Description


Describe the workflow you want to enable

I'm concerned about the slow execution speed of the classification_report procedure, which makes it barely suitable for production-grade workloads.
On an 8M-sample input it already takes ~15 seconds, whereas a simple proof-of-concept numba implementation takes 20 to 40 milliseconds. I understand that the sklearn code is well covered by tests, has wide functionality, and follows style guidelines and best practices, but that should not excuse a performance gap of this magnitude (~1000x) relative to a simple POC.

import numpy as np
from sklearn.metrics import classification_report

y_true = np.random.randint(0, 2, size=2 ** 23)
y_pred = y_true.copy()
np.random.shuffle(y_pred[2 ** 20 : 2 ** 21])

print(classification_report(y_true=y_true, y_pred=y_pred, digits=10, output_dict=False, zero_division=0))
              precision    recall  f1-score   support

           0  0.9373906697 0.9373906697 0.9373906697   4192570
           1  0.9374424159 0.9374424159 0.9374424159   4196038

    accuracy                            0.9374165535   8388608
   macro avg  0.9374165428 0.9374165428 0.9374165428   8388608
weighted avg  0.9374165535 0.9374165535 0.9374165535   8388608

time: 18.6 s (started: 2023-07-08 16:10:08 +03:00)

print(own_classification_report(y_true=y_true, y_pred=y_pred, zero_division=0))

(3930076, 3933544, 262494, 262494, 0.9374165534973145, array([4192570, 4196038], dtype=int64), array([0.93739067, 0.93744242]), array([0.93739067, 0.93744242]), array([0.93739067, 0.93744242]))
time: 16 ms (started: 2023-07-08 16:11:18 +03:00)

Describe your proposed solution

import numpy as np
from numba import njit


@njit
def own_classification_report(y_true: np.ndarray, y_pred: np.ndarray, nclasses: int = 2, zero_division: int = 0):
    # Count correct and wrong predictions per predicted class in a single pass.
    correct = np.zeros(nclasses, dtype=np.int64)
    wrong = np.zeros(nclasses, dtype=np.int64)
    for truth, pred in zip(y_true, y_pred):
        if pred == truth:
            correct[pred] += 1
        else:
            wrong[pred] += 1

    # Binary-only unpacking: class 0 -> negatives, class 1 -> positives.
    tn, tp = correct
    fn, fp = wrong
    accuracy = (tn + tp) / len(y_true)

    groups = np.array([(tn + fn), (tp + fp)])    # predicted count per class
    supports = np.array([(tn + fp), (tp + fn)])  # true count per class

    precisions = np.array([tn, tp]) / groups
    recalls = np.array([tn, tp]) / supports
    f1s = 2 * (precisions * recalls) / (precisions + recalls)

    # Replace NaNs arising from zero denominators with the zero_division value.
    for arr in (precisions, recalls, f1s):
        np.nan_to_num(arr, copy=False, nan=zero_division)

    return tn, tp, fn, fp, accuracy, supports, precisions, recalls, f1s
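For comparison, much of the same speedup may be achievable without the numba dependency. The sketch below (my own illustration, not sklearn's implementation and not generalizing all of classification_report's features) builds the confusion matrix with a single np.bincount pass over a joint label encoding and derives per-class precision/recall/F1 from it; it also handles more than two classes:

```python
import numpy as np


def bincount_classification_report(y_true, y_pred, nclasses=2, zero_division=0.0):
    """Illustrative pure-NumPy sketch: per-class precision/recall/F1
    from a confusion matrix built with one np.bincount pass."""
    # Joint encoding: cell (i, j) counts samples with truth i and prediction j.
    cm = np.bincount(y_true * nclasses + y_pred,
                     minlength=nclasses * nclasses).reshape(nclasses, nclasses)
    tp = np.diag(cm).astype(np.float64)
    supports = cm.sum(axis=1)   # true count per class
    predicted = cm.sum(axis=0)  # predicted count per class
    accuracy = tp.sum() / len(y_true)
    with np.errstate(divide="ignore", invalid="ignore"):
        precisions = tp / predicted
        recalls = tp / supports
        f1s = 2 * precisions * recalls / (precisions + recalls)
    # Replace NaNs from zero denominators with the zero_division value.
    for arr in (precisions, recalls, f1s):
        np.nan_to_num(arr, copy=False, nan=zero_division)
    return accuracy, supports, precisions, recalls, f1s
```

Both bincount and the numba loop are O(n) single passes over the data, so the remaining gap to classification_report would have to come from validation and formatting overhead rather than the counting itself.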

Describe alternatives you've considered, if relevant

Could one of the original classification_report authors take a look at why it is taking so long?
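In the meantime, a generic way to locate the hot spots (a sketch, not a confirmed diagnosis) is to profile the call with the standard-library cProfile and inspect the cumulative-time ranking:

```python
import cProfile
import io
import pstats

import numpy as np
from sklearn.metrics import classification_report

# Smaller array than the 8M repro so the profile run stays quick.
y_true = np.random.randint(0, 2, size=2 ** 20)
y_pred = y_true.copy()

prof = cProfile.Profile()
prof.enable()
classification_report(y_true=y_true, y_pred=y_pred, zero_division=0)
prof.disable()

# Print the ten functions with the largest cumulative time.
buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```

Whatever dominates that top-ten listing (validation, label discovery, or the metric computation itself) would tell us where an optimization should be targeted.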

Additional context

I have the latest versions of scikit-learn and numba as of this date, on an Intel CPU with AVX.
