
Speed up classification_report #26808

@fingoldo

Description


Describe the workflow you want to enable

I'm concerned about the slow execution speed of the classification_report procedure, which makes it barely suitable for production-grade workloads.
On an 8M-sample input it already takes ~15 seconds, whereas a simple proof-of-concept numba implementation takes 20 to 40 milliseconds. I understand that the sklearn code is well covered by tests, has wide functionality, and follows style guidelines and best practices, but that should not excuse a performance gap of this magnitude (~1000x) relative to a simple POC.

import numpy as np
from sklearn.metrics import classification_report

y_true = np.random.randint(0, 2, size=2 ** 23)
y_pred = y_true.copy()
np.random.shuffle(y_pred[2 ** 20 : 2 ** 21])

print(classification_report(y_true=y_true, y_pred=y_pred, digits=10, output_dict=False, zero_division=0))
              precision    recall  f1-score   support

           0  0.9373906697 0.9373906697 0.9373906697   4192570
           1  0.9374424159 0.9374424159 0.9374424159   4196038

    accuracy                            0.9374165535   8388608
   macro avg  0.9374165428 0.9374165428 0.9374165428   8388608
weighted avg  0.9374165535 0.9374165535 0.9374165535   8388608

time: 18.6 s (started: 2023-07-08 16:10:08 +03:00)

print(own_classification_report(y_true=y_true, y_pred=y_pred, zero_division=0))

(3930076, 3933544, 262494, 262494, 0.9374165534973145, array([4192570, 4196038], dtype=int64), array([0.93739067, 0.93744242]), array([0.93739067, 0.93744242]), array([0.93739067, 0.93744242]))
time: 16 ms (started: 2023-07-08 16:11:18 +03:00)

Describe your proposed solution

import numpy as np
from numba import njit


@njit
def own_classification_report(y_true: np.ndarray, y_pred: np.ndarray, nclasses: int = 2, zero_division: int = 0):
    # Count correct and wrong predictions per predicted class in a single pass.
    correct = np.zeros(nclasses, dtype=np.int64)
    wrong = np.zeros(nclasses, dtype=np.int64)
    for truth, pred in zip(y_true, y_pred):
        if pred == truth:
            correct[pred] += 1
        else:
            wrong[pred] += 1

    # Binary-only unpacking: class 0 -> negatives, class 1 -> positives.
    tn, tp = correct
    fn, fp = wrong
    accuracy = (tn + tp) / len(y_true)

    groups = np.array([(tn + fn), (tp + fp)])    # predicted count per class
    supports = np.array([(tn + fp), (tp + fn)])  # true count per class

    precisions = np.array([tn, tp]) / groups
    recalls = np.array([tn, tp]) / supports
    f1s = 2 * (precisions * recalls) / (precisions + recalls)

    # Replace NaNs arising from zero denominators with the zero_division value.
    for arr in (precisions, recalls, f1s):
        np.nan_to_num(arr, copy=False, nan=zero_division)

    return tn, tp, fn, fp, accuracy, supports, precisions, recalls, f1s
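For comparison, much of the same speedup may be achievable without the numba dependency. The sketch below (my own illustration, not sklearn's implementation and not generalizing all of classification_report's features) builds the confusion matrix with a single np.bincount pass over a joint label encoding and derives per-class precision/recall/F1 from it; it also handles more than two classes:

```python
import numpy as np


def bincount_classification_report(y_true, y_pred, nclasses=2, zero_division=0.0):
    """Illustrative pure-NumPy sketch: per-class precision/recall/F1
    from a confusion matrix built with one np.bincount pass."""
    # Joint encoding: cell (i, j) counts samples with truth i and prediction j.
    cm = np.bincount(y_true * nclasses + y_pred,
                     minlength=nclasses * nclasses).reshape(nclasses, nclasses)
    tp = np.diag(cm).astype(np.float64)
    supports = cm.sum(axis=1)   # true count per class
    predicted = cm.sum(axis=0)  # predicted count per class
    accuracy = tp.sum() / len(y_true)
    with np.errstate(divide="ignore", invalid="ignore"):
        precisions = tp / predicted
        recalls = tp / supports
        f1s = 2 * precisions * recalls / (precisions + recalls)
    # Replace NaNs from zero denominators with the zero_division value.
    for arr in (precisions, recalls, f1s):
        np.nan_to_num(arr, copy=False, nan=zero_division)
    return accuracy, supports, precisions, recalls, f1s
```

Both bincount and the numba loop are O(n) single passes over the data, so the remaining gap to classification_report would have to come from validation and formatting overhead rather than the counting itself.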

Describe alternatives you've considered, if relevant

Could one of the original classification_report authors take a look at why it is taking so long?
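In the meantime, a generic way to locate the hot spots (a sketch, not a confirmed diagnosis) is to profile the call with the standard-library cProfile and inspect the cumulative-time ranking:

```python
import cProfile
import io
import pstats

import numpy as np
from sklearn.metrics import classification_report

# Smaller array than the 8M repro so the profile run stays quick.
y_true = np.random.randint(0, 2, size=2 ** 20)
y_pred = y_true.copy()

prof = cProfile.Profile()
prof.enable()
classification_report(y_true=y_true, y_pred=y_pred, zero_division=0)
prof.disable()

# Print the ten functions with the largest cumulative time.
buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```

Whatever dominates that top-ten listing (validation, label discovery, or the metric computation itself) would tell us where an optimization should be targeted.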

Additional context

I have the latest versions of scikit-learn and numba as of this date, on an Intel CPU with AVX.
