
Conversation

AlexanderFabisch
Member

Here's my contribution from the EuroSciPy 2023 sprint. It's still a work in progress, and I won't have time to continue the work before October. So if anyone else wants to take it from here, feel free to do so.

Reference Issues/PRs

See also #26024

What does this implement/fix? Explain your changes.

Make StandardScaler compatible with the Array API.

Any other comments?

Unfortunately, the current implementation breaks some unit tests of StandardScaler that are related to dtypes. That's because I wanted to make it work for torch.float16, but maybe that is not necessary and we should just support float32 and float64.

I'll also add some comments to the diff. See below.

@github-actions

github-actions bot commented Aug 19, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 784e189. Link to the linter CI: here

@AlexanderFabisch AlexanderFabisch changed the title [WIP] Make standard scaler compatible to Array API WIP: Make standard scaler compatible to Array API Aug 19, 2023
@AlexanderFabisch AlexanderFabisch marked this pull request as draft August 19, 2023 17:07
@EdAbati
Contributor

EdAbati commented Aug 20, 2023

Hi @AlexanderFabisch, I'm happy to continue this if it cannot wait until October. Waiting to see what the maintainers think. :)

Here are a few things I learned while working on my PR that might be helpful if you decide to keep working on it:

  • update your branch with main to get some useful functions like _array_api.supported_float_dtypes
  • testing the Array API compliance could be done by using a function that looks like this
  • in other places, a scalar array is created using xp.asarray(0.0, device=device(...))
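To make the last bullet concrete, here is a minimal sketch of the pattern, using NumPy as a stand-in for the array namespace (in scikit-learn, `xp` would come from `get_namespace(X)`, and GPU namespaces would additionally take the `device=` keyword so the scalar lands next to the data):

```python
import numpy as np

xp = np  # stand-in for the namespace scikit-learn gets from get_namespace(X)

# Create a scalar constant as an array in the consumer's namespace rather
# than as a Python float; for GPU namespaces you would also pass device=.
zero = xp.asarray(0.0)
print(float(zero), zero.dtype)
```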

@AlexanderFabisch
Member Author

I'm happy to continue this if it cannot wait until October.

Sure, I could also give you write access to my fork if needed. That way we could collaborate better.

@EdAbati
Contributor

EdAbati commented Sep 16, 2023

Hi @AlexanderFabisch , thank you for sharing the fork :)
I continued a bit, and tried to resolve some comments based on what I saw in the other PRs.

There are still a couple of TODOs:

Another thing to bear in mind is that device='mps' does not support float64. #27232 introduces something we could use.
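A hedged sketch of what such a helper might look like (the name and behavior here are my assumption for illustration, not necessarily what #27232 actually adds): probe whether the namespace/device combination supports float64 and fall back to float32 otherwise.

```python
import numpy as np

def max_precision_float_dtype(xp, device=None):
    # Hypothetical helper: widest float dtype usable on this namespace/device.
    kwargs = {} if device is None else {"device": device}
    try:
        xp.asarray(0.0, dtype=xp.float64, **kwargs)
        return xp.float64
    except Exception:
        # e.g. torch with device="mps", which has no float64 support
        return xp.float32

print(max_precision_float_dtype(np) == np.float64)  # True for NumPy
```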

@AlexanderFabisch
Member Author

That looks a lot better, @EdAbati. Thanks for continuing this PR.


try:
    return op(x, *args, **kwargs, dtype=target_dtype)
except TypeError:
Member

A reason to inspect the signature is that it is more explicit/specific. It covers us for the case where a TypeError is raised for a reason other than "this function doesn't have a dtype kwarg". I don't know off the top of my head what all the reasons are that a TypeError could be raised, so maybe this isn't an issue. But maybe the fact that I don't know off the top of my head is a reason to be more specific?

A reason to not use inspection would be that it is slow (compared to the time spent on op).
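To make the trade-off concrete, here is an illustrative sketch of both options (the helper names and signatures are mine, not the PR's actual code):

```python
import inspect

import numpy as np

def call_with_dtype_lbyl(op, x, *args, target_dtype=None, **kwargs):
    # Explicit: only pass dtype if it appears in op's signature.
    if "dtype" in inspect.signature(op).parameters:
        return op(x, *args, dtype=target_dtype, **kwargs)
    return op(x, *args, **kwargs)

def call_with_dtype_eafp(op, x, *args, target_dtype=None, **kwargs):
    # As in the diff: cheaper than inspection, but swallows any TypeError,
    # including ones unrelated to a missing dtype keyword.
    try:
        return op(x, *args, dtype=target_dtype, **kwargs)
    except TypeError:
        return op(x, *args, **kwargs)

# np.sum accepts dtype; np.sort does not and triggers the fallback.
print(call_with_dtype_lbyl(np.sum, np.ones(3), target_dtype=np.float32).dtype)
print(call_with_dtype_eafp(np.sort, np.asarray([2.0, 1.0])))
```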

@betatim
Member

betatim commented Jul 16, 2025

@OmarManzoor the script in your comment uses PolynomialFeatures instead of StandardScaler. Is that on purpose?

@OmarManzoor
Contributor

@OmarManzoor the script in your comment uses PolynomialFeatures instead of StandardScaler. Is that on purpose?

Really sorry about that. I used the prior script and forgot to replace StandardScaler.

@betatim
Member

betatim commented Jul 16, 2025

No worries. I was just wondering if I was missing something :D

When I make the swap, I see numbers roughly similar to yours. A ten-times slowdown for fitting seems weird, no?

edit: these are the numbers I see if I use StandardScaler
Avg fit time for numpy: 0.036
Avg transform time for numpy: 0.011
Avg fit time for torch mps: 0.028, speed-up: 1.3x
Avg transform time for torch mps: 0.003 speed-up: 3.6x

So I think the results above came from the code you pasted. I see a roughly 3x speed up if I add a 0th run of the StandardScaler on MPS before the for loop and exclude its number from the average calculation

Details
from time import time

import numpy as np
import torch as xp

from sklearn._config import config_context
from sklearn.preprocessing import StandardScaler

X_np = np.random.rand(100000, 100).astype(np.float32)
X_xp = xp.asarray(X_np, device="mps")

# Numpy benchmarks
fit_times = []
transform_times = []
n_iter = 10
for _ in range(n_iter):
    start = time()
    pf_np = StandardScaler()
    pf_np.fit(X_np)
    fit_times.append(time() - start)

    start = time()
    pf_np.transform(X_np)
    transform_times.append(time() - start)

avg_fit_time_numpy = sum(fit_times) / n_iter
avg_transform_time_numpy = sum(transform_times) / n_iter
print(f"Avg fit time for numpy: {avg_fit_time_numpy:.3f}")
print(f"Avg transform time for numpy: {avg_transform_time_numpy:.3f}")


# Torch mps benchmarks
with config_context(array_api_dispatch=True):
    pf_xp = StandardScaler()
    pf_xp.fit(X_xp)

fit_times = []
transform_times = []
for _ in range(n_iter):
    with config_context(array_api_dispatch=True):
        start = time()
        pf_xp = StandardScaler()
        pf_xp.fit(X_xp)
        fit_times.append(time() - start)

        start = time()
        float(pf_xp.transform(X_xp)[0, 0])
        transform_times.append(time() - start)

avg_fit_time_mps = sum(fit_times) / n_iter
avg_transform_time_mps = sum(transform_times) / n_iter
print(
    f"Avg fit time for torch mps: {avg_fit_time_mps:.3f}, "
    f"speed-up: {avg_fit_time_numpy / avg_fit_time_mps:.1f}x"
)
print(
    f"Avg transform time for torch mps: {avg_transform_time_mps:.3f} "
    f"speed-up: {avg_transform_time_numpy / avg_transform_time_mps:.1f}x"
)

@OmarManzoor
Contributor

OmarManzoor commented Jul 16, 2025

So I think the results above came from the code you pasted. I see a roughly 3x speed up if I add a 0th run of the StandardScaler on MPS before the for loop and exclude its number from the average calculation

Here are the results from my run. I increased the dataset size to (1000000, 200). I just used the original code and replaced StandardScaler.

Avg fit time for numpy: 0.893
Avg transform time for numpy: 0.201

Avg fit time for torch mps: 0.245, speed-up: 3.6x
Avg transform time for torch mps: 0.094 speed-up: 2.1x
from time import time

import numpy as np
import torch as xp
from tqdm import tqdm

from sklearn._config import config_context
from sklearn.preprocessing import StandardScaler

X_np = np.random.rand(1000000, 200).astype(np.float32)
X_xp = xp.asarray(X_np, device="mps")

# Numpy benchmarks
fit_times = []
transform_times = []
n_iter = 10
for _ in tqdm(range(n_iter), desc="Numpy Flow"):
    start = time()
    pf_np = StandardScaler()
    pf_np.fit(X_np)
    fit_times.append(time() - start)

    start = time()
    pf_np.transform(X_np)
    transform_times.append(time() - start)

avg_fit_time_numpy = sum(fit_times) / n_iter
avg_transform_time_numpy = sum(transform_times) / n_iter
print(f"Avg fit time for numpy: {avg_fit_time_numpy:.3f}")
print(f"Avg transform time for numpy: {avg_transform_time_numpy:.3f}")


# Torch mps benchmarks
fit_times = []
transform_times = []
for _ in tqdm(range(n_iter), desc="Torch mps Flow"):
    with config_context(array_api_dispatch=True):
        start = time()
        pf_xp = StandardScaler()
        pf_xp.fit(X_xp)
        fit_times.append(time() - start)

        start = time()
        float(pf_xp.transform(X_xp)[0, 0])
        transform_times.append(time() - start)

avg_fit_time_mps = sum(fit_times) / n_iter
avg_transform_time_mps = sum(transform_times) / n_iter
print(
    f"Avg fit time for torch mps: {avg_fit_time_mps:.3f}, "
    f"speed-up: {avg_fit_time_numpy / avg_fit_time_mps:.1f}x"
)
print(
    f"Avg transform time for torch mps: {avg_transform_time_mps:.3f} "
    f"speed-up: {avg_transform_time_numpy / avg_transform_time_mps:.1f}x"
)


codecov bot commented Jul 18, 2025

❌ Unsupported file format

Upload processing failed due to unsupported file format. Please review the parser error message:
Error deserializing json

Caused by:
expected value at line 1 column 1

For more help, visit our troubleshooting guide.

Member

@lesteve lesteve left a comment


Some cosmetic comments that were in draft for a while. I'll try to come back to this for a closer look in the not-too-distant future.

Member

@betatim betatim left a comment


I think this looks good.

One thing to address in a new PR is the whole story around "everything follows X" -

@@ -1106,9 +1113,9 @@ def transform(self, X, copy=None):
                 inplace_column_scale(X, 1 / self.scale_)
         else:
             if self.with_mean:
-                X -= self.mean_
+                X -= xp.astype(self.mean_, X.dtype)
Member

For my education: why do we (now) need this additional astype? Is it because the type of X in transform can be different from what is used in fit? Why did we not need it before?

Contributor

@OmarManzoor OmarManzoor Aug 19, 2025

The inner computation tries to use the maximum float precision available and sets the computed values and attributes accordingly. Since StandardScaler preserves the input dtype, we need the cast here: from what I remember, self.mean_ can be stored at the maximum float precision (float64) while X is float32.
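A minimal NumPy illustration of the situation (values made up): the fitted statistic is float64 while X is float32, and the explicit cast keeps the in-place subtraction dtype-clean.

```python
import numpy as np

X = np.asarray([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
mean_ = np.asarray([2.0, 3.0], dtype=np.float64)  # computed at max precision

# NumPy happens to accept the mixed-dtype in-place subtraction (same-kind
# casting), but array-api-strict rejects it, hence the explicit cast:
X -= mean_.astype(X.dtype)
print(X.dtype)  # float32
```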

Member

Commenting out the xp.astype and running the tests, I think only array-api-strict is picky about this. For the other namespaces, X -= self.mean_ works fine if X has dtype float32 and self.mean_ has dtype float64, which is why it was not needed before with NumPy.

Member

Thanks for figuring it out. array-api-strict is strict :-/

@@ -1050,7 +1056,7 @@ def partial_fit(self, X, y=None, sample_weight=None):
         # for backward-compatibility, reduce n_samples_seen_ to an integer
         # if the number of samples is the same for each feature (i.e. no
         # missing values)
-        if np.ptp(self.n_samples_seen_) == 0:
+        if xp.max(self.n_samples_seen_) == xp.min(self.n_samples_seen_):
Member

@lesteve lesteve Aug 27, 2025

FYI, I double-checked and np.ptp does nothing smart (like computing both the min and max in a single pass) so this is fine to replace it by max == min.
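For reference, the identity being relied on: np.ptp(a) is defined as max(a) - min(a), so ptp(a) == 0 holds exactly when max(a) == min(a).

```python
import numpy as np

n_samples_seen = np.asarray([10, 10, 10])
print(np.ptp(n_samples_seen) == 0)                       # True
print(np.max(n_samples_seen) == np.min(n_samples_seen))  # True
```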

@lesteve
Member

lesteve commented Aug 27, 2025

LGTM, thanks to everyone involved in this PR over the last 2 years: @AlexanderFabisch, @EdAbati, @ogrisel, @charlesjhill, @OmarManzoor and @betatim!

@lesteve lesteve merged commit 48cba5a into scikit-learn:main Aug 27, 2025
34 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Array API Aug 27, 2025
9 participants