
whiten

scipy.cluster.vq.whiten(obs, check_finite=None)

Normalize a group of observations on a per feature basis.

Before running k-means, it is beneficial to rescale each feature dimension of the observation set by its standard deviation (i.e. “whiten” it, as in “white noise”, where each frequency has equal power). Each feature is divided by its standard deviation across all observations to give it unit variance.
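As a rough illustration of the rescaling described above, the sketch below reproduces it with plain NumPy. It assumes the population standard deviation (NumPy's default, `ddof=0`) and that no feature is constant; the names `std_per_feature` and `whitened` are illustrative only.

```python
import numpy as np

# Rows are observations, columns are features.
features = np.array([[1.9, 2.3, 1.7],
                     [1.5, 2.5, 2.2],
                     [0.8, 0.6, 1.7]])

# Standard deviation of each feature (column) across all observations.
# Assumption: population standard deviation (ddof=0).
std_per_feature = features.std(axis=0)

# Divide every column by its own standard deviation (broadcast over rows).
whitened = features / std_per_feature

# Each feature now has unit standard deviation, and therefore unit variance.
print(whitened.std(axis=0))
```

Only the scale of each feature changes; the features are not centered, so their means are generally non-zero after whitening.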

Parameters:

obs : ndarray

Each row of the array is an observation. The columns are the features seen during each observation:

#        f0  f1  f2
obs = [[ 1., 1., 1.],  #o0
       [ 2., 2., 2.],  #o1
       [ 3., 3., 3.],  #o2
       [ 4., 4., 4.]]  #o3

check_finite : bool, optional

Whether to check that the input matrices contain only finite numbers. Disabling may give a performance gain, but may result in problems (crashes, non-termination) if the inputs do contain infinities or NaNs. Default: True for eager backends and False for lazy ones.
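A minimal sketch of what the finiteness check guards against, assuming the NumPy backend, where a failed check is expected to surface as a `ValueError`. The array `obs_with_nan` and the flag `rejected` are hypothetical names used only for this illustration.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical input with one missing value (NaN) in the second feature.
obs_with_nan = np.array([[1.0, 2.0],
                         [2.0, np.nan],
                         [3.0, 6.0]])

try:
    # With the check enabled, non-finite input should be rejected up front
    # (assumed here to raise ValueError on the NumPy backend).
    whiten(obs_with_nan, check_finite=True)
    rejected = False
except ValueError as exc:
    rejected = True
    print(f"rejected: {exc}")
```

Cleaning or imputing non-finite values before the call is usually a safer choice than passing `check_finite=False` on data that has not been validated.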

Returns:

result : ndarray

Contains the values in obs scaled by the standard deviation of each column.

Notes

whiten has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

Library	CPU	GPU
NumPy	✅	n/a
CuPy	n/a	✅
PyTorch	✅	✅
JAX	✅	✅
Dask	✅	n/a

See Support for the array API standard for more information.

Examples

>>> import numpy as np
>>> from scipy.cluster.vq import whiten
>>> features = np.array([[1.9, 2.3, 1.7],
...                      [1.5, 2.5, 2.2],
...                      [0.8, 0.6, 1.7]])
>>> whiten(features)
array([[ 4.17944278,  2.69811351,  7.21248917],
       [ 3.29956009,  2.93273208,  9.33380951],
       [ 1.75976538,  0.7038557 ,  7.21248917]])
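Since whitening is described above as a step to take before running k-means, the following sketch shows one way the two might be combined. The data are made up for illustration, and the initial centroids are taken from two observations so that the run does not depend on random initialization. Mapping the centroids back to the original units assumes that the scaling used the population standard deviation (`ddof=0`).

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Made-up data: two features on very different scales, two obvious groups.
obs = np.array([[1.0, 1000.0],
                [1.2, 1100.0],
                [0.9,  900.0],
                [5.0, 5100.0],
                [5.2, 4900.0],
                [4.8, 5000.0]])

# Rescale each feature to unit variance so neither dominates the distances.
whitened = whiten(obs)

# Use two observations as the initial centroids to keep the run deterministic.
initial_centroids = whitened[[0, 3]]
codebook, distortion = kmeans(whitened, initial_centroids)

# Assign each whitened observation to its nearest centroid.
labels, _ = vq(whitened, codebook)
print(labels)

# Centroids in the original units: multiply back by the per-feature
# standard deviation (assumption: ddof=0, as in the scaling step).
centroids_original = codebook * obs.std(axis=0)
print(centroids_original)
```

Any new observation has to be divided by the same per-feature standard deviations before it is compared against `codebook`, because the centroids live in the whitened space.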