boxcox
- scipy.stats.boxcox(x, lmbda=None, alpha=None, optimizer=None)
Return a dataset transformed by a Box-Cox power transformation.
- Parameters:
- x : ndarray
Input array to be transformed.
If lmbda is not None, this is an alias of scipy.special.boxcox. Returns nan if x < 0; returns -inf if x == 0 and lmbda < 0.
If lmbda is None, array must be positive, 1-dimensional, and non-constant.
- lmbda : scalar, optional
If lmbda is None (default), find the value of lmbda that maximizes the log-likelihood function and return it as the second output argument.
If lmbda is not None, do the transformation for that value.
- alpha : float, optional
If lmbda is None and alpha is not None, return the 100 * (1-alpha)% confidence interval for lmbda as the third output argument. Must be between 0.0 and 1.0. The default is None, in which case no confidence interval is returned.
If lmbda is not None, alpha is ignored.
- optimizer : callable, optional
If lmbda is None, optimizer is the scalar optimizer used to find the value of lmbda that minimizes the negative log-likelihood function. optimizer is a callable that accepts one argument:
- fun : callable
The objective function, which evaluates the negative log-likelihood function at a provided value of lmbda.
optimizer returns an object, such as an instance of scipy.optimize.OptimizeResult, which holds the optimal value of lmbda in an attribute x.
See the example in boxcox_normmax or the documentation of scipy.optimize.minimize_scalar for more information.
If lmbda is not None, optimizer is ignored.
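As a minimal sketch of the optimizer parameter described above, the following wraps scipy.optimize.minimize_scalar so that the search for lmbda is restricted to an interval. The sample data, the bounds (-2, 2), and the name bounded_optimizer are illustrative assumptions, not values taken from this page.

```python
import numpy as np
from scipy import optimize, stats

# Illustrative, strictly positive sample (an assumption for this sketch).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=200)


def bounded_optimizer(fun):
    # `fun` evaluates the negative log-likelihood at a given lmbda.
    # The returned OptimizeResult holds the minimizer in its `x` attribute,
    # which is what boxcox reads back as the estimated lmbda.
    return optimize.minimize_scalar(fun, bounds=(-2.0, 2.0), method="bounded")


# lmbda is None (default), so lmbda is estimated with the custom optimizer.
xt, lmbda_hat = stats.boxcox(x, optimizer=bounded_optimizer)
print("estimated lmbda:", float(lmbda_hat))
```

Restricting the search interval can be useful when the unconstrained estimate drifts to extreme values, but the bounds themselves are a modelling choice.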
- Returns:
- boxcox : ndarray
Box-Cox power transformed array.
- maxlog : float, optional
If the lmbda parameter is None, the second returned argument is the lmbda that maximizes the log-likelihood function.
- (min_ci, max_ci) : tuple of float, optional
If the lmbda parameter is None and alpha is not None, this returned tuple of floats represents the minimum and maximum confidence limits given alpha.
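Because the number of returned values depends on lmbda and alpha, the three calling patterns can be sketched as follows. The sample data and variable names are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative, strictly positive sample (an assumption for this sketch).
rng = np.random.default_rng(1)
x = rng.lognormal(mean=1.0, sigma=0.4, size=250)

# lmbda given: only the transformed array is returned.
xt_fixed = stats.boxcox(x, lmbda=0.5)

# lmbda is None and alpha is None: (transformed array, estimated lmbda).
xt, maxlog = stats.boxcox(x)

# lmbda is None and alpha given: a third item holds the confidence limits.
xt_ci, maxlog_ci, (min_ci, max_ci) = stats.boxcox(x, alpha=0.05)

print(type(xt_fixed).__name__)
print(float(maxlog), (float(min_ci), float(max_ci)))
```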
See also
Notes
The Box-Cox transform is given by:
\[y = \begin{cases} \frac{x^\lambda - 1}{\lambda}, &\text{for } \lambda \neq 0 \\ \log(x), &\text{for } \lambda = 0 \end{cases}\]
boxcox requires the input data to be positive. Sometimes a Box-Cox transformation provides a shift parameter to achieve this; boxcox does not. Such a shift parameter is equivalent to adding a positive constant to x before calling boxcox.
The confidence limits returned when alpha is provided give the interval where:
\[l(\hat{\lambda}) - l(\lambda) < \frac{1}{2}\chi^2(1 - \alpha, 1),\]
with \(l\) the log-likelihood function and \(\chi^2\) the chi-squared function.
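The notes above can be checked numerically. The sketch below evaluates the transform by hand, applies a manual shift to data that is not strictly positive, and compares the drop in log-likelihood at the returned confidence limits with the chi-squared threshold. It assumes that \(\chi^2(1 - \alpha, 1)\) denotes the chi-squared quantile with one degree of freedom, computed here with stats.chi2.ppf, and that stats.boxcox_llf evaluates \(l\). The sample data, the shift value, and all variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative, strictly positive sample (an assumption for this sketch).
rng = np.random.default_rng(12345)
x = rng.gamma(shape=2.0, scale=1.5, size=300)

# 1. The transform formula, evaluated by hand for a fixed lambda.
lam = 0.3
manual = (x**lam - 1.0) / lam
matches_formula = np.allclose(manual, stats.boxcox(x, lmbda=lam))
matches_log = np.allclose(np.log(x), stats.boxcox(x, lmbda=0.0))

# 2. boxcox has no shift parameter, so the caller adds the constant.
raw = x - 1.0                # hypothetical data that may contain values <= 0
shift = 1.0 - raw.min()      # arbitrary choice: smallest shifted value is 1.0
shifted_t, shifted_lam = stats.boxcox(raw + shift)

# 3. Log-likelihood drop at the confidence limits versus the threshold.
alpha = 0.05
xt, lam_hat, (lo, hi) = stats.boxcox(x, alpha=alpha)
threshold = 0.5 * stats.chi2.ppf(1 - alpha, 1)
drop_lo = stats.boxcox_llf(lam_hat, x) - stats.boxcox_llf(lo, x)
drop_hi = stats.boxcox_llf(lam_hat, x) - stats.boxcox_llf(hi, x)
print(matches_formula, matches_log)
print(float(drop_lo), float(drop_hi), float(threshold))
```

The choice of shift is not neutral: a different constant generally leads to a different estimated lmbda.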
Array API Standard Support
boxcox has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.
Library    CPU    GPU
NumPy      ✅     n/a
CuPy       n/a    ⛔
PyTorch    ⛔     ⛔
JAX        ⛔     ⛔
Dask       ⛔     n/a
See Support for the array API standard for more information.
References
G.E.P. Box and D.R. Cox, “An Analysis of Transformations”, Journal of the Royal Statistical Society B, 26, 211-252 (1964).
Examples
>>> from scipy import stats
>>> import matplotlib.pyplot as plt
We generate some random variates from a non-normal distribution and make a probability plot for it, to show it is non-normal in the tails:
>>> fig = plt.figure()
>>> ax1 = fig.add_subplot(211)
>>> x = stats.loggamma.rvs(5, size=500) + 5
>>> prob = stats.probplot(x, dist=stats.norm, plot=ax1)
>>> ax1.set_xlabel('')
>>> ax1.set_title('Probplot against normal distribution')
We now use boxcox to transform the data so it’s closest to normal:
>>> ax2 = fig.add_subplot(212)
>>> xt, _ = stats.boxcox(x)
>>> prob = stats.probplot(xt, dist=stats.norm, plot=ax2)
>>> ax2.set_title('Probplot after Box-Cox transformation')
>>> plt.show()
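Where a plot is not convenient, the effect of the transformation can also be summarized numerically. The following sketch, which is an addition and not part of the original example, compares the sample skewness before and after the transformation. The seed and the use of stats.skew as a summary are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Same distribution as in the example above, with a fixed seed (assumption).
rng = np.random.default_rng(2024)
x = stats.loggamma.rvs(5, size=500, random_state=rng) + 5

# Estimate lmbda and transform the data.
xt, lmbda_hat = stats.boxcox(x)

# Sample skewness is one rough indicator of departure from normality.
skew_before = stats.skew(x)
skew_after = stats.skew(xt)
print("lmbda:", float(lmbda_hat))
print("skewness before:", float(skew_before), "after:", float(skew_after))
```

Skewness is only one symptom of non-normality, so the probability plots remain the more informative check.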