ENH: Inverse indices from np.unique() sharing the input array's shape #20638

@honno

Currently, inverse indices are always(?) a 1-D array. This means an input array whose ndim is not 1 cannot be reconstructed just by indexing the unique values with the inverse indices. For example, the docs give the following reconstruction example:

>>> a = np.array([1, 2, 6, 4, 2, 3, 2])
>>> u, indices = np.unique(a, return_inverse=True)
>>> u[indices]
array([1, 2, 6, 4, 2, 3, 2])

but we can't do the same for 0-D or 2+ dimensional arrays:

>>> a = np.array([[1, 2, 6], [2, 3, 2]])
>>> u, indices = np.unique(a, return_inverse=True)
>>> u[indices]
array([1, 2, 6, 2, 3, 2])  # shape is not 2-D

Right now you need to use reshape() to actually reconstruct the original array

>>> u[indices.reshape(a.shape)]
array([[1, 2, 6],
       [2, 3, 2]])

So IMO it would be a nice usability change for these inverse indices to share the input array's shape, so you can just do:

>>> u[indices]
array([[1, 2, 6],
       [2, 3, 2]])
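
In the meantime, the proposed behaviour can be emulated with a small wrapper. This is only a sketch: `unique_with_shaped_inverse` is a hypothetical name, not a NumPy function, and it assumes the default `axis=None` case discussed here. Because `reshape` leaves an array that already has the target shape unchanged, the wrapper should work whether or not `np.unique` itself returns shaped inverse indices.

```python
import numpy as np


def unique_with_shaped_inverse(a):
    """Hypothetical helper: np.unique with inverse indices shaped like the input."""
    a = np.asarray(a)
    values, inverse = np.unique(a, return_inverse=True)
    # Reshaping is harmless if the inverse already has the input's shape.
    return values, inverse.reshape(a.shape)


a = np.array([[1, 2, 6], [2, 3, 2]])
u, indices = unique_with_shaped_inverse(a)
reconstructed = u[indices]  # same shape and values as `a`
```

With this, `u[indices]` reconstructs 0-D and 2+ dimensional inputs without a separate `reshape()` call at the call site.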

Another benefit is that this would fix a current bug in array_api.unique_inverse/array_api.unique_all, as the Array API spec specifies that inverse indices should have the same shape as the input array. Admittedly, that could easily be solved with an internal reshape. cc @asmeurer

I wonder if there's some reasoning I'm missing for always returning 1-D arrays. Maybe integer indexing came after return_inverse was added. And whilst the shape of the returned inverse indices is not documented, I wonder whether changing this behaviour would have unintended consequences for downstream libraries.
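
As an illustration of where a shape change could surface downstream (a guess at a plausible pattern, not a known breakage): code that passes the inverse indices to a function expecting 1-D input, such as `np.bincount`, would need to flatten them first. A minimal sketch of a shape-agnostic version:

```python
import numpy as np

a = np.array([[1, 2, 6], [2, 3, 2]])
u, inverse = np.unique(a, return_inverse=True)

# np.bincount expects 1-D input, so flatten explicitly.
# ravel() on an already 1-D array changes nothing, so this works
# whether the inverse indices are 1-D or shaped like `a`.
counts = np.bincount(inverse.ravel(), minlength=u.size)
```

Here `counts[i]` is the number of times `u[i]` occurs in `a`, matching what `return_counts=True` would give.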
