I would like to propose that scipy sparse arrays follow the array-api.
A big part of the push to move from the matrix API to a numpy-array-like API for sparse data structures is to have better interoperability with dense numpy arrays. The array-api, and the broader data-api consortium behind it, generalizes this goal across a variety of array implementations, and has support from much of the numerical-Python ecosystem.
Benefits
- Downstream libraries can support a single code path for dense and sparse arrays in more cases (see the sketch after this list)
- Key use case: support in array wrapper libraries like xarray and dask
- It comes with a test suite!
- Easier decision-making around the API, since many decisions have already been made
- scipy itself is already adopting the array-api
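To make the first benefit concrete, here is a minimal sketch of what a single dense/sparse code path could look like. It assumes sparse arrays would expose the standard __array_namespace__ entry point (they don't today; that is the proposal), and uses array_api_compat, the helper library that scipy's own array-api support builds on:

import array_api_compat

def center_columns(x):
    # Fetch the array-api namespace from the array itself, then use only
    # standard functions from it. The same code runs unchanged for numpy
    # arrays or torch tensors, and would run for scipy sparse arrays under
    # this proposal, with no isinstance checks on the input type.
    xp = array_api_compat.array_namespace(x)
    return x - xp.mean(x, axis=0, keepdims=True)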
Big caveat: parts of the array-api don't make sense for sparse arrays
There are cases where the array-api isn't a good match for sparse data. I'll give some examples below, but would propose that partial API support is reasonable.
Off the top of my head:
dlpack-based interchange
Since sparse arrays are meant to be an efficient encoding of matrices with large numbers of zeros, it does not necessarily make sense to materialize all those missing values as a default interchange mechanism.
Surely we can achieve more reasonable interchange of sparse matrices between devices.
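As a rough illustration of the scale mismatch (the matrix size here is arbitrary, chosen only to make the point):

from scipy import sparse

# A 100_000 x 100_000 identity matrix stores just 100_000 nonzeros.
eye = sparse.eye(100_000, format="csr")
print(eye.data.nbytes)  # 800_000 bytes of stored float64 values

# A dense-buffer interchange like dlpack would have to materialize
# 8 * 100_000 ** 2 bytes, i.e. roughly 80 GB of mostly zeros.
# (So don't call eye.toarray() on this one.)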
nD support
While 1d support may be reasonable, nD support, especially for non-COO formats, is likely out of scope for this library. See also:
This means that specific concatenation, indexing, and reshaping operations may not work. Arguably, the reshape operation may not even make sense.
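For instance, a sketch of the dimensionality limit (exact behavior depends on your scipy version):

from scipy import sparse
import numpy as np

sparse.coo_array(np.eye(3))  # fine: the formats are built around 2d
sparse.coo_array(np.ones((2, 2, 2)))  # raises on versions without n-D support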
Broadcasting with null values
Sparse libraries often don't play very well with null values. The optimization of skipping the 0 values often means that entries which should become null (e.g. 0 * nan == nan) are silently left as 0.
Example
from scipy import sparse
import numpy as np
coo = sparse.coo_array(([1, 1, 1, 1, 1], ([0, 0, 1, 2, 2], [0, 1, 1, 0, 2])))
coo.toarray()
# array([[1, 1, 0],
#        [0, 1, 0],
#        [1, 0, 1]])
(coo * np.array([np.nan, np.nan, 2.])).toarray()
# array([[nan, nan, 0.],
#        [ 0., nan, 0.],
#        [nan, 0., 2.]])
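Note that under IEEE semantics 0 * nan == nan, so a dense computation would also produce nan at positions (1, 0) and (2, 1); the sparse multiply never visits the unstored zeros, so they silently stay 0.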
Alternative: this is for pydata/sparse to do
While I like pydata/sparse, I think its adoption by the broader ecosystem has some major barriers.
First is the use of numba as a dependency. I quite like numba, and have made a number of contributions there with the specific goal of making operations on sparse arrays work better (a lot of slicing and indexing). However, it is not as friendly a runtime dependency as scipy. It has had multiple compatibility issues with libraries I'd like to share my sparse data with (like pytorch and jax), and frequently pins both numpy and python.
A second reason is that it would be strange to end up at "you should use pydata/sparse" but still have sparse linear algebra and IO libraries included in scipy. This could make sense if scipy depended on pydata/sparse, though I understand this to be a non-starter due to numba.
If pydata/sparse and scipy.sparse both need to exist, it would be really nice if they could exist largely interchangeably, e.g. with one array-api code path.
Aside: it could be interesting to explore AOT compiling a subset of pydata/sparse (COO, CSR, CSC) and distributing the compiled implementations through scipy.