MAINT: io: migration to use sparray in IO #21905

Merged
merged 9 commits into from
Dec 4, 2024

dschult
Contributor

@dschult dschult commented Nov 18, 2024

This PR migrates scipy.io to use sparse arrays internally and to read files into either sparray or spmatrix.
The primary functions that can (depending on the file contents) return sparse containers are:
scipy.io.loadmat, scipy.io.hb_read, scipy.io.mmread.
These can also be accessed via scipy.io.matlab.loadmat, scipy.io._harwell_boeing.hb_read, and scipy.io._fast_matrix_market.mmread; an alternate version of mmread is available at scipy.io._mmio.mmread.

Each of those functions currently returns a sparse matrix. This PR adds a keyword argument sparray=None indicating (as a bool) whether the function should return an sparray or an spmatrix. The default None means no preference has been given. Relying on that default is deprecated in release 1.15, and the default will change to sparray in 1.17.

The deprecation in the doc_strings is:

    .. deprecated:: 1.15.0
        The default sparse return type of ``coo_matrix`` has been deprecated
        in favour of ``coo_array``. Default will be changed in SciPy 1.17.0.
        Use new argument ``sparray=True`` to anticipate the future, or
        ``False`` to silence the warning and return ``coo_matrix`` even
        after the change in default.

The DeprecationWarning is emitted when a sparse container is returned and sparray is None. The warning message is:

    import warnings

    msg = ("The default sparse return type, ``coo_matrix``, has"
           " been deprecated in favour of ``coo_array``."
           " Default will be changed in SciPy 1.17.0."
           " Use new argument ``sparray=True`` to anticipate"
           " the future, or ``False`` to silence the warning and"
           " return ``coo_matrix`` even after the change in default.")
    warnings.warn(msg, DeprecationWarning, stacklevel=2)

I hope I have the stack levels correct. They seem to be OK, because the tests picked up the warnings before I updated the tests to avoid them.
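One way to check a stack level, sketched with a hypothetical reader read_sparse (not SciPy's API): stacklevel=2 should attribute the warning to the line that called the public function, which a test can confirm by comparing the recorded warning's line number with the calling line.

```python
import inspect
import warnings


def read_sparse(sparray=None):
    """Hypothetical reader that warns when no return-type preference is given."""
    if sparray is None:
        msg = ("The default sparse return type, ``coo_matrix``, has"
               " been deprecated in favour of ``coo_array``.")
        # stacklevel=2 attributes the warning to the line that called
        # read_sparse(), i.e. the user code that should add the keyword.
        warnings.warn(msg, DeprecationWarning, stacklevel=2)
    return "coo_array" if sparray else "coo_matrix"


def warning_points_at_caller():
    """Return True if exactly one warning was recorded, at the calling line."""
    with warnings.catch_warnings(record=True) as caught:
        # Record DeprecationWarning even where it is filtered out by default.
        warnings.simplefilter("always")
        call_line = inspect.currentframe().f_lineno + 1
        read_sparse()  # no preference given, so this line should warn
    return len(caught) == 1 and caught[0].lineno == call_line
```

If the warning were emitted from a nested helper instead, the same check would fail until the stack level was raised to match the extra frame.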

Beyond those functions, the helper functions, classes/methods, and tests now work with sparse arrays.

After the deprecation period for the default return value, the default will shift to returning sparse arrays, though users who choose to quiet the deprecation warning by setting sparray=False will still get spmatrix.

The dev.py smoke-docs command doesn't reach all doctests in this subpackage, but dev.py smoke-docs -t scipy/io/_mmio.py showed a number of errors, which I corrected here (hopefully those fixes will work with the CI doctests too).

One test in sparse.csgraph uses mmread and had to be updated.

@dschult dschult added scipy.io maintenance Items related to regular maintenance tasks labels Nov 18, 2024
@github-actions github-actions bot added scipy.sparse.csgraph Cython Issues with the internal Cython code base labels Nov 18, 2024
Member

@rgommers rgommers left a comment

Thanks @dschult! Before looking at the details, my main comment here is about the migration strategy, which doesn't look ideal. The current approach shows every user of these functions a deprecation warning, and then to avoid it they have a choice between:

  1. Explicitly add sparray=False, or
  2. Write an if scipy_version>=1.15: sparray=True else: leave-out-sparray-kw

(2) is quite awkward and in addition it doesn't get you the same behavior for your code for older and newer SciPy. Hence most users probably need to go with (1).
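To make option (2) concrete, here is a sketch of the user-side version check under the keyword as proposed at this point in the PR (sparray). The helper reader_kwargs is hypothetical and parses only the major and minor version numbers.

```python
def reader_kwargs(scipy_version):
    """Hypothetical user-side helper for option (2): pass sparray=True only on
    SciPy releases that accept the keyword, otherwise leave it out."""
    major, minor = (int(part) for part in scipy_version.split(".")[:2])
    if (major, minor) >= (1, 15):
        return {"sparray": True}  # newer SciPy: result will be coo_array
    return {}  # older SciPy: no keyword, result stays coo_matrix
```

Calling a reader as mmread(path, **reader_kwargs(scipy.__version__)) would then return coo_array on 1.15+ but coo_matrix on older releases, which is the inconsistency described above.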

It seems much cleaner to just add an sparray=False keyword, which lets users opt into returning arrays and otherwise gives the same outcome as (1) without all users needing to change their code. Then, when the matrix classes themselves get deprecated, those users will see a deprecation warning. That's a few releases away, by which point the if-else dance is no longer needed.

@dschult
Contributor Author

dschult commented Nov 21, 2024

Done. I've set the new keyword to be sparray=False. This will avoid making users check scipy_version for the ability to use sparray.

@dschult
Contributor Author

dschult commented Nov 25, 2024

Perhaps a better name for the keyword is spmatrix=True rather than sparray=False. The name makes the eventual removal of the keyword feel more natural.

@rgommers
Member

Perhaps a better name for the keyword is spmatrix=True rather than sparray=False. The name makes the eventual removal of the keyword feel more natural.

Sure, that sounds fine to me. I suspect it comes down to the same thing - one has to start using the keyword at some point to avoid a deprecation warning.

@dschult
Contributor Author

dschult commented Nov 27, 2024

I've switched the kwarg name from sparray=False to spmatrix=True.
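With the rename, spmatrix=True (the default) keeps returning coo_matrix and spmatrix=False opts into coo_array. A hedged sketch of opting in from user code that must also run on SciPy releases without the keyword: the helper read_mm_as_array is hypothetical, and it uses feature detection via inspect.signature rather than a version check, converting the result on older releases so the return type is the same everywhere.

```python
import inspect
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse as sp


def read_mm_as_array(path):
    """Hypothetical helper: read a Matrix Market file as a sparse array.

    Passes spmatrix=False where scipy.io.mmread accepts it; on older SciPy
    it converts the returned coo_matrix instead.
    """
    kwargs = {}
    if "spmatrix" in inspect.signature(scipy.io.mmread).parameters:
        kwargs["spmatrix"] = False  # opt in to coo_array
    result = scipy.io.mmread(path, **kwargs)
    if sp.issparse(result) and not isinstance(result, sp.coo_array):
        result = sp.coo_array(result)  # older SciPy: convert the coo_matrix
    return result


# Usage: round-trip a 3x3 identity through a temporary .mtx file.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.mtx")
    scipy.io.mmwrite(path, sp.coo_matrix(np.eye(3)))
    A = read_mm_as_array(path)
```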

The lint errors are UP031 (printf-style % formatting), flagged by the new ruff version. The required changes in the io subpackage are subtle, and I think some of them are good examples of why Python needs to keep %. So let's make those changes in a separate PR so we can pay close attention.
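The thread doesn't say which io changes UP031 flagged. As a generic illustration (not taken from scipy.io) of why moving away from % isn't always mechanical: bytes support % interpolation but have no .format method, and % treats a tuple on the right-hand side as the argument list.

```python
# bytes support %-interpolation (PEP 461) but have no .format() method,
# so %-based formatting of binary data has no str.format/f-string drop-in.
header = b"%-8s%5d" % (b"ABC", 5)

# With %, a tuple on the right-hand side is unpacked as the argument list...
pair = (1, 2)
unpacked = "%d-%d" % pair
# ...so formatting a tuple as a single value needs an explicit 1-tuple,
# whereas an f-string always treats its expression as one value.
as_one_value = "%s" % (pair,)
as_fstring = f"{pair}"
```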

@rgommers
Member

The lint errors are UP031 due to the new ruff version (format strings

Yeah, it'd be nice to turn those off....

@dschult dschult force-pushed the io_migrate_sparray branch from e235a49 to 962506d December 1, 2024 19:29
@dschult
Contributor Author

dschult commented Dec 3, 2024

I think this is ready to merge. @rgommers did you want to look at this one more time?

Member

@rgommers rgommers left a comment

My comments were addressed, changes look good, and CI is happy - let's give this a go. Thanks @dschult!

@rgommers rgommers added this to the 1.15.0 milestone Dec 4, 2024
@rgommers rgommers merged commit bddd6df into scipy:main Dec 4, 2024
37 checks passed
@dschult dschult deleted the io_migrate_sparray branch January 29, 2025 21:45
4 participants