ENH: spatial: ensure thread-safety #21955


Merged
3 commits merged into scipy:main from andfoy:spatial_parallel_testing on Nov 27, 2024

Conversation

@andfoy (Contributor) commented Nov 26, 2024

Reference issue

See #20669

What does this implement/fix?

This is a continuation of the work done in #21496; this time, each module gets its own PR detailing the changes needed to ensure that the tests pass in a concurrent scenario.

Additional information

See the description of #21496 for more detailed information.

cc @rgommers

@github-actions github-actions bot added the scipy.spatial and enhancement (A new feature or improvement) labels Nov 26, 2024
@lucascolley lucascolley changed the title from "ENH: Ensure that scipy.spatial is thread-safe" to "ENH: spatial: ensure thread-safety" Nov 26, 2024
@rgommers rgommers added the free-threading (Items related to supporting free-threaded, a.k.a. "no-GIL", builds of CPython) label Nov 27, 2024

@rgommers (Member) left a comment:

Thanks @andfoy. This looks pretty good! Just one question inline.

It is surprising that the tests pass given that gh-20655 is still open. If KDTree is unsafe but the current test suite doesn't find a problem, it seems important to try to add one or more new tests to gh-20655 that exercise robustness under free-threading more thoroughly.

And ... just after I wrote the above, my longer stress test with --parallel-threads=30 does turn up a problem in KDTree:

____________________________________________________________ test_ckdtree_parallel[cKDTree] ____________________________________________________________
scipy/spatial/tests/test_kdtree.py:896: in test_ckdtree_parallel
    T2 = T.query(points, k=5, workers=-1)[-1]
        T          = <scipy.spatial._ckdtree.cKDTree object at 0x453e40b0070>
        T1         = array([[   0, 3354, 2409,  591, 4678],
       [   1, 4585, 1793, 3631, 2603],
       [   2,  842, 4689,   70, 2452],
 ...,
       [4997,  590, 2104, 2254, 4105],
       [4998, 4780, 3204, 1758, 4396],
       [4999,  509, 1466,  972, 3261]])
        k          = 4
        kdtree_type = <class 'scipy.spatial._ckdtree.cKDTree'>
        monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x453ab59b2d0>
        n          = 5000
        points     = array([[-1.93950036,  0.73885045,  1.39468453, -0.81358502],
       [-0.818822  , -0.1027978 ,  1.23934523,  0.3642774...    [ 1.50415164, -0.24124136, -0.3741709 , -1.39281727],
       [-0.14961262,  0.89597933, -2.11687003, -0.18395164]])
scipy/spatial/_ckdtree.pyx:789: in scipy.spatial._ckdtree.cKDTree.query
    ???
scipy/spatial/_ckdtree.pyx:396: in scipy.spatial._ckdtree.get_num_workers
    ???
E   NotImplementedError: Cannot determine the number of cpus using os.cpu_count(), cannot use -1 for the number of workers

Fine to leave that alone here and deal with it in gh-20655.
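
As an aside, a minimal sketch of the kind of extra stress test suggested above for gh-20655 (hypothetical, not code from this PR): several Python threads querying a shared cKDTree, which only exercises genuine parallelism on a free-threaded build.

import threading

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(12345)
points = rng.standard_normal((5000, 4))
tree = cKDTree(points)  # shared and treated as read-only after construction

errors = []

def hammer_query():
    try:
        # workers=1 keeps each call single-threaded; the concurrency here
        # comes purely from the Python threads themselves
        _, idx = tree.query(points, k=5, workers=1)
        # every point should report itself as its own nearest neighbour
        assert (idx[:, 0] == np.arange(len(points))).all()
    except Exception as exc:
        errors.append(exc)

threads = [threading.Thread(target=hammer_query) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert not errors, errors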

Inline comment on scipy/spatial/tests/test_hausdorff.py:

# check indices
assert actual[1:] == expected[1:]
if num_parallel_threads == 1 or starting_seed != 77098:

Member:

This is a bit cryptic. The failure for parallel testing with this particular seed looks like:

__________________________________________________ TestHausdorff.test_subsets[A5-B5-seed5-expected5] ___________________________________________________
scipy/spatial/tests/test_hausdorff.py:173: in test_subsets
    assert actual[1:] == expected[1:]
E   AssertionError: assert (1, 1) == (0, 2)
E     
E     At index 0 diff: 1 != 0
E     
E     Full diff:
E       (
E     -     0,
E     ?     ^...
E     
E     ...Full output truncated (7 lines hidden), use '-vv' to show
        A          = [(-5, 3), (0, 0)]
        B          = [(0, 1), (0, 0), (-5, 3)]
        actual     = (0.0, 1, 1)
        expected   = (0.0, 0, 2)
        num_parallel_threads = 2
        seed       = Generator(PCG64) at 0x27C647B9840
        self       = <scipy.spatial.tests.test_hausdorff.TestHausdorff object at 0x27c62574390>
        starting_seed = 77098

I can't immediately tell why - is there a thread safety issue, or is there a deterministic reason for the mismatch?


Contributor (PR author):

According to the comment above:

# NOTE: using a Generator changes the

it states that the indices might change, which I imagine is what is occurring here.
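
For illustration, a small sketch of that non-determinism with the inputs from the failing case (using plain integer seeds for directed_hausdorff; the Generator-based seeding in the test behaves analogously):

import numpy as np
from scipy.spatial.distance import directed_hausdorff

# A is a subset of B, so the directed Hausdorff distance is exactly 0.0 and
# more than one (i, j) index pair attains it: A[0] == B[2] and A[1] == B[1].
A = np.array([[-5.0, 3.0], [0.0, 0.0]])
B = np.array([[0.0, 1.0], [0.0, 0.0], [-5.0, 3.0]])

for seed in (0, 1, 77098):
    d, i, j = directed_hausdorff(A, B, seed=seed)
    # the distance is always 0.0; which of the tied (i, j) pairs is reported
    # can depend on the random shuffling that the seed controls
    print(seed, d, (i, j))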


Member:

Ah makes sense, thanks. @tylerjereddy you're the expert here, so you may want to have a peek at this tweak to the hausdorff tests perhaps.


Contributor:

This all makes sense, as I note below. One question I have is how this relates to "genuine" concurrent implementations of the algorithm, like the Rust one I tried a few years ago in gh-14719 (which also notes the lack of determinism for degenerate inputs). Specifically, this allows threads to run concurrently so that, in a given C program like this one, ordering guarantees are loosened, but we don't necessarily get substantial performance improvements from that alone, right? I.e., we still likely need to write something with atomics/locks in the compiled backend to fully leverage the concurrency and distribute the work, and so on?

Is that a reasonable understanding?


Member:

> specifically, this allows threads to run concurrently so that in a given C program like this one ordering guarantees are loosened

The free-threading work allows Python level threads with the threading module to execute in parallel. If those call into C/C++/Cython code, that code will also run in parallel. That parallelism with the default (with-gil) CPython is (a) not possible for Python code, and (b) only possible in C/C++/Cython code if that code explicitly releases the GIL itself, and then re-acquires it before returning the result.

> but we don't necessarily get substantial performance improvements from that alone right?

Correct. For a single function call it doesn't help. The speedups come from the end user starting to use threading.
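
A hedged sketch of what that end-user threading could look like on a free-threaded build (names like query_chunk are made up for the example):

from concurrent.futures import ThreadPoolExecutor

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
data = rng.standard_normal((100_000, 3))
queries = rng.standard_normal((100_000, 3))
tree = cKDTree(data)

def query_chunk(chunk):
    # each Python thread issues its own single-threaded query; on a
    # free-threaded (no-GIL) build these calls can genuinely overlap
    # instead of serializing on the GIL
    return tree.query(chunk, k=1, workers=1)[1]

chunks = np.array_split(queries, 8)
with ThreadPoolExecutor(max_workers=8) as executor:
    nearest = np.concatenate(list(executor.map(query_chunk, chunks)))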

> I.e., we still likely need to write something with atomics/locks in the compiled backend to fully leverage the concurrency/distribute the work and so on?

That's a separate thing entirely. For making a single hausdorff call run in parallel, we'd need to apply the workers= pattern like done in scipy.fft for example. And that may need locks and/or atomics if there are shared data structures, yes.
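
For a concrete sense of the existing workers= pattern (both calls below already exist in SciPy and are shown only to illustrate the API shape a parallel hausdorff call would need):

import numpy as np
import scipy.fft
from scipy.spatial import cKDTree

x = np.random.default_rng(1).standard_normal(2**20)
# a single scipy.fft call can distribute its work over several workers
X = scipy.fft.fft(x, workers=4)

pts = np.random.default_rng(2).standard_normal((50_000, 3))
tree = cKDTree(pts)
# cKDTree.query follows the same pattern; workers=-1 means "use all CPUs",
# which is resolved via os.cpu_count() (hence the error in the traceback
# above when that count cannot be determined)
dist, idx = tree.query(pts, k=2, workers=-1)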

@rgommers rgommers added this to the 1.15.0 milestone Nov 27, 2024

@tylerjereddy (Contributor) left a comment:

My review observations are similar to Ralf's:

  • on the latest main branch, on my ARM Mac laptop in a 3.13t venv with export PYTHON_GIL=0, python dev.py test -t scipy/spatial/tests -- --parallel-threads=10 has multiple failures and a hang in scipy/spatial/tests/test_distance.py
  • on the new branch: all tests pass for spatial, so that seems like a net improvement
  • like Ralf, I can also reproduce failures in test_kdtree.py if I go wild with the --parallel-threads setting, and agree on delaying that matter; KDTree internals are a nightmare and need specific reviewers

The Hausdorff shim makes sense to me since the ignored seed (when concurrent) is for the non-deterministic index value cases. It seems fine, but I do have a question about it that I'll ask inline.

The lint CI failure is the usual "same file, different lines" business, and the other one is gh-21957, so both can be ignored as well.

@tylerjereddy tylerjereddy merged commit 3f1403a into scipy:main Nov 27, 2024
34 of 37 checks passed
@tylerjereddy (Contributor):
thanks both

@andfoy andfoy deleted the spatial_parallel_testing branch November 27, 2024 22:44
Labels: scipy.spatial, enhancement (A new feature or improvement), free-threading (Items related to supporting free-threaded, a.k.a. "no-GIL", builds of CPython)
3 participants