ENH: spatial: ensure thread-safety #21955
Conversation
Thanks @andfoy. This looks pretty good! Just one question inline.
It is surprising that the tests pass given that gh-20655 is still open. If KDTree is unsafe but the current test suite doesn't find a problem, it seems important to try to add one or more new tests to gh-20655 that do exercise robustness under free-threading better.
And ... just after I wrote the above, my longer stress test with --parallel-threads=30 does turn up a problem in KDTree:
____________________________________________________________ test_ckdtree_parallel[cKDTree] ____________________________________________________________
scipy/spatial/tests/test_kdtree.py:896: in test_ckdtree_parallel
T2 = T.query(points, k=5, workers=-1)[-1]
T = <scipy.spatial._ckdtree.cKDTree object at 0x453e40b0070>
T1 = array([[ 0, 3354, 2409, 591, 4678],
[ 1, 4585, 1793, 3631, 2603],
[ 2, 842, 4689, 70, 2452],
...,
[4997, 590, 2104, 2254, 4105],
[4998, 4780, 3204, 1758, 4396],
[4999, 509, 1466, 972, 3261]])
k = 4
kdtree_type = <class 'scipy.spatial._ckdtree.cKDTree'>
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x453ab59b2d0>
n = 5000
points = array([[-1.93950036, 0.73885045, 1.39468453, -0.81358502],
[-0.818822 , -0.1027978 , 1.23934523, 0.3642774... [ 1.50415164, -0.24124136, -0.3741709 , -1.39281727],
[-0.14961262, 0.89597933, -2.11687003, -0.18395164]])
scipy/spatial/_ckdtree.pyx:789: in scipy.spatial._ckdtree.cKDTree.query
???
scipy/spatial/_ckdtree.pyx:396: in scipy.spatial._ckdtree.get_num_workers
???
E NotImplementedError: Cannot determine the number of cpus using os.cpu_count(), cannot use -1 for the number of workers
Fine to leave that alone here and deal with it in gh-20655.
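For reference, a rough sketch of the kind of free-threading stress test that could go into gh-20655 (not part of this PR; the thread count, data shapes, and helper names are arbitrary choices): hammer a shared cKDTree from several Python threads and compare against a single-threaded reference result.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.spatial import cKDTree

rng = np.random.default_rng(1234)
points = rng.standard_normal((5000, 4))
tree = cKDTree(points)

# Single-threaded reference answer.
expected = tree.query(points, k=5)[1]

def worker(_):
    # workers=1 keeps each thread's query single-threaded and avoids the
    # workers=-1 path, which relies on os.cpu_count().
    return tree.query(points, k=5, workers=1)[1]

n_threads = 16
with ThreadPoolExecutor(max_workers=n_threads) as pool:
    results = list(pool.map(worker, range(n_threads)))

for got in results:
    np.testing.assert_array_equal(got, expected)
```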
# check indices
if num_parallel_threads == 1 or starting_seed != 77098:
    assert actual[1:] == expected[1:]
This is a bit cryptic. The failure for parallel testing with this particular seed looks like:
__________________________________________________ TestHausdorff.test_subsets[A5-B5-seed5-expected5] ___________________________________________________
scipy/spatial/tests/test_hausdorff.py:173: in test_subsets
assert actual[1:] == expected[1:]
E AssertionError: assert (1, 1) == (0, 2)
E
E At index 0 diff: 1 != 0
E
E Full diff:
E (
E - 0,
E ? ^...
E
E ...Full output truncated (7 lines hidden), use '-vv' to show
A = [(-5, 3), (0, 0)]
B = [(0, 1), (0, 0), (-5, 3)]
actual = (0.0, 1, 1)
expected = (0.0, 0, 2)
num_parallel_threads = 2
seed = Generator(PCG64) at 0x27C647B9840
self = <scipy.spatial.tests.test_hausdorff.TestHausdorff object at 0x27c62574390>
starting_seed = 77098
I can't immediately tell why - is there a thread safety issue, or is there a deterministic reason for the mismatch?
According to the comment above (scipy/scipy/spatial/tests/test_hausdorff.py, line 158 in fd24a7a):
# NOTE: using a Generator changes the
the indices might change, which I imagine is what is occurring here.
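To make that concrete, here is a small stand-alone illustration (not the actual test) using the inputs from the failure above: both points of A occur verbatim in B, so the directed Hausdorff distance is 0.0 and more than one index pair attains it; which pair gets reported depends on how the inputs are shuffled internally, i.e. on the seed/Generator state.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

A = np.array([(-5, 3), (0, 0)], dtype=float)
B = np.array([(0, 1), (0, 0), (-5, 3)], dtype=float)

d, i, j = directed_hausdorff(A, B)
print(d, i, j)  # d == 0.0; the index pair may come out as (1, 1) or (0, 2)

# Whichever pair is reported, it is a valid witness (A[i] coincides with B[j]),
# so asserting on the exact indices only makes sense when the shuffling is
# deterministic.
assert d == 0.0
```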
Ah makes sense, thanks. @tylerjereddy you're the expert here, so you may want to have a peek at this tweak to the hausdorff tests, perhaps.
This all makes sense, as I note below. One question I have is how this relates to "genuine" concurrent implementations of the algorithm, like the Rust one I tried a few years ago in gh-14719 (which also notes the lack of determinism for degenerate inputs). Specifically, this allows threads to run concurrently so that in a given C program like this one ordering guarantees are loosened, but we don't necessarily get substantial performance improvements from that alone, right? I.e., we still likely need to write something with atomics/locks in the compiled backend to fully leverage the concurrency/distribute the work and so on?
Is that a reasonable understanding?
specifically, this allows threads to run concurrently so that in a given C program like this one ordering guarantees are loosened
The free-threading work allows Python-level threads created with the threading module to execute in parallel. If those call into C/C++/Cython code, that code will also run in parallel. With the default (with-GIL) CPython, that parallelism is (a) not possible for Python code, and (b) only possible in C/C++/Cython code if that code explicitly releases the GIL itself, and then re-acquires it before returning the result.
but we don't necessarily get substantial performance improvements from that alone right?
Correct. For a single function call it doesn't help. The speedups come from the end user starting to use threading.
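As a toy illustration of that point (not from this PR): the CPU-bound pure-Python snippet below runs its two threads essentially serially on a with-GIL build, but can occupy two cores on a free-threaded build such as 3.13t with PYTHON_GIL=0.

```python
import threading
import time

def busy(n=5_000_000):
    # CPU-bound pure-Python work; with the GIL, the two threads cannot
    # actually execute bytecode in parallel.
    total = 0
    for k in range(n):
        total += k * k
    return total

start = time.perf_counter()
threads = [threading.Thread(target=busy) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two threads took {time.perf_counter() - start:.2f}s")
```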
I.e., we still likely need to write something with atomics/locks in the compiled backend to fully leverage the concurrency/distribute the work and so on?
That's a separate thing entirely. For making a single hausdorff call run in parallel, we'd need to apply the workers= pattern, as done in scipy.fft for example. And that may need locks and/or atomics if there are shared data structures, yes.
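For reference, the existing workers= pattern as exposed by scipy.fft: a single call fans the batch out over threads inside the compiled backend, independently of whatever the caller does with threading.

```python
import numpy as np
import scipy.fft

x = np.random.default_rng(0).standard_normal((256, 4096))

y_serial = scipy.fft.fft(x)               # default: no extra workers
y_parallel = scipy.fft.fft(x, workers=4)  # backend splits the batch over 4 threads

np.testing.assert_allclose(y_serial, y_parallel)
```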
My review observations are similar to Ralf's:
- on latest main branch on my ARM Mac laptop, in a 3.13t venv with export PYTHON_GIL=0, python dev.py test -t scipy/spatial/tests -- --parallel-threads=10 has multiple failures and a hang on scipy/spatial/tests/test_distance.py
- on the new branch: all tests pass for spatial, so that seems like a net improvement
- like Ralf, I can also repro failures in test_kdtree.py if I go wild with the --parallel-threads setting, and agree on delaying that matter; KDTree internals are a nightmare that need specific reviewers
The Hausdorff shim makes sense to me since the ignored seed (when concurrent) is for the non-deterministic index value cases. It seems fine, but I do have a question about it that I'll ask inline.
The lint CI failure is the usual "same file, different lines" business, and the other one is gh-21957, so both can be ignored as well.
thanks both
Reference issue
See #20669
What does this implement/fix?
This is a continuation of the work done in #21496; this time, each module gets its own PR detailing the changes that ensure the tests pass under a concurrent scenario.
Additional information
See the description of #21496 for more detailed information.
cc @rgommers