ENH: spatial: ensure thread-safety #21955
Conversation
Thanks @andfoy. This looks pretty good! Just one question inline.
It is surprising that the tests pass given that gh-20655 is still open. If KDTree is unsafe but the current test suite doesn't find a problem, it seems important to try to add one or more new tests to gh-20655 that do exercise robustness under free-threading better.
And ... just after I wrote the above, my longer stress test with --parallel-threads=30 does turn up a problem in KDTree:
____________________________________________________________ test_ckdtree_parallel[cKDTree] ____________________________________________________________
scipy/spatial/tests/test_kdtree.py:896: in test_ckdtree_parallel
T2 = T.query(points, k=5, workers=-1)[-1]
T = <scipy.spatial._ckdtree.cKDTree object at 0x453e40b0070>
T1 = array([[ 0, 3354, 2409, 591, 4678],
[ 1, 4585, 1793, 3631, 2603],
[ 2, 842, 4689, 70, 2452],
...,
[4997, 590, 2104, 2254, 4105],
[4998, 4780, 3204, 1758, 4396],
[4999, 509, 1466, 972, 3261]])
k = 4
kdtree_type = <class 'scipy.spatial._ckdtree.cKDTree'>
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x453ab59b2d0>
n = 5000
points = array([[-1.93950036, 0.73885045, 1.39468453, -0.81358502],
[-0.818822 , -0.1027978 , 1.23934523, 0.3642774... [ 1.50415164, -0.24124136, -0.3741709 , -1.39281727],
[-0.14961262, 0.89597933, -2.11687003, -0.18395164]])
scipy/spatial/_ckdtree.pyx:789: in scipy.spatial._ckdtree.cKDTree.query
???
scipy/spatial/_ckdtree.pyx:396: in scipy.spatial._ckdtree.get_num_workers
???
E NotImplementedError: Cannot determine the number of cpus using os.cpu_count(), cannot use -1 for the number of workers
Fine to leave that alone here and deal with it in gh-20655.
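For reference, a rough sketch of the kind of free-threading stress test that could go into gh-20655 (not part of this PR; the thread count, data shapes, and helper names are arbitrary choices): hammer a shared cKDTree from several Python threads and compare against a single-threaded reference result.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.spatial import cKDTree

rng = np.random.default_rng(1234)
points = rng.standard_normal((5000, 4))
tree = cKDTree(points)

# Single-threaded reference answer.
expected = tree.query(points, k=5)[1]

def worker(_):
    # workers=1 keeps each thread's query single-threaded and avoids the
    # workers=-1 path, which relies on os.cpu_count().
    return tree.query(points, k=5, workers=1)[1]

n_threads = 16
with ThreadPoolExecutor(max_workers=n_threads) as pool:
    results = list(pool.map(worker, range(n_threads)))

for got in results:
    np.testing.assert_array_equal(got, expected)
```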
# check indices
if num_parallel_threads == 1 or starting_seed != 77098:
    assert actual[1:] == expected[1:]
This is a bit cryptic. The failure for parallel testing with this particular seed looks like:
__________________________________________________ TestHausdorff.test_subsets[A5-B5-seed5-expected5] ___________________________________________________
scipy/spatial/tests/test_hausdorff.py:173: in test_subsets
assert actual[1:] == expected[1:]
E AssertionError: assert (1, 1) == (0, 2)
E
E At index 0 diff: 1 != 0
E
E Full diff:
E (
E - 0,
E ? ^...
E
E ...Full output truncated (7 lines hidden), use '-vv' to show
A = [(-5, 3), (0, 0)]
B = [(0, 1), (0, 0), (-5, 3)]
actual = (0.0, 1, 1)
expected = (0.0, 0, 2)
num_parallel_threads = 2
seed = Generator(PCG64) at 0x27C647B9840
self = <scipy.spatial.tests.test_hausdorff.TestHausdorff object at 0x27c62574390>
starting_seed = 77098
I can't immediately tell why - is there a thread safety issue, or is there a deterministic reason for the mismatch?
According to the comment above (scipy/scipy/spatial/tests/test_hausdorff.py, line 158 in fd24a7a):
# NOTE: using a Generator changes the
the indices might change, which I imagine is what is occurring here.
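To make that concrete, here is a small stand-alone illustration (not the actual test) using the inputs from the failure above: both points of A occur verbatim in B, so the directed Hausdorff distance is 0.0 and more than one index pair attains it; which pair gets reported depends on how the inputs are shuffled internally, i.e. on the seed/Generator state.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

A = np.array([(-5, 3), (0, 0)], dtype=float)
B = np.array([(0, 1), (0, 0), (-5, 3)], dtype=float)

d, i, j = directed_hausdorff(A, B)
print(d, i, j)  # d == 0.0; the index pair may come out as (1, 1) or (0, 2)

# Whichever pair is reported, it is a valid witness (A[i] coincides with B[j]),
# so asserting on the exact indices only makes sense when the shuffling is
# deterministic.
assert d == 0.0
```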
Ah makes sense, thanks. @tylerjereddy you're the expert here, so you may want to have a peek at this tweak to the hausdorff tests, perhaps.
This all makes sense, as I note below. One question I have is how this relates to "genuine" concurrent implementations of the algorithm, like the Rust one I tried a few years ago in gh-14719 (which also notes the lack of determinism for degenerate inputs). Specifically, this allows threads to run concurrently so that in a given C program like this one ordering guarantees are loosened, but we don't necessarily get substantial performance improvements from that alone, right? I.e., we still likely need to write something with atomics/locks in the compiled backend to fully leverage the concurrency/distribute the work and so on?
Is that a reasonable understanding?
specifically, this allows threads to run concurrently so that in a given C program like this one ordering guarantees are loosened
The free-threading work allows Python-level threads created with the threading module to execute in parallel. If those call into C/C++/Cython code, that code will also run in parallel. With the default (with-GIL) CPython, that parallelism is (a) not possible for Python code, and (b) only possible in C/C++/Cython code if that code explicitly releases the GIL itself, and then re-acquires it before returning the result.
but we don't necessarily get substantial performance improvements from that alone right?
Correct. For a single function call it doesn't help. The speedups come from the end user starting to use threading.
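As a toy illustration of that point (not from this PR): the CPU-bound pure-Python snippet below runs its two threads essentially serially on a with-GIL build, but can occupy two cores on a free-threaded build such as 3.13t with PYTHON_GIL=0.

```python
import threading
import time

def busy(n=5_000_000):
    # CPU-bound pure-Python work; with the GIL, the two threads cannot
    # actually execute bytecode in parallel.
    total = 0
    for k in range(n):
        total += k * k
    return total

start = time.perf_counter()
threads = [threading.Thread(target=busy) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two threads took {time.perf_counter() - start:.2f}s")
```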
I.e., we still likely need to write something with atomics/locks in the compiled backend to fully leverage the concurrency/distribute the work and so on?
That's a separate thing entirely. For making a single hausdorff call run in parallel, we'd need to apply the workers= pattern, as done in scipy.fft for example. And that may need locks and/or atomics if there are shared data structures, yes.
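For reference, the existing workers= pattern as exposed by scipy.fft: a single call fans the batch out over threads inside the compiled backend, independently of whatever the caller does with threading.

```python
import numpy as np
import scipy.fft

x = np.random.default_rng(0).standard_normal((256, 4096))

y_serial = scipy.fft.fft(x)               # default: no extra workers
y_parallel = scipy.fft.fft(x, workers=4)  # backend splits the batch over 4 threads

np.testing.assert_allclose(y_serial, y_parallel)
```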
My review observations are similar to Ralf's:
- on latest main branch on my ARM Mac laptop, in a 3.13t venv with export PYTHON_GIL=0, python dev.py test -t scipy/spatial/tests -- --parallel-threads=10 has multiple failures and a hang on scipy/spatial/tests/test_distance.py
- on the new branch: all tests pass for spatial, so that seems like a net improvement
- like Ralf, I can also repro failures in test_kdtree.py if I go wild with the --parallel-threads setting, and agree on delaying that matter; KDTree internals are a nightmare that need specific reviewers
The Hausdorff shim makes sense to me since the ignored seed (when concurrent) is for the non-deterministic index value cases. It seems fine, but I do have a question about it that I'll ask inline.
The lint CI failure is the usual "same file, different lines" business, and the other one is gh-21957, so both can be ignored as well.
thanks both
Reference issue
See #20669
What does this implement/fix?
This is a continuation of the work done in #21496; this time, each module gets its own PR detailing the changes that ensure the tests pass under a concurrent scenario.
Additional information
See the description of #21496 for more detailed information.
cc @rgommers