ENH: stats.qmc: add bounds parameters to PoissonDisk #20767
Co-authored-by: Diogo Pires <[email protected]>
Thank you for the PR @Matryoskas 🚀
@mplough-kobold if you have time, it would be great if you could also have a look.
From a cursory look now, I think we are missing something for the lower bound. I think it would be easier to verify all that if you showed some figures, e.g. using this code that I had for the initial PR.
Figure code
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from scipy.stats import qmc

rng = np.random.default_rng()
radius = 0.2
engine = qmc.PoissonDisk(d=2, radius=radius, seed=rng)
sample = engine.random(20)

fig, ax = plt.subplots()
_ = ax.scatter(sample[:, 0], sample[:, 1])
# Draw a circle of radius/2 around each point; the circles should not overlap.
circles = [plt.Circle((xi, yi), radius=radius/2, fill=False)
           for xi, yi in sample]
collection = PatchCollection(circles, match_original=True)
ax.add_collection(collection)
_ = ax.set(aspect='equal', xlabel=r'$x_1$', ylabel=r'$x_2$',
           xlim=[0, 1], ylim=[0, 1])
plt.show()
for i in range(self.d):
    if (sample[i] > self.u_bounds[i] or sample[i] < self.l_bounds[i]):
        return False
We should be able to do this in a vectorised way to hopefully not get too much impact on speed.
Did you benchmark your changes? We actually don't have a benchmark for this function, maybe we can consider adding one.
I have added the benchmark in my last commit and I tested it with the changes I think you suggested, replacing that section of the code with:
if np.any((sample > self.u_bounds) | (sample < self.l_bounds)):
    return False
However, this actually worsened its performance. What should I do?
I would guess that it's a matter of size of the array. Do you have the numbers? I would expect that testing on a larger array would make the NumPy version faster. So I would use the NumPy version.
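To make the array-size argument concrete, here is a minimal sketch comparing the two checks. The function names `in_bounds_loop` and `in_bounds_vectorised` are hypothetical stand-ins for the method under review, not names from the PR. Timings depend on the machine, so none are quoted; the expectation is only that NumPy's fixed per-call overhead matters most when the candidate has very few dimensions.

```python
import timeit

import numpy as np


def in_bounds_loop(sample, l_bounds, u_bounds):
    """Per-dimension check, mirroring the loop in the diff."""
    for i in range(len(sample)):
        if sample[i] > u_bounds[i] or sample[i] < l_bounds[i]:
            return False
    return True


def in_bounds_vectorised(sample, l_bounds, u_bounds):
    """Same check expressed with NumPy boolean arrays."""
    return not np.any((sample > u_bounds) | (sample < l_bounds))


rng = np.random.default_rng(0)
for d in (2, 10, 1000):
    l_bounds, u_bounds = np.zeros(d), np.ones(d)
    inside = rng.uniform(0, 1, size=d)
    outside = inside.copy()
    outside[-1] = 1.5  # violate the upper bound in the last dimension

    # Both versions must agree before their timings are worth comparing.
    for point in (inside, outside):
        assert (in_bounds_loop(point, l_bounds, u_bounds)
                == in_bounds_vectorised(point, l_bounds, u_bounds))

    t_loop = timeit.timeit(
        lambda: in_bounds_loop(inside, l_bounds, u_bounds), number=200)
    t_vec = timeit.timeit(
        lambda: in_bounds_vectorised(inside, l_bounds, u_bounds), number=200)
    print(f"d={d}: loop {t_loop:.4f}s, vectorised {t_vec:.4f}s")
```

The benchmark tables in this thread only go up to d=5, which is the regime where the loop has the least work to do.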
Before:
=== ======== ============= ========== ========== ==========
-- n
-------------------------- --------------------------------
d radius ncandidates 30 100 300
=== ======== ============= ========== ========== ==========
1 0.2 30 3.13±0ms 2.39±0ms 2.73±0ms
1 0.2 60 4.65±0ms 3.99±0ms 3.94±0ms
1 0.2 120 6.04±0ms 6.01±0ms 5.92±0ms
1 0.1 30 4.00±0ms 3.97±0ms 3.94±0ms
1 0.1 60 6.71±0ms 6.69±0ms 6.67±0ms
1 0.1 120 12.6±0ms 12.7±0ms 12.6±0ms
1 0.05 30 7.72±0ms 7.17±0ms 7.32±0ms
1 0.05 60 13.4±0ms 13.1±0ms 13.3±0ms
1 0.05 120 29.5±0ms 25.2±0ms 25.0±0ms
3 0.2 30 5.35±0ms 52.3±0ms 52.1±0ms
3 0.2 60 6.20±0ms 99.9±0ms 104±0ms
3 0.2 120 8.81±0ms 158±0ms 222±0ms
3 0.1 30 4.69±0ms 19.6±0ms 85.1±0ms
3 0.1 60 6.45±0ms 26.1±0ms 105±0ms
3 0.1 120 9.11±0ms 37.0±0ms 195±0ms
3 0.05 30 4.19±0ms 18.2±0ms 62.9±0ms
3 0.05 60 5.59±0ms 24.4±0ms 86.4±0ms
3 0.05 120 6.91±0ms 42.2±0ms 166±0ms
5 0.2 30 6.37±0ms 19.1±0ms 61.8±0ms
5 0.2 60 6.69±0ms 20.0±0ms 76.1±0ms
5 0.2 120 7.29±0ms 26.4±0ms 98.6±0ms
5 0.1 30 23.7±0ms 35.7±0ms 83.4±0ms
5 0.1 60 23.4±0ms 40.4±0ms 95.4±0ms
5 0.1 120 23.2±0ms 46.3±0ms 108±0ms
5 0.05 30 487±0ms 503±0ms 566±0ms
5 0.05 60 489±0ms 506±0ms 579±0ms
5 0.05 120 488±0ms 508±0ms 581±0ms
=== ======== ============= ========== ========== ==========
After:
=== ======== ============= ========== ========== ==========
-- n
-------------------------- --------------------------------
d radius ncandidates 30 100 300
=== ======== ============= ========== ========== ==========
1 0.2 30 3.05±0ms 3.04±0ms 2.99±0ms
1 0.2 60 5.17±0ms 5.13±0ms 5.11±0ms
1 0.2 120 8.16±0ms 8.20±0ms 8.13±0ms
1 0.1 30 4.93±0ms 4.97±0ms 5.01±0ms
1 0.1 60 8.32±0ms 8.40±0ms 8.35±0ms
1 0.1 120 16.3±0ms 16.3±0ms 16.4±0ms
1 0.05 30 8.84±0ms 8.88±0ms 8.91±0ms
1 0.05 60 17.1±0ms 16.2±0ms 16.5±0ms
1 0.05 120 32.6±0ms 32.3±0ms 32.9±0ms
3 0.2 30 6.16±0ms 61.7±0ms 61.9±0ms
3 0.2 60 7.12±0ms 121±0ms 128±0ms
3 0.2 120 10.3±0ms 187±0ms 254±0ms
3 0.1 30 5.41±0ms 22.0±0ms 95.9±0ms
3 0.1 60 7.28±0ms 29.6±0ms 121±0ms
3 0.1 120 10.2±0ms 42.1±0ms 221±0ms
3 0.05 30 4.44±0ms 19.7±0ms 70.9±0ms
3 0.05 60 6.10±0ms 25.7±0ms 98.2±0ms
3 0.05 120 7.59±0ms 49.2±0ms 184±0ms
5 0.2 30 6.27±0ms 20.1±0ms 66.0±0ms
5 0.2 60 7.12±0ms 22.8±0ms 81.1±0ms
5 0.2 120 7.63±0ms 28.1±0ms 104±0ms
5 0.1 30 24.1±0ms 36.3±0ms 83.7±0ms
5 0.1 60 23.4±0ms 41.3±0ms 99.7±0ms
5 0.1 120 23.3±0ms 47.0±0ms 110±0ms
5 0.05 30 489±0ms 501±0ms 548±0ms
5 0.05 60 488±0ms 509±0ms 586±0ms
5 0.05 120 484±0ms 516±0ms 582±0ms
=== ======== ============= ========== ========== ==========
scipy/stats/_qmc.py
with np.errstate(divide='ignore'):
    self.cell_size = self.radius / np.sqrt(self.d)
    self.grid_size = (
-        np.ceil(np.ones(self.d) / self.cell_size)
+        np.ceil(self.u_bounds / self.cell_size)
Ok but then how does that take into account the lower bounds?
Line 2072, ind_min = np.maximum(indices - n, self.l_bounds.astype(int)), is where the lower bounds are taken into account.
But shouldn't the lower bound also affect the grid size?
Side note: please do not resolve conversations. Maintainers in SciPy typically like to close them themselves, as it helps us keep track of our review.
You are correct. I took a look at the algorithm from the original paper and made some updates to the code. Lower bounds should now be properly taken into account.
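A minimal sketch of how the lower bounds can enter the grid computation, assuming the background grid is laid over the box from `l_bounds` to `u_bounds` with cells of side `radius / sqrt(d)`. The helper names `grid_shape` and `cell_index` are hypothetical, and this is not necessarily the exact code that landed in the PR.

```python
import numpy as np


def grid_shape(radius, l_bounds, u_bounds):
    """Return the cell size and number of cells per dimension."""
    l_bounds = np.asarray(l_bounds, dtype=float)
    u_bounds = np.asarray(u_bounds, dtype=float)
    d = l_bounds.size
    cell_size = radius / np.sqrt(d)
    # The grid has to cover the extent of the box, not the distance
    # from the origin to the upper bounds.
    grid_size = np.ceil((u_bounds - l_bounds) / cell_size).astype(int)
    return cell_size, grid_size


def cell_index(point, cell_size, l_bounds):
    """Grid index of a point, measured from the lower corner of the box."""
    offset = np.asarray(point, dtype=float) - np.asarray(l_bounds, dtype=float)
    return np.floor(offset / cell_size).astype(int)


l_bounds, u_bounds = [0.5, 0.75], [2, 1.5]
cell_size, grid_size = grid_shape(0.2, l_bounds, u_bounds)
print(grid_size)
print(cell_index([0.5, 0.75], cell_size, l_bounds))
print(cell_index([1.99, 1.49], cell_size, l_bounds))
```

With these bounds the box measures 1.5 by 0.75, so the grid needs 11 by 6 cells. Sizing it from `u_bounds` alone would give 15 by 11 cells, and indices computed without subtracting `l_bounds` would never touch the cells nearest the origin.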
@tupui can you take a look at the PR when you have the chance?
Hi @GreenMemory16 I was waiting for you. I would still like to see a few plots demonstrating that it does work as expected (cf. my previous message). For the benchmark, I am fine with it as it is because I don't think it's very critical. If we wanted to be sure, we would need to do something like d=10 and n=1000. Otherwise that's probably insignificant.
Ah, yes. I'll do it as soon as possible. Sorry about that.
@tupui Is this good enough?
Before:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from scipy.stats import qmc

rng = np.random.default_rng()
radius = 0.2
engine = qmc.PoissonDisk(d=2, radius=radius, seed=rng)
sample = engine.fill_space()
sample = qmc.scale(sample, [0.5, 0.75], [2, 1.5])

fig, ax = plt.subplots()
_ = ax.scatter(sample[:, 0], sample[:, 1])
circles = [plt.Circle((xi, yi), radius=radius/2, fill=False)
           for xi, yi in sample]
collection = PatchCollection(circles, match_original=True)
ax.add_collection(collection)
_ = ax.set(aspect='equal', xlabel=r'$x_1$', ylabel=r'$x_2$',
           xlim=[0.5, 2], ylim=[0.75, 1.5])
plt.show()
After:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from scipy.stats import qmc

rng = np.random.default_rng()
radius = 0.2
engine = qmc.PoissonDisk(d=2, radius=radius, seed=rng,
                         l_bounds=[0.5, 0.75], u_bounds=[2, 1.5])
sample = engine.fill_space()

fig, ax = plt.subplots()
_ = ax.scatter(sample[:, 0], sample[:, 1])
circles = [plt.Circle((xi, yi), radius=radius/2, fill=False)
           for xi, yi in sample]
collection = PatchCollection(circles, match_original=True)
ax.add_collection(collection)
_ = ax.set(aspect='equal', xlabel=r'$x_1$', ylabel=r'$x_2$',
           xlim=[0.5, 2], ylim=[0.75, 1.5])
plt.show()
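Besides the plots, the two properties can also be checked numerically. The sketch below is a NumPy-only helper, with the hypothetical name `check_poisson_disk`, that reports whether every point lies inside the bounds and what the smallest pairwise distance is. It is demonstrated on hand-written points so that it runs without the PR branch installed.

```python
import numpy as np


def check_poisson_disk(sample, radius, l_bounds, u_bounds):
    """Return (all points in bounds, smallest pairwise distance)."""
    sample = np.asarray(sample, dtype=float)
    in_bounds = bool(np.all((sample >= l_bounds) & (sample <= u_bounds)))
    # Full pairwise distance matrix; fine for the small samples plotted here.
    diff = sample[:, None, :] - sample[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore the distance of a point to itself
    return in_bounds, float(dist.min())


radius = 0.2
l_bounds, u_bounds = [0.5, 0.75], [2, 1.5]

# Hand-written points standing in for engine.fill_space().
good = np.array([[0.6, 0.8], [1.0, 1.0], [1.9, 1.4]])
in_bounds, min_dist = check_poisson_disk(good, radius, l_bounds, u_bounds)
print(in_bounds, min_dist >= radius)

# Two points that are both outside the box and too close together.
bad = np.array([[0.4, 0.8], [0.45, 0.8]])
bad_in_bounds, bad_min_dist = check_poisson_disk(bad, radius, l_bounds, u_bounds)
print(bad_in_bounds, bad_min_dist >= radius)
```

Passing the `sample` array from the "After" script instead of `good` would run the same check on the output of the new `l_bounds` and `u_bounds` parameters.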
Thank you 🙏 this looks good to me now! It's great to have this feature.
@dschmitz89 I think I saw you having a look. Do you have an opinion here? If not I plan to merge next week.
I commented with a question but noticed my own stupidity a minute later, so deleted it :D. This LGTM.
No worries 😉 there is no such thing as a stupid question, better ask than stay silent and miss something. So thank you! Getting this in then! Thanks again everyone 🙌 great work.
Reference issue
Closes gh-20288
What does this implement/fix?
Add optional lower bounds and upper bounds parameters to scipy.stats.qmc.PoissonDisk, to allow for equal scaling in the target sample. Only points that are inside the specified bounds are accepted and added to the target sample.