ENH: stats.qmc: add bounds parameters to PoissonDisk #20767

Merged (7 commits) on Jun 20, 2024

Matryoskas
Contributor

Reference issue

Closes gh-20288

What does this implement/fix?

Add optional lower-bound and upper-bound parameters (l_bounds and u_bounds) to scipy.stats.qmc.PoissonDisk, so that samples can be drawn directly in the target domain with the same radius along every axis. Only candidate points that fall inside the specified bounds are accepted and added to the sample.
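As a rough illustration of the motivation (not code from this PR), the sketch below shows why rescaling a unit-hypercube sample afterwards does not preserve the radius. The bounds, the point pairs, and the helper names `rescale` and `separation` are assumptions made up for this example; the affine map is the usual `l + x * (u - l)` rescaling.

```python
import numpy as np

# Hypothetical target bounds, chosen only for illustration.
l_bounds = np.array([0.5, 0.75])
u_bounds = np.array([2.0, 1.5])
radius = 0.2

# Two pairs of points in the unit square, each exactly `radius` apart:
# one pair separated along x_1, the other along x_2.
pair_x = np.array([[0.1, 0.5], [0.3, 0.5]])
pair_y = np.array([[0.5, 0.1], [0.5, 0.3]])


def rescale(points):
    """Affine map from the unit hypercube to [l_bounds, u_bounds]."""
    return l_bounds + points * (u_bounds - l_bounds)


def separation(points):
    """Euclidean distance between the two rows of `points`."""
    return float(np.linalg.norm(points[1] - points[0]))


dist_x = separation(rescale(pair_x))  # stretched by the x_1 extent, 1.5
dist_y = separation(rescale(pair_y))  # shrunk by the x_2 extent, 0.75
print(dist_x, dist_y)
```

After rescaling, the pair separated along x_2 ends up closer than `radius`, while the pair separated along x_1 ends up farther apart, so the minimum-distance guarantee no longer holds in the target domain.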

@Matryoskas requested a review from tupui as a code owner on May 22, 2024, 16:53
github-actions bot added the scipy.stats and enhancement labels on May 22, 2024
Member

@tupui tupui left a comment

Thank you for the PR @Matryoskas 🚀

@mplough-kobold if you have time, it would be great if you could also have a look.

From a cursory look now, I think we are missing something for the lower bound. I think it would be easier to verify all that if you showed some figures, e.g. using this code that I had for the initial PR.

Figure code
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from scipy.stats import qmc


rng = np.random.default_rng()
radius = 0.2
engine = qmc.PoissonDisk(d=2, radius=radius, seed=rng)
sample = engine.random(20)

fig, ax = plt.subplots()
_ = ax.scatter(sample[:, 0], sample[:, 1])
# Circles of radius/2 around each point: they must not overlap
# if every pair of points is at least `radius` apart.
circles = [plt.Circle((xi, yi), radius=radius/2, fill=False)
           for xi, yi in sample]
collection = PatchCollection(circles, match_original=True)
ax.add_collection(collection)
_ = ax.set(aspect='equal', xlabel=r'$x_1$', ylabel=r'$x_2$',
           xlim=[0, 1], ylim=[0, 1])
plt.show()

Comment on lines +2061 to +2063
for i in range(self.d):
    if (sample[i] > self.u_bounds[i] or sample[i] < self.l_bounds[i]):
        return False
Member

We should be able to do this in a vectorised way to hopefully not get too much impact on speed.

Did you benchmark your changes? We actually don't have a benchmark for this function; maybe we can consider adding one.

Contributor Author

I have added the benchmark in my last commit and I tested it with the changes I think you suggested, replacing that section of the code with:

if np.any((sample > self.u_bounds) | (sample < self.l_bounds)):
    return False

However, this actually worsened its performance. What should I do?

Member

I would guess that it's a matter of size of the array. Do you have the numbers? I would expect that testing on a larger array would make the NumPy version faster. So I would use the NumPy version.
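One way to probe that guess is to time both variants on a single point of increasing dimension. This is only a sketch: the function names `in_bounds_loop`, `in_bounds_vectorized`, and `time_both` are hypothetical stand-ins for the method under review, and the timings depend on the machine, so no numbers are quoted here.

```python
import timeit

import numpy as np


def in_bounds_loop(sample, l_bounds, u_bounds):
    """Per-dimension Python loop, as in the first version of the check."""
    for i in range(len(sample)):
        if sample[i] > u_bounds[i] or sample[i] < l_bounds[i]:
            return False
    return True


def in_bounds_vectorized(sample, l_bounds, u_bounds):
    """Single NumPy expression over all dimensions."""
    return not np.any((sample > u_bounds) | (sample < l_bounds))


def time_both(d, number=1000, seed=0):
    """Return (loop_seconds, vectorized_seconds) for one d-dimensional point."""
    rng = np.random.default_rng(seed)
    l_bounds, u_bounds = np.zeros(d), np.ones(d)
    # A point inside the bounds forces the loop to visit every dimension.
    sample = rng.random(d)
    loop = timeit.timeit(
        lambda: in_bounds_loop(sample, l_bounds, u_bounds), number=number)
    vec = timeit.timeit(
        lambda: in_bounds_vectorized(sample, l_bounds, u_bounds), number=number)
    return loop, vec


for d in (2, 10, 1000):
    loop, vec = time_both(d)
    print(f"d={d}: loop {loop:.4f}s, vectorized {vec:.4f}s")
```

For very small `d`, the fixed overhead of the NumPy calls can outweigh the cost of a short Python loop, which would be consistent with a slowdown in low-dimensional benchmarks.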

Contributor Author

Before:

=== ======== ============= ========== ========== ==========
--                                        n                
-------------------------- --------------------------------
d   radius   ncandidates      30        100        300    
=== ======== ============= ========== ========== ==========
1    0.2          30       3.13±0ms   2.39±0ms   2.73±0ms 
1    0.2          60       4.65±0ms   3.99±0ms   3.94±0ms 
1    0.2         120       6.04±0ms   6.01±0ms   5.92±0ms 
1    0.1          30       4.00±0ms   3.97±0ms   3.94±0ms 
1    0.1          60       6.71±0ms   6.69±0ms   6.67±0ms 
1    0.1         120       12.6±0ms   12.7±0ms   12.6±0ms 
1    0.05         30       7.72±0ms   7.17±0ms   7.32±0ms 
1    0.05         60       13.4±0ms   13.1±0ms   13.3±0ms 
1    0.05        120       29.5±0ms   25.2±0ms   25.0±0ms 
3    0.2          30       5.35±0ms   52.3±0ms   52.1±0ms 
3    0.2          60       6.20±0ms   99.9±0ms   104±0ms  
3    0.2         120       8.81±0ms   158±0ms    222±0ms  
3    0.1          30       4.69±0ms   19.6±0ms   85.1±0ms 
3    0.1          60       6.45±0ms   26.1±0ms   105±0ms  
3    0.1         120       9.11±0ms   37.0±0ms   195±0ms  
3    0.05         30       4.19±0ms   18.2±0ms   62.9±0ms 
3    0.05         60       5.59±0ms   24.4±0ms   86.4±0ms 
3    0.05        120       6.91±0ms   42.2±0ms   166±0ms  
5    0.2          30       6.37±0ms   19.1±0ms   61.8±0ms 
5    0.2          60       6.69±0ms   20.0±0ms   76.1±0ms 
5    0.2         120       7.29±0ms   26.4±0ms   98.6±0ms 
5    0.1          30       23.7±0ms   35.7±0ms   83.4±0ms 
5    0.1          60       23.4±0ms   40.4±0ms   95.4±0ms 
5    0.1         120       23.2±0ms   46.3±0ms   108±0ms  
5    0.05         30       487±0ms    503±0ms    566±0ms  
5    0.05         60       489±0ms    506±0ms    579±0ms  
5    0.05        120       488±0ms    508±0ms    581±0ms  
=== ======== ============= ========== ========== ==========

After:

=== ======== ============= ========== ========== ==========
--                                        n                
-------------------------- --------------------------------
d   radius   ncandidates      30        100        300    
=== ======== ============= ========== ========== ==========
1    0.2          30       3.05±0ms   3.04±0ms   2.99±0ms 
1    0.2          60       5.17±0ms   5.13±0ms   5.11±0ms 
1    0.2         120       8.16±0ms   8.20±0ms   8.13±0ms 
1    0.1          30       4.93±0ms   4.97±0ms   5.01±0ms 
1    0.1          60       8.32±0ms   8.40±0ms   8.35±0ms 
1    0.1         120       16.3±0ms   16.3±0ms   16.4±0ms 
1    0.05         30       8.84±0ms   8.88±0ms   8.91±0ms 
1    0.05         60       17.1±0ms   16.2±0ms   16.5±0ms 
1    0.05        120       32.6±0ms   32.3±0ms   32.9±0ms 
3    0.2          30       6.16±0ms   61.7±0ms   61.9±0ms 
3    0.2          60       7.12±0ms   121±0ms    128±0ms  
3    0.2         120       10.3±0ms   187±0ms    254±0ms  
3    0.1          30       5.41±0ms   22.0±0ms   95.9±0ms 
3    0.1          60       7.28±0ms   29.6±0ms   121±0ms  
3    0.1         120       10.2±0ms   42.1±0ms   221±0ms  
3    0.05         30       4.44±0ms   19.7±0ms   70.9±0ms 
3    0.05         60       6.10±0ms   25.7±0ms   98.2±0ms 
3    0.05        120       7.59±0ms   49.2±0ms   184±0ms  
5    0.2          30       6.27±0ms   20.1±0ms   66.0±0ms 
5    0.2          60       7.12±0ms   22.8±0ms   81.1±0ms 
5    0.2         120       7.63±0ms   28.1±0ms   104±0ms  
5    0.1          30       24.1±0ms   36.3±0ms   83.7±0ms 
5    0.1          60       23.4±0ms   41.3±0ms   99.7±0ms 
5    0.1         120       23.3±0ms   47.0±0ms   110±0ms  
5    0.05         30       489±0ms    501±0ms    548±0ms  
5    0.05         60       488±0ms    509±0ms    586±0ms  
5    0.05        120       484±0ms    516±0ms    582±0ms  
=== ======== ============= ========== ========== ==========
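To put the two tables in proportion, a few rows can be compared as ratios. The snippet below is a small arithmetic sketch using only values copied from the n=30 column of the tables labelled "Before" and "After"; the choice of rows is arbitrary.

```python
# Timings in ms copied from the n=30 column of the two tables above,
# keyed by (d, radius, ncandidates).
before = {(1, 0.2, 120): 6.04, (3, 0.2, 120): 8.81, (5, 0.05, 120): 488.0}
after = {(1, 0.2, 120): 8.16, (3, 0.2, 120): 10.3, (5, 0.05, 120): 484.0}

# Ratio > 1 means the "After" run was slower for that configuration.
ratios = {key: after[key] / before[key] for key in before}
for (d, radius, ncandidates), ratio in ratios.items():
    print(f"d={d}, radius={radius}, ncandidates={ncandidates}: {ratio:.2f}x")
```

For these rows, the relative difference is largest for the cheapest configuration and close to 1 for the most expensive one.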


  with np.errstate(divide='ignore'):
      self.cell_size = self.radius / np.sqrt(self.d)
      self.grid_size = (
-         np.ceil(np.ones(self.d) / self.cell_size)
+         np.ceil(self.u_bounds / self.cell_size)
Member

Ok but then how does that take into account the lower bounds?

Contributor Author

On line 2072, ind_min = np.maximum(indices - n, self.l_bounds.astype(int)); this is where the lower bounds are taken into account.

Member

But shouldn't the lower bound also affect the grid size?

Side note: please do not resolve conversations. Maintainers in SciPy typically like to close them themselves, as it helps us keep track of our review.

Contributor Author

You are correct. I took a look at the algorithm from the original paper and made some updates to the code. Lower bounds should now be properly taken into account.
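For readers following the thread, here is a minimal sketch of how a background grid can account for both bounds. It is not necessarily the code that was merged; `grid_layout` and `cell_index` are hypothetical helper names, and the bounds are example values.

```python
import numpy as np


def grid_layout(radius, l_bounds, u_bounds):
    """Cell size and per-axis cell counts covering [l_bounds, u_bounds]."""
    l_bounds = np.asarray(l_bounds, dtype=float)
    u_bounds = np.asarray(u_bounds, dtype=float)
    d = l_bounds.size
    # A cell of side radius / sqrt(d) has diagonal `radius`, so it can hold
    # at most one accepted point.
    cell_size = radius / np.sqrt(d)
    # The grid spans the extent of the box, not the distance from the origin.
    grid_size = np.ceil((u_bounds - l_bounds) / cell_size).astype(int)
    return cell_size, grid_size


def cell_index(point, l_bounds, cell_size):
    """Grid indices of `point`, measured from the lower corner of the box."""
    offset = np.asarray(point, dtype=float) - np.asarray(l_bounds, dtype=float)
    return np.floor(offset / cell_size).astype(int)


cell_size, grid_size = grid_layout(0.2, [0.5, 0.75], [2.0, 1.5])
print(cell_size, grid_size)
print(cell_index([0.5, 0.75], [0.5, 0.75], cell_size))
```

With these example bounds the grid has 11 by 6 cells. Sizing it from the upper bounds alone would give 15 by 11 cells, most of which lie outside the sampling box.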

@GreenMemory16
Contributor

@tupui can you take a look at the PR when you have the chance?

@tupui
Member

tupui commented Jun 16, 2024

> @tupui can you take a look at the pr, when you have the chance?

Hi @GreenMemory16 I was waiting for you. I would still like to see a few plots demonstrating that it does work as expected (cf my previous message).

For the benchmark, I would be fine with it as it is, because I don't think it's very critical. If we wanted to be sure, we would need to do something like d=10 and n=1000. Otherwise the difference is probably insignificant.

@GreenMemory16
Contributor

Ah, yes. I'll do it as soon as possible. Sorry about that.

@Matryoskas
Contributor Author

@tupui Is this good enough?

Before

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from scipy.stats import qmc

rng = np.random.default_rng()
radius = 0.2
engine = qmc.PoissonDisk(d=2, radius=radius, seed=rng)
sample = engine.fill_space()
sample = qmc.scale(sample, [0.5, 0.75], [2, 1.5])

fig, ax = plt.subplots()
_ = ax.scatter(sample[:, 0], sample[:, 1])
circles = [plt.Circle((xi, yi), radius=radius/2, fill=False)
           for xi, yi in sample]
collection = PatchCollection(circles, match_original=True)
ax.add_collection(collection)
_ = ax.set(aspect='equal', xlabel=r'$x_1$', ylabel=r'$x_2$',
           xlim=[0.5, 2], ylim=[0.75, 1.5])
plt.show()

[Figure: before]

After

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from scipy.stats import qmc

rng = np.random.default_rng()
radius = 0.2
engine = qmc.PoissonDisk(d=2, radius=radius, seed=rng, l_bounds=[0.5, 0.75], u_bounds=[2, 1.5])
sample = engine.fill_space()

fig, ax = plt.subplots()
_ = ax.scatter(sample[:, 0], sample[:, 1])
circles = [plt.Circle((xi, yi), radius=radius/2, fill=False)
           for xi, yi in sample]
collection = PatchCollection(circles, match_original=True)
ax.add_collection(collection)
_ = ax.set(aspect='equal', xlabel=r'$x_1$', ylabel=r'$x_2$',
           xlim=[0.5, 2], ylim=[0.75, 1.5])
plt.show()

[Figure: after]
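The plots can be complemented by a numerical check. The sketch below assumes a hypothetical helper, `check_sample`, that reports whether every point lies inside the bounds and what the smallest pairwise distance is; the points are hand-made stand-ins, not output of `PoissonDisk`.

```python
import numpy as np


def check_sample(sample, radius, l_bounds, u_bounds):
    """Return (all_inside, min_distance) for an (n, d) array of points."""
    sample = np.asarray(sample, dtype=float)
    all_inside = bool(np.all((sample >= l_bounds) & (sample <= u_bounds)))
    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    diff = sample[:, None, :] - sample[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Ignore the zero distance of each point to itself.
    np.fill_diagonal(dist, np.inf)
    return all_inside, float(dist.min())


# Hand-made points standing in for `sample`; not output of PoissonDisk.
points = np.array([[0.5, 0.75], [0.8, 0.75], [0.8, 1.15], [2.0, 1.5]])
inside, min_dist = check_sample(points, 0.2, [0.5, 0.75], [2.0, 1.5])
print(inside, min_dist)
```

Passing the `sample` array from either script above in place of `points` would show whether the minimum distance stays at or above `radius` in the target domain.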

Member

@tupui tupui left a comment

Thank you 🙏 this looks good to me now! It's great to have this feature.

@dschmitz89 I think I saw you having a look. Do you have an opinion here? If not I plan to merge next week.

@dschmitz89
Contributor

> Thank you 🙏 this looks good to me now! It's great to have this feature.
>
> @dschmitz89 I think I saw you having a look. Do you have an opinion here? If not I plan to merge next week.

I commented with a question but noticed my own stupidity a minute later, so deleted it :D. This LGTM.

@tupui
Member

tupui commented Jun 20, 2024

No worries 😉 there is no such thing as a stupid question, better ask than stay silent and miss something. So thank you!

Getting this in then!

Thanks again, everyone 🙌 great work.

@tupui merged commit 4c876b9 into scipy:main on Jun 20, 2024
@tupui added this to the 1.15.0 milestone on Jun 20, 2024
@lucascolley added the needs-release-note label on Jun 21, 2024
@lucascolley removed the needs-release-note label on Jun 9, 2025
Labels: enhancement, scipy.stats
Linked issue (successfully merging this pull request may close it): ENH: Poisson disk sampling for arbitrary bounds
5 participants