8000 Add Sobol' sampler from SciPy by tupui · Pull Request #519 · SALib/SALib · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Add Sobol' sampler from SciPy #519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 7, 2022
Merged

Conversation

tupui
Copy link
Member
@tupui tupui commented Aug 5, 2022

Closes #486, closes #450 and should close #478 as if the method does work better (which I am not convinced yet from my tests) should go in SciPy directly.

Following discussions in #486, this adds Sobol' sampler as a new sampler and deprecate salib.sample.saltelli.

Note that SciPy also has LHS (and Halton that can be used otherwise).

This needs SciPy to 1.7 and doing so I updated NumPy and Python minimal versions following NEP29.

This is a draft as I need some guidance on how to proceed next. E.g. I removed the skipping as we can scramble and should have better properties doing so.

cc @ConnectedSystems

@tupui
Copy link
Member Author
tupui commented Aug 5, 2022

I tried to update the ubuntu version in the CI but it would need more work. So probably better not to touch this here.

@ConnectedSystems
Copy link
Member

Thanks @tupui

I will take a closer look when I can (may take a week or two) but at first glance this looks like a very clean re-implementation.

Re skipping vs scrambling, I would prefer to support skipping anyway, as I think the user should be able to decide what to apply.

One thing I would like to see is expanded docstrings with more complete notes to cover the differences (I can do this if you like).

@tupui
Copy link
Member Author
tupui commented Aug 9, 2022

Great thanks @ConnectedSystems. It took me quite some time to do this as well 😉

I will take a closer look when I can (may take a week or two) but at first glance this looks like a very clean re-implementation.

It's highly based on the current Saltelli sampler. I just removed the skipping part.

Re skipping vs scrambling, I would prefer to support skipping anyway, as I think the user should be able to decide what to apply.

But what is the use case? As seen in #518. Skipping bring difficulties if one want's to resample. And it's also a complex logic to keep the properties.

Instead, if the use case is to have different samples, I would suggest adding a seed or random_state parameter. This way users can increment this instead and the scrambler does the correct thing behind the scene.

This brings up the scramble parameter. I added this almost just as a provision in case you want to set a different strategy, but as it is, it should always be True. There is no operational reasons to not scramble the sequence.

One thing I would like to see is expanded docstrings with more complete notes to cover the differences (I can do this if you like).

Sure, I can do this once we settle on the API.

@ConnectedSystems
Copy link
Member

But what is the use case? As seen in #518. Skipping bring difficulties if one want's to resample. And it's also a complex logic to keep the properties.

I understand the reservation - my view is that SALib is useful as a teaching tool as well as a research tool. Someone out there may be relying on it for some purpose (e.g., to show case different strategies). The code itself is useful if one were interested in seeing how skipping might be implemented.

Instead, if the use case is to have different samples, I would suggest adding a seed or random_state parameter

I think seed is best as it matches implementation of other methods.

@tupui
Copy link
Member Author
tupui commented Aug 15, 2022

Ok I can add back the skip parameter following what's in the current Saltelli method. And I will add seed.

@tupui
Copy link
Member Author
tupui commented Aug 17, 2022

I did the update.

Copy link
Member
@ConnectedSystems ConnectedSystems left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just some minor changes and a question for you.

qrng = qmc.Sobol(d=2*D, scramble=scramble, seed=seed)

# fast-forward logic
if skip_values > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it acceptable if skip_values > 0 and scramble == True?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever the answer, it would be worthwhile to add it to the docstring I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can combine both. I added a note in the docstrings as suggested.

saltelli_sequence = np.zeros([(Dg + 2) * N, D])
index = 0

for i in range(N):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not completely necessary but I suggest we replace these nested loops with itertools.product()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to go about this. The loops need to update index in the first level. Maybe this can be kept for latter. And actually there might be another way to write these with NumPy operations.

Copy link
Member Author
@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @ConnectedSystems. I updated the PR

qrng = qmc.Sobol(d=2*D, scramble=scramble, seed=seed)

# fast-forward logic
if skip_values > 0:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can combine both. I added a note in the docstrings as suggested.

saltelli_sequence = np.zeros([(Dg + 2) * N, D])
index = 0

for i in range(N):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to go about this. The loops need to update index in the first level. Maybe this can be kept for latter. And actually there might be another way to write these with NumPy operations.

@tupui
Copy link
Member Author
tupui commented Aug 24, 2022

@ConnectedSystems the CI failure is also present on main. I propose to fix this in another PR as I update the CI to use GitHub Actions.

@tupui tupui marked this pull request as ready for review August 24, 2022 14:50
@tupui
Copy link
Member Author
tupui commented Sep 6, 2022

I updated the PR with the recent changes in main. Dependencies are already up to date now.

@ConnectedSystems ConnectedSystems merged commit 547b73b into SALib:main Sep 7, 2022
@tupui tupui deleted the qmc_scipy branch September 7, 2022 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants
0