Questions about fit_regression function #422
In and of itself that sounds really slow. But may I ask, how big is the model, as in the shape of one sample evaluation? If it is big, that could make sense.
The polynomial expansion with 6 parameters at 4th order has shape (6, 210). For the PCE, I then sample 210 points from the joint PDF with the latin-hypercube sampling method and evaluate the model. The output of the model is a (210, 1000) shaped array. Fitting a regression of this size takes pretty long. I was able to reduce times by being smarter about the model output shape: some of my simulations have fewer than 1000 output points, and I can save time that way. But some others don't, because of the output features. So in this process I came up with more questions:
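As a side note, the 210 basis terms quoted above follow directly from the size of a total-degree polynomial basis: with 6 variables and maximum order 4, the number of terms is C(6+4, 4). A quick check:

```python
from math import comb

n_vars, order = 6, 4
# Size of a total-degree basis: (n_vars + order) choose order.
n_terms = comb(n_vars + order, order)
print(n_terms)  # 210
```

This is also why the sampling above uses 210 points: one sample per basis term is the minimum for the regression system to be square.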
Sorry for the long post, and thanks for answering back!
210 000 values should not be an issue performance-wise. Using the default method of lstsq should do that fast. If it is slow, that sounds like an issue with numpy. Just a sanity check: what is your output's
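For scale, the core least-squares solve for the sizes described in this thread (a 210×210 design matrix against a 210×1000 output array) is a single `numpy.linalg.lstsq` call over all outputs at once, and should take a fraction of a second on typical hardware. A minimal sketch with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
design = rng.standard_normal((210, 210))    # one row per sample, one column per basis term
outputs = rng.standard_normal((210, 1000))  # 1000 output features per sample

# Solve all 1000 regressions in one call; lstsq broadcasts over the columns of `outputs`.
coeffs, *_ = np.linalg.lstsq(design, outputs, rcond=None)
print(coeffs.shape)  # (210, 1000)
```

If the fit is much slower than this in practice, the overhead is likely elsewhere (e.g. assembling the design matrix or post-processing), not in the solve itself.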
So this is the way I am calling fit_regression:
Gaussian sampling doesn't refer to the Gaussian distribution, but to optimal Gaussian quadrature. If you want to use the pseudo-spectral approach, use Clenshaw–Curtis for your samples: fast, simple, stable, well tailored for uniform distributions, and most of all, nested nearly perfectly with sparse grids. Your distributions seem to be a bit ill-posed. As for crashes, I think you should use regression. You will have a better chance of getting good results in case you need to remove some samples.
I am trying to build a surrogate model using PCE for 6 unknown variables and 4th-order polynomials.
I build my samples using latin-hypercube sampling and evaluate all the samples with my models. I have successfully parallelized the evaluation procedure and am able to run 200-ish evaluations (6 variables and 4th-order polynomials) in a few minutes. However, the last step of the procedure, i.e., creating a surrogate model and obtaining the Sobol indices, takes a long time, about the same time it takes to run the evaluations.
I would like to know if this is normal. Maybe I am messing up the multiprocessing and there are some problems with that.