-
Notifications
You must be signed in to change notification settings - Fork 32
Remove scikit-learn as a dependency #774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AJTheDataGuy thanks for this PR.
I have made two minor comments for your consideration.
To anyone reviewing this (perhaps @nicholasloveday), AJ has left some additional comments on testing in issue #636 which should be reviewed also. |
Hi @nicholasloveday I just ran some tests. Using a modified version of the testing script I ran 4 tests. All 4 tests used x,y, and weight arrays of 10 million elements each and I used cProfile to time the results for the overall script, running cProfile from the command line. The 4 tests I ran were:
1 run results:
50 run results:
These initial tests suggest Scipy is a nice performance enhance over Scikit-learn for this use case. I've attached the python test files (start with "test") and well as the results files (start with "result") if you'd like to take a look. test_scipy_50.txt result_scikit_1.txt @Steph-Chong I will make those changes soon. |
Thanks for checking that @AJTheDataGuy - that's great that there are some performance improvements! |
Co-authored-by: Stephanie Chong <168800785+Steph-Chong@users.noreply.github.com> Signed-off-by: Tennessee Leeuwenburg <134973832+tennlee@users.noreply.github.com>
643b9e7
to
a5d0ecc
Compare
Hi All - just made some updates and pushed a commit to the feature branch to my fork. Updates:
3A. I noticed that the x array parameter was no longer needed on the _contiguous_mean_ir function so removed that.
Pytest passing all tests at the end. Can someone please rerun the workflows? |
Thanks very much for this work. It's really helpful and it's a great feeling to reduce the number of dependencies. |
* Refactored isotonic regression function from scikit-learn to scipy Please note, this was manually speed tested, and proved to be faster according to that test regime. So this PR reduces complexity and may speed things up at the same time. --------- Signed-off-by: Tennessee Leeuwenburg <134973832+tennlee@users.noreply.github.com> Co-authored-by: AJ Fisher <AJTheDataGuy@gmail.com> Co-authored-by: Tennessee Leeuwenburg <134973832+tennlee@users.noreply.github.com> Co-authored-by: Stephanie Chong <168800785+Steph-Chong@users.noreply.github.com>
Fixes #636 |
* Refactored isotonic regression function from scikit-learn to scipy Please note, this was manually speed tested, and proved to be faster according to that test regime. So this PR reduces complexity and may speed things up at the same time. --------- Signed-off-by: Tennessee Leeuwenburg <134973832+tennlee@users.noreply.github.com> Co-authored-by: AJ Fisher <AJTheDataGuy@gmail.com> Co-authored-by: Tennessee Leeuwenburg <134973832+tennlee@users.noreply.github.com> Co-authored-by: Stephanie Chong <168800785+Steph-Chong@users.noreply.github.com>
Please work through the following checklists. Delete anything that isn't relevant.
Development for new xarray-based metrics
reduce_dims
,preserve_dims
, andweights
args.xr.DataArrays
andxr.Datasets
if possibleDocstrings
Testing of new xarray-based metrics
xr.DataArrays
andxr.Datasets
Tutorial notebook
Documentation