Unify softmax implementations #826
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #826      +/-   ##
==========================================
+ Coverage   96.63%   96.73%    +0.09%
==========================================
  Files          64       65        +1
  Lines        5081     5077        -4
  Branches      879      880        +1
==========================================
+ Hits         4910     4911        +1
+ Misses         88       85        -3
+ Partials       83       81        -2
```
There's also softmin being used for segmentation, right?
Could you run each of the main user-facing methods once (i.e. the find_label_issues analog from each tutorial notebook) and verify the outputs have not changed before vs. after this PR? I'm not 100% confident an introduced mathematical error would be caught without this manual check.
- Update type hint for `min_entropy_ind` from built-in `int` to `np.intp`.
- This refinement addresses a type compatibility warning.
- `np.intp` is the integer type used by NumPy for indexing and can differ in size from the built-in Python `int` depending on the platform (32-bit vs. 64-bit).
- Mypy highlighted this type hint discrepancy.
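A minimal illustration of why this hint matters: `np.argmin` returns an `np.intp` scalar rather than a built-in `int`, which is what mypy flags. The variable names here are illustrative, not the actual cleanlab code.

```python
import numpy as np

entropies = np.array([0.9, 0.2, 0.7])

# np.argmin returns an np.intp scalar (platform-dependent width),
# not Python's built-in int — hence the updated type hint.
min_entropy_ind: np.intp = np.argmin(entropies)

print(isinstance(min_entropy_ind, np.intp))  # True
print(int(min_entropy_ind))                  # 1
```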
Verified that the outputs of the corresponding methods are not affected.
Summary
This PR refactors the usage of the softmax function across various modules within the cleanlab package. A dedicated softmax function is introduced to enhance code reusability and improve numerical stability, ensuring the codebase remains robust and maintainable.
Changes
- Introduced a `softmax` utility function: located in `cleanlab/internal/numerics.py`, it includes options for softmax temperature, selection of axis, and a numeric-stability shift.
- Replaced local implementations with the `softmax` utility function across multiple modules (multiannotator_utils, multilabel_scorer, object_detection_utils, and token_classification/rank).
- Refactored the `find_best_temp_scaler` logic: the temperature scaling in the multiannotator_utils module now uses the new softmax function. This also led to the introduction of the helper function `_set_fine_search_range` to better organize and segregate logic.

The PR assumes that the current testing suite covers these modules and that any potential deviations would be flagged. It does not touch related implementations such as softmin.
Usage
The `softmax` function can be used for both 1D and 2D arrays. Here's a quick demonstration:
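The original snippet was not preserved here, so below is a minimal, self-contained sketch of the behavior the PR describes. The parameter names `temperature`, `axis`, and `shift` are assumptions based on the description above ("options for softmax temperature, selection of axis, and numeric stability shift"), not the exact signature in `cleanlab/internal/numerics.py`.

```python
import numpy as np

def softmax(x, temperature=1.0, axis=-1, shift=True):
    """Numerically stable softmax with optional temperature scaling (sketch)."""
    x = np.asarray(x, dtype=float) / temperature
    if shift:
        # Subtract the max along `axis` so np.exp never overflows.
        x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# 1D array
print(softmax([1.0, 2.0, 3.0]))  # approximately [0.09, 0.245, 0.665]

# 2D array: softmax applied row-wise
print(softmax([[1.0, 2.0], [3.0, 4.0]], axis=1))

# The stability shift prevents overflow for large logits
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```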