Fixes and improvements coming from test refactoring (part 3) #1848

luisfpereira · 2023-05-05T15:00:47Z

Follows #1828.

Main additions are:

repeat_out: quick way of repeating an output for vectorization consistency when some of the inputs are not used in the computation
AutodiffNotImplementedError exception: will be used later in the tests in a try...except statement to avoid having to skip tests that depend on autodiff when testing with numpy.

Several vectorization inconsistencies are also fixed.

ninamiolane

A first batch of comments!

ninamiolane · 2023-05-05T15:04:03Z

geomstats/_backend/numpy/autodiff.py

@@ -4,6 +4,8 @@
 The following functions return error messages.
 """

+from geomstats.exceptions import AutodiffNotImplementedError


Why do we need a special exception, since the MSG is not in the exception file?

Either:

Put the message in exception file
OR:

Only use RuntimeError with this MSG?
I'd prefer the second one, as it is easier for the reader.

If we keep that tailored exception, maybe change the name to:
UseAutodiffBackend?
Or
AutodiffNotInNumpy?

The goal of having a special exception is that we can run numpy tests under a context

    try:
        # test
    except AutodiffNotImplementedError:
        pass

This way we don't have to skip tests if they require autodiff (which removes a lot of boilerplate code in the tests and is much nicer for the user).

Regarding naming, I think AutodiffNotInNumpy is a better name than UseAutodiffBackend. Between AutodiffNotInNumpy and AutodiffNotImplementedError, I probably prefer the latter: if we ever add another backend that does not support autodiff, we will not have to change the exception name.

ninamiolane · 2023-05-05T15:04:39Z

geomstats/exceptions.py

+
+
+class AutodiffNotImplementedError(RuntimeError):
+    """Raised when autodiff is not implemented."""


I preferred the RuntimeError option, which seems less over-engineered, but there might be something I am missing here?

geomstats/geometry/euclidean.py

ninamiolane · 2023-05-05T15:09:26Z

geomstats/geometry/grassmannian.py

@@ -117,7 +118,7 @@ def _squared_dist(point_a, point_b, metric):
    _ : array-like, shape=[...,]
        Geodesic distance between point_a and point_b.
    """
-    return metric.private_squared_dist(point_a, point_b)
+    return metric._squared_dist(point_a, point_b)


Ah nice! How did you solve this? The very ugly "private_squared_dist" was introduced to get automatic differentiation with custom gradients to work with all backends.

Maybe it was not needed anymore since we dropped TF?

This is really just naming: I'm marking private with a leading _. Nothing changes from a logical perspective.

codecov · 2023-05-05T15:11:16Z

Codecov Report

Merging #1848 (b8d49d3) into master (6f289d3) will decrease coverage by 2.87%.
The diff coverage is 91.33%.

@@            Coverage Diff             @@
##           master    #1848      +/-   ##
==========================================
- Coverage   90.10%   87.22%   -2.87%     
==========================================
  Files         131      126       -5     
  Lines       13180    12402     -778     
==========================================
- Hits        11874    10816    -1058     
- Misses       1306     1586     +280

Flag	Coverage Δ
autograd	`87.22% <91.33%> (?)`
numpy	`?`
pytorch	`?`

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
geomstats/_backend/__init__.py	`82.26% <ø> (ø)`
geomstats/exceptions.py	`0.00% <0.00%> (ø)`
geomstats/geometry/_hyperbolic.py	`96.35% <ø> (+1.90%)`	⬆️
geomstats/geometry/discrete_surfaces.py	`93.34% <ø> (-4.00%)`	⬇️
geomstats/geometry/stratified/wald_space.py	`26.39% <0.00%> (-63.82%)`	⬇️
geomstats/learning/aac.py	`45.60% <0.00%> (-52.80%)`	⬇️
.../learning/agglomerative_hierarchical_clustering.py	`100.00% <ø> (ø)`
geomstats/learning/exponential_barycenter.py	`100.00% <ø> (ø)`
geomstats/learning/geodesic_regression.py	`83.77% <ø> (+3.90%)`	⬆️
geomstats/visualization/hypersphere.py	`69.92% <ø> (-0.26%)`	⬇️
... and 75 more

... and 22 files with indirect coverage changes

ninamiolane · 2023-05-05T15:11:23Z

geomstats/geometry/invariant_metric.py

-        """Reshape diagonal metric matrix to a symmetric matrix of size n.
+    @property
+    def reshaped_metric_matrix(self):
+        """Diagonal metric matrix reshaped to a symmetric matrix of size n.


Does the rule "Docstring starts with a verb in the infinitive" not apply to @property?

I think not, because we can view properties a bit like attributes.

ninamiolane · 2023-05-05T15:18:11Z

geomstats/geometry/pre_shape.py

-        flat_bp = gs.reshape(base_point, (-1, sphere_embedding_dim))
-        flat_pt = gs.reshape(point, (-1, sphere_embedding_dim))
-        flat_log = sphere.metric.log(flat_pt, flat_bp)
+        batch_shape = get_batch_shape(self._space, point, base_point)


Why "get_batch_shape" and not "batch_shape"? Is this because it is defined as a property somewhere?

Even in that case, I don't think that we use "get" for the other properties defined in the library: I would remove it to keep the coding style consistent and concise.

https://stackoverflow.com/questions/374763/should-i-use-get-set-prefixes-in-python-method-names

And we generally don't use "get" in the whole codebase, thus if we can avoid it altogether, it'll be better.

This comes from another PR (see our discussion).

I still think having a verb is better for a function (for me, a name represents an object, not a callable). Also, as you see in this example, I do batch_shape = get_batch_shape(self._space, point, base_point). If the function didn't have the verb, I would have to do batch_shape_ = batch_shape(self._space, point, base_point), which is more cumbersome.

Maybe we can change this in another PR if you really don't want it, since it is unrelated.

geomstats/geometry/riemannian_metric.py

ninamiolane · 2023-05-05T15:20:40Z

geomstats/geometry/spd_matrices.py

-        return trace_a + trace_b - 2 * trace_prod
+        squared_dist = trace_a + trace_b - 2.0 * trace_prod
+
+        return gs.where(squared_dist < 0.0, 0.0, squared_dist)


Why would the squared_dist be < 0?

Are we sure that it is a numerical issue, and not a problem in the math in the code?

If it is the latter, then the gs.where would hide that bug, which would otherwise be caught by unit tests.

In my tests the negative values were always extremely close to zero (e.g. -1e-12). The problem seems to be the tolerance in gs.linalg.sqrtm.

Your point is very good though!
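To make the numerical argument concrete, here is a small NumPy sketch of a formula with the same shape as the one in the diff (trace_a + trace_b - 2 * trace_prod). It assumes the distance is of the Bures-Wasserstein type, and `sym_sqrtm` and `squared_dist_raw_and_clipped` are hypothetical helpers, not the geomstats implementation. For two identical points the exact result is zero, so any nonzero raw value is pure round-off, which may come out slightly negative.

```python
import numpy as np


def sym_sqrtm(mat):
    """Square root of a symmetric positive semi-definite matrix (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Guard against tiny negative eigenvalues produced by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def squared_dist_raw_and_clipped(point_a, point_b):
    """Return the raw value and the value clipped at zero, as in the diff."""
    sqrt_a = sym_sqrtm(point_a)
    trace_prod = np.trace(sym_sqrtm(sqrt_a @ point_b @ sqrt_a))
    raw = np.trace(point_a) + np.trace(point_b) - 2.0 * trace_prod
    # Same guard as in the diff: negative values are replaced by zero.
    return raw, np.where(raw < 0.0, 0.0, raw)


rng = np.random.default_rng(0)
factor = rng.normal(size=(3, 3))
point = factor @ factor.T + np.eye(3)  # symmetric positive definite

raw, clipped = squared_dist_raw_and_clipped(point, point)
```

A check that addresses the reviewer's concern is to assert that the raw value is within a small tolerance of zero before clipping, so that a large negative value (a real bug) would still fail a test.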

ninamiolane

Awesome!

There seem to be vectorization errors in the examples: fix these?

Other high-level comments:

Add small explanations of the vectorization logic in docstrings
Why "get" and "_get"? If we can avoid using "get" that would be great; otherwise, what is the clear logic justifying its use? Anything that is "engineering related" and not "math related"?

ninamiolane · 2023-05-05T15:25:18Z

geomstats/vectorization.py

@@ -106,6 +111,36 @@ def repeat_point(point, n_reps=2, expand=False):
    return gs.repeat(gs.expand_dims(point, 0), n_reps, axis=0)


+def _is_not_none(value):


Why do we need this?

To use with filter in some of the functions of this module.

geomstats/vectorization.py

ninamiolane · 2023-05-05T15:28:26Z

geomstats/vectorization.py

+    out : array-like
+        If no batch, then input is returned. Otherwise it is broadcasted.
+    """
+    points = filter(_is_not_none, points)


I have trouble reading through this because some private functions do not have docstrings (as is conventional) and the public functions have short docstrings.

Any chance you could add details + 1-2 examples in the docstrings, even if private ones, to explain the logic?

I was also confused by _get_max_ndim and the line point_max_ndim = point[0]

--> why is the ndim an element of point?

Docstrings would help with this.

I'll improve the docstrings.

The main idea is that I want these methods to work when point is None, because in a lot of our methods we have something like base_point=None, and then I want to use something like repeat_out(self.space, out, base_point) without having to check for None (it greatly simplifies the vectorization logic there).

Therefore, before checking batch_shape, I remove the None values from the list of points (that's the role of filter).
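The logic described in this reply can be sketched in plain NumPy as follows. This is an assumption-laden illustration, not the geomstats code: the real functions take a space as first argument, whereas here a hypothetical `point_ndim` integer (number of trailing axes that make up one point) stands in for it, and `out_ndim` is likewise a made-up parameter.

```python
import numpy as np


def _is_not_none(value):
    """Predicate used with filter to drop missing points."""
    return value is not None


def get_batch_shape(point_ndim, *points):
    """Broadcast batch shape of the given points (hypothetical signature)."""
    batch_shapes = [np.shape(p)[: np.ndim(p) - point_ndim] for p in points]
    if not batch_shapes:
        return ()
    return np.broadcast_shapes(*batch_shapes)


def repeat_out(point_ndim, out, *points, out_ndim=0):
    """Repeat out along the batch axes of points, ignoring None entries.

    If there is no batch, or out is already batched, out is returned as is.
    Otherwise it is broadcast (as a read-only view) to the batch shape.
    """
    points = list(filter(_is_not_none, points))
    batch_shape = get_batch_shape(point_ndim, *points)
    out = np.asarray(out)
    if not batch_shape or out.ndim > out_ndim:
        return out
    return np.broadcast_to(out, batch_shape + out.shape)


# A batch of three 2-vectors; base_point is omitted by the caller (None).
point = np.zeros((3, 2))
base_point = None
out = repeat_out(1, 1.0, point, base_point)
```

With this shape of API, a method with a base_point=None default can call repeat_out unconditionally, which is the simplification the reply refers to.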

ninamiolane · 2023-05-05T15:29:40Z

geomstats/vectorization.py

+        If no batch, then input is returned. Otherwise it is broadcasted.
+    """
+    points = filter(_is_not_none, points)
+    batch_shape = get_batch_shape(space, *points)


Why the "get_" wording?

ninamiolane · 2023-05-05T15:30:33Z

tests/data/sasaki_metric_data.py

@@ -26,7 +26,9 @@ class SasakiMetricTestData(TestData):
    def inner_product_test_data(self):
        _sqrt2 = 1.0 / gs.sqrt(2.0)
        base_point = gs.array([[_sqrt2, -_sqrt2, 0], [_sqrt2, _sqrt2, 1]])
-        _log = self.sas_sphere_metric.log(gs.array([self.pu0, self.pu1]), base_point)
+        end_point = gs.stack([self.pu0, self.pu1])


NIT: what are pu0 and pu1? What is sqrt2? The names could be more self-explanatory.

This comes from a very old PR. I'll take that into account in the new tests.

luisfpereira added 9 commits April 7, 2023 18:00

Fix several metrics (especially vectorization) after detecting bugs using new test framework

ca6b6ba

Fix invariant metric (especially vectorization)

da5495f

Add AutodiffNotImplementedError

5bf7173

Fix special_euclidean vectorizations

52a5683

Fix poincare_ball vectorizations

2a3d7cb

Fix PreShapeMetric vectorizations

ea52004

Merge branch 'metric-refactor' into fixes-from-test

2739146

Fix SasakiMetric log vectorization

1799d56

Fixes due to use of repeat_out in matrices

4152455

luisfpereira requested a review from ninamiolane May 5, 2023 15:00

ninamiolane reviewed May 5, 2023

ninamiolane approved these changes May 5, 2023

luisfpereira added 2 commits May 5, 2023 18:24

Fix test failures in learning methods

4b0fd5c

Address Nina's comments

b8d49d3

luisfpereira merged commit 62d0ab0 into geomstats:master May 5, 2023

luisfpereira deleted the fixes-from-test branch May 9, 2023 07:32

luisfpereira mentioned this pull request Sep 6, 2023

Fixes and improvements coming from test refactoring (part 4) #1885

Merged
