Simplify creation of identity matrices in the Siegel manifold by YannCabanes · Pull Request #1918 · geomstats/geomstats · GitHub

Simplify creation of identity matrices in the Siegel manifold #1918


Merged
merged 16 commits into geomstats:main on Jan 19, 2024

Conversation

YannCabanes
Collaborator

The function siegel.metric.isometry returns an error when used on PyTorch GPU.

When I run the following code in a Google Colab environment with the execution type "T4 GPU":

%env GEOMSTATS_BACKEND=pytorch

try:
    import geomstats
except ImportError:
    !pip install geomstats["pytorch"]

import geomstats
import geomstats.backend as gs
import torch
import torch.cuda

from geomstats.geometry.siegel import Siegel, SiegelMetric

device = torch.device("cuda:0")

point = torch.tensor([[0.5 + 0.2j, 0], [0, 0.5]], requires_grad=True, device=device)
point_to_zero = torch.tensor([[-0.5, 0], [0.1j, 0]], requires_grad=True, device=device)

siegel = Siegel(n=2)
isometry = siegel.metric.isometry(point=point, point_to_zero=point_to_zero)

I obtain the following error message:

env: GEOMSTATS_BACKEND=pytorch

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

[<ipython-input-2-7790184e609e>](https://localhost:8080/#) in <cell line: 23>()
     21 
     22 siegel = Siegel(n=2)
---> 23 isometry = siegel.metric.isometry(point=point, point_to_zero=point_to_zero)

[/usr/local/lib/python3.10/dist-packages/geomstats/geometry/siegel.py](https://localhost:8080/#) in isometry(point, point_to_zero)
    340         aux_1 = gs.matmul(point_to_zero, point_to_zero_transconj)
    341         aux_2 = gs.matmul(point_to_zero_transconj, point_to_zero)
--> 342         aux_3 = identity - aux_1
    343         aux_4 = identity - aux_2
    344         factor_1 = HermitianMatrices.powerm(aux_3, -1 / 2)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Indeed, the identity matrix created with the function _create_identity_mat is always created on CPU.
To fix this error, this PR replaces the function _create_identity_mat with the function _eye_like.

@YannCabanes
Collaborator Author
YannCabanes commented Dec 21, 2023

Once the function _create_identity_mat is replaced by the function _eye_like, I still have the following error message using PyTorch GPU:

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

[<ipython-input-18-072e204274b2>](https://localhost:8080/#) in <cell line: 1>()
----> 1 isometry = siegel_2.metric.isometry(point=a, point_to_zero=b)

5 frames

[<ipython-input-14-98f9d1188d32>](https://localhost:8080/#) in isometry(point, point_to_zero)
    347         aux_3 = identity - aux_1
    348         aux_4 = identity - aux_2
--> 349         factor_1 = powermh(aux_3, -1 / 2)
    350         factor_4 = powermh(aux_4, 1 / 2)
    351         factor_2 = point - point_to_zero

[/usr/local/lib/python3.10/dist-packages/geomstats/geometry/hermitian_matrices.py](https://localhost:8080/#) in powerm(cls, mat, power)
    224                 return gs.power(ev, power)
    225 
--> 226         return cls.apply_func_to_eigvals(mat, power_, check_positive=False)
    227 
    228     @staticmethod

[/usr/local/lib/python3.10/dist-packages/geomstats/geometry/hermitian_matrices.py](https://localhost:8080/#) in apply_func_to_eigvals(mat, function, check_positive)
    264         for fun in function:
    265             eigvals_f = fun(eigvals)
--> 266             eigvals_f = algebra_utils.from_vector_to_diagonal_matrix(eigvals_f)
    267             reconstruction.append(Matrices.mul(eigvecs, eigvals_f, transconj_eigvecs))
    268         return reconstruction if return_list else reconstruction[0]

[/usr/local/lib/python3.10/dist-packages/geomstats/algebra_utils.py](https://localhost:8080/#) in from_vector_to_diagonal_matrix(vector, num_diag)
    117     identity = gs.eye(num_columns)
    118     identity = gs.cast(identity, vector.dtype)
--> 119     diagonals = gs.einsum("...i,ij->...ij", vector, identity)
    120     diagonals = gs.to_ndarray(diagonals, to_ndim=3)
    121     num_lines = diagonals.shape[0]

[/usr/local/lib/python3.10/dist-packages/geomstats/_backend/pytorch/__init__.py](https://localhost:8080/#) in einsum(equation, *inputs)
    331     input_tensors_list = convert_to_wider_dtype(input_tensors_list)
    332 
--> 333     return _torch.einsum(equation, *input_tensors_list)
    334 
    335 

[/usr/local/lib/python3.10/dist-packages/torch/functional.py](https://localhost:8080/#) in einsum(*args)
    375         # the path for contracting 0 or 1 time(s) is already optimized
    376         # or the user has disabled using opt_einsum
--> 377         return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
    378 
    379     path = None

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

This error comes from the identity matrix created in the function from_vector_to_diagonal_matrix of the file algebra_utils.py.
Indeed, this identity matrix is always created on CPU.
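To illustrate the device-safe direction this PR eventually takes, here is a minimal NumPy sketch of from_vector_to_diagonal_matrix that derives the identity from the input itself rather than calling gs.eye (this is an illustration, not the exact geomstats code; under the PyTorch backend, zeros_like preserves the input's device, so the identity would land on the GPU too):

```python
import numpy as np

def from_vector_to_diagonal_matrix(vector):
    """Embed each trailing-axis vector on the diagonal of an (n, n) matrix.

    Sketch: the identity is built with zeros_like on a slice of the input,
    so under backends where zeros_like preserves the device (e.g. PyTorch),
    it is created on the input's device rather than always on CPU.
    """
    num_columns = vector.shape[-1]
    identity = np.zeros_like(vector.reshape(-1, num_columns)[:1, :]).repeat(
        num_columns, axis=0
    )
    identity[range(num_columns), range(num_columns)] = 1
    # result[..., i, j] = vector[..., i] * identity[i, j], i.e. diag(vector)
    return np.einsum("...i,ij->...ij", vector, identity)
```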

@YannCabanes
Collaborator Author

Hello @ninamiolane, @SaitejaUtpala and @luisfpereira,
A friend of mine would like to use the SiegelMetric methods on PyTorch GPUs.
Is geomstats willing to support the use of PyTorch GPUs?
If yes, is there a way to test the code on GPUs?
I do not have GPUs on my local computer and only have limited access to GPUs on Google Colab.
From what I have understood, it is not possible to use GPUs with GitHub continuous integration tests.
Is there another way to test geomstats code? I have noticed a few lines of code to modify to enable the use of PyTorch GPUs.

@luisfpereira luisfpereira self-requested a review December 22, 2023 10:42
Collaborator
@luisfpereira luisfpereira left a comment


Can you please address my comments? After we make the corresponding changes, everything related to GPUs has to be solved at the backend level, though I think it should probably work fine already.

@@ -231,7 +228,7 @@ def inner_product(self, tangent_vec_a, tangent_vec_b, base_point):
inner_product : array-like, shape=[...,]
Inner-product.
"""
identity = _create_identity_mat(base_point.shape, dtype=base_point.dtype)
identity = _eye_like(base_point)
Collaborator


Looking back at the code, I've realized that every time we use this function, we actually only need to create an (n, n) identity matrix with the same type as base_point. Therefore, I suggest we remove the function _eye_like and do:

identity = gs.eye(space.shape, dtype=base.dtype)

I think it will work fine on GPU.

Collaborator Author
@YannCabanes YannCabanes Dec 22, 2023


Hello @luisfpereira,

No, using:
identity = gs.eye(space.shape, dtype=base.dtype)
is not sufficient for two reasons.

The first reason is the shape.
The first input of the function torch.eye should be an integer n, not a shape.
Then, a 2D tensor is returned.
I would like my identity matrix to be 2D or 3D (in this case it is a repetition of 2D identity matrices along the first axis), depending on the shape of the input arrays of my function.

The second reason is the device.
When using the PyTorch backend with GPUs enabled, the following code:

a = torch.tensor([[0.5 + 0.2j, 0], [0, 0.5]], requires_grad=True, device=device)

returns

tensor([[0.5000+0.2000j, 0.0000+0.0000j],
        [0.0000+0.0000j, 0.5000+0.0000j]], device='cuda:0', requires_grad=True)

Then

gs.eye(a.shape[0], dtype=a.dtype).device

returns

device(type='cpu')

which is not what we want.

However,

gs.eye(a.shape[0], dtype=a.dtype, device=a.device).device

returns

device(type='cuda', index=0)

which is what we want.

So it is not sufficient to use the dtype optional input argument when using the PyTorch backend on GPUs; we also have to use the device optional input argument.
However, note that for the NumPy backend, np.eye does not accept a device argument.

Collaborator Author


I think that the easiest way to do what I want is to use _eye_like as I have proposed, based on the function gs.zeros_like, which is designed to create a zero matrix compatible with the input matrix for both the NumPy and the PyTorch backends.
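As an illustration of this gs.zeros_like-based approach, here is a minimal NumPy sketch of the idea (hypothetical; the real helper goes through the gs backend): deriving the identity from zeros_like means the result inherits the input's batch shape, dtype and, under PyTorch, its device.

```python
import numpy as np

def eye_like(mat):
    """Identity matrices with the same batch shape and dtype as mat,
    assumed of shape (..., n, n). Under a backend like PyTorch,
    zeros_like would also preserve the device of mat."""
    n = mat.shape[-1]
    identity = np.zeros_like(mat)  # inherits shape and dtype (and device)
    identity[..., range(n), range(n)] = 1  # set the diagonal of every matrix
    return identity
```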

Collaborator
@luisfpereira luisfpereira Dec 22, 2023


For the shapes, even though it looks like you need to create 3D arrays, it actually suffices (and is better) to create 2D arrays, as the operations where they are used take advantage of broadcasting (thus, we save some memory).

For the device, I'm not suggesting passing device as an input; I'm suggesting that when we globally set the device to be the GPU, any tensor is then created by default on the GPU. I thought this was actually the behavior of PyTorch. If not, we can easily implement this behavior in the backend, only for PyTorch.
This is the part that we need to test, probably not the full code.
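A quick NumPy illustration of the broadcasting point: a single 2D identity works against a whole batch of (n, n) matrices, so no 3D stack of identities needs to be materialized.

```python
import numpy as np

batch = np.arange(12.0).reshape(3, 2, 2)  # a batch of three 2x2 matrices
identity = np.eye(2)                      # a single 2D identity

# The 2D identity broadcasts over the leading batch axis of `batch`,
# exactly as in `aux_3 = identity - aux_1` in siegel.py.
result = identity - batch
assert result.shape == (3, 2, 2)
```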

Comment on lines 117 to 122
identity = gs.repeat(
gs.zeros_like(gs.reshape(vector, (-1, num_columns))[:1, :]),
repeats=num_columns,
axis=0,
)
identity[range(num_columns), range(num_columns)] = 1
Collaborator


identity = gs.eye(num_cols, dtype=vector.dtype) suffices, no?

Collaborator Author


No, it is not sufficient for two reasons:

  • the shape of the output would always be 2D, while we want it to be 2D or 3D;
  • the device would always be CPU.

Collaborator


See above.

@luisfpereira
Collaborator

Thanks @YannCabanes (see my review up).

GPUs are on our minds, but it is very unlikely that we can test code on them with GitHub continuous integration (as far as I know, we can only access CPUs).

We can definitely start creating some tests to run locally and mark them as gpu, such that once in a while we can manually check whether they're passing (I have access to a GPU locally).

My first intuition is that we mostly need to be careful with the backend functions that create arrays, to ensure the arrays are created on the right device. Most of it should already be taken care of by PyTorch, but as you see in your code, we still have places where we haven't thought about this.

Anyway, I think a good starting point is to solve the problem in the places we already know. Then we need to find a strategy to identify places where we've made similar mistakes.

codecov bot commented Dec 22, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (806764b) 90.29% compared to head (a97ec99) 90.55%.
Report is 20 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1918      +/-   ##
==========================================
+ Coverage   90.29%   90.55%   +0.27%     
==========================================
  Files         143      148       +5     
  Lines       13407    13554     +147     
==========================================
+ Hits        12105    12273     +168     
+ Misses       1302     1281      -21     
Flag Coverage Δ
autograd 86.65% <100.00%> (?)
numpy 89.04% <100.00%> (-0.06%) ⬇️
pytorch 84.56% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@YannCabanes
Collaborator Author

Hello @luisfpereira,

Would you like to start adding PyTorch GPU tests in this PR?
If yes, what should be the structure of the PyTorch GPU tests? I have two suggestions:

  1. We could create a specific folder for the PyTorch GPU tests. This folder would not be run by the GitHub continuous integration tests. And it would be easier to run only the PyTorch GPU tests on your local machine.
  2. We could create a decorator, like @pytorch_gpu_only.

What do you think?

@luisfpereira
Collaborator

I think we can add the tests in a separate PR, as it is preferable to solve this first.

Then, I suggest we add them in the backends folder. pytest has the notion of a marker. We can add the marker gpu (then we do @pytest.mark.gpu) and from outside we can control whether to run them or not (e.g. `pytest tests/ -m "not gpu"`).

I would like to add precise tests for the GPU, as it is costly to always be running too many (redundant?) tests. Ideally, we want to check the device where some tensors live.

@YannCabanes
Collaborator Author
YannCabanes commented Dec 22, 2023

Hello @luisfpereira,

It is fine for me to add the tests in a different PR.

The code modifications you are suggesting, which create the identity matrices using:
gs.eye(point.shape[-1], dtype=point.dtype)
work if the user writes:
torch.set_default_device('cuda')
before calling the function siegel.metric.isometry.

However, with the modifications that you suggested, if the user does not write torch.set_default_device('cuda') and only writes:

siegel = Siegel(n=2)
a = torch.tensor([[0.5 + 0.2j, 0], [0, 0.5]], requires_grad=True, device=device)
b = torch.tensor([[-0.5, 0], [0.1j, 0]], requires_grad=True, device=device)
isometry = siegel.metric.isometry(point=a, point_to_zero=b)

then the following error is returned:

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

[<ipython-input-12-02cb729306a7>](https://localhost:8080/#) in <cell line: 1>()
----> 1 isometry = siegel.metric.isometry(point=a, point_to_zero=b)
      2 print(isometry)

[/usr/local/lib/python3.10/dist-packages/geomstats/geometry/siegel.py](https://localhost:8080/#) in isometry(point, point_to_zero)
    344         aux_1 = gs.matmul(point_to_zero, point_to_zero_transconj)
    345         aux_2 = gs.matmul(point_to_zero_transconj, point_to_zero)
--> 346         aux_3 = identity - aux_1
    347         aux_4 = identity - aux_2
    348         factor_1 = HermitianMatrices.powerm(aux_3, -1 / 2)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Are we willing to fix this error for the users who do not use torch.set_default_device('cuda')?
I would like to fix it. For this, it is not sufficient to use gs.eye(point.shape[-1], dtype=point.dtype).
What do you think?

@luisfpereira
Collaborator

With the current design of the backend (which does not take a device into account), we have to assume that the user will do torch.set_default_device('cuda') or run the code under a device context (see https://pytorch.org/docs/stable/tensor_attributes.html#torch.device), e.g.:

with torch.device('cuda:1'):
    r = torch.randn(2, 3)

In the long run we can try something where internally created tensors check the device of the arrays in the computations they'll be involved in. But we definitely need to make this change in a principled way, and probably take a look at how it is done in other backends, to ensure we can also use them in the future (thinking about jax).
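A possible shape for that long-run idea, sketched as a hypothetical backend helper (the name eye_from is an assumption, not geomstats API): internally created tensors derive their dtype, and where applicable their device, from an input array instead of from the global default.

```python
import numpy as np

def eye_from(reference):
    """Hypothetical backend helper: an (n, n) identity following the
    dtype of `reference` and, when `reference` is a torch.Tensor,
    its device as well."""
    n = reference.shape[-1]
    try:
        import torch
        if isinstance(reference, torch.Tensor):
            # torch path: identity lands on the same device as the input,
            # independently of torch.set_default_device.
            return torch.eye(n, dtype=reference.dtype, device=reference.device)
    except ImportError:
        pass
    # numpy path: np.eye has no device argument, dtype is enough.
    return np.eye(n, dtype=reference.dtype)
```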

@YannCabanes
Collaborator Author

Hello @luisfpereira,
Could you review this PR please?
I have finally followed your initial suggestions.

@luisfpereira luisfpereira self-requested a review January 9, 2024 07:53
@YannCabanes YannCabanes changed the title Fix errors in the Siegel manifold when used on PyTorch GPU Simplify creation of identity matrices in the Siegel manifold Jan 10, 2024
@luisfpereira
Collaborator

Thanks @YannCabanes. Merging.

@luisfpereira luisfpereira merged commit c887e2d into geomstats:main Jan 19, 2024