Simplify creation of identity matrices in the Siegel manifold by YannCabanes · Pull Request #1918 · geomstats/geomstats · GitHub

Simplify creation of identity matrices in the Siegel manifold #1918


Merged
merged 16 commits into geomstats:main on Jan 19, 2024

Conversation

YannCabanes
Collaborator

The function siegel.metric.isometry returns an error when used on PyTorch GPU.

When I run the following code in a Google Colab environment with the execution type "T4 GPU":

%env GEOMSTATS_BACKEND=pytorch

try:
    import geomstats
except ImportError:
    !pip install geomstats["pytorch"]

import geomstats
import geomstats.backend as gs
import torch
import torch.cuda

from geomstats.geometry.siegel import Siegel, SiegelMetric

device = torch.device("cuda:0")

point = torch.tensor([[0.5 + 0.2j, 0], [0, 0.5]], requires_grad=True, device=device)
point_to_zero = torch.tensor([[-0.5, 0], [0.1j, 0]], requires_grad=True, device=device)

siegel = Siegel(n=2)
isometry = siegel.metric.isometry(point=point, point_to_zero=point_to_zero)

I obtain the following error message:

env: GEOMSTATS_BACKEND=pytorch

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

[<ipython-input-2-7790184e609e>](https://localhost:8080/#) in <cell line: 23>()
     21 
     22 siegel = Siegel(n=2)
---> 23 isometry = siegel.metric.isometry(point=point, point_to_zero=point_to_zero)

[/usr/local/lib/python3.10/dist-packages/geomstats/geometry/siegel.py](https://localhost:8080/#) in isometry(point, point_to_zero)
    340         aux_1 = gs.matmul(point_to_zero, point_to_zero_transconj)
    341         aux_2 = gs.matmul(point_to_zero_transconj, point_to_zero)
--> 342         aux_3 = identity - aux_1
    343         aux_4 = identity - aux_2
    344         factor_1 = HermitianMatrices.powerm(aux_3, -1 / 2)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Indeed, the identity matrix created with the function _create_identity_mat is always created on CPU.
To fix this error, this PR replaces the function _create_identity_mat with the function _eye_like.

@YannCabanes
Collaborator Author
YannCabanes commented Dec 21, 2023

Once the function _create_identity_mat is replaced by the function _eye_like, I still have the following error message using PyTorch GPU:

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

[<ipython-input-18-072e204274b2>](https://localhost:8080/#) in <cell line: 1>()
----> 1 isometry = siegel_2.metric.isometry(point=a, point_to_zero=b)

5 frames

[<ipython-input-14-98f9d1188d32>](https://localhost:8080/#) in isometry(point, point_to_zero)
    347         aux_3 = identity - aux_1
    348         aux_4 = identity - aux_2
--> 349         factor_1 = powermh(aux_3, -1 / 2)
    350         factor_4 = powermh(aux_4, 1 / 2)
    351         factor_2 = point - point_to_zero

[/usr/local/lib/python3.10/dist-packages/geomstats/geometry/hermitian_matrices.py](https://localhost:8080/#) in powerm(cls, mat, power)
    224                 return gs.power(ev, power)
    225 
--> 226         return cls.apply_func_to_eigvals(mat, power_, check_positive=False)
    227 
    228     @staticmethod

[/usr/local/lib/python3.10/dist-packages/geomstats/geometry/hermitian_matrices.py](https://localhost:8080/#) in apply_func_to_eigvals(mat, function, check_positive)
    264         for fun in function:
    265             eigvals_f = fun(eigvals)
--> 266             eigvals_f = algebra_utils.from_vector_to_diagonal_matrix(eigvals_f)
    267             reconstruction.append(Matrices.mul(eigvecs, eigvals_f, transconj_eigvecs))
    268         return reconstruction if return_list else reconstruction[0]

[/usr/local/lib/python3.10/dist-packages/geomstats/algebra_utils.py](https://localhost:8080/#) in from_vector_to_diagonal_matrix(vector, num_diag)
    117     identity = gs.eye(num_columns)
    118     identity = gs.cast(identity, vector.dtype)
--> 119     diagonals = gs.einsum("...i,ij->...ij", vector, identity)
    120     diagonals = gs.to_ndarray(diagonals, to_ndim=3)
    121     num_lines = diagonals.shape[0]

[/usr/local/lib/python3.10/dist-packages/geomstats/_backend/pytorch/__init__.py](https://localhost:8080/#) in einsum(equation, *inputs)
    331     input_tensors_list = convert_to_wider_dtype(input_tensors_list)
    332 
--> 333     return _torch.einsum(equation, *input_tensors_list)
    334 
    335 

[/usr/local/lib/python3.10/dist-packages/torch/functional.py](https://localhost:8080/#) in einsum(*args)
    375         # the path for contracting 0 or 1 time(s) is already optimized
    376         # or the user has disabled using opt_einsum
--> 377         return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
    378 
    379     path = None

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

This error comes from the identity matrix created in the function from_vector_to_diagonal_matrix of the file algebra_utils.py.
Indeed, this identity matrix is always created on CPU.
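To illustrate the device-safe direction this PR eventually takes, here is a minimal NumPy sketch of from_vector_to_diagonal_matrix that derives the identity from the input itself rather than calling gs.eye (this is an illustration, not the exact geomstats code; under the PyTorch backend, zeros_like preserves the input's device, so the identity would land on the GPU too):

```python
import numpy as np

def from_vector_to_diagonal_matrix(vector):
    """Embed each trailing-axis vector on the diagonal of an (n, n) matrix.

    Sketch: the identity is built with zeros_like on a slice of the input,
    so under backends where zeros_like preserves the device (e.g. PyTorch),
    it is created on the input's device rather than always on CPU.
    """
    num_columns = vector.shape[-1]
    identity = np.zeros_like(vector.reshape(-1, num_columns)[:1, :]).repeat(
        num_columns, axis=0
    )
    identity[range(num_columns), range(num_columns)] = 1
    # result[..., i, j] = vector[..., i] * identity[i, j], i.e. diag(vector)
    return np.einsum("...i,ij->...ij", vector, identity)
```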

@YannCabanes
Collaborator Author

Hello @ninamiolane, @SaitejaUtpala and @luisfpereira,
A friend of mine would like to use the SiegelMetric methods on PyTorch GPUs.
Is geomstats willing to support the use of PyTorch GPUs?
If yes, is there a way to test the code on GPUs?
I do not have GPUs on my local computer and only have limited access to GPUs on Google Colab.
From what I have understood, it is not possible to use GPUs with GitHub continuous integration tests.
Is there another way to test geomstats code? I have noticed a few lines of code to modify to enable the use of PyTorch GPUs.

@luisfpereira luisfpereira self-requested a review December 22, 2023 10:42
Collaborator
@luisfpereira luisfpereira left a comment


Can you please address my comments? After we make the corresponding changes, everything related to GPUs has to be solved at the backend level, though I think it should probably work fine already.

@@ -231,7 +228,7 @@ def inner_product(self, tangent_vec_a, tangent_vec_b, base_point):
inner_product : array-like, shape=[...,]
Inner-product.
"""
identity = _create_identity_mat(base_point.shape, dtype=base_point.dtype)
identity = _eye_like(base_point)
Collaborator


Looking back at the code, I've realized that every time we use this function, we actually only need to create an (n, n) identity matrix with the same type as base_point. Therefore, I suggest we remove the function _eye_like and do:

identity = gs.eye(space.shape, dtype=base.dtype)

I think it will work fine on GPU.

Collaborator Author
@YannCabanes YannCabanes Dec 22, 2023


Hello @luisfpereira,

No, using:
identity = gs.eye(space.shape, dtype=base.dtype)
is not sufficient for two reasons.

The first reason is the shape.
The first input of the function torch.eye should be an integer n, not a shape.
Then, a 2D tensor is returned.
I would like my identity matrix to be 2D or 3D (in this case it is a repetition of 2D identity matrices along the first axis), depending on the shape of the input arrays of my function.

The second reason is the device.
When using the PyTorch backend with GPUs enabled, the following code:

a = torch.tensor([[0.5 + 0.2j, 0], [0, 0.5]], requires_grad=True, device=device)

returns

tensor([[0.5000+0.2000j, 0.0000+0.0000j],
        [0.0000+0.0000j, 0.5000+0.0000j]], device='cuda:0', requires_grad=True)

Then

gs.eye(a.shape[0], dtype=a.dtype).device

returns

device(type='cpu')

which is not what we want.

However,

gs.eye(a.shape[0], dtype=a.dtype, device=a.device).device

returns

device(type='cuda', index=0)

which is what we want.

So it is not sufficient to use the dtype optional input argument when using the PyTorch backend on GPUs; we also have to use the device optional input argument.
However, note that for the NumPy backend, np.eye does not accept a device argument.

Collaborator Author


I think that the easiest way to do what I want is to use _eye_like as I have proposed, based on the function gs.zeros_like, which is designed to create a zero matrix compatible with the input matrix for both the NumPy and the PyTorch backends.
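As an illustration of this gs.zeros_like-based approach, here is a minimal NumPy sketch of the idea (hypothetical; the real helper goes through the gs backend): deriving the identity from zeros_like means the result inherits the input's batch shape, dtype and, under PyTorch, its device.

```python
import numpy as np

def eye_like(mat):
    """Identity matrices with the same batch shape and dtype as mat,
    assumed of shape (..., n, n). Under a backend like PyTorch,
    zeros_like would also preserve the device of mat."""
    n = mat.shape[-1]
    identity = np.zeros_like(mat)  # inherits shape and dtype (and device)
    identity[..., range(n), range(n)] = 1  # set the diagonal of every matrix
    return identity
```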

Collaborator
@luisfpereira luisfpereira Dec 22, 2023


For the shapes, even though it looks like you need to create 3D arrays, it actually suffices (and is better) to create 2D arrays, as the operations where they are used take advantage of broadcasting (thus, we save some memory).

For the device, I'm not suggesting passing device as an input; I'm suggesting that when we globally set the device to be the GPU, any tensor is then created by default on the GPU. I thought this was actually the behavior of PyTorch. If not, we can easily implement this behavior in the backend, only for PyTorch.
This is the part that we need to test, probably not the full code.
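A quick NumPy illustration of the broadcasting point: a single 2D identity works against a whole batch of (n, n) matrices, so no 3D stack of identities needs to be materialized.

```python
import numpy as np

batch = np.arange(12.0).reshape(3, 2, 2)  # a batch of three 2x2 matrices
identity = np.eye(2)                      # a single 2D identity

# The 2D identity broadcasts over the leading batch axis of `batch`,
# exactly as in `aux_3 = identity - aux_1` in siegel.py.
result = identity - batch
assert result.shape == (3, 2, 2)
```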

Comment on lines 117 to 122
identity = gs.repeat(
gs.zeros_like(gs.reshape(vector, (-1, num_columns))[:1, :]),
repeats=num_columns,
axis=0,
)
identity[range(num_columns), range(num_columns)] = 1
Collaborator


identity = gs.eye(num_cols, dtype=vector.dtype) suffices, no?

Collaborator Author


No, it is not sufficient for two reasons:

  • the shape of the output would always be 2D, while we want it to be 2D or 3D;
  • the device would always be CPU.

Collaborator


See above.

@luisfpereira
Collaborator

Thanks @YannCabanes (see my review up).

GPUs are on our minds, but it is very unlikely that we can test code on them with GitHub continuous integration (as far as I know, we can only access CPUs).

We can definitely start creating some tests to run locally and mark them as gpu, such that once in a while we can manually check whether they're passing (I have access to a GPU locally).

My first intuition is that we mostly need to be careful with the backend functions that create arrays, to ensure the arrays are created on the right device. Most of it should already be taken care of by PyTorch, but as you see in your code, we still have places where we haven't thought about this.

Anyway, I think a good starting point is to solve the problem in the places we already know. Then we need to find a strategy to identify places where we've made similar mistakes.

codecov bot commented Dec 22, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (806764b) 90.29% compared to head (a97ec99) 90.55%.
Report is 20 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1918      +/-   ##
==========================================
+ Coverage   90.29%   90.55%   +0.27%     
==========================================
  Files         143      148       +5     
  Lines       13407    13554     +147     
==========================================
+ Hits        12105    12273     +168     
+ Misses       1302     1281      -21     
Flag Coverage Δ
autograd 86.65% <100.00%> (?)
numpy 89.04% <100.00%> (-0.06%) ⬇️
pytorch 84.56% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@YannCabanes
Collaborator Author

Hello @luisfpereira,

Would you like to start adding PyTorch GPU tests in this PR?
If yes, what should be the structure of the PyTorch GPU tests? I have two suggestions:

  1. We could create a specific folder for the PyTorch GPU tests. This folder would not be run by the GitHub continuous integration tests. And it would be easier to run only the PyTorch GPU tests on your local machine.
  2. We could create a decorator, like @pytorch_gpu_only.

What do you think?

@luisfpereira
Collaborator

I think we can add the tests in a separate PR, as it is preferable to solve this first.

Then, I suggest we add them in the backends folder. pytest has the notion of a marker. We can add the marker gpu (then we do @pytest.mark.gpu) and from outside we can control whether to run them or not (e.g. `pytest tests/ -m "not gpu"`).

I would like to add precise tests for the GPU, as it is costly to always be running too many (redundant?) tests. Ideally, we want to check the device where some tensors live.

@YannCabanes
Collaborator Author
YannCabanes commented Dec 22, 2023

Hello @luisfpereira,

It is fine for me to add the tests in a different PR.

The code modifications you are suggesting, which create the identity matrices using:
gs.eye(point.shape[-1], dtype=point.dtype)
work if the user writes:
torch.set_default_device('cuda')
before calling the function siegel.metric.isometry.

However, with the modifications that you suggested, if the user does not write torch.set_default_device('cuda') and only writes:

siegel = Siegel(n=2)
a = torch.tensor([[0.5 + 0.2j, 0], [0, 0.5]], requires_grad=True, device=device)
b = torch.tensor([[-0.5, 0], [0.1j, 0]], requires_grad=True, device=device)
isometry = siegel.metric.isometry(point=a, point_to_zero=b)

then the following error is returned:

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

[<ipython-input-12-02cb729306a7>](https://localhost:8080/#) in <cell line: 1>()
----> 1 isometry = siegel.metric.isometry(point=a, point_to_zero=b)
      2 print(isometry)

[/usr/local/lib/python3.10/dist-packages/geomstats/geometry/siegel.py](https://localhost:8080/#) in isometry(point, point_to_zero)
    344         aux_1 = gs.matmul(point_to_zero, point_to_zero_transconj)
    345         aux_2 = gs.matmul(point_to_zero_transconj, point_to_zero)
--> 346         aux_3 = identity - aux_1
    347         aux_4 = identity - aux_2
    348         factor_1 = HermitianMatrices.powerm(aux_3, -1 / 2)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Are we willing to fix this error for the users who do not use torch.set_default_device('cuda')?
I would like to fix it. For this, it is not sufficient to use gs.eye(point.shape[-1], dtype=point.dtype).
What do you think?

@luisfpereira
Collaborator

With the current design of the backend (which does not take a device into account), we have to assume that the user will do torch.set_default_device('cuda') or run the code under a device context (see https://pytorch.org/docs/stable/tensor_attributes.html#torch.device), e.g.:

with torch.device('cuda:1'):
    r = torch.randn(2, 3)

In the long run we can try something where internally created tensors check the device of the arrays in the computations they'll be involved in. But we definitely need to make this change in a principled way, and probably take a look at how it is done in other backends, to ensure we can also use them in the future (thinking about jax).
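A possible shape for that long-run idea, sketched as a hypothetical backend helper (the name eye_from is an assumption, not geomstats API): internally created tensors derive their dtype, and where applicable their device, from an input array instead of from the global default.

```python
import numpy as np

def eye_from(reference):
    """Hypothetical backend helper: an (n, n) identity following the
    dtype of `reference` and, when `reference` is a torch.Tensor,
    its device as well."""
    n = reference.shape[-1]
    try:
        import torch
        if isinstance(reference, torch.Tensor):
            # torch path: identity lands on the same device as the input,
            # independently of torch.set_default_device.
            return torch.eye(n, dtype=reference.dtype, device=reference.device)
    except ImportError:
        pass
    # numpy path: np.eye has no device argument, dtype is enough.
    return np.eye(n, dtype=reference.dtype)
```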

@YannCabanes
Collaborator Author

Hello @luisfpereira,
Could you review this PR please?
I have finally followed your initial suggestions.

@luisfpereira luisfpereira self-requested a review January 9, 2024 07:53
@YannCabanes YannCabanes changed the title Fix errors in the Siegel manifold when used on PyTorch GPU Simplify creation of identity matrices in the Siegel manifold Jan 10, 2024
@luisfpereira
Collaborator

Thanks @YannCabanes. Merging.

@luisfpereira luisfpereira merged commit c887e2d into geomstats:main Jan 19, 2024