Zero copy tensor conversion between xla:gpu and torch.cuda · Issue #4692 · pytorch/xla · GitHub
Zero copy tensor conversion between xla:gpu and torch.cuda #4692
Closed
@cicirori

Description


Currently, switching between lazy and eager execution can incur a huge overhead even when both run on the same physical device. This comes mainly from IR graph execution and from converting the tensor's device type. The latter is not strictly necessary; I think it exists for historical reasons (XRT), which can be seen from the interface names TransferToServer/TransferFromServer. Even a transfer from a GPU to that same GPU has to be routed through the CPU.

I'm implementing a PoC so that xla_tensor.to('cuda') and cuda_tensor.to('xla') are actually zero copy. So far it can run an eager/lazy mixed MNIST.
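For context, here is a minimal sketch (not the PoC itself) of the kind of mixed eager/lazy step this is meant to make cheap. It only uses the standard torch_xla API (`xm.xla_device()`, `xm.mark_step()`); the zero-copy behavior of `.to('cuda')` / `.to(xla_device)` is the assumption being prototyped, not current behavior.

```python
import torch
import torch_xla.core.xla_model as xm

# Standard torch_xla API; assumes the XLA:GPU device is backed by the
# same physical GPU as torch.cuda.
xla_device = xm.xla_device()

# Lazy (XLA) half of the step.
x = torch.randn(128, 784, device=xla_device)
w = torch.randn(784, 256, device=xla_device, requires_grad=True)
h = x @ w
xm.mark_step()  # execute the traced graph so `h` is backed by device data

# With the zero-copy path, this .to('cuda') would alias the existing
# GPU buffer instead of staging the data through host memory.
h_cuda = h.to('cuda')

# Eager (CUDA) half of the step.
y = torch.relu(h_cuda)

# Hand the result back to the lazy side, again ideally without a copy.
y_xla = y.to(xla_device)
```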

But there may be some problems here: I used the _to_copy op even though no copy actually happens, and I wonder whether this will cause problems in the backward pass during training.
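To make the backward concern concrete, a hypothetical correctness check might compare gradients from a path that crosses the XLA/CUDA boundary against a pure CUDA reference. This is only a sketch, not part of the PoC; it uses standard PyTorch and torch_xla calls.

```python
import torch
import torch_xla.core.xla_model as xm

xla_device = xm.xla_device()
torch.manual_seed(0)

# Parameter lives on CUDA; the input starts on the XLA device and
# crosses the boundary via .to('cuda') before the matmul.
w = torch.randn(16, 16, device='cuda', requires_grad=True)

x_xla = torch.ones(4, 16, device=xla_device)
(x_xla.to('cuda') @ w).sum().backward()
grad_mixed = w.grad.clone()

# Reference path that stays entirely on CUDA.
w.grad = None
x_cuda = torch.ones(4, 16, device='cuda')
(x_cuda @ w).sum().backward()

# With a correct zero-copy conversion these should match.
print(torch.allclose(grad_mixed, w.grad))
```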

I am currently considering how to implement zero copy while ensuring correctness, and would like to know whether the community has any relevant experience.
