8000 Error while trying to run on TPU from VM instance. · Issue #4896 · pytorch/xla · GitHub

Closed
listless-dude opened this issue Apr 18, 2023 · 5 comments
Assignees
Labels
bug Something isn't working needs reproduction xla:tpu TPU specific issues and PRs

Comments

@listless-dude

❓ Questions and Help

I did set up XRT_TPU_CONFIG with the IP address of the TPU.
This is my test.py script

import os
import torch
import torch_xla.core.xla_model as xm

os.environ['XRT_TPU_CONFIG'] = "tpu_worker;0;10.128.0.29:8470"

dev = xm.xla_device() ## Error while executing this line
t1 = torch.randn(3,3,device=dev)
t2 = torch.randn(3,3,device=dev)
print(t1 + t2)

Here's the error:

2023-04-17 19:35:38.550666: F 5184 tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1362] Non-OK-status: session.Run({tensorflow::Output(result, 0)}, &outputs) status: UNIMPLEMENTED: method "RunStep" not implemented
*** Begin stack trace ***
    tsl::CurrentStackTrace()
    xla::XrtComputationClient::InitializeAndFetchTopology(std::string const&, int, std::string const&, tensorflow::ConfigProto const&)
    xla::XrtComputationClient::InitializeDevices(std::unique_ptr<tensorflow::tpu::TopologyProto, std::default_delete<tensorflow::tpu::TopologyProto> >)
    xla::XrtComputationClient::XrtComputationClient(xla::XrtComputationClient::Options, std::unique_ptr<tensorflow::tpu::TopologyProto, std::default_delete<tensorflow::tpu::TopologyProto> >)
    xla::ComputationClient::Create()
    xla::ComputationClient::Get()
    PyCFunction_Call
    _PyObject_MakeTpCall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyObject_GenericGetAttrWithDict
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_Vectorcall
    _PyEval_EvalCodeWithName
    PyEval_EvalCode
    PyRun_SimpleFileExFlags
    Py_BytesMain
    __libc_start_main
*** End stack trace ***

Aborted

I don't know what I'm doing wrong. Can someone suggest a possible fix?

@vanbasten23
Collaborator

Do you require a specific PyTorch/XLA version, or is the most recent stable version (2.0) fine? If 2.0 works for you, can you remove the line os.environ['XRT_TPU_CONFIG'] = "tpu_worker;0;10.128.0.29:8470" and retry?

Also, XRT_TPU_CONFIG, as the name suggests, uses the XRT runtime, whose support we plan to drop in the near future.
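For readers unfamiliar with the XRT config string, here is a minimal sketch (plain Python, no torch_xla needed) of how the `worker_name;task_index;host:port` fields decompose. The helper name and field labels are illustrative, not an official torch_xla API:

```python
def parse_xrt_tpu_config(config: str) -> dict:
    """Split an XRT_TPU_CONFIG string of the form
    'worker_name;task_index;host:port' into its parts."""
    worker, task, address = config.split(';')
    host, port = address.rsplit(':', 1)
    return {
        'worker': worker,   # e.g. 'tpu_worker'
        'task': int(task),  # ordinal of the worker
        'host': host,       # TPU node IP address
        'port': int(port),  # XRT server port, 8470 in this thread
    }

print(parse_xrt_tpu_config("tpu_worker;0;10.128.0.29:8470"))
```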

@listless-dude
Author

I removed it and did export PJRT_DEVICE=TPU, but still got the same error.
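One way to double-check the setup (a sketch of the suggested workflow, not confirmed as the cause here): set the variable from the shell so it is visible to the whole Python process, then run the script with the XRT_TPU_CONFIG line removed.

```shell
# Select the PJRT TPU runtime for the whole process,
# then run the unmodified script (minus the XRT config line).
export PJRT_DEVICE=TPU
python test.py
```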

@AdSear
AdSear commented Apr 24, 2023

Same for me. @mr-oogway, any update?

@ManfeiBai
Collaborator

Hi @vanbasten23, is it ok to assign this to you?

@vanbasten23
Collaborator

I tried your script in https://colab.sandbox.google.com/github/pytorch/xla/blob/master/contrib/colab/getting-started.ipynb and it runs fine on Colab.

@ysiraichi ysiraichi added bug Something isn't working xla:tpu TPU specific issues and PRs needs reproduction labels May 5, 2025