8000 Error while trying to run on TPU from VM instance. · Issue #4896 · pytorch/xla · GitHub

Closed
listless-dude opened this issue Apr 18, 2023 · 5 comments
Assignees
Labels
bug Something isn't working needs reproduction xla:tpu TPU specific issues and PRs

Comments

@listless-dude

❓ Questions and Help

I did set up XRT_TPU_CONFIG with the IP address of the TPU.
This is my test.py script

import os
import torch
import torch_xla.core.xla_model as xm

os.environ['XRT_TPU_CONFIG'] = "tpu_worker;0;10.128.0.29:8470"

dev = xm.xla_device() ## Error while executing this line
t1 = torch.randn(3,3,device=dev)
t2 = torch.randn(3,3,device=dev)
print(t1 + t2)

Here's the error:

2023-04-17 19:35:38.550666: F 5184 tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1362] Non-OK-status: session.Run({tensorflow::Output(result, 0)}, &outputs) status: UNIMPLEMENTED: method "RunStep" not implemented
*** Begin stack trace ***
    tsl::CurrentStackTrace()
    xla::XrtComputationClient::InitializeAndFetchTopology(std::string const&, int, std::string const&, tensorflow::ConfigProto const&)
    xla::XrtComputationClient::InitializeDevices(std::unique_ptr<tensorflow::tpu::TopologyProto, std::default_delete<tensorflow::tpu::TopologyProto> >)
    xla::XrtComputationClient::XrtComputationClient(xla::XrtComputationClient::Options, std::unique_ptr<tensorflow::tpu::TopologyProto, std::default_delete<tensorflow::tpu::TopologyProto> >)
    xla::ComputationClient::Create()
    xla::ComputationClient::Get()
    PyCFunction_Call
    _PyObject_MakeTpCall
    _PyEval_EvalFrameDefault
    _PyFunction_Vectorcall
    _PyObject_GenericGetAttrWithDict
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_Vectorcall
    _PyEval_EvalCodeWithName
    PyEval_EvalCode
    PyRun_SimpleFileExFlags
    Py_BytesMain
    __libc_start_main
*** End stack trace ***

Aborted

I don't know what I'm doing wrong. Can someone suggest a possible fix?

@vanbasten23
Collaborator

Do you require a specific PyTorch/XLA version, or is the most recent stable version (2.0) fine? If 2.0 works for you, can you remove the line os.environ['XRT_TPU_CONFIG'] = "tpu_worker;0;10.128.0.29:8470" and retry?

Also, XRT_TPU_CONFIG, as the name suggests, uses the XRT runtime, whose support we plan to drop in the near future.
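For readers unfamiliar with the XRT config string, here is a minimal sketch (plain Python, no torch_xla needed) of how the `worker_name;task_index;host:port` fields decompose. The helper name and field labels are illustrative, not an official torch_xla API:

```python
def parse_xrt_tpu_config(config: str) -> dict:
    """Split an XRT_TPU_CONFIG string of the form
    'worker_name;task_index;host:port' into its parts."""
    worker, task, address = config.split(';')
    host, port = address.rsplit(':', 1)
    return {
        'worker': worker,   # e.g. 'tpu_worker'
        'task': int(task),  # ordinal of the worker
        'host': host,       # TPU node IP address
        'port': int(port),  # XRT server port, 8470 in this thread
    }

print(parse_xrt_tpu_config("tpu_worker;0;10.128.0.29:8470"))
```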

@listless-dude
Author

I removed it and did export PJRT_DEVICE=TPU, but still got the same error.
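One way to double-check the setup (a sketch of the suggested workflow, not confirmed as the cause here): set the variable from the shell so it is visible to the whole Python process, then run the script with the XRT_TPU_CONFIG line removed.

```shell
# Select the PJRT TPU runtime for the whole process,
# then run the unmodified script (minus the XRT config line).
export PJRT_DEVICE=TPU
python test.py
```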

@AdSear
AdSear commented Apr 24, 2023

Same for me. @mr-oogway, any update?

@ManfeiBai
Collaborator

Hi @vanbasten23, is it ok to assign this to you?

@vanbasten23
Collaborator

I tried your script in https://colab.sandbox.google.com/github/pytorch/xla/blob/master/contrib/colab/getting-started.ipynb and it runs fine on Colab.

@ysiraichi ysiraichi added bug Something isn't working xla:tpu TPU specific issues and PRs needs reproduction labels May 5, 2025