Libtpu pin update after 04/25 hangs · Issue #9084 · pytorch/xla

Open
bhavya01 opened this issue May 2, 2025 · 0 comments
Labels: bug, libtpu

Comments

bhavya01 (Collaborator) commented May 2, 2025

🐛 Bug

Updating the libtpu nightly pin to any build after 04/25 causes the TPU tests to hang.

To Reproduce

  1. Set the libtpu nightly pin in setup.py to a build after 04/25.
  2. Run python test/test_operations.py.
  3. The test run hangs.
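
When reproducing the steps above, it can help to tell a hang apart from an ordinary test failure. The following is a minimal sketch of one way to do that with a time limit; the helper name `classify_run` and the choice of timeout are illustrative assumptions, not part of the torch_xla test tooling.

```python
import subprocess
import sys


def classify_run(cmd, timeout_s):
    """Run cmd and return 'pass', 'fail', or 'hang' (hypothetical helper).

    'hang' only means the command did not finish within timeout_s seconds,
    so the limit should be well above the normal runtime of the test.
    """
    try:
        # subprocess.run kills the child process when the timeout expires.
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return "hang"
    return "pass" if result.returncode == 0 else "fail"
```

For example, `classify_run([sys.executable, "test/test_operations.py"], timeout_s=600)` would report "hang" if the test file is still running after ten minutes; the 600-second limit is an arbitrary placeholder.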

Expected behavior

Tests should pass.

Environment

  • Reproducible on XLA backend [CPU/TPU/CUDA]: TPU
  • torch_xla version: 04/25 nightly

Additional context

Libtpu changed the default value of the --xla_tpu_use_enhanced_launch_barrier flag to true. This flag ensures that every device the pjrt_executable is compiled for is executing the same code, by performing an allreduce on the run_id.
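
One way to check whether this flag is involved would be to set it back to false explicitly and rerun the test. The sketch below assumes that libtpu reads its flags from the LIBTPU_INIT_ARGS environment variable and that the variable is set before the TPU runtime is initialized; the helper names are hypothetical, and whether disabling the flag actually removes the hang is unverified. It is meant as a diagnostic, not as a fix.

```python
import os

FLAG = "--xla_tpu_use_enhanced_launch_barrier"


def with_barrier_flag(init_args, enabled):
    """Return init_args with the barrier flag set explicitly (hypothetical helper)."""
    # Drop any existing setting of the flag, then append the requested value.
    kept = [arg for arg in init_args.split() if arg.split("=")[0] != FLAG]
    kept.append(f"{FLAG}={'true' if enabled else 'false'}")
    return " ".join(kept)


def disable_enhanced_launch_barrier(env=os.environ):
    """Assumption: libtpu picks up flags from LIBTPU_INIT_ARGS at initialization."""
    env["LIBTPU_INIT_ARGS"] = with_barrier_flag(env.get("LIBTPU_INIT_ARGS", ""), False)
```

Calling `disable_enhanced_launch_barrier()` at the very top of a test script, before torch_xla is imported, would keep any other arguments already present in the variable.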

I think that when running Compile, we compile for all of the available PjRt devices:

xla::DeviceAssignment device_assignment(client_->device_count(), 1);

When the computation is executed, the barrier probably expects every device in that device assignment to be running the same computation. I am creating this issue to verify and fix this.
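
The suspected failure mode can be illustrated with a toy model. This is not libtpu's implementation: `threading.Barrier` stands in for the allreduce on the run_id, threads stand in for devices, and a short timeout stands in for what would be an indefinite hang. The function name and parameters are hypothetical.

```python
import threading


def launch_completes(compiled_for, launched, timeout_s=0.5):
    """Toy model: a barrier sized for compiled_for devices, reached by launched devices."""
    barrier = threading.Barrier(compiled_for)
    results = []
    lock = threading.Lock()

    def device():
        try:
            # Each launched device waits for all compiled_for participants.
            barrier.wait(timeout=timeout_s)
            ok = True
        except threading.BrokenBarrierError:
            # Not enough participants arrived before the timeout.
            ok = False
        with lock:
            results.append(ok)

    threads = [threading.Thread(target=device) for _ in range(launched)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results)
```

In this model, `launch_completes(compiled_for=4, launched=4)` succeeds, while `launch_completes(compiled_for=4, launched=1)` fails because the single participant waits for three devices that never arrive. That matches the hypothesis that an executable compiled for client_->device_count() devices but executed on fewer would stall at the barrier.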

@bhavya01 bhavya01 self-assigned this May 2, 2025
@ysiraichi ysiraichi added bug Something isn't working libtpu labels May 5, 2025