I can successfully run the openvla-oft test example and get the output action chunk on an L40 server, but when I switch to an H20 server, the run crashes with the error below. Why is this happening?
Using LIBERO constants:
NUM_ACTIONS_CHUNK = 8
ACTION_DIM = 7
PROPRIO_DIM = 8
ACTION_PROPRIO_NORMALIZATION_TYPE = bounds_q99
If needed, manually set the correct constants in `prismatic/vla/constants.py`!
2025-06-17 13:29:57.916840: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2348] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 9.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
Instantiating pretrained VLA policy...
Created backup of original config at: /home/zyli/openvla-oft/openvla-7b-oft-finetuned-libero-spatial/config.json.back.20250617_133010
Updated config.json at: /home/zyli/openvla-oft/openvla-7b-oft-finetuned-libero-spatial/config.json
Changes made:
- Set AutoConfig to "configuration_prismatic.OpenVLAConfig"
- Set AutoModelForVision2Seq to "modeling_prismatic.OpenVLAForActionPrediction"
<frozen importlib._bootstrap>:283: DeprecationWarning: the load_module() method is deprecated and slated for removal in Python 3.12; use exec_module() instead
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.53it/s]
Floating point exception (core dumped)
My understanding is that the H20 GPU has compute capability 9.0, which currently released TensorFlow versions do not support, so I would have to recompile TensorFlow from source. Is that right?
Has anyone successfully configured a working environment on an H100 or H20 server?
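In case it helps narrow this down: the log above only shows `Floating point exception (core dumped)`, not which call triggered it. A minimal sketch for finding out, assuming the test example is launched as a plain Python script, is to rerun it with Python's built-in `faulthandler` enabled so the interpreter prints the Python traceback when it receives SIGFPE. The helper names `describe_exit` and `run_with_faulthandler` are made up for illustration and are not part of openvla-oft.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"


def run_with_faulthandler(script_args):
    """Run the current Python interpreter with faulthandler enabled.

    script_args is the list of arguments you would normally pass to `python`,
    for example the script path followed by its own options.
    """
    # -X faulthandler makes the interpreter dump the Python traceback to stderr
    # when it receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)
```

For example, `run_with_faulthandler(["path/to/your_test_script.py"])` (the path is a placeholder) should print the traceback of the crashing call to stderr and return `"killed by SIGFPE"` if the crash reproduces. The traceback would show whether the failing frame is in TensorFlow or somewhere else, which is worth knowing before spending time on a source rebuild.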