pytorch/torchtune

Runtime Error: BF16 unsupported on supported hardware #891


Closed

slobodaapl opened this issue Apr 28, 2024 · 8 comments

Comments

@slobodaapl commented Apr 28, 2024

I am using the default lora_finetune_single_device.py and the 2B_qlora_single_device.yaml config, both without modifications, running on an RTX 4090.

Attempting to use tune run lora_finetune_single_device.py --config 2B_qlora_single_device.yaml results in:

RuntimeError: bf16 precision was requested but not available on this hardware. Please use fp32 precision instead.

The environment is set up with mamba and the latest torch, with all requirements met:

torch==2.3.0
torchao==0.1
torchaudio==2.3.0
torchtune==0.1.1
torchvision==0.18.0

I tried running the following:

>>> torch.cuda.is_available()
True
>>> torch.cuda.is_bf16_supported()
True

Also tested on nightly:

torch==2.4.0.dev20240428+cu121
torchtune==0.2.0.dev20240428+cu121
@kartikayk (Contributor)

Thanks for filing this issue!

I'm surprised you're running into this issue for two reasons:

  • As you pointed out, 4090s support bfloat16
  • We relaxed this check for non-CUDA devices in this PR

I just launched a QLoRA training run on a 4090 using the nightly build and didn't run into this. Mind checking the following as well?

torch.distributed.is_nccl_available()
torch.cuda.nccl.version() >= (2, 10)
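
For context, the error comes from a bf16-support check that roughly requires all of the above to hold; this is an illustrative sketch of the logic (the helper name and exact conditions are assumptions, not our verbatim code):

import torch
import torch.distributed

def bf16_available() -> bool:
    # Illustrative sketch only; the real check in torchtune may differ in detail.
    # The NCCL availability check short-circuits before the version call,
    # which would otherwise raise on builds shipped without NCCL.
    return (
        torch.cuda.is_available()
        and torch.cuda.is_bf16_supported()
        and torch.distributed.is_nccl_available()
        and torch.cuda.nccl.version() >= (2, 10)
    )

if not bf16_available():
    raise RuntimeError(
        "bf16 precision was requested but not available on this hardware. "
        "Please use fp32 precision instead."
    )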

@slobodaapl (Author)

> I just launched a QLoRA training run on a 4090 using the nightly build and didn't run into this. Mind checking the following as well?
>
> torch.distributed.is_nccl_available()
> torch.cuda.nccl.version() >= (2, 10)

Ahh it seems that might be the issue, though I didn't know that was a requirement:

>>> torch.distributed.is_nccl_available()
False
>>> torch.cuda.nccl.version() >= (2, 10)
AttributeError: module 'torch._C' has no attribute '_nccl_version'

I had assumed I could launch this training on Windows; I should have mentioned that Windows is the OS I am using. Does this mean torchtune will not work on Windows?
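
For anyone else hitting the AttributeError above, a guarded version of the check avoids it on builds shipped without NCCL (a minimal sketch):

import torch
import torch.distributed

# torch.cuda.nccl.version() raises on builds without NCCL (e.g. Windows),
# so gate it behind the availability check first
if torch.distributed.is_nccl_available():
    print("NCCL version:", torch.cuda.nccl.version())
else:
    print("NCCL is not available in this build")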

@ebsmothers (Contributor)

@slobodaapl thanks for pointing this out. Right now I think we assume availability of NCCL in a couple of places. @rohan-varma may know best here: is it sufficient to just point to the Gloo backend for Windows, or is there more we need to do for proper support?
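
For reference, pointing to Gloo would look roughly like the following single-process sketch (untested on Windows; the setup details are assumptions, not validated torchtune code):

import os
import torch.distributed as dist

# Minimal single-process initialization; Gloo is the CPU-oriented backend,
# while NCCL is only shipped in Linux/CUDA builds.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)
print(dist.get_backend())  # gloo
dist.destroy_process_group()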

@rohan-varma (Member)

@ebsmothers The Gloo library is currently unmaintained; there was at one point minimal support for Windows, but no one in PyTorch core is on the hook for maintaining it at the moment. @slobodaapl torchtune is currently not tested on Windows; we've only run comprehensive tests and verification on Linux machines so far.

@Nihilentropy-117 commented May 30, 2024

I also hit the same error:

RuntimeError: bf16 precision was requested but not available on this hardware. Please use fp32 precision instead.

Python Version: 3.11.9
PyTorch Version: 2.3.0+rocm6.0
BitsAndBytes Version: 0.43.2.dev
Torchtune Version: 0.1.1+cpu
GPU Info: 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT] (rev cc)
AMD Driver Version: OpenGL version string: 4.6 (Compatibility Profile) Mesa 24.0.8-arch1.1
OS Description: Arch Linux
Kernel Version: 6.9.1-arch1-1

torch.cuda.is_bf16_supported() = True
torch.distributed.is_nccl_available() = True
torch.cuda.nccl.version() >= (2, 10) = True

> We relaxed this check for non-CUDA devices in this PR

Applying the changes from that PR fixes the issue for me on the above system.
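
For anyone curious, the relaxation presumably amounts to only enforcing the NCCL requirements on CUDA devices; a sketch under that assumption (not the PR's exact diff):

import torch
import torch.distributed

def bf16_available(device: torch.device) -> bool:
    # Assumption: non-CUDA devices skip the CUDA/NCCL-specific requirements
    # entirely instead of failing the bf16 check outright.
    if device.type != "cuda":
        return True
    return (
        torch.cuda.is_bf16_supported()
        and torch.distributed.is_nccl_available()
        and torch.cuda.nccl.version() >= (2, 10)
    )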

@msaroufim (Member)

I'm seeing this error pop up a lot here https://discuss.pytorch.org/t/fine-tune-llms-using-torchtune/201804

@kartikayk (Contributor)

I think this change is in the nightly, not in the stable package. @Nihilentropy-117 can you try the instructions mentioned here: https://pytorch.org/torchtune/main/install.html? @msaroufim can you help point folks to this?

A couple of follow-ups for us (cc: @ebsmothers):

  • Make the nightly clearer in the README and highlight which features are not in 0.1.1
  • Package push, which we're planning in a couple of weeks
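
A quick way to confirm which build you're on (assuming the installed package exposes __version__, as the builds in this thread do):

import torchtune

# The stable release reports 0.1.1, while a nightly reports a .dev version,
# e.g. 0.2.0.dev20240428+cu121.
print(torchtune.__version__)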

@kartikayk (Contributor)

Actually, I responded to the thread. Thanks for sharing, @msaroufim! I haven't been keeping up with torchtune issues on PyTorch Discuss, but will do so moving forward.
