pytorch/torchtune

Runtime Error: BF16 unsupported on supported hardware #891


Closed

slobodaapl opened this issue Apr 28, 2024 · 8 comments

Comments

@slobodaapl commented Apr 28, 2024

I am using the default lora_finetune_single_device.py and the 2B_qlora_single_device.yaml config, both without modifications, running on an RTX 4090.

Attempting to use tune run lora_finetune_single_device.py --config 2B_qlora_single_device.yaml results in:

RuntimeError: bf16 precision was requested but not available on this hardware. Please use fp32 precision instead.

The environment is set up with mamba and the latest torch, with all requirements met:

torch==2.3.0
torchao==0.1
torchaudio==2.3.0
torchtune==0.1.1
torchvision==0.18.0

I tried running the following:

>>> torch.cuda.is_available()
True
>>> torch.cuda.is_bf16_supported()
True

Also tested on nightly:

torch==2.4.0.dev20240428+cu121
torchtune==0.2.0.dev20240428+cu121
@kartikayk (Contributor)

Thanks for filing this issue!

I'm surprised you're running into this issue for two reasons:

  • As you pointed out, 4090s support bfloat16
  • We relaxed this check for non-CUDA devices in this PR

I just launched a QLoRA training run on a 4090 using the nightly build and didn't run into this. Mind checking the following as well?

torch.distributed.is_nccl_available()
torch.cuda.nccl.version() >= (2, 10)
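
For context, the error comes from a bf16-support check that roughly requires all of the above to hold; this is an illustrative sketch of the logic (the helper name and exact conditions are assumptions, not our verbatim code):

import torch
import torch.distributed

def bf16_available() -> bool:
    # Illustrative sketch only; the real check in torchtune may differ in detail.
    # The NCCL availability check short-circuits before the version call,
    # which would otherwise raise on builds shipped without NCCL.
    return (
        torch.cuda.is_available()
        and torch.cuda.is_bf16_supported()
        and torch.distributed.is_nccl_available()
        and torch.cuda.nccl.version() >= (2, 10)
    )

if not bf16_available():
    raise RuntimeError(
        "bf16 precision was requested but not available on this hardware. "
        "Please use fp32 precision instead."
    )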

@slobodaapl (Author)

> I just launched a QLoRA training run on a 4090 using the nightly build and didn't run into this. Mind checking the following as well?
>
> torch.distributed.is_nccl_available()
> torch.cuda.nccl.version() >= (2, 10)

Ahh it seems that might be the issue, though I didn't know that was a requirement:

>>> torch.distributed.is_nccl_available()
False
>>> torch.cuda.nccl.version() >= (2, 10)
AttributeError: module 'torch._C' has no attribute '_nccl_version'

I had assumed I could launch this training on Windows; I should have mentioned that Windows is the OS I am using. Does this mean torchtune will not work on Windows?
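
For anyone else hitting the AttributeError above, a guarded version of the check avoids it on builds shipped without NCCL (a minimal sketch):

import torch
import torch.distributed

# torch.cuda.nccl.version() raises on builds without NCCL (e.g. Windows),
# so gate it behind the availability check first
if torch.distributed.is_nccl_available():
    print("NCCL version:", torch.cuda.nccl.version())
else:
    print("NCCL is not available in this build")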

@ebsmothers (Contributor)

@slobodaapl thanks for pointing this out. Right now I think we assume availability of NCCL in a couple of places. @rohan-varma may know best here: is it sufficient to just point to the Gloo backend for Windows, or is there more we need to do for proper support?
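
For reference, pointing to Gloo would look roughly like the following single-process sketch (untested on Windows; the setup details are assumptions, not validated torchtune code):

import os
import torch.distributed as dist

# Minimal single-process initialization; Gloo is the CPU-oriented backend,
# while NCCL is only shipped in Linux/CUDA builds.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)
print(dist.get_backend())  # gloo
dist.destroy_process_group()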

@rohan-varma (Member)

@ebsmothers The Gloo library is currently unmaintained; there was at one point minimal support for Windows, but no one in PyTorch core is on the hook for maintaining it at the moment. @slobodaapl torchtune is currently not tested on Windows; we've only run comprehensive tests and verification on Linux machines so far.

@Nihilentropy-117 commented May 30, 2024

I also hit the same error:

RuntimeError: bf16 precision was requested but not available on this hardware. Please use fp32 precision instead.

Python Version: 3.11.9
PyTorch Version: 2.3.0+rocm6.0
BitsAndBytes Version: 0.43.2.dev
Torchtune Version: 0.1.1+cpu
GPU Info: 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT] (rev cc)
AMD Driver Version: OpenGL version string: 4.6 (Compatibility Profile) Mesa 24.0.8-arch1.1
OS Description: Arch Linux
Kernel Version: 6.9.1-arch1-1

torch.cuda.is_bf16_supported() = True
torch.distributed.is_nccl_available() = True
torch.cuda.nccl.version() >= (2, 10) = True

> We relaxed this check for non-CUDA devices in this PR

Applying the changes from that PR fixes the issue for me on the above system.
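
For anyone curious, the relaxation presumably amounts to only enforcing the NCCL requirements on CUDA devices; a sketch under that assumption (not the PR's exact diff):

import torch
import torch.distributed

def bf16_available(device: torch.device) -> bool:
    # Assumption: non-CUDA devices skip the CUDA/NCCL-specific requirements
    # entirely instead of failing the bf16 check outright.
    if device.type != "cuda":
        return True
    return (
        torch.cuda.is_bf16_supported()
        and torch.distributed.is_nccl_available()
        and torch.cuda.nccl.version() >= (2, 10)
    )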

@msaroufim (Member)

I'm seeing this error pop up a lot here https://discuss.pytorch.org/t/fine-tune-llms-using-torchtune/201804

@kartikayk (Contributor)

I think this change is in the nightly, not in the stable package. @Nihilentropy-117 can you try the instructions mentioned here: https://pytorch.org/torchtune/main/install.html? @msaroufim can you help point folks to this?

A couple of follow-ups for us (cc: @ebsmothers):

  • Make the nightly clearer in the README and highlight which features are not in 0.1.1
  • Package push, which we're planning in a couple of weeks
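
A quick way to confirm which build you're on (assuming the installed package exposes __version__, as the builds in this thread do):

import torchtune

# The stable release reports 0.1.1, while a nightly reports a .dev version,
# e.g. 0.2.0.dev20240428+cu121.
print(torchtune.__version__)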

@kartikayk (Contributor)

Actually, I responded to the thread. Thanks for sharing, @msaroufim! I haven't been keeping up with torchtune issues on PyTorch Discuss, but will do so moving forward.
