Runtime Error: BF16 unsupported on supported hardware #891
Comments
Thanks for filing this issue! I'm surprised you're running into this for a couple of reasons; for one, I just launched a QLoRA training run on a 4090 using the nightly build and didn't run into this. Mind checking the following as well?
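(The specific checklist items aren't preserved in this thread. As a torchtune-independent illustration of the kind of check that helps here, the minimal sketch below prints whether the GPU reports bf16 support and whether NCCL is available; the NCCL question matters because, as discussed further down, torchtune assumes NCCL in a few places and NCCL is Linux-only.)

```python
import torch
import torch.distributed as dist

# Minimal sketch: report what the local stack supports, independent of torchtune.
# An RTX 4090 (Ada, compute capability 8.9) supports bf16 in hardware, so the
# bf16 line should print True on that card.
print("CUDA available:     ", torch.cuda.is_available())
print("Device:             ", torch.cuda.get_device_name(0))
print("Compute capability: ", torch.cuda.get_device_capability(0))
print("bf16 supported:     ", torch.cuda.is_bf16_supported())

# NCCL availability is separate from bf16 hardware support; NCCL is Linux-only,
# so this prints False on Windows even when the GPU supports bf16.
print("NCCL available:     ", dist.is_nccl_available())
```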
Ahh, it seems that might be the issue, though I didn't know that was a requirement:
I had assumed I could launch this training on Windows, and I should've mentioned that that's the OS I'm using. Does this mean torchtune will not work on Windows?
@slobodaapl thanks for pointing this out. Right now I think we assume availability of NCCL in a couple of places. @rohan-varma may know best here: is it sufficient to just point to the Gloo backend for Windows, or is there more we need to do for proper support?
@ebsmothers the Gloo library is currently unmaintained; at one point there was minimal support for Windows, but no one is on the hook for maintaining it in PyTorch core at the moment. @slobodaapl torchtune is currently not tested on Windows; we've only run comprehensive tests and verification on Linux machines so far.
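(For readers unfamiliar with the backends being discussed: the sketch below is a hypothetical illustration of what "pointing to the Gloo backend" means at the `torch.distributed` level. It is not torchtune code and does not imply torchtune works on Windows.)

```python
import torch.distributed as dist

# Hypothetical sketch: choose a process-group backend based on what's available.
# NCCL is the usual GPU backend on Linux; Gloo also runs on Windows but, as
# noted above, is largely unmaintained.
backend = "nccl" if dist.is_nccl_available() else "gloo"

# Single-process initialization so the snippet runs standalone.
dist.init_process_group(
    backend=backend,
    init_method="tcp://127.0.0.1:29500",
    rank=0,
    world_size=1,
)
print("initialized process group with backend:", dist.get_backend())
dist.destroy_process_group()
```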
I also had Python version 3.11.9. Making these changes fixes the issue for me on the above system.
I'm seeing this error pop up a lot here: https://discuss.pytorch.org/t/fine-tune-llms-using-torchtune/201804
I think this change is in the nightly build, not in the stable package. @Nihilentropy-117 can you try the instructions mentioned here: https://pytorch.org/torchtune/main/install.html? @msaroufim can you help point folks to this? A couple of follow-ups for us (cc: @ebsmothers).
Actually, I responded to the thread. Thanks for sharing, @msaroufim! I haven't been keeping up with torchtune issues on PyTorch Discuss; will do so moving forward.
I am using the default `lora_finetune_single_device.py` without any modifications, and `2B_qlora_single_device.yaml` without modifications, running on an RTX 4090. Attempting to use

```
tune run lora_finetune_single_device.py --config 2B_qlora_single_device.yaml
```

results in:

```
RuntimeError: bf16 precision was requested but not available on this hardware. Please use fp32 precision instead.
```

The environment is set up with `mamba`, with the latest torch and all requirements met. I tried running the following:
Also tested on nightly:
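(Aside: until Windows is properly supported, the error message's own suggestion of falling back to fp32 may be a usable stopgap. In the recipe config this is presumably the `dtype` entry; that field name is an assumption here, so check the YAML. The fallback logic it mirrors is roughly the sketch below.)

```python
import torch

# Hedged sketch of the fallback the error message suggests: train in bf16
# when the stack reports support, otherwise drop to fp32 (more memory, but
# available everywhere).
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
dtype = torch.bfloat16 if use_bf16 else torch.float32
print("training dtype:", dtype)
```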