This test class has a number of problems. In no particular order:

- The `enable_fsdp=True` case is not working properly: since we are on a single device, we are actually testing the `NO_SHARD` rather than the `FULL_SHARD` sharding strategy (see the first sketch after this list).
- The `full_bf16=True` case probably adds more complexity than value.
- The way different formatted strings are built and modified for the different `tune` commands is unclear and undocumented (see the second sketch after this list).
- (Part of a larger issue) We should split the `test_lora_finetune.py` test file into single-device and distributed files to align with the split in the recipe files.
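On the first point, a minimal sketch of why single-device CI cannot exercise real sharding. This helper is illustrative only (it is not the actual test code), and it assumes the test has access to `torch.distributed`:

```python
import torch.distributed as dist
from torch.distributed.fsdp import ShardingStrategy


def effective_sharding_strategy(requested: ShardingStrategy) -> ShardingStrategy:
    """Illustrative helper: what FSDP can actually exercise for the current world size."""
    # With a single rank there is nothing to shard across, so a FULL_SHARD
    # request degenerates to NO_SHARD-equivalent behavior. A single-device run
    # with enable_fsdp=True therefore never tests real sharding.
    if not dist.is_initialized() or dist.get_world_size() == 1:
        return ShardingStrategy.NO_SHARD
    return requested
```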
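On the third point, one way to make the command construction self-documenting is to assemble argv from structured pieces instead of splicing values into one long format string. A rough sketch, assuming the test invokes the `tune` CLI with `key=value` config overrides; the recipe name, config name, and override keys below are placeholders, not the actual test values:

```python
from typing import Any, Dict, List


def build_tune_cmd(recipe: str, config: str, overrides: Dict[str, Any]) -> List[str]:
    """Assemble a tune command as a token list rather than a formatted string."""
    cmd = ["tune", recipe, "--config", config]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


# The overrides that vary per test case live in one dict per case, so each
# case reads as data rather than string surgery. (Placeholder names below.)
cmd = build_tune_cmd(
    "lora_finetune_single_device",
    "lora_finetune_test_config",
    {"enable_fsdp": False, "epochs": 1},
)
```

The resulting list can then be run with `subprocess.run(cmd, check=True)` or used to monkeypatch `sys.argv` inside the test, whichever the test harness already does.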
BTW @ebsmothers I'm also getting a "duplicate distributed initialized" error for this test. I thought that got fixed?
@kartikayk it should be fixed. Can you give the command you're running? And is the error the same one about trying to initialize a process group that's already been initialized?
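If it is that error, the usual culprit is something in the test path calling `init_process_group` twice in the same process, and the standard fix is to guard the call. A minimal sketch, with the backend choice and the location of the guard being assumptions rather than what the recipe actually does:

```python
import torch.distributed as dist


def maybe_init_process_group(backend: str = "gloo") -> None:
    # Guard against "trying to initialize a process group that's already been
    # initialized": only create the default group if one does not exist yet.
    if not dist.is_initialized():
        dist.init_process_group(backend=backend)
```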