Fix TestLoRAFinalCheckpoints test class · Issue #498 · pytorch/torchtune · GitHub

Fix TestLoRAFinalCheckpoints test class #498

Closed
ebsmothers opened this issue Mar 14, 2024 · 4 comments

@ebsmothers
Contributor

This test class has a number of problems. In no particular order:

  1. The enable_fsdp=True case is not working properly: since we are on a single device, we are actually testing the NO_SHARD sharding strategy and not FULL_SHARD.
  2. The full_bf16=True case is probably adding more complexity than value.
  3. The building and modifying of different formatted strings for different tune commands is unclear and not well-documented.
  4. (Part of a larger issue) We should split the test_lora_finetune.py test file into single device and distributed files to align with the split in the recipe files.
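
One way to address point 3 is to assemble each command from a base argument list plus a dictionary of overrides instead of editing formatted strings in place. The sketch below is only an illustration: `build_command` is a hypothetical helper, the base command is a placeholder (the issue does not show the real `tune` argument layout), and passing `enable_fsdp` and `full_bf16` as `key=value` tokens is an assumption.

```python
import shlex
from typing import List, Mapping, Sequence


def build_command(base: Sequence[str], overrides: Mapping[str, object]) -> List[str]:
    """Return the base argv plus one key=value token per override, in sorted key order."""
    cmd = list(base)
    for key in sorted(overrides):
        cmd.append(f"{key}={overrides[key]}")
    return cmd


# Placeholder base command: the real recipe and config names are not given in the issue.
BASE = ["tune", "<recipe>", "--config", "<config>"]

# Hypothetical test case: the flag names come from the issue, the key=value form is assumed.
single_device = build_command(BASE, {"enable_fsdp": False, "full_bf16": False})
print(shlex.join(single_device))
```

Keeping the command as a list until the last moment means each test case differs only in its overrides dictionary, which is easier to read and document than a string that is modified in several places.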
@rohan-varma
Member

We should also definitely enable multi-GPU CI now that we're in pytorch repo and these runners are available to us: #500

@kartikayk
Contributor

BTW @ebsmothers I'm also getting "duplicate distributed initialized" error for this test. I thought that got fixed?

@ebsmothers
Contributor Author

> BTW @ebsmothers I'm also getting "duplicate distributed initialized" error for this test. I thought that got fixed?

@kartikayk it should be fixed. Can you give the command you're running? And is the error the same one about trying to initialize a process group that's already been initialized?
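
For reference, an error about initializing a process group that already exists is commonly avoided by checking for an existing group before initializing. The sketch below shows that general pattern only; it is not necessarily how this was fixed in torchtune. `ensure_process_group` is a hypothetical helper, and `dist` is assumed to behave like `torch.distributed` (which provides `is_initialized()` and `init_process_group()`).

```python
from typing import Any


def ensure_process_group(dist: Any, **init_kwargs: Any) -> bool:
    """Initialize the default process group only if none exists yet.

    `dist` is expected to behave like `torch.distributed`.
    Returns True if this call performed the initialization, False otherwise.
    """
    if dist.is_initialized():
        return False
    dist.init_process_group(**init_kwargs)
    return True


# Assumed usage inside a test launched with the usual rendezvous
# environment variables already set (for example by torchrun):
#
#   import torch.distributed as dist
#   ensure_process_group(dist, backend="gloo")
```

Passing the module in as an argument keeps the helper easy to exercise with a stub, so the guard itself can be unit-tested without any distributed setup.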

@ebsmothers
Contributor Author

Fixed in #537
