feat: Update sft config to use single GPU #90
Signed-off-by: ashors1 <ashors@nvidia.com>
Can you verify that the loss is at least somewhat going down with the new default before we merge? Otherwise, LGTM.
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
@SahilJain314 @ashors1 I've added slightly different defaults for the 1-GPU job. Both train and val curves go down enough.
LGTM @okuchaiev. Out of curiosity, is there a reason you decreased the sequence length to 1k? Were you running into OOM with 2k?
No, I wasn't running into OOM with 2K. But for SQuAD, I think even 1K could be overkill.
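One way to sanity-check a sequence-length default like this is to look at what share of training examples actually fits under each candidate limit. The snippet below is only a minimal sketch: `fraction_within_limit` is a hypothetical helper, and the token counts are made-up illustrative numbers, not measurements from SQuAD or from this PR's config.

```python
from typing import Iterable


def fraction_within_limit(token_counts: Iterable[int], max_seq_length: int) -> float:
    """Return the share of examples whose token count fits in max_seq_length."""
    counts = list(token_counts)
    if not counts:
        raise ValueError("token_counts is empty")
    fitting = sum(1 for n in counts if n <= max_seq_length)
    return fitting / len(counts)


# Hypothetical per-example token counts, for illustration only
# (in practice these would come from tokenizing the dataset).
example_counts = [180, 240, 310, 95, 1500, 420]

for limit in (1024, 2048):
    share = fraction_within_limit(example_counts, limit)
    print(f"max_seq_length={limit}: {share:.1%} of examples fit")
```

If nearly all examples already fit under the smaller limit, raising it mostly costs memory and padding without changing what the model sees.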
Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com> Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com> Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information