Allow arbitrary training args to be overridden #1008
Conversation
Force-pushed from 5387d3b to a262c1e
src/instructlab/train/linux_train.py (Outdated)
training_args["fp16"] = use_fp16
training_args["bf16"] = not use_fp16
The bf16 issue is addressed in #993
Sounds good, but this PR isn't really intended to deal with any specific training options; the point is to allow advanced users to override any of them without needing to change the code.
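For illustration, a minimal sketch of the idea (not the PR's actual code; the function name and default keys are hypothetical): the CLI parses a JSON string and lets it override whatever defaults the training script would otherwise hard-code.

import json
from typing import Optional

from transformers import TrainingArguments


def build_training_args(defaults: dict, override_json: Optional[str]) -> TrainingArguments:
    """Merge a user-supplied JSON override string into the default kwargs."""
    overrides = json.loads(override_json) if override_json else {}
    merged = {**defaults, **overrides}  # user-supplied keys win over the defaults
    return TrainingArguments(**merged)


# Example: the defaults must still include required fields such as output_dir.
args = build_training_args(
    {"output_dir": "training_results", "fp16": False, "bf16": True},
    '{"bf16": false, "gradient_accumulation_steps": 8}',
)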
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from a262c1e to 845ff0e
Force-pushed from 845ff0e to 4617a7d
I tested this patch and the following worked for me using my RTX A4000 GPU with 16G of VRAM:
$ ilab train --device cuda --override-training-args '{"bf16":false, "gradient_checkpointing":true, "gradient_accumulation_steps":8}'
TY!
Force-pushed from 4617a7d to e80d6d3
This pull request has merge conflicts that must be resolved before it can be merged.
Due to the complexity of the data, it seems this is better suited to be added to …
Can we have functional test coverage for this?
Force-pushed from e80d6d3 to 1a5cc23
I've added an example of how to use this from a JSON file, e.g. --override-training-args "$(< override_train_args.json)"
If this is merged, I'll update the e2e tests, which should cover it (e.g. #1111).
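For reference, an override_train_args.json used that way might look like the following; the keys are just the example TrainingArguments fields already mentioned above, not anything the PR prescribes:

$ cat override_train_args.json
{
  "bf16": false,
  "gradient_checkpointing": true,
  "gradient_accumulation_steps": 8
}
$ ilab train --device cuda --override-training-args "$(< override_train_args.json)"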
Adding as a hidden argument to allow experimentation on various devices. Eventually, once we know what's needed, we can add something more permanent.
Fixes instructlab#1007
Signed-off-by: Derek Higgins <derekh@redhat.com>
Force-pushed from 1a5cc23 to 4f776f8
This pull request has merge conflicts that must be resolved before it can be merged.
try:
    override_training_args_dict = json.loads(override_training_args)
except json.decoder.JSONDecodeError as e:
    ctx.fail("Parsing override trainign args: " + str(e))
"trainign" nit on spelling.
I think the command should fail (the CLI exits) if the input is malformed, too, rather than proceeding and making the user Ctrl-C and reload.
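One way to get that fail-fast behaviour, assuming the option is wired up with click like the rest of the ilab CLI (a sketch, not the code in this PR):

import json

import click


def _parse_override_training_args(ctx, param, value):
    """Validate --override-training-args while the CLI is still parsing arguments."""
    if value is None:
        return {}
    try:
        return json.loads(value)
    except json.JSONDecodeError as e:
        # BadParameter makes click print a usage error and exit immediately,
        # instead of starting training and forcing the user to Ctrl-C.
        raise click.BadParameter(f"invalid JSON: {e}") from e

The callback would then be attached to the option via callback=_parse_override_training_args.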
This functionality is super desirable. In the churn of designing the CLI in the context of other pillars of the project (SDG, evaluation, publishing), it's become clear that we need more widespread and type-checked configuration for everything. @cdoern's inbound "profiles" PR accounts for this, taking a first step toward application-wide default and override configuration support.
@derekhiggins your PR is very much appreciated; we'd love your input on @cdoern's work as well.
@JamesKunstle can you provide a link (or links) to the work you're referring to and requesting feedback on?
Closing this; a lot has changed since it was created and it's probably no longer relevant.
Adding as a hidden argument to allow experimentation on various devices. Eventually, once we know what's needed, we can add something more permanent.
Fixes #1007
With this PR and #1012, running the ilab e2e (including training) works on Colab.
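For context, a hidden click option is usually declared roughly as below; the actual decorator stack on ilab's train command will differ, so take the details here as an assumption rather than the PR's exact diff:

import click


@click.command()
@click.option(
    "--override-training-args",
    default=None,
    hidden=True,  # experimental flag: accepted but not shown in --help
    help="JSON dict of training arguments to override.",
)
def train(override_training_args):
    click.echo(f"overrides: {override_training_args}")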