Allow arbitrary training args to be overridden by derekhiggins · Pull Request #1008 · instructlab/instructlab · GitHub

Closed · wants to merge 1 commit

Conversation

@derekhiggins (Contributor) commented Apr 25, 2024

Adding as a hidden argument to allow experimentation on various devices. Eventually, once we know what's needed, we can add something more permanent.

Fixes #1007

With this PR and #1012, running the ilab e2e (including training) works on Colab:

!ilab train --device cuda --override-training-args '{"bf16":false, "gradient_checkpointing":true, "gradient_accumulation_steps":8}'
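For readers unfamiliar with the pattern, the override mechanism can be sketched roughly as follows. This is a hypothetical helper, not the PR's actual code: a JSON string is parsed and merged over the default training arguments, so user-supplied keys win.

```python
import json

# Hedged sketch (hypothetical names, not the PR's implementation):
# parse the --override-training-args JSON string and merge it over
# the defaults, so any key the user supplies replaces the default.
def apply_overrides(training_args, override_json):
    overrides = json.loads(override_json)  # raises json.JSONDecodeError on bad input
    training_args.update(overrides)
    return training_args

defaults = {
    "bf16": True,
    "gradient_checkpointing": False,
    "gradient_accumulation_steps": 1,
}
merged = apply_overrides(
    defaults,
    '{"bf16": false, "gradient_checkpointing": true, "gradient_accumulation_steps": 8}',
)
```

Because the merge is a plain dict update, any HuggingFace-style training argument can be overridden without code changes, which is the point of the PR.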

Comment on lines 243 to 244
training_args["fp16"] = use_fp16
training_args["bf16"] = not use_fp16
Contributor

The bf16 issue is addressed in #993

Contributor Author

Sounds good, but this PR isn't really intended to deal with any specific training option; the point is to allow an advanced user to override any of them without needing to change the code.

@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Apr 29, 2024
mergify bot commented Apr 29, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @derekhiggins please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@github-actions github-actions bot removed the testing Relates to testing label Apr 29, 2024
@mergify mergify bot removed the needs-rebase This Pull Request needs to be rebased label Apr 29, 2024
@maxamillion (Contributor)

I tested this patch, and the following worked for me on my RTX A4000 GPU with 16 GB of VRAM:

$ ilab train --device cuda --override-training-args '{"bf16":false, "gradient_checkpointing":true, "gradient_accumulation_steps":8}'

TY!

@mergify mergify bot added the testing Relates to testing label Apr 30, 2024
@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label May 6, 2024
mergify bot commented May 6, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @derekhiggins please rebase it.

@mergify mergify bot added needs-rebase This Pull Request needs to be rebased and removed needs-rebase This Pull Request needs to be rebased labels May 6, 2024
mergify bot commented May 7, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @derekhiggins please rebase it.

@tyll commented May 8, 2024

Due to the complexity of the data, this seems better suited to config.yaml than to passing it on the command line.

@leseb (Contributor) left a comment

Can we have functional test coverage for this?

@mergify mergify bot added ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased labels May 23, 2024
@derekhiggins (Contributor, Author) commented May 23, 2024

Due to the complexity of the data, it seems this is better suited to be added to config.yaml instead of passing it on the command line.

I've added an example of how to use this from a JSON file, e.g. --override-training-args "$(< override_train_args.json)".
Would that be enough? There is currently no training section in config.yaml, and I'm not sure this is a good reason to add one.

Can we have functional test coverage for this?

If this is merged, I'll update the e2e tests, which should cover it (e.g. #1111).

Adding as a hidden argument to allow experimentation
on various devices. Eventually once we know what's needed
we can add something more permanent.

Fixes instructlab#1007

Signed-off-by: Derek Higgins <derekh@redhat.com>
@mergify mergify bot removed the ci-failure PR has at least one CI failure label May 23, 2024
@leseb leseb requested a review from tiran May 30, 2024 14:15
@mergify mergify bot added the one-approval PR has one approval from a maintainer label May 30, 2024
@nathan-weinberg nathan-weinberg requested a review from a team June 4, 2024 14:23
@mergify mergify bot removed the e2e-trigger label Jun 4, 2024
Copy link
Contributor
mergify bot commented Jun 5, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @derekhiggins please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Jun 5, 2024
try:
    override_training_args_dict = json.loads(override_training_args)
except json.decoder.JSONDecodeError as e:
    ctx.fail("Parsing override trainign args: " + str(e))
Contributor

"trainign" nit on spelling.

I think the command fail (CLI exits) if the input is malformed, too, rather than proceeding and making the user ctl-c and reload.
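The fail-fast behavior the reviewer asks for can be sketched as follows. These are illustrative names, not the PR's actual click context: the idea is simply to exit with a clear message on malformed JSON instead of proceeding with partial state.

```python
import json
import sys

# Illustrative sketch (parse_override_training_args is a hypothetical
# name): validate the override JSON up front and exit immediately with
# a clear error message if it is malformed.
def parse_override_training_args(raw):
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        sys.exit(f"Error parsing --override-training-args: {e}")

args = parse_override_training_args('{"bf16": false}')
```

In a click-based CLI, calling ctx.fail() achieves the same effect as sys.exit() here: it prints the message and aborts with a non-zero exit code before any training starts.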

@JamesKunstle JamesKunstle self-requested a review June 21, 2024 15:35
@JamesKunstle (Contributor) left a comment

This functionality is super desirable. In the churn of designing the CLI in the context of the other pillars of the project (SDG, evaluation, publishing), it's become clear that we need more widespread, type-checked configuration for everything. @cdoern's inbound "profiles" PR accounts for this, taking a first step toward application-wide default and override configuration support.

@derekhiggins your PR is very much appreciated; we'd love your input on @cdoern's work as well.

@russellb (Member)


@JamesKunstle can you provide a link (or links) to the work you're referring to and requesting feedback on?

@derekhiggins (Contributor, Author)

Closing this; a lot has changed since it was created and it's probably no longer relevant.

Labels
needs-rebase This Pull Request needs to be rebased · one-approval PR has one approval from a maintainer · testing Relates to testing
Development
Successfully merging this pull request may close these issues:
Training options only allow for well known/tested HW
7 participants