CLI flags/configuration and examples for Linux GPU training · Issue #647 · instructlab/instructlab · GitHub
CLI flags/configuration and examples for Linux GPU training #647
Closed
@bbrowning

Description

The exact settings needed to train successfully on a Linux GPU can vary quite a lot by system: AMD versus Nvidia cards, the memory available on the card, the age of the card, and so on.

Here is a nonexhaustive list of the settings I've found so far that a user may want to tweak, either to get training working at all or to trade overall training speed against resources used:

  • device(s) to use
  • fp16 vs bf16 precision
  • quantization, specifically using 4-bit BitsAndBytes vs not
  • gradient accumulation steps, or disabling accumulation entirely
  • gradient checkpointing enabled/disabled
  • per-device training batch size
  • distributed training across multiple GPUs/CPUs (may be out of scope for just config, as that's more work to set up)
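To make the list above concrete, here is a minimal sketch of how these knobs could be grouped in one place. The `TrainKnobs` class and its field names are hypothetical and are not taken from the project's code. The keys returned by `trainer_kwargs()` mirror Hugging Face `TrainingArguments` field names, on the assumption (not confirmed in this issue) that a Hugging Face-style trainer sits underneath.

```python
from dataclasses import dataclass


@dataclass
class TrainKnobs:
    """Hypothetical container for the tunables listed above."""

    device: str = "cuda"                  # device(s) to use, e.g. "cuda", "cuda:0", "cpu"
    precision: str = "bf16"               # "fp16", "bf16", or "fp32"
    load_in_4bit: bool = False            # 4-bit BitsAndBytes quantization on/off
    gradient_accumulation_steps: int = 1  # 1 means accumulation is effectively disabled
    gradient_checkpointing: bool = False  # trades extra compute for lower memory use
    per_device_train_batch_size: int = 1
    num_devices: int = 1                  # more than 1 implies distributed training

    def __post_init__(self):
        if self.precision not in ("fp16", "bf16", "fp32"):
            raise ValueError(f"unsupported precision: {self.precision!r}")
        if self.gradient_accumulation_steps < 1:
            raise ValueError("gradient_accumulation_steps must be >= 1")
        if self.per_device_train_batch_size < 1 or self.num_devices < 1:
            raise ValueError("batch size and device count must be >= 1")

    def effective_batch_size(self) -> int:
        """Number of samples that contribute to each optimizer step."""
        return (
            self.per_device_train_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )

    def trainer_kwargs(self) -> dict:
        """Keyword arguments in the shape a Hugging Face-style trainer expects (assumed)."""
        return {
            "per_device_train_batch_size": self.per_device_train_batch_size,
            "gradient_accumulation_steps": self.gradient_accumulation_steps,
            "gradient_checkpointing": self.gradient_checkpointing,
            "fp16": self.precision == "fp16",
            "bf16": self.precision == "bf16",
        }
```

One reason to keep `effective_batch_size()` next to the knobs: lowering the per-device batch size to save memory while raising the accumulation steps by the same factor keeps the effective batch size unchanged, at the cost of more forward/backward passes per optimizer step.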

Some of these are configurable by CLI flags today. Can we expose all the needed parameters via CLI flags? Do we need configuration files? Which of these are also applicable to other lab commands, such as serve, generate, test, convert?
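As one possible answer to the flags-versus-config-file question, the sketch below layers three sources: built-in defaults, then an optional config file, then CLI flags. Everything here is illustrative: the flag names, the `DEFAULTS` table, the JSON file format, and the use of `argparse` are assumptions made for the example and do not describe the existing `lab` CLI.

```python
import argparse
import json

# Hypothetical defaults; a real tool would pick its own values.
DEFAULTS = {
    "device": "cpu",
    "precision": "fp32",
    "four_bit_quant": False,
    "gradient_accumulation_steps": 1,
    "gradient_checkpointing": False,
    "per_device_batch_size": 1,
}


def build_parser() -> argparse.ArgumentParser:
    """Every flag defaults to None so we can tell 'not given' from 'given'."""
    p = argparse.ArgumentParser(prog="train-sketch")
    p.add_argument("--config", help="path to a JSON file with any of the settings")
    p.add_argument("--device")
    p.add_argument("--precision", choices=["fp16", "bf16", "fp32"])
    # BooleanOptionalAction (Python 3.9+) also generates a --no-... variant.
    p.add_argument("--four-bit-quant", action=argparse.BooleanOptionalAction, default=None)
    p.add_argument("--gradient-accumulation-steps", type=int)
    p.add_argument("--gradient-checkpointing", action=argparse.BooleanOptionalAction, default=None)
    p.add_argument("--per-device-batch-size", type=int)
    return p


def resolve_settings(argv: list) -> dict:
    """Merge settings with precedence: defaults < config file < CLI flags."""
    args = vars(build_parser().parse_args(argv))
    config_path = args.pop("config")

    settings = dict(DEFAULTS)
    if config_path:
        with open(config_path) as fh:
            file_cfg = json.load(fh)
        unknown = set(file_cfg) - set(DEFAULTS)
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        settings.update(file_cfg)

    # Only flags the user actually passed override the file and the defaults.
    settings.update({k: v for k, v in args.items() if v is not None})
    return settings
```

With this layering, a user could keep a per-machine config file and still override a single knob for one run, for example `resolve_settings(["--config", "gpu.json", "--no-four-bit-quant"])`, where `gpu.json` is a placeholder file name.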

However we expose the necessary configuration, a list of example configurations/flags for different setups would be nice. It should show people which things to tweak to lower memory usage at the expense of speed, perhaps with some guidance on the options needed to bring the GPU memory required under popular thresholds such as 8GB, 16GB, and 24GB.
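As a rough illustration of why precision and quantization matter for those thresholds, the sketch below estimates the memory taken by model weights alone. It is a lower bound only: gradients, optimizer state, and activations add substantially more during training, and 4-bit quantization carries some overhead that is ignored here. The function names and the 7-billion-parameter figure are illustrative assumptions, and GB is treated as 10^9 bytes.

```python
# Bytes needed to store one parameter in each format (4-bit is half a byte).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "4bit": 0.5}

# The popular GPU memory thresholds mentioned above, in GB.
THRESHOLDS_GB = (8, 16, 24)


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


def smallest_threshold_gb(num_params: float, fmt: str):
    """Smallest listed threshold the weights fit under, or None if none do."""
    need = weight_memory_gb(num_params, fmt)
    for limit in THRESHOLDS_GB:
        if need <= limit:
            return limit
    return None


if __name__ == "__main__":
    params = 7e9  # hypothetical 7-billion-parameter model
    for fmt in ("fp32", "bf16", "4bit"):
        gb = weight_memory_gb(params, fmt)
        print(f"{fmt}: {gb:.1f} GB weights, fits under: {smallest_threshold_gb(params, fmt)}")
```

Because the estimate leaves out everything except the weights, a configuration that passes this check may still run out of memory in practice; it is only useful for ruling out setups that cannot possibly fit.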

My assumption here is that a goal would be to give people enough knobs to turn that they can get training going on their machine without having to change the actual Python code. Perhaps others disagree with that assumption? All opinions are welcome!

Metadata

Labels: enhancement, linux, stale