Insights: Lightning-AI/litgpt
Overview
12 Pull requests merged by 6 people
- pin: restrict datasets version to <4.0.0 for compatibility (#2095, merged Jul 15, 2025)
- Complete pending todos in testing (#2088, merged Jul 9, 2025)
- doc: add comments clarifying query / KV groups (#2093, merged Jul 9, 2025)
- doc: add n_query_groups to the attention notation table (#2092, merged Jul 9, 2025)
- [pre-commit.ci] pre-commit suggestions (#2091, merged Jul 7, 2025)
- build(deps): update numpy requirement from <2 to none (#2085, merged Jul 1, 2025)
- docs: Add documentation for the OpenAI-compatible API in LitGPT deployment (#2082, merged Jul 1, 2025; see the client sketch after this list)
- ci: show the longest tests for improvement (#2083, merged Jul 1, 2025)
- update the bug-report issue template with reproduction in Studio (#2081, merged Jun 23, 2025)
- Defer the torch import in config to allow faster imports (#2079, merged Jun 18, 2025)
- limit PR permissions, vol. 2 (#2078, merged Jun 17, 2025)
- limit PR permissions (#2077, merged Jun 17, 2025)
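
Since #2082 documents an OpenAI-compatible API for LitGPT deployments, a minimal client sketch may help illustrate what that enables. This assumes a local `litgpt serve` instance with the OpenAI-compatible endpoint enabled (the docs added in #2082 describe the exact setup); the base URL, port, API key, and model name below are placeholders, not LitGPT defaults:

```python
from openai import OpenAI

# A minimal sketch, assuming `litgpt serve` is running locally with its
# OpenAI-compatible API enabled. The base_url, port, api_key, and model
# name are assumptions for illustration only.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="litgpt",  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize what LitGPT does."}],
)
print(response.choices[0].message.content)
```

Because the endpoint follows the OpenAI wire format, any existing OpenAI-client integration can be pointed at a LitGPT deployment by swapping the base URL.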
5 Pull requests opened by 5 people
- feat(serve.py): add an api_path parameter to the CLI options to allow custom API endpoint configuration (#2080, opened Jun 21, 2025; see the hypothetical usage sketch after this list)
- build(deps): update transformers requirement from <4.52,>=4.51.3 to >=4.51.3,<4.54 (#2084, opened Jul 1, 2025)
- finetune_lora upgrades (#2086, opened Jul 3, 2025)
- Submission (#2089, opened Jul 7, 2025)
- add/debug Lit CI [wip] (#2094, opened Jul 14, 2025)
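
The `api_path` proposal in #2080 would make the serve endpoint path configurable. As a hypothetical illustration only (the flag name, request schema, host, and port below are guesses based on the PR title, not a merged API):

```python
import requests

# Hypothetical usage inferred only from the title of PR #2080: suppose the
# server was started with a custom endpoint path, e.g.
#   litgpt serve <checkpoint_dir> --api_path /v2/generate
# a client would then post to that path instead of the default one.
response = requests.post(
    "http://127.0.0.1:8000/v2/generate",  # host, port, and path are assumptions
    json={"prompt": "What is LitGPT?"},   # request schema is an assumption
    timeout=60,
)
print(response.json())
```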
52 Issues closed by 4 people
- doc: Misleading QKV shape code comments (#2074, closed Jul 9, 2025)
- args bug in setup (#2064, closed Jun 29, 2025)
- converting llama models (#1823, closed Jun 24, 2025)
- Can I train a model on four 7900XT cards? (#1220, closed Jun 24, 2025)
- Is there any support for visual generation? (#1606, closed Jun 24, 2025)
- How to change the dataset path or download URL when evaluating (#1556, closed Jun 24, 2025)
- Load an AWQ model via the Python API (#1849, closed Jun 24, 2025)
- How to resume training from a LoRA checkpoint? (#2021, closed Jun 24, 2025)
- How to run within an NVIDIA container (#1941, closed Jun 24, 2025)
- Convert an HF checkpoint to litgpt format (#1854, closed Jun 24, 2025)
- Setting the pretraining learning rate in the command-line interface (#1905, closed Jun 24, 2025)
- How to merge multiple LoRA weights into one base model (#853, closed Jun 24, 2025)
- Converting a lit-gpt checkpoint to Hugging Face with RoPE scaling (#807, closed Jun 24, 2025)
- Any finetuning has no effect on small models (#627, closed Jun 24, 2025)
- Any equivalent workaround for resize_token_embeddings() in HF llama? (#739, closed Jun 24, 2025)
- Mistral7B pretraining convergence is too slow (#697, closed Jun 24, 2025)
- [question] How to set the max_iters value (#682, closed Jun 24, 2025)
- Can I use LoRA or an adapter to fine-tune on non-instruction data? (#550, closed Jun 24, 2025)
- Processing the dataset (#1549, closed Jun 24, 2025)
- Gradients in the GPT module of the finetuning/lora.py script are always zero (#1229, closed Jun 24, 2025)
- LlamaMoE: the order of softmax and top-k (#1286, closed Jun 24, 2025)
- fabric.print only works on sys.stderr and does not print the inference result (#1384, closed Jun 24, 2025)
- Domain-specific fine-tuning (#1173, closed Jun 24, 2025)
- Compatible with a local 8xH100 machine instead of the cloud? (#1184, closed Jun 24, 2025)
- Half-Quadratic Quantization `HQQ` (#1059, closed Jun 24, 2025)
- Long hang on Llama2-70b startup (#856, closed Jun 24, 2025)
- How to save the LoRA weights and config in the Hugging Face format? (#1990, closed Jun 24, 2025)
- Converting safetensors-format weights from a Llama model with new tokens to litgpt format (#2019, closed Jun 24, 2025)
- Are there any plans to support multimodal and reinforcement learning? (#1869, closed Jun 24, 2025)
- How to make use of the NVIDIA GH200 Grace Hopper Superchip (#1892, closed Jun 24, 2025)
- How to convert and use a Hugging Face checkpoint in litgpt? (#1850, closed Jun 24, 2025; see the conversion sketch after this list)
- OOM when training llama (#1900, closed Jun 24, 2025)
- Error when evaluating a pretrained Qwen 2.5 0.5B model (#1936, closed Jun 24, 2025)
- Falcon3-1B-Base has the `model.safetensors.index.json` file from Falcon3-3B-Base? (#1954, closed Jun 23, 2025)
- LoRA training seems to use the same single record for the validation step (#1951, closed Jun 23, 2025)
- litgpt chat crashes with a UnicodeDecodeError at the first character that differs from English encoding (#1953, closed Jun 23, 2025)
- Slow finetuning on TPUv3-8 (#519, closed Jun 23, 2025)
- NotImplementedError: max_seq_length 264 needs to be >= 857 (#905, closed Jun 23, 2025)
- Issue when training an MoE model (#967, closed Jun 23, 2025)
- Higher memory use with QLoRA (#1112, closed Jun 23, 2025)
- Pretrain-then-finetune on multiple GPUs error (#1613, closed Jun 23, 2025)
- Performing continuous pretraining and then finetuning causes an error (#1430, closed Jun 23, 2025)
- How to use activation checkpointing and parameter offloading on a single GPU? (#594, closed Jun 23, 2025)
- AttributeError: <class 'lit_gpt.utils.NotYetLoadedTensor'> does not have permute (#639, closed Jun 23, 2025)
- Finetuning bug (#645, closed Jun 23, 2025)
- Bump HF transformers version compatibility (#1913, closed Jun 23, 2025)
- Slow download from the Hugging Face Hub (capped at 10.5 MB/s) (#1886, closed Jun 23, 2025)
- Installing litgpt should not downgrade PyTorch (#1825, closed Jun 23, 2025)
- TypeError: TextInputSequence must be str (#1759, closed Jun 23, 2025)
- The pretraining example is not working (#1318, closed Jun 23, 2025)
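
Several of the closed issues above (#1850, #1854, #1990, #2019) concern moving checkpoints between the Hugging Face and litgpt formats. As a rough sketch of the round trip, assuming the `litgpt convert_from_litgpt` command referenced in #1847 and a base model name that matches the checkpoint's architecture:

```python
# Sketch only: the CLI steps and file layout below are inferred from the
# issue titles above, not a verified recipe.
#
#   litgpt download <repo_id>                              # fetch an HF checkpoint (converted on download)
#   litgpt convert_from_litgpt <checkpoint_dir> <out_dir>  # export back to an HF-style state dict
#
# Loading the exported weights back into transformers (cf. #1362):
import torch
from transformers import AutoModelForCausalLM

state_dict = torch.load("out_dir/model.pth")  # output filename is an assumption
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder: must match the checkpoint's architecture
    state_dict=state_dict,
)
```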
2 Issues opened by 2 people
- Secrets exfiltration vulnerability (#2090, opened Jul 7, 2025)
68 Unresolved conversations
Sometimes conversations happen on older items that aren't yet closed. Below is a list of all the Issues and Pull Requests with unresolved conversations; none of them received new comments in this period.
- AssertionError: Rank 2 has different values for step: 49996.0; other ranks: 49991.0 (#2032, last commented Jun 23, 2025)
- "RuntimeError: All the chunks should have been deleted." on a non-Studio machine (#1716, last commented Jun 23, 2025)
- Issue with the Dolly dataloader: `context` key not found! (#1760, last commented Jun 23, 2025)
- Tensor parallelism generates nonsensical outputs (#1663, last commented Jun 23, 2025)
- The attention mask is incorrect when generating with softcapping (#1672, last commented Jun 23, 2025)
- Llama3 finetuning and generation: double begin_of_text, no eot_id (#1682, last commented Jun 23, 2025)
- Error with own dataset on falcon 7b: RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != signed char (#223, last commented Jun 23, 2025)
- Evaluation with triviaqa (#424, last commented Jun 23, 2025)
- LoRA with quantization: the effect of `micro_batch_size` on the memory footprint (#501, last commented Jun 23, 2025)
- OOM with bf16-true and quantization for long context lengths (#477, last commented Jun 23, 2025)
- Data loading bug in `pretrain` on resume over multiple epochs (#1712, last commented Jun 23, 2025)
- Gemma 2B weights seem to have changed (#1665, last commented Jun 23, 2025)
- Chat: doesn't work with `compilation` enabled (#1584, last commented Jun 23, 2025)
- LR scheduler can result in a division by 0 (#1393, last commented Jun 23, 2025; see the guarded schedule sketch after this list)
- Continued pre-training got RuntimeError: Failed processing /tmp/data (#1413, last commented Jun 23, 2025)
- Is it correct to keep using adapter_kv_cache during training in litgpt/adapter.py? (#1287, last commented Jun 23, 2025)
- LoRA model tokenizer configuration fails to load (#1226, last commented Jun 23, 2025)
- Unable to run `finetune/lora.py` with `DDP` (#834, last commented Jun 23, 2025)
- The pretraining example from the README fails in Colab (#1402, last commented Jun 23, 2025)
- Failed to load the finetuned model with `AutoModelForCausalLM.from_pretrained(name, state_dict=state_dict)` (#1362, last commented Jun 23, 2025)
- Multi-GPU training with Slurm times out (#1832, last commented Jun 23, 2025)
- Abnormal output from a Gemma pretrained model after conversion to the Hugging Face format (#1762, last commented Jun 23, 2025)
- Failure converting pretrained litgpt checkpoints to HF format: a reproducible example (#1871, last commented Jun 23, 2025)
- Multiple redundant calls to generate_example() when using multiple GPUs (#1957, last commented Jun 23, 2025)
- finetune_lora on Gemma bug (#2020, last commented Jun 23, 2025)
- Chatting with Mistral generates answers with no spaces (#1822, last commented Jun 23, 2025)
- Make `save_hyperparameters()` robust against different CLI entry points (#1102, last commented Jun 23, 2025)
- Loading a checkpoint before `fabric.setup(model)` gives an abnormal loss when using `fabric.init_module()` (#1868, last commented Jun 24, 2025)
- Support for mini-omni and mini-omni2 pretraining and fine-tuning on a custom dataset (#1809, last commented Jun 24, 2025)
- Exporting LoRA to HF format without merging (#1878, last commented Jun 24, 2025)
- Unexpected behaviour at inference with merged QLoRA weights (#935, last commented Jun 24, 2025)
- I'm doing an image generation experiment, but my script outputs a JSON file; how do I train a Transformer model to generate a pixel representation of an image? (#945, last commented Jun 24, 2025)
- How to pretrain an MoE model? (#872, last commented Jun 24, 2025)
- LIMA multi-turn dialogues not working correctly? (#1504, last commented Jun 24, 2025)
- Supporting a custom 4k context length and converting the model config to a Hugging Face-supported config file (#666, last commented Jun 24, 2025)
- Issues converting lit_model.pth to the Hugging Face format using convert_from_litgpt (#1847, last commented Jun 24, 2025)
- Fine-tuning a chat model with domain-specific data from a custom dataset (#1877, last commented Jun 24, 2025)
- Trouble loading a litgpt-trained model using the transformers library (#1910, last commented Jun 24, 2025)
- Merging weights after finetuning with Adapter (#1921, last commented Jun 24, 2025)
- Question about tie_embeddings (#1727, last commented Jun 24, 2025)
- Generating output from a finetuned model using the LLM.generate() method vs. the `litgpt chat` CLI command (#1937, last commented Jun 24, 2025)
- Pretraining an OLMo model on the SlimPajama dataset (#1837, last commented Jun 24, 2025)
- Getting probability distributions (#1950, last commented Jun 24, 2025)
- finetune (lora) with LitData (#1966, last commented Jun 24, 2025)
- Applying a chat_template after LoRA finetuning (#1986, last commented Jun 24, 2025)
- Why does Alpaca only use mask_prompt=False? (#1987, last commented Jun 24, 2025)
- How to use fabric.clip_gradients in 16-mixed? (#2042, last commented Jun 24, 2025)
- Finetune an LLM for a classification task (#1839, last commented Jun 24, 2025)
- Access hidden layer(s) from a model (#1642, last commented Jun 24, 2025)
- UserWarning: The file size of checkpoints/microsoft/phi-4/lit_model.pth is over 4.2 GB (#1979, last commented Jun 24, 2025)
- LitGPT fine-tuning doesn't use the GPU (#1911, last commented Jun 24, 2025)
- [Tokenizer Question]: Falcon 7B fine-tune with a new language (#225, last commented Jun 24, 2025)
- [Question] Usage of the sep token in prepare_redpajama.py (#706, last commented Jun 24, 2025)
- Full finetuning crashes and model outputs are random tokens (#258, last commented Jun 24, 2025)
- [Question] How to decrease my loss? (#575, last commented Jun 24, 2025)
- TPU Pod training (#1643, last commented Jun 24, 2025)
- Errors when trying to save checkpoints during full fine-tuning (#1764, last commented Jun 24, 2025)
- openai-gpt2 (#1824, last commented Jun 24, 2025)
- Pretrain with Llama3.1-70b (#1828, last commented Jun 24, 2025)
- more checkpoints (#1819, last commented Jun 24, 2025)
- Pretraining multiple jobs (#1831, last commented Jun 24, 2025)
- Performance degradation on multi-node pretraining (#1836, last commented Jun 24, 2025)
- Llama 4 support (#2015, last commented Jul 15, 2025)
- Add LongLoRA for both full and LoRA fine-tuning (#1350, last commented Jun 24, 2025)
- Add Multi-head Latent Attention (DeepSeek-V2) (#1945, last commented Jun 20, 2025)
- Refactoring of multi-head attention and support for KV caching (#2061, last commented Jul 15, 2025)
- build(deps): update bitsandbytes requirement from <0.43,>=0.42 to >=0.42,<0.47 (#2069, last commented Jul 15, 2025)
- Moving to lazy root imports to make config loading snappy (#2073, last commented Jul 9, 2025)
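
Issue #1393 above (LR scheduler division by 0) is representative of the edge cases in hand-rolled learning-rate schedules. A minimal guarded warmup-plus-cosine schedule might look like the following; the function name and signature are illustrative, not litgpt's API:

```python
import math

# A minimal warmup-plus-cosine schedule with guards against the zero
# denominators behind #1393 (warmup_steps == 0, or max_steps == warmup_steps).
def get_lr(step: int, max_lr: float, min_lr: float, warmup_steps: int, max_steps: int) -> float:
    if warmup_steps > 0 and step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps  # linear warmup
    decay_steps = max(1, max_steps - warmup_steps)  # guard: never divide by zero
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At step 0 with warmup the rate ramps linearly from max_lr / warmup_steps; past warmup it decays along a cosine from max_lr to min_lr, and the `max(1, ...)` guard keeps the schedule well defined even when warmup_steps equals max_steps.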