Insights: Lightning-AI/litgpt
Overview
12 Pull requests merged by 6 people
- pin: restrict datasets version to <4.0.0 for compatibility (#2095, merged Jul 15, 2025)
- Complete pending todos in testing (#2088, merged Jul 9, 2025)
- doc: add comments clarifying query / KV groups (#2093, merged Jul 9, 2025)
- doc: add n_query_groups to the attention notation table (#2092, merged Jul 9, 2025)
- [pre-commit.ci] pre-commit suggestions (#2091, merged Jul 7, 2025)
- build(deps): update numpy requirement from <2 to none (#2085, merged Jul 1, 2025)
- docs: Add documentation for the OpenAI-compatible API in LitGPT deployment (#2082, merged Jul 1, 2025; see the client sketch after this list)
- ci: show the longest tests for improvement (#2083, merged Jul 1, 2025)
- update the bug-report issue template with reproduction in Studio (#2081, merged Jun 23, 2025)
- Defer the torch import in config to allow faster imports (#2079, merged Jun 18, 2025)
- limit PR permissions, vol. 2 (#2078, merged Jun 17, 2025)
- limit PR permissions (#2077, merged Jun 17, 2025)
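
Since #2082 documents an OpenAI-compatible API for LitGPT deployments, a minimal client sketch may help illustrate what that enables. This assumes a local `litgpt serve` instance with the OpenAI-compatible endpoint enabled (the docs added in #2082 describe the exact setup); the base URL, port, API key, and model name below are placeholders, not LitGPT defaults:

```python
from openai import OpenAI

# A minimal sketch, assuming `litgpt serve` is running locally with its
# OpenAI-compatible API enabled. The base_url, port, api_key, and model
# name are assumptions for illustration only.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="litgpt",  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize what LitGPT does."}],
)
print(response.choices[0].message.content)
```

Because the endpoint follows the OpenAI wire format, any existing OpenAI-client integration can be pointed at a LitGPT deployment by swapping the base URL.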
5 Pull requests opened by 5 people
- feat(serve.py): add an api_path parameter to the CLI options to allow custom API endpoint configuration (#2080, opened Jun 21, 2025; see the hypothetical usage sketch after this list)
- build(deps): update transformers requirement from <4.52,>=4.51.3 to >=4.51.3,<4.54 (#2084, opened Jul 1, 2025)
- finetune_lora upgrades (#2086, opened Jul 3, 2025)
- Submission (#2089, opened Jul 7, 2025)
- add/debug Lit CI [wip] (#2094, opened Jul 14, 2025)
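
The `api_path` proposal in #2080 would make the serve endpoint path configurable. As a hypothetical illustration only (the flag name, request schema, host, and port below are guesses based on the PR title, not a merged API):

```python
import requests

# Hypothetical usage inferred only from the title of PR #2080: suppose the
# server was started with a custom endpoint path, e.g.
#   litgpt serve <checkpoint_dir> --api_path /v2/generate
# a client would then post to that path instead of the default one.
response = requests.post(
    "http://127.0.0.1:8000/v2/generate",  # host, port, and path are assumptions
    json={"prompt": "What is LitGPT?"},   # request schema is an assumption
    timeout=60,
)
print(response.json())
```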
52 Issues closed by 4 people
- doc: Misleading QKV shape code comments (#2074, closed Jul 9, 2025)
- args bug in setup (#2064, closed Jun 29, 2025)
- converting llama models (#1823, closed Jun 24, 2025)
- Can I train a model on four 7900XT cards? (#1220, closed Jun 24, 2025)
- Is there any support for visual generation? (#1606, closed Jun 24, 2025)
- How to change the dataset path or download URL when evaluating (#1556, closed Jun 24, 2025)
- Load an AWQ model via the Python API (#1849, closed Jun 24, 2025)
- How to resume training from a LoRA checkpoint? (#2021, closed Jun 24, 2025)
- How to run within an NVIDIA container (#1941, closed Jun 24, 2025)
- Convert an HF checkpoint to litgpt format (#1854, closed Jun 24, 2025)
- Setting the pretraining learning rate in the command-line interface (#1905, closed Jun 24, 2025)
- How to merge multiple LoRA weights into one base model (#853, closed Jun 24, 2025)
- Converting a lit-gpt checkpoint to Hugging Face with RoPE scaling (#807, closed Jun 24, 2025)
- Any finetuning has no effect on small models (#627, closed Jun 24, 2025)
- Any equivalent workaround for resize_token_embeddings() in HF llama? (#739, closed Jun 24, 2025)
- Mistral7B pretraining convergence is too slow (#697, closed Jun 24, 2025)
- [question] How to set the max_iters value (#682, closed Jun 24, 2025)
- Can I use LoRA or an adapter to fine-tune on non-instruction data? (#550, closed Jun 24, 2025)
- Processing the dataset (#1549, closed Jun 24, 2025)
- Gradients in the GPT module of the finetuning/lora.py script are always zero (#1229, closed Jun 24, 2025)
- LlamaMoE: the order of softmax and top-k (#1286, closed Jun 24, 2025)
- fabric.print only works on sys.stderr and does not print the inference result (#1384, closed Jun 24, 2025)
- Domain-specific fine-tuning (#1173, closed Jun 24, 2025)
- Compatible with a local 8xH100 machine instead of the cloud? (#1184, closed Jun 24, 2025)
- Half-Quadratic Quantization `HQQ` (#1059, closed Jun 24, 2025)
- Long hang on Llama2-70b startup (#856, closed Jun 24, 2025)
- How to save the LoRA weights and config in the Hugging Face format? (#1990, closed Jun 24, 2025)
- Converting safetensors-format weights from a Llama model with new tokens to litgpt format (#2019, closed Jun 24, 2025)
- Are there any plans to support multimodal and reinforcement learning? (#1869, closed Jun 24, 2025)
- How to make use of the NVIDIA GH200 Grace Hopper Superchip (#1892, closed Jun 24, 2025)
- How to convert and use a Hugging Face checkpoint in litgpt? (#1850, closed Jun 24, 2025; see the conversion sketch after this list)
- OOM when training llama (#1900, closed Jun 24, 2025)
- Error when evaluating a pretrained Qwen 2.5 0.5B model (#1936, closed Jun 24, 2025)
- Falcon3-1B-Base has the `model.safetensors.index.json` file from Falcon3-3B-Base? (#1954, closed Jun 23, 2025)
- LoRA training seems to use the same single record for the validation step (#1951, closed Jun 23, 2025)
- litgpt chat crashes with a UnicodeDecodeError at the first character that differs from English encoding (#1953, closed Jun 23, 2025)
- Slow finetuning on TPUv3-8 (#519, closed Jun 23, 2025)
- NotImplementedError: max_seq_length 264 needs to be >= 857 (#905, closed Jun 23, 2025)
- Issue when training an MoE model (#967, closed Jun 23, 2025)
- Higher memory use with QLoRA (#1112, closed Jun 23, 2025)
- Pretrain-then-finetune on multiple GPUs error (#1613, closed Jun 23, 2025)
- Performing continuous pretraining and then finetuning causes an error (#1430, closed Jun 23, 2025)
- How to use activation checkpointing and parameter offloading on a single GPU? (#594, closed Jun 23, 2025)
- AttributeError: <class 'lit_gpt.utils.NotYetLoadedTensor'> does not have permute (#639, closed Jun 23, 2025)
- Finetuning bug (#645, closed Jun 23, 2025)
- Bump HF transformers version compatibility (#1913, closed Jun 23, 2025)
- Slow download from the Hugging Face Hub (capped at 10.5 MB/s) (#1886, closed Jun 23, 2025)
- Installing litgpt should not downgrade PyTorch (#1825, closed Jun 23, 2025)
- TypeError: TextInputSequence must be str (#1759, closed Jun 23, 2025)
- The pretraining example is not working (#1318, closed Jun 23, 2025)
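
Several of the closed issues above (#1850, #1854, #1990, #2019) concern moving checkpoints between the Hugging Face and litgpt formats. As a rough sketch of the round trip, assuming the `litgpt convert_from_litgpt` command referenced in #1847 and a base model name that matches the checkpoint's architecture:

```python
# Sketch only: the CLI steps and file layout below are inferred from the
# issue titles above, not a verified recipe.
#
#   litgpt download <repo_id>                              # fetch an HF checkpoint (converted on download)
#   litgpt convert_from_litgpt <checkpoint_dir> <out_dir>  # export back to an HF-style state dict
#
# Loading the exported weights back into transformers (cf. #1362):
import torch
from transformers import AutoModelForCausalLM

state_dict = torch.load("out_dir/model.pth")  # output filename is an assumption
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder: must match the checkpoint's architecture
    state_dict=state_dict,
)
```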
2 Issues opened by 2 people
- Secrets exfiltration vulnerability (#2090, opened Jul 7, 2025)
68 Unresolved conversations
Sometimes conversations happen on older items that aren't yet closed. Below is a list of all the Issues and Pull Requests with unresolved conversations; none of them received new comments in this period.
- AssertionError: Rank 2 has different values for step: 49996.0; other ranks: 49991.0 (#2032, last commented Jun 23, 2025)
- "RuntimeError: All the chunks should have been deleted." on a non-Studio machine (#1716, last commented Jun 23, 2025)
- Issue with the Dolly dataloader: `context` key not found! (#1760, last commented Jun 23, 2025)
- Tensor parallelism generates nonsensical outputs (#1663, last commented Jun 23, 2025)
- The attention mask is incorrect when generating with softcapping (#1672, last commented Jun 23, 2025)
- Llama3 finetuning and generation: double begin_of_text, no eot_id (#1682, last commented Jun 23, 2025)
- Error with own dataset on falcon 7b: RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != signed char (#223, last commented Jun 23, 2025)
- Evaluation with triviaqa (#424, last commented Jun 23, 2025)
- LoRA with quantization: the effect of `micro_batch_size` on the memory footprint (#501, last commented Jun 23, 2025)
- OOM with bf16-true and quantization for long context lengths (#477, last commented Jun 23, 2025)
- Data loading bug in `pretrain` on resume over multiple epochs (#1712, last commented Jun 23, 2025)
- Gemma 2B weights seem to have changed (#1665, last commented Jun 23, 2025)
- Chat: doesn't work with `compilation` enabled (#1584, last commented Jun 23, 2025)
- LR scheduler can result in a division by 0 (#1393, last commented Jun 23, 2025; see the guarded schedule sketch after this list)
- Continued pre-training got RuntimeError: Failed processing /tmp/data (#1413, last commented Jun 23, 2025)
- Is it correct to keep using adapter_kv_cache during training in litgpt/adapter.py? (#1287, last commented Jun 23, 2025)
- LoRA model tokenizer configuration fails to load (#1226, last commented Jun 23, 2025)
- Unable to run `finetune/lora.py` with `DDP` (#834, last commented Jun 23, 2025)
- The pretraining example from the README fails in Colab (#1402, last commented Jun 23, 2025)
- Failed to load the finetuned model with `AutoModelForCausalLM.from_pretrained(name, state_dict=state_dict)` (#1362, last commented Jun 23, 2025)
- Multi-GPU training with Slurm times out (#1832, last commented Jun 23, 2025)
- Abnormal output from a Gemma pretrained model after conversion to the Hugging Face format (#1762, last commented Jun 23, 2025)
- Failure converting pretrained litgpt checkpoints to HF format: a reproducible example (#1871, last commented Jun 23, 2025)
- Multiple redundant calls to generate_example() when using multiple GPUs (#1957, last commented Jun 23, 2025)
- finetune_lora on Gemma bug (#2020, last commented Jun 23, 2025)
- Chatting with Mistral generates answers with no spaces (#1822, last commented Jun 23, 2025)
- Make `save_hyperparameters()` robust against different CLI entry points (#1102, last commented Jun 23, 2025)
- Loading a checkpoint before `fabric.setup(model)` gives an abnormal loss when using `fabric.init_module()` (#1868, last commented Jun 24, 2025)
- Support for mini-omni and mini-omni2 pretraining and fine-tuning on a custom dataset (#1809, last commented Jun 24, 2025)
- Exporting LoRA to HF format without merging (#1878, last commented Jun 24, 2025)
- Unexpected behaviour at inference with merged QLoRA weights (#935, last commented Jun 24, 2025)
- I'm doing an image generation experiment, but my script outputs a JSON file; how do I train a Transformer model to generate a pixel representation of an image? (#945, last commented Jun 24, 2025)
- How to pretrain an MoE model? (#872, last commented Jun 24, 2025)
- LIMA multi-turn dialogues not working correctly? (#1504, last commented Jun 24, 2025)
- Supporting a custom 4k context length and converting the model config to a Hugging Face-supported config file (#666, last commented Jun 24, 2025)
- Issues converting lit_model.pth to the Hugging Face format using convert_from_litgpt (#1847, last commented Jun 24, 2025)
- Fine-tuning a chat model with domain-specific data from a custom dataset (#1877, last commented Jun 24, 2025)
- Trouble loading a litgpt-trained model using the transformers library (#1910, last commented Jun 24, 2025)
- Merging weights after finetuning with Adapter (#1921, last commented Jun 24, 2025)
- Question about tie_embeddings (#1727, last commented Jun 24, 2025)
- Generating output from a finetuned model using the LLM.generate() method vs. the `litgpt chat` CLI command (#1937, last commented Jun 24, 2025)
- Pretraining an OLMo model on the SlimPajama dataset (#1837, last commented Jun 24, 2025)
- Getting probability distributions (#1950, last commented Jun 24, 2025)
- finetune (lora) with LitData (#1966, last commented Jun 24, 2025)
- Applying a chat_template after LoRA finetuning (#1986, last commented Jun 24, 2025)
- Why does Alpaca only use mask_prompt=False? (#1987, last commented Jun 24, 2025)
- How to use fabric.clip_gradients in 16-mixed? (#2042, last commented Jun 24, 2025)
- Finetune an LLM for a classification task (#1839, last commented Jun 24, 2025)
- Access hidden layer(s) from a model (#1642, last commented Jun 24, 2025)
- UserWarning: The file size of checkpoints/microsoft/phi-4/lit_model.pth is over 4.2 GB (#1979, last commented Jun 24, 2025)
- LitGPT fine-tuning doesn't use the GPU (#1911, last commented Jun 24, 2025)
- [Tokenizer Question]: Falcon 7B fine-tune with a new language (#225, last commented Jun 24, 2025)
- [Question] Usage of the sep token in prepare_redpajama.py (#706, last commented Jun 24, 2025)
- Full finetuning crashes and model outputs are random tokens (#258, last commented Jun 24, 2025)
- [Question] How to decrease my loss? (#575, last commented Jun 24, 2025)
- TPU Pod training (#1643, last commented Jun 24, 2025)
- Errors when trying to save checkpoints during full fine-tuning (#1764, last commented Jun 24, 2025)
- openai-gpt2 (#1824, last commented Jun 24, 2025)
- Pretrain with Llama3.1-70b (#1828, last commented Jun 24, 2025)
- more checkpoints (#1819, last commented Jun 24, 2025)
- Pretraining multiple jobs (#1831, last commented Jun 24, 2025)
- Performance degradation on multi-node pretraining (#1836, last commented Jun 24, 2025)
- Llama 4 support (#2015, last commented Jul 15, 2025)
- Add LongLoRA for both full and LoRA fine-tuning (#1350, last commented Jun 24, 2025)
- Add Multi-head Latent Attention (DeepSeek-V2) (#1945, last commented Jun 20, 2025)
- Refactoring of multi-head attention and support for KV caching (#2061, last commented Jul 15, 2025)
- build(deps): update bitsandbytes requirement from <0.43,>=0.42 to >=0.42,<0.47 (#2069, last commented Jul 15, 2025)
- Moving to lazy root imports to make config loading snappy (#2073, last commented Jul 9, 2025)
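
Issue #1393 above (LR scheduler division by 0) is representative of the edge cases in hand-rolled learning-rate schedules. A minimal guarded warmup-plus-cosine schedule might look like the following; the function name and signature are illustrative, not litgpt's API:

```python
import math

# A minimal warmup-plus-cosine schedule with guards against the zero
# denominators behind #1393 (warmup_steps == 0, or max_steps == warmup_steps).
def get_lr(step: int, max_lr: float, min_lr: float, warmup_steps: int, max_steps: int) -> float:
    if warmup_steps > 0 and step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps  # linear warmup
    decay_steps = max(1, max_steps - warmup_steps)  # guard: never divide by zero
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At step 0 with warmup the rate ramps linearly from max_lr / warmup_steps; past warmup it decays along a cosine from max_lr to min_lr, and the `max(1, ...)` guard keeps the schedule well defined even when warmup_steps equals max_steps.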