Add support for Quantization-Aware Low-Rank Adaptation (QALoRA) #2571
Conversation
…raLayer and GPTQLoraLinear
Thanks a lot for picking up this old request. Really well done to use the newly added LoRA variant abstraction to implement this.
I checked the PR but haven't done an in-depth review yet. The reason for that is that LoRA variant support has only been added to vanilla LoRA layers (i.e. the layers defined in `lora/layers.py`). The quantized layers, including GPTQ, don't have any code that would take LoRA variants into account. Therefore, as is, the GPTQ layer would still use the normal `forward` call and not `QALoraLinearVariant.forward`. Even worse, the GPTQ layer does not support merging and unmerging, so all of that code in `QALoraLinearVariant` is dead code. So unless I'm missing something, there is still some work required:
1. Update `GPTQLoraLinear.forward` to account for LoRA variants (should be easy).
2. Implement merging and unmerging for `GPTQLoraLinear` (could be difficult, it depends), or scrap it for now.
To avoid 2., QA LoRA could be implemented for another quantization method that already supports merging and unmerging, like bitsandbytes, but even there, LoRA variant support has yet to be added. Also, I'm not sure how specific your code is to GPTQ.
Anyway, it's a really nice PR and I'd be happy to see it merged. LMK what you think.
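For illustration, here is a minimal sketch of what point 1 above (making the quantized layer's forward variant-aware) could look like. The attribute names (`lora_variant`, `quant_linear_module`) and the variant's `forward(module, adapter, x, result)` signature are assumptions for this sketch, not necessarily PEFT's actual internals.

```python
import torch

# Hedged sketch only: assumes the layer stores variants in a dict `self.lora_variant`
# keyed by adapter name, and that a variant exposes forward(module, adapter, x, result).
def forward(self, x: torch.Tensor) -> torch.Tensor:
    result = self.quant_linear_module(x)  # output of the quantized base layer
    if self.disable_adapters:
        return result
    for active_adapter in self.active_adapters:
        if active_adapter not in self.lora_A.keys():
            continue
        if active_adapter in getattr(self, "lora_variant", {}):
            # delegate to the variant, e.g. QALoraLinearVariant
            result = self.lora_variant[active_adapter].forward(self, active_adapter, x, result)
        else:
            lora_A = self.lora_A[active_adapter]
            lora_B = self.lora_B[active_adapter]
            dropout = self.lora_dropout[active_adapter]
            scaling = self.scaling[active_adapter]
            result = result + lora_B(lora_A(dropout(x))) * scaling
    return result
```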
Hi @BenjaminBossan, thank you for your review and the helpful feedback! I've addressed the main points you raised.
Ready for another look when you have a moment!
Thanks a lot for the updates, I did another review, where I focused on the PEFT integration itself (not checking the example or the details of the QALoRA paper). There are still a few issues but they're not big, please take a look.
Once these issues are resolved, I'll review the example. We should also think about adding a test before merging.
src/peft/tuners/lora/config.py
Outdated
default=False,
metadata={
    "help": (
        "Enable <a href='https://huggingface.co/papers/2309.14717'>Quantization-Aware Low-Rank Adaptation (QALoRA)</a>. This technique combines quantization-aware training "
Let's mention that it is only implemented for GPTQ for now. Also, please update the docstring (can use the same text).
src/peft/tuners/lora/config.py
Outdated
        "Enable <a href='https://huggingface.co/papers/2309.14717'>Quantization-Aware Low-Rank Adaptation (QALoRA)</a>. This technique combines quantization-aware training "
        "with LoRA to improve performance for quantized models. This can improve the performance of LoRA, "
        "especially at low ranks. Right now, QALoRA only supports linear layers. QALoRA introduces a bigger "
        "overhead than pure LoRA, so it is recommended to merge weights for inference."
This recommendation is a bit moot as merging is not supported for GPTQ. Let's remove this sentence.
src/peft/tuners/lora/gptq.py
Outdated
if use_qalora:
    from .variants import QALoraLinearVariant

    return QALoraLinearVariant()
if not use_dora:
    return None

from .variants import DoraLinearVariant

return DoraLinearVariant()
Let's change the check a bit for completeness to basically:
if use_dora and use_qalora:
    NotImplementedError
elif use_dora:
    variant = ...
elif use_qalora:
    variant = ...
else:
    variant = None
return variant
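Spelled out as a runnable method (a sketch only, assuming it lives in `src/peft/tuners/lora/gptq.py` next to the existing relative imports, and with a proper `raise` added), the suggestion could look like this:

```python
from typing import Optional

def resolve_lora_variant(self, *, use_dora: bool, use_qalora: bool, **kwargs) -> Optional["LoraVariant"]:
    # covers all four combinations of use_dora/use_qalora explicitly
    if use_dora and use_qalora:
        raise NotImplementedError("Using DoRA and QALoRA on the same layer is not supported.")
    elif use_dora:
        from .variants import DoraLinearVariant
        variant = DoraLinearVariant()
    elif use_qalora:
        from .variants import QALoraLinearVariant
        variant = QALoraLinearVariant()
    else:
        variant = None
    return variant
```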
src/peft/tuners/lora/gptq.py
Outdated
@@ -64,29 +80,33 @@ def forward(self, x: torch.Tensor):
return result

lora_A_keys = self.lora_A.keys()
torch_result_dtype = result.dtype
This is not needed, right?
src/peft/tuners/lora/gptq.py
Outdated
if requires_conversion:
    output = output.to(expected_dtype)
# requires_conversion = not torch.is_autocast_enabled()
remove the comment?
src/peft/tuners/lora/variants.py
Outdated
# Create and store pooling factor for scaling
if not hasattr(module, "qalora_scaling_factor"):
    module.qalora_scaling_factor = {}
Same comment as above regarding `other_param_names`. But do we even need `qalora_scaling_factor`, as it can be calculated on the fly based on `module.in_features` and `qalora_group_size` anyway?
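To make the on-the-fly alternative concrete, here is a self-contained sketch of the pooling plus scaling step; the helper name and the exact scaling convention are assumptions based on the discussion above, not the merged code.

```python
import torch
import torch.nn.functional as F

def qalora_pool(x: torch.Tensor, in_features: int, group_size: int) -> torch.Tensor:
    # Average-pool groups of `group_size` input features (works for 2-D and 3-D inputs
    # whose last dimension is `in_features`), then rescale by in_features / group_size,
    # i.e. the factor that would otherwise be cached as `qalora_scaling_factor`.
    scaling_factor = in_features / group_size
    pooled = F.avg_pool1d(x, kernel_size=group_size)  # last dim becomes in_features // group_size
    return pooled * scaling_factor
```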
src/peft/tuners/lora/variants.py
Outdated
    module.qalora_scaling_factor[adapter_name] = module.in_features / qalora_group_size
else:
    # No special scaling if dimensions don't align
    module.qalora_scaling_factor[adapter_name] = 1.0
Would this not lead to very different results than `qalora_scaling_factor = module.in_features / qalora_group_size`?
I wonder if perhaps it makes more sense to raise an error here and require that `module.in_features % qalora_group_size == 0`? I think this would simplify the code; otherwise, when `module.in_features % qalora_group_size != 0`, we would just get standard LoRA?
src/peft/tuners/lora/variants.py
Outdated
    torch.Tensor: The calculated delta weight.
"""
if (
    not hasattr(module, "qalora_group_size")
Is it even possible to hit this condition? Same question about `not hasattr(module, "qalora_scaling_factor")`.
You are right. We will never reach this code.
…gement in variants
@BenjaminBossan I have integrated all your feedback into the code, except for the tests.
Thanks a lot for all the updates. I still found a couple of issues, please check my comments. I also checked and ran the script, where I also found some areas for improvement.
As for testing, I think it would be enough to have something very similar to this one with QALoRA enabled.
Also, before pushing, don't forget to run `make style` to make the linter happy.
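As a rough idea of the kind of test meant here (a sketch only: the test class, helper methods, and dataset setup are hypothetical placeholders, not the actual fixtures in `tests/test_gpu_examples.py`):

```python
import pytest
from peft import LoraConfig, get_peft_model

@pytest.mark.single_gpu_tests
def test_causal_lm_training_gptq_qalora(self):
    # hypothetical helpers standing in for the existing GPTQ test fixtures
    model = self._get_gptq_model()
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
        use_qalora=True,
        qalora_group_size=32,
    )
    model = get_peft_model(model, config)
    trainer = self._get_trainer(model)  # short training run on a tiny dataset
    trainer.train()
    assert trainer.state.log_history[-1]["train_loss"] is not None
```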
examples/qalora_finetuning/README.md
Outdated
--qalora_group_size 8
```

QALoRA also works with different quantization methods (GPTQ, EETQ, AWQ, etc.):
Currently, that's not true, is it?
Question still open
examples/qalora_finetuning/README.md
Outdated
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## QALoRA vs. LoRA vs. DoRA
I don't think there is a specific reason to compare QALoRA to DoRA, is there? I think the more interesting question for users would be: What's the difference between QALoRA and QLoRA? When should I use which, and what trade-offs are there?
examples/qalora_finetuning/README.md
Outdated
2. The QALoRA adapter weights are then merged with the dequantized model
3. The merged model must be re-quantized if quantization is still desired

This implementation choice was made because **it yielded better performance in practice**, despite being less memory-efficient than the direct modification approach described in the paper.
Out of curiosity: Is this your personal experience or is there a reference for this?
Question still open
elif optimizer_name.lower() == "sgd":
    opt_size = param_size  # SGD with momentum keeps 1 extra state

print(
Again, I think printing this for each layer is a bit of information overkill.
examples/qalora_finetuning/README.md
Outdated
Run the finetuning script with a GPTQ quantized model:
```bash
python examples/qalora_finetuning/qalora_gptq_finetuning.py \
When I run this locally, I get a train loss of 0.0, can you replicate?
Question still open
task_type="CAUSAL_LM", | ||
use_dora=use_dora, | ||
use_qalora=use_qalora, | ||
qalora_group_size=8, # Explicitly set group size for QALoRA |
Would it make sense to expose this argument to the CLI?
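A minimal way to do that, assuming the example script uses argparse (the flag name and default below are illustrative, not the final interface):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--qalora_group_size",
    type=int,
    default=32,
    help="Pooling group size for QALoRA; in_features of the adapted layers must be divisible by it.",
)
args = parser.parse_args()

# later, when building the config:
# lora_config = LoraConfig(..., use_qalora=True, qalora_group_size=args.qalora_group_size)
```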
src/peft/tuners/lora/gptq.py
Outdated
)

def resolve_lora_variant(self, *, use_dora: bool, use_qalora: bool, **kwargs) -> Optional[LoraVariant]:
    if use_dora and use_qalora:
        NotImplementedError
raise NotImplementedError(...)
src/peft/tuners/lora/gptq.py
Outdated
    from .variants import DoraLinearVariant
    variant = DoraLinearVariant()
elif use_qalora:
    if self.in_features % kwargs["qalora_group_size"] == 0:
Let's not perform the check here. Instead, let's move it inside `QALoraLinearVariant.init`.
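For instance, a sketch of moving the check into the variant's init hook (the exact `init` signature of `LoraVariant` is an assumption here, and the error messages mirror the wording suggested later in this review):

```python
class QALoraLinearVariant(LoraVariant):  # LoraVariant assumed to be defined in variants.py
    @staticmethod
    def init(module, adapter_name: str, **kwargs) -> None:
        # validate the QALoRA-specific kwargs once, at adapter creation time
        if "qalora_group_size" not in kwargs:
            raise ValueError("`use_qalora=True` requires 'qalora_group_size' to be provided in kwargs.")
        group_size = kwargs["qalora_group_size"]
        if module.in_features % group_size != 0:
            raise ValueError(
                f"`use_qalora=True` requires `module.in_features` ({module.in_features}) to be "
                f"divisible by 'qalora_group_size' ({group_size})"
            )
        module.qalora_group_size = group_size
```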
Still open
This check can now be removed, right?
…and refactor training tests
- Added detailed error message for unsupported simultaneous use of Dora and QA_LoRA in GPTQLoraLinear.
- Refactored QALoraLinearVariant to streamline pooling and scaling operations, improving clarity and performance.
- Consolidated multiple training test cases for VeRA and RandLora into a more organized structure, ensuring consistency across single and multi-GPU tests.
- Updated training configurations to include new parameters for QALoRA and improved handling of token embeddings.
- Ensured that model checkpoints are correctly validated and that training loss assertions are in place for all tests.
More tests are still needed for comprehensive coverage, and some of the existing comments are a work in progress and will be refined.
Okay, I'll wait until these are finished before reviewing again ;)
…ction, enhancing gradient checks in tests, and improving adapter handling in model training.
@BenjaminBossan you can take another look. I have reworked the initialization again. lora_A is now replaced right at init with a new instance that has the pooled input dimension. Previously, lora_A still had the old shape and the pooling was only applied in the forward pass, as it was originally done in the paper. With the new variant, lora_A already takes care of the pooling at init, so the dimensions, and with them the number of trainable parameters, actually become smaller for QALoRA than for QLoRA. Previously the "trainable params" were identical for QLoRA and QALoRA; now QALoRA is leaner. One interesting observation: the memory consumption is nevertheless very similar to QLoRA, even though there are fewer trainable parameters. That seems to be caused by the additional pooling tensors in the forward pass. Could you confirm whether that is correct and to be expected?
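A rough sketch of the init-time change described above (illustrative names only; assumes `lora_A[adapter_name]` is a plain `nn.Linear`, which may differ from the actual implementation):

```python
import math
import torch.nn as nn

def shrink_lora_A(module, adapter_name: str, group_size: int) -> None:
    # Replace lora_A with a new instance that already expects the pooled input
    # dimension, so the pooling is baked in and fewer parameters are trained.
    old_A = module.lora_A[adapter_name]
    r = old_A.out_features                      # LoRA rank
    pooled_in = module.in_features // group_size
    new_A = nn.Linear(pooled_in, r, bias=False)
    nn.init.kaiming_uniform_(new_A.weight, a=math.sqrt(5))  # standard LoRA-A style init
    module.lora_A[adapter_name] = new_A.to(old_A.weight.device, dtype=old_A.weight.dtype)
```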
Let's switch back to English.
Thank you for the updates, they look good overall. I noticed that some of my comments still seem to be unaddressed, could you please double-check? Also, something seems to have gone wrong when adding the GPU tests; let's mostly revert those changes and only add the new test.
I have reworked the initialization again. lora_A is now replaced right at init with a new instance that has the pooled input dimension. Previously, lora_A still had the old shape and the pooling was only applied in the forward pass, as it was originally done in the paper.
With the new variant, lora_A already takes care of the pooling at init, so the dimensions, and with them the number of trainable parameters, actually become smaller for QALoRA than for QLoRA. Previously the "trainable params" were identical for QLoRA and QALoRA; now QALoRA is leaner.
Nice optimization.
One interesting observation: the memory consumption is nevertheless very similar to QLoRA, even though there are fewer trainable parameters. That seems to be caused by the additional pooling tensors in the forward pass. Could you confirm whether that is correct and to be expected?
Hmm, this is very hard to say without further details. How did you test this and how large was the difference? In general, for big base models, the majority of memory will be consumed by the base weights and the amount of memory used by LoRA is relatively small. Quantization helps of course but the general pattern still holds. Therefore, if LoRA is made a little bit more memory efficient, the total effect may still be negligible. And as you mentioned, depending on the dataset (mainly the sequence length), activations/hidden states can take up a large portion of memory and those are also mostly unaffected by the LoRA parameter count.
examples/qalora_finetuning/README.md
Outdated
Run the finetuning script with a GPTQ quantized model:
```bash
python examples/qalora_finetuning/qalora_gptq_finetuning.py \
Question still open
examples/qalora_finetuning/README.md
Outdated
--qalora_group_size 8
```

QALoRA also works with different quantization methods (GPTQ, EETQ, AWQ, etc.):
Question still open
examples/qalora_finetuning/README.md
Outdated
2. The QALoRA adapter weights are then merged with the dequantized model
3. The merged model must be re-quantized if quantization is still desired

This implementation choice was made because **it yielded better performance in practice**, despite being less memory-efficient than the direct modification approach described in the paper.
Question still open
src/peft/tuners/lora/config.py
Outdated
default=False,
metadata={
    "help": (
        "It is only implemented in GPTQ for now. Enable <a href='https://huggingface.co/papers/2309.14717'>Quantization-Aware Low-Rank Adaptation (QALoRA)</a>. This technique combines quantization-aware training "
Let's ensure that each line is 120 chars max.
src/peft/tuners/lora/gptq.py
Outdated
    from .variants import DoraLinearVariant
    variant = DoraLinearVariant()
elif use_qalora:
    if self.in_features % kwargs["qalora_group_size"] == 0:
Still open
tests/test_gpu_examples.py
Outdated
# assert loss is not None
assert trainer.state.log_history[-1]["train_loss"] is not None
# Add to the PeftGPTQGPUTests class:
Remove
tests/test_gpu_examples.py
Outdated
@@ -1168,201 +1168,32 @@ def test_initialize_dora_with_bnb_on_cpu(self, kbit):
weights_not_cpu = [name for name, p in peft_model.named_parameters() if p.device != torch.device("cpu")]
assert not weights_not_cpu

@pytest.mark.single_gpu_tests
def test_causal_lm_training_vera(self):
Hmm, it looks like a bunch of unrelated tests were moved around. I don't think that should be necessary and it also makes reviewing a lot harder. Could you please ensure that the existing tests remain in place and only new tests for GPTQ-QALoRA are added?
…ion and structure for better readability and maintainability.
@gapsong Please ping me once the PR is ready for the next review.
…y breaking long lines, update dataset loading to use provided data path, and remove unused arguments.
…cessary parameters for clarity and conciseness.
…ter, update dataset loading, and remove unused parameters for improved clarity and functionality.
@BenjaminBossan I have adjusted the code except for the …
Thanks for the updates. I tried the example script again for 5 steps, using
For, mostly for these reasons:
…2 in README and add validation for divisibility by group size in QALoraLinearVariant.
@BenjaminBossan I’m done and I found the “bug”: I also adjusted the resolve_lora_variant function: I hope it works out this time :)
Thanks for the updates, not much is missing for the PR.
The qalora_group_size was set to 8, which was too small. I’ve changed the default value to 32.
Good find. Do you know why that resulted in a loss of 0? I wonder if there is some kind of check (say, some hidden states not containing nan) that could be performed to prevent this in the future. Depending on how train logging is set up, users may otherwise waste hours of training time.
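One possible safeguard along these lines, sketched as a `transformers` Trainer callback for the example script (a hypothetical helper, not part of PEFT or this PR):

```python
import math
from transformers import TrainerCallback

class LossSanityCallback(TrainerCallback):
    # Aborts early if the logged loss is exactly 0.0 or NaN, so users don't
    # waste hours of training on a silently broken configuration.
    def on_log(self, args, state, control, logs=None, **kwargs):
        loss = (logs or {}).get("loss")
        if loss is not None and (loss == 0.0 or math.isnan(loss)):
            raise RuntimeError(
                f"Suspicious training loss ({loss}) at step {state.global_step}; "
                "check the quantization/QALoRA settings (e.g. qalora_group_size)."
            )
```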
--learning_rate 3e-4 \
--cutoff_len 512 \
--use_qalora \
--qalora_group_size 32 \
Thanks for updating to a better default. Do you think that `qalora_group_size=16` is still a good choice for the default value in `LoraConfig`?
src/peft/tuners/lora/gptq.py
Outdated
    from .variants import DoraLinearVariant
    variant = DoraLinearVariant()
elif use_qalora:
    if self.in_features % kwargs["qalora_group_size"] == 0:
This check can now be removed, right?
src/peft/tuners/lora/variants.py
Outdated
"""
if "qalora_group_size" not in kwargs:
    raise ValueError(
        "QALoraLinearVariant.init expects 'qalora_group_size' to be provided in kwargs."
Suggested change:
- "QALoraLinearVariant.init expects 'qalora_group_size' to be provided in kwargs."
+ "`use_qalora=True` requires 'qalora_group_size' to be provided in kwargs."
src/peft/tuners/lora/variants.py
Outdated
if module.in_features is not None and module.in_features % kwargs["qalora_group_size"] != 0:
    raise ValueError(
        f"QALoraLinearVariant.init expects module.in_features ({module.in_features}) to be divisible by 'qalora_group_size' ({kwargs['qalora_group_size']})"
f"QALoraLinearVariant.init expects module.in_features ({module.in_features}) to be divisible by 'qalora_group_size' ({kwargs['qalora_group_size']})" | |
f"`use_qalora=True` requires `module.in_features` ({module.in_features}) to be divisible by 'qalora_group_size' ({kwargs['qalora_group_size']})" |
…cy; streamline argument handling in training script.
@BenjaminBossan I included your suggestions!
@BenjaminBossan something seems off with the GPU memory when using QALoRA. I am investigating the problem at the moment.
…ess, improve pooling logic, and enhance LoRA computation for clarity and efficiency. Memory peaks were really high
@BenjaminBossan The GPU problem was resolved. I removed …
Thanks for the latest update. I just have a small comment, otherwise the PR LGTM.
I also ran the example locally with and without QALoRA and the difference in memory usage was negligible, which I think is what we hoped to see.
tests/test_custom_models.py
Outdated
@@ -2820,6 +2820,51 @@ def test_requires_grad_lora_different_targets(self):
"base_model.model.lin1.lora_B.adapter1.weight",
)

def test_requires_grad_qalora_same_targets(self):
Hmm, for some reason I hadn't noticed this test before. IMO it is not necessary and can be removed. This is because QALoRA does not change the handling of multiple adapters, so there is not really a reason to believe there could be anything wrong there. We also don't check other quantization methods here. Unless you had a specific reason to add this test, I'd suggest just removing it.
…proved clarity and maintainability.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I did a final pass and discovered two small things that need fixing, otherwise LGTM. Thanks for your patience.
src/peft/tuners/lora/variants.py
Outdated
if module.in_features is not None and module.in_features % kwargs["qalora_group_size"] != 0:
    raise ValueError(
        f"`use_qalora=True` requires `module.in_features` ({module.in_features}) to be divisible by 'qalora_group_size' ({kwargs['qalora_group_size']})"
Could you please add line breaks to this long line?
src/peft/tuners/lora/variants.py
Outdated
@@ -130,6 +130,101 @@ def forward(module: Linear, active_adapter: str, x: torch.Tensor, result: torch.
return result


class QALoraLinearVariant(LoraVariant):
Please move this class to the bottom of the file, as it currently sits between different DoRA classes.
@BenjaminBossan what do I have to do here to resolve the errors, or is it something out of my scope?
You mean the macOS errors? Yeah, they're unrelated and should be fixed with the next transformers release.
@BenjaminBossan shall we merge it then?
Did you see my last comment?
I had missed it. It should be ready now @BenjaminBossan
Thanks for contributing QALoRA support with GPTQ to PEFT. The PR LGTM, nicely done using LoRA variants for this. Failing tests are unrelated.
The function signature was missing **kwargs, which results in a failure after merging #2571.
This pull request introduces QALoRA (Quantization-Aware Low-Rank Adaptation), a new fine-tuning technique for quantized large language models, along with its implementation in the PEFT library. The changes include updates to documentation, configuration, and core logic to support QALoRA's memory-efficient and performance-preserving features.

**Documentation Updates**
- `examples/qalora_finetuning/README.md`: Added detailed documentation for QALoRA, including its introduction, implementation details, usage examples, command-line instructions, and comparison with other techniques like LoRA and DoRA.

**Configuration Enhancements**
- `src/peft/tuners/lora/config.py`: Introduced two new configuration parameters: `use_qalora` to enable QALoRA and `qalora_group_size` to control the pooling group size for memory-performance tradeoffs.

**Core Logic for QALoRA**
- `src/peft/tuners/lora/gptq.py`: Updated the GPTQ LoRA implementation to support QALoRA, including logic for resolving QALoRA variants and passing group size parameters.
- `src/peft/tuners/lora/layer.py`: Enhanced the layer update logic to initialize QALoRA-specific parameters and handle adapter-specific configurations.
- `src/peft/tuners/lora/model.py`: Incorporated QALoRA-specific parameters into the model creation and replacement process.

**QALoRA Variant Implementation**
- `src/peft/tuners/lora/variants.py`: Added the `QALoraLinearVariant` class, implementing QALoRA-specific logic for initialization, delta weight computation, merging, unmerging, and forward propagation. This includes pooling input features and scaling them for efficient adaptation.