Official implementation of ACIP (Adaptive Compression by Iterative Pruning). Just give it a try with only 3 lines of code:
```python
from transformers import AutoModel
model = AutoModel.from_pretrained("MerantixMomentum/ACIP-llama2-7b", trust_remote_code=True)
model.prune_model_by_score(size_ratio=0.5).compress()
```
See our project website for a quick overview of the ACIP algorithm, or dive into the full details with our paper:
Choose Your Model Size: Any Compression by a Single Gradient Descent
Martin Genzel*, Patrick Putzky*, Pengfei Zhao*, Sebastian Schulze, Mattes Mollenhauer, Robert Seidel, Stefan Dietzel, Thomas Wollmann (* equal contribution)
This work was developed at Merantix Momentum. If you are using it, please cite it.
The easiest way to get started with ACIP is to download a ready-to-use model from our Merantix Momentum 🤗 Hub.
For this, you don't have to clone this repo and only minimal dependencies are required (`torch`, `transformers`, `peft`, and optionally `bitsandbytes` in case you want to quantize your model).
See acip/core/requirements.txt for pip-installable dependencies.
Just select any ACIP model and load it via `from_pretrained`, like this one:
```python
from transformers import AutoModel
model = AutoModel.from_pretrained("MerantixMomentum/ACIP-llama2-7b", trust_remote_code=True)
```
This will download and create a fully parameterized ACIP model that can be pruned to any compression rate you wish. For example,
```python
model.prune_model_by_score(size_ratio=0.4)
```
will prune `model` to 40% of its original size, measured in number of parameters, i.e., a 60% compression rate.
A unique feature of ACIP is that this operation is reversible: you can rerun `model.prune_model_by_score` as often as you like to evaluate your model at different sizes. Finally, you can "commit" to a certain ratio and run
```python
model.compress()
```
which discards all pruned mask values of the compressible linear layers. Now the model is actually compressed, and you should observe a significant decrease in memory usage (this step is not reversible without reloading the ACIP model). If you like, you can also run
```python
model.quantize()
```
to save even more memory (we have only tested 4-bit quantization with `bitsandbytes`, but you could also customize this).
🎉 That's it! You can now use your compressed model for inference or fine-tuning like any other causal language model from 🤗 transformers.
ℹ️ The parameter `size_ratio` ranges from 1.0 to 0.0 and indicates the model size after compression. For example, 0.4 means that the model has only 40% of the original number of parameters, and 1.0 means no compression at all. Alternatively, you can also set `compression_rate` in `prune_model_by_score`, which is equivalent to `size_ratio = 1.0 - compression_rate`.
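Putting the pieces together, a typical interactive session might look like the following sketch. It only uses the calls described above; the tokenizer repo and the `generate` call are assumptions on our side, so adapt them to your base model if needed.

```python
from transformers import AutoModel, AutoTokenizer

# Load the fully parameterized ACIP model (custom modeling code from the Hub).
model = AutoModel.from_pretrained("MerantixMomentum/ACIP-llama2-7b", trust_remote_code=True)

# Pruning by score is reversible, so you can probe several size ratios.
for ratio in (0.8, 0.6, 0.4):
    model.prune_model_by_score(size_ratio=ratio)
    # ... evaluate the model at this size ...

# Equivalent to size_ratio=0.4, since size_ratio = 1.0 - compression_rate.
model.prune_model_by_score(compression_rate=0.6)

# Commit: discard pruned mask values (not reversible) and optionally quantize.
model.compress()
model.quantize()  # optional, requires bitsandbytes

# Use it like any other causal LM. Assumption: the Hub repo also ships a tokenizer;
# otherwise, load the tokenizer of the original base model instead.
tokenizer = AutoTokenizer.from_pretrained("MerantixMomentum/ACIP-llama2-7b")
inputs = tokenizer("Model compression is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```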
To run the ACIP code to compress or fine-tune your own model, please clone this repo:
```bash
git clone https://github.com/MerantixMomentum/acip.git
```
To install all dependencies, we recommend using uv with Python 3.11 as the base interpreter (Python 3.12 should work as well). Once uv is set up, you can just run
```bash
uv sync
```
to install the requirements as well as the `acip` package (see `pyproject.toml` for details).
If you want to use a different package manager like Conda, you can also simply install all pinned dependencies from the provided requirements.txt.
⚙️ Custom environment variables are managed via dot-env. Before using the repo, please create a `.env` file from `.env.example` and fill in the required values.
To try out ACIP on your own model, you can run the acip_compress entrypoint with
```bash
python -m acip.entrypoints.acip_compress model.base_model_name_or_path=<HF Repo or Local Path> model.identifier=<model name>
```
Here, `base_model_name_or_path` is passed to `PreTrainedModel.from_pretrained` to load the base model, and `identifier` specifies the run id and output directory name. You may omit `identifier`, which will set it to `"default"`.
ACIP will now run for a while and produce a prunable version of your base model, which is finally saved to an output directory (by default, this is `<Project Root>/artifacts/runs/default/compress_<model.identifier>/model`).
Next, you can revisit Quick Start and load your ACIP model from disk via `from_pretrained`; just replace the 🤗 repo name with the local model output directory.
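For example, if you kept the default identifier, the load call might look like this (the path below is the default output location mentioned above; adjust it to your `model.identifier` and run id):

```python
from transformers import AutoModel

# Default output directory of the acip_compress run above (adjust to your setup).
model_path = "artifacts/runs/default/compress_default/model"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
model.prune_model_by_score(size_ratio=0.5)
```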
That's it! You can now work with your ACIP model as with the ones from our Hub.
There are of course many more options and tweaks we skipped here for simplicity. Please find more details on available ACIP Entrypoints and the underlying Code Design below.
To make the experiments from our paper as reproducible as possible, we have compiled all necessary Python run-commands in scripts/experiments_paper.sh. The corresponding Hydra configs of our experiments can be found in config/experiment/paper. Note that the fine-tuning runs and some ablations require a ready-to-use ACIP model as input, so you first need to perform the corresponding ACIP compression run (or load the model from our 🤗 Hub).
All entrypoints of the ACIP project are based on Hydra config management. We currently provide the following entrypoints:
- acip_compress: Runs the ACIP algorithm on a given base model to produce a prunable ACIP model.
- acip_finetune: Fine-tunes a given ACIP model with LoRA.
- acip_eval: Loads and evaluates a given ACIP model.
The basic CLI syntax to run these entrypoints is as follows:
```bash
python -m acip.entrypoints.<Entrypoint Name> <Hydra Config Args>
```
You have already seen a typical example above. Below, we outline what options you have for the Hydra Config Args in general and for each of the above entrypoints. For a detailed discussion of Hydra's basic override syntax, please see their docs.
The above-mentioned entrypoints all share the same base class, `ACIPEntrypoint`, which is based on our MxM Scaffold package. So all entrypoints basically run the same code, but with different configurations, which are determined by the accompanying (structured) config class `ACIPEntrypointConf`.
Technically, `ACIPEntrypointConf` is just a dataclass-like container that aggregates all sub-configs required for the run. Please see below for more details on the individual sub-configs and global overrides, which can be tweaked via the `<Hydra Config Args>`.
βΉοΈοΈ All config arguments described below have sensible defaults, so that all overrides are fully optional. Moreover, we only focus on the most relevant arguments in this documentation. For even more information and docs, please use the links to navigate to the actual (sub-)config files.
ℹ️ To explore and debug your entrypoint config, use `run.dry_run=true`, which will compile and print the full config of your experiment without running it.
🛠️ `run`
Basic information & config of the run. Important options are:
- `run.id`: Descriptive identifier for the run. Also determines the name of the output directory.
- `run.group`: Group identifier for the run. By default, runs are grouped by their `model.identifier`, `data.identifier`, and `run.series`.
- `run.series`: Series identifier for the run, typically the name of an entire experiment series.
- `run.path`: The output directory for the run. Defaults to `<paths.run_dir>/<run.id>`.
- `run.tags_custom`: List of additional tags for the run, which will also be used as W&B tags if applicable.
- `run.dry_run`: If `true`, the entrypoint will not run the actual experiment but instead print the full config.
- `run.save`: List of artifact types to save. Available options: `config`, `results`, `models`.
🛠️ `data`
Configures the dataset (factories) for the entrypoint. Important options are:
- Currently available datasets: `data=c4` (default) and `data=wikitext2`.
- `data.identifier`: Descriptive identifier for the dataset.
- `data.train_dataset_factory.shuffle`: Whether to shuffle the train dataset or not. Similar options exist for `val_dataset_factory` and `test_dataset_factory`.
- `data.train_dataset_factory.seed`: Shuffle seed for the train dataset. By default, it is set to `training.seed`.
🛠️ `model`
Configures the model factory and tokenizer factory for the entrypoint. The resulting `ACIPModelFactory` is used to instantiate or load an ACIP model.
Important options are:
- `model.identifier`: Descriptive identifier for the base model.
- `model.base_model_name_or_path`: Hugging Face repo or local path pointing to the base model to be loaded and compressed by ACIP.
- `model.ctx_length`: Context length to use for perplexity evaluation (see here).
❗ `model.base_model_name_or_path` is a required parameter to specify a base model. Instead of setting it manually, you can define or choose a base model config here and inject it by an override, e.g., `model/base@model=llama1_7b`.
Details on sub-configs:
- `model.model_factory`: Configures an `ACIPModelFactory`. Its key parameters are managed by the acip sub-config of each entrypoint.
  - The precise config of the ACIP model is defined in config/model/model_factory/acip_config, aggregating sub-configs for the base model, parametrization, adapters, and optional quantization (see also `ParametrizedModelConfig`). You can expand and modify them to your needs, but as for `model_factory`, the most important parameters are managed by the acip sub-config of each entrypoint. Also note that you can ignore these sub-configs if you load an ACIP model from disk or a repo.
- `model.tokenizer_factory`: Configures a `TokenizerFactory`. By default, we use the pre-trained tokenizer associated with the base model, but you can also use a custom one like llama.yaml and inject it by an override `model/tokenizer_factory@model=llama`.
🛠️ `training`
Configures the training-related parts of the ACIP algorithm and optional fine-tuning, based on PyTorch Lightning. Important options are:
- `training.seed`: Global training seed used for PL's seed_everything and the dataset factories.
- `training.batch_size`: Batch size for the train, val, and test dataloaders.
- `training.log_every_n_train_steps`: Logging frequency of model monitoring while training. Set to `null` to disable.
- `training.data_module`: Keyword arguments for the BaseDataModule.
- `training.trainer`: Keyword arguments for the PL Trainer. Here, you can specify important training parameters (devices, train steps, precision, etc.).
Details on sub-configs:
- `training.objective`: Configures the `Objective` to be optimized by `BaseLitModule`. This sub-config is highly entrypoint-specific and selected by the (top-level) configs.
- `training.optimizer_factory`: Configures the optimizer (factory) used by `BaseLitModule`. As for `training.objective`, this sub-config is highly entrypoint-specific and selected by the (top-level) configs.
- `training.callbacks`: Following PL's best practices, we make use of several callbacks to flexibly extend the training process with additional functionality. `training.callbacks` compiles a dictionary of all callbacks that will be passed to the PL Trainer. The injection of callbacks is managed by the (top-level) entrypoint configs and is organized in three different sub-configs:
  - `training.acip`: Schedules the ACIP algorithm and score map updates, see also here. Note: only used by the acip_compress entrypoint; all key parameters are conveniently managed by the acip sub-config.
  - `training.monitoring`: Configures one or more callbacks that monitor the training process and important model characteristics (e.g., size ratio and gradient norms) with frequency `training.log_every_n_train_steps`.
  - `training.benchmarking`: Configures one or more callbacks that benchmark the (ACIP) model at the beginning and end of training. Conceptually, these callbacks are similar to `training.monitoring` but can involve a more extensive evaluation that is not practical during training.
🛠️ `eval`
Helper sub-config that configures a dictionary collection of `ModelEvaluator` instances that can be used to evaluate an (ACIP) model at any point in training. See config/eval/evaluator and the corresponding classes for details about the individual evaluators.
The configured evaluators are primarily used by the monitoring and benchmarking callbacks.
🛠️ `wandb`
Specifies the config for W&B logging. Important options are:
- `wandb.name`: Display name of the W&B run, which is also used to generate a unique but human-readable W&B run id. Defaults to `<run.id>`.
- `wandb.dir`: Local output directory for W&B logs. Defaults to `/tmp/wandb`.
- By default, `wandb.base_url`, `wandb.entity`, and `wandb.project` are set by the environment variables `WANDB_BASE_URL`, `WANDB_ENTITY`, and `WANDB_PROJECT`, respectively (see dot-env).
- You can fully disable W&B logging by setting `wandb=null`.
🛠️ `paths`
Specifies the project root path and where to store artifacts (run outputs, models, datasets, cache, etc.). Important options are:
- `paths.artifact_dir`: Parent directory for all artifacts. Defaults to `<paths.root_dir>/artifacts`.
- `paths.run_dir`: Parent directory for all run outputs. Defaults to `<paths.artifact_dir>/runs`.
- `paths.data_dir`: Parent directory for all local datasets. Defaults to `<paths.artifact_dir>/data`.
- `paths.cache_dir`: Parent directory for cache, in particular for Hugging Face (`HF_HOME` is set via dot-env). Defaults to `<paths.artifact_dir>/cache`.
🛠️ `acip`
This sub-config configures all ACIP-related parameters of a run. It is highly entrypoint-specific and managed by the (top-level) entrypoint configs. Please find more details on available tweaks and options of the individual entrypoints in the sections below.
🛠️ `experiment`
While the ACIP entrypoint configs set sensible defaults, they can be easily overwritten or modified by an experiment config to design a custom run. Each of these configs operates on the top-level (global) entrypoint config and can therefore override any parameter or sub-config. Here is a typical example that runs an experiment from our paper:
```bash
python -m acip.entrypoints.acip_compress experiment=paper/llama1_7b/compress
```
Please see config/experiment/paper for many more examples of experiments. You may also provide a list of experiment configs to apply multiple overrides.
🛠️ `options`
Technically, this override follows the same syntax as experiment, and the two can be combined with each other. The purpose of "options", however, is to tweak specific parts of the entrypoint config, especially for debugging purposes. You can add your own config to config/options or choose one or more from the following list:
- `no_benchmarking`: Disable the benchmarking sub-config.
- `no_monitoring`: Disable the monitoring sub-config.
- `no_output`: Disable any output to files, no saving, and no W&B logging (stdout/stderr is still printed).
- `verbose`: Set Hydra's job logging level to `DEBUG`.
This is the central entrypoint to run the ACIP algorithm from our paper; see there for more conceptual details. For this particular entrypoint, the following options of the acip sub-config are important:
- `acip.stop_ratio`: Size ratio at which to stop ACIP.
- `acip.post_tune_steps`: How many steps to continue optimizing the adapters after ACIP has stopped.
- `acip.lr`: Global learning rate for AdamW to tune the mask parameters and adapters.
- `acip.test_ratios`: Size ratios at which to benchmark the final ACIP model. These results will also be saved to the output directory.
- `acip.quantize_weights`: If true, the U and V weights of the SVD parametrization will be quantized according to a quantization config.
- The ACIP regularization parameter scheduler can be controlled through `reg_scheduler_start_weight`, `reg_scheduler_update_every`, and `reg_scheduler_update_factor`.
- `acip.save.path`: Where to save the final ACIP model. Defaults to `<run.path>/model`.
Relevant sub-config overrides:
- (Required) `model/base@model=...`: Specify a pre-configured base model to be compressed. Alternatively, you can override `model.base_model_name_or_path` like in our introductory example.
- `training/monitoring@training=...`: Specify one or more monitoring callbacks.
- `training/benchmarking@training=...`: Specify one or more benchmarking callbacks.
- `storage@acip=...`: If `acip_compress_compact`, only the mask parameters and adapters will be saved. This saves a lot of disk space, but requires fully parametrizing the initial ACIP model again when loading it. In that case, you have to set `acip.load.init_empty_weights=false` in acip_finetune and acip_eval.
- `data=...`: Specify a pre-configured dataset.
You can find more examples of overrides in our paper experiments.
This is a complementary entrypoint that allows you to fine-tune an ACIP model obtained from acip_compress. Note that fine-tuning only concerns the adapters (LoRA parameters), not the mask parameters, which remain frozen. For this particular entrypoint, the following options of the acip sub-config are important:
- `acip.finetune_steps`: How many steps to fine-tune the adapters.
- `acip.prune_to_ratio`: Size ratio to which the loaded ACIP model is to be pruned (and compressed). This operation is not reversible, and you will obtain a fine-tuned ACIP model at this particular size ratio.
- `acip.lr`: Global learning rate for AdamW to tune the adapters.
- `acip.quantize_weights`: If true, the U and V weights of the SVD parametrization will be quantized according to a quantization config.
- `acip.load.model_name_or_path`: Path to the ACIP model to be fine-tuned. Could also be an ACIP model from our 🤗 Hub.
- `acip.save.path`: Where to save the final ACIP model. Defaults to `<run.path>/model`. By default, only the mask parameters and adapters are stored to save disk space; see also the storage override below.
❗ When loading a fine-tuned ACIP model from disk, you need to set `acip.load.init_empty_weights=false`.
Relevant sub-config overrides:
- (Required) `model/base@model=...`: Specify a pre-configured base model to be compressed. Note that the base model weights are implicitly loaded with the ACIP model, but this sub-config is still required to configure a suitable tokenizer (factory).
- `training/objective@training=...`: Specify a custom `Objective` implementation.
- `training/optimizer@training=...`: Specify a custom `OptimizerFactory` implementation.
- `training/monitoring@training=...`: Specify one or more monitoring callbacks.
- `training/benchmarking@training=...`: Specify one or more benchmarking callbacks.
- `storage@acip=...`: If `acip_compress_full`, the full fine-tuned ACIP model is saved, which allows you to quickly load it from disk.
- `data=...`: Specify a pre-configured dataset.
This entrypoint is similar to acip_finetune, but it only evaluates a ready-to-use ACIP model without any fine-tuning. The specific evaluation routine is controlled by the `training/benchmarking` sub-config, see below.
For this particular entrypoint, the following options of the acip sub-config are important:
- `acip.prune_to_ratio`: Size ratio to which the loaded ACIP model is to be pruned. If `null` (default), the model is loaded as is.
- `acip.test_ratios`: Size ratios at which to evaluate the ACIP model. These results will also be saved to the output directory. If `null`, the model is only evaluated at `acip.prune_to_ratio`.
- `acip.compress_and_unparametrize`: Whether to actually compress the model (cannot be reverted). Activating this flag only makes sense if `acip.prune_to_ratio` is not `null` and `acip.test_ratios=null`.
- `acip.quantize_weights`: If true, the U and V weights of the SVD parametrization will be quantized according to a quantization config.
- `acip.load.model_name_or_path`: Path to the ACIP model to be evaluated. Could also be an ACIP model from our 🤗 Hub.
❗ If the ACIP model was saved in compact format, i.e., only mask parameters and adapters were saved, you need to set `acip.load.init_empty_weights=false`.
Relevant sub-config overrides:
- (Required) `model/base@model=...`: Specify a pre-configured base model to be compressed. Note that the base model weights are implicitly loaded with the ACIP model, but this sub-config is still required to configure a suitable tokenizer (factory) used for evaluation.
- `training/benchmarking@training=...`: Specify one or more benchmarking callbacks that will evaluate the loaded ACIP model.
```
├── acip                 # ACIP package source code
│   ├── core             # Self-contained core package defining ACIPModel and required components
│   ├── data             # Factories to create ready-to-use datasets
│   ├── entrypoints      # ACIP entrypoints
│   ├── eval             # Model evaluators for monitoring and benchmarking
│   ├── model            # Factories for ACIP models and corresponding tokenizers
│   ├── training         # Training-related code (Lightning modules, optimizers, callbacks, etc.)
│   └── utils            # Utility functions
├── artifacts            # Parent directory for all artifacts
│   ├── cache            # Cache directory, in particular for 🤗 artifacts
│   ├── data             # Local datasets
│   └── runs             # Parent directory for all run outputs
├── config               # Hydra configs (subdirectories mirror the "acip" package structure)
├── scripts              # Utility scripts
└── test                 # Unit tests
```
`acip.core` is designed as a fully self-contained package:
- The central object of `acip.core` is the `ACIPModel`, which implements the central pruning functionality via `ACIPModel.prune_model_by_score` and is the outcome of the ACIP algorithm.
- Its base class is `ParametrizedModel`, which manages the underlying parametrization of an ACIP model. Moreover, it allows you to equip the model with 🤗-PEFT adapters and perform quantization if needed.
- The parametrization mechanism itself is based on `Parametrization`, which enables in-place modification of existing (linear) model layers. `SVDLinearParametrization` is a child class that implements the SVD decomposition used in ACIP (see the conceptual sketch after this list).
- Both `ACIPModel` and `ParametrizedModel` are implemented as a `PreTrainedModel`, wrapping the base model to be parametrized and compressed. Each class is accompanied by a custom `PretrainedConfig`, which fully configures the model; see `ParametrizedModelConfig` and acip_config for more details. We followed 🤗's custom model guide to make our ACIP models fully compatible with their API (`from_pretrained`, `save_pretrained`, `push_to_hub`, etc.). In particular, an ACIP model should behave exactly like the underlying base model and inherit its I/O interface. The (parametrized) base model can be accessed via `ACIPModel.model`.
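To make the parametrization idea more tangible, here is a conceptual sketch of an SVD-parametrized linear layer with a prunable mask. It is illustrative only: class names, shapes, and implementation details do not reflect the actual `SVDLinearParametrization`.

```python
import torch
import torch.nn as nn

class SketchSVDLinear(nn.Module):
    """Toy stand-in for an SVD parametrization: W = U @ diag(mask * s) @ Vt."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        U, s, Vt = torch.linalg.svd(linear.weight.detach(), full_matrices=False)
        self.U, self.s, self.Vt = nn.Parameter(U), nn.Parameter(s), nn.Parameter(Vt)
        self.mask = nn.Parameter(torch.ones_like(s))  # prunable/tunable mask values
        self.bias = linear.bias

    def forward(self, x):
        # Reversible pruning: zeroed mask entries simply remove rank-1 components.
        W = self.U @ torch.diag(self.mask * self.s) @ self.Vt
        return nn.functional.linear(x, W, self.bias)

    @torch.no_grad()
    def compress(self):
        # Irreversible: physically drop the pruned components to free memory.
        keep = self.mask != 0
        self.U = nn.Parameter(self.U[:, keep].clone())
        self.s = nn.Parameter(self.s[keep].clone())
        self.Vt = nn.Parameter(self.Vt[keep, :].clone())
        self.mask = nn.Parameter(self.mask[keep].clone())
```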
Our training logic is based on PyTorch Lightning. `BaseLitModule` implements the central `LightningModule`, which provides a general training loop for any `PreTrainedModel` (not just Causal Language Models):
- `BaseLitModule` requires an `Objective`, which is responsible for performing the model forward pass and loss computation.
- `BaseLitModule` builds its model and optimizer from a model factory and optimizer factory, respectively. This follows Lightning's convention of configuring these objects lazily via the `configure_model` and `configure_optimizers` hooks, making our code extendable to more advanced Parallel Strategies like FSDP (see the illustrative sketch after this list).
- Similarly, `BaseDataModule` builds its datasets and dataloaders from a dataset factory.
- All custom and ACIP-specific functionality is implemented via Lightning Callbacks, which can be divided into three groups:
  - ACIP: Implements training-related parts of the ACIP algorithm, like score map updates, regularization parameter scheduling, and post-tuning.
  - Monitoring: Implements model monitoring during training, involving regular calls of different Model Evaluators.
  - Benchmarking: Implements benchmarking of a model before and after training, involving (more expensive) calls of Model Evaluators.
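As a rough illustration of this pattern (not the actual `BaseLitModule` or `Objective` signatures), a stripped-down LightningModule built around an objective and lazy factories might look like this:

```python
import pytorch_lightning as pl

class SketchLitModule(pl.LightningModule):
    """Hypothetical, simplified sketch of the objective/factory pattern described above."""

    def __init__(self, model_factory, optimizer_factory, objective):
        super().__init__()
        self.model_factory = model_factory
        self.optimizer_factory = optimizer_factory
        self.objective = objective
        self.model = None

    def configure_model(self):
        # Lazy model creation keeps the module compatible with strategies like FSDP.
        if self.model is None:
            self.model = self.model_factory()

    def training_step(self, batch, batch_idx):
        # The objective owns the forward pass and the loss computation.
        loss = self.objective(self.model, batch)
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        return self.optimizer_factory(self.model.parameters())
```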
ℹ️ To dive deeper into the code, we recommend starting with `acip_entrypoint`, as it instantiates and manages all high-level objects of an ACIP run.
- [2025-04-22] Released ACIP paper code.
- [2025-04-15] Shared all ACIP models on 🤗 Hub.
Feel free to reach out to us via GH issues or email!
martin.genzel at merantix-momentum dot com
patrick.putzky at merantix-momentum dot com
This project is released under the Apache 2.0 license. Please see the LICENSE file for more information.
When using or referring to this project, please cite our paper:
```bibtex
@article{mxm2025acip,
  title={Choose Your Model Size: Any Compression by a Single Gradient Descent},
  author={M. Genzel and P. Putzky and P. Zhao and S. Schulze and M. Mollenhauer and R. Seidel and S. Dietzel and T. Wollmann},
  year={2025},
  journal={Preprint arXiv:2502.01717}
}
```