
Compressing Large Language Models as Intuitively as Images

Official implementation of ACIP (Adaptive Compression by Iterative Pruning). Just give it a try with only 3 lines of code:

from transformers import AutoModel

model = AutoModel.from_pretrained("MerantixMomentum/ACIP-llama2-7b", trust_remote_code=True)
model.prune_model_by_score(size_ratio=0.5).compress()

See our project website for a quick overview of the ACIP algorithm, or dive into the full details with our paper:

Choose Your Model Size: Any Compression by a Single Gradient Descent
Martin Genzel*, Patrick Putzky*, Pengfei Zhao*, Sebastian Schulze, Mattes Mollenhauer, Robert Seidel, Stefan Dietzel, Thomas Wollmann (* equal contribution)

This work was developed at Merantix Momentum. If you use it, please cite our paper.

Getting Started

Quick Start

The easiest way to get started with ACIP is to download a ready-for-use model from our Merantix Momentum πŸ€— Hub. For this, you don't need to clone this repo, and only minimal dependencies are required (torch, transformers, peft, and optionally bitsandbytes if you want to quantize your model). See acip/core/requirements.txt for pip-installable dependencies.

Just select any ACIP model and load it via from_pretrained like this one:

from transformers import AutoModel

model = AutoModel.from_pretrained("MerantixMomentum/ACIP-llama2-7b", trust_remote_code=True)

This will download and create a fully parameterized ACIP model that can be pruned to any compression rate you wish. For example,

model.prune_model_by_score(size_ratio=0.4)

will prune the model to 40% of its original size, measured in number of parameters, i.e., a compression rate of 60%. A unique feature of ACIP is that this operation is revertible in the sense that you can rerun model.prune_model_by_score as often as you like to evaluate your model at different sizes. Finally, you can "commit" to a certain ratio and run

model.compress()

which will discard all pruned mask values of compressible linear layers. Now the model is actually compressed and you should observe a significant decrease in memory usage (this step is not revertible without reloading the ACIP model). If you like, you can also run

model.quantize()

to save even more memory (we have only tested 4bit quantization with bitsandbytes, but you could also customize this).

πŸš€ That's it! You can now use your compressed model for inference or fine-tuning as any other Causal Language Model from πŸ€— transformers.

ℹ️ The parameter size_ratio ranges from 1.0 to 0.0 and indicates the fraction of the original model size to keep after compression. For example, 0.4 means that the model has only 40% of the original number of parameters and 1.0 means no compression at all. Alternatively, you can also set compression_rate in prune_model_by_score, which is equivalent to size_ratio = 1.0 - compression_rate.
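
Putting these steps together, here is a minimal end-to-end sketch. It assumes that the tokenizer can be loaded from the same πŸ€— repo and that the compressed model exposes the usual generate API of a causal language model (as described above); adjust to your setup as needed.

from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained("MerantixMomentum/ACIP-llama2-7b", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("MerantixMomentum/ACIP-llama2-7b")

# Pruning by score is revertible, so you can probe several sizes first.
for ratio in (0.8, 0.6, 0.4):
    model.prune_model_by_score(size_ratio=ratio)
    # ... evaluate the model at this size, e.g., measure perplexity ...

# Commit to 40% of the original size and materialize the compression.
model.prune_model_by_score(size_ratio=0.4)
model.compress()

# Use the compressed model like any other causal LM from transformers.
inputs = tokenizer("Hello, my name is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))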

Installation

To run the ACIP code to compress or fine-tune your own model, please clone this repo:

git clone https://github.com/MerantixMomentum/acip.git

To install all dependencies, we recommend using uv with Python 3.11 as base interpreter (Python 3.12 should work as well). Once uv is set up, you can just run

uv sync

to install the requirements as well as the acip package (see pyproject.toml for details).

If you want to use a different package manager like Conda, you can also simply install all pinned dependencies from the provided requirements.txt.
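
For example, a minimal sketch with plain pip inside a fresh environment (run from the repo root; the editable install of the acip package via pyproject.toml is an assumption here):

pip install -r requirements.txt
pip install -e .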

❗️ Custom environment variables are managed via dot-env. Before using the repo, please create a .env file from .env.example and fill in the required values.

Running ACIP

To try out ACIP on your own model, you can run the acip_compress entrypoint with

python -m acip.entrypoints.acip_compress model.base_model_name_or_path=<HF Repo or Local Path> model.identifier=<model name>

Here, base_model_name_or_path is passed to PreTrainedModel.from_pretrained to load the base model and identifier specifies the run id and output directory name. You may omit identifier, which will set it to "default".

ACIP will now run for a while and produce a prunable version of your base model, which is finally saved to an output directory (by default this is <Project Root>/artifacts/runs/default/compress_<model.identifier>/model).

Next, you can revisit Quick Start and load your ACIP model from disk via from_pretrained β€” just replace the πŸ€— repo name with the local model output directory. That's it! You can now work with your ACIP model as with the ones from our Hub.
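
For example, a minimal sketch that loads the model produced by the command above (assuming model.identifier was left at its default and Python is run from the project root):

from transformers import AutoModel

model = AutoModel.from_pretrained(
    "artifacts/runs/default/compress_default/model",  # local output directory of the ACIP run
    trust_remote_code=True,
)
model.prune_model_by_score(size_ratio=0.5).compress()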

There are of course many more options and tweaks we skipped here for simplicity. Please find more details on available ACIP Entrypoints and the underlying Code Design below.

Paper Experiments

To make the experiments from our paper as reproducible as possible, we have compiled all necessary Python run-commands in scripts/experiments_paper.sh. The corresponding Hydra configs of our experiments can be found in config/experiment/paper. Note that the fine-tuning runs and some ablations require a ready-to-use ACIP model as input, so you first need to perform the corresponding ACIP compression run (or load the model from our πŸ€— Hub).

Advanced Usage

ACIP Entrypoints

All entrypoints of the ACIP project are based on Hydra config management. We currently provide the following entrypoints:

  • acip_compress: Runs the ACIP algorithm to turn a base model into a prunable ACIP model (see below).
  • acip_finetune: Fine-tunes the adapters of an ACIP model obtained from acip_compress (see below).
  • acip_eval: Evaluates a ready-to-use ACIP model without any fine-tuning (see below).

The basic CLI syntax to run these entrypoints is as follows:

python -m acip.entrypoints.<Entrypoint Name> <Hydra Config Args>

You have already seen a typical example above. Below, we outline what options you have for the Hydra Config Args in general and for each of the above entrypoints. For a detailed discussion of Hydra's basic override syntax, please see their docs.

General Config & Tweaks

The above-mentioned entrypoints all share the same base class, ACIPEntrypoint, which is based on our MxM Scaffold package. So all entrypoints basically run the same code but with different configurations, which are determined by the accompanying (structured) config class ACIPEntrypointConf. Technically, ACIPEntrypointConf is just a dataclass-like container that aggregates all sub-configs required for the run. Please see below for more details on the individual sub-configs and global overrides, which can be tweaked via the <Hydra Config Args>.

ℹ️️ All config arguments described below have sensible defaults, so that all overrides are fully optional. Moreover, we only focus on the most relevant arguments in this documentation. For even more information and docs, please use the links to navigate to the actual (sub-)config files.

ℹ️ To explore and debug your entrypoint config, use run.dry_run=true, which will compile and print the full config of your experiment without running it.
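
For example, a quick sketch of a dry run (the base model path is just an illustration):

python -m acip.entrypoints.acip_compress model.base_model_name_or_path=meta-llama/Llama-2-7b-hf run.dry_run=true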

Entrypoint Sub-Configs

πŸ› οΈ run

Basic information & config of the run. Important options are:

  • run.id: Descriptive identifier for the run. Also determines the name of the output directory.
  • run.group: Group identifier for the run. By default, runs are grouped by their model.identifier, data.identifier, and run.series.
  • run.series: Series identifier for the run, typically the name of an entire experiment series.
  • run.path: The output directory for the run. Defaults to <paths.run_dir>/<run.id>.
  • run.tags_custom: List of additional tags for the run, which will be also used as W&B tags if applicable.
  • run.dry_run: If true, the entrypoint will not run the actual experiment but instead print the full config.
  • run.save: List of artifact types to save. Available options: config, results, models.
πŸ› οΈ data

Configures the dataset (factories) for the entrypoint. Important options are:

  • Currently available datasets: data=c4 (default) and data=wikitext2.
  • data.identifier: Descriptive identifier for the dataset.
  • data.train_dataset_factory.shuffle: Whether to shuffle the train dataset or not. Similar options exist for val_dataset_factory and test_dataset_factory.
  • data.train_dataset_factory.seed: Shuffle seed for the train dataset. By default it is set to training.seed.
πŸ› οΈ model

Configures the model factory and tokenizer factory for the entrypoint. The resulting ACIPModelFactory is used to instantiate or load an ACIP model. Important options are:

  • model.identifier: Descriptive identifier for the base model.
  • model.base_model_name_or_path: Huggingface repo or local path pointing to the base model to be loaded and compressed by ACIP.
  • model.ctx_length: Context length to use for perplexity evaluation (see here).

❗ model.base_model_name_or_path is a required parameter to specify a base model. Instead of setting it manually, you can define or choose a base model config here and inject it by an override, e.g., model/base@model=llama1_7b.

πŸ› οΈ training

Configures the training-related parts of the ACIP algorithm and optional fine-tuning, based on PyTorch Lightning. Important options are:

  • training.seed: Global training seed used for PL's seed_everything and dataset factories.
  • training.batch_size: Batch size for train, val, and test dataloaders.
  • training.log_every_n_train_steps: Logging frequency of model monitoring while training. Set to null to disable.
  • training.data_module: Keyword arguments for the BaseDataModule.
  • training.trainer: Keyword arguments for the PL Trainer. Here, you can specify important training parameters (devices, train steps, precision, etc.); see the CLI sketch at the end of this sub-config section.

Details on sub-configs:

  • training.objective: Configures the Objective to be optimized by BaseLitModule. This sub-config is highly entrypoint-specific and selected by the (top-level) configs.
  • training.optimizer_factory: Configures the optimizer (factory) used by BaseLitModule. As for training.objective, this sub-config is highly entrypoint-specific and selected by the (top-level) configs.
  • training.callbacks: Following PL's best practices, we make use of several callbacks to flexibly extend the training process by additional functionality. training.callbacks compiles a dictionary of all callbacks that will be passed to the PL Trainer. The injection of callbacks is managed by the (top-level) entrypoint configs and is organized in three different sub-configs:
    • training.acip: Schedules the ACIP algorithm and score map updates, see also here. Note: Only used by the acip_compress entrypoint and all key parameters are conveniently managed by the acip sub-config.
    • training.monitoring: Configures one or more callbacks that monitor the training process and important model characteristics (e.g., size ratio and gradient norms) with frequency training.log_every_n_train_steps.
    • training.benchmarking: Configures one or more callbacks that benchmark the (ACIP) model at the beginning and end of training. Conceptually, these callbacks are similar to training.monitoring but can involve a more extensive evaluation that is not practical during training.
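
For instance, here is a sketch of overriding a few training parameters from the CLI. It assumes that the keys under training.trainer are passed through as PL Trainer keyword arguments (as described above) and already exist in the config; otherwise, prefix new keys with Hydra's +.

python -m acip.entrypoints.acip_compress model.base_model_name_or_path=<HF Repo or Local Path> training.batch_size=4 training.trainer.devices=1 training.trainer.precision=bf16-mixed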
πŸ› οΈ eval

Helper sub-config that configures a dictionary collection of ModelEvaluator instances that can be used to evaluate an (ACIP) model at any point in training. See config/eval/evaluator and the corresponding classes for details about the individual evaluators.

The configured evaluators are primarily used by the monitoring and benchmarking callbacks.

πŸ› οΈ wandb

Specifies the config for W&B logging. Important options are:

  • wandb.name: Display name of the W&B run, which is also used to generate a unique, but human-readable W&B run id. Defaults to <run.id>.
  • wandb.dir: Local output directory for W&B logs. Defaults to /tmp/wandb.
  • By default, wandb.base_url, wandb.entity, and wandb.project are set by the environment variables WANDB_BASE_URL, WANDB_ENTITY, and WANDB_PROJECT, respectively (see dot-env).
  • You can fully disable W&B logging by setting wandb=null.
πŸ› οΈ paths

Specifies the project root path and where to store artifacts (run outputs, models, datasets, cache, etc.). Important options are:

  • paths.artifact_dir: Parent directory for all artifacts. Defaults to <paths.root_dir>/artifacts.
  • paths.run_dir: Parent directory for all run outputs. Defaults to <paths.artifact_dir>/runs.
  • paths.data_dir: Parent directory for all local datasets. Defaults to <paths.artifact_dir>/data.
  • paths.cache_dir: Parent directory for cache, in particular, HuggingFace (HF_HOME is set via dot-env). Defaults to <paths.artifact_dir>/cache.
πŸ› οΈ acip

This sub-config collects all ACIP-related parameters of a run. It is highly entrypoint-specific and managed by the (top-level) entrypoint configs. Please find more details on available tweaks and options of the individual entrypoints in the sections below.

Global Overrides

πŸ› οΈ experiment

While the ACIP entrypoint configs set sensible defaults, they can be easily overwritten or modified by an experiment config to design a custom run. Each of these configs operates on the top-level (global) entrypoint config and can therefore override any parameter or sub-config. Here is a typical example that runs an experiment from our paper:

python -m acip.entrypoints.acip_compress experiment=paper/llama1_7b/compress

Please see config/experiment/paper for many more examples of experiments. You may also provide a list of experiment configs to apply multiple overrides.

πŸ› οΈ options

Technically, this override follows the same syntax as experiment, and the two can be combined with each other. The purpose of "options", however, is to tweak specific parts of the entrypoint config, especially for debugging purposes. You can add your own config to config/options or choose one or more from the following list:

  • no_benchmarking: Disable the benchmarking sub-config.
  • no_monitoring: Disable the monitoring sub-config.
  • no_output: Disable any output to files, no saving, and no W&B logging (stdout/stderr is still printed).
  • verbose: Set Hydra's job logging level to DEBUG.
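
For example, a minimal sketch that reuses the experiment config from above and disables all file output and W&B logging for a quick test run:

python -m acip.entrypoints.acip_compress experiment=paper/llama1_7b/compress options=no_output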

acip_compress

This is the central entrypoint to run the ACIP algorithm from our paper; see the paper for more conceptual details. For this particular entrypoint, the following options of the acip sub-config are important:

  • acip.stop_ratio: Size ratio at which to stop ACIP.
  • acip.post_tune_steps: How many steps to continue optimizing the adapters after ACIP stopped.
  • acip.lr: Global learning rate for AdamW to tune the mask parameters and adapters.
  • acip.test_ratios: Size ratios at which to benchmark the final ACIP model. These results will also be saved to the output directory.
  • acip.quantize_weights: If true, the U and V weights of the SVD parametrization will be quantized according to a quantization config.
  • The ACIP regularization parameter scheduler can be controlled through reg_scheduler_start_weight, reg_scheduler_update_every, and reg_scheduler_update_factor.
  • acip.save.path: Where to save the final ACIP model. Defaults to <run.path>/model.
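
A minimal sketch of a compression run using a few of these overrides (all values are illustrative only):

python -m acip.entrypoints.acip_compress model.base_model_name_or_path=<HF Repo or Local Path> model.identifier=my_model acip.stop_ratio=0.4 acip.post_tune_steps=1000 acip.test_ratios=[0.9,0.7,0.5,0.4]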

You can find more examples of overrides in our paper experiments.

acip_finetune

This is a complementary entrypoint that allows you to fine-tune an ACIP model obtained from acip_compress. Note that fine-tuning only concerns the adapters (LoRA parameters), not the mask parameters, which remain frozen. For this particular entrypoint, the following options of the acip sub-config are important:

  • acip.finetune_steps: How many steps to fine-tune the adapters.
  • acip.prune_to_ratio: Size ratio to which the loaded ACIP model is to be pruned (and compressed). This operation is not revertible and you will obtain a fine-tuned ACIP model at this particular size ratio.
  • acip.lr: Global learning rate for AdamW to tune the adapters.
  • acip.quantize_weights: If true, the U and V weights of the SVD parametrization will be quantized according to a quantization config.
  • acip.load.model_name_or_path: Path to the ACIP model to be fine-tuned. Could also be an ACIP model from our πŸ€— Hub.
  • acip.save.path: Where to save the final ACIP model. Defaults to <run.path>/model. By default, only the mask parameters and adapters are stored to save disk space, see also storage override below.

❗ When loading a fine-tuned ACIP model from disk, you need to set acip.load.init_empty_weights=false.
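
A minimal sketch of such a fine-tuning run (the model path, target ratio, and step count are placeholders):

python -m acip.entrypoints.acip_finetune acip.load.model_name_or_path=<Path to ACIP model> acip.prune_to_ratio=0.5 acip.finetune_steps=1000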

acip_eval

This entrypoint is similar to acip_finetune, but it only evaluates a ready-to-use ACIP model without any fine-tuning. The specific evaluation routine is controlled by the training/benchmarking sub-config, see below. For this particular entrypoint, the following options of the acip sub-config are important:

  • acip.prune_to_ratio: Size ratio to which the loaded ACIP model is to be pruned. If null (default), the model is loaded as is.
  • acip.test_ratios: Size ratios at which to evaluate the ACIP model. These results will also be saved to the output directory. If null, the model is only evaluated at acip.prune_to_ratio.
  • acip.compress_and_unparametrize: Whether to actually compress the model (cannot be reverted). Activating this flag only makes sense if acip.prune_to_ratio is not null and acip.test_ratios=null.
  • acip.quantize_weights: If true, the U and V weights of the SVD parametrization will be quantized according to a quantization config.
  • acip.load.model_name_or_path: Path to the ACIP model to be evaluated. Could also be an ACIP model from our πŸ€— Hub.

❗ If the ACIP model was saved in compact format, i.e., only mask parameters and adapters were saved, you need to set acip.load.init_empty_weights=false.

Relevant sub-config overrides:

  • (Required) model/base@model=...: Specify a pre-configured base model to be compressed. Note that the base model weights are implicitly loaded with the ACIP model, but this sub-config is still required to configure a suitable tokenizer (factory) used for evaluation.
  • training/benchmarking@training=...: Specify one or more benchmarking callbacks that will evaluate the loaded ACIP model.
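
A minimal sketch of an evaluation run at several size ratios; note that the base model config name llama2_7b is an assumption here, analogous to the llama1_7b example above, and the ratios are illustrative:

python -m acip.entrypoints.acip_eval acip.load.model_name_or_path=MerantixMomentum/ACIP-llama2-7b model/base@model=llama2_7b acip.test_ratios=[0.9,0.7,0.5]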

Code Structure & Design

β”œβ”€β”€ acip             # ACIP package source code
β”‚   β”œβ”€β”€ core         # Self-contained core package defining ACIPModel and required components
β”‚   β”œβ”€β”€ data         # Factories to create ready-to-use datasets
β”‚   β”œβ”€β”€ entrypoints  # ACIP entrypoints
β”‚   β”œβ”€β”€ eval         # Model evaluators for monitoring and benchmarking
β”‚   β”œβ”€β”€ model        # Factories for ACIP models and corresponding tokenizers
β”‚   β”œβ”€β”€ training     # Training-related code (Lightning modules, optimizers, callbacks, etc.)
β”‚   └── utils        # Utility functions
β”œβ”€β”€ artifacts        # Parent directory for all artifacts
β”‚   β”œβ”€β”€ cache        # Cache directory, in particular for πŸ€— artifacts
β”‚   β”œβ”€β”€ data         # Local datasets
β”‚   └── runs         # Parent directory for all run outputs
β”œβ”€β”€ config           # Hydra configs (subdirectories mirror the "acip" package structure)
β”œβ”€β”€ scripts          # Utility scripts
└── test             # Unit tests

Important Design Concepts

acip.core is designed as a fully self-contained package:

  • The central object of acip.core is the ACIPModel, which provides the pruning functionality via ACIPModel.prune_model_by_score and is the outcome of the ACIP algorithm.
  • Its base class is ParametrizedModel, which manages the underlying parametrization of an ACIP model. Moreover, it allows you to equip the model with πŸ€—-PEFT adapters and perform quantization if needed.
  • The parametrization mechanism itself is based on Parametrization, which enables in-place modification of existing (linear) model layers. SVDLinearParametrization is a child class that implements the SVD decomposition used in ACIP.
  • Both ACIPModel and ParametrizedModel are implemented as PreTrainedModel, wrapping the base model to be parametrized and compressed. Each class is accompanied by a custom PreTrainedConfig, which fully configures the model, see ParametrizedModelConfig and acip_config for more details. We followed πŸ€—'s custom model guide to make our ACIP models fully compatible with their API (from_pretrained, save_pretrained, push_to_hub, etc.). In particular, an ACIP model should behave exactly as the underlying base model and inherit its I/O interface. The (parametrized) base model can be accessed via ACIPModel.model.
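
As a small illustration of this API compatibility, here is a sketch (the save path is a placeholder; attribute and method names follow the description above):

from transformers import AutoModel

acip_model = AutoModel.from_pretrained("MerantixMomentum/ACIP-llama2-7b", trust_remote_code=True)

# The wrapped (parametrized) base model is accessible via ACIPModel.model.
base_model = acip_model.model
print(type(base_model).__name__)

# Prune, then persist the ACIP model with the standard πŸ€— API.
acip_model.prune_model_by_score(size_ratio=0.5)
acip_model.save_pretrained("artifacts/my_acip_model")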

Our training logic is based on PyTorch Lightning. BaseLitModule implements the central LightningModule, which provides a general training loop for any PreTrainedModel (not just Causal Language Models):

  • BaseLitModule requires an Objective, which is responsible for performing the model forward pass and loss computation.
  • BaseLitModule builds its model and optimizer from a model factory and optimizer factory, respectively. This follows Lightning's conventions to configure these objects lazily via the configure_model and configure_optimizers hooks, making our code extendable to more advanced Parallel Strategies like FSDP.
  • Similarly, BaseDataModule builds its datasets and dataloaders from a dataset factory.
  • All custom and ACIP-specific functionality is implemented via Lightning Callbacks, which can be divided into three groups:
    1. ACIP: Implements training-related parts of the ACIP algorithm, like score map updates, regularization parameter scheduling, and post-tuning.
    2. Monitoring: Implements model monitoring during training, involving regular calls of different Model Evaluators.
    3. Benchmarking: Implements benchmarking of a model before and after training, involving (more expensive) calls of Model Evaluators.

ℹ️ To dive deeper into the code, we recommend starting with acip_entrypoint, as it instantiates and manages all high level objects of an ACIP run.

Updates

  • [2025-04-22] Released ACIP paper code.
  • [2025-04-15] Shared all ACIP models on πŸ€— Hub.

Contact

Feel free to reach out to us via GH issues or email!
martin.genzel at merantix-momentum dot com
patrick.putzky at merantix-momentum dot com

License

This project is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Citation

When using or referring to this project, please cite our paper:

@article{mxm2025acip,
  title={Choose Your Model Size: Any Compression by a Single Gradient Descent},
  author={Martin Genzel and Patrick Putzky and Pengfei Zhao and Sebastian Schulze and Mattes Mollenhauer and Robert Seidel and Stefan Dietzel and Thomas Wollmann},
  year={2025},
  journal={Preprint arXiv:2502.01717}
}
