DistillKitPlus is an open-source toolkit for knowledge distillation (KD). The repo was inspired by arcee-ai/DistillKit. Its main motivation is to support offline distillation and parameter-efficient fine-tuning (PEFT) in low-compute settings.
- Logit Distillation: Supports same/cross tokenizer teacher and student models.
- Pre-Computed Logits: Enables memory-efficient training by generating teacher logits in advance (see the sketch after this list).
- LoRA Fine-Tuning Integration: Efficient low-rank adaptation fine-tuning support.
- Quantization Support: 4-bit model quantization for faster inference and reduced memory usage.
- Accelerate & DeepSpeed Integration: Support for distributed training with optimized memory usage.
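To make the pre-computed logits idea concrete, here is a minimal, hand-written sketch (not the toolkit's actual `scripts/local/generate_logits.py`; the model id and dataset below are placeholders): the teacher is run once over the data and its logits are cached to disk, so the teacher never has to be loaded during student training.

```python
# Illustrative only: cache teacher logits offline for later student training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_id = "your-teacher-model"  # placeholder model id
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(teacher_id)
teacher = AutoModelForCausalLM.from_pretrained(
    teacher_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device).eval()

texts = ["An example training prompt."]  # placeholder dataset
cached = []
with torch.no_grad():
    for text in texts:
        batch = tokenizer(text, return_tensors="pt").to(device)
        cached.append(teacher(**batch).logits.cpu())  # (1, seq_len, vocab_size)

torch.save(cached, "teacher_logits.pt")  # consumed later by the distillation step
```

The supported loss functions are listed below.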
| Loss Type | Best For | Special Requirements |
|---|---|---|
| KL Divergence (`fkl`, `kld`) | Same-tokenizer distillation | None |
| Universal Logit Distillation (`uld`) | Cross-tokenizer distillation | Requires `teacher_labels` |
| Multi-Level Optimal Transport (`multi-ot`) | Cross-tokenizer distillation | Requires `teacher_labels`, additional parameters |
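For the same-tokenizer case, the KL divergence losses above can be illustrated with a temperature-scaled forward KL blended with the usual cross-entropy term. This is a generic sketch, not the toolkit's exact implementation; the `temperature` and `alpha` arguments mirror the `distillation` config parameters described below.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Illustrative forward-KL distillation loss (same vocabulary for both models).

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    labels: (batch, seq_len) ground-truth token ids
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Forward KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    # Standard next-token cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    return alpha * kd + (1 - alpha) * ce
```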
```bash
git clone https://github.com/agokrani/distillkitplus.git
cd distillkitplus
pip install -r requirements.txt
pip install .
```
- Configure your distillation settings in `config/default_config.json`.
- Generate teacher logits:
  ```bash
  python scripts/local/generate_logits.py --config config/default_config.json
  ```
- Run distillation:
  - Without Accelerate (default):
    ```bash
    python scripts/local/distill_logits.py --config config/default_config.json
    ```
  - With Accelerate & DeepSpeed:
    ```bash
    # Make sure to set "use_accelerate": true in your config file
    accelerate launch --config_file config/accelerate_configs/default_config.yaml scripts/local/distill_logits.py --config config/default_config.json
    ```
DistillKitPlus also supports running its scripts on Modal. Use the following commands to perform knowledge distillation with Modal:

- Generate teacher logits:
  ```bash
  modal run scripts/modal/generate_logits.py --config config/default_config.json
  ```
- Run distillation:
  ```bash
  modal run scripts/modal/distill_logits.py --config config/default_config.json
  ```
When using Modal, the Accelerate configuration is handled internally based on your config file settings: just set `"use_accelerate": true` and specify `"accelerate_config"` in the `"execution"` section of your config file.
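For reference, the relevant fragment of the config might look like this (illustrative; the exact schema is in `config/default_config.json`):

```json
"execution": {
    "use_accelerate": true,
    "accelerate_config": "config/accelerate_configs/default_config.yaml"
}
```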
The toolkit uses a JSON configuration file with the following main sections:

- `project_name`: Name of your distillation project
- `dataset`: Dataset configuration, including source and processing settings
- `models`: Teacher and student model specifications
- `tokenizer`: Tokenizer settings, including max length and padding
- `training`: Training hyperparameters
- `distillation`: Distillation-specific parameters (temperature, alpha)
- `lora`: LoRA configuration for efficient fine-tuning
- `quantization`: Model quantization settings
- `execution`: Settings for Accelerate and distributed training
See `config/default_config.json` for a complete example.
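For orientation, an abbreviated skeleton of the top-level structure is shown below. The nested values are illustrative placeholders except for the keys already named above (`temperature`, `alpha`, `use_accelerate`, `accelerate_config`), so always defer to `config/default_config.json` for the actual fields.

```json
{
    "project_name": "my-distillation-run",
    "dataset": {},
    "models": {},
    "tokenizer": {},
    "training": {},
    "distillation": { "temperature": 2.0, "alpha": 0.5 },
    "lora": {},
    "quantization": {},
    "execution": {
        "use_accelerate": true,
        "accelerate_config": "config/accelerate_configs/default_config.yaml"
    }
}
```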
We welcome contributions from the community! If you have ideas for improvements, new features, or bug fixes, please feel free to open an issue or submit a pull request.
For any technical questions or issues, please open an issue in this repository. We appreciate your feedback and support!