⭐ Awesome Token Reduction Papers (Pruning / Merging / Compression)

📚 This repository collects recent papers on token reduction (token pruning, merging, clustering, compression, etc.) for ML/AI, categorized by year and application scenario.

👀 If you find any errors or missing papers, please don't hesitate to open an issue or pull request. We welcome contributions that advance this field.

📢 News

  • 2025/03/24 Added CVPR 2025, ICLR 2025, WACV 2025, AAAI 2025, EMNLP 2024

Table of Contents

A detailed list of papers organized by modality can be found in this Google Sheet, including a brief introduction of the task, token reduction type, contribution, and methodology for each paper (updated weekly).
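Before the per-modality lists, it may help to see the common skeleton that many pruning-style methods share: score each token, keep the top fraction, and forward only those. Below is a minimal PyTorch sketch of CLS-attention-based token pruning for a ViT, loosely in the spirit of attention-scoring methods such as EViT; the function name and tensor shapes are illustrative assumptions, not taken from any paper's code.

```python
# Minimal sketch of attention-score token pruning (illustrative only).
import torch

def prune_tokens(tokens: torch.Tensor, attn: torch.Tensor, keep_ratio: float = 0.5):
    """tokens: (B, N, D) with tokens[:, 0] assumed to be CLS;
    attn: (B, H, N, N) attention weights from the preceding layer."""
    B, N, D = tokens.shape
    # Importance of each patch token = attention the CLS query pays to it,
    # averaged over heads.
    cls_attn = attn[:, :, 0, 1:].mean(dim=1)            # (B, N - 1)
    n_keep = max(1, int((N - 1) * keep_ratio))
    idx = cls_attn.topk(n_keep, dim=1).indices + 1      # shift past CLS
    idx = idx.sort(dim=1).values                        # preserve token order
    idx = idx.unsqueeze(-1).expand(-1, -1, D)
    kept = torch.gather(tokens, 1, idx)                 # (B, n_keep, D)
    return torch.cat([tokens[:, :1], kept], dim=1)      # re-attach CLS
```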

🌁 Vision

2025

  • [ICML'25] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM [Paper]
  • [ICME'25] SparseDM: Toward Sparse Efficient Diffusion Models [Paper]
  • [CVPR'25] Faster Parameter-Efficient Tuning with Token Redundancy Reduction [Paper] [Code]
  • [CVPR'25] AdaCM2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction [Paper]
  • [CVPR'25] Token Cropr: Faster ViTs for Quite a Few Tasks [Paper]
  • [CVPR'25] Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration [Paper] [Code]
  • [CVPR'25] MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization [Paper] [Code]
  • [CVPR'25] Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks
  • [CVPR'25] CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution [Paper]
  • [CVPR'25] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification [Paper] [Code]
  • [ICLR'25] Accelerating Diffusion Transformers with Token-wise Feature Caching [Paper] [Code]
  • [ICLR'25] Mutual Effort for Efficiency: A Similarity-based Token Pruning for Vision Transformers in Self-Supervised Learning [Paper]
  • [ICLR'25] Dynamic diffusion transformer [Paper] [Code]
  • [WACV'25] Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge [Paper]
  • [ICASSP'25] Pruning then reweighting: Towards data-efficient training of diffusion models [Paper] [Code]
  • [AAAI'25] FreqTS: Frequency-Aware Token Selection for Accelerating Diffusion Models [Paper]
  • [AAAI'25] Multimodal Promptable Token Merging for Diffusion Models [Paper]
  • [AAAI'25] Training-free and hardware-friendly acceleration for diffusion models via similarity-based token pruning [Paper] [Code]
  • [arXiv] Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration [Paper] [Code]
  • [arXiv] Pyramid Sparse Transformer: Efficient Multi-Scale Feature Fusion with Dynamic Token Selection [Paper] [Code]
  • [arXiv] Cached Adaptive Token Merging: Dynamic Token Reduction and Redundant Computation Elimination in Diffusion Model [Paper] [Code]
  • [arXiv] Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers [Paper] [Code]
  • [arXiv] UniCP: A Unified Caching and Pruning Framework for Efficient Video Generation [Paper]
  • [arXiv] CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models [Paper] [Code]
  • [arXiv] Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting [Paper] [Code]

2024

  • [NeurIPS'24] Accelerating Transformers with Spectrum-Preserving Token Merging [Paper]
  • [NeurIPS'24] Video Token Merging for Long Video Understanding [Paper]
  • [NeurIPS'24] Don't Look Twice: Faster Video Transformers with Run-Length Tokenization [Paper] [Code]
  • [NeurIPSW'24] M2M-TAG: Training-Free Many-to-Many Token Aggregation for Vision Transformer Acceleration [Paper] [Code]
  • [ECCV'24] Agglomerative Token Clustering [Paper] [Code]
  • [ECCV'24] Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning [Paper] [Code]
  • [ECCV'24] LookupViT: Compressing visual information to a limited number of tokens [Paper]
  • [ECCV'24] PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [Paper] [Code]
  • [ECCV'24] Turbo: Informativity-driven acceleration plug-in for vision-language large models [Paper]
  • [ECCV'24] Object-centric diffusion for efficient video editing [Paper]
  • [ECCV'24] Leveraging temporal contextualization for video action recognition [Paper] [Code]
  • [IJCAI'24] ToDo: token downsampling for efficient generation of high-resolution images [Paper]
  • [CVPR'24] Attention-driven training-free efficiency enhancement of diffusion models [Paper]
  • [CVPR'24] vid-TLDR: Training Free Token Merging for Light-weight Video Transformer [Paper] [Code]
  • [CVPR'24] Vidtome: Video token merging for zero-shot video editing [Paper] [Code]
  • [CVPR'24] Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers [Paper] [Code]
  • [ICLR'24] Synergistic Patch Pruning for Vision Transformer: Unifying Intra- & Inter-Layer Patch Importance [Paper]
  • [WACV'24] Token Fusion: Bridging the Gap Between Token Pruning and Token Merging [Paper]
  • [WACV'24] Revisiting Token Pruning for Object Detection and Instance Segmentation [Paper] [Code]
  • [arXiv] Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free [Paper]
  • [arXiv] Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer [Paper]
  • [arXiv] Dynamic and Compressive Adaptation of Transformers From Images to Videos [Paper]
  • [arXiv] Importance-based Token Merging for Diffusion Models [Paper]
  • [arXiv] AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration [Paper] [Code]
  • [arXiv] Token Caching for Diffusion Transformer Acceleration [Paper]
  • [arXiv] FlexDiT: Dynamic Token Density Control for Diffusion Transformer [Paper] [Code]
  • [arXiv] Principles of Visual Tokens for Efficient Video Understanding [Paper]

2023

  • [EMNLP'23] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding [Paper] [Code]
  • [ICCV'23] Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation [Paper] [Code]
  • [ICCV'23] DiffRate: Differentiable Compression Rate for Efficient Vision Transformers [Paper] [Code]
  • [ICCV'23] TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer [Paper] [Code]
  • [ICCV'23] Prune spatio-temporal tokens by semantic-aware temporal accumulation [Paper] [Code]
  • [ICCV'23] Efficient Video Action Detection with Token Dropout and Context Refinement [Paper] [Code]
  • [ICCV'23 Workshop] Which Tokens to Use? Investigating Token Reduction in Vision Transformers [Paper] [Code]
  • [CVPR'23] Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers [Paper] [Code]
  • [CVPRW'23] Token merging for fast stable diffusion [Paper] [Code]
  • [ICLR'23] Token Merging: Your ViT But Faster [Paper] [Code] (a simplified matching sketch follows this list)
  • [IJCAI'23] Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention [Paper] [Code]
  • [TIP] Efficient Vision Transformer via Token Merger [Paper]
  • [arXiv] PPT: Token Pruning and Pooling for Efficient Vision Transformers [Paper] [Code]
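On the merging side, the bipartite soft matching of ToMe ("Token Merging: Your ViT But Faster" above) reduces tokens without discarding them outright. The toy sketch below captures only the matching step, under simplifying assumptions: it measures similarity on the token features themselves (ToMe uses attention keys) and omits the CLS protection and size-weighted averaging of the official code.

```python
# Simplified bipartite soft matching, ToMe-style (illustrative only).
import torch
import torch.nn.functional as F

def bipartite_merge(x: torch.Tensor, r: int) -> torch.Tensor:
    """x: (B, N, D); merges the r most similar (a, b) pairs,
    returning (B, N - r, D)."""
    a, b = x[:, ::2], x[:, 1::2]                        # alternating split
    sim = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).transpose(1, 2)
    best_sim, best_b = sim.max(dim=-1)                  # each a-token's match
    order = best_sim.argsort(dim=-1, descending=True)
    merge_idx, keep_idx = order[:, :r], order[:, r:]
    D = x.shape[-1]
    src = torch.gather(a, 1, merge_idx.unsqueeze(-1).expand(-1, -1, D))
    dst = torch.gather(best_b, 1, merge_idx)
    # Fold the r merged a-tokens into their matched b-tokens by averaging.
    b = b.scatter_reduce(1, dst.unsqueeze(-1).expand(-1, -1, D), src,
                         reduce="mean", include_self=True)
    a_kept = torch.gather(a, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return torch.cat([a_kept, b], dim=1)
```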

2022

  • [ECCV'22] SPViT: Enabling Faster Vision Transformers via Latency-aware Soft Token Pruning [Paper] [Code]
  • [ECCV'22] ATS: Adaptive Token Sampling For Efficient Vision Transformers [Paper] [Code]
  • [ECCV'22] PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation [Paper] [Code]
  • [ECCV'22] TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval [Paper]
  • [ECCV'22] Efficient video transformers with spatial-temporal token selection [Paper] [Code]
  • [CVPR'22] Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space [Paper] [Code]
  • [CVPR'22] Patch Slimming for Efficient Vision Transformers [Paper]
  • [CVPR'22] A-ViT: Adaptive Tokens for Efficient Vision Transformer [Paper] [Code]
  • [ICLR'22] EViT: Expediting Vision Transformers via Token Reorganizations [Paper] [Code]
  • [AAAI'22] Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer [Paper] [Code]

2021

  • [NeurIPS'21] IA-RED2: Interpretability-Aware Redundancy Reduction for Vision Transformers [Paper]
  • [NeurIPS'21] Tokenlearner: Adaptive space-time tokenization for videos [Paper] [Code]
  • [NeurIPS'21] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification [Paper] [Code]

πŸ“ Language

2025

  • [arXiv] Thinkless: LLM Learns When to Think [Paper] [Code]
  • [arXiv] LightThinker: Thinking Step-by-Step Compression [Paper] [Code]
  • [arXiv] TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression [Paper] [Code]
  • [arXiv] EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation [Paper] [Code]
  • [arXiv] Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning [Paper]
  • [arXiv] TokenSkip: Controllable Chain-of-Thought Compression in LLMs [Paper] [Code]
  • [arXiv] ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning [Paper] [Code]
  • [ACL'25] Token-Budget-Aware LLM Reasoning [Paper] [Code]
  • [ACL'25] Accurate KV Cache Quantization with Outlier Tokens Tracing [Paper] [Code]
  • [NAACL'25] S2-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency [Paper]
  • [ICLR'25] MrT5: Dynamic Token Merging for Efficient Byte-level Language Models [Paper] [Code]
  • [KAIS] Dynamic token pruning for LLMs: leveraging task-specific attention and adaptive thresholds [Paper] [Code]

2024

  • [NeurIPS'24] Fast Best-of-N Decoding via Speculative Rejection [Paper] [Code]
  • [EMNLP'24] Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters [Paper] [Code]
  • [EMNLP'24] Memory-Efficient Fine-Tuning of Transformers via Token Selection [Paper] [Code]
  • [arXiv] LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference [Paper] (see the KV-cache sketch after this list)
  • [ICLR'24] In-context autoencoder for context compression in a large language model [Paper] [Code]
  • [EMNLP'24] Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning [Paper]
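Several entries above (LazyLLM and the KV-cache reduction papers) prune tokens from the KV cache at inference time rather than from the input. Below is a loose, generic sketch of attention-score-based cache dropping; it is not any listed paper's exact method, and in particular it omits LazyLLM's deferred computation and token revival.

```python
# Toy attention-based KV-cache token dropping (illustrative only).
import torch

def prune_kv_cache(k: torch.Tensor, v: torch.Tensor,
                   attn_to_past: torch.Tensor, keep_ratio: float = 0.5):
    """k, v: (B, H, T, Dh) cached keys/values; attn_to_past: (B, H, T)
    attention the latest query paid to each cached position."""
    score = attn_to_past.mean(dim=1)                          # (B, T)
    n_keep = max(1, int(score.shape[1] * keep_ratio))
    idx = score.topk(n_keep, dim=1).indices.sort(dim=1).values
    idx = idx[:, None, :, None].expand(-1, k.shape[1], -1, k.shape[-1])
    return torch.gather(k, 2, idx), torch.gather(v, 2, idx)
```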

2023

  • [EMNLP'23] Optimizing Retrieval-augmented Reader Models via Token Elimination [Paper] [Code]
  • [EMNLP'23] Context Compression for Auto-regressive Transformers with Sentinel Tokens [Paper] [Code]
  • [EMNLP'23] Leap-of-Thought: Accelerating Transformers via Dynamic Token Routing [Paper] [Code]
  • [EMNLP'23] TLM: Token-Level Masking for Transformers [Paper] [Code]
  • [EMNLP'23] Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance? [Paper]
  • [EMNLP'23] Adapting Language Models to Compress Contexts [Paper] [Code]
  • [NeurIPS'23] Learning to Compress Prompts with Gist Tokens [Paper] [Code]
  • [NeurIPS'23] Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers [Paper]
  • [ACL'23] Efficient Transformers with Dynamic Token Pooling [Paper] [Code]
  • [ACL'23] Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions [Paper]
  • [ACL'23] LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models [Paper] [Code] (a toy compression sketch follows this list)
  • [ACL'23] Revisiting Token Dropping Strategy in Efficient BERT Pretraining [Paper]
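As an illustration of prompt-side reduction, here is a toy, single-pass variant of perplexity-based prompt compression in the spirit of LLMLingua: a small LM scores each token's self-information given its prefix, and the most predictable tokens are dropped. The actual method is coarse-to-fine with budget controllers; "gpt2" here is only a stand-in scoring model.

```python
# Toy perplexity-based prompt compression (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # stand-in small LM
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def compress_prompt(text: str, keep_ratio: float = 0.6) -> str:
    ids = tok(text, return_tensors="pt").input_ids           # (1, T)
    logits = lm(ids).logits                                  # (1, T, V)
    logp = torch.log_softmax(logits[:, :-1], dim=-1)
    # Self-information of tokens 1..T-1 given their prefixes.
    nll = -logp.gather(2, ids[:, 1:, None]).squeeze(-1)[0]   # (T - 1,)
    n_keep = max(1, int(nll.numel() * keep_ratio))
    keep = nll.topk(n_keep).indices.sort().values + 1        # keep surprising tokens
    kept = torch.cat([ids[0, :1], ids[0][keep]])             # always keep first token
    return tok.decode(kept)
```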

2022

  • [ACL'22] Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection [Paper]
  • [ACL'22] AdapLeR: Speeding up Inference by Adaptive Length Reduction [Paper] [Code]
  • [KDD'22] Learned Token Pruning for Transformers [Paper] [Code]
  • [EMNLP'22] Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models [Paper]

2021

  • [NeurIPS'21] Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning [Paper]
  • [ACL'21] Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search [Paper] [Code]
  • [NAACL'21] TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference [Paper] [Code]

2020

  • [ICML'20] PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination [Paper] [Code]

🎬 Vision-Language Model

2025

  • [arXiv] GreedyPrune: Retaining Critical Visual Token Set for Large Vision Language Models [Paper]
  • [arXiv] Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs [Paper]
  • [arXiv] Generic Token Compression in Multimodal Large Language Models from an Explainability Perspective [Paper]
  • [arXiv] DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding [Paper]
  • [arXiv] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models [Paper] [Code]
  • [arXiv] SmolVLM: Redefining small and efficient multimodal models [Paper] [Code]
  • [ICML'25] SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference [Paper] [Code]
  • [CVPR'25] A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs [Paper] [Code]
  • [CVPR'25] Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models [Paper] [Code]
  • [CVPR'25] PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models [Paper]
  • [CVPR'25] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models [Paper] [Code]
  • [CVPR'25] SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding [Paper]
  • [CVPR'25] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models [Paper]
  • [CVPR'25] TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
  • [CVPR'25] Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction [Paper]
  • [CVPR'25] ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models [Paper] [Code]
  • [CVPR'25] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models [Paper]
  • [CVPR'25] VoCo-LLaMA: Towards Vision Compression with Large Language Models [Paper] [Code]
  • [NAACL'25 Findings] LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models [Paper]
  • [ICLR'25] Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification [Paper] [Code]
  • [ICLR'25] Towards Semantic Equivalence of Tokenization in Multimodal LLM [Paper] [Code]
  • [ICLR'25] LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token [Paper] [Code]
  • [ICLR'25] Matryoshka Multimodal Models [Paper] [Code]
  • [ICLR'25] TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval [Paper] [Code]
  • [ICLR'25] Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters [Paper]
  • [WACV'25] VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation [Paper]
  • [WACV'25] Patch Ranking: Token Pruning as Ranking Prediction for Efficient CLIP [Paper]
  • [AAAI'25] Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference [Paper] [Code]
  • [AAAI'25] HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments [Paper] [Code]
  • [AAAI'25] Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models [Paper] [Code]
  • [COLING'25] Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs [Paper] [Code]
  • [arXiv] TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos [Paper] [Code]
  • [arXiv] Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models [Paper] [Code]
  • [arXiv] ZipR1: Reinforcing Token Sparsity in MLLMs [Paper]
  • [arXiv] Fast-Slow Thinking for Large Vision-Language Model Reasoning [Paper] [Code]
  • [arXiv] Dynamic Token Reduction during Generation for Vision Language Models [Paper]
  • [arXiv] Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration [Paper] [Code]
  • [arXiv] FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models [Paper] [Code]
  • [arXiv] VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models [Paper]
  • [arXiv] HoliTom: Holistic Token Merging for Fast Video Large Language Models [Paper] [Code]
  • [arXiv] ToDRE: Visual Token Pruning via Diversity and Task Awareness for Efficient Large Vision-Language Models [Paper]

2024

  • [EMNLP'24] TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging [Paper] [Code]
  • [NeurIPS'24] Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis [Paper] [Code]
  • [ECCV'24] IVTP: Instruction-guided Visual Token Pruning for Large Vision-Language Models [Paper]
  • [ECCV'24] An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Acceleration for VLLM Inference [Paper] [Code] (see the sketch at the end of this list)
  • [ICML'24] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers [Paper] [Code]
  • [ECCV'24] LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models [Paper] [Code]
  • [ECCV'24] BRAVE: Broadening the visual encoding of vision-language models [Paper] [Code]
  • [CVPR'24] MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer [Paper] [Code]
  • [CVPR'24] Honeybee: Locality-enhanced Projector for Multimodal LLM [Paper] [Code]
  • [arXiv] Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration [Paper] [Code]
  • [OpenReview] LVP: Language-guided Visual Projector for Efficient Multimodal LLM [Paper]
  • [arXiv] FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models [Paper] [Code]
  • [arXiv] AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning [Paper] [Code]
  • [arXiv] VisionZip: Longer is Better but Not Necessary in Vision Language Models [Paper] [Code]
  • [arXiv] TokenPacker: Efficient Visual Projector for Multimodal LLM [Paper] [Code]
  • [arXiv] mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding [Paper] [Code]
  • [arXiv] Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information [Paper]
  • [arXiv] Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding [Paper] [Code]
  • [arXiv] DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models [Paper] [Code]
  • [arXiv] CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference [Paper]
  • [arXiv] MobileVLM V2: Faster and Stronger Baseline for Vision Language Model [Paper] [Code]
  • [arXiv] LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models [Paper] [Code]
  • [arXiv] iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models [Paper] [Code]
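Many of the pruning entries above operate inside the LLM rather than in the vision encoder. As one concrete pattern, FastV ("An Image is Worth 1/2 Tokens After Layer 2" above) drops poorly-attended visual tokens after an early decoder layer. A toy sketch of that idea follows; `vis_start`/`vis_end` marking the image-token span are assumptions, and the real method integrates this step into the model's forward pass rather than exposing it as a standalone function.

```python
# Toy FastV-style visual token dropping after an early layer (illustrative).
import torch

def drop_visual_tokens(hidden: torch.Tensor, attn: torch.Tensor,
                       vis_start: int, vis_end: int, drop_ratio: float = 0.5):
    """hidden: (B, N, D) layer output; attn: (B, H, N, N) from that layer;
    positions [vis_start, vis_end) are assumed to hold the image tokens."""
    # Mean attention each visual token receives, over heads and queries.
    recv = attn.mean(dim=1).mean(dim=1)[:, vis_start:vis_end]   # (B, Nv)
    n_keep = max(1, int(recv.shape[1] * (1 - drop_ratio)))
    keep = recv.topk(n_keep, dim=1).indices.sort(dim=1).values + vis_start
    keep = keep.unsqueeze(-1).expand(-1, -1, hidden.shape[-1])
    kept_vis = torch.gather(hidden, 1, keep)
    return torch.cat([hidden[:, :vis_start], kept_vis,
                      hidden[:, vis_end:]], dim=1)
```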

2023

  • [ACL'23] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models [Paper] [Code]

🐍 State Space Models

  • [arXiv] Dynamic Vision Mamba [Paper] [Code]
  • [EMNLP'24] Rethinking Token Reduction for State Space Models [Paper] [Code]
  • [NeurIPS'24] Exploring Token Pruning in Vision State Space Models [Paper]
  • [ECCV'24 Workshop] Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion [Paper] [Code]

📱 Hardware Co-design

  • [FCCM'24] Accelerating ViT Inference on FPGA through Static and Dynamic Pruning [Paper]
  • [TCASI] BSViT: A Bit-Serial Vision Transformer Accelerator Exploiting Dynamic Patch and Weight Bit-Group Quantization [Paper]
  • [ASPDAC'24] PRIMATE: Processing in Memory Acceleration for Dynamic Token-Pruning Transformers [Paper]
  • [DATE'24] ViT-ToGo: Vision Transformer Accelerator with Grouped Token Pruning [Paper]
  • [HPCA'23] HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers [Paper]
  • [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning [Paper] [Code]

📜 Citation

If you find our work useful for your project, please consider citing our paper.

```bibtex
@article{kong2025token,
  title={Token Reduction Should Go Beyond Efficiency in Generative Models--From Vision, Language to Multimodality},
  author={Kong, Zhenglun and Li, Yize and Zeng, Fanhu and Xin, Lei and Messica, Shvat and Lin, Xue and Zhao, Pu and Kellis, Manolis and Tang, Hao and Zitnik, Marinka},
  journal={arXiv preprint arXiv:2505.18227},
  year={2025}
}
```

