Stars
The open-source Mixture-of-Depths code and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for Enabling Dynamic Depth in Transformers".
The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".
Shwai-He / Multi-modal-Attention-Network-for-Stock-Movements-Prediction
Forked from HeathCiff/Multi-modal-Attention-Network-for-Stock-Movements-Prediction
The dataset of Multi-modal Attention Network for Stock Movements Prediction
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)".
Source code of EMNLP 2022 Findings paper "SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters"
Source code of ACL 2023 Main Conference Paper "PAD-Net: An Efficient Framework for Dynamic Networks".
A collection of AWESOME things about mixture-of-experts
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".