smol-moe is a minimal and modular implementation of a Sparse Mixture of Experts (MoE) transformer in PyTorch. It supports dynamic top-k expert routing (with auxiliary load balancing) and can be trained on TinyStories/TinyShakespeare-style datasets.
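To illustrate the idea, here is a minimal sketch of top-k expert routing with an auxiliary load-balancing term, in the spirit of smol-moe. The module and argument names (`MoELayer`, `n_experts`, `top_k`, `aux_loss_weight`) are illustrative assumptions, not the repository's actual API.

```python
# Hypothetical sketch of a sparse MoE feed-forward layer with top-k routing
# and a Switch-style auxiliary load-balancing loss. Names are illustrative,
# not taken from the smol-moe codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model, d_ff, n_experts=8, top_k=2, aux_loss_weight=0.01):
        super().__init__()
        self.n_experts = n_experts
        self.top_k = top_k
        self.aux_loss_weight = aux_loss_weight
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (batch, seq, d_model) -> flatten tokens for routing
        B, T, D = x.shape
        tokens = x.reshape(-1, D)                               # (N, D), N = B*T

        logits = self.router(tokens)                            # (N, n_experts)
        probs = F.softmax(logits, dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)   # (N, top_k)
        # Renormalize the selected experts' weights so they sum to 1 per token.
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Tokens that routed to expert e in any of their top-k slots.
            mask = (topk_idx == e)                              # (N, top_k)
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            weight = topk_probs[token_ids, slot].unsqueeze(-1)
            out.index_add_(0, token_ids, weight * expert(tokens[token_ids]))

        # Auxiliary load-balancing loss: fraction of tokens dispatched to each
        # expert times that expert's mean router probability.
        dispatch = F.one_hot(topk_idx, self.n_experts).sum(dim=1).float()  # (N, n_experts)
        frac_tokens = dispatch.mean(dim=0)
        frac_probs = probs.mean(dim=0)
        aux_loss = self.n_experts * (frac_tokens * frac_probs).sum() * self.aux_loss_weight

        return out.reshape(B, T, D), aux_loss
```

In this sketch, the forward pass returns both the mixed expert output and the auxiliary loss, which would be added to the language-modeling loss during training so the router is encouraged to spread tokens evenly across experts.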