PyTorch MoE is a modular PyTorch implementation of a Mixture-of-Experts transformer, designed for flexibility, extensibility, and performance. The architecture focuses on sparse expert routing, efficient attention, and scalable tokenization.
- Sparse expert activation with top-k routing per token
- Rotary embeddings and grouped-query attention
- Attention with KV caching for autoregressive use cases (see the sketch after this list)
- Expert capacity limits with residual fallback on overflow
- Parallelized BPE tokenizer with deterministic merges
- Modular PyTorch components for experimentation
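As a rough illustration of the KV-caching feature above: past keys and values are kept around and each new token's projections are appended to them, so a decoding step only computes projections for the newest token while attending over the full history. The sketch below shows the idea; `SimpleKVCache` and the tensor shapes are illustrative assumptions, not the repo's actual cache API.

```python
# Minimal KV-cache sketch for autoregressive decoding (illustrative only).
import torch

class SimpleKVCache:
    """Accumulates key/value tensors along the sequence dimension."""

    def __init__(self):
        self.k = None  # (batch, kv_heads, seq, head_dim)
        self.v = None

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor):
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)
            self.v = torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

cache = SimpleKVCache()
for _ in range(3):                       # one decoding step per iteration
    k_step = torch.randn(1, 2, 1, 64)    # projections for the newest token only
    v_step = torch.randn(1, 2, 1, 64)
    k_all, v_all = cache.update(k_step, v_step)  # attention runs over the full history
print(k_all.shape)                       # torch.Size([1, 2, 3, 64])
```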
MoEBlock
Implements token-to-expert routing. Each token selects the k experts with the highest router scores. Each expert has a fixed capacity; tokens that overflow it are passed through on the residual path instead.
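A minimal sketch of this routing scheme, assuming a linear router, softmax gating, and a hard per-expert capacity cap; `TopKRouterSketch` and its signature are illustrative, not the repo's `MoEBlock` API:

```python
# Illustrative top-k routing with capacity limits and residual fallback.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouterSketch(nn.Module):
    def __init__(self, d_model: int, num_experts: int, k: int = 2, capacity: int = 64):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        ])
        self.k, self.capacity = k, capacity

    def forward(self, x):                              # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)    # (tokens, num_experts)
        topw, topi = weights.topk(self.k, dim=-1)      # k experts per token
        out = x.clone()                                # residual pass-through by default
        for e, expert in enumerate(self.experts):
            token_idx, slot = (topi == e).nonzero(as_tuple=True)
            token_idx, slot = token_idx[:self.capacity], slot[:self.capacity]  # drop overflow
            if token_idx.numel():
                gate = topw[token_idx, slot].unsqueeze(-1)
                out[token_idx] = out[token_idx] + gate * expert(x[token_idx])
        return out

moe = TopKRouterSketch(d_model=32, num_experts=4)
y = moe(torch.randn(10, 32))                           # same shape out: (10, 32)
```

Tokens dropped by a full expert still flow through the residual connection, which is the overflow behavior described above.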
TransformerBlock
Layer block combining rotary attention with either a dense feed-forward network or a sparse MoE sub-layer. The number of query heads and key/value heads can be configured independently.
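A sketch of how such a block might be wired, using `nn.MultiheadAttention` as a stand-in for the repo's rotary, grouped-query attention; the class name and constructor arguments here are assumptions:

```python
# Illustrative pre-norm block: attention plus either a dense FFN or an MoE sub-layer.
import torch
import torch.nn as nn

class BlockSketch(nn.Module):
    def __init__(self, d_model: int, n_heads: int, ff: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Stand-in attention; the repo's version adds rotary embeddings and
        # allows fewer key/value heads than query heads (grouped-query attention).
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = ff   # a dense FFN or a sparse MoE sub-layer with matching input shape

    def forward(self, x, attn_mask=None):
        h = self.norm1(x)
        a, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + a
        return x + self.ff(self.norm2(x))

dense_ff = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
block = BlockSketch(d_model=64, n_heads=8, ff=dense_ff)
y = block(torch.randn(2, 16, 64))   # (batch, seq, d_model)
```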
MoeTransformer
Stacked transformer blocks with a shared embedding table. Encoding (tokens to hidden states) and decoding (hidden states to logits) go through the same tied embedding parameters.
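A minimal sketch of what tied embeddings look like; the class name and the `encode`/`decode` methods are illustrative, not the repo's `MoeTransformer` interface:

```python
# Illustrative weight tying between the input embedding and the output projection.
import torch
import torch.nn as nn

class TiedEmbeddingSketch(nn.Module):
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.embed.weight   # one parameter serves both directions

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.embed(token_ids)              # tokens -> hidden states

    def decode(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.lm_head(hidden)               # hidden states -> vocab logits

model = TiedEmbeddingSketch(vocab_size=1000, d_model=64)
logits = model.decode(model.encode(torch.randint(0, 1000, (2, 16))))
print(logits.shape)   # torch.Size([2, 16, 1000])
```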
Tokenizer
Multiprocess BPE vocabulary builder. Compatible with GPT-style byte-level encoding. Outputs human-readable vocab and merge rules.
```bash
python tokenizer/parallel_bpe.py -w 8 -n 300 -d data_cache/
```

This saves `merges.txt` and `vocabs.txt` under `data_cache/`.
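For intuition, the core of a multiprocess BPE builder is a parallel pair-counting step: workers tally adjacent-symbol pairs on chunks of the corpus, and the merged tallies pick the next merge rule. The sketch below shows that step with `multiprocessing.Pool`; it is illustrative and not the code in `parallel_bpe.py`.

```python
# Illustrative parallel pair counting for byte-level BPE.
from collections import Counter
from multiprocessing import Pool

def count_pairs(chunk):
    """Count adjacent symbol pairs in a list of token sequences."""
    counts = Counter()
    for seq in chunk:
        counts.update(zip(seq, seq[1:]))
    return counts

if __name__ == "__main__":
    # Each "word" starts as a sequence of byte values (GPT-style byte-level BPE).
    corpus = [list("hello".encode()), list("help".encode()), list("hold".encode())]
    chunks = [corpus[i::4] for i in range(4)]           # split work across 4 workers
    with Pool(processes=4) as pool:
        total = sum(pool.map(count_pairs, chunks), Counter())
    best_pair = max(total, key=total.get)               # candidate for the next merge rule
    print(best_pair, total[best_pair])
```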
```bash
pip install -r requirements.txt
```
- Top-k expert selection is differentiable and batch-aware
- Attention uses rotary position encoding on keys and queries (see the sketch after this list)
- Grouped-query attention reduces KV head duplication
- BPE uses regex-based segmentation compatible with GPT-4 patterns
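As a reference for the rotary bullet above, the sketch below applies rotary position encoding to a query or key tensor; the shapes and the base frequency of 10000 follow the common RoPE convention and are not taken from the repo:

```python
# Illustrative rotary position encoding applied to query/key tensors.
import torch

def rotary_sketch(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """x: (batch, heads, seq, head_dim) with an even head_dim."""
    b, h, seq, d = x.shape
    pos = torch.arange(seq, dtype=torch.float32)
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    angles = torch.outer(pos, inv_freq)        # (seq, d/2), one angle per channel pair
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]        # split channels into pairs
    # Rotate each 2D channel pair by its position-dependent angle.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = rotary_sketch(torch.randn(1, 8, 16, 64))   # applied to queries
k = rotary_sketch(torch.randn(1, 2, 16, 64))   # and keys (fewer KV heads under GQA)
```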
This repo does not yet support end-to-end training. Missing pieces include:
- Dataset pipeline and loss function
- Optimizer and training loop
- Generation/sampling logic
- Flash attention or fused kernels
MIT