NanoMoE
is a minimal rewrite of Karpathy's NanoGPT, from scratch, with pedagogical implementations of a few newer features, such as Mixture-of-Experts (MoE) layers.
A compact, from-scratch, character-level Transformer with Rotary Position Embeddings, Mixture-of-Experts feed-forward layers, and F-gram contextual augmentation, all implemented in a single model.py file. It beats NanoGPT's generative quality at a roughly similar line count (to offset the higher memory use while keeping the gains from MoE and RoPE, consider turning off the F-gram context). Run the (badly named) model.py file to start training; you may want to shrink some of the hyperparameters to fit on a consumer GPU. Verified to train on CUDA, MPS, and ROCm.
Written for practice, manually, on a keyboard in two afternoons.
Currently, it loads the TinyShakespeare dataset; perhaps a switch to FineWeb is warranted.
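Since MoE is the headline feature, here is a minimal sketch, assuming PyTorch, of the kind of token-routed mixture-of-experts feed-forward block model.py implements; the class and argument names (`MoEFeedForward`, `num_experts`, `top_k`) are illustrative, not the actual ones in the file:

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Token-routed MoE FFN: a router sends each token to its top-k experts."""

    def __init__(self, embedding_dim: int, num_experts: int = 4, top_k: int = 1, dropout: float = 0.1):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(embedding_dim, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(embedding_dim, 4 * embedding_dim),
                nn.GELU(),
                nn.Linear(4 * embedding_dim, embedding_dim),
                nn.Dropout(dropout),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        flat = x.view(B * T, C)                                 # route every token independently
        logits = self.router(flat)                              # (B*T, num_experts)
        weights, indices = logits.softmax(dim=-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalise over the chosen experts
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = indices[:, k] == e                       # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(flat[mask])
        return out.view(B, T, C)
```

With `top_k=1` every token sees exactly one expert, so per-token compute stays close to a single dense FFN while the parameter count scales with the number of experts; a fuller implementation would usually also add a load-balancing auxiliary loss so tokens don't collapse onto one expert.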
Planned next steps:

- Swap in a subword tokenizer (BPE/WordPiece) in place of the character encoder
- Automate F-gram mining from the corpus rather than hard-coding it
- Add Muon support, plus cosine annealing and other LR schedulers (see the scheduler sketch after this list)
- Allow mixed-precision training
- Triton kernels, for fun?
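As a starting point for the scheduler item, here is a small, self-contained sketch of warmup followed by cosine annealing using PyTorch's built-in schedulers; the tiny linear model, `warmup_steps`, and `max_steps` are placeholders rather than anything in model.py:

```python
import torch

# Placeholder model/optimizer so the scheduler wiring runs on its own;
# in practice these would be the NanoMoE model and its AdamW optimizer.
model = torch.nn.Linear(128, 128)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps, max_steps = 100, 5000  # assumed values, not from model.py
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=warmup_steps)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_steps - warmup_steps)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_steps]
)

for step in range(max_steps):
    optimizer.step()   # forward/backward would go here; kept empty so the sketch runs as-is
    scheduler.step()   # advance the LR schedule once per optimizer step
```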
I'll be adding an arg parser soon; until then, these are rough recommended values (although I assume TinyShakespeare would train with pretty much whatever you choose). Here's a schema that anyone who wants to contribute could follow.
| Argument | Description | Default |
| --- | --- | --- |
| `--embedding_dim` | Token embedding size | 128 |
| `--num_heads` | Number of attention heads | 4 |
| `--num_layers` | Number of Transformer blocks | 4 |
| `--block_size` | Context window (sequence length) | 64 |
| `--dropout` | Dropout probability | 0.1 |
| `--moe_experts` | Number of experts in the MoE layer | 4 |
| `--fgram_max_n` | Maximum n-gram length for F-gram augmentation | 3 |
| `--learning_rate` | AdamW learning rate | 3e-4 |
| `--batch_size` | Batch size | 512 |
| `--epochs` | Number of training epochs | 10 |
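For contributors, a minimal argparse sketch matching the schema above could look like this; nothing here exists in model.py yet, the flag names and defaults simply mirror the table:

```python
import argparse

def parse_args() -> argparse.Namespace:
    # Sketch of the planned CLI; flags and defaults mirror the table above.
    p = argparse.ArgumentParser(description="Train NanoMoE")
    p.add_argument("--embedding_dim", type=int, default=128, help="Token embedding size")
    p.add_argument("--num_heads", type=int, default=4, help="Number of attention heads")
    p.add_argument("--num_layers", type=int, default=4, help="Number of Transformer blocks")
    p.add_argument("--block_size", type=int, default=64, help="Context window (sequence length)")
    p.add_argument("--dropout", type=float, default=0.1, help="Dropout probability")
    p.add_argument("--moe_experts", type=int, default=4, help="Number of experts in the MoE layer")
    p.add_argument("--fgram_max_n", type=int, default=3, help="Maximum n-gram length for F-gram augmentation")
    p.add_argument("--learning_rate", type=float, default=3e-4, help="AdamW learning rate")
    p.add_argument("--batch_size", type=int, default=512, help="Batch size")
    p.add_argument("--epochs", type=int, default=10, help="Number of training epochs")
    return p.parse_args()

if __name__ == "__main__":
    print(parse_args())
```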