A minimal (really) implementation of muP with SGD and Adam, following the Tensor Programs IV and Tensor Programs V papers. The classes `SPMLP` and `muMLPTab9` implement the SP and muP parametrizations as shown in Table 1 of the TPIV paper or, equivalently, Table 9 of the TPV paper. The rest of the code is just training utilities.
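For concreteness, here is a minimal sketch of how a muP MLP can be parametrized in PyTorch. This is not the repo's actual code; names like `MuMLP` and `mup_sgd_param_groups` are made up for illustration, and it uses one commonly cited formulation of the Tensor Programs rules: init variance 1/fan_in for input and hidden weights, 1/fan_in^2 for output weights, and per-layer SGD learning rates of eta*fan_out for input weights and biases, eta for hidden weights, and eta/fan_in for output weights.

```python
import math

import torch
import torch.nn as nn


class MuMLP(nn.Module):
    """Two-hidden-layer MLP with muP initialization (illustrative sketch)."""

    def __init__(self, d_in: int, width: int, d_out: int):
        super().__init__()
        self.fc_in = nn.Linear(d_in, width)
        self.fc_hidden = nn.Linear(width, width)
        self.fc_out = nn.Linear(width, d_out, bias=False)
        # muP init: Var = 1/fan_in for input and hidden weights,
        # Var = 1/fan_in^2 for the output weights.
        nn.init.normal_(self.fc_in.weight, std=1.0 / math.sqrt(d_in))
        nn.init.normal_(self.fc_hidden.weight, std=1.0 / math.sqrt(width))
        nn.init.normal_(self.fc_out.weight, std=1.0 / width)
        # Zero-init biases (a common simplification).
        nn.init.zeros_(self.fc_in.bias)
        nn.init.zeros_(self.fc_hidden.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.fc_in(x))
        h = torch.relu(self.fc_hidden(h))
        return self.fc_out(h)


def mup_sgd_param_groups(model: MuMLP, eta: float, width: int):
    """Per-layer SGD learning rates under muP, expressed as plain
    optimizer parameter groups."""
    return [
        # Input weights and all biases: LR scales with fan_out (= width).
        {"params": [model.fc_in.weight, model.fc_in.bias,
                    model.fc_hidden.bias], "lr": eta * width},
        # Hidden weights: width-independent LR.
        {"params": [model.fc_hidden.weight], "lr": eta},
        # Output weights: LR scales with 1/fan_in (= 1/width).
        {"params": [model.fc_out.weight], "lr": eta / width},
    ]


model = MuMLP(d_in=784, width=1024, d_out=10)
opt = torch.optim.SGD(mup_sgd_param_groups(model, eta=1e-3, width=1024))
```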
This implementation does not rely on "setting shapes" or on optimizer tricks, unlike others. There are also no tunable scaling hyperparameters.
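To illustrate the "no optimizer tricks" point: all width-dependence can live in ordinary optimizer parameter groups. Continuing the illustrative `MuMLP` sketch above, the Adam analogue (per the commonly cited rule that Adam LRs under muP are width-independent for input weights and biases, and scale as 1/fan_in for hidden and output weights) might look like:

```python
import torch

# Continues the illustrative MuMLP sketch above (not the repo's actual code).
def mup_adam_param_groups(model, eta: float, width: int):
    """Per-layer Adam learning rates under muP, again via plain
    parameter groups."""
    return [
        # Input weights and all biases: width-independent LR.
        {"params": [model.fc_in.weight, model.fc_in.bias,
                    model.fc_hidden.bias], "lr": eta},
        # Hidden and output weights: LR scales with 1/fan_in (= 1/width).
        {"params": [model.fc_hidden.weight], "lr": eta / width},
        {"params": [model.fc_out.weight], "lr": eta / width},
    ]


opt = torch.optim.Adam(mup_adam_param_groups(model, eta=1e-3, width=1024))
```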
Running `run_mlp.sh` will reproduce the results below. The training script will automatically run on all GPUs with more than 16 GB of memory and less than 50% utilization; feel free to change this to match your GPUs.
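For reference, that kind of GPU filtering could be implemented along the following lines; this is an illustrative sketch only (the script's actual mechanism may differ), querying memory and utilization through `nvidia-smi`:

```python
import subprocess


def free_gpus(min_mem_mib: int = 16384, max_util_pct: int = 50) -> list[int]:
    """Return indices of GPUs with >16 GB total memory and <50% utilization
    (illustrative; not necessarily how run_mlp.sh selects GPUs)."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    gpus = []
    for line in out.strip().splitlines():
        idx, mem, util = (int(v.strip()) for v in line.split(","))
        if mem > min_mem_mib and util < max_util_pct:
            gpus.append(idx)
    return gpus


print(free_gpus())  # e.g. [0, 2]; could be passed via CUDA_VISIBLE_DEVICES
```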
Special thanks to dvruette for help with deciphering the notation and for discussions on debugging.