
Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers



Metamon enables reinforcement learning (RL) research in Competitive Pokémon (as played on Pokémon Showdown) by providing:

  • A large dataset of RL trajectories "reconstructed" from real human battles.
  • Starting points for training imitation learning (IL) and offline RL policies.
  • A standardized suite of teams and opponents for evaluation.

Currently, it is focused on the first four generations of Pokémon, which have the longest battles and reveal the least information about the opponent's team.

Metamon is the codebase behind "Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers". Please check out our project website for an overview of our results. This README documents the dataset, pretrained models, training, and evaluation details to help you get battling!

Figure 1

The public version of this repo is very much in beta :) Please come back soon for updates!




Installation

Metamon is written and tested for Ubuntu and Python 3.10+. We recommend creating a fresh virtual environment or conda environment:

conda create -n metamon python==3.10
conda activate metamon

Then, install with:

git clone git@github.com:UT-Austin-RPL/metamon.git
cd metamon
pip install -e .

To install Pokémon Showdown (PS), you will need a modern version of npm / Node.js. It's likely you already have this (check that npm -v is > 10.0), but if not, you can find instructions here. This repo comes packaged with the specific commit that we used during the project (though newer versions should be fine!).

cd server/pokemon-showdown
npm install

Then, we will start a local PS server to handle our battle traffic. The server settings are determined by a configuration file which we'll copy from the provided example (server/config.js):

cp ../config.js config/

The main setting in this config.js file worth knowing about is export.num_workers, which sets how many worker processes are available to handle concurrent battles.

You will need to have the PS server running in the background while using Metamon:

# recommended: `screen`
node pokemon-showdown start --no-security # no-security removes the account login of the public website
# Press Ctrl+A+D to detach from the screen

You should see a status message printed for each worker.
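
If you want a quick programmatic check that the server is up (beyond watching for the worker status messages), a minimal sketch like the one below works; it assumes the default local port of 8000, so adjust it if your config.js uses something else:

import socket

# Minimal reachability check for the local Pokémon Showdown server.
# Port 8000 is an assumption (the default); change it if config.js overrides it.
with socket.create_connection(("localhost", 8000), timeout=5):
    print("Showdown server is accepting connections on localhost:8000")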

poke-env is a Python interface for interacting with the JavaScript PS server. Metamon relies on a custom (and now quite out-of-sync) fork for various early-gen fixes, which should install as part of the metamon package. If you run into issues, the repo is here:

# does not need to be in the same directory as pokemon-showdown
git clone git@github.com:jakegrigsby/poke-env.git
cd poke-env
pip install -e .

You can verify that installation has gone smoothly with:

python -m metamon.env

This will run a few test battles on your local server and print a progress bar to the terminal.
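
For a lower-level sanity check of the poke-env / server connection itself, a minimal script like the following also works. It is written against the upstream poke-env API (RandomPlayer, battle_against), so the bundled fork may differ slightly, and the format string is just an example:

import asyncio

from poke_env.player import RandomPlayer

async def main():
    # Two built-in random players battling each other on the local server.
    # "gen1randombattle" is an example format; anything your server supports works.
    p1 = RandomPlayer(battle_format="gen1randombattle")
    p2 = RandomPlayer(battle_format="gen1randombattle")
    await p1.battle_against(p2, n_battles=2)
    print(f"p1 won {p1.n_won_battles} of 2 battles")

asyncio.run(main())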


Battle Datasets

PS creates "replays" of battles that players can choose to upload to the website before they expire. We gathered all surviving historical replays for Generations 1-4 Ubers, OverUsed, UnderUsed, and NeverUsed, and now save active battles before expiration to accelerate dataset growth.

PS replays are saved from the point-of-view of a spectator rather than the point-of-view of a player. We unlock the replay dataset for RL by "reconstructing" the point-of-view of each player.

Dataset Overview

Datasets are stored on huggingface in two formats:

Name | Battles | Description
metamon-parsed-replays | 1.05M | Real Showdown battles only! Provides the dataset in its most portable form (fresh from the replay parser). Observations are dicts of text and numerical features. These datasets have missing actions (action = -1) where the player's choice is not revealed to spectators. Includes ~100k more trajectories than were used by most experiments in the paper (because more human battles have been played!).
metamon-synthetic | 5M | Parsed replays + self-play data converted to the format expected by the RL trainer, though they can still be used anywhere with a little pre-processing. Text is stored as tokenized ints based on all the words that appear in the parsed replays. Missing actions have been filled by an IL model. This is the final version used in the paper.

See python -m metamon.data.replay_dataset.download -h to download and extract the parsed replay dataset.
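
If you would rather pull the parsed replays directly with huggingface_hub than use the bundled download script, a sketch like the one below should work. The repo id is an assumption based on the dataset name above, so verify it on huggingface first:

from huggingface_hub import snapshot_download

# Download the parsed-replay dataset to the local huggingface cache.
# The repo_id is a guess built from the dataset name in this README; verify it.
local_dir = snapshot_download(
    repo_id="jakegrigsby/metamon-parsed-replays",
    repo_type="dataset",
)
print(f"dataset downloaded to {local_dir}")

However you download it, remember that the parsed replays mark hidden player choices with action = -1, so your pre-processing should mask or fill those steps (the synthetic dataset has already filled them with an IL model).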


Pretrained Models

We have made every checkpoint of 18 models available on huggingface at jakegrigsby/metamon.

Load and run pretrained models with metamon.rl.eval_pretrained. For example:

python -m metamon.rl.eval_pretrained --agent SyntheticRLV2 --gens 1 --formats ou --n_challenges 50 --eval_type heuristic

This will run the default checkpoint of the best model for 50 battles against a set of heuristic baselines.

Here is an overview. Some model sizes have several variants testing different RL objectives. See metamon/rl/eval_pretrained.py for a complete list.

Model Name (--agent) | Description
SmallIL (2 variants) | 15M imitation learning model trained on 1M human battles
SmallRL (5 variants) | 15M actor-critic model trained on 1M human battles
MediumIL | 50M imitation learning model trained on 1M human battles
MediumRL (3 variants) | 50M actor-critic model trained on 1M human battles
LargeIL | 200M imitation learning model trained on 1M human battles
LargeRL | 200M actor-critic model trained on 1M human battles
SyntheticRLV0 | 200M actor-critic model trained on 1M human + 1M diverse self-play battles
SyntheticRLV1 | 200M actor-critic model trained on 1M human + 2M diverse self-play battles
SyntheticRLV1_SelfPlay | SyntheticRLV1 fine-tuned on 2M extra battles against itself
SyntheticRLV1_PlusPlus | SyntheticRLV1 fine-tuned on 2M extra battles against diverse opponents
SyntheticRLV2 | Final 200M actor-critic model with value classification trained on 1M human + 4M diverse self-play battles
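
eval_pretrained downloads checkpoints for you, but if you want to benchmark several of the agents above back-to-back, a small driver script like this works; it only reuses the CLI flags shown in the example command above:

import subprocess

# Evaluate a few pretrained agents in sequence using the CLI documented above.
# Agent names come from the table in this README; flags mirror the example command.
for agent in ["SmallRL", "MediumRL", "SyntheticRLV2"]:
    subprocess.run(
        ["python", "-m", "metamon.rl.eval_pretrained",
         "--agent", agent,
         "--gens", "1",
         "--formats", "ou",
         "--n_challenges", "10",
         "--eval_type", "heuristic"],
        check=True,
    )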


Training

Use --help for documentation of each script.

metamon/il/ is a basic imitation learning pipeline that might be a useful template / starting point for playing with model architectures and new datasets. This code is a remnant of an early version of the project; the final paper only uses it for a few "BC-RNN" baselines that were mostly pushed to the Appendix.

python -m metamon.il.train

metamon/rl/ connects Metamon to amago, which powers the main IL and RL experiments in the paper.

python -m metamon.rl.offline_from_config is the main training script. Before training begins, we need to assemble an offline dataset from the parsed replays released above and self-play trajectories. More on this soon.



Baselines

baselines/ contains baseline opponents that we can battle against via poke-env. baselines/heuristics provides more than a dozen heuristic opponents and starter code for developing new ones (or mixing ground-truth Pokémon knowledge into ML agents). baselines/model_based ties the simple IL model checkpoints to poke-env (with CPU inference).

Compare baselines with:

python -m metamon.compete --task_dist Gen1OU --player GymLeader --opponent RandomBaseline --tasks 10
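
If you want to prototype a new opponent, the starter code in baselines/heuristics is the place to begin. As a rough illustration of the underlying poke-env interface those baselines build on (the metamon-specific base classes may differ), a simple max-base-power player looks roughly like this:

import asyncio

from poke_env.player import Player, RandomPlayer

class MaxBasePowerPlayer(Player):
    # Picks the available move with the highest base power, otherwise falls back
    # to a random legal choice (which may be a switch).
    def choose_move(self, battle):
        if battle.available_moves:
            best = max(battle.available_moves, key=lambda m: m.base_power)
            return self.create_order(best)
        return self.choose_random_move(battle)

async def main():
    me = MaxBasePowerPlayer(battle_format="gen1randombattle")
    opponent = RandomPlayer(battle_format="gen1randombattle")
    await me.battle_against(opponent, n_battles=5)
    print(f"won {me.n_won_battles}/5 battles")

asyncio.run(main())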


Data

data/teams: contains sets of Pokémon teams scraped from forum discussions or procedurally generated from real replays and/or usage statistics.

data/tokenizer: standardizes the conversion between text observations and token ids.

data/replay_dataset: includes all the behind-the-scenes logic that creates the replay dataset on huggingface.




Citation

@misc{grigsby2025metamon,
      title={Human-Level Competitive Pok\'emon via Scalable Offline Reinforcement Learning with Transformers}, 
      author={Jake Grigsby and Yuqi Xie and Justin Sasek and Steven Zheng and Yuke Zhu},
      year={2025},
      eprint={2504.04395},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.04395}, 
}
