Pi (π) Incubator is a unified RL training framework designed with RLOps in mind, giving researchers a modular, scalable foundation for training reinforcement learning (RL) policies. It integrates tools such as TorchRL, Metaflow, and Ray to support both local development and cloud-based experiments (e.g., AWS EKS). This repository includes training scripts, configuration files, and utilities for orchestrating distributed RL experiments at scale, providing frictionless, scalable, and user-friendly infrastructure that accelerates research development cycles.
Core Components
- Sim Env Zoo: Collection of simulated training environments from various high-fidelity simulators (MuJoCo, NVIDIA Isaac, etc.)
- Sim Model Zoo: Reusable simulation/dynamics models and configurations from the shared simulators for standardized scenarios
- NN Model Zoo: Neural architectures tailored for RL (CNNs, RNNs, Transformers, Mamba)
- Policy Algorithm Zoo: Range of RL algorithms (PPO, SAC, Dreamer, GRPO, etc.)
- Leverages PyTorch and TorchRL for training with GPU acceleration
- Parallel rollout workers collecting experiences from high fidelity simulators
- Dynamically orchestrates resource allocation and training workflow using Ray and Ray Tune
- Stores large-scale offline and real-world gathered experience data
- Facilitates off-policy training, data re-sampling, and memory-based RL
- Defines, executes, and tracks end-to-end training workflows (data prep, training, evaluation, deployment)
- Manages job submission to Kubernetes for auto-scaling
- Custom inference pipelines for batch evaluation with user-defined metrics
- Integrates with Ray Tune for hyperparameter tuning, metric aggregation, and pushing results to MLflow
- Automated alerts and early stopping to halt underperforming experiments or trigger adjustments based on defined performance metrics
- Stores artifacts (checkpoints, logs, metrics) for reproducibility and comparison
- Provides a dashboard for historical experiment insights
- Distills large policies into smaller, efficient models for real-world deployment
- Converts PyTorch RL policies into ONNX format for edge deployment on embedded devices (NVIDIA Jetson, etc.); see the export sketch below
- Training optimization methods such as Population-Based Training (PBT), auto-curriculum, Domain Randomization (DR), Unsupervised Environment Design (UED), and self-play
- User-defined experiment configurations (YAML/JSON) covering the training workflow, environment parameters, hyperparameters, compute resources, deployment settings, etc. (see the sketch below)
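For illustration, a baseline experiment configuration might look like the following sketch; the keys are hypothetical, and the actual schema used by the training scripts may differ:

```python
# Hypothetical experiment config sketch; the real schema used by train_ppo.py
# and torchrl_ray_train.py may differ.
import yaml  # requires PyYAML

config = {
    "experiment_name": "ppo_baseline",
    "environment": {"name": "HalfCheetah-v4", "num_envs": 8},
    "algorithm": {"name": "PPO", "lr": 3e-4, "clip_epsilon": 0.2, "gamma": 0.99},
    "compute": {"num_gpus": 1, "rollout_workers": 4},
    "deployment": {"export_onnx": True},
}

with open("configs/experiment_config_example.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```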
Together, TorchRL, Ray, and Metaflow provide an end-to-end RLOps solution, from flexible algorithm design to scalable, reproducible production experiments.
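As a concrete illustration of the ONNX edge-deployment item above, the usual PyTorch-to-ONNX path looks roughly like this; the small MLP is a stand-in for the repo's actual policy network:

```python
# Sketch only: export a toy policy to ONNX and sanity-check it with onnxruntime.
import torch
import torch.nn as nn
import onnxruntime as ort

policy = nn.Sequential(nn.Linear(17, 64), nn.Tanh(), nn.Linear(64, 6))  # placeholder MLP
dummy_obs = torch.randn(1, 17)

torch.onnx.export(
    policy,
    dummy_obs,
    "policy.onnx",
    input_names=["observation"],
    output_names=["action"],
    dynamic_axes={"observation": {0: "batch"}, "action": {0: "batch"}},
)

# Run the exported graph on CPU before shipping it to a Jetson.
session = ort.InferenceSession("policy.onnx", providers=["CPUExecutionProvider"])
action = session.run(None, {"observation": dummy_obs.numpy()})[0]
print(action.shape)  # (1, 6)
```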
- PyTorch-first: Leverages PyTorch's flexibility for custom RL algorithms.
- Modularity: Allows rapid prototyping with composable network components.
- Scalable Rollouts: Distributes high-fidelity simulation data collection seamlessly.
- Automated Tuning: Integrates hyperparameter search with third-party logging (W&B, MLflow).
- Efficient Scaling: Transitions effortlessly from local experiments to multi-node clusters.
- Orchestration: Simplifies complex workflow management and experiment tracking.
- Kubernetes Ready: Automates containerized training jobs for on-prem or cloud deployments.
Create and configure your Conda environment with the following commands:
conda create --name rlops_env python=3.11
conda activate rlops_env
conda install -y -c conda-forge glew
conda install -y -c anaconda mesa-libgl-cos6-x86_64
conda install -y -c menpo glfw3
conda env config vars set MUJOCO_GL=egl PYOPENGL_PLATFORM=egl
conda deactivate && conda activate rlops_env
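As an optional sanity check (not part of the official setup), confirm the variables are visible in the re-activated environment:

```python
import os

# Both should print "egl" after re-activating rlops_env.
print("MUJOCO_GL =", os.environ.get("MUJOCO_GL"))
print("PYOPENGL_PLATFORM =", os.environ.get("PYOPENGL_PLATFORM"))
```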
For additional functionality, install these optional packages:
pip install ray[default]==2.8.1
pip install onnx-pytorch
pip install onnxruntime
pip install metaflow-ray
To install pi_incubator:
pip install -e .
To run a baseline training locally, use one of the following commands:
python train_ppo.py --config configs/experiment_config_baseline.yaml
python torchrl_ray_train.py --config configs/experiment_config_baseline.yaml
For distributed training using torchrun, execute:
torchrun --nnodes=1 --nproc-per-node=1 --max-restarts=1 --rdzv-id=1 --rdzv-backend=c10d --rdzv-endpoint=localhost:0 train_ppo.py --config configs/experiment_config_baseline.yaml
Or to run a tuned configuration:
torchrun --nnodes=1 --nproc-per-node=1 --max-restarts=1 --rdzv-id=1 --rdzv-backend=c10d --rdzv-endpoint=localhost:0 train_ppo.py --config configs/experiment_config_tuned_PPO.yaml
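Under torchrun, each process discovers its rank through standard environment variables. The sketch below shows the usual pattern; it is an assumption about how a script like train_ppo.py might pick them up, not a verbatim excerpt:

```python
# Sketch of torchrun-style rank discovery; train_ppo.py's actual setup may differ.
import os
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun per process
world_size = int(os.environ["WORLD_SIZE"])   # total processes across all nodes

dist.init_process_group(backend="gloo")      # use "nccl" when each rank owns a GPU
print(f"rank {dist.get_rank()} of {world_size}, local_rank {local_rank}")
dist.destroy_process_group()
```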
- Deploying to AWS with Kubernetes:
As of now, please go to the terraform-aws-metaflow directory, then follow the official Outerbounds instructions to deploy these services in your AWS account: an AWS EKS cluster, Amazon S3, AWS Fargate, and Relational Database Service (RDS).
- AWS EKS:
Update your kubeconfig for your EKS cluster:
aws eks update-kubeconfig --name <cluster name>
- AWS API Gateway:
Retrieve your API key value:
aws apigateway get-api-key --api-key <YOUR_KEY_ID_FROM_CFN> --include-value | grep value
- Metaflow AWS Configuration:
Configure Metaflow to use AWS:
metaflow configure aws
Run your training flows on Kubernetes with Metaflow:
python torch_flow.py --no-pylint --environment=conda run --with kubernetes
Other experimental Metaflow runs:
python ppo_metaflow.py run
python ppo_metaflow.py run --with kubernetes
python ray_flow.py --no-pylint --environment=pypi run --with kubernetes
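For reference, these flows follow Metaflow's standard FlowSpec pattern. The skeleton below is illustrative only (not the contents of torch_flow.py or ppo_metaflow.py); a flow like it can be run locally with `python toy_train_flow.py run` or on the cluster with `run --with kubernetes`:

```python
# Illustrative Metaflow skeleton; the real flows in this repo differ.
from metaflow import FlowSpec, step

class ToyTrainFlow(FlowSpec):
    @step
    def start(self):
        self.config_path = "configs/experiment_config_baseline.yaml"
        self.next(self.train)

    @step
    def train(self):
        # Placeholder for the actual PPO training call.
        print(f"training with {self.config_path}")
        self.next(self.end)

    @step
    def end(self):
        print("done")

if __name__ == "__main__":
    ToyTrainFlow()
```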
If you're using Argo Workflows, you can track the running processes via port forwarding:
kubectl port-forward -n argo service/argo-argo-workflows-server 2746:2746
Investigating why collector kwargs are not being passed to the designated collector from RayCollector. Need to understand how the arguments are being passed. Will reach out to the TorchRL team.
python ray_collector_train.py
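The call under investigation looks roughly like the sketch below; the RayCollector signature follows the TorchRL docs and may differ by version, and the collector_kwargs forwarding is exactly the behavior in question:

```python
# Sketch of the RayCollector usage under investigation; approximate, not verbatim.
import torch.nn as nn
from tensordict.nn import TensorDictModule
from torchrl.collectors import SyncDataCollector
from torchrl.collectors.distributed import RayCollector
from torchrl.envs.libs.gym import GymEnv

env_maker = lambda: GymEnv("Pendulum-v1")
policy = TensorDictModule(nn.Linear(3, 1), in_keys=["observation"], out_keys=["action"])

collector = RayCollector(
    [env_maker],
    policy,
    frames_per_batch=200,
    total_frames=1000,
    collector_class=SyncDataCollector,
    # Expected to reach the per-worker SyncDataCollector; observed not to arrive.
    collector_kwargs={"max_frames_per_traj": 50},
)
for batch in collector:
    print(batch["next", "reward"].mean())
    break
collector.shutdown()
```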
Currently not able to train with RayCollector and the TorchRL trainer together. However, after applying the following fixes to the official ray_train.py example, it works without the TorchRL trainer.
RuntimeError: Setting 'advantage' via the constructor is deprecated, use .set_keys(<key>='some_key') instead.
After fixing the outdated ClipPPOLoss usage, the next error is:
RuntimeError: The distribution TanhNormal has not analytical mode. Use ExplorationMode.DETERMINISTIC to get a deterministic sample from it.
Fixed by changing set_exploration_type from ExplorationType.MODE to ExplorationType.DETERMINISTIC.
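A sketch of both fixes against the current TorchRL API, using a toy actor/critic pair in place of the modules built in ray_train.py:

```python
# Sketch of the two fixes noted above, on a toy actor/critic pair; the real
# modules in ray_train.py are built differently.
import torch
import torch.nn as nn
from tensordict import TensorDict
from tensordict.nn import NormalParamExtractor, TensorDictModule
from torchrl.envs.utils import ExplorationType, set_exploration_type
from torchrl.modules import ProbabilisticActor, TanhNormal, ValueOperator
from torchrl.objectives import ClipPPOLoss

obs_dim, act_dim = 3, 1
policy = ProbabilisticActor(
    module=TensorDictModule(
        nn.Sequential(nn.Linear(obs_dim, 2 * act_dim), NormalParamExtractor()),
        in_keys=["observation"],
        out_keys=["loc", "scale"],
    ),
    in_keys=["loc", "scale"],
    distribution_class=TanhNormal,
    return_log_prob=True,
)
critic = ValueOperator(nn.Linear(obs_dim, 1), in_keys=["observation"])

# Fix 1: don't pass 'advantage' through the ClipPPOLoss constructor; use set_keys.
loss_module = ClipPPOLoss(actor_network=policy, critic_network=critic)
loss_module.set_keys(advantage="advantage")

# Fix 2: TanhNormal has no analytical mode, so request a deterministic sample.
td = TensorDict({"observation": torch.randn(4, obs_dim)}, batch_size=[4])
with set_exploration_type(ExplorationType.DETERMINISTIC):
    policy(td)
print(td["action"].shape)  # torch.Size([4, 1])
```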
- Experiment with Ray Tune on the TorchRL Trainer.
- Experiment with Ray Collector with TorchRL Trainer.
- Investigate how collector kwargs from Ray Collector are passed into SyncDataCollector.
- Investigate the Isaac Gym wrapper.
- Investigate self-play environment setup.
- Support local Kubernetes with S3Mock.
- Set up training dependencies with a Docker registry.
- Experiment with Ray Collector with Ray Tune + TorchRL Trainer.
- Use W&B logger for improved experiment tracking.
- Run Metaflow training comparisons on TunedAdam.
This project is licensed under the MIT License - see the LICENSE file for details.