
RLOps Policies Pi(π) Incubator

Pi(π) Incubator is a unified reinforcement learning (RL) training framework designed with RLOps in mind, giving researchers a modular, scalable platform for training RL policies. It integrates tools such as TorchRL, Metaflow, and Ray to support both local development and cloud-based experiments (e.g., AWS EKS). This repository includes training scripts, configuration files, and utilities for orchestrating distributed RL experiments at scale, providing frictionless, scalable, and user-friendly infrastructure that accelerates research development cycles.

High-Level User Workflow Diagram


High-Level Architecture Diagram

Core Components

External RL Open-Source Repos

  • Sim Env Zoo: Collection of simulated training environments from various high-fidelity simulators (MuJoCo, NVIDIA Isaac, etc.)
  • Sim Model Zoo: Reusable simulation/dynamics models and configurations from the shared simulators for standardized scenarios
  • NN Model Zoo: Neural architectures tailored for RL (CNNs, RNNs, Transformers, Mamba)
  • Policy Algorithm Zoo: Range of RL algorithms (PPO, SAC, Dreamer, GRPO, etc.)

Distributed Trainer (TorchRL)

  • Leverages PyTorch and TorchRL for training with GPU acceleration

Distributed Training Data Collection (Ray)

  • Parallel rollout workers collect experiences from high-fidelity simulators
  • Dynamically orchestrates resource allocation and the training workflow using Ray and Ray Tune (see the sketch below)
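
A minimal sketch of the fan-out pattern behind this component, using plain Ray actors in place of the repo's actual collector; the EnvWorker class, environment id, and random placeholder policy are illustrative assumptions.

import ray
import gymnasium as gym

ray.init()

@ray.remote
class EnvWorker:
    def __init__(self, env_id: str):
        self.env = gym.make(env_id)

    def rollout(self, steps: int):
        obs, _ = self.env.reset()
        transitions = []
        for _ in range(steps):
            action = self.env.action_space.sample()  # stand-in for a learned policy
            next_obs, reward, terminated, truncated, _ = self.env.step(action)
            transitions.append((obs, action, reward, next_obs))
            obs = self.env.reset()[0] if (terminated or truncated) else next_obs
        return transitions

# Fan out four workers and gather their experience batches in parallel.
workers = [EnvWorker.remote("Pendulum-v1") for _ in range(4)]
batches = ray.get([w.rollout.remote(200) for w in workers])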

Distributed Replay Buffer (Ray + VectorDB)

  • Stores large-scale offline and real-world experience data
  • Facilitates off-policy training, data re-sampling, and memory-based RL (a single-node sketch follows)
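
As a rough single-node illustration of the idea (the distributed Ray + VectorDB variant is beyond a snippet), TorchRL's ReplayBuffer already supports the store/re-sample cycle; the tensor shapes below are illustrative.

import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, ReplayBuffer

# Buffer backed by a lazily allocated tensor storage.
buffer = ReplayBuffer(storage=LazyTensorStorage(max_size=100_000))

# A fake batch of 32 collected transitions (shapes are illustrative).
batch = TensorDict(
    {"observation": torch.randn(32, 3), "reward": torch.randn(32, 1)},
    batch_size=[32],
)
buffer.extend(batch)        # store gathered experience
sample = buffer.sample(16)  # re-sample for off-policy updates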

Experiment Orchestrator (Metaflow)

  • Defines, executes, and tracks end-to-end training workflows (data prep, training, evaluation, deployment)
  • Manages job submission to Kubernetes for auto-scaling (a minimal flow sketch follows)
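
A minimal sketch of such a flow, assuming the standard Metaflow API; the step bodies and config path are placeholders, not the repo's actual pipeline.

from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):
    @step
    def start(self):
        # Data prep / config loading would happen here.
        self.config_path = "configs/experiment_config_baseline.yaml"
        self.next(self.train)

    @step
    def train(self):
        # Launch the trainer here; add @kubernetes to run this step on EKS.
        self.next(self.evaluate)

    @step
    def evaluate(self):
        # Batch evaluation and metric logging would happen here.
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    TrainFlow()

A flow like this would be launched the same way as the runs shown later, e.g. python train_flow.py run --with kubernetes.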

Inference & Evaluation (Custom with Metaflow + Ray Tune + MLflow)

  • Custom inference pipelines for batch evaluation with custom metrics
  • Integrates with Ray Tune for hyperparameter tuning and metric aggregation, pushing results to MLflow
  • Automated alerts and early stopping automatically halt underperforming experiments or trigger adjustments based on defined performance metrics (see the stopper sketch below)
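
A hedged sketch of the early-stopping piece using Ray Tune's built-in TrialPlateauStopper; the objective function and metric name are illustrative assumptions.

from ray import train, tune
from ray.train import RunConfig
from ray.tune.stopper import TrialPlateauStopper

def objective(config):
    # Stand-in for an evaluation loop reporting a custom metric.
    for step in range(100):
        train.report({"mean_reward": config["lr"] * step})

tuner = tune.Tuner(
    objective,
    param_space={"lr": tune.grid_search([1e-4, 3e-4])},
    # Stop trials whose mean_reward has plateaued.
    run_config=RunConfig(stop=TrialPlateauStopper(metric="mean_reward")),
)
tuner.fit()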

Experiment Tracking with Observability (MLflow)

  • Stores artifacts (checkpoints, logs, metrics) for reproducibility and comparison
  • Provides a dashboard for historical experiment insights (a logging sketch follows)
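
For illustration, a minimal logging sketch against the standard MLflow API; the experiment name, parameters, and metric values are made up.

import mlflow

mlflow.set_experiment("pi_incubator_baseline")  # hypothetical experiment name
with mlflow.start_run():
    mlflow.log_params({"algo": "PPO", "lr": 3e-4})
    for step in range(10):
        mlflow.log_metric("mean_reward", 1.5 * step, step=step)
    # Artifacts such as checkpoints or configs are attached to the run.
    mlflow.log_artifact("configs/experiment_config_baseline.yaml")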

Optional Policy Distillation Process

  • Distills large policies into smaller, efficient models for real-world deployment (sketched below)
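
A hedged sketch of the idea: a small student network regresses onto a frozen teacher's outputs. Network sizes and the synthetic observations are illustrative.

import torch
from torch import nn

teacher = nn.Sequential(nn.Linear(3, 256), nn.Tanh(), nn.Linear(256, 1)).eval()
student = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(1000):
    obs = torch.randn(64, 3)      # stand-in for replayed observations
    with torch.no_grad():
        target = teacher(obs)     # teacher actions as regression targets
    loss = nn.functional.mse_loss(student(obs), target)
    opt.zero_grad()
    loss.backward()
    opt.step()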

Policy Exporter (Torch → ONNX)

  • Converts PyTorch RL policies into ONNX format for edge deployment on embedded devices (NVIDIA Jetson, etc.); see the export sketch below
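
A minimal sketch of the export path via torch.onnx.export; the policy network and input shape are illustrative.

import torch

# Stand-in for a trained policy network.
policy = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
)
dummy_obs = torch.randn(1, 3)
torch.onnx.export(
    policy, dummy_obs, "policy.onnx",
    input_names=["observation"], output_names=["action"],
    dynamic_axes={"observation": {0: "batch"}},  # variable batch size
)

# Sanity-check the exported model with onnxruntime (an optional dependency above).
import onnxruntime as ort
session = ort.InferenceSession("policy.onnx")
print(session.run(None, {"observation": dummy_obs.numpy()}))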

Shared RL Training Optimizers Repos

  • Training optimization methods such as Population-Based Training (PBT), auto-curriculum, Domain Randomization (DR), Unsupervised Environment Design (UED), and self-play (a PBT sketch follows this list)
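
As one example, PBT maps directly onto Ray Tune's PopulationBasedTraining scheduler; the trainable function and mutation range below are illustrative.

from ray import train, tune
from ray.tune.schedulers import PopulationBasedTraining

def trainable(config):
    # Stand-in training loop reporting a metric each iteration.
    for step in range(100):
        train.report({"mean_reward": config["lr"] * step})

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="mean_reward",
    mode="max",
    perturbation_interval=10,
    hyperparam_mutations={"lr": tune.loguniform(1e-5, 1e-2)},
)
tuner = tune.Tuner(
    trainable,
    param_space={"lr": 3e-4},
    tune_config=tune.TuneConfig(scheduler=pbt, num_samples=4),
)
tuner.fit()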

Custom Experiment Configurations

  • User-defined experiment configurations (YAML/JSON) covering the training workflow, environment parameters, hyperparameters, compute resources, deployment settings, etc. (a loading sketch follows)
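
A minimal sketch of how such a config might be consumed; the keys shown are illustrative, not the repo's actual schema.

import yaml

with open("configs/experiment_config_baseline.yaml") as f:
    cfg = yaml.safe_load(f)

# Hypothetical sections: environment parameters, hyperparameters, resources.
env_name = cfg.get("env", {}).get("name", "Pendulum-v1")
lr = cfg.get("train", {}).get("lr", 3e-4)
num_workers = cfg.get("resources", {}).get("num_workers", 1)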

Together, TorchRL, Ray, and Metaflow provide an end-to-end RLOps solution, from flexible algorithm design to scalable, reproducible production experiments.

TorchRL

  • PyTorch-first: Leverages PyTorch's flexibility for custom RL algorithms.
  • Modularity: Allows rapid prototyping with composable network components.

Ray & Ray Tune

  • Scalable Rollouts: Distributes high-fidelity simulation data collection seamlessly.
  • Automated Tuning: Integrates hyperparameter search with third-party logging (W&B, MLflow).
  • Efficient Scaling: Transitions effortlessly from local experiments to multi-node clusters.

Metaflow

  • Orchestration: Simplifies complex workflow management and experiment tracking.
  • Kubernetes Ready: Automates containerized training jobs for on-prem or cloud deployments.

Installation

Conda Environment Setup

Create and configure your Conda environment with the following commands:

conda create --name rlops_env python=3.11
conda activate rlops_env
conda install -y -c conda-forge glew
conda install -y -c anaconda mesa-libgl-cos6-x86_64
conda install -y -c menpo glfw3
conda env config vars set MUJOCO_GL=egl PYOPENGL_PLATFORM=egl
conda deactivate && conda activate rlops_env

Optional Dependencies

For additional functionality, install these optional packages:

pip install ray[default]==2.8.1
pip install onnx-pytorch
pip install onnxruntime
pip install metaflow-ray

Install

To install pi_incubator:

pip install -e .

Current Experimental Usage

Running Locally

To run a baseline training locally, use one of the following commands:

python train_ppo.py --config configs/experiment_config_baseline.yaml
python torchrl_ray_train.py --config configs/experiment_config_baseline.yaml

Running with torchrun

For distributed training using torchrun, execute:

torchrun --nnodes=1 --nproc-per-node=1 --max-restarts=1 --rdzv-id=1 --rdzv-backend=c10d --rdzv-endpoint=localhost:0 train_ppo.py --config configs/experiment_config_baseline.yaml

Or to run a tuned configuration:

torchrun --nnodes=1 --nproc-per-node=1 --max-restarts=1 --rdzv-id=1 --rdzv-backend=c10d --rdzv-endpoint=localhost:0 train_ppo.py --config configs/experiment_config_tuned_PPO.yaml

AWS & Kubernetes Configuration

  • Deploying to AWS with Kubernetes:

As of now, please go to the terraform-aws-metaflow directory, then follow the official Outerbounds instructions to deploy these services in your AWS account: an AWS EKS cluster, Amazon S3, AWS Fargate, and the Relational Database Service (RDS).

  • AWS EKS:

    Update your kubeconfig for your EKS cluster:

    aws eks update-kubeconfig --name <cluster name>
  • AWS API Gateway:

    Retrieve your API key value:

    aws apigateway get-api-key --api-key <YOUR_KEY_ID_FROM_CFN> --include-value | grep value
  • Metaflow AWS Configuration:

    Configure Metaflow to use AWS:

    metaflow configure aws

Metaflow Runs

Run your training flows on Kubernetes with Metaflow:

python torch_flow.py --no-pylint --environment=conda run --with kubernetes

Other experimental Metaflow runs:

python ppo_metaflow.py run
python ppo_metaflow.py run --with kubernetes
python ray_flow.py --no-pylint --environment=pypi run --with kubernetes

Tracking Running Processes

If you're using Argo Workflows, you can track the running processes via port forwarding:

kubectl port-forward -n argo service/argo-argo-workflows-server 2746:2746

Experimental Runs

Investigating why collector kwargs are not being passed to the designated collector from RayCollector. Need to understand how the arguments are being passed. Will reach out to the TorchRL team.

python ray_collector_train.py

Currently it is not possible to train with RayCollector and the TorchRL trainer together. However, after fixing the following issues in the official ray_train.py example, it works without the TorchRL trainer.

RuntimeError: Setting 'advantage' via the constructor is deprecated, use .set_keys(<key>='some_key') instead.

Upon fixing the outdated ClipPPOLoss usage, the next error is:

RuntimeError: The distribution TanhNormal has not analytical mode. Use ExplorationMode.DETERMINISTIC to get a deterministic sample from it.

The fix is to change set_exploration_type from ExplorationType.MODE to ExplorationType.DETERMINISTIC, as sketched below.
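
A hedged sketch of both fixes against the current TorchRL API; the toy actor/critic below are illustrative stand-ins, not the example's actual networks.

import torch
from torch import nn
from tensordict.nn import TensorDictModule
from torchrl.envs import GymEnv
from torchrl.envs.utils import ExplorationType, set_exploration_type
from torchrl.modules import (
    NormalParamExtractor, ProbabilisticActor, TanhNormal, ValueOperator,
)
from torchrl.objectives import ClipPPOLoss

env = GymEnv("Pendulum-v1")
actor_net = nn.Sequential(
    nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 2), NormalParamExtractor()
)
actor = ProbabilisticActor(
    TensorDictModule(actor_net, in_keys=["observation"], out_keys=["loc", "scale"]),
    in_keys=["loc", "scale"],
    distribution_class=TanhNormal,
)
critic = ValueOperator(
    nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1)),
    in_keys=["observation"],
)

# Fix 1: set the advantage key via set_keys(), not the constructor.
loss_module = ClipPPOLoss(actor_network=actor, critic_network=critic)
loss_module.set_keys(advantage="advantage")

# Fix 2: TanhNormal has no analytical mode, so use DETERMINISTIC sampling.
with set_exploration_type(ExplorationType.DETERMINISTIC):
    rollout = env.rollout(100, actor)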


TODOs

  • Experiment with Ray Tune on the TorchRL Trainer.
  • Experiment with Ray Collector with TorchRL Trainer.
  • Investigate how collector kwargs from Ray Collector are passed into SyncDataCollector
  • Investigate the Isaac Gym wrapper
  • Investigate self-play environment setup
  • Support local Kubernetes with S3Mock
  • Set up Training Dependencies with a Docker registry.
  • Experiment with Ray Collector with Ray Tune + TorchRL Trainer.
  • Use W&B logger for improved experiment tracking.
  • Run Metaflow training comparisons on TunedAdam.

License

This project is licensed under the MIT License - see the LICENSE file for details.
