Machine Learning Engineering Open Book
Updated Jun 9, 2025 (Python)
Slurm: A Highly Scalable Workload Manager
A DSL for data-driven computational pipelines
dstack is an open-source alternative to Kubernetes and Slurm, designed to simplify GPU allocation and AI workload orchestration for ML teams across top clouds, on-prem clusters, and accelerators.
A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
Best practices and guides for writing distributed PyTorch training code
A Slurm cluster using docker-compose
Lightweight fast function pipeline (DAG) creation in pure Python for scientific workflows 🕸️🧪
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.
A scheduler for GPU/CPU tasks
Simplify HPC and Batch workloads on Azure
An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads.
Prometheus exporter for performance metrics from Slurm.
Run Slurm in Kubernetes
Tools for computation on batch systems