Stars
AI
4 repositories
Ongoing research on training transformer models at scale
A high-throughput and memory-efficient inference and serving engine for LLMs
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
Python tool for converting files and office documents to Markdown.