Stars
A Datacenter Scale Distributed Inference Serving Framework
Transformers provides a simple, intuitive interface for Rust developers who want to work with Large Language Models locally, powered by the Candle crate. It offers an API inspired by Python's Trans…
Exploration work on executing CUDA kernels on Apple Silicon (Metal-compatible code).
The simplest, fastest repository for training/finetuning small-sized VLMs.
DFloat11: Lossless LLM Compression for Efficient GPU Inference
Official inference framework for 1-bit LLMs
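The core idea behind 1-bit LLM inference (as in BitNet b1.58) is that weights quantized to the ternary set {-1, 0, +1} turn matrix multiplies into additions and subtractions. A minimal sketch of that idea — my own illustration, not this framework's actual code or quantization recipe:

```python
# Illustrative only: ternary ("1.58-bit") weight quantization and a
# multiply-free dot product, the trick behind 1-bit LLM inference.

def quantize_ternary(weights, threshold=0.5):
    """Round each weight to -1, 0, or +1 relative to the mean magnitude."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    tern = [0 if abs(w) < threshold * scale else (1 if w > 0 else -1)
            for w in weights]
    return tern, scale

def ternary_dot(x, tern, scale):
    """Dot product with ternary weights: only additions and subtractions."""
    acc = 0.0
    for xi, wi in zip(x, tern):
        if wi == 1:
            acc += xi
        elif wi == -1:
            acc -= xi
    return acc * scale

w = [0.9, -0.8, 0.05, 0.7]
tern, scale = quantize_ternary(w)   # -> [1, -1, 0, 1], scale 0.6125
y = ternary_dot([1.0, 2.0, 3.0, 4.0], tern, scale)
```

The per-row `scale` factor is one simple way to recover magnitude after quantization; real implementations also quantize activations and pack the ternary values into bit arrays.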
Fast, Lightweight, Unified Engine for Text2Image Diffusion Models
Rust standalone inference for the Namo-500M series models. Extremely tiny; runs a VLM on CPU.

Model Context Protocol (MCP) implementation in Rust
A modular diffusion pipeline for synthesis of post-treatment glioma MR images.
DeepEP: an efficient expert-parallel communication library
Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
Open-source framework that builds customized & randomized JSON files for use in endpoint load testing.
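The pattern such a tool implements — generating many randomized JSON documents from a template so each load-test request carries a distinct payload — can be sketched in a few lines. This is a generic illustration, not the linked framework's API; the `/users`-style field names and the `build_payload` helper are my own:

```python
# Illustrative sketch: randomized JSON payloads for endpoint load testing.
# Seeding each payload makes a test run reproducible.
import json
import random
import string

def build_payload(seed=None):
    """Build one randomized JSON document (hypothetical user record)."""
    rng = random.Random(seed)
    return {
        "id": rng.randint(1, 10_000),
        "name": "".join(rng.choice(string.ascii_lowercase) for _ in range(8)),
        "active": rng.choice([True, False]),
        "score": round(rng.uniform(0.0, 1.0), 3),
    }

# Three distinct, reproducible request bodies.
payloads = [json.dumps(build_payload(seed=i)) for i in range(3)]
```

Seeding with the request index keeps payloads varied across requests but identical across test runs, which helps when comparing load-test results.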
Rust bindings to LLVM. (Mirror of https://gitlab.com/taricorp/llvm-sys.rs/)
Code that makes working with CUDA easier, via the cudarc crate.