Starred repositories
Python tool for converting files and office documents to Markdown.
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
DOM to Semantic-Markdown for use with LLMs
Curated list of useful LLM / Analytics / Datascience resources
An open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
A PyTorch native platform for training generative AI models
A deep dive into embeddings starting from fundamentals
Minimalistic large language model 3D-parallelism training
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
🦀 The one-person framework for Rust for side-projects and startups
SGLang is a fast serving framework for large language models and vision language models.
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Best practices & guides on how to write distributed pytorch training code
Minimalistic 4D-parallelism distributed training framework for educational purposes
A curriculum for learning about foundation models, from scratch to the frontier
Convert PDF to markdown + JSON quickly with high accuracy
Official repository for our work on micro-budget training of large-scale diffusion models.
Effective LLM Alignment Toolkit
Google Research
A Collection of BM25 Algorithms in Python
Guides, papers, lectures, notebooks and resources for prompt engineering
Generate any location from the real world in Minecraft Java Edition with a high level of detail.
Code for NeurIPS 2024 paper - The GAN is dead; long live the GAN! A Modern Baseline GAN - by Huang et al.