Stars
Quantized Attention achieves speedups of 2-5x over FlashAttention and 3-11x over xformers, without losing end-to-end metrics across language, image, and video models.
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali.
[PyTorch] Generative retrieval model using semantic IDs from "Recommender Systems with Generative Retrieval"
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
Minimalistic 4D-parallelism distributed training framework for educational purposes
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
An Open-Source Knowledgeable Large Language Model Framework.
Is ChatGPT Good at Search? LLMs as Re-Ranking Agents [EMNLP 2023 Outstanding Paper Award]
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
A method to increase the speed and lower the memory footprint of existing vision transformers.
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
Scalable training for dense retrieval models.
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
🦜🔗 Build context-aware reasoning applications
LLM training code for Databricks foundation models
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
Hackable and optimized Transformers building blocks, supporting a composable construction.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
An open-source implementation for training LLaVA-NeXT.
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).