Stars
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Hackable and optimized Transformers building blocks, supporting a composable construction.
YaRN: Efficient Context Window Extension of Large Language Models
An extremely fast Python package and project manager, written in Rust.
Lumina-T2X is a unified framework for Text to Any Modality Generation
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
A Python package to effortlessly assemble images in comparison figures. Supports LaTeX, PPTX, and HTML.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
[ICCV 2025] A simple training-free approach adapting DUSt3R for dynamic scenes.
🦜🔗 Build context-aware reasoning applications
LlamaIndex is the leading framework for building LLM-powered agents over your data.
Build resilient language agents as graphs.
This repository contains the Hugging Face Agents Course.
🤗 smolagents: a barebones library for agents that think in code.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Code for the project "MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos"
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
SGLang is a fast serving framework for large language models and vision language models.
Wan: Open and Advanced Large-Scale Video Generative Models
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
A suite of image and video neural tokenizers
An autoregressive character-level language model for making more things