Stars
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
🧑🚀 Summary of the world's best LLM resources (video generation, agents, coding assistance, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models)
Explore the Multimodal “Aha Moment” on a 2B Model
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.
MoBA: Mixture of Block Attention for Long-Context LLMs
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper
[CVPR 2025] Official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models"
Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
[CVPR 2025] Reasoning to Attend: Try to Understand How <SEG> Token Works (Rui Qian, Xin Yin, Dejing Dou)
Align Anything: Training All-modality Model with Feedback
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
⚡️SwanLab - an open-source, modern AI training tracking and visualization tool. Supports cloud and self-hosted use. Integrated with PyTorch / Transformers / LLaMA Factory / Swift / Ultralytics…
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
DeepSeek-VL: Towards Real-World Vision-Language Understanding
[IEEE TPAMI] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.