chenxn2020 / Starred · GitHub

[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Python 82 7 Updated Dec 10, 2024

🧑‍🚀 Summary of the world's best LLM resources (video generation, agents, AI-assisted programming, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).

5,359 526 Updated May 25, 2025

Explore the Multimodal “Aha Moment” on 2B Model

Python 588 20 Updated Mar 18, 2025

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Python 484 22 Updated Jan 13, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,781 107 Updated Apr 3, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 686 29 Updated Mar 19, 2025

Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper

Python 637 35 Updated May 16, 2025

[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".

Python 163 7 Updated May 26, 2025

Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models

Python 25 Updated May 21, 2025
1 Updated Dec 12, 2024

Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)

Python 32 Updated May 2, 2025
Python 13 Updated May 15, 2025

Align Anything: Training All-modality Model with Feedback

Jupyter Notebook 3,824 476 Updated May 28, 2025

[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Python 47 2 Updated Dec 13, 2024

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

Jupyter Notebook 608 75 Updated Jul 11, 2023

Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations

Python 81 2 Updated Jul 15, 2024

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions

Python 258 23 Updated Apr 14, 2024

⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / LLaMA Factory / Swift / Ultralytics…

Python 1,619 109 Updated May 31, 2025

my Ph.D. thesis (Zhejiang University)

TeX 36 10 Updated Apr 9, 2022

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python 7,859 667 Updated May 31, 2025

[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant

Python 238 12 Updated Aug 14, 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models

Python 13 2 Updated Sep 27, 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 3,849 569 Updated Apr 24, 2024

[IEEE TPAMI] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

Python 271 15 Updated May 30, 2025

[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models

Python 133 Updated Sep 12, 2024

[NeurIPS 2024] Visual Perception by Large Language Model’s Weights

Python 45 1 Updated Mar 31, 2025

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 8,069 497 Updated May 18, 2025

[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation

Python 466 44 Updated May 13, 2025

Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.

417 15 Updated Apr 18, 2024
Python 3,873 363 Updated May 24, 2025