USTC
Stars
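Use ChatGPT to summarize arXiv papers. Accelerate the entire research workflow: use ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses.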
This is the official code repository for the "Gradient-Guided Annealing for Domain Generalization" (CVPR 2025) paper.
[CVPR 2025] Towards Training-free Anomaly Detection with Vision and Language Foundation Models
Official implementation of the CVPR '25 highlight paper "Compositional Caching for Training-free Open-vocabulary Attribute Detection"
This repo aims to include materials (papers, codes, slides) about SAM2 (segment anything in images and videos). We are continuously improving the project. Welcome to PR the works (papers, repos) th…
The official code of our CVPR2025 paper: "Segment Any-Quality Images with Generative Latent Space Enhancement".
Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".
🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models"
Research papers and blogs to transition to AI Engineering
🐙 Guides, papers, lecture, notebooks and resources for prompt engineering
LMM solved catastrophic forgetting, AAAI2025
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Retrieval and Retrieval-augmented LLMs
Awesome Reasoning LLM Tutorial/Survey/Guide
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Image inpainting tool powered by SOTA AI models. Remove any unwanted object, defect, or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.
Official inference framework for 1-bit LLMs
[ICLR 2025 Spotlight] The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
[ICLR 2025] A Sanity Check for AI-generated Image Detection
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"
[CVPR 2025 Highlight] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding