Lists (22)
AI
Algorithm
BigQuery
๐
CLIP / VLM
Data Mining
Vision
Game Bot
Git
GNN
Personal Web Templates
NLP
nodesktop
JS, CSS
object-centric learning
Open Vocabulary
Scene Graph
Templates
Setup, dotfile
Part Segmentation
Hetero GNN / CL
Ubuntu
Wordle
wordle
Starred repositories
Train transformer language models with reinforcement learning.
Data release of Sci-PosterLayout
Cheng-Fu Yang*, Wan-Cyuan Fan*, Fu-En Yang, Yu-Chiang Frank Wang, "LayoutTransformer: Scene Layout Generation with Conceptual and Spatial Diversity", Proceedings of the IEEE/CVF Conference on Compu…
LangCode - Improving alignment and reasoning of large language models (LLMs) with natural language embedded program (NLEP).
Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025)
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides
Get your documents ready for gen AI
Collection of leaked system prompts
All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
All-in-One Development Tool based on PaddlePaddle
[CVPR 2025] Official inference code for the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation". Project page: https://bizgen-msra.github.io/
A Unified Toolkit for Deep Learning Based Document Image Analysis
The official repository for paper "MLLMs Need 3D-Aware Representation Supervision for Scene Understanding"
Open-source Multi-agent Poster Generation from Papers
EVE Series: Encoder-Free Vision-Language Models from BAAI
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Raise a pet with your GitHub activity / Have a pet in your GitHub
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Collection of AWESOME vision-language models for vision tasks
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
Code for CVPR 2025 paper: TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
[ICML 2025] A platform for developers to simulate collaborative research activities
[CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
Official Implementation for paper: Negative Token Merging: Image-based Adversarial Feature Guidance