Stars
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
[ICML 2025] Official repository for paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation"
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]
An open collection of implementation tips, tricks and resources for training large language models
From the Transistor to the Web Browser, a rough outline for a 12 week course
Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Minimal and annotated implementations of key ideas from modern deep learning research.
RooCodeInc / Roo-Code
Forked from cline/cline
Roo Code (prev. Roo Cline) gives you a whole dev team of AI agents in your code editor.
Solve Visual Understanding with Reinforced VLMs
A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)
Automatically fetch the titles of pasted links
Enhanced Quick Switcher plugin for Obsidian.md
UniDisc: A discrete diffusion model for joint multimodal generation, enabling controllable and efficient text-image synthesis, editing, and inpainting.
Official PyTorch Implementation of Opt-CWM: Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals.
A conference poster format with structure, content, creation, and presentation recommendations.
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models"
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
jmhessel / mmc4
Forked from allenai/mmc4
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
👻 Ghostty is a fast, feature-rich, and cross-platform terminal emulator that uses platform-native UI and GPU acceleration.