- https://yunlong10.github.io
- in/yunlong-yolo-tang
- @YunlongTang6
Stars
Code for paper "Towards Understanding Camera Motions in Any Video"
PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
This repository collects papers on VLLM applications; new papers are added irregularly.
🚀🚀🚀 A curated list of papers on controllable video generation.
An AI-powered interactive avatar engine using Live2D, LLM, ASR, TTS, and RVC. Ideal for VTubing, streaming, and virtual assistant applications.
MAGI-1: Autoregressive Video Generation at Scale
Lightweight coding agent that runs in your terminal
PyTorch implementation for the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
A programmer's guide to cooking at home (Simplified Chinese only).
😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"
A Unity MCP server that allows MCP clients like Claude Desktop or Cursor to perform Unity Editor actions.
[SIGGRAPH Asia 2024] Painting process generation using diffusion models
QwQ is the reasoning model series developed by the Qwen team at Alibaba Cloud.
Official repo for CAT-V - "Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal Prompting"
Official PyTorch implementation of One-Minute Video Generation with Test-Time Training
Solution of the NTIRE 2024 Challenge on Efficient Super-Resolution
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥 the first paper to explore R1 for video]
An otaku index for everything! ⭐ Star the project if you like it!