- Northeastern University (China)
- Shenyang, Liaoning, China
Starred repositories
[IROS'25 Oral] WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation
[CVPR 2025 Highlight] MonSter: Marry Monodepth to Stereo Unleashes Power
Code of the paper "NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning" (TPAMI 2025)
[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Convert PDF to markdown + JSON quickly with high accuracy
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Reading list for research topics in embodied vision
Ideas and thoughts about the fascinating Vision-and-Language Navigation
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning ca…
A fork to add multimodal model training to open-r1
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
R1-onevision, a visual language model capable of deep CoT reasoning.
From nobody to large language model (LLM) hero. Stay tuned for more!
WWW 2025 Multimodal Intent Recognition for Dialogue Systems Challenge
An MCP server for querying the technical documentation of mainstream agent frameworks (supports both stdio and SSE transport protocols); covers langchain, llama-index, autogen, agno, openai-agents-sdk, mcp-doc, camel-ai, and crew-ai
A lightweight, powerful framework for multi-agent workflows
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.