Nankai University
Tianjin, China
https://www.nankai.edu.cn/
Stars
This repository provides a valuable reference for researchers in the field of multimodality. Start exploring RL-based reasoning MLLMs here!
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
[CVPR2025] Rethinking Query-based Transformer for Continual Image Segmentation
Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
Lumina Robotics Talent Call | Lumina Community Embodied AI Talent Board | A list for Embodied AI / Robotics Jobs (PhD, RA, intern, full-time, etc.)
Official repo of the paper: Drone Referring Localization: An Efficient Heterogeneous Spatial Feature Interaction Method For UAV Self-Localization
「TCSVT2021」A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization
「TIP2023」Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments
[AAAI2025] - Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
[CVPR 25 Highlight & ECCV Workshop 24 Best Paper] RoboTwin Dual-arm Robot Manipulation Simulation Platform
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Vision-and-Language Navigation in Continuous Environments using Habitat
Official implementation of "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
🚀 One-stop solution for creating your digital avatar from chat history 💡 Fine-tune LLMs with your chat logs to capture your unique style, then bind to a chatbot to bring your digital self to life. …
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Official implementation for "HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard".
Awesome Reasoning LLM Tutorial/Survey/Guide
[AAAI 2025] The official repository of our paper "GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation"
Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
MAGI-1: Autoregressive Video Generation at Scale