- MAIS, CASIA @BraveGroup
- China
- Robertwyq.github.io
Stars
An open-source AI agent that brings the power of Gemini directly into your terminal.
We introduce CausalVQA, a benchmark dataset for video question answering (VQA) composed of question-answer pairs that probe models’ understanding of causality in the physical world.
PyTorch code and models for VJEPA2 self-supervised learning from video.
Drive-Pi0 and DriveMoE on End-to-end Autonomous Driving
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
MAGI-1: Autoregressive Video Generation at Scale
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
End-to-End Driving with Online Trajectory Evaluation via BEV World Model
MichalZawalski / embodied-CoT
Forked from openvla/openvla
Embodied Chain of Thought: A robotic policy that reasons to solve the task.
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
deepspeedai / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
verl: Volcano Engine Reinforcement Learning for LLMs
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Embodied Reasoning Question Answer (ERQA) Benchmark
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
Fully open reproduction of DeepSeek-R1
Official implementation of Diffusion Policy Policy Optimization, arxiv 2024
[CVPR 2025 Highlight] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos