Stars
Code for PerAct², a language-conditioned imitation learning agent designed for bimanual robotic manipulation using the RLBench environment. It includes dataset generation, training scripts, and evaluation.
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉
AudioLDM training, finetuning, evaluation and inference.
Code for data collection with PyBullet and LIBERO.
PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
PyTorch implementation of AudioLCM (ACM-MM'24): efficient, high-quality text-to-audio generation with a latent consistency model.
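AudioLCM's speedup comes from consistency sampling: rather than running hundreds of diffusion steps, a consistency model maps a noisy latent straight to a clean estimate in a handful of steps. A minimal sketch of that sampling loop, assuming a hypothetical `consistency_fn` network (this is not AudioLCM's actual API):

```python
import torch

def lcm_sample(consistency_fn, text_emb, latent_shape, sigmas=(80.0, 24.0, 5.0, 0.5)):
    """Few-step latent-consistency sampling (generic sketch, not AudioLCM's API).

    `consistency_fn(x_t, sigma, text_emb)` is a hypothetical network that maps a
    noisy latent at noise level `sigma` directly to an estimate of the clean latent.
    """
    x = torch.randn(latent_shape) * sigmas[0]                      # start from pure noise
    for i, sigma in enumerate(sigmas):
        x0 = consistency_fn(x, torch.tensor(sigma), text_emb)     # one-shot denoise
        if i + 1 < len(sigmas):
            x = x0 + torch.randn_like(x0) * sigmas[i + 1]         # re-noise to the next level
        else:
            x = x0
    return x  # clean latent; decode with the VAE / vocoder stage afterwards
```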
RLBench_ACT: Running ALOHA ACT and Diffusion Policy in the RLBench framework
GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
🔥🔥🔥 Focused on improving YOLO11, YOLOv8, YOLOv12, YOLOv10, RT-DETR, YOLOv7, and YOLOv5 models; supports improving the backbone, neck, head, loss, IoU, NMS, and other modules 🚀
An open-source LLM "cookbook" tailor-made for Chinese users: tutorials on quickly fine-tuning (full-parameter/LoRA) and deploying domestic and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment.
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video while generating speech in real time.
Online RL with Simple Reward Enables Training VLA Models with Only One Trajectory
Interactive Post-Training for Vision-Language-Action Models
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
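The core idea is CLIP-style contrastive alignment: an audio encoder and a text encoder share one embedding space, so zero-shot audio tagging reduces to cosine similarity. A minimal sketch with hypothetical encoder callables, not AudioCLIP's real interface:

```python
import torch
import torch.nn.functional as F

def zero_shot_audio_labels(audio_encoder, text_encoder, audio, label_prompts):
    """CLIP-style zero-shot audio tagging (sketch; encoder names are hypothetical).

    Both encoders map inputs into a shared embedding space; cosine similarity
    between the audio embedding and each text prompt gives per-label scores.
    """
    a = F.normalize(audio_encoder(audio), dim=-1)          # (1, d) audio embedding
    t = F.normalize(text_encoder(label_prompts), dim=-1)   # (n_labels, d) text embeddings
    logits = 100.0 * a @ t.T                               # temperature-scaled similarities
    return logits.softmax(dim=-1)                          # probability per label
```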
The official codebase of "Accommodating Audio Modality in CLIP for Multimodal Processing"
A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
verl: Volcano Engine Reinforcement Learning for LLMs
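At the heart of RLHF-style LLM training frameworks like verl is a token-level PPO clipped objective. A generic sketch of that loss, with illustrative tensor names rather than verl's internal API:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, mask, clip_eps=0.2):
    """Token-level PPO clipped policy loss, as used in RLHF-style LLM training.

    All tensors are (batch, seq_len); `mask` marks response tokens. This is a
    generic illustration of the objective, not verl's implementation.
    """
    ratio = torch.exp(logp_new - logp_old)                             # importance ratio per token
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    per_token = -torch.min(unclipped, clipped)                         # pessimistic (clipped) bound
    return (per_token * mask).sum() / mask.sum()                       # mean over response tokens
```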
Deep Reinforcement Learning for mobile robot navigation in the ROS2 Gazebo simulator. Using DRL (SAC, TD3) neural networks, a robot learns to navigate to a random goal point in a simulated environment.
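Once trained, such a goal-reaching policy is typically scored by success rate over rollouts. A minimal Gym-style evaluation loop, assuming hypothetical `env` and `policy` objects rather than this repo's actual classes:

```python
import numpy as np

def evaluate_policy(env, policy, episodes=10, max_steps=500):
    """Success-rate evaluation for a trained DRL (e.g. TD3/SAC) navigation policy.

    `env` is assumed to follow the classic Gym reset/step interface with the goal
    location embedded in the observation; names are illustrative only.
    """
    successes = 0
    for _ in range(episodes):
        obs, done = env.reset(), False
        for _ in range(max_steps):
            action = policy.select_action(np.asarray(obs))  # deterministic action at eval time
            obs, reward, done, info = env.step(action)
            if done:
                successes += info.get("goal_reached", reward > 0)
                break
    return successes / episodes
```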
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer