Stars
Operate Blender modeling via the MCP protocol, letting LLMs create 3D models directly and opening a new chapter in 3D modeling
This is an official implementation of "Video Swin Transformer".
[IJCV 2024] InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions
The official PyTorch implementation of L2CS-Net for gaze estimation and tracking
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
🚀🚀 "Large model": train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Collection of AWESOME vision-language models for vision tasks
High-resolution models for human tasks.
DORA (Dataflow-Oriented Robotic Architecture) is middleware designed to streamline and simplify the creation of AI-based robotic applications. It offers low latency, composable, and distributed dat…
[NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Chinese NLP solutions (large models, data, models, training, inference)
Implementation of RT1 (Robotic Transformer) in PyTorch
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
GRUtopia: Dream General Robots in a City at Scale
EVE Series: Encoder-Free Vision-Language Models from BAAI