Stars
Train transformer language models with reinforcement learning.
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
[CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
Collection of papers and resources on Multimodal Reasoning, including Vision-Language Models, Multimodal Chain-of-Thought, Visual Inference, and others.
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
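A repository for pretraining from scratch and then SFT of a small-parameter Chinese LLaMa2; it runs on a single 24G GPU and yields a chat-llama2 with basic Chinese question-answering ability.
Welcome to LLM-Dojo, an open-source place to learn about large models. It uses concise, readable code to build a model training framework (supporting mainstream models such as Qwen, Llama, and GLM), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩🎓👨🎓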
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Mental health large language model (LLM x Mental Health): pre- and post-training, dataset, evaluation, deployment, and RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / Llama / GLM series models
Awesome multi-modal large language model papers and projects, with collections of popular training strategies, e.g., PEFT, LoRA.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
RidgeSfM: Structure from Motion via robust pairwise matching under depth uncertainty
Dive into LLMs (动手学大模型): a series of hands-on programming tutorials
[ICASSP 2025] Diffusion Features to Bridge Domain Gap for Semantic Segmentation
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
[AAAI 2025] DepthFM: Fast Monocular Depth Estimation with Flow Matching
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
[NeurIPS 2024] Geometry-Aware Large Reconstruction Model for Efficient and High-Quality 3D Generation
[ECCV2024] Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
[CVPR-2024] PyTorch implementation of "Misalignment-Robust Frequency Distribution Loss for Image Transformation"
[NeurIPS 2024] MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing
[CVPR2024] VideoBooth: Diffusion-based Video Generation with Image Prompts
[CVPR 2025] StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
The official PyTorch implementation of the paper "Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation".
DeepSeek-VL: Towards Real-World Vision-Language Understanding
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization