Stars
A Model Context Protocol server for Excel file manipulation
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
Code for the book "Deep Reinforcement Learning: Principles and Practices" (《深度强化学习:原理与实践》)
For the second dissertation chapter, extension of ns-arch (ch. 1) + Unity integration
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Code for the ICASSP 2023 paper "MRML: Multimodal Rumor Detection by Deep Metric Learning".
Lumina-mGPT 2.0: Stand-alone Autoregressive Image Modeling
Documentation that simply works
Implementation of a diffusion model for CIFAR-10
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & LoRA & vLLM & RFT)
HuggingLLM, Hugging Future.
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!
🏡 GitHub Pages template for personal academic homepage
Solve Visual Understanding with Reinforced VLMs
A curated list of resources for activation engineering
Official implementation of the paper "VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format"
[ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Frontier Multimodal Foundation Models for Image and Video Understanding
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Runs EasyConnect and aTrust, the non-free VPN software developed by Sangfor (深信服), in docker or podman, acting as a gateway and/or providing socks5 and http proxy services