Starred repositories
The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning"
New generation of CLIP with fine-grained discrimination capability, ICML 2025
A powerful tool for creating fine-tuning datasets for LLMs
Suna - Open Source Generalist AI Agent
MAGI-1: Autoregressive Video Generation at Scale
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
QwQ is the reasoning model series developed by Qwen team, Alibaba Cloud.
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
Open source alternative to Gemini Deep Research. Generate reports with AI based on search results.
Keeps searching, reading webpages, and reasoning until it finds the answer (or exceeds the token budget)
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
No fortress, purely open ground. OpenManus is Coming.
This is the official repository for Retrieval Augmented Visual Question Answering
Solve Visual Understanding with Reinforced VLMs
[ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
A CPU real-time VLM in 500M. Surpasses Moondream2 and SmolVLM. Train from scratch with ease.
Automate the process of making money online.
"MiniRAG: Making RAG Simpler with Small and Free Language Models"
Agentic-RAG explores advanced Retrieval-Augmented Generation systems enhanced with AI LLM agents.
Code for the paper "FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents" arXiv:2502.07393