- Shanghai AI Lab
- Shanghai
Stars
bespokelabsai / verifiers
Forked from willccbb/verifiers
Verifiers for LLM Reinforcement Learning
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Official implementation of HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
MM-IFEngine: Towards Multimodal Instruction Following
拼好RAG: A hand-built integration of GraphRAG, LightRAG, and Neo4j-llm-graph-builder for knowledge graph construction and search; incorporates DeepSearch techniques for reasoning over private-domain RAG; includes a custom evaluation framework for GraphRAG.
Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
RelightVid: Temporal-Consistent Diffusion Model for Video Relighting
CodeScientist: An automated scientific discovery system for code-based experiments
Search for text, news, images and videos using the DuckDuckGo.com search engine
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
The official Python SDK for Model Context Protocol servers and clients
Model Context Protocol Servers
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.
This repository provides a valuable reference for researchers in the multimodal field. Start exploring RL-based reasoning MLLMs here!
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
[ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
[ICML 2025 Spotlight] An official implementation of VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Official implementation of X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.