Stars
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
A collection of industry classics and cutting-edge papers in the fields of recommendation, advertising, and search.
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
QwQ is the reasoning model series developed by the Qwen team at Alibaba Cloud.
Official PyTorch implementation of "MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training".
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Awesome-RAG-Vision: a curated list of advanced retrieval-augmented generation (RAG) methods for computer vision
This is the code repo for our paper "Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search".
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
YaRN: Efficient Context Window Extension of Large Language Models
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
A journey to a real multimodal R1! We are running large-scale experiments.
A fork to add multimodal model training to open-r1
Train transformer language models with reinforcement learning.
Fully open reproduction of DeepSeek-R1
Minimal reproduction of DeepSeek R1-Zero
Fast and memory-efficient exact attention
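The last entry is the FlashAttention repository. As a rough, non-authoritative sketch of the idea it implements (not code from that repo), the snippet below calls PyTorch's `scaled_dot_product_attention`, assuming PyTorch >= 2.0, which can dispatch to a FlashAttention-style fused kernel on supported GPUs:

```python
# Minimal sketch (assumption: PyTorch >= 2.0; NOT code from the FlashAttention
# repo). F.scaled_dot_product_attention computes exact attention and can
# dispatch to a FlashAttention-style fused backend that avoids materializing
# the full seq_len x seq_len score matrix.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 1024, 64  # illustrative shapes
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Causal exact attention; mathematically identical to softmax(QK^T / sqrt(d)) V.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```

The memory savings of the fused backends come from computing attention block by block instead of storing the full attention matrix, while the result remains exact.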