Stars
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
Pretrained mixed models to be used with Calamari.
Official implementation of UnifiedReward & UnifiedReward-Think
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
Toolkit for linearizing PDFs for LLM datasets/training
The simplest, fastest repository for training/finetuning small-sized VLMs.
Official Repository of "Learning to Reason under Off-Policy Guidance"
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
Minimalistic large language model 3D-parallelism training
Minimalistic 4D-parallelism distributed training framework for educational purposes
Official implementation for ICDAR 2024 Oral paper "ICAL: Implicit Character-Aided Learning for Enhanced Handwritten Mathematical Expression Recognition"
Implementation of Nougat Neural Optical Understanding for Academic Documents
Formal representation and solving for Euclidean plane geometry problems.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Python and JS tools to generate printed LaTeX formulas and images
Automatically generates sample datasets of common HTML form elements, for use in training machine-learning object detection models.
Understanding R1-Zero-Like Training: A Critical Perspective
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, p…
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
An open-source implementation for fine-tuning Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
Democratizing Reinforcement Learning for LLMs