Starred repositories
This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training data, instruction fine-tuning data, and in-context learning …
Get your documents ready for gen AI
Official repository for EXAONE Deep built by LG AI Research
Resources on Large Language Models for Table Processing
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
Datasets and Evaluation Scripts for CompHRDoc
Parse PDFs into markdown using Vision LLMs
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-Step
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Official implementation of the ANLS* metric
This is the official release of the datasets introduced in the EMNLP 2024 paper: Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding.
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
Document Artificial Intelligence
A high-throughput and memory-efficient inference and serving engine for LLMs
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024)
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)