Get your documents ready for gen AI
Updated Jun 13, 2025 - Python
Dedoc is a library (and service) for automating document parsing and converting documents into a uniform format. It automatically extracts content, logical structure, tables, and metadata from textual electronic documents (parse document; document content extraction; logical structure extraction; PDF parser; scanned document parser; DOCX parser; HTML parser; …)
Read SVG files and convert them to other formats.
Python-based open-source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, with an ingestor for Solr or Elasticsearch indexes and linked-data graph databases.
AI-powered Dropbox search tool for private documents
Implementation of my paper "Real-time Document Localization in Natural Images by Recursive Application of a CNN."
OpenAI-style, fast and lightweight local language model inference with documents.
BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on scanned forms.
Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
VLMHyperBench is an open-source framework for evaluating how well vision language models (VLMs) recognize Russian-language documents, in order to assess their potential for automating document workflows.
Python library for extracting entities, relationships, and schemas from documents.
The invoice, document, and resume parser powered by AI.
Tools for Star Wars: Dark Forces assets.