GitHub - suryatmodulus/chunkr: Vision model based PDF chunking.

Chunkr | Open Source Document Intelligence API

Production-ready API service for document layout analysis, OCR, and semantic chunking.
Convert PDFs, PPTs, Word docs & images into RAG/LLM-ready chunks.

Layout Analysis | OCR + Bounding Boxes | Structured HTML and markdown | VLM Processing controls

Try it out! · Report Bug · Contact

(Super) Quick Start

1. Go to chunkr.ai
2. Make an account and copy your API key
3. Install our Python SDK:
```
pip install chunkr-ai
```

Use the SDK to process your documents:

```python
from chunkr_ai import Chunkr

# Initialize with your API key from chunkr.ai
chunkr = Chunkr(api_key="your_api_key")

# Upload a document (URL or local file path)
url = "https://chunkr-web.s3.us-east-1.amazonaws.com/landing_page/input/science.pdf"
task = chunkr.upload(url)

# Export results in various formats
task.html(output_file="output.html")
task.markdown(output_file="output.md")
task.content(output_file="output.txt")
task.json(output_file="output.json")

# Clean up
chunkr.close()
```
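
The export calls above can be grouped into a small helper so that the client is closed even if an upload or export fails. This is a minimal sketch, not part of the SDK: `export_all` is a hypothetical helper, and it assumes only the `upload`, `html`, `markdown`, `content`, `json`, and `close` calls shown above.

```python
from pathlib import Path


def export_all(client, source, out_dir="output"):
    """Upload `source` and write every export format into `out_dir`.

    `client` is assumed to behave like the `Chunkr` object shown above.
    Returns a dict mapping each format name to the path it was written to.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {
        "html": out / "output.html",
        "markdown": out / "output.md",
        "content": out / "output.txt",
        "json": out / "output.json",
    }
    try:
        task = client.upload(source)
        for fmt, path in paths.items():
            # Calls task.html(...), task.markdown(...), and so on.
            getattr(task, fmt)(output_file=str(path))
    finally:
        # Close the client whether or not the upload and exports succeeded.
        client.close()
    return paths
```

With the SDK installed, this could be called as `export_all(Chunkr(api_key="your_api_key"), url)`.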

Documentation

Visit our docs for more information and examples.

Self-Hosted Deployment Options

Quick Start with Docker Compose

Prerequisites:
- Docker and Docker Compose
- NVIDIA Container Toolkit (for GPU support, optional)

Clone the repo:

```bash
git clone https://github.com/lumina-ai-inc/chunkr
cd chunkr
```

Set up environment variables:

```bash
# Copy the example environment file
cp .env.example .env

# Configure your environment variables
# Required: LLM_KEY as your OpenAI API key
```
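
Before starting the services, it can help to confirm that `.env` actually sets the required key. The check below is a sketch under simple assumptions: it treats `.env` as plain `KEY=VALUE` lines with `#` comments, and `parse_env` and `missing_env_keys` are hypothetical helpers, not part of Chunkr.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_keys(path=".env", required=("LLM_KEY",)):
    """Return the required keys that are absent or empty in the env file."""
    values = parse_env(Path(path).read_text())
    return [key for key in required if not values.get(key)]
```

An empty list from `missing_env_keys()` means `LLM_KEY` is present and non-empty; it does not verify that the key itself is valid.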

Start the services:

With GPU:

```bash
docker compose up -d
```

Access the services:
- Web UI: http://localhost:5173
- API: http://localhost:8000

Note: The GPU setup above requires an NVIDIA CUDA GPU.
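
Once the containers are up, a quick way to confirm that both endpoints are reachable is to try a TCP connection to each port. The snippet below is a rough sketch: `is_listening` and `check_services` are hypothetical helpers, and a successful connection only shows that something is listening on the port, not that the service behind it is healthy.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=None):
    """Map each service name to whether its port accepts connections."""
    if services is None:
        # Default ports taken from the service list above.
        services = {"Web UI": ("localhost", 5173), "API": ("localhost", 8000)}
    return {
        name: is_listening(host, port)
        for name, (host, port) in services.items()
    }
```

Calling `check_services()` returns a dictionary with one boolean per service, which can be printed or used in a startup script.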

Stop the services when done:

```bash
docker compose down
```

Deployment with Kubernetes

For production environments, we provide a Helm chart and detailed deployment instructions:

- See our detailed guide at kube/README.md
- Includes configurations for high availability and scaling

For enterprise support and deployment assistance, contact us.

Licensing

This project is dual-licensed:

- GNU Affero General Public License v3.0 (AGPL-3.0)
- Commercial License

To use Chunkr without complying with the AGPL-3.0 license terms, contact us or visit our website to obtain a commercial license.

Connect With Us

📧 Email: mehul@lumina.sh
📅 Schedule a call: Book a 30-minute meeting
🌐 Visit our website: chunkr.ai
