GitHub - suryatmodulus/chunkr: Vision model based PDF chunking.

Chunkr | Open Source Document Intelligence API

Production-ready API service for document layout analysis, OCR, and semantic chunking.
Convert PDFs, PPTs, Word docs & images into RAG/LLM-ready chunks.

Layout Analysis | OCR + Bounding Boxes | Structured HTML and markdown | VLM Processing controls

Try it out! · Report Bug · Contact

(Super) Quick Start

  1. Go to chunkr.ai
  2. Make an account and copy your API key
  3. Install our Python SDK:
    pip install chunkr-ai
  4. Use the SDK to process your documents:
    from chunkr_ai import Chunkr
    
    # Initialize with your API key from chunkr.ai
    chunkr = Chunkr(api_key="your_api_key")
    
    # Upload a document (URL or local file path)
    url = "https://chunkr-web.s3.us-east-1.amazonaws.com/landing_page/input/science.pdf"
    task = chunkr.upload(url)
    
    # Export results in various formats
    task.html(output_file="output.html")
    task.markdown(output_file="output.md")
    task.content(output_file="output.txt")
    task.json(output_file="output.json")
    
    # Clean up
    chunkr.close()
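
The four export calls above can be wrapped in a small helper so that every format is written into one folder and the client is always closed, even when an upload fails. This is a minimal sketch, not part of the SDK: `export_task` and `process` are hypothetical names, and it assumes only the calls shown in the Quick Start (`Chunkr(api_key=...)`, `upload`, `html`, `markdown`, `content`, `json`, `close`).

```python
from pathlib import Path


def export_task(task, out_dir, stem):
    """Write all four export formats for one task and return the target paths.

    `task` is whatever `chunkr.upload(...)` returned; this helper (hypothetical,
    not part of chunkr-ai) relies only on the four export methods shown above.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    targets = {
        "html": out / f"{stem}.html",
        "markdown": out / f"{stem}.md",
        "content": out / f"{stem}.txt",
        "json": out / f"{stem}.json",
    }
    for method, path in targets.items():
        # Equivalent to task.html(output_file=...), task.markdown(...), etc.
        getattr(task, method)(output_file=str(path))
    return targets


def process(source, api_key, out_dir="output"):
    """Upload one document (URL or local path) and export every format."""
    # Imported lazily so export_task can be exercised without the SDK installed.
    from chunkr_ai import Chunkr

    chunkr = Chunkr(api_key=api_key)
    try:
        task = chunkr.upload(source)
        return export_task(task, out_dir, Path(source).stem)
    finally:
        # Clean up even if the upload or an export raised.
        chunkr.close()
```

Calling `process("science.pdf", api_key="your_api_key")` would then be expected to leave `science.html`, `science.md`, `science.txt`, and `science.json` under `output/`.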

Documentation

Visit our docs for more information and examples.

Self-Hosted Deployment Options

Quick Start with Docker Compose

  1. Prerequisites: Docker with Docker Compose, and an NVIDIA CUDA GPU.

  2. Clone the repo:
    git clone https://github.com/lumina-ai-inc/chunkr
    cd chunkr

  3. Set up environment variables:
    # Copy the example environment file
    cp .env.example .env

    # Configure your environment variables
    # Required: LLM_KEY as your OpenAI API key

  4. Start the services:

    With GPU:
    docker compose up -d

  5. Access the services:
    • Web UI: http://localhost:5173
    • API: http://localhost:8000

    Note: Requires an NVIDIA CUDA GPU

  6. Stop the services when done:
    docker compose down
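
After `docker compose up -d` returns, the containers may still need a moment before they accept connections. The sketch below checks only that something is listening on the two ports listed above. It assumes nothing about the API's routes (no health endpoint is named in this README), and `is_listening` and `SERVICES` are illustrative names, not part of Chunkr.

```python
import socket

# Host/port pairs taken from the "Access the services" step above.
SERVICES = {
    "Web UI": ("localhost", 5173),
    "API": ("localhost", 8000),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "up" if is_listening(host, port) else "not reachable"
        print(f"{name} ({host}:{port}): {state}")
```

A successful TCP connection only shows that the port is open; it does not confirm that the service behind it has finished starting.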

Deployment with Kubernetes

For production environments, we provide a Helm chart and detailed deployment instructions:

  1. See our detailed guide at kube/README.md
  2. Includes configurations for high availability and scaling

For enterprise support and deployment assistance, contact us.

Licensing

This project is dual-licensed:

  1. GNU Affero General Public License v3.0 (AGPL-3.0)
  2. Commercial License

To use Chunkr without complying with the AGPL-3.0 license terms, contact us or visit our website.
