Document Extraction is an AI-powered tool for extracting structured data from unstructured documents (PDF, Word, images). It uses OpenAI and LlamaIndex for prompt-driven, flexible data extraction.
- Supports PDF, Word, XLSX, and images
- Customizable extraction via prompt
- Batch processing
- Outputs JSON
- FastAPI backend
- Install dependencies:
  poetry install
- Configure environment
  - Set your OpenAI API key and extraction prompt.
- Start the API:
  poetry run uvicorn main:app --reload
- Use the API
  - Upload documents and set your prompt via the /extract endpoint.
  - See docs at http://localhost:8000/docs.
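For the configuration step, a small startup check can catch missing settings before the API is launched. This is only a sketch: `OPENAI_API_KEY` is the variable the OpenAI Python SDK reads by default, while `EXTRACTION_PROMPT` and the helper `missing_settings` are hypothetical names chosen for illustration, since this README does not say how the project stores its prompt.

```python
import os

# OPENAI_API_KEY is the variable the OpenAI Python SDK reads by default.
# EXTRACTION_PROMPT is a hypothetical name used only for this sketch.
REQUIRED_VARS = ("OPENAI_API_KEY", "EXTRACTION_PROMPT")


def missing_settings(env=os.environ):
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_settings()` with no arguments checks the current process environment; an empty list means both values are present.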
prompt: "Extract the contract parties and the signing date from the document."
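The upload step and the example prompt above could be exercised from a client roughly as follows. This is a sketch under assumptions, not the project's documented request format: it assumes /extract accepts a multipart/form-data POST with form fields named `file` and `prompt`, and that the response body is JSON. The actual schema is shown at http://localhost:8000/docs. The helper names `build_extract_request` and `extract` are illustrative, and only the Python standard library is used.

```python
import json
import urllib.request
import uuid

# Assumed, not confirmed by the README: a multipart POST to /extract with
# form fields named "file" and "prompt".
EXTRACT_URL = "http://localhost:8000/extract"


def build_extract_request(filename: str, content: bytes, prompt: str,
                          url: str = EXTRACT_URL) -> urllib.request.Request:
    """Build (but do not send) a multipart POST with one document and a prompt."""
    boundary = uuid.uuid4().hex
    prompt_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="prompt"\r\n'
        "\r\n"
        f"{prompt}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = prompt_part + file_header + content + closing
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def extract(filename: str, content: bytes, prompt: str):
    """Send the request to a running server and parse the JSON response."""
    request = build_extract_request(filename, content, prompt)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `extract("contract.pdf", open("contract.pdf", "rb").read(), "Extract the contract parties and the signing date from the document.")` would return the parsed response, provided the assumed field names match the real endpoint.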
- Built with FastAPI, OpenAI, LlamaIndex, and OCR tools
Open an issue for help or feedback.
https://deepwiki.com/akulubala/document-extraction/1-overview