📇 A MinerU server that auto-detects device settings and model sources, engineered for straightforward use.

tiwater/MinerU-API

MinerU v2.0 Multi-GPU Server

A streamlined multi-GPU server implementation.

Quick Start

1. Install MinerU

pip install --upgrade pip
pip install uv
uv pip install -r requirements.txt

2. Start the Server

python src/server.py

3. Run the example client

python example/client.py

The client then processes the PDF files in ./example/pdfs in parallel. The number of files processed at once is the number of GPUs multiplied by workers_per_device: for example, with 2 GPUs and workers_per_device set to 2, 4 PDF files are processed at the same time.
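The parallel submission can be sketched as follows. This is a minimal illustration, not the repository's example/client.py: it assumes the server listens on http://127.0.0.1:24008 (the port used in the Docker and curl examples) and that /predict accepts the JSON body shown under API Endpoints. `submit_folder` and `post_json` are hypothetical names. Because the server runs (number of GPUs) × workers_per_device workers, max_workers should be at least that product (4 in the 2-GPU example) to keep every worker busy.

```python
"""Hypothetical parallel client: sends every PDF in a folder to /predict.

Assumptions (not taken from the repository): the server listens on
http://127.0.0.1:24008 and /predict accepts the JSON body from API Endpoints.
"""
import base64
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

SERVER_URL = "http://127.0.0.1:24008/predict"  # assumed default address


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_folder(folder: str, max_workers: int,
                  post: Callable[[str, dict], dict] = post_json) -> dict:
    """Submit all PDFs concurrently; returns {file_name: server_response}."""
    pdfs = sorted(Path(folder).glob("*.pdf"))

    def submit(path: Path) -> tuple:
        payload = {
            "file": base64.b64encode(path.read_bytes()).decode("ascii"),
            "options": {"backend": "pipeline", "lang": "en", "method": "auto"},
            "file_key": path.stem,  # reuse the file name as the download key
        }
        return path.name, post(SERVER_URL, payload)

    # Keep max_workers >= GPUs * workers_per_device so no server worker sits idle.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(submit, pdfs))
```

With 2 GPUs and workers_per_device set to 2, `submit_folder("./example/pdfs", max_workers=4)` would keep four requests in flight; requests beyond the server's capacity would simply wait.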

Docker

Build Image

docker build -t mineru-api:v2.0 .

Builds the Docker image for MinerU API.

Run Container

docker run -d --gpus all \
    -p 24008:24008 \
    -v mineru-api:/app/output \
    mineru-api:v2.0

Runs the container, exposing port 24008 and mounting a named volume for output.

API Endpoints

/predict

This endpoint submits a base64-encoded PDF file for processing.

POST /predict

Request Body Example:

{
    "file": "base64_encoded_pdf_content",
    "options": {
        "backend": "pipeline",
        "lang": "en",
        "method": "auto",
        "formula_enable": true,
        "table_enable": true
    },
    "file_key": "optional_unique_file_identifier"
}
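A client has to base64-encode the PDF before sending it. The sketch below builds a body with the same fields and default values as the example above; `build_predict_request` is a hypothetical helper, and it leaves out `file_key` when none is given so that the server assigns one.

```python
import base64
from typing import Optional


def build_predict_request(pdf_bytes: bytes, file_key: Optional[str] = None,
                          **options) -> dict:
    """Build a /predict body shaped like the documented example (hypothetical helper)."""
    merged = {
        "backend": "pipeline",
        "lang": "en",
        "method": "auto",
        "formula_enable": True,
        "table_enable": True,
    }
    merged.update(options)  # caller overrides, e.g. table_enable=False
    body = {
        "file": base64.b64encode(pdf_bytes).decode("ascii"),
        "options": merged,
    }
    if file_key is not None:
        body["file_key"] = file_key  # omitted otherwise; the server then picks a key
    return body
```

For example, `build_predict_request(open("report.pdf", "rb").read(), file_key="report", table_enable=False)` produces a body that can be passed to `json.dumps` and POSTed to /predict.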

/download

This endpoint allows retrieval of processed files.

GET /download/{file_key}/all.zip
GET /download/{file_key}/file.md

Replace {file_key} with the identifier returned by the /predict endpoint or a custom file_key provided in the request.

Parsed File Retrieval

Use the /download/ route to retrieve the output files. Consider setting file_key in the client request; otherwise, the server generates a random UUID and returns it as the key.
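Choosing the key on the client means the download URLs are known before the request is even sent. A minimal sketch: `make_file_key` and `download_urls` are hypothetical helpers, and percent-encoding the key assumes the server decodes path segments in the usual way (keys made only of letters, digits, `-` and `_` are left unchanged).

```python
import uuid
from pathlib import Path
from urllib.parse import quote


def make_file_key(pdf_path: str) -> str:
    """Hypothetical helper: readable, collision-resistant key such as 'report-1a2b3c4d'."""
    return f"{Path(pdf_path).stem}-{uuid.uuid4().hex[:8]}"


def download_urls(base_url: str, file_key: str) -> dict:
    """Return the two documented /download URLs for a given key."""
    key = quote(file_key, safe="")  # percent-encode anything unsafe in a path segment
    base = base_url.rstrip("/")
    return {
        "zip": f"{base}/download/{key}/all.zip",
        "markdown": f"{base}/download/{key}/file.md",
    }
```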

Using curl to retrieve files

# Download all output as a zip file
curl -O http://127.0.0.1:24008/download/my_document_key/all.zip

# Retrieve only the text content as a markdown file
curl -O http://127.0.0.1:24008/download/my_document_key/file.md
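The same downloads can be done from Python with only the standard library. `download` is a hypothetical helper that mimics `curl -O` by naming the local file after the last URL segment. Unlike plain `curl -O`, `urlopen` raises `urllib.error.HTTPError` if the server answers with an error status, for example because the key is unknown.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest_dir: str = ".") -> Path:
    """Save url into dest_dir under its last path segment, like `curl -O`."""
    target = Path(dest_dir) / url.rstrip("/").rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk instead of loading into memory
    return target
```

Usage: `download("http://127.0.0.1:24008/download/my_document_key/all.zip")` writes `all.zip` to the current directory.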

Generated files are currently deleted after 7 days; this retention period is hard-coded.
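To track when a result will disappear, the retention period can be added to the time the output was generated. This sketch assumes the 7 days are counted from generation; the repository does not document exactly when its cleanup runs, so treat the computed time as the earliest point at which files may be gone. `expires_at` and `is_expired` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # mirrors the hard-coded 7-day cleanup (assumed to start at generation)


def expires_at(created: datetime) -> datetime:
    """Earliest time the server may delete output generated at `created`."""
    return created + RETENTION


def is_expired(created: datetime, now: datetime) -> bool:
    """True once the retention window has fully elapsed."""
    return now >= expires_at(created)
```

Passing timezone-aware datetimes (for example `datetime.now(timezone.utc)`) avoids mixing local and UTC times.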
