GitHub - kaye0110/GOT-OCR2.0: Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Haoran Wei*, Chenglong Liu*, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng, Chunrui Han, Xiangyu Zhang

Release

[2024/9/14]🔥🔥🔥 We release the official demo. Thanks very much for Huggingface providing the GPU resource.
[2024/9/13]🔥🔥🔥 We release the Huggingface deployment.
[2024/9/03]🔥🔥🔥 We open-source the codes, weights, and benchmarks. The paper can be found in this repo. We also have submitted it to Arxiv.
[2024/9/03]🔥🔥🔥 We release the OCR-2.0 model GOT!

Usage and License Notices: The data, code, and checkpoint are intended and licensed for research use only. They are also restricted to use that follow the license agreement of Vary.

Community contributions

We encourage everyone to develop GOT applications based on this repo. Thanks for the following contributions :

Colab of GOT ~ contributor: @Zizhe Wang

CPU version of GOT ~ contributor: @ElvisClaros

Online demo ~ contributor: @Joseph Pollack

Dokcer & client demo ~ contributor: @QIN2DIM

GUI of GOT ~ contributor: @XJF2332

Install

Our environment is cuda11.8+torch2.0.1
Clone this repository and navigate to the GOT folder

git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.git
cd 'the GOT folder'

Install Package

conda create -n got python=3.10 -y
conda activate got
pip install -e .

Install Flash-Attention

pip install ninja
pip install flash-attn --no-build-isolation

GOT Weights

Demo

plain texts OCR:

python3 GOT/demo/run_ocr_2.0.py  --model-name  /GOT_weights/  --image-file  /an/image/file.png  --type ocr

format texts OCR:

python3 GOT/demo/run_ocr_2.0.py  --model-name  /GOT_weights/  --image-file  /an/image/file.png  --type format

fine-grained OCR:

python3 GOT/demo/run_ocr_2.0.py  --model-name  /GOT_weights/  --image-file  /an/image/file.png  --type format/ocr --box [x1,y1,x2,y2]

python3 GOT/demo/run_ocr_2.0.py  --model-name  /GOT_weights/  --image-file  /an/image/file.png  --type format/ocr --color red/green/blue

multi-crop OCR:

python3 GOT/demo/run_ocr_2.0_crop.py  --model-name  /GOT_weights/ --image-file  /an/image/file.png

multi-page OCR (the image path contains multiple .png files):

python3 GOT/demo/run_ocr_2.0_crop.py  --model-name  /GOT_weights/ --image-file  /images/path/  --multi-page

render the formatted OCR results:

python3 GOT/demo/run_ocr_2.0.py  --model-name  /GOT_weights/  --image-file  /an/image/file.png  --type format --render

Note: The rendering results can be found in /results/demo.html. Please open the demo.html to see the results.

Train

Train sample can be found here. Note that the '<image>' in the 'conversations'-'human'-'value' is necessary!
This codebase only supports post-training (stage-2/stage-3) upon our GOT weights.
If you want train from stage-1 described in our paper, you need this repo.

< 7284 div class="highlight highlight-source-shell notranslate position-relative overflow-auto" dir="auto" data-snippet-clipboard-copy-content="deepspeed /GOT-OCR-2.0-master/GOT/train/train_GOT.py \ --deepspeed /GOT-OCR-2.0-master/zero_config/zero2.json --model_name_or_path /GOT_weights/ \ --use_im_start_end True \ --bf16 True \ --gradient_accumulation_steps 2 \ --evaluation_strategy "no" \ --save_strategy "steps" \ --save_steps 200 \ --save_total_limit 1 \ --weight_decay 0. \ --warmup_ratio 0.001 \ --lr_scheduler_type "cosine" \ --logging_steps 1 \ --tf32 True \ --model_max_length 8192 \ --gradient_checkpointing True \ --dataloader_num_workers 8 \ --report_to none \ --per_device_train_batch_size 2 \ --num_train_epochs 1 \ --learning_rate 2e-5 \ --datasets pdf-ocr+scence \ --output_dir /your/output/path">

deepspeed   /GOT-OCR-2.0-master/GOT/train/train_GOT.py \
 --deepspeed /GOT-OCR-2.0-master/zero_config/zero2.json    --model_name_or_path /GOT_weights/ \
 --use_im_start_end True   \
 --bf16 True   \
 --gradient_accumulation_steps 2    \
 --evaluation_strategy "no"   \
 --save_strategy "steps"  \
 --save_steps 200   \
 --save_total_limit 1   \
 --weight_decay 0.    \
 --warmup_ratio 0.001     \
 --lr_scheduler_type "cosine"    \
 --logging_steps 1    \
 --tf32 True     \
 --model_max_length 8192    \
 --gradient_checkpointing True   \
 --dataloader_num_workers 8    \
 --report_to none  \
 --per_device_train_batch_size 2    \
 --num_train_epochs 1  \
 --learning_rate 2e-5   \
 --datasets pdf-ocr+scence \
 --output_dir /your/output/path

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
GOT-OCR-2.0-master		GOT-OCR-2.0-master
assets		assets
GOT-OCR-2.0-paper.pdf		GOT-OCR-2.0-paper.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Release

Community contributions

Contents

Install

GOT Weights

Demo

Train

Eval

Contact

Acknowledgement

Citation

About

Uh oh!

Releases

Packages

Languages

kaye0110/GOT-OCR2.0

Folders and files

Latest commit

History

Repository files navigation

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Release

Community contributions

Contents

Install

GOT Weights

Demo

Train

Eval

Contact

Acknowledgement

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages