herobd/layoutlmv2

Fine-tuning LayoutLMv2 on DocVQA with Tesseract evaluation

This code fine-tunes LayoutLMv2 on DocVQA, training from either the dataset-provided OCR or Tesseract's OCR. In both cases it evaluates using Tesseract's OCR.

This is not as optimized as it could be; it is merely meant to demonstrate that the choice of OCR is quite important for DocVQA.
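For illustration, here is a minimal sketch of the two preprocessing paths using Hugging Face's LayoutLMv2Processor. The image path, question, words, and boxes are made-up examples, and the repo's actual preprocessing may differ:

```python
# Minimal sketch of the two OCR paths for LayoutLMv2 preprocessing.
# The image path, words, and boxes below are made-up examples.
from PIL import Image
from transformers import (LayoutLMv2FeatureExtractor, LayoutLMv2Processor,
                          LayoutLMv2TokenizerFast)

image = Image.open("document.png").convert("RGB")
question = "What is the total?"

# Path 1: let the processor run Tesseract internally
# (apply_ocr=True is the default; requires pytesseract to be installed).
tesseract_processor = LayoutLMv2Processor.from_pretrained(
    "microsoft/layoutlmv2-base-uncased")
encoding = tesseract_processor(image, question, return_tensors="pt")

# Path 2: supply the dataset-provided OCR instead of running Tesseract.
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
tokenizer = LayoutLMv2TokenizerFast.from_pretrained(
    "microsoft/layoutlmv2-base-uncased")
dataset_processor = LayoutLMv2Processor(feature_extractor, tokenizer)
words = ["Gross", "Total", "$12.00"]  # OCR tokens from the dataset
boxes = [[10, 10, 80, 30], [90, 10, 150, 30], [160, 10, 240, 30]]  # 0-1000 normalized
encoding = dataset_processor(image, question, words, boxes=boxes,
                             return_tensors="pt")
```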

This was adapted by Brian Davis (hero.bd@gmail.com) from https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/DocVQA/Fine_tuning_LayoutLMv2ForQuestionAnswering_on_DocVQA.ipynb

It fixes some issues with aligning the OCR to the ground-truth (GT) answers by allowing fuzzy string matching. When an alignment cannot be found, the instance is removed rather than setting the CLS token to be the "answer." A sketch of this matching appears below.
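The following is a minimal sketch of fuzzy answer-to-OCR alignment; the function name, windowing strategy, and threshold are assumptions, not the repo's actual code:

```python
# Sketch of fuzzy answer-to-OCR alignment (illustrative only; not the
# repo's actual implementation).
from difflib import SequenceMatcher

def find_answer_span(ocr_words, answer, threshold=0.8):
    """Return (start, end) word indices of the best fuzzy match for
    `answer` in `ocr_words`, or None if nothing scores above `threshold`."""
    answer = answer.lower()
    n_answer_words = max(1, len(answer.split()))
    best_span, best_score = None, 0.0
    # Slide windows of roughly the answer's length over the OCR tokens.
    for width in (n_answer_words - 1, n_answer_words, n_answer_words + 1):
        if width < 1:
            continue
        for start in range(len(ocr_words) - width + 1):
            candidate = " ".join(ocr_words[start:start + width]).lower()
            score = SequenceMatcher(None, candidate, answer).ratio()
            if score > best_score:
                best_span, best_score = (start, start + width), score
    if best_score >= threshold:
        return best_span
    return None  # caller drops this instance instead of pointing at CLS

# Example: an OCR error ("T0tal") still aligns to the answer "Total".
span = find_answer_span(["Gross", "T0tal", ":", "$12.00"], "Total")
```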

The batch size I used was 5. The script validates after each epoch and keeps a snapshot of the best-performing model, as sketched below.
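The keep-best pattern might look like the following sketch; train_one_epoch, evaluate, and the validation metric are hypothetical, not the repo's actual code:

```python
# Sketch of a validate-after-each-epoch loop that snapshots the best model.
# train_one_epoch and evaluate are hypothetical helpers.
import torch

best_score = float("-inf")
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)
    score = evaluate(model, val_loader)  # e.g. ANLS on the validation set
    if score > best_score:
        best_score = score
        torch.save(model.state_dict(), "snapshot.pth")  # best snapshot so far
```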

It can be fine-tuned using: python train_docvqa.py [1/load], where "1" uses Tesseract OCR and "load" uses the dataset-provided OCR for training.

It can be run on the test dataset with: python eval_docvqa.py snapshot.pth output.json
