herobd/layoutlmv2

Fine-tuning LayoutLMv2 on DocVQA with Tesseract evaluation

This code fine-tunes LayoutLMv2 on DocVQA, training from either the dataset-provided OCR or Tesseract's OCR. In both cases it evaluates using Tesseract's OCR.

This is not as optimized as it could be; it is merely meant to demonstrate that the choice of OCR is quite important for DocVQA.
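For illustration, here is a minimal sketch of the two preprocessing paths using Hugging Face's LayoutLMv2Processor. The image path, question, words, and boxes are made-up examples, and the repo's actual preprocessing may differ:

```python
# Minimal sketch of the two OCR paths for LayoutLMv2 preprocessing.
# The image path, words, and boxes below are made-up examples.
from PIL import Image
from transformers import (LayoutLMv2FeatureExtractor, LayoutLMv2Processor,
                          LayoutLMv2TokenizerFast)

image = Image.open("document.png").convert("RGB")
question = "What is the total?"

# Path 1: let the processor run Tesseract internally
# (apply_ocr=True is the default; requires pytesseract to be installed).
tesseract_processor = LayoutLMv2Processor.from_pretrained(
    "microsoft/layoutlmv2-base-uncased")
encoding = tesseract_processor(image, question, return_tensors="pt")

# Path 2: supply the dataset-provided OCR instead of running Tesseract.
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
tokenizer = LayoutLMv2TokenizerFast.from_pretrained(
    "microsoft/layoutlmv2-base-uncased")
dataset_processor = LayoutLMv2Processor(feature_extractor, tokenizer)
words = ["Gross", "Total", "$12.00"]  # OCR tokens from the dataset
boxes = [[10, 10, 80, 30], [90, 10, 150, 30], [160, 10, 240, 30]]  # 0-1000 normalized
encoding = dataset_processor(image, question, words, boxes=boxes,
                             return_tensors="pt")
```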

This was adapted by Brian Davis (hero.bd@gmail.com) from https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/DocVQA/Fine_tuning_LayoutLMv2ForQuestionAnswering_on_DocVQA.ipynb

It fixes some issues with aligning the OCR to the ground-truth (GT) answers by allowing fuzzy string matching. When an alignment cannot be found, the instance is removed rather than setting the CLS token to be the "answer." A sketch of this matching appears below.
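The following is a minimal sketch of fuzzy answer-to-OCR alignment; the function name, windowing strategy, and threshold are assumptions, not the repo's actual code:

```python
# Sketch of fuzzy answer-to-OCR alignment (illustrative only; not the
# repo's actual implementation).
from difflib import SequenceMatcher

def find_answer_span(ocr_words, answer, threshold=0.8):
    """Return (start, end) word indices of the best fuzzy match for
    `answer` in `ocr_words`, or None if nothing scores above `threshold`."""
    answer = answer.lower()
    n_answer_words = max(1, len(answer.split()))
    best_span, best_score = None, 0.0
    # Slide windows of roughly the answer's length over the OCR tokens.
    for width in (n_answer_words - 1, n_answer_words, n_answer_words + 1):
        if width < 1:
            continue
        for start in range(len(ocr_words) - width + 1):
            candidate = " ".join(ocr_words[start:start + width]).lower()
            score = SequenceMatcher(None, candidate, answer).ratio()
            if score > best_score:
                best_span, best_score = (start, start + width), score
    if best_score >= threshold:
        return best_span
    return None  # caller drops this instance instead of pointing at CLS

# Example: an OCR error ("T0tal") still aligns to the answer "Total".
span = find_answer_span(["Gross", "T0tal", ":", "$12.00"], "Total")
```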

The batch size I used was 5. The script validates after each epoch and keeps a snapshot of the best-performing model, as sketched below.
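The keep-best pattern might look like the following sketch; train_one_epoch, evaluate, and the validation metric are hypothetical, not the repo's actual code:

```python
# Sketch of a validate-after-each-epoch loop that snapshots the best model.
# train_one_epoch and evaluate are hypothetical helpers.
import torch

best_score = float("-inf")
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)
    score = evaluate(model, val_loader)  # e.g. ANLS on the validation set
    if score > best_score:
        best_score = score
        torch.save(model.state_dict(), "snapshot.pth")  # best snapshot so far
```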

It can be fine-tuned using: python train_docvqa.py [1/load], where "1" uses Tesseract OCR and "load" uses the dataset-provided OCR for training.

It can be run on the test dataset with: python eval_docvqa.py snapshot.pth output.json
