Bugfixes
- In certain PDFs (<1%), rotated text caused words to be combined without spaces (like Helloworld). This was patched in pdftext, with the fix pulled in.
- Surya sometimes inserted <br> tags into equations - this was fixed in surya.
Block type rules
Add a processor to force relabel blocks based on rules.
An example is --block_relabel_str "Table:Picture:0.97,Form:Picture:1.0,TableOfContents:Picture:0.97". Each comma-separated entry is a rule, with the original block type first, the new block type second, and the confidence threshold last. The original block will be relabeled to the new block type if the layout model confidence is less than the threshold.
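To make the rule format concrete, here is a minimal Python sketch of how such a rule string could be parsed and applied. It is an illustration only, not the processor's actual implementation: the names RelabelRule, parse_relabel_rules, and relabel are hypothetical, and the sketch assumes each block carries a single layout confidence value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelabelRule:
    # Hypothetical container for one "Original:New:threshold" rule.
    original: str
    new: str
    threshold: float


def parse_relabel_rules(rule_str: str) -> dict:
    """Parse a comma-separated rule string into a dict keyed by original block type."""
    rules = {}
    for part in rule_str.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas or empty entries
        original, new, threshold = part.split(":")
        rules[original] = RelabelRule(original, new, float(threshold))
    return rules


def relabel(block_type: str, confidence: float, rules: dict) -> str:
    """Return the new block type if a rule matches and confidence is below its threshold."""
    rule = rules.get(block_type)
    if rule is not None and confidence < rule.threshold:
        return rule.new
    return block_type


rules = parse_relabel_rules(
    "Table:Picture:0.97,Form:Picture:1.0,TableOfContents:Picture:0.97"
)
print(relabel("Table", 0.90, rules))  # below 0.97, so relabeled
print(relabel("Table", 0.99, rules))  # at or above 0.97, so kept
```

Under this reading, a threshold of 1.0 (as in the Form rule) relabels every block whose confidence is strictly below 1.0, while block types with no rule are left unchanged.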
What's Changed
- Add block relabel processor by @tarun-menta in #747
- Dev by @VikParuchuri in #733
- Version bump by @VikParuchuri in #748
Full Changelog: v1.7.4...v1.7.5