Release Block type rules; bugfixes · datalab-to/marker · GitHub

Block type rules; bugfixes

@VikParuchuri released this 11 Jun 20:36
· 53 commits to master since this release
1f7b686

Bugfixes

  • In certain PDFs (<1%), rotated text caused words to be joined without spaces (for example, Helloworld). This was patched in pdftext, and the fix has been pulled in.
  • Surya sometimes inserted <br> tags into equations. This was fixed in surya.

Block type rules

Added a processor that force-relabels blocks based on rules.

An example is --block_relabel_str "Table:Picture:0.97,Form:Picture:1.0,TableOfContents:Picture:0.97". Each comma-separated entry is a rule with three colon-separated fields: the original block type, the new block type, and the confidence threshold. A block of the original type is relabeled to the new type if the layout model's confidence is below the threshold. In this example, a Form is relabeled as a Picture at any confidence below 1.0.
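To make the rule format concrete, here is a minimal Python sketch of how such a string could be parsed and applied. It is not marker's actual implementation: the names RelabelRule, parse_relabel_rules, and relabel are hypothetical and used only for illustration. The only things taken from the release notes are the Original:New:threshold format and the "relabel if confidence is below the threshold" condition.

```python
from typing import NamedTuple


class RelabelRule(NamedTuple):
    # Hypothetical container for one rule; not part of marker's API.
    original: str     # block type assigned by the layout model
    new: str          # block type to relabel to
    threshold: float  # relabel when confidence is below this value


def parse_relabel_rules(rule_str: str) -> list[RelabelRule]:
    """Parse comma-separated 'Original:New:threshold' entries."""
    rules = []
    for entry in rule_str.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        parts = entry.split(":")
        if len(parts) != 3:
            raise ValueError(f"Expected 'Original:New:threshold', got {entry!r}")
        original, new, threshold = parts
        rules.append(RelabelRule(original.strip(), new.strip(), float(threshold)))
    return rules


def relabel(block_type: str, confidence: float, rules: list[RelabelRule]) -> str:
    """Return the new block type if a rule matches, else the original type."""
    for rule in rules:
        if rule.original == block_type and confidence < rule.threshold:
            return rule.new
    return block_type


rules = parse_relabel_rules(
    "Table:Picture:0.97,Form:Picture:1.0,TableOfContents:Picture:0.97"
)
print(relabel("Table", 0.90, rules))  # below 0.97, so the Table rule applies
print(relabel("Table", 0.99, rules))  # at or above 0.97, label is kept
print(relabel("Form", 0.999, rules))  # any confidence below 1.0 triggers the rule
```

Under these assumptions, a threshold of 1.0 relabels a block type at every confidence short of exactly 1.0, while a lower threshold such as 0.97 relabels only the detections the layout model is less sure about.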

What's Changed

Full Changelog: v1.7.4...v1.7.5
