convert more exceptions by bertsky · Pull Request #365 · sirfz/tesserocr · GitHub

convert more exceptions #365

Open: bertsky wants to merge 1 commit into master

Conversation

bertsky (Contributor) commented May 7, 2025

To be used in conjunction with tesseract-ocr/tesseract#4420 – which hopefully will get merged in some form, ultimately.

There may be more functions that need exception conversion.

Example result:

15:36:15.156 ERROR ocrd.processor.base - Failure on page PHYS_0067: ELIST_ITERATOR::forward:Error:List would have returned a nullptr data pointer
Traceback (most recent call last):
  File "/data/ocr-d/ocrd_all/core/src/ocrd/processor/base.py", line 710, in process_workspace_handle_page_task
    task.result()
  File "/data/ocr-d/ocrd_all/core/src/ocrd/processor/base.py", line 124, in result
    return self.fn(*self.args, **self.kwargs)
  File "/data/ocr-d/ocrd_all/core/src/ocrd/processor/base.py", line 1157, in _page_worker
    _page_worker_processor.process_page_file(*input_files)
  File "/data/ocr-d/ocrd_all/core/src/ocrd/processor/base.py", line 809, in process_page_file
    result = self.process_page_pcgts(*input_pcgts, page_id=page_id)
  File "/data/ocr-d/ocrd_all/ocrd_tesserocr/ocrd_tesserocr/recognize.py", line 512, in process_page_pcgts
    self._process_existing_regions(regions, page_image, page_coords, pcgts.mapping)
  File "/data/ocr-d/ocrd_all/ocrd_tesserocr/ocrd_tesserocr/recognize.py", line 995, in _process_existing_regions
    self._process_existing_lines(textlines, region_image, region_coords, mapping)
  File "/data/ocr-d/ocrd_all/ocrd_tesserocr/ocrd_tesserocr/recognize.py", line 1059, in _process_existing_lines
    self._process_existing_words(words, line_image, line_coords, mapping)
  File "/data/ocr-d/ocrd_all/ocrd_tesserocr/ocrd_tesserocr/recognize.py", line 1095, in _process_existing_words
    word_conf = self.tessapi.AllWordConfidences()
  File "tesserocr.pyx", line 2386, in tesserocr.PyTessBaseAPI.AllWordConfidences
RuntimeError: ELIST_ITERATOR::forward:Error:List would have returned a nullptr data pointer

That is, an assertion failed in libtesseract, which now throws a C++ exception instead of calling a hard abort(). In Cython we can therefore convert it to a Python exception, which we can then catch and act on.
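
As a rough illustration of what "catch and act on" could look like on the Python side, here is a minimal sketch. `FakeTessAPI` and `safe_word_confidences` are hypothetical names, used so the snippet runs without tesserocr installed. Only the method name `AllWordConfidences` and the `RuntimeError` type are taken from the traceback above. Returning an empty list as a fallback is an assumption for illustration, not what ocrd_tesserocr actually does.

```python
import logging

log = logging.getLogger("example")


class FakeTessAPI:
    """Hypothetical stand-in for tesserocr.PyTessBaseAPI (illustration only)."""

    def __init__(self, fail=False):
        self.fail = fail

    def AllWordConfidences(self):
        if self.fail:
            # Simulates the converted C++ exception shown in the traceback.
            raise RuntimeError(
                "ELIST_ITERATOR::forward:Error:"
                "List would have returned a nullptr data pointer"
            )
        return [93, 87]


def safe_word_confidences(api):
    """Return word confidences, or an empty list if the call raised."""
    try:
        return list(api.AllWordConfidences())
    except RuntimeError as err:
        # Assumed policy: log and continue instead of losing the whole page.
        log.warning("AllWordConfidences failed: %s", err)
        return []
```

With a hard abort() the process would end before any `except` clause could run, so a wrapper like this only becomes useful once the exception is converted.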

sirfz (Owner) commented May 8, 2025

This won't break older versions right?

bertsky (Contributor, Author) commented May 8, 2025

> This won't break older versions right?

I cannot imagine how. Adding except + will just convert C++ exceptions into Python exceptions (instead of having them be handled by the C++ runtime). If libtesseract does not throw them, nothing should change. (And the difference of sacrificing the const specifier should also be negligible.)
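
For context on why the traceback above ends in RuntimeError: the sketch below writes out, as a plain Python lookup, the default translation that a bare except + applies according to the Cython documentation. The names `CPP_TO_PYTHON` and `python_exception_for` are hypothetical and not part of Cython or tesserocr. Which C++ exception type libtesseract throws is not stated in this thread, so the example makes no claim about it.

```python
# Default C++ -> Python exception translation for a bare `except +`,
# as listed in the Cython documentation on wrapping C++ code.
CPP_TO_PYTHON = {
    "std::bad_alloc": MemoryError,
    "std::bad_cast": TypeError,
    "std::bad_typeid": TypeError,
    "std::domain_error": ValueError,
    "std::invalid_argument": ValueError,
    "std::ios_base::failure": IOError,
    "std::out_of_range": IndexError,
    "std::overflow_error": OverflowError,
    "std::range_error": ArithmeticError,
    "std::underflow_error": ArithmeticError,
}


def python_exception_for(cpp_type):
    """Look up the Python exception class; unlisted types map to RuntimeError."""
    return CPP_TO_PYTHON.get(cpp_type, RuntimeError)
```

If the C++ function never throws, none of this translation is triggered, which is consistent with the expectation that older libtesseract versions are unaffected.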

Sorry for not creating a minimal PR in the first place, btw. Do you want me to rebase onto current master?

sirfz (Owner) commented May 8, 2025

Yes please rebase

bertsky force-pushed the convert-more-exceptions branch from 8de15c0 to 29cc0f3 on May 8, 2025 08:20