fix: [ISSUE] indexing pdf with scans inside failed with timeout, tesseract vs llama3.2-vision? · Issue #1194 · Zipstack/unstract · GitHub
Describe the bug
Uploading and indexing a large PDF that contains scanned pages falls back to Tesseract, which is too slow, and the request ends in a timeout.
Tesseract is still running when the extractor times out:
unstract-backend | 172.28.0.1 - - [17/Mar/2025:09:57:30 +0000] "GET /api/v1/socket/?EIO=4&transport=websocket HTTP/1.1" 400 25 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
unstract-x2text-service | [2025-03-17 09:57:30 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:7)
unstract-x2text-service | [2025-03-17 09:57:30 +0000] [7] [ERROR] Error handling request /api/v1/x2text/process
unstract-x2text-service | Traceback (most recent call last):
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 134, in handle
unstract-x2text-service | self.handle_request(listener, req, client, addr)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 177, in handle_request
unstract-x2text-service | respiter = self.wsgi(environ, resp.start_response)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/flask/app.py", line 1498, in __call__
unstract-x2text-service | return self.wsgi_app(environ, start_response)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/flask/app.py", line 1473, in wsgi_app
unstract-x2text-service | response = self.full_dispatch_request()
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/flask/app.py", line 880, in full_dispatch_request
unstract-x2text-service | rv = self.dispatch_request()
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/flask/app.py", line 865, in dispatch_request
unstract-x2text-service | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
unstract-x2text-service | File "/app/app/authentication_middleware.py", line 16, in wrapper
unstract-x2text-service | return func(*args, **kwargs)
unstract-x2text-service | File "/app/app/controllers/controller.py", line 120, in process
unstract-x2text-service | response = requests.request(
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/requests/api.py", line 59, in request
unstract-x2text-service | return session.request(method=method, url=url, **kwargs)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
unstract-x2text-service | resp = self.send(prep, **send_kwargs)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
unstract-x2text-service | r = adapter.send(request, **kwargs)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
unstract-x2text-service | resp = conn.urlopen(
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 789, in urlopen
unstract-x2text-service | response = self._make_request(
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 536, in _make_request
unstract-x2text-service | response = conn.getresponse()
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/urllib3/connection.py", line 464, in getresponse
unstract-x2text-service | httplib_response = super().getresponse()
unstract-x2text-service | File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
unstract-x2text-service | response.begin()
unstract-x2text-service | File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
unstract-x2text-service | version, status, reason = self._read_status()
unstract-x2text-service | File "/usr/local/lib/python3.9/http/client.py", line 281, in _read_status
unstract-x2text-service | line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
unstract-x2text-service | File "/usr/local/lib/python3.9/socket.py", line 716, in readinto
unstract-x2text-service | return self._sock.recv_into(b)
unstract-x2text-service | File "/app/.venv/lib/python3.9/site-packages/gunicorn/workers/base.py", line 204, in handle_abort
unstract-x2text-service | sys.exit(1)
unstract-x2text-service | SystemExit: 1
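The traceback above shows a Gunicorn sync worker being aborted (`WORKER TIMEOUT`) while it was still blocked in `requests.request`, waiting for the OCR response. One possible workaround, sketched here as a Gunicorn config file, is to raise the worker timeout. Gunicorn's default `timeout` is 30 seconds; the log does not show which value the service actually uses. The file name, the `X2TEXT_GUNICORN_TIMEOUT` environment variable, and the chosen values are assumptions for illustration, and how the x2text service starts Gunicorn has not been verified.

```python
# gunicorn.conf.py: hypothetical config file, not taken from the Unstract repo.
import os

# Seconds a worker may stay silent before the master kills and restarts it.
# For sync workers this caps how long a single request may run, which is what
# produces "[CRITICAL] WORKER TIMEOUT". Gunicorn's default is 30.
# X2TEXT_GUNICORN_TIMEOUT is an assumed variable name, used only in this sketch.
timeout = int(os.environ.get("X2TEXT_GUNICORN_TIMEOUT", "900"))

# Seconds a worker gets to finish in-flight requests after a restart signal.
graceful_timeout = 30

# Number of worker processes; slow OCR requests each occupy one sync worker.
workers = 2
```

If the service is launched with a plain `gunicorn` command, the same setting is also available as the `--timeout` command-line flag, and a config file like this one would be passed with `-c gunicorn.conf.py`. Raising the timeout only hides the slowness of the OCR step; it does not make Tesseract faster.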
To reproduce
LLM profile:
Name               | LLM                | Embedding Model        | Vector Database | Text Extractor
ollama-deepseek-r1 | ollama-deepseek-r1 | ollama-emb-deepseek-r1 | pg-vdb-1        | unstructured-io-1
Expected behavior
Indexing completes successfully.
Environment details
Version: latest, with optional profile
Additional context
Question
Is there a way to replace the old Tesseract OCR, which is not GPU-accelerated, with the Llama 3.2 Vision model?
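To make the question concrete, here is a minimal sketch of what sending one scanned page to a vision model through Ollama's `/api/generate` endpoint could look like. It does not show an Unstract text-extractor adapter, and whether Unstract can be configured this way is exactly what is being asked. The endpoint URL (Ollama's default local port), the prompt wording, the `llama3.2-vision` model tag, and the helper names `build_ocr_payload` and `ocr_page` are assumptions; rasterizing PDF pages to images is not shown.

```python
import base64
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ocr_payload(image_bytes: bytes, model: str = "llama3.2-vision") -> dict:
    """Build the JSON body for a single-image transcription request."""
    return {
        "model": model,
        "prompt": "Transcribe all text visible in this scanned page. Output only the text.",
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON response instead of a stream of chunks.
        "stream": False,
    }


def ocr_page(image_bytes: bytes, timeout_s: float = 300.0) -> str:
    """Send one page image to Ollama and return the generated text."""
    body = json.dumps(build_ocr_payload(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An explicit client-side timeout avoids waiting indefinitely on a slow model.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())["response"]
```

Calling `ocr_page(open("page-1.png", "rb").read())` would return the model's transcription for that page, assuming a running Ollama server with the model pulled. Processing one page per request also keeps each call short, which may help with timeouts like the one reported above.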