Update Unstructured Provider, Fix Type Error in SDK by NolanTrem · Pull Request #2103 · SciPhi-AI/R2R · GitHub

Update Unstructured Provider, Fix Type Error in SDK #2103


Merged
NolanTrem merged 2 commits into main from Nolan/FixUnstructuredProvider on Mar 28, 2025

Conversation

NolanTrem
Collaborator
@NolanTrem NolanTrem commented Mar 28, 2025

Important

Update r2r-dashboard image version, add OCR parser support, and fix type error in SDK.

  • Docker Compose:
    • Update r2r-dashboard image version to 1.0.3 in compose.full.swarm.yaml, compose.full.yaml, and compose.yaml.
  • Ingestion Providers:
    • Remove hardcoded parser override check in R2RIngestionProvider in r2r/base.py.
    • Add support for ocr parser override in UnstructuredIngestionProvider in unstructured/base.py.
  • SDK:
    • Fix type hint for ingestion_mode in create() method in documents.py to accept IngestionMode | str.
  • Misc:
    • Update project version to 3.5.8 in pyproject.toml.

This description was created by Ellipsis for dbc0495. It will automatically update as commits are pushed.
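
For readers unfamiliar with the SDK change, here is a minimal sketch of what accepting `IngestionMode | str` can look like in practice. The enum members and the helper `normalize_ingestion_mode` are hypothetical and written for illustration only; they are not taken from the R2R source, where the real `IngestionMode` and `create()` live.

```python
from __future__ import annotations

from enum import Enum


class IngestionMode(str, Enum):
    # Hypothetical members for illustration; the real enum is defined in R2R.
    HI_RES = "hi-res"
    FAST = "fast"
    CUSTOM = "custom"


def normalize_ingestion_mode(
    ingestion_mode: IngestionMode | str | None,
) -> str | None:
    """Return the plain string value for an enum member or a string.

    Hypothetical helper: it shows one way a method typed as
    `IngestionMode | str` could treat both inputs uniformly.
    """
    if ingestion_mode is None:
        return None
    if isinstance(ingestion_mode, IngestionMode):
        return ingestion_mode.value
    # Plain strings are validated against the enum so typos fail early
    # with a ValueError instead of reaching the server.
    return IngestionMode(ingestion_mode).value
```

With this shape, callers can pass either `IngestionMode.FAST` or the string `"fast"` without tripping a type checker, which is the kind of error the type-hint fix addresses.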

Contributor
@ellipsis-dev ellipsis-dev bot left a comment

❌ Changes requested. Reviewed everything up to dbc0495 in 2 minutes and 49 seconds

More details
  • Looked at 122 lines of code in 7 files
  • Skipped 0 files when reviewing.
  • Skipped posting 15 drafted comments based on config settings.
1. docker/compose.full.swarm.yaml:388
  • Draft comment:
    Image updated to 1.0.3. Verify that the new version is backwards compatible.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
2. docker/compose.full.yaml:196
  • Draft comment:
    Image updated to 1.0.3; ensure environment consistency with other compose files.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
3. docker/compose.yaml:52
  • Draft comment:
    r2r-dashboard image updated to 1.0.3. This should align with other configs.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
4. py/core/providers/ingestion/r2r/base.py:259
  • Draft comment:
    Remove commented-out hardcoded parser override code if no longer needed.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
5. py/pyproject.toml:7
  • Draft comment:
    Version bumped from 3.5.7 to 3.5.8; ensure changelog is updated accordingly.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
6. py/sdk/sync_methods/documents.py:65
  • Draft comment:
    Updated type hint for ingestion_mode to allow IngestionMode | str. Verify associated documentation matches this change.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the PR author to verify that the documentation matches the code change. This falls under the rule of not asking the author to ensure or verify things. The comment does not provide a specific suggestion or point out a potential issue with the code itself.
7. docker/compose.full.swarm.yaml:388
  • Draft comment:
    Updated r2r-dashboard image to version 1.0.3. Confirm that the new version is compatible with dependent services.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
8. docker/compose.full.yaml:193
  • Draft comment:
    Ensure consistent update of r2r-dashboard image to version 1.0.3 across all compose configurations.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
9. docker/compose.yaml:50
  • Draft comment:
    r2r-dashboard image updated to version 1.0.3 here as well. Verify consistency across environments.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
10. py/core/providers/ingestion/r2r/base.py:259
  • Draft comment:
    Removed hardcoded parser override check. Ensure that fallback processing for 'zerox' PDF override is intentionally relaxed and consistent with other providers.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
11. py/core/providers/ingestion/unstructured/base.py:293
  • Draft comment:
    Added explicit handling for parser_override values 'zerox' and 'ocr'. Consider adding an 'else' branch to log or handle unexpected override values.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
12. py/pyproject.toml:7
  • Draft comment:
    Version bumped to 3.5.8. Ensure that the changelog and release notes are updated appropriately.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
13. py/sdk/sync_methods/documents.py:65
  • Draft comment:
    Updated 'ingestion_mode' type annotation to accept both IngestionMode and str to fix type errors. Verify that downstream usage correctly handles both types.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
14. py/core/providers/ingestion/r2r/base.py:222
  • Draft comment:
    Typographical error: Consider renaming 'text_spliiter' to 'text_splitter' for clarity and consistency.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
15. py/core/providers/ingestion/unstructured/base.py:416
  • Draft comment:
    Typographical error: The word 'inadvertedly' in the TODO comment should be corrected to 'inadvertently'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_ZsHvBUDRFqYo7csL
parser_name=f"zerox_{DocumentType.PDF.value}",
):
elements.append(element)
if parser_overrides[document.document_type.value] == "zerox":

Consider consolidating duplicated logic in the added branch for OCR override and ensure warning message clarity.
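
One way to act on this suggestion, and on the drafted comment about handling unexpected override values, is to resolve the override through a single lookup instead of one `if` branch per parser. The sketch below is an assumption-laden illustration, not the code from `unstructured/base.py`: `SUPPORTED_OVERRIDES` and `resolve_parser_name` are hypothetical names, and the `ocr_<type>` parser name is assumed by analogy with the `zerox_<type>` pattern visible in the diff context above.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)

# Hypothetical table of recognised override values. "zerox" follows the
# f"zerox_{DocumentType.PDF.value}" pattern from the diff; "ocr" is assumed
# to follow the same naming scheme.
SUPPORTED_OVERRIDES = {"zerox": "zerox", "ocr": "ocr"}


def resolve_parser_name(
    parser_overrides: dict[str, str], document_type: str
) -> str | None:
    """Return the parser name for a document type, or None for no override.

    Hypothetical helper: both override values share one code path, and an
    unrecognised value is logged rather than silently ignored.
    """
    # .get avoids a KeyError when no override is configured for this type.
    override = parser_overrides.get(document_type)
    if override is None:
        return None
    prefix = SUPPORTED_OVERRIDES.get(override)
    if prefix is None:
        logger.warning(
            "Unsupported parser override %r for document type %r; "
            "falling back to the default parser.",
            override,
            document_type,
        )
        return None
    return f"{prefix}_{document_type}"
```

For example, `resolve_parser_name({"pdf": "ocr"}, "pdf")` yields `"ocr_pdf"` under these assumptions, while an unknown value such as `"foo"` logs a warning and returns `None` so the caller can fall back to its default path.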

@NolanTrem NolanTrem merged commit 1bbdc56 into main Mar 28, 2025
44 of 46 checks passed
@NolanTrem NolanTrem deleted the Nolan/FixUnstructuredProvider branch March 28, 2025 18:19