Issue with Large-Scale Document Embedding in H2O GPT · Issue #1926 · h2oai/h2ogpt · GitHub
Issue with Large-Scale Document Embedding in H2O GPT #1926
Open
@BhoomikaMuralidhara

Description


Hi everyone,

A mode was created specifically for an email folder in H2O GPT, where all documents are .docx. An issue has been observed when embedding a large number of documents into this mode.

Here’s what happens:

When embedding a small number of documents (e.g., around 100 or fewer), everything works fine: all documents are successfully added to the database, and new ones can be added without any problems.
However, when embedding a large number of documents (e.g., around 13,000 .docx files), only a portion of the documents (approximately 4,000) appears in the database. After that, adding new documents becomes impossible.
This issue seems specific to the email folder mode. Since all documents are .docx, it doesn’t appear to be related to missing libraries.
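
To narrow down which files were skipped, a comparison along these lines might help. This is only a rough sketch using the Python standard library; it does not call any h2oGPT API. It assumes the names of the documents already in the database can be exported by some other means, so `ingested_names` is a hypothetical input, and the function names are made up for illustration.

```python
from pathlib import Path


def find_missing_docs(folder, ingested_names):
    """Return sorted names of .docx files under `folder` not in `ingested_names`.

    `ingested_names` is a hypothetical list of file names already present in
    the database; how to obtain it is not covered here.
    """
    on_disk = {p.name for p in Path(folder).rglob("*.docx")}
    return sorted(on_disk - set(ingested_names))


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Feeding the missing files back in smaller groups (for example `batches(missing, 500)`) and noting the batch at which ingestion stops could help tell a hard size limit apart from a single problematic file.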

Could this behavior be related to:

A database size limit?
Memory constraints?
A misconfiguration in the mode or embedding setup?

Any insights or suggestions for resolving this would be greatly appreciated.

Thanks in advance!
